Implement full probabilistic signal processing pipeline gated behind probabilistic_scoring_enabled feature flag in risk_configs:

- Bayesian log-likelihood accumulator with Beta posterior and entropy
- Regime detector (trend-following, panic, mean-reversion, uncertainty)
- Source accuracy tracker with per-source historical prediction accuracy
- Sigmoid confidence gate replacing binary gate
- Information gain surprise weighting for rare events
- Adaptive recency decay with event-specific half-lives
- Regime multiplier replacing market context multiplier
- Weighted disagreement entropy for contradiction detection
- Multiplicative macro exposure with conditional integration
- Graph-distance attenuated competitive signal propagation
- Exponentially weighted momentum with volatility scaling
- Expected value recommendation gate

All changes backward-compatible: flag=false preserves exact current behavior. New outputs stored in existing JSONB columns (no schema changes except source_accuracy table via migration 034).

Tests: 26 property-based tests (14 correctness properties), 99 unit tests, 1789 total tests passing with zero regressions.

# Requirements Document — Signal Math Upgrade

## Introduction

The Stonks Oracle platform uses a three-layer signal aggregation engine (company-specific, macro, competitive) to produce market intelligence and drive paper-trading decisions. The current mathematical models are structurally too deterministic and too linear for a market system that is fundamentally probabilistic, regime-dependent, and nonlinear. The pipeline behaves as weighted sentiment aggregation with heuristics rather than a probabilistic forecasting engine.

This feature upgrades the signal processing mathematics across all pipeline stages (signal scoring, trend assembly, macro impact, competitive signals, trend projection, and recommendation generation), replacing heuristic formulas with probabilistic, regime-aware, and adaptive alternatives. The goal is to improve prediction quality while preserving the existing `WeightedSignal` abstraction, three-layer architecture, and database schema compatibility.

## Glossary

- **Aggregation_Engine**: The core pipeline in `services/aggregation/worker.py` that merges signals from all three layers and computes `TrendSummary` objects across five time windows.
- **Signal_Scorer**: The scoring module in `services/aggregation/scoring.py` that transforms raw intelligence records into `WeightedSignal` objects with composite aggregation weights.
- **Trend_Assembler**: The component in `services/aggregation/worker.py` that derives trend direction, strength, confidence, and contradiction from merged weighted signals.
- **Macro_Scorer**: The macro impact scoring module in `services/aggregation/interpolation.py` that computes per-company impact from global events using overlap-based exposure profiles.
- **Competitive_Scorer**: The competitive signal modules in `services/aggregation/pattern_matcher.py` and `services/aggregation/signal_propagation.py` that mine historical patterns and propagate cross-company signals.
- **Projection_Engine**: The trend projection module in `services/aggregation/projection.py` that computes forward-looking trend estimates from momentum and macro decay.
- **Recommendation_Engine**: The recommendation pipeline in `services/recommendation/` that translates trend assessments into actionable buy/sell/hold/watch decisions with position sizing.
- **WeightedSignal**: The core data abstraction pairing a document reference with a composite aggregation weight, sentiment value, and impact score.
- **Beta_Distribution**: A probability distribution on [0, 1] parameterized by α and β, used to model the posterior probability of bullish vs bearish sentiment.
- **Regime_Detector**: A new component that classifies the current market regime (trend-following, panic, mean-reversion, uncertainty) from price and volume statistics.
- **Sigmoid_Function**: The logistic function σ(x) = 1/(1+e^(-x)) used to convert log-likelihood accumulations into probabilities.
- **Adaptive_Decay**: A recency decay mechanism where the half-life varies per signal based on event impact, surprise, and market reaction rather than using a fixed constant per window.
- **Information_Gain**: A measure of how surprising an event is relative to its base rate, computed as -log₂ P(event_type), used to weight novel signals more heavily.
- **Entropy**: Shannon entropy H = -p·log₂(p) - (1-p)·log₂(1-p), used to detect mixed sentiment states where the probability distribution is spread rather than concentrated.
- **EMA**: Exponential Moving Average, a weighted moving average giving more weight to recent observations, used for trend and volatility regime detection.

---

## Requirements

### Requirement 1: Probabilistic Sentiment Accumulation via Bayesian Evidence

**User Story:** As a quantitative analyst, I want the signal scoring layer to accumulate sentiment evidence probabilistically using Bayesian methods, so that the system captures uncertainty structure instead of collapsing sentiment into binary ±1 labels.

#### Acceptance Criteria

1. WHEN a set of weighted signals is provided for a ticker and window, THE Signal_Scorer SHALL compute a log-likelihood accumulation L_t = Σ(w_i · s_i) where w_i is the combined signal weight and s_i is the sentiment value.
2. WHEN the log-likelihood L_t has been computed, THE Signal_Scorer SHALL convert the accumulation to a bullish probability using the Sigmoid_Function: P_bull = σ(L_t) = 1/(1+e^(-L_t)).
3. WHEN weighted signals are provided, THE Signal_Scorer SHALL maintain a Beta_Distribution posterior with parameters α_t = α_0 + W_bull and β_t = β_0 + W_bear, where W_bull is the sum of combined weights for positive signals and W_bear is the sum for negative signals, and α_0 = β_0 = 1.0 as uninformative priors.
4. THE Signal_Scorer SHALL compute Bayesian confidence from the Beta_Distribution posterior parameters as C = 1 - 4αβ/(α+β)² (equivalently ((α-β)/(α+β))²), where C ranges from 0.0 (maximum uncertainty at α=β) to approaching 1.0 (strong evidence concentration).
5. WHEN no signals exist for a ticker and window, THE Signal_Scorer SHALL return P_bull = 0.5, α = 1.0, β = 1.0, and C = 0.0, representing the uninformative prior state.
6. THE Signal_Scorer SHALL preserve the existing `WeightedSignal` dataclass interface, adding the Bayesian posterior fields (P_bull, α, β, Bayesian confidence) as additional output alongside the existing weighted sentiment average.
7. FOR ALL valid sets of weighted signals, computing the Beta posterior then extracting P_bull SHALL produce a value within 0.05 of σ(L_t) when signal weights are uniform (round-trip consistency between the two probabilistic representations).
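
The accumulation in criteria 1 to 5 can be sketched as follows. This is a minimal illustration, not the project's implementation: the `BayesianPosterior` container, the `accumulate` function, and the `(weight, sentiment)` tuple shape are assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class BayesianPosterior:
    """Hypothetical container for the Requirement 1 outputs."""
    log_likelihood: float
    p_bull: float
    alpha: float
    beta: float
    confidence: float


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def accumulate(signals, alpha0: float = 1.0, beta0: float = 1.0) -> BayesianPosterior:
    """signals: iterable of (combined_weight, sentiment) pairs (assumed shape)."""
    log_likelihood = 0.0
    alpha, beta = alpha0, beta0
    for weight, sentiment in signals:
        log_likelihood += weight * sentiment      # L_t = sum of w_i * s_i
        if sentiment > 0:
            alpha += weight                       # W_bull feeds alpha
        elif sentiment < 0:
            beta += weight                        # W_bear feeds beta
    # C = 1 - 4ab/(a+b)^2, which is 0.0 when alpha == beta
    confidence = 1.0 - 4.0 * alpha * beta / (alpha + beta) ** 2
    return BayesianPosterior(log_likelihood, sigmoid(log_likelihood), alpha, beta, confidence)
```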

---

### Requirement 2: Sigmoid Confidence Gate Replacing Binary Gate

**User Story:** As a quantitative analyst, I want the binary confidence gate replaced with a smooth sigmoid transition, so that marginally confident signals contribute proportionally rather than being completely discarded or fully included.

#### Acceptance Criteria

1. WHEN a document signal has extraction confidence x, THE Signal_Scorer SHALL compute a soft gate value p = σ(5·(x - 0.5)) = 1/(1+e^(-5·(x-0.5))) instead of the current binary 0/1 gate.
2. WHEN extraction confidence is 0.5, THE Signal_Scorer SHALL produce a gate value of 0.5 (the sigmoid midpoint).
3. WHEN extraction confidence is below 0.2, THE Signal_Scorer SHALL produce a gate value below 0.05, preserving near-zero weight for very low confidence signals.
4. WHEN extraction confidence is above 0.8, THE Signal_Scorer SHALL produce a gate value above 0.95, preserving near-full weight for high confidence signals.
5. THE Signal_Scorer SHALL use the sigmoid gate value as a multiplicative factor in the combined weight formula in place of the current binary G_conf.
6. FOR ALL extraction confidence values in [0.0, 1.0], THE Signal_Scorer SHALL produce gate values that are monotonically increasing (higher confidence always produces equal or higher gate values).
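
A minimal sketch of the soft gate, with the slope exposed as a hypothetical `steepness` parameter (the name and the keyword form are assumptions for this example). Evaluating it shows that the slope of 5 from criterion 1 gives about 0.18 at x = 0.2 and about 0.82 at x = 0.8, so the 0.05 and 0.95 bounds in criteria 3 and 4 would only be met with a slope of roughly 10; the sketch keeps 5 as the default and leaves that choice to the spec owner.

```python
import math


def soft_gate(confidence: float, steepness: float = 5.0, midpoint: float = 0.5) -> float:
    """Sigmoid gate p = sigma(steepness * (x - midpoint)).

    `steepness` and `midpoint` are illustrative parameter names; the
    defaults mirror the constants written in criterion 1.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (confidence - midpoint)))


# Gate values at a few confidence levels for two candidate slopes.
for k in (5.0, 10.0):
    row = ", ".join(f"x={x:.1f}: {soft_gate(x, steepness=k):.3f}" for x in (0.2, 0.5, 0.8))
    print(f"steepness={k}: {row}")
```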

---

### Requirement 3: Information Gain Surprise Weighting

**User Story:** As a quantitative analyst, I want signals weighted by their information gain (surprise factor), so that rare and unexpected events receive proportionally higher influence than routine signals.

#### Acceptance Criteria

1. WHEN a signal has a known event type (e.g., earnings, product_launch, regulatory, legal, m_and_a), THE Signal_Scorer SHALL compute an information gain factor r = 1 + λ·(-log₂ P(event_type)), where P(event_type) is the empirical base rate of that event type and λ is a configurable scaling parameter with default 0.3.
2. WHEN the event type base rate is not available, THE Signal_Scorer SHALL use a default base rate of 0.1 (treating the event as moderately rare).
3. THE Signal_Scorer SHALL multiply the information gain factor r into the combined weight formula as an additional multiplicative component.
4. THE Signal_Scorer SHALL clamp the information gain factor to a maximum of 3.0 to prevent extremely rare events from dominating the aggregation.
5. FOR ALL event types with base rate in (0, 1], THE Signal_Scorer SHALL produce information gain factors that are monotonically decreasing with increasing base rate (rarer events always receive higher surprise weight).
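
The factor in criteria 1, 2, and 4 can be sketched as below. The function name, the use of `None` for a missing base rate, and the `ValueError` on out-of-range input are assumptions for illustration. With the default base rate of 0.1 and λ = 0.3 the factor is about 2.0, and the 3.0 cap is reached once the base rate falls below roughly 1%.

```python
import math
from typing import Optional


def information_gain_factor(
    base_rate: Optional[float] = None,
    lam: float = 0.3,
    default_base_rate: float = 0.1,
    max_factor: float = 3.0,
) -> float:
    """r = 1 + lam * (-log2 P(event_type)), clamped to max_factor."""
    p = default_base_rate if base_rate is None else base_rate
    if not 0.0 < p <= 1.0:
        # Assumed handling: base rates outside (0, 1] are rejected.
        raise ValueError(f"base rate must be in (0, 1], got {p}")
    return min(1.0 + lam * -math.log2(p), max_factor)
```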

---

### Requirement 4: Historical Source Accuracy Tracking

**User Story:** As a quantitative analyst, I want source credibility to incorporate historical prediction accuracy, so that sources with a track record of correct directional calls receive higher weight.

#### Acceptance Criteria

1. THE Signal_Scorer SHALL maintain a per-source accuracy metric computed as the fraction of past signals from that source where the predicted direction matched the subsequent 7-day price movement direction.
2. WHEN a source has at least 10 historical signals with known outcomes, THE Signal_Scorer SHALL incorporate the source accuracy as a multiplicative factor on the credibility weight, scaled linearly from 0.5 (0% accuracy) to 1.5 (100% accuracy).
3. WHEN a source has fewer than 10 historical signals, THE Signal_Scorer SHALL use a neutral accuracy factor of 1.0 (no adjustment).
4. THE Signal_Scorer SHALL update source accuracy metrics asynchronously after each aggregation cycle, using realized price data from the market data tables.
5. THE Signal_Scorer SHALL store source accuracy metrics in a database table with columns for source identifier, accuracy ratio, sample count, and last updated timestamp.
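
The linear scaling in criteria 2 and 3 reduces to `0.5 + accuracy`. A minimal sketch, assuming the caller supplies hit and sample counts (the function and parameter names are hypothetical):

```python
def accuracy_factor(correct: int, total: int, min_samples: int = 10) -> float:
    """Map a source's hit rate onto a credibility multiplier in [0.5, 1.5].

    Sources with fewer than `min_samples` resolved signals get the
    neutral factor 1.0, as in criterion 3.
    """
    if total < min_samples:
        return 1.0
    accuracy = correct / total   # fraction of correct directional calls
    return 0.5 + accuracy        # 0% -> 0.5, 50% -> 1.0, 100% -> 1.5
```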

---

### Requirement 5: Adaptive Recency Decay with Event-Specific Half-Lives

**User Story:** As a quantitative analyst, I want recency decay half-lives to adapt based on event characteristics, so that high-impact events persist longer in the aggregation while routine signals decay faster.

#### Acceptance Criteria

1. WHEN computing recency decay for a signal, THE Signal_Scorer SHALL use an adaptive half-life τ_i = τ_base · (1 + β_impact) · (1 + β_surprise) · (1 + β_market_reaction), where τ_base is the current fixed half-life for the window.
2. THE Signal_Scorer SHALL compute β_impact from the signal's impact score, scaled linearly from 0.0 (impact_score = 0) to 1.0 (impact_score = 1.0).
3. THE Signal_Scorer SHALL compute β_surprise from the information gain factor (Requirement 3), scaled linearly from 0.0 (r = 1.0, no surprise) to 1.0 (r = 3.0, maximum surprise).
4. THE Signal_Scorer SHALL compute β_market_reaction from the market context multiplier, scaled linearly from 0.0 (multiplier = 1.0, no market reaction) to 0.5 (multiplier = 1.45, maximum market reaction).
5. WHEN all three β factors are at their maximum, THE Signal_Scorer SHALL produce an adaptive half-life of at most 6× the base half-life (τ_base · 2.0 · 2.0 · 1.5 = 6.0 · τ_base).
6. WHEN all three β factors are zero (routine, unsurprising signal in calm market), THE Signal_Scorer SHALL produce the same half-life as the current fixed system (τ_base).
7. FOR ALL combinations of impact, surprise, and market reaction values, THE Signal_Scorer SHALL produce adaptive half-lives that are greater than or equal to τ_base (adaptive decay is always slower or equal to the base decay, never faster).
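
A sketch of the half-life computation in criteria 1 to 6. The helper names are hypothetical, inputs outside their stated ranges are clamped (an assumption that also guarantees criterion 7), and the decay form `0.5 ** (age / half_life)` is the standard half-life definition rather than something this document specifies.

```python
def _scale(value: float, lo: float, hi: float, out_max: float) -> float:
    """Linearly map value from [lo, hi] onto [0, out_max], clamping outside."""
    frac = (value - lo) / (hi - lo)
    return out_max * min(max(frac, 0.0), 1.0)


def adaptive_half_life(
    tau_base: float,
    impact_score: float,
    info_gain: float,
    market_multiplier: float,
) -> float:
    """tau_i = tau_base * (1 + b_impact) * (1 + b_surprise) * (1 + b_market)."""
    b_impact = _scale(impact_score, 0.0, 1.0, 1.0)
    b_surprise = _scale(info_gain, 1.0, 3.0, 1.0)
    b_market = _scale(market_multiplier, 1.0, 1.45, 0.5)
    return tau_base * (1.0 + b_impact) * (1.0 + b_surprise) * (1.0 + b_market)


def recency_decay(age: float, half_life: float) -> float:
    """Assumed decay form: weight halves every `half_life` units of age."""
    return 0.5 ** (age / half_life)


# A routine signal keeps the base half-life; a maximal one stretches it 6x.
print(adaptive_half_life(24.0, 0.0, 1.0, 1.0))
print(adaptive_half_life(24.0, 1.0, 3.0, 1.45))
```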

---

### Requirement 6: Volatility-Adjusted Normalization (Regime-Aware Scoring)

**User Story:** As a quantitative analyst, I want signal weights normalized by current market volatility and volume conditions, so that the same signal magnitude is interpreted differently in calm vs volatile markets.

#### Acceptance Criteria

1. WHEN market data is available for a ticker, THE Signal_Scorer SHALL compute a return z-score z_r = (r_t - μ_20) / σ_20, where r_t is the current return, μ_20 is the 20-day mean return, and σ_20 is the 20-day return standard deviation.
2. WHEN market data is available for a ticker, THE Signal_Scorer SHALL compute a volume z-score z_v = (log(V_t) - μ_V) / σ_V, where V_t is the current volume, μ_V is the 20-day mean of log-volume, and σ_V is the 20-day standard deviation of log-volume.
3. THE Signal_Scorer SHALL compute a regime multiplier M_regime = 1 + 0.15·|z_r| + 0.10·|z_v|, which amplifies signal weights during abnormal market conditions.
4. THE Signal_Scorer SHALL clamp M_regime to the range [1.0, 2.5] to prevent extreme z-scores from producing runaway weight amplification.
5. WHEN market data is not available for a ticker, THE Signal_Scorer SHALL use M_regime = 1.0 (no regime adjustment).
6. THE Signal_Scorer SHALL replace the current market context multiplier (M_context) with M_regime in the combined weight formula.
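
The z-scores and multiplier can be sketched as follows. Several details are assumptions because the criteria leave them open: the sample standard deviation (`statistics.stdev`) is used, a zero standard deviation yields a z-score of 0.0, and missing market data is represented by `None`. For the volume z-score the caller would pass `math.log(V_t)` and the log-volumes of the trailing window.

```python
from statistics import mean, stdev
from typing import Optional, Sequence


def z_score(current: float, window: Sequence[float]) -> float:
    """(current - mean(window)) / stdev(window), or 0.0 for a flat window."""
    sigma = stdev(window)
    if sigma == 0.0:
        return 0.0  # assumed handling of a zero-variance window
    return (current - mean(window)) / sigma


def regime_multiplier(z_r: Optional[float] = None, z_v: Optional[float] = None) -> float:
    """M_regime = 1 + 0.15*|z_r| + 0.10*|z_v|, clamped to [1.0, 2.5]."""
    if z_r is None or z_v is None:
        return 1.0  # no market data: no regime adjustment
    raw = 1.0 + 0.15 * abs(z_r) + 0.10 * abs(z_v)
    return min(max(raw, 1.0), 2.5)
```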

---

### Requirement 7: Regime Detection and Classification

**User Story:** As a quantitative analyst, I want the system to detect and classify the current market regime for each ticker, so that scoring thresholds and behavior adapt to whether the market is trending, panicking, mean-reverting, or uncertain.

#### Acceptance Criteria

1. WHEN market data is available, THE Regime_Detector SHALL compute a trend indicator R = sign(EMA_20 - EMA_100), where EMA_20 and EMA_100 are exponential moving averages of closing prices over 20 and 100 days respectively.
2. WHEN market data is available, THE Regime_Detector SHALL compute a volatility ratio V_r = σ_20 / σ_100, where σ_20 and σ_100 are the 20-day and 100-day return standard deviations.
3. THE Regime_Detector SHALL classify the market regime into one of four categories based on R and V_r: trend-following (R ≠ 0 AND V_r < 1.2), panic (V_r > 1.5), mean-reversion (R = 0 AND V_r < 1.0), uncertainty (all other cases).
4. WHEN the regime is classified as panic, THE Aggregation_Engine SHALL reduce the bullish/bearish threshold from ±0.15 to ±0.10 (making the system more sensitive to directional signals during high-volatility periods).
5. WHEN the regime is classified as mean-reversion, THE Aggregation_Engine SHALL increase the bullish/bearish threshold from ±0.15 to ±0.20 (requiring stronger evidence for directional calls in range-bound markets).
6. WHEN the regime is classified as trend-following, THE Aggregation_Engine SHALL use the default thresholds of ±0.15.
7. WHEN the regime is classified as uncertainty, THE Aggregation_Engine SHALL use the default thresholds of ±0.15 and increase the contradiction penalty multiplier from 0.4 to 0.6.
8. THE Regime_Detector SHALL persist the current regime classification per ticker to the database for auditability and dashboard display.
9. WHEN market data is insufficient to compute EMA_100 (fewer than 100 days of price history), THE Regime_Detector SHALL default to the uncertainty regime.
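
A sketch of the classification and the per-regime settings. The EMA smoothing constant 2/(span+1), seeding the EMA with the first close, and the `flat_tol` parameter are assumptions: the criteria do not fix an EMA convention, and with floating-point prices EMA_20 and EMA_100 are almost never exactly equal, so an implementation would likely need some tolerance for the R = 0 case.

```python
from typing import Sequence, Tuple


def ema(values: Sequence[float], span: int) -> float:
    """EMA with smoothing 2/(span+1), seeded with the first value (assumed)."""
    alpha = 2.0 / (span + 1)
    acc = values[0]
    for v in values[1:]:
        acc = alpha * v + (1.0 - alpha) * acc
    return acc


def classify_regime(
    closes: Sequence[float], sigma_20: float, sigma_100: float, flat_tol: float = 0.0
) -> str:
    """Apply the rules of criterion 3; fall back to uncertainty on thin data."""
    if len(closes) < 100 or sigma_100 <= 0.0:
        return "uncertainty"
    diff = ema(closes, 20) - ema(closes, 100)
    r = 0 if abs(diff) <= flat_tol else (1 if diff > 0 else -1)
    v_r = sigma_20 / sigma_100
    if v_r > 1.5:
        return "panic"
    if r != 0 and v_r < 1.2:
        return "trend-following"
    if r == 0 and v_r < 1.0:
        return "mean-reversion"
    return "uncertainty"


def regime_settings(regime: str) -> Tuple[float, float]:
    """Return (direction threshold, contradiction penalty multiplier)."""
    threshold = {"panic": 0.10, "mean-reversion": 0.20}.get(regime, 0.15)
    penalty = 0.6 if regime == "uncertainty" else 0.4
    return threshold, penalty
```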

---

### Requirement 8: Bayesian Posterior Confidence Replacing Heuristic Confidence

**User Story:** As a quantitative analyst, I want trend confidence derived from the Bayesian posterior distribution rather than the current heuristic weighted formula, so that confidence reflects actual evidence concentration rather than an ad-hoc combination of factors.

#### Acceptance Criteria

1. WHEN computing trend confidence, THE Trend_Assembler SHALL use the Bayesian confidence C = 1 - 4αβ/(α+β)² from the Beta_Distribution posterior (Requirement 1) as the primary confidence component with weight 0.5.
2. THE Trend_Assembler SHALL retain the source count factor (min(N_unique/15, 0.8)) as a secondary confidence component with weight 0.25, rewarding evidence breadth.
3. THE Trend_Assembler SHALL retain the contradiction penalty (contradiction_score × 0.4) as a confidence reduction.
4. THE Trend_Assembler SHALL compute the combined confidence as: confidence = 0.5 × C_bayesian + 0.25 × F_count + 0.25 × C_avg_credibility - P_contradiction, clamped to [0.0, 1.0].
5. THE Trend_Assembler SHALL preserve the existing confidence thresholds for recommendation eligibility (0.35 minimum, 0.50 paper, 0.70 live) without modification.
6. FOR ALL signal sets where all signals agree on direction, THE Trend_Assembler SHALL produce Bayesian confidence that increases monotonically with the number of agreeing signals.
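
The combined formula in criterion 4 can be sketched as below; the function and parameter names are hypothetical. Because the source count factor is capped at 0.8, the largest value this formula can produce is 0.5 + 0.25·0.8 + 0.25 = 0.95.

```python
def trend_confidence(
    c_bayesian: float,
    n_unique_sources: int,
    avg_credibility: float,
    contradiction_score: float,
    penalty_multiplier: float = 0.4,
) -> float:
    """0.5*C_bayesian + 0.25*F_count + 0.25*C_avg_credibility - P_contradiction."""
    f_count = min(n_unique_sources / 15, 0.8)            # evidence breadth, capped
    penalty = contradiction_score * penalty_multiplier   # P_contradiction
    raw = 0.5 * c_bayesian + 0.25 * f_count + 0.25 * avg_credibility - penalty
    return min(max(raw, 0.0), 1.0)
```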

---

### Requirement 9: Entropy-Based Mixed Signal Detection

**User Story:** As a quantitative analyst, I want mixed trend detection based on Shannon entropy rather than simple contradiction thresholds, so that the system can distinguish between genuine uncertainty (high entropy) and weak signal (low total weight).

#### Acceptance Criteria

1. WHEN the bullish probability P_bull has been computed from the Bayesian posterior, THE Trend_Assembler SHALL compute Shannon entropy H = -P_bull·log₂(P_bull) - (1-P_bull)·log₂(1-P_bull).
2. WHEN H > 0.9 (entropy close to maximum of 1.0, indicating near-equal probability of bullish and bearish), THE Trend_Assembler SHALL classify the trend direction as mixed, regardless of the weighted sentiment average.
3. WHEN H ≤ 0.9 AND P_bull > 0.65, THE Trend_Assembler SHALL classify the trend direction as bullish.
4. WHEN H ≤ 0.9 AND P_bull < 0.35, THE Trend_Assembler SHALL classify the trend direction as bearish.
5. WHEN H ≤ 0.9 AND 0.35 ≤ P_bull ≤ 0.65, THE Trend_Assembler SHALL classify the trend direction as neutral.
6. THE Trend_Assembler SHALL persist the entropy value H alongside the trend summary for auditability.
7. FOR ALL P_bull values in (0, 1), THE Trend_Assembler SHALL compute entropy values in (0, 1], with maximum entropy of 1.0 occurring at P_bull = 0.5.
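
A sketch of the classification rules in criteria 1 to 5, with hypothetical function names. Working through the constants shows an interaction worth knowing about: H(0.65) ≈ 0.934 is already above 0.9, so every P_bull in [0.35, 0.65] is classified as mixed before the neutral rule is reached, and the bullish and bearish labels effectively start near P_bull ≈ 0.68 and 0.32. The sketch implements the rules as written rather than resolving this.

```python
import math
from typing import Tuple


def binary_entropy(p: float) -> float:
    """Shannon entropy in bits; 0.0 at the endpoints by the 0*log(0) = 0 convention."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def classify_direction(p_bull: float, mixed_threshold: float = 0.9) -> Tuple[str, float]:
    """Return (direction, entropy) following criteria 2 to 5 in order."""
    h = binary_entropy(p_bull)
    if h > mixed_threshold:
        return "mixed", h
    if p_bull > 0.65:
        return "bullish", h
    if p_bull < 0.35:
        return "bearish", h
    return "neutral", h


# Sweep P_bull to see where each label appears with the stated constants.
for p in (0.20, 0.35, 0.50, 0.65, 0.70, 0.80):
    label, h = classify_direction(p)
    print(f"P_bull={p:.2f}  H={h:.3f}  {label}")
```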

---

### Requirement 10: Multiplicative Macro Exposure Scoring

**User Story:** As a quantitative analyst, I want macro impact computed using multiplicative exposure rather than linear weighted sums, so that a company exposed across multiple dimensions receives compounding impact rather than simple addition.

#### Acceptance Criteria

1. WHEN computing macro impact for a company, THE Macro_Scorer SHALL use the multiplicative exposure formula S_macro = severity · (1 - Π_k(1 - w_k · O_k)), where O_k are the overlap components (geographic, supply chain, commodity, sector) and w_k are their respective weights.
2. THE Macro_Scorer SHALL use the following overlap weights: w_geo = 0.35, w_supply = 0.25, w_commodity = 0.25, w_sector = 0.15 (matching the current linear weight distribution).
3. WHEN a company has zero overlap across all dimensions, THE Macro_Scorer SHALL produce S_macro = 0.0 (no impact).
4. WHEN a company has maximum overlap across all dimensions (all O_k = 1.0), THE Macro_Scorer SHALL produce S_macro = severity · (1 - (1-0.35)·(1-0.25)·(1-0.25)·(1-0.15)), which is approximately severity · 0.689.
5. THE Macro_Scorer SHALL preserve the existing severity weight mapping (critical=1.0, high=0.75, moderate=0.5, low=0.25).
6. THE Macro_Scorer SHALL preserve the existing resilience modifier (R_tier) applied after the multiplicative exposure computation.
7. FOR ALL overlap configurations, THE Macro_Scorer SHALL produce impact scores where adding a non-zero overlap in any dimension increases the total impact (monotonicity property).
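
A minimal sketch of the exposure formula, assuming overlaps arrive as a dict keyed by dimension (the key names and function name are hypothetical). The resilience modifier R_tier is left out because its values are not given here. With every overlap at 1.0 the product term is 0.65 · 0.75 · 0.75 · 0.85 ≈ 0.311, so the exposure factor is about 0.689.

```python
import math
from typing import Mapping

# Weights and severity mapping as listed in the acceptance criteria.
OVERLAP_WEIGHTS = {"geo": 0.35, "supply": 0.25, "commodity": 0.25, "sector": 0.15}
SEVERITY = {"critical": 1.0, "high": 0.75, "moderate": 0.5, "low": 0.25}


def macro_impact(severity: str, overlaps: Mapping[str, float]) -> float:
    """S_macro = severity * (1 - prod_k(1 - w_k * O_k)); missing overlaps count as 0."""
    unexposed = math.prod(
        1.0 - weight * overlaps.get(dim, 0.0) for dim, weight in OVERLAP_WEIGHTS.items()
    )
    return SEVERITY[severity] * (1.0 - unexposed)


full = {dim: 1.0 for dim in OVERLAP_WEIGHTS}
print(round(macro_impact("critical", full), 4))
```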

---

### Requirement 11: Conditional Macro Signal Integration

**User Story:** As a quantitative analyst, I want macro signals treated as conditional modifiers on company signals rather than additive contributions, so that macro context amplifies or dampens existing company-level evidence rather than independently shifting the trend.

#### Acceptance Criteria

1. WHEN both company signals and macro signals exist for a ticker, THE Aggregation_Engine SHALL apply macro impact as a multiplicative modifier on the company signal strength: S_adjusted = S_company · (1 + M_macro · sign_alignment), where M_macro is the normalized macro impact and sign_alignment is +1 when macro and company signals agree in direction, -1 when they disagree.
2. THE Aggregation_Engine SHALL clamp the macro modifier (1 + M_macro · sign_alignment) to the range [0.5, 1.5] to prevent macro signals from inverting or excessively amplifying company signals.
3. WHEN only macro signals exist (no company signals), THE Aggregation_Engine SHALL fall back to the current additive behavior with the existing macro weight of 0.3, preserving the macro-only suppression safety mechanism.
4. WHEN only company signals exist (macro layer disabled or no macro events), THE Aggregation_Engine SHALL use company signals without modification (modifier = 1.0).
5. THE Aggregation_Engine SHALL log the macro modifier value applied to each ticker for auditability.
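
The modifier in criteria 1, 2, and 4 can be sketched as follows. The function names and the convention of passing `None` when no macro signal exists are assumptions; the macro-only fallback of criterion 3 depends on the existing additive code path and is not reproduced here.

```python
from typing import Optional


def macro_modifier(m_macro: float, company_sign: int, macro_sign: int) -> float:
    """1 + M_macro * sign_alignment, clamped to [0.5, 1.5]."""
    alignment = 1.0 if company_sign == macro_sign else -1.0
    return min(max(1.0 + m_macro * alignment, 0.5), 1.5)


def adjusted_strength(
    s_company: float, m_macro: Optional[float] = None, macro_sign: int = 0
) -> float:
    """Scale the signed company strength by the macro modifier.

    With no macro signal (m_macro is None) the company strength is
    returned unchanged, i.e. modifier = 1.0.
    """
    if m_macro is None:
        return s_company
    company_sign = (s_company > 0) - (s_company < 0)
    return s_company * macro_modifier(m_macro, company_sign, macro_sign)
```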

---

### Requirement 12: Graph-Distance Competitive Signal Attenuation

**User Story:** As a quantitative analyst, I want competitive signal transfer attenuated by network graph distance and historical correlation, so that signals propagate more strongly to closely related competitors and decay for distant relationships.

#### Acceptance Criteria

1. WHEN propagating a signal from a source company to a target company, THE Competitive_Scorer SHALL compute transfer strength as S_transfer = S_source · ρ_historical · e^(-d_network), where S_source is the source signal strength, ρ_historical is the historical price correlation between the two companies, and d_network is the graph distance in the competitor relationship network.
2. THE Competitive_Scorer SHALL compute graph distance d_network as the shortest path length in the competitor relationship graph, where direct competitors have distance 1, competitors-of-competitors have distance 2, and so on.
3. WHEN the graph distance exceeds 3, THE Competitive_Scorer SHALL not propagate the signal (the attenuation factor is already e^(-3) ≈ 0.05 at distance 3 and falls to e^(-4) ≈ 0.018 at distance 4, below meaningful contribution).
4. THE Competitive_Scorer SHALL compute ρ_historical as the 90-day rolling Pearson correlation of daily returns between the source and target companies.
5. WHEN historical correlation data is insufficient (fewer than 30 trading days of overlapping data), THE Competitive_Scorer SHALL use a default correlation of 0.3 for same-sector companies and 0.1 for cross-sector companies.
6. THE Competitive_Scorer SHALL preserve the existing relationship strength threshold (R_relationship ≥ 0.2) as a pre-filter before applying the graph-distance attenuation.
7. FOR ALL source-target pairs, THE Competitive_Scorer SHALL produce transfer strengths that decrease monotonically with increasing graph distance (closer competitors always receive stronger signal transfer).
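
A sketch of criteria 1 to 3, assuming the competitor graph is an adjacency dict of ticker to neighbor list (a hypothetical representation). Shortest path length is found with a breadth-first search that stops expanding at distance 3; the correlation lookup and the relationship-strength pre-filter are outside this example.

```python
import math
from collections import deque
from typing import Dict, List, Optional


def graph_distance(
    graph: Dict[str, List[str]], source: str, target: str, max_depth: int = 3
) -> Optional[int]:
    """Shortest path length via BFS; None if unreachable within max_depth."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == max_depth:
            continue  # do not expand beyond the propagation cutoff
        for neighbor in graph.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def transfer_strength(s_source: float, rho: float, distance: Optional[int]) -> float:
    """S_transfer = S_source * rho * e^(-d); 0.0 when the signal is not propagated."""
    if distance is None:
        return 0.0
    return s_source * rho * math.exp(-distance)
```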

---

### Requirement 13: Exponentially Weighted Momentum

**User Story:** As a quantitative analyst, I want trend momentum computed using exponentially weighted historical changes rather than a simple current-minus-previous difference, so that the momentum estimate is smoother and less sensitive to single-cycle noise.

#### Acceptance Criteria

1. WHEN computing trend momentum, THE Projection_Engine SHALL use an exponentially weighted sum M_t = Σ_{k=0}^{K-1} λ^k · ΔS_{t-k}, where ΔS_{t-k} is the signed strength change at lag k, λ = 0.7 is the decay factor, and K is the number of available historical cycles (up to 10).
2. THE Projection_Engine SHALL normalize the momentum by dividing by the geometric series sum Σ λ^k to produce a value in [-1, 1].
3. WHEN fewer than 2 historical cycles are available, THE Projection_Engine SHALL fall back to the current heuristic (momentum = direction_sign × strength × 0.5).
4. THE Projection_Engine SHALL compute volatility-scaled momentum M_adj = M_t / max(σ_20, 0.01), where σ_20 is the 20-day return standard deviation, to normalize momentum relative to the ticker's typical price movement.
5. THE Projection_Engine SHALL clamp M_adj to [-2.0, 2.0] to prevent division by very small σ_20 from producing extreme values.
6. FOR ALL sequences of monotonically increasing signed strengths, THE Projection_Engine SHALL produce positive momentum values (correctly detecting strengthening bullish trends).
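
A sketch of the momentum computation, with some interpretation flagged as assumptions: the input is a list of signed strengths ordered oldest to newest, volatility scaling is applied to the normalized momentum, and `None` is returned when there is too little history so the caller can apply the existing heuristic.

```python
from typing import Optional, Sequence


def ew_momentum(
    strengths: Sequence[float], sigma_20: float, lam: float = 0.7, max_lags: int = 10
) -> Optional[float]:
    """Exponentially weighted, volatility-scaled momentum clamped to [-2, 2]."""
    if len(strengths) < 2:
        return None  # caller falls back to direction_sign * strength * 0.5
    # Signed strength changes, then reordered so index 0 is the most recent.
    deltas = [later - earlier for earlier, later in zip(strengths, strengths[1:])]
    recent_first = deltas[::-1][:max_lags]
    weights = [lam ** k for k in range(len(recent_first))]
    # Weighted sum divided by the geometric series sum (criteria 1 and 2).
    m_t = sum(w * d for w, d in zip(weights, recent_first)) / sum(weights)
    # Volatility scaling with a floor on sigma, then clamping (criteria 4 and 5).
    m_adj = m_t / max(sigma_20, 0.01)
    return min(max(m_adj, -2.0), 2.0)
```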

---

### Requirement 14: Expected Value Recommendation Gate

**User Story:** As a quantitative analyst, I want recommendation eligibility based on expected value rather than simple confidence and strength thresholds, so that the system only recommends trades with positive risk-adjusted expected outcomes.

#### Acceptance Criteria

1. WHEN evaluating recommendation eligibility, THE Recommendation_Engine SHALL compute expected value EV = P_bull · R_up - P_bear · R_down, where P_bull is the Bayesian bullish probability, P_bear = 1 - P_bull, R_up is the estimated upside return, and R_down is the estimated downside return.
2. THE Recommendation_Engine SHALL estimate R_up and R_down from the trend strength and the ticker's 20-day historical volatility: R_up = strength · σ_20 · √(horizon_days) and R_down = (1 - strength) · σ_20 · √(horizon_days), where horizon_days corresponds to the trend window duration.
3. WHEN EV is positive and exceeds a configurable threshold (default 0.005, representing 0.5% expected return), THE Recommendation_Engine SHALL allow the recommendation to proceed through the existing eligibility gates.
4. WHEN EV is negative or below the threshold, THE Recommendation_Engine SHALL force the recommendation to informational mode regardless of confidence and strength.
5. THE Recommendation_Engine SHALL persist the computed EV alongside the recommendation for auditability.
6. THE Recommendation_Engine SHALL preserve all existing eligibility gates (confidence ≥ 0.35, strength ≥ 0.10, contradiction ≤ 0.60, evidence ≥ 2, direction ≠ neutral) as additional requirements beyond the EV gate.
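
A sketch of the EV computation and gate from criteria 1 to 4, with hypothetical function names. Substituting the definitions of R_up and R_down, the expression simplifies algebraically to EV = σ_20 · √(horizon_days) · (P_bull + strength - 1), so under these formulas EV is positive exactly when P_bull + strength exceeds 1.

```python
import math


def expected_value(p_bull: float, strength: float, sigma_20: float, horizon_days: float) -> float:
    """EV = P_bull * R_up - P_bear * R_down with volatility-scaled returns."""
    scale = sigma_20 * math.sqrt(horizon_days)
    r_up = strength * scale
    r_down = (1.0 - strength) * scale
    return p_bull * r_up - (1.0 - p_bull) * r_down


def passes_ev_gate(ev: float, threshold: float = 0.005) -> bool:
    """True when EV exceeds the configurable threshold (default 0.5%)."""
    return ev > threshold


ev = expected_value(p_bull=0.8, strength=0.7, sigma_20=0.02, horizon_days=4)
print(round(ev, 4), passes_ev_gate(ev))
```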

---

### Requirement 15: Contradiction Handling via Weighted Disagreement Entropy

**User Story:** As a quantitative analyst, I want contradiction detection to use weighted disagreement entropy rather than a simple minority/majority ratio, so that the system better distinguishes between a few strong dissenting signals and many weak ones.

#### Acceptance Criteria

1. WHEN computing contradiction, THE Trend_Assembler SHALL compute weighted disagreement entropy using the effective weight distribution across positive and negative signal groups.
2. THE Trend_Assembler SHALL compute the positive weight fraction f_pos = W_positive / (W_positive + W_negative) and negative weight fraction f_neg = W_negative / (W_positive + W_negative), where W_positive and W_negative are the sums of effective weights (combined_weight × impact_score) for each sentiment group.
3. THE Trend_Assembler SHALL compute contradiction entropy as H_contradiction = -f_pos·log₂(f_pos) - f_neg·log₂(f_neg), normalized to [0, 1] (maximum at f_pos = f_neg = 0.5).
4. THE Trend_Assembler SHALL weight the contradiction entropy by the total evidence mass: contradiction_score = H_contradiction · min(1.0, (W_positive + W_negative) / W_threshold), where W_threshold is a configurable parameter (default 5.0) representing the evidence mass at which contradiction becomes fully significant.
5. WHEN only positive or only negative signals exist (no disagreement), THE Trend_Assembler SHALL produce a contradiction score of 0.0.
6. THE Trend_Assembler SHALL preserve the existing `ContradictionResult` interface, populating the overall score with the entropy-based value and retaining the `DisagreementDetail` objects for catalyst-level analysis.
7. FOR ALL signal sets with both positive and negative signals, THE Trend_Assembler SHALL produce contradiction scores that increase monotonically as the weight distribution approaches equal split (f_pos → 0.5).
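
Criteria 2 to 5 can be sketched as a single function taking the two effective weight sums; the function name is hypothetical, and populating `ContradictionResult` is left to the existing code. The one-sided case is handled explicitly so that `log2(0)` is never evaluated.

```python
import math


def contradiction_score(w_positive: float, w_negative: float, w_threshold: float = 5.0) -> float:
    """Binary entropy of the weight split, scaled by total evidence mass."""
    if w_positive <= 0.0 or w_negative <= 0.0:
        return 0.0  # no disagreement when one side carries no weight
    total = w_positive + w_negative
    f_pos = w_positive / total
    f_neg = w_negative / total
    entropy = -f_pos * math.log2(f_pos) - f_neg * math.log2(f_neg)
    # Contradiction only becomes fully significant once total mass reaches w_threshold.
    return entropy * min(1.0, total / w_threshold)
```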

---

### Requirement 16: Backward Compatibility and Migration

**User Story:** As a platform operator, I want the mathematical upgrades to be backward-compatible with the existing database schema and deployable incrementally, so that the upgrade does not require downtime or data migration.

#### Acceptance Criteria

1. THE Aggregation_Engine SHALL preserve the existing `WeightedSignal`, `SignalWeight`, `TrendSummary`, and `Recommendation` dataclass interfaces, adding new fields as optional attributes with default values.
2. THE Aggregation_Engine SHALL store new mathematical outputs (P_bull, α, β, entropy, regime, EV) in the existing JSONB metadata fields of `trend_windows` and `recommendations` tables rather than requiring new columns.
3. THE Aggregation_Engine SHALL support a feature flag `probabilistic_scoring_enabled` in `risk_configs` that toggles between the current heuristic pipeline and the new probabilistic pipeline, defaulting to `false` for safe rollout.
4. WHEN `probabilistic_scoring_enabled` is false, THE Aggregation_Engine SHALL produce identical outputs to the current system (no behavioral change).
5. WHEN `probabilistic_scoring_enabled` is true, THE Aggregation_Engine SHALL use the new Bayesian, regime-aware, and adaptive formulas for all pipeline stages.
6. IF the feature flag toggle fails to read from the database, THEN THE Aggregation_Engine SHALL default to the current heuristic pipeline (fail-safe behavior).
7. THE Aggregation_Engine SHALL log which pipeline mode (heuristic or probabilistic) is active at the start of each aggregation cycle.
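
The fail-safe read in criteria 3, 6, and 7 might look like the sketch below. The `read_flag` callable stands in for whatever database access the engine uses to read `risk_configs`; it and the function name are hypothetical. Only a literal `True` enables the new pipeline, so a missing row or an unexpected value also falls back to the heuristic mode.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def resolve_pipeline_mode(read_flag: Callable[[], object]) -> str:
    """Return 'probabilistic' or 'heuristic', defaulting to heuristic on any failure."""
    try:
        enabled = read_flag() is True
    except Exception:  # any read failure must fall back, so the catch is broad
        logger.warning("could not read probabilistic_scoring_enabled; using heuristic pipeline")
        enabled = False
    mode = "probabilistic" if enabled else "heuristic"
    logger.info("aggregation pipeline mode: %s", mode)
    return mode
```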

---

### Requirement 17: Property-Based Testing for Mathematical Correctness

**User Story:** As a developer, I want comprehensive property-based tests validating the mathematical correctness of all new formulas, so that edge cases and numerical stability issues are caught before deployment.

#### Acceptance Criteria

1. THE test suite SHALL include property-based tests using Hypothesis for the sigmoid confidence gate verifying monotonicity (higher confidence input always produces higher or equal gate output) across all float inputs in [0.0, 1.0].
2. THE test suite SHALL include property-based tests for the Beta_Distribution posterior verifying that α + β increases monotonically with the number of signals processed (evidence always accumulates).
3. THE test suite SHALL include property-based tests for the Bayesian confidence formula verifying that confidence is 0.0 when α = β (maximum uncertainty) and approaches 1.0 as the ratio α/β or β/α increases.
4. THE test suite SHALL include property-based tests for the adaptive decay verifying that the adaptive half-life is always greater than or equal to the base half-life for all valid input combinations.
5. THE test suite SHALL include property-based tests for the multiplicative macro exposure verifying monotonicity (adding non-zero overlap in any dimension increases total impact).
6. THE test suite SHALL include property-based tests for the exponentially weighted momentum verifying that monotonically increasing strength sequences produce positive momentum.
7. THE test suite SHALL include a round-trip property test verifying that computing the Beta posterior from signals, extracting P_bull, then reconstructing approximate signal weights produces values consistent with the original inputs.
8. THE test suite SHALL include property-based tests for the expected value computation verifying that EV is positive when P_bull > 0.5 and R_up > R_down (basic directional consistency).
9. THE test suite SHALL include property-based tests for numerical stability verifying that no formula produces NaN, infinity, or values outside documented ranges for any valid input combination.
10. THE test suite SHALL use `@settings(max_examples=100)` and follow the project convention of `test_pbt_*` file naming.