docs: add comprehensive mathematical reference for all pipeline equations

2026-04-28 17:01:03 +00:00
parent 3b22f5e1fc
commit 4954318f7b
1 changed files with 651 additions and 0 deletions
@@ -0,0 +1,651 @@
+# Stonks Oracle — Mathematical Reference
+
+This document lists every equation, formula, threshold, and constant used in the signal processing, aggregation, recommendation, and trading pipeline, organized by pipeline stage.
+
+Code references are provided so each formula can be traced to its implementation.
+
+---
+
+## 1. Signal Scoring
+
+**Source:** `services/aggregation/scoring.py`
+
+### 1.1 Combined Signal Weight
+
+Each document signal receives a composite weight:
+
+```
+W_combined = G_conf × W_recency × W_credibility × (1 + B_novelty) × M_context
+```
+
+| Component | Symbol | Formula | Range |
+|---|---|---|---|
+| Confidence gate | G_conf | 1 if extraction_confidence ≥ 0.2, else 0 | {0, 1} |
+| Recency decay | W_recency | max(2^(−t_age / t_half), 0.01) | [0.01, 1.0] |
+| Credibility | W_credibility | clamp(credibility, 0.1, 1.0)^α | [0.1^α, 1.0]; [0.1, 1.0] at the default α = 1 |
+| Novelty bonus | B_novelty | novelty_score × 0.25 | [0, 0.25] |
+| Market context | M_context | 1 + boost_vol + boost_surge | [1.0, 1.45] |
+
+### 1.2 Recency Decay
+
+```
+W_recency = max( 2^(−t_age / t_half), 0.01 )
+```
+
+where `t_age` is document age in hours and half-lives by window are:
+
+| Window | t_half (hours) |
+|---|---|
+| intraday | 2 |
+| 1d | 12 |
+| 7d | 72 |
+| 30d | 240 |
+| 90d | 720 |
+
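As a minimal sketch of the decay above, the snippet below applies the half-life table and the 0.01 floor. The names `recency_weight`, `HALF_LIFE_HOURS`, and `MIN_RECENCY_WEIGHT` are illustrative assumptions, not identifiers taken from `scoring.py`. The floor takes over after roughly 6.6 half-lives, since log₂(100) ≈ 6.64.

```python
# Illustrative only: names are assumptions, not the identifiers used in scoring.py.
HALF_LIFE_HOURS = {"intraday": 2, "1d": 12, "7d": 72, "30d": 240, "90d": 720}
MIN_RECENCY_WEIGHT = 0.01


def recency_weight(age_hours: float, window: str) -> float:
    """Exponential half-life decay, floored at MIN_RECENCY_WEIGHT."""
    t_half = HALF_LIFE_HOURS[window]
    return max(2.0 ** (-age_hours / t_half), MIN_RECENCY_WEIGHT)


print(recency_weight(12, "1d"))   # 0.5  (exactly one half-life old)
print(recency_weight(200, "1d"))  # 0.01 (decayed below the floor)
```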
+### 1.3 Credibility Weight
+
+```
+W_credibility = clamp(c_raw, 0.1, 1.0)^α       where α = 1.0 (default)
+```
+
+α > 1 penalizes low-credibility sources more aggressively; α < 1 flattens the curve.
+
+### 1.4 Market Context Multiplier
+
+```
+boost_vol = min( ln(1 + max(σ − 1.0, 0)) × 0.15, 0.30 )
+
+boost_surge = 0.15   if ΔV% > 50%,  else 0
+
+M_context = 1.0 + boost_vol + boost_surge
+```
+
+where σ is price volatility and ΔV% is volume change percentage.
+
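The multiplier can be sketched as follows. The function name is hypothetical, and the sketch assumes ΔV% is passed in percentage points (so `60.0` means a 60% volume increase); the source does not state the unit convention. With these formulas the volatility boost reaches its 0.30 cap once σ − 1 ≥ e² − 1, that is, at σ ≈ 7.39.

```python
import math


def market_context_multiplier(volatility: float, volume_change_pct: float) -> float:
    """M_context = 1 + capped volatility boost + volume-surge boost.

    Assumption: volume_change_pct is in percentage points (50.0 == 50%).
    """
    # Only volatility above 1.0 contributes; the log damps large values.
    boost_vol = min(math.log(1.0 + max(volatility - 1.0, 0.0)) * 0.15, 0.30)
    # Flat bonus when volume rose by more than 50%.
    boost_surge = 0.15 if volume_change_pct > 50.0 else 0.0
    return 1.0 + boost_vol + boost_surge


print(market_context_multiplier(1.0, 0.0))  # 1.0 (no boost at or below the threshold)
```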
+### 1.5 Weighted Sentiment Average
+
+```
+S_avg = Σ(W_combined_i × impact_i × sentiment_i) / Σ(W_combined_i × impact_i)
+```
+
+- sentiment_i ∈ {+1.0 (positive), −1.0 (negative), 0.0 (neutral/mixed)}
+- impact_i ∈ [0, 1] from extraction
+- Returns 0.0 when denominator = 0
+
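A short sketch of the weighted average, using plain `(w_combined, impact, sentiment)` tuples as a stand-in for whatever signal type the pipeline actually uses (the tuple layout and function name are assumptions). Note that neutral signals contribute to the denominator but not the numerator, so they pull S_avg toward zero.

```python
from typing import Iterable, Tuple

Signal = Tuple[float, float, float]  # (w_combined, impact, sentiment): hypothetical layout


def weighted_sentiment(signals: Iterable[Signal]) -> float:
    """Impact- and weight-weighted mean of sentiment values in {+1, -1, 0}."""
    numerator = 0.0
    denominator = 0.0
    for w_combined, impact, sentiment in signals:
        numerator += w_combined * impact * sentiment
        denominator += w_combined * impact
    # Weights and impacts are non-negative, so "not positive" means zero.
    return numerator / denominator if denominator > 0 else 0.0


example = [(1.0, 0.8, 1.0), (0.5, 0.4, -1.0), (1.0, 0.5, 0.0)]
print(weighted_sentiment(example))  # about 0.4: (0.8 - 0.2) / (0.8 + 0.2 + 0.5)
```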
+---
+
+## 2. Trend Summary Assembly
+
+**Source:** `services/aggregation/worker.py`
+
+### 2.1 Trend Direction
+
+| Condition | Direction |
+|---|---|
+| S_avg ≥ 0.15 | Bullish |
+| S_avg ≤ −0.15 | Bearish |
+| contradiction > 0.10 AND \|S_avg\| < 0.30 | Mixed |
+| otherwise | Neutral |
+
+### 2.2 Trend Strength
+
+```
+strength = min(|S_avg|, 1.0)
+```
+
+### 2.3 Contradiction Score
+
+**Source:** `services/aggregation/contradiction.py`
+
+```
+contradiction = W_minority / (W_positive + W_negative)
+```
+
+where:
+```
+W_positive = Σ(W_combined_i × impact_i)   for signals with sentiment > 0
+W_negative = Σ(W_combined_i × impact_i)   for signals with sentiment < 0
+W_minority = min(W_positive, W_negative)
+```
+
+Range: [0, 0.5]. 0 = full agreement, 0.5 = equal-weight disagreement. The score cannot exceed 0.5 because W_minority is at most half of W_positive + W_negative.
+
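The ratio can be sketched as below, again with hypothetical `(w_combined, impact, sentiment)` tuples. Returning 0.0 when there are no directional signals is an assumption; the source does not say how `contradiction.py` handles a zero denominator.

```python
def contradiction_score(signals):
    """Share of directional weight held by the minority side.

    signals: iterable of (w_combined, impact, sentiment) tuples (hypothetical layout).
    """
    signals = list(signals)
    w_positive = sum(w * i for w, i, s in signals if s > 0)
    w_negative = sum(w * i for w, i, s in signals if s < 0)
    total = w_positive + w_negative
    if total == 0:
        return 0.0  # assumption: no directional signals means no contradiction
    return min(w_positive, w_negative) / total


print(contradiction_score([(1.0, 0.75, 1), (1.0, 0.25, -1)]))  # 0.25
```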
+### 2.4 Trend Confidence
+
+```
+confidence = clamp(0.3 × F_count + 0.3 × C_avg + 0.4 × A_agreement − P_contradiction, 0, 1)
+```
+
+| Component | Formula |
+|---|---|
+| F_count (source count) | min(N_unique / 15, 0.8) |
+| C_avg (extraction confidence) | mean of extraction confidences |
+| A_agreement (signal agreement) | fraction_same_direction × min(1, log₂(N_unique + 1) / log₂(8)) |
+| P_contradiction | contradiction_score × 0.4 |
+
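Putting the four components together, a minimal sketch (function and parameter names are assumptions, not the names used in `worker.py`). Because F_count is capped at 0.8, the highest value this formula can produce is 0.3 × 0.8 + 0.3 + 0.4 = 0.94.

```python
import math


def trend_confidence(n_unique: int, avg_extraction_conf: float,
                     fraction_same_direction: float, contradiction: float) -> float:
    """clamp(0.3*F_count + 0.3*C_avg + 0.4*A_agreement - P_contradiction, 0, 1)."""
    f_count = min(n_unique / 15, 0.8)
    # Agreement is discounted until 7 unique sources are seen (log2(8) == 3).
    source_scale = min(1.0, math.log2(n_unique + 1) / math.log2(8))
    a_agreement = fraction_same_direction * source_scale
    p_contradiction = contradiction * 0.4
    raw = 0.3 * f_count + 0.3 * avg_extraction_conf + 0.4 * a_agreement - p_contradiction
    return max(0.0, min(raw, 1.0))


# 7 unique sources, mean extraction confidence 0.8, full agreement, no contradiction:
# 0.3 * (7/15) + 0.3 * 0.8 + 0.4 * 1.0 = 0.14 + 0.24 + 0.40
print(round(trend_confidence(7, 0.8, 1.0, 0.0), 4))  # 0.78
```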
+---
+
+## 3. Macro Impact Scoring (Layer 2)
+
+**Source:** `services/aggregation/interpolation.py`
+
+### 3.1 Overlap Components
+
+**Geographic overlap:**
+```
+O_geo = Σ revenue_pct_r   for each event region r in company's revenue mix
+```
+Range: [0, 1]
+
+**Supply chain overlap:**
+```
+O_supply = |event_regions ∩ supply_regions| / |supply_regions|
+```
+
+**Commodity overlap:**
+```
+O_commodity = |event_commodities ∩ company_commodities| / |company_commodities|
+```
+
+**Sector overlap:**
+```
+O_sector = 1.0 if company_sector ∈ event_affected_sectors, else 0.0
+```
+
+### 3.2 Raw Macro Impact Score
+
+```
+S_raw = W_severity × (0.35 × O_geo + 0.25 × O_supply + 0.25 × O_commodity + 0.15 × O_sector)
+```
+
+Severity weights:
+
+| Severity | W_severity |
+|---|---|
+| critical | 1.0 |
+| high | 0.75 |
+| moderate | 0.5 |
+| low | 0.25 |
+
+### 3.3 Resilience Modifier
+
+For international events, the raw score is adjusted by market position:
+
+```
+S_final = clamp(S_raw × R_tier, 0, 1)
+```
+
+| Market Position Tier | R_tier |
+|---|---|
+| Global leader | 0.70 |
+| Multinational | 0.85 |
+| Regional | 1.00 |
+| Domestic | 1.20 |
+
+For domestic-only events, R_tier = 1.0 regardless of tier.
+
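The raw score and resilience adjustment from §3.2 and §3.3 can be sketched together. The dictionary keys (`"global_leader"` and so on), the `international` flag, and the function name are illustrative assumptions; only the numeric weights come from the tables above.

```python
# Weights from the tables above; the string keys are hypothetical labels.
SEVERITY_WEIGHT = {"critical": 1.0, "high": 0.75, "moderate": 0.5, "low": 0.25}
RESILIENCE = {"global_leader": 0.70, "multinational": 0.85, "regional": 1.00, "domestic": 1.20}


def macro_impact(severity: str, o_geo: float, o_supply: float, o_commodity: float,
                 o_sector: float, tier: str, international: bool) -> float:
    """S_final = clamp(S_raw * R_tier, 0, 1), with R_tier = 1.0 for domestic-only events."""
    s_raw = SEVERITY_WEIGHT[severity] * (
        0.35 * o_geo + 0.25 * o_supply + 0.25 * o_commodity + 0.15 * o_sector
    )
    r_tier = RESILIENCE[tier] if international else 1.0
    return max(0.0, min(s_raw * r_tier, 1.0))


# High-severity international event hitting a multinational:
# overlaps 0.35*0.4 + 0.25*0.5 + 0 + 0.15*1.0 = 0.415, then * 0.75 * 0.85
print(round(macro_impact("high", 0.4, 0.5, 0.0, 1.0, "multinational", True), 6))  # 0.264563
```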
+### 3.4 Macro Impact Confidence
+
+```
+confidence = min(event_confidence × min(O_total + 0.3, 1.0), 1.0)
+```
+
+where O_total = O_geo + O_supply + O_commodity + O_sector.
+
+### 3.5 Accelerated Staleness Decay
+
+For short-term events older than 48 hours:
+
+```
+decay_standard = e^(−0.693 × t_age_hours / t_half_hours)     (t_half default = 168h)
+decay_accelerated = decay_standard × 0.5
+```
+
+### 3.6 Macro Signal as WeightedSignal
+
+When merged into the aggregation engine:
+
+```
+impact_score_macro = macro_impact_score × W_macro       (W_macro = 0.3 default)
+sentiment_value = +1 if positive, −1 if negative
+```
+
+Recency decay uses the global event's publication time.
+
+---
+
+## 4. Competitive Signals (Layer 3)
+
+### 4.1 Pattern Confidence
+
+**Source:** `services/aggregation/pattern_matcher.py`
+
+```
+confidence = F_sample × 0.4 + F_consistency × 0.4 + F_recency × 0.2
+```
+
+| Factor | Formula |
+|---|---|
+| F_sample | min(N_samples / 20, 1.0) |
+| F_consistency | max(pct_bullish, pct_bearish) |
+| F_recency | 1.0 if age ≤ 7d; 0.7 if age ≤ 90d; 0.4 otherwise |
+
+**Modifiers:**
+- Major corporate decision (m&a, earnings, legal): confidence × 1.3
+- Insufficient data (N_samples < min_pattern_samples): cap at 0.25
+- Stale data (age > staleness_window_days): confidence × staleness_decay_penalty
+
+**Lookback windows:**
+- Routine signals: 180 days
+- Major corporate decisions: 365 days
+
+### 4.2 Cross-Company Signal Strength
+
+**Source:** `services/aggregation/signal_propagation.py`
+
+```
+S_competitive = clamp(S_pattern_avg × R_relationship × C_pattern × I_source, 0, 1)
+```
+
+| Component | Description |
+|---|---|
+| S_pattern_avg | Average historical outcome strength [0, 1] |
+| R_relationship | Relationship strength from competitor_relationships [0, 1] |
+| C_pattern | Pattern confidence from §4.1 |
+| I_source | Source document's impact_score [0, 1] |
+
+**Threshold gate:** Skipped if R_relationship < propagation_strength_threshold (default 0.2).
+
+### 4.3 Competitive Signal as WeightedSignal
+
+```
+impact_score_competitive = S_competitive × W_competitive     (W_competitive = 0.2 default)
+direction = majority historical outcome (bullish or bearish)
+```
+
+---
+
+## 5. Trend Projection
+
+**Source:** `services/aggregation/projection.py`
+
+### 5.1 Trend Momentum
+
+```
+momentum = S_current_signed − S_previous_signed
+```
+
+where `S_signed = direction_sign × strength` (bullish = +1, bearish = −1, neutral = 0).
+
+When no previous data exists:
+```
+momentum = direction_sign × strength × 0.5
+```
+
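+Range: [−1, 1]. The raw difference of two signed strengths can reach ±2, so this range holds only if the result is clamped.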
+
+### 5.2 Macro Decay Projection
+
+For each active macro event projected forward by `H` days (with `t_current`, the event's current age, and `t_half` both in days):
+
+```
+F_future = 2^(−(t_current + H) / t_half)
+I_projected = macro_impact_score × F_future × W_severity
+```
+
+Decay half-lives:
+
+| Duration | t_half (days) |
+|---|---|
+| short_term | 1.0 |
+| medium_term | 7.0 |
+| long_term | 30.0 |
+
+Aggregate direction: bullish if W_pos > 1.2 × W_neg; bearish if W_neg > 1.2 × W_pos; otherwise mixed if both are > 0.
+
+### 5.3 Projection Blending
+
+```
+W_macro_blend = min(S_macro_projected × 0.4, 0.4)
+W_company = 1.0 − W_macro_blend
+
+S_blended = W_company × S_momentum_projected + W_macro_blend × S_macro_signed
+```
+
+**Catalyst boost:** `min(N_catalysts × 0.02, 0.1)` added to projected strength.
+
+**Projected confidence:**
+```
+C_projected = C_base × 0.8 + min(S_macro × 0.15, 0.1)
+```
+
+**Divergence detection:** Flagged when projected direction ≠ current trend direction.
+
+---
+
+## 6. Data Quality Suppression
+
+**Source:** `services/recommendation/suppression.py`
+
+### 6.1 Data Quality Score
+
+```
+Q = 0.4 × Q_confidence + 0.3 × Q_freshness + 0.3 × Q_coverage
+```
+
+| Component | Formula |
+|---|---|
+| Q_confidence | min(C_avg_extraction / 0.8, 1.0) |
+| Q_freshness | max(0, 1 − t_newest_hours / 168) |
+| Q_coverage | (N_valid / N_total) × min(N_valid / 10, 1.0) |
+
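A minimal sketch of the score, with hypothetical function and parameter names. Treating coverage as 0 when no documents were processed is an assumption made here to avoid a division by zero; the source does not specify that case.

```python
def data_quality_score(avg_extraction_conf: float, newest_age_hours: float,
                       n_valid: int, n_total: int) -> float:
    """Q = 0.4*Q_confidence + 0.3*Q_freshness + 0.3*Q_coverage."""
    q_confidence = min(avg_extraction_conf / 0.8, 1.0)
    # Freshness falls linearly to 0 once the newest evidence is 168 hours old.
    q_freshness = max(0.0, 1.0 - newest_age_hours / 168)
    if n_total == 0:
        q_coverage = 0.0  # assumption: no documents means no coverage
    else:
        q_coverage = (n_valid / n_total) * min(n_valid / 10, 1.0)
    return 0.4 * q_confidence + 0.3 * q_freshness + 0.3 * q_coverage


# confidence 0.6 -> 0.75, newest doc 42h old -> 0.75, 5 of 10 valid -> 0.5 * 0.5 = 0.25
print(round(data_quality_score(0.6, 42, 5, 10), 4))  # 0.6
```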
+**Suppression triggers** (any one → informational only):
+
+| Check | Threshold |
+|---|---|
+| Avg extraction confidence | < 0.40 |
+| Evidence staleness | > 168 hours (7 days) |
+| Source type diversity | < 1 distinct type |
+| Extraction failure rate | > 50% |
+| Valid document count | < 2 |
+| Data quality score | < 0.30 |
+
+### 6.2 Safety Suppression
+
+- **Macro-only:** If trend driven solely by macro signals with zero company evidence → forced informational
+- **Pattern-only:** If trend driven solely by pattern/competitive signals with no company or macro support → forced informational
+
+---
+
+## 7. Recommendation Eligibility
+
+**Source:** `services/recommendation/eligibility.py`
+
+### 7.1 Gate Checks (all must pass)
+
+| Check | Threshold |
+|---|---|
+| Confidence | ≥ 0.35 |
+| Trend strength | ≥ 0.10 |
+| Contradiction score | ≤ 0.60 |
+| Evidence count | ≥ 2 |
+| Direction | ≠ neutral |
+
+### 7.2 Action Mapping
+
+| Condition | Action |
+|---|---|
+| Bullish AND strength ≥ 0.25 | BUY |
+| Bearish AND strength ≥ 0.25 | SELL |
+| Directional AND confidence ≥ 0.50 | HOLD |
+| Mixed or weak | WATCH |
+
+### 7.3 Mode Escalation
+
+| Mode | Requirements |
+|---|---|
+| live_eligible | confidence ≥ 0.70, contradiction ≤ 0.25, evidence ≥ 5 |
+| paper_eligible | confidence ≥ 0.50 |
+| informational | everything else (WATCH/HOLD always informational) |
+
+### 7.4 Position Sizing
+
+```
+portfolio_pct = base + C_factor × S_factor × range × P_contradiction × P_evidence
+```
+
+| Component | Formula | Default |
+|---|---|---|
+| base | base_portfolio_pct | 0.01 (1%) |
+| range | max_portfolio_pct − base_portfolio_pct | 0.09 (9%) |
+| C_factor | confidence_sizing_weight × confidence | 0.8 × confidence |
+| S_factor | 0.5 + 0.5 × trend_strength | [0.5, 1.0] |
+| P_contradiction | 1 − (contradiction_penalty × contradiction_score) | penalty = 0.5 |
+| P_evidence | 0.50 if evidence < 3; 0.75 if evidence < 5; 1.0 otherwise | |
+
+Clamped to [base × 0.5, max_portfolio_pct].
+
+**Max loss percentage** uses the same structure with base = 0.003 (0.3%) and max = 0.02 (2%).
+
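The sizing formula can be sketched as below; function names and keyword arguments are assumptions that mirror the table's defaults rather than the actual signature in `eligibility.py`. With the default `confidence_sizing_weight` of 0.8, the largest allocation this formula can produce is 0.01 + 0.8 × 0.09 = 0.082, so the 10% ceiling is not reached unless the defaults are changed.

```python
def evidence_factor(evidence_count: int) -> float:
    """P_evidence: thin evidence shrinks the variable part of the allocation."""
    if evidence_count < 3:
        return 0.50
    if evidence_count < 5:
        return 0.75
    return 1.0


def portfolio_pct(confidence: float, trend_strength: float, contradiction: float,
                  evidence_count: int, base: float = 0.01, max_pct: float = 0.10,
                  confidence_sizing_weight: float = 0.8,
                  contradiction_penalty: float = 0.5) -> float:
    """base + C_factor * S_factor * range * P_contradiction * P_evidence, then clamped."""
    c_factor = confidence_sizing_weight * confidence
    s_factor = 0.5 + 0.5 * trend_strength
    p_contradiction = 1.0 - contradiction_penalty * contradiction
    raw = base + (c_factor * s_factor * (max_pct - base)
                  * p_contradiction * evidence_factor(evidence_count))
    return max(base * 0.5, min(raw, max_pct))


# 0.01 + (0.8*0.75) * (0.5 + 0.5*0.5) * 0.09 * (1 - 0.5*0.2) * 1.0
print(round(portfolio_pct(0.75, 0.5, 0.2, 6), 5))  # 0.04645
```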
+---
+
+## 8. Trading Engine — Position Sizing
+
+**Source:** `services/trading/position_sizer.py`
+
+### 8.1 Base Allocation
+
+```
+raw_pct = (max_position_pct × 0.5) × (confidence / min_confidence) × multiplier
+clamped_pct = min(raw_pct, max_position_pct)
+dollar_amount = min(active_pool × clamped_pct, absolute_position_cap)
+```
+
+### 8.2 Correlation Reduction
+
+```
+ρ_avg = Σ(ρ_i × w_i) / Σ(w_i)     for existing positions
+```
+
+| ρ_avg | Action |
+|---|---|
+| > 0.8 | Reject order |
+| 0.5 < ρ_avg ≤ 0.8 | Reduce: factor = 1 − (ρ_avg − 0.5) / 0.3 |
+| ≤ 0.5 | No reduction |
+
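One way to express the three bands is a function that returns a sizing factor, or `None` for a rejection. The name, the `None` convention, and the handling of an empty portfolio are assumptions for illustration. The factor falls linearly from 1.0 at ρ_avg = 0.5 to 0.0 at ρ_avg = 0.8.

```python
from typing import Optional, Sequence


def correlation_factor(correlations: Sequence[float],
                       weights: Sequence[float]) -> Optional[float]:
    """Sizing factor in [0, 1], or None when the order should be rejected."""
    total_weight = sum(weights)
    if total_weight == 0:
        return 1.0  # assumption: no existing positions, so nothing to correlate with
    rho_avg = sum(r * w for r, w in zip(correlations, weights)) / total_weight
    if rho_avg > 0.8:
        return None  # reject order
    if rho_avg > 0.5:
        # max() guards against a tiny negative result from floating-point rounding.
        return max(0.0, 1.0 - (rho_avg - 0.5) / 0.3)
    return 1.0


print(correlation_factor([0.9], [1.0]))  # None (rejected)
print(correlation_factor([0.2], [1.0]))  # 1.0  (no reduction)
```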
+### 8.3 Sector Exposure Reduction
+
+```
+available = max(max_sector_pct × active_pool − current_sector_exposure, 0)
+dollar_amount = min(dollar_amount, available)
+```
+
+### 8.4 Diversification Bonus
+
+If < 3 sectors held AND entering a new sector: dollar_amount × 1.2 (capped at max_position_pct).
+
+### 8.5 Earnings Proximity
+
+| Days to earnings | Action |
+|---|---|
+| ≤ 1 | Reject |
+| > 1 and ≤ 3 | 50% reduction |
+| > 3 | No adjustment |
+
+### 8.6 Portfolio Heat Check
+
+```
+heat_new = dollar_amount × atr_multiplier × 0.02
+heat_max = max_portfolio_heat × active_pool
+
+Reject if: heat_current + heat_new > heat_max
+```
+
+### 8.7 Share Rounding
+
+```
+shares = floor(dollar_amount / current_price)
+final_dollar = shares × current_price
+```
+
+Reject if shares = 0.
+
+---
+
+## 9. Stop-Loss and Take-Profit
+
+**Source:** `services/trading/stop_loss_manager.py`
+
+### 9.1 Initial Levels
+
+```
+stop_distance = ATR × M_atr
+stop_loss = entry_price − stop_distance
+take_profit = entry_price + stop_distance × R_reward_risk
+```
+
+| Trade type | M_atr | R_reward_risk |
+|---|---|---|
+| Standard | risk_tier.stop_loss_atr_multiplier | risk_tier.reward_risk_ratio |
+| Micro-trade | 1.0 | 1.5 |
+
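A worked example using the micro-trade parameters from the table. The function name is hypothetical, and as written the formulas describe a long position (stop below entry, target above); how `stop_loss_manager.py` treats short positions is not stated here.

```python
from typing import Tuple


def initial_levels(entry_price: float, atr: float, atr_multiplier: float,
                   reward_risk_ratio: float) -> Tuple[float, float]:
    """Return (stop_loss, take_profit) for a long position."""
    stop_distance = atr * atr_multiplier
    stop_loss = entry_price - stop_distance
    take_profit = entry_price + stop_distance * reward_risk_ratio
    return stop_loss, take_profit


# Micro-trade: M_atr = 1.0, R_reward_risk = 1.5, entry at 100 with ATR of 2.
print(initial_levels(100.0, 2.0, 1.0, 1.5))  # (98.0, 103.0)
```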
+### 9.2 Dynamic Tightening
+
+| Condition | Effective multiplier |
+|---|---|
+| High-severity macro event | base × 0.5 |
+| Earnings within 3 days | base × 0.7 |
+| Portfolio heat > 80% of max | base × 0.7 |
+| Normal | base |
+
+### 9.3 Trailing Stop Activation
+
+Activates when:
+```
+favorable_move = current_price − entry_price
+
+Activate if: favorable_move > 0.5 × (take_profit − entry_price)
+```
+
+Once active, stop-loss floor = entry_price (breakeven).
+
+---
+
+## 10. Risk Management
+
+### 10.1 Position Limits
+
+**Source:** `services/risk/engine.py`
+
+| Limit | Default | Formula |
+|---|---|---|
+| Max position % | 5% | position_value / portfolio_value ≤ 0.05 |
+| Max position value | $10,000 | existing + new ≤ $10,000 |
+| Max shares/order | 1,000 | quantity ≤ 1,000 |
+| Max sector % | 25% | sector_value / portfolio_value ≤ 0.25 |
+| Max daily loss % | 2% | \|daily_pnl\| / portfolio_value ≤ 0.02 |
+| Max daily loss $ | $1,000 | \|daily_pnl\| ≤ $1,000 |
+| Max daily trades | 20 | trade_count < 20 |
+
+### 10.2 Order Clamping
+
+**Source:** `services/risk/engine.py` — `clamp_order_to_position_limits()`
+
+When a buy order exceeds position limits, instead of rejecting:
+
+```
+max_allowed_value = min(
+    max_position_value − existing_value,
+    max_position_pct × portfolio_value − existing_value
+)
+clamped_shares = min( floor(max_allowed_value / price_per_share), max_shares_per_order )
+```
+
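The clamping rule can be sketched with the §10.1 defaults. This is not the body of `clamp_order_to_position_limits()`; the function name `clamp_shares`, its parameters, and the decision to return 0 when no headroom remains are assumptions for illustration.

```python
import math


def clamp_shares(price_per_share: float, existing_value: float, portfolio_value: float,
                 max_position_value: float = 10_000.0, max_position_pct: float = 0.05,
                 max_shares_per_order: int = 1_000) -> int:
    """Largest share count that keeps the position inside both value limits."""
    max_allowed_value = min(
        max_position_value - existing_value,
        max_position_pct * portfolio_value - existing_value,
    )
    if max_allowed_value <= 0:
        return 0  # assumption: no headroom left, so nothing can be bought
    return min(math.floor(max_allowed_value / price_per_share), max_shares_per_order)


# $100k portfolio, $2,000 already held: headroom is min(8000, 5000 - 2000) = $3,000.
print(clamp_shares(50.0, 2_000.0, 100_000.0))  # 60
```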
+### 10.3 News Shock Lockout
+
+- **Trigger:** impact_score ≥ 0.80 for catalyst ∈ {earnings, legal, m_and_a}
+- **Duration:** 60 minutes (configurable)
+
+### 10.4 Symbol Cooldown
+
+- **Duration:** 15 minutes between trades on the same symbol.
+- **Max concurrent positions per symbol:** 1.
+
+---
+
+## 11. Circuit Breaker
+
+**Source:** `services/trading/circuit_breaker.py`
+
+| Trigger | Condition | Cooldown |
+|---|---|---|
+| Daily loss | \|daily_pnl\| / portfolio_value > 0.05 | 2 hours |
+| Single position | position_loss_pct > 0.15 | 48 hours |
+| Volatility | ≥ 3 stop-losses within 30-minute window | 2 hours |
+
+---
+
+## 12. Risk Tier Auto-Adjustment
+
+**Source:** `services/trading/risk_tier_controller.py`
+
+Tiers: conservative → moderate → aggressive
+
+**Downgrade** (any one triggers, drops one level):
+- 30-day win rate < 40%
+- Current drawdown > 15%
+
+**Upgrade** (all must be true, raises one level):
+- 30-day win rate > 55%
+- Reserve pool > 20% of portfolio
+- Current drawdown < 5%
+
+---
+
+## 13. Portfolio Rebalancing
+
+**Source:** `services/trading/rebalancer.py`
+
+### 13.1 Single-Stock Rebalancing
+
+```
+excess = market_value − max_position_pct × active_pool
+sell_qty = min( floor(excess / current_price), position_quantity )
+```
+
+### 13.2 Sector Rebalancing
+
+```
+sector_excess = Σ(market_value_i) − max_sector_pct × active_pool
+```
+
+Sell from lowest-confidence positions first until excess is covered.
+
+### 13.3 Max Positions Enforcement
+
+```
+excess_count = N_positions − max_positions
+```
+
+Sell entire lowest-confidence positions until count is within limit.
+
+---
+
+## Constants Summary
+
+| Constant | Value | Location |
+|---|---|---|
+| Confidence gate floor | 0.20 | scoring.py |
+| Min recency weight | 0.01 | scoring.py |
+| Credibility floor/ceiling | 0.10 / 1.0 | scoring.py |
+| Novelty bonus max | 0.25 (25%) | scoring.py |
+| Volatility boost threshold | 1.0 price units | scoring.py |
+| Volatility boost max | 0.30 (30%) | scoring.py |
+| Volume surge threshold | 50% | scoring.py |
+| Volume surge boost | 0.15 (15%) | scoring.py |
+| Bullish/bearish threshold | ±0.15 | worker.py |
+| Mixed threshold | contradiction > 0.10, \|S_avg\| < 0.30 | worker.py |
+| Macro signal weight | 0.30 | config.py |
+| Competitive signal weight | 0.20 | config.py |
+| Macro confidence threshold | 0.40 | interpolation.py |
+| Staleness accelerated decay | 0.50× | interpolation.py |
+| Short-term staleness hours | 48 | interpolation.py |
+| Pattern min samples | configurable | pattern_matcher.py |
+| Major decision weight multiplier | 1.3× | pattern_matcher.py |
+| Routine lookback | 180 days | pattern_matcher.py |
+| Major decision lookback | 365 days | pattern_matcher.py |
+| Propagation strength threshold | 0.20 | signal_propagation.py |
+| Data quality min score | 0.30 | suppression.py |
+| Evidence staleness max | 168 hours (7 days) | suppression.py |
+| Recommendation min confidence | 0.35 | eligibility.py |
+| Recommendation min strength | 0.10 | eligibility.py |
+| Action strength threshold | 0.25 | eligibility.py |
+| Live confidence threshold | 0.70 | eligibility.py |
+| Paper confidence threshold | 0.50 | eligibility.py |
+| Base portfolio allocation | 1% | eligibility.py |
+| Max portfolio allocation | 10% | eligibility.py |
+| Circuit breaker daily loss | 5% | circuit_breaker.py |
+| Circuit breaker single position | 15% | circuit_breaker.py |
+| Stop-loss cluster threshold | 3 hits / 30 min | circuit_breaker.py |
+| Tier downgrade win rate | < 40% | risk_tier_controller.py |
+| Tier upgrade win rate | > 55% | risk_tier_controller.py |
+| Tier upgrade max drawdown | < 5% | risk_tier_controller.py |
+| Tier upgrade min reserve | > 20% | risk_tier_controller.py |