stonks-oracle/docs/intelligence-pipeline-deep-dive/03-signal-scoring-and-weighted-signals.md
Celes Renata 88ad1e8d99 feat: comprehensive docs, unit tests, docker-compose app services
- Add scheduler and ingestion unit tests (test_scheduler_unit.py, test_ingestion_unit.py)
- Add all 13 app services + dashboard to docker-compose.yml
- Add full documentation suite: API reference, Helm reference, Docker deployment guide,
  3 architecture diagrams (K8s, Docker Compose, data pipeline), AI agent guide,
  backup/restore guide, observability/metrics reference, per-service docs
- Add intelligence pipeline deep-dive docs with Mermaid diagrams
- Update README with documentation index and links
- Add specs for comprehensive-quality-docs, intelligence-pipeline-deep-dive,
  sanitized-pipeline-docs
2026-04-22 02:56:41 +00:00
# Page 3 — Signal Scoring and the WeightedSignal Abstraction
The extraction pipeline described in [Page 2](02-ai-agent-processing-and-extraction.md) produces structured intelligence records — `document_impact_records` for company-specific documents, `macro_impact_records` for global events, and `competitive_signal_records` for cross-company pattern propagation. Each record carries a sentiment, an impact score, a confidence value, and a publication timestamp. But these raw values are not directly comparable. A high-confidence extraction from a reputable source published ten minutes ago should carry far more weight than a low-confidence extraction from an unknown source published three weeks ago. A document that breaks genuinely novel information should matter more than one that rehashes yesterday's earnings call. And when the market is moving fast — high volatility, surging volume — fresh signals become even more critical.
The signal scoring layer in `services/aggregation/scoring.py` solves this problem by transforming each raw intelligence record into a `WeightedSignal` object: a document reference paired with a composite aggregation weight that encodes recency, credibility, novelty, confidence, and market conditions into a single number. This page explains how that weight is computed, how sentiment labels become numeric values, and how three independent signal layers — Company, Macro, and Competitive — each produce `WeightedSignal` objects that are concatenated into a unified list before the aggregation engine computes trend summaries. For a visual breakdown of the composite weight formula, see the [Weighted Signal Computation diagram](diagrams/weighted-signal-computation.md). For the full picture of how the three layers merge, see the [Three-Layer Signal Merging diagram](diagrams/three-layer-signal-merging.md).
---
## The WeightedSignal and SignalWeight Dataclasses
The core abstraction is the `WeightedSignal` dataclass, defined in `services/aggregation/scoring.py`. It pairs a document reference with the computed weight and the signal's sentiment and impact values:
- **`document_id`** — the UUID of the source document (for company and macro signals) or a synthetic identifier for pattern-derived signals (e.g., `pattern:AAPL:earnings:7d`).
- **`weight`** — a `SignalWeight` object containing the component breakdown and the final combined score.
- **`sentiment_value`** — a numeric sentiment value: `+1.0` for positive, `-1.0` for negative, `0.0` for neutral or mixed.
- **`impact_score`** — the magnitude of impact, drawn from the extraction's per-company impact score for company signals, or scaled by a layer-specific weight multiplier for macro and competitive signals.
The `SignalWeight` dataclass captures the individual components that feed into the combined weight, making the scoring decision fully transparent and auditable:
- **`recency`** — the exponential decay weight based on document age.
- **`credibility`** — the source credibility weight after clamping and exponentiation.
- **`novelty_bonus`** — the additive bonus derived from the document's novelty score.
- **`confidence_gate`** — either `1.0` (signal passes) or `0.0` (signal is gated out).
- **`market_ctx_multiplier`** — a multiplicative boost from market conditions, always `>= 1.0`.
- **`combined`** — the final composite weight used by the aggregation engine.
The `ScoringConfig` frozen dataclass holds all tunable parameters for the scoring functions — half-life hours per window, credibility bounds, novelty bonus cap, confidence floor, and market context thresholds. A module-level `DEFAULT_CONFIG` singleton provides the production defaults, but every scoring function accepts an optional `config` parameter so that tests and alternative configurations can override any parameter without modifying global state.
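As a rough illustration of how these three dataclasses could fit together, the sketch below uses the field names listed above. The `ScoringConfig` field names and the use of `dataclasses.replace` for overrides are assumptions drawn from the prose on this page, not a copy of `services/aggregation/scoring.py`.

```python
from dataclasses import dataclass, field, replace
from types import MappingProxyType
from typing import Mapping


@dataclass(frozen=True)
class ScoringConfig:
    """Tunable scoring parameters (field names assumed from this page)."""
    half_life_hours: Mapping[str, float] = field(
        default_factory=lambda: MappingProxyType(
            {"intraday": 2.0, "1d": 12.0, "7d": 72.0, "30d": 240.0, "90d": 720.0}
        )
    )
    min_recency_weight: float = 0.01
    credibility_floor: float = 0.1
    credibility_ceiling: float = 1.0
    credibility_exponent: float = 1.0
    novelty_bonus_max: float = 0.25
    confidence_floor: float = 0.2


# Module-level defaults, shared by every caller that passes no config.
DEFAULT_CONFIG = ScoringConfig()


@dataclass
class SignalWeight:
    """Component breakdown kept alongside the final combined weight."""
    recency: float
    credibility: float
    novelty_bonus: float
    confidence_gate: float
    market_ctx_multiplier: float
    combined: float


@dataclass
class WeightedSignal:
    """A document reference paired with its weight, sentiment, and impact."""
    document_id: str
    weight: SignalWeight
    sentiment_value: float
    impact_score: float


# A test can derive a stricter config without mutating DEFAULT_CONFIG.
strict = replace(DEFAULT_CONFIG, confidence_floor=0.5)
```

Because the config is frozen, an override always produces a new object, so one test's settings cannot leak into another.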
---
## The Composite Weight Formula
The `compute_signal_weight()` function in `services/aggregation/scoring.py` computes the combined weight for a single document signal. The formula is:
```
combined = gate × recency × credibility × (1 + novelty_bonus) × market_context_multiplier
```
Each factor is computed independently and then multiplied together. This multiplicative structure means that any single factor can zero out the entire weight (the confidence gate) or amplify it (the market context multiplier), and the interaction between factors is naturally captured — a highly credible, very recent document with novel information in a volatile market receives the maximum possible weight, while a stale, low-credibility document with routine information receives a weight close to zero.
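To make the multiplicative behavior concrete, here is a minimal sketch of the formula as plain arithmetic. The helper name `combine` is hypothetical. The input values are the extremes quoted elsewhere on this page: recency and credibility at most `1.0`, novelty bonus at most `0.25`, and a market multiplier of at most `1.45`.

```python
def combine(gate: float, recency: float, credibility: float,
            novelty_bonus: float, market_ctx_multiplier: float) -> float:
    """Multiply the five factors exactly as the formula above is written."""
    return gate * recency * credibility * (1.0 + novelty_bonus) * market_ctx_multiplier


# Fresh, fully credible, maximally novel document in a volatile, high-volume market.
best = combine(1.0, 1.0, 1.0, 0.25, 1.45)

# Stale (recency floor), least credible, routine document in a calm market.
stale = combine(1.0, 0.01, 0.1, 0.0, 1.0)

# Same inputs as `best`, but the confidence gate has closed.
gated = combine(0.0, 1.0, 1.0, 0.25, 1.45)

print(best, stale, gated)
```

Under these assumed defaults the largest possible weight is about `1.81`, the weakest ungated weight is about `0.001`, and a closed gate gives exactly `0.0` regardless of the other factors.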
The following sections describe each component in detail.
---
## Confidence Gate
The confidence gate is the first and most decisive filter. If the extraction confidence for a document falls below the `confidence_floor` threshold — set to `0.2` in the default `ScoringConfig` — the gate evaluates to `0.0` and the entire combined weight becomes zero. The document is effectively excluded from aggregation. If the confidence meets or exceeds the threshold, the gate evaluates to `1.0` and has no further effect on the weight.
This binary gate exists because documents with very low extraction confidence are too unreliable to aggregate. A confidence of 0.15 typically means the LLM struggled to parse the document — perhaps the text was truncated, the language was ambiguous, or the document type was unusual. Including such signals would add noise rather than information. The threshold of 0.2 is deliberately low; it filters only the most unreliable extractions while allowing moderately confident signals to participate (their lower confidence is reflected through the credibility component instead).
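The gate reduces to a single comparison. The sketch below assumes a standalone helper named `confidence_gate`. The real check may live inline in `compute_signal_weight()`.

```python
def confidence_gate(extraction_confidence: float, confidence_floor: float = 0.2) -> float:
    """Return 1.0 when confidence meets or exceeds the floor, else 0.0."""
    return 1.0 if extraction_confidence >= confidence_floor else 0.0


print(confidence_gate(0.15))  # below the floor: signal is excluded
print(confidence_gate(0.20))  # exactly at the floor: signal passes
```

The comparison is inclusive, matching the "meets or exceeds" wording above.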
---
## Recency Decay
The `recency_weight()` function computes an exponential decay based on how old a document is relative to the aggregation anchor time. The formula is:
```
w = 2^(-age_hours / half_life)
```
A document published exactly one half-life ago receives a recency weight of `0.5`. A document published two half-lives ago receives `0.25`, and so on. A document published at or after the reference time receives the maximum weight of `1.0`.
The half-life varies by trend window, reflecting the intuition that shorter windows need faster decay to stay responsive, while longer windows should give older documents more influence. The default half-lives, configured in `ScoringConfig.half_life_hours`, are:
| Window | Half-Life |
|--------|-----------|
| `intraday` | 2 hours |
| `1d` | 12 hours |
| `7d` | 72 hours (3 days) |
| `30d` | 240 hours (10 days) |
| `90d` | 720 hours (30 days) |
For the intraday window, a document published four hours ago already has a recency weight of `0.25` — it is rapidly losing influence as newer information arrives. For the 90-day window, that same four-hour-old document still has a recency weight of essentially `1.0`, because the 30-day half-life means age only becomes significant over weeks.
A floor value of `min_recency_weight = 0.01` prevents very old documents from being completely zeroed out. Even a document from months ago retains a trace-level weight of 1%, ensuring it can still contribute to trend computation if no newer signals exist. Both timestamps are normalized to UTC; naive datetimes are treated as UTC to avoid timezone-related scoring errors.
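The decay, the floor, and the UTC normalization described above can be sketched as follows. The module-level constants and the `_as_utc` helper are illustrative assumptions. The page says the actual function reads these values from `ScoringConfig`.

```python
from datetime import datetime, timedelta, timezone

# Defaults quoted in the table above.
HALF_LIFE_HOURS = {"intraday": 2.0, "1d": 12.0, "7d": 72.0, "30d": 240.0, "90d": 720.0}
MIN_RECENCY_WEIGHT = 0.01


def _as_utc(ts: datetime) -> datetime:
    """Treat naive datetimes as UTC; convert aware ones to UTC."""
    if ts.tzinfo is None:
        return ts.replace(tzinfo=timezone.utc)
    return ts.astimezone(timezone.utc)


def recency_weight(published_at: datetime, reference_time: datetime, window: str) -> float:
    """Exponential half-life decay, floored at MIN_RECENCY_WEIGHT."""
    age_hours = (_as_utc(reference_time) - _as_utc(published_at)).total_seconds() / 3600.0
    if age_hours <= 0:
        return 1.0  # published at or after the reference time
    decayed = 2.0 ** (-age_hours / HALF_LIFE_HOURS[window])
    return max(decayed, MIN_RECENCY_WEIGHT)


now = datetime(2026, 4, 22, 12, 0, tzinfo=timezone.utc)
four_hours_ago = now - timedelta(hours=4)
print(recency_weight(four_hours_ago, now, "intraday"))  # two half-lives
print(recency_weight(four_hours_ago, now, "90d"))       # barely decayed
```

The same four-hour-old document lands at a quarter weight in the intraday window and stays close to full weight in the 90-day window.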
---
## Source Credibility
The `credibility_weight()` function transforms a source's credibility score into a weight component. The raw credibility value — a float between 0.0 and 1.0 stored in the `document_intelligence` table — is first clamped to the range `[0.1, 1.0]` using the `credibility_floor` and `credibility_ceiling` parameters from `ScoringConfig`. This clamping ensures that even the least credible sources retain a minimum weight of 0.1 rather than being completely silenced, while preventing any source from exceeding a weight of 1.0.
After clamping, the value is raised to the `credibility_exponent` power. The default exponent is `1.0`, which means the clamped credibility passes through unchanged. Setting the exponent above 1.0 would penalize low-credibility sources more aggressively — for example, an exponent of 2.0 would reduce a credibility of 0.5 to a weight of 0.25. Setting it below 1.0 would flatten the curve, making the system more tolerant of lower-credibility sources. The exponent is configurable through `ScoringConfig` to allow operators to tune the credibility sensitivity without changing the scoring code.
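A minimal sketch of the clamp-then-exponentiate step is shown below. The keyword parameters stand in for the `ScoringConfig` fields named above. The exact signature of `credibility_weight()` is an assumption.

```python
def credibility_weight(raw: float, floor: float = 0.1, ceiling: float = 1.0,
                       exponent: float = 1.0) -> float:
    """Clamp the raw score to [floor, ceiling], then raise it to the exponent."""
    clamped = min(max(raw, floor), ceiling)
    return clamped ** exponent


print(credibility_weight(0.0))                # lifted to the floor
print(credibility_weight(0.5))                # default exponent: unchanged
print(credibility_weight(0.5, exponent=2.0))  # steeper penalty for weak sources
```

Clamping happens before exponentiation, so with an exponent of `2.0` the effective minimum weight would drop from `0.1` to about `0.01`.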
---
## Novelty Bonus
The novelty bonus rewards documents that contain genuinely new information. The bonus is computed as:
```
novelty_bonus = novelty_score × novelty_bonus_max
```
where `novelty_score` is the 0.0-to-1.0 value produced by the extraction model (see the `ExtractionResult` schema in [Page 2](02-ai-agent-processing-and-extraction.md)) and `novelty_bonus_max` is `0.25` by default. This means the bonus ranges from `0.0` (completely routine information) to `0.25` (maximally novel information), providing up to a 25% boost to the signal weight.
The bonus enters the composite formula as `(1 + novelty_bonus)`, so it acts as a multiplicative amplifier on the base weight. A document with a novelty score of 1.0 gets its weight multiplied by 1.25; a document with a novelty score of 0.0 gets multiplied by 1.0 (no change). This design ensures that novelty can only increase a signal's weight, never decrease it — routine information is not penalized, it simply does not receive the bonus.
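The bonus and its effect as a multiplier can be sketched in a few lines. The clamp of `novelty_score` to `[0.0, 1.0]` is a defensive assumption added here. The page does not say whether the real code clamps.

```python
def novelty_bonus(novelty_score: float, novelty_bonus_max: float = 0.25) -> float:
    """Scale the novelty score into an additive bonus in [0, novelty_bonus_max]."""
    score = min(max(novelty_score, 0.0), 1.0)  # assumption: out-of-range scores are clamped
    return score * novelty_bonus_max


# The bonus enters the composite weight as the factor (1 + bonus).
for score in (0.0, 0.4, 1.0):
    print(score, 1.0 + novelty_bonus(score))
```

A novelty score of `0.4` gives a factor of roughly `1.10`, a 10% boost.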
---
## Market Context Multiplier
The `market_context_multiplier()` function computes a boost factor based on real-time market conditions for the ticker being aggregated. The multiplier is always `>= 1.0`, meaning market context can only amplify signal weights, never reduce them. When no market context data is available (the `MarketContext` object from `services/shared/schemas.py` has `has_data == False`), the multiplier defaults to `1.0`.
Two market features contribute to the boost:
**Volatility boost.** When the ticker's price volatility exceeds the `volatility_recency_boost_threshold` (default `1.0` in price units), the excess volatility is transformed through a logarithmic scaling function: `log(1 + excess) × 0.15`. The logarithmic scaling prevents extreme volatility from producing runaway weight amplification. The boost is capped at `volatility_recency_boost_max = 0.30`, so the maximum volatility contribution is a 30% weight increase. The rationale is that in highly volatile markets, fresh intelligence is disproportionately valuable: a signal about NVDA matters more when NVDA is swinging 5% intraday than when it is trading in a tight range.
**Volume surge boost.** When the ticker's volume change percentage exceeds `volume_surge_threshold_pct = 50.0%` (meaning trading volume is more than 50% above the prior period's average), a flat `volume_surge_boost = 0.15` is added. Unlike the volatility boost, this is binary: either the volume threshold is exceeded and the full 15% boost applies, or it is not and no boost is added. High-volume moves carry more conviction because they represent broader market participation rather than thin-market noise.
The two boosts are additive within the multiplier: `multiplier = 1.0 + volatility_boost + volume_surge_boost`. In the most extreme case — high volatility and a volume surge — the combined multiplier reaches `1.0 + 0.30 + 0.15 = 1.45`, amplifying the signal weight by 45%. The `MarketContext` data is fetched by `services/aggregation/market_context.py` from the market data tables in PostgreSQL, using the same ticker and window parameters as the impact record query.
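The two boosts and the additive multiplier can be sketched as below. The `MarketContext` stand-in and its `volatility` and `volume_change_pct` field names are assumptions (only `has_data` is named on this page), and the use of the natural logarithm via `math.log1p` is likewise assumed.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class MarketContext:
    """Hypothetical stand-in for the schema in services/shared/schemas.py."""
    has_data: bool = False
    volatility: Optional[float] = None
    volume_change_pct: Optional[float] = None


def market_context_multiplier(
    ctx: Optional[MarketContext],
    volatility_threshold: float = 1.0,
    volatility_scale: float = 0.15,
    volatility_boost_max: float = 0.30,
    volume_surge_threshold_pct: float = 50.0,
    volume_surge_boost: float = 0.15,
) -> float:
    """Return 1.0 plus any volatility and volume boosts; never below 1.0."""
    if ctx is None or not ctx.has_data:
        return 1.0
    boost = 0.0
    if ctx.volatility is not None and ctx.volatility > volatility_threshold:
        excess = ctx.volatility - volatility_threshold
        # Log scaling damps extreme values; the cap bounds the contribution.
        boost += min(math.log1p(excess) * volatility_scale, volatility_boost_max)
    if ctx.volume_change_pct is not None and ctx.volume_change_pct > volume_surge_threshold_pct:
        boost += volume_surge_boost  # flat, all-or-nothing
    return 1.0 + boost


calm = MarketContext(has_data=True, volatility=0.5, volume_change_pct=10.0)
wild = MarketContext(has_data=True, volatility=1000.0, volume_change_pct=80.0)
print(market_context_multiplier(calm), market_context_multiplier(wild))
```

With these assumed parameters, a volatility of `2.0` (excess `1.0`) yields a boost of about `0.10`, and the cap only takes effect once the excess passes roughly `e² − 1 ≈ 6.4`.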
---
## Sentiment Mapping
Before signals can be aggregated into trend summaries, the categorical sentiment labels from the extraction output must be converted to numeric values. The `sentiment_to_numeric()` function in `services/aggregation/scoring.py` performs this mapping:
| Sentiment Label | Numeric Value |
|----------------|---------------|
| `positive` | `+1.0` |
| `negative` | `-1.0` |
| `neutral` | `0.0` |
| `mixed` | `0.0` |
The mapping is case-insensitive. Any unrecognized label defaults to `0.0`. The choice to map both `neutral` and `mixed` to `0.0` is deliberate — a mixed-sentiment document (one that contains both positive and negative signals for the same company) should not push the trend in either direction. The contradiction between the positive and negative aspects is captured separately by the contradiction detection system described in [Page 4](04-trend-aggregation-and-accumulating-signals.md), rather than being baked into the sentiment value itself.
For macro signals, the direction-to-sentiment mapping in `services/aggregation/worker.py` follows the same pattern: `positive` maps to `+1.0`, `negative` to `-1.0`, and both `mixed` and `neutral` to `0.0`. For competitive signals built by `build_pattern_weighted_signals()` in `services/aggregation/signal_propagation.py`, the sentiment is derived from the pattern's directional bias: `+1.0` if `bullish_pct > bearish_pct`, `-1.0` otherwise.
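The label mapping and the pattern-bias rule can be sketched as follows. The whitespace stripping and `None` handling are assumptions added for robustness, and `pattern_sentiment` is a hypothetical helper name for the rule described above.

```python
from typing import Optional

_SENTIMENT_VALUES = {"positive": 1.0, "negative": -1.0, "neutral": 0.0, "mixed": 0.0}


def sentiment_to_numeric(label: Optional[str]) -> float:
    """Case-insensitive lookup; unrecognized or missing labels map to 0.0."""
    return _SENTIMENT_VALUES.get((label or "").strip().lower(), 0.0)


def pattern_sentiment(bullish_pct: float, bearish_pct: float) -> float:
    """+1.0 when the pattern leans bullish, -1.0 otherwise (ties included)."""
    return 1.0 if bullish_pct > bearish_pct else -1.0


print(sentiment_to_numeric("Positive"), sentiment_to_numeric("mixed"))
print(pattern_sentiment(60.0, 40.0), pattern_sentiment(50.0, 50.0))
```

Read literally, the "otherwise" branch means a pattern with equal bullish and bearish percentages is scored as `-1.0` and has no neutral outcome.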
---
## Weighted Sentiment Average
The `weighted_sentiment_average()` function computes the central metric that drives trend direction: a weight-adjusted average sentiment across all signals for a ticker in a given window. The formula is:
```
weighted_avg = Σ(combined_weight × impact_score × sentiment_value) / Σ(combined_weight × impact_score)
```
Each signal contributes its sentiment value scaled by both its composite weight and its impact score. The denominator normalizes by the total effective weight, producing a value in the range `[-1.0, +1.0]`. A result near `+1.0` means the weighted evidence is overwhelmingly positive; near `-1.0` means overwhelmingly negative; near `0.0` means either neutral or evenly split.
The use of `combined_weight × impact_score` as the effective weight means that high-impact, high-weight signals dominate the average. A single high-confidence, recent, credible document with a strong impact score can outweigh several older, lower-impact documents — which is the intended behavior. The aggregation engine in `services/aggregation/worker.py` passes this weighted average to `derive_trend_direction()`, which maps it to a `TrendDirection` enum value (bullish, bearish, mixed, or neutral) using the thresholds described in [Page 4](04-trend-aggregation-and-accumulating-signals.md).
If the total effective weight is zero — either because no signals exist or all signals were gated out by the confidence floor — the function returns `0.0`, which maps to a neutral trend direction.
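A small worked example may help. The sketch below represents each signal as a plain `(combined_weight, impact_score, sentiment_value)` tuple rather than a `WeightedSignal`, which is a simplification made here for brevity.

```python
from typing import Iterable, Tuple

Signal = Tuple[float, float, float]  # (combined_weight, impact_score, sentiment_value)


def weighted_sentiment_average(signals: Iterable[Signal]) -> float:
    """Sentiment averaged with combined_weight * impact_score as the weight."""
    numerator = 0.0
    denominator = 0.0
    for combined_weight, impact_score, sentiment_value in signals:
        effective = combined_weight * impact_score
        numerator += effective * sentiment_value
        denominator += effective
    # No signals, or every signal gated out: report neutral.
    return numerator / denominator if denominator > 0 else 0.0


signals = [
    (1.00, 0.8, +1.0),  # fresh, credible, high-impact positive document
    (0.25, 0.4, -1.0),  # older, lower-impact negative document
    (0.50, 0.2, 0.0),   # neutral document still adds to the denominator
]
print(weighted_sentiment_average(signals))
```

The effective weights are `0.8`, `0.1`, and `0.1`, so the result is `(0.8 − 0.1) / 1.0 = 0.7`. Neutral signals pull the average toward zero because they count in the denominator but not the numerator.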
---
## The Three Signal Layers
The aggregation engine in `services/aggregation/worker.py` does not treat all intelligence sources equally. Signals flow through three independent layers, each with a different relative weight, before being concatenated into a single `WeightedSignal` list for trend computation. This layered architecture allows the system to incorporate diverse intelligence sources while controlling how much influence each source type has on the final trend.
### Layer 1 — Company Signals (Weight: 1.0)
Company signals are the primary layer. They are built by `build_weighted_signals()` in `services/aggregation/worker.py` from `document_impact_records` — the per-company extraction output produced by the Document Intelligence Extractor (see [Page 2](02-ai-agent-processing-and-extraction.md)). Each impact record's sentiment is converted via `sentiment_to_numeric()`, and its impact score is used directly without any layer-level scaling. The `compute_signal_weight()` function produces the composite weight using the document's publication time, source credibility, novelty score, extraction confidence, and the ticker's current market context.
Company signals carry a relative weight of `1.0` — they are the baseline against which other layers are measured. This reflects the design principle that direct, company-specific intelligence (an earnings report about AAPL, a product launch by TSLA, a lawsuit against META) is the most relevant and reliable signal for that company's trend.
### Layer 2 — Macro Signals (Weight: 0.3)
Macro signals capture the indirect impact of global events on individual companies. They are built by `build_macro_weighted_signals()` in `services/aggregation/worker.py` from `macro_impact_records`, the per-company impact scores computed by the exposure-based interpolation engine after the Global Event Classifier processes a macro news article. The sentiment is mapped from the `impact_direction` field (`positive` → `+1.0`, `negative` → `-1.0`, `mixed`/`neutral` → `0.0`), and the impact score is scaled by `MACRO_SIGNAL_WEIGHT`, which defaults to `0.3` in `AggregationConfig`.
The 0.3 weight means that a macro signal's impact score is reduced to 30% of its raw value before entering the aggregation. This attenuation reflects the inherent uncertainty in macro-to-company impact estimation — a tariff announcement might affect XOM's revenue, but the magnitude depends on exposure profiles, supply chain flexibility, and competitive dynamics that the interpolation engine can only approximate. By weighting macro signals at 0.3 relative to company signals at 1.0, the system ensures that macro intelligence informs the trend without overwhelming direct company-specific evidence.
The recency decay, credibility, and confidence gating for macro signals use the same `compute_signal_weight()` function as company signals. The `published_at` timestamp comes from the global event's source document (the macro news article), and the `source_credibility` and `extraction_confidence` both use the macro impact record's `confidence` field.
### Layer 3 — Competitive Signals (Weight: 0.2)
Competitive signals capture cross-company effects: when a catalyst hits one company, historical patterns suggest how competitors might be affected. They are built by `build_pattern_weighted_signals()` in `services/aggregation/signal_propagation.py` from two sources: `HistoricalPattern` objects (self-company patterns mined by `services/aggregation/pattern_matcher.py`) and `CompetitiveSignalRecord` objects (cross-company propagation signals stored in `competitive_signal_records`).
For historical patterns, the sentiment is derived from the pattern's directional bias (`+1.0` if `bullish_pct > bearish_pct`, `-1.0` otherwise), and the impact score is the pattern's `avg_strength` multiplied by `competitive_signal_weight` (default `0.2` from `CompetitiveConfig`). The `published_at` for recency decay uses the pattern's `data_end` — the most recent data point in the pattern's sample — and the `extraction_confidence` uses the pattern's `pattern_confidence`. Source credibility is set to `1.0` because patterns are derived from validated historical data, and novelty is fixed at `0.5`.
For competitive signal records, the same structure applies: sentiment from `signal_direction`, impact from `signal_strength × competitive_signal_weight`, recency from `computed_at`, and confidence from `pattern_confidence`.
The 0.2 weight makes competitive signals the lightest layer. This is appropriate because competitive signal propagation involves the most inference — the system is predicting how Company B will react based on what happened to Company A in historically similar situations. The signal is valuable as supplementary evidence but should not drive trend direction on its own.
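To see how the layer weights shape the outcome, the sketch below scales three signals with the same raw impact and opposite sentiments. It assumes an equal composite weight of `1.0` for all three so that only the layer scaling differs. `LAYER_IMPACT_SCALE` and `scaled_impact` are hypothetical names.

```python
# Layer scaling factors quoted in this section.
LAYER_IMPACT_SCALE = {"company": 1.0, "macro": 0.3, "competitive": 0.2}


def scaled_impact(raw_impact: float, layer: str) -> float:
    """Apply the layer-level multiplier to a raw impact score."""
    return raw_impact * LAYER_IMPACT_SCALE[layer]


# (layer, raw impact, sentiment): one positive company signal against
# one negative macro signal and one negative competitive signal.
raw_signals = [
    ("company", 0.8, +1.0),
    ("macro", 0.8, -1.0),
    ("competitive", 0.8, -1.0),
]

effective = [(scaled_impact(impact, layer), sentiment)
             for layer, impact, sentiment in raw_signals]
numerator = sum(weight * sentiment for weight, sentiment in effective)
denominator = sum(weight for weight, _ in effective)
average = numerator / denominator
print(average)
```

The scaled impacts are `0.8`, `0.24`, and `0.16`, so the average comes out near `+0.33`: the single company signal outweighs the two opposing indirect signals combined.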
---
## Signal Merging in the Aggregation Engine
The `aggregate_company_window()` function in `services/aggregation/worker.py` orchestrates the merging of all three layers for a single ticker and window. The process follows a clear sequence:
1. **Fetch company impact records** from `document_impact_records` for the ticker within the window's time range.
2. **Fetch market context** for the ticker from market data tables.
3. **Build company weighted signals** via `build_weighted_signals()`.
4. **Check the macro toggle** — query `risk_configs` for the `macro_enabled` flag, then fetch and merge macro signals if enabled.
5. **Check the competitive toggle** — query `risk_configs` for the `competitive_enabled` flag, then fetch patterns, fetch competitive signals, and merge if enabled.
6. **Concatenate** all `WeightedSignal` lists into a single list.
7. **Assemble the `TrendSummary`** from the merged signals.
The concatenation in step 6 is plain list concatenation: `signals = signals + macro_signals` followed by `signals = signals + pattern_weighted`. There is no re-weighting or normalization at the merge point. The relative influence of each layer is already encoded in the impact scores (scaled by 0.3 for macro, 0.2 for competitive, 1.0 for company) and in the composite weights computed by `compute_signal_weight()`. The `weighted_sentiment_average()` function then naturally produces a sentiment average that reflects these relative weights.
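The merge sequence can be sketched with the fetch steps injected as callables. The function name `merge_layers` and its parameters are hypothetical. The real `aggregate_company_window()` also queries the database and assembles the `TrendSummary`.

```python
from typing import Callable, List, Sequence


def merge_layers(
    company_signals: Sequence[str],
    fetch_macro: Callable[[], List[str]],
    fetch_competitive: Callable[[], List[str]],
    macro_enabled: bool,
    competitive_enabled: bool,
) -> List[str]:
    """Concatenate the enabled layers in order; no re-weighting at the merge."""
    signals = list(company_signals)  # company signals are always included
    if macro_enabled:
        signals = signals + fetch_macro()
    if competitive_enabled:
        signals = signals + fetch_competitive()
    return signals


# Strings stand in for WeightedSignal objects to keep the sketch short.
merged = merge_layers(
    ["doc-1", "doc-2"],
    fetch_macro=lambda: ["macro-1"],
    fetch_competitive=lambda: ["pattern:AAPL:earnings:7d"],
    macro_enabled=True,
    competitive_enabled=False,
)
print(merged)
```

A disabled layer's fetch is never called, so switching a layer off also skips its query cost.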
---
## Runtime Toggles and Graceful Degradation
Both the macro and competitive signal layers can be enabled or disabled at runtime through the `risk_configs` PostgreSQL table, without restarting any service. The toggle state is read fresh from the database at the start of every aggregation cycle — there is no caching — so changes take effect on the very next cycle.
The `fetch_macro_enabled()` function in `services/aggregation/worker.py` queries the most recent active `risk_configs` row and reads the `config->>'macro_enabled'` JSON field. If the field is explicitly set to `"true"` or `"false"`, that value overrides the `AggregationConfig` default. If no config row exists or the field is absent, the function returns `None` and the engine falls back to the `AggregationConfig.macro_enabled` default (which is `True`). The `fetch_competitive_enabled()` function follows the identical pattern for the `competitive_enabled` field.
When a layer is disabled, the aggregation engine simply skips the fetch-and-merge step for that layer. Company signals are always computed — they cannot be toggled off. This means the system degrades gracefully: disabling the macro layer produces trends based on company signals alone (plus competitive signals if enabled), and disabling the competitive layer produces trends based on company and macro signals. Disabling both layers reduces the engine to its original single-layer behavior, using only direct document intelligence.
Crucially, disabling a layer does not stop upstream processing. When the macro layer is disabled, the Global Event Classifier continues to classify macro events and the interpolation engine continues to compute `macro_impact_records`. The data accumulates in PostgreSQL. When the layer is re-enabled, the aggregation engine immediately picks up all the macro impact records that were computed while the layer was disabled — there is no data loss or gap in coverage. The same applies to competitive signals: pattern mining and signal propagation continue regardless of the toggle state.
If the competitive signal fetch fails at runtime (for example, due to a database timeout), the aggregation engine catches the exception, logs it, and continues with company and macro signals only. This exception-based graceful degradation ensures that a transient failure in one layer does not block trend computation entirely.
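The toggle fallback and the exception-based degradation can be sketched together. `resolve_toggle` and `safe_fetch` are hypothetical helpers that fold the behavior described above into two small functions. The logger name is also an assumption.

```python
import logging
from typing import Callable, List, Optional

logger = logging.getLogger("aggregation.sketch")


def resolve_toggle(db_value: Optional[str], default: bool) -> bool:
    """Use the JSON field when it is 'true' or 'false'; otherwise the config default."""
    if db_value is None:
        return default  # no config row, or field absent
    normalized = db_value.strip().lower()
    if normalized == "true":
        return True
    if normalized == "false":
        return False
    return default  # assumption: unrecognized values fall back to the default


def safe_fetch(fetch: Callable[[], List[str]], layer_name: str) -> List[str]:
    """Run a layer fetch; on failure, log and contribute no signals."""
    try:
        return fetch()
    except Exception:
        logger.exception("%s signal fetch failed; continuing without it", layer_name)
        return []


def failing_fetch() -> List[str]:
    raise TimeoutError("simulated database timeout")


print(resolve_toggle(None, default=True))     # falls back to the default
print(resolve_toggle("false", default=True))  # explicit override
print(safe_fetch(failing_fetch, "competitive"))
```

Returning an empty list on failure lets the rest of the merge proceed unchanged, so the trend is still computed from the layers that did load.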
---
## What Comes Next
At this point, every document intelligence record, macro impact record, and competitive signal record has been transformed into a `WeightedSignal` with a composite weight that encodes recency, credibility, novelty, confidence, and market conditions. The three signal layers have been merged into a single list, and the weighted sentiment average has been computed. But a single aggregation cycle produces only a snapshot — a point-in-time view of the evidence. The real power of the system emerges when these snapshots accumulate across multiple documents and time windows, building a case for action. [Page 4 — Trend Aggregation and Accumulating Signals](04-trend-aggregation-and-accumulating-signals.md) explains how the aggregation engine computes `TrendSummary` objects across five time windows, how consecutive same-direction signals strengthen trend confidence and escalate the system's response from neutral observation to actionable trading recommendations, and how contradiction detection and evidence ranking ensure that the trend reflects genuine consensus rather than noise.