
# Page 4 — Trend Aggregation and Accumulating Signals
The scoring layer described in [Page 3](03-signal-scoring-and-weighted-signals.md) transforms every intelligence record into a `WeightedSignal` — a document reference paired with a composite weight that encodes recency, credibility, novelty, confidence, and environmental conditions. Three independent signal layers (Entity-Specific at weight 1.0, Environmental at 0.3, Relational at 0.2) each produce `WeightedSignal` objects that are concatenated into a single list. But a single list of weighted signals is still just raw material. The aggregation engine in `services/aggregation/worker.py` is where that raw material becomes a decision-grade assessment: a `TrendSummary` object that captures the direction, strength, confidence, contradiction level, and supporting evidence for an entity across a specific time window. This page explains how that transformation works — from weighted sentiment averages through trend direction derivation, contradiction detection, evidence ranking, and confidence computation — and, critically, how consecutive signals pointing in the same direction accumulate across documents and time windows to escalate the system's response from passive observation to actionable decision recommendations.
For a visual overview of the accumulation and escalation process, see the [Trend Accumulation and Escalation diagram](diagrams/trend-accumulation-escalation.md). For how the three signal layers merge into the aggregation engine, see the [Three-Layer Signal Merging diagram](diagrams/three-layer-signal-merging.md).
---
## Five Time Windows
The aggregation engine does not compute a single trend for each entity. It computes five, one for each time window defined in `services/aggregation/worker.py`:
| Window | Lookback Duration |
|--------|-------------------|
| `intraday` | 12 hours |
| `1d` | 1 day |
| `7d` | 7 days |
| `30d` | 30 days |
| `90d` | 90 days |
Each window produces an independent `TrendSummary` by fetching all impact records, macro impacts, and competitive signals for the entity within that window's time range. The `aggregate_company_window()` function in `services/aggregation/worker.py` orchestrates this per-window computation: it determines the time range from the window's lookback duration, fetches `document_impact_records` from PostgreSQL, retrieves environmental context, builds entity-specific weighted signals, checks the macro and competitive runtime toggles (see [Page 3](03-signal-scoring-and-weighted-signals.md) for toggle details), merges any enabled layer signals, and then assembles the `TrendSummary`.
The five-window design serves a specific purpose. Short windows (intraday, 1d) capture fast-moving sentiment shifts — a breaking negative performance disclosure, a sudden regulatory action — while long windows (30d, 90d) reveal sustained trends that persist across many documents and data cycles. An entity might show a negative intraday trend after a single unfavorable article, but a neutral 30-day trend because the broader evidence base is balanced. The recommendation engine downstream (described in [Page 5](05-recommendation-generation.md)) evaluates each window's `TrendSummary` independently, so the system can respond to both short-term catalysts and long-term directional shifts.
The `aggregate_company()` function iterates over all effective windows (configurable via `AggregationConfig.windows`, defaulting to all five) and calls `aggregate_company_window()` for each one. This means a single aggregation cycle for one entity produces up to five `TrendSummary` objects, each reflecting a different temporal perspective on the same underlying evidence.
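The per-window iteration can be sketched as follows. This is a minimal illustration, not the worker's actual code: the `WINDOW_LOOKBACKS` dict, `window_range()`, and `aggregate_all_windows()` are hypothetical names, and the real `aggregate_company_window()` fetches records and builds a `TrendSummary` rather than returning a bare time range.

```python
from datetime import datetime, timedelta, timezone

# Lookback durations copied from the table above. The dict name is hypothetical.
WINDOW_LOOKBACKS = {
    "intraday": timedelta(hours=12),
    "1d": timedelta(days=1),
    "7d": timedelta(days=7),
    "30d": timedelta(days=30),
    "90d": timedelta(days=90),
}


def window_range(window, now):
    """Return the (start, end) time range covered by one window."""
    return now - WINDOW_LOOKBACKS[window], now


def aggregate_all_windows(entity_id, now, windows=None):
    """Hypothetical stand-in for aggregate_company(): one result per window.

    The real function would call aggregate_company_window() here; this sketch
    only returns the time range each window would query.
    """
    effective = windows or list(WINDOW_LOOKBACKS)
    return {w: window_range(w, now) for w in effective}


now = datetime(2026, 4, 22, tzinfo=timezone.utc)
ranges = aggregate_all_windows("entity-a", now)
for name, (start, end) in ranges.items():
    print(name, start.isoformat(), "->", end.isoformat())
```

Passing a shorter `windows` list mirrors the effect of overriding `AggregationConfig.windows`: fewer windows are evaluated and fewer summaries are produced.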
---
## Trend Direction Derivation
Once the weighted sentiment average has been computed from the merged signal list (see the `weighted_sentiment_average()` function described in [Page 3](03-signal-scoring-and-weighted-signals.md)), the `derive_trend_direction()` function in `services/aggregation/worker.py` maps that numeric value to a `TrendDirection` enum. The rules are evaluated in a specific order, and the first matching rule wins:
1. **Mixed** — If the contradiction score exceeds `0.10` (the `MIXED_THRESHOLD` constant) *and* the absolute value of the average sentiment is below `0.30`, the direction is `MIXED`. This rule fires first because high contradiction with a weak directional signal indicates genuine disagreement in the evidence — the trend is not simply neutral, it is actively contested.
2. **Positive** — If the average sentiment is `≥ 0.15` (the `POSITIVE_THRESHOLD` constant), the direction is `POSITIVE`. This means the weight-adjusted evidence leans favorable with enough conviction to cross the threshold.
3. **Negative** — If the average sentiment is `≤ -0.15` (the `NEGATIVE_THRESHOLD` constant), the direction is `NEGATIVE`. The symmetric threshold ensures that positive and negative classifications require the same magnitude of evidence.
4. **Neutral** — If none of the above conditions are met, the direction is `NEUTRAL`. This covers the range where the average sentiment falls between -0.15 and +0.15 without high contradiction — the evidence is either balanced or insufficient to establish a directional lean.
The mixed-first evaluation order is important. Consider a scenario where five documents are positive and four are negative, all with similar weights. The weighted sentiment average might be slightly positive (say, +0.08), which would normally map to neutral. But the contradiction score — computed from the minority/majority weight split — would be high (close to 0.44). The mixed rule catches this case: the evidence is not neutral, it is conflicted. This distinction matters downstream because mixed trends receive different treatment in the recommendation engine than neutral trends.
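The ordered rules above can be sketched as a small function. The threshold values come from this page; the enum string values, the argument names, and the `WEAK_SENTIMENT_LIMIT` constant name are assumptions made for illustration, so treat this as a sketch rather than the worker's implementation.

```python
from enum import Enum


class TrendDirection(Enum):
    # Member values are assumed; only the member names appear on this page.
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"
    MIXED = "mixed"


MIXED_THRESHOLD = 0.10
WEAK_SENTIMENT_LIMIT = 0.30  # hypothetical name for the 0.30 bound
POSITIVE_THRESHOLD = 0.15
NEGATIVE_THRESHOLD = -0.15


def derive_trend_direction(avg_sentiment, contradiction_score):
    """Map average sentiment and contradiction to a direction; first match wins."""
    # Rule 1: contested evidence with a weak directional lean.
    if contradiction_score > MIXED_THRESHOLD and abs(avg_sentiment) < WEAK_SENTIMENT_LIMIT:
        return TrendDirection.MIXED
    # Rule 2: favorable lean at or above the positive threshold.
    if avg_sentiment >= POSITIVE_THRESHOLD:
        return TrendDirection.POSITIVE
    # Rule 3: unfavorable lean at or below the negative threshold.
    if avg_sentiment <= NEGATIVE_THRESHOLD:
        return TrendDirection.NEGATIVE
    # Rule 4: everything else.
    return TrendDirection.NEUTRAL


# The five-positive, four-negative scenario: weak lean, high contradiction.
print(derive_trend_direction(0.08, 4 / 9))
```

Note that because the mixed rule runs first, a sentiment of `+0.20` with a contradiction score of `0.30` is classified as mixed, not positive. Only when the absolute sentiment reaches `0.30` does direction override contradiction.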
---
## Contradiction Detection
The contradiction detection module in `services/aggregation/contradiction.py` provides a structured analysis of disagreement within the signal set. Rather than collapsing contradictory evidence into a single number, it produces a `ContradictionResult` containing both an overall score and a list of `DisagreementDetail` objects that explain *where* the disagreement lies.
The `detect_contradictions()` function runs two analyses:
### Sentiment Disagreement
The `_detect_sentiment_disagreement()` function examines whether both positive and negative sentiment signals exist in the signal set. For each signal with a non-zero effective weight (`combined_weight × impact_score > 0`), it classifies the signal as positive or negative based on its `sentiment_value` and accumulates the effective weight for each side. If both sides have at least one signal, it produces a `DisagreementDetail` with dimension `"sentiment"`, listing the document IDs and weights for each side, along with a human-readable description like "Sentiment split: 3 positive vs 2 negative signals (minority weight ratio 38%)".
### Catalyst-Level Disagreement
The `_detect_catalyst_disagreement()` function goes deeper. It groups signals by their `catalyst_type` (performance_report, product_launch, regulatory, etc.) using `CatalystEntry` objects built from the `document_impact_records`. Within each catalyst group, it checks whether both positive and negative signals exist. If they do, it produces a `DisagreementDetail` with dimension `"catalyst:<type>"` — for example, `"catalyst:performance_report"` when some documents interpret a periodic disclosure positively and others negatively. This catalyst-level analysis is valuable because it pinpoints the specific topic of disagreement rather than just flagging that disagreement exists somewhere in the evidence.
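A rough sketch of the catalyst grouping follows. The `Signal` dataclass and the `catalyst_disagreements()` helper are hypothetical simplifications; the real module works with `CatalystEntry` and `DisagreementDetail` objects whose fields are not shown on this page.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Signal:
    """Hypothetical flattened view of one signal; not the project's real type."""
    document_id: str
    catalyst_type: str
    sentiment_value: float
    combined_weight: float
    impact_score: float


def catalyst_disagreements(signals):
    """Return {"catalyst:<type>": {"positive": [...], "negative": [...]}}
    for every catalyst type that has signals on both sides."""
    sides = defaultdict(lambda: {"positive": [], "negative": []})
    for s in signals:
        # Skip signals with zero effective weight or no directional sentiment.
        if s.combined_weight * s.impact_score <= 0 or s.sentiment_value == 0:
            continue
        side = "positive" if s.sentiment_value > 0 else "negative"
        sides[s.catalyst_type][side].append(s.document_id)
    return {
        f"catalyst:{ctype}": groups
        for ctype, groups in sides.items()
        if groups["positive"] and groups["negative"]
    }


signals = [
    Signal("doc-1", "performance_report", 0.6, 0.8, 0.7),
    Signal("doc-2", "performance_report", -0.5, 0.6, 0.5),
    Signal("doc-3", "product_launch", 0.4, 0.9, 0.6),
]
print(catalyst_disagreements(signals))
```

In this example only `performance_report` is reported, because `product_launch` has signals on one side only.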
### The Overall Contradiction Score
The `_compute_overall_score()` function computes the backward-compatible scalar contradiction score using the minority/majority weight ratio formula:
```
contradiction_score = minority_weight / total_weight
```
where `minority_weight` is the smaller of the positive and negative effective weights, and `total_weight` is their sum. Signals with zero effective weight or neutral sentiment are excluded. The score ranges from `0.0` (complete agreement — all signals point the same direction) to `0.5` (perfect split — positive and negative weights are exactly equal). A score of `0.0` means no contradiction at all. A score above `0.10` combined with a weak average sentiment triggers the mixed direction classification in `derive_trend_direction()`.
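As a minimal sketch of the formula, assuming each signal is reduced to a `(sentiment_value, combined_weight, impact_score)` tuple and that "neutral" means a sentiment value of exactly zero (both are simplifications for illustration):

```python
def contradiction_score(signals):
    """Minority/majority weight ratio over directional signals.

    `signals` is an iterable of (sentiment_value, combined_weight, impact_score)
    tuples. The tuple layout is an assumption made for this sketch.
    """
    positive = negative = 0.0
    for sentiment, weight, impact in signals:
        effective = weight * impact
        # Exclude zero-weight and neutral signals, as described above.
        if effective <= 0 or sentiment == 0:
            continue
        if sentiment > 0:
            positive += effective
        else:
            negative += effective
    total = positive + negative
    return min(positive, negative) / total if total > 0 else 0.0


# Five positive and four negative signals with identical effective weights.
split = [(0.5, 1.0, 1.0)] * 5 + [(-0.5, 1.0, 1.0)] * 4
print(round(contradiction_score(split), 3))
```

With equal weights the score is simply the minority share of the signal count, here 4 out of 9.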
The contradiction score also feeds directly into the confidence computation as a penalty, described in the next section. High contradiction reduces the system's confidence in the trend, which in turn affects whether the trend can escalate to actionable recommendations.
---
## Evidence Ranking
Not all documents contributing to a trend are equally important. The `rank_evidence()` function in `services/aggregation/worker.py` delegates to the evidence ranking module (`services/aggregation/evidence.py`) to produce ordered lists of the most influential supporting and opposing documents. The ranking uses a composite scoring approach configured by `EvidenceRankConfig`, considering multiple factors:
- **Weight** — the signal's composite weight from the scoring layer, reflecting recency, credibility, novelty, confidence, and environmental context.
- **Impact** — the extraction's impact score for the entity, reflecting how significant the document's content is.
- **Recency** — how recently the document was published, with more recent documents ranked higher.
- **Confidence** — the extraction confidence, reflecting how reliably the LLM parsed the document.
Signals are split into supporting (positive sentiment) and opposing (negative sentiment) groups. Neutral and mixed sentiment signals are excluded from evidence lists — they do not argue for or against the trend direction. Within each group, signals are sorted by their composite rank score in descending order, and the top entries (up to `MAX_EVIDENCE_REFS = 10` per side) are returned as document ID lists.
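The split-sort-truncate flow can be illustrated as below. This page does not give the actual factor weights in `EvidenceRankConfig`, so the equal weighting here is purely an assumption, as are the `Candidate` dataclass and the `_sketch` function names. The sketch also treats sentiment as a signed number, whereas the real module may use labels to exclude neutral and mixed signals.

```python
from dataclasses import dataclass

MAX_EVIDENCE_REFS = 10


@dataclass
class Candidate:
    """Hypothetical per-document inputs to the ranking, each assumed in [0, 1]
    except sentiment_value, which is signed."""
    document_id: str
    sentiment_value: float
    weight: float
    impact: float
    recency: float
    confidence: float


def rank_score_sketch(c, factors=(0.25, 0.25, 0.25, 0.25)):
    """Composite score; equal factor weights are an assumption."""
    fw, fi, fr, fc = factors
    return fw * c.weight + fi * c.impact + fr * c.recency + fc * c.confidence


def rank_evidence_sketch(candidates):
    """Return (supporting_ids, opposing_ids), best first, capped per side."""
    def top(group):
        ordered = sorted(group, key=rank_score_sketch, reverse=True)
        return [c.document_id for c in ordered[:MAX_EVIDENCE_REFS]]

    supporting = [c for c in candidates if c.sentiment_value > 0]
    opposing = [c for c in candidates if c.sentiment_value < 0]
    # Candidates with zero sentiment fall into neither list.
    return top(supporting), top(opposing)


candidates = [
    Candidate("b", 0.4, 0.2, 0.3, 0.5, 0.6),
    Candidate("a", 0.5, 0.9, 0.8, 0.9, 0.9),
    Candidate("c", -0.6, 0.7, 0.7, 0.7, 0.7),
    Candidate("d", 0.0, 1.0, 1.0, 1.0, 1.0),
]
print(rank_evidence_sketch(candidates))
```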
The `assemble_trend_with_evidence()` function in `services/aggregation/worker.py` uses the detailed variant `rank_evidence_detailed()` to get `RankedEvidence` objects that include the individual scoring components (weight, impact, recency, confidence, sentiment value). These detailed rankings are persisted to the `trend_evidence` table for auditability, while the document ID lists are stored directly in the `TrendSummary` as `top_supporting_evidence` and `top_opposing_evidence`.
The evidence ranking serves two purposes. First, it provides the recommendation engine with the most relevant documents to cite in its thesis generation (see [Page 5](05-recommendation-generation.md)). Second, it gives human reviewers a quick way to understand *why* the system reached a particular trend assessment — the top-ranked documents are the ones that most influenced the direction and strength.
---
## Confidence Computation
The `compute_trend_confidence()` function in `services/aggregation/worker.py` produces the confidence score for a `TrendSummary`. This score is critical because it directly gates whether a trend can produce actionable recommendations — the eligibility evaluation in `services/recommendation/eligibility.py` requires a minimum confidence of `0.35` to generate any recommendation at all, and higher confidence thresholds control escalation to simulation and live execution modes.
Confidence is computed from four components:
### Unique Source Count
The function counts the number of unique document IDs across all active signals (those with `combined_weight > 0`). This count is divided by 15 and capped at `0.8`:
```
count_factor = min(unique_sources / 15.0, 0.8)
```
Because of the cap, a trend backed by 12 or more unique source documents reaches the maximum count contribution of `0.8` (`12 / 15 = 0.8`). A trend backed by a single document gets only `0.067`. This component rewards breadth of evidence: a trend confirmed by many independent sources is more trustworthy than one driven by a single article, regardless of how high that article's individual weight might be.
### Average Extraction Credibility
The average credibility weight across all active signals provides a baseline quality measure. If most contributing documents come from high-credibility sources, this component is high. If the evidence is dominated by low-credibility sources, confidence is penalized accordingly.
### Signal Agreement with Sample-Size Dampening
The agreement ratio measures what fraction of directional signals (positive + negative, excluding neutral) agree on the majority direction. If 8 out of 10 directional signals are positive, the raw agreement is `0.8`. But raw agreement is misleading with small sample sizes — 1 out of 1 signals agreeing gives a perfect `1.0` agreement, which is not meaningful.
To address this, the agreement is dampened by a logarithmic sample-size factor:
```
agreement_dampener = min(1.0, log₂(unique_sources + 1) / log₂(8))
```
This dampener saturates at `1.0` when `unique_sources` reaches 7 (since `log₂(7 + 1) = log₂(8) = 3.0`). With fewer sources, the dampener reduces the agreement contribution: 1 source gives a dampener of `0.33`, 3 sources give `0.67`, and 7 sources give the full `1.0`. The log₂ scaling means that each additional source provides diminishing marginal improvement to the dampener, which matches the intuition that the jump from 1 to 3 sources is far more meaningful than the jump from 15 to 17.
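The dampener is small enough to tabulate directly. The following sketch transcribes the formula above; the function name is hypothetical.

```python
import math


def agreement_dampener(unique_sources):
    """Logarithmic sample-size factor that saturates at 1.0 for 7+ sources."""
    return min(1.0, math.log2(unique_sources + 1) / math.log2(8))


for n in (1, 2, 3, 4, 7, 15):
    print(n, round(agreement_dampener(n), 2))
```

Multiplying the raw agreement ratio by this factor is what prevents a single unopposed document from registering as perfect consensus.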
### Contradiction Penalty
The contradiction score computed by `services/aggregation/contradiction.py` is applied as a direct penalty:
```
contradiction_penalty = contradiction_score × 0.4
```
A contradiction score of `0.5` (perfect split) produces a penalty of `0.2`, which is substantial enough to push a moderately confident trend below the eligibility threshold.
### The Combined Formula
The four components are combined as:
```
confidence = 0.3 × count_factor + 0.3 × avg_credibility + 0.4 × agreement - contradiction_penalty
```
The result is clamped to `[0.0, 1.0]`. The weighting gives signal agreement the largest share (40%), reflecting the principle that consensus among diverse sources is the strongest indicator of a reliable trend. Source count and credibility each contribute 30%, providing a balanced assessment of evidence breadth and quality. The contradiction penalty can reduce confidence significantly — a highly contradicted trend with a score of 0.4 loses 0.16 points of confidence, which can easily drop it below the 0.35 eligibility gate.
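Putting the four components together gives a sketch like the one below. It assumes that `agreement` in the combined formula means the raw agreement ratio already multiplied by the dampener, which is how the worked examples later on this page use it. The function name and argument list are hypothetical.

```python
import math


def trend_confidence_sketch(unique_sources, avg_credibility, raw_agreement,
                            contradiction_score):
    """Combine the four confidence components and clamp to [0.0, 1.0]."""
    count_factor = min(unique_sources / 15.0, 0.8)
    dampener = min(1.0, math.log2(unique_sources + 1) / math.log2(8))
    agreement = raw_agreement * dampener  # assumed: dampened agreement
    contradiction_penalty = contradiction_score * 0.4
    confidence = (0.3 * count_factor
                  + 0.3 * avg_credibility
                  + 0.4 * agreement
                  - contradiction_penalty)
    return max(0.0, min(1.0, confidence))


# Broad, credible, unanimous evidence versus the same evidence when contested.
print(round(trend_confidence_sketch(12, 0.8, 1.0, 0.0), 2))
print(round(trend_confidence_sketch(12, 0.8, 0.6, 0.4), 2))
```

Because `count_factor` is capped at `0.8`, the highest value this formula can produce is `0.3 × 0.8 + 0.3 × 1.0 + 0.4 × 1.0 = 0.94`.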
---
## How Accumulating Signals Escalate Decisions
The trend direction, strength, and confidence computed by the aggregation engine are not just descriptive — they directly determine what action the system takes. The escalation path from passive observation to active execution is governed by the eligibility thresholds defined in `services/recommendation/eligibility.py`, and the key insight is that consecutive signals pointing in the same direction naturally strengthen the trend metrics that control this escalation.
### The Escalation Ladder
The `EligibilityConfig` dataclass in `services/recommendation/eligibility.py` defines the thresholds that map trend metrics to actions:
**Neutral (no recommendation).** A trend fails the eligibility gates entirely when confidence is below `0.35`, trend strength is below `0.10`, contradiction exceeds `0.60`, evidence count is below `2`, or the direction is neutral. The `_check_gates()` function evaluates these hard gates — if any gate fails, no recommendation is generated for that window.
**Observe.** A trend that passes the gates but has a direction of mixed, or has strength below `0.25` with confidence below `0.50`, maps to an `OBSERVE` action via `_determine_action()`. This is the system's way of saying "something is happening, but the evidence is not strong enough to act on." Observe recommendations are always `informational` mode — they are logged for human review but never trigger decisions.
**Monitor.** When the trend has a clear direction (positive or negative) but strength remains below `0.25` while confidence reaches `0.50` or above, the action maps to `MONITOR`. This indicates that the directional signal is real but not yet strong enough for a commitment change. Like observe, monitor recommendations are `informational` mode.
**Act / Defer.** When trend strength reaches `0.25` or above with a positive direction, the action is `ACT`. With a negative direction at the same strength threshold, the action is `DEFER`. These are the only actions that can escalate beyond informational mode — `_determine_mode()` evaluates whether the recommendation qualifies for `simulation_eligible` (confidence ≥ `0.50`) or `production_eligible` (confidence ≥ `0.70`, contradiction ≤ `0.25`, evidence ≥ `5`).
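The ladder can be condensed into three small functions. This is a sketch built only from the thresholds quoted above: the real `_check_gates()`, `_determine_action()`, and `_determine_mode()` read their values from `EligibilityConfig` and operate on richer objects, and the plain-string directions and the function signatures here are assumptions.

```python
def check_gates(direction, strength, confidence, contradiction, evidence_count):
    """Hard gates: every condition must hold or no recommendation is produced."""
    return (confidence >= 0.35
            and strength >= 0.10
            and contradiction <= 0.60
            and evidence_count >= 2
            and direction != "neutral")


def determine_action(direction, strength, confidence):
    """Map a gated trend to an action. Direction is assumed to be
    'positive', 'negative', or 'mixed', since neutral fails the gates."""
    if direction == "mixed":
        return "OBSERVE"
    if strength >= 0.25:
        return "ACT" if direction == "positive" else "DEFER"
    return "MONITOR" if confidence >= 0.50 else "OBSERVE"


def determine_mode(action, confidence, contradiction, evidence_count):
    """Only ACT and DEFER can escalate beyond informational mode."""
    if action not in ("ACT", "DEFER"):
        return "informational"
    if confidence >= 0.70 and contradiction <= 0.25 and evidence_count >= 5:
        return "production_eligible"
    if confidence >= 0.50:
        return "simulation_eligible"
    return "informational"


action = determine_action("negative", 0.30, 0.55)
print(action, determine_mode(action, 0.55, 0.0, 4))
```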
### How Accumulation Drives Escalation
Consider an entity that starts with no recent intelligence. The first negative article arrives — a single document with negative sentiment. In the intraday window, this produces:
- **Trend strength** = `|avg_sentiment|` ≈ the absolute weighted sentiment from one signal, likely close to the impact score.
- **Confidence** = low, because `count_factor = min(1/15, 0.8) = 0.067` and the agreement dampener is only `log₂(2)/log₂(8) = 0.33`.
- **Direction** = negative (if the weighted sentiment is ≤ -0.15).
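Confidence here works out to roughly `0.15 + 0.3 × avg_cred`, which clears `0.35` only when the source's credibility is above about `0.65`. Even then, a single document falls short of the minimum evidence count of `2`, so the trend fails the eligibility gates. No recommendation is generated. The system is in the neutral state.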
A second negative article arrives hours later. Now the intraday window has two signals:
- **Unique sources** = 2, so `count_factor = 0.133` and `agreement_dampener = log₂(3)/log₂(8) ≈ 0.53`.
- **Agreement** = `1.0 × 0.53 = 0.53` (both signals agree on negative).
- **Confidence** ≈ `0.3 × 0.133 + 0.3 × avg_cred + 0.4 × 0.53` — likely around `0.35-0.45` depending on credibility.
If confidence crosses `0.35` and strength exceeds `0.10`, the trend passes the eligibility gates. But with strength below `0.25`, the action is `OBSERVE` or `MONITOR` depending on confidence.
A third and fourth negative article arrive over the next day. The 1-day window now has four agreeing signals:
- **Unique sources** = 4, so `count_factor = 0.267` and `agreement_dampener = log₂(5)/log₂(8) ≈ 0.77`.
- **Agreement** = `1.0 × 0.77 = 0.77`.
- **Confidence** ≈ `0.3 × 0.267 + 0.3 × avg_cred + 0.4 × 0.77` — likely `0.50-0.60`.
- **Strength** = `|avg_sentiment|` — with four negative signals and no contradicting evidence, this could easily exceed `0.25`.
Now the trend maps to `DEFER` with `simulation_eligible` mode (confidence ≥ `0.50`). The system has escalated from no recommendation to a simulation-eligible defer recommendation purely through the accumulation of consistent negative evidence.
If the negative evidence continues — more documents, more sources, higher credibility — confidence climbs further. At confidence ≥ `0.70` with contradiction ≤ `0.25` and evidence ≥ `5`, the recommendation reaches `production_eligible` mode, the highest escalation level.
The same process works in reverse for positive accumulation: consecutive favorable signals strengthen the positive trend, increase confidence through source diversity and agreement, and escalate from observe through monitor to act.
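The walkthrough above can be reproduced numerically. The sketch below assumes an average credibility of `0.6`, unanimous agreement, and zero contradiction at every step; those inputs are illustrative choices, not values taken from the system.

```python
import math


def confidence(unique_sources, avg_cred, raw_agreement=1.0, contradiction=0.0):
    """Confidence formula from this page, with dampened agreement."""
    count_factor = min(unique_sources / 15.0, 0.8)
    dampener = min(1.0, math.log2(unique_sources + 1) / math.log2(8))
    value = (0.3 * count_factor + 0.3 * avg_cred
             + 0.4 * raw_agreement * dampener - contradiction * 0.4)
    return max(0.0, min(1.0, value))


AVG_CRED = 0.6  # assumed for illustration

# Confidence after 1, 2, and 4 agreeing documents.
trace = {n: confidence(n, AVG_CRED) for n in (1, 2, 4)}
for n, value in trace.items():
    print(f"{n} source(s): confidence {value:.2f}")
```

With these inputs confidence moves from about `0.33` (below the `0.35` gate) to about `0.43` (eligible, informational) to about `0.57` (simulation-eligible once strength reaches `0.25`), matching the progression described above.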
### The Role of Contradiction in Preventing False Escalation
Accumulation only works when signals agree. If the fifth article about an entity is positive while the previous four were negative, the contradiction score jumps — `minority_weight / total_weight` increases because the minority (positive) side now has non-zero weight. This has two effects: the contradiction penalty reduces confidence (potentially dropping it below an eligibility threshold), and if the contradiction exceeds `0.10` with `|avg_sentiment| < 0.30`, the direction flips to mixed, which maps to `OBSERVE` regardless of strength. The system effectively de-escalates when the evidence becomes contested, requiring a clearer consensus before re-escalating.
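A small numeric illustration of this de-escalation follows, assuming five documents with equal effective weight and sentiment values of `±0.4`. Both the weights and the sentiment magnitudes are invented for the example.

```python
# Four negative articles followed by one positive, all with effective weight 1.0.
signals = [(-0.4, 1.0)] * 4 + [(0.4, 1.0)]  # (sentiment_value, effective_weight)

total_weight = sum(w for _, w in signals)
avg_sentiment = sum(s * w for s, w in signals) / total_weight

positive_weight = sum(w for s, w in signals if s > 0)
negative_weight = sum(w for s, w in signals if s < 0)
contradiction = min(positive_weight, negative_weight) / (positive_weight + negative_weight)

contradiction_penalty = contradiction * 0.4
is_mixed = contradiction > 0.10 and abs(avg_sentiment) < 0.30

print(round(avg_sentiment, 2), round(contradiction, 2),
      round(contradiction_penalty, 2), is_mixed)
```

Before the fifth article the contradiction score was `0.0` and the average sentiment was `-0.4`, a clear negative. One dissenting document of equal weight lifts contradiction to `0.2` and pulls the average to `-0.24`, which is enough to reclassify the trend as mixed under the rules above.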
---
## Trend Projections
After the `TrendSummary` is assembled and persisted, the aggregation engine computes a forward-looking `TrendProjection` via `compute_projection()` in `services/aggregation/projection.py`. Projections estimate where the trend is heading based on current momentum, macro signal decay, and upcoming catalysts. They are advisory — they do not directly trigger recommendations — but they provide valuable context for human reviewers and can inform future automated decision-making.
### Momentum
The `compute_trend_momentum()` function computes the rate of change in signed trend strength between the current and previous aggregation cycles. If the current window shows a negative trend at strength `0.40` and the previous cycle showed negative at `0.30`, the momentum is `-0.10` (strengthening negative). If no previous data is available, the function uses a heuristic: momentum is estimated as half the current signed strength, providing a reasonable baseline for new trends.
Momentum enters the projection as a half-weighted adjustment to the current signed strength:
```
momentum_projected_signed = direction_sign × current_strength + momentum × 0.5
```
This means momentum influences the projection but does not dominate it — a strong current trend with weakening momentum still projects as directional, just with reduced strength.
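The momentum calculation and its half-weighted use can be sketched as below. Representing each cycle as a `(direction, strength)` tuple and treating neutral and mixed directions as a sign of zero are assumptions made for this illustration.

```python
def signed_strength(direction, strength):
    """Positive trends keep their strength, negative trends flip sign.
    Treating other directions as zero is an assumption of this sketch."""
    sign = {"positive": 1.0, "negative": -1.0}.get(direction, 0.0)
    return sign * strength


def trend_momentum_sketch(current, previous=None):
    """Change in signed strength between cycles, or half the current signed
    strength when no previous cycle exists."""
    current_signed = signed_strength(*current)
    if previous is None:
        return current_signed * 0.5
    return current_signed - signed_strength(*previous)


def momentum_projected_signed(current, momentum):
    """Current signed strength plus a half-weighted momentum adjustment."""
    return signed_strength(*current) + momentum * 0.5


current, previous = ("negative", 0.40), ("negative", 0.30)
momentum = trend_momentum_sketch(current, previous)
print(round(momentum, 2), round(momentum_projected_signed(current, momentum), 2))
```

For the example in the text, a momentum of `-0.10` moves the projected signed strength from `-0.40` to `-0.45`.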
### Macro Decay
The `project_macro_decay()` function estimates how active macro events will evolve over the projection horizon. Each macro event has an `estimated_duration` that maps to a decay half-life:
| Duration | Half-Life |
|----------|-----------|
| `short_term` | 1 day |
| `medium_term` | 7 days |
| `long_term` | 30 days |
For each event, the function computes the projected remaining impact at the end of the horizon using exponential decay: `future_factor = 2^(-future_age_days / half_life)`, so the impact halves with every half-life that elapses. The impact is further scaled by a severity weight (`critical`: 1.0, `high`: 0.75, `moderate`: 0.5, `low`: 0.25). Positive and negative macro impacts are accumulated separately, and the projected macro direction is determined by comparing the two sides: positive if the favorable side exceeds the unfavorable by 20%, negative if the reverse, mixed if both are present without a clear majority.
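A sketch of the decay and the direction comparison follows. It uses a negative exponent so that impact halves every half-life, which is what exponential decay with a half-life requires. Several details are assumptions: that `future_age_days` is the event's current age plus the projection horizon, that "exceeds by 20%" means a factor of `1.2`, and that no macro impact at all yields a neutral direction. The function names are hypothetical.

```python
HALF_LIFE_DAYS = {"short_term": 1.0, "medium_term": 7.0, "long_term": 30.0}
SEVERITY_WEIGHT = {"critical": 1.0, "high": 0.75, "moderate": 0.5, "low": 0.25}


def projected_impact(impact, duration, severity, age_days, horizon_days):
    """Remaining impact of one macro event at the end of the horizon."""
    future_age_days = age_days + horizon_days  # assumed definition
    future_factor = 2 ** (-future_age_days / HALF_LIFE_DAYS[duration])
    return impact * SEVERITY_WEIGHT[severity] * future_factor


def macro_direction(positive_total, negative_total, margin=1.2):
    """Compare accumulated sides; the 1.2 margin is one reading of 'by 20%'."""
    if positive_total == 0 and negative_total == 0:
        return "neutral"  # assumed: no macro impact on either side
    if positive_total > negative_total * margin:
        return "positive"
    if negative_total > positive_total * margin:
        return "negative"
    return "mixed"


# A fresh high-severity, medium-term event projected over a 7-day horizon.
print(projected_impact(1.0, "medium_term", "high", age_days=0, horizon_days=7))
print(macro_direction(0.50, 0.30), macro_direction(0.35, 0.30))
```

After exactly one half-life the event retains half of its severity-scaled impact, here `1.0 × 0.75 × 0.5`.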
When the macro layer is enabled and macro events exist, the projection blends the entity-specific momentum projection with the macro trajectory. The macro weight is capped at `0.4` (40% of the blended projection), ensuring that macro signals inform but do not overwhelm the entity-specific trend. The blending formula combines the signed entity projection with the signed macro projection:
```
blended = company_weight × momentum_projected + macro_weight × macro_signed
```
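A minimal sketch of the blend is shown below. This page states only that the macro weight is capped at `0.4`; how the uncapped weight is derived is not described, so it is passed in as a hypothetical `raw_macro_weight` argument, and `company_weight = 1 - macro_weight` is an assumption.

```python
MACRO_WEIGHT_CAP = 0.4


def blend_projection(momentum_projected, macro_signed, raw_macro_weight):
    """Blend the entity-specific projection with the macro trajectory."""
    macro_weight = min(max(raw_macro_weight, 0.0), MACRO_WEIGHT_CAP)
    company_weight = 1.0 - macro_weight  # assumed complement of the macro weight
    return company_weight * momentum_projected + macro_weight * macro_signed


# Negative entity momentum partly offset by a favorable macro trajectory.
print(round(blend_projection(-0.45, 0.35, raw_macro_weight=0.9), 2))
```

Even with a very large raw macro weight, the cap keeps at least 60% of the blended value tied to the entity-specific projection under this assumption.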
### Driving Factors
The projection records a list of human-readable driving factors that explain what is influencing the projected direction. These include momentum descriptions ("Positive momentum (+0.150) in recent trend strength"), macro impact projections ("Macro signals project negative impact (strength 0.350) over 7d"), and upcoming catalysts drawn from the trend's `dominant_catalysts` list (limited to the top 3). If no specific factors are identified, a baseline continuation factor is recorded.
### Divergence Detection
After computing the projected direction, the function compares it to the current trend direction. If they differ — for example, the current trend is negative but the projection is positive due to decaying unfavorable macro events and favorable momentum — the projection is flagged with `diverges_from_current = True` and a divergence driving factor is appended. Divergence signals are particularly valuable because they indicate that the trend may be about to reverse, giving the recommendation engine and human reviewers an early warning.
The projection also flags low confidence when `projected_confidence` falls below the default threshold of `0.3`. Projection confidence starts at 80% of the current trend confidence (reflecting the inherent uncertainty of forward-looking estimates), with a small boost if macro data is available and a further reduction if the macro layer is disabled entirely.
---
## Persistence
Each aggregation cycle persists its results to four PostgreSQL tables, creating a durable record of the trend assessment and its supporting evidence.
### `trend_windows` — Current State
The `persist_trend_summary()` function in `services/aggregation/worker.py` upserts the `TrendSummary` into the `trend_windows` table, keyed by `(entity_type, entity_id, window)`. Each cycle overwrites the previous row for that entity and window, so `trend_windows` always reflects the most recent assessment. The row includes the trend direction, strength, confidence, contradiction score, disagreement details (as JSON), supporting and opposing evidence document IDs (as JSON arrays), dominant catalysts, material risks, environmental context, and the generation timestamp.
### `trend_history` — Time-Series Snapshots
Immediately after the upsert, `persist_trend_summary()` also inserts a snapshot row into the `trend_history` table. Unlike `trend_windows`, this table is append-only — every aggregation cycle adds a new row, creating a time-series of how the trend evolved over time. The history table stores the direction, strength, confidence, contradiction score, catalysts, risks, and timestamp. This time-series data powers the trend charts in the dashboard and enables the momentum computation in `services/aggregation/projection.py` by providing the previous cycle's strength and direction. If the history insert fails (for example, if the table does not yet exist in a development environment), the failure is logged at debug level and does not block the main upsert.
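The upsert-plus-append pattern can be demonstrated with an in-memory SQLite database standing in for PostgreSQL. The table layouts are reduced to a handful of columns, the window column is named `time_window` here to avoid a reserved word in SQLite, and `persist_sketch()` is a hypothetical name, so this shows the pattern rather than the project's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trend_windows (
    entity_type TEXT, entity_id TEXT, time_window TEXT,
    direction TEXT, strength REAL, confidence REAL,
    PRIMARY KEY (entity_type, entity_id, time_window)
);
CREATE TABLE trend_history (
    entity_type TEXT, entity_id TEXT, time_window TEXT,
    direction TEXT, strength REAL, confidence REAL
);
""")


def persist_sketch(row):
    """Upsert the current state, then append a history snapshot."""
    conn.execute(
        """
        INSERT INTO trend_windows VALUES (?, ?, ?, ?, ?, ?)
        ON CONFLICT (entity_type, entity_id, time_window) DO UPDATE SET
            direction = excluded.direction,
            strength = excluded.strength,
            confidence = excluded.confidence
        """,
        row,
    )
    try:
        conn.execute("INSERT INTO trend_history VALUES (?, ?, ?, ?, ?, ?)", row)
    except sqlite3.Error:
        pass  # a history failure must not block the main upsert
    conn.commit()


persist_sketch(("company", "entity-a", "7d", "negative", 0.30, 0.45))
persist_sketch(("company", "entity-a", "7d", "negative", 0.40, 0.57))

print(conn.execute("SELECT COUNT(*) FROM trend_windows").fetchone()[0])
print(conn.execute("SELECT COUNT(*) FROM trend_history").fetchone()[0])
```

After two cycles the current-state table holds one row with the latest values, while the history table holds both snapshots.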
### `trend_evidence` — Per-Document Rankings
The `persist_trend_evidence()` function writes detailed evidence ranking rows to the `trend_evidence` table, linked to the `trend_windows` row by its UUID. Each row records a document ID, its role (supporting or opposing), and the individual scoring components: rank score, weight component, impact component, recency component, confidence component, and sentiment value. Non-UUID document IDs (such as synthetic pattern signal IDs like `pattern:Entity-A:performance_report:7d`) are filtered out before insertion, since the `trend_evidence` table enforces a foreign key to the `documents` table.
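One way to express the non-UUID filter is with the standard library's `uuid` module. The helper names below are hypothetical, and the real function may validate IDs differently.

```python
import uuid


def is_uuid(value):
    """Return True if `value` parses as a UUID string."""
    try:
        uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    return True


def filter_persistable(document_ids):
    """Keep only IDs that can satisfy a foreign key to a UUID-keyed table."""
    return [doc_id for doc_id in document_ids if is_uuid(doc_id)]


ids = [
    "3f2b8c1e-9d4a-4e7b-8a6f-1c2d3e4f5a6b",
    "pattern:Entity-A:performance_report:7d",
]
print(filter_persistable(ids))
```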
### `trend_projections` — Forward-Looking Estimates
The `persist_trend_projection()` function in `services/aggregation/projection.py` inserts the `TrendProjection` into the `trend_projections` table, linked to the `trend_windows` row. The row stores the projected direction, strength, confidence, projection horizon, driving factors (as JSON), macro contribution percentage, divergence flag, and computation timestamp. Like trend history, projections accumulate over time, allowing analysis of how well the system's forward-looking estimates matched subsequent reality.
---
## What Comes Next
At this point, the aggregation engine has transformed weighted signals into `TrendSummary` objects across five time windows, detected contradictions, ranked evidence, computed confidence, and persisted everything to PostgreSQL. The trend metrics — direction, strength, confidence, contradiction score — encode the accumulated weight of evidence for each entity. But a `TrendSummary` is still an assessment, not an action. The next stage translates these assessments into concrete recommendations: should the system act, defer, monitor, or simply observe? And with what conviction? [Page 5 — Recommendation Generation](05-recommendation-generation.md) explains how the recommendation engine applies data quality suppression, eligibility evaluation, commitment sizing, thesis generation, and risk classification to convert trend summaries into actionable `Recommendation` objects that the decision execution engine can execute.