Celes Renata 88ad1e8d99 feat: comprehensive docs, unit tests, docker-compose app services

- Add scheduler and ingestion unit tests (test_scheduler_unit.py, test_ingestion_unit.py)
- Add all 13 app services + dashboard to docker-compose.yml
- Add full documentation suite: API reference, Helm reference, Docker deployment guide,
  3 architecture diagrams (K8s, Docker Compose, data pipeline), AI agent guide,
  backup/restore guide, observability/metrics reference, per-service docs
- Add intelligence pipeline deep-dive docs with Mermaid diagrams
- Update README with documentation index and links
- Add specs for comprehensive-quality-docs, intelligence-pipeline-deep-dive,
  sanitized-pipeline-docs

2026-04-22 02:56:41 +00:00

Page 4 — Trend Aggregation and Accumulating Signals

The scoring layer described in Page 3 transforms every intelligence record into a WeightedSignal — a document reference paired with a composite weight that encodes recency, credibility, novelty, confidence, and market conditions. Three independent signal layers (Company at weight 1.0, Macro at 0.3, Competitive at 0.2) each produce WeightedSignal objects that are concatenated into a single list. But a single list of weighted signals is still just raw material. The aggregation engine in services/aggregation/worker.py is where that raw material becomes a decision-grade assessment: a TrendSummary object that captures the direction, strength, confidence, contradiction level, and supporting evidence for a ticker across a specific time window. This page explains how that transformation works — from weighted sentiment averages through trend direction derivation, contradiction detection, evidence ranking, and confidence computation — and, critically, how consecutive signals pointing in the same direction accumulate across documents and time windows to escalate the system's response from passive observation to actionable trading recommendations.

For a visual overview of the accumulation and escalation process, see the Trend Accumulation and Escalation diagram. For how the three signal layers merge into the aggregation engine, see the Three-Layer Signal Merging diagram.

Five Time Windows

The aggregation engine does not compute a single trend for each ticker. It computes five, one for each time window defined in services/aggregation/worker.py:

Window	Lookback Duration
`intraday`	12 hours
`1d`	1 day
`7d`	7 days
`30d`	30 days
`90d`	90 days

Each window produces an independent TrendSummary by fetching all impact records, macro impacts, and competitive signals for the ticker within that window's time range. The aggregate_company_window() function in services/aggregation/worker.py orchestrates this per-window computation: it determines the time range from the window's lookback duration, fetches document_impact_records from PostgreSQL, retrieves market context, builds company weighted signals, checks the macro and competitive runtime toggles (see Page 3 for toggle details), merges any enabled layer signals, and then assembles the TrendSummary.

The five-window design serves a specific purpose. Short windows (intraday, 1d) capture fast-moving sentiment shifts — a breaking earnings miss, a sudden regulatory action — while long windows (30d, 90d) reveal sustained trends that persist across many documents and news cycles. A ticker might show a bearish intraday trend after a single negative article, but a neutral 30-day trend because the broader evidence base is balanced. The recommendation engine downstream (described in Page 5) evaluates each window's TrendSummary independently, so the system can respond to both short-term catalysts and long-term directional shifts.

The aggregate_company() function iterates over all effective windows (configurable via AggregationConfig.windows, defaulting to all five) and calls aggregate_company_window() for each one. This means a single aggregation cycle for one ticker produces up to five TrendSummary objects, each reflecting a different temporal perspective on the same underlying evidence.
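As a rough illustration of how a window name could map to a time range, here is a minimal sketch. The lookback durations come from the table above; the names WINDOW_LOOKBACKS, window_range(), and ranges_for_all_windows() are hypothetical and are not taken from services/aggregation/worker.py.

```python
from datetime import datetime, timedelta, timezone

# Lookback durations from the table above. The dict name is hypothetical.
WINDOW_LOOKBACKS = {
    "intraday": timedelta(hours=12),
    "1d": timedelta(days=1),
    "7d": timedelta(days=7),
    "30d": timedelta(days=30),
    "90d": timedelta(days=90),
}


def window_range(window, now):
    """Return the (start, end) time range covered by a window ending at `now`."""
    return now - WINDOW_LOOKBACKS[window], now


def ranges_for_all_windows(now, windows=None):
    """One time range per effective window, defaulting to all five."""
    effective = windows or list(WINDOW_LOOKBACKS)
    return {name: window_range(name, now) for name in effective}


now = datetime(2026, 4, 22, 3, 0, tzinfo=timezone.utc)
for name, (start, end) in ranges_for_all_windows(now).items():
    print(f"{name:>8}: {start.isoformat()} -> {end.isoformat()}")
```

Passing a shorter list (for example `["7d"]`) mimics a configuration that restricts the effective windows.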

Trend Direction Derivation

Once the weighted sentiment average has been computed from the merged signal list (see the weighted_sentiment_average() function described in Page 3), the derive_trend_direction() function in services/aggregation/worker.py maps that numeric value to a TrendDirection enum. The rules are evaluated in a specific order, and the first matching rule wins:

Mixed — If the contradiction score exceeds 0.10 (the MIXED_THRESHOLD constant) and the absolute value of the average sentiment is below 0.30, the direction is MIXED. This rule fires first because high contradiction with a weak directional signal indicates genuine disagreement in the evidence — the trend is not simply neutral, it is actively contested.
Bullish — If the average sentiment is ≥ 0.15 (the BULLISH_THRESHOLD constant), the direction is BULLISH. This means the weight-adjusted evidence leans positive with enough conviction to cross the threshold.
Bearish — If the average sentiment is ≤ -0.15 (the BEARISH_THRESHOLD constant), the direction is BEARISH. The symmetric threshold ensures that bullish and bearish classifications require the same magnitude of evidence.
Neutral — If none of the above conditions are met, the direction is NEUTRAL. This covers the range where the average sentiment falls between -0.15 and +0.15 without high contradiction — the evidence is either balanced or insufficient to establish a directional lean.

The mixed-first evaluation order is important. Consider a scenario where five documents are bullish and four are bearish, all with similar weights. The weighted sentiment average might be slightly positive (say, +0.08), which would normally map to neutral. But the contradiction score — computed from the minority/majority weight split — would be high (close to 0.44). The mixed rule catches this case: the evidence is not neutral, it is conflicted. This distinction matters downstream because mixed trends receive different treatment in the recommendation engine than neutral trends.
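The rule order can be sketched as follows. The threshold values and the constant names MIXED_THRESHOLD, BULLISH_THRESHOLD, and BEARISH_THRESHOLD come from the text above; the enum values, the name MIXED_MAX_ABS_SENTIMENT, and the function signature are assumptions made for illustration.

```python
from enum import Enum


class TrendDirection(Enum):
    # Member values are assumed; only the member names appear in the text.
    BULLISH = "bullish"
    BEARISH = "bearish"
    NEUTRAL = "neutral"
    MIXED = "mixed"


MIXED_THRESHOLD = 0.10
BULLISH_THRESHOLD = 0.15
BEARISH_THRESHOLD = -0.15
MIXED_MAX_ABS_SENTIMENT = 0.30  # hypothetical name for the 0.30 bound


def derive_trend_direction(avg_sentiment, contradiction_score):
    """First matching rule wins; the mixed rule is checked before the others."""
    if (contradiction_score > MIXED_THRESHOLD
            and abs(avg_sentiment) < MIXED_MAX_ABS_SENTIMENT):
        return TrendDirection.MIXED
    if avg_sentiment >= BULLISH_THRESHOLD:
        return TrendDirection.BULLISH
    if avg_sentiment <= BEARISH_THRESHOLD:
        return TrendDirection.BEARISH
    return TrendDirection.NEUTRAL


# The five-bullish, four-bearish scenario: weak lean, high contradiction.
print(derive_trend_direction(0.08, 0.44))
```

Because the mixed rule runs first, the scenario above is classified as mixed rather than neutral, even though +0.08 lies inside the neutral band.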

Contradiction Detection

The contradiction detection module in services/aggregation/contradiction.py provides a structured analysis of disagreement within the signal set. Rather than collapsing contradictory evidence into a single number, it produces a ContradictionResult containing both an overall score and a list of DisagreementDetail objects that explain where the disagreement lies.

The detect_contradictions() function runs two analyses:

Sentiment Disagreement

The _detect_sentiment_disagreement() function examines whether both positive and negative sentiment signals exist in the signal set. For each signal with a non-zero effective weight (combined_weight × impact_score > 0), it classifies the signal as positive or negative based on its sentiment_value and accumulates the effective weight for each side. If both sides have at least one signal, it produces a DisagreementDetail with dimension "sentiment", listing the document IDs and weights for each side, along with a human-readable description like "Sentiment split: 3 positive vs 2 negative signals (minority weight ratio 38%)".

Catalyst-Level Disagreement

The _detect_catalyst_disagreement() function goes deeper. It groups signals by their catalyst_type (earnings, product_launch, regulatory, etc.) using CatalystEntry objects built from the document_impact_records. Within each catalyst group, it checks whether both positive and negative signals exist. If they do, it produces a DisagreementDetail with dimension "catalyst:<type>" — for example, "catalyst:earnings" when some documents interpret an earnings report positively and others negatively. This catalyst-level analysis is valuable because it pinpoints the specific topic of disagreement rather than just flagging that disagreement exists somewhere in the evidence.
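A minimal sketch of the grouping logic, assuming a simplified signal shape: the Signal dataclass, its field names, and the dictionary output are hypothetical stand-ins for the real CatalystEntry and DisagreementDetail objects, whose fields are not listed here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Signal:
    # Hypothetical stand-in for a signal joined with its catalyst entry.
    document_id: str
    catalyst_type: str
    sentiment_value: float
    effective_weight: float


def catalyst_disagreements(signals):
    """Report each catalyst type that has both positive and negative signals."""
    groups = defaultdict(lambda: {"positive": [], "negative": []})
    for s in signals:
        # Skip signals that carry no weight or no directional sentiment.
        if s.effective_weight <= 0 or s.sentiment_value == 0:
            continue
        side = "positive" if s.sentiment_value > 0 else "negative"
        groups[s.catalyst_type][side].append(s.document_id)
    return [
        {"dimension": f"catalyst:{ctype}", **sides}
        for ctype, sides in sorted(groups.items())
        if sides["positive"] and sides["negative"]
    ]


signals = [
    Signal("d1", "earnings", 0.6, 0.8),
    Signal("d2", "earnings", -0.4, 0.5),
    Signal("d3", "regulatory", -0.7, 0.9),
]
print(catalyst_disagreements(signals))
```

Only the earnings group is reported here, because the regulatory group has signals on one side only.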

The Overall Contradiction Score

The _compute_overall_score() function computes the backward-compatible scalar contradiction score using the minority/majority weight ratio formula:

contradiction_score = minority_weight / total_weight

where minority_weight is the smaller of the positive and negative effective weights, and total_weight is their sum. Signals with zero effective weight or neutral sentiment are excluded. The score ranges from 0.0 (complete agreement: all signals point the same direction) to 0.5 (perfect split: positive and negative weights are exactly equal). A score above 0.10 combined with a weak average sentiment (absolute value below 0.30) triggers the mixed direction classification in derive_trend_direction().
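The formula can be sketched in a few lines. The input shape (a list of sentiment and effective-weight pairs) and the function name are assumptions for illustration; the real _compute_overall_score() operates on the project's own signal objects.

```python
def contradiction_score(signals):
    """signals: list of (sentiment_value, effective_weight) pairs."""
    positive = sum(w for s, w in signals if s > 0 and w > 0)
    negative = sum(w for s, w in signals if s < 0 and w > 0)
    total = positive + negative
    if total == 0:
        return 0.0  # no directional evidence, so nothing to contradict
    return min(positive, negative) / total


# Five bullish and four bearish signals with equal weights: 4 / 9.
split = [(0.5, 1.0)] * 5 + [(-0.5, 1.0)] * 4
print(round(contradiction_score(split), 3))
```

Neutral signals (sentiment of exactly zero) and zero-weight signals fall out of both sums, which matches the exclusion described above.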

The contradiction score also feeds directly into the confidence computation as a penalty, described in the next section. High contradiction reduces the system's confidence in the trend, which in turn affects whether the trend can escalate to actionable recommendations.

Evidence Ranking

Not all documents contributing to a trend are equally important. The rank_evidence() function in services/aggregation/worker.py delegates to the evidence ranking module (services/aggregation/evidence.py) to produce ordered lists of the most influential supporting and opposing documents. The ranking uses a composite scoring approach configured by EvidenceRankConfig, considering multiple factors:

Weight — the signal's composite weight from the scoring layer, reflecting recency, credibility, novelty, confidence, and market context.
Impact — the extraction's impact score for the company, reflecting how significant the document's content is.
Recency — how recently the document was published, with more recent documents ranked higher.
Confidence — the extraction confidence, reflecting how reliably the LLM parsed the document.

Signals are split into supporting (positive sentiment) and opposing (negative sentiment) groups. Neutral and mixed sentiment signals are excluded from evidence lists — they do not argue for or against the trend direction. Within each group, signals are sorted by their composite rank score in descending order, and the top entries (up to MAX_EVIDENCE_REFS = 10 per side) are returned as document ID lists.
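The split-and-rank step might look like the sketch below. The actual factor weights live in EvidenceRankConfig and are not given here, so this example assumes an equal-weight mean of the four factors purely for illustration; the EvidenceSignal dataclass and its field names are also hypothetical.

```python
from dataclasses import dataclass

MAX_EVIDENCE_REFS = 10


@dataclass
class EvidenceSignal:
    # Hypothetical shape; each factor is assumed to be normalized to [0, 1].
    document_id: str
    sentiment_value: float
    weight: float
    impact: float
    recency: float
    confidence: float


def rank_score(s):
    # ASSUMPTION: equal-weight mean. The real weights are configurable.
    return (s.weight + s.impact + s.recency + s.confidence) / 4.0


def rank_evidence(signals):
    """Return (supporting_ids, opposing_ids), best first, capped per side."""
    def top(group):
        ranked = sorted(group, key=rank_score, reverse=True)
        return [s.document_id for s in ranked[:MAX_EVIDENCE_REFS]]

    supporting = [s for s in signals if s.sentiment_value > 0]
    opposing = [s for s in signals if s.sentiment_value < 0]
    return top(supporting), top(opposing)


docs = [
    EvidenceSignal("a", 0.6, 0.9, 0.8, 0.9, 0.7),
    EvidenceSignal("b", 0.3, 0.4, 0.5, 0.2, 0.6),
    EvidenceSignal("c", -0.5, 0.7, 0.6, 0.8, 0.9),
    EvidenceSignal("d", 0.0, 1.0, 1.0, 1.0, 1.0),  # neutral, excluded
]
print(rank_evidence(docs))
```

The neutral document never appears in either list, however high its component scores are.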

The assemble_trend_with_evidence() function in services/aggregation/worker.py uses the detailed variant rank_evidence_detailed() to get RankedEvidence objects that include the individual scoring components (weight, impact, recency, confidence, sentiment value). These detailed rankings are persisted to the trend_evidence table for auditability, while the document ID lists are stored directly in the TrendSummary as top_supporting_evidence and top_opposing_evidence.

The evidence ranking serves two purposes. First, it provides the recommendation engine with the most relevant documents to cite in its thesis generation (see Page 5). Second, it gives human reviewers a quick way to understand why the system reached a particular trend assessment — the top-ranked documents are the ones that most influenced the direction and strength.

Confidence Computation

The compute_trend_confidence() function in services/aggregation/worker.py produces the confidence score for a TrendSummary. This score is critical because it directly gates whether a trend can produce actionable recommendations — the eligibility evaluation in services/recommendation/eligibility.py requires a minimum confidence of 0.35 to generate any recommendation at all, and higher confidence thresholds control escalation to paper and live trading modes.

Confidence is computed from four components:

Unique Source Count

The function counts the number of unique document IDs across all active signals (those with combined_weight > 0). This count is divided by 15 and capped at 0.8:

count_factor = min(unique_sources / 15.0, 0.8)

A trend backed by 12 or more unique source documents reaches the maximum count contribution of 0.8 (since 12 / 15 = 0.8). A trend backed by a single document gets only 0.067. This component rewards breadth of evidence: a trend confirmed by many independent sources is more trustworthy than one driven by a single article, regardless of how high that article's individual weight might be.

Average Extraction Credibility

The average credibility weight across all active signals provides a baseline quality measure. If most contributing documents come from high-credibility sources, this component is high. If the evidence is dominated by low-credibility sources, confidence is penalized accordingly.

Signal Agreement with Sample-Size Dampening

The agreement ratio measures what fraction of directional signals (bullish + bearish, excluding neutral) agree on the majority direction. If 8 out of 10 directional signals are bullish, the raw agreement is 0.8. But raw agreement is misleading with small sample sizes — 1 out of 1 signals agreeing gives a perfect 1.0 agreement, which is not meaningful.

To address this, the agreement is dampened by a logarithmic sample-size factor:

agreement_dampener = min(1.0, log₂(unique_sources + 1) / log₂(8))

This dampener saturates at 1.0 when unique_sources reaches 7 (since log₂(7 + 1) = log₂(8) = 3.0). With fewer sources, the dampener reduces the agreement contribution: 1 source gives a dampener of 0.33, 3 sources give 0.67, and 7 sources give the full 1.0. The log₂ scaling means that each additional source provides diminishing marginal improvement to the dampener, which matches the intuition that the jump from 1 to 3 sources is far more meaningful than the jump from 15 to 17.
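To make the shape of the dampener concrete, the following sketch evaluates the formula for a few source counts. The function name is chosen for illustration and may not match the code in worker.py.

```python
import math


def agreement_dampener(unique_sources):
    """Logarithmic sample-size factor that saturates at 7 unique sources."""
    return min(1.0, math.log2(unique_sources + 1) / math.log2(8))


for n in (1, 2, 3, 4, 7, 15):
    print(n, round(agreement_dampener(n), 2))
```

The printed values rise quickly for the first few sources and stay flat at 1.0 from 7 sources onward.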

Contradiction Penalty

The contradiction score computed by services/aggregation/contradiction.py is applied as a direct penalty:

contradiction_penalty = contradiction_score × 0.4

A contradiction score of 0.5 (perfect split) produces a penalty of 0.2, which is substantial enough to push a moderately confident trend below the eligibility threshold.

The Combined Formula

The four components are combined as:

confidence = 0.3 × count_factor + 0.3 × avg_credibility + 0.4 × agreement − contradiction_penalty

The result is clamped to [0.0, 1.0]. The weighting gives signal agreement the largest share (40%), reflecting the principle that consensus among diverse sources is the strongest indicator of a reliable trend. Source count and credibility each contribute 30%, providing a balanced assessment of evidence breadth and quality. The contradiction penalty can reduce confidence significantly — a highly contradicted trend with a score of 0.4 loses 0.16 points of confidence, which can easily drop it below the 0.35 eligibility gate.
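Putting the four components together gives a sketch like the one below. The coefficients are those stated above; the function name and its flat argument list are simplifications, since the real compute_trend_confidence() derives these inputs from the signal list itself.

```python
import math


def compute_confidence(unique_sources, avg_credibility,
                       raw_agreement, contradiction_score):
    """Combine count, credibility, dampened agreement, and the penalty."""
    count_factor = min(unique_sources / 15.0, 0.8)
    dampener = min(1.0, math.log2(unique_sources + 1) / math.log2(8))
    agreement = raw_agreement * dampener
    contradiction_penalty = contradiction_score * 0.4
    confidence = (0.3 * count_factor
                  + 0.3 * avg_credibility
                  + 0.4 * agreement
                  - contradiction_penalty)
    return max(0.0, min(1.0, confidence))  # clamp to [0.0, 1.0]


# Broad, mostly agreeing evidence with some contradiction.
print(round(compute_confidence(12, 0.7, 0.8, 0.2), 3))
# A single source, even with perfect agreement, stays low.
print(round(compute_confidence(1, 0.5, 1.0, 0.0), 3))
```

With these illustrative inputs the first case lands near 0.69, while the single-source case stays near 0.30 and therefore below the 0.35 eligibility gate.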

How Accumulating Signals Escalate Decisions

The trend direction, strength, and confidence computed by the aggregation engine are not just descriptive — they directly determine what action the system takes. The escalation path from passive observation to active trading is governed by the eligibility thresholds defined in services/recommendation/eligibility.py, and the key insight is that consecutive signals pointing in the same direction naturally strengthen the trend metrics that control this escalation.

The Escalation Ladder

The EligibilityConfig dataclass in services/recommendation/eligibility.py defines the thresholds that map trend metrics to actions:

Neutral (no recommendation). A trend fails the eligibility gates entirely when confidence is below 0.35, trend strength is below 0.10, contradiction exceeds 0.60, evidence count is below 2, or the direction is neutral. The _check_gates() function evaluates these hard gates — if any gate fails, no recommendation is generated for that window.

Watch. A trend that passes the gates but has a direction of mixed, or has strength below 0.25 with confidence below 0.50, maps to a WATCH action via _determine_action(). This is the system's way of saying "something is happening, but the evidence is not strong enough to act on." Watch recommendations are always informational mode — they are logged for human review but never trigger trades.

Hold. When the trend has a clear direction (bullish or bearish) but strength remains below 0.25 while confidence reaches 0.50 or above, the action maps to HOLD. This indicates that the directional signal is real but not yet strong enough for a position change. Like watch, hold recommendations are informational mode.

Buy / Sell. When trend strength reaches 0.25 or above with a bullish direction, the action is BUY. With a bearish direction at the same strength threshold, the action is SELL. These are the only actions that can escalate beyond informational mode — _determine_mode() evaluates whether the recommendation qualifies for paper_eligible (confidence ≥ 0.50) or live_eligible (confidence ≥ 0.70, contradiction ≤ 0.25, evidence ≥ 5).
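The ladder above can be condensed into one function, shown here as a sketch. The threshold values are the ones quoted in this section; the field names on EligibilityConfig, the single evaluate() function, and the string labels are assumptions, since the real module splits this logic across _check_gates(), _determine_action(), and _determine_mode().

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EligibilityConfig:
    # Values from the text; field names are hypothetical.
    min_confidence: float = 0.35
    min_strength: float = 0.10
    max_contradiction: float = 0.60
    min_evidence: int = 2
    action_strength: float = 0.25
    hold_confidence: float = 0.50
    paper_confidence: float = 0.50
    live_confidence: float = 0.70
    live_max_contradiction: float = 0.25
    live_min_evidence: int = 5


def evaluate(direction, strength, confidence, contradiction,
             evidence_count, cfg=EligibilityConfig()):
    """Return (action, mode), or None when a hard gate fails."""
    if (confidence < cfg.min_confidence or strength < cfg.min_strength
            or contradiction > cfg.max_contradiction
            or evidence_count < cfg.min_evidence or direction == "neutral"):
        return None
    if direction == "mixed":
        action = "WATCH"
    elif strength >= cfg.action_strength:
        action = "BUY" if direction == "bullish" else "SELL"
    elif confidence >= cfg.hold_confidence:
        action = "HOLD"
    else:
        action = "WATCH"
    mode = "informational"
    if action in ("BUY", "SELL"):
        if (confidence >= cfg.live_confidence
                and contradiction <= cfg.live_max_contradiction
                and evidence_count >= cfg.live_min_evidence):
            mode = "live_eligible"
        elif confidence >= cfg.paper_confidence:
            mode = "paper_eligible"
    return action, mode


print(evaluate("bearish", 0.30, 0.55, 0.0, 4))
```

Only BUY and SELL can leave informational mode, and a BUY or SELL with confidence between 0.35 and 0.50 remains informational.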

How Accumulation Drives Escalation

Consider a ticker that starts with no recent intelligence. The first bearish article arrives — a single document with negative sentiment. In the intraday window, this produces:

Trend strength = |avg_sentiment| ≈ the absolute weighted sentiment from one signal, likely close to the impact score.
Confidence = low, because count_factor = min(1/15, 0.8) = 0.067 and the agreement dampener is only log₂(2)/log₂(8) = 0.33.
Direction = bearish (if the weighted sentiment is ≤ -0.15).

With confidence well below 0.35, this trend fails the eligibility gate entirely. No recommendation is generated. The system is in the neutral state.

A second bearish article arrives hours later. Now the intraday window has two signals:

Unique sources = 2, so count_factor = 0.133 and agreement_dampener = log₂(3)/log₂(8) ≈ 0.53.
Agreement = 1.0 × 0.53 = 0.53 (both signals agree on bearish).
Confidence ≈ 0.3 × 0.133 + 0.3 × avg_cred + 0.4 × 0.53 — likely around 0.35-0.45 depending on credibility.

If confidence crosses 0.35 and strength exceeds 0.10, the trend passes the eligibility gates. But with strength below 0.25, the action is WATCH or HOLD depending on confidence.

A third and fourth bearish article arrive over the next day. The 1-day window now has four agreeing signals:

Unique sources = 4, so count_factor = 0.267 and agreement_dampener = log₂(5)/log₂(8) ≈ 0.77.
Agreement = 1.0 × 0.77 = 0.77.
Confidence ≈ 0.3 × 0.267 + 0.3 × avg_cred + 0.4 × 0.77 — likely 0.50-0.60.
Strength = |avg_sentiment| — with four bearish signals and no contradicting evidence, this could easily exceed 0.25.

Now the trend maps to SELL with paper_eligible mode (confidence ≥ 0.50). The system has escalated from no recommendation to a paper-eligible sell recommendation purely through the accumulation of consistent bearish evidence.

If the bearish evidence continues — more documents, more sources, higher credibility — confidence climbs further. At confidence ≥ 0.70 with contradiction ≤ 0.25 and evidence ≥ 5, the recommendation reaches live_eligible mode, the highest escalation level.

The same process applies symmetrically to bullish accumulation: consecutive positive signals strengthen the bullish trend, increase confidence through source diversity and agreement, and escalate from watch through hold to buy.
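The walk-through can be reproduced numerically. The sketch below assumes every signal agrees (raw agreement 1.0, contradiction 0.0) and uses an average credibility of 0.6, which is an illustrative assumption and not a value from the source; the function name is hypothetical.

```python
import math

ASSUMED_AVG_CREDIBILITY = 0.6  # illustrative assumption only


def confidence_when_all_agree(unique_sources, avg_credibility):
    """Confidence for n agreeing sources with no contradiction."""
    count_factor = min(unique_sources / 15.0, 0.8)
    dampener = min(1.0, math.log2(unique_sources + 1) / math.log2(8))
    value = 0.3 * count_factor + 0.3 * avg_credibility + 0.4 * dampener
    return max(0.0, min(1.0, value))


for n in (1, 2, 4, 7):
    conf = confidence_when_all_agree(n, ASSUMED_AVG_CREDIBILITY)
    print(f"{n} source(s): confidence {conf:.3f}")
```

Under this assumed credibility, one source stays below the 0.35 gate, two sources pass it, four sources pass the 0.50 paper threshold, and seven sources pass the 0.70 live threshold. Different credibility values shift these crossover points.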

The Role of Contradiction in Preventing False Escalation

Accumulation only works when signals agree. If the fifth article about a ticker is bullish while the previous four were bearish, the contradiction score jumps — minority_weight / total_weight increases because the minority (bullish) side now has non-zero weight. This has two effects: the contradiction penalty reduces confidence (potentially dropping it below an eligibility threshold), and if the contradiction exceeds 0.10 with |avg_sentiment| < 0.30, the direction flips to mixed, which maps to WATCH regardless of strength. The system effectively de-escalates when the evidence becomes contested, requiring a clearer consensus before re-escalating.

Trend Projections

After the TrendSummary is assembled and persisted, the aggregation engine computes a forward-looking TrendProjection via compute_projection() in services/aggregation/projection.py. Projections estimate where the trend is heading based on current momentum, macro signal decay, and upcoming catalysts. They are advisory — they do not directly trigger recommendations — but they provide valuable context for human reviewers and can inform future automated decision-making.

Momentum

The compute_trend_momentum() function computes the rate of change in signed trend strength between the current and previous aggregation cycles. If the current window shows a bearish trend at strength 0.40 and the previous cycle showed bearish at 0.30, the momentum is -0.10 (strengthening bearish). If no previous data is available, the function uses a heuristic: momentum is estimated as half the current signed strength, providing a reasonable baseline for new trends.

Momentum enters the projection as a half-weighted adjustment to the current signed strength:

momentum_projected_signed = direction_sign × current_strength + momentum × 0.5

This means momentum influences the projection but does not dominate it — a strong current trend with weakening momentum still projects as directional, just with reduced strength.
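A small sketch of the two steps, assuming momentum is the plain difference between the current and previous signed strengths (which is consistent with the -0.10 example above). The function signatures are simplified for illustration.

```python
from typing import Optional


def compute_trend_momentum(current_signed: float,
                           previous_signed: Optional[float]) -> float:
    """Change in signed strength, or half the current value with no history."""
    if previous_signed is None:
        return current_signed * 0.5
    return current_signed - previous_signed


def momentum_projected_signed(direction_sign: int, current_strength: float,
                              momentum: float) -> float:
    """Current signed strength plus a half-weighted momentum adjustment."""
    return direction_sign * current_strength + momentum * 0.5


# Bearish at 0.40 now, bearish at 0.30 in the previous cycle.
momentum = compute_trend_momentum(-0.40, -0.30)
print(round(momentum, 2), round(momentum_projected_signed(-1, 0.40, momentum), 2))
```

In this example the bearish trend is strengthening, so the projected signed strength moves further below zero than the current value.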

Macro Decay

The project_macro_decay() function estimates how active macro events will evolve over the projection horizon. Each macro event has an estimated_duration that maps to a decay half-life:

Duration	Half-Life
`short_term`	1 day
`medium_term`	7 days
`long_term`	30 days

For each event, the function computes the projected remaining impact at the end of the horizon using exponential decay: future_factor = 2^(−future_age_days / half_life). The impact is further scaled by a severity weight (critical: 1.0, high: 0.75, moderate: 0.5, low: 0.25). Positive and negative macro impacts are accumulated separately, and the projected macro direction is determined by comparing the two sides — bullish if positive exceeds negative by 20%, bearish if the reverse, mixed if both are present without a clear majority.
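The decay calculation can be sketched as follows. The half-lives and severity weights are taken from the text; the MacroEvent shape, the assumption that future_age_days equals the event's current age plus the horizon, and the reading of "exceeds by 20%" as a 1.2x ratio are all illustrative guesses.

```python
from dataclasses import dataclass

HALF_LIFE_DAYS = {"short_term": 1.0, "medium_term": 7.0, "long_term": 30.0}
SEVERITY_WEIGHT = {"critical": 1.0, "high": 0.75, "moderate": 0.5, "low": 0.25}


@dataclass
class MacroEvent:
    # Hypothetical shape. impact is signed: positive means bullish.
    estimated_duration: str
    severity: str
    impact: float
    age_days: float


def projected_impact(event, horizon_days):
    """Remaining signed impact at the end of the projection horizon."""
    future_age_days = event.age_days + horizon_days  # assumption
    half_life = HALF_LIFE_DAYS[event.estimated_duration]
    future_factor = 2.0 ** (-future_age_days / half_life)
    return event.impact * SEVERITY_WEIGHT[event.severity] * future_factor


def project_macro_direction(events, horizon_days):
    positive = negative = 0.0
    for event in events:
        value = projected_impact(event, horizon_days)
        if value > 0:
            positive += value
        elif value < 0:
            negative += -value
    if positive == 0 and negative == 0:
        return "neutral"
    if positive > negative * 1.2:
        return "bullish"
    if negative > positive * 1.2:
        return "bearish"
    return "mixed"


event = MacroEvent("medium_term", "high", -1.0, 0.0)
print(projected_impact(event, 7.0), project_macro_direction([event], 7.0))
```

A fresh medium-term event projected over 7 days has aged exactly one half-life, so only half of its severity-scaled impact remains.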

When the macro layer is enabled and macro events exist, the projection blends the company-specific momentum projection with the macro trajectory. The macro weight is capped at 0.4 (40% of the blended projection), ensuring that macro signals inform but do not overwhelm the company-specific trend. The blending formula combines the signed company projection with the signed macro projection:

blended = company_weight × momentum_projected + macro_weight × macro_signed
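A minimal sketch of the blend, assuming the two weights sum to 1.0 so that company_weight = 1 - macro_weight. That relationship, the function name, and the way macro_weight is supplied are assumptions; only the 0.4 cap and the formula itself come from the text.

```python
MAX_MACRO_WEIGHT = 0.4  # cap stated in the text


def blend_projection(momentum_projected, macro_signed, macro_weight):
    """Blend company and macro projections with the macro share capped."""
    macro_weight = max(0.0, min(macro_weight, MAX_MACRO_WEIGHT))
    company_weight = 1.0 - macro_weight  # ASSUMPTION: weights sum to 1.0
    return company_weight * momentum_projected + macro_weight * macro_signed


# A requested macro weight of 0.9 is capped at 0.4 before blending.
print(round(blend_projection(-0.45, 0.35, 0.9), 3))
```

With the cap in place the company-specific projection always keeps at least 60% of the blend under this assumption.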

Driving Factors

The projection records a list of human-readable driving factors that explain what is influencing the projected direction. These include momentum descriptions ("Positive momentum (+0.150) in recent trend strength"), macro impact projections ("Macro signals project bearish impact (strength 0.350) over 7d"), and upcoming catalysts drawn from the trend's dominant_catalysts list (limited to the top 3). If no specific factors are identified, a baseline continuation factor is recorded.

Divergence Detection

After computing the projected direction, the function compares it to the current trend direction. If they differ — for example, the current trend is bearish but the projection is bullish due to decaying negative macro events and positive momentum — the projection is flagged with diverges_from_current = True and a divergence driving factor is appended. Divergence signals are particularly valuable because they indicate that the trend may be about to reverse, giving the recommendation engine and human reviewers an early warning.

The projection also flags low confidence when projected_confidence falls below the default threshold of 0.3. Projection confidence starts at 80% of the current trend confidence (reflecting the inherent uncertainty of forward-looking estimates), with a small boost if macro data is available and a further reduction if the macro layer is disabled entirely.

Persistence

Each aggregation cycle persists its results to four PostgreSQL tables, creating a durable record of the trend assessment and its supporting evidence.

`trend_windows` — Current State

The persist_trend_summary() function in services/aggregation/worker.py upserts the TrendSummary into the trend_windows table, keyed by (entity_type, entity_id, window). Each cycle overwrites the previous row for that ticker and window, so trend_windows always reflects the most recent assessment. The row includes the trend direction, strength, confidence, contradiction score, disagreement details (as JSON), supporting and opposing evidence document IDs (as JSON arrays), dominant catalysts, material risks, market context, and the generation timestamp.

`trend_history` — Time-Series Snapshots

Immediately after the upsert, persist_trend_summary() also inserts a snapshot row into the trend_history table. Unlike trend_windows, this table is append-only — every aggregation cycle adds a new row, creating a time-series of how the trend evolved over time. The history table stores the direction, strength, confidence, contradiction score, catalysts, risks, and timestamp. This time-series data powers the trend charts in the dashboard and enables the momentum computation in services/aggregation/projection.py by providing the previous cycle's strength and direction. If the history insert fails (for example, if the table does not yet exist in a development environment), the failure is logged at debug level and does not block the main upsert.

`trend_evidence` — Per-Document Rankings

The persist_trend_evidence() function writes detailed evidence ranking rows to the trend_evidence table, linked to the trend_windows row by its UUID. Each row records a document ID, its role (supporting or opposing), and the individual scoring components: rank score, weight component, impact component, recency component, confidence component, and sentiment value. Non-UUID document IDs (such as synthetic pattern signal IDs like pattern:AAPL:earnings:7d) are filtered out before insertion, since the trend_evidence table enforces a foreign key to the documents table.
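The UUID filtering step could be implemented along these lines. This is a sketch using the standard library's uuid module; the helper names are hypothetical, and the sample UUID is made up for the example.

```python
import uuid


def is_uuid(value):
    """True when the value parses as a UUID, False otherwise."""
    try:
        uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    return True


def filter_persistable_ids(document_ids):
    """Drop synthetic IDs that cannot satisfy the documents foreign key."""
    return [doc_id for doc_id in document_ids if is_uuid(doc_id)]


ids = [
    "3f2b8c1e-5d4a-4e6f-9a7b-1c2d3e4f5a6b",  # made-up example UUID
    "pattern:AAPL:earnings:7d",              # synthetic pattern signal ID
]
print(filter_persistable_ids(ids))
```

Note that uuid.UUID also accepts some alternative spellings, such as a 32-character hex string without hyphens, so a stricter check may be preferable if the column expects one canonical form.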

`trend_projections` — Forward-Looking Estimates

The persist_trend_projection() function in services/aggregation/projection.py inserts the TrendProjection into the trend_projections table, linked to the trend_windows row. The row stores the projected direction, strength, confidence, projection horizon, driving factors (as JSON), macro contribution percentage, divergence flag, and computation timestamp. Like trend history, projections accumulate over time, allowing analysis of how well the system's forward-looking estimates matched subsequent reality.

What Comes Next

At this point, the aggregation engine has transformed weighted signals into TrendSummary objects across five time windows, detected contradictions, ranked evidence, computed confidence, and persisted everything to PostgreSQL. The trend metrics — direction, strength, confidence, contradiction score — encode the accumulated weight of evidence for each ticker. But a TrendSummary is still an assessment, not an action. The next stage translates these assessments into concrete recommendations: should the system buy, sell, hold, or simply watch? And with what conviction? Page 5 — Recommendation Generation explains how the recommendation engine applies data quality suppression, eligibility evaluation, position sizing, thesis generation, and risk classification to convert trend summaries into actionable Recommendation objects that the trading engine can execute.
