feat: competitive intelligence & historical pattern matching layer
@@ -0,0 +1 @@
{"specId": "3e745894-9abc-49ff-97cc-c921f436bb32", "workflowType": "requirements-first", "specType": "feature"}
@@ -0,0 +1,581 @@
# Competitive Intelligence & Historical Pattern Matching Layer — Design

## Overview

This design adds a third signal layer to the Stonks Oracle aggregation engine: competitive intelligence and historical pattern matching. The layer mines existing PostgreSQL data (`document_impact_records`, `trend_windows`, and `document_company_mentions`) to identify how similar catalyst types resolved historically for a company and its competitors, then feeds pattern-based signals into the aggregation engine alongside company-specific (layer 1) and macro (layer 2) signals.

The design follows the same integration pattern as the macro interpolation layer: new modules within existing services, a runtime toggle in `risk_configs`, and the same `WeightedSignal` abstraction for aggregation. No new Kubernetes deployments are required.

### Design Rationale

- **Mine existing data, no new ingestion**: All pattern signals derive from data already in PostgreSQL: document intelligence, impact records, and trend windows. No new external data sources or ingestion pipelines are needed.
- **Reuse existing scoring pipeline**: Pattern signals convert to `WeightedSignal` objects using the same `compute_signal_weight` function, ensuring consistent recency decay, confidence gating, and contradiction detection.
- **Parallel to macro layer**: The competitive layer toggle, suppression logic, and aggregation integration mirror the macro layer's architecture for consistency.
- **Safety-first**: Low-confidence patterns (< 0.3) are excluded, pattern-only trend shifts are forced to informational mode, and the entire layer is independently toggleable.
- **Competitor relationships as first-class entities**: Both operator-defined and auto-inferred relationships, with strength scores that gate signal propagation.

## Architecture

The competitive intelligence layer adds five logical components within existing services:

```mermaid
flowchart TD
    subgraph SymReg["Symbol Registry (existing)"]
        CR[Competitor Registry]
        AI[Auto-Inference Engine]
    end

    subgraph Aggregation["Aggregation Service (existing)"]
        PM[Pattern Matcher]
        SPE[Signal Propagation Engine]
        AE[Aggregation Engine]
    end

    subgraph Recommendation["Recommendation Service (existing)"]
        PS[Pattern-Only Suppression]
    end

    subgraph LakePublisher["Lake Publisher (existing)"]
        LP[Competitive Fact Publisher]
    end

    subgraph QueryAPI["Query API (existing)"]
        PA[Pattern API Endpoints]
        CT[Competitive Toggle Endpoint]
    end

    subgraph Dashboard["Dashboard (existing)"]
        CP[Competitors Panel]
        HP[Historical Patterns Panel]
        CS[Competitive Signals Panel]
        DT[Decision Timeline]
    end

    CR -->|competitor relationships| SPE
    AI -->|inferred relationships| CR
    PM -->|historical patterns| SPE
    PM -->|self-company patterns| AE
    SPE -->|competitive signals| AE
    AE -->|trend summaries| PS
    SPE -->|signal records| LP
    CT -->|toggle state| AE
    PA --> CP
    PA --> HP
    PA --> CS
    PA --> DT
```

### Data Flow

1. **Competitor Management**: Operators define competitor relationships via the Symbol Registry API, or trigger auto-inference from sector/industry and document co-mentions. Relationships are stored in `competitor_relationships`.

2. **Pattern Mining**: When the aggregation engine runs for a ticker, the Pattern Matcher queries `document_impact_records` joined with `trend_windows` to find historical instances of the same catalyst type. It computes outcome statistics (`bullish_pct`, `bearish_pct`, `avg_strength`) and a `pattern_confidence` score.

3. **Signal Propagation**: The Signal Propagation Engine looks up the ticker's competitors, queries the Pattern Matcher for cross-company historical patterns, and produces `competitive_signal_records` weighted by the pattern's average strength × relationship strength × pattern confidence × source impact score.

4. **Aggregation**: Pattern signals (both self-company and competitive) are converted to `WeightedSignal` objects and merged into the existing signal list. The competitive layer toggle is checked from `risk_configs` at the start of each cycle.

5. **Recommendation Safety**: Pattern-only trend shifts (no supporting company-specific or macro signals) are forced to informational mode with a pattern-only caveat.

6. **Lake Publication**: Competitor relationships and competitive signal facts are published as partitioned Parquet datasets.

## Components and Interfaces

### Competitor Registry

**Location**: `services/symbol_registry/competitors.py` (new module, registered as a FastAPI router in `app.py`)

Manages competitor relationships with CRUD operations and audit logging.

```python
from datetime import datetime

from pydantic import BaseModel


class CompetitorRelationshipCreate(BaseModel):
    company_b_id: str
    relationship_type: str  # direct_rival | same_sector | overlapping_products | supply_chain_adjacent
    strength: float  # [0, 1]
    bidirectional: bool = True
    source: str = "manual"  # manual | inferred


class CompetitorRelationship(BaseModel):
    id: str
    company_a_id: str
    company_b_id: str
    relationship_type: str
    strength: float
    bidirectional: bool
    source: str
    active: bool
    created_at: datetime
    updated_at: datetime
```

**API Endpoints** (on Symbol Registry):
- `POST /companies/{company_id}/competitors`: create relationship
- `GET /companies/{company_id}/competitors`: list relationships (ordered by strength desc)
- `PUT /companies/{company_id}/competitors/{relationship_id}`: update relationship
- `DELETE /companies/{company_id}/competitors/{relationship_id}`: soft-delete (set active=false)
- `POST /companies/{company_id}/competitors/infer`: trigger auto-inference

**Auto-Inference Logic** (`services/symbol_registry/competitor_inference.py`):
1. Query companies sharing the same sector and industry
2. Rank candidates by co-mention frequency in `document_company_mentions`
3. Compute strength = `0.3 * sector_match + 0.7 * normalized_co_mention_count`
4. Upsert relationships with `source='inferred'`, refreshing strength on re-inference
5. Return candidate list for operator review
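
As a rough illustration of step 3, the sketch below scores candidates that have already passed the sector/industry filter from step 1. The `InferenceCandidate` container, the `score_candidates` helper, and the choice to normalize co-mention counts by the maximum count are assumptions made for this example; the design does not state how `normalized_co_mention_count` is derived.

```python
from dataclasses import dataclass


@dataclass
class InferenceCandidate:
    """Hypothetical container for one auto-inference candidate."""
    ticker: str
    co_mention_count: int
    strength: float


def score_candidates(co_mentions: dict[str, int]) -> list[InferenceCandidate]:
    """Score same-sector candidates by co-mention frequency.

    Assumes every key in `co_mentions` already shares the target's sector and
    industry, so sector_match is 1.0, and that counts are normalized by the
    largest count in the batch (an assumption, not part of the design).
    """
    max_count = max(co_mentions.values(), default=0)
    candidates = []
    for ticker, count in co_mentions.items():
        normalized = count / max_count if max_count > 0 else 0.0
        # strength = 0.3 * sector_match + 0.7 * normalized_co_mention_count
        strength = min(round(0.3 * 1.0 + 0.7 * normalized, 4), 1.0)
        candidates.append(InferenceCandidate(ticker, count, strength))
    # Strongest candidates first, matching the API's strength-descending order.
    return sorted(candidates, key=lambda c: c.strength, reverse=True)
```

Under these assumptions a candidate with no co-mentions still receives the 0.3 sector floor, and strength never decreases as co-mentions increase, which is the behavior Property 5 asks for.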

### Pattern Matcher

**Location**: `services/aggregation/pattern_matcher.py`

Queries historical data to find how similar catalyst types resolved for a company or its competitors.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class HistoricalPattern:
    source_ticker: str  # company that received the original catalyst
    target_ticker: str  # company being evaluated (same as source for self-patterns)
    catalyst_type: str
    time_horizon: str  # 1d | 7d | 30d
    sample_count: int
    bullish_pct: float  # [0, 1]
    bearish_pct: float  # [0, 1]
    avg_strength: float  # [0, 1]
    avg_time_to_resolution: float  # days
    pattern_confidence: float  # [0, 1]
    data_start: datetime
    data_end: datetime
    tier: str  # major_corporate_decision | routine_signal
    insufficient_data: bool  # True when sample_count < 3
```

**Core Functions**:
- `find_self_patterns(pool, ticker, catalyst_type, horizons) -> list[HistoricalPattern]`
- `find_cross_company_patterns(pool, source_ticker, target_ticker, catalyst_type, horizons) -> list[HistoricalPattern]`
- `compute_pattern_confidence(sample_count, outcome_consistency, data_recency_days) -> float`
- `classify_catalyst_tier(catalyst_type) -> str`: returns `major_corporate_decision` or `routine_signal`

**Pattern Confidence Formula**:
```
sample_factor = min(sample_count / 20, 1.0)  # diminishing returns above 20
consistency = max(bullish_pct, bearish_pct)  # how uniform outcomes are
recency_factor = 1.0 if newest_within_90d else 0.7 if newest_within_180d else 0.4
confidence = sample_factor * 0.4 + consistency * 0.4 + recency_factor * 0.2
```

**Insufficient Data**: When `sample_count < 3`, confidence is capped at 0.25 and `insufficient_data = True`.

**Staleness Decay** (Req 9.2): When no instances exist in the last 90 days and all data is older than 180 days, a 0.5 decay penalty is applied to confidence.
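
The formula, the insufficient-data cap, and the staleness decay can be combined in one function. The sketch below is one possible reading, not the definitive implementation: it assumes `data_recency_days` is the age in days of the newest matching instance, and that the 0.5 staleness penalty is multiplicative. Neither detail is fixed by the text above.

```python
def compute_pattern_confidence(
    sample_count: int,
    outcome_consistency: float,
    data_recency_days: int,
) -> float:
    """Return a pattern confidence score in [0, 1].

    `outcome_consistency` is max(bullish_pct, bearish_pct).
    `data_recency_days` is assumed to be the age of the newest instance.
    """
    sample_factor = min(sample_count / 20, 1.0)  # diminishing returns above 20

    if data_recency_days <= 90:
        recency_factor = 1.0
    elif data_recency_days <= 180:
        recency_factor = 0.7
    else:
        recency_factor = 0.4

    confidence = (
        sample_factor * 0.4 + outcome_consistency * 0.4 + recency_factor * 0.2
    )

    # Staleness decay (Req 9.2): the newest instance is older than 180 days,
    # so nothing falls in the last 90 days either. Multiplying by 0.5 is an
    # assumed interpretation of "a 0.5 decay penalty".
    if data_recency_days > 180:
        confidence *= 0.5

    # Insufficient data: cap at 0.25 so the pattern stays below the 0.3 gate.
    if sample_count < 3:
        confidence = min(confidence, 0.25)

    return confidence
```

For instance, 10 samples with 0.8 consistency score 0.72 when the newest instance is 30 days old, and 0.3 when it is 200 days old under the multiplicative reading.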

**Catalyst Tier Classification** (Req 11.1):
- `major_corporate_decision`: catalyst types `m_and_a`, `legal`, `restructuring`, `leadership_change`, `strategic_pivot`, `buyback`, `dividend_change`
- `routine_signal`: all other catalyst types
- Major decisions use 365-day lookback; routine signals use 180-day lookback
- Major decisions receive a 1.3× base weight multiplier on pattern_confidence
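
The tier rules above reduce to a set lookup plus two small helpers. In the sketch below, `lookback_days` and `apply_tier_multiplier` are hypothetical helper names, and clamping the multiplied confidence to 1.0 is an assumption (the design keeps `pattern_confidence` in [0, 1] but does not say how the 1.3× multiplier is bounded).

```python
MAJOR_DECISION_CATALYSTS = frozenset({
    "m_and_a", "legal", "restructuring", "leadership_change",
    "strategic_pivot", "buyback", "dividend_change",
})


def classify_catalyst_tier(catalyst_type: str) -> str:
    """Map a catalyst type to its tier (Req 11.1)."""
    if catalyst_type in MAJOR_DECISION_CATALYSTS:
        return "major_corporate_decision"
    return "routine_signal"


def lookback_days(catalyst_type: str) -> int:
    """Hypothetical helper: 365 days for major decisions, 180 otherwise."""
    if classify_catalyst_tier(catalyst_type) == "major_corporate_decision":
        return 365
    return 180


def apply_tier_multiplier(confidence: float, catalyst_type: str) -> float:
    """Hypothetical helper: apply the 1.3x multiplier for major decisions.

    The clamp to 1.0 is an assumption made to keep the result in [0, 1].
    """
    if classify_catalyst_tier(catalyst_type) == "major_corporate_decision":
        return min(confidence * 1.3, 1.0)
    return confidence
```

Because classification is a pure set-membership test, it is deterministic for any input string, which is what Property 19 requires.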

**Historical Query**: Only considers `document_impact_records` linked to `document_intelligence` with `validation_status = 'valid'` and `documents` with `status != 'rejected'`.

### Signal Propagation Engine

**Location**: `services/aggregation/signal_propagation.py`

Evaluates incoming document intelligence, identifies competitors, queries historical patterns, and produces competitive signals.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class CompetitiveSignalRecord:
    source_document_id: str
    source_ticker: str
    target_ticker: str
    catalyst_type: str
    pattern_confidence: float
    signal_direction: str  # bullish | bearish
    signal_strength: float  # [0, 1]
    relationship_strength: float
    computed_at: datetime
```

**Core Functions**:
- `propagate_signals(pool, ticker, catalyst_type, impact_score, document_id, config) -> list[CompetitiveSignalRecord]`
- `build_pattern_weighted_signals(patterns, competitive_signals, reference_time, window, config) -> list[WeightedSignal]`

**Signal Weighting**:
```
signal_strength = pattern.avg_strength * relationship.strength * pattern.pattern_confidence * source_impact_score
signal_direction = "bullish" if pattern.bullish_pct > pattern.bearish_pct else "bearish"
```

**Propagation Threshold** (Req 4.5): Skip propagation when `relationship.strength < 0.2` (configurable).

**Confidence Threshold** (Req 9.1): Exclude patterns with `pattern_confidence < 0.3` (configurable).
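
The weighting formula and the two thresholds can be read together as a single gated computation. The following sketch uses a hypothetical `compute_competitive_signal` helper with flat arguments instead of the real pattern and relationship objects; it returns `None` when either gate rejects the pair.

```python
from typing import Optional


def compute_competitive_signal(
    avg_strength: float,
    bullish_pct: float,
    bearish_pct: float,
    pattern_confidence: float,
    relationship_strength: float,
    source_impact_score: float,
    propagation_strength_threshold: float = 0.2,
    pattern_confidence_threshold: float = 0.3,
) -> Optional[tuple[str, float]]:
    """Return (direction, strength), or None when a threshold gate applies."""
    # Req 4.5: weak relationships do not propagate signals.
    if relationship_strength < propagation_strength_threshold:
        return None
    # Req 9.1: low-confidence patterns are excluded.
    if pattern_confidence < pattern_confidence_threshold:
        return None

    strength = (
        avg_strength
        * relationship_strength
        * pattern_confidence
        * source_impact_score
    )
    # A tie between bullish and bearish resolves to "bearish", as in the formula.
    direction = "bullish" if bullish_pct > bearish_pct else "bearish"
    return direction, strength
```

Since every factor lies in [0, 1] and they are multiplied, the resulting strength is also in [0, 1] and cannot decrease when any single factor increases.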

### Aggregation Engine Extensions

**Location**: Modified `services/aggregation/worker.py`

The existing `aggregate_company_window` function is extended to:
1. Check the competitive layer toggle from `risk_configs` (same pattern as macro toggle)
2. Query self-company historical patterns for active catalyst types in the window
3. Query competitive signals targeting this ticker
4. Convert pattern/competitive signals to `WeightedSignal` objects
5. Merge with company-specific and macro signals before computing the trend summary

**New config fields on `AggregationConfig`**:
```python
competitive_signal_weight: float = 0.2  # relative weight of pattern signals
competitive_enabled: bool = True  # runtime toggle state
```

**Pattern signal conversion**: Each pattern signal is converted to a `WeightedSignal` using:
- `document_id` = source document that triggered the pattern lookup (for evidence tracing)
- `sentiment_value` = +1.0 if pattern direction is bullish, -1.0 if bearish
- `impact_score` = `signal_strength * competitive_signal_weight`
- Recency decay uses the source document's publication time
- Confidence gating uses `pattern_confidence` as the extraction confidence
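
The conversion rules above can be sketched as follows. The real `WeightedSignal` class is not shown in this document, so `WeightedSignalSketch` is a hypothetical stand-in carrying only the fields the rules mention; the field names `extraction_confidence` and `published_at` and the `to_weighted_signal` helper are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class WeightedSignalSketch:
    """Hypothetical stand-in for the existing WeightedSignal abstraction."""
    document_id: str
    sentiment_value: float
    impact_score: float
    extraction_confidence: float  # assumed name; fed by pattern_confidence
    published_at: datetime        # assumed name; drives recency decay


def to_weighted_signal(
    document_id: str,
    signal_direction: str,
    signal_strength: float,
    pattern_confidence: float,
    published_at: datetime,
    competitive_signal_weight: float = 0.2,
) -> WeightedSignalSketch:
    """Convert one pattern or competitive signal into a weighted signal."""
    if signal_direction not in ("bullish", "bearish"):
        raise ValueError(f"unexpected direction: {signal_direction!r}")
    return WeightedSignalSketch(
        document_id=document_id,
        sentiment_value=1.0 if signal_direction == "bullish" else -1.0,
        impact_score=signal_strength * competitive_signal_weight,
        extraction_confidence=pattern_confidence,
        published_at=published_at,
    )
```

Keeping the source document's ID and publication time on the converted signal is what lets the existing recency decay and evidence tracing apply unchanged.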

**No-degradation guarantee** (Req 5.5): When no patterns or competitive signals exist, the aggregation produces identical output to the two-layer engine.

### Pattern-Only Suppression

**Location**: Extended `services/recommendation/suppression.py`

New suppression check mirroring `evaluate_macro_only_suppression`:

```python
PATTERN_ONLY_CAVEAT = (
    "[Pattern-only signal] This trend direction is driven solely by historical "
    "pattern and competitive signals with no supporting company-specific or macro "
    "evidence. Recommendation is informational only."
)


# TrendSummary is the existing trend summary model (import omitted here).
def evaluate_pattern_only_suppression(
    summary: TrendSummary,
    pattern_signal_count: int,
    company_signal_count: int,
    macro_signal_count: int,
) -> bool:
    """Signature only; returns True when the trend is pattern-only."""
    ...
```

New `SuppressionReason` enum value: `PATTERN_ONLY_SIGNAL = "pattern_only_signal"`
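
One plausible reading of the suppression rule is sketched below. It leaves out the `summary` argument and works on the three counts alone, assuming each count covers only signals that support the trend direction. The helper names `is_pattern_only` and `apply_pattern_only_mode`, and the `"actionable"` mode label in the usage note, are hypothetical.

```python
def is_pattern_only(
    pattern_signal_count: int,
    company_signal_count: int,
    macro_signal_count: int,
) -> bool:
    """True when only pattern/competitive signals support the direction."""
    return (
        pattern_signal_count > 0
        and company_signal_count == 0
        and macro_signal_count == 0
    )


def apply_pattern_only_mode(
    mode: str,
    thesis: str,
    pattern_only: bool,
    caveat: str,
) -> tuple[str, str]:
    """Force informational mode and prepend the caveat when pattern-only."""
    if not pattern_only:
        return mode, thesis
    return "informational", f"{caveat} {thesis}".strip()
```

For example, a recommendation that would otherwise be `"actionable"` comes back as `"informational"` with the caveat leading the thesis, while a recommendation backed by at least one company-specific or macro signal passes through untouched.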

### Query API Extensions

**Location**: Extended `services/api/app.py`

New endpoints:
- `GET /api/patterns/{ticker}`: historical patterns for a company, filterable by `catalyst_type` and `time_horizon`
- `GET /api/patterns/{ticker}/competitors`: cross-company patterns showing how this company's catalysts affected competitors
- `GET /api/patterns/{ticker}/competitive-signals`: recent competitive signals targeting this company
- `GET /api/patterns/{ticker}/decisions`: major corporate decision history with trend outcomes
- `GET /api/admin/competitive/status`: competitive layer enabled/disabled state
- `PUT /api/admin/competitive/toggle`: toggle competitive layer on/off

### Dashboard Extensions

**Location**: Extended `frontend/src/`

New panels on Company Detail page (new tabs alongside existing sources/aliases/macro):
- **Competitors tab**: Active competitor relationships with ticker, relationship_type, strength, source
- **Historical Patterns tab**: Recent patterns for the company, showing catalyst_type, outcome distribution, sample_count, confidence
- **Competitive Signals tab**: Incoming competitive signals, showing source ticker, catalyst_type, direction, strength
- **Decisions tab**: Corporate decision timeline of major events with catalyst type, date, summary, trend outcome

Trend detail page extensions:
- Visual distinction for pattern-based and competitive signal evidence (badge/icon differentiation)
- Click-through on competitive signals showing full signal detail

Trading Controls page:
- Competitive layer toggle alongside existing macro toggle, with confirmation dialog

## Data Models

### New PostgreSQL Tables (Migration 017)

#### `competitor_relationships`
```sql
CREATE TABLE competitor_relationships (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    company_a_id UUID NOT NULL REFERENCES companies(id),
    company_b_id UUID NOT NULL REFERENCES companies(id),
    relationship_type VARCHAR(30) NOT NULL,
    strength FLOAT NOT NULL DEFAULT 0.5,
    bidirectional BOOLEAN NOT NULL DEFAULT TRUE,
    source VARCHAR(20) NOT NULL DEFAULT 'manual',
    active BOOLEAN NOT NULL DEFAULT TRUE,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    CONSTRAINT chk_relationship_type CHECK (
        relationship_type IN ('direct_rival', 'same_sector', 'overlapping_products', 'supply_chain_adjacent')
    ),
    CONSTRAINT chk_strength CHECK (strength >= 0 AND strength <= 1),
    CONSTRAINT chk_source CHECK (source IN ('manual', 'inferred')),
    CONSTRAINT chk_different_companies CHECK (company_a_id != company_b_id)
);

CREATE INDEX idx_competitor_rel_company_a ON competitor_relationships(company_a_id) WHERE active = TRUE;
CREATE INDEX idx_competitor_rel_company_b ON competitor_relationships(company_b_id) WHERE active = TRUE;
CREATE UNIQUE INDEX idx_competitor_rel_unique_pair ON competitor_relationships(
    LEAST(company_a_id, company_b_id), GREATEST(company_a_id, company_b_id)
) WHERE active = TRUE;
```

#### `competitive_signal_records`
```sql
CREATE TABLE competitive_signal_records (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    source_document_id UUID REFERENCES documents(id),
    source_ticker VARCHAR(20) NOT NULL,
    target_ticker VARCHAR(20) NOT NULL,
    catalyst_type VARCHAR(50) NOT NULL,
    pattern_confidence FLOAT NOT NULL,
    signal_direction VARCHAR(20) NOT NULL,
    signal_strength FLOAT NOT NULL,
    relationship_strength FLOAT NOT NULL,
    computed_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_competitive_signals_target ON competitive_signal_records(target_ticker, computed_at DESC);
CREATE INDEX idx_competitive_signals_source ON competitive_signal_records(source_ticker, computed_at DESC);
```

### New Pydantic Schemas

Added to `services/shared/schemas.py`:

```python
from enum import Enum


class RelationshipType(str, Enum):
    DIRECT_RIVAL = "direct_rival"
    SAME_SECTOR = "same_sector"
    OVERLAPPING_PRODUCTS = "overlapping_products"
    SUPPLY_CHAIN_ADJACENT = "supply_chain_adjacent"


class CatalystTier(str, Enum):
    MAJOR_CORPORATE_DECISION = "major_corporate_decision"
    ROUTINE_SIGNAL = "routine_signal"


# Major corporate decision catalyst types (Req 11.1)
MAJOR_DECISION_CATALYSTS = frozenset({
    "m_and_a", "legal", "restructuring", "leadership_change",
    "strategic_pivot", "buyback", "dividend_change",
})
```

### New `CompetitiveConfig` in `services/shared/config.py`

```python
from dataclasses import dataclass


@dataclass
class CompetitiveConfig:
    competitive_signal_weight: float = 0.2
    competitive_enabled: bool = True
    pattern_confidence_threshold: float = 0.3
    propagation_strength_threshold: float = 0.2
    routine_lookback_days: int = 180
    major_decision_lookback_days: int = 365
    major_decision_weight_multiplier: float = 1.3
    staleness_window_days: int = 180
    staleness_recent_days: int = 90
    staleness_decay_penalty: float = 0.5
    min_pattern_samples: int = 3
```

### Analytical Lake Datasets

New fact tables published to MinIO under `stonks-lakehouse/`:

- `lake.competitor_relationships`: partitioned by `dt`, columns: id, company_a_id, company_b_id, relationship_type, strength, bidirectional, source, active, created_at
- `lake.competitive_signals`: partitioned by `dt` and `target_ticker`, columns: id, source_document_id, source_ticker, target_ticker, catalyst_type, pattern_confidence, signal_direction, signal_strength, relationship_strength, computed_at

## Correctness Properties

*A property is a characteristic or behavior that should hold true across all valid executions of a system; essentially, a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.*

### Property 1: Competitor relationship persistence round-trip

*For any* valid CompetitorRelationship object with valid company IDs, relationship_type, strength in [0, 1], bidirectional flag, and source, persisting it to PostgreSQL and reading it back SHALL produce an equivalent object with all fields preserved.

**Validates: Requirements 1.1, 7.1**

### Property 2: Competitor query completeness and ordering

*For any* set of competitor relationships involving a company (as either company_a or company_b), querying competitors for that company SHALL return all active relationships containing that company, and the results SHALL be ordered by strength descending.

**Validates: Requirements 1.2**

### Property 3: Soft-delete preserves row

*For any* active competitor relationship, deleting it SHALL set `active = False` while preserving the row in the database with all original field values intact.

**Validates: Requirements 1.3**

### Property 4: Auto-inference produces valid candidates

*For any* company with a defined sector and industry, running auto-inference SHALL produce only candidate relationships where the candidate company shares the same sector and industry, and all produced relationships SHALL have `source = 'inferred'` with strength in [0, 1].

**Validates: Requirements 2.1, 2.3**

### Property 5: Auto-inference ranks by co-mention frequency

*For any* set of candidate competitors with different co-mention counts in `document_company_mentions`, the auto-inferred relationships SHALL have strength scores that are monotonically non-decreasing with co-mention frequency: candidates with more co-mentions receive higher or equal strength scores.

**Validates: Requirements 2.2**

### Property 6: Auto-inference idempotence

*For any* company, running auto-inference twice in succession SHALL produce the same set of relationships (no duplicates created), with strength scores updated to reflect the latest co-mention data.

**Validates: Requirements 2.4**

### Property 7: Pattern computation correctness

*For any* set of historical `document_impact_records` and `trend_windows` for a company-catalyst pair (or cross-company pair), the computed HistoricalPattern SHALL have: `sample_count` equal to the actual number of matching records, `bullish_pct + bearish_pct + neutral_pct ≈ 1.0` (the `HistoricalPattern` dataclass defines no `neutral_pct` field, so the neutral share is taken here as the implied remainder, which requires `bullish_pct + bearish_pct <= 1.0`), `avg_strength` equal to the mean of the matched trend strengths, and all fields within their valid ranges.

**Validates: Requirements 3.1, 3.2, 4.2**

### Property 8: Pattern confidence monotonicity

*For any* two HistoricalPatterns where one has strictly more samples, more consistent outcomes, and more recent data than the other (all else equal), the first SHALL have a higher or equal `pattern_confidence`. Additionally, *for any* two patterns with identical statistics but different tiers, the `major_corporate_decision` pattern SHALL have higher confidence than the `routine_signal` pattern.

**Validates: Requirements 3.3, 11.2**

### Property 9: Insufficient data threshold

*For any* HistoricalPattern with `sample_count < 3`, the `pattern_confidence` SHALL be below 0.3 and `insufficient_data` SHALL be True.

**Validates: Requirements 3.4**

### Property 10: Valid-only data filtering

*For any* set of `document_impact_records` containing records linked to invalid intelligence (`validation_status != 'valid'`) or rejected documents (`status = 'rejected'`), the Pattern_Matcher SHALL exclude those records from pattern computation: the resulting `sample_count` SHALL only reflect valid, non-rejected records.

**Validates: Requirements 3.5**

### Property 11: Competitive signal strength monotonicity

*For any* competitive signal computation, increasing the relationship strength, pattern confidence, or source impact score (while holding others constant) SHALL produce a `signal_strength` that is greater than or equal to the previous value.

**Validates: Requirements 4.3**

### Property 12: Signal propagation threshold gating

*For any* competitor relationship with `strength < 0.2` (configurable), the Signal_Propagation_Engine SHALL produce zero competitive signals for that pair. Similarly, *for any* HistoricalPattern with `pattern_confidence < 0.3` (configurable), the pattern SHALL be excluded from competitive signal computation.

**Validates: Requirements 4.5, 9.1**

### Property 13: Pattern signal to WeightedSignal conversion

*For any* pattern-based signal converted to a WeightedSignal, the resulting object SHALL have: `sentiment_value` of +1.0 for bullish patterns or -1.0 for bearish patterns, `impact_score` equal to `signal_strength * competitive_signal_weight`, confidence gating applied using `pattern_confidence`, and recency decay based on the source document's publication time.

**Validates: Requirements 5.2**

### Property 14: Pattern-company contradiction detection

*For any* set of signals where pattern-based signals have a direction opposing company-specific signals (e.g., the pattern is bearish while company signals are bullish), the resulting trend summary's `contradiction_score` SHALL be greater than zero and `disagreement_details` SHALL contain at least one entry.

**Validates: Requirements 5.3**

### Property 15: Pattern evidence traceability

*For any* trend summary that includes pattern-based or competitive signal contributions, the `top_supporting_evidence` or `top_opposing_evidence` lists SHALL contain the `source_document_id` of at least one contributing pattern signal.

**Validates: Requirements 5.4**

### Property 16: No-degradation and disabled-layer equivalence

*For any* company with no historical patterns or competitive signals in the aggregation window, the trend summary produced with the competitive layer enabled SHALL be identical to the summary produced with it disabled. Furthermore, *for any* aggregation run with the competitive layer disabled, the output SHALL be identical to company+macro-only aggregation regardless of existing pattern data.

**Validates: Requirements 5.5, 6.2**

### Property 17: Staleness decay penalty

*For any* HistoricalPattern where all historical instances are older than 180 days and no instances exist within the last 90 days, the `pattern_confidence` SHALL be strictly less than the confidence computed for an identical pattern with at least one instance within the last 90 days.

**Validates: Requirements 9.2**

### Property 18: Pattern-only suppression

*For any* trend summary where the trend direction is driven solely by pattern-based and competitive signals (no company-specific or macro signals support the direction), the resulting recommendation SHALL have `mode = 'informational'` and the thesis SHALL contain a pattern-only caveat.

**Validates: Requirements 9.3**

### Property 19: Catalyst tier classification determinism

*For any* catalyst type, the tier classification SHALL be deterministic: `m_and_a`, `legal`, `restructuring`, `leadership_change`, `strategic_pivot`, `buyback`, and `dividend_change` SHALL always map to `major_corporate_decision`; all other catalyst types SHALL map to `routine_signal`.

**Validates: Requirements 11.1**

### Property 20: Major decision extended lookback

*For any* pattern mining query for a `major_corporate_decision` catalyst type, the lookback window SHALL be 365 days. *For any* `routine_signal` catalyst type, the lookback window SHALL be 180 days. This applies to both self-company and cross-company pattern queries.

**Validates: Requirements 11.3, 11.5**

### Property 21: Competitive signal persistence round-trip

*For any* valid CompetitiveSignalRecord with all required fields (source_document_id, source_ticker, target_ticker, catalyst_type, pattern_confidence, signal_direction, signal_strength, relationship_strength, computed_at), persisting it to PostgreSQL and reading it back SHALL produce an equivalent record with all fields preserved.

**Validates: Requirements 4.4, 7.2**

## Error Handling

### Pattern Mining Failures
- Database errors during historical pattern queries are logged and the pattern is treated as "no data"; the aggregation engine continues with company-specific and macro signals only.
- Malformed or missing `trend_windows` data for a historical period results in that period being excluded from pattern computation (reduced sample_count) rather than failing the entire query.

### Signal Propagation Failures
- If competitor relationship lookup fails, propagation is skipped for that ticker and logged. Aggregation continues with self-company patterns only.
- If pattern mining fails for a specific competitor, that competitor is skipped. Other competitors are still processed.
- Sustained propagation errors exceeding a configurable threshold (default 5 consecutive failures) trigger an operator alert via the existing alerting framework.
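
The consecutive-failure rule in the last bullet could be tracked with a small counter like the one below. `ConsecutiveFailureTracker` is a hypothetical helper, not part of the existing alerting framework, and it assumes the alert fires once the streak reaches the threshold (the text says "exceeding", so the exact boundary is a judgment call).

```python
class ConsecutiveFailureTracker:
    """Hypothetical helper that counts consecutive propagation failures."""

    def __init__(self, threshold: int = 5) -> None:
        self.threshold = threshold
        self.consecutive_failures = 0

    def record(self, success: bool) -> bool:
        """Record one propagation attempt.

        Returns True when an operator alert should be raised, which this
        sketch assumes happens once the failure streak reaches the threshold.
        Any success resets the streak.
        """
        if success:
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        return self.consecutive_failures >= self.threshold
```

A caller would invoke `record(False)` in the propagation error handler and pass a `True` result on to whatever alerting hook the service already uses.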

### Auto-Inference Failures
- If the `document_company_mentions` table is empty or the query fails, auto-inference returns an empty candidate list with a warning. No relationships are created or modified.
- If sector/industry data is missing for the target company, inference is skipped with a 400 response.

### Competitor Registry Failures
- Attempting to create a relationship between a company and itself (company_a_id == company_b_id) returns a 400 error.
- Attempting to create a duplicate active relationship returns a 409 conflict.
- Foreign key violations (non-existent company IDs) return a 404 error.

### Runtime Toggle Safety
- Toggle state is read from PostgreSQL at the start of each aggregation cycle, following the same pattern as the macro toggle, with no caching.
- Toggle changes are audit-logged with operator identity, previous state, and new state.
- Disabling the competitive layer does not delete any data: pattern mining remains queryable via the API, and only aggregation integration is skipped.

### Graceful Degradation
- The competitive layer is designed to be fully optional. Any failure in pattern mining, signal propagation, or competitive signal computation results in the aggregation engine falling back to company-specific + macro signals with no degradation of existing behavior.

## Testing Strategy

### Property-Based Testing

This feature is well-suited for property-based testing. The core logic (pattern confidence computation, signal strength weighting, threshold gating, catalyst tier classification, and overlap/monotonicity properties) consists of pure functions with clear input/output behavior and a large input space.

**Library**: [Hypothesis](https://hypothesis.readthedocs.io/) for Python property-based testing.

**Configuration**: Minimum 100 iterations per property test.

**Tag format**: `Feature: competitive-historical-patterns, Property {number}: {property_text}`

Each correctness property maps to one property-based test. Generators will produce:
- Random `CompetitorRelationship` objects with valid relationship types, strength in [0, 1], and source values
- Random `HistoricalPattern` objects with valid sample counts, percentage distributions summing to ~1.0, and confidence scores
- Random `CompetitiveSignalRecord` objects with valid direction, strength, and confidence values
- Random sets of `WeightedSignal` objects with mixed sentiment values for contradiction testing
- Random catalyst types drawn from both major decision and routine signal categories
|
||||
|
||||
### Unit Testing
|
||||
|
||||
Unit tests complement property tests for specific examples and edge cases:
|
||||
- API endpoint response codes and error handling (CRUD operations, validation errors, 404s, 409s)
|
||||
- Dashboard component rendering with mock data (competitors panel, patterns panel, signals panel, decision timeline)
|
||||
- Toggle state transitions and audit logging
|
||||
- Auto-inference with empty data, single company, no co-mentions
|
||||
- Pattern mining with zero results, exactly 3 results (boundary), mixed valid/invalid records
|
||||
|
||||
### Integration Testing
|
||||
|
||||
Integration tests verify end-to-end flows:
|
||||
- Full aggregation cycle with competitive layer enabled: document intelligence → pattern mining → signal propagation → trend summary
|
||||
- Lake publisher producing Parquet datasets for competitor relationships and competitive signals
|
||||
- Toggle disable/re-enable cycle preserving data integrity
|
||||
- API endpoints returning correct data from PostgreSQL
|
||||
- Dashboard pages rendering with live API data
|
||||
@@ -0,0 +1,157 @@
|
||||
# Requirements Document — Competitive Intelligence & Historical Pattern Matching Layer
|
||||
|
||||
## Introduction
|
||||
|
||||
This feature adds a third signal layer to the Stonks Oracle aggregation engine: competitive intelligence and historical pattern matching. The existing platform produces per-company trend summaries from two signal sources — company-specific document intelligence (layer 1) and global macro news interpolation (layer 2). This extension introduces a third parallel signal path that mines the existing `document_intelligence`, `document_impact_records`, and `trend_windows` tables to identify historical patterns — how similar catalyst types for the same company or its competitors resolved in the past — and uses those patterns to reinforce or weaken current trend signals.
|
||||
|
||||
The core insight is that competitive dynamics are predictable: when a company receives a bullish product catalyst, its direct competitors often experience a measurable bearish reaction within a short window. By mining the platform's own historical data for these patterns, the system can propagate signals across competitor relationships and weight current trends based on how similar situations resolved historically.
|
||||
|
||||
This layer does not ingest new external data. It mines existing data already in PostgreSQL — sentiment, catalyst types, impact scores from `document_impact_records`, and historical direction/strength outcomes from `trend_windows` — to produce pattern-based signals that feed into the aggregation engine alongside the other two layers.
|
||||
|
||||
## Glossary
|
||||
|
||||
- **Competitor_Relationship**: A directional or bidirectional link between two tracked companies indicating they compete in the same market segment. Relationships have a strength score in [0, 1] and a relationship_type (direct_rival, same_sector, overlapping_products, supply_chain_adjacent).
|
||||
- **Competitor_Registry**: The component within the Symbol_Registry that manages Competitor_Relationships, supporting both operator-defined and auto-inferred relationships.
|
||||
- **Historical_Pattern**: A statistical summary derived from past `document_impact_records` and `trend_windows` data, describing how a specific catalyst_type for a specific company (or its competitors) historically correlated with trend outcomes within a given time horizon.
|
||||
- **Pattern_Matcher**: The component that queries historical data to find past instances of similar catalyst types for a company or its competitors, computes outcome statistics, and produces Historical_Pattern objects.
|
||||
- **Pattern_Signal**: A weighted signal derived from a Historical_Pattern that feeds into the Aggregation_Engine, representing the historical tendency for a given catalyst type to produce a specific trend outcome.
|
||||
- **Competitive_Signal**: A Pattern_Signal that propagates from one company's news event to a competitor, based on historical evidence of how similar events affected the competitor in the past.
|
||||
- **Signal_Propagation_Engine**: The component that evaluates incoming document intelligence for a company, identifies its competitors via the Competitor_Registry, queries the Pattern_Matcher for historical precedents, and produces Competitive_Signals for affected competitors.
|
||||
- **Aggregation_Engine**: The existing trend aggregation system (services/aggregation/) that computes rolling trend summaries from document intelligence signals, macro signals, and now pattern-based signals.
|
||||
- **Pattern_Confidence**: A score in [0, 1] reflecting how statistically reliable a Historical_Pattern is, based on sample size, consistency of outcomes, and recency of the historical data.
|
||||
- **Competitive_Layer_Toggle**: A runtime switch allowing operators to enable or disable the competitive/historical pattern signal layer without redeployment, analogous to the macro layer toggle.
|
||||
|
||||
## Requirements
|
||||
|
||||
### Requirement 1: Competitor Relationship Management
|
||||
|
||||
**User Story:** As an operator, I want to define which companies are competitors of each other, so that the platform can propagate signals across competitive relationships.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. WHEN an operator creates a Competitor_Relationship between two companies, THE Competitor_Registry SHALL persist the relationship containing: company_a_id, company_b_id, relationship_type (one of direct_rival, same_sector, overlapping_products, supply_chain_adjacent), strength (a float in [0, 1] representing how closely the companies compete), bidirectional flag (whether the relationship applies in both directions), and source (manual or inferred).
|
||||
2. WHEN an operator queries competitors for a given company, THE Competitor_Registry SHALL return all active Competitor_Relationships where the company appears as either company_a or company_b, ordered by strength descending.
|
||||
3. WHEN an operator deletes a Competitor_Relationship, THE Competitor_Registry SHALL soft-delete the relationship by marking it inactive rather than removing the row, preserving audit history.
|
||||
4. THE Competitor_Registry SHALL expose Competitor_Relationship CRUD operations through the Symbol_Registry REST API.
|
||||
5. WHEN a Competitor_Relationship is created or updated, THE Competitor_Registry SHALL record an audit event with the previous state, new state, and the operator who made the change.
|
||||
|
||||
### Requirement 2: Competitor Auto-Inference
|
||||
|
||||
**User Story:** As an operator, I want the platform to automatically suggest competitor relationships based on sector, industry, and document co-mentions, so that I do not have to manually define every relationship.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. WHEN an operator triggers competitor auto-inference for a company, THE Competitor_Registry SHALL identify candidate competitors by matching companies that share the same sector and industry fields in the companies table.
|
||||
2. WHEN the Competitor_Registry identifies sector-based candidates, THE Competitor_Registry SHALL further rank candidates by counting co-mentions in the document_company_mentions table — companies frequently mentioned in the same documents receive higher strength scores.
|
||||
3. WHEN the Competitor_Registry produces auto-inferred relationships, THE Competitor_Registry SHALL mark each relationship with source `inferred` and a strength score derived from the sector match and co-mention frequency, distinguishing them from operator-defined relationships marked as source `manual`.
|
||||
4. WHEN auto-inferred relationships already exist for a company, THE Competitor_Registry SHALL refresh them on re-inference rather than creating duplicates, updating strength scores based on the latest co-mention data.
|
||||
5. THE Competitor_Registry SHALL expose an inference endpoint at `POST /companies/{company_id}/competitors/infer` that triggers auto-inference and returns the resulting candidate relationships.
|
||||
|
||||
### Requirement 3: Historical Pattern Mining
|
||||
|
||||
**User Story:** As a strategist, I want the platform to mine its historical data to find how similar catalyst types resolved in the past for a given company, so that current signals can be weighted by historical precedent.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. WHEN the Pattern_Matcher receives a query for a company and catalyst_type, THE Pattern_Matcher SHALL search the document_impact_records table for past instances where the same company received the same catalyst_type, and join against trend_windows to determine the trend direction and strength that followed within configurable time horizons (default: 1d, 7d, 30d).
|
||||
2. WHEN the Pattern_Matcher finds historical instances, THE Pattern_Matcher SHALL compute a Historical_Pattern containing: company ticker, catalyst_type, time_horizon, sample_count (number of historical instances found), bullish_pct (percentage of instances that resolved bullish), bearish_pct (percentage that resolved bearish), avg_strength (average trend strength of the outcomes), avg_time_to_resolution (average days until the trend direction stabilized), and pattern_confidence (a score reflecting statistical reliability).
|
||||
3. WHEN computing pattern_confidence, THE Pattern_Matcher SHALL weight the score by sample_count (more samples increase confidence, with diminishing returns above 20 samples), outcome_consistency (how uniform the historical outcomes are — 90% bullish is more confident than 55% bullish), and data_recency (patterns from the last 90 days receive higher weight than patterns from 180+ days ago).
|
||||
4. WHEN the Pattern_Matcher finds fewer than 3 historical instances for a company-catalyst pair, THE Pattern_Matcher SHALL mark the pattern_confidence as low (below 0.3) and flag the pattern as insufficient_data.
|
||||
5. WHEN the Pattern_Matcher queries historical data, THE Pattern_Matcher SHALL only consider document_impact_records linked to document_intelligence with validation_status `valid` and documents with status not equal to `rejected`.
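
The outcome statistics in criterion 2 can be sketched as a pure function over already-filtered historical outcomes. The `Outcome` type and `summarize_outcomes` name are hypothetical, the percentages are expressed as fractions in [0, 1], and any direction other than bullish or bearish is assumed to count as neutral.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    direction: str   # "bullish", "bearish", or anything else (treated as neutral)
    strength: float  # trend strength that followed the catalyst


def summarize_outcomes(outcomes: list[Outcome], min_samples: int = 3) -> dict:
    """Compute the distribution fields of a Historical_Pattern from raw outcomes."""
    n = len(outcomes)
    if n == 0:
        return {"sample_count": 0, "bullish_pct": 0.0, "bearish_pct": 0.0,
                "avg_strength": 0.0, "insufficient_data": True}
    bullish = sum(1 for o in outcomes if o.direction == "bullish")
    bearish = sum(1 for o in outcomes if o.direction == "bearish")
    return {
        "sample_count": n,
        "bullish_pct": bullish / n,
        "bearish_pct": bearish / n,
        "avg_strength": sum(o.strength for o in outcomes) / n,
        # Fewer than min_samples instances flags the pattern (criterion 4).
        "insufficient_data": n < min_samples,
    }
```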
|
||||
|
||||
### Requirement 4: Competitive Signal Propagation
|
||||
|
||||
**User Story:** As a strategist, I want the platform to evaluate how news about one company historically affected its competitors, so that competitor news can inform a company's trend assessment.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. WHEN new document intelligence is produced for a company, THE Signal_Propagation_Engine SHALL identify the company's active competitors via the Competitor_Registry and query the Pattern_Matcher for historical instances where the same catalyst_type hitting the source company correlated with trend outcomes for each competitor.
|
||||
2. WHEN the Pattern_Matcher finds historical cross-company patterns, THE Pattern_Matcher SHALL compute a Historical_Pattern for the competitor containing: source_ticker (the company that received the original catalyst), target_ticker (the competitor), catalyst_type, time_horizon, sample_count, bullish_pct, bearish_pct, avg_strength, and pattern_confidence.
|
||||
3. WHEN the Signal_Propagation_Engine produces a Competitive_Signal for a competitor, THE Signal_Propagation_Engine SHALL weight the signal by the Competitor_Relationship strength, the Historical_Pattern's pattern_confidence, and the source document's impact_score.
|
||||
4. WHEN a Competitive_Signal is produced, THE Signal_Propagation_Engine SHALL persist a competitive_signal_record containing: source_document_id, source_ticker, target_ticker, catalyst_type, pattern_confidence, signal_direction (bullish or bearish based on historical pattern), signal_strength, relationship_strength, and computed_at timestamp.
|
||||
5. WHEN the Competitor_Relationship strength is below a configurable threshold (default 0.2), THE Signal_Propagation_Engine SHALL skip signal propagation for that competitor pair and log the skip reason.
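
The weighting and gating in these criteria can be sketched as follows. The multiplicative formula and the direction rule are taken from the implementation plan for `signal_propagation.py`; the `Pattern` type and `competitive_signal` function are hypothetical names, and the pattern confidence gate (default 0.3) comes from Requirement 9.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pattern:
    avg_strength: float
    pattern_confidence: float
    bullish_pct: float
    bearish_pct: float


def competitive_signal(
    pattern: Pattern,
    relationship_strength: float,
    source_impact_score: float,
    propagation_strength_threshold: float = 0.2,
    pattern_confidence_threshold: float = 0.3,
) -> Optional[tuple[str, float]]:
    """Return (direction, strength), or None when the pair or pattern is gated out."""
    if relationship_strength < propagation_strength_threshold:
        return None  # weak relationship: skip propagation for this pair
    if pattern.pattern_confidence < pattern_confidence_threshold:
        return None  # low-confidence pattern: excluded from computation
    strength = (pattern.avg_strength * relationship_strength
                * pattern.pattern_confidence * source_impact_score)
    direction = "bullish" if pattern.bullish_pct > pattern.bearish_pct else "bearish"
    return direction, strength
```

Because every factor lies in [0, 1], the product can only shrink the signal, so a weak relationship or a shaky pattern dampens the propagated strength even when it passes the gates.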
|
||||
|
||||
### Requirement 5: Pattern-Based Trend Reinforcement
|
||||
|
||||
**User Story:** As a strategist, I want historical patterns to strengthen or weaken current trend signals, so that the aggregation engine accounts for how similar situations resolved in the past.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. WHEN the Aggregation_Engine computes a company trend summary, THE Aggregation_Engine SHALL include pattern-based signals (both self-company historical patterns and competitive signals) as additional weighted signals alongside existing document intelligence and macro signals.
|
||||
2. WHEN weighting pattern-based signals, THE Aggregation_Engine SHALL apply the pattern_confidence as a confidence gate, the Historical_Pattern's avg_strength as the impact_score, and recency decay based on the source document's publication time, consistent with existing signal scoring.
|
||||
3. WHEN a Historical_Pattern indicates a direction that contradicts the current company-specific signals, THE Aggregation_Engine SHALL represent the disagreement in the contradiction_score and disagreement_details fields, consistent with existing contradiction detection behavior.
|
||||
4. WHEN a trend summary includes pattern-based signal contributions, THE Aggregation_Engine SHALL include the source document IDs in the evidence references so that the pattern signal chain is traceable.
|
||||
5. WHEN no historical patterns or competitive signals exist for a company in the aggregation window, THE Aggregation_Engine SHALL produce the trend summary using only company-specific and macro signals, with no degradation of existing behavior.
|
||||
6. THE Aggregation_Engine SHALL expose a configurable weight parameter (competitive_signal_weight) that controls the relative influence of pattern-based signals versus other signal layers, defaulting to 0.2.
|
||||
|
||||
### Requirement 6: Competitive Layer Toggle
|
||||
|
||||
**User Story:** As an operator, I want to enable or disable the competitive intelligence and historical pattern layer at runtime without redeploying services, so that I can control whether historical patterns and competitor signals influence trend summaries.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. WHEN an operator toggles the competitive signal layer via the Trading Controls page or the API, THE System SHALL persist the setting in the risk_configs table and apply it immediately to subsequent aggregation cycles without requiring a service restart.
|
||||
2. WHEN the competitive signal layer is disabled, THE Aggregation_Engine SHALL skip all pattern-based and competitive signals and produce trend summaries using only company-specific document intelligence and macro signals (if enabled).
|
||||
3. WHEN the competitive signal layer is disabled, THE Pattern_Matcher SHALL continue to be queryable for historical patterns (so that the data remains available for manual analysis), but THE Signal_Propagation_Engine SHALL skip automatic competitive signal computation during aggregation.
|
||||
4. WHEN the competitive signal layer is re-enabled after being disabled, THE Signal_Propagation_Engine SHALL resume computing pattern-based and competitive signals using the latest historical data, including any document intelligence ingested while the layer was disabled.
|
||||
5. THE Query API SHALL expose a `GET /api/admin/competitive/status` endpoint returning the current enabled/disabled state and a `PUT /api/admin/competitive/toggle` endpoint to switch it.
|
||||
6. THE Dashboard Trading Controls page SHALL display the competitive signal layer toggle alongside the existing trading mode and macro layer controls, with a confirmation dialog for state changes.
|
||||
7. WHEN the competitive signal layer state changes, THE System SHALL record an audit event with the previous state, new state, and the operator who made the change.
|
||||
|
||||
### Requirement 7: Competitive Intelligence Storage
|
||||
|
||||
**User Story:** As a data engineer, I want competitor relationships, historical patterns, and competitive signals stored in both the operational database and the analytical lake, so that I can query competitive intelligence alongside other platform data.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. WHEN a Competitor_Relationship is created, THE System SHALL persist it in PostgreSQL with fields for company_a_id, company_b_id, relationship_type, strength, bidirectional, source, active status, and timestamps.
|
||||
2. WHEN a competitive_signal_record is produced, THE System SHALL persist it in PostgreSQL with fields for source_document_id, source_ticker, target_ticker, catalyst_type, pattern_confidence, signal_direction, signal_strength, relationship_strength, and computed_at timestamp.
|
||||
3. WHEN the Lake_Publisher runs, THE Lake_Publisher SHALL publish competitor relationship facts and competitive signal facts as partitioned Parquet datasets to MinIO under the `stonks-lakehouse` bucket.
|
||||
4. WHEN analytical queries join competitive signal data with company trends, THE System SHALL support SQL joins between competitor_relationships, competitive_signals, trend_windows, and document_impact_records tables through Trino.
|
||||
|
||||
### Requirement 8: Dashboard Visibility
|
||||
|
||||
**User Story:** As an analyst, I want to see competitor relationships, historical patterns, and competitive signals through the web dashboard, so that I can understand the competitive context behind trend assessments.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. WHEN an analyst views a company detail page, THE Dashboard SHALL display a competitors panel showing the company's active Competitor_Relationships with each competitor's ticker, relationship_type, strength score, and source (manual or inferred).
|
||||
2. WHEN an analyst views a company detail page, THE Dashboard SHALL display a historical patterns panel showing recent Historical_Patterns for the company, including catalyst_type, historical outcome distribution (bullish_pct, bearish_pct), sample_count, and pattern_confidence.
|
||||
3. WHEN an analyst views a trend summary, THE Dashboard SHALL visually distinguish pattern-based and competitive signal evidence from company-specific and macro evidence in the evidence chain.
|
||||
4. WHEN an analyst clicks a competitive signal in the evidence chain, THE Dashboard SHALL display the full signal detail including the source company, source document, catalyst_type, historical pattern statistics, and the Competitor_Relationship that linked the two companies.
|
||||
5. WHEN an analyst views a company detail page, THE Dashboard SHALL display an incoming competitive signals panel showing recent Competitive_Signals targeting this company from competitor news, with source ticker, catalyst_type, signal_direction, and signal_strength.
|
||||
|
||||
### Requirement 9: Pattern Signal Suppression and Safety
|
||||
|
||||
**User Story:** As a risk owner, I want pattern-based and competitive signals to be subject to quality controls, so that low-confidence historical patterns do not drive automated trading decisions.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. WHEN a Historical_Pattern has a pattern_confidence below a configurable threshold (default 0.3), THE Signal_Propagation_Engine SHALL exclude the pattern from competitive signal computation and log the exclusion reason.
|
||||
2. WHEN all historical instances behind a Historical_Pattern are older than a configurable staleness window (default 180 days) and none fall within a configurable recent window (default 90 days), THE Pattern_Matcher SHALL apply a decay penalty to the pattern_confidence.
|
||||
3. WHEN pattern-based signals are the sole basis for a trend direction change (no supporting company-specific or macro signals), THE Recommendation_Engine SHALL mark the recommendation as informational only and append a pattern-only caveat to the thesis.
|
||||
4. IF the competitive signal computation encounters sustained errors exceeding a configurable threshold, THEN THE System SHALL alert operators and continue producing recommendations using only company-specific and macro signals.
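
Two of the safety rules above, staleness decay and the pattern-only check, can be sketched as small pure functions. The function names and layer labels are hypothetical, and treating the penalty as a multiplicative factor (0.5 in the implementation plan) is an assumption, since the requirement only says a decay penalty is applied.

```python
def apply_staleness_decay(
    confidence: float,
    newest_instance_age_days: int,
    staleness_window_days: int = 180,
    decay_penalty: float = 0.5,
) -> float:
    """Scale confidence down when even the newest instance is older than the window."""
    # If the newest instance is older than 180 days, nothing falls in the last 90 either.
    if newest_instance_age_days > staleness_window_days:
        return confidence * decay_penalty  # assumption: penalty is multiplicative
    return confidence


PATTERN_LAYERS = {"pattern", "competitive"}  # hypothetical layer labels


def is_pattern_only(contributing_layers: set[str]) -> bool:
    """True when a direction change has no company-specific or macro support."""
    return bool(contributing_layers) and contributing_layers <= PATTERN_LAYERS
```

A recommendation for which `is_pattern_only` returns true would be marked informational only and carry the pattern-only caveat.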
|
||||
|
||||
### Requirement 10: Historical Pattern Query API
|
||||
|
||||
**User Story:** As an analyst, I want to query historical patterns on demand for any company and catalyst type, so that I can manually investigate how similar situations resolved in the past.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. THE Query API SHALL expose a `GET /api/patterns/{ticker}` endpoint returning all available Historical_Patterns for a company, filterable by catalyst_type and time_horizon.
|
||||
2. THE Query API SHALL expose a `GET /api/patterns/{ticker}/competitors` endpoint returning cross-company Historical_Patterns showing how the specified company's catalysts historically affected its competitors.
|
||||
3. WHEN the pattern query endpoints return results, THE Query API SHALL include the underlying sample_count, outcome distribution, pattern_confidence, and the date range of the historical data used.
|
||||
4. THE Query API SHALL expose a `GET /api/patterns/{ticker}/competitive-signals` endpoint returning recent Competitive_Signals targeting the specified company, with source details and pattern statistics.
|
||||
|
||||
### Requirement 11: Corporate Decision History Tracking
|
||||
|
||||
**User Story:** As a strategist, I want the platform to identify and track major corporate decisions (acquisitions, divestitures, leadership changes, strategic pivots, major partnerships, stock buybacks, dividend changes, restructurings) from the existing document intelligence, so that historical pattern mining can weight these high-impact events distinctly from routine news.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. WHEN the Pattern_Matcher mines historical data, THE Pattern_Matcher SHALL classify document_impact_records into two tiers: major_corporate_decision (catalyst types including m_and_a, legal, restructuring, leadership_change, strategic_pivot, buyback, dividend_change) and routine_signal (all other catalyst types), and compute separate Historical_Patterns for each tier.
|
||||
2. WHEN a major_corporate_decision pattern is found, THE Pattern_Matcher SHALL apply a higher base weight to the pattern_confidence calculation compared to routine_signal patterns, reflecting that major decisions have more predictable and durable market impact.
|
||||
3. WHEN the Pattern_Matcher computes a Historical_Pattern for a major_corporate_decision, THE Pattern_Matcher SHALL extend the default lookback window to 365 days (compared to 180 days for routine signals), since major corporate decisions are rarer but their outcomes are more structurally significant.
|
||||
4. WHEN an analyst views a company detail page, THE Dashboard SHALL display a corporate decision timeline showing major_corporate_decision events extracted from the company's document intelligence history, with the catalyst type, date, summary, and the trend outcome that followed.
|
||||
5. WHEN the Pattern_Matcher evaluates competitive signal propagation for a major_corporate_decision catalyst, THE Pattern_Matcher SHALL search for historical instances where similar major decisions by competitors produced measurable trend shifts for the target company, using the extended 365-day lookback window.
|
||||
6. THE Query API SHALL expose a `GET /api/patterns/{ticker}/decisions` endpoint returning the company's major corporate decision history with associated trend outcomes and pattern statistics.
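
The tier classification and the tier-dependent lookback can be sketched as below. The catalyst names come from criterion 1 and the lookback defaults from criterion 3; the `lookback_days` helper is a hypothetical name, and `"earnings"` in the usage note is only an illustrative routine catalyst type.

```python
MAJOR_DECISION_CATALYSTS = frozenset({
    "m_and_a", "legal", "restructuring", "leadership_change",
    "strategic_pivot", "buyback", "dividend_change",
})


def classify_catalyst_tier(catalyst_type: str) -> str:
    """Deterministically map a catalyst type to one of the two tiers."""
    if catalyst_type in MAJOR_DECISION_CATALYSTS:
        return "major_corporate_decision"
    return "routine_signal"


def lookback_days(catalyst_type: str, routine: int = 180, major: int = 365) -> int:
    """Major corporate decisions use the extended lookback window."""
    if classify_catalyst_tier(catalyst_type) == "major_corporate_decision":
        return major
    return routine
```

For example, `lookback_days("m_and_a")` yields the 365-day window, while a routine type such as `"earnings"` falls back to 180 days.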
|
||||
@@ -0,0 +1,300 @@
|
||||
# Implementation Plan: Competitive Intelligence & Historical Pattern Matching Layer
|
||||
|
||||
## Overview
|
||||
|
||||
This plan implements a third signal layer for the Stonks Oracle aggregation engine: competitive intelligence and historical pattern matching. The layer mines existing PostgreSQL data (document_impact_records, trend_windows, document_company_mentions) to identify how similar catalyst types resolved historically for a company and its competitors, then feeds pattern-based signals into the aggregation engine alongside company-specific (layer 1) and macro (layer 2) signals. All modules extend existing services — no new Kubernetes deployments required. Tasks are ordered so each step builds on the previous, with property-based tests validating core logic early.
|
||||
|
||||
## Tasks
|
||||
|
||||
- [x] 1. Database migration and shared schemas
|
||||
- [x] 1.1 Create PostgreSQL migration `infra/migrations/017_competitive_historical_patterns.sql`
|
||||
- Add `competitor_relationships` table with id (UUID PK), company_a_id (FK companies), company_b_id (FK companies), relationship_type (VARCHAR CHECK direct_rival|same_sector|overlapping_products|supply_chain_adjacent), strength (FLOAT CHECK [0,1]), bidirectional (BOOLEAN), source (VARCHAR CHECK manual|inferred), active (BOOLEAN), created_at, updated_at
|
||||
- Add `competitive_signal_records` table with id (UUID PK), source_document_id (FK documents), source_ticker, target_ticker, catalyst_type, pattern_confidence, signal_direction, signal_strength, relationship_strength, computed_at
|
||||
- Add CHECK constraint preventing self-referencing relationships (company_a_id != company_b_id)
|
||||
- Add unique index on (LEAST(company_a_id, company_b_id), GREATEST(company_a_id, company_b_id)) WHERE active = TRUE to prevent duplicate active pairs
|
||||
- Add indexes: idx_competitor_rel_company_a, idx_competitor_rel_company_b (both WHERE active = TRUE), idx_competitive_signals_target (target_ticker, computed_at DESC), idx_competitive_signals_source (source_ticker, computed_at DESC)
|
||||
- _Requirements: 7.1, 7.2_
|
||||
|
||||
- [x] 1.2 Add new Pydantic schemas and enums to `services/shared/schemas.py`
|
||||
- Add `RelationshipType` enum (direct_rival, same_sector, overlapping_products, supply_chain_adjacent)
|
||||
- Add `CatalystTier` enum (major_corporate_decision, routine_signal)
|
||||
- Add `MAJOR_DECISION_CATALYSTS` frozenset (m_and_a, legal, restructuring, leadership_change, strategic_pivot, buyback, dividend_change)
|
||||
- Add `CompetitorRelationshipSchema`, `CompetitiveSignalRecordSchema`, `HistoricalPatternSchema` Pydantic models
|
||||
- _Requirements: 1.1, 4.4, 7.1, 7.2, 11.1_
|
||||
|
||||
- [x] 1.3 Add competitive configuration fields to `services/shared/config.py`
|
||||
- Add `CompetitiveConfig` dataclass with fields: competitive_signal_weight (0.2), competitive_enabled (True), pattern_confidence_threshold (0.3), propagation_strength_threshold (0.2), routine_lookback_days (180), major_decision_lookback_days (365), major_decision_weight_multiplier (1.3), staleness_window_days (180), staleness_recent_days (90), staleness_decay_penalty (0.5), min_pattern_samples (3)
|
||||
- Add `competitive: CompetitiveConfig` to `AppConfig` with env var loading in `load_config()`
|
||||
- _Requirements: 5.6, 6.1, 9.1, 9.2, 11.2, 11.3_
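
A sketch of what the `CompetitiveConfig` dataclass in task 1.3 might look like, using the defaults listed there. The environment variable naming (upper-cased field names) and the `load_competitive_config` helper are assumptions for illustration; the actual `load_config()` scheme is not shown in this plan.

```python
from dataclasses import dataclass, fields
from typing import Mapping


@dataclass(frozen=True)
class CompetitiveConfig:
    competitive_signal_weight: float = 0.2
    competitive_enabled: bool = True
    pattern_confidence_threshold: float = 0.3
    propagation_strength_threshold: float = 0.2
    routine_lookback_days: int = 180
    major_decision_lookback_days: int = 365
    major_decision_weight_multiplier: float = 1.3
    staleness_window_days: int = 180
    staleness_recent_days: int = 90
    staleness_decay_penalty: float = 0.5
    min_pattern_samples: int = 3


def load_competitive_config(env: Mapping[str, str]) -> CompetitiveConfig:
    """Override defaults from variables named after the upper-cased field (assumed scheme)."""
    overrides = {}
    for f in fields(CompetitiveConfig):
        raw = env.get(f.name.upper())
        if raw is None:
            continue
        kind = type(f.default)  # bool, int, or float, taken from the default value
        if kind is bool:
            overrides[f.name] = raw.strip().lower() in {"1", "true", "yes"}
        else:
            overrides[f.name] = kind(raw)
    return CompetitiveConfig(**overrides)
```

Passing `os.environ` as `env` would apply any overrides present in the process environment.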
|
||||
|
||||
- [x] 2. Checkpoint — Ensure migration and schemas are consistent
|
||||
- Ensure all tests pass, ask the user if questions arise.
|
||||
|
||||
- [x] 3. Competitor Registry and auto-inference
|
||||
- [x] 3.1 Implement `services/symbol_registry/competitors.py`
|
||||
- Implement `CompetitorRelationshipCreate` and `CompetitorRelationship` Pydantic models for API request/response
|
||||
- Implement `POST /companies/{company_id}/competitors` — create relationship with audit event
|
||||
- Implement `GET /companies/{company_id}/competitors` — list active relationships ordered by strength descending
|
||||
- Implement `PUT /companies/{company_id}/competitors/{relationship_id}` — update relationship with audit event recording previous state
|
||||
- Implement `DELETE /companies/{company_id}/competitors/{relationship_id}` — soft-delete (set active=False), preserve row
|
||||
- Register routes as a FastAPI router on the Symbol Registry app
|
||||
- Handle error cases: self-referencing (400), duplicate active pair (409), non-existent company (404)
|
||||
- _Requirements: 1.1, 1.2, 1.3, 1.4, 1.5_
|
||||
|
||||
- [x] 3.2 Write property test for competitor relationship persistence round-trip
|
||||
- **Property 1: Competitor relationship persistence round-trip**
|
||||
- **Validates: Requirements 1.1, 7.1**
|
||||
|
||||
- [x] 3.3 Write property test for competitor query completeness and ordering
|
||||
- **Property 2: Competitor query completeness and ordering**
|
||||
- **Validates: Requirements 1.2**
|
||||
|
||||
- [x] 3.4 Write property test for soft-delete preserves row
|
||||
- **Property 3: Soft-delete preserves row**
|
||||
- **Validates: Requirements 1.3**
|
||||
|
||||
- [x] 3.5 Implement `services/symbol_registry/competitor_inference.py`
|
||||
- Implement `infer_competitors(pool, company_id) -> list[CompetitorRelationship]`
|
||||
- Query companies sharing the same sector and industry
|
||||
- Rank candidates by co-mention frequency in `document_company_mentions`
|
||||
- Compute strength = `0.3 * sector_match + 0.7 * normalized_co_mention_count`
|
||||
- Upsert relationships with `source='inferred'`, refreshing strength on re-inference (no duplicates)
|
||||
- Implement `POST /companies/{company_id}/competitors/infer` endpoint returning candidate relationships
|
||||
- _Requirements: 2.1, 2.2, 2.3, 2.4, 2.5_
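
The strength formula in task 3.5 can be sketched as a pure ranking function. This assumes every candidate already shares sector and industry (so `sector_match` is 1.0) and that co-mention counts are normalized by the largest count among the candidates; the plan does not specify the normalization, and `infer_strengths` is a hypothetical name.

```python
def infer_strengths(co_mentions: dict[str, int]) -> list[tuple[str, float]]:
    """Rank sector-matched candidates by 0.3 * sector_match + 0.7 * normalized co-mentions."""
    max_count = max(co_mentions.values(), default=0)
    scored = []
    for company_id, count in co_mentions.items():
        # Assumption: normalize by the maximum co-mention count among candidates.
        normalized = count / max_count if max_count > 0 else 0.0
        strength = min(1.0, 0.3 * 1.0 + 0.7 * normalized)
        scored.append((company_id, strength))
    # Strongest inferred relationships first.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored
```

Under these assumptions a sector-matched candidate with no co-mentions still receives 0.3, which sits above the default propagation threshold of 0.2.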
|
||||
|
||||
- [x] 3.6 Write property test for auto-inference produces valid candidates
|
||||
- **Property 4: Auto-inference produces valid candidates**
|
||||
- **Validates: Requirements 2.1, 2.3**
|
||||
|
||||
- [x] 3.7 Write property test for auto-inference ranks by co-mention frequency
|
||||
- **Property 5: Auto-inference ranks by co-mention frequency**
|
||||
- **Validates: Requirements 2.2**
|
||||
|
||||
- [x] 3.8 Write property test for auto-inference idempotence
|
||||
- **Property 6: Auto-inference idempotence**
|
||||
- **Validates: Requirements 2.4**
|
||||
|
||||
- [x] 4. Checkpoint — Ensure competitor registry and inference work correctly
|
||||
- Ensure all tests pass, ask the user if questions arise.
|
||||
|
||||
- [x] 5. Pattern Matcher — core historical pattern mining
|
||||
- [x] 5.1 Implement `services/aggregation/pattern_matcher.py`
|
||||
- Implement `HistoricalPattern` dataclass matching the design specification
|
||||
- Implement `classify_catalyst_tier(catalyst_type) -> str` — deterministic mapping of major_corporate_decision vs routine_signal catalyst types
|
||||
- Implement `compute_pattern_confidence(sample_count, outcome_consistency, data_recency_days, tier) -> float` using the formula: `sample_factor * 0.4 + consistency * 0.4 + recency_factor * 0.2`, with 1.3× multiplier for major decisions
|
||||
- Implement `find_self_patterns(pool, ticker, catalyst_type, horizons) -> list[HistoricalPattern]` — query document_impact_records joined with trend_windows for same company-catalyst pair across configurable time horizons (1d, 7d, 30d)
|
||||
- Implement `find_cross_company_patterns(pool, source_ticker, target_ticker, catalyst_type, horizons) -> list[HistoricalPattern]` — query cross-company historical patterns
|
||||
- Only consider records linked to document_intelligence with validation_status='valid' and documents with status != 'rejected'
|
||||
- Apply insufficient data threshold: when sample_count < 3, cap confidence at 0.25 and set insufficient_data=True
|
||||
- Apply staleness decay: when no instances in last 90 days and all data older than 180 days, apply 0.5 decay penalty
|
||||
- Use 365-day lookback for major_corporate_decision catalysts, 180-day for routine_signal
|
||||
- Compute separate HistoricalPatterns for each catalyst tier
|
||||
- _Requirements: 3.1, 3.2, 3.3, 3.4, 3.5, 11.1, 11.2, 11.3, 11.5_
|
||||
|
||||
- [x] 5.2 Write property test for pattern computation correctness
|
||||
- **Property 7: Pattern computation correctness**
|
||||
- **Validates: Requirements 3.1, 3.2, 4.2**
|
||||
|
||||
- [x] 5.3 Write property test for pattern confidence monotonicity
|
||||
- **Property 8: Pattern confidence monotonicity**
|
||||
- **Validates: Requirements 3.3, 11.2**
|
||||
|
||||
- [x] 5.4 Write property test for insufficient data threshold
|
||||
- **Property 9: Insufficient data threshold**
|
||||
- **Validates: Requirements 3.4**
|
||||
|
||||
- [x] 5.5 Write property test for valid-only data filtering
|
||||
- **Property 10: Valid-only data filtering**
|
||||
- **Validates: Requirements 3.5**
|
||||
|
||||
- [x] 5.6 Write property test for catalyst tier classification determinism
|
||||
- **Property 19: Catalyst tier classification determinism**
|
||||
- **Validates: Requirements 11.1**
|
||||
|
||||
- [x] 5.7 Write property test for major decision extended lookback
|
||||
- **Property 20: Major decision extended lookback**
|
||||
- **Validates: Requirements 11.3, 11.5**
|
||||
|
||||
- [x] 6. Checkpoint — Ensure pattern matcher and property tests pass
|
||||
- Ensure all tests pass, ask the user if questions arise.
|
||||
|
||||
- [x] 7. Signal Propagation Engine
|
||||
- [x] 7.1 Implement `services/aggregation/signal_propagation.py`
|
||||
- Implement `CompetitiveSignalRecord` dataclass matching the design specification
|
||||
- Implement `propagate_signals(pool, ticker, catalyst_type, impact_score, document_id, config) -> list[CompetitiveSignalRecord]` — look up competitors, query cross-company patterns, produce weighted competitive signals
|
||||
- Signal weighting: `signal_strength = pattern.avg_strength * relationship.strength * pattern.pattern_confidence * source_impact_score`
|
||||
- Signal direction: bullish if `pattern.bullish_pct > pattern.bearish_pct`, otherwise bearish
|
||||
- Skip propagation when relationship.strength < propagation_strength_threshold (default 0.2), log skip reason
|
||||
- Exclude patterns with pattern_confidence < pattern_confidence_threshold (default 0.3), log exclusion reason
|
||||
- Persist CompetitiveSignalRecord objects to the competitive_signal_records PostgreSQL table
|
||||
- Implement `build_pattern_weighted_signals(patterns, competitive_signals, reference_time, window, config) -> list[WeightedSignal]` — convert pattern/competitive signals to WeightedSignal objects for aggregation
|
||||
- _Requirements: 4.1, 4.2, 4.3, 4.4, 4.5, 9.1_
|
||||
|
||||
- [x] 7.2 Write property test for competitive signal strength monotonicity
|
||||
- **Property 11: Competitive signal strength monotonicity**
|
||||
- **Validates: Requirements 4.3**
|
||||
|
||||
- [x] 7.3 Write property test for signal propagation threshold gating
|
||||
- **Property 12: Signal propagation threshold gating**
|
||||
- **Validates: Requirements 4.5, 9.1**
|
||||
|
||||
- [x] 7.4 Write property test for pattern signal to WeightedSignal conversion
|
||||
- **Property 13: Pattern signal to WeightedSignal conversion**
|
||||
- **Validates: Requirements 5.2**
|
||||
|
||||
- [x] 7.5 Write property test for competitive signal persistence round-trip
|
||||
- **Property 21: Competitive signal persistence round-trip**
|
||||
- **Validates: Requirements 4.4, 7.2**
|
||||
|
||||
- [x] 8. Checkpoint — Ensure signal propagation and property tests pass
|
||||
- Ensure all tests pass, ask the user if questions arise.
|
||||
|
||||
- [x] 9. Aggregation engine integration
|
||||
- [x] 9.1 Extend `services/aggregation/worker.py` to incorporate pattern-based and competitive signals
|
||||
- Add `competitive_signal_weight` and `competitive_enabled` fields to `AggregationConfig`
|
||||
- In `aggregate_company_window`, check competitive toggle state from `risk_configs` table (same pattern as macro toggle)
|
||||
- When competitive layer is enabled: query self-company historical patterns for active catalyst types in the window, query competitive signals targeting this ticker
|
||||
- Convert each pattern signal to a `WeightedSignal` using: document_id = source document, sentiment_value = +1.0 (bullish) or -1.0 (bearish), impact_score = signal_strength × competitive_signal_weight, recency decay from source document publication time, confidence gating from pattern_confidence
|
||||
- Merge pattern/competitive signals with company-specific and macro signals before computing trend direction, strength, confidence, and contradiction score
|
||||
- Include contributing source_document_ids in evidence references for traceability
|
||||
- When competitive layer is disabled or no pattern data exists, produce identical output to company+macro-only aggregation
|
||||
- _Requirements: 5.1, 5.2, 5.3, 5.4, 5.5, 5.6_
|
||||
|
||||
- [x] 9.2 Write property test for pattern-company contradiction detection
|
||||
- **Property 14: Pattern-company contradiction detection**
|
||||
- **Validates: Requirements 5.3**
|
||||
|
||||
- [x] 9.3 Write property test for pattern evidence traceability
|
||||
- **Property 15: Pattern evidence traceability**
|
||||
- **Validates: Requirements 5.4**
|
||||
|
||||
- [x] 9.4 Write property test for no-degradation and disabled-layer equivalence
|
||||
- **Property 16: No-degradation and disabled-layer equivalence**
|
||||
- **Validates: Requirements 5.5, 6.2**
|
||||
|
||||
- [x] 9.5 Write property test for staleness decay penalty
|
||||
- **Property 17: Staleness decay penalty**
|
||||
- **Validates: Requirements 9.2**
|
||||
|
||||
- [x] 10. Checkpoint — Ensure aggregation integration works correctly
|
||||
- Ensure all tests pass, ask the user if questions arise.
|
||||
|
||||
- [x] 11. Pattern-only suppression and safety
|
||||
- [x] 11.1 Extend `services/recommendation/suppression.py` with pattern-only suppression
|
||||
- Add `PATTERN_ONLY_SIGNAL = "pattern_only_signal"` to `SuppressionReason` enum
|
||||
- Implement `evaluate_pattern_only_suppression(summary, pattern_signal_count, company_signal_count, macro_signal_count) -> bool`
|
||||
- When pattern-based signals are the sole basis for a trend direction change, force recommendation to `mode='informational'` and append pattern-only caveat to thesis
|
||||
- _Requirements: 9.3_
|
||||
|
||||
- [x] 11.2 Write property test for pattern-only suppression
|
||||
- **Property 18: Pattern-only suppression**
|
||||
- **Validates: Requirements 9.3**
|
||||
|
||||
- [x] 12. Competitive layer toggle and API endpoints
|
||||
- [x] 12.1 Implement competitive toggle and status endpoints in `services/api/app.py`
|
||||
- Add `GET /api/admin/competitive/status` returning current enabled/disabled state from `risk_configs` table
|
||||
- Add `PUT /api/admin/competitive/toggle` to switch competitive layer on/off, persisting to `risk_configs` and recording an audit event with previous state, new state, and operator
|
||||
- Toggle state is read from PostgreSQL at the start of each aggregation cycle (no caching)
|
||||
- When disabled, pattern mining remains queryable via API but signal propagation is skipped during aggregation
|
||||
- When re-enabled, resume computing signals using latest historical data including intelligence ingested while disabled
|
||||
- _Requirements: 6.1, 6.2, 6.3, 6.4, 6.5, 6.7_
|
||||
|
||||
- [x] 12.2 Implement pattern and competitive signal query endpoints in `services/api/app.py`
|
||||
- Add `GET /api/patterns/{ticker}` — historical patterns for a company, filterable by catalyst_type and time_horizon
|
||||
- Add `GET /api/patterns/{ticker}/competitors` — cross-company patterns showing how this company's catalysts affected competitors
|
||||
- Add `GET /api/patterns/{ticker}/competitive-signals` — recent competitive signals targeting this company
|
||||
- Add `GET /api/patterns/{ticker}/decisions` — major corporate decision history with trend outcomes and pattern statistics
|
||||
- Include sample_count, outcome distribution, pattern_confidence, and date range in all responses
|
||||
- _Requirements: 10.1, 10.2, 10.3, 10.4, 11.4, 11.6_
|
||||
|
||||
- [x] 13. Checkpoint — Ensure API endpoints and toggle logic work correctly
|
||||
- Ensure all tests pass, ask the user if questions arise.
|
||||
|
||||
- [x] 14. Lake publisher extensions
|
||||
- [x] 14.1 Add competitive fact publishers to the lake publisher service
|
||||
- Implement `publish_competitor_relationship_fact` writing partitioned Parquet datasets to `stonks-lakehouse/warehouse/competitor_relationships/dt={date}/`
|
||||
- Implement `publish_competitive_signal_fact` writing partitioned Parquet datasets to `stonks-lakehouse/warehouse/competitive_signals/dt={date}/target_ticker={ticker}/`
|
||||
- Register new fact types in the lake publisher's job processing loop
|
||||
- _Requirements: 7.3, 7.4_
|
||||
|
||||
- [x] 15. Signal propagation wiring into aggregation pipeline
|
||||
- [x] 15.1 Wire signal propagation into the aggregation worker
|
||||
- After document intelligence is produced for a company, trigger signal propagation for the company's competitors
|
||||
- In the aggregation cycle, call `propagate_signals` for each new document intelligence record when competitive layer is enabled
|
||||
- Handle sustained propagation errors: after configurable threshold (default 5 consecutive failures), alert operators and continue with company-specific + macro signals only
|
||||
- _Requirements: 4.1, 9.4_
|
||||
|
||||
- [x] 15.2 Wire pattern mining into the aggregation cycle
|
||||
- During `aggregate_company_window`, call pattern matcher for self-company patterns and collect competitive signals for the ticker
|
||||
- Merge resulting WeightedSignals into the signal list before trend computation
|
||||
- Ensure evidence references include pattern signal source document IDs
|
||||
- _Requirements: 5.1, 5.4_
|
||||
|
||||
- [x] 16. Checkpoint — Ensure full backend pipeline works end-to-end
|
||||
- Ensure all tests pass, ask the user if questions arise.
|
||||
|
||||
- [x] 17. Dashboard — Competitors panel and historical patterns
|
||||
- [x] 17.1 Add competitors panel to Company Detail page
|
||||
- On `frontend/src/pages/CompanyDetail.tsx`, add a Competitors tab showing active competitor relationships with ticker, relationship_type, strength score, source (manual/inferred)
|
||||
- Add API hooks for `GET /companies/{company_id}/competitors` in `frontend/src/api/hooks.ts`
|
||||
- Add infer button triggering `POST /companies/{company_id}/competitors/infer`
|
||||
- _Requirements: 8.1_
|
||||
|
||||
- [x] 17.2 Add historical patterns panel to Company Detail page
|
||||
- On `frontend/src/pages/CompanyDetail.tsx`, add a Historical Patterns tab showing recent patterns: catalyst_type, outcome distribution (bullish_pct, bearish_pct), sample_count, pattern_confidence
|
||||
- Add API hook for `GET /api/patterns/{ticker}`
|
||||
- _Requirements: 8.2_
|
||||
|
||||
- [x] 17.3 Add competitive signals panel to Company Detail page
|
||||
- On `frontend/src/pages/CompanyDetail.tsx`, add a Competitive Signals tab showing incoming signals: source ticker, catalyst_type, signal_direction, signal_strength
|
||||
- Add API hook for `GET /api/patterns/{ticker}/competitive-signals`
|
||||
- Click-through on a signal shows full detail: source company, source document, catalyst_type, historical pattern statistics, competitor relationship
|
||||
- _Requirements: 8.5, 8.4_
|
||||
|
||||
- [x] 17.4 Add corporate decision timeline to Company Detail page
|
||||
- On `frontend/src/pages/CompanyDetail.tsx`, add a Decisions tab showing major_corporate_decision events: catalyst type, date, summary, trend outcome that followed
|
||||
- Add API hook for `GET /api/patterns/{ticker}/decisions`
|
||||
- _Requirements: 11.4_
|
||||
|
||||
- [x] 17.5 Add pattern-based evidence indicators to Trend detail page
|
||||
- On `frontend/src/pages/TrendDetail.tsx`, visually distinguish pattern-based and competitive signal evidence from company-specific and macro evidence (badge/icon differentiation)
|
||||
- _Requirements: 8.3_
|
||||
|
||||
- [x] 17.6 Add competitive toggle to Trading Controls page
|
||||
- On `frontend/src/pages/Trading.tsx`, add competitive signal layer enable/disable switch alongside existing macro toggle, with confirmation dialog
|
||||
- Add API hooks for `GET /api/admin/competitive/status` and `PUT /api/admin/competitive/toggle`
|
||||
- _Requirements: 6.6_
|
||||
|
||||
- [x] 18. Checkpoint — Ensure frontend pages render and integrate with API
|
||||
- Ensure all tests pass, ask the user if questions arise.
|
||||
|
||||
- [x] 19. Integration wiring and final validation
|
||||
- [x] 19.1 Write integration tests for competitive pipeline end-to-end
|
||||
- Test document intelligence → pattern mining → signal propagation → aggregation flow
|
||||
- Test lake publisher writes correct Parquet partitions for competitor relationships and competitive signals
|
||||
- Test competitive toggle state change propagates to next aggregation cycle
|
||||
- Test toggle disable/re-enable cycle preserves data integrity
|
||||
- _Requirements: 4.1, 5.1, 6.1, 6.4, 7.3_
|
||||
|
||||
- [x] 19.2 Write unit tests for API endpoints and dashboard components
|
||||
- Test competitor CRUD endpoints return correct data and error codes (400, 404, 409)
|
||||
- Test pattern query endpoints return correct data with filtering
|
||||
- Test competitive toggle endpoint persists state and records audit event
|
||||
- Test auto-inference endpoint with empty data, single company, no co-mentions
|
||||
- Add MSW handlers for competitive endpoints in `frontend/src/test/mocks/handlers.ts`
|
||||
- Test competitors panel, historical patterns panel, competitive signals panel, and decision timeline render correctly
|
||||
- _Requirements: 1.4, 2.5, 6.5, 8.1, 8.2, 8.5, 10.1, 10.4_
|
||||
|
||||
- [x] 20. Final checkpoint — Ensure all tests pass
|
||||
- Ensure all tests pass, ask the user if questions arise.
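
The pattern confidence formula from task 5.1 can be sketched as follows. Only the weights (0.4 / 0.4 / 0.2), the 1.3× major-decision multiplier, the `sample_count < 3` cap at 0.25, the 0.5 staleness penalty, and the lookback windows come from the task list. The definitions of `sample_factor` and `recency_factor`, the saturation point of 10 samples, the order in which the multiplier and penalty apply, and the clamp to 1.0 are assumptions made for illustration, not the shipped implementation.

```python
# Lookback windows per catalyst tier, as listed in task 5.1.
LOOKBACK_DAYS = {"major_corporate_decision": 365, "routine_signal": 180}


def compute_pattern_confidence(sample_count, outcome_consistency,
                               data_recency_days, tier):
    """Sketch of the task 5.1 confidence formula.

    data_recency_days is taken to be the age of the most recent instance,
    so a value above 180 implies every instance is older than 180 days.
    """
    lookback = LOOKBACK_DAYS[tier]
    # Assumption: sample factor saturates at 10 historical instances.
    sample_factor = min(sample_count / 10.0, 1.0)
    # Assumption: recency decays linearly to zero across the lookback window.
    recency_factor = max(0.0, 1.0 - data_recency_days / lookback)

    confidence = (sample_factor * 0.4
                  + outcome_consistency * 0.4
                  + recency_factor * 0.2)

    if tier == "major_corporate_decision":
        confidence *= 1.3  # multiplier for major decisions
    if data_recency_days > 180:
        confidence *= 0.5  # staleness decay penalty

    confidence = min(confidence, 1.0)  # assumption: keep the result in [0, 1]
    if sample_count < 3:
        confidence = min(confidence, 0.25)  # insufficient data threshold
    return confidence
```

A caller would also set `insufficient_data=True` on the resulting `HistoricalPattern` whenever `sample_count < 3`.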
|
||||
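
The signal weighting and gating rules from task 7.1 can be sketched in a few lines. The product formula, the direction rule, and the two default thresholds (0.2 and 0.3) come from the task list. The `Pattern` dataclass and the `competitive_signal` helper are hypothetical stand-ins for `HistoricalPattern` and the body of `propagate_signals`, and returning `None` for a skipped signal is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Pattern:
    """Hypothetical subset of HistoricalPattern used for weighting."""
    avg_strength: float
    pattern_confidence: float
    bullish_pct: float
    bearish_pct: float


def competitive_signal(pattern: Pattern,
                       relationship_strength: float,
                       source_impact_score: float,
                       propagation_strength_threshold: float = 0.2,
                       pattern_confidence_threshold: float = 0.3
                       ) -> Optional[Tuple[str, float]]:
    """Return (direction, strength), or None when the signal is gated out."""
    if relationship_strength < propagation_strength_threshold:
        return None  # weak competitor relationship: skip propagation
    if pattern.pattern_confidence < pattern_confidence_threshold:
        return None  # low-confidence pattern: exclude

    strength = (pattern.avg_strength
                * relationship_strength
                * pattern.pattern_confidence
                * source_impact_score)
    direction = "bullish" if pattern.bullish_pct > pattern.bearish_pct else "bearish"
    return direction, strength
```

Because every factor lies in [0, 1], the product can only shrink as any one input falls, which is the monotonicity that Property 11 checks.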
|
||||
## Notes
|
||||
|
||||
- Tasks marked with `*` are optional and can be skipped for a faster MVP
|
||||
- Each task references specific requirements for traceability
|
||||
- Checkpoints ensure incremental validation after each major phase
|
||||
- Property tests validate the 21 correctness properties from the design using Hypothesis
|
||||
- The backend design uses Python throughout, so no language selection is needed
|
||||
- No new Kubernetes deployments required; all modules extend existing services
|
||||
- Next migration number is 017 (016 is global-news-interpolation)
|
||||
- Competitive layer follows the same toggle/suppression/aggregation pattern as the macro layer for consistency
|
||||
@@ -0,0 +1 @@
|
||||
{"specId": "3e745894-9abc-49ff-97cc-c921f436bb32", "workflowType": "requirements-first", "specType": "feature"}
|
||||
@@ -0,0 +1,619 @@
|
||||
# Global News Interpolation Layer — Design
|
||||
|
||||
## Overview
|
||||
|
||||
This design extends the Stonks Oracle platform with a macro-level global news interpolation layer. The layer introduces a parallel signal path that ingests global/geopolitical news events, classifies them by impact type and severity using Ollama, maps them to individual companies via exposure profiles, and feeds the resulting macro impact scores into the existing aggregation engine as weighted signals alongside company-specific document intelligence.
|
||||
|
||||
The design integrates with the existing service architecture — no new Kubernetes deployments are required. The event classifier reuses the extractor service's Ollama client, the interpolation engine runs within the aggregation worker, and exposure profiles are managed through the symbol registry API. A runtime toggle allows operators to enable/disable the macro signal layer without redeployment.
|
||||
|
||||
### Design Rationale
|
||||
|
||||
- **Reuse over new services**: The macro pipeline reuses existing ingestion, parsing, extraction, aggregation, and lake publisher infrastructure. New logic is added as modules within existing services rather than standalone deployments.
|
||||
- **Exposure-driven specificity**: Rather than applying a blanket macro sentiment to all companies, the system computes company-specific impact scores based on geographic revenue mix, supply chain exposure, and commodity dependencies.
|
||||
- **Safety-first**: Macro signals are subject to confidence gating, staleness decay, and a dedicated runtime toggle. Macro-only trend shifts are forced to informational mode.
|
||||
- **Auditability**: Every macro impact score is traceable from the originating global event through the classification, exposure profile overlap, and final weighted contribution to the trend summary.
|
||||
|
||||
## Architecture
|
||||
|
||||
The macro interpolation layer adds four logical components that run within existing services:
|
||||
|
||||
```mermaid
flowchart TD
    subgraph Ingestion["Ingestion Service (existing)"]
        MS[Macro Source Adapter]
    end

    subgraph Parser["Parser Service (existing)"]
        MP[Macro Article Parser]
    end

    subgraph Extractor["Extractor Service (existing)"]
        EC[Event Classifier Module]
    end

    subgraph SymReg["Symbol Registry (existing)"]
        EP[Exposure Profile CRUD]
    end

    subgraph Aggregation["Aggregation Service (existing)"]
        IE[Interpolation Engine]
        AE[Aggregation Engine]
        TP[Trend Projections]
    end

    subgraph Recommendation["Recommendation Service (existing)"]
        RE[Macro-Aware Recommendations]
    end

    subgraph LakePublisher["Lake Publisher (existing)"]
        LP[Macro Fact Publisher]
    end

    subgraph QueryAPI["Query API (existing)"]
        MA[Macro API Endpoints]
        MT[Macro Toggle Endpoint]
    end

    subgraph Dashboard["Dashboard (existing)"]
        GEP[Global Events Page]
        MEP[Macro Exposure Panel]
    end

    MS -->|raw macro articles| MP
    MP -->|normalized text| EC
    EC -->|Global_Event classification| IE
    EP -->|Exposure_Profiles| IE
    IE -->|macro impact signals| AE
    AE -->|trend summaries + projections| TP
    TP --> RE
    EC -->|event facts| LP
    IE -->|impact facts| LP
    MT -->|toggle state| AE
    MA --> GEP
    MA --> MEP
```
|
||||
|
||||
### Data Flow
|
||||
|
||||
1. **Ingestion**: Scheduler triggers macro source fetches. The existing ingestion worker fetches from configured macro news sources and stores raw payloads in MinIO under `stonks-raw-news/macro/`. Metadata records use `document_type = 'macro_event'`.
|
||||
|
||||
2. **Parsing**: The existing parser normalizes macro articles identically to company-specific articles. No parser changes needed — the parser is document-type agnostic.
|
||||
|
||||
3. **Classification**: A new `event_classifier` module in the extractor service uses a dedicated Ollama prompt and JSON schema to produce `GlobalEvent` classification objects. The module reuses the existing `OllamaClient` for inference and retry logic.
|
||||
|
||||
4. **Interpolation**: A new `interpolation` module in the aggregation service loads company exposure profiles, computes overlap scores against each classified event, and produces `MacroImpactRecord` objects. These are stored in PostgreSQL and fed into the aggregation engine as additional weighted signals.
|
||||
|
||||
5. **Aggregation**: The existing `aggregate_company_window` function is extended to fetch macro impact records alongside document impact records. Macro signals use the same `WeightedSignal` abstraction with recency decay, confidence gating, and contradiction detection.
|
||||
|
||||
6. **Trend Projections**: A new projection module computes forward-looking trend estimates by combining current trend momentum with active macro event trajectories and known upcoming catalysts.
|
||||
|
||||
7. **Recommendation**: The recommendation engine picks up macro signals through the trend summary, so the engine itself needs no direct changes. The one addition is a suppression check that forces macro-only trend shifts to informational mode.
|
||||
|
||||
8. **Lake Publication**: New `publish_global_event_fact` and `publish_macro_impact_fact` functions in the lake publisher write partitioned Parquet datasets for analytical queries.
|
||||
|
||||
## Components and Interfaces
|
||||
|
||||
### Event Classifier Module
|
||||
|
||||
**Location**: `services/extractor/event_classifier.py`
|
||||
|
||||
Responsible for classifying macro news articles into structured `GlobalEvent` objects using Ollama.
|
||||
|
||||
```python
from dataclasses import dataclass


@dataclass
class GlobalEvent:
    event_id: str                    # UUID
    event_types: list[str]           # Impact_Type values
    severity: str                    # Severity_Level: low|moderate|high|critical
    affected_regions: list[str]      # ISO 3166-1 alpha-2 codes or region names
    affected_sectors: list[str]      # GICS sector identifiers
    affected_commodities: list[str]  # commodity identifiers when applicable
    summary: str
    key_facts: list[str]
    estimated_duration: str          # short_term|medium_term|long_term
    confidence: float                # [0, 1]
    source_document_id: str          # FK to documents table
    model_metadata: ModelMetadata    # type defined elsewhere in the codebase
```
|
||||
|
||||
**Interface**:
|
||||
- `classify_global_event(normalized_text: str, document_id: str, ollama_client: OllamaClient) -> GlobalEvent`
|
||||
- `build_event_classification_prompt(text: str) -> str`
|
||||
- `get_event_json_schema() -> dict`
|
||||
|
||||
**Ollama Integration**: Uses the existing `OllamaClient` with a dedicated prompt template (`event-classification-v1`) and JSON schema. Retries follow the same policy as document extraction.
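
As an illustration of what `get_event_json_schema()` might return, the sketch below derives a JSON Schema from the `GlobalEvent` fields and the `ImpactType` / `SeverityLevel` values defined in this design. It assumes that `event_id`, `source_document_id`, and `model_metadata` are filled in by the service rather than by the model, and that unknown keys are rejected; neither choice is stated in the design.

```python
# Enum values mirror ImpactType and SeverityLevel from services/shared/schemas.py.
IMPACT_TYPES = [
    "supply_disruption", "demand_shift", "cost_increase", "regulatory_pressure",
    "currency_impact", "commodity_shock", "trade_barrier", "geopolitical_risk",
]
SEVERITY_LEVELS = ["low", "moderate", "high", "critical"]
DURATIONS = ["short_term", "medium_term", "long_term"]


def _string_array(enum=None):
    """Helper: JSON Schema for a list of strings, optionally enum-constrained."""
    items = {"type": "string"}
    if enum is not None:
        items["enum"] = enum
    return {"type": "array", "items": items}


def get_event_json_schema() -> dict:
    """Sketch of a schema covering only the model-produced GlobalEvent fields."""
    return {
        "type": "object",
        "properties": {
            "event_types": _string_array(IMPACT_TYPES),
            "severity": {"type": "string", "enum": SEVERITY_LEVELS},
            "affected_regions": _string_array(),
            "affected_sectors": _string_array(),
            "affected_commodities": _string_array(),
            "summary": {"type": "string"},
            "key_facts": _string_array(),
            "estimated_duration": {"type": "string", "enum": DURATIONS},
            "confidence": {"type": "number", "minimum": 0, "maximum": 1},
        },
        "required": [
            "event_types", "severity", "affected_regions", "affected_sectors",
            "summary", "key_facts", "estimated_duration", "confidence",
        ],
        # Assumption: reject keys the classifier was not asked for.
        "additionalProperties": False,
    }
```

Constraining `severity` and `estimated_duration` to enums lets invalid classifications be rejected at validation time rather than during interpolation.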
|
||||
|
||||
### Exposure Profile Management
|
||||
|
||||
**Location**: `services/symbol_registry/exposure.py`
|
||||
|
||||
New endpoints on the Symbol Registry API for managing company exposure profiles.
|
||||
|
||||
```python
from pydantic import BaseModel


class ExposureProfile(BaseModel):
    company_id: str
    geographic_revenue_mix: dict[str, float]  # region_code -> pct (0-1)
    supply_chain_regions: list[str]           # region codes
    key_input_commodities: list[str]          # commodity identifiers
    regulatory_jurisdictions: list[str]       # jurisdiction codes
    market_position_tier: str                 # global_leader|multinational|regional|domestic
    export_dependency_pct: float              # 0-1
    source: str                               # "manual" | "inferred"
    confidence: float                         # [0, 1], relevant for inferred profiles
    version: int                              # auto-incremented on update
```
|
||||
|
||||
**API Endpoints** (on Symbol Registry):
|
||||
- `GET /companies/{company_id}/exposure` — get current profile
|
||||
- `PUT /companies/{company_id}/exposure` — create/update profile (archives previous version)
|
||||
- `GET /companies/{company_id}/exposure/history` — list profile versions
|
||||
|
||||
### Interpolation Engine
|
||||
|
||||
**Location**: `services/aggregation/interpolation.py`
|
||||
|
||||
Computes per-company macro impact scores by evaluating overlap between global event classifications and company exposure profiles.
|
||||
|
||||
```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class MacroImpactRecord:
    event_id: str
    company_id: str
    ticker: str
    macro_impact_score: float        # [0, 1]
    impact_direction: str            # positive|negative|mixed
    contributing_factors: list[str]  # which profile dimensions matched
    confidence: float                # [0, 1]
    computed_at: datetime
```
|
||||
|
||||
**Core Functions**:
|
||||
- `compute_macro_impact(event: GlobalEvent, profile: ExposureProfile) -> MacroImpactRecord`
|
||||
- `compute_geographic_overlap(event_regions: list[str], revenue_mix: dict[str, float]) -> float`
|
||||
- `compute_supply_chain_overlap(event_regions: list[str], supply_regions: list[str]) -> float`
|
||||
- `compute_commodity_overlap(event_commodities: list[str], company_commodities: list[str]) -> float`
|
||||
- `apply_resilience_modifier(raw_score: float, tier: str, event_is_international: bool) -> float`
|
||||
- `build_default_profile(sector: str, industry: str, market_cap_bucket: str) -> ExposureProfile`
|
||||
|
||||
**Scoring Formula**:
|
||||
```
raw_score = severity_weight * (
    geo_weight * geographic_overlap +
    supply_weight * supply_chain_overlap +
    commodity_weight * commodity_overlap +
    sector_weight * sector_match
)
final_score = apply_resilience_modifier(raw_score, market_position_tier, event_is_international)
```
|
||||
|
||||
Where:
|
||||
- `severity_weight`: critical=1.0, high=0.75, moderate=0.5, low=0.25
|
||||
- `geo_weight=0.35, supply_weight=0.25, commodity_weight=0.25, sector_weight=0.15`
|
||||
- Resilience modifiers: global_leader=0.7, multinational=0.85, regional=1.0, domestic=1.2 (for international events)
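
A minimal sketch of the scoring formula with the weights listed above. The severity weights, dimension weights, and resilience modifiers are taken from this section. How each overlap is computed (revenue share for geography, fraction of matched items for supply chain and commodities), the helper names `compute_set_overlap` and `score_macro_impact`, and the clamp to [0, 1] (needed because the domestic modifier of 1.2 can push a score above 1) are assumptions.

```python
SEVERITY_WEIGHT = {"critical": 1.0, "high": 0.75, "moderate": 0.5, "low": 0.25}
RESILIENCE_MODIFIER = {"global_leader": 0.7, "multinational": 0.85,
                       "regional": 1.0, "domestic": 1.2}
GEO_WEIGHT, SUPPLY_WEIGHT, COMMODITY_WEIGHT, SECTOR_WEIGHT = 0.35, 0.25, 0.25, 0.15


def compute_geographic_overlap(event_regions, revenue_mix):
    # Assumption: overlap is the share of revenue earned in affected regions.
    return min(1.0, sum(revenue_mix.get(r, 0.0) for r in set(event_regions)))


def compute_set_overlap(event_items, company_items):
    # Assumption: fraction of the company's items that the event touches.
    company = set(company_items)
    if not company:
        return 0.0
    return len(set(event_items) & company) / len(company)


def apply_resilience_modifier(raw_score, tier, event_is_international):
    # The listed modifiers apply to international events only.
    modifier = RESILIENCE_MODIFIER[tier] if event_is_international else 1.0
    return max(0.0, min(1.0, raw_score * modifier))  # assumption: clamp to [0, 1]


def score_macro_impact(severity, geographic_overlap, supply_chain_overlap,
                       commodity_overlap, sector_match, tier,
                       event_is_international=True):
    raw_score = SEVERITY_WEIGHT[severity] * (
        GEO_WEIGHT * geographic_overlap
        + SUPPLY_WEIGHT * supply_chain_overlap
        + COMMODITY_WEIGHT * commodity_overlap
        + SECTOR_WEIGHT * sector_match
    )
    return apply_resilience_modifier(raw_score, tier, event_is_international)
```

For example, a high-severity event touching regions that account for 40% of a global leader's revenue, with no other overlap, scores 0.75 × (0.35 × 0.4) × 0.7 ≈ 0.07.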
|
||||
|
||||
### Aggregation Engine Extensions
|
||||
|
||||
**Location**: Modified `services/aggregation/worker.py`
|
||||
|
||||
The existing `aggregate_company_window` function is extended to:
|
||||
1. Check the macro signal layer toggle (from `risk_configs` table)
|
||||
2. Fetch macro impact records for the ticker within the window
|
||||
3. Convert macro impact records to `WeightedSignal` objects using the same scoring pipeline
|
||||
4. Merge macro signals with company-specific signals before computing the trend summary
|
||||
5. Apply `macro_signal_weight` (default 0.3) to control relative influence
|
||||
|
||||
**New config field on `AggregationConfig`**:
|
||||
```python
macro_signal_weight: float = 0.3  # relative weight of macro vs company signals
macro_enabled: bool = True        # runtime toggle state
```
|
||||
|
||||
**Macro signal conversion**: Each `MacroImpactRecord` is converted to a `WeightedSignal` using:
|
||||
- `document_id` = event's `source_document_id` (for evidence tracing)
|
||||
- `sentiment_value` = mapped from `impact_direction` (positive=+1, negative=-1, mixed=0)
|
||||
- `impact_score` = `macro_impact_score * macro_signal_weight`
|
||||
- Recency decay uses the global event's publication time
|
||||
- Confidence gating uses the macro impact record's confidence
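
The conversion rules above can be sketched as a single function. The direction mapping and the `macro_impact_score * macro_signal_weight` product come from this section. The `WeightedSignal` dataclass shown here is a simplified, hypothetical stand-in for the real abstraction, and the exponential half-life decay and the 0.3 confidence floor are placeholder assumptions for whatever the existing scoring pipeline actually applies.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

DIRECTION_TO_SENTIMENT = {"positive": 1.0, "negative": -1.0, "mixed": 0.0}


@dataclass
class WeightedSignal:
    """Hypothetical, simplified stand-in for the aggregation engine's type."""
    document_id: str
    sentiment_value: float
    impact_score: float
    recency_weight: float
    confidence: float


def macro_record_to_signal(source_document_id: str,
                           impact_direction: str,
                           macro_impact_score: float,
                           confidence: float,
                           event_published_at: datetime,
                           reference_time: datetime,
                           macro_signal_weight: float = 0.3,
                           half_life_hours: float = 72.0,   # assumption
                           min_confidence: float = 0.3      # assumption
                           ) -> Optional[WeightedSignal]:
    # Confidence gating: drop records below the floor.
    if confidence < min_confidence:
        return None
    # Recency decay is measured from the global event's publication time.
    age_hours = max(0.0, (reference_time - event_published_at).total_seconds() / 3600)
    recency_weight = 0.5 ** (age_hours / half_life_hours)
    return WeightedSignal(
        document_id=source_document_id,  # kept for evidence tracing
        sentiment_value=DIRECTION_TO_SENTIMENT[impact_direction],
        impact_score=macro_impact_score * macro_signal_weight,
        recency_weight=recency_weight,
        confidence=confidence,
    )
```

Note that a `mixed` record maps to a sentiment of 0, so it contributes weight to the window without pushing the direction either way.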
|
||||
|
||||
### Trend Projection Module
|
||||
|
||||
**Location**: `services/aggregation/projection.py`
|
||||
|
||||
Computes forward-looking trend projections alongside current trend summaries.
|
||||
|
||||
```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TrendProjection:
    projected_direction: str       # bullish|bearish|mixed|neutral
    projected_strength: float      # [0, 1]
    projected_confidence: float    # [0, 1]
    projection_horizon: str        # 1d|7d|30d
    driving_factors: list[str]     # human-readable explanations
    macro_contribution_pct: float  # % of projection driven by macro signals
    diverges_from_current: bool    # True if projection != current direction
    computed_at: datetime
```
|
||||
|
||||
**Inputs**:
|
||||
- Current trend summary (direction, strength, momentum)
|
||||
- Active global events with `estimated_duration` extending beyond the current window
|
||||
- Upcoming known catalysts from document intelligence (earnings dates, regulatory deadlines)
|
||||
- Historical resolution patterns for similar event types (optional, v2)
|
||||
|
||||
**Projection Logic**:
|
||||
1. Compute trend momentum as rate of change in strength across recent windows
|
||||
2. Project macro signal decay based on event `estimated_duration` and severity
|
||||
3. Factor in upcoming catalysts that may shift direction
|
||||
4. Combine momentum + macro trajectory + catalyst outlook into projected direction/strength
|
||||
5. Flag divergence when projected direction differs from current direction
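
Steps 1 and 5 of the projection logic are simple enough to sketch directly. The version below treats momentum as the average per-window change in strength and extrapolates it linearly; both choices, and all three helper names, are assumptions. Steps 2 to 4 (macro decay, catalysts, and the final combination) are left out because the design does not specify their formulas.

```python
def compute_momentum(strengths):
    """Average change in strength per window, oldest value first."""
    if len(strengths) < 2:
        return 0.0
    return (strengths[-1] - strengths[0]) / (len(strengths) - 1)


def project_strength(current_strength, momentum, steps=1):
    """Assumption: linear extrapolation, clamped to the [0, 1] strength range."""
    return max(0.0, min(1.0, current_strength + momentum * steps))


def diverges_from_current(projected_direction, current_direction):
    """Step 5: flag projections whose direction differs from the current trend."""
    return projected_direction != current_direction
```

With strengths of 0.2, 0.4, and 0.6 across three windows, momentum is about 0.2 per window, so a one-step projection from 0.6 lands near 0.8.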
|
||||
|
||||
### Macro Signal Suppression
|
||||
|
||||
**Location**: Extended `services/recommendation/suppression.py`
|
||||
|
||||
New suppression check: when macro signals are the sole basis for a trend direction change (no supporting company-specific signals agree), the recommendation is forced to informational mode with a macro-only caveat.
|
||||
|
||||
**New function**:
|
||||
- `evaluate_macro_only_suppression(summary: TrendSummary, macro_signal_count: int, company_signal_count: int) -> bool`
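
One possible reading of this check is sketched below. The function name and parameters follow the signature above. The `TrendSummary` fields shown are hypothetical (the real class is defined elsewhere), and the sketch assumes `company_signal_count` counts only company-specific signals that support the new direction.

```python
from dataclasses import dataclass


@dataclass
class TrendSummary:
    """Hypothetical minimal stand-in; the real TrendSummary has more fields."""
    direction: str
    previous_direction: str


def evaluate_macro_only_suppression(summary: TrendSummary,
                                    macro_signal_count: int,
                                    company_signal_count: int) -> bool:
    """Return True when a direction change rests on macro signals alone."""
    direction_changed = summary.direction != summary.previous_direction
    return (direction_changed
            and macro_signal_count > 0
            and company_signal_count == 0)
```

When the function returns `True`, the caller would force the recommendation to informational mode and attach the macro-only caveat.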
|
||||
|
||||
### Exposure Profile Auto-Inference
|
||||
|
||||
**Location**: `services/extractor/exposure_inference.py`
|
||||
|
||||
Infers baseline exposure profiles from company filing extractions when no manual profile exists.
|
||||
|
||||
**Interface**:
|
||||
- `infer_exposure_profile(document_intelligences: list[DocumentIntelligence], sector: str, industry: str, market_cap_bucket: str) -> ExposureProfile`
|
||||
|
||||
Scans recent filing extractions for geographic revenue breakdowns, supplier mentions, and commodity references. Produces an `ExposureProfile` with `source='inferred'` and a confidence score reflecting data quality.
|
||||
|
||||
### Query API Extensions
|
||||
|
||||
**Location**: Extended `services/api/`
|
||||
|
||||
New endpoints:
|
||||
- `GET /api/macro/events` — list recent global events with filtering
|
||||
- `GET /api/macro/events/{event_id}` — event detail with affected companies
|
||||
- `GET /api/macro/impacts/{ticker}` — macro impacts for a company
|
||||
- `GET /api/admin/macro/status` — macro layer enabled/disabled state
|
||||
- `PUT /api/admin/macro/toggle` — toggle macro layer on/off
|
||||
- `GET /api/trends/{trend_id}/projection` — trend projection for a specific window
|
||||
|
||||
### Dashboard Extensions
|
||||
|
||||
**Location**: Extended `frontend/src/`
|
||||
|
||||
New pages/panels:
|
||||
- **Global Events page** (`/macro/events`): filterable list of global events with severity badges, region/sector tags, and drill-down to affected companies
|
||||
- **Macro Exposure panel** on Company Detail page: shows exposure profile and active macro impacts
|
||||
- **Macro evidence indicators** on Trend and Recommendation detail pages: visually distinguishes macro-sourced evidence
|
||||
- **Trend projection display** on Trend detail page: projected direction/strength with driving factors
|
||||
- **Macro toggle** on Trading Controls page: enable/disable switch with confirmation dialog
|
||||
|
||||
## Data Models
|
||||
|
||||
### New PostgreSQL Tables
|
||||
|
||||
#### `global_events`
|
||||
```sql
CREATE TABLE global_events (
    id                   UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    event_types          TEXT[] NOT NULL,
    severity             VARCHAR(20) NOT NULL,
    affected_regions     TEXT[] NOT NULL DEFAULT '{}',
    affected_sectors     TEXT[] NOT NULL DEFAULT '{}',
    affected_commodities TEXT[] NOT NULL DEFAULT '{}',
    summary              TEXT NOT NULL,
    key_facts            JSONB NOT NULL DEFAULT '[]',
    estimated_duration   VARCHAR(20) NOT NULL,
    confidence           FLOAT NOT NULL,
    source_document_id   UUID REFERENCES documents(id),
    model_provider       VARCHAR(100),
    model_name           VARCHAR(200),
    prompt_version       VARCHAR(100),
    schema_version       VARCHAR(20),
    created_at           TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
```
|
||||
|
||||
#### `macro_impact_records`
|
||||
```sql
CREATE TABLE macro_impact_records (
    id                   UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    event_id             UUID NOT NULL REFERENCES global_events(id),
    company_id           UUID NOT NULL REFERENCES companies(id),
    ticker               VARCHAR(20) NOT NULL,
    macro_impact_score   FLOAT NOT NULL,
    impact_direction     VARCHAR(20) NOT NULL,
    contributing_factors JSONB NOT NULL DEFAULT '[]',
    confidence           FLOAT NOT NULL,
    computed_at          TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
```
|
||||
|
||||
#### `exposure_profiles`
|
||||
```sql
CREATE TABLE exposure_profiles (
    id                       UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    company_id               UUID NOT NULL REFERENCES companies(id),
    geographic_revenue_mix   JSONB NOT NULL DEFAULT '{}',
    supply_chain_regions     TEXT[] NOT NULL DEFAULT '{}',
    key_input_commodities    TEXT[] NOT NULL DEFAULT '{}',
    regulatory_jurisdictions TEXT[] NOT NULL DEFAULT '{}',
    market_position_tier     VARCHAR(30) NOT NULL DEFAULT 'regional',
    export_dependency_pct    FLOAT NOT NULL DEFAULT 0.0,
    source                   VARCHAR(20) NOT NULL DEFAULT 'manual',
    confidence               FLOAT NOT NULL DEFAULT 1.0,
    version                  INTEGER NOT NULL DEFAULT 1,
    active                   BOOLEAN NOT NULL DEFAULT TRUE,
    created_at               TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at               TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
```
|
||||
|
||||
#### `trend_projections`
|
||||
```sql
CREATE TABLE trend_projections (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    trend_window_id UUID NOT NULL REFERENCES trend_windows(id),
    projected_direction VARCHAR(20) NOT NULL,
    projected_strength FLOAT NOT NULL,
    projected_confidence FLOAT NOT NULL,
    projection_horizon VARCHAR(10) NOT NULL,
    driving_factors JSONB NOT NULL DEFAULT '[]',
    macro_contribution_pct FLOAT NOT NULL DEFAULT 0.0,
    diverges_from_current BOOLEAN NOT NULL DEFAULT FALSE,
    computed_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
```
|
||||
|
||||
### New Pydantic Schemas
|
||||
|
||||
Added to `services/shared/schemas.py`:
|
||||
|
||||
```python
class ImpactType(str, Enum):
    SUPPLY_DISRUPTION = "supply_disruption"
    DEMAND_SHIFT = "demand_shift"
    COST_INCREASE = "cost_increase"
    REGULATORY_PRESSURE = "regulatory_pressure"
    CURRENCY_IMPACT = "currency_impact"
    COMMODITY_SHOCK = "commodity_shock"
    TRADE_BARRIER = "trade_barrier"
    GEOPOLITICAL_RISK = "geopolitical_risk"


class SeverityLevel(str, Enum):
    LOW = "low"
    MODERATE = "moderate"
    HIGH = "high"
    CRITICAL = "critical"


class MarketPositionTier(str, Enum):
    GLOBAL_LEADER = "global_leader"
    MULTINATIONAL = "multinational"
    REGIONAL = "regional"
    DOMESTIC = "domestic"


class EstimatedDuration(str, Enum):
    SHORT_TERM = "short_term"
    MEDIUM_TERM = "medium_term"
    LONG_TERM = "long_term"
```
|
||||
|
||||
### Analytical Lake Datasets
|
||||
|
||||
New fact tables published to MinIO under `stonks-lakehouse/`:
|
||||
|
||||
- `lake.global_events`, partitioned by `dt`, columns: event_id, event_types, severity, affected_regions, affected_sectors, affected_commodities, summary, estimated_duration, confidence, source_document_id, created_at
- `lake.macro_impacts`, partitioned by `dt` and `ticker`, columns: event_id, company_id, ticker, macro_impact_score, impact_direction, contributing_factors, confidence, computed_at
- `lake.trend_projections`, partitioned by `dt` and `ticker`, columns: trend_window_id, ticker, projected_direction, projected_strength, projected_confidence, projection_horizon, driving_factors, macro_contribution_pct, diverges_from_current, computed_at
|
||||
|
||||
|
||||
|
||||
## Correctness Properties
|
||||
|
||||
*A property is a characteristic or behavior that should hold true across all valid executions of a system — essentially, a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.*
|
||||
|
||||
### Property 1: Content hash stability and uniqueness
|
||||
|
||||
*For any* macro news article content, computing the content hash twice on identical content SHALL produce the same hash, and computing the hash on distinct content SHALL produce different hashes.
|
||||
|
||||
**Validates: Requirements 1.2**
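
As an illustration of Property 1, here is a minimal sketch of a stable content hash. It assumes SHA-256 over Unicode-normalized, whitespace-collapsed text; the helper name `content_hash` and the normalization step are hypothetical and may differ from the existing deduplication logic in the ingestion service.

```python
import hashlib
import unicodedata


def content_hash(text: str) -> str:
    """Return a stable hex digest for article content (hypothetical helper)."""
    # Assumption: normalize Unicode and collapse whitespace so that trivial
    # formatting differences do not change the hash.
    normalized = unicodedata.normalize("NFC", text)
    normalized = " ".join(normalized.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

Under this sketch, two articles that differ only in whitespace count as identical content, while any other difference yields a different digest, up to the collision resistance of SHA-256.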
|
||||
|
||||
### Property 2: Macro pipeline output schema completeness
|
||||
|
||||
*For any* valid Ollama classification response, the resulting GlobalEvent object SHALL contain all required fields (event_id, event_types, severity, affected_regions, affected_sectors, summary, estimated_duration, confidence, source_document_id, model_metadata). Similarly, *for any* valid macro impact computation, the resulting MacroImpactRecord SHALL contain all required fields (event_id, company_id, ticker, macro_impact_score, impact_direction, contributing_factors, confidence).
|
||||
|
||||
**Validates: Requirements 2.2, 4.5**
|
||||
|
||||
### Property 3: Multiple impact types preserved
|
||||
|
||||
*For any* global event classification where the source article implies N distinct impact types, the resulting GlobalEvent's event_types list SHALL contain all N types without collapsing to a single category.
|
||||
|
||||
**Validates: Requirements 2.4**
|
||||
|
||||
### Property 4: Macro data persistence round-trip
|
||||
|
||||
*For any* valid GlobalEvent, MacroImpactRecord, ExposureProfile, or TrendProjection object, persisting it to PostgreSQL and reading it back SHALL produce an equivalent object with all fields preserved.
|
||||
|
||||
**Validates: Requirements 3.1, 7.1, 7.2, 12.5**
|
||||
|
||||
### Property 5: Default exposure profile derivation
|
||||
|
||||
*For any* company with a valid sector, industry, and market_cap_bucket but no manually configured ExposureProfile, the default profile SHALL have a market_position_tier consistent with the market_cap_bucket mapping (large_cap → global_leader, mid_cap → multinational, small_cap → regional, micro_cap → domestic) and SHALL have non-empty geographic_revenue_mix derived from the sector.
|
||||
|
||||
**Validates: Requirements 3.2**
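
The derivation in Property 5 could be sketched as follows. The tier mapping is taken from the property text; the sector revenue mixes, the fallback mix, and the names `derive_default_profile` and `DefaultExposureProfile` are illustrative assumptions, not values from the design.

```python
from dataclasses import dataclass

# Mapping stated in Property 5.
TIER_BY_MARKET_CAP = {
    "large_cap": "global_leader",
    "mid_cap": "multinational",
    "small_cap": "regional",
    "micro_cap": "domestic",
}

# Hypothetical sector defaults; real values would come from configuration.
DEFAULT_REVENUE_MIX_BY_SECTOR = {
    "technology": {"US": 0.5, "EU": 0.3, "APAC": 0.2},
}
FALLBACK_REVENUE_MIX = {"US": 1.0}  # assumed fallback so the mix is never empty


@dataclass
class DefaultExposureProfile:
    market_position_tier: str
    geographic_revenue_mix: dict[str, float]


def derive_default_profile(sector: str, market_cap_bucket: str) -> DefaultExposureProfile:
    """Build a default profile from sector and market-cap bucket."""
    # Unknown buckets fall back to the conservative 'regional' tier (assumption).
    tier = TIER_BY_MARKET_CAP.get(market_cap_bucket, "regional")
    mix = DEFAULT_REVENUE_MIX_BY_SECTOR.get(sector.lower(), FALLBACK_REVENUE_MIX)
    return DefaultExposureProfile(market_position_tier=tier, geographic_revenue_mix=dict(mix))
```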
|
||||
|
||||
### Property 6: Exposure profile version history
|
||||
|
||||
*For any* sequence of N updates to a company's ExposureProfile, the version history SHALL contain exactly N records, each preserving the complete profile state at the time of that update, with monotonically increasing version numbers.
|
||||
|
||||
**Validates: Requirements 3.3**
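
One way to picture the version-history invariant in Property 6 is an append-only list of snapshots. The sketch below is an in-memory stand-in for the `exposure_profiles` table; the class names and the reduced field set are assumptions made for brevity.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ProfileVersion:
    version: int
    market_position_tier: str
    export_dependency_pct: float


@dataclass
class ProfileHistory:
    versions: list[ProfileVersion] = field(default_factory=list)

    def update(self, market_position_tier: str, export_dependency_pct: float) -> ProfileVersion:
        """Append a full snapshot; earlier versions are never mutated."""
        next_version = self.versions[-1].version + 1 if self.versions else 1
        snapshot = ProfileVersion(next_version, market_position_tier, export_dependency_pct)
        self.versions.append(snapshot)
        return snapshot
```

After N calls to `update`, the history holds exactly N frozen snapshots with version numbers 1 through N.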
|
||||
|
||||
### Property 7: Macro impact score bounds and zero-overlap invariant
|
||||
|
||||
*For any* GlobalEvent and ExposureProfile pair, the computed Macro_Impact_Score SHALL be in [0, 1]. Furthermore, *for any* pair where the event's affected_regions, affected_sectors, and affected_commodities have zero intersection with the profile's geographic_revenue_mix keys, supply_chain_regions, and key_input_commodities, the score SHALL be exactly 0.0.
|
||||
|
||||
**Validates: Requirements 4.1, 4.4**
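
The bounds and zero-overlap invariant can be illustrated with a simplified scoring function. This is a sketch only: the severity weights, the 0.5/0.3/0.2 dimension weights, and the function name are assumptions, and sector overlap is omitted for brevity.

```python
# Assumed severity weights; the design does not specify numeric values.
SEVERITY_WEIGHT = {"low": 0.25, "moderate": 0.5, "high": 0.75, "critical": 1.0}


def macro_impact_score(
    severity: str,
    affected_regions: set[str],
    affected_commodities: set[str],
    geographic_revenue_mix: dict[str, float],
    supply_chain_regions: set[str],
    key_input_commodities: set[str],
) -> float:
    """Score in [0, 1]; exactly 0.0 when nothing overlaps."""
    # Share of revenue earned in regions the event affects.
    geo = sum(share for region, share in geographic_revenue_mix.items()
              if region in affected_regions)
    # Fraction of supply-chain regions hit by the event.
    supply = (len(supply_chain_regions & affected_regions) / len(supply_chain_regions)
              if supply_chain_regions else 0.0)
    # Fraction of key input commodities hit by the event.
    commodity = (len(key_input_commodities & affected_commodities) / len(key_input_commodities)
                 if key_input_commodities else 0.0)

    overlap = 0.5 * geo + 0.3 * supply + 0.2 * commodity  # assumed weights
    if overlap == 0.0:
        return 0.0  # zero-overlap invariant (Requirement 4.4)
    return max(0.0, min(1.0, SEVERITY_WEIGHT[severity] * overlap))
```

Because the severity weights are non-decreasing and every term is non-negative, this sketch also satisfies the monotonicity expected of the real scoring formula.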
|
||||
|
||||
### Property 8: Scoring monotonicity
|
||||
|
||||
*For any* GlobalEvent and ExposureProfile pair, increasing the event's severity level (low → moderate → high → critical) while holding all other inputs constant SHALL produce a Macro_Impact_Score that is greater than or equal to the previous score. Similarly, increasing the geographic overlap percentage SHALL produce a score greater than or equal to the previous score.
|
||||
|
||||
**Validates: Requirements 4.2**
|
||||
|
||||
### Property 9: Resilience modifier tier ordering
|
||||
|
||||
*For any* positive raw impact score and an international event, applying the resilience modifier with market_position_tier=global_leader SHALL produce a final score less than or equal to multinational, which SHALL be less than or equal to regional, which SHALL be less than or equal to domestic.
|
||||
|
||||
**Validates: Requirements 4.3**
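
The tier ordering can be sketched with a multiplicative modifier followed by a clamp. The modifier values and the name `apply_resilience` are hypothetical; only the ordering (global_leader dampened most, domestic amplified) comes from the requirements.

```python
# Assumed modifiers: values below 1.0 dampen, values above 1.0 amplify.
RESILIENCE_MODIFIER = {
    "global_leader": 0.7,
    "multinational": 0.85,
    "regional": 1.0,
    "domestic": 1.15,
}


def apply_resilience(raw_score: float, tier: str, international: bool = True) -> float:
    """Apply the tier modifier to international events and clamp to [0, 1]."""
    modifier = RESILIENCE_MODIFIER[tier] if international else 1.0
    return max(0.0, min(1.0, raw_score * modifier))
```

Clamping after the multiplication keeps the amplified domestic score inside [0, 1] without breaking the ordering, since `min` and `max` are monotone.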
|
||||
|
||||
### Property 10: Mixed direction for dual-effect events
|
||||
|
||||
*For any* GlobalEvent and ExposureProfile pair where the computation identifies both positive and negative contributing factors, the resulting impact_direction SHALL be 'mixed' and both positive and negative factors SHALL be preserved separately in contributing_factors.
|
||||
|
||||
**Validates: Requirements 4.6**
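
A minimal sketch of the direction rule follows. It assumes each contributing factor is a dict with hypothetical `dimension` and `effect` keys; the actual JSONB layout of `contributing_factors` is not specified in the design.

```python
def resolve_impact_direction(
    factors: list[dict[str, str]],
) -> tuple[str, dict[str, list[str]]]:
    """Derive impact_direction and keep positive/negative factors separate."""
    grouped: dict[str, list[str]] = {"positive": [], "negative": []}
    for factor in factors:
        # Assumed shape: {"dimension": "geographic_revenue_mix", "effect": "positive"}
        grouped[factor["effect"]].append(factor["dimension"])

    if grouped["positive"] and grouped["negative"]:
        direction = "mixed"
    elif grouped["positive"]:
        direction = "positive"
    elif grouped["negative"]:
        direction = "negative"
    else:
        # Zero-overlap pairs are skipped earlier, so an empty list is unexpected.
        raise ValueError("at least one contributing factor is required")
    return direction, grouped
```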
|
||||
|
||||
### Property 11: Macro signals influence trend output
|
||||
|
||||
*For any* company with both company-specific signals and non-zero macro impact signals, the trend summary computed with macro signals included SHALL differ from the trend summary computed with only company-specific signals (in at least one of: trend_strength, confidence, or evidence references).
|
||||
|
||||
**Validates: Requirements 5.1**
|
||||
|
||||
### Property 12: Macro-company contradiction detection
|
||||
|
||||
*For any* set of signals where macro impact signals have a negative direction and company-specific signals have a positive sentiment (or vice versa), the resulting trend summary's contradiction_score SHALL be greater than zero and disagreement_details SHALL contain at least one entry.
|
||||
|
||||
**Validates: Requirements 5.3**
|
||||
|
||||
### Property 13: Macro evidence traceability
|
||||
|
||||
*For any* trend summary that includes macro signal contributions, the top_supporting_evidence or top_opposing_evidence lists SHALL contain the source_document_id of at least one contributing GlobalEvent.
|
||||
|
||||
**Validates: Requirements 5.4**
|
||||
|
||||
### Property 14: No degradation without macro data and disabled-layer equivalence
|
||||
|
||||
*For any* company with no macro impact records in the aggregation window, the trend summary produced with the macro layer enabled SHALL be identical to the trend summary produced with the macro layer disabled. Furthermore, *for any* aggregation run with the macro layer disabled, the output SHALL be identical to company-only aggregation regardless of existing macro data.
|
||||
|
||||
**Validates: Requirements 5.5, 11.2**
|
||||
|
||||
### Property 15: Sector and market rollup macro incorporation
|
||||
|
||||
*For any* sector containing companies with non-zero macro impact scores, the sector-level rollup SHALL reflect those macro signals in its trend_strength or confidence. Furthermore, *for any* GlobalEvent that disproportionately affects a single sector (>60% of total macro impact concentrated in one sector), that sector SHALL appear in the market-level rollup's material_risks or dominant_catalysts.
|
||||
|
||||
**Validates: Requirements 6.1, 6.2, 6.3**
|
||||
|
||||
### Property 16: Inferred exposure profile correctness
|
||||
|
||||
*For any* set of filing extractions containing geographic revenue breakdowns or commodity references, the inferred ExposureProfile SHALL have source='inferred', confidence in [0, 1], and geographic_revenue_mix entries that correspond to regions mentioned in the filings.
|
||||
|
||||
**Validates: Requirements 9.1, 9.2**
|
||||
|
||||
### Property 17: Low-confidence event exclusion
|
||||
|
||||
*For any* GlobalEvent classification with confidence below the configurable threshold (default 0.4), the Interpolation_Engine SHALL produce zero MacroImpactRecords for that event.
|
||||
|
||||
**Validates: Requirements 10.1**
|
||||
|
||||
### Property 18: Accelerated decay for stale short-term events
|
||||
|
||||
*For any* GlobalEvent with estimated_duration='short_term' and age exceeding 48 hours, the effective signal weight SHALL be strictly less than the weight computed using standard recency decay for the same age.
|
||||
|
||||
**Validates: Requirements 10.2**
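
The accelerated decay might look like the sketch below. It assumes the standard recency decay is exponential with a 72-hour half-life and that hours beyond the 48-hour threshold decay twice as fast; both parameters and both function names are illustrative, not the behavior of the existing `compute_signal_weight`.

```python
def recency_weight(age_hours: float, half_life_hours: float = 72.0) -> float:
    """Standard exponential recency decay (assumed form)."""
    return 0.5 ** (age_hours / half_life_hours)


def effective_macro_weight(
    age_hours: float,
    estimated_duration: str,
    half_life_hours: float = 72.0,
    stale_after_hours: float = 48.0,
    acceleration: float = 2.0,
) -> float:
    """Decay short-term events faster once they are older than the threshold."""
    base = recency_weight(age_hours, half_life_hours)
    if estimated_duration == "short_term" and age_hours > stale_after_hours:
        # Hours past the threshold count `acceleration` times toward decay.
        extra_hours = age_hours - stale_after_hours
        return base * 0.5 ** (extra_hours * (acceleration - 1.0) / half_life_hours)
    return base
```

With `acceleration` greater than 1, the extra factor is strictly below 1 for any age past the threshold, so the weight is strictly less than the standard decay, and it is continuous at exactly 48 hours.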
|
||||
|
||||
### Property 19: Macro-only recommendation suppression
|
||||
|
||||
*For any* trend summary where the trend direction is driven solely by macro signals (no company-specific signals support the direction), the resulting recommendation SHALL have mode='informational' and the thesis SHALL contain a macro-only caveat.
|
||||
|
||||
**Validates: Requirements 10.3**
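
The suppression rule can be sketched as a post-processing step on a recommendation. The `Signal` and `Recommendation` shapes, the `"actionable"` mode used in the usage note, and the caveat wording are hypothetical; only `mode='informational'` and the presence of a macro-only caveat come from the property.

```python
from dataclasses import dataclass

MACRO_ONLY_CAVEAT = "Caveat: this direction is supported by macro signals only."  # assumed wording


@dataclass
class Signal:
    source: str     # "company" or "macro"
    direction: str  # "positive" or "negative"


@dataclass
class Recommendation:
    mode: str
    thesis: str


def apply_macro_only_suppression(
    rec: Recommendation, trend_direction: str, signals: list[Signal]
) -> Recommendation:
    """Force informational mode when only macro signals support the trend direction."""
    company_support = any(
        s.source == "company" and s.direction == trend_direction for s in signals
    )
    macro_support = any(
        s.source == "macro" and s.direction == trend_direction for s in signals
    )
    if macro_support and not company_support:
        return Recommendation(mode="informational", thesis=f"{rec.thesis} {MACRO_ONLY_CAVEAT}")
    return rec
```

For example, a recommendation with mode `"actionable"` on a negative trend backed by one negative macro signal and one positive company signal would come back as informational with the caveat appended.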
|
||||
|
||||
### Property 20: Trend projection always produced
|
||||
|
||||
*For any* trend summary produced by the Aggregation_Engine, a corresponding TrendProjection SHALL also be produced with valid projected_direction, projected_strength in [0, 1], projected_confidence in [0, 1], and a non-empty driving_factors list.
|
||||
|
||||
**Validates: Requirements 12.1**
|
||||
|
||||
### Property 21: Projection divergence flagging
|
||||
|
||||
*For any* TrendProjection where projected_direction differs from the current trend summary's trend_direction, the diverges_from_current field SHALL be True and driving_factors SHALL contain at least one entry explaining the divergence.
|
||||
|
||||
**Validates: Requirements 12.3**
|
||||
|
||||
### Property 22: Macro-disabled projections have reduced confidence
|
||||
|
||||
*For any* identical set of company signals and macro signals, the TrendProjection computed with the macro layer disabled SHALL have projected_confidence less than or equal to the projection computed with the macro layer enabled.
|
||||
|
||||
**Validates: Requirements 12.4**
|
||||
|
||||
### Property 23: Low-confidence projection exclusion
|
||||
|
||||
*For any* TrendProjection with projected_confidence below the configurable threshold (default 0.3), the projection SHALL be marked as low_confidence and SHALL NOT influence recommendation eligibility.
|
||||
|
||||
**Validates: Requirements 12.9**
|
||||
|
||||
## Error Handling
|
||||
|
||||
### Macro Ingestion Failures
|
||||
- Source fetch failures follow existing retry/backoff logic from the ingestion service
- Sustained macro source failures (configurable threshold, default 3 consecutive) trigger operator alerts via the existing alerting framework
- The aggregation engine continues producing trends using company-specific signals only when macro ingestion is degraded
|
||||
|
||||
### Event Classification Failures
|
||||
- Invalid Ollama responses trigger retries per existing extraction retry policy (max 2 retries with exponential backoff)
- Failed classifications are preserved in MinIO with validation errors for debugging
- Failed events do not produce macro impact records; they are excluded from interpolation without raising an error
|
||||
|
||||
### Exposure Profile Fallbacks
|
||||
- Missing manual profiles fall back to sector-based defaults
- Failed auto-inference falls back to sector-based defaults
- Default profiles use conservative assumptions (regional tier, even geographic distribution within sector norms)
|
||||
|
||||
### Interpolation Engine Failures
|
||||
- Database errors during macro impact computation are logged and the event is skipped for that company
- The aggregation engine treats missing macro data as "no macro signal" and never blocks trend computation
|
||||
|
||||
### Projection Failures
|
||||
- If projection computation fails (e.g., insufficient historical data), the trend summary is still persisted without a projection
- Low-confidence projections are marked but still displayed as informational
|
||||
|
||||
### Runtime Toggle Safety
|
||||
- Toggle state is read from PostgreSQL at the start of each aggregation cycle, with no caching that could become stale
- Toggle changes are audit-logged with operator identity, previous state, and new state
- Disabling the macro layer does not delete any data: ingestion and classification continue, and only interpolation and aggregation integration are skipped
|
||||
|
||||
## Testing Strategy
|
||||
|
||||
### Property-Based Testing
|
||||
|
||||
This feature is well-suited for property-based testing. The core interpolation logic (impact scoring, overlap computation, resilience modifiers, signal weighting) consists of pure functions with clear input/output behavior and a large input space. The scoring formula has universal properties (monotonicity, bounds, zero-overlap invariant) that should hold across all valid inputs.
|
||||
|
||||
**Library**: [Hypothesis](https://hypothesis.readthedocs.io/) for Python property-based testing.
|
||||
|
||||
**Configuration**: Minimum 100 iterations per property test.
|
||||
|
||||
**Tag format**: `Feature: global-news-interpolation, Property {number}: {property_text}`
|
||||
|
||||
Each correctness property above maps to one property-based test. Generators will produce:
|
||||
- Random `GlobalEvent` objects with valid enum values and realistic field ranges
- Random `ExposureProfile` objects with valid geographic mixes (summing to ~1.0), commodity lists, and tier values
- Random `WeightedSignal` lists mixing macro and company-specific signals
- Random `TrendSummary` objects for projection testing
|
||||
|
||||
### Unit Tests
|
||||
|
||||
Unit tests cover specific examples, edge cases, and integration points:
|
||||
- Event classification prompt construction and schema validation
- Exposure profile API CRUD operations
- Default profile generation for each sector/market_cap combination
- Macro toggle API endpoints (status, toggle, audit logging)
- Recommendation thesis text includes macro signal references when present
- Dashboard component rendering for Global Events page, macro exposure panel, and projection display
|
||||
|
||||
### Integration Tests
|
||||
|
||||
Integration tests verify end-to-end data flow:
|
||||
- Macro article ingestion → parsing → classification → interpolation → aggregation pipeline
- Lake publisher writes correct Parquet partitions for global events and macro impacts
- Trino queries joining global_events, macro_impacts, and trend_windows return expected results
- Macro toggle state change propagates to next aggregation cycle
|
||||
@@ -0,0 +1,167 @@
|
||||
# Requirements Document — Global News Interpolation Layer
|
||||
|
||||
## Introduction
|
||||
|
||||
This feature adds a macro-level global news interpolation layer to the Stonks Oracle platform. The existing system ingests company-specific news, filings, and market data to produce per-company trend summaries and trade recommendations. This extension introduces a parallel signal path that ingests global and geopolitical news events — tariffs, wars, sanctions, central bank rate decisions, commodity shocks, natural disasters, regulatory changes, pandemics, and similar macro events — classifies them by impact type and severity, maps them to affected business sectors and individual companies based on exposure profiles, and feeds the resulting macro intelligence into the aggregation engine as an additional weighted signal layer alongside existing company-specific document intelligence.
|
||||
|
||||
The interpolation layer accounts for the fact that the same global event affects different businesses differently depending on their business class, what they produce or market, their geographic revenue exposure, supply chain dependencies, and their position on the world scale (domestic-only vs. multinational vs. emerging-market-dependent).
|
||||
|
||||
## Glossary
|
||||
|
||||
- **Global_Event**: A macro-level news event with potential cross-sector or cross-geography market impact (e.g., a tariff announcement, armed conflict, central bank rate decision, commodity supply disruption, natural disaster, or regulatory change).
|
||||
- **Event_Classifier**: The Ollama-based extraction service that classifies a Global_Event by impact type, severity, affected regions, and affected sectors.
|
||||
- **Exposure_Profile**: A per-company record describing geographic revenue mix, supply chain dependencies, key input commodities, regulatory jurisdictions, and market position tier that determines how a Global_Event maps to that company.
|
||||
- **Macro_Impact_Score**: A computed score in [0, 1] representing the estimated magnitude of a Global_Event's effect on a specific company, derived from the event's severity and the company's Exposure_Profile overlap.
|
||||
- **Interpolation_Engine**: The component that combines Global_Event classifications with company Exposure_Profiles to produce per-company Macro_Impact_Scores and feed them into the existing Aggregation_Engine.
|
||||
- **Aggregation_Engine**: The existing trend aggregation system (services/aggregation/) that computes rolling trend summaries from document intelligence signals.
|
||||
- **Impact_Type**: The category of economic effect a Global_Event produces (e.g., supply_disruption, demand_shift, cost_increase, regulatory_pressure, currency_impact, commodity_shock, trade_barrier, geopolitical_risk).
|
||||
- **Severity_Level**: A classification of a Global_Event's magnitude: low, moderate, high, or critical.
|
||||
- **Market_Position_Tier**: A company's scale classification affecting its resilience to macro shocks: global_leader, multinational, regional, or domestic.
|
||||
- **Macro_Source**: A news source configured specifically for global/macro event ingestion, distinct from company-specific news sources.
|
||||
|
||||
## Requirements
|
||||
|
||||
### Requirement 1: Global Event Ingestion
|
||||
|
||||
**User Story:** As an analyst, I want the platform to ingest global and geopolitical news from macro-focused sources, so that macro events are captured alongside company-specific intelligence.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. WHEN the Scheduler triggers a macro news ingestion cycle, THE Ingestion_Engine SHALL fetch articles from configured Macro_Sources and persist raw response payloads to MinIO under the `stonks-raw-news` bucket with a `macro/` prefix path segment.
|
||||
2. WHEN a macro news article is ingested, THE Ingestion_Engine SHALL generate a stable content hash and use it to prevent duplicate processing, consistent with existing deduplication behavior.
|
||||
3. WHEN a macro news article is ingested, THE Ingestion_Engine SHALL persist a metadata record in PostgreSQL with source, URL, title, publication time, retrieval time, language, and content hash, using document_type `macro_event`.
|
||||
4. IF a macro news source is unreachable or returns an error, THEN THE Ingestion_Engine SHALL record the failure reason, retry policy state, and next eligible retry time, consistent with existing source failure handling.
|
||||
|
||||
### Requirement 2: Global Event Classification
|
||||
|
||||
**User Story:** As an analyst, I want each global news article classified by impact type, severity, affected regions, and affected sectors, so that the platform understands what kind of macro shock each event represents.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. WHEN a macro news article passes parsing, THE Event_Classifier SHALL send the normalized text to a local Ollama model using structured JSON output with an explicit schema.
2. WHEN the Event_Classifier processes a macro article, THE Event_Classifier SHALL produce a Global_Event intelligence object containing at minimum: event_id, event_types (one or more Impact_Types), severity (a Severity_Level), affected_regions (list of ISO country or region codes), affected_sectors (list of GICS sector identifiers or equivalent), affected_commodities (list when applicable), summary, key_facts, estimated_duration (short_term, medium_term, long_term), confidence score, and model metadata.
3. WHEN the Ollama model returns an invalid or incomplete classification, THE Event_Classifier SHALL retry extraction according to policy and preserve both the failed output and validation errors.
4. WHEN a Global_Event affects multiple Impact_Types simultaneously, THE Event_Classifier SHALL represent all applicable types rather than collapsing to a single category.
5. THE Event_Classifier SHALL persist the classification prompt, schema, model metadata, and raw model output to MinIO for audit and reproducibility.
|
||||
|
||||
### Requirement 3: Company Exposure Profiles
|
||||
|
||||
**User Story:** As an operator, I want to define each tracked company's geographic exposure, supply chain dependencies, and market position, so that the platform can determine how global events affect each company differently.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. WHEN an operator creates or updates a company's Exposure_Profile, THE Symbol_Registry SHALL persist the profile containing: geographic_revenue_mix (a map of region codes to revenue percentage), supply_chain_regions (list of regions where key suppliers operate), key_input_commodities (list of commodities the company depends on), regulatory_jurisdictions (list of jurisdictions with material regulatory exposure), market_position_tier (one of global_leader, multinational, regional, domestic), and export_dependency_pct (percentage of revenue from exports).
|
||||
2. WHEN no Exposure_Profile exists for a tracked company, THE Interpolation_Engine SHALL use a default profile derived from the company's sector and industry fields, with market_position_tier inferred from market_cap_bucket.
|
||||
3. WHEN an operator updates an Exposure_Profile, THE Symbol_Registry SHALL record the previous profile version for audit trail purposes.
|
||||
4. THE Symbol_Registry SHALL expose Exposure_Profile CRUD operations through its existing REST API.
|
||||
|
||||
### Requirement 4: Macro-to-Company Impact Mapping
|
||||
|
||||
**User Story:** As a strategist, I want the platform to compute how each global event specifically impacts each tracked company based on their exposure profile, so that macro intelligence is company-specific rather than generic.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. WHEN a Global_Event classification is produced, THE Interpolation_Engine SHALL compute a Macro_Impact_Score for each tracked company by evaluating the overlap between the event's affected_regions, affected_sectors, and affected_commodities against the company's Exposure_Profile.
|
||||
2. WHEN computing a Macro_Impact_Score, THE Interpolation_Engine SHALL weight the score by the event's Severity_Level, the degree of geographic overlap (using geographic_revenue_mix percentages), the supply chain exposure (using supply_chain_regions), and the commodity dependency overlap.
|
||||
3. WHEN computing a Macro_Impact_Score, THE Interpolation_Engine SHALL apply a resilience modifier based on the company's Market_Position_Tier, where global_leader companies receive a dampening factor and domestic companies receive an amplification factor for international events.
|
||||
4. WHEN a Global_Event has zero overlap with a company's Exposure_Profile, THE Interpolation_Engine SHALL assign a Macro_Impact_Score of 0.0 and skip further processing for that company-event pair.
|
||||
5. WHEN a Macro_Impact_Score is computed, THE Interpolation_Engine SHALL produce a macro impact record containing: event_id, company_id, ticker, macro_impact_score, impact_direction (positive, negative, or mixed), contributing_factors (list of which profile dimensions matched), and confidence score.
|
||||
6. WHEN the same Global_Event produces both positive and negative effects on a company, THE Interpolation_Engine SHALL represent the net direction as mixed and preserve both the positive and negative contributing factors separately.
|
||||
|
||||
### Requirement 5: Aggregation Engine Integration
|
||||
|
||||
**User Story:** As a strategist, I want macro impact signals to be blended into existing company trend summaries alongside company-specific document intelligence, so that recommendations reflect both micro and macro conditions.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. WHEN the Aggregation_Engine computes a company trend summary, THE Aggregation_Engine SHALL include macro impact records as additional weighted signals alongside existing document intelligence signals.
|
||||
2. WHEN weighting macro impact signals, THE Aggregation_Engine SHALL apply recency decay, event severity weighting, and confidence gating consistent with existing signal scoring, using the Global_Event's publication time for recency and the Macro_Impact_Score as the impact score.
|
||||
3. WHEN macro signals and company-specific signals disagree in direction, THE Aggregation_Engine SHALL represent the disagreement explicitly in the contradiction_score and disagreement_details fields, consistent with existing contradiction detection behavior.
|
||||
4. WHEN a trend summary includes macro signal contributions, THE Aggregation_Engine SHALL include the contributing Global_Event IDs in the evidence references so that the macro signal chain is traceable from recommendation back to source event.
|
||||
5. WHEN no macro impact records exist for a company in the aggregation window, THE Aggregation_Engine SHALL produce the trend summary using only company-specific signals, with no degradation of existing behavior.
|
||||
6. THE Aggregation_Engine SHALL expose a configurable weight parameter (macro_signal_weight) that controls the relative influence of macro signals versus company-specific signals in the combined trend, defaulting to 0.3.
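
To make the role of `macro_signal_weight` concrete, the sketch below blends a company-only trend strength with a macro strength as a convex combination. This is an assumed simplification: the real engine aggregates `WeightedSignal` objects, and the function name, the signed-strength convention, and the `macro_enabled` flag are hypothetical.

```python
from typing import Optional


def blend_trend_strength(
    company_strength: float,
    macro_strength: Optional[float],
    macro_signal_weight: float = 0.3,
    macro_enabled: bool = True,
) -> float:
    """Blend signed strengths; fall back to company-only when macro is absent or disabled."""
    if not 0.0 <= macro_signal_weight <= 1.0:
        raise ValueError("macro_signal_weight must be in [0, 1]")
    # Criterion 5 and the runtime toggle: no macro data or a disabled layer
    # must leave the company-only result unchanged.
    if not macro_enabled or macro_strength is None:
        return company_strength
    return (1.0 - macro_signal_weight) * company_strength + macro_signal_weight * macro_strength
```

With the default weight of 0.3, a company strength of 0.6 and a macro strength of -0.4 blend to roughly 0.30, while passing `macro_strength=None` or `macro_enabled=False` returns 0.6 unchanged.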
|
||||
|
||||
### Requirement 6: Sector and Market Rollup Enhancement
|
||||
|
||||
**User Story:** As an analyst, I want sector-level and market-level trend rollups to reflect macro event impacts, so that I can see how global events are shifting entire sectors.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. WHEN the Aggregation_Engine computes a sector-level rollup, THE Aggregation_Engine SHALL incorporate macro impact signals that affect the sector, weighted by the number and exposure of constituent companies impacted.
|
||||
2. WHEN the Aggregation_Engine computes a market-level rollup, THE Aggregation_Engine SHALL incorporate macro impact signals aggregated across all sectors, reflecting the breadth and severity of active global events.
|
||||
3. WHEN a Global_Event disproportionately affects one sector, THE Aggregation_Engine SHALL surface that sector as a material_risk or dominant_catalyst in the market-level rollup.
|
||||
|
||||
### Requirement 7: Global Event Storage and Queryability
|
||||
|
||||
**User Story:** As a data engineer, I want global event classifications and macro impact records stored in both the operational database and the analytical lake, so that I can query macro intelligence alongside company data.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. WHEN a Global_Event classification is produced, THE System SHALL persist the classification record in PostgreSQL with fields for event_id, event_types, severity, affected_regions, affected_sectors, affected_commodities, summary, estimated_duration, confidence, source_document_id, and model metadata.
|
||||
2. WHEN a macro impact record is computed, THE System SHALL persist it in PostgreSQL with fields for event_id, company_id, ticker, macro_impact_score, impact_direction, contributing_factors, confidence, and computed_at timestamp.
|
||||
3. WHEN the Lake_Publisher runs, THE Lake_Publisher SHALL publish global event facts and macro impact facts as partitioned Parquet datasets to MinIO under the `stonks-lakehouse` bucket.
|
||||
4. WHEN analytical queries join macro impact data with company trends, THE System SHALL support SQL joins between global_events, macro_impacts, trend_windows, and recommendations tables through Trino.
|
||||
|
||||
### Requirement 8: Dashboard Visibility
|
||||
|
||||
**User Story:** As an analyst, I want to see active global events, their severity, and which companies they impact through the web dashboard, so that I can understand the macro context behind trend shifts.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. WHEN an analyst navigates to a new Global Events section, THE Dashboard SHALL display a filterable list of recent Global_Events with columns for event summary, impact types, severity badge, affected regions, affected sectors, and event date.
|
||||
2. WHEN an analyst clicks a Global_Event, THE Dashboard SHALL display the full classification detail including all affected companies with their Macro_Impact_Scores, impact directions, and contributing factors.
|
||||
3. WHEN an analyst views a company detail page, THE Dashboard SHALL display a macro exposure panel showing the company's Exposure_Profile and a list of active Global_Events affecting that company with their Macro_Impact_Scores.
|
||||
4. WHEN an analyst views a trend summary, THE Dashboard SHALL visually distinguish macro-sourced evidence from company-specific evidence in the evidence chain.
|
||||
5. WHEN an analyst views a recommendation, THE Dashboard SHALL display any macro signals that contributed to the recommendation with links back to the originating Global_Events.
|
||||
|
||||
### Requirement 9: Exposure Profile Auto-Inference
|
||||
|
||||
**User Story:** As an operator, I want the platform to automatically infer a baseline exposure profile from company filings and public data when I haven't manually configured one, so that macro interpolation works out of the box for newly tracked companies.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. WHEN a company is tracked and has no manually configured Exposure_Profile, THE Event_Classifier SHALL attempt to infer a baseline profile from the company's most recent filing extractions, using geographic revenue breakdowns, supplier mentions, and commodity references found in the document intelligence.
|
||||
2. WHEN the Event_Classifier infers an Exposure_Profile, THE Event_Classifier SHALL mark the profile as source `inferred` with a confidence score, distinguishing it from operator-configured profiles marked as source `manual`.
|
||||
3. IF the Event_Classifier cannot infer a meaningful profile due to insufficient filing data, THEN THE Interpolation_Engine SHALL fall back to the sector-based default profile described in Requirement 3.2.
|
||||
|
||||
### Requirement 10: Macro Signal Suppression and Safety
|
||||
|
||||
**User Story:** As a risk owner, I want macro signals to be subject to quality controls so that low-confidence or stale global event classifications do not drive automated trading decisions.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. WHEN a Global_Event classification has a confidence score below a configurable threshold (default 0.4), THE Interpolation_Engine SHALL exclude the event from macro impact computation and log the exclusion reason.
|
||||
2. WHEN a Global_Event's estimated_duration is short_term and the event is older than 48 hours, THE Interpolation_Engine SHALL apply an accelerated decay factor to the event's macro impact signals.
|
||||
3. WHEN macro signals are the sole basis for a trend direction change (no supporting company-specific signals), THE Recommendation_Engine SHALL mark the recommendation as informational only and append a macro-only caveat to the thesis.
|
||||
4. IF the macro ingestion pipeline experiences sustained failures exceeding a configurable threshold, THEN THE System SHALL alert operators and continue producing recommendations using only company-specific signals.
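
Criteria 1 and 2 can be sketched as follows. This is a minimal illustration, not the Interpolation_Engine's actual code: `EventSketch`, the exponential half-life form, the 72-hour half-life, and the 2x acceleration multiplier are all assumptions. The requirements fix only the 0.4 default threshold and the 48-hour staleness cutoff.

```python
import math
from dataclasses import dataclass


@dataclass
class EventSketch:
    """Hypothetical stand-in for a classified Global_Event."""
    confidence: float
    estimated_duration: str  # e.g. "short_term" or "medium_term"
    age_hours: float


def passes_confidence_gate(event: EventSketch, threshold: float = 0.4) -> bool:
    # Events strictly below the threshold are excluded (criterion 1).
    return event.confidence >= threshold


def decay_weight(
    event: EventSketch,
    half_life_hours: float = 72.0,   # assumed value, not from the requirements
    staleness_hours: float = 48.0,   # 48-hour cutoff from criterion 2
    acceleration: float = 2.0,       # assumed multiplier, not from the requirements
) -> float:
    # Standard exponential recency decay.
    rate = math.log(2) / half_life_hours
    # Stale short-term events decay faster (criterion 2).
    if event.estimated_duration == "short_term" and event.age_hours > staleness_hours:
        rate *= acceleration
    return math.exp(-rate * event.age_hours)
```

With these assumed parameters, a 60-hour-old `short_term` event always receives a strictly smaller weight than a 60-hour-old event of any other duration, which is the ordering criterion 2 asks for.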
|
||||
|
||||
### Requirement 11: Macro Signal Layer Toggle
|
||||
|
||||
**User Story:** As an operator, I want to enable or disable the macro signal interpolation layer at runtime without redeploying services, so that I can control whether global news influences trend summaries and recommendations.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. WHEN an operator toggles the macro signal layer via the Trading Controls page or the API, THE System SHALL persist the setting in the risk_configs table and apply it immediately to subsequent aggregation and recommendation cycles without requiring a service restart.
|
||||
2. WHEN the macro signal layer is disabled, THE Aggregation_Engine SHALL skip all macro impact signals and produce trend summaries using only company-specific document intelligence, with no change to existing behavior.
|
||||
3. WHEN the macro signal layer is disabled, THE Ingestion_Engine SHALL continue ingesting and classifying macro news articles so that historical macro data is preserved, but THE Interpolation_Engine SHALL skip macro-to-company impact computation.
|
||||
4. WHEN the macro signal layer is re-enabled after being disabled, THE Interpolation_Engine SHALL resume computing macro impact scores using the most recent Global_Event classifications, including events ingested while the layer was disabled.
|
||||
5. THE Query API SHALL expose a `GET /api/admin/macro/status` endpoint returning the current enabled/disabled state and a `PUT /api/admin/macro/toggle` endpoint to switch it.
|
||||
6. THE Dashboard Trading Controls page SHALL display the macro signal layer toggle alongside the existing trading mode controls, with a confirmation dialog for state changes.
|
||||
7. WHEN the macro signal layer state changes, THE System SHALL record an audit event with the previous state, new state, and the operator who made the change.
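
The toggle-plus-audit behavior in criteria 1 and 7 might look roughly like the sketch below. It uses an in-memory dict and list in place of the `risk_configs` table and the audit store; the key name `macro_enabled`, the default of enabled, and the audit record shape are assumptions made for illustration.

```python
from datetime import datetime, timezone


def toggle_macro_layer(risk_config: dict, audit_log: list, enabled: bool, operator: str) -> dict:
    # Read the previous state; treating a missing key as "enabled" is an assumption.
    previous = bool(risk_config.get("macro_enabled", True))
    risk_config["macro_enabled"] = enabled
    # Record an audit event only when the state actually changes (criterion 7).
    if previous != enabled:
        audit_log.append(
            {
                "event": "macro_layer_toggle",
                "previous_state": previous,
                "new_state": enabled,
                "operator": operator,
                "at": datetime.now(timezone.utc).isoformat(),
            }
        )
    return {"enabled": enabled}
```

Because consumers read the stored flag at the start of each cycle, no service restart is needed for the change to take effect.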
|
||||
|
||||
### Requirement 12: Trend Projections
|
||||
|
||||
**User Story:** As a strategist, I want the platform to generate forward-looking trend projections that combine historical company-specific signals with active macro event trajectories, so that I can anticipate where a company's trend is heading rather than only seeing where it is now.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. WHEN the Aggregation_Engine produces a trend summary for a company, THE Aggregation_Engine SHALL also compute a trend projection containing a projected_direction (bullish, bearish, mixed, neutral), projected_strength, projected_confidence, projection_horizon (1d, 7d, 30d), and a list of driving_factors explaining what is expected to push the trend in that direction.
|
||||
2. WHEN computing a trend projection, THE Aggregation_Engine SHALL consider: the current trend trajectory and momentum (rate of change in strength over recent windows), active Global_Events with estimated_duration extending beyond the current window, the severity and decay profile of active macro signals, upcoming known catalysts from document intelligence (earnings dates, regulatory deadlines, product launches), and the historical pattern of how similar macro event types have resolved for companies with similar Exposure_Profiles.
|
||||
3. WHEN a trend projection diverges from the current trend direction (e.g., current trend is bullish but projection is bearish), THE Aggregation_Engine SHALL flag the projection as a potential reversal signal and include the divergence reason in the driving_factors.
|
||||
4. WHEN the macro signal layer is disabled, THE Aggregation_Engine SHALL still compute trend projections using only company-specific signal momentum and known upcoming catalysts, with reduced projection confidence.
|
||||
5. WHEN a trend projection is produced, THE System SHALL persist it in PostgreSQL alongside the trend_window record with fields for projected_direction, projected_strength, projected_confidence, projection_horizon, driving_factors, macro_contribution_pct (percentage of projection driven by macro signals vs company-specific), and computed_at timestamp.
|
||||
6. WHEN the Lake_Publisher runs, THE Lake_Publisher SHALL publish trend projection facts as a partitioned Parquet dataset to MinIO for analytical queries and backtesting.
|
||||
7. WHEN an analyst views a trend summary on the Dashboard, THE Dashboard SHALL display the trend projection alongside the current trend with a visual indicator showing the projected direction and strength, and an expandable panel listing the driving factors.
|
||||
8. WHEN a recommendation is generated, THE Recommendation_Engine SHALL incorporate the trend projection into the thesis and time_horizon fields, citing the projected direction and key driving factors.
|
||||
9. WHEN a trend projection's confidence falls below a configurable threshold (default 0.3), THE System SHALL mark the projection as low_confidence and exclude it from influencing recommendation eligibility, while still displaying it as informational on the dashboard.
|
||||
10. THE System SHALL expose a `GET /api/trends/{trend_id}/projection` endpoint returning the projection for a specific trend window, and include projection data in the existing `GET /api/trends` list response.
|
||||
@@ -0,0 +1,338 @@
|
||||
# Implementation Plan: Global News Interpolation Layer
|
||||
|
||||
## Overview
|
||||
|
||||
This plan implements a macro-level global news interpolation layer that ingests global/geopolitical news events, classifies them via Ollama, maps them to companies via exposure profiles, and feeds macro impact scores into the existing aggregation engine. The implementation extends existing services (extractor, aggregation, symbol registry, recommendation, API, lake publisher, dashboard) rather than creating new deployments. Tasks are ordered so each step builds on the previous, with property-based tests validating core scoring logic early.
|
||||
|
||||
## Tasks
|
||||
|
||||
- [x] 1. Database migration and shared schemas
|
||||
- [x] 1.1 Create PostgreSQL migration `infra/migrations/016_global_news_interpolation.sql`
|
||||
- Add `global_events` table with event_types, severity, affected_regions, affected_sectors, affected_commodities, summary, key_facts, estimated_duration, confidence, source_document_id FK, model metadata, created_at
|
||||
- Add `macro_impact_records` table with event_id FK, company_id FK, ticker, macro_impact_score, impact_direction, contributing_factors, confidence, computed_at
|
||||
- Add `exposure_profiles` table with company_id FK, geographic_revenue_mix, supply_chain_regions, key_input_commodities, regulatory_jurisdictions, market_position_tier, export_dependency_pct, source, confidence, version, active, created_at, updated_at
|
||||
- Add `trend_projections` table with trend_window_id FK, projected_direction, projected_strength, projected_confidence, projection_horizon, driving_factors, macro_contribution_pct, diverges_from_current, computed_at
|
||||
- Add indexes on `macro_impact_records(event_id)`, `macro_impact_records(company_id, computed_at)`, `macro_impact_records(ticker, computed_at)`, `exposure_profiles(company_id, active)`, `global_events(created_at)`, `trend_projections(trend_window_id)`
|
||||
- _Requirements: 7.1, 7.2, 3.1, 12.5_
|
||||
|
||||
- [x] 1.2 Add new Pydantic schemas and enums to `services/shared/schemas.py`
|
||||
- Add `ImpactType`, `SeverityLevel`, `MarketPositionTier`, `EstimatedDuration` enums
|
||||
- Add `MACRO_EVENT = "macro_event"` to `DocumentType` enum
|
||||
- Add `GlobalEventSchema`, `MacroImpactRecordSchema`, `ExposureProfileSchema`, `TrendProjectionSchema` Pydantic models
|
||||
- _Requirements: 2.2, 4.5, 3.1, 12.1_
|
||||
|
||||
- [x] 1.3 Add macro-related Redis queue name to `services/shared/redis_keys.py`
|
||||
- Add `QUEUE_MACRO_CLASSIFICATION = "macro_classification"` for event classification jobs
|
||||
- _Requirements: 1.1_
|
||||
|
||||
- [x] 1.4 Add macro configuration fields to `services/shared/config.py`
|
||||
- Add `macro_signal_weight`, `macro_enabled`, `macro_confidence_threshold`, `macro_short_term_staleness_hours`, `projection_confidence_threshold` fields to a new `MacroConfig` dataclass
|
||||
- Add `macro: MacroConfig` to `AppConfig` with env var loading in `load_config()`
|
||||
- _Requirements: 5.6, 10.1, 10.2, 12.9_
|
||||
|
||||
- [x] 2. Checkpoint — Ensure migration and schemas are consistent
|
||||
- Ensure all tests pass; ask the user if questions arise.
|
||||
|
||||
- [x] 3. Event classifier module
|
||||
- [x] 3.1 Implement `services/extractor/event_classifier.py`
|
||||
- Implement `GlobalEvent` dataclass matching the design specification
|
||||
- Implement `get_event_json_schema()` returning the Ollama structured output schema for event classification
|
||||
- Implement `build_event_classification_prompt(text: str) -> str` with anti-hallucination instructions for macro event extraction
|
||||
- Implement `classify_global_event(normalized_text, document_id, ollama_client) -> GlobalEvent` using the existing `OllamaClient` with retry logic
|
||||
- Persist classification prompt, schema, model metadata, and raw output to MinIO under `stonks-llm-prompts/` and `stonks-llm-results/`
|
||||
- Persist the `GlobalEvent` record to the `global_events` PostgreSQL table
|
||||
- _Requirements: 2.1, 2.2, 2.3, 2.4, 2.5_
|
||||
|
||||
- [x] 3.2 Write property test for GlobalEvent schema completeness
|
||||
- **Property 2: Macro pipeline output schema completeness**
|
||||
- **Validates: Requirements 2.2, 4.5**
|
||||
|
||||
- [x] 3.3 Write property test for multiple impact types preserved
|
||||
- **Property 3: Multiple impact types preserved**
|
||||
- **Validates: Requirements 2.4**
|
||||
|
||||
- [x] 4. Exposure profile management
|
||||
- [x] 4.1 Implement `services/symbol_registry/exposure.py`
|
||||
- Implement `ExposureProfile` Pydantic model for API request/response
|
||||
- Implement `GET /companies/{company_id}/exposure` endpoint returning the current active profile
|
||||
- Implement `PUT /companies/{company_id}/exposure` endpoint that archives the previous version (sets `active=FALSE`) and inserts a new version with incremented version number
|
||||
- Implement `GET /companies/{company_id}/exposure/history` endpoint returning all profile versions ordered by version descending
|
||||
- Register routes on the Symbol Registry FastAPI app
|
||||
- _Requirements: 3.1, 3.3, 3.4_
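
The archive-then-insert versioning described for the `PUT` endpoint can be sketched with an in-memory store. `ProfileVersion` and `ExposureStoreSketch` are hypothetical names; the real implementation would run the equivalent statements against the `exposure_profiles` table, ideally in one transaction.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProfileVersion:
    company_id: str
    version: int
    payload: dict
    active: bool = True


class ExposureStoreSketch:
    """In-memory stand-in for the exposure_profiles table (illustrative only)."""

    def __init__(self) -> None:
        self._rows: List[ProfileVersion] = []

    def put(self, company_id: str, payload: dict) -> ProfileVersion:
        previous = [r for r in self._rows if r.company_id == company_id]
        # Archive every earlier version (active=FALSE) before inserting.
        for row in previous:
            row.active = False
        next_version = max((r.version for r in previous), default=0) + 1
        row = ProfileVersion(company_id, next_version, payload)
        self._rows.append(row)
        return row

    def current(self, company_id: str) -> Optional[ProfileVersion]:
        for row in self._rows:
            if row.company_id == company_id and row.active:
                return row
        return None

    def history(self, company_id: str) -> List[ProfileVersion]:
        # All versions, newest first, as the history endpoint returns them.
        rows = [r for r in self._rows if r.company_id == company_id]
        return sorted(rows, key=lambda r: r.version, reverse=True)
```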
|
||||
|
||||
- [x] 4.2 Write property test for exposure profile version history
|
||||
- **Property 6: Exposure profile version history**
|
||||
- **Validates: Requirements 3.3**
|
||||
|
||||
- [x] 4.3 Write property test for default exposure profile derivation
|
||||
- **Property 5: Default exposure profile derivation**
|
||||
- **Validates: Requirements 3.2**
|
||||
|
||||
- [x] 5. Interpolation engine — core scoring logic
|
||||
- [x] 5.1 Implement `services/aggregation/interpolation.py`
|
||||
- Implement `MacroImpactRecord` dataclass matching the design specification
|
||||
- Implement `compute_geographic_overlap(event_regions, revenue_mix) -> float` using revenue percentage weighting
|
||||
- Implement `compute_supply_chain_overlap(event_regions, supply_regions) -> float` using set intersection ratio
|
||||
- Implement `compute_commodity_overlap(event_commodities, company_commodities) -> float` using set intersection ratio
|
||||
- Implement `apply_resilience_modifier(raw_score, tier, event_is_international) -> float` with tier multipliers: global_leader=0.7, multinational=0.85, regional=1.0, domestic=1.2
|
||||
- Implement `compute_macro_impact(event: GlobalEvent, profile: ExposureProfile) -> MacroImpactRecord` using the scoring formula: `severity_weight * (0.35*geo + 0.25*supply + 0.25*commodity + 0.15*sector)` then resilience modifier
|
||||
- Implement `build_default_profile(sector, industry, market_cap_bucket) -> ExposureProfile` for companies without manual profiles
|
||||
- Handle zero-overlap case: return score 0.0 and skip further processing
|
||||
- Handle mixed direction: when both positive and negative factors exist, set direction to 'mixed' and preserve both factor lists
|
||||
- Persist `MacroImpactRecord` objects to the `macro_impact_records` PostgreSQL table
|
||||
- _Requirements: 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 3.2_
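
A minimal sketch of the scoring formula in task 5.1, using the weights and tier multipliers listed there. Several details are assumptions made for illustration: overlaps and `severity_weight` are taken to lie in [0, 1], the set intersection ratio uses the company's set as the denominator, the resilience modifier is applied only to international events, and the final score is clamped to 1.0.

```python
from typing import Dict, Iterable

# Tier multipliers as listed in task 5.1.
TIER_MULTIPLIERS = {
    "global_leader": 0.7,
    "multinational": 0.85,
    "regional": 1.0,
    "domestic": 1.2,
}


def compute_geographic_overlap(event_regions: Iterable[str], revenue_mix: Dict[str, float]) -> float:
    # revenue_mix maps region -> revenue share in [0, 1] (assumed representation).
    regions = set(event_regions)
    return min(1.0, sum(share for region, share in revenue_mix.items() if region in regions))


def set_overlap_ratio(event_items: Iterable[str], company_items: Iterable[str]) -> float:
    # Fraction of the company's items touched by the event (assumed denominator).
    company = set(company_items)
    if not company:
        return 0.0
    return len(company & set(event_items)) / len(company)


def apply_resilience_modifier(raw_score: float, tier: str, event_is_international: bool) -> float:
    # Assumption: purely domestic events are not adjusted by tier.
    if not event_is_international:
        return raw_score
    return raw_score * TIER_MULTIPLIERS[tier]


def macro_impact_score(
    severity_weight: float,
    geo: float,
    supply: float,
    commodity: float,
    sector: float,
    tier: str,
    event_is_international: bool = True,
) -> float:
    overlap = 0.35 * geo + 0.25 * supply + 0.25 * commodity + 0.15 * sector
    if overlap == 0.0:
        # Zero-overlap case: score 0.0, no further processing.
        return 0.0
    raw = severity_weight * overlap
    # Clamp so the domestic multiplier (1.2) cannot push the score above 1.0 (assumed bound).
    return min(1.0, apply_resilience_modifier(raw, tier, event_is_international))
```

For equal inputs, the multipliers order the result as global_leader < multinational < regional < domestic, which is the ordering that Property 9 checks.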
|
||||
|
||||
- [x] 5.2 Write property test for macro impact score bounds and zero-overlap invariant
|
||||
- **Property 7: Macro impact score bounds and zero-overlap invariant**
|
||||
- **Validates: Requirements 4.1, 4.4**
|
||||
|
||||
- [x] 5.3 Write property test for scoring monotonicity
|
||||
- **Property 8: Scoring monotonicity**
|
||||
- **Validates: Requirements 4.2**
|
||||
|
||||
- [x] 5.4 Write property test for resilience modifier tier ordering
|
||||
- **Property 9: Resilience modifier tier ordering**
|
||||
- **Validates: Requirements 4.3**
|
||||
|
||||
- [x] 5.5 Write property test for mixed direction dual-effect events
|
||||
- **Property 10: Mixed direction for dual-effect events**
|
||||
- **Validates: Requirements 4.6**
|
||||
|
||||
- [x] 6. Checkpoint — Ensure core scoring logic and property tests pass
|
||||
- Ensure all tests pass; ask the user if questions arise.
|
||||
|
||||
- [x] 7. Aggregation engine integration
|
||||
- [x] 7.1 Extend `services/aggregation/worker.py` to incorporate macro signals
|
||||
- Add `macro_signal_weight` and `macro_enabled` fields to `AggregationConfig`
|
||||
- In `aggregate_company_window`, check macro toggle state from `risk_configs` table
|
||||
- Fetch `macro_impact_records` for the ticker within the aggregation window
|
||||
- Convert each `MacroImpactRecord` to a `WeightedSignal` using: `document_id=event.source_document_id`, `sentiment_value` mapped from `impact_direction`, `impact_score=macro_impact_score * macro_signal_weight`, recency decay from event publication time, confidence gating from macro record confidence
|
||||
- Merge macro signals with company-specific signals before computing trend direction, strength, confidence, and contradiction score
|
||||
- Include contributing `GlobalEvent` source_document_ids in evidence references
|
||||
- When macro layer is disabled or no macro data exists, produce identical output to company-only aggregation
|
||||
- _Requirements: 5.1, 5.2, 5.3, 5.4, 5.5, 5.6_
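
The record-to-signal conversion and the disabled-layer equivalence can be sketched as below. `SignalSketch` is a hypothetical stand-in for `WeightedSignal`, whose real fields are not shown in this plan, and the numeric direction-to-sentiment mapping is an assumption. Recency decay and confidence gating are omitted for brevity.

```python
from dataclasses import dataclass
from typing import List

# Assumed mapping from impact_direction to a signed sentiment value.
DIRECTION_TO_SENTIMENT = {"positive": 1.0, "negative": -1.0, "mixed": 0.0, "neutral": 0.0}


@dataclass(frozen=True)
class SignalSketch:
    """Hypothetical stand-in for WeightedSignal."""
    document_id: str
    sentiment_value: float
    impact_score: float
    confidence: float


def macro_record_to_signal(record: dict, macro_signal_weight: float) -> SignalSketch:
    return SignalSketch(
        document_id=record["source_document_id"],
        sentiment_value=DIRECTION_TO_SENTIMENT[record["impact_direction"]],
        # Macro impact is scaled by the configured layer weight.
        impact_score=record["macro_impact_score"] * macro_signal_weight,
        confidence=record["confidence"],
    )


def merge_signals(
    company_signals: List[SignalSketch],
    macro_records: List[dict],
    macro_enabled: bool,
    macro_signal_weight: float,
) -> List[SignalSketch]:
    # Disabled layer or no macro data: identical to company-only aggregation.
    if not macro_enabled or not macro_records:
        return list(company_signals)
    macro_signals = [macro_record_to_signal(r, macro_signal_weight) for r in macro_records]
    return list(company_signals) + macro_signals
```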
|
||||
|
||||
- [x] 7.2 Write property test for macro signals influencing trend output
|
||||
- **Property 11: Macro signals influence trend output**
|
||||
- **Validates: Requirements 5.1**
|
||||
|
||||
- [x] 7.3 Write property test for macro-company contradiction detection
|
||||
- **Property 12: Macro-company contradiction detection**
|
||||
- **Validates: Requirements 5.3**
|
||||
|
||||
- [x] 7.4 Write property test for macro evidence traceability
|
||||
- **Property 13: Macro evidence traceability**
|
||||
- **Validates: Requirements 5.4**
|
||||
|
||||
- [x] 7.5 Write property test for no degradation without macro data and disabled-layer equivalence
|
||||
- **Property 14: No degradation without macro data and disabled-layer equivalence**
|
||||
- **Validates: Requirements 5.5, 11.2**
|
||||
|
||||
- [x] 8. Sector and market rollup enhancement
|
||||
- [x] 8.1 Extend sector and market rollup logic in `services/aggregation/worker.py`
|
||||
- When computing sector-level rollups, incorporate macro impact signals affecting the sector weighted by constituent company exposure
|
||||
- When computing market-level rollups, aggregate macro signals across all sectors reflecting breadth and severity
|
||||
- When a GlobalEvent disproportionately affects one sector (>60% of total macro impact), surface that sector in `material_risks` or `dominant_catalysts` of the market-level rollup
|
||||
- _Requirements: 6.1, 6.2, 6.3_
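
The 60% dominance check in task 8.1 can be sketched as a share-of-total test. The function name and the choice to compare absolute impact magnitudes are assumptions for illustration.

```python
from typing import Dict, Optional


def dominant_sector(sector_impacts: Dict[str, float], threshold: float = 0.6) -> Optional[str]:
    """Return the sector carrying more than `threshold` of total macro impact, if any."""
    total = sum(abs(value) for value in sector_impacts.values())
    if total == 0:
        return None
    sector, impact = max(sector_impacts.items(), key=lambda item: abs(item[1]))
    return sector if abs(impact) / total > threshold else None
```

A non-`None` result is the sector that would be surfaced in `material_risks` or `dominant_catalysts` of the market-level rollup.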
|
||||
|
||||
- [x] 8.2 Write property test for sector and market rollup macro incorporation
|
||||
- **Property 15: Sector and market rollup macro incorporation**
|
||||
- **Validates: Requirements 6.1, 6.2, 6.3**
|
||||
|
||||
- [x] 9. Trend projection module
|
||||
- [x] 9.1 Implement `services/aggregation/projection.py`
|
||||
- Implement `TrendProjection` dataclass matching the design specification
|
||||
- Implement projection logic: compute trend momentum (rate of change in strength across recent windows), project macro signal decay based on `estimated_duration` and severity, factor in upcoming catalysts from document intelligence, combine into projected direction/strength/confidence
|
||||
- Flag divergence when projected direction differs from current trend direction, include divergence reason in `driving_factors`
|
||||
- When macro layer is disabled, compute projections using only company-specific momentum with reduced confidence
|
||||
- Mark projections with `projected_confidence` below threshold (default 0.3) as `low_confidence`
|
||||
- Persist `TrendProjection` to the `trend_projections` PostgreSQL table alongside the trend_window record
|
||||
- Call projection computation from `aggregate_company_window` after trend summary is assembled
|
||||
- _Requirements: 12.1, 12.2, 12.3, 12.4, 12.5, 12.9_
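
Two pieces of task 9.1 lend themselves to a short sketch: momentum as the average change in strength per window, and the divergence and low-confidence flags. The function names, the dict-shaped result, and the wording of the divergence reason are assumptions; the 0.3 default threshold comes from the task.

```python
from typing import List, Sequence


def trend_momentum(strengths: Sequence[float]) -> float:
    """Average per-window change in strength, oldest window first."""
    if len(strengths) < 2:
        return 0.0
    return (strengths[-1] - strengths[0]) / (len(strengths) - 1)


def finalize_projection(
    current_direction: str,
    projected_direction: str,
    projected_confidence: float,
    driving_factors: List[str],
    confidence_threshold: float = 0.3,
) -> dict:
    factors = list(driving_factors)
    diverges = projected_direction != current_direction
    if diverges:
        # A divergence is flagged as a potential reversal, with the reason recorded.
        factors.append(
            f"divergence: current trend is {current_direction}, projection is {projected_direction}"
        )
    return {
        "projected_direction": projected_direction,
        "projected_confidence": projected_confidence,
        "diverges_from_current": diverges,
        "low_confidence": projected_confidence < confidence_threshold,
        "driving_factors": factors,
    }
```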
|
||||
|
||||
- [x] 9.2 Write property test for trend projection always produced
|
||||
- **Property 20: Trend projection always produced**
|
||||
- **Validates: Requirements 12.1**
|
||||
|
||||
- [x] 9.3 Write property test for projection divergence flagging
|
||||
- **Property 21: Projection divergence flagging**
|
||||
- **Validates: Requirements 12.3**
|
||||
|
||||
- [x] 9.4 Write property test for macro-disabled projections have reduced confidence
|
||||
- **Property 22: Macro-disabled projections have reduced confidence**
|
||||
- **Validates: Requirements 12.4**
|
||||
|
||||
- [x] 9.5 Write property test for low-confidence projection exclusion
|
||||
- **Property 23: Low-confidence projection exclusion**
|
||||
- **Validates: Requirements 12.9**
|
||||
|
||||
- [x] 10. Checkpoint — Ensure aggregation integration and projections work correctly
|
||||
- Ensure all tests pass; ask the user if questions arise.
|
||||
|
||||
- [x] 11. Macro signal suppression and safety
|
||||
- [x] 11.1 Implement exposure profile auto-inference in `services/extractor/exposure_inference.py`
|
||||
- Implement `infer_exposure_profile(document_intelligences, sector, industry, market_cap_bucket) -> ExposureProfile`
|
||||
- Scan recent filing extractions for geographic revenue breakdowns, supplier mentions, and commodity references
|
||||
- Produce profile with `source='inferred'` and a confidence score reflecting data quality
|
||||
- Fall back to the sector-based default profile when filing data is insufficient
|
||||
- _Requirements: 9.1, 9.2, 9.3_
|
||||
|
||||
- [x] 11.2 Write property test for inferred exposure profile correctness
|
||||
- **Property 16: Inferred exposure profile correctness**
|
||||
- **Validates: Requirements 9.1, 9.2**
|
||||
|
||||
- [x] 11.3 Extend `services/recommendation/suppression.py` with macro-only suppression
|
||||
- Add `MACRO_ONLY_SIGNAL = "macro_only_signal"` to `SuppressionReason` enum
|
||||
- Implement `evaluate_macro_only_suppression(summary, macro_signal_count, company_signal_count) -> bool`
|
||||
- When macro signals are the sole basis for a trend direction change, force recommendation to `mode='informational'` and append macro-only caveat to thesis
|
||||
- _Requirements: 10.3_
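
A sketch of the macro-only check, keeping the signature from task 11.3. The shape of `summary` (a dict with `direction` and `previous_direction`), the caveat wording, and the helper `apply_macro_only_caveat` are assumptions for illustration.

```python
# Assumed caveat wording; the plan does not specify the text.
MACRO_ONLY_CAVEAT = "Caveat: this direction change is driven solely by macro signals."


def evaluate_macro_only_suppression(summary: dict, macro_signal_count: int, company_signal_count: int) -> bool:
    # True when the direction changed and only macro signals support the change.
    direction_changed = summary["direction"] != summary["previous_direction"]
    return direction_changed and macro_signal_count > 0 and company_signal_count == 0


def apply_macro_only_caveat(recommendation: dict, suppressed: bool) -> dict:
    if not suppressed:
        return recommendation
    updated = dict(recommendation)
    # Force informational mode and append the caveat to the thesis.
    updated["mode"] = "informational"
    updated["thesis"] = f"{recommendation['thesis']} {MACRO_ONLY_CAVEAT}"
    updated["suppression_reason"] = "macro_only_signal"
    return updated
```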
|
||||
|
||||
- [x] 11.4 Write property test for macro-only recommendation suppression
|
||||
- **Property 19: Macro-only recommendation suppression**
|
||||
- **Validates: Requirements 10.3**
|
||||
|
||||
- [x] 11.5 Implement low-confidence event exclusion and accelerated decay in interpolation engine
|
||||
- In `services/aggregation/interpolation.py`, skip events with confidence below configurable threshold (default 0.4) and log exclusion reason
|
||||
- Apply an accelerated decay factor to short_term events older than 48 hours, so that their effective weight is strictly less than it would be under standard recency decay
|
||||
- _Requirements: 10.1, 10.2_
|
||||
|
||||
- [x] 11.6 Write property test for low-confidence event exclusion
|
||||
- **Property 17: Low-confidence event exclusion**
|
||||
- **Validates: Requirements 10.1**
|
||||
|
||||
- [x] 11.7 Write property test for accelerated decay for stale short-term events
|
||||
- **Property 18: Accelerated decay for stale short-term events**
|
||||
- **Validates: Requirements 10.2**
|
||||
|
||||
- [x] 12. Macro signal layer toggle and API endpoints
|
||||
- [x] 12.1 Implement macro toggle and status endpoints in `services/api/app.py`
|
||||
- Add `GET /api/admin/macro/status` returning current enabled/disabled state from `risk_configs` table
|
||||
- Add `PUT /api/admin/macro/toggle` to switch macro layer on/off, persisting to `risk_configs` and recording an audit event with previous state, new state, and operator
|
||||
- Toggle state is read from PostgreSQL at the start of each aggregation cycle (no caching)
|
||||
- _Requirements: 11.1, 11.5, 11.7_
|
||||
|
||||
- [x] 12.2 Implement macro event and impact query endpoints in `services/api/app.py`
|
||||
- Add `GET /api/macro/events` — list recent global events with filtering by severity, region, sector, date range
|
||||
- Add `GET /api/macro/events/{event_id}` — event detail with list of affected companies and their macro impact scores
|
||||
- Add `GET /api/macro/impacts/{ticker}` — macro impacts for a specific company
|
||||
- Add `GET /api/trends/{trend_id}/projection` — trend projection for a specific trend window
|
||||
- Include projection data in existing `GET /api/trends` list response
|
||||
- _Requirements: 8.1, 8.2, 12.10_
|
||||
|
||||
- [x] 12.3 Ensure macro ingestion continues when layer is disabled
|
||||
- When macro layer is disabled, ingestion and classification continue (historical data preserved), but interpolation and aggregation integration are skipped
|
||||
- When re-enabled, resume computing macro impact scores using most recent classifications including events ingested while disabled
|
||||
- _Requirements: 11.2, 11.3, 11.4_
|
||||
|
||||
- [x] 13. Checkpoint — Ensure API endpoints and toggle logic work correctly
|
||||
- Ensure all tests pass; ask the user if questions arise.
|
||||
|
||||
- [x] 14. Lake publisher extensions
|
||||
- [x] 14.1 Add macro fact publishers to the lake publisher service
|
||||
- Implement `publish_global_event_fact` writing partitioned Parquet datasets to `stonks-lakehouse/warehouse/global_events/dt={date}/`
|
||||
- Implement `publish_macro_impact_fact` writing partitioned Parquet datasets to `stonks-lakehouse/warehouse/macro_impacts/dt={date}/ticker={ticker}/`
|
||||
- Implement `publish_trend_projection_fact` writing partitioned Parquet datasets to `stonks-lakehouse/warehouse/trend_projections/dt={date}/ticker={ticker}/`
|
||||
- Register new fact types in the lake publisher's job processing loop
|
||||
- _Requirements: 7.3, 12.6_
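
The partition layouts listed in task 14.1 can be produced by one small helper. `partition_prefix` is a hypothetical name, and formatting `dt` as an ISO date (`YYYY-MM-DD`) is an assumption; the plan only shows `dt={date}`.

```python
from datetime import date
from typing import Optional

WAREHOUSE_PREFIX = "stonks-lakehouse/warehouse"


def partition_prefix(dataset: str, day: date, ticker: Optional[str] = None) -> str:
    """Build the Hive-style partition prefix for a fact dataset."""
    prefix = f"{WAREHOUSE_PREFIX}/{dataset}/dt={day.isoformat()}/"
    # Only per-company facts carry the second partition level.
    if ticker is not None:
        prefix += f"ticker={ticker}/"
    return prefix
```

For example, `partition_prefix("macro_impacts", date(2024, 1, 5), "AAPL")` yields `stonks-lakehouse/warehouse/macro_impacts/dt=2024-01-05/ticker=AAPL/`.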
|
||||
|
||||
- [x] 14.2 Write property test for macro data persistence round-trip
|
||||
- **Property 4: Macro data persistence round-trip**
|
||||
- **Validates: Requirements 3.1, 7.1, 7.2, 12.5**
|
||||
|
||||
- [x] 14.3 Write property test for content hash stability and uniqueness
|
||||
- **Property 1: Content hash stability and uniqueness**
|
||||
- **Validates: Requirements 1.2**
|
||||
|
||||
- [x] 15. Macro ingestion pipeline wiring
|
||||
- [x] 15.1 Wire macro source ingestion into the scheduler and ingestion worker
|
||||
- Configure the scheduler to trigger macro news source fetches on the configured polling interval
|
||||
- Ingestion worker stores raw payloads in MinIO under `stonks-raw-news/macro/` prefix
|
||||
- Metadata records use `document_type='macro_event'` in PostgreSQL
|
||||
- Content hash deduplication consistent with existing behavior
|
||||
- Source failure handling with retry policy consistent with existing sources
|
||||
- _Requirements: 1.1, 1.2, 1.3, 1.4_
|
||||
|
||||
- [x] 15.2 Wire event classification into the extractor worker
|
||||
- After parsing, route `macro_event` documents to `event_classifier.classify_global_event()` instead of standard document extraction
|
||||
- After classification, trigger interpolation for all tracked companies via the aggregation queue
|
||||
- _Requirements: 2.1, 2.2, 2.3_
|
||||
|
||||
- [x] 15.3 Wire interpolation into the aggregation pipeline
|
||||
- After event classification, load exposure profiles for all tracked companies (manual, inferred, or default)
|
||||
- Compute `MacroImpactRecord` for each company with non-zero overlap
|
||||
- Persist records and trigger aggregation for affected tickers
|
||||
- Handle sustained macro ingestion failures: alert operators and continue with company-only signals
|
||||
- _Requirements: 4.1, 4.5, 10.4_
|
||||
|
||||
- [x] 16. Checkpoint — Ensure full backend pipeline works end-to-end
|
||||
- Ensure all tests pass; ask the user if questions arise.
|
||||
|
||||
- [x] 17. Dashboard — Global Events page and macro exposure panel
|
||||
- [x] 17.1 Create Global Events list page at `frontend/src/pages/GlobalEvents.tsx`
|
||||
- Filterable list of recent global events with columns: summary, impact types, severity badge, affected regions, affected sectors, event date
|
||||
- Add API hooks for `GET /api/macro/events` in `frontend/src/api/hooks.ts`
|
||||
- Add route `/macro/events` in `frontend/src/routes.tsx`
|
||||
- Add navigation entry in sidebar in `frontend/src/components/AppLayout.tsx`
|
||||
- _Requirements: 8.1_
|
||||
|
||||
- [x] 17.2 Create Global Event detail page at `frontend/src/pages/GlobalEventDetail.tsx`
|
||||
- Display full classification detail: all affected companies with Macro_Impact_Scores, impact directions, contributing factors
|
||||
- Add API hook for `GET /api/macro/events/{event_id}`
|
||||
- Add route `/macro/events/:id` in `frontend/src/routes.tsx`
|
||||
- _Requirements: 8.2_
|
||||
|
||||
- [x] 17.3 Add macro exposure panel to Company Detail page
|
||||
- On `frontend/src/pages/CompanyDetail.tsx`, add a new tab/panel showing the company's Exposure_Profile and active GlobalEvents affecting the company with their Macro_Impact_Scores
|
||||
- Add API hook for `GET /api/macro/impacts/{ticker}`
|
||||
- _Requirements: 8.3_
|
||||
|
||||
- [x] 17.4 Add macro evidence indicators to Trend and Recommendation detail pages
|
||||
- On `frontend/src/pages/TrendDetail.tsx`, visually distinguish macro-sourced evidence from company-specific evidence in the evidence chain
|
||||
- On `frontend/src/pages/RecommendationDetail.tsx`, display macro signals that contributed with links back to originating GlobalEvents
|
||||
- _Requirements: 8.4, 8.5_
|
||||
|
||||
- [x] 17.5 Add trend projection display to Trend detail page
|
||||
- On `frontend/src/pages/TrendDetail.tsx`, display projected direction/strength alongside current trend with visual indicator and expandable driving factors panel
|
||||
- Add API hook for `GET /api/trends/{trend_id}/projection`
|
||||
- _Requirements: 12.7_
|
||||
|
||||
- [x] 17.6 Add macro toggle to Trading Controls page
|
||||
- On `frontend/src/pages/Trading.tsx`, add macro signal layer enable/disable switch with confirmation dialog
|
||||
- Add API hooks for `GET /api/admin/macro/status` and `PUT /api/admin/macro/toggle`
|
||||
- _Requirements: 11.5, 11.6_
|
||||
|
||||
- [x] 18. Checkpoint — Ensure frontend pages render and integrate with API
|
||||
- Ensure all tests pass; ask the user if questions arise.
|
||||
|
||||
- [x] 19. Integration wiring and final validation
|
||||
- [x] 19.1 Add recommendation engine integration for trend projections
|
||||
- Incorporate trend projection into recommendation thesis and time_horizon fields, citing projected direction and key driving factors
|
||||
- Exclude low-confidence projections from influencing recommendation eligibility
|
||||
- _Requirements: 12.8, 12.9_
|
||||
|
||||
- [x] 19.2 Write integration tests for macro pipeline end-to-end
|
||||
- Test macro article ingestion → parsing → classification → interpolation → aggregation flow
|
||||
- Test lake publisher writes correct Parquet partitions for global events and macro impacts
|
||||
- Test macro toggle state change propagates to next aggregation cycle
|
||||
- _Requirements: 1.1, 2.1, 4.1, 5.1, 7.3, 11.1_
|
||||
|
||||
- [x] 19.3 Write unit tests for API endpoints and dashboard components
|
||||
- Test macro event list/detail endpoints return correct data
|
||||
- Test macro toggle endpoint persists state and records audit event
|
||||
- Test trend projection endpoint returns projection data
|
||||
- Add MSW handlers for macro endpoints in `frontend/src/test/mocks/handlers.ts`
|
||||
- Test GlobalEvents page and macro exposure panel render correctly
|
||||
- _Requirements: 8.1, 8.2, 11.5, 12.10_
|
||||
|
||||
- [x] 20. Final checkpoint — Ensure all tests pass
|
||||
- Ensure all tests pass; ask the user if questions arise.
|
||||
|
||||
## Notes
|
||||
|
||||
- Tasks marked with `*` are optional and can be skipped for a faster MVP
|
||||
- Each task references specific requirements for traceability
|
||||
- Checkpoints ensure incremental validation after each major phase
|
||||
- Property tests validate the 23 correctness properties from the design using Hypothesis
|
||||
- The design uses Python throughout — no language selection needed
|
||||
- No new Kubernetes deployments required; all modules extend existing services
|
||||
- Next migration number is 016
|
||||