feat: competitive intelligence & historical pattern matching layer

Celes Renata
2026-04-14 19:42:48 +00:00
parent b478022ba3
commit f7a11d14ea
203 changed files with 20155 additions and 97 deletions
@@ -0,0 +1 @@
{"specId": "3e745894-9abc-49ff-97cc-c921f436bb32", "workflowType": "requirements-first", "specType": "feature"}
@@ -0,0 +1,581 @@
# Competitive Intelligence & Historical Pattern Matching Layer — Design
## Overview
This design adds a third signal layer to the Stonks Oracle aggregation engine: competitive intelligence and historical pattern matching. The layer mines existing PostgreSQL data — `document_impact_records`, `trend_windows`, and `document_company_mentions` — to identify how similar catalyst types resolved historically for a company and its competitors, then feeds pattern-based signals into the aggregation engine alongside company-specific (layer 1) and macro (layer 2) signals.
The design follows the same integration pattern as the macro interpolation layer: new modules within existing services, a runtime toggle in `risk_configs`, and the same `WeightedSignal` abstraction for aggregation. No new Kubernetes deployments are required.
### Design Rationale
- **Mine existing data, no new ingestion**: All pattern signals derive from data already in PostgreSQL — document intelligence, impact records, and trend windows. No new external data sources or ingestion pipelines.
- **Reuse existing scoring pipeline**: Pattern signals convert to `WeightedSignal` objects using the same `compute_signal_weight` function, ensuring consistent recency decay, confidence gating, and contradiction detection.
- **Parallel to macro layer**: The competitive layer toggle, suppression logic, and aggregation integration mirror the macro layer's architecture for consistency.
- **Safety-first**: Low-confidence patterns (< 0.3) are excluded, pattern-only trend shifts are forced to informational mode, and the entire layer is independently toggleable.
- **Competitor relationships as first-class entities**: Both operator-defined and auto-inferred relationships, with strength scores that gate signal propagation.
## Architecture
The competitive intelligence layer adds five logical components within existing services:
```mermaid
flowchart TD
subgraph SymReg["Symbol Registry (existing)"]
CR[Competitor Registry]
AI[Auto-Inference Engine]
end
subgraph Aggregation["Aggregation Service (existing)"]
PM[Pattern Matcher]
SPE[Signal Propagation Engine]
AE[Aggregation Engine]
end
subgraph Recommendation["Recommendation Service (existing)"]
PS[Pattern-Only Suppression]
end
subgraph LakePublisher["Lake Publisher (existing)"]
LP[Competitive Fact Publisher]
end
subgraph QueryAPI["Query API (existing)"]
PA[Pattern API Endpoints]
CT[Competitive Toggle Endpoint]
end
subgraph Dashboard["Dashboard (existing)"]
CP[Competitors Panel]
HP[Historical Patterns Panel]
CS[Competitive Signals Panel]
DT[Decision Timeline]
end
CR -->|competitor relationships| SPE
AI -->|inferred relationships| CR
PM -->|historical patterns| SPE
PM -->|self-company patterns| AE
SPE -->|competitive signals| AE
AE -->|trend summaries| PS
SPE -->|signal records| LP
CT -->|toggle state| AE
PA --> CP
PA --> HP
PA --> CS
PA --> DT
```
### Data Flow
1. **Competitor Management**: Operators define competitor relationships via the Symbol Registry API, or trigger auto-inference from sector/industry and document co-mentions. Relationships are stored in `competitor_relationships`.
2. **Pattern Mining**: When the aggregation engine runs for a ticker, the Pattern Matcher queries `document_impact_records` joined with `trend_windows` to find historical instances of the same catalyst type. It computes outcome statistics (bullish_pct, bearish_pct, avg_strength) and a pattern_confidence score.
3. **Signal Propagation**: The Signal Propagation Engine looks up the ticker's competitors, queries the Pattern Matcher for cross-company historical patterns, and produces `competitive_signal_records` weighted by relationship strength × pattern confidence × source impact score.
4. **Aggregation**: Pattern signals (both self-company and competitive) are converted to `WeightedSignal` objects and merged into the existing signal list. The competitive layer toggle is checked from `risk_configs` at the start of each cycle.
5. **Recommendation Safety**: Pattern-only trend shifts (no supporting company-specific or macro signals) are forced to informational mode with a pattern-only caveat.
6. **Lake Publication**: Competitor relationships and competitive signal facts are published as partitioned Parquet datasets.
## Components and Interfaces
### Competitor Registry
**Location**: `services/symbol_registry/competitors.py` (new module, registered as a FastAPI router in `app.py`)
Manages competitor relationships with CRUD operations and audit logging.
```python
from datetime import datetime

from pydantic import BaseModel


class CompetitorRelationshipCreate(BaseModel):
    company_b_id: str
    relationship_type: str  # direct_rival | same_sector | overlapping_products | supply_chain_adjacent
    strength: float  # [0, 1]
    bidirectional: bool = True
    source: str = "manual"  # manual | inferred


class CompetitorRelationship(BaseModel):
    id: str
    company_a_id: str
    company_b_id: str
    relationship_type: str
    strength: float
    bidirectional: bool
    source: str
    active: bool
    created_at: datetime
    updated_at: datetime
```
**API Endpoints** (on Symbol Registry):
- `POST /companies/{company_id}/competitors` — create relationship
- `GET /companies/{company_id}/competitors` — list relationships (ordered by strength desc)
- `PUT /companies/{company_id}/competitors/{relationship_id}` — update relationship
- `DELETE /companies/{company_id}/competitors/{relationship_id}` — soft-delete (set active=false)
- `POST /companies/{company_id}/competitors/infer` — trigger auto-inference
**Auto-Inference Logic** (`services/symbol_registry/competitor_inference.py`):
1. Query companies sharing the same sector and industry
2. Rank candidates by co-mention frequency in `document_company_mentions`
3. Compute strength = `0.3 * sector_match + 0.7 * normalized_co_mention_count`
4. Upsert relationships with `source='inferred'`, refreshing strength on re-inference
5. Return candidate list for operator review
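The strength formula in step 3 can be sketched as below. This is a minimal illustration, not the module's actual code: the function names are hypothetical, and normalizing against the highest co-mention count among the candidates is an assumption, since the design does not say how `normalized_co_mention_count` is derived.

```python
def inferred_strength(sector_match: bool, co_mentions: int, max_co_mentions: int) -> float:
    """strength = 0.3 * sector_match + 0.7 * normalized_co_mention_count."""
    # Assumption: normalize by the largest co-mention count among the candidates.
    normalized = co_mentions / max_co_mentions if max_co_mentions > 0 else 0.0
    normalized = min(max(normalized, 0.0), 1.0)
    return 0.3 * (1.0 if sector_match else 0.0) + 0.7 * normalized


def rank_candidates(co_mention_counts: dict[str, int]) -> list[tuple[str, float]]:
    """Score same-sector candidates and return them strongest first."""
    max_count = max(co_mention_counts.values(), default=0)
    scored = [
        (ticker, inferred_strength(True, count, max_count))
        for ticker, count in co_mention_counts.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Because every candidate already shares the sector and industry, the 0.3 term acts as a floor, and co-mention frequency alone decides the ordering.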
### Pattern Matcher
**Location**: `services/aggregation/pattern_matcher.py`
Queries historical data to find how similar catalyst types resolved for a company or its competitors.
```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class HistoricalPattern:
    source_ticker: str  # company that received the original catalyst
    target_ticker: str  # company being evaluated (same as source for self-patterns)
    catalyst_type: str
    time_horizon: str  # 1d | 7d | 30d
    sample_count: int
    bullish_pct: float  # [0, 1]
    bearish_pct: float  # [0, 1]
    avg_strength: float  # [0, 1]
    avg_time_to_resolution: float  # days
    pattern_confidence: float  # [0, 1]
    data_start: datetime
    data_end: datetime
    tier: str  # major_corporate_decision | routine_signal
    insufficient_data: bool  # True when sample_count < 3
```
**Core Functions**:
- `find_self_patterns(pool, ticker, catalyst_type, horizons) -> list[HistoricalPattern]`
- `find_cross_company_patterns(pool, source_ticker, target_ticker, catalyst_type, horizons) -> list[HistoricalPattern]`
- `compute_pattern_confidence(sample_count, outcome_consistency, data_recency_days) -> float`
- `classify_catalyst_tier(catalyst_type) -> str` — returns `major_corporate_decision` or `routine_signal`
**Pattern Confidence Formula**:
```
sample_factor = min(sample_count / 20, 1.0) # diminishing returns above 20
consistency = max(bullish_pct, bearish_pct) # how uniform outcomes are
recency_factor = 1.0 if newest_within_90d else 0.7 if newest_within_180d else 0.4
confidence = sample_factor * 0.4 + consistency * 0.4 + recency_factor * 0.2
```
**Insufficient Data**: When `sample_count < 3`, confidence is capped at 0.25 and `insufficient_data = True`.
**Staleness Decay** (Req 9.2): When no instances exist in the last 90 days and all data is older than 180 days, a 0.5 decay penalty is applied to confidence.
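The formula, the insufficient-data cap, and the staleness decay can be combined as in the sketch below. It follows the `compute_pattern_confidence` signature listed above, with two labeled assumptions: `data_recency_days` is taken to be the age in days of the newest matching instance, and the 0.5 decay penalty is applied multiplicatively. The design does not fix either point, nor the order in which the cap and the penalty are applied.

```python
def compute_pattern_confidence(
    sample_count: int,
    outcome_consistency: float,
    data_recency_days: float,
) -> float:
    """Blend sample size, outcome consistency, and recency into a [0, 1] score."""
    sample_factor = min(sample_count / 20, 1.0)  # diminishing returns above 20

    # Assumption: data_recency_days is the age of the newest matching instance.
    if data_recency_days <= 90:
        recency_factor = 1.0
    elif data_recency_days <= 180:
        recency_factor = 0.7
    else:
        recency_factor = 0.4

    confidence = sample_factor * 0.4 + outcome_consistency * 0.4 + recency_factor * 0.2

    # Staleness decay: if the newest instance is older than 180 days, then all
    # data is older than 180 days and nothing falls inside the last 90 days.
    # Assumption: the 0.5 penalty is a multiplier.
    if data_recency_days > 180:
        confidence *= 0.5

    # Insufficient data: cap at 0.25 when fewer than 3 samples exist.
    if sample_count < 3:
        confidence = min(confidence, 0.25)
    return confidence
```

For example, 20 samples with 0.8 consistency and a newest instance 30 days old give 0.4 + 0.32 + 0.2 = 0.92, while the same statistics with a newest instance 200 days old give (0.4 + 0.32 + 0.08) × 0.5 = 0.4.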
**Catalyst Tier Classification** (Req 11.1):
- `major_corporate_decision`: catalyst types `m_and_a`, `legal`, `restructuring`, `leadership_change`, `strategic_pivot`, `buyback`, `dividend_change`
- `routine_signal`: all other catalyst types
- Major decisions use 365-day lookback; routine signals use 180-day lookback
- Major decisions receive a 1.3× base weight multiplier on pattern_confidence
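A minimal sketch of the tier rules above. `classify_catalyst_tier` matches the name in the core function list; `lookback_days` and `apply_tier_multiplier` are hypothetical helper names, and clamping the multiplied confidence to 1.0 is an assumption made so that the result stays within the documented [0, 1] range.

```python
MAJOR_DECISION_CATALYSTS = frozenset({
    "m_and_a", "legal", "restructuring", "leadership_change",
    "strategic_pivot", "buyback", "dividend_change",
})


def classify_catalyst_tier(catalyst_type: str) -> str:
    """Map a catalyst type to its tier; anything not listed is routine."""
    if catalyst_type in MAJOR_DECISION_CATALYSTS:
        return "major_corporate_decision"
    return "routine_signal"


def lookback_days(catalyst_type: str) -> int:
    """Major decisions look back 365 days; routine signals look back 180."""
    return 365 if classify_catalyst_tier(catalyst_type) == "major_corporate_decision" else 180


def apply_tier_multiplier(pattern_confidence: float, catalyst_type: str) -> float:
    """Apply the 1.3x multiplier for major decisions."""
    if classify_catalyst_tier(catalyst_type) == "major_corporate_decision":
        # Assumption: clamp so the boosted confidence never exceeds 1.0.
        return min(pattern_confidence * 1.3, 1.0)
    return pattern_confidence
```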
**Historical Query**: Only considers `document_impact_records` linked to `document_intelligence` with `validation_status = 'valid'` and `documents` with `status != 'rejected'`.
### Signal Propagation Engine
**Location**: `services/aggregation/signal_propagation.py`
Evaluates incoming document intelligence, identifies competitors, queries historical patterns, and produces competitive signals.
```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class CompetitiveSignalRecord:
    source_document_id: str
    source_ticker: str
    target_ticker: str
    catalyst_type: str
    pattern_confidence: float
    signal_direction: str  # bullish | bearish
    signal_strength: float  # [0, 1]
    relationship_strength: float
    computed_at: datetime
```
**Core Functions**:
- `propagate_signals(pool, ticker, catalyst_type, impact_score, document_id, config) -> list[CompetitiveSignalRecord]`
- `build_pattern_weighted_signals(patterns, competitive_signals, reference_time, window, config) -> list[WeightedSignal]`
**Signal Weighting**:
```
signal_strength = pattern.avg_strength * relationship.strength * pattern.pattern_confidence * source_impact_score
signal_direction = "bullish" if pattern.bullish_pct > pattern.bearish_pct else "bearish"
```
**Propagation Threshold** (Req 4.5): Skip propagation when `relationship.strength < 0.2` (configurable).
**Confidence Threshold** (Req 9.1): Exclude patterns with `pattern_confidence < 0.3` (configurable).
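The weighting formula and the two thresholds can be sketched together as follows. `PatternStats` is a hypothetical, trimmed stand-in for `HistoricalPattern`, and `competitive_signal` is an illustrative name rather than a function from the codebase.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PatternStats:
    """Hypothetical subset of HistoricalPattern needed for weighting."""
    avg_strength: float
    bullish_pct: float
    bearish_pct: float
    pattern_confidence: float


def competitive_signal(
    pattern: PatternStats,
    relationship_strength: float,
    source_impact_score: float,
    min_relationship_strength: float = 0.2,
    min_pattern_confidence: float = 0.3,
) -> Optional[tuple[str, float]]:
    """Return (direction, strength), or None when a threshold gates the signal."""
    if relationship_strength < min_relationship_strength:
        return None  # Req 4.5: relationship too weak to propagate
    if pattern.pattern_confidence < min_pattern_confidence:
        return None  # Req 9.1: pattern too unreliable to use
    strength = (
        pattern.avg_strength
        * relationship_strength
        * pattern.pattern_confidence
        * source_impact_score
    )
    # A tie between bullish_pct and bearish_pct resolves to bearish, as in the formula.
    direction = "bullish" if pattern.bullish_pct > pattern.bearish_pct else "bearish"
    return direction, strength
```

Since the strength is a product of non-negative factors, raising any one of them while holding the rest constant can never lower the result.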
### Aggregation Engine Extensions
**Location**: Modified `services/aggregation/worker.py`
The existing `aggregate_company_window` function is extended to:
1. Check the competitive layer toggle from `risk_configs` (same pattern as macro toggle)
2. Query self-company historical patterns for active catalyst types in the window
3. Query competitive signals targeting this ticker
4. Convert pattern/competitive signals to `WeightedSignal` objects
5. Merge with company-specific and macro signals before computing the trend summary
**New config fields on `AggregationConfig`**:
```python
competitive_signal_weight: float = 0.2 # relative weight of pattern signals
competitive_enabled: bool = True # runtime toggle state
```
**Pattern signal conversion**: Each pattern signal is converted to a `WeightedSignal` using:
- `document_id` = source document that triggered the pattern lookup (for evidence tracing)
- `sentiment_value` = +1.0 if pattern direction is bullish, -1.0 if bearish
- `impact_score` = `signal_strength * competitive_signal_weight`
- Recency decay uses the source document's publication time
- Confidence gating uses `pattern_confidence` as the extraction confidence
**No-degradation guarantee** (Req 5.5): When no patterns or competitive signals exist, the aggregation produces identical output to the two-layer engine.
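The conversion rules above can be sketched as below. The real `WeightedSignal` fields are not shown in this design, so `PatternSignalInput` and `to_pattern_signal_input` are hypothetical stand-ins that only carry the values the rules mention. Recency decay and confidence gating would then be applied by the existing `compute_signal_weight`.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class PatternSignalInput:
    """Hypothetical stand-in for the inputs a WeightedSignal is built from."""
    document_id: str
    sentiment_value: float
    impact_score: float
    extraction_confidence: float
    published_at: datetime


def to_pattern_signal_input(
    source_document_id: str,
    signal_direction: str,
    signal_strength: float,
    pattern_confidence: float,
    published_at: datetime,
    competitive_signal_weight: float = 0.2,
) -> PatternSignalInput:
    """Map a pattern or competitive signal onto the shared signal inputs."""
    return PatternSignalInput(
        document_id=source_document_id,  # keeps evidence traceable to the source document
        sentiment_value=1.0 if signal_direction == "bullish" else -1.0,
        impact_score=signal_strength * competitive_signal_weight,
        extraction_confidence=pattern_confidence,  # drives confidence gating
        published_at=published_at,  # drives recency decay
    )
```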
### Pattern-Only Suppression
**Location**: Extended `services/recommendation/suppression.py`
New suppression check mirroring `evaluate_macro_only_suppression`:
```python
PATTERN_ONLY_CAVEAT = (
    "[Pattern-only signal] This trend direction is driven solely by historical "
    "pattern and competitive signals with no supporting company-specific or macro "
    "evidence. Recommendation is informational only."
)


def evaluate_pattern_only_suppression(
    summary: TrendSummary,
    pattern_signal_count: int,
    company_signal_count: int,
    macro_signal_count: int,
) -> bool:
    """Return True when the trend direction rests on pattern/competitive signals alone."""
    ...
```
New `SuppressionReason` enum value: `PATTERN_ONLY_SIGNAL = "pattern_only_signal"`
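One way the check could behave is sketched below. It is a simplified illustration under stated assumptions: the `summary` argument is omitted, the three counts are taken to be the numbers of signals supporting the trend direction, and `is_pattern_only`, `apply_pattern_only_suppression`, and the `"actionable"` mode value are hypothetical.

```python
PATTERN_ONLY_CAVEAT = (
    "[Pattern-only signal] This trend direction is driven solely by historical "
    "pattern and competitive signals with no supporting company-specific or macro "
    "evidence. Recommendation is informational only."
)


def is_pattern_only(
    pattern_signal_count: int,
    company_signal_count: int,
    macro_signal_count: int,
) -> bool:
    """True when only pattern/competitive signals support the trend direction."""
    return (
        pattern_signal_count > 0
        and company_signal_count == 0
        and macro_signal_count == 0
    )


def apply_pattern_only_suppression(mode: str, thesis: str, pattern_only: bool) -> tuple[str, str]:
    """Force informational mode and prepend the caveat when the flag is set."""
    if not pattern_only:
        return mode, thesis
    return "informational", f"{PATTERN_ONLY_CAVEAT} {thesis}"
```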
### Query API Extensions
**Location**: Extended `services/api/app.py`
New endpoints:
- `GET /api/patterns/{ticker}` — historical patterns for a company, filterable by `catalyst_type` and `time_horizon`
- `GET /api/patterns/{ticker}/competitors` — cross-company patterns showing how this company's catalysts affected competitors
- `GET /api/patterns/{ticker}/competitive-signals` — recent competitive signals targeting this company
- `GET /api/patterns/{ticker}/decisions` — major corporate decision history with trend outcomes
- `GET /api/admin/competitive/status` — competitive layer enabled/disabled state
- `PUT /api/admin/competitive/toggle` — toggle competitive layer on/off
### Dashboard Extensions
**Location**: Extended `frontend/src/`
New panels on Company Detail page (new tabs alongside existing sources/aliases/macro):
- **Competitors tab**: Active competitor relationships with ticker, relationship_type, strength, source
- **Historical Patterns tab**: Recent patterns for the company — catalyst_type, outcome distribution, sample_count, confidence
- **Competitive Signals tab**: Incoming competitive signals — source ticker, catalyst_type, direction, strength
- **Decisions tab**: Corporate decision timeline — major events with catalyst type, date, summary, trend outcome
Trend detail page extensions:
- Visual distinction for pattern-based and competitive signal evidence (badge/icon differentiation)
- Click-through on competitive signals showing full signal detail
Trading Controls page:
- Competitive layer toggle alongside existing macro toggle, with confirmation dialog
## Data Models
### New PostgreSQL Tables (Migration 017)
#### `competitor_relationships`
```sql
CREATE TABLE competitor_relationships (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    company_a_id UUID NOT NULL REFERENCES companies(id),
    company_b_id UUID NOT NULL REFERENCES companies(id),
    relationship_type VARCHAR(30) NOT NULL,
    strength FLOAT NOT NULL DEFAULT 0.5,
    bidirectional BOOLEAN NOT NULL DEFAULT TRUE,
    source VARCHAR(20) NOT NULL DEFAULT 'manual',
    active BOOLEAN NOT NULL DEFAULT TRUE,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    CONSTRAINT chk_relationship_type CHECK (
        relationship_type IN ('direct_rival', 'same_sector', 'overlapping_products', 'supply_chain_adjacent')
    ),
    CONSTRAINT chk_strength CHECK (strength >= 0 AND strength <= 1),
    CONSTRAINT chk_source CHECK (source IN ('manual', 'inferred')),
    CONSTRAINT chk_different_companies CHECK (company_a_id != company_b_id)
);

CREATE INDEX idx_competitor_rel_company_a ON competitor_relationships(company_a_id) WHERE active = TRUE;
CREATE INDEX idx_competitor_rel_company_b ON competitor_relationships(company_b_id) WHERE active = TRUE;

CREATE UNIQUE INDEX idx_competitor_rel_unique_pair ON competitor_relationships(
    LEAST(company_a_id, company_b_id), GREATEST(company_a_id, company_b_id)
) WHERE active = TRUE;
```
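The `LEAST`/`GREATEST` unique index treats a pair as the same relationship regardless of which company is stored as `company_a_id`. The sketch below mirrors that idea in application code, for instance to detect a duplicate before attempting an insert. `canonical_pair` is a hypothetical helper, not part of the migration.

```python
from uuid import UUID


def canonical_pair(company_a_id: UUID, company_b_id: UUID) -> tuple[UUID, UUID]:
    """Order a company pair so that (A, B) and (B, A) produce the same key."""
    if company_a_id == company_b_id:
        # Mirrors chk_different_companies: a company cannot compete with itself.
        raise ValueError("company_a_id and company_b_id must differ")
    return (min(company_a_id, company_b_id), max(company_a_id, company_b_id))
```

Because the index is partial (`WHERE active = TRUE`), a soft-deleted relationship does not block creating a new active one for the same pair.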
#### `competitive_signal_records`
```sql
CREATE TABLE competitive_signal_records (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    source_document_id UUID REFERENCES documents(id),
    source_ticker VARCHAR(20) NOT NULL,
    target_ticker VARCHAR(20) NOT NULL,
    catalyst_type VARCHAR(50) NOT NULL,
    pattern_confidence FLOAT NOT NULL,
    signal_direction VARCHAR(20) NOT NULL,
    signal_strength FLOAT NOT NULL,
    relationship_strength FLOAT NOT NULL,
    computed_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_competitive_signals_target ON competitive_signal_records(target_ticker, computed_at DESC);
CREATE INDEX idx_competitive_signals_source ON competitive_signal_records(source_ticker, computed_at DESC);
```
### New Pydantic Schemas
Added to `services/shared/schemas.py`:
```python
from enum import Enum


class RelationshipType(str, Enum):
    DIRECT_RIVAL = "direct_rival"
    SAME_SECTOR = "same_sector"
    OVERLAPPING_PRODUCTS = "overlapping_products"
    SUPPLY_CHAIN_ADJACENT = "supply_chain_adjacent"


class CatalystTier(str, Enum):
    MAJOR_CORPORATE_DECISION = "major_corporate_decision"
    ROUTINE_SIGNAL = "routine_signal"


# Major corporate decision catalyst types (Req 11.1)
MAJOR_DECISION_CATALYSTS = frozenset({
    "m_and_a", "legal", "restructuring", "leadership_change",
    "strategic_pivot", "buyback", "dividend_change",
})
```
### New `CompetitiveConfig` in `services/shared/config.py`
```python
from dataclasses import dataclass


@dataclass
class CompetitiveConfig:
    competitive_signal_weight: float = 0.2
    competitive_enabled: bool = True
    pattern_confidence_threshold: float = 0.3
    propagation_strength_threshold: float = 0.2
    routine_lookback_days: int = 180
    major_decision_lookback_days: int = 365
    major_decision_weight_multiplier: float = 1.3
    staleness_window_days: int = 180
    staleness_recent_days: int = 90
    staleness_decay_penalty: float = 0.5
    min_pattern_samples: int = 3
```
### Analytical Lake Datasets
New fact tables published to MinIO under `stonks-lakehouse/`:
- `lake.competitor_relationships` — partitioned by `dt`, columns: id, company_a_id, company_b_id, relationship_type, strength, bidirectional, source, active, created_at
- `lake.competitive_signals` — partitioned by `dt` and `target_ticker`, columns: id, source_document_id, source_ticker, target_ticker, catalyst_type, pattern_confidence, signal_direction, signal_strength, relationship_strength, computed_at
## Correctness Properties
*A property is a characteristic or behavior that should hold true across all valid executions of a system — essentially, a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.*
### Property 1: Competitor relationship persistence round-trip
*For any* valid CompetitorRelationship object with valid company IDs, relationship_type, strength in [0, 1], bidirectional flag, and source, persisting it to PostgreSQL and reading it back SHALL produce an equivalent object with all fields preserved.
**Validates: Requirements 1.1, 7.1**
### Property 2: Competitor query completeness and ordering
*For any* set of competitor relationships involving a company (as either company_a or company_b), querying competitors for that company SHALL return all active relationships containing that company, and the results SHALL be ordered by strength descending.
**Validates: Requirements 1.2**
### Property 3: Soft-delete preserves row
*For any* active competitor relationship, deleting it SHALL set `active = False` while preserving the row in the database with all original field values intact.
**Validates: Requirements 1.3**
### Property 4: Auto-inference produces valid candidates
*For any* company with a defined sector and industry, running auto-inference SHALL produce only candidate relationships where the candidate company shares the same sector and industry, and all produced relationships SHALL have `source = 'inferred'` with strength in [0, 1].
**Validates: Requirements 2.1, 2.3**
### Property 5: Auto-inference ranks by co-mention frequency
*For any* set of candidate competitors with different co-mention counts in `document_company_mentions`, the auto-inferred relationships SHALL have strength scores that are monotonically non-decreasing with co-mention frequency — candidates with more co-mentions receive higher or equal strength scores.
**Validates: Requirements 2.2**
### Property 6: Auto-inference idempotence
*For any* company, running auto-inference twice in succession SHALL produce the same set of relationships (no duplicates created), with strength scores updated to reflect the latest co-mention data.
**Validates: Requirements 2.4**
### Property 7: Pattern computation correctness
*For any* set of historical `document_impact_records` and `trend_windows` for a company-catalyst pair (or cross-company pair), the computed HistoricalPattern SHALL have: `sample_count` equal to the actual number of matching records, `bullish_pct + bearish_pct + neutral_pct ≈ 1.0` (where `neutral_pct` is the share of matched outcomes that are neither bullish nor bearish), `avg_strength` equal to the mean of the matched trend strengths, and all fields within their valid ranges.
**Validates: Requirements 3.1, 3.2, 4.2**
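The statistics named in Property 7 can be sketched as follows. This is an illustration only: `summarize_outcomes` is a hypothetical helper, each matched record is reduced to a `(direction, strength)` pair, and the `"neutral"` direction label is an assumption about how non-directional outcomes are recorded in `trend_windows`.

```python
from statistics import fmean


def summarize_outcomes(outcomes: list[tuple[str, float]]) -> dict:
    """Compute sample_count, outcome shares, and mean strength for matched records."""
    n = len(outcomes)
    if n == 0:
        # No matches: report zeros instead of dividing by zero.
        return {"sample_count": 0, "bullish_pct": 0.0, "bearish_pct": 0.0,
                "neutral_pct": 0.0, "avg_strength": 0.0}
    bullish = sum(1 for direction, _ in outcomes if direction == "bullish")
    bearish = sum(1 for direction, _ in outcomes if direction == "bearish")
    return {
        "sample_count": n,
        "bullish_pct": bullish / n,
        "bearish_pct": bearish / n,
        # Assumption: anything that is not bullish or bearish counts as neutral.
        "neutral_pct": (n - bullish - bearish) / n,
        "avg_strength": fmean(strength for _, strength in outcomes),
    }
```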
### Property 8: Pattern confidence monotonicity
*For any* two HistoricalPatterns where one has strictly more samples, more consistent outcomes, and more recent data than the other (all else equal), the first SHALL have a higher or equal `pattern_confidence`. Additionally, *for any* two patterns with identical statistics but different tiers, the `major_corporate_decision` pattern SHALL have higher confidence than the `routine_signal` pattern.
**Validates: Requirements 3.3, 11.2**
### Property 9: Insufficient data threshold
*For any* HistoricalPattern with `sample_count < 3`, the `pattern_confidence` SHALL be below 0.3 and `insufficient_data` SHALL be True.
**Validates: Requirements 3.4**
### Property 10: Valid-only data filtering
*For any* set of `document_impact_records` containing records linked to invalid intelligence (`validation_status != 'valid'`) or rejected documents (`status = 'rejected'`), the Pattern_Matcher SHALL exclude those records from pattern computation — the resulting `sample_count` SHALL only reflect valid, non-rejected records.
**Validates: Requirements 3.5**
### Property 11: Competitive signal strength monotonicity
*For any* competitive signal computation, increasing the relationship strength, pattern confidence, or source impact score (while holding others constant) SHALL produce a `signal_strength` that is greater than or equal to the previous value.
**Validates: Requirements 4.3**
### Property 12: Signal propagation threshold gating
*For any* competitor relationship with `strength < 0.2` (configurable), the Signal_Propagation_Engine SHALL produce zero competitive signals for that pair. Similarly, *for any* HistoricalPattern with `pattern_confidence < 0.3` (configurable), the pattern SHALL be excluded from competitive signal computation.
**Validates: Requirements 4.5, 9.1**
### Property 13: Pattern signal to WeightedSignal conversion
*For any* pattern-based signal converted to a WeightedSignal, the resulting object SHALL have: `sentiment_value` of +1.0 for bullish patterns or -1.0 for bearish patterns, `impact_score` equal to `signal_strength * competitive_signal_weight`, confidence gating applied using `pattern_confidence`, and recency decay based on the source document's publication time.
**Validates: Requirements 5.2**
### Property 14: Pattern-company contradiction detection
*For any* set of signals where pattern-based signals have a direction opposing company-specific signals (e.g., the pattern is bearish while company signals are bullish), the resulting trend summary's `contradiction_score` SHALL be greater than zero and `disagreement_details` SHALL contain at least one entry.
**Validates: Requirements 5.3**
### Property 15: Pattern evidence traceability
*For any* trend summary that includes pattern-based or competitive signal contributions, the `top_supporting_evidence` or `top_opposing_evidence` lists SHALL contain the `source_document_id` of at least one contributing pattern signal.
**Validates: Requirements 5.4**
### Property 16: No-degradation and disabled-layer equivalence
*For any* company with no historical patterns or competitive signals in the aggregation window, the trend summary produced with the competitive layer enabled SHALL be identical to the summary produced with it disabled. Furthermore, *for any* aggregation run with the competitive layer disabled, the output SHALL be identical to company+macro-only aggregation regardless of existing pattern data.
**Validates: Requirements 5.5, 6.2**
### Property 17: Staleness decay penalty
*For any* HistoricalPattern where all historical instances are older than 180 days and no instances exist within the last 90 days, the `pattern_confidence` SHALL be strictly less than the confidence computed for an identical pattern with at least one instance within the last 90 days.
**Validates: Requirements 9.2**
### Property 18: Pattern-only suppression
*For any* trend summary where the trend direction is driven solely by pattern-based and competitive signals (no company-specific or macro signals support the direction), the resulting recommendation SHALL have `mode = 'informational'` and the thesis SHALL contain a pattern-only caveat.
**Validates: Requirements 9.3**
### Property 19: Catalyst tier classification determinism
*For any* catalyst type, the tier classification SHALL be deterministic: `m_and_a`, `legal`, `restructuring`, `leadership_change`, `strategic_pivot`, `buyback`, and `dividend_change` SHALL always map to `major_corporate_decision`; all other catalyst types SHALL map to `routine_signal`.
**Validates: Requirements 11.1**
### Property 20: Major decision extended lookback
*For any* pattern mining query for a `major_corporate_decision` catalyst type, the lookback window SHALL be 365 days. *For any* `routine_signal` catalyst type, the lookback window SHALL be 180 days. This applies to both self-company and cross-company pattern queries.
**Validates: Requirements 11.3, 11.5**
### Property 21: Competitive signal persistence round-trip
*For any* valid CompetitiveSignalRecord with all required fields (source_document_id, source_ticker, target_ticker, catalyst_type, pattern_confidence, signal_direction, signal_strength, relationship_strength, computed_at), persisting it to PostgreSQL and reading it back SHALL produce an equivalent record with all fields preserved.
**Validates: Requirements 4.4, 7.2**
## Error Handling
### Pattern Mining Failures
- Database errors during historical pattern queries are logged and the pattern is treated as "no data" — the aggregation engine continues with company-specific and macro signals only.
- Malformed or missing `trend_windows` data for a historical period results in that period being excluded from pattern computation (reduced sample_count) rather than failing the entire query.
### Signal Propagation Failures
- If competitor relationship lookup fails, propagation is skipped for that ticker and logged. Aggregation continues with self-company patterns only.
- If pattern mining fails for a specific competitor, that competitor is skipped. Other competitors are still processed.
- Sustained propagation errors exceeding a configurable threshold (default 5 consecutive failures) trigger an operator alert via the existing alerting framework.
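The per-competitor skip and the consecutive-failure alert could look like the sketch below. All names are hypothetical, the `alert` callable stands in for the existing alerting framework (whose interface is not shown here), and firing the alert once when the streak reaches the threshold is an assumption about what "exceeding" means.

```python
import logging
from typing import Callable, Iterable

logger = logging.getLogger("signal_propagation")


class PropagationFailureTracker:
    """Counts consecutive propagation failures and alerts at a threshold."""

    def __init__(self, alert: Callable[[str], None], threshold: int = 5) -> None:
        self.alert = alert
        self.threshold = threshold
        self.consecutive_failures = 0

    def record_success(self) -> None:
        self.consecutive_failures = 0

    def record_failure(self, detail: str) -> None:
        self.consecutive_failures += 1
        # Assumption: alert once per streak, when the count reaches the threshold.
        if self.consecutive_failures == self.threshold:
            self.alert(f"{self.threshold} consecutive propagation failures: {detail}")


def propagate_for_competitors(
    competitors: Iterable[str],
    mine_patterns: Callable[[str], list],
    tracker: PropagationFailureTracker,
) -> list:
    """Collect patterns per competitor, skipping any competitor whose mining fails."""
    results: list = []
    for competitor in competitors:
        try:
            patterns = mine_patterns(competitor)
        except Exception as exc:
            logger.warning("pattern mining failed for %s: %s", competitor, exc)
            tracker.record_failure(f"{competitor}: {exc}")
            continue  # other competitors are still processed
        tracker.record_success()
        results.extend(patterns)
    return results
```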
### Auto-Inference Failures
- If the `document_company_mentions` table is empty or the query fails, auto-inference returns an empty candidate list with a warning. No relationships are created or modified.
- If sector/industry data is missing for the target company, inference is skipped with a 400 response.
### Competitor Registry Failures
- Attempting to create a relationship between a company and itself (company_a_id == company_b_id) returns a 400 error.
- Attempting to create a duplicate active relationship returns a 409 conflict.
- Foreign key violations (non-existent company IDs) return a 404 error.
### Runtime Toggle Safety
- Toggle state is read from PostgreSQL at the start of each aggregation cycle — same pattern as the macro toggle, no caching.
- Toggle changes are audit-logged with operator identity, previous state, and new state.
- Disabling the competitive layer does not delete any data — pattern mining remains queryable via the API, only aggregation integration is skipped.
### Graceful Degradation
- The competitive layer is designed to be fully optional. Any failure in pattern mining, signal propagation, or competitive signal computation results in the aggregation engine falling back to company-specific + macro signals with no degradation of existing behavior.
## Testing Strategy
### Property-Based Testing
This feature is well-suited for property-based testing. The core logic — pattern confidence computation, signal strength weighting, threshold gating, catalyst tier classification, and overlap/monotonicity properties — consists of pure functions with clear input/output behavior and a large input space.
**Library**: [Hypothesis](https://hypothesis.readthedocs.io/) for Python property-based testing.
**Configuration**: Minimum 100 iterations per property test.
**Tag format**: `Feature: competitive-historical-patterns, Property {number}: {property_text}`
Each correctness property maps to one property-based test. Generators will produce:
- Random `CompetitorRelationship` objects with valid relationship types, strength in [0, 1], and source values
- Random `HistoricalPattern` objects with valid sample counts, percentage distributions summing to ~1.0, and confidence scores
- Random `CompetitiveSignalRecord` objects with valid direction, strength, and confidence values
- Random sets of `WeightedSignal` objects with mixed sentiment values for contradiction testing
- Random catalyst types drawn from both major decision and routine signal categories
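As an illustration of the configuration and tag format, a property test for Property 11 might look like the sketch below. It assumes Hypothesis is installed and that every factor, including the source impact score, lies in [0, 1]. The `signal_strength` function is a local stand-in for the weighting formula, not the service's actual code.

```python
from hypothesis import given, settings, strategies as st

# Bounded floats; NaN and infinity are excluded when both bounds are set.
unit = st.floats(min_value=0.0, max_value=1.0)


def signal_strength(avg_strength: float, relationship_strength: float,
                    pattern_confidence: float, source_impact_score: float) -> float:
    """Stand-in for the documented signal weighting formula."""
    return avg_strength * relationship_strength * pattern_confidence * source_impact_score


# Feature: competitive-historical-patterns, Property 11: Competitive signal strength monotonicity
@settings(max_examples=100)
@given(avg=unit, rel_a=unit, rel_b=unit, conf=unit, impact=unit)
def test_strength_monotonic_in_relationship_strength(avg, rel_a, rel_b, conf, impact):
    low, high = sorted((rel_a, rel_b))
    # Raising relationship strength with the other factors fixed must not lower the signal.
    assert signal_strength(avg, low, conf, impact) <= signal_strength(avg, high, conf, impact)
```

The same shape applies to pattern confidence and source impact score by varying those arguments instead.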
### Unit Testing
Unit tests complement property tests for specific examples and edge cases:
- API endpoint response codes and error handling (CRUD operations, validation errors, 404s, 409s)
- Dashboard component rendering with mock data (competitors panel, patterns panel, signals panel, decision timeline)
- Toggle state transitions and audit logging
- Auto-inference with empty data, single company, no co-mentions
- Pattern mining with zero results, exactly 3 results (boundary), mixed valid/invalid records
### Integration Testing
Integration tests verify end-to-end flows:
- Full aggregation cycle with competitive layer enabled: document intelligence → pattern mining → signal propagation → trend summary
- Lake publisher producing Parquet datasets for competitor relationships and competitive signals
- Toggle disable/re-enable cycle preserving data integrity
- API endpoints returning correct data from PostgreSQL
- Dashboard pages rendering with live API data
@@ -0,0 +1,157 @@
# Requirements Document — Competitive Intelligence & Historical Pattern Matching Layer
## Introduction
This feature adds a third signal layer to the Stonks Oracle aggregation engine: competitive intelligence and historical pattern matching. The existing platform produces per-company trend summaries from two signal sources — company-specific document intelligence (layer 1) and global macro news interpolation (layer 2). This extension introduces a third parallel signal path that mines the existing `document_intelligence`, `document_impact_records`, and `trend_windows` tables to identify historical patterns — how similar catalyst types for the same company or its competitors resolved in the past — and uses those patterns to reinforce or weaken current trend signals.
The core insight is that competitive dynamics are predictable: when a company receives a bullish product catalyst, its direct competitors often experience a measurable bearish reaction within a short window. By mining the platform's own historical data for these patterns, the system can propagate signals across competitor relationships and weight current trends based on how similar situations resolved historically.
This layer does not ingest new external data. It mines existing data already in PostgreSQL — sentiment, catalyst types, impact scores from `document_impact_records`, and historical direction/strength outcomes from `trend_windows` — to produce pattern-based signals that feed into the aggregation engine alongside the other two layers.
## Glossary
- **Competitor_Relationship**: A directional or bidirectional link between two tracked companies indicating they compete in the same market segment. Relationships have a strength score in [0, 1] and a relationship_type (direct_rival, same_sector, overlapping_products, supply_chain_adjacent).
- **Competitor_Registry**: The component within the Symbol_Registry that manages Competitor_Relationships, supporting both operator-defined and auto-inferred relationships.
- **Historical_Pattern**: A statistical summary derived from past `document_impact_records` and `trend_windows` data, describing how a specific catalyst_type for a specific company (or its competitors) historically correlated with trend outcomes within a given time horizon.
- **Pattern_Matcher**: The component that queries historical data to find past instances of similar catalyst types for a company or its competitors, computes outcome statistics, and produces Historical_Pattern objects.
- **Pattern_Signal**: A weighted signal derived from a Historical_Pattern that feeds into the Aggregation_Engine, representing the historical tendency for a given catalyst type to produce a specific trend outcome.
- **Competitive_Signal**: A Pattern_Signal that propagates from one company's news event to a competitor, based on historical evidence of how similar events affected the competitor in the past.
- **Signal_Propagation_Engine**: The component that evaluates incoming document intelligence for a company, identifies its competitors via the Competitor_Registry, queries the Pattern_Matcher for historical precedents, and produces Competitive_Signals for affected competitors.
- **Aggregation_Engine**: The existing trend aggregation system (services/aggregation/) that computes rolling trend summaries from document intelligence signals, macro signals, and now pattern-based signals.
- **Pattern_Confidence**: A score in [0, 1] reflecting how statistically reliable a Historical_Pattern is, based on sample size, consistency of outcomes, and recency of the historical data.
- **Competitive_Layer_Toggle**: A runtime switch allowing operators to enable or disable the competitive/historical pattern signal layer without redeployment, analogous to the macro layer toggle.
## Requirements
### Requirement 1: Competitor Relationship Management
**User Story:** As an operator, I want to define which companies are competitors of each other, so that the platform can propagate signals across competitive relationships.
#### Acceptance Criteria
1. WHEN an operator creates a Competitor_Relationship between two companies, THE Competitor_Registry SHALL persist the relationship containing: company_a_id, company_b_id, relationship_type (one of direct_rival, same_sector, overlapping_products, supply_chain_adjacent), strength (a float in [0, 1] representing how closely the companies compete), bidirectional flag (whether the relationship applies in both directions), and source (manual or inferred).
2. WHEN an operator queries competitors for a given company, THE Competitor_Registry SHALL return all Competitor_Relationships where the company appears as either company_a or company_b, ordered by strength descending.
3. WHEN an operator deletes a Competitor_Relationship, THE Competitor_Registry SHALL soft-delete the relationship by marking it inactive rather than removing the row, preserving audit history.
4. THE Competitor_Registry SHALL expose Competitor_Relationship CRUD operations through the Symbol_Registry REST API.
5. WHEN a Competitor_Relationship is created or updated, THE Competitor_Registry SHALL record an audit event with the previous state, new state, and the operator who made the change.
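A minimal sketch of criteria 1 to 3, using standard-library dataclasses as a stand-in for the Pydantic models named in the implementation plan. The helper names `competitors_of` and `soft_delete` are hypothetical, and filtering to active relationships follows the plan's task 3.1 rather than the wording of criterion 2.

```python
from dataclasses import dataclass
from enum import Enum


class RelationshipType(str, Enum):
    DIRECT_RIVAL = "direct_rival"
    SAME_SECTOR = "same_sector"
    OVERLAPPING_PRODUCTS = "overlapping_products"
    SUPPLY_CHAIN_ADJACENT = "supply_chain_adjacent"


@dataclass
class CompetitorRelationship:
    company_a_id: str
    company_b_id: str
    relationship_type: RelationshipType
    strength: float      # how closely the companies compete, in [0, 1]
    bidirectional: bool
    source: str          # "manual" or "inferred"
    active: bool = True  # soft-delete flag; rows are never removed

    def __post_init__(self) -> None:
        if self.company_a_id == self.company_b_id:
            raise ValueError("a company cannot be its own competitor")
        if not 0.0 <= self.strength <= 1.0:
            raise ValueError("strength must be in [0, 1]")
        if self.source not in ("manual", "inferred"):
            raise ValueError("source must be 'manual' or 'inferred'")


def competitors_of(
    company_id: str, rels: list[CompetitorRelationship]
) -> list[CompetitorRelationship]:
    """Active relationships with the company on either side, strongest first."""
    hits = [
        r for r in rels
        if r.active and company_id in (r.company_a_id, r.company_b_id)
    ]
    return sorted(hits, key=lambda r: r.strength, reverse=True)


def soft_delete(rel: CompetitorRelationship) -> None:
    """Mark the relationship inactive instead of dropping it."""
    rel.active = False
```

Audit events (criterion 5) are omitted here; in this sketch they would be recorded by whatever calls `soft_delete` or updates a relationship.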
### Requirement 2: Competitor Auto-Inference
**User Story:** As an operator, I want the platform to automatically suggest competitor relationships based on sector, industry, and document co-mentions, so that I do not have to manually define every relationship.
#### Acceptance Criteria
1. WHEN an operator triggers competitor auto-inference for a company, THE Competitor_Registry SHALL identify candidate competitors by matching companies that share the same sector and industry fields in the companies table.
2. WHEN the Competitor_Registry identifies sector-based candidates, THE Competitor_Registry SHALL further rank candidates by counting co-mentions in the document_company_mentions table — companies frequently mentioned in the same documents receive higher strength scores.
3. WHEN the Competitor_Registry produces auto-inferred relationships, THE Competitor_Registry SHALL mark each relationship with source `inferred` and a strength score derived from the sector match and co-mention frequency, distinguishing them from operator-defined relationships marked as source `manual`.
4. WHEN auto-inferred relationships already exist for a company, THE Competitor_Registry SHALL refresh them on re-inference rather than creating duplicates, updating strength scores based on the latest co-mention data.
5. THE Competitor_Registry SHALL expose an inference endpoint at `POST /companies/{company_id}/competitors/infer` that triggers auto-inference and returns the resulting candidate relationships.
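The strength scoring in criteria 2 and 3 can be sketched as follows. The 0.3/0.7 blend comes from the implementation plan (`0.3 * sector_match + 0.7 * normalized_co_mention_count`); normalizing by the largest co-mention count among the candidates is an assumption, and both function names are hypothetical.

```python
def inferred_strength(sector_match: bool, co_mentions: int, max_co_mentions: int) -> float:
    """Blend a sector match with a co-mention count normalized to [0, 1]."""
    # Assumption: normalize against the most co-mentioned candidate.
    normalized = co_mentions / max_co_mentions if max_co_mentions > 0 else 0.0
    return 0.3 * (1.0 if sector_match else 0.0) + 0.7 * min(normalized, 1.0)


def rank_candidates(co_mention_counts: dict[str, int]) -> list[tuple[str, float]]:
    """Score same-sector candidates and return them strongest first.

    Keys are candidate company ids that already passed the sector and
    industry match; values are co-mention counts with the target company.
    """
    max_count = max(co_mention_counts.values(), default=0)
    scored = [
        (company_id, inferred_strength(True, count, max_count))
        for company_id, count in co_mention_counts.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Because the score depends only on the current counts, re-running the ranking on unchanged data yields the same strengths, which is what makes an upsert-style refresh (criterion 4) straightforward.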
### Requirement 3: Historical Pattern Mining
**User Story:** As a strategist, I want the platform to mine its historical data to find how similar catalyst types resolved in the past for a given company, so that current signals can be weighted by historical precedent.
#### Acceptance Criteria
1. WHEN the Pattern_Matcher receives a query for a company and catalyst_type, THE Pattern_Matcher SHALL search the document_impact_records table for past instances where the same company received the same catalyst_type, and join against trend_windows to determine the trend direction and strength that followed within configurable time horizons (default: 1d, 7d, 30d).
2. WHEN the Pattern_Matcher finds historical instances, THE Pattern_Matcher SHALL compute a Historical_Pattern containing: company ticker, catalyst_type, time_horizon, sample_count (number of historical instances found), bullish_pct (percentage of instances that resolved bullish), bearish_pct (percentage that resolved bearish), avg_strength (average trend strength of the outcomes), avg_time_to_resolution (average days until the trend direction stabilized), and pattern_confidence (a score reflecting statistical reliability).
3. WHEN computing pattern_confidence, THE Pattern_Matcher SHALL weight the score by sample_count (more samples increase confidence, with diminishing returns above 20 samples), outcome_consistency (how uniform the historical outcomes are — 90% bullish is more confident than 55% bullish), and data_recency (patterns from the last 90 days receive higher weight than patterns from 180+ days ago).
4. WHEN the Pattern_Matcher finds fewer than 3 historical instances for a company-catalyst pair, THE Pattern_Matcher SHALL mark the pattern_confidence as low (below 0.3) and flag the pattern as insufficient_data.
5. WHEN the Pattern_Matcher queries historical data, THE Pattern_Matcher SHALL only consider document_impact_records linked to document_intelligence with validation_status `valid` and documents with status not equal to `rejected`.
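A sketch of the confidence scoring in criteria 3 and 4, under stated assumptions. The 0.4/0.4/0.2 weighting and the 0.25 cap for fewer than 3 samples come from the implementation plan; the shapes of the three component curves are not specified there and are assumed here, as are all function names.

```python
import math


def sample_factor(sample_count: int) -> float:
    """Assumed curve: logarithmic growth that saturates at 20 samples."""
    return min(1.0, math.log1p(max(sample_count, 0)) / math.log1p(20))


def outcome_consistency(bullish_pct: float, bearish_pct: float) -> float:
    """Assumed measure: 0 for an even split, 1 for unanimous outcomes.

    Assumes both percentages are expressed as fractions in [0, 1].
    """
    return abs(bullish_pct - bearish_pct)


def recency_factor(data_recency_days: float) -> float:
    """Assumed curve: full weight up to 90 days, linear fade to 0 at 180 days."""
    if data_recency_days <= 90:
        return 1.0
    if data_recency_days >= 180:
        return 0.0
    return (180 - data_recency_days) / 90


def sketch_pattern_confidence(
    sample_count: int, consistency: float, data_recency_days: float
) -> tuple[float, bool]:
    """Return (pattern_confidence, insufficient_data)."""
    score = (
        0.4 * sample_factor(sample_count)
        + 0.4 * consistency
        + 0.2 * recency_factor(data_recency_days)
    )
    score = max(0.0, min(1.0, score))
    insufficient = sample_count < 3
    if insufficient:
        # Keep the score below the 0.3 low-confidence line.
        score = min(score, 0.25)
    return score, insufficient
```

With these curves, a pattern with two unanimous, recent instances still lands at 0.25 and is flagged as insufficient, so it never clears the 0.3 threshold used elsewhere in this document.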
### Requirement 4: Competitive Signal Propagation
**User Story:** As a strategist, I want the platform to evaluate how news about one company historically affected its competitors, so that competitor news can inform a company's trend assessment.
#### Acceptance Criteria
1. WHEN new document intelligence is produced for a company, THE Signal_Propagation_Engine SHALL identify the company's active competitors via the Competitor_Registry and query the Pattern_Matcher for historical instances where the same catalyst_type hitting the source company correlated with trend outcomes for each competitor.
2. WHEN the Pattern_Matcher finds historical cross-company patterns, THE Pattern_Matcher SHALL compute a Historical_Pattern for the competitor containing: source_ticker (the company that received the original catalyst), target_ticker (the competitor), catalyst_type, time_horizon, sample_count, bullish_pct, bearish_pct, avg_strength, and pattern_confidence.
3. WHEN the Signal_Propagation_Engine produces a Competitive_Signal for a competitor, THE Signal_Propagation_Engine SHALL weight the signal by the Competitor_Relationship strength, the Historical_Pattern's pattern_confidence, and the source document's impact_score.
4. WHEN a Competitive_Signal is produced, THE Signal_Propagation_Engine SHALL persist a competitive_signal_record containing: source_document_id, source_ticker, target_ticker, catalyst_type, pattern_confidence, signal_direction (bullish or bearish based on historical pattern), signal_strength, relationship_strength, and computed_at timestamp.
5. WHEN the Competitor_Relationship strength is below a configurable threshold (default 0.2), THE Signal_Propagation_Engine SHALL skip signal propagation for that competitor pair and log the skip reason.
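Criteria 3 and 5 can be sketched as a single function. The product formula and the direction rule follow the implementation plan; the 0.3 confidence gate is the default from Requirement 9. `PatternStats` and `competitive_signal` are hypothetical names, and the logger stands in for whatever logging the service already uses.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("signal_propagation_sketch")


@dataclass(frozen=True)
class PatternStats:
    """Hypothetical subset of a Historical_Pattern needed for weighting."""
    bullish_pct: float
    bearish_pct: float
    avg_strength: float
    pattern_confidence: float


def competitive_signal(
    pattern: PatternStats,
    relationship_strength: float,
    source_impact_score: float,
    propagation_strength_threshold: float = 0.2,
    pattern_confidence_threshold: float = 0.3,
) -> Optional[tuple[str, float]]:
    """Return (signal_direction, signal_strength), or None when gated out."""
    if relationship_strength < propagation_strength_threshold:
        logger.info("skip: relationship strength %.2f below threshold", relationship_strength)
        return None
    if pattern.pattern_confidence < pattern_confidence_threshold:
        logger.info("skip: pattern confidence %.2f below threshold", pattern.pattern_confidence)
        return None
    direction = "bullish" if pattern.bullish_pct > pattern.bearish_pct else "bearish"
    strength = (
        pattern.avg_strength
        * relationship_strength
        * pattern.pattern_confidence
        * source_impact_score
    )
    return direction, strength
```

Since every factor lies in [0, 1], the product can only shrink as any input weakens, which is the monotonicity the plan's property tests check.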
### Requirement 5: Pattern-Based Trend Reinforcement
**User Story:** As a strategist, I want historical patterns to strengthen or weaken current trend signals, so that the aggregation engine accounts for how similar situations resolved in the past.
#### Acceptance Criteria
1. WHEN the Aggregation_Engine computes a company trend summary, THE Aggregation_Engine SHALL include pattern-based signals (both self-company historical patterns and competitive signals) as additional weighted signals alongside existing document intelligence and macro signals.
2. WHEN weighting pattern-based signals, THE Aggregation_Engine SHALL apply the pattern_confidence as a confidence gate, the Historical_Pattern's avg_strength as the impact_score, and recency decay based on the source document's publication time, consistent with existing signal scoring.
3. WHEN a Historical_Pattern indicates a direction that contradicts the current company-specific signals, THE Aggregation_Engine SHALL represent the disagreement in the contradiction_score and disagreement_details fields, consistent with existing contradiction detection behavior.
4. WHEN a trend summary includes pattern-based signal contributions, THE Aggregation_Engine SHALL include the source document IDs in the evidence references so that the pattern signal chain is traceable.
5. WHEN no historical patterns or competitive signals exist for a company in the aggregation window, THE Aggregation_Engine SHALL produce the trend summary using only company-specific and macro signals, with no degradation of existing behavior.
6. THE Aggregation_Engine SHALL expose a configurable weight parameter (competitive_signal_weight) that controls the relative influence of pattern-based signals versus other signal layers, defaulting to 0.2.
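A rough sketch of how a pattern-based signal might be mapped onto the fields the aggregation engine consumes, following the mapping in the implementation plan (sentiment of +1.0 or -1.0, impact scaled by `competitive_signal_weight`). `SketchWeightedSignal` is a hypothetical stand-in: the real `WeightedSignal` fields are not shown in this document, and recency decay and confidence gating are left to the existing scoring pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchWeightedSignal:
    """Hypothetical stand-in for the engine's WeightedSignal."""
    document_id: str        # source document, kept for evidence traceability
    sentiment_value: float  # +1.0 bullish, -1.0 bearish
    impact_score: float


def to_weighted_signal(
    source_document_id: str,
    signal_direction: str,
    signal_strength: float,
    competitive_signal_weight: float = 0.2,
) -> SketchWeightedSignal:
    """Map a pattern or competitive signal onto the aggregation inputs."""
    sentiment = 1.0 if signal_direction == "bullish" else -1.0
    return SketchWeightedSignal(
        document_id=source_document_id,
        sentiment_value=sentiment,
        impact_score=signal_strength * competitive_signal_weight,
    )
```

Setting `competitive_signal_weight` to 0 in this sketch zeroes the impact of every pattern-based signal while leaving the document reference intact.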
### Requirement 6: Competitive Layer Toggle
**User Story:** As an operator, I want to enable or disable the competitive intelligence and historical pattern layer at runtime without redeploying services, so that I can control whether historical patterns and competitor signals influence trend summaries.
#### Acceptance Criteria
1. WHEN an operator toggles the competitive signal layer via the Trading Controls page or the API, THE System SHALL persist the setting in the risk_configs table and apply it immediately to subsequent aggregation cycles without requiring a service restart.
2. WHEN the competitive signal layer is disabled, THE Aggregation_Engine SHALL skip all pattern-based and competitive signals and produce trend summaries using only company-specific document intelligence and macro signals (if enabled).
3. WHEN the competitive signal layer is disabled, THE Pattern_Matcher SHALL continue to be queryable for historical patterns (so that the data remains available for manual analysis), but THE Signal_Propagation_Engine SHALL skip automatic competitive signal computation during aggregation.
4. WHEN the competitive signal layer is re-enabled after being disabled, THE Signal_Propagation_Engine SHALL resume computing pattern-based and competitive signals using the latest historical data, including any document intelligence ingested while the layer was disabled.
5. THE Query API SHALL expose a `GET /api/admin/competitive/status` endpoint returning the current enabled/disabled state and a `PUT /api/admin/competitive/toggle` endpoint to switch it.
6. THE Dashboard Trading Controls page SHALL display the competitive signal layer toggle alongside the existing trading mode and macro layer controls, with a confirmation dialog for state changes.
7. WHEN the competitive signal layer state changes, THE System SHALL record an audit event with the previous state, new state, and the operator who made the change.
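The toggle behavior in criteria 1, 2, and 7 can be illustrated with an in-memory stand-in for the `risk_configs` row. Everything here is hypothetical (class, function, and event field names); the real system persists the flag in PostgreSQL and reads it at the start of each aggregation cycle.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class InMemoryRiskConfig:
    """Hypothetical stand-in for the persisted toggle and audit trail."""
    competitive_enabled: bool = True
    audit_log: list[dict] = field(default_factory=list)


def set_competitive_enabled(
    store: InMemoryRiskConfig, enabled: bool, operator: str
) -> Optional[dict]:
    """Flip the toggle; record an audit event only when the state changes."""
    previous = store.competitive_enabled
    if previous == enabled:
        return None
    store.competitive_enabled = enabled
    event = {
        "event": "competitive_layer_toggle",
        "previous_state": previous,
        "new_state": enabled,
        "operator": operator,
        "at": datetime.now(timezone.utc).isoformat(),
    }
    store.audit_log.append(event)
    return event


def signals_for_cycle(
    store: InMemoryRiskConfig,
    company_signals: list,
    macro_signals: list,
    pattern_signals: list,
) -> list:
    """Read the toggle fresh each cycle and drop pattern signals when disabled."""
    if not store.competitive_enabled:
        return company_signals + macro_signals
    return company_signals + macro_signals + pattern_signals
```

Because the flag is consulted on every call rather than cached, a toggle takes effect on the next cycle without a restart.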
### Requirement 7: Competitive Intelligence Storage
**User Story:** As a data engineer, I want competitor relationships, historical patterns, and competitive signals stored in both the operational database and the analytical lake, so that I can query competitive intelligence alongside other platform data.
#### Acceptance Criteria
1. WHEN a Competitor_Relationship is created, THE System SHALL persist it in PostgreSQL with fields for company_a_id, company_b_id, relationship_type, strength, bidirectional, source, active status, and timestamps.
2. WHEN a competitive_signal_record is produced, THE System SHALL persist it in PostgreSQL with fields for source_document_id, source_ticker, target_ticker, catalyst_type, pattern_confidence, signal_direction, signal_strength, relationship_strength, and computed_at timestamp.
3. WHEN the Lake_Publisher runs, THE Lake_Publisher SHALL publish competitor relationship facts and competitive signal facts as partitioned Parquet datasets to MinIO under the `stonks-lakehouse` bucket.
4. WHEN analytical queries join competitive signal data with company trends, THE System SHALL support SQL joins between competitor_relationships, competitive_signals, trend_windows, and document_impact_records tables through Trino.
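For criterion 3, the partition layout named in the implementation plan can be expressed as small path builders. The function names are hypothetical, and formatting `dt` as an ISO date is an assumption.

```python
from datetime import date

WAREHOUSE_PREFIX = "stonks-lakehouse/warehouse"


def competitor_relationship_partition(dt: date) -> str:
    """Partition prefix for competitor relationship facts, one per day."""
    return f"{WAREHOUSE_PREFIX}/competitor_relationships/dt={dt.isoformat()}/"


def competitive_signal_partition(dt: date, target_ticker: str) -> str:
    """Partition prefix for competitive signal facts, per day and target ticker."""
    return (
        f"{WAREHOUSE_PREFIX}/competitive_signals/"
        f"dt={dt.isoformat()}/target_ticker={target_ticker}/"
    )
```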
### Requirement 8: Dashboard Visibility
**User Story:** As an analyst, I want to see competitor relationships, historical patterns, and competitive signals through the web dashboard, so that I can understand the competitive context behind trend assessments.
#### Acceptance Criteria
1. WHEN an analyst views a company detail page, THE Dashboard SHALL display a competitors panel showing the company's active Competitor_Relationships with each competitor's ticker, relationship_type, strength score, and source (manual or inferred).
2. WHEN an analyst views a company detail page, THE Dashboard SHALL display a historical patterns panel showing recent Historical_Patterns for the company, including catalyst_type, historical outcome distribution (bullish_pct, bearish_pct), sample_count, and pattern_confidence.
3. WHEN an analyst views a trend summary, THE Dashboard SHALL visually distinguish pattern-based and competitive signal evidence from company-specific and macro evidence in the evidence chain.
4. WHEN an analyst clicks a competitive signal in the evidence chain, THE Dashboard SHALL display the full signal detail including the source company, source document, catalyst_type, historical pattern statistics, and the Competitor_Relationship that linked the two companies.
5. WHEN an analyst views a company detail page, THE Dashboard SHALL display an incoming competitive signals panel showing recent Competitive_Signals targeting this company from competitor news, with source ticker, catalyst_type, signal_direction, and signal_strength.
### Requirement 9: Pattern Signal Suppression and Safety
**User Story:** As a risk owner, I want pattern-based and competitive signals to be subject to quality controls, so that low-confidence historical patterns do not drive automated trading decisions.
#### Acceptance Criteria
1. WHEN a Historical_Pattern has a pattern_confidence below a configurable threshold (default 0.3), THE Signal_Propagation_Engine SHALL exclude the pattern from competitive signal computation and log the exclusion reason.
2. WHEN every historical instance behind a Historical_Pattern is older than a configurable staleness window (default 180 days) and no instance falls within a configurable recent window (default 90 days), THE Pattern_Matcher SHALL apply a decay penalty to the pattern_confidence.
3. WHEN pattern-based signals are the sole basis for a trend direction change (no supporting company-specific or macro signals), THE Recommendation_Engine SHALL mark the recommendation as informational only and append a pattern-only caveat to the thesis.
4. IF the competitive signal computation encounters sustained errors exceeding a configurable threshold, THEN THE System SHALL alert operators and continue producing recommendations using only company-specific and macro signals.
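A sketch of criteria 2 and 3 under stated assumptions. The defaults (180 days, 90 days, 0.5 penalty) come from the plan's `CompetitiveConfig`; treating the penalty as a multiplicative factor is an assumption, as are the function names, the caveat wording, and the dict shape of the recommendation. The sketch also checks only which signal layers contributed, not whether the trend direction actually changed.

```python
PATTERN_ONLY_CAVEAT = "Based on historical patterns only."  # hypothetical wording


def apply_staleness_penalty(
    pattern_confidence: float,
    instance_ages_days: list[float],
    staleness_window_days: int = 180,
    staleness_recent_days: int = 90,
    staleness_decay_penalty: float = 0.5,
) -> float:
    """Scale confidence down when every instance is old and none is recent."""
    if not instance_ages_days:
        return pattern_confidence
    all_stale = min(instance_ages_days) > staleness_window_days
    none_recent = all(age > staleness_recent_days for age in instance_ages_days)
    if all_stale and none_recent:
        # Assumption: the penalty multiplies the confidence.
        return pattern_confidence * staleness_decay_penalty
    return pattern_confidence


def is_pattern_only(
    pattern_signal_count: int, company_signal_count: int, macro_signal_count: int
) -> bool:
    """True when pattern-based signals are the only contributors."""
    return (
        pattern_signal_count > 0
        and company_signal_count == 0
        and macro_signal_count == 0
    )


def apply_pattern_only_suppression(
    recommendation: dict,
    pattern_signal_count: int,
    company_signal_count: int,
    macro_signal_count: int,
) -> dict:
    """Force informational mode and append a caveat for pattern-only trends."""
    if not is_pattern_only(pattern_signal_count, company_signal_count, macro_signal_count):
        return recommendation
    return {
        **recommendation,
        "mode": "informational",
        "thesis": f"{recommendation['thesis']} {PATTERN_ONLY_CAVEAT}",
    }
```

With the default windows the recent-instance check is implied by the staleness check; it matters only if an operator configures a recent window longer than the staleness window.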
### Requirement 10: Historical Pattern Query API
**User Story:** As an analyst, I want to query historical patterns on demand for any company and catalyst type, so that I can manually investigate how similar situations resolved in the past.
#### Acceptance Criteria
1. THE Query API SHALL expose a `GET /api/patterns/{ticker}` endpoint returning all available Historical_Patterns for a company, filterable by catalyst_type and time_horizon.
2. THE Query API SHALL expose a `GET /api/patterns/{ticker}/competitors` endpoint returning cross-company Historical_Patterns showing how the specified company's catalysts historically affected its competitors.
3. WHEN the pattern query endpoints return results, THE Query API SHALL include the underlying sample_count, outcome distribution, pattern_confidence, and the date range of the historical data used.
4. THE Query API SHALL expose a `GET /api/patterns/{ticker}/competitive-signals` endpoint returning recent Competitive_Signals targeting the specified company, with source details and pattern statistics.
### Requirement 11: Corporate Decision History Tracking
**User Story:** As a strategist, I want the platform to identify and track major corporate decisions (acquisitions, divestitures, leadership changes, strategic pivots, major partnerships, stock buybacks, dividend changes, restructurings) from the existing document intelligence, so that historical pattern mining can weight these high-impact events distinctly from routine news.
#### Acceptance Criteria
1. WHEN the Pattern_Matcher mines historical data, THE Pattern_Matcher SHALL classify document_impact_records into two tiers: major_corporate_decision (catalyst types including m_and_a, legal, restructuring, leadership_change, strategic_pivot, buyback, dividend_change) and routine_signal (all other catalyst types), and compute separate Historical_Patterns for each tier.
2. WHEN a major_corporate_decision pattern is found, THE Pattern_Matcher SHALL apply a higher base weight to the pattern_confidence calculation compared to routine_signal patterns, reflecting that major decisions have more predictable and durable market impact.
3. WHEN the Pattern_Matcher computes a Historical_Pattern for a major_corporate_decision, THE Pattern_Matcher SHALL extend the default lookback window to 365 days (compared to 180 days for routine signals), since major corporate decisions are rarer but their outcomes are more structurally significant.
4. WHEN an analyst views a company detail page, THE Dashboard SHALL display a corporate decision timeline showing major_corporate_decision events extracted from the company's document intelligence history, with the catalyst type, date, summary, and the trend outcome that followed.
5. WHEN the Pattern_Matcher evaluates competitive signal propagation for a major_corporate_decision catalyst, THE Pattern_Matcher SHALL search for historical instances where similar major decisions by competitors produced measurable trend shifts for the target company, using the extended 365-day lookback window.
6. THE Query API SHALL expose a `GET /api/patterns/{ticker}/decisions` endpoint returning the company's major corporate decision history with associated trend outcomes and pattern statistics.
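The tiering in criteria 1 to 3 reduces to a set lookup. The catalyst set, the 180/365-day lookbacks, and the 1.3 multiplier are the values given in this document and its implementation plan; `lookback_days` and `tier_adjusted_confidence` are hypothetical helper names, and capping the boosted confidence at 1.0 is an assumption made to keep it inside the [0, 1] range from the glossary.

```python
MAJOR_DECISION_CATALYSTS = frozenset({
    "m_and_a",
    "legal",
    "restructuring",
    "leadership_change",
    "strategic_pivot",
    "buyback",
    "dividend_change",
})


def classify_catalyst_tier(catalyst_type: str) -> str:
    """Deterministically map a catalyst type to its tier."""
    if catalyst_type in MAJOR_DECISION_CATALYSTS:
        return "major_corporate_decision"
    return "routine_signal"


def lookback_days(
    catalyst_type: str,
    routine_lookback_days: int = 180,
    major_decision_lookback_days: int = 365,
) -> int:
    """Major decisions are rarer, so they get the longer lookback window."""
    if classify_catalyst_tier(catalyst_type) == "major_corporate_decision":
        return major_decision_lookback_days
    return routine_lookback_days


def tier_adjusted_confidence(
    base_confidence: float,
    catalyst_type: str,
    major_decision_weight_multiplier: float = 1.3,
) -> float:
    """Boost confidence for major decisions, capped at 1.0 (assumed cap)."""
    if classify_catalyst_tier(catalyst_type) == "major_corporate_decision":
        return min(1.0, base_confidence * major_decision_weight_multiplier)
    return base_confidence
```

Any catalyst type outside the set, such as an illustrative `"earnings"` value, falls through to `routine_signal`, so new catalyst types default to the shorter window and the unboosted confidence.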
@@ -0,0 +1,300 @@
# Implementation Plan: Competitive Intelligence & Historical Pattern Matching Layer
## Overview
This plan implements a third signal layer for the Stonks Oracle aggregation engine: competitive intelligence and historical pattern matching. The layer mines existing PostgreSQL data (document_impact_records, trend_windows, document_company_mentions) to identify how similar catalyst types resolved historically for a company and its competitors, then feeds pattern-based signals into the aggregation engine alongside company-specific (layer 1) and macro (layer 2) signals. All modules extend existing services — no new Kubernetes deployments required. Tasks are ordered so each step builds on the previous, with property-based tests validating core logic early.
## Tasks
- [x] 1. Database migration and shared schemas
- [x] 1.1 Create PostgreSQL migration `infra/migrations/017_competitive_historical_patterns.sql`
- Add `competitor_relationships` table with id (UUID PK), company_a_id (FK companies), company_b_id (FK companies), relationship_type (VARCHAR CHECK direct_rival|same_sector|overlapping_products|supply_chain_adjacent), strength (FLOAT CHECK [0,1]), bidirectional (BOOLEAN), source (VARCHAR CHECK manual|inferred), active (BOOLEAN), created_at, updated_at
- Add `competitive_signal_records` table with id (UUID PK), source_document_id (FK documents), source_ticker, target_ticker, catalyst_type, pattern_confidence, signal_direction, signal_strength, relationship_strength, computed_at
- Add CHECK constraint preventing self-referencing relationships (company_a_id != company_b_id)
- Add unique index on (LEAST(company_a_id, company_b_id), GREATEST(company_a_id, company_b_id)) WHERE active = TRUE to prevent duplicate active pairs
- Add indexes: idx_competitor_rel_company_a, idx_competitor_rel_company_b (both WHERE active = TRUE), idx_competitive_signals_target (target_ticker, computed_at DESC), idx_competitive_signals_source (source_ticker, computed_at DESC)
- _Requirements: 7.1, 7.2_
- [x] 1.2 Add new Pydantic schemas and enums to `services/shared/schemas.py`
- Add `RelationshipType` enum (direct_rival, same_sector, overlapping_products, supply_chain_adjacent)
- Add `CatalystTier` enum (major_corporate_decision, routine_signal)
- Add `MAJOR_DECISION_CATALYSTS` frozenset (m_and_a, legal, restructuring, leadership_change, strategic_pivot, buyback, dividend_change)
- Add `CompetitorRelationshipSchema`, `CompetitiveSignalRecordSchema`, `HistoricalPatternSchema` Pydantic models
- _Requirements: 1.1, 4.4, 7.1, 7.2, 11.1_
- [x] 1.3 Add competitive configuration fields to `services/shared/config.py`
- Add `CompetitiveConfig` dataclass with fields: competitive_signal_weight (0.2), competitive_enabled (True), pattern_confidence_threshold (0.3), propagation_strength_threshold (0.2), routine_lookback_days (180), major_decision_lookback_days (365), major_decision_weight_multiplier (1.3), staleness_window_days (180), staleness_recent_days (90), staleness_decay_penalty (0.5), min_pattern_samples (3)
- Add `competitive: CompetitiveConfig` to `AppConfig` with env var loading in `load_config()`
- _Requirements: 5.6, 6.1, 9.1, 9.2, 11.2, 11.3_
- [x] 2. Checkpoint — Ensure migration and schemas are consistent
- Ensure all tests pass; ask the user if questions arise.
- [x] 3. Competitor Registry and auto-inference
- [x] 3.1 Implement `services/symbol_registry/competitors.py`
- Implement `CompetitorRelationshipCreate` and `CompetitorRelationship` Pydantic models for API request/response
- Implement `POST /companies/{company_id}/competitors` — create relationship with audit event
- Implement `GET /companies/{company_id}/competitors` — list active relationships ordered by strength descending
- Implement `PUT /companies/{company_id}/competitors/{relationship_id}` — update relationship with audit event recording previous state
- Implement `DELETE /companies/{company_id}/competitors/{relationship_id}` — soft-delete (set active=False), preserve row
- Register routes as a FastAPI router on the Symbol Registry app
- Handle error cases: self-referencing (400), duplicate active pair (409), non-existent company (404)
- _Requirements: 1.1, 1.2, 1.3, 1.4, 1.5_
- [x] 3.2 Write property test for competitor relationship persistence round-trip
- **Property 1: Competitor relationship persistence round-trip**
- **Validates: Requirements 1.1, 7.1**
- [x] 3.3 Write property test for competitor query completeness and ordering
- **Property 2: Competitor query completeness and ordering**
- **Validates: Requirements 1.2**
- [x] 3.4 Write property test for soft-delete preserves row
- **Property 3: Soft-delete preserves row**
- **Validates: Requirements 1.3**
- [x] 3.5 Implement `services/symbol_registry/competitor_inference.py`
- Implement `infer_competitors(pool, company_id) -> list[CompetitorRelationship]`
- Query companies sharing the same sector and industry
- Rank candidates by co-mention frequency in `document_company_mentions`
- Compute strength = `0.3 * sector_match + 0.7 * normalized_co_mention_count`
- Upsert relationships with `source='inferred'`, refreshing strength on re-inference (no duplicates)
- Implement `POST /companies/{company_id}/competitors/infer` endpoint returning candidate relationships
- _Requirements: 2.1, 2.2, 2.3, 2.4, 2.5_
- [x] 3.6 Write property test for auto-inference produces valid candidates
- **Property 4: Auto-inference produces valid candidates**
- **Validates: Requirements 2.1, 2.3**
- [x] 3.7 Write property test for auto-inference ranks by co-mention frequency
- **Property 5: Auto-inference ranks by co-mention frequency**
- **Validates: Requirements 2.2**
- [x] 3.8 Write property test for auto-inference idempotence
- **Property 6: Auto-inference idempotence**
- **Validates: Requirements 2.4**
- [x] 4. Checkpoint — Ensure competitor registry and inference work correctly
- Ensure all tests pass; ask the user if questions arise.
- [x] 5. Pattern Matcher — core historical pattern mining
- [x] 5.1 Implement `services/aggregation/pattern_matcher.py`
- Implement `HistoricalPattern` dataclass matching the design specification
- Implement `classify_catalyst_tier(catalyst_type) -> str` — deterministic mapping of major_corporate_decision vs routine_signal catalyst types
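- Implement `compute_pattern_confidence(sample_count, outcome_consistency, data_recency_days, tier) -> float` using the formula `sample_factor * 0.4 + consistency * 0.4 + recency_factor * 0.2`, with a 1.3× multiplier (`major_decision_weight_multiplier`) for major decisions, keeping the result within the [0, 1] range on which Pattern_Confidence is defined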
- Implement `find_self_patterns(pool, ticker, catalyst_type, horizons) -> list[HistoricalPattern]` — query document_impact_records joined with trend_windows for same company-catalyst pair across configurable time horizons (1d, 7d, 30d)
- Implement `find_cross_company_patterns(pool, source_ticker, target_ticker, catalyst_type, horizons) -> list[HistoricalPattern]` — query cross-company historical patterns
- Only consider records linked to document_intelligence with validation_status='valid' and documents with status != 'rejected'
- Apply insufficient data threshold: when sample_count < 3, cap confidence at 0.25 and set insufficient_data=True
- Apply staleness decay: when no instances in last 90 days and all data older than 180 days, apply 0.5 decay penalty
- Use 365-day lookback for major_corporate_decision catalysts, 180-day for routine_signal
- Compute separate HistoricalPatterns for each catalyst tier
- _Requirements: 3.1, 3.2, 3.3, 3.4, 3.5, 11.1, 11.2, 11.3, 11.5_
- [x] 5.2 Write property test for pattern computation correctness
- **Property 7: Pattern computation correctness**
- **Validates: Requirements 3.1, 3.2, 4.2**
- [x] 5.3 Write property test for pattern confidence monotonicity
- **Property 8: Pattern confidence monotonicity**
- **Validates: Requirements 3.3, 11.2**
- [x] 5.4 Write property test for insufficient data threshold
- **Property 9: Insufficient data threshold**
- **Validates: Requirements 3.4**
- [x] 5.5 Write property test for valid-only data filtering
- **Property 10: Valid-only data filtering**
- **Validates: Requirements 3.5**
- [x] 5.6 Write property test for catalyst tier classification determinism
- **Property 19: Catalyst tier classification determinism**
- **Validates: Requirements 11.1**
- [x] 5.7 Write property test for major decision extended lookback
- **Property 20: Major decision extended lookback**
- **Validates: Requirements 11.3, 11.5**
- [x] 6. Checkpoint — Ensure pattern matcher and property tests pass
- Ensure all tests pass; ask the user if questions arise.
- [x] 7. Signal Propagation Engine
- [x] 7.1 Implement `services/aggregation/signal_propagation.py`
- Implement `CompetitiveSignalRecord` dataclass matching the design specification
- Implement `propagate_signals(pool, ticker, catalyst_type, impact_score, document_id, config) -> list[CompetitiveSignalRecord]` — look up competitors, query cross-company patterns, produce weighted competitive signals
- Signal weighting: `signal_strength = pattern.avg_strength * relationship.strength * pattern.pattern_confidence * source_impact_score`
- Signal direction: bullish if pattern.bullish_pct > pattern.bearish_pct, else bearish
- Skip propagation when relationship.strength < propagation_strength_threshold (default 0.2), log skip reason
- Exclude patterns with pattern_confidence < pattern_confidence_threshold (default 0.3), log exclusion reason
- Persist CompetitiveSignalRecord objects to the competitive_signal_records PostgreSQL table
- Implement `build_pattern_weighted_signals(patterns, competitive_signals, reference_time, window, config) -> list[WeightedSignal]` — convert pattern/competitive signals to WeightedSignal objects for aggregation
- _Requirements: 4.1, 4.2, 4.3, 4.4, 4.5, 9.1_
- [x] 7.2 Write property test for competitive signal strength monotonicity
- **Property 11: Competitive signal strength monotonicity**
- **Validates: Requirements 4.3**
- [x] 7.3 Write property test for signal propagation threshold gating
- **Property 12: Signal propagation threshold gating**
- **Validates: Requirements 4.5, 9.1**
- [x] 7.4 Write property test for pattern signal to WeightedSignal conversion
- **Property 13: Pattern signal to WeightedSignal conversion**
- **Validates: Requirements 5.2**
- [x] 7.5 Write property test for competitive signal persistence round-trip
- **Property 21: Competitive signal persistence round-trip**
- **Validates: Requirements 4.4, 7.2**
- [x] 8. Checkpoint — Ensure signal propagation and property tests pass
- Ensure all tests pass; ask the user if questions arise.
- [x] 9. Aggregation engine integration
- [x] 9.1 Extend `services/aggregation/worker.py` to incorporate pattern-based and competitive signals
- Add `competitive_signal_weight` and `competitive_enabled` fields to `AggregationConfig`
- In `aggregate_company_window`, check competitive toggle state from `risk_configs` table (same pattern as macro toggle)
- When competitive layer is enabled: query self-company historical patterns for active catalyst types in the window, query competitive signals targeting this ticker
- Convert each pattern signal to a `WeightedSignal` using: document_id = source document, sentiment_value = +1.0 (bullish) or -1.0 (bearish), impact_score = signal_strength × competitive_signal_weight, recency decay from source document publication time, confidence gating from pattern_confidence
- Merge pattern/competitive signals with company-specific and macro signals before computing trend direction, strength, confidence, and contradiction score
- Include contributing source_document_ids in evidence references for traceability
- When competitive layer is disabled or no pattern data exists, produce identical output to company+macro-only aggregation
- _Requirements: 5.1, 5.2, 5.3, 5.4, 5.5, 5.6_
- [x] 9.2 Write property test for pattern-company contradiction detection
- **Property 14: Pattern-company contradiction detection**
- **Validates: Requirements 5.3**
- [x] 9.3 Write property test for pattern evidence traceability
- **Property 15: Pattern evidence traceability**
- **Validates: Requirements 5.4**
- [x] 9.4 Write property test for no-degradation and disabled-layer equivalence
- **Property 16: No-degradation and disabled-layer equivalence**
- **Validates: Requirements 5.5, 6.2**
- [x] 9.5 Write property test for staleness decay penalty
- **Property 17: Staleness decay penalty**
- **Validates: Requirements 9.2**
- [x] 10. Checkpoint — Ensure aggregation integration works correctly
- Ensure all tests pass; ask the user if questions arise.
- [x] 11. Pattern-only suppression and safety
- [x] 11.1 Extend `services/recommendation/suppression.py` with pattern-only suppression
- Add `PATTERN_ONLY_SIGNAL = "pattern_only_signal"` to `SuppressionReason` enum
- Implement `evaluate_pattern_only_suppression(summary, pattern_signal_count, company_signal_count, macro_signal_count) -> bool`
- When pattern-based signals are the sole basis for a trend direction change, force recommendation to `mode='informational'` and append pattern-only caveat to thesis
- _Requirements: 9.3_
- [x] 11.2 Write property test for pattern-only suppression
- **Property 18: Pattern-only suppression**
- **Validates: Requirements 9.3**
- [x] 12. Competitive layer toggle and API endpoints
- [x] 12.1 Implement competitive toggle and status endpoints in `services/api/app.py`
- Add `GET /api/admin/competitive/status` returning current enabled/disabled state from `risk_configs` table
- Add `PUT /api/admin/competitive/toggle` to switch competitive layer on/off, persisting to `risk_configs` and recording an audit event with previous state, new state, and operator
- Toggle state is read from PostgreSQL at the start of each aggregation cycle (no caching)
- When disabled, pattern mining remains queryable via API but signal propagation is skipped during aggregation
- When re-enabled, resume computing signals using the latest historical data, including intelligence ingested while the layer was disabled
- _Requirements: 6.1, 6.2, 6.3, 6.4, 6.5, 6.7_
- [x] 12.2 Implement pattern and competitive signal query endpoints in `services/api/app.py`
- Add `GET /api/patterns/{ticker}` — historical patterns for a company, filterable by catalyst_type and time_horizon
- Add `GET /api/patterns/{ticker}/competitors` — cross-company patterns showing how this company's catalysts affected competitors
- Add `GET /api/patterns/{ticker}/competitive-signals` — recent competitive signals targeting this company
- Add `GET /api/patterns/{ticker}/decisions` — major corporate decision history with trend outcomes and pattern statistics
- Include sample_count, outcome distribution, pattern_confidence, and date range in all responses
- _Requirements: 10.1, 10.2, 10.3, 10.4, 11.4, 11.6_
- [x] 13. Checkpoint — Ensure API endpoints and toggle logic work correctly
- Ensure all tests pass; ask the user if questions arise.
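
The toggle semantics in task 12.1 can be sketched in plain Python. This is an illustration only: `InMemoryRiskConfig`, `AuditEvent`, and `collect_signals` are hypothetical names, and the in-memory object stands in for the `risk_configs` table in PostgreSQL and the project's audit-event store.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class AuditEvent:
    """Audit record carrying the three fields task 12.1 calls for."""
    previous_state: bool
    new_state: bool
    operator: str
    recorded_at: datetime


class InMemoryRiskConfig:
    """Hypothetical stand-in for the competitive-layer row in risk_configs."""

    def __init__(self, enabled: bool) -> None:
        self._enabled = enabled
        self.audit_log: List[AuditEvent] = []

    def is_enabled(self) -> bool:
        return self._enabled

    def toggle(self, enabled: bool, operator: str) -> AuditEvent:
        # Record the previous state before overwriting it.
        event = AuditEvent(self._enabled, enabled, operator, datetime.now(timezone.utc))
        self._enabled = enabled
        self.audit_log.append(event)
        return event


def collect_signals(
    config: InMemoryRiskConfig, company: list, macro: list, pattern: list
) -> list:
    """Read the toggle fresh on every cycle; no cached copy is kept."""
    if config.is_enabled():
        return company + macro + pattern
    # Disabled: output matches company + macro aggregation exactly.
    return company + macro
```

Because `collect_signals` consults the config on each call, a toggle change takes effect on the next aggregation cycle without any restart or cache invalidation.
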
- [x] 14. Lake publisher extensions
- [x] 14.1 Add competitive fact publishers to the lake publisher service
- Implement `publish_competitor_relationship_fact` writing partitioned Parquet datasets to `stonks-lakehouse/warehouse/competitor_relationships/dt={date}/`
- Implement `publish_competitive_signal_fact` writing partitioned Parquet datasets to `stonks-lakehouse/warehouse/competitive_signals/dt={date}/target_ticker={ticker}/`
- Register new fact types in the lake publisher's job processing loop
- _Requirements: 7.3, 7.4_
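
The partition layout in task 14.1 can be sketched as two path builders. The helper names are hypothetical, and formatting `dt` as an ISO `YYYY-MM-DD` date is an assumption; the root and dataset directory names are taken from the task text.

```python
from datetime import date

WAREHOUSE_ROOT = "stonks-lakehouse/warehouse"  # from task 14.1


def competitor_relationships_prefix(day: date) -> str:
    """Partition prefix for competitor relationship facts (partitioned by date)."""
    # Assumption: dt is rendered as an ISO date, e.g. dt=2026-04-14.
    return f"{WAREHOUSE_ROOT}/competitor_relationships/dt={day.isoformat()}/"


def competitive_signals_prefix(day: date, target_ticker: str) -> str:
    """Partition prefix for competitive signal facts (date, then target ticker)."""
    if not target_ticker:
        raise ValueError("target_ticker must be non-empty")
    return (
        f"{WAREHOUSE_ROOT}/competitive_signals/"
        f"dt={day.isoformat()}/target_ticker={target_ticker}/"
    )
```

Centralizing the prefixes in one place makes it straightforward for the integration tests in task 19.1 to assert that Parquet files land in the expected partitions.
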
- [x] 15. Signal propagation wiring into aggregation pipeline
- [x] 15.1 Wire signal propagation into the aggregation worker
- After document intelligence is produced for a company, trigger signal propagation for the company's competitors
- In the aggregation cycle, call `propagate_signals` for each new document intelligence record when competitive layer is enabled
- Handle sustained propagation errors: after a configurable threshold (default 5 consecutive failures), alert operators and continue with company-specific + macro signals only
- _Requirements: 4.1, 9.4_
- [x] 15.2 Wire pattern mining into the aggregation cycle
- During `aggregate_company_window`, call pattern matcher for self-company patterns and collect competitive signals for the ticker
- Merge resulting WeightedSignals into the signal list before trend computation
- Ensure evidence references include pattern signal source document IDs
- _Requirements: 5.1, 5.4_
- [x] 16. Checkpoint — Ensure full backend pipeline works end-to-end
- Ensure all tests pass; ask the user if questions arise.
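
The sustained-failure handling in task 15.1 can be sketched as a consecutive-failure counter. `PropagationFailureTracker` and `propagate_with_fallback` are hypothetical names, and the `propagate` and `alert` callables are placeholders for the project's `propagate_signals` function and its operator alerting hook.

```python
from typing import Callable, Iterable, List


class PropagationFailureTracker:
    """Counts consecutive propagation failures against a configurable threshold."""

    def __init__(self, threshold: int = 5) -> None:  # default of 5 from task 15.1
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.consecutive_failures = 0

    def record_success(self) -> None:
        self.consecutive_failures = 0

    def record_failure(self) -> bool:
        """Return True exactly when the threshold is reached, so alerts fire once."""
        self.consecutive_failures += 1
        return self.consecutive_failures == self.threshold


def propagate_with_fallback(
    propagate: Callable[[object], List[object]],
    records: Iterable[object],
    tracker: PropagationFailureTracker,
    alert: Callable[[str], None],
) -> List[object]:
    """Collect competitive signals; on errors, skip the record and keep going."""
    signals: List[object] = []
    for record in records:
        try:
            produced = propagate(record)
        except Exception as exc:  # broad on purpose: aggregation must continue
            if tracker.record_failure():
                alert(f"signal propagation failed {tracker.threshold} times in a row: {exc}")
        else:
            tracker.record_success()
            signals.extend(produced)
    return signals
```

An empty return value leaves the aggregation cycle with company-specific and macro signals only, which matches the fallback the task describes.
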
- [x] 17. Dashboard — Competitors panel and historical patterns
- [x] 17.1 Add competitors panel to Company Detail page
- On `frontend/src/pages/CompanyDetail.tsx`, add a Competitors tab showing active competitor relationships with ticker, relationship_type, strength score, source (manual/inferred)
- Add API hooks for `GET /companies/{company_id}/competitors` in `frontend/src/api/hooks.ts`
- Add an Infer button that triggers `POST /companies/{company_id}/competitors/infer`
- _Requirements: 8.1_
- [x] 17.2 Add historical patterns panel to Company Detail page
- On `frontend/src/pages/CompanyDetail.tsx`, add a Historical Patterns tab showing recent patterns: catalyst_type, outcome distribution (bullish_pct, bearish_pct), sample_count, pattern_confidence
- Add API hook for `GET /api/patterns/{ticker}`
- _Requirements: 8.2_
- [x] 17.3 Add competitive signals panel to Company Detail page
- On `frontend/src/pages/CompanyDetail.tsx`, add a Competitive Signals tab showing incoming signals: source ticker, catalyst_type, signal_direction, signal_strength
- Add API hook for `GET /api/patterns/{ticker}/competitive-signals`
- Click-through on a signal shows full detail: source company, source document, catalyst_type, historical pattern statistics, competitor relationship
- _Requirements: 8.4, 8.5_
- [x] 17.4 Add corporate decision timeline to Company Detail page
- On `frontend/src/pages/CompanyDetail.tsx`, add a Decisions tab showing major_corporate_decision events: catalyst type, date, summary, trend outcome that followed
- Add API hook for `GET /api/patterns/{ticker}/decisions`
- _Requirements: 11.4_
- [x] 17.5 Add pattern-based evidence indicators to Trend detail page
- On `frontend/src/pages/TrendDetail.tsx`, visually distinguish pattern-based and competitive signal evidence from company-specific and macro evidence (badge/icon differentiation)
- _Requirements: 8.3_
- [x] 17.6 Add competitive toggle to Trading Controls page
- On `frontend/src/pages/Trading.tsx`, add competitive signal layer enable/disable switch alongside existing macro toggle, with confirmation dialog
- Add API hooks for `GET /api/admin/competitive/status` and `PUT /api/admin/competitive/toggle`
- _Requirements: 6.6_
- [x] 18. Checkpoint — Ensure frontend pages render and integrate with API
- Ensure all tests pass; ask the user if questions arise.
- [x] 19. Integration wiring and final validation
- [x] 19.1 Write integration tests for competitive pipeline end-to-end
- Test document intelligence → pattern mining → signal propagation → aggregation flow
- Test lake publisher writes correct Parquet partitions for competitor relationships and competitive signals
- Test competitive toggle state change propagates to next aggregation cycle
- Test toggle disable/re-enable cycle preserves data integrity
- _Requirements: 4.1, 5.1, 6.1, 6.4, 7.3_
- [x] 19.2 Write unit tests for API endpoints and dashboard components
- Test competitor CRUD endpoints return correct data and error codes (400, 404, 409)
- Test pattern query endpoints return correct data with filtering
- Test competitive toggle endpoint persists state and records audit event
- Test auto-inference endpoint with empty data, single company, no co-mentions
- Add MSW handlers for competitive endpoints in `frontend/src/test/mocks/handlers.ts`
- Test competitors panel, historical patterns panel, competitive signals panel, and decision timeline render correctly
- _Requirements: 1.4, 2.5, 6.5, 8.1, 8.2, 8.5, 10.1, 10.4_
- [x] 20. Final checkpoint — Ensure all tests pass
- Ensure all tests pass; ask the user if questions arise.
## Notes
- Tasks marked with `*` are optional and can be skipped for a faster MVP
- Each task references specific requirements for traceability
- Checkpoints ensure incremental validation after each major phase
- Property tests validate the 21 correctness properties from the design using Hypothesis
- Backend modules use Python throughout, so no language selection is needed; dashboard tasks extend the existing TypeScript frontend
- No new Kubernetes deployments required; all modules extend existing services
- Next migration number is 017 (016 is global-news-interpolation)
- Competitive layer follows the same toggle/suppression/aggregation pattern as the macro layer for consistency