# Competitive Intelligence & Historical Pattern Matching Layer: Design

## Overview

This design adds a third signal layer to the Stonks Oracle aggregation engine: competitive intelligence and historical pattern matching. The layer mines existing PostgreSQL data (`document_impact_records`, `trend_windows`, and `document_company_mentions`) to identify how similar catalyst types resolved historically for a company and its competitors, then feeds pattern-based signals into the aggregation engine alongside company-specific (layer 1) and macro (layer 2) signals.

The design follows the same integration pattern as the macro interpolation layer: new modules within existing services, a runtime toggle in `risk_configs`, and the same `WeightedSignal` abstraction for aggregation. No new Kubernetes deployments are required.

### Design Rationale

- **Mine existing data, no new ingestion**: All pattern signals derive from data already in PostgreSQL: document intelligence, impact records, and trend windows. No new external data sources or ingestion pipelines are required.
- **Reuse existing scoring pipeline**: Pattern signals convert to `WeightedSignal` objects using the same `compute_signal_weight` function, ensuring consistent recency decay, confidence gating, and contradiction detection.
- **Parallel to macro layer**: The competitive layer toggle, suppression logic, and aggregation integration mirror the macro layer's architecture for consistency.
- **Safety-first**: Low-confidence patterns (< 0.3) are excluded, pattern-only trend shifts are forced to informational mode, and the entire layer is independently toggleable.
- **Competitor relationships as first-class entities**: Both operator-defined and auto-inferred relationships are supported, with strength scores that gate signal propagation.

## Architecture

The competitive intelligence layer adds five logical components within existing services:

```mermaid
flowchart TD
    subgraph SymReg["Symbol Registry (existing)"]
        CR[Competitor Registry]
        AI[Auto-Inference Engine]
    end

    subgraph Aggregation["Aggregation Service (existing)"]
        PM[Pattern Matcher]
        SPE[Signal Propagation Engine]
        AE[Aggregation Engine]
    end

    subgraph Recommendation["Recommendation Service (existing)"]
        PS[Pattern-Only Suppression]
    end

    subgraph LakePublisher["Lake Publisher (existing)"]
        LP[Competitive Fact Publisher]
    end

    subgraph QueryAPI["Query API (existing)"]
        PA[Pattern API Endpoints]
        CT[Competitive Toggle Endpoint]
    end

    subgraph Dashboard["Dashboard (existing)"]
        CP[Competitors Panel]
        HP[Historical Patterns Panel]
        CS[Competitive Signals Panel]
        DT[Decision Timeline]
    end

    CR -->|competitor relationships| SPE
    AI -->|inferred relationships| CR
    PM -->|historical patterns| SPE
    PM -->|self-company patterns| AE
    SPE -->|competitive signals| AE
    AE -->|trend summaries| PS
    SPE -->|signal records| LP
    CT -->|toggle state| AE
    PA --> CP
    PA --> HP
    PA --> CS
    PA --> DT
```

### Data Flow

1. **Competitor Management**: Operators define competitor relationships via the Symbol Registry API, or trigger auto-inference from sector/industry and document co-mentions. Relationships are stored in `competitor_relationships`.

2. **Pattern Mining**: When the aggregation engine runs for a ticker, the Pattern Matcher queries `document_impact_records` joined with `trend_windows` to find historical instances of the same catalyst type. It computes outcome statistics (`bullish_pct`, `bearish_pct`, `avg_strength`) and a `pattern_confidence` score.

3. **Signal Propagation**: The Signal Propagation Engine looks up the ticker's competitors, queries the Pattern Matcher for cross-company historical patterns, and produces `competitive_signal_records` weighted by relationship strength × pattern confidence × source impact score.

4. **Aggregation**: Pattern signals (both self-company and competitive) are converted to `WeightedSignal` objects and merged into the existing signal list. The competitive layer toggle is checked from `risk_configs` at the start of each cycle.

5. **Recommendation Safety**: Pattern-only trend shifts (no supporting company-specific or macro signals) are forced to informational mode with a pattern-only caveat.

6. **Lake Publication**: Competitor relationships and competitive signal facts are published as partitioned Parquet datasets.

## Components and Interfaces

### Competitor Registry

**Location**: `services/symbol_registry/competitors.py` (new module, registered as a FastAPI router in `app.py`)

Manages competitor relationships with CRUD operations and audit logging.

```python
from datetime import datetime

from pydantic import BaseModel


class CompetitorRelationshipCreate(BaseModel):
    company_b_id: str
    relationship_type: str  # direct_rival | same_sector | overlapping_products | supply_chain_adjacent
    strength: float  # [0, 1]
    bidirectional: bool = True
    source: str = "manual"  # manual | inferred


class CompetitorRelationship(BaseModel):
    id: str
    company_a_id: str
    company_b_id: str
    relationship_type: str
    strength: float
    bidirectional: bool
    source: str
    active: bool
    created_at: datetime
    updated_at: datetime
```

**API Endpoints** (on Symbol Registry):

- `POST /companies/{company_id}/competitors`: create relationship
- `GET /companies/{company_id}/competitors`: list relationships (ordered by strength desc)
- `PUT /companies/{company_id}/competitors/{relationship_id}`: update relationship
- `DELETE /companies/{company_id}/competitors/{relationship_id}`: soft-delete (set `active=false`)
- `POST /companies/{company_id}/competitors/infer`: trigger auto-inference

**Auto-Inference Logic** (`services/symbol_registry/competitor_inference.py`):

1. Query companies sharing the same sector and industry
2. Rank candidates by co-mention frequency in `document_company_mentions`
3. Compute strength = `0.3 * sector_match + 0.7 * normalized_co_mention_count`
4. Upsert relationships with `source='inferred'`, refreshing strength on re-inference
5. Return candidate list for operator review

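The strength computation in step 3 can be sketched as follows. This is a minimal illustration, not the module's actual code: the helper name `infer_strengths` is hypothetical, and normalizing co-mention counts by the largest count among the candidates is an assumption, since the design does not say how counts are normalized.

```python
def infer_strengths(co_mentions: dict[str, int], sector_match: float = 1.0) -> dict[str, float]:
    """Hypothetical helper: map candidate ticker -> inferred strength in [0, 1].

    Assumption: counts are normalized by the largest co-mention count among
    the candidates. The design only names `normalized_co_mention_count`.
    """
    max_count = max(co_mentions.values(), default=0)
    strengths: dict[str, float] = {}
    for ticker, count in co_mentions.items():
        normalized = count / max_count if max_count > 0 else 0.0
        # Weights 0.3 and 0.7 come from step 3 of the inference logic.
        strengths[ticker] = 0.3 * sector_match + 0.7 * normalized
    return strengths


# Placeholder tickers; candidates are assumed to already share sector and industry.
candidates = infer_strengths({"AAA": 10, "BBB": 5, "CCC": 0})
ranked = sorted(candidates, key=candidates.get, reverse=True)
```

Under this normalization, a candidate with no co-mentions still receives the 0.3 sector-match floor, which keeps it above the default 0.2 propagation threshold.
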
### Pattern Matcher

**Location**: `services/aggregation/pattern_matcher.py`

Queries historical data to find how similar catalyst types resolved for a company or its competitors.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class HistoricalPattern:
    source_ticker: str  # company that received the original catalyst
    target_ticker: str  # company being evaluated (same as source for self-patterns)
    catalyst_type: str
    time_horizon: str  # 1d | 7d | 30d
    sample_count: int
    bullish_pct: float  # [0, 1]
    bearish_pct: float  # [0, 1]
    avg_strength: float  # [0, 1]
    avg_time_to_resolution: float  # days
    pattern_confidence: float  # [0, 1]
    data_start: datetime
    data_end: datetime
    tier: str  # major_corporate_decision | routine_signal
    insufficient_data: bool  # True when sample_count < 3
```

**Core Functions**:

- `find_self_patterns(pool, ticker, catalyst_type, horizons) -> list[HistoricalPattern]`
- `find_cross_company_patterns(pool, source_ticker, target_ticker, catalyst_type, horizons) -> list[HistoricalPattern]`
- `compute_pattern_confidence(sample_count, outcome_consistency, data_recency_days) -> float`
- `classify_catalyst_tier(catalyst_type) -> str`: returns `major_corporate_decision` or `routine_signal`

**Pattern Confidence Formula**:

```
sample_factor = min(sample_count / 20, 1.0)       # no additional credit above 20 samples
consistency = max(bullish_pct, bearish_pct)       # how uniform outcomes are
recency_factor = 1.0 if newest_within_90d else 0.7 if newest_within_180d else 0.4
confidence = sample_factor * 0.4 + consistency * 0.4 + recency_factor * 0.2
```

**Insufficient Data**: When `sample_count < 3`, confidence is capped at 0.25 and `insufficient_data = True`.

**Staleness Decay** (Req 9.2): When no instances exist in the last 90 days and all data is older than 180 days, a 0.5 decay penalty is applied to confidence.

**Catalyst Tier Classification** (Req 11.1):

- `major_corporate_decision`: catalyst types `m_and_a`, `legal`, `restructuring`, `leadership_change`, `strategic_pivot`, `buyback`, `dividend_change`
- `routine_signal`: all other catalyst types
- Major decisions use a 365-day lookback; routine signals use a 180-day lookback
- Major decisions receive a 1.3× base weight multiplier on `pattern_confidence`

**Historical Query**: Only considers `document_impact_records` linked to `document_intelligence` with `validation_status = 'valid'` and `documents` with `status != 'rejected'`.

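To make the interaction between the confidence formula, the insufficient-data cap, the staleness penalty, and the tier multiplier concrete, here is a hedged sketch. The function name `pattern_confidence_sketch` and the `newest_age_days` parameter are hypothetical. The design does not specify the order in which the adjustments apply, whether the 0.5 penalty is multiplicative, or whether the result is clamped to 1.0; all three are assumptions here.

```python
def pattern_confidence_sketch(
    sample_count: int,
    bullish_pct: float,
    bearish_pct: float,
    newest_age_days: int,
    is_major: bool = False,
) -> float:
    """Illustrative only: the ordering of the adjustments is an assumption."""
    sample_factor = min(sample_count / 20, 1.0)
    consistency = max(bullish_pct, bearish_pct)
    if newest_age_days <= 90:
        recency_factor = 1.0
    elif newest_age_days <= 180:
        recency_factor = 0.7
    else:
        recency_factor = 0.4
    confidence = sample_factor * 0.4 + consistency * 0.4 + recency_factor * 0.2

    # Staleness: if the newest instance is older than 180 days, all data is.
    if newest_age_days > 180:
        confidence *= 0.5  # assumed to be a multiplicative penalty
    # Major corporate decisions get the 1.3x multiplier, clamped to [0, 1].
    if is_major:
        confidence = min(confidence * 1.3, 1.0)
    # The cap is applied last so that sample_count < 3 always stays below
    # the 0.3 exclusion threshold, even for major decisions.
    if sample_count < 3:
        confidence = min(confidence, 0.25)
    return confidence


fresh = pattern_confidence_sketch(20, 0.8, 0.2, newest_age_days=30)
stale = pattern_confidence_sketch(20, 0.8, 0.2, newest_age_days=200)
sparse = pattern_confidence_sketch(2, 1.0, 0.0, newest_age_days=10, is_major=True)
```

Applying the cap after the tier multiplier is a deliberate choice in this sketch: with the opposite order, a major-decision pattern with two samples would reach 0.25 × 1.3 = 0.325 and pass the 0.3 confidence threshold.
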
### Signal Propagation Engine

**Location**: `services/aggregation/signal_propagation.py`

Evaluates incoming document intelligence, identifies competitors, queries historical patterns, and produces competitive signals.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class CompetitiveSignalRecord:
    source_document_id: str
    source_ticker: str
    target_ticker: str
    catalyst_type: str
    pattern_confidence: float
    signal_direction: str  # bullish | bearish
    signal_strength: float  # [0, 1]
    relationship_strength: float
    computed_at: datetime
```

**Core Functions**:

- `propagate_signals(pool, ticker, catalyst_type, impact_score, document_id, config) -> list[CompetitiveSignalRecord]`
- `build_pattern_weighted_signals(patterns, competitive_signals, reference_time, window, config) -> list[WeightedSignal]`

**Signal Weighting**:

```
signal_strength = pattern.avg_strength * relationship.strength * pattern.pattern_confidence * source_impact_score
signal_direction = "bullish" if pattern.bullish_pct > pattern.bearish_pct else "bearish"
```

**Propagation Threshold** (Req 4.5): Skip propagation when `relationship.strength < 0.2` (configurable).

**Confidence Threshold** (Req 9.1): Exclude patterns with `pattern_confidence < 0.3` (configurable).

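The weighting formula and the two thresholds combine roughly as in the sketch below. The function name `competitive_signal_sketch` and its flat argument list are hypothetical; the real `propagate_signals` takes a connection pool and a config object.

```python
from typing import Optional


def competitive_signal_sketch(
    avg_strength: float,
    bullish_pct: float,
    bearish_pct: float,
    pattern_confidence: float,
    relationship_strength: float,
    source_impact_score: float,
    min_relationship_strength: float = 0.2,  # propagation threshold (Req 4.5)
    min_pattern_confidence: float = 0.3,     # confidence threshold (Req 9.1)
) -> Optional[tuple[str, float]]:
    """Return (direction, strength), or None when a threshold gates the signal."""
    if relationship_strength < min_relationship_strength:
        return None
    if pattern_confidence < min_pattern_confidence:
        return None
    strength = avg_strength * relationship_strength * pattern_confidence * source_impact_score
    # As in the formula above, a tie between bullish and bearish resolves to bearish.
    direction = "bullish" if bullish_pct > bearish_pct else "bearish"
    return direction, strength


signal = competitive_signal_sketch(0.8, 0.7, 0.3, 0.5, 0.5, 1.0)
gated = competitive_signal_sketch(0.8, 0.7, 0.3, 0.5, 0.1, 1.0)
```

Because every factor lies in [0, 1], the product never exceeds the smallest factor, so a weak relationship or a low-confidence pattern bounds the propagated strength even when it passes the thresholds.
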
### Aggregation Engine Extensions

**Location**: Modified `services/aggregation/worker.py`

The existing `aggregate_company_window` function is extended to:

1. Check the competitive layer toggle from `risk_configs` (same pattern as macro toggle)
2. Query self-company historical patterns for active catalyst types in the window
3. Query competitive signals targeting this ticker
4. Convert pattern/competitive signals to `WeightedSignal` objects
5. Merge with company-specific and macro signals before computing the trend summary

**New config fields on `AggregationConfig`**:

```python
competitive_signal_weight: float = 0.2  # relative weight of pattern signals
competitive_enabled: bool = True  # runtime toggle state
```

**Pattern signal conversion**: Each pattern signal is converted to a `WeightedSignal` using:

- `document_id` = source document that triggered the pattern lookup (for evidence tracing)
- `sentiment_value` = +1.0 if pattern direction is bullish, -1.0 if bearish
- `impact_score` = `signal_strength * competitive_signal_weight`
- Recency decay uses the source document's publication time
- Confidence gating uses `pattern_confidence` as the extraction confidence

**No-degradation guarantee** (Req 5.5): When no patterns or competitive signals exist, the aggregation produces identical output to the two-layer engine.

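The conversion rules above can be sketched as follows. The real `WeightedSignal` class is not shown in this design, so `WeightedSignalSketch` and its field names are stand-ins, and `to_weighted_signal` is a hypothetical helper.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class WeightedSignalSketch:
    """Stand-in for the real WeightedSignal; field names are assumptions."""
    document_id: str
    sentiment_value: float
    impact_score: float
    extraction_confidence: float
    published_at: datetime


def to_weighted_signal(
    document_id: str,
    direction: str,
    signal_strength: float,
    pattern_confidence: float,
    published_at: datetime,
    competitive_signal_weight: float = 0.2,
) -> WeightedSignalSketch:
    return WeightedSignalSketch(
        document_id=document_id,  # kept for evidence tracing
        sentiment_value=1.0 if direction == "bullish" else -1.0,
        impact_score=signal_strength * competitive_signal_weight,
        extraction_confidence=pattern_confidence,  # drives confidence gating
        published_at=published_at,  # drives recency decay
    )


sig = to_weighted_signal(
    "doc-1", "bearish", 0.5, 0.6, datetime(2024, 1, 1, tzinfo=timezone.utc)
)
```

With the default `competitive_signal_weight` of 0.2, a pattern signal contributes at most one fifth of the impact of a company-specific signal of the same strength.
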
### Pattern-Only Suppression

**Location**: Extended `services/recommendation/suppression.py`

New suppression check mirroring `evaluate_macro_only_suppression`:

```python
PATTERN_ONLY_CAVEAT = (
    "[Pattern-only signal] This trend direction is driven solely by historical "
    "pattern and competitive signals with no supporting company-specific or macro "
    "evidence. Recommendation is informational only."
)


def evaluate_pattern_only_suppression(
    summary: TrendSummary,
    pattern_signal_count: int,
    company_signal_count: int,
    macro_signal_count: int,
) -> bool:
    """Signature only: reports whether the pattern-only suppression applies."""
    ...
```

New `SuppressionReason` enum value: `PATTERN_ONLY_SIGNAL = "pattern_only_signal"`

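One plausible reading of the check is sketched below. It assumes the three counts refer to signals supporting the trend direction and that suppression applies when only pattern signals are present. The helper names `is_pattern_only` and `apply_pattern_only_guard`, and the shortened caveat prefix, are hypothetical.

```python
CAVEAT_PREFIX = "[Pattern-only signal]"  # shortened form of PATTERN_ONLY_CAVEAT


def is_pattern_only(
    pattern_signal_count: int,
    company_signal_count: int,
    macro_signal_count: int,
) -> bool:
    """True when pattern signals exist and no other layer supports the trend."""
    return (
        pattern_signal_count > 0
        and company_signal_count == 0
        and macro_signal_count == 0
    )


def apply_pattern_only_guard(mode: str, thesis: str, pattern_only: bool) -> tuple[str, str]:
    """Force informational mode and prepend the caveat when pattern-only."""
    if not pattern_only:
        return mode, thesis
    return "informational", f"{CAVEAT_PREFIX} {thesis}"


mode, thesis = apply_pattern_only_guard(
    "actionable",  # hypothetical name for a non-informational mode
    "Peers fell after similar restructurings.",
    is_pattern_only(3, 0, 0),
)
```
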
### Query API Extensions

**Location**: Extended `services/api/app.py`

New endpoints:

- `GET /api/patterns/{ticker}`: historical patterns for a company, filterable by `catalyst_type` and `time_horizon`
- `GET /api/patterns/{ticker}/competitors`: cross-company patterns showing how this company's catalysts affected competitors
- `GET /api/patterns/{ticker}/competitive-signals`: recent competitive signals targeting this company
- `GET /api/patterns/{ticker}/decisions`: major corporate decision history with trend outcomes
- `GET /api/admin/competitive/status`: competitive layer enabled/disabled state
- `PUT /api/admin/competitive/toggle`: toggle competitive layer on/off

### Dashboard Extensions

**Location**: Extended `frontend/src/`

New panels on Company Detail page (new tabs alongside existing sources/aliases/macro):

- **Competitors tab**: Active competitor relationships with ticker, relationship_type, strength, source
- **Historical Patterns tab**: Recent patterns for the company (catalyst_type, outcome distribution, sample_count, confidence)
- **Competitive Signals tab**: Incoming competitive signals (source ticker, catalyst_type, direction, strength)
- **Decisions tab**: Corporate decision timeline (major events with catalyst type, date, summary, trend outcome)

Trend detail page extensions:

- Visual distinction for pattern-based and competitive signal evidence (badge/icon differentiation)
- Click-through on competitive signals showing full signal detail

Trading Controls page:

- Competitive layer toggle alongside existing macro toggle, with confirmation dialog

## Data Models

### New PostgreSQL Tables (Migration 017)

#### `competitor_relationships`

```sql
CREATE TABLE competitor_relationships (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    company_a_id UUID NOT NULL REFERENCES companies(id),
    company_b_id UUID NOT NULL REFERENCES companies(id),
    relationship_type VARCHAR(30) NOT NULL,
    strength FLOAT NOT NULL DEFAULT 0.5,
    bidirectional BOOLEAN NOT NULL DEFAULT TRUE,
    source VARCHAR(20) NOT NULL DEFAULT 'manual',
    active BOOLEAN NOT NULL DEFAULT TRUE,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    CONSTRAINT chk_relationship_type CHECK (
        relationship_type IN ('direct_rival', 'same_sector', 'overlapping_products', 'supply_chain_adjacent')
    ),
    CONSTRAINT chk_strength CHECK (strength >= 0 AND strength <= 1),
    CONSTRAINT chk_source CHECK (source IN ('manual', 'inferred')),
    CONSTRAINT chk_different_companies CHECK (company_a_id != company_b_id)
);

CREATE INDEX idx_competitor_rel_company_a ON competitor_relationships(company_a_id) WHERE active = TRUE;
CREATE INDEX idx_competitor_rel_company_b ON competitor_relationships(company_b_id) WHERE active = TRUE;
CREATE UNIQUE INDEX idx_competitor_rel_unique_pair ON competitor_relationships(
    LEAST(company_a_id, company_b_id), GREATEST(company_a_id, company_b_id)
) WHERE active = TRUE;
```

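The unique index on `LEAST(...)`, `GREATEST(...)` makes the pair order-independent: (A, B) and (B, A) map to the same key, so only one active row per pair can exist. The same idea, as a small sketch (the helper name `pair_key` is hypothetical and only mirrors the index expression in application code):

```python
import uuid


def pair_key(company_a_id: uuid.UUID, company_b_id: uuid.UUID) -> tuple[uuid.UUID, uuid.UUID]:
    """Order-independent key, analogous to (LEAST(a, b), GREATEST(a, b))."""
    return (min(company_a_id, company_b_id), max(company_a_id, company_b_id))


a, b = uuid.uuid4(), uuid.uuid4()
same_pair = pair_key(a, b) == pair_key(b, a)
```

Because the index is partial (`WHERE active = TRUE`), a soft-deleted relationship does not block creating a new active one for the same pair.
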
#### `competitive_signal_records`

```sql
CREATE TABLE competitive_signal_records (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    source_document_id UUID REFERENCES documents(id),
    source_ticker VARCHAR(20) NOT NULL,
    target_ticker VARCHAR(20) NOT NULL,
    catalyst_type VARCHAR(50) NOT NULL,
    pattern_confidence FLOAT NOT NULL,
    signal_direction VARCHAR(20) NOT NULL,
    signal_strength FLOAT NOT NULL,
    relationship_strength FLOAT NOT NULL,
    computed_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_competitive_signals_target ON competitive_signal_records(target_ticker, computed_at DESC);
CREATE INDEX idx_competitive_signals_source ON competitive_signal_records(source_ticker, computed_at DESC);
```

### New Pydantic Schemas

Added to `services/shared/schemas.py`:

```python
from enum import Enum


class RelationshipType(str, Enum):
    DIRECT_RIVAL = "direct_rival"
    SAME_SECTOR = "same_sector"
    OVERLAPPING_PRODUCTS = "overlapping_products"
    SUPPLY_CHAIN_ADJACENT = "supply_chain_adjacent"


class CatalystTier(str, Enum):
    MAJOR_CORPORATE_DECISION = "major_corporate_decision"
    ROUTINE_SIGNAL = "routine_signal"


# Major corporate decision catalyst types (Req 11.1)
MAJOR_DECISION_CATALYSTS = frozenset({
    "m_and_a", "legal", "restructuring", "leadership_change",
    "strategic_pivot", "buyback", "dividend_change",
})
```

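Given the set above, tier classification and lookback selection reduce to a membership test. The sketch below restates the set so it runs on its own. `classify_catalyst_tier` matches the signature listed for the Pattern Matcher, while `lookback_days` is a hypothetical helper and `"earnings"` is only a placeholder for a routine catalyst type.

```python
MAJOR_DECISION_CATALYSTS = frozenset({
    "m_and_a", "legal", "restructuring", "leadership_change",
    "strategic_pivot", "buyback", "dividend_change",
})


def classify_catalyst_tier(catalyst_type: str) -> str:
    """Deterministic mapping from catalyst type to tier (Req 11.1)."""
    if catalyst_type in MAJOR_DECISION_CATALYSTS:
        return "major_corporate_decision"
    return "routine_signal"


def lookback_days(catalyst_type: str, routine: int = 180, major: int = 365) -> int:
    """Hypothetical helper: pick the lookback window for a catalyst type."""
    if classify_catalyst_tier(catalyst_type) == "major_corporate_decision":
        return major
    return routine


tiers = {c: classify_catalyst_tier(c) for c in ("m_and_a", "earnings")}
```
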
### New `CompetitiveConfig` in `services/shared/config.py`

```python
from dataclasses import dataclass


@dataclass
class CompetitiveConfig:
    competitive_signal_weight: float = 0.2
    competitive_enabled: bool = True
    pattern_confidence_threshold: float = 0.3
    propagation_strength_threshold: float = 0.2
    routine_lookback_days: int = 180
    major_decision_lookback_days: int = 365
    major_decision_weight_multiplier: float = 1.3
    staleness_window_days: int = 180
    staleness_recent_days: int = 90
    staleness_decay_penalty: float = 0.5
    min_pattern_samples: int = 3
```

### Analytical Lake Datasets

New fact tables published to MinIO under `stonks-lakehouse/`:

- `lake.competitor_relationships`: partitioned by `dt`; columns: id, company_a_id, company_b_id, relationship_type, strength, bidirectional, source, active, created_at
- `lake.competitive_signals`: partitioned by `dt` and `target_ticker`; columns: id, source_document_id, source_ticker, target_ticker, catalyst_type, pattern_confidence, signal_direction, signal_strength, relationship_strength, computed_at

## Correctness Properties

*A property is a characteristic or behavior that should hold true across all valid executions of a system; essentially, a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.*

### Property 1: Competitor relationship persistence round-trip

*For any* valid CompetitorRelationship object with valid company IDs, relationship_type, strength in [0, 1], bidirectional flag, and source, persisting it to PostgreSQL and reading it back SHALL produce an equivalent object with all fields preserved.

**Validates: Requirements 1.1, 7.1**

### Property 2: Competitor query completeness and ordering

*For any* set of competitor relationships involving a company (as either company_a or company_b), querying competitors for that company SHALL return all active relationships containing that company, and the results SHALL be ordered by strength descending.

**Validates: Requirements 1.2**

### Property 3: Soft-delete preserves row

*For any* active competitor relationship, deleting it SHALL set `active = False` while preserving the row in the database with all original field values intact.

**Validates: Requirements 1.3**

### Property 4: Auto-inference produces valid candidates

*For any* company with a defined sector and industry, running auto-inference SHALL produce only candidate relationships where the candidate company shares the same sector and industry, and all produced relationships SHALL have `source = 'inferred'` with strength in [0, 1].

**Validates: Requirements 2.1, 2.3**

### Property 5: Auto-inference ranks by co-mention frequency

*For any* set of candidate competitors with different co-mention counts in `document_company_mentions`, the auto-inferred relationships SHALL have strength scores that are monotonically non-decreasing with co-mention frequency: candidates with more co-mentions receive higher or equal strength scores.

**Validates: Requirements 2.2**

### Property 6: Auto-inference idempotence

*For any* company, running auto-inference twice in succession SHALL produce the same set of relationships (no duplicates created), with strength scores updated to reflect the latest co-mention data.

**Validates: Requirements 2.4**

### Property 7: Pattern computation correctness

*For any* set of historical `document_impact_records` and `trend_windows` for a company-catalyst pair (or cross-company pair), the computed HistoricalPattern SHALL have: `sample_count` equal to the actual number of matching records, `bullish_pct + bearish_pct + neutral_pct ≈ 1.0` (where `neutral_pct` is the remaining share of outcomes; it is not a stored field on `HistoricalPattern`), `avg_strength` equal to the mean of the matched trend strengths, and all fields within their valid ranges.

**Validates: Requirements 3.1, 3.2, 4.2**

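The statistics named in Property 7 can be sketched as below. The helper name `outcome_stats` and the `(direction, strength)` input shape are assumptions for illustration; the actual Pattern Matcher derives these values from `document_impact_records` joined with `trend_windows`.

```python
from statistics import mean


def outcome_stats(outcomes: list[tuple[str, float]]) -> dict[str, float]:
    """Summarize (direction, strength) pairs; direction is bullish|bearish|neutral."""
    n = len(outcomes)
    if n == 0:
        # No matching records: report zeros rather than dividing by zero.
        return {"sample_count": 0, "bullish_pct": 0.0, "bearish_pct": 0.0,
                "neutral_pct": 0.0, "avg_strength": 0.0}
    bullish = sum(1 for direction, _ in outcomes if direction == "bullish")
    bearish = sum(1 for direction, _ in outcomes if direction == "bearish")
    return {
        "sample_count": n,
        "bullish_pct": bullish / n,
        "bearish_pct": bearish / n,
        "neutral_pct": (n - bullish - bearish) / n,  # remainder, not stored on the pattern
        "avg_strength": mean(strength for _, strength in outcomes),
    }


stats = outcome_stats(
    [("bullish", 0.8), ("bullish", 0.6), ("bearish", 0.4), ("neutral", 0.2)]
)
```
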
### Property 8: Pattern confidence monotonicity

*For any* two HistoricalPatterns where one has strictly more samples, more consistent outcomes, and more recent data than the other (all else equal), the first SHALL have a higher or equal `pattern_confidence`. Additionally, *for any* two patterns with identical statistics but different tiers, the `major_corporate_decision` pattern SHALL have higher confidence than the `routine_signal` pattern.

**Validates: Requirements 3.3, 11.2**

### Property 9: Insufficient data threshold

*For any* HistoricalPattern with `sample_count < 3`, the `pattern_confidence` SHALL be below 0.3 and `insufficient_data` SHALL be True.

**Validates: Requirements 3.4**

### Property 10: Valid-only data filtering

*For any* set of `document_impact_records` containing records linked to invalid intelligence (`validation_status != 'valid'`) or rejected documents (`status = 'rejected'`), the Pattern_Matcher SHALL exclude those records from pattern computation: the resulting `sample_count` SHALL only reflect valid, non-rejected records.

**Validates: Requirements 3.5**

### Property 11: Competitive signal strength monotonicity

*For any* competitive signal computation, increasing the relationship strength, pattern confidence, or source impact score (while holding others constant) SHALL produce a `signal_strength` that is greater than or equal to the previous value.

**Validates: Requirements 4.3**

### Property 12: Signal propagation threshold gating

*For any* competitor relationship with `strength < 0.2` (configurable), the Signal_Propagation_Engine SHALL produce zero competitive signals for that pair. Similarly, *for any* HistoricalPattern with `pattern_confidence < 0.3` (configurable), the pattern SHALL be excluded from competitive signal computation.

**Validates: Requirements 4.5, 9.1**

### Property 13: Pattern signal to WeightedSignal conversion

*For any* pattern-based signal converted to a WeightedSignal, the resulting object SHALL have: `sentiment_value` of +1.0 for bullish patterns or -1.0 for bearish patterns, `impact_score` equal to `signal_strength * competitive_signal_weight`, confidence gating applied using `pattern_confidence`, and recency decay based on the source document's publication time.

**Validates: Requirements 5.2**

### Property 14: Pattern-company contradiction detection

*For any* set of signals where pattern-based signals have a direction opposing company-specific signals (e.g., pattern is bearish while company signals are positive), the resulting trend summary's `contradiction_score` SHALL be greater than zero and `disagreement_details` SHALL contain at least one entry.

**Validates: Requirements 5.3**

### Property 15: Pattern evidence traceability

*For any* trend summary that includes pattern-based or competitive signal contributions, the `top_supporting_evidence` or `top_opposing_evidence` lists SHALL contain the `source_document_id` of at least one contributing pattern signal.

**Validates: Requirements 5.4**

### Property 16: No-degradation and disabled-layer equivalence

*For any* company with no historical patterns or competitive signals in the aggregation window, the trend summary produced with the competitive layer enabled SHALL be identical to the summary produced with it disabled. Furthermore, *for any* aggregation run with the competitive layer disabled, the output SHALL be identical to company+macro-only aggregation regardless of existing pattern data.

**Validates: Requirements 5.5, 6.2**

### Property 17: Staleness decay penalty

*For any* HistoricalPattern where all historical instances are older than 180 days and no instances exist within the last 90 days, the `pattern_confidence` SHALL be strictly less than the confidence computed for an identical pattern with at least one instance within the last 90 days.

**Validates: Requirements 9.2**

### Property 18: Pattern-only suppression

*For any* trend summary where the trend direction is driven solely by pattern-based and competitive signals (no company-specific or macro signals support the direction), the resulting recommendation SHALL have `mode = 'informational'` and the thesis SHALL contain a pattern-only caveat.

**Validates: Requirements 9.3**

### Property 19: Catalyst tier classification determinism

*For any* catalyst type, the tier classification SHALL be deterministic: `m_and_a`, `legal`, `restructuring`, `leadership_change`, `strategic_pivot`, `buyback`, and `dividend_change` SHALL always map to `major_corporate_decision`; all other catalyst types SHALL map to `routine_signal`.

**Validates: Requirements 11.1**

### Property 20: Major decision extended lookback

*For any* pattern mining query for a `major_corporate_decision` catalyst type, the lookback window SHALL be 365 days. *For any* `routine_signal` catalyst type, the lookback window SHALL be 180 days. This applies to both self-company and cross-company pattern queries.

**Validates: Requirements 11.3, 11.5**

### Property 21: Competitive signal persistence round-trip

*For any* valid CompetitiveSignalRecord with all required fields (source_document_id, source_ticker, target_ticker, catalyst_type, pattern_confidence, signal_direction, signal_strength, relationship_strength, computed_at), persisting it to PostgreSQL and reading it back SHALL produce an equivalent record with all fields preserved.

**Validates: Requirements 4.4, 7.2**

## Error Handling

### Pattern Mining Failures

- Database errors during historical pattern queries are logged and the pattern is treated as "no data"; the aggregation engine continues with company-specific and macro signals only.
- Malformed or missing `trend_windows` data for a historical period results in that period being excluded from pattern computation (reduced sample_count) rather than failing the entire query.

### Signal Propagation Failures

- If competitor relationship lookup fails, propagation is skipped for that ticker and logged. Aggregation continues with self-company patterns only.
- If pattern mining fails for a specific competitor, that competitor is skipped. Other competitors are still processed.
- Sustained propagation errors exceeding a configurable threshold (default 5 consecutive failures) trigger an operator alert via the existing alerting framework.

### Auto-Inference Failures

- If the `document_company_mentions` table is empty or the query fails, auto-inference returns an empty candidate list with a warning. No relationships are created or modified.
- If sector/industry data is missing for the target company, inference is skipped with a 400 response.

### Competitor Registry Failures

- Attempting to create a relationship between a company and itself (`company_a_id == company_b_id`) returns a 400 error.
- Attempting to create a duplicate active relationship returns a 409 conflict.
- Foreign key violations (non-existent company IDs) return a 404 error.

### Runtime Toggle Safety

- Toggle state is read from PostgreSQL at the start of each aggregation cycle (same pattern as the macro toggle, no caching).
- Toggle changes are audit-logged with operator identity, previous state, and new state.
- Disabling the competitive layer does not delete any data: pattern mining remains queryable via the API; only aggregation integration is skipped.

### Graceful Degradation

- The competitive layer is designed to be fully optional. Any failure in pattern mining, signal propagation, or competitive signal computation results in the aggregation engine falling back to company-specific + macro signals with no degradation of existing behavior.

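The fallback behavior described above might look like the following sketch. The function `collect_signals`, its parameters, and the logger name are hypothetical; the sketch only shows the control flow of "try the competitive layer, fall back to the base signals on any failure".

```python
import logging
from typing import Callable

logger = logging.getLogger("competitive_layer_sketch")


def collect_signals(
    base_signals: list,
    load_pattern_signals: Callable[[], list],
    competitive_enabled: bool = True,
) -> list:
    """Merge pattern signals into base signals, degrading to base-only on failure."""
    if not competitive_enabled:
        # Layer disabled: identical to company + macro aggregation.
        return list(base_signals)
    try:
        extra = load_pattern_signals()
    except Exception:
        # Any competitive-layer failure is logged and treated as "no data".
        logger.exception("competitive layer failed; using base signals only")
        extra = []
    return list(base_signals) + list(extra)


def failing_loader() -> list:
    raise RuntimeError("simulated pattern query failure")


degraded = collect_signals(["company", "macro"], failing_loader)
merged = collect_signals(["company", "macro"], lambda: ["pattern"])
```

Returning a fresh list in every branch keeps the caller's base signal list unmodified, which makes the disabled and failed paths easy to compare against the two-layer output.
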
## Testing Strategy

### Property-Based Testing

This feature is well-suited for property-based testing. The core logic (pattern confidence computation, signal strength weighting, threshold gating, catalyst tier classification, and overlap/monotonicity properties) consists of pure functions with clear input/output behavior and a large input space.

**Library**: [Hypothesis](https://hypothesis.readthedocs.io/) for Python property-based testing.

**Configuration**: Minimum 100 iterations per property test.

**Tag format**: `Feature: competitive-historical-patterns, Property {number}: {property_text}`

Each correctness property maps to one property-based test. Generators will produce:

- Random `CompetitorRelationship` objects with valid relationship types, strength in [0, 1], and source values
- Random `HistoricalPattern` objects with valid sample counts, percentage distributions summing to ~1.0, and confidence scores
- Random `CompetitiveSignalRecord` objects with valid direction, strength, and confidence values
- Random sets of `WeightedSignal` objects with mixed sentiment values for contradiction testing
- Random catalyst types drawn from both major decision and routine signal categories

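As an illustration of the shape such a test takes, the sketch below checks Property 11 (signal strength monotonicity in relationship strength) over 100 random cases. To stay dependency-free it uses the standard library's `random` module as a stand-in for Hypothesis; the planned tests would express the same check with Hypothesis strategies. The function names are hypothetical.

```python
import random


def signal_strength(
    avg_strength: float,
    relationship_strength: float,
    pattern_confidence: float,
    source_impact_score: float,
) -> float:
    # Same product as the Signal Weighting formula.
    return avg_strength * relationship_strength * pattern_confidence * source_impact_score


def check_strength_monotonic(iterations: int = 100, seed: int = 0) -> bool:
    """Feature: competitive-historical-patterns, Property 11 (stdlib stand-in)."""
    rng = random.Random(seed)
    for _ in range(iterations):
        avg, rel, conf, impact = (rng.random() for _ in range(4))
        # Raise relationship strength within [rel, 1], holding the rest constant.
        bumped_rel = rel + rng.random() * (1.0 - rel)
        before = signal_strength(avg, rel, conf, impact)
        after = signal_strength(avg, bumped_rel, conf, impact)
        if after < before:
            return False
    return True


holds = check_strength_monotonic()
```
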
### Unit Testing

Unit tests complement property tests for specific examples and edge cases:

- API endpoint response codes and error handling (CRUD operations, validation errors, 404s, 409s)
- Dashboard component rendering with mock data (competitors panel, patterns panel, signals panel, decision timeline)
- Toggle state transitions and audit logging
- Auto-inference with empty data, single company, no co-mentions
- Pattern mining with zero results, exactly 3 results (boundary), mixed valid/invalid records

### Integration Testing

Integration tests verify end-to-end flows:

- Full aggregation cycle with competitive layer enabled: document intelligence → pattern mining → signal propagation → trend summary
- Lake publisher producing Parquet datasets for competitor relationships and competitive signals
- Toggle disable/re-enable cycle preserving data integrity
- API endpoints returning correct data from PostgreSQL
- Dashboard pages rendering with live API data