# Implementation Plan: Global News Interpolation Layer
## Overview
This plan implements a macro-level global news interpolation layer that ingests global/geopolitical news events, classifies them via Ollama, maps them to companies via exposure profiles, and feeds macro impact scores into the existing aggregation engine. The implementation extends existing services (extractor, aggregation, symbol registry, recommendation, API, lake publisher, dashboard) rather than creating new deployments. Tasks are ordered so each step builds on the previous, with property-based tests validating core scoring logic early.
## Tasks
- [x] 1. Database migration and shared schemas
- [x] 1.1 Create PostgreSQL migration `infra/migrations/016_global_news_interpolation.sql`
- Add `global_events` table with event_types, severity, affected_regions, affected_sectors, affected_commodities, summary, key_facts, estimated_duration, confidence, source_document_id FK, model metadata, created_at
- Add `macro_impact_records` table with event_id FK, company_id FK, ticker, macro_impact_score, impact_direction, contributing_factors, confidence, computed_at
- Add `exposure_profiles` table with company_id FK, geographic_revenue_mix, supply_chain_regions, key_input_commodities, regulatory_jurisdictions, market_position_tier, export_dependency_pct, source, confidence, version, active, created_at, updated_at
- Add `trend_projections` table with trend_window_id FK, projected_direction, projected_strength, projected_confidence, projection_horizon, driving_factors, macro_contribution_pct, diverges_from_current, computed_at
- Add indexes on `macro_impact_records(event_id)`, `macro_impact_records(company_id, computed_at)`, `macro_impact_records(ticker, computed_at)`, `exposure_profiles(company_id, active)`, `global_events(created_at)`, `trend_projections(trend_window_id)`
- _Requirements: 7.1, 7.2, 3.1, 12.5_
- [x] 1.2 Add new Pydantic schemas and enums to `services/shared/schemas.py`
- Add `ImpactType`, `SeverityLevel`, `MarketPositionTier`, `EstimatedDuration` enums
- Add `MACRO_EVENT = "macro_event"` to `DocumentType` enum
- Add `GlobalEventSchema`, `MacroImpactRecordSchema`, `ExposureProfileSchema`, `TrendProjectionSchema` Pydantic models
- _Requirements: 2.2, 4.5, 3.1, 12.1_
- [x] 1.3 Add macro-related Redis queue name to `services/shared/redis_keys.py`
- Add `QUEUE_MACRO_CLASSIFICATION = "macro_classification"` for event classification jobs
- _Requirements: 1.1_
- [x] 1.4 Add macro configuration fields to `services/shared/config.py`
- Add `macro_signal_weight`, `macro_enabled`, `macro_confidence_threshold`, `macro_short_term_staleness_hours`, `projection_confidence_threshold` fields to a new `MacroConfig` dataclass
- Add `macro: MacroConfig` to `AppConfig` with env var loading in `load_config()`
- _Requirements: 5.6, 10.1, 10.2, 12.9_
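
A minimal sketch of what the `MacroConfig` dataclass from task 1.4 might look like. The field names come from the plan. The environment variable names, the `load_macro_config` helper, and the defaults for `macro_signal_weight` and `macro_enabled` are assumptions for illustration; the remaining defaults (0.4, 48 hours, 0.3) are the values quoted in tasks 11.5 and 9.1.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class MacroConfig:
    macro_signal_weight: float = 0.5              # assumed default, not given in the plan
    macro_enabled: bool = True                    # assumed default
    macro_confidence_threshold: float = 0.4       # default quoted in task 11.5
    macro_short_term_staleness_hours: int = 48    # default quoted in task 11.5
    projection_confidence_threshold: float = 0.3  # default quoted in task 9.1


def _as_bool(raw: str) -> bool:
    """Interpret common truthy spellings; anything else is False."""
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def load_macro_config(env: Optional[Mapping[str, str]] = None) -> MacroConfig:
    """Build a MacroConfig from env vars (hypothetical names), falling back to defaults."""
    env = os.environ if env is None else env
    d = MacroConfig()
    return MacroConfig(
        macro_signal_weight=float(env.get("MACRO_SIGNAL_WEIGHT", d.macro_signal_weight)),
        macro_enabled=_as_bool(env.get("MACRO_ENABLED", str(d.macro_enabled))),
        macro_confidence_threshold=float(
            env.get("MACRO_CONFIDENCE_THRESHOLD", d.macro_confidence_threshold)
        ),
        macro_short_term_staleness_hours=int(
            env.get("MACRO_SHORT_TERM_STALENESS_HOURS", d.macro_short_term_staleness_hours)
        ),
        projection_confidence_threshold=float(
            env.get("PROJECTION_CONFIDENCE_THRESHOLD", d.projection_confidence_threshold)
        ),
    )
```

Passing a plain mapping instead of `os.environ` keeps the loader easy to exercise in tests.
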
- [x] 2. Checkpoint — Ensure migration and schemas are consistent
- Ensure all tests pass; ask the user if questions arise.
- [x] 3. Event classifier module
- [x] 3.1 Implement `services/extractor/event_classifier.py`
- Implement `GlobalEvent` dataclass matching the design specification
- Implement `get_event_json_schema()` returning the Ollama structured output schema for event classification
- Implement `build_event_classification_prompt(text: str) -> str` with anti-hallucination instructions for macro event extraction
- Implement `classify_global_event(normalized_text, document_id, ollama_client) -> GlobalEvent` using the existing `OllamaClient` with retry logic
- Persist classification prompt, schema, model metadata, and raw output to MinIO under `stonks-llm-prompts/` and `stonks-llm-results/`
- Persist the `GlobalEvent` record to the `global_events` PostgreSQL table
- _Requirements: 2.1, 2.2, 2.3, 2.4, 2.5_
- [x] 3.2 Write property test for GlobalEvent schema completeness
- **Property 2: Macro pipeline output schema completeness**
- **Validates: Requirements 2.2, 4.5**
- [x] 3.3 Write property test for multiple impact types preserved
- **Property 3: Multiple impact types preserved**
- **Validates: Requirements 2.4**
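
As an illustration of task 3.1, `get_event_json_schema()` could return a JSON Schema along the following lines. The property names mirror the `global_events` columns listed in task 1.1. The plan does not list the enum values for severity or duration, so they are left as plain strings here, and the `[0, 1]` range on `confidence` is an assumption.

```python
def get_event_json_schema() -> dict:
    """Sketch of a structured-output schema for macro event classification."""

    def string_list() -> dict:
        # A fresh dict per field so later edits to one field do not leak to others.
        return {"type": "array", "items": {"type": "string"}}

    properties = {
        # A list, so that an event with several impact types keeps all of them.
        "event_types": string_list(),
        "severity": {"type": "string"},
        "affected_regions": string_list(),
        "affected_sectors": string_list(),
        "affected_commodities": string_list(),
        "summary": {"type": "string"},
        "key_facts": string_list(),
        "estimated_duration": {"type": "string"},
        # Assumed range; the plan only says "confidence".
        "confidence": {"type": "number", "minimum": 0, "maximum": 1},
    }
    return {
        "type": "object",
        "properties": properties,
        "required": list(properties),
    }
```
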
- [x] 4. Exposure profile management
- [x] 4.1 Implement `services/symbol_registry/exposure.py`
- Implement `ExposureProfile` Pydantic model for API request/response
- Implement `GET /companies/{company_id}/exposure` endpoint returning the current active profile
- Implement `PUT /companies/{company_id}/exposure` endpoint that archives the previous version (sets `active=FALSE`) and inserts a new version with incremented version number
- Implement `GET /companies/{company_id}/exposure/history` endpoint returning all profile versions ordered by version descending
- Register routes on the Symbol Registry FastAPI app
- _Requirements: 3.1, 3.3, 3.4_
- [x] 4.2 Write property test for exposure profile version history
- **Property 6: Exposure profile version history**
- **Validates: Requirements 3.3**
- [x] 4.3 Write property test for default exposure profile derivation
- **Property 5: Default exposure profile derivation**
- **Validates: Requirements 3.2**
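
The archive-and-insert behavior of `PUT /companies/{company_id}/exposure` can be sketched with an in-memory list standing in for the `exposure_profiles` table. The function names and the dict-based rows are hypothetical; the real endpoint would perform the same two steps in one database transaction.

```python
from typing import Any


def put_exposure_profile(
    history: list[dict[str, Any]], fields: dict[str, Any]
) -> dict[str, Any]:
    """Archive the active version (if any) and append a new active version."""
    next_version = max((p["version"] for p in history), default=0) + 1
    for profile in history:
        profile["active"] = False  # archive every earlier version
    new_profile = {**fields, "version": next_version, "active": True}
    history.append(new_profile)
    return new_profile


def get_exposure_history(history: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return all versions, newest first, as the history endpoint describes."""
    return sorted(history, key=lambda p: p["version"], reverse=True)
```

After any sequence of updates, exactly one row is active and version numbers increase by one per update, which is the invariant Property 6 appears to target.
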
- [x] 5. Interpolation engine — core scoring logic
- [x] 5.1 Implement `services/aggregation/interpolation.py`
- Implement `MacroImpactRecord` dataclass matching the design specification
- Implement `compute_geographic_overlap(event_regions, revenue_mix) -> float` using revenue percentage weighting
- Implement `compute_supply_chain_overlap(event_regions, supply_regions) -> float` using set intersection ratio
- Implement `compute_commodity_overlap(event_commodities, company_commodities) -> float` using set intersection ratio
- Implement `apply_resilience_modifier(raw_score, tier, event_is_international) -> float` with tier multipliers: global_leader=0.7, multinational=0.85, regional=1.0, domestic=1.2
- Implement `compute_macro_impact(event: GlobalEvent, profile: ExposureProfile) -> MacroImpactRecord` using the scoring formula: `severity_weight * (0.35*geo + 0.25*supply + 0.25*commodity + 0.15*sector)` then resilience modifier
- Implement `build_default_profile(sector, industry, market_cap_bucket) -> ExposureProfile` for companies without manual profiles
- Handle zero-overlap case: return score 0.0 and skip further processing
- Handle mixed direction: when both positive and negative factors exist, set direction to 'mixed' and preserve both factor lists
- Persist `MacroImpactRecord` objects to the `macro_impact_records` PostgreSQL table
- _Requirements: 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 3.2_
- [x] 5.2 Write property test for macro impact score bounds and zero-overlap invariant
- **Property 7: Macro impact score bounds and zero-overlap invariant**
- **Validates: Requirements 4.1, 4.4**
- [x] 5.3 Write property test for scoring monotonicity
- **Property 8: Scoring monotonicity**
- **Validates: Requirements 4.2**
- [x] 5.4 Write property test for resilience modifier tier ordering
- **Property 9: Resilience modifier tier ordering**
- **Validates: Requirements 4.3**
- [x] 5.5 Write property test for mixed direction dual-effect events
- **Property 10: Mixed direction for dual-effect events**
- **Validates: Requirements 4.6**
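
The scoring formula and tier multipliers from task 5.1 can be sketched as below. The weights (0.35, 0.25, 0.25, 0.15) and tier multipliers are taken from the plan. Several details are assumptions: the `SEVERITY_WEIGHTS` values, revenue mix expressed as fractions, the company's own set as the denominator of the intersection ratio, the modifier applying only to international events, and clamping the result to `[0, 1]`.

```python
TIER_MULTIPLIERS = {  # from task 5.1
    "global_leader": 0.7,
    "multinational": 0.85,
    "regional": 1.0,
    "domestic": 1.2,
}
SEVERITY_WEIGHTS = {"low": 0.25, "medium": 0.5, "high": 0.75, "critical": 1.0}  # assumed


def compute_geographic_overlap(event_regions: set, revenue_mix: dict) -> float:
    """Sum the revenue fractions earned in regions the event affects."""
    return min(1.0, sum(s for region, s in revenue_mix.items() if region in event_regions))


def set_overlap(event_items, company_items) -> float:
    """Share of the company's items that the event touches (assumed denominator)."""
    company = set(company_items)
    if not company:
        return 0.0
    return len(company & set(event_items)) / len(company)


def apply_resilience_modifier(raw_score: float, tier: str, event_is_international: bool) -> float:
    """Scale by tier; assumed to apply only to international events."""
    if not event_is_international:
        return raw_score
    return raw_score * TIER_MULTIPLIERS[tier]


def macro_impact_score(severity, geo, supply, commodity, sector, tier,
                       event_is_international=True) -> float:
    blended = 0.35 * geo + 0.25 * supply + 0.25 * commodity + 0.15 * sector
    if blended == 0.0:
        return 0.0  # zero-overlap case: nothing further to compute
    raw = SEVERITY_WEIGHTS[severity] * blended
    return max(0.0, min(1.0, apply_resilience_modifier(raw, tier, event_is_international)))
```

For example, a high-severity event with 40% geographic overlap, 50% supply-chain overlap, no commodity overlap, and a sector match gives `0.75 * 0.415 = 0.31125` before the modifier and about `0.2646` for a multinational.
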
- [x] 6. Checkpoint — Ensure core scoring logic and property tests pass
- Ensure all tests pass; ask the user if questions arise.
- [x] 7. Aggregation engine integration
- [x] 7.1 Extend `services/aggregation/worker.py` to incorporate macro signals
- Add `macro_signal_weight` and `macro_enabled` fields to `AggregationConfig`
- In `aggregate_company_window`, check macro toggle state from `risk_configs` table
- Fetch `macro_impact_records` for the ticker within the aggregation window
- Convert each `MacroImpactRecord` to a `WeightedSignal` using: `document_id=event.source_document_id`, `sentiment_value` mapped from `impact_direction`, `impact_score=macro_impact_score * macro_signal_weight`, recency decay from event publication time, confidence gating from macro record confidence
- Merge macro signals with company-specific signals before computing trend direction, strength, confidence, and contradiction score
- Include contributing `GlobalEvent` source_document_ids in evidence references
- When macro layer is disabled or no macro data exists, produce identical output to company-only aggregation
- _Requirements: 5.1, 5.2, 5.3, 5.4, 5.5, 5.6_
- [x] 7.2 Write property test for macro signals influencing trend output
- **Property 11: Macro signals influence trend output**
- **Validates: Requirements 5.1**
- [x] 7.3 Write property test for macro-company contradiction detection
- **Property 12: Macro-company contradiction detection**
- **Validates: Requirements 5.3**
- [x] 7.4 Write property test for macro evidence traceability
- **Property 13: Macro evidence traceability**
- **Validates: Requirements 5.4**
- [x] 7.5 Write property test for no degradation without macro data and disabled-layer equivalence
- **Property 14: No degradation without macro data and disabled-layer equivalence**
- **Validates: Requirements 5.5, 11.2**
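
The conversion described in task 7.1 might look like the sketch below. `WeightedSignal` here is a simplified stand-in for the existing class, whose real fields are not shown in this plan, and the direction-to-sentiment mapping is an assumption. The recency weight is taken as a precomputed input so the sketch stays independent of the decay function in use.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed mapping from impact_direction to a signed sentiment value.
DIRECTION_TO_SENTIMENT = {"positive": 1.0, "negative": -1.0, "mixed": 0.0}


@dataclass(frozen=True)
class WeightedSignal:
    """Hypothetical stand-in for the aggregation engine's signal type."""
    document_id: str
    sentiment_value: float
    impact_score: float
    recency_weight: float
    confidence: float


def macro_record_to_signal(
    source_document_id: str,
    impact_direction: str,
    macro_impact_score: float,
    confidence: float,
    recency_weight: float,
    macro_signal_weight: float,
    confidence_threshold: float,
) -> Optional[WeightedSignal]:
    """Return a signal for merging, or None when confidence gating rejects it."""
    if confidence < confidence_threshold:
        return None
    return WeightedSignal(
        document_id=source_document_id,
        sentiment_value=DIRECTION_TO_SENTIMENT[impact_direction],
        impact_score=macro_impact_score * macro_signal_weight,
        recency_weight=recency_weight,
        confidence=confidence,
    )
```

Because gated records yield `None`, a disabled layer or an empty record set leaves the company-specific signal list untouched, in line with the equivalence requirement in task 7.1.
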
- [x] 8. Sector and market rollup enhancement
- [x] 8.1 Extend sector and market rollup logic in `services/aggregation/worker.py`
- When computing sector-level rollups, incorporate macro impact signals affecting the sector weighted by constituent company exposure
- When computing market-level rollups, aggregate macro signals across all sectors reflecting breadth and severity
- When a GlobalEvent disproportionately affects one sector (>60% of total macro impact), surface that sector in `material_risks` or `dominant_catalysts` of the market-level rollup
- _Requirements: 6.1, 6.2, 6.3_
- [x] 8.2 Write property test for sector and market rollup macro incorporation
- **Property 15: Sector and market rollup macro incorporation**
- **Validates: Requirements 6.1, 6.2, 6.3**
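
The 60% dominance rule from task 8.1 can be illustrated with a small helper. The function name and the use of absolute impact magnitudes per sector are assumptions; the plan only states the threshold.

```python
from typing import Optional


def dominant_sector(
    impact_by_sector: dict[str, float], threshold: float = 0.6
) -> Optional[str]:
    """Return the sector carrying more than `threshold` of total macro impact, if any."""
    total = sum(abs(v) for v in impact_by_sector.values())
    if total == 0:
        return None
    sector, impact = max(impact_by_sector.items(), key=lambda kv: abs(kv[1]))
    return sector if abs(impact) / total > threshold else None
```

A non-`None` result is the sector that the market-level rollup would surface in `material_risks` or `dominant_catalysts`.
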
- [x] 9. Trend projection module
- [x] 9.1 Implement `services/aggregation/projection.py`
- Implement `TrendProjection` dataclass matching the design specification
- Implement projection logic: compute trend momentum (rate of change in strength across recent windows), project macro signal decay based on `estimated_duration` and severity, factor in upcoming catalysts from document intelligence, combine into projected direction/strength/confidence
- Flag divergence when projected direction differs from current trend direction, include divergence reason in `driving_factors`
- When macro layer is disabled, compute projections using only company-specific momentum with reduced confidence
- Mark projections with `projected_confidence` below threshold (default 0.3) as `low_confidence`
- Persist `TrendProjection` to the `trend_projections` PostgreSQL table alongside the trend_window record
- Call projection computation from `aggregate_company_window` after trend summary is assembled
- _Requirements: 12.1, 12.2, 12.3, 12.4, 12.5, 12.9_
- [x] 9.2 Write property test for trend projection always produced
- **Property 20: Trend projection always produced**
- **Validates: Requirements 12.1**
- [x] 9.3 Write property test for projection divergence flagging
- **Property 21: Projection divergence flagging**
- **Validates: Requirements 12.3**
- [x] 9.4 Write property test that macro-disabled projections have reduced confidence
- **Property 22: Macro-disabled projections have reduced confidence**
- **Validates: Requirements 12.4**
- [x] 9.5 Write property test for low-confidence projection exclusion
- **Property 23: Low-confidence projection exclusion**
- **Validates: Requirements 12.9**
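
A simplified sketch of the projection logic in task 9.1, covering momentum, divergence flagging, the macro-disabled path, and the low-confidence mark. It omits upcoming catalysts, and it makes several assumptions: trend strength is passed as a signed value in `[-1, 1]` whose sign encodes direction, the remaining macro contribution arrives as a precomputed `macro_term`, a dead band of 0.05 separates neutral from directional, and disabling the macro layer scales confidence by 0.8. `ProjectionSketch` is a hypothetical, reduced version of `TrendProjection`.

```python
from dataclasses import dataclass


def _clamp(x: float, lo: float = -1.0, hi: float = 1.0) -> float:
    return max(lo, min(hi, x))


def _direction(signed_strength: float, dead_band: float = 0.05) -> str:
    if signed_strength > dead_band:
        return "positive"
    if signed_strength < -dead_band:
        return "negative"
    return "neutral"


@dataclass(frozen=True)
class ProjectionSketch:
    projected_direction: str
    projected_strength: float
    projected_confidence: float
    diverges_from_current: bool
    low_confidence: bool


def project_trend(signed_strengths, macro_term, confidence, macro_enabled=True,
                  confidence_threshold=0.3, disabled_confidence_factor=0.8):
    """Project one step ahead from recent windows (oldest first)."""
    current = signed_strengths[-1]
    steps = len(signed_strengths) - 1
    # Momentum: average change in signed strength per window.
    momentum = (current - signed_strengths[0]) / steps if steps else 0.0
    if not macro_enabled:
        macro_term = 0.0                          # company-specific momentum only
        confidence *= disabled_confidence_factor  # reduced confidence
    projected = _clamp(current + momentum + macro_term)
    direction = _direction(projected)
    return ProjectionSketch(
        projected_direction=direction,
        projected_strength=abs(projected),
        projected_confidence=confidence,
        diverges_from_current=direction != _direction(current),
        low_confidence=confidence < confidence_threshold,
    )
```

A weakening positive trend combined with a negative macro term flips the projected direction, which is the divergence case the plan asks to flag.
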
- [x] 10. Checkpoint — Ensure aggregation integration and projections work correctly
- Ensure all tests pass; ask the user if questions arise.
- [x] 11. Macro signal suppression and safety
- [x] 11.1 Implement exposure profile auto-inference in `services/extractor/exposure_inference.py`
- Implement `infer_exposure_profile(document_intelligences, sector, industry, market_cap_bucket) -> ExposureProfile`
- Scan recent filing extractions for geographic revenue breakdowns, supplier mentions, and commodity references
- Produce profile with `source='inferred'` and a confidence score reflecting data quality
- Fall back to sector-based default profile when insufficient filing data
- _Requirements: 9.1, 9.2, 9.3_
- [x] 11.2 Write property test for inferred exposure profile correctness
- **Property 16: Inferred exposure profile correctness**
- **Validates: Requirements 9.1, 9.2**
- [x] 11.3 Extend `services/recommendation/suppression.py` with macro-only suppression
- Add `MACRO_ONLY_SIGNAL = "macro_only_signal"` to `SuppressionReason` enum
- Implement `evaluate_macro_only_suppression(summary, macro_signal_count, company_signal_count) -> bool`
- When macro signals are the sole basis for a trend direction change, force recommendation to `mode='informational'` and append macro-only caveat to thesis
- _Requirements: 10.3_
- [x] 11.4 Write property test for macro-only recommendation suppression
- **Property 19: Macro-only recommendation suppression**
- **Validates: Requirements 10.3**
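
One way to read the macro-only rule in task 11.3 is sketched below. It drops the `summary` parameter from the planned signature for brevity and treats "sole basis" as "at least one macro signal and no company-specific signals", which is an interpretation rather than something the plan states. The caveat wording and the dict-shaped recommendation are hypothetical.

```python
MACRO_ONLY_SIGNAL = "macro_only_signal"  # enum value named in task 11.3
MACRO_ONLY_CAVEAT = (  # hypothetical wording
    "Caveat: this view rests on macro signals only; no company-specific evidence."
)


def evaluate_macro_only_suppression(macro_signal_count: int, company_signal_count: int) -> bool:
    """True when macro signals are the only evidence behind the trend."""
    return macro_signal_count > 0 and company_signal_count == 0


def apply_macro_only_suppression(
    recommendation: dict, macro_signal_count: int, company_signal_count: int
) -> dict:
    """Return a copy forced to informational mode when the rule fires."""
    if not evaluate_macro_only_suppression(macro_signal_count, company_signal_count):
        return recommendation
    return {
        **recommendation,
        "mode": "informational",
        "thesis": f"{recommendation['thesis']} {MACRO_ONLY_CAVEAT}",
        "suppression_reason": MACRO_ONLY_SIGNAL,
    }
```
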
- [x] 11.5 Implement low-confidence event exclusion and accelerated decay in interpolation engine
- In `services/aggregation/interpolation.py`, skip events with confidence below configurable threshold (default 0.4) and log exclusion reason
- Apply accelerated decay factor for short_term events older than 48 hours (effective weight strictly less than standard recency decay)
- _Requirements: 10.1, 10.2_
- [x] 11.6 Write property test for low-confidence event exclusion
- **Property 17: Low-confidence event exclusion**
- **Validates: Requirements 10.1**
- [x] 11.7 Write property test for accelerated decay for stale short-term events
- **Property 18: Accelerated decay for stale short-term events**
- **Validates: Requirements 10.2**
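
The exclusion and accelerated-decay rules from task 11.5 can be sketched as a single weighting function. The thresholds (0.4 and 48 hours) come from the plan. The exponential half-life form of the standard recency decay, the 72-hour half-life, and the 0.5 acceleration factor are assumptions; any factor strictly below 1 satisfies the "strictly less than standard recency decay" requirement. Only the `short_term` duration value is named in the plan.

```python
from typing import Optional


def effective_macro_weight(
    confidence: float,
    estimated_duration: str,
    age_hours: float,
    confidence_threshold: float = 0.4,   # default from task 11.5
    staleness_hours: float = 48.0,       # default from task 11.5
    half_life_hours: float = 72.0,       # assumed
    accelerated_factor: float = 0.5,     # assumed, must be < 1
) -> Optional[float]:
    """Return the decay weight for an event, or None if it is excluded."""
    if confidence < confidence_threshold:
        return None  # low-confidence events are skipped entirely
    standard = 0.5 ** (age_hours / half_life_hours)
    if estimated_duration == "short_term" and age_hours > staleness_hours:
        return standard * accelerated_factor
    return standard
```

The caller would log the exclusion reason whenever `None` comes back, as the task requires.
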
- [x] 12. Macro signal layer toggle and API endpoints
- [x] 12.1 Implement macro toggle and status endpoints in `services/api/app.py`
- Add `GET /api/admin/macro/status` returning current enabled/disabled state from `risk_configs` table
- Add `PUT /api/admin/macro/toggle` to switch macro layer on/off, persisting to `risk_configs` and recording an audit event with previous state, new state, and operator
- Toggle state is read from PostgreSQL at the start of each aggregation cycle (no caching)
- _Requirements: 11.1, 11.5, 11.7_
- [x] 12.2 Implement macro event and impact query endpoints in `services/api/app.py`
- Add `GET /api/macro/events` — list recent global events with filtering by severity, region, sector, date range
- Add `GET /api/macro/events/{event_id}` — event detail with list of affected companies and their macro impact scores
- Add `GET /api/macro/impacts/{ticker}` — macro impacts for a specific company
- Add `GET /api/trends/{trend_id}/projection` — trend projection for a specific trend window
- Include projection data in existing `GET /api/trends` list response
- _Requirements: 8.1, 8.2, 12.10_
- [x] 12.3 Ensure macro ingestion continues when layer is disabled
- When macro layer is disabled, ingestion and classification continue (historical data preserved), but interpolation and aggregation integration are skipped
- When re-enabled, resume computing macro impact scores using most recent classifications including events ingested while disabled
- _Requirements: 11.2, 11.3, 11.4_
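
The state change behind `PUT /api/admin/macro/toggle` can be sketched without the web layer. The dict and list below stand in for the `risk_configs` row and the audit store; the key names, the event shape, and the assumption that the layer starts enabled are all illustrative.

```python
from datetime import datetime, timezone


def toggle_macro_layer(state: dict, audit_log: list, enabled: bool, operator: str) -> dict:
    """Persist the new toggle state and record who changed it, and from what."""
    event = {
        "event": "macro_toggle",
        "previous_state": state.get("macro_enabled", True),  # assumed initial state
        "new_state": enabled,
        "operator": operator,
        "at": datetime.now(timezone.utc).isoformat(),
    }
    state["macro_enabled"] = enabled
    audit_log.append(event)
    return event
```

Since the aggregation cycle reads the state fresh from PostgreSQL each time, a change made here would take effect on the next cycle without any cache invalidation.
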
- [x] 13. Checkpoint — Ensure API endpoints and toggle logic work correctly
- Ensure all tests pass; ask the user if questions arise.
- [x] 14. Lake publisher extensions
- [x] 14.1 Add macro fact publishers to the lake publisher service
- Implement `publish_global_event_fact` writing partitioned Parquet datasets to `stonks-lakehouse/warehouse/global_events/dt={date}/`
- Implement `publish_macro_impact_fact` writing partitioned Parquet datasets to `stonks-lakehouse/warehouse/macro_impacts/dt={date}/ticker={ticker}/`
- Implement `publish_trend_projection_fact` writing partitioned Parquet datasets to `stonks-lakehouse/warehouse/trend_projections/dt={date}/ticker={ticker}/`
- Register new fact types in the lake publisher's job processing loop
- _Requirements: 7.3, 12.6_
- [x] 14.2 Write property test for macro data persistence round-trip
- **Property 4: Macro data persistence round-trip**
- **Validates: Requirements 3.1, 7.1, 7.2, 12.5**
- [x] 14.3 Write property test for content hash stability and uniqueness
- **Property 1: Content hash stability and uniqueness**
- **Validates: Requirements 1.2**
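
The partition layouts in task 14.1 follow one pattern, which a small helper can capture. The helper name is hypothetical, and formatting `{date}` as an ISO `YYYY-MM-DD` string is an assumption.

```python
from datetime import date
from typing import Optional

WAREHOUSE_ROOT = "stonks-lakehouse/warehouse"  # prefix from task 14.1


def partition_prefix(dataset: str, dt: date, ticker: Optional[str] = None) -> str:
    """Build the Hive-style partition prefix for a fact dataset."""
    prefix = f"{WAREHOUSE_ROOT}/{dataset}/dt={dt.isoformat()}/"
    if ticker is not None:
        prefix += f"ticker={ticker}/"  # only the per-company datasets add this level
    return prefix
```

Global events are partitioned by date only, while macro impacts and trend projections add a `ticker=` level.
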
- [x] 15. Macro ingestion pipeline wiring
- [x] 15.1 Wire macro source ingestion into the scheduler and ingestion worker
- Configure scheduler to trigger macro news source fetches on polling interval
- Ingestion worker stores raw payloads in MinIO under `stonks-raw-news/macro/` prefix
- Metadata records use `document_type='macro_event'` in PostgreSQL
- Content hash deduplication consistent with existing behavior
- Source failure handling with retry policy consistent with existing sources
- _Requirements: 1.1, 1.2, 1.3, 1.4_
- [x] 15.2 Wire event classification into the extractor worker
- After parsing, route `macro_event` documents to `event_classifier.classify_global_event()` instead of standard document extraction
- After classification, trigger interpolation for all tracked companies via aggregation queue
- _Requirements: 2.1, 2.2, 2.3_
- [x] 15.3 Wire interpolation into the aggregation pipeline
- After event classification, load exposure profiles for all tracked companies (manual, inferred, or default)
- Compute `MacroImpactRecord` for each company with non-zero overlap
- Persist records and trigger aggregation for affected tickers
- Handle sustained macro ingestion failures: alert operators and continue with company-only signals
- _Requirements: 4.1, 4.5, 10.4_
- [x] 16. Checkpoint — Ensure full backend pipeline works end-to-end
- Ensure all tests pass; ask the user if questions arise.
- [x] 17. Dashboard — Global Events page and macro exposure panel
- [x] 17.1 Create Global Events list page at `frontend/src/pages/GlobalEvents.tsx`
- Filterable list of recent global events with columns: summary, impact types, severity badge, affected regions, affected sectors, event date
- Add API hooks for `GET /api/macro/events` in `frontend/src/api/hooks.ts`
- Add route `/macro/events` in `frontend/src/routes.tsx`
- Add navigation entry in sidebar in `frontend/src/components/AppLayout.tsx`
- _Requirements: 8.1_
- [x] 17.2 Create Global Event detail page at `frontend/src/pages/GlobalEventDetail.tsx`
- Display full classification detail: all affected companies with Macro_Impact_Scores, impact directions, contributing factors
- Add API hook for `GET /api/macro/events/{event_id}`
- Add route `/macro/events/:id` in `frontend/src/routes.tsx`
- _Requirements: 8.2_
- [x] 17.3 Add macro exposure panel to Company Detail page
- On `frontend/src/pages/CompanyDetail.tsx`, add a new tab/panel showing the company's Exposure_Profile and active GlobalEvents affecting the company with their Macro_Impact_Scores
- Add API hook for `GET /api/macro/impacts/{ticker}`
- _Requirements: 8.3_
- [x] 17.4 Add macro evidence indicators to Trend and Recommendation detail pages
- On `frontend/src/pages/TrendDetail.tsx`, visually distinguish macro-sourced evidence from company-specific evidence in the evidence chain
- On `frontend/src/pages/RecommendationDetail.tsx`, display macro signals that contributed with links back to originating GlobalEvents
- _Requirements: 8.4, 8.5_
- [x] 17.5 Add trend projection display to Trend detail page
- On `frontend/src/pages/TrendDetail.tsx`, display projected direction/strength alongside current trend with visual indicator and expandable driving factors panel
- Add API hook for `GET /api/trends/{trend_id}/projection`
- _Requirements: 12.7_
- [x] 17.6 Add macro toggle to Trading Controls page
- On `frontend/src/pages/Trading.tsx`, add macro signal layer enable/disable switch with confirmation dialog
- Add API hooks for `GET /api/admin/macro/status` and `PUT /api/admin/macro/toggle`
- _Requirements: 11.5, 11.6_
- [x] 18. Checkpoint — Ensure frontend pages render and integrate with API
- Ensure all tests pass; ask the user if questions arise.
- [x] 19. Integration wiring and final validation
- [x] 19.1 Add recommendation engine integration for trend projections
- Incorporate trend projection into recommendation thesis and time_horizon fields, citing projected direction and key driving factors
- Exclude low-confidence projections from influencing recommendation eligibility
- _Requirements: 12.8, 12.9_
- [x] 19.2 Write integration tests for macro pipeline end-to-end
- Test macro article ingestion → parsing → classification → interpolation → aggregation flow
- Test lake publisher writes correct Parquet partitions for global events and macro impacts
- Test macro toggle state change propagates to next aggregation cycle
- _Requirements: 1.1, 2.1, 4.1, 5.1, 7.3, 11.1_
- [x] 19.3 Write unit tests for API endpoints and dashboard components
- Test macro event list/detail endpoints return correct data
- Test macro toggle endpoint persists state and records audit event
- Test trend projection endpoint returns projection data
- Add MSW handlers for macro endpoints in `frontend/src/test/mocks/handlers.ts`
- Test GlobalEvents page and macro exposure panel render correctly
- _Requirements: 8.1, 8.2, 11.5, 12.10_
- [x] 20. Final checkpoint — Ensure all tests pass
- Ensure all tests pass; ask the user if questions arise.
## Notes
- Tasks marked with `*` are optional and can be skipped for a faster MVP
- Each task references specific requirements for traceability
- Checkpoints ensure incremental validation after each major phase
- Property tests validate the 23 correctness properties from the design using Hypothesis
- The design uses Python throughout — no language selection needed
- No new Kubernetes deployments required; all modules extend existing services
- Next migration number is 016