stonks-oracle/.kiro/specs/intelligence-pipeline-deep-dive/design.md
Celes Renata 88ad1e8d99 feat: comprehensive docs, unit tests, docker-compose app services
- Add scheduler and ingestion unit tests (test_scheduler_unit.py, test_ingestion_unit.py)
- Add all 13 app services + dashboard to docker-compose.yml
- Add full documentation suite: API reference, Helm reference, Docker deployment guide,
  3 architecture diagrams (K8s, Docker Compose, data pipeline), AI agent guide,
  backup/restore guide, observability/metrics reference, per-service docs
- Add intelligence pipeline deep-dive docs with Mermaid diagrams
- Update README with documentation index and links
- Add specs for comprehensive-quality-docs, intelligence-pipeline-deep-dive,
  sanitized-pipeline-docs
2026-04-22 02:56:41 +00:00

Design Document: Intelligence Pipeline Deep Dive

Overview

This design specifies the structure, content, and creation process for a 6-page narrative deep-dive document covering the full intelligence-to-decision pipeline in Stonks Oracle. The deliverable consists of Markdown narrative pages, an index file, and standalone Mermaid diagram files — all stored under docs/intelligence-pipeline-deep-dive/.

The document targets technical readers who want to understand how raw data enters the system, gets processed by AI agents, produces structured signals, accumulates into trend summaries, and ultimately drives autonomous trading decisions. Unlike the existing reference docs (docs/services.md, docs/architecture-data-pipeline.md), this deliverable is narrative and explanatory — it tells the story of data flowing through the platform end-to-end.

Key design decision: This is a documentation-only deliverable. No application code, database schemas, or infrastructure changes are involved. The output is purely Markdown files and Mermaid diagram files.

Existing Documentation Landscape

The codebase already has several reference documents that this deep-dive complements:

| Document | Purpose | Style |
|---|---|---|
| docs/architecture-data-pipeline.md | Queue topology, data store summary, Mermaid flow diagrams | Reference diagrams + tables |
| docs/llm-to-trade-pipeline.md | End-to-end data flow from model output to trade | Narrative + tables + code blocks |
| docs/services.md | Per-service configuration, tables, queues, behaviors | Reference manual |
| docs/ai-agents.md | AI agent configuration, variants, A/B testing, API | Guide + reference |

The deep-dive document will reference these existing docs for readers who want deeper detail, while providing a cohesive narrative that connects all pipeline stages into a single story.

Architecture

File Organization

docs/intelligence-pipeline-deep-dive/
├── index.md
├── 01-data-ingestion-and-preparation.md
├── 02-ai-agent-processing-and-extraction.md
├── 03-signal-scoring-and-weighted-signals.md
├── 04-trend-aggregation-and-accumulating-signals.md
├── 05-recommendation-generation.md
├── 06-trading-decisions-and-execution.md
└── diagrams/
    ├── ingestion-to-extraction-flow.md
    ├── three-layer-signal-merging.md
    ├── recommendation-generation-flow.md
    ├── trading-engine-decision-loop.md
    ├── weighted-signal-computation.md
    └── trend-accumulation-escalation.md
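
The layout above can be turned into a simple existence check. The following is a minimal sketch, not part of the deliverable: the file names are copied from the tree, while the helper names `expected_files` and `missing_files` are hypothetical.

```python
from pathlib import Path

# File names copied from the layout above.
PAGES = [
    "01-data-ingestion-and-preparation.md",
    "02-ai-agent-processing-and-extraction.md",
    "03-signal-scoring-and-weighted-signals.md",
    "04-trend-aggregation-and-accumulating-signals.md",
    "05-recommendation-generation.md",
    "06-trading-decisions-and-execution.md",
]
DIAGRAMS = [
    "ingestion-to-extraction-flow.md",
    "three-layer-signal-merging.md",
    "recommendation-generation-flow.md",
    "trading-engine-decision-loop.md",
    "weighted-signal-computation.md",
    "trend-accumulation-escalation.md",
]


def expected_files() -> list[str]:
    """Return every deliverable path, relative to the deep-dive directory."""
    return ["index.md", *PAGES, *(f"diagrams/{name}" for name in DIAGRAMS)]


def missing_files(root: Path) -> list[str]:
    """Return the expected paths that are not regular files under root."""
    return [rel for rel in expected_files() if not (root / rel).is_file()]
```

Calling `missing_files(Path("docs/intelligence-pipeline-deep-dive"))` from the repository root would list whatever is still to be written; an empty list means the layout is complete.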

Content Flow

Each page covers one pipeline stage and ends with a transitional paragraph previewing the next page. Cross-references between pages use relative Markdown links. Diagrams are stored as standalone Mermaid files in the diagrams/ subdirectory and linked from the narrative pages (not embedded inline).

flowchart LR
    P1["Page 1\nData Ingestion"] --> P2["Page 2\nAI Extraction"]
    P2 --> P3["Page 3\nSignal Scoring"]
    P3 --> P4["Page 4\nTrend Aggregation"]
    P4 --> P5["Page 5\nRecommendations"]
    P5 --> P6["Page 6\nTrading Execution"]

Components and Interfaces

Index File (index.md)

The index provides:

  • A brief introduction to the deep-dive document series
  • A numbered table of contents linking to all 6 pages
  • A diagrams section linking to all Mermaid diagram files
  • References to existing documentation for additional context
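
As an illustration of the numbered table of contents, here is a hedged sketch that derives link entries from page file names. The helpers `title_from_filename` and `render_toc` are hypothetical, and the derived titles are rough (for example "Ai agent processing and extraction"), so a real index would more likely use hand-written titles.

```python
def title_from_filename(name: str) -> str:
    """Derive a rough title from a page file name such as '05-recommendation-generation.md'."""
    slug = name.removesuffix(".md").split("-", 1)[1]  # drop the numeric prefix
    return slug.replace("-", " ").capitalize()


def render_toc(pages: list[str]) -> str:
    """Render a numbered Markdown list with one relative link per page."""
    lines = [
        f"{number}. [{title_from_filename(page)}]({page})"
        for number, page in enumerate(pages, start=1)
    ]
    return "\n".join(lines)
```

The same function could render the diagrams section if it is given the file names under diagrams/ with that directory prefixed to each link target.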

Narrative Pages (01 through 06)

Each page follows a consistent structure:

  1. Title and introduction — what this stage does and why it matters
  2. Narrative body — explanatory prose describing the pipeline stage, referencing actual code modules (services/extractor/main.py), database tables (document_impact_records), Redis queues (stonks:queue:extraction), and Pydantic schemas (ExtractionResult)
  3. Diagram references — links to relevant Mermaid diagram files in diagrams/
  4. Transition — a closing paragraph that previews the next page
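
Parts of this page structure can be checked mechanically. The sketch below rests on three assumptions that this design does not state: the title is a level-1 Markdown heading, diagram references are links whose targets start with diagrams/, and the closing transition paragraph links to the next page. The function names are hypothetical.

```python
from typing import Optional


def last_paragraph(markdown: str) -> str:
    """Return the final non-empty paragraph (paragraphs are separated by blank lines)."""
    paragraphs = [p.strip() for p in markdown.split("\n\n") if p.strip()]
    return paragraphs[-1] if paragraphs else ""


def page_problems(markdown: str, next_page: Optional[str]) -> list[str]:
    """List structural problems for one narrative page; an empty list means none found."""
    problems = []
    if not markdown.lstrip().startswith("# "):
        problems.append("missing title heading")
    if "](diagrams/" not in markdown:
        problems.append("no diagram reference")
    # The last page has no successor, so pass next_page=None to skip this check.
    if next_page is not None and f"]({next_page})" not in last_paragraph(markdown):
        problems.append("closing paragraph does not link to the next page")
    return problems
```

A check like this only confirms that the required elements are present. Whether the narrative body actually explains the stage well still needs a human reader.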

Mermaid Diagram Files

Each diagram file follows these conventions:

  1. It opens with a brief title comment
  2. It contains a single Mermaid code block
  3. Service labels include both human-readable names and Python module paths
  4. Queue labels use full Redis key patterns
  5. Database references use exact PostgreSQL table names
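
To make the labeling rules concrete, the sketch below builds one diagram file as a string. The identifiers in the usage note are taken from this design, but several things are assumptions: the title is written as an HTML comment, the node shapes (rectangle, stadium, cylinder) are arbitrary choices, and the helper names are hypothetical.

```python
FENCE = "`" * 3  # built at runtime so this example can itself sit inside a code fence


def service_label(name: str, module_path: str) -> str:
    """Rectangle node text: human-readable name plus Python module path."""
    return f'["{name}\\n{module_path}"]'


def queue_label(redis_key: str) -> str:
    """Stadium node text carrying the full Redis key."""
    return f'(["{redis_key}"])'


def table_label(table_name: str) -> str:
    """Cylinder node text carrying the exact PostgreSQL table name."""
    return f'[("{table_name}")]'


def render_diagram_file(title: str, mermaid_lines: list[str]) -> str:
    """Return file content: a title comment followed by a single Mermaid block."""
    body = "\n".join(f"    {line}" for line in mermaid_lines)
    return f"<!-- {title} -->\n\n{FENCE}mermaid\nflowchart LR\n{body}\n{FENCE}\n"
```

For example, `render_diagram_file("Extraction", ["Q" + queue_label("stonks:queue:extraction"), "EXT" + service_label("Extractor", "services/extractor/main.py"), "T" + table_label("document_impact_records"), "Q --> EXT", "EXT --> T"])` produces a three-node flowchart. The edge directions in that call are illustrative and should be confirmed against the codebase.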

At least 6 diagrams are required, covering:

  • Ingestion-to-extraction flow: Scheduler → Ingestion → Parser → Extractor, with queues and storage
  • Three-layer signal merging: Company, Macro, and Competitive layers converging into aggregation
  • Recommendation generation flow: Suppression → Eligibility → Thesis → Risk classification
  • Trading engine decision loop: Pre-trade checks → Position sizing → Order submission
  • Weighted signal computation: Component breakdown of the composite weight formula
  • Trend accumulation and escalation: How consecutive signals strengthen trends and escalate actions

Page Content Mapping

| Page | Primary Code Modules | Key Database Tables | Key Queues |
|---|---|---|---|
| 01 - Ingestion | services/scheduler/app.py, services/ingestion/worker.py, services/parser/worker.py | documents, ingestion_runs, document_company_mentions | stonks:queue:ingestion, stonks:queue:parsing |
| 02 - AI Extraction | services/extractor/main.py, services/extractor/client.py, services/extractor/prompts.py, services/extractor/schemas.py, services/extractor/event_classifier.py, services/shared/agent_config.py | document_intelligence, document_impact_records, global_events, macro_impact_records, ai_agents, agent_variants | stonks:queue:extraction, stonks:queue:macro_classification, stonks:queue:aggregation |
| 03 - Signal Scoring | services/aggregation/scoring.py | document_impact_records, macro_impact_records, competitive_signal_records, risk_configs | |
| 04 - Trend Aggregation | services/aggregation/worker.py, services/aggregation/contradiction.py, services/aggregation/projection.py, services/aggregation/pattern_matcher.py, services/aggregation/signal_propagation.py | trend_windows, trend_history, trend_evidence, trend_projections | stonks:queue:aggregation, stonks:queue:recommendation |
| 05 - Recommendations | services/recommendation/main.py, services/recommendation/suppression.py, services/recommendation/eligibility.py, services/recommendation/thesis_llm.py | recommendations, recommendation_evidence, risk_evaluations | stonks:queue:recommendation |
| 06 - Trading | services/trading/engine.py, services/trading/position_sizer.py, services/trading/circuit_breaker.py, services/trading/reserve_pool.py, services/trading/risk_tier_controller.py, services/trading/stop_loss_manager.py | trading_decisions, orders, positions, portfolio_snapshots, reserve_pool_ledger, risk_tier_history, circuit_breaker_events | stonks:queue:broker_orders |

Data Models

This feature produces only documentation files. There are no new data models, database tables, or schema changes.

The narrative pages will reference existing data models from the codebase:

  • WeightedSignal (services/aggregation/scoring.py) — document reference + composite weight + sentiment + impact
  • SignalWeight (services/aggregation/scoring.py) — breakdown of recency, credibility, novelty, confidence gate, market context multiplier
  • ScoringConfig (services/aggregation/scoring.py) — tunable parameters for signal scoring
  • ExtractionResult / CompanyImpact (services/extractor/schemas.py) — structured JSON output from document extraction
  • GlobalEventSchema (services/extractor/event_classifier.py) — macro event classification output
  • TrendSummary (services/shared/schemas.py) — rolling trend for a ticker across a time window
  • Recommendation (services/shared/schemas.py) — actionable trade recommendation
  • TradingDecision (services/trading/engine.py) — audit record of every trading evaluation

Error Handling

Since this is a documentation-only deliverable, there is no runtime error handling to design. The primary quality concern is accuracy — ensuring that all code module paths, database table names, Redis queue keys, schema field names, and configuration values referenced in the narrative match the actual codebase.

Accuracy Verification Strategy

  1. Code module paths: Every module path referenced in the narrative (e.g., services/aggregation/scoring.py) must correspond to an existing file in the repository.
  2. Database table names: Table names must match those defined in infra/migrations/ SQL files.
  3. Redis queue keys: Queue names must match constants in services/shared/redis_keys.py.
  4. Schema class names: Pydantic model names must match their definitions in services/shared/schemas.py and service-specific schema files.
  5. Configuration values: Environment variable names and default values must match services/shared/config.py and service-specific configuration.
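
The first and third checks lend themselves to a small script. This is a minimal sketch under stated assumptions: module references always start with services/ and end in .py, queue keys always start with stonks:queue:, and the caller supplies the set of known queue keys (how they are read out of services/shared/redis_keys.py is not specified here). The function names are hypothetical.

```python
import re
from pathlib import Path

MODULE_RE = re.compile(r"\bservices/[\w/]+\.py\b")
QUEUE_RE = re.compile(r"\bstonks:queue:\w+")


def referenced_modules(text: str) -> set[str]:
    """Collect every services/... .py path mentioned in the narrative text."""
    return set(MODULE_RE.findall(text))


def referenced_queues(text: str) -> set[str]:
    """Collect every stonks:queue:... key mentioned in the narrative text."""
    return set(QUEUE_RE.findall(text))


def missing_modules(text: str, repo_root: Path) -> list[str]:
    """Return referenced module paths that are not files under repo_root."""
    return sorted(m for m in referenced_modules(text) if not (repo_root / m).is_file())


def unknown_queues(text: str, known_queues: set[str]) -> list[str]:
    """Return referenced queue keys that are absent from the known set."""
    return sorted(referenced_queues(text) - known_queues)
```

Table names and schema class names are harder to pick out with a regular expression because they look like ordinary words, so those checks probably still need a curated list or manual review.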

Cross-Reference Integrity

All inter-page links (e.g., [Page 3](03-signal-scoring-and-weighted-signals.md)) and diagram links (e.g., [diagram](diagrams/ingestion-to-extraction-flow.md)) must resolve to files that exist in the deliverable.
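
A link check along these lines could look as follows. It is a sketch that assumes links use the inline `[text](target)` form, ignores external URLs and same-page anchors, and takes the set of existing files (relative to the deliverable root, with forward slashes) from the caller. The function names are hypothetical.

```python
import posixpath
import re

LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def relative_links(markdown: str) -> list[str]:
    """Return link targets that point at local files, with any #fragment removed."""
    targets = []
    for target in LINK_RE.findall(markdown):
        if "://" in target or target.startswith(("#", "mailto:")):
            continue  # external URL, same-page anchor, or mail link
        targets.append(target.split("#", 1)[0])
    return targets


def broken_links(page: str, markdown: str, existing: set[str]) -> list[str]:
    """Return targets in page that do not resolve to a path in existing."""
    base = posixpath.dirname(page)
    broken = []
    for target in relative_links(markdown):
        resolved = posixpath.normpath(posixpath.join(base, target))
        if resolved not in existing:
            broken.append(target)
    return broken
```

Resolving each target against the directory of the page that contains it matters for the diagram files, where a link back to the index has to be written as ../index.md.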

Testing Strategy

Property-based testing does not apply to this feature. The deliverable is purely documentation — Markdown narrative pages and Mermaid diagram files. There are no functions, data transformations, or code logic to test.

Why PBT Does Not Apply

  • The output is static Markdown text, not executable code
  • There are no input/output functions to verify properties against
  • There is no data transformation logic that varies with input
  • The quality criteria (narrative coherence, codebase accuracy, cross-reference integrity) are best verified through manual review

Verification Approach

  1. File existence check: Verify all 6 page files, the index file, and all diagram files exist at the expected paths
  2. Link integrity: Verify all inter-page and diagram links resolve to existing files
  3. Mermaid syntax: Verify each diagram file contains a Mermaid code block that opens with a flowchart or graph declaration. This is a basic sanity check, not a full parse of the diagram
  4. Codebase reference spot-checks: Verify a sample of referenced module paths, table names, and queue keys against the actual codebase
  5. Narrative flow: Manual review to confirm each page ends with a transition to the next and the overall story is coherent
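
The Mermaid check in step 3 can be approximated without a Mermaid parser. The sketch below assumes each diagram is wrapped in a fenced block tagged mermaid and only inspects the first meaningful line. It does not validate the rest of the diagram, and the function names are hypothetical.

```python
import re

# `{3} matches a run of three backticks without writing a literal fence here.
MERMAID_BLOCK_RE = re.compile(r"`{3}mermaid[ \t]*\n(.*?)\n`{3}", re.DOTALL)


def mermaid_blocks(text: str) -> list[str]:
    """Return the body of every fenced Mermaid block in the file."""
    return MERMAID_BLOCK_RE.findall(text)


def has_valid_declaration(block: str) -> bool:
    """True if the first non-blank, non-comment line starts with flowchart or graph."""
    for line in block.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("%%"):
            continue  # skip blank lines and Mermaid comments
        return stripped.split()[0] in {"flowchart", "graph"}
    return False


def check_diagram_file(text: str) -> list[str]:
    """Return a list of problems; an empty list means the file passes."""
    problems = []
    blocks = mermaid_blocks(text)
    if len(blocks) != 1:
        problems.append(f"expected exactly 1 Mermaid block, found {len(blocks)}")
    for block in blocks:
        if not has_valid_declaration(block):
            problems.append("Mermaid block does not start with a flowchart or graph declaration")
    return problems
```

A file that passes this check can still fail to render, so previewing each diagram in a Mermaid-aware viewer remains worthwhile.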