Celes Renata 88ad1e8d99 feat: comprehensive docs, unit tests, docker-compose app services

- Add scheduler and ingestion unit tests (test_scheduler_unit.py, test_ingestion_unit.py)
- Add all 13 app services + dashboard to docker-compose.yml
- Add full documentation suite: API reference, Helm reference, Docker deployment guide,
  3 architecture diagrams (K8s, Docker Compose, data pipeline), AI agent guide,
  backup/restore guide, observability/metrics reference, per-service docs
- Add intelligence pipeline deep-dive docs with Mermaid diagrams
- Update README with documentation index and links
- Add specs for comprehensive-quality-docs, intelligence-pipeline-deep-dive,
  sanitized-pipeline-docs

2026-04-22 02:56:41 +00:00


Design Document: Intelligence Pipeline Deep Dive

Overview

This design specifies the structure, content, and creation process for a 6-page narrative deep-dive document covering the full intelligence-to-decision pipeline in Stonks Oracle. The deliverable consists of Markdown narrative pages, an index file, and standalone Mermaid diagram files — all stored under docs/intelligence-pipeline-deep-dive/.

The document targets technical readers who want to understand how raw data enters the system, gets processed by AI agents, produces structured signals, accumulates into trend summaries, and ultimately drives autonomous trading decisions. Unlike the existing reference docs (docs/services.md, docs/architecture-data-pipeline.md), this deliverable is narrative and explanatory — it tells the story of data flowing through the platform end-to-end.

Key design decision: This is a documentation-only deliverable. No application code, database schemas, or infrastructure changes are involved. The output is purely Markdown files and Mermaid diagram files.

Existing Documentation Landscape

The codebase already has several reference documents that this deep-dive complements:

| Document | Purpose | Style |
| --- | --- | --- |
| `docs/architecture-data-pipeline.md` | Queue topology, data store summary, Mermaid flow diagrams | Reference diagrams + tables |
| `docs/llm-to-trade-pipeline.md` | End-to-end data flow from model output to trade | Narrative + tables + code blocks |
| `docs/services.md` | Per-service configuration, tables, queues, behaviors | Reference manual |
| `docs/ai-agents.md` | AI agent configuration, variants, A/B testing, API | Guide + reference |

The deep-dive document will reference these existing docs for readers who want deeper detail, while providing a cohesive narrative that connects all pipeline stages into a single story.

Architecture

File Organization

```text
docs/intelligence-pipeline-deep-dive/
├── index.md
├── 01-data-ingestion-and-preparation.md
├── 02-ai-agent-processing-and-extraction.md
├── 03-signal-scoring-and-weighted-signals.md
├── 04-trend-aggregation-and-accumulating-signals.md
├── 05-recommendation-generation.md
├── 06-trading-decisions-and-execution.md
└── diagrams/
    ├── ingestion-to-extraction-flow.md
    ├── three-layer-signal-merging.md
    ├── recommendation-generation-flow.md
    ├── trading-engine-decision-loop.md
    ├── weighted-signal-computation.md
    └── trend-accumulation-escalation.md
```
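The planned tree can also be kept as a small manifest, which turns the file existence check from the verification approach into a mechanical step. The following is a minimal sketch. The constant and function names (`PAGES`, `DIAGRAMS`, `expected_files`, `missing_files`) are hypothetical and do not come from the codebase.

```python
from pathlib import Path

# Hypothetical manifest mirroring the planned tree; paths are relative to the repo root.
DEEP_DIVE_ROOT = Path("docs/intelligence-pipeline-deep-dive")

PAGES = [
    "01-data-ingestion-and-preparation.md",
    "02-ai-agent-processing-and-extraction.md",
    "03-signal-scoring-and-weighted-signals.md",
    "04-trend-aggregation-and-accumulating-signals.md",
    "05-recommendation-generation.md",
    "06-trading-decisions-and-execution.md",
]

DIAGRAMS = [
    "ingestion-to-extraction-flow.md",
    "three-layer-signal-merging.md",
    "recommendation-generation-flow.md",
    "trading-engine-decision-loop.md",
    "weighted-signal-computation.md",
    "trend-accumulation-escalation.md",
]


def expected_files() -> list[Path]:
    """Every file the deliverable should contain: index, six pages, six diagrams."""
    files = [DEEP_DIVE_ROOT / "index.md"]
    files += [DEEP_DIVE_ROOT / page for page in PAGES]
    files += [DEEP_DIVE_ROOT / "diagrams" / name for name in DIAGRAMS]
    return files


def missing_files(repo_root: Path) -> list[Path]:
    """Return the expected paths that do not exist as regular files under repo_root."""
    return [rel for rel in expected_files() if not (repo_root / rel).is_file()]
```

When run from the repository root, `missing_files(Path("."))` should return an empty list once every page and diagram is in place.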

Content Flow

Each page covers one pipeline stage and ends with a transitional paragraph previewing the next page. Cross-references between pages use relative Markdown links. Diagrams are stored as standalone Markdown files in the diagrams/ subdirectory, each holding a single Mermaid block. They are linked from the narrative pages rather than embedded inline.

```mermaid
flowchart LR
    P1["Page 1\nData Ingestion"] --> P2["Page 2\nAI Extraction"]
    P2 --> P3["Page 3\nSignal Scoring"]
    P3 --> P4["Page 4\nTrend Aggregation"]
    P4 --> P5["Page 5\nRecommendations"]
    P5 --> P6["Page 6\nTrading Execution"]
```
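The transition rule can be checked mechanically. This sketch assumes that each page's closing paragraph contains an inline relative link to the next page's file name. The helper names are hypothetical. A passing check does not show that the transition actually reads well, so it only supplements the manual narrative review.

```python
import re

PAGES = [
    "01-data-ingestion-and-preparation.md",
    "02-ai-agent-processing-and-extraction.md",
    "03-signal-scoring-and-weighted-signals.md",
    "04-trend-aggregation-and-accumulating-signals.md",
    "05-recommendation-generation.md",
    "06-trading-decisions-and-execution.md",
]

# Inline Markdown links such as [Page 3](03-...md); image links (![...]) are skipped.
LINK_RE = re.compile(r"(?<!!)\[[^\]]*\]\(([^)\s]+)\)")


def last_paragraph(text: str) -> str:
    """Return the final blank-line-separated paragraph of a page."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text.strip()) if p.strip()]
    return paragraphs[-1] if paragraphs else ""


def missing_transitions(page_texts: dict[str, str]) -> list[str]:
    """List pages whose closing paragraph does not link to the following page."""
    problems = []
    for current, following in zip(PAGES, PAGES[1:]):
        closing = last_paragraph(page_texts.get(current, ""))
        # Drop any #anchor and a leading "./" so equivalent links compare equal.
        targets = {
            t.split("#", 1)[0].removeprefix("./") for t in LINK_RE.findall(closing)
        }
        if following not in targets:
            problems.append(current)
    return problems
```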

Components and Interfaces

Index File (`index.md`)

The index provides:

- A brief introduction to the deep-dive document series
- A numbered table of contents linking to all 6 pages
- A diagrams section linking to all Mermaid diagram files
- References to existing documentation for additional context
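The page file names encode both order and topic, so the table of contents and the diagrams section could be generated instead of maintained by hand. The following is a minimal sketch. It assumes two things: titles are derived from file names, and the index uses `## Contents` and `## Diagrams` headings. Both are assumptions, and hand-written titles may read better. `humanize` and `render_index` are hypothetical helpers.

```python
from pathlib import PurePosixPath

ACRONYMS = {"ai": "AI"}  # slug words that should stay upper-case in titles


def humanize(slug: str) -> str:
    """Turn a slug like 'ai-agent-processing' into 'AI agent processing'."""
    words = [ACRONYMS.get(word, word) for word in slug.split("-")]
    words[0] = words[0][:1].upper() + words[0][1:]
    return " ".join(words)


def render_index(pages: list[str], diagrams: list[str]) -> str:
    """Render a numbered table of contents plus a diagram link list."""
    lines = ["## Contents", ""]
    for number, filename in enumerate(pages, start=1):
        slug = PurePosixPath(filename).stem.split("-", 1)[1]  # drop the "01-" prefix
        lines.append(f"{number}. [{humanize(slug)}]({filename})")
    lines += ["", "## Diagrams", ""]
    for filename in diagrams:
        lines.append(f"- [{humanize(PurePosixPath(filename).stem)}](diagrams/{filename})")
    return "\n".join(lines) + "\n"
```

The introduction and the references to existing documentation would still be written by hand around the generated lists.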

Narrative Pages (01 through 06)

Each page follows a consistent structure:

- Title and introduction: what this stage does and why it matters
- Narrative body: explanatory prose describing the pipeline stage, referencing actual code modules (e.g., `services/extractor/main.py`), database tables (e.g., `document_impact_records`), Redis queues (e.g., `stonks:queue:extraction`), and Pydantic schemas (e.g., `ExtractionResult`)
- Diagram references: links to the relevant Mermaid diagram files in `diagrams/`
- Transition: a closing paragraph that previews the next page
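A skeleton generator is one way to keep this four-part structure consistent across all six pages. This is a hypothetical helper, not existing tooling. The HTML comment placeholders and the `Next:` wording of the transition line are assumptions.

```python
from typing import Optional


def render_page_skeleton(
    title: str,
    diagram_files: list[str],
    next_page: Optional[tuple[str, str]] = None,
) -> str:
    """Build the page outline: title and intro, body, diagram links, transition."""
    lines = [
        f"# {title}",
        "",
        "<!-- Introduction: what this stage does and why it matters. -->",
        "",
        "<!-- Narrative body: modules, tables, queues, and schemas for this stage. -->",
        "",
    ]
    if diagram_files:
        lines.append("Related diagrams:")
        lines += [f"- [{name}](diagrams/{name})" for name in diagram_files]
        lines.append("")
    if next_page is not None:
        # next_page is (file name, human-readable title) of the following page.
        filename, next_title = next_page
        lines.append(f"Next: [{next_title}]({filename}) picks up where this stage leaves off.")
    return "\n".join(lines).rstrip() + "\n"
```

Leaving `next_page` unset for page 6 produces a page without a transition line.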

Mermaid Diagram Files

Each diagram file contains:

- A brief title comment
- A single Mermaid code block
- Service labels that include both human-readable names and Python module paths
- Queue labels that use full Redis key patterns (e.g., `stonks:queue:extraction`)
- Database references that use exact PostgreSQL table names
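These diagram conventions can be spot-checked with a small validator. The sketch makes two assumptions. First, the title comment is either an HTML comment placed before the fence or a Mermaid `%%` comment on the block's first line. Second, the only queue labels it flags are the short form `queue:<name>` without the `stonks:` prefix. The function and pattern names are hypothetical, and the declaration check is structural rather than a full Mermaid parse.

```python
import re

# One fenced ```mermaid block; the lazy body stops at the first closing fence.
FENCE_RE = re.compile(r"^```mermaid[ \t]*\n(.*?)^```[ \t]*$", re.MULTILINE | re.DOTALL)
# "queue:<name>" that is not preceded by the "stonks:" prefix.
SHORT_QUEUE_RE = re.compile(r"(?<!stonks:)\bqueue:[a-z_]+")


def diagram_problems(text: str) -> list[str]:
    """Check one diagram file against the diagram-file conventions."""
    matches = list(FENCE_RE.finditer(text))
    if len(matches) != 1:
        return [f"expected exactly one mermaid block, found {len(matches)}"]
    match = matches[0]
    body = match.group(1)
    lines = [ln.strip() for ln in body.splitlines() if ln.strip()]

    problems = []
    # Title: an HTML comment before the fence, or a Mermaid %% comment on the first line.
    has_title = "<!--" in text[: match.start()] or (bool(lines) and lines[0].startswith("%%"))
    if not has_title:
        problems.append("missing title comment")

    # The first non-comment line must declare the diagram type.
    statements = [ln for ln in lines if not ln.startswith("%%")]
    if not statements or statements[0].split()[0] not in {"flowchart", "graph"}:
        problems.append("missing flowchart/graph declaration")

    for short in SHORT_QUEUE_RE.findall(body):
        problems.append(f"queue label not fully qualified: {short}")
    return problems
```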

At least 6 diagrams, covering:

- Ingestion-to-extraction flow: Scheduler → Ingestion → Parser → Extractor, with queues and storage
- Three-layer signal merging: Company, Macro, and Competitive layers converging into aggregation
- Recommendation generation flow: Suppression → Eligibility → Thesis → Risk classification
- Trading engine decision loop: Pre-trade checks → Position sizing → Order submission
- Weighted signal computation: Component breakdown of the composite weight formula
- Trend accumulation and escalation: How consecutive signals strengthen trends and escalate actions

Page Content Mapping

| Page | Primary Code Modules | Key Database Tables | Key Queues |
| --- | --- | --- | --- |
| 01 - Ingestion | `services/scheduler/app.py`, `services/ingestion/worker.py`, `services/parser/worker.py` | `documents`, `ingestion_runs`, `document_company_mentions` | `stonks:queue:ingestion`, `stonks:queue:parsing` |
| 02 - AI Extraction | `services/extractor/main.py`, `services/extractor/client.py`, `services/extractor/prompts.py`, `services/extractor/schemas.py`, `services/extractor/event_classifier.py`, `services/shared/agent_config.py` | `document_intelligence`, `document_impact_records`, `global_events`, `macro_impact_records`, `ai_agents`, `agent_variants` | `stonks:queue:extraction`, `stonks:queue:macro_classification`, `stonks:queue:aggregation` |
| 03 - Signal Scoring | `services/aggregation/scoring.py` | `document_impact_records`, `macro_impact_records`, `competitive_signal_records`, `risk_configs` | None |
| 04 - Trend Aggregation | `services/aggregation/worker.py`, `services/aggregation/contradiction.py`, `services/aggregation/projection.py`, `services/aggregation/pattern_matcher.py`, `services/aggregation/signal_propagation.py` | `trend_windows`, `trend_history`, `trend_evidence`, `trend_projections` | `stonks:queue:aggregation`, `stonks:queue:recommendation` |
| 05 - Recommendations | `services/recommendation/main.py`, `services/recommendation/suppression.py`, `services/recommendation/eligibility.py`, `services/recommendation/thesis_llm.py` | `recommendations`, `recommendation_evidence`, `risk_evaluations` | `stonks:queue:recommendation` |
| 06 - Trading | `services/trading/engine.py`, `services/trading/position_sizer.py`, `services/trading/circuit_breaker.py`, `services/trading/reserve_pool.py`, `services/trading/risk_tier_controller.py`, `services/trading/stop_loss_manager.py` | `trading_decisions`, `orders`, `positions`, `portfolio_snapshots`, `reserve_pool_ledger`, `risk_tier_history`, `circuit_breaker_events` | `stonks:queue:broker_orders` |
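The mapping table also serves as the list of identifiers that must be checked against the codebase. This minimal sketch extracts every backticked token from the table text and sorts it into a bucket. It relies on three assumptions: a `.py` suffix marks a module, a `stonks:queue:` prefix marks a queue, and any other bare snake_case identifier is a table name. `classify_references` is a hypothetical helper.

```python
import re

BACKTICK_RE = re.compile(r"`([^`]+)`")
TABLE_NAME_RE = re.compile(r"[a-z][a-z0-9_]*")


def classify_references(table_text: str) -> dict[str, set[str]]:
    """Sort every backticked token in the mapping table into modules, queues, or tables."""
    refs: dict[str, set[str]] = {"modules": set(), "queues": set(), "tables": set()}
    for token in BACKTICK_RE.findall(table_text):
        if token.endswith(".py"):
            refs["modules"].add(token)
        elif token.startswith("stonks:queue:"):
            refs["queues"].add(token)
        elif TABLE_NAME_RE.fullmatch(token):
            refs["tables"].add(token)  # bare snake_case identifiers are assumed to be tables
    return refs
```

The function works on tab-separated and pipe-delimited rows alike, because it only looks at backticked spans.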

Data Models

This feature produces only documentation files. There are no new data models, database tables, or schema changes.

The narrative pages will reference existing data models from the codebase:

- `WeightedSignal` (`services/aggregation/scoring.py`): document reference, composite weight, sentiment, and impact
- `SignalWeight` (`services/aggregation/scoring.py`): breakdown of recency, credibility, novelty, confidence gate, and market context multiplier
- `ScoringConfig` (`services/aggregation/scoring.py`): tunable parameters for signal scoring
- `ExtractionResult` / `CompanyImpact` (`services/extractor/schemas.py`): structured JSON output from document extraction
- `GlobalEventSchema` (`services/extractor/event_classifier.py`): macro event classification output
- `TrendSummary` (`services/shared/schemas.py`): rolling trend for a ticker across a time window
- `Recommendation` (`services/shared/schemas.py`): actionable trade recommendation
- `TradingDecision` (`services/trading/engine.py`): audit record of every trading evaluation
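You can confirm that each listed model really lives in the named module by parsing the source instead of importing it, which avoids loading service dependencies. This sketch uses the standard-library `ast` module, and the helper names are hypothetical. A class that the module only re-exports (imports from elsewhere) is deliberately reported as missing, because the docs name the defining file.

```python
import ast
from pathlib import Path

# Class name -> module path, as listed in the data model references.
DATA_MODELS = {
    "WeightedSignal": "services/aggregation/scoring.py",
    "SignalWeight": "services/aggregation/scoring.py",
    "ScoringConfig": "services/aggregation/scoring.py",
    "ExtractionResult": "services/extractor/schemas.py",
    "CompanyImpact": "services/extractor/schemas.py",
    "GlobalEventSchema": "services/extractor/event_classifier.py",
    "TrendSummary": "services/shared/schemas.py",
    "Recommendation": "services/shared/schemas.py",
    "TradingDecision": "services/trading/engine.py",
}


def defined_classes(source: str) -> set[str]:
    """Names of all classes defined anywhere in a module's source (nested ones included)."""
    return {node.name for node in ast.walk(ast.parse(source)) if isinstance(node, ast.ClassDef)}


def missing_models(repo_root: Path, models: dict[str, str] = DATA_MODELS) -> list[str]:
    """Return 'Class (path)' for every model not defined in the module the docs name."""
    missing = []
    for class_name, rel_path in models.items():
        path = repo_root / rel_path
        if not path.is_file() or class_name not in defined_classes(path.read_text(encoding="utf-8")):
            missing.append(f"{class_name} ({rel_path})")
    return missing
```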

Error Handling

Since this is a documentation-only deliverable, there is no runtime error handling to design. The primary quality concern is accuracy — ensuring that all code module paths, database table names, Redis queue keys, schema field names, and configuration values referenced in the narrative match the actual codebase.

Accuracy Verification Strategy

- Code module paths: Every module path referenced in the narrative (e.g., `services/aggregation/scoring.py`) must correspond to an existing file in the repository.
- Database table names: Table names must match those defined in the SQL files under `infra/migrations/`.
- Redis queue keys: Queue names must match the constants in `services/shared/redis_keys.py`.
- Schema class names: Pydantic model names must match their definitions in `services/shared/schemas.py` and the service-specific schema files.
- Configuration values: Environment variable names and default values must match `services/shared/config.py` and the service-specific configuration.
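Three of these checks (module paths, table names, and queue keys) can be scripted. The sketch assumes that migrations are plain `.sql` files declaring tables with `CREATE TABLE`. It also assumes that `services/shared/redis_keys.py` spells out every queue key as a full string literal. If the keys are built from a shared prefix (for example with f-strings), the queue check would need to import the module instead. All function names are hypothetical.

```python
import re
from pathlib import Path

# Assumes migrations are plain SQL files that declare tables with CREATE TABLE.
CREATE_TABLE_RE = re.compile(
    r'CREATE\s+TABLE\s+(?:IF\s+NOT\s+EXISTS\s+)?(?:\w+\.)?"?(\w+)"?', re.IGNORECASE
)
# Assumes redis_keys.py spells each queue key out as a full string literal.
QUEUE_LITERAL_RE = re.compile(r"""["'](stonks:queue:[A-Za-z0-9_:]+)["']""")


def migration_tables(migrations_dir: Path) -> set[str]:
    """Collect table names from every CREATE TABLE statement in the migrations."""
    tables: set[str] = set()
    for sql_file in sorted(migrations_dir.glob("*.sql")):
        sql = sql_file.read_text(encoding="utf-8")
        tables.update(name.lower() for name in CREATE_TABLE_RE.findall(sql))
    return tables


def declared_queues(redis_keys_py: Path) -> set[str]:
    """Collect full stonks:queue:* string literals from the Redis key module."""
    return set(QUEUE_LITERAL_RE.findall(redis_keys_py.read_text(encoding="utf-8")))


def unknown_references(
    repo_root: Path, modules: set[str], tables: set[str], queues: set[str]
) -> dict[str, list[str]]:
    """Return the referenced modules, tables, and queues that the codebase does not define."""
    known_tables = migration_tables(repo_root / "infra" / "migrations")
    known_queues = declared_queues(repo_root / "services" / "shared" / "redis_keys.py")
    return {
        "modules": sorted(m for m in modules if not (repo_root / m).is_file()),
        "tables": sorted(t for t in tables if t not in known_tables),
        "queues": sorted(q for q in queues if q not in known_queues),
    }
```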

Cross-Reference Integrity

All inter-page links (e.g., [Page 3](03-signal-scoring-and-weighted-signals.md)) and diagram links (e.g., [diagram](diagrams/ingestion-to-extraction-flow.md)) must resolve to files that exist in the deliverable.
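To check this rule, a link checker only has to resolve each relative target against the directory of the page that contains it. This minimal sketch uses hypothetical names and handles inline links only, skipping reference-style and image links. It also ignores external URLs and same-page anchors.

```python
import re
from pathlib import Path

# Inline Markdown links [text](target); image links (![...]) are skipped.
LINK_RE = re.compile(r"(?<!!)\[[^\]]*\]\(([^)\s]+)\)")
SCHEME_RE = re.compile(r"[a-z][a-z0-9+.-]*:", re.IGNORECASE)  # http:, https:, mailto:, ...


def broken_links(doc_root: Path) -> list[tuple[str, str]]:
    """Return (page, target) pairs whose relative target does not exist on disk."""
    broken = []
    for md in sorted(doc_root.rglob("*.md")):
        for target in LINK_RE.findall(md.read_text(encoding="utf-8")):
            if SCHEME_RE.match(target) or target.startswith("#"):
                continue  # external URL or same-page anchor
            path_part = target.split("#", 1)[0]  # drop "#section" before resolving
            if not (md.parent / path_part).resolve().exists():
                broken.append((str(md.relative_to(doc_root)), target))
    return broken
```

Because targets resolve from each page's own directory, links from `diagrams/` back to the pages (for example `../index.md`) and links out to existing docs such as `docs/services.md` are checked the same way.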

Testing Strategy

Property-based testing does not apply to this feature. The deliverable is purely documentation — Markdown narrative pages and Mermaid diagram files. There are no functions, data transformations, or code logic to test.

Why PBT Does Not Apply

- The output is static Markdown text, not executable code
- There are no input/output functions to verify properties against
- There is no data transformation logic that varies with input
- The quality criteria (narrative coherence, codebase accuracy, cross-reference integrity) are best verified through manual review

Verification Approach

- File existence check: Verify that all 6 page files, the index file, and all diagram files exist at the expected paths
- Link integrity: Verify that all inter-page and diagram links resolve to existing files
- Mermaid syntax: Verify that each diagram file contains a Mermaid code block that opens with a proper `flowchart` or `graph` declaration (a structural check, not a full parse)
- Codebase reference spot-checks: Verify a sample of referenced module paths, table names, and queue keys against the actual codebase
- Narrative flow: Review manually to confirm that each page ends with a transition to the next and that the overall story is coherent
