feat: comprehensive docs, unit tests, docker-compose app services

- Add scheduler and ingestion unit tests (test_scheduler_unit.py, test_ingestion_unit.py)
- Add all 13 app services + dashboard to docker-compose.yml
- Add full documentation suite: API reference, Helm reference, Docker deployment guide,
  3 architecture diagrams (K8s, Docker Compose, data pipeline), AI agent guide,
  backup/restore guide, observability/metrics reference, per-service docs
- Add intelligence pipeline deep-dive docs with Mermaid diagrams
- Update README with documentation index and links
- Add specs for comprehensive-quality-docs, intelligence-pipeline-deep-dive,
  sanitized-pipeline-docs
Celes Renata
2026-04-22 02:56:41 +00:00
parent f251c53f92
commit 88ad1e8d99
57 changed files with 13318 additions and 51 deletions
@@ -0,0 +1,153 @@
# Design Document: Intelligence Pipeline Deep Dive
## Overview
This design specifies the structure, content, and creation process for a 6-page narrative deep-dive document covering the full intelligence-to-decision pipeline in Stonks Oracle. The deliverable consists of Markdown narrative pages, an index file, and standalone Mermaid diagram files — all stored under `docs/intelligence-pipeline-deep-dive/`.
The document targets technical readers who want to understand how raw data enters the system, gets processed by AI agents, produces structured signals, accumulates into trend summaries, and ultimately drives autonomous trading decisions. Unlike the existing reference docs (`docs/services.md`, `docs/architecture-data-pipeline.md`), this deliverable is narrative and explanatory — it tells the story of data flowing through the platform end-to-end.
**Key design decision**: This is a documentation-only deliverable. No application code, database schemas, or infrastructure changes are involved. The output is purely Markdown files and Mermaid diagram files.
### Existing Documentation Landscape
The codebase already has several reference documents that this deep-dive complements:
| Document | Purpose | Style |
|----------|---------|-------|
| `docs/architecture-data-pipeline.md` | Queue topology, data store summary, Mermaid flow diagrams | Reference diagrams + tables |
| `docs/llm-to-trade-pipeline.md` | End-to-end data flow from model output to trade | Narrative + tables + code blocks |
| `docs/services.md` | Per-service configuration, tables, queues, behaviors | Reference manual |
| `docs/ai-agents.md` | AI agent configuration, variants, A/B testing, API | Guide + reference |
The deep-dive document will reference these existing docs for readers who want deeper detail, while providing a cohesive narrative that connects all pipeline stages into a single story.
## Architecture
### File Organization
```
docs/intelligence-pipeline-deep-dive/
├── index.md
├── 01-data-ingestion-and-preparation.md
├── 02-ai-agent-processing-and-extraction.md
├── 03-signal-scoring-and-weighted-signals.md
├── 04-trend-aggregation-and-accumulating-signals.md
├── 05-recommendation-generation.md
├── 06-trading-decisions-and-execution.md
└── diagrams/
├── ingestion-to-extraction-flow.md
├── three-layer-signal-merging.md
├── recommendation-generation-flow.md
├── trading-engine-decision-loop.md
├── weighted-signal-computation.md
└── trend-accumulation-escalation.md
```
### Content Flow
Each page covers one pipeline stage and ends with a transitional paragraph previewing the next page. Cross-references between pages use relative Markdown links. Diagrams are stored as standalone Markdown files, each holding one Mermaid code block, in the `diagrams/` subdirectory; narrative pages link to them rather than embedding them inline.
```mermaid
flowchart LR
P1["Page 1\nData Ingestion"] --> P2["Page 2\nAI Extraction"]
P2 --> P3["Page 3\nSignal Scoring"]
P3 --> P4["Page 4\nTrend Aggregation"]
P4 --> P5["Page 5\nRecommendations"]
P5 --> P6["Page 6\nTrading Execution"]
```
## Components and Interfaces
### Index File (`index.md`)
The index provides:
- A brief introduction to the deep-dive document series
- A numbered table of contents linking to all 6 pages
- A diagrams section linking to all Mermaid diagram files
- References to existing documentation for additional context
### Narrative Pages (01 through 06)
Each page follows a consistent structure:
1. **Title and introduction** — what this stage does and why it matters
2. **Narrative body** — explanatory prose describing the pipeline stage, referencing actual code modules (`services/extractor/main.py`), database tables (`document_impact_records`), Redis queues (`stonks:queue:extraction`), and Pydantic schemas (`ExtractionResult`)
3. **Diagram references** — links to relevant Mermaid diagram files in `diagrams/`
4. **Transition** — a closing paragraph that previews the next page
### Mermaid Diagram Files
Each diagram file contains:
1. A brief title comment
2. A single Mermaid code block
3. Service labels that include both human-readable names and Python module paths
4. Queue labels that use full Redis key patterns
5. Database references that use exact PostgreSQL table names
The deliverable includes at least 6 diagrams, covering:
- **Ingestion-to-extraction flow**: Scheduler → Ingestion → Parser → Extractor, with queues and storage
- **Three-layer signal merging**: Company, Macro, and Competitive layers converging into aggregation
- **Recommendation generation flow**: Suppression → Eligibility → Thesis → Risk classification
- **Trading engine decision loop**: Pre-trade checks → Position sizing → Order submission
- **Weighted signal computation**: Component breakdown of the composite weight formula
- **Trend accumulation and escalation**: How consecutive signals strengthen trends and escalate actions
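The structural rules for diagram files lend themselves to a small automated check. The following is a minimal sketch, not part of the deliverable: the function name `check_diagram_text` is hypothetical, and it only confirms that a file holds exactly one Mermaid block whose first line is a `flowchart` or `graph` declaration. It does not inspect the title comment or validate the diagram body.

```python
import re

# Build the fence marker programmatically so this example can itself live
# inside a fenced code block without terminating it early.
FENCE = "`" * 3

# Captures the body of every fenced block tagged "mermaid".
MERMAID_BLOCK = re.compile(FENCE + r"mermaid[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def check_diagram_text(text: str) -> list[str]:
    """Return a list of structural problems in one diagram file (empty if none)."""
    blocks = MERMAID_BLOCK.findall(text)
    if len(blocks) != 1:
        return [f"expected exactly 1 mermaid block, found {len(blocks)}"]

    # The first non-empty line of the block should declare the diagram type.
    first = next((ln.strip() for ln in blocks[0].splitlines() if ln.strip()), "")
    if not first.startswith(("flowchart", "graph")):
        return [f"block does not start with flowchart/graph: {first!r}"]
    return []
```

Running this over each file in `diagrams/` and failing on any non-empty result would catch a missing or duplicated block early, before a manual review of the rendered diagram.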
### Page Content Mapping
| Page | Primary Code Modules | Key Database Tables | Key Queues |
|------|---------------------|---------------------|------------|
| 01 - Ingestion | `services/scheduler/app.py`, `services/ingestion/worker.py`, `services/parser/worker.py` | `documents`, `ingestion_runs`, `document_company_mentions` | `stonks:queue:ingestion`, `stonks:queue:parsing` |
| 02 - AI Extraction | `services/extractor/main.py`, `services/extractor/client.py`, `services/extractor/prompts.py`, `services/extractor/schemas.py`, `services/extractor/event_classifier.py`, `services/shared/agent_config.py` | `document_intelligence`, `document_impact_records`, `global_events`, `macro_impact_records`, `ai_agents`, `agent_variants` | `stonks:queue:extraction`, `stonks:queue:macro_classification`, `stonks:queue:aggregation` |
| 03 - Signal Scoring | `services/aggregation/scoring.py` | `document_impact_records`, `macro_impact_records`, `competitive_signal_records`, `risk_configs` | — |
| 04 - Trend Aggregation | `services/aggregation/worker.py`, `services/aggregation/contradiction.py`, `services/aggregation/projection.py`, `services/aggregation/pattern_matcher.py`, `services/aggregation/signal_propagation.py` | `trend_windows`, `trend_history`, `trend_evidence`, `trend_projections` | `stonks:queue:aggregation`, `stonks:queue:recommendation` |
| 05 - Recommendations | `services/recommendation/main.py`, `services/recommendation/suppression.py`, `services/recommendation/eligibility.py`, `services/recommendation/thesis_llm.py` | `recommendations`, `recommendation_evidence`, `risk_evaluations` | `stonks:queue:recommendation` |
| 06 - Trading | `services/trading/engine.py`, `services/trading/position_sizer.py`, `services/trading/circuit_breaker.py`, `services/trading/reserve_pool.py`, `services/trading/risk_tier_controller.py`, `services/trading/stop_loss_manager.py` | `trading_decisions`, `orders`, `positions`, `portfolio_snapshots`, `reserve_pool_ledger`, `risk_tier_history`, `circuit_breaker_events` | `stonks:queue:broker_orders` |
## Data Models
This feature produces only documentation files. There are no new data models, database tables, or schema changes.
The narrative pages will reference existing data models from the codebase:
- **`WeightedSignal`** (`services/aggregation/scoring.py`) — document reference + composite weight + sentiment + impact
- **`SignalWeight`** (`services/aggregation/scoring.py`) — breakdown of recency, credibility, novelty, confidence gate, market context multiplier
- **`ScoringConfig`** (`services/aggregation/scoring.py`) — tunable parameters for signal scoring
- **`ExtractionResult`** / **`CompanyImpact`** (`services/extractor/schemas.py`) — structured JSON output from document extraction
- **`GlobalEventSchema`** (`services/extractor/event_classifier.py`) — macro event classification output
- **`TrendSummary`** (`services/shared/schemas.py`) — rolling trend for a ticker across a time window
- **`Recommendation`** (`services/shared/schemas.py`) — actionable trade recommendation
- **`TradingDecision`** (`services/trading/engine.py`) — audit record of every trading evaluation
## Error Handling
Since this is a documentation-only deliverable, there is no runtime error handling to design. The primary quality concern is **accuracy** — ensuring that all code module paths, database table names, Redis queue keys, schema field names, and configuration values referenced in the narrative match the actual codebase.
### Accuracy Verification Strategy
1. **Code module paths**: Every module path referenced in the narrative (e.g., `services/aggregation/scoring.py`) must correspond to an existing file in the repository.
2. **Database table names**: Table names must match those defined in `infra/migrations/` SQL files.
3. **Redis queue keys**: Queue names must match constants in `services/shared/redis_keys.py`.
4. **Schema class names**: Pydantic model names must match their definitions in `services/shared/schemas.py` and service-specific schema files.
5. **Configuration values**: Environment variable names and default values must match `services/shared/config.py` and service-specific configuration.
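The first item in this strategy can be partly automated. The sketch below is an illustration under stated assumptions: it treats any backtick-quoted string of the form `services/....py` as a module reference, and the helper names `referenced_modules` and `missing_modules` are hypothetical. Table names, queue keys, and configuration values would need their own checks against the sources listed above.

```python
import re
from pathlib import Path

# Assumption for illustration: module references appear in backticks and
# look like `services/<package>/<module>.py`.
MODULE_PATH = re.compile(r"`(services/[\w/]+\.py)`")


def referenced_modules(markdown: str) -> set[str]:
    """Collect every backtick-quoted services/*.py path in the text."""
    return set(MODULE_PATH.findall(markdown))


def missing_modules(markdown: str, repo_root: Path) -> list[str]:
    """Return referenced module paths that are not files under repo_root."""
    return sorted(
        path
        for path in referenced_modules(markdown)
        if not (repo_root / path).is_file()
    )
```

An empty result from `missing_modules` for every narrative page means each referenced module path resolves to a file; it says nothing about whether the prose describes that module correctly, which still requires review.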
### Cross-Reference Integrity
All inter-page links (e.g., `[Page 3](03-signal-scoring-and-weighted-signals.md)`) and diagram links (e.g., `[diagram](diagrams/ingestion-to-extraction-flow.md)`) must resolve to files that exist in the deliverable.
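A link check of this kind might look like the following sketch. The function name `broken_links` is hypothetical, and the regular expression only handles inline `[text](target)` links; reference-style links are not covered. External URLs and same-page anchors are skipped because the requirement concerns files inside the deliverable.

```python
import re
from pathlib import Path

# Inline Markdown links of the form [text](target).
LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")

# A URL scheme prefix such as "https:" or "mailto:".
SCHEME = re.compile(r"[a-z][a-z0-9+.-]*:", re.IGNORECASE)


def broken_links(page: Path) -> list[str]:
    """Return relative link targets in `page` that do not resolve to a file."""
    broken = []
    for target in LINK.findall(page.read_text(encoding="utf-8")):
        if SCHEME.match(target) or target.startswith("#"):
            continue  # external URL or same-page anchor
        path_part = target.split("#", 1)[0]  # ignore any fragment
        if not (page.parent / path_part).is_file():
            broken.append(target)
    return broken
```

Targets are resolved relative to the page's own directory, which matches how relative Markdown links behave when the pages are browsed in place.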
## Testing Strategy
**Property-based testing does not apply to this feature.** The deliverable is purely documentation — Markdown narrative pages and Mermaid diagram files. There are no functions, data transformations, or code logic to test.
### Why PBT Does Not Apply
- The output is static Markdown text, not executable code
- There are no input/output functions to verify properties against
- There is no data transformation logic that varies with input
- The quality criteria (narrative coherence, codebase accuracy, cross-reference integrity) are best verified through structural checks and manual review
### Verification Approach
1. **File existence check**: Verify all 6 page files, the index file, and all diagram files exist at the expected paths
2. **Link integrity**: Verify all inter-page and diagram links resolve to existing files
3. **Mermaid syntax**: Verify each diagram file contains a Mermaid code block that opens with a `flowchart` or `graph` declaration; this is a structural smoke check, not a full syntax validation
4. **Codebase reference spot-checks**: Verify a sample of referenced module paths, table names, and queue keys against the actual codebase
5. **Narrative flow**: Manual review to confirm each page ends with a transition to the next and the overall story is coherent
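The file existence check in step 1 could be scripted as in the sketch below. The file names are taken from the File Organization listing; the helper names `expected_files` and `missing_files` are hypothetical, and the root directory is passed in by the caller rather than assumed.

```python
from pathlib import Path

# Index plus the six narrative pages, as listed under File Organization.
PAGES = [
    "index.md",
    "01-data-ingestion-and-preparation.md",
    "02-ai-agent-processing-and-extraction.md",
    "03-signal-scoring-and-weighted-signals.md",
    "04-trend-aggregation-and-accumulating-signals.md",
    "05-recommendation-generation.md",
    "06-trading-decisions-and-execution.md",
]

# Diagram files stored under the diagrams/ subdirectory.
DIAGRAMS = [
    "ingestion-to-extraction-flow.md",
    "three-layer-signal-merging.md",
    "recommendation-generation-flow.md",
    "trading-engine-decision-loop.md",
    "weighted-signal-computation.md",
    "trend-accumulation-escalation.md",
]


def expected_files(root: Path) -> list[Path]:
    """Every file the deliverable should contain, relative to `root`."""
    return [root / name for name in PAGES] + [
        root / "diagrams" / name for name in DIAGRAMS
    ]


def missing_files(root: Path) -> list[Path]:
    """Expected files that are absent under `root`."""
    return [path for path in expected_files(root) if not path.is_file()]
```

Calling `missing_files(Path("docs/intelligence-pipeline-deep-dive"))` from the repository root should return an empty list once the deliverable is complete.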