feat: comprehensive docs, unit tests, docker-compose app services

- Add scheduler and ingestion unit tests (test_scheduler_unit.py, test_ingestion_unit.py)
- Add all 13 app services + dashboard to docker-compose.yml
- Add full documentation suite: API reference, Helm reference, Docker deployment guide, 3 architecture diagrams (K8s, Docker Compose, data pipeline), AI agent guide, backup/restore guide, observability/metrics reference, per-service docs
- Add intelligence pipeline deep-dive docs with Mermaid diagrams
- Update README with documentation index and links
- Add specs for comprehensive-quality-docs, intelligence-pipeline-deep-dive, sanitized-pipeline-docs
@@ -0,0 +1 @@
{"specId": "d2fe9091-6423-482c-a4ce-3cd72e62eb23", "workflowType": "requirements-first", "specType": "feature"}
@@ -0,0 +1,153 @@
# Design Document: Intelligence Pipeline Deep Dive

## Overview

This design specifies the structure, content, and creation process for a 6-page narrative deep-dive document covering the full intelligence-to-decision pipeline in Stonks Oracle. The deliverable consists of Markdown narrative pages, an index file, and standalone Mermaid diagram files, all stored under `docs/intelligence-pipeline-deep-dive/`.

The document targets technical readers who want to understand how raw data enters the system, gets processed by AI agents, produces structured signals, accumulates into trend summaries, and ultimately drives autonomous trading decisions. Unlike the existing reference docs (`docs/services.md`, `docs/architecture-data-pipeline.md`), this deliverable is narrative and explanatory: it tells the story of data flowing through the platform end-to-end.

**Key design decision**: This is a documentation-only deliverable. No application code, database schemas, or infrastructure changes are involved. The output is purely Markdown files and Mermaid diagram files.

### Existing Documentation Landscape

### Existing Documentation Landscape

The codebase already has several reference documents that this deep-dive complements:

| Document | Purpose | Style |
|----------|---------|-------|
| `docs/architecture-data-pipeline.md` | Queue topology, data store summary, Mermaid flow diagrams | Reference diagrams + tables |
| `docs/llm-to-trade-pipeline.md` | End-to-end data flow from model output to trade | Narrative + tables + code blocks |
| `docs/services.md` | Per-service configuration, tables, queues, behaviors | Reference manual |
| `docs/ai-agents.md` | AI agent configuration, variants, A/B testing, API | Guide + reference |

The deep-dive document will reference these existing docs for readers who want deeper detail, while providing a cohesive narrative that connects all pipeline stages into a single story.

## Architecture

### File Organization

```
docs/intelligence-pipeline-deep-dive/
├── index.md
├── 01-data-ingestion-and-preparation.md
├── 02-ai-agent-processing-and-extraction.md
├── 03-signal-scoring-and-weighted-signals.md
├── 04-trend-aggregation-and-accumulating-signals.md
├── 05-recommendation-generation.md
├── 06-trading-decisions-and-execution.md
└── diagrams/
    ├── ingestion-to-extraction-flow.md
    ├── three-layer-signal-merging.md
    ├── recommendation-generation-flow.md
    ├── trading-engine-decision-loop.md
    ├── weighted-signal-computation.md
    └── trend-accumulation-escalation.md
```

### Content Flow

Each page covers one pipeline stage and ends with a transitional paragraph previewing the next page. Cross-references between pages use relative Markdown links. Diagrams are stored as standalone Mermaid files in the `diagrams/` subdirectory and linked from the narrative pages (not embedded inline).

```mermaid
flowchart LR
    P1["Page 1\nData Ingestion"] --> P2["Page 2\nAI Extraction"]
    P2 --> P3["Page 3\nSignal Scoring"]
    P3 --> P4["Page 4\nTrend Aggregation"]
    P4 --> P5["Page 5\nRecommendations"]
    P5 --> P6["Page 6\nTrading Execution"]
```

## Components and Interfaces

### Index File (`index.md`)

The index provides:
- A brief introduction to the deep-dive document series
- A numbered table of contents linking to all 6 pages
- A diagrams section linking to all Mermaid diagram files
- References to existing documentation for additional context

### Narrative Pages (01 through 06)

Each page follows a consistent structure:
1. **Title and introduction**: what this stage does and why it matters
2. **Narrative body**: explanatory prose describing the pipeline stage, referencing actual code modules (`services/extractor/main.py`), database tables (`document_impact_records`), Redis queues (`stonks:queue:extraction`), and Pydantic schemas (`ExtractionResult`)
3. **Diagram references**: links to relevant Mermaid diagram files in `diagrams/`
4. **Transition**: a closing paragraph that previews the next page

### Mermaid Diagram Files

Each diagram file contains:
1. A brief title comment
2. A single Mermaid code block

Within each diagram, labels follow these conventions:
- Service labels include both human-readable names and Python module paths
- Queue labels use full Redis key patterns
- Database references use exact PostgreSQL table names

At least 6 diagrams, covering:
- **Ingestion-to-extraction flow**: Scheduler → Ingestion → Parser → Extractor, with queues and storage
- **Three-layer signal merging**: Company, Macro, and Competitive layers converging into aggregation
- **Recommendation generation flow**: Suppression → Eligibility → Thesis → Risk classification
- **Trading engine decision loop**: Pre-trade checks → Position sizing → Order submission
- **Weighted signal computation**: Component breakdown of the composite weight formula
- **Trend accumulation and escalation**: How consecutive signals strengthen trends and escalate actions

### Page Content Mapping

| Page | Primary Code Modules | Key Database Tables | Key Queues |
|------|---------------------|---------------------|------------|
| 01 - Ingestion | `services/scheduler/app.py`, `services/ingestion/worker.py`, `services/parser/worker.py` | `documents`, `ingestion_runs`, `document_company_mentions` | `stonks:queue:ingestion`, `stonks:queue:parsing` |
| 02 - AI Extraction | `services/extractor/main.py`, `services/extractor/client.py`, `services/extractor/prompts.py`, `services/extractor/schemas.py`, `services/extractor/event_classifier.py`, `services/shared/agent_config.py` | `document_intelligence`, `document_impact_records`, `global_events`, `macro_impact_records`, `ai_agents`, `agent_variants` | `stonks:queue:extraction`, `stonks:queue:macro_classification`, `stonks:queue:aggregation` |
| 03 - Signal Scoring | `services/aggregation/scoring.py` | `document_impact_records`, `macro_impact_records`, `competitive_signal_records`, `risk_configs` | (none) |
| 04 - Trend Aggregation | `services/aggregation/worker.py`, `services/aggregation/contradiction.py`, `services/aggregation/projection.py`, `services/aggregation/pattern_matcher.py`, `services/aggregation/signal_propagation.py` | `trend_windows`, `trend_history`, `trend_evidence`, `trend_projections` | `stonks:queue:aggregation`, `stonks:queue:recommendation` |
| 05 - Recommendations | `services/recommendation/main.py`, `services/recommendation/suppression.py`, `services/recommendation/eligibility.py`, `services/recommendation/thesis_llm.py` | `recommendations`, `recommendation_evidence`, `risk_evaluations` | `stonks:queue:recommendation` |
| 06 - Trading | `services/trading/engine.py`, `services/trading/position_sizer.py`, `services/trading/circuit_breaker.py`, `services/trading/reserve_pool.py`, `services/trading/risk_tier_controller.py`, `services/trading/stop_loss_manager.py` | `trading_decisions`, `orders`, `positions`, `portfolio_snapshots`, `reserve_pool_ledger`, `risk_tier_history`, `circuit_breaker_events` | `stonks:queue:broker_orders` |

## Data Models

This feature produces only documentation files. There are no new data models, database tables, or schema changes.

The narrative pages will reference existing data models from the codebase:

- **`WeightedSignal`** (`services/aggregation/scoring.py`): document reference + composite weight + sentiment + impact
- **`SignalWeight`** (`services/aggregation/scoring.py`): breakdown of recency, credibility, novelty, confidence gate, market context multiplier
- **`ScoringConfig`** (`services/aggregation/scoring.py`): tunable parameters for signal scoring
- **`ExtractionResult`** / **`CompanyImpact`** (`services/extractor/schemas.py`): structured JSON output from document extraction
- **`GlobalEventSchema`** (`services/extractor/event_classifier.py`): macro event classification output
- **`TrendSummary`** (`services/shared/schemas.py`): rolling trend for a ticker across a time window
- **`Recommendation`** (`services/shared/schemas.py`): actionable trade recommendation
- **`TradingDecision`** (`services/trading/engine.py`): audit record of every trading evaluation

## Error Handling

Since this is a documentation-only deliverable, there is no runtime error handling to design. The primary quality concern is **accuracy**: ensuring that all code module paths, database table names, Redis queue keys, schema field names, and configuration values referenced in the narrative match the actual codebase.

### Accuracy Verification Strategy

1. **Code module paths**: Every module path referenced in the narrative (e.g., `services/aggregation/scoring.py`) must correspond to an existing file in the repository.
2. **Database table names**: Table names must match those defined in `infra/migrations/` SQL files.
3. **Redis queue keys**: Queue names must match constants in `services/shared/redis_keys.py`.
4. **Schema class names**: Pydantic model names must match their definitions in `services/shared/schemas.py` and service-specific schema files.
5. **Configuration values**: Environment variable names and default values must match `services/shared/config.py` and service-specific configuration.

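
The module-path check in item 1 could be automated along the following lines. This is a minimal sketch, not part of the deliverable: the function name `find_missing_paths`, the regular expression, and the set of top-level directories it recognizes are all assumptions made for illustration.

```python
import re
from pathlib import Path

# Assumption: referenced paths appear in backticks and start with one of these
# top-level directories. Adjust the prefixes to the real repository layout.
MODULE_PATH_RE = re.compile(r"`((?:services|infra|docs)/[\w./-]+\.(?:py|md|sql))`")


def find_missing_paths(markdown_text: str, repo_root: Path) -> list[str]:
    """Return backticked repository paths that do not exist under repo_root."""
    referenced = set(MODULE_PATH_RE.findall(markdown_text))
    return sorted(p for p in referenced if not (repo_root / p).is_file())
```

Running it over each narrative page with the repository root would list any stale references; table names and queue keys would need their own source of truth (the migration files and `services/shared/redis_keys.py`).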
### Cross-Reference Integrity

All inter-page links (e.g., `[Page 3](03-signal-scoring-and-weighted-signals.md)`) and diagram links (e.g., `[diagram](diagrams/ingestion-to-extraction-flow.md)`) must resolve to files that exist in the deliverable.

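
One way to check this mechanically is to resolve every relative Markdown link against the page that contains it. The sketch below assumes plain inline links of the form `[text](target)`; the helper name `broken_links` is hypothetical, and reference-style links or HTML anchors are not handled.

```python
import re
from pathlib import Path

# Inline Markdown links: [text](target). Targets with spaces are not supported.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
# Anything starting with a URI scheme (https:, mailto:, ...) is external.
SCHEME_RE = re.compile(r"[a-z][a-z0-9+.-]*:")


def broken_links(page: Path) -> list[str]:
    """Return relative link targets in `page` that do not resolve to a file."""
    broken = []
    for target in LINK_RE.findall(page.read_text(encoding="utf-8")):
        if SCHEME_RE.match(target) or target.startswith("#"):
            continue  # skip external URLs and same-page anchors
        file_part = target.split("#", 1)[0]  # drop any section anchor
        if not (page.parent / file_part).exists():
            broken.append(target)
    return broken
```

Applying it to `index.md` and each of the six pages covers both inter-page and diagram links, since both are relative paths.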
## Testing Strategy

**Property-based testing (PBT) does not apply to this feature.** The deliverable is purely documentation: Markdown narrative pages and Mermaid diagram files. There are no functions, data transformations, or code logic to test.

### Why PBT Does Not Apply

- The output is static Markdown text, not executable code
- There are no input/output functions to verify properties against
- There is no data transformation logic that varies with input
- The quality criteria (narrative coherence, codebase accuracy, cross-reference integrity) are best verified through manual review

### Verification Approach

1. **File existence check**: Verify all 6 page files, the index file, and all diagram files exist at the expected paths
2. **Link integrity**: Verify all inter-page and diagram links resolve to existing files
3. **Mermaid syntax**: Verify that each diagram file opens its Mermaid block with a `flowchart` or `graph` declaration (a basic structural check, not a full parse)
4. **Codebase reference spot-checks**: Verify a sample of referenced module paths, table names, and queue keys against the actual codebase
5. **Narrative flow**: Manual review to confirm each page ends with a transition to the next and the overall story is coherent
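
The Mermaid check in step 3 can be approximated without a Mermaid parser. The sketch below only confirms that a file holds exactly one `mermaid` fence whose first non-empty line starts with `flowchart` or `graph`; the function name is hypothetical, and passing this check does not prove the diagram renders.

```python
import re

FENCE = "`" * 3  # built programmatically so this example nests safely in Markdown
MERMAID_BLOCK_RE = re.compile(FENCE + r"mermaid\s*\n(.*?)" + FENCE, re.DOTALL)


def mermaid_declaration_ok(markdown_text: str) -> bool:
    """True if there is exactly one mermaid block and it declares flowchart/graph."""
    blocks = MERMAID_BLOCK_RE.findall(markdown_text)
    if len(blocks) != 1:
        return False
    first_line = next((ln.strip() for ln in blocks[0].splitlines() if ln.strip()), "")
    return first_line.partition(" ")[0] in {"flowchart", "graph"}
```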
@@ -0,0 +1,155 @@
# Requirements Document

## Introduction

This specification defines a 6-page narrative deep-dive document (plus separate Mermaid diagram files) that explains the full intelligence-to-decision pipeline in Stonks Oracle. The document targets a technical reader who wants to understand how raw data enters the system, gets processed by AI agents, produces structured signals, accumulates into trend summaries, and ultimately drives autonomous trading decisions. Unlike the existing service reference and API docs, this deliverable is narrative and explanatory: it tells the story of data flowing through the platform end-to-end, referencing actual code modules, database tables, queue names, and schemas from the codebase.

## Glossary

- **Deep_Dive_Document**: The 6-page Markdown document delivered under `docs/intelligence-pipeline-deep-dive/`, consisting of pages 01 through 06 covering the full intelligence-to-decision pipeline.
- **Mermaid_Diagram_File**: A standalone Markdown file containing a single Mermaid diagram block, stored alongside the narrative pages in `docs/intelligence-pipeline-deep-dive/diagrams/`.
- **Pipeline**: The end-to-end data flow from external source ingestion through AI extraction, signal aggregation, recommendation generation, and autonomous trading execution.
- **Signal_Layer**: One of three independent signal sources (Company, Macro, Competitive) that produce `WeightedSignal` objects merged by the Aggregation_Engine.
- **Aggregation_Engine**: The `services/aggregation/` module that merges weighted signals from all three layers into `TrendSummary` objects across five time windows.
- **Trading_Engine**: The `services/trading/engine.py` module that polls recommendations and executes autonomous paper trades through a multi-check decision loop.
- **Extractor**: The `services/extractor/` module that uses Ollama LLM inference to produce structured JSON intelligence from documents.
- **WeightedSignal**: The `services.aggregation.scoring.WeightedSignal` dataclass that pairs a document reference with a composite aggregation weight.
- **TrendSummary**: The `services.shared.schemas.TrendSummary` Pydantic model representing a rolling trend for a ticker across a specific time window.
- **Recommendation**: The `services.shared.schemas.Recommendation` Pydantic model representing an actionable trade recommendation with action, mode, confidence, thesis, and position sizing.
- **Circuit_Breaker**: The `services/trading/circuit_breaker.py` safety mechanism that halts trading when risk thresholds (daily loss, single-position loss, volatility clustering) are breached.

## Requirements

### Requirement 1: Document Structure and File Organization

**User Story:** As a technical reader, I want the deep-dive organized into clearly separated pages with a consistent structure, so that I can navigate to specific pipeline stages without reading the entire document.

#### Acceptance Criteria

1. THE Deep_Dive_Document SHALL consist of exactly 6 Markdown page files named `01-data-ingestion-and-preparation.md` through `06-trading-decisions-and-execution.md`, stored under `docs/intelligence-pipeline-deep-dive/`.
2. THE Deep_Dive_Document SHALL include an `index.md` file that provides a table of contents linking to all 6 pages and all Mermaid_Diagram_Files.
3. WHEN a page references a Mermaid diagram, THE Deep_Dive_Document SHALL link to the corresponding Mermaid_Diagram_File stored in `docs/intelligence-pipeline-deep-dive/diagrams/` rather than embedding the diagram inline.
4. THE Deep_Dive_Document SHALL include a minimum of 4 separate Mermaid_Diagram_Files covering: (a) the ingestion-to-extraction flow, (b) the three signal layers merging into aggregation, (c) the recommendation generation pipeline, and (d) the trading engine decision loop.
5. WHEN a page references a code module, THE Deep_Dive_Document SHALL use the full Python module path (e.g., `services/extractor/prompts.py`) rather than abbreviated names.
6. WHEN a page references a database table, THE Deep_Dive_Document SHALL use the exact table name as defined in the PostgreSQL schema (e.g., `document_impact_records`, `trend_windows`).
7. WHEN a page references a Redis queue, THE Deep_Dive_Document SHALL use the full key pattern as defined in `services/shared/redis_keys.py` (e.g., `stonks:queue:extraction`).

### Requirement 2: Page 1 — Data Ingestion and Preparation

**User Story:** As a technical reader, I want to understand how raw data enters Stonks Oracle and gets prepared for AI processing, so that I can trace the origin of any signal back to its external source.

#### Acceptance Criteria

1. THE Deep_Dive_Document page 01 SHALL explain the four categories of input data: news articles (Polygon.io), SEC filings (EDGAR), market data (Polygon.io grouped daily and intraday bars), and macro/geopolitical events (macro news APIs).
2. THE Deep_Dive_Document page 01 SHALL describe the Scheduler's role in orchestrating ingestion cycles, including cadence polling intervals per source type (`market_api`: 300s, `news_api`: 300s, `filings_api`: 3600s, `macro_news`: 600s), rate limiting, and exponential backoff.
3. THE Deep_Dive_Document page 01 SHALL describe the Ingestion worker's adapter dispatch pattern, referencing the adapter classes (`PolygonMarketAdapter`, `PolygonNewsAdapter`, `SECEdgarAdapter`, `MacroNewsAdapter`) in `services/ingestion/`.
4. THE Deep_Dive_Document page 01 SHALL explain content deduplication via Redis content-hash markers (`stonks:dedupe:*` with 24-hour TTL) and raw artifact storage in MinIO buckets (`stonks-raw-market`, `stonks-raw-news`, `stonks-raw-filings`).
5. THE Deep_Dive_Document page 01 SHALL describe the Parser's role in converting raw HTML/text into normalized documents, including quality scoring with confidence levels (`high`, `medium`, `low`), company mention detection via alias matching, and the routing decision that sends `macro_event` documents to `stonks:queue:macro_classification` instead of `stonks:queue:extraction`.
6. THE Deep_Dive_Document page 01 SHALL be written in narrative prose style with explanatory paragraphs, not as a reference table or bullet-point list.
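
To make criterion 4 concrete, the content-hash deduplication can be sketched as follows. Only the `stonks:dedupe:` key prefix and the 24-hour TTL come from the criteria above; the SHA-256 hash, the `InMemoryMarkerStore` class (a stand-in for Redis "set if absent, with expiry" semantics), and the function names are assumptions for illustration.

```python
import hashlib
import time

DEDUPE_TTL_SECONDS = 24 * 60 * 60  # 24-hour TTL stated in the requirement


class InMemoryMarkerStore:
    """Hypothetical stand-in for a Redis-style 'set if absent with expiry' call."""

    def __init__(self):
        self._expiry = {}  # key -> absolute expiry timestamp

    def set_if_absent(self, key, ttl, now=None):
        now = time.time() if now is None else now
        expires = self._expiry.get(key)
        if expires is not None and expires > now:
            return False  # marker still live: content was seen recently
        self._expiry[key] = now + ttl
        return True


def dedupe_key(content: bytes) -> str:
    # Assumption: the marker suffix is a SHA-256 hex digest of the raw content.
    return "stonks:dedupe:" + hashlib.sha256(content).hexdigest()


def is_new_content(store, content: bytes, now=None) -> bool:
    """True the first time content is seen within the TTL window."""
    return store.set_if_absent(dedupe_key(content), DEDUPE_TTL_SECONDS, now)
```

Because the marker expires, identical content fetched more than 24 hours later is treated as new again, which is the trade-off a TTL-based marker implies.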

### Requirement 3: Page 2 — AI Agent Processing and Structured Extraction

**User Story:** As a technical reader, I want to understand how the AI agents process documents and produce structured JSON output, so that I can evaluate the extraction quality and understand the schema contract.

#### Acceptance Criteria

1. THE Deep_Dive_Document page 02 SHALL explain the Document Intelligence Extractor agent (`document-extractor` slug), including its entry point (`services/extractor/main.py` → `services/extractor/client.py`), the system prompt, and the user prompt template built by `build_extraction_prompt()` in `services/extractor/prompts.py`.
2. THE Deep_Dive_Document page 02 SHALL describe the `ExtractionResult` JSON schema with all fields (summary, companies array with ticker/sentiment/impact_score/impact_horizon/catalyst_type/key_facts/risks/evidence_spans, macro_themes, novelty_score, confidence, extraction_warnings), referencing `services/extractor/schemas.py`.
3. THE Deep_Dive_Document page 02 SHALL explain the Global Event Classifier agent (`event-classifier` slug), including its entry point (`services/extractor/event_classifier.py`), the `GlobalEvent` output schema with event_types/severity/affected_regions/affected_sectors/affected_commodities/estimated_duration/confidence, and the anti-hallucination rules that prevent classifying company-specific news as macro events.
4. THE Deep_Dive_Document page 02 SHALL describe the JSON repair pipeline (direct parse → markdown fence stripping → `json-repair` library fallback) and the structural plus semantic validation in `services/extractor/schemas.py`, including retry logic with exponential backoff.
5. THE Deep_Dive_Document page 02 SHALL explain the `AgentConfigResolver` mechanism (`services/shared/agent_config.py`) that enables hot-swapping models and prompts via the `ai_agents` and `agent_variants` database tables with a 60-second TTL cache.
6. THE Deep_Dive_Document page 02 SHALL describe how extraction results are persisted to `document_intelligence` (one row per document) and `document_impact_records` (one row per company mention), and how the extractor enqueues aggregation jobs to `stonks:queue:aggregation`.
7. THE Deep_Dive_Document page 02 SHALL be written in narrative prose style with explanatory paragraphs, not as a reference table or bullet-point list.
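
The three-stage repair order named in criterion 4 (direct parse, then fence stripping, then the `json-repair` fallback) might look like the sketch below. The function name `parse_llm_json` and the fence regex are hypothetical, the third stage assumes the `json-repair` package is installed, and validation and retry logic are deliberately left out.

```python
import json
import re

# Matches a whole response wrapped in a Markdown code fence, optionally tagged json.
FENCED_RE = re.compile(r"^\s*`{3}(?:json)?\s*\n(.*?)\n\s*`{3}\s*$", re.DOTALL)


def parse_llm_json(raw: str):
    """Parse model output as JSON, trying progressively more forgiving stages."""
    # Stage 1: direct parse.
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass

    # Stage 2: strip a surrounding Markdown fence, then parse again.
    match = FENCED_RE.match(raw)
    candidate = match.group(1) if match else raw
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        pass

    # Stage 3: fall back to the json-repair library (third-party dependency).
    try:
        from json_repair import repair_json
    except ImportError as exc:
        raise ValueError("output is not valid JSON and json-repair is unavailable") from exc
    return repair_json(candidate, return_objects=True)
```

Ordering the stages from strictest to most lenient means well-formed output never passes through a repair heuristic that could alter it.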

### Requirement 4: Page 3 — Signal Scoring and the WeightedSignal Abstraction

**User Story:** As a technical reader, I want to understand how raw extraction output gets transformed into weighted signals for decision making, so that I can reason about why certain documents influence trends more than others.

#### Acceptance Criteria

1. THE Deep_Dive_Document page 03 SHALL explain the `WeightedSignal` dataclass (`services/aggregation/scoring.py`) and the composite weight formula: `combined = gate × recency × credibility × (1 + novelty_bonus) × market_context_multiplier`.
2. THE Deep_Dive_Document page 03 SHALL describe each weight component in detail: confidence gate (threshold 0.2), recency decay (exponential half-life per window: intraday=2h, 1d=12h, 7d=72h, 30d=240h, 90d=720h), source credibility weighting (clamped [0.1, 1.0] with configurable exponent), novelty bonus (up to 25%), and market context multiplier (volatility boost up to 30%, volume surge boost 15%).
3. THE Deep_Dive_Document page 03 SHALL explain how sentiment labels are mapped to numeric values (+1.0 positive, -1.0 negative, 0.0 neutral/mixed) via `sentiment_to_numeric()` and how the weighted sentiment average is computed across all signals.
4. THE Deep_Dive_Document page 03 SHALL describe the three signal layers (Company, Macro, Competitive) and how each produces `WeightedSignal` objects that are concatenated into a single list before trend computation, with relative influence controlled by `MACRO_SIGNAL_WEIGHT` (0.3) and `COMPETITIVE_SIGNAL_WEIGHT` (0.2).
5. THE Deep_Dive_Document page 03 SHALL explain the runtime toggle mechanism for macro and competitive layers via the `risk_configs` database table, including graceful degradation when a layer is disabled or fails.
6. THE Deep_Dive_Document page 03 SHALL be written in narrative prose style with explanatory paragraphs, not as a reference table or bullet-point list.
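
A worked sketch of the composite weight formula from criteria 1 to 3 may help when reviewing page 03. The half-lives, the 0.2 gate threshold, the [0.1, 1.0] credibility clamp, the 25% novelty cap, and the sentiment mapping come from the criteria; treating the gate as a hard 0/1 switch, scaling the novelty bonus linearly with the novelty score, and all function names are assumptions, not the behavior of `services/aggregation/scoring.py`.

```python
HALF_LIFE_HOURS = {"intraday": 2, "1d": 12, "7d": 72, "30d": 240, "90d": 720}
CONFIDENCE_GATE = 0.2
MAX_NOVELTY_BONUS = 0.25


def sentiment_value(label: str) -> float:
    """Map a sentiment label to a number; neutral and mixed both map to 0.0."""
    return {"positive": 1.0, "negative": -1.0}.get(label, 0.0)


def composite_weight(confidence, age_hours, window, credibility, novelty_score,
                     market_context_multiplier=1.0, credibility_exponent=1.0):
    """combined = gate * recency * credibility * (1 + novelty_bonus) * multiplier."""
    gate = 1.0 if confidence >= CONFIDENCE_GATE else 0.0  # assumed hard gate
    recency = 0.5 ** (age_hours / HALF_LIFE_HOURS[window])  # exponential half-life decay
    cred = min(max(credibility, 0.1), 1.0) ** credibility_exponent
    novelty_bonus = MAX_NOVELTY_BONUS * min(max(novelty_score, 0.0), 1.0)  # assumed linear
    return gate * recency * cred * (1.0 + novelty_bonus) * market_context_multiplier


def weighted_sentiment(signals):
    """Weighted average sentiment over (label, weight) pairs; 0.0 if no weight."""
    total = sum(weight for _, weight in signals)
    if total == 0:
        return 0.0
    return sum(sentiment_value(label) * weight for label, weight in signals) / total
```

For example, a fully credible document with no novelty bonus that is exactly one half-life old (12 hours in the `1d` window) gets a weight of 0.5, and the same document weighs far less in the `intraday` window, where 12 hours is six half-lives.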

### Requirement 5: Page 4 — Trend Aggregation and Accumulating Signals

**User Story:** As a technical reader, I want to understand how the aggregation engine merges multiple signals, including consecutive signals pointing in the same direction, into trend summaries that drive higher-stakes decisions, so that I can see how accumulating bearish or bullish evidence escalates the system's response.

#### Acceptance Criteria

1. THE Deep_Dive_Document page 04 SHALL explain how the Aggregation_Engine (`services/aggregation/worker.py`) computes `TrendSummary` objects across five time windows (intraday, 1d, 7d, 30d, 90d) by fetching impact records, macro impacts, and competitive signals for a ticker.
2. THE Deep_Dive_Document page 04 SHALL describe the trend direction derivation rules: bullish (avg_sentiment ≥ 0.15), bearish (avg_sentiment ≤ -0.15), mixed (contradiction > 0.10 and |avg_sentiment| < 0.30), neutral (otherwise), referencing `derive_trend_direction()` in `services/aggregation/worker.py`.
3. THE Deep_Dive_Document page 04 SHALL explain contradiction detection (`services/aggregation/contradiction.py`), including sentiment disagreement analysis and catalyst-level disagreement, and how the contradiction score (minority_weight / total_weight) penalizes trend confidence.
4. THE Deep_Dive_Document page 04 SHALL describe how consecutive signals in the same direction accumulate to strengthen trend_strength and confidence, explaining the evidence ranking mechanism (`rank_evidence()`) that uses composite scoring (weight, impact, recency, confidence) and the confidence computation that rewards unique source count (caps at 15 sources for 0.8 contribution) and signal agreement (log₂ scaling, saturates around 7 unique sources).
5. THE Deep_Dive_Document page 04 SHALL explain how accumulating bearish signals across multiple documents and time windows escalate the system's response — from a neutral hold to a bearish sell recommendation — and conversely how accumulating bullish signals escalate from watch to buy, using the trend strength and confidence thresholds from the eligibility rules.
6. THE Deep_Dive_Document page 04 SHALL describe trend projections (`services/aggregation/projection.py`), including macro decay, momentum, driving factors, and divergence detection.
7. THE Deep_Dive_Document page 04 SHALL describe persistence to `trend_windows` (upserted each cycle), `trend_history` (time-series snapshots), `trend_evidence` (per-document rankings), and `trend_projections`.
8. THE Deep_Dive_Document page 04 SHALL be written in narrative prose style with explanatory paragraphs, not as a reference table or bullet-point list.
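
The direction rules in criterion 2 and the contradiction score in criterion 3 can be illustrated with a short sketch. The thresholds are taken from the criteria. Two points are assumptions: the "minority" weight is read as the smaller of the bullish and bearish weight totals, and the mixed rule is evaluated before the bullish and bearish rules (otherwise its 0.30 bound could never take effect). The real precedence in `derive_trend_direction()` should be confirmed against the code, which is why the function below carries a different, hypothetical name.

```python
def contradiction_score(bullish_weight: float, bearish_weight: float) -> float:
    """minority_weight / total_weight, with 0.0 when there is no weight at all."""
    total = bullish_weight + bearish_weight
    if total == 0:
        return 0.0
    return min(bullish_weight, bearish_weight) / total


def sketch_trend_direction(avg_sentiment: float, contradiction: float) -> str:
    """Illustrative reading of the direction rules; rule order is an assumption."""
    if contradiction > 0.10 and abs(avg_sentiment) < 0.30:
        return "mixed"
    if avg_sentiment >= 0.15:
        return "bullish"
    if avg_sentiment <= -0.15:
        return "bearish"
    return "neutral"
```

Under this reading, a moderately positive average (0.2) with a quarter of the weight dissenting is reported as mixed, while the same average with almost no dissent is bullish.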

### Requirement 6: Page 5 — Recommendation Generation and Signal-to-Action Translation

**User Story:** As a technical reader, I want to understand how trend summaries are translated into actionable recommendations with risk classification and thesis generation, so that I can see the decision logic between aggregated intelligence and trading actions.

#### Acceptance Criteria

1. THE Deep_Dive_Document page 05 SHALL explain the data quality suppression layer (`services/recommendation/suppression.py`), including the six suppression checks (extraction confidence < 0.40, evidence staleness > 168h, source diversity < 1, extraction failure rate > 50%, valid document count < 2, data quality score < 0.30) and the safety suppressions for macro-only and pattern-only trend shifts.
2. THE Deep_Dive_Document page 05 SHALL describe the eligibility evaluation (`services/recommendation/eligibility.py`), including gate checks (confidence ≥ 0.35, strength ≥ 0.10, contradiction ≤ 0.60, evidence ≥ 2, direction ≠ neutral), action mapping (BUY/SELL for strength ≥ 0.25, HOLD for weaker directional signals, WATCH otherwise), and mode escalation (informational → paper_eligible → live_eligible based on confidence and evidence thresholds).
3. THE Deep_Dive_Document page 05 SHALL explain position sizing computation from signal quality: base 1% + confidence × strength scaling up to 10%, with contradiction penalty, evidence count penalty, and max loss percentage scaling.
4. THE Deep_Dive_Document page 05 SHALL describe the two-layer thesis generation: deterministic thesis assembly from trend data, and optional LLM rewrite via the `thesis-rewriter` agent (`services/recommendation/thesis_llm.py`) for trading-eligible recommendations.
5. THE Deep_Dive_Document page 05 SHALL explain risk classification (low/moderate/high/very_high) based on contradiction score, confidence, evidence count, and mode.
6. THE Deep_Dive_Document page 05 SHALL describe persistence to `recommendations`, `recommendation_evidence`, and `risk_evaluations` tables.
7. THE Deep_Dive_Document page 05 SHALL be written in narrative prose style with explanatory paragraphs, not as a reference table or bullet-point list.
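
The gate checks and action mapping in criterion 2 can be sketched as a pair of small functions. The thresholds are the ones listed above; the `TrendInput` dataclass, the function names, and the choice to map a passing but non-directional ("mixed") trend to WATCH are assumptions for illustration, not the logic of `services/recommendation/eligibility.py`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrendInput:
    """Hypothetical subset of trend fields needed for the eligibility sketch."""
    direction: str  # "bullish", "bearish", "mixed", or "neutral"
    confidence: float
    strength: float
    contradiction: float
    evidence_count: int


def passes_gates(trend: TrendInput) -> bool:
    """All five gate checks from the criterion must hold."""
    return (
        trend.confidence >= 0.35
        and trend.strength >= 0.10
        and trend.contradiction <= 0.60
        and trend.evidence_count >= 2
        and trend.direction != "neutral"
    )


def map_action(trend: TrendInput) -> str:
    """BUY/SELL for strong directional trends, HOLD for weaker ones, else WATCH."""
    if not passes_gates(trend) or trend.direction not in ("bullish", "bearish"):
        return "WATCH"
    if trend.strength >= 0.25:
        return "BUY" if trend.direction == "bullish" else "SELL"
    return "HOLD"
```

For instance, `map_action(TrendInput("bearish", 0.6, 0.4, 0.1, 5))` yields `"SELL"`, while lowering the strength to 0.15 yields `"HOLD"`.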

### Requirement 7: Page 6 — Trading Engine Decisions and Execution

**User Story:** As a technical reader, I want to understand how the trading engine uses aggregated trend data to make buy/sell/hold decisions, including position sizing, risk evaluation, and circuit breakers, so that I can trace any trade back to its intelligence origin.

#### Acceptance Criteria

1. THE Deep_Dive_Document page 06 SHALL explain the Trading_Engine decision loop (`services/trading/engine.py`), including the five concurrent async tasks: decision loop (60s polling), stop-loss monitor, performance loop, risk tier scheduler, and rebalance scheduler.
2. THE Deep_Dive_Document page 06 SHALL describe the pre-trade check sequence in order: circuit breaker check, trading window check, confidence gate (risk-tier minimum), deduplication, declining positions check, and max open positions check, explaining that the first failure short-circuits the evaluation.
3. THE Deep_Dive_Document page 06 SHALL explain position sizing (`services/trading/position_sizer.py`), including confidence-based scaling with sample-size-dampened agreement scoring, risk tier adjustment (conservative/moderate/aggressive with specific parameter differences), correlation-aware diversification, sector exposure reduction, earnings proximity adjustment, and the absolute position cap.
4. THE Deep_Dive_Document page 06 SHALL describe the Circuit_Breaker mechanism (`services/trading/circuit_breaker.py`), including the three trigger types (daily_loss with emergency drawdown threshold, single_position loss with ticker cooldown, volatility with stop-loss clustering detection), cooldown computation, and Redis state tracking (`stonks:trading:circuit_breaker:*`).
5. THE Deep_Dive_Document page 06 SHALL explain the reserve pool mechanism (`services/trading/reserve_pool.py`): profit siphoning (default 20%), high-water mark rebalancing (30% threshold), emergency liquidation, and ledger tracking in `reserve_pool_ledger`.
6. THE Deep_Dive_Document page 06 SHALL describe risk tier auto-adjustment (`services/trading/risk_tier_controller.py`), including the evaluation criteria (Sharpe ratio, drawdown, win rate) and the three tier configurations with their parameter differences (min confidence, max position %, stop-loss ATR multiplier, reward/risk ratio, max sector %, max portfolio heat).
7. THE Deep_Dive_Document page 06 SHALL explain the order submission flow: `TradingDecision` persistence to `trading_decisions`, order job enqueue to `stonks:queue:broker_orders`, broker adapter risk evaluation, Alpaca paper trading submission, and the full audit trail from signal to broker response.
8. THE Deep_Dive_Document page 06 SHALL be written in narrative prose style with explanatory paragraphs, not as a reference table or bullet-point list.
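
The short-circuit behavior described in criterion 2 is easiest to see as an ordered list of named checks. In the sketch below the check order follows the criterion, but the context dictionary, its keys, and the predicate for each check are hypothetical placeholders; the actual conditions live in `services/trading/engine.py`.

```python
from typing import Callable, Optional

# Each entry is (check name, predicate returning True when the check passes).
# The predicates are illustrative stand-ins, not the engine's real logic.
PRE_TRADE_CHECKS: list[tuple[str, Callable[[dict], bool]]] = [
    ("circuit_breaker", lambda ctx: not ctx["circuit_breaker_active"]),
    ("trading_window", lambda ctx: ctx["within_trading_window"]),
    ("confidence_gate", lambda ctx: ctx["confidence"] >= ctx["tier_min_confidence"]),
    ("deduplication", lambda ctx: not ctx["already_evaluated"]),
    ("declining_positions", lambda ctx: not ctx["position_declining"]),
    ("max_open_positions", lambda ctx: ctx["open_positions"] < ctx["max_open_positions"]),
]


def first_failed_check(ctx: dict) -> Optional[str]:
    """Run checks in order and return the name of the first failure, or None."""
    for name, passes in PRE_TRADE_CHECKS:
        if not passes(ctx):
            return name  # short-circuit: later checks are never evaluated
    return None
```

Returning the name of the failing check, rather than a bare boolean, is one way to record the rejection reason in an audit record such as `trading_decisions`.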

### Requirement 8: Mermaid Diagram Quality and Separation

**User Story:** As a technical reader, I want architecture diagrams in separate files that I can render independently, so that I can use them in presentations or embed them in other documents.

#### Acceptance Criteria

1. WHEN a Mermaid_Diagram_File is created, THE Deep_Dive_Document SHALL store the diagram in a standalone Markdown file under `docs/intelligence-pipeline-deep-dive/diagrams/` with a descriptive filename (e.g., `ingestion-to-extraction-flow.md`).
2. THE Deep_Dive_Document SHALL include at least 4 Mermaid_Diagram_Files: one for the ingestion-to-extraction pipeline, one for the three-layer signal merging, one for the recommendation generation flow, and one for the trading engine decision loop.
3. WHEN a Mermaid diagram references a service, THE Mermaid_Diagram_File SHALL label the service with both its human-readable name and its Python module path (e.g., `Extractor\nservices/extractor/main.py`).
4. WHEN a Mermaid diagram references a queue, THE Mermaid_Diagram_File SHALL use the full Redis key pattern (e.g., `stonks:queue:extraction`).
5. WHEN a Mermaid diagram references a database table, THE Mermaid_Diagram_File SHALL use the exact PostgreSQL table name.
### Requirement 9: Narrative Style and Cross-Referencing
**User Story:** As a technical reader, I want the document to read as a coherent narrative rather than a reference manual, so that I can build a mental model of the full pipeline without jumping between disconnected sections.
#### Acceptance Criteria
1. THE Deep_Dive_Document SHALL use narrative prose with explanatory paragraphs as the primary writing style, reserving tables and bullet lists for structured data summaries only.
2. WHEN a page references content covered in a different page, THE Deep_Dive_Document SHALL include a Markdown link to the relevant page and section.
3. THE Deep_Dive_Document SHALL include transitional paragraphs at the end of each page that preview what the next page covers, creating a continuous narrative flow.
4. THE Deep_Dive_Document SHALL reference the existing documentation where appropriate (e.g., `docs/services.md`, `docs/ai-agents.md`, `docs/architecture-data-pipeline.md`, `docs/llm-to-trade-pipeline.md`) for readers who want deeper reference-level detail.
5. IF a concept is introduced for the first time, THEN THE Deep_Dive_Document SHALL provide a brief inline explanation before using the concept in subsequent discussion.
### Requirement 10: Codebase Accuracy
**User Story:** As a developer, I want the document to reference actual code modules, database tables, and queue names from the codebase, so that I can use the document as a reliable guide when navigating the source code.
#### Acceptance Criteria
1. THE Deep_Dive_Document SHALL reference code modules using paths that exist in the repository (e.g., `services/aggregation/scoring.py`, `services/trading/circuit_breaker.py`, `services/shared/schemas.py`).
2. THE Deep_Dive_Document SHALL reference database tables using names that match the PostgreSQL schema as defined in `infra/migrations/`.
3. THE Deep_Dive_Document SHALL reference Redis queue names using the constants defined in `services/shared/redis_keys.py` (e.g., `QUEUE_EXTRACTION`, `QUEUE_AGGREGATION`, `QUEUE_RECOMMENDATION`, `QUEUE_BROKER`).
4. THE Deep_Dive_Document SHALL reference Pydantic schema classes using their actual class names from `services/shared/schemas.py` (e.g., `DocumentIntelligence`, `TrendSummary`, `Recommendation`, `GlobalEventSchema`, `CompanyImpact`).
5. THE Deep_Dive_Document SHALL reference configuration environment variables using the exact names defined in `services/shared/config.py` and the service-specific configuration sections.
@@ -0,0 +1,35 @@
# Tasks — Intelligence Pipeline Deep Dive
## Task 1: Create directory structure and index file
- [x] 1.1 Create `docs/intelligence-pipeline-deep-dive/` directory and `docs/intelligence-pipeline-deep-dive/diagrams/` subdirectory
- [x] 1.2 Create `docs/intelligence-pipeline-deep-dive/index.md` with table of contents linking to all 6 pages and all diagram files, plus references to existing docs (`docs/services.md`, `docs/ai-agents.md`, `docs/architecture-data-pipeline.md`, `docs/llm-to-trade-pipeline.md`)
## Task 2: Create Mermaid diagram files
- [x] 2.1 Create `docs/intelligence-pipeline-deep-dive/diagrams/ingestion-to-extraction-flow.md` — flowchart from Scheduler through Ingestion, Parser, to Extractor with all queues (`stonks:queue:ingestion`, `stonks:queue:parsing`, `stonks:queue:extraction`, `stonks:queue:macro_classification`), storage (MinIO buckets, PostgreSQL tables), and service module paths
- [x] 2.2 Create `docs/intelligence-pipeline-deep-dive/diagrams/three-layer-signal-merging.md` — flowchart showing Company signals (`document_impact_records`), Macro signals (`macro_impact_records`), and Competitive signals (`competitive_signal_records`) each producing `WeightedSignal` objects that merge into the Aggregation engine (`services/aggregation/worker.py`)
- [x] 2.3 Create `docs/intelligence-pipeline-deep-dive/diagrams/weighted-signal-computation.md` — diagram showing the composite weight formula components: confidence gate, recency decay, source credibility, novelty bonus, and market context multiplier
- [x] 2.4 Create `docs/intelligence-pipeline-deep-dive/diagrams/trend-accumulation-escalation.md` — diagram showing how consecutive signals accumulate across time windows to escalate from neutral → watch → hold → buy/sell decisions
- [x] 2.5 Create `docs/intelligence-pipeline-deep-dive/diagrams/recommendation-generation-flow.md` — flowchart from TrendSummary through data quality suppression, eligibility evaluation, thesis generation, risk classification, to recommendation persistence
- [x] 2.6 Create `docs/intelligence-pipeline-deep-dive/diagrams/trading-engine-decision-loop.md` — flowchart showing the pre-trade check sequence (circuit breaker → trading window → confidence gate → dedup → declining positions → max positions), position sizing, and order submission to `stonks:queue:broker_orders`
## Task 3: Write Page 1 — Data Ingestion and Preparation
- [x] 3.1 Write `docs/intelligence-pipeline-deep-dive/01-data-ingestion-and-preparation.md` covering: four input data categories (Polygon news, SEC EDGAR filings, Polygon market data, macro news APIs), Scheduler cadence polling (market_api: 300s, news_api: 300s, filings_api: 3600s, macro_news: 600s) with rate limiting and backoff, Ingestion worker adapter dispatch (`PolygonMarketAdapter`, `PolygonNewsAdapter`, `SECEdgarAdapter`, `MacroNewsAdapter`), content deduplication via Redis (`stonks:dedupe:*` with 24h TTL), raw artifact storage in MinIO (`stonks-raw-market`, `stonks-raw-news`, `stonks-raw-filings`), Parser role (HTML normalization, quality scoring, company mention detection, routing `macro_event` docs to `stonks:queue:macro_classification`). Written in narrative prose with links to diagrams and transition to Page 2.
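
The content deduplication step named in task 3.1 can be pictured roughly as follows. This is a minimal sketch, assuming the key suffix is a SHA-256 digest of the raw content and that the check behaves like a Redis `SET key value NX EX ttl`; the hash choice, the `InMemoryDedupeStore` class, and the helper names are all hypothetical. Only the `stonks:dedupe:` prefix and the 24h TTL come from the task description.

```python
import hashlib
import time

DEDUPE_TTL_SECONDS = 24 * 60 * 60  # 24h TTL from the task description


class InMemoryDedupeStore:
    """Stand-in for Redis `SET key value NX EX ttl`; hypothetical, for illustration."""

    def __init__(self):
        self._expiry = {}  # key -> absolute expiry timestamp

    def set_if_absent(self, key, ttl, now=None):
        now = time.time() if now is None else now
        expires_at = self._expiry.get(key)
        if expires_at is not None and expires_at > now:
            return False  # key is still live, so this content was seen recently
        self._expiry[key] = now + ttl
        return True


def dedupe_key(content: bytes) -> str:
    # The digest-based suffix is an assumption; only the prefix is from the spec.
    return "stonks:dedupe:" + hashlib.sha256(content).hexdigest()


def is_new_content(store, content: bytes, now=None) -> bool:
    """Return True the first time content is seen within the TTL window."""
    return store.set_if_absent(dedupe_key(content), DEDUPE_TTL_SECONDS, now)
```

Doing the existence check and the write in one atomic operation is what prevents two ingestion workers from both treating the same article as new.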
## Task 4: Write Page 2 — AI Agent Processing and Structured Extraction
- [x] 4.1 Write `docs/intelligence-pipeline-deep-dive/02-ai-agent-processing-and-extraction.md` covering: Document Intelligence Extractor agent (`document-extractor` slug, `services/extractor/main.py` → `services/extractor/client.py`, system prompt, `build_extraction_prompt()` in `services/extractor/prompts.py`), `ExtractionResult` JSON schema with all fields, Global Event Classifier agent (`event-classifier` slug, `services/extractor/event_classifier.py`, `GlobalEvent` schema, anti-hallucination rules), JSON repair pipeline (direct parse → fence stripping → `json-repair` fallback), structural + semantic validation in `services/extractor/schemas.py`, `AgentConfigResolver` mechanism (`services/shared/agent_config.py`, `ai_agents`/`agent_variants` tables, 60s TTL cache), persistence to `document_intelligence` and `document_impact_records`, aggregation job enqueue. Written in narrative prose with links to diagrams and transition to Page 3.
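
The three-stage JSON repair pipeline listed in task 4.1 (direct parse, then fence stripping, then the `json-repair` fallback) might look something like the sketch below. The function names `strip_fences` and `parse_llm_json` are hypothetical and not taken from `services/extractor/`; the fallback uses `repair_json` from the `json_repair` package and is skipped with an error if that package is not installed.

```python
import json
import re

FENCE = "`" * 3  # built indirectly so this example can sit inside a Markdown fence
FENCE_RE = re.compile(
    r"^\s*" + FENCE + r"[a-zA-Z]*\s*\n(.*?)\n?\s*" + FENCE + r"\s*$", re.DOTALL
)


def strip_fences(text: str) -> str:
    """Return the body of a single fenced block, or the text unchanged."""
    match = FENCE_RE.match(text)
    return match.group(1) if match else text


def parse_llm_json(raw: str):
    # Stage 1: try the response as-is.
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    # Stage 2: strip a surrounding Markdown code fence and retry.
    stripped = strip_fences(raw)
    try:
        return json.loads(stripped)
    except json.JSONDecodeError:
        pass
    # Stage 3: fall back to json-repair, if it is available.
    try:
        from json_repair import repair_json
    except ImportError as exc:
        raise ValueError("unparseable JSON and json-repair is not installed") from exc
    return json.loads(repair_json(stripped))
```

Ordering the stages from cheapest to most permissive means well-formed responses never pass through the repair heuristics, which keeps the fallback from silently altering valid output.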
## Task 5: Write Page 3 — Signal Scoring and the WeightedSignal Abstraction
- [x] 5.1 Write `docs/intelligence-pipeline-deep-dive/03-signal-scoring-and-weighted-signals.md` covering: `WeightedSignal` dataclass (`services/aggregation/scoring.py`), composite weight formula (`combined = gate × recency × credibility × (1 + novelty_bonus) × market_context_multiplier`), each component in detail (confidence gate threshold 0.2, recency decay half-lives per window, source credibility clamped [0.1, 1.0], novelty bonus up to 25%, market context volatility boost up to 30% and volume surge boost 15%), sentiment mapping via `sentiment_to_numeric()`, weighted sentiment average computation, three signal layers (Company, Macro weight 0.3, Competitive weight 0.2), runtime toggle via `risk_configs` table. Written in narrative prose with links to diagrams and transition to Page 4.
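
To make the composite weight formula in task 5.1 concrete, here is a minimal sketch of how the factors could multiply together. It assumes a binary confidence gate (0 below the 0.2 threshold, otherwise 1), exponential half-life decay for recency, and a novelty score in [0, 1] scaled to the 25% maximum bonus. Those three choices, along with every function and parameter name, are assumptions for illustration and are not confirmed against `services/aggregation/scoring.py`.

```python
CONFIDENCE_GATE = 0.2      # threshold from the task description
MAX_NOVELTY_BONUS = 0.25   # "novelty bonus up to 25%"


def recency_factor(age_hours: float, half_life_hours: float) -> float:
    """Exponential decay: the weight halves once per half-life (assumed form)."""
    return 0.5 ** (age_hours / half_life_hours)


def combined_weight(confidence, age_hours, half_life_hours, credibility,
                    novelty=0.0, market_context_multiplier=1.0) -> float:
    """combined = gate x recency x credibility x (1 + novelty_bonus) x market context."""
    gate = 1.0 if confidence >= CONFIDENCE_GATE else 0.0  # assumption: binary gate
    recency = recency_factor(age_hours, half_life_hours)
    cred = min(1.0, max(0.1, credibility))                # clamp to [0.1, 1.0]
    novelty_bonus = MAX_NOVELTY_BONUS * min(1.0, max(0.0, novelty))
    return gate * recency * cred * (1.0 + novelty_bonus) * market_context_multiplier


def weighted_sentiment(signals) -> float:
    """Weighted average over (numeric_sentiment, weight) pairs; 0.0 if no weight."""
    pairs = list(signals)
    total = sum(weight for _, weight in pairs)
    if total == 0:
        return 0.0
    return sum(sentiment * weight for sentiment, weight in pairs) / total
```

Because the factors multiply, a signal that fails the gate contributes nothing regardless of how recent or credible it is, while the bonus terms can only scale up a signal that has already passed.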
## Task 6: Write Page 4 — Trend Aggregation and Accumulating Signals
- [x] 6.1 Write `docs/intelligence-pipeline-deep-dive/04-trend-aggregation-and-accumulating-signals.md` covering: Aggregation engine computing TrendSummary across 5 windows (intraday, 1d, 7d, 30d, 90d), trend direction rules (bullish ≥ 0.15, bearish ≤ -0.15, mixed, neutral), contradiction detection (`services/aggregation/contradiction.py`, minority_weight/total_weight), evidence ranking (`rank_evidence()` composite scoring), confidence computation (unique source count caps at 15, log₂ scaling saturates at 7 sources), how consecutive same-direction signals accumulate to escalate decisions (neutral → watch → hold → buy/sell), trend projections (`services/aggregation/projection.py`, macro decay, momentum, divergence detection), persistence to `trend_windows`, `trend_history`, `trend_evidence`, `trend_projections`. Written in narrative prose with links to diagrams and transition to Page 5.
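
The trend direction thresholds and the source-count scaling in task 6.1 can be sketched as below. The ±0.15 thresholds and the minority_weight/total_weight ratio come from the task text. The `mixed_cutoff` value, the order in which "mixed" is tested, and the exact form `log2(1 + n) / log2(8)` (chosen only because it reaches 1.0 at 7 sources) are assumptions, and none of the function names are taken from the aggregation service.

```python
import math

BULLISH_THRESHOLD = 0.15
BEARISH_THRESHOLD = -0.15


def contradiction_ratio(minority_weight: float, total_weight: float) -> float:
    """Share of total signal weight that disagrees with the majority direction."""
    return minority_weight / total_weight if total_weight > 0 else 0.0


def trend_direction(weighted_sentiment: float, contradiction: float,
                    mixed_cutoff: float = 0.4) -> str:
    # mixed_cutoff is a hypothetical value; the spec only names the "mixed" outcome.
    if contradiction >= mixed_cutoff:
        return "mixed"
    if weighted_sentiment >= BULLISH_THRESHOLD:
        return "bullish"
    if weighted_sentiment <= BEARISH_THRESHOLD:
        return "bearish"
    return "neutral"


def source_count_factor(unique_sources: int) -> float:
    """Assumed log2 scaling that saturates at 7 sources: log2(1 + n) / log2(8)."""
    return min(1.0, math.log2(1 + max(0, unique_sources)) / 3.0)
```

Logarithmic scaling rewards the first few independent sources heavily and later ones only slightly, which matches the intuition that the tenth article confirming a story adds little new evidence.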
## Task 7: Write Page 5 — Recommendation Generation and Signal-to-Action Translation
- [x] 7.1 Write `docs/intelligence-pipeline-deep-dive/05-recommendation-generation.md` covering: data quality suppression (`services/recommendation/suppression.py`, 6 checks: extraction confidence < 0.40, staleness > 168h, source diversity < 1, failure rate > 50%, valid docs < 2, quality score < 0.30, plus macro-only and pattern-only safety), eligibility evaluation (`services/recommendation/eligibility.py`, gate checks, action mapping BUY/SELL/HOLD/WATCH, mode escalation informational/paper_eligible/live_eligible), position sizing (base 1% + confidence × strength up to 10%, contradiction and evidence penalties), thesis generation (deterministic + optional LLM rewrite via `thesis-rewriter` agent), risk classification (low/moderate/high/very_high), persistence to `recommendations`, `recommendation_evidence`, `risk_evaluations`. Written in narrative prose with links to diagrams and transition to Page 6.
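
The six data quality checks in task 7.1 lend themselves to a small table-driven sketch. The thresholds below are the ones listed in the task. The `QualityInputs` dataclass, its field names, and the reason strings are hypothetical and do not reflect the actual interface of `services/recommendation/suppression.py`. The macro-only and pattern-only safety checks are omitted because the task does not spell out their rules.

```python
from dataclasses import dataclass


@dataclass
class QualityInputs:
    """Hypothetical input bundle; field names are illustrative only."""

    extraction_confidence: float
    staleness_hours: float
    source_diversity: int
    failure_rate: float
    valid_docs: int
    quality_score: float


def suppression_reasons(q: QualityInputs) -> list:
    """Return the name of every failed check; an empty list means no suppression."""
    checks = [
        (q.extraction_confidence < 0.40, "low_extraction_confidence"),
        (q.staleness_hours > 168, "stale_data"),
        (q.source_diversity < 1, "insufficient_source_diversity"),
        (q.failure_rate > 0.50, "high_failure_rate"),
        (q.valid_docs < 2, "too_few_valid_docs"),
        (q.quality_score < 0.30, "low_quality_score"),
    ]
    return [reason for failed, reason in checks if failed]
```

Collecting every failed reason rather than stopping at the first one makes it easier to explain in the audit trail why a recommendation was withheld.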
## Task 8: Write Page 6 — Trading Engine Decisions and Execution
- [x] 8.1 Write `docs/intelligence-pipeline-deep-dive/06-trading-decisions-and-execution.md` covering: Trading engine decision loop (`services/trading/engine.py`, 5 concurrent tasks: decision loop 60s, stop-loss monitor, performance loop, risk tier scheduler, rebalance scheduler), pre-trade check sequence (circuit breaker → trading window → confidence gate → dedup → declining positions → max positions), position sizing (`services/trading/position_sizer.py`, confidence scaling, risk tier adjustment, correlation diversification, sector exposure, earnings proximity, absolute cap), circuit breaker (`services/trading/circuit_breaker.py`, daily_loss, single_position, volatility triggers, cooldown, Redis state), reserve pool (`services/trading/reserve_pool.py`, profit siphoning 20%, high-water mark 30%, emergency liquidation), risk tier auto-adjustment (`services/trading/risk_tier_controller.py`, Sharpe/drawdown/win-rate evaluation, conservative/moderate/aggressive tiers), order submission flow (TradingDecision → `stonks:queue:broker_orders` → broker adapter → Alpaca). Written in narrative prose with links to diagrams.
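
The pre-trade check sequence in task 8.1 reads naturally as an ordered, short-circuiting list of gates. The sketch below preserves the order given in the task (circuit breaker, trading window, confidence gate, dedup, declining positions, max positions). The context dictionary keys and the `first_failed_check` helper are hypothetical and are not drawn from `services/trading/engine.py`.

```python
from typing import Callable, List, Optional, Tuple

Check = Tuple[str, Callable[[dict], bool]]

# Each entry returns True when the trade may proceed past that gate.
# The ctx keys are illustrative assumptions, not the engine's real fields.
PRE_TRADE_CHECKS: List[Check] = [
    ("circuit_breaker", lambda ctx: not ctx["circuit_breaker_tripped"]),
    ("trading_window", lambda ctx: ctx["market_open"]),
    ("confidence_gate", lambda ctx: ctx["confidence"] >= ctx["min_confidence"]),
    ("dedup", lambda ctx: not ctx["already_decided"]),
    ("declining_positions", lambda ctx: not ctx["position_declining"]),
    ("max_positions", lambda ctx: ctx["open_positions"] < ctx["max_positions"]),
]


def first_failed_check(ctx: dict) -> Optional[str]:
    """Run the gates in order and return the first failure, or None if all pass."""
    for name, passes in PRE_TRADE_CHECKS:
        if not passes(ctx):
            return name
    return None
```

Putting the circuit breaker first means a tripped breaker rejects every candidate trade before any per-ticker evaluation is attempted.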
## Task 9: Update index and verify cross-references
- [x] 9.1 Update `docs/intelligence-pipeline-deep-dive/index.md` to ensure all page links and diagram links are correct and all files exist
- [x] 9.2 Verify all inter-page links within narrative pages resolve correctly and all diagram references point to existing files