Requirements Document
Introduction
This specification defines a 6-page narrative deep-dive document (plus separate Mermaid diagram files) that explains the full intelligence-to-decision pipeline in Stonks Oracle. The document targets a technical reader who wants to understand how raw data enters the system, gets processed by AI agents, produces structured signals, accumulates into trend summaries, and ultimately drives autonomous trading decisions. Unlike the existing service reference and API docs, this deliverable is narrative and explanatory — it tells the story of data flowing through the platform end-to-end, referencing actual code modules, database tables, queue names, and schemas from the codebase.
Glossary
- Deep_Dive_Document: The 6-page Markdown document delivered under `docs/intelligence-pipeline-deep-dive/`, consisting of pages 01 through 06 covering the full intelligence-to-decision pipeline.
- Mermaid_Diagram_File: A standalone Markdown file containing a single Mermaid diagram block, stored alongside the narrative pages in `docs/intelligence-pipeline-deep-dive/diagrams/`.
- Pipeline: The end-to-end data flow from external source ingestion through AI extraction, signal aggregation, recommendation generation, and autonomous trading execution.
- Signal_Layer: One of three independent signal sources (Company, Macro, Competitive) that produce `WeightedSignal` objects merged by the Aggregation_Engine.
- Aggregation_Engine: The `services/aggregation/` module that merges weighted signals from all three layers into `TrendSummary` objects across five time windows.
- Trading_Engine: The `services/trading/engine.py` module that polls recommendations and executes autonomous paper trades through a multi-check decision loop.
- Extractor: The `services/extractor/` module that uses Ollama LLM inference to produce structured JSON intelligence from documents.
- WeightedSignal: The `services.aggregation.scoring.WeightedSignal` dataclass that pairs a document reference with a composite aggregation weight.
- TrendSummary: The `services.shared.schemas.TrendSummary` Pydantic model representing a rolling trend for a ticker across a specific time window.
- Recommendation: The `services.shared.schemas.Recommendation` Pydantic model representing an actionable trade recommendation with action, mode, confidence, thesis, and position sizing.
- Circuit_Breaker: The `services/trading/circuit_breaker.py` safety mechanism that halts trading when risk thresholds (daily loss, single-position loss, volatility clustering) are breached.
Requirements
Requirement 1: Document Structure and File Organization
User Story: As a technical reader, I want the deep-dive organized into clearly separated pages with a consistent structure, so that I can navigate to specific pipeline stages without reading the entire document.
Acceptance Criteria
- THE Deep_Dive_Document SHALL consist of exactly 6 Markdown page files named `01-data-ingestion-and-preparation.md` through `06-trading-decisions-and-execution.md`, stored under `docs/intelligence-pipeline-deep-dive/`.
- THE Deep_Dive_Document SHALL include an `index.md` file that provides a table of contents linking to all 6 pages and all Mermaid_Diagram_Files.
- WHEN a page references a Mermaid diagram, THE Deep_Dive_Document SHALL link to the corresponding Mermaid_Diagram_File stored in `docs/intelligence-pipeline-deep-dive/diagrams/` rather than embedding the diagram inline.
- THE Deep_Dive_Document SHALL include a minimum of 4 separate Mermaid_Diagram_Files covering: (a) the ingestion-to-extraction flow, (b) the three signal layers merging into aggregation, (c) the recommendation generation pipeline, and (d) the trading engine decision loop.
- WHEN a page references a code module, THE Deep_Dive_Document SHALL use the full repository-relative module path (e.g., `services/extractor/prompts.py`) rather than abbreviated names.
- WHEN a page references a database table, THE Deep_Dive_Document SHALL use the exact table name as defined in the PostgreSQL schema (e.g., `document_impact_records`, `trend_windows`).
- WHEN a page references a Redis queue, THE Deep_Dive_Document SHALL use the full key pattern as defined in `services/shared/redis_keys.py` (e.g., `stonks:queue:extraction`).
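Most of Requirement 1 can be checked mechanically. Below is a minimal sketch of such an acceptance check. It assumes Python 3.9+ and assumes the numbered pages match `0[1-6]-<slug>.md`. The function name `check_deep_dive_layout` and its list-of-problems return value are hypothetical, not part of the codebase. The check verifies four things: the six numbered pages, the diagram count, the links in `index.md`, and that no page embeds a Mermaid block inline.

```python
import re
from pathlib import Path

# Numbered pages: 01-<slug>.md through 06-<slug>.md (slug format is an assumption).
PAGE_PATTERN = re.compile(r"^0[1-6]-[a-z0-9-]+\.md$")
EXPECTED_PREFIXES = {f"0{i}" for i in range(1, 7)}
MIN_DIAGRAMS = 4


def check_deep_dive_layout(root: Path) -> list[str]:
    """Return problems found under root; an empty list means the layout passes."""
    problems: list[str] = []

    pages = sorted(p.name for p in root.glob("*.md") if PAGE_PATTERN.match(p.name))
    if len(pages) != 6 or {name[:2] for name in pages} != EXPECTED_PREFIXES:
        problems.append(f"expected pages 01-06 exactly once each, found {pages}")

    # Diagrams must live in diagrams/, not inline in the narrative pages.
    for name in pages:
        if "```mermaid" in (root / name).read_text(encoding="utf-8"):
            problems.append(f"{name} embeds a Mermaid diagram inline")

    diagrams = sorted(p.name for p in (root / "diagrams").glob("*.md"))
    if len(diagrams) < MIN_DIAGRAMS:
        problems.append(f"expected at least {MIN_DIAGRAMS} diagram files, found {len(diagrams)}")

    index = root / "index.md"
    if not index.is_file():
        problems.append("missing index.md")
    else:
        index_text = index.read_text(encoding="utf-8")
        # A substring test is a crude stand-in for parsing Markdown links.
        for target in pages + [f"diagrams/{d}" for d in diagrams]:
            if target not in index_text:
                problems.append(f"index.md does not link to {target}")
    return problems
```

A check like this could run in CI next to the docs build. It complements the narrative review rather than replacing it.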
Requirement 2: Page 1 — Data Ingestion and Preparation
User Story: As a technical reader, I want to understand how raw data enters Stonks Oracle and gets prepared for AI processing, so that I can trace the origin of any signal back to its external source.
Acceptance Criteria
- THE Deep_Dive_Document page 01 SHALL explain the four categories of input data: news articles (Polygon.io), SEC filings (EDGAR), market data (Polygon.io grouped daily and intraday bars), and macro/geopolitical events (macro news APIs).
- THE Deep_Dive_Document page 01 SHALL describe the Scheduler's role in orchestrating ingestion cycles, including the polling interval per source type (`market_api`: 300s, `news_api`: 300s, `filings_api`: 3600s, `macro_news`: 600s), rate limiting, and exponential backoff.
- THE Deep_Dive_Document page 01 SHALL describe the Ingestion worker's adapter dispatch pattern, referencing the adapter classes (`PolygonMarketAdapter`, `PolygonNewsAdapter`, `SECEdgarAdapter`, `MacroNewsAdapter`) in `services/ingestion/`.
- THE Deep_Dive_Document page 01 SHALL explain content deduplication via Redis content-hash markers (`stonks:dedupe:*` with 24-hour TTL) and raw artifact storage in MinIO buckets (`stonks-raw-market`, `stonks-raw-news`, `stonks-raw-filings`).
- THE Deep_Dive_Document page 01 SHALL describe the Parser's role in converting raw HTML/text into normalized documents, including quality scoring with confidence levels (`high`, `medium`, `low`), company mention detection via alias matching, and the routing decision that sends `macro_event` documents to `stonks:queue:macro_classification` instead of `stonks:queue:extraction`.
- THE Deep_Dive_Document page 01 SHALL be written in narrative prose style with explanatory paragraphs, not as a reference table or bullet-point list.
Requirement 3: Page 2 — AI Agent Processing and Structured Extraction
User Story: As a technical reader, I want to understand how the AI agents process documents and produce structured JSON output, so that I can evaluate the extraction quality and understand the schema contract.
Acceptance Criteria
- THE Deep_Dive_Document page 02 SHALL explain the Document Intelligence Extractor agent (`document-extractor` slug), including its entry point (`services/extractor/main.py` → `services/extractor/client.py`), the system prompt, and the user prompt template built by `build_extraction_prompt()` in `services/extractor/prompts.py`.
- THE Deep_Dive_Document page 02 SHALL describe the `ExtractionResult` JSON schema with all fields (summary, companies array with ticker/sentiment/impact_score/impact_horizon/catalyst_type/key_facts/risks/evidence_spans, macro_themes, novelty_score, confidence, extraction_warnings), referencing `services/extractor/schemas.py`.
- THE Deep_Dive_Document page 02 SHALL explain the Global Event Classifier agent (`event-classifier` slug), including its entry point (`services/extractor/event_classifier.py`), the `GlobalEvent` output schema with event_types/severity/affected_regions/affected_sectors/affected_commodities/estimated_duration/confidence, and the anti-hallucination rules that prevent classifying company-specific news as macro events.
- THE Deep_Dive_Document page 02 SHALL describe the JSON repair pipeline (direct parse → markdown fence stripping → `json-repair` library fallback) and the structural plus semantic validation in `services/extractor/schemas.py`, including retry logic with exponential backoff.
- THE Deep_Dive_Document page 02 SHALL explain the `AgentConfigResolver` mechanism (`services/shared/agent_config.py`) that enables hot-swapping models and prompts via the `ai_agents` and `agent_variants` database tables with a 60-second TTL cache.
- THE Deep_Dive_Document page 02 SHALL describe how extraction results are persisted to `document_intelligence` (one row per document) and `document_impact_records` (one row per company mention), and how the extractor enqueues aggregation jobs to `stonks:queue:aggregation`.
- THE Deep_Dive_Document page 02 SHALL be written in narrative prose style with explanatory paragraphs, not as a reference table or bullet-point list.
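The three-stage JSON repair pipeline can be illustrated roughly as follows. This is a sketch under stated assumptions, not the logic in `services/extractor/schemas.py`:

- `parse_llm_json` is a hypothetical name.
- The fence regex is one plausible way to strip a Markdown code fence.
- The last stage calls `repair_json()` from the third-party `json-repair` package. It is imported lazily, so the first two stages work without the package installed.

Structural and semantic validation and the retry loop are deliberately left out.

```python
import json
import re

# Body of a ```json ... ``` (or bare ```) fence anywhere in the model output.
_FENCE_RE = re.compile(r"```(?:json)?\s*\n(.*?)\n?\s*```", re.DOTALL)


def _loads_object(text: str) -> dict:
    """json.loads, but insist on a top-level JSON object."""
    value = json.loads(text)
    if not isinstance(value, dict):
        raise ValueError("expected a JSON object at the top level")
    return value


def parse_llm_json(raw: str) -> dict:
    """Direct parse, then fence stripping, then json-repair as a last resort."""
    # Stage 1: the model returned clean JSON.
    try:
        return _loads_object(raw)
    except json.JSONDecodeError:
        pass

    # Stage 2: the model wrapped the JSON in a Markdown code fence.
    candidate = raw
    match = _FENCE_RE.search(raw)
    if match:
        candidate = match.group(1)
        try:
            return _loads_object(candidate)
        except json.JSONDecodeError:
            pass

    # Stage 3: hand the (possibly fence-stripped) text to the json-repair package.
    try:
        from json_repair import repair_json
    except ImportError as exc:
        raise ValueError("output is not valid JSON and json-repair is not installed") from exc
    try:
        return _loads_object(repair_json(candidate))
    except json.JSONDecodeError as exc:
        raise ValueError("output could not be repaired into JSON") from exc
```

Ordering the stages from cheapest to most permissive keeps well-formed responses on the fast path. It also means the fuzzy repair step only sees output that already failed the stricter parses.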
Requirement 4: Page 3 — Signal Scoring and the WeightedSignal Abstraction
User Story: As a technical reader, I want to understand how raw extraction output gets transformed into weighted signals for decision making, so that I can reason about why certain documents influence trends more than others.
Acceptance Criteria
- THE Deep_Dive_Document page 03 SHALL explain the `WeightedSignal` dataclass (`services/aggregation/scoring.py`) and the composite weight formula: `combined = gate × recency × credibility × (1 + novelty_bonus) × market_context_multiplier`.
- THE Deep_Dive_Document page 03 SHALL describe each weight component in detail: confidence gate (threshold 0.2), recency decay (exponential half-life per window: intraday=2h, 1d=12h, 7d=72h, 30d=240h, 90d=720h), source credibility weighting (clamped [0.1, 1.0] with configurable exponent), novelty bonus (up to 25%), and market context multiplier (volatility boost up to 30%, volume surge boost 15%).
- THE Deep_Dive_Document page 03 SHALL explain how sentiment labels are mapped to numeric values (+1.0 positive, -1.0 negative, 0.0 neutral/mixed) via `sentiment_to_numeric()` and how the weighted sentiment average is computed across all signals.
- THE Deep_Dive_Document page 03 SHALL describe the three signal layers (Company, Macro, Competitive) and how each produces `WeightedSignal` objects that are concatenated into a single list before trend computation, with relative influence controlled by `MACRO_SIGNAL_WEIGHT` (0.3) and `COMPETITIVE_SIGNAL_WEIGHT` (0.2).
- THE Deep_Dive_Document page 03 SHALL explain the runtime toggle mechanism for macro and competitive layers via the `risk_configs` database table, including graceful degradation when a layer is disabled or fails.
- THE Deep_Dive_Document page 03 SHALL be written in narrative prose style with explanatory paragraphs, not as a reference table or bullet-point list.
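The composite weight formula and the weighted sentiment average can be sketched as below. The constants are the values quoted in the criteria. Everything else is an assumption made for illustration:

- The confidence gate is treated as a 0/1 switch.
- Recency uses `0.5 ** (age / half_life)`.
- Novelty and volatility inputs lie in [0, 1] and scale their bonuses linearly.
- The volume surge boost is multiplicative.
- The layer weights multiply each macro or competitive signal's weight.

`SignalInput`, `combined_weight`, `map_sentiment`, `merge_layers`, and `weighted_sentiment` are hypothetical stand-ins for `WeightedSignal` and `sentiment_to_numeric()`, not the real code.

```python
from dataclasses import dataclass

# Values quoted in the spec.
HALF_LIFE_HOURS = {"intraday": 2.0, "1d": 12.0, "7d": 72.0, "30d": 240.0, "90d": 720.0}
CONFIDENCE_GATE = 0.2
MAX_NOVELTY_BONUS = 0.25
MAX_VOLATILITY_BOOST = 0.30
VOLUME_SURGE_BOOST = 0.15
MACRO_SIGNAL_WEIGHT = 0.3
COMPETITIVE_SIGNAL_WEIGHT = 0.2

_SENTIMENT = {"positive": 1.0, "negative": -1.0, "neutral": 0.0, "mixed": 0.0}


def _clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


@dataclass
class SignalInput:
    """Hypothetical per-document input; not the real WeightedSignal dataclass."""
    sentiment: str
    confidence: float
    age_hours: float
    credibility: float
    novelty_score: float = 0.0      # assumed in [0, 1]
    volatility_factor: float = 0.0  # assumed in [0, 1]
    volume_surge: bool = False


def map_sentiment(label: str) -> float:
    """Stand-in for sentiment_to_numeric(): +1 positive, -1 negative, 0 otherwise."""
    return _SENTIMENT.get(label.lower(), 0.0)


def combined_weight(sig: SignalInput, window: str, credibility_exponent: float = 1.0,
                    layer_weight: float = 1.0) -> float:
    gate = 1.0 if sig.confidence >= CONFIDENCE_GATE else 0.0
    recency = 0.5 ** (sig.age_hours / HALF_LIFE_HOURS[window])  # exponential half-life decay
    credibility = _clamp(sig.credibility, 0.1, 1.0) ** credibility_exponent
    novelty_bonus = MAX_NOVELTY_BONUS * _clamp(sig.novelty_score, 0.0, 1.0)
    market = 1.0 + MAX_VOLATILITY_BOOST * _clamp(sig.volatility_factor, 0.0, 1.0)
    if sig.volume_surge:
        market *= 1.0 + VOLUME_SURGE_BOOST
    return gate * recency * credibility * (1.0 + novelty_bonus) * market * layer_weight


def merge_layers(company, macro, competitive, window: str) -> list[tuple[SignalInput, float]]:
    """Concatenate all three layers into one weighted list, scaling macro/competitive."""
    weighted = [(s, combined_weight(s, window)) for s in company]
    weighted += [(s, combined_weight(s, window, layer_weight=MACRO_SIGNAL_WEIGHT)) for s in macro]
    weighted += [(s, combined_weight(s, window, layer_weight=COMPETITIVE_SIGNAL_WEIGHT))
                 for s in competitive]
    return weighted


def weighted_sentiment(weighted: list[tuple[SignalInput, float]]) -> float:
    """Weight-averaged numeric sentiment; 0.0 when every weight is zero."""
    total = sum(w for _, w in weighted)
    if total == 0:
        return 0.0
    return sum(map_sentiment(s.sentiment) * w for s, w in weighted) / total
```

Here is a worked case. A positive company document exactly one half-life old (12h in the `1d` window) carries weight 0.5. A fresh negative macro document carries 0.3 after the layer weight. Their weighted average is (0.5 − 0.3) / 0.8 = 0.25, so the company signal still dominates.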
Requirement 5: Page 4 — Trend Aggregation and Accumulating Signals
User Story: As a technical reader, I want to understand how the aggregation engine merges multiple signals, including consecutive signals pointing in the same direction, into trend summaries that drive progressively larger decisions, so that I can see how accumulating bearish or bullish evidence escalates the system's response.
Acceptance Criteria
- THE Deep_Dive_Document page 04 SHALL explain how the Aggregation_Engine (`services/aggregation/worker.py`) computes `TrendSummary` objects across five time windows (intraday, 1d, 7d, 30d, 90d) by fetching impact records, macro impacts, and competitive signals for a ticker.
- THE Deep_Dive_Document page 04 SHALL describe the trend direction derivation rules: bullish (avg_sentiment ≥ 0.15), bearish (avg_sentiment ≤ -0.15), mixed (contradiction > 0.10 and |avg_sentiment| < 0.30), neutral (otherwise), referencing `derive_trend_direction()` in `services/aggregation/worker.py`.
- THE Deep_Dive_Document page 04 SHALL explain contradiction detection (`services/aggregation/contradiction.py`), including sentiment disagreement analysis and catalyst-level disagreement, and how the contradiction score (minority_weight / total_weight) penalizes trend confidence.
- THE Deep_Dive_Document page 04 SHALL describe how consecutive signals in the same direction accumulate to strengthen trend_strength and confidence. It SHALL explain the evidence ranking mechanism (`rank_evidence()`), which uses composite scoring (weight, impact, recency, confidence). It SHALL also explain the confidence computation, which rewards unique source count (the contribution caps at 0.8, reached at 15 sources) and signal agreement (log₂ scaling, saturating around 7 unique sources).
- THE Deep_Dive_Document page 04 SHALL explain how accumulating bearish signals across multiple documents and time windows escalate the system's response, from a neutral hold to a bearish sell recommendation, and conversely how accumulating bullish signals escalate from watch to buy, using the trend strength and confidence thresholds from the eligibility rules.
- THE Deep_Dive_Document page 04 SHALL describe trend projections (`services/aggregation/projection.py`), including macro decay, momentum, driving factors, and divergence detection.
- THE Deep_Dive_Document page 04 SHALL describe persistence to `trend_windows` (upserted each cycle), `trend_history` (time-series snapshots), `trend_evidence` (per-document rankings), and `trend_projections`.
- THE Deep_Dive_Document page 04 SHALL be written in narrative prose style with explanatory paragraphs, not as a reference table or bullet-point list.
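The direction rules and the contradiction ratio fit in a short sketch. Two assumptions are worth stating:

- The `mixed` check runs before the bullish and bearish thresholds. Otherwise its `|avg_sentiment| < 0.30` bound could never change the outcome.
- Neutral signals are excluded from the contradiction denominator.

`derive_direction`, `contradiction_score`, and `summarize` are hypothetical names, not `derive_trend_direction()` itself.

```python
def derive_direction(avg_sentiment: float, contradiction: float) -> str:
    """Direction rules from page 04; 'mixed' is checked first (assumption)."""
    if contradiction > 0.10 and abs(avg_sentiment) < 0.30:
        return "mixed"
    if avg_sentiment >= 0.15:
        return "bullish"
    if avg_sentiment <= -0.15:
        return "bearish"
    return "neutral"


def contradiction_score(signals: list[tuple[float, float]]) -> float:
    """minority_weight / total_weight over (numeric_sentiment, weight) pairs.

    Neutral (0.0) signals are left out of both sides; that is an assumption.
    """
    positive = sum(w for s, w in signals if s > 0)
    negative = sum(w for s, w in signals if s < 0)
    total = positive + negative
    return min(positive, negative) / total if total > 0 else 0.0


def summarize(signals: list[tuple[float, float]]) -> tuple[float, float, str]:
    """Weighted average sentiment, contradiction score, and derived direction."""
    total_weight = sum(w for _, w in signals)
    avg = sum(s * w for s, w in signals) / total_weight if total_weight > 0 else 0.0
    contradiction = contradiction_score(signals)
    return avg, contradiction, derive_direction(avg, contradiction)
```

This shows the accumulation effect page 04 should walk through. `summarize([(-1.0, 0.1), (0.0, 0.9)])` averages to -0.1, which is neutral. Add one more bearish document with weight 0.5 and the average falls to roughly -0.4, so the direction becomes bearish.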
Requirement 6: Page 5 — Recommendation Generation and Signal-to-Action Translation
User Story: As a technical reader, I want to understand how trend summaries are translated into actionable recommendations with risk classification and thesis generation, so that I can see the decision logic between aggregated intelligence and trading actions.
Acceptance Criteria
- THE Deep_Dive_Document page 05 SHALL explain the data quality suppression layer (`services/recommendation/suppression.py`), including the six suppression checks (extraction confidence < 0.40, evidence staleness > 168h, source diversity < 1, extraction failure rate > 50%, valid document count < 2, data quality score < 0.30) and the safety suppressions for macro-only and pattern-only trend shifts.
- THE Deep_Dive_Document page 05 SHALL describe the eligibility evaluation (`services/recommendation/eligibility.py`), including gate checks (confidence ≥ 0.35, strength ≥ 0.10, contradiction ≤ 0.60, evidence ≥ 2, direction ≠ neutral), action mapping (BUY/SELL for strength ≥ 0.25, HOLD for weaker directional signals, WATCH otherwise), and mode escalation (informational → paper_eligible → live_eligible based on confidence and evidence thresholds).
- THE Deep_Dive_Document page 05 SHALL explain position sizing computation from signal quality: base 1% + confidence × strength scaling up to 10%, with contradiction penalty, evidence count penalty, and max loss percentage scaling.
- THE Deep_Dive_Document page 05 SHALL describe the two-layer thesis generation: deterministic thesis assembly from trend data, and optional LLM rewrite via the `thesis-rewriter` agent (`services/recommendation/thesis_llm.py`) for trading-eligible recommendations.
- THE Deep_Dive_Document page 05 SHALL explain risk classification (low/moderate/high/very_high) based on contradiction score, confidence, evidence count, and mode.
- THE Deep_Dive_Document page 05 SHALL describe persistence to `recommendations`, `recommendation_evidence`, and `risk_evaluations` tables.
- THE Deep_Dive_Document page 05 SHALL be written in narrative prose style with explanatory paragraphs, not as a reference table or bullet-point list.
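The suppression thresholds and eligibility gates on page 05 translate directly into predicates. Below is a minimal sketch, assuming Python 3.9+. The thresholds are those listed above. The following are assumptions:

- the dataclasses and the reason strings
- reading "WATCH otherwise" as the outcome for trends that pass the gates but are neither bullish nor bearish (for example `mixed`)

Mode escalation, position sizing, and the macro-only and pattern-only safety suppressions are omitted because the criteria do not give their exact rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QualityStats:
    """Hypothetical inputs to the six data-quality suppression checks."""
    extraction_confidence: float
    evidence_age_hours: float
    source_diversity: int
    extraction_failure_rate: float  # fraction in [0, 1]
    valid_document_count: int
    data_quality_score: float


def suppression_reasons(q: QualityStats) -> list[str]:
    """Return every triggered suppression; any reason suppresses the recommendation."""
    checks = [
        (q.extraction_confidence < 0.40, "low_extraction_confidence"),
        (q.evidence_age_hours > 168, "stale_evidence"),
        (q.source_diversity < 1, "no_source_diversity"),
        (q.extraction_failure_rate > 0.50, "high_extraction_failure_rate"),
        (q.valid_document_count < 2, "too_few_valid_documents"),
        (q.data_quality_score < 0.30, "low_data_quality"),
    ]
    return [reason for triggered, reason in checks if triggered]


@dataclass
class TrendInput:
    """Hypothetical view of a TrendSummary for eligibility purposes."""
    direction: str  # bullish | bearish | mixed | neutral
    confidence: float
    strength: float
    contradiction: float
    evidence_count: int


def passes_gates(t: TrendInput) -> bool:
    return (t.confidence >= 0.35 and t.strength >= 0.10 and t.contradiction <= 0.60
            and t.evidence_count >= 2 and t.direction != "neutral")


def map_action(t: TrendInput) -> Optional[str]:
    """None means ineligible; otherwise BUY/SELL/HOLD/WATCH per the action mapping."""
    if not passes_gates(t):
        return None
    if t.direction in ("bullish", "bearish"):
        if t.strength >= 0.25:
            return "BUY" if t.direction == "bullish" else "SELL"
        return "HOLD"
    return "WATCH"
```

Returning every triggered suppression reason, rather than only the first, makes the audit trail more informative. This is a design choice in the sketch, not something the spec states.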
Requirement 7: Page 6 — Trading Engine Decisions and Execution
User Story: As a technical reader, I want to understand how the trading engine uses aggregated trend data to make buy/sell/hold decisions, including position sizing, risk evaluation, and circuit breakers, so that I can trace any trade back to its intelligence origin.
Acceptance Criteria
- THE Deep_Dive_Document page 06 SHALL explain the Trading_Engine decision loop (`services/trading/engine.py`), including the five concurrent async tasks: decision loop (60s polling), stop-loss monitor, performance loop, risk tier scheduler, and rebalance scheduler.
- THE Deep_Dive_Document page 06 SHALL describe the pre-trade check sequence in order: circuit breaker check, trading window check, confidence gate (risk-tier minimum), deduplication, declining positions check, and max open positions check, explaining that the first failure short-circuits the evaluation.
- THE Deep_Dive_Document page 06 SHALL explain position sizing (`services/trading/position_sizer.py`), including confidence-based scaling with sample-size-dampened agreement scoring, risk tier adjustment (conservative/moderate/aggressive with specific parameter differences), correlation-aware diversification, sector exposure reduction, earnings proximity adjustment, and the absolute position cap.
- THE Deep_Dive_Document page 06 SHALL describe the Circuit_Breaker mechanism (`services/trading/circuit_breaker.py`), including the three trigger types (daily_loss with emergency drawdown threshold, single_position loss with ticker cooldown, volatility with stop-loss clustering detection), cooldown computation, and Redis state tracking (`stonks:trading:circuit_breaker:*`).
- THE Deep_Dive_Document page 06 SHALL explain the reserve pool mechanism (`services/trading/reserve_pool.py`): profit siphoning (default 20%), high-water mark rebalancing (30% threshold), emergency liquidation, and ledger tracking in `reserve_pool_ledger`.
- THE Deep_Dive_Document page 06 SHALL describe risk tier auto-adjustment (`services/trading/risk_tier_controller.py`), including the evaluation criteria (Sharpe ratio, drawdown, win rate) and the three tier configurations with their parameter differences (min confidence, max position %, stop-loss ATR multiplier, reward/risk ratio, max sector %, max portfolio heat).
- THE Deep_Dive_Document page 06 SHALL explain the order submission flow: `TradingDecision` persistence to `trading_decisions`, order job enqueue to `stonks:queue:broker_orders`, broker adapter risk evaluation, Alpaca paper trading submission, and the full audit trail from signal to broker response.
- THE Deep_Dive_Document page 06 SHALL be written in narrative prose style with explanatory paragraphs, not as a reference table or bullet-point list.
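The short-circuiting pre-trade sequence on page 06 is essentially an ordered list of predicates. This sketch uses a hypothetical `TradeContext` snapshot and hypothetical limits, such as `max_declining_positions`. It shows only the ordering and first-failure behaviour described above, not the real checks in `services/trading/engine.py`.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TradeContext:
    """Hypothetical snapshot of everything the pre-trade checks need."""
    confidence: float
    circuit_breaker_open: bool
    in_trading_window: bool
    risk_tier_min_confidence: float
    already_decided: bool           # a decision for this recommendation already exists
    declining_positions: int
    max_declining_positions: int    # hypothetical limit
    open_positions: int
    max_open_positions: int


Check = Callable[[TradeContext], Optional[str]]

# Ordered as the spec lists them; each returns a failure reason or None.
PRE_TRADE_CHECKS: list[tuple[str, Check]] = [
    ("circuit_breaker", lambda c: "breaker is open" if c.circuit_breaker_open else None),
    ("trading_window", lambda c: None if c.in_trading_window else "outside trading window"),
    ("confidence_gate", lambda c: None if c.confidence >= c.risk_tier_min_confidence
        else "below risk-tier minimum"),
    ("deduplication", lambda c: "already decided" if c.already_decided else None),
    ("declining_positions", lambda c: "too many declining positions"
        if c.declining_positions >= c.max_declining_positions else None),
    ("max_open_positions", lambda c: "position limit reached"
        if c.open_positions >= c.max_open_positions else None),
]


def evaluate_pre_trade(ctx: TradeContext) -> tuple[bool, Optional[str]]:
    """Run checks in order; the first failure short-circuits the evaluation."""
    for name, check in PRE_TRADE_CHECKS:
        reason = check(ctx)
        if reason is not None:
            return False, f"{name}: {reason}"
    return True, None
```

Putting the circuit breaker first means a tripped breaker rejects the trade before any other state is consulted. The returned reason names the single check that stopped the trade, which is the kind of audit detail page 06 should trace.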
Requirement 8: Mermaid Diagram Quality and Separation
User Story: As a technical reader, I want architecture diagrams in separate files that I can render independently, so that I can use them in presentations or embed them in other documents.
Acceptance Criteria
- WHEN a Mermaid_Diagram_File is created, THE Deep_Dive_Document SHALL store the diagram in a standalone Markdown file under `docs/intelligence-pipeline-deep-dive/diagrams/` with a descriptive filename (e.g., `ingestion-to-extraction-flow.md`).
- THE Deep_Dive_Document SHALL include at least 4 Mermaid_Diagram_Files: one for the ingestion-to-extraction pipeline, one for the three-layer signal merging, one for the recommendation generation flow, and one for the trading engine decision loop.
- WHEN a Mermaid diagram references a service, THE Mermaid_Diagram_File SHALL label the service with both its human-readable name and its Python module path (e.g., `Extractor\nservices/extractor/main.py`).
- WHEN a Mermaid diagram references a queue, THE Mermaid_Diagram_File SHALL use the full Redis key pattern (e.g., `stonks:queue:extraction`).
- WHEN a Mermaid diagram references a database table, THE Mermaid_Diagram_File SHALL use the exact PostgreSQL table name.
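Parts of Requirement 8 can be linted automatically. The sketch below is hypothetical tooling, not something the spec mandates. It performs two checks:

- the diagram file contains exactly one Mermaid fence
- queue references such as `queue:extraction` carry the `stonks:` prefix

Service labels and table names still need human review.

```python
import re

# Exactly one ```mermaid fence is expected per diagram file.
_MERMAID_FENCE = re.compile(r"^```mermaid\s*$", re.MULTILINE)
# A queue reference such as queue:extraction that is not preceded by "stonks:".
_SHORT_QUEUE = re.compile(r"(?<!stonks:)\bqueue:[a-z_]+")


def check_diagram_file(text: str) -> list[str]:
    """Return problems found in a Mermaid_Diagram_File's Markdown source."""
    problems: list[str] = []
    fences = len(_MERMAID_FENCE.findall(text))
    if fences != 1:
        problems.append(f"expected exactly one mermaid block, found {fences}")
    for match in _SHORT_QUEUE.finditer(text):
        problems.append(f"queue reference without the stonks: prefix: {match.group(0)}")
    return problems
```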
Requirement 9: Narrative Style and Cross-Referencing
User Story: As a technical reader, I want the document to read as a coherent narrative rather than a reference manual, so that I can build a mental model of the full pipeline without jumping between disconnected sections.
Acceptance Criteria
- THE Deep_Dive_Document SHALL use narrative prose with explanatory paragraphs as the primary writing style, reserving tables and bullet lists for structured data summaries only.
- WHEN a page references content covered in a different page, THE Deep_Dive_Document SHALL include a Markdown link to the relevant page and section.
- THE Deep_Dive_Document SHALL include transitional paragraphs at the end of pages 01 through 05 that preview what the next page covers, creating a continuous narrative flow.
- THE Deep_Dive_Document SHALL reference the existing documentation where appropriate (e.g., `docs/services.md`, `docs/ai-agents.md`, `docs/architecture-data-pipeline.md`, `docs/llm-to-trade-pipeline.md`) for readers who want deeper reference-level detail.
- IF a concept is introduced for the first time, THEN THE Deep_Dive_Document SHALL provide a brief inline explanation before using the concept in subsequent discussion.
Requirement 10: Codebase Accuracy
User Story: As a developer, I want the document to reference actual code modules, database tables, and queue names from the codebase, so that I can use the document as a reliable guide when navigating the source code.
Acceptance Criteria
- THE Deep_Dive_Document SHALL reference code modules using paths that exist in the repository (e.g., `services/aggregation/scoring.py`, `services/trading/circuit_breaker.py`, `services/shared/schemas.py`).
- THE Deep_Dive_Document SHALL reference database tables using names that match the PostgreSQL schema as defined in `infra/migrations/`.
- THE Deep_Dive_Document SHALL reference Redis queue names using the constants defined in `services/shared/redis_keys.py` (e.g., `QUEUE_EXTRACTION`, `QUEUE_AGGREGATION`, `QUEUE_RECOMMENDATION`, `QUEUE_BROKER`).
- THE Deep_Dive_Document SHALL reference Pydantic schema classes using their actual class names from `services/shared/schemas.py` (e.g., `DocumentIntelligence`, `TrendSummary`, `Recommendation`, `GlobalEventSchema`, `CompanyImpact`).
- THE Deep_Dive_Document SHALL reference configuration environment variables using the exact names defined in `services/shared/config.py` and the service-specific configuration sections.
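Requirement 10 can be partially enforced by a script run against the repository. This is a sketch under two assumptions:

- It only looks at backticked paths beginning with `services/`, `infra/`, or `docs/`.
- It treats a `QUEUE_*` constant as defined when the source of `services/shared/redis_keys.py` contains a top-level assignment to that name.

The function names are hypothetical.

```python
import re
from pathlib import Path

# Backticked repository paths, e.g. `services/aggregation/scoring.py` or `infra/migrations/`.
_PATH_REF = re.compile(r"`((?:services|infra|docs)/[^`\s]*)`")
_QUEUE_REF = re.compile(r"`(QUEUE_[A-Z_]+)`")
# Top-level assignment, with or without a type annotation.
_QUEUE_DEF = re.compile(r"^(QUEUE_[A-Z_]+)\s*[:=]", re.MULTILINE)


def missing_paths(markdown: str, repo_root: Path) -> list[str]:
    """Paths mentioned in the document that do not exist under repo_root."""
    refs = sorted(set(_PATH_REF.findall(markdown)))
    return [ref for ref in refs if not (repo_root / ref.rstrip("/")).exists()]


def undefined_queue_constants(markdown: str, redis_keys_source: str) -> list[str]:
    """QUEUE_* names referenced in the document but not assigned in redis_keys.py."""
    referenced = set(_QUEUE_REF.findall(markdown))
    defined = set(_QUEUE_DEF.findall(redis_keys_source))
    return sorted(referenced - defined)
```

Table names in `infra/migrations/` and environment variables in `services/shared/config.py` could be checked the same way. That would need one extraction regex per source, which the spec does not define.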