
Celes Renata 88ad1e8d99 feat: comprehensive docs, unit tests, docker-compose app services

- Add scheduler and ingestion unit tests (test_scheduler_unit.py, test_ingestion_unit.py)
- Add all 13 app services + dashboard to docker-compose.yml
- Add full documentation suite: API reference, Helm reference, Docker deployment guide,
  3 architecture diagrams (K8s, Docker Compose, data pipeline), AI agent guide,
  backup/restore guide, observability/metrics reference, per-service docs
- Add intelligence pipeline deep-dive docs with Mermaid diagrams
- Update README with documentation index and links
- Add specs for comprehensive-quality-docs, intelligence-pipeline-deep-dive,
  sanitized-pipeline-docs

2026-04-22 02:56:41 +00:00


Requirements Document

Introduction

This specification defines a 6-page narrative deep-dive document (plus separate Mermaid diagram files) that explains the full intelligence-to-decision pipeline in Stonks Oracle. The document targets a technical reader who wants to understand how raw data enters the system, gets processed by AI agents, produces structured signals, accumulates into trend summaries, and ultimately drives autonomous trading decisions. Unlike the existing service reference and API docs, this deliverable is narrative and explanatory — it tells the story of data flowing through the platform end-to-end, referencing actual code modules, database tables, queue names, and schemas from the codebase.

Glossary

Deep_Dive_Document: The 6-page Markdown document delivered under docs/intelligence-pipeline-deep-dive/, consisting of pages 01 through 06 covering the full intelligence-to-decision pipeline.
Mermaid_Diagram_File: A standalone Markdown file containing a single Mermaid diagram block, stored alongside the narrative pages in docs/intelligence-pipeline-deep-dive/diagrams/.
Pipeline: The end-to-end data flow from external source ingestion through AI extraction, signal aggregation, recommendation generation, and autonomous trading execution.
Signal_Layer: One of three independent signal sources (Company, Macro, Competitive) that produce WeightedSignal objects merged by the Aggregation_Engine.
Aggregation_Engine: The services/aggregation/ module that merges weighted signals from all three layers into TrendSummary objects across five time windows.
Trading_Engine: The services/trading/engine.py module that polls recommendations and executes autonomous paper trades through a multi-check decision loop.
Extractor: The services/extractor/ module that uses Ollama LLM inference to produce structured JSON intelligence from documents.
WeightedSignal: The services.aggregation.scoring.WeightedSignal dataclass that pairs a document reference with a composite aggregation weight.
TrendSummary: The services.shared.schemas.TrendSummary Pydantic model representing a rolling trend for a ticker across a specific time window.
Recommendation: The services.shared.schemas.Recommendation Pydantic model representing an actionable trade recommendation with action, mode, confidence, thesis, and position sizing.
Circuit_Breaker: The services/trading/circuit_breaker.py safety mechanism that halts trading when risk thresholds (daily loss, single-position loss, volatility clustering) are breached.

Requirements

Requirement 1: Document Structure and File Organization

User Story: As a technical reader, I want the deep-dive organized into clearly separated pages with a consistent structure, so that I can navigate to specific pipeline stages without reading the entire document.

Acceptance Criteria

THE Deep_Dive_Document SHALL consist of exactly 6 Markdown page files named 01-data-ingestion-and-preparation.md through 06-trading-decisions-and-execution.md, stored under docs/intelligence-pipeline-deep-dive/.
THE Deep_Dive_Document SHALL include an index.md file that provides a table of contents linking to all 6 pages and all Mermaid_Diagram_Files.
WHEN a page references a Mermaid diagram, THE Deep_Dive_Document SHALL link to the corresponding Mermaid_Diagram_File stored in docs/intelligence-pipeline-deep-dive/diagrams/ rather than embedding the diagram inline.
THE Deep_Dive_Document SHALL include a minimum of 4 separate Mermaid_Diagram_Files covering: (a) the ingestion-to-extraction flow, (b) the three signal layers merging into aggregation, (c) the recommendation generation pipeline, and (d) the trading engine decision loop.
WHEN a page references a code module, THE Deep_Dive_Document SHALL use the full repository path to the module file (e.g., services/extractor/prompts.py) rather than abbreviated names.
WHEN a page references a database table, THE Deep_Dive_Document SHALL use the exact table name as defined in the PostgreSQL schema (e.g., document_impact_records, trend_windows).
WHEN a page references a Redis queue, THE Deep_Dive_Document SHALL use the full key pattern as defined in services/shared/redis_keys.py (e.g., stonks:queue:extraction).

Requirement 2: Page 1 — Data Ingestion and Preparation

User Story: As a technical reader, I want to understand how raw data enters Stonks Oracle and gets prepared for AI processing, so that I can trace the origin of any signal back to its external source.

Acceptance Criteria

THE Deep_Dive_Document page 01 SHALL explain the four categories of input data: news articles (Polygon.io), SEC filings (EDGAR), market data (Polygon.io grouped daily and intraday bars), and macro/geopolitical events (macro news APIs).
THE Deep_Dive_Document page 01 SHALL describe the Scheduler's role in orchestrating ingestion cycles, including the polling interval for each source type (market_api: 300s, news_api: 300s, filings_api: 3600s, macro_news: 600s), rate limiting, and exponential backoff.
THE Deep_Dive_Document page 01 SHALL describe the Ingestion worker's adapter dispatch pattern, referencing the adapter classes (PolygonMarketAdapter, PolygonNewsAdapter, SECEdgarAdapter, MacroNewsAdapter) in services/ingestion/.
THE Deep_Dive_Document page 01 SHALL explain content deduplication via Redis content-hash markers (stonks:dedupe:* with 24-hour TTL) and raw artifact storage in MinIO buckets (stonks-raw-market, stonks-raw-news, stonks-raw-filings).
THE Deep_Dive_Document page 01 SHALL describe the Parser's role in converting raw HTML/text into normalized documents, including quality scoring with confidence levels (high, medium, low), company mention detection via alias matching, and the routing decision that sends macro_event documents to stonks:queue:macro_classification instead of stonks:queue:extraction.
THE Deep_Dive_Document page 01 SHALL be written in narrative prose style with explanatory paragraphs, not as a reference table or bullet-point list.
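To make the page 01 mechanics concrete, the following is a minimal Python sketch of the per-source polling cadence, an exponential backoff, a content-hash dedupe key, and the parser's queue routing decision. The intervals, the stonks:dedupe: prefix, the 24-hour TTL, and the two queue names come from the criteria above. The function names, the SHA-256 hash, and the backoff base and cap are hypothetical assumptions, not the actual code in services/ingestion/ or the Scheduler.

```python
import hashlib

# Polling interval per source type, in seconds (values from Requirement 2).
CADENCE_SECONDS = {
    "market_api": 300,
    "news_api": 300,
    "filings_api": 3600,
    "macro_news": 600,
}

# TTL for stonks:dedupe:* markers: 24 hours.
DEDUPE_TTL_SECONDS = 24 * 60 * 60


def is_due(source_type: str, last_run_ts: float, now_ts: float) -> bool:
    """Return True once the source's polling interval has elapsed."""
    return now_ts - last_run_ts >= CADENCE_SECONDS[source_type]


def backoff_delay(attempt: int, base_seconds: float = 2.0, cap_seconds: float = 300.0) -> float:
    """Hypothetical exponential backoff: base * 2**attempt, capped at cap_seconds."""
    return min(cap_seconds, base_seconds * (2 ** attempt))


def dedupe_key(content: str) -> str:
    """Hypothetical content-hash marker key (the real hash and suffix may differ).

    With redis-py, a first-seen check could be r.set(key, 1, nx=True, ex=DEDUPE_TTL_SECONDS),
    which returns None when the key already exists.
    """
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    return f"stonks:dedupe:{digest}"


def route_document(document_type: str) -> str:
    """macro_event documents skip extraction and go to macro classification."""
    if document_type == "macro_event":
        return "stonks:queue:macro_classification"
    return "stonks:queue:extraction"
```

Identical content always maps to the same marker key, so a second arrival within the TTL window can be dropped before any MinIO write or parsing work.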

Requirement 3: Page 2 — AI Agent Processing and Structured Extraction

User Story: As a technical reader, I want to understand how the AI agents process documents and produce structured JSON output, so that I can evaluate the extraction quality and understand the schema contract.

Acceptance Criteria

THE Deep_Dive_Document page 02 SHALL explain the Document Intelligence Extractor agent (document-extractor slug), including its entry point (services/extractor/main.py → services/extractor/client.py), the system prompt, and the user prompt template built by build_extraction_prompt() in services/extractor/prompts.py.
THE Deep_Dive_Document page 02 SHALL describe the ExtractionResult JSON schema with all fields (summary, companies array with ticker/sentiment/impact_score/impact_horizon/catalyst_type/key_facts/risks/evidence_spans, macro_themes, novelty_score, confidence, extraction_warnings), referencing services/extractor/schemas.py.
THE Deep_Dive_Document page 02 SHALL explain the Global Event Classifier agent (event-classifier slug), including its entry point (services/extractor/event_classifier.py), the GlobalEvent output schema with event_types/severity/affected_regions/affected_sectors/affected_commodities/estimated_duration/confidence, and the anti-hallucination rules that prevent classifying company-specific news as macro events.
THE Deep_Dive_Document page 02 SHALL describe the JSON repair pipeline (direct parse → markdown fence stripping → json-repair library fallback) and the structural plus semantic validation in services/extractor/schemas.py, including retry logic with exponential backoff.
THE Deep_Dive_Document page 02 SHALL explain the AgentConfigResolver mechanism (services/shared/agent_config.py) that enables hot-swapping models and prompts via the ai_agents and agent_variants database tables with a 60-second TTL cache.
THE Deep_Dive_Document page 02 SHALL describe how extraction results are persisted to document_intelligence (one row per document) and document_impact_records (one row per company mention), and how the extractor enqueues aggregation jobs to stonks:queue:aggregation.
THE Deep_Dive_Document page 02 SHALL be written in narrative prose style with explanatory paragraphs, not as a reference table or bullet-point list.
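The three-stage JSON repair pipeline and retry loop that page 02 must describe can be sketched as follows. This is a minimal illustration, not the code in services/extractor/: the function names, the fence-matching regex, the retry count, and the 1s/2s/4s backoff schedule are assumptions. The third stage uses the third-party json-repair package (imported as json_repair) only if it is installed, and the structural and semantic validation against services/extractor/schemas.py is not modeled here.

```python
import json
import re
import time
from typing import Any, Callable

# Three backticks, built at runtime so this snippet nests cleanly inside Markdown.
_TICKS = "`" * 3
_FENCE_RE = re.compile(
    r"^\s*" + _TICKS + r"(?:json)?\s*\n(.*?)\n\s*" + _TICKS + r"\s*$", re.DOTALL
)


def strip_markdown_fence(text: str) -> str:
    """Return the body of a fenced block, or the text unchanged if there is no fence."""
    match = _FENCE_RE.match(text)
    return match.group(1) if match else text


def parse_llm_json(text: str, use_repair_library: bool = True) -> Any:
    """Direct parse, then fence stripping, then an optional json-repair fallback."""
    stripped = strip_markdown_fence(text)
    for candidate in (text, stripped):
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue
    if use_repair_library:
        try:
            from json_repair import repair_json  # third-party "json-repair" package
        except ImportError:
            pass
        else:
            try:
                return json.loads(repair_json(stripped))
            except ValueError:
                pass
    raise ValueError("LLM output could not be parsed as JSON")


def extract_with_retry(
    call_llm: Callable[[], str],
    parse: Callable[[str], Any] = parse_llm_json,
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Retry the model call with exponential backoff (1s, 2s, 4s, ...) between attempts."""
    last_error = None
    for attempt in range(max_attempts):
        try:
            return parse(call_llm())
        except ValueError as exc:
            last_error = exc
            if attempt < max_attempts - 1:
                sleep(base_delay * (2 ** attempt))
    raise ValueError(f"extraction failed after {max_attempts} attempts") from last_error
```

Injecting call_llm and sleep keeps the retry policy testable without a running Ollama instance.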

Requirement 4: Page 3 — Signal Scoring and the WeightedSignal Abstraction

User Story: As a technical reader, I want to understand how raw extraction output gets transformed into weighted signals for decision making, so that I can reason about why certain documents influence trends more than others.

Acceptance Criteria

THE Deep_Dive_Document page 03 SHALL explain the WeightedSignal dataclass (services/aggregation/scoring.py) and the composite weight formula: combined = gate × recency × credibility × (1 + novelty_bonus) × market_context_multiplier.
THE Deep_Dive_Document page 03 SHALL describe each weight component in detail: confidence gate (threshold 0.2), recency decay (exponential half-life per window: intraday=2h, 1d=12h, 7d=72h, 30d=240h, 90d=720h), source credibility weighting (clamped [0.1, 1.0] with configurable exponent), novelty bonus (up to 25%), and market context multiplier (volatility boost up to 30%, volume surge boost 15%).
THE Deep_Dive_Document page 03 SHALL explain how sentiment labels are mapped to numeric values (+1.0 positive, -1.0 negative, 0.0 neutral/mixed) via sentiment_to_numeric() and how the weighted sentiment average is computed across all signals.
THE Deep_Dive_Document page 03 SHALL describe the three signal layers (Company, Macro, Competitive) and how each produces WeightedSignal objects that are concatenated into a single list before trend computation, with relative influence controlled by MACRO_SIGNAL_WEIGHT (0.3) and COMPETITIVE_SIGNAL_WEIGHT (0.2).
THE Deep_Dive_Document page 03 SHALL explain the runtime toggle mechanism for macro and competitive layers via the risk_configs database table, including graceful degradation when a layer is disabled or fails.
THE Deep_Dive_Document page 03 SHALL be written in narrative prose style with explanatory paragraphs, not as a reference table or bullet-point list.
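The composite weight formula, sentiment mapping, and layer merge described for page 03 can be sketched in Python as below. The half-lives, the 0.2 gate, the [0.1, 1.0] credibility clamp, the 25%/30%/15% caps, and the 0.3/0.2 layer weights come from the criteria above. Everything else is an assumption for illustration: the WeightedSignal fields here are hypothetical and not those of services/aggregation/scoring.py, the gate is modeled as a hard 0/1 step, the novelty bonus as 0.25 × novelty, the volatility boost as 0.30 × a normalized volatility score multiplied by a flat 1.15 volume-surge factor, the credibility exponent defaults to 1.0, and layer weights are applied by multiplying each signal's weight.

```python
from dataclasses import dataclass, replace
from typing import List

# Recency half-lives in hours per window (values from Requirement 4).
HALF_LIFE_HOURS = {"intraday": 2.0, "1d": 12.0, "7d": 72.0, "30d": 240.0, "90d": 720.0}
CONFIDENCE_GATE = 0.2
MACRO_SIGNAL_WEIGHT = 0.3
COMPETITIVE_SIGNAL_WEIGHT = 0.2


def _clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


@dataclass(frozen=True)
class WeightedSignal:
    """Hypothetical stand-in for services.aggregation.scoring.WeightedSignal."""
    document_id: str
    sentiment: str  # "positive", "negative", "neutral", or "mixed"
    weight: float


def composite_weight(
    confidence: float,
    age_hours: float,
    window: str,
    credibility: float,
    novelty: float,
    volatility: float = 0.0,
    volume_surge: bool = False,
    credibility_exponent: float = 1.0,
) -> float:
    """combined = gate * recency * credibility * (1 + novelty_bonus) * market_context_multiplier."""
    gate = 1.0 if confidence >= CONFIDENCE_GATE else 0.0
    recency = 0.5 ** (age_hours / HALF_LIFE_HOURS[window])  # halves every half-life
    cred = _clamp(credibility, 0.1, 1.0) ** credibility_exponent
    novelty_bonus = 0.25 * _clamp(novelty, 0.0, 1.0)  # at most +25%
    market = (1.0 + 0.30 * _clamp(volatility, 0.0, 1.0)) * (1.15 if volume_surge else 1.0)
    return gate * recency * cred * (1.0 + novelty_bonus) * market


def sentiment_to_numeric(label: str) -> float:
    """+1.0 positive, -1.0 negative, 0.0 for neutral, mixed, or anything else."""
    return {"positive": 1.0, "negative": -1.0}.get(label, 0.0)


def weighted_sentiment(signals: List[WeightedSignal]) -> float:
    """Weight-averaged numeric sentiment; 0.0 when there is no weight at all."""
    total = sum(s.weight for s in signals)
    if total == 0:
        return 0.0
    return sum(sentiment_to_numeric(s.sentiment) * s.weight for s in signals) / total


def merge_layers(
    company: List[WeightedSignal],
    macro: List[WeightedSignal],
    competitive: List[WeightedSignal],
) -> List[WeightedSignal]:
    """Concatenate the three layers into one list, scaling macro and competitive weights."""
    return (
        list(company)
        + [replace(s, weight=s.weight * MACRO_SIGNAL_WEIGHT) for s in macro]
        + [replace(s, weight=s.weight * COMPETITIVE_SIGNAL_WEIGHT) for s in competitive]
    )
```

For example, a fully credible, high-confidence document that is exactly 12 hours old contributes half of its fresh weight to the 1d window, but almost all of it to the 90d window.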

Requirement 5: Page 4 — Trend Aggregation and Accumulating Signals

User Story: As a technical reader, I want to understand how the aggregation engine merges multiple signals, including consecutive signals pointing in the same direction, to produce trend summaries that drive increasingly decisive actions, so that I can see how accumulating bearish or bullish evidence escalates the system's response.

Acceptance Criteria

THE Deep_Dive_Document page 04 SHALL explain how the Aggregation_Engine (services/aggregation/worker.py) computes TrendSummary objects across five time windows (intraday, 1d, 7d, 30d, 90d) by fetching impact records, macro impacts, and competitive signals for a ticker.
THE Deep_Dive_Document page 04 SHALL describe the trend direction derivation rules: bullish (avg_sentiment ≥ 0.15), bearish (avg_sentiment ≤ -0.15), mixed (contradiction > 0.10 and |avg_sentiment| < 0.30), neutral (otherwise), referencing derive_trend_direction() in services/aggregation/worker.py.
THE Deep_Dive_Document page 04 SHALL explain contradiction detection (services/aggregation/contradiction.py), including sentiment disagreement analysis and catalyst-level disagreement, and how the contradiction score (minority_weight / total_weight) penalizes trend confidence.
THE Deep_Dive_Document page 04 SHALL describe how consecutive signals in the same direction accumulate to strengthen trend_strength and confidence, explaining the evidence ranking mechanism (rank_evidence()) that uses composite scoring (weight, impact, recency, confidence) and the confidence computation that rewards unique source count (capped at 15 sources, contributing at most 0.8) and signal agreement (log₂ scaling that saturates at around 7 unique sources).
THE Deep_Dive_Document page 04 SHALL explain how accumulating bearish signals across multiple documents and time windows escalate the system's response — from a neutral hold to a bearish sell recommendation — and conversely how accumulating bullish signals escalate from watch to buy, using the trend strength and confidence thresholds from the eligibility rules.
THE Deep_Dive_Document page 04 SHALL describe trend projections (services/aggregation/projection.py), including macro decay, momentum, driving factors, and divergence detection.
THE Deep_Dive_Document page 04 SHALL describe persistence to trend_windows (upserted each cycle), trend_history (time-series snapshots), trend_evidence (per-document rankings), and trend_projections.
THE Deep_Dive_Document page 04 SHALL be written in narrative prose style with explanatory paragraphs, not as a reference table or bullet-point list.
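The direction rules and the contradiction ratio that page 04 must explain can be sketched as follows. The thresholds and the minority_weight / total_weight ratio come from the criteria above; the signatures are hypothetical and are not those of services/aggregation/worker.py or services/aggregation/contradiction.py. Two readings are assumptions: the minority side is taken as the smaller of the positive and negative weight totals (with neutral signals counted in the total), and "mixed" is evaluated first, since if bullish were checked first the |avg_sentiment| < 0.30 bound could never change the outcome.

```python
from typing import Iterable, Tuple


def contradiction_score(signals: Iterable[Tuple[float, float]]) -> float:
    """minority_weight / total_weight over (numeric_sentiment, weight) pairs.

    Assumption: the minority side is whichever of the positive or negative
    weight totals is smaller, and total_weight includes neutral signals.
    """
    positive = negative = total = 0.0
    for sentiment, weight in signals:
        total += weight
        if sentiment > 0:
            positive += weight
        elif sentiment < 0:
            negative += weight
    if total == 0:
        return 0.0
    return min(positive, negative) / total


def derive_trend_direction(avg_sentiment: float, contradiction: float) -> str:
    """Bullish/bearish/mixed/neutral using the thresholds listed in Requirement 5.

    Assumption: "mixed" is tested first; otherwise its |avg_sentiment| < 0.30
    bound could never change the outcome.
    """
    if contradiction > 0.10 and abs(avg_sentiment) < 0.30:
        return "mixed"
    if avg_sentiment >= 0.15:
        return "bullish"
    if avg_sentiment <= -0.15:
        return "bearish"
    return "neutral"
```

Under these assumptions, three equally weighted negative documents and one positive document give avg_sentiment = -0.5 and contradiction = 0.25; because |avg_sentiment| is at least 0.30, the mixed rule does not fire and the trend is bearish. Each additional agreeing document pushes the average further from zero and shrinks the minority share.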

Requirement 6: Page 5 — Recommendation Generation and Signal-to-Action Translation

User Story: As a technical reader, I want to understand how trend summaries are translated into actionable recommendations with risk classification and thesis generation, so that I can see the decision logic between aggregated intelligence and trading actions.

Acceptance Criteria

THE Deep_Dive_Document page 05 SHALL explain the data quality suppression layer (services/recommendation/suppression.py), including the six suppression checks (extraction confidence < 0.40, evidence staleness > 168h, source diversity < 1, extraction failure rate > 50%, valid document count < 2, data quality score < 0.30) and the safety suppressions for macro-only and pattern-only trend shifts.
THE Deep_Dive_Document page 05 SHALL describe the eligibility evaluation (services/recommendation/eligibility.py), including gate checks (confidence ≥ 0.35, strength ≥ 0.10, contradiction ≤ 0.60, evidence ≥ 2, direction ≠ neutral), action mapping (BUY/SELL for strength ≥ 0.25, HOLD for weaker directional signals, WATCH otherwise), and mode escalation (informational → paper_eligible → live_eligible based on confidence and evidence thresholds).
THE Deep_Dive_Document page 05 SHALL explain position sizing computation from signal quality: base 1% + confidence × strength scaling up to 10%, with contradiction penalty, evidence count penalty, and max loss percentage scaling.
THE Deep_Dive_Document page 05 SHALL describe the two-layer thesis generation: deterministic thesis assembly from trend data, and optional LLM rewrite via the thesis-rewriter agent (services/recommendation/thesis_llm.py) for trading-eligible recommendations.
THE Deep_Dive_Document page 05 SHALL explain risk classification (low/moderate/high/very_high) based on contradiction score, confidence, evidence count, and mode.
THE Deep_Dive_Document page 05 SHALL describe persistence to recommendations, recommendation_evidence, and risk_evaluations tables.
THE Deep_Dive_Document page 05 SHALL be written in narrative prose style with explanatory paragraphs, not as a reference table or bullet-point list.
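The six suppression checks, the eligibility gates, and the action mapping for page 05 can be sketched in Python as below. All thresholds come from the criteria above; the dataclasses, field names, and reason strings are hypothetical and are not the structures in services/recommendation/suppression.py or services/recommendation/eligibility.py. The macro-only and pattern-only safety suppressions, mode escalation, and position sizing are not modeled because their exact thresholds are not stated here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QualityInputs:
    """Hypothetical inputs to the six data quality suppression checks."""
    extraction_confidence: float
    evidence_age_hours: float
    source_diversity: int
    extraction_failure_rate: float
    valid_document_count: int
    data_quality_score: float


def suppression_reasons(q: QualityInputs) -> List[str]:
    """Return every failed check; an empty list means the recommendation is not suppressed."""
    checks = [
        (q.extraction_confidence < 0.40, "low_extraction_confidence"),
        (q.evidence_age_hours > 168, "stale_evidence"),
        (q.source_diversity < 1, "no_source_diversity"),
        (q.extraction_failure_rate > 0.50, "high_extraction_failure_rate"),
        (q.valid_document_count < 2, "too_few_valid_documents"),
        (q.data_quality_score < 0.30, "low_data_quality"),
    ]
    return [reason for failed, reason in checks if failed]


@dataclass
class TrendInputs:
    """Hypothetical subset of TrendSummary fields used by the eligibility gates."""
    direction: str  # "bullish", "bearish", "mixed", or "neutral"
    confidence: float
    strength: float
    contradiction: float
    evidence_count: int


def passes_gates(t: TrendInputs) -> bool:
    """All five gates must hold for the trend to be eligible."""
    return (
        t.confidence >= 0.35
        and t.strength >= 0.10
        and t.contradiction <= 0.60
        and t.evidence_count >= 2
        and t.direction != "neutral"
    )


def map_action(t: TrendInputs) -> str:
    """BUY/SELL for strength >= 0.25, HOLD for weaker directional trends, WATCH otherwise."""
    if t.direction in ("bullish", "bearish"):
        if t.strength >= 0.25:
            return "BUY" if t.direction == "bullish" else "SELL"
        return "HOLD"
    return "WATCH"
```

In this sketch a bearish trend at strength 0.15 maps to HOLD, and the same trend at strength 0.40 maps to SELL, which is the escalation page 04 is expected to set up.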

Requirement 7: Page 6 — Trading Engine Decisions and Execution

User Story: As a technical reader, I want to understand how the trading engine uses aggregated trend data to make buy/sell/hold decisions, including position sizing, risk evaluation, and circuit breakers, so that I can trace any trade back to its intelligence origin.

Acceptance Criteria

THE Deep_Dive_Document page 06 SHALL explain the Trading_Engine decision loop (services/trading/engine.py), including the five concurrent async tasks: decision loop (60s polling), stop-loss monitor, performance loop, risk tier scheduler, and rebalance scheduler.
THE Deep_Dive_Document page 06 SHALL describe the pre-trade check sequence in order: circuit breaker check, trading window check, confidence gate (risk-tier minimum), deduplication, declining positions check, and max open positions check, explaining that the first failure short-circuits the evaluation.
THE Deep_Dive_Document page 06 SHALL explain position sizing (services/trading/position_sizer.py), including confidence-based scaling with sample-size-dampened agreement scoring, risk tier adjustment (conservative/moderate/aggressive with specific parameter differences), correlation-aware diversification, sector exposure reduction, earnings proximity adjustment, and the absolute position cap.
THE Deep_Dive_Document page 06 SHALL describe the Circuit_Breaker mechanism (services/trading/circuit_breaker.py), including the three trigger types (daily_loss with emergency drawdown threshold, single_position loss with ticker cooldown, volatility with stop-loss clustering detection), cooldown computation, and Redis state tracking (stonks:trading:circuit_breaker:*).
THE Deep_Dive_Document page 06 SHALL explain the reserve pool mechanism (services/trading/reserve_pool.py): profit siphoning (default 20%), high-water mark rebalancing (30% threshold), emergency liquidation, and ledger tracking in reserve_pool_ledger.
THE Deep_Dive_Document page 06 SHALL describe risk tier auto-adjustment (services/trading/risk_tier_controller.py), including the evaluation criteria (Sharpe ratio, drawdown, win rate) and the three tier configurations with their parameter differences (min confidence, max position %, stop-loss ATR multiplier, reward/risk ratio, max sector %, max portfolio heat).
THE Deep_Dive_Document page 06 SHALL explain the order submission flow: TradingDecision persistence to trading_decisions, order job enqueue to stonks:queue:broker_orders, broker adapter risk evaluation, Alpaca paper trading submission, and the full audit trail from signal to broker response.
THE Deep_Dive_Document page 06 SHALL be written in narrative prose style with explanatory paragraphs, not as a reference table or bullet-point list.
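The ordered, short-circuiting pre-trade check sequence and the default 20% profit siphon described for page 06 can be sketched as follows. The check order and the 20% default come from the criteria above. The context keys, the interpretation of the deduplication check (ticker already pending) and of the declining positions check (ticker flagged as declining), and the assumption that the siphon applies only to positive realized profit are all hypothetical; this is not the code in services/trading/engine.py or services/trading/reserve_pool.py.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Context = Dict[str, Any]
Check = Tuple[str, Callable[[Context], bool]]

# Order follows Requirement 7; the context keys are hypothetical.
PRE_TRADE_CHECKS: List[Check] = [
    ("circuit_breaker", lambda c: not c["circuit_breaker_tripped"]),
    ("trading_window", lambda c: c["in_trading_window"]),
    ("confidence_gate", lambda c: c["confidence"] >= c["tier_min_confidence"]),
    ("deduplication", lambda c: c["ticker"] not in c["pending_tickers"]),
    ("declining_positions", lambda c: c["ticker"] not in c["declining_tickers"]),
    ("max_open_positions", lambda c: c["open_positions"] < c["max_open_positions"]),
]


def first_failed_check(ctx: Context, checks: List[Check] = PRE_TRADE_CHECKS) -> Optional[str]:
    """Evaluate checks in order and stop at the first failure (short-circuit)."""
    for name, passes in checks:
        if not passes(ctx):
            return name
    return None


def siphon_profit(realized_pnl: float, siphon_pct: float = 0.20) -> float:
    """Amount moved to the reserve pool: a share of positive realized profit only."""
    return realized_pnl * siphon_pct if realized_pnl > 0 else 0.0
```

Returning the name of the first failed check gives the audit trail a single, specific reason for every skipped trade, and later checks are never evaluated once one fails.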

Requirement 8: Mermaid Diagram Quality and Separation

User Story: As a technical reader, I want architecture diagrams in separate files that I can render independently, so that I can use them in presentations or embed them in other documents.

Acceptance Criteria

WHEN a Mermaid_Diagram_File is created, THE Deep_Dive_Document SHALL store the diagram in a standalone Markdown file under docs/intelligence-pipeline-deep-dive/diagrams/ with a descriptive filename (e.g., ingestion-to-extraction-flow.md).
THE Deep_Dive_Document SHALL include at least 4 Mermaid_Diagram_Files: one for the ingestion-to-extraction pipeline, one for the three-layer signal merging, one for the recommendation generation flow, and one for the trading engine decision loop.
WHEN a Mermaid diagram references a service, THE Mermaid_Diagram_File SHALL label the service with both its human-readable name and its Python module path (e.g., Extractor\nservices/extractor/main.py).
WHEN a Mermaid diagram references a queue, THE Mermaid_Diagram_File SHALL use the full Redis key pattern (e.g., stonks:queue:extraction).
WHEN a Mermaid diagram references a database table, THE Mermaid_Diagram_File SHALL use the exact PostgreSQL table name.

Requirement 9: Narrative Style and Cross-Referencing

User Story: As a technical reader, I want the document to read as a coherent narrative rather than a reference manual, so that I can build a mental model of the full pipeline without jumping between disconnected sections.

Acceptance Criteria

THE Deep_Dive_Document SHALL use narrative prose with explanatory paragraphs as the primary writing style, reserving tables and bullet lists for structured data summaries only.
WHEN a page references content covered in a different page, THE Deep_Dive_Document SHALL include a Markdown link to the relevant page and section.
THE Deep_Dive_Document SHALL include transitional paragraphs at the end of each page that preview what the next page covers, creating a continuous narrative flow.
THE Deep_Dive_Document SHALL reference the existing documentation where appropriate (e.g., docs/services.md, docs/ai-agents.md, docs/architecture-data-pipeline.md, docs/llm-to-trade-pipeline.md) for readers who want deeper reference-level detail.
IF a concept is introduced for the first time, THEN THE Deep_Dive_Document SHALL provide a brief inline explanation before using the concept in subsequent discussion.

Requirement 10: Codebase Accuracy

User Story: As a developer, I want the document to reference actual code modules, database tables, and queue names from the codebase, so that I can use the document as a reliable guide when navigating the source code.

Acceptance Criteria

THE Deep_Dive_Document SHALL reference code modules using paths that exist in the repository (e.g., services/aggregation/scoring.py, services/trading/circuit_breaker.py, services/shared/schemas.py).
THE Deep_Dive_Document SHALL reference database tables using names that match the PostgreSQL schema as defined in infra/migrations/.
THE Deep_Dive_Document SHALL reference Redis queue names using the constants defined in services/shared/redis_keys.py (e.g., QUEUE_EXTRACTION, QUEUE_AGGREGATION, QUEUE_RECOMMENDATION, QUEUE_BROKER).
THE Deep_Dive_Document SHALL reference Pydantic schema classes using their actual class names from services/shared/schemas.py (e.g., DocumentIntelligence, TrendSummary, Recommendation, GlobalEventSchema, CompanyImpact).
THE Deep_Dive_Document SHALL reference configuration environment variables using the exact names defined in services/shared/config.py and the service-specific configuration sections.
