# From Model Output to Trade: The Full Pipeline This document traces the complete journey of data through Stonks Oracle — from the moment an Ollama model produces structured JSON, through signal scoring and aggregation, to the final trading decision. --- ## 1. Document Ingestion Before the model ever sees a document, the ingestion layer fetches raw content from configured sources (news APIs, SEC filings, earnings transcripts, press releases). Each document lands in the `documents` table with a status, type, and `published_at` timestamp. A Redis queue (`stonks:queue:extraction`) feeds documents to the extractor service. --- ## 2. Prompting the Model The extractor service (`services/extractor/client.py`) sends each document to a local Ollama instance via `POST /api/chat`. ### System prompt A short, strict instruction set: > You are a financial document analyst. Extract structured data as JSON. Return ONLY a single JSON object. No markdown fences, no explanation, no text before or after the JSON. Every field in the schema is required. Use "other" for catalyst_type if unsure. Keep evidence_spans short (under 20 words each). Keep key_facts to 3-5 items max. ### User prompt Built dynamically per document (`services/extractor/prompts.py`). It includes: - **Document type guidance** — tailored instructions for articles, filings, transcripts, and press releases. For example, filings get: *"Extract concrete financial figures, risk factors, and material events as stated."* Transcripts get: *"Distinguish between management forward-looking statements and reported results."* - **Tracked ticker hints** — the list of 50 tracked tickers, with rules: if a ticker appears verbatim in the text, the model must include it; if a sector theme clearly affects a tracked company, include it; never invent tickers outside the list. - **Field-by-field instructions** — what each output field means and its valid range. - **Document text** — truncated to 8,000 characters to keep inference fast. ### Ollama call parameters - `think=false` (speed over chain-of-thought) - `num_predict=4096` (max output tokens) - Optional `num_ctx` override for longer documents - The JSON schema (generated from Pydantic models) is passed as the `format` parameter for structured output --- ## 3. Model Output: The JSON Contract The model returns a single JSON object matching the `ExtractionResult` schema (`services/extractor/schemas.py`): ```json { "summary": "Apple reported record Q4 earnings driven by iPhone 16 demand.", "companies": [ { "ticker": "AAPL", "company_name": "Apple Inc.", "relevance": 0.95, "sentiment": "positive", "impact_score": 0.8, "impact_horizon": "1d_7d", "catalyst_type": "earnings", "key_facts": [ "Revenue up 12% YoY to $94.9B", "iPhone revenue grew 18%", "Services hit all-time high" ], "risks": [ "China market softness noted by management" ], "evidence_spans": [ "record quarterly revenue of $94.9 billion", "iPhone revenue grew 18 percent year over year" ] } ], "macro_themes": ["consumer_spending", "ai_capex"], "novelty_score": 0.6, "confidence": 0.85, "extraction_warnings": [] } ``` ### Field definitions | Field | Type | Range | Purpose | |---|---|---|---| | `summary` | string | — | 1-3 sentence document summary | | `companies[]` | array | — | Per-company intelligence (one entry per affected company) | | `.ticker` | string | — | Stock ticker symbol | | `.relevance` | float | 0-1 | How central this company is to the document | | `.sentiment` | enum | positive / negative / neutral / mixed | Overall sentiment toward the company | | `.impact_score` | float | 0-1 | Estimated magnitude of impact (0 = negligible, 1 = highly material) | | `.impact_horizon` | string | intraday / 1d / 1d_7d / 1d_30d / 30d_90d / 90d_plus | When the impact is expected to manifest | | `.catalyst_type` | enum | earnings / product / legal / macro / supply_chain / m_and_a / rating_change / other | Primary catalyst category | | `.key_facts` | string[] | — | Facts explicitly stated in the document (no fabrication) | | `.risks` | string[] | — | Risks explicitly mentioned | | `.evidence_spans` | string[] | — | Short verbatim quotes supporting the analysis | | `macro_themes` | string[] | — | Broad economic themes (rates, inflation, ai_capex, etc.) | | `novelty_score` | float | 0-1 | How surprising the information is | | `confidence` | float | 0-1 | Model's self-assessed extraction quality | | `extraction_warnings` | string[] | — | Issues encountered (ambiguous_ticker, incomplete_text, etc.) | --- ## 4. JSON Repair and Validation The raw model output goes through two stages before it's trusted. ### 4a. JSON repair (`services/extractor/client.py`) Ollama's `format` constraint is unreliable with `think=false` on certain models (Ollama bug #14645). The extractor handles this: 1. Try `json.loads()` directly — if it parses, use it as-is. 2. Strip markdown fences (` ```json ... ``` `) if present. 3. Fall back to the `json-repair` library, which fixes trailing commas, unterminated strings, and control characters. ### 4b. Structural + semantic validation (`services/extractor/schemas.py`) 1. **Structural validation** — parse the JSON against the `ExtractionResult` Pydantic model. Missing required fields, wrong types, or out-of-range values fail here. 2. **Semantic validation** — cross-field consistency checks: - Ticker format validation - Evidence span length checks - Catalyst type alias normalization (maps variants to canonical enum values) - Impact horizon normalization 3. Returns a `ValidationReport` with the parsed result or a list of errors. ### 4c. Retry logic If validation fails and the error is retryable (not an HTTP 4xx client error), the extractor retries up to `max_retries` times (default 2) with exponential backoff. Every attempt — raw output, validation result, error, duration — is preserved in the `ExtractionResponse.attempts` list for audit. --- ## 5. Persistence: Document Intelligence and Impact Records A successful extraction produces two sets of database records. ### Document intelligence (`document_intelligence` table) One row per document: - `document_id`, `document_type`, `summary`, `companies` (JSONB), `macro_themes` - `novelty_score`, `source_credibility`, `confidence`, `extraction_warnings` - `validation_status` (valid/failed) - `model` metadata: provider, model_name, prompt_version, schema_version ### Per-company impact records (`document_impact_records` table) One row per company mentioned in the extraction: - `ticker`, `company_name`, `relevance`, `sentiment`, `impact_score`, `impact_horizon`, `catalyst_type` - `key_facts`, `risks`, `evidence_spans` (all JSONB) - Links back to `document_intelligence` via `intelligence_id` ### Raw artifact storage (MinIO) Full prompts and raw model responses are stored in MinIO buckets (`stonks-llm-prompts`, `stonks-llm-results`) keyed by `document_id`, so any extraction can be replayed or audited. --- ## 6. Signal Scoring: Turning Records into Weighted Signals The aggregation engine (`services/aggregation/worker.py`) converts raw impact records into `WeightedSignal` objects. Each signal carries a composite weight that determines how much it influences the final trend. ### Weight components (`services/aggregation/scoring.py`) The combined weight is: ``` combined = gate × recency × credibility × (1 + novelty_bonus) × market_context_multiplier ``` | Component | Formula | Purpose | |---|---|---| | **Confidence gate** | 0 if extraction confidence < 0.2, else 1 | Reject unreliable extractions entirely | | **Recency decay** | `2^(-age_hours / half_life)`, min 0.01 | Exponential decay — newer documents matter more. Half-lives: intraday=2h, 1d=12h, 7d=72h, 30d=240h, 90d=720h | | **Credibility** | `source_credibility ^ exponent`, clamped [0.1, 1.0] | Source quality weighting | | **Novelty bonus** | `novelty_score × 0.25` | Novel information gets up to 25% boost | | **Market context** | Volatility boost (up to +30%) + volume surge boost (+15%) | Fast-moving, high-volume markets amplify fresh signals | Each `WeightedSignal` also carries: - `sentiment_value`: +1.0 (positive), -1.0 (negative), 0.0 (neutral/mixed) - `impact_score`: the extraction's impact magnitude - `document_id`: for evidence tracing --- ## 7. Three Signal Layers The aggregation engine merges signals from three independent layers. Each layer can be toggled on/off at runtime via the `risk_configs` table — no restart needed. ### Layer 1: Company-specific signals (always active) Direct document intelligence about a company. This is the core layer — `document_impact_records` for the ticker, scored as described in §6. ### Layer 2: Macro signals (toggle: `macro_enabled`) Global events that affect companies through exposure profiles. **Flow:** 1. The macro service classifies global events (from news) using Ollama — extracting event type, severity, affected regions/sectors/commodities. 2. Each company has an **exposure profile** (`exposure_profiles` table): geographic revenue mix, supply chain regions, commodity dependencies, market position tier. 3. **Overlap scoring** computes how much a global event overlaps with a company's exposure (geographic, supply chain, commodity dimensions). 4. A **resilience modifier** based on market position tier (global leaders are more resilient than domestic companies) adjusts the score. 5. The final `macro_impact_score = base_score × overlap_factor × resilience_modifier`. 6. Events older than 48 hours get accelerated staleness decay. Macro signals are converted to `WeightedSignal` objects with: - `sentiment_value` mapped from `impact_direction` (positive → +1, negative → -1) - `impact_score = macro_impact_score × macro_signal_weight` (default weight: 0.3) - Recency decay from the global event's publication time ### Layer 3: Competitive signals (toggle: `competitive_enabled`) Historical patterns and cross-company signal propagation. **Flow:** 1. **Self-company pattern mining** (`services/aggregation/pattern_matcher.py`): For each catalyst type in the current impact records, query historical outcomes for this ticker. Lookback: 180 days for routine signals, 365 days for major decisions (1.3× weight multiplier). Produces `HistoricalPattern` objects with `bullish_pct`, `bearish_pct`, `avg_strength`, `pattern_confidence`. 2. **Cross-company propagation** (`services/aggregation/signal_propagation.py`): When company A has a catalyst, look up its competitors via the `competitor_relationships` table (46 relationships across 50 companies). For each competitor, query cross-company historical patterns. Signal strength = `avg_strength × relationship_strength × pattern_confidence × impact_score`. Direction = majority historical outcome (bullish or bearish). 3. Competitive signals are converted to `WeightedSignal` objects with: - `impact_score = signal_strength × competitive_signal_weight` (default weight: 0.2) - Recency decay from the pattern's most recent data point or the signal's `computed_at` time ### Merging All three layers produce `WeightedSignal` objects with the same structure. The aggregation engine simply concatenates them into a single list before computing the trend summary. The relative influence of each layer is controlled by the `macro_signal_weight` (0.3) and `competitive_signal_weight` (0.2) multipliers applied to their impact scores. --- ## 8. Trend Summary Assembly From the merged signal list, the aggregation engine computes a `TrendSummary` for each ticker × window combination (intraday, 1d, 7d, 30d, 90d). ### Weighted sentiment average ``` avg_sentiment = Σ(sentiment_value × combined_weight × impact_score) / Σ(combined_weight × impact_score) ``` ### Trend direction | Condition | Direction | |---|---| | `avg_sentiment ≥ 0.15` | **Bullish** | | `avg_sentiment ≤ -0.15` | **Bearish** | | Contradiction > 0.10 and \|avg_sentiment\| < 0.30 | **Mixed** | | Otherwise | **Neutral** | ### Trend strength `strength = min(|avg_sentiment|, 1.0)` — the absolute magnitude of the weighted sentiment, clamped to [0, 1]. ### Contradiction score Measures disagreement among signals: ``` contradiction = minority_side_weight / total_weight ``` Where minority side is whichever of positive or negative has less total weight. A score of 0 means full agreement; approaching 0.5 means equal-weight disagreement. The system also runs multi-dimensional contradiction detection (`services/aggregation/contradiction.py`): - **Sentiment disagreement** — the core positive-vs-negative split - **Catalyst disagreement** — same catalyst type with opposing sentiment from different documents ### Confidence Derived from four factors: - **Unique source count** — more distinct documents = higher confidence (caps at 15 unique sources for 0.8 contribution) - **Average extraction confidence** — from the model's self-assessed quality - **Signal agreement** — fraction of signals pointing the same direction, dampened by sample size (log₂ scaling, saturates around 7 unique sources) - **Contradiction penalty** — `contradiction_score × 0.4` subtracted ### Evidence ranking Supporting and opposing documents are ranked by a composite score considering weight, impact, recency, and confidence — not just raw weight. The top 10 of each are stored for citation. ### Catalysts and risks Dominant catalyst types are ranked by cumulative signal weight. Material risks are deduplicated and ordered by the weight of the signal that surfaced them. ### Persistence The assembled `TrendSummary` is upserted into the `trend_windows` table (one row per entity × window, updated each cycle). A snapshot is also appended to `trend_history` for time-series charting. Evidence mappings go into `trend_evidence` with per-document rank scores and component breakdowns. --- ## 9. Trend Projections After assembling the current trend, the engine computes a forward-looking projection (`services/aggregation/projection.py`): - **Macro decay** — projects macro event impact forward with exponential decay based on estimated duration and severity - **Momentum** — trend momentum from recent price action - **Driving factors** — lists key macro events, competitive patterns, and market conditions - **Divergence detection** — flags when the projection diverges from the current trend direction Output: `TrendProjection` with `projected_direction`, `projected_strength`, `projected_confidence`, `projection_horizon`, `driving_factors`, and `diverges_from_current`. Projections with confidence below 0.3 are flagged as `low_confidence` and excluded from thesis generation. --- ## 10. Recommendation Generation The recommendation service (`services/recommendation/worker.py`) turns trend summaries into actionable recommendations. ### Step 1: Data quality suppression (`services/recommendation/suppression.py`) Before any eligibility check, the system evaluates the quality of the underlying data: | Check | Threshold | Effect | |---|---|---| | Average extraction confidence | < 0.40 | Suppress | | Evidence staleness | > 168 hours (7 days) | Suppress | | Source type diversity | < 1 distinct type | Suppress | | Extraction failure rate | > 50% | Suppress | | Valid document count | < 2 | Suppress | | Overall data quality score | < 0.30 | Suppress | The data quality score is a weighted composite: 40% extraction confidence + 30% evidence freshness + 30% document coverage. **Safety suppression** — two additional rules prevent trading on thin evidence from a single signal layer: - **Macro-only suppression**: If the trend direction is driven solely by macro signals with zero company-specific evidence, the recommendation is forced to informational mode. - **Pattern-only suppression**: Same rule for pattern/competitive signals with no company or macro support. ### Step 2: Eligibility evaluation (`services/recommendation/eligibility.py`) Deterministic rules — no model involvement: **Gate checks** (any failure → no recommendation): - Confidence ≥ 0.35 - Trend strength ≥ 0.10 - Contradiction score ≤ 0.60 - Evidence count ≥ 2 - Direction ≠ neutral **Action mapping:** - Strong bullish (strength ≥ 0.25) → **BUY** - Strong bearish (strength ≥ 0.25) → **SELL** - Weak but directional + decent confidence (≥ 0.50) → **HOLD** - Everything else → **WATCH** **Mode escalation:** - WATCH and HOLD → always **informational** (no trades) - BUY/SELL with confidence ≥ 0.70, contradiction ≤ 0.25, evidence ≥ 5 → **live_eligible** - BUY/SELL with confidence ≥ 0.50 → **paper_eligible** - Below that → **informational** ### Step 3: Position sizing Computed from signal quality: ``` raw_portfolio_pct = base (1%) + confidence × strength × range (up to 10%) ``` Adjusted by: - Contradiction penalty (higher contradiction → smaller position) - Evidence count penalty (< 3 docs → 50% reduction, < 5 docs → 75%) - Max loss percentage scales similarly (base 0.3% up to 2%) ### Step 4: Thesis generation Two layers: 1. **Deterministic thesis** — assembled from trend direction, strength, catalysts, risks, contradiction notes, projection info, and the recommended action. Always generated. 2. **Optional LLM rewrite** (`services/recommendation/thesis_llm.py`) — for trading-eligible recommendations only, the deterministic thesis is rewritten into analyst-quality prose via Ollama. This is cosmetic; the underlying decision is unchanged. ### Step 5: Risk classification Based on contradiction score, confidence, evidence count, and mode: - `low` — high confidence, low contradiction, strong evidence - `moderate` — decent signals with some uncertainty - `high` — notable contradiction or low evidence - `very_high` — multiple risk factors present The thesis is prefixed with the risk label: `[risk:moderate] AAPL shows a bullish trend...` ### Step 6: Persistence - `recommendations` table — the full recommendation record - `recommendation_evidence` table — per-document citations with weights and evidence types - `risk_evaluations` table — the eligibility decision, risk checks, and full decision trace --- ## 11. Trading Engine Decision Loop The trading engine (`services/trading/engine.py`) polls the `recommendations` table every 60 seconds for actionable recommendations (`action IN ('buy', 'sell')` and `mode IN ('paper_eligible', 'live_eligible')`). ### Pre-trade checks (in order, first failure short-circuits) 1. **Circuit breaker** — is the daily loss cap or single-position loss cap breached? If so, all trading halts. 2. **Trading window** — is the market open? Outside market hours, skip. 3. **Confidence gate** — does the recommendation meet the active risk tier's minimum confidence? 4. **Deduplication** — has this recommendation already been processed? 5. **Declining positions** — are there multiple open positions currently declining? 6. **Max open positions** — is the portfolio at capacity? ### Position sizing (`services/trading/position_sizer.py`) Computes the dollar amount and share quantity: - Confidence-based scaling (sample-size-dampened agreement scoring) - Risk tier adjustment (conservative / moderate / aggressive) - Portfolio heat check (sector concentration, correlation) - Active pool available capital - Absolute position cap ### Stop-loss and take-profit (`services/trading/stop_loss_manager.py`) - Stop-loss = entry price − (ATR × atr_multiplier) - Take-profit = entry price + (ATR × atr_multiplier × reward_risk_ratio) - Trailing stops activate for open positions ### Additional checks - **Correlation-aware diversification** — reject positions that would push portfolio correlation above threshold - **Earnings calendar awareness** — reduce size or skip if earnings are within 2 days - **Gradual entry** — large positions (> $30) split into 3 tranches over time - **Reserve pool** — profits from closed positions siphon into an emergency liquidity reserve ### Risk tier auto-adjustment (`services/trading/risk_tier_controller.py`) Daily evaluation of Sharpe ratio, drawdown, and win rate. The engine auto-adjusts between conservative, moderate, and aggressive tiers. The new tier is persisted to `risk_configs` and takes effect on the next cycle. ### Output: Trading Decision Every evaluation produces a `TradingDecision` record persisted for audit: - `decision`: act or skip - `skip_reason`: which check failed (if any) - `computed_position_size`, `computed_share_quantity` - `risk_tier_at_decision`, `portfolio_heat_at_decision`, `active_pool_at_decision` - `circuit_breaker_status`, `correlation_check_result`, `sector_exposure_check_result` - `earnings_proximity_flag`, `is_micro_trade`, `decision_trace` If the decision is **act**, an order job is pushed to the Redis broker queue (`stonks:queue:broker`) with ticker, action, quantity, and order type. --- ## 12. The Complete Data Flow (Summary) ``` Document (article/filing/transcript) │ ▼ Ollama extraction (JSON) │ ├─ JSON repair (json-repair library) │ └─ Pydantic validation + semantic checks │ ▼ document_intelligence + document_impact_records (PostgreSQL) │ └─ Raw prompts/responses → MinIO (audit) │ ├──────────────────────────────────────────────────┐ │ │ ▼ ▼ Layer 1: Company signals Layer 2: Macro signals (impact records → WeightedSignal) (global_events → exposure matching → macro_impact_records → WeightedSignal) │ │ │ Layer 3: Competitive signals │ │ (pattern mining + propagation │ │ → competitive_signal_records │ │ → WeightedSignal) │ │ │ │ └───────────┬───────────────┘───────────────────────┘ │ ▼ Signal merging (concatenate all WeightedSignals) │ ▼ Trend summary assembly (weighted sentiment → direction, strength, confidence, contradiction, evidence ranking, catalysts, risks) │ ├─→ trend_windows (PostgreSQL) ├─→ trend_history (time-series) └─→ trend_evidence (per-document rankings) │ ▼ Trend projection (forward-looking) │ ▼ Data quality suppression (extraction confidence, staleness, diversity, macro-only / pattern-only safety) │ ▼ Eligibility evaluation (gate checks → action mapping → mode escalation → position sizing) │ ▼ Thesis generation + risk classification │ ├─→ recommendations (PostgreSQL) ├─→ recommendation_evidence └─→ risk_evaluations │ ▼ Trading engine decision loop (pre-trade checks → position sizing → stop-loss → correlation → earnings → gradual entry) │ ├─→ trading_decisions (PostgreSQL, audit) └─→ stonks:queue:broker (Redis, order execution) ``` --- ## 13. Key Database Tables | Table | Stage | Purpose | |---|---|---| | `documents` | Ingestion | Raw ingested content | | `document_intelligence` | Extraction | Ollama extraction output | | `document_impact_records` | Extraction | Per-company impact from a document | | `global_events` | Macro | Classified macro/geopolitical events | | `exposure_profiles` | Macro | Company exposure data (geography, supply chain, commodities) | | `macro_impact_records` | Macro | Per-company macro impact scores | | `competitor_relationships` | Competitive | Company relationship graph | | `competitive_signal_records` | Competitive | Cross-company propagated signals | | `trend_windows` | Aggregation | Current trend summaries (upserted each cycle) | | `trend_history` | Aggregation | Time-series snapshots for charting | | `trend_evidence` | Aggregation | Per-document evidence rankings | | `trend_projections` | Projection | Forward-looking trend projections | | `recommendations` | Recommendation | Trade recommendations | | `recommendation_evidence` | Recommendation | Per-document citations | | `risk_evaluations` | Recommendation | Eligibility decisions and risk checks | | `risk_configs` | Runtime | Toggle switches and risk tier configuration | | `trading_decisions` | Trading | Pre-trade evaluation audit trail | | `positions` | Trading | Open positions | | `orders` | Trading | Broker orders | | `fills` | Trading | Order fills | --- ## 14. Audit Trail Every stage preserves full context for reproducibility: - **Extraction**: raw Ollama response, repair steps, validation errors, all retry attempts - **Aggregation**: per-signal weight breakdowns (recency, credibility, novelty, market context), contradiction details by dimension - **Recommendation**: deterministic thesis, evidence citations with weights, eligibility decision trace, risk evaluation - **Trading**: every pre-trade check result, position sizing breakdown, risk tier at decision time, full decision trace - **Execution**: order details, fills, P&L, performance metrics