stonks-oracle/docs/llm-to-trade-pipeline.md

# From Model Output to Trade: The Full Pipeline

This document traces the complete journey of data through Stonks Oracle — from the moment an Ollama model produces structured JSON, through signal scoring and aggregation, to the final trading decision.

---

## 1. Document Ingestion

Before the model ever sees a document, the ingestion layer fetches raw content from configured sources (news APIs, SEC filings, earnings transcripts, press releases). Each document lands in the `documents` table with a status, type, and `published_at` timestamp. A Redis queue (`stonks:queue:extraction`) feeds documents to the extractor service.

---

## 2. Prompting the Model

The extractor service (`services/extractor/client.py`) sends each document to a local Ollama instance via `POST /api/chat`.

### System prompt

A short, strict instruction set:

> You are a financial document analyst. Extract structured data as JSON. Return ONLY a single JSON object. No markdown fences, no explanation, no text before or after the JSON. Every field in the schema is required. Use "other" for catalyst_type if unsure. Keep evidence_spans short (under 20 words each). Keep key_facts to 3-5 items max.

### User prompt

Built dynamically per document (`services/extractor/prompts.py`). It includes:

- **Document type guidance** — tailored instructions for articles, filings, transcripts, and press releases. For example, filings get: *"Extract concrete financial figures, risk factors, and material events as stated."* Transcripts get: *"Distinguish between management forward-looking statements and reported results."*
- **Tracked ticker hints** — the list of 50 tracked tickers, with rules: if a ticker appears verbatim in the text, the model must include it; if a sector theme clearly affects a tracked company, include it; never invent tickers outside the list.
- **Field-by-field instructions** — what each output field means and its valid range.
- **Document text** — truncated to 8,000 characters to keep inference fast.

### Ollama call parameters

- `think=false` (speed over chain-of-thought)
- `num_predict=4096` (max output tokens)
- Optional `num_ctx` override for longer documents
- The JSON schema (generated from Pydantic models) is passed as the `format` parameter for structured output

---

## 3. Model Output: The JSON Contract

The model returns a single JSON object matching the `ExtractionResult` schema (`services/extractor/schemas.py`):

```json
{
  "summary": "Apple reported record Q4 earnings driven by iPhone 16 demand.",
  "companies": [
    {
      "ticker": "AAPL",
      "company_name": "Apple Inc.",
      "relevance": 0.95,
      "sentiment": "positive",
      "impact_score": 0.8,
      "impact_horizon": "1d_7d",
      "catalyst_type": "earnings",
      "key_facts": [
        "Revenue up 12% YoY to $94.9B",
        "iPhone revenue grew 18%",
        "Services hit all-time high"
      ],
      "risks": [
        "China market softness noted by management"
      ],
      "evidence_spans": [
        "record quarterly revenue of $94.9 billion",
        "iPhone revenue grew 18 percent year over year"
      ]
    }
  ],
  "macro_themes": ["consumer_spending", "ai_capex"],
  "novelty_score": 0.6,
  "confidence": 0.85,
  "extraction_warnings": []
}
```

### Field definitions

| Field | Type | Range | Purpose |
|---|---|---|---|
| `summary` | string | — | 1-3 sentence document summary |
| `companies[]` | array | — | Per-company intelligence (one entry per affected company) |
| `.ticker` | string | — | Stock ticker symbol |
| `.relevance` | float | 0-1 | How central this company is to the document |
| `.sentiment` | enum | positive / negative / neutral / mixed | Overall sentiment toward the company |
| `.impact_score` | float | 0-1 | Estimated magnitude of impact (0 = negligible, 1 = highly material) |
| `.impact_horizon` | string | intraday / 1d / 1d_7d / 1d_30d / 30d_90d / 90d_plus | When the impact is expected to manifest |
| `.catalyst_type` | enum | earnings / product / legal / macro / supply_chain / m_and_a / rating_change / other | Primary catalyst category |
| `.key_facts` | string[] | — | Facts explicitly stated in the document (no fabrication) |
| `.risks` | string[] | — | Risks explicitly mentioned |
| `.evidence_spans` | string[] | — | Short verbatim quotes supporting the analysis |
| `macro_themes` | string[] | — | Broad economic themes (rates, inflation, ai_capex, etc.) |
| `novelty_score` | float | 0-1 | How surprising the information is |
| `confidence` | float | 0-1 | Model's self-assessed extraction quality |
| `extraction_warnings` | string[] | — | Issues encountered (ambiguous_ticker, incomplete_text, etc.) |

---

## 4. JSON Repair and Validation

The raw model output goes through two stages before it's trusted.

### 4a. JSON repair (`services/extractor/client.py`)

Ollama's `format` constraint is unreliable with `think=false` on certain models (Ollama bug #14645). The extractor handles this:

1. Try `json.loads()` directly — if it parses, use it as-is.
2. Strip markdown fences (` ```json ... ``` `) if present.
3. Fall back to the `json-repair` library, which fixes trailing commas, unterminated strings, and control characters.

### 4b. Structural + semantic validation (`services/extractor/schemas.py`)

1. **Structural validation** — parse the JSON against the `ExtractionResult` Pydantic model. Missing required fields, wrong types, or out-of-range values fail here.
2. **Semantic validation** — cross-field consistency checks:
   - Ticker format validation
   - Evidence span length checks
   - Catalyst type alias normalization (maps variants to canonical enum values)
   - Impact horizon normalization
3. Returns a `ValidationReport` with the parsed result or a list of errors.

### 4c. Retry logic

If validation fails and the error is retryable (not an HTTP 4xx client error), the extractor retries up to `max_retries` times (default 2) with exponential backoff. Every attempt — raw output, validation result, error, duration — is preserved in the `ExtractionResponse.attempts` list for audit.

---

## 5. Persistence: Document Intelligence and Impact Records

A successful extraction produces two sets of database records.

### Document intelligence (`document_intelligence` table)

One row per document:
- `document_id`, `document_type`, `summary`, `companies` (JSONB), `macro_themes`
- `novelty_score`, `source_credibility`, `confidence`, `extraction_warnings`
- `validation_status` (valid/failed)
- `model` metadata: provider, model_name, prompt_version, schema_version

### Per-company impact records (`document_impact_records` table)

One row per company mentioned in the extraction:
- `ticker`, `company_name`, `relevance`, `sentiment`, `impact_score`, `impact_horizon`, `catalyst_type`
- `key_facts`, `risks`, `evidence_spans` (all JSONB)
- Links back to `document_intelligence` via `intelligence_id`

### Raw artifact storage (MinIO)

Full prompts and raw model responses are stored in MinIO buckets (`stonks-llm-prompts`, `stonks-llm-results`) keyed by `document_id`, so any extraction can be replayed or audited.

---

## 6. Signal Scoring: Turning Records into Weighted Signals

The aggregation engine (`services/aggregation/worker.py`) converts raw impact records into `WeightedSignal` objects. Each signal carries a composite weight that determines how much it influences the final trend.

### Weight components (`services/aggregation/scoring.py`)

The combined weight is:

```
combined = gate × recency × credibility × (1 + novelty_bonus) × market_context_multiplier
```

| Component | Formula | Purpose |
|---|---|---|
| **Confidence gate** | 0 if extraction confidence < 0.2, else 1 | Reject unreliable extractions entirely |
| **Recency decay** | `2^(-age_hours / half_life)`, min 0.01 | Exponential decay — newer documents matter more. Half-lives: intraday=2h, 1d=12h, 7d=72h, 30d=240h, 90d=720h |
| **Credibility** | `source_credibility ^ exponent`, clamped [0.1, 1.0] | Source quality weighting |
| **Novelty bonus** | `novelty_score × 0.25` | Novel information gets up to 25% boost |
| **Market context** | Volatility boost (up to +30%) + volume surge boost (+15%) | Fast-moving, high-volume markets amplify fresh signals |

Each `WeightedSignal` also carries:
- `sentiment_value`: +1.0 (positive), -1.0 (negative), 0.0 (neutral/mixed)
- `impact_score`: the extraction's impact magnitude
- `document_id`: for evidence tracing

---

## 7. Three Signal Layers

The aggregation engine merges signals from three independent layers. Each layer can be toggled on/off at runtime via the `risk_configs` table — no restart needed.

### Layer 1: Company-specific signals (always active)

Direct document intelligence about a company. This is the core layer — `document_impact_records` for the ticker, scored as described in §6.

### Layer 2: Macro signals (toggle: `macro_enabled`)

Global events that affect companies through exposure profiles.

**Flow:**
1. The macro service classifies global events (from news) using Ollama — extracting event type, severity, affected regions/sectors/commodities.
2. Each company has an **exposure profile** (`exposure_profiles` table): geographic revenue mix, supply chain regions, commodity dependencies, market position tier.
3. **Overlap scoring** computes how much a global event overlaps with a company's exposure (geographic, supply chain, commodity dimensions).
4. A **resilience modifier** based on market position tier (global leaders are more resilient than domestic companies) adjusts the score.
5. The final `macro_impact_score = base_score × overlap_factor × resilience_modifier`.
6. Events older than 48 hours get accelerated staleness decay.

Macro signals are converted to `WeightedSignal` objects with:
- `sentiment_value` mapped from `impact_direction` (positive → +1, negative → -1)
- `impact_score = macro_impact_score × macro_signal_weight` (default weight: 0.3)
- Recency decay from the global event's publication time

### Layer 3: Competitive signals (toggle: `competitive_enabled`)

Historical patterns and cross-company signal propagation.

**Flow:**
1. **Self-company pattern mining** (`services/aggregation/pattern_matcher.py`): For each catalyst type in the current impact records, query historical outcomes for this ticker. Lookback: 180 days for routine signals, 365 days for major decisions (1.3× weight multiplier). Produces `HistoricalPattern` objects with `bullish_pct`, `bearish_pct`, `avg_strength`, `pattern_confidence`.
2. **Cross-company propagation** (`services/aggregation/signal_propagation.py`): When company A has a catalyst, look up its competitors via the `competitor_relationships` table (46 relationships across 50 companies). For each competitor, query cross-company historical patterns. Signal strength = `avg_strength × relationship_strength × pattern_confidence × impact_score`. Direction = majority historical outcome (bullish or bearish).
3. Competitive signals are converted to `WeightedSignal` objects with:
   - `impact_score = signal_strength × competitive_signal_weight` (default weight: 0.2)
   - Recency decay from the pattern's most recent data point or the signal's `computed_at` time

### Merging

All three layers produce `WeightedSignal` objects with the same structure. The aggregation engine simply concatenates them into a single list before computing the trend summary. The relative influence of each layer is controlled by the `macro_signal_weight` (0.3) and `competitive_signal_weight` (0.2) multipliers applied to their impact scores.

---

## 8. Trend Summary Assembly

From the merged signal list, the aggregation engine computes a `TrendSummary` for each ticker × window combination (intraday, 1d, 7d, 30d, 90d).

### Weighted sentiment average

```
avg_sentiment = Σ(sentiment_value × combined_weight × impact_score) / Σ(combined_weight × impact_score)
```

### Trend direction

| Condition | Direction |
|---|---|
| `avg_sentiment ≥ 0.15` | **Bullish** |
| `avg_sentiment ≤ -0.15` | **Bearish** |
| Contradiction > 0.10 and \|avg_sentiment\| < 0.30 | **Mixed** |
| Otherwise | **Neutral** |

### Trend strength

`strength = min(|avg_sentiment|, 1.0)` — the absolute magnitude of the weighted sentiment, clamped to [0, 1].

### Contradiction score

Measures disagreement among signals:

```
contradiction = minority_side_weight / total_weight
```

Where minority side is whichever of positive or negative has less total weight. A score of 0 means full agreement; approaching 0.5 means equal-weight disagreement.

The system also runs multi-dimensional contradiction detection (`services/aggregation/contradiction.py`):
- **Sentiment disagreement** — the core positive-vs-negative split
- **Catalyst disagreement** — same catalyst type with opposing sentiment from different documents

### Confidence

Derived from four factors:
- **Unique source count** — more distinct documents = higher confidence (caps at 15 unique sources for 0.8 contribution)
- **Average extraction confidence** — from the model's self-assessed quality
- **Signal agreement** — fraction of signals pointing the same direction, dampened by sample size (log₂ scaling, saturates around 7 unique sources)
- **Contradiction penalty** — `contradiction_score × 0.4` subtracted

### Evidence ranking

Supporting and opposing documents are ranked by a composite score considering weight, impact, recency, and confidence — not just raw weight. The top 10 of each are stored for citation.

### Catalysts and risks

Dominant catalyst types are ranked by cumulative signal weight. Material risks are deduplicated and ordered by the weight of the signal that surfaced them.

### Persistence

The assembled `TrendSummary` is upserted into the `trend_windows` table (one row per entity × window, updated each cycle). A snapshot is also appended to `trend_history` for time-series charting. Evidence mappings go into `trend_evidence` with per-document rank scores and component breakdowns.

---

## 9. Trend Projections

After assembling the current trend, the engine computes a forward-looking projection (`services/aggregation/projection.py`):

- **Macro decay** — projects macro event impact forward with exponential decay based on estimated duration and severity
- **Momentum** — trend momentum from recent price action
- **Driving factors** — lists key macro events, competitive patterns, and market conditions
- **Divergence detection** — flags when the projection diverges from the current trend direction

Output: `TrendProjection` with `projected_direction`, `projected_strength`, `projected_confidence`, `projection_horizon`, `driving_factors`, and `diverges_from_current`. Projections with confidence below 0.3 are flagged as `low_confidence` and excluded from thesis generation.

---

## 10. Recommendation Generation

The recommendation service (`services/recommendation/worker.py`) turns trend summaries into actionable recommendations.

### Step 1: Data quality suppression (`services/recommendation/suppression.py`)

Before any eligibility check, the system evaluates the quality of the underlying data:

| Check | Threshold | Effect |
|---|---|---|
| Average extraction confidence | < 0.40 | Suppress |
| Evidence staleness | > 168 hours (7 days) | Suppress |
| Source type diversity | < 1 distinct type | Suppress |
| Extraction failure rate | > 50% | Suppress |
| Valid document count | < 2 | Suppress |
| Overall data quality score | < 0.30 | Suppress |

The data quality score is a weighted composite: 40% extraction confidence + 30% evidence freshness + 30% document coverage.

**Safety suppression** — two additional rules prevent trading on thin evidence from a single signal layer:
- **Macro-only suppression**: If the trend direction is driven solely by macro signals with zero company-specific evidence, the recommendation is forced to informational mode.
- **Pattern-only suppression**: Same rule for pattern/competitive signals with no company or macro support.

### Step 2: Eligibility evaluation (`services/recommendation/eligibility.py`)

Deterministic rules — no model involvement:

**Gate checks** (any failure → no recommendation):
- Confidence ≥ 0.35
- Trend strength ≥ 0.10
- Contradiction score ≤ 0.60
- Evidence count ≥ 2
- Direction ≠ neutral

**Action mapping:**
- Strong bullish (strength ≥ 0.25) → **BUY**
- Strong bearish (strength ≥ 0.25) → **SELL**
- Weak but directional + decent confidence (≥ 0.50) → **HOLD**
- Everything else → **WATCH**

**Mode escalation:**
- WATCH and HOLD → always **informational** (no trades)
- BUY/SELL with confidence ≥ 0.70, contradiction ≤ 0.25, evidence ≥ 5 → **live_eligible**
- BUY/SELL with confidence ≥ 0.50 → **paper_eligible**
- Below that → **informational**

### Step 3: Position sizing

Computed from signal quality:

```
raw_portfolio_pct = base (1%) + confidence × strength × range (up to 10%)
```

Adjusted by:
- Contradiction penalty (higher contradiction → smaller position)
- Evidence count penalty (< 3 docs → 50% reduction, < 5 docs → 75%)
- Max loss percentage scales similarly (base 0.3% up to 2%)

### Step 4: Thesis generation

Two layers:
1. **Deterministic thesis** — assembled from trend direction, strength, catalysts, risks, contradiction notes, projection info, and the recommended action. Always generated.
2. **Optional LLM rewrite** (`services/recommendation/thesis_llm.py`) — for trading-eligible recommendations only, the deterministic thesis is rewritten into analyst-quality prose via Ollama. This is cosmetic; the underlying decision is unchanged.

### Step 5: Risk classification

Based on contradiction score, confidence, evidence count, and mode:
- `low` — high confidence, low contradiction, strong evidence
- `moderate` — decent signals with some uncertainty
- `high` — notable contradiction or low evidence
- `very_high` — multiple risk factors present

The thesis is prefixed with the risk label: `[risk:moderate] AAPL shows a bullish trend...`

### Step 6: Persistence

- `recommendations` table — the full recommendation record
- `recommendation_evidence` table — per-document citations with weights and evidence types
- `risk_evaluations` table — the eligibility decision, risk checks, and full decision trace

---

## 11. Trading Engine Decision Loop

The trading engine (`services/trading/engine.py`) polls the `recommendations` table every 60 seconds for actionable recommendations (`action IN ('buy', 'sell')` and `mode IN ('paper_eligible', 'live_eligible')`).

### Pre-trade checks (in order, first failure short-circuits)

1. **Circuit breaker** — is the daily loss cap or single-position loss cap breached? If so, all trading halts.
2. **Trading window** — is the market open? Outside market hours, skip.
3. **Confidence gate** — does the recommendation meet the active risk tier's minimum confidence?
4. **Deduplication** — has this recommendation already been processed?
5. **Declining positions** — are there multiple open positions currently declining?
6. **Max open positions** — is the portfolio at capacity?

### Position sizing (`services/trading/position_sizer.py`)

Computes the dollar amount and share quantity:
- Confidence-based scaling (sample-size-dampened agreement scoring)
- Risk tier adjustment (conservative / moderate / aggressive)
- Portfolio heat check (sector concentration, correlation)
- Active pool available capital
- Absolute position cap

### Stop-loss and take-profit (`services/trading/stop_loss_manager.py`)

- Stop-loss = entry price − (ATR × atr_multiplier)
- Take-profit = entry price + (ATR × atr_multiplier × reward_risk_ratio)
- Trailing stops activate for open positions

### Additional checks

- **Correlation-aware diversification** — reject positions that would push portfolio correlation above threshold
- **Earnings calendar awareness** — reduce size or skip if earnings are within 2 days
- **Gradual entry** — large positions (> $30) split into 3 tranches over time
- **Reserve pool** — profits from closed positions siphon into an emergency liquidity reserve

### Risk tier auto-adjustment (`services/trading/risk_tier_controller.py`)

Daily evaluation of Sharpe ratio, drawdown, and win rate. The engine auto-adjusts between conservative, moderate, and aggressive tiers. The new tier is persisted to `risk_configs` and takes effect on the next cycle.

### Output: Trading Decision

Every evaluation produces a `TradingDecision` record persisted for audit:
- `decision`: act or skip
- `skip_reason`: which check failed (if any)
- `computed_position_size`, `computed_share_quantity`
- `risk_tier_at_decision`, `portfolio_heat_at_decision`, `active_pool_at_decision`
- `circuit_breaker_status`, `correlation_check_result`, `sector_exposure_check_result`
- `earnings_proximity_flag`, `is_micro_trade`, `decision_trace`

If the decision is **act**, an order job is pushed to the Redis broker queue (`stonks:queue:broker`) with ticker, action, quantity, and order type.

---

## 12. The Complete Data Flow (Summary)

```
Document (article/filing/transcript)
  │
  ▼
Ollama extraction (JSON)
  │  ├─ JSON repair (json-repair library)
  │  └─ Pydantic validation + semantic checks
  │
  ▼
document_intelligence + document_impact_records (PostgreSQL)
  │  └─ Raw prompts/responses → MinIO (audit)
  │
  ├──────────────────────────────────────────────────┐
  │                                                  │
  ▼                                                  ▼
Layer 1: Company signals              Layer 2: Macro signals
(impact records → WeightedSignal)     (global_events → exposure matching
                                       → macro_impact_records → WeightedSignal)
  │                                                  │
  │                    Layer 3: Competitive signals   │
  │                    (pattern mining + propagation  │
  │                     → competitive_signal_records  │
  │                     → WeightedSignal)             │
  │                           │                       │
  └───────────┬───────────────┘───────────────────────┘
              │
              ▼
       Signal merging (concatenate all WeightedSignals)
              │
              ▼
       Trend summary assembly
       (weighted sentiment → direction, strength, confidence,
        contradiction, evidence ranking, catalysts, risks)
              │
              ├─→ trend_windows (PostgreSQL)
              ├─→ trend_history (time-series)
              └─→ trend_evidence (per-document rankings)
              │
              ▼
       Trend projection (forward-looking)
              │
              ▼
       Data quality suppression
       (extraction confidence, staleness, diversity,
        macro-only / pattern-only safety)
              │
              ▼
       Eligibility evaluation
       (gate checks → action mapping → mode escalation → position sizing)
              │
              ▼
       Thesis generation + risk classification
              │
              ├─→ recommendations (PostgreSQL)
              ├─→ recommendation_evidence
              └─→ risk_evaluations
              │
              ▼
       Trading engine decision loop
       (pre-trade checks → position sizing → stop-loss →
        correlation → earnings → gradual entry)
              │
              ├─→ trading_decisions (PostgreSQL, audit)
              └─→ stonks:queue:broker (Redis, order execution)
```

---

## 13. Key Database Tables

| Table | Stage | Purpose |
|---|---|---|
| `documents` | Ingestion | Raw ingested content |
| `document_intelligence` | Extraction | Ollama extraction output |
| `document_impact_records` | Extraction | Per-company impact from a document |
| `global_events` | Macro | Classified macro/geopolitical events |
| `exposure_profiles` | Macro | Company exposure data (geography, supply chain, commodities) |
| `macro_impact_records` | Macro | Per-company macro impact scores |
| `competitor_relationships` | Competitive | Company relationship graph |
| `competitive_signal_records` | Competitive | Cross-company propagated signals |
| `trend_windows` | Aggregation | Current trend summaries (upserted each cycle) |
| `trend_history` | Aggregation | Time-series snapshots for charting |
| `trend_evidence` | Aggregation | Per-document evidence rankings |
| `trend_projections` | Projection | Forward-looking trend projections |
| `recommendations` | Recommendation | Trade recommendations |
| `recommendation_evidence` | Recommendation | Per-document citations |
| `risk_evaluations` | Recommendation | Eligibility decisions and risk checks |
| `risk_configs` | Runtime | Toggle switches and risk tier configuration |
| `trading_decisions` | Trading | Pre-trade evaluation audit trail |
| `positions` | Trading | Open positions |
| `orders` | Trading | Broker orders |
| `fills` | Trading | Order fills |

---

## 14. Audit Trail

Every stage preserves full context for reproducibility:

- **Extraction**: raw Ollama response, repair steps, validation errors, all retry attempts
- **Aggregation**: per-signal weight breakdowns (recency, credibility, novelty, market context), contradiction details by dimension
- **Recommendation**: deterministic thesis, evidence citations with weights, eligibility decision trace, risk evaluation
- **Trading**: every pre-trade check result, position sizing breakdown, risk tier at decision time, full decision trace
- **Execution**: order details, fills, P&L, performance metrics