fix: clean up utcnow deprecation warnings, fix 12 failing tests, add CI/CD pipeline manifests

- Replace all datetime.utcnow() with datetime.now(tz=timezone.utc) across 8 files - Fix 12 failing tests to match current implementation behavior - Fix pytest_plugins in non-top-level conftest (moved to root conftest.py) - Auto-fix 189 lint issues (import sorting, unused imports) - Add CI/CD pipeline infrastructure (ARC, ArgoCD, Kargo manifests) - Add values-beta.yaml and values-paper.yaml for staged deployments - Update GitHub Actions workflow to use self-hosted-gremlin runners - Add integration-test job to CI pipeline Result: 1596 passed, 0 failed, 0 warnings
2026-04-18 03:59:28 +00:00
parent 40227a4eb2
commit c85c0068a2
123 changed files with 7221 additions and 405 deletions
@@ -373,6 +373,110 @@ All services read configuration from environment variables with sensible default

 ---

+## 11. Integration Tests
+
+The integration test pipeline validates all API endpoints against a live Kubernetes sandbox with realistic seed data. It deploys ephemeral infrastructure (PostgreSQL, Redis, MinIO), seeds deterministic test data, deploys all API services, and runs the full test suite with profiling.
+
+### Prerequisites
+
+- `kubectl` configured with access to a Kubernetes cluster
+- Docker images built and pushed to GHCR (or use `:latest`)
+- `envsubst` available (usually part of `gettext` package)
+- `GHCR_TOKEN` environment variable set for image pulls (optional if images are public)
+
+### Running the Full Pipeline
+
+```bash
+# Run with latest images
+bash infra/inttest/run_pipeline.sh
+
+# Run with a specific image tag
+bash infra/inttest/run_pipeline.sh --image-tag abc123
+
+# Keep the sandbox running for debugging
+bash infra/inttest/run_pipeline.sh --skip-teardown
+
+# Custom namespace and results file
+bash infra/inttest/run_pipeline.sh --namespace my-test --results-file results.json
+```
+
+### CLI Options
+
+| Option | Default | Description |
+|--------|---------|-------------|
+| `--image-tag TAG` | `latest` | Docker image tag to deploy |
+| `--namespace NAME` | `stonks-inttest-<timestamp>` | Kubernetes namespace name |
+| `--skip-teardown` | `false` | Leave namespace running after tests |
+| `--results-file PATH` | `inttest-results.json` | Path for JSON results output |
+
+### Exit Codes
+
+| Code | Meaning |
+|------|---------|
+| 0 | All tests passed |
+| 1 | One or more test failures |
+| 2 | Infrastructure setup failure |
+
+### JSON Result Contract
+
+The pipeline produces a JSON results file (`inttest-results.json` by default) with this structure:
+
+```json
+{
+  "run_id": "stonks-inttest-1705312800",
+  "image_tag": "abc123",
+  "started_at": "2025-01-15T12:00:00Z",
+  "completed_at": "2025-01-15T12:07:30Z",
+  "exit_code": 0,
+  "stages": {
+    "infra_deploy": {"duration_s": 45, "status": "ok"},
+    "seed_data": {"duration_s": 8, "status": "ok"},
+    "service_deploy": {"duration_s": 32, "status": "ok"},
+    "integration_tests": {"duration_s": 28, "status": "ok"},
+    "teardown": {"duration_s": 5, "status": "ok"}
+  },
+  "tests": {"total": 41, "passed": 41, "failed": 0, "errors": 0},
+  "profiling": {
+    "endpoints": {"/api/companies": {"p50_ms": 12, "p95_ms": 25, "p99_ms": 45}},
+    "slow_endpoints": []
+  }
+}
+```
+
+### Running Tests Locally (Development)
+
+For faster iteration during development, you can run individual test files against local services:
+
+```bash
+# Start local services first (query-api on 8000, registry on 8001, etc.)
+# Then run specific test files:
+.venv/bin/python -m pytest tests/integration/test_query_api.py -v --tb=short
+.venv/bin/python -m pytest tests/integration/test_registry_api.py -v --tb=short
+.venv/bin/python -m pytest tests/integration/test_frontend_data_deps.py -v --tb=short
+
+# Run with profiling output:
+.venv/bin/python -m pytest tests/integration/ -v --profiling-output=profiling.json
+```
+
+Set the service URLs via environment variables:
+```bash
+export QUERY_API_URL=http://localhost:8000
+export REGISTRY_API_URL=http://localhost:8001
+export RISK_API_URL=http://localhost:8002
+export TRADING_API_URL=http://localhost:8003
+```
+
+### Future: CI/CD Pipeline
+
+This integration test runner is designed as a standalone foundation. A future CI/CD pipeline spec will consume it as one stage in a larger pipeline that includes:
+- Self-hosted builds on gremlin nodes (no GitHub Actions compute costs)
+- Staged promotion: beta → paper → live
+- Market-hours promotion blockers (9:30–16:00 ET)
+- Break-glass emergency deploy to production
+- Per-stage enable/disable toggles
+
+---
+
 ## Troubleshooting

 ### "Connection refused" to PostgreSQL/Redis/MinIO
@@ -0,0 +1,535 @@
+# From Model Output to Trade: The Full Pipeline
+
+This document traces the complete journey of data through Stonks Oracle — from the moment an Ollama model produces structured JSON, through signal scoring and aggregation, to the final trading decision.
+
+---
+
+## 1. Document Ingestion
+
+Before the model ever sees a document, the ingestion layer fetches raw content from configured sources (news APIs, SEC filings, earnings transcripts, press releases). Each document lands in the `documents` table with a status, type, and `published_at` timestamp. A Redis queue (`stonks:queue:extraction`) feeds documents to the extractor service.
+
+---
+
+## 2. Prompting the Model
+
+The extractor service (`services/extractor/client.py`) sends each document to a local Ollama instance via `POST /api/chat`.
+
+### System prompt
+
+A short, strict instruction set:
+
+> You are a financial document analyst. Extract structured data as JSON. Return ONLY a single JSON object. No markdown fences, no explanation, no text before or after the JSON. Every field in the schema is required. Use "other" for catalyst_type if unsure. Keep evidence_spans short (under 20 words each). Keep key_facts to 3-5 items max.
+
+### User prompt
+
+Built dynamically per document (`services/extractor/prompts.py`). It includes:
+
+- **Document type guidance** — tailored instructions for articles, filings, transcripts, and press releases. For example, filings get: *"Extract concrete financial figures, risk factors, and material events as stated."* Transcripts get: *"Distinguish between management forward-looking statements and reported results."*
+- **Tracked ticker hints** — the list of 50 tracked tickers, with rules: if a ticker appears verbatim in the text, the model must include it; if a sector theme clearly affects a tracked company, include it; never invent tickers outside the list.
+- **Field-by-field instructions** — what each output field means and its valid range.
+- **Document text** — truncated to 8,000 characters to keep inference fast.
+
+### Ollama call parameters
+
+- `think=false` (speed over chain-of-thought)
+- `num_predict=4096` (max output tokens)
+- Optional `num_ctx` override for longer documents
+- The JSON schema (generated from Pydantic models) is passed as the `format` parameter for structured output
+
+---
+
+## 3. Model Output: The JSON Contract
+
+The model returns a single JSON object matching the `ExtractionResult` schema (`services/extractor/schemas.py`):
+
+```json
+{
+  "summary": "Apple reported record Q4 earnings driven by iPhone 16 demand.",
+  "companies": [
+    {
+      "ticker": "AAPL",
+      "company_name": "Apple Inc.",
+      "relevance": 0.95,
+      "sentiment": "positive",
+      "impact_score": 0.8,
+      "impact_horizon": "1d_7d",
+      "catalyst_type": "earnings",
+      "key_facts": [
+        "Revenue up 12% YoY to $94.9B",
+        "iPhone revenue grew 18%",
+        "Services hit all-time high"
+      ],
+      "risks": [
+        "China market softness noted by management"
+      ],
+      "evidence_spans": [
+        "record quarterly revenue of $94.9 billion",
+        "iPhone revenue grew 18 percent year over year"
+      ]
+    }
+  ],
+  "macro_themes": ["consumer_spending", "ai_capex"],
+  "novelty_score": 0.6,
+  "confidence": 0.85,
+  "extraction_warnings": []
+}
+```
+
+### Field definitions
+
+| Field | Type | Range | Purpose |
+|---|---|---|---|
+| `summary` | string | — | 1-3 sentence document summary |
+| `companies[]` | array | — | Per-company intelligence (one entry per affected company) |
+| `.ticker` | string | — | Stock ticker symbol |
+| `.relevance` | float | 0-1 | How central this company is to the document |
+| `.sentiment` | enum | positive / negative / neutral / mixed | Overall sentiment toward the company |
+| `.impact_score` | float | 0-1 | Estimated magnitude of impact (0 = negligible, 1 = highly material) |
+| `.impact_horizon` | string | intraday / 1d / 1d_7d / 1d_30d / 30d_90d / 90d_plus | When the impact is expected to manifest |
+| `.catalyst_type` | enum | earnings / product / legal / macro / supply_chain / m_and_a / rating_change / other | Primary catalyst category |
+| `.key_facts` | string[] | — | Facts explicitly stated in the document (no fabrication) |
+| `.risks` | string[] | — | Risks explicitly mentioned |
+| `.evidence_spans` | string[] | — | Short verbatim quotes supporting the analysis |
+| `macro_themes` | string[] | — | Broad economic themes (rates, inflation, ai_capex, etc.) |
+| `novelty_score` | float | 0-1 | How surprising the information is |
+| `confidence` | float | 0-1 | Model's self-assessed extraction quality |
+| `extraction_warnings` | string[] | — | Issues encountered (ambiguous_ticker, incomplete_text, etc.) |
+
+---
+
+## 4. JSON Repair and Validation
+
+The raw model output goes through two stages before it's trusted.
+
+### 4a. JSON repair (`services/extractor/client.py`)
+
+Ollama's `format` constraint is unreliable with `think=false` on certain models (Ollama bug #14645). The extractor handles this:
+
+1. Try `json.loads()` directly — if it parses, use it as-is.
+2. Strip markdown fences (` ```json ... ``` `) if present.
+3. Fall back to the `json-repair` library, which fixes trailing commas, unterminated strings, and control characters.
+
+### 4b. Structural + semantic validation (`services/extractor/schemas.py`)
+
+1. **Structural validation** — parse the JSON against the `ExtractionResult` Pydantic model. Missing required fields, wrong types, or out-of-range values fail here.
+2. **Semantic validation** — cross-field consistency checks:
+   - Ticker format validation
+   - Evidence span length checks
+   - Catalyst type alias normalization (maps variants to canonical enum values)
+   - Impact horizon normalization
+3. Returns a `ValidationReport` with the parsed result or a list of errors.
+
+### 4c. Retry logic
+
+If validation fails and the error is retryable (not an HTTP 4xx client error), the extractor retries up to `max_retries` times (default 2) with exponential backoff. Every attempt — raw output, validation result, error, duration — is preserved in the `ExtractionResponse.attempts` list for audit.
+
+---
+
+## 5. Persistence: Document Intelligence and Impact Records
+
+A successful extraction produces two sets of database records.
+
+### Document intelligence (`document_intelligence` table)
+
+One row per document:
+- `document_id`, `document_type`, `summary`, `companies` (JSONB), `macro_themes`
+- `novelty_score`, `source_credibility`, `confidence`, `extraction_warnings`
+- `validation_status` (valid/failed)
+- `model` metadata: provider, model_name, prompt_version, schema_version
+
+### Per-company impact records (`document_impact_records` table)
+
+One row per company mentioned in the extraction:
+- `ticker`, `company_name`, `relevance`, `sentiment`, `impact_score`, `impact_horizon`, `catalyst_type`
+- `key_facts`, `risks`, `evidence_spans` (all JSONB)
+- Links back to `document_intelligence` via `intelligence_id`
+
+### Raw artifact storage (MinIO)
+
+Full prompts and raw model responses are stored in MinIO buckets (`stonks-llm-prompts`, `stonks-llm-results`) keyed by `document_id`, so any extraction can be replayed or audited.
+
+---
+
+## 6. Signal Scoring: Turning Records into Weighted Signals
+
+The aggregation engine (`services/aggregation/worker.py`) converts raw impact records into `WeightedSignal` objects. Each signal carries a composite weight that determines how much it influences the final trend.
+
+### Weight components (`services/aggregation/scoring.py`)
+
+The combined weight is:
+
+```
+combined = gate × recency × credibility × (1 + novelty_bonus) × market_context_multiplier
+```
+
+| Component | Formula | Purpose |
+|---|---|---|
+| **Confidence gate** | 0 if extraction confidence < 0.2, else 1 | Reject unreliable extractions entirely |
+| **Recency decay** | `2^(-age_hours / half_life)`, min 0.01 | Exponential decay — newer documents matter more. Half-lives: intraday=2h, 1d=12h, 7d=72h, 30d=240h, 90d=720h |
+| **Credibility** | `source_credibility ^ exponent`, clamped [0.1, 1.0] | Source quality weighting |
+| **Novelty bonus** | `novelty_score × 0.25` | Novel information gets up to 25% boost |
+| **Market context** | Volatility boost (up to +30%) + volume surge boost (+15%) | Fast-moving, high-volume markets amplify fresh signals |
+
+Each `WeightedSignal` also carries:
+- `sentiment_value`: +1.0 (positive), -1.0 (negative), 0.0 (neutral/mixed)
+- `impact_score`: the extraction's impact magnitude
+- `document_id`: for evidence tracing
+
+---
+
+## 7. Three Signal Layers
+
+The aggregation engine merges signals from three independent layers. Each layer can be toggled on/off at runtime via the `risk_configs` table — no restart needed.
+
+### Layer 1: Company-specific signals (always active)
+
+Direct document intelligence about a company. This is the core layer — `document_impact_records` for the ticker, scored as described in §6.
+
+### Layer 2: Macro signals (toggle: `macro_enabled`)
+
+Global events that affect companies through exposure profiles.
+
+**Flow:**
+1. The macro service classifies global events (from news) using Ollama — extracting event type, severity, affected regions/sectors/commodities.
+2. Each company has an **exposure profile** (`exposure_profiles` table): geographic revenue mix, supply chain regions, commodity dependencies, market position tier.
+3. **Overlap scoring** computes how much a global event overlaps with a company's exposure (geographic, supply chain, commodity dimensions).
+4. A **resilience modifier** based on market position tier (global leaders are more resilient than domestic companies) adjusts the score.
+5. The final `macro_impact_score = base_score × overlap_factor × resilience_modifier`.
+6. Events older than 48 hours get accelerated staleness decay.
+
+Macro signals are converted to `WeightedSignal` objects with:
+- `sentiment_value` mapped from `impact_direction` (positive → +1, negative → -1)
+- `impact_score = macro_impact_score × macro_signal_weight` (default weight: 0.3)
+- Recency decay from the global event's publication time
+
+### Layer 3: Competitive signals (toggle: `competitive_enabled`)
+
+Historical patterns and cross-company signal propagation.
+
+**Flow:**
+1. **Self-company pattern mining** (`services/aggregation/pattern_matcher.py`): For each catalyst type in the current impact records, query historical outcomes for this ticker. Lookback: 180 days for routine signals, 365 days for major decisions (1.3× weight multiplier). Produces `HistoricalPattern` objects with `bullish_pct`, `bearish_pct`, `avg_strength`, `pattern_confidence`.
+2. **Cross-company propagation** (`services/aggregation/signal_propagation.py`): When company A has a catalyst, look up its competitors via the `competitor_relationships` table (46 relationships across 50 companies). For each competitor, query cross-company historical patterns. Signal strength = `avg_strength × relationship_strength × pattern_confidence × impact_score`. Direction = majority historical outcome (bullish or bearish).
+3. Competitive signals are converted to `WeightedSignal` objects with:
+   - `impact_score = signal_strength × competitive_signal_weight` (default weight: 0.2)
+   - Recency decay from the pattern's most recent data point or the signal's `computed_at` time
+
+### Merging
+
+All three layers produce `WeightedSignal` objects with the same structure. The aggregation engine simply concatenates them into a single list before computing the trend summary. The relative influence of each layer is controlled by the `macro_signal_weight` (0.3) and `competitive_signal_weight` (0.2) multipliers applied to their impact scores.
+
+---
+
+## 8. Trend Summary Assembly
+
+From the merged signal list, the aggregation engine computes a `TrendSummary` for each ticker × window combination (intraday, 1d, 7d, 30d, 90d).
+
+### Weighted sentiment average
+
+```
+avg_sentiment = Σ(sentiment_value × combined_weight × impact_score) / Σ(combined_weight × impact_score)
+```
+
+### Trend direction
+
+| Condition | Direction |
+|---|---|
+| `avg_sentiment ≥ 0.15` | **Bullish** |
+| `avg_sentiment ≤ -0.15` | **Bearish** |
+| Contradiction > 0.10 and \|avg_sentiment\| < 0.30 | **Mixed** |
+| Otherwise | **Neutral** |
+
+### Trend strength
+
+`strength = min(|avg_sentiment|, 1.0)` — the absolute magnitude of the weighted sentiment, clamped to [0, 1].
+
+### Contradiction score
+
+Measures disagreement among signals:
+
+```
+contradiction = minority_side_weight / total_weight
+```
+
+Where minority side is whichever of positive or negative has less total weight. A score of 0 means full agreement; approaching 0.5 means equal-weight disagreement.
+
+The system also runs multi-dimensional contradiction detection (`services/aggregation/contradiction.py`):
+- **Sentiment disagreement** — the core positive-vs-negative split
+- **Catalyst disagreement** — same catalyst type with opposing sentiment from different documents
+
+### Confidence
+
+Derived from four factors:
+- **Unique source count** — more distinct documents = higher confidence (caps at 15 unique sources for 0.8 contribution)
+- **Average extraction confidence** — from the model's self-assessed quality
+- **Signal agreement** — fraction of signals pointing the same direction, dampened by sample size (log₂ scaling, saturates around 7 unique sources)
+- **Contradiction penalty** — `contradiction_score × 0.4` subtracted
+
+### Evidence ranking
+
+Supporting and opposing documents are ranked by a composite score considering weight, impact, recency, and confidence — not just raw weight. The top 10 of each are stored for citation.
+
+### Catalysts and risks
+
+Dominant catalyst types are ranked by cumulative signal weight. Material risks are deduplicated and ordered by the weight of the signal that surfaced them.
+
+### Persistence
+
+The assembled `TrendSummary` is upserted into the `trend_windows` table (one row per entity × window, updated each cycle). A snapshot is also appended to `trend_history` for time-series charting. Evidence mappings go into `trend_evidence` with per-document rank scores and component breakdowns.
+
+---
+
+## 9. Trend Projections
+
+After assembling the current trend, the engine computes a forward-looking projection (`services/aggregation/projection.py`):
+
+- **Macro decay** — projects macro event impact forward with exponential decay based on estimated duration and severity
+- **Momentum** — trend momentum from recent price action
+- **Driving factors** — lists key macro events, competitive patterns, and market conditions
+- **Divergence detection** — flags when the projection diverges from the current trend direction
+
+Output: `TrendProjection` with `projected_direction`, `projected_strength`, `projected_confidence`, `projection_horizon`, `driving_factors`, and `diverges_from_current`. Projections with confidence below 0.3 are flagged as `low_confidence` and excluded from thesis generation.
+
+---
+
+## 10. Recommendation Generation
+
+The recommendation service (`services/recommendation/worker.py`) turns trend summaries into actionable recommendations.
+
+### Step 1: Data quality suppression (`services/recommendation/suppression.py`)
+
+Before any eligibility check, the system evaluates the quality of the underlying data:
+
+| Check | Threshold | Effect |
+|---|---|---|
+| Average extraction confidence | < 0.40 | Suppress |
+| Evidence staleness | > 168 hours (7 days) | Suppress |
+| Source type diversity | < 1 distinct type | Suppress |
+| Extraction failure rate | > 50% | Suppress |
+| Valid document count | < 2 | Suppress |
+| Overall data quality score | < 0.30 | Suppress |
+
+The data quality score is a weighted composite: 40% extraction confidence + 30% evidence freshness + 30% document coverage.
+
+**Safety suppression** — two additional rules prevent trading on thin evidence from a single signal layer:
+- **Macro-only suppression**: If the trend direction is driven solely by macro signals with zero company-specific evidence, the recommendation is forced to informational mode.
+- **Pattern-only suppression**: Same rule for pattern/competitive signals with no company or macro support.
+
+### Step 2: Eligibility evaluation (`services/recommendation/eligibility.py`)
+
+Deterministic rules — no model involvement:
+
+**Gate checks** (any failure → no recommendation):
+- Confidence ≥ 0.35
+- Trend strength ≥ 0.10
+- Contradiction score ≤ 0.60
+- Evidence count ≥ 2
+- Direction ≠ neutral
+
+**Action mapping:**
+- Strong bullish (strength ≥ 0.25) → **BUY**
+- Strong bearish (strength ≥ 0.25) → **SELL**
+- Weak but directional + decent confidence (≥ 0.50) → **HOLD**
+- Everything else → **WATCH**
+
+**Mode escalation:**
+- WATCH and HOLD → always **informational** (no trades)
+- BUY/SELL with confidence ≥ 0.70, contradiction ≤ 0.25, evidence ≥ 5 → **live_eligible**
+- BUY/SELL with confidence ≥ 0.50 → **paper_eligible**
+- Below that → **informational**
+
+### Step 3: Position sizing
+
+Computed from signal quality:
+
+```
+raw_portfolio_pct = base (1%) + confidence × strength × range (up to 10%)
+```
+
+Adjusted by:
+- Contradiction penalty (higher contradiction → smaller position)
+- Evidence count penalty (< 3 docs → 50% reduction, < 5 docs → 75%)
+- Max loss percentage scales similarly (base 0.3% up to 2%)
+
+### Step 4: Thesis generation
+
+Two layers:
+1. **Deterministic thesis** — assembled from trend direction, strength, catalysts, risks, contradiction notes, projection info, and the recommended action. Always generated.
+2. **Optional LLM rewrite** (`services/recommendation/thesis_llm.py`) — for trading-eligible recommendations only, the deterministic thesis is rewritten into analyst-quality prose via Ollama. This is cosmetic; the underlying decision is unchanged.
+
+### Step 5: Risk classification
+
+Based on contradiction score, confidence, evidence count, and mode:
+- `low` — high confidence, low contradiction, strong evidence
+- `moderate` — decent signals with some uncertainty
+- `high` — notable contradiction or low evidence
+- `very_high` — multiple risk factors present
+
+The thesis is prefixed with the risk label: `[risk:moderate] AAPL shows a bullish trend...`
+
+### Step 6: Persistence
+
+- `recommendations` table — the full recommendation record
+- `recommendation_evidence` table — per-document citations with weights and evidence types
+- `risk_evaluations` table — the eligibility decision, risk checks, and full decision trace
+
+---
+
+## 11. Trading Engine Decision Loop
+
+The trading engine (`services/trading/engine.py`) polls the `recommendations` table every 60 seconds for actionable recommendations (`action IN ('buy', 'sell')` and `mode IN ('paper_eligible', 'live_eligible')`).
+
+### Pre-trade checks (in order, first failure short-circuits)
+
+1. **Circuit breaker** — is the daily loss cap or single-position loss cap breached? If so, all trading halts.
+2. **Trading window** — is the market open? Outside market hours, skip.
+3. **Confidence gate** — does the recommendation meet the active risk tier's minimum confidence?
+4. **Deduplication** — has this recommendation already been processed?
+5. **Declining positions** — are there multiple open positions currently declining?
+6. **Max open positions** — is the portfolio at capacity?
+
+### Position sizing (`services/trading/position_sizer.py`)
+
+Computes the dollar amount and share quantity:
+- Confidence-based scaling (sample-size-dampened agreement scoring)
+- Risk tier adjustment (conservative / moderate / aggressive)
+- Portfolio heat check (sector concentration, correlation)
+- Active pool available capital
+- Absolute position cap
+
+### Stop-loss and take-profit (`services/trading/stop_loss_manager.py`)
+
+- Stop-loss = entry price − (ATR × atr_multiplier)
+- Take-profit = entry price + (ATR × atr_multiplier × reward_risk_ratio)
+- Trailing stops activate for open positions
+
+### Additional checks
+
+- **Correlation-aware diversification** — reject positions that would push portfolio correlation above threshold
+- **Earnings calendar awareness** — reduce size or skip if earnings are within 2 days
+- **Gradual entry** — large positions (> $30) split into 3 tranches over time
+- **Reserve pool** — profits from closed positions siphon into an emergency liquidity reserve
+
+### Risk tier auto-adjustment (`services/trading/risk_tier_controller.py`)
+
+Daily evaluation of Sharpe ratio, drawdown, and win rate. The engine auto-adjusts between conservative, moderate, and aggressive tiers. The new tier is persisted to `risk_configs` and takes effect on the next cycle.
+
+### Output: Trading Decision
+
+Every evaluation produces a `TradingDecision` record persisted for audit:
+- `decision`: act or skip
+- `skip_reason`: which check failed (if any)
+- `computed_position_size`, `computed_share_quantity`
+- `risk_tier_at_decision`, `portfolio_heat_at_decision`, `active_pool_at_decision`
+- `circuit_breaker_status`, `correlation_check_result`, `sector_exposure_check_result`
+- `earnings_proximity_flag`, `is_micro_trade`, `decision_trace`
+
+If the decision is **act**, an order job is pushed to the Redis broker queue (`stonks:queue:broker`) with ticker, action, quantity, and order type.
+
+---
+
+## 12. The Complete Data Flow (Summary)
+
+```
+Document (article/filing/transcript)
+  │
+  ▼
+Ollama extraction (JSON)
+  │  ├─ JSON repair (json-repair library)
+  │  └─ Pydantic validation + semantic checks
+  │
+  ▼
+document_intelligence + document_impact_records (PostgreSQL)
+  │  └─ Raw prompts/responses → MinIO (audit)
+  │
+  ├──────────────────────────────────────────────────┐
+  │                                                  │
+  ▼                                                  ▼
+Layer 1: Company signals              Layer 2: Macro signals
+(impact records → WeightedSignal)     (global_events → exposure matching
+                                       → macro_impact_records → WeightedSignal)
+  │                                                  │
+  │                    Layer 3: Competitive signals   │
+  │                    (pattern mining + propagation  │
+  │                     → competitive_signal_records  │
+  │                     → WeightedSignal)             │
+  │                           │                       │
+  └───────────┬───────────────┘───────────────────────┘
+              │
+              ▼
+       Signal merging (concatenate all WeightedSignals)
+              │
+              ▼
+       Trend summary assembly
+       (weighted sentiment → direction, strength, confidence,
+        contradiction, evidence ranking, catalysts, risks)
+              │
+              ├─→ trend_windows (PostgreSQL)
+              ├─→ trend_history (time-series)
+              └─→ trend_evidence (per-document rankings)
+              │
+              ▼
+       Trend projection (forward-looking)
+              │
+              ▼
+       Data quality suppression
+       (extraction confidence, staleness, diversity,
+        macro-only / pattern-only safety)
+              │
+              ▼
+       Eligibility evaluation
+       (gate checks → action mapping → mode escalation → position sizing)
+              │
+              ▼
+       Thesis generation + risk classification
+              │
+              ├─→ recommendations (PostgreSQL)
+              ├─→ recommendation_evidence
+              └─→ risk_evaluations
+              │
+              ▼
+       Trading engine decision loop
+       (pre-trade checks → position sizing → stop-loss →
+        correlation → earnings → gradual entry)
+              │
+              ├─→ trading_decisions (PostgreSQL, audit)
+              └─→ stonks:queue:broker (Redis, order execution)
+```
+
+---
+
+## 13. Key Database Tables
+
+| Table | Stage | Purpose |
+|---|---|---|
+| `documents` | Ingestion | Raw ingested content |
+| `document_intelligence` | Extraction | Ollama extraction output |
+| `document_impact_records` | Extraction | Per-company impact from a document |
+| `global_events` | Macro | Classified macro/geopolitical events |
+| `exposure_profiles` | Macro | Company exposure data (geography, supply chain, commodities) |
+| `macro_impact_records` | Macro | Per-company macro impact scores |
+| `competitor_relationships` | Competitive | Company relationship graph |
+| `competitive_signal_records` | Competitive | Cross-company propagated signals |
+| `trend_windows` | Aggregation | Current trend summaries (upserted each cycle) |
+| `trend_history` | Aggregation | Time-series snapshots for charting |
+| `trend_evidence` | Aggregation | Per-document evidence rankings |
+| `trend_projections` | Projection | Forward-looking trend projections |
+| `recommendations` | Recommendation | Trade recommendations |
+| `recommendation_evidence` | Recommendation | Per-document citations |
+| `risk_evaluations` | Recommendation | Eligibility decisions and risk checks |
+| `risk_configs` | Runtime | Toggle switches and risk tier configuration |
+| `trading_decisions` | Trading | Pre-trade evaluation audit trail |
+| `positions` | Trading | Open positions |
+| `orders` | Trading | Broker orders |
+| `fills` | Trading | Order fills |
+
+---
+
+## 14. Audit Trail
+
+Every stage preserves full context for reproducibility:
+
+- **Extraction**: raw Ollama response, repair steps, validation errors, all retry attempts
+- **Aggregation**: per-signal weight breakdowns (recency, credibility, novelty, market context), contradiction details by dimension
+- **Recommendation**: deterministic thesis, evidence citations with weights, eligibility decision trace, risk evaluation
+- **Trading**: every pre-trade check result, position sizing breakdown, risk tier at decision time, full decision trace
+- **Execution**: order details, fills, P&L, performance metrics