feat: implement dual-pipeline signal engine service

New service at services/signal_engine/ implementing concurrent heuristic (deterministic scoring) and probabilistic (Bayesian inference) pipelines that evaluate technical signals across 6 timeframes (M30-M) and produce independent BUY/WATCH/SKIP verdicts per ticker per evaluation tick. Components: - Input Normalizer: multi-source data assembly with sentinel fallbacks - Signal Library: Fibonacci, MA Stack, RSI, Cup & Handle, Elliott Wave - Multi-Timeframe Confluence Engine: weighted scoring with D/W/M anchors - Hard Filter Engine: macro_bias, valuation, earnings proximity gating - Heuristic Pipeline: S_total scoring with confidence-gated verdicts - Probabilistic Pipeline: Bayesian log-odds with regime priors, entropy gating, EV_R calculation, and signal correlation penalty - Exit Engine: stop-loss, targets, trailing ATR-based stops - Delta Analyzer: pipeline agreement tracking with rolling Redis metrics - Output Formatter: SignalOutput contract + Recommendation schema mapping - Worker orchestrator: concurrent pipelines with failure isolation - Main entry point: queue polling with fail-safe config loading Infrastructure: - Migration 039: signal_engine_outputs table with 3 indexes - Helm chart: signalEngine service entry (processing tier) - Redis key: QUEUE_SIGNAL_ENGINE constant Tests: 390 tests (unit + property-based) covering all components Config: dual_pipeline_enabled=false by default (safe rollout)
2026-05-02 07:32:26 +00:00
parent 7e2343ec2c
commit f468e30af0
61 changed files with 14107 additions and 184 deletions
@@ -1,6 +1,6 @@
 # AI Agent Building Guide

-Stonks Oracle uses three AI agents powered by a local Ollama instance. Each agent has a dedicated purpose in the pipeline, a database-backed configuration, and support for A/B testing through variants. This guide covers how each agent works, how to configure them, how to create and test variants, and how to monitor performance.
+Stonks Oracle uses three AI agents powered by local LLM inference (Ollama or vLLM). Each agent has a dedicated purpose in the pipeline, a database-backed configuration, and support for A/B testing through variants. This guide covers how each agent works, how to configure them, how to create and test variants, and how to monitor performance.

 ## Table of Contents

@@ -8,6 +8,7 @@ Stonks Oracle uses three AI agents powered by a local Ollama instance. Each agen
  - [Document Intelligence Extractor](#1-document-intelligence-extractor)
  - [Global Event Classifier](#2-global-event-classifier)
  - [Thesis Rewriter](#3-thesis-rewriter)
+- [LLM Provider Abstraction](#llm-provider-abstraction)
 - [Database Schema](#database-schema)
  - [ai_agents Table](#ai_agents-table)
  - [agent_variants Table](#agent_variants-table)
@@ -30,9 +31,10 @@ Three agents are seeded into the `ai_agents` table on first migration (migration
 | **Slug** | `document-extractor` |
 | **Purpose** | Extracts structured intelligence (sentiment, catalysts, impact scores, key facts, risks) from company news, SEC filings, earnings transcripts, and press releases |
 | **Default Model** | `qwen3.5:9b-fast` (Ollama) |
+| **Supported Providers** | `ollama`, `vllm` |
 | **Prompt Version** | `document-intel-v2` |
 | **Schema Version** | `2.0.0` |
-| **Entry Point** | `services/extractor/main.py` → `services/extractor/client.py` |
+| **Entry Point** | `services/extractor/main.py` → `services/extractor/llm_factory.py` → `services/extractor/client.py` (Ollama) or `services/extractor/vllm_client.py` (vLLM) |

 **Input Data:**
 - Normalized document text (fetched from MinIO or passed in the Redis job payload)
@@ -40,7 +42,7 @@ Three agents are seeded into the `ai_agents` table on first migration (migration
 - List of tracked tickers for company identification
 - Document ID for traceability

-**Output Schema** (`ExtractionResult`):
+**Output Schema** (`ExtractionResult` — defined in `services/extractor/schemas.py`):

 ```json
 {
@@ -81,6 +83,7 @@ Use "other" for catalyst_type if unsure. Keep evidence_spans short
 - Includes tracked ticker list with rules for company identification
 - Includes the full JSON schema field descriptions
 - Truncates documents to 8,000 characters to limit inference time
+- When an active variant has `input_token_limit > 0`, truncation uses `input_token_limit * 4` characters instead

 ---

@@ -91,6 +94,7 @@ Use "other" for catalyst_type if unsure. Keep evidence_spans short
 | **Slug** | `event-classifier` |
 | **Purpose** | Classifies global/geopolitical news into structured macro events with impact type, severity, affected regions/sectors/commodities, and estimated duration |
 | **Default Model** | `qwen3.5:9b-fast` (Ollama) |
+| **Supported Providers** | `ollama`, `vllm` |
 | **Prompt Version** | `event-classification-v1` |
 | **Schema Version** | `1.0.0` |
 | **Entry Point** | `services/extractor/main.py` → `services/extractor/event_classifier.py` |
@@ -99,7 +103,7 @@ Use "other" for catalyst_type if unsure. Keep evidence_spans short
 - Normalized text of a macro news article (from the `stonks:queue:macro_classification` Redis queue)
 - Document ID for traceability

-**Output Schema** (`GlobalEvent`):
+**Output Schema** (`GlobalEvent` — defined in `services/extractor/event_classifier.py`):

 ```json
 {
@@ -141,9 +145,11 @@ as empty arrays.
 ```

 **User Prompt Template** (built by `build_event_classification_prompt()` in `services/extractor/event_classifier.py`):
- Includes anti-hallucination rules
+- Includes anti-hallucination rules (no fabrication, severity "critical" reserved for multi-country events)
 - Lists all valid enum values for each field
 - Truncates articles to 6,000 characters
+- When an active variant has `input_token_limit > 0`, truncation uses `input_token_limit * 4` characters instead
+- If a variant overrides the system prompt, the classifier ensures JSON output instructions are always appended if not already present

 ---

@@ -154,6 +160,7 @@ as empty arrays.
 | **Slug** | `thesis-rewriter` |
 | **Purpose** | Rewrites deterministic trade thesis summaries into clear, professional analyst prose. Optional layer — the system falls back to the deterministic thesis if this fails |
 | **Default Model** | `qwen3.5:9b-fast` (Ollama) |
+| **Supported Providers** | `ollama`, `vllm` |
 | **Prompt Version** | `thesis-rewrite-v1` |
 | **Schema Version** | `1.0.0` |
 | **Entry Point** | `services/recommendation/main.py` → `services/recommendation/thesis_llm.py` |
@@ -165,6 +172,7 @@ as empty arrays.
 **Output Schema:**
 - Plain text (not JSON). The model returns only the rewritten thesis as a string, under 150 words.
 - On failure or empty response, the original deterministic thesis is returned unchanged.
+- A `_strip_thinking_block()` post-processor removes `<think>` XML tags and "Thinking Process:" blocks that some models (e.g. Qwen3) emit before the actual response.

 **System Prompt:**

@@ -182,11 +190,37 @@ STRICT RULES:
 5. Use a neutral, professional tone. Avoid hype or marketing language.
 6. Return ONLY the rewritten thesis text. No JSON, no markdown, no
   commentary.
+7. Do NOT show your thinking process. Do NOT include any reasoning
+   steps. Output ONLY the final rewritten text.
 ```

 **User Prompt Template** (built by `build_thesis_rewrite_prompt()` in `services/recommendation/thesis_llm.py`):
 - Includes the deterministic thesis between delimiters
 - Includes trend context: ticker, window, direction, strength, confidence, contradiction score, top catalysts, top risks
+- Appends `/no_think` suffix to suppress reasoning mode on models that support it (e.g. Qwen3)
+- Ollama calls also set `"think": false` in the request payload
+
+---
+
+## LLM Provider Abstraction
+
+All three agents support both **Ollama** and **vLLM** as inference providers. The provider is determined by the `model_provider` field in the agent config (or active variant).
+
+**Module:** `services/extractor/llm_factory.py`
+
+The `build_llm_client()` factory function routes to the correct client:
+
+| `model_provider` value | Client class | API endpoint |
+|------------------------|-------------|--------------|
+| `ollama` (default), `""`, `None` | `OllamaClient` (`services/extractor/client.py`) | `{OLLAMA_BASE_URL}/api/chat` |
+| `vllm` | `VLLMClient` (`services/extractor/vllm_client.py`) | `{VLLM_BASE_URL}/v1/chat/completions` (OpenAI-compatible) |
+| Unknown value | `OllamaClient` (with warning log) | Falls back to Ollama |
+
+Both clients implement the `LLMClient` protocol (`services/shared/llm_protocol.py`), providing `call_llm()` and `close()` methods.
+
+**Provider switching at runtime:** When a variant changes the `model_provider`, the extractor worker detects this during its periodic config refresh (every 100 jobs) and creates a new client instance. The old client is closed gracefully. A safety guard prevents switching to Ollama if `OLLAMA_BASE_URL` is empty.
+
+**vLLM health check:** At startup, if the resolved provider is `vllm`, the extractor runs a health check against the vLLM endpoint. If it fails, the worker falls back to Ollama automatically.

 ---

@@ -202,8 +236,8 @@ Defined in migration `026_ai_agents.sql`. Stores the base configuration for each
 | `name` | `VARCHAR(100)` | — | Human-readable name (unique) |
 | `slug` | `VARCHAR(100)` | — | URL-safe identifier (unique), used by `AgentConfigResolver` |
 | `purpose` | `TEXT` | `''` | Description of what the agent does |
-| `model_provider` | `VARCHAR(50)` | `'ollama'` | LLM provider |
-| `model_name` | `VARCHAR(200)` | `'qwen3.5:9b'` | Model identifier |
+| `model_provider` | `VARCHAR(50)` | `'ollama'` | LLM provider (`ollama` or `vllm`) |
+| `model_name` | `VARCHAR(200)` | `'qwen3.5:9b-fast'` | Model identifier |
 | `system_prompt` | `TEXT` | `''` | System prompt sent to the model |
 | `user_prompt_template` | `TEXT` | `''` | User prompt template (optional — code-defined templates take precedence) |
 | `prompt_version` | `VARCHAR(100)` | `''` | Version tag for prompt tracking |
@@ -297,13 +331,20 @@ The `AgentConfigResolver` is the central mechanism for resolving runtime agent c
 2. **COALESCE-based override**: The SQL query uses `COALESCE(variant_column, agent_column)` for every configuration field. If an active variant exists and has a non-NULL value for a field, that value is used. Otherwise, the base agent's value is used.

   ```sql
-   SELECT a.id AS agent_id,
-          v.id AS variant_id,
+   SELECT a.id        AS agent_id,
+          v.id        AS variant_id,
          COALESCE(v.model_provider,       a.model_provider)       AS model_provider,
          COALESCE(v.model_name,           a.model_name)           AS model_name,
          COALESCE(v.system_prompt,        a.system_prompt)        AS system_prompt,
          COALESCE(v.user_prompt_template, a.user_prompt_template) AS user_prompt_template,
-          -- ... all other fields ...
+          COALESCE(v.prompt_version,       a.prompt_version)       AS prompt_version,
+          COALESCE(v.temperature,          a.temperature)          AS temperature,
+          COALESCE(v.max_tokens,           a.max_tokens)           AS max_tokens,
+          COALESCE(v.context_window,       0)                      AS context_window,
+          COALESCE(v.input_token_limit,    0)                      AS input_token_limit,
+          COALESCE(v.token_budget,         0)                      AS token_budget,
+          COALESCE(v.timeout_seconds,      a.timeout_seconds)      AS timeout_seconds,
+          COALESCE(v.max_retries,          a.max_retries)          AS max_retries
     FROM ai_agents a
     LEFT JOIN agent_variants v
            ON v.agent_id = a.id AND v.is_active = TRUE
@@ -361,7 +402,10 @@ resolver.invalidate()                       # Clear all entries

 ### Config Refresh in Workers

-The extractor and recommendation workers periodically re-resolve their agent config (every 100 jobs for the extractor, every 50 jobs for the recommendation worker). If the resolved model changes, the worker creates a new `OllamaClient` instance with the updated configuration.
+The extractor and recommendation workers periodically re-resolve their agent config to pick up variant swaps and model changes:
+
+- **Extractor worker** (`services/extractor/main.py`): Re-resolves both `document-extractor` and `event-classifier` configs every **100 jobs**. If the resolved model or provider changes, the worker creates a new LLM client instance via `build_llm_client()` and closes the old one. A safety guard prevents switching to Ollama if `OLLAMA_BASE_URL` is empty.
+- **Recommendation worker** (`services/recommendation/main.py`): Re-resolves the `thesis-rewriter` config every **50 jobs**. If the model changes, a new `OllamaConfig` is built.

 ---

@@ -373,7 +417,7 @@ Every agent invocation is logged to `agent_performance_log` with the `agent_id`

 - **Document extractor**: Logged in `services/extractor/main.py` after each extraction. Records success/failure, duration, confidence, retry count, token estimates.
 - **Event classifier**: Logged in `services/extractor/event_classifier.py` after each classification. Same fields.
- **Thesis rewriter**: Logged in `services/recommendation/thesis_llm.py` after each rewrite attempt. Confidence is always 0.0 (not applicable for rewrites).
+- **Thesis rewriter**: Logged in `services/recommendation/thesis_llm.py` after each rewrite attempt. Confidence is always 0.0 (not applicable for rewrites). `document_id` is always NULL.

 ### Querying for Variant Comparison

@@ -464,6 +508,8 @@ All agent endpoints are served by the Query API (`services/api/app.py`) under th
 }
 ```

+All fields except `name` have defaults. The `slug` is auto-generated from `name` if not provided. The `model_name` defaults to `llama3.1:8b` for user-created agents.
+
 **Update Agent Request Body** (all fields optional):

 ```json
@@ -509,6 +555,30 @@ All agent endpoints are served by the Query API (`services/api/app.py`) under th
 | `PUT` | `/api/agents/{agent_id}/variants/{variant_id}` | Partial update a variant |
 | `DELETE` | `/api/agents/{agent_id}/variants/{variant_id}` | Delete a variant (returns 400 if active) |

+**Create Variant Request Body:**
+
+```json
+{
+  "variant_name": "Llama 3.1 8B Test",
+  "variant_slug": "llama-3-1-8b-test",
+  "description": "Testing llama3.1:8b as an alternative",
+  "model_provider": "ollama",
+  "model_name": "llama3.1:8b",
+  "system_prompt": "",
+  "user_prompt_template": "",
+  "prompt_version": "",
+  "temperature": 0.0,
+  "max_tokens": 32768,
+  "context_window": 0,
+  "input_token_limit": 0,
+  "token_budget": 0,
+  "timeout_seconds": 120,
+  "max_retries": 2
+}
+```
+
+Required fields: `variant_name`, `model_name`. The `variant_slug` is auto-generated from `variant_name` if not provided.
+
 ### Clone Endpoints

 | Method | Path | Description |
@@ -516,7 +586,7 @@ All agent endpoints are served by the Query API (`services/api/app.py`) under th
 | `POST` | `/api/agents/{agent_id}/clone` | Clone an agent's base config as a new variant |
 | `POST` | `/api/agents/{agent_id}/variants/{variant_id}/clone` | Clone an existing variant as a new variant |

-Clone requests copy all configuration fields from the source, with optional overrides in the request body.
+Clone requests copy all configuration fields from the source, with optional overrides in the request body. The `variant_name` field is required. All other fields default to the source's values if not provided.

 ### Activate / Deactivate

@@ -525,6 +595,8 @@ Clone requests copy all configuration fields from the source, with optional over
 | `POST` | `/api/agents/{agent_id}/variants/{variant_id}/activate` | Set a variant as active (deactivates any other active variant in a single transaction) |
 | `POST` | `/api/agents/{agent_id}/variants/deactivate` | Deactivate the currently active variant (agent falls back to base config) |

+The activate endpoint uses a database transaction to atomically deactivate the current variant and activate the new one, ensuring exactly one active variant at all times.
+
 ### Per-Variant Performance

 | Method | Path | Description |
@@ -532,6 +604,8 @@ Clone requests copy all configuration fields from the source, with optional over
 | `GET` | `/api/agents/{agent_id}/variants/{variant_id}/performance` | Aggregated metrics for a specific variant |
 | `GET` | `/api/agents/{agent_id}/variants/{variant_id}/performance/history` | Hourly time-series for a specific variant |

+Both endpoints accept the same `hours` query parameter (default 24, max 720) and return the same response shape as the agent-level performance endpoints.
+
 ---

 ## Step-by-Step: Creating and Activating a Variant
@@ -616,3 +690,20 @@ curl -s -X PUT \
 ```

 Then re-activate and compare again.
+
+### 7. Switch to vLLM Provider
+
+To test a variant using vLLM instead of Ollama:
+
+```bash
+curl -s -X POST https://stonks-api.celestium.life/api/agents/$AGENT_ID/clone \
+  -H "Content-Type: application/json" \
+  -d '{
+    "variant_name": "vLLM Qwen3 Test",
+    "description": "Testing extraction with vLLM backend",
+    "model_provider": "vllm",
+    "model_name": "Qwen/Qwen3-8B"
+  }' | jq .
+```
+
+The extractor worker will detect the provider change during its next config refresh and build a `VLLMClient` instead of an `OllamaClient`. Ensure the `VLLM_BASE_URL` environment variable is set in the extractor deployment.
@@ -142,14 +142,35 @@ Trend projection for a specific trend window.
 ### 1.5 Market Prices

 #### `GET /api/market/prices/{ticker}`
-Historical close prices from `market_snapshots`.
+Historical OHLCV bars from `market_snapshots`, deduplicated by bar timestamp and ordered oldest-first. Also returns 90-day high/low range.

 | Parameter | Type | Default | Constraints | Description |
 |-----------|------|---------|-------------|-------------|
-| `limit` | int | `30` | max `200` | Max bars returned |
+| `limit` | int | `200` | max `500` | Max bars returned |

 - **Path params:** `ticker` (auto-uppercased)
- **Response:** Array of OHLCV objects ordered oldest-first
+- **Response:** `{ bars: [{ ticker, close, open, high, low, volume, bar_timestamp, captured_at }], range_90d: { low, high } }`
+
+#### `POST /api/market/backfill/{ticker}`
+Backfill daily OHLCV bars from Polygon for the last N days. Deduplicates by bar timestamp.
+
+| Parameter | Type | Default | Constraints | Description |
+|-----------|------|---------|-------------|-------------|
+| `days` | int | `90` | max `365` | Number of days to backfill |
+
+- **Path params:** `ticker` (auto-uppercased)
+- **Response:** `{ ticker, inserted, total_bars, days }`
+- **Errors:** `503` — No market data API key configured
+
+#### `POST /api/market/backfill-all`
+Backfill daily bars for all active companies from Polygon.
+
+| Parameter | Type | Default | Constraints | Description |
+|-----------|------|---------|-------------|-------------|
+| `days` | int | `90` | max `365` | Number of days to backfill |
+
+- **Response:** `{ total_inserted, tickers, details[] }` — each detail has `{ ticker, inserted }` or `{ ticker, inserted: 0, error }`
+- **Errors:** `503` — No market data API key configured

 ### 1.6 Recommendations

@@ -224,8 +245,6 @@ Get audit events for any entity type and ID.

 - **Path params:** `entity_type` (string), `entity_id` (string)
 - **Response:** Array of audit event objects
- **Errors:** `404` — No audit events found
-

 ### 1.10 Admin: Source Health

@@ -331,6 +350,8 @@ Approve or reject a pending operator approval request.
 #### `GET /api/admin/trading/lockouts`
 List active symbol lockouts (news-shock, cooldown, manual).

+- **Response:** Array of lockout objects
+
 #### `POST /api/admin/trading/lockouts`
 Create a manual symbol lockout.

@@ -353,7 +374,6 @@ Update operator approval settings.
 - **Body:** `{ auto_approve_paper?: bool, require_approval_for_live?: bool, approval_timeout_minutes?: int }`
 - **Response:** Updated approval settings

-
 ### 1.13 Operational Dashboard

 #### `GET /api/ops/ingestion/throughput`
@@ -450,7 +470,7 @@ Trino catalog/schema/table/column metadata for the schema browser.
 #### `GET /api/analytics/pg-schema`
 PostgreSQL table/column metadata with primary keys, foreign keys, and row estimates.

- **Response:** `{ catalog: "postgresql", schema: "public", tables[] }`
+- **Response:** `{ catalog: "postgresql", schema: "public", tables[{ name, row_estimate, columns[{ name, type, nullable, primary_key?, references?, has_default? }] }] }`

 #### `POST /api/analytics/pg-query`
 Run read-only SQL against PostgreSQL directly. Only SELECT statements allowed.
@@ -462,17 +482,19 @@ Run read-only SQL against PostgreSQL directly. Only SELECT statements allowed.
 #### `GET /api/analytics/saved-queries`
 List all saved queries.

+- **Response:** Array of `{ id, name, description, sql_text, created_by, created_at, updated_at }`
+
 #### `POST /api/analytics/saved-queries` (201)
 Save a new query.

 - **Body:** `{ name: string, description?: string, sql_text: string }`
+- **Response:** `{ id, name, description, sql_text, created_by, created_at }`

 #### `DELETE /api/analytics/saved-queries/{query_id}`
 Delete a saved query.

 - **Errors:** `404` — Query not found

-
 ### 1.16 Macro Signal Layer

 #### `GET /api/admin/macro/status`
@@ -501,9 +523,13 @@ List recent global events with filtering.
 | `limit` | int | `50` | max `200` | Page size |
 | `offset` | int | `0` | — | Pagination offset |

+- **Response:** Array of global event objects with `id`, `event_types`, `severity`, `affected_regions`, `affected_sectors`, `affected_commodities`, `summary`, `key_facts`, `estimated_duration`, `confidence`, `source_document_id`, `created_at`
+
 #### `GET /api/macro/events/{event_id}`
 Event detail with affected companies and macro impact scores.

+- **Path params:** `event_id` (UUID string)
+- **Response:** Global event object + `impacts[]` (each with `company_id`, `ticker`, `macro_impact_score`, `impact_direction`, `contributing_factors`, `confidence`, `legal_name`, `sector`)
 - **Errors:** `404` — Global event not found

 #### `GET /api/macro/impacts/{ticker}`
@@ -515,7 +541,8 @@ Macro impacts and exposure profile for a specific company.
 | `limit` | int | `50` | max `200` | Page size |
 | `offset` | int | `0` | — | Pagination offset |

- **Response:** `{ exposure_profile, impacts[] }`
+- **Path params:** `ticker` (auto-uppercased)
+- **Response:** `{ exposure_profile, impacts[] }` — each impact includes `event_summary`, `event_severity`, `event_types`, `affected_regions`

 ### 1.18 Competitive Signal Layer

@@ -540,6 +567,7 @@ Historical patterns for a company.
 | `catalyst_type` | string | — | Filter by catalyst type |
 | `time_horizon` | string | — | Filter by time horizon |

+- **Path params:** `ticker` (string)
 - **Response:** `{ ticker, patterns[], count }`

 #### `GET /api/patterns/{ticker}/competitors`
@@ -555,6 +583,7 @@ Cross-company patterns showing how this company's catalysts affected competitors
 #### `GET /api/patterns/{ticker}/competitive-signals`
 Recent competitive signals targeting this company (limit 100).

+- **Path params:** `ticker` (string)
 - **Response:** `{ ticker, competitive_signals[], count }`

 #### `GET /api/patterns/{ticker}/decisions`
@@ -564,9 +593,9 @@ Major corporate decision history with trend outcomes and pattern statistics.
 |-----------|------|---------|-------------|
 | `time_horizon` | string | — | Filter by time horizon |

+- **Path params:** `ticker` (string)
 - **Response:** `{ ticker, decisions[], count }` — each decision includes `pattern_statistics[]`

-
 ### 1.20 AI Agents

 #### `GET /api/agents`
@@ -576,9 +605,12 @@ List all AI agent configurations.
 |-----------|------|---------|-------------|
 | `active_only` | bool | `false` | Only show active agents |

+- **Response:** Array of agent objects with `id`, `name`, `slug`, `purpose`, `model_provider`, `model_name`, `system_prompt`, `user_prompt_template`, `prompt_version`, `schema_version`, `temperature`, `max_tokens`, `timeout_seconds`, `max_retries`, `active`, `source`, `created_at`, `updated_at`
+
 #### `GET /api/agents/{agent_id}`
 Get a single agent configuration.

+- **Path params:** `agent_id` (UUID string)
 - **Errors:** `404` — Agent not found

 #### `POST /api/agents` (201)
@@ -603,9 +635,9 @@ Create a new user-defined agent.
 | `max_retries` | int | `2` | Max retry attempts |

 #### `PUT /api/agents/{agent_id}`
-Update an agent configuration. Partial updates supported.
+Update an agent configuration. Partial updates supported — only provided fields are changed.

- **Body:** `AgentUpdateBody` — all fields optional (same fields as create)
+- **Body:** `AgentUpdateBody` — all fields optional (same fields as create plus `active`)
 - **Errors:** `400` — No fields to update; `404` — Agent not found

 #### `DELETE /api/agents/{agent_id}`
@@ -636,6 +668,8 @@ Hourly performance time-series for an agent.
 #### `GET /api/agents/{agent_id}/variants`
 List all variants for an agent, ordered by `created_at` ascending.

+- **Response:** Array of variant objects with `id`, `agent_id`, `variant_name`, `variant_slug`, `description`, `model_provider`, `model_name`, `system_prompt`, `user_prompt_template`, `prompt_version`, `temperature`, `max_tokens`, `context_window`, `input_token_limit`, `token_budget`, `timeout_seconds`, `max_retries`, `is_active`, `created_at`, `updated_at`
+
 #### `GET /api/agents/{agent_id}/variants/{variant_id}`
 Get a single variant.

@@ -680,13 +714,13 @@ Delete a variant. Cannot delete active variants.
 #### `POST /api/agents/{agent_id}/clone` (201)
 Clone an agent's configuration as a new variant with optional overrides.

- **Body:** `VariantCloneBody { variant_name, variant_slug?, ...optional overrides }`
+- **Body:** `VariantCloneBody { variant_name, variant_slug?, description?, model_provider?, model_name?, system_prompt?, user_prompt_template?, prompt_version?, temperature?, max_tokens?, context_window?, input_token_limit?, token_budget?, timeout_seconds?, max_retries? }`
 - **Errors:** `404` — Agent not found; `409` — Duplicate slug

 #### `POST /api/agents/{agent_id}/variants/{variant_id}/clone` (201)
 Clone an existing variant as a new variant with optional overrides.

- **Body:** `VariantCloneBody`
+- **Body:** `VariantCloneBody` (same as above)
 - **Errors:** `404` — Source variant not found; `409` — Duplicate slug

 #### `POST /api/agents/{agent_id}/variants/{variant_id}/activate`
@@ -697,6 +731,8 @@ Set a variant as the active variant for its agent. Deactivates any currently act
 #### `POST /api/agents/{agent_id}/variants/deactivate`
 Deactivate the currently active variant. Agent falls back to base configuration.

+- **Response:** `{ deactivated: true }`
+
 #### `GET /api/agents/{agent_id}/variants/{variant_id}/performance`
 Aggregated performance metrics for a specific variant.

@@ -704,6 +740,8 @@ Aggregated performance metrics for a specific variant.
 |-----------|------|---------|-------------|-------------|
 | `hours` | int | `24` | max `720` | Time window |

+- **Response:** Same shape as agent performance (invocations, successes, failures, durations, confidence, tokens, success_rate)
+
 #### `GET /api/agents/{agent_id}/variants/{variant_id}/performance/history`
 Hourly performance time-series for a specific variant.

@@ -711,6 +749,108 @@ Hourly performance time-series for a specific variant.
 |-----------|------|---------|-------------|-------------|
 | `hours` | int | `24` | max `720` | Time window |

+- **Response:** Array of `{ hour, invocations, successes, avg_duration_ms, avg_confidence }`
+
+### 1.22 Model Validation
+
+#### `GET /api/validation/summary`
+Latest model metric snapshot plus quality gate status.
+
+| Parameter | Type | Default | Constraints | Description |
+|-----------|------|---------|-------------|-------------|
+| `lookback` | string | `"30d"` | `7d`, `30d`, `90d`, `all` | Lookback window |
+| `horizon` | string | `"7d"` | `1h`, `6h`, `1d`, `7d`, `30d` | Prediction horizon |
+
+- **Response:** `{ snapshot: { id, generated_at, lookback_window, horizon, prediction_count, win_rate, directional_accuracy, information_coefficient, rank_information_coefficient, avg_return, avg_excess_return_vs_spy, avg_excess_return_vs_sector, calibration_error, brier_score, buy_win_rate, sell_win_rate, hold_win_rate, metadata }, gate_status }`
+- **Errors:** `400` — Invalid lookback or horizon value
+
+#### `GET /api/validation/calibration`
+Calibration table with confidence buckets showing predicted vs observed win rates.
+
+| Parameter | Type | Default | Constraints | Description |
+|-----------|------|---------|-------------|-------------|
+| `lookback` | string | `"30d"` | `7d`, `30d`, `90d`, `all` | Lookback window |
+| `horizon` | string | `"7d"` | `1h`, `6h`, `1d`, `7d`, `30d` | Prediction horizon |
+
+- **Response:** `{ buckets: [{ bucket_low, bucket_high, avg_confidence, observed_win_rate, prediction_count, miscalibrated }], lookback, horizon }`
+- Buckets: 0.50–0.60, 0.60–0.70, 0.70–0.80, 0.80–0.90, 0.90–1.00
+- `miscalibrated` is `true` when `|avg_confidence - observed_win_rate| > 0.15`
+- **Errors:** `400` — Invalid lookback or horizon value
+
+#### `GET /api/validation/ic-by-horizon`
+Information Coefficient and Rank IC per prediction horizon.
+
+| Parameter | Type | Default | Constraints | Description |
+|-----------|------|---------|-------------|-------------|
+| `lookback` | string | `"30d"` | `7d`, `30d`, `90d`, `all` | Lookback window |
+
+- **Response:** `{ horizons: [{ horizon, information_coefficient, rank_information_coefficient, prediction_count, generated_at }], lookback }`
+- Horizons ordered: `1h`, `6h`, `1d`, `7d`, `30d`
+- **Errors:** `400` — Invalid lookback value
+
+#### `GET /api/validation/gate-status`
+Quality gate evaluation detail from `risk_configs` where `name = 'model_quality_gate'`.
+
+- **Response:** `{ gate_status, updated_at }` or `{ gate_status: null, message: "No gate evaluation found..." }`
+
+### 1.23 Attribution
+
+#### `GET /api/validation/attribution/sources`
+Per-source performance metrics: win rate, IC, average return, duplicate rate.
+
+| Parameter | Type | Default | Constraints | Description |
+|-----------|------|---------|-------------|-------------|
+| `lookback` | string | `"30d"` | `7d`, `30d`, `90d`, `all` | Lookback window |
+| `horizon` | string | `"7d"` | `1h`, `6h`, `1d`, `7d`, `30d` | Prediction horizon |
+
+- **Response:** `{ sources[], lookback, horizon }`
+- **Errors:** `400` — Invalid lookback or horizon; `500` — Computation failed
+
+#### `GET /api/validation/attribution/catalysts`
+Per-catalyst-type performance metrics: win rate, IC, average return.
+
+| Parameter | Type | Default | Constraints | Description |
+|-----------|------|---------|-------------|-------------|
+| `lookback` | string | `"30d"` | `7d`, `30d`, `90d`, `all` | Lookback window |
+| `horizon` | string | `"7d"` | `1h`, `6h`, `1d`, `7d`, `30d` | Prediction horizon |
+
+- **Response:** `{ catalysts[], lookback, horizon }`
+- **Errors:** `400` — Invalid lookback or horizon; `500` — Computation failed
+
+#### `GET /api/validation/attribution/layers`
+Per-signal-layer (company, macro, competitive) performance metrics.
+
+| Parameter | Type | Default | Constraints | Description |
+|-----------|------|---------|-------------|-------------|
+| `lookback` | string | `"30d"` | `7d`, `30d`, `90d`, `all` | Lookback window |
+| `horizon` | string | `"7d"` | `1h`, `6h`, `1d`, `7d`, `30d` | Prediction horizon |
+
+- **Response:** `{ layers[], lookback, horizon }` — each layer has `avg_contribution_pct`, `dominant_win_rate`, `dominant_ic`
+- **Errors:** `400` — Invalid lookback or horizon; `500` — Computation failed
+
+### 1.24 Trading Reports
+
+#### `GET /api/reports`
+Paginated list of trading reports with optional filtering.
+
+| Parameter | Type | Default | Constraints | Description |
+|-----------|------|---------|-------------|-------------|
+| `report_type` | string | — | `daily` or `weekly` | Filter by report type |
+| `start_date` | string | — | ISO date (YYYY-MM-DD) | Filter `period_start >= this` |
+| `end_date` | string | — | ISO date (YYYY-MM-DD) | Filter `period_end <= this` |
+| `limit` | int | `20` | max `100` | Page size |
+| `offset` | int | `0` | min `0` | Pagination offset |
+
+- **Response:** Array of `{ id, report_type, period_start, period_end, validation_status, generated_at }`
+- **Errors:** `400` — Invalid `report_type` or date format
+
+#### `GET /api/reports/{report_id}`
+Fetch a single report including full `report_data` JSONB.
+
+- **Path params:** `report_id` (UUID string)
+- **Response:** `{ id, report_type, period_start, period_end, report_data, validation_status, generated_at, created_at }`
+- **Errors:** `404` — Report not found
+
 ---

 ## 2. Symbol Registry API
@@ -756,6 +896,7 @@ List tracked companies.
 #### `GET /companies/{company_id}`
 Get a single company.

+- **Path params:** `company_id` (UUID string)
 - **Errors:** `404` — Company not found

 #### `PUT /companies/{company_id}`
@@ -783,14 +924,18 @@ List aliases for a company.
 Create a new watchlist.

 - **Body:** `{ name: string, description?: string }`
+- **Response:** `{ id, name, description, active }`
 - **Errors:** `409` — Watchlist name already exists

 #### `GET /watchlists`
 List all watchlists.

+- **Response:** Array of `{ id, name, description, active }`
+
 #### `POST /watchlists/{watchlist_id}/members/{company_id}` (201)
 Add a company to a watchlist.

+- **Response:** `{ status: "added" }`
 - **Errors:** `409` — Already a member; `404` — Watchlist or company not found

 #### `GET /watchlists/{watchlist_id}/members`
@@ -814,11 +959,14 @@ Add a data source for a company.
 | `retention_days` | int | `365` | — | Data retention period |
 | `access_policy` | string | `"internal"` | `internal`, `public`, `restricted` | Access policy |

+- **Response:** `{ id, source_type, source_name, credibility_score, active }`
 - **Errors:** `404` — Company not found; `422` — Invalid source_type or access_policy

 #### `GET /companies/{company_id}/sources`
 List sources for a company.

+- **Response:** Array of `{ id, source_type, source_name, config, credibility_score, retention_days, access_policy, active }`
+
 ### 2.6 Exposure Profiles

 #### `GET /companies/{company_id}/exposure`
@@ -848,6 +996,8 @@ Create or update an exposure profile. Archives the previous active version.
 #### `GET /companies/{company_id}/exposure/history`
 Get all exposure profile versions for a company, ordered by version descending.

+- **Response:** Array of `ExposureProfileResponse`
+
 ### 2.7 Competitor Relationships

 #### `POST /companies/{company_id}/competitors` (201)
@@ -863,10 +1013,11 @@ Create a competitor relationship. Records an audit event.
 | `bidirectional` | bool | `true` | — | Bidirectional relationship |
 | `source` | string | `"manual"` | `manual`, `inferred` | Data source |

+- **Response:** `CompetitorRelationship { id, company_a_id, company_b_id, relationship_type, strength, bidirectional, source, active, created_at, updated_at }`
 - **Errors:** `400` — Self-reference; `404` — Company not found; `409` — Relationship already exists

 #### `GET /companies/{company_id}/competitors`
-List active competitor relationships, enriched with ticker and legal_name of the other company.
+List active competitor relationships, enriched with `ticker` and `legal_name` of the other company. Ordered by strength descending.

 - **Errors:** `404` — Company not found

@@ -879,6 +1030,7 @@ Update a competitor relationship. Records an audit event with previous state.
 #### `DELETE /companies/{company_id}/competitors/{relationship_id}`
 Soft-delete a competitor relationship (sets `active=false`). Records an audit event.

+- **Response:** `{ status: "deleted", id }`
 - **Errors:** `404` — Active relationship not found

 ### 2.8 Competitor Inference
@@ -923,7 +1075,7 @@ Diagnostic endpoint showing engine internals for troubleshooting.
 #### `GET /api/trading/status`
 Return current engine state.

- **Response:** `{ enabled, paused, risk_tier, circuit_breaker_status, active_pool, reserve_pool, portfolio_heat, open_positions, last_decision_at }`
+- **Response:** `{ enabled, paused, risk_tier, circuit_breaker_status, active_pool, reserve_pool, portfolio_heat, open_positions, open_position_count, max_open_positions, absolute_position_cap, last_decision_at }`
 - **Errors:** `503` — Engine not initialised

 #### `PUT /api/trading/config`
@@ -960,7 +1112,13 @@ Resume the trading engine.
 #### `POST /api/trading/reset`
 Full paper trading reset: liquidate broker positions, cancel orders, clear trading state, reset capital.

- **Body:** `{ initial_capital?: float (default 0.0) }` — if 0, uses broker balance or defaults to 100,000
+- **Body:** `CapitalRequest`
+
+| Field | Type | Default | Description |
+|-------|------|---------|-------------|
+| `initial_capital` | float | `0.0` | If 0, uses broker balance or defaults to 100,000 |
+| `reserve_pct` | float | `null` | Reserve pool percentage (0–1). If null, uses engine config `reserve_siphon_pct` |
+
 - **Response:** `{ reset: true, initial_capital, active_pool, reserve_pool, broker: { orders_cancelled, positions_closed, portfolio_value, cash, buying_power } }`
 - **Errors:** `503` — Engine not initialised; `500` — Database reset failed

@@ -977,6 +1135,8 @@ Return recent trading decisions from the database.
 | `limit` | int | `50` | max `200` | Page size |
 | `offset` | int | `0` | — | Pagination offset |

+- **Response:** Array of `{ id, recommendation_id, decision, skip_reason, ticker, computed_position_size, computed_share_quantity, risk_tier_at_decision, portfolio_heat_at_decision, active_pool_at_decision, reserve_pool_at_decision, circuit_breaker_status, is_micro_trade, created_at }`
+
 ### 3.5 Performance Metrics

 #### `GET /api/trading/metrics`
@@ -992,6 +1152,8 @@ Return historical daily portfolio snapshots.
 |-----------|------|---------|-------------|-------------|
 | `limit` | int | `30` | max `365` | Max snapshots |

+- **Response:** Array of `{ id, snapshot_date, portfolio_value, active_pool, reserve_pool, daily_return, cumulative_return, unrealized_pnl, realized_pnl, win_count, loss_count, win_rate, sharpe_ratio, max_drawdown, current_drawdown_pct, portfolio_heat, risk_tier, created_at }`
+
 ### 3.6 Backtesting

 #### `POST /api/trading/backtest`
@@ -1012,6 +1174,7 @@ Launch a backtest run asynchronously.
 #### `GET /api/trading/backtest/{backtest_id}`
 Retrieve backtest results.

+- **Path params:** `backtest_id` (UUID string)
 - **Response:** `{ id, start_date, end_date, initial_capital, risk_tier, config, total_return, sharpe_ratio, max_drawdown, win_rate, profit_factor, trade_count, equity_curve[], trades[], status, completed_at, created_at }`
 - Status values: `running`, `completed`, `not_found`, `pending`

@@ -1037,10 +1200,11 @@ Update notification preferences.

 All fields optional.

+- **Response:** `{ updated: { ...changed fields } }`
 - **Errors:** `503` — Engine not initialised

 #### `GET /api/trading/notifications/history`
-Return recent notifications.
+Return recent notifications (placeholder — currently returns empty array).

 | Parameter | Type | Default | Constraints | Description |
 |-----------|------|---------|-------------|-------------|
@@ -1116,6 +1280,8 @@ List pending approval requests.
 #### `GET /approvals/{approval_id}`
 Get a single approval request.

+- **Path params:** `approval_id` (UUID string)
+- **Response:** Approval request object
 - **Errors:** `404` — Approval not found; `503` — Database not ready

 #### `POST /approvals/{approval_id}/review`
@@ -1138,4 +1304,4 @@ Approve or reject a pending approval request.
 Expire stale approvals that have passed their expiration time.

 - **Response:** `{ expired: int, items: [] }`
- **Errors:** `503` — Database not ready
+- **Errors:** `503` — Database not ready
@@ -18,13 +18,13 @@ flowchart TB
    end

    %% ── Scheduler ─────────────────────────────────────────────────
-    scheduler["<b>Scheduler</b><br/><i>services.scheduler.app</i><br/>Cadence polling, rate limiting,<br/>backoff &amp; stale recovery"]
+    scheduler["<b>Scheduler</b><br/><i>services.scheduler.app</i><br/>Cadence polling, rate limiting,<br/>backoff, stale recovery,<br/>periodic aggregation,<br/>report scheduling"]

    sources -.->|"API polling<br/>on cadence"| scheduler

    %% ── Ingestion Queue ───────────────────────────────────────────
    q_ingestion[["stonks:queue:ingestion"]]
-    scheduler -->|"rpush job"| q_ingestion
+    scheduler -->|"rpush job<br/>(company, macro,<br/>global market)"| q_ingestion

    %% ── Ingestion Worker ──────────────────────────────────────────
    ingestion["<b>Ingestion</b><br/><i>services.ingestion.worker</i><br/>Adapter dispatch, dedupe,<br/>raw artifact upload"]
@@ -42,7 +42,7 @@ flowchart TB

    %% ── Parsing Queue ─────────────────────────────────────────────
    q_parsing[["stonks:queue:parsing"]]
-    ingestion -->|"rpush<br/>(news, filings,<br/>web_scrape)"| q_parsing
+    ingestion -->|"rpush<br/>(news, filings,<br/>web_scrape, macro)"| q_parsing

    %% ── Parser Worker ─────────────────────────────────────────────
    parser["<b>Parser</b><br/><i>services.parser.worker</i><br/>HTML parsing, quality scoring,<br/>company mention detection"]
@@ -50,7 +50,7 @@ flowchart TB
    q_parsing -->|"lpop"| parser

    minio_norm[("MinIO<br/><i>Normalized Text</i><br/><i>Parser Output JSON</i>")]
-    parser -->|"upload normalized text"| minio_norm
+    parser -->|"upload normalized text<br/>+ structured output"| minio_norm
    parser -->|"update document status,<br/>insert mentions"| pg_docs
 ```

@@ -70,18 +70,23 @@ flowchart TB
    parser -->|"rpush<br/>(standard docs)"| q_extraction
    parser -->|"rpush<br/>(macro_event docs)"| q_macro

+    %% ── Scheduler Recovery ────────────────────────────────────────
+    scheduler_recovery(("Scheduler<br/><i>stale recovery &amp;<br/>failed retry</i>"))
+    scheduler_recovery -.->|"re-enqueue orphaned<br/>parsed docs"| q_extraction
+    scheduler_recovery -.->|"re-enqueue orphaned<br/>macro docs"| q_macro
+
    %% ── Extractor Worker ──────────────────────────────────────────
    subgraph extractor_svc ["Extractor Service"]
        direction TB
-        ext_main["<b>Extractor</b><br/><i>services.extractor.main</i><br/>Alternates between queues<br/>(2 extraction : 1 macro)"]
+        ext_main["<b>Extractor</b><br/><i>services.extractor.main</i><br/>Alternates between queues<br/>(2 extraction : 1 macro)<br/>Token budget enforcement"]
    end

    q_extraction -->|"lpop"| ext_main
    q_macro -->|"lpop"| ext_main

    %% ── Ollama LLM ───────────────────────────────────────────────
-    ollama["<b>Ollama</b><br/><i>LLM Inference</i><br/>document-extractor agent<br/>event-classifier agent"]
-    ext_main <-->|"HTTP /api/generate"| ollama
+    ollama["<b>Ollama / vLLM</b><br/><i>LLM Inference</i><br/>document-extractor agent<br/>event-classifier agent"]
+    ext_main <-->|"HTTP /api/generate<br/>(AgentConfigResolver<br/>selects model + variant)"| ollama

    %% ── Signal Layer 1: Company ───────────────────────────────────
    subgraph layer1 ["Layer 1 — Company Signals"]
@@ -95,7 +100,7 @@ flowchart TB
    subgraph layer2 ["Layer 2 — Macro Signals"]
        direction LR
        ge["global_events"]
-        mir["macro_impact_records<br/><i>per-company interpolation</i>"]
+        mir["macro_impact_records<br/><i>per-company interpolation<br/>via exposure profiles</i>"]
        ge --> mir
    end

@@ -106,6 +111,10 @@ flowchart TB
    q_agg[["stonks:queue:aggregation"]]
    ext_main -->|"rpush<br/>(per ticker)"| q_agg

+    %% ── Scheduler Periodic Aggregation ────────────────────────────
+    scheduler_agg(("Scheduler<br/><i>periodic aggregation<br/>every ~15 min</i>"))
+    scheduler_agg -.->|"rpush all<br/>active tickers"| q_agg
+
    %% ── Aggregation Worker ────────────────────────────────────────
    aggregation["<b>Aggregation</b><br/><i>services.aggregation.main</i><br/>Trend windows, scoring,<br/>contradiction detection"]

@@ -133,6 +142,8 @@ flowchart TB

 ## Recommendation → Trading → Broker

+The recommendation worker consumes from the recommendation queue. The trading engine does **not** consume from a queue — it polls the `recommendations` table in PostgreSQL on a configurable interval, evaluates each recommendation through its decision pipeline, and pushes "act" decisions to the broker queue.
+
 ```mermaid
 flowchart TB
    %% ── Recommendation Queue ──────────────────────────────────────
@@ -144,19 +155,23 @@ flowchart TB

    q_rec -->|"lpop"| recommendation

-    ollama_thesis["<b>Ollama</b><br/><i>thesis-rewriter agent</i><br/>(optional LLM rewrite)"]
+    ollama_thesis["<b>Ollama / vLLM</b><br/><i>thesis-rewriter agent</i><br/>(AgentConfigResolver<br/>selects model + variant)"]
    recommendation <-->|"rewrite thesis<br/>(trading-eligible only)"| ollama_thesis

    pg_recs[("PostgreSQL<br/><i>recommendations,<br/>recommendation_evidence,<br/>risk_evaluations</i>")]
    recommendation -->|"persist recommendation<br/>+ evidence + risk eval"| pg_recs

+    %% ── Lake Publication (inline) ─────────────────────────────────
+    minio_rec_lake[("MinIO<br/><i>Lakehouse</i><br/>recommendation facts")]
+    recommendation -->|"publish_recommendation_facts<br/>(Parquet)"| minio_rec_lake
+
    %% ── Trading Engine ────────────────────────────────────────────
    subgraph trading_loop ["Trading Engine Decision Loop"]
        direction TB
        poll["Poll recommendations<br/><i>action IN (buy, sell)<br/>mode IN (paper, live)<br/>generated_at &gt; last_poll</i>"]
        dedup_check["Redis dedup check<br/><i>stonks:dedupe:trading:*</i>"]
-        evaluate["evaluate_recommendation<br/><i>Circuit breaker check<br/>Trading window check<br/>Confidence gate<br/>Sector exposure check<br/>Correlation check<br/>Earnings blackout</i>"]
-        size["Position sizing<br/><i>Kelly criterion,<br/>risk tier limits</i>"]
+        evaluate["evaluate_recommendation<br/><i>Circuit breaker check<br/>Trading window check<br/>Confidence gate<br/>Sector exposure check<br/>Correlation check<br/>Earnings blackout<br/>Max positions check</i>"]
+        size["Position sizing<br/><i>Kelly criterion,<br/>risk tier limits,<br/>micro-trade support</i>"]
        decide{{"Decision"}}
        poll --> dedup_check --> evaluate --> size --> decide
    end
@@ -170,22 +185,30 @@ flowchart TB

    pg_decisions[("PostgreSQL<br/><i>trading_decisions</i>")]

+    %% ── Manual Override ───────────────────────────────────────────
+    trading_api(("Trading API<br/><i>POST /override/order</i>"))
+    trading_api -->|"rpush<br/>manual order"| q_broker
+
    %% ── Broker Adapter ────────────────────────────────────────────
-    broker["<b>Broker Adapter</b><br/><i>services.adapters.broker_service</i><br/>Risk evaluation, idempotency,<br/>order submission, fill tracking"]
+    broker["<b>Broker Adapter</b><br/><i>services.adapters.broker_service</i><br/>Idempotency, risk evaluation,<br/>approval gate, order submission,<br/>fill tracking, position sync"]

    q_broker -->|"lpop"| broker

    %% ── Risk Engine ───────────────────────────────────────────────
-    risk["<b>Risk Engine</b><br/><i>services.risk.app</i><br/>POST /evaluate<br/>Approval workflow"]
-    broker <-->|"evaluate order"| risk
+    risk["<b>Risk Engine</b><br/><i>services.risk.app</i><br/>evaluate_order()<br/>Position limits, sector exposure,<br/>daily loss caps, approval workflow"]
+    broker -->|"evaluate order<br/>(inline call)"| risk

    %% ── Alpaca ────────────────────────────────────────────────────
-    alpaca["<b>Alpaca</b><br/><i>Paper Trading API</i><br/>Order submission,<br/>position sync"]
-    broker <-->|"submit order /<br/>sync positions"| alpaca
+    alpaca["<b>Alpaca</b><br/><i>Paper Trading API</i><br/>Order submission,<br/>position sync,<br/>account state"]
+    broker <-->|"submit order /<br/>sync positions /<br/>sync order status"| alpaca

-    pg_orders[("PostgreSQL<br/><i>orders, order_events,<br/>positions,<br/>portfolio_snapshots</i>")]
+    pg_orders[("PostgreSQL<br/><i>orders, order_events,<br/>positions,<br/>portfolio_snapshots,<br/>broker_accounts</i>")]
    broker -->|"persist order,<br/>events, positions"| pg_orders

+    %% ── Lake Publication (broker inline) ──────────────────────────
+    minio_broker_lake[("MinIO<br/><i>Lakehouse</i><br/>order + fill + position facts")]
+    broker -->|"publish_trade_order<br/>publish_trade_fill<br/>publish_positions_daily_batch<br/>(Parquet)"| minio_broker_lake
+
    %% ── Notifications ─────────────────────────────────────────────
    subgraph notifications ["Notifications"]
        direction LR
@@ -198,28 +221,32 @@ flowchart TB

 ## Analytical Branch — Lake Publisher

-The lake publisher runs as a separate worker, consuming from its own queue and writing partitioned Parquet fact tables to MinIO for analytical queries.
+The lake publisher runs as a separate worker, consuming from its own queue and writing partitioned Parquet fact tables to MinIO for analytical queries. Some services (broker adapter, recommendation worker) also publish facts directly to MinIO inline, bypassing the queue.

 ```mermaid
 flowchart LR
    %% ── Lake Publish Queue ────────────────────────────────────────
    q_lake[["stonks:queue:lake_publish"]]

-    various(("Various Services<br/><i>ingestion, extractor,<br/>recommendation,<br/>broker adapter</i>"))
-    various -->|"enqueue_lake_job"| q_lake
+    various(("Upstream Services<br/><i>via enqueue_lake_job()</i>"))
+    various -->|"rpush job<br/>(job_type + entity_id)"| q_lake

    %% ── Lake Publisher Worker ─────────────────────────────────────
-    lake["<b>Lake Publisher</b><br/><i>services.lake_publisher.jobs</i><br/>Transforms operational data<br/>into analytical facts"]
+    lake["<b>Lake Publisher</b><br/><i>services.lake_publisher.jobs</i><br/>Transforms operational data<br/>into analytical facts<br/><i>15 job types supported</i>"]

    q_lake -->|"lpop"| lake

-    pg_source[("PostgreSQL<br/><i>Operational Tables</i><br/>documents, extractions,<br/>orders, positions, events")]
+    pg_source[("PostgreSQL<br/><i>Operational Tables</i><br/>documents, extractions,<br/>orders, positions, events,<br/>global_events, macro_impacts,<br/>competitive_signals")]
    lake -->|"query source data"| pg_source

    %% ── MinIO Parquet ─────────────────────────────────────────────
    minio_lake[("MinIO<br/><i>Lakehouse Bucket</i><br/>Partitioned Parquet<br/>/year=/month=/day=")]
    lake -->|"write Parquet files"| minio_lake

+    %% ── Inline Publishers ─────────────────────────────────────────
+    inline(("Inline Publishers<br/><i>broker adapter,<br/>recommendation worker</i>"))
+    inline -->|"publish_* functions<br/>(direct Parquet write)"| minio_lake
+
    %% ── Trino ─────────────────────────────────────────────────────
    trino["<b>Trino</b><br/><i>SQL Query Engine</i><br/>Hive connector → MinIO"]
    minio_lake -->|"read via<br/>Hive Metastore"| trino
@@ -238,18 +265,40 @@ flowchart LR
    query_api --> dashboard
 ```

+## Report Generation
+
+The scheduler manages report generation as a sub-loop, enqueuing daily and weekly report jobs to a dedicated queue and consuming them inline.
+
+```mermaid
+flowchart LR
+    scheduler["<b>Scheduler</b><br/><i>report schedule check</i><br/>daily @ 16:30 ET<br/>weekly @ Saturday"]
+
+    q_report[["stonks:queue:report_generation"]]
+    scheduler -->|"rpush<br/>(daily/weekly)"| q_report
+
+    scheduler_consumer["<b>Scheduler</b><br/><i>report consumer loop</i><br/>pops up to 5 jobs/cycle"]
+    q_report -->|"lpop"| scheduler_consumer
+
+    generator["<b>Report Generator</b><br/><i>services.reporting.generator</i>"]
+    scheduler_consumer -->|"process_report_job()"| generator
+
+    pg_reports[("PostgreSQL<br/><i>trading_reports</i>")]
+    generator -->|"persist report"| pg_reports
+```
+
 ## Complete Queue Topology

 | Queue | Full Key | Producer(s) | Consumer |
 |-------|----------|-------------|----------|
-| Ingestion | `stonks:queue:ingestion` | Scheduler | Ingestion Worker |
-| Parsing | `stonks:queue:parsing` | Ingestion Worker | Parser Worker |
-| Extraction | `stonks:queue:extraction` | Parser (standard docs) | Extractor Worker |
-| Macro Classification | `stonks:queue:macro_classification` | Parser (macro_event docs), Scheduler | Extractor Worker |
-| Aggregation | `stonks:queue:aggregation` | Extractor Worker | Aggregation Worker |
-| Recommendation | `stonks:queue:recommendation` | Aggregation Worker | Recommendation Worker |
-| Broker Orders | `stonks:queue:broker_orders` | Trading Engine, Trading API (manual overrides) | Broker Adapter |
-| Lake Publish | `stonks:queue:lake_publish` | Various services | Lake Publisher |
+| Ingestion | `stonks:queue:ingestion` | Scheduler (company, macro, global market sources) | Ingestion Worker |
+| Parsing | `stonks:queue:parsing` | Ingestion Worker (news, filings, web_scrape, macro) | Parser Worker |
+| Extraction | `stonks:queue:extraction` | Parser (standard docs), Scheduler (stale recovery) | Extractor Worker |
+| Macro Classification | `stonks:queue:macro_classification` | Parser (macro_event docs), Scheduler (stale/failed recovery) | Extractor Worker |
+| Aggregation | `stonks:queue:aggregation` | Extractor Worker (per ticker), Scheduler (periodic, all tickers) | Aggregation Worker |
+| Recommendation | `stonks:queue:recommendation` | Aggregation Worker (ticker + window, 5 min dedup TTL) | Recommendation Worker |
+| Broker Orders | `stonks:queue:broker_orders` | Trading Engine (act decisions), Trading API (manual overrides) | Broker Adapter |
+| Lake Publish | `stonks:queue:lake_publish` | Various services (via `enqueue_lake_job()`) | Lake Publisher |
+| Report Generation | `stonks:queue:report_generation` | Scheduler (daily/weekly triggers) | Scheduler (inline consumer) |

 Dead-letter queues follow the pattern `stonks:dlq:<queue_name>` and are populated when a job exhausts its retry budget.

@@ -257,18 +306,25 @@ Dead-letter queues follow the pattern `stonks:dlq:<queue_name>` and are populate

 | Store | Role | Key Tables / Buckets |
 |-------|------|---------------------|
-| **PostgreSQL** | Structured operational data | `documents`, `document_intelligence`, `document_impact_records`, `global_events`, `macro_impact_records`, `competitive_signal_records`, `trend_windows`, `trend_history`, `trend_projections`, `recommendations`, `recommendation_evidence`, `risk_evaluations`, `orders`, `order_events`, `positions`, `portfolio_snapshots`, `trading_decisions` |
-| **Redis** | Queues, dedup markers, rate limits, circuit breaker state | `stonks:queue:*`, `stonks:dedupe:*`, `stonks:ratelimit:*`, `stonks:trading:circuit_breaker:*`, `stonks:dlq:*` |
-| **MinIO** | Object storage for raw artifacts, normalized text, and analytical Parquet files | Raw artifacts bucket, normalized text bucket, lakehouse bucket (partitioned Parquet) |
+| **PostgreSQL** | Structured operational data | `documents`, `document_intelligence`, `document_impact_records`, `document_company_mentions`, `global_events`, `macro_impact_records`, `exposure_profiles`, `competitive_signal_records`, `competitor_relationships`, `trend_windows`, `trend_history`, `trend_projections`, `recommendations`, `recommendation_evidence`, `risk_evaluations`, `orders`, `order_events`, `positions`, `portfolio_snapshots`, `trading_decisions`, `circuit_breaker_events`, `reserve_pool_ledger`, `risk_tier_history`, `broker_accounts`, `ingestion_runs`, `sources`, `companies`, `company_aliases`, `ai_agents`, `agent_variants`, `agent_performance_log`, `risk_configs`, `trading_reports` |
+| **Redis** | Queues, dedup markers, rate limits, circuit breaker state, pipeline toggle | `stonks:queue:*` (9 queues), `stonks:dedupe:*`, `stonks:dedupe:trading:*`, `stonks:ratelimit:*`, `stonks:trading:circuit_breaker:*`, `stonks:trading:notification_rate:*`, `stonks:order_idempotency:*`, `stonks:lock:*`, `stonks:cache:*`, `stonks:retry:*`, `stonks:rec_dedup:*`, `stonks:pipeline:enabled`, `stonks:dlq:*` |
+| **MinIO** | Object storage for raw artifacts, normalized text, and analytical Parquet files | Raw artifacts bucket, normalized text bucket, parser output bucket, lakehouse bucket (partitioned Parquet: documents, extractions, market bars/quotes, orders, fills, positions, PnL, global events, macro impacts, trend projections, competitive signals, competitor relationships, recommendations) |

 ## External Integration Points

 | Integration | Service | Protocol | Purpose |
 |-------------|---------|----------|---------|
-| **Polygon.io** | Ingestion (via adapters) | HTTPS REST | News articles, market bars, grouped daily data |
-| **SEC EDGAR** | Ingestion (via FilingsDataAdapter) | HTTPS REST | 10-K, 10-Q filings |
-| **Ollama** | Extractor, Recommendation | HTTP `/api/generate` | LLM inference for document extraction, event classification, thesis rewriting |
-| **Alpaca** | Broker Adapter | HTTPS REST | Paper trading order submission, position sync, account state |
+| **Polygon.io** | Ingestion (via PolygonNewsAdapter, PolygonMarketAdapter) | HTTPS REST | News articles, market bars, grouped daily data, intraday bars |
+| **SEC EDGAR** | Ingestion (via SECEdgarAdapter) | HTTPS REST | 10-K, 10-Q filings |
+| **Macro News** | Ingestion (via MacroNewsAdapter) | HTTPS REST | Geopolitical and economic event articles |
+| **Ollama / vLLM** | Extractor, Recommendation | HTTP `/api/generate` | LLM inference for document extraction (document-extractor agent), event classification (event-classifier agent), thesis rewriting (thesis-rewriter agent). Model and variant selected via `AgentConfigResolver` with 60s TTL cache. |
+| **Alpaca** | Broker Adapter | HTTPS REST | Paper/live trading: order submission, position sync, account state, order status polling |
 | **AWS SNS** | Trading Engine (notifications) | boto3 SDK | SMS alerts for circuit breaker trips, order fills, stop-loss triggers |
 | **Gmail** | Trading Engine (notifications) | SMTP (port 587 STARTTLS) | Email alerts for trading events |
-| **Trino** | Query API, Superset | JDBC / HTTP | SQL queries over lakehouse Parquet files |
+| **Trino** | Query API, Superset | HTTP | SQL queries over lakehouse Parquet files via Hive Metastore |
+
+## Pipeline Toggle
+
+The pipeline can be paused globally via the Redis key `stonks:pipeline:enabled`. When set to `"0"`, all queue workers (ingestion, parser, extractor, aggregation, recommendation, broker adapter, lake publisher) enter a sleep loop and stop processing jobs. The scheduler also skips scheduling cycles when the toggle is off. The toggle can be set via the Query API's pipeline control endpoints.
+
+Setting `PIPELINE_DEFAULT_OFF=true` on the scheduler initializes the toggle to OFF on first boot, useful for staged deployments where you want to verify infrastructure before enabling the pipeline.
@@ -53,7 +53,7 @@ graph TB
        subgraph trading_tier ["Trading Tier"]
            direction LR
            trading_engine["trading-engine<br/><i>docker/Dockerfile</i><br/><i>uvicorn services.trading.app</i><br/>host :8002 → :8000"]
-            risk_engine["risk-engine<br/><i>docker/Dockerfile</i><br/><i>uvicorn services.risk.app</i><br/>host :8003 → :8000"]
+            risk_engine["risk-engine<br/><i>docker/Dockerfile</i><br/><i>uvicorn services.risk.app</i><br/>host :8003 → :8000<br/><i>alias: risk</i>"]
            broker_adapter["broker-adapter<br/><i>docker/Dockerfile</i><br/><i>python -m services.adapters.broker_service</i><br/><i>no host port</i>"]
        end

@@ -320,3 +320,4 @@ All containers share the default Docker Compose network. Services reference each
 | `hive-metastore` | Hive Metastore container | trino (thrift://hive-metastore:9083) |
 | `trino` | Trino container | superset (trino:8080) |
 | `query-api` | Query API container | dashboard (nginx proxy upstream) |
+| `risk` | risk-engine container (network alias) | trading-engine (risk evaluation calls) |
@@ -11,7 +11,7 @@ graph TB
    %% ── External traffic ──────────────────────────────────────────
    internet((Internet))

-    subgraph traefik ["kube-system (Traefik Ingress Controller)"]
+    subgraph traefik ["kube-system · Traefik Ingress Controller"]
        direction LR
        ing_dash["stonks.celestium.life"]
        ing_api["stonks-api.celestium.life"]
@@ -28,47 +28,55 @@ graph TB
        direction TB

        %% ── API Tier (ingress-facing) ─────────────────────────────
-        subgraph api_tier ["API Tier"]
+        subgraph api_tier ["API Tier · tier: api"]
            direction LR
-            query_api["query-api<br/><i>Deployment (1 replica)</i><br/>:8000"]
-            symbol_registry["symbol-registry<br/><i>Deployment (1 replica)</i><br/>:8000"]
+            query_api["query-api<br/><i>Deployment · 1 replica</i><br/>:8000<br/><i>readiness: /docs</i>"]
+            symbol_registry["symbol-registry<br/><i>Deployment · 1 replica</i><br/>:8000<br/><i>readiness: /docs · liveness: /docs</i>"]
        end

        %% ── Frontend Tier ─────────────────────────────────────────
-        subgraph frontend_tier ["Frontend Tier"]
-            dashboard["dashboard<br/><i>Deployment (1 replica)</i><br/>:8080<br/><i>nginx-unprivileged</i>"]
+        subgraph frontend_tier ["Frontend Tier · tier: frontend"]
+            dashboard["dashboard<br/><i>Deployment · 1 replica</i><br/>:8080<br/><i>nginx-unprivileged</i><br/><i>readiness: / · liveness: /</i>"]
        end

        %% ── Trading Tier ──────────────────────────────────────────
-        subgraph trading_tier ["Trading Tier"]
+        subgraph trading_tier ["Trading Tier · tier: trading"]
            direction LR
-            trading_engine["trading-engine<br/><i>Deployment (1 replica)</i><br/>:8000"]
-            risk_engine["risk-engine<br/><i>Deployment (1 replica)</i><br/>:8000"]
-            broker_adapter["broker-adapter<br/><i>Deployment (1 replica)</i><br/><i>queue-driven worker</i>"]
+            trading_engine["trading-engine<br/><i>Deployment · 1 replica</i><br/>:8000<br/><i>readiness: /ready · liveness: /health</i>"]
+            risk_engine["risk-engine<br/><i>Deployment · 1 replica</i><br/>:8000"]
+            broker_adapter["broker-adapter<br/><i>Deployment · 1 replica</i><br/><i>queue-driven worker · pipeline-gated</i>"]
        end

        %% ── Orchestration Tier ────────────────────────────────────
-        subgraph orchestration_tier ["Orchestration Tier"]
-            scheduler["scheduler<br/><i>Deployment (1 replica)</i><br/><i>runs migrations + seed</i>"]
+        subgraph orchestration_tier ["Orchestration Tier · tier: orchestration"]
+            scheduler["scheduler<br/><i>Deployment · 1 replica · pipeline-gated</i><br/><i>init: migrations → seed → backfill</i>"]
+        end
+
+        %% ── Ingestion Tier ────────────────────────────────────────
+        subgraph ingestion_tier ["Ingestion Tier · tier: ingestion"]
+            ingestion["ingestion<br/><i>Deployment · 1 replica · pipeline-gated</i><br/><i>queue-driven worker</i>"]
        end

        %% ── Processing Tier (pipeline workers) ────────────────────
-        subgraph processing_tier ["Processing Tier (pipeline workers)"]
+        subgraph processing_tier ["Processing Tier · tier: processing"]
            direction LR
-            ingestion["ingestion<br/><i>Deployment (2 replicas)</i>"]
-            parser["parser<br/><i>Deployment (2 replicas)</i>"]
-            extractor["extractor<br/><i>Deployment (1 replica)</i>"]
-            aggregation["aggregation<br/><i>Deployment (4 replicas)</i>"]
-            recommendation["recommendation<br/><i>Deployment (1 replica)</i>"]
+            parser["parser<br/><i>Deployment · 2 replicas · pipeline-gated</i>"]
+            extractor["extractor<br/><i>Deployment · 1 replica · pipeline-gated</i>"]
+            aggregation["aggregation<br/><i>Deployment · 4 replicas · pipeline-gated</i>"]
+            recommendation["recommendation<br/><i>Deployment · 1 replica · pipeline-gated</i>"]
        end

        %% ── Analytics Tier ────────────────────────────────────────
-        subgraph analytics_tier ["Analytics Tier"]
+        subgraph analytics_tier ["Analytics Tier · tier: analytics"]
            direction LR
-            lake_publisher["lake-publisher<br/><i>Deployment (1 replica)</i><br/><i>queue-driven worker</i>"]
-            hive_metastore["hive-metastore<br/><i>Deployment (1 replica)</i><br/>:9083<br/><i>apache/hive:4.0.0</i>"]
-            trino["trino<br/><i>Deployment (1 replica)</i><br/>:8080<br/><i>trinodb/trino:latest</i>"]
-            superset["superset<br/><i>Deployment (1 replica)</i><br/>:8088<br/><i>custom image</i>"]
+            lake_publisher["lake-publisher<br/><i>Deployment · 1 replica · pipeline-gated</i><br/><i>queue-driven worker</i>"]
+            hive_metastore["hive-metastore<br/><i>Deployment · 1 replica</i><br/>:9083<br/><i>apache/hive:4.0.0</i><br/><i>PVC: hive-metastore-data</i>"]
+            trino["trino<br/><i>Deployment · 1 replica</i><br/>:8080<br/><i>trinodb/trino:latest</i><br/><i>readiness: /v1/info</i>"]
+        end
+
+        %% ── Superset (tier: dashboard in template) ────────────────
+        subgraph superset_block ["Superset · tier: dashboard"]
+            superset["superset<br/><i>Deployment · 1 replica</i><br/>:8088<br/><i>custom image</i><br/><i>PVC: superset-data</i><br/><i>readiness: /health</i>"]
        end

        %% ── Helm Secrets ──────────────────────────────────────────
@@ -99,7 +107,7 @@ graph TB
    end

    subgraph ollama_ns ["ollama-service namespace"]
-        ollama[("Ollama<br/>ollama:11434<br/><i>GPU: 4070 Ti Super</i>")]
+        ollama[("Ollama<br/>ollama:11434<br/><i>GPU: 4070 Ti Super 16GB</i>")]
    end

    %% ── Ingress Routes ────────────────────────────────────────────
@@ -191,6 +199,7 @@ graph TB
    sec_broker -.-> broker_adapter

    sec_market -.-> ingestion
+    sec_market -.-> query_api

    sec_gmail -.-> trading_engine

@@ -216,7 +225,9 @@ graph TB
    classDef tradingSvc fill:#e8a838,stroke:#b07d1a,color:#fff
    classDef processSvc fill:#9b59b6,stroke:#6c3483,color:#fff
    classDef orchSvc fill:#1abc9c,stroke:#148f77,color:#fff
+    classDef ingestionSvc fill:#e67e22,stroke:#bf6516,color:#fff
    classDef analyticsSvc fill:#e74c3c,stroke:#a93226,color:#fff
+    classDef supersetSvc fill:#c0392b,stroke:#96281b,color:#fff
    classDef extSvc fill:#95a5a6,stroke:#717d7e,color:#fff
    classDef secretSvc fill:#f5f5dc,stroke:#999,color:#333
    classDef configSvc fill:#dfe6e9,stroke:#999,color:#333
@@ -225,8 +236,10 @@ graph TB
    class dashboard frontendSvc
    class trading_engine,risk_engine,broker_adapter tradingSvc
    class scheduler orchSvc
-    class ingestion,parser,extractor,aggregation,recommendation processSvc
-    class lake_publisher,hive_metastore,trino,superset analyticsSvc
+    class ingestion ingestionSvc
+    class parser,extractor,aggregation,recommendation processSvc
+    class lake_publisher,hive_metastore,trino analyticsSvc
+    class superset supersetSvc
    class postgres,redis,minio,ollama extSvc
    class sec_core,sec_broker,sec_market,sec_gmail,sec_dashboard secretSvc
    class configmap configSvc
@@ -284,8 +297,8 @@ The following services have **no inbound network policy** — they are queue-dri

 | Service | Tier | Behavior |
 |---------|------|----------|
-| scheduler | orchestration | Polls DB, enqueues to Redis |
-| ingestion | processing | Reads from `stonks:queue:ingestion`, writes to DB/MinIO/Redis |
+| scheduler | orchestration | Polls DB, enqueues to Redis. Runs migrations + seed + backfill as init containers |
+| ingestion | ingestion | Reads from `stonks:queue:ingestion`, writes to DB/MinIO/Redis. Egress to Polygon.io/News APIs |
 | parser | processing | Reads from `stonks:queue:parsing`, writes to DB/Redis |
 | extractor | processing | Reads from `stonks:queue:extraction`, calls Ollama, writes to DB/Redis |
 | aggregation | processing | Reads from `stonks:queue:aggregation`, writes to DB/Redis |
@@ -294,22 +307,24 @@ The following services have **no inbound network policy** — they are queue-dri

 ## Service Tier Summary

-| Tier | Services | Ingress? | Replicas | Notes |
-|------|----------|----------|----------|-------|
-| **api** | query-api, symbol-registry | Yes (Traefik) | 1 each | FastAPI, readiness probes on `/docs` |
-| **frontend** | dashboard | Yes (Traefik) | 1 | nginx-unprivileged on :8080, proxies to API services |
-| **trading** | trading-engine, risk-engine, broker-adapter | trading-engine: Yes; risk-engine: internal only; broker-adapter: denied | 1 each | trading-engine has egress to Alpaca + Gmail |
-| **orchestration** | scheduler | No | 1 | Runs DB migrations + seed as init containers |
-| **processing** | ingestion, parser, extractor, aggregation, recommendation | No | 2, 2, 1, 4, 1 | Pipeline-gated by `pipelineEnabled` toggle |
-| **analytics** | lake-publisher, trino, hive-metastore, superset | trino + superset: Yes; others: No | 1 each | lake-publisher is pipeline-gated |
+| Tier | Services | Ingress? | Replicas | Pipeline-Gated? | Notes |
+|------|----------|----------|----------|-----------------|-------|
+| **api** | query-api, symbol-registry | Yes (Traefik) | 1 each | No | FastAPI, readiness probes on `/docs` |
+| **frontend** | dashboard | Yes (Traefik) | 1 | No | nginx-unprivileged on :8080, proxies to API services |
+| **trading** | trading-engine, risk-engine, broker-adapter | trading-engine: Yes; risk-engine: internal only; broker-adapter: denied | 1 each | broker-adapter only | trading-engine has egress to Alpaca + Gmail |
+| **orchestration** | scheduler | No | 1 | Yes | Runs DB migrations + seed + backfill as init containers |
+| **ingestion** | ingestion | No | 1 | Yes | Fetches from external APIs (Polygon.io, news, filings) |
+| **processing** | parser, extractor, aggregation, recommendation | No | 2, 1, 4, 1 | Yes | Queue-driven pipeline workers |
+| **analytics** | lake-publisher, trino, hive-metastore | trino: Yes (Traefik); others: No | 1 each | lake-publisher only | trino + hive-metastore gated by `trino.enabled` / `hiveMetastore.enabled` |
+| **dashboard** (Superset) | superset | Yes (Traefik) | 1 | No | Gated by `superset.enabled`, custom image with trino + psycopg2 drivers |

 ## Secret Consumption Map

 | Secret | Keys | Consumers |
 |--------|------|-----------|
-| `stonks-core-secrets` | POSTGRES_PASSWORD, MINIO_ACCESS_KEY, MINIO_SECRET_KEY, REDIS_PASSWORD | All 13 app services + hive-metastore, trino, superset |
+| `stonks-core-secrets` | POSTGRES_PASSWORD, MINIO_ACCESS_KEY, MINIO_SECRET_KEY, REDIS_PASSWORD | All 13 app services + hive-metastore (init), trino (init), superset |
 | `stonks-broker-secrets` | BROKER_API_KEY, BROKER_API_SECRET, BROKER_BASE_URL | ingestion, trading-engine, risk-engine, broker-adapter |
-| `stonks-market-secrets` | MARKET_DATA_API_KEY | ingestion |
+| `stonks-market-secrets` | MARKET_DATA_API_KEY | ingestion, query-api |
 | `stonks-gmail-secrets` | GMAIL_SENDER, GMAIL_RECIPIENT, GMAIL_APP_PASSWORD | trading-engine |
 | `stonks-dashboard-secrets` | SUPERSET_SECRET_KEY, SUPERSET_ADMIN_PASSWORD | superset |

@@ -336,10 +351,10 @@ These services run outside the `stonks-oracle` namespace and are referenced via

 The analytics stack runs within the `stonks-oracle` namespace:

-1. **Lake Publisher** writes Parquet fact tables to MinIO at `s3a://stonks-lakehouse/warehouse`
-2. **Hive Metastore** (Apache Hive 4.0.0) manages table metadata, backed by embedded Derby DB with a PVC for persistence. Connects to MinIO for S3A filesystem access.
-3. **Trino** queries the lakehouse via Hive Metastore (thrift://hive-metastore:9083). Exposes two catalogs: `lakehouse` (Hive connector) and `iceberg` (Iceberg connector). Both connect to MinIO for data access.
-4. **Superset** connects to Trino for lakehouse queries and to PostgreSQL for its metadata DB. Uses Redis for caching. Exposed externally via Traefik ingress.
+1. **Lake Publisher** writes Parquet fact tables to MinIO at `s3a://stonks-lakehouse/warehouse`. Pipeline-gated — scales to 0 when `pipelineEnabled: false`.
+2. **Hive Metastore** (Apache Hive 4.0.0) manages table metadata, backed by embedded Derby DB with a PVC (`hive-metastore-data`) for persistence. Connects to MinIO for S3A filesystem access. Gated by `hiveMetastore.enabled`.
+3. **Trino** queries the lakehouse via Hive Metastore (`thrift://hive-metastore:9083`). Exposes two catalogs: `lakehouse` (Hive connector) and `iceberg` (Iceberg connector). Both connect to MinIO for data access. Gated by `trino.enabled`. Readiness probe on `/v1/info`.
+4. **Superset** connects to Trino for lakehouse queries and to PostgreSQL for its metadata DB. Uses Redis for caching. Exposed externally via Traefik ingress. Gated by `superset.enabled`. Uses custom image (`registry.celestium.life/stonks-oracle/superset:latest`) with trino + psycopg2 drivers. PVC (`superset-data`) for persistence.

 ## Ingress Routes

@@ -353,3 +368,13 @@ All ingress resources use the `traefik` IngressClass with TLS certificates issue
 | `stonks-trading.celestium.life` | trading-engine | 8000 | `stonks-trading-tls` |
 | `stonks-dash.celestium.life` | superset | 8088 | `stonks-dash-tls` |
 | `stonks-trino.celestium.life` | trino | 8080 | `stonks-trino-tls` |
+
+## Deployment Stages
+
+The Helm chart supports multiple deployment stages via value override files:
+
+| Stage | Override File | Namespace | Key Differences |
+|-------|--------------|-----------|-----------------|
+| **Production** | `values.yaml` (base) | `stonks-oracle` | Full analytics stack, all services |
+| **Paper** | `values-paper.yaml` | `stonks-oracle` | `BROKER_MODE=paper`, `DEPLOY_STAGE=paper`, separate DB (`stonks_paper`), Redis DB 2, paper-specific ingress hostnames |
+| **Beta** | `values-beta.yaml` | `stonks-oracle-beta` | `DEPLOY_STAGE=beta`, `LOG_LEVEL=DEBUG`, separate DB (`stonks_beta`), Redis DB 1, analytics stack disabled, beta-specific ingress hostnames |
@@ -5,6 +5,7 @@ This guide covers running the full Stonks Oracle platform locally using Docker C
 ## Prerequisites

 - Docker Engine 24+ and Docker Compose v2
+- NVIDIA GPU with drivers and NVIDIA Container Toolkit (for Ollama LLM inference)
 - At least 16 GB RAM (Ollama + Trino + all services)
 - API keys for Polygon.io and Alpaca (optional — platform runs in degraded mode without them)

@@ -14,20 +15,54 @@ This guide covers running the full Stonks Oracle platform locally using Docker C
 # 1. Clone the repository
 git clone <repo-url> && cd stonks-oracle

-# 2. Configure API keys
-cp .env.example .env   # or edit the existing .env
-# Fill in MARKET_DATA_API_KEY, BROKER_API_KEY, BROKER_API_SECRET
+# 2. Configure API keys (create .env in the repo root)
+cat > .env <<'EOF'
+MARKET_DATA_API_KEY=your_polygon_key
+BROKER_API_KEY=your_alpaca_key
+BROKER_API_SECRET=your_alpaca_secret
+BROKER_BASE_URL=https://paper-api.alpaca.markets
+EOF

 # 3. Start everything
 docker compose up -d

-# 4. Verify all services are healthy
+# 4. Pull an LLM model into Ollama
+docker compose exec ollama ollama pull qwen3.5:9b-fast
+
+# 5. Seed the database
+docker compose exec scheduler python -m services.symbol_registry.seed
+
+# 6. Verify all services are healthy
 docker compose ps

-# 5. Access the dashboard
+# 7. Access the dashboard
 open http://localhost:3000
 ```

+### Automated Deployment
+
+The `deploy-docker.sh` script automates the full deployment to a remote host via SSH, including prerequisite installation, repository sync, environment configuration, image builds, service startup, database seeding, and Ollama model pulling:
+
+```bash
+# Deploy with defaults (GPU-accelerated Docker Ollama)
+bash deploy-docker.sh
+
+# Specify a custom Ollama model
+bash deploy-docker.sh --ollama-model qwen3.6
+
+# Deploy to a different host
+bash deploy-docker.sh --host user@myserver --dir /opt/stonks
+```
+
+| Flag | Default | Description |
+|------|---------|-------------|
+| `--host` | `celes@192.168.42.254` | SSH target (`USER@HOST`) |
+| `--ollama-url` | (auto — Docker container) | Ollama API URL |
+| `--ollama-model` | `qwen3.5:9b-fast` | Ollama model to pull |
+| `--dir` | `~/stonks-oracle` | Remote install directory |
+
+The script detects the target OS and package manager (apt, dnf, yum, pacman, zypper) and installs Docker, NVIDIA drivers, and the NVIDIA Container Toolkit as needed. It also handles WSL environments and firewall configuration.
+
 ---

 ## Service Inventory
@@ -63,6 +98,8 @@ open http://localhost:3000
 | `query-api` | `docker/Dockerfile` | `uvicorn services.api.app:app --host 0.0.0.0 --port 8000` | `8004:8000` | postgres (healthy), redis (healthy), minio (healthy) |
 | `dashboard` | `frontend/Dockerfile` | nginx (built-in) | `3000:8080` | query-api (healthy) |

+The `risk-engine` service has a Docker network alias of `risk` so the dashboard's nginx reverse proxy can resolve it as `http://risk:8000`.
+
 ### Port Summary

 | Port | Service | Protocol |
@@ -109,15 +146,27 @@ The `.env` file is loaded by `ingestion`, `broker-adapter`, and `trading-engine`

 ```dotenv
 # Stonks Oracle — Environment Variables
-# These are loaded by ingestion, broker-adapter, and trading-engine services.
+# Loaded by: ingestion, broker-adapter, trading-engine

-# Polygon.io market data API key (required for live data ingestion)
+# ── Required for live data ingestion ──
 MARKET_DATA_API_KEY=

-# Alpaca broker credentials (required for paper/live trading)
+# ── Required for paper/live trading ──
 BROKER_API_KEY=
 BROKER_API_SECRET=
 BROKER_BASE_URL=https://paper-api.alpaca.markets
+
+# ── Trading engine settings (optional) ──
+TRADING_ENABLED=true
+TRADING_RISK_TIER=moderate
+TRADING_MAX_OPEN_POSITIONS=15
+
+# ── LLM model (optional) ──
+OLLAMA_MODEL=qwen3.5:9b-fast
+
+# ── Signal layers (optional) ──
+MACRO_ENABLED=true
+COMPETITIVE_ENABLED=true
 ```

 | Variable | Required | Default | Used By | Description |
@@ -178,20 +227,24 @@ All application services support additional environment variables loaded via `se
 | `REDIS_DB` | `0` | Redis database number |
 | `REDIS_PASSWORD` | (none) | Redis password (not needed in Docker Compose) |
 | `MINIO_SECURE` | `false` | Use HTTPS for MinIO |
-| `OLLAMA_BASE_URL` | `http://ollama:11434` | Ollama LLM server URL |
 | `OLLAMA_MODEL` | `qwen3.5:9b` | Default LLM model for extraction |
 | `OLLAMA_TIMEOUT` | `120` | Ollama request timeout (seconds) |
 | `OLLAMA_MAX_RETRIES` | `2` | Max retries for Ollama requests |
-| `VLLM_BASE_URL` | (empty) | vLLM server URL (if using vLLM instead of Ollama) |
-| `VLLM_MODEL` | (empty) | vLLM model name (e.g. `AxionML/Qwen3.5-9B-NVFP4`) |
+| `OLLAMA_RETRY_BASE_DELAY` | `1.0` | Base delay between retries (seconds) |
+| `OLLAMA_RETRY_MAX_DELAY` | `10.0` | Maximum delay between retries (seconds) |
+| `OLLAMA_RETRY_BACKOFF_MULTIPLIER` | `2.0` | Backoff multiplier for retries |
+| `VLLM_BASE_URL` | `http://192.168.42.254:8000` | vLLM server URL (if using vLLM instead of Ollama) |
+| `VLLM_MODEL` | `RedHatAI/Qwen3.6-35B-A3B-NVFP4` | vLLM model name |
 | `VLLM_TIMEOUT` | `120` | vLLM request timeout (seconds) |
 | `VLLM_MAX_RETRIES` | `2` | Max retries for vLLM requests |
 | `VLLM_TEMPERATURE` | `0.7` | vLLM sampling temperature |
+| `VLLM_MAX_TOKENS` | `4096` | vLLM max output tokens |
 | `VLLM_API_KEY` | (empty) | vLLM API key (if required) |
 | `TRINO_HOST` | `localhost` | Trino hostname |
 | `TRINO_PORT` | `8080` | Trino port |
 | `TRINO_CATALOG` | `lakehouse` | Trino catalog name |
 | `TRINO_SCHEMA` | `stonks` | Trino schema name |
+| `TRINO_ICEBERG_CATALOG` | `iceberg` | Trino Iceberg catalog name |
 | `MARKET_DATA_BASE_URL` | `https://api.polygon.io` | Polygon.io base URL |
 | `MARKET_DATA_PROVIDER` | `polygon` | Market data provider |
 | `BROKER_MODE` | `paper` | Broker mode: `paper` or `live` |
@@ -200,12 +253,62 @@ All application services support additional environment variables loaded via `se
 | `TRADING_RISK_TIER` | `moderate` | Risk tier: `conservative`, `moderate`, `aggressive` |
 | `TRADING_POLLING_INTERVAL_SECONDS` | `60` | Recommendation polling interval |
 | `TRADING_MAX_OPEN_POSITIONS` | `10` | Maximum concurrent open positions |
+| `TRADING_RESERVE_SIPHON_PCT` | `0.20` | Percentage of profits siphoned to reserve pool |
+| `TRADING_STOP_LOSS_CHECK_INTERVAL_SECONDS` | `300` | Stop-loss check interval |
+| `TRADING_FAST_STOP_LOSS_INTERVAL_SECONDS` | `60` | Fast stop-loss check interval |
+| `TRADING_GRADUAL_ENTRY_TRANCHES` | `3` | Number of tranches for gradual entry |
+| `TRADING_GRADUAL_ENTRY_THRESHOLD_DOLLARS` | `30.0` | Dollar threshold for gradual entry |
+| `TRADING_ABSOLUTE_POSITION_CAP` | `50.0` | Maximum position size (dollars) |
+| `TRADING_ACTIVE_POOL_MINIMUM` | `100.0` | Minimum active pool balance |
+| `TRADING_EMERGENCY_DRAWDOWN_THRESHOLD_PCT` | `0.40` | Emergency drawdown threshold |
+| `TRADING_RESERVE_HIGH_WATER_PCT` | `0.30` | Reserve high-water mark percentage |
+| `TRADING_MICRO_TRADING_ENABLED` | `false` | Enable micro-trading mode |
+| `TRADING_MICRO_TRADING_INTERVAL_SECONDS` | `300` | Micro-trading polling interval |
+| `TRADING_MICRO_TRADING_ALLOCATION_CAP_PCT` | `0.03` | Micro-trading allocation cap |
+| `TRADING_MICRO_TRADING_MAX_DAILY` | `10` | Max micro-trades per day |
+| `TRADING_MICRO_TRADING_MAX_HOLD_MINUTES` | `120` | Max micro-trade hold time |
+| `TRADING_SNS_TOPIC_ARN` | (empty) | AWS SNS topic ARN for notifications |
+| `TRADING_SNS_PHONE_NUMBER` | (empty) | Phone number for SNS notifications |
+| `TRADING_GMAIL_SENDER` | (empty) | Gmail sender address for notifications |
+| `TRADING_GMAIL_RECIPIENT` | (empty) | Gmail recipient address for notifications |
 | `MACRO_ENABLED` | `true` | Enable macro signal layer |
+| `MACRO_SIGNAL_WEIGHT` | `0.3` | Relative weight of macro vs company signals |
+| `MACRO_CONFIDENCE_THRESHOLD` | `0.4` | Minimum confidence for macro event inclusion |
+| `MACRO_SHORT_TERM_STALENESS_HOURS` | `48` | Hours before short-term events get accelerated decay |
+| `PROJECTION_CONFIDENCE_THRESHOLD` | `0.3` | Minimum confidence for projections to influence recommendations |
 | `COMPETITIVE_ENABLED` | `true` | Enable competitive signal layer |
+| `COMPETITIVE_SIGNAL_WEIGHT` | `0.2` | Relative weight of competitive signals |
+| `COMPETITIVE_PATTERN_CONFIDENCE_THRESHOLD` | `0.3` | Minimum confidence for pattern inclusion |
+| `COMPETITIVE_PROPAGATION_STRENGTH_THRESHOLD` | `0.2` | Minimum strength for signal propagation |
+| `COMPETITIVE_ROUTINE_LOOKBACK_DAYS` | `180` | Lookback window for routine patterns |
+| `COMPETITIVE_MAJOR_DECISION_LOOKBACK_DAYS` | `365` | Lookback window for major decisions |
+| `COMPETITIVE_MIN_PATTERN_SAMPLES` | `3` | Minimum samples for pattern matching |
+| `COMPETITIVE_MAJOR_DECISION_WEIGHT_MULTIPLIER` | `1.3` | Weight multiplier for major decision patterns |
+| `COMPETITIVE_STALENESS_WINDOW_DAYS` | `180` | Window for staleness decay on competitive signals |
+| `COMPETITIVE_STALENESS_RECENT_DAYS` | `90` | Days within which signals are considered recent |
+| `COMPETITIVE_STALENESS_DECAY_PENALTY` | `0.5` | Decay penalty for stale competitive signals |
+| `COMPETITIVE_PROPAGATION_FAILURE_THRESHOLD` | `5` | Consecutive propagation failures before operator alert |
+| `ALERT_SOURCE_FAILURE_THRESHOLD` | `3` | Consecutive source failures before alert fires |
+| `ALERT_SOURCE_FAILURE_WINDOW_HOURS` | `6` | Lookback window for source failure alerting |
+| `ALERT_SCHEMA_FAILURE_RATE_THRESHOLD` | `0.3` | Extraction failure rate (30%) that triggers alert |
+| `ALERT_SCHEMA_FAILURE_WINDOW_HOURS` | `1` | Lookback window for schema failure spike |
+| `ALERT_LAKE_LAG_THRESHOLD_MINUTES` | `60` | Minutes since last lake publish before alert |
+| `ALERT_BROKER_ERROR_THRESHOLD` | `3` | Consecutive broker errors before alert |
+| `ALERT_BROKER_ERROR_WINDOW_HOURS` | `1` | Lookback window for broker error alerting |
+| `ALERT_CHECK_INTERVAL_SECONDS` | `120` | How often alerting rules are evaluated |
+| `RETENTION_RAW_MARKET_DAYS` | `90` | Retention period for raw market data (days) |
+| `RETENTION_RAW_NEWS_DAYS` | `180` | Retention period for raw news articles (days) |
+| `RETENTION_RAW_FILINGS_DAYS` | `365` | Retention period for raw SEC filings (days) |
+| `RETENTION_NORMALIZED_DAYS` | `180` | Retention period for normalized documents (days) |
+| `RETENTION_LLM_PROMPTS_DAYS` | `365` | Retention period for LLM prompt archives (days) |
+| `RETENTION_LLM_RESULTS_DAYS` | `365` | Retention period for LLM extraction results (days) |
+| `RETENTION_LAKEHOUSE_DAYS` | `730` | Retention period for lakehouse Parquet files (days) |
+| `RETENTION_AUDIT_DAYS` | `730` | Retention period for audit trail artifacts (days) |
+| `RETENTION_CLEANUP_INTERVAL_HOURS` | `24` | How often the retention cleanup worker runs |
+| `RETENTION_BATCH_SIZE` | `1000` | Number of objects processed per cleanup batch |
 | `LOG_LEVEL` | `INFO` | Logging level |
 | `JSON_LOGS` | `true` | Enable structured JSON logging |
 | `DEPLOY_STAGE` | (empty) | Deployment stage prefix for bucket names |
-| `TZ` | `America/Los_Angeles` | Display timezone for timestamps (set on all containers) |

 See `services/shared/config.py` for the complete list of all supported environment variables with their defaults.

@@ -217,7 +320,7 @@ Stonks Oracle supports two LLM backends: **Ollama** (local, self-hosted) and **v

 ### Option A: Bundled Ollama (default)

-The `docker-compose.yml` includes an Ollama container. On first start, pull a model:
+The `docker-compose.yml` includes an Ollama container with GPU passthrough via the NVIDIA Container Toolkit. On first start, pull a model:

 ```bash
 docker compose exec ollama ollama pull qwen3.5:9b-fast
@@ -225,6 +328,8 @@ docker compose exec ollama ollama pull qwen3.5:9b-fast

 No additional configuration needed — services connect to `http://ollama:11434` by default.

+The Ollama container requests all available NVIDIA GPUs via the `deploy.resources.reservations.devices` configuration. If no GPU is available, Ollama falls back to CPU inference (significantly slower).
+
 ### Option B: External Ollama

 If Ollama is already running on the host (e.g. with GPU access), create a `docker-compose.override.yml`:
@@ -252,15 +357,15 @@ services:
      - "host.docker.internal:host-gateway"
 ```

-This disables the bundled Ollama container and routes services to the host's instance. Replace the port if your Ollama runs on a non-standard port.
+This disables the bundled Ollama container and routes services to the host's instance. Replace the port if your Ollama runs on a non-standard port. For a remote Ollama instance (not on localhost), replace `host.docker.internal` with the remote IP and remove the `extra_hosts` block.

 ### Option C: vLLM Server

-For higher throughput or quantized models (e.g. `AxionML/Qwen3.5-9B-NVFP4`), point services at a vLLM server. Add to your `.env`:
+For higher throughput or quantized models (e.g. `RedHatAI/Qwen3.6-35B-A3B-NVFP4`), point services at a vLLM server. Add to your `.env`:

 ```dotenv
 VLLM_BASE_URL=http://192.168.42.254:8000
-VLLM_MODEL=AxionML/Qwen3.5-9B-NVFP4
+VLLM_MODEL=RedHatAI/Qwen3.6-35B-A3B-NVFP4
 VLLM_TIMEOUT=120
 VLLM_TEMPERATURE=0.7
 ```
@@ -268,7 +373,7 @@ VLLM_TEMPERATURE=0.7
 Then update the `ai_agents` table to use the vLLM provider:

 ```sql
-UPDATE ai_agents SET model_provider = 'vllm', model_name = 'AxionML/Qwen3.5-9B-NVFP4' WHERE active = true;
+UPDATE ai_agents SET model_provider = 'vllm', model_name = 'RedHatAI/Qwen3.6-35B-A3B-NVFP4' WHERE active = true;
 ```

 Or use the API:
@@ -276,7 +381,7 @@ Or use the API:
 ```bash
 curl -X PUT http://localhost:8004/api/admin/agents/document-extractor \
  -H 'Content-Type: application/json' \
-  -d '{"model_provider": "vllm", "model_name": "AxionML/Qwen3.5-9B-NVFP4"}'
+  -d '{"model_provider": "vllm", "model_name": "RedHatAI/Qwen3.6-35B-A3B-NVFP4"}'
 ```

 ### Option D: Mixed (Ollama + vLLM)
@@ -284,8 +389,8 @@ curl -X PUT http://localhost:8004/api/admin/agents/document-extractor \
 You can run different agents on different providers. For example, use vLLM for the high-volume extractor and Ollama for the thesis rewriter:

 ```sql
-UPDATE ai_agents SET model_provider = 'vllm', model_name = 'AxionML/Qwen3.5-9B-NVFP4' WHERE slug = 'document-extractor';
-UPDATE ai_agents SET model_provider = 'vllm', model_name = 'AxionML/Qwen3.5-9B-NVFP4' WHERE slug = 'event-classifier';
+UPDATE ai_agents SET model_provider = 'vllm', model_name = 'RedHatAI/Qwen3.6-35B-A3B-NVFP4' WHERE slug = 'document-extractor';
+UPDATE ai_agents SET model_provider = 'vllm', model_name = 'RedHatAI/Qwen3.6-35B-A3B-NVFP4' WHERE slug = 'event-classifier';
 UPDATE ai_agents SET model_provider = 'ollama', model_name = 'qwen3.5:9b-fast' WHERE slug = 'thesis-rewriter';
 ```

@@ -293,19 +398,21 @@ Both `OLLAMA_BASE_URL` and `VLLM_BASE_URL` must be set in the environment for mi

 ### Automated Deployment

-The `deploy-docker.sh` script handles LLM configuration automatically:
+The `deploy-docker.sh` script handles LLM configuration automatically. It always uses the Docker Ollama container with GPU passthrough (NVIDIA Container Toolkit):

 ```bash
-# Auto-detect host Ollama, use default model
+# Deploy with defaults (Docker Ollama, GPU-accelerated)
 bash deploy-docker.sh

-# Specify a remote Ollama instance
-bash deploy-docker.sh --ollama-url http://10.1.1.12:2701 --ollama-model qwen3.6
+# Specify a custom model
+bash deploy-docker.sh --ollama-model qwen3.6

-# Specify a different host
+# Specify a different host and directory
 bash deploy-docker.sh --host user@myserver --dir /opt/stonks
 ```

+If an external Ollama URL is provided via `--ollama-url`, the script creates a `docker-compose.override.yml` that disables the bundled container and routes services to the external instance.
+
 ---

 ## Volume Mounts and Data Persistence
@@ -404,6 +511,9 @@ docker compose ps query-api

 # Inspect health check details for a container
 docker inspect --format='{{json .State.Health}}' stonks-oracle-query-api-1 | python -m json.tool
+
+# Wait for all services to be healthy
+docker compose up -d --wait
 ```

 ---
@@ -414,17 +524,19 @@ docker inspect --format='{{json .State.Health}}' stonks-oracle-query-api-1 | pyt

 Used by all application services except the scheduler. Accepts a `SERVICE_CMD` build argument that determines which service the container runs.

-**Base image**: `python:3.12-slim`
+**Base image**: `python:3.12-slim` (via Harbor proxy cache in CI)

 **Build arguments**:

 | Argument | Default | Description |
 |----------|---------|-------------|
 | `SERVICE_CMD` | `python -m services.scheduler.app` | The command executed when the container starts |
+| `CACHE_BUST` | (none) | Optional cache-busting argument to force rebuild of source layers |

 **What gets copied**:
 - `requirements.txt` → pip dependencies installed
 - `services/` → all service source code
+- `scripts/` → operational scripts
 - `tests/` → test files (available for in-container testing)
 - `conftest.py` → pytest configuration

@@ -462,7 +574,7 @@ A specialized variant of the generic Dockerfile used only by the `scheduler` ser

 Extends the official Apache Superset image with additional database drivers.

-**Base image**: `apache/superset:latest`
+**Base image**: `apache/superset:latest` (via Harbor proxy cache in CI)

 **Additional packages**: `trino[sqlalchemy]`, `psycopg2-binary`, `redis`

@@ -481,7 +593,9 @@ Multi-stage build for the React dashboard.
 **Stage 2 — Serve** (base: `nginxinc/nginx-unprivileged:alpine`):
 - Serves the built static files on port 8080
 - Uses `frontend/nginx.conf` for SPA fallback and API reverse proxying
- Proxies `/api/` → `query-api:8000`, `/registry/` → `symbol-registry:8000`, `/risk/` → `risk-engine:8000`, `/trading/` → `trading-engine:8000`
+- Proxies `/api/` → `query-api:8000`, `/registry/` → `symbol-registry:8000`, `/risk/` → `risk:8000`, `/trading/` → `trading-engine:8000`
+- SSE stream endpoint (`/api/ops/pipeline/stream`) has buffering disabled for real-time delivery
+- Static assets under `/assets/` are cached with 1-year expiry

 ### Building Custom Images

@@ -503,6 +617,9 @@ docker build -t my-dashboard \

 # Rebuild all images
 docker compose build
+
+# Rebuild without cache (force fresh build)
+docker compose build --no-cache
 ```

 ---
@@ -561,6 +678,9 @@ Services with `condition: service_healthy` wait until the dependency's health ch
 # Start all services in the background
 docker compose up -d

+# Start all services and wait for health checks
+docker compose up -d --wait
+
 # Start only infrastructure (useful for local development)
 docker compose up -d postgres redis minio minio-init ollama

@@ -639,6 +759,9 @@ docker compose exec query-api python -c "from services.shared.config import load

 # Open a shell in a container
 docker compose exec postgres psql -U stonks -d stonks
+
+# Seed the database
+docker compose exec scheduler python -m services.symbol_registry.seed
 ```

 ### Full Reset
@@ -680,13 +803,16 @@ The dashboard container runs nginx with reverse proxy rules that route API reque
 | Path | Proxied To | Service |
 |------|-----------|---------|
 | `/api/` | `http://query-api:8000` | Query API |
+| `/api/ops/pipeline/stream` | `http://query-api:8000` (SSE, no buffering) | Query API (real-time pipeline stream) |
 | `/registry/` | `http://symbol-registry:8000/` | Symbol Registry API |
 | `/risk/` | `http://risk:8000/` | Risk Engine (via network alias) |
 | `/trading/` | `http://trading-engine:8000/` | Trading Engine API |

 The `risk-engine` service has a network alias of `risk` in `docker-compose.yml` so the nginx upstream resolves correctly.

-All other paths serve the React SPA with `try_files` fallback to `index.html`.
+All other paths serve the React SPA with `try_files` fallback to `index.html`. Static assets under `/assets/` are served with 1-year cache headers.
+
+Security headers applied: `X-Frame-Options: SAMEORIGIN`, `X-Content-Type-Options: nosniff`, `Referrer-Policy: strict-origin-when-cross-origin`.

 ---

@@ -734,6 +860,19 @@ curl http://your-vllm-host:8000/v1/models

 If Ollama is already running on the host, the bundled container will fail to bind port 11434. Use the external Ollama configuration described in the "LLM Provider Configuration" section above, or use `deploy-docker.sh` which handles this automatically.

+### GPU not detected by Ollama container
+
+Ensure the NVIDIA Container Toolkit is installed and Docker is configured:
+
+```bash
+# Verify GPU passthrough works
+docker run --rm --gpus all nvidia/cuda:12.8.0-base-ubuntu24.04 nvidia-smi
+
+# If it fails, reconfigure Docker runtime
+sudo nvidia-ctk runtime configure --runtime=docker
+sudo systemctl restart docker
+```
+
 ### Port conflicts

 If a port is already in use, modify the host port mapping in `docker-compose.yml`:
@@ -743,3 +882,15 @@ query-api:
  ports:
    - "9004:8000"   # Changed from 8004 to 9004
 ```
+
+### Container runs out of memory
+
+The full stack requires at least 16 GB RAM. If services are being OOM-killed:
+
+```bash
+# Check which containers are using the most memory
+docker stats --no-stream
+
+# Reduce memory usage by stopping non-essential services
+docker compose stop trino hive-metastore superset
+```
@@ -94,7 +94,7 @@ Each key under `services` defines a Kubernetes Deployment. The deployments templ
 | `image` | string | yes | Image name appended to `image.registry`. Also used as the Deployment name and pod label (`app: <image>`). |
 | `command` | string | no | Shell command passed as `["sh", "-c", "<command>"]`. Omit for images with a built-in entrypoint (e.g., dashboard/nginx). |
 | `tier` | string | yes | Service tier label (`stonks-oracle/tier`). One of: `api`, `frontend`, `processing`, `trading`, `orchestration`, `analytics`, `ingestion`. |
-| `port` | int | no | Container port. When set, a Kubernetes Service is created mapping `port → port`. |
+| `port` | int | no | Container port. When set, a Kubernetes Service is created mapping `port -> port`. |
 | `pipeline` | bool | no | If `true`, replicas are set to 0 when `pipelineEnabled` is `false`. |
 | `secrets` | list(string) | no | List of Secret names to mount via `envFrom.secretRef`. |
 | `resources` | object | yes | Kubernetes resource requests and limits (`cpu`, `memory`). |
@@ -118,9 +118,10 @@ Each key under `services` defines a Kubernetes Deployment. The deployments templ
 | `resources.limits` | cpu: 200m, memory: 128Mi |
 | `probes` | — |

-The scheduler deployment has two init containers (not configurable via values):
+The scheduler deployment has three init containers (not configurable via values):
 1. **run-migrations** — applies all SQL files from `infra/migrations/*.sql` in sorted order.
 2. **seed-if-empty** — runs `python -m services.symbol_registry.seed` if the `companies` table is empty.
+3. **backfill-market-data** — runs `scripts/backfill_market_data.py` if available (skips gracefully if not).

 #### symbolRegistry

@@ -141,7 +142,7 @@ The scheduler deployment has two init containers (not configurable via values):

 | Field | Value |
 |-------|-------|
-| `replicas` | `2` |
+| `replicas` | `1` |
 | `pipeline` | `true` |
 | `image` | `ingestion` |
 | `command` | `python -m services.ingestion.worker` |
@@ -274,7 +275,7 @@ Single replica is recommended — the extractor is bottlenecked by the shared Ol
 | `command` | `uvicorn services.api.app:app --host 0.0.0.0 --port 8000` |
 | `tier` | `api` |
 | `port` | `8000` |
-| `secrets` | `stonks-core-secrets` |
+| `secrets` | `stonks-core-secrets`, `stonks-market-secrets` |
 | `resources.requests` | cpu: 100m, memory: 128Mi |
 | `resources.limits` | cpu: 500m, memory: 256Mi |
 | `probes.readiness` | path: `/docs`, port: 8000, initialDelay: 5s, period: 10s |
@@ -323,7 +324,7 @@ All keys under `config` are rendered into a Kubernetes ConfigMap named `stonks-c

 | Key | Type | Default | Description |
 |-----|------|---------|-------------|
-| `config.OLLAMA_BASE_URL` | string | `""` (empty) | Ollama API base URL. Set to the cluster-internal or external Ollama endpoint. |
+| `config.OLLAMA_BASE_URL` | string | `http://10.1.1.12:2701` | Ollama API base URL. Points to the external Ollama endpoint by default. |
 | `config.OLLAMA_MODEL` | string | `qwen3.5:9b-fast` | Default LLM model for extraction and classification agents. |
 | `config.OLLAMA_TIMEOUT` | string | `240` | Request timeout in seconds for Ollama API calls. |
 | `config.OLLAMA_MAX_RETRIES` | string | `2` | Maximum retry attempts for failed Ollama requests. |
@@ -331,6 +332,17 @@ All keys under `config` are rendered into a Kubernetes ConfigMap named `stonks-c
 | `config.OLLAMA_RETRY_MAX_DELAY` | string | `10.0` | Maximum delay cap in seconds for Ollama retry backoff. |
 | `config.OLLAMA_RETRY_BACKOFF_MULTIPLIER` | string | `2.0` | Multiplier for exponential backoff between Ollama retries. |

+### vLLM
+
+| Key | Type | Default | Description |
+|-----|------|---------|-------------|
+| `config.VLLM_BASE_URL` | string | `http://10.1.1.12:2701` | vLLM API base URL. Alternative LLM backend using OpenAI-compatible API. |
+| `config.VLLM_MODEL` | string | `qwen3.5:9b-fast` | vLLM model identifier. |
+| `config.VLLM_TIMEOUT` | string | `120` | Request timeout in seconds for vLLM API calls. |
+| `config.VLLM_MAX_RETRIES` | string | `2` | Maximum retry attempts for failed vLLM requests. |
+| `config.VLLM_TEMPERATURE` | string | `0.7` | Sampling temperature for vLLM generation (0.0-1.0). |
+| `config.VLLM_API_KEY` | string | `""` (empty) | API key for vLLM authentication. Leave empty if not required. |
+
 ### Analytics / Trino

 | Key | Type | Default | Description |
@@ -347,7 +359,7 @@ All keys under `config` are rendered into a Kubernetes ConfigMap named `stonks-c
 |-----|------|---------|-------------|
 | `config.BROKER_MODE` | string | `paper` | Broker execution mode. `paper` for simulated trading, `live` for real orders. |
 | `config.BROKER_PROVIDER` | string | `""` (empty) | Broker provider name (e.g., `alpaca`). |
-| `config.MARKET_DATA_BASE_URL` | string | `""` (empty) | Market data API base URL (e.g., `https://api.polygon.io`). |
+| `config.MARKET_DATA_BASE_URL` | string | `https://api.polygon.io` | Market data API base URL. |
 | `config.MARKET_DATA_PROVIDER` | string | `polygon` | Market data provider identifier. |
 | `config.TRADING_ENABLED` | string | `true` | Master toggle for the trading engine. Set to `false` to disable order submission. |
 | `config.TRADING_RISK_TIER` | string | `moderate` | Default risk tier for position sizing. Options: `conservative`, `moderate`, `aggressive`. |
@@ -384,7 +396,7 @@ All keys under `config` are rendered into a Kubernetes ConfigMap named `stonks-c
 |-----|------|---------|-------------|
 | `config.ALERT_SOURCE_FAILURE_THRESHOLD` | string | `3` | Number of consecutive source failures before firing an alert. |
 | `config.ALERT_SOURCE_FAILURE_WINDOW_HOURS` | string | `6` | Time window (hours) for evaluating source failure count. |
-| `config.ALERT_SCHEMA_FAILURE_RATE_THRESHOLD` | string | `0.3` | Schema validation failure rate (0.0–1.0) that triggers an alert. |
+| `config.ALERT_SCHEMA_FAILURE_RATE_THRESHOLD` | string | `0.3` | Schema validation failure rate (0.0-1.0) that triggers an alert. |
 | `config.ALERT_SCHEMA_FAILURE_WINDOW_HOURS` | string | `1` | Time window (hours) for evaluating schema failure rate. |
 | `config.ALERT_LAKE_LAG_THRESHOLD_MINUTES` | string | `60` | Minutes of lakehouse publish lag before alerting. |
 | `config.ALERT_BROKER_ERROR_THRESHOLD` | string | `3` | Number of broker errors before firing an alert. |
@@ -395,7 +407,7 @@ All keys under `config` are rendered into a Kubernetes ConfigMap named `stonks-c

 ## `secrets` — Kubernetes Secrets

-Secrets are rendered into five Kubernetes Secret objects. In the base `values.yaml`, all secret values default to empty strings. Inject real values at deploy time using `--set` flags or a values override file.
+Secrets are rendered into five Kubernetes Secret objects. Inject real values at deploy time using `--set` flags or a values override file. The base `values.yaml` contains placeholder values — override them for each environment.

 ### Secret Objects

@@ -403,32 +415,32 @@ Secrets are rendered into five Kubernetes Secret objects. In the base `values.ya
 |-------------|-----------|-------------|
 | `stonks-core-secrets` | `secrets.core` | All services |
 | `stonks-broker-secrets` | `secrets.broker` | ingestion, trading-engine, risk-engine, broker-adapter |
-| `stonks-market-secrets` | `secrets.market` | ingestion |
+| `stonks-market-secrets` | `secrets.market` | ingestion, query-api |
 | `stonks-gmail-secrets` | `secrets.gmail` | trading-engine |
 | `stonks-dashboard-secrets` | `secrets.dashboard` | superset |

 ### `secrets.core`

-| Key | Type | Default | Description |
-|-----|------|---------|-------------|
-| `POSTGRES_PASSWORD` | string | `""` | PostgreSQL password. |
-| `MINIO_ACCESS_KEY` | string | `""` | MinIO access key (AWS-style). |
-| `MINIO_SECRET_KEY` | string | `""` | MinIO secret key. |
-| `REDIS_PASSWORD` | string | `""` | Redis authentication password. |
+| Key | Type | Description |
+|-----|------|-------------|
+| `POSTGRES_PASSWORD` | string | PostgreSQL password. |
+| `MINIO_ACCESS_KEY` | string | MinIO access key (AWS-style). |
+| `MINIO_SECRET_KEY` | string | MinIO secret key. |
+| `REDIS_PASSWORD` | string | Redis authentication password. |

 ### `secrets.broker`

-| Key | Type | Default | Description |
-|-----|------|---------|-------------|
-| `BROKER_API_KEY` | string | `""` | Broker API key (e.g., Alpaca paper trading key). |
-| `BROKER_API_SECRET` | string | `""` | Broker API secret. |
-| `BROKER_BASE_URL` | string | `""` | Broker API base URL (e.g., `https://paper-api.alpaca.markets`). |
+| Key | Type | Description |
+|-----|------|-------------|
+| `BROKER_API_KEY` | string | Broker API key (e.g., Alpaca paper trading key). |
+| `BROKER_API_SECRET` | string | Broker API secret. |
+| `BROKER_BASE_URL` | string | Broker API base URL (e.g., `https://paper-api.alpaca.markets`). |

 ### `secrets.market`

-| Key | Type | Default | Description |
-|-----|------|---------|-------------|
-| `MARKET_DATA_API_KEY` | string | `""` | Market data provider API key (e.g., Polygon.io). |
+| Key | Type | Description |
+|-----|------|-------------|
+| `MARKET_DATA_API_KEY` | string | Market data provider API key (e.g., Polygon.io). |

 ### `secrets.gmail`

@@ -440,10 +452,10 @@ Secrets are rendered into five Kubernetes Secret objects. In the base `values.ya

 ### `secrets.dashboard`

-| Key | Type | Default | Description |
-|-----|------|---------|-------------|
-| `SUPERSET_SECRET_KEY` | string | `""` | Flask secret key for Superset session encryption. |
-| `SUPERSET_ADMIN_PASSWORD` | string | `""` | Superset admin user password. |
+| Key | Type | Description |
+|-----|------|-------------|
+| `SUPERSET_SECRET_KEY` | string | Flask secret key for Superset session encryption. |
+| `SUPERSET_ADMIN_PASSWORD` | string | Superset admin user password. |

 ### Injecting Secrets at Deploy Time

@@ -596,15 +608,20 @@ Key overrides:
 | `pipelineEnabled` | `true` | Services deployed (ArgoCD health checks), but pipeline defaults to OFF via `PIPELINE_DEFAULT_OFF`. |
 | `config.DEPLOY_STAGE` | `beta` | Isolates Redis keys (`stonks:beta:*`) and MinIO buckets (`beta-stonks-*`). |
 | `config.POSTGRES_DB` | `stonks_beta` | Separate database for beta data. |
+| `config.POSTGRES_USER` | `stonks_beta` | Separate database user for beta. |
 | `config.REDIS_DB` | `1` | Separate Redis DB index. |
 | `config.LOG_LEVEL` | `DEBUG` | Verbose logging for debugging. |
-| `config.TRADING_ENABLED` | `false` | Safety net — no order submission in beta. |
-| `config.PIPELINE_DEFAULT_OFF` | `true` | Scheduler won't enqueue jobs unless explicitly enabled. |
+| `config.TRADING_ENABLED` | `true` | Trading engine active but constrained by paper broker mode. |
+| `config.PIPELINE_DEFAULT_OFF` | `true` | Scheduler won't enqueue jobs unless explicitly enabled via the UI. |
+| `config.BROKER_MODE` | `paper` | Simulated order execution. |
+| `config.BROKER_PROVIDER` | `alpaca` | Alpaca paper trading API. |
 | `config.OLLAMA_MODEL` | `qwen3.6` | May use a different model version for testing. |
 | `trino.enabled` | `false` | Analytics stack disabled in beta. |
 | `hiveMetastore.enabled` | `false` | Analytics stack disabled in beta. |
 | `superset.enabled` | `false` | Analytics stack disabled in beta. |

+Beta also configures vLLM settings (`VLLM_BASE_URL`, `VLLM_MODEL`, etc.) for testing alternative LLM backends.
+
 Beta ingress hostnames:

 | Service | Hostname |
@@ -649,11 +666,11 @@ Paper ingress hostnames:

 ```
 values-beta.yaml          values-paper.yaml          values.yaml (base)
-     Beta          →        Paper Trading       →        Production
+     Beta          ->        Paper Trading       ->        Production
  Integration               Simulated orders           Live trading
  testing                   Real market data           Real orders
  Pipeline OFF              Pipeline ON                Pipeline ON
-  Trading OFF               Trading ON                 Trading ON
+  Trading ON                Trading ON                 Trading ON
  Analytics OFF             Analytics ON               Analytics ON
 ```

@@ -20,7 +20,7 @@ scrape_configs:
    scrape_interval: 15s
    scrape_timeout: 10s
    metrics_path: /metrics
-    static_targets:
+    static_configs:
      - targets:
          # Docker Compose
          - "query-api:8000"
@@ -124,6 +124,7 @@ All metrics are defined in `services/shared/metrics.py`. Metric names use the `s
 | `stonks_orders_rejected_total` | Counter | `reason_category` | Orders rejected before broker submission |
 | `stonks_orders_filled_total` | Counter | `side` | Orders filled by broker |
 | `stonks_orders_duplicates_prevented_total` | Counter | `detected_via` | Duplicate orders prevented by idempotency checks |
+| `stonks_orders_clamped_total` | Counter | — | Orders auto-clamped to fit within position limits |
 | `stonks_risk_evaluations_total` | Counter | `result` | Risk evaluations performed |
 | `stonks_risk_check_failures_total` | Counter | `check_name` | Individual risk check failures |
 | `stonks_positions_synced_total` | Counter | — | Position sync operations completed |
@@ -41,6 +41,7 @@ All queues use the `stonks:queue:<name>` key pattern (configurable via `DEPLOY_S
 | `recommendation` | `stonks:queue:recommendation` | Aggregation | Recommendation |
 | `broker_orders` | `stonks:queue:broker_orders` | Trading Engine, Trading API | Broker Adapter |
 | `lake_publish` | `stonks:queue:lake_publish` | Various services | Lake Publisher |
+| `report_generation` | `stonks:queue:report_generation` | Scheduler | Scheduler (inline consumer) |

 ### Queue Message Schemas

@@ -131,11 +132,20 @@ All queues use the `stonks:queue:<name>` key pattern (configurable via `DEPLOY_S
 }
 ```

+**Report Generation Job** (`stonks:queue:report_generation`):
+```json
+{
+  "report_type": "daily | weekly",
+  "period_start": "2025-01-01",
+  "period_end": "2025-01-01"
+}
+```
+
 ---

 ## 1. Scheduler

-**Purpose**: Triggers ingestion cycles for tracked companies and sources on a configurable cadence. Polls the symbol registry for active companies and their configured sources, respects per-source polling intervals and backoff windows, coordinates rate limits across source types, and enqueues ingestion jobs for downstream workers. Also runs periodic maintenance: stale document recovery, failed extraction retries, and data retention cleanup.
+**Purpose**: Triggers ingestion cycles for tracked companies and sources on a configurable cadence. Polls the symbol registry for active companies and their configured sources, respects per-source polling intervals and backoff windows, coordinates rate limits across source types, and enqueues ingestion jobs for downstream workers. Also runs periodic maintenance: stale document recovery, failed extraction retries, data retention cleanup, periodic aggregation re-runs, and automated report generation (daily/weekly).

 **Entry Point**: `services.scheduler.app`

@@ -176,12 +186,16 @@ All queues use the `stonks:queue:<name>` key pattern (configurable via `DEPLOY_S
 | `recommendations` | Write (delete) | Retention cleanup |
 | `order_events` | Write (delete) | Retention cleanup |
 | `model_performance_metrics` | Write (delete) | Retention cleanup |
+| `ingestion_runs` | Write (delete) | Retention cleanup |
+| `trading_reports` | Write | Report generation storage |

 ### Redis Queues

 | Direction | Queue | Purpose |
 |---|---|---|
 | Publish | `stonks:queue:ingestion` | Enqueue ingestion jobs for due sources |
+| Publish | `stonks:queue:aggregation` | Periodic aggregation re-runs |
+| Publish/Consume | `stonks:queue:report_generation` | Enqueue and consume report generation jobs |
 | Read | `stonks:pipeline:enabled` | Pipeline toggle (skip cycle if `"0"`) |
 | Read/Write | `stonks:lock:scheduler_cycle` | Distributed lock for single-writer |
 | Read/Write | `stonks:ratelimit:*` | Per-source-type and global Polygon rate limits |
@@ -195,6 +209,8 @@ All queues use the `stonks:queue:<name>` key pattern (configurable via `DEPLOY_S
 - **Stale document recovery**: Every ~5 minutes, re-enqueues documents stuck in `parsed` status for >240 minutes.
 - **Failed extraction retry**: Every ~10 minutes, re-enqueues `extraction_failed` documents older than 60 minutes.
 - **Data retention cleanup**: Every ~25 minutes, deletes old rows from 10 tables with configurable retention windows (14–90 days).
+- **Periodic aggregation**: Re-enqueues aggregation jobs for all active tickers to keep trend summaries fresh.
+- **Report generation**: Enqueues daily and weekly report jobs on schedule; consumes them inline via `process_report_job` with retry logic (3 attempts, exponential backoff 30s/60s/120s).

 ---

@@ -281,7 +297,7 @@ None — this service is purely HTTP-driven.
 ### MinIO Buckets

 - `stonks-raw-market` — Raw market data JSON
- `stonks-raw-news` — Raw news article JSON
+- `stonks-raw-news` — Raw news article JSON (also used for macro news)
 - `stonks-raw-filings` — Raw SEC filing data
 - `stonks-normalized` — Normalized text (written by parser)

@@ -296,6 +312,13 @@ None — this service is purely HTTP-driven.
 | `broker` | `AlpacaBrokerAdapter` | Alpaca |
 | `macro_news` | `MacroNewsAdapter` | Polygon.io |

+### Key Behaviors
+
+- Macro news jobs (`source_type=macro_news`) may lack a `company_id` — the worker handles this gracefully
+- Macro news documents are typed as `macro_event` so the parser routes them to the macro classification queue
+- Duplicate documents detected via content hash are linked to the current company (except for `macro_news`)
+- Tracks `last_published_at` per source to fetch only newer articles on subsequent runs
+
 ---

 ## 4. Parser
@@ -349,7 +372,7 @@ None — this service is purely HTTP-driven.

 ## 5. Extractor

-**Purpose**: Performs LLM-based intelligence extraction from documents using Ollama. Handles two pipelines: (1) standard document extraction producing `DocumentIntelligence` with per-company impact records, and (2) macro event classification producing `GlobalEventSchema` with company-level macro impact interpolation. Supports AI agent configuration with variant-based A/B testing.
+**Purpose**: Performs LLM-based intelligence extraction from documents using Ollama or a remote vLLM inference server. Handles two pipelines: (1) standard document extraction producing `DocumentIntelligence` with per-company impact records, and (2) macro event classification producing `GlobalEventSchema` with company-level macro impact interpolation. Supports AI agent configuration with variant-based A/B testing and provider routing (Ollama or vLLM).

 **Entry Point**: `services.extractor.main`

@@ -363,9 +386,16 @@ None — this service is purely HTTP-driven.
 | `REDIS_*` | _(see shared)_ | Redis connection |
 | `MINIO_*` | _(see shared)_ | MinIO connection |
 | `OLLAMA_BASE_URL` | `http://localhost:11434` | Ollama API endpoint |
-| `OLLAMA_MODEL` | `qwen3.5:9b` | Default LLM model |
+| `OLLAMA_MODEL` | `qwen3.5:9b` | Default Ollama model |
 | `OLLAMA_TIMEOUT` | `120` | Request timeout (seconds) |
 | `OLLAMA_MAX_RETRIES` | `2` | Max retry attempts |
+| `VLLM_BASE_URL` | `http://192.168.42.254:8000` | vLLM inference server endpoint |
+| `VLLM_MODEL` | `RedHatAI/Qwen3.6-35B-A3B-NVFP4` | Default vLLM model |
+| `VLLM_TIMEOUT` | `120` | vLLM request timeout (seconds) |
+| `VLLM_MAX_RETRIES` | `2` | vLLM max retry attempts |
+| `VLLM_MAX_TOKENS` | `4096` | vLLM max output tokens |
+| `VLLM_TEMPERATURE` | `0.7` | vLLM sampling temperature |
+| `VLLM_API_KEY` | _(empty)_ | Optional API key for authenticated vLLM deployments |
 | `MACRO_CONFIDENCE_THRESHOLD` | `0.4` | Minimum confidence for macro event inclusion |
 | `LOG_LEVEL` | `INFO` | Logging level |

@@ -395,6 +425,7 @@ None — this service is purely HTTP-driven.

 ### Key Behaviors

+- **LLM provider routing**: The `AgentConfigResolver` resolves agent configuration from the DB, including a `model_provider` field (`"ollama"` or `"vllm"`). The `build_llm_client` factory returns the appropriate client (`OllamaClient` or `VLLMClient`).
 - Alternates between macro and extraction queues (1 macro per 3 jobs) to prevent starvation
 - Resolves agent configuration from DB with 60-second TTL cache (`AgentConfigResolver`)
 - Supports separate models for document extraction and event classification
@@ -565,7 +596,7 @@ None — this service is purely HTTP-driven.
 | `risk_tier_history` | Read/Write | Risk tier change audit trail |
 | `circuit_breaker_events` | Read/Write | Circuit breaker trigger/reset events |
 | `positions` | Read | Current open positions |
-| `position_stop_levels` | Read/Write | Stop-loss and take-profit levels |
+| `position_stop_levels` | Read/Write | Stop-loss and take-profit levels per position |
 | `orders` | Read | Order history for dedup |
 | `backtest_runs` | Read/Write | Backtest configuration and results |
 | `backtest_trades` | Read/Write | Individual trades within a backtest |
@@ -652,7 +683,7 @@ None — called synchronously by the broker adapter and via HTTP.
 | `positions` | Write (upsert) | Sync positions from Alpaca |
 | `broker_accounts` | Write (upsert) | Register/update broker account |
 | `daily_risk_snapshots` | Read | Daily portfolio state for risk evaluation |
-| `risk_configs` | Read | Active risk configuration |
+| `risk_configs` | Read | Active risk configuration for order evaluation |
 | `approval_requests` | Write | Create approval requests for gated orders |
 | `audit_events` | Write | Full audit trail |

@@ -728,7 +759,7 @@ None — called synchronously by the broker adapter and via HTTP.

 ## 12. Query API

-**Purpose**: Read-only FastAPI service for analytics, evidence drill-down, and admin controls. Serves the React dashboard and external integrations with endpoints for companies, documents, trends, recommendations, orders, positions, portfolio metrics, global events, macro impacts, competitive signals, trend projections, AI agents, dead-letter queues, pipeline control, SQL explorer, saved queries, audit trail, DevOps metrics, and Prometheus metrics.
+**Purpose**: Read-only FastAPI service for analytics, evidence drill-down, and admin controls. Serves the React dashboard and external integrations with endpoints for companies, documents, trends, recommendations, orders, positions, portfolio metrics, global events, macro impacts, competitive signals, trend projections, AI agents, dead-letter queues, pipeline control, SQL explorer, saved queries, audit trail, DevOps metrics, Prometheus metrics, model validation, and trading reports.

 **Entry Point**: `services.api.app` (FastAPI)

@@ -745,6 +776,7 @@ None — called synchronously by the broker adapter and via HTTP.
 | `TRINO_PORT` | `8080` | Trino port |
 | `TRINO_CATALOG` | `lakehouse` | Trino catalog |
 | `TRINO_SCHEMA` | `stonks` | Trino schema |
+| `TRINO_ICEBERG_CATALOG` | `iceberg` | Trino Iceberg catalog |
 | `LOG_LEVEL` | `INFO` | Logging level |

 ### Database Tables
@@ -757,9 +789,9 @@ The Query API reads from nearly all tables in the database, including:
 | `sources` | Source configurations |
 | `documents`, `document_company_mentions` | Document timelines |
 | `document_intelligence`, `document_impact_records` | Intelligence extraction results |
-| `trend_windows`, `trend_history`, `trend_projections` | Trend summaries and projections |
+| `trend_windows`, `trend_history`, `trend_projections`, `trend_evidence` | Trend summaries and projections |
 | `recommendations`, `recommendation_evidence` | Recommendation history with evidence |
-| `risk_evaluations` | Risk evaluation results |
+| `risk_evaluations`, `risk_configs` | Risk evaluation results and configuration |
 | `orders`, `order_events` | Order history and lifecycle |
 | `positions`, `portfolio_snapshots` | Portfolio state |
 | `global_events`, `macro_impact_records` | Macro event data |
@@ -768,6 +800,13 @@ The Query API reads from nearly all tables in the database, including:
 | `audit_events` | Audit trail |
 | `market_snapshots` | Market price data |
 | `watchlists`, `watchlist_members` | Watchlist data |
+| `ingestion_runs` | Ingestion throughput and source health |
+| `model_performance_metrics` | Model quality metrics |
+| `prediction_snapshots`, `prediction_outcomes` | Model validation and calibration |
+| `trading_decisions` | Trading decision history |
+| `trading_reports` | Generated daily/weekly reports |
+| `approval_requests` | Pending approval workflow |
+| `symbol_lockouts` | Active trading lockouts per symbol |

 ### Redis Queues

@@ -776,15 +815,22 @@ The Query API reads from nearly all tables in the database, including:
 | Read/Write | `stonks:pipeline:enabled` | Pipeline toggle control |
 | Read | `stonks:queue:*` | Queue depth monitoring for DLQ and DevOps metrics |
 | Read | `stonks:dlq:*` | Dead-letter queue inspection and replay |
+| Read | `stonks:ratelimit:*` | Rate limit status monitoring |

 ### Key Behaviors

 - Exposes `/metrics` endpoint for Prometheus scraping
 - Trace context propagation via `x-trace-id` header middleware
- SQL explorer endpoint for ad-hoc Trino queries
+- SQL explorer endpoint for ad-hoc Trino queries (`/analytics/query`)
+- PostgreSQL schema explorer (`/pg/schema`, `/pg/query`)
 - Dead-letter queue management (list, inspect, replay)
 - Pipeline control (enable/disable via Redis toggle)
 - Saved queries with CRUD operations
+- Macro and competitive layer toggle endpoints
+- Model validation endpoints (summary, calibration, IC by horizon, gate status, attribution)
+- Trading report listing and retrieval
+- SSE pipeline health stream (`/pipeline/stream`)
+- Market price backfill endpoints

 ---

@@ -1042,6 +1088,67 @@ All services load configuration from environment variables via `services/shared/
 | `OLLAMA_MODEL` | `qwen3.5:9b` | Default model |
 | `OLLAMA_TIMEOUT` | `120` | Request timeout (seconds) |
 | `OLLAMA_MAX_RETRIES` | `2` | Max retry attempts |
+| `OLLAMA_RETRY_BASE_DELAY` | `1.0` | Base delay between retries (seconds) |
+| `OLLAMA_RETRY_MAX_DELAY` | `10.0` | Maximum delay between retries (seconds) |
+| `OLLAMA_RETRY_BACKOFF_MULTIPLIER` | `2.0` | Backoff multiplier |
+
+### vLLM
+
+| Variable | Default | Description |
+|---|---|---|
+| `VLLM_BASE_URL` | `http://192.168.42.254:8000` | vLLM inference server endpoint |
+| `VLLM_MODEL` | `RedHatAI/Qwen3.6-35B-A3B-NVFP4` | Default vLLM model |
+| `VLLM_TIMEOUT` | `120` | Request timeout (seconds) |
+| `VLLM_MAX_RETRIES` | `2` | Max retry attempts |
+| `VLLM_MAX_TOKENS` | `4096` | Max output tokens |
+| `VLLM_TEMPERATURE` | `0.7` | Sampling temperature |
+| `VLLM_API_KEY` | _(empty)_ | Optional API key for authenticated deployments |
+| `VLLM_RETRY_BASE_DELAY` | `1.0` | Base delay between retries (seconds) |
+| `VLLM_RETRY_MAX_DELAY` | `10.0` | Maximum delay between retries (seconds) |
+| `VLLM_RETRY_BACKOFF_MULTIPLIER` | `2.0` | Backoff multiplier |
+
+### Trino
+
+| Variable | Default | Description |
+|---|---|---|
+| `TRINO_HOST` | `localhost` | Trino host |
+| `TRINO_PORT` | `8080` | Trino port |
+| `TRINO_CATALOG` | `lakehouse` | Trino catalog |
+| `TRINO_SCHEMA` | `stonks` | Trino schema |
+| `TRINO_ICEBERG_CATALOG` | `iceberg` | Trino Iceberg catalog |
+
+### Market Data
+
+| Variable | Default | Description |
+|---|---|---|
+| `MARKET_DATA_API_KEY` | _(empty)_ | Polygon.io API key |
+| `MARKET_DATA_BASE_URL` | `https://api.polygon.io` | Polygon base URL |
+| `MARKET_DATA_PROVIDER` | `polygon` | Market data provider |
+
+### Broker
+
+| Variable | Default | Description |
+|---|---|---|
+| `BROKER_MODE` | `paper` | Trading mode (`paper` or `live`) |
+| `BROKER_PROVIDER` | `alpaca` | Broker provider |
+| `BROKER_API_KEY` | _(none)_ | Alpaca API key |
+| `BROKER_API_SECRET` | _(none)_ | Alpaca API secret |
+| `BROKER_BASE_URL` | _(none)_ | Alpaca base URL |
+
+### Retention
+
+| Variable | Default | Description |
+|---|---|---|
+| `RETENTION_RAW_MARKET_DAYS` | `90` | Raw market data retention (days) |
+| `RETENTION_RAW_NEWS_DAYS` | `180` | Raw news data retention (days) |
+| `RETENTION_RAW_FILINGS_DAYS` | `365` | Raw filings retention (days) |
+| `RETENTION_NORMALIZED_DAYS` | `180` | Normalized text retention (days) |
+| `RETENTION_LLM_PROMPTS_DAYS` | `365` | LLM prompt retention (days) |
+| `RETENTION_LLM_RESULTS_DAYS` | `365` | LLM result retention (days) |
+| `RETENTION_LAKEHOUSE_DAYS` | `730` | Lakehouse data retention (days) |
+| `RETENTION_AUDIT_DAYS` | `730` | Audit log retention (days) |
+| `RETENTION_CLEANUP_INTERVAL_HOURS` | `24` | Cleanup interval (hours) |
+| `RETENTION_BATCH_SIZE` | `1000` | Rows deleted per batch |

 ### Observability