feat: comprehensive docs, unit tests, docker-compose app services

- Add scheduler and ingestion unit tests (test_scheduler_unit.py, test_ingestion_unit.py)
- Add all 13 app services + dashboard to docker-compose.yml
- Add full documentation suite: API reference, Helm reference, Docker deployment guide,
  3 architecture diagrams (K8s, Docker Compose, data pipeline), AI agent guide,
  backup/restore guide, observability/metrics reference, per-service docs
- Add intelligence pipeline deep-dive docs with Mermaid diagrams
- Update README with documentation index and links
- Add specs for comprehensive-quality-docs, intelligence-pipeline-deep-dive,
  sanitized-pipeline-docs
This commit is contained in:
Celes Renata
2026-04-22 02:56:41 +00:00
parent f251c53f92
commit 88ad1e8d99
57 changed files with 13318 additions and 51 deletions
+618
View File
@@ -0,0 +1,618 @@
# AI Agent Building Guide
Stonks Oracle uses three AI agents powered by a local Ollama instance. Each agent has a dedicated purpose in the pipeline, a database-backed configuration, and support for A/B testing through variants. This guide covers how each agent works, how to configure them, how to create and test variants, and how to monitor performance.
## Table of Contents
- [Built-in Agents](#built-in-agents)
- [Document Intelligence Extractor](#1-document-intelligence-extractor)
- [Global Event Classifier](#2-global-event-classifier)
- [Thesis Rewriter](#3-thesis-rewriter)
- [Database Schema](#database-schema)
- [ai_agents Table](#ai_agents-table)
- [agent_variants Table](#agent_variants-table)
- [agent_performance_log Table](#agent_performance_log-table)
- [AgentConfigResolver](#agentconfigresolver)
- [Performance Logging and Variant Comparison](#performance-logging-and-variant-comparison)
- [API Endpoints](#api-endpoints)
- [Step-by-Step: Creating and Activating a Variant](#step-by-step-creating-and-activating-a-variant)
---
## Built-in Agents
Three agents are seeded into the `ai_agents` table on first migration (migration `026_ai_agents.sql`). They have `source = 'system'` and cannot be deleted through the API — only deactivated or edited.
### 1. Document Intelligence Extractor
| Field | Value |
|-------|-------|
| **Slug** | `document-extractor` |
| **Purpose** | Extracts structured intelligence (sentiment, catalysts, impact scores, key facts, risks) from company news, SEC filings, earnings transcripts, and press releases |
| **Default Model** | `qwen3.5:9b-fast` (Ollama) |
| **Prompt Version** | `document-intel-v2` |
| **Schema Version** | `2.0.0` |
| **Entry Point** | `services/extractor/main.py``services/extractor/client.py` |
**Input Data:**
- Normalized document text (fetched from MinIO or passed in the Redis job payload)
- Document type: `article`, `filing`, `transcript`, or `press_release`
- List of tracked tickers for company identification
- Document ID for traceability
**Output Schema** (`ExtractionResult`):
```json
{
"summary": "1-3 sentence summary",
"companies": [
{
"ticker": "AAPL",
"company_name": "Apple Inc.",
"relevance": 0.9,
"sentiment": "positive|negative|neutral|mixed",
"impact_score": 0.7,
"impact_horizon": "intraday|1d|1d_7d|1d_30d|30d_90d|90d_plus",
"catalyst_type": "earnings|product|legal|macro|supply_chain|m_and_a|rating_change|other",
"key_facts": ["fact1", "fact2"],
"risks": ["risk1"],
"evidence_spans": ["verbatim quote from document"]
}
],
"macro_themes": ["inflation", "ai_capex"],
"novelty_score": 0.6,
"confidence": 0.8,
"extraction_warnings": []
}
```
**System Prompt:**
```
You are a financial document analyst. Extract structured data as JSON.
Return ONLY a single JSON object. No markdown fences, no explanation,
no text before or after the JSON. Every field in the schema is required.
Use "other" for catalyst_type if unsure. Keep evidence_spans short
(under 20 words each). Keep key_facts to 3-5 items max.
```
**User Prompt Template** (built by `build_extraction_prompt()` in `services/extractor/prompts.py`):
- Includes document type and type-specific guidance (article, filing, transcript, press release)
- Includes tracked ticker list with rules for company identification
- Includes the full JSON schema field descriptions
- Truncates documents to 8,000 characters to limit inference time
---
### 2. Global Event Classifier
| Field | Value |
|-------|-------|
| **Slug** | `event-classifier` |
| **Purpose** | Classifies global/geopolitical news into structured macro events with impact type, severity, affected regions/sectors/commodities, and estimated duration |
| **Default Model** | `qwen3.5:9b-fast` (Ollama) |
| **Prompt Version** | `event-classification-v1` |
| **Schema Version** | `1.0.0` |
| **Entry Point** | `services/extractor/main.py``services/extractor/event_classifier.py` |
**Input Data:**
- Normalized text of a macro news article (from the `stonks:queue:macro_classification` Redis queue)
- Document ID for traceability
**Output Schema** (`GlobalEvent`):
```json
{
"event_types": ["trade_barrier", "commodity_shock"],
"severity": "low|moderate|high|critical",
"affected_regions": ["US", "CN"],
"affected_sectors": ["Energy", "Industrials"],
"affected_commodities": ["crude_oil"],
"summary": "1-3 sentence summary of event and market implications",
"key_facts": ["fact1", "fact2"],
"estimated_duration": "short_term|medium_term|long_term",
"confidence": 0.75
}
```
Valid `event_types`: `supply_disruption`, `demand_shift`, `cost_increase`, `regulatory_pressure`, `currency_impact`, `commodity_shock`, `trade_barrier`, `geopolitical_risk`
Valid `severity`: `low`, `moderate`, `high`, `critical`
**System Prompt:**
```
You classify MACRO-LEVEL global news into structured event JSON.
Return ONLY a single JSON object. No markdown, no explanation.
Every field is required. Keep key_facts to 3-5 items. Keep summary
under 3 sentences.
CRITICAL: Only classify articles about MACRO events that affect entire
markets, sectors, or economies. Examples: trade wars, interest rate
changes, commodity supply disruptions, regulatory changes, geopolitical
conflicts, natural disasters.
DO NOT classify as macro events: individual company earnings, lawsuits
against a single company, single-company management changes, individual
stock analysis, company-specific debt or bankruptcy, product launches
by one company. For these, set severity to "low", confidence below 0.3,
and leave affected_regions, affected_sectors, and affected_commodities
as empty arrays.
```
**User Prompt Template** (built by `build_event_classification_prompt()` in `services/extractor/event_classifier.py`):
- Includes anti-hallucination rules
- Lists all valid enum values for each field
- Truncates articles to 6,000 characters
---
### 3. Thesis Rewriter
| Field | Value |
|-------|-------|
| **Slug** | `thesis-rewriter` |
| **Purpose** | Rewrites deterministic trade thesis summaries into clear, professional analyst prose. Optional layer — the system falls back to the deterministic thesis if this fails |
| **Default Model** | `qwen3.5:9b-fast` (Ollama) |
| **Prompt Version** | `thesis-rewrite-v1` |
| **Schema Version** | `1.0.0` |
| **Entry Point** | `services/recommendation/main.py``services/recommendation/thesis_llm.py` |
**Input Data:**
- Deterministic thesis string (rule-based, built from trend data and eligibility rules)
- `TrendSummary` context: ticker, window, direction, strength, confidence, contradiction score, dominant catalysts, material risks
**Output Schema:**
- Plain text (not JSON). The model returns only the rewritten thesis as a string, under 150 words.
- On failure or empty response, the original deterministic thesis is returned unchanged.
**System Prompt:**
```
You are a concise financial analyst. You rewrite structured trade thesis
summaries into clear, professional prose suitable for an internal
research note.
STRICT RULES:
1. Do NOT add any information that is not present in the input.
2. Do NOT fabricate numbers, dates, company names, or analyst opinions.
3. Keep the rewrite under 150 words.
4. Preserve all factual claims, risk notes, and evidence counts from
the input.
5. Use a neutral, professional tone. Avoid hype or marketing language.
6. Return ONLY the rewritten thesis text. No JSON, no markdown, no
commentary.
```
**User Prompt Template** (built by `build_thesis_rewrite_prompt()` in `services/recommendation/thesis_llm.py`):
- Includes the deterministic thesis between delimiters
- Includes trend context: ticker, window, direction, strength, confidence, contradiction score, top catalysts, top risks
---
## Database Schema
### `ai_agents` Table
Defined in migration `026_ai_agents.sql`. Stores the base configuration for each agent.
| Column | Type | Default | Description |
|--------|------|---------|-------------|
| `id` | `UUID` | `gen_random_uuid()` | Primary key |
| `name` | `VARCHAR(100)` | — | Human-readable name (unique) |
| `slug` | `VARCHAR(100)` | — | URL-safe identifier (unique), used by `AgentConfigResolver` |
| `purpose` | `TEXT` | `''` | Description of what the agent does |
| `model_provider` | `VARCHAR(50)` | `'ollama'` | LLM provider |
| `model_name` | `VARCHAR(200)` | `'qwen3.5:9b'` | Model identifier |
| `system_prompt` | `TEXT` | `''` | System prompt sent to the model |
| `user_prompt_template` | `TEXT` | `''` | User prompt template (optional — code-defined templates take precedence) |
| `prompt_version` | `VARCHAR(100)` | `''` | Version tag for prompt tracking |
| `schema_version` | `VARCHAR(50)` | `'1.0.0'` | Version of the output schema |
| `temperature` | `FLOAT` | `0.0` | Model temperature |
| `max_tokens` | `INTEGER` | `32768` | Maximum output tokens |
| `timeout_seconds` | `INTEGER` | `120` | Request timeout |
| `max_retries` | `INTEGER` | `2` | Retry count on failure |
| `active` | `BOOLEAN` | `TRUE` | Whether the agent is enabled |
| `source` | `VARCHAR(20)` | `'system'` | `'system'` for built-in agents, `'user'` for API-created |
| `created_at` | `TIMESTAMPTZ` | `NOW()` | Creation timestamp |
| `updated_at` | `TIMESTAMPTZ` | `NOW()` | Last update timestamp |
**Indexes:**
- `idx_ai_agents_slug` on `slug`
- `idx_ai_agents_active` on `active`
**Registration:**
- **System-seeded**: The three built-in agents are inserted by migration 026 using `INSERT ... WHERE NOT EXISTS` — they are only created if no row with that slug exists. This means user edits to system agents are preserved across re-migrations.
- **API-created**: Users can create custom agents via `POST /api/agents`. These get `source = 'user'` and can be deleted.
### `agent_variants` Table
Defined in migration `027_agent_variants.sql`. Stores alternative configurations for A/B testing.
| Column | Type | Default | Description |
|--------|------|---------|-------------|
| `id` | `UUID` | `gen_random_uuid()` | Primary key |
| `agent_id` | `UUID` | — | Foreign key → `ai_agents(id)` (CASCADE delete) |
| `variant_name` | `VARCHAR(200)` | — | Human-readable variant name |
| `variant_slug` | `VARCHAR(200)` | — | URL-safe slug (unique per agent) |
| `description` | `TEXT` | `''` | What this variant changes |
| `model_provider` | `VARCHAR(50)` | `'ollama'` | LLM provider override |
| `model_name` | `VARCHAR(200)` | — | Model override |
| `system_prompt` | `TEXT` | `''` | System prompt override |
| `user_prompt_template` | `TEXT` | `''` | User prompt template override |
| `prompt_version` | `VARCHAR(100)` | `''` | Prompt version tag |
| `temperature` | `FLOAT` | `0.0` | Temperature override |
| `max_tokens` | `INTEGER` | `32768` | Max tokens override |
| `context_window` | `INTEGER` | `0` | Ollama `num_ctx` override (0 = model default) |
| `input_token_limit` | `INTEGER` | `0` | Max input tokens before truncation (0 = no limit) |
| `token_budget` | `INTEGER` | `0` | Total tokens per hour budget (0 = unlimited) |
| `timeout_seconds` | `INTEGER` | `120` | Timeout override |
| `max_retries` | `INTEGER` | `2` | Retry count override |
| `is_active` | `BOOLEAN` | `FALSE` | Whether this variant is the active override |
| `created_at` | `TIMESTAMPTZ` | `NOW()` | Creation timestamp |
| `updated_at` | `TIMESTAMPTZ` | `NOW()` | Last update timestamp |
**Indexes and Constraints:**
- `idx_agent_variants_slug` — unique index on `(agent_id, variant_slug)` — each agent's variant slugs must be unique
- `idx_agent_variants_active` — unique partial index on `(agent_id) WHERE is_active = TRUE`**at most one active variant per agent** (database-enforced)
- `idx_agent_variants_agent` — lookup by agent
### `agent_performance_log` Table
Defined in migration `026_ai_agents.sql`, extended in `027_agent_variants.sql` with `variant_id`.
| Column | Type | Default | Description |
|--------|------|---------|-------------|
| `id` | `UUID` | `gen_random_uuid()` | Primary key |
| `agent_id` | `UUID` | — | Foreign key → `ai_agents(id)` (CASCADE delete) |
| `variant_id` | `UUID` | `NULL` | Foreign key → `agent_variants(id)` (SET NULL on delete) |
| `document_id` | `UUID` | `NULL` | Foreign key → `documents(id)` (SET NULL on delete) |
| `ticker` | `VARCHAR(20)` | — | Stock ticker processed |
| `success` | `BOOLEAN` | — | Whether the invocation succeeded |
| `duration_ms` | `INTEGER` | `0` | Total invocation time in milliseconds |
| `confidence` | `FLOAT` | `0.0` | Model confidence score (0.0 for thesis rewrites) |
| `retry_count` | `INTEGER` | `0` | Number of retries before success/failure |
| `input_tokens` | `INTEGER` | `0` | Estimated input tokens (chars / 4) |
| `output_tokens` | `INTEGER` | `0` | Estimated output tokens (chars / 4) |
| `error_message` | `TEXT` | `NULL` | Error description on failure |
| `recorded_at` | `TIMESTAMPTZ` | `NOW()` | When the invocation occurred |
**Indexes:**
- `idx_agent_perf_agent` on `(agent_id, recorded_at DESC)`
- `idx_agent_perf_time` on `(recorded_at DESC)`
- `idx_agent_perf_variant` on `(variant_id, recorded_at DESC)`
---
## AgentConfigResolver
**Module:** `services/shared/agent_config.py`
The `AgentConfigResolver` is the central mechanism for resolving runtime agent configuration. All three agent services use it instead of duplicating resolution logic.
### How It Works
1. **Lookup by slug**: The resolver queries the `ai_agents` table by slug (e.g., `"document-extractor"`), joining with `agent_variants` to find any active variant.
2. **COALESCE-based override**: The SQL query uses `COALESCE(variant_column, agent_column)` for every configuration field. If an active variant exists and has a non-NULL value for a field, that value is used. Otherwise, the base agent's value is used.
```sql
SELECT a.id AS agent_id,
v.id AS variant_id,
COALESCE(v.model_provider, a.model_provider) AS model_provider,
COALESCE(v.model_name, a.model_name) AS model_name,
COALESCE(v.system_prompt, a.system_prompt) AS system_prompt,
COALESCE(v.user_prompt_template, a.user_prompt_template) AS user_prompt_template,
-- ... all other fields ...
FROM ai_agents a
LEFT JOIN agent_variants v
ON v.agent_id = a.id AND v.is_active = TRUE
WHERE a.slug = $1
AND a.active = TRUE
```
3. **TTL cache (60 seconds)**: Resolved configurations are cached in memory using `time.monotonic()`. Cache entries expire after 60 seconds (configurable via `ttl_seconds`). This means variant swaps take effect within 60 seconds without restarting any service.
4. **Fallback behavior**: If the database query fails or returns no rows (agent not found or inactive), the resolver returns `None`. Callers fall back to environment-variable-based `OllamaConfig` defaults.
### Resolved Config Dataclass
```python
@dataclass(frozen=True, slots=True)
class ResolvedAgentConfig:
agent_id: str
variant_id: str | None # None if no active variant
model_provider: str
model_name: str
system_prompt: str
user_prompt_template: str
prompt_version: str
temperature: float
max_tokens: int
context_window: int # Ollama num_ctx; 0 = model default
input_token_limit: int # Max input chars before truncation; 0 = no limit
token_budget: int # Hourly token budget; 0 = unlimited
timeout_seconds: int
max_retries: int
```
### Usage Pattern
```python
from services.shared.agent_config import AgentConfigResolver
resolver = AgentConfigResolver(pool, ttl_seconds=60)
config = await resolver.resolve("document-extractor")
if config is None:
# Fall back to env-var defaults
...
else:
# Use config.model_name, config.system_prompt, etc.
...
```
### Cache Invalidation
```python
resolver.invalidate("document-extractor") # Clear one entry
resolver.invalidate() # Clear all entries
```
### Config Refresh in Workers
The extractor and recommendation workers periodically re-resolve their agent config (every 100 jobs for the extractor, every 50 jobs for the recommendation worker). If the resolved model changes, the worker creates a new `OllamaClient` instance with the updated configuration.
---
## Performance Logging and Variant Comparison
Every agent invocation is logged to `agent_performance_log` with the `agent_id` and `variant_id` (if a variant was active). This enables comparing variant effectiveness.
### What Gets Logged
- **Document extractor**: Logged in `services/extractor/main.py` after each extraction. Records success/failure, duration, confidence, retry count, token estimates.
- **Event classifier**: Logged in `services/extractor/event_classifier.py` after each classification. Same fields.
- **Thesis rewriter**: Logged in `services/recommendation/thesis_llm.py` after each rewrite attempt. Confidence is always 0.0 (not applicable for rewrites).
### Querying for Variant Comparison
Compare two variants of the document extractor over the last 24 hours:
```sql
SELECT
v.variant_name,
COUNT(*) AS total_invocations,
COUNT(*) FILTER (WHERE p.success) AS successes,
ROUND(100.0 * COUNT(*) FILTER (WHERE p.success) / COUNT(*), 1) AS success_rate_pct,
ROUND(AVG(p.duration_ms)::numeric) AS avg_duration_ms,
ROUND(PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY p.duration_ms)::numeric) AS p95_duration_ms,
ROUND(AVG(p.confidence)::numeric, 4) AS avg_confidence,
ROUND(AVG(p.retry_count)::numeric, 2) AS avg_retries,
SUM(p.input_tokens + p.output_tokens) AS total_tokens
FROM agent_performance_log p
JOIN agent_variants v ON v.id = p.variant_id
WHERE p.agent_id = '<agent-uuid>'
AND p.recorded_at >= NOW() - INTERVAL '24 hours'
GROUP BY v.variant_name
ORDER BY success_rate_pct DESC;
```
Compare base agent (no variant) vs active variant:
```sql
SELECT
CASE WHEN p.variant_id IS NULL THEN 'base' ELSE v.variant_name END AS config,
COUNT(*) AS invocations,
ROUND(100.0 * COUNT(*) FILTER (WHERE p.success) / COUNT(*), 1) AS success_rate_pct,
ROUND(AVG(p.duration_ms)::numeric) AS avg_duration_ms,
ROUND(AVG(p.confidence)::numeric, 4) AS avg_confidence
FROM agent_performance_log p
LEFT JOIN agent_variants v ON v.id = p.variant_id
WHERE p.agent_id = '<agent-uuid>'
AND p.recorded_at >= NOW() - INTERVAL '48 hours'
GROUP BY config
ORDER BY config;
```
### Token Budget Enforcement
Variants can set a `token_budget` (total tokens per hour). Before each invocation, the worker checks:
```sql
SELECT COALESCE(SUM(input_tokens + output_tokens), 0) AS total_tokens
FROM agent_performance_log
WHERE variant_id = $1
AND recorded_at >= NOW() - INTERVAL '1 hour'
```
If the budget is exceeded, the invocation is skipped (extractor) or falls back to the deterministic thesis (thesis rewriter).
---
## API Endpoints
All agent endpoints are served by the Query API (`services/api/app.py`) under the `/api/agents` prefix.
### Agent CRUD
| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/api/agents` | List all agents. Query param: `active_only` (bool, default `false`) |
| `GET` | `/api/agents/{agent_id}` | Get a single agent by UUID |
| `POST` | `/api/agents` | Create a new user-defined agent (returns 201) |
| `PUT` | `/api/agents/{agent_id}` | Partial update an agent (system or user) |
| `DELETE` | `/api/agents/{agent_id}` | Delete a user-created agent. Returns 403 for system agents |
**Create Agent Request Body:**
```json
{
"name": "My Custom Agent",
"slug": "my-custom-agent",
"purpose": "Custom extraction for earnings calls",
"model_provider": "ollama",
"model_name": "llama3.1:8b",
"system_prompt": "You are a financial analyst...",
"user_prompt_template": "",
"prompt_version": "v1",
"schema_version": "1.0.0",
"temperature": 0.0,
"max_tokens": 32768,
"timeout_seconds": 120,
"max_retries": 2
}
```
**Update Agent Request Body** (all fields optional):
```json
{
"model_name": "qwen3.5:14b",
"system_prompt": "Updated prompt...",
"temperature": 0.1,
"active": false
}
```
### Agent Performance
| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/api/agents/{agent_id}/performance` | Aggregated metrics. Query param: `hours` (int, default 24, max 720) |
| `GET` | `/api/agents/{agent_id}/performance/history` | Hourly time-series. Query param: `hours` (int, default 24, max 720) |
**Performance Response:**
```json
{
"total_invocations": 1250,
"successes": 1180,
"failures": 70,
"avg_duration_ms": 3400,
"p95_duration_ms": 8200,
"avg_confidence": 0.7234,
"avg_retries": 0.15,
"total_input_tokens": 5000000,
"total_output_tokens": 1200000,
"success_rate": 0.944
}
```
### Variant CRUD
| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/api/agents/{agent_id}/variants` | List all variants for an agent |
| `GET` | `/api/agents/{agent_id}/variants/{variant_id}` | Get a single variant |
| `POST` | `/api/agents/{agent_id}/variants` | Create a new variant (returns 201, 409 on duplicate slug) |
| `PUT` | `/api/agents/{agent_id}/variants/{variant_id}` | Partial update a variant |
| `DELETE` | `/api/agents/{agent_id}/variants/{variant_id}` | Delete a variant (returns 400 if active) |
### Clone Endpoints
| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/api/agents/{agent_id}/clone` | Clone an agent's base config as a new variant |
| `POST` | `/api/agents/{agent_id}/variants/{variant_id}/clone` | Clone an existing variant as a new variant |
Clone requests copy all configuration fields from the source, with optional overrides in the request body.
### Activate / Deactivate
| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/api/agents/{agent_id}/variants/{variant_id}/activate` | Set a variant as active (deactivates any other active variant in a single transaction) |
| `POST` | `/api/agents/{agent_id}/variants/deactivate` | Deactivate the currently active variant (agent falls back to base config) |
### Per-Variant Performance
| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/api/agents/{agent_id}/variants/{variant_id}/performance` | Aggregated metrics for a specific variant |
| `GET` | `/api/agents/{agent_id}/variants/{variant_id}/performance/history` | Hourly time-series for a specific variant |
---
## Step-by-Step: Creating and Activating a Variant
This walkthrough creates a new variant of the document extractor that uses a different model and activates it for live traffic.
### 1. Find the Agent ID
```bash
curl -s https://stonks-api.celestium.life/api/agents?active_only=true | jq '.[] | select(.slug == "document-extractor") | .id'
```
Note the UUID — we'll call it `AGENT_ID`.
### 2. Clone the Agent as a Variant
```bash
curl -s -X POST https://stonks-api.celestium.life/api/agents/$AGENT_ID/clone \
-H "Content-Type: application/json" \
-d '{
"variant_name": "Llama 3.1 8B Test",
"description": "Testing llama3.1:8b as an alternative to qwen3.5:9b-fast",
"model_name": "llama3.1:8b",
"temperature": 0.1
}' | jq .
```
This creates a new variant with all fields copied from the base agent, except `model_name` and `temperature` which are overridden. The variant starts as `is_active: false`.
Note the variant's `id` — we'll call it `VARIANT_ID`.
### 3. Activate the Variant
```bash
curl -s -X POST \
https://stonks-api.celestium.life/api/agents/$AGENT_ID/variants/$VARIANT_ID/activate | jq .
```
This atomically deactivates any previously active variant and activates the new one. Within 60 seconds (the TTL cache window), the extractor worker will pick up the new configuration and start using `llama3.1:8b`.
### 4. Monitor Performance
Wait for some documents to be processed, then compare:
```bash
# Base agent performance (all invocations)
curl -s "https://stonks-api.celestium.life/api/agents/$AGENT_ID/performance?hours=4" | jq .
# Variant-specific performance
curl -s "https://stonks-api.celestium.life/api/agents/$AGENT_ID/variants/$VARIANT_ID/performance?hours=4" | jq .
```
Check the hourly trend:
```bash
curl -s "https://stonks-api.celestium.life/api/agents/$AGENT_ID/variants/$VARIANT_ID/performance/history?hours=12" | jq .
```
### 5. Roll Back (Deactivate)
If the variant underperforms, deactivate it to revert to the base agent config:
```bash
curl -s -X POST \
https://stonks-api.celestium.life/api/agents/$AGENT_ID/variants/deactivate | jq .
```
The extractor will revert to the base `qwen3.5:9b-fast` configuration within 60 seconds.
### 6. Iterate
You can update the variant's prompt or parameters without creating a new one:
```bash
curl -s -X PUT \
https://stonks-api.celestium.life/api/agents/$AGENT_ID/variants/$VARIANT_ID \
-H "Content-Type: application/json" \
-d '{
"system_prompt": "You are a financial document analyst. Extract structured data as JSON. Be extra conservative with impact scores — only assign > 0.7 for material events with concrete numbers.",
"prompt_version": "document-intel-v2-conservative"
}' | jq .
```
Then re-activate and compare again.
File diff suppressed because it is too large Load Diff
+274
View File
@@ -0,0 +1,274 @@
# Data Pipeline Architecture — Stonks Oracle
This document describes the end-to-end data pipeline from external data sources through signal processing to trade execution. The pipeline is queue-driven, with Redis lists connecting each stage and PostgreSQL/MinIO providing durable storage at every step.
All queue names follow the convention `stonks:queue:<name>` (see `services/shared/redis_keys.py`). Dead-letter queues mirror the pattern as `stonks:dlq:<name>`.
## Pipeline Overview
```mermaid
flowchart TB
%% ── External Data Sources ─────────────────────────────────────
subgraph sources ["External Data Sources"]
direction LR
polygon["Polygon.io<br/><i>News, Market Bars,<br/>Grouped Daily</i>"]
sec["SEC EDGAR<br/><i>10-K, 10-Q Filings</i>"]
macro_src["Macro News APIs<br/><i>Geopolitical &amp;<br/>Economic Events</i>"]
market_src["Market Data API<br/><i>Intraday Bars,<br/>Grouped Daily</i>"]
end
%% ── Scheduler ─────────────────────────────────────────────────
scheduler["<b>Scheduler</b><br/><i>services.scheduler.app</i><br/>Cadence polling, rate limiting,<br/>backoff &amp; stale recovery"]
sources -.->|"API polling<br/>on cadence"| scheduler
%% ── Ingestion Queue ───────────────────────────────────────────
q_ingestion[["stonks:queue:ingestion"]]
scheduler -->|"rpush job"| q_ingestion
%% ── Ingestion Worker ──────────────────────────────────────────
ingestion["<b>Ingestion</b><br/><i>services.ingestion.worker</i><br/>Adapter dispatch, dedupe,<br/>raw artifact upload"]
q_ingestion -->|"lpop"| ingestion
%% ── Raw Storage ───────────────────────────────────────────────
minio_raw[("MinIO<br/><i>Raw Artifacts</i><br/>JSON / HTML")]
pg_docs[("PostgreSQL<br/><i>documents,<br/>ingestion_runs</i>")]
redis_dedupe[("Redis<br/><i>Dedupe Markers</i><br/>stonks:dedupe:*")]
ingestion -->|"upload raw payload"| minio_raw
ingestion -->|"persist metadata"| pg_docs
ingestion -->|"set content hash"| redis_dedupe
%% ── Parsing Queue ─────────────────────────────────────────────
q_parsing[["stonks:queue:parsing"]]
ingestion -->|"rpush<br/>(news, filings,<br/>web_scrape)"| q_parsing
%% ── Parser Worker ─────────────────────────────────────────────
parser["<b>Parser</b><br/><i>services.parser.worker</i><br/>HTML parsing, quality scoring,<br/>company mention detection"]
q_parsing -->|"lpop"| parser
minio_norm[("MinIO<br/><i>Normalized Text</i><br/><i>Parser Output JSON</i>")]
parser -->|"upload normalized text"| minio_norm
parser -->|"update document status,<br/>insert mentions"| pg_docs
```
## Three Signal Layers
The parser routes documents into two extraction paths based on `document_type`. All three signal layers converge at the aggregation stage through the shared `WeightedSignal` abstraction.
```mermaid
flowchart TB
%% ── Parser Output ─────────────────────────────────────────────
parser(("Parser"))
%% ── Extraction Queues ─────────────────────────────────────────
q_extraction[["stonks:queue:extraction"]]
q_macro[["stonks:queue:macro_classification"]]
parser -->|"rpush<br/>(standard docs)"| q_extraction
parser -->|"rpush<br/>(macro_event docs)"| q_macro
%% ── Extractor Worker ──────────────────────────────────────────
subgraph extractor_svc ["Extractor Service"]
direction TB
ext_main["<b>Extractor</b><br/><i>services.extractor.main</i><br/>Alternates between queues<br/>(2 extraction : 1 macro)"]
end
q_extraction -->|"lpop"| ext_main
q_macro -->|"lpop"| ext_main
%% ── Ollama LLM ───────────────────────────────────────────────
ollama["<b>Ollama</b><br/><i>LLM Inference</i><br/>document-extractor agent<br/>event-classifier agent"]
ext_main <-->|"HTTP /api/generate"| ollama
%% ── Signal Layer 1: Company ───────────────────────────────────
subgraph layer1 ["Layer 1 — Company Signals"]
direction LR
di["document_intelligence<br/>document_impact_records"]
end
ext_main -->|"persist extraction<br/>(standard docs)"| di
%% ── Signal Layer 2: Macro ─────────────────────────────────────
subgraph layer2 ["Layer 2 — Macro Signals"]
direction LR
ge["global_events"]
mir["macro_impact_records<br/><i>per-company interpolation</i>"]
ge --> mir
end
ext_main -->|"classify &amp; persist<br/>(macro_event docs)"| ge
ext_main -->|"compute_macro_impact<br/>for all tracked companies"| mir
%% ── Aggregation Queue ─────────────────────────────────────────
q_agg[["stonks:queue:aggregation"]]
ext_main -->|"rpush<br/>(per ticker)"| q_agg
%% ── Aggregation Worker ────────────────────────────────────────
aggregation["<b>Aggregation</b><br/><i>services.aggregation.main</i><br/>Trend windows, scoring,<br/>contradiction detection"]
q_agg -->|"lpop"| aggregation
%% ── Signal Layer 3: Competitive ──────────────────────────────
subgraph layer3 ["Layer 3 — Competitive Signals"]
direction LR
pm["pattern_matcher<br/><i>historical patterns</i>"]
sp["signal_propagation<br/><i>cross-company signals</i>"]
csr["competitive_signal_records"]
pm --> sp --> csr
end
aggregation -->|"trigger_signal_propagation<br/>(when competitive_enabled)"| layer3
%% ── All layers merge ──────────────────────────────────────────
pg_trends[("PostgreSQL<br/><i>trend_windows,<br/>trend_history,<br/>trend_projections</i>")]
di -->|"WeightedSignal"| aggregation
mir -->|"WeightedSignal"| aggregation
csr -->|"WeightedSignal"| aggregation
aggregation -->|"persist trend summaries"| pg_trends
```
## Recommendation → Trading → Broker
```mermaid
flowchart TB
%% ── Recommendation Queue ──────────────────────────────────────
q_rec[["stonks:queue:recommendation"]]
aggregation(("Aggregation")) -->|"rpush<br/>(ticker + window,<br/>dedup 5 min TTL)"| q_rec
%% ── Recommendation Worker ─────────────────────────────────────
recommendation["<b>Recommendation</b><br/><i>services.recommendation.main</i><br/>Eligibility, suppression,<br/>thesis generation"]
q_rec -->|"lpop"| recommendation
ollama_thesis["<b>Ollama</b><br/><i>thesis-rewriter agent</i><br/>(optional LLM rewrite)"]
recommendation <-->|"rewrite thesis<br/>(trading-eligible only)"| ollama_thesis
pg_recs[("PostgreSQL<br/><i>recommendations,<br/>recommendation_evidence,<br/>risk_evaluations</i>")]
recommendation -->|"persist recommendation<br/>+ evidence + risk eval"| pg_recs
%% ── Trading Engine ────────────────────────────────────────────
subgraph trading_loop ["Trading Engine Decision Loop"]
direction TB
poll["Poll recommendations<br/><i>action IN (buy, sell)<br/>mode IN (paper, live)<br/>generated_at &gt; last_poll</i>"]
dedup_check["Redis dedup check<br/><i>stonks:dedupe:trading:*</i>"]
evaluate["evaluate_recommendation<br/><i>Circuit breaker check<br/>Trading window check<br/>Confidence gate<br/>Sector exposure check<br/>Correlation check<br/>Earnings blackout</i>"]
size["Position sizing<br/><i>Kelly criterion,<br/>risk tier limits</i>"]
decide{{"Decision"}}
poll --> dedup_check --> evaluate --> size --> decide
end
pg_recs -->|"SELECT recent<br/>recommendations"| poll
%% ── Broker Queue ──────────────────────────────────────────────
q_broker[["stonks:queue:broker_orders"]]
decide -->|"act → rpush<br/>order job"| q_broker
decide -->|"skip → persist<br/>decision only"| pg_decisions
pg_decisions[("PostgreSQL<br/><i>trading_decisions</i>")]
%% ── Broker Adapter ────────────────────────────────────────────
broker["<b>Broker Adapter</b><br/><i>services.adapters.broker_service</i><br/>Risk evaluation, idempotency,<br/>order submission, fill tracking"]
q_broker -->|"lpop"| broker
%% ── Risk Engine ───────────────────────────────────────────────
risk["<b>Risk Engine</b><br/><i>services.risk.app</i><br/>POST /evaluate<br/>Approval workflow"]
broker <-->|"evaluate order"| risk
%% ── Alpaca ────────────────────────────────────────────────────
alpaca["<b>Alpaca</b><br/><i>Paper Trading API</i><br/>Order submission,<br/>position sync"]
broker <-->|"submit order /<br/>sync positions"| alpaca
pg_orders[("PostgreSQL<br/><i>orders, order_events,<br/>positions,<br/>portfolio_snapshots</i>")]
broker -->|"persist order,<br/>events, positions"| pg_orders
%% ── Notifications ─────────────────────────────────────────────
subgraph notifications ["Notifications"]
direction LR
sns["AWS SNS<br/><i>SMS alerts</i>"]
gmail["Gmail SMTP<br/><i>Email alerts</i>"]
end
trading_loop -->|"circuit breaker trips,<br/>order fills,<br/>stop-loss triggers"| notifications
```
## Analytical Branch — Lake Publisher
The lake publisher runs as a separate worker, consuming from its own queue and writing partitioned Parquet fact tables to MinIO for analytical queries.
```mermaid
flowchart LR
%% ── Lake Publish Queue ────────────────────────────────────────
q_lake[["stonks:queue:lake_publish"]]
various(("Various Services<br/><i>ingestion, extractor,<br/>recommendation,<br/>broker adapter</i>"))
various -->|"enqueue_lake_job"| q_lake
%% ── Lake Publisher Worker ─────────────────────────────────────
lake["<b>Lake Publisher</b><br/><i>services.lake_publisher.jobs</i><br/>Transforms operational data<br/>into analytical facts"]
q_lake -->|"lpop"| lake
pg_source[("PostgreSQL<br/><i>Operational Tables</i><br/>documents, extractions,<br/>orders, positions, events")]
lake -->|"query source data"| pg_source
%% ── MinIO Parquet ─────────────────────────────────────────────
minio_lake[("MinIO<br/><i>Lakehouse Bucket</i><br/>Partitioned Parquet<br/>/year=/month=/day=")]
lake -->|"write Parquet files"| minio_lake
%% ── Trino ─────────────────────────────────────────────────────
trino["<b>Trino</b><br/><i>SQL Query Engine</i><br/>Hive connector → MinIO"]
minio_lake -->|"read via<br/>Hive Metastore"| trino
hive["<b>Hive Metastore</b><br/><i>Schema catalog</i>"]
trino <-->|"table metadata"| hive
hive -->|"location refs"| minio_lake
%% ── Visualization ─────────────────────────────────────────────
superset["<b>Superset</b><br/><i>Dashboards &amp;<br/>SQL Lab</i>"]
dashboard["<b>React Dashboard</b><br/><i>frontend</i><br/>Charts, portfolio,<br/>recommendations"]
query_api["<b>Query API</b><br/><i>services.api.app</i>"]
trino --> superset
trino --> query_api
query_api --> dashboard
```
## Complete Queue Topology
| Queue | Full Key | Producer(s) | Consumer |
|-------|----------|-------------|----------|
| Ingestion | `stonks:queue:ingestion` | Scheduler | Ingestion Worker |
| Parsing | `stonks:queue:parsing` | Ingestion Worker | Parser Worker |
| Extraction | `stonks:queue:extraction` | Parser (standard docs) | Extractor Worker |
| Macro Classification | `stonks:queue:macro_classification` | Parser (macro_event docs), Scheduler | Extractor Worker |
| Aggregation | `stonks:queue:aggregation` | Extractor Worker | Aggregation Worker |
| Recommendation | `stonks:queue:recommendation` | Aggregation Worker | Recommendation Worker |
| Broker Orders | `stonks:queue:broker_orders` | Trading Engine, Trading API (manual overrides) | Broker Adapter |
| Lake Publish | `stonks:queue:lake_publish` | Various services | Lake Publisher |
Dead-letter queues follow the pattern `stonks:dlq:<queue_name>` and are populated when a job exhausts its retry budget.
## Data Store Summary
| Store | Role | Key Tables / Buckets |
|-------|------|---------------------|
| **PostgreSQL** | Structured operational data | `documents`, `document_intelligence`, `document_impact_records`, `global_events`, `macro_impact_records`, `competitive_signal_records`, `trend_windows`, `trend_history`, `trend_projections`, `recommendations`, `recommendation_evidence`, `risk_evaluations`, `orders`, `order_events`, `positions`, `portfolio_snapshots`, `trading_decisions` |
| **Redis** | Queues, dedup markers, rate limits, circuit breaker state | `stonks:queue:*`, `stonks:dedupe:*`, `stonks:ratelimit:*`, `stonks:trading:circuit_breaker:*`, `stonks:dlq:*` |
| **MinIO** | Object storage for raw artifacts, normalized text, and analytical Parquet files | Raw artifacts bucket, normalized text bucket, lakehouse bucket (partitioned Parquet) |
## External Integration Points
| Integration | Service | Protocol | Purpose |
|-------------|---------|----------|---------|
| **Polygon.io** | Ingestion (via adapters) | HTTPS REST | News articles, market bars, grouped daily data |
| **SEC EDGAR** | Ingestion (via FilingsDataAdapter) | HTTPS REST | 10-K, 10-Q filings |
| **Ollama** | Extractor, Recommendation | HTTP `/api/generate` | LLM inference for document extraction, event classification, thesis rewriting |
| **Alpaca** | Broker Adapter | HTTPS REST | Paper trading order submission, position sync, account state |
| **AWS SNS** | Trading Engine (notifications) | boto3 SDK | SMS alerts for circuit breaker trips, order fills, stop-loss triggers |
| **Gmail** | Trading Engine (notifications) | SMTP (port 587 STARTTLS) | Email alerts for trading events |
| **Trino** | Query API, Superset | JDBC / HTTP | SQL queries over lakehouse Parquet files |
+322
View File
@@ -0,0 +1,322 @@
# Docker Compose Architecture — Stonks Oracle
This document describes the Docker Compose deployment topology for Stonks Oracle, derived from the `docker-compose.yml` file at the repository root.
All containers run on a single Docker network created by Compose. Infrastructure services (PostgreSQL, Redis, MinIO, Ollama, Trino, Hive Metastore, Superset) start first, and application services wait for their dependencies via `depends_on` with health check conditions.
## Container Topology Diagram
```mermaid
graph TB
%% ── Host machine ──────────────────────────────────────────────
host((Host Machine))
%% ── .env file ─────────────────────────────────────────────────
envfile[".env file<br/><i>MARKET_DATA_API_KEY</i><br/><i>BROKER_API_KEY</i><br/><i>BROKER_API_SECRET</i><br/><i>BROKER_BASE_URL</i>"]
%% ── Docker Compose default network ────────────────────────────
subgraph network ["Docker Compose Network (default)"]
direction TB
%% ── Infrastructure Containers ─────────────────────────────
subgraph infra ["Infrastructure Containers"]
direction LR
postgres[("postgres<br/><i>postgres:16-alpine</i><br/>host :5432 → :5432")]
redis[("redis<br/><i>redis:7-alpine</i><br/>host :6379 → :6379")]
minio[("minio<br/><i>minio/minio:latest</i><br/>host :9000 → :9000<br/>host :9001 → :9001")]
ollama[("ollama<br/><i>ollama/ollama:latest</i><br/>host :11434 → :11434")]
end
subgraph infra_init ["Infrastructure Init"]
minio_init["minio-init<br/><i>minio/mc:latest</i><br/><i>Creates buckets on startup</i>"]
end
subgraph analytics ["Analytics Containers"]
direction LR
hive_metastore["hive-metastore<br/><i>apache/hive:4.0.0</i><br/>host :9083 → :9083"]
trino["trino<br/><i>trinodb/trino:latest</i><br/>host :8080 → :8080"]
superset["superset<br/><i>apache/superset:latest</i><br/>host :8088 → :8088"]
end
%% ── Application Containers ────────────────────────────────
subgraph api_tier ["API Tier"]
direction LR
query_api["query-api<br/><i>docker/Dockerfile</i><br/><i>uvicorn services.api.app</i><br/>host :8004 → :8000"]
symbol_registry["symbol-registry<br/><i>docker/Dockerfile</i><br/><i>uvicorn services.symbol_registry.app</i><br/>host :8001 → :8000"]
end
subgraph frontend_tier ["Frontend Tier"]
dashboard["dashboard<br/><i>frontend/Dockerfile</i><br/><i>nginx on :8080</i><br/>host :3000 → :8080"]
end
subgraph trading_tier ["Trading Tier"]
direction LR
trading_engine["trading-engine<br/><i>docker/Dockerfile</i><br/><i>uvicorn services.trading.app</i><br/>host :8002 → :8000"]
risk_engine["risk-engine<br/><i>docker/Dockerfile</i><br/><i>uvicorn services.risk.app</i><br/>host :8003 → :8000"]
broker_adapter["broker-adapter<br/><i>docker/Dockerfile</i><br/><i>python -m services.adapters.broker_service</i><br/><i>no host port</i>"]
end
subgraph orchestration_tier ["Orchestration Tier"]
scheduler["scheduler<br/><i>docker/Dockerfile.scheduler</i><br/><i>no host port</i>"]
end
subgraph processing_tier ["Processing Tier (pipeline workers)"]
direction LR
ingestion["ingestion<br/><i>docker/Dockerfile</i><br/><i>python -m services.ingestion.worker</i><br/><i>no host port</i>"]
parser["parser<br/><i>docker/Dockerfile</i><br/><i>python -m services.parser.worker</i><br/><i>no host port</i>"]
extractor["extractor<br/><i>docker/Dockerfile</i><br/><i>python -m services.extractor.main</i><br/><i>no host port</i>"]
aggregation["aggregation<br/><i>docker/Dockerfile</i><br/><i>python -m services.aggregation.main</i><br/><i>no host port</i>"]
recommendation["recommendation<br/><i>docker/Dockerfile</i><br/><i>python -m services.recommendation.main</i><br/><i>no host port</i>"]
end
subgraph analytics_worker ["Analytics Worker"]
lake_publisher["lake-publisher<br/><i>docker/Dockerfile</i><br/><i>python -m services.lake_publisher.jobs</i><br/><i>no host port</i>"]
end
end
%% ── Host port access ──────────────────────────────────────────
host -->|":5432"| postgres
host -->|":6379"| redis
host -->|":9000 / :9001"| minio
host -->|":11434"| ollama
host -->|":8080"| trino
host -->|":9083"| hive_metastore
host -->|":8088"| superset
host -->|":8001"| symbol_registry
host -->|":8004"| query_api
host -->|":8002"| trading_engine
host -->|":8003"| risk_engine
host -->|":3000"| dashboard
%% ── .env injection ────────────────────────────────────────────
envfile -.->|"env_file: .env"| ingestion
envfile -.->|"env_file: .env"| broker_adapter
envfile -.->|"env_file: .env"| trading_engine
%% ── Styles ────────────────────────────────────────────────────
classDef infraSvc fill:#95a5a6,stroke:#717d7e,color:#fff
classDef analyticsSvc fill:#e74c3c,stroke:#a93226,color:#fff
classDef apiSvc fill:#4a90d9,stroke:#2c5f8a,color:#fff
classDef frontendSvc fill:#50c878,stroke:#2e7d46,color:#fff
classDef tradingSvc fill:#e8a838,stroke:#b07d1a,color:#fff
classDef orchSvc fill:#1abc9c,stroke:#148f77,color:#fff
classDef processSvc fill:#9b59b6,stroke:#6c3483,color:#fff
classDef initSvc fill:#bdc3c7,stroke:#7f8c8d,color:#333
classDef envSvc fill:#f5f5dc,stroke:#999,color:#333
class postgres,redis,minio,ollama infraSvc
class hive_metastore,trino,superset,lake_publisher analyticsSvc
class query_api,symbol_registry apiSvc
class dashboard frontendSvc
class trading_engine,risk_engine,broker_adapter tradingSvc
class scheduler orchSvc
class ingestion,parser,extractor,aggregation,recommendation processSvc
class minio_init initSvc
class envfile envSvc
```
## Dependency Graph
The following diagram shows `depends_on` relationships and health check conditions. Solid arrows indicate `condition: service_healthy` (the dependent waits for the health check to pass). Dashed arrows indicate `condition: service_started` (the dependent waits only for the container to start).
```mermaid
graph LR
%% ── Infrastructure health checks ──────────────────────────────
postgres[("postgres<br/><i>pg_isready -U stonks</i>")]
redis[("redis<br/><i>redis-cli ping</i>")]
minio[("minio<br/><i>mc ready local</i>")]
ollama[("ollama<br/><i>no health check</i>")]
%% ── Analytics dependencies ────────────────────────────────────
hive["hive-metastore"] -->|started| minio
trino["trino"] -->|started| minio
trino -->|started| hive
superset["superset"] -->|started| trino
minio_init["minio-init"] -->|healthy| minio
%% ── Application depends_on (healthy) ──────────────────────────
scheduler["scheduler"] -->|healthy| postgres
scheduler -->|healthy| redis
symbol_registry["symbol-registry"] -->|healthy| postgres
ingestion["ingestion"] -->|healthy| postgres
ingestion -->|healthy| redis
ingestion -->|healthy| minio
parser["parser"] -->|healthy| postgres
parser -->|healthy| redis
extractor["extractor"] -->|healthy| postgres
extractor -->|healthy| redis
extractor -.->|started| ollama
aggregation["aggregation"] -->|healthy| postgres
aggregation -->|healthy| redis
recommendation["recommendation"] -->|healthy| postgres
recommendation -->|healthy| redis
trading_engine["trading-engine"] -->|healthy| postgres
trading_engine -->|healthy| redis
risk_engine["risk-engine"] -->|healthy| postgres
broker_adapter["broker-adapter"] -->|healthy| postgres
broker_adapter -->|healthy| redis
lake_publisher["lake-publisher"] -->|healthy| postgres
lake_publisher -->|healthy| minio
query_api["query-api"] -->|healthy| postgres
query_api -->|healthy| redis
query_api -->|healthy| minio
dashboard["dashboard"] -->|healthy| query_api
%% ── Styles ────────────────────────────────────────────────────
classDef infraSvc fill:#95a5a6,stroke:#717d7e,color:#fff
classDef appSvc fill:#4a90d9,stroke:#2c5f8a,color:#fff
classDef analyticsSvc fill:#e74c3c,stroke:#a93226,color:#fff
classDef initSvc fill:#bdc3c7,stroke:#7f8c8d,color:#333
class postgres,redis,minio,ollama infraSvc
class scheduler,symbol_registry,ingestion,parser,extractor,aggregation,recommendation,trading_engine,risk_engine,broker_adapter,lake_publisher,query_api,dashboard appSvc
class hive,trino,superset analyticsSvc
class minio_init initSvc
```
## Named Volumes
Docker Compose defines five named volumes for persistent data:
```mermaid
graph LR
pgdata["📦 pgdata"]
miniodata["📦 miniodata"]
ollama_models["📦 ollama_models"]
hive_data["📦 hive_data"]
superset_data["📦 superset_data"]
pgdata -->|"/var/lib/postgresql/data"| postgres[("postgres")]
miniodata -->|"/data"| minio[("minio")]
ollama_models -->|"/root/.ollama"| ollama[("ollama")]
hive_data -->|"/opt/hive/data"| hive["hive-metastore"]
superset_data -->|"/app/superset_home"| superset["superset"]
classDef volStyle fill:#f5f5dc,stroke:#999,color:#333
classDef svcStyle fill:#95a5a6,stroke:#717d7e,color:#fff
class pgdata,miniodata,ollama_models,hive_data,superset_data volStyle
class postgres,minio,ollama,hive,superset svcStyle
```
| Volume | Mount Point | Container | Purpose |
|--------|-------------|-----------|---------|
| `pgdata` | `/var/lib/postgresql/data` | postgres | PostgreSQL database files |
| `miniodata` | `/data` | minio | MinIO object storage data |
| `ollama_models` | `/root/.ollama` | ollama | Downloaded LLM model weights |
| `hive_data` | `/opt/hive/data` | hive-metastore | Hive Metastore embedded Derby DB |
| `superset_data` | `/app/superset_home` | superset | Superset configuration and metadata |
### Bind Mounts
In addition to named volumes, several containers use bind mounts for configuration files:
| Host Path | Mount Point | Container | Mode |
|-----------|-------------|-----------|------|
| `./infra/migrations/` | `/docker-entrypoint-initdb.d` | postgres | rw (init scripts) |
| `./infra/trino/catalog/` | `/etc/trino/catalog` | trino | rw |
| `./infra/hive/core-site.xml` | `/opt/hive/conf/core-site.xml` | hive-metastore | ro |
| `./infra/hive/metastore-site.xml` | `/opt/hive/conf/metastore-site.xml` | hive-metastore | ro |
## Host Port Mappings
Services accessible from the host machine:
| Host Port | Container | Container Port | Service |
|-----------|-----------|----------------|---------|
| 5432 | postgres | 5432 | PostgreSQL database |
| 6379 | redis | 6379 | Redis cache and queues |
| 9000 | minio | 9000 | MinIO S3 API |
| 9001 | minio | 9001 | MinIO web console |
| 11434 | ollama | 11434 | Ollama LLM API |
| 8080 | trino | 8080 | Trino query engine |
| 9083 | hive-metastore | 9083 | Hive Metastore thrift |
| 8088 | superset | 8088 | Superset dashboard |
| 8001 | symbol-registry | 8000 | Symbol Registry API |
| 8002 | trading-engine | 8000 | Trading Engine API |
| 8003 | risk-engine | 8000 | Risk Engine API |
| 8004 | query-api | 8000 | Query API |
| 3000 | dashboard | 8080 | React dashboard (nginx) |
Services without host port mappings (internal only): scheduler, ingestion, parser, extractor, aggregation, recommendation, broker-adapter, lake-publisher, minio-init.
## Environment Configuration
### Shared Environment (`x-app-env` YAML anchor)
All 13 application services and the scheduler receive these environment variables via the `x-app-env` anchor:
| Variable | Value | Purpose |
|----------|-------|---------|
| `POSTGRES_HOST` | `postgres` | Docker Compose service name for PostgreSQL |
| `POSTGRES_PORT` | `5432` | PostgreSQL port |
| `POSTGRES_DB` | `stonks` | Database name |
| `POSTGRES_USER` | `stonks` | Database user |
| `POSTGRES_PASSWORD` | `stonks_dev` | Database password (dev default) |
| `REDIS_HOST` | `redis` | Docker Compose service name for Redis |
| `REDIS_PORT` | `6379` | Redis port |
| `MINIO_ENDPOINT` | `minio:9000` | Docker Compose service name for MinIO |
| `MINIO_ACCESS_KEY` | `minioadmin` | MinIO access key |
| `MINIO_SECRET_KEY` | `minioadmin` | MinIO secret key |
| `OLLAMA_BASE_URL` | `http://ollama:11434` | Docker Compose service name for Ollama |
### `.env` File (API Keys)
Three services load additional secrets from the `.env` file in the repository root via `env_file: .env`:
| Variable | Required By | Purpose |
|----------|-------------|---------|
| `MARKET_DATA_API_KEY` | ingestion | Polygon.io market data API key |
| `BROKER_API_KEY` | broker-adapter, trading-engine | Alpaca broker API key |
| `BROKER_API_SECRET` | broker-adapter, trading-engine | Alpaca broker API secret |
| `BROKER_BASE_URL` | broker-adapter, trading-engine | Alpaca API base URL (default: `https://paper-api.alpaca.markets`) |
## Health Check Summary
| Container | Health Check Command | Interval | Timeout | Retries | Start Period |
|-----------|---------------------|----------|---------|---------|--------------|
| postgres | `pg_isready -U stonks` | 5s | — | 5 | — |
| redis | `redis-cli ping` | 5s | — | 5 | — |
| minio | `mc ready local` | 5s | — | 5 | — |
| symbol-registry | `curl -f http://localhost:8000/health` | 10s | 5s | 3 | 15s |
| query-api | `curl -f http://localhost:8000/health` | 10s | 5s | 3 | 15s |
| trading-engine | `curl -f http://localhost:8000/health` | 10s | 5s | 3 | 15s |
| risk-engine | `curl -f http://localhost:8000/health` | 10s | 5s | 3 | 15s |
| dashboard | `curl -f http://localhost:8080/` | 10s | 5s | 3 | 10s |
| scheduler | `pgrep -f 'python -m services.scheduler.app'` | 10s | 5s | 3 | 15s |
| ingestion | `pgrep -f 'python -m services.ingestion.worker'` | 10s | 5s | 3 | 15s |
| parser | `pgrep -f 'python -m services.parser.worker'` | 10s | 5s | 3 | 15s |
| extractor | `pgrep -f 'python -m services.extractor.main'` | 10s | 5s | 3 | 15s |
| aggregation | `pgrep -f 'python -m services.aggregation.main'` | 10s | 5s | 3 | 15s |
| recommendation | `pgrep -f 'python -m services.recommendation.main'` | 10s | 5s | 3 | 15s |
| broker-adapter | `pgrep -f 'python -m services.adapters.broker_service'` | 10s | 5s | 3 | 15s |
| lake-publisher | `pgrep -f 'python -m services.lake_publisher.jobs'` | 10s | 5s | 3 | 15s |
Infrastructure services (ollama, trino, hive-metastore, superset) do not define health checks in docker-compose.yml. Application services that depend on ollama use `condition: service_started` instead of `condition: service_healthy`.
## Internal Network Connectivity
All containers share the default Docker Compose network. Services reference each other by their Compose service name as the hostname:
| Hostname | Resolved To | Used By |
|----------|-------------|---------|
| `postgres` | PostgreSQL container | All 13 app services, superset |
| `redis` | Redis container | scheduler, ingestion, parser, extractor, aggregation, recommendation, trading-engine, broker-adapter, query-api |
| `minio` | MinIO container | ingestion, lake-publisher, query-api (via `minio:9000`) |
| `ollama` | Ollama container | extractor (via `http://ollama:11434`) |
| `hive-metastore` | Hive Metastore container | trino (thrift://hive-metastore:9083) |
| `trino` | Trino container | superset (trino:8080) |
| `query-api` | Query API container | dashboard (nginx proxy upstream) |
+355
View File
@@ -0,0 +1,355 @@
# Kubernetes Architecture — Stonks Oracle
This document describes the Kubernetes deployment topology for Stonks Oracle, derived from the Helm chart at `infra/helm/stonks-oracle/`.
All application workloads deploy to the `stonks-oracle` namespace. External cluster services (PostgreSQL, Redis, MinIO, Ollama) run in their own namespaces and are referenced via cross-namespace DNS.
## Deployment Diagram
```mermaid
graph TB
%% ── External traffic ──────────────────────────────────────────
internet((Internet))
subgraph traefik ["kube-system (Traefik Ingress Controller)"]
direction LR
ing_dash["stonks.celestium.life"]
ing_api["stonks-api.celestium.life"]
ing_reg["stonks-registry.celestium.life"]
ing_trade["stonks-trading.celestium.life"]
ing_superset["stonks-dash.celestium.life"]
ing_trino["stonks-trino.celestium.life"]
end
internet --> traefik
%% ── stonks-oracle namespace ───────────────────────────────────
subgraph ns ["stonks-oracle namespace"]
direction TB
%% ── API Tier (ingress-facing) ─────────────────────────────
subgraph api_tier ["API Tier"]
direction LR
query_api["query-api<br/><i>Deployment (1 replica)</i><br/>:8000"]
symbol_registry["symbol-registry<br/><i>Deployment (1 replica)</i><br/>:8000"]
end
%% ── Frontend Tier ─────────────────────────────────────────
subgraph frontend_tier ["Frontend Tier"]
dashboard["dashboard<br/><i>Deployment (1 replica)</i><br/>:8080<br/><i>nginx-unprivileged</i>"]
end
%% ── Trading Tier ──────────────────────────────────────────
subgraph trading_tier ["Trading Tier"]
direction LR
trading_engine["trading-engine<br/><i>Deployment (1 replica)</i><br/>:8000"]
risk_engine["risk-engine<br/><i>Deployment (1 replica)</i><br/>:8000"]
broker_adapter["broker-adapter<br/><i>Deployment (1 replica)</i><br/><i>queue-driven worker</i>"]
end
%% ── Orchestration Tier ────────────────────────────────────
subgraph orchestration_tier ["Orchestration Tier"]
scheduler["scheduler<br/><i>Deployment (1 replica)</i><br/><i>runs migrations + seed</i>"]
end
%% ── Processing Tier (pipeline workers) ────────────────────
subgraph processing_tier ["Processing Tier (pipeline workers)"]
direction LR
ingestion["ingestion<br/><i>Deployment (2 replicas)</i>"]
parser["parser<br/><i>Deployment (2 replicas)</i>"]
extractor["extractor<br/><i>Deployment (1 replica)</i>"]
aggregation["aggregation<br/><i>Deployment (4 replicas)</i>"]
recommendation["recommendation<br/><i>Deployment (1 replica)</i>"]
end
%% ── Analytics Tier ────────────────────────────────────────
subgraph analytics_tier ["Analytics Tier"]
direction LR
lake_publisher["lake-publisher<br/><i>Deployment (1 replica)</i><br/><i>queue-driven worker</i>"]
hive_metastore["hive-metastore<br/><i>Deployment (1 replica)</i><br/>:9083<br/><i>apache/hive:4.0.0</i>"]
trino["trino<br/><i>Deployment (1 replica)</i><br/>:8080<br/><i>trinodb/trino:latest</i>"]
superset["superset<br/><i>Deployment (1 replica)</i><br/>:8088<br/><i>custom image</i>"]
end
%% ── Helm Secrets ──────────────────────────────────────────
subgraph secrets_block ["Helm-Managed Secrets"]
direction LR
sec_core["stonks-core-secrets<br/><i>POSTGRES_PASSWORD</i><br/><i>MINIO_ACCESS_KEY</i><br/><i>MINIO_SECRET_KEY</i><br/><i>REDIS_PASSWORD</i>"]
sec_broker["stonks-broker-secrets<br/><i>BROKER_API_KEY</i><br/><i>BROKER_API_SECRET</i><br/><i>BROKER_BASE_URL</i>"]
sec_market["stonks-market-secrets<br/><i>MARKET_DATA_API_KEY</i>"]
sec_gmail["stonks-gmail-secrets<br/><i>GMAIL_SENDER</i><br/><i>GMAIL_RECIPIENT</i><br/><i>GMAIL_APP_PASSWORD</i>"]
sec_dashboard["stonks-dashboard-secrets<br/><i>SUPERSET_SECRET_KEY</i><br/><i>SUPERSET_ADMIN_PASSWORD</i>"]
end
%% ── ConfigMap ─────────────────────────────────────────────
configmap["stonks-config<br/><i>ConfigMap</i><br/><i>All env vars from values.yaml config block</i>"]
end
%% ── External Cluster Services ─────────────────────────────────
subgraph pg_ns ["postgresql-service namespace"]
postgres[("PostgreSQL<br/>postgresql-rw:5432")]
end
subgraph redis_ns ["redis-service namespace"]
redis[("Redis<br/>redis-master:6379")]
end
subgraph minio_ns ["minio-service namespace"]
minio[("MinIO<br/>minio:80")]
end
subgraph ollama_ns ["ollama-service namespace"]
ollama[("Ollama<br/>ollama:11434<br/><i>GPU: 4070 Ti Super</i>")]
end
%% ── Ingress Routes ────────────────────────────────────────────
ing_dash -->|":8080"| dashboard
ing_api -->|":8000"| query_api
ing_reg -->|":8000"| symbol_registry
ing_trade -->|":8000"| trading_engine
ing_superset -->|":8088"| superset
ing_trino -->|":8080"| trino
%% ── Dashboard → Backend APIs ──────────────────────────────────
dashboard -.->|"/api/ proxy"| query_api
dashboard -.->|"/registry/ proxy"| symbol_registry
dashboard -.->|"/risk/ proxy"| risk_engine
%% ── Pipeline data flow (via Redis queues) ─────────────────────
scheduler -->|"enqueue jobs"| redis
ingestion -->|"stonks:queue:parsing"| redis
parser -->|"stonks:queue:extraction"| redis
extractor -->|"stonks:queue:aggregation"| redis
aggregation -->|"stonks:queue:recommendation"| redis
recommendation -->|"stonks:queue:trading_decisions"| redis
trading_engine -->|"stonks:queue:broker_orders"| redis
broker_adapter -->|"read orders"| redis
lake_publisher -->|"stonks:queue:lake_publish"| redis
%% ── External service connections ──────────────────────────────
scheduler --> postgres
scheduler --> redis
ingestion --> postgres
ingestion --> redis
ingestion --> minio
parser --> postgres
parser --> redis
extractor --> postgres
extractor --> redis
extractor --> ollama
aggregation --> postgres
aggregation --> redis
recommendation --> postgres
recommendation --> redis
trading_engine --> postgres
trading_engine --> redis
risk_engine --> postgres
broker_adapter --> postgres
broker_adapter --> redis
lake_publisher --> postgres
lake_publisher --> minio
query_api --> postgres
query_api --> redis
query_api --> minio
symbol_registry --> postgres
%% ── Analytics plane connections ───────────────────────────────
lake_publisher -->|"Parquet → s3a://stonks-lakehouse"| minio
hive_metastore -->|"s3a:// catalog"| minio
trino -->|"thrift://hive-metastore:9083"| hive_metastore
superset -->|"trino:8080"| trino
query_api -->|"trino:8080"| trino
superset --> postgres
superset --> redis
%% ── Trading tier external egress ──────────────────────────────
trading_engine -->|"HTTPS :443<br/>Alpaca API"| internet
trading_engine -->|"SMTP :587<br/>Gmail notifications"| internet
broker_adapter -->|"HTTPS :443<br/>Alpaca API"| internet
ingestion -->|"HTTPS :443<br/>Polygon.io / News APIs"| internet
%% ── Secret consumption ────────────────────────────────────────
sec_core -.-> query_api
sec_core -.-> symbol_registry
sec_core -.-> scheduler
sec_core -.-> ingestion
sec_core -.-> parser
sec_core -.-> extractor
sec_core -.-> aggregation
sec_core -.-> recommendation
sec_core -.-> trading_engine
sec_core -.-> risk_engine
sec_core -.-> broker_adapter
sec_core -.-> lake_publisher
sec_core -.-> hive_metastore
sec_core -.-> trino
sec_core -.-> superset
sec_broker -.-> ingestion
sec_broker -.-> trading_engine
sec_broker -.-> risk_engine
sec_broker -.-> broker_adapter
sec_market -.-> ingestion
sec_gmail -.-> trading_engine
sec_dashboard -.-> superset
configmap -.-> query_api
configmap -.-> symbol_registry
configmap -.-> scheduler
configmap -.-> ingestion
configmap -.-> parser
configmap -.-> extractor
configmap -.-> aggregation
configmap -.-> recommendation
configmap -.-> trading_engine
configmap -.-> risk_engine
configmap -.-> broker_adapter
configmap -.-> lake_publisher
configmap -.-> superset
%% ── Styles ────────────────────────────────────────────────────
classDef apiSvc fill:#4a90d9,stroke:#2c5f8a,color:#fff
classDef frontendSvc fill:#50c878,stroke:#2e7d46,color:#fff
classDef tradingSvc fill:#e8a838,stroke:#b07d1a,color:#fff
classDef processSvc fill:#9b59b6,stroke:#6c3483,color:#fff
classDef orchSvc fill:#1abc9c,stroke:#148f77,color:#fff
classDef analyticsSvc fill:#e74c3c,stroke:#a93226,color:#fff
classDef extSvc fill:#95a5a6,stroke:#717d7e,color:#fff
classDef secretSvc fill:#f5f5dc,stroke:#999,color:#333
classDef configSvc fill:#dfe6e9,stroke:#999,color:#333
class query_api,symbol_registry apiSvc
class dashboard frontendSvc
class trading_engine,risk_engine,broker_adapter tradingSvc
class scheduler orchSvc
class ingestion,parser,extractor,aggregation,recommendation processSvc
class lake_publisher,hive_metastore,trino,superset analyticsSvc
class postgres,redis,minio,ollama extSvc
class sec_core,sec_broker,sec_market,sec_gmail,sec_dashboard secretSvc
class configmap configSvc
```
## Network Policy Boundaries
The Helm chart deploys a **default-deny-ingress** policy that blocks all inbound traffic to pods in the `stonks-oracle` namespace. Each service that needs inbound connections has an explicit allow policy:
```mermaid
graph LR
subgraph netpol ["Network Policies — stonks-oracle namespace"]
direction TB
deny["🔒 default-deny-ingress<br/><i>Blocks ALL ingress to all pods</i>"]
subgraph allows ["Explicit Allow Rules"]
direction TB
np_dash["allow-dashboard-ingress<br/>dashboard :8080<br/>← kube-system (Traefik)"]
np_api["allow-query-api-ingress<br/>query-api :8000<br/>← kube-system (Traefik)<br/>← dashboard pod"]
np_reg["allow-symbol-registry-ingress<br/>symbol-registry :8000<br/>← kube-system (Traefik)<br/>← dashboard pod"]
np_trade["allow-trading-engine-ingress<br/>trading-engine :8000<br/>← kube-system (Traefik)<br/>← query-api pod<br/>← dashboard pod<br/><i>Egress: PostgreSQL :5432,</i><br/><i>Redis :6379, HTTPS :443, SMTP :587</i>"]
np_risk["allow-risk-engine-ingress<br/>risk-engine :8000<br/>← broker-adapter pod<br/>← query-api pod<br/>← dashboard pod"]
np_superset["allow-superset-ingress<br/>superset :8088<br/>← kube-system (Traefik)"]
np_trino["allow-trino-ingress<br/>trino :8080<br/>← superset pod<br/>← query-api pod<br/>← kube-system (Traefik)"]
np_hive["allow-hive-metastore-ingress<br/>hive-metastore :9083<br/>← trino pod<br/>← lake-publisher pod"]
np_broker["deny-broker-adapter-ingress<br/>broker-adapter<br/><i>No inbound traffic allowed</i>"]
end
end
style deny fill:#e74c3c,stroke:#c0392b,color:#fff
style np_broker fill:#e74c3c,stroke:#c0392b,color:#fff
style np_dash fill:#2ecc71,stroke:#27ae60,color:#fff
style np_api fill:#2ecc71,stroke:#27ae60,color:#fff
style np_reg fill:#2ecc71,stroke:#27ae60,color:#fff
style np_trade fill:#f39c12,stroke:#d68910,color:#fff
style np_risk fill:#f39c12,stroke:#d68910,color:#fff
style np_superset fill:#2ecc71,stroke:#27ae60,color:#fff
style np_trino fill:#2ecc71,stroke:#27ae60,color:#fff
style np_hive fill:#3498db,stroke:#2980b9,color:#fff
```
### Services Without Ingress Policies (Pipeline Workers)
The following services have **no inbound network policy** — they are queue-driven workers that only make outbound connections to PostgreSQL, Redis, MinIO, and Ollama. The default-deny-ingress policy blocks any unsolicited inbound traffic:
| Service | Tier | Behavior |
|---------|------|----------|
| scheduler | orchestration | Polls DB, enqueues to Redis |
| ingestion | processing | Reads from `stonks:queue:ingestion`, writes to DB/MinIO/Redis |
| parser | processing | Reads from `stonks:queue:parsing`, writes to DB/Redis |
| extractor | processing | Reads from `stonks:queue:extraction`, calls Ollama, writes to DB/Redis |
| aggregation | processing | Reads from `stonks:queue:aggregation`, writes to DB/Redis |
| recommendation | processing | Reads from `stonks:queue:recommendation`, writes to DB/Redis |
| lake-publisher | analytics | Reads from `stonks:queue:lake_publish`, writes Parquet to MinIO |
## Service Tier Summary
| Tier | Services | Ingress? | Replicas | Notes |
|------|----------|----------|----------|-------|
| **api** | query-api, symbol-registry | Yes (Traefik) | 1 each | FastAPI, readiness probes on `/docs` |
| **frontend** | dashboard | Yes (Traefik) | 1 | nginx-unprivileged on :8080, proxies to API services |
| **trading** | trading-engine, risk-engine, broker-adapter | trading-engine: Yes; risk-engine: internal only; broker-adapter: denied | 1 each | trading-engine has egress to Alpaca + Gmail |
| **orchestration** | scheduler | No | 1 | Runs DB migrations + seed as init containers |
| **processing** | ingestion, parser, extractor, aggregation, recommendation | No | 2, 2, 1, 4, 1 | Pipeline-gated by `pipelineEnabled` toggle |
| **analytics** | lake-publisher, trino, hive-metastore, superset | trino + superset: Yes; others: No | 1 each | lake-publisher is pipeline-gated |
## Secret Consumption Map
| Secret | Keys | Consumers |
|--------|------|-----------|
| `stonks-core-secrets` | POSTGRES_PASSWORD, MINIO_ACCESS_KEY, MINIO_SECRET_KEY, REDIS_PASSWORD | All 13 app services + hive-metastore, trino, superset |
| `stonks-broker-secrets` | BROKER_API_KEY, BROKER_API_SECRET, BROKER_BASE_URL | ingestion, trading-engine, risk-engine, broker-adapter |
| `stonks-market-secrets` | MARKET_DATA_API_KEY | ingestion |
| `stonks-gmail-secrets` | GMAIL_SENDER, GMAIL_RECIPIENT, GMAIL_APP_PASSWORD | trading-engine |
| `stonks-dashboard-secrets` | SUPERSET_SECRET_KEY, SUPERSET_ADMIN_PASSWORD | superset |
## Pipeline Toggle
Setting `pipelineEnabled: false` in `values.yaml` scales all services with `pipeline: true` to 0 replicas. This affects:
- scheduler, ingestion, parser, extractor, aggregation, recommendation, broker-adapter, lake-publisher
API-tier services (query-api, symbol-registry), trading-tier services (trading-engine, risk-engine), analytics services (trino, hive-metastore, superset), and the dashboard always run regardless of this toggle.
## External Cluster Services
These services run outside the `stonks-oracle` namespace and are referenced via cross-namespace DNS:
| Service | Namespace | DNS | Port | Notes |
|---------|-----------|-----|------|-------|
| PostgreSQL | `postgresql-service` | `postgresql-rw.postgresql-service.svc.cluster.local` | 5432 | CloudNativePG managed |
| Redis | `redis-service` | `redis-master.redis-service.svc.cluster.local` | 6379 | Password in `stonks-core-secrets` |
| MinIO | `minio-service` | `minio.minio-service.svc.cluster.local` | 80 | S3-compatible object store |
| Ollama | `ollama-service` | `ollama.ollama-service.svc.cluster.local` | 11434 | LLM inference, GPU: 4070 Ti Super 16GB |
## Analytics Plane
The analytics stack runs within the `stonks-oracle` namespace:
1. **Lake Publisher** writes Parquet fact tables to MinIO at `s3a://stonks-lakehouse/warehouse`
2. **Hive Metastore** (Apache Hive 4.0.0) manages table metadata, backed by embedded Derby DB with a PVC for persistence. Connects to MinIO for S3A filesystem access.
3. **Trino** queries the lakehouse via Hive Metastore (thrift://hive-metastore:9083). Exposes two catalogs: `lakehouse` (Hive connector) and `iceberg` (Iceberg connector). Both connect to MinIO for data access.
4. **Superset** connects to Trino for lakehouse queries and to PostgreSQL for its metadata DB. Uses Redis for caching. Exposed externally via Traefik ingress.
## Ingress Routes
All ingress resources use the `traefik` IngressClass with TLS certificates issued by the `ca-issuer` ClusterIssuer:
| Domain | Backend Service | Port | TLS Secret |
|--------|----------------|------|------------|
| `stonks.celestium.life` | dashboard | 8080 | `stonks-dashboard-tls` |
| `stonks-api.celestium.life` | query-api | 8000 | `stonks-api-tls` |
| `stonks-registry.celestium.life` | symbol-registry | 8000 | `stonks-registry-tls` |
| `stonks-trading.celestium.life` | trading-engine | 8000 | `stonks-trading-tls` |
| `stonks-dash.celestium.life` | superset | 8088 | `stonks-dash-tls` |
| `stonks-trino.celestium.life` | trino | 8080 | `stonks-trino-tls` |
+440
View File
@@ -0,0 +1,440 @@
# Backup and Restore Guide
This guide documents every backup and restore script in the Stonks Oracle platform, their CLI options, storage locations, retention policies, and procedures for disaster recovery.
## Overview
Stonks Oracle provides two tiers of backup tooling:
| Tier | Scripts | Scope | Storage |
|------|---------|-------|---------|
| **Local (kubectl-based)** | `backup-db.sh`, `restore-db.sh`, `backup-redis.sh` | Individual data stores, streamed to the operator's machine | `~/backups/stonks-oracle/` (local filesystem) |
| **Cluster (Kubernetes Job)** | `backup.sh`, `restore.sh` | Full platform (PostgreSQL + all MinIO buckets) | NFS share at `192.168.42.8:/volume1/Kubernetes/stonks` |
All scripts live in the `scripts/` directory and require `kubectl` access to the cluster.
---
## Local Backup Scripts
### `backup-db.sh` — PostgreSQL Database Backup
Creates a compressed `pg_dump` of the `stonks` database and optionally uploads it to MinIO.
**Usage:**
```bash
./scripts/backup-db.sh # backup to local file
./scripts/backup-db.sh --upload-minio # backup + upload to MinIO
```
**CLI Arguments:**
| Argument | Required | Description |
|----------|----------|-------------|
| `--upload-minio` | No | Upload the backup file to the `stonks-backups` MinIO bucket after creating it |
**Environment Variables:**
| Variable | Default | Description |
|----------|---------|-------------|
| `BACKUP_DIR` | `~/backups/stonks-oracle` | Local directory where backup files are stored |
**What it captures:**
- Full `pg_dump` of the `stonks` database (all tables, data, sequences)
- Dump flags: `--no-owner --no-privileges --clean --if-exists`
- Output format: gzip-compressed SQL (`.sql.gz`)
**How it works:**
1. Runs `pg_dump` inside the PostgreSQL pod (`postgresql-1` in `postgresql-service` namespace) and streams the compressed output to the local machine
2. Validates the backup is non-empty and counts tables as a sanity check
3. If `--upload-minio` is specified, attempts to create the `stonks-backups` bucket (if it doesn't exist) and stages the file for upload
4. Prunes old backups, keeping only the last 7 files matching `stonks-*.sql.gz`
**Storage:**
- Local path: `~/backups/stonks-oracle/stonks-<YYYYMMDD-HHMMSS>.sql.gz`
- MinIO bucket (optional): `stonks-backups`
**Retention:** Keeps the last 7 backups. Older files matching `stonks-*.sql.gz` in the backup directory are automatically deleted.
---
### `backup-redis.sh` — Redis State Backup
Triggers a Redis `BGSAVE` and copies the RDB dump file to the local machine.
**Usage:**
```bash
./scripts/backup-redis.sh
```
**CLI Arguments:** None.
**Environment Variables:**
| Variable | Default | Description |
|----------|---------|-------------|
| `BACKUP_DIR` | `~/backups/stonks-oracle` | Local directory where the RDB file is stored |
| `REDIS_PASSWORD` | `PSCh4ng3me!` | Redis authentication password |
**What it captures:**
- Redis RDB snapshot (`dump.rdb`) containing all in-memory state: deduplication markers, queue contents, rate-limit counters, cached values
**How it works:**
1. Triggers `BGSAVE` on the Redis master pod (`redis-master-0` in `redis-service` namespace)
2. Waits 5 seconds for the background save to complete, then logs the `LASTSAVE` timestamp
3. Copies the RDB file from the pod. Tries `/data/dump.rdb` first, then falls back to `/var/lib/redis/dump.rdb` and `/bitnami/redis/data/dump.rdb`
4. Prints Redis keyspace statistics for verification
**Storage:**
- Local path: `~/backups/stonks-oracle/redis-<YYYYMMDD-HHMMSS>.rdb`
**Retention:** No automatic pruning. Old Redis backups accumulate and must be cleaned up manually.
---
### `restore-db.sh` — PostgreSQL Database Restore
Restores a `pg_dump` backup into the `stonks` database with full service scale-down/scale-up.
**Usage:**
```bash
./scripts/restore-db.sh <backup-file.sql.gz>
./scripts/restore-db.sh ~/backups/stonks-oracle/stonks-20260415-180000.sql.gz
```
If called without arguments, lists available backups in `~/backups/stonks-oracle/`.
**CLI Arguments:**
| Argument | Required | Description |
|----------|----------|-------------|
| `<backup-file.sql.gz>` | Yes | Path to the gzip-compressed SQL backup file to restore |
**What it restores:**
- All tables, data, sequences, and indexes in the `stonks` database
- Re-grants `ALL PRIVILEGES` to the `stonks` user on all tables and sequences after restore
**Service scale-down/scale-up procedure:**
1. **Terminates active connections** — Runs `pg_terminate_backend()` for all connections to the `stonks` database
2. **Scales down all deployments** in the `stonks-oracle` namespace to 0 replicas to prevent reconnections
3. **Waits 10 seconds** for pods to terminate
4. **Restores the backup** using `psql --single-transaction` (piped from `zcat`)
5. **Re-grants permissions** to the `stonks` user
6. **Verifies** the restore by counting tables
7. **Scales all deployments back to 1 replica**, then scales `ingestion` and `parser` to 2 replicas
**Data loss implications:**
> **WARNING:** This replaces ALL data in the `stonks` database with the backup contents. Any data written after the backup was taken is permanently lost. The script requires interactive confirmation — you must type `yes` to proceed.
---
## Cluster Backup Scripts (Kubernetes Jobs)
### `backup.sh` — Full Platform Backup (PostgreSQL + MinIO)
Runs a Kubernetes Job that backs up both PostgreSQL and all MinIO buckets to an NFS share.
**Usage:**
```bash
bash scripts/backup.sh
```
**CLI Arguments:** None.
**What it captures:**
- **PostgreSQL**: Full `pg_dump` in custom format (`-Fc`) as `stonks.pgdump`
- **MinIO buckets** (8 buckets mirrored):
- `stonks-raw-market` — Raw market data from Polygon.io
- `stonks-raw-news` — Raw news articles
- `stonks-raw-filings` — Raw SEC filings
- `stonks-normalized` — Normalized documents
- `stonks-llm-prompts` — LLM prompt logs
- `stonks-llm-results` — LLM extraction results
- `stonks-lakehouse` — Parquet fact tables for Trino
- `stonks-audit` — Audit trail artifacts
- **Manifest**: `manifest.json` with backup name, timestamp, and bucket list
**How it works:**
1. Deletes any previous `stonks-backup` Job
2. Creates a Kubernetes Job using `postgres:18-alpine` with NFS volume mount and MinIO credentials from cluster secrets
3. Inside the Job container:
- Runs `pg_dump` with credentials from `stonks-config` ConfigMap and `stonks-core-secrets` Secret
- Installs the MinIO client (`mc`) and mirrors each bucket to the NFS backup directory
- Writes a `manifest.json` and updates the `latest` symlink
4. Waits up to 600 seconds (10 minutes) for the Job to complete
5. Job auto-cleans after 300 seconds (`ttlSecondsAfterFinished`)
**Storage:**
- NFS path: `192.168.42.8:/volume1/Kubernetes/stonks/<backup-name>/`
- Directory structure:
```
stonks-backup-YYYYMMDD-HHMMSS/
├── stonks.pgdump # PostgreSQL custom-format dump
├── manifest.json # Backup metadata
└── minio/
├── stonks-raw-market/ # Mirrored bucket contents
├── stonks-raw-news/
├── stonks-raw-filings/
├── stonks-normalized/
├── stonks-llm-prompts/
├── stonks-llm-results/
├── stonks-lakehouse/
└── stonks-audit/
```
- A `latest` symlink always points to the most recent backup
**Retention:** No automatic pruning on NFS. Old backups must be cleaned up manually.
---
### `restore.sh` — Full Platform Restore (PostgreSQL + MinIO)
Runs a Kubernetes Job that restores both PostgreSQL and MinIO buckets from an NFS backup.
**Usage:**
```bash
bash scripts/restore.sh # restore from "latest" symlink
bash scripts/restore.sh <backup-name> # restore a specific backup
```
**CLI Arguments:**
| Argument | Required | Description |
|----------|----------|-------------|
| `<backup-name>` | No | Name of the backup directory on NFS. Defaults to `latest` (symlink to most recent backup) |
**What it restores:**
- **PostgreSQL**: Full database restore using `pg_restore --clean --if-exists --no-owner --no-acl`
- **MinIO buckets**: All 8 buckets mirrored back with `mc mirror --overwrite`
**How it works:**
1. Prints a warning and gives 5 seconds to abort (Ctrl+C)
2. Deletes any previous `stonks-restore` Job
3. Creates a Kubernetes Job that:
- Validates the backup exists (`stonks.pgdump` file present)
- Restores PostgreSQL using `pg_restore` with `--clean` (drops and recreates objects)
- Installs `mc` and mirrors each bucket back from NFS to MinIO
- Verifies the restore by querying row counts for key tables (companies, documents, intelligence, impacts, trends, recommendations)
4. Waits up to 600 seconds for the Job to complete
**Data loss implications:**
> **WARNING:** This will DROP and recreate all objects in the `stonks` database. All MinIO bucket contents are overwritten. Any data written after the backup was taken is permanently lost. The script provides a 5-second abort window before proceeding.
**Post-restore steps:**
After the restore completes, restart all services to pick up the restored state:
```bash
kubectl rollout restart deployment -n stonks-oracle --all
```
---
## MinIO Upload Option (`--upload-minio`)
The `backup-db.sh` script supports `--upload-minio` for off-host storage of database backups. When enabled:
1. The script connects to MinIO through an ingestion pod in the `stonks-oracle` namespace
2. Creates the `stonks-backups` bucket if it doesn't already exist
3. Stages the backup file for upload
This provides a second copy of the database backup on object storage, separate from the operator's local filesystem. The full cluster backup (`backup.sh`) stores backups on NFS and does not use this flag — it backs up MinIO bucket *contents* rather than uploading database dumps *to* MinIO.
---
## Full Nuke and Rebuild Procedure
When a complete platform reset is needed (corrupted state, major schema changes, fresh start), follow this procedure:
### Step 1: Tear Down Services
```bash
bash ~/sources/kube/stonks-oracle/runmelast.sh
```
This runs from `gremlin-1` and performs a Helm uninstall, cleaning up all Kubernetes resources in the `stonks-oracle` namespace. Database, MinIO, and Redis data are preserved (they run in separate namespaces).
### Step 2: Terminate Database Connections
```bash
kubectl exec -n postgresql-service postgresql-1 -c postgres -- \
psql -U postgres -c \
"SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname = 'stonks' AND pid <> pg_backend_pid();"
```
### Step 3: Drop the Database
```bash
kubectl exec -n postgresql-service postgresql-1 -c postgres -- \
psql -U postgres -c "DROP DATABASE IF EXISTS stonks;"
```
### Step 4: Flush Redis
Clear all `stonks:*` keys to reset deduplication markers, queue contents, and cached state:
```bash
kubectl exec -n redis-service redis-master-0 -- \
redis-cli -a 'PSCh4ng3me!' --scan --pattern 'stonks:*' | \
xargs -L 100 kubectl exec -n redis-service redis-master-0 -- \
redis-cli -a 'PSCh4ng3me!' DEL
```
### Step 5: Redeploy
```bash
bash ~/sources/kube/stonks-oracle/runmefirst.sh
```
This runs from `gremlin-1` and performs:
- Database creation and migration (all `infra/migrations/*.sql` files applied in order)
- Helm install with secrets injected via `--set` flags
- Rolling restart of all deployments
### Step 6: Re-seed the Symbol Registry
```bash
POSTGRES_HOST=postgresql-rw.postgresql-service.svc.cluster.local \
POSTGRES_PASSWORD='St0nks0racl3!' \
POSTGRES_USER=stonks \
POSTGRES_DB=stonks \
.venv/bin/python -m services.symbol_registry.seed
```
This populates the 50 tracked companies across 10 sectors and 46 competitor relationships.
---
## Recommended Backup Schedules
### Daily Database Backup (cron)
Run `backup-db.sh` daily on a machine with `kubectl` access. The built-in retention keeps the last 7 backups automatically.
```cron
# Daily database backup at 2:00 AM
0 2 * * * /path/to/stonks-oracle/scripts/backup-db.sh --upload-minio >> /var/log/stonks-backup.log 2>&1
```
### Weekly Full Backup (cron)
Run the full cluster backup weekly to capture both PostgreSQL and MinIO data on NFS:
```cron
# Weekly full backup (PostgreSQL + MinIO) on Sundays at 3:00 AM
0 3 * * 0 /path/to/stonks-oracle/scripts/backup.sh >> /var/log/stonks-full-backup.log 2>&1
```
### Redis Backup Before Deployments
Redis state is transient (queues, dedup markers, caches) and rebuilds naturally. Back up Redis before major deployments or database resets as a precaution:
```bash
./scripts/backup-redis.sh
```
### Kubernetes CronJobs
For fully automated in-cluster backups, create a CronJob based on the same Job spec used by `backup.sh`:
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
name: stonks-backup
namespace: stonks-oracle
spec:
schedule: "0 2 * * *" # Daily at 2:00 AM UTC
concurrencyPolicy: Forbid
successfulJobsHistoryLimit: 3
failedJobsHistoryLimit: 3
jobTemplate:
spec:
ttlSecondsAfterFinished: 3600
backoffLimit: 1
template:
spec:
restartPolicy: Never
volumes:
- name: nfs-backup
nfs:
server: 192.168.42.8
path: /volume1/Kubernetes/stonks
containers:
- name: backup
image: postgres:18-alpine
volumeMounts:
- name: nfs-backup
mountPath: /backup
envFrom:
- configMapRef:
name: stonks-config
- secretRef:
name: stonks-core-secrets
env:
- name: MINIO_ACCESS_KEY
valueFrom:
secretKeyRef:
name: stonks-core-secrets
key: MINIO_ACCESS_KEY
- name: MINIO_SECRET_KEY
valueFrom:
secretKeyRef:
name: stonks-core-secrets
key: MINIO_SECRET_KEY
command: ["sh", "-c"]
args:
- |
set -e
apk add --no-cache curl ca-certificates
STAMP="stonks-backup-$(date +%Y%m%d-%H%M%S)"
DIR="/backup/${STAMP}"
mkdir -p "${DIR}/minio"
# PostgreSQL backup
PGPASSWORD="${POSTGRES_PASSWORD}" pg_dump \
-h "${POSTGRES_HOST}" -p "${POSTGRES_PORT}" \
-U "${POSTGRES_USER}" -d "${POSTGRES_DB}" \
--no-owner --no-acl -Fc \
-f "${DIR}/stonks.pgdump"
# MinIO backup
curl -sL https://dl.min.io/client/mc/release/linux-amd64/mc -o /usr/local/bin/mc
chmod +x /usr/local/bin/mc
mc alias set backup "http://${MINIO_ENDPOINT}" "${MINIO_ACCESS_KEY}" "${MINIO_SECRET_KEY}" --api S3v4
for bucket in stonks-raw-market stonks-raw-news stonks-raw-filings stonks-normalized stonks-llm-prompts stonks-llm-results stonks-lakehouse stonks-audit; do
mc mirror "backup/${bucket}" "${DIR}/minio/${bucket}/" 2>/dev/null || true
done
ln -sfn "${STAMP}" /backup/latest
echo "Backup complete: ${DIR}"
```
### Recommended Schedule Summary
| What | Frequency | Script | Retention |
|------|-----------|--------|-----------|
| Database only | Daily | `backup-db.sh --upload-minio` | Last 7 (auto-pruned) |
| Full platform (DB + MinIO) | Weekly | `backup.sh` | Manual cleanup on NFS |
| Redis snapshot | Before deployments | `backup-redis.sh` | Manual cleanup |
+627
View File
@@ -0,0 +1,627 @@
# Docker Deployment Guide
This guide covers running the full Stonks Oracle platform locally using Docker Compose. It documents every service, environment variable, volume mount, health check, and operational command.
## Prerequisites
- Docker Engine 24+ and Docker Compose v2
- At least 16 GB RAM (Ollama + Trino + all services)
- API keys for Polygon.io and Alpaca (optional — platform runs in degraded mode without them)
## Quick Start
```bash
# 1. Clone the repository
git clone <repo-url> && cd stonks-oracle
# 2. Configure API keys
cp .env.example .env # or edit the existing .env
# Fill in MARKET_DATA_API_KEY, BROKER_API_KEY, BROKER_API_SECRET
# 3. Start everything
docker compose up -d
# 4. Verify all services are healthy
docker compose ps
# 5. Access the dashboard
open http://localhost:3000
```
---
## Service Inventory
### Infrastructure Services
| Service | Image | Ports | Volumes | Purpose |
|---------|-------|-------|---------|---------|
| `postgres` | `postgres:16-alpine` | `5432:5432` | `pgdata``/var/lib/postgresql/data`, `./infra/migrations``/docker-entrypoint-initdb.d` | Primary database; migrations auto-applied on first start |
| `redis` | `redis:7-alpine` | `6379:6379` | — | Queue broker, caching, deduplication |
| `minio` | `minio/minio:latest` | `9000:9000` (API), `9001:9001` (console) | `miniodata``/data` | Object storage for raw artifacts and lakehouse |
| `minio-init` | `minio/mc:latest` | — | — | One-shot init container that creates required buckets |
| `ollama` | `ollama/ollama:latest` | `11434:11434` | `ollama_models``/root/.ollama` | LLM inference server for extraction and classification |
| `trino` | `trinodb/trino:latest` | `8080:8080` | `./infra/trino/catalog``/etc/trino/catalog` | SQL query engine over the lakehouse |
| `hive-metastore` | `apache/hive:4.0.0` | `9083:9083` | `hive_data``/opt/hive/data`, `./infra/hive/core-site.xml``/opt/hive/conf/core-site.xml`, `./infra/hive/metastore-site.xml``/opt/hive/conf/metastore-site.xml` | Iceberg/Hive metadata catalog for Trino |
| `superset` | `apache/superset:latest` | `8088:8088` | `superset_data``/app/superset_home` | BI dashboards over Trino |
### Application Services
| Service | Dockerfile | `SERVICE_CMD` / Command | Ports | Depends On |
|---------|-----------|------------------------|-------|------------|
| `scheduler` | `docker/Dockerfile.scheduler` | `python -m services.scheduler.app` | — | postgres (healthy), redis (healthy) |
| `symbol-registry` | `docker/Dockerfile` | `uvicorn services.symbol_registry.app:app --host 0.0.0.0 --port 8000` | `8001:8000` | postgres (healthy) |
| `ingestion` | `docker/Dockerfile` | `python -m services.ingestion.worker` | — | postgres (healthy), redis (healthy), minio (healthy) |
| `parser` | `docker/Dockerfile` | `python -m services.parser.worker` | — | postgres (healthy), redis (healthy) |
| `extractor` | `docker/Dockerfile` | `python -m services.extractor.main` | — | postgres (healthy), redis (healthy), ollama (started) |
| `aggregation` | `docker/Dockerfile` | `python -m services.aggregation.main` | — | postgres (healthy), redis (healthy) |
| `recommendation` | `docker/Dockerfile` | `python -m services.recommendation.main` | — | postgres (healthy), redis (healthy) |
| `trading-engine` | `docker/Dockerfile` | `uvicorn services.trading.app:app --host 0.0.0.0 --port 8000` | `8002:8000` | postgres (healthy), redis (healthy) |
| `risk-engine` | `docker/Dockerfile` | `uvicorn services.risk.app:app --host 0.0.0.0 --port 8000` | `8003:8000` | postgres (healthy) |
| `broker-adapter` | `docker/Dockerfile` | `python -m services.adapters.broker_service` | — | postgres (healthy), redis (healthy) |
| `lake-publisher` | `docker/Dockerfile` | `python -m services.lake_publisher.jobs` | — | postgres (healthy), minio (healthy) |
| `query-api` | `docker/Dockerfile` | `uvicorn services.api.app:app --host 0.0.0.0 --port 8000` | `8004:8000` | postgres (healthy), redis (healthy), minio (healthy) |
| `dashboard` | `frontend/Dockerfile` | nginx (built-in) | `3000:8080` | query-api (healthy) |
### Port Summary
| Port | Service | Protocol |
|------|---------|----------|
| 3000 | Dashboard (React UI) | HTTP |
| 5432 | PostgreSQL | TCP |
| 6379 | Redis | TCP |
| 8001 | Symbol Registry API | HTTP |
| 8002 | Trading Engine API | HTTP |
| 8003 | Risk Engine API | HTTP |
| 8004 | Query API | HTTP |
| 8080 | Trino | HTTP |
| 8088 | Superset | HTTP |
| 9000 | MinIO API | HTTP |
| 9001 | MinIO Console | HTTP |
| 9083 | Hive Metastore | Thrift |
| 11434 | Ollama | HTTP |
---
## Environment Variables
### Shared Application Environment (`x-app-env`)
All application services inherit these variables via the `x-app-env` YAML anchor:
| Variable | Default | Description |
|----------|---------|-------------|
| `POSTGRES_HOST` | `postgres` | PostgreSQL hostname (Docker service name) |
| `POSTGRES_PORT` | `5432` | PostgreSQL port |
| `POSTGRES_DB` | `stonks` | Database name |
| `POSTGRES_USER` | `stonks` | Database user |
| `POSTGRES_PASSWORD` | `stonks_dev` | Database password |
| `REDIS_HOST` | `redis` | Redis hostname (Docker service name) |
| `REDIS_PORT` | `6379` | Redis port |
| `MINIO_ENDPOINT` | `minio:9000` | MinIO API endpoint |
| `MINIO_ACCESS_KEY` | `minioadmin` | MinIO access key |
| `MINIO_SECRET_KEY` | `minioadmin` | MinIO secret key |
| `OLLAMA_BASE_URL` | `http://ollama:11434` | Ollama LLM server URL |
### `.env` File
The `.env` file is loaded by `ingestion`, `broker-adapter`, and `trading-engine` via the `env_file` directive. Create it in the repository root:
```dotenv
# Stonks Oracle — Environment Variables
# These are loaded by ingestion, broker-adapter, and trading-engine services.
# Polygon.io market data API key (required for live data ingestion)
MARKET_DATA_API_KEY=
# Alpaca broker credentials (required for paper/live trading)
BROKER_API_KEY=
BROKER_API_SECRET=
BROKER_BASE_URL=https://paper-api.alpaca.markets
```
| Variable | Required | Default | Used By | Description |
|----------|----------|---------|---------|-------------|
| `MARKET_DATA_API_KEY` | No* | (empty) | ingestion | Polygon.io API key for market data fetching |
| `BROKER_API_KEY` | No* | (empty) | broker-adapter, trading-engine | Alpaca API key |
| `BROKER_API_SECRET` | No* | (empty) | broker-adapter, trading-engine | Alpaca API secret |
| `BROKER_BASE_URL` | No | `https://paper-api.alpaca.markets` | broker-adapter, trading-engine | Alpaca API base URL |
*Services start without these keys but run in degraded mode — ingestion cannot fetch market data and the broker adapter cannot execute trades.
### Infrastructure Service Environment
**PostgreSQL** (`postgres`):
| Variable | Value | Description |
|----------|-------|-------------|
| `POSTGRES_DB` | `stonks` | Database created on first start |
| `POSTGRES_USER` | `stonks` | Superuser for the database |
| `POSTGRES_PASSWORD` | `stonks_dev` | Password for the database user |
**MinIO** (`minio`):
| Variable | Value | Description |
|----------|-------|-------------|
| `MINIO_ROOT_USER` | `minioadmin` | MinIO admin username |
| `MINIO_ROOT_PASSWORD` | `minioadmin` | MinIO admin password |
**Trino** (`trino`):
| Variable | Value | Description |
|----------|-------|-------------|
| `MINIO_ACCESS_KEY` | `minioadmin` | Passed to Trino for MinIO catalog access |
| `MINIO_SECRET_KEY` | `minioadmin` | Passed to Trino for MinIO catalog access |
**Hive Metastore** (`hive-metastore`):
| Variable | Value | Description |
|----------|-------|-------------|
| `SERVICE_NAME` | `metastore` | Tells Hive to run in metastore-only mode |
| `DB_DRIVER` | `derby` | Embedded Derby database for metadata |
**Superset** (`superset`):
| Variable | Value | Description |
|----------|-------|-------------|
| `SUPERSET_SECRET_KEY` | `stonks-dev-secret-key-change-me` | Flask secret key (change in production) |
| `ADMIN_USERNAME` | `admin` | Initial admin username |
| `ADMIN_PASSWORD` | `admin` | Initial admin password |
| `ADMIN_EMAIL` | `admin@stonks.local` | Initial admin email |
### Additional Configuration Variables
All application services support additional environment variables loaded via `services/shared/config.py`. These can be added to individual service `environment` blocks or to the `x-app-env` anchor as needed:
| Variable | Default | Description |
|----------|---------|-------------|
| `REDIS_DB` | `0` | Redis database number |
| `REDIS_PASSWORD` | (none) | Redis password (not needed in Docker Compose) |
| `MINIO_SECURE` | `false` | Use HTTPS for MinIO |
| `OLLAMA_MODEL` | `qwen3.5:9b` | Default LLM model for extraction |
| `OLLAMA_TIMEOUT` | `120` | Ollama request timeout (seconds) |
| `OLLAMA_MAX_RETRIES` | `2` | Max retries for Ollama requests |
| `TRINO_HOST` | `localhost` | Trino hostname |
| `TRINO_PORT` | `8080` | Trino port |
| `TRINO_CATALOG` | `lakehouse` | Trino catalog name |
| `TRINO_SCHEMA` | `stonks` | Trino schema name |
| `MARKET_DATA_BASE_URL` | `https://api.polygon.io` | Polygon.io base URL |
| `MARKET_DATA_PROVIDER` | `polygon` | Market data provider |
| `BROKER_MODE` | `paper` | Broker mode: `paper` or `live` |
| `BROKER_PROVIDER` | `alpaca` | Broker provider |
| `TRADING_ENABLED` | `false` | Enable autonomous trading engine |
| `TRADING_RISK_TIER` | `moderate` | Risk tier: `conservative`, `moderate`, `aggressive` |
| `TRADING_POLLING_INTERVAL_SECONDS` | `60` | Recommendation polling interval |
| `TRADING_MAX_OPEN_POSITIONS` | `10` | Maximum concurrent open positions |
| `MACRO_ENABLED` | `true` | Enable macro signal layer |
| `COMPETITIVE_ENABLED` | `true` | Enable competitive signal layer |
| `LOG_LEVEL` | `INFO` | Logging level |
| `JSON_LOGS` | `true` | Enable structured JSON logging |
| `DEPLOY_STAGE` | (empty) | Deployment stage prefix for bucket names |
See `services/shared/config.py` for the complete list of all supported environment variables with their defaults.
---
## Volume Mounts and Data Persistence
Docker Compose defines five named volumes for persistent data:
| Volume | Mounted By | Mount Path | Contents |
|--------|-----------|------------|----------|
| `pgdata` | postgres | `/var/lib/postgresql/data` | PostgreSQL database files |
| `miniodata` | minio | `/data` | MinIO object storage (raw artifacts, lakehouse Parquet files) |
| `ollama_models` | ollama | `/root/.ollama` | Downloaded LLM model weights |
| `hive_data` | hive-metastore | `/opt/hive/data` | Hive metastore Derby database |
| `superset_data` | superset | `/app/superset_home` | Superset configuration and metadata |
### Bind Mounts
In addition to named volumes, several services use bind mounts for configuration:
| Service | Host Path | Container Path | Mode | Purpose |
|---------|-----------|---------------|------|---------|
| postgres | `./infra/migrations` | `/docker-entrypoint-initdb.d` | rw | SQL migrations auto-applied on first start |
| trino | `./infra/trino/catalog` | `/etc/trino/catalog` | rw | Trino catalog configuration (lakehouse, iceberg) |
| hive-metastore | `./infra/hive/core-site.xml` | `/opt/hive/conf/core-site.xml` | ro | Hadoop core-site config for MinIO access |
| hive-metastore | `./infra/hive/metastore-site.xml` | `/opt/hive/conf/metastore-site.xml` | ro | Hive metastore config |
### Resetting Data
To destroy all persistent data and start fresh:
```bash
# Stop all containers and remove named volumes
docker compose down -v
```
This removes `pgdata`, `miniodata`, `ollama_models`, `hive_data`, and `superset_data`. The next `docker compose up` will re-initialize PostgreSQL with migrations, re-create MinIO buckets (via `minio-init`), and re-download Ollama models.
To reset only specific volumes:
```bash
docker compose down
docker volume rm stonks-oracle_pgdata # Reset database only
docker compose up -d
```
> **Note**: Volume names are prefixed with the project directory name (e.g., `stonks-oracle_pgdata`). Use `docker volume ls` to see exact names.
---
## Health Checks
Every service has a health check configured. Docker Compose uses these to enforce startup ordering via `depends_on` with `condition: service_healthy`.
### Infrastructure Health Checks
| Service | Test Command | Interval | Retries |
|---------|-------------|----------|---------|
| `postgres` | `pg_isready -U stonks` | 5s | 5 |
| `redis` | `redis-cli ping` | 5s | 5 |
| `minio` | `mc ready local` | 5s | 5 |
### Application Health Checks — FastAPI Services
FastAPI services (symbol-registry, trading-engine, risk-engine, query-api) use HTTP health endpoints:
| Service | Test Command | Interval | Timeout | Retries | Start Period |
|---------|-------------|----------|---------|---------|-------------|
| `symbol-registry` | `curl -f http://localhost:8000/health` | 10s | 5s | 3 | 15s |
| `trading-engine` | `curl -f http://localhost:8000/health` | 10s | 5s | 3 | 15s |
| `risk-engine` | `curl -f http://localhost:8000/health` | 10s | 5s | 3 | 15s |
| `query-api` | `curl -f http://localhost:8000/health` | 10s | 5s | 3 | 15s |
| `dashboard` | `curl -f http://localhost:8080/` | 10s | 5s | 3 | 10s |
### Application Health Checks — Worker Services
Worker services (no HTTP endpoint) use process liveness checks:
| Service | Test Command | Interval | Timeout | Retries | Start Period |
|---------|-------------|----------|---------|---------|-------------|
| `scheduler` | `pgrep -f 'python -m services.scheduler.app'` | 10s | 5s | 3 | 15s |
| `ingestion` | `pgrep -f 'python -m services.ingestion.worker'` | 10s | 5s | 3 | 15s |
| `parser` | `pgrep -f 'python -m services.parser.worker'` | 10s | 5s | 3 | 15s |
| `extractor` | `pgrep -f 'python -m services.extractor.main'` | 10s | 5s | 3 | 15s |
| `aggregation` | `pgrep -f 'python -m services.aggregation.main'` | 10s | 5s | 3 | 15s |
| `recommendation` | `pgrep -f 'python -m services.recommendation.main'` | 10s | 5s | 3 | 15s |
| `broker-adapter` | `pgrep -f 'python -m services.adapters.broker_service'` | 10s | 5s | 3 | 15s |
| `lake-publisher` | `pgrep -f 'python -m services.lake_publisher.jobs'` | 10s | 5s | 3 | 15s |
### Verifying Service Health
```bash
# Check all service statuses
docker compose ps
# Check a specific service
docker compose ps query-api
# Inspect health check details for a container
docker inspect --format='{{json .State.Health}}' stonks-oracle-query-api-1 | python -m json.tool
```
---
## Dockerfile Build Details
### `docker/Dockerfile` — Generic Python Service Image
Used by all application services except the scheduler. Accepts a `SERVICE_CMD` build argument that determines which service the container runs.
**Base image**: `python:3.12-slim`
**Build arguments**:
| Argument | Default | Description |
|----------|---------|-------------|
| `SERVICE_CMD` | `python -m services.scheduler.app` | The command executed when the container starts |
**What gets copied**:
- `requirements.txt` → pip dependencies installed
- `services/` → all service source code
- `tests/` → test files (available for in-container testing)
- `conftest.py` → pytest configuration
**Environment variables set**:
- `PYTHONDONTWRITEBYTECODE=1` — no `.pyc` files
- `PYTHONUNBUFFERED=1` — unbuffered stdout/stderr for log visibility
- `PYTHONPATH=/app` — ensures `services.*` imports resolve
**System packages installed**: `gcc`, `libpq-dev` (PostgreSQL client library), `curl` (for health checks)
**Security**: Runs as non-root user `stonks` (UID 1000).
**How `SERVICE_CMD` works**: The `CMD` directive is `sh -c "${SERVICE_CMD}"`, so the build argument becomes the runtime command. Each service in `docker-compose.yml` overrides this via the `args.SERVICE_CMD` build parameter:
```yaml
query-api:
build:
context: .
dockerfile: docker/Dockerfile
args:
SERVICE_CMD: "uvicorn services.api.app:app --host 0.0.0.0 --port 8000"
```
### `docker/Dockerfile.scheduler` — Scheduler Image
A specialized variant of the generic Dockerfile used only by the `scheduler` service. Adds `postgresql-client` for running database migrations via `psql`.
**Additional contents**:
- `infra/migrations/` → copied to `/app/infra/migrations/` for migration execution
- `postgresql-client` system package installed
**Command**: Hardcoded `CMD ["python", "-m", "services.scheduler.app"]` (no `SERVICE_CMD` argument).
### `docker/Dockerfile.superset` — Custom Superset Image
Extends the official Apache Superset image with additional database drivers.
**Base image**: `apache/superset:latest`
**Additional packages**: `trino[sqlalchemy]`, `psycopg2-binary`, `redis`
### `frontend/Dockerfile` — Dashboard Image
Multi-stage build for the React dashboard.
**Stage 1 — Build** (base: `node:24-alpine`):
| Build Argument | Default | Description |
|---------------|---------|-------------|
| `VITE_QUERY_API_URL` | `""` | Query API base URL (empty = use relative `/api/` proxy) |
| `VITE_SYMBOL_REGISTRY_URL` | `""` | Symbol Registry base URL (empty = use relative `/registry/` proxy) |
| `VITE_RISK_ENGINE_URL` | `""` | Risk Engine base URL (empty = use relative `/risk/` proxy) |
**Stage 2 — Serve** (base: `nginxinc/nginx-unprivileged:alpine`):
- Serves the built static files on port 8080
- Uses `frontend/nginx.conf` for SPA fallback and API reverse proxying
- Proxies `/api/``query-api:8000`, `/registry/``symbol-registry:8000`, `/risk/``risk-engine:8000`, `/trading/``trading-engine:8000`
### Building Custom Images
To build a single service image locally:
```bash
# Build the query-api image
docker compose build query-api
# Build with a custom SERVICE_CMD
docker build -t my-custom-service \
--build-arg SERVICE_CMD="python -m services.my_service.main" \
-f docker/Dockerfile .
# Build the dashboard with custom API URLs
docker build -t my-dashboard \
--build-arg VITE_QUERY_API_URL="https://api.example.com" \
-f frontend/Dockerfile frontend/
# Rebuild all images
docker compose build
```
---
## Dependency Ordering
Docker Compose enforces startup order using `depends_on` with health check conditions. The dependency graph is:
```
postgres (healthy) ──┬── scheduler
├── symbol-registry
├── ingestion
├── parser
├── extractor
├── aggregation
├── recommendation
├── trading-engine
├── risk-engine
├── broker-adapter
├── lake-publisher
└── query-api
redis (healthy) ─────┬── scheduler
├── ingestion
├── parser
├── extractor
├── aggregation
├── recommendation
├── trading-engine
├── broker-adapter
└── query-api
minio (healthy) ─────┬── minio-init
├── ingestion
├── lake-publisher
└── query-api
ollama (started) ────── extractor
minio ───────────────── trino
hive-metastore ─────── trino
trino ──────────────── superset (via depends_on)
query-api (healthy) ── dashboard
```
Services with `condition: service_healthy` wait until the dependency's health check passes. The `extractor` depends on `ollama` with `condition: service_started` (no health check — Ollama may take time to load models).
---
## Operational Commands
### Starting Services
```bash
# Start all services in the background
docker compose up -d
# Start only infrastructure (useful for local development)
docker compose up -d postgres redis minio minio-init ollama
# Start a specific service and its dependencies
docker compose up -d query-api
```
### Stopping Services
```bash
# Stop all services (preserves volumes)
docker compose down
# Stop all services and remove volumes (full reset)
docker compose down -v
# Stop a specific service
docker compose stop trading-engine
```
### Restarting Services
```bash
# Restart a specific service
docker compose restart query-api
# Restart with a fresh build
docker compose up -d --build query-api
# Force recreate a service (picks up compose file changes)
docker compose up -d --force-recreate query-api
```
### Viewing Logs
```bash
# Follow logs for all services
docker compose logs -f
# Follow logs for a specific service
docker compose logs -f query-api
# View last 50 lines of a service's logs
docker compose logs --tail=50 ingestion
# View logs for multiple services
docker compose logs -f scheduler ingestion extractor
```
### Scaling Replicas
```bash
# Scale a worker service to 3 replicas
docker compose up -d --scale ingestion=3
# Scale multiple services
docker compose up -d --scale ingestion=3 --scale extractor=2
# Scale back to 1
docker compose up -d --scale ingestion=1
```
> **Note**: Scaling works best for worker services (ingestion, parser, extractor, aggregation, recommendation, broker-adapter, lake-publisher) that consume from Redis queues. Do not scale FastAPI services that expose host ports without adjusting port mappings.
### Inspecting Services
```bash
# List all services and their status
docker compose ps
# View resource usage
docker compose top
# Execute a command inside a running container
docker compose exec query-api python -c "from services.shared.config import load_config; print(load_config())"
# Open a shell in a container
docker compose exec postgres psql -U stonks -d stonks
```
### Full Reset
```bash
# Nuclear option: stop everything, remove volumes, rebuild, restart
docker compose down -v
docker compose build --no-cache
docker compose up -d
```
This destroys all data (database, object storage, model weights, metastore, Superset config) and starts from scratch. PostgreSQL migrations are re-applied automatically. MinIO buckets are re-created by `minio-init`. Ollama models must be re-downloaded.
---
## MinIO Bucket Initialization
The `minio-init` service runs once on startup and creates the required object storage buckets:
| Bucket | Purpose |
|--------|---------|
| `stonks-raw-market` | Raw market data from Polygon.io |
| `stonks-raw-news` | Raw news articles |
| `stonks-raw-filings` | Raw SEC filings |
| `stonks-normalized` | Normalized/parsed documents |
| `stonks-llm-prompts` | LLM prompt archives |
| `stonks-llm-results` | LLM extraction results |
| `stonks-lakehouse` | Parquet fact tables for Trino |
| `stonks-audit` | Audit trail artifacts |
Access the MinIO console at `http://localhost:9001` (credentials: `minioadmin` / `minioadmin`).
---
## Dashboard Reverse Proxy
The dashboard container runs nginx with reverse proxy rules that route API requests to backend services using Docker Compose service names:
| Path | Proxied To | Service |
|------|-----------|---------|
| `/api/` | `http://query-api:8000` | Query API |
| `/registry/` | `http://symbol-registry:8000/` | Symbol Registry API |
| `/risk/` | `http://risk-engine:8000/` | Risk Engine API |
| `/trading/` | `http://trading-engine:8000/` | Trading Engine API |
All other paths serve the React SPA with `try_files` fallback to `index.html`.
---
## Troubleshooting
### Service won't start
Check dependency health:
```bash
docker compose ps postgres redis minio
```
If infrastructure services are unhealthy, application services will wait indefinitely. Check infrastructure logs:
```bash
docker compose logs postgres
```
### Database migration errors
Migrations in `./infra/migrations/` are applied by PostgreSQL's `docker-entrypoint-initdb.d` mechanism, which only runs on first database initialization. If you need to re-run migrations:
```bash
docker compose down -v # Remove pgdata volume
docker compose up -d # Migrations re-applied on fresh init
```
### Ollama model not available
The extractor service needs an LLM model loaded in Ollama. Pull a model manually:
```bash
docker compose exec ollama ollama pull qwen3.5:9b
```
### Port conflicts
If a port is already in use, modify the host port mapping in `docker-compose.yml`:
```yaml
query-api:
ports:
- "9004:8000" # Changed from 8004 to 9004
```
+659
View File
@@ -0,0 +1,659 @@
# Helm Chart Configuration Reference
Complete reference for the Stonks Oracle Helm chart at `infra/helm/stonks-oracle/`.
| | |
|---|---|
| **Chart name** | `stonks-oracle` |
| **Chart version** | `0.1.0` |
| **App version** | `1.0.0` |
| **Chart type** | `application` |
Install with:
```bash
helm upgrade --install stonks-oracle infra/helm/stonks-oracle -n stonks-oracle
```
Override values per stage:
```bash
# Beta
helm upgrade --install stonks-oracle infra/helm/stonks-oracle \
-n stonks-oracle-beta -f infra/helm/stonks-oracle/values-beta.yaml
# Paper trading
helm upgrade --install stonks-oracle infra/helm/stonks-oracle \
-n stonks-oracle -f infra/helm/stonks-oracle/values-paper.yaml
```
---
## Table of Contents
- [image — Global Image Settings](#image--global-image-settings)
- [pipelineEnabled — Pipeline Toggle](#pipelineenabled--pipeline-toggle)
- [services — Service Deployments](#services--service-deployments)
- [config — ConfigMap Environment Variables](#config--configmap-environment-variables)
- [secrets — Kubernetes Secrets](#secrets--kubernetes-secrets)
- [ingress — Ingress Configuration](#ingress--ingress-configuration)
- [Analytics Stack — Trino, Hive Metastore, Superset](#analytics-stack--trino-hive-metastore-superset)
- [networkPolicies — Network Policy Configuration](#networkpolicies--network-policy-configuration)
- [Value Override Files](#value-override-files)
---
## `image` — Global Image Settings
Controls the container image registry, pull policy, and tag for all service deployments. Each service image is resolved as `{registry}/{service.image}:{tag}`.
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `image.registry` | string | `registry.celestium.life/stonks-oracle` | Container registry prefix. Each service appends its `image` name to this. |
| `image.pullPolicy` | string | `Always` | Kubernetes `imagePullPolicy`. Use `Always` for latest-tag workflows. |
| `image.tag` | string | `latest` | Image tag applied to all services. CI overrides this with the Git SHA via `--set image.tag=<sha>`. |
Example override:
```bash
helm upgrade --install stonks-oracle infra/helm/stonks-oracle \
--set image.tag=abc1234
```
---
## `pipelineEnabled` — Pipeline Toggle
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `pipelineEnabled` | bool | `true` | Master toggle for the data pipeline. |
When `false`, all services with `pipeline: true` in their definition are scaled to **0 replicas**. API-tier and trading-tier services continue running normally.
**Affected services** (scaled to 0 when disabled): scheduler, ingestion, parser, extractor, aggregation, recommendation, broker-adapter, lake-publisher.
**Unaffected services** (always run): symbol-registry, query-api, trading-engine, risk-engine, dashboard.
The replica count logic in the deployment template:
```yaml
replicas: {{ if and (hasKey $svc "pipeline") $svc.pipeline (not .Values.pipelineEnabled) }}0{{ else }}{{ $svc.replicas }}{{ end }}
```
---
## `services` — Service Deployments
Each key under `services` defines a Kubernetes Deployment. The deployments template iterates over all entries and creates a Deployment + optional Service for each.
### Per-Service Structure
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `replicas` | int | yes | Number of pod replicas. Set to 0 by `pipelineEnabled: false` for pipeline services. |
| `image` | string | yes | Image name appended to `image.registry`. Also used as the Deployment name and pod label (`app: <image>`). |
| `command` | string | no | Shell command passed as `["sh", "-c", "<command>"]`. Omit for images with a built-in entrypoint (e.g., dashboard/nginx). |
| `tier` | string | yes | Service tier label (`stonks-oracle/tier`). One of: `api`, `frontend`, `processing`, `trading`, `orchestration`, `analytics`, `ingestion`. |
| `port` | int | no | Container port. When set, a Kubernetes Service is created mapping `port → port`. |
| `pipeline` | bool | no | If `true`, replicas are set to 0 when `pipelineEnabled` is `false`. |
| `secrets` | list(string) | no | List of Secret names to mount via `envFrom.secretRef`. |
| `resources` | object | yes | Kubernetes resource requests and limits (`cpu`, `memory`). |
| `probes.readiness` | object | no | HTTP readiness probe: `path`, `port`, `initialDelay`, `period`. |
| `probes.liveness` | object | no | HTTP liveness probe: `path`, `port`, `initialDelay`, `period`. |
### Service Definitions
#### scheduler
| Field | Value |
|-------|-------|
| `replicas` | `1` |
| `pipeline` | `true` |
| `image` | `scheduler` |
| `command` | `python -m services.scheduler.app` |
| `tier` | `orchestration` |
| `port` | — |
| `secrets` | `stonks-core-secrets` |
| `resources.requests` | cpu: 50m, memory: 64Mi |
| `resources.limits` | cpu: 200m, memory: 128Mi |
| `probes` | — |
The scheduler deployment has two init containers (not configurable via values):
1. **run-migrations** — applies all SQL files from `infra/migrations/*.sql` in sorted order.
2. **seed-if-empty** — runs `python -m services.symbol_registry.seed` if the `companies` table is empty.
#### symbolRegistry
| Field | Value |
|-------|-------|
| `replicas` | `1` |
| `image` | `symbol-registry` |
| `command` | `uvicorn services.symbol_registry.app:app --host 0.0.0.0 --port 8000` |
| `tier` | `api` |
| `port` | `8000` |
| `secrets` | `stonks-core-secrets` |
| `resources.requests` | cpu: 100m, memory: 128Mi |
| `resources.limits` | cpu: 500m, memory: 256Mi |
| `probes.readiness` | path: `/docs`, port: 8000, initialDelay: 5s, period: 10s |
| `probes.liveness` | path: `/docs`, port: 8000, initialDelay: 10s, period: 30s |
#### ingestion
| Field | Value |
|-------|-------|
| `replicas` | `2` |
| `pipeline` | `true` |
| `image` | `ingestion` |
| `command` | `python -m services.ingestion.worker` |
| `tier` | `ingestion` |
| `port` | — |
| `secrets` | `stonks-core-secrets`, `stonks-market-secrets`, `stonks-broker-secrets` |
| `resources.requests` | cpu: 100m, memory: 128Mi |
| `resources.limits` | cpu: 500m, memory: 256Mi |
#### parser
| Field | Value |
|-------|-------|
| `replicas` | `2` |
| `pipeline` | `true` |
| `image` | `parser` |
| `command` | `python -m services.parser.worker` |
| `tier` | `processing` |
| `port` | — |
| `secrets` | `stonks-core-secrets` |
| `resources.requests` | cpu: 100m, memory: 128Mi |
| `resources.limits` | cpu: 500m, memory: 256Mi |
#### extractor
| Field | Value |
|-------|-------|
| `replicas` | `1` |
| `pipeline` | `true` |
| `image` | `extractor` |
| `command` | `python -m services.extractor.main` |
| `tier` | `processing` |
| `port` | — |
| `secrets` | `stonks-core-secrets` |
| `resources.requests` | cpu: 200m, memory: 256Mi |
| `resources.limits` | cpu: 1, memory: 512Mi |
Single replica is recommended — the extractor is bottlenecked by the shared Ollama GPU.
#### aggregation
| Field | Value |
|-------|-------|
| `replicas` | `4` |
| `pipeline` | `true` |
| `image` | `aggregation` |
| `command` | `python -m services.aggregation.main` |
| `tier` | `processing` |
| `port` | — |
| `secrets` | `stonks-core-secrets` |
| `resources.requests` | cpu: 100m, memory: 128Mi |
| `resources.limits` | cpu: 500m, memory: 256Mi |
#### recommendation
| Field | Value |
|-------|-------|
| `replicas` | `1` |
| `pipeline` | `true` |
| `image` | `recommendation` |
| `command` | `python -m services.recommendation.main` |
| `tier` | `processing` |
| `port` | — |
| `secrets` | `stonks-core-secrets` |
| `resources.requests` | cpu: 100m, memory: 128Mi |
| `resources.limits` | cpu: 500m, memory: 256Mi |
#### tradingEngine
| Field | Value |
|-------|-------|
| `replicas` | `1` |
| `image` | `trading-engine` |
| `command` | `uvicorn services.trading.app:app --host 0.0.0.0 --port 8000` |
| `tier` | `trading` |
| `port` | `8000` |
| `secrets` | `stonks-core-secrets`, `stonks-broker-secrets`, `stonks-gmail-secrets` |
| `resources.requests` | cpu: 100m, memory: 256Mi |
| `resources.limits` | cpu: 500m, memory: 512Mi |
| `probes.readiness` | path: `/ready`, port: 8000, initialDelay: 5s, period: 10s |
| `probes.liveness` | path: `/health`, port: 8000, initialDelay: 10s, period: 30s |
#### riskEngine
| Field | Value |
|-------|-------|
| `replicas` | `1` |
| `image` | `risk` |
| `command` | `uvicorn services.risk.app:app --host 0.0.0.0 --port 8000` |
| `tier` | `trading` |
| `port` | `8000` |
| `secrets` | `stonks-core-secrets`, `stonks-broker-secrets` |
| `resources.requests` | cpu: 100m, memory: 128Mi |
| `resources.limits` | cpu: 500m, memory: 256Mi |
#### brokerAdapter
| Field | Value |
|-------|-------|
| `replicas` | `1` |
| `pipeline` | `true` |
| `image` | `broker-adapter` |
| `command` | `python -m services.adapters.broker_service` |
| `tier` | `trading` |
| `port` | — |
| `secrets` | `stonks-core-secrets`, `stonks-broker-secrets` |
| `resources.requests` | cpu: 50m, memory: 64Mi |
| `resources.limits` | cpu: 200m, memory: 128Mi |
#### lakePublisher
| Field | Value |
|-------|-------|
| `replicas` | `1` |
| `pipeline` | `true` |
| `image` | `lake-publisher` |
| `command` | `python -m services.lake_publisher.jobs` |
| `tier` | `analytics` |
| `port` | — |
| `secrets` | `stonks-core-secrets` |
| `resources.requests` | cpu: 100m, memory: 128Mi |
| `resources.limits` | cpu: 500m, memory: 256Mi |
#### queryApi
| Field | Value |
|-------|-------|
| `replicas` | `1` |
| `image` | `query-api` |
| `command` | `uvicorn services.api.app:app --host 0.0.0.0 --port 8000` |
| `tier` | `api` |
| `port` | `8000` |
| `secrets` | `stonks-core-secrets` |
| `resources.requests` | cpu: 100m, memory: 128Mi |
| `resources.limits` | cpu: 500m, memory: 256Mi |
| `probes.readiness` | path: `/docs`, port: 8000, initialDelay: 5s, period: 10s |
#### dashboard
| Field | Value |
|-------|-------|
| `replicas` | `1` |
| `image` | `dashboard` |
| `command` | — (nginx built-in entrypoint) |
| `tier` | `frontend` |
| `port` | `8080` |
| `secrets` | — |
| `resources.requests` | cpu: 50m, memory: 64Mi |
| `resources.limits` | cpu: 200m, memory: 128Mi |
| `probes.readiness` | path: `/`, port: 8080, initialDelay: 3s, period: 10s |
| `probes.liveness` | path: `/`, port: 8080, initialDelay: 5s, period: 30s |
---
## `config` — ConfigMap Environment Variables
All keys under `config` are rendered into a Kubernetes ConfigMap named `stonks-config` and injected into every service pod via `envFrom.configMapRef`. Values are strings.
### Database
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `config.POSTGRES_HOST` | string | `postgresql-rw.postgresql-service.svc.cluster.local` | PostgreSQL hostname. Points to the CloudNativePG read-write service. |
| `config.POSTGRES_PORT` | string | `5432` | PostgreSQL port. |
| `config.POSTGRES_DB` | string | `stonks` | Database name. Override per stage (e.g., `stonks_beta`, `stonks_paper`). |
| `config.POSTGRES_USER` | string | `stonks` | Database user. Override per stage. |
| `config.REDIS_HOST` | string | `redis-master.redis-service.svc.cluster.local` | Redis hostname. |
| `config.REDIS_PORT` | string | `6379` | Redis port. |
| `config.REDIS_DB` | string | `0` | Redis database index. Use different indices per stage to isolate keys (beta: `1`, paper: `2`). |
### Object Storage
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `config.MINIO_ENDPOINT` | string | `minio.minio-service.svc.cluster.local:80` | MinIO API endpoint (host:port). |
| `config.MINIO_SECURE` | string | `false` | Use HTTPS for MinIO connections. Set to `true` if MinIO has TLS. |
### LLM / Ollama
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `config.OLLAMA_BASE_URL` | string | `""` (empty) | Ollama API base URL. Set to the cluster-internal or external Ollama endpoint. |
| `config.OLLAMA_MODEL` | string | `qwen3.5:9b-fast` | Default LLM model for extraction and classification agents. |
| `config.OLLAMA_TIMEOUT` | string | `240` | Request timeout in seconds for Ollama API calls. |
| `config.OLLAMA_MAX_RETRIES` | string | `2` | Maximum retry attempts for failed Ollama requests. |
| `config.OLLAMA_RETRY_BASE_DELAY` | string | `1.0` | Base delay in seconds for exponential backoff on Ollama retries. |
| `config.OLLAMA_RETRY_MAX_DELAY` | string | `10.0` | Maximum delay cap in seconds for Ollama retry backoff. |
| `config.OLLAMA_RETRY_BACKOFF_MULTIPLIER` | string | `2.0` | Multiplier for exponential backoff between Ollama retries. |
### Analytics / Trino
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `config.TRINO_HOST` | string | `trino.stonks-oracle.svc.cluster.local` | Trino coordinator hostname. |
| `config.TRINO_PORT` | string | `8080` | Trino coordinator port. |
| `config.TRINO_CATALOG` | string | `lakehouse` | Default Trino catalog for Hive-based queries. |
| `config.TRINO_SCHEMA` | string | `stonks` | Default Trino schema. |
| `config.TRINO_ICEBERG_CATALOG` | string | `iceberg` | Trino catalog for Iceberg table queries. |
### Broker / Trading
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `config.BROKER_MODE` | string | `paper` | Broker execution mode. `paper` for simulated trading, `live` for real orders. |
| `config.BROKER_PROVIDER` | string | `""` (empty) | Broker provider name (e.g., `alpaca`). |
| `config.MARKET_DATA_BASE_URL` | string | `""` (empty) | Market data API base URL (e.g., `https://api.polygon.io`). |
| `config.MARKET_DATA_PROVIDER` | string | `polygon` | Market data provider identifier. |
| `config.TRADING_ENABLED` | string | `true` | Master toggle for the trading engine. Set to `false` to disable order submission. |
| `config.TRADING_RISK_TIER` | string | `moderate` | Default risk tier for position sizing. Options: `conservative`, `moderate`, `aggressive`. |
| `config.TRADING_ABSOLUTE_POSITION_CAP` | string | `10000.0` | Maximum dollar value per position. |
| `config.TRADING_MAX_OPEN_POSITIONS` | string | `10` | Maximum number of concurrent open positions. |
### Data Retention
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `config.RETENTION_RAW_MARKET_DAYS` | string | `90` | Days to retain raw market data before cleanup. |
| `config.RETENTION_RAW_NEWS_DAYS` | string | `180` | Days to retain raw news articles. |
| `config.RETENTION_RAW_FILINGS_DAYS` | string | `365` | Days to retain raw SEC filings. |
| `config.RETENTION_NORMALIZED_DAYS` | string | `180` | Days to retain normalized/parsed documents. |
| `config.RETENTION_LLM_PROMPTS_DAYS` | string | `365` | Days to retain LLM prompt logs. |
| `config.RETENTION_LLM_RESULTS_DAYS` | string | `365` | Days to retain LLM extraction results. |
| `config.RETENTION_LAKEHOUSE_DAYS` | string | `730` | Days to retain lakehouse fact tables. |
| `config.RETENTION_AUDIT_DAYS` | string | `730` | Days to retain audit trail events. |
| `config.RETENTION_CLEANUP_INTERVAL_HOURS` | string | `24` | Hours between retention cleanup runs. |
| `config.RETENTION_BATCH_SIZE` | string | `1000` | Number of rows deleted per cleanup batch. |
### Logging and Deployment
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `config.LOG_LEVEL` | string | `INFO` | Python logging level. Options: `DEBUG`, `INFO`, `WARNING`, `ERROR`. |
| `config.JSON_LOGS` | string | `true` | Emit structured JSON logs when `true`. |
| `config.DEPLOY_STAGE` | string | `""` (empty) | Deployment stage identifier. Used to isolate Redis keys and MinIO buckets per stage (e.g., `beta`, `paper`). |
### Alerting
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `config.ALERT_SOURCE_FAILURE_THRESHOLD` | string | `3` | Number of consecutive source failures before firing an alert. |
| `config.ALERT_SOURCE_FAILURE_WINDOW_HOURS` | string | `6` | Time window (hours) for evaluating source failure count. |
| `config.ALERT_SCHEMA_FAILURE_RATE_THRESHOLD` | string | `0.3` | Schema validation failure rate (0.01.0) that triggers an alert. |
| `config.ALERT_SCHEMA_FAILURE_WINDOW_HOURS` | string | `1` | Time window (hours) for evaluating schema failure rate. |
| `config.ALERT_LAKE_LAG_THRESHOLD_MINUTES` | string | `60` | Minutes of lakehouse publish lag before alerting. |
| `config.ALERT_BROKER_ERROR_THRESHOLD` | string | `3` | Number of broker errors before firing an alert. |
| `config.ALERT_BROKER_ERROR_WINDOW_HOURS` | string | `1` | Time window (hours) for evaluating broker error count. |
| `config.ALERT_CHECK_INTERVAL_SECONDS` | string | `120` | Seconds between alert evaluation cycles. |
---
## `secrets` — Kubernetes Secrets
Secrets are rendered into five Kubernetes Secret objects. In the base `values.yaml`, all secret values default to empty strings. Inject real values at deploy time using `--set` flags or a values override file.
### Secret Objects
| Secret Name | Values Key | Consumed By |
|-------------|-----------|-------------|
| `stonks-core-secrets` | `secrets.core` | All services |
| `stonks-broker-secrets` | `secrets.broker` | ingestion, trading-engine, risk-engine, broker-adapter |
| `stonks-market-secrets` | `secrets.market` | ingestion |
| `stonks-gmail-secrets` | `secrets.gmail` | trading-engine |
| `stonks-dashboard-secrets` | `secrets.dashboard` | superset |
### `secrets.core`
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `POSTGRES_PASSWORD` | string | `""` | PostgreSQL password. |
| `MINIO_ACCESS_KEY` | string | `""` | MinIO access key (AWS-style). |
| `MINIO_SECRET_KEY` | string | `""` | MinIO secret key. |
| `REDIS_PASSWORD` | string | `""` | Redis authentication password. |
### `secrets.broker`
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `BROKER_API_KEY` | string | `""` | Broker API key (e.g., Alpaca paper trading key). |
| `BROKER_API_SECRET` | string | `""` | Broker API secret. |
| `BROKER_BASE_URL` | string | `""` | Broker API base URL (e.g., `https://paper-api.alpaca.markets`). |
### `secrets.market`
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `MARKET_DATA_API_KEY` | string | `""` | Market data provider API key (e.g., Polygon.io). |
### `secrets.gmail`
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `GMAIL_SENDER` | string | `celes@celestium.life` | Gmail sender address for trading notifications. |
| `GMAIL_RECIPIENT` | string | `celes@celestium.life` | Gmail recipient address for trading notifications. |
| `GMAIL_APP_PASSWORD` | string | `""` | Gmail app password for SMTP authentication. |
### `secrets.dashboard`
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `SUPERSET_SECRET_KEY` | string | `""` | Flask secret key for Superset session encryption. |
| `SUPERSET_ADMIN_PASSWORD` | string | `""` | Superset admin user password. |
### Injecting Secrets at Deploy Time
```bash
helm upgrade --install stonks-oracle infra/helm/stonks-oracle \
-n stonks-oracle \
--set secrets.core.POSTGRES_PASSWORD="<password>" \
--set secrets.core.MINIO_ACCESS_KEY="<key>" \
--set secrets.core.MINIO_SECRET_KEY="<secret>" \
--set secrets.core.REDIS_PASSWORD="<password>" \
--set secrets.broker.BROKER_API_KEY="<key>" \
--set secrets.broker.BROKER_API_SECRET="<secret>" \
--set secrets.broker.BROKER_BASE_URL="https://paper-api.alpaca.markets" \
--set secrets.market.MARKET_DATA_API_KEY="<key>" \
--set secrets.gmail.GMAIL_APP_PASSWORD="<password>" \
--set secrets.dashboard.SUPERSET_SECRET_KEY="<key>" \
--set secrets.dashboard.SUPERSET_ADMIN_PASSWORD="<password>"
```
---
## `ingress` — Ingress Configuration
Controls Traefik Ingress resources with TLS via cert-manager.
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `ingress.enabled` | bool | `true` | Create Ingress resources. Set to `false` for port-forward-only access. |
| `ingress.className` | string | `traefik` | Kubernetes IngressClass name. |
| `ingress.clusterIssuer` | string | `ca-issuer` | cert-manager ClusterIssuer for TLS certificates. |
### Host Mappings
| Key | Default | Routes To | Port |
|-----|---------|-----------|------|
| `ingress.hosts.queryApi` | `stonks-api.celestium.life` | query-api Service | 8000 |
| `ingress.hosts.symbolRegistry` | `stonks-registry.celestium.life` | symbol-registry Service | 8000 |
| `ingress.hosts.dashboard` | `stonks.celestium.life` | dashboard Service | 8080 |
| `ingress.hosts.superset` | `stonks-dash.celestium.life` | superset Service | 8088 |
| `ingress.hosts.trino` | `stonks-trino.celestium.life` | trino Service | 8080 |
| `ingress.hosts.tradingEngine` | `stonks-trading.celestium.life` | trading-engine Service | 8000 |
Setting `superset` or `trino` host to an empty string (`""`) disables that Ingress resource (the template uses a conditional check).
Each Ingress resource gets a dedicated TLS secret (e.g., `stonks-api-tls`, `stonks-registry-tls`) automatically provisioned by cert-manager.
---
## Analytics Stack — Trino, Hive Metastore, Superset
The analytics stack provides SQL-based querying over the lakehouse data stored in MinIO. Each component can be independently enabled or disabled.
### `trino`
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `trino.enabled` | bool | `true` | Deploy the Trino coordinator. |
| `trino.resources.requests.cpu` | string | `500m` | CPU request. |
| `trino.resources.requests.memory` | string | `1Gi` | Memory request. |
| `trino.resources.limits.cpu` | string | `2` | CPU limit. |
| `trino.resources.limits.memory` | string | `4Gi` | Memory limit. |
When enabled, Trino deploys with two auto-configured catalogs:
- **`lakehouse`** — Hive connector for Parquet fact tables in MinIO.
- **`iceberg`** — Iceberg connector for Iceberg-format tables.
Both catalogs connect to the Hive Metastore for schema metadata and to MinIO for data via S3A. MinIO credentials are read from `stonks-core-secrets`.
### `hiveMetastore`
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `hiveMetastore.enabled` | bool | `true` | Deploy the Hive Metastore. |
| `hiveMetastore.storageSize` | string | `1Gi` | PersistentVolumeClaim size for the embedded Derby metastore database. |
| `hiveMetastore.resources.requests.cpu` | string | `200m` | CPU request. |
| `hiveMetastore.resources.requests.memory` | string | `512Mi` | Memory request. |
| `hiveMetastore.resources.limits.cpu` | string | `1` | CPU limit. |
| `hiveMetastore.resources.limits.memory` | string | `1Gi` | Memory limit. |
Uses `apache/hive:4.0.0` with an embedded Derby database. The Thrift metastore listens on port 9083. MinIO credentials are injected from `stonks-core-secrets` via an init container that generates `core-site.xml` and `metastore-site.xml`.
### `superset`
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `superset.enabled` | bool | `true` | Deploy Apache Superset. |
| `superset.storageSize` | string | `2Gi` | PersistentVolumeClaim size for Superset home directory. |
| `superset.resources.requests.cpu` | string | `200m` | CPU request. |
| `superset.resources.requests.memory` | string | `512Mi` | Memory request. |
| `superset.resources.limits.cpu` | string | `1` | CPU limit. |
| `superset.resources.limits.memory` | string | `2Gi` | Memory limit. |
Uses a custom image (`registry.celestium.life/stonks-oracle/superset`) with Trino and psycopg2 drivers pre-installed. Superset's metadata database is PostgreSQL (same cluster instance). Redis is used for caching. Credentials come from `stonks-core-secrets` and `stonks-dashboard-secrets`.
Superset listens on port 8088 with a readiness probe at `/health`.
### Disabling the Analytics Stack
To disable the entire analytics stack (e.g., in beta environments):
```yaml
trino:
enabled: false
hiveMetastore:
enabled: false
superset:
enabled: false
```
---
## `networkPolicies` — Network Policy Configuration
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `networkPolicies.enabled` | bool | `true` | Deploy NetworkPolicy resources. |
When enabled, the chart creates a **default-deny-ingress** policy that blocks all inbound traffic to every pod in the namespace. Individual allow policies are then created for services that need ingress:
| Policy | Target Pod | Allowed Sources | Port |
|--------|-----------|-----------------|------|
| `allow-query-api-ingress` | `query-api` | kube-system (Traefik), dashboard | 8000 |
| `allow-symbol-registry-ingress` | `symbol-registry` | kube-system (Traefik), dashboard | 8000 |
| `allow-risk-engine-ingress` | `risk` | broker-adapter, query-api, dashboard | 8000 |
| `allow-trading-engine-ingress` | `trading-engine` | query-api, dashboard, kube-system (Traefik) | 8000 |
| `allow-superset-ingress` | `superset` | kube-system (Traefik) | 8088 |
| `allow-trino-ingress` | `trino` | superset, query-api, kube-system (Traefik) | 8080 |
| `allow-hive-metastore-ingress` | `hive-metastore` | trino, lake-publisher | 9083 |
| `allow-dashboard-ingress` | `dashboard` | kube-system (Traefik) | 8080 |
| `deny-broker-adapter-ingress` | `broker-adapter` | (none — explicit deny) | — |
The trading-engine also has egress rules allowing outbound connections to PostgreSQL (5432), Redis (6379), HTTPS (443), SMTP (587), and DNS (53).
Pipeline workers (scheduler, ingestion, parser, extractor, aggregation, recommendation, lake-publisher) have no explicit ingress allow policies — they rely on the default-deny and communicate only via outbound connections to Redis queues and PostgreSQL.
---
## Value Override Files
The chart ships with two override files for staged deployments. ArgoCD or Kargo applies these during promotion.
### `values-beta.yaml` — Beta / Integration Testing
**Purpose**: Integration testing environment deployed to `stonks-oracle-beta` namespace. Shares infrastructure with paper but uses isolated database (`stonks_beta`), Redis DB index (`1`), and separate ingress hostnames.
Key overrides:
| Key | Beta Value | Reason |
|-----|-----------|--------|
| `pipelineEnabled` | `true` | Services deployed (ArgoCD health checks), but pipeline defaults to OFF via `PIPELINE_DEFAULT_OFF`. |
| `config.DEPLOY_STAGE` | `beta` | Isolates Redis keys (`stonks:beta:*`) and MinIO buckets (`beta-stonks-*`). |
| `config.POSTGRES_DB` | `stonks_beta` | Separate database for beta data. |
| `config.REDIS_DB` | `1` | Separate Redis DB index. |
| `config.LOG_LEVEL` | `DEBUG` | Verbose logging for debugging. |
| `config.TRADING_ENABLED` | `false` | Safety net — no order submission in beta. |
| `config.PIPELINE_DEFAULT_OFF` | `true` | Scheduler won't enqueue jobs unless explicitly enabled. |
| `config.OLLAMA_MODEL` | `qwen3.6` | May use a different model version for testing. |
| `trino.enabled` | `false` | Analytics stack disabled in beta. |
| `hiveMetastore.enabled` | `false` | Analytics stack disabled in beta. |
| `superset.enabled` | `false` | Analytics stack disabled in beta. |
Beta ingress hostnames:
| Service | Hostname |
|---------|----------|
| Query API | `stonks-api-beta.celestium.life` |
| Symbol Registry | `stonks-registry-beta.celestium.life` |
| Dashboard | `stonks-beta.celestium.life` |
| Trading Engine | `stonks-trading-beta.celestium.life` |
| Superset | (disabled) |
| Trino | (disabled) |
### `values-paper.yaml` — Paper Trading
**Purpose**: Paper trading environment with real market data but simulated order execution via Alpaca's paper trading API. Deployed to the main `stonks-oracle` namespace.
Key overrides:
| Key | Paper Value | Reason |
|-----|-----------|--------|
| `config.BROKER_MODE` | `paper` | Simulated order execution. |
| `config.BROKER_PROVIDER` | `alpaca` | Alpaca paper trading API. |
| `config.TRADING_ENABLED` | `true` | Trading engine active. |
| `config.POSTGRES_DB` | `stonks_paper` | Separate database for paper trading data. |
| `config.POSTGRES_USER` | `stonks_paper` | Separate database user. |
| `config.REDIS_DB` | `2` | Separate Redis DB index. |
| `config.DEPLOY_STAGE` | `paper` | Stage identifier. |
| `config.LOG_LEVEL` | `INFO` | Standard logging. |
| `services.extractor.replicas` | `1` | Single replica (GPU bottleneck). |
Paper ingress hostnames:
| Service | Hostname |
|---------|----------|
| Query API | `stonks-paper-api.celestium.life` |
| Symbol Registry | `stonks-paper-registry.celestium.life` |
| Dashboard | `stonks-paper.celestium.life` |
| Superset | `stonks-paper-dash.celestium.life` |
| Trino | `stonks-paper-trino.celestium.life` |
| Trading Engine | `stonks-paper-trading.celestium.life` |
### Deployment Stage Progression
```
values-beta.yaml values-paper.yaml values.yaml (base)
Beta → Paper Trading → Production
Integration Simulated orders Live trading
testing Real market data Real orders
Pipeline OFF Pipeline ON Pipeline ON
Trading OFF Trading ON Trading ON
Analytics OFF Analytics ON Analytics ON
```
Promotion between stages is managed by Kargo/ArgoCD. CI sets the image tag, and the promotion pipeline applies the appropriate values file.
@@ -0,0 +1,130 @@
# Page 1 — Data Ingestion and Preparation
Every signal that Stonks Oracle eventually acts on begins its life as raw data pulled from an external source. Before any AI agent can extract structured intelligence, before any trend can accumulate, and before any trade can be placed, the platform must first discover new content, fetch it reliably, eliminate duplicates, store the raw artifacts for audit, and normalize the text into a form suitable for downstream processing. This page traces that journey from external API to parser output, covering the Scheduler, Ingestion Worker, deduplication layer, raw storage, and Parser in detail.
For a visual overview of the full flow described here, see the [Ingestion to Extraction Flow diagram](diagrams/ingestion-to-extraction-flow.md).
---
## Four Categories of Input Data
Stonks Oracle tracks 50 companies across 10 sectors, and it draws intelligence from four distinct categories of external data. Each category has its own adapter, its own API conventions, and its own scheduling cadence, but all of them feed into the same ingestion pipeline.
The first category is **company news**, sourced from the Polygon.io ticker news endpoint (`/v2/reference/news`). The `PolygonNewsAdapter` in `services/adapters/news_adapter.py` fetches articles linked to a specific ticker, returning structured results that include title, publisher, article URL, description, keywords, and publication timestamp. Each request can return up to 1,000 articles, though the default limit is 20 per fetch. The adapter tracks the most recent `published_utc` value and uses it on subsequent fetches to avoid re-retrieving articles the system has already seen.
The second category is **SEC filings**, sourced from the SEC EDGAR full-text search system (EFTS). The `SECEdgarAdapter` in `services/adapters/filings_adapter.py` queries the `/LATEST/search-index` endpoint for 8-K, 10-Q, 10-K, and other form types associated with a company's ticker or CIK number. Unlike the Polygon endpoints, EDGAR is a public API that requires no key — only a descriptive `User-Agent` header per the SEC's fair-access policy. The adapter deduplicates results by accession number (`adsh`), filters out non-primary documents like XML fragments and graphics, and constructs the SEC EDGAR filing index URL for each hit so downstream services can fetch the full document.
The third category is **market data**, also sourced from Polygon.io. The `PolygonMarketAdapter` in `services/adapters/market_adapter.py` supports multiple endpoints: previous-day aggregate bars (`/v2/aggs/ticker/{ticker}/prev`), range bars for custom date windows, intraday hourly bars, grouped daily bars that return data for all tickers in a single call (`/v2/aggs/grouped/locale/us/market/stocks/{date}`), and ticker detail lookups. Market data follows a different path than textual content — it does not pass through the Parser or Extractor, since the structured numeric data is already in a usable form.
The fourth category is **macro and geopolitical news**, fetched by the `MacroNewsAdapter` in `services/adapters/macro_news_adapter.py`. Unlike the other three categories, macro news is not company-specific. These sources have `source_type='macro_news'` in the `sources` database table and may have a `NULL` `company_id`. The adapter fetches from a configurable HTTP endpoint (typically the Polygon news API filtered for broad market topics) and returns articles that describe global events — trade policy shifts, central bank decisions, geopolitical conflicts — rather than company-specific developments. Macro news articles are eventually classified by the Global Event Classifier agent and routed through a separate queue, as described in [Page 2](02-ai-agent-processing-and-extraction.md).
All four adapter classes inherit from `BaseAdapter` defined in `services/adapters/base.py` and return an `AdapterResult` dataclass containing the raw payload bytes, a SHA-256 content hash, a list of parsed item dicts, HTTP metadata (status code, response time), and an error field that is `None` on success. This uniform interface allows the Ingestion Worker to handle all source types through a single dispatch mechanism.
---
## The Scheduler: Orchestrating Ingestion Cycles
The Scheduler (`services/scheduler/app.py`) is the heartbeat of the ingestion pipeline. It runs a continuous loop that ticks every 15 seconds (`SCHEDULER_TICK = 15`), and on each tick it evaluates which sources are due for their next fetch. The Scheduler does not fetch data itself — it enqueues jobs onto the `stonks:queue:ingestion` Redis list for the Ingestion Worker to process.
Each source type has a default polling cadence defined in the `DEFAULT_CADENCES` dictionary:
| Source Type | Default Cadence |
|---------------|-----------------|
| `market_api` | 300 seconds |
| `news_api` | 300 seconds |
| `filings_api` | 3,600 seconds |
| `macro_news` | 600 seconds |
| `web_scrape` | 1,800 seconds |
| `broker` | 30 seconds |
Individual sources can override their cadence via the `polling_interval_seconds` field in their `config` JSONB column in the `sources` table. The `get_cadence_for_source()` function checks for this override first, falling back to the default if none is set, and enforces a minimum interval of 10 seconds.
The Scheduler determines whether a source is due by calling `is_source_due()`, which considers several conditions. If a source has never run before (no entry in the `ingestion_runs` table), it is immediately due. If the last run failed, the Scheduler respects an exponential backoff computed by `compute_backoff()`: the delay starts at 60 seconds (`DEFAULT_BACKOFF_BASE`) and doubles with each retry up to a maximum of 3,600 seconds (`MAX_BACKOFF`). If a source has failed 10 consecutive times (`MAX_RETRY_COUNT`), the Scheduler stops scheduling it entirely until an operator manually resets the retry state. If the last run is still marked as `running`, the source is skipped to prevent double-scheduling. Otherwise, the Scheduler checks whether enough time has elapsed since the last completed run based on the source's cadence.
Rate limiting adds another layer of protection. The `check_rate_limit()` function enforces two constraints. First, each source type has a per-type limit defined in `DEFAULT_RATE_LIMITS` — for example, `market_api` and `news_api` are each capped at 20 requests per minute, while `filings_api` and `macro_news` are capped at 10. Second, because `market_api` and `news_api` both use the same Polygon.io API key, a global Polygon rate limit of 45 requests per minute (`POLYGON_GLOBAL_RATE_LIMIT`) is enforced across both types combined. Rate limit state is tracked in Redis using keys of the form `stonks:ratelimit:{source_type}:{window}`, where the window is a minute-granularity timestamp. If a source type exceeds its limit, the Scheduler logs a warning and skips that source for the current tick.
The Scheduler handles three categories of sources in each cycle. First, it fetches all active company-specific sources (excluding `macro_news`) by joining the `sources` and `companies` tables. Second, it fetches active macro news sources separately, since these may not have a `company_id`. Third, it fetches global market sources — those with `source_type='market_api'` and `company_id IS NULL` — which represent endpoints like the grouped daily bars that return data for all tickers in a single API call. For intraday bar sources, the Scheduler expands a single global source into per-ticker jobs for every active company.
Each enqueued job payload includes the `source_id`, `company_id`, `ticker`, `legal_name`, `source_type`, `source_name`, `config`, `credibility_score`, a list of company `aliases` (fetched from the `company_aliases` table), and a `scheduled_at` timestamp. The job is pushed onto `stonks:queue:ingestion` via Redis `RPUSH`.
Beyond scheduling, the Scheduler also performs periodic maintenance. Every ~20 cycles (~5 minutes), it runs `recover_stale_documents()` to re-enqueue documents that have been stuck in `parsed` status for longer than 240 minutes — a safety net for cases where Redis loses queue entries due to pod restarts or OOM events. Every ~40 cycles (~10 minutes), it runs `retry_failed_extractions()` to give documents in `extraction_failed` status another chance, resetting them to `parsed` and deleting the failed `document_intelligence` row so the Extractor treats them as fresh. Every ~100 cycles (~25 minutes), it runs `cleanup_all_tables()` to enforce retention policies across tables like `competitive_signal_records` (30 days), `ingestion_runs` (14 days), and `trading_decisions` (90 days).
For more detail on the Scheduler's configuration and operational behavior, see the [Services Reference](../services.md).
---
## The Ingestion Worker: Adapter Dispatch and Persistence
The Ingestion Worker (`services/ingestion/worker.py`) is a long-running process that continuously pops jobs from the `stonks:queue:ingestion` Redis list and processes them. On startup, it initializes one instance of each adapter class and stores them in a dispatch dictionary keyed by `source_type`:
```
adapters = {
"market_api": PolygonMarketAdapter(...),
"news_api": PolygonNewsAdapter(...),
"filings_api": SECEdgarAdapter(),
"web_scrape": WebScrapeAdapter(),
"broker": AlpacaBrokerAdapter(...),
"macro_news": MacroNewsAdapter(...),
}
```
When a job arrives, the `process_job()` function looks up the appropriate adapter by `source_type` and calls its `fetch()` method with the ticker and source config. Before fetching, it records a new row in the `ingestion_runs` table with status `running`. If the adapter returns an error, the worker calls `record_retrieval_failure()` to update the run status and increment the source's retry counter with exponential backoff timing.
On a successful fetch, the worker performs several steps in sequence. First, it uploads the raw payload to MinIO via `upload_raw_artifact()` in `services/shared/storage.py`. The target bucket is determined by the source type through the `SOURCE_BUCKET_MAP`: `market_api` payloads go to `stonks-raw-market`, `news_api` and `macro_news` payloads go to `stonks-raw-news`, and `filings_api` payloads go to `stonks-raw-filings`. Objects are stored under a path that encodes the source type, ticker, date hierarchy, and document ID — for example, `news_api/AAPL/2025/01/15/{run_id}/raw.json`.
---
## Content Deduplication via Redis
After storing the raw artifact, the Ingestion Worker checks for duplicate content. Deduplication operates at two levels.
At the payload level, the worker checks the overall `content_hash` (a SHA-256 digest of the raw API response) against Redis. The key pattern is `stonks:dedupe:{content_hash}` with a 24-hour TTL (86,400 seconds). If the hash is already present, the entire payload is skipped — the `ingestion_runs` row is marked as completed with `items_new=0`, and no downstream jobs are enqueued. If the hash is new, the worker sets the marker in Redis so future fetches of identical content are caught.
At the individual item level, for source types other than `market_api` and `broker`, the worker calls `dedupe_items()` from `services/shared/dedupe.py`. This function checks each item against a layered deduplication strategy. The fast path checks Redis for both content-hash markers (`stonks:dedupe:{hash}`) and canonical-URL markers (`stonks:dedupe:url:{url_hash}`), both with 24-hour TTLs. If the Redis check misses, the function falls back to PostgreSQL, querying the `documents` table by `content_hash` or `canonical_url` for durable cross-source matching. When a duplicate is found through the PostgreSQL fallback, the function warms the Redis cache so subsequent checks are fast.
Items identified as duplicates are not discarded entirely. If the duplicate document was originally ingested for a different company, the worker creates a cross-source mention link in the `document_company_mentions` table via `persist_document_company_mention()`. This ensures that a news article mentioning both Apple and Microsoft is linked to both companies even if it was first ingested through Apple's news source.
New (non-duplicate) items are persisted to PostgreSQL through `persist_ingestion_items()` in `services/shared/metadata.py`, which inserts rows into the `documents` table and records company mentions in `document_company_mentions`. Each new document ID is then pushed onto `stonks:queue:parsing` for the Parser to process. After persistence, the worker calls `mark_as_seen()` to set Redis dedupe markers for both the content hash and canonical URL of each new item, ensuring that the next fetch cycle's deduplication checks are fast.
On successful completion, the worker updates the `ingestion_runs` row with the final counts (`items_fetched`, `items_new`) and calls `reset_source_retry_state()` to clear any accumulated backoff from previous failures. For news-type sources (`news_api` and `macro_news`), the worker also updates the source's `config` JSONB column with the latest `published_utc` value, so the next fetch only retrieves newer articles.
---
## The Parser: Normalization, Quality Scoring, and Routing
Documents that pass through ingestion arrive on the `stonks:queue:parsing` Redis list as JSON payloads containing a `document_id`, `ticker`, and `source_type`. The Parser Worker (`services/parser/worker.py`) pops these jobs and transforms raw HTML or text into normalized, quality-scored documents ready for AI extraction.
The parsing pipeline begins with HTML fetching. If the document has a URL (looked up from the `documents` table if not present in the job payload), the worker calls `fetch_html()` to retrieve the page content. SEC EDGAR URLs receive a specialized `User-Agent` header to comply with the SEC's fair-access policy. The raw HTML is then passed to `parse_html()` in `services/parser/html_parser.py`, which runs a multi-stage extraction pipeline.
The HTML parser first strips non-content tags — `script`, `style`, `nav`, `footer`, `header`, `aside`, `iframe`, and others — and removes boilerplate containers identified by CSS class or ID patterns (sidebars, ad slots, newsletter signups, social share bars, and similar UI elements). It then searches for the article body using a priority list of semantic selectors (`article`, `[role='main']`, `.article-body`, `.post-content`, and others). If no semantic match is found, it falls back to text-density scoring across candidate `div`, `section`, and `td` elements, selecting the block with the highest composite score based on text density, link density, paragraph count, and word count. The extracted text undergoes further cleaning: regex-based removal of residual boilerplate phrases (copyright notices, "subscribe to our newsletter" prompts, "share this article" fragments), removal of short orphan lines that are likely UI fragments, detection and collapse of repeated template blocks, and whitespace normalization.
Metadata extraction pulls the document title (from `og:title` or `<title>`), author, publisher (from `og:site_name` or hostname), publication date (from `article:published_time` or JSON-LD `datePublished`), canonical URL, language, description, and keywords from the HTML head elements.
If the parsed body text is shorter than 500 characters, the worker attempts to enrich it by reading the raw API payload from MinIO and extracting the Polygon article description, keywords, and author fields for the matching article. This enrichment step ensures that even articles with minimal scrapeable HTML still have enough textual content for meaningful AI extraction.
Quality scoring is performed by `score_parse_quality()` in `services/parser/html_parser.py`, which evaluates six weighted signals to produce a composite score between 0 and 0.95:
| Signal | Weight | What It Measures |
|--------------------|--------|-----------------------------------------------------------------|
| `word_count` | 0.30 | Length of extracted text (thresholds at 20, 50, 150, 300 words) |
| `body_found` | 0.20 | Whether a semantic article body element was located |
| `diversity` | 0.15 | Vocabulary richness (unique words / total words) |
| `sentence` | 0.15 | Presence of proper sentence structure (terminal punctuation) |
| `paragraph` | 0.10 | Multi-paragraph structure (blocks separated by blank lines) |
| `metadata` | 0.10 | Presence of title, author, publisher, and publication date |
The composite score maps to a confidence label: scores below 0.35 are labeled `low`, scores between 0.35 and 0.65 are `medium`, and scores 0.65 and above are `high`. Documents with `low` confidence are marked with status `low_quality` in the `documents` table and are not enqueued for extraction — they are effectively filtered out of the pipeline at this stage.
Company mention detection runs next. The worker fetches all known aliases from the `company_aliases` table (plus tickers and legal names from the `companies` table) and calls `detect_company_mentions()` in `services/parser/html_parser.py`. The matching strategy varies by alias length: one-to-two character aliases use case-sensitive word-boundary matching to avoid false positives (the letter "A" should not match every occurrence of the word "a"), three-to-four character aliases use case-insensitive word-boundary matching (standard ticker format), and aliases of five or more characters use case-insensitive substring matching (company names and brands). Confidence scores vary by alias type: ticker matches receive 0.9, legal name matches 0.85, general aliases 0.7, and brand matches 0.6. Multiple alias hits for the same company are deduplicated, keeping the highest-confidence match and summing match counts. Detected mentions are persisted to the `document_company_mentions` table.
The normalized text and a structured parser output JSON (containing all metadata, quality signals, warnings, outbound links, tags, and mentions) are uploaded to the `stonks-normalized` MinIO bucket. The `documents` row is updated with the normalized storage reference, parser output reference, quality score, and confidence level.
Finally, the Parser makes a routing decision. If the document's `document_type` is `macro_event`, it is pushed onto `stonks:queue:macro_classification` for the Global Event Classifier agent. All other documents are pushed onto `stonks:queue:extraction` for the Document Intelligence Extractor agent. Both queues feed into the Extractor service described in [Page 2](02-ai-agent-processing-and-extraction.md). The job payload includes the `document_id`, `ticker`, and the first 32,000 characters of the normalized text, giving the downstream agent immediate access to the content without needing to fetch it from MinIO.
For additional detail on queue topology and data store layout, see the [Data Pipeline Architecture](../architecture-data-pipeline.md) documentation.
---
## What Comes Next
At this point, raw data has been fetched from four external sources, deduplicated, stored in MinIO, parsed into normalized text, scored for quality, tagged with company mentions, and routed to the appropriate extraction queue. The documents sitting on `stonks:queue:extraction` and `stonks:queue:macro_classification` are clean, quality-filtered, and ready for AI processing. [Page 2 — AI Agent Processing and Structured Extraction](02-ai-agent-processing-and-extraction.md) picks up the story from here, explaining how the Document Intelligence Extractor and Global Event Classifier agents use LLM inference to transform these normalized documents into the structured JSON intelligence that feeds the rest of the pipeline.
@@ -0,0 +1,164 @@
# Page 2 — AI Agent Processing and Structured Extraction
Documents that arrive on the `stonks:queue:extraction` and `stonks:queue:macro_classification` Redis queues are clean, quality-filtered, and normalized — but they are still unstructured text. The job of the Extractor service is to transform that text into structured JSON intelligence that the rest of the pipeline can reason about quantitatively. Two AI agents share this responsibility: the Document Intelligence Extractor handles company-specific news, filings, and transcripts, while the Global Event Classifier handles macro-level geopolitical and economic events. Both agents run through the same Ollama-based inference infrastructure, share a common JSON repair pipeline, and persist their results to PostgreSQL and MinIO for downstream consumption and audit.
This page explains how each agent works, what schemas they produce, how the system validates and repairs LLM output, how runtime configuration is resolved from the database, and how the final structured records are persisted. For a visual overview of the full flow from ingestion through extraction, see the [Ingestion to Extraction Flow diagram](diagrams/ingestion-to-extraction-flow.md). For reference-level detail on agent configuration and the variant management API, see the [AI Agents Guide](../ai-agents.md).
---
## The Document Intelligence Extractor
The Document Intelligence Extractor is the primary AI agent in the pipeline. Registered under the slug `document-extractor` in the `ai_agents` database table, it processes every non-macro document that passes through the Parser — news articles, SEC filings, earnings transcripts, and press releases. Its purpose is to read a normalized document and produce a structured JSON object that captures the document's summary, the companies it affects, the sentiment and impact for each company, the catalysts driving that impact, and the evidence supporting the analysis.
The entry point is `services/extractor/main.py`, which runs a continuous worker loop polling the `stonks:queue:extraction` Redis list. When a job arrives, the worker extracts the `document_id`, `ticker`, and `text` fields from the JSON payload. If the job payload does not include the document text directly, the worker fetches it from MinIO using the `normalized_storage_ref` stored in the `documents` table — the Parser uploaded the normalized text to the `stonks-normalized` bucket during the previous pipeline stage (see [Page 1](01-data-ingestion-and-preparation.md)).
The actual LLM inference is handled by `OllamaClient` in `services/extractor/client.py`. The client sends the document to a local Ollama instance via the `/api/chat` HTTP endpoint with `stream=False` and `think=False`. The `think=False` flag is a deliberate performance choice — it disables the model's chain-of-thought reasoning phase, which would otherwise add two to four minutes of latency per document. The client does not use Ollama's `format` parameter for structured output because of a known Ollama bug (#14645) where the format constraint is silently ignored when `think=False` on qwen3.5 models. Instead, the system relies on prompt engineering to produce JSON and repairs any syntax issues after the fact.
The prompt sent to the model has two parts. The system prompt, defined in `services/extractor/prompts.py`, establishes the model's role as a financial document analyst and sets strict output rules: return only a single JSON object, no markdown fences, no explanation text, every schema field is required, use `"other"` for `catalyst_type` when unsure, keep evidence spans under 20 words, and limit key facts to three to five items. The user prompt, built by `build_extraction_prompt()` in the same module, provides the document text along with document-type-specific guidance. Four guidance variants exist — one each for articles, filings, transcripts, and press releases — each calibrated to the conventions and biases of that document type. For example, the filing guidance instructs the model to preserve the precise legal language of SEC documents, while the press release guidance warns that sentiment may be biased positive and directs the model to focus on concrete metrics rather than marketing language.
The user prompt also includes a list of all tracked tickers from the `companies` table, along with rules for how the model should use them. If a tracked ticker appears verbatim in the text, the model must include it in the output with at least one evidence span. If the article discusses a sector or theme that clearly affects a tracked company (oil prices affecting XOM, AI chip demand affecting NVDA), the model should include that company as well. The model is explicitly told not to invent tickers that are not in the provided list. Documents longer than 8,000 characters are truncated before being included in the prompt, with a `[... truncated for extraction ...]` marker appended.
The `OllamaClient` also supports a `context_window` override via the Ollama `num_ctx` option, which can be configured per agent variant through the `AgentConfigResolver` mechanism described later in this page.
---
## The ExtractionResult Schema
The structured output that the Document Intelligence Extractor produces is defined by the `ExtractionResult` Pydantic model in `services/extractor/schemas.py`. Every field is required — the model has no defaults — so the generated JSON schema forces the LLM to produce every field explicitly. The top-level fields are:
**`summary`** — a concise one-to-three sentence summary of the document's main point. This becomes the human-readable description stored in the `document_intelligence` table.
**`companies`** — an array of `CompanyExtractionItem` objects, one per affected company. Each company entry contains:
- `ticker` — the stock ticker symbol (validated against a regex pattern of one to five uppercase letters).
- `company_name` — the full company name as referenced in the document.
- `relevance` — a float between 0.0 and 1.0 indicating how relevant the document is to this company, where 0 means tangential and 1 means the company is the primary subject.
- `sentiment` — one of `positive`, `negative`, `neutral`, or `mixed`, representing the overall sentiment toward this company in the document.
- `impact_score` — a float between 0.0 and 1.0 estimating the magnitude of impact, where 0 is negligible and 1 is highly material.
- `impact_horizon` — one of `intraday`, `1d`, `1d_7d`, `1d_30d`, `30d_90d`, or `90d_plus`, indicating the expected timeframe over which the impact will play out.
- `catalyst_type` — exactly one of `earnings`, `product`, `legal`, `macro`, `supply_chain`, `m_and_a`, `rating_change`, or `other`. The prompt instructs the model to use `other` when none of the specific categories fit.
- `key_facts` — a list of facts explicitly stated in the document. The prompt emphasizes that the model must not infer or fabricate facts.
- `risks` — a list of risks explicitly mentioned in the document.
- `evidence_spans` — short verbatim quotes from the document supporting the analysis. The prompt requests these be kept under 20 words each.
**`macro_themes`** — a list of broad economic or market themes mentioned in the document, such as `rates`, `inflation`, or `ai_capex`.
**`novelty_score`** — a float between 0.0 and 1.0 indicating how novel or surprising the information is. Routine earnings reports score low; unexpected regulatory actions score high. This value feeds into the novelty bonus component of the signal weighting formula described in [Page 3](03-signal-scoring-and-weighted-signals.md).
**`confidence`** — a float between 0.0 and 1.0 representing the model's confidence in the accuracy of its extraction. Lower values indicate ambiguous or incomplete source text. This value becomes the confidence gate input for signal scoring.
**`extraction_warnings`** — a list of issues encountered during extraction, such as `ambiguous_ticker`, `incomplete_text`, or `low_confidence`. These warnings are persisted alongside the intelligence record for operational monitoring.
The JSON schema is generated programmatically from the Pydantic models via `generate_json_schema()` in `services/extractor/schemas.py`, which calls Pydantic's `model_json_schema()` and then inlines all `$defs` references so the schema is self-contained and Ollama-friendly.
---
## The Global Event Classifier
Not all documents describe company-specific developments. Macro news articles — those tagged with `document_type='macro_event'` by the Parser — describe events that affect entire markets, sectors, or economies: trade wars, central bank rate decisions, commodity supply disruptions, geopolitical conflicts. These documents are routed to the `stonks:queue:macro_classification` Redis queue and processed by the Global Event Classifier agent, registered under the slug `event-classifier` in the `ai_agents` table.
The classifier is implemented in `services/extractor/event_classifier.py`. When the extractor worker in `services/extractor/main.py` pops a job and determines that the document type is `macro_event` (either because the job came from the macro queue or because the `documents` table records it as such), it routes the document to `_process_macro_classification()` instead of the standard extraction pipeline. This function calls `classify_global_event()`, which builds a dedicated prompt, sends it to Ollama through the same `OllamaClient` infrastructure, parses the response, and persists the result.
The classifier's system prompt is distinct from the extractor's. It establishes the model's role as a macro-level news classifier and includes explicit anti-hallucination rules that are critical to preventing the classifier from overreaching. The prompt states that the model should only classify articles about macro events that affect entire markets, sectors, or economies — trade wars, interest rate changes, commodity supply disruptions, regulatory changes, geopolitical conflicts, natural disasters. It explicitly lists what should not be classified as macro events: individual company earnings, lawsuits against a single company, single-company management changes, individual stock analysis, company-specific debt or bankruptcy, and product launches by one company. For these company-specific articles that were incorrectly routed, the model is instructed to set severity to `"low"`, confidence below 0.3, and leave the `affected_regions`, `affected_sectors`, and `affected_commodities` arrays empty.
The user prompt, built by `build_event_classification_prompt()`, reinforces these anti-hallucination rules and provides additional guidance. It instructs the model to only extract facts explicitly stated in the text, to set confidence below 0.4 for vague or speculative content, to distinguish announced policy from rumored policy, and to reserve `"critical"` severity for events affecting multiple countries or entire global markets. Articles longer than 6,000 characters are truncated before inclusion in the prompt.
The output schema is the `GlobalEvent` dataclass, which contains:
- `event_types` — a list of impact type strings, drawn from a fixed set: `supply_disruption`, `demand_shift`, `cost_increase`, `regulatory_pressure`, `currency_impact`, `commodity_shock`, `trade_barrier`, and `geopolitical_risk`. The model is instructed to include all applicable types rather than collapsing to a single category.
- `severity` — one of `low`, `moderate`, `high`, or `critical`.
- `affected_regions` — ISO 3166-1 alpha-2 country codes or region names (e.g., `US`, `CN`, `EU`, `GB`, `JP`). Only regions explicitly mentioned or clearly implied should be included.
- `affected_sectors` — GICS sector identifiers such as `Energy`, `Financials`, `Information Technology`, or `Industrials`.
- `affected_commodities` — commodity identifiers like `crude_oil`, `natural_gas`, `gold`, `copper`, `wheat`, `lithium`, or `semiconductors`. An empty list if no commodities are directly affected.
- `summary` — a one-to-three sentence summary of the event and its market implications.
- `key_facts` — facts explicitly stated in the article, limited to three to five items.
- `estimated_duration` — one of `short_term` (days to weeks), `medium_term` (weeks to months), or `long_term` (months to years).
- `confidence` — a float between 0.0 and 1.0, clamped during parsing.
Each `GlobalEvent` also carries a `model_metadata` object recording the provider (`ollama`), model name, prompt version (`event-classification-v1`), and schema version (`1.0.0`), plus a `source_document_id` linking back to the originating document.
After a successful classification, the system computes macro impact records for all tracked companies using the exposure-based interpolation engine in `services/aggregation/interpolation.py`. Each company's exposure profile — geographic revenue mix, supply chain regions, key input commodities, regulatory jurisdictions, and market position tier — determines how much a given macro event affects that company. Companies with non-zero macro impact scores get `macro_impact_records` rows persisted to PostgreSQL, and aggregation jobs are enqueued to `stonks:queue:aggregation` for each affected ticker. The extractor worker tracks consecutive macro classification failures and emits a critical-level alert after three consecutive failures, continuing with company-only signals in the meantime.
---
## The JSON Repair Pipeline
LLM output is inherently unreliable at the syntactic level. Models sometimes wrap JSON in markdown fences, produce trailing commas, leave strings unterminated, or truncate output mid-object when they hit token limits. The extractor addresses this with a three-stage JSON repair pipeline implemented across `services/extractor/client.py` and `services/extractor/schemas.py`.
The first stage is a direct `json.loads()` call. If the raw model output is already valid JSON, no repair is needed and the pipeline moves straight to validation. This is the fast path for well-behaved model responses.
The second stage strips markdown fences. Models frequently wrap their output in `` ```json ... ``` `` blocks despite being told not to. The `_strip_markdown_fences()` function in `services/extractor/client.py` uses a regex to detect and remove these wrappers before attempting another parse.
The third stage invokes the `json-repair` library as a fallback. The `_repair_json()` function in `services/extractor/client.py` calls `repair_json()` with `return_objects=False` to get a repaired JSON string. This library handles a wide range of common LLM JSON errors — trailing commas, missing quotes, unescaped characters — that would otherwise require custom repair logic.
The `services/extractor/schemas.py` module contains an additional layer of repair logic in its own `_repair_json()` function, which handles cases that the library might miss. It strips non-JSON prefixes (models sometimes prepend explanatory text before the opening brace), removes control characters that break parsing, fixes trailing commas before closing brackets, and as a last resort calls `_repair_truncated_json()` — a state-machine parser that walks the string tracking bracket depth and string state, then appends the necessary closing tokens to complete a truncated JSON object.
For the Global Event Classifier, the `_parse_classification_response()` function in `services/extractor/event_classifier.py` reuses the same `_strip_markdown_fences()` and `_repair_json()` functions from the client module, and additionally handles the case where the model wraps the output object in a single-element list — a quirk observed with some model configurations.
---
## Structural and Semantic Validation
Repairing JSON syntax is only the first step. The `validate_extraction()` function in `services/extractor/schemas.py` performs both structural and semantic validation on the parsed output, and the distinction between the two is important for understanding the retry logic.
Structural validation begins with normalization. The `_normalize_extraction_data()` function fills in missing top-level fields with sensible defaults (empty summary, empty companies array, 0.5 novelty score, 0.3 confidence), clamps numeric fields to the [0.0, 1.0] range, and normalizes per-company fields. Catalyst types that the model produces as free-text alternatives — `"strategic pivot"`, `"acquisition"`, `"lawsuit"`, `"inflation"`, `"launch"` — are mapped to their canonical enum values through a comprehensive alias dictionary. Impact horizons like `"long-term"`, `"short"`, `"immediate"`, or `"near-term"` are similarly mapped to the valid set (`intraday`, `1d`, `1d_7d`, `1d_30d`, `30d_90d`, `90d_plus`). After normalization, the data is validated against the `ExtractionResult` Pydantic model, which enforces type constraints, enum membership, and range bounds.
Semantic validation catches issues that are structurally valid but logically suspect. The `_semantic_checks()` function runs a series of cross-field consistency checks that produce either errors (which trigger a retry) or warnings (which are logged but do not block acceptance). Semantic errors include duplicate tickers across company entries, missing ticker fields, and invalid impact horizon values. Semantic warnings include empty summaries, low confidence with companies present, invalid ticker formats (not matching the one-to-five uppercase letter pattern), missing evidence spans, evidence spans that are too short (under 8 characters) or too long (over 500 characters), high impact scores with no supporting key facts, very low relevance scores, and strong sentiment paired with negligible impact scores.
When the original document text is available, the validator also performs an evidence grounding check: each evidence span is searched for in the source text (case-insensitive), and spans not found in the document are flagged with a warning. This helps detect hallucinated evidence — quotes the model fabricated rather than extracted from the actual text.
If validation produces any semantic errors, the `ValidationReport` is marked as invalid and the `OllamaClient` retry loop treats it as a failed attempt. The retry logic uses exponential backoff with configurable parameters: a base delay (default from `OllamaConfig`), a multiplier applied on each retry, and a maximum delay cap. The number of retries is configurable per agent through the `max_retries` field in the `ai_agents` or `agent_variants` table. Non-retryable errors — HTTP 400, 401, 403, 404, and 422 responses from Ollama — short-circuit the retry loop immediately, since these indicate a problem with the request itself rather than a transient model failure.
Every attempt, whether successful or not, is recorded in an `ExtractionAttempt` dataclass that captures the raw output, validation report, error description, duration in milliseconds, model name, and whether the error was retryable. The full list of attempts is preserved in the `ExtractionResponse` for audit purposes and uploaded to MinIO by the persistence layer.
---
## The AgentConfigResolver: Hot-Swapping Models and Prompts
Both the Document Intelligence Extractor and the Global Event Classifier resolve their runtime configuration through the `AgentConfigResolver` in `services/shared/agent_config.py`. This mechanism allows operators to change models, prompts, timeouts, retry counts, and token budgets without restarting any service — changes take effect within 60 seconds.
The resolver works by querying the `ai_agents` and `agent_variants` PostgreSQL tables with a single SQL statement that uses `COALESCE` to prefer variant values over base agent values. When the extractor worker starts, it creates an `AgentConfigResolver` instance with a 60-second TTL cache and calls `resolver.resolve("document-extractor")` to get the active configuration. If an active variant exists for the agent (enforced by a unique partial index on `agent_variants` that allows at most one active variant per agent), the variant's `model_name`, `system_prompt`, `temperature`, `max_tokens`, `context_window`, `timeout_seconds`, and `max_retries` override the base agent's values wherever the variant provides a non-NULL value. If no active variant exists, the base agent's configuration is used. If the database query fails entirely, the resolver returns `None` and the worker falls back to environment-variable-based `OllamaConfig` defaults.
The resolved configuration is captured in a `ResolvedAgentConfig` frozen dataclass that includes the `agent_id`, `variant_id` (if any), `model_provider`, `model_name`, `system_prompt`, `user_prompt_template`, `prompt_version`, `temperature`, `max_tokens`, `context_window`, `input_token_limit`, `token_budget`, `timeout_seconds`, and `max_retries`. The extractor worker uses this to build an `OllamaConfig` that is passed to the `OllamaClient`.
The 60-second TTL cache means the resolver only hits the database once per minute per agent slug. Cache entries are keyed by slug and timestamped with `time.monotonic()`. When a cached entry expires, the next `resolve()` call re-queries the database and refreshes the cache. The `invalidate()` method can clear a single slug or the entire cache, though in practice the TTL-based expiry is sufficient for normal operations.
The extractor worker re-resolves its configuration every 100 jobs. If the resolved model name has changed (for example, because an operator activated a variant that uses a different model), the worker closes the old `OllamaClient` and creates a new one with the updated configuration. The event classifier is resolved separately and can use a different model than the document extractor — the worker maintains two independent `OllamaClient` instances when the models differ.
Token budget enforcement adds another layer of control. If a variant specifies a `token_budget` (total tokens per hour), the worker checks the `agent_performance_log` table before each invocation to see whether the budget has been exceeded. If so, the invocation is skipped entirely. Input token limits work similarly: if a variant sets an `input_token_limit`, the worker truncates the document text to approximately that many tokens (estimated at four characters per token) before sending it to the model.
For a complete guide to creating variants, activating them, and comparing their performance, see the [AI Agents Guide](../ai-agents.md).
---
## Persistence: From Extraction to Database
Once the LLM produces a valid extraction and it passes validation, the `persist_extraction()` function in `services/extractor/worker.py` orchestrates the full persistence pipeline. This function writes to both MinIO (for audit) and PostgreSQL (for downstream consumption), ensuring that every extraction attempt is fully traceable.
The MinIO persistence layer uploads four artifacts per extraction, all stored under date-partitioned paths in dedicated buckets. The prompt metadata (prompt version, schema version, model name) goes to `stonks-llm-prompts`. The raw model output for every attempt — including failed ones — goes to `stonks-llm-results`, preserving the full retry history. A validation report summarizing the final attempt's status, errors, and warnings is uploaded alongside the raw output. On success, the final parsed intelligence object (the `ExtractionResult` serialized as JSON) is uploaded to a separate path for easy retrieval.
The PostgreSQL persistence writes to two tables. The `document_intelligence` table receives one row per document, containing the summary, macro themes, novelty score, source credibility, extraction warnings, confidence, model metadata (provider, model name, prompt version, schema version), references to the MinIO artifacts (raw output ref, prompt ref), validation status (`valid` or `failed`), validation errors, and retry count. This row is the authoritative record of what the AI extracted from the document.
The `document_impact_records` table receives one row per company mention within the extraction. Each impact record is linked to the parent `document_intelligence` row via `intelligence_id` and to the `companies` table via `company_id`. The record captures the ticker, relevance, sentiment, impact score, impact horizon, catalyst type, key facts, risks, and evidence spans for that specific company. The `company_id` is resolved from a ticker-to-UUID mapping that the worker maintains by querying the `companies` table (refreshed every 100 jobs). If a ticker in the extraction output does not match any tracked company, the impact record is skipped with a warning — the system only persists impact records for companies in its tracked universe.
After persisting the intelligence and impact records, the worker updates the document's status in the `documents` table to `extracted` (or `extraction_failed` if all retry attempts were exhausted). Even failed extractions get a `document_intelligence` row with `validation_status='failed'`, empty summary, zero confidence, and the accumulated error messages — this ensures the failure is visible in the database rather than silently lost.
Performance metrics are collected for every extraction via `collect_metrics()` in `services/extractor/metrics.py` and persisted to a metrics table. Prometheus counters and histograms track extraction attempts, duration, retries, confidence distribution, validation errors, and estimated token usage (input and output, estimated at four characters per token). When a resolved agent config is available, the worker also logs to the `agent_performance_log` table with variant attribution, enabling the A/B comparison queries described in the [AI Agents Guide](../ai-agents.md).
For the Global Event Classifier, persistence follows a parallel path. The prompt and raw output are uploaded to MinIO under an `event_classification/macro/` path prefix. The parsed `GlobalEvent` is persisted to the `global_events` PostgreSQL table, which stores the event types, severity, affected regions, affected sectors, affected commodities, summary, key facts, estimated duration, confidence, source document ID, and model metadata. Downstream, the macro interpolation engine computes `macro_impact_records` for each affected company and persists those as well.
---
## Enqueuing Aggregation Jobs
The final step in the extraction pipeline is to notify the downstream aggregation engine that new intelligence is available. After a successful document extraction, the worker pushes a job onto the `stonks:queue:aggregation` Redis list containing the ticker of the affected company. The aggregation engine (described in [Page 3](03-signal-scoring-and-weighted-signals.md)) will pick up this job and recompute the weighted signals and trend summaries for that ticker, incorporating the freshly extracted intelligence.
For macro events, the enqueue logic is more expansive. After the Global Event Classifier produces a `GlobalEvent` and the interpolation engine computes macro impact records, the worker enqueues an aggregation job for every ticker that received a non-zero macro impact score. A single macro event — say, a new tariff announcement affecting the Energy and Industrials sectors — can trigger aggregation recomputation for dozens of tickers simultaneously. The aggregation job payload includes both the `ticker` and the `macro_event_id`, so the aggregation engine knows to incorporate the new macro signals.
The worker alternates between the extraction and macro classification queues to prevent starvation: every third job is pulled from `stonks:queue:macro_classification`, with the remaining two-thirds from `stonks:queue:extraction`. If the preferred queue is empty, the worker falls back to the other queue, ensuring that neither pipeline stalls while the other has work available.
---
## What Comes Next
At this point, documents have been transformed from unstructured text into structured JSON intelligence — `ExtractionResult` objects for company-specific documents and `GlobalEvent` objects for macro news. These structured records are persisted in PostgreSQL and their tickers have been enqueued for aggregation. But raw extraction output is not yet actionable for trading decisions. The extraction tells us that a document is bearish for AAPL with an impact score of 0.7 and a confidence of 0.8, but it does not tell us how much weight that signal should carry relative to other signals about AAPL, or how it compares to signals from different sources, time periods, or market conditions. [Page 3 — Signal Scoring and the WeightedSignal Abstraction](03-signal-scoring-and-weighted-signals.md) picks up the story from here, explaining how the aggregation engine transforms these raw extraction outputs into weighted signals through confidence gating, recency decay, source credibility scoring, novelty bonuses, and market context multipliers.
@@ -0,0 +1,210 @@
# Page 3 — Signal Scoring and the WeightedSignal Abstraction
The extraction pipeline described in [Page 2](02-ai-agent-processing-and-extraction.md) produces structured intelligence records — `document_impact_records` for company-specific documents, `macro_impact_records` for global events, and `competitive_signal_records` for cross-company pattern propagation. Each record carries a sentiment, an impact score, a confidence value, and a publication timestamp. But these raw values are not directly comparable. A high-confidence extraction from a reputable source published ten minutes ago should carry far more weight than a low-confidence extraction from an unknown source published three weeks ago. A document that breaks genuinely novel information should matter more than one that rehashes yesterday's earnings call. And when the market is moving fast — high volatility, surging volume — fresh signals become even more critical.
The signal scoring layer in `services/aggregation/scoring.py` solves this problem by transforming each raw intelligence record into a `WeightedSignal` object: a document reference paired with a composite aggregation weight that encodes recency, credibility, novelty, confidence, and market conditions into a single number. This page explains how that weight is computed, how sentiment labels become numeric values, and how three independent signal layers — Company, Macro, and Competitive — each produce `WeightedSignal` objects that are concatenated into a unified list before the aggregation engine computes trend summaries. For a visual breakdown of the composite weight formula, see the [Weighted Signal Computation diagram](diagrams/weighted-signal-computation.md). For the full picture of how the three layers merge, see the [Three-Layer Signal Merging diagram](diagrams/three-layer-signal-merging.md).
---
## The WeightedSignal and SignalWeight Dataclasses
The core abstraction is the `WeightedSignal` dataclass, defined in `services/aggregation/scoring.py`. It pairs a document reference with the computed weight and the signal's sentiment and impact values:
- **`document_id`** — the UUID of the source document (for company and macro signals) or a synthetic identifier for pattern-derived signals (e.g., `pattern:AAPL:earnings:7d`).
- **`weight`** — a `SignalWeight` object containing the component breakdown and the final combined score.
- **`sentiment_value`** — a numeric sentiment value: `+1.0` for positive, `-1.0` for negative, `0.0` for neutral or mixed.
- **`impact_score`** — the magnitude of impact, drawn from the extraction's per-company impact score for company signals, or scaled by a layer-specific weight multiplier for macro and competitive signals.
The `SignalWeight` dataclass captures the individual components that feed into the combined weight, making the scoring decision fully transparent and auditable:
- **`recency`** — the exponential decay weight based on document age.
- **`credibility`** — the source credibility weight after clamping and exponentiation.
- **`novelty_bonus`** — the additive bonus derived from the document's novelty score.
- **`confidence_gate`** — either `1.0` (signal passes) or `0.0` (signal is gated out).
- **`market_ctx_multiplier`** — a multiplicative boost from market conditions, always `>= 1.0`.
- **`combined`** — the final composite weight used by the aggregation engine.
The `ScoringConfig` frozen dataclass holds all tunable parameters for the scoring functions — half-life hours per window, credibility bounds, novelty bonus cap, confidence floor, and market context thresholds. A module-level `DEFAULT_CONFIG` singleton provides the production defaults, but every scoring function accepts an optional `config` parameter so that tests and alternative configurations can override any parameter without modifying global state.
---
## The Composite Weight Formula
The `compute_signal_weight()` function in `services/aggregation/scoring.py` computes the combined weight for a single document signal. The formula is:
```
combined = gate × recency × credibility × (1 + novelty_bonus) × market_context_multiplier
```
Each factor is computed independently and then multiplied together. This multiplicative structure means that any single factor can zero out the entire weight (the confidence gate) or amplify it (the market context multiplier), and the interaction between factors is naturally captured — a highly credible, very recent document with novel information in a volatile market receives the maximum possible weight, while a stale, low-credibility document with routine information receives a weight close to zero.
The following sections describe each component in detail.
---
## Confidence Gate
The confidence gate is the first and most decisive filter. If the extraction confidence for a document falls below the `confidence_floor` threshold — set to `0.2` in the default `ScoringConfig` — the gate evaluates to `0.0` and the entire combined weight becomes zero. The document is effectively excluded from aggregation. If the confidence meets or exceeds the threshold, the gate evaluates to `1.0` and has no further effect on the weight.
This binary gate exists because documents with very low extraction confidence are too unreliable to aggregate. A confidence of 0.15 typically means the LLM struggled to parse the document — perhaps the text was truncated, the language was ambiguous, or the document type was unusual. Including such signals would add noise rather than information. The threshold of 0.2 is deliberately low; it filters only the most unreliable extractions while allowing moderately confident signals to participate (their lower confidence is reflected through the credibility component instead).
---
## Recency Decay
The `recency_weight()` function computes an exponential decay based on how old a document is relative to the aggregation anchor time. The formula is:
```
w = 2^(age_hours / half_life)
```
A document published exactly one half-life ago receives a recency weight of `0.5`. A document published two half-lives ago receives `0.25`, and so on. A document published at or after the reference time receives the maximum weight of `1.0`.
The half-life varies by trend window, reflecting the intuition that shorter windows need faster decay to stay responsive, while longer windows should give older documents more influence. The default half-lives, configured in `ScoringConfig.half_life_hours`, are:
| Window | Half-Life |
|--------|-----------|
| `intraday` | 2 hours |
| `1d` | 12 hours |
| `7d` | 72 hours (3 days) |
| `30d` | 240 hours (10 days) |
| `90d` | 720 hours (30 days) |
For the intraday window, a document published four hours ago already has a recency weight of `0.25` — it is rapidly losing influence as newer information arrives. For the 90-day window, that same four-hour-old document still has a recency weight of essentially `1.0`, because the 30-day half-life means age only becomes significant over weeks.
A floor value of `min_recency_weight = 0.01` prevents very old documents from being completely zeroed out. Even a document from months ago retains a trace-level weight of 1%, ensuring it can still contribute to trend computation if no newer signals exist. Both timestamps are normalized to UTC; naive datetimes are treated as UTC to avoid timezone-related scoring errors.
---
## Source Credibility
The `credibility_weight()` function transforms a source's credibility score into a weight component. The raw credibility value — a float between 0.0 and 1.0 stored in the `document_intelligence` table — is first clamped to the range `[0.1, 1.0]` using the `credibility_floor` and `credibility_ceiling` parameters from `ScoringConfig`. This clamping ensures that even the least credible sources retain a minimum weight of 0.1 rather than being completely silenced, while preventing any source from exceeding a weight of 1.0.
After clamping, the value is raised to the `credibility_exponent` power. The default exponent is `1.0`, which means the clamped credibility passes through unchanged. Setting the exponent above 1.0 would penalize low-credibility sources more aggressively — for example, an exponent of 2.0 would reduce a credibility of 0.5 to a weight of 0.25. Setting it below 1.0 would flatten the curve, making the system more tolerant of lower-credibility sources. The exponent is configurable through `ScoringConfig` to allow operators to tune the credibility sensitivity without changing the scoring code.
---
## Novelty Bonus
The novelty bonus rewards documents that contain genuinely new information. The bonus is computed as:
```
novelty_bonus = novelty_score × novelty_bonus_max
```
where `novelty_score` is the 0.0-to-1.0 value produced by the extraction model (see the `ExtractionResult` schema in [Page 2](02-ai-agent-processing-and-extraction.md)) and `novelty_bonus_max` is `0.25` by default. This means the bonus ranges from `0.0` (completely routine information) to `0.25` (maximally novel information), providing up to a 25% boost to the signal weight.
The bonus enters the composite formula as `(1 + novelty_bonus)`, so it acts as a multiplicative amplifier on the base weight. A document with a novelty score of 1.0 gets its weight multiplied by 1.25; a document with a novelty score of 0.0 gets multiplied by 1.0 (no change). This design ensures that novelty can only increase a signal's weight, never decrease it — routine information is not penalized, it simply does not receive the bonus.
---
## Market Context Multiplier
The `market_context_multiplier()` function computes a boost factor based on real-time market conditions for the ticker being aggregated. The multiplier is always `>= 1.0`, meaning market context can only amplify signal weights, never reduce them. When no market context data is available (the `MarketContext` object from `services/shared/schemas.py` has `has_data == False`), the multiplier defaults to `1.0`.
Two market features contribute to the boost:
**Volatility boost.** When the ticker's price volatility exceeds the `volatility_recency_boost_threshold` (default `1.0` in price units), the excess volatility is transformed through a logarithmic scaling function: `log₁₊(excess) × 0.15`. The logarithmic scaling prevents extreme volatility from producing runaway weight amplification. The boost is capped at `volatility_recency_boost_max = 0.30`, so the maximum volatility contribution is a 30% weight increase. The rationale is that in highly volatile markets, fresh intelligence is disproportionately valuable — a signal about NVDA matters more when NVDA is swinging 5% intraday than when it is trading in a tight range.
**Volume surge boost.** When the ticker's volume change percentage exceeds `volume_surge_threshold_pct = 50.0%` (meaning trading volume is at least 50% above the prior period's average), a flat `volume_surge_boost = 0.15` is added. Unlike the volatility boost, this is binary — either the volume threshold is met and the full 15% boost applies, or it is not and no boost is added. High-volume moves carry more conviction because they represent broader market participation rather than thin-market noise.
The two boosts are additive within the multiplier: `multiplier = 1.0 + volatility_boost + volume_surge_boost`. In the most extreme case — high volatility and a volume surge — the combined multiplier reaches `1.0 + 0.30 + 0.15 = 1.45`, amplifying the signal weight by 45%. The `MarketContext` data is fetched by `services/aggregation/market_context.py` from the market data tables in PostgreSQL, using the same ticker and window parameters as the impact record query.
---
## Sentiment Mapping
Before signals can be aggregated into trend summaries, the categorical sentiment labels from the extraction output must be converted to numeric values. The `sentiment_to_numeric()` function in `services/aggregation/scoring.py` performs this mapping:
| Sentiment Label | Numeric Value |
|----------------|---------------|
| `positive` | `+1.0` |
| `negative` | `-1.0` |
| `neutral` | `0.0` |
| `mixed` | `0.0` |
The mapping is case-insensitive. Any unrecognized label defaults to `0.0`. The choice to map both `neutral` and `mixed` to `0.0` is deliberate — a mixed-sentiment document (one that contains both positive and negative signals for the same company) should not push the trend in either direction. The contradiction between the positive and negative aspects is captured separately by the contradiction detection system described in [Page 4](04-trend-aggregation-and-accumulating-signals.md), rather than being baked into the sentiment value itself.
For macro signals, the direction-to-sentiment mapping in `services/aggregation/worker.py` follows the same pattern: `positive` maps to `+1.0`, `negative` to `-1.0`, and both `mixed` and `neutral` to `0.0`. For competitive signals built by `build_pattern_weighted_signals()` in `services/aggregation/signal_propagation.py`, the sentiment is derived from the pattern's directional bias: `+1.0` if `bullish_pct > bearish_pct`, `-1.0` otherwise.
---
## Weighted Sentiment Average
The `weighted_sentiment_average()` function computes the central metric that drives trend direction: a weight-adjusted average sentiment across all signals for a ticker in a given window. The formula is:
```
weighted_avg = Σ(combined_weight × impact_score × sentiment_value) / Σ(combined_weight × impact_score)
```
Each signal contributes its sentiment value scaled by both its composite weight and its impact score. The denominator normalizes by the total effective weight, producing a value in the range `[-1.0, +1.0]`. A result near `+1.0` means the weighted evidence is overwhelmingly positive; near `-1.0` means overwhelmingly negative; near `0.0` means either neutral or evenly split.
The use of `combined_weight × impact_score` as the effective weight means that high-impact, high-weight signals dominate the average. A single high-confidence, recent, credible document with a strong impact score can outweigh several older, lower-impact documents — which is the intended behavior. The aggregation engine in `services/aggregation/worker.py` passes this weighted average to `derive_trend_direction()`, which maps it to a `TrendDirection` enum value (bullish, bearish, mixed, or neutral) using the thresholds described in [Page 4](04-trend-aggregation-and-accumulating-signals.md).
If the total effective weight is zero — either because no signals exist or all signals were gated out by the confidence floor — the function returns `0.0`, which maps to a neutral trend direction.
---
## The Three Signal Layers
The aggregation engine in `services/aggregation/worker.py` does not treat all intelligence sources equally. Signals flow through three independent layers, each with a different relative weight, before being concatenated into a single `WeightedSignal` list for trend computation. This layered architecture allows the system to incorporate diverse intelligence sources while controlling how much influence each source type has on the final trend.
### Layer 1 — Company Signals (Weight: 1.0)
Company signals are the primary layer. They are built by `build_weighted_signals()` in `services/aggregation/worker.py` from `document_impact_records` — the per-company extraction output produced by the Document Intelligence Extractor (see [Page 2](02-ai-agent-processing-and-extraction.md)). Each impact record's sentiment is converted via `sentiment_to_numeric()`, and its impact score is used directly without any layer-level scaling. The `compute_signal_weight()` function produces the composite weight using the document's publication time, source credibility, novelty score, extraction confidence, and the ticker's current market context.
Company signals carry a relative weight of `1.0` — they are the baseline against which other layers are measured. This reflects the design principle that direct, company-specific intelligence (an earnings report about AAPL, a product launch by TSLA, a lawsuit against META) is the most relevant and reliable signal for that company's trend.
### Layer 2 — Macro Signals (Weight: 0.3)
Macro signals capture the indirect impact of global events on individual companies. They are built by `build_macro_weighted_signals()` in `services/aggregation/worker.py` from `macro_impact_records` — the per-company impact scores computed by the exposure-based interpolation engine after the Global Event Classifier processes a macro news article. The sentiment is mapped from the `impact_direction` field (`positive``+1.0`, `negative``-1.0`, `mixed`/`neutral``0.0`), and the impact score is scaled by `MACRO_SIGNAL_WEIGHT`, which defaults to `0.3` in `AggregationConfig`.
The 0.3 weight means that a macro signal's impact score is reduced to 30% of its raw value before entering the aggregation. This attenuation reflects the inherent uncertainty in macro-to-company impact estimation — a tariff announcement might affect XOM's revenue, but the magnitude depends on exposure profiles, supply chain flexibility, and competitive dynamics that the interpolation engine can only approximate. By weighting macro signals at 0.3 relative to company signals at 1.0, the system ensures that macro intelligence informs the trend without overwhelming direct company-specific evidence.
The recency decay, credibility, and confidence gating for macro signals use the same `compute_signal_weight()` function as company signals. The `published_at` timestamp comes from the global event's source document (the macro news article), and the `source_credibility` and `extraction_confidence` both use the macro impact record's `confidence` field.
### Layer 3 — Competitive Signals (Weight: 0.2)
Competitive signals capture cross-company effects: when a catalyst hits one company, historical patterns suggest how competitors might be affected. They are built by `build_pattern_weighted_signals()` in `services/aggregation/signal_propagation.py` from two sources: `HistoricalPattern` objects (self-company patterns mined by `services/aggregation/pattern_matcher.py`) and `CompetitiveSignalRecord` objects (cross-company propagation signals stored in `competitive_signal_records`).
For historical patterns, the sentiment is derived from the pattern's directional bias (`+1.0` if `bullish_pct > bearish_pct`, `-1.0` otherwise), and the impact score is the pattern's `avg_strength` multiplied by `competitive_signal_weight` (default `0.2` from `CompetitiveConfig`). The `published_at` for recency decay uses the pattern's `data_end` — the most recent data point in the pattern's sample — and the `extraction_confidence` uses the pattern's `pattern_confidence`. Source credibility is set to `1.0` because patterns are derived from validated historical data, and novelty is fixed at `0.5`.
For competitive signal records, the same structure applies: sentiment from `signal_direction`, impact from `signal_strength × competitive_signal_weight`, recency from `computed_at`, and confidence from `pattern_confidence`.
The 0.2 weight makes competitive signals the lightest layer. This is appropriate because competitive signal propagation involves the most inference — the system is predicting how Company B will react based on what happened to Company A in historically similar situations. The signal is valuable as supplementary evidence but should not drive trend direction on its own.
---
## Signal Merging in the Aggregation Engine
The `aggregate_company_window()` function in `services/aggregation/worker.py` orchestrates the merging of all three layers for a single ticker and window. The process follows a clear sequence:
1. **Fetch company impact records** from `document_impact_records` for the ticker within the window's time range.
2. **Fetch market context** for the ticker from market data tables.
3. **Build company weighted signals** via `build_weighted_signals()`.
4. **Check the macro toggle** — query `risk_configs` for the `macro_enabled` flag, then fetch and merge macro signals if enabled.
5. **Check the competitive toggle** — query `risk_configs` for the `competitive_enabled` flag, then fetch patterns, fetch competitive signals, and merge if enabled.
6. **Concatenate** all `WeightedSignal` lists into a single list.
7. **Assemble the `TrendSummary`** from the merged signals.
The concatenation in step 6 is a simple list append — `signals = signals + macro_signals` followed by `signals = signals + pattern_weighted`. There is no re-weighting or normalization at the merge point. The relative influence of each layer is already encoded in the impact scores (scaled by 0.3 for macro, 0.2 for competitive, 1.0 for company) and in the composite weights computed by `compute_signal_weight()`. The `weighted_sentiment_average()` function then naturally produces a sentiment average that reflects these relative weights.
---
## Runtime Toggles and Graceful Degradation
Both the macro and competitive signal layers can be enabled or disabled at runtime through the `risk_configs` PostgreSQL table, without restarting any service. The toggle state is read fresh from the database at the start of every aggregation cycle — there is no caching — so changes take effect on the very next cycle.
The `fetch_macro_enabled()` function in `services/aggregation/worker.py` queries the most recent active `risk_configs` row and reads the `config->>'macro_enabled'` JSON field. If the field is explicitly set to `"true"` or `"false"`, that value overrides the `AggregationConfig` default. If no config row exists or the field is absent, the function returns `None` and the engine falls back to the `AggregationConfig.macro_enabled` default (which is `True`). The `fetch_competitive_enabled()` function follows the identical pattern for the `competitive_enabled` field.
When a layer is disabled, the aggregation engine simply skips the fetch-and-merge step for that layer. Company signals are always computed — they cannot be toggled off. This means the system degrades gracefully: disabling the macro layer produces trends based on company signals alone (plus competitive signals if enabled), and disabling the competitive layer produces trends based on company and macro signals. Disabling both layers reduces the engine to its original single-layer behavior, using only direct document intelligence.
Crucially, disabling a layer does not stop upstream processing. When the macro layer is disabled, the Global Event Classifier continues to classify macro events and the interpolation engine continues to compute `macro_impact_records`. The data accumulates in PostgreSQL. When the layer is re-enabled, the aggregation engine immediately picks up all the macro impact records that were computed while the layer was disabled — there is no data loss or gap in coverage. The same applies to competitive signals: pattern mining and signal propagation continue regardless of the toggle state.
If the competitive signal fetch fails at runtime (for example, due to a database timeout), the aggregation engine catches the exception, logs it, and continues with company and macro signals only. This exception-based graceful degradation ensures that a transient failure in one layer does not block trend computation entirely.
---
## What Comes Next
At this point, every document intelligence record, macro impact record, and competitive signal record has been transformed into a `WeightedSignal` with a composite weight that encodes recency, credibility, novelty, confidence, and market conditions. The three signal layers have been merged into a single list, and the weighted sentiment average has been computed. But a single aggregation cycle produces only a snapshot — a point-in-time view of the evidence. The real power of the system emerges when these snapshots accumulate across multiple documents and time windows, building a case for action. [Page 4 — Trend Aggregation and Accumulating Signals](04-trend-aggregation-and-accumulating-signals.md) explains how the aggregation engine computes `TrendSummary` objects across five time windows, how consecutive same-direction signals strengthen trend confidence and escalate the system's response from neutral observation to actionable trading recommendations, and how contradiction detection and evidence ranking ensure that the trend reflects genuine consensus rather than noise.
@@ -0,0 +1,267 @@
# Page 4 — Trend Aggregation and Accumulating Signals
The scoring layer described in [Page 3](03-signal-scoring-and-weighted-signals.md) transforms every intelligence record into a `WeightedSignal` — a document reference paired with a composite weight that encodes recency, credibility, novelty, confidence, and market conditions. Three independent signal layers (Company at weight 1.0, Macro at 0.3, Competitive at 0.2) each produce `WeightedSignal` objects that are concatenated into a single list. But a single list of weighted signals is still just raw material. The aggregation engine in `services/aggregation/worker.py` is where that raw material becomes a decision-grade assessment: a `TrendSummary` object that captures the direction, strength, confidence, contradiction level, and supporting evidence for a ticker across a specific time window. This page explains how that transformation works — from weighted sentiment averages through trend direction derivation, contradiction detection, evidence ranking, and confidence computation — and, critically, how consecutive signals pointing in the same direction accumulate across documents and time windows to escalate the system's response from passive observation to actionable trading recommendations.
For a visual overview of the accumulation and escalation process, see the [Trend Accumulation and Escalation diagram](diagrams/trend-accumulation-escalation.md). For how the three signal layers merge into the aggregation engine, see the [Three-Layer Signal Merging diagram](diagrams/three-layer-signal-merging.md).
---
## Five Time Windows
The aggregation engine does not compute a single trend for each ticker. It computes five, one for each time window defined in `services/aggregation/worker.py`:
| Window | Lookback Duration |
|--------|-------------------|
| `intraday` | 12 hours |
| `1d` | 1 day |
| `7d` | 7 days |
| `30d` | 30 days |
| `90d` | 90 days |
Each window produces an independent `TrendSummary` by fetching all impact records, macro impacts, and competitive signals for the ticker within that window's time range. The `aggregate_company_window()` function in `services/aggregation/worker.py` orchestrates this per-window computation: it determines the time range from the window's lookback duration, fetches `document_impact_records` from PostgreSQL, retrieves market context, builds company weighted signals, checks the macro and competitive runtime toggles (see [Page 3](03-signal-scoring-and-weighted-signals.md) for toggle details), merges any enabled layer signals, and then assembles the `TrendSummary`.
The five-window design serves a specific purpose. Short windows (intraday, 1d) capture fast-moving sentiment shifts — a breaking earnings miss, a sudden regulatory action — while long windows (30d, 90d) reveal sustained trends that persist across many documents and news cycles. A ticker might show a bearish intraday trend after a single negative article, but a neutral 30-day trend because the broader evidence base is balanced. The recommendation engine downstream (described in [Page 5](05-recommendation-generation.md)) evaluates each window's `TrendSummary` independently, so the system can respond to both short-term catalysts and long-term directional shifts.
The `aggregate_company()` function iterates over all effective windows (configurable via `AggregationConfig.windows`, defaulting to all five) and calls `aggregate_company_window()` for each one. This means a single aggregation cycle for one ticker produces up to five `TrendSummary` objects, each reflecting a different temporal perspective on the same underlying evidence.
---
## Trend Direction Derivation
Once the weighted sentiment average has been computed from the merged signal list (see the `weighted_sentiment_average()` function described in [Page 3](03-signal-scoring-and-weighted-signals.md)), the `derive_trend_direction()` function in `services/aggregation/worker.py` maps that numeric value to a `TrendDirection` enum. The rules are evaluated in a specific order, and the first matching rule wins:
1. **Mixed** — If the contradiction score exceeds `0.10` (the `MIXED_THRESHOLD` constant) *and* the absolute value of the average sentiment is below `0.30`, the direction is `MIXED`. This rule fires first because high contradiction with a weak directional signal indicates genuine disagreement in the evidence — the trend is not simply neutral, it is actively contested.
2. **Bullish** — If the average sentiment is `≥ 0.15` (the `BULLISH_THRESHOLD` constant), the direction is `BULLISH`. This means the weight-adjusted evidence leans positive with enough conviction to cross the threshold.
3. **Bearish** — If the average sentiment is `≤ -0.15` (the `BEARISH_THRESHOLD` constant), the direction is `BEARISH`. The symmetric threshold ensures that bullish and bearish classifications require the same magnitude of evidence.
4. **Neutral** — If none of the above conditions are met, the direction is `NEUTRAL`. This covers the range where the average sentiment falls between -0.15 and +0.15 without high contradiction — the evidence is either balanced or insufficient to establish a directional lean.
The mixed-first evaluation order is important. Consider a scenario where five documents are bullish and four are bearish, all with similar weights. The weighted sentiment average might be slightly positive (say, +0.08), which would normally map to neutral. But the contradiction score — computed from the minority/majority weight split — would be high (close to 0.44). The mixed rule catches this case: the evidence is not neutral, it is conflicted. This distinction matters downstream because mixed trends receive different treatment in the recommendation engine than neutral trends.
---
## Contradiction Detection
The contradiction detection module in `services/aggregation/contradiction.py` provides a structured analysis of disagreement within the signal set. Rather than collapsing contradictory evidence into a single number, it produces a `ContradictionResult` containing both an overall score and a list of `DisagreementDetail` objects that explain *where* the disagreement lies.
The `detect_contradictions()` function runs two analyses:
### Sentiment Disagreement
The `_detect_sentiment_disagreement()` function examines whether both positive and negative sentiment signals exist in the signal set. For each signal with a non-zero effective weight (`combined_weight × impact_score > 0`), it classifies the signal as positive or negative based on its `sentiment_value` and accumulates the effective weight for each side. If both sides have at least one signal, it produces a `DisagreementDetail` with dimension `"sentiment"`, listing the document IDs and weights for each side, along with a human-readable description like "Sentiment split: 3 positive vs 2 negative signals (minority weight ratio 38%)".
### Catalyst-Level Disagreement
The `_detect_catalyst_disagreement()` function goes deeper. It groups signals by their `catalyst_type` (earnings, product_launch, regulatory, etc.) using `CatalystEntry` objects built from the `document_impact_records`. Within each catalyst group, it checks whether both positive and negative signals exist. If they do, it produces a `DisagreementDetail` with dimension `"catalyst:<type>"` — for example, `"catalyst:earnings"` when some documents interpret an earnings report positively and others negatively. This catalyst-level analysis is valuable because it pinpoints the specific topic of disagreement rather than just flagging that disagreement exists somewhere in the evidence.
### The Overall Contradiction Score
The `_compute_overall_score()` function computes the backward-compatible scalar contradiction score using the minority/majority weight ratio formula:
```
contradiction_score = minority_weight / total_weight
```
where `minority_weight` is the smaller of the positive and negative effective weights, and `total_weight` is their sum. Signals with zero effective weight or neutral sentiment are excluded. The score ranges from `0.0` (complete agreement — all signals point the same direction) to `0.5` (perfect split — positive and negative weights are exactly equal). A score of `0.0` means no contradiction at all. A score above `0.10` combined with a weak average sentiment triggers the mixed direction classification in `derive_trend_direction()`.
The contradiction score also feeds directly into the confidence computation as a penalty, described in the next section. High contradiction reduces the system's confidence in the trend, which in turn affects whether the trend can escalate to actionable recommendations.
---
## Evidence Ranking
Not all documents contributing to a trend are equally important. The `rank_evidence()` function in `services/aggregation/worker.py` delegates to the evidence ranking module (`services/aggregation/evidence.py`) to produce ordered lists of the most influential supporting and opposing documents. The ranking uses a composite scoring approach configured by `EvidenceRankConfig`, considering multiple factors:
- **Weight** — the signal's composite weight from the scoring layer, reflecting recency, credibility, novelty, confidence, and market context.
- **Impact** — the extraction's impact score for the company, reflecting how significant the document's content is.
- **Recency** — how recently the document was published, with more recent documents ranked higher.
- **Confidence** — the extraction confidence, reflecting how reliably the LLM parsed the document.
Signals are split into supporting (positive sentiment) and opposing (negative sentiment) groups. Neutral and mixed sentiment signals are excluded from evidence lists — they do not argue for or against the trend direction. Within each group, signals are sorted by their composite rank score in descending order, and the top entries (up to `MAX_EVIDENCE_REFS = 10` per side) are returned as document ID lists.
The `assemble_trend_with_evidence()` function in `services/aggregation/worker.py` uses the detailed variant `rank_evidence_detailed()` to get `RankedEvidence` objects that include the individual scoring components (weight, impact, recency, confidence, sentiment value). These detailed rankings are persisted to the `trend_evidence` table for auditability, while the document ID lists are stored directly in the `TrendSummary` as `top_supporting_evidence` and `top_opposing_evidence`.
The evidence ranking serves two purposes. First, it provides the recommendation engine with the most relevant documents to cite in its thesis generation (see [Page 5](05-recommendation-generation.md)). Second, it gives human reviewers a quick way to understand *why* the system reached a particular trend assessment — the top-ranked documents are the ones that most influenced the direction and strength.
---
## Confidence Computation
The `compute_trend_confidence()` function in `services/aggregation/worker.py` produces the confidence score for a `TrendSummary`. This score is critical because it directly gates whether a trend can produce actionable recommendations — the eligibility evaluation in `services/recommendation/eligibility.py` requires a minimum confidence of `0.35` to generate any recommendation at all, and higher confidence thresholds control escalation to paper and live trading modes.
Confidence is computed from four components:
### Unique Source Count
The function counts the number of unique document IDs across all active signals (those with `combined_weight > 0`). This count is divided by 15 and capped at `0.8`:
```
count_factor = min(unique_sources / 15.0, 0.8)
```
A trend backed by 15 or more unique source documents reaches the maximum count contribution of `0.8`. A trend backed by a single document gets only `0.067`. This component rewards breadth of evidence — a trend confirmed by many independent sources is more trustworthy than one driven by a single article, regardless of how high that article's individual weight might be.
### Average Extraction Credibility
The average credibility weight across all active signals provides a baseline quality measure. If most contributing documents come from high-credibility sources, this component is high. If the evidence is dominated by low-credibility sources, confidence is penalized accordingly.
### Signal Agreement with Sample-Size Dampening
The agreement ratio measures what fraction of directional signals (bullish + bearish, excluding neutral) agree on the majority direction. If 8 out of 10 directional signals are bullish, the raw agreement is `0.8`. But raw agreement is misleading with small sample sizes — 1 out of 1 signals agreeing gives a perfect `1.0` agreement, which is not meaningful.
To address this, the agreement is dampened by a logarithmic sample-size factor:
```
agreement_dampener = min(1.0, log₂(unique_sources + 1) / log₂(8))
```
This dampener saturates at `1.0` when `unique_sources` reaches approximately 7 (since `log₂(8) = 3.0` and `log₂(8) = 3.0`). With fewer sources, the dampener reduces the agreement contribution: 1 source gives a dampener of `0.33`, 3 sources give `0.67`, and 7 sources give the full `1.0`. The log₂ scaling means that each additional source provides diminishing marginal improvement to the dampener, which matches the intuition that the jump from 1 to 3 sources is far more meaningful than the jump from 15 to 17.
### Contradiction Penalty
The contradiction score computed by `services/aggregation/contradiction.py` is applied as a direct penalty:
```
contradiction_penalty = contradiction_score × 0.4
```
A contradiction score of `0.5` (perfect split) produces a penalty of `0.2`, which is substantial enough to push a moderately confident trend below the eligibility threshold.
### The Combined Formula
The four components are combined as:
```
confidence = 0.3 × count_factor + 0.3 × avg_credibility + 0.4 × agreement contradiction_penalty
```
The result is clamped to `[0.0, 1.0]`. The weighting gives signal agreement the largest share (40%), reflecting the principle that consensus among diverse sources is the strongest indicator of a reliable trend. Source count and credibility each contribute 30%, providing a balanced assessment of evidence breadth and quality. The contradiction penalty can reduce confidence significantly — a highly contradicted trend with a score of 0.4 loses 0.16 points of confidence, which can easily drop it below the 0.35 eligibility gate.
---
## How Accumulating Signals Escalate Decisions
The trend direction, strength, and confidence computed by the aggregation engine are not just descriptive — they directly determine what action the system takes. The escalation path from passive observation to active trading is governed by the eligibility thresholds defined in `services/recommendation/eligibility.py`, and the key insight is that consecutive signals pointing in the same direction naturally strengthen the trend metrics that control this escalation.
### The Escalation Ladder
The `EligibilityConfig` dataclass in `services/recommendation/eligibility.py` defines the thresholds that map trend metrics to actions:
**Neutral (no recommendation).** A trend fails the eligibility gates entirely when confidence is below `0.35`, trend strength is below `0.10`, contradiction exceeds `0.60`, evidence count is below `2`, or the direction is neutral. The `_check_gates()` function evaluates these hard gates — if any gate fails, no recommendation is generated for that window.
**Watch.** A trend that passes the gates but has a direction of mixed, or has strength below `0.25` with confidence below `0.50`, maps to a `WATCH` action via `_determine_action()`. This is the system's way of saying "something is happening, but the evidence is not strong enough to act on." Watch recommendations are always `informational` mode — they are logged for human review but never trigger trades.
**Hold.** When the trend has a clear direction (bullish or bearish) but strength remains below `0.25` while confidence reaches `0.50` or above, the action maps to `HOLD`. This indicates that the directional signal is real but not yet strong enough for a position change. Like watch, hold recommendations are `informational` mode.
**Buy / Sell.** When trend strength reaches `0.25` or above with a bullish direction, the action is `BUY`. With a bearish direction at the same strength threshold, the action is `SELL`. These are the only actions that can escalate beyond informational mode — `_determine_mode()` evaluates whether the recommendation qualifies for `paper_eligible` (confidence ≥ `0.50`) or `live_eligible` (confidence ≥ `0.70`, contradiction ≤ `0.25`, evidence ≥ `5`).
### How Accumulation Drives Escalation
Consider a ticker that starts with no recent intelligence. The first bearish article arrives — a single document with negative sentiment. In the intraday window, this produces:
- **Trend strength** = `|avg_sentiment|` ≈ the absolute weighted sentiment from one signal, likely close to the impact score.
- **Confidence** = low, because `count_factor = min(1/15, 0.8) = 0.067` and the agreement dampener is only `log₂(2)/log₂(8) = 0.33`.
- **Direction** = bearish (if the weighted sentiment is ≤ -0.15).
With confidence well below `0.35`, this trend fails the eligibility gate entirely. No recommendation is generated. The system is in the neutral state.
A second bearish article arrives hours later. Now the intraday window has two signals:
- **Unique sources** = 2, so `count_factor = 0.133` and `agreement_dampener = log₂(3)/log₂(8) ≈ 0.53`.
- **Agreement** = `1.0 × 0.53 = 0.53` (both signals agree on bearish).
- **Confidence** ≈ `0.3 × 0.133 + 0.3 × avg_cred + 0.4 × 0.53` — likely around `0.35-0.45` depending on credibility.
If confidence crosses `0.35` and strength exceeds `0.10`, the trend passes the eligibility gates. But with strength below `0.25`, the action is `WATCH` or `HOLD` depending on confidence.
A third and fourth bearish article arrive over the next day. The 1-day window now has four agreeing signals:
- **Unique sources** = 4, so `count_factor = 0.267` and `agreement_dampener = log₂(5)/log₂(8) ≈ 0.77`.
- **Agreement** = `1.0 × 0.77 = 0.77`.
- **Confidence** ≈ `0.3 × 0.267 + 0.3 × avg_cred + 0.4 × 0.77` — likely `0.50-0.60`.
- **Strength** = `|avg_sentiment|` — with four bearish signals and no contradicting evidence, this could easily exceed `0.25`.
Now the trend maps to `SELL` with `paper_eligible` mode (confidence ≥ `0.50`). The system has escalated from no recommendation to a paper-eligible sell recommendation purely through the accumulation of consistent bearish evidence.
If the bearish evidence continues — more documents, more sources, higher credibility — confidence climbs further. At confidence ≥ `0.70` with contradiction ≤ `0.25` and evidence ≥ `5`, the recommendation reaches `live_eligible` mode, the highest escalation level.
The same process works in reverse for bullish accumulation: consecutive positive signals strengthen the bullish trend, increase confidence through source diversity and agreement, and escalate from watch through hold to buy.
### The Role of Contradiction in Preventing False Escalation
Accumulation only works when signals agree. If the fifth article about a ticker is bullish while the previous four were bearish, the contradiction score jumps — `minority_weight / total_weight` increases because the minority (bullish) side now has non-zero weight. This has two effects: the contradiction penalty reduces confidence (potentially dropping it below an eligibility threshold), and if the contradiction exceeds `0.10` with `|avg_sentiment| < 0.30`, the direction flips to mixed, which maps to `WATCH` regardless of strength. The system effectively de-escalates when the evidence becomes contested, requiring a clearer consensus before re-escalating.
---
## Trend Projections
After the `TrendSummary` is assembled and persisted, the aggregation engine computes a forward-looking `TrendProjection` via `compute_projection()` in `services/aggregation/projection.py`. Projections estimate where the trend is heading based on current momentum, macro signal decay, and upcoming catalysts. They are advisory — they do not directly trigger recommendations — but they provide valuable context for human reviewers and can inform future automated decision-making.
### Momentum
The `compute_trend_momentum()` function computes the rate of change in signed trend strength between the current and previous aggregation cycles. If the current window shows a bearish trend at strength `0.40` and the previous cycle showed bearish at `0.30`, the momentum is `-0.10` (strengthening bearish). If no previous data is available, the function uses a heuristic: momentum is estimated as half the current signed strength, providing a reasonable baseline for new trends.
Momentum enters the projection as a half-weighted adjustment to the current signed strength:
```
momentum_projected_signed = direction_sign × current_strength + momentum × 0.5
```
This means momentum influences the projection but does not dominate it — a strong current trend with weakening momentum still projects as directional, just with reduced strength.
### Macro Decay
The `project_macro_decay()` function estimates how active macro events will evolve over the projection horizon. Each macro event has an `estimated_duration` that maps to a decay half-life:
| Duration | Half-Life |
|----------|-----------|
| `short_term` | 1 day |
| `medium_term` | 7 days |
| `long_term` | 30 days |
For each event, the function computes the projected remaining impact at the end of the horizon using exponential decay: `future_factor = 2^(future_age_days / half_life)`. The impact is further scaled by a severity weight (`critical`: 1.0, `high`: 0.75, `moderate`: 0.5, `low`: 0.25). Positive and negative macro impacts are accumulated separately, and the projected macro direction is determined by comparing the two sides — bullish if positive exceeds negative by 20%, bearish if the reverse, mixed if both are present without a clear majority.
When the macro layer is enabled and macro events exist, the projection blends the company-specific momentum projection with the macro trajectory. The macro weight is capped at `0.4` (40% of the blended projection), ensuring that macro signals inform but do not overwhelm the company-specific trend. The blending formula combines the signed company projection with the signed macro projection:
```
blended = company_weight × momentum_projected + macro_weight × macro_signed
```
### Driving Factors
The projection records a list of human-readable driving factors that explain what is influencing the projected direction. These include momentum descriptions ("Positive momentum (+0.150) in recent trend strength"), macro impact projections ("Macro signals project bearish impact (strength 0.350) over 7d"), and upcoming catalysts drawn from the trend's `dominant_catalysts` list (limited to the top 3). If no specific factors are identified, a baseline continuation factor is recorded.
### Divergence Detection
After computing the projected direction, the function compares it to the current trend direction. If they differ — for example, the current trend is bearish but the projection is bullish due to decaying negative macro events and positive momentum — the projection is flagged with `diverges_from_current = True` and a divergence driving factor is appended. Divergence signals are particularly valuable because they indicate that the trend may be about to reverse, giving the recommendation engine and human reviewers an early warning.
The projection also flags low confidence when `projected_confidence` falls below the default threshold of `0.3`. Projection confidence starts at 80% of the current trend confidence (reflecting the inherent uncertainty of forward-looking estimates), with a small boost if macro data is available and a further reduction if the macro layer is disabled entirely.
---
## Persistence
Each aggregation cycle persists its results to four PostgreSQL tables, creating a durable record of the trend assessment and its supporting evidence.
### `trend_windows` — Current State
The `persist_trend_summary()` function in `services/aggregation/worker.py` upserts the `TrendSummary` into the `trend_windows` table, keyed by `(entity_type, entity_id, window)`. Each cycle overwrites the previous row for that ticker and window, so `trend_windows` always reflects the most recent assessment. The row includes the trend direction, strength, confidence, contradiction score, disagreement details (as JSON), supporting and opposing evidence document IDs (as JSON arrays), dominant catalysts, material risks, market context, and the generation timestamp.
### `trend_history` — Time-Series Snapshots
Immediately after the upsert, `persist_trend_summary()` also inserts a snapshot row into the `trend_history` table. Unlike `trend_windows`, this table is append-only — every aggregation cycle adds a new row, creating a time-series of how the trend evolved over time. The history table stores the direction, strength, confidence, contradiction score, catalysts, risks, and timestamp. This time-series data powers the trend charts in the dashboard and enables the momentum computation in `services/aggregation/projection.py` by providing the previous cycle's strength and direction. If the history insert fails (for example, if the table does not yet exist in a development environment), the failure is logged at debug level and does not block the main upsert.
### `trend_evidence` — Per-Document Rankings
The `persist_trend_evidence()` function writes detailed evidence ranking rows to the `trend_evidence` table, linked to the `trend_windows` row by its UUID. Each row records a document ID, its role (supporting or opposing), and the individual scoring components: rank score, weight component, impact component, recency component, confidence component, and sentiment value. Non-UUID document IDs (such as synthetic pattern signal IDs like `pattern:AAPL:earnings:7d`) are filtered out before insertion, since the `trend_evidence` table enforces a foreign key to the `documents` table.
### `trend_projections` — Forward-Looking Estimates
The `persist_trend_projection()` function in `services/aggregation/projection.py` inserts the `TrendProjection` into the `trend_projections` table, linked to the `trend_windows` row. The row stores the projected direction, strength, confidence, projection horizon, driving factors (as JSON), macro contribution percentage, divergence flag, and computation timestamp. Like trend history, projections accumulate over time, allowing analysis of how well the system's forward-looking estimates matched subsequent reality.
---
## What Comes Next
At this point, the aggregation engine has transformed weighted signals into `TrendSummary` objects across five time windows, detected contradictions, ranked evidence, computed confidence, and persisted everything to PostgreSQL. The trend metrics — direction, strength, confidence, contradiction score — encode the accumulated weight of evidence for each ticker. But a `TrendSummary` is still an assessment, not an action. The next stage translates these assessments into concrete recommendations: should the system buy, sell, hold, or simply watch? And with what conviction? [Page 5 — Recommendation Generation](05-recommendation-generation.md) explains how the recommendation engine applies data quality suppression, eligibility evaluation, position sizing, thesis generation, and risk classification to convert trend summaries into actionable `Recommendation` objects that the trading engine can execute.
@@ -0,0 +1,226 @@
# Page 5 — Recommendation Generation and Signal-to-Action Translation
The aggregation engine described in [Page 4](04-trend-aggregation-and-accumulating-signals.md) produces `TrendSummary` objects across five time windows for each ticker, encoding the direction, strength, confidence, contradiction level, and supporting evidence accumulated from all three signal layers. But a `TrendSummary` is an assessment — it describes what the evidence says, not what the system should do about it. The recommendation engine is where assessment becomes action. It takes each `TrendSummary`, subjects it to a series of deterministic evaluations, and produces a `Recommendation` object that specifies a concrete action (buy, sell, hold, or watch), an execution mode (informational, paper-eligible, or live-eligible), a position sizing guideline, a human-readable thesis, and a risk classification. Every decision in this pipeline is rule-based and fully traceable — the LLM is only involved in an optional downstream step that rewrites the thesis wording.
The recommendation worker in `services/recommendation/main.py` polls the `stonks:queue:recommendation` Redis queue for jobs, each specifying a ticker and time window. For each job, it delegates to `generate_recommendation()` in `services/recommendation/worker.py`, which orchestrates the full pipeline: fetch the latest trend summary, check for duplicate recommendations, fetch any available trend projection, evaluate data quality suppression, evaluate eligibility, optionally rewrite the thesis via LLM, build the `Recommendation` object, and persist everything to PostgreSQL. For a visual overview of this flow, see the [Recommendation Generation Flow diagram](diagrams/recommendation-generation-flow.md).
---
## Data Quality Suppression
Before the eligibility engine evaluates whether a trend is strong enough to act on, the suppression layer in `services/recommendation/suppression.py` asks a more fundamental question: is the underlying data reliable enough to act on at all? A trend might show high confidence and strong directionality, but if the documents feeding it are stale, poorly extracted, or drawn from a single source type, the apparent signal quality is illusory. The suppression layer acts as a pre-filter on data quality, running before the eligibility engine and forcing any recommendation built on unreliable data to `informational` mode regardless of how strong the trend metrics look.
The `evaluate_suppression()` function accepts a `TrendSummary` and a `DataQualityContext` — a set of metrics about the documents underlying the trend, populated by querying `documents` and `document_intelligence` tables for the evidence document IDs stored in the trend summary. When full document-level metrics are not available (for example, in a development environment without the full document pipeline), the function falls back to `build_quality_context_from_summary()`, which estimates quality metrics from the trend summary's own evidence counts and confidence.
### The Six Data Quality Checks
The suppression evaluation runs six independent checks, each comparing a data quality metric against a configurable threshold defined in `SuppressionConfig`. If any single check fails, the recommendation is suppressed:
1. **Low extraction confidence** — If the average extraction confidence across the evidence documents falls below `0.40` (`min_avg_extraction_confidence`), the underlying LLM extractions are too unreliable. This catches cases where the extractor struggled with document formatting, ambiguous content, or low-quality source material, as described in [Page 2](02-ai-agent-processing-and-extraction.md).
2. **Evidence staleness** — If the most recent evidence document is older than `168` hours (7 days, `max_evidence_staleness_hours`), the trend is based on outdated information. Markets move fast, and a week-old evidence base may no longer reflect current conditions. When documents exist but no timestamp is available, the evidence is conservatively treated as stale.
3. **Low source diversity** — If fewer than `1` distinct source type (`min_source_types`) contributed to the evidence, the signal may be driven by a single unreliable source class. In practice, this check fires when the quality context has documents but all come from the same source type (for example, all news articles with no filings or market data to corroborate).
4. **High extraction failure rate** — If more than `50%` (`max_extraction_failure_rate`) of the documents that should have contributed to the trend failed extraction entirely, the data pipeline is unreliable for this ticker. A high failure rate means the trend summary is built from a biased subset of the available evidence — the failed documents might have told a different story.
5. **Insufficient valid documents** — If fewer than `2` valid (non-failed) documents (`min_valid_documents`) contributed to the trend, there simply is not enough data to act on. A single document, no matter how high-quality, does not provide the corroboration needed for automated trading decisions.
6. **Low data quality score** — The `_compute_data_quality_score()` function computes an overall quality score from three weighted components: extraction confidence (40% weight, normalized against a 0.8 baseline), evidence freshness (30% weight, linear decay over the staleness window), and document coverage (30% weight, combining the valid/total ratio with a count factor that saturates at 10 documents). If this composite score falls below `0.30` (`min_data_quality_score`) and the low-confidence check has not already fired, a general suppression reason is added.
When any check triggers, the `SuppressionResult` records the specific reasons (as `SuppressionReason` enum values) and the computed data quality score. The worker in `services/recommendation/worker.py` uses this result to force the recommendation's mode to `informational` and append a suppression note to the thesis text, ensuring the suppression decision is visible in the audit trail.
### Safety Suppressions: Macro-Only and Pattern-Only Signals
Beyond the six data quality checks, two additional safety suppressions protect against acting on signals that lack company-specific corroboration:
**Macro-only suppression** (`evaluate_macro_only_suppression()`) fires when macro signals are the sole basis for a trend direction — no company-specific signals contributed at all. As described in [Page 3](03-signal-scoring-and-weighted-signals.md), macro signals enter the aggregation engine at a reduced weight of `0.3` relative to company signals. But even at reduced weight, macro signals alone can shift a trend direction if no company-specific evidence exists. When this happens, the recommendation is forced to `informational` mode with a caveat noting that the signal is macro-only and should not be used for automated trading.
**Pattern-only suppression** (`evaluate_pattern_only_suppression()`) applies the same logic to competitive/pattern signals. When pattern-based signals from `services/aggregation/pattern_matcher.py` and `services/aggregation/signal_propagation.py` are the sole contributors — no company-specific or macro signals — the recommendation is suppressed. Historical patterns are valuable context, but acting on them without any current evidence is too speculative for automated trading.
Both safety suppressions are evaluated in the worker after the main suppression check, and both force the mode to `informational` when triggered.
---
## Eligibility Evaluation
Recommendations that survive the suppression layer enter the eligibility evaluation in `services/recommendation/eligibility.py`. This is the core decision logic — a set of deterministic rules that map trend metrics to actions, execution modes, and position sizing. The `evaluate_eligibility()` function is the single entry point, accepting a `TrendSummary` and an `EligibilityConfig` of tunable thresholds.
### Gate Checks
The `_check_gates()` function applies five hard gates. If any gate fails, the trend is ineligible for a recommendation (though the action and mode are still computed for the audit trace):
| Gate | Threshold | Rejection Reason |
|------|-----------|-----------------|
| Confidence | ≥ `0.35` | `low_confidence` |
| Trend strength | ≥ `0.10` | `low_trend_strength` |
| Contradiction score | ≤ `0.60` | `high_contradiction` |
| Evidence count | ≥ `2` (supporting + opposing) | `insufficient_evidence` |
| Direction | ≠ `neutral` | `neutral_direction` |
These gates are intentionally conservative. A confidence threshold of `0.35` means the system needs meaningful evidence breadth and agreement before generating any recommendation at all (see the confidence computation in [Page 4](04-trend-aggregation-and-accumulating-signals.md)). The contradiction ceiling of `0.60` allows moderately contested trends through — only when the evidence is deeply split does the gate reject. The evidence minimum of `2` ensures that no recommendation is ever based on a single document.
When a trend fails any gate, the resulting `EligibilityResult` has `eligible = False` and the mode is forced to `informational`, regardless of what the mode escalation logic would otherwise compute.
### Action Mapping
The `_determine_action()` function maps the trend's direction and strength to one of four action types. The logic evaluates in a specific order:
**Mixed or neutral direction → WATCH.** If the trend direction is `mixed` (high contradiction with weak directional signal) or `neutral`, the action is always `WATCH`. There is no directional conviction to act on.
**Strong directional signal → BUY or SELL.** If the trend strength reaches `0.25` or above (`action_strength_threshold`), the action follows the direction: `BUY` for bullish, `SELL` for bearish. This threshold ensures that only trends with meaningful magnitude trigger position-changing actions.
**Weak directional signal with decent confidence → HOLD.** If the trend has a clear direction (bullish or bearish) but strength remains below `0.25`, the action depends on confidence. If confidence reaches `0.50` or above (`hold_confidence_threshold`), the action is `HOLD` — the system recognizes the directional lean but does not have enough conviction to recommend a position change. Below `0.50` confidence, the action falls to `WATCH`.
This mapping creates the escalation ladder described in [Page 4](04-trend-aggregation-and-accumulating-signals.md): as consecutive signals accumulate and strengthen the trend metrics, the action naturally progresses from WATCH → HOLD → BUY/SELL.
### Mode Escalation
The `_determine_mode()` function determines the highest execution mode allowed for the recommendation. Mode controls whether the recommendation is purely informational, eligible for paper trading, or eligible for live trading:
**WATCH and HOLD → always informational.** These actions do not trigger trades, so they are always `informational` mode. They are logged for human review and dashboard display but never enter the trading engine.
**BUY and SELL → escalation based on signal quality.** For actionable recommendations, mode escalates through three tiers:
- **`informational`** — The default when confidence is below `0.50`. The recommendation is recorded but not eligible for any trading.
- **`paper_eligible`** — When confidence reaches `0.50` or above (`paper_confidence_threshold`). The recommendation can be picked up by the paper trading engine described in [Page 6](06-trading-decisions-and-execution.md).
- **`live_eligible`** — The strictest tier, requiring confidence ≥ `0.70` (`live_confidence_threshold`), contradiction ≤ `0.25` (`live_max_contradiction`), and evidence count ≥ `5` (`live_min_evidence`). This triple gate ensures that only high-conviction, well-corroborated, low-contradiction recommendations can trigger live trades.
The evidence count for mode escalation is computed as the sum of supporting and opposing evidence documents, matching the same count used in the gate checks.
---
## Position Sizing
The `_compute_position_sizing()` function in `services/recommendation/eligibility.py` translates signal quality into a portfolio allocation guideline. Position sizing is not a fixed value — it scales dynamically with the confidence and strength of the underlying trend, penalized by contradiction and thin evidence.
### Base and Scaling
The computation starts with a base portfolio allocation of `1%` (`base_portfolio_pct = 0.01`) and scales upward based on two factors:
- **Confidence factor** — `0.8 × confidence` (`confidence_sizing_weight`), reflecting how much the system trusts the trend assessment.
- **Strength factor** — `0.5 + 0.5 × trend_strength`, ranging from `0.5` (weakest trend) to `1.0` (strongest trend).
The raw portfolio percentage is computed as:
```
raw_portfolio = base + confidence_factor × strength_factor × (max - base)
```
where `max` is `10%` (`max_portfolio_pct = 0.10`). At maximum confidence (1.0) and maximum strength (1.0), the raw allocation reaches the full 10%. At typical values (confidence 0.6, strength 0.3), the raw allocation is considerably lower.
### Contradiction Penalty
The contradiction score applies a multiplicative penalty:
```
portfolio_pct = raw_portfolio × (1.0 0.5 × contradiction_score)
```
A contradiction score of `0.40` reduces the allocation by 20%. A score of `0.0` (no contradiction) applies no penalty. This ensures that contested trends receive smaller position sizes even when they pass the eligibility gates.
### Evidence Count Penalty
Thin evidence further reduces the allocation:
- Fewer than `3` evidence documents → multiply by `0.5` (halved).
- Fewer than `5` evidence documents → multiply by `0.75`.
- `5` or more documents → no penalty.
This penalty stacks with the contradiction penalty, so a trend with high contradiction and thin evidence receives a substantially reduced position size.
### Max Loss Scaling
The same scaling logic applies to the maximum loss percentage, which starts at a base of `0.3%` (`base_max_loss_pct = 0.003`) and scales up to `2%` (`max_max_loss_pct = 0.02`). Higher-conviction positions are allowed larger loss tolerances, while low-conviction or contested positions are constrained to tighter stops.
The final `PositionSizing` object (defined in `services/shared/schemas.py`) contains `portfolio_pct` and `max_loss_pct`, both clamped to their respective bounds. This object is embedded in the `Recommendation` and later consumed by the trading engine's own position sizer (described in [Page 6](06-trading-decisions-and-execution.md)), which applies additional portfolio-level constraints.
---
## Thesis Generation
Every recommendation includes a human-readable thesis that explains the reasoning behind the action. Thesis generation happens in two layers: a deterministic assembly that is always present, and an optional LLM rewrite that polishes the wording for trading-eligible recommendations.
### Deterministic Thesis Assembly
The `build_thesis()` function in `services/recommendation/worker.py` constructs a thesis string entirely from the trend data and eligibility result, with no model involvement. The thesis is assembled from several components in order:
1. **Opening** — States the ticker, trend direction, window, strength, and confidence. For example: "AAPL shows a bearish trend over the 7d window with strength 0.35 and confidence 0.62."
2. **Catalysts** — Lists the top three dominant catalysts from the `TrendSummary`, drawn from the evidence ranking described in [Page 4](04-trend-aggregation-and-accumulating-signals.md).
3. **Contradiction note** — If the contradiction score exceeds `0.15`, a note flags the signal disagreement and its magnitude.
4. **Trend projection** — When a `TrendProjection` is available and not flagged as low-confidence, the thesis incorporates the projected direction, strength, and top driving factors. If the projection diverges from the current trend, a divergence note is appended.
5. **Risks** — Lists the top two material risks from the `TrendSummary`.
6. **Evidence count** — States the number of supporting and opposing evidence documents.
7. **Prescriptive action** — States the recommended action and mode (e.g., "Recommendation: SELL (paper eligible).").
The deterministic thesis is always generated and serves as the audit reference. Even when the LLM rewrites the thesis, the deterministic version is preserved in the model metadata for traceability.
### Optional LLM Rewrite via the Thesis-Rewriter Agent
For recommendations that are both eligible and not suppressed, the worker optionally invokes the thesis-rewriter agent to polish the deterministic thesis into analyst-quality prose. The LLM rewrite is implemented in `services/recommendation/thesis_llm.py` and uses the `thesis-rewriter` agent slug, resolved at runtime through the `AgentConfigResolver` in `services/shared/agent_config.py`.
The `AgentConfigResolver` queries the `ai_agents` and `agent_variants` database tables to resolve the active configuration for the `thesis-rewriter` slug, preferring an active variant's model, timeout, and retry settings when one exists. The resolver uses a 60-second TTL in-memory cache to avoid hitting the database on every recommendation. This is the same resolution mechanism used by the document extractor and event classifier agents described in [Page 2](02-ai-agent-processing-and-extraction.md).
The `rewrite_thesis_with_llm()` function builds a prompt from the deterministic thesis and trend context (ticker, window, direction, strength, confidence, contradiction score, catalysts, risks), sends it to the local Ollama instance via HTTP, and returns the rewritten text. The system prompt enforces strict rules: no fabricated information, no numbers or facts not present in the input, under 150 words, neutral professional tone, and only the rewritten thesis text in the response.
The LLM layer is purely additive — if the call fails for any reason (network error, timeout, empty response, token budget exceeded), the original deterministic thesis is returned unchanged. The worker in `services/recommendation/main.py` resolves the thesis-rewriter configuration at startup and refreshes it every 50 jobs to pick up configuration changes without requiring a restart. When no database configuration exists for the `thesis-rewriter` slug, thesis rewriting is silently disabled.
Performance logging for the thesis-rewriter is written to the `agent_performance_log` table, recording success/failure, duration, estimated token counts, and the variant ID. Token budget enforcement checks hourly usage against the variant's configured budget before making the LLM call, preventing runaway costs from high-volume recommendation cycles.
### Risk Classification Prefix
Before the thesis is stored, the `classify_risk()` function in `services/recommendation/worker.py` assigns a risk classification label that is prepended to the thesis text as a `[risk:<level>]` prefix. The classification is computed from a composite score:
| Factor | Contribution |
|--------|-------------|
| Contradiction score | `contradiction × 2.0` |
| Low confidence | `(1.0 confidence) × 1.5` |
| Low evidence count | `+1.0` if < 3 docs, `+0.5` if < 5 docs |
| Rejection reasons | `+0.5` per rejection reason |
The composite score maps to four levels:
| Score Range | Classification |
|-------------|---------------|
| ≥ 3.0 | `very_high` |
| ≥ 2.0 | `high` |
| ≥ 1.0 | `moderate` |
| < 1.0 | `low` |
A recommendation with high contradiction (0.4 → contributes 0.8), moderate confidence (0.55 → contributes 0.675), and 4 evidence documents (contributes 0.5) would score 1.975, classifying as `moderate`. The same recommendation with only 2 evidence documents would score 2.475, pushing it to `high`. This classification gives downstream consumers — both the trading engine and human reviewers — a quick risk signal without needing to re-evaluate the underlying metrics.
---
## Persistence
The recommendation pipeline persists its output to three PostgreSQL tables, creating a complete audit trail from trend assessment through decision logic to the final recommendation.
### `recommendations` — The Core Record
The `persist_recommendation()` function in `services/recommendation/worker.py` inserts the `Recommendation` into the `recommendations` table. Each row captures the ticker, action, mode, confidence, time horizon, thesis (including the risk classification prefix and any suppression notes), invalidation conditions (as JSONB), position sizing (portfolio percentage and max loss percentage), model metadata (provider, model name, prompt version, schema version), risk classification, and generation timestamp. The insert returns the recommendation's UUID, which serves as the foreign key for the evidence and risk evaluation tables.
### `recommendation_evidence` — Evidence Citations
For each evidence document referenced in the recommendation, a row is inserted into the `recommendation_evidence` table linking the recommendation UUID to the document UUID, with an evidence type (`supporting` or `opposing`) and a position-based weight that decays with rank: `weight = 1.0 / (1.0 + index × 0.1)`. The first supporting document gets weight `1.0`, the second gets `0.91`, the third `0.83`, and so on. Non-UUID document IDs (such as synthetic pattern signal IDs like `pattern:AAPL:earnings:7d` from the competitive signal layer) are filtered out before insertion, since the table enforces a foreign key to the `documents` table.
### `risk_evaluations` — Decision Audit Trail
The `risk_evaluations` table records the full eligibility decision for each recommendation: whether the trend was eligible, the allowed mode, the list of rejection reasons (as JSONB), and a `risk_checks` JSONB object containing the time horizon, position sizing details, invalidation conditions, and risk classification. This table enables post-hoc analysis of why the system made a particular decision — auditors can trace from the recommendation back through the eligibility evaluation to the underlying trend metrics.
---
## Deduplication
Before running the full evaluation pipeline, the worker checks whether the latest recommendation for the same ticker and time horizon is effectively identical to what would be generated. The `_is_duplicate_recommendation()` function in `services/recommendation/worker.py` compares the previous recommendation's action, mode, and confidence (within a `0.01` tolerance) against the current eligibility result. If all three match, the recommendation is skipped — the underlying trend data has not changed meaningfully since the last cycle. This prevents the system from flooding the `recommendations` table with identical entries on every aggregation cycle, while still generating a new recommendation whenever the trend metrics shift enough to change the action, mode, or confidence.
---
## What Comes Next
At this point, the recommendation engine has translated trend assessments into concrete `Recommendation` objects — each with an action, execution mode, position sizing guideline, thesis, and risk classification — and persisted them alongside their evidence citations and eligibility audit trails. Recommendations marked as `paper_eligible` or `live_eligible` are now available for the trading engine to consume. [Page 6 — Trading Decisions and Execution](06-trading-decisions-and-execution.md) explains how the trading engine polls these recommendations, applies its own pre-trade check sequence (circuit breakers, trading windows, confidence gates, deduplication, declining positions, and max open positions), computes final position sizes with portfolio-level constraints, and submits orders through the broker adapter to Alpaca's paper trading API.
@@ -0,0 +1,199 @@
# Page 6 — Trading Decisions and Execution
The recommendation engine described in [Page 5](05-recommendation-generation.md) produces `Recommendation` objects with an action, execution mode, position sizing guideline, thesis, and risk classification. Recommendations marked as `paper_eligible` or `live_eligible` are persisted to the `recommendations` table and are now available for the final stage of the pipeline: autonomous trade execution. The trading engine in `services/trading/engine.py` is where intelligence becomes action. It polls eligible recommendations, subjects each one to a strict sequence of pre-trade safety checks, computes a portfolio-aware position size, and — if every gate passes — submits an order through the broker adapter to Alpaca's paper trading API. Every evaluation, whether it results in a trade or a skip, is recorded as a `TradingDecision` in the `trading_decisions` table, creating a complete audit trail from the original document signal through to the broker response.
For a visual overview of the decision flow, see the [Trading Engine Decision Loop diagram](diagrams/trading-engine-decision-loop.md).
---
## The Trading Engine Decision Loop
The `TradingEngine` class in `services/trading/engine.py` is the orchestrator. When `start()` is called, it loads the current portfolio state from PostgreSQL — open positions, reserve pool balance, sector exposure, portfolio heat — and then spawns five concurrent `asyncio` tasks that run for the lifetime of the engine:
1. **`_decision_loop()`** — The core polling loop. Every 60 seconds (configurable via `polling_interval_seconds`), it queries the `recommendations` table for rows where `action IN ('buy', 'sell')`, `mode IN ('paper_eligible', 'live_eligible')`, and `generated_at` is within the last two hours. Recommendations are ordered by confidence descending and capped at 50 per cycle. For each recommendation, the engine fetches the current market price (first from `market_snapshots`, falling back to the Polygon API), then runs the full pre-trade evaluation pipeline described below.
2. **`_stop_loss_monitor()`** — Periodically checks current prices against the stop-loss and take-profit levels maintained by the `StopLossManager` in `services/trading/stop_loss_manager.py`. When a price crosses a stop-loss or take-profit threshold, the monitor submits a sell order to the broker queue. The `StopLossManager` computes initial levels from ATR and risk tier parameters, re-evaluates them when volatility shifts materially (ATR change > 10%), activates trailing stops when the price moves more than 50% toward the take-profit target, and tightens stops proactively when portfolio heat exceeds 80% of the maximum.
3. **`_performance_loop()`** — Computes portfolio-wide performance metrics (total value, unrealized and realized P&L, win rate, Sharpe ratio, drawdown, portfolio heat), persists daily snapshots to `portfolio_snapshots`, checks for daily-loss circuit breaker triggers, evaluates profit-taking opportunities, and synchronizes positions with the database to detect closed positions and trigger reserve pool siphoning.
4. **`_risk_tier_scheduler()`** — Runs once daily at 16:00 ET (market close). It loads the latest `PerformanceMetrics` from `portfolio_snapshots`, computes the reserve pool as a fraction of total portfolio value, and delegates to the `RiskTierController` in `services/trading/risk_tier_controller.py` to determine whether the active risk tier should change. Tier changes are persisted to `risk_tier_history` and take effect immediately for subsequent decision cycles.
5. **`_rebalance_scheduler()`** — Runs weekly on Monday at 09:45 ET (shortly after market open). It loads current positions, evaluates them against the active risk tier's constraints using the `PortfolioRebalancer`, and pushes any rebalance sell orders to `stonks:queue:broker_orders`. The rebalancer respects the circuit breaker — if any breaker is active, the rebalance cycle is skipped entirely.
All five tasks run concurrently within a single `asyncio` event loop. Graceful shutdown via `stop()` cancels all tasks and awaits their completion. If any task encounters an unexpected exception, it logs the error and retries after a brief sleep rather than crashing the engine.
---
## Pre-Trade Check Sequence
When the decision loop picks up a buy recommendation, it calls `evaluate_recommendation()` — a synchronous method that runs the full pre-trade check sequence. The checks are applied in a strict order, and the first failure short-circuits the evaluation with a `skip` decision. This fail-fast design ensures that expensive downstream computations (like position sizing and correlation analysis) are never reached when a simple gate would have rejected the trade.
The six checks, in order:
**a. Circuit breaker check.** The engine calls `self.circuit_breaker.is_active()` on the current `CircuitBreakerState`. If any circuit breaker is active and its cooldown has not expired, the recommendation is skipped with reason `circuit_breaker_active`. The circuit breaker mechanism is described in detail below.
**b. Trading window check.** The `is_within_trading_window()` function verifies that the current time falls within US market hours. Outside the trading window, no orders are submitted — the recommendation is skipped with reason `outside_trading_window`.
**c. Confidence gate.** The recommendation's confidence score is compared against the active risk tier's `min_confidence` threshold. A conservative tier requires confidence ≥ 0.75, moderate requires ≥ 0.55, and aggressive requires ≥ 0.40. If the recommendation's confidence falls below the tier minimum, it is skipped with reason `insufficient_confidence`. This gate ensures that the risk tier's conservatism is enforced before any capital allocation is considered.
**d. Deduplication check.** The engine maintains an in-memory set of processed recommendation IDs (`processed_recommendation_ids`) and also checks Redis via `stonks:dedupe:trading:*` keys (with a 24-hour TTL). If the recommendation has already been evaluated in this engine session or by a previous instance, it is skipped with reason `duplicate_recommendation`. This prevents the same recommendation from generating multiple orders across polling cycles.
**e. Declining positions check.** The `check_declining_positions()` method examines all open positions. If more than 50% of positions have unrealized losses exceeding 2% of their entry value, the engine halts new entries with reason `multiple_declining_positions`. This is a portfolio-level safety valve — when the majority of existing positions are underwater, adding new exposure compounds the risk.
**f. Max open positions check.** The engine enforces a configurable maximum number of concurrent positions (default 10). If the portfolio is already at capacity, the recommendation is skipped with reason `max_positions_reached`.
For sell recommendations, the engine follows a separate, simpler path: it verifies the trading window, looks up the existing position for the ticker, and submits a market sell order for the full position quantity without running the position sizer. Sell decisions still generate a `TradingDecision` audit record and set the Redis deduplication key.
If all six checks pass for a buy recommendation, the engine proceeds to position sizing.
---
## Position Sizing
The `PositionSizer` in `services/trading/position_sizer.py` translates a recommendation's signal quality into a concrete dollar amount and share count, applying a sequential pipeline of adjustments that account for confidence, portfolio composition, sector concentration, correlation, and upcoming earnings events. The sizer operates on the *active pool* — the portion of the portfolio available for trading after subtracting the reserve pool balance.
### Base Sizing
The computation begins with a base allocation percentage derived from the risk tier:
```
base_allocation_pct = risk_tier.max_position_pct × 0.5
raw_pct = base_allocation_pct × (confidence / min_confidence)
```
The base starts at half the tier's maximum position percentage, then scales linearly with how far the recommendation's confidence exceeds the tier minimum. A moderate-tier recommendation with confidence 0.70 against a minimum of 0.55 would produce a raw percentage of `0.05 × (0.70 / 0.55) ≈ 0.0636`, or about 6.4% of the active pool. The raw percentage is clamped to `max_position_pct` (5% for conservative, 10% for moderate, 15% for aggressive) and then converted to a dollar amount against the active pool. An absolute position cap (default $50) provides a hard ceiling regardless of pool size — a safety measure for the paper trading environment.
### Correlation-Aware Diversification
The sizer computes a weighted average correlation between the candidate ticker and all existing positions, using the pairwise correlation matrix that the engine refreshes from 30 days of daily close prices in `market_snapshots`. Each existing position's correlation is weighted by its market value, so larger positions have more influence on the diversification check.
If the weighted average correlation exceeds 0.8, the position is rejected outright — the portfolio already has too much exposure to correlated assets. Between 0.5 and 0.8, the dollar amount is reduced proportionally: a correlation of 0.65 produces a scale factor of `1.0 (0.65 0.5) / (0.8 0.5) = 0.5`, halving the position size. Below 0.5, no reduction is applied.
### Sector Exposure Reduction
The sizer checks whether adding the new position would push the sector's total exposure beyond the risk tier's `max_sector_pct` (20% for conservative, 30% for moderate, 40% for aggressive). If the sector is already at its limit, the position is rejected. If the new position would exceed the limit, the dollar amount is reduced to exactly fill the remaining sector capacity.
### Diversification Bonus
When the portfolio holds fewer than three distinct sectors and the candidate ticker belongs to a new sector, the sizer applies a 1.2× bonus to the dollar amount. This incentivizes early diversification — the first few positions are encouraged to spread across sectors rather than concentrating in a single one. The bonus is re-clamped to `max_position_pct` after application to prevent oversized positions.
### Earnings Proximity Adjustment
The sizer checks the earnings calendar for the candidate ticker. If earnings are within one trading day, the position is rejected entirely — the binary risk of an earnings surprise is too high for automated entry. If earnings are within three trading days, the dollar amount is reduced by 50%. Beyond three days, no adjustment is applied.
### Portfolio Heat Check and Share Rounding
After all adjustments, the sizer estimates the new position's contribution to portfolio heat (the aggregate risk from stop-loss distances across all positions). If adding the position would push total heat beyond `max_portfolio_heat × active_pool` (10% for conservative, 20% for moderate, 30% for aggressive), the position is rejected.
Finally, the dollar amount is converted to whole shares via `floor(dollar_amount / current_price)`. If rounding produces zero shares (the position is too small for even one share at the current price), the position is rejected. The final dollar amount is recalculated from the whole-share quantity to reflect the actual capital deployed.
The `PositionSizeResult` returned to the engine includes the dollar amount, share quantity, allocation percentage, a list of human-readable adjustment notes, and a rejected flag with reason if any step failed. These adjustment notes are embedded in the `TradingDecision`'s `decision_trace` for full auditability.
---
## Circuit Breaker
The `CircuitBreaker` in `services/trading/circuit_breaker.py` is a pure computation module that evaluates three independent trigger conditions. It carries no state of its own — the engine manages the `CircuitBreakerState` dataclass and persists trigger events to the `circuit_breaker_events` table and Redis keys under `stonks:trading:circuit_breaker:*`.
### Three Trigger Types
**Daily loss trigger.** When the portfolio's daily P&L loss exceeds 5% of total portfolio value (`daily_loss_pct = 0.05`), the circuit breaker activates. The `check_daily_loss()` method compares the absolute loss ratio against the threshold. The cooldown duration is set to `volatility_pause_hours` (default 2 hours). The performance loop in the engine calls `_check_circuit_breaker_daily_loss()` periodically to evaluate this condition against the latest portfolio metrics. In extreme cases where the drawdown exceeds an emergency threshold, the reserve pool's emergency liquidation mechanism may also be triggered.
**Single position loss trigger.** When any individual position loses more than 15% of its entry value (`single_position_loss_pct = 0.15`), the circuit breaker activates with a ticker-specific cooldown. The `check_single_position()` method evaluates the loss percentage. The cooldown for the affected ticker is set to `ticker_cooldown_hours` (default 48 hours), during which the engine will not re-enter that ticker. The `is_ticker_cooled_down()` method checks whether a specific ticker is still within its cooldown window by consulting the `ticker_cooldowns` dictionary in the `CircuitBreakerState`.
**Volatility trigger (stop-loss clustering).** When three or more stop-losses fire within a 30-minute rolling window (`stop_loss_hits_threshold = 3`, `stop_loss_window_minutes = 30`), the circuit breaker activates. The `check_volatility()` method uses a sliding window algorithm: it sorts the stop-loss timestamps and checks every contiguous subsequence of length `stop_loss_hits_threshold` to see if it fits within the window. This detects rapid-fire stop-loss cascades that indicate extreme market volatility. The cooldown is `volatility_pause_hours` (default 2 hours).
### Cooldown Computation
The `compute_cooldown_expiry()` method calculates when a triggered breaker expires. For `daily_loss` and `volatility` triggers, the expiry is `triggered_at + volatility_pause_hours`. For `single_position` triggers, the expiry is `triggered_at + ticker_cooldown_hours`, giving the affected ticker a longer cooling-off period. The `is_active()` method returns `True` when the breaker is flagged active and the current time has not yet passed the cooldown expiry.
### Redis State Tracking
The engine persists circuit breaker state to Redis under the `stonks:trading:circuit_breaker:*` key pattern (constructed by `trading_cb_key()` in `services/shared/redis_keys.py`). Each trigger type gets its own key — for example, `stonks:trading:circuit_breaker:daily_loss` — storing the activation timestamp and cooldown expiry. This allows the state to survive engine restarts and enables external monitoring tools to query breaker status without accessing the engine's memory.
---
## Reserve Pool
The `ReservePoolController` in `services/trading/reserve_pool.py` manages an untouchable cash reserve that grows from realized trading profits. The reserve serves two purposes: it provides a buffer against drawdowns, and its size relative to the portfolio influences risk tier upgrade decisions.
### Profit Siphoning
When the engine detects a closed position with positive unrealized P&L (via `_sync_positions_and_siphon()` in the performance loop), it calls `siphon_profit()` on the controller. The method transfers a configurable fraction of the realized profit into the reserve — by default 20% (`siphon_pct = 0.20`). Only positive profits are siphoned; losses do not reduce the reserve balance. Each siphon event is recorded in the `reserve_pool_ledger` table with the transfer amount, resulting balance, trigger type (`profit_siphon`), the ticker as reference, and a timestamp.
### High-Water Mark Rebalancing
The `is_high_water()` method returns `True` when the reserve balance exceeds 30% of total portfolio value (`high_water_pct = 0.30`). This signal is consumed by the risk tier scheduler — when the reserve is healthy and other performance criteria are met, the controller may recommend upgrading to a more aggressive tier. The high-water mark acts as a confidence indicator: a large reserve means the system has been consistently profitable and can afford to take on more risk.
### Emergency Liquidation
The `should_emergency_liquidate()` method checks whether the current drawdown exceeds an emergency threshold. When triggered, `emergency_liquidate()` returns the full reserve balance for release back into the active pool. The caller (the engine) is responsible for zeroing the persisted balance and recording the ledger entry. Emergency liquidation is a last resort — it sacrifices the safety buffer to prevent the portfolio from hitting a catastrophic loss level.
### Active Pool Computation
The `compute_active_pool()` method calculates the capital available for trading: `active_pool = total_portfolio_value reserve_balance`. All position sizing computations use the active pool rather than the total portfolio value, ensuring that the reserve is never inadvertently deployed into new positions.
---
## Risk Tier Auto-Adjustment
The `RiskTierController` in `services/trading/risk_tier_controller.py` evaluates portfolio performance and determines whether the active risk tier should shift. The system supports three tiers — conservative, moderate, and aggressive — each defined by a `RiskTierConfig` dataclass in `services/trading/models.py` with distinct parameter values:
| Parameter | Conservative | Moderate | Aggressive |
|-----------|-------------|----------|------------|
| `min_confidence` | 0.75 | 0.55 | 0.40 |
| `max_position_pct` | 5% | 10% | 15% |
| `stop_loss_atr_multiplier` | 1.5× | 2.0× | 2.5× |
| `reward_risk_ratio` | 2.0 | 1.5 | 1.2 |
| `max_sector_pct` | 20% | 30% | 40% |
| `max_portfolio_heat` | 10% | 20% | 30% |
The tier controller's `evaluate()` method checks two conditions:
**Downgrade (any one triggers).** If the trailing 30-day win rate drops below 40% or the current drawdown exceeds 15%, the tier steps down by one level (e.g., aggressive → moderate). If the system is already at conservative, no further downgrade is possible.
**Upgrade (all must be true).** If the win rate exceeds 55%, the reserve pool exceeds 20% of total portfolio value, and the current drawdown is below 5%, the tier steps up by one level. The triple requirement ensures that upgrades only happen when the system is performing well, has built a safety cushion, and is not in a drawdown.
The risk tier scheduler in the engine evaluates these conditions daily at market close. When a tier change occurs, it is persisted to the `risk_tier_history` table with the previous tier, new tier, trigger source (`auto_adjustment`), and the metrics that drove the decision (win rate, drawdown, reserve percentage, Sharpe ratio). The new tier takes effect immediately — the engine updates its `_active_risk_tier` reference, and all subsequent decision cycles use the new tier's parameters for confidence gates, position sizing, stop-loss computation, and sector exposure limits.
---
## Order Submission Flow
When `evaluate_recommendation()` returns an `act` decision, the engine constructs an order job and pushes it through a multi-stage submission pipeline that spans two services.
### TradingDecision Persistence
Every evaluation — whether it results in `act` or `skip` — produces a `TradingDecision` dataclass that is persisted to the `trading_decisions` table via `_persist_decision()`. The record captures the recommendation ID, decision outcome, skip reason (if applicable), ticker, computed position size and share quantity, the risk tier at the time of decision, portfolio heat, active pool and reserve pool balances, circuit breaker status, correlation and sector exposure check results, earnings proximity flag, and a `decision_trace` JSONB field containing the full reasoning chain. This creates a complete audit record of every recommendation the engine evaluated and why it acted or declined.
### Order Enqueue
For `act` decisions, the engine builds an order job dictionary containing the trading decision ID, ticker, action (buy or sell), quantity, and order type (market). This job is pushed via `rpush` to the `stonks:queue:broker_orders` Redis queue (constructed by `queue_key(QUEUE_BROKER)` from `services/shared/redis_keys.py`). The engine immediately deducts the estimated order cost from the in-memory active pool to prevent over-allocation across concurrent recommendation evaluations within the same polling cycle.
### Broker Service Processing
The broker service in `services/adapters/broker_service.py` runs as a standalone worker that polls `stonks:queue:broker_orders` via `blpop`. For each order job, `process_order_job()` executes a multi-step pipeline:
1. **Idempotency check.** A deterministic idempotency key is generated from the job's ticker, action, quantity, and trading decision ID. The service checks Redis first (fast path) and then the `orders` table (durable fallback) to prevent duplicate submissions. If a matching key exists, the job is silently dropped.
2. **Risk evaluation.** The service loads the current `PortfolioRiskConfig` from the database and the account's risk state (open positions, daily P&L, sector exposure) from both the database and the Alpaca API. The `evaluate_order()` function runs the proposed order through a set of risk checks — position limits, sector concentration, daily loss thresholds — and produces an evaluation result. The evaluation is persisted to the `risk_evaluations` table regardless of outcome.
3. **Alpaca submission.** If the risk evaluation passes, the service calls `submit_order()` on the `AlpacaBrokerAdapter` in `services/adapters/broker_adapter.py`. The adapter constructs the Alpaca REST API payload (symbol, quantity, side, order type, time in force) and submits it to `paper-api.alpaca.markets/v2/orders` with an idempotency key header. The adapter follows a fail-closed policy: any network error or ambiguous response returns a rejected `OrderResponse` rather than risking duplicate orders.
4. **Persistence and audit trail.** The `persist_order()` function writes the order to the `orders` table with the full request and response details, risk evaluation results, and the recommendation ID for traceability. When the order is filled, the fill details (price, quantity) are recorded. Order events are published to the analytical lakehouse via MinIO for downstream analysis. The Redis idempotency marker is set after successful persistence to prevent reprocessing.
The result is a complete chain of custody: from the original document that produced a signal (Pages [1](01-data-ingestion-and-preparation.md)[2](02-ai-agent-processing-and-extraction.md)), through signal scoring ([Page 3](03-signal-scoring-and-weighted-signals.md)) and trend aggregation ([Page 4](04-trend-aggregation-and-accumulating-signals.md)), to the recommendation ([Page 5](05-recommendation-generation.md)), the trading decision, the risk evaluation, and the broker response — every step is persisted and linked by foreign keys. The `trading_decisions` table links to `recommendations` via `recommendation_id`, the `orders` table links back to both, and the `positions` and `portfolio_snapshots` tables capture the portfolio impact over time.
For additional reference on the trading engine's configuration, queue topology, and database tables, see [docs/services.md](../services.md).
---
## Conclusion: From Raw Data to Trade Execution
This six-page series has traced the full intelligence-to-decision pipeline in Stonks Oracle, from the moment raw data enters the system to the moment an order reaches the broker.
It began with [Page 1](01-data-ingestion-and-preparation.md), where the scheduler orchestrates ingestion cycles across four data sources — Polygon news, SEC EDGAR filings, Polygon market data, and macro news APIs — and the parser normalizes raw content into structured documents ready for AI processing. [Page 2](02-ai-agent-processing-and-extraction.md) described how the Document Intelligence Extractor and Global Event Classifier agents use LLM inference to produce structured JSON intelligence, with hot-swappable model configurations and a robust JSON repair pipeline. [Page 3](03-signal-scoring-and-weighted-signals.md) explained how raw extraction output is transformed into `WeightedSignal` objects through a composite formula that balances recency, credibility, novelty, and market context across three independent signal layers. [Page 4](04-trend-aggregation-and-accumulating-signals.md) showed how the aggregation engine merges these signals across five time windows, detecting contradictions, ranking evidence, and computing trend projections — with consecutive same-direction signals accumulating to escalate the system's response from neutral through watch and hold to buy or sell. [Page 5](05-recommendation-generation.md) covered the translation of trend assessments into actionable recommendations through data quality suppression, eligibility evaluation, position sizing, thesis generation, and risk classification.
And here in Page 6, the pipeline reached its terminus: the trading engine's decision loop polling those recommendations, subjecting each to circuit breaker checks, confidence gates, deduplication, portfolio health assessments, and a multi-step position sizer — then submitting approved orders through the broker adapter to Alpaca's paper trading API, with every decision recorded in a fully auditable trail from signal to execution.
The pipeline is designed to be conservative by default and transparent throughout. Every stage applies its own safety checks — deduplication at ingestion, confidence gates at extraction, contradiction detection at aggregation, suppression at recommendation, and circuit breakers at trading. The system can be tuned through runtime configuration (risk tier parameters, suppression thresholds, signal layer toggles in `risk_configs`) without code changes or restarts. And the complete audit trail — from `documents` through `document_intelligence`, `document_impact_records`, `trend_windows`, `recommendations`, `trading_decisions`, and `orders` — means that any trade can be traced back to the specific documents, signals, and decisions that produced it.
@@ -0,0 +1 @@
@@ -0,0 +1,81 @@
# Ingestion-to-Extraction Flow
```mermaid
flowchart TD
subgraph Scheduler["Scheduler\nservices/scheduler/app.py"]
S1["schedule_cycle()"]
S2["Cadence check\nmarket_api: 300s\nnews_api: 300s\nfilings_api: 3600s\nmacro_news: 600s"]
S3["Rate limit check\ncheck_rate_limit()"]
S1 --> S2 --> S3
end
S3 -->|"rpush"| Q_ING["stonks:queue:ingestion"]
Q_ING -->|"lpop"| ING
subgraph ING["Ingestion Worker\nservices/ingestion/worker.py"]
direction TB
AD["Adapter Dispatch\nprocess_job()"]
AD --> PA["PolygonMarketAdapter\nservices/adapters/market_adapter.py"]
AD --> PB["PolygonNewsAdapter\nservices/adapters/news_adapter.py"]
AD --> PC["SECEdgarAdapter\nservices/adapters/filings_adapter.py"]
AD --> PD["MacroNewsAdapter\nservices/adapters/macro_news_adapter.py"]
AD --> PE["WebScrapeAdapter\nservices/adapters/web_scrape_adapter.py"]
end
ING -->|"Content hash check\nstonks:dedupe:*\nTTL 24h"| REDIS_DEDUPE[("Redis\nDedupe Markers")]
ING -->|"upload_raw_artifact()"| MINIO_RAW
subgraph MINIO_RAW["MinIO Raw Storage"]
B1["stonks-raw-market"]
B2["stonks-raw-news"]
B3["stonks-raw-filings"]
end
ING -->|"persist_ingestion_items()"| PG_ING
subgraph PG_ING["PostgreSQL"]
T1["documents"]
T2["ingestion_runs"]
T3["document_company_mentions"]
end
ING -->|"rpush new doc IDs"| Q_PARSE["stonks:queue:parsing"]
Q_PARSE -->|"lpop"| PARSER
subgraph PARSER["Parser Worker\nservices/parser/worker.py"]
P1["fetch_html() → parse_html()"]
P2["Quality scoring\nconfidence: high / medium / low"]
P3["Company mention detection\ndetect_company_mentions()"]
P4["Routing decision"]
P1 --> P2 --> P3 --> P4
end
PARSER -->|"upload_normalized_text()\nupload_parser_output()"| MINIO_NORM["MinIO\nstonks-normalized"]
PARSER -->|"update_document_parse_results()"| PG_ING
P4 -->|"doc_type = macro_event"| Q_MACRO["stonks:queue:macro_classification"]
P4 -->|"doc_type ≠ macro_event"| Q_EXT["stonks:queue:extraction"]
Q_EXT -->|"lpop"| EXT
Q_MACRO -->|"lpop"| EXT
subgraph EXT["Extractor Worker\nservices/extractor/main.py"]
E1["Document Intelligence\nExtractor agent\nslug: document-extractor"]
E2["Global Event Classifier\nslug: event-classifier\nservices/extractor/event_classifier.py"]
E3["persist_extraction()\nservices/extractor/worker.py"]
end
EXT -->|"persist to"| PG_EXT
subgraph PG_EXT["PostgreSQL"]
T4["document_intelligence"]
T5["document_impact_records"]
T6["global_events"]
T7["macro_impact_records"]
end
EXT -->|"rpush"| Q_AGG["stonks:queue:aggregation"]
```
@@ -0,0 +1,80 @@
# Recommendation Generation Flow
```mermaid
flowchart TD
Q_REC["stonks:queue:recommendation"] -->|"lpop"| WORKER["Recommendation Worker\nservices/recommendation/main.py"]
WORKER --> FETCH["Fetch TrendSummary\nfrom trend_windows\nfor ticker + window"]
FETCH --> SUPP
subgraph SUPP["Data Quality Suppression\nservices/recommendation/suppression.py"]
S1["extraction confidence < 0.40?"]
S2["evidence staleness > 168h?"]
S3["source diversity < 1 type?"]
S4["extraction failure rate > 50%?"]
S5["valid documents < 2?"]
S6["data quality score < 0.30?"]
S7["Macro-only signal?\nevaluate_macro_only_suppression()"]
S8["Pattern-only signal?\nevaluate_pattern_only_suppression()"]
end
SUPP -->|"Any check fails:\nsuppressed = true\nmode → informational"| ELIG
SUPP -->|"All checks pass"| ELIG
subgraph ELIG["Eligibility Evaluation\nservices/recommendation/eligibility.py"]
direction TB
G["Gate Checks"]
G1["confidence ≥ 0.35"]
G2["strength ≥ 0.10"]
G3["contradiction ≤ 0.60"]
G4["evidence ≥ 2"]
G5["direction ≠ neutral"]
G --> G1 & G2 & G3 & G4 & G5
G1 & G2 & G3 & G4 & G5 --> ACT["Action Mapping"]
ACT --> A1["BUY: bullish + strength ≥ 0.25"]
ACT --> A2["SELL: bearish + strength ≥ 0.25"]
ACT --> A3["HOLD: directional + confidence ≥ 0.50"]
ACT --> A4["WATCH: otherwise"]
A1 & A2 & A3 & A4 --> MODE["Mode Escalation"]
MODE --> M1["informational\n(default for HOLD/WATCH)"]
MODE --> M2["paper_eligible\nconfidence ≥ 0.50"]
MODE --> M3["live_eligible\nconfidence ≥ 0.70\ncontradiction ≤ 0.25\nevidence ≥ 5"]
end
ELIG --> SIZING
subgraph SIZING["Position Sizing\nservices/recommendation/eligibility.py"]
PS1["base = 1% portfolio"]
PS2["scale by confidence × strength\nup to 10% max"]
PS3["contradiction penalty\n0.5 × contradiction_score"]
PS4["evidence count penalty\n< 3 docs → ×0.5\n< 5 docs → ×0.75"]
end
SIZING --> THESIS
subgraph THESIS["Thesis Generation"]
TH1["Deterministic thesis\nassembled from trend data"]
TH2["Optional LLM rewrite\nthesis-rewriter agent\nservices/recommendation/thesis_llm.py"]
TH1 --> TH2
end
THESIS --> RISK
subgraph RISK["Risk Classification"]
RC1["low"]
RC2["moderate"]
RC3["high"]
RC4["very_high"]
end
RISK --> PERSIST
subgraph PERSIST["Persistence — PostgreSQL"]
P1["recommendations"]
P2["recommendation_evidence"]
P3["risk_evaluations"]
end
```
@@ -0,0 +1,52 @@
# Three-Layer Signal Merging
```mermaid
flowchart TD
subgraph Layer1["Layer 1 — Company Signals"]
DIR["document_impact_records\n(per-company extraction output)"]
DIR -->|"build_weighted_signals()"| WS1["WeightedSignal[]\nweight = 1.0 (full)"]
end
subgraph Layer2["Layer 2 — Macro Signals"]
MIR["macro_impact_records\n(global event interpolation)"]
MIR -->|"build_macro_weighted_signals()"| WS2["WeightedSignal[]\nimpact × MACRO_SIGNAL_WEIGHT\n(0.3)"]
TOGGLE_M{"macro_enabled\nin risk_configs?"}
TOGGLE_M -->|"true"| MIR
TOGGLE_M -->|"false"| SKIP_M["Layer skipped\ngraceful degradation"]
end
subgraph Layer3["Layer 3 — Competitive Signals"]
CSR["competitive_signal_records\n(pattern mining + propagation)"]
CSR -->|"build_pattern_weighted_signals()\nservices/aggregation/signal_propagation.py"| WS3["WeightedSignal[]\nimpact × COMPETITIVE_SIGNAL_WEIGHT\n(0.2)"]
TOGGLE_C{"competitive_enabled\nin risk_configs?"}
TOGGLE_C -->|"true"| CSR
TOGGLE_C -->|"false"| SKIP_C["Layer skipped\ngraceful degradation"]
end
WS1 --> MERGE["Concatenate all WeightedSignal lists"]
WS2 --> MERGE
WS3 --> MERGE
MERGE --> AGG
subgraph AGG["Aggregation Engine\nservices/aggregation/worker.py"]
A1["weighted_sentiment_average()"]
A2["detect_contradictions()\nservices/aggregation/contradiction.py"]
A3["derive_trend_direction()"]
A4["compute_trend_confidence()"]
A5["rank_evidence()"]
A1 --> A2 --> A3 --> A4 --> A5
end
AGG -->|"assemble_trend_summary()"| TS["TrendSummary\nservices/shared/schemas.py"]
TS -->|"persist_trend_summary()"| PG_TREND
subgraph PG_TREND["PostgreSQL"]
TW["trend_windows\n(upserted each cycle)"]
TH["trend_history\n(time-series snapshots)"]
TE["trend_evidence\n(per-document rankings)"]
end
AGG -->|"rpush"| Q_REC["stonks:queue:recommendation"]
```
@@ -0,0 +1,94 @@
# Trading Engine Decision Loop
```mermaid
flowchart TD
subgraph ENGINE["Trading Engine\nservices/trading/engine.py"]
direction TB
TASKS["5 Concurrent Async Tasks"]
T1["_decision_loop()\n60s polling interval"]
T2["_stop_loss_monitor()"]
T3["_performance_loop()"]
T4["_risk_tier_scheduler()"]
T5["_rebalance_scheduler()"]
TASKS --> T1 & T2 & T3 & T4 & T5
end
T1 --> POLL["Poll recommendations table\naction IN (buy, sell)\nmode IN (paper_eligible, live_eligible)\ngenerated_at > NOW() 2h"]
POLL --> EVAL["evaluate_recommendation()"]
EVAL --> CHK_A
subgraph PRETRADE["Pre-Trade Check Sequence\n(first failure short-circuits)"]
direction TB
CHK_A["a. Circuit Breaker active?\nservices/trading/circuit_breaker.py\nTriggers: daily_loss, single_position, volatility"]
CHK_B["b. Trading Window?\nis_within_trading_window()"]
CHK_C["c. Confidence Gate\nconfidence ≥ risk_tier.min_confidence"]
CHK_D["d. Deduplication\nRec ID in processed set?\nRedis: stonks:dedupe:trading:*"]
CHK_E["e. Declining Positions\n> 50% positions down > 2%"]
CHK_F["f. Max Open Positions\nopen_count ≥ max (default 10)"]
CHK_A -->|"pass"| CHK_B
CHK_B -->|"pass"| CHK_C
CHK_C -->|"pass"| CHK_D
CHK_D -->|"pass"| CHK_E
CHK_E -->|"pass"| CHK_F
end
CHK_A & CHK_B & CHK_C & CHK_D & CHK_E & CHK_F -->|"fail"| SKIP["TradingDecision\ndecision = skip\n+ skip_reason"]
CHK_F -->|"pass"| SIZER
subgraph SIZER["Position Sizing\nservices/trading/position_sizer.py"]
direction TB
SZ1["Base sizing\nrisk_tier.max_position_pct × 0.5\n× (confidence / min_confidence)"]
SZ2["Correlation reduction\nweighted avg corr > 0.8 → reject\n> 0.5 → proportional reduction"]
SZ3["Sector exposure\ncap at risk_tier.max_sector_pct"]
SZ4["Diversification bonus\n1.2× for new sector (< 3 sectors)"]
SZ5["Earnings proximity\n≤ 1 day → reject\n≤ 3 days → 50% reduction"]
SZ6["Absolute position cap"]
SZ7["Portfolio heat check\nmax_portfolio_heat × active_pool"]
SZ8["Share rounding\nfloor(dollar / price)"]
SZ1 --> SZ2 --> SZ3 --> SZ4 --> SZ5 --> SZ6 --> SZ7 --> SZ8
end
SIZER -->|"rejected"| SKIP
SIZER -->|"approved"| ACT["TradingDecision\ndecision = act\nshares, dollar amount"]
ACT --> PERSIST_TD["Persist to\ntrading_decisions"]
ACT --> ORDER["Build order job\n{ticker, action, side,\nquantity, order_type}"]
ORDER -->|"rpush"| Q_BROKER["stonks:queue:broker_orders"]
Q_BROKER --> BROKER["Broker Adapter\nAlpaca paper trading\nservices/adapters/broker_adapter.py"]
BROKER --> AUDIT
subgraph AUDIT["Audit Trail — PostgreSQL"]
AU1["orders"]
AU2["positions"]
AU3["portfolio_snapshots"]
end
subgraph CB_DETAIL["Circuit Breaker Detail\nservices/trading/circuit_breaker.py"]
CB1["daily_loss\nportfolio loss > 5%\ncooldown: volatility_pause_hours"]
CB2["single_position\nposition loss > 15%\ncooldown: ticker_cooldown_hours (48h)"]
CB3["volatility\n≥ 3 stop-losses in 30min\ncooldown: volatility_pause_hours (2h)"]
CB4["Redis state\nstonks:trading:circuit_breaker:*"]
end
subgraph RESERVE["Reserve Pool\nservices/trading/reserve_pool.py"]
RP1["Profit siphoning: 20%"]
RP2["High-water rebalance: 30%"]
RP3["Emergency liquidation"]
RP4["reserve_pool_ledger"]
end
subgraph RISK_TIER["Risk Tier Auto-Adjustment\nservices/trading/risk_tier_controller.py"]
RT1["Evaluate: Sharpe ratio,\ndrawdown, win rate"]
RT2["conservative → moderate → aggressive"]
RT3["risk_tier_history"]
end
```
@@ -0,0 +1,62 @@
# Trend Accumulation and Escalation
```mermaid
flowchart TD
subgraph Windows["Five Time Windows\nservices/aggregation/worker.py"]
W1["intraday (12h)"]
W2["1d (1 day)"]
W3["7d (7 days)"]
W4["30d (30 days)"]
W5["90d (90 days)"]
end
W1 & W2 & W3 & W4 & W5 --> SIGNALS
SIGNALS["Fetch signals per window\nCompany + Macro + Competitive\n→ WeightedSignal[]"]
SIGNALS --> SENT["weighted_sentiment_average()\nCompute avg sentiment across signals"]
SENT --> DIR
subgraph DIR["derive_trend_direction()"]
D1["avg_sentiment ≥ 0.15 → BULLISH"]
D2["avg_sentiment ≤ 0.15 → BEARISH"]
D3["contradiction > 0.10\nAND |avg| < 0.30 → MIXED"]
D4["otherwise → NEUTRAL"]
end
DIR --> CONF
subgraph CONF["compute_trend_confidence()"]
C1["Unique source count\ncaps at 15 → 0.8 contribution"]
C2["Avg extraction credibility"]
C3["Signal agreement ratio\ndampened by log₂(n+1)/log₂(8)\nsaturates ~7 unique sources"]
C4["Contradiction penalty\n0.4 × contradiction_score"]
C5["confidence = 0.3×count + 0.3×credibility\n+ 0.4×agreement penalty"]
end
CONF --> STRENGTH["trend_strength = |avg_sentiment|\nclamped to [0, 1]"]
STRENGTH --> ESC
subgraph ESC["Escalation Path\n(via eligibility thresholds)"]
direction TB
E1["NEUTRAL\nconfidence < 0.35\nOR strength < 0.10\nOR direction = neutral"]
E2["WATCH\nstrength < 0.25\nAND confidence < 0.50"]
E3["HOLD\nstrength < 0.25\nAND confidence ≥ 0.50"]
E4["BUY / SELL\nstrength ≥ 0.25\nAND direction = bullish/bearish"]
E1 -->|"More signals\nsame direction"| E2
E2 -->|"Confidence grows\nmore unique sources"| E3
E3 -->|"Strength exceeds 0.25\naccumulated evidence"| E4
end
ESC --> PERSIST
subgraph PERSIST["Persistence"]
P1["trend_windows\n(upserted each cycle)"]
P2["trend_history\n(time-series snapshots)"]
P3["trend_evidence\n(per-document rankings)"]
P4["trend_projections\nservices/aggregation/projection.py"]
end
```
@@ -0,0 +1,58 @@
# Weighted Signal Computation
```mermaid
flowchart TD
DOC["Document Signal Input\n(published_at, source_credibility,\nnovelty_score, extraction_confidence,\nmarket_ctx)"]
DOC --> GATE
DOC --> REC
DOC --> CRED
DOC --> NOV
DOC --> MKT
subgraph GATE["Confidence Gate"]
G1["extraction_confidence ≥ 0.2?"]
G1 -->|"Yes"| G2["gate = 1.0"]
G1 -->|"No"| G3["gate = 0.0\n(signal zeroed out)"]
end
subgraph REC["Recency Decay"]
R1["w = 2^(age_hours / half_life)"]
R2["Half-lives per window:\nintraday: 2h\n1d: 12h\n7d: 72h\n30d: 240h\n90d: 720h"]
R3["Floor: min_recency_weight = 0.01"]
R1 --- R2
R1 --- R3
end
subgraph CRED["Source Credibility"]
C1["Clamp to [0.1, 1.0]"]
C2["Apply exponent\n(default 1.0)"]
C1 --> C2
end
subgraph NOV["Novelty Bonus"]
N1["bonus = novelty_score × 0.25"]
N2["Range: [0.0, 0.25]\n(up to 25% boost)"]
N1 --- N2
end
subgraph MKT["Market Context Multiplier"]
M1["Volatility boost\nlog₁₊(excess) × 0.15\ncapped at 0.30"]
M2["Volume surge boost\nvolume_change > 50% → +0.15"]
M3["multiplier = 1.0 + boost\n(always ≥ 1.0)"]
M1 --> M3
M2 --> M3
end
GATE --> FORMULA
REC --> FORMULA
CRED --> FORMULA
NOV --> FORMULA
MKT --> FORMULA
FORMULA["combined = gate × recency × credibility\n× (1 + novelty_bonus)\n× market_context_multiplier"]
FORMULA --> SW["SignalWeight\nservices/aggregation/scoring.py"]
SW --> WS["WeightedSignal\n{ document_id, weight: SignalWeight,\nsentiment_value, impact_score }"]
```
@@ -0,0 +1,40 @@
# Intelligence Pipeline Deep Dive
This document series provides a narrative walkthrough of the full intelligence-to-decision pipeline in Stonks Oracle. Unlike the existing service reference and API documentation, these pages tell the story of how raw data enters the system, gets processed by AI agents, produces structured signals, accumulates into trend summaries, and ultimately drives autonomous trading decisions.
Each page covers one stage of the pipeline and ends with a transition to the next, so you can read the series end-to-end or jump directly to the stage you need. Diagrams are stored as standalone Mermaid files that can be rendered independently or embedded in other documents.
---
## Table of Contents
1. [Data Ingestion and Preparation](01-data-ingestion-and-preparation.md) — How raw data from Polygon.io, SEC EDGAR, and macro news APIs enters the system, gets deduplicated, stored, parsed, and routed for AI processing.
2. [AI Agent Processing and Structured Extraction](02-ai-agent-processing-and-extraction.md) — How the Document Intelligence Extractor and Global Event Classifier agents use LLM inference to produce structured JSON intelligence from documents.
3. [Signal Scoring and the WeightedSignal Abstraction](03-signal-scoring-and-weighted-signals.md) — How raw extraction output is transformed into weighted signals through confidence gating, recency decay, source credibility, novelty bonuses, and market context multipliers.
4. [Trend Aggregation and Accumulating Signals](04-trend-aggregation-and-accumulating-signals.md) — How the aggregation engine merges weighted signals across five time windows, detects contradictions, ranks evidence, and escalates trend strength as consecutive signals accumulate.
5. [Recommendation Generation](05-recommendation-generation.md) — How trend summaries pass through data quality suppression, eligibility evaluation, position sizing, thesis generation, and risk classification to produce actionable recommendations.
6. [Trading Decisions and Execution](06-trading-decisions-and-execution.md) — How the trading engine polls recommendations, runs pre-trade checks, sizes positions, enforces circuit breakers, and submits orders through the broker adapter.
---
## Diagrams
The following Mermaid diagram files can be rendered independently or referenced from the narrative pages:
- [Ingestion to Extraction Flow](diagrams/ingestion-to-extraction-flow.md) — Flowchart from Scheduler through Ingestion, Parser, to Extractor with all queues and storage.
- [Three-Layer Signal Merging](diagrams/three-layer-signal-merging.md) — Company, Macro, and Competitive signal layers converging into the Aggregation engine.
- [Weighted Signal Computation](diagrams/weighted-signal-computation.md) — Component breakdown of the composite weight formula.
- [Trend Accumulation and Escalation](diagrams/trend-accumulation-escalation.md) — How consecutive signals strengthen trends and escalate actions across time windows.
- [Recommendation Generation Flow](diagrams/recommendation-generation-flow.md) — From TrendSummary through suppression, eligibility, thesis, risk classification, to persistence.
- [Trading Engine Decision Loop](diagrams/trading-engine-decision-loop.md) — Pre-trade check sequence, position sizing, and order submission flow.
---
## Related Documentation
For reference-level detail on individual services, AI agent configuration, and infrastructure, see the existing documentation:
- [Services Reference](../services.md) — Per-service configuration, database tables, queues, and runtime behaviors.
- [AI Agents Guide](../ai-agents.md) — AI agent configuration, variants, A/B testing, and the agent management API.
- [Data Pipeline Architecture](../architecture-data-pipeline.md) — Queue topology, data store summary, and Mermaid flow diagrams for the full data pipeline.
- [LLM-to-Trade Pipeline](../llm-to-trade-pipeline.md) — End-to-end data flow from model output through signal aggregation to trade execution.
+612
View File
@@ -0,0 +1,612 @@
# Observability and Metrics Reference
This document covers the full observability stack for Stonks Oracle: Prometheus metrics, operational alerting, structured logging, dead-letter queues, and recommended monitoring queries.
## Prometheus Metrics Endpoint
The Query API exposes a `/metrics` endpoint that returns all registered Prometheus metrics in the standard text exposition format.
**Endpoint**: `GET /metrics` on the Query API service (port 8000)
**Response**: `text/plain; version=0.0.4; charset=utf-8` — standard Prometheus scrape format via `prometheus_client.generate_latest()`.
### Prometheus Scrape Configuration
Add the following job to your `prometheus.yml`:
```yaml
scrape_configs:
- job_name: "stonks-oracle"
scrape_interval: 15s
scrape_timeout: 10s
metrics_path: /metrics
static_targets:
- targets:
# Docker Compose
- "query-api:8000"
# Kubernetes
# - "query-api.stonks-oracle.svc.cluster.local:8000"
```
For Kubernetes deployments, you can also use a `ServiceMonitor` resource if the Prometheus Operator is installed:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: stonks-oracle
namespace: stonks-oracle
spec:
selector:
matchLabels:
app: query-api
endpoints:
- port: http
path: /metrics
interval: 15s
```
---
## Prometheus Metrics Reference
All metrics are defined in `services/shared/metrics.py`. Metric names use the `stonks_` prefix.
### Service Info
| Metric | Type | Description |
|--------|------|-------------|
| `stonks_oracle_info` | Info | Service metadata (build version, etc.) |
### Ingestion Metrics
| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `stonks_ingestion_jobs_total` | Counter | `source_type`, `status` | Total ingestion jobs processed |
| `stonks_ingestion_items_fetched_total` | Counter | `source_type` | Total items fetched from external sources |
| `stonks_ingestion_items_new_total` | Counter | `source_type` | New (non-duplicate) items ingested |
| `stonks_ingestion_items_deduped_total` | Counter | `source_type` | Items skipped due to deduplication |
| `stonks_ingestion_errors_total` | Counter | `source_type` | Ingestion errors by source type |
| `stonks_ingestion_adapter_duration_seconds` | Histogram | `source_type` | Adapter fetch latency (buckets: 0.1, 0.5, 1, 2, 5, 10, 30, 60s) |
### Parsing Metrics
| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `stonks_parse_jobs_total` | Counter | `status` | Total parse jobs processed |
| `stonks_parse_quality_score` | Histogram | — | Distribution of parser quality scores (buckets: 0.11.0 in 0.1 steps) |
| `stonks_parse_low_quality_total` | Counter | — | Documents flagged as low quality by the parser |
| `stonks_parse_duration_seconds` | Histogram | — | Parse job duration (buckets: 0.05, 0.1, 0.25, 0.5, 1, 2, 5, 10s) |
### Extraction Metrics
| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `stonks_extraction_jobs_total` | Counter | `status` | Total extraction jobs processed |
| `stonks_extraction_attempts_total` | Counter | — | Total Ollama extraction attempts (including retries) |
| `stonks_extraction_retries_total` | Counter | — | Extraction retry count |
| `stonks_extraction_duration_seconds` | Histogram | — | Extraction total duration (buckets: 1, 2, 5, 10, 20, 30, 60, 120s) |
| `stonks_extraction_confidence` | Histogram | — | Distribution of extraction confidence scores (buckets: 0.11.0) |
| `stonks_extraction_validation_errors_total` | Counter | — | Total validation errors across extractions |
| `stonks_extraction_tokens_total` | Counter | `direction` | Estimated token usage (labels: `input`, `output`) |
### Aggregation Metrics
| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `stonks_aggregation_windows_total` | Counter | `window` | Trend windows computed |
| `stonks_aggregation_signals_total` | Counter | `window` | Signals processed during aggregation |
| `stonks_aggregation_contradiction_score` | Histogram | — | Distribution of contradiction scores in trend windows (buckets: 0.01.0) |
| `stonks_aggregation_duration_seconds` | Histogram | `window` | Aggregation job duration (buckets: 0.05, 0.1, 0.25, 0.5, 1, 2, 5, 10s) |
### Recommendation Metrics
| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `stonks_recommendations_total` | Counter | `action`, `mode` | Recommendations generated |
| `stonks_recommendations_suppressed_total` | Counter | — | Recommendations suppressed due to low data quality |
| `stonks_recommendation_confidence` | Histogram | — | Distribution of recommendation confidence scores (buckets: 0.11.0) |
### Lake Publication Metrics
| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `stonks_lake_facts_published_total` | Counter | `table_name` | Analytical facts published to the lakehouse |
| `stonks_lake_publish_duration_seconds` | Histogram | `table_name` | Lake publication write latency (buckets: 0.01, 0.05, 0.1, 0.25, 0.5, 1, 2, 5s) |
| `stonks_lake_publish_errors_total` | Counter | `table_name` | Lake publication errors |
| `stonks_lake_publish_bytes_total` | Counter | `table_name` | Total bytes written to the lakehouse |
### Trading and Broker Metrics
| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `stonks_orders_submitted_total` | Counter | `side`, `order_type`, `mode` | Orders submitted to broker |
| `stonks_orders_rejected_total` | Counter | `reason_category` | Orders rejected before broker submission |
| `stonks_orders_filled_total` | Counter | `side` | Orders filled by broker |
| `stonks_orders_duplicates_prevented_total` | Counter | `detected_via` | Duplicate orders prevented by idempotency checks |
| `stonks_risk_evaluations_total` | Counter | `result` | Risk evaluations performed |
| `stonks_risk_check_failures_total` | Counter | `check_name` | Individual risk check failures |
| `stonks_positions_synced_total` | Counter | — | Position sync operations completed |
### Alerting Metrics
| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `stonks_alerts_fired_total` | Counter | `rule`, `severity` | Total alerts fired by rule |
| `stonks_alerts_resolved_total` | Counter | `rule` | Total alerts resolved by rule |
| `stonks_alert_check_duration_seconds` | Histogram | — | Duration of alert evaluation cycle (buckets: 0.015s) |
| `stonks_alert_active` | Gauge | `rule` | Whether an alert rule is currently firing (1) or resolved (0) |
### Dead-Letter Queue Metrics
| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `stonks_dlq_items_total` | Counter | `queue` | Jobs sent to dead-letter queues |
| `stonks_dlq_replayed_total` | Counter | `queue` | Jobs replayed from dead-letter queues |
| `stonks_dlq_depth` | Gauge | `queue` | Current dead-letter queue depth |
### Active Jobs Gauge
| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `stonks_active_jobs` | Gauge | `stage` | Currently processing jobs by pipeline stage |
---
## Alerting Module
The alerting module (`services/shared/alerting.py`) evaluates four operational alert rules against PostgreSQL state on a configurable interval. When a threshold is breached, the module emits structured log events and increments Prometheus counters. When a previously firing alert clears, it logs a resolution event.
### Alert Rules
#### 1. `source_failures` — Sustained Source Retrieval Failures
Detects sources where the last N ingestion runs all failed within the lookback window.
| Parameter | ConfigMap Variable | Default | Description |
|-----------|--------------------|---------|-------------|
| Consecutive failure threshold | `ALERT_SOURCE_FAILURE_THRESHOLD` | `3` | Number of consecutive failures before alert fires |
| Lookback window | `ALERT_SOURCE_FAILURE_WINDOW_HOURS` | `6` hours | How far back to check ingestion_runs |
**Severity**: `warning`
**Query**: Checks `ingestion_runs` for sources where the most recent N runs (within the window) all have `status = 'failed'`.
**Details emitted**: `source_id`, `source_type`, `source_name`, `ticker`, `consecutive_failures`
#### 2. `schema_failure_spike` — Extraction Validation Failure Rate
Detects when the extraction schema validation failure rate exceeds a threshold.
| Parameter | ConfigMap Variable | Default | Description |
|-----------|--------------------|---------|-------------|
| Failure rate threshold | `ALERT_SCHEMA_FAILURE_RATE_THRESHOLD` | `0.3` (30%) | Failure rate that triggers the alert |
| Lookback window | `ALERT_SCHEMA_FAILURE_WINDOW_HOURS` | `1` hour | Window for computing failure rate |
**Severity**: `warning` if rate ≥ 30%, `critical` if rate ≥ 50%
**Query**: Computes `failed / total` from `model_performance_metrics` within the window.
**Details emitted**: `total_extractions`, `failed_extractions`, `failure_rate`, `threshold`, `window_hours`
#### 3. `analytical_lag` — Lake Publication Lag
Detects when lake publication has not completed within the threshold for any table.
| Parameter | ConfigMap Variable | Default | Description |
|-----------|--------------------|---------|-------------|
| Lag threshold | `ALERT_LAKE_LAG_THRESHOLD_MINUTES` | `60` minutes | Maximum acceptable time since last successful publish |
**Severity**: `warning`
**Query**: Checks `audit_events` for the most recent successful `lake_publish` event per table, alerts if any are older than the threshold.
**Details emitted**: `table_name`, `last_publish`, `lag_minutes`, `threshold_minutes`
#### 4. `broker_issues` — Consecutive Broker Errors
Detects consecutive broker submission errors (rejections, timeouts, connection failures).
| Parameter | ConfigMap Variable | Default | Description |
|-----------|--------------------|---------|-------------|
| Error threshold | `ALERT_BROKER_ERROR_THRESHOLD` | `3` | Consecutive broker errors before alert fires |
| Lookback window | `ALERT_BROKER_ERROR_WINDOW_HOURS` | `1` hour | Window for checking order_events |
**Severity**: `critical`
**Query**: Counts recent `order_events` with `event_type IN ('broker_error', 'broker_timeout', 'connection_failed')`.
**Details emitted**: `error_count`, `threshold`, `window_hours`
### Evaluation Cycle
The alerting module runs on a configurable interval (default: every 120 seconds, controlled by `ALERT_CHECK_INTERVAL_SECONDS`). Each cycle:
1. Runs all four alert rules against PostgreSQL
2. Compares results to the current `AlertState` to detect new firings and resolutions
3. For new firings: increments `stonks_alerts_fired_total`, sets `stonks_alert_active` gauge to 1, logs a `WARNING`
4. For resolutions: increments `stonks_alerts_resolved_total`, sets `stonks_alert_active` gauge to 0, logs an `INFO`
5. Records the evaluation duration in `stonks_alert_check_duration_seconds`
Each rule check is wrapped in a try/except so a failure in one rule does not block the others.
### ConfigMap Variables Summary
| Variable | Default | Description |
|----------|---------|-------------|
| `ALERT_SOURCE_FAILURE_THRESHOLD` | `3` | Consecutive source failures before alert |
| `ALERT_SOURCE_FAILURE_WINDOW_HOURS` | `6` | Source failure lookback window (hours) |
| `ALERT_SCHEMA_FAILURE_RATE_THRESHOLD` | `0.3` | Extraction failure rate threshold (0.01.0) |
| `ALERT_SCHEMA_FAILURE_WINDOW_HOURS` | `1` | Schema failure lookback window (hours) |
| `ALERT_LAKE_LAG_THRESHOLD_MINUTES` | `60` | Max minutes since last lake publish |
| `ALERT_BROKER_ERROR_THRESHOLD` | `3` | Consecutive broker errors before alert |
| `ALERT_BROKER_ERROR_WINDOW_HOURS` | `1` | Broker error lookback window (hours) |
| `ALERT_CHECK_INTERVAL_SECONDS` | `120` | Seconds between alert evaluation cycles |
---
## Structured Logging
All services use structured JSON logging configured via `services/shared/logging.py`. Call `setup_logging(service_name)` once at service startup.
### JSON Log Format
Each log line is a single JSON object with the following fields:
```json
{
"timestamp": "2025-01-15T12:34:56.789012+00:00",
"level": "INFO",
"logger": "ingestion_worker",
"message": "Processed job for AAPL",
"service": "ingestion_worker",
"trace_id": "a1b2c3d4e5f67890",
"span_id": "1a2b3c4d"
}
```
| Field | Type | Description |
|-------|------|-------------|
| `timestamp` | string (ISO 8601) | UTC timestamp of the log event |
| `level` | string | Log level: `DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL` |
| `logger` | string | Python logger name |
| `message` | string | Human-readable log message |
| `service` | string | Service name set at startup (e.g., `ingestion_worker`, `scheduler`) |
| `trace_id` | string | 16-character hex trace ID for distributed tracing |
| `span_id` | string | 8-character hex span ID for the current operation |
### Additional Context Fields
When present, these fields are merged into the JSON output:
| Field | Source | Description |
|-------|--------|-------------|
| `span_operation` | `Span` context manager | Name of the traced operation |
| `span_status` | `Span` context manager | `ok` or `error` |
| `span_duration_ms` | `Span` context manager | Duration of the span in milliseconds |
| `span_parent_id` | `Span` context manager | Parent span ID for nested spans |
| `span_attributes` | `Span` context manager | Arbitrary key-value attributes set on the span |
| `ticker` | Manual `extra={}` | Company ticker symbol |
| `document_id` | Manual `extra={}` | Document UUID |
| `source_type` | Manual `extra={}` | Source type (e.g., `polygon`, `news_api`) |
| `job_id` | Manual `extra={}` | Job identifier |
| `duration_ms` | Manual `extra={}` | Operation duration |
| `error` | Manual `extra={}` | Error description |
| `count` | Manual `extra={}` | Item count |
| `exception` | Automatic | Formatted exception traceback (when `exc_info` is set) |
### Trace Context Propagation
Trace context flows through the pipeline via job payloads:
1. **Inject**: Before enqueuing a job to Redis, call `inject_trace_context(payload)` to add `_trace_id` to the payload dict.
2. **Extract**: At the start of job processing, call `extract_trace_context(payload)` to restore the trace context (or generate a new one if absent).
3. **Span**: Use the `Span` context manager to create child spans within a service:
```python
from services.shared.logging import Span
with Span("process_document", ticker="AAPL") as span:
# ... do work ...
span.set_attribute("doc_count", 5)
```
This produces a structured log entry on span exit with duration, status, and attributes.
### Log Querying
To trace a request through the pipeline, filter by `trace_id`:
```bash
# Kubernetes — find all logs for a specific trace
kubectl logs -n stonks-oracle -l app.kubernetes.io/part-of=stonks-oracle --all-containers \
| jq -r 'select(.trace_id == "a1b2c3d4e5f67890")'
# Docker Compose — search across all services
docker compose logs --no-color | grep '"trace_id":"a1b2c3d4e5f67890"'
```
To find errors in a specific service:
```bash
# Kubernetes
kubectl logs -n stonks-oracle deployment/extractor --tail=500 \
| jq 'select(.level == "ERROR")'
# Docker Compose
docker compose logs extractor --no-color --tail=500 \
| jq 'select(.level == "ERROR")'
```
To find slow extraction spans:
```bash
kubectl logs -n stonks-oracle deployment/extractor --tail=1000 \
| jq 'select(.span_operation == "extract_document" and .span_duration_ms > 30000)'
```
---
## Dead-Letter Queue System
When a worker fails to process a job after exhausting retries (default: 3 attempts), the job is pushed to a per-queue dead-letter list in Redis. The DLQ system is implemented in `services/shared/dead_letter.py`.
### Queue Names
Dead-letter queues follow the naming pattern `stonks:dlq:<queue_name>`:
| DLQ Key | Source Queue | Description |
|---------|-------------|-------------|
| `stonks:dlq:ingestion` | `stonks:queue:ingestion` | Failed ingestion jobs (adapter errors, API failures) |
| `stonks:dlq:parsing` | `stonks:queue:parsing` | Failed parse jobs |
| `stonks:dlq:extraction` | `stonks:queue:extraction` | Failed extraction jobs (LLM errors, validation failures) |
| `stonks:dlq:aggregation` | `stonks:queue:aggregation` | Failed aggregation jobs |
| `stonks:dlq:recommendation` | `stonks:queue:recommendation` | Failed recommendation jobs |
| `stonks:dlq:broker_orders` | `stonks:queue:broker_orders` | Failed broker order submissions |
When `DEPLOY_STAGE` is set, the prefix becomes `stonks:<stage>:dlq:<queue_name>`.
### DLQ Entry Format
Each DLQ entry wraps the original job payload with failure metadata:
```json
{
"original_payload": {
"source_id": "...",
"source_type": "polygon",
"ticker": "AAPL",
"company_id": "...",
"config": {}
},
"queue": "ingestion",
"error": "ConnectionError: API timeout after 30s",
"attempt": 3,
"worker": "ingestion_worker",
"dead_lettered_at": "2025-01-15T12:34:56.789012+00:00"
}
```
| Field | Type | Description |
|-------|------|-------------|
| `original_payload` | object | The original job payload as it was enqueued |
| `queue` | string | Source queue name |
| `error` | string | Error message from the final failed attempt |
| `attempt` | integer | Number of attempts made before dead-lettering |
| `worker` | string | Worker identifier that dead-lettered the job |
| `dead_lettered_at` | string (ISO 8601) | UTC timestamp when the job was dead-lettered |
### Routing
Jobs are routed to the DLQ by calling `send_to_dlq()` from worker code after retry exhaustion:
```python
from services.shared.dead_letter import send_to_dlq
await send_to_dlq(
rds=redis_client,
queue_name="ingestion",
original_payload=job,
error=str(exception),
attempt=3,
worker="ingestion_worker",
)
```
The default maximum attempts before dead-lettering is `DEFAULT_MAX_ATTEMPTS = 3`.
### Replay Tooling
The `services/shared/dead_letter.py` module provides functions for inspecting and replaying DLQ items:
| Function | Description |
|----------|-------------|
| `peek_dlq(rds, queue_name, start=0, count=10)` | Inspect DLQ entries without removing them |
| `replay_one(rds, queue_name)` | Pop the oldest DLQ entry and re-enqueue its original payload to the source queue |
| `replay_all(rds, queue_name)` | Replay every item in the DLQ back to the source queue. Returns the count replayed |
| `dlq_length(rds, queue_name)` | Return the number of items in the DLQ |
| `dlq_summary(rds, queue_names)` | Return a mapping of queue_name → DLQ depth for multiple queues |
| `purge_dlq(rds, queue_name)` | Delete all items from the DLQ. Returns count removed |
### Monitoring DLQ Depth
Use the `scripts/check_queues.py` script to inspect queue and DLQ depths from the command line:
```bash
# Docker Compose
REDIS_HOST=localhost REDIS_PORT=6379 REDIS_PASSWORD="" \
python scripts/check_queues.py
# Kubernetes
kubectl exec -n stonks-oracle deployment/query-api -- \
python scripts/check_queues.py
```
The Query API also exposes DLQ depths in the `/api/ops/pipeline/stream` SSE endpoint and the DevOps metrics endpoints, reporting `dlq:<queue_name>` keys alongside regular queue depths.
The `stonks_dlq_depth` Prometheus gauge tracks DLQ depth per queue for dashboard alerting.
---
## Recommended Prometheus/Grafana Queries
### Ingestion Throughput
```promql
# Ingestion jobs per minute by source type and status
sum(rate(stonks_ingestion_jobs_total[5m])) by (source_type, status) * 60
# New items ingested per minute
sum(rate(stonks_ingestion_items_new_total[5m])) * 60
# Deduplication ratio (higher = more duplicates being filtered)
sum(rate(stonks_ingestion_items_deduped_total[5m]))
/ sum(rate(stonks_ingestion_items_fetched_total[5m]))
# Adapter latency p95 by source type
histogram_quantile(0.95, sum(rate(stonks_ingestion_adapter_duration_seconds_bucket[5m])) by (le, source_type))
# Ingestion error rate
sum(rate(stonks_ingestion_errors_total[5m])) by (source_type)
```
### Extraction Latency and Quality
```promql
# Extraction duration p50 and p95
histogram_quantile(0.5, sum(rate(stonks_extraction_duration_seconds_bucket[5m])) by (le))
histogram_quantile(0.95, sum(rate(stonks_extraction_duration_seconds_bucket[5m])) by (le))
# Extraction success rate
sum(rate(stonks_extraction_jobs_total{status="success"}[5m]))
/ sum(rate(stonks_extraction_jobs_total[5m]))
# Average extraction confidence
histogram_quantile(0.5, sum(rate(stonks_extraction_confidence_bucket[5m])) by (le))
# Validation error rate
sum(rate(stonks_extraction_validation_errors_total[5m]))
# Token usage rate (input vs output)
sum(rate(stonks_extraction_tokens_total[5m])) by (direction)
```
### Aggregation Volume
```promql
# Trend windows computed per minute by window size
sum(rate(stonks_aggregation_windows_total[5m])) by (window) * 60
# Signals processed per minute
sum(rate(stonks_aggregation_signals_total[5m])) by (window) * 60
# Average contradiction score (higher = more conflicting signals)
histogram_quantile(0.5, sum(rate(stonks_aggregation_contradiction_score_bucket[5m])) by (le))
# Aggregation duration p95
histogram_quantile(0.95, sum(rate(stonks_aggregation_duration_seconds_bucket[5m])) by (le, window))
```
### Recommendation Generation
```promql
# Recommendations generated per minute by action
sum(rate(stonks_recommendations_total[5m])) by (action, mode) * 60
# Suppression rate
sum(rate(stonks_recommendations_suppressed_total[5m]))
/ sum(rate(stonks_recommendations_total[5m]))
# Recommendation confidence distribution
histogram_quantile(0.5, sum(rate(stonks_recommendation_confidence_bucket[5m])) by (le))
```
### Trading Engine Activity
```promql
# Orders submitted per minute by side
sum(rate(stonks_orders_submitted_total[5m])) by (side, mode) * 60
# Order rejection rate by reason
sum(rate(stonks_orders_rejected_total[5m])) by (reason_category)
# Fill rate
sum(rate(stonks_orders_filled_total[5m]))
/ sum(rate(stonks_orders_submitted_total[5m]))
# Duplicate orders prevented
sum(rate(stonks_orders_duplicates_prevented_total[5m])) by (detected_via)
# Risk evaluation outcomes
sum(rate(stonks_risk_evaluations_total[5m])) by (result)
# Risk check failure breakdown
sum(rate(stonks_risk_check_failures_total[5m])) by (check_name)
```
### Lake Publication
```promql
# Facts published per minute by table
sum(rate(stonks_lake_facts_published_total[5m])) by (table_name) * 60
# Write latency p95 by table
histogram_quantile(0.95, sum(rate(stonks_lake_publish_duration_seconds_bucket[5m])) by (le, table_name))
# Publication error rate
sum(rate(stonks_lake_publish_errors_total[5m])) by (table_name)
# Bytes written per minute
sum(rate(stonks_lake_publish_bytes_total[5m])) by (table_name) * 60
```
### Alerting Health
```promql
# Currently active alerts by rule
stonks_alert_active
# Alert firing rate
sum(rate(stonks_alerts_fired_total[1h])) by (rule, severity)
# Alert evaluation duration
histogram_quantile(0.95, sum(rate(stonks_alert_check_duration_seconds_bucket[5m])) by (le))
```
### Dead-Letter Queue Health
```promql
# Current DLQ depth by queue
stonks_dlq_depth
# DLQ inflow rate (jobs dead-lettered per minute)
sum(rate(stonks_dlq_items_total[5m])) by (queue) * 60
# DLQ replay rate
sum(rate(stonks_dlq_replayed_total[5m])) by (queue) * 60
```
### Pipeline Overview (Active Jobs)
```promql
# Currently active jobs by pipeline stage
stonks_active_jobs
# Parse quality score distribution
histogram_quantile(0.5, sum(rate(stonks_parse_quality_score_bucket[5m])) by (le))
# Low quality document rate
sum(rate(stonks_parse_low_quality_total[5m]))
/ sum(rate(stonks_parse_jobs_total[5m]))
```
### Recommended Grafana Alert Rules
| Alert | Expression | For | Severity |
|-------|-----------|-----|----------|
| High DLQ depth | `stonks_dlq_depth > 10` | 5m | warning |
| Ingestion error spike | `sum(rate(stonks_ingestion_errors_total[5m])) > 0.5` | 5m | warning |
| Extraction latency high | `histogram_quantile(0.95, sum(rate(stonks_extraction_duration_seconds_bucket[5m])) by (le)) > 60` | 10m | warning |
| Lake publication stale | `stonks_alert_active{rule="analytical_lag"} == 1` | 5m | warning |
| Broker errors active | `stonks_alert_active{rule="broker_issues"} == 1` | 1m | critical |
| Zero ingestion throughput | `sum(rate(stonks_ingestion_jobs_total[15m])) == 0` | 15m | critical |
@@ -0,0 +1,130 @@
# Page 1 — Data Ingestion and Preparation
Every signal that the platform eventually acts on begins its life as raw data pulled from an external source. Before any AI agent can extract structured intelligence, before any trend can accumulate, and before any decision can be executed, the platform must first discover new content, fetch it reliably, eliminate duplicates, store the raw artifacts for audit, and normalize the text into a form suitable for downstream processing. This page traces that journey from external API to parser output, covering the Scheduler, Ingestion Worker, deduplication layer, raw storage, and Parser in detail.
For a visual overview of the full flow described here, see the [Ingestion to Extraction Flow diagram](diagrams/ingestion-to-extraction-flow.md).
---
## Four Categories of Input Data
The platform tracks 50 entities across 10 sectors, and it draws intelligence from four distinct categories of external data. Each category has its own adapter, its own API conventions, and its own scheduling cadence, but all of them feed into the same ingestion pipeline.
The first category is **entity news**, sourced from the external data provider's news endpoint (`/v2/reference/news`). The `ExternalNewsAdapter` in `services/adapters/news_adapter.py` fetches articles linked to a specific entity identifier, returning structured results that include title, publisher, article URL, description, keywords, and publication timestamp. Each request can return up to 1,000 articles, though the default limit is 20 per fetch. The adapter tracks the most recent `published_utc` value and uses it on subsequent fetches to avoid re-retrieving articles the system has already seen.
The second category is **regulatory filings**, sourced from the public records API full-text search system (regulatory filings source). The `RegulatoryFilingsAdapter` in `services/adapters/filings_adapter.py` queries the `/LATEST/search-index` endpoint for regulatory filing types and other form types associated with an entity's identifier or CIK number. Unlike the external data provider endpoints, the public records API requires no key — only a descriptive `User-Agent` header per the API's fair-access policy. The adapter deduplicates results by accession number (`adsh`), filters out non-primary documents like XML fragments and graphics, and constructs the public records API filing index URL for each hit so downstream services can fetch the full document.
The third category is **data feeds**, also sourced from the external data provider. The `ExternalDataAdapter` in `services/adapters/market_adapter.py` supports multiple endpoints: previous-day aggregate bars (`/v2/aggs/ticker/{ticker}/prev`), range bars for custom date windows, intraday hourly bars, grouped daily bars that return data for all entities in a single call (`/v2/aggs/grouped/locale/us/market/stocks/{date}`), and entity detail lookups. Data feeds follow a different path than textual content — they do not pass through the Parser or Extractor, since the structured numeric data is already in a usable form.
The fourth category is **macro and geopolitical news**, fetched by the `MacroNewsAdapter` in `services/adapters/macro_news_adapter.py`. Unlike the other three categories, macro news is not entity-specific. These sources have `source_type='macro_news'` in the `sources` database table and may have a `NULL` `company_id`. The adapter fetches from a configurable HTTP endpoint (typically the external data provider's news API filtered for broad topics) and returns articles that describe global events — policy shifts, central bank decisions, geopolitical conflicts — rather than entity-specific developments. Macro news articles are eventually classified by the Global Event Classifier agent and routed through a separate queue, as described in [Page 2](02-ai-agent-processing-and-extraction.md).
All four adapter classes inherit from `BaseAdapter` defined in `services/adapters/base.py` and return an `AdapterResult` dataclass containing the raw payload bytes, a SHA-256 content hash, a list of parsed item dicts, HTTP metadata (status code, response time), and an error field that is `None` on success. This uniform interface allows the Ingestion Worker to handle all source types through a single dispatch mechanism.
---
## The Scheduler: Orchestrating Ingestion Cycles
The Scheduler (`services/scheduler/app.py`) is the heartbeat of the ingestion pipeline. It runs a continuous loop that ticks every 15 seconds (`SCHEDULER_TICK = 15`), and on each tick it evaluates which sources are due for their next fetch. The Scheduler does not fetch data itself — it enqueues jobs onto the `app:queue:ingestion` Redis list for the Ingestion Worker to process.
Each source type has a default polling cadence defined in the `DEFAULT_CADENCES` dictionary:
| Source Type | Default Cadence |
|------------------|-----------------|
| `market_api` | 300 seconds |
| `news_api` | 300 seconds |
| `filings_api` | 3,600 seconds |
| `macro_news` | 600 seconds |
| `web_scrape` | 1,800 seconds |
| `execution_api` | 30 seconds |
Individual sources can override their cadence via the `polling_interval_seconds` field in their `config` JSONB column in the `sources` table. The `get_cadence_for_source()` function checks for this override first, falling back to the default if none is set, and enforces a minimum interval of 10 seconds.
The Scheduler determines whether a source is due by calling `is_source_due()`, which considers several conditions. If a source has never run before (no entry in the `ingestion_runs` table), it is immediately due. If the last run failed, the Scheduler respects an exponential backoff computed by `compute_backoff()`: the delay starts at 60 seconds (`DEFAULT_BACKOFF_BASE`) and doubles with each retry up to a maximum of 3,600 seconds (`MAX_BACKOFF`). If a source has failed 10 consecutive times (`MAX_RETRY_COUNT`), the Scheduler stops scheduling it entirely until an operator manually resets the retry state. If the last run is still marked as `running`, the source is skipped to prevent double-scheduling. Otherwise, the Scheduler checks whether enough time has elapsed since the last completed run based on the source's cadence.
Rate limiting adds another layer of protection. The `check_rate_limit()` function enforces two constraints. First, each source type has a per-type limit defined in `DEFAULT_RATE_LIMITS` — for example, `market_api` and `news_api` are each capped at 20 requests per minute, while `filings_api` and `macro_news` are capped at 10. Second, because `market_api` and `news_api` both use the same external data provider API key, a global provider rate limit of 45 requests per minute (`PROVIDER_GLOBAL_RATE_LIMIT`) is enforced across both types combined. Rate limit state is tracked in Redis using keys of the form `app:ratelimit:{source_type}:{window}`, where the window is a minute-granularity timestamp. If a source type exceeds its limit, the Scheduler logs a warning and skips that source for the current tick.
The Scheduler handles three categories of sources in each cycle. First, it fetches all active entity-specific sources (excluding `macro_news`) by joining the `sources` and `companies` tables. Second, it fetches active macro news sources separately, since these may not have a `company_id`. Third, it fetches global data sources — those with `source_type='market_api'` and `company_id IS NULL` — which represent endpoints like the grouped daily bars that return data for all entities in a single API call. For intraday bar sources, the Scheduler expands a single global source into per-entity jobs for every active entity.
Each enqueued job payload includes the `source_id`, `company_id`, `ticker`, `legal_name`, `source_type`, `source_name`, `config`, `credibility_score`, a list of company `aliases` (fetched from the `company_aliases` table), and a `scheduled_at` timestamp. The job is pushed onto `app:queue:ingestion` via Redis `RPUSH`.
Beyond scheduling, the Scheduler also performs periodic maintenance. Every ~20 cycles (~5 minutes), it runs `recover_stale_documents()` to re-enqueue documents that have been stuck in `parsed` status for longer than 240 minutes — a safety net for cases where Redis loses queue entries due to pod restarts or OOM events. Every ~40 cycles (~10 minutes), it runs `retry_failed_extractions()` to give documents in `extraction_failed` status another chance, resetting them to `parsed` and deleting the failed `document_intelligence` row so the Extractor treats them as fresh. Every ~100 cycles (~25 minutes), it runs `cleanup_all_tables()` to enforce retention policies across tables like `competitive_signal_records` (30 days), `ingestion_runs` (14 days), and `execution_decisions` (90 days).
For more detail on the Scheduler's configuration and operational behavior, see the [Services Reference](../services.md).
---
## The Ingestion Worker: Adapter Dispatch and Persistence
The Ingestion Worker (`services/ingestion/worker.py`) is a long-running process that continuously pops jobs from the `app:queue:ingestion` Redis list and processes them. On startup, it initializes one instance of each adapter class and stores them in a dispatch dictionary keyed by `source_type`:
```
adapters = {
"market_api": ExternalDataAdapter(...),
"news_api": ExternalNewsAdapter(...),
"filings_api": RegulatoryFilingsAdapter(),
"web_scrape": WebScrapeAdapter(),
"execution_api": ExecutionAdapter(...),
"macro_news": MacroNewsAdapter(...),
}
```
When a job arrives, the `process_job()` function looks up the appropriate adapter by `source_type` and calls its `fetch()` method with the ticker and source config. Before fetching, it records a new row in the `ingestion_runs` table with status `running`. If the adapter returns an error, the worker calls `record_retrieval_failure()` to update the run status and increment the source's retry counter with exponential backoff timing.
On a successful fetch, the worker performs several steps in sequence. First, it uploads the raw payload to MinIO via `upload_raw_artifact()` in `services/shared/storage.py`. The target bucket is determined by the source type through the `SOURCE_BUCKET_MAP`: `market_api` payloads go to `app-raw-data`, `news_api` and `macro_news` payloads go to `app-raw-content`, and `filings_api` payloads go to `app-raw-filings`. Objects are stored under a path that encodes the source type, entity identifier, date hierarchy, and document ID — for example, `news_api/Entity-A/2025/01/15/{run_id}/raw.json`.
---
## Content Deduplication via Redis
After storing the raw artifact, the Ingestion Worker checks for duplicate content. Deduplication operates at two levels.
At the payload level, the worker checks the overall `content_hash` (a SHA-256 digest of the raw API response) against Redis. The key pattern is `app:dedupe:{content_hash}` with a 24-hour TTL (86,400 seconds). If the hash is already present, the entire payload is skipped — the `ingestion_runs` row is marked as completed with `items_new=0`, and no downstream jobs are enqueued. If the hash is new, the worker sets the marker in Redis so future fetches of identical content are caught.
At the individual item level, for source types other than `market_api` and `execution_api`, the worker calls `dedupe_items()` from `services/shared/dedupe.py`. This function checks each item against a layered deduplication strategy. The fast path checks Redis for both content-hash markers (`app:dedupe:{hash}`) and canonical-URL markers (`app:dedupe:url:{url_hash}`), both with 24-hour TTLs. If the Redis check misses, the function falls back to PostgreSQL, querying the `documents` table by `content_hash` or `canonical_url` for durable cross-source matching. When a duplicate is found through the PostgreSQL fallback, the function warms the Redis cache so subsequent checks are fast.
Items identified as duplicates are not discarded entirely. If the duplicate document was originally ingested for a different entity, the worker creates a cross-source mention link in the `document_company_mentions` table via `persist_document_company_mention()`. This ensures that a news article mentioning both Entity-A and Entity-F is linked to both entities even if it was first ingested through Entity-A's news source.
New (non-duplicate) items are persisted to PostgreSQL through `persist_ingestion_items()` in `services/shared/metadata.py`, which inserts rows into the `documents` table and records entity mentions in `document_company_mentions`. Each new document ID is then pushed onto `app:queue:parsing` for the Parser to process. After persistence, the worker calls `mark_as_seen()` to set Redis dedupe markers for both the content hash and canonical URL of each new item, ensuring that the next fetch cycle's deduplication checks are fast.
On successful completion, the worker updates the `ingestion_runs` row with the final counts (`items_fetched`, `items_new`) and calls `reset_source_retry_state()` to clear any accumulated backoff from previous failures. For news-type sources (`news_api` and `macro_news`), the worker also updates the source's `config` JSONB column with the latest `published_utc` value, so the next fetch only retrieves newer articles.
---
## The Parser: Normalization, Quality Scoring, and Routing
Documents that pass through ingestion arrive on the `app:queue:parsing` Redis list as JSON payloads containing a `document_id`, `ticker`, and `source_type`. The Parser Worker (`services/parser/worker.py`) pops these jobs and transforms raw HTML or text into normalized, quality-scored documents ready for AI extraction.
The parsing pipeline begins with HTML fetching. If the document has a URL (looked up from the `documents` table if not present in the job payload), the worker calls `fetch_html()` to retrieve the page content. Public records API URLs receive a specialized `User-Agent` header to comply with the API's fair-access policy. The raw HTML is then passed to `parse_html()` in `services/parser/html_parser.py`, which runs a multi-stage extraction pipeline.
The HTML parser first strips non-content tags — `script`, `style`, `nav`, `footer`, `header`, `aside`, `iframe`, and others — and removes boilerplate containers identified by CSS class or ID patterns (sidebars, ad slots, newsletter signups, social share bars, and similar UI elements). It then searches for the article body using a priority list of semantic selectors (`article`, `[role='main']`, `.article-body`, `.post-content`, and others). If no semantic match is found, it falls back to text-density scoring across candidate `div`, `section`, and `td` elements, selecting the block with the highest composite score based on text density, link density, paragraph count, and word count. The extracted text undergoes further cleaning: regex-based removal of residual boilerplate phrases (copyright notices, "subscribe to our newsletter" prompts, "share this article" fragments), removal of short orphan lines that are likely UI fragments, detection and collapse of repeated template blocks, and whitespace normalization.
Metadata extraction pulls the document title (from `og:title` or `<title>`), author, publisher (from `og:site_name` or hostname), publication date (from `article:published_time` or JSON-LD `datePublished`), canonical URL, language, description, and keywords from the HTML head elements.
If the parsed body text is shorter than 500 characters, the worker attempts to enrich it by reading the raw API payload from MinIO and extracting the data provider's article description, keywords, and author fields for the matching article. This enrichment step ensures that even articles with minimal scrapeable HTML still have enough textual content for meaningful AI extraction.
Quality scoring is performed by `score_parse_quality()` in `services/parser/html_parser.py`, which evaluates six weighted signals to produce a composite score between 0 and 0.95:
| Signal | Weight | What It Measures |
|--------------------|--------|-----------------------------------------------------------------|
| `word_count` | 0.30 | Length of extracted text (thresholds at 20, 50, 150, 300 words) |
| `body_found` | 0.20 | Whether a semantic article body element was located |
| `diversity` | 0.15 | Vocabulary richness (unique words / total words) |
| `sentence` | 0.15 | Presence of proper sentence structure (terminal punctuation) |
| `paragraph` | 0.10 | Multi-paragraph structure (blocks separated by blank lines) |
| `metadata` | 0.10 | Presence of title, author, publisher, and publication date |
The composite score maps to a confidence label: scores below 0.35 are labeled `low`, scores between 0.35 and 0.65 are `medium`, and scores 0.65 and above are `high`. Documents with `low` confidence are marked with status `low_quality` in the `documents` table and are not enqueued for extraction — they are effectively filtered out of the pipeline at this stage.
Entity mention detection runs next. The worker fetches all known aliases from the `company_aliases` table (plus entity identifiers and legal names from the `companies` table) and calls `detect_company_mentions()` in `services/parser/html_parser.py`. The matching strategy varies by alias length: one-to-two character aliases use case-sensitive word-boundary matching to avoid false positives (the letter "A" should not match every occurrence of the word "a"), three-to-four character aliases use case-insensitive word-boundary matching (standard identifier format), and aliases of five or more characters use case-insensitive substring matching (entity names and brands). Confidence scores vary by alias type: identifier matches receive 0.9, legal name matches 0.85, general aliases 0.7, and brand matches 0.6. Multiple alias hits for the same entity are deduplicated, keeping the highest-confidence match and summing match counts. Detected mentions are persisted to the `document_company_mentions` table.
The normalized text and a structured parser output JSON (containing all metadata, quality signals, warnings, outbound links, tags, and mentions) are uploaded to the `app-normalized` MinIO bucket. The `documents` row is updated with the normalized storage reference, parser output reference, quality score, and confidence level.
Finally, the Parser makes a routing decision. If the document's `document_type` is `macro_event`, it is pushed onto `app:queue:macro_classification` for the Global Event Classifier agent. All other documents are pushed onto `app:queue:extraction` for the Document Intelligence Extractor agent. Both queues feed into the Extractor service described in [Page 2](02-ai-agent-processing-and-extraction.md). The job payload includes the `document_id`, `ticker`, and the first 32,000 characters of the normalized text, giving the downstream agent immediate access to the content without needing to fetch it from MinIO.
For additional detail on queue topology and data store layout, see the [Data Pipeline Architecture](../architecture-data-pipeline.md) documentation.
---
## What Comes Next
At this point, raw data has been fetched from four external sources, deduplicated, stored in MinIO, parsed into normalized text, scored for quality, tagged with entity mentions, and routed to the appropriate extraction queue. The documents sitting on `app:queue:extraction` and `app:queue:macro_classification` are clean, quality-filtered, and ready for AI processing. [Page 2 — AI Agent Processing and Structured Extraction](02-ai-agent-processing-and-extraction.md) picks up the story from here, explaining how the Document Intelligence Extractor and Global Event Classifier agents use LLM inference to transform these normalized documents into the structured JSON intelligence that feeds the rest of the pipeline.
@@ -0,0 +1,164 @@
# Page 2 — AI Agent Processing and Structured Extraction
Documents that arrive on the `app:queue:extraction` and `app:queue:macro_classification` Redis queues are clean, quality-filtered, and normalized — but they are still unstructured text. The job of the Extractor service is to transform that text into structured JSON intelligence that the rest of the pipeline can reason about quantitatively. Two AI agents share this responsibility: the Document Intelligence Extractor handles entity-specific content, filings, and transcripts, while the Global Event Classifier handles macro-level geopolitical and economic events. Both agents run through the same Ollama-based inference infrastructure, share a common JSON repair pipeline, and persist their results to PostgreSQL and MinIO for downstream consumption and audit.
This page explains how each agent works, what schemas they produce, how the system validates and repairs LLM output, how runtime configuration is resolved from the database, and how the final structured records are persisted. For a visual overview of the full flow from ingestion through extraction, see the [Ingestion to Extraction Flow diagram](diagrams/ingestion-to-extraction-flow.md). For reference-level detail on agent configuration and the variant management API, see the [AI Agents Guide](../ai-agents.md).
---
## The Document Intelligence Extractor
The Document Intelligence Extractor is the primary AI agent in the pipeline. Registered under the slug `document-extractor` in the `ai_agents` database table, it processes every non-macro document that passes through the Parser — news articles, regulatory filings, performance transcripts, and press releases. Its purpose is to read a normalized document and produce a structured JSON object that captures the document's summary, the entities it affects, the sentiment and impact for each entity, the catalysts driving that impact, and the evidence supporting the analysis.
The entry point is `services/extractor/main.py`, which runs a continuous worker loop polling the `app:queue:extraction` Redis list. When a job arrives, the worker extracts the `document_id`, `ticker`, and `text` fields from the JSON payload. If the job payload does not include the document text directly, the worker fetches it from MinIO using the `normalized_storage_ref` stored in the `documents` table — the Parser uploaded the normalized text to the `app-normalized` bucket during the previous pipeline stage (see [Page 1](01-data-ingestion-and-preparation.md)).
The actual LLM inference is handled by `OllamaClient` in `services/extractor/client.py`. The client sends the document to a local Ollama instance via the `/api/chat` HTTP endpoint with `stream=False` and `think=False`. The `think=False` flag is a deliberate performance choice — it disables the model's chain-of-thought reasoning phase, which would otherwise add two to four minutes of latency per document. The client does not use Ollama's `format` parameter for structured output because of a known Ollama bug (#14645) where the format constraint is silently ignored when `think=False` on qwen3.5 models. Instead, the system relies on prompt engineering to produce JSON and repairs any syntax issues after the fact.
The prompt sent to the model has two parts. The system prompt, defined in `services/extractor/prompts.py`, establishes the model's role as a document analyst and sets strict output rules: return only a single JSON object, no markdown fences, no explanation text, every schema field is required, use `"other"` for `catalyst_type` when unsure, keep evidence spans under 20 words, and limit key facts to three to five items. The user prompt, built by `build_extraction_prompt()` in the same module, provides the document text along with document-type-specific guidance. Four guidance variants exist — one each for articles, filings, transcripts, and press releases — each calibrated to the conventions and biases of that document type. For example, the filing guidance instructs the model to preserve the precise legal language of regulatory documents, while the press release guidance warns that sentiment may be biased positive and directs the model to focus on concrete metrics rather than marketing language.
The user prompt also includes a list of all tracked entity identifiers from the `companies` table, along with rules for how the model should use them. If a tracked entity identifier appears verbatim in the text, the model must include it in the output with at least one evidence span. If the article discusses a sector or theme that clearly affects a tracked entity (oil prices affecting Entity-D, AI chip demand affecting Entity-C), the model should include that entity as well. The model is explicitly told not to invent identifiers that are not in the provided list. Documents longer than 8,000 characters are truncated before being included in the prompt, with a `[... truncated for extraction ...]` marker appended.
The `OllamaClient` also supports a `context_window` override via the Ollama `num_ctx` option, which can be configured per agent variant through the `AgentConfigResolver` mechanism described later in this page.
---
## The ExtractionResult Schema
The structured output that the Document Intelligence Extractor produces is defined by the `ExtractionResult` Pydantic model in `services/extractor/schemas.py`. Every field is required — the model has no defaults — so the generated JSON schema forces the LLM to produce every field explicitly. The top-level fields are:
**`summary`** — a concise one-to-three sentence summary of the document's main point. This becomes the human-readable description stored in the `document_intelligence` table.
**`companies`** — an array of `CompanyExtractionItem` objects, one per affected entity. Each entity entry contains:
- `ticker` — the entity identifier (validated against a regex pattern of one to five uppercase letters).
- `company_name` — the full entity name as referenced in the document.
- `relevance` — a float between 0.0 and 1.0 indicating how relevant the document is to this entity, where 0 means tangential and 1 means the entity is the primary subject.
- `sentiment` — one of `positive`, `negative`, `neutral`, or `mixed`, representing the overall sentiment toward this entity in the document.
- `impact_score` — a float between 0.0 and 1.0 estimating the magnitude of impact, where 0 is negligible and 1 is highly material.
- `impact_horizon` — one of `intraday`, `1d`, `1d_7d`, `1d_30d`, `30d_90d`, or `90d_plus`, indicating the expected timeframe over which the impact will play out.
- `catalyst_type` — exactly one of `performance_report`, `product`, `legal`, `macro`, `supply_chain`, `m_and_a`, `rating_change`, or `other`. The prompt instructs the model to use `other` when none of the specific categories fit.
- `key_facts` — a list of facts explicitly stated in the document. The prompt emphasizes that the model must not infer or fabricate facts.
- `risks` — a list of risks explicitly mentioned in the document.
- `evidence_spans` — short verbatim quotes from the document supporting the analysis. The prompt requests these be kept under 20 words each.
**`macro_themes`** — a list of broad economic or environmental themes mentioned in the document, such as `rates`, `inflation`, or `ai_capex`.
**`novelty_score`** — a float between 0.0 and 1.0 indicating how novel or surprising the information is. Routine performance reports score low; unexpected regulatory actions score high. This value feeds into the novelty bonus component of the signal weighting formula described in [Page 3](03-signal-scoring-and-weighted-signals.md).
**`confidence`** — a float between 0.0 and 1.0 representing the model's confidence in the accuracy of its extraction. Lower values indicate ambiguous or incomplete source text. This value becomes the confidence gate input for signal scoring.
**`extraction_warnings`** — a list of issues encountered during extraction, such as `ambiguous_ticker`, `incomplete_text`, or `low_confidence`. These warnings are persisted alongside the intelligence record for operational monitoring.
The JSON schema is generated programmatically from the Pydantic models via `generate_json_schema()` in `services/extractor/schemas.py`, which calls Pydantic's `model_json_schema()` and then inlines all `$defs` references so the schema is self-contained and Ollama-friendly.
---
## The Global Event Classifier
Not all documents describe entity-specific developments. Macro news articles — those tagged with `document_type='macro_event'` by the Parser — describe events that affect entire sectors or economies: trade disputes, central bank rate decisions, commodity supply disruptions, geopolitical conflicts. These documents are routed to the `app:queue:macro_classification` Redis queue and processed by the Global Event Classifier agent, registered under the slug `event-classifier` in the `ai_agents` table.
The classifier is implemented in `services/extractor/event_classifier.py`. When the extractor worker in `services/extractor/main.py` pops a job and determines that the document type is `macro_event` (either because the job came from the macro queue or because the `documents` table records it as such), it routes the document to `_process_macro_classification()` instead of the standard extraction pipeline. This function calls `classify_global_event()`, which builds a dedicated prompt, sends it to Ollama through the same `OllamaClient` infrastructure, parses the response, and persists the result.
The classifier's system prompt is distinct from the extractor's. It establishes the model's role as a macro-level news classifier and includes explicit anti-hallucination rules that are critical to preventing the classifier from overreaching. The prompt states that the model should only classify articles about macro events that affect entire sectors or economies — trade disputes, interest rate changes, commodity supply disruptions, regulatory changes, geopolitical conflicts, natural disasters. It explicitly lists what should not be classified as macro events: individual entity performance reports, lawsuits against a single entity, single-entity management changes, individual entity analysis, entity-specific debt or bankruptcy, and product launches by one entity. For these entity-specific articles that were incorrectly routed, the model is instructed to set severity to `"low"`, confidence below 0.3, and leave the `affected_regions`, `affected_sectors`, and `affected_commodities` arrays empty.
The user prompt, built by `build_event_classification_prompt()`, reinforces these anti-hallucination rules and provides additional guidance. It instructs the model to only extract facts explicitly stated in the text, to set confidence below 0.4 for vague or speculative content, to distinguish announced policy from rumored policy, and to reserve `"critical"` severity for events affecting multiple countries or entire global systems. Articles longer than 6,000 characters are truncated before inclusion in the prompt.
The output schema is the `GlobalEvent` dataclass, which contains:
- `event_types` — a list of impact type strings, drawn from a fixed set: `supply_disruption`, `demand_shift`, `cost_increase`, `regulatory_pressure`, `currency_impact`, `commodity_shock`, `trade_barrier`, and `geopolitical_risk`. The model is instructed to include all applicable types rather than collapsing to a single category.
- `severity` — one of `low`, `moderate`, `high`, or `critical`.
- `affected_regions` — ISO 3166-1 alpha-2 country codes or region names (e.g., `US`, `CN`, `EU`, `GB`, `JP`). Only regions explicitly mentioned or clearly implied should be included.
- `affected_sectors` — GICS sector identifiers such as `Energy`, `Financials`, `Information Technology`, or `Industrials`.
- `affected_commodities` — commodity identifiers like `crude_oil`, `natural_gas`, `gold`, `copper`, `wheat`, `lithium`, or `semiconductors`. An empty list if no commodities are directly affected.
- `summary` — a one-to-three sentence summary of the event and its domain implications.
- `key_facts` — facts explicitly stated in the article, limited to three to five items.
- `estimated_duration` — one of `short_term` (days to weeks), `medium_term` (weeks to months), or `long_term` (months to years).
- `confidence` — a float between 0.0 and 1.0, clamped during parsing.
Each `GlobalEvent` also carries a `model_metadata` object recording the provider (`ollama`), model name, prompt version (`event-classification-v1`), and schema version (`1.0.0`), plus a `source_document_id` linking back to the originating document.
After a successful classification, the system computes macro impact records for all tracked entities using the exposure-based interpolation engine in `services/aggregation/interpolation.py`. Each entity's exposure profile — geographic revenue mix, supply chain regions, key input commodities, regulatory jurisdictions, and position tier — determines how much a given macro event affects that entity. Entities with non-zero macro impact scores get `macro_impact_records` rows persisted to PostgreSQL, and aggregation jobs are enqueued to `app:queue:aggregation` for each affected entity identifier. The extractor worker tracks consecutive macro classification failures and emits a critical-level alert after three consecutive failures, continuing with entity-only signals in the meantime.
---
## The JSON Repair Pipeline
LLM output is inherently unreliable at the syntactic level. Models sometimes wrap JSON in markdown fences, produce trailing commas, leave strings unterminated, or truncate output mid-object when they hit token limits. The extractor addresses this with a three-stage JSON repair pipeline implemented across `services/extractor/client.py` and `services/extractor/schemas.py`.
The first stage is a direct `json.loads()` call. If the raw model output is already valid JSON, no repair is needed and the pipeline moves straight to validation. This is the fast path for well-behaved model responses.
The second stage strips markdown fences. Models frequently wrap their output in `` ```json ... ``` `` blocks despite being told not to. The `_strip_markdown_fences()` function in `services/extractor/client.py` uses a regex to detect and remove these wrappers before attempting another parse.
The third stage invokes the `json-repair` library as a fallback. The `_repair_json()` function in `services/extractor/client.py` calls `repair_json()` with `return_objects=False` to get a repaired JSON string. This library handles a wide range of common LLM JSON errors — trailing commas, missing quotes, unescaped characters — that would otherwise require custom repair logic.
The `services/extractor/schemas.py` module contains an additional layer of repair logic in its own `_repair_json()` function, which handles cases that the library might miss. It strips non-JSON prefixes (models sometimes prepend explanatory text before the opening brace), removes control characters that break parsing, fixes trailing commas before closing brackets, and as a last resort calls `_repair_truncated_json()` — a state-machine parser that walks the string tracking bracket depth and string state, then appends the necessary closing tokens to complete a truncated JSON object.
For the Global Event Classifier, the `_parse_classification_response()` function in `services/extractor/event_classifier.py` reuses the same `_strip_markdown_fences()` and `_repair_json()` functions from the client module, and additionally handles the case where the model wraps the output object in a single-element list — a quirk observed with some model configurations.
---
## Structural and Semantic Validation
Repairing JSON syntax is only the first step. The `validate_extraction()` function in `services/extractor/schemas.py` performs both structural and semantic validation on the parsed output, and the distinction between the two is important for understanding the retry logic.
Structural validation begins with normalization. The `_normalize_extraction_data()` function fills in missing top-level fields with sensible defaults (empty summary, empty companies array, 0.5 novelty score, 0.3 confidence), clamps numeric fields to the [0.0, 1.0] range, and normalizes per-entity fields. Catalyst types that the model produces as free-text alternatives — `"strategic pivot"`, `"acquisition"`, `"lawsuit"`, `"inflation"`, `"launch"` — are mapped to their canonical enum values through a comprehensive alias dictionary. Impact horizons like `"long-term"`, `"short"`, `"immediate"`, or `"near-term"` are similarly mapped to the valid set (`intraday`, `1d`, `1d_7d`, `1d_30d`, `30d_90d`, `90d_plus`). After normalization, the data is validated against the `ExtractionResult` Pydantic model, which enforces type constraints, enum membership, and range bounds.
Semantic validation catches issues that are structurally valid but logically suspect. The `_semantic_checks()` function runs a series of cross-field consistency checks that produce either errors (which trigger a retry) or warnings (which are logged but do not block acceptance). Semantic errors include duplicate entity identifiers across entity entries, missing identifier fields, and invalid impact horizon values. Semantic warnings include empty summaries, low confidence with entities present, invalid identifier formats (not matching the one-to-five uppercase letter pattern), missing evidence spans, evidence spans that are too short (under 8 characters) or too long (over 500 characters), high impact scores with no supporting key facts, very low relevance scores, and strong sentiment paired with negligible impact scores.
When the original document text is available, the validator also performs an evidence grounding check: each evidence span is searched for in the source text (case-insensitive), and spans not found in the document are flagged with a warning. This helps detect hallucinated evidence — quotes the model fabricated rather than extracted from the actual text.
If validation produces any semantic errors, the `ValidationReport` is marked as invalid and the `OllamaClient` retry loop treats it as a failed attempt. The retry logic uses exponential backoff with configurable parameters: a base delay (default from `OllamaConfig`), a multiplier applied on each retry, and a maximum delay cap. The number of retries is configurable per agent through the `max_retries` field in the `ai_agents` or `agent_variants` table. Non-retryable errors — HTTP 400, 401, 403, 404, and 422 responses from Ollama — short-circuit the retry loop immediately, since these indicate a problem with the request itself rather than a transient model failure.
Every attempt, whether successful or not, is recorded in an `ExtractionAttempt` dataclass that captures the raw output, validation report, error description, duration in milliseconds, model name, and whether the error was retryable. The full list of attempts is preserved in the `ExtractionResponse` for audit purposes and uploaded to MinIO by the persistence layer.
---
## The AgentConfigResolver: Hot-Swapping Models and Prompts
Both the Document Intelligence Extractor and the Global Event Classifier resolve their runtime configuration through the `AgentConfigResolver` in `services/shared/agent_config.py`. This mechanism allows operators to change models, prompts, timeouts, retry counts, and token budgets without restarting any service — changes take effect within 60 seconds.
The resolver works by querying the `ai_agents` and `agent_variants` PostgreSQL tables with a single SQL statement that uses `COALESCE` to prefer variant values over base agent values. When the extractor worker starts, it creates an `AgentConfigResolver` instance with a 60-second TTL cache and calls `resolver.resolve("document-extractor")` to get the active configuration. If an active variant exists for the agent (enforced by a unique partial index on `agent_variants` that allows at most one active variant per agent), the variant's `model_name`, `system_prompt`, `temperature`, `max_tokens`, `context_window`, `timeout_seconds`, and `max_retries` override the base agent's values wherever the variant provides a non-NULL value. If no active variant exists, the base agent's configuration is used. If the database query fails entirely, the resolver returns `None` and the worker falls back to environment-variable-based `OllamaConfig` defaults.
The resolved configuration is captured in a `ResolvedAgentConfig` frozen dataclass that includes the `agent_id`, `variant_id` (if any), `model_provider`, `model_name`, `system_prompt`, `user_prompt_template`, `prompt_version`, `temperature`, `max_tokens`, `context_window`, `input_token_limit`, `token_budget`, `timeout_seconds`, and `max_retries`. The extractor worker uses this to build an `OllamaConfig` that is passed to the `OllamaClient`.
The 60-second TTL cache means the resolver only hits the database once per minute per agent slug. Cache entries are keyed by slug and timestamped with `time.monotonic()`. When a cached entry expires, the next `resolve()` call re-queries the database and refreshes the cache. The `invalidate()` method can clear a single slug or the entire cache, though in practice the TTL-based expiry is sufficient for normal operations.
The extractor worker re-resolves its configuration every 100 jobs. If the resolved model name has changed (for example, because an operator activated a variant that uses a different model), the worker closes the old `OllamaClient` and creates a new one with the updated configuration. The event classifier is resolved separately and can use a different model than the document extractor — the worker maintains two independent `OllamaClient` instances when the models differ.
Token budget enforcement adds another layer of control. If a variant specifies a `token_budget` (total tokens per hour), the worker checks the `agent_performance_log` table before each invocation to see whether the budget has been exceeded. If so, the invocation is skipped entirely. Input token limits work similarly: if a variant sets an `input_token_limit`, the worker truncates the document text to approximately that many tokens (estimated at four characters per token) before sending it to the model.
For a complete guide to creating variants, activating them, and comparing their performance, see the [AI Agents Guide](../ai-agents.md).
---
## Persistence: From Extraction to Database
Once the LLM produces a valid extraction and it passes validation, the `persist_extraction()` function in `services/extractor/worker.py` orchestrates the full persistence pipeline. This function writes to both MinIO (for audit) and PostgreSQL (for downstream consumption), ensuring that every extraction attempt is fully traceable.
The MinIO persistence layer uploads four artifacts per extraction, all stored under date-partitioned paths in dedicated buckets. The prompt metadata (prompt version, schema version, model name) goes to `app-llm-prompts`. The raw model output for every attempt — including failed ones — goes to `app-llm-results`, preserving the full retry history. A validation report summarizing the final attempt's status, errors, and warnings is uploaded alongside the raw output. On success, the final parsed intelligence object (the `ExtractionResult` serialized as JSON) is uploaded to a separate path for easy retrieval.
The PostgreSQL persistence writes to two tables. The `document_intelligence` table receives one row per document, containing the summary, macro themes, novelty score, source credibility, extraction warnings, confidence, model metadata (provider, model name, prompt version, schema version), references to the MinIO artifacts (raw output ref, prompt ref), validation status (`valid` or `failed`), validation errors, and retry count. This row is the authoritative record of what the AI extracted from the document.
The `document_impact_records` table receives one row per entity mention within the extraction. Each impact record is linked to the parent `document_intelligence` row via `intelligence_id` and to the `companies` table via `company_id`. The record captures the entity identifier, relevance, sentiment, impact score, impact horizon, catalyst type, key facts, risks, and evidence spans for that specific entity. The `company_id` is resolved from an identifier-to-UUID mapping that the worker maintains by querying the `companies` table (refreshed every 100 jobs). If an identifier in the extraction output does not match any tracked entity, the impact record is skipped with a warning — the system only persists impact records for entities in its tracked universe.
After persisting the intelligence and impact records, the worker updates the document's status in the `documents` table to `extracted` (or `extraction_failed` if all retry attempts were exhausted). Even failed extractions get a `document_intelligence` row with `validation_status='failed'`, empty summary, zero confidence, and the accumulated error messages — this ensures the failure is visible in the database rather than silently lost.
Performance metrics are collected for every extraction via `collect_metrics()` in `services/extractor/metrics.py` and persisted to a metrics table. Prometheus counters and histograms track extraction attempts, duration, retries, confidence distribution, validation errors, and estimated token usage (input and output, estimated at four characters per token). When a resolved agent config is available, the worker also logs to the `agent_performance_log` table with variant attribution, enabling the A/B comparison queries described in the [AI Agents Guide](../ai-agents.md).
For the Global Event Classifier, persistence follows a parallel path. The prompt and raw output are uploaded to MinIO under an `event_classification/macro/` path prefix. The parsed `GlobalEvent` is persisted to the `global_events` PostgreSQL table, which stores the event types, severity, affected regions, affected sectors, affected commodities, summary, key facts, estimated duration, confidence, source document ID, and model metadata. Downstream, the macro interpolation engine computes `macro_impact_records` for each affected entity and persists those as well.
---
## Enqueuing Aggregation Jobs
The final step in the extraction pipeline is to notify the downstream aggregation engine that new intelligence is available. After a successful document extraction, the worker pushes a job onto the `app:queue:aggregation` Redis list containing the identifier of the affected entity. The aggregation engine (described in [Page 3](03-signal-scoring-and-weighted-signals.md)) will pick up this job and recompute the weighted signals and trend summaries for that entity, incorporating the freshly extracted intelligence.
For macro events, the enqueue logic is more expansive. After the Global Event Classifier produces a `GlobalEvent` and the interpolation engine computes macro impact records, the worker enqueues an aggregation job for every entity identifier that received a non-zero macro impact score. A single macro event — say, a new regulatory policy change affecting the Energy and Industrials sectors — can trigger aggregation recomputation for dozens of entities simultaneously. The aggregation job payload includes both the entity identifier and the `macro_event_id`, so the aggregation engine knows to incorporate the new macro signals.
The worker alternates between the extraction and macro classification queues to prevent starvation: every third job is pulled from `app:queue:macro_classification`, with the remaining two-thirds from `app:queue:extraction`. If the preferred queue is empty, the worker falls back to the other queue, ensuring that neither pipeline stalls while the other has work available.
---
## What Comes Next
At this point, documents have been transformed from unstructured text into structured JSON intelligence — `ExtractionResult` objects for entity-specific documents and `GlobalEvent` objects for macro news. These structured records are persisted in PostgreSQL and their entity identifiers have been enqueued for aggregation. But raw extraction output is not yet actionable for downstream decisions. The extraction tells us that a document is negative for Entity-A with an impact score of 0.7 and a confidence of 0.8, but it does not tell us how much weight that signal should carry relative to other signals about Entity-A, or how it compares to signals from different sources, time periods, or environmental conditions. [Page 3 — Signal Scoring and the WeightedSignal Abstraction](03-signal-scoring-and-weighted-signals.md) picks up the story from here, explaining how the aggregation engine transforms these raw extraction outputs into weighted signals through confidence gating, recency decay, source credibility scoring, novelty bonuses, and environmental context multipliers.
@@ -0,0 +1,210 @@
# Page 3 — Signal Scoring and the WeightedSignal Abstraction
The extraction pipeline described in [Page 2](02-ai-agent-processing-and-extraction.md) produces structured intelligence records — `document_impact_records` for entity-specific documents, `macro_impact_records` for global events, and `competitive_signal_records` for cross-entity pattern propagation. Each record carries a sentiment, an impact score, a confidence value, and a publication timestamp. But these raw values are not directly comparable. A high-confidence extraction from a reputable source published ten minutes ago should carry far more weight than a low-confidence extraction from an unknown source published three weeks ago. A document that breaks genuinely novel information should matter more than one that rehashes yesterday's performance report. And when conditions are changing fast — high volatility, surging volume — fresh signals become even more critical.
The signal scoring layer in `services/aggregation/scoring.py` solves this problem by transforming each raw intelligence record into a `WeightedSignal` object: a document reference paired with a composite aggregation weight that encodes recency, credibility, novelty, confidence, and environmental conditions into a single number. This page explains how that weight is computed, how sentiment labels become numeric values, and how three independent signal layers — Entity-Specific, Macro, and Competitive — each produce `WeightedSignal` objects that are concatenated into a unified list before the aggregation engine computes trend summaries. For a visual breakdown of the composite weight formula, see the [Weighted Signal Computation diagram](diagrams/weighted-signal-computation.md). For the full picture of how the three layers merge, see the [Three-Layer Signal Merging diagram](diagrams/three-layer-signal-merging.md).
---
## The WeightedSignal and SignalWeight Dataclasses
The core abstraction is the `WeightedSignal` dataclass, defined in `services/aggregation/scoring.py`. It pairs a document reference with the computed weight and the signal's sentiment and impact values:
- **`document_id`** — the UUID of the source document (for entity-specific and macro signals) or a synthetic identifier for pattern-derived signals (e.g., `pattern:Entity-A:performance_report:7d`).
- **`weight`** — a `SignalWeight` object containing the component breakdown and the final combined score.
- **`sentiment_value`** — a numeric sentiment value: `+1.0` for positive, `-1.0` for negative, `0.0` for neutral or mixed.
- **`impact_score`** — the magnitude of impact, drawn from the extraction's per-entity impact score for entity-specific signals, or scaled by a layer-specific weight multiplier for macro and competitive signals.
The `SignalWeight` dataclass captures the individual components that feed into the combined weight, making the scoring decision fully transparent and auditable:
- **`recency`** — the exponential decay weight based on document age.
- **`credibility`** — the source credibility weight after clamping and exponentiation.
- **`novelty_bonus`** — the additive bonus derived from the document's novelty score.
- **`confidence_gate`** — either `1.0` (signal passes) or `0.0` (signal is gated out).
- **`market_ctx_multiplier`** — a multiplicative boost from environmental conditions, always `>= 1.0`.
- **`combined`** — the final composite weight used by the aggregation engine.
The `ScoringConfig` frozen dataclass holds all tunable parameters for the scoring functions — half-life hours per window, credibility bounds, novelty bonus cap, confidence floor, and environmental context thresholds. A module-level `DEFAULT_CONFIG` singleton provides the production defaults, but every scoring function accepts an optional `config` parameter so that tests and alternative configurations can override any parameter without modifying global state.
---
## The Composite Weight Formula
The `compute_signal_weight()` function in `services/aggregation/scoring.py` computes the combined weight for a single document signal. The formula is:
```
combined = gate × recency × credibility × (1 + novelty_bonus) × market_context_multiplier
```
Each factor is computed independently and then multiplied together. This multiplicative structure means that any single factor can zero out the entire weight (the confidence gate) or amplify it (the market context multiplier), and the interaction between factors is naturally captured — a highly credible, very recent document with novel information in a volatile environment receives the maximum possible weight, while a stale, low-credibility document with routine information receives a weight close to zero.
The following sections describe each component in detail.
---
## Confidence Gate
The confidence gate is the first and most decisive filter. If the extraction confidence for a document falls below the `confidence_floor` threshold — set to `0.2` in the default `ScoringConfig` — the gate evaluates to `0.0` and the entire combined weight becomes zero. The document is effectively excluded from aggregation. If the confidence meets or exceeds the threshold, the gate evaluates to `1.0` and has no further effect on the weight.
This binary gate exists because documents with very low extraction confidence are too unreliable to aggregate. A confidence of 0.15 typically means the LLM struggled to parse the document — perhaps the text was truncated, the language was ambiguous, or the document type was unusual. Including such signals would add noise rather than information. The threshold of 0.2 is deliberately low; it filters only the most unreliable extractions while allowing moderately confident signals to participate (their lower confidence is reflected through the credibility component instead).
---
## Recency Decay
The `recency_weight()` function computes an exponential decay based on how old a document is relative to the aggregation anchor time. The formula is:
```
w = 2^(age_hours / half_life)
```
A document published exactly one half-life ago receives a recency weight of `0.5`. A document published two half-lives ago receives `0.25`, and so on. A document published at or after the reference time receives the maximum weight of `1.0`.
The half-life varies by trend window, reflecting the intuition that shorter windows need faster decay to stay responsive, while longer windows should give older documents more influence. The default half-lives, configured in `ScoringConfig.half_life_hours`, are:
| Window | Half-Life |
|--------|-----------|
| `intraday` | 2 hours |
| `1d` | 12 hours |
| `7d` | 72 hours (3 days) |
| `30d` | 240 hours (10 days) |
| `90d` | 720 hours (30 days) |
For the intraday window, a document published four hours ago already has a recency weight of `0.25` — it is rapidly losing influence as newer information arrives. For the 90-day window, that same four-hour-old document still has a recency weight of essentially `1.0`, because the 30-day half-life means age only becomes significant over weeks.
A floor value of `min_recency_weight = 0.01` prevents very old documents from being completely zeroed out. Even a document from months ago retains a trace-level weight of 1%, ensuring it can still contribute to trend computation if no newer signals exist. Both timestamps are normalized to UTC; naive datetimes are treated as UTC to avoid timezone-related scoring errors.
---
## Source Credibility
The `credibility_weight()` function transforms a source's credibility score into a weight component. The raw credibility value — a float between 0.0 and 1.0 stored in the `document_intelligence` table — is first clamped to the range `[0.1, 1.0]` using the `credibility_floor` and `credibility_ceiling` parameters from `ScoringConfig`. This clamping ensures that even the least credible sources retain a minimum weight of 0.1 rather than being completely silenced, while preventing any source from exceeding a weight of 1.0.
After clamping, the value is raised to the `credibility_exponent` power. The default exponent is `1.0`, which means the clamped credibility passes through unchanged. Setting the exponent above 1.0 would penalize low-credibility sources more aggressively — for example, an exponent of 2.0 would reduce a credibility of 0.5 to a weight of 0.25. Setting it below 1.0 would flatten the curve, making the system more tolerant of lower-credibility sources. The exponent is configurable through `ScoringConfig` to allow operators to tune the credibility sensitivity without changing the scoring code.
---
## Novelty Bonus
The novelty bonus rewards documents that contain genuinely new information. The bonus is computed as:
```
novelty_bonus = novelty_score × novelty_bonus_max
```
where `novelty_score` is the 0.0-to-1.0 value produced by the extraction model (see the `ExtractionResult` schema in [Page 2](02-ai-agent-processing-and-extraction.md)) and `novelty_bonus_max` is `0.25` by default. This means the bonus ranges from `0.0` (completely routine information) to `0.25` (maximally novel information), providing up to a 25% boost to the signal weight.
The bonus enters the composite formula as `(1 + novelty_bonus)`, so it acts as a multiplicative amplifier on the base weight. A document with a novelty score of 1.0 gets its weight multiplied by 1.25; a document with a novelty score of 0.0 gets multiplied by 1.0 (no change). This design ensures that novelty can only increase a signal's weight, never decrease it — routine information is not penalized, it simply does not receive the bonus.
---
## Environmental Context Multiplier
The `market_context_multiplier()` function computes a boost factor based on real-time environmental conditions for the entity being aggregated. The multiplier is always `>= 1.0`, meaning environmental context can only amplify signal weights, never reduce them. When no environmental context data is available (the `MarketContext` object from `services/shared/schemas.py` has `has_data == False`), the multiplier defaults to `1.0`.
Two environmental features contribute to the boost:
**Volatility boost.** When the entity's price volatility exceeds the `volatility_recency_boost_threshold` (default `1.0` in price units), the excess volatility is transformed through a logarithmic scaling function: `log₁₊(excess) × 0.15`. The logarithmic scaling prevents extreme volatility from producing runaway weight amplification. The boost is capped at `volatility_recency_boost_max = 0.30`, so the maximum volatility contribution is a 30% weight increase. The rationale is that in highly volatile environments, fresh intelligence is disproportionately valuable — a signal about Entity-C matters more when Entity-C is swinging 5% intraday than when it is moving in a tight range.
**Volume surge boost.** When the entity's volume change percentage exceeds `volume_surge_threshold_pct = 50.0%` (meaning activity volume is at least 50% above the prior period's average), a flat `volume_surge_boost = 0.15` is added. Unlike the volatility boost, this is binary — either the volume threshold is met and the full 15% boost applies, or it is not and no boost is added. High-volume moves carry more conviction because they represent broader participation rather than thin-activity noise.
The two boosts are additive within the multiplier: `multiplier = 1.0 + volatility_boost + volume_surge_boost`. In the most extreme case — high volatility and a volume surge — the combined multiplier reaches `1.0 + 0.30 + 0.15 = 1.45`, amplifying the signal weight by 45%. The `MarketContext` data is fetched by `services/aggregation/market_context.py` from the data tables in PostgreSQL, using the same entity identifier and window parameters as the impact record query.
---
## Sentiment Mapping
Before signals can be aggregated into trend summaries, the categorical sentiment labels from the extraction output must be converted to numeric values. The `sentiment_to_numeric()` function in `services/aggregation/scoring.py` performs this mapping:
| Sentiment Label | Numeric Value |
|----------------|---------------|
| `positive` | `+1.0` |
| `negative` | `-1.0` |
| `neutral` | `0.0` |
| `mixed` | `0.0` |
The mapping is case-insensitive. Any unrecognized label defaults to `0.0`. The choice to map both `neutral` and `mixed` to `0.0` is deliberate — a mixed-sentiment document (one that contains both positive and negative signals for the same entity) should not push the trend in either direction. The contradiction between the positive and negative aspects is captured separately by the contradiction detection system described in [Page 4](04-trend-aggregation-and-accumulating-signals.md), rather than being baked into the sentiment value itself.
For macro signals, the direction-to-sentiment mapping in `services/aggregation/worker.py` follows the same pattern: `positive` maps to `+1.0`, `negative` to `-1.0`, and both `mixed` and `neutral` to `0.0`. For competitive signals built by `build_pattern_weighted_signals()` in `services/aggregation/signal_propagation.py`, the sentiment is derived from the pattern's directional bias: `+1.0` if `positive_pct > negative_pct`, `-1.0` otherwise.
---
## Weighted Sentiment Average
The `weighted_sentiment_average()` function computes the central metric that drives trend direction: a weight-adjusted average sentiment across all signals for an entity in a given window. The formula is:
```
weighted_avg = Σ(combined_weight × impact_score × sentiment_value) / Σ(combined_weight × impact_score)
```
Each signal contributes its sentiment value scaled by both its composite weight and its impact score. The denominator normalizes by the total effective weight, producing a value in the range `[-1.0, +1.0]`. A result near `+1.0` means the weighted evidence is overwhelmingly positive; near `-1.0` means overwhelmingly negative; near `0.0` means either neutral or evenly split.
The use of `combined_weight × impact_score` as the effective weight means that high-impact, high-weight signals dominate the average. A single high-confidence, recent, credible document with a strong impact score can outweigh several older, lower-impact documents — which is the intended behavior. The aggregation engine in `services/aggregation/worker.py` passes this weighted average to `derive_trend_direction()`, which maps it to a `TrendDirection` enum value (positive, negative, mixed, or neutral) using the thresholds described in [Page 4](04-trend-aggregation-and-accumulating-signals.md).
If the total effective weight is zero — either because no signals exist or all signals were gated out by the confidence floor — the function returns `0.0`, which maps to a neutral trend direction.
---
## The Three Signal Layers
The aggregation engine in `services/aggregation/worker.py` does not treat all intelligence sources equally. Signals flow through three independent layers, each with a different relative weight, before being concatenated into a single `WeightedSignal` list for trend computation. This layered architecture allows the system to incorporate diverse intelligence sources while controlling how much influence each source type has on the final trend.
### Layer 1 — Entity-Specific Signals (Weight: 1.0)
Entity-specific signals are the primary layer. They are built by `build_weighted_signals()` in `services/aggregation/worker.py` from `document_impact_records` — the per-entity extraction output produced by the Document Intelligence Extractor (see [Page 2](02-ai-agent-processing-and-extraction.md)). Each impact record's sentiment is converted via `sentiment_to_numeric()`, and its impact score is used directly without any layer-level scaling. The `compute_signal_weight()` function produces the composite weight using the document's publication time, source credibility, novelty score, extraction confidence, and the entity's current environmental context.
Entity-specific signals carry a relative weight of `1.0` — they are the baseline against which other layers are measured. This reflects the design principle that direct, entity-specific intelligence (a performance report about Entity-A, a product launch by Entity-B, a lawsuit against Entity-E) is the most relevant and reliable signal for that entity's trend.
### Layer 2 — Macro Signals (Weight: 0.3)
Macro signals capture the indirect impact of global events on individual entities. They are built by `build_macro_weighted_signals()` in `services/aggregation/worker.py` from `macro_impact_records` — the per-entity impact scores computed by the exposure-based interpolation engine after the Global Event Classifier processes a macro news article. The sentiment is mapped from the `impact_direction` field (`positive``+1.0`, `negative``-1.0`, `mixed`/`neutral``0.0`), and the impact score is scaled by `MACRO_SIGNAL_WEIGHT`, which defaults to `0.3` in `AggregationConfig`.
The 0.3 weight means that a macro signal's impact score is reduced to 30% of its raw value before entering the aggregation. This attenuation reflects the inherent uncertainty in macro-to-entity impact estimation — a policy change might affect Entity-D's revenue, but the magnitude depends on exposure profiles, supply chain flexibility, and competitive dynamics that the interpolation engine can only approximate. By weighting macro signals at 0.3 relative to entity-specific signals at 1.0, the system ensures that macro intelligence informs the trend without overwhelming direct entity-specific evidence.
The recency decay, credibility, and confidence gating for macro signals use the same `compute_signal_weight()` function as entity-specific signals. The `published_at` timestamp comes from the global event's source document (the macro news article), and the `source_credibility` and `extraction_confidence` both use the macro impact record's `confidence` field.
### Layer 3 — Competitive Signals (Weight: 0.2)
Competitive signals capture cross-entity effects: when a catalyst hits one entity, historical patterns suggest how competitors might be affected. They are built by `build_pattern_weighted_signals()` in `services/aggregation/signal_propagation.py` from two sources: `HistoricalPattern` objects (self-entity patterns mined by `services/aggregation/pattern_matcher.py`) and `CompetitiveSignalRecord` objects (cross-entity propagation signals stored in `competitive_signal_records`).
For historical patterns, the sentiment is derived from the pattern's directional bias (`+1.0` if `positive_pct > negative_pct`, `-1.0` otherwise), and the impact score is the pattern's `avg_strength` multiplied by `competitive_signal_weight` (default `0.2` from `CompetitiveConfig`). The `published_at` for recency decay uses the pattern's `data_end` — the most recent data point in the pattern's sample — and the `extraction_confidence` uses the pattern's `pattern_confidence`. Source credibility is set to `1.0` because patterns are derived from validated historical data, and novelty is fixed at `0.5`.
For competitive signal records, the same structure applies: sentiment from `signal_direction`, impact from `signal_strength × competitive_signal_weight`, recency from `computed_at`, and confidence from `pattern_confidence`.
The 0.2 weight makes competitive signals the lightest layer. This is appropriate because competitive signal propagation involves the most inference — the system is predicting how Entity B will react based on what happened to Entity A in historically similar situations. The signal is valuable as supplementary evidence but should not drive trend direction on its own.
---
## Signal Merging in the Aggregation Engine
The `aggregate_company_window()` function in `services/aggregation/worker.py` orchestrates the merging of all three layers for a single entity and window. The process follows a clear sequence:
1. **Fetch entity-specific impact records** from `document_impact_records` for the entity within the window's time range.
2. **Fetch environmental context** for the entity from data tables.
3. **Build entity-specific weighted signals** via `build_weighted_signals()`.
4. **Check the macro toggle** — query `risk_configs` for the `macro_enabled` flag, then fetch and merge macro signals if enabled.
5. **Check the competitive toggle** — query `risk_configs` for the `competitive_enabled` flag, then fetch patterns, fetch competitive signals, and merge if enabled.
6. **Concatenate** all `WeightedSignal` lists into a single list.
7. **Assemble the `TrendSummary`** from the merged signals.
The concatenation in step 6 is a simple list append — `signals = signals + macro_signals` followed by `signals = signals + pattern_weighted`. There is no re-weighting or normalization at the merge point. The relative influence of each layer is already encoded in the impact scores (scaled by 0.3 for macro, 0.2 for competitive, 1.0 for entity-specific) and in the composite weights computed by `compute_signal_weight()`. The `weighted_sentiment_average()` function then naturally produces a sentiment average that reflects these relative weights.
---
## Runtime Toggles and Graceful Degradation
Both the macro and competitive signal layers can be enabled or disabled at runtime through the `risk_configs` PostgreSQL table, without restarting any service. The toggle state is read fresh from the database at the start of every aggregation cycle — there is no caching — so changes take effect on the very next cycle.
The `fetch_macro_enabled()` function in `services/aggregation/worker.py` queries the most recent active `risk_configs` row and reads the `config->>'macro_enabled'` JSON field. If the field is explicitly set to `"true"` or `"false"`, that value overrides the `AggregationConfig` default. If no config row exists or the field is absent, the function returns `None` and the engine falls back to the `AggregationConfig.macro_enabled` default (which is `True`). The `fetch_competitive_enabled()` function follows the identical pattern for the `competitive_enabled` field.
When a layer is disabled, the aggregation engine simply skips the fetch-and-merge step for that layer. Entity-specific signals are always computed — they cannot be toggled off. This means the system degrades gracefully: disabling the macro layer produces trends based on entity-specific signals alone (plus competitive signals if enabled), and disabling the competitive layer produces trends based on entity-specific and macro signals. Disabling both layers reduces the engine to its original single-layer behavior, using only direct document intelligence.
Crucially, disabling a layer does not stop upstream processing. When the macro layer is disabled, the Global Event Classifier continues to classify macro events and the interpolation engine continues to compute `macro_impact_records`. The data accumulates in PostgreSQL. When the layer is re-enabled, the aggregation engine immediately picks up all the macro impact records that were computed while the layer was disabled — there is no data loss or gap in coverage. The same applies to competitive signals: pattern mining and signal propagation continue regardless of the toggle state.
If the competitive signal fetch fails at runtime (for example, due to a database timeout), the aggregation engine catches the exception, logs it, and continues with entity-specific and macro signals only. This exception-based graceful degradation ensures that a transient failure in one layer does not block trend computation entirely.
---
## What Comes Next
At this point, every document intelligence record, macro impact record, and competitive signal record has been transformed into a `WeightedSignal` with a composite weight that encodes recency, credibility, novelty, confidence, and environmental conditions. The three signal layers have been merged into a single list, and the weighted sentiment average has been computed. But a single aggregation cycle produces only a snapshot — a point-in-time view of the evidence. The real power of the system emerges when these snapshots accumulate across multiple documents and time windows, building a case for action. [Page 4 — Trend Aggregation and Accumulating Signals](04-trend-aggregation-and-accumulating-signals.md) explains how the aggregation engine computes `TrendSummary` objects across five time windows, how consecutive same-direction signals strengthen trend confidence and escalate the system's response from neutral observation to actionable decision recommendations, and how contradiction detection and evidence ranking ensure that the trend reflects genuine consensus rather than noise.
@@ -0,0 +1,267 @@
# Page 4 — Trend Aggregation and Accumulating Signals
The scoring layer described in [Page 3](03-signal-scoring-and-weighted-signals.md) transforms every intelligence record into a `WeightedSignal` — a document reference paired with a composite weight that encodes recency, credibility, novelty, confidence, and environmental conditions. Three independent signal layers (Entity-Specific at weight 1.0, Environmental at 0.3, Relational at 0.2) each produce `WeightedSignal` objects that are concatenated into a single list. But a single list of weighted signals is still just raw material. The aggregation engine in `services/aggregation/worker.py` is where that raw material becomes a decision-grade assessment: a `TrendSummary` object that captures the direction, strength, confidence, contradiction level, and supporting evidence for an entity across a specific time window. This page explains how that transformation works — from weighted sentiment averages through trend direction derivation, contradiction detection, evidence ranking, and confidence computation — and, critically, how consecutive signals pointing in the same direction accumulate across documents and time windows to escalate the system's response from passive observation to actionable decision recommendations.
For a visual overview of the accumulation and escalation process, see the [Trend Accumulation and Escalation diagram](diagrams/trend-accumulation-escalation.md). For how the three signal layers merge into the aggregation engine, see the [Three-Layer Signal Merging diagram](diagrams/three-layer-signal-merging.md).
---
## Five Time Windows
The aggregation engine does not compute a single trend for each entity. It computes five, one for each time window defined in `services/aggregation/worker.py`:
| Window | Lookback Duration |
|--------|-------------------|
| `intraday` | 12 hours |
| `1d` | 1 day |
| `7d` | 7 days |
| `30d` | 30 days |
| `90d` | 90 days |
Each window produces an independent `TrendSummary` by fetching all impact records, macro impacts, and competitive signals for the entity within that window's time range. The `aggregate_company_window()` function in `services/aggregation/worker.py` orchestrates this per-window computation: it determines the time range from the window's lookback duration, fetches `document_impact_records` from PostgreSQL, retrieves environmental context, builds entity-specific weighted signals, checks the macro and competitive runtime toggles (see [Page 3](03-signal-scoring-and-weighted-signals.md) for toggle details), merges any enabled layer signals, and then assembles the `TrendSummary`.
The five-window design serves a specific purpose. Short windows (intraday, 1d) capture fast-moving sentiment shifts — a breaking negative performance disclosure, a sudden regulatory action — while long windows (30d, 90d) reveal sustained trends that persist across many documents and data cycles. An entity might show a negative intraday trend after a single unfavorable article, but a neutral 30-day trend because the broader evidence base is balanced. The recommendation engine downstream (described in [Page 5](05-recommendation-generation.md)) evaluates each window's `TrendSummary` independently, so the system can respond to both short-term catalysts and long-term directional shifts.
The `aggregate_company()` function iterates over all effective windows (configurable via `AggregationConfig.windows`, defaulting to all five) and calls `aggregate_company_window()` for each one. This means a single aggregation cycle for one entity produces up to five `TrendSummary` objects, each reflecting a different temporal perspective on the same underlying evidence.
---
## Trend Direction Derivation
Once the weighted sentiment average has been computed from the merged signal list (see the `weighted_sentiment_average()` function described in [Page 3](03-signal-scoring-and-weighted-signals.md)), the `derive_trend_direction()` function in `services/aggregation/worker.py` maps that numeric value to a `TrendDirection` enum. The rules are evaluated in a specific order, and the first matching rule wins:
1. **Mixed** — If the contradiction score exceeds `0.10` (the `MIXED_THRESHOLD` constant) *and* the absolute value of the average sentiment is below `0.30`, the direction is `MIXED`. This rule fires first because high contradiction with a weak directional signal indicates genuine disagreement in the evidence — the trend is not simply neutral, it is actively contested.
2. **Positive** — If the average sentiment is `≥ 0.15` (the `POSITIVE_THRESHOLD` constant), the direction is `POSITIVE`. This means the weight-adjusted evidence leans favorable with enough conviction to cross the threshold.
3. **Negative** — If the average sentiment is `≤ -0.15` (the `NEGATIVE_THRESHOLD` constant), the direction is `NEGATIVE`. The symmetric threshold ensures that positive and negative classifications require the same magnitude of evidence.
4. **Neutral** — If none of the above conditions are met, the direction is `NEUTRAL`. This covers the range where the average sentiment falls between -0.15 and +0.15 without high contradiction — the evidence is either balanced or insufficient to establish a directional lean.
The mixed-first evaluation order is important. Consider a scenario where five documents are positive and four are negative, all with similar weights. The weighted sentiment average might be slightly positive (say, +0.08), which would normally map to neutral. But the contradiction score — computed from the minority/majority weight split — would be high (close to 0.44). The mixed rule catches this case: the evidence is not neutral, it is conflicted. This distinction matters downstream because mixed trends receive different treatment in the recommendation engine than neutral trends.
---
## Contradiction Detection
The contradiction detection module in `services/aggregation/contradiction.py` provides a structured analysis of disagreement within the signal set. Rather than collapsing contradictory evidence into a single number, it produces a `ContradictionResult` containing both an overall score and a list of `DisagreementDetail` objects that explain *where* the disagreement lies.
The `detect_contradictions()` function runs two analyses:
### Sentiment Disagreement
The `_detect_sentiment_disagreement()` function examines whether both positive and negative sentiment signals exist in the signal set. For each signal with a non-zero effective weight (`combined_weight × impact_score > 0`), it classifies the signal as positive or negative based on its `sentiment_value` and accumulates the effective weight for each side. If both sides have at least one signal, it produces a `DisagreementDetail` with dimension `"sentiment"`, listing the document IDs and weights for each side, along with a human-readable description like "Sentiment split: 3 positive vs 2 negative signals (minority weight ratio 38%)".
### Catalyst-Level Disagreement
The `_detect_catalyst_disagreement()` function goes deeper. It groups signals by their `catalyst_type` (performance_report, product_launch, regulatory, etc.) using `CatalystEntry` objects built from the `document_impact_records`. Within each catalyst group, it checks whether both positive and negative signals exist. If they do, it produces a `DisagreementDetail` with dimension `"catalyst:<type>"` — for example, `"catalyst:performance_report"` when some documents interpret a periodic disclosure positively and others negatively. This catalyst-level analysis is valuable because it pinpoints the specific topic of disagreement rather than just flagging that disagreement exists somewhere in the evidence.
### The Overall Contradiction Score
The `_compute_overall_score()` function computes the backward-compatible scalar contradiction score using the minority/majority weight ratio formula:
```
contradiction_score = minority_weight / total_weight
```
where `minority_weight` is the smaller of the positive and negative effective weights, and `total_weight` is their sum. Signals with zero effective weight or neutral sentiment are excluded. The score ranges from `0.0` (complete agreement — all signals point the same direction) to `0.5` (perfect split — positive and negative weights are exactly equal). A score of `0.0` means no contradiction at all. A score above `0.10` combined with a weak average sentiment triggers the mixed direction classification in `derive_trend_direction()`.
The contradiction score also feeds directly into the confidence computation as a penalty, described in the next section. High contradiction reduces the system's confidence in the trend, which in turn affects whether the trend can escalate to actionable recommendations.
---
## Evidence Ranking
Not all documents contributing to a trend are equally important. The `rank_evidence()` function in `services/aggregation/worker.py` delegates to the evidence ranking module (`services/aggregation/evidence.py`) to produce ordered lists of the most influential supporting and opposing documents. The ranking uses a composite scoring approach configured by `EvidenceRankConfig`, considering multiple factors:
- **Weight** — the signal's composite weight from the scoring layer, reflecting recency, credibility, novelty, confidence, and environmental context.
- **Impact** — the extraction's impact score for the entity, reflecting how significant the document's content is.
- **Recency** — how recently the document was published, with more recent documents ranked higher.
- **Confidence** — the extraction confidence, reflecting how reliably the LLM parsed the document.
Signals are split into supporting (positive sentiment) and opposing (negative sentiment) groups. Neutral and mixed sentiment signals are excluded from evidence lists — they do not argue for or against the trend direction. Within each group, signals are sorted by their composite rank score in descending order, and the top entries (up to `MAX_EVIDENCE_REFS = 10` per side) are returned as document ID lists.
The `assemble_trend_with_evidence()` function in `services/aggregation/worker.py` uses the detailed variant `rank_evidence_detailed()` to get `RankedEvidence` objects that include the individual scoring components (weight, impact, recency, confidence, sentiment value). These detailed rankings are persisted to the `trend_evidence` table for auditability, while the document ID lists are stored directly in the `TrendSummary` as `top_supporting_evidence` and `top_opposing_evidence`.
The evidence ranking serves two purposes. First, it provides the recommendation engine with the most relevant documents to cite in its thesis generation (see [Page 5](05-recommendation-generation.md)). Second, it gives human reviewers a quick way to understand *why* the system reached a particular trend assessment — the top-ranked documents are the ones that most influenced the direction and strength.
---
## Confidence Computation
The `compute_trend_confidence()` function in `services/aggregation/worker.py` produces the confidence score for a `TrendSummary`. This score is critical because it directly gates whether a trend can produce actionable recommendations — the eligibility evaluation in `services/recommendation/eligibility.py` requires a minimum confidence of `0.35` to generate any recommendation at all, and higher confidence thresholds control escalation to simulation and live execution modes.
Confidence is computed from four components:
### Unique Source Count
The function counts the number of unique document IDs across all active signals (those with `combined_weight > 0`). This count is divided by 15 and capped at `0.8`:
```
count_factor = min(unique_sources / 15.0, 0.8)
```
A trend backed by 15 or more unique source documents reaches the maximum count contribution of `0.8`. A trend backed by a single document gets only `0.067`. This component rewards breadth of evidence — a trend confirmed by many independent sources is more trustworthy than one driven by a single article, regardless of how high that article's individual weight might be.
### Average Extraction Credibility
The average credibility weight across all active signals provides a baseline quality measure. If most contributing documents come from high-credibility sources, this component is high. If the evidence is dominated by low-credibility sources, confidence is penalized accordingly.
### Signal Agreement with Sample-Size Dampening
The agreement ratio measures what fraction of directional signals (positive + negative, excluding neutral) agree on the majority direction. If 8 out of 10 directional signals are positive, the raw agreement is `0.8`. But raw agreement is misleading with small sample sizes — 1 out of 1 signals agreeing gives a perfect `1.0` agreement, which is not meaningful.
To address this, the agreement is dampened by a logarithmic sample-size factor:
```
agreement_dampener = min(1.0, log₂(unique_sources + 1) / log₂(8))
```
This dampener saturates at `1.0` when `unique_sources` reaches approximately 7 (since `log₂(8) = 3.0` and `log₂(8) = 3.0`). With fewer sources, the dampener reduces the agreement contribution: 1 source gives a dampener of `0.33`, 3 sources give `0.67`, and 7 sources give the full `1.0`. The log₂ scaling means that each additional source provides diminishing marginal improvement to the dampener, which matches the intuition that the jump from 1 to 3 sources is far more meaningful than the jump from 15 to 17.
### Contradiction Penalty
The contradiction score computed by `services/aggregation/contradiction.py` is applied as a direct penalty:
```
contradiction_penalty = contradiction_score × 0.4
```
A contradiction score of `0.5` (perfect split) produces a penalty of `0.2`, which is substantial enough to push a moderately confident trend below the eligibility threshold.
### The Combined Formula
The four components are combined as:
```
confidence = 0.3 × count_factor + 0.3 × avg_credibility + 0.4 × agreement contradiction_penalty
```
The result is clamped to `[0.0, 1.0]`. The weighting gives signal agreement the largest share (40%), reflecting the principle that consensus among diverse sources is the strongest indicator of a reliable trend. Source count and credibility each contribute 30%, providing a balanced assessment of evidence breadth and quality. The contradiction penalty can reduce confidence significantly — a highly contradicted trend with a score of 0.4 loses 0.16 points of confidence, which can easily drop it below the 0.35 eligibility gate.
---
## How Accumulating Signals Escalate Decisions
The trend direction, strength, and confidence computed by the aggregation engine are not just descriptive — they directly determine what action the system takes. The escalation path from passive observation to active execution is governed by the eligibility thresholds defined in `services/recommendation/eligibility.py`, and the key insight is that consecutive signals pointing in the same direction naturally strengthen the trend metrics that control this escalation.
### The Escalation Ladder
The `EligibilityConfig` dataclass in `services/recommendation/eligibility.py` defines the thresholds that map trend metrics to actions:
**Neutral (no recommendation).** A trend fails the eligibility gates entirely when confidence is below `0.35`, trend strength is below `0.10`, contradiction exceeds `0.60`, evidence count is below `2`, or the direction is neutral. The `_check_gates()` function evaluates these hard gates — if any gate fails, no recommendation is generated for that window.
**Observe.** A trend that passes the gates but has a direction of mixed, or has strength below `0.25` with confidence below `0.50`, maps to an `OBSERVE` action via `_determine_action()`. This is the system's way of saying "something is happening, but the evidence is not strong enough to act on." Observe recommendations are always `informational` mode — they are logged for human review but never trigger decisions.
**Monitor.** When the trend has a clear direction (positive or negative) but strength remains below `0.25` while confidence reaches `0.50` or above, the action maps to `MONITOR`. This indicates that the directional signal is real but not yet strong enough for a commitment change. Like observe, monitor recommendations are `informational` mode.
**Act / Defer.** When trend strength reaches `0.25` or above with a positive direction, the action is `ACT`. With a negative direction at the same strength threshold, the action is `DEFER`. These are the only actions that can escalate beyond informational mode — `_determine_mode()` evaluates whether the recommendation qualifies for `simulation_eligible` (confidence ≥ `0.50`) or `production_eligible` (confidence ≥ `0.70`, contradiction ≤ `0.25`, evidence ≥ `5`).
### How Accumulation Drives Escalation
Consider an entity that starts with no recent intelligence. The first negative article arrives — a single document with negative sentiment. In the intraday window, this produces:
- **Trend strength** = `|avg_sentiment|` ≈ the absolute weighted sentiment from one signal, likely close to the impact score.
- **Confidence** = low, because `count_factor = min(1/15, 0.8) = 0.067` and the agreement dampener is only `log₂(2)/log₂(8) = 0.33`.
- **Direction** = negative (if the weighted sentiment is ≤ -0.15).
With confidence well below `0.35`, this trend fails the eligibility gate entirely. No recommendation is generated. The system is in the neutral state.
A second negative article arrives hours later. Now the intraday window has two signals:
- **Unique sources** = 2, so `count_factor = 0.133` and `agreement_dampener = log₂(3)/log₂(8) ≈ 0.53`.
- **Agreement** = `1.0 × 0.53 = 0.53` (both signals agree on negative).
- **Confidence** ≈ `0.3 × 0.133 + 0.3 × avg_cred + 0.4 × 0.53` — likely around `0.35-0.45` depending on credibility.
If confidence crosses `0.35` and strength exceeds `0.10`, the trend passes the eligibility gates. But with strength below `0.25`, the action is `OBSERVE` or `MONITOR` depending on confidence.
A third and fourth negative article arrive over the next day. The 1-day window now has four agreeing signals:
- **Unique sources** = 4, so `count_factor = 0.267` and `agreement_dampener = log₂(5)/log₂(8) ≈ 0.77`.
- **Agreement** = `1.0 × 0.77 = 0.77`.
- **Confidence** ≈ `0.3 × 0.267 + 0.3 × avg_cred + 0.4 × 0.77` — likely `0.50-0.60`.
- **Strength** = `|avg_sentiment|` — with four negative signals and no contradicting evidence, this could easily exceed `0.25`.
Now the trend maps to `DEFER` with `simulation_eligible` mode (confidence ≥ `0.50`). The system has escalated from no recommendation to a simulation-eligible defer recommendation purely through the accumulation of consistent negative evidence.
If the negative evidence continues — more documents, more sources, higher credibility — confidence climbs further. At confidence ≥ `0.70` with contradiction ≤ `0.25` and evidence ≥ `5`, the recommendation reaches `production_eligible` mode, the highest escalation level.
The same process works in reverse for positive accumulation: consecutive favorable signals strengthen the positive trend, increase confidence through source diversity and agreement, and escalate from observe through monitor to act.
### The Role of Contradiction in Preventing False Escalation
Accumulation only works when signals agree. If the fifth article about an entity is positive while the previous four were negative, the contradiction score jumps — `minority_weight / total_weight` increases because the minority (positive) side now has non-zero weight. This has two effects: the contradiction penalty reduces confidence (potentially dropping it below an eligibility threshold), and if the contradiction exceeds `0.10` with `|avg_sentiment| < 0.30`, the direction flips to mixed, which maps to `OBSERVE` regardless of strength. The system effectively de-escalates when the evidence becomes contested, requiring a clearer consensus before re-escalating.
---
## Trend Projections
After the `TrendSummary` is assembled and persisted, the aggregation engine computes a forward-looking `TrendProjection` via `compute_projection()` in `services/aggregation/projection.py`. Projections estimate where the trend is heading based on current momentum, macro signal decay, and upcoming catalysts. They are advisory — they do not directly trigger recommendations — but they provide valuable context for human reviewers and can inform future automated decision-making.
### Momentum
The `compute_trend_momentum()` function computes the rate of change in signed trend strength between the current and previous aggregation cycles. If the current window shows a negative trend at strength `0.40` and the previous cycle showed negative at `0.30`, the momentum is `-0.10` (strengthening negative). If no previous data is available, the function uses a heuristic: momentum is estimated as half the current signed strength, providing a reasonable baseline for new trends.
Momentum enters the projection as a half-weighted adjustment to the current signed strength:
```
momentum_projected_signed = direction_sign × current_strength + momentum × 0.5
```
This means momentum influences the projection but does not dominate it — a strong current trend with weakening momentum still projects as directional, just with reduced strength.
### Macro Decay
The `project_macro_decay()` function estimates how active macro events will evolve over the projection horizon. Each macro event has an `estimated_duration` that maps to a decay half-life:
| Duration | Half-Life |
|----------|-----------|
| `short_term` | 1 day |
| `medium_term` | 7 days |
| `long_term` | 30 days |
For each event, the function computes the projected remaining impact at the end of the horizon using exponential decay: `future_factor = 2^(future_age_days / half_life)`. The impact is further scaled by a severity weight (`critical`: 1.0, `high`: 0.75, `moderate`: 0.5, `low`: 0.25). Positive and negative macro impacts are accumulated separately, and the projected macro direction is determined by comparing the two sides — positive if the favorable side exceeds the unfavorable by 20%, negative if the reverse, mixed if both are present without a clear majority.
When the macro layer is enabled and macro events exist, the projection blends the entity-specific momentum projection with the macro trajectory. The macro weight is capped at `0.4` (40% of the blended projection), ensuring that macro signals inform but do not overwhelm the entity-specific trend. The blending formula combines the signed entity projection with the signed macro projection:
```
blended = company_weight × momentum_projected + macro_weight × macro_signed
```
### Driving Factors
The projection records a list of human-readable driving factors that explain what is influencing the projected direction. These include momentum descriptions ("Positive momentum (+0.150) in recent trend strength"), macro impact projections ("Macro signals project negative impact (strength 0.350) over 7d"), and upcoming catalysts drawn from the trend's `dominant_catalysts` list (limited to the top 3). If no specific factors are identified, a baseline continuation factor is recorded.
### Divergence Detection
After computing the projected direction, the function compares it to the current trend direction. If they differ — for example, the current trend is negative but the projection is positive due to decaying unfavorable macro events and favorable momentum — the projection is flagged with `diverges_from_current = True` and a divergence driving factor is appended. Divergence signals are particularly valuable because they indicate that the trend may be about to reverse, giving the recommendation engine and human reviewers an early warning.
The projection also flags low confidence when `projected_confidence` falls below the default threshold of `0.3`. Projection confidence starts at 80% of the current trend confidence (reflecting the inherent uncertainty of forward-looking estimates), with a small boost if macro data is available and a further reduction if the macro layer is disabled entirely.
---
## Persistence
Each aggregation cycle persists its results to four PostgreSQL tables, creating a durable record of the trend assessment and its supporting evidence.
### `trend_windows` — Current State
The `persist_trend_summary()` function in `services/aggregation/worker.py` upserts the `TrendSummary` into the `trend_windows` table, keyed by `(entity_type, entity_id, window)`. Each cycle overwrites the previous row for that entity and window, so `trend_windows` always reflects the most recent assessment. The row includes the trend direction, strength, confidence, contradiction score, disagreement details (as JSON), supporting and opposing evidence document IDs (as JSON arrays), dominant catalysts, material risks, environmental context, and the generation timestamp.
### `trend_history` — Time-Series Snapshots
Immediately after the upsert, `persist_trend_summary()` also inserts a snapshot row into the `trend_history` table. Unlike `trend_windows`, this table is append-only — every aggregation cycle adds a new row, creating a time-series of how the trend evolved over time. The history table stores the direction, strength, confidence, contradiction score, catalysts, risks, and timestamp. This time-series data powers the trend charts in the dashboard and enables the momentum computation in `services/aggregation/projection.py` by providing the previous cycle's strength and direction. If the history insert fails (for example, if the table does not yet exist in a development environment), the failure is logged at debug level and does not block the main upsert.
### `trend_evidence` — Per-Document Rankings
The `persist_trend_evidence()` function writes detailed evidence ranking rows to the `trend_evidence` table, linked to the `trend_windows` row by its UUID. Each row records a document ID, its role (supporting or opposing), and the individual scoring components: rank score, weight component, impact component, recency component, confidence component, and sentiment value. Non-UUID document IDs (such as synthetic pattern signal IDs like `pattern:Entity-A:performance_report:7d`) are filtered out before insertion, since the `trend_evidence` table enforces a foreign key to the `documents` table.
### `trend_projections` — Forward-Looking Estimates
The `persist_trend_projection()` function in `services/aggregation/projection.py` inserts the `TrendProjection` into the `trend_projections` table, linked to the `trend_windows` row. The row stores the projected direction, strength, confidence, projection horizon, driving factors (as JSON), macro contribution percentage, divergence flag, and computation timestamp. Like trend history, projections accumulate over time, allowing analysis of how well the system's forward-looking estimates matched subsequent reality.
---
## What Comes Next
At this point, the aggregation engine has transformed weighted signals into `TrendSummary` objects across five time windows, detected contradictions, ranked evidence, computed confidence, and persisted everything to PostgreSQL. The trend metrics — direction, strength, confidence, contradiction score — encode the accumulated weight of evidence for each entity. But a `TrendSummary` is still an assessment, not an action. The next stage translates these assessments into concrete recommendations: should the system act, defer, monitor, or simply observe? And with what conviction? [Page 5 — Recommendation Generation](05-recommendation-generation.md) explains how the recommendation engine applies data quality suppression, eligibility evaluation, commitment sizing, thesis generation, and risk classification to convert trend summaries into actionable `Recommendation` objects that the decision execution engine can execute.
@@ -0,0 +1,226 @@
# Page 5 — Recommendation Generation and Signal-to-Action Translation
The aggregation engine described in [Page 4](04-trend-aggregation-and-accumulating-signals.md) produces `TrendSummary` objects across five time windows for each entity identifier, encoding the direction, strength, confidence, contradiction level, and supporting evidence accumulated from all three signal layers. But a `TrendSummary` is an assessment — it describes what the evidence says, not what the system should do about it. The recommendation engine is where assessment becomes action. It takes each `TrendSummary`, subjects it to a series of deterministic evaluations, and produces a `Recommendation` object that specifies a concrete action (act, defer, monitor, or observe), an execution mode (informational, simulation-eligible, or production-eligible), a commitment sizing guideline, a human-readable thesis, and a risk classification. Every decision in this pipeline is rule-based and fully traceable — the LLM is only involved in an optional downstream step that rewrites the thesis wording.
The recommendation worker in `services/recommendation/main.py` polls the `app:queue:recommendation` Redis queue for jobs, each specifying an entity identifier and time window. For each job, it delegates to `generate_recommendation()` in `services/recommendation/worker.py`, which orchestrates the full pipeline: fetch the latest trend summary, check for duplicate recommendations, fetch any available trend projection, evaluate data quality suppression, evaluate eligibility, optionally rewrite the thesis via LLM, build the `Recommendation` object, and persist everything to PostgreSQL. For a visual overview of this flow, see the [Recommendation Generation Flow diagram](diagrams/recommendation-generation-flow.md).
---
## Data Quality Suppression
Before the eligibility engine evaluates whether a trend is strong enough to act on, the suppression layer in `services/recommendation/suppression.py` asks a more fundamental question: is the underlying data reliable enough to act on at all? A trend might show high confidence and strong directionality, but if the documents feeding it are stale, poorly extracted, or drawn from a single source type, the apparent signal quality is illusory. The suppression layer acts as a pre-filter on data quality, running before the eligibility engine and forcing any recommendation built on unreliable data to `informational` mode regardless of how strong the trend metrics look.
The `evaluate_suppression()` function accepts a `TrendSummary` and a `DataQualityContext` — a set of metrics about the documents underlying the trend, populated by querying `documents` and `document_intelligence` tables for the evidence document IDs stored in the trend summary. When full document-level metrics are not available (for example, in a development environment without the full document pipeline), the function falls back to `build_quality_context_from_summary()`, which estimates quality metrics from the trend summary's own evidence counts and confidence.
### The Six Data Quality Checks
The suppression evaluation runs six independent checks, each comparing a data quality metric against a configurable threshold defined in `SuppressionConfig`. If any single check fails, the recommendation is suppressed:
1. **Low extraction confidence** — If the average extraction confidence across the evidence documents falls below `0.40` (`min_avg_extraction_confidence`), the underlying LLM extractions are too unreliable. This catches cases where the extractor struggled with document formatting, ambiguous content, or low-quality source material, as described in [Page 2](02-ai-agent-processing-and-extraction.md).
2. **Evidence staleness** — If the most recent evidence document is older than `168` hours (7 days, `max_evidence_staleness_hours`), the trend is based on outdated information. Conditions change rapidly, and a week-old evidence base may no longer reflect the current state. When documents exist but no timestamp is available, the evidence is conservatively treated as stale.
3. **Low source diversity** — If fewer than `1` distinct source type (`min_source_types`) contributed to the evidence, the signal may be driven by a single unreliable source class. In practice, this check fires when the quality context has documents but all come from the same source type (for example, all news articles with no filings or supplementary data to corroborate).
4. **High extraction failure rate** — If more than `50%` (`max_extraction_failure_rate`) of the documents that should have contributed to the trend failed extraction entirely, the data pipeline is unreliable for this entity. A high failure rate means the trend summary is built from a biased subset of the available evidence — the failed documents might have told a different story.
5. **Insufficient valid documents** — If fewer than `2` valid (non-failed) documents (`min_valid_documents`) contributed to the trend, there simply is not enough data to act on. A single document, no matter how high-quality, does not provide the corroboration needed for automated execution decisions.
6. **Low data quality score** — The `_compute_data_quality_score()` function computes an overall quality score from three weighted components: extraction confidence (40% weight, normalized against a 0.8 baseline), evidence freshness (30% weight, linear decay over the staleness window), and document coverage (30% weight, combining the valid/total ratio with a count factor that saturates at 10 documents). If this composite score falls below `0.30` (`min_data_quality_score`) and the low-confidence check has not already fired, a general suppression reason is added.
When any check triggers, the `SuppressionResult` records the specific reasons (as `SuppressionReason` enum values) and the computed data quality score. The worker in `services/recommendation/worker.py` uses this result to force the recommendation's mode to `informational` and append a suppression note to the thesis text, ensuring the suppression decision is visible in the audit trail.
### Safety Suppressions: Macro-Only and Pattern-Only Signals
Beyond the six data quality checks, two additional safety suppressions protect against acting on signals that lack entity-specific corroboration:
**Macro-only suppression** (`evaluate_macro_only_suppression()`) fires when macro signals are the sole basis for a trend direction — no entity-specific signals contributed at all. As described in [Page 3](03-signal-scoring-and-weighted-signals.md), macro signals enter the aggregation engine at a reduced weight of `0.3` relative to entity-specific signals. But even at reduced weight, macro signals alone can shift a trend direction if no entity-specific evidence exists. When this happens, the recommendation is forced to `informational` mode with a caveat noting that the signal is macro-only and should not be used for automated execution.
**Pattern-only suppression** (`evaluate_pattern_only_suppression()`) applies the same logic to competitive/pattern signals. When pattern-based signals from `services/aggregation/pattern_matcher.py` and `services/aggregation/signal_propagation.py` are the sole contributors — no entity-specific or macro signals — the recommendation is suppressed. Historical patterns are valuable context, but acting on them without any current evidence is too speculative for automated execution.
Both safety suppressions are evaluated in the worker after the main suppression check, and both force the mode to `informational` when triggered.
---
## Eligibility Evaluation
Recommendations that survive the suppression layer enter the eligibility evaluation in `services/recommendation/eligibility.py`. This is the core decision logic — a set of deterministic rules that map trend metrics to actions, execution modes, and commitment sizing. The `evaluate_eligibility()` function is the single entry point, accepting a `TrendSummary` and an `EligibilityConfig` of tunable thresholds.
### Gate Checks
The `_check_gates()` function applies five hard gates. If any gate fails, the trend is ineligible for a recommendation (though the action and mode are still computed for the audit trace):
| Gate | Threshold | Rejection Reason |
|------|-----------|-----------------|
| Confidence | ≥ `0.35` | `low_confidence` |
| Trend strength | ≥ `0.10` | `low_trend_strength` |
| Contradiction score | ≤ `0.60` | `high_contradiction` |
| Evidence count | ≥ `2` (supporting + opposing) | `insufficient_evidence` |
| Direction | ≠ `neutral` | `neutral_direction` |
These gates are intentionally conservative. A confidence threshold of `0.35` means the system needs meaningful evidence breadth and agreement before generating any recommendation at all (see the confidence computation in [Page 4](04-trend-aggregation-and-accumulating-signals.md)). The contradiction ceiling of `0.60` allows moderately contested trends through — only when the evidence is deeply split does the gate reject. The evidence minimum of `2` ensures that no recommendation is ever based on a single document.
When a trend fails any gate, the resulting `EligibilityResult` has `eligible = False` and the mode is forced to `informational`, regardless of what the mode escalation logic would otherwise compute.
### Action Mapping
The `_determine_action()` function maps the trend's direction and strength to one of four action types. The logic evaluates in a specific order:
**Mixed or neutral direction → OBSERVE.** If the trend direction is `mixed` (high contradiction with weak directional signal) or `neutral`, the action is always `OBSERVE`. There is no directional conviction to act on.
**Strong directional signal → ACT or DEFER.** If the trend strength reaches `0.25` or above (`action_strength_threshold`), the action follows the direction: `ACT` for positive, `DEFER` for negative. This threshold ensures that only trends with meaningful magnitude trigger commitment-changing actions.
**Weak directional signal with decent confidence → MONITOR.** If the trend has a clear direction (positive or negative) but strength remains below `0.25`, the action depends on confidence. If confidence reaches `0.50` or above (`hold_confidence_threshold`), the action is `MONITOR` — the system recognizes the directional lean but does not have enough conviction to recommend a commitment change. Below `0.50` confidence, the action falls to `OBSERVE`.
This mapping creates the escalation ladder described in [Page 4](04-trend-aggregation-and-accumulating-signals.md): as consecutive signals accumulate and strengthen the trend metrics, the action naturally progresses from OBSERVE → MONITOR → ACT/DEFER.
### Mode Escalation
The `_determine_mode()` function determines the highest execution mode allowed for the recommendation. Mode controls whether the recommendation is purely informational, eligible for simulation mode, or eligible for live execution mode:
**OBSERVE and MONITOR → always informational.** These actions do not trigger executions, so they are always `informational` mode. They are logged for human review and dashboard display but never enter the decision execution engine.
**ACT and DEFER → escalation based on signal quality.** For actionable recommendations, mode escalates through three tiers:
- **`informational`** — The default when confidence is below `0.50`. The recommendation is recorded but not eligible for any execution.
- **`simulation_eligible`** — When confidence reaches `0.50` or above (`paper_confidence_threshold`). The recommendation can be picked up by the simulation engine described in [Page 6](06-decision-execution.md).
- **`production_eligible`** — The strictest tier, requiring confidence ≥ `0.70` (`live_confidence_threshold`), contradiction ≤ `0.25` (`live_max_contradiction`), and evidence count ≥ `5` (`live_min_evidence`). This triple gate ensures that only high-conviction, well-corroborated, low-contradiction recommendations can trigger live executions.
The evidence count for mode escalation is computed as the sum of supporting and opposing evidence documents, matching the same count used in the gate checks.
---
## Commitment Sizing
The `_compute_position_sizing()` function in `services/recommendation/eligibility.py` translates signal quality into an allocation pool guideline. Commitment sizing is not a fixed value — it scales dynamically with the confidence and strength of the underlying trend, penalized by contradiction and thin evidence.
### Base and Scaling
The computation starts with a base allocation of `1%` (`base_allocation_pct = 0.01`) and scales upward based on two factors:
- **Confidence factor** — `0.8 × confidence` (`confidence_sizing_weight`), reflecting how much the system trusts the trend assessment.
- **Strength factor** — `0.5 + 0.5 × trend_strength`, ranging from `0.5` (weakest trend) to `1.0` (strongest trend).
The raw allocation percentage is computed as:
```
raw_allocation = base + confidence_factor × strength_factor × (max - base)
```
where `max` is `10%` (`max_allocation_pct = 0.10`). At maximum confidence (1.0) and maximum strength (1.0), the raw allocation reaches the full 10%. At typical values (confidence 0.6, strength 0.3), the raw allocation is considerably lower.
### Contradiction Penalty
The contradiction score applies a multiplicative penalty:
```
allocation_pct = raw_allocation × (1.0 0.5 × contradiction_score)
```
A contradiction score of `0.40` reduces the allocation by 20%. A score of `0.0` (no contradiction) applies no penalty. This ensures that contested trends receive smaller commitment sizes even when they pass the eligibility gates.
### Evidence Count Penalty
Thin evidence further reduces the allocation:
- Fewer than `3` evidence documents → multiply by `0.5` (halved).
- Fewer than `5` evidence documents → multiply by `0.75`.
- `5` or more documents → no penalty.
This penalty stacks with the contradiction penalty, so a trend with high contradiction and thin evidence receives a substantially reduced commitment size.
### Max Loss Scaling
The same scaling logic applies to the maximum loss percentage, which starts at a base of `0.3%` (`base_max_loss_pct = 0.003`) and scales up to `2%` (`max_max_loss_pct = 0.02`). Higher-conviction commitments are allowed larger loss tolerances, while low-conviction or contested commitments are constrained to tighter risk thresholds.
The final `PositionSizing` object (defined in `services/shared/schemas.py`) contains `allocation_pct` and `max_loss_pct`, both clamped to their respective bounds. This object is embedded in the `Recommendation` and later consumed by the decision execution engine's own commitment sizer (described in [Page 6](06-decision-execution.md)), which applies additional resource pool-level constraints.
---
## Thesis Generation
Every recommendation includes a human-readable thesis that explains the reasoning behind the action. Thesis generation happens in two layers: a deterministic assembly that is always present, and an optional LLM rewrite that polishes the wording for execution-eligible recommendations.
### Deterministic Thesis Assembly
The `build_thesis()` function in `services/recommendation/worker.py` constructs a thesis string entirely from the trend data and eligibility result, with no model involvement. The thesis is assembled from several components in order:
1. **Opening** — States the entity identifier, trend direction, window, strength, and confidence. For example: "Entity-A shows a negative trend over the 7d window with strength 0.35 and confidence 0.62."
2. **Catalysts** — Lists the top three dominant catalysts from the `TrendSummary`, drawn from the evidence ranking described in [Page 4](04-trend-aggregation-and-accumulating-signals.md).
3. **Contradiction note** — If the contradiction score exceeds `0.15`, a note flags the signal disagreement and its magnitude.
4. **Trend projection** — When a `TrendProjection` is available and not flagged as low-confidence, the thesis incorporates the projected direction, strength, and top driving factors. If the projection diverges from the current trend, a divergence note is appended.
5. **Risks** — Lists the top two material risks from the `TrendSummary`.
6. **Evidence count** — States the number of supporting and opposing evidence documents.
7. **Prescriptive action** — States the recommended action and mode (e.g., "Recommendation: DEFER (simulation eligible).").
The deterministic thesis is always generated and serves as the audit reference. Even when the LLM rewrites the thesis, the deterministic version is preserved in the model metadata for traceability.
### Optional LLM Rewrite via the Thesis-Rewriter Agent
For recommendations that are both eligible and not suppressed, the worker optionally invokes the thesis-rewriter agent to polish the deterministic thesis into professional-quality prose. The LLM rewrite is implemented in `services/recommendation/thesis_llm.py` and uses the `thesis-rewriter` agent slug, resolved at runtime through the `AgentConfigResolver` in `services/shared/agent_config.py`.
The `AgentConfigResolver` queries the `ai_agents` and `agent_variants` database tables to resolve the active configuration for the `thesis-rewriter` slug, preferring an active variant's model, timeout, and retry settings when one exists. The resolver uses a 60-second TTL in-memory cache to avoid hitting the database on every recommendation. This is the same resolution mechanism used by the document extractor and event classifier agents described in [Page 2](02-ai-agent-processing-and-extraction.md).
The `rewrite_thesis_with_llm()` function builds a prompt from the deterministic thesis and trend context (entity identifier, window, direction, strength, confidence, contradiction score, catalysts, risks), sends it to the local Ollama instance via HTTP, and returns the rewritten text. The system prompt enforces strict rules: no fabricated information, no numbers or facts not present in the input, under 150 words, neutral professional tone, and only the rewritten thesis text in the response.
The LLM layer is purely additive — if the call fails for any reason (network error, timeout, empty response, token budget exceeded), the original deterministic thesis is returned unchanged. The worker in `services/recommendation/main.py` resolves the thesis-rewriter configuration at startup and refreshes it every 50 jobs to pick up configuration changes without requiring a restart. When no database configuration exists for the `thesis-rewriter` slug, thesis rewriting is silently disabled.
Performance logging for the thesis-rewriter is written to the `agent_performance_log` table, recording success/failure, duration, estimated token counts, and the variant ID. Token budget enforcement checks hourly usage against the variant's configured budget before making the LLM call, preventing runaway costs from high-volume recommendation cycles.
### Risk Classification Prefix
Before the thesis is stored, the `classify_risk()` function in `services/recommendation/worker.py` assigns a risk classification label that is prepended to the thesis text as a `[risk:<level>]` prefix. The classification is computed from a composite score:
| Factor | Contribution |
|--------|-------------|
| Contradiction score | `contradiction × 2.0` |
| Low confidence | `(1.0 confidence) × 1.5` |
| Low evidence count | `+1.0` if < 3 docs, `+0.5` if < 5 docs |
| Rejection reasons | `+0.5` per rejection reason |
The composite score maps to four levels:
| Score Range | Classification |
|-------------|---------------|
| ≥ 3.0 | `very_high` |
| ≥ 2.0 | `high` |
| ≥ 1.0 | `moderate` |
| < 1.0 | `low` |
A recommendation with high contradiction (0.4 → contributes 0.8), moderate confidence (0.55 → contributes 0.675), and 4 evidence documents (contributes 0.5) would score 1.975, classifying as `moderate`. The same recommendation with only 2 evidence documents would score 2.475, pushing it to `high`. This classification gives downstream consumers — both the decision execution engine and human reviewers — a quick risk signal without needing to re-evaluate the underlying metrics.
---
## Persistence
The recommendation pipeline persists its output to three PostgreSQL tables, creating a complete audit trail from trend assessment through decision logic to the final recommendation.
### `recommendations` — The Core Record
The `persist_recommendation()` function in `services/recommendation/worker.py` inserts the `Recommendation` into the `recommendations` table. Each row captures the entity identifier, action, mode, confidence, time horizon, thesis (including the risk classification prefix and any suppression notes), invalidation conditions (as JSONB), commitment sizing (allocation percentage and max loss percentage), model metadata (provider, model name, prompt version, schema version), risk classification, and generation timestamp. The insert returns the recommendation's UUID, which serves as the foreign key for the evidence and risk evaluation tables.
### `recommendation_evidence` — Evidence Citations
For each evidence document referenced in the recommendation, a row is inserted into the `recommendation_evidence` table linking the recommendation UUID to the document UUID, with an evidence type (`supporting` or `opposing`) and a position-based weight that decays with rank: `weight = 1.0 / (1.0 + index × 0.1)`. The first supporting document gets weight `1.0`, the second gets `0.91`, the third `0.83`, and so on. Non-UUID document IDs (such as synthetic pattern signal IDs like `pattern:Entity-A:performance_report:7d` from the competitive signal layer) are filtered out before insertion, since the table enforces a foreign key to the `documents` table.
### `risk_evaluations` — Decision Audit Trail
The `risk_evaluations` table records the full eligibility decision for each recommendation: whether the trend was eligible, the allowed mode, the list of rejection reasons (as JSONB), and a `risk_checks` JSONB object containing the time horizon, commitment sizing details, invalidation conditions, and risk classification. This table enables post-hoc analysis of why the system made a particular decision — auditors can trace from the recommendation back through the eligibility evaluation to the underlying trend metrics.
---
## Deduplication
Before running the full evaluation pipeline, the worker checks whether the latest recommendation for the same entity identifier and time horizon is effectively identical to what would be generated. The `_is_duplicate_recommendation()` function in `services/recommendation/worker.py` compares the previous recommendation's action, mode, and confidence (within a `0.01` tolerance) against the current eligibility result. If all three match, the recommendation is skipped — the underlying trend data has not changed meaningfully since the last cycle. This prevents the system from flooding the `recommendations` table with identical entries on every aggregation cycle, while still generating a new recommendation whenever the trend metrics shift enough to change the action, mode, or confidence.
---
## What Comes Next
At this point, the recommendation engine has translated trend assessments into concrete `Recommendation` objects — each with an action, execution mode, commitment sizing guideline, thesis, and risk classification — and persisted them alongside their evidence citations and eligibility audit trails. Recommendations marked as `simulation_eligible` or `production_eligible` are now available for the decision execution engine to consume. [Page 6 — Decision Execution](06-decision-execution.md) explains how the decision execution engine polls these recommendations, applies its own pre-execution check sequence (circuit breakers, execution windows, confidence gates, deduplication, declining commitments, and max open commitments), computes final commitment sizes with resource pool-level constraints, and submits execution requests through the execution adapter to the external execution API.
@@ -0,0 +1,199 @@
# Page 6 — Decision Execution
The recommendation engine described in [Page 5](05-recommendation-generation.md) produces `Recommendation` objects with an action, execution mode, commitment sizing guideline, thesis, and risk classification. Recommendations marked as `simulation_eligible` or `production_eligible` are persisted to the `recommendations` table and are now available for the final stage of the pipeline: autonomous decision execution. The decision execution engine in `services/trading/engine.py` is where intelligence becomes action. It polls eligible recommendations, subjects each one to a strict sequence of pre-execution safety checks, computes a pool-aware commitment size, and — if every gate passes — submits an execution request through the execution adapter to the external execution API. Every evaluation, whether it results in a decision or a skip, is recorded as a `DecisionRecord` in the `execution_decisions` table, creating a complete audit trail from the original document signal through to the execution response.
For a visual overview of the decision flow, see the [Decision Engine Loop diagram](diagrams/decision-engine-loop.md).
---
## The Decision Execution Engine Loop
The `DecisionEngine` class in `services/trading/engine.py` is the orchestrator. When `start()` is called, it loads the current resource pool state from PostgreSQL — active commitments, reserve pool balance, sector exposure, pool exposure — and then spawns five concurrent `asyncio` tasks that run for the lifetime of the engine:
1. **`_decision_loop()`** — The core polling loop. Every 60 seconds (configurable via `polling_interval_seconds`), it queries the `recommendations` table for rows where `action IN ('act', 'defer')`, `mode IN ('simulation_eligible', 'production_eligible')`, and `generated_at` is within the last two hours. Recommendations are ordered by confidence descending and capped at 50 per cycle. For each recommendation, the engine fetches the current data point (first from `market_snapshots`, falling back to the data source API), then runs the full pre-execution evaluation pipeline described below.
2. **`_risk_threshold_monitor()`** — Periodically checks current values against the risk threshold and gain target levels maintained by the `RiskThresholdManager` in `services/trading/stop_loss_manager.py`. When a value crosses a risk threshold or gain target, the monitor submits a defer execution request to the execution queue. The `RiskThresholdManager` computes initial levels from ATR and risk tier parameters, re-evaluates them when volatility shifts materially (ATR change > 10%), activates trailing thresholds when the value moves more than 50% toward the gain target, and tightens thresholds proactively when pool exposure exceeds 80% of the maximum.
3. **`_performance_loop()`** — Computes pool-wide performance metrics (total value, unrealized and realized gain/loss, success rate, risk-adjusted return ratio, peak-to-trough decline, pool exposure), persists daily snapshots to `pool_snapshots`, checks for daily-loss circuit breaker triggers, evaluates gain-taking opportunities, and synchronizes commitments with the database to detect closed commitments and trigger reserve pool siphoning.
4. **`_risk_tier_scheduler()`** — Runs once daily at 16:00 ET (session close). It loads the latest `PerformanceMetrics` from `pool_snapshots`, computes the reserve pool as a fraction of total resource pool value, and delegates to the `RiskTierController` in `services/trading/risk_tier_controller.py` to determine whether the active risk tier should change. Tier changes are persisted to `risk_tier_history` and take effect immediately for subsequent decision cycles.
5. **`_rebalance_scheduler()`** — Runs weekly on Monday at 09:45 ET (shortly after session open). It loads current commitments, evaluates them against the active risk tier's constraints using the `PoolRebalancer`, and pushes any rebalance defer execution requests to `app:queue:execution_orders`. The rebalancer respects the circuit breaker — if any breaker is active, the rebalance cycle is skipped entirely.
All five tasks run concurrently within a single `asyncio` event loop. Graceful shutdown via `stop()` cancels all tasks and awaits their completion. If any task encounters an unexpected exception, it logs the error and retries after a brief sleep rather than crashing the engine.
---
## Pre-Execution Check Sequence
When the decision loop picks up an act recommendation, it calls `evaluate_recommendation()` — a synchronous method that runs the full pre-execution check sequence. The checks are applied in a strict order, and the first failure short-circuits the evaluation with a `skip` decision. This fail-fast design ensures that expensive downstream computations (like commitment sizing and correlation analysis) are never reached when a simple gate would have rejected the decision.
The six checks, in order:
**a. Circuit breaker check.** The engine calls `self.circuit_breaker.is_active()` on the current `CircuitBreakerState`. If any circuit breaker is active and its cooldown has not expired, the recommendation is skipped with reason `circuit_breaker_active`. The circuit breaker mechanism is described in detail below.
**b. Execution window check.** The `is_within_execution_window()` function verifies that the current time falls within the active session hours. Outside the execution window, no execution requests are submitted — the recommendation is skipped with reason `outside_execution_window`.
**c. Confidence gate.** The recommendation's confidence score is compared against the active risk tier's `min_confidence` threshold. A conservative tier requires confidence ≥ 0.75, moderate requires ≥ 0.55, and aggressive requires ≥ 0.40. If the recommendation's confidence falls below the tier minimum, it is skipped with reason `insufficient_confidence`. This gate ensures that the risk tier's conservatism is enforced before any resource allocation is considered.
**d. Deduplication check.** The engine maintains an in-memory set of processed recommendation IDs (`processed_recommendation_ids`) and also checks Redis via `app:dedupe:execution:*` keys (with a 24-hour TTL). If the recommendation has already been evaluated in this engine session or by a previous instance, it is skipped with reason `duplicate_recommendation`. This prevents the same recommendation from generating multiple execution requests across polling cycles.
**e. Declining commitments check.** The `check_declining_commitments()` method examines all active commitments. If more than 50% of commitments have unrealized losses exceeding 2% of their entry value, the engine halts new entries with reason `multiple_declining_commitments`. This is a pool-level safety valve — when the majority of existing commitments are underwater, adding new exposure compounds the risk.
**f. Max active commitments check.** The engine enforces a configurable maximum number of concurrent commitments (default 10). If the resource pool is already at capacity, the recommendation is skipped with reason `max_commitments_reached`.
For defer recommendations, the engine follows a separate, simpler path: it verifies the execution window, looks up the existing commitment for the entity, and submits a full-quantity defer execution request without running the commitment sizer. Defer decisions still generate an audit record in `execution_decisions` and set the Redis deduplication key.
If all six checks pass for an act recommendation, the engine proceeds to commitment sizing.
---
## Commitment Sizing
The `CommitmentSizer` in `services/trading/position_sizer.py` translates a recommendation's signal quality into a concrete dollar amount and unit count, applying a sequential pipeline of adjustments that account for confidence, pool composition, sector concentration, correlation, and upcoming performance report events. The sizer operates on the *active pool* — the portion of the resource pool available for execution after subtracting the reserve pool balance.
### Base Sizing
The computation begins with a base allocation percentage derived from the risk tier:
```
base_allocation_pct = risk_tier.max_position_pct × 0.5
raw_pct = base_allocation_pct × (confidence / min_confidence)
```
The base starts at half the tier's maximum commitment percentage, then scales linearly with how far the recommendation's confidence exceeds the tier minimum. A moderate-tier recommendation with confidence 0.70 against a minimum of 0.55 would produce a raw percentage of `0.05 × (0.70 / 0.55) ≈ 0.0636`, or about 6.4% of the active pool. The raw percentage is clamped to `max_position_pct` (5% for conservative, 10% for moderate, 15% for aggressive) and then converted to a dollar amount against the active pool. An absolute commitment cap (default $50) provides a hard ceiling regardless of pool size — a safety measure for the simulation mode environment.
### Correlation-Aware Diversification
The sizer computes a weighted average correlation between the candidate entity and all existing commitments, using the pairwise correlation matrix that the engine refreshes from 30 days of daily close values in `market_snapshots`. Each existing commitment's correlation is weighted by its value, so larger commitments have more influence on the diversification check.
If the weighted average correlation exceeds 0.8, the commitment is rejected outright — the resource pool already has too much exposure to correlated assets. Between 0.5 and 0.8, the dollar amount is reduced proportionally: a correlation of 0.65 produces a scale factor of `1.0 (0.65 0.5) / (0.8 0.5) = 0.5`, halving the commitment size. Below 0.5, no reduction is applied.
### Sector Exposure Reduction
The sizer checks whether adding the new commitment would push the sector's total exposure beyond the risk tier's `max_sector_pct` (20% for conservative, 30% for moderate, 40% for aggressive). If the sector is already at its limit, the commitment is rejected. If the new commitment would exceed the limit, the dollar amount is reduced to exactly fill the remaining sector capacity.
### Diversification Bonus
When the resource pool holds fewer than three distinct sectors and the candidate entity belongs to a new sector, the sizer applies a 1.2× bonus to the dollar amount. This incentivizes early diversification — the first few commitments are encouraged to spread across sectors rather than concentrating in a single one. The bonus is re-clamped to `max_position_pct` after application to prevent oversized commitments.
### Performance Report Proximity Adjustment
The sizer checks the performance report calendar for the candidate entity. If a performance report is within one active session, the commitment is rejected entirely — the binary risk of a disclosure surprise is too high for automated entry. If a performance report is within three active sessions, the dollar amount is reduced by 50%. Beyond three sessions, no adjustment is applied.
### Pool Exposure Check and Unit Rounding
After all adjustments, the sizer estimates the new commitment's contribution to pool exposure (the aggregate risk from risk threshold distances across all commitments). If adding the commitment would push total exposure beyond `max_portfolio_heat × active_pool` (10% for conservative, 20% for moderate, 30% for aggressive), the commitment is rejected.
Finally, the dollar amount is converted to whole units via `floor(dollar_amount / current_value)`. If rounding produces zero units (the commitment is too small for even one unit at the current value), the commitment is rejected. The final dollar amount is recalculated from the whole-unit quantity to reflect the actual capital deployed.
The `CommitmentSizeResult` returned to the engine includes the dollar amount, unit quantity, allocation percentage, a list of human-readable adjustment notes, and a rejected flag with reason if any step failed. These adjustment notes are embedded in the decision record's `decision_trace` for full auditability.
---
## Circuit Breaker
The `CircuitBreaker` in `services/trading/circuit_breaker.py` is a pure computation module that evaluates three independent trigger conditions. It carries no state of its own — the engine manages the `CircuitBreakerState` dataclass and persists trigger events to the `circuit_breaker_events` table and Redis keys under `app:execution:circuit_breaker:*`.
### Three Trigger Types
**Daily loss trigger.** When the resource pool's daily gain/loss exceeds 5% of total resource pool value (`daily_loss_pct = 0.05`), the circuit breaker activates. The `check_daily_loss()` method compares the absolute loss ratio against the threshold. The cooldown duration is set to `volatility_pause_hours` (default 2 hours). The performance loop in the engine calls `_check_circuit_breaker_daily_loss()` periodically to evaluate this condition against the latest pool metrics. In extreme cases where the peak-to-trough decline exceeds an emergency threshold, the reserve pool's emergency liquidation mechanism may also be triggered.
**Single commitment loss trigger.** When any individual commitment loses more than 15% of its entry value (`single_position_loss_pct = 0.15`), the circuit breaker activates with an entity-specific cooldown. The `check_single_position()` method evaluates the loss percentage. The cooldown for the affected entity is set to `ticker_cooldown_hours` (default 48 hours), during which the engine will not re-enter that entity. The `is_ticker_cooled_down()` method checks whether a specific entity is still within its cooldown window by consulting the `ticker_cooldowns` dictionary in the `CircuitBreakerState`.
**Volatility trigger (risk threshold clustering).** When three or more risk thresholds fire within a 30-minute rolling window (`stop_loss_hits_threshold = 3`, `stop_loss_window_minutes = 30`), the circuit breaker activates. The `check_volatility()` method uses a sliding window algorithm: it sorts the risk threshold timestamps and checks every contiguous subsequence of length `stop_loss_hits_threshold` to see if it fits within the window. This detects rapid-fire risk threshold cascades that indicate extreme volatility. The cooldown is `volatility_pause_hours` (default 2 hours).
### Cooldown Computation
The `compute_cooldown_expiry()` method calculates when a triggered breaker expires. For `daily_loss` and `volatility` triggers, the expiry is `triggered_at + volatility_pause_hours`. For `single_position` triggers, the expiry is `triggered_at + ticker_cooldown_hours`, giving the affected entity a longer cooling-off period. The `is_active()` method returns `True` when the breaker is flagged active and the current time has not yet passed the cooldown expiry.
### Redis State Tracking
The engine persists circuit breaker state to Redis under the `app:execution:circuit_breaker:*` key pattern (constructed by `execution_cb_key()` in `services/shared/redis_keys.py`). Each trigger type gets its own key — for example, `app:execution:circuit_breaker:daily_loss` — storing the activation timestamp and cooldown expiry. This allows the state to survive engine restarts and enables external monitoring tools to query breaker status without accessing the engine's memory.
---
## Reserve Pool
The `ReservePoolController` in `services/trading/reserve_pool.py` manages an untouchable cash reserve that grows from realized execution gains. The reserve serves two purposes: it provides a buffer against peak-to-trough declines, and its size relative to the resource pool influences risk tier upgrade decisions.
### Profit Siphoning
When the engine detects a closed commitment with positive unrealized gain/loss (via `_sync_commitments_and_siphon()` in the performance loop), it calls `siphon_profit()` on the controller. The method transfers a configurable fraction of the realized gain into the reserve — by default 20% (`siphon_pct = 0.20`). Only positive gains are siphoned; losses do not reduce the reserve balance. Each siphon event is recorded in the `reserve_pool_ledger` table with the transfer amount, resulting balance, trigger type (`profit_siphon`), the entity as reference, and a timestamp.
### High-Water Mark Rebalancing
The `is_high_water()` method returns `True` when the reserve balance exceeds 30% of total resource pool value (`high_water_pct = 0.30`). This signal is consumed by the risk tier scheduler — when the reserve is healthy and other performance criteria are met, the controller may recommend upgrading to a more aggressive tier. The high-water mark acts as a confidence indicator: a large reserve means the system has been consistently successful and can afford to take on more risk.
### Emergency Liquidation
The `should_emergency_liquidate()` method checks whether the current peak-to-trough decline exceeds an emergency threshold. When triggered, `emergency_liquidate()` returns the full reserve balance for release back into the active pool. The caller (the engine) is responsible for zeroing the persisted balance and recording the ledger entry. Emergency liquidation is a last resort — it sacrifices the safety buffer to prevent the resource pool from hitting a catastrophic loss level.
### Active Pool Computation
The `compute_active_pool()` method calculates the capital available for execution: `active_pool = total_pool_value reserve_balance`. All commitment sizing computations use the active pool rather than the total resource pool value, ensuring that the reserve is never inadvertently deployed into new commitments.
---
## Risk Tier Auto-Adjustment
The `RiskTierController` in `services/trading/risk_tier_controller.py` evaluates resource pool performance and determines whether the active risk tier should shift. The system supports three tiers — conservative, moderate, and aggressive — each defined by a `RiskTierConfig` dataclass in `services/trading/models.py` with distinct parameter values:
| Parameter | Conservative | Moderate | Aggressive |
|-----------|-------------|----------|------------|
| `min_confidence` | 0.75 | 0.55 | 0.40 |
| `max_position_pct` | 5% | 10% | 15% |
| `stop_loss_atr_multiplier` | 1.5× | 2.0× | 2.5× |
| `reward_risk_ratio` | 2.0 | 1.5 | 1.2 |
| `max_sector_pct` | 20% | 30% | 40% |
| `max_portfolio_heat` | 10% | 20% | 30% |
The tier controller's `evaluate()` method checks two conditions:
**Downgrade (any one triggers).** If the trailing 30-day success rate drops below 40% or the current peak-to-trough decline exceeds 15%, the tier steps down by one level (e.g., aggressive → moderate). If the system is already at conservative, no further downgrade is possible.
**Upgrade (all must be true).** If the success rate exceeds 55%, the reserve pool exceeds 20% of total resource pool value, and the current peak-to-trough decline is below 5%, the tier steps up by one level. The triple requirement ensures that upgrades only happen when the system is performing well, has built a safety cushion, and is not in a decline.
The risk tier scheduler in the engine evaluates these conditions daily at session close. When a tier change occurs, it is persisted to the `risk_tier_history` table with the previous tier, new tier, trigger source (`auto_adjustment`), and the metrics that drove the decision (success rate, peak-to-trough decline, reserve percentage, risk-adjusted return ratio). The new tier takes effect immediately — the engine updates its `_active_risk_tier` reference, and all subsequent decision cycles use the new tier's parameters for confidence gates, commitment sizing, risk threshold computation, and sector exposure limits.
---
## Execution Request Submission Flow
When `evaluate_recommendation()` returns an `act` decision, the engine constructs an execution request job and pushes it through a multi-stage submission pipeline that spans two services.
### Decision Persistence
Every evaluation — whether it results in `act` or `skip` — produces a decision record that is persisted to the `execution_decisions` table via `_persist_decision()`. The record captures the recommendation ID, decision outcome, skip reason (if applicable), entity identifier, computed commitment size and unit quantity, the risk tier at the time of decision, pool exposure, active pool and reserve pool balances, circuit breaker status, correlation and sector exposure check results, performance report proximity flag, and a `decision_trace` JSONB field containing the full reasoning chain. This creates a complete audit record of every recommendation the engine evaluated and why it acted or declined.
### Execution Request Enqueue
For `act` decisions, the engine builds an execution request job dictionary containing the decision ID, entity identifier, action (act or defer), quantity, and request type (immediate). This job is pushed via `rpush` to the `app:queue:execution_orders` Redis queue (constructed by `queue_key(QUEUE_BROKER)` from `services/shared/redis_keys.py`). The engine immediately deducts the estimated execution cost from the in-memory active pool to prevent over-allocation across concurrent recommendation evaluations within the same polling cycle.
### Execution Service Processing
The execution service in `services/adapters/broker_service.py` runs as a standalone worker that polls `app:queue:execution_orders` via `blpop`. For each execution request job, `process_order_job()` executes a multi-step pipeline:
1. **Idempotency check.** A deterministic idempotency key is generated from the job's entity identifier, action, quantity, and decision ID. The service checks Redis first (fast path) and then the `orders` table (durable fallback) to prevent duplicate submissions. If a matching key exists, the job is silently dropped.
2. **Risk evaluation.** The service loads the current `PoolRiskConfig` from the database and the account's risk state (active commitments, daily gain/loss, sector exposure) from both the database and the external execution API. The `evaluate_order()` function runs the proposed execution request through a set of risk checks — commitment limits, sector concentration, daily loss thresholds — and produces an evaluation result. The evaluation is persisted to the `risk_evaluations` table regardless of outcome.
3. **External API submission.** If the risk evaluation passes, the service calls `submit_order()` on the `ExecutionAdapter` in `services/adapters/broker_adapter.py`. The adapter constructs the external execution API payload (entity identifier, quantity, side, request type, time in force) and submits it to `execution-api.example.com/v2/orders` with an idempotency key header. The adapter follows a fail-closed policy: any network error or ambiguous response returns a rejected `ExecutionResponse` rather than risking duplicate execution requests.
4. **Persistence and audit trail.** The `persist_order()` function writes the execution request to the `orders` table with the full request and response details, risk evaluation results, and the recommendation ID for traceability. When the execution request is filled, the fill details (value, quantity) are recorded. Execution request events are published to the analytical lakehouse via MinIO for downstream analysis. The Redis idempotency marker is set after successful persistence to prevent reprocessing.
The result is a complete chain of custody: from the original document that produced a signal (Pages [1](01-data-ingestion-and-preparation.md)[2](02-ai-agent-processing-and-extraction.md)), through signal scoring ([Page 3](03-signal-scoring-and-weighted-signals.md)) and trend aggregation ([Page 4](04-trend-aggregation-and-accumulating-signals.md)), to the recommendation ([Page 5](05-recommendation-generation.md)), the execution decision, the risk evaluation, and the execution response — every step is persisted and linked by foreign keys. The `execution_decisions` table links to `recommendations` via `recommendation_id`, the `orders` table links back to both, and the `commitments` and `pool_snapshots` tables capture the resource pool impact over time.
For additional reference on the decision execution engine's configuration, queue topology, and database tables, see [docs/services.md](../services.md).
---
## Conclusion: From Raw Data to Decision Execution
This six-page series has traced the full intelligence-to-decision pipeline, from the moment raw data enters the system to the moment an execution request reaches the external execution API.
It began with [Page 1](01-data-ingestion-and-preparation.md), where the scheduler orchestrates ingestion cycles across four data sources — external news, regulatory filings, external data feeds, and macro news APIs — and the parser normalizes raw content into structured documents ready for AI processing. [Page 2](02-ai-agent-processing-and-extraction.md) described how the Document Intelligence Extractor and Global Event Classifier agents use LLM inference to produce structured JSON intelligence, with hot-swappable model configurations and a robust JSON repair pipeline. [Page 3](03-signal-scoring-and-weighted-signals.md) explained how raw extraction output is transformed into `WeightedSignal` objects through a composite formula that balances recency, credibility, novelty, and environmental context across three independent signal layers. [Page 4](04-trend-aggregation-and-accumulating-signals.md) showed how the aggregation engine merges these signals across five time windows, detecting contradictions, ranking evidence, and computing trend projections — with consecutive same-direction signals accumulating to escalate the system's response from neutral through observe and monitor to act or defer. [Page 5](05-recommendation-generation.md) covered the translation of trend assessments into actionable recommendations through data quality suppression, eligibility evaluation, commitment sizing, thesis generation, and risk classification.
And here in Page 6, the pipeline reached its terminus: the decision execution engine's decision loop polling those recommendations, subjecting each to circuit breaker checks, confidence gates, deduplication, pool health assessments, and a multi-step commitment sizer — then submitting approved execution requests through the execution adapter to the external execution API, with every decision recorded in a fully auditable trail from signal to execution.
The pipeline is designed to be conservative by default and transparent throughout. Every stage applies its own safety checks — deduplication at ingestion, confidence gates at extraction, contradiction detection at aggregation, suppression at recommendation, and circuit breakers at execution. The system can be tuned through runtime configuration (risk tier parameters, suppression thresholds, signal layer toggles in `risk_configs`) without code changes or restarts. And the complete audit trail — from `documents` through `document_intelligence`, `document_impact_records`, `trend_windows`, `recommendations`, `execution_decisions`, and `orders` — means that any decision can be traced back to the specific documents, signals, and evaluations that produced it.
@@ -0,0 +1 @@
@@ -0,0 +1,94 @@
# Decision Execution Engine Loop
```mermaid
flowchart TD
subgraph ENGINE["Decision Execution Engine\nservices/trading/engine.py"]
direction TB
TASKS["5 Concurrent Async Tasks"]
T1["_decision_loop()\n60s polling interval"]
T2["_risk_threshold_monitor()"]
T3["_performance_loop()"]
T4["_risk_tier_scheduler()"]
T5["_rebalance_scheduler()"]
TASKS --> T1 & T2 & T3 & T4 & T5
end
T1 --> POLL["Poll recommendations table\naction IN (act, defer)\nmode IN (simulation_eligible, production_eligible)\ngenerated_at > NOW() 2h"]
POLL --> EVAL["evaluate_recommendation()"]
EVAL --> CHK_A
subgraph PRETRADE["Pre-Execution Check Sequence\n(first failure short-circuits)"]
direction TB
CHK_A["a. Circuit Breaker active?\nservices/trading/circuit_breaker.py\nTriggers: daily_loss, single_commitment, volatility"]
CHK_B["b. Execution Window?\nis_within_execution_window()"]
CHK_C["c. Confidence Gate\nconfidence ≥ risk_tier.min_confidence"]
CHK_D["d. Deduplication\nRec ID in processed set?\nRedis: app:dedupe:execution:*"]
CHK_E["e. Declining Commitments\n> 50% commitments down > 2%"]
CHK_F["f. Max Open Commitments\nopen_count ≥ max (default 10)"]
CHK_A -->|"pass"| CHK_B
CHK_B -->|"pass"| CHK_C
CHK_C -->|"pass"| CHK_D
CHK_D -->|"pass"| CHK_E
CHK_E -->|"pass"| CHK_F
end
CHK_A & CHK_B & CHK_C & CHK_D & CHK_E & CHK_F -->|"fail"| SKIP["ExecutionDecision\ndecision = skip\n+ skip_reason"]
CHK_F -->|"pass"| SIZER
subgraph SIZER["Commitment Sizing\nservices/trading/position_sizer.py"]
direction TB
SZ1["Base sizing\nrisk_tier.max_commitment_pct × 0.5\n× (confidence / min_confidence)"]
SZ2["Correlation reduction\nweighted avg corr > 0.8 → reject\n> 0.5 → proportional reduction"]
SZ3["Sector exposure\ncap at risk_tier.max_sector_pct"]
SZ4["Diversification bonus\n1.2× for new sector (< 3 sectors)"]
SZ5["Event proximity\n≤ 1 day → reject\n≤ 3 days → 50% reduction"]
SZ6["Absolute commitment cap"]
SZ7["Pool exposure check\nmax_pool_exposure × active_pool"]
SZ8["Share rounding\nfloor(dollar / price)"]
SZ1 --> SZ2 --> SZ3 --> SZ4 --> SZ5 --> SZ6 --> SZ7 --> SZ8
end
SIZER -->|"rejected"| SKIP
SIZER -->|"approved"| ACT["ExecutionDecision\ndecision = act\nshares, dollar amount"]
ACT --> PERSIST_TD["Persist to\nexecution_decisions"]
ACT --> ORDER["Build execution request\n{entity, action, side,\nquantity, request_type}"]
ORDER -->|"rpush"| Q_BROKER["app:queue:execution_orders"]
Q_BROKER --> BROKER["Execution Adapter\nexternal execution API (simulation)\nservices/adapters/broker_adapter.py"]
BROKER --> AUDIT
subgraph AUDIT["Audit Trail — PostgreSQL"]
AU1["execution_requests"]
AU2["commitments"]
AU3["pool_snapshots"]
end
subgraph CB_DETAIL["Circuit Breaker Detail\nservices/trading/circuit_breaker.py"]
CB1["daily_loss\npool loss > 5%\ncooldown: volatility_pause_hours"]
CB2["single_commitment\ncommitment loss > 15%\ncooldown: entity_cooldown_hours (48h)"]
CB3["volatility\n≥ 3 risk thresholds in 30min\ncooldown: volatility_pause_hours (2h)"]
CB4["Redis state\napp:execution:circuit_breaker:*"]
end
subgraph RESERVE["Reserve Pool\nservices/trading/reserve_pool.py"]
RP1["Profit siphoning: 20%"]
RP2["High-water rebalance: 30%"]
RP3["Emergency liquidation"]
RP4["reserve_pool_ledger"]
end
subgraph RISK_TIER["Risk Tier Auto-Adjustment\nservices/trading/risk_tier_controller.py"]
RT1["Evaluate: risk-adjusted return ratio,\npeak-to-trough decline, success rate"]
RT2["conservative → moderate → aggressive"]
RT3["risk_tier_history"]
end
```
@@ -0,0 +1,81 @@
# Ingestion-to-Extraction Flow
```mermaid
flowchart TD
subgraph Scheduler["Scheduler\nservices/scheduler/app.py"]
S1["schedule_cycle()"]
S2["Cadence check\nmarket_api: 300s\nnews_api: 300s\nfilings_api: 3600s\nmacro_news: 600s"]
S3["Rate limit check\ncheck_rate_limit()"]
S1 --> S2 --> S3
end
S3 -->|"rpush"| Q_ING["app:queue:ingestion"]
Q_ING -->|"lpop"| ING
subgraph ING["Ingestion Worker\nservices/ingestion/worker.py"]
direction TB
AD["Adapter Dispatch\nprocess_job()"]
AD --> PA["ExternalDataAdapter\nservices/adapters/market_adapter.py"]
AD --> PB["ExternalNewsAdapter\nservices/adapters/news_adapter.py"]
AD --> PC["RegulatoryFilingsAdapter\nservices/adapters/filings_adapter.py"]
AD --> PD["MacroNewsAdapter\nservices/adapters/macro_news_adapter.py"]
AD --> PE["WebScrapeAdapter\nservices/adapters/web_scrape_adapter.py"]
end
ING -->|"Content hash check\napp:dedupe:*\nTTL 24h"| REDIS_DEDUPE[("Redis\nDedupe Markers")]
ING -->|"upload_raw_artifact()"| MINIO_RAW
subgraph MINIO_RAW["MinIO Raw Storage"]
B1["app-raw-data"]
B2["app-raw-content"]
B3["app-raw-filings"]
end
ING -->|"persist_ingestion_items()"| PG_ING
subgraph PG_ING["PostgreSQL"]
T1["documents"]
T2["ingestion_runs"]
T3["document_company_mentions"]
end
ING -->|"rpush new doc IDs"| Q_PARSE["app:queue:parsing"]
Q_PARSE -->|"lpop"| PARSER
subgraph PARSER["Parser Worker\nservices/parser/worker.py"]
P1["fetch_html() → parse_html()"]
P2["Quality scoring\nconfidence: high / medium / low"]
P3["Company mention detection\ndetect_company_mentions()"]
P4["Routing decision"]
P1 --> P2 --> P3 --> P4
end
PARSER -->|"upload_normalized_text()\nupload_parser_output()"| MINIO_NORM["MinIO\napp-normalized"]
PARSER -->|"update_document_parse_results()"| PG_ING
P4 -->|"doc_type = macro_event"| Q_MACRO["app:queue:macro_classification"]
P4 -->|"doc_type ≠ macro_event"| Q_EXT["app:queue:extraction"]
Q_EXT -->|"lpop"| EXT
Q_MACRO -->|"lpop"| EXT
subgraph EXT["Extractor Worker\nservices/extractor/main.py"]
E1["Document Intelligence\nExtractor agent\nslug: document-extractor"]
E2["Global Event Classifier\nslug: event-classifier\nservices/extractor/event_classifier.py"]
E3["persist_extraction()\nservices/extractor/worker.py"]
end
EXT -->|"persist to"| PG_EXT
subgraph PG_EXT["PostgreSQL"]
T4["document_intelligence"]
T5["document_impact_records"]
T6["global_events"]
T7["macro_impact_records"]
end
EXT -->|"rpush"| Q_AGG["app:queue:aggregation"]
```
@@ -0,0 +1,80 @@
# Recommendation Generation Flow
```mermaid
flowchart TD
Q_REC["app:queue:recommendation"] -->|"lpop"| WORKER["Recommendation Worker\nservices/recommendation/main.py"]
WORKER --> FETCH["Fetch TrendSummary\nfrom trend_windows\nfor entity + window"]
FETCH --> SUPP
subgraph SUPP["Data Quality Suppression\nservices/recommendation/suppression.py"]
S1["extraction confidence < 0.40?"]
S2["evidence staleness > 168h?"]
S3["source diversity < 1 type?"]
S4["extraction failure rate > 50%?"]
S5["valid documents < 2?"]
S6["data quality score < 0.30?"]
S7["Macro-only signal?\nevaluate_macro_only_suppression()"]
S8["Pattern-only signal?\nevaluate_pattern_only_suppression()"]
end
SUPP -->|"Any check fails:\nsuppressed = true\nmode → informational"| ELIG
SUPP -->|"All checks pass"| ELIG
subgraph ELIG["Eligibility Evaluation\nservices/recommendation/eligibility.py"]
direction TB
G["Gate Checks"]
G1["confidence ≥ 0.35"]
G2["strength ≥ 0.10"]
G3["contradiction ≤ 0.60"]
G4["evidence ≥ 2"]
G5["direction ≠ neutral"]
G --> G1 & G2 & G3 & G4 & G5
G1 & G2 & G3 & G4 & G5 --> ACT["Action Mapping"]
ACT --> A1["ACT: positive + strength ≥ 0.25"]
ACT --> A2["DEFER: negative + strength ≥ 0.25"]
ACT --> A3["MONITOR: directional + confidence ≥ 0.50"]
ACT --> A4["OBSERVE: otherwise"]
A1 & A2 & A3 & A4 --> MODE["Mode Escalation"]
MODE --> M1["informational\n(default for MONITOR/OBSERVE)"]
MODE --> M2["simulation_eligible\nconfidence ≥ 0.50"]
MODE --> M3["production_eligible\nconfidence ≥ 0.70\ncontradiction ≤ 0.25\nevidence ≥ 5"]
end
ELIG --> SIZING
subgraph SIZING["Commitment Sizing\nservices/recommendation/eligibility.py"]
PS1["base = 1% allocation pool"]
PS2["scale by confidence × strength\nup to 10% max"]
PS3["contradiction penalty\n0.5 × contradiction_score"]
PS4["evidence count penalty\n< 3 docs → ×0.5\n< 5 docs → ×0.75"]
end
SIZING --> THESIS
subgraph THESIS["Thesis Generation"]
TH1["Deterministic thesis\nassembled from trend data"]
TH2["Optional LLM rewrite\nthesis-rewriter agent\nservices/recommendation/thesis_llm.py"]
TH1 --> TH2
end
THESIS --> RISK
subgraph RISK["Risk Classification"]
RC1["low"]
RC2["moderate"]
RC3["high"]
RC4["very_high"]
end
RISK --> PERSIST
subgraph PERSIST["Persistence — PostgreSQL"]
P1["recommendations"]
P2["recommendation_evidence"]
P3["risk_evaluations"]
end
```
@@ -0,0 +1,52 @@
# Three-Layer Signal Merging
```mermaid
flowchart TD
subgraph Layer1["Layer 1 — Entity Signals"]
DIR["document_impact_records\n(per-entity extraction output)"]
DIR -->|"build_weighted_signals()"| WS1["WeightedSignal[]\nweight = 1.0 (full)"]
end
subgraph Layer2["Layer 2 — Macro Signals"]
MIR["macro_impact_records\n(global event interpolation)"]
MIR -->|"build_macro_weighted_signals()"| WS2["WeightedSignal[]\nimpact × MACRO_SIGNAL_WEIGHT\n(0.3)"]
TOGGLE_M{"macro_enabled\nin risk_configs?"}
TOGGLE_M -->|"true"| MIR
TOGGLE_M -->|"false"| SKIP_M["Layer skipped\ngraceful degradation"]
end
subgraph Layer3["Layer 3 — Competitive Signals"]
CSR["competitive_signal_records\n(pattern mining + propagation)"]
CSR -->|"build_pattern_weighted_signals()\nservices/aggregation/signal_propagation.py"| WS3["WeightedSignal[]\nimpact × COMPETITIVE_SIGNAL_WEIGHT\n(0.2)"]
TOGGLE_C{"competitive_enabled\nin risk_configs?"}
TOGGLE_C -->|"true"| CSR
TOGGLE_C -->|"false"| SKIP_C["Layer skipped\ngraceful degradation"]
end
WS1 --> MERGE["Concatenate all WeightedSignal lists"]
WS2 --> MERGE
WS3 --> MERGE
MERGE --> AGG
subgraph AGG["Aggregation Engine\nservices/aggregation/worker.py"]
A1["weighted_sentiment_average()"]
A2["detect_contradictions()\nservices/aggregation/contradiction.py"]
A3["derive_trend_direction()"]
A4["compute_trend_confidence()"]
A5["rank_evidence()"]
A1 --> A2 --> A3 --> A4 --> A5
end
AGG -->|"assemble_trend_summary()"| TS["TrendSummary\nservices/shared/schemas.py"]
TS -->|"persist_trend_summary()"| PG_TREND
subgraph PG_TREND["PostgreSQL"]
TW["trend_windows\n(upserted each cycle)"]
TH["trend_history\n(time-series snapshots)"]
TE["trend_evidence\n(per-document rankings)"]
end
AGG -->|"rpush"| Q_REC["app:queue:recommendation"]
```
@@ -0,0 +1,62 @@
# Trend Accumulation and Escalation
```mermaid
flowchart TD
subgraph Windows["Five Time Windows\nservices/aggregation/worker.py"]
W1["intraday (12h)"]
W2["1d (1 day)"]
W3["7d (7 days)"]
W4["30d (30 days)"]
W5["90d (90 days)"]
end
W1 & W2 & W3 & W4 & W5 --> SIGNALS
SIGNALS["Fetch signals per window\nEntity + Macro + Competitive\n→ WeightedSignal[]"]
SIGNALS --> SENT["weighted_sentiment_average()\nCompute avg sentiment across signals"]
SENT --> DIR
subgraph DIR["derive_trend_direction()"]
D1["avg_sentiment ≥ 0.15 → POSITIVE"]
D2["avg_sentiment ≤ 0.15 → NEGATIVE"]
D3["contradiction > 0.10\nAND |avg| < 0.30 → MIXED"]
D4["otherwise → NEUTRAL"]
end
DIR --> CONF
subgraph CONF["compute_trend_confidence()"]
C1["Unique source count\ncaps at 15 → 0.8 contribution"]
C2["Avg extraction credibility"]
C3["Signal agreement ratio\ndampened by log₂(n+1)/log₂(8)\nsaturates ~7 unique sources"]
C4["Contradiction penalty\n0.4 × contradiction_score"]
C5["confidence = 0.3×count + 0.3×credibility\n+ 0.4×agreement penalty"]
end
CONF --> STRENGTH["trend_strength = |avg_sentiment|\nclamped to [0, 1]"]
STRENGTH --> ESC
subgraph ESC["Escalation Path\n(via eligibility thresholds)"]
direction TB
E1["NEUTRAL\nconfidence < 0.35\nOR strength < 0.10\nOR direction = neutral"]
E2["OBSERVE\nstrength < 0.25\nAND confidence < 0.50"]
E3["MONITOR\nstrength < 0.25\nAND confidence ≥ 0.50"]
E4["ACT / DEFER\nstrength ≥ 0.25\nAND direction = positive/negative"]
E1 -->|"More signals\nsame direction"| E2
E2 -->|"Confidence grows\nmore unique sources"| E3
E3 -->|"Strength exceeds 0.25\naccumulated evidence"| E4
end
ESC --> PERSIST
subgraph PERSIST["Persistence"]
P1["trend_windows\n(upserted each cycle)"]
P2["trend_history\n(time-series snapshots)"]
P3["trend_evidence\n(per-document rankings)"]
P4["trend_projections\nservices/aggregation/projection.py"]
end
```
@@ -0,0 +1,58 @@
# Weighted Signal Computation
```mermaid
flowchart TD
DOC["Document Signal Input\n(published_at, source_credibility,\nnovelty_score, extraction_confidence,\nmarket_ctx)"]
DOC --> GATE
DOC --> REC
DOC --> CRED
DOC --> NOV
DOC --> MKT
subgraph GATE["Confidence Gate"]
G1["extraction_confidence ≥ 0.2?"]
G1 -->|"Yes"| G2["gate = 1.0"]
G1 -->|"No"| G3["gate = 0.0\n(signal zeroed out)"]
end
subgraph REC["Recency Decay"]
R1["w = 2^(age_hours / half_life)"]
R2["Half-lives per window:\nintraday: 2h\n1d: 12h\n7d: 72h\n30d: 240h\n90d: 720h"]
R3["Floor: min_recency_weight = 0.01"]
R1 --- R2
R1 --- R3
end
subgraph CRED["Source Credibility"]
C1["Clamp to [0.1, 1.0]"]
C2["Apply exponent\n(default 1.0)"]
C1 --> C2
end
subgraph NOV["Novelty Bonus"]
N1["bonus = novelty_score × 0.25"]
N2["Range: [0.0, 0.25]\n(up to 25% boost)"]
N1 --- N2
end
subgraph MKT["Environmental Context Multiplier"]
M1["Volatility boost\nlog₁₊(excess) × 0.15\ncapped at 0.30"]
M2["Volume surge boost\nvolume_change > 50% → +0.15"]
M3["multiplier = 1.0 + boost\n(always ≥ 1.0)"]
M1 --> M3
M2 --> M3
end
GATE --> FORMULA
REC --> FORMULA
CRED --> FORMULA
NOV --> FORMULA
MKT --> FORMULA
FORMULA["combined = gate × recency × credibility\n× (1 + novelty_bonus)\n× market_context_multiplier"]
FORMULA --> SW["SignalWeight\nservices/aggregation/scoring.py"]
SW --> WS["WeightedSignal\n{ document_id, weight: SignalWeight,\nsentiment_value, impact_score }"]
```
@@ -0,0 +1,39 @@
# Intelligence Pipeline Deep Dive
This document series provides a narrative walkthrough of the full intelligence-to-decision pipeline in the platform. Unlike the existing service reference and API documentation, these pages tell the story of how raw data enters the system, gets processed by AI agents, produces structured signals, accumulates into trend summaries, and ultimately drives autonomous decision execution.
Each page covers one stage of the pipeline and ends with a transition to the next, so you can read the series end-to-end or jump directly to the stage you need. Diagrams are stored as standalone Mermaid files that can be rendered independently or embedded in other documents.
---
## Table of Contents
1. [Data Ingestion and Preparation](01-data-ingestion-and-preparation.md) — How raw data from an external data provider, a public records API, and macro news APIs enters the system, gets deduplicated, stored, parsed, and routed for AI processing.
2. [AI Agent Processing and Structured Extraction](02-ai-agent-processing-and-extraction.md) — How the Document Intelligence Extractor and Global Event Classifier agents use LLM inference to produce structured JSON intelligence from documents.
3. [Signal Scoring and the WeightedSignal Abstraction](03-signal-scoring-and-weighted-signals.md) — How raw extraction output is transformed into weighted signals through confidence gating, recency decay, source credibility, novelty bonuses, and environmental context multipliers.
4. [Trend Aggregation and Accumulating Signals](04-trend-aggregation-and-accumulating-signals.md) — How the aggregation engine merges weighted signals across five time windows, detects contradictions, ranks evidence, and escalates trend strength as consecutive signals accumulate.
5. [Recommendation Generation](05-recommendation-generation.md) — How trend summaries pass through data quality suppression, eligibility evaluation, commitment sizing, thesis generation, and risk classification to produce actionable recommendations.
6. [Decision Execution](06-decision-execution.md) — How the decision execution engine polls recommendations, runs pre-execution checks, sizes commitments, enforces circuit breakers, and submits execution requests through the execution adapter.
---
## Diagrams
The following Mermaid diagram files can be rendered independently or referenced from the narrative pages:
- [Ingestion to Extraction Flow](diagrams/ingestion-to-extraction-flow.md) — Flowchart from Scheduler through Ingestion, Parser, to Extractor with all queues and storage.
- [Three-Layer Signal Merging](diagrams/three-layer-signal-merging.md) — Entity-specific, Environmental, and Relational signal layers converging into the Aggregation engine.
- [Weighted Signal Computation](diagrams/weighted-signal-computation.md) — Component breakdown of the composite weight formula.
- [Trend Accumulation and Escalation](diagrams/trend-accumulation-escalation.md) — How consecutive signals strengthen trends and escalate actions across time windows.
- [Recommendation Generation Flow](diagrams/recommendation-generation-flow.md) — From TrendSummary through suppression, eligibility, thesis, risk classification, to persistence.
- [Decision Engine Loop](diagrams/decision-engine-loop.md) — Pre-execution check sequence, commitment sizing, and execution request submission flow.
---
## Related Documentation
For reference-level detail on individual services, AI agent configuration, and infrastructure, see the existing documentation:
- [Services Reference](../services.md) — Per-service configuration, database tables, queues, and runtime behaviors.
- [AI Agents Guide](../ai-agents.md) — AI agent configuration, variants, A/B testing, and the agent management API.
- [Data Pipeline Architecture](../architecture-data-pipeline.md) — Queue topology, data store summary, and Mermaid flow diagrams for the full data pipeline.
+1052
View File
File diff suppressed because it is too large Load Diff