feat: comprehensive docs, unit tests, docker-compose app services
- Add scheduler and ingestion unit tests (test_scheduler_unit.py, test_ingestion_unit.py) - Add all 13 app services + dashboard to docker-compose.yml - Add full documentation suite: API reference, Helm reference, Docker deployment guide, 3 architecture diagrams (K8s, Docker Compose, data pipeline), AI agent guide, backup/restore guide, observability/metrics reference, per-service docs - Add intelligence pipeline deep-dive docs with Mermaid diagrams - Update README with documentation index and links - Add specs for comprehensive-quality-docs, intelligence-pipeline-deep-dive, sanitized-pipeline-docs
This commit is contained in:
@@ -0,0 +1,130 @@
|
||||
# Page 1 — Data Ingestion and Preparation
|
||||
|
||||
Every signal that Stonks Oracle eventually acts on begins its life as raw data pulled from an external source. Before any AI agent can extract structured intelligence, before any trend can accumulate, and before any trade can be placed, the platform must first discover new content, fetch it reliably, eliminate duplicates, store the raw artifacts for audit, and normalize the text into a form suitable for downstream processing. This page traces that journey from external API to parser output, covering the Scheduler, Ingestion Worker, deduplication layer, raw storage, and Parser in detail.
|
||||
|
||||
For a visual overview of the full flow described here, see the [Ingestion to Extraction Flow diagram](diagrams/ingestion-to-extraction-flow.md).
|
||||
|
||||
---
|
||||
|
||||
## Four Categories of Input Data
|
||||
|
||||
Stonks Oracle tracks 50 companies across 10 sectors, and it draws intelligence from four distinct categories of external data. Each category has its own adapter, its own API conventions, and its own scheduling cadence, but all of them feed into the same ingestion pipeline.
|
||||
|
||||
The first category is **company news**, sourced from the Polygon.io ticker news endpoint (`/v2/reference/news`). The `PolygonNewsAdapter` in `services/adapters/news_adapter.py` fetches articles linked to a specific ticker, returning structured results that include title, publisher, article URL, description, keywords, and publication timestamp. Each request can return up to 1,000 articles, though the default limit is 20 per fetch. The adapter tracks the most recent `published_utc` value and uses it on subsequent fetches to avoid re-retrieving articles the system has already seen.
|
||||
|
||||
The second category is **SEC filings**, sourced from the SEC EDGAR full-text search system (EFTS). The `SECEdgarAdapter` in `services/adapters/filings_adapter.py` queries the `/LATEST/search-index` endpoint for 8-K, 10-Q, 10-K, and other form types associated with a company's ticker or CIK number. Unlike the Polygon endpoints, EDGAR is a public API that requires no key — only a descriptive `User-Agent` header per the SEC's fair-access policy. The adapter deduplicates results by accession number (`adsh`), filters out non-primary documents like XML fragments and graphics, and constructs the SEC EDGAR filing index URL for each hit so downstream services can fetch the full document.
|
||||
|
||||
The third category is **market data**, also sourced from Polygon.io. The `PolygonMarketAdapter` in `services/adapters/market_adapter.py` supports multiple endpoints: previous-day aggregate bars (`/v2/aggs/ticker/{ticker}/prev`), range bars for custom date windows, intraday hourly bars, grouped daily bars that return data for all tickers in a single call (`/v2/aggs/grouped/locale/us/market/stocks/{date}`), and ticker detail lookups. Market data follows a different path than textual content — it does not pass through the Parser or Extractor, since the structured numeric data is already in a usable form.
|
||||
|
||||
The fourth category is **macro and geopolitical news**, fetched by the `MacroNewsAdapter` in `services/adapters/macro_news_adapter.py`. Unlike the other three categories, macro news is not company-specific. These sources have `source_type='macro_news'` in the `sources` database table and may have a `NULL` `company_id`. The adapter fetches from a configurable HTTP endpoint (typically the Polygon news API filtered for broad market topics) and returns articles that describe global events — trade policy shifts, central bank decisions, geopolitical conflicts — rather than company-specific developments. Macro news articles are eventually classified by the Global Event Classifier agent and routed through a separate queue, as described in [Page 2](02-ai-agent-processing-and-extraction.md).
|
||||
|
||||
All four adapter classes inherit from `BaseAdapter` defined in `services/adapters/base.py` and return an `AdapterResult` dataclass containing the raw payload bytes, a SHA-256 content hash, a list of parsed item dicts, HTTP metadata (status code, response time), and an error field that is `None` on success. This uniform interface allows the Ingestion Worker to handle all source types through a single dispatch mechanism.
|
||||
|
||||
---
|
||||
|
||||
## The Scheduler: Orchestrating Ingestion Cycles
|
||||
|
||||
The Scheduler (`services/scheduler/app.py`) is the heartbeat of the ingestion pipeline. It runs a continuous loop that ticks every 15 seconds (`SCHEDULER_TICK = 15`), and on each tick it evaluates which sources are due for their next fetch. The Scheduler does not fetch data itself — it enqueues jobs onto the `stonks:queue:ingestion` Redis list for the Ingestion Worker to process.
|
||||
|
||||
Each source type has a default polling cadence defined in the `DEFAULT_CADENCES` dictionary:
|
||||
|
||||
| Source Type | Default Cadence |
|
||||
|---------------|-----------------|
|
||||
| `market_api` | 300 seconds |
|
||||
| `news_api` | 300 seconds |
|
||||
| `filings_api` | 3,600 seconds |
|
||||
| `macro_news` | 600 seconds |
|
||||
| `web_scrape` | 1,800 seconds |
|
||||
| `broker` | 30 seconds |
|
||||
|
||||
Individual sources can override their cadence via the `polling_interval_seconds` field in their `config` JSONB column in the `sources` table. The `get_cadence_for_source()` function checks for this override first, falling back to the default if none is set, and enforces a minimum interval of 10 seconds.
|
||||
|
||||
The Scheduler determines whether a source is due by calling `is_source_due()`, which considers several conditions. If a source has never run before (no entry in the `ingestion_runs` table), it is immediately due. If the last run failed, the Scheduler respects an exponential backoff computed by `compute_backoff()`: the delay starts at 60 seconds (`DEFAULT_BACKOFF_BASE`) and doubles with each retry up to a maximum of 3,600 seconds (`MAX_BACKOFF`). If a source has failed 10 consecutive times (`MAX_RETRY_COUNT`), the Scheduler stops scheduling it entirely until an operator manually resets the retry state. If the last run is still marked as `running`, the source is skipped to prevent double-scheduling. Otherwise, the Scheduler checks whether enough time has elapsed since the last completed run based on the source's cadence.
|
||||
|
||||
Rate limiting adds another layer of protection. The `check_rate_limit()` function enforces two constraints. First, each source type has a per-type limit defined in `DEFAULT_RATE_LIMITS` — for example, `market_api` and `news_api` are each capped at 20 requests per minute, while `filings_api` and `macro_news` are capped at 10. Second, because `market_api` and `news_api` both use the same Polygon.io API key, a global Polygon rate limit of 45 requests per minute (`POLYGON_GLOBAL_RATE_LIMIT`) is enforced across both types combined. Rate limit state is tracked in Redis using keys of the form `stonks:ratelimit:{source_type}:{window}`, where the window is a minute-granularity timestamp. If a source type exceeds its limit, the Scheduler logs a warning and skips that source for the current tick.
|
||||
|
||||
The Scheduler handles three categories of sources in each cycle. First, it fetches all active company-specific sources (excluding `macro_news`) by joining the `sources` and `companies` tables. Second, it fetches active macro news sources separately, since these may not have a `company_id`. Third, it fetches global market sources — those with `source_type='market_api'` and `company_id IS NULL` — which represent endpoints like the grouped daily bars that return data for all tickers in a single API call. For intraday bar sources, the Scheduler expands a single global source into per-ticker jobs for every active company.
|
||||
|
||||
Each enqueued job payload includes the `source_id`, `company_id`, `ticker`, `legal_name`, `source_type`, `source_name`, `config`, `credibility_score`, a list of company `aliases` (fetched from the `company_aliases` table), and a `scheduled_at` timestamp. The job is pushed onto `stonks:queue:ingestion` via Redis `RPUSH`.
|
||||
|
||||
Beyond scheduling, the Scheduler also performs periodic maintenance. Every ~20 cycles (~5 minutes), it runs `recover_stale_documents()` to re-enqueue documents that have been stuck in `parsed` status for longer than 240 minutes — a safety net for cases where Redis loses queue entries due to pod restarts or OOM events. Every ~40 cycles (~10 minutes), it runs `retry_failed_extractions()` to give documents in `extraction_failed` status another chance, resetting them to `parsed` and deleting the failed `document_intelligence` row so the Extractor treats them as fresh. Every ~100 cycles (~25 minutes), it runs `cleanup_all_tables()` to enforce retention policies across tables like `competitive_signal_records` (30 days), `ingestion_runs` (14 days), and `trading_decisions` (90 days).
|
||||
|
||||
For more detail on the Scheduler's configuration and operational behavior, see the [Services Reference](../services.md).
|
||||
|
||||
---
|
||||
|
||||
## The Ingestion Worker: Adapter Dispatch and Persistence
|
||||
|
||||
The Ingestion Worker (`services/ingestion/worker.py`) is a long-running process that continuously pops jobs from the `stonks:queue:ingestion` Redis list and processes them. On startup, it initializes one instance of each adapter class and stores them in a dispatch dictionary keyed by `source_type`:
|
||||
|
||||
```
|
||||
adapters = {
|
||||
"market_api": PolygonMarketAdapter(...),
|
||||
"news_api": PolygonNewsAdapter(...),
|
||||
"filings_api": SECEdgarAdapter(),
|
||||
"web_scrape": WebScrapeAdapter(),
|
||||
"broker": AlpacaBrokerAdapter(...),
|
||||
"macro_news": MacroNewsAdapter(...),
|
||||
}
|
||||
```
|
||||
|
||||
When a job arrives, the `process_job()` function looks up the appropriate adapter by `source_type` and calls its `fetch()` method with the ticker and source config. Before fetching, it records a new row in the `ingestion_runs` table with status `running`. If the adapter returns an error, the worker calls `record_retrieval_failure()` to update the run status and increment the source's retry counter with exponential backoff timing.
|
||||
|
||||
On a successful fetch, the worker performs several steps in sequence. First, it uploads the raw payload to MinIO via `upload_raw_artifact()` in `services/shared/storage.py`. The target bucket is determined by the source type through the `SOURCE_BUCKET_MAP`: `market_api` payloads go to `stonks-raw-market`, `news_api` and `macro_news` payloads go to `stonks-raw-news`, and `filings_api` payloads go to `stonks-raw-filings`. Objects are stored under a path that encodes the source type, ticker, date hierarchy, and document ID — for example, `news_api/AAPL/2025/01/15/{run_id}/raw.json`.
|
||||
|
||||
---
|
||||
|
||||
## Content Deduplication via Redis
|
||||
|
||||
After storing the raw artifact, the Ingestion Worker checks for duplicate content. Deduplication operates at two levels.
|
||||
|
||||
At the payload level, the worker checks the overall `content_hash` (a SHA-256 digest of the raw API response) against Redis. The key pattern is `stonks:dedupe:{content_hash}` with a 24-hour TTL (86,400 seconds). If the hash is already present, the entire payload is skipped — the `ingestion_runs` row is marked as completed with `items_new=0`, and no downstream jobs are enqueued. If the hash is new, the worker sets the marker in Redis so future fetches of identical content are caught.
|
||||
|
||||
At the individual item level, for source types other than `market_api` and `broker`, the worker calls `dedupe_items()` from `services/shared/dedupe.py`. This function checks each item against a layered deduplication strategy. The fast path checks Redis for both content-hash markers (`stonks:dedupe:{hash}`) and canonical-URL markers (`stonks:dedupe:url:{url_hash}`), both with 24-hour TTLs. If the Redis check misses, the function falls back to PostgreSQL, querying the `documents` table by `content_hash` or `canonical_url` for durable cross-source matching. When a duplicate is found through the PostgreSQL fallback, the function warms the Redis cache so subsequent checks are fast.
|
||||
|
||||
Items identified as duplicates are not discarded entirely. If the duplicate document was originally ingested for a different company, the worker creates a cross-source mention link in the `document_company_mentions` table via `persist_document_company_mention()`. This ensures that a news article mentioning both Apple and Microsoft is linked to both companies even if it was first ingested through Apple's news source.
|
||||
|
||||
New (non-duplicate) items are persisted to PostgreSQL through `persist_ingestion_items()` in `services/shared/metadata.py`, which inserts rows into the `documents` table and records company mentions in `document_company_mentions`. Each new document ID is then pushed onto `stonks:queue:parsing` for the Parser to process. After persistence, the worker calls `mark_as_seen()` to set Redis dedupe markers for both the content hash and canonical URL of each new item, ensuring that the next fetch cycle's deduplication checks are fast.
|
||||
|
||||
On successful completion, the worker updates the `ingestion_runs` row with the final counts (`items_fetched`, `items_new`) and calls `reset_source_retry_state()` to clear any accumulated backoff from previous failures. For news-type sources (`news_api` and `macro_news`), the worker also updates the source's `config` JSONB column with the latest `published_utc` value, so the next fetch only retrieves newer articles.
|
||||
|
||||
---
|
||||
|
||||
## The Parser: Normalization, Quality Scoring, and Routing
|
||||
|
||||
Documents that pass through ingestion arrive on the `stonks:queue:parsing` Redis list as JSON payloads containing a `document_id`, `ticker`, and `source_type`. The Parser Worker (`services/parser/worker.py`) pops these jobs and transforms raw HTML or text into normalized, quality-scored documents ready for AI extraction.
|
||||
|
||||
The parsing pipeline begins with HTML fetching. If the document has a URL (looked up from the `documents` table if not present in the job payload), the worker calls `fetch_html()` to retrieve the page content. SEC EDGAR URLs receive a specialized `User-Agent` header to comply with the SEC's fair-access policy. The raw HTML is then passed to `parse_html()` in `services/parser/html_parser.py`, which runs a multi-stage extraction pipeline.
|
||||
|
||||
The HTML parser first strips non-content tags — `script`, `style`, `nav`, `footer`, `header`, `aside`, `iframe`, and others — and removes boilerplate containers identified by CSS class or ID patterns (sidebars, ad slots, newsletter signups, social share bars, and similar UI elements). It then searches for the article body using a priority list of semantic selectors (`article`, `[role='main']`, `.article-body`, `.post-content`, and others). If no semantic match is found, it falls back to text-density scoring across candidate `div`, `section`, and `td` elements, selecting the block with the highest composite score based on text density, link density, paragraph count, and word count. The extracted text undergoes further cleaning: regex-based removal of residual boilerplate phrases (copyright notices, "subscribe to our newsletter" prompts, "share this article" fragments), removal of short orphan lines that are likely UI fragments, detection and collapse of repeated template blocks, and whitespace normalization.
|
||||
|
||||
Metadata extraction pulls the document title (from `og:title` or `<title>`), author, publisher (from `og:site_name` or hostname), publication date (from `article:published_time` or JSON-LD `datePublished`), canonical URL, language, description, and keywords from the HTML head elements.
|
||||
|
||||
If the parsed body text is shorter than 500 characters, the worker attempts to enrich it by reading the raw API payload from MinIO and extracting the Polygon article description, keywords, and author fields for the matching article. This enrichment step ensures that even articles with minimal scrapeable HTML still have enough textual content for meaningful AI extraction.
|
||||
|
||||
Quality scoring is performed by `score_parse_quality()` in `services/parser/html_parser.py`, which evaluates six weighted signals to produce a composite score between 0 and 0.95:
|
||||
|
||||
| Signal | Weight | What It Measures |
|
||||
|--------------------|--------|-----------------------------------------------------------------|
|
||||
| `word_count` | 0.30 | Length of extracted text (thresholds at 20, 50, 150, 300 words) |
|
||||
| `body_found` | 0.20 | Whether a semantic article body element was located |
|
||||
| `diversity` | 0.15 | Vocabulary richness (unique words / total words) |
|
||||
| `sentence` | 0.15 | Presence of proper sentence structure (terminal punctuation) |
|
||||
| `paragraph` | 0.10 | Multi-paragraph structure (blocks separated by blank lines) |
|
||||
| `metadata` | 0.10 | Presence of title, author, publisher, and publication date |
|
||||
|
||||
The composite score maps to a confidence label: scores below 0.35 are labeled `low`, scores between 0.35 and 0.65 are `medium`, and scores 0.65 and above are `high`. Documents with `low` confidence are marked with status `low_quality` in the `documents` table and are not enqueued for extraction — they are effectively filtered out of the pipeline at this stage.
|
||||
|
||||
Company mention detection runs next. The worker fetches all known aliases from the `company_aliases` table (plus tickers and legal names from the `companies` table) and calls `detect_company_mentions()` in `services/parser/html_parser.py`. The matching strategy varies by alias length: one-to-two character aliases use case-sensitive word-boundary matching to avoid false positives (the letter "A" should not match every occurrence of the word "a"), three-to-four character aliases use case-insensitive word-boundary matching (standard ticker format), and aliases of five or more characters use case-insensitive substring matching (company names and brands). Confidence scores vary by alias type: ticker matches receive 0.9, legal name matches 0.85, general aliases 0.7, and brand matches 0.6. Multiple alias hits for the same company are deduplicated, keeping the highest-confidence match and summing match counts. Detected mentions are persisted to the `document_company_mentions` table.
|
||||
|
||||
The normalized text and a structured parser output JSON (containing all metadata, quality signals, warnings, outbound links, tags, and mentions) are uploaded to the `stonks-normalized` MinIO bucket. The `documents` row is updated with the normalized storage reference, parser output reference, quality score, and confidence level.
|
||||
|
||||
Finally, the Parser makes a routing decision. If the document's `document_type` is `macro_event`, it is pushed onto `stonks:queue:macro_classification` for the Global Event Classifier agent. All other documents are pushed onto `stonks:queue:extraction` for the Document Intelligence Extractor agent. Both queues feed into the Extractor service described in [Page 2](02-ai-agent-processing-and-extraction.md). The job payload includes the `document_id`, `ticker`, and the first 32,000 characters of the normalized text, giving the downstream agent immediate access to the content without needing to fetch it from MinIO.
|
||||
|
||||
For additional detail on queue topology and data store layout, see the [Data Pipeline Architecture](../architecture-data-pipeline.md) documentation.
|
||||
|
||||
---
|
||||
|
||||
## What Comes Next
|
||||
|
||||
At this point, raw data has been fetched from four external sources, deduplicated, stored in MinIO, parsed into normalized text, scored for quality, tagged with company mentions, and routed to the appropriate extraction queue. The documents sitting on `stonks:queue:extraction` and `stonks:queue:macro_classification` are clean, quality-filtered, and ready for AI processing. [Page 2 — AI Agent Processing and Structured Extraction](02-ai-agent-processing-and-extraction.md) picks up the story from here, explaining how the Document Intelligence Extractor and Global Event Classifier agents use LLM inference to transform these normalized documents into the structured JSON intelligence that feeds the rest of the pipeline.
|
||||
@@ -0,0 +1,164 @@
|
||||
# Page 2 — AI Agent Processing and Structured Extraction
|
||||
|
||||
Documents that arrive on the `stonks:queue:extraction` and `stonks:queue:macro_classification` Redis queues are clean, quality-filtered, and normalized — but they are still unstructured text. The job of the Extractor service is to transform that text into structured JSON intelligence that the rest of the pipeline can reason about quantitatively. Two AI agents share this responsibility: the Document Intelligence Extractor handles company-specific news, filings, and transcripts, while the Global Event Classifier handles macro-level geopolitical and economic events. Both agents run through the same Ollama-based inference infrastructure, share a common JSON repair pipeline, and persist their results to PostgreSQL and MinIO for downstream consumption and audit.
|
||||
|
||||
This page explains how each agent works, what schemas they produce, how the system validates and repairs LLM output, how runtime configuration is resolved from the database, and how the final structured records are persisted. For a visual overview of the full flow from ingestion through extraction, see the [Ingestion to Extraction Flow diagram](diagrams/ingestion-to-extraction-flow.md). For reference-level detail on agent configuration and the variant management API, see the [AI Agents Guide](../ai-agents.md).
|
||||
|
||||
---
|
||||
|
||||
## The Document Intelligence Extractor
|
||||
|
||||
The Document Intelligence Extractor is the primary AI agent in the pipeline. Registered under the slug `document-extractor` in the `ai_agents` database table, it processes every non-macro document that passes through the Parser — news articles, SEC filings, earnings transcripts, and press releases. Its purpose is to read a normalized document and produce a structured JSON object that captures the document's summary, the companies it affects, the sentiment and impact for each company, the catalysts driving that impact, and the evidence supporting the analysis.
|
||||
|
||||
The entry point is `services/extractor/main.py`, which runs a continuous worker loop polling the `stonks:queue:extraction` Redis list. When a job arrives, the worker extracts the `document_id`, `ticker`, and `text` fields from the JSON payload. If the job payload does not include the document text directly, the worker fetches it from MinIO using the `normalized_storage_ref` stored in the `documents` table — the Parser uploaded the normalized text to the `stonks-normalized` bucket during the previous pipeline stage (see [Page 1](01-data-ingestion-and-preparation.md)).
|
||||
|
||||
The actual LLM inference is handled by `OllamaClient` in `services/extractor/client.py`. The client sends the document to a local Ollama instance via the `/api/chat` HTTP endpoint with `stream=False` and `think=False`. The `think=False` flag is a deliberate performance choice — it disables the model's chain-of-thought reasoning phase, which would otherwise add two to four minutes of latency per document. The client does not use Ollama's `format` parameter for structured output because of a known Ollama bug (#14645) where the format constraint is silently ignored when `think=False` on qwen3.5 models. Instead, the system relies on prompt engineering to produce JSON and repairs any syntax issues after the fact.
|
||||
|
||||
The prompt sent to the model has two parts. The system prompt, defined in `services/extractor/prompts.py`, establishes the model's role as a financial document analyst and sets strict output rules: return only a single JSON object, no markdown fences, no explanation text, every schema field is required, use `"other"` for `catalyst_type` when unsure, keep evidence spans under 20 words, and limit key facts to three to five items. The user prompt, built by `build_extraction_prompt()` in the same module, provides the document text along with document-type-specific guidance. Four guidance variants exist — one each for articles, filings, transcripts, and press releases — each calibrated to the conventions and biases of that document type. For example, the filing guidance instructs the model to preserve the precise legal language of SEC documents, while the press release guidance warns that sentiment may be biased positive and directs the model to focus on concrete metrics rather than marketing language.
|
||||
|
||||
The user prompt also includes a list of all tracked tickers from the `companies` table, along with rules for how the model should use them. If a tracked ticker appears verbatim in the text, the model must include it in the output with at least one evidence span. If the article discusses a sector or theme that clearly affects a tracked company (oil prices affecting XOM, AI chip demand affecting NVDA), the model should include that company as well. The model is explicitly told not to invent tickers that are not in the provided list. Documents longer than 8,000 characters are truncated before being included in the prompt, with a `[... truncated for extraction ...]` marker appended.
|
||||
|
||||
The `OllamaClient` also supports a `context_window` override via the Ollama `num_ctx` option, which can be configured per agent variant through the `AgentConfigResolver` mechanism described later in this page.
|
||||
|
||||
---
|
||||
|
||||
## The ExtractionResult Schema
|
||||
|
||||
The structured output that the Document Intelligence Extractor produces is defined by the `ExtractionResult` Pydantic model in `services/extractor/schemas.py`. Every field is required — the model has no defaults — so the generated JSON schema forces the LLM to produce every field explicitly. The top-level fields are:
|
||||
|
||||
**`summary`** — a concise one-to-three sentence summary of the document's main point. This becomes the human-readable description stored in the `document_intelligence` table.
|
||||
|
||||
**`companies`** — an array of `CompanyExtractionItem` objects, one per affected company. Each company entry contains:
|
||||
|
||||
- `ticker` — the stock ticker symbol (validated against a regex pattern of one to five uppercase letters).
|
||||
- `company_name` — the full company name as referenced in the document.
|
||||
- `relevance` — a float between 0.0 and 1.0 indicating how relevant the document is to this company, where 0 means tangential and 1 means the company is the primary subject.
|
||||
- `sentiment` — one of `positive`, `negative`, `neutral`, or `mixed`, representing the overall sentiment toward this company in the document.
|
||||
- `impact_score` — a float between 0.0 and 1.0 estimating the magnitude of impact, where 0 is negligible and 1 is highly material.
|
||||
- `impact_horizon` — one of `intraday`, `1d`, `1d_7d`, `1d_30d`, `30d_90d`, or `90d_plus`, indicating the expected timeframe over which the impact will play out.
|
||||
- `catalyst_type` — exactly one of `earnings`, `product`, `legal`, `macro`, `supply_chain`, `m_and_a`, `rating_change`, or `other`. The prompt instructs the model to use `other` when none of the specific categories fit.
|
||||
- `key_facts` — a list of facts explicitly stated in the document. The prompt emphasizes that the model must not infer or fabricate facts.
|
||||
- `risks` — a list of risks explicitly mentioned in the document.
|
||||
- `evidence_spans` — short verbatim quotes from the document supporting the analysis. The prompt requests these be kept under 20 words each.
|
||||
|
||||
**`macro_themes`** — a list of broad economic or market themes mentioned in the document, such as `rates`, `inflation`, or `ai_capex`.
|
||||
|
||||
**`novelty_score`** — a float between 0.0 and 1.0 indicating how novel or surprising the information is. Routine earnings reports score low; unexpected regulatory actions score high. This value feeds into the novelty bonus component of the signal weighting formula described in [Page 3](03-signal-scoring-and-weighted-signals.md).
|
||||
|
||||
**`confidence`** — a float between 0.0 and 1.0 representing the model's confidence in the accuracy of its extraction. Lower values indicate ambiguous or incomplete source text. This value becomes the confidence gate input for signal scoring.
|
||||
|
||||
**`extraction_warnings`** — a list of issues encountered during extraction, such as `ambiguous_ticker`, `incomplete_text`, or `low_confidence`. These warnings are persisted alongside the intelligence record for operational monitoring.
|
||||
|
||||
The JSON schema is generated programmatically from the Pydantic models via `generate_json_schema()` in `services/extractor/schemas.py`, which calls Pydantic's `model_json_schema()` and then inlines all `$defs` references so the schema is self-contained and Ollama-friendly.
|
||||
|
||||
---
|
||||
|
||||
## The Global Event Classifier
|
||||
|
||||
Not all documents describe company-specific developments. Macro news articles — those tagged with `document_type='macro_event'` by the Parser — describe events that affect entire markets, sectors, or economies: trade wars, central bank rate decisions, commodity supply disruptions, geopolitical conflicts. These documents are routed to the `stonks:queue:macro_classification` Redis queue and processed by the Global Event Classifier agent, registered under the slug `event-classifier` in the `ai_agents` table.
|
||||
|
||||
The classifier is implemented in `services/extractor/event_classifier.py`. When the extractor worker in `services/extractor/main.py` pops a job and determines that the document type is `macro_event` (either because the job came from the macro queue or because the `documents` table records it as such), it routes the document to `_process_macro_classification()` instead of the standard extraction pipeline. This function calls `classify_global_event()`, which builds a dedicated prompt, sends it to Ollama through the same `OllamaClient` infrastructure, parses the response, and persists the result.
|
||||
|
||||
The classifier's system prompt is distinct from the extractor's. It establishes the model's role as a macro-level news classifier and includes explicit anti-hallucination rules that are critical to preventing the classifier from overreaching. The prompt states that the model should only classify articles about macro events that affect entire markets, sectors, or economies — trade wars, interest rate changes, commodity supply disruptions, regulatory changes, geopolitical conflicts, natural disasters. It explicitly lists what should not be classified as macro events: individual company earnings, lawsuits against a single company, single-company management changes, individual stock analysis, company-specific debt or bankruptcy, and product launches by one company. For these company-specific articles that were incorrectly routed, the model is instructed to set severity to `"low"`, confidence below 0.3, and leave the `affected_regions`, `affected_sectors`, and `affected_commodities` arrays empty.
|
||||
|
||||
The user prompt, built by `build_event_classification_prompt()`, reinforces these anti-hallucination rules and provides additional guidance. It instructs the model to only extract facts explicitly stated in the text, to set confidence below 0.4 for vague or speculative content, to distinguish announced policy from rumored policy, and to reserve `"critical"` severity for events affecting multiple countries or entire global markets. Articles longer than 6,000 characters are truncated before inclusion in the prompt.
|
||||
|
||||
The output schema is the `GlobalEvent` dataclass, which contains:
|
||||
|
||||
- `event_types` — a list of impact type strings, drawn from a fixed set: `supply_disruption`, `demand_shift`, `cost_increase`, `regulatory_pressure`, `currency_impact`, `commodity_shock`, `trade_barrier`, and `geopolitical_risk`. The model is instructed to include all applicable types rather than collapsing to a single category.
|
||||
- `severity` — one of `low`, `moderate`, `high`, or `critical`.
|
||||
- `affected_regions` — ISO 3166-1 alpha-2 country codes or region names (e.g., `US`, `CN`, `EU`, `GB`, `JP`). Only regions explicitly mentioned or clearly implied should be included.
|
||||
- `affected_sectors` — GICS sector identifiers such as `Energy`, `Financials`, `Information Technology`, or `Industrials`.
|
||||
- `affected_commodities` — commodity identifiers like `crude_oil`, `natural_gas`, `gold`, `copper`, `wheat`, `lithium`, or `semiconductors`. An empty list if no commodities are directly affected.
|
||||
- `summary` — a one-to-three sentence summary of the event and its market implications.
|
||||
- `key_facts` — facts explicitly stated in the article, limited to three to five items.
|
||||
- `estimated_duration` — one of `short_term` (days to weeks), `medium_term` (weeks to months), or `long_term` (months to years).
|
||||
- `confidence` — a float between 0.0 and 1.0, clamped during parsing.
|
||||
|
||||
Each `GlobalEvent` also carries a `model_metadata` object recording the provider (`ollama`), model name, prompt version (`event-classification-v1`), and schema version (`1.0.0`), plus a `source_document_id` linking back to the originating document.
|
||||
|
||||
After a successful classification, the system computes macro impact records for all tracked companies using the exposure-based interpolation engine in `services/aggregation/interpolation.py`. Each company's exposure profile — geographic revenue mix, supply chain regions, key input commodities, regulatory jurisdictions, and market position tier — determines how much a given macro event affects that company. Companies with non-zero macro impact scores get `macro_impact_records` rows persisted to PostgreSQL, and aggregation jobs are enqueued to `stonks:queue:aggregation` for each affected ticker. The extractor worker tracks consecutive macro classification failures and emits a critical-level alert after three consecutive failures, continuing with company-only signals in the meantime.
|
||||
|
||||
---
|
||||
|
||||
## The JSON Repair Pipeline
|
||||
|
||||
LLM output is inherently unreliable at the syntactic level. Models sometimes wrap JSON in markdown fences, produce trailing commas, leave strings unterminated, or truncate output mid-object when they hit token limits. The extractor addresses this with a three-stage JSON repair pipeline implemented across `services/extractor/client.py` and `services/extractor/schemas.py`.
|
||||
|
||||
The first stage is a direct `json.loads()` call. If the raw model output is already valid JSON, no repair is needed and the pipeline moves straight to validation. This is the fast path for well-behaved model responses.
|
||||
|
||||
The second stage strips markdown fences. Models frequently wrap their output in `` ```json ... ``` `` blocks despite being told not to. The `_strip_markdown_fences()` function in `services/extractor/client.py` uses a regex to detect and remove these wrappers before attempting another parse.
|
||||
|
||||
The third stage invokes the `json-repair` library as a fallback. The `_repair_json()` function in `services/extractor/client.py` calls `repair_json()` with `return_objects=False` to get a repaired JSON string. This library handles a wide range of common LLM JSON errors — trailing commas, missing quotes, unescaped characters — that would otherwise require custom repair logic.
|
||||
|
||||
The `services/extractor/schemas.py` module contains an additional layer of repair logic in its own `_repair_json()` function, which handles cases that the library might miss. It strips non-JSON prefixes (models sometimes prepend explanatory text before the opening brace), removes control characters that break parsing, fixes trailing commas before closing brackets, and as a last resort calls `_repair_truncated_json()` — a state-machine parser that walks the string tracking bracket depth and string state, then appends the necessary closing tokens to complete a truncated JSON object.
|
||||
|
||||
For the Global Event Classifier, the `_parse_classification_response()` function in `services/extractor/event_classifier.py` reuses the same `_strip_markdown_fences()` and `_repair_json()` functions from the client module, and additionally handles the case where the model wraps the output object in a single-element list — a quirk observed with some model configurations.
|
||||
|
||||
---
|
||||
|
||||
## Structural and Semantic Validation
|
||||
|
||||
Repairing JSON syntax is only the first step. The `validate_extraction()` function in `services/extractor/schemas.py` performs both structural and semantic validation on the parsed output, and the distinction between the two is important for understanding the retry logic.
|
||||
|
||||
Structural validation begins with normalization. The `_normalize_extraction_data()` function fills in missing top-level fields with sensible defaults (empty summary, empty companies array, 0.5 novelty score, 0.3 confidence), clamps numeric fields to the [0.0, 1.0] range, and normalizes per-company fields. Catalyst types that the model produces as free-text alternatives — `"strategic pivot"`, `"acquisition"`, `"lawsuit"`, `"inflation"`, `"launch"` — are mapped to their canonical enum values through a comprehensive alias dictionary. Impact horizons like `"long-term"`, `"short"`, `"immediate"`, or `"near-term"` are similarly mapped to the valid set (`intraday`, `1d`, `1d_7d`, `1d_30d`, `30d_90d`, `90d_plus`). After normalization, the data is validated against the `ExtractionResult` Pydantic model, which enforces type constraints, enum membership, and range bounds.
|
||||
|
||||
Semantic validation catches issues that are structurally valid but logically suspect. The `_semantic_checks()` function runs a series of cross-field consistency checks that produce either errors (which trigger a retry) or warnings (which are logged but do not block acceptance). Semantic errors include duplicate tickers across company entries, missing ticker fields, and invalid impact horizon values. Semantic warnings include empty summaries, low confidence with companies present, invalid ticker formats (not matching the one-to-five uppercase letter pattern), missing evidence spans, evidence spans that are too short (under 8 characters) or too long (over 500 characters), high impact scores with no supporting key facts, very low relevance scores, and strong sentiment paired with negligible impact scores.
|
||||
|
||||
When the original document text is available, the validator also performs an evidence grounding check: each evidence span is searched for in the source text (case-insensitive), and spans not found in the document are flagged with a warning. This helps detect hallucinated evidence — quotes the model fabricated rather than extracted from the actual text.
|
||||
|
||||
If validation produces any semantic errors, the `ValidationReport` is marked as invalid and the `OllamaClient` retry loop treats it as a failed attempt. The retry logic uses exponential backoff with configurable parameters: a base delay (default from `OllamaConfig`), a multiplier applied on each retry, and a maximum delay cap. The number of retries is configurable per agent through the `max_retries` field in the `ai_agents` or `agent_variants` table. Non-retryable errors — HTTP 400, 401, 403, 404, and 422 responses from Ollama — short-circuit the retry loop immediately, since these indicate a problem with the request itself rather than a transient model failure.
|
||||
|
||||
Every attempt, whether successful or not, is recorded in an `ExtractionAttempt` dataclass that captures the raw output, validation report, error description, duration in milliseconds, model name, and whether the error was retryable. The full list of attempts is preserved in the `ExtractionResponse` for audit purposes and uploaded to MinIO by the persistence layer.
|
||||
|
||||
---
|
||||
|
||||
## The AgentConfigResolver: Hot-Swapping Models and Prompts
|
||||
|
||||
Both the Document Intelligence Extractor and the Global Event Classifier resolve their runtime configuration through the `AgentConfigResolver` in `services/shared/agent_config.py`. This mechanism allows operators to change models, prompts, timeouts, retry counts, and token budgets without restarting any service — changes take effect within 60 seconds.
|
||||
|
||||
The resolver works by querying the `ai_agents` and `agent_variants` PostgreSQL tables with a single SQL statement that uses `COALESCE` to prefer variant values over base agent values. When the extractor worker starts, it creates an `AgentConfigResolver` instance with a 60-second TTL cache and calls `resolver.resolve("document-extractor")` to get the active configuration. If an active variant exists for the agent (enforced by a unique partial index on `agent_variants` that allows at most one active variant per agent), the variant's `model_name`, `system_prompt`, `temperature`, `max_tokens`, `context_window`, `timeout_seconds`, and `max_retries` override the base agent's values wherever the variant provides a non-NULL value. If no active variant exists, the base agent's configuration is used. If the database query fails entirely, the resolver returns `None` and the worker falls back to environment-variable-based `OllamaConfig` defaults.
|
||||
|
||||
The resolved configuration is captured in a `ResolvedAgentConfig` frozen dataclass that includes the `agent_id`, `variant_id` (if any), `model_provider`, `model_name`, `system_prompt`, `user_prompt_template`, `prompt_version`, `temperature`, `max_tokens`, `context_window`, `input_token_limit`, `token_budget`, `timeout_seconds`, and `max_retries`. The extractor worker uses this to build an `OllamaConfig` that is passed to the `OllamaClient`.
|
||||
|
||||
The 60-second TTL cache means the resolver only hits the database once per minute per agent slug. Cache entries are keyed by slug and timestamped with `time.monotonic()`. When a cached entry expires, the next `resolve()` call re-queries the database and refreshes the cache. The `invalidate()` method can clear a single slug or the entire cache, though in practice the TTL-based expiry is sufficient for normal operations.
|
||||
|
||||
The extractor worker re-resolves its configuration every 100 jobs. If the resolved model name has changed (for example, because an operator activated a variant that uses a different model), the worker closes the old `OllamaClient` and creates a new one with the updated configuration. The event classifier is resolved separately and can use a different model than the document extractor — the worker maintains two independent `OllamaClient` instances when the models differ.
|
||||
|
||||
Token budget enforcement adds another layer of control. If a variant specifies a `token_budget` (total tokens per hour), the worker checks the `agent_performance_log` table before each invocation to see whether the budget has been exceeded. If so, the invocation is skipped entirely. Input token limits work similarly: if a variant sets an `input_token_limit`, the worker truncates the document text to approximately that many tokens (estimated at four characters per token) before sending it to the model.
|
||||
|
||||
For a complete guide to creating variants, activating them, and comparing their performance, see the [AI Agents Guide](../ai-agents.md).
|
||||
|
||||
---
|
||||
|
||||
## Persistence: From Extraction to Database
|
||||
|
||||
Once the LLM produces a valid extraction and it passes validation, the `persist_extraction()` function in `services/extractor/worker.py` orchestrates the full persistence pipeline. This function writes to both MinIO (for audit) and PostgreSQL (for downstream consumption), ensuring that every extraction attempt is fully traceable.
|
||||
|
||||
The MinIO persistence layer uploads four artifacts per extraction, all stored under date-partitioned paths in dedicated buckets. The prompt metadata (prompt version, schema version, model name) goes to `stonks-llm-prompts`. The raw model output for every attempt — including failed ones — goes to `stonks-llm-results`, preserving the full retry history. A validation report summarizing the final attempt's status, errors, and warnings is uploaded alongside the raw output. On success, the final parsed intelligence object (the `ExtractionResult` serialized as JSON) is uploaded to a separate path for easy retrieval.
|
||||
|
||||
The PostgreSQL persistence writes to two tables. The `document_intelligence` table receives one row per document, containing the summary, macro themes, novelty score, source credibility, extraction warnings, confidence, model metadata (provider, model name, prompt version, schema version), references to the MinIO artifacts (raw output ref, prompt ref), validation status (`valid` or `failed`), validation errors, and retry count. This row is the authoritative record of what the AI extracted from the document.
|
||||
|
||||
The `document_impact_records` table receives one row per company mention within the extraction. Each impact record is linked to the parent `document_intelligence` row via `intelligence_id` and to the `companies` table via `company_id`. The record captures the ticker, relevance, sentiment, impact score, impact horizon, catalyst type, key facts, risks, and evidence spans for that specific company. The `company_id` is resolved from a ticker-to-UUID mapping that the worker maintains by querying the `companies` table (refreshed every 100 jobs). If a ticker in the extraction output does not match any tracked company, the impact record is skipped with a warning — the system only persists impact records for companies in its tracked universe.
|
||||
|
||||
After persisting the intelligence and impact records, the worker updates the document's status in the `documents` table to `extracted` (or `extraction_failed` if all retry attempts were exhausted). Even failed extractions get a `document_intelligence` row with `validation_status='failed'`, empty summary, zero confidence, and the accumulated error messages — this ensures the failure is visible in the database rather than silently lost.
|
||||
|
||||
Performance metrics are collected for every extraction via `collect_metrics()` in `services/extractor/metrics.py` and persisted to a metrics table. Prometheus counters and histograms track extraction attempts, duration, retries, confidence distribution, validation errors, and estimated token usage (input and output, estimated at four characters per token). When a resolved agent config is available, the worker also logs to the `agent_performance_log` table with variant attribution, enabling the A/B comparison queries described in the [AI Agents Guide](../ai-agents.md).
|
||||
|
||||
For the Global Event Classifier, persistence follows a parallel path. The prompt and raw output are uploaded to MinIO under an `event_classification/macro/` path prefix. The parsed `GlobalEvent` is persisted to the `global_events` PostgreSQL table, which stores the event types, severity, affected regions, affected sectors, affected commodities, summary, key facts, estimated duration, confidence, source document ID, and model metadata. Downstream, the macro interpolation engine computes `macro_impact_records` for each affected company and persists those as well.
|
||||
|
||||
---
|
||||
|
||||
## Enqueuing Aggregation Jobs
|
||||
|
||||
The final step in the extraction pipeline is to notify the downstream aggregation engine that new intelligence is available. After a successful document extraction, the worker pushes a job onto the `stonks:queue:aggregation` Redis list containing the ticker of the affected company. The aggregation engine (described in [Page 3](03-signal-scoring-and-weighted-signals.md)) will pick up this job and recompute the weighted signals and trend summaries for that ticker, incorporating the freshly extracted intelligence.
|
||||
|
||||
For macro events, the enqueue logic is more expansive. After the Global Event Classifier produces a `GlobalEvent` and the interpolation engine computes macro impact records, the worker enqueues an aggregation job for every ticker that received a non-zero macro impact score. A single macro event — say, a new tariff announcement affecting the Energy and Industrials sectors — can trigger aggregation recomputation for dozens of tickers simultaneously. The aggregation job payload includes both the `ticker` and the `macro_event_id`, so the aggregation engine knows to incorporate the new macro signals.
|
||||
|
||||
The worker alternates between the extraction and macro classification queues to prevent starvation: every third job is pulled from `stonks:queue:macro_classification`, with the remaining two-thirds from `stonks:queue:extraction`. If the preferred queue is empty, the worker falls back to the other queue, ensuring that neither pipeline stalls while the other has work available.
|
||||
|
||||
---
|
||||
|
||||
## What Comes Next
|
||||
|
||||
At this point, documents have been transformed from unstructured text into structured JSON intelligence — `ExtractionResult` objects for company-specific documents and `GlobalEvent` objects for macro news. These structured records are persisted in PostgreSQL and their tickers have been enqueued for aggregation. But raw extraction output is not yet actionable for trading decisions. The extraction tells us that a document is bearish for AAPL with an impact score of 0.7 and a confidence of 0.8, but it does not tell us how much weight that signal should carry relative to other signals about AAPL, or how it compares to signals from different sources, time periods, or market conditions. [Page 3 — Signal Scoring and the WeightedSignal Abstraction](03-signal-scoring-and-weighted-signals.md) picks up the story from here, explaining how the aggregation engine transforms these raw extraction outputs into weighted signals through confidence gating, recency decay, source credibility scoring, novelty bonuses, and market context multipliers.
|
||||
@@ -0,0 +1,210 @@
|
||||
# Page 3 — Signal Scoring and the WeightedSignal Abstraction
|
||||
|
||||
The extraction pipeline described in [Page 2](02-ai-agent-processing-and-extraction.md) produces structured intelligence records — `document_impact_records` for company-specific documents, `macro_impact_records` for global events, and `competitive_signal_records` for cross-company pattern propagation. Each record carries a sentiment, an impact score, a confidence value, and a publication timestamp. But these raw values are not directly comparable. A high-confidence extraction from a reputable source published ten minutes ago should carry far more weight than a low-confidence extraction from an unknown source published three weeks ago. A document that breaks genuinely novel information should matter more than one that rehashes yesterday's earnings call. And when the market is moving fast — high volatility, surging volume — fresh signals become even more critical.
|
||||
|
||||
The signal scoring layer in `services/aggregation/scoring.py` solves this problem by transforming each raw intelligence record into a `WeightedSignal` object: a document reference paired with a composite aggregation weight that encodes recency, credibility, novelty, confidence, and market conditions into a single number. This page explains how that weight is computed, how sentiment labels become numeric values, and how three independent signal layers — Company, Macro, and Competitive — each produce `WeightedSignal` objects that are concatenated into a unified list before the aggregation engine computes trend summaries. For a visual breakdown of the composite weight formula, see the [Weighted Signal Computation diagram](diagrams/weighted-signal-computation.md). For the full picture of how the three layers merge, see the [Three-Layer Signal Merging diagram](diagrams/three-layer-signal-merging.md).
|
||||
|
||||
---
|
||||
|
||||
## The WeightedSignal and SignalWeight Dataclasses
|
||||
|
||||
The core abstraction is the `WeightedSignal` dataclass, defined in `services/aggregation/scoring.py`. It pairs a document reference with the computed weight and the signal's sentiment and impact values:
|
||||
|
||||
- **`document_id`** — the UUID of the source document (for company and macro signals) or a synthetic identifier for pattern-derived signals (e.g., `pattern:AAPL:earnings:7d`).
|
||||
- **`weight`** — a `SignalWeight` object containing the component breakdown and the final combined score.
|
||||
- **`sentiment_value`** — a numeric sentiment value: `+1.0` for positive, `-1.0` for negative, `0.0` for neutral or mixed.
|
||||
- **`impact_score`** — the magnitude of impact, drawn from the extraction's per-company impact score for company signals, or scaled by a layer-specific weight multiplier for macro and competitive signals.
|
||||
|
||||
The `SignalWeight` dataclass captures the individual components that feed into the combined weight, making the scoring decision fully transparent and auditable:
|
||||
|
||||
- **`recency`** — the exponential decay weight based on document age.
|
||||
- **`credibility`** — the source credibility weight after clamping and exponentiation.
|
||||
- **`novelty_bonus`** — the additive bonus derived from the document's novelty score.
|
||||
- **`confidence_gate`** — either `1.0` (signal passes) or `0.0` (signal is gated out).
|
||||
- **`market_ctx_multiplier`** — a multiplicative boost from market conditions, always `>= 1.0`.
|
||||
- **`combined`** — the final composite weight used by the aggregation engine.
|
||||
|
||||
The `ScoringConfig` frozen dataclass holds all tunable parameters for the scoring functions — half-life hours per window, credibility bounds, novelty bonus cap, confidence floor, and market context thresholds. A module-level `DEFAULT_CONFIG` singleton provides the production defaults, but every scoring function accepts an optional `config` parameter so that tests and alternative configurations can override any parameter without modifying global state.
|
||||
|
||||
---
|
||||
|
||||
## The Composite Weight Formula
|
||||
|
||||
The `compute_signal_weight()` function in `services/aggregation/scoring.py` computes the combined weight for a single document signal. The formula is:
|
||||
|
||||
```
|
||||
combined = gate × recency × credibility × (1 + novelty_bonus) × market_context_multiplier
|
||||
```
|
||||
|
||||
Each factor is computed independently and then multiplied together. This multiplicative structure means that any single factor can zero out the entire weight (the confidence gate) or amplify it (the market context multiplier), and the interaction between factors is naturally captured — a highly credible, very recent document with novel information in a volatile market receives the maximum possible weight, while a stale, low-credibility document with routine information receives a weight close to zero.
|
||||
|
||||
The following sections describe each component in detail.
|
||||
|
||||
---
|
||||
|
||||
## Confidence Gate
|
||||
|
||||
The confidence gate is the first and most decisive filter. If the extraction confidence for a document falls below the `confidence_floor` threshold — set to `0.2` in the default `ScoringConfig` — the gate evaluates to `0.0` and the entire combined weight becomes zero. The document is effectively excluded from aggregation. If the confidence meets or exceeds the threshold, the gate evaluates to `1.0` and has no further effect on the weight.
|
||||
|
||||
This binary gate exists because documents with very low extraction confidence are too unreliable to aggregate. A confidence of 0.15 typically means the LLM struggled to parse the document — perhaps the text was truncated, the language was ambiguous, or the document type was unusual. Including such signals would add noise rather than information. The threshold of 0.2 is deliberately low; it filters only the most unreliable extractions while allowing moderately confident signals to participate (their lower confidence is reflected through the credibility component instead).
|
||||
|
||||
---
|
||||
|
||||
## Recency Decay
|
||||
|
||||
The `recency_weight()` function computes an exponential decay based on how old a document is relative to the aggregation anchor time. The formula is:
|
||||
|
||||
```
|
||||
w = 2^(−age_hours / half_life)
|
||||
```
|
||||
|
||||
A document published exactly one half-life ago receives a recency weight of `0.5`. A document published two half-lives ago receives `0.25`, and so on. A document published at or after the reference time receives the maximum weight of `1.0`.
|
||||
|
||||
The half-life varies by trend window, reflecting the intuition that shorter windows need faster decay to stay responsive, while longer windows should give older documents more influence. The default half-lives, configured in `ScoringConfig.half_life_hours`, are:
|
||||
|
||||
| Window | Half-Life |
|
||||
|--------|-----------|
|
||||
| `intraday` | 2 hours |
|
||||
| `1d` | 12 hours |
|
||||
| `7d` | 72 hours (3 days) |
|
||||
| `30d` | 240 hours (10 days) |
|
||||
| `90d` | 720 hours (30 days) |
|
||||
|
||||
For the intraday window, a document published four hours ago already has a recency weight of `0.25` — it is rapidly losing influence as newer information arrives. For the 90-day window, that same four-hour-old document still has a recency weight of essentially `1.0`, because the 30-day half-life means age only becomes significant over weeks.
|
||||
|
||||
A floor value of `min_recency_weight = 0.01` prevents very old documents from being completely zeroed out. Even a document from months ago retains a trace-level weight of 1%, ensuring it can still contribute to trend computation if no newer signals exist. Both timestamps are normalized to UTC; naive datetimes are treated as UTC to avoid timezone-related scoring errors.
|
||||
|
||||
---
|
||||
|
||||
## Source Credibility
|
||||
|
||||
The `credibility_weight()` function transforms a source's credibility score into a weight component. The raw credibility value — a float between 0.0 and 1.0 stored in the `document_intelligence` table — is first clamped to the range `[0.1, 1.0]` using the `credibility_floor` and `credibility_ceiling` parameters from `ScoringConfig`. This clamping ensures that even the least credible sources retain a minimum weight of 0.1 rather than being completely silenced, while preventing any source from exceeding a weight of 1.0.
|
||||
|
||||
After clamping, the value is raised to the `credibility_exponent` power. The default exponent is `1.0`, which means the clamped credibility passes through unchanged. Setting the exponent above 1.0 would penalize low-credibility sources more aggressively — for example, an exponent of 2.0 would reduce a credibility of 0.5 to a weight of 0.25. Setting it below 1.0 would flatten the curve, making the system more tolerant of lower-credibility sources. The exponent is configurable through `ScoringConfig` to allow operators to tune the credibility sensitivity without changing the scoring code.
|
||||
|
||||
---
|
||||
|
||||
## Novelty Bonus
|
||||
|
||||
The novelty bonus rewards documents that contain genuinely new information. The bonus is computed as:
|
||||
|
||||
```
|
||||
novelty_bonus = novelty_score × novelty_bonus_max
|
||||
```
|
||||
|
||||
where `novelty_score` is the 0.0-to-1.0 value produced by the extraction model (see the `ExtractionResult` schema in [Page 2](02-ai-agent-processing-and-extraction.md)) and `novelty_bonus_max` is `0.25` by default. This means the bonus ranges from `0.0` (completely routine information) to `0.25` (maximally novel information), providing up to a 25% boost to the signal weight.
|
||||
|
||||
The bonus enters the composite formula as `(1 + novelty_bonus)`, so it acts as a multiplicative amplifier on the base weight. A document with a novelty score of 1.0 gets its weight multiplied by 1.25; a document with a novelty score of 0.0 gets multiplied by 1.0 (no change). This design ensures that novelty can only increase a signal's weight, never decrease it — routine information is not penalized, it simply does not receive the bonus.
|
||||
|
||||
---
|
||||
|
||||
## Market Context Multiplier
|
||||
|
||||
The `market_context_multiplier()` function computes a boost factor based on real-time market conditions for the ticker being aggregated. The multiplier is always `>= 1.0`, meaning market context can only amplify signal weights, never reduce them. When no market context data is available (the `MarketContext` object from `services/shared/schemas.py` has `has_data == False`), the multiplier defaults to `1.0`.
|
||||
|
||||
Two market features contribute to the boost:
|
||||
|
||||
**Volatility boost.** When the ticker's price volatility exceeds the `volatility_recency_boost_threshold` (default `1.0` in price units), the excess volatility is transformed through a logarithmic scaling function: `log₁₊(excess) × 0.15`. The logarithmic scaling prevents extreme volatility from producing runaway weight amplification. The boost is capped at `volatility_recency_boost_max = 0.30`, so the maximum volatility contribution is a 30% weight increase. The rationale is that in highly volatile markets, fresh intelligence is disproportionately valuable — a signal about NVDA matters more when NVDA is swinging 5% intraday than when it is trading in a tight range.
|
||||
|
||||
**Volume surge boost.** When the ticker's volume change percentage exceeds `volume_surge_threshold_pct = 50.0%` (meaning trading volume is at least 50% above the prior period's average), a flat `volume_surge_boost = 0.15` is added. Unlike the volatility boost, this is binary — either the volume threshold is met and the full 15% boost applies, or it is not and no boost is added. High-volume moves carry more conviction because they represent broader market participation rather than thin-market noise.
|
||||
|
||||
The two boosts are additive within the multiplier: `multiplier = 1.0 + volatility_boost + volume_surge_boost`. In the most extreme case — high volatility and a volume surge — the combined multiplier reaches `1.0 + 0.30 + 0.15 = 1.45`, amplifying the signal weight by 45%. The `MarketContext` data is fetched by `services/aggregation/market_context.py` from the market data tables in PostgreSQL, using the same ticker and window parameters as the impact record query.
|
||||
|
||||
---
|
||||
|
||||
## Sentiment Mapping
|
||||
|
||||
Before signals can be aggregated into trend summaries, the categorical sentiment labels from the extraction output must be converted to numeric values. The `sentiment_to_numeric()` function in `services/aggregation/scoring.py` performs this mapping:
|
||||
|
||||
| Sentiment Label | Numeric Value |
|
||||
|----------------|---------------|
|
||||
| `positive` | `+1.0` |
|
||||
| `negative` | `-1.0` |
|
||||
| `neutral` | `0.0` |
|
||||
| `mixed` | `0.0` |
|
||||
|
||||
The mapping is case-insensitive. Any unrecognized label defaults to `0.0`. The choice to map both `neutral` and `mixed` to `0.0` is deliberate — a mixed-sentiment document (one that contains both positive and negative signals for the same company) should not push the trend in either direction. The contradiction between the positive and negative aspects is captured separately by the contradiction detection system described in [Page 4](04-trend-aggregation-and-accumulating-signals.md), rather than being baked into the sentiment value itself.
|
||||
|
||||
For macro signals, the direction-to-sentiment mapping in `services/aggregation/worker.py` follows the same pattern: `positive` maps to `+1.0`, `negative` to `-1.0`, and both `mixed` and `neutral` to `0.0`. For competitive signals built by `build_pattern_weighted_signals()` in `services/aggregation/signal_propagation.py`, the sentiment is derived from the pattern's directional bias: `+1.0` if `bullish_pct > bearish_pct`, `-1.0` otherwise.
|
||||
|
||||
---
|
||||
|
||||
## Weighted Sentiment Average
|
||||
|
||||
The `weighted_sentiment_average()` function computes the central metric that drives trend direction: a weight-adjusted average sentiment across all signals for a ticker in a given window. The formula is:
|
||||
|
||||
```
|
||||
weighted_avg = Σ(combined_weight × impact_score × sentiment_value) / Σ(combined_weight × impact_score)
|
||||
```
|
||||
|
||||
Each signal contributes its sentiment value scaled by both its composite weight and its impact score. The denominator normalizes by the total effective weight, producing a value in the range `[-1.0, +1.0]`. A result near `+1.0` means the weighted evidence is overwhelmingly positive; near `-1.0` means overwhelmingly negative; near `0.0` means either neutral or evenly split.
|
||||
|
||||
The use of `combined_weight × impact_score` as the effective weight means that high-impact, high-weight signals dominate the average. A single high-confidence, recent, credible document with a strong impact score can outweigh several older, lower-impact documents — which is the intended behavior. The aggregation engine in `services/aggregation/worker.py` passes this weighted average to `derive_trend_direction()`, which maps it to a `TrendDirection` enum value (bullish, bearish, mixed, or neutral) using the thresholds described in [Page 4](04-trend-aggregation-and-accumulating-signals.md).
|
||||
|
||||
If the total effective weight is zero — either because no signals exist or all signals were gated out by the confidence floor — the function returns `0.0`, which maps to a neutral trend direction.
|
||||
|
||||
---
|
||||
|
||||
## The Three Signal Layers
|
||||
|
||||
The aggregation engine in `services/aggregation/worker.py` does not treat all intelligence sources equally. Signals flow through three independent layers, each with a different relative weight, before being concatenated into a single `WeightedSignal` list for trend computation. This layered architecture allows the system to incorporate diverse intelligence sources while controlling how much influence each source type has on the final trend.
|
||||
|
||||
### Layer 1 — Company Signals (Weight: 1.0)
|
||||
|
||||
Company signals are the primary layer. They are built by `build_weighted_signals()` in `services/aggregation/worker.py` from `document_impact_records` — the per-company extraction output produced by the Document Intelligence Extractor (see [Page 2](02-ai-agent-processing-and-extraction.md)). Each impact record's sentiment is converted via `sentiment_to_numeric()`, and its impact score is used directly without any layer-level scaling. The `compute_signal_weight()` function produces the composite weight using the document's publication time, source credibility, novelty score, extraction confidence, and the ticker's current market context.
|
||||
|
||||
Company signals carry a relative weight of `1.0` — they are the baseline against which other layers are measured. This reflects the design principle that direct, company-specific intelligence (an earnings report about AAPL, a product launch by TSLA, a lawsuit against META) is the most relevant and reliable signal for that company's trend.
|
||||
|
||||
### Layer 2 — Macro Signals (Weight: 0.3)
|
||||
|
||||
Macro signals capture the indirect impact of global events on individual companies. They are built by `build_macro_weighted_signals()` in `services/aggregation/worker.py` from `macro_impact_records` — the per-company impact scores computed by the exposure-based interpolation engine after the Global Event Classifier processes a macro news article. The sentiment is mapped from the `impact_direction` field (`positive` → `+1.0`, `negative` → `-1.0`, `mixed`/`neutral` → `0.0`), and the impact score is scaled by `MACRO_SIGNAL_WEIGHT`, which defaults to `0.3` in `AggregationConfig`.
|
||||
|
||||
The 0.3 weight means that a macro signal's impact score is reduced to 30% of its raw value before entering the aggregation. This attenuation reflects the inherent uncertainty in macro-to-company impact estimation — a tariff announcement might affect XOM's revenue, but the magnitude depends on exposure profiles, supply chain flexibility, and competitive dynamics that the interpolation engine can only approximate. By weighting macro signals at 0.3 relative to company signals at 1.0, the system ensures that macro intelligence informs the trend without overwhelming direct company-specific evidence.
|
||||
|
||||
The recency decay, credibility, and confidence gating for macro signals use the same `compute_signal_weight()` function as company signals. The `published_at` timestamp comes from the global event's source document (the macro news article), and the `source_credibility` and `extraction_confidence` both use the macro impact record's `confidence` field.
|
||||
|
||||
### Layer 3 — Competitive Signals (Weight: 0.2)
|
||||
|
||||
Competitive signals capture cross-company effects: when a catalyst hits one company, historical patterns suggest how competitors might be affected. They are built by `build_pattern_weighted_signals()` in `services/aggregation/signal_propagation.py` from two sources: `HistoricalPattern` objects (self-company patterns mined by `services/aggregation/pattern_matcher.py`) and `CompetitiveSignalRecord` objects (cross-company propagation signals stored in `competitive_signal_records`).
|
||||
|
||||
For historical patterns, the sentiment is derived from the pattern's directional bias (`+1.0` if `bullish_pct > bearish_pct`, `-1.0` otherwise), and the impact score is the pattern's `avg_strength` multiplied by `competitive_signal_weight` (default `0.2` from `CompetitiveConfig`). The `published_at` for recency decay uses the pattern's `data_end` — the most recent data point in the pattern's sample — and the `extraction_confidence` uses the pattern's `pattern_confidence`. Source credibility is set to `1.0` because patterns are derived from validated historical data, and novelty is fixed at `0.5`.
|
||||
|
||||
For competitive signal records, the same structure applies: sentiment from `signal_direction`, impact from `signal_strength × competitive_signal_weight`, recency from `computed_at`, and confidence from `pattern_confidence`.
|
||||
|
||||
The 0.2 weight makes competitive signals the lightest layer. This is appropriate because competitive signal propagation involves the most inference — the system is predicting how Company B will react based on what happened to Company A in historically similar situations. The signal is valuable as supplementary evidence but should not drive trend direction on its own.
|
||||
|
||||
---
|
||||
|
||||
## Signal Merging in the Aggregation Engine
|
||||
|
||||
The `aggregate_company_window()` function in `services/aggregation/worker.py` orchestrates the merging of all three layers for a single ticker and window. The process follows a clear sequence:
|
||||
|
||||
1. **Fetch company impact records** from `document_impact_records` for the ticker within the window's time range.
|
||||
2. **Fetch market context** for the ticker from market data tables.
|
||||
3. **Build company weighted signals** via `build_weighted_signals()`.
|
||||
4. **Check the macro toggle** — query `risk_configs` for the `macro_enabled` flag, then fetch and merge macro signals if enabled.
|
||||
5. **Check the competitive toggle** — query `risk_configs` for the `competitive_enabled` flag, then fetch patterns, fetch competitive signals, and merge if enabled.
|
||||
6. **Concatenate** all `WeightedSignal` lists into a single list.
|
||||
7. **Assemble the `TrendSummary`** from the merged signals.
|
||||
|
||||
The concatenation in step 6 is a simple list append — `signals = signals + macro_signals` followed by `signals = signals + pattern_weighted`. There is no re-weighting or normalization at the merge point. The relative influence of each layer is already encoded in the impact scores (scaled by 0.3 for macro, 0.2 for competitive, 1.0 for company) and in the composite weights computed by `compute_signal_weight()`. The `weighted_sentiment_average()` function then naturally produces a sentiment average that reflects these relative weights.
|
||||
|
||||
---
|
||||
|
||||
## Runtime Toggles and Graceful Degradation
|
||||
|
||||
Both the macro and competitive signal layers can be enabled or disabled at runtime through the `risk_configs` PostgreSQL table, without restarting any service. The toggle state is read fresh from the database at the start of every aggregation cycle — there is no caching — so changes take effect on the very next cycle.
|
||||
|
||||
The `fetch_macro_enabled()` function in `services/aggregation/worker.py` queries the most recent active `risk_configs` row and reads the `config->>'macro_enabled'` JSON field. If the field is explicitly set to `"true"` or `"false"`, that value overrides the `AggregationConfig` default. If no config row exists or the field is absent, the function returns `None` and the engine falls back to the `AggregationConfig.macro_enabled` default (which is `True`). The `fetch_competitive_enabled()` function follows the identical pattern for the `competitive_enabled` field.
|
||||
|
||||
When a layer is disabled, the aggregation engine simply skips the fetch-and-merge step for that layer. Company signals are always computed — they cannot be toggled off. This means the system degrades gracefully: disabling the macro layer produces trends based on company signals alone (plus competitive signals if enabled), and disabling the competitive layer produces trends based on company and macro signals. Disabling both layers reduces the engine to its original single-layer behavior, using only direct document intelligence.
|
||||
|
||||
Crucially, disabling a layer does not stop upstream processing. When the macro layer is disabled, the Global Event Classifier continues to classify macro events and the interpolation engine continues to compute `macro_impact_records`. The data accumulates in PostgreSQL. When the layer is re-enabled, the aggregation engine immediately picks up all the macro impact records that were computed while the layer was disabled — there is no data loss or gap in coverage. The same applies to competitive signals: pattern mining and signal propagation continue regardless of the toggle state.
|
||||
|
||||
If the competitive signal fetch fails at runtime (for example, due to a database timeout), the aggregation engine catches the exception, logs it, and continues with company and macro signals only. This exception-based graceful degradation ensures that a transient failure in one layer does not block trend computation entirely.
|
||||
|
||||
---
|
||||
|
||||
## What Comes Next
|
||||
|
||||
At this point, every document intelligence record, macro impact record, and competitive signal record has been transformed into a `WeightedSignal` with a composite weight that encodes recency, credibility, novelty, confidence, and market conditions. The three signal layers have been merged into a single list, and the weighted sentiment average has been computed. But a single aggregation cycle produces only a snapshot — a point-in-time view of the evidence. The real power of the system emerges when these snapshots accumulate across multiple documents and time windows, building a case for action. [Page 4 — Trend Aggregation and Accumulating Signals](04-trend-aggregation-and-accumulating-signals.md) explains how the aggregation engine computes `TrendSummary` objects across five time windows, how consecutive same-direction signals strengthen trend confidence and escalate the system's response from neutral observation to actionable trading recommendations, and how contradiction detection and evidence ranking ensure that the trend reflects genuine consensus rather than noise.
|
||||
+267
@@ -0,0 +1,267 @@
|
||||
# Page 4 — Trend Aggregation and Accumulating Signals
|
||||
|
||||
The scoring layer described in [Page 3](03-signal-scoring-and-weighted-signals.md) transforms every intelligence record into a `WeightedSignal` — a document reference paired with a composite weight that encodes recency, credibility, novelty, confidence, and market conditions. Three independent signal layers (Company at weight 1.0, Macro at 0.3, Competitive at 0.2) each produce `WeightedSignal` objects that are concatenated into a single list. But a single list of weighted signals is still just raw material. The aggregation engine in `services/aggregation/worker.py` is where that raw material becomes a decision-grade assessment: a `TrendSummary` object that captures the direction, strength, confidence, contradiction level, and supporting evidence for a ticker across a specific time window. This page explains how that transformation works — from weighted sentiment averages through trend direction derivation, contradiction detection, evidence ranking, and confidence computation — and, critically, how consecutive signals pointing in the same direction accumulate across documents and time windows to escalate the system's response from passive observation to actionable trading recommendations.
|
||||
|
||||
For a visual overview of the accumulation and escalation process, see the [Trend Accumulation and Escalation diagram](diagrams/trend-accumulation-escalation.md). For how the three signal layers merge into the aggregation engine, see the [Three-Layer Signal Merging diagram](diagrams/three-layer-signal-merging.md).
|
||||
|
||||
---
|
||||
|
||||
## Five Time Windows
|
||||
|
||||
The aggregation engine does not compute a single trend for each ticker. It computes five, one for each time window defined in `services/aggregation/worker.py`:
|
||||
|
||||
| Window | Lookback Duration |
|
||||
|--------|-------------------|
|
||||
| `intraday` | 12 hours |
|
||||
| `1d` | 1 day |
|
||||
| `7d` | 7 days |
|
||||
| `30d` | 30 days |
|
||||
| `90d` | 90 days |
|
||||
|
||||
Each window produces an independent `TrendSummary` by fetching all impact records, macro impacts, and competitive signals for the ticker within that window's time range. The `aggregate_company_window()` function in `services/aggregation/worker.py` orchestrates this per-window computation: it determines the time range from the window's lookback duration, fetches `document_impact_records` from PostgreSQL, retrieves market context, builds company weighted signals, checks the macro and competitive runtime toggles (see [Page 3](03-signal-scoring-and-weighted-signals.md) for toggle details), merges any enabled layer signals, and then assembles the `TrendSummary`.
|
||||
|
||||
The five-window design serves a specific purpose. Short windows (intraday, 1d) capture fast-moving sentiment shifts — a breaking earnings miss, a sudden regulatory action — while long windows (30d, 90d) reveal sustained trends that persist across many documents and news cycles. A ticker might show a bearish intraday trend after a single negative article, but a neutral 30-day trend because the broader evidence base is balanced. The recommendation engine downstream (described in [Page 5](05-recommendation-generation.md)) evaluates each window's `TrendSummary` independently, so the system can respond to both short-term catalysts and long-term directional shifts.
|
||||
|
||||
The `aggregate_company()` function iterates over all effective windows (configurable via `AggregationConfig.windows`, defaulting to all five) and calls `aggregate_company_window()` for each one. This means a single aggregation cycle for one ticker produces up to five `TrendSummary` objects, each reflecting a different temporal perspective on the same underlying evidence.
|
||||
|
||||
---
|
||||
|
||||
## Trend Direction Derivation
|
||||
|
||||
Once the weighted sentiment average has been computed from the merged signal list (see the `weighted_sentiment_average()` function described in [Page 3](03-signal-scoring-and-weighted-signals.md)), the `derive_trend_direction()` function in `services/aggregation/worker.py` maps that numeric value to a `TrendDirection` enum. The rules are evaluated in a specific order, and the first matching rule wins:
|
||||
|
||||
1. **Mixed** — If the contradiction score exceeds `0.10` (the `MIXED_THRESHOLD` constant) *and* the absolute value of the average sentiment is below `0.30`, the direction is `MIXED`. This rule fires first because high contradiction with a weak directional signal indicates genuine disagreement in the evidence — the trend is not simply neutral, it is actively contested.
|
||||
|
||||
2. **Bullish** — If the average sentiment is `≥ 0.15` (the `BULLISH_THRESHOLD` constant), the direction is `BULLISH`. This means the weight-adjusted evidence leans positive with enough conviction to cross the threshold.
|
||||
|
||||
3. **Bearish** — If the average sentiment is `≤ -0.15` (the `BEARISH_THRESHOLD` constant), the direction is `BEARISH`. The symmetric threshold ensures that bullish and bearish classifications require the same magnitude of evidence.
|
||||
|
||||
4. **Neutral** — If none of the above conditions are met, the direction is `NEUTRAL`. This covers the range where the average sentiment falls between -0.15 and +0.15 without high contradiction — the evidence is either balanced or insufficient to establish a directional lean.
|
||||
|
||||
The mixed-first evaluation order is important. Consider a scenario where five documents are bullish and four are bearish, all with similar weights. The weighted sentiment average might be slightly positive (say, +0.08), which would normally map to neutral. But the contradiction score — computed from the minority/majority weight split — would be high (close to 0.44). The mixed rule catches this case: the evidence is not neutral, it is conflicted. This distinction matters downstream because mixed trends receive different treatment in the recommendation engine than neutral trends.
|
||||
|
||||
---
|
||||
|
||||
## Contradiction Detection
|
||||
|
||||
The contradiction detection module in `services/aggregation/contradiction.py` provides a structured analysis of disagreement within the signal set. Rather than collapsing contradictory evidence into a single number, it produces a `ContradictionResult` containing both an overall score and a list of `DisagreementDetail` objects that explain *where* the disagreement lies.
|
||||
|
||||
The `detect_contradictions()` function runs two analyses:
|
||||
|
||||
### Sentiment Disagreement
|
||||
|
||||
The `_detect_sentiment_disagreement()` function examines whether both positive and negative sentiment signals exist in the signal set. For each signal with a non-zero effective weight (`combined_weight × impact_score > 0`), it classifies the signal as positive or negative based on its `sentiment_value` and accumulates the effective weight for each side. If both sides have at least one signal, it produces a `DisagreementDetail` with dimension `"sentiment"`, listing the document IDs and weights for each side, along with a human-readable description like "Sentiment split: 3 positive vs 2 negative signals (minority weight ratio 38%)".
|
||||
|
||||
### Catalyst-Level Disagreement
|
||||
|
||||
The `_detect_catalyst_disagreement()` function goes deeper. It groups signals by their `catalyst_type` (earnings, product_launch, regulatory, etc.) using `CatalystEntry` objects built from the `document_impact_records`. Within each catalyst group, it checks whether both positive and negative signals exist. If they do, it produces a `DisagreementDetail` with dimension `"catalyst:<type>"` — for example, `"catalyst:earnings"` when some documents interpret an earnings report positively and others negatively. This catalyst-level analysis is valuable because it pinpoints the specific topic of disagreement rather than just flagging that disagreement exists somewhere in the evidence.
|
||||
|
||||
### The Overall Contradiction Score
|
||||
|
||||
The `_compute_overall_score()` function computes the backward-compatible scalar contradiction score using the minority/majority weight ratio formula:
|
||||
|
||||
```
|
||||
contradiction_score = minority_weight / total_weight
|
||||
```
|
||||
|
||||
where `minority_weight` is the smaller of the positive and negative effective weights, and `total_weight` is their sum. Signals with zero effective weight or neutral sentiment are excluded. The score ranges from `0.0` (complete agreement — all signals point the same direction) to `0.5` (perfect split — positive and negative weights are exactly equal). A score of `0.0` means no contradiction at all. A score above `0.10` combined with a weak average sentiment triggers the mixed direction classification in `derive_trend_direction()`.
|
||||
|
||||
The contradiction score also feeds directly into the confidence computation as a penalty, described in the next section. High contradiction reduces the system's confidence in the trend, which in turn affects whether the trend can escalate to actionable recommendations.
|
||||
|
||||
---
|
||||
|
||||
## Evidence Ranking
|
||||
|
||||
Not all documents contributing to a trend are equally important. The `rank_evidence()` function in `services/aggregation/worker.py` delegates to the evidence ranking module (`services/aggregation/evidence.py`) to produce ordered lists of the most influential supporting and opposing documents. The ranking uses a composite scoring approach configured by `EvidenceRankConfig`, considering multiple factors:
|
||||
|
||||
- **Weight** — the signal's composite weight from the scoring layer, reflecting recency, credibility, novelty, confidence, and market context.
|
||||
- **Impact** — the extraction's impact score for the company, reflecting how significant the document's content is.
|
||||
- **Recency** — how recently the document was published, with more recent documents ranked higher.
|
||||
- **Confidence** — the extraction confidence, reflecting how reliably the LLM parsed the document.
|
||||
|
||||
Signals are split into supporting (positive sentiment) and opposing (negative sentiment) groups. Neutral and mixed sentiment signals are excluded from evidence lists — they do not argue for or against the trend direction. Within each group, signals are sorted by their composite rank score in descending order, and the top entries (up to `MAX_EVIDENCE_REFS = 10` per side) are returned as document ID lists.
|
||||
|
||||
The `assemble_trend_with_evidence()` function in `services/aggregation/worker.py` uses the detailed variant `rank_evidence_detailed()` to get `RankedEvidence` objects that include the individual scoring components (weight, impact, recency, confidence, sentiment value). These detailed rankings are persisted to the `trend_evidence` table for auditability, while the document ID lists are stored directly in the `TrendSummary` as `top_supporting_evidence` and `top_opposing_evidence`.
|
||||
|
||||
The evidence ranking serves two purposes. First, it provides the recommendation engine with the most relevant documents to cite in its thesis generation (see [Page 5](05-recommendation-generation.md)). Second, it gives human reviewers a quick way to understand *why* the system reached a particular trend assessment — the top-ranked documents are the ones that most influenced the direction and strength.
|
||||
|
||||
---
|
||||
|
||||
## Confidence Computation
|
||||
|
||||
The `compute_trend_confidence()` function in `services/aggregation/worker.py` produces the confidence score for a `TrendSummary`. This score is critical because it directly gates whether a trend can produce actionable recommendations — the eligibility evaluation in `services/recommendation/eligibility.py` requires a minimum confidence of `0.35` to generate any recommendation at all, and higher confidence thresholds control escalation to paper and live trading modes.
|
||||
|
||||
Confidence is computed from four components:
|
||||
|
||||
### Unique Source Count
|
||||
|
||||
The function counts the number of unique document IDs across all active signals (those with `combined_weight > 0`). This count is divided by 15 and capped at `0.8`:
|
||||
|
||||
```
|
||||
count_factor = min(unique_sources / 15.0, 0.8)
|
||||
```
|
||||
|
||||
A trend backed by 15 or more unique source documents reaches the maximum count contribution of `0.8`. A trend backed by a single document gets only `0.067`. This component rewards breadth of evidence — a trend confirmed by many independent sources is more trustworthy than one driven by a single article, regardless of how high that article's individual weight might be.
|
||||
|
||||
### Average Extraction Credibility
|
||||
|
||||
The average credibility weight across all active signals provides a baseline quality measure. If most contributing documents come from high-credibility sources, this component is high. If the evidence is dominated by low-credibility sources, confidence is penalized accordingly.
|
||||
|
||||
### Signal Agreement with Sample-Size Dampening
|
||||
|
||||
The agreement ratio measures what fraction of directional signals (bullish + bearish, excluding neutral) agree on the majority direction. If 8 out of 10 directional signals are bullish, the raw agreement is `0.8`. But raw agreement is misleading with small sample sizes — 1 out of 1 signals agreeing gives a perfect `1.0` agreement, which is not meaningful.
|
||||
|
||||
To address this, the agreement is dampened by a logarithmic sample-size factor:
|
||||
|
||||
```
|
||||
agreement_dampener = min(1.0, log₂(unique_sources + 1) / log₂(8))
|
||||
```
|
||||
|
||||
This dampener saturates at `1.0` when `unique_sources` reaches approximately 7 (since `log₂(8) = 3.0` and `log₂(8) = 3.0`). With fewer sources, the dampener reduces the agreement contribution: 1 source gives a dampener of `0.33`, 3 sources give `0.67`, and 7 sources give the full `1.0`. The log₂ scaling means that each additional source provides diminishing marginal improvement to the dampener, which matches the intuition that the jump from 1 to 3 sources is far more meaningful than the jump from 15 to 17.
|
||||
|
||||
### Contradiction Penalty
|
||||
|
||||
The contradiction score computed by `services/aggregation/contradiction.py` is applied as a direct penalty:
|
||||
|
||||
```
|
||||
contradiction_penalty = contradiction_score × 0.4
|
||||
```
|
||||
|
||||
A contradiction score of `0.5` (perfect split) produces a penalty of `0.2`, which is substantial enough to push a moderately confident trend below the eligibility threshold.
|
||||
|
||||
### The Combined Formula
|
||||
|
||||
The four components are combined as:
|
||||
|
||||
```
|
||||
confidence = 0.3 × count_factor + 0.3 × avg_credibility + 0.4 × agreement − contradiction_penalty
|
||||
```
|
||||
|
||||
The result is clamped to `[0.0, 1.0]`. The weighting gives signal agreement the largest share (40%), reflecting the principle that consensus among diverse sources is the strongest indicator of a reliable trend. Source count and credibility each contribute 30%, providing a balanced assessment of evidence breadth and quality. The contradiction penalty can reduce confidence significantly — a highly contradicted trend with a score of 0.4 loses 0.16 points of confidence, which can easily drop it below the 0.35 eligibility gate.
|
||||
|
||||
---
|
||||
|
||||
## How Accumulating Signals Escalate Decisions
|
||||
|
||||
The trend direction, strength, and confidence computed by the aggregation engine are not just descriptive — they directly determine what action the system takes. The escalation path from passive observation to active trading is governed by the eligibility thresholds defined in `services/recommendation/eligibility.py`, and the key insight is that consecutive signals pointing in the same direction naturally strengthen the trend metrics that control this escalation.
|
||||
|
||||
### The Escalation Ladder
|
||||
|
||||
The `EligibilityConfig` dataclass in `services/recommendation/eligibility.py` defines the thresholds that map trend metrics to actions:
|
||||
|
||||
**Neutral (no recommendation).** A trend fails the eligibility gates entirely when confidence is below `0.35`, trend strength is below `0.10`, contradiction exceeds `0.60`, evidence count is below `2`, or the direction is neutral. The `_check_gates()` function evaluates these hard gates — if any gate fails, no recommendation is generated for that window.
|
||||
|
||||
**Watch.** A trend that passes the gates but has a direction of mixed, or has strength below `0.25` with confidence below `0.50`, maps to a `WATCH` action via `_determine_action()`. This is the system's way of saying "something is happening, but the evidence is not strong enough to act on." Watch recommendations are always `informational` mode — they are logged for human review but never trigger trades.
|
||||
|
||||
**Hold.** When the trend has a clear direction (bullish or bearish) but strength remains below `0.25` while confidence reaches `0.50` or above, the action maps to `HOLD`. This indicates that the directional signal is real but not yet strong enough for a position change. Like watch, hold recommendations are `informational` mode.
|
||||
|
||||
**Buy / Sell.** When trend strength reaches `0.25` or above with a bullish direction, the action is `BUY`. With a bearish direction at the same strength threshold, the action is `SELL`. These are the only actions that can escalate beyond informational mode — `_determine_mode()` evaluates whether the recommendation qualifies for `paper_eligible` (confidence ≥ `0.50`) or `live_eligible` (confidence ≥ `0.70`, contradiction ≤ `0.25`, evidence ≥ `5`).
|
||||
|
||||
### How Accumulation Drives Escalation
|
||||
|
||||
Consider a ticker that starts with no recent intelligence. The first bearish article arrives — a single document with negative sentiment. In the intraday window, this produces:
|
||||
|
||||
- **Trend strength** = `|avg_sentiment|` ≈ the absolute weighted sentiment from one signal, likely close to the impact score.
|
||||
- **Confidence** = low, because `count_factor = min(1/15, 0.8) = 0.067` and the agreement dampener is only `log₂(2)/log₂(8) = 0.33`.
|
||||
- **Direction** = bearish (if the weighted sentiment is ≤ -0.15).
|
||||
|
||||
With confidence well below `0.35`, this trend fails the eligibility gate entirely. No recommendation is generated. The system is in the neutral state.
|
||||
|
||||
A second bearish article arrives hours later. Now the intraday window has two signals:
|
||||
|
||||
- **Unique sources** = 2, so `count_factor = 0.133` and `agreement_dampener = log₂(3)/log₂(8) ≈ 0.53`.
|
||||
- **Agreement** = `1.0 × 0.53 = 0.53` (both signals agree on bearish).
|
||||
- **Confidence** ≈ `0.3 × 0.133 + 0.3 × avg_cred + 0.4 × 0.53` — likely around `0.35-0.45` depending on credibility.
|
||||
|
||||
If confidence crosses `0.35` and strength exceeds `0.10`, the trend passes the eligibility gates. But with strength below `0.25`, the action is `WATCH` or `HOLD` depending on confidence.
|
||||
|
||||
A third and fourth bearish article arrive over the next day. The 1-day window now has four agreeing signals:
|
||||
|
||||
- **Unique sources** = 4, so `count_factor = 0.267` and `agreement_dampener = log₂(5)/log₂(8) ≈ 0.77`.
|
||||
- **Agreement** = `1.0 × 0.77 = 0.77`.
|
||||
- **Confidence** ≈ `0.3 × 0.267 + 0.3 × avg_cred + 0.4 × 0.77` — likely `0.50-0.60`.
|
||||
- **Strength** = `|avg_sentiment|` — with four bearish signals and no contradicting evidence, this could easily exceed `0.25`.
|
||||
|
||||
Now the trend maps to `SELL` with `paper_eligible` mode (confidence ≥ `0.50`). The system has escalated from no recommendation to a paper-eligible sell recommendation purely through the accumulation of consistent bearish evidence.
|
||||
|
||||
If the bearish evidence continues — more documents, more sources, higher credibility — confidence climbs further. At confidence ≥ `0.70` with contradiction ≤ `0.25` and evidence ≥ `5`, the recommendation reaches `live_eligible` mode, the highest escalation level.
|
||||
|
||||
The same process works in reverse for bullish accumulation: consecutive positive signals strengthen the bullish trend, increase confidence through source diversity and agreement, and escalate from watch through hold to buy.
|
||||
|
||||
### The Role of Contradiction in Preventing False Escalation
|
||||
|
||||
Accumulation only works when signals agree. If the fifth article about a ticker is bullish while the previous four were bearish, the contradiction score jumps — `minority_weight / total_weight` increases because the minority (bullish) side now has non-zero weight. This has two effects: the contradiction penalty reduces confidence (potentially dropping it below an eligibility threshold), and if the contradiction exceeds `0.10` with `|avg_sentiment| < 0.30`, the direction flips to mixed, which maps to `WATCH` regardless of strength. The system effectively de-escalates when the evidence becomes contested, requiring a clearer consensus before re-escalating.
|
||||
|
||||
---
|
||||
|
||||
## Trend Projections
|
||||
|
||||
After the `TrendSummary` is assembled and persisted, the aggregation engine computes a forward-looking `TrendProjection` via `compute_projection()` in `services/aggregation/projection.py`. Projections estimate where the trend is heading based on current momentum, macro signal decay, and upcoming catalysts. They are advisory — they do not directly trigger recommendations — but they provide valuable context for human reviewers and can inform future automated decision-making.
|
||||
|
||||
### Momentum
|
||||
|
||||
The `compute_trend_momentum()` function computes the rate of change in signed trend strength between the current and previous aggregation cycles. If the current window shows a bearish trend at strength `0.40` and the previous cycle showed bearish at `0.30`, the momentum is `-0.10` (strengthening bearish). If no previous data is available, the function uses a heuristic: momentum is estimated as half the current signed strength, providing a reasonable baseline for new trends.
|
||||
|
||||
Momentum enters the projection as a half-weighted adjustment to the current signed strength:
|
||||
|
||||
```
|
||||
momentum_projected_signed = direction_sign × current_strength + momentum × 0.5
|
||||
```
|
||||
|
||||
This means momentum influences the projection but does not dominate it — a strong current trend with weakening momentum still projects as directional, just with reduced strength.
|
||||
|
||||
### Macro Decay
|
||||
|
||||
The `project_macro_decay()` function estimates how active macro events will evolve over the projection horizon. Each macro event has an `estimated_duration` that maps to a decay half-life:
|
||||
|
||||
| Duration | Half-Life |
|
||||
|----------|-----------|
|
||||
| `short_term` | 1 day |
|
||||
| `medium_term` | 7 days |
|
||||
| `long_term` | 30 days |
|
||||
|
||||
For each event, the function computes the projected remaining impact at the end of the horizon using exponential decay: `future_factor = 2^(−future_age_days / half_life)`. The impact is further scaled by a severity weight (`critical`: 1.0, `high`: 0.75, `moderate`: 0.5, `low`: 0.25). Positive and negative macro impacts are accumulated separately, and the projected macro direction is determined by comparing the two sides — bullish if positive exceeds negative by 20%, bearish if the reverse, mixed if both are present without a clear majority.
|
||||
|
||||
When the macro layer is enabled and macro events exist, the projection blends the company-specific momentum projection with the macro trajectory. The macro weight is capped at `0.4` (40% of the blended projection), ensuring that macro signals inform but do not overwhelm the company-specific trend. The blending formula combines the signed company projection with the signed macro projection:
|
||||
|
||||
```
|
||||
blended = company_weight × momentum_projected + macro_weight × macro_signed
|
||||
```
|
||||
|
||||
### Driving Factors
|
||||
|
||||
The projection records a list of human-readable driving factors that explain what is influencing the projected direction. These include momentum descriptions ("Positive momentum (+0.150) in recent trend strength"), macro impact projections ("Macro signals project bearish impact (strength 0.350) over 7d"), and upcoming catalysts drawn from the trend's `dominant_catalysts` list (limited to the top 3). If no specific factors are identified, a baseline continuation factor is recorded.
|
||||
|
||||
### Divergence Detection
|
||||
|
||||
After computing the projected direction, the function compares it to the current trend direction. If they differ — for example, the current trend is bearish but the projection is bullish due to decaying negative macro events and positive momentum — the projection is flagged with `diverges_from_current = True` and a divergence driving factor is appended. Divergence signals are particularly valuable because they indicate that the trend may be about to reverse, giving the recommendation engine and human reviewers an early warning.
|
||||
|
||||
The projection also flags low confidence when `projected_confidence` falls below the default threshold of `0.3`. Projection confidence starts at 80% of the current trend confidence (reflecting the inherent uncertainty of forward-looking estimates), with a small boost if macro data is available and a further reduction if the macro layer is disabled entirely.
|
||||
|
||||
---
|
||||
|
||||
## Persistence
|
||||
|
||||
Each aggregation cycle persists its results to four PostgreSQL tables, creating a durable record of the trend assessment and its supporting evidence.
|
||||
|
||||
### `trend_windows` — Current State
|
||||
|
||||
The `persist_trend_summary()` function in `services/aggregation/worker.py` upserts the `TrendSummary` into the `trend_windows` table, keyed by `(entity_type, entity_id, window)`. Each cycle overwrites the previous row for that ticker and window, so `trend_windows` always reflects the most recent assessment. The row includes the trend direction, strength, confidence, contradiction score, disagreement details (as JSON), supporting and opposing evidence document IDs (as JSON arrays), dominant catalysts, material risks, market context, and the generation timestamp.
|
||||
|
||||
### `trend_history` — Time-Series Snapshots
|
||||
|
||||
Immediately after the upsert, `persist_trend_summary()` also inserts a snapshot row into the `trend_history` table. Unlike `trend_windows`, this table is append-only — every aggregation cycle adds a new row, creating a time-series of how the trend evolved over time. The history table stores the direction, strength, confidence, contradiction score, catalysts, risks, and timestamp. This time-series data powers the trend charts in the dashboard and enables the momentum computation in `services/aggregation/projection.py` by providing the previous cycle's strength and direction. If the history insert fails (for example, if the table does not yet exist in a development environment), the failure is logged at debug level and does not block the main upsert.
|
||||
|
||||
### `trend_evidence` — Per-Document Rankings
|
||||
|
||||
The `persist_trend_evidence()` function writes detailed evidence ranking rows to the `trend_evidence` table, linked to the `trend_windows` row by its UUID. Each row records a document ID, its role (supporting or opposing), and the individual scoring components: rank score, weight component, impact component, recency component, confidence component, and sentiment value. Non-UUID document IDs (such as synthetic pattern signal IDs like `pattern:AAPL:earnings:7d`) are filtered out before insertion, since the `trend_evidence` table enforces a foreign key to the `documents` table.
|
||||
|
||||
### `trend_projections` — Forward-Looking Estimates
|
||||
|
||||
The `persist_trend_projection()` function in `services/aggregation/projection.py` inserts the `TrendProjection` into the `trend_projections` table, linked to the `trend_windows` row. The row stores the projected direction, strength, confidence, projection horizon, driving factors (as JSON), macro contribution percentage, divergence flag, and computation timestamp. Like trend history, projections accumulate over time, allowing analysis of how well the system's forward-looking estimates matched subsequent reality.
|
||||
|
||||
---
|
||||
|
||||
## What Comes Next
|
||||
|
||||
At this point, the aggregation engine has transformed weighted signals into `TrendSummary` objects across five time windows, detected contradictions, ranked evidence, computed confidence, and persisted everything to PostgreSQL. The trend metrics — direction, strength, confidence, contradiction score — encode the accumulated weight of evidence for each ticker. But a `TrendSummary` is still an assessment, not an action. The next stage translates these assessments into concrete recommendations: should the system buy, sell, hold, or simply watch? And with what conviction? [Page 5 — Recommendation Generation](05-recommendation-generation.md) explains how the recommendation engine applies data quality suppression, eligibility evaluation, position sizing, thesis generation, and risk classification to convert trend summaries into actionable `Recommendation` objects that the trading engine can execute.
|
||||
@@ -0,0 +1,226 @@
|
||||
# Page 5 — Recommendation Generation and Signal-to-Action Translation
|
||||
|
||||
The aggregation engine described in [Page 4](04-trend-aggregation-and-accumulating-signals.md) produces `TrendSummary` objects across five time windows for each ticker, encoding the direction, strength, confidence, contradiction level, and supporting evidence accumulated from all three signal layers. But a `TrendSummary` is an assessment — it describes what the evidence says, not what the system should do about it. The recommendation engine is where assessment becomes action. It takes each `TrendSummary`, subjects it to a series of deterministic evaluations, and produces a `Recommendation` object that specifies a concrete action (buy, sell, hold, or watch), an execution mode (informational, paper-eligible, or live-eligible), a position sizing guideline, a human-readable thesis, and a risk classification. Every decision in this pipeline is rule-based and fully traceable — the LLM is only involved in an optional downstream step that rewrites the thesis wording.
|
||||
|
||||
The recommendation worker in `services/recommendation/main.py` polls the `stonks:queue:recommendation` Redis queue for jobs, each specifying a ticker and time window. For each job, it delegates to `generate_recommendation()` in `services/recommendation/worker.py`, which orchestrates the full pipeline: fetch the latest trend summary, check for duplicate recommendations, fetch any available trend projection, evaluate data quality suppression, evaluate eligibility, optionally rewrite the thesis via LLM, build the `Recommendation` object, and persist everything to PostgreSQL. For a visual overview of this flow, see the [Recommendation Generation Flow diagram](diagrams/recommendation-generation-flow.md).
|
||||
|
||||
---
|
||||
|
||||
## Data Quality Suppression
|
||||
|
||||
Before the eligibility engine evaluates whether a trend is strong enough to act on, the suppression layer in `services/recommendation/suppression.py` asks a more fundamental question: is the underlying data reliable enough to act on at all? A trend might show high confidence and strong directionality, but if the documents feeding it are stale, poorly extracted, or drawn from a single source type, the apparent signal quality is illusory. The suppression layer acts as a pre-filter on data quality, running before the eligibility engine and forcing any recommendation built on unreliable data to `informational` mode regardless of how strong the trend metrics look.
|
||||
|
||||
The `evaluate_suppression()` function accepts a `TrendSummary` and a `DataQualityContext` — a set of metrics about the documents underlying the trend, populated by querying `documents` and `document_intelligence` tables for the evidence document IDs stored in the trend summary. When full document-level metrics are not available (for example, in a development environment without the full document pipeline), the function falls back to `build_quality_context_from_summary()`, which estimates quality metrics from the trend summary's own evidence counts and confidence.
|
||||
|
||||
### The Six Data Quality Checks
|
||||
|
||||
The suppression evaluation runs six independent checks, each comparing a data quality metric against a configurable threshold defined in `SuppressionConfig`. If any single check fails, the recommendation is suppressed:
|
||||
|
||||
1. **Low extraction confidence** — If the average extraction confidence across the evidence documents falls below `0.40` (`min_avg_extraction_confidence`), the underlying LLM extractions are too unreliable. This catches cases where the extractor struggled with document formatting, ambiguous content, or low-quality source material, as described in [Page 2](02-ai-agent-processing-and-extraction.md).
|
||||
|
||||
2. **Evidence staleness** — If the most recent evidence document is older than `168` hours (7 days, `max_evidence_staleness_hours`), the trend is based on outdated information. Markets move fast, and a week-old evidence base may no longer reflect current conditions. When documents exist but no timestamp is available, the evidence is conservatively treated as stale.
|
||||
|
||||
3. **Low source diversity** — If fewer than `1` distinct source type (`min_source_types`) contributed to the evidence, the signal may be driven by a single unreliable source class. In practice, this check fires when the quality context has documents but all come from the same source type (for example, all news articles with no filings or market data to corroborate).
|
||||
|
||||
4. **High extraction failure rate** — If more than `50%` (`max_extraction_failure_rate`) of the documents that should have contributed to the trend failed extraction entirely, the data pipeline is unreliable for this ticker. A high failure rate means the trend summary is built from a biased subset of the available evidence — the failed documents might have told a different story.
|
||||
|
||||
5. **Insufficient valid documents** — If fewer than `2` valid (non-failed) documents (`min_valid_documents`) contributed to the trend, there simply is not enough data to act on. A single document, no matter how high-quality, does not provide the corroboration needed for automated trading decisions.
|
||||
|
||||
6. **Low data quality score** — The `_compute_data_quality_score()` function computes an overall quality score from three weighted components: extraction confidence (40% weight, normalized against a 0.8 baseline), evidence freshness (30% weight, linear decay over the staleness window), and document coverage (30% weight, combining the valid/total ratio with a count factor that saturates at 10 documents). If this composite score falls below `0.30` (`min_data_quality_score`) and the low-confidence check has not already fired, a general suppression reason is added.
|
||||
|
||||
When any check triggers, the `SuppressionResult` records the specific reasons (as `SuppressionReason` enum values) and the computed data quality score. The worker in `services/recommendation/worker.py` uses this result to force the recommendation's mode to `informational` and append a suppression note to the thesis text, ensuring the suppression decision is visible in the audit trail.
|
||||
|
||||
### Safety Suppressions: Macro-Only and Pattern-Only Signals
|
||||
|
||||
Beyond the six data quality checks, two additional safety suppressions protect against acting on signals that lack company-specific corroboration:
|
||||
|
||||
**Macro-only suppression** (`evaluate_macro_only_suppression()`) fires when macro signals are the sole basis for a trend direction — no company-specific signals contributed at all. As described in [Page 3](03-signal-scoring-and-weighted-signals.md), macro signals enter the aggregation engine at a reduced weight of `0.3` relative to company signals. But even at reduced weight, macro signals alone can shift a trend direction if no company-specific evidence exists. When this happens, the recommendation is forced to `informational` mode with a caveat noting that the signal is macro-only and should not be used for automated trading.
|
||||
|
||||
**Pattern-only suppression** (`evaluate_pattern_only_suppression()`) applies the same logic to competitive/pattern signals. When pattern-based signals from `services/aggregation/pattern_matcher.py` and `services/aggregation/signal_propagation.py` are the sole contributors — no company-specific or macro signals — the recommendation is suppressed. Historical patterns are valuable context, but acting on them without any current evidence is too speculative for automated trading.
|
||||
|
||||
Both safety suppressions are evaluated in the worker after the main suppression check, and both force the mode to `informational` when triggered.
|
||||
|
||||
---
|
||||
|
||||
## Eligibility Evaluation
|
||||
|
||||
Recommendations that survive the suppression layer enter the eligibility evaluation in `services/recommendation/eligibility.py`. This is the core decision logic — a set of deterministic rules that map trend metrics to actions, execution modes, and position sizing. The `evaluate_eligibility()` function is the single entry point, accepting a `TrendSummary` and an `EligibilityConfig` of tunable thresholds.
|
||||
|
||||
### Gate Checks
|
||||
|
||||
The `_check_gates()` function applies five hard gates. If any gate fails, the trend is ineligible for a recommendation (though the action and mode are still computed for the audit trace):
|
||||
|
||||
| Gate | Threshold | Rejection Reason |
|
||||
|------|-----------|-----------------|
|
||||
| Confidence | ≥ `0.35` | `low_confidence` |
|
||||
| Trend strength | ≥ `0.10` | `low_trend_strength` |
|
||||
| Contradiction score | ≤ `0.60` | `high_contradiction` |
|
||||
| Evidence count | ≥ `2` (supporting + opposing) | `insufficient_evidence` |
|
||||
| Direction | ≠ `neutral` | `neutral_direction` |
|
||||
|
||||
These gates are intentionally conservative. A confidence threshold of `0.35` means the system needs meaningful evidence breadth and agreement before generating any recommendation at all (see the confidence computation in [Page 4](04-trend-aggregation-and-accumulating-signals.md)). The contradiction ceiling of `0.60` allows moderately contested trends through — only when the evidence is deeply split does the gate reject. The evidence minimum of `2` ensures that no recommendation is ever based on a single document.
|
||||
|
||||
When a trend fails any gate, the resulting `EligibilityResult` has `eligible = False` and the mode is forced to `informational`, regardless of what the mode escalation logic would otherwise compute.
|
||||
|
||||
### Action Mapping
|
||||
|
||||
The `_determine_action()` function maps the trend's direction and strength to one of four action types. The logic evaluates in a specific order:
|
||||
|
||||
**Mixed or neutral direction → WATCH.** If the trend direction is `mixed` (high contradiction with weak directional signal) or `neutral`, the action is always `WATCH`. There is no directional conviction to act on.
|
||||
|
||||
**Strong directional signal → BUY or SELL.** If the trend strength reaches `0.25` or above (`action_strength_threshold`), the action follows the direction: `BUY` for bullish, `SELL` for bearish. This threshold ensures that only trends with meaningful magnitude trigger position-changing actions.
|
||||
|
||||
**Weak directional signal with decent confidence → HOLD.** If the trend has a clear direction (bullish or bearish) but strength remains below `0.25`, the action depends on confidence. If confidence reaches `0.50` or above (`hold_confidence_threshold`), the action is `HOLD` — the system recognizes the directional lean but does not have enough conviction to recommend a position change. Below `0.50` confidence, the action falls to `WATCH`.
|
||||
|
||||
This mapping creates the escalation ladder described in [Page 4](04-trend-aggregation-and-accumulating-signals.md): as consecutive signals accumulate and strengthen the trend metrics, the action naturally progresses from WATCH → HOLD → BUY/SELL.
|
||||
|
||||
### Mode Escalation
|
||||
|
||||
The `_determine_mode()` function determines the highest execution mode allowed for the recommendation. Mode controls whether the recommendation is purely informational, eligible for paper trading, or eligible for live trading:
|
||||
|
||||
**WATCH and HOLD → always informational.** These actions do not trigger trades, so they are always `informational` mode. They are logged for human review and dashboard display but never enter the trading engine.
|
||||
|
||||
**BUY and SELL → escalation based on signal quality.** For actionable recommendations, mode escalates through three tiers:
|
||||
|
||||
- **`informational`** — The default when confidence is below `0.50`. The recommendation is recorded but not eligible for any trading.
|
||||
- **`paper_eligible`** — When confidence reaches `0.50` or above (`paper_confidence_threshold`). The recommendation can be picked up by the paper trading engine described in [Page 6](06-trading-decisions-and-execution.md).
|
||||
- **`live_eligible`** — The strictest tier, requiring confidence ≥ `0.70` (`live_confidence_threshold`), contradiction ≤ `0.25` (`live_max_contradiction`), and evidence count ≥ `5` (`live_min_evidence`). This triple gate ensures that only high-conviction, well-corroborated, low-contradiction recommendations can trigger live trades.
|
||||
|
||||
The evidence count for mode escalation is computed as the sum of supporting and opposing evidence documents, matching the same count used in the gate checks.
|
||||
|
||||
---
|
||||
|
||||
## Position Sizing
|
||||
|
||||
The `_compute_position_sizing()` function in `services/recommendation/eligibility.py` translates signal quality into a portfolio allocation guideline. Position sizing is not a fixed value — it scales dynamically with the confidence and strength of the underlying trend, penalized by contradiction and thin evidence.
|
||||
|
||||
### Base and Scaling
|
||||
|
||||
The computation starts with a base portfolio allocation of `1%` (`base_portfolio_pct = 0.01`) and scales upward based on two factors:
|
||||
|
||||
- **Confidence factor** — `0.8 × confidence` (`confidence_sizing_weight`), reflecting how much the system trusts the trend assessment.
|
||||
- **Strength factor** — `0.5 + 0.5 × trend_strength`, ranging from `0.5` (weakest trend) to `1.0` (strongest trend).
|
||||
|
||||
The raw portfolio percentage is computed as:
|
||||
|
||||
```
|
||||
raw_portfolio = base + confidence_factor × strength_factor × (max - base)
|
||||
```
|
||||
|
||||
where `max` is `10%` (`max_portfolio_pct = 0.10`). At maximum confidence (1.0) and maximum strength (1.0), the raw allocation reaches the full 10%. At typical values (confidence 0.6, strength 0.3), the raw allocation is considerably lower.
|
||||
|
||||
### Contradiction Penalty
|
||||
|
||||
The contradiction score applies a multiplicative penalty:
|
||||
|
||||
```
|
||||
portfolio_pct = raw_portfolio × (1.0 − 0.5 × contradiction_score)
|
||||
```
|
||||
|
||||
A contradiction score of `0.40` reduces the allocation by 20%. A score of `0.0` (no contradiction) applies no penalty. This ensures that contested trends receive smaller position sizes even when they pass the eligibility gates.
|
||||
|
||||
### Evidence Count Penalty
|
||||
|
||||
Thin evidence further reduces the allocation:
|
||||
|
||||
- Fewer than `3` evidence documents → multiply by `0.5` (halved).
|
||||
- Fewer than `5` evidence documents → multiply by `0.75`.
|
||||
- `5` or more documents → no penalty.
|
||||
|
||||
This penalty stacks with the contradiction penalty, so a trend with high contradiction and thin evidence receives a substantially reduced position size.
|
||||
|
||||
### Max Loss Scaling
|
||||
|
||||
The same scaling logic applies to the maximum loss percentage, which starts at a base of `0.3%` (`base_max_loss_pct = 0.003`) and scales up to `2%` (`max_max_loss_pct = 0.02`). Higher-conviction positions are allowed larger loss tolerances, while low-conviction or contested positions are constrained to tighter stops.
|
||||
|
||||
The final `PositionSizing` object (defined in `services/shared/schemas.py`) contains `portfolio_pct` and `max_loss_pct`, both clamped to their respective bounds. This object is embedded in the `Recommendation` and later consumed by the trading engine's own position sizer (described in [Page 6](06-trading-decisions-and-execution.md)), which applies additional portfolio-level constraints.
|
||||
|
||||
---
|
||||
|
||||
## Thesis Generation
|
||||
|
||||
Every recommendation includes a human-readable thesis that explains the reasoning behind the action. Thesis generation happens in two layers: a deterministic assembly that is always present, and an optional LLM rewrite that polishes the wording for trading-eligible recommendations.
|
||||
|
||||
### Deterministic Thesis Assembly
|
||||
|
||||
The `build_thesis()` function in `services/recommendation/worker.py` constructs a thesis string entirely from the trend data and eligibility result, with no model involvement. The thesis is assembled from several components in order:
|
||||
|
||||
1. **Opening** — States the ticker, trend direction, window, strength, and confidence. For example: "AAPL shows a bearish trend over the 7d window with strength 0.35 and confidence 0.62."
|
||||
|
||||
2. **Catalysts** — Lists the top three dominant catalysts from the `TrendSummary`, drawn from the evidence ranking described in [Page 4](04-trend-aggregation-and-accumulating-signals.md).
|
||||
|
||||
3. **Contradiction note** — If the contradiction score exceeds `0.15`, a note flags the signal disagreement and its magnitude.
|
||||
|
||||
4. **Trend projection** — When a `TrendProjection` is available and not flagged as low-confidence, the thesis incorporates the projected direction, strength, and top driving factors. If the projection diverges from the current trend, a divergence note is appended.
|
||||
|
||||
5. **Risks** — Lists the top two material risks from the `TrendSummary`.
|
||||
|
||||
6. **Evidence count** — States the number of supporting and opposing evidence documents.
|
||||
|
||||
7. **Prescriptive action** — States the recommended action and mode (e.g., "Recommendation: SELL (paper eligible).").
|
||||
|
||||
The deterministic thesis is always generated and serves as the audit reference. Even when the LLM rewrites the thesis, the deterministic version is preserved in the model metadata for traceability.
|
||||
|
||||
### Optional LLM Rewrite via the Thesis-Rewriter Agent
|
||||
|
||||
For recommendations that are both eligible and not suppressed, the worker optionally invokes the thesis-rewriter agent to polish the deterministic thesis into analyst-quality prose. The LLM rewrite is implemented in `services/recommendation/thesis_llm.py` and uses the `thesis-rewriter` agent slug, resolved at runtime through the `AgentConfigResolver` in `services/shared/agent_config.py`.
|
||||
|
||||
The `AgentConfigResolver` queries the `ai_agents` and `agent_variants` database tables to resolve the active configuration for the `thesis-rewriter` slug, preferring an active variant's model, timeout, and retry settings when one exists. The resolver uses a 60-second TTL in-memory cache to avoid hitting the database on every recommendation. This is the same resolution mechanism used by the document extractor and event classifier agents described in [Page 2](02-ai-agent-processing-and-extraction.md).
|
||||
|
||||
The `rewrite_thesis_with_llm()` function builds a prompt from the deterministic thesis and trend context (ticker, window, direction, strength, confidence, contradiction score, catalysts, risks), sends it to the local Ollama instance via HTTP, and returns the rewritten text. The system prompt enforces strict rules: no fabricated information, no numbers or facts not present in the input, under 150 words, neutral professional tone, and only the rewritten thesis text in the response.
|
||||
|
||||
The LLM layer is purely additive — if the call fails for any reason (network error, timeout, empty response, token budget exceeded), the original deterministic thesis is returned unchanged. The worker in `services/recommendation/main.py` resolves the thesis-rewriter configuration at startup and refreshes it every 50 jobs to pick up configuration changes without requiring a restart. When no database configuration exists for the `thesis-rewriter` slug, thesis rewriting is silently disabled.
|
||||
|
||||
Performance logging for the thesis-rewriter is written to the `agent_performance_log` table, recording success/failure, duration, estimated token counts, and the variant ID. Token budget enforcement checks hourly usage against the variant's configured budget before making the LLM call, preventing runaway costs from high-volume recommendation cycles.
|
||||
|
||||
### Risk Classification Prefix
|
||||
|
||||
Before the thesis is stored, the `classify_risk()` function in `services/recommendation/worker.py` assigns a risk classification label that is prepended to the thesis text as a `[risk:<level>]` prefix. The classification is computed from a composite score:
|
||||
|
||||
| Factor | Contribution |
|
||||
|--------|-------------|
|
||||
| Contradiction score | `contradiction × 2.0` |
|
||||
| Low confidence | `(1.0 − confidence) × 1.5` |
|
||||
| Low evidence count | `+1.0` if < 3 docs, `+0.5` if < 5 docs |
|
||||
| Rejection reasons | `+0.5` per rejection reason |
|
||||
|
||||
The composite score maps to four levels:
|
||||
|
||||
| Score Range | Classification |
|
||||
|-------------|---------------|
|
||||
| ≥ 3.0 | `very_high` |
|
||||
| ≥ 2.0 | `high` |
|
||||
| ≥ 1.0 | `moderate` |
|
||||
| < 1.0 | `low` |
|
||||
|
||||
A recommendation with high contradiction (0.4 → contributes 0.8), moderate confidence (0.55 → contributes 0.675), and 4 evidence documents (contributes 0.5) would score 1.975, classifying as `moderate`. The same recommendation with only 2 evidence documents would score 2.475, pushing it to `high`. This classification gives downstream consumers — both the trading engine and human reviewers — a quick risk signal without needing to re-evaluate the underlying metrics.
|
||||
|
||||
---
|
||||
|
||||
## Persistence
|
||||
|
||||
The recommendation pipeline persists its output to three PostgreSQL tables, creating a complete audit trail from trend assessment through decision logic to the final recommendation.
|
||||
|
||||
### `recommendations` — The Core Record
|
||||
|
||||
The `persist_recommendation()` function in `services/recommendation/worker.py` inserts the `Recommendation` into the `recommendations` table. Each row captures the ticker, action, mode, confidence, time horizon, thesis (including the risk classification prefix and any suppression notes), invalidation conditions (as JSONB), position sizing (portfolio percentage and max loss percentage), model metadata (provider, model name, prompt version, schema version), risk classification, and generation timestamp. The insert returns the recommendation's UUID, which serves as the foreign key for the evidence and risk evaluation tables.
|
||||
|
||||
### `recommendation_evidence` — Evidence Citations
|
||||
|
||||
For each evidence document referenced in the recommendation, a row is inserted into the `recommendation_evidence` table linking the recommendation UUID to the document UUID, with an evidence type (`supporting` or `opposing`) and a position-based weight that decays with rank: `weight = 1.0 / (1.0 + index × 0.1)`. The first supporting document gets weight `1.0`, the second gets `0.91`, the third `0.83`, and so on. Non-UUID document IDs (such as synthetic pattern signal IDs like `pattern:AAPL:earnings:7d` from the competitive signal layer) are filtered out before insertion, since the table enforces a foreign key to the `documents` table.
|
||||
|
||||
### `risk_evaluations` — Decision Audit Trail
|
||||
|
||||
The `risk_evaluations` table records the full eligibility decision for each recommendation: whether the trend was eligible, the allowed mode, the list of rejection reasons (as JSONB), and a `risk_checks` JSONB object containing the time horizon, position sizing details, invalidation conditions, and risk classification. This table enables post-hoc analysis of why the system made a particular decision — auditors can trace from the recommendation back through the eligibility evaluation to the underlying trend metrics.
|
||||
|
||||
---
|
||||
|
||||
## Deduplication
|
||||
|
||||
Before running the full evaluation pipeline, the worker checks whether the latest recommendation for the same ticker and time horizon is effectively identical to what would be generated. The `_is_duplicate_recommendation()` function in `services/recommendation/worker.py` compares the previous recommendation's action, mode, and confidence (within a `0.01` tolerance) against the current eligibility result. If all three match, the recommendation is skipped — the underlying trend data has not changed meaningfully since the last cycle. This prevents the system from flooding the `recommendations` table with identical entries on every aggregation cycle, while still generating a new recommendation whenever the trend metrics shift enough to change the action, mode, or confidence.
|
||||
|
||||
---
|
||||
|
||||
## What Comes Next
|
||||
|
||||
At this point, the recommendation engine has translated trend assessments into concrete `Recommendation` objects — each with an action, execution mode, position sizing guideline, thesis, and risk classification — and persisted them alongside their evidence citations and eligibility audit trails. Recommendations marked as `paper_eligible` or `live_eligible` are now available for the trading engine to consume. [Page 6 — Trading Decisions and Execution](06-trading-decisions-and-execution.md) explains how the trading engine polls these recommendations, applies its own pre-trade check sequence (circuit breakers, trading windows, confidence gates, deduplication, declining positions, and max open positions), computes final position sizes with portfolio-level constraints, and submits orders through the broker adapter to Alpaca's paper trading API.
|
||||
@@ -0,0 +1,199 @@
|
||||
# Page 6 — Trading Decisions and Execution
|
||||
|
||||
The recommendation engine described in [Page 5](05-recommendation-generation.md) produces `Recommendation` objects with an action, execution mode, position sizing guideline, thesis, and risk classification. Recommendations marked as `paper_eligible` or `live_eligible` are persisted to the `recommendations` table and are now available for the final stage of the pipeline: autonomous trade execution. The trading engine in `services/trading/engine.py` is where intelligence becomes action. It polls eligible recommendations, subjects each one to a strict sequence of pre-trade safety checks, computes a portfolio-aware position size, and — if every gate passes — submits an order through the broker adapter to Alpaca's paper trading API. Every evaluation, whether it results in a trade or a skip, is recorded as a `TradingDecision` in the `trading_decisions` table, creating a complete audit trail from the original document signal through to the broker response.
|
||||
|
||||
For a visual overview of the decision flow, see the [Trading Engine Decision Loop diagram](diagrams/trading-engine-decision-loop.md).
|
||||
|
||||
---
|
||||
|
||||
## The Trading Engine Decision Loop
|
||||
|
||||
The `TradingEngine` class in `services/trading/engine.py` is the orchestrator. When `start()` is called, it loads the current portfolio state from PostgreSQL — open positions, reserve pool balance, sector exposure, portfolio heat — and then spawns five concurrent `asyncio` tasks that run for the lifetime of the engine:
|
||||
|
||||
1. **`_decision_loop()`** — The core polling loop. Every 60 seconds (configurable via `polling_interval_seconds`), it queries the `recommendations` table for rows where `action IN ('buy', 'sell')`, `mode IN ('paper_eligible', 'live_eligible')`, and `generated_at` is within the last two hours. Recommendations are ordered by confidence descending and capped at 50 per cycle. For each recommendation, the engine fetches the current market price (first from `market_snapshots`, falling back to the Polygon API), then runs the full pre-trade evaluation pipeline described below.
|
||||
|
||||
2. **`_stop_loss_monitor()`** — Periodically checks current prices against the stop-loss and take-profit levels maintained by the `StopLossManager` in `services/trading/stop_loss_manager.py`. When a price crosses a stop-loss or take-profit threshold, the monitor submits a sell order to the broker queue. The `StopLossManager` computes initial levels from ATR and risk tier parameters, re-evaluates them when volatility shifts materially (ATR change > 10%), activates trailing stops when the price moves more than 50% toward the take-profit target, and tightens stops proactively when portfolio heat exceeds 80% of the maximum.
|
||||
|
||||
3. **`_performance_loop()`** — Computes portfolio-wide performance metrics (total value, unrealized and realized P&L, win rate, Sharpe ratio, drawdown, portfolio heat), persists daily snapshots to `portfolio_snapshots`, checks for daily-loss circuit breaker triggers, evaluates profit-taking opportunities, and synchronizes positions with the database to detect closed positions and trigger reserve pool siphoning.
|
||||
|
||||
4. **`_risk_tier_scheduler()`** — Runs once daily at 16:00 ET (market close). It loads the latest `PerformanceMetrics` from `portfolio_snapshots`, computes the reserve pool as a fraction of total portfolio value, and delegates to the `RiskTierController` in `services/trading/risk_tier_controller.py` to determine whether the active risk tier should change. Tier changes are persisted to `risk_tier_history` and take effect immediately for subsequent decision cycles.
|
||||
|
||||
5. **`_rebalance_scheduler()`** — Runs weekly on Monday at 09:45 ET (shortly after market open). It loads current positions, evaluates them against the active risk tier's constraints using the `PortfolioRebalancer`, and pushes any rebalance sell orders to `stonks:queue:broker_orders`. The rebalancer respects the circuit breaker — if any breaker is active, the rebalance cycle is skipped entirely.
|
||||
|
||||
All five tasks run concurrently within a single `asyncio` event loop. Graceful shutdown via `stop()` cancels all tasks and awaits their completion. If any task encounters an unexpected exception, it logs the error and retries after a brief sleep rather than crashing the engine.
|
||||
|
||||
---
|
||||
|
||||
## Pre-Trade Check Sequence
|
||||
|
||||
When the decision loop picks up a buy recommendation, it calls `evaluate_recommendation()` — a synchronous method that runs the full pre-trade check sequence. The checks are applied in a strict order, and the first failure short-circuits the evaluation with a `skip` decision. This fail-fast design ensures that expensive downstream computations (like position sizing and correlation analysis) are never reached when a simple gate would have rejected the trade.
|
||||
|
||||
The six checks, in order:
|
||||
|
||||
**a. Circuit breaker check.** The engine calls `self.circuit_breaker.is_active()` on the current `CircuitBreakerState`. If any circuit breaker is active and its cooldown has not expired, the recommendation is skipped with reason `circuit_breaker_active`. The circuit breaker mechanism is described in detail below.
|
||||
|
||||
**b. Trading window check.** The `is_within_trading_window()` function verifies that the current time falls within US market hours. Outside the trading window, no orders are submitted — the recommendation is skipped with reason `outside_trading_window`.
|
||||
|
||||
**c. Confidence gate.** The recommendation's confidence score is compared against the active risk tier's `min_confidence` threshold. A conservative tier requires confidence ≥ 0.75, moderate requires ≥ 0.55, and aggressive requires ≥ 0.40. If the recommendation's confidence falls below the tier minimum, it is skipped with reason `insufficient_confidence`. This gate ensures that the risk tier's conservatism is enforced before any capital allocation is considered.
|
||||
|
||||
**d. Deduplication check.** The engine maintains an in-memory set of processed recommendation IDs (`processed_recommendation_ids`) and also checks Redis via `stonks:dedupe:trading:*` keys (with a 24-hour TTL). If the recommendation has already been evaluated in this engine session or by a previous instance, it is skipped with reason `duplicate_recommendation`. This prevents the same recommendation from generating multiple orders across polling cycles.
|
||||
|
||||
**e. Declining positions check.** The `check_declining_positions()` method examines all open positions. If more than 50% of positions have unrealized losses exceeding 2% of their entry value, the engine halts new entries with reason `multiple_declining_positions`. This is a portfolio-level safety valve — when the majority of existing positions are underwater, adding new exposure compounds the risk.
|
||||
|
||||
**f. Max open positions check.** The engine enforces a configurable maximum number of concurrent positions (default 10). If the portfolio is already at capacity, the recommendation is skipped with reason `max_positions_reached`.
|
||||
|
||||
For sell recommendations, the engine follows a separate, simpler path: it verifies the trading window, looks up the existing position for the ticker, and submits a market sell order for the full position quantity without running the position sizer. Sell decisions still generate a `TradingDecision` audit record and set the Redis deduplication key.
|
||||
|
||||
If all six checks pass for a buy recommendation, the engine proceeds to position sizing.
|
||||
|
||||
---
|
||||
|
||||
## Position Sizing
|
||||
|
||||
The `PositionSizer` in `services/trading/position_sizer.py` translates a recommendation's signal quality into a concrete dollar amount and share count, applying a sequential pipeline of adjustments that account for confidence, portfolio composition, sector concentration, correlation, and upcoming earnings events. The sizer operates on the *active pool* — the portion of the portfolio available for trading after subtracting the reserve pool balance.
|
||||
|
||||
### Base Sizing
|
||||
|
||||
The computation begins with a base allocation percentage derived from the risk tier:
|
||||
|
||||
```
|
||||
base_allocation_pct = risk_tier.max_position_pct × 0.5
|
||||
raw_pct = base_allocation_pct × (confidence / min_confidence)
|
||||
```
|
||||
|
||||
The base starts at half the tier's maximum position percentage, then scales linearly with how far the recommendation's confidence exceeds the tier minimum. A moderate-tier recommendation with confidence 0.70 against a minimum of 0.55 would produce a raw percentage of `0.05 × (0.70 / 0.55) ≈ 0.0636`, or about 6.4% of the active pool. The raw percentage is clamped to `max_position_pct` (5% for conservative, 10% for moderate, 15% for aggressive) and then converted to a dollar amount against the active pool. An absolute position cap (default $50) provides a hard ceiling regardless of pool size — a safety measure for the paper trading environment.
|
||||
|
||||
### Correlation-Aware Diversification
|
||||
|
||||
The sizer computes a weighted average correlation between the candidate ticker and all existing positions, using the pairwise correlation matrix that the engine refreshes from 30 days of daily close prices in `market_snapshots`. Each existing position's correlation is weighted by its market value, so larger positions have more influence on the diversification check.
|
||||
|
||||
If the weighted average correlation exceeds 0.8, the position is rejected outright — the portfolio already has too much exposure to correlated assets. Between 0.5 and 0.8, the dollar amount is reduced proportionally: a correlation of 0.65 produces a scale factor of `1.0 − (0.65 − 0.5) / (0.8 − 0.5) = 0.5`, halving the position size. Below 0.5, no reduction is applied.
|
||||
|
||||
### Sector Exposure Reduction
|
||||
|
||||
The sizer checks whether adding the new position would push the sector's total exposure beyond the risk tier's `max_sector_pct` (20% for conservative, 30% for moderate, 40% for aggressive). If the sector is already at its limit, the position is rejected. If the new position would exceed the limit, the dollar amount is reduced to exactly fill the remaining sector capacity.
|
||||
|
||||
### Diversification Bonus
|
||||
|
||||
When the portfolio holds fewer than three distinct sectors and the candidate ticker belongs to a new sector, the sizer applies a 1.2× bonus to the dollar amount. This incentivizes early diversification — the first few positions are encouraged to spread across sectors rather than concentrating in a single one. The bonus is re-clamped to `max_position_pct` after application to prevent oversized positions.
|
||||
|
||||
### Earnings Proximity Adjustment
|
||||
|
||||
The sizer checks the earnings calendar for the candidate ticker. If earnings are within one trading day, the position is rejected entirely — the binary risk of an earnings surprise is too high for automated entry. If earnings are within three trading days, the dollar amount is reduced by 50%. Beyond three days, no adjustment is applied.
|
||||
|
||||
### Portfolio Heat Check and Share Rounding
|
||||
|
||||
After all adjustments, the sizer estimates the new position's contribution to portfolio heat (the aggregate risk from stop-loss distances across all positions). If adding the position would push total heat beyond `max_portfolio_heat × active_pool` (10% for conservative, 20% for moderate, 30% for aggressive), the position is rejected.
|
||||
|
||||
Finally, the dollar amount is converted to whole shares via `floor(dollar_amount / current_price)`. If rounding produces zero shares (the position is too small for even one share at the current price), the position is rejected. The final dollar amount is recalculated from the whole-share quantity to reflect the actual capital deployed.
|
||||
|
||||
The `PositionSizeResult` returned to the engine includes the dollar amount, share quantity, allocation percentage, a list of human-readable adjustment notes, and a rejected flag with reason if any step failed. These adjustment notes are embedded in the `TradingDecision`'s `decision_trace` for full auditability.
|
||||
|
||||
---
|
||||
|
||||
## Circuit Breaker
|
||||
|
||||
The `CircuitBreaker` in `services/trading/circuit_breaker.py` is a pure computation module that evaluates three independent trigger conditions. It carries no state of its own — the engine manages the `CircuitBreakerState` dataclass and persists trigger events to the `circuit_breaker_events` table and Redis keys under `stonks:trading:circuit_breaker:*`.
|
||||
|
||||
### Three Trigger Types
|
||||
|
||||
**Daily loss trigger.** When the portfolio's daily P&L loss exceeds 5% of total portfolio value (`daily_loss_pct = 0.05`), the circuit breaker activates. The `check_daily_loss()` method compares the absolute loss ratio against the threshold. The cooldown duration is set to `volatility_pause_hours` (default 2 hours). The performance loop in the engine calls `_check_circuit_breaker_daily_loss()` periodically to evaluate this condition against the latest portfolio metrics. In extreme cases where the drawdown exceeds an emergency threshold, the reserve pool's emergency liquidation mechanism may also be triggered.
|
||||
|
||||
**Single position loss trigger.** When any individual position loses more than 15% of its entry value (`single_position_loss_pct = 0.15`), the circuit breaker activates with a ticker-specific cooldown. The `check_single_position()` method evaluates the loss percentage. The cooldown for the affected ticker is set to `ticker_cooldown_hours` (default 48 hours), during which the engine will not re-enter that ticker. The `is_ticker_cooled_down()` method checks whether a specific ticker is still within its cooldown window by consulting the `ticker_cooldowns` dictionary in the `CircuitBreakerState`.
|
||||
|
||||
**Volatility trigger (stop-loss clustering).** When three or more stop-losses fire within a 30-minute rolling window (`stop_loss_hits_threshold = 3`, `stop_loss_window_minutes = 30`), the circuit breaker activates. The `check_volatility()` method uses a sliding window algorithm: it sorts the stop-loss timestamps and checks every contiguous subsequence of length `stop_loss_hits_threshold` to see if it fits within the window. This detects rapid-fire stop-loss cascades that indicate extreme market volatility. The cooldown is `volatility_pause_hours` (default 2 hours).
|
||||
|
||||
### Cooldown Computation
|
||||
|
||||
The `compute_cooldown_expiry()` method calculates when a triggered breaker expires. For `daily_loss` and `volatility` triggers, the expiry is `triggered_at + volatility_pause_hours`. For `single_position` triggers, the expiry is `triggered_at + ticker_cooldown_hours`, giving the affected ticker a longer cooling-off period. The `is_active()` method returns `True` when the breaker is flagged active and the current time has not yet passed the cooldown expiry.
|
||||
|
||||
### Redis State Tracking
|
||||
|
||||
The engine persists circuit breaker state to Redis under the `stonks:trading:circuit_breaker:*` key pattern (constructed by `trading_cb_key()` in `services/shared/redis_keys.py`). Each trigger type gets its own key — for example, `stonks:trading:circuit_breaker:daily_loss` — storing the activation timestamp and cooldown expiry. This allows the state to survive engine restarts and enables external monitoring tools to query breaker status without accessing the engine's memory.
|
||||
|
||||
---
|
||||
|
||||
## Reserve Pool
|
||||
|
||||
The `ReservePoolController` in `services/trading/reserve_pool.py` manages an untouchable cash reserve that grows from realized trading profits. The reserve serves two purposes: it provides a buffer against drawdowns, and its size relative to the portfolio influences risk tier upgrade decisions.
|
||||
|
||||
### Profit Siphoning
|
||||
|
||||
When the engine detects a closed position with positive unrealized P&L (via `_sync_positions_and_siphon()` in the performance loop), it calls `siphon_profit()` on the controller. The method transfers a configurable fraction of the realized profit into the reserve — by default 20% (`siphon_pct = 0.20`). Only positive profits are siphoned; losses do not reduce the reserve balance. Each siphon event is recorded in the `reserve_pool_ledger` table with the transfer amount, resulting balance, trigger type (`profit_siphon`), the ticker as reference, and a timestamp.
|
||||
|
||||
### High-Water Mark Rebalancing
|
||||
|
||||
The `is_high_water()` method returns `True` when the reserve balance exceeds 30% of total portfolio value (`high_water_pct = 0.30`). This signal is consumed by the risk tier scheduler — when the reserve is healthy and other performance criteria are met, the controller may recommend upgrading to a more aggressive tier. The high-water mark acts as a confidence indicator: a large reserve means the system has been consistently profitable and can afford to take on more risk.
|
||||
|
||||
### Emergency Liquidation
|
||||
|
||||
The `should_emergency_liquidate()` method checks whether the current drawdown exceeds an emergency threshold. When triggered, `emergency_liquidate()` returns the full reserve balance for release back into the active pool. The caller (the engine) is responsible for zeroing the persisted balance and recording the ledger entry. Emergency liquidation is a last resort — it sacrifices the safety buffer to prevent the portfolio from hitting a catastrophic loss level.
|
||||
|
||||
### Active Pool Computation
|
||||
|
||||
The `compute_active_pool()` method calculates the capital available for trading: `active_pool = total_portfolio_value − reserve_balance`. All position sizing computations use the active pool rather than the total portfolio value, ensuring that the reserve is never inadvertently deployed into new positions.
|
||||
|
||||
---
|
||||
|
||||
## Risk Tier Auto-Adjustment
|
||||
|
||||
The `RiskTierController` in `services/trading/risk_tier_controller.py` evaluates portfolio performance and determines whether the active risk tier should shift. The system supports three tiers — conservative, moderate, and aggressive — each defined by a `RiskTierConfig` dataclass in `services/trading/models.py` with distinct parameter values:
|
||||
|
||||
| Parameter | Conservative | Moderate | Aggressive |
|
||||
|-----------|-------------|----------|------------|
|
||||
| `min_confidence` | 0.75 | 0.55 | 0.40 |
|
||||
| `max_position_pct` | 5% | 10% | 15% |
|
||||
| `stop_loss_atr_multiplier` | 1.5× | 2.0× | 2.5× |
|
||||
| `reward_risk_ratio` | 2.0 | 1.5 | 1.2 |
|
||||
| `max_sector_pct` | 20% | 30% | 40% |
|
||||
| `max_portfolio_heat` | 10% | 20% | 30% |
|
||||
|
||||
The tier controller's `evaluate()` method checks two conditions:
|
||||
|
||||
**Downgrade (any one triggers).** If the trailing 30-day win rate drops below 40% or the current drawdown exceeds 15%, the tier steps down by one level (e.g., aggressive → moderate). If the system is already at conservative, no further downgrade is possible.
|
||||
|
||||
**Upgrade (all must be true).** If the win rate exceeds 55%, the reserve pool exceeds 20% of total portfolio value, and the current drawdown is below 5%, the tier steps up by one level. The triple requirement ensures that upgrades only happen when the system is performing well, has built a safety cushion, and is not in a drawdown.
|
||||
|
||||
The risk tier scheduler in the engine evaluates these conditions daily at market close. When a tier change occurs, it is persisted to the `risk_tier_history` table with the previous tier, new tier, trigger source (`auto_adjustment`), and the metrics that drove the decision (win rate, drawdown, reserve percentage, Sharpe ratio). The new tier takes effect immediately — the engine updates its `_active_risk_tier` reference, and all subsequent decision cycles use the new tier's parameters for confidence gates, position sizing, stop-loss computation, and sector exposure limits.
|
||||
|
||||
---
|
||||
|
||||
## Order Submission Flow
|
||||
|
||||
When `evaluate_recommendation()` returns an `act` decision, the engine constructs an order job and pushes it through a multi-stage submission pipeline that spans two services.
|
||||
|
||||
### TradingDecision Persistence
|
||||
|
||||
Every evaluation — whether it results in `act` or `skip` — produces a `TradingDecision` dataclass that is persisted to the `trading_decisions` table via `_persist_decision()`. The record captures the recommendation ID, decision outcome, skip reason (if applicable), ticker, computed position size and share quantity, the risk tier at the time of decision, portfolio heat, active pool and reserve pool balances, circuit breaker status, correlation and sector exposure check results, earnings proximity flag, and a `decision_trace` JSONB field containing the full reasoning chain. This creates a complete audit record of every recommendation the engine evaluated and why it acted or declined.
|
||||
|
||||
### Order Enqueue
|
||||
|
||||
For `act` decisions, the engine builds an order job dictionary containing the trading decision ID, ticker, action (buy or sell), quantity, and order type (market). This job is pushed via `rpush` to the `stonks:queue:broker_orders` Redis queue (constructed by `queue_key(QUEUE_BROKER)` from `services/shared/redis_keys.py`). The engine immediately deducts the estimated order cost from the in-memory active pool to prevent over-allocation across concurrent recommendation evaluations within the same polling cycle.
|
||||
|
||||
### Broker Service Processing
|
||||
|
||||
The broker service in `services/adapters/broker_service.py` runs as a standalone worker that polls `stonks:queue:broker_orders` via `blpop`. For each order job, `process_order_job()` executes a multi-step pipeline:
|
||||
|
||||
1. **Idempotency check.** A deterministic idempotency key is generated from the job's ticker, action, quantity, and trading decision ID. The service checks Redis first (fast path) and then the `orders` table (durable fallback) to prevent duplicate submissions. If a matching key exists, the job is silently dropped.
|
||||
|
||||
2. **Risk evaluation.** The service loads the current `PortfolioRiskConfig` from the database and the account's risk state (open positions, daily P&L, sector exposure) from both the database and the Alpaca API. The `evaluate_order()` function runs the proposed order through a set of risk checks — position limits, sector concentration, daily loss thresholds — and produces an evaluation result. The evaluation is persisted to the `risk_evaluations` table regardless of outcome.
|
||||
|
||||
3. **Alpaca submission.** If the risk evaluation passes, the service calls `submit_order()` on the `AlpacaBrokerAdapter` in `services/adapters/broker_adapter.py`. The adapter constructs the Alpaca REST API payload (symbol, quantity, side, order type, time in force) and submits it to `paper-api.alpaca.markets/v2/orders` with an idempotency key header. The adapter follows a fail-closed policy: any network error or ambiguous response returns a rejected `OrderResponse` rather than risking duplicate orders.
|
||||
|
||||
4. **Persistence and audit trail.** The `persist_order()` function writes the order to the `orders` table with the full request and response details, risk evaluation results, and the recommendation ID for traceability. When the order is filled, the fill details (price, quantity) are recorded. Order events are published to the analytical lakehouse via MinIO for downstream analysis. The Redis idempotency marker is set after successful persistence to prevent reprocessing.
|
||||
|
||||
The result is a complete chain of custody: from the original document that produced a signal (Pages [1](01-data-ingestion-and-preparation.md)–[2](02-ai-agent-processing-and-extraction.md)), through signal scoring ([Page 3](03-signal-scoring-and-weighted-signals.md)) and trend aggregation ([Page 4](04-trend-aggregation-and-accumulating-signals.md)), to the recommendation ([Page 5](05-recommendation-generation.md)), the trading decision, the risk evaluation, and the broker response — every step is persisted and linked by foreign keys. The `trading_decisions` table links to `recommendations` via `recommendation_id`, the `orders` table links back to both, and the `positions` and `portfolio_snapshots` tables capture the portfolio impact over time.
|
||||
|
||||
For additional reference on the trading engine's configuration, queue topology, and database tables, see [docs/services.md](../services.md).
|
||||
|
||||
---
|
||||
|
||||
## Conclusion: From Raw Data to Trade Execution
|
||||
|
||||
This six-page series has traced the full intelligence-to-decision pipeline in Stonks Oracle, from the moment raw data enters the system to the moment an order reaches the broker.
|
||||
|
||||
It began with [Page 1](01-data-ingestion-and-preparation.md), where the scheduler orchestrates ingestion cycles across four data sources — Polygon news, SEC EDGAR filings, Polygon market data, and macro news APIs — and the parser normalizes raw content into structured documents ready for AI processing. [Page 2](02-ai-agent-processing-and-extraction.md) described how the Document Intelligence Extractor and Global Event Classifier agents use LLM inference to produce structured JSON intelligence, with hot-swappable model configurations and a robust JSON repair pipeline. [Page 3](03-signal-scoring-and-weighted-signals.md) explained how raw extraction output is transformed into `WeightedSignal` objects through a composite formula that balances recency, credibility, novelty, and market context across three independent signal layers. [Page 4](04-trend-aggregation-and-accumulating-signals.md) showed how the aggregation engine merges these signals across five time windows, detecting contradictions, ranking evidence, and computing trend projections — with consecutive same-direction signals accumulating to escalate the system's response from neutral through watch and hold to buy or sell. [Page 5](05-recommendation-generation.md) covered the translation of trend assessments into actionable recommendations through data quality suppression, eligibility evaluation, position sizing, thesis generation, and risk classification.
|
||||
|
||||
And here in Page 6, the pipeline reached its terminus: the trading engine's decision loop polling those recommendations, subjecting each to circuit breaker checks, confidence gates, deduplication, portfolio health assessments, and a multi-step position sizer — then submitting approved orders through the broker adapter to Alpaca's paper trading API, with every decision recorded in a fully auditable trail from signal to execution.
|
||||
|
||||
The pipeline is designed to be conservative by default and transparent throughout. Every stage applies its own safety checks — deduplication at ingestion, confidence gates at extraction, contradiction detection at aggregation, suppression at recommendation, and circuit breakers at trading. The system can be tuned through runtime configuration (risk tier parameters, suppression thresholds, signal layer toggles in `risk_configs`) without code changes or restarts. And the complete audit trail — from `documents` through `document_intelligence`, `document_impact_records`, `trend_windows`, `recommendations`, `trading_decisions`, and `orders` — means that any trade can be traced back to the specific documents, signals, and decisions that produced it.
|
||||
@@ -0,0 +1 @@
|
||||
|
||||
@@ -0,0 +1,81 @@
|
||||
# Ingestion-to-Extraction Flow
|
||||
|
||||
```mermaid
|
||||
flowchart TD
|
||||
subgraph Scheduler["Scheduler\nservices/scheduler/app.py"]
|
||||
S1["schedule_cycle()"]
|
||||
S2["Cadence check\nmarket_api: 300s\nnews_api: 300s\nfilings_api: 3600s\nmacro_news: 600s"]
|
||||
S3["Rate limit check\ncheck_rate_limit()"]
|
||||
S1 --> S2 --> S3
|
||||
end
|
||||
|
||||
S3 -->|"rpush"| Q_ING["stonks:queue:ingestion"]
|
||||
|
||||
Q_ING -->|"lpop"| ING
|
||||
|
||||
subgraph ING["Ingestion Worker\nservices/ingestion/worker.py"]
|
||||
direction TB
|
||||
AD["Adapter Dispatch\nprocess_job()"]
|
||||
AD --> PA["PolygonMarketAdapter\nservices/adapters/market_adapter.py"]
|
||||
AD --> PB["PolygonNewsAdapter\nservices/adapters/news_adapter.py"]
|
||||
AD --> PC["SECEdgarAdapter\nservices/adapters/filings_adapter.py"]
|
||||
AD --> PD["MacroNewsAdapter\nservices/adapters/macro_news_adapter.py"]
|
||||
AD --> PE["WebScrapeAdapter\nservices/adapters/web_scrape_adapter.py"]
|
||||
end
|
||||
|
||||
ING -->|"Content hash check\nstonks:dedupe:*\nTTL 24h"| REDIS_DEDUPE[("Redis\nDedupe Markers")]
|
||||
|
||||
ING -->|"upload_raw_artifact()"| MINIO_RAW
|
||||
|
||||
subgraph MINIO_RAW["MinIO Raw Storage"]
|
||||
B1["stonks-raw-market"]
|
||||
B2["stonks-raw-news"]
|
||||
B3["stonks-raw-filings"]
|
||||
end
|
||||
|
||||
ING -->|"persist_ingestion_items()"| PG_ING
|
||||
|
||||
subgraph PG_ING["PostgreSQL"]
|
||||
T1["documents"]
|
||||
T2["ingestion_runs"]
|
||||
T3["document_company_mentions"]
|
||||
end
|
||||
|
||||
ING -->|"rpush new doc IDs"| Q_PARSE["stonks:queue:parsing"]
|
||||
|
||||
Q_PARSE -->|"lpop"| PARSER
|
||||
|
||||
subgraph PARSER["Parser Worker\nservices/parser/worker.py"]
|
||||
P1["fetch_html() → parse_html()"]
|
||||
P2["Quality scoring\nconfidence: high / medium / low"]
|
||||
P3["Company mention detection\ndetect_company_mentions()"]
|
||||
P4["Routing decision"]
|
||||
P1 --> P2 --> P3 --> P4
|
||||
end
|
||||
|
||||
PARSER -->|"upload_normalized_text()\nupload_parser_output()"| MINIO_NORM["MinIO\nstonks-normalized"]
|
||||
PARSER -->|"update_document_parse_results()"| PG_ING
|
||||
|
||||
P4 -->|"doc_type = macro_event"| Q_MACRO["stonks:queue:macro_classification"]
|
||||
P4 -->|"doc_type ≠ macro_event"| Q_EXT["stonks:queue:extraction"]
|
||||
|
||||
Q_EXT -->|"lpop"| EXT
|
||||
Q_MACRO -->|"lpop"| EXT
|
||||
|
||||
subgraph EXT["Extractor Worker\nservices/extractor/main.py"]
|
||||
E1["Document Intelligence\nExtractor agent\nslug: document-extractor"]
|
||||
E2["Global Event Classifier\nslug: event-classifier\nservices/extractor/event_classifier.py"]
|
||||
E3["persist_extraction()\nservices/extractor/worker.py"]
|
||||
end
|
||||
|
||||
EXT -->|"persist to"| PG_EXT
|
||||
|
||||
subgraph PG_EXT["PostgreSQL"]
|
||||
T4["document_intelligence"]
|
||||
T5["document_impact_records"]
|
||||
T6["global_events"]
|
||||
T7["macro_impact_records"]
|
||||
end
|
||||
|
||||
EXT -->|"rpush"| Q_AGG["stonks:queue:aggregation"]
|
||||
```
|
||||
@@ -0,0 +1,80 @@
|
||||
# Recommendation Generation Flow
|
||||
|
||||
```mermaid
|
||||
flowchart TD
|
||||
Q_REC["stonks:queue:recommendation"] -->|"lpop"| WORKER["Recommendation Worker\nservices/recommendation/main.py"]
|
||||
|
||||
WORKER --> FETCH["Fetch TrendSummary\nfrom trend_windows\nfor ticker + window"]
|
||||
|
||||
FETCH --> SUPP
|
||||
|
||||
subgraph SUPP["Data Quality Suppression\nservices/recommendation/suppression.py"]
|
||||
S1["extraction confidence < 0.40?"]
|
||||
S2["evidence staleness > 168h?"]
|
||||
S3["source diversity < 1 type?"]
|
||||
S4["extraction failure rate > 50%?"]
|
||||
S5["valid documents < 2?"]
|
||||
S6["data quality score < 0.30?"]
|
||||
S7["Macro-only signal?\nevaluate_macro_only_suppression()"]
|
||||
S8["Pattern-only signal?\nevaluate_pattern_only_suppression()"]
|
||||
end
|
||||
|
||||
SUPP -->|"Any check fails:\nsuppressed = true\nmode → informational"| ELIG
|
||||
SUPP -->|"All checks pass"| ELIG
|
||||
|
||||
subgraph ELIG["Eligibility Evaluation\nservices/recommendation/eligibility.py"]
|
||||
direction TB
|
||||
G["Gate Checks"]
|
||||
G1["confidence ≥ 0.35"]
|
||||
G2["strength ≥ 0.10"]
|
||||
G3["contradiction ≤ 0.60"]
|
||||
G4["evidence ≥ 2"]
|
||||
G5["direction ≠ neutral"]
|
||||
G --> G1 & G2 & G3 & G4 & G5
|
||||
|
||||
G1 & G2 & G3 & G4 & G5 --> ACT["Action Mapping"]
|
||||
ACT --> A1["BUY: bullish + strength ≥ 0.25"]
|
||||
ACT --> A2["SELL: bearish + strength ≥ 0.25"]
|
||||
ACT --> A3["HOLD: directional + confidence ≥ 0.50"]
|
||||
ACT --> A4["WATCH: otherwise"]
|
||||
|
||||
A1 & A2 & A3 & A4 --> MODE["Mode Escalation"]
|
||||
MODE --> M1["informational\n(default for HOLD/WATCH)"]
|
||||
MODE --> M2["paper_eligible\nconfidence ≥ 0.50"]
|
||||
MODE --> M3["live_eligible\nconfidence ≥ 0.70\ncontradiction ≤ 0.25\nevidence ≥ 5"]
|
||||
end
|
||||
|
||||
ELIG --> SIZING
|
||||
|
||||
subgraph SIZING["Position Sizing\nservices/recommendation/eligibility.py"]
|
||||
PS1["base = 1% portfolio"]
|
||||
PS2["scale by confidence × strength\nup to 10% max"]
|
||||
PS3["contradiction penalty\n−0.5 × contradiction_score"]
|
||||
PS4["evidence count penalty\n< 3 docs → ×0.5\n< 5 docs → ×0.75"]
|
||||
end
|
||||
|
||||
SIZING --> THESIS
|
||||
|
||||
subgraph THESIS["Thesis Generation"]
|
||||
TH1["Deterministic thesis\nassembled from trend data"]
|
||||
TH2["Optional LLM rewrite\nthesis-rewriter agent\nservices/recommendation/thesis_llm.py"]
|
||||
TH1 --> TH2
|
||||
end
|
||||
|
||||
THESIS --> RISK
|
||||
|
||||
subgraph RISK["Risk Classification"]
|
||||
RC1["low"]
|
||||
RC2["moderate"]
|
||||
RC3["high"]
|
||||
RC4["very_high"]
|
||||
end
|
||||
|
||||
RISK --> PERSIST
|
||||
|
||||
subgraph PERSIST["Persistence — PostgreSQL"]
|
||||
P1["recommendations"]
|
||||
P2["recommendation_evidence"]
|
||||
P3["risk_evaluations"]
|
||||
end
|
||||
```
|
||||
@@ -0,0 +1,52 @@
|
||||
# Three-Layer Signal Merging
|
||||
|
||||
```mermaid
|
||||
flowchart TD
|
||||
subgraph Layer1["Layer 1 — Company Signals"]
|
||||
DIR["document_impact_records\n(per-company extraction output)"]
|
||||
DIR -->|"build_weighted_signals()"| WS1["WeightedSignal[]\nweight = 1.0 (full)"]
|
||||
end
|
||||
|
||||
subgraph Layer2["Layer 2 — Macro Signals"]
|
||||
MIR["macro_impact_records\n(global event interpolation)"]
|
||||
MIR -->|"build_macro_weighted_signals()"| WS2["WeightedSignal[]\nimpact × MACRO_SIGNAL_WEIGHT\n(0.3)"]
|
||||
TOGGLE_M{"macro_enabled\nin risk_configs?"}
|
||||
TOGGLE_M -->|"true"| MIR
|
||||
TOGGLE_M -->|"false"| SKIP_M["Layer skipped\ngraceful degradation"]
|
||||
end
|
||||
|
||||
subgraph Layer3["Layer 3 — Competitive Signals"]
|
||||
CSR["competitive_signal_records\n(pattern mining + propagation)"]
|
||||
CSR -->|"build_pattern_weighted_signals()\nservices/aggregation/signal_propagation.py"| WS3["WeightedSignal[]\nimpact × COMPETITIVE_SIGNAL_WEIGHT\n(0.2)"]
|
||||
TOGGLE_C{"competitive_enabled\nin risk_configs?"}
|
||||
TOGGLE_C -->|"true"| CSR
|
||||
TOGGLE_C -->|"false"| SKIP_C["Layer skipped\ngraceful degradation"]
|
||||
end
|
||||
|
||||
WS1 --> MERGE["Concatenate all WeightedSignal lists"]
|
||||
WS2 --> MERGE
|
||||
WS3 --> MERGE
|
||||
|
||||
MERGE --> AGG
|
||||
|
||||
subgraph AGG["Aggregation Engine\nservices/aggregation/worker.py"]
|
||||
A1["weighted_sentiment_average()"]
|
||||
A2["detect_contradictions()\nservices/aggregation/contradiction.py"]
|
||||
A3["derive_trend_direction()"]
|
||||
A4["compute_trend_confidence()"]
|
||||
A5["rank_evidence()"]
|
||||
A1 --> A2 --> A3 --> A4 --> A5
|
||||
end
|
||||
|
||||
AGG -->|"assemble_trend_summary()"| TS["TrendSummary\nservices/shared/schemas.py"]
|
||||
|
||||
TS -->|"persist_trend_summary()"| PG_TREND
|
||||
|
||||
subgraph PG_TREND["PostgreSQL"]
|
||||
TW["trend_windows\n(upserted each cycle)"]
|
||||
TH["trend_history\n(time-series snapshots)"]
|
||||
TE["trend_evidence\n(per-document rankings)"]
|
||||
end
|
||||
|
||||
AGG -->|"rpush"| Q_REC["stonks:queue:recommendation"]
|
||||
```
|
||||
@@ -0,0 +1,94 @@
|
||||
# Trading Engine Decision Loop
|
||||
|
||||
```mermaid
|
||||
flowchart TD
|
||||
subgraph ENGINE["Trading Engine\nservices/trading/engine.py"]
|
||||
direction TB
|
||||
TASKS["5 Concurrent Async Tasks"]
|
||||
T1["_decision_loop()\n60s polling interval"]
|
||||
T2["_stop_loss_monitor()"]
|
||||
T3["_performance_loop()"]
|
||||
T4["_risk_tier_scheduler()"]
|
||||
T5["_rebalance_scheduler()"]
|
||||
TASKS --> T1 & T2 & T3 & T4 & T5
|
||||
end
|
||||
|
||||
T1 --> POLL["Poll recommendations table\naction IN (buy, sell)\nmode IN (paper_eligible, live_eligible)\ngenerated_at > NOW() − 2h"]
|
||||
|
||||
POLL --> EVAL["evaluate_recommendation()"]
|
||||
|
||||
EVAL --> CHK_A
|
||||
|
||||
subgraph PRETRADE["Pre-Trade Check Sequence\n(first failure short-circuits)"]
|
||||
direction TB
|
||||
CHK_A["a. Circuit Breaker active?\nservices/trading/circuit_breaker.py\nTriggers: daily_loss, single_position, volatility"]
|
||||
CHK_B["b. Trading Window?\nis_within_trading_window()"]
|
||||
CHK_C["c. Confidence Gate\nconfidence ≥ risk_tier.min_confidence"]
|
||||
CHK_D["d. Deduplication\nRec ID in processed set?\nRedis: stonks:dedupe:trading:*"]
|
||||
CHK_E["e. Declining Positions\n> 50% positions down > 2%"]
|
||||
CHK_F["f. Max Open Positions\nopen_count ≥ max (default 10)"]
|
||||
|
||||
CHK_A -->|"pass"| CHK_B
|
||||
CHK_B -->|"pass"| CHK_C
|
||||
CHK_C -->|"pass"| CHK_D
|
||||
CHK_D -->|"pass"| CHK_E
|
||||
CHK_E -->|"pass"| CHK_F
|
||||
end
|
||||
|
||||
CHK_A & CHK_B & CHK_C & CHK_D & CHK_E & CHK_F -->|"fail"| SKIP["TradingDecision\ndecision = skip\n+ skip_reason"]
|
||||
|
||||
CHK_F -->|"pass"| SIZER
|
||||
|
||||
subgraph SIZER["Position Sizing\nservices/trading/position_sizer.py"]
|
||||
direction TB
|
||||
SZ1["Base sizing\nrisk_tier.max_position_pct × 0.5\n× (confidence / min_confidence)"]
|
||||
SZ2["Correlation reduction\nweighted avg corr > 0.8 → reject\n> 0.5 → proportional reduction"]
|
||||
SZ3["Sector exposure\ncap at risk_tier.max_sector_pct"]
|
||||
SZ4["Diversification bonus\n1.2× for new sector (< 3 sectors)"]
|
||||
SZ5["Earnings proximity\n≤ 1 day → reject\n≤ 3 days → 50% reduction"]
|
||||
SZ6["Absolute position cap"]
|
||||
SZ7["Portfolio heat check\nmax_portfolio_heat × active_pool"]
|
||||
SZ8["Share rounding\nfloor(dollar / price)"]
|
||||
|
||||
SZ1 --> SZ2 --> SZ3 --> SZ4 --> SZ5 --> SZ6 --> SZ7 --> SZ8
|
||||
end
|
||||
|
||||
SIZER -->|"rejected"| SKIP
|
||||
SIZER -->|"approved"| ACT["TradingDecision\ndecision = act\nshares, dollar amount"]
|
||||
|
||||
ACT --> PERSIST_TD["Persist to\ntrading_decisions"]
|
||||
|
||||
ACT --> ORDER["Build order job\n{ticker, action, side,\nquantity, order_type}"]
|
||||
|
||||
ORDER -->|"rpush"| Q_BROKER["stonks:queue:broker_orders"]
|
||||
|
||||
Q_BROKER --> BROKER["Broker Adapter\nAlpaca paper trading\nservices/adapters/broker_adapter.py"]
|
||||
|
||||
BROKER --> AUDIT
|
||||
|
||||
subgraph AUDIT["Audit Trail — PostgreSQL"]
|
||||
AU1["orders"]
|
||||
AU2["positions"]
|
||||
AU3["portfolio_snapshots"]
|
||||
end
|
||||
|
||||
subgraph CB_DETAIL["Circuit Breaker Detail\nservices/trading/circuit_breaker.py"]
|
||||
CB1["daily_loss\nportfolio loss > 5%\ncooldown: volatility_pause_hours"]
|
||||
CB2["single_position\nposition loss > 15%\ncooldown: ticker_cooldown_hours (48h)"]
|
||||
CB3["volatility\n≥ 3 stop-losses in 30min\ncooldown: volatility_pause_hours (2h)"]
|
||||
CB4["Redis state\nstonks:trading:circuit_breaker:*"]
|
||||
end
|
||||
|
||||
subgraph RESERVE["Reserve Pool\nservices/trading/reserve_pool.py"]
|
||||
RP1["Profit siphoning: 20%"]
|
||||
RP2["High-water rebalance: 30%"]
|
||||
RP3["Emergency liquidation"]
|
||||
RP4["reserve_pool_ledger"]
|
||||
end
|
||||
|
||||
subgraph RISK_TIER["Risk Tier Auto-Adjustment\nservices/trading/risk_tier_controller.py"]
|
||||
RT1["Evaluate: Sharpe ratio,\ndrawdown, win rate"]
|
||||
RT2["conservative → moderate → aggressive"]
|
||||
RT3["risk_tier_history"]
|
||||
end
|
||||
```
|
||||
@@ -0,0 +1,62 @@
|
||||
# Trend Accumulation and Escalation
|
||||
|
||||
```mermaid
|
||||
flowchart TD
|
||||
subgraph Windows["Five Time Windows\nservices/aggregation/worker.py"]
|
||||
W1["intraday (12h)"]
|
||||
W2["1d (1 day)"]
|
||||
W3["7d (7 days)"]
|
||||
W4["30d (30 days)"]
|
||||
W5["90d (90 days)"]
|
||||
end
|
||||
|
||||
W1 & W2 & W3 & W4 & W5 --> SIGNALS
|
||||
|
||||
SIGNALS["Fetch signals per window\nCompany + Macro + Competitive\n→ WeightedSignal[]"]
|
||||
|
||||
SIGNALS --> SENT["weighted_sentiment_average()\nCompute avg sentiment across signals"]
|
||||
|
||||
SENT --> DIR
|
||||
|
||||
subgraph DIR["derive_trend_direction()"]
|
||||
D1["avg_sentiment ≥ 0.15 → BULLISH"]
|
||||
D2["avg_sentiment ≤ −0.15 → BEARISH"]
|
||||
D3["contradiction > 0.10\nAND |avg| < 0.30 → MIXED"]
|
||||
D4["otherwise → NEUTRAL"]
|
||||
end
|
||||
|
||||
DIR --> CONF
|
||||
|
||||
subgraph CONF["compute_trend_confidence()"]
|
||||
C1["Unique source count\ncaps at 15 → 0.8 contribution"]
|
||||
C2["Avg extraction credibility"]
|
||||
C3["Signal agreement ratio\ndampened by log₂(n+1)/log₂(8)\nsaturates ~7 unique sources"]
|
||||
C4["Contradiction penalty\n−0.4 × contradiction_score"]
|
||||
C5["confidence = 0.3×count + 0.3×credibility\n+ 0.4×agreement − penalty"]
|
||||
end
|
||||
|
||||
CONF --> STRENGTH["trend_strength = |avg_sentiment|\nclamped to [0, 1]"]
|
||||
|
||||
STRENGTH --> ESC
|
||||
|
||||
subgraph ESC["Escalation Path\n(via eligibility thresholds)"]
|
||||
direction TB
|
||||
E1["NEUTRAL\nconfidence < 0.35\nOR strength < 0.10\nOR direction = neutral"]
|
||||
E2["WATCH\nstrength < 0.25\nAND confidence < 0.50"]
|
||||
E3["HOLD\nstrength < 0.25\nAND confidence ≥ 0.50"]
|
||||
E4["BUY / SELL\nstrength ≥ 0.25\nAND direction = bullish/bearish"]
|
||||
|
||||
E1 -->|"More signals\nsame direction"| E2
|
||||
E2 -->|"Confidence grows\nmore unique sources"| E3
|
||||
E3 -->|"Strength exceeds 0.25\naccumulated evidence"| E4
|
||||
end
|
||||
|
||||
ESC --> PERSIST
|
||||
|
||||
subgraph PERSIST["Persistence"]
|
||||
P1["trend_windows\n(upserted each cycle)"]
|
||||
P2["trend_history\n(time-series snapshots)"]
|
||||
P3["trend_evidence\n(per-document rankings)"]
|
||||
P4["trend_projections\nservices/aggregation/projection.py"]
|
||||
end
|
||||
```
|
||||
@@ -0,0 +1,58 @@
|
||||
# Weighted Signal Computation
|
||||
|
||||
```mermaid
|
||||
flowchart TD
|
||||
DOC["Document Signal Input\n(published_at, source_credibility,\nnovelty_score, extraction_confidence,\nmarket_ctx)"]
|
||||
|
||||
DOC --> GATE
|
||||
DOC --> REC
|
||||
DOC --> CRED
|
||||
DOC --> NOV
|
||||
DOC --> MKT
|
||||
|
||||
subgraph GATE["Confidence Gate"]
|
||||
G1["extraction_confidence ≥ 0.2?"]
|
||||
G1 -->|"Yes"| G2["gate = 1.0"]
|
||||
G1 -->|"No"| G3["gate = 0.0\n(signal zeroed out)"]
|
||||
end
|
||||
|
||||
subgraph REC["Recency Decay"]
|
||||
R1["w = 2^(−age_hours / half_life)"]
|
||||
R2["Half-lives per window:\nintraday: 2h\n1d: 12h\n7d: 72h\n30d: 240h\n90d: 720h"]
|
||||
R3["Floor: min_recency_weight = 0.01"]
|
||||
R1 --- R2
|
||||
R1 --- R3
|
||||
end
|
||||
|
||||
subgraph CRED["Source Credibility"]
|
||||
C1["Clamp to [0.1, 1.0]"]
|
||||
C2["Apply exponent\n(default 1.0)"]
|
||||
C1 --> C2
|
||||
end
|
||||
|
||||
subgraph NOV["Novelty Bonus"]
|
||||
N1["bonus = novelty_score × 0.25"]
|
||||
N2["Range: [0.0, 0.25]\n(up to 25% boost)"]
|
||||
N1 --- N2
|
||||
end
|
||||
|
||||
subgraph MKT["Market Context Multiplier"]
|
||||
M1["Volatility boost\nlog₁₊(excess) × 0.15\ncapped at 0.30"]
|
||||
M2["Volume surge boost\nvolume_change > 50% → +0.15"]
|
||||
M3["multiplier = 1.0 + boost\n(always ≥ 1.0)"]
|
||||
M1 --> M3
|
||||
M2 --> M3
|
||||
end
|
||||
|
||||
GATE --> FORMULA
|
||||
REC --> FORMULA
|
||||
CRED --> FORMULA
|
||||
NOV --> FORMULA
|
||||
MKT --> FORMULA
|
||||
|
||||
FORMULA["combined = gate × recency × credibility\n× (1 + novelty_bonus)\n× market_context_multiplier"]
|
||||
|
||||
FORMULA --> SW["SignalWeight\nservices/aggregation/scoring.py"]
|
||||
|
||||
SW --> WS["WeightedSignal\n{ document_id, weight: SignalWeight,\nsentiment_value, impact_score }"]
|
||||
```
|
||||
@@ -0,0 +1,40 @@
|
||||
# Intelligence Pipeline Deep Dive
|
||||
|
||||
This document series provides a narrative walkthrough of the full intelligence-to-decision pipeline in Stonks Oracle. Unlike the existing service reference and API documentation, these pages tell the story of how raw data enters the system, gets processed by AI agents, produces structured signals, accumulates into trend summaries, and ultimately drives autonomous trading decisions.
|
||||
|
||||
Each page covers one stage of the pipeline and ends with a transition to the next, so you can read the series end-to-end or jump directly to the stage you need. Diagrams are stored as standalone Mermaid files that can be rendered independently or embedded in other documents.
|
||||
|
||||
---
|
||||
|
||||
## Table of Contents
|
||||
|
||||
1. [Data Ingestion and Preparation](01-data-ingestion-and-preparation.md) — How raw data from Polygon.io, SEC EDGAR, and macro news APIs enters the system, gets deduplicated, stored, parsed, and routed for AI processing.
|
||||
2. [AI Agent Processing and Structured Extraction](02-ai-agent-processing-and-extraction.md) — How the Document Intelligence Extractor and Global Event Classifier agents use LLM inference to produce structured JSON intelligence from documents.
|
||||
3. [Signal Scoring and the WeightedSignal Abstraction](03-signal-scoring-and-weighted-signals.md) — How raw extraction output is transformed into weighted signals through confidence gating, recency decay, source credibility, novelty bonuses, and market context multipliers.
|
||||
4. [Trend Aggregation and Accumulating Signals](04-trend-aggregation-and-accumulating-signals.md) — How the aggregation engine merges weighted signals across five time windows, detects contradictions, ranks evidence, and escalates trend strength as consecutive signals accumulate.
|
||||
5. [Recommendation Generation](05-recommendation-generation.md) — How trend summaries pass through data quality suppression, eligibility evaluation, position sizing, thesis generation, and risk classification to produce actionable recommendations.
|
||||
6. [Trading Decisions and Execution](06-trading-decisions-and-execution.md) — How the trading engine polls recommendations, runs pre-trade checks, sizes positions, enforces circuit breakers, and submits orders through the broker adapter.
|
||||
|
||||
---
|
||||
|
||||
## Diagrams
|
||||
|
||||
The following Mermaid diagram files can be rendered independently or referenced from the narrative pages:
|
||||
|
||||
- [Ingestion to Extraction Flow](diagrams/ingestion-to-extraction-flow.md) — Flowchart from Scheduler through Ingestion, Parser, to Extractor with all queues and storage.
|
||||
- [Three-Layer Signal Merging](diagrams/three-layer-signal-merging.md) — Company, Macro, and Competitive signal layers converging into the Aggregation engine.
|
||||
- [Weighted Signal Computation](diagrams/weighted-signal-computation.md) — Component breakdown of the composite weight formula.
|
||||
- [Trend Accumulation and Escalation](diagrams/trend-accumulation-escalation.md) — How consecutive signals strengthen trends and escalate actions across time windows.
|
||||
- [Recommendation Generation Flow](diagrams/recommendation-generation-flow.md) — From TrendSummary through suppression, eligibility, thesis, risk classification, to persistence.
|
||||
- [Trading Engine Decision Loop](diagrams/trading-engine-decision-loop.md) — Pre-trade check sequence, position sizing, and order submission flow.
|
||||
|
||||
---
|
||||
|
||||
## Related Documentation
|
||||
|
||||
For reference-level detail on individual services, AI agent configuration, and infrastructure, see the existing documentation:
|
||||
|
||||
- [Services Reference](../services.md) — Per-service configuration, database tables, queues, and runtime behaviors.
|
||||
- [AI Agents Guide](../ai-agents.md) — AI agent configuration, variants, A/B testing, and the agent management API.
|
||||
- [Data Pipeline Architecture](../architecture-data-pipeline.md) — Queue topology, data store summary, and Mermaid flow diagrams for the full data pipeline.
|
||||
- [LLM-to-Trade Pipeline](../llm-to-trade-pipeline.md) — End-to-end data flow from model output through signal aggregation to trade execution.
|
||||
Reference in New Issue
Block a user