feat: implement dual-pipeline signal engine service
ci/woodpecker/push/test Pipeline was successful
ci/woodpecker/push/build-2 Pipeline was successful
ci/woodpecker/push/build-1 Pipeline was successful
ci/woodpecker/push/build-3 Pipeline was successful
ci/woodpecker/push/finalize Pipeline was successful
Build and Push / lint-and-test (push) Has been cancelled
Build and Push / build-services (map[cmd:python -m services.adapters.broker_adapter name:broker-adapter]) (push) Has been cancelled
Build and Push / build-services (map[cmd:python -m services.aggregation.worker name:aggregation]) (push) Has been cancelled
Build and Push / build-services (map[cmd:python -m services.extractor.worker name:extractor]) (push) Has been cancelled
Build and Push / build-services (map[cmd:python -m services.ingestion.worker name:ingestion]) (push) Has been cancelled
Build and Push / build-services (map[cmd:python -m services.lake_publisher.worker name:lake-publisher]) (push) Has been cancelled
Build and Push / build-services (map[cmd:python -m services.parser.worker name:parser]) (push) Has been cancelled
Build and Push / build-services (map[cmd:python -m services.recommendation.worker name:recommendation]) (push) Has been cancelled
Build and Push / build-services (map[cmd:python -m services.scheduler.app name:scheduler]) (push) Has been cancelled
Build and Push / build-services (map[cmd:uvicorn services.api.app:app --host 0.0.0.0 --port 8000 name:query-api]) (push) Has been cancelled
Build and Push / build-services (map[cmd:uvicorn services.risk.app:app --host 0.0.0.0 --port 8000 name:risk]) (push) Has been cancelled
Build and Push / build-services (map[cmd:uvicorn services.symbol_registry.app:app --host 0.0.0.0 --port 8000 name:symbol-registry]) (push) Has been cancelled
Build and Push / build-services (map[cmd:uvicorn services.trading.app:app --host 0.0.0.0 --port 8000 name:trading-engine]) (push) Has been cancelled
Build and Push / build-dashboard (push) Has been cancelled
Build and Push / build-superset (push) Has been cancelled
Build and Push / integration-test (push) Has been cancelled
Build and Push / beta-gate (push) Has been cancelled

New service at services/signal_engine/ implementing concurrent heuristic
(deterministic scoring) and probabilistic (Bayesian inference) pipelines
that evaluate technical signals across 6 timeframes (M30-M) and produce
independent BUY/WATCH/SKIP verdicts per ticker per evaluation tick.

Components:
- Input Normalizer: multi-source data assembly with sentinel fallbacks
- Signal Library: Fibonacci, MA Stack, RSI, Cup & Handle, Elliott Wave
- Multi-Timeframe Confluence Engine: weighted scoring with D/W/M anchors
- Hard Filter Engine: macro_bias, valuation, earnings proximity gating
- Heuristic Pipeline: S_total scoring with confidence-gated verdicts
- Probabilistic Pipeline: Bayesian log-odds with regime priors, entropy
  gating, EV_R calculation, and signal correlation penalty
- Exit Engine: stop-loss, targets, trailing ATR-based stops
- Delta Analyzer: pipeline agreement tracking with rolling Redis metrics
- Output Formatter: SignalOutput contract + Recommendation schema mapping
- Worker orchestrator: concurrent pipelines with failure isolation
- Main entry point: queue polling with fail-safe config loading

Infrastructure:
- Migration 039: signal_engine_outputs table with 3 indexes
- Helm chart: signalEngine service entry (processing tier)
- Redis key: QUEUE_SIGNAL_ENGINE constant

Tests: 390 tests (unit + property-based) covering all components
Config: dual_pipeline_enabled=false by default (safe rollout)
This commit is contained in:
Celes Renata
2026-05-02 07:32:26 +00:00
parent 7e2343ec2c
commit f468e30af0
61 changed files with 14107 additions and 184 deletions
+104 -13
View File
@@ -1,6 +1,6 @@
# AI Agent Building Guide
Stonks Oracle uses three AI agents powered by a local Ollama instance. Each agent has a dedicated purpose in the pipeline, a database-backed configuration, and support for A/B testing through variants. This guide covers how each agent works, how to configure them, how to create and test variants, and how to monitor performance.
Stonks Oracle uses three AI agents powered by local LLM inference (Ollama or vLLM). Each agent has a dedicated purpose in the pipeline, a database-backed configuration, and support for A/B testing through variants. This guide covers how each agent works, how to configure them, how to create and test variants, and how to monitor performance.
## Table of Contents
@@ -8,6 +8,7 @@ Stonks Oracle uses three AI agents powered by a local Ollama instance. Each agen
- [Document Intelligence Extractor](#1-document-intelligence-extractor)
- [Global Event Classifier](#2-global-event-classifier)
- [Thesis Rewriter](#3-thesis-rewriter)
- [LLM Provider Abstraction](#llm-provider-abstraction)
- [Database Schema](#database-schema)
- [ai_agents Table](#ai_agents-table)
- [agent_variants Table](#agent_variants-table)
@@ -30,9 +31,10 @@ Three agents are seeded into the `ai_agents` table on first migration (migration
| **Slug** | `document-extractor` |
| **Purpose** | Extracts structured intelligence (sentiment, catalysts, impact scores, key facts, risks) from company news, SEC filings, earnings transcripts, and press releases |
| **Default Model** | `qwen3.5:9b-fast` (Ollama) |
| **Supported Providers** | `ollama`, `vllm` |
| **Prompt Version** | `document-intel-v2` |
| **Schema Version** | `2.0.0` |
| **Entry Point** | `services/extractor/main.py``services/extractor/client.py` |
| **Entry Point** | `services/extractor/main.py``services/extractor/llm_factory.py``services/extractor/client.py` (Ollama) or `services/extractor/vllm_client.py` (vLLM) |
**Input Data:**
- Normalized document text (fetched from MinIO or passed in the Redis job payload)
@@ -40,7 +42,7 @@ Three agents are seeded into the `ai_agents` table on first migration (migration
- List of tracked tickers for company identification
- Document ID for traceability
**Output Schema** (`ExtractionResult`):
**Output Schema** (`ExtractionResult` — defined in `services/extractor/schemas.py`):
```json
{
@@ -81,6 +83,7 @@ Use "other" for catalyst_type if unsure. Keep evidence_spans short
- Includes tracked ticker list with rules for company identification
- Includes the full JSON schema field descriptions
- Truncates documents to 8,000 characters to limit inference time
- When an active variant has `input_token_limit > 0`, truncation uses `input_token_limit * 4` characters instead
---
@@ -91,6 +94,7 @@ Use "other" for catalyst_type if unsure. Keep evidence_spans short
| **Slug** | `event-classifier` |
| **Purpose** | Classifies global/geopolitical news into structured macro events with impact type, severity, affected regions/sectors/commodities, and estimated duration |
| **Default Model** | `qwen3.5:9b-fast` (Ollama) |
| **Supported Providers** | `ollama`, `vllm` |
| **Prompt Version** | `event-classification-v1` |
| **Schema Version** | `1.0.0` |
| **Entry Point** | `services/extractor/main.py``services/extractor/event_classifier.py` |
@@ -99,7 +103,7 @@ Use "other" for catalyst_type if unsure. Keep evidence_spans short
- Normalized text of a macro news article (from the `stonks:queue:macro_classification` Redis queue)
- Document ID for traceability
**Output Schema** (`GlobalEvent`):
**Output Schema** (`GlobalEvent` — defined in `services/extractor/event_classifier.py`):
```json
{
@@ -141,9 +145,11 @@ as empty arrays.
```
**User Prompt Template** (built by `build_event_classification_prompt()` in `services/extractor/event_classifier.py`):
- Includes anti-hallucination rules
- Includes anti-hallucination rules (no fabrication, severity "critical" reserved for multi-country events)
- Lists all valid enum values for each field
- Truncates articles to 6,000 characters
- When an active variant has `input_token_limit > 0`, truncation uses `input_token_limit * 4` characters instead
- If a variant overrides the system prompt, the classifier ensures JSON output instructions are always appended if not already present
---
@@ -154,6 +160,7 @@ as empty arrays.
| **Slug** | `thesis-rewriter` |
| **Purpose** | Rewrites deterministic trade thesis summaries into clear, professional analyst prose. Optional layer — the system falls back to the deterministic thesis if this fails |
| **Default Model** | `qwen3.5:9b-fast` (Ollama) |
| **Supported Providers** | `ollama`, `vllm` |
| **Prompt Version** | `thesis-rewrite-v1` |
| **Schema Version** | `1.0.0` |
| **Entry Point** | `services/recommendation/main.py``services/recommendation/thesis_llm.py` |
@@ -165,6 +172,7 @@ as empty arrays.
**Output Schema:**
- Plain text (not JSON). The model returns only the rewritten thesis as a string, under 150 words.
- On failure or empty response, the original deterministic thesis is returned unchanged.
- A `_strip_thinking_block()` post-processor removes `<think>` XML tags and "Thinking Process:" blocks that some models (e.g. Qwen3) emit before the actual response.
**System Prompt:**
@@ -182,11 +190,37 @@ STRICT RULES:
5. Use a neutral, professional tone. Avoid hype or marketing language.
6. Return ONLY the rewritten thesis text. No JSON, no markdown, no
commentary.
7. Do NOT show your thinking process. Do NOT include any reasoning
steps. Output ONLY the final rewritten text.
```
**User Prompt Template** (built by `build_thesis_rewrite_prompt()` in `services/recommendation/thesis_llm.py`):
- Includes the deterministic thesis between delimiters
- Includes trend context: ticker, window, direction, strength, confidence, contradiction score, top catalysts, top risks
- Appends `/no_think` suffix to suppress reasoning mode on models that support it (e.g. Qwen3)
- Ollama calls also set `"think": false` in the request payload
---
## LLM Provider Abstraction
All three agents support both **Ollama** and **vLLM** as inference providers. The provider is determined by the `model_provider` field in the agent config (or active variant).
**Module:** `services/extractor/llm_factory.py`
The `build_llm_client()` factory function routes to the correct client:
| `model_provider` value | Client class | API endpoint |
|------------------------|-------------|--------------|
| `ollama` (default), `""`, `None` | `OllamaClient` (`services/extractor/client.py`) | `{OLLAMA_BASE_URL}/api/chat` |
| `vllm` | `VLLMClient` (`services/extractor/vllm_client.py`) | `{VLLM_BASE_URL}/v1/chat/completions` (OpenAI-compatible) |
| Unknown value | `OllamaClient` (with warning log) | Falls back to Ollama |
Both clients implement the `LLMClient` protocol (`services/shared/llm_protocol.py`), providing `call_llm()` and `close()` methods.
**Provider switching at runtime:** When a variant changes the `model_provider`, the extractor worker detects this during its periodic config refresh (every 100 jobs) and creates a new client instance. The old client is closed gracefully. A safety guard prevents switching to Ollama if `OLLAMA_BASE_URL` is empty.
**vLLM health check:** At startup, if the resolved provider is `vllm`, the extractor runs a health check against the vLLM endpoint. If it fails, the worker falls back to Ollama automatically.
---
@@ -202,8 +236,8 @@ Defined in migration `026_ai_agents.sql`. Stores the base configuration for each
| `name` | `VARCHAR(100)` | — | Human-readable name (unique) |
| `slug` | `VARCHAR(100)` | — | URL-safe identifier (unique), used by `AgentConfigResolver` |
| `purpose` | `TEXT` | `''` | Description of what the agent does |
| `model_provider` | `VARCHAR(50)` | `'ollama'` | LLM provider |
| `model_name` | `VARCHAR(200)` | `'qwen3.5:9b'` | Model identifier |
| `model_provider` | `VARCHAR(50)` | `'ollama'` | LLM provider (`ollama` or `vllm`) |
| `model_name` | `VARCHAR(200)` | `'qwen3.5:9b-fast'` | Model identifier |
| `system_prompt` | `TEXT` | `''` | System prompt sent to the model |
| `user_prompt_template` | `TEXT` | `''` | User prompt template (optional — code-defined templates take precedence) |
| `prompt_version` | `VARCHAR(100)` | `''` | Version tag for prompt tracking |
@@ -297,13 +331,20 @@ The `AgentConfigResolver` is the central mechanism for resolving runtime agent c
2. **COALESCE-based override**: The SQL query uses `COALESCE(variant_column, agent_column)` for every configuration field. If an active variant exists and has a non-NULL value for a field, that value is used. Otherwise, the base agent's value is used.
```sql
SELECT a.id AS agent_id,
v.id AS variant_id,
SELECT a.id AS agent_id,
v.id AS variant_id,
COALESCE(v.model_provider, a.model_provider) AS model_provider,
COALESCE(v.model_name, a.model_name) AS model_name,
COALESCE(v.system_prompt, a.system_prompt) AS system_prompt,
COALESCE(v.user_prompt_template, a.user_prompt_template) AS user_prompt_template,
-- ... all other fields ...
COALESCE(v.prompt_version, a.prompt_version) AS prompt_version,
COALESCE(v.temperature, a.temperature) AS temperature,
COALESCE(v.max_tokens, a.max_tokens) AS max_tokens,
COALESCE(v.context_window, 0) AS context_window,
COALESCE(v.input_token_limit, 0) AS input_token_limit,
COALESCE(v.token_budget, 0) AS token_budget,
COALESCE(v.timeout_seconds, a.timeout_seconds) AS timeout_seconds,
COALESCE(v.max_retries, a.max_retries) AS max_retries
FROM ai_agents a
LEFT JOIN agent_variants v
ON v.agent_id = a.id AND v.is_active = TRUE
@@ -361,7 +402,10 @@ resolver.invalidate() # Clear all entries
### Config Refresh in Workers
The extractor and recommendation workers periodically re-resolve their agent config (every 100 jobs for the extractor, every 50 jobs for the recommendation worker). If the resolved model changes, the worker creates a new `OllamaClient` instance with the updated configuration.
The extractor and recommendation workers periodically re-resolve their agent config to pick up variant swaps and model changes:
- **Extractor worker** (`services/extractor/main.py`): Re-resolves both `document-extractor` and `event-classifier` configs every **100 jobs**. If the resolved model or provider changes, the worker creates a new LLM client instance via `build_llm_client()` and closes the old one. A safety guard prevents switching to Ollama if `OLLAMA_BASE_URL` is empty.
- **Recommendation worker** (`services/recommendation/main.py`): Re-resolves the `thesis-rewriter` config every **50 jobs**. If the model changes, a new `OllamaConfig` is built.
---
@@ -373,7 +417,7 @@ Every agent invocation is logged to `agent_performance_log` with the `agent_id`
- **Document extractor**: Logged in `services/extractor/main.py` after each extraction. Records success/failure, duration, confidence, retry count, token estimates.
- **Event classifier**: Logged in `services/extractor/event_classifier.py` after each classification. Same fields.
- **Thesis rewriter**: Logged in `services/recommendation/thesis_llm.py` after each rewrite attempt. Confidence is always 0.0 (not applicable for rewrites).
- **Thesis rewriter**: Logged in `services/recommendation/thesis_llm.py` after each rewrite attempt. Confidence is always 0.0 (not applicable for rewrites). `document_id` is always NULL.
### Querying for Variant Comparison
@@ -464,6 +508,8 @@ All agent endpoints are served by the Query API (`services/api/app.py`) under th
}
```
All fields except `name` have defaults. The `slug` is auto-generated from `name` if not provided. The `model_name` defaults to `llama3.1:8b` for user-created agents.
**Update Agent Request Body** (all fields optional):
```json
@@ -509,6 +555,30 @@ All agent endpoints are served by the Query API (`services/api/app.py`) under th
| `PUT` | `/api/agents/{agent_id}/variants/{variant_id}` | Partial update a variant |
| `DELETE` | `/api/agents/{agent_id}/variants/{variant_id}` | Delete a variant (returns 400 if active) |
**Create Variant Request Body:**
```json
{
"variant_name": "Llama 3.1 8B Test",
"variant_slug": "llama-3-1-8b-test",
"description": "Testing llama3.1:8b as an alternative",
"model_provider": "ollama",
"model_name": "llama3.1:8b",
"system_prompt": "",
"user_prompt_template": "",
"prompt_version": "",
"temperature": 0.0,
"max_tokens": 32768,
"context_window": 0,
"input_token_limit": 0,
"token_budget": 0,
"timeout_seconds": 120,
"max_retries": 2
}
```
Required fields: `variant_name`, `model_name`. The `variant_slug` is auto-generated from `variant_name` if not provided.
### Clone Endpoints
| Method | Path | Description |
@@ -516,7 +586,7 @@ All agent endpoints are served by the Query API (`services/api/app.py`) under th
| `POST` | `/api/agents/{agent_id}/clone` | Clone an agent's base config as a new variant |
| `POST` | `/api/agents/{agent_id}/variants/{variant_id}/clone` | Clone an existing variant as a new variant |
Clone requests copy all configuration fields from the source, with optional overrides in the request body.
Clone requests copy all configuration fields from the source, with optional overrides in the request body. The `variant_name` field is required. All other fields default to the source's values if not provided.
### Activate / Deactivate
@@ -525,6 +595,8 @@ Clone requests copy all configuration fields from the source, with optional over
| `POST` | `/api/agents/{agent_id}/variants/{variant_id}/activate` | Set a variant as active (deactivates any other active variant in a single transaction) |
| `POST` | `/api/agents/{agent_id}/variants/deactivate` | Deactivate the currently active variant (agent falls back to base config) |
The activate endpoint uses a database transaction to atomically deactivate the current variant and activate the new one, ensuring exactly one active variant at all times.
### Per-Variant Performance
| Method | Path | Description |
@@ -532,6 +604,8 @@ Clone requests copy all configuration fields from the source, with optional over
| `GET` | `/api/agents/{agent_id}/variants/{variant_id}/performance` | Aggregated metrics for a specific variant |
| `GET` | `/api/agents/{agent_id}/variants/{variant_id}/performance/history` | Hourly time-series for a specific variant |
Both endpoints accept the same `hours` query parameter (default 24, max 720) and return the same response shape as the agent-level performance endpoints.
---
## Step-by-Step: Creating and Activating a Variant
@@ -616,3 +690,20 @@ curl -s -X PUT \
```
Then re-activate and compare again.
### 7. Switch to vLLM Provider
To test a variant using vLLM instead of Ollama:
```bash
curl -s -X POST https://stonks-api.celestium.life/api/agents/$AGENT_ID/clone \
-H "Content-Type: application/json" \
-d '{
"variant_name": "vLLM Qwen3 Test",
"description": "Testing extraction with vLLM backend",
"model_provider": "vllm",
"model_name": "Qwen/Qwen3-8B"
}' | jq .
```
The extractor worker will detect the provider change during its next config refresh and build a `VLLMClient` instead of an `OllamaClient`. Ensure the `VLLM_BASE_URL` environment variable is set in the extractor deployment.