feat: implement dual-pipeline signal engine service
ci/woodpecker/push/test Pipeline was successful
ci/woodpecker/push/build-2 Pipeline was successful
ci/woodpecker/push/build-1 Pipeline was successful
ci/woodpecker/push/build-3 Pipeline was successful
ci/woodpecker/push/finalize Pipeline was successful
Build and Push / lint-and-test (push) Has been cancelled
Build and Push / build-services (map[cmd:python -m services.adapters.broker_adapter name:broker-adapter]) (push) Has been cancelled
Build and Push / build-services (map[cmd:python -m services.aggregation.worker name:aggregation]) (push) Has been cancelled
Build and Push / build-services (map[cmd:python -m services.extractor.worker name:extractor]) (push) Has been cancelled
Build and Push / build-services (map[cmd:python -m services.ingestion.worker name:ingestion]) (push) Has been cancelled
Build and Push / build-services (map[cmd:python -m services.lake_publisher.worker name:lake-publisher]) (push) Has been cancelled
Build and Push / build-services (map[cmd:python -m services.parser.worker name:parser]) (push) Has been cancelled
Build and Push / build-services (map[cmd:python -m services.recommendation.worker name:recommendation]) (push) Has been cancelled
Build and Push / build-services (map[cmd:python -m services.scheduler.app name:scheduler]) (push) Has been cancelled
Build and Push / build-services (map[cmd:uvicorn services.api.app:app --host 0.0.0.0 --port 8000 name:query-api]) (push) Has been cancelled
Build and Push / build-services (map[cmd:uvicorn services.risk.app:app --host 0.0.0.0 --port 8000 name:risk]) (push) Has been cancelled
Build and Push / build-services (map[cmd:uvicorn services.symbol_registry.app:app --host 0.0.0.0 --port 8000 name:symbol-registry]) (push) Has been cancelled
Build and Push / build-services (map[cmd:uvicorn services.trading.app:app --host 0.0.0.0 --port 8000 name:trading-engine]) (push) Has been cancelled
Build and Push / build-dashboard (push) Has been cancelled
Build and Push / build-superset (push) Has been cancelled
Build and Push / integration-test (push) Has been cancelled
Build and Push / beta-gate (push) Has been cancelled
New service at services/signal_engine/ implementing concurrent heuristic (deterministic scoring) and probabilistic (Bayesian inference) pipelines that evaluate technical signals across 6 timeframes (M30-M) and produce independent BUY/WATCH/SKIP verdicts per ticker per evaluation tick.

Components:
- Input Normalizer: multi-source data assembly with sentinel fallbacks
- Signal Library: Fibonacci, MA Stack, RSI, Cup & Handle, Elliott Wave
- Multi-Timeframe Confluence Engine: weighted scoring with D/W/M anchors
- Hard Filter Engine: macro_bias, valuation, earnings proximity gating
- Heuristic Pipeline: S_total scoring with confidence-gated verdicts
- Probabilistic Pipeline: Bayesian log-odds with regime priors, entropy gating, EV_R calculation, and signal correlation penalty
- Exit Engine: stop-loss, targets, trailing ATR-based stops
- Delta Analyzer: pipeline agreement tracking with rolling Redis metrics
- Output Formatter: SignalOutput contract + Recommendation schema mapping
- Worker orchestrator: concurrent pipelines with failure isolation
- Main entry point: queue polling with fail-safe config loading

Infrastructure:
- Migration 039: signal_engine_outputs table with 3 indexes
- Helm chart: signalEngine service entry (processing tier)
- Redis key: QUEUE_SIGNAL_ENGINE constant

Tests: 390 tests (unit + property-based) covering all components

Config: dual_pipeline_enabled=false by default (safe rollout)
+104
-13
@@ -1,6 +1,6 @@
 # AI Agent Building Guide

-Stonks Oracle uses three AI agents powered by a local Ollama instance. Each agent has a dedicated purpose in the pipeline, a database-backed configuration, and support for A/B testing through variants. This guide covers how each agent works, how to configure them, how to create and test variants, and how to monitor performance.
+Stonks Oracle uses three AI agents powered by local LLM inference (Ollama or vLLM). Each agent has a dedicated purpose in the pipeline, a database-backed configuration, and support for A/B testing through variants. This guide covers how each agent works, how to configure them, how to create and test variants, and how to monitor performance.

 ## Table of Contents

@@ -8,6 +8,7 @@ Stonks Oracle uses three AI agents powered by a local Ollama instance. Each agen
 - [Document Intelligence Extractor](#1-document-intelligence-extractor)
 - [Global Event Classifier](#2-global-event-classifier)
 - [Thesis Rewriter](#3-thesis-rewriter)
+- [LLM Provider Abstraction](#llm-provider-abstraction)
 - [Database Schema](#database-schema)
 - [ai_agents Table](#ai_agents-table)
 - [agent_variants Table](#agent_variants-table)
@@ -30,9 +31,10 @@ Three agents are seeded into the `ai_agents` table on first migration (migration
 | **Slug** | `document-extractor` |
 | **Purpose** | Extracts structured intelligence (sentiment, catalysts, impact scores, key facts, risks) from company news, SEC filings, earnings transcripts, and press releases |
 | **Default Model** | `qwen3.5:9b-fast` (Ollama) |
+| **Supported Providers** | `ollama`, `vllm` |
 | **Prompt Version** | `document-intel-v2` |
 | **Schema Version** | `2.0.0` |
-| **Entry Point** | `services/extractor/main.py` → `services/extractor/client.py` |
+| **Entry Point** | `services/extractor/main.py` → `services/extractor/llm_factory.py` → `services/extractor/client.py` (Ollama) or `services/extractor/vllm_client.py` (vLLM) |

 **Input Data:**
 - Normalized document text (fetched from MinIO or passed in the Redis job payload)
@@ -40,7 +42,7 @@ Three agents are seeded into the `ai_agents` table on first migration (migration
 - List of tracked tickers for company identification
 - Document ID for traceability

-**Output Schema** (`ExtractionResult`):
+**Output Schema** (`ExtractionResult` — defined in `services/extractor/schemas.py`):

 ```json
 {
@@ -81,6 +83,7 @@ Use "other" for catalyst_type if unsure. Keep evidence_spans short
 - Includes tracked ticker list with rules for company identification
 - Includes the full JSON schema field descriptions
 - Truncates documents to 8,000 characters to limit inference time
+- When an active variant has `input_token_limit > 0`, truncation uses `input_token_limit * 4` characters instead
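
The truncation rule in the bullets above can be sketched as follows. This is a minimal illustration, not the code in `services/extractor`: the helper name `truncate_for_prompt` and its signature are hypothetical, and only the 8,000-character default and the `input_token_limit * 4` rule come from the guide.

```python
DEFAULT_CHAR_LIMIT = 8_000  # the guide's default cap for extractor documents
CHARS_PER_TOKEN = 4         # the guide's multiplier applied to input_token_limit


def truncate_for_prompt(text: str, input_token_limit: int = 0,
                        default_limit: int = DEFAULT_CHAR_LIMIT) -> str:
    """Cut text to the character budget described in the guide.

    Hypothetical helper: name and signature are illustrative only.
    """
    if input_token_limit > 0:
        # An active variant supplied a token limit: convert it to characters.
        limit = input_token_limit * CHARS_PER_TOKEN
    else:
        # No variant limit: fall back to the fixed character cap.
        limit = default_limit
    return text[:limit]
```

Passing `default_limit=6_000` would model the event classifier's cap under the same assumption.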

 ---

@@ -91,6 +94,7 @@ Use "other" for catalyst_type if unsure. Keep evidence_spans short
 | **Slug** | `event-classifier` |
 | **Purpose** | Classifies global/geopolitical news into structured macro events with impact type, severity, affected regions/sectors/commodities, and estimated duration |
 | **Default Model** | `qwen3.5:9b-fast` (Ollama) |
+| **Supported Providers** | `ollama`, `vllm` |
 | **Prompt Version** | `event-classification-v1` |
 | **Schema Version** | `1.0.0` |
 | **Entry Point** | `services/extractor/main.py` → `services/extractor/event_classifier.py` |
@@ -99,7 +103,7 @@ Use "other" for catalyst_type if unsure. Keep evidence_spans short
 - Normalized text of a macro news article (from the `stonks:queue:macro_classification` Redis queue)
 - Document ID for traceability

-**Output Schema** (`GlobalEvent`):
+**Output Schema** (`GlobalEvent` — defined in `services/extractor/event_classifier.py`):

 ```json
 {
@@ -141,9 +145,11 @@ as empty arrays.
 ```

 **User Prompt Template** (built by `build_event_classification_prompt()` in `services/extractor/event_classifier.py`):
-- Includes anti-hallucination rules
+- Includes anti-hallucination rules (no fabrication, severity "critical" reserved for multi-country events)
 - Lists all valid enum values for each field
 - Truncates articles to 6,000 characters
+- When an active variant has `input_token_limit > 0`, truncation uses `input_token_limit * 4` characters instead
+- If a variant overrides the system prompt, the classifier ensures JSON output instructions are always appended if not already present

 ---

@@ -154,6 +160,7 @@ as empty arrays.
 | **Slug** | `thesis-rewriter` |
 | **Purpose** | Rewrites deterministic trade thesis summaries into clear, professional analyst prose. Optional layer — the system falls back to the deterministic thesis if this fails |
 | **Default Model** | `qwen3.5:9b-fast` (Ollama) |
+| **Supported Providers** | `ollama`, `vllm` |
 | **Prompt Version** | `thesis-rewrite-v1` |
 | **Schema Version** | `1.0.0` |
 | **Entry Point** | `services/recommendation/main.py` → `services/recommendation/thesis_llm.py` |
@@ -165,6 +172,7 @@ as empty arrays.
 **Output Schema:**
 - Plain text (not JSON). The model returns only the rewritten thesis as a string, under 150 words.
 - On failure or empty response, the original deterministic thesis is returned unchanged.
+- A `_strip_thinking_block()` post-processor removes `<think>` XML tags and "Thinking Process:" blocks that some models (e.g. Qwen3) emit before the actual response.
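
A post-processor of the kind described in the last bullet might look like the sketch below. It is not the `_strip_thinking_block()` from `services/recommendation/thesis_llm.py`, whose body is not shown in this diff. The function name, both regular expressions, and the assumption that a "Thinking Process:" block ends at the first blank line are illustrative guesses.

```python
import re

# Matches a <think>...</think> span, including newlines inside it.
_THINK_TAG_RE = re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE)

# Assumption: a leading "Thinking Process:" block runs until the first blank line.
_THINKING_PROCESS_RE = re.compile(r"^\s*Thinking Process:.*?\n\s*\n", re.DOTALL)


def strip_thinking_block(text: str) -> str:
    """Remove reasoning preambles and return only the final answer text."""
    cleaned = _THINK_TAG_RE.sub("", text)
    cleaned = _THINKING_PROCESS_RE.sub("", cleaned, count=1)
    return cleaned.strip()
```

Text without either marker passes through unchanged apart from whitespace trimming, so the fallback to the deterministic thesis on an empty result still applies.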

 **System Prompt:**

@@ -182,11 +190,37 @@ STRICT RULES:
 5. Use a neutral, professional tone. Avoid hype or marketing language.
 6. Return ONLY the rewritten thesis text. No JSON, no markdown, no
 commentary.
+7. Do NOT show your thinking process. Do NOT include any reasoning
+steps. Output ONLY the final rewritten text.
 ```

 **User Prompt Template** (built by `build_thesis_rewrite_prompt()` in `services/recommendation/thesis_llm.py`):
 - Includes the deterministic thesis between delimiters
 - Includes trend context: ticker, window, direction, strength, confidence, contradiction score, top catalysts, top risks
+- Appends `/no_think` suffix to suppress reasoning mode on models that support it (e.g. Qwen3)
+- Ollama calls also set `"think": false` in the request payload

 ---

+## LLM Provider Abstraction
+
+All three agents support both **Ollama** and **vLLM** as inference providers. The provider is determined by the `model_provider` field in the agent config (or active variant).
+
+**Module:** `services/extractor/llm_factory.py`
+
+The `build_llm_client()` factory function routes to the correct client:
+
+| `model_provider` value | Client class | API endpoint |
+|------------------------|-------------|--------------|
+| `ollama` (default), `""`, `None` | `OllamaClient` (`services/extractor/client.py`) | `{OLLAMA_BASE_URL}/api/chat` |
+| `vllm` | `VLLMClient` (`services/extractor/vllm_client.py`) | `{VLLM_BASE_URL}/v1/chat/completions` (OpenAI-compatible) |
+| Unknown value | `OllamaClient` (with warning log) | Falls back to Ollama |
+
+Both clients implement the `LLMClient` protocol (`services/shared/llm_protocol.py`), providing `call_llm()` and `close()` methods.
+
+**Provider switching at runtime:** When a variant changes the `model_provider`, the extractor worker detects this during its periodic config refresh (every 100 jobs) and creates a new client instance. The old client is closed gracefully. A safety guard prevents switching to Ollama if `OLLAMA_BASE_URL` is empty.
+
+**vLLM health check:** At startup, if the resolved provider is `vllm`, the extractor runs a health check against the vLLM endpoint. If it fails, the worker falls back to Ollama automatically.
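
The routing table above can be read as the following sketch. It only mirrors the documented provider-to-client mapping. The two client classes are stand-ins, the factory's parameter list is an assumption, and `call_llm()` is omitted because its real signature is not shown in this diff.

```python
import logging
from dataclasses import dataclass
from typing import Optional, Union

logger = logging.getLogger("llm_factory_sketch")


@dataclass
class OllamaClient:  # stand-in, not the class in services/extractor/client.py
    endpoint: str

    def close(self) -> None:
        pass


@dataclass
class VLLMClient:  # stand-in, not the class in services/extractor/vllm_client.py
    endpoint: str

    def close(self) -> None:
        pass


def build_llm_client(model_provider: Optional[str],
                     ollama_base_url: str,
                     vllm_base_url: str) -> Union[OllamaClient, VLLMClient]:
    """Route a provider string to a client, following the table in the guide."""
    provider = model_provider or "ollama"  # "" and None mean the default
    if provider == "vllm":
        return VLLMClient(endpoint=f"{vllm_base_url}/v1/chat/completions")
    if provider != "ollama":
        # Unknown value: warn and fall back to Ollama.
        logger.warning("Unknown model_provider %r, falling back to Ollama", provider)
    return OllamaClient(endpoint=f"{ollama_base_url}/api/chat")
```

Keeping the fallback inside the factory means callers never receive `None`, which matches the guide's description of unknown values degrading to Ollama rather than failing.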
+
+---
+
@@ -202,8 +236,8 @@ Defined in migration `026_ai_agents.sql`. Stores the base configuration for each
 | `name` | `VARCHAR(100)` | — | Human-readable name (unique) |
 | `slug` | `VARCHAR(100)` | — | URL-safe identifier (unique), used by `AgentConfigResolver` |
 | `purpose` | `TEXT` | `''` | Description of what the agent does |
-| `model_provider` | `VARCHAR(50)` | `'ollama'` | LLM provider |
-| `model_name` | `VARCHAR(200)` | `'qwen3.5:9b'` | Model identifier |
+| `model_provider` | `VARCHAR(50)` | `'ollama'` | LLM provider (`ollama` or `vllm`) |
+| `model_name` | `VARCHAR(200)` | `'qwen3.5:9b-fast'` | Model identifier |
 | `system_prompt` | `TEXT` | `''` | System prompt sent to the model |
 | `user_prompt_template` | `TEXT` | `''` | User prompt template (optional — code-defined templates take precedence) |
 | `prompt_version` | `VARCHAR(100)` | `''` | Version tag for prompt tracking |
@@ -297,13 +331,20 @@ The `AgentConfigResolver` is the central mechanism for resolving runtime agent c
 2. **COALESCE-based override**: The SQL query uses `COALESCE(variant_column, agent_column)` for every configuration field. If an active variant exists and has a non-NULL value for a field, that value is used. Otherwise, the base agent's value is used.

 ```sql
-SELECT a.id AS agent_id,
-v.id AS variant_id,
+SELECT a.id AS agent_id,
+v.id AS variant_id,
 COALESCE(v.model_provider, a.model_provider) AS model_provider,
 COALESCE(v.model_name, a.model_name) AS model_name,
 COALESCE(v.system_prompt, a.system_prompt) AS system_prompt,
 COALESCE(v.user_prompt_template, a.user_prompt_template) AS user_prompt_template,
--- ... all other fields ...
+COALESCE(v.prompt_version, a.prompt_version) AS prompt_version,
+COALESCE(v.temperature, a.temperature) AS temperature,
+COALESCE(v.max_tokens, a.max_tokens) AS max_tokens,
+COALESCE(v.context_window, 0) AS context_window,
+COALESCE(v.input_token_limit, 0) AS input_token_limit,
+COALESCE(v.token_budget, 0) AS token_budget,
+COALESCE(v.timeout_seconds, a.timeout_seconds) AS timeout_seconds,
+COALESCE(v.max_retries, a.max_retries) AS max_retries
 FROM ai_agents a
 LEFT JOIN agent_variants v
 ON v.agent_id = a.id AND v.is_active = TRUE
@@ -361,7 +402,10 @@ resolver.invalidate() # Clear all entries

 ### Config Refresh in Workers

-The extractor and recommendation workers periodically re-resolve their agent config (every 100 jobs for the extractor, every 50 jobs for the recommendation worker). If the resolved model changes, the worker creates a new `OllamaClient` instance with the updated configuration.
+The extractor and recommendation workers periodically re-resolve their agent config to pick up variant swaps and model changes:
+
+- **Extractor worker** (`services/extractor/main.py`): Re-resolves both `document-extractor` and `event-classifier` configs every **100 jobs**. If the resolved model or provider changes, the worker creates a new LLM client instance via `build_llm_client()` and closes the old one. A safety guard prevents switching to Ollama if `OLLAMA_BASE_URL` is empty.
+- **Recommendation worker** (`services/recommendation/main.py`): Re-resolves the `thesis-rewriter` config every **50 jobs**. If the model changes, a new `OllamaConfig` is built.
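
The refresh cadence described in these bullets could be modelled as a small counter, sketched below. Everything here is hypothetical (`ConfigRefresher`, its fields, and the `(model_provider, model_name)` tuple shape); only the every-N-jobs check, the rebuild on change, and the Ollama safety guard are taken from the text.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Config = Tuple[str, str]  # (model_provider, model_name), an assumed shape


@dataclass
class ConfigRefresher:
    resolve: Callable[[], Config]  # stand-in for a resolver lookup
    current: Config
    refresh_every: int = 100       # 100 for the extractor, 50 for recommendation
    ollama_base_url: str = ""
    jobs_seen: int = 0

    def on_job_done(self) -> bool:
        """Return True when the caller should rebuild its LLM client."""
        self.jobs_seen += 1
        if self.jobs_seen % self.refresh_every != 0:
            return False  # not a refresh boundary yet
        resolved = self.resolve()
        if resolved == self.current:
            return False  # nothing changed
        switching_to_ollama = resolved[0] == "ollama" and self.current[0] != "ollama"
        if switching_to_ollama and not self.ollama_base_url:
            return False  # safety guard: no Ollama endpoint configured
        self.current = resolved
        return True
```

A worker would call `on_job_done()` after each job and, on `True`, build a new client and close the old one.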

 ---

@@ -373,7 +417,7 @@ Every agent invocation is logged to `agent_performance_log` with the `agent_id`

 - **Document extractor**: Logged in `services/extractor/main.py` after each extraction. Records success/failure, duration, confidence, retry count, token estimates.
 - **Event classifier**: Logged in `services/extractor/event_classifier.py` after each classification. Same fields.
-- **Thesis rewriter**: Logged in `services/recommendation/thesis_llm.py` after each rewrite attempt. Confidence is always 0.0 (not applicable for rewrites).
+- **Thesis rewriter**: Logged in `services/recommendation/thesis_llm.py` after each rewrite attempt. Confidence is always 0.0 (not applicable for rewrites). `document_id` is always NULL.

 ### Querying for Variant Comparison

@@ -464,6 +508,8 @@ All agent endpoints are served by the Query API (`services/api/app.py`) under th
 }
 ```

+All fields except `name` have defaults. The `slug` is auto-generated from `name` if not provided. The `model_name` defaults to `llama3.1:8b` for user-created agents.
+
 **Update Agent Request Body** (all fields optional):

 ```json
@@ -509,6 +555,30 @@ All agent endpoints are served by the Query API (`services/api/app.py`) under th
 | `PUT` | `/api/agents/{agent_id}/variants/{variant_id}` | Partial update a variant |
 | `DELETE` | `/api/agents/{agent_id}/variants/{variant_id}` | Delete a variant (returns 400 if active) |

+**Create Variant Request Body:**
+
+```json
+{
+"variant_name": "Llama 3.1 8B Test",
+"variant_slug": "llama-3-1-8b-test",
+"description": "Testing llama3.1:8b as an alternative",
+"model_provider": "ollama",
+"model_name": "llama3.1:8b",
+"system_prompt": "",
+"user_prompt_template": "",
+"prompt_version": "",
+"temperature": 0.0,
+"max_tokens": 32768,
+"context_window": 0,
+"input_token_limit": 0,
+"token_budget": 0,
+"timeout_seconds": 120,
+"max_retries": 2
+}
+```
+
+Required fields: `variant_name`, `model_name`. The `variant_slug` is auto-generated from `variant_name` if not provided.
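
The API's slug rules are not shown in this diff. One plausible reading, consistent with the name and slug pair in the request body above, is sketched here; `slugify` is a hypothetical helper, not the function the Query API uses.

```python
import re


def slugify(name: str) -> str:
    """Lowercase the name and join alphanumeric runs with single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
```

Under this assumption, `slugify("Llama 3.1 8B Test")` yields `llama-3-1-8b-test`, the `variant_slug` given in the example body.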
+
 ### Clone Endpoints

 | Method | Path | Description |
@@ -516,7 +586,7 @@ All agent endpoints are served by the Query API (`services/api/app.py`) under th
 | `POST` | `/api/agents/{agent_id}/clone` | Clone an agent's base config as a new variant |
 | `POST` | `/api/agents/{agent_id}/variants/{variant_id}/clone` | Clone an existing variant as a new variant |

-Clone requests copy all configuration fields from the source, with optional overrides in the request body.
+Clone requests copy all configuration fields from the source, with optional overrides in the request body. The `variant_name` field is required. All other fields default to the source's values if not provided.

 ### Activate / Deactivate

@@ -525,6 +595,8 @@ Clone requests copy all configuration fields from the source, with optional over
 | `POST` | `/api/agents/{agent_id}/variants/{variant_id}/activate` | Set a variant as active (deactivates any other active variant in a single transaction) |
 | `POST` | `/api/agents/{agent_id}/variants/deactivate` | Deactivate the currently active variant (agent falls back to base config) |

+The activate endpoint uses a database transaction to atomically deactivate the current variant and activate the new one, ensuring exactly one active variant at all times.
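
The swap inside one transaction can be illustrated as below. This is a sketch under stated assumptions: it uses the standard-library `sqlite3` module as a stand-in for the project's database, a reduced `agent_variants` table with only `id`, `agent_id`, and `is_active`, and a hypothetical `activate_variant` helper. The endpoint's real implementation is not shown in this diff.

```python
import sqlite3


def activate_variant(conn: sqlite3.Connection, agent_id: int, variant_id: int) -> None:
    """Deactivate the agent's active variant and activate another, atomically."""
    # "with conn" commits if the block succeeds and rolls back if it raises,
    # so readers never observe two active variants or a half-finished swap.
    with conn:
        conn.execute(
            "UPDATE agent_variants SET is_active = 0 "
            "WHERE agent_id = ? AND is_active = 1",
            (agent_id,),
        )
        cur = conn.execute(
            "UPDATE agent_variants SET is_active = 1 "
            "WHERE agent_id = ? AND id = ?",
            (agent_id, variant_id),
        )
        if cur.rowcount != 1:
            # Unknown variant: raising here rolls back the deactivation too.
            raise LookupError(f"variant {variant_id} not found for agent {agent_id}")
```

If the target variant does not exist, the rollback leaves the previously active variant in place, so a failed activation never drops the agent back to its base config by accident.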
+
 ### Per-Variant Performance

 | Method | Path | Description |
@@ -532,6 +604,8 @@ Clone requests copy all configuration fields from the source, with optional over
 | `GET` | `/api/agents/{agent_id}/variants/{variant_id}/performance` | Aggregated metrics for a specific variant |
 | `GET` | `/api/agents/{agent_id}/variants/{variant_id}/performance/history` | Hourly time-series for a specific variant |

+Both endpoints accept the same `hours` query parameter (default 24, max 720) and return the same response shape as the agent-level performance endpoints.
+
 ---

 ## Step-by-Step: Creating and Activating a Variant
@@ -616,3 +690,20 @@ curl -s -X PUT \
 ```

 Then re-activate and compare again.
+
+### 7. Switch to vLLM Provider
+
+To test a variant using vLLM instead of Ollama:
+
+```bash
+curl -s -X POST https://stonks-api.celestium.life/api/agents/$AGENT_ID/clone \
+-H "Content-Type: application/json" \
+-d '{
+"variant_name": "vLLM Qwen3 Test",
+"description": "Testing extraction with vLLM backend",
+"model_provider": "vllm",
+"model_name": "Qwen/Qwen3-8B"
+}' | jq .
+```
+
+The extractor worker will detect the provider change during its next config refresh and build a `VLLMClient` instead of an `OllamaClient`. Ensure the `VLLM_BASE_URL` environment variable is set in the extractor deployment.