stonks-oracle/docs/ai-agents.md
Celes Renata f468e30af0
2026-05-02 07:32:26 +00:00


AI Agent Building Guide

Stonks Oracle uses three AI agents powered by local LLM inference (Ollama or vLLM). Each agent has a dedicated purpose in the pipeline, a database-backed configuration, and support for A/B testing through variants. This guide covers how each agent works, how to configure them, how to create and test variants, and how to monitor performance.


Built-in Agents

Three agents are seeded into the ai_agents table on first migration (migration 026_ai_agents.sql). They have source = 'system' and cannot be deleted through the API — only deactivated or edited.

1. Document Intelligence Extractor

| Field | Value |
| --- | --- |
| Slug | document-extractor |
| Purpose | Extracts structured intelligence (sentiment, catalysts, impact scores, key facts, risks) from company news, SEC filings, earnings transcripts, and press releases |
| Default Model | qwen3.5:9b-fast (Ollama) |
| Supported Providers | ollama, vllm |
| Prompt Version | document-intel-v2 |
| Schema Version | 2.0.0 |
| Entry Point | services/extractor/main.py → services/extractor/llm_factory.py → services/extractor/client.py (Ollama) or services/extractor/vllm_client.py (vLLM) |

Input Data:

  • Normalized document text (fetched from MinIO or passed in the Redis job payload)
  • Document type: article, filing, transcript, or press_release
  • List of tracked tickers for company identification
  • Document ID for traceability

Output Schema (ExtractionResult — defined in services/extractor/schemas.py):

{
  "summary": "1-3 sentence summary",
  "companies": [
    {
      "ticker": "AAPL",
      "company_name": "Apple Inc.",
      "relevance": 0.9,
      "sentiment": "positive|negative|neutral|mixed",
      "impact_score": 0.7,
      "impact_horizon": "intraday|1d|1d_7d|1d_30d|30d_90d|90d_plus",
      "catalyst_type": "earnings|product|legal|macro|supply_chain|m_and_a|rating_change|other",
      "key_facts": ["fact1", "fact2"],
      "risks": ["risk1"],
      "evidence_spans": ["verbatim quote from document"]
    }
  ],
  "macro_themes": ["inflation", "ai_capex"],
  "novelty_score": 0.6,
  "confidence": 0.8,
  "extraction_warnings": []
}
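A model can return JSON that parses but still violates the enums above. The following is a minimal sketch of a per-company check, assuming the enum values listed in the schema and assuming that relevance and impact_score are meant to lie in [0, 1] (the example values suggest this, but the source does not state it). The function name check_company is hypothetical and is not taken from services/extractor/schemas.py.

```python
# Enum values copied from the output schema above.
SENTIMENTS = {"positive", "negative", "neutral", "mixed"}
HORIZONS = {"intraday", "1d", "1d_7d", "1d_30d", "30d_90d", "90d_plus"}
CATALYSTS = {"earnings", "product", "legal", "macro", "supply_chain",
             "m_and_a", "rating_change", "other"}


def check_company(entry: dict) -> list:
    """Return the names of fields that fail validation (hypothetical helper)."""
    problems = []
    if entry.get("sentiment") not in SENTIMENTS:
        problems.append("sentiment")
    if entry.get("impact_horizon") not in HORIZONS:
        problems.append("impact_horizon")
    if entry.get("catalyst_type") not in CATALYSTS:
        problems.append("catalyst_type")
    # Assumption: both scores are expected to fall in the range 0.0 to 1.0.
    for field in ("relevance", "impact_score"):
        value = entry.get(field)
        if not isinstance(value, (int, float)) or not 0.0 <= value <= 1.0:
            problems.append(field)
    return problems
```

An empty list means the entry passed every check; a caller could log the returned field names into extraction_warnings or trigger a retry.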

System Prompt:

You are a financial document analyst. Extract structured data as JSON.
Return ONLY a single JSON object. No markdown fences, no explanation,
no text before or after the JSON. Every field in the schema is required.
Use "other" for catalyst_type if unsure. Keep evidence_spans short
(under 20 words each). Keep key_facts to 3-5 items max.

User Prompt Template (built by build_extraction_prompt() in services/extractor/prompts.py):

  • Includes document type and type-specific guidance (article, filing, transcript, press release)
  • Includes tracked ticker list with rules for company identification
  • Includes the full JSON schema field descriptions
  • Truncates documents to 8,000 characters to limit inference time
  • When an active variant has input_token_limit > 0, truncation uses input_token_limit * 4 characters instead
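The two truncation rules above can be sketched as a single function. This is an illustration only: truncate_for_prompt and its constants are hypothetical names, and the real logic lives in build_extraction_prompt().

```python
DEFAULT_CHAR_LIMIT = 8_000  # default character cap described above
CHARS_PER_TOKEN = 4         # rough chars-per-token estimate used in this guide


def truncate_for_prompt(text: str, input_token_limit: int = 0,
                        default_chars: int = DEFAULT_CHAR_LIMIT) -> str:
    """Cut the document to the character cap implied by the active config."""
    if input_token_limit > 0:
        # A variant-supplied token limit is converted to characters.
        max_chars = input_token_limit * CHARS_PER_TOKEN
    else:
        # No variant limit: fall back to the built-in character cap.
        max_chars = default_chars
    return text[:max_chars]
```

For example, a variant with input_token_limit = 1000 caps the document at 4,000 characters, while a limit of 0 keeps the 8,000-character default.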

2. Global Event Classifier

| Field | Value |
| --- | --- |
| Slug | event-classifier |
| Purpose | Classifies global/geopolitical news into structured macro events with impact type, severity, affected regions/sectors/commodities, and estimated duration |
| Default Model | qwen3.5:9b-fast (Ollama) |
| Supported Providers | ollama, vllm |
| Prompt Version | event-classification-v1 |
| Schema Version | 1.0.0 |
| Entry Point | services/extractor/main.py → services/extractor/event_classifier.py |

Input Data:

  • Normalized text of a macro news article (from the stonks:queue:macro_classification Redis queue)
  • Document ID for traceability

Output Schema (GlobalEvent — defined in services/extractor/event_classifier.py):

{
  "event_types": ["trade_barrier", "commodity_shock"],
  "severity": "low|moderate|high|critical",
  "affected_regions": ["US", "CN"],
  "affected_sectors": ["Energy", "Industrials"],
  "affected_commodities": ["crude_oil"],
  "summary": "1-3 sentence summary of event and market implications",
  "key_facts": ["fact1", "fact2"],
  "estimated_duration": "short_term|medium_term|long_term",
  "confidence": 0.75
}

Valid event_types: supply_disruption, demand_shift, cost_increase, regulatory_pressure, currency_impact, commodity_shock, trade_barrier, geopolitical_risk

Valid severity: low, moderate, high, critical

System Prompt:

You classify MACRO-LEVEL global news into structured event JSON.
Return ONLY a single JSON object. No markdown, no explanation.
Every field is required. Keep key_facts to 3-5 items. Keep summary
under 3 sentences.

CRITICAL: Only classify articles about MACRO events that affect entire
markets, sectors, or economies. Examples: trade wars, interest rate
changes, commodity supply disruptions, regulatory changes, geopolitical
conflicts, natural disasters.

DO NOT classify as macro events: individual company earnings, lawsuits
against a single company, single-company management changes, individual
stock analysis, company-specific debt or bankruptcy, product launches
by one company. For these, set severity to "low", confidence below 0.3,
and leave affected_regions, affected_sectors, and affected_commodities
as empty arrays.

User Prompt Template (built by build_event_classification_prompt() in services/extractor/event_classifier.py):

  • Includes anti-hallucination rules (no fabrication, severity "critical" reserved for multi-country events)
  • Lists all valid enum values for each field
  • Truncates articles to 6,000 characters
  • When an active variant has input_token_limit > 0, truncation uses input_token_limit * 4 characters instead
  • If a variant overrides the system prompt, the classifier ensures JSON output instructions are always appended if not already present

3. Thesis Rewriter

| Field | Value |
| --- | --- |
| Slug | thesis-rewriter |
| Purpose | Rewrites deterministic trade thesis summaries into clear, professional analyst prose. Optional layer: the system falls back to the deterministic thesis if this fails |
| Default Model | qwen3.5:9b-fast (Ollama) |
| Supported Providers | ollama, vllm |
| Prompt Version | thesis-rewrite-v1 |
| Schema Version | 1.0.0 |
| Entry Point | services/recommendation/main.py → services/recommendation/thesis_llm.py |

Input Data:

  • Deterministic thesis string (rule-based, built from trend data and eligibility rules)
  • TrendSummary context: ticker, window, direction, strength, confidence, contradiction score, dominant catalysts, material risks

Output Schema:

  • Plain text (not JSON). The model returns only the rewritten thesis as a string, under 150 words.
  • On failure or empty response, the original deterministic thesis is returned unchanged.
  • A _strip_thinking_block() post-processor removes <think> XML tags and "Thinking Process:" blocks that some models (e.g. Qwen3) emit before the actual response.
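The post-processing and fallback described above might look roughly like the sketch below. It is not the project's _strip_thinking_block(): the regular expressions are assumptions, in particular that a "Thinking Process:" block ends at the first blank line. rewrite_or_fallback is a hypothetical helper name.

```python
from __future__ import annotations

import re

# <think>...</think> blocks, possibly spanning several lines.
_THINK_TAG = re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE)
# Assumption: a leading "Thinking Process:" block ends at the first blank line.
_THINKING_HEADER = re.compile(r"\AThinking Process:.*?\n\s*\n", re.DOTALL)


def strip_thinking_block(text: str) -> str:
    """Remove reasoning preambles so that only the final answer remains."""
    cleaned = _THINK_TAG.sub("", text).strip()
    cleaned = _THINKING_HEADER.sub("", cleaned)
    return cleaned.strip()


def rewrite_or_fallback(raw: str | None, deterministic_thesis: str) -> str:
    """Return the cleaned rewrite, or the deterministic thesis if it is empty."""
    cleaned = strip_thinking_block(raw or "")
    return cleaned if cleaned else deterministic_thesis
```

If the model returns nothing but a reasoning block, the cleaned text is empty and the deterministic thesis is returned unchanged.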

System Prompt:

You are a concise financial analyst. You rewrite structured trade thesis
summaries into clear, professional prose suitable for an internal
research note.

STRICT RULES:
1. Do NOT add any information that is not present in the input.
2. Do NOT fabricate numbers, dates, company names, or analyst opinions.
3. Keep the rewrite under 150 words.
4. Preserve all factual claims, risk notes, and evidence counts from
   the input.
5. Use a neutral, professional tone. Avoid hype or marketing language.
6. Return ONLY the rewritten thesis text. No JSON, no markdown, no
   commentary.
7. Do NOT show your thinking process. Do NOT include any reasoning
   steps. Output ONLY the final rewritten text.

User Prompt Template (built by build_thesis_rewrite_prompt() in services/recommendation/thesis_llm.py):

  • Includes the deterministic thesis between delimiters
  • Includes trend context: ticker, window, direction, strength, confidence, contradiction score, top catalysts, top risks
  • Appends /no_think suffix to suppress reasoning mode on models that support it (e.g. Qwen3)
  • Ollama calls also set "think": false in the request payload

LLM Provider Abstraction

All three agents support both Ollama and vLLM as inference providers. The provider is determined by the model_provider field in the agent config (or active variant).

Module: services/extractor/llm_factory.py

The build_llm_client() factory function routes to the correct client:

| model_provider value | Client class | API endpoint |
| --- | --- | --- |
| ollama (default), "", None | OllamaClient (services/extractor/client.py) | {OLLAMA_BASE_URL}/api/chat |
| vllm | VLLMClient (services/extractor/vllm_client.py) | {VLLM_BASE_URL}/v1/chat/completions (OpenAI-compatible) |
| Unknown value | OllamaClient (with warning log) | Falls back to Ollama |

Both clients implement the LLMClient protocol (services/shared/llm_protocol.py), providing call_llm() and close() methods.

Provider switching at runtime: When a variant changes the model_provider, the extractor worker detects this during its periodic config refresh (every 100 jobs) and creates a new client instance. The old client is closed gracefully. A safety guard prevents switching to Ollama if OLLAMA_BASE_URL is empty.

vLLM health check: At startup, if the resolved provider is vllm, the extractor runs a health check against the vLLM endpoint. If it fails, the worker falls back to Ollama automatically.
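The routing rules and the startup health-check fallback can be summarized in a small sketch. It only decides which provider to use and does not build real clients. resolve_provider and provider_at_startup are hypothetical names, not the signature of build_llm_client(), and the exact string matching (no case folding) is an assumption.

```python
from __future__ import annotations

import logging

logger = logging.getLogger(__name__)


def resolve_provider(model_provider: str | None) -> str:
    """Map a configured model_provider value to the provider that gets used."""
    value = model_provider or ""
    if value in ("", "ollama"):
        return "ollama"
    if value == "vllm":
        return "vllm"
    # Unknown values are not fatal: log a warning and use Ollama.
    logger.warning("Unknown model_provider %r, falling back to Ollama", model_provider)
    return "ollama"


def provider_at_startup(model_provider: str | None, vllm_healthy: bool) -> str:
    """Apply the startup rule: an unhealthy vLLM endpoint falls back to Ollama."""
    provider = resolve_provider(model_provider)
    if provider == "vllm" and not vllm_healthy:
        return "ollama"
    return provider
```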


Database Schema

ai_agents Table

Defined in migration 026_ai_agents.sql. Stores the base configuration for each agent.

| Column | Type | Default | Description |
| --- | --- | --- | --- |
| id | UUID | gen_random_uuid() | Primary key |
| name | VARCHAR(100) | | Human-readable name (unique) |
| slug | VARCHAR(100) | | URL-safe identifier (unique), used by AgentConfigResolver |
| purpose | TEXT | '' | Description of what the agent does |
| model_provider | VARCHAR(50) | 'ollama' | LLM provider (ollama or vllm) |
| model_name | VARCHAR(200) | 'qwen3.5:9b-fast' | Model identifier |
| system_prompt | TEXT | '' | System prompt sent to the model |
| user_prompt_template | TEXT | '' | User prompt template (optional; code-defined templates take precedence) |
| prompt_version | VARCHAR(100) | '' | Version tag for prompt tracking |
| schema_version | VARCHAR(50) | '1.0.0' | Version of the output schema |
| temperature | FLOAT | 0.0 | Model temperature |
| max_tokens | INTEGER | 32768 | Maximum output tokens |
| timeout_seconds | INTEGER | 120 | Request timeout |
| max_retries | INTEGER | 2 | Retry count on failure |
| active | BOOLEAN | TRUE | Whether the agent is enabled |
| source | VARCHAR(20) | 'system' | 'system' for built-in agents, 'user' for API-created |
| created_at | TIMESTAMPTZ | NOW() | Creation timestamp |
| updated_at | TIMESTAMPTZ | NOW() | Last update timestamp |

Indexes:

  • idx_ai_agents_slug on slug
  • idx_ai_agents_active on active

Registration:

  • System-seeded: The three built-in agents are inserted by migration 026 using INSERT ... WHERE NOT EXISTS — they are only created if no row with that slug exists. This means user edits to system agents are preserved across re-migrations.
  • API-created: Users can create custom agents via POST /api/agents. These get source = 'user' and can be deleted.

agent_variants Table

Defined in migration 027_agent_variants.sql. Stores alternative configurations for A/B testing.

| Column | Type | Default | Description |
| --- | --- | --- | --- |
| id | UUID | gen_random_uuid() | Primary key |
| agent_id | UUID | | Foreign key → ai_agents(id) (CASCADE delete) |
| variant_name | VARCHAR(200) | | Human-readable variant name |
| variant_slug | VARCHAR(200) | | URL-safe slug (unique per agent) |
| description | TEXT | '' | What this variant changes |
| model_provider | VARCHAR(50) | 'ollama' | LLM provider override |
| model_name | VARCHAR(200) | | Model override |
| system_prompt | TEXT | '' | System prompt override |
| user_prompt_template | TEXT | '' | User prompt template override |
| prompt_version | VARCHAR(100) | '' | Prompt version tag |
| temperature | FLOAT | 0.0 | Temperature override |
| max_tokens | INTEGER | 32768 | Max tokens override |
| context_window | INTEGER | 0 | Ollama num_ctx override (0 = model default) |
| input_token_limit | INTEGER | 0 | Max input tokens before truncation (0 = no limit) |
| token_budget | INTEGER | 0 | Total tokens per hour budget (0 = unlimited) |
| timeout_seconds | INTEGER | 120 | Timeout override |
| max_retries | INTEGER | 2 | Retry count override |
| is_active | BOOLEAN | FALSE | Whether this variant is the active override |
| created_at | TIMESTAMPTZ | NOW() | Creation timestamp |
| updated_at | TIMESTAMPTZ | NOW() | Last update timestamp |

Indexes and Constraints:

  • idx_agent_variants_slug: unique index on (agent_id, variant_slug); each agent's variant slugs must be unique
  • idx_agent_variants_active: unique partial index on (agent_id) WHERE is_active = TRUE; at most one active variant per agent (database-enforced)
  • idx_agent_variants_agent: lookup by agent

agent_performance_log Table

Defined in migration 026_ai_agents.sql, extended in 027_agent_variants.sql with variant_id.

| Column | Type | Default | Description |
| --- | --- | --- | --- |
| id | UUID | gen_random_uuid() | Primary key |
| agent_id | UUID | | Foreign key → ai_agents(id) (CASCADE delete) |
| variant_id | UUID | NULL | Foreign key → agent_variants(id) (SET NULL on delete) |
| document_id | UUID | NULL | Foreign key → documents(id) (SET NULL on delete) |
| ticker | VARCHAR(20) | | Stock ticker processed |
| success | BOOLEAN | | Whether the invocation succeeded |
| duration_ms | INTEGER | 0 | Total invocation time in milliseconds |
| confidence | FLOAT | 0.0 | Model confidence score (0.0 for thesis rewrites) |
| retry_count | INTEGER | 0 | Number of retries before success/failure |
| input_tokens | INTEGER | 0 | Estimated input tokens (chars / 4) |
| output_tokens | INTEGER | 0 | Estimated output tokens (chars / 4) |
| error_message | TEXT | NULL | Error description on failure |
| recorded_at | TIMESTAMPTZ | NOW() | When the invocation occurred |

Indexes:

  • idx_agent_perf_agent on (agent_id, recorded_at DESC)
  • idx_agent_perf_time on (recorded_at DESC)
  • idx_agent_perf_variant on (variant_id, recorded_at DESC)

AgentConfigResolver

Module: services/shared/agent_config.py

The AgentConfigResolver is the central mechanism for resolving runtime agent configuration. All three agent services use it instead of duplicating resolution logic.

How It Works

  1. Lookup by slug: The resolver queries the ai_agents table by slug (e.g., "document-extractor"), joining with agent_variants to find any active variant.

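  2. COALESCE-based override: The SQL query uses COALESCE(variant_column, agent_column) for every configuration field that exists on both tables. If an active variant exists and has a non-NULL value for a field, that value is used; otherwise the base agent's value is used. The variant-only fields (context_window, input_token_limit, token_budget) fall back to 0 when no variant is active.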

    SELECT a.id        AS agent_id,
           v.id        AS variant_id,
           COALESCE(v.model_provider,       a.model_provider)       AS model_provider,
           COALESCE(v.model_name,           a.model_name)           AS model_name,
           COALESCE(v.system_prompt,        a.system_prompt)        AS system_prompt,
           COALESCE(v.user_prompt_template, a.user_prompt_template) AS user_prompt_template,
           COALESCE(v.prompt_version,       a.prompt_version)       AS prompt_version,
           COALESCE(v.temperature,          a.temperature)          AS temperature,
           COALESCE(v.max_tokens,           a.max_tokens)           AS max_tokens,
           COALESCE(v.context_window,       0)                      AS context_window,
           COALESCE(v.input_token_limit,    0)                      AS input_token_limit,
           COALESCE(v.token_budget,         0)                      AS token_budget,
           COALESCE(v.timeout_seconds,      a.timeout_seconds)      AS timeout_seconds,
           COALESCE(v.max_retries,          a.max_retries)          AS max_retries
      FROM ai_agents a
      LEFT JOIN agent_variants v
             ON v.agent_id = a.id AND v.is_active = TRUE
     WHERE a.slug = $1
       AND a.active = TRUE
    
  3. TTL cache (60 seconds): Resolved configurations are cached in memory using time.monotonic(). Cache entries expire after 60 seconds (configurable via ttl_seconds). This means variant swaps take effect within 60 seconds without restarting any service.

  4. Fallback behavior: If the database query fails or returns no rows (agent not found or inactive), the resolver returns None. Callers fall back to environment-variable-based OllamaConfig defaults.
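The TTL caching in step 3 can be illustrated with a small standalone cache keyed by agent slug. This is a sketch under assumptions, not the resolver's actual code: the TTLCache class and its injectable clock parameter are hypothetical and exist here mainly to make the expiry behaviour easy to test.

```python
from __future__ import annotations

import time
from typing import Any, Callable


class TTLCache:
    """In-memory cache whose entries expire ttl_seconds after being stored."""

    def __init__(self, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # monotonic clock: immune to wall-clock jumps
        self._entries: dict[str, tuple[float, Any]] = {}

    def get(self, key: str) -> Any | None:
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[key]  # expired: force a fresh database lookup
            return None
        return value

    def put(self, key: str, value: Any) -> None:
        self._entries[key] = (self._clock(), value)

    def invalidate(self, key: str | None = None) -> None:
        if key is None:
            self._entries.clear()       # clear all entries
        else:
            self._entries.pop(key, None)  # clear one entry
```

A resolver built on such a cache would call get() first and query the database only on a miss, which is why a variant swap becomes visible once the cached entry ages out.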

Resolved Config Dataclass

@dataclass(frozen=True, slots=True)
class ResolvedAgentConfig:
    agent_id: str
    variant_id: str | None       # None if no active variant
    model_provider: str
    model_name: str
    system_prompt: str
    user_prompt_template: str
    prompt_version: str
    temperature: float
    max_tokens: int
    context_window: int           # Ollama num_ctx; 0 = model default
    input_token_limit: int        # Max input tokens before truncation (chars = tokens * 4); 0 = no limit
    token_budget: int             # Hourly token budget; 0 = unlimited
    timeout_seconds: int
    max_retries: int
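The override rule from the resolver query can be mirrored in plain Python, which is handy for reasoning about what a variant will change before activating it. This is a sketch only: merge_config is a hypothetical helper that works on plain dicts standing in for the two table rows.

```python
from __future__ import annotations

# Fields present on both ai_agents and agent_variants.
SHARED_FIELDS = (
    "model_provider", "model_name", "system_prompt", "user_prompt_template",
    "prompt_version", "temperature", "max_tokens", "timeout_seconds",
    "max_retries",
)
# Fields that exist only on agent_variants; they resolve to 0 without a variant.
VARIANT_ONLY_FIELDS = ("context_window", "input_token_limit", "token_budget")


def merge_config(agent: dict, variant: dict | None) -> dict:
    """Emulate COALESCE(variant.x, agent.x) for one agent and its active variant."""
    variant = variant or {}
    merged = {"agent_id": agent["id"], "variant_id": variant.get("id")}
    for field in SHARED_FIELDS:
        value = variant.get(field)
        # Only None (SQL NULL) falls through; 0.0 or '' still override the base.
        merged[field] = agent[field] if value is None else value
    for field in VARIANT_ONLY_FIELDS:
        value = variant.get(field)
        merged[field] = 0 if value is None else value
    return merged
```

Note that, as with COALESCE, an empty string or a zero in the variant counts as a value and overrides the base agent.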

Usage Pattern

from services.shared.agent_config import AgentConfigResolver

resolver = AgentConfigResolver(pool, ttl_seconds=60)
config = await resolver.resolve("document-extractor")

if config is None:
    # Fall back to env-var defaults
    ...
else:
    # Use config.model_name, config.system_prompt, etc.
    ...

Cache Invalidation

resolver.invalidate("document-extractor")  # Clear one entry
resolver.invalidate()                       # Clear all entries

Config Refresh in Workers

The extractor and recommendation workers periodically re-resolve their agent config to pick up variant swaps and model changes:

  • Extractor worker (services/extractor/main.py): Re-resolves both document-extractor and event-classifier configs every 100 jobs. If the resolved model or provider changes, the worker creates a new LLM client instance via build_llm_client() and closes the old one. A safety guard prevents switching to Ollama if OLLAMA_BASE_URL is empty.
  • Recommendation worker (services/recommendation/main.py): Re-resolves the thesis-rewriter config every 50 jobs. If the model changes, a new OllamaConfig is built.
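The refresh cadence and the client-rebuild decision can be sketched as follows. All names here (ClientKey, RefreshCounter, should_rebuild_client) are hypothetical, and whether the real workers refresh before or after the Nth job is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientKey:
    """The part of the resolved config that determines which client is needed."""
    model_provider: str
    model_name: str


class RefreshCounter:
    """Signals a config refresh once every `every` processed jobs."""

    def __init__(self, every: int) -> None:
        self.every = every
        self.count = 0

    def tick(self) -> bool:
        # Assumption: the refresh fires after the Nth, 2Nth, ... job.
        self.count += 1
        return self.count % self.every == 0


def should_rebuild_client(current: ClientKey, resolved: ClientKey,
                          ollama_base_url: str) -> bool:
    """Decide whether to replace the LLM client after a config refresh."""
    if resolved == current:
        return False
    # Safety guard: never switch to Ollama when no Ollama URL is configured.
    if resolved.model_provider == "ollama" and not ollama_base_url:
        return False
    return True
```

In a worker loop, tick() would be called once per job with every=100 for the extractor or every=50 for the recommendation worker; when it returns True, the config is re-resolved and should_rebuild_client() decides whether the old client is closed and replaced.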

Performance Logging and Variant Comparison

Every agent invocation is logged to agent_performance_log with the agent_id and variant_id (if a variant was active). This enables comparing variant effectiveness.

What Gets Logged

  • Document extractor: Logged in services/extractor/main.py after each extraction. Records success/failure, duration, confidence, retry count, token estimates.
  • Event classifier: Logged in services/extractor/event_classifier.py after each classification. Same fields.
  • Thesis rewriter: Logged in services/recommendation/thesis_llm.py after each rewrite attempt. Confidence is always 0.0 (not applicable for rewrites). document_id is always NULL.

Querying for Variant Comparison

Compare two variants of the document extractor over the last 24 hours:

SELECT
    v.variant_name,
    COUNT(*) AS total_invocations,
    COUNT(*) FILTER (WHERE p.success) AS successes,
    ROUND(100.0 * COUNT(*) FILTER (WHERE p.success) / COUNT(*), 1) AS success_rate_pct,
    ROUND(AVG(p.duration_ms)::numeric) AS avg_duration_ms,
    ROUND(PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY p.duration_ms)::numeric) AS p95_duration_ms,
    ROUND(AVG(p.confidence)::numeric, 4) AS avg_confidence,
    ROUND(AVG(p.retry_count)::numeric, 2) AS avg_retries,
    SUM(p.input_tokens + p.output_tokens) AS total_tokens
FROM agent_performance_log p
JOIN agent_variants v ON v.id = p.variant_id
WHERE p.agent_id = '<agent-uuid>'
  AND p.recorded_at >= NOW() - INTERVAL '24 hours'
GROUP BY v.variant_name
ORDER BY success_rate_pct DESC;

Compare base agent (no variant) vs active variant:

SELECT
    CASE WHEN p.variant_id IS NULL THEN 'base' ELSE v.variant_name END AS config,
    COUNT(*) AS invocations,
    ROUND(100.0 * COUNT(*) FILTER (WHERE p.success) / COUNT(*), 1) AS success_rate_pct,
    ROUND(AVG(p.duration_ms)::numeric) AS avg_duration_ms,
    ROUND(AVG(p.confidence)::numeric, 4) AS avg_confidence
FROM agent_performance_log p
LEFT JOIN agent_variants v ON v.id = p.variant_id
WHERE p.agent_id = '<agent-uuid>'
  AND p.recorded_at >= NOW() - INTERVAL '48 hours'
GROUP BY config
ORDER BY config;

Token Budget Enforcement

Variants can set a token_budget (total tokens per hour). Before each invocation, the worker checks:

SELECT COALESCE(SUM(input_tokens + output_tokens), 0) AS total_tokens
FROM agent_performance_log
WHERE variant_id = $1
  AND recorded_at >= NOW() - INTERVAL '1 hour'

If the budget is exceeded, the invocation is skipped (extractor) or falls back to the deterministic thesis (thesis rewriter).
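A minimal sketch of the token estimate and the budget gate, assuming integer division for the chars / 4 estimate and assuming the budget counts as exceeded once usage reaches it (the source does not say whether the comparison is strict). Both function names are hypothetical.

```python
CHARS_PER_TOKEN = 4  # estimate used for input_tokens and output_tokens


def estimate_tokens(text: str) -> int:
    """Approximate a token count from character length (assumed floor division)."""
    return len(text) // CHARS_PER_TOKEN


def budget_exceeded(used_last_hour: int, token_budget: int) -> bool:
    """Return True when the variant should skip or fall back for this invocation."""
    if token_budget <= 0:
        return False  # 0 means unlimited
    # Assumption: reaching the budget exactly already counts as exceeded.
    return used_last_hour >= token_budget
```

Here used_last_hour stands for the total_tokens value returned by the query above.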


API Endpoints

All agent endpoints are served by the Query API (services/api/app.py) under the /api/agents prefix.

Agent CRUD

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/agents | List all agents. Query param: active_only (bool, default false) |
| GET | /api/agents/{agent_id} | Get a single agent by UUID |
| POST | /api/agents | Create a new user-defined agent (returns 201) |
| PUT | /api/agents/{agent_id} | Partially update an agent (system or user) |
| DELETE | /api/agents/{agent_id} | Delete a user-created agent. Returns 403 for system agents |

Create Agent Request Body:

{
  "name": "My Custom Agent",
  "slug": "my-custom-agent",
  "purpose": "Custom extraction for earnings calls",
  "model_provider": "ollama",
  "model_name": "llama3.1:8b",
  "system_prompt": "You are a financial analyst...",
  "user_prompt_template": "",
  "prompt_version": "v1",
  "schema_version": "1.0.0",
  "temperature": 0.0,
  "max_tokens": 32768,
  "timeout_seconds": 120,
  "max_retries": 2
}

All fields except name have defaults. The slug is auto-generated from name if not provided. The model_name defaults to llama3.1:8b for user-created agents.

Update Agent Request Body (all fields optional):

{
  "model_name": "qwen3.5:14b",
  "system_prompt": "Updated prompt...",
  "temperature": 0.1,
  "active": false
}

Agent Performance

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/agents/{agent_id}/performance | Aggregated metrics. Query param: hours (int, default 24, max 720) |
| GET | /api/agents/{agent_id}/performance/history | Hourly time-series. Query param: hours (int, default 24, max 720) |

Performance Response:

{
  "total_invocations": 1250,
  "successes": 1180,
  "failures": 70,
  "avg_duration_ms": 3400,
  "p95_duration_ms": 8200,
  "avg_confidence": 0.7234,
  "avg_retries": 0.15,
  "total_input_tokens": 5000000,
  "total_output_tokens": 1200000,
  "success_rate": 0.944
}

Variant CRUD

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/agents/{agent_id}/variants | List all variants for an agent |
| GET | /api/agents/{agent_id}/variants/{variant_id} | Get a single variant |
| POST | /api/agents/{agent_id}/variants | Create a new variant (returns 201, 409 on duplicate slug) |
| PUT | /api/agents/{agent_id}/variants/{variant_id} | Partially update a variant |
| DELETE | /api/agents/{agent_id}/variants/{variant_id} | Delete a variant (returns 400 if active) |

Create Variant Request Body:

{
  "variant_name": "Llama 3.1 8B Test",
  "variant_slug": "llama-3-1-8b-test",
  "description": "Testing llama3.1:8b as an alternative",
  "model_provider": "ollama",
  "model_name": "llama3.1:8b",
  "system_prompt": "",
  "user_prompt_template": "",
  "prompt_version": "",
  "temperature": 0.0,
  "max_tokens": 32768,
  "context_window": 0,
  "input_token_limit": 0,
  "token_budget": 0,
  "timeout_seconds": 120,
  "max_retries": 2
}

Required fields: variant_name, model_name. The variant_slug is auto-generated from variant_name if not provided.

Clone Endpoints

| Method | Path | Description |
| --- | --- | --- |
| POST | /api/agents/{agent_id}/clone | Clone an agent's base config as a new variant |
| POST | /api/agents/{agent_id}/variants/{variant_id}/clone | Clone an existing variant as a new variant |

Clone requests copy all configuration fields from the source, with optional overrides in the request body. The variant_name field is required. All other fields default to the source's values if not provided.

Activate / Deactivate

| Method | Path | Description |
| --- | --- | --- |
| POST | /api/agents/{agent_id}/variants/{variant_id}/activate | Set a variant as active (deactivates any other active variant in a single transaction) |
| POST | /api/agents/{agent_id}/variants/deactivate | Deactivate the currently active variant (agent falls back to base config) |

The activate endpoint uses a database transaction to atomically deactivate the current variant and activate the new one, so an agent never has more than one active variant at a time.
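The combination of a partial unique index and a transactional swap can be tried locally. The sketch below uses Python's built-in sqlite3 module with an in-memory database purely for illustration; the production schema is PostgreSQL, the table here is reduced to four columns, and the activate() helper is hypothetical rather than the API's implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE agent_variants (
    id           INTEGER PRIMARY KEY,
    agent_id     INTEGER NOT NULL,
    variant_slug TEXT    NOT NULL,
    is_active    INTEGER NOT NULL DEFAULT 0
);
-- Partial unique index: at most one active row per agent.
CREATE UNIQUE INDEX idx_agent_variants_active
    ON agent_variants (agent_id) WHERE is_active = 1;
INSERT INTO agent_variants (id, agent_id, variant_slug, is_active) VALUES
    (1, 10, 'llama-test', 1),
    (2, 10, 'vllm-test', 0);
""")


def activate(db: sqlite3.Connection, agent_id: int, variant_id: int) -> None:
    """Swap the active variant inside one transaction."""
    with db:  # commits on success, rolls back if either statement fails
        db.execute(
            "UPDATE agent_variants SET is_active = 0 "
            "WHERE agent_id = ? AND is_active = 1", (agent_id,))
        db.execute(
            "UPDATE agent_variants SET is_active = 1 "
            "WHERE id = ? AND agent_id = ?", (variant_id, agent_id))


def active_variants(db: sqlite3.Connection, agent_id: int) -> list:
    """Return the ids of the active variants for an agent (zero or one)."""
    rows = db.execute(
        "SELECT id FROM agent_variants WHERE agent_id = ? AND is_active = 1",
        (agent_id,))
    return [row[0] for row in rows]
```

Deactivating first and activating second keeps the index satisfied at every step; trying to set a second row active without deactivating the first raises sqlite3.IntegrityError.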

Per-Variant Performance

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/agents/{agent_id}/variants/{variant_id}/performance | Aggregated metrics for a specific variant |
| GET | /api/agents/{agent_id}/variants/{variant_id}/performance/history | Hourly time-series for a specific variant |

Both endpoints accept the same hours query parameter (default 24, max 720) and return the same response shape as the agent-level performance endpoints.


Step-by-Step: Creating and Activating a Variant

This walkthrough creates a new variant of the document extractor that uses a different model and activates it for live traffic.

1. Find the Agent ID

curl -s "https://stonks-api.celestium.life/api/agents?active_only=true" | jq -r '.[] | select(.slug == "document-extractor") | .id'

Note the UUID — we'll call it AGENT_ID.

2. Clone the Agent as a Variant

curl -s -X POST https://stonks-api.celestium.life/api/agents/$AGENT_ID/clone \
  -H "Content-Type: application/json" \
  -d '{
    "variant_name": "Llama 3.1 8B Test",
    "description": "Testing llama3.1:8b as an alternative to qwen3.5:9b-fast",
    "model_name": "llama3.1:8b",
    "temperature": 0.1
  }' | jq .

This creates a new variant with all fields copied from the base agent, except model_name and temperature which are overridden. The variant starts as is_active: false.

Note the variant's id — we'll call it VARIANT_ID.

3. Activate the Variant

curl -s -X POST \
  https://stonks-api.celestium.life/api/agents/$AGENT_ID/variants/$VARIANT_ID/activate | jq .

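This atomically deactivates any previously active variant and activates the new one. The cached configuration expires within 60 seconds (the TTL cache window), after which the extractor worker picks up the new configuration and starts using llama3.1:8b. Because the worker rebuilds its LLM client only during its periodic config refresh (every 100 jobs), the model switch may lag until the next refresh.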
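The clone and activate calls from steps 2 and 3 can also be scripted. The sketch below uses only the Python standard library and separates building the requests from sending them, so the request shape can be inspected without touching the API. The helper names are hypothetical; the paths and body fields are the ones shown in this guide.

```python
import json
import urllib.request

BASE_URL = "https://stonks-api.celestium.life"


def clone_request(agent_id: str, variant_name: str, **overrides) -> urllib.request.Request:
    """Build the POST that clones an agent's base config into a new variant."""
    body = {"variant_name": variant_name, **overrides}
    return urllib.request.Request(
        f"{BASE_URL}/api/agents/{agent_id}/clone",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def activate_request(agent_id: str, variant_id: str) -> urllib.request.Request:
    """Build the POST that makes a variant the active override."""
    return urllib.request.Request(
        f"{BASE_URL}/api/agents/{agent_id}/variants/{variant_id}/activate",
        data=b"",
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON response (performs network I/O)."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

A typical sequence would be variant = send(clone_request(AGENT_ID, "Llama 3.1 8B Test", model_name="llama3.1:8b")) followed by send(activate_request(AGENT_ID, variant["id"])), assuming the clone response carries the new variant's id as described in step 2.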

4. Monitor Performance

Wait for some documents to be processed, then compare:

# Base agent performance (all invocations)
curl -s "https://stonks-api.celestium.life/api/agents/$AGENT_ID/performance?hours=4" | jq .

# Variant-specific performance
curl -s "https://stonks-api.celestium.life/api/agents/$AGENT_ID/variants/$VARIANT_ID/performance?hours=4" | jq .

Check the hourly trend:

curl -s "https://stonks-api.celestium.life/api/agents/$AGENT_ID/variants/$VARIANT_ID/performance/history?hours=12" | jq .

5. Roll Back (Deactivate)

If the variant underperforms, deactivate it to revert to the base agent config:

curl -s -X POST \
  https://stonks-api.celestium.life/api/agents/$AGENT_ID/variants/deactivate | jq .

The extractor will revert to the base qwen3.5:9b-fast configuration within 60 seconds.

6. Iterate

You can update the variant's prompt or parameters without creating a new one:

curl -s -X PUT \
  https://stonks-api.celestium.life/api/agents/$AGENT_ID/variants/$VARIANT_ID \
  -H "Content-Type: application/json" \
  -d '{
    "system_prompt": "You are a financial document analyst. Extract structured data as JSON. Be extra conservative with impact scores — only assign > 0.7 for material events with concrete numbers.",
    "prompt_version": "document-intel-v2-conservative"
  }' | jq .

If you deactivated the variant in step 5, re-activate it, then compare again.

7. Switch to vLLM Provider

To test a variant using vLLM instead of Ollama:

curl -s -X POST https://stonks-api.celestium.life/api/agents/$AGENT_ID/clone \
  -H "Content-Type: application/json" \
  -d '{
    "variant_name": "vLLM Qwen3 Test",
    "description": "Testing extraction with vLLM backend",
    "model_provider": "vllm",
    "model_name": "Qwen/Qwen3-8B"
  }' | jq .

Once this variant is activated (as in step 3), the extractor worker detects the provider change during its next config refresh and builds a VLLMClient instead of an OllamaClient. Ensure the VLLM_BASE_URL environment variable is set in the extractor deployment.