

# AI Agent Building Guide
Stonks Oracle uses three AI agents powered by a local Ollama instance. Each agent has a dedicated purpose in the pipeline, a database-backed configuration, and support for A/B testing through variants. This guide covers how each agent works, how to configure them, how to create and test variants, and how to monitor performance.
## Table of Contents
- [Built-in Agents](#built-in-agents)
  - [Document Intelligence Extractor](#1-document-intelligence-extractor)
  - [Global Event Classifier](#2-global-event-classifier)
  - [Thesis Rewriter](#3-thesis-rewriter)
- [Database Schema](#database-schema)
  - [ai_agents Table](#ai_agents-table)
  - [agent_variants Table](#agent_variants-table)
  - [agent_performance_log Table](#agent_performance_log-table)
- [AgentConfigResolver](#agentconfigresolver)
- [Performance Logging and Variant Comparison](#performance-logging-and-variant-comparison)
- [API Endpoints](#api-endpoints)
- [Step-by-Step: Creating and Activating a Variant](#step-by-step-creating-and-activating-a-variant)
---
## Built-in Agents
Three agents are seeded into the `ai_agents` table on first migration (migration `026_ai_agents.sql`). They have `source = 'system'` and cannot be deleted through the API — only deactivated or edited.
### 1. Document Intelligence Extractor
| Field | Value |
|-------|-------|
| **Slug** | `document-extractor` |
| **Purpose** | Extracts structured intelligence (sentiment, catalysts, impact scores, key facts, risks) from company news, SEC filings, earnings transcripts, and press releases |
| **Default Model** | `qwen3.5:9b-fast` (Ollama) |
| **Prompt Version** | `document-intel-v2` |
| **Schema Version** | `2.0.0` |
| **Entry Point** | `services/extractor/main.py` → `services/extractor/client.py` |
**Input Data:**
- Normalized document text (fetched from MinIO or passed in the Redis job payload)
- Document type: `article`, `filing`, `transcript`, or `press_release`
- List of tracked tickers for company identification
- Document ID for traceability
**Output Schema** (`ExtractionResult`):
```json
{
  "summary": "1-3 sentence summary",
  "companies": [
    {
      "ticker": "AAPL",
      "company_name": "Apple Inc.",
      "relevance": 0.9,
      "sentiment": "positive|negative|neutral|mixed",
      "impact_score": 0.7,
      "impact_horizon": "intraday|1d|1d_7d|1d_30d|30d_90d|90d_plus",
      "catalyst_type": "earnings|product|legal|macro|supply_chain|m_and_a|rating_change|other",
      "key_facts": ["fact1", "fact2"],
      "risks": ["risk1"],
      "evidence_spans": ["verbatim quote from document"]
    }
  ],
  "macro_themes": ["inflation", "ai_capex"],
  "novelty_score": 0.6,
  "confidence": 0.8,
  "extraction_warnings": []
}
```
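Because the model is asked to return free-form JSON, a caller may want to check the enum-valued fields before trusting a response. The following is a minimal sketch of such a check, not the project's actual validation code; `check_extraction` is a hypothetical helper, and the allowed values are copied from the schema above.

```python
import json

# Allowed values, copied from the ExtractionResult schema above.
SENTIMENTS = {"positive", "negative", "neutral", "mixed"}
HORIZONS = {"intraday", "1d", "1d_7d", "1d_30d", "30d_90d", "90d_plus"}
CATALYSTS = {"earnings", "product", "legal", "macro", "supply_chain",
             "m_and_a", "rating_change", "other"}

ENUM_FIELDS = (
    ("sentiment", SENTIMENTS),
    ("impact_horizon", HORIZONS),
    ("catalyst_type", CATALYSTS),
)


def check_extraction(raw: str) -> list[str]:
    """Return a list of problems found in a raw model response (hypothetical helper)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]

    problems: list[str] = []
    for i, company in enumerate(data.get("companies", [])):
        if not isinstance(company, dict):
            problems.append(f"companies[{i}] is not an object")
            continue
        for field, allowed in ENUM_FIELDS:
            value = company.get(field)
            if value not in allowed:
                problems.append(f"companies[{i}].{field}: {value!r}")
    return problems
```

An empty list means the enum fields look valid; anything else could be surfaced as an extraction warning or used to trigger a retry.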
**System Prompt:**
```
You are a financial document analyst. Extract structured data as JSON.
Return ONLY a single JSON object. No markdown fences, no explanation,
no text before or after the JSON. Every field in the schema is required.
Use "other" for catalyst_type if unsure. Keep evidence_spans short
(under 20 words each). Keep key_facts to 3-5 items max.
```
**User Prompt Template** (built by `build_extraction_prompt()` in `services/extractor/prompts.py`):
- Includes document type and type-specific guidance (article, filing, transcript, press release)
- Includes tracked ticker list with rules for company identification
- Includes the full JSON schema field descriptions
- Truncates documents to 8,000 characters to limit inference time
---
### 2. Global Event Classifier
| Field | Value |
|-------|-------|
| **Slug** | `event-classifier` |
| **Purpose** | Classifies global/geopolitical news into structured macro events with impact type, severity, affected regions/sectors/commodities, and estimated duration |
| **Default Model** | `qwen3.5:9b-fast` (Ollama) |
| **Prompt Version** | `event-classification-v1` |
| **Schema Version** | `1.0.0` |
| **Entry Point** | `services/extractor/main.py` → `services/extractor/event_classifier.py` |
**Input Data:**
- Normalized text of a macro news article (from the `stonks:queue:macro_classification` Redis queue)
- Document ID for traceability
**Output Schema** (`GlobalEvent`):
```json
{
"event_types": ["trade_barrier", "commodity_shock"],
"severity": "low|moderate|high|critical",
"affected_regions": ["US", "CN"],
"affected_sectors": ["Energy", "Industrials"],
"affected_commodities": ["crude_oil"],
"summary": "1-3 sentence summary of event and market implications",
"key_facts": ["fact1", "fact2"],
"estimated_duration": "short_term|medium_term|long_term",
"confidence": 0.75
}
```
Valid `event_types`: `supply_disruption`, `demand_shift`, `cost_increase`, `regulatory_pressure`, `currency_impact`, `commodity_shock`, `trade_barrier`, `geopolitical_risk`
Valid `severity`: `low`, `moderate`, `high`, `critical`
**System Prompt:**
```
You classify MACRO-LEVEL global news into structured event JSON.
Return ONLY a single JSON object. No markdown, no explanation.
Every field is required. Keep key_facts to 3-5 items. Keep summary
under 3 sentences.
CRITICAL: Only classify articles about MACRO events that affect entire
markets, sectors, or economies. Examples: trade wars, interest rate
changes, commodity supply disruptions, regulatory changes, geopolitical
conflicts, natural disasters.
DO NOT classify as macro events: individual company earnings, lawsuits
against a single company, single-company management changes, individual
stock analysis, company-specific debt or bankruptcy, product launches
by one company. For these, set severity to "low", confidence below 0.3,
and leave affected_regions, affected_sectors, and affected_commodities
as empty arrays.
```
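The prompt tells the model to mark company-specific articles with a recognisable pattern: `severity` of `"low"`, `confidence` below 0.3, and empty `affected_*` arrays. A downstream consumer could use that pattern to discard such results. The sketch below assumes the parsed response is a plain `dict`; `looks_company_specific` is a hypothetical name and not part of the codebase.

```python
def looks_company_specific(event: dict) -> bool:
    """True if a classified event matches the 'not a macro event' pattern
    described in the system prompt (hypothetical helper)."""
    return (
        event.get("severity") == "low"
        and float(event.get("confidence", 0.0)) < 0.3
        and not event.get("affected_regions")
        and not event.get("affected_sectors")
        and not event.get("affected_commodities")
    )


macro = {
    "severity": "high",
    "confidence": 0.75,
    "affected_regions": ["US", "CN"],
    "affected_sectors": ["Energy"],
    "affected_commodities": ["crude_oil"],
}
single_company = {
    "severity": "low",
    "confidence": 0.1,
    "affected_regions": [],
    "affected_sectors": [],
    "affected_commodities": [],
}
print(looks_company_specific(macro), looks_company_specific(single_company))
```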
**User Prompt Template** (built by `build_event_classification_prompt()` in `services/extractor/event_classifier.py`):
- Includes anti-hallucination rules
- Lists all valid enum values for each field
- Truncates articles to 6,000 characters
---
### 3. Thesis Rewriter
| Field | Value |
|-------|-------|
| **Slug** | `thesis-rewriter` |
| **Purpose** | Rewrites deterministic trade thesis summaries into clear, professional analyst prose. Optional layer — the system falls back to the deterministic thesis if this fails |
| **Default Model** | `qwen3.5:9b-fast` (Ollama) |
| **Prompt Version** | `thesis-rewrite-v1` |
| **Schema Version** | `1.0.0` |
| **Entry Point** | `services/recommendation/main.py` → `services/recommendation/thesis_llm.py` |
**Input Data:**
- Deterministic thesis string (rule-based, built from trend data and eligibility rules)
- `TrendSummary` context: ticker, window, direction, strength, confidence, contradiction score, dominant catalysts, material risks
**Output Schema:**
- Plain text (not JSON). The model returns only the rewritten thesis as a string, under 150 words.
- On failure or empty response, the original deterministic thesis is returned unchanged.
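The fallback rule above can be expressed as a thin wrapper around whatever function calls the model. This is an illustrative sketch only: `rewrite_or_fallback` and the `rewrite` callable are hypothetical, and the real logic in `thesis_llm.py` may be structured differently.

```python
from typing import Callable


def rewrite_or_fallback(thesis: str, rewrite: Callable[[str], str]) -> str:
    """Return the rewritten thesis, or the original if the rewrite fails
    or comes back empty (hypothetical helper)."""
    try:
        rewritten = rewrite(thesis)
    except Exception:
        # Any model or transport error: keep the deterministic thesis.
        return thesis
    rewritten = (rewritten or "").strip()
    return rewritten if rewritten else thesis


def failing_model(_: str) -> str:
    raise TimeoutError("model did not respond")


original = "AAPL: bullish over 7d, strength 0.8, 12 supporting documents."
print(rewrite_or_fallback(original, failing_model))    # original thesis
print(rewrite_or_fallback(original, lambda t: "   "))  # original thesis
```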
**System Prompt:**
```
You are a concise financial analyst. You rewrite structured trade thesis
summaries into clear, professional prose suitable for an internal
research note.
STRICT RULES:
1. Do NOT add any information that is not present in the input.
2. Do NOT fabricate numbers, dates, company names, or analyst opinions.
3. Keep the rewrite under 150 words.
4. Preserve all factual claims, risk notes, and evidence counts from
the input.
5. Use a neutral, professional tone. Avoid hype or marketing language.
6. Return ONLY the rewritten thesis text. No JSON, no markdown, no
commentary.
```
**User Prompt Template** (built by `build_thesis_rewrite_prompt()` in `services/recommendation/thesis_llm.py`):
- Includes the deterministic thesis between delimiters
- Includes trend context: ticker, window, direction, strength, confidence, contradiction score, top catalysts, top risks
---
## Database Schema
### `ai_agents` Table
Defined in migration `026_ai_agents.sql`. Stores the base configuration for each agent.
| Column | Type | Default | Description |
|--------|------|---------|-------------|
| `id` | `UUID` | `gen_random_uuid()` | Primary key |
| `name` | `VARCHAR(100)` | — | Human-readable name (unique) |
| `slug` | `VARCHAR(100)` | — | URL-safe identifier (unique), used by `AgentConfigResolver` |
| `purpose` | `TEXT` | `''` | Description of what the agent does |
| `model_provider` | `VARCHAR(50)` | `'ollama'` | LLM provider |
| `model_name` | `VARCHAR(200)` | `'qwen3.5:9b'` | Model identifier |
| `system_prompt` | `TEXT` | `''` | System prompt sent to the model |
| `user_prompt_template` | `TEXT` | `''` | User prompt template (optional — code-defined templates take precedence) |
| `prompt_version` | `VARCHAR(100)` | `''` | Version tag for prompt tracking |
| `schema_version` | `VARCHAR(50)` | `'1.0.0'` | Version of the output schema |
| `temperature` | `FLOAT` | `0.0` | Model temperature |
| `max_tokens` | `INTEGER` | `32768` | Maximum output tokens |
| `timeout_seconds` | `INTEGER` | `120` | Request timeout |
| `max_retries` | `INTEGER` | `2` | Retry count on failure |
| `active` | `BOOLEAN` | `TRUE` | Whether the agent is enabled |
| `source` | `VARCHAR(20)` | `'system'` | `'system'` for built-in agents, `'user'` for API-created |
| `created_at` | `TIMESTAMPTZ` | `NOW()` | Creation timestamp |
| `updated_at` | `TIMESTAMPTZ` | `NOW()` | Last update timestamp |
**Indexes:**
- `idx_ai_agents_slug` on `slug`
- `idx_ai_agents_active` on `active`
**Registration:**
- **System-seeded**: The three built-in agents are inserted by migration 026 using `INSERT ... WHERE NOT EXISTS` — they are only created if no row with that slug exists. This means user edits to system agents are preserved across re-migrations.
- **API-created**: Users can create custom agents via `POST /api/agents`. These get `source = 'user'` and can be deleted.
### `agent_variants` Table
Defined in migration `027_agent_variants.sql`. Stores alternative configurations for A/B testing.
| Column | Type | Default | Description |
|--------|------|---------|-------------|
| `id` | `UUID` | `gen_random_uuid()` | Primary key |
| `agent_id` | `UUID` | — | Foreign key → `ai_agents(id)` (CASCADE delete) |
| `variant_name` | `VARCHAR(200)` | — | Human-readable variant name |
| `variant_slug` | `VARCHAR(200)` | — | URL-safe slug (unique per agent) |
| `description` | `TEXT` | `''` | What this variant changes |
| `model_provider` | `VARCHAR(50)` | `'ollama'` | LLM provider override |
| `model_name` | `VARCHAR(200)` | — | Model override |
| `system_prompt` | `TEXT` | `''` | System prompt override |
| `user_prompt_template` | `TEXT` | `''` | User prompt template override |
| `prompt_version` | `VARCHAR(100)` | `''` | Prompt version tag |
| `temperature` | `FLOAT` | `0.0` | Temperature override |
| `max_tokens` | `INTEGER` | `32768` | Max tokens override |
| `context_window` | `INTEGER` | `0` | Ollama `num_ctx` override (0 = model default) |
| `input_token_limit` | `INTEGER` | `0` | Max input tokens before truncation (0 = no limit) |
| `token_budget` | `INTEGER` | `0` | Total tokens per hour budget (0 = unlimited) |
| `timeout_seconds` | `INTEGER` | `120` | Timeout override |
| `max_retries` | `INTEGER` | `2` | Retry count override |
| `is_active` | `BOOLEAN` | `FALSE` | Whether this variant is the active override |
| `created_at` | `TIMESTAMPTZ` | `NOW()` | Creation timestamp |
| `updated_at` | `TIMESTAMPTZ` | `NOW()` | Last update timestamp |
**Indexes and Constraints:**
- `idx_agent_variants_slug`: unique index on `(agent_id, variant_slug)`, so each agent's variant slugs must be unique
- `idx_agent_variants_active`: unique partial index on `(agent_id) WHERE is_active = TRUE`, which guarantees **at most one active variant per agent** (database-enforced)
- `idx_agent_variants_agent`: lookup by agent
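To see how a unique partial index enforces the "one active variant" rule, the sketch below reproduces the constraint in an in-memory SQLite database, which also supports partial indexes. This is a stand-in for illustration: the real table lives in PostgreSQL, the table here is reduced to a few columns, and `is_active` is stored as an integer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for agent_variants; only the columns the index needs.
conn.execute(
    """
    CREATE TABLE agent_variants (
        id INTEGER PRIMARY KEY,
        agent_id TEXT NOT NULL,
        variant_slug TEXT NOT NULL,
        is_active INTEGER NOT NULL DEFAULT 0
    )
    """
)
# Only rows with is_active = 1 are indexed, so uniqueness applies to them alone.
conn.execute(
    "CREATE UNIQUE INDEX idx_agent_variants_active "
    "ON agent_variants (agent_id) WHERE is_active = 1"
)

insert = "INSERT INTO agent_variants (agent_id, variant_slug, is_active) VALUES (?, ?, ?)"
conn.execute(insert, ("agent-1", "variant-a", 1))  # first active variant: accepted
conn.execute(insert, ("agent-1", "variant-b", 0))  # inactive variants are unrestricted

try:
    conn.execute(insert, ("agent-1", "variant-c", 1))  # second active variant
    second_active_rejected = False
except sqlite3.IntegrityError:
    second_active_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM agent_variants").fetchone()[0]
print(second_active_rejected, row_count)
```

This is also why activation has to deactivate the previous variant in the same transaction: activating a second row first would violate the index.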
### `agent_performance_log` Table
Defined in migration `026_ai_agents.sql`, extended in `027_agent_variants.sql` with `variant_id`.
| Column | Type | Default | Description |
|--------|------|---------|-------------|
| `id` | `UUID` | `gen_random_uuid()` | Primary key |
| `agent_id` | `UUID` | — | Foreign key → `ai_agents(id)` (CASCADE delete) |
| `variant_id` | `UUID` | `NULL` | Foreign key → `agent_variants(id)` (SET NULL on delete) |
| `document_id` | `UUID` | `NULL` | Foreign key → `documents(id)` (SET NULL on delete) |
| `ticker` | `VARCHAR(20)` | — | Stock ticker processed |
| `success` | `BOOLEAN` | — | Whether the invocation succeeded |
| `duration_ms` | `INTEGER` | `0` | Total invocation time in milliseconds |
| `confidence` | `FLOAT` | `0.0` | Model confidence score (0.0 for thesis rewrites) |
| `retry_count` | `INTEGER` | `0` | Number of retries before success/failure |
| `input_tokens` | `INTEGER` | `0` | Estimated input tokens (chars / 4) |
| `output_tokens` | `INTEGER` | `0` | Estimated output tokens (chars / 4) |
| `error_message` | `TEXT` | `NULL` | Error description on failure |
| `recorded_at` | `TIMESTAMPTZ` | `NOW()` | When the invocation occurred |
**Indexes:**
- `idx_agent_perf_agent` on `(agent_id, recorded_at DESC)`
- `idx_agent_perf_time` on `(recorded_at DESC)`
- `idx_agent_perf_variant` on `(variant_id, recorded_at DESC)`
---
## AgentConfigResolver
**Module:** `services/shared/agent_config.py`
The `AgentConfigResolver` is the central mechanism for resolving runtime agent configuration. All three agent services use it instead of duplicating resolution logic.
### How It Works
1. **Lookup by slug**: The resolver queries the `ai_agents` table by slug (e.g., `"document-extractor"`), joining with `agent_variants` to find any active variant.
2. **COALESCE-based override**: The SQL query uses `COALESCE(variant_column, agent_column)` for every configuration field. If an active variant exists and has a non-NULL value for a field, that value is used. Otherwise, the base agent's value is used.
```sql
SELECT a.id AS agent_id,
v.id AS variant_id,
COALESCE(v.model_provider, a.model_provider) AS model_provider,
COALESCE(v.model_name, a.model_name) AS model_name,
COALESCE(v.system_prompt, a.system_prompt) AS system_prompt,
COALESCE(v.user_prompt_template, a.user_prompt_template) AS user_prompt_template,
-- ... all other fields ...
FROM ai_agents a
LEFT JOIN agent_variants v
ON v.agent_id = a.id AND v.is_active = TRUE
WHERE a.slug = $1
AND a.active = TRUE
```
3. **TTL cache (60 seconds)**: Resolved configurations are cached in memory using `time.monotonic()`. Cache entries expire after 60 seconds (configurable via `ttl_seconds`). This means variant swaps take effect within 60 seconds without restarting any service.
4. **Fallback behavior**: If the database query fails or returns no rows (agent not found or inactive), the resolver returns `None`. Callers fall back to environment-variable-based `OllamaConfig` defaults.
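The caching behaviour in step 3 can be sketched as a small dictionary keyed by slug, with expiry times taken from a monotonic clock. This is a simplified illustration rather than the resolver's real code; `TTLCache` and its injectable `clock` parameter are assumptions made here so the expiry logic can be exercised without waiting.

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """Minimal in-memory TTL cache (hypothetical; illustrates the idea only)."""

    def __init__(self, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry so the caller re-queries the database.
            del self._entries[key]
            return None
        return value

    def put(self, key: str, value: Any) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)

    def invalidate(self, key: Optional[str] = None) -> None:
        # No key clears everything, mirroring resolver.invalidate().
        if key is None:
            self._entries.clear()
        else:
            self._entries.pop(key, None)
```

A monotonic clock is a sensible choice for this kind of cache because it is unaffected by wall-clock adjustments, so an entry cannot expire early or late when the system time changes.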
### Resolved Config Dataclass
```python
from dataclasses import dataclass


@dataclass(frozen=True, slots=True)
class ResolvedAgentConfig:
    agent_id: str
    variant_id: str | None   # None if no active variant
    model_provider: str
    model_name: str
    system_prompt: str
    user_prompt_template: str
    prompt_version: str
    temperature: float
    max_tokens: int
    context_window: int      # Ollama num_ctx; 0 = model default
    input_token_limit: int   # Max input chars before truncation; 0 = no limit
    token_budget: int        # Hourly token budget; 0 = unlimited
    timeout_seconds: int
    max_retries: int
```
### Usage Pattern
```python
from services.shared.agent_config import AgentConfigResolver

# Run inside an async function; `pool` is the service's database pool.
resolver = AgentConfigResolver(pool, ttl_seconds=60)
config = await resolver.resolve("document-extractor")
if config is None:
    # Fall back to env-var defaults
    ...
else:
    # Use config.model_name, config.system_prompt, etc.
    ...
```
### Cache Invalidation
```python
resolver.invalidate("document-extractor") # Clear one entry
resolver.invalidate() # Clear all entries
```
### Config Refresh in Workers
The extractor and recommendation workers periodically re-resolve their agent config (every 100 jobs for the extractor, every 50 jobs for the recommendation worker). If the resolved model changes, the worker creates a new `OllamaClient` instance with the updated configuration.
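The refresh cadence can be pictured as a job counter that triggers a re-resolve once it reaches the threshold. The sketch below is synchronous and uses stand-in callables for brevity; the real workers are asynchronous, and `ConfigRefresher`, `resolve_model`, and `build_client` are hypothetical names.

```python
from typing import Any, Callable


class ConfigRefresher:
    """Re-resolves the model every N jobs and rebuilds the client only
    when the resolved model name changes (hypothetical sketch)."""

    def __init__(self, resolve_model: Callable[[], str],
                 build_client: Callable[[str], Any],
                 every_n_jobs: int) -> None:
        self._resolve_model = resolve_model
        self._build_client = build_client
        self._every_n_jobs = every_n_jobs
        self._jobs_since_refresh = 0
        self.model_name: str | None = None
        self.client: Any = None
        self.refresh()

    def refresh(self) -> None:
        model = self._resolve_model()
        if model != self.model_name:
            # Model changed (e.g. a variant was activated): swap the client.
            self.model_name = model
            self.client = self._build_client(model)
        self._jobs_since_refresh = 0

    def client_for_next_job(self) -> Any:
        if self._jobs_since_refresh >= self._every_n_jobs:
            self.refresh()
        self._jobs_since_refresh += 1
        return self.client
```

With `every_n_jobs=100`, a model change made after startup is picked up at the start of the 101st job.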
---
## Performance Logging and Variant Comparison
Every agent invocation is logged to `agent_performance_log` with the `agent_id` and `variant_id` (if a variant was active). This enables comparing variant effectiveness.
### What Gets Logged
- **Document extractor**: Logged in `services/extractor/main.py` after each extraction. Records success/failure, duration, confidence, retry count, token estimates.
- **Event classifier**: Logged in `services/extractor/event_classifier.py` after each classification. Same fields.
- **Thesis rewriter**: Logged in `services/recommendation/thesis_llm.py` after each rewrite attempt. Confidence is always 0.0 (not applicable for rewrites).
### Querying for Variant Comparison
Compare two variants of the document extractor over the last 24 hours:
```sql
SELECT
v.variant_name,
COUNT(*) AS total_invocations,
COUNT(*) FILTER (WHERE p.success) AS successes,
ROUND(100.0 * COUNT(*) FILTER (WHERE p.success) / COUNT(*), 1) AS success_rate_pct,
ROUND(AVG(p.duration_ms)::numeric) AS avg_duration_ms,
ROUND(PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY p.duration_ms)::numeric) AS p95_duration_ms,
ROUND(AVG(p.confidence)::numeric, 4) AS avg_confidence,
ROUND(AVG(p.retry_count)::numeric, 2) AS avg_retries,
SUM(p.input_tokens + p.output_tokens) AS total_tokens
FROM agent_performance_log p
JOIN agent_variants v ON v.id = p.variant_id
WHERE p.agent_id = '<agent-uuid>'
AND p.recorded_at >= NOW() - INTERVAL '24 hours'
GROUP BY v.variant_name
ORDER BY success_rate_pct DESC;
```
Compare base agent (no variant) vs active variant:
```sql
SELECT
CASE WHEN p.variant_id IS NULL THEN 'base' ELSE v.variant_name END AS config,
COUNT(*) AS invocations,
ROUND(100.0 * COUNT(*) FILTER (WHERE p.success) / COUNT(*), 1) AS success_rate_pct,
ROUND(AVG(p.duration_ms)::numeric) AS avg_duration_ms,
ROUND(AVG(p.confidence)::numeric, 4) AS avg_confidence
FROM agent_performance_log p
LEFT JOIN agent_variants v ON v.id = p.variant_id
WHERE p.agent_id = '<agent-uuid>'
AND p.recorded_at >= NOW() - INTERVAL '48 hours'
GROUP BY config
ORDER BY config;
```
### Token Budget Enforcement
Variants can set a `token_budget` (total tokens per hour). Before each invocation, the worker checks:
```sql
SELECT COALESCE(SUM(input_tokens + output_tokens), 0) AS total_tokens
FROM agent_performance_log
WHERE variant_id = $1
AND recorded_at >= NOW() - INTERVAL '1 hour'
```
If the budget is exceeded, the invocation is skipped (extractor) or falls back to the deterministic thesis (thesis rewriter).
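Combined with the `chars / 4` token estimate recorded in the performance log, the budget check reduces to a comparison against the hourly sum. The sketch below makes two assumptions that the source does not confirm: the estimate uses integer division, and an invocation is blocked once usage reaches the budget (rather than only after exceeding it). Both function names are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: one token per four characters (integer division assumed)."""
    return len(text) // 4


def budget_allows(used_last_hour: int, token_budget: int) -> bool:
    """True if another invocation may run. A budget of 0 means unlimited.
    Blocking at 'used >= budget' is an assumption, not confirmed behaviour."""
    if token_budget <= 0:
        return True
    return used_last_hour < token_budget


prompt = "x" * 8000  # an extractor document truncated to 8,000 characters
print(estimate_tokens(prompt))          # 2000
print(budget_allows(150_000, 200_000))  # True
print(budget_allows(200_000, 200_000))  # False
print(budget_allows(999_999, 0))        # True (unlimited)
```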
---
## API Endpoints
All agent endpoints are served by the Query API (`services/api/app.py`) under the `/api/agents` prefix.
### Agent CRUD
| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/api/agents` | List all agents. Query param: `active_only` (bool, default `false`) |
| `GET` | `/api/agents/{agent_id}` | Get a single agent by UUID |
| `POST` | `/api/agents` | Create a new user-defined agent (returns 201) |
| `PUT` | `/api/agents/{agent_id}` | Partially update an agent (system or user) |
| `DELETE` | `/api/agents/{agent_id}` | Delete a user-created agent. Returns 403 for system agents |
**Create Agent Request Body:**
```json
{
"name": "My Custom Agent",
"slug": "my-custom-agent",
"purpose": "Custom extraction for earnings calls",
"model_provider": "ollama",
"model_name": "llama3.1:8b",
"system_prompt": "You are a financial analyst...",
"user_prompt_template": "",
"prompt_version": "v1",
"schema_version": "1.0.0",
"temperature": 0.0,
"max_tokens": 32768,
"timeout_seconds": 120,
"max_retries": 2
}
```
**Update Agent Request Body** (all fields optional):
```json
{
"model_name": "qwen3.5:14b",
"system_prompt": "Updated prompt...",
"temperature": 0.1,
"active": false
}
```
### Agent Performance
| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/api/agents/{agent_id}/performance` | Aggregated metrics. Query param: `hours` (int, default 24, max 720) |
| `GET` | `/api/agents/{agent_id}/performance/history` | Hourly time-series. Query param: `hours` (int, default 24, max 720) |
**Performance Response:**
```json
{
"total_invocations": 1250,
"successes": 1180,
"failures": 70,
"avg_duration_ms": 3400,
"p95_duration_ms": 8200,
"avg_confidence": 0.7234,
"avg_retries": 0.15,
"total_input_tokens": 5000000,
"total_output_tokens": 1200000,
"success_rate": 0.944
}
```
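The derived fields in this response follow from the raw counts: `failures` is `total_invocations - successes`, and `success_rate` is `successes / total_invocations`. A short sketch of that arithmetic using the sample numbers above; rounding to three decimals is an assumption based on the example value, and `derive_metrics` is a hypothetical helper.

```python
def derive_metrics(total_invocations: int, successes: int) -> dict:
    """Compute the derived fields of the performance response (hypothetical helper)."""
    failures = total_invocations - successes
    # Guard against division by zero when nothing has been logged yet.
    rate = round(successes / total_invocations, 3) if total_invocations else 0.0
    return {"failures": failures, "success_rate": rate}


print(derive_metrics(1250, 1180))  # {'failures': 70, 'success_rate': 0.944}
```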
### Variant CRUD
| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/api/agents/{agent_id}/variants` | List all variants for an agent |
| `GET` | `/api/agents/{agent_id}/variants/{variant_id}` | Get a single variant |
| `POST` | `/api/agents/{agent_id}/variants` | Create a new variant (returns 201, 409 on duplicate slug) |
| `PUT` | `/api/agents/{agent_id}/variants/{variant_id}` | Partially update a variant |
| `DELETE` | `/api/agents/{agent_id}/variants/{variant_id}` | Delete a variant (returns 400 if active) |
### Clone Endpoints
| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/api/agents/{agent_id}/clone` | Clone an agent's base config as a new variant |
| `POST` | `/api/agents/{agent_id}/variants/{variant_id}/clone` | Clone an existing variant as a new variant |
Clone requests copy all configuration fields from the source, with optional overrides in the request body.
### Activate / Deactivate
| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/api/agents/{agent_id}/variants/{variant_id}/activate` | Set a variant as active (deactivates any other active variant in a single transaction) |
| `POST` | `/api/agents/{agent_id}/variants/deactivate` | Deactivate the currently active variant (agent falls back to base config) |
### Per-Variant Performance
| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/api/agents/{agent_id}/variants/{variant_id}/performance` | Aggregated metrics for a specific variant |
| `GET` | `/api/agents/{agent_id}/variants/{variant_id}/performance/history` | Hourly time-series for a specific variant |
---
## Step-by-Step: Creating and Activating a Variant
This walkthrough creates a new variant of the document extractor that uses a different model and activates it for live traffic.
### 1. Find the Agent ID
```bash
curl -s "https://stonks-api.celestium.life/api/agents?active_only=true" | jq -r '.[] | select(.slug == "document-extractor") | .id'
```
Note the UUID — we'll call it `AGENT_ID`.
### 2. Clone the Agent as a Variant
```bash
curl -s -X POST https://stonks-api.celestium.life/api/agents/$AGENT_ID/clone \
-H "Content-Type: application/json" \
-d '{
"variant_name": "Llama 3.1 8B Test",
"description": "Testing llama3.1:8b as an alternative to qwen3.5:9b-fast",
"model_name": "llama3.1:8b",
"temperature": 0.1
}' | jq .
```
This creates a new variant with all fields copied from the base agent, except `model_name` and `temperature`, which are overridden. The variant starts as `is_active: false`.
Note the variant's `id` — we'll call it `VARIANT_ID`.
### 3. Activate the Variant
```bash
curl -s -X POST \
https://stonks-api.celestium.life/api/agents/$AGENT_ID/variants/$VARIANT_ID/activate | jq .
```
This atomically deactivates any previously active variant and activates the new one. Within 60 seconds (the TTL cache window), the extractor worker will pick up the new configuration and start using `llama3.1:8b`.
### 4. Monitor Performance
Wait for some documents to be processed, then compare:
```bash
# Base agent performance (all invocations)
curl -s "https://stonks-api.celestium.life/api/agents/$AGENT_ID/performance?hours=4" | jq .
# Variant-specific performance
curl -s "https://stonks-api.celestium.life/api/agents/$AGENT_ID/variants/$VARIANT_ID/performance?hours=4" | jq .
```
Check the hourly trend:
```bash
curl -s "https://stonks-api.celestium.life/api/agents/$AGENT_ID/variants/$VARIANT_ID/performance/history?hours=12" | jq .
```
### 5. Roll Back (Deactivate)
If the variant underperforms, deactivate it to revert to the base agent config:
```bash
curl -s -X POST \
https://stonks-api.celestium.life/api/agents/$AGENT_ID/variants/deactivate | jq .
```
The extractor will revert to the base `qwen3.5:9b-fast` configuration within 60 seconds.
### 6. Iterate
You can update the variant's prompt or parameters without creating a new one:
```bash
curl -s -X PUT \
https://stonks-api.celestium.life/api/agents/$AGENT_ID/variants/$VARIANT_ID \
-H "Content-Type: application/json" \
-d '{
"system_prompt": "You are a financial document analyst. Extract structured data as JSON. Be extra conservative with impact scores — only assign > 0.7 for material events with concrete numbers.",
"prompt_version": "document-intel-v2-conservative"
}' | jq .
```
Then re-activate and compare again.
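
The same walkthrough can be scripted. The sketch below builds the clone, activate, and deactivate requests with Python's standard library, using the paths and base URL from the curl examples above. The helper names are hypothetical, authentication is assumed to be unnecessary (as in the curl commands), and `send` is defined but not called here, so nothing is sent until you invoke it yourself.

```python
import json
import urllib.request
from typing import Any, Optional

BASE_URL = "https://stonks-api.celestium.life"  # taken from the curl examples


def build_request(method: str, path: str,
                  body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a JSON request against the Query API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return urllib.request.Request(BASE_URL + path, data=data,
                                  headers=headers, method=method)


def clone_request(agent_id: str, overrides: dict) -> urllib.request.Request:
    return build_request("POST", f"/api/agents/{agent_id}/clone", overrides)


def activate_request(agent_id: str, variant_id: str) -> urllib.request.Request:
    return build_request(
        "POST", f"/api/agents/{agent_id}/variants/{variant_id}/activate")


def deactivate_request(agent_id: str) -> urllib.request.Request:
    return build_request("POST", f"/api/agents/{agent_id}/variants/deactivate")


def send(request: urllib.request.Request) -> Any:
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

A typical sequence would be `send(clone_request(agent_id, {...}))`, read the new variant's `id` from the response, then `send(activate_request(agent_id, variant_id))`.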