- Migration 027: `agent_variants` table with single-active enforcement; `variant_id` column on `agent_performance_log`
- API: full CRUD, clone from agent/variant, activate/deactivate, per-variant performance metrics and history endpoints
- Services: extractor, event classifier, and thesis rewriter all wired to `AgentConfigResolver` with variant override support
- Frontend: variant list, comparison view, create/edit/clone forms, activate/delete actions on Agents page
- Tests: API tests + 5 property-based tests (single-active invariant, clone preservation, config resolution, slug determinism, update idempotence)
- Spec files for agent-variants feature
Design Document: Agent Variants
Overview
Add variant support to the existing AI agents system so each agent can have multiple configurations (different models, prompts, parameters) that can be independently tracked, compared, and swapped into production. This builds on the existing ai_agents table, agent_performance_log table, API endpoints, and frontend Agents page.
Architecture
System Context
```
┌──────────────┐     ┌──────────────────┐     ┌───────────────┐
│ Agents Page  │────▶│ Query API        │────▶│ PostgreSQL    │
│ (React)      │     │ (FastAPI)        │     │ ai_agents     │
│              │     │                  │     │ agent_variants│
│ - List       │     │ /api/agents/     │     │ agent_perf_log│
│ - Compare    │     │ /api/agents/     │     └───────────────┘
│ - Activate   │     │ {id}/variants/   │
└──────────────┘     └──────────────────┘
                              │
                     ┌────────┴────────┐
                     │ Config Resolver │
                     │ (shared module) │
                     └────────┬────────┘
                              │
          ┌───────────────────┼───────────────────┐
          ▼                   ▼                   ▼
   ┌──────────────┐    ┌──────────────┐    ┌──────────────┐
   │  Extractor   │    │    Event     │    │    Thesis    │
   │ (client.py)  │    │  Classifier  │    │   Rewriter   │
   └──────┬───────┘    └──────┬───────┘    └──────┬───────┘
          │                   │                   │
          └───────────────────┼───────────────────┘
                              ▼
                       ┌──────────────┐
                       │    Ollama    │
                       │   Service    │
                       └──────────────┘
```
Key Design Decisions
- Variants as a separate table (not rows in `ai_agents`): Keeps the parent agent as the canonical role definition; variants are children that override its config fields. This avoids adding parent/child semantics to the existing agent table and preserves backward compatibility.
- Partial unique index for single-active enforcement: Rather than relying on application logic alone, a PostgreSQL partial unique index on `(agent_id) WHERE is_active = TRUE` guarantees at most one active variant per agent at the database level.
- Shared config resolver module: A new `services/shared/agent_config.py` module encapsulates the "resolve the active config for an agent slug" logic with TTL caching. All three services import it instead of duplicating the resolution logic.
- Nullable `variant_id` on the performance log: Adding `variant_id` as a nullable column on `agent_performance_log` preserves backward compatibility: existing rows keep NULL, and new invocations record the variant when one is active.
- No `schema_version` on variants: Variants inherit the parent agent's `schema_version`, because it defines the output structure rather than a tuning parameter. Variants override only the model, prompts, and inference parameters.
Database Schema
Migration 027: Agent Variants
```sql
-- Agent variant configurations: alternative model/prompt/parameter sets per agent.
CREATE TABLE IF NOT EXISTS agent_variants (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    agent_id UUID NOT NULL REFERENCES ai_agents(id) ON DELETE CASCADE,
    variant_name VARCHAR(200) NOT NULL,
    variant_slug VARCHAR(200) NOT NULL,
    description TEXT NOT NULL DEFAULT '',
    model_provider VARCHAR(50) NOT NULL DEFAULT 'ollama',
    model_name VARCHAR(200) NOT NULL,
    system_prompt TEXT NOT NULL DEFAULT '',
    user_prompt_template TEXT NOT NULL DEFAULT '',
    prompt_version VARCHAR(100) NOT NULL DEFAULT '',
    temperature FLOAT DEFAULT 0.0,
    max_tokens INTEGER DEFAULT 32768,
    context_window INTEGER DEFAULT 0,     -- Ollama num_ctx; 0 = use model default
    input_token_limit INTEGER DEFAULT 0,  -- max input tokens before truncation; 0 = no limit
    token_budget INTEGER DEFAULT 0,       -- total tokens per hour; 0 = unlimited
    timeout_seconds INTEGER DEFAULT 120,
    max_retries INTEGER DEFAULT 2,
    is_active BOOLEAN NOT NULL DEFAULT FALSE,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Each agent can have many variants, but variant slugs must be unique per agent.
CREATE UNIQUE INDEX IF NOT EXISTS idx_agent_variants_slug
    ON agent_variants(agent_id, variant_slug);

-- At most one active variant per agent (database-enforced invariant).
CREATE UNIQUE INDEX IF NOT EXISTS idx_agent_variants_active
    ON agent_variants(agent_id) WHERE is_active = TRUE;

-- Fast lookup by agent.
CREATE INDEX IF NOT EXISTS idx_agent_variants_agent
    ON agent_variants(agent_id);

-- Add variant_id to performance log for per-variant attribution.
ALTER TABLE agent_performance_log
    ADD COLUMN IF NOT EXISTS variant_id UUID REFERENCES agent_variants(id) ON DELETE SET NULL;

CREATE INDEX IF NOT EXISTS idx_agent_perf_variant
    ON agent_performance_log(variant_id, recorded_at DESC);
```
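The partial unique index can be exercised without a PostgreSQL server. SQLite also supports partial indexes, so the sketch below uses Python's built-in `sqlite3` as a stand-in (an illustration only: integer IDs and `1`/`0` replace UUIDs and booleans, and only the columns needed for the index are kept). It shows that a second active variant for the same agent is rejected, while inactive variants and active variants of other agents are accepted.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """In-memory SQLite stand-in for agent_variants (simplified columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE agent_variants ("
        " id INTEGER PRIMARY KEY, agent_id INTEGER NOT NULL,"
        " variant_slug TEXT NOT NULL, is_active INTEGER NOT NULL DEFAULT 0)"
    )
    # Same shape as idx_agent_variants_active: uniqueness applies only to active rows
    conn.execute(
        "CREATE UNIQUE INDEX idx_agent_variants_active "
        "ON agent_variants(agent_id) WHERE is_active = 1"
    )
    return conn


def try_insert(conn: sqlite3.Connection, agent_id: int, slug: str, active: bool) -> bool:
    """Insert a variant; return False if the partial unique index rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO agent_variants (agent_id, variant_slug, is_active) VALUES (?, ?, ?)",
                (agent_id, slug, int(active)),
            )
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    db = make_db()
    print(try_insert(db, 1, "fast", active=True))      # True: first active variant
    print(try_insert(db, 1, "careful", active=False))  # True: inactive rows are unconstrained
    print(try_insert(db, 1, "third", active=True))     # False: second active variant for agent 1
    print(try_insert(db, 2, "other", active=True))     # True: different agent
```

PostgreSQL enforces the same rule, so the activate endpoint has to clear the old active row before setting the new one.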
Entity Relationships
```
ai_agents (1) ──── (0..*) agent_variants
    │                               │
    │                               │ (variant_id, nullable)
    └───── agent_performance_log ◀──┘
           (agent_id, required)
```
API Design
Pydantic Models
```python
from pydantic import BaseModel


class VariantCreateBody(BaseModel):
    variant_name: str
    variant_slug: str | None = None  # auto-generated from name if omitted
    description: str = ""
    model_provider: str = "ollama"
    model_name: str
    system_prompt: str = ""
    user_prompt_template: str = ""
    prompt_version: str = ""
    temperature: float = 0.0
    max_tokens: int = 32768
    context_window: int = 0  # Ollama num_ctx; 0 = model default
    input_token_limit: int = 0  # max input tokens; 0 = no limit
    token_budget: int = 0  # tokens per hour; 0 = unlimited
    timeout_seconds: int = 120
    max_retries: int = 2


class VariantUpdateBody(BaseModel):
    # Partial update: a field left as None is not changed
    variant_name: str | None = None
    description: str | None = None
    model_provider: str | None = None
    model_name: str | None = None
    system_prompt: str | None = None
    user_prompt_template: str | None = None
    prompt_version: str | None = None
    temperature: float | None = None
    max_tokens: int | None = None
    context_window: int | None = None
    input_token_limit: int | None = None
    token_budget: int | None = None
    timeout_seconds: int | None = None
    max_retries: int | None = None


class VariantCloneBody(BaseModel):
    variant_name: str
    variant_slug: str | None = None
    # All config fields are optional; omitted fields inherit from the source
    description: str | None = None
    model_provider: str | None = None
    model_name: str | None = None
    system_prompt: str | None = None
    user_prompt_template: str | None = None
    prompt_version: str | None = None
    temperature: float | None = None
    max_tokens: int | None = None
    context_window: int | None = None
    input_token_limit: int | None = None
    token_budget: int | None = None
    timeout_seconds: int | None = None
    max_retries: int | None = None
```
Endpoints
| Method | Path | Description | Requirement |
|---|---|---|---|
| GET | `/api/agents/{agent_id}/variants` | List all variants for an agent | 3.1 |
| GET | `/api/agents/{agent_id}/variants/{variant_id}` | Get a single variant | 3.2 |
| POST | `/api/agents/{agent_id}/variants` | Create a variant (direct create) | 3 |
| POST | `/api/agents/{agent_id}/clone` | Clone agent as variant | 2.1 |
| POST | `/api/agents/{agent_id}/variants/{variant_id}/clone` | Clone variant as new variant | 2.2 |
| PUT | `/api/agents/{agent_id}/variants/{variant_id}` | Update a variant | 3.4 |
| DELETE | `/api/agents/{agent_id}/variants/{variant_id}` | Delete a variant | 3.5 |
| POST | `/api/agents/{agent_id}/variants/{variant_id}/activate` | Set variant as active | 4.1 |
| POST | `/api/agents/{agent_id}/variants/deactivate` | Deactivate current active variant | 4.2 |
| GET | `/api/agents/{agent_id}/variants/{variant_id}/performance` | Variant performance metrics | 6.3 |
| GET | `/api/agents/{agent_id}/variants/{variant_id}/performance/history` | Variant performance time-series | 6.4 |
Clone Endpoint Logic (POST /api/agents/{agent_id}/clone)
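```python
# Config columns shared by ai_agents and agent_variants (those the resolver
# COALESCEs); variant-only limits such as context_window use the column
# defaults unless overridden in the body.
AGENT_CONFIG_FIELDS = (
    "model_provider", "model_name", "system_prompt", "user_prompt_template",
    "prompt_version", "temperature", "max_tokens", "timeout_seconds", "max_retries",
)
# 1. Fetch source agent (HTTPException comes from fastapi)
agent = await pool.fetchrow("SELECT * FROM ai_agents WHERE id = $1", agent_id)
if agent is None:
    raise HTTPException(status_code=404, detail="Agent not found")
# 2. Start with agent fields, overlay user overrides. exclude_none (Pydantic v2)
#    rather than `or` keeps falsy overrides such as temperature=0.0 or "".
variant_data = {field: agent[field] for field in AGENT_CONFIG_FIELDS}
variant_data.update(body.model_dump(exclude_none=True, exclude={"variant_name", "variant_slug"}))
# 3. Generate slug if not provided
slug = body.variant_slug or slugify(body.variant_name)
# 4. Insert into agent_variants. asyncpg takes positional arguments only, so the
#    column list and $n placeholders are built from the (trusted) field names.
columns = ["agent_id", "variant_name", "variant_slug", *variant_data]
placeholders = ", ".join(f"${i}" for i in range(1, len(columns) + 1))
row = await pool.fetchrow(
    f"INSERT INTO agent_variants ({', '.join(columns)}) VALUES ({placeholders}) RETURNING *",
    agent_id, body.variant_name, slug, *variant_data.values(),
)
```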
Activate Endpoint Logic (POST .../activate)
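```python
# Single transaction: deactivate previous, activate target. Any exception raised
# inside conn.transaction() rolls back both UPDATEs. A racing activation for the
# same agent fails on the partial unique index instead of leaving two active rows.
async with pool.acquire() as conn:
    async with conn.transaction():
        await conn.execute(
            "UPDATE agent_variants SET is_active = FALSE, updated_at = NOW() "
            "WHERE agent_id = $1 AND is_active = TRUE",
            agent_id,
        )
        row = await conn.fetchrow(
            "UPDATE agent_variants SET is_active = TRUE, updated_at = NOW() "
            "WHERE id = $1 AND agent_id = $2 RETURNING *",
            variant_id, agent_id,
        )
        if row is None:
            # Unknown variant, or one belonging to another agent: abort so the
            # previously active variant stays active (HTTPException from fastapi)
            raise HTTPException(status_code=404, detail="Variant not found")
```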
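The order of the two `UPDATE`s and the not-found check both matter. The sketch below replays them against an in-memory SQLite stand-in with the same partial index (an illustration under that assumption; raising `LookupError` stands in for the HTTP 404). Activating before deactivating would trip the index, and because both statements share one transaction, a missing target rolls back the deactivation.

```python
import sqlite3


def setup() -> sqlite3.Connection:
    """Agent 1 owns variants 1 and 2; agent 2 owns variant 3. None active yet."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE agent_variants (id INTEGER PRIMARY KEY, agent_id INTEGER, is_active INTEGER DEFAULT 0)")
    conn.execute("CREATE UNIQUE INDEX one_active ON agent_variants(agent_id) WHERE is_active = 1")
    conn.executemany("INSERT INTO agent_variants (id, agent_id) VALUES (?, ?)", [(1, 1), (2, 1), (3, 2)])
    conn.commit()
    return conn


def activate(conn: sqlite3.Connection, agent_id: int, variant_id: int) -> bool:
    """Return True on success; False (deactivation rolled back) if the target is missing."""
    try:
        with conn:  # one transaction: commit on success, rollback on any exception
            # Deactivate first; activating first would violate the partial unique index
            conn.execute("UPDATE agent_variants SET is_active = 0 WHERE agent_id = ? AND is_active = 1", (agent_id,))
            cur = conn.execute(
                "UPDATE agent_variants SET is_active = 1 WHERE id = ? AND agent_id = ?", (variant_id, agent_id)
            )
            if cur.rowcount == 0:
                raise LookupError(f"variant {variant_id} not found for agent {agent_id}")
        return True
    except LookupError:
        return False


def active_ids(conn: sqlite3.Connection, agent_id: int) -> list[int]:
    rows = conn.execute("SELECT id FROM agent_variants WHERE agent_id = ? AND is_active = 1", (agent_id,))
    return [r[0] for r in rows]
```

For example, after activating variant 2 for agent 1, an attempt to activate variant 3 (which belongs to agent 2) returns `False` and variant 2 stays active.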
Config Resolution Module
services/shared/agent_config.py
```python
import time
from dataclasses import dataclass

import asyncpg


@dataclass
class ResolvedAgentConfig:
    """Runtime configuration resolved from DB agent + optional active variant."""
    agent_id: str
    variant_id: str | None
    model_provider: str
    model_name: str
    system_prompt: str
    user_prompt_template: str
    prompt_version: str
    temperature: float
    max_tokens: int
    context_window: int
    input_token_limit: int
    token_budget: int
    timeout_seconds: int
    max_retries: int


class AgentConfigResolver:
    """Resolves agent configuration from DB with active variant override and TTL cache."""

    def __init__(self, pool: asyncpg.Pool, ttl_seconds: int = 60):
        self._pool = pool
        self._ttl = ttl_seconds
        # agent_slug -> (monotonic time when cached, resolved config)
        self._cache: dict[str, tuple[float, ResolvedAgentConfig]] = {}

    async def resolve(self, agent_slug: str) -> ResolvedAgentConfig | None:
        """Resolve config for an agent slug, preferring the active variant if present."""
        now = time.monotonic()
        cached = self._cache.get(agent_slug)
        if cached and (now - cached[0]) < self._ttl:
            return cached[1]
        # LEFT JOIN the (at most one) active variant onto the agent. With an active
        # variant its non-NULL values win; with none, every v.* column is NULL and
        # COALESCE falls back to the agent (or 0 for the variant-only limits).
        row = await self._pool.fetchrow("""
            SELECT a.id AS agent_id,
                   v.id AS variant_id,
                   COALESCE(v.model_provider, a.model_provider) AS model_provider,
                   COALESCE(v.model_name, a.model_name) AS model_name,
                   COALESCE(v.system_prompt, a.system_prompt) AS system_prompt,
                   COALESCE(v.user_prompt_template, a.user_prompt_template) AS user_prompt_template,
                   COALESCE(v.prompt_version, a.prompt_version) AS prompt_version,
                   COALESCE(v.temperature, a.temperature) AS temperature,
                   COALESCE(v.max_tokens, a.max_tokens) AS max_tokens,
                   COALESCE(v.context_window, 0) AS context_window,
                   COALESCE(v.input_token_limit, 0) AS input_token_limit,
                   COALESCE(v.token_budget, 0) AS token_budget,
                   COALESCE(v.timeout_seconds, a.timeout_seconds) AS timeout_seconds,
                   COALESCE(v.max_retries, a.max_retries) AS max_retries
            FROM ai_agents a
            LEFT JOIN agent_variants v ON v.agent_id = a.id AND v.is_active = TRUE
            WHERE a.slug = $1 AND a.active = TRUE
        """, agent_slug)
        if row is None:
            return None  # unknown or inactive agent; not cached
        config = ResolvedAgentConfig(
            agent_id=str(row["agent_id"]),
            variant_id=str(row["variant_id"]) if row["variant_id"] is not None else None,
            model_provider=row["model_provider"],
            model_name=row["model_name"],
            system_prompt=row["system_prompt"],
            user_prompt_template=row["user_prompt_template"],
            prompt_version=row["prompt_version"],
            temperature=row["temperature"],
            max_tokens=row["max_tokens"],
            context_window=row["context_window"],
            input_token_limit=row["input_token_limit"],
            token_budget=row["token_budget"],
            timeout_seconds=row["timeout_seconds"],
            max_retries=row["max_retries"],
        )
        self._cache[agent_slug] = (now, config)
        return config
```
Service Integration Points
Document Extractor (services/extractor/client.py)
The `OllamaClient` currently receives an `OllamaConfig` at construction time. Integration approach:
- Before creating `OllamaClient`, call `resolver.resolve("document-extractor")`.
- If resolved, build an `OllamaConfig` from the resolved values.
- If resolution fails (DB down) or returns `None` (agent missing or inactive), fall back to the env-var-based `OllamaConfig()`.
- Pass the resulting config to `OllamaClient.__init__`.
- After extraction, log to `agent_performance_log` with both `agent_id` and `variant_id`.
Event Classifier (services/extractor/event_classifier.py)
The `classify_global_event` function receives an `ollama_client` (`OllamaClient`) and follows the same pattern:
- Resolve config via `resolver.resolve("event-classifier")`.
- If resolved, construct an `OllamaClient` with the resolved config.
- Pass the `variant_id` through to performance logging.
Thesis Rewriter (services/recommendation/thesis_llm.py)
The `rewrite_thesis_with_llm` function receives an `OllamaConfig` directly:
- Resolve config via `resolver.resolve("thesis-rewriter")`.
- If resolved, override the `config` parameter with the resolved values.
- Log `variant_id` in the performance metrics.
Performance Logging Changes
The existing `agent_performance_log` INSERT statements need to include `variant_id`:

```python
await pool.execute(
    """INSERT INTO agent_performance_log
       (agent_id, variant_id, document_id, ticker, success, duration_ms,
        confidence, retry_count, input_tokens, output_tokens, error_message)
       VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11)""",
    agent_id, variant_id,  # variant_id may be None (no active variant)
    document_id, ticker, success, duration_ms,
    confidence, retry_count, input_tokens, output_tokens, error_message,
)
```
Frontend Design
Agents Page Changes
The existing `AgentsPage` component is extended with variant management. The layout stays the same (sidebar plus detail panel), with variant sections added inside the detail panel.
Component Hierarchy
```
AgentsPage
├── AgentListSidebar (existing)
├── AgentDetail (existing, extended)
│   ├── Agent config card (existing)
│   ├── Agent performance card (existing)
│   ├── VariantList (new)
│   │   ├── VariantRow (name, model, active badge, actions)
│   │   └── "Clone as Variant" button
│   ├── VariantCompare (new, shown when 2+ variants selected)
│   │   ├── MetricsComparisonTable
│   │   └── OverlayPerformanceChart
│   └── VariantDetail (new, shown when single variant selected)
│       ├── Variant config card
│       └── Variant performance card
├── VariantCreateForm (new)
├── VariantEditForm (new)
└── VariantCloneForm (new)
```
New TanStack Query Hooks
```typescript
// api/hooks.ts additions (signatures only)
function useAgentVariants(agentId: string | undefined)
function useVariantPerformance(agentId: string, variantId: string, hours?: number)
function useVariantPerfHistory(agentId: string, variantId: string, hours?: number)

// Mutations
function useCloneAgentAsVariant(agentId: string)
function useCloneVariant(agentId: string, variantId: string)
function useCreateVariant(agentId: string)
function useUpdateVariant(agentId: string, variantId: string)
function useDeleteVariant(agentId: string, variantId: string)
function useActivateVariant(agentId: string, variantId: string)
function useDeactivateVariants(agentId: string)
```
TypeScript Types
```typescript
interface AgentVariant {
  id: string;
  agent_id: string;
  variant_name: string;
  variant_slug: string;
  description: string;
  model_provider: string;
  model_name: string;
  system_prompt: string;
  user_prompt_template: string;
  prompt_version: string;
  temperature: number;
  max_tokens: number;
  context_window: number;
  input_token_limit: number;
  token_budget: number;
  timeout_seconds: number;
  max_retries: number;
  is_active: boolean;
  created_at: string;
  updated_at: string;
}
```
Comparison View Behavior
- Each `VariantRow` has a checkbox for comparison selection.
- When 2+ variants are checked, `VariantCompare` renders above the variant list.
- The comparison table shows metrics side by side (one column per variant).
- The chart overlays the performance-history lines of the selected variants on shared axes.
- The "Activate" button in the comparison view calls the activate endpoint and invalidates the affected queries.
Correctness Properties
Property 1: Single Active Variant Invariant (Req 1.4, 4.1)
For any sequence of activate/deactivate operations on variants of an agent, at most one variant per agent has is_active = TRUE at any point in time.
- Test approach: Property-based test generating random sequences of create-variant + activate + deactivate operations, then asserting the DB invariant holds after each operation.
- Generator: Random agent with 1-5 variants, random sequence of 1-20 activate/deactivate calls.
- Assertion: `SELECT COUNT(*) FROM agent_variants WHERE agent_id = $1 AND is_active = TRUE` returns 0 or 1.
Property 2: Clone Preserves Unoverridden Fields (Req 2.1, 2.3)
For any agent and any subset of override fields, cloning produces a variant where: overridden fields match the override values, and non-overridden fields match the source agent's values.
- Test approach: Property-based test generating random agent configs and random subsets of overrides.
- Generator: Random agent config (model_name, temperature, max_tokens, etc.) + random subset of override values.
- Assertion: For each field, if an override was provided the variant has the override value; otherwise it matches the source.
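Property 2 can first be checked against the pure merge step, before involving the database. The sketch below uses the standard library's `random` with a fixed seed in place of a property-based testing library (which library the real suite uses is not stated here). `merge_clone`, `random_config`, and `check_clone_property` are hypothetical helpers that mirror step 2 of the clone logic. The generator deliberately includes falsy values, so an `or`-based merge (`override or source`) fails the check as soon as an override is `temperature=0.0` or an empty prompt.

```python
import random

CONFIG_FIELDS = ("model_name", "system_prompt", "temperature", "max_tokens", "timeout_seconds", "max_retries")


def merge_clone(source: dict, overrides: dict) -> dict:
    """Start from the source config; apply every override that is not None."""
    merged = dict(source)
    merged.update({k: v for k, v in overrides.items() if v is not None})
    return merged


def random_config(rng: random.Random) -> dict:
    return {
        "model_name": rng.choice(["llama3.1:8b", "qwen2.5:14b", "mistral:7b"]),
        "system_prompt": rng.choice(["", "Be terse.", "Cite sources."]),
        "temperature": rng.choice([0.0, 0.2, 0.7]),  # 0.0 is a deliberate falsy case
        "max_tokens": rng.choice([0, 1024, 32768]),
        "timeout_seconds": rng.randint(1, 300),
        "max_retries": rng.randint(0, 5),
    }


def check_clone_property(trials: int = 500, seed: int = 0) -> None:
    rng = random.Random(seed)
    for _ in range(trials):
        source, candidate = random_config(rng), random_config(rng)
        # Random subset of fields overridden; the rest omitted (None)
        overrides = {f: (candidate[f] if rng.random() < 0.5 else None) for f in CONFIG_FIELDS}
        variant = merge_clone(source, overrides)
        for f in CONFIG_FIELDS:
            expected = overrides[f] if overrides[f] is not None else source[f]
            assert variant[f] == expected, (f, source, overrides, variant)
```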
Property 3: Config Resolution Prefers Active Variant (Req 4.3, 4.4, 9.1-9.3)
For any agent with N variants, AgentConfigResolver.resolve(slug) returns the active variant's config when one exists, and the base agent config when none is active.
- Test approach: Property-based test. Generate agent + 0-5 variants with 0 or 1 active. Resolve and verify returned config matches the expected source.
- Generator: Random agent config, random variant configs, random active/inactive state.
- Assertion: If an active variant exists, all config fields in the resolved result match the variant. If no active variant, all fields match the base agent.
Property 4: Variant Performance Metrics Consistency (Req 6.3, 6.5)
For any agent with variants and performance log entries, the agent-level aggregated metrics are always >= variant-level metrics for any single variant (since agent-level includes all variants).
- Test approach: Property-based test. Generate performance log entries attributed to different variants of the same agent. Query agent-level and variant-level metrics. Assert agent total >= variant total for each metric.
- Generator: Random agent with 2-4 variants, 5-50 random performance log entries distributed across variants.
- Assertion: `agent.total_invocations >= variant.total_invocations` for each variant; likewise for success counts and token totals.
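The inequality holds because agent-level totals sum over a superset of rows. Every entry attributed to a variant is also an entry for its agent, and pre-migration rows with a NULL `variant_id` count only at the agent level. The in-memory sketch below (hypothetical helpers; entry fields mirror `agent_performance_log` columns) checks Property 4 and also the stronger equality behind it: agent total = sum over variants + the unattributed (NULL) bucket.

```python
import random
from collections import defaultdict

METRICS = ("invocations", "successes", "tokens")


def aggregate(entries: list[dict]) -> tuple[dict, dict]:
    """Sum metrics for the agent and per variant_id (None = pre-migration rows)."""
    agent = dict.fromkeys(METRICS, 0)
    by_variant: dict = defaultdict(lambda: dict.fromkeys(METRICS, 0))
    for e in entries:
        contrib = {
            "invocations": 1,
            "successes": int(e["success"]),
            "tokens": e["input_tokens"] + e["output_tokens"],
        }
        for m in METRICS:
            agent[m] += contrib[m]
            by_variant[e["variant_id"]][m] += contrib[m]
    return agent, dict(by_variant)


def check_metrics_property(seed: int = 42) -> None:
    rng = random.Random(seed)
    variants = [f"v{i}" for i in range(rng.randint(2, 4))] + [None]
    entries = [
        {"variant_id": rng.choice(variants), "success": rng.random() < 0.8,
         "input_tokens": rng.randint(0, 4000), "output_tokens": rng.randint(0, 1000)}
        for _ in range(rng.randint(5, 50))
    ]
    agent, by_variant = aggregate(entries)
    for m in METRICS:
        # Property 4: the agent total is >= every single variant's total ...
        assert all(agent[m] >= v[m] for v in by_variant.values())
        # ... because it equals the sum over variants plus the unattributed (None) bucket
        assert agent[m] == sum(v[m] for v in by_variant.values())
```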
Property 5: Partial Update Idempotence (Req 3.4)
For any variant, applying an update with a subset of fields, then applying the same update again, produces the same variant state (ignoring updated_at). The update operation is idempotent on the data fields.
- Test approach: Property-based test. Generate a variant, apply random partial update, read result, apply same update again, read result. Assert all fields except updated_at are identical.
- Generator: Random variant + random subset of updatable fields with random values.
- Assertion: `variant_after_first_update.fields == variant_after_second_update.fields` (excluding `updated_at`).
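Idempotence follows from every update being an absolute assignment (`SET col = $n`). A relative update such as `max_retries = max_retries + 1` would break it, and `updated_at` changes on every write, which is why the property excludes it. The sketch below models those update semantics in memory, assuming partial updates skip `None` fields as `VariantUpdateBody` implies. `apply_update`, `data_fields`, and `check_idempotence` are hypothetical names. The real test would round-trip through the `PUT` endpoint instead.

```python
import random

FIELD_VALUES = {
    "model_name": ["llama3.1:8b", "qwen2.5:14b"],
    "temperature": [0.0, 0.3, 0.9],
    "max_tokens": [1024, 32768],
    "system_prompt": ["", "Answer in JSON."],
}


def apply_update(state: dict, update: dict, now: int) -> dict:
    """Absolute partial update: copy non-None fields, then bump updated_at."""
    new = {**state, **{k: v for k, v in update.items() if v is not None}}
    new["updated_at"] = now
    return new


def data_fields(state: dict) -> dict:
    """Everything except updated_at, which Property 5 ignores."""
    return {k: v for k, v in state.items() if k != "updated_at"}


def check_idempotence(trials: int = 300, seed: int = 1) -> None:
    rng = random.Random(seed)
    for _ in range(trials):
        variant = {f: rng.choice(vals) for f, vals in FIELD_VALUES.items()} | {"updated_at": 0}
        update = {f: rng.choice(vals) for f, vals in FIELD_VALUES.items() if rng.random() < 0.5}
        once = apply_update(variant, update, now=1)
        twice = apply_update(once, update, now=2)
        assert data_fields(once) == data_fields(twice)
        assert once["updated_at"] != twice["updated_at"]  # why updated_at is excluded
```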
Property 6: TTL Cache Expiry (Req 9.5)
Within the TTL window, the resolver returns cached config without querying the DB. After TTL expires, the resolver re-queries and reflects any changes.
- Test approach: Property-based test. Resolve config, change the active variant in the DB, resolve again within TTL (should get old value), advance time past TTL, resolve again (should get new value).
- Generator: Random agent configs with different variant configs. Random TTL values (1-120s).
- Assertion: Pre-TTL resolve returns original config. Post-TTL resolve returns updated config.
Property 7: Slug Auto-Generation Determinism (Req 2.4)
For any variant_name, the auto-generated slug is deterministic (same name → same slug), is a valid kebab-case string (lowercase, alphanumeric + hyphens, no leading/trailing hyphens), and is non-empty for any non-empty name.
- Test approach: Property-based test over random variant names.
- Generator: Random non-empty strings with unicode, spaces, special characters.
- Assertion: `slugify(name) == slugify(name)` (deterministic), the slug matches `^[a-z0-9]+(-[a-z0-9]+)*$`, and the slug is non-empty.
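`slugify` is used by the clone logic but not defined in this design. Below is a minimal sketch that meets Property 7's contract, assuming ASCII-folded kebab-case output. The hash fallback for names with no ASCII letters or digits (for example `"!!!"` or a purely CJK name) is an added assumption: it ensures every non-empty name yields a non-empty, deterministic slug.

```python
import hashlib
import re
import unicodedata

SLUG_RE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def slugify(name: str) -> str:
    """Deterministic kebab-case slug (hypothetical helper)."""
    # Fold accents to ASCII where possible ("Café" -> "Cafe"); drop other non-ASCII
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumerics into one hyphen, then trim the ends
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")
    if not slug:
        # No ASCII letters/digits survived: derive a stable slug from a hash of the name
        slug = "variant-" + hashlib.sha256(name.encode("utf-8")).hexdigest()[:8]
    return slug
```

Distinct names can still collide (`"GPT 4"` and `"gpt-4"` both become `gpt-4`). The `(agent_id, variant_slug)` unique index rejects the second one, so the API should surface that as a conflict rather than a server error.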
File Changes Summary
New Files
| File | Purpose |
|---|---|
| `infra/migrations/027_agent_variants.sql` | Migration: `agent_variants` table + performance log `variant_id` column |
| `services/shared/agent_config.py` | `AgentConfigResolver` with TTL cache |
Modified Files
| File | Change |
|---|---|
| `services/api/app.py` | Add variant CRUD, clone, activate/deactivate, and performance endpoints + Pydantic models |
| `services/extractor/client.py` | Accept optional resolved config; pass `variant_id` to perf logging |
| `services/extractor/event_classifier.py` | Use resolver for runtime config; pass `variant_id` to perf logging |
| `services/recommendation/thesis_llm.py` | Use resolver for runtime config; pass `variant_id` to perf logging |
| `frontend/src/pages/Agents.tsx` | Add variant list, comparison, create/edit/clone forms, activate/delete actions |
| `frontend/src/test/mocks/handlers.ts` | Add MSW handlers for variant endpoints |
Test Files
| File | Purpose |
|---|---|
| `tests/test_pbt_agent_variants.py` | Property-based tests for variant invariants, clone, config resolution |
| `tests/test_agent_variants_api.py` | Example/edge-case tests for API endpoints |
| `frontend/src/test/pages.test.tsx` | Frontend tests for variant UI components |