stonks-oracle/.kiro/specs/agent-variants/design.md
Celes Renata 7c23c044d7 feat: agent variants — migration, API, service integration, frontend, tests
- Migration 027: agent_variants table with single-active enforcement,
  variant_id column on agent_performance_log
- API: full CRUD, clone from agent/variant, activate/deactivate,
  per-variant performance metrics and history endpoints
- Services: extractor, event classifier, thesis rewriter all wired
  to AgentConfigResolver with variant override support
- Frontend: variant list, comparison view, create/edit/clone forms,
  activate/delete actions on Agents page
- Tests: API tests + 5 property-based tests (single-active invariant,
  clone preservation, config resolution, slug determinism, update idempotence)
- Spec files for agent-variants feature
2026-04-17 05:15:42 +00:00

Design Document: Agent Variants

Overview

Add variant support to the existing AI agents system so each agent can have multiple configurations (different models, prompts, parameters) that can be independently tracked, compared, and swapped into production. This builds on the existing ai_agents table, agent_performance_log table, API endpoints, and frontend Agents page.

Architecture

System Context

┌──────────────┐     ┌──────────────────┐     ┌────────────────┐
│  Agents Page │────▶│   Query API      │────▶│  PostgreSQL    │
│  (React)     │     │  (FastAPI)       │     │  ai_agents     │
│              │     │                  │     │  agent_variants│
│  - List      │     │  /api/agents/    │     │  agent_perf_log│
│  - Compare   │     │  /api/agents/    │     └────────────────┘
│  - Activate  │     │   {id}/variants/ │
└──────────────┘     └──────────────────┘
                              │
                     ┌────────┴────────┐
                     │ Config Resolver │
                     │ (shared module) │
                     └────────┬────────┘
                              │
          ┌───────────────────┼───────────────────┐
          ▼                   ▼                   ▼
   ┌──────────────┐  ┌──────────────┐  ┌──────────────┐
   │  Extractor   │  │  Event       │  │  Thesis      │
   │  (client.py) │  │  Classifier  │  │  Rewriter    │
   └──────┬───────┘  └──────┬───────┘  └──────┬───────┘
          │                  │                  │
          └──────────────────┼──────────────────┘
                             ▼
                    ┌──────────────┐
                    │  Ollama      │
                    │  Service     │
                    └──────────────┘

Key Design Decisions

  1. Variants as a separate table (not rows in ai_agents): Keeps the parent agent as the canonical role definition. Variants are children that override config fields. This avoids polluting the existing agent table with parent/child semantics and keeps backward compatibility.

  2. Partial unique index for single-active enforcement: Rather than application-level logic, a PostgreSQL partial unique index on (agent_id) WHERE is_active = TRUE guarantees at most one active variant per agent at the database level.

  3. Shared config resolver module: A new services/shared/agent_config.py module encapsulates the "resolve active config for an agent slug" logic with TTL caching. All three services import this instead of duplicating resolution logic.

  4. Nullable variant_id on performance log: Adding variant_id as nullable to agent_performance_log preserves backward compatibility — existing rows have NULL, new invocations record the variant when applicable.

  5. No schema_version on variants: Variants inherit the parent agent's schema_version, because schema_version defines the output structure and is not a tuning parameter. Variants override model, prompt, and inference parameters only.

Database Schema

Migration 027: Agent Variants

-- Agent variant configurations: alternative model/prompt/parameter sets per agent.
CREATE TABLE IF NOT EXISTS agent_variants (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    agent_id UUID NOT NULL REFERENCES ai_agents(id) ON DELETE CASCADE,
    variant_name VARCHAR(200) NOT NULL,
    variant_slug VARCHAR(200) NOT NULL,
    description TEXT NOT NULL DEFAULT '',
    model_provider VARCHAR(50) NOT NULL DEFAULT 'ollama',
    model_name VARCHAR(200) NOT NULL,
    system_prompt TEXT NOT NULL DEFAULT '',
    user_prompt_template TEXT NOT NULL DEFAULT '',
    prompt_version VARCHAR(100) NOT NULL DEFAULT '',
    temperature FLOAT DEFAULT 0.0,
    max_tokens INTEGER DEFAULT 32768,
    context_window INTEGER DEFAULT 0,       -- Ollama num_ctx; 0 = use model default
    input_token_limit INTEGER DEFAULT 0,    -- max input tokens before truncation; 0 = no limit
    token_budget INTEGER DEFAULT 0,         -- total tokens per hour; 0 = unlimited
    timeout_seconds INTEGER DEFAULT 120,
    max_retries INTEGER DEFAULT 2,
    is_active BOOLEAN NOT NULL DEFAULT FALSE,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Each agent can have many variants, but variant slugs must be unique per agent.
CREATE UNIQUE INDEX IF NOT EXISTS idx_agent_variants_slug
    ON agent_variants(agent_id, variant_slug);

-- At most one active variant per agent (database-enforced invariant).
CREATE UNIQUE INDEX IF NOT EXISTS idx_agent_variants_active
    ON agent_variants(agent_id) WHERE is_active = TRUE;

-- Fast lookup by agent.
CREATE INDEX IF NOT EXISTS idx_agent_variants_agent
    ON agent_variants(agent_id);

-- Add variant_id to performance log for per-variant attribution.
ALTER TABLE agent_performance_log
    ADD COLUMN IF NOT EXISTS variant_id UUID REFERENCES agent_variants(id) ON DELETE SET NULL;

CREATE INDEX IF NOT EXISTS idx_agent_perf_variant
    ON agent_performance_log(variant_id, recorded_at DESC);
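
The single-active guarantee of idx_agent_variants_active can be tried out locally. The following is a minimal sketch, assuming SQLite as a stand-in for PostgreSQL (both support partial unique indexes); the table is reduced to the columns that matter, and the insert_variant helper and the agent and slug values are illustrative only.

```python
import sqlite3

# SQLite stands in for PostgreSQL here; both support partial unique indexes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE agent_variants (
        id INTEGER PRIMARY KEY,
        agent_id TEXT NOT NULL,
        variant_slug TEXT NOT NULL,
        is_active INTEGER NOT NULL DEFAULT 0
    );
    CREATE UNIQUE INDEX idx_agent_variants_active
        ON agent_variants(agent_id) WHERE is_active = 1;
""")


def insert_variant(agent_id: str, slug: str, active: bool) -> bool:
    """Insert a variant; return False if the unique index rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO agent_variants (agent_id, variant_slug, is_active) "
                "VALUES (?, ?, ?)",
                (agent_id, slug, int(active)),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first_active = insert_variant("agent-a", "fast", True)
second_active = insert_variant("agent-a", "accurate", True)  # same agent: rejected
inactive = insert_variant("agent-a", "draft", False)         # inactive rows are unconstrained
other_agent = insert_variant("agent-b", "fast", True)        # different agent: allowed
```

Only rows with is_active set participate in the index, so any number of inactive variants can coexist for the same agent.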

Entity Relationships

ai_agents (1) ──── (0..*) agent_variants
    │                        │
    │                        │ (variant_id, nullable)
    └───── agent_performance_log ◀──┘
           (agent_id, required)

API Design

Pydantic Models

class VariantCreateBody(BaseModel):
    variant_name: str
    variant_slug: str | None = None  # auto-generated from name if omitted
    description: str = ""
    model_provider: str = "ollama"
    model_name: str
    system_prompt: str = ""
    user_prompt_template: str = ""
    prompt_version: str = ""
    temperature: float = 0.0
    max_tokens: int = 32768
    context_window: int = 0        # Ollama num_ctx; 0 = model default
    input_token_limit: int = 0     # max input tokens; 0 = no limit
    token_budget: int = 0          # tokens per hour; 0 = unlimited
    timeout_seconds: int = 120
    max_retries: int = 2

class VariantUpdateBody(BaseModel):
    variant_name: str | None = None
    description: str | None = None
    model_provider: str | None = None
    model_name: str | None = None
    system_prompt: str | None = None
    user_prompt_template: str | None = None
    prompt_version: str | None = None
    temperature: float | None = None
    max_tokens: int | None = None
    context_window: int | None = None
    input_token_limit: int | None = None
    token_budget: int | None = None
    timeout_seconds: int | None = None
    max_retries: int | None = None

class VariantCloneBody(BaseModel):
    variant_name: str
    variant_slug: str | None = None
    # All config fields optional — omitted fields inherit from source
    description: str | None = None
    model_provider: str | None = None
    model_name: str | None = None
    system_prompt: str | None = None
    user_prompt_template: str | None = None
    prompt_version: str | None = None
    temperature: float | None = None
    max_tokens: int | None = None
    context_window: int | None = None
    input_token_limit: int | None = None
    token_budget: int | None = None
    timeout_seconds: int | None = None
    max_retries: int | None = None

Endpoints

| Method | Path | Description | Req |
| --- | --- | --- | --- |
| GET | /api/agents/{agent_id}/variants | List all variants for an agent | 3.1 |
| GET | /api/agents/{agent_id}/variants/{variant_id} | Get a single variant | 3.2 |
| POST | /api/agents/{agent_id}/variants | Create a variant (direct create) | 3 |
| POST | /api/agents/{agent_id}/clone | Clone agent as variant | 2.1 |
| POST | /api/agents/{agent_id}/variants/{variant_id}/clone | Clone variant as new variant | 2.2 |
| PUT | /api/agents/{agent_id}/variants/{variant_id} | Update a variant | 3.4 |
| DELETE | /api/agents/{agent_id}/variants/{variant_id} | Delete a variant | 3.5 |
| POST | /api/agents/{agent_id}/variants/{variant_id}/activate | Set variant as active | 4.1 |
| POST | /api/agents/{agent_id}/variants/deactivate | Deactivate current active variant | 4.2 |
| GET | /api/agents/{agent_id}/variants/{variant_id}/performance | Variant performance metrics | 6.3 |
| GET | /api/agents/{agent_id}/variants/{variant_id}/performance/history | Variant performance time-series | 6.4 |

Clone Endpoint Logic (POST /api/agents/{agent_id}/clone)

# 1. Fetch source agent
agent = await pool.fetchrow("SELECT * FROM ai_agents WHERE id = $1", agent_id)

# 2. Build variant record: start with agent fields, overlay user overrides.
#    Compare against None rather than truthiness so explicit falsy overrides
#    (temperature=0.0, an empty prompt) are kept instead of silently inherited.
variant_data = {
    "model_provider": body.model_provider if body.model_provider is not None else agent["model_provider"],
    "model_name": body.model_name if body.model_name is not None else agent["model_name"],
    "system_prompt": body.system_prompt if body.system_prompt is not None else agent["system_prompt"],
    # ... etc for all config fields
}

# 3. Generate slug if not provided
slug = body.variant_slug or slugify(body.variant_name)

# 4. Insert into agent_variants. asyncpg takes query arguments positionally,
#    so the values are unpacked in the same order as the column list.
row = await pool.fetchrow(
    "INSERT INTO agent_variants (...) VALUES (...) RETURNING *",
    agent_id, body.variant_name, slug, *variant_data.values(),
)
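
The overlay in step 2 is easy to get subtly wrong when an override is falsy but intentional. A minimal, self-contained sketch of the merge rule follows; overlay_config and the sample values are hypothetical names used only for illustration.

```python
from typing import Any, Mapping


def overlay_config(source: Mapping[str, Any], overrides: Mapping[str, Any]) -> dict[str, Any]:
    """Copy `source`, replacing a field only when the override is not None.

    Unknown override keys are ignored so the result keeps the source's shape.
    """
    merged = dict(source)
    for key in source:
        value = overrides.get(key)
        if value is not None:
            merged[key] = value
    return merged


agent = {"model_name": "base-model", "temperature": 0.7, "system_prompt": "Extract facts."}
# 0.0 and "" are deliberate overrides; None means "inherit from the source".
clone = overlay_config(agent, {"temperature": 0.0, "system_prompt": "", "model_name": None})
```

Here clone keeps the agent's model_name but takes the overridden temperature and the empty system prompt, which is the behaviour Property 2 checks.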

Activate Endpoint Logic (POST .../activate)

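# Single transaction: deactivate previous, activate target
async with pool.acquire() as conn:
    async with conn.transaction():
        await conn.execute(
            "UPDATE agent_variants SET is_active = FALSE, updated_at = NOW() "
            "WHERE agent_id = $1 AND is_active = TRUE", agent_id
        )
        row = await conn.fetchrow(
            "UPDATE agent_variants SET is_active = TRUE, updated_at = NOW() "
            "WHERE id = $1 AND agent_id = $2 RETURNING *",
            variant_id, agent_id
        )
        if row is None:
            # Unknown variant: raising inside the transaction rolls back the
            # deactivation above, so the previously active variant stays active.
            raise HTTPException(status_code=404, detail="Variant not found")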
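
One case worth checking is activation of a variant that does not exist: the second UPDATE then matches no rows, and unless the transaction is aborted the agent is left with no active variant. The sketch below illustrates the rollback behaviour, assuming SQLite in place of PostgreSQL and a reduced table; the activate and active_ids helpers are illustrative, not the endpoint code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE agent_variants (
        id TEXT PRIMARY KEY,
        agent_id TEXT NOT NULL,
        is_active INTEGER NOT NULL DEFAULT 0
    );
    CREATE UNIQUE INDEX idx_active ON agent_variants(agent_id) WHERE is_active = 1;
    INSERT INTO agent_variants VALUES ('v1', 'a1', 1), ('v2', 'a1', 0);
""")


def activate(agent_id: str, variant_id: str) -> bool:
    """Swap the active variant atomically; return False if the target is missing."""
    try:
        with conn:  # one transaction: commit on success, roll back on exception
            conn.execute(
                "UPDATE agent_variants SET is_active = 0 "
                "WHERE agent_id = ? AND is_active = 1",
                (agent_id,),
            )
            cur = conn.execute(
                "UPDATE agent_variants SET is_active = 1 WHERE id = ? AND agent_id = ?",
                (variant_id, agent_id),
            )
            if cur.rowcount == 0:
                raise LookupError(variant_id)  # forces rollback of the deactivation
        return True
    except LookupError:
        return False


def active_ids(agent_id: str) -> list[str]:
    """Return the ids of all active variants for an agent (0 or 1 entries)."""
    rows = conn.execute(
        "SELECT id FROM agent_variants WHERE agent_id = ? AND is_active = 1",
        (agent_id,),
    )
    return [r[0] for r in rows]


missing = activate("a1", "does-not-exist")
after_missing = active_ids("a1")  # v1 is still active: the deactivation was rolled back
swapped = activate("a1", "v2")
after_swap = active_ids("a1")
```

Deactivating before activating also keeps the partial unique index satisfied at every statement boundary.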

Config Resolution Module

services/shared/agent_config.py

import time
from dataclasses import dataclass

import asyncpg

@dataclass
class ResolvedAgentConfig:
    """Runtime configuration resolved from DB agent + optional active variant."""
    agent_id: str
    variant_id: str | None
    model_provider: str
    model_name: str
    system_prompt: str
    user_prompt_template: str
    prompt_version: str
    temperature: float
    max_tokens: int
    context_window: int
    input_token_limit: int
    token_budget: int
    timeout_seconds: int
    max_retries: int

class AgentConfigResolver:
    """Resolves agent configuration from DB with active variant override and TTL cache."""

    def __init__(self, pool: asyncpg.Pool, ttl_seconds: int = 60):
        self._pool = pool
        self._ttl = ttl_seconds
        self._cache: dict[str, tuple[float, ResolvedAgentConfig]] = {}

    async def resolve(self, agent_slug: str) -> ResolvedAgentConfig | None:
        """Resolve config for an agent slug, preferring active variant if present."""
        now = time.monotonic()
        cached = self._cache.get(agent_slug)
        if cached and (now - cached[0]) < self._ttl:
            return cached[1]

        # Query: LEFT JOIN active variant onto agent
        row = await self._pool.fetchrow("""
            SELECT a.id AS agent_id,
                   v.id AS variant_id,
                   COALESCE(v.model_provider, a.model_provider) AS model_provider,
                   COALESCE(v.model_name, a.model_name) AS model_name,
                   COALESCE(v.system_prompt, a.system_prompt) AS system_prompt,
                   COALESCE(v.user_prompt_template, a.user_prompt_template) AS user_prompt_template,
                   COALESCE(v.prompt_version, a.prompt_version) AS prompt_version,
                   COALESCE(v.temperature, a.temperature) AS temperature,
                   COALESCE(v.max_tokens, a.max_tokens) AS max_tokens,
                   COALESCE(v.context_window, 0) AS context_window,
                   COALESCE(v.input_token_limit, 0) AS input_token_limit,
                   COALESCE(v.token_budget, 0) AS token_budget,
                   COALESCE(v.timeout_seconds, a.timeout_seconds) AS timeout_seconds,
                   COALESCE(v.max_retries, a.max_retries) AS max_retries
            FROM ai_agents a
            LEFT JOIN agent_variants v ON v.agent_id = a.id AND v.is_active = TRUE
            WHERE a.slug = $1 AND a.active = TRUE
        """, agent_slug)

        if not row:
            return None

        config = ResolvedAgentConfig(
            agent_id=str(row["agent_id"]),
            variant_id=str(row["variant_id"]) if row["variant_id"] else None,
            model_provider=row["model_provider"],
            model_name=row["model_name"],
            system_prompt=row["system_prompt"],
            user_prompt_template=row["user_prompt_template"],
            prompt_version=row["prompt_version"],
            temperature=row["temperature"],
            max_tokens=row["max_tokens"],
            context_window=row["context_window"],
            input_token_limit=row["input_token_limit"],
            token_budget=row["token_budget"],
            timeout_seconds=row["timeout_seconds"],
            max_retries=row["max_retries"],
        )
        self._cache[agent_slug] = (now, config)
        return config

Service Integration Points

Document Extractor (services/extractor/client.py)

The OllamaClient currently receives an OllamaConfig at construction time. Integration approach:

  1. Before creating OllamaClient, call resolver.resolve("document-extractor")
  2. If resolved, build an OllamaConfig from the resolved values
  3. If resolution fails (for example, the DB is down) or returns None (agent missing or inactive), fall back to env-var OllamaConfig()
  4. Pass resolved config to OllamaClient.__init__
  5. After extraction, log to agent_performance_log with both agent_id and variant_id

Event Classifier (services/extractor/event_classifier.py)

The classify_global_event function receives an ollama_client (OllamaClient). Same pattern:

  1. Resolve config via resolver.resolve("event-classifier")
  2. If resolved, construct an OllamaClient with the resolved config
  3. Pass the variant_id through to performance logging

Thesis Rewriter (services/recommendation/thesis_llm.py)

The rewrite_thesis_with_llm function receives an OllamaConfig directly:

  1. Resolve config via resolver.resolve("thesis-rewriter")
  2. If resolved, override the config parameter with resolved values
  3. Log variant_id in performance metrics

Performance Logging Changes

The existing agent_performance_log INSERT statements need to include variant_id:

await pool.execute(
    """INSERT INTO agent_performance_log
       (agent_id, variant_id, document_id, ticker, success, duration_ms,
        confidence, retry_count, input_tokens, output_tokens, error_message)
       VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11)""",
    agent_id, variant_id,  # variant_id may be None
    document_id, ticker, success, duration_ms,
    confidence, retry_count, input_tokens, output_tokens, error_message,
)

Frontend Design

Agents Page Changes

The existing AgentsPage component gets extended with variant management. The layout stays the same (sidebar + detail panel), with variant sections added inside the detail panel.

Component Hierarchy

AgentsPage
├── AgentListSidebar (existing)
├── AgentDetail (existing, extended)
│   ├── Agent config card (existing)
│   ├── Agent performance card (existing)
│   ├── VariantList (new)
│   │   ├── VariantRow (name, model, active badge, actions)
│   │   └── "Clone as Variant" button
│   ├── VariantCompare (new, shown when 2+ variants selected)
│   │   ├── MetricsComparisonTable
│   │   └── OverlayPerformanceChart
│   └── VariantDetail (new, shown when single variant selected)
│       ├── Variant config card
│       └── Variant performance card
├── VariantCreateForm (new)
├── VariantEditForm (new)
└── VariantCloneForm (new)

New TanStack Query Hooks

// api/hooks.ts additions
function useAgentVariants(agentId: string | undefined)
function useVariantPerformance(agentId: string, variantId: string, hours?: number)
function useVariantPerfHistory(agentId: string, variantId: string, hours?: number)

// Mutations
function useCloneAgentAsVariant(agentId: string)
function useCloneVariant(agentId: string, variantId: string)
function useCreateVariant(agentId: string)
function useUpdateVariant(agentId: string, variantId: string)
function useDeleteVariant(agentId: string, variantId: string)
function useActivateVariant(agentId: string, variantId: string)
function useDeactivateVariants(agentId: string)

TypeScript Types

interface AgentVariant {
  id: string;
  agent_id: string;
  variant_name: string;
  variant_slug: string;
  description: string;
  model_provider: string;
  model_name: string;
  system_prompt: string;
  user_prompt_template: string;
  prompt_version: string;
  temperature: number;
  max_tokens: number;
  context_window: number;
  input_token_limit: number;
  token_budget: number;
  timeout_seconds: number;
  max_retries: number;
  is_active: boolean;
  created_at: string;
  updated_at: string;
}

Comparison View Behavior

  1. Each VariantRow has a checkbox for comparison selection
  2. When 2+ variants are checked, VariantCompare renders above the variant list
  3. Comparison table shows metrics side-by-side (columns = variants)
  4. Chart overlays performance history lines for selected variants on shared axes
  5. "Activate" button in comparison view calls the activate endpoint and invalidates queries

Correctness Properties

Property 1: Single Active Variant Invariant (Req 1.4, 4.1)

For any sequence of activate/deactivate operations on variants of an agent, at most one variant per agent has is_active = TRUE at any point in time.

  • Test approach: Property-based test generating random sequences of create-variant + activate + deactivate operations, then asserting the DB invariant holds after each operation.
  • Generator: Random agent with 1-5 variants, random sequence of 1-20 activate/deactivate calls.
  • Assertion: SELECT COUNT(*) FROM agent_variants WHERE agent_id = $1 AND is_active = TRUE returns 0 or 1.

Property 2: Clone Preserves Unoverridden Fields (Req 2.1, 2.3)

For any agent and any subset of override fields, cloning produces a variant where: overridden fields match the override values, and non-overridden fields match the source agent's values.

  • Test approach: Property-based test generating random agent configs and random subsets of overrides.
  • Generator: Random agent config (model_name, temperature, max_tokens, etc.) + random subset of override values.
  • Assertion: For each field, if an override was provided the variant has the override value; otherwise it matches the source.

Property 3: Config Resolution Prefers Active Variant (Req 4.3, 4.4, 9.1-9.3)

For any agent with N variants, AgentConfigResolver.resolve(slug) returns the active variant's config when one exists, and the base agent config when none is active.

  • Test approach: Property-based test. Generate agent + 0-5 variants with 0 or 1 active. Resolve and verify returned config matches the expected source.
  • Generator: Random agent config, random variant configs, random active/inactive state.
  • Assertion: If an active variant exists, all config fields in the resolved result match the variant. If no active variant, all fields match the base agent.

Property 4: Variant Performance Metrics Consistency (Req 6.3, 6.5)

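For any agent with variants and performance log entries, the agent-level total of each additive metric (invocation count, success count, token totals) is always >= the same total for any single variant, because the agent-level aggregate includes every variant. Ratio metrics such as success rate are excluded, since a ratio is not monotone under aggregation.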

  • Test approach: Property-based test. Generate performance log entries attributed to different variants of the same agent. Query agent-level and variant-level metrics. Assert agent total >= variant total for each metric.
  • Generator: Random agent with 2-4 variants, 5-50 random performance log entries distributed across variants.
  • Assertion: agent.total_invocations >= variant.total_invocations for each variant. Same for success count, token totals.
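
A small worked example of the property, using made-up log entries rather than real data. It sums additive metrics per variant and for the whole agent, and also shows why a ratio such as success rate cannot be part of the assertion.

```python
from collections import Counter

# Each entry: (variant_id, or None for base-agent runs; success flag; total tokens).
# The values are invented for illustration.
log = [
    ("v1", True, 900), ("v1", False, 400),
    ("v2", True, 1200), (None, True, 700),
]


def totals(entries):
    """Sum the additive metrics over a list of log entries."""
    counts = Counter()
    for _, success, tokens in entries:
        counts["invocations"] += 1
        counts["successes"] += int(success)
        counts["tokens"] += tokens
    return counts


agent_totals = totals(log)
per_variant = {
    vid: totals([e for e in log if e[0] == vid])
    for vid in {e[0] for e in log if e[0] is not None}
}

# The property: every additive agent-level total dominates each variant's total.
holds = all(
    agent_totals[metric] >= variant[metric]
    for variant in per_variant.values()
    for metric in ("invocations", "successes", "tokens")
)

# Ratios are not monotone: v2 succeeds every time, the agent as a whole does not.
agent_rate = agent_totals["successes"] / agent_totals["invocations"]
v2_rate = per_variant["v2"]["successes"] / per_variant["v2"]["invocations"]
```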

Property 5: Partial Update Idempotence (Req 3.4)

For any variant, applying an update with a subset of fields, then applying the same update again, produces the same variant state (ignoring updated_at). The update operation is idempotent on the data fields.

  • Test approach: Property-based test. Generate a variant, apply random partial update, read result, apply same update again, read result. Assert all fields except updated_at are identical.
  • Generator: Random variant + random subset of updatable fields with random values.
  • Assertion: variant_after_first_update.fields == variant_after_second_update.fields (excluding updated_at).

Property 6: TTL Cache Expiry (Req 9.5)

Within the TTL window, the resolver returns cached config without querying the DB. After TTL expires, the resolver re-queries and reflects any changes.

  • Test approach: Property-based test. Resolve config, change the active variant in the DB, resolve again within TTL (should get old value), advance time past TTL, resolve again (should get new value).
  • Generator: Random agent configs with different variant configs. Random TTL values (1-120s).
  • Assertion: Pre-TTL resolve returns original config. Post-TTL resolve returns updated config.
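
Advancing time in a test is simplest when the clock is injectable. The sketch below shows the idea on a stripped-down cache; TTLCache is a hypothetical class that mirrors the resolver's expiry check, not the resolver itself, which calls time.monotonic() directly and would need a similar hook or a monkeypatched clock.

```python
from typing import Callable


class TTLCache:
    """Minimal TTL cache with an injectable clock so tests can advance time."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float]):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, str]] = {}

    def get(self, key: str, load: Callable[[str], str]) -> str:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and (now - hit[0]) < self._ttl:
            return hit[1]
        value = load(key)  # cache miss or expired entry: reload
        self._entries[key] = (now, value)
        return value


fake_now = [0.0]                         # mutable cell acting as the test clock
db = {"document-extractor": "model-a"}   # stands in for the database
loads: list[str] = []


def load(key: str) -> str:
    """Record each simulated DB query and return the current value."""
    loads.append(key)
    return db[key]


cache = TTLCache(ttl_seconds=60, clock=lambda: fake_now[0])
first = cache.get("document-extractor", load)
db["document-extractor"] = "model-b"     # simulate activating a different variant
fake_now[0] = 59.0
within_ttl = cache.get("document-extractor", load)  # still served from cache
fake_now[0] = 60.0
after_ttl = cache.get("document-extractor", load)   # entry expired, reloaded
```

The loads list doubles as evidence for the first half of the property: within the TTL window no second query is issued.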

Property 7: Slug Auto-Generation Determinism (Req 2.4)

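For any variant_name, the auto-generated slug is deterministic (same name → same slug), is a valid kebab-case string (lowercase, alphanumeric + hyphens, no leading/trailing hyphens), and is non-empty for any non-empty name. The last condition implies a fallback slug for names that contain no characters usable in a slug (for example, a name made only of punctuation).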

  • Test approach: Property-based test over random variant names.
  • Generator: Random non-empty strings with unicode, spaces, special characters.
  • Assertion: slugify(name) == slugify(name) (deterministic), slug matches ^[a-z0-9]+(-[a-z0-9]+)*$, slug is non-empty.
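
The design does not show slugify itself, so the following is only one possible sketch that satisfies the three assertions. The ASCII-folding step and the "variant" fallback are assumptions made for this example, as are the sample names.

```python
import re
import unicodedata


def slugify(name: str, fallback: str = "variant") -> str:
    """Return a lowercase kebab-case slug; use `fallback` when nothing survives."""
    # Decompose accented characters, then drop anything outside ASCII.
    ascii_name = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    )
    # Collapse every run of non-alphanumerics into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")
    return slug or fallback


SLUG_RE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")
examples = ["Llama 3.1 (8B) / low temp", "  Café  Prompt v2 ", "!!!", "日本語"]
slugs = [slugify(n) for n in examples]
```

With this approach, names that fold to nothing all share the fallback slug, so a second such name for the same agent would be rejected by the unique index on (agent_id, variant_slug); a real implementation may prefer to append a suffix.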

File Changes Summary

New Files

| File | Purpose |
| --- | --- |
| infra/migrations/027_agent_variants.sql | Migration: agent_variants table + performance log variant_id column |
| services/shared/agent_config.py | AgentConfigResolver with TTL cache |

Modified Files

| File | Change |
| --- | --- |
| services/api/app.py | Add variant CRUD, clone, activate/deactivate, performance endpoints + Pydantic models |
| services/extractor/client.py | Accept optional resolved config; pass variant_id to perf logging |
| services/extractor/event_classifier.py | Use resolver for runtime config; pass variant_id to perf logging |
| services/recommendation/thesis_llm.py | Use resolver for runtime config; pass variant_id to perf logging |
| frontend/src/pages/Agents.tsx | Add variant list, comparison, create/edit/clone forms, activate/delete actions |
| frontend/src/test/mocks/handlers.ts | Add MSW handlers for variant endpoints |

Test Files

| File | Purpose |
| --- | --- |
| tests/test_pbt_agent_variants.py | Property-based tests for variant invariants, clone, config resolution |
| tests/test_agent_variants_api.py | Example/edge-case tests for API endpoints |
| frontend/src/test/pages.test.tsx | Frontend tests for variant UI components |