feat: agent variants — migration, API, service integration, frontend, tests

- Migration 027: agent_variants table with single-active enforcement, variant_id column on agent_performance_log
- API: full CRUD, clone from agent/variant, activate/deactivate, per-variant performance metrics and history endpoints
- Services: extractor, event classifier, thesis rewriter all wired to AgentConfigResolver with variant override support
- Frontend: variant list, comparison view, create/edit/clone forms, activate/delete actions on Agents page
- Tests: API tests + 5 property-based tests (single-active invariant, clone preservation, config resolution, slug determinism, update idempotence)
- Spec files for agent-variants feature
2026-04-17 05:15:42 +00:00
parent 734bf001a7
commit 7c23c044d7
14 changed files with 3118 additions and 120 deletions
@@ -0,0 +1,514 @@
+# Design Document: Agent Variants
+
+## Overview
+
+Add variant support to the existing AI agents system so each agent can have multiple configurations (different models, prompts, parameters) that can be independently tracked, compared, and swapped into production. This builds on the existing `ai_agents` table, `agent_performance_log` table, API endpoints, and frontend Agents page.
+
+## Architecture
+
+### System Context
+
+```
+┌──────────────┐     ┌──────────────────┐     ┌────────────────┐
+│  Agents Page │────▶│   Query API      │────▶│  PostgreSQL    │
+│  (React)     │     │  (FastAPI)       │     │  ai_agents     │
+│              │     │                  │     │  agent_variants│
+│  - List      │     │  /api/agents/    │     │  agent_perf_log│
+│  - Compare   │     │  /api/agents/    │     └────────────────┘
+│  - Activate  │     │   {id}/variants/ │
+└──────────────┘     └────────┬─────────┘
+                              │
+                     ┌────────┴─────────┐
+                     │  Config Resolver │
+                     │  (shared module) │
+                     └────────┬─────────┘
+                              │
+          ┌───────────────────┼───────────────────┐
+          ▼                   ▼                   ▼
+   ┌──────────────┐  ┌──────────────┐  ┌──────────────┐
+   │  Extractor   │  │  Event       │  │  Thesis      │
+   │  (client.py) │  │  Classifier  │  │  Rewriter    │
+   └──────┬───────┘  └──────┬───────┘  └──────┬───────┘
+          │                  │                  │
+          └──────────────────┼──────────────────┘
+                             ▼
+                    ┌──────────────┐
+                    │  Ollama      │
+                    │  Service     │
+                    └──────────────┘
+```
+
+### Key Design Decisions
+
+1. **Variants as a separate table** (not rows in `ai_agents`): Keeps the parent agent as the canonical role definition. Variants are children that override config fields. This avoids polluting the existing agent table with parent/child semantics and keeps backward compatibility.
+
+2. **Partial unique index for single-active enforcement**: Rather than application-level logic, a PostgreSQL partial unique index on `(agent_id) WHERE is_active = TRUE` guarantees at most one active variant per agent at the database level.
+
+3. **Shared config resolver module**: A new `services/shared/agent_config.py` module encapsulates the "resolve active config for an agent slug" logic with TTL caching. All three services import this instead of duplicating resolution logic.
+
+4. **Nullable variant_id on performance log**: Adding `variant_id` as nullable to `agent_performance_log` preserves backward compatibility — existing rows have NULL, new invocations record the variant when applicable.
+
+5. **No schema_version on variants**: Variants inherit the parent agent's `schema_version` because it defines the output structure and is not a tuning parameter. Variants override model, prompt, and inference parameters only.
+
+## Database Schema
+
+### Migration 027: Agent Variants
+
+```sql
+-- Agent variant configurations: alternative model/prompt/parameter sets per agent.
+CREATE TABLE IF NOT EXISTS agent_variants (
+    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+    agent_id UUID NOT NULL REFERENCES ai_agents(id) ON DELETE CASCADE,
+    variant_name VARCHAR(200) NOT NULL,
+    variant_slug VARCHAR(200) NOT NULL,
+    description TEXT NOT NULL DEFAULT '',
+    model_provider VARCHAR(50) NOT NULL DEFAULT 'ollama',
+    model_name VARCHAR(200) NOT NULL,
+    system_prompt TEXT NOT NULL DEFAULT '',
+    user_prompt_template TEXT NOT NULL DEFAULT '',
+    prompt_version VARCHAR(100) NOT NULL DEFAULT '',
+    temperature FLOAT DEFAULT 0.0,
+    max_tokens INTEGER DEFAULT 32768,
+    context_window INTEGER DEFAULT 0,       -- Ollama num_ctx; 0 = use model default
+    input_token_limit INTEGER DEFAULT 0,    -- max input tokens before truncation; 0 = no limit
+    token_budget INTEGER DEFAULT 0,         -- total tokens per hour; 0 = unlimited
+    timeout_seconds INTEGER DEFAULT 120,
+    max_retries INTEGER DEFAULT 2,
+    is_active BOOLEAN NOT NULL DEFAULT FALSE,
+    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
+    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
+);
+
+-- Each agent can have many variants, but variant slugs must be unique per agent.
+CREATE UNIQUE INDEX IF NOT EXISTS idx_agent_variants_slug
+    ON agent_variants(agent_id, variant_slug);
+
+-- At most one active variant per agent (database-enforced invariant).
+CREATE UNIQUE INDEX IF NOT EXISTS idx_agent_variants_active
+    ON agent_variants(agent_id) WHERE is_active = TRUE;
+
+-- Fast lookup by agent.
+CREATE INDEX IF NOT EXISTS idx_agent_variants_agent
+    ON agent_variants(agent_id);
+
+-- Add variant_id to performance log for per-variant attribution.
+ALTER TABLE agent_performance_log
+    ADD COLUMN IF NOT EXISTS variant_id UUID REFERENCES agent_variants(id) ON DELETE SET NULL;
+
+CREATE INDEX IF NOT EXISTS idx_agent_perf_variant
+    ON agent_performance_log(variant_id, recorded_at DESC);
+```
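The single-active guarantee can be tried out without a PostgreSQL instance. The sketch below is an illustration only, not the project's migration: it assumes Python's built-in `sqlite3` module as a stand-in (SQLite also supports partial unique indexes), trims the table to three columns, and stores `is_active` as an integer.

```python
import sqlite3

# Stand-in schema: only the columns needed to show the invariant (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE agent_variants (
        id INTEGER PRIMARY KEY,
        agent_id TEXT NOT NULL,
        is_active INTEGER NOT NULL DEFAULT 0
    );
    CREATE UNIQUE INDEX idx_agent_variants_active
        ON agent_variants(agent_id) WHERE is_active = 1;
    """
)

# Any number of inactive variants per agent is allowed.
conn.execute("INSERT INTO agent_variants (agent_id, is_active) VALUES ('a1', 0)")
conn.execute("INSERT INTO agent_variants (agent_id, is_active) VALUES ('a1', 0)")
# The first active variant is accepted.
conn.execute("INSERT INTO agent_variants (agent_id, is_active) VALUES ('a1', 1)")

# A second active variant for the same agent violates the partial index.
try:
    conn.execute("INSERT INTO agent_variants (agent_id, is_active) VALUES ('a1', 1)")
    second_active_rejected = False
except sqlite3.IntegrityError:
    second_active_rejected = True

# A different agent can still have its own active variant.
conn.execute("INSERT INTO agent_variants (agent_id, is_active) VALUES ('a2', 1)")

active_counts = dict(
    conn.execute(
        "SELECT agent_id, COUNT(*) FROM agent_variants "
        "WHERE is_active = 1 GROUP BY agent_id"
    ).fetchall()
)
```

Because inactive rows fall outside the index predicate, they never conflict with each other. Only a second `is_active` row for the same `agent_id` is rejected.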
+
+### Entity Relationships
+
+```
+ai_agents (1) ──── (0..*) agent_variants
+    │                        │
+    │                        │ (variant_id, nullable)
+    └───── agent_performance_log ◀──┘
+           (agent_id, required)
+```
+
+## API Design
+
+### Pydantic Models
+
+```python
+from pydantic import BaseModel
+
+
+class VariantCreateBody(BaseModel):
+    variant_name: str
+    variant_slug: str | None = None  # auto-generated from name if omitted
+    description: str = ""
+    model_provider: str = "ollama"
+    model_name: str
+    system_prompt: str = ""
+    user_prompt_template: str = ""
+    prompt_version: str = ""
+    temperature: float = 0.0
+    max_tokens: int = 32768
+    context_window: int = 0        # Ollama num_ctx; 0 = model default
+    input_token_limit: int = 0     # max input tokens; 0 = no limit
+    token_budget: int = 0          # tokens per hour; 0 = unlimited
+    timeout_seconds: int = 120
+    max_retries: int = 2
+
+class VariantUpdateBody(BaseModel):
+    variant_name: str | None = None
+    description: str | None = None
+    model_provider: str | None = None
+    model_name: str | None = None
+    system_prompt: str | None = None
+    user_prompt_template: str | None = None
+    prompt_version: str | None = None
+    temperature: float | None = None
+    max_tokens: int | None = None
+    context_window: int | None = None
+    input_token_limit: int | None = None
+    token_budget: int | None = None
+    timeout_seconds: int | None = None
+    max_retries: int | None = None
+
+class VariantCloneBody(BaseModel):
+    variant_name: str
+    variant_slug: str | None = None
+    # All config fields optional — omitted fields inherit from source
+    description: str | None = None
+    model_provider: str | None = None
+    model_name: str | None = None
+    system_prompt: str | None = None
+    user_prompt_template: str | None = None
+    prompt_version: str | None = None
+    temperature: float | None = None
+    max_tokens: int | None = None
+    context_window: int | None = None
+    input_token_limit: int | None = None
+    token_budget: int | None = None
+    timeout_seconds: int | None = None
+    max_retries: int | None = None
+```
+
+### Endpoints
+
+| Method | Path | Description | Req |
+|--------|------|-------------|-----|
+| GET | `/api/agents/{agent_id}/variants` | List all variants for an agent | 3.1 |
+| GET | `/api/agents/{agent_id}/variants/{variant_id}` | Get a single variant | 3.2 |
+| POST | `/api/agents/{agent_id}/variants` | Create a variant (direct create) | 3 |
+| POST | `/api/agents/{agent_id}/clone` | Clone agent as variant | 2.1 |
+| POST | `/api/agents/{agent_id}/variants/{variant_id}/clone` | Clone variant as new variant | 2.2 |
+| PUT | `/api/agents/{agent_id}/variants/{variant_id}` | Update a variant | 3.4 |
+| DELETE | `/api/agents/{agent_id}/variants/{variant_id}` | Delete a variant | 3.5 |
+| POST | `/api/agents/{agent_id}/variants/{variant_id}/activate` | Set variant as active | 4.1 |
+| POST | `/api/agents/{agent_id}/variants/deactivate` | Deactivate current active variant | 4.2 |
+| GET | `/api/agents/{agent_id}/variants/{variant_id}/performance` | Variant performance metrics | 6.3 |
+| GET | `/api/agents/{agent_id}/variants/{variant_id}/performance/history` | Variant performance time-series | 6.4 |
+
+### Clone Endpoint Logic (POST `/api/agents/{agent_id}/clone`)
+
+```python
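+# 1. Fetch source agent; a missing agent is a 404 (fastapi.HTTPException)
+agent = await pool.fetchrow("SELECT * FROM ai_agents WHERE id = $1", agent_id)
+if agent is None:
+    raise HTTPException(status_code=404, detail="Agent not found")
+
+# 2. Build variant record: start with agent fields, overlay user overrides.
+#    Test every field with `is not None` so falsy overrides ("" or 0) are kept.
+variant_data = {
+    "model_provider": body.model_provider if body.model_provider is not None else agent["model_provider"],
+    "model_name": body.model_name if body.model_name is not None else agent["model_name"],
+    "system_prompt": body.system_prompt if body.system_prompt is not None else agent["system_prompt"],
+    # ... etc for all config fields
+}
+
+# 3. Generate slug if not provided
+slug = body.variant_slug or slugify(body.variant_name)
+
+# 4. Insert into agent_variants. asyncpg accepts positional query arguments only,
+#    so the values are unpacked in the same order as the column list.
+row = await pool.fetchrow(
+    "INSERT INTO agent_variants (...) VALUES (...) RETURNING *",
+    agent_id, body.variant_name, slug, *variant_data.values()
+)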
+```
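The overlay step can be isolated as a pure function, which is also the shape a clone-preservation test would exercise. This is a minimal sketch under stated assumptions: `build_variant_data` and `CONFIG_FIELDS` are hypothetical names, and the field list is a shortened subset of the real config fields.

```python
from typing import Any, Mapping

# Hypothetical subset of the config fields; the real list is longer.
CONFIG_FIELDS = ("model_provider", "model_name", "system_prompt", "temperature", "max_tokens")


def build_variant_data(source: Mapping[str, Any], overrides: Mapping[str, Any]) -> dict[str, Any]:
    """Overlay explicit overrides on the source config.

    A field counts as overridden only when its value is not None, so falsy
    overrides such as "" or 0.0 are kept instead of being replaced.
    """
    return {
        field: overrides[field] if overrides.get(field) is not None else source[field]
        for field in CONFIG_FIELDS
    }


agent = {
    "model_provider": "ollama",
    "model_name": "base-model",
    "system_prompt": "You are an extractor.",
    "temperature": 0.7,
    "max_tokens": 32768,
}
variant = build_variant_data(
    agent,
    {"model_name": "other-model", "temperature": 0.0, "system_prompt": None},
)
```

The `temperature` override of `0.0` is the case that separates the two idioms: with `body.temperature or agent["temperature"]` it would silently fall back to the source value of `0.7`.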
+
+### Activate Endpoint Logic (POST `.../activate`)
+
+```python
+# Single transaction: deactivate previous, activate target
+async with pool.acquire() as conn:
+    async with conn.transaction():
+        await conn.execute(
+            "UPDATE agent_variants SET is_active = FALSE, updated_at = NOW() "
+            "WHERE agent_id = $1 AND is_active = TRUE", agent_id
+        )
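+        row = await conn.fetchrow(
+            "UPDATE agent_variants SET is_active = TRUE, updated_at = NOW() "
+            "WHERE id = $1 AND agent_id = $2 RETURNING *",
+            variant_id, agent_id
+        )
+        if row is None:
+            # Unknown variant: raising inside the transaction rolls back the
+            # deactivation, so the previously active variant stays active.
+            raise HTTPException(status_code=404, detail="Variant not found")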
+```
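The deactivate-then-activate sequence only stays safe if a missing target rolls the whole transaction back. The sketch below illustrates that behavior, assuming `sqlite3` as a stand-in for asyncpg/PostgreSQL and a trimmed table; `activate_variant` is a hypothetical helper, not the endpoint itself.

```python
import sqlite3


def activate_variant(conn: sqlite3.Connection, agent_id: str, variant_id: int) -> bool:
    """Deactivate the current active variant and activate the target atomically.

    Returns False and leaves the previous state untouched when the target
    variant does not belong to the agent.
    """
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE agent_variants SET is_active = 0 "
                "WHERE agent_id = ? AND is_active = 1",
                (agent_id,),
            )
            cur = conn.execute(
                "UPDATE agent_variants SET is_active = 1 WHERE id = ? AND agent_id = ?",
                (variant_id, agent_id),
            )
            if cur.rowcount == 0:
                raise LookupError("variant not found")
    except LookupError:
        return False
    return True


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE agent_variants (
        id INTEGER PRIMARY KEY,
        agent_id TEXT NOT NULL,
        is_active INTEGER NOT NULL DEFAULT 0
    );
    CREATE UNIQUE INDEX idx_agent_variants_active
        ON agent_variants(agent_id) WHERE is_active = 1;
    INSERT INTO agent_variants (id, agent_id, is_active) VALUES (1, 'a1', 1), (2, 'a1', 0);
    """
)

switched = activate_variant(conn, "a1", 2)   # variant 2 replaces variant 1
missing = activate_variant(conn, "a1", 99)   # no such variant: rolled back
active_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM agent_variants WHERE agent_id = 'a1' AND is_active = 1"
    )
]
```

Without the rollback, the failed call would commit the deactivation and leave the agent with no active variant.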
+
+## Config Resolution Module
+
+### `services/shared/agent_config.py`
+
+```python
+import time
+from dataclasses import dataclass
+
+import asyncpg
+
+
+@dataclass
+class ResolvedAgentConfig:
+    """Runtime configuration resolved from DB agent + optional active variant."""
+    agent_id: str
+    variant_id: str | None
+    model_provider: str
+    model_name: str
+    system_prompt: str
+    user_prompt_template: str
+    prompt_version: str
+    temperature: float
+    max_tokens: int
+    context_window: int
+    input_token_limit: int
+    token_budget: int
+    timeout_seconds: int
+    max_retries: int
+
+class AgentConfigResolver:
+    """Resolves agent configuration from DB with active variant override and TTL cache."""
+
+    def __init__(self, pool: asyncpg.Pool, ttl_seconds: int = 60):
+        self._pool = pool
+        self._ttl = ttl_seconds
+        self._cache: dict[str, tuple[float, ResolvedAgentConfig]] = {}
+
+    async def resolve(self, agent_slug: str) -> ResolvedAgentConfig | None:
+        """Resolve config for an agent slug, preferring active variant if present."""
+        now = time.monotonic()
+        cached = self._cache.get(agent_slug)
+        if cached and (now - cached[0]) < self._ttl:
+            return cached[1]
+
+        # Query: LEFT JOIN active variant onto agent
+        row = await self._pool.fetchrow("""
+            SELECT a.id AS agent_id,
+                   v.id AS variant_id,
+                   COALESCE(v.model_provider, a.model_provider) AS model_provider,
+                   COALESCE(v.model_name, a.model_name) AS model_name,
+                   COALESCE(v.system_prompt, a.system_prompt) AS system_prompt,
+                   COALESCE(v.user_prompt_template, a.user_prompt_template) AS user_prompt_template,
+                   COALESCE(v.prompt_version, a.prompt_version) AS prompt_version,
+                   COALESCE(v.temperature, a.temperature) AS temperature,
+                   COALESCE(v.max_tokens, a.max_tokens) AS max_tokens,
+                   COALESCE(v.context_window, 0) AS context_window,
+                   COALESCE(v.input_token_limit, 0) AS input_token_limit,
+                   COALESCE(v.token_budget, 0) AS token_budget,
+                   COALESCE(v.timeout_seconds, a.timeout_seconds) AS timeout_seconds,
+                   COALESCE(v.max_retries, a.max_retries) AS max_retries
+            FROM ai_agents a
+            LEFT JOIN agent_variants v ON v.agent_id = a.id AND v.is_active = TRUE
+            WHERE a.slug = $1 AND a.active = TRUE
+        """, agent_slug)
+
+        if not row:
+            return None
+
+        config = ResolvedAgentConfig(
+            agent_id=str(row["agent_id"]),
+            variant_id=str(row["variant_id"]) if row["variant_id"] else None,
+            model_provider=row["model_provider"],
+            model_name=row["model_name"],
+            system_prompt=row["system_prompt"],
+            user_prompt_template=row["user_prompt_template"],
+            prompt_version=row["prompt_version"],
+            temperature=row["temperature"],
+            max_tokens=row["max_tokens"],
+            context_window=row["context_window"],
+            input_token_limit=row["input_token_limit"],
+            token_budget=row["token_budget"],
+            timeout_seconds=row["timeout_seconds"],
+            max_retries=row["max_retries"],
+        )
+        self._cache[agent_slug] = (now, config)
+        return config
+```
+
+## Service Integration Points
+
+### Document Extractor (`services/extractor/client.py`)
+
+The `OllamaClient` currently receives an `OllamaConfig` at construction time. Integration approach:
+
+1. Before creating `OllamaClient`, call `resolver.resolve("document-extractor")`
+2. If resolved, build an `OllamaConfig` from the resolved values
+3. If resolution returns `None` or raises (for example, the DB is down), fall back to env-var `OllamaConfig()`
+4. Pass resolved config to `OllamaClient.__init__`
+5. After extraction, log to `agent_performance_log` with both `agent_id` and `variant_id`
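The resolve-then-fall-back steps above can be sketched as a small helper. It is a hedged illustration only: `ModelConfig` is a hypothetical stand-in for `OllamaConfig`, `resolve_or_fallback` is an invented name, and the two fake resolvers simulate an active variant and an unreachable database.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class ModelConfig:
    """Hypothetical stand-in for OllamaConfig; the real class has more fields."""
    model_name: str
    variant_id: Optional[str] = None


async def resolve_or_fallback(
    resolve: Callable[[str], Awaitable[Optional[ModelConfig]]],
    agent_slug: str,
    fallback: ModelConfig,
) -> ModelConfig:
    """Return the resolved config, or the fallback if resolution fails or finds nothing."""
    try:
        resolved = await resolve(agent_slug)
    except Exception:  # e.g. the database is unreachable
        return fallback
    return resolved if resolved is not None else fallback


async def _db_down(slug: str) -> Optional[ModelConfig]:
    # Simulates a resolver whose database connection fails.
    raise ConnectionError("database unavailable")


async def _active_variant(slug: str) -> Optional[ModelConfig]:
    # Simulates a resolver that finds an active variant.
    return ModelConfig(model_name="variant-model", variant_id="v1")


env_default = ModelConfig(model_name="env-model")
from_db = asyncio.run(resolve_or_fallback(_active_variant, "document-extractor", env_default))
from_env = asyncio.run(resolve_or_fallback(_db_down, "document-extractor", env_default))
```

On the fallback path `variant_id` stays `None`, which matches the nullable `variant_id` column on the performance log.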
+
+### Event Classifier (`services/extractor/event_classifier.py`)
+
+The `classify_global_event` function receives an `ollama_client` (OllamaClient). Same pattern:
+
+1. Resolve config via `resolver.resolve("event-classifier")`
+2. If resolved, construct an OllamaClient with the resolved config
+3. Pass the variant_id through to performance logging
+
+### Thesis Rewriter (`services/recommendation/thesis_llm.py`)
+
+The `rewrite_thesis_with_llm` function receives an `OllamaConfig` directly:
+
+1. Resolve config via `resolver.resolve("thesis-rewriter")`
+2. If resolved, override the `config` parameter with resolved values
+3. Log variant_id in performance metrics
+
+### Performance Logging Changes
+
+The existing `agent_performance_log` INSERT statements need to include `variant_id`:
+
+```python
+await pool.execute(
+    """INSERT INTO agent_performance_log
+       (agent_id, variant_id, document_id, ticker, success, duration_ms,
+        confidence, retry_count, input_tokens, output_tokens, error_message)
+       VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11)""",
+    agent_id, variant_id,  # variant_id may be None
+    document_id, ticker, success, duration_ms,
+    confidence, retry_count, input_tokens, output_tokens, error_message,
+)
+```
+
+## Frontend Design
+
+### Agents Page Changes
+
+The existing `AgentsPage` component gets extended with variant management. The layout stays the same (sidebar + detail panel), with variant sections added inside the detail panel.
+
+### Component Hierarchy
+
+```
+AgentsPage
+├── AgentListSidebar (existing)
+├── AgentDetail (existing, extended)
+│   ├── Agent config card (existing)
+│   ├── Agent performance card (existing)
+│   ├── VariantList (new)
+│   │   ├── VariantRow (name, model, active badge, actions)
+│   │   └── "Clone as Variant" button
+│   ├── VariantCompare (new, shown when 2+ variants selected)
+│   │   ├── MetricsComparisonTable
+│   │   └── OverlayPerformanceChart
+│   └── VariantDetail (new, shown when single variant selected)
+│       ├── Variant config card
+│       └── Variant performance card
+├── VariantCreateForm (new)
+├── VariantEditForm (new)
+└── VariantCloneForm (new)
+```
+
+### New TanStack Query Hooks
+
+```typescript
+// api/hooks.ts additions
+function useAgentVariants(agentId: string | undefined)
+function useVariantPerformance(agentId: string, variantId: string, hours?: number)
+function useVariantPerfHistory(agentId: string, variantId: string, hours?: number)
+
+// Mutations
+function useCloneAgentAsVariant(agentId: string)
+function useCloneVariant(agentId: string, variantId: string)
+function useCreateVariant(agentId: string)
+function useUpdateVariant(agentId: string, variantId: string)
+function useDeleteVariant(agentId: string, variantId: string)
+function useActivateVariant(agentId: string, variantId: string)
+function useDeactivateVariants(agentId: string)
+```
+
+### TypeScript Types
+
+```typescript
+interface AgentVariant {
+  id: string;
+  agent_id: string;
+  variant_name: string;
+  variant_slug: string;
+  description: string;
+  model_provider: string;
+  model_name: string;
+  system_prompt: string;
+  user_prompt_template: string;
+  prompt_version: string;
+  temperature: number;
+  max_tokens: number;
+  context_window: number;
+  input_token_limit: number;
+  token_budget: number;
+  timeout_seconds: number;
+  max_retries: number;
+  is_active: boolean;
+  created_at: string;
+  updated_at: string;
+}
+```
+
+### Comparison View Behavior
+
+1. Each `VariantRow` has a checkbox for comparison selection
+2. When 2+ variants are checked, `VariantCompare` renders above the variant list
+3. Comparison table shows metrics side-by-side (columns = variants)
+4. Chart overlays performance history lines for selected variants on shared axes
+5. "Activate" button in comparison view calls the activate endpoint and invalidates queries
+
+## Correctness Properties
+
+### Property 1: Single Active Variant Invariant (Req 1.4, 4.1)
+
+For any sequence of activate/deactivate operations on variants of an agent, at most one variant per agent has `is_active = TRUE` at any point in time.
+
+- **Test approach**: Property-based test generating random sequences of create-variant + activate + deactivate operations, then asserting the DB invariant holds after each operation.
+- **Generator**: Random agent with 1-5 variants, random sequence of 1-20 activate/deactivate calls.
+- **Assertion**: `SELECT COUNT(*) FROM agent_variants WHERE agent_id = $1 AND is_active = TRUE` returns 0 or 1.
+
+### Property 2: Clone Preserves Unoverridden Fields (Req 2.1, 2.3)
+
+For any agent and any subset of override fields, cloning produces a variant where: overridden fields match the override values, and non-overridden fields match the source agent's values.
+
+- **Test approach**: Property-based test generating random agent configs and random subsets of overrides.
+- **Generator**: Random agent config (model_name, temperature, max_tokens, etc.) + random subset of override values.
+- **Assertion**: For each field, if an override was provided the variant has the override value; otherwise it matches the source.
+
+### Property 3: Config Resolution Prefers Active Variant (Req 4.3, 4.4, 9.1-9.3)
+
+For any agent with N variants, `AgentConfigResolver.resolve(slug)` returns the active variant's config when one exists, and the base agent config when none is active.
+
+- **Test approach**: Property-based test. Generate agent + 0-5 variants with 0 or 1 active. Resolve and verify returned config matches the expected source.
+- **Generator**: Random agent config, random variant configs, random active/inactive state.
+- **Assertion**: If an active variant exists, all config fields in the resolved result match the variant. If no active variant, all fields match the base agent.
+
+### Property 4: Variant Performance Metrics Consistency (Req 6.3, 6.5)
+
+For any agent with variants and performance log entries, each additive agent-level metric (invocation count, success count, token totals) is always >= the same metric for any single variant, since the agent-level aggregate includes all variants. Averages and rates are outside this property.
+
+- **Test approach**: Property-based test. Generate performance log entries attributed to different variants of the same agent. Query agent-level and variant-level metrics. Assert agent total >= variant total for each metric.
+- **Generator**: Random agent with 2-4 variants, 5-50 random performance log entries distributed across variants.
+- **Assertion**: `agent.total_invocations >= variant.total_invocations` for each variant. Same for success count, token totals.
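The inequality follows from how the two totals are derived from the same rows. A minimal in-memory sketch, assuming a hypothetical `LogEntry` tuple in place of `agent_performance_log` rows and counting invocations only:

```python
from collections import Counter
from typing import NamedTuple, Optional


class LogEntry(NamedTuple):
    """Hypothetical in-memory stand-in for an agent_performance_log row."""
    agent_id: str
    variant_id: Optional[str]  # None = invocation ran on the base agent config
    success: bool


def invocation_totals(entries: list[LogEntry], agent_id: str) -> tuple[int, Counter]:
    """Return (agent-level total, per-variant totals) for one agent."""
    rows = [e for e in entries if e.agent_id == agent_id]
    per_variant = Counter(e.variant_id for e in rows if e.variant_id is not None)
    return len(rows), per_variant


entries = [
    LogEntry("a1", None, True),
    LogEntry("a1", "v1", True),
    LogEntry("a1", "v1", False),
    LogEntry("a1", "v2", True),
    LogEntry("a2", "v9", True),
]
agent_total, per_variant = invocation_totals(entries, "a1")
```

Rows with a NULL `variant_id` count toward the agent total but toward no variant, so the agent total can exceed the sum of all variant totals.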
+
+### Property 5: Partial Update Idempotence (Req 3.4)
+
+For any variant, applying an update with a subset of fields, then applying the same update again, produces the same variant state (ignoring updated_at). The update operation is idempotent on the data fields.
+
+- **Test approach**: Property-based test. Generate a variant, apply random partial update, read result, apply same update again, read result. Assert all fields except updated_at are identical.
+- **Generator**: Random variant + random subset of updatable fields with random values.
+- **Assertion**: `variant_after_first_update.fields == variant_after_second_update.fields` (excluding updated_at).
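Idempotence holds whenever the update is a plain overwrite of the supplied fields. The sketch below models that on dictionaries; `apply_partial_update` is a hypothetical helper, and `None` is assumed to mean "field not supplied", as in `VariantUpdateBody`.

```python
from typing import Any


def apply_partial_update(variant: dict[str, Any], patch: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the variant with every non-None patch field applied."""
    updated = dict(variant)
    updated.update({k: v for k, v in patch.items() if v is not None})
    return updated


variant = {"variant_name": "fast", "model_name": "model-a", "temperature": 0.2, "max_tokens": 4096}
patch = {"model_name": "model-b", "temperature": None, "max_tokens": 2048}

once = apply_partial_update(variant, patch)
twice = apply_partial_update(once, patch)
```

An update that incremented a counter or appended to a field would break the property, which is why `updated_at` is excluded from the comparison.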
+
+### Property 6: TTL Cache Expiry (Req 9.5)
+
+Within the TTL window, the resolver returns cached config without querying the DB. After TTL expires, the resolver re-queries and reflects any changes.
+
+- **Test approach**: Property-based test. Resolve config, change the active variant in the DB, resolve again within TTL (should get old value), advance time past TTL, resolve again (should get new value).
+- **Generator**: Random agent configs with different variant configs. Random TTL values (1-120s).
+- **Assertion**: Pre-TTL resolve returns original config. Post-TTL resolve returns updated config.
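Advancing time past the TTL is easiest when the clock is injectable. The resolver shown earlier reads `time.monotonic()` directly, so the sketch below is an assumption about how the test could be structured, not a description of the shipped code. `TTLCache` and `FakeClock` are hypothetical helpers.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class TTLCache(Generic[T]):
    """Tiny TTL cache with an injectable clock (hypothetical helper for tests)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, T]] = {}

    def get(self, key: str) -> Optional[T]:
        # An entry is fresh only while its age is strictly below the TTL.
        entry = self._entries.get(key)
        if entry is not None and (self._clock() - entry[0]) < self._ttl:
            return entry[1]
        return None

    def put(self, key: str, value: T) -> None:
        self._entries[key] = (self._clock(), value)


class FakeClock:
    """Manually advanced clock so tests never have to sleep."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


clock = FakeClock()
cache: TTLCache[str] = TTLCache(ttl_seconds=60, clock=clock)
cache.put("document-extractor", "model-a")

clock.now = 59.0
within_ttl = cache.get("document-extractor")  # still fresh

clock.now = 60.0
after_ttl = cache.get("document-extractor")   # expired: caller must re-query
```

The comparison is strict (`< ttl`), mirroring the resolver's check, so an entry exactly `ttl_seconds` old is treated as expired.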
+
+### Property 7: Slug Auto-Generation Determinism (Req 2.4)
+
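+For any variant_name, the auto-generated slug is deterministic (same name → same slug), is a valid kebab-case string (lowercase, alphanumeric + hyphens, no leading/trailing hyphens), and is non-empty for any non-empty name. A name made only of punctuation or whitespace (for example `!!!`) has no characters that survive the kebab-case pattern, so `slugify` must supply a fallback slug for such names.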
+
+- **Test approach**: Property-based test over random variant names.
+- **Generator**: Random non-empty strings with unicode, spaces, special characters.
+- **Assertion**: `slugify(name) == slugify(name)` (deterministic), slug matches `^[a-z0-9]+(-[a-z0-9]+)*$`, slug is non-empty.
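One way to satisfy all three assertions is sketched below. This `slugify` is an assumed implementation for illustration, not the project's helper: it folds accented letters to ASCII, drops everything else that is not `a-z` or `0-9`, and returns a hypothetical fallback of `"variant"` when nothing survives.

```python
import re
import unicodedata


def slugify(name: str, fallback: str = "variant") -> str:
    """Turn a display name into a kebab-case slug (sketch, not the project's helper)."""
    # Decompose accented characters, then drop everything outside ASCII.
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    # Collapse each run of non-alphanumerics into one hyphen and trim the ends.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")
    return slug or fallback


SLUG_PATTERN = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")
samples = ["Llama 3.1 (8B) / low temp", "  Crème brûlée  ", "!!!", "日本語"]
slugs = [slugify(s) for s in samples]
```

The fallback keeps the slug non-empty, but distinct names can then collide on the same slug. The unique index on `(agent_id, variant_slug)` would reject the second insert, so callers may still need a suffixing or retry strategy.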
+
+## File Changes Summary
+
+### New Files
+| File | Purpose |
+|------|---------|
+| `infra/migrations/027_agent_variants.sql` | Migration: agent_variants table + performance log variant_id column |
+| `services/shared/agent_config.py` | AgentConfigResolver with TTL cache |
+
+### Modified Files
+| File | Change |
+|------|--------|
+| `services/api/app.py` | Add variant CRUD, clone, activate/deactivate, performance endpoints + Pydantic models |
+| `services/extractor/client.py` | Accept optional resolved config; pass variant_id to perf logging |
+| `services/extractor/event_classifier.py` | Use resolver for runtime config; pass variant_id to perf logging |
+| `services/recommendation/thesis_llm.py` | Use resolver for runtime config; pass variant_id to perf logging |
+| `frontend/src/pages/Agents.tsx` | Add variant list, comparison, create/edit/clone forms, activate/delete actions |
+| `frontend/src/test/mocks/handlers.ts` | Add MSW handlers for variant endpoints |
+
+### Test Files
+| File | Purpose |
+|------|---------|
+| `tests/test_pbt_agent_variants.py` | Property-based tests for variant invariants, clone, config resolution |
+| `tests/test_agent_variants_api.py` | Example/edge-case tests for API endpoints |
+| `frontend/src/test/pages.test.tsx` | Frontend tests for variant UI components |