- Migration 027: agent_variants table with single-active enforcement, variant_id column on agent_performance_log - API: full CRUD, clone from agent/variant, activate/deactivate, per-variant performance metrics and history endpoints - Services: extractor, event classifier, thesis rewriter all wired to AgentConfigResolver with variant override support - Frontend: variant list, comparison view, create/edit/clone forms, activate/delete actions on Agents page - Tests: API tests + 5 property-based tests (single-active invariant, clone preservation, config resolution, slug determinism, update idempotence) - Spec files for agent-variants feature
15 KiB
Requirements Document
Introduction
Add variant support to the existing AI agents system. Each agent (Document Intelligence Extractor, Global Event Classifier, Thesis Rewriter) can have multiple variants — different model, prompt, and parameter configurations — enabling A/B testing, model comparison, and iterative prompt engineering. Users can clone agents as variants, track per-variant performance, compare variants side-by-side, and swap which variant is the active one running in production for a given agent role.
Glossary
- Agent: A record in the
ai_agentstable representing an AI role (e.g. Document Intelligence Extractor). Each Agent has a purpose, model configuration, and prompts. - Variant: A child configuration of an Agent that inherits the parent Agent's role and purpose but allows independent model, prompt, and parameter overrides. Stored in a new
agent_variantstable. - Active_Variant: The single Variant (or the base Agent configuration) currently designated to execute in production for a given Agent role. Only one Variant per Agent can be active at a time.
- Base_Configuration: The original Agent's model, prompt, and parameter settings before any Variant is created. Serves as the default Active_Variant when no Variant has been promoted.
- Variant_Performance: Per-invocation metrics (success rate, latency, confidence, token usage) attributed to a specific Variant rather than just the parent Agent.
- Agents_Page: The existing React frontend page at
/agentsthat displays agent configurations and performance metrics. - Ollama_Service: The local LLM inference service at
ollama.ollama-service.svc.cluster.local:11434used by all three system agents. - Clone_Operation: The act of creating a new Variant from an existing Agent or Variant, copying all configuration fields while allowing the user to modify them.
Requirements
Requirement 1: Variant Data Model
User Story: As a developer, I want each agent to support multiple variant configurations stored in the database, so that I can experiment with different models and prompts without modifying the base agent.
Acceptance Criteria
- THE Database SHALL store Variant records in an
agent_variantstable with columns: id (UUID), agent_id (FK to ai_agents), variant_name, variant_slug, description, model_provider, model_name, system_prompt, user_prompt_template, prompt_version, temperature, max_tokens, context_window, input_token_limit, token_budget, timeout_seconds, max_retries, is_active (boolean), created_at, and updated_at. - WHEN a Variant is created, THE Database SHALL enforce a foreign key constraint from
agent_variants.agent_idtoai_agents.idwith ON DELETE CASCADE. - THE Database SHALL enforce a unique constraint on the combination of agent_id and variant_slug to prevent duplicate Variant slugs within the same Agent.
- WHEN a Variant has
is_activeset to TRUE, THE Database SHALL ensure that at most one Variant per Agent hasis_active = TRUEby using a partial unique index on (agent_id) WHERE is_active = TRUE. - THE Database SHALL create indexes on agent_id and on (agent_id, is_active) for efficient lookup of Variants by Agent and Active_Variant resolution.
Requirement 2: Clone Agent as Variant
User Story: As a user, I want to clone an existing agent as a variant that inherits the agent's role and purpose but lets me tweak the model, prompt, and parameters, so that I can create experimental configurations quickly.
Acceptance Criteria
- WHEN a user submits a clone request for an Agent, THE API SHALL create a new Variant record that copies the Agent's model_provider, model_name, system_prompt, user_prompt_template, prompt_version, temperature, max_tokens, context_window, input_token_limit, token_budget, timeout_seconds, and max_retries into the new Variant.
- WHEN a user submits a clone request for an existing Variant, THE API SHALL create a new Variant record under the same parent Agent that copies the source Variant's configuration fields.
- WHEN a Variant is created via clone, THE API SHALL allow the user to override any of the copied configuration fields in the same request.
- WHEN a Variant is created, THE API SHALL require a variant_name and auto-generate a variant_slug from the variant_name if one is not provided.
- IF a clone request specifies a variant_slug that already exists for the same Agent, THEN THE API SHALL return a 409 Conflict error with a descriptive message.
- WHEN a Variant is successfully created, THE API SHALL return the complete Variant record including the generated id and timestamps.
Requirement 3: Variant CRUD Operations
User Story: As a user, I want to create, read, update, and delete variants through the API, so that I can manage variant configurations programmatically.
Acceptance Criteria
- WHEN a GET request is made to
/api/agents/{agent_id}/variants, THE API SHALL return a list of all Variant records belonging to the specified Agent, ordered by created_at ascending. - WHEN a GET request is made to
/api/agents/{agent_id}/variants/{variant_id}, THE API SHALL return the full Variant record. - IF a GET request references a non-existent Agent or Variant, THEN THE API SHALL return a 404 Not Found error.
- WHEN a PUT request is made to
/api/agents/{agent_id}/variants/{variant_id}, THE API SHALL update only the fields provided in the request body and set updated_at to the current timestamp. - WHEN a DELETE request is made for a Variant, THE API SHALL remove the Variant record and cascade-delete associated performance log entries.
- IF a DELETE request targets a Variant that is currently the Active_Variant, THEN THE API SHALL return a 400 Bad Request error indicating the user must deactivate or promote a different Variant first.
Requirement 4: Active Variant Swap
User Story: As a user, I want to designate which variant is the active one for a given agent role, so that production inference uses my chosen configuration.
Acceptance Criteria
- WHEN a user sends a POST request to
/api/agents/{agent_id}/variants/{variant_id}/activate, THE API SHALL setis_active = TRUEon the specified Variant and setis_active = FALSEon any previously active Variant for that Agent, within a single database transaction. - WHEN a user sends a POST request to
/api/agents/{agent_id}/variants/deactivate, THE API SHALL setis_active = FALSEon the currently active Variant for that Agent, causing the Agent to fall back to its Base_Configuration. - WHEN the extractor, event classifier, or thesis rewriter service resolves its runtime configuration, THE Service SHALL check for an Active_Variant for its Agent and use the Variant's model_name, system_prompt, temperature, max_tokens, context_window, input_token_limit, token_budget, timeout_seconds, and max_retries instead of the Base_Configuration or environment variable defaults.
- IF no Active_Variant exists for an Agent, THEN THE Service SHALL use the Agent's Base_Configuration from the
ai_agentstable. - WHEN an Active_Variant swap occurs, THE API SHALL return the updated Variant record with the new is_active state.
Requirement 5: Model Swapping
User Story: As a user, I want to configure variants with different Ollama models (e.g. qwen3.5, llama3.1, gemma2), so that I can compare model quality and performance for each agent role.
Acceptance Criteria
- THE Variant record SHALL accept any valid model_name string in the model_name field, enabling the user to specify different Ollama models per Variant.
- WHEN a Variant specifies a model_name, THE Ollama_Service client SHALL use that model_name in the
/api/chatrequest to the Ollama endpoint. - WHEN a user updates a Variant's model_name via the API, THE API SHALL validate that the model_name field is a non-empty string and persist the change.
- THE Agents_Page SHALL display the model_name for each Variant in the variant list, enabling users to see which model each Variant uses at a glance.
Requirement 6: Per-Variant Performance Tracking
User Story: As a user, I want performance metrics (success rate, latency, confidence, token usage) tracked per variant, so that I can evaluate which configuration performs best.
Acceptance Criteria
- THE Database SHALL add a nullable
variant_idcolumn (FK toagent_variants.id, ON DELETE SET NULL) to theagent_performance_logtable. - WHEN a service invocation uses an Active_Variant, THE Service SHALL record the variant_id in the
agent_performance_logentry alongside the existing agent_id. - WHEN a GET request is made to
/api/agents/{agent_id}/variants/{variant_id}/performance, THE API SHALL return aggregated Variant_Performance metrics (total invocations, success count, failure count, average duration, p95 duration, average confidence, average retries, total input tokens, total output tokens, success rate) for the specified Variant within the requested time window. - WHEN a GET request is made to
/api/agents/{agent_id}/variants/{variant_id}/performance/history, THE API SHALL return hourly time-series Variant_Performance data for the specified Variant. - WHEN performance is queried for the base Agent without a variant filter, THE API SHALL continue to return metrics across all invocations for that Agent, including those attributed to Variants.
Requirement 7: Side-by-Side Variant Comparison
User Story: As a user, I want to compare two or more variants side-by-side on the Agents page, so that I can make informed decisions about which variant to activate.
Acceptance Criteria
- WHEN a user selects an Agent on the Agents_Page, THE Agents_Page SHALL display a list of all Variants for that Agent below the Agent detail section, showing variant_name, model_name, is_active status, and creation date for each.
- WHEN a user selects two or more Variants for comparison, THE Agents_Page SHALL display a comparison view showing performance metrics (success rate, average latency, p95 latency, average confidence, total tokens) for each selected Variant in adjacent columns.
- THE Agents_Page SHALL visually highlight the Active_Variant in the variant list with a distinct badge or indicator.
- WHEN a user views the comparison view, THE Agents_Page SHALL display a time-series chart overlaying the performance history of the selected Variants on the same axes for direct visual comparison.
- THE Agents_Page SHALL provide an "Activate" button next to each non-active Variant in the list, allowing the user to promote a Variant to Active_Variant directly from the comparison view.
Requirement 8: Variant UI Management
User Story: As a user, I want to create, edit, clone, and delete variants from the Agents page, so that I can manage variant configurations without leaving the dashboard.
Acceptance Criteria
- WHEN a user clicks "Clone as Variant" on an Agent detail view, THE Agents_Page SHALL open a pre-filled form with the Agent's current configuration, allowing the user to modify fields and submit to create a new Variant.
- WHEN a user clicks "Clone" on an existing Variant, THE Agents_Page SHALL open a pre-filled form with that Variant's configuration for creating a new Variant.
- WHEN a user clicks "Edit" on a Variant, THE Agents_Page SHALL display an edit form pre-populated with the Variant's current configuration, allowing modification and save.
- WHEN a user clicks "Delete" on a non-active Variant, THE Agents_Page SHALL display a confirmation dialog before deleting the Variant.
- IF a user attempts to delete the Active_Variant, THEN THE Agents_Page SHALL display an error message indicating the user must deactivate the Variant first.
- WHEN a Variant is created, edited, activated, or deleted, THE Agents_Page SHALL refresh the variant list and performance data to reflect the change.
Requirement 10: Token Window and Budget Controls
User Story: As a user, I want to configure context window sizes, input token limits, and hourly token budgets per variant, so that I can control resource usage for cloud models while running unlimited for local Ollama.
Acceptance Criteria
- THE Variant record SHALL include a
context_windowinteger field (default 0) that maps to the Ollamanum_ctxparameter. A value of 0 means use the model's default context window. - THE Variant record SHALL include an
input_token_limitinteger field (default 0) that caps how many tokens are sent as input to the model. A value of 0 means no limit (no truncation). - THE Variant record SHALL include a
token_budgetinteger field (default 0) representing the maximum total tokens (input + output) allowed per hour for the variant. A value of 0 means unlimited. - WHEN a service invocation uses an Active_Variant with a non-zero
context_window, THE Ollama_Service client SHALL passnum_ctxin the Ollama API options. - WHEN a service invocation uses an Active_Variant with a non-zero
input_token_limit, THE Service SHALL truncate the input content to approximately that many tokens before sending it to the model. - WHEN a service invocation uses an Active_Variant with a non-zero
token_budgetand the hourly token usage for that variant has reached or exceeded the budget, THE Service SHALL skip the invocation and log a warning. - THE Agents_Page SHALL display context_window, input_token_limit, and token_budget fields in the variant create, edit, and clone forms, with clear labels indicating that 0 means "use default" or "unlimited".
User Story: As a developer, I want the extractor, event classifier, and thesis rewriter services to dynamically resolve their configuration from the database (including active variant overrides), so that variant swaps take effect without restarting services.
Acceptance Criteria
- WHEN the Document Intelligence Extractor service prepares an inference request, THE Service SHALL query the
ai_agentstable (joined withagent_variantsif an Active_Variant exists) by the agent slugdocument-extractorto resolve model_name, system_prompt, temperature, max_tokens, context_window, input_token_limit, token_budget, timeout_seconds, and max_retries. - WHEN the Global Event Classifier service prepares a classification request, THE Service SHALL query the database by the agent slug
event-classifierto resolve runtime configuration, preferring the Active_Variant's values when one exists. - WHEN the Thesis Rewriter service prepares a rewrite request, THE Service SHALL query the database by the agent slug
thesis-rewriterto resolve runtime configuration, preferring the Active_Variant's values when one exists. - IF the database is unreachable during configuration resolution, THEN THE Service SHALL fall back to the environment variable defaults from OllamaConfig and log a warning.
- THE Service SHALL cache resolved configuration with a time-to-live of 60 seconds to avoid querying the database on every invocation, while still reflecting Active_Variant swaps within a reasonable delay.