Files

T

Celes Renata 7c23c044d7 feat: agent variants — migration, API, service integration, frontend, tests

- Migration 027: agent_variants table with single-active enforcement,
  variant_id column on agent_performance_log
- API: full CRUD, clone from agent/variant, activate/deactivate,
  per-variant performance metrics and history endpoints
- Services: extractor, event classifier, thesis rewriter all wired
  to AgentConfigResolver with variant override support
- Frontend: variant list, comparison view, create/edit/clone forms,
  activate/delete actions on Agents page
- Tests: API tests + 5 property-based tests (single-active invariant,
  clone preservation, config resolution, slug determinism, update idempotence)
- Spec files for agent-variants feature

2026-04-17 05:15:42 +00:00

15 KiB

Raw Blame History

Requirements Document

Introduction

Add variant support to the existing AI agents system. Each agent (Document Intelligence Extractor, Global Event Classifier, Thesis Rewriter) can have multiple variants — different model, prompt, and parameter configurations — enabling A/B testing, model comparison, and iterative prompt engineering. Users can clone agents as variants, track per-variant performance, compare variants side-by-side, and swap which variant is the active one running in production for a given agent role.

Glossary

Agent: A record in the ai_agents table representing an AI role (e.g. Document Intelligence Extractor). Each Agent has a purpose, model configuration, and prompts.
Variant: A child configuration of an Agent that inherits the parent Agent's role and purpose but allows independent model, prompt, and parameter overrides. Stored in a new agent_variants table.
Active_Variant: The single Variant (or the base Agent configuration) currently designated to execute in production for a given Agent role. Only one Variant per Agent can be active at a time.
Base_Configuration: The original Agent's model, prompt, and parameter settings before any Variant is created. Serves as the default Active_Variant when no Variant has been promoted.
Variant_Performance: Per-invocation metrics (success rate, latency, confidence, token usage) attributed to a specific Variant rather than just the parent Agent.
Agents_Page: The existing React frontend page at /agents that displays agent configurations and performance metrics.
Ollama_Service: The local LLM inference service at ollama.ollama-service.svc.cluster.local:11434 used by all three system agents.
Clone_Operation: The act of creating a new Variant from an existing Agent or Variant, copying all configuration fields while allowing the user to modify them.

Requirements

Requirement 1: Variant Data Model

User Story: As a developer, I want each agent to support multiple variant configurations stored in the database, so that I can experiment with different models and prompts without modifying the base agent.

Acceptance Criteria

THE Database SHALL store Variant records in an agent_variants table with columns: id (UUID), agent_id (FK to ai_agents), variant_name, variant_slug, description, model_provider, model_name, system_prompt, user_prompt_template, prompt_version, temperature, max_tokens, context_window, input_token_limit, token_budget, timeout_seconds, max_retries, is_active (boolean), created_at, and updated_at.
WHEN a Variant is created, THE Database SHALL enforce a foreign key constraint from agent_variants.agent_id to ai_agents.id with ON DELETE CASCADE.
THE Database SHALL enforce a unique constraint on the combination of agent_id and variant_slug to prevent duplicate Variant slugs within the same Agent.
WHEN a Variant has is_active set to TRUE, THE Database SHALL ensure that at most one Variant per Agent has is_active = TRUE by using a partial unique index on (agent_id) WHERE is_active = TRUE.
THE Database SHALL create indexes on agent_id and on (agent_id, is_active) for efficient lookup of Variants by Agent and Active_Variant resolution.

Requirement 2: Clone Agent as Variant

User Story: As a user, I want to clone an existing agent as a variant that inherits the agent's role and purpose but lets me tweak the model, prompt, and parameters, so that I can create experimental configurations quickly.

Acceptance Criteria

WHEN a user submits a clone request for an Agent, THE API SHALL create a new Variant record that copies the Agent's model_provider, model_name, system_prompt, user_prompt_template, prompt_version, temperature, max_tokens, context_window, input_token_limit, token_budget, timeout_seconds, and max_retries into the new Variant.
WHEN a user submits a clone request for an existing Variant, THE API SHALL create a new Variant record under the same parent Agent that copies the source Variant's configuration fields.
WHEN a Variant is created via clone, THE API SHALL allow the user to override any of the copied configuration fields in the same request.
WHEN a Variant is created, THE API SHALL require a variant_name and auto-generate a variant_slug from the variant_name if one is not provided.
IF a clone request specifies a variant_slug that already exists for the same Agent, THEN THE API SHALL return a 409 Conflict error with a descriptive message.
WHEN a Variant is successfully created, THE API SHALL return the complete Variant record including the generated id and timestamps.

Requirement 3: Variant CRUD Operations

User Story: As a user, I want to create, read, update, and delete variants through the API, so that I can manage variant configurations programmatically.

Acceptance Criteria

WHEN a GET request is made to /api/agents/{agent_id}/variants, THE API SHALL return a list of all Variant records belonging to the specified Agent, ordered by created_at ascending.
WHEN a GET request is made to /api/agents/{agent_id}/variants/{variant_id}, THE API SHALL return the full Variant record.
IF a GET request references a non-existent Agent or Variant, THEN THE API SHALL return a 404 Not Found error.
WHEN a PUT request is made to /api/agents/{agent_id}/variants/{variant_id}, THE API SHALL update only the fields provided in the request body and set updated_at to the current timestamp.
WHEN a DELETE request is made for a Variant, THE API SHALL remove the Variant record and cascade-delete associated performance log entries.
IF a DELETE request targets a Variant that is currently the Active_Variant, THEN THE API SHALL return a 400 Bad Request error indicating the user must deactivate or promote a different Variant first.

Requirement 4: Active Variant Swap

User Story: As a user, I want to designate which variant is the active one for a given agent role, so that production inference uses my chosen configuration.

Acceptance Criteria

WHEN a user sends a POST request to /api/agents/{agent_id}/variants/{variant_id}/activate, THE API SHALL set is_active = TRUE on the specified Variant and set is_active = FALSE on any previously active Variant for that Agent, within a single database transaction.
WHEN a user sends a POST request to /api/agents/{agent_id}/variants/deactivate, THE API SHALL set is_active = FALSE on the currently active Variant for that Agent, causing the Agent to fall back to its Base_Configuration.
WHEN the extractor, event classifier, or thesis rewriter service resolves its runtime configuration, THE Service SHALL check for an Active_Variant for its Agent and use the Variant's model_name, system_prompt, temperature, max_tokens, context_window, input_token_limit, token_budget, timeout_seconds, and max_retries instead of the Base_Configuration or environment variable defaults.
IF no Active_Variant exists for an Agent, THEN THE Service SHALL use the Agent's Base_Configuration from the ai_agents table.
WHEN an Active_Variant swap occurs, THE API SHALL return the updated Variant record with the new is_active state.

Requirement 5: Model Swapping

User Story: As a user, I want to configure variants with different Ollama models (e.g. qwen3.5, llama3.1, gemma2), so that I can compare model quality and performance for each agent role.

Acceptance Criteria

THE Variant record SHALL accept any valid model_name string in the model_name field, enabling the user to specify different Ollama models per Variant.
WHEN a Variant specifies a model_name, THE Ollama_Service client SHALL use that model_name in the /api/chat request to the Ollama endpoint.
WHEN a user updates a Variant's model_name via the API, THE API SHALL validate that the model_name field is a non-empty string and persist the change.
THE Agents_Page SHALL display the model_name for each Variant in the variant list, enabling users to see which model each Variant uses at a glance.

Requirement 6: Per-Variant Performance Tracking

User Story: As a user, I want performance metrics (success rate, latency, confidence, token usage) tracked per variant, so that I can evaluate which configuration performs best.

Acceptance Criteria

THE Database SHALL add a nullable variant_id column (FK to agent_variants.id, ON DELETE SET NULL) to the agent_performance_log table.
WHEN a service invocation uses an Active_Variant, THE Service SHALL record the variant_id in the agent_performance_log entry alongside the existing agent_id.
WHEN a GET request is made to /api/agents/{agent_id}/variants/{variant_id}/performance, THE API SHALL return aggregated Variant_Performance metrics (total invocations, success count, failure count, average duration, p95 duration, average confidence, average retries, total input tokens, total output tokens, success rate) for the specified Variant within the requested time window.
WHEN a GET request is made to /api/agents/{agent_id}/variants/{variant_id}/performance/history, THE API SHALL return hourly time-series Variant_Performance data for the specified Variant.
WHEN performance is queried for the base Agent without a variant filter, THE API SHALL continue to return metrics across all invocations for that Agent, including those attributed to Variants.

Requirement 7: Side-by-Side Variant Comparison

User Story: As a user, I want to compare two or more variants side-by-side on the Agents page, so that I can make informed decisions about which variant to activate.

Acceptance Criteria

WHEN a user selects an Agent on the Agents_Page, THE Agents_Page SHALL display a list of all Variants for that Agent below the Agent detail section, showing variant_name, model_name, is_active status, and creation date for each.
WHEN a user selects two or more Variants for comparison, THE Agents_Page SHALL display a comparison view showing performance metrics (success rate, average latency, p95 latency, average confidence, total tokens) for each selected Variant in adjacent columns.
THE Agents_Page SHALL visually highlight the Active_Variant in the variant list with a distinct badge or indicator.
WHEN a user views the comparison view, THE Agents_Page SHALL display a time-series chart overlaying the performance history of the selected Variants on the same axes for direct visual comparison.
THE Agents_Page SHALL provide an "Activate" button next to each non-active Variant in the list, allowing the user to promote a Variant to Active_Variant directly from the comparison view.

Requirement 8: Variant UI Management

User Story: As a user, I want to create, edit, clone, and delete variants from the Agents page, so that I can manage variant configurations without leaving the dashboard.

Acceptance Criteria

WHEN a user clicks "Clone as Variant" on an Agent detail view, THE Agents_Page SHALL open a pre-filled form with the Agent's current configuration, allowing the user to modify fields and submit to create a new Variant.
WHEN a user clicks "Clone" on an existing Variant, THE Agents_Page SHALL open a pre-filled form with that Variant's configuration for creating a new Variant.
WHEN a user clicks "Edit" on a Variant, THE Agents_Page SHALL display an edit form pre-populated with the Variant's current configuration, allowing modification and save.
WHEN a user clicks "Delete" on a non-active Variant, THE Agents_Page SHALL display a confirmation dialog before deleting the Variant.
IF a user attempts to delete the Active_Variant, THEN THE Agents_Page SHALL display an error message indicating the user must deactivate the Variant first.
WHEN a Variant is created, edited, activated, or deleted, THE Agents_Page SHALL refresh the variant list and performance data to reflect the change.

Requirement 10: Token Window and Budget Controls

User Story: As a user, I want to configure context window sizes, input token limits, and hourly token budgets per variant, so that I can control resource usage for cloud models while running unlimited for local Ollama.

Acceptance Criteria

THE Variant record SHALL include a context_window integer field (default 0) that maps to the Ollama num_ctx parameter. A value of 0 means use the model's default context window.
THE Variant record SHALL include an input_token_limit integer field (default 0) that caps how many tokens are sent as input to the model. A value of 0 means no limit (no truncation).
THE Variant record SHALL include a token_budget integer field (default 0) representing the maximum total tokens (input + output) allowed per hour for the variant. A value of 0 means unlimited.
WHEN a service invocation uses an Active_Variant with a non-zero context_window, THE Ollama_Service client SHALL pass num_ctx in the Ollama API options.
WHEN a service invocation uses an Active_Variant with a non-zero input_token_limit, THE Service SHALL truncate the input content to approximately that many tokens before sending it to the model.
WHEN a service invocation uses an Active_Variant with a non-zero token_budget and the hourly token usage for that variant has reached or exceeded the budget, THE Service SHALL skip the invocation and log a warning.
THE Agents_Page SHALL display context_window, input_token_limit, and token_budget fields in the variant create, edit, and clone forms, with clear labels indicating that 0 means "use default" or "unlimited".

User Story: As a developer, I want the extractor, event classifier, and thesis rewriter services to dynamically resolve their configuration from the database (including active variant overrides), so that variant swaps take effect without restarting services.

Acceptance Criteria

WHEN the Document Intelligence Extractor service prepares an inference request, THE Service SHALL query the ai_agents table (joined with agent_variants if an Active_Variant exists) by the agent slug document-extractor to resolve model_name, system_prompt, temperature, max_tokens, context_window, input_token_limit, token_budget, timeout_seconds, and max_retries.
WHEN the Global Event Classifier service prepares a classification request, THE Service SHALL query the database by the agent slug event-classifier to resolve runtime configuration, preferring the Active_Variant's values when one exists.
WHEN the Thesis Rewriter service prepares a rewrite request, THE Service SHALL query the database by the agent slug thesis-rewriter to resolve runtime configuration, preferring the Active_Variant's values when one exists.
IF the database is unreachable during configuration resolution, THEN THE Service SHALL fall back to the environment variable defaults from OllamaConfig and log a warning.
THE Service SHALL cache resolved configuration with a time-to-live of 60 seconds to avoid querying the database on every invocation, while still reflecting Active_Variant swaps within a reasonable delay.

15 KiB Raw Blame History

Requirements Document

Introduction

Glossary

Requirements

Requirement 1: Variant Data Model

Acceptance Criteria

Requirement 2: Clone Agent as Variant

Acceptance Criteria

Requirement 3: Variant CRUD Operations

Acceptance Criteria

Requirement 4: Active Variant Swap

Acceptance Criteria

Requirement 5: Model Swapping

Acceptance Criteria

Requirement 6: Per-Variant Performance Tracking

Acceptance Criteria

Requirement 7: Side-by-Side Variant Comparison

Acceptance Criteria

Requirement 8: Variant UI Management

Acceptance Criteria

Requirement 10: Token Window and Budget Controls

Acceptance Criteria

Acceptance Criteria

15 KiB

Raw Blame History