stonks-oracle/.kiro/specs/comprehensive-quality-docs/requirements.md

# Requirements Document

## Introduction

This initiative covers three pillars for the Stonks Oracle platform:

1. **Test coverage:** close unit test coverage gaps across all 13 services, fix pre-existing test failures, and ensure every feature has automated tests.
2. **Docker Compose deployment:** update the Docker Compose deployment to include all application services so users can run the full platform without Kubernetes.
3. **Documentation:** produce comprehensive documentation covering every feature, all API endpoints, Helm chart configuration, Docker deployment options, and three Mermaid architecture diagrams (Kubernetes deployment, Docker Compose deployment, and data pipeline), and update the README to link to all resources.

## Glossary

- **Test_Suite**: The collection of pytest unit tests, property-based tests, and integration tests in the `tests/` directory
- **Docker_Compose_Stack**: The `docker-compose.yml` file and associated Dockerfiles that define the local development environment
- **Helm_Chart**: The Kubernetes deployment configuration at `infra/helm/stonks-oracle/` including `values.yaml`, value overrides, and templates
- **Query_API**: The FastAPI REST service at `services/api/app.py` serving analytics and dashboard queries
- **Symbol_Registry_API**: The FastAPI REST service at `services/symbol_registry/app.py` managing companies, watchlists, sources, exposure profiles, and competitor relationships
- **Trading_API**: The FastAPI REST service at `services/trading/app.py` controlling the autonomous trading engine
- **Risk_API**: The FastAPI REST service at `services/risk/app.py` evaluating order risk and managing approval workflows
- **Scheduler_Service**: The service at `services/scheduler/` that triggers ingestion cycles on a cadence
- **Ingestion_Service**: The queue worker at `services/ingestion/` that fetches market data, news, filings, and macro events
- **Extractor_Service**: The queue worker at `services/extractor/` that performs LLM-based intelligence extraction and event classification
- **Documentation_Set**: The collection of Markdown files in `docs/` that describe features, APIs, deployment, and architecture
- **Architecture_Diagram**: A Mermaid-syntax diagram showing services, data stores, external integrations, and data flow. Three diagrams are produced: Kubernetes deployment, Docker Compose deployment, and data pipeline
- **README**: The root `README.md` file serving as the project entry point

## Requirements

### Requirement 1: Scheduler Service Unit Tests

**User Story:** As a developer, I want the scheduler service to have dedicated unit tests, so that scheduling logic, cadence management, and source polling behavior are verified independently of integration tests.

#### Acceptance Criteria

1. WHEN the Test_Suite is executed for the scheduler module, THE Test_Suite SHALL include unit tests covering job enqueue logic, polling interval calculation, and source due-date evaluation
2. WHEN a scheduler unit test is run, THE Test_Suite SHALL mock all external dependencies (PostgreSQL, Redis) and test scheduling logic in isolation
3. THE Test_Suite SHALL verify that the scheduler correctly enqueues ingestion jobs for sources whose polling interval has elapsed
4. IF a database or Redis connection fails during scheduling, THEN THE Test_Suite SHALL verify that the Scheduler_Service handles the error without crashing
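
Criteria 2 to 4 could be exercised along the lines of the sketch below. It assumes two hypothetical helpers, `is_due` and `enqueue_due_sources`, and a queue object with an `enqueue` method; none of these names come from the actual `services/scheduler/` code. The Redis-backed queue is replaced by `unittest.mock.Mock`.

```python
from datetime import datetime, timedelta, timezone
from unittest.mock import Mock


def is_due(last_polled_at, interval_seconds, now):
    """Return True when a source was never polled or its interval has elapsed."""
    if last_polled_at is None:
        return True
    return now - last_polled_at >= timedelta(seconds=interval_seconds)


def enqueue_due_sources(sources, queue, now):
    """Enqueue one job per due source and return how many were enqueued.

    A failing queue is treated as recoverable: the cycle stops early
    instead of letting the exception crash the scheduler loop.
    """
    enqueued = 0
    for source in sources:
        if not is_due(source["last_polled_at"], source["interval_seconds"], now):
            continue
        try:
            queue.enqueue({"source_id": source["id"]})
        except ConnectionError:
            break
        enqueued += 1
    return enqueued


NOW = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)


def test_enqueues_only_due_sources():
    sources = [
        {"id": "a", "interval_seconds": 300, "last_polled_at": NOW - timedelta(seconds=600)},
        {"id": "b", "interval_seconds": 300, "last_polled_at": NOW - timedelta(seconds=60)},
        {"id": "c", "interval_seconds": 300, "last_polled_at": None},
    ]
    queue = Mock()
    assert enqueue_due_sources(sources, queue, NOW) == 2
    enqueued_ids = [call.args[0]["source_id"] for call in queue.enqueue.call_args_list]
    assert enqueued_ids == ["a", "c"]


def test_queue_failure_does_not_crash():
    sources = [{"id": "a", "interval_seconds": 300, "last_polled_at": None}]
    queue = Mock()
    queue.enqueue.side_effect = ConnectionError("redis down")
    assert enqueue_due_sources(sources, queue, NOW) == 0
```

Passing `now` in as an argument keeps the due-date evaluation deterministic, so the tests need neither a real clock nor a real Redis instance.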

### Requirement 2: Ingestion Service Unit Tests

**User Story:** As a developer, I want the ingestion service to have unit tests for adapter error handling and retry logic, so that data fetching resilience is verified beyond integration tests.

#### Acceptance Criteria

1. WHEN the Test_Suite is executed for the ingestion module, THE Test_Suite SHALL include unit tests covering adapter error handling, retry logic, and deduplication behavior
2. WHEN an external API returns an error response, THE Test_Suite SHALL verify that the Ingestion_Service retries according to the configured backoff policy
3. WHEN a duplicate content hash is detected, THE Test_Suite SHALL verify that the Ingestion_Service skips re-processing the document
4. IF all retry attempts are exhausted, THEN THE Test_Suite SHALL verify that the Ingestion_Service routes the failed job to the dead-letter queue
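
The behaviors in criteria 2 to 4 can be illustrated with a minimal sketch. Everything named here (`FetchError`, `backoff_delays`, `process_job`, the exponential policy, the list used as a dead-letter queue) is an assumption made for illustration and is not taken from `services/ingestion/`.

```python
import hashlib


class FetchError(Exception):
    """Raised by the hypothetical adapter when an upstream API call fails."""


def backoff_delays(max_attempts, base_seconds=1.0, factor=2.0):
    """Delays to wait after each failed attempt except the last one."""
    return [base_seconds * factor**i for i in range(max_attempts - 1)]


def process_job(job, fetch, seen_hashes, dead_letter, max_attempts=3, sleep=lambda s: None):
    """Fetch with retries, skip duplicate content, dead-letter on exhaustion."""
    delays = backoff_delays(max_attempts)
    for attempt in range(max_attempts):
        try:
            body = fetch(job)
        except FetchError:
            if attempt < len(delays):
                sleep(delays[attempt])  # back off before the next attempt
            continue
        digest = hashlib.sha256(body).hexdigest()
        if digest in seen_hashes:
            return "skipped_duplicate"
        seen_hashes.add(digest)
        return "processed"
    dead_letter.append(job)  # every attempt failed
    return "dead_lettered"
```

Injecting `sleep` lets a unit test record the requested delays instead of waiting for them, which is one way to verify the backoff policy quickly.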

### Requirement 3: Extractor Test Failure Fixes

**User Story:** As a developer, I want the pre-existing test failures in the extractor module to be resolved, so that the full test suite passes cleanly in CI.

#### Acceptance Criteria

1. WHEN the Test_Suite is executed, THE Test_Suite SHALL pass all tests in `test_extractor_prompts.py` without failures
2. WHEN the Test_Suite is executed, THE Test_Suite SHALL pass all tests in `test_extractor_schemas.py` without failures
3. WHEN the Test_Suite is executed, THE Test_Suite SHALL pass all tests in `test_ollama_client.py` without failures
4. WHEN the Test_Suite is executed, THE Test_Suite SHALL pass all tests in `test_filings_adapter.py` without failures
5. WHEN a failing test is fixed, THE Test_Suite SHALL preserve the original test intent and assertions, and the fix SHALL modify only the code under test or the test setup

### Requirement 4: Full Test Suite Green Status

**User Story:** As a developer, I want the entire test suite to pass, so that CI builds succeed and regressions are caught immediately.

#### Acceptance Criteria

1. WHEN `pytest tests/ -x --tb=short -q` is executed, THE Test_Suite SHALL report zero failures across all test files
2. WHEN `ruff check services/` is executed, THE command SHALL report zero lint violations
3. THE Test_Suite SHALL maintain all existing property-based tests (files prefixed `test_pbt_*`) in a passing state
4. IF a test fix requires modifying production code, THEN THE Test_Suite SHALL include a regression test that validates the fix

### Requirement 5: Docker Compose Application Services

**User Story:** As a developer using Docker instead of Kubernetes, I want `docker-compose.yml` to include all 13 application services and the frontend, so that I can run the full platform locally with a single `docker compose up`.

#### Acceptance Criteria

1. THE Docker_Compose_Stack SHALL define service containers for all 13 application services: scheduler, symbol-registry, ingestion, parser, extractor, aggregation, recommendation, trading-engine, risk-engine, broker-adapter, lake-publisher, query-api, and dashboard
2. THE Docker_Compose_Stack SHALL define a frontend container serving the React dashboard via nginx on port 8080
3. WHEN `docker compose up` is executed, THE Docker_Compose_Stack SHALL start all infrastructure services (PostgreSQL, Redis, MinIO, Ollama, Trino, Hive Metastore, Superset) before application services using dependency ordering
4. WHEN an application service container starts, THE Docker_Compose_Stack SHALL provide health checks that verify the service is ready to accept requests
5. THE Docker_Compose_Stack SHALL configure environment variables for each service matching the defaults documented in `docs/LOCAL_DEV_SETUP.md`, with infrastructure hostnames pointing to Docker Compose service names
6. THE Docker_Compose_Stack SHALL allow users to provide API keys (`MARKET_DATA_API_KEY`, `BROKER_API_KEY`, `BROKER_API_SECRET`) via a `.env` file without modifying `docker-compose.yml`
7. IF an infrastructure dependency (PostgreSQL, Redis) is not yet healthy, THEN THE Docker_Compose_Stack SHALL delay application service startup using `depends_on` with `condition: service_healthy`
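
Criterion 7 lends itself to an automated check. The sketch below inspects an already-parsed compose file (a plain dict) and reports services that do not gate on healthy infrastructure. The helper name and the service names `postgres` and `redis` are assumptions; the real compose file may name them differently, and not every application service necessarily needs both.

```python
def services_missing_health_gate(compose, infra_required=("postgres", "redis")):
    """Return {service: [ungated dependencies]} for a parsed compose dict.

    Every service other than the listed infrastructure is expected to
    declare `depends_on: {<infra>: {condition: service_healthy}}`.
    """
    missing = {}
    for name, spec in compose.get("services", {}).items():
        if name in infra_required:
            continue
        depends_on = spec.get("depends_on", {})
        # The short list form (`depends_on: [postgres]`) carries no condition.
        if not isinstance(depends_on, dict):
            depends_on = {}
        ungated = [
            dep
            for dep in infra_required
            if (depends_on.get(dep) or {}).get("condition") != "service_healthy"
        ]
        if ungated:
            missing[name] = ungated
    return missing
```

An empty result means every application service waits for the listed dependencies to report healthy before it starts.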

### Requirement 6: Service Feature Documentation

**User Story:** As a user or contributor, I want every service documented with its purpose, configuration, queue interactions, and database tables, so that I can understand how each part of the platform works.

#### Acceptance Criteria

1. THE Documentation_Set SHALL include a dedicated document for each of the 13 services describing its purpose, inputs, outputs, configuration environment variables, and database tables used
2. WHEN a service consumes from or publishes to a Redis queue, THE Documentation_Set SHALL document the queue name, message schema, and processing behavior
3. WHEN a service exposes HTTP endpoints, THE Documentation_Set SHALL reference the API documentation for that service
4. THE Documentation_Set SHALL describe the three signal layers (company, macro, competitive) with their data flow, toggle mechanisms, and weight configurations
5. THE Documentation_Set SHALL document the trading engine features including position sizing, circuit breakers, reserve pool management, risk tier auto-adjustment, backtesting, and notification configuration

### Requirement 7: API Reference Documentation

**User Story:** As a developer integrating with Stonks Oracle, I want a complete API reference for all four FastAPI services, so that I know every endpoint, its parameters, request/response schemas, and error codes.

#### Acceptance Criteria

1. THE Documentation_Set SHALL include an API reference document covering all endpoints of the Query_API, including path, method, query parameters, response schema, and error codes
2. THE Documentation_Set SHALL include an API reference document covering all endpoints of the Symbol_Registry_API, including CRUD operations for companies, aliases, watchlists, sources, exposure profiles, and competitor relationships
3. THE Documentation_Set SHALL include an API reference document covering all endpoints of the Trading_API, including engine control, decision audit, performance metrics, backtesting, notifications, and manual override orders
4. THE Documentation_Set SHALL include an API reference document covering all endpoints of the Risk_API, including order evaluation, approval workflow, and approval expiration
5. WHEN an endpoint accepts query parameters or a request body, THE Documentation_Set SHALL document each parameter with its type, default value, and constraints
6. WHEN an endpoint returns an error, THE Documentation_Set SHALL document the HTTP status code and error response format
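
FastAPI applications expose their generated OpenAPI schema through `app.openapi()`, so the parameter tables required by criterion 5 could plausibly be derived from it rather than written by hand. The sketch below walks a schema dict of that shape; the `parameter_rows` helper is hypothetical and covers query and path parameters only, not request bodies.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options"}
CONSTRAINT_KEYS = ("minimum", "maximum", "minLength", "maxLength", "enum")


def parameter_rows(openapi):
    """Collect one row per documented parameter from an OpenAPI 3 schema dict."""
    rows = []
    for path, path_item in openapi.get("paths", {}).items():
        for method, operation in path_item.items():
            if method not in HTTP_METHODS:
                continue  # skip non-operation keys such as "summary"
            for param in operation.get("parameters", []):
                schema = param.get("schema", {})
                rows.append({
                    "endpoint": f"{method.upper()} {path}",
                    "name": param["name"],
                    "in": param["in"],
                    "type": schema.get("type"),
                    "default": schema.get("default"),
                    "constraints": {k: schema[k] for k in CONSTRAINT_KEYS if k in schema},
                })
    return rows
```

Each row carries the type, default value, and constraints that criterion 5 asks for; descriptions and error formats would still need to be written or reviewed by hand.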

### Requirement 8: Helm Chart Configuration Reference

**User Story:** As an operator deploying Stonks Oracle on Kubernetes, I want a complete reference for all Helm chart values, so that I can configure services, resources, secrets, ingress, network policies, and analytics components.

#### Acceptance Criteria

1. THE Documentation_Set SHALL include a Helm configuration reference documenting every key in `values.yaml` with its type, default value, and description
2. THE Documentation_Set SHALL document the `services` block structure including replicas, image, command, tier, port, secrets, resources, and probes for each service
3. THE Documentation_Set SHALL document the `config` block with all ConfigMap environment variables, their defaults, and what they control
4. THE Documentation_Set SHALL document the `secrets` block structure (core, broker, market, gmail, dashboard) and how secrets are injected via `--set` flags during deployment
5. THE Documentation_Set SHALL document the `ingress` block including className, clusterIssuer, and host mappings
6. THE Documentation_Set SHALL document the analytics stack toggles (trino.enabled, hiveMetastore.enabled, superset.enabled) and their resource configurations
7. THE Documentation_Set SHALL document the `pipelineEnabled` toggle and its effect on worker service replicas
8. THE Documentation_Set SHALL document the `networkPolicies.enabled` toggle and the default-deny-ingress behavior
9. THE Documentation_Set SHALL document the value override files (`values-beta.yaml`, `values-paper.yaml`) and their intended deployment stages
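
Criterion 1 asks for every key with its type and default, which could be seeded mechanically from the parsed `values.yaml`. The sketch below assumes the file has already been loaded into a dict; `flatten_values` and `to_markdown_table` are hypothetical helpers, and descriptions would still be added by hand.

```python
def flatten_values(values, prefix=""):
    """Yield (dotted_key, type_name, default) for every leaf in a values dict."""
    for key, value in values.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            yield from flatten_values(value, path)
        else:
            yield path, type(value).__name__, value


def to_markdown_table(rows):
    """Render flattened rows as a Markdown table without a description column."""
    lines = ["| Key | Type | Default |", "| --- | --- | --- |"]
    for key, type_name, default in rows:
        lines.append(f"| `{key}` | {type_name} | `{default!r}` |")
    return "\n".join(lines)
```

Running the same flattening over `values-beta.yaml` and `values-paper.yaml` and comparing the results against the base file would show which keys each override changes.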

### Requirement 9: Docker Deployment Guide

**User Story:** As a developer deploying with Docker Compose, I want a guide explaining all Docker deployment options, environment variables, volume mounts, and operational commands, so that I can run and manage the platform without Kubernetes.

#### Acceptance Criteria

1. THE Documentation_Set SHALL include a Docker deployment guide documenting every service defined in `docker-compose.yml` with its image, ports, volumes, and environment variables
2. THE Documentation_Set SHALL document the `.env` file format with all required and optional environment variables, their defaults, and descriptions
3. THE Documentation_Set SHALL document volume mounts and data persistence behavior, including how to reset data with `docker compose down -v`
4. THE Documentation_Set SHALL document health check configurations and how to verify all services are running
5. THE Documentation_Set SHALL document the Dockerfile build arguments (`SERVICE_CMD`) and how to build custom service images
6. THE Documentation_Set SHALL document operational commands for starting, stopping, restarting individual services, viewing logs, and scaling replicas
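
The `.env` format in criterion 2 can be illustrated with a small parser that also reports missing keys. This is a simplified sketch: it handles plain `KEY=VALUE` lines, comments, and surrounding quotes only, and treating the three API keys as required is an assumption made for the example.

```python
REQUIRED_KEYS = ("MARKET_DATA_API_KEY", "BROKER_API_KEY", "BROKER_API_SECRET")


def parse_env(text):
    """Parse KEY=VALUE lines, ignoring blank lines and `#` comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def missing_keys(env, required=REQUIRED_KEYS):
    """Return required keys that are absent or empty."""
    return [key for key in required if not env.get(key)]
```

A check like this could run before `docker compose up` to catch an incomplete `.env` file early.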

### Requirement 10: Kubernetes Architecture Diagram

**User Story:** As an operator deploying on Kubernetes, I want a Mermaid diagram showing how Stonks Oracle runs in a K8s cluster, so that I can understand the deployment topology, networking, and infrastructure dependencies.

#### Acceptance Criteria

1. THE Documentation_Set SHALL include a Mermaid diagram showing all 13 application services deployed as Kubernetes Deployments within the `stonks-oracle` namespace
2. THE diagram SHALL show external cluster services (PostgreSQL, Redis, MinIO, Ollama) in their respective namespaces with cross-namespace service references
3. THE diagram SHALL show Traefik ingress routes mapping external domains to internal services (stonks.celestium.life → dashboard, stonks-api.celestium.life → query-api, etc.)
4. THE diagram SHALL show network policy boundaries indicating which services can communicate with each other
5. THE diagram SHALL show the analytics plane (Trino, Hive Metastore, Superset) deployed within the stonks-oracle namespace and their connections to MinIO
6. THE diagram SHALL show Helm-managed secrets (core, broker, market, gmail, dashboard) and which services consume them
7. THE diagram SHALL distinguish between API-tier services (with ingress), pipeline-tier workers (queue-driven), and trading-tier services

### Requirement 11: Docker Compose Architecture Diagram

**User Story:** As a developer running the platform locally with Docker Compose, I want a Mermaid diagram showing how all containers are wired together, so that I can understand port mappings, volume mounts, and service dependencies.

#### Acceptance Criteria

1. THE Documentation_Set SHALL include a Mermaid diagram showing all infrastructure containers (PostgreSQL, Redis, MinIO, Ollama, Trino, Hive Metastore, Superset) and all 13 application service containers as defined in `docker-compose.yml`
2. THE diagram SHALL show host port mappings for externally accessible services (PostgreSQL:5432, Redis:6379, MinIO:9000/9001, Ollama:11434, Trino:8080, Superset:8088, Dashboard:8080, Query API:8000)
3. THE diagram SHALL show Docker Compose `depends_on` relationships and health check dependencies between infrastructure and application services
4. THE diagram SHALL show named volumes (pgdata, miniodata, ollama_models, hive_data, superset_data) and which containers mount them
5. THE diagram SHALL show the `.env` file providing API keys (`MARKET_DATA_API_KEY`, `BROKER_API_KEY`, `BROKER_API_SECRET`) to relevant service containers
6. THE diagram SHALL show internal Docker network connectivity between containers using Docker Compose service names as hostnames
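
Host ports must be unique across containers, so the mappings listed in criterion 2 are worth checking before they are drawn. The sketch below uses a hypothetical `host_port_conflicts` helper over the ports exactly as listed above; it flags that Trino and the dashboard both claim host port 8080, which would need to be resolved in the compose file.

```python
from collections import defaultdict


def host_port_conflicts(port_map):
    """Return {host_port: [services]} for ports claimed by more than one service."""
    claimed = defaultdict(list)
    for service, ports in port_map.items():
        for port in ports:
            claimed[port].append(service)
    return {port: names for port, names in claimed.items() if len(names) > 1}


# Host ports as listed in criterion 2.
PORTS = {
    "PostgreSQL": [5432],
    "Redis": [6379],
    "MinIO": [9000, 9001],
    "Ollama": [11434],
    "Trino": [8080],
    "Superset": [8088],
    "Dashboard": [8080],
    "Query API": [8000],
}

print(host_port_conflicts(PORTS))  # {8080: ['Trino', 'Dashboard']}
```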

### Requirement 12: Data Pipeline Architecture Diagram

**User Story:** As a user or contributor, I want a Mermaid diagram showing the end-to-end data pipeline from external data sources through signal processing to trade execution, so that I can understand how data flows through the system.

#### Acceptance Criteria

1. THE Documentation_Set SHALL include a Mermaid diagram showing the complete data pipeline from external sources (Polygon.io, news APIs, SEC filings, macro news sources) through ingestion, parsing, extraction, aggregation, recommendation, risk evaluation, and trade execution
2. THE diagram SHALL show the Redis queue topology connecting pipeline stages (ingestion → parsing → extraction → aggregation → recommendation → broker) with queue names
3. THE diagram SHALL show the three signal layers (company, macro, competitive) as distinct processing paths that merge in the aggregation stage
4. THE diagram SHALL show data stores at each stage: MinIO for raw artifacts, PostgreSQL for structured data, Redis for queues and caching
5. THE diagram SHALL show the trading engine decision loop: recommendation polling → position sizing → risk evaluation → order execution → broker submission → fill tracking
6. THE diagram SHALL show the analytical branch: lake publisher writing Parquet fact tables to MinIO, queryable via Trino, visualized in Superset and the React dashboard
7. THE diagram SHALL show external integrations at their connection points: Ollama for LLM extraction, Alpaca for trade execution, AWS SNS and Gmail for notifications
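
The queue topology in criterion 2 could be generated rather than hand-drawn, which keeps the diagram in sync with a single list of stages. The sketch below emits Mermaid flowchart syntax; the `pipeline_mermaid` helper is hypothetical, and the queue name in the usage example is a placeholder because the real queue names are not given in this document.

```python
STAGES = ["ingestion", "parsing", "extraction", "aggregation", "recommendation", "broker"]


def pipeline_mermaid(stages, queue_names=None):
    """Render consecutive stages as a Mermaid flowchart.

    `queue_names` maps (source, destination) pairs to an edge label.
    """
    queue_names = queue_names or {}
    lines = ["flowchart LR"]
    for src, dst in zip(stages, stages[1:]):
        label = queue_names.get((src, dst))
        arrow = f"-->|{label}|" if label else "-->"
        lines.append(f"    {src} {arrow} {dst}")
    return "\n".join(lines)


# "parse-jobs" is a placeholder label, not a real queue name.
print(pipeline_mermaid(STAGES, {("ingestion", "parsing"): "parse-jobs"}))
```

The output can be pasted into a fenced `mermaid` block in the documentation.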

### Requirement 13: AI Agent Building Guide

**User Story:** As a user or contributor, I want a guide explaining how each of the three AI agents (document extractor, event classifier, and thesis rewriter) works, including how to configure them, create variants, tune prompts, and monitor performance, so that I can customize and extend the AI capabilities of the platform.

#### Acceptance Criteria

1. THE Documentation_Set SHALL include an AI agent guide documenting the three built-in agents: `document-extractor` (structured intelligence extraction from news/filings), `event-classifier` (macro/geopolitical event classification), and `thesis-rewriter` (LLM-enhanced recommendation thesis generation)
2. FOR each agent, THE Documentation_Set SHALL document its purpose, input data, output schema, default model, system prompt structure, and user prompt template
3. THE Documentation_Set SHALL document the `ai_agents` database table schema and how agents are registered (system-seeded vs user-created via the API)
4. THE Documentation_Set SHALL document the `agent_variants` table and how to create, activate, and deactivate variants for A/B testing different models or prompts
5. THE Documentation_Set SHALL document the `AgentConfigResolver` module including the TTL cache (60-second default), COALESCE-based variant override logic, and fallback behavior when no DB config exists
6. THE Documentation_Set SHALL document the agent performance logging system and how to query `agent_performance_log` to compare variant effectiveness
7. THE Documentation_Set SHALL document the API endpoints for managing agents (CRUD on `/api/agents`) and testing agent configurations (`/api/agents/{id}/test`)
8. THE Documentation_Set SHALL include a step-by-step guide for creating a new agent variant with a different model or prompt and activating it for live traffic
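
The resolution behavior named in criterion 5 (TTL cache, COALESCE-style override, fallback) can be sketched as follows. This is not the actual `AgentConfigResolver`: the class name, the `load_from_db` callable, the dict-shaped rows, and the model names in any usage are all illustrative assumptions.

```python
import time


def coalesce(*values):
    """Return the first value that is not None, like SQL COALESCE."""
    return next((v for v in values if v is not None), None)


class TTLConfigResolver:
    """Resolve agent config as variant > agent row > built-in default."""

    def __init__(self, load_from_db, defaults, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db  # agent_name -> (agent_row, variant_row) or None
        self._defaults = defaults
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}

    def resolve(self, agent_name):
        now = self._clock()
        cached = self._cache.get(agent_name)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # still fresh, skip the database
        fallback = self._defaults[agent_name]
        rows = self._load(agent_name)
        if rows is None:
            config = dict(fallback)  # no DB config: use built-in defaults
        else:
            agent, variant = rows
            variant = variant or {}
            config = {
                key: coalesce(variant.get(key), agent.get(key), fallback[key])
                for key in fallback
            }
        self._cache[agent_name] = (now, config)
        return config
```

With this shape, a change made through the API becomes visible to callers once the cached entry expires, which is the trade-off a 60-second TTL implies.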

### Requirement 14: Backup and Restore Guide

**User Story:** As an operator, I want a guide documenting all backup and restore scripts, their options, storage locations, and retention policies, so that I can protect data and recover from failures.

#### Acceptance Criteria

1. THE Documentation_Set SHALL include a backup and restore guide documenting every script in `scripts/` related to backup and restore: `backup-db.sh`, `restore-db.sh`, `backup-redis.sh`, `backup.sh`, and `restore.sh`
2. FOR each backup script, THE Documentation_Set SHALL document its CLI arguments, what data it captures, where backups are stored, and retention/pruning behavior (e.g., keeping the last 7 backups)
3. FOR each restore script, THE Documentation_Set SHALL document its CLI arguments, what it restores, the service scale-down/scale-up procedure it performs, and any data loss implications
4. THE Documentation_Set SHALL document the MinIO upload option (`--upload-minio`) for off-host backup storage
5. THE Documentation_Set SHALL document the full database nuke and rebuild procedure including connection termination, database drop, Redis flush, redeploy, and re-seed steps
6. THE Documentation_Set SHALL document recommended backup schedules and how to automate backups via cron or Kubernetes CronJobs
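
The "keep the last 7" retention behavior mentioned in criterion 2 can be stated precisely with a short sketch. It assumes backup file names embed a sortable timestamp, so that lexicographic order matches chronological order; the naming pattern and the `backups_to_prune` helper are hypothetical and do not describe what the actual scripts do.

```python
def backups_to_prune(filenames, keep=7):
    """Return the backups to delete, keeping the `keep` newest.

    Assumes names embed a sortable timestamp, so sorting the names
    also sorts the backups by age.
    """
    newest_first = sorted(filenames, reverse=True)
    return sorted(newest_first[keep:])
```

With ten daily backups and `keep=7`, the three oldest files are returned for deletion.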

### Requirement 15: Observability and Prometheus Metrics Reference

**User Story:** As an operator, I want a reference documenting all Prometheus metrics exposed by the platform, the alerting rules, and how to monitor pipeline health, so that I can set up dashboards and respond to incidents.

#### Acceptance Criteria

1. THE Documentation_Set SHALL include an observability reference documenting the `/metrics` endpoint on the query API and how to configure Prometheus to scrape it
2. THE Documentation_Set SHALL document all Prometheus counters, gauges, and histograms emitted by each service, including metric name, labels, and what they measure (e.g., `EXTRACTION_ATTEMPTS`, `EXTRACTION_DURATION`, `AGGREGATION_WINDOWS_COMPUTED`, `AGGREGATION_SIGNALS_PROCESSED`, `RECOMMENDATION_GENERATED`, `RECOMMENDATION_CONFIDENCE`, alerting counters)
3. THE Documentation_Set SHALL document the alerting module (`services/shared/alerting.py`) including all alert rules, their thresholds, evaluation windows, and the ConfigMap environment variables that control them (`ALERT_SOURCE_FAILURE_THRESHOLD`, `ALERT_SCHEMA_FAILURE_RATE_THRESHOLD`, `ALERT_LAKE_LAG_THRESHOLD_MINUTES`, `ALERT_BROKER_ERROR_THRESHOLD`, etc.)
4. THE Documentation_Set SHALL document the structured JSON logging format, trace context propagation (trace_id, span_id), and how to query logs for debugging pipeline issues
5. THE Documentation_Set SHALL document the dead-letter queue system including queue names, how failed jobs are routed there, and how to replay them using the dead-letter tooling
6. THE Documentation_Set SHALL document recommended Prometheus/Grafana dashboard configurations or queries for monitoring ingestion throughput, extraction latency, aggregation volume, recommendation generation rate, and trading engine activity
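
The structured logging described in criterion 4 could take a shape like the sketch below, built on the standard `logging` module. The field names other than `trace_id` and `span_id`, and the `JsonFormatter` and `build_logger` helpers, are assumptions; the platform's actual log schema may differ.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Format each record as one JSON object per line."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Trace context arrives through `extra=` on the logging call.
            "trace_id": getattr(record, "trace_id", None),
            "span_id": getattr(record, "span_id", None),
        }
        return json.dumps(payload)


def build_logger(name, stream):
    """Return a logger that writes JSON lines to `stream`."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

A call such as `logger.info("extraction finished", extra={"trace_id": "t-1", "span_id": "s-1"})` then produces a line that can be filtered by `trace_id` to follow one document through the pipeline.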

### Requirement 16: README Resource Links

**User Story:** As a user landing on the repository, I want the README to link to all documentation resources, so that I can navigate to any guide, reference, or diagram from a single entry point.

#### Acceptance Criteria

1. WHEN the README is updated, THE README SHALL include a documentation section with links to every document in the Documentation_Set
2. THE README SHALL link to the API reference documents for all four FastAPI services
3. THE README SHALL link to the Helm chart configuration reference
4. THE README SHALL link to the Docker deployment guide
5. THE README SHALL link to all three architecture diagram documents (Kubernetes, Docker Compose, and Data Pipeline)
6. THE README SHALL link to the per-service feature documentation
7. THE README SHALL link to the AI agent building guide
8. THE README SHALL link to the backup and restore guide
9. THE README SHALL link to the observability and Prometheus metrics reference
10. THE README SHALL replace the existing ASCII architecture diagram with a Mermaid architecture diagram or a link to the architecture diagram documents
11. THE README SHALL preserve all other existing content (license, features, tech stack, project structure, deployment instructions) while adding the new documentation links
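
Criterion 1 can be verified automatically by comparing the README's Markdown links against the files in `docs/`. The sketch below is one possible check; the `unlinked_docs` helper is hypothetical, it only recognizes inline `[text](target)` links, and it expects relative paths from the repository root.

```python
import re

# Capture the link target up to a closing parenthesis, whitespace, or anchor.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s#]+)")


def unlinked_docs(readme_text, doc_paths):
    """Return the documents that the README does not link to, sorted."""
    linked = set()
    for target in LINK_RE.findall(readme_text):
        linked.add(target[2:] if target.startswith("./") else target)
    return sorted(set(doc_paths) - linked)
```

An empty result means every document in the Documentation_Set is reachable from the README.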