# Implementation Plan: Comprehensive Quality & Documentation

## Overview

This plan implements three pillars for the Stonks Oracle platform: (1) unit test coverage for the scheduler and ingestion services, plus fixes for pre-existing test failures; (2) extending `docker-compose.yml` with all 13 application services and the frontend; and (3) comprehensive documentation covering services, APIs, Helm configuration, Docker deployment, architecture diagrams, AI agents, backup/restore, observability, and README resource links.

Tasks are ordered so that tests come first (to catch regressions early), followed by Docker Compose (infrastructure), and then documentation (which references verified code).

## Tasks

- [x] 1. Write scheduler service unit tests
  - [x] 1.1 Create `tests/test_scheduler_unit.py` with unit tests for scheduler pure functions and orchestration
    - Import scheduler functions from `services/scheduler/app.py`
    - Mock `asyncpg.Pool` (`.fetch()`, `.fetchrow()`, `.fetchval()`, `.execute()`) and `redis.asyncio.Redis` (`.rpush()`, `.set()`, `.get()`, `.incr()`, `.expire()`, `.decr()`, `.delete()`)
    - Write 8+ test cases covering `get_cadence_for_source`, `compute_backoff`, `is_source_due`, `build_job_payload`, `schedule_cycle` (with mocked DB/Redis), `check_rate_limit`, `recover_stale_documents`, and `retry_failed_extractions`
    - Verify error handling: DB/Redis connection failures are handled without crashing
    - Use `pytest-asyncio` for async test functions, along with `unittest.mock.AsyncMock` and `unittest.mock.patch`
    - _Requirements: 1.1, 1.2, 1.3, 1.4_
  - [x] 1.2 Write additional edge-case unit tests for the scheduler
    - Test boundary conditions: zero polling interval, max retry count, empty source list
    - Test rate-limiting edge cases: global Polygon limit, per-type limits
    - _Requirements: 1.3, 1.4_

- [x] 2. Write ingestion service unit tests
  - [x] 2.1 Create `tests/test_ingestion_unit.py` with unit tests for the ingestion worker
    - Import ingestion functions from `services/ingestion/worker.py`
    - Mock adapters as `AsyncMock` returning `AdapterResult` with controlled `error`, `items`, `content_hash`, and `raw_payload`
    - Mock `asyncpg.Pool` for `ingestion_runs` INSERT/UPDATE, `persist_ingestion_items`, and `record_retrieval_failure`
    - Mock `redis.asyncio.Redis` for dedupe checks, queue pushes, and DLQ routing
    - Mock `minio.Minio` for `upload_raw_artifact`
    - Write 6+ test cases covering successful job processing, adapter error with retry, retry exhaustion → dead-letter queue, content-hash deduplication skip, cross-source dedup via `dedupe_items`, and error-handling paths
    - _Requirements: 2.1, 2.2, 2.3, 2.4_
  - [x] 2.2 Write additional edge-case unit tests for ingestion
    - Test empty adapter responses, partial failures, and multiple items in a single job
    - _Requirements: 2.1, 2.4_

- [x] 3. Checkpoint: Verify new unit tests pass
  - Run `pytest tests/test_scheduler_unit.py tests/test_ingestion_unit.py -x --tb=short -q`
  - Ensure all tests pass; ask the user if questions arise.

- [x] 4. Fix pre-existing test failures
  - [x] 4.1 Fix `tests/test_extractor_prompts.py`
    - Run the file individually to diagnose failures
    - Fix test setup (mock configuration, fixture data) or production code as needed
    - Preserve original test intent and assertions
    - If production code changes are needed, add regression tests
    - _Requirements: 3.1, 3.5_
  - [x] 4.2 Fix `tests/test_extractor_schemas.py`
    - Run the file individually to diagnose failures
    - Fix test setup or production code as needed
    - Preserve original test intent and assertions
    - _Requirements: 3.2, 3.5_
  - [x] 4.3 Fix `tests/test_ollama_client.py`
    - Run the file individually to diagnose failures
    - Fix test setup or production code as needed
    - Preserve original test intent and assertions
    - _Requirements: 3.3, 3.5_
  - [x] 4.4 Fix `tests/test_filings_adapter.py`
    - Run the file individually to diagnose failures
    - Fix test setup or production code as needed
    - Preserve original test intent and assertions
    - _Requirements: 3.4, 3.5_

- [x] 5. Checkpoint: Full test suite green
  - Run `pytest tests/ -x --tb=short -q` and verify zero failures
  - Run `ruff check services/` and verify zero violations
  - Verify all `test_pbt_*` files pass unchanged
  - If any production code was modified, confirm regression tests exist
  - Ensure all tests pass; ask the user if questions arise.
  - _Requirements: 4.1, 4.2, 4.3, 4.4_

- [x] 6. Add application services to `docker-compose.yml`
  - [x] 6.1 Add a shared environment anchor and all 14 service definitions to `docker-compose.yml`
    - Define an `x-app-env` YAML anchor with common environment variables (POSTGRES_HOST, POSTGRES_PORT, POSTGRES_DB, POSTGRES_USER, POSTGRES_PASSWORD, REDIS_HOST, REDIS_PORT, MINIO_ENDPOINT, MINIO_ACCESS_KEY, MINIO_SECRET_KEY, OLLAMA_BASE_URL)
    - Add 13 application service definitions: scheduler (built from `docker/Dockerfile.scheduler`), plus symbol-registry, ingestion, parser, extractor, aggregation, recommendation, trading-engine, risk-engine, broker-adapter, lake-publisher, and query-api, each built from `docker/Dockerfile` with the appropriate `SERVICE_CMD` build arg
    - Add a dashboard service built from `frontend/Dockerfile`, mapping host port 3000 to container port 8080
    - Configure `depends_on` with `condition: service_healthy` for infrastructure dependencies
    - Add health checks: FastAPI services use `curl -f http://localhost:8000/health`; workers use process liveness
    - Configure `env_file: .env` on services that need API keys (ingestion, broker-adapter, trading-engine)
    - Map host ports: symbol-registry:8001, trading-engine:8002, risk-engine:8003, query-api:8004, dashboard:3000
    - _Requirements: 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7_
  - [x] 6.2 Validate the `docker-compose.yml` configuration
    - Run `docker compose config` to verify the updated file parses correctly
    - _Requirements: 5.1_

- [x] 7. Checkpoint: Tests and Docker Compose validated
  - Run `pytest tests/ -x --tb=short -q` to confirm no regressions
  - Run `docker compose config` to confirm the YAML is valid
  - Ensure all tests pass; ask the user if questions arise.

- [x] 8. Write per-service feature documentation
  - [x] 8.1 Create `docs/services.md` documenting all 13 services
    - For each service: purpose, entry-point module path, configuration environment variables, database tables read/written, and Redis queues consumed/published with message schemas
    - Include a queue topology table (queue name → producer → consumer)
    - Document the three signal layers (company, macro, competitive) with data flow, toggles, and weight configurations
    - Document trading engine features: position sizing, circuit breakers, reserve pool, risk tier auto-adjustment, backtesting, notifications
    - Cross-reference the API documentation for services with HTTP endpoints
    - _Requirements: 6.1, 6.2, 6.3, 6.4, 6.5_

- [x] 9. Write API reference documentation
  - [x] 9.1 Create `docs/api-reference.md` covering all four FastAPI services
    - Document all Query API endpoints (~40+): path, method, query parameters (type, default, constraints), request body schema, response schema, error codes
    - Document all Symbol Registry API endpoints: companies CRUD, aliases, watchlists, sources, exposure profiles, competitor relationships, competitor inference
    - Document all Trading API endpoints: health/readiness, engine status, config update, pause/resume, reset, decisions audit, performance metrics/history, backtesting, notifications config/history, override orders, debug state
    - Document all Risk API endpoints: order evaluation (POST /evaluate), health, pending approvals, approval review, approval expiration
    - Inspect the actual route definitions in `services/api/app.py`, `services/symbol_registry/app.py`, `services/trading/app.py`, and `services/risk/app.py`
    - _Requirements: 7.1, 7.2, 7.3, 7.4, 7.5, 7.6_

- [x] 10. Write Helm chart configuration reference
  - [x] 10.1 Create `docs/helm-reference.md` documenting all Helm values
    - Document the `image` block: registry, pullPolicy, tag
    - Document the `pipelineEnabled` toggle and its effect on worker replicas
    - Document the `services` block: per-service structure (replicas, image, command, tier, port, secrets, resources, probes)
    - Document the `config` block: all ConfigMap environment variables with defaults and descriptions
    - Document the `secrets` block (core, broker, market, gmail, dashboard) and injection via `--set` flags
    - Document the `ingress` block: className, clusterIssuer, host mappings
    - Document analytics stack toggles (`trino.enabled`, `hiveMetastore.enabled`, `superset.enabled`) with resources
    - Document `networkPolicies.enabled` and the default-deny-ingress behavior
    - Document value override files (`values-beta.yaml`, `values-paper.yaml`) and deployment stages
    - _Requirements: 8.1, 8.2, 8.3, 8.4, 8.5, 8.6, 8.7, 8.8, 8.9_

- [x] 11. Write Docker deployment guide
  - [x] 11.1 Create `docs/docker-deployment.md` with a complete Docker deployment guide
    - Document every service with its image, ports, volumes, and environment variables
    - Document the `.env` file format with all required/optional variables, defaults, and descriptions
    - Document volume mounts and data persistence (pgdata, miniodata, ollama_models, hive_data, superset_data), and resetting with `docker compose down -v`
    - Document health check configurations and verification commands
    - Document Dockerfile build arguments (`SERVICE_CMD`) and custom image builds
    - Document operational commands: start, stop, restart, logs, scale, reset
    - _Requirements: 9.1, 9.2, 9.3, 9.4, 9.5, 9.6_

- [x] 12. Checkpoint: Documentation progress check
  - Verify that `docs/services.md`, `docs/api-reference.md`, `docs/helm-reference.md`, and `docs/docker-deployment.md` exist and render as valid Markdown
  - Ensure all tests pass; ask the user if questions arise.

- [x] 13. Write architecture diagrams
  - [x] 13.1 Create `docs/architecture-kubernetes.md` with a Kubernetes deployment Mermaid diagram
    - Show all 13 services in the `stonks-oracle` namespace, grouped by tier (api, processing, trading, orchestration, analytics, frontend)
    - Show external cluster services (PostgreSQL, Redis, MinIO, Ollama) in their namespaces
    - Show Traefik ingress routes to external domains
    - Show network policy boundaries
    - Show the analytics plane (Trino, Hive Metastore, Superset) and its MinIO connections
    - Show Helm-managed secrets (core, broker, market, gmail) with consumer mapping
    - Distinguish API-tier (with ingress), pipeline-tier (queue-driven), and trading-tier services
    - _Requirements: 10.1, 10.2, 10.3, 10.4, 10.5, 10.6, 10.7_
  - [x] 13.2 Create `docs/architecture-docker-compose.md` with a Docker Compose Mermaid diagram
    - Show all infrastructure and application containers
    - Show host port mappings for externally accessible services
    - Show `depends_on` relationships and health check dependencies
    - Show named volumes and mount points
    - Show the `.env` file providing API keys to the relevant containers
    - Show internal Docker network connectivity
    - _Requirements: 11.1, 11.2, 11.3, 11.4, 11.5, 11.6_
  - [x] 13.3 Create `docs/architecture-data-pipeline.md` with a data pipeline Mermaid diagram
    - Show the complete pipeline: external sources → ingestion → parsing → extraction → aggregation → recommendation → risk → trading → broker
    - Show the Redis queue topology with queue names
    - Show the three signal layers as distinct paths merging at aggregation
    - Show the data stores at each stage (MinIO, PostgreSQL, Redis)
    - Show the trading engine decision loop
    - Show the analytical branch: lake publisher → MinIO/Parquet → Trino → Superset/Dashboard
    - Show external integrations: Ollama, Alpaca, AWS SNS, Gmail
    - _Requirements: 12.1, 12.2, 12.3, 12.4, 12.5, 12.6, 12.7_

- [x] 14. Write AI agent building guide
  - [x] 14.1 Create `docs/ai-agents.md` with an AI agent guide
    - Document the three built-in agents (document-extractor, event-classifier, thesis-rewriter): purpose, input data, output schema, default model, system prompt structure, user prompt template
    - Document the `ai_agents` table schema and registration (system-seeded vs. API-created)
    - Document the `agent_variants` table: creating, activating, and deactivating variants for A/B testing
    - Document the `AgentConfigResolver` module: TTL cache (60 s), COALESCE-based variant override, fallback behavior
    - Document performance logging: the `agent_performance_log` table and querying it for variant comparison
    - Document API endpoints: CRUD on `/api/agents` and the test endpoint `/api/agents/{id}/test`
    - Include a step-by-step guide to creating a new variant with a different model/prompt and activating it
    - _Requirements: 13.1, 13.2, 13.3, 13.4, 13.5, 13.6, 13.7, 13.8_

- [x] 15. Write backup and restore guide
  - [x] 15.1 Create `docs/backup-restore.md` with a backup and restore guide
    - Document all scripts in `scripts/`: `backup-db.sh`, `restore-db.sh`, `backup-redis.sh`, `backup.sh`, `restore.sh`
    - For each backup script: CLI arguments, data captured, storage location, retention/pruning (keeps the last 7 backups)
    - For each restore script: CLI arguments, what it restores, service scale-down/up procedure, data loss implications
    - Document the MinIO upload option (`--upload-minio`) for off-host storage
    - Document the full nuke-and-rebuild procedure: connection termination, DB drop, Redis flush, redeploy, re-seed
    - Document recommended backup schedules and automation (cron, Kubernetes CronJobs)
    - _Requirements: 14.1, 14.2, 14.3, 14.4, 14.5, 14.6_

- [x] 16. Write observability and metrics reference
  - [x] 16.1 Create `docs/observability.md` with an observability reference
    - Document the `/metrics` endpoint on query-api and the Prometheus scrape configuration
    - Document all Prometheus counters, gauges, and histograms from `services/shared/metrics.py` (ingestion, parsing, extraction, aggregation, recommendation, lake, trading, alerting, DLQ, and active-jobs metrics) with names, labels, and descriptions
    - Document the alerting module (`services/shared/alerting.py`): 4 alert rules, thresholds, evaluation windows, ConfigMap variables
    - Document the structured JSON logging format, trace context (trace_id, span_id), and log querying
    - Document the dead-letter queue system: queue names (`stonks:dlq:` prefix), routing, replay tooling
    - Document recommended Prometheus/Grafana queries for monitoring
    - _Requirements: 15.1, 15.2, 15.3, 15.4, 15.5, 15.6_

- [x] 17. Update README with documentation links
  - [x] 17.1 Update `README.md` with a documentation section and resource links
    - Add a "Documentation" section linking to all docs: services.md, api-reference.md, helm-reference.md, docker-deployment.md, architecture-kubernetes.md, architecture-docker-compose.md, architecture-data-pipeline.md, ai-agents.md, backup-restore.md, observability.md
    - Replace the ASCII architecture diagram with a Mermaid diagram or a link to the architecture diagram docs
    - Preserve all existing content: license, features, tech stack, project structure, deployment instructions
    - _Requirements: 16.1, 16.2, 16.3, 16.4, 16.5, 16.6, 16.7, 16.8, 16.9, 16.10, 16.11_

- [x] 18. Final checkpoint: Full verification
  - Run `pytest tests/ -x --tb=short -q` and confirm zero failures
  - Run `ruff check services/` and confirm zero violations
  - Run `docker compose config` and confirm it validates successfully
  - Verify all `test_pbt_*` files pass unchanged
  - Verify all documentation files exist in `docs/` and render as valid Markdown
  - Ensure all tests pass; ask the user if questions arise.
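Tasks 1.1 and 1.2 describe mocking Redis with `AsyncMock` and testing boundary conditions. The sketch below illustrates that pattern. `compute_backoff` and `check_rate_limit` here are hypothetical stand-ins (the 30 s base delay, 3600 s cap, fixed-window algorithm, and `rl:polygon` key are all assumptions), not the functions in `services/scheduler/app.py`. The tests call `asyncio.run` so they also run under plain pytest; with `pytest-asyncio`, they could instead be `async def` tests marked `@pytest.mark.asyncio`.

```python
import asyncio
from unittest.mock import AsyncMock


# Hypothetical stand-ins for two scheduler functions; the real versions in
# services/scheduler/app.py may have different signatures and defaults.
def compute_backoff(attempt: int, base_s: int = 30, cap_s: int = 3600) -> int:
    """Exponential backoff in seconds, capped at cap_s."""
    if attempt < 0:
        raise ValueError("attempt must be >= 0")
    return min(cap_s, base_s * 2 ** attempt)


async def check_rate_limit(redis, key: str, limit: int, window_s: int) -> bool:
    """Fixed-window limiter: allow at most `limit` calls per `window_s` seconds."""
    count = await redis.incr(key)
    if count == 1:
        # The first call in a fresh window starts the expiry clock.
        await redis.expire(key, window_s)
    return count <= limit


def test_compute_backoff_boundaries():
    assert compute_backoff(0) == 30      # first retry uses the base delay
    assert compute_backoff(20) == 3600   # large attempt counts are capped


def test_rate_limit_blocks_after_limit():
    redis = AsyncMock()                  # attributes such as redis.incr are AsyncMocks too
    redis.incr.side_effect = [1, 2, 3]   # simulated INCR results for three calls
    allowed = [
        asyncio.run(check_rate_limit(redis, "rl:polygon", 2, 60)) for _ in range(3)
    ]
    assert allowed == [True, True, False]
    redis.expire.assert_awaited_once_with("rl:polygon", 60)
```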
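For task 2.1, the retry-exhaustion and content-hash dedupe cases can be exercised the same way. In this sketch, `process_job`, the adapter's `fetch` method, `MAX_RETRIES`, the dedupe key format, and both queue names are hypothetical (only the `stonks:dlq:` prefix appears in the plan). The dedupe check relies on redis-py's `set(..., nx=True, ex=...)`, which returns `None` when the key already exists.

```python
import asyncio
import hashlib
import json
from dataclasses import dataclass, field
from typing import Optional
from unittest.mock import AsyncMock

MAX_RETRIES = 3                          # hypothetical retry budget
INGEST_QUEUE = "stonks:queue:ingestion"  # hypothetical queue name
DLQ = "stonks:dlq:ingestion"             # only the stonks:dlq: prefix comes from the plan


@dataclass
class AdapterResult:
    # Hypothetical mirror of the AdapterResult fields named in task 2.1.
    error: Optional[str] = None
    items: list = field(default_factory=list)
    content_hash: Optional[str] = None
    raw_payload: bytes = b""


async def process_job(job: dict, adapter, redis) -> str:
    """Return 'retried', 'dead_letter', 'duplicate', or 'processed'."""
    result = await adapter.fetch(job["source_id"])
    if result.error:
        attempts = job.get("attempts", 0) + 1
        payload = json.dumps({**job, "attempts": attempts, "last_error": result.error})
        # Requeue until the retry budget is spent, then route to the DLQ.
        queue = DLQ if attempts >= MAX_RETRIES else INGEST_QUEUE
        await redis.rpush(queue, payload)
        return "dead_letter" if queue == DLQ else "retried"
    # SET NX returns None when the key already exists, i.e. the hash was seen before.
    is_new = await redis.set(f"stonks:dedupe:{result.content_hash}", 1, nx=True, ex=86400)
    return "processed" if is_new else "duplicate"


def test_retry_exhaustion_routes_to_dlq():
    adapter, redis = AsyncMock(), AsyncMock()
    adapter.fetch.return_value = AdapterResult(error="HTTP 503")
    outcome = asyncio.run(process_job({"source_id": "s1", "attempts": 2}, adapter, redis))
    assert outcome == "dead_letter"
    queue, payload = redis.rpush.await_args.args
    assert queue == DLQ and json.loads(payload)["attempts"] == 3


def test_duplicate_content_hash_is_skipped():
    adapter, redis = AsyncMock(), AsyncMock()
    adapter.fetch.return_value = AdapterResult(content_hash=hashlib.sha256(b"doc").hexdigest())
    redis.set.return_value = None        # NX write refused: hash already seen
    assert asyncio.run(process_job({"source_id": "s1"}, adapter, redis)) == "duplicate"
```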
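Task 6.2 validates only that `docker-compose.yml` parses. A small script could additionally check the structural rules from task 6.1 against the resolved configuration printed by `docker compose config --format json`. This is a sketch under assumptions: it requires every `depends_on` entry of the listed services to use `service_healthy` (stricter than the plan, which asks for it on infrastructure dependencies), and the service keys in `APP_SERVICES` are copied from task 6.1 and may not match the actual compose file.

```python
from __future__ import annotations

import json
import subprocess

# Names as written in task 6.1; adjust to the actual compose service keys.
APP_SERVICES = ["scheduler", "symbol-registry", "ingestion", "parser", "extractor",
                "aggregation", "recommendation", "trading-engine", "risk-engine",
                "broker-adapter", "lake-publisher", "query-api", "dashboard"]
HOST_PORTS = {"symbol-registry": "8001", "trading-engine": "8002",
              "risk-engine": "8003", "query-api": "8004", "dashboard": "3000"}


def load_compose_config() -> dict:
    """Resolve docker-compose.yml (anchors, env_file, defaults) into a dict."""
    out = subprocess.run(
        ["docker", "compose", "config", "--format", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def published_ports(svc: dict) -> set[str]:
    """Host ports from long-form entries or short "HOST:CONTAINER" strings."""
    ports = set()
    for p in svc.get("ports", []):
        ports.add(str(p.get("published")) if isinstance(p, dict) else str(p).split(":")[0])
    return ports


def check_compose(config: dict, app_services: list[str], host_ports: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means the checks pass."""
    services = config.get("services", {})
    problems = []
    for name in app_services:
        svc = services.get(name)
        if svc is None:
            problems.append(f"{name}: missing service")
            continue
        if "healthcheck" not in svc:
            problems.append(f"{name}: missing healthcheck")
        deps = svc.get("depends_on", {})
        if isinstance(deps, list):  # short syntax cannot express a health condition
            problems.append(f"{name}: depends_on uses list syntax")
        else:
            for dep, spec in deps.items():
                if spec.get("condition") != "service_healthy":
                    problems.append(f"{name}: {dep} condition is {spec.get('condition')}")
    for name, port in host_ports.items():
        if port not in published_ports(services.get(name, {})):
            problems.append(f"{name}: host port {port} not published")
    return problems
```

Against a real checkout, `check_compose(load_compose_config(), APP_SERVICES, HOST_PORTS)` would run the same checks on the resolved file; that call needs Docker Compose v2 on the PATH.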
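Task 14.1 names three behaviors of `AgentConfigResolver`: a 60-second TTL cache, a COALESCE-based variant override, and a fallback. The following is a hypothetical sketch of how those could fit together, not the module's actual code; the `AgentConfig` fields, the loader callables, and the choice to serve the last cached config when a lookup fails are assumptions. An injectable clock keeps the TTL testable.

```python
from __future__ import annotations

import time
from dataclasses import asdict, dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class AgentConfig:
    # Hypothetical subset of ai_agents columns.
    model: str
    system_prompt: str
    temperature: float


class CachedAgentConfigResolver:
    """Hypothetical sketch: TTL cache + COALESCE-style variant override + stale fallback."""

    def __init__(
        self,
        load_base: Callable[[str], AgentConfig],
        load_active_variant: Callable[[str], Optional[dict]],
        ttl_s: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load_base = load_base
        self._load_variant = load_active_variant
        self._ttl = ttl_s
        self._clock = clock
        self._cache: dict[str, tuple[float, AgentConfig]] = {}

    def resolve(self, agent: str) -> AgentConfig:
        now = self._clock()
        cached = self._cache.get(agent)
        if cached and now - cached[0] < self._ttl:
            return cached[1]
        try:
            base = self._load_base(agent)
            variant = self._load_variant(agent) or {}
        except Exception:
            if cached:  # lookup failed: fall back to the last resolved config
                return cached[1]
            raise
        # Like SQL COALESCE(variant.col, base.col): a non-NULL variant value wins.
        merged = AgentConfig(**{
            col: variant[col] if variant.get(col) is not None else base_val
            for col, base_val in asdict(base).items()
        })
        self._cache[agent] = (now, merged)
        return merged
```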
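The retention rule in task 15.1 (keep the last 7 backups) is easy to pin down with a unit test. The Python sketch below models that rule; the timestamped file-name pattern, the `.dump` extension, and the `prune` helper are assumptions, and the shell scripts in `scripts/` may implement pruning differently.

```python
from __future__ import annotations

import re
from pathlib import Path

# Hypothetical naming scheme <name>-YYYYmmdd-HHMMSS.<ext>; the real scripts
# in scripts/ may name their files differently.
STAMP = re.compile(r"-(\d{8}-\d{6})\.")


def backups_to_prune(names: list[str], keep: int = 7) -> list[str]:
    """Return the names that fall outside the newest `keep` backups."""
    stamped = []
    for name in names:
        match = STAMP.search(name)
        if match:  # ignore files that do not follow the naming scheme
            stamped.append((match.group(1), name))
    stamped.sort(reverse=True)  # fixed-width timestamps sort newest-first lexically
    return [name for _, name in stamped[keep:]]


def prune(directory: Path, pattern: str = "*.dump", keep: int = 7, dry_run: bool = True) -> list[str]:
    """Delete (or, with dry_run, just list) backups beyond the newest `keep`."""
    victims = backups_to_prune([p.name for p in directory.glob(pattern)], keep)
    if not dry_run:
        for name in victims:
            (directory / name).unlink()
    return victims
```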
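Task 16.1 covers the structured JSON logging format with `trace_id`/`span_id` and log querying. The stdlib-only sketch below shows one way to emit and filter such lines; the other field names (`ts`, `level`, `logger`, `message`), the logger names, and `lines_for_trace` are illustrative assumptions rather than the format the services actually use.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # trace_id/span_id are supplied via `extra=`; missing ones become null.
            "trace_id": getattr(record, "trace_id", None),
            "span_id": getattr(record, "span_id", None),
        }
        if record.exc_info:
            entry["exc"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def make_logger(name: str, stream) -> logging.Logger:
    """Attach a single JSON handler writing to `stream`."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def lines_for_trace(lines, trace_id: str) -> list:
    """Minimal log query: keep the JSON lines belonging to one trace."""
    return [rec for rec in map(json.loads, lines) if rec.get("trace_id") == trace_id]


if __name__ == "__main__":
    buf = io.StringIO()
    log = make_logger("stonks.demo", buf)
    log.info("job done", extra={"trace_id": "t-1", "span_id": "s-1"})
    print(lines_for_trace(buf.getvalue().splitlines(), "t-1"))
```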
## Notes

- Tasks marked with `*` are optional and can be skipped for a faster MVP
- Each task references specific requirements for traceability
- Checkpoints ensure incremental validation
- No property-based tests are included; the design assessment confirmed PBT is not applicable to this feature
- Existing `test_pbt_*` files (22 files) must remain passing throughout
- The implementation language is Python (with Markdown for documentation), matching the existing codebase