Implementation Plan: Comprehensive Quality & Documentation
Overview
This plan implements three pillars for the Stonks Oracle platform: (1) unit test coverage for the scheduler and ingestion services, plus fixes for pre-existing test failures, (2) extending docker-compose.yml with all 13 application services and the frontend, and (3) producing comprehensive documentation covering services, APIs, Helm configuration, Docker deployment, architecture diagrams, AI agents, backup/restore, observability, and README resource links. Tasks are ordered so that tests come first (to catch regressions early), then Docker Compose (infrastructure), then documentation (which references code that has already been verified).
Tasks
- 1. Write scheduler service unit tests
  - 1.1 Create `tests/test_scheduler_unit.py` with unit tests for scheduler pure functions and orchestration
    - Import scheduler functions from `services/scheduler/app.py`
    - Mock `asyncpg.Pool` (`.fetch()`, `.fetchrow()`, `.fetchval()`, `.execute()`) and `redis.asyncio.Redis` (`.rpush()`, `.set()`, `.get()`, `.incr()`, `.expire()`, `.decr()`, `.delete()`)
    - Write 8+ test cases covering: `get_cadence_for_source`, `compute_backoff`, `is_source_due`, `build_job_payload`, `schedule_cycle` (mocked DB/Redis), `check_rate_limit`, `recover_stale_documents`, `retry_failed_extractions`
    - Verify error handling: DB/Redis connection failures handled without crashing
    - Use `pytest-asyncio` for async test functions, `unittest.mock.AsyncMock` and `unittest.mock.patch`
    - Requirements: 1.1, 1.2, 1.3, 1.4
  - 1.2 Write additional edge-case unit tests for scheduler
    - Test boundary conditions: zero polling interval, max retry count, empty source list
    - Test rate limiting edge cases: global Polygon limit, per-type limits
    - Requirements: 1.3, 1.4
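The mocking approach described for task 1.1 could look roughly like the sketch below. It is an illustration only: `check_rate_limit` here is a hypothetical stand-in with an assumed signature and an assumed fail-closed policy (the real function in `services/scheduler/app.py` may differ), and the tests drive the coroutine with `asyncio.run` so the snippet runs without `pytest-asyncio`.

```python
import asyncio
from unittest.mock import AsyncMock


# Hypothetical stand-in for the scheduler's rate limiter; the real
# signature in services/scheduler/app.py may differ.
async def check_rate_limit(redis, key: str, limit: int, window_s: int) -> bool:
    """Return True while the caller is within `limit` calls per window."""
    try:
        count = await redis.incr(key)
        if count == 1:
            # First hit in this window: start the expiry clock.
            await redis.expire(key, window_s)
        return count <= limit
    except ConnectionError:
        # Assumed policy: fail closed when Redis is unreachable.
        return False


def make_redis(incr_result=None, incr_error=None) -> AsyncMock:
    """Build a Redis double whose awaited methods return canned values."""
    redis = AsyncMock()
    if incr_error is not None:
        redis.incr.side_effect = incr_error
    else:
        redis.incr.return_value = incr_result
    return redis


def test_first_call_sets_expiry():
    redis = make_redis(incr_result=1)
    assert asyncio.run(check_rate_limit(redis, "rl:polygon", 5, 60)) is True
    redis.expire.assert_awaited_once_with("rl:polygon", 60)


def test_over_limit_is_rejected():
    redis = make_redis(incr_result=6)
    assert asyncio.run(check_rate_limit(redis, "rl:polygon", 5, 60)) is False
    redis.expire.assert_not_awaited()


def test_connection_failure_does_not_crash():
    redis = make_redis(incr_error=ConnectionError("redis down"))
    assert asyncio.run(check_rate_limit(redis, "rl:polygon", 5, 60)) is False
```

In the actual test file the same doubles would be passed to the imported scheduler functions, with `@pytest.mark.asyncio` replacing the `asyncio.run` calls.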
- 2. Write ingestion service unit tests
  - 2.1 Create `tests/test_ingestion_unit.py` with unit tests for ingestion worker
    - Import ingestion functions from `services/ingestion/worker.py`
    - Mock adapters as `AsyncMock` returning `AdapterResult` with controlled `error`, `items`, `content_hash`, `raw_payload`
    - Mock `asyncpg.Pool` for `ingestion_runs` INSERT/UPDATE, `persist_ingestion_items`, `record_retrieval_failure`
    - Mock `redis.asyncio.Redis` for dedupe checks, queue pushes, DLQ routing
    - Mock `minio.Minio` for `upload_raw_artifact`
    - Write 6+ test cases covering: successful job processing, adapter error with retry, retry exhaustion → dead-letter queue, content hash deduplication skip, cross-source dedup via `dedupe_items`, error handling paths
    - Requirements: 2.1, 2.2, 2.3, 2.4
  - 2.2 Write additional edge-case unit tests for ingestion
    - Test empty adapter response, partial failures, multiple items in single job
    - Requirements: 2.1, 2.4
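The retry and dead-letter cases in task 2.1 could be exercised along the lines of the sketch below. Everything beyond the `AdapterResult` field names is an assumption made for illustration: the `process_job` function, the `adapter.fetch` method, the retry budget, and the retry queue name are hypothetical, and the DLQ name simply follows the `stonks:dlq:<queue>` pattern.

```python
import asyncio
import json
from dataclasses import dataclass, field
from typing import Any, Optional
from unittest.mock import AsyncMock


@dataclass
class AdapterResult:
    # Field names follow the plan; the types are assumptions.
    error: Optional[str] = None
    items: list = field(default_factory=list)
    content_hash: Optional[str] = None
    raw_payload: Any = None


MAX_RETRIES = 3                     # assumed retry budget
RETRY_QUEUE = "stonks:ingestion"    # hypothetical queue name
DLQ = "stonks:dlq:ingestion"        # follows the stonks:dlq:<queue> pattern


async def process_job(adapter, redis, job: dict) -> str:
    """Hypothetical worker step: fetch, then requeue or dead-letter on error."""
    result = await adapter.fetch(job["source"])
    if result.error is None:
        return "ok"
    attempt = job.get("attempt", 0) + 1
    target = DLQ if attempt >= MAX_RETRIES else RETRY_QUEUE
    await redis.rpush(target, json.dumps({**job, "attempt": attempt}))
    return "dead_lettered" if target == DLQ else "retried"


def run(job: dict, result: AdapterResult):
    """Run one job against mocked adapter and Redis; return outcome and Redis double."""
    adapter = AsyncMock()
    adapter.fetch.return_value = result
    redis = AsyncMock()
    outcome = asyncio.run(process_job(adapter, redis, job))
    return outcome, redis


def test_success_pushes_nothing():
    outcome, redis = run({"source": "news"}, AdapterResult(items=[{"id": 1}]))
    assert outcome == "ok"
    redis.rpush.assert_not_awaited()


def test_error_is_retried():
    outcome, redis = run({"source": "news"}, AdapterResult(error="timeout"))
    assert outcome == "retried"
    assert redis.rpush.await_args.args[0] == RETRY_QUEUE


def test_exhausted_retries_go_to_dlq():
    job = {"source": "news", "attempt": MAX_RETRIES - 1}
    outcome, redis = run(job, AdapterResult(error="timeout"))
    assert outcome == "dead_lettered"
    assert redis.rpush.await_args.args[0] == DLQ
```

Asserting on the queue name passed to `rpush` keeps the tests independent of how the payload is serialized.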
- 3. Checkpoint: Verify new unit tests pass
  - Run `pytest tests/test_scheduler_unit.py tests/test_ingestion_unit.py -x --tb=short -q`
  - Ensure all tests pass; ask the user if questions arise.
- 4. Fix pre-existing test failures
  - 4.1 Fix `tests/test_extractor_prompts.py`
    - Run the file individually to diagnose failures
    - Fix test setup (mock configuration, fixture data) or production code as needed
    - Preserve original test intent and assertions
    - If production code changes are needed, add regression tests
    - Requirements: 3.1, 3.5
  - 4.2 Fix `tests/test_extractor_schemas.py`
    - Run the file individually to diagnose failures
    - Fix test setup or production code as needed
    - Preserve original test intent and assertions
    - Requirements: 3.2, 3.5
  - 4.3 Fix `tests/test_ollama_client.py`
    - Run the file individually to diagnose failures
    - Fix test setup or production code as needed
    - Preserve original test intent and assertions
    - Requirements: 3.3, 3.5
  - 4.4 Fix `tests/test_filings_adapter.py`
    - Run the file individually to diagnose failures
    - Fix test setup or production code as needed
    - Preserve original test intent and assertions
    - Requirements: 3.4, 3.5
- 5. Checkpoint: Full test suite green
  - Run `pytest tests/ -x --tb=short -q` and verify zero failures
  - Run `ruff check services/` and verify zero violations
  - Verify all `test_pbt_*` files pass unchanged
  - If any production code was modified, confirm regression tests exist
  - Ensure all tests pass; ask the user if questions arise.
  - Requirements: 4.1, 4.2, 4.3, 4.4
- 6. Add application services to docker-compose.yml
  - 6.1 Add shared environment anchor and all 14 service definitions to `docker-compose.yml`
    - Define `x-app-env` YAML anchor with common environment variables (POSTGRES_HOST, POSTGRES_PORT, POSTGRES_DB, POSTGRES_USER, POSTGRES_PASSWORD, REDIS_HOST, REDIS_PORT, MINIO_ENDPOINT, MINIO_ACCESS_KEY, MINIO_SECRET_KEY, OLLAMA_BASE_URL)
    - Add 13 application service definitions: scheduler (using `docker/Dockerfile.scheduler`), symbol-registry, ingestion, parser, extractor, aggregation, recommendation, trading-engine, risk-engine, broker-adapter, lake-publisher, query-api, each using `docker/Dockerfile` with appropriate `SERVICE_CMD` build arg
    - Add dashboard service using `frontend/Dockerfile` on port 3000:8080
    - Configure `depends_on` with `condition: service_healthy` for infrastructure dependencies
    - Add health checks: FastAPI services use `curl -f http://localhost:8000/health`, workers use process liveness
    - Configure `env_file: .env` on services needing API keys (ingestion, broker-adapter, trading-engine)
    - Map host ports: symbol-registry:8001, trading-engine:8002, risk-engine:8003, query-api:8004, dashboard:3000
    - Requirements: 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7
  - 6.2 Validate docker-compose.yml configuration
    - Run `docker compose config` to verify the updated file parses correctly
    - Requirements: 5.1
- 7. Checkpoint: Tests and Docker Compose validated
  - Run `pytest tests/ -x --tb=short -q` to confirm no regressions
  - Run `docker compose config` to confirm valid YAML
  - Ensure all tests pass; ask the user if questions arise.
- 8. Write per-service feature documentation
  - 8.1 Create `docs/services.md` documenting all 13 services
    - For each service: purpose, entry point module path, configuration environment variables, database tables read/written, Redis queues consumed/published with message schemas
    - Include queue topology table (queue name → producer → consumer)
    - Document the three signal layers (company, macro, competitive) with data flow, toggles, and weight configurations
    - Document trading engine features: position sizing, circuit breakers, reserve pool, risk tier auto-adjustment, backtesting, notifications
    - Cross-reference API documentation for services with HTTP endpoints
    - Requirements: 6.1, 6.2, 6.3, 6.4, 6.5
- 9. Write API reference documentation
  - 9.1 Create `docs/api-reference.md` covering all four FastAPI services
    - Document all Query API endpoints (~40+): path, method, query parameters (type, default, constraints), request body schema, response schema, error codes
    - Document all Symbol Registry API endpoints: companies CRUD, aliases, watchlists, sources, exposure profiles, competitor relationships, competitor inference
    - Document all Trading API endpoints: health/readiness, engine status, config update, pause/resume, reset, decisions audit, performance metrics/history, backtesting, notifications config/history, override orders, debug state
    - Document all Risk API endpoints: order evaluation (POST /evaluate), health, pending approvals, approval review, approval expiration
    - Inspect actual route definitions in `services/api/app.py`, `services/symbol_registry/app.py`, `services/trading/app.py`, `services/risk/app.py`
    - Requirements: 7.1, 7.2, 7.3, 7.4, 7.5, 7.6
- 10. Write Helm chart configuration reference
  - 10.1 Create `docs/helm-reference.md` documenting all Helm values
    - Document `image` block: registry, pullPolicy, tag
    - Document `pipelineEnabled` toggle and effect on worker replicas
    - Document `services` block: per-service structure (replicas, image, command, tier, port, secrets, resources, probes)
    - Document `config` block: all ConfigMap environment variables with defaults and descriptions
    - Document `secrets` block: core, broker, market, gmail, dashboard; injection via `--set` flags
    - Document `ingress` block: className, clusterIssuer, host mappings
    - Document analytics stack toggles: trino.enabled, hiveMetastore.enabled, superset.enabled with resources
    - Document `networkPolicies.enabled` and default-deny-ingress behavior
    - Document value override files: `values-beta.yaml`, `values-paper.yaml` and deployment stages
    - Requirements: 8.1, 8.2, 8.3, 8.4, 8.5, 8.6, 8.7, 8.8, 8.9
- 11. Write Docker deployment guide
  - 11.1 Create `docs/docker-deployment.md` with complete Docker deployment guide
    - Document every service with image, ports, volumes, environment variables
    - Document `.env` file format with all required/optional variables, defaults, descriptions
    - Document volume mounts and data persistence (pgdata, miniodata, ollama_models, hive_data, superset_data), reset with `docker compose down -v`
    - Document health check configurations and verification commands
    - Document Dockerfile build arguments (`SERVICE_CMD`) and custom image builds
    - Document operational commands: start, stop, restart, logs, scale, reset
    - Requirements: 9.1, 9.2, 9.3, 9.4, 9.5, 9.6
- 12. Checkpoint: Documentation progress check
  - Verify `docs/services.md`, `docs/api-reference.md`, `docs/helm-reference.md`, `docs/docker-deployment.md` exist and render valid Markdown
  - Ensure all tests pass; ask the user if questions arise.
- 13. Write architecture diagrams
  - 13.1 Create `docs/architecture-kubernetes.md` with Kubernetes deployment Mermaid diagram
    - Show all 13 services in `stonks-oracle` namespace grouped by tier (api, processing, trading, orchestration, analytics, frontend)
    - Show external cluster services (PostgreSQL, Redis, MinIO, Ollama) in their namespaces
    - Show Traefik ingress routes to external domains
    - Show network policy boundaries
    - Show analytics plane (Trino, Hive Metastore, Superset) and MinIO connections
    - Show Helm-managed secrets (core, broker, market, gmail) with consumer mapping
    - Distinguish API-tier (with ingress), pipeline-tier (queue-driven), and trading-tier services
    - Requirements: 10.1, 10.2, 10.3, 10.4, 10.5, 10.6, 10.7
  - 13.2 Create `docs/architecture-docker-compose.md` with Docker Compose Mermaid diagram
    - Show all infrastructure + application containers
    - Show host port mappings for externally accessible services
    - Show `depends_on` relationships and health check dependencies
    - Show named volumes and mount points
    - Show `.env` file providing API keys to relevant containers
    - Show internal Docker network connectivity
    - Requirements: 11.1, 11.2, 11.3, 11.4, 11.5, 11.6
  - 13.3 Create `docs/architecture-data-pipeline.md` with data pipeline Mermaid diagram
    - Show complete pipeline: external sources → ingestion → parsing → extraction → aggregation → recommendation → risk → trading → broker
    - Show Redis queue topology with queue names
    - Show three signal layers as distinct paths merging at aggregation
    - Show data stores at each stage (MinIO, PostgreSQL, Redis)
    - Show trading engine decision loop
    - Show analytical branch: lake publisher → MinIO/Parquet → Trino → Superset/Dashboard
    - Show external integrations: Ollama, Alpaca, AWS SNS, Gmail
    - Requirements: 12.1, 12.2, 12.3, 12.4, 12.5, 12.6, 12.7
- 14. Write AI agent building guide
  - 14.1 Create `docs/ai-agents.md` with AI agent guide
    - Document three built-in agents (document-extractor, event-classifier, thesis-rewriter): purpose, input data, output schema, default model, system prompt structure, user prompt template
    - Document `ai_agents` table schema and registration (system-seeded vs API-created)
    - Document `agent_variants` table: create, activate, deactivate variants for A/B testing
    - Document `AgentConfigResolver` module: TTL cache (60s), COALESCE-based variant override, fallback behavior
    - Document performance logging: `agent_performance_log` table, querying for variant comparison
    - Document API endpoints: CRUD on `/api/agents`, test endpoint `/api/agents/{id}/test`
    - Include step-by-step guide: creating a new variant with different model/prompt and activating it
    - Requirements: 13.1, 13.2, 13.3, 13.4, 13.5, 13.6, 13.7, 13.8
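When documenting `AgentConfigResolver`, a small model of the three behaviors named in task 14.1 (TTL cache, COALESCE-style override, fallback) may help readers. The sketch below is not the project's module: the class name, the `load` callback, the config keys, and the choice to serve a stale entry when loading fails are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLConfigResolver:
    """Hypothetical resolver: caches the merged agent config for `ttl` seconds."""

    def __init__(
        self,
        load: Callable[[str], Tuple[dict, Optional[dict]]],
        ttl: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._load = load      # returns (base_config, active_variant_or_None)
        self._ttl = ttl
        self._clock = clock    # injectable so tests can control time
        self._cache: Dict[str, Tuple[float, dict]] = {}

    def resolve(self, agent: str) -> dict:
        now = self._clock()
        hit = self._cache.get(agent)
        if hit and hit[0] > now:
            return hit[1]      # still fresh: skip the database
        try:
            base, variant = self._load(agent)
        except Exception:
            # Assumed fallback: serve the stale entry if the load fails.
            if hit:
                return hit[1]
            raise
        config = dict(base)
        for key, value in (variant or {}).items():
            # COALESCE-style merge: a None variant field keeps the base value.
            if value is not None:
                config[key] = value
        self._cache[agent] = (now + self._ttl, config)
        return config


def fake_load(agent: str):
    """Stand-in for the database query; values are made up."""
    base = {"model": "base-model", "prompt": "base prompt"}
    variant = {"model": "variant-model", "prompt": None}
    return base, variant


resolver = TTLConfigResolver(fake_load)
print(resolver.resolve("document-extractor"))
```

Injecting the clock keeps expiry testable without sleeping, which is one reason to document the TTL as a constructor parameter rather than a module constant.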
- 15. Write backup and restore guide
  - 15.1 Create `docs/backup-restore.md` with backup and restore guide
    - Document all scripts in `scripts/`: `backup-db.sh`, `restore-db.sh`, `backup-redis.sh`, `backup.sh`, `restore.sh`
    - For each backup script: CLI arguments, data captured, storage location, retention/pruning (keeps last 7)
    - For each restore script: CLI arguments, what it restores, service scale-down/up procedure, data loss implications
    - Document MinIO upload option (`--upload-minio`) for off-host storage
    - Document full nuke-and-rebuild procedure: connection termination, DB drop, Redis flush, redeploy, re-seed
    - Document recommended backup schedules and automation (cron, Kubernetes CronJobs)
    - Requirements: 14.1, 14.2, 14.3, 14.4, 14.5, 14.6
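The "keeps last 7" retention rule in task 15.1 is easy to misread, so the guide may benefit from a precise statement of it. The sketch below models the rule in Python purely for illustration; the backup scripts themselves are shell scripts, and the file name pattern and the assumption that names embed a sortable timestamp are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List


def select_prunable(names: Iterable[str], keep: int = 7) -> List[str]:
    """Return the backups to delete, keeping the `keep` newest.

    Assumes file names embed a sortable timestamp, so lexicographic
    order matches chronological order.
    """
    ordered = sorted(names, reverse=True)  # newest first
    return ordered[keep:]


def prune(directory: Path, pattern: str = "stonks-*.sql.gz", keep: int = 7) -> None:
    """Delete every file `select_prunable` returns for `pattern` in `directory`."""
    candidates = (p.name for p in directory.glob(pattern))
    for name in select_prunable(candidates, keep):
        (directory / name).unlink()
```

Splitting selection from deletion means the retention rule can be checked without touching the filesystem.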
- 16. Write observability and metrics reference
  - 16.1 Create `docs/observability.md` with observability reference
    - Document `/metrics` endpoint on query-api and Prometheus scrape configuration
    - Document all Prometheus counters, gauges, histograms from `services/shared/metrics.py`: ingestion, parsing, extraction, aggregation, recommendation, lake, trading, alerting, DLQ, active jobs metrics with names, labels, descriptions
    - Document alerting module (`services/shared/alerting.py`): 4 alert rules, thresholds, evaluation windows, ConfigMap variables
    - Document structured JSON logging format, trace context (trace_id, span_id), log querying
    - Document dead-letter queue system: queue names (`stonks:dlq:<queue>`), routing, replay tooling
    - Document recommended Prometheus/Grafana queries for monitoring
    - Requirements: 15.1, 15.2, 15.3, 15.4, 15.5, 15.6
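For the structured logging and dead-letter queue items in task 16.1, a compact example of the shape being documented could be useful. The sketch below uses only the standard library and is an assumption about the format, not the project's logger: the `level`, `logger`, and `message` keys are hypothetical, while `trace_id`, `span_id`, and the `stonks:dlq:<queue>` pattern come from the plan.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Trace fields arrive via `extra=`; absent ones are emitted as null.
            "trace_id": getattr(record, "trace_id", None),
            "span_id": getattr(record, "span_id", None),
        }
        return json.dumps(payload)


def dlq_name(queue: str) -> str:
    """Build a dead-letter queue name following the stonks:dlq:<queue> pattern."""
    return f"stonks:dlq:{queue}"


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log = logging.getLogger("ingestion")
log.addHandler(handler)

# Emits one JSON line carrying the trace context of the failing job.
log.warning(
    "job routed to %s",
    dlq_name("ingestion"),
    extra={"trace_id": "abc123", "span_id": "def456"},
)
```

Emitting one JSON object per line keeps the logs queryable by `trace_id` with ordinary line-oriented tools.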
- 17. Update README with documentation links
  - 17.1 Update `README.md` with documentation section and resource links
    - Add "Documentation" section with links to all docs: services.md, api-reference.md, helm-reference.md, docker-deployment.md, architecture-kubernetes.md, architecture-docker-compose.md, architecture-data-pipeline.md, ai-agents.md, backup-restore.md, observability.md
    - Replace ASCII architecture diagram with Mermaid diagram or link to architecture diagram docs
    - Preserve all existing content: license, features, tech stack, project structure, deployment instructions
    - Requirements: 16.1, 16.2, 16.3, 16.4, 16.5, 16.6, 16.7, 16.8, 16.9, 16.10, 16.11
- 18. Final checkpoint: Full verification
  - Run `pytest tests/ -x --tb=short -q` and verify zero failures
  - Run `ruff check services/` and verify zero violations
  - Run `docker compose config` and verify it validates successfully
  - Verify all `test_pbt_*` files pass unchanged
  - Verify all documentation files exist in `docs/` and render valid Markdown
  - Ensure all tests pass; ask the user if questions arise.
Notes
- Tasks marked with `*` are optional and can be skipped for a faster MVP
- Each task references specific requirements for traceability
- Checkpoints ensure incremental validation
- No property-based tests are included; the design assessment confirmed PBT is not applicable to this feature
- Existing `test_pbt_*` files (22 files) must remain passing throughout
- The implementation language is Python (with Markdown for documentation), matching the existing codebase