Celes Renata 88ad1e8d99 feat: comprehensive docs, unit tests, docker-compose app services
- Add scheduler and ingestion unit tests (test_scheduler_unit.py, test_ingestion_unit.py)
- Add all 13 app services + dashboard to docker-compose.yml
- Add full documentation suite: API reference, Helm reference, Docker deployment guide,
  3 architecture diagrams (K8s, Docker Compose, data pipeline), AI agent guide,
  backup/restore guide, observability/metrics reference, per-service docs
- Add intelligence pipeline deep-dive docs with Mermaid diagrams
- Update README with documentation index and links
- Add specs for comprehensive-quality-docs, intelligence-pipeline-deep-dive,
  sanitized-pipeline-docs
2026-04-22 02:56:41 +00:00


Implementation Plan: Comprehensive Quality & Documentation

Overview

This plan implements three pillars for the Stonks Oracle platform: (1) unit test coverage for the scheduler and ingestion services, plus fixes for pre-existing test failures; (2) an extended docker-compose.yml with all 13 application services and the frontend; and (3) comprehensive documentation covering services, APIs, Helm configuration, Docker deployment, architecture diagrams, AI agents, backup/restore, observability, and README resource links. Tasks are ordered so that tests come first (to catch regressions early), then Docker Compose (infrastructure), then documentation (so the docs reference verified code).

Tasks

  • 1. Write scheduler service unit tests

    • 1.1 Create tests/test_scheduler_unit.py with unit tests for scheduler pure functions and orchestration

      • Import scheduler functions from services/scheduler/app.py
      • Mock asyncpg.Pool (.fetch(), .fetchrow(), .fetchval(), .execute()) and redis.asyncio.Redis (.rpush(), .set(), .get(), .incr(), .expire(), .decr(), .delete())
      • Write 8+ test cases covering: get_cadence_for_source, compute_backoff, is_source_due, build_job_payload, schedule_cycle (mocked DB/Redis), check_rate_limit, recover_stale_documents, retry_failed_extractions
      • Verify error handling: DB/Redis connection failures handled without crashing
      • Use pytest-asyncio for async test functions, unittest.mock.AsyncMock and unittest.mock.patch
      • Requirements: 1.1, 1.2, 1.3, 1.4
    • 1.2 Write additional edge-case unit tests for scheduler

      • Test boundary conditions: zero polling interval, max retry count, empty source list
      • Test rate limiting edge cases: global Polygon limit, per-type limits
      • Requirements: 1.3, 1.4
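As an illustration of the mocking style task 1 describes, here is a minimal sketch. The two functions under test are hypothetical stand-ins written for this example; the real signatures in services/scheduler/app.py may differ. The sketch also uses asyncio.run and the builtin ConnectionError to stay stdlib-only, whereas the plan calls for pytest-asyncio and the Redis client's own error type would apply in practice.

```python
import asyncio
from unittest.mock import AsyncMock


def compute_backoff(retry_count: int, base_seconds: float = 30.0,
                    cap_seconds: float = 3600.0) -> float:
    """Hypothetical stand-in: exponential backoff capped at cap_seconds."""
    return min(cap_seconds, base_seconds * (2 ** retry_count))


async def check_rate_limit(redis, key: str, limit: int, window_seconds: int) -> bool:
    """Hypothetical stand-in: fixed-window counter built on INCR + EXPIRE."""
    try:
        count = await redis.incr(key)
        if count == 1:
            # First hit in this window: start the expiry clock.
            await redis.expire(key, window_seconds)
    except ConnectionError:
        # Assumed policy: fail closed (deny) instead of crashing the cycle.
        return False
    return count <= limit


def make_redis(incr_value: int) -> AsyncMock:
    """Build a Redis double whose incr() resolves to a fixed counter value."""
    redis = AsyncMock()
    redis.incr.return_value = incr_value
    return redis


def test_backoff_is_capped():
    assert compute_backoff(0) == 30.0
    assert compute_backoff(3) == 240.0
    assert compute_backoff(20) == 3600.0  # far past the cap


def test_first_hit_sets_expiry():
    redis = make_redis(1)
    allowed = asyncio.run(check_rate_limit(redis, "rl:polygon", 5, 60))
    assert allowed is True
    redis.expire.assert_awaited_once_with("rl:polygon", 60)


def test_over_limit_is_rejected():
    redis = make_redis(6)
    allowed = asyncio.run(check_rate_limit(redis, "rl:polygon", 5, 60))
    assert allowed is False
    redis.expire.assert_not_awaited()


def test_connection_failure_does_not_crash():
    redis = AsyncMock()
    redis.incr.side_effect = ConnectionError("redis down")
    assert asyncio.run(check_rate_limit(redis, "rl:polygon", 5, 60)) is False
```

The key name "rl:polygon" and the default backoff values are placeholders chosen for the example, not values taken from the codebase.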
  • 2. Write ingestion service unit tests

    • 2.1 Create tests/test_ingestion_unit.py with unit tests for ingestion worker

      • Import ingestion functions from services/ingestion/worker.py
      • Mock adapters as AsyncMock returning AdapterResult with controlled error, items, content_hash, raw_payload
      • Mock asyncpg.Pool for ingestion_runs INSERT/UPDATE, persist_ingestion_items, record_retrieval_failure
      • Mock redis.asyncio.Redis for dedupe checks, queue pushes, DLQ routing
      • Mock minio.Minio for upload_raw_artifact
      • Write 6+ test cases covering: successful job processing, adapter error with retry, retry exhaustion → dead-letter queue, content hash deduplication skip, cross-source dedup via dedupe_items, error handling paths
      • Requirements: 2.1, 2.2, 2.3, 2.4
    • 2.2 Write additional edge-case unit tests for ingestion

      • Test empty adapter response, partial failures, multiple items in single job
      • Requirements: 2.1, 2.4
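The retry, dead-letter, and deduplication cases listed in task 2 could be exercised roughly as follows. This is a sketch under stated assumptions: process_job, the AdapterResult dataclass, the adapter's fetch method, the retry budget, and the retry queue name are all hypothetical, and only the stonks:dlq:<queue> naming pattern comes from the plan itself.

```python
import asyncio
import json
from dataclasses import dataclass, field
from typing import Optional
from unittest.mock import AsyncMock

MAX_ATTEMPTS = 3                         # assumed retry budget
RETRY_QUEUE = "stonks:queue:ingestion"   # hypothetical queue name
DLQ = "stonks:dlq:ingestion"             # follows the stonks:dlq:<queue> pattern


@dataclass
class AdapterResult:
    """Hypothetical shape mirroring the fields named in the plan."""
    items: list = field(default_factory=list)
    error: Optional[str] = None
    content_hash: Optional[str] = None
    raw_payload: bytes = b""


async def process_job(job: dict, adapter, redis) -> str:
    """Hypothetical worker step: returns 'ok', 'retry', 'dead', or 'duplicate'."""
    result = await adapter.fetch(job["source_id"])
    if result.error is not None:
        attempt = job.get("attempt", 0) + 1
        # Exhausted retries go to the dead-letter queue, others are requeued.
        target = DLQ if attempt >= MAX_ATTEMPTS else RETRY_QUEUE
        await redis.rpush(target, json.dumps({**job, "attempt": attempt}))
        return "dead" if target == DLQ else "retry"
    # SET with nx=True resolves to None when the key already exists.
    fresh = await redis.set(f"dedupe:{result.content_hash}", "1", nx=True)
    return "ok" if fresh else "duplicate"


def run_case(job: dict, result: AdapterResult, set_returns=True):
    """Run one job against mocked adapter and Redis; return outcome and the Redis mock."""
    adapter = AsyncMock()
    adapter.fetch.return_value = result
    redis = AsyncMock()
    redis.set.return_value = set_returns
    outcome = asyncio.run(process_job(job, adapter, redis))
    return outcome, redis
```

A test then calls run_case with a controlled AdapterResult and asserts on both the returned outcome and the calls recorded on the Redis mock, for example that redis.rpush was awaited with the DLQ name once the attempt count reaches the limit.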
  • 3. Checkpoint — Verify new unit tests pass

    • Run pytest tests/test_scheduler_unit.py tests/test_ingestion_unit.py -x --tb=short -q
    • Ensure all tests pass; ask the user if questions arise.
  • 4. Fix pre-existing test failures

    • 4.1 Fix tests/test_extractor_prompts.py

      • Run the file individually to diagnose failures
      • Fix test setup (mock configuration, fixture data) or production code as needed
      • Preserve original test intent and assertions
      • If production code changes are needed, add regression tests
      • Requirements: 3.1, 3.5
    • 4.2 Fix tests/test_extractor_schemas.py

      • Run the file individually to diagnose failures
      • Fix test setup or production code as needed
      • Preserve original test intent and assertions
      • Requirements: 3.2, 3.5
    • 4.3 Fix tests/test_ollama_client.py

      • Run the file individually to diagnose failures
      • Fix test setup or production code as needed
      • Preserve original test intent and assertions
      • Requirements: 3.3, 3.5
    • 4.4 Fix tests/test_filings_adapter.py

      • Run the file individually to diagnose failures
      • Fix test setup or production code as needed
      • Preserve original test intent and assertions
      • Requirements: 3.4, 3.5
  • 5. Checkpoint — Full test suite green

    • Run pytest tests/ -x --tb=short -q and verify zero failures
    • Run ruff check services/ and verify zero violations
    • Verify all test_pbt_* files pass unchanged
    • If any production code was modified, confirm regression tests exist
    • Ensure all tests pass, ask the user if questions arise.
    • Requirements: 4.1, 4.2, 4.3, 4.4
  • 6. Add application services to docker-compose.yml

    • 6.1 Add shared environment anchor and all 14 service definitions to docker-compose.yml

      • Define x-app-env YAML anchor with common environment variables (POSTGRES_HOST, POSTGRES_PORT, POSTGRES_DB, POSTGRES_USER, POSTGRES_PASSWORD, REDIS_HOST, REDIS_PORT, MINIO_ENDPOINT, MINIO_ACCESS_KEY, MINIO_SECRET_KEY, OLLAMA_BASE_URL)
      • Add 13 application service definitions: scheduler (using docker/Dockerfile.scheduler), symbol-registry, ingestion, parser, extractor, aggregation, recommendation, trading-engine, risk-engine, broker-adapter, lake-publisher, query-api. Every service other than scheduler uses docker/Dockerfile with the appropriate SERVICE_CMD build arg
      • Add dashboard service using frontend/Dockerfile on port 3000:8080
      • Configure depends_on with condition: service_healthy for infrastructure dependencies
      • Add health checks: FastAPI services use curl -f http://localhost:8000/health; workers use a process liveness check
      • Configure env_file: .env on services needing API keys (ingestion, broker-adapter, trading-engine)
      • Map host ports: symbol-registry:8001, trading-engine:8002, risk-engine:8003, query-api:8004, dashboard:3000
      • Requirements: 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7
    • 6.2 Validate docker-compose.yml configuration

      • Run docker compose config to verify the updated file parses correctly
      • Requirements: 5.1
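The host port mapping in task 6.1 can be sanity-checked with a few lines of Python before running docker compose config. This sketch assumes the FastAPI containers listen on port 8000, which is inferred from the health check URL and is not stated outright; the helper names are hypothetical.

```python
CONTAINER_PORT = 8000  # assumption: inferred from the /health check URL

# Host ports as listed in the plan.
HOST_PORTS = {
    "symbol-registry": 8001,
    "trading-engine": 8002,
    "risk-engine": 8003,
    "query-api": 8004,
}


def port_mappings() -> dict:
    """Build 'host:container' strings in the form docker-compose expects."""
    mappings = {name: f"{host}:{CONTAINER_PORT}" for name, host in HOST_PORTS.items()}
    mappings["dashboard"] = "3000:8080"  # stated explicitly in the plan
    return mappings


def find_host_port_conflicts(mappings: dict) -> list:
    """Return (first_service, second_service, host_port) for each duplicated host port."""
    seen = {}
    conflicts = []
    for service, mapping in mappings.items():
        host = mapping.split(":")[0]
        if host in seen:
            conflicts.append((seen[host], service, host))
        else:
            seen[host] = service
    return conflicts
```

An empty list from find_host_port_conflicts(port_mappings()) means no two services compete for the same host port.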
  • 7. Checkpoint — Tests and Docker Compose validated

    • Run pytest tests/ -x --tb=short -q to confirm no regressions
    • Run docker compose config to confirm valid YAML
    • Ensure all tests pass, ask the user if questions arise.
  • 8. Write per-service feature documentation

    • 8.1 Create docs/services.md documenting all 13 services
      • For each service: purpose, entry point module path, configuration environment variables, database tables read/written, Redis queues consumed/published with message schemas
      • Include queue topology table (queue name → producer → consumer)
      • Document the three signal layers (company, macro, competitive) with data flow, toggles, and weight configurations
      • Document trading engine features: position sizing, circuit breakers, reserve pool, risk tier auto-adjustment, backtesting, notifications
      • Cross-reference API documentation for services with HTTP endpoints
      • Requirements: 6.1, 6.2, 6.3, 6.4, 6.5
  • 9. Write API reference documentation

    • 9.1 Create docs/api-reference.md covering all four FastAPI services
      • Document all Query API endpoints (~40+): path, method, query parameters (type, default, constraints), request body schema, response schema, error codes
      • Document all Symbol Registry API endpoints: companies CRUD, aliases, watchlists, sources, exposure profiles, competitor relationships, competitor inference
      • Document all Trading API endpoints: health/readiness, engine status, config update, pause/resume, reset, decisions audit, performance metrics/history, backtesting, notifications config/history, override orders, debug state
      • Document all Risk API endpoints: order evaluation (POST /evaluate), health, pending approvals, approval review, approval expiration
      • Inspect actual route definitions in services/api/app.py, services/symbol_registry/app.py, services/trading/app.py, services/risk/app.py
      • Requirements: 7.1, 7.2, 7.3, 7.4, 7.5, 7.6
  • 10. Write Helm chart configuration reference

    • 10.1 Create docs/helm-reference.md documenting all Helm values
      • Document image block: registry, pullPolicy, tag
      • Document pipelineEnabled toggle and effect on worker replicas
      • Document services block: per-service structure (replicas, image, command, tier, port, secrets, resources, probes)
      • Document config block: all ConfigMap environment variables with defaults and descriptions
      • Document secrets block: core, broker, market, gmail, dashboard — injection via --set flags
      • Document ingress block: className, clusterIssuer, host mappings
      • Document analytics stack toggles: trino.enabled, hiveMetastore.enabled, superset.enabled with resources
      • Document networkPolicies.enabled and default-deny-ingress behavior
      • Document value override files: values-beta.yaml, values-paper.yaml and deployment stages
      • Requirements: 8.1, 8.2, 8.3, 8.4, 8.5, 8.6, 8.7, 8.8, 8.9
  • 11. Write Docker deployment guide

    • 11.1 Create docs/docker-deployment.md with complete Docker deployment guide
      • Document every service with image, ports, volumes, environment variables
      • Document .env file format with all required/optional variables, defaults, descriptions
      • Document volume mounts and data persistence (pgdata, miniodata, ollama_models, hive_data, superset_data), reset with docker compose down -v
      • Document health check configurations and verification commands
      • Document Dockerfile build arguments (SERVICE_CMD) and custom image builds
      • Document operational commands: start, stop, restart, logs, scale, reset
      • Requirements: 9.1, 9.2, 9.3, 9.4, 9.5, 9.6
  • 12. Checkpoint — Documentation progress check

    • Verify docs/services.md, docs/api-reference.md, docs/helm-reference.md, docs/docker-deployment.md exist and render valid Markdown
    • Ensure all tests pass, ask the user if questions arise.
  • 13. Write architecture diagrams

    • 13.1 Create docs/architecture-kubernetes.md with Kubernetes deployment Mermaid diagram

      • Show all 13 services in stonks-oracle namespace grouped by tier (api, processing, trading, orchestration, analytics, frontend)
      • Show external cluster services (PostgreSQL, Redis, MinIO, Ollama) in their namespaces
      • Show Traefik ingress routes to external domains
      • Show network policy boundaries
      • Show analytics plane (Trino, Hive Metastore, Superset) and MinIO connections
      • Show Helm-managed secrets (core, broker, market, gmail) with consumer mapping
      • Distinguish API-tier (with ingress), pipeline-tier (queue-driven), and trading-tier services
      • Requirements: 10.1, 10.2, 10.3, 10.4, 10.5, 10.6, 10.7
    • 13.2 Create docs/architecture-docker-compose.md with Docker Compose Mermaid diagram

      • Show all infrastructure + application containers
      • Show host port mappings for externally accessible services
      • Show depends_on relationships and health check dependencies
      • Show named volumes and mount points
      • Show .env file providing API keys to relevant containers
      • Show internal Docker network connectivity
      • Requirements: 11.1, 11.2, 11.3, 11.4, 11.5, 11.6
    • 13.3 Create docs/architecture-data-pipeline.md with data pipeline Mermaid diagram

      • Show complete pipeline: external sources → ingestion → parsing → extraction → aggregation → recommendation → risk → trading → broker
      • Show Redis queue topology with queue names
      • Show three signal layers as distinct paths merging at aggregation
      • Show data stores at each stage (MinIO, PostgreSQL, Redis)
      • Show trading engine decision loop
      • Show analytical branch: lake publisher → MinIO/Parquet → Trino → Superset/Dashboard
      • Show external integrations: Ollama, Alpaca, AWS SNS, Gmail
      • Requirements: 12.1, 12.2, 12.3, 12.4, 12.5, 12.6, 12.7
  • 14. Write AI agent building guide

    • 14.1 Create docs/ai-agents.md with AI agent guide
      • Document three built-in agents: document-extractor, event-classifier, thesis-rewriter — purpose, input data, output schema, default model, system prompt structure, user prompt template
      • Document ai_agents table schema and registration (system-seeded vs API-created)
      • Document agent_variants table: create, activate, deactivate variants for A/B testing
      • Document AgentConfigResolver module: TTL cache (60s), COALESCE-based variant override, fallback behavior
      • Document performance logging: agent_performance_log table, querying for variant comparison
      • Document API endpoints: CRUD on /api/agents, test endpoint /api/agents/{id}/test
      • Include step-by-step guide: creating a new variant with different model/prompt and activating it
      • Requirements: 13.1, 13.2, 13.3, 13.4, 13.5, 13.6, 13.7, 13.8
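The resolver behaviour named in task 14.1 (a 60-second TTL cache plus COALESCE-style variant override) can be pictured with the sketch below. It is an illustration only: the class name, the loader callback, and the config keys are hypothetical, and the real AgentConfigResolver presumably performs the COALESCE in SQL.

```python
import time
from typing import Callable, Optional, Tuple


def coalesce(*values):
    """Return the first value that is not None, like SQL COALESCE."""
    for value in values:
        if value is not None:
            return value
    return None


class TTLAgentConfigCache:
    """Hypothetical resolver: variant fields override base fields, cached per agent."""

    def __init__(self, loader: Callable[[str], Tuple[dict, Optional[dict]]],
                 ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._loader = loader      # returns (base_config, active_variant_or_None)
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so tests can control time
        self._entries = {}         # agent name -> (loaded_at, merged config)

    def resolve(self, agent_name: str) -> dict:
        now = self._clock()
        cached = self._entries.get(agent_name)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        base, variant = self._loader(agent_name)
        variant = variant or {}
        # A variant column left as None falls back to the base agent's value.
        merged = {key: coalesce(variant.get(key), base_value)
                  for key, base_value in base.items()}
        self._entries[agent_name] = (now, merged)
        return merged
```

Injecting the clock keeps the TTL testable without sleeping: a test can advance a fake clock past 60 seconds and assert that the loader is called a second time.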
  • 15. Write backup and restore guide

    • 15.1 Create docs/backup-restore.md with backup and restore guide
      • Document all scripts in scripts/: backup-db.sh, restore-db.sh, backup-redis.sh, backup.sh, restore.sh
      • For each backup script: CLI arguments, data captured, storage location, retention/pruning (keeps last 7)
      • For each restore script: CLI arguments, what it restores, service scale-down/up procedure, data loss implications
      • Document MinIO upload option (--upload-minio) for off-host storage
      • Document full nuke-and-rebuild procedure: connection termination, DB drop, Redis flush, redeploy, re-seed
      • Document recommended backup schedules and automation (cron, Kubernetes CronJobs)
      • Requirements: 14.1, 14.2, 14.3, 14.4, 14.5, 14.6
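The "keeps last 7" retention rule in task 15.1 amounts to sorting backups newest-first and deleting everything past the seventh. The backup scripts themselves are shell scripts, so the Python below is only a sketch of the logic; the file pattern and function names are assumptions.

```python
import os
import tempfile
from pathlib import Path


def prune_backups(directory, pattern: str = "*.sql.gz", keep: int = 7) -> list:
    """Delete all but the `keep` most recently modified backups; return removed names."""
    backups = sorted(Path(directory).glob(pattern),
                     key=lambda p: p.stat().st_mtime,
                     reverse=True)  # newest first
    removed = []
    for old in backups[keep:]:
        old.unlink()
        removed.append(old.name)
    return sorted(removed)


def demo():
    """Create ten dummy backups with increasing mtimes, prune, and report the result."""
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(10):
            path = Path(tmp) / f"backup-{i:02d}.sql.gz"
            path.write_text("placeholder")
            os.utime(path, (1000 + i, 1000 + i))  # backup-00 is the oldest
        removed = prune_backups(tmp)
        remaining = sorted(p.name for p in Path(tmp).glob("*.sql.gz"))
        return removed, remaining
```

In demo(), the three oldest files are the ones removed and the seven newest remain.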
  • 16. Write observability and metrics reference

    • 16.1 Create docs/observability.md with observability reference
      • Document /metrics endpoint on query-api and Prometheus scrape configuration
      • Document all Prometheus counters, gauges, histograms from services/shared/metrics.py — ingestion, parsing, extraction, aggregation, recommendation, lake, trading, alerting, DLQ, active jobs metrics with names, labels, descriptions
      • Document alerting module (services/shared/alerting.py): 4 alert rules, thresholds, evaluation windows, ConfigMap variables
      • Document structured JSON logging format, trace context (trace_id, span_id), log querying
      • Document dead-letter queue system: queue names (stonks:dlq:<queue>), routing, replay tooling
      • Document recommended Prometheus/Grafana queries for monitoring
      • Requirements: 15.1, 15.2, 15.3, 15.4, 15.5, 15.6
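Two of the items in task 16.1 are easy to picture in code: the stonks:dlq:<queue> naming pattern and a structured JSON log line carrying trace_id and span_id. The sketch below uses only the standard logging module; the formatter class and the other field names (level, logger, message) are assumptions, not the project's actual log schema.

```python
import json
import logging


def dlq_name(queue: str) -> str:
    """Build a dead-letter queue name following the stonks:dlq:<queue> pattern."""
    return f"stonks:dlq:{queue}"


class JsonFormatter(logging.Formatter):
    """Hypothetical formatter: one JSON object per log record."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Trace context is read from attributes supplied via `extra=`.
            "trace_id": getattr(record, "trace_id", None),
            "span_id": getattr(record, "span_id", None),
        }
        return json.dumps(payload)


def build_logger(name: str) -> logging.Logger:
    """Attach the JSON formatter to a stream handler on the named logger."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

A caller would then log with trace context attached, for example `build_logger("ingestion").info("job done", extra={"trace_id": "t-1", "span_id": "s-1"})`, and query the output by filtering on the trace_id field.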
  • 17. Update README with documentation links

    • 17.1 Update README.md with documentation section and resource links
      • Add "Documentation" section with links to all docs: services.md, api-reference.md, helm-reference.md, docker-deployment.md, architecture-kubernetes.md, architecture-docker-compose.md, architecture-data-pipeline.md, ai-agents.md, backup-restore.md, observability.md
      • Replace ASCII architecture diagram with Mermaid diagram or link to architecture diagram docs
      • Preserve all existing content: license, features, tech stack, project structure, deployment instructions
      • Requirements: 16.1, 16.2, 16.3, 16.4, 16.5, 16.6, 16.7, 16.8, 16.9, 16.10, 16.11
  • 18. Final checkpoint — Full verification

    • Run pytest tests/ -x --tb=short -q — zero failures
    • Run ruff check services/ — zero violations
    • Run docker compose config — validates successfully
    • Verify all test_pbt_* files pass unchanged
    • Verify all documentation files exist in docs/ and render valid Markdown
    • Ensure all tests pass, ask the user if questions arise.

Notes

  • Tasks marked with * are optional and can be skipped for a faster MVP
  • Each task references specific requirements for traceability
  • Checkpoints ensure incremental validation
  • No property-based tests are included — the design assessment confirmed PBT is not applicable to this feature
  • Existing test_pbt_* files (22 files) must remain passing throughout
  • The implementation language is Python (with Markdown for documentation), matching the existing codebase