feat: comprehensive docs, unit tests, docker-compose app services
- Add scheduler and ingestion unit tests (test_scheduler_unit.py, test_ingestion_unit.py)
- Add all 13 app services + dashboard to docker-compose.yml
- Add full documentation suite: API reference, Helm reference, Docker deployment guide, 3 architecture diagrams (K8s, Docker Compose, data pipeline), AI agent guide, backup/restore guide, observability/metrics reference, per-service docs
- Add intelligence pipeline deep-dive docs with Mermaid diagrams
- Update README with documentation index and links
- Add specs for comprehensive-quality-docs, intelligence-pipeline-deep-dive, sanitized-pipeline-docs
@@ -0,0 +1,223 @@
# Implementation Plan: Comprehensive Quality & Documentation

## Overview

This plan implements three pillars for the Stonks Oracle platform: (1) unit test coverage for the scheduler and ingestion services, plus fixes for pre-existing test failures; (2) an extended docker-compose.yml with all 13 application services and the frontend; and (3) comprehensive documentation covering services, APIs, Helm configuration, Docker deployment, architecture diagrams, AI agents, backup/restore, observability, and README resource links. Tasks are ordered so that tests come first (to catch regressions early), then Docker Compose (infrastructure), then documentation (which references verified code).

## Tasks

- [x] 1. Write scheduler service unit tests
  - [x] 1.1 Create `tests/test_scheduler_unit.py` with unit tests for scheduler pure functions and orchestration
    - Import scheduler functions from `services/scheduler/app.py`
    - Mock `asyncpg.Pool` (`.fetch()`, `.fetchrow()`, `.fetchval()`, `.execute()`) and `redis.asyncio.Redis` (`.rpush()`, `.set()`, `.get()`, `.incr()`, `.expire()`, `.decr()`, `.delete()`)
    - Write 8+ test cases covering: `get_cadence_for_source`, `compute_backoff`, `is_source_due`, `build_job_payload`, `schedule_cycle` (mocked DB/Redis), `check_rate_limit`, `recover_stale_documents`, `retry_failed_extractions`
    - Verify error handling: DB/Redis connection failures handled without crashing
    - Use `pytest-asyncio` for async test functions, `unittest.mock.AsyncMock` and `unittest.mock.patch`
    - _Requirements: 1.1, 1.2, 1.3, 1.4_

  - [x] 1.2 Write additional edge-case unit tests for scheduler
    - Test boundary conditions: zero polling interval, max retry count, empty source list
    - Test rate limiting edge cases: global Polygon limit, per-type limits
    - _Requirements: 1.3, 1.4_

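As a rough illustration of the mocking approach in task 1.1, the sketch below tests a rate-limit check against a mocked Redis client. The `check_rate_limit` body here is a hypothetical stand-in: its signature, key format, and 60-second window are assumptions, not the code in `services/scheduler/app.py`. `asyncio.run` is used in place of the `pytest-asyncio` marker only to keep the example free of third-party dependencies.

```python
import asyncio
from unittest.mock import AsyncMock


async def check_rate_limit(redis, source_type: str, limit: int) -> bool:
    """Hypothetical stand-in: allow a job if the per-type counter is within `limit`."""
    key = f"ratelimit:{source_type}"  # assumed key format
    count = await redis.incr(key)
    if count == 1:
        # First hit in the window: start the expiry clock (assumed 60 s window).
        await redis.expire(key, 60)
    if count > limit:
        # Over the limit: undo the increment and reject the job.
        await redis.decr(key)
        return False
    return True


def make_redis(incr_value: int) -> AsyncMock:
    """Build an AsyncMock Redis whose incr() returns a controlled counter value."""
    redis = AsyncMock()
    redis.incr.return_value = incr_value
    return redis


def test_allows_under_limit() -> None:
    redis = make_redis(1)
    assert asyncio.run(check_rate_limit(redis, "news", limit=5)) is True
    redis.expire.assert_awaited_once_with("ratelimit:news", 60)
    redis.decr.assert_not_awaited()


def test_rejects_over_limit() -> None:
    redis = make_redis(6)
    assert asyncio.run(check_rate_limit(redis, "news", limit=5)) is False
    redis.decr.assert_awaited_once_with("ratelimit:news")


if __name__ == "__main__":
    test_allows_under_limit()
    test_rejects_over_limit()
```

Because child attributes of an `AsyncMock` are themselves awaitable mocks, the test can assert on exactly which Redis calls were awaited without a running Redis instance.
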
- [x] 2. Write ingestion service unit tests
  - [x] 2.1 Create `tests/test_ingestion_unit.py` with unit tests for ingestion worker
    - Import ingestion functions from `services/ingestion/worker.py`
    - Mock adapters as `AsyncMock` returning `AdapterResult` with controlled `error`, `items`, `content_hash`, `raw_payload`
    - Mock `asyncpg.Pool` for `ingestion_runs` INSERT/UPDATE, `persist_ingestion_items`, `record_retrieval_failure`
    - Mock `redis.asyncio.Redis` for dedupe checks, queue pushes, DLQ routing
    - Mock `minio.Minio` for `upload_raw_artifact`
    - Write 6+ test cases covering: successful job processing, adapter error with retry, retry exhaustion → dead-letter queue, content hash deduplication skip, cross-source dedup via `dedupe_items`, error handling paths
    - _Requirements: 2.1, 2.2, 2.3, 2.4_

  - [x] 2.2 Write additional edge-case unit tests for ingestion
    - Test empty adapter response, partial failures, multiple items in single job
    - _Requirements: 2.1, 2.4_

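The retry and dead-letter cases in task 2.1 could be exercised along the following lines. Everything below is a sketch under stated assumptions: the `AdapterResult` dataclass only mirrors the four field names listed in the plan, and `process_job`, `adapter.fetch`, the queue name, and the retry budget of 3 are hypothetical stand-ins rather than the contents of `services/ingestion/worker.py`.

```python
import asyncio
import json
from dataclasses import dataclass, field
from typing import Any, Optional
from unittest.mock import AsyncMock

QUEUE = "ingestion"             # assumed queue name
DLQ = f"stonks:dlq:{QUEUE}"     # follows the stonks:dlq:<queue> pattern
MAX_RETRIES = 3                 # assumed retry budget


@dataclass
class AdapterResult:
    """Hypothetical stand-in exposing the fields the plan says to control."""
    error: Optional[str] = None
    items: list = field(default_factory=list)
    content_hash: Optional[str] = None
    raw_payload: Any = None


async def process_job(job: dict, adapter, redis) -> str:
    """Hypothetical worker step: returns 'ok', 'retry', or 'dead-letter'."""
    result = await adapter.fetch(job)
    if result.error is None:
        return "ok"
    attempt = job.get("attempt", 0) + 1
    payload = json.dumps({**job, "attempt": attempt, "error": result.error})
    if attempt >= MAX_RETRIES:
        # Retry budget exhausted: park the job on the dead-letter queue.
        await redis.rpush(DLQ, payload)
        return "dead-letter"
    # Budget remaining: requeue the job with the incremented attempt count.
    await redis.rpush(QUEUE, payload)
    return "retry"


def run(job: dict, result: AdapterResult):
    """Drive process_job with a mocked adapter and Redis; return outcome and Redis mock."""
    adapter, redis = AsyncMock(), AsyncMock()
    adapter.fetch.return_value = result
    outcome = asyncio.run(process_job(job, adapter, redis))
    return outcome, redis


def test_retry_then_dead_letter() -> None:
    failing = AdapterResult(error="timeout")
    outcome, redis = run({"source_id": 1, "attempt": 0}, failing)
    assert outcome == "retry"
    assert redis.rpush.await_args.args[0] == QUEUE

    outcome, redis = run({"source_id": 1, "attempt": 2}, failing)
    assert outcome == "dead-letter"
    assert redis.rpush.await_args.args[0] == DLQ


if __name__ == "__main__":
    test_retry_then_dead_letter()
```
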
- [x] 3. Checkpoint — Verify new unit tests pass
  - Run `pytest tests/test_scheduler_unit.py tests/test_ingestion_unit.py -x --tb=short -q`
  - Ensure all tests pass; ask the user if questions arise.

- [x] 4. Fix pre-existing test failures
  - [x] 4.1 Fix `tests/test_extractor_prompts.py`
    - Run the file individually to diagnose failures
    - Fix test setup (mock configuration, fixture data) or production code as needed
    - Preserve original test intent and assertions
    - If production code changes are needed, add regression tests
    - _Requirements: 3.1, 3.5_

  - [x] 4.2 Fix `tests/test_extractor_schemas.py`
    - Run the file individually to diagnose failures
    - Fix test setup or production code as needed
    - Preserve original test intent and assertions
    - _Requirements: 3.2, 3.5_

  - [x] 4.3 Fix `tests/test_ollama_client.py`
    - Run the file individually to diagnose failures
    - Fix test setup or production code as needed
    - Preserve original test intent and assertions
    - _Requirements: 3.3, 3.5_

  - [x] 4.4 Fix `tests/test_filings_adapter.py`
    - Run the file individually to diagnose failures
    - Fix test setup or production code as needed
    - Preserve original test intent and assertions
    - _Requirements: 3.4, 3.5_

- [x] 5. Checkpoint — Full test suite green
  - Run `pytest tests/ -x --tb=short -q` and verify zero failures
  - Run `ruff check services/` and verify zero violations
  - Verify all `test_pbt_*` files pass unchanged
  - If any production code was modified, confirm regression tests exist
  - Ensure all tests pass; ask the user if questions arise.
  - _Requirements: 4.1, 4.2, 4.3, 4.4_

- [x] 6. Add application services to docker-compose.yml
  - [x] 6.1 Add shared environment anchor and all 14 service definitions to `docker-compose.yml`
    - Define `x-app-env` YAML anchor with common environment variables (POSTGRES_HOST, POSTGRES_PORT, POSTGRES_DB, POSTGRES_USER, POSTGRES_PASSWORD, REDIS_HOST, REDIS_PORT, MINIO_ENDPOINT, MINIO_ACCESS_KEY, MINIO_SECRET_KEY, OLLAMA_BASE_URL)
    - Add 13 application service definitions: scheduler (using `docker/Dockerfile.scheduler`), symbol-registry, ingestion, parser, extractor, aggregation, recommendation, trading-engine, risk-engine, broker-adapter, lake-publisher, query-api; all except scheduler use `docker/Dockerfile` with the appropriate `SERVICE_CMD` build arg
    - Add dashboard service using `frontend/Dockerfile` with port mapping 3000:8080
    - Configure `depends_on` with `condition: service_healthy` for infrastructure dependencies
    - Add health checks: FastAPI services use `curl -f http://localhost:8000/health`, workers use process liveness
    - Configure `env_file: .env` on services needing API keys (ingestion, broker-adapter, trading-engine)
    - Map host ports: symbol-registry:8001, trading-engine:8002, risk-engine:8003, query-api:8004, dashboard:3000
    - _Requirements: 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7_

  - [x] 6.2 Validate docker-compose.yml configuration
    - Run `docker compose config` to verify the updated file parses correctly
    - _Requirements: 5.1_

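One way to back the `depends_on` requirement in task 6.1 with an automated check is to inspect the resolved configuration rather than the raw YAML. The sketch below assumes the JSON printed by `docker compose config --format json` has already been loaded into a dict; the helper name and the sample service entries are illustrative and are not taken from the project's compose file.

```python
from typing import Any, Dict, List, Tuple


def deps_without_health_gate(config: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (service, dependency) pairs not gated on `condition: service_healthy`."""
    problems: List[Tuple[str, str]] = []
    for name, service in config.get("services", {}).items():
        depends_on = service.get("depends_on", {})
        if isinstance(depends_on, list):
            # Short form (a plain list) carries no condition at all.
            problems.extend((name, dep) for dep in depends_on)
            continue
        for dep, spec in depends_on.items():
            if spec.get("condition") != "service_healthy":
                problems.append((name, dep))
    return problems


# Hand-written sample shaped like the resolved config; in practice it could be
# loaded with json.loads() from the output of `docker compose config --format json`.
sample = {
    "services": {
        "postgres": {},
        "query-api": {"depends_on": {"postgres": {"condition": "service_healthy"}}},
        "parser": {"depends_on": {"redis": {"condition": "service_started"}}},
    }
}
print(deps_without_health_gate(sample))  # [('parser', 'redis')]
```
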
- [x] 7. Checkpoint — Tests and Docker Compose validated
  - Run `pytest tests/ -x --tb=short -q` to confirm no regressions
  - Run `docker compose config` to confirm valid YAML
  - Ensure all tests pass; ask the user if questions arise.

- [x] 8. Write per-service feature documentation
  - [x] 8.1 Create `docs/services.md` documenting all 13 services
    - For each service: purpose, entry point module path, configuration environment variables, database tables read/written, Redis queues consumed/published with message schemas
    - Include queue topology table (queue name → producer → consumer)
    - Document the three signal layers (company, macro, competitive) with data flow, toggles, and weight configurations
    - Document trading engine features: position sizing, circuit breakers, reserve pool, risk tier auto-adjustment, backtesting, notifications
    - Cross-reference API documentation for services with HTTP endpoints
    - _Requirements: 6.1, 6.2, 6.3, 6.4, 6.5_

- [x] 9. Write API reference documentation
  - [x] 9.1 Create `docs/api-reference.md` covering all four FastAPI services
    - Document all Query API endpoints (~40+): path, method, query parameters (type, default, constraints), request body schema, response schema, error codes
    - Document all Symbol Registry API endpoints: companies CRUD, aliases, watchlists, sources, exposure profiles, competitor relationships, competitor inference
    - Document all Trading API endpoints: health/readiness, engine status, config update, pause/resume, reset, decisions audit, performance metrics/history, backtesting, notifications config/history, override orders, debug state
    - Document all Risk API endpoints: order evaluation (POST /evaluate), health, pending approvals, approval review, approval expiration
    - Inspect actual route definitions in `services/api/app.py`, `services/symbol_registry/app.py`, `services/trading/app.py`, `services/risk/app.py`
    - _Requirements: 7.1, 7.2, 7.3, 7.4, 7.5, 7.6_

- [x] 10. Write Helm chart configuration reference
  - [x] 10.1 Create `docs/helm-reference.md` documenting all Helm values
    - Document `image` block: registry, pullPolicy, tag
    - Document `pipelineEnabled` toggle and effect on worker replicas
    - Document `services` block: per-service structure (replicas, image, command, tier, port, secrets, resources, probes)
    - Document `config` block: all ConfigMap environment variables with defaults and descriptions
    - Document `secrets` block: core, broker, market, gmail, dashboard — injection via `--set` flags
    - Document `ingress` block: className, clusterIssuer, host mappings
    - Document analytics stack toggles: trino.enabled, hiveMetastore.enabled, superset.enabled with resources
    - Document `networkPolicies.enabled` and default-deny-ingress behavior
    - Document value override files: `values-beta.yaml`, `values-paper.yaml` and deployment stages
    - _Requirements: 8.1, 8.2, 8.3, 8.4, 8.5, 8.6, 8.7, 8.8, 8.9_

- [x] 11. Write Docker deployment guide
  - [x] 11.1 Create `docs/docker-deployment.md` with complete Docker deployment guide
    - Document every service with image, ports, volumes, environment variables
    - Document `.env` file format with all required/optional variables, defaults, descriptions
    - Document volume mounts and data persistence (pgdata, miniodata, ollama_models, hive_data, superset_data), reset with `docker compose down -v`
    - Document health check configurations and verification commands
    - Document Dockerfile build arguments (`SERVICE_CMD`) and custom image builds
    - Document operational commands: start, stop, restart, logs, scale, reset
    - _Requirements: 9.1, 9.2, 9.3, 9.4, 9.5, 9.6_

- [x] 12. Checkpoint — Documentation progress check
  - Verify `docs/services.md`, `docs/api-reference.md`, `docs/helm-reference.md`, `docs/docker-deployment.md` exist and render valid Markdown
  - Ensure all tests pass; ask the user if questions arise.

- [x] 13. Write architecture diagrams
  - [x] 13.1 Create `docs/architecture-kubernetes.md` with Kubernetes deployment Mermaid diagram
    - Show all 13 services in `stonks-oracle` namespace grouped by tier (api, processing, trading, orchestration, analytics, frontend)
    - Show external cluster services (PostgreSQL, Redis, MinIO, Ollama) in their namespaces
    - Show Traefik ingress routes to external domains
    - Show network policy boundaries
    - Show analytics plane (Trino, Hive Metastore, Superset) and MinIO connections
    - Show Helm-managed secrets (core, broker, market, gmail) with consumer mapping
    - Distinguish API-tier (with ingress), pipeline-tier (queue-driven), and trading-tier services
    - _Requirements: 10.1, 10.2, 10.3, 10.4, 10.5, 10.6, 10.7_

  - [x] 13.2 Create `docs/architecture-docker-compose.md` with Docker Compose Mermaid diagram
    - Show all infrastructure + application containers
    - Show host port mappings for externally accessible services
    - Show `depends_on` relationships and health check dependencies
    - Show named volumes and mount points
    - Show `.env` file providing API keys to relevant containers
    - Show internal Docker network connectivity
    - _Requirements: 11.1, 11.2, 11.3, 11.4, 11.5, 11.6_

  - [x] 13.3 Create `docs/architecture-data-pipeline.md` with data pipeline Mermaid diagram
    - Show complete pipeline: external sources → ingestion → parsing → extraction → aggregation → recommendation → risk → trading → broker
    - Show Redis queue topology with queue names
    - Show three signal layers as distinct paths merging at aggregation
    - Show data stores at each stage (MinIO, PostgreSQL, Redis)
    - Show trading engine decision loop
    - Show analytical branch: lake publisher → MinIO/Parquet → Trino → Superset/Dashboard
    - Show external integrations: Ollama, Alpaca, AWS SNS, Gmail
    - _Requirements: 12.1, 12.2, 12.3, 12.4, 12.5, 12.6, 12.7_

- [x] 14. Write AI agent building guide
  - [x] 14.1 Create `docs/ai-agents.md` with AI agent guide
    - Document three built-in agents: document-extractor, event-classifier, thesis-rewriter — purpose, input data, output schema, default model, system prompt structure, user prompt template
    - Document `ai_agents` table schema and registration (system-seeded vs API-created)
    - Document `agent_variants` table: create, activate, deactivate variants for A/B testing
    - Document `AgentConfigResolver` module: TTL cache (60s), COALESCE-based variant override, fallback behavior
    - Document performance logging: `agent_performance_log` table, querying for variant comparison
    - Document API endpoints: CRUD on `/api/agents`, test endpoint `/api/agents/{id}/test`
    - Include step-by-step guide: creating a new variant with different model/prompt and activating it
    - _Requirements: 13.1, 13.2, 13.3, 13.4, 13.5, 13.6, 13.7, 13.8_

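The resolver behaviour named in task 14.1 (a 60-second TTL cache, COALESCE-style variant overrides, and a fallback to the base configuration) can be sketched as follows. This is not the actual `AgentConfigResolver`: the class name, the `loader` callable, and the `model`/`system_prompt` field names are assumptions chosen for illustration, and the clock is injectable only so that expiry can be tested without sleeping.

```python
import time
from typing import Callable, Dict, Optional, Tuple

Config = Dict[str, Optional[str]]


def coalesce(variant: Optional[Config], base: Config) -> Config:
    """Per-field override: use the variant's value unless it is None (like SQL COALESCE)."""
    if variant is None:
        # Fallback: no active variant, so the base configuration applies unchanged.
        return dict(base)
    return {k: variant[k] if variant.get(k) is not None else v for k, v in base.items()}


class TtlConfigResolver:
    """Hypothetical sketch of a resolver with a time-to-live cache."""

    def __init__(
        self,
        loader: Callable[[str], Tuple[Config, Optional[Config]]],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader  # returns (base config, active variant or None)
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, Config]] = {}

    def resolve(self, agent: str) -> Config:
        now = self._clock()
        hit = self._cache.get(agent)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: skip the loader
        base, variant = self._loader(agent)
        merged = coalesce(variant, base)
        self._cache[agent] = (now, merged)
        return merged
```

With a cache like this, activating a new variant takes effect within one TTL window at most, which is the trade-off for avoiding a database lookup on every call.
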
- [x] 15. Write backup and restore guide
  - [x] 15.1 Create `docs/backup-restore.md` with backup and restore guide
    - Document all scripts in `scripts/`: `backup-db.sh`, `restore-db.sh`, `backup-redis.sh`, `backup.sh`, `restore.sh`
    - For each backup script: CLI arguments, data captured, storage location, retention/pruning (keeps last 7)
    - For each restore script: CLI arguments, what it restores, service scale-down/up procedure, data loss implications
    - Document MinIO upload option (`--upload-minio`) for off-host storage
    - Document full nuke-and-rebuild procedure: connection termination, DB drop, Redis flush, redeploy, re-seed
    - Document recommended backup schedules and automation (cron, Kubernetes CronJobs)
    - _Requirements: 14.1, 14.2, 14.3, 14.4, 14.5, 14.6_

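The "keeps last 7" retention rule in task 15.1 amounts to ordering backups by age and deleting all but the newest seven. The backup scripts themselves are shell scripts; the Python sketch below only illustrates the rule, and the `*.sql.gz` pattern and the use of file modification time as the ordering key are assumptions.

```python
from pathlib import Path
from typing import List


def prune_backups(directory: Path, pattern: str = "*.sql.gz", keep: int = 7) -> List[Path]:
    """Delete all but the `keep` newest files matching `pattern`; return the deleted paths."""
    # Newest first, ordered by modification time (file name breaks ties deterministically).
    backups = sorted(
        directory.glob(pattern),
        key=lambda p: (p.stat().st_mtime, p.name),
        reverse=True,
    )
    stale = backups[keep:]
    for path in stale:
        path.unlink()
    return stale
```
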
- [x] 16. Write observability and metrics reference
  - [x] 16.1 Create `docs/observability.md` with observability reference
    - Document `/metrics` endpoint on query-api and Prometheus scrape configuration
    - Document all Prometheus counters, gauges, histograms from `services/shared/metrics.py` — ingestion, parsing, extraction, aggregation, recommendation, lake, trading, alerting, DLQ, active jobs metrics with names, labels, descriptions
    - Document alerting module (`services/shared/alerting.py`): 4 alert rules, thresholds, evaluation windows, ConfigMap variables
    - Document structured JSON logging format, trace context (trace_id, span_id), log querying
    - Document dead-letter queue system: queue names (`stonks:dlq:<queue>`), routing, replay tooling
    - Document recommended Prometheus/Grafana queries for monitoring
    - _Requirements: 15.1, 15.2, 15.3, 15.4, 15.5, 15.6_

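To make the logging and dead-letter items in task 16.1 concrete, here is a minimal sketch of a structured JSON log line carrying trace context, together with the `stonks:dlq:<queue>` naming pattern. Only `trace_id`, `span_id`, and the DLQ pattern come from the plan; the formatter class, the other field names, and the sample values are assumptions and may not match the project's actual log format.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Hypothetical formatter: one JSON object per log line, with trace context."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Trace fields arrive via `extra=`; they are None on untraced records.
            "trace_id": getattr(record, "trace_id", None),
            "span_id": getattr(record, "span_id", None),
        }
        return json.dumps(entry)


def dlq_name(queue: str) -> str:
    """Dead-letter queue key for `queue`, following the stonks:dlq:<queue> pattern."""
    return f"stonks:dlq:{queue}"


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    log = logging.getLogger("ingestion")
    log.addHandler(handler)
    log.setLevel(logging.INFO)
    log.info(
        "routed job to %s",
        dlq_name("ingestion"),
        extra={"trace_id": "abc123", "span_id": "def456"},
    )
```

Emitting one JSON object per line keeps the logs greppable by `trace_id`, so a job can be followed across services.
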
- [x] 17. Update README with documentation links
  - [x] 17.1 Update `README.md` with documentation section and resource links
    - Add "Documentation" section with links to all docs: services.md, api-reference.md, helm-reference.md, docker-deployment.md, architecture-kubernetes.md, architecture-docker-compose.md, architecture-data-pipeline.md, ai-agents.md, backup-restore.md, observability.md
    - Replace ASCII architecture diagram with Mermaid diagram or link to architecture diagram docs
    - Preserve all existing content: license, features, tech stack, project structure, deployment instructions
    - _Requirements: 16.1, 16.2, 16.3, 16.4, 16.5, 16.6, 16.7, 16.8, 16.9, 16.10, 16.11_

- [x] 18. Final checkpoint — Full verification
  - Run `pytest tests/ -x --tb=short -q` — zero failures
  - Run `ruff check services/` — zero violations
  - Run `docker compose config` — validates successfully
  - Verify all `test_pbt_*` files pass unchanged
  - Verify all documentation files exist in `docs/` and render valid Markdown
  - Ensure all tests pass; ask the user if questions arise.

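The "documentation files exist" check in the final checkpoint could be scripted roughly as below. The file list is the one given in task 17.1; the helper name is hypothetical, and the check covers only presence and non-emptiness, not whether the Markdown renders correctly.

```python
from pathlib import Path
from typing import List

# File names as listed in task 17.1.
EXPECTED_DOCS = [
    "services.md",
    "api-reference.md",
    "helm-reference.md",
    "docker-deployment.md",
    "architecture-kubernetes.md",
    "architecture-docker-compose.md",
    "architecture-data-pipeline.md",
    "ai-agents.md",
    "backup-restore.md",
    "observability.md",
]


def missing_docs(docs_dir: Path) -> List[str]:
    """Return the expected documentation files that are absent or empty in `docs_dir`."""
    return [
        name
        for name in EXPECTED_DOCS
        if not (docs_dir / name).is_file() or (docs_dir / name).stat().st_size == 0
    ]


if __name__ == "__main__":
    missing = missing_docs(Path("docs"))
    print("missing:", missing or "none")
```
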
## Notes

- Tasks marked with `*` are optional and can be skipped for faster MVP
- Each task references specific requirements for traceability
- Checkpoints ensure incremental validation
- No property-based tests are included — the design assessment confirmed PBT is not applicable to this feature
- Existing `test_pbt_*` files (22 files) must remain passing throughout
- The implementation language is Python (with Markdown for documentation), matching the existing codebase