feat: comprehensive docs, unit tests, docker-compose app services

- Add scheduler and ingestion unit tests (test_scheduler_unit.py, test_ingestion_unit.py)
- Add all 13 app services + dashboard to docker-compose.yml
- Add full documentation suite: API reference, Helm reference, Docker deployment guide, 3 architecture diagrams (K8s, Docker Compose, data pipeline), AI agent guide, backup/restore guide, observability/metrics reference, per-service docs
- Add intelligence pipeline deep-dive docs with Mermaid diagrams
- Update README with documentation index and links
- Add specs for comprehensive-quality-docs, intelligence-pipeline-deep-dive, sanitized-pipeline-docs
@@ -0,0 +1 @@
|
||||
{"specId": "e433350c-baf0-4f4f-a30e-3724f6654090", "workflowType": "requirements-first", "specType": "feature"}
|
||||
@@ -0,0 +1,377 @@
|
||||
# Design Document: Comprehensive Quality & Documentation
|
||||
|
||||
## Overview
|
||||
|
||||
This design covers three pillars for the Stonks Oracle platform:
|
||||
|
||||
1. **Test Coverage** — Close unit test gaps in the scheduler and ingestion services, fix pre-existing test failures in the extractor module, and achieve a fully green test suite (Requirements 1–4).
|
||||
2. **Docker Deployment** — Extend `docker-compose.yml` to include all 13 application services plus the frontend, enabling full-platform local development without Kubernetes (Requirement 5).
|
||||
3. **Documentation** — Produce comprehensive documentation covering per-service features, API references, Helm chart configuration, Docker deployment, three Mermaid architecture diagrams, AI agent building, backup/restore, observability, and README resource links (Requirements 6–16).
|
||||
|
||||
### Design Rationale
|
||||
|
||||
The platform has mature production code across 13 services but uneven test coverage and documentation. The scheduler and ingestion services lack dedicated unit tests — their logic is only exercised through integration tests. Four extractor-related test files have pre-existing failures that block CI. Documentation exists only as a local dev setup guide, a pipeline overview, and a runbook. This initiative fills those gaps systematically.
|
||||
|
||||
The approach prioritizes:
|
||||
- **Test isolation**: Mock all external dependencies (PostgreSQL, Redis, MinIO, Ollama) so unit tests run fast and deterministically.
|
||||
- **Documentation from source**: Generate API references by inspecting actual FastAPI route definitions, Helm values from `values.yaml`, and metrics from `services/shared/metrics.py`.
|
||||
- **Docker parity with Kubernetes**: Mirror the Helm chart's service definitions in Docker Compose so both deployment modes stay in sync.
|
||||
|
||||
## Architecture
|
||||
|
||||
The work does not change the platform's runtime architecture. It adds:
|
||||
|
||||
1. **New test files** in `tests/` for scheduler and ingestion unit tests.
|
||||
2. **Fixes** to existing test files and/or production code to resolve failures.
|
||||
3. **New service definitions** in `docker-compose.yml` using the existing `docker/Dockerfile` with `SERVICE_CMD` build args.
|
||||
4. **New documentation files** in `docs/` organized by topic.
|
||||
5. **Updated `README.md`** with a documentation index and Mermaid diagram.
|
||||
|
||||
```mermaid
graph TD
    subgraph "Test Coverage (Reqs 1-4)"
        T1[tests/test_scheduler_unit.py]
        T2[tests/test_ingestion_unit.py]
        T3[Fix test_extractor_prompts.py]
        T4[Fix test_extractor_schemas.py]
        T5[Fix test_ollama_client.py]
        T6[Fix test_filings_adapter.py]
    end

    subgraph "Docker (Req 5)"
        D1[docker-compose.yml<br/>+ 13 app services + frontend]
    end

    subgraph "Documentation (Reqs 6-16)"
        DOC1[docs/services.md]
        DOC2[docs/api-reference.md]
        DOC3[docs/helm-reference.md]
        DOC4[docs/docker-deployment.md]
        DOC5[docs/architecture-kubernetes.md]
        DOC6[docs/architecture-docker-compose.md]
        DOC7[docs/architecture-data-pipeline.md]
        DOC8[docs/ai-agents.md]
        DOC9[docs/backup-restore.md]
        DOC10[docs/observability.md]
        DOC11[README.md update]
    end
```
|
||||
|
||||
## Components and Interfaces
|
||||
|
||||
### 1. Scheduler Unit Tests (Requirement 1)
|
||||
|
||||
**Target module**: `services/scheduler/app.py`
|
||||
|
||||
**Functions to test in isolation**:
|
||||
- `get_cadence_for_source(source_type, config)` — Returns polling interval from config or defaults.
|
||||
- `compute_backoff(retry_count)` — Exponential backoff with cap.
|
||||
- `is_source_due(...)` — Core scheduling logic: determines if a source needs polling based on last run status, timing, retry state.
|
||||
- `build_job_payload(source, aliases, now)` — Constructs the ingestion job dict.
|
||||
- `schedule_cycle(pool, rds)` — Full scheduling pass (mocked DB/Redis).
|
||||
- `check_rate_limit(rds, source_type, now)` — Rate limiting with per-type and global Polygon limits.
|
||||
- `recover_stale_documents(pool, rds)` — Re-enqueue orphaned parsed documents.
|
||||
- `retry_failed_extractions(pool, rds)` — Re-enqueue failed extractions.
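
As a rough illustration of the kind of function these tests target, the sketch below shows one plausible shape for an exponential backoff with a cap. Only the name `compute_backoff(retry_count)` comes from the list above. The base delay, the cap, and the function body are assumptions made for illustration, and the real code in `services/scheduler/app.py` may differ.

```python
# Hypothetical constants; the real scheduler may use different values.
BASE_DELAY_SECONDS = 30
MAX_DELAY_SECONDS = 3600


def compute_backoff(retry_count: int) -> int:
    """Return the delay in seconds before the next retry.

    The delay doubles with each retry and is capped at MAX_DELAY_SECONDS.
    """
    if retry_count < 0:
        raise ValueError("retry_count must be non-negative")
    # Cap the exponent so very large retry counts do not build huge integers.
    delay = BASE_DELAY_SECONDS * (2 ** min(retry_count, 32))
    return min(delay, MAX_DELAY_SECONDS)
```

A unit test for a function like this would typically check the first few values, the cap, and the rejection of negative input.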
|
||||
|
||||
**Mocking strategy**:
|
||||
- `asyncpg.Pool` → `AsyncMock` with `.fetch()`, `.fetchrow()`, `.fetchval()`, `.execute()` returning canned records.
|
||||
- `redis.asyncio.Redis` → `AsyncMock` with `.rpush()`, `.set()`, `.get()`, `.incr()`, `.expire()`, `.decr()`, `.delete()` tracking calls.
|
||||
- Use `unittest.mock.patch` for module-level imports where needed.
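
The mocking strategy above could look roughly like the following sketch, which drives a rate-limit check against an `AsyncMock` standing in for Redis. The `check_rate_limit(rds, source_type, now)` signature is taken from the function list; the limit value, the key format, and the function body are assumptions, not the scheduler's actual logic.

```python
import asyncio
from unittest.mock import AsyncMock

# Hypothetical limit, for illustration only.
RATE_LIMIT_PER_MINUTE = 5


async def check_rate_limit(rds, source_type: str, now: int) -> bool:
    """Return True if another request for source_type is allowed this minute."""
    window = now // 60
    key = f"ratelimit:{source_type}:{window}"  # assumed key format
    count = await rds.incr(key)
    if count == 1:
        # First hit in this window: let the counter expire with the window.
        await rds.expire(key, 60)
    return count <= RATE_LIMIT_PER_MINUTE


def run_rate_limit_example() -> list[bool]:
    """Drive check_rate_limit against an AsyncMock standing in for Redis."""
    rds = AsyncMock()
    # Simulate the counter climbing from 1 to 6 across six calls.
    rds.incr = AsyncMock(side_effect=[1, 2, 3, 4, 5, 6])
    rds.expire = AsyncMock()

    async def scenario() -> list[bool]:
        return [await check_rate_limit(rds, "news", now=120) for _ in range(6)]

    results = asyncio.run(scenario())
    # The TTL is set once, on the first increment only.
    rds.expire.assert_awaited_once_with("ratelimit:news:2", 60)
    return results
```

Because the mock records every awaited call, the test can assert both the returned decisions and the exact Redis commands issued, without a running Redis instance.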
|
||||
|
||||
**Test file**: `tests/test_scheduler_unit.py`
|
||||
|
||||
### 2. Ingestion Unit Tests (Requirement 2)
|
||||
|
||||
**Target module**: `services/ingestion/worker.py`
|
||||
|
||||
**Functions to test**:
|
||||
- `process_job(job, pool, rds, minio_client, adapters)` — Main job processing with various adapter outcomes.
|
||||
- Error handling paths: adapter returns `AdapterResult(error=...)`, retry exhaustion, dead-letter routing.
|
||||
- Deduplication: content hash already seen in Redis, cross-source document dedup via `dedupe_items`.
|
||||
|
||||
**Mocking strategy**:
|
||||
- Adapters → `AsyncMock` returning `AdapterResult` with controlled `error`, `items`, `content_hash`, `raw_payload`.
|
||||
- `asyncpg.Pool` → `AsyncMock` for `ingestion_runs` INSERT/UPDATE, `persist_ingestion_items`, `record_retrieval_failure`.
|
||||
- `redis.asyncio.Redis` → `AsyncMock` for dedupe checks, queue pushes, DLQ routing.
|
||||
- `minio.Minio` → `MagicMock` for `upload_raw_artifact`.
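
A minimal sketch of how the retry-exhaustion and dead-letter path might be exercised with these mocks is shown below. The `AdapterResult` field names come from the mocking strategy above. The `fetch` method name, the retry budget, the helper `fetch_with_retry`, and the DLQ key are hypothetical and exist only to make the example runnable.

```python
import asyncio
import json
from dataclasses import dataclass, field
from typing import Optional
from unittest.mock import AsyncMock

MAX_RETRIES = 3  # hypothetical retry budget
DLQ_KEY = "stonks:dlq:ingestion"  # assumed name, following stonks:dlq:<queue>


@dataclass
class AdapterResult:
    """Stand-in for the real AdapterResult; only the field names are from the design."""
    items: list = field(default_factory=list)
    content_hash: Optional[str] = None
    raw_payload: Optional[bytes] = None
    error: Optional[str] = None


async def fetch_with_retry(adapter, job: dict, rds) -> Optional[AdapterResult]:
    """Call the adapter up to MAX_RETRIES times, then dead-letter the job."""
    last_error = None
    for _ in range(MAX_RETRIES):
        result = await adapter.fetch(job)  # `fetch` is a hypothetical method name
        if result.error is None:
            return result
        last_error = result.error
        # A real worker would sleep for a backoff delay here; it is omitted
        # so the sketch runs instantly.
    await rds.rpush(DLQ_KEY, json.dumps({"job": job, "error": last_error}))
    return None


def run_dlq_example():
    """Run the failing-adapter scenario and return what the mocks observed."""
    adapter = AsyncMock()
    adapter.fetch = AsyncMock(return_value=AdapterResult(error="HTTP 503"))
    rds = AsyncMock()
    rds.rpush = AsyncMock()
    outcome = asyncio.run(fetch_with_retry(adapter, {"source_id": 1}, rds))
    return outcome, adapter.fetch.await_count, rds.rpush.await_args
```

The same pattern extends to the success path by giving `adapter.fetch` a `side_effect` list that fails once and then returns a result with `error=None`.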
|
||||
|
||||
**Test file**: `tests/test_ingestion_unit.py`
|
||||
|
||||
### 3. Extractor Test Fixes (Requirement 3)
|
||||
|
||||
**Target files**:
|
||||
- `tests/test_extractor_prompts.py`
|
||||
- `tests/test_extractor_schemas.py`
|
||||
- `tests/test_ollama_client.py`
|
||||
- `tests/test_filings_adapter.py`
|
||||
|
||||
**Approach**: Run each file individually, diagnose failures, and fix either the test setup (mock configuration, fixture data) or the production code. Preserve original test intent and assertions. If production code changes are needed, add regression tests.
|
||||
|
||||
### 4. Full Test Suite Green (Requirement 4)
|
||||
|
||||
**Verification**: Run `pytest tests/ -x --tb=short -q` and `ruff check services/` after all fixes. All existing `test_pbt_*` files must remain passing. Any production code fix must include a regression test.
|
||||
|
||||
### 5. Docker Compose Application Services (Requirement 5)
|
||||
|
||||
**Current state**: `docker-compose.yml` defines 8 infrastructure containers: the 7 services postgres, redis, minio, ollama, trino, hive-metastore, and superset, plus `minio-init`.
|
||||
|
||||
**Addition**: 13 new service definitions, one per row of the table below (12 backend services plus the frontend dashboard):
|
||||
|
||||
| Service | Dockerfile | Command | Port (host:container) | Depends On |
|---------|------------|---------|-----------------------|------------|
| scheduler | `docker/Dockerfile.scheduler` | `python -m services.scheduler.app` | — | postgres, redis |
| symbol-registry | `docker/Dockerfile` | `uvicorn services.symbol_registry.app:app --host 0.0.0.0 --port 8000` | 8001:8000 | postgres |
| ingestion | `docker/Dockerfile` | `python -m services.ingestion.worker` | — | postgres, redis, minio |
| parser | `docker/Dockerfile` | `python -m services.parser.worker` | — | postgres, redis |
| extractor | `docker/Dockerfile` | `python -m services.extractor.main` | — | postgres, redis, ollama |
| aggregation | `docker/Dockerfile` | `python -m services.aggregation.main` | — | postgres, redis |
| recommendation | `docker/Dockerfile` | `python -m services.recommendation.main` | — | postgres, redis |
| trading-engine | `docker/Dockerfile` | `uvicorn services.trading.app:app --host 0.0.0.0 --port 8000` | 8002:8000 | postgres, redis |
| risk-engine | `docker/Dockerfile` | `uvicorn services.risk.app:app --host 0.0.0.0 --port 8000` | 8003:8000 | postgres |
| broker-adapter | `docker/Dockerfile` | `python -m services.adapters.broker_service` | — | postgres, redis |
| lake-publisher | `docker/Dockerfile` | `python -m services.lake_publisher.jobs` | — | postgres, minio |
| query-api | `docker/Dockerfile` | `uvicorn services.api.app:app --host 0.0.0.0 --port 8000` | 8004:8000 | postgres, redis, minio |
| dashboard | `frontend/Dockerfile` | nginx (built-in) | 3000:8080 | query-api |
|
||||
|
||||
**Common environment block** (shared via `x-app-env` YAML anchor):
|
||||
```yaml
POSTGRES_HOST: postgres
POSTGRES_PORT: "5432"
POSTGRES_DB: stonks
POSTGRES_USER: stonks
POSTGRES_PASSWORD: stonks_dev
REDIS_HOST: redis
REDIS_PORT: "6379"
MINIO_ENDPOINT: minio:9000
MINIO_ACCESS_KEY: minioadmin
MINIO_SECRET_KEY: minioadmin
OLLAMA_BASE_URL: http://ollama:11434
```
|
||||
|
||||
**`.env` file support**: `MARKET_DATA_API_KEY`, `BROKER_API_KEY`, `BROKER_API_SECRET`, `BROKER_BASE_URL` loaded via `env_file: .env` on services that need them (ingestion, broker-adapter, trading-engine).
|
||||
|
||||
**Health checks**: FastAPI services use `curl -f http://localhost:8000/health`; workers use process liveness checks. Infrastructure `depends_on` uses `condition: service_healthy`.
|
||||
|
||||
### 6. Documentation Structure (Requirements 6–16)
|
||||
|
||||
All documentation files are Markdown in `docs/`. The structure:
|
||||
|
||||
```
docs/
├── services.md                     # Req 6: Per-service feature docs
├── api-reference.md                # Req 7: All 4 FastAPI API references
├── helm-reference.md               # Req 8: Helm chart values reference
├── docker-deployment.md            # Req 9: Docker deployment guide
├── architecture-kubernetes.md      # Req 10: K8s Mermaid diagram
├── architecture-docker-compose.md  # Req 11: Docker Compose Mermaid diagram
├── architecture-data-pipeline.md   # Req 12: Data pipeline Mermaid diagram
├── ai-agents.md                    # Req 13: AI agent building guide
├── backup-restore.md               # Req 14: Backup and restore guide
├── observability.md                # Req 15: Observability & metrics reference
├── LOCAL_DEV_SETUP.md              # (existing)
├── llm-to-trade-pipeline.md        # (existing)
└── notes/
    └── runbook.md                  # (existing)
```
|
||||
|
||||
#### 6a. Service Feature Documentation (`docs/services.md`) — Req 6
|
||||
|
||||
For each of the 13 services, document:
|
||||
- **Purpose**: What the service does in the pipeline.
|
||||
- **Entry point**: Module path (e.g., `services.scheduler.app`).
|
||||
- **Configuration**: Environment variables from `services/shared/config.py` relevant to this service.
|
||||
- **Database tables**: Tables read/written by this service.
|
||||
- **Redis queues**: Queue names consumed from and published to (from `services/shared/redis_keys.py`).
|
||||
- **Queue message schema**: JSON structure of messages.
|
||||
- **Signal layers**: For aggregation/recommendation, document the three signal layers (company, macro, competitive), their toggles (`macro_enabled`, `competitive_enabled` in `risk_configs`), and weight configurations.
|
||||
- **Trading engine features**: For the trading service, document position sizing, circuit breakers, reserve pool, risk tier auto-adjustment, backtesting, and notification configuration.
|
||||
|
||||
Queue topology reference (from `redis_keys.py`):
|
||||
| Queue | Producer | Consumer |
|-------|----------|----------|
| `stonks:queue:ingestion` | scheduler | ingestion |
| `stonks:queue:parsing` | ingestion | parser |
| `stonks:queue:extraction` | parser | extractor |
| `stonks:queue:macro_classification` | parser, scheduler | extractor |
| `stonks:queue:aggregation` | extractor | aggregation |
| `stonks:queue:recommendation` | aggregation | recommendation |
| `stonks:queue:lake_publish` | various | lake-publisher |
| `stonks:queue:broker_orders` | trading-engine, trading API | broker-adapter |
| `stonks:queue:trading_decisions` | recommendation | trading-engine |
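
The queue topology can also be expressed as data, which makes it easy to sanity-check that the pipeline is connected end to end. The sketch below copies the queue names, producers, and consumers from the table; the helper functions `downstream_of` and `reachable_from` are hypothetical and are not part of `redis_keys.py`.

```python
from collections import deque

# Queue name -> (producers, consumer), copied from the queue topology table.
QUEUE_TOPOLOGY = {
    "stonks:queue:ingestion": (["scheduler"], "ingestion"),
    "stonks:queue:parsing": (["ingestion"], "parser"),
    "stonks:queue:extraction": (["parser"], "extractor"),
    "stonks:queue:macro_classification": (["parser", "scheduler"], "extractor"),
    "stonks:queue:aggregation": (["extractor"], "aggregation"),
    "stonks:queue:recommendation": (["aggregation"], "recommendation"),
    "stonks:queue:lake_publish": (["various"], "lake-publisher"),
    "stonks:queue:broker_orders": (["trading-engine", "trading API"], "broker-adapter"),
    "stonks:queue:trading_decisions": (["recommendation"], "trading-engine"),
}


def downstream_of(service: str) -> list[str]:
    """Return the services that consume from queues `service` produces to."""
    consumers = {
        consumer
        for producers, consumer in QUEUE_TOPOLOGY.values()
        if service in producers
    }
    return sorted(consumers)


def reachable_from(service: str) -> set[str]:
    """Return every service reachable from `service` by following queues."""
    seen: set[str] = set()
    pending = deque([service])
    while pending:
        current = pending.popleft()
        for nxt in downstream_of(current):
            if nxt not in seen:
                seen.add(nxt)
                pending.append(nxt)
    return seen
```

For example, `reachable_from("scheduler")` should include `broker-adapter`, which confirms that a scheduled ingestion job can in principle flow all the way to an order submission.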
|
||||
|
||||
#### 6b. API Reference (`docs/api-reference.md`) — Req 7
|
||||
|
||||
Document all endpoints from the four FastAPI services by inspecting their route definitions:
|
||||
|
||||
**Query API** (`services/api/app.py`): ~40+ endpoints covering companies, documents, trends, recommendations, evidence drill-down, orders, positions, portfolio, global events, macro impacts, competitive signals, trend projections, agents, dead-letter queues, pipeline control, SQL explorer, saved queries, audit trail, DevOps metrics, and Prometheus metrics.
|
||||
|
||||
**Symbol Registry API** (`services/symbol_registry/app.py`): Companies CRUD, aliases, watchlists, sources, exposure profiles, competitor relationships, competitor inference.
|
||||
|
||||
**Trading API** (`services/trading/app.py`): Health/readiness, engine status, config update, pause/resume, reset, decisions audit, performance metrics/history, backtesting, notifications config/history, override orders, debug state.
|
||||
|
||||
**Risk API** (`services/risk/app.py`): Order evaluation (`POST /evaluate`), health, pending approvals, approval review, approval expiration.
|
||||
|
||||
For each endpoint: method, path, query parameters (type, default, constraints), request body schema, response schema, error codes (4xx/5xx).
|
||||
|
||||
#### 6c. Helm Chart Reference (`docs/helm-reference.md`) — Req 8
|
||||
|
||||
Document from `infra/helm/stonks-oracle/values.yaml`:
|
||||
- `image` block: registry, pullPolicy, tag
|
||||
- `pipelineEnabled`: toggle and effect on worker replicas
|
||||
- `services` block: per-service structure (replicas, image, command, tier, port, secrets, resources, probes)
|
||||
- `config` block: all ConfigMap environment variables with defaults and descriptions
|
||||
- `secrets` block: core, broker, market, gmail, dashboard — injection via `--set` flags
|
||||
- `ingress` block: className, clusterIssuer, host mappings
|
||||
- Analytics stack: trino, hiveMetastore, superset toggles and resources
|
||||
- `networkPolicies.enabled`: default-deny-ingress behavior
|
||||
- Value override files: `values-beta.yaml`, `values-paper.yaml` and their deployment stages
|
||||
|
||||
#### 6d. Docker Deployment Guide (`docs/docker-deployment.md`) — Req 9
|
||||
|
||||
- Complete service inventory with images, ports, volumes, environment variables
|
||||
- `.env` file format with all required/optional variables
|
||||
- Volume mounts and data persistence (pgdata, miniodata, ollama_models, hive_data, superset_data)
|
||||
- Health check configurations
|
||||
- Dockerfile build arguments (`SERVICE_CMD`)
|
||||
- Operational commands: start, stop, restart, logs, scale, reset (`docker compose down -v`)
|
||||
|
||||
#### 6e. Architecture Diagrams (Reqs 10–12)
|
||||
|
||||
**Kubernetes diagram** (`docs/architecture-kubernetes.md`):
|
||||
- `stonks-oracle` namespace with all 13 services grouped by tier (api, processing, trading, orchestration, analytics, frontend)
|
||||
- External cluster services in their namespaces (postgresql-service, redis-service, minio-service, ollama-service)
|
||||
- Traefik ingress routes to external domains
|
||||
- Network policy boundaries
|
||||
- Analytics plane (Trino, Hive Metastore, Superset)
|
||||
- Helm-managed secrets (core, broker, market, gmail) with consumer mapping
|
||||
- Service tier distinction (API with ingress, pipeline workers, trading)
|
||||
|
||||
**Docker Compose diagram** (`docs/architecture-docker-compose.md`):
|
||||
- All infrastructure + application containers
|
||||
- Host port mappings
|
||||
- `depends_on` relationships and health check dependencies
|
||||
- Named volumes and mount points
|
||||
- `.env` file providing API keys
|
||||
- Internal Docker network connectivity
|
||||
|
||||
**Data Pipeline diagram** (`docs/architecture-data-pipeline.md`):
|
||||
- External sources → ingestion → parsing → extraction → aggregation → recommendation → risk → trading → broker
|
||||
- Redis queue topology with queue names
|
||||
- Three signal layers as distinct paths merging at aggregation
|
||||
- Data stores at each stage (MinIO, PostgreSQL, Redis)
|
||||
- Trading engine decision loop
|
||||
- Analytical branch (lake publisher → MinIO/Parquet → Trino → Superset/Dashboard)
|
||||
- External integrations (Ollama, Alpaca, AWS SNS, Gmail)
|
||||
|
||||
#### 6f. AI Agent Guide (`docs/ai-agents.md`) — Req 13
|
||||
|
||||
- Three built-in agents: document-extractor, event-classifier, thesis-rewriter
|
||||
- Per-agent: purpose, input data, output schema, default model, system prompt structure, user prompt template
|
||||
- `ai_agents` table schema and registration (system-seeded vs API-created)
|
||||
- `agent_variants` table: create, activate, deactivate variants for A/B testing
|
||||
- `AgentConfigResolver` module: TTL cache (60s default), COALESCE-based variant override, fallback behavior
|
||||
- Performance logging: `agent_performance_log` table, querying for variant comparison
|
||||
- API endpoints: CRUD on `/api/agents`, test endpoint `/api/agents/{id}/test`
|
||||
- Step-by-step guide: creating a new variant with different model/prompt and activating it
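
To make the resolver behavior concrete, the following sketch shows one way a TTL cache with a COALESCE-style variant override could work. It is an illustration under assumptions: the class below only borrows the `AgentConfigResolver` name and the 60-second default from the list above, while the loader callables, the configuration keys, and the injected clock are hypothetical.

```python
import time
from typing import Callable, Optional


def coalesce(*values):
    """Return the first value that is not None, like SQL COALESCE."""
    for value in values:
        if value is not None:
            return value
    return None


class AgentConfigResolver:
    """Illustrative resolver; loaders and config keys are assumptions."""

    def __init__(
        self,
        load_agent: Callable[[str], dict],
        load_active_variant: Callable[[str], Optional[dict]],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load_agent = load_agent
        self._load_active_variant = load_active_variant
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: dict[str, tuple[float, dict]] = {}

    def resolve(self, agent_name: str) -> dict:
        """Return the effective config, reloading once the TTL has elapsed."""
        now = self._clock()
        cached = self._cache.get(agent_name)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        base = self._load_agent(agent_name)
        # Fallback: with no active variant, every field comes from the base agent.
        variant = self._load_active_variant(agent_name) or {}
        resolved = {key: coalesce(variant.get(key), base[key]) for key in base}
        self._cache[agent_name] = (now, resolved)
        return resolved
```

Injecting the clock keeps the cache testable: a test can advance a fake clock past the TTL instead of sleeping.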
|
||||
|
||||
#### 6g. Backup & Restore Guide (`docs/backup-restore.md`) — Req 14
|
||||
|
||||
Scripts in `scripts/`:
|
||||
- `backup-db.sh`: PostgreSQL dump, CLI args, storage location, retention (keeps last 7)
|
||||
- `restore-db.sh`: PostgreSQL restore, service scale-down/up, data loss implications
|
||||
- `backup-redis.sh`: Redis RDB snapshot backup
|
||||
- `backup.sh`: Combined backup (DB + Redis), `--upload-minio` option
|
||||
- `restore.sh`: Combined restore
|
||||
- Full nuke-and-rebuild procedure (connection termination, DB drop, Redis flush, redeploy, re-seed)
|
||||
- Recommended backup schedules and automation (cron, Kubernetes CronJobs)
|
||||
|
||||
#### 6h. Observability Reference (`docs/observability.md`) — Req 15
|
||||
|
||||
- `/metrics` endpoint on query-api, Prometheus scrape configuration
|
||||
- All metrics from `services/shared/metrics.py`:
  - **Ingestion**: `stonks_ingestion_jobs_total`, `stonks_ingestion_items_fetched_total`, `stonks_ingestion_items_new_total`, `stonks_ingestion_items_deduped_total`, `stonks_ingestion_errors_total`, `stonks_ingestion_adapter_duration_seconds`
  - **Parsing**: `stonks_parse_jobs_total`, `stonks_parse_quality_score`, `stonks_parse_low_quality_total`, `stonks_parse_duration_seconds`
  - **Extraction**: `stonks_extraction_jobs_total`, `stonks_extraction_attempts_total`, `stonks_extraction_retries_total`, `stonks_extraction_duration_seconds`, `stonks_extraction_confidence`, `stonks_extraction_validation_errors_total`, `stonks_extraction_tokens_total`
  - **Aggregation**: `stonks_aggregation_windows_total`, `stonks_aggregation_signals_total`, `stonks_aggregation_contradiction_score`, `stonks_aggregation_duration_seconds`
  - **Recommendation**: `stonks_recommendations_total`, `stonks_recommendations_suppressed_total`, `stonks_recommendation_confidence`
  - **Lake**: `stonks_lake_facts_published_total`, `stonks_lake_publish_duration_seconds`, `stonks_lake_publish_errors_total`, `stonks_lake_publish_bytes_total`
  - **Trading**: `stonks_orders_submitted_total`, `stonks_orders_rejected_total`, `stonks_orders_filled_total`, `stonks_orders_duplicates_prevented_total`, `stonks_risk_evaluations_total`, `stonks_risk_check_failures_total`, `stonks_positions_synced_total`
  - **Alerting**: `stonks_alerts_fired_total`, `stonks_alerts_resolved_total`, `stonks_alert_check_duration_seconds`, `stonks_alert_active`
  - **DLQ**: `stonks_dlq_items_total`, `stonks_dlq_replayed_total`, `stonks_dlq_depth`
  - **Active**: `stonks_active_jobs`
|
||||
- Alerting module (`services/shared/alerting.py`): 4 alert rules (source_failures, schema_failure_spike, analytical_lag, broker_issues), thresholds, evaluation windows, ConfigMap variables
|
||||
- Structured JSON logging format, trace context (trace_id, span_id)
|
||||
- Dead-letter queue system: queue names (`stonks:dlq:<queue>`), routing, replay tooling
|
||||
- Recommended Prometheus/Grafana queries
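
As an illustration of the structured logging item above, the sketch below emits one JSON object per log record and carries `trace_id` and `span_id` when they are attached to the record. Only those two field names come from the list; the remaining keys and the `JsonFormatter` class are assumptions and may not match the platform's actual log schema.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object (assumed field layout)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Trace context is optional; absent values are emitted as null.
            "trace_id": getattr(record, "trace_id", None),
            "span_id": getattr(record, "span_id", None),
        }
        return json.dumps(payload)


def build_logger(name: str) -> logging.Logger:
    """Return a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

A caller would attach the trace context through the standard `extra` argument, for example `logger.info("job finished", extra={"trace_id": "t-1", "span_id": "s-1"})`.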
|
||||
|
||||
#### 6i. README Update — Req 16
|
||||
|
||||
- Add "Documentation" section with links to all docs
|
||||
- Replace ASCII architecture diagram with Mermaid or link to diagram docs
|
||||
- Preserve all existing content (license, features, tech stack, project structure, deployment)
|
||||
|
||||
## Data Models
|
||||
|
||||
No new database tables or schema changes are introduced. This initiative works with existing tables:
|
||||
|
||||
**Tables referenced in test coverage work**:
|
||||
- `sources`, `companies`, `company_aliases` — scheduler source polling
|
||||
- `ingestion_runs` — scheduler run tracking, ingestion job recording
|
||||
- `documents`, `document_company_mentions` — ingestion persistence, stale document recovery
|
||||
- `document_intelligence`, `document_impact_records` — extractor test fixtures
|
||||
- `model_performance_metrics` — extractor schema validation metrics
|
||||
|
||||
**Tables documented** (not modified):
|
||||
- All tables listed above plus `trend_windows`, `trend_history`, `trend_projections`, `recommendations`, `recommendation_evidence`, `risk_evaluations`, `orders`, `order_events`, `positions`, `portfolio_snapshots`, `trading_decisions`, `circuit_breaker_events`, `reserve_pool_ledger`, `risk_tier_history`, `backtest_runs`, `backtest_trades`, `notifications`, `global_events`, `macro_impact_records`, `exposure_profiles`, `competitor_relationships`, `competitive_signal_records`, `ai_agents`, `agent_variants`, `agent_performance_log`, `audit_events`, `watchlists`, `watchlist_members`, `retention_policies`, `market_snapshots`
|
||||
|
||||
## Error Handling
|
||||
|
||||
### Test Coverage
|
||||
- **Mock failures**: Unit tests must verify that scheduler and ingestion services handle database/Redis connection failures gracefully (no crashes, proper logging).
|
||||
- **Adapter errors**: Ingestion unit tests must verify retry logic with exponential backoff and dead-letter queue routing after retry exhaustion.
|
||||
- **Test fix approach**: When fixing pre-existing failures, prefer fixing test setup over changing production code. If production code changes are needed, add regression tests to prevent re-introduction.
|
||||
|
||||
### Docker Compose
|
||||
- **Health check failures**: Application services use `depends_on` with `condition: service_healthy` to wait for infrastructure. Health checks have `interval`, `timeout`, `retries`, and `start_period` configured.
|
||||
- **Missing `.env` file**: Services that need API keys (ingestion, broker-adapter, trading-engine) will start but log warnings about missing keys. The platform runs in a degraded mode without external API access.
|
||||
- **Build failures**: Each service is built from the shared base Dockerfile with its own `SERVICE_CMD` build argument, so a build error affects only the service being built.
|
||||
|
||||
### Documentation
|
||||
- **Stale documentation**: Documentation is generated from source code inspection. If the codebase changes after documentation is written, the docs may drift. The README links section serves as a single index to find and update docs.
|
||||
- **Diagram accuracy**: Mermaid diagrams are hand-authored based on current architecture. They should be updated when services are added or removed.
|
||||
|
||||
## Testing Strategy
|
||||
|
||||
### PBT Applicability Assessment
|
||||
|
||||
Property-based testing is **NOT applicable** to this feature. The work consists of:
|
||||
1. **Unit tests for existing services** — These are example-based tests with mocked dependencies, not pure functions with universal properties.
|
||||
2. **Fixing pre-existing test failures** — Bug fixes to existing tests/code.
|
||||
3. **Docker Compose configuration** — Declarative infrastructure configuration.
|
||||
4. **Documentation** — Markdown files with no executable logic.
|
||||
|
||||
None of these involve new pure functions, parsers, serializers, or business logic where PBT would add value. The existing `test_pbt_*` files (22 files covering trading, aggregation, competitive intelligence, etc.) already provide PBT coverage for the platform's core logic and must remain passing.
|
||||
|
||||
### Unit Testing Strategy
|
||||
|
||||
**New test files**:
|
||||
- `tests/test_scheduler_unit.py` — 8+ test cases covering all scheduler pure functions and the `schedule_cycle` orchestration with mocked dependencies.
|
||||
- `tests/test_ingestion_unit.py` — 6+ test cases covering adapter error handling, retry logic, deduplication, and dead-letter queue routing.
|
||||
|
||||
**Test fix files** (existing, to be repaired):
|
||||
- `tests/test_extractor_prompts.py`
|
||||
- `tests/test_extractor_schemas.py`
|
||||
- `tests/test_ollama_client.py`
|
||||
- `tests/test_filings_adapter.py`
|
||||
|
||||
**Test framework**: pytest + pytest-asyncio (already configured in the project).
|
||||
|
||||
**Mocking approach**: `unittest.mock.AsyncMock` for async dependencies, `unittest.mock.MagicMock` for sync dependencies, `unittest.mock.patch` for module-level state.
|
||||
|
||||
### Verification Criteria
|
||||
|
||||
1. `pytest tests/ -x --tb=short -q` → zero failures
|
||||
2. `ruff check services/` → zero violations
|
||||
3. All 22 existing `test_pbt_*` files pass unchanged
|
||||
4. `docker compose config` validates the updated docker-compose.yml
|
||||
5. All documentation files render valid Markdown with working internal links
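
The last criterion, working internal links, can be checked mechanically. The sketch below is one possible approach using only the standard library: it scans Markdown files for inline links and reports relative targets that do not exist on disk. The function name `find_broken_links` and the link pattern are assumptions; the pattern handles plain `[text](target)` links only and does not validate anchors or links with titles.

```python
import re
from pathlib import Path

# Matches inline Markdown links of the form [text](target).
LINK_PATTERN = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def find_broken_links(docs_root: Path) -> list[tuple[str, str]]:
    """Return (file name, target) pairs for relative links that do not resolve."""
    broken = []
    for md_file in sorted(docs_root.rglob("*.md")):
        text = md_file.read_text(encoding="utf-8")
        for target in LINK_PATTERN.findall(text):
            # External links and same-page anchors are out of scope here.
            if target.startswith(("http://", "https://", "mailto:", "#")):
                continue
            # Drop any #fragment before checking the file on disk.
            path_part = target.split("#", 1)[0]
            if not (md_file.parent / path_part).exists():
                broken.append((md_file.name, target))
    return broken
```

Running `find_broken_links(Path("docs"))` from the repository root and asserting that the result is empty would turn this criterion into an automated check.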
|
||||
@@ -0,0 +1,236 @@
|
||||
# Requirements Document
|
||||
|
||||
## Introduction
|
||||
|
||||
This initiative covers three pillars for the Stonks Oracle platform: (1) closing unit test coverage gaps across all 13 services, fixing pre-existing test failures, and ensuring every feature has proper automated tests; (2) updating the Docker Compose deployment to include all application services so users can run the full platform without Kubernetes; and (3) producing comprehensive documentation covering every feature, all API endpoints, Helm chart configuration, Docker deployment options, and three Mermaid architecture diagrams (Kubernetes deployment, Docker Compose deployment, and data pipeline), with the README updated to link to all resources.
|
||||
|
||||
## Glossary
|
||||
|
||||
- **Test_Suite**: The collection of pytest unit tests, property-based tests, and integration tests in the `tests/` directory
|
||||
- **Docker_Compose_Stack**: The `docker-compose.yml` file and associated Dockerfiles that define the local development environment
|
||||
- **Helm_Chart**: The Kubernetes deployment configuration at `infra/helm/stonks-oracle/` including `values.yaml`, value overrides, and templates
|
||||
- **Query_API**: The FastAPI REST service at `services/api/app.py` serving analytics and dashboard queries
|
||||
- **Symbol_Registry_API**: The FastAPI REST service at `services/symbol_registry/app.py` managing companies, watchlists, sources, exposure profiles, and competitor relationships
|
||||
- **Trading_API**: The FastAPI REST service at `services/trading/app.py` controlling the autonomous trading engine
|
||||
- **Risk_API**: The FastAPI REST service at `services/risk/app.py` evaluating order risk and managing approval workflows
|
||||
- **Scheduler_Service**: The service at `services/scheduler/` that triggers ingestion cycles on a cadence
|
||||
- **Ingestion_Service**: The queue worker at `services/ingestion/` that fetches market data, news, filings, and macro events
|
||||
- **Extractor_Service**: The queue worker at `services/extractor/` that performs LLM-based intelligence extraction and event classification
|
||||
- **Documentation_Set**: The collection of Markdown files in `docs/` that describe features, APIs, deployment, and architecture
|
||||
- **Architecture_Diagram**: A Mermaid-syntax diagram showing services, data stores, external integrations, and data flow. Three diagrams are produced: Kubernetes deployment, Docker Compose deployment, and data pipeline
|
||||
- **README**: The root `README.md` file serving as the project entry point
|
||||
|
||||
## Requirements
|
||||
|
||||
### Requirement 1: Scheduler Service Unit Tests
|
||||
|
||||
**User Story:** As a developer, I want the scheduler service to have dedicated unit tests, so that scheduling logic, cadence management, and source polling behavior are verified independently of integration tests.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. WHEN the Test_Suite is executed for the scheduler module, THE Test_Suite SHALL include unit tests covering job enqueue logic, polling interval calculation, and source due-date evaluation
|
||||
2. WHEN a scheduler unit test is run, THE Test_Suite SHALL mock all external dependencies (PostgreSQL, Redis) and test scheduling logic in isolation
|
||||
3. THE Test_Suite SHALL verify that the scheduler correctly enqueues ingestion jobs for sources whose polling interval has elapsed
|
||||
4. IF a database or Redis connection fails during scheduling, THEN THE Test_Suite SHALL verify that the Scheduler_Service handles the error without crashing
|
||||
|
||||
### Requirement 2: Ingestion Service Unit Tests
|
||||
|
||||
**User Story:** As a developer, I want the ingestion service to have unit tests for adapter error handling and retry logic, so that data fetching resilience is verified beyond integration tests.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. WHEN the Test_Suite is executed for the ingestion module, THE Test_Suite SHALL include unit tests covering adapter error handling, retry logic, and deduplication behavior
|
||||
2. WHEN an external API returns an error response, THE Test_Suite SHALL verify that the Ingestion_Service retries according to the configured backoff policy
|
||||
3. WHEN a duplicate content hash is detected, THE Test_Suite SHALL verify that the Ingestion_Service skips re-processing the document
|
||||
4. IF all retry attempts are exhausted, THEN THE Test_Suite SHALL verify that the Ingestion_Service routes the failed job to the dead-letter queue
|
||||
|
||||
### Requirement 3: Extractor Test Failure Fixes
|
||||
|
||||
**User Story:** As a developer, I want the pre-existing test failures in the extractor module to be resolved, so that the full test suite passes cleanly in CI.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. WHEN the Test_Suite is executed, THE Test_Suite SHALL pass all tests in `test_extractor_prompts.py` without failures
|
||||
2. WHEN the Test_Suite is executed, THE Test_Suite SHALL pass all tests in `test_extractor_schemas.py` without failures
|
||||
3. WHEN the Test_Suite is executed, THE Test_Suite SHALL pass all tests in `test_ollama_client.py` without failures
|
||||
4. WHEN the Test_Suite is executed, THE Test_Suite SHALL pass all tests in `test_filings_adapter.py` without failures
|
||||
5. THE Test_Suite SHALL maintain the original test intent and assertions when fixing failures, modifying only the code under test or test setup as needed
|
||||
|
||||
### Requirement 4: Full Test Suite Green Status
|
||||
|
||||
**User Story:** As a developer, I want the entire test suite to pass, so that CI builds succeed and regressions are caught immediately.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. WHEN `pytest tests/ -x --tb=short -q` is executed, THE Test_Suite SHALL report zero failures across all test files
|
||||
2. WHEN `ruff check services/` is executed, THE Test_Suite SHALL report zero lint violations
|
||||
3. THE Test_Suite SHALL maintain all existing property-based tests (files prefixed `test_pbt_*`) in a passing state
|
||||
4. IF a test fix requires modifying production code, THEN THE Test_Suite SHALL include a regression test that validates the fix
|
||||
|
||||
### Requirement 5: Docker Compose Application Services
|
||||
|
||||
**User Story:** As a developer using Docker instead of Kubernetes, I want docker-compose.yml to include all 13 application services and the frontend, so that I can run the full platform locally with a single `docker compose up`.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. THE Docker_Compose_Stack SHALL define service containers for all 13 application services: scheduler, symbol-registry, ingestion, parser, extractor, aggregation, recommendation, trading-engine, risk-engine, broker-adapter, lake-publisher, query-api, and dashboard
|
||||
2. THE Docker_Compose_Stack SHALL define a frontend container serving the React dashboard via nginx on port 8080
|
||||
3. WHEN `docker compose up` is executed, THE Docker_Compose_Stack SHALL start all infrastructure services (PostgreSQL, Redis, MinIO, Ollama, Trino, Hive Metastore, Superset) before application services using dependency ordering
|
||||
4. WHEN an application service container starts, THE Docker_Compose_Stack SHALL provide health checks that verify the service is ready to accept requests
|
||||
5. THE Docker_Compose_Stack SHALL configure environment variables for each service matching the defaults documented in `docs/LOCAL_DEV_SETUP.md`, with infrastructure hostnames pointing to Docker Compose service names
|
||||
6. THE Docker_Compose_Stack SHALL allow users to provide API keys (MARKET_DATA_API_KEY, BROKER_API_KEY, BROKER_API_SECRET) via a `.env` file without modifying docker-compose.yml
|
||||
7. IF an infrastructure dependency (PostgreSQL, Redis) is not yet healthy, THEN THE Docker_Compose_Stack SHALL delay application service startup using `depends_on` with `condition: service_healthy`
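
Criterion 7 lends itself to an automated check. The sketch below assumes the Compose file has already been parsed into a dictionary (for example with a YAML library) and reports every dependency that is not gated on `condition: service_healthy`. The helper name `ungated_dependencies` and the service names in `EXAMPLE` are illustrative, not taken from the real file.

```python
def ungated_dependencies(compose: dict) -> list[tuple[str, str]]:
    """Return (service, dependency) pairs not gated on `service_healthy`."""
    problems = []
    for name, spec in compose.get("services", {}).items():
        depends_on = spec.get("depends_on", {})
        if isinstance(depends_on, list):
            # The short list form of depends_on carries no condition at all.
            problems += [(name, dep) for dep in depends_on]
            continue
        for dep, opts in depends_on.items():
            if (opts or {}).get("condition") != "service_healthy":
                problems.append((name, dep))
    return problems


# Hypothetical parsed Compose content, for illustration only.
EXAMPLE = {
    "services": {
        "postgres": {"healthcheck": {"test": ["CMD", "pg_isready"]}},
        "redis": {},
        "query-api": {"depends_on": {"postgres": {"condition": "service_healthy"}}},
        "parser": {"depends_on": ["redis"]},
    }
}

violations = ungated_dependencies(EXAMPLE)
```

Here only `parser` is reported, because it uses the short `depends_on` form and would start without waiting for Redis to become healthy.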
|
||||
|
||||
### Requirement 6: Service Feature Documentation
|
||||
|
||||
**User Story:** As a user or contributor, I want every service documented with its purpose, configuration, queue interactions, and database tables, so that I can understand how each part of the platform works.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. THE Documentation_Set SHALL include a dedicated document for each of the 13 services describing its purpose, inputs, outputs, configuration environment variables, and database tables used
|
||||
2. WHEN a service consumes from or publishes to a Redis queue, THE Documentation_Set SHALL document the queue name, message schema, and processing behavior
|
||||
3. WHEN a service exposes HTTP endpoints, THE Documentation_Set SHALL reference the API documentation for that service
|
||||
4. THE Documentation_Set SHALL describe the three signal layers (company, macro, competitive) with their data flow, toggle mechanisms, and weight configurations
|
||||
5. THE Documentation_Set SHALL document the trading engine features including position sizing, circuit breakers, reserve pool management, risk tier auto-adjustment, backtesting, and notification configuration
|
||||
|
||||
### Requirement 7: API Reference Documentation
|
||||
|
||||
**User Story:** As a developer integrating with Stonks Oracle, I want a complete API reference for all four FastAPI services, so that I know every endpoint, its parameters, request/response schemas, and error codes.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. THE Documentation_Set SHALL include an API reference document covering all endpoints of the Query_API, including path, method, query parameters, response schema, and error codes
|
||||
2. THE Documentation_Set SHALL include an API reference document covering all endpoints of the Symbol_Registry_API, including CRUD operations for companies, aliases, watchlists, sources, exposure profiles, and competitor relationships
|
||||
3. THE Documentation_Set SHALL include an API reference document covering all endpoints of the Trading_API, including engine control, decision audit, performance metrics, backtesting, notifications, and manual override orders
|
||||
4. THE Documentation_Set SHALL include an API reference document covering all endpoints of the Risk_API, including order evaluation, approval workflow, and approval expiration
|
||||
5. WHEN an endpoint accepts query parameters or a request body, THE Documentation_Set SHALL document each parameter with its type, default value, and constraints
|
||||
6. WHEN an endpoint returns an error, THE Documentation_Set SHALL document the HTTP status code and error response format
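
One way to keep criterion 5 uniform across four services is to render every parameter table from structured data. The sketch below is a hedged illustration: the `Param` dataclass, `render_param_table`, and the two example parameters are hypothetical and do not describe any real endpoint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Param:
    """One documented parameter: name, type, default value, and constraints."""
    name: str
    type: str
    default: str
    constraints: str


def render_param_table(params: list[Param]) -> str:
    """Render parameters as a Markdown table with a fixed column order."""
    rows = [
        "| Parameter | Type | Default | Constraints |",
        "| --- | --- | --- | --- |",
    ]
    rows += [f"| `{p.name}` | {p.type} | {p.default} | {p.constraints} |" for p in params]
    return "\n".join(rows)


# Hypothetical parameters, for illustration only.
table = render_param_table([
    Param("limit", "integer", "50", "1 to 500"),
    Param("symbol", "string", "none", "uppercase ticker"),
])
```

Generating the tables this way makes a missing default or constraint visible as an empty cell instead of an omitted sentence.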
|
||||
|
||||
### Requirement 8: Helm Chart Configuration Reference
|
||||
|
||||
**User Story:** As an operator deploying Stonks Oracle on Kubernetes, I want a complete reference for all Helm chart values, so that I can configure services, resources, secrets, ingress, network policies, and analytics components.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. THE Documentation_Set SHALL include a Helm configuration reference documenting every key in `values.yaml` with its type, default value, and description
|
||||
2. THE Documentation_Set SHALL document the `services` block structure including replicas, image, command, tier, port, secrets, resources, and probes for each service
|
||||
3. THE Documentation_Set SHALL document the `config` block with all ConfigMap environment variables, their defaults, and what they control
|
||||
4. THE Documentation_Set SHALL document the `secrets` block structure (core, broker, market, gmail, dashboard) and how secrets are injected via `--set` flags during deployment
|
||||
5. THE Documentation_Set SHALL document the `ingress` block including className, clusterIssuer, and host mappings
|
||||
6. THE Documentation_Set SHALL document the analytics stack toggles (trino.enabled, hiveMetastore.enabled, superset.enabled) and their resource configurations
|
||||
7. THE Documentation_Set SHALL document the `pipelineEnabled` toggle and its effect on worker service replicas
|
||||
8. THE Documentation_Set SHALL document the `networkPolicies.enabled` toggle and the default-deny-ingress behavior
|
||||
9. THE Documentation_Set SHALL document the value override files (`values-beta.yaml`, `values-paper.yaml`) and their intended deployment stages
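
Documenting "every key" (criterion 1) is easier to verify when the nested values are flattened into dotted paths first. The sketch below assumes `values.yaml` has already been loaded into a dictionary. The key names in `SAMPLE_VALUES` come from the criteria above, but the default values shown are placeholders, not the chart's real defaults.

```python
def flatten_values(values: dict, prefix: str = "") -> dict:
    """Map each leaf key to (type name, default), using dotted paths."""
    flat = {}
    for key, value in values.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            flat.update(flatten_values(value, path))  # recurse into nested blocks
        else:
            flat[path] = (type(value).__name__, value)
    return flat


# Placeholder defaults, for illustration only.
SAMPLE_VALUES = {
    "pipelineEnabled": True,
    "trino": {"enabled": False},
    "networkPolicies": {"enabled": True},
}

flat = flatten_values(SAMPLE_VALUES)
```

Comparing the flattened key set against the keys mentioned in the reference document gives a quick completeness check whenever the chart changes.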
|
||||
|
||||
### Requirement 9: Docker Deployment Guide
|
||||
|
||||
**User Story:** As a developer deploying with Docker Compose, I want a guide explaining all Docker deployment options, environment variables, volume mounts, and operational commands, so that I can run and manage the platform without Kubernetes.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. THE Documentation_Set SHALL include a Docker deployment guide documenting every service defined in docker-compose.yml with its image, ports, volumes, and environment variables
|
||||
2. THE Documentation_Set SHALL document the `.env` file format with all required and optional environment variables, their defaults, and descriptions
|
||||
3. THE Documentation_Set SHALL document volume mounts and data persistence behavior, including how to reset data with `docker compose down -v`
|
||||
4. THE Documentation_Set SHALL document health check configurations and how to verify all services are running
|
||||
5. THE Documentation_Set SHALL document the Dockerfile build arguments (SERVICE_CMD) and how to build custom service images
|
||||
6. THE Documentation_Set SHALL document operational commands for starting, stopping, restarting individual services, viewing logs, and scaling replicas
|
||||
|
||||
### Requirement 10: Kubernetes Architecture Diagram
|
||||
|
||||
**User Story:** As an operator deploying on Kubernetes, I want a Mermaid diagram showing how Stonks Oracle runs in a K8s cluster, so that I can understand the deployment topology, networking, and infrastructure dependencies.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. THE Documentation_Set SHALL include a Mermaid diagram showing all 13 application services deployed as Kubernetes Deployments within the `stonks-oracle` namespace
|
||||
2. THE diagram SHALL show external cluster services (PostgreSQL, Redis, MinIO, Ollama) in their respective namespaces with cross-namespace service references
|
||||
3. THE diagram SHALL show Traefik ingress routes mapping external domains to internal services (stonks.celestium.life → dashboard, stonks-api.celestium.life → query-api, etc.)
|
||||
4. THE diagram SHALL show network policy boundaries indicating which services can communicate with each other
|
||||
5. THE diagram SHALL show the analytics plane (Trino, Hive Metastore, Superset) deployed within the stonks-oracle namespace and their connections to MinIO
|
||||
6. THE diagram SHALL show Helm-managed secrets (core, broker, market, gmail) and which services consume them
|
||||
7. THE diagram SHALL distinguish between API-tier services (with ingress), pipeline-tier workers (queue-driven), and trading-tier services
|
||||
|
||||
### Requirement 11: Docker Compose Architecture Diagram
|
||||
|
||||
**User Story:** As a developer running the platform locally with Docker Compose, I want a Mermaid diagram showing how all containers are wired together, so that I can understand port mappings, volume mounts, and service dependencies.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. THE Documentation_Set SHALL include a Mermaid diagram showing all infrastructure containers (PostgreSQL, Redis, MinIO, Ollama, Trino, Hive Metastore, Superset) and all 13 application service containers as defined in docker-compose.yml
|
||||
2. THE diagram SHALL show host port mappings for externally accessible services (PostgreSQL:5432, Redis:6379, MinIO:9000/9001, Ollama:11434, Trino:8080, Superset:8088, Dashboard:8080, Query API:8000)
|
||||
3. THE diagram SHALL show Docker Compose `depends_on` relationships and health check dependencies between infrastructure and application services
|
||||
4. THE diagram SHALL show named volumes (pgdata, miniodata, ollama_models, hive_data, superset_data) and which containers mount them
|
||||
5. THE diagram SHALL show the `.env` file providing API keys (MARKET_DATA_API_KEY, BROKER_API_KEY, BROKER_API_SECRET) to relevant service containers
|
||||
6. THE diagram SHALL show internal Docker network connectivity between containers using Docker Compose service names as hostnames
|
||||
|
||||
### Requirement 12: Data Pipeline Architecture Diagram
|
||||
|
||||
**User Story:** As a user or contributor, I want a Mermaid diagram showing the end-to-end data pipeline from external data sources through signal processing to trade execution, so that I can understand how data flows through the system.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. THE Documentation_Set SHALL include a Mermaid diagram showing the complete data pipeline from external sources (Polygon.io, news APIs, SEC filings, macro news sources) through ingestion, parsing, extraction, aggregation, recommendation, risk evaluation, and trade execution
|
||||
2. THE diagram SHALL show the Redis queue topology connecting pipeline stages (ingestion → parsing → extraction → aggregation → recommendation → broker) with queue names
|
||||
3. THE diagram SHALL show the three signal layers (company, macro, competitive) as distinct processing paths that merge in the aggregation stage
|
||||
4. THE diagram SHALL show data stores at each stage: MinIO for raw artifacts, PostgreSQL for structured data, Redis for queues and caching
|
||||
5. THE diagram SHALL show the trading engine decision loop: recommendation polling → position sizing → risk evaluation → order execution → broker submission → fill tracking
|
||||
6. THE diagram SHALL show the analytical branch: lake publisher writing Parquet fact tables to MinIO, queryable via Trino, visualized in Superset and the React dashboard
|
||||
7. THE diagram SHALL show external integrations at their connection points: Ollama for LLM extraction, Alpaca for trade execution, AWS SNS and Gmail for notifications
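
The stage chain in criterion 2 can be turned into Mermaid source programmatically, which keeps the diagram and the documented topology in step. This is a sketch only: the stage names follow the chain listed in criterion 2, while the edge labels are placeholders, because the real queue names are not given here.

```python
STAGES = ["ingestion", "parsing", "extraction", "aggregation", "recommendation", "broker"]


def mermaid_pipeline(stages, queue_label=lambda src, dst: f"{dst} queue"):
    """Emit a left-to-right Mermaid flowchart with one labelled edge per hop."""
    lines = ["flowchart LR"]
    for src, dst in zip(stages, stages[1:]):
        # The label is a placeholder; pass queue_label to supply real queue names.
        lines.append(f"    {src} -->|{queue_label(src, dst)}| {dst}")
    return "\n".join(lines)


diagram = mermaid_pipeline(STAGES)
```

Supplying a `queue_label` function that looks up the actual queue name for each hop would satisfy the "with queue names" part of the criterion.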
|
||||
|
||||
### Requirement 13: AI Agent Building Guide
|
||||
|
||||
**User Story:** As a user or contributor, I want a guide explaining how each of the three AI agents works — document extractor, event classifier, and thesis rewriter — including how to configure them, create variants, tune prompts, and monitor performance, so that I can customize and extend the AI capabilities of the platform.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. THE Documentation_Set SHALL include an AI agent guide documenting the three built-in agents: `document-extractor` (structured intelligence extraction from news/filings), `event-classifier` (macro/geopolitical event classification), and `thesis-rewriter` (LLM-enhanced recommendation thesis generation)
|
||||
2. FOR each agent, THE Documentation_Set SHALL document its purpose, input data, output schema, default model, system prompt structure, and user prompt template
|
||||
3. THE Documentation_Set SHALL document the `ai_agents` database table schema and how agents are registered (system-seeded vs user-created via the API)
|
||||
4. THE Documentation_Set SHALL document the `agent_variants` table and how to create, activate, and deactivate variants for A/B testing different models or prompts
|
||||
5. THE Documentation_Set SHALL document the `AgentConfigResolver` module including the TTL cache (60-second default), COALESCE-based variant override logic, and fallback behavior when no DB config exists
|
||||
6. THE Documentation_Set SHALL document the agent performance logging system and how to query `agent_performance_log` to compare variant effectiveness
|
||||
7. THE Documentation_Set SHALL document the API endpoints for managing agents (CRUD on `/api/agents`) and testing agent configurations (`/api/agents/{id}/test`)
|
||||
8. THE Documentation_Set SHALL include a step-by-step guide for creating a new agent variant with a different model or prompt and activating it for live traffic
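
The resolver behavior named in criterion 5 (TTL cache, COALESCE-style variant override, fallback when no database configuration exists) can be illustrated with a small standalone class. This is not the project's `AgentConfigResolver`: the class name `TTLConfigResolver`, the `load` callback, the row shape, and the model names used in the usage lines are assumptions.

```python
import time
from typing import Callable, Optional


def merge(row: dict) -> dict:
    """Mimic COALESCE(variant.x, agent.x): a non-null variant field wins."""
    base, variant = row["agent"], row.get("variant") or {}
    return {k: variant[k] if variant.get(k) is not None else v for k, v in base.items()}


class TTLConfigResolver:
    """Illustrative resolver with a per-agent TTL cache and a static fallback."""

    def __init__(self, load: Callable[[str], Optional[dict]], fallback: dict,
                 ttl: float = 60.0, clock: Callable[[], float] = time.monotonic):
        self._load, self._fallback = load, fallback
        self._ttl, self._clock = ttl, clock
        self._cache: dict[str, tuple[float, dict]] = {}

    def resolve(self, agent: str) -> dict:
        now = self._clock()
        hit = self._cache.get(agent)
        if hit and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: no database round trip
        row = self._load(agent)
        config = merge(row) if row else dict(self._fallback)
        self._cache[agent] = (now, config)
        return config


# Usage with a hypothetical loader: the variant overrides the model only.
resolver = TTLConfigResolver(
    load=lambda agent: {"agent": {"model": "model-a", "prompt": "base"},
                        "variant": {"model": "model-b", "prompt": None}},
    fallback={"model": "fallback-model", "prompt": "default"},
)
config = resolver.resolve("document-extractor")
```

Injecting the clock makes expiry testable without sleeping, and it also shows why a newly activated variant may take up to one TTL period to affect live traffic.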
|
||||
|
||||
### Requirement 14: Backup and Restore Guide
|
||||
|
||||
**User Story:** As an operator, I want a guide documenting all backup and restore scripts, their options, storage locations, and retention policies, so that I can protect data and recover from failures.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. THE Documentation_Set SHALL include a backup and restore guide documenting every script in `scripts/` related to backup and restore: `backup-db.sh`, `restore-db.sh`, `backup-redis.sh`, `backup.sh`, and `restore.sh`
|
||||
2. FOR each backup script, THE Documentation_Set SHALL document its CLI arguments, what data it captures, where backups are stored, and retention/pruning behavior (e.g., keeps last 7)
|
||||
3. FOR each restore script, THE Documentation_Set SHALL document its CLI arguments, what it restores, the service scale-down/scale-up procedure it performs, and any data loss implications
|
||||
4. THE Documentation_Set SHALL document the MinIO upload option (`--upload-minio`) for off-host backup storage
|
||||
5. THE Documentation_Set SHALL document the full database nuke and rebuild procedure including connection termination, database drop, Redis flush, redeploy, and re-seed steps
|
||||
6. THE Documentation_Set SHALL document recommended backup schedules and how to automate backups via cron or Kubernetes CronJobs
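
The "keeps last 7" retention rule from criterion 2 reduces to a short selection function. The sketch assumes backup filenames embed a sortable timestamp, so lexicographic order matches chronological order. The `stonks-YYYYMMDD.sql.gz` pattern is invented for the example and is not the naming used by the real scripts.

```python
def backups_to_prune(filenames: list[str], keep: int = 7) -> list[str]:
    """Return the backups to delete, keeping the `keep` newest by name."""
    ordered = sorted(filenames, reverse=True)  # newest first
    return sorted(ordered[keep:])  # everything past the newest `keep`


# Nine hypothetical daily backups; the two oldest fall outside the window.
names = [f"stonks-202401{day:02d}.sql.gz" for day in range(1, 10)]
stale = backups_to_prune(names)
```

With fewer than eight backups present the function returns an empty list, so the first week of scheduled runs prunes nothing.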
|
||||
|
||||
### Requirement 15: Observability and Prometheus Metrics Reference
|
||||
|
||||
**User Story:** As an operator, I want a reference documenting all Prometheus metrics exposed by the platform, the alerting rules, and how to monitor pipeline health, so that I can set up dashboards and respond to incidents.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. THE Documentation_Set SHALL include an observability reference documenting the `/metrics` endpoint on the query API and how to configure Prometheus to scrape it
|
||||
2. THE Documentation_Set SHALL document all Prometheus counters, gauges, and histograms emitted by each service, including metric name, labels, and what they measure (e.g., `EXTRACTION_ATTEMPTS`, `EXTRACTION_DURATION`, `AGGREGATION_WINDOWS_COMPUTED`, `AGGREGATION_SIGNALS_PROCESSED`, `RECOMMENDATION_GENERATED`, `RECOMMENDATION_CONFIDENCE`, alerting counters)
|
||||
3. THE Documentation_Set SHALL document the alerting module (`services/shared/alerting.py`) including all alert rules, their thresholds, evaluation windows, and the ConfigMap environment variables that control them (`ALERT_SOURCE_FAILURE_THRESHOLD`, `ALERT_SCHEMA_FAILURE_RATE_THRESHOLD`, `ALERT_LAKE_LAG_THRESHOLD_MINUTES`, `ALERT_BROKER_ERROR_THRESHOLD`, etc.)
|
||||
4. THE Documentation_Set SHALL document the structured JSON logging format, trace context propagation (trace_id, span_id), and how to query logs for debugging pipeline issues
|
||||
5. THE Documentation_Set SHALL document the dead-letter queue system including queue names, how failed jobs are routed there, and how to replay them using the dead-letter tooling
|
||||
6. THE Documentation_Set SHALL document recommended Prometheus/Grafana dashboard configurations or queries for monitoring ingestion throughput, extraction latency, aggregation volume, recommendation generation rate, and trading engine activity
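
Criterion 4 refers to structured JSON logs that carry trace context. As a rough illustration built on the standard `logging` module, the formatter below emits one JSON object per record. Only `trace_id` and `span_id` are named in the criterion; the other field names, the `JsonFormatter` class, and the `extractor` logger name are assumptions, not the platform's actual log schema.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Serialize each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Trace context is attached per record, e.g. via the `extra` argument.
            "trace_id": getattr(record, "trace_id", None),
            "span_id": getattr(record, "span_id", None),
        })


def format_example() -> dict:
    """Build one record by hand and return the parsed JSON output."""
    record = logging.LogRecord(
        "extractor", logging.INFO, "example.py", 0, "extracted %d items", (3,), None
    )
    record.trace_id, record.span_id = "abc123", "span-1"
    return json.loads(JsonFormatter().format(record))
```

With one object per line, a pipeline issue can be followed across services by filtering on a single `trace_id` value.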
|
||||
|
||||
### Requirement 16: README Resource Links
|
||||
|
||||
**User Story:** As a user landing on the repository, I want the README to link to all documentation resources, so that I can navigate to any guide, reference, or diagram from a single entry point.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. WHEN the README is updated, THE README SHALL include a documentation section with links to every document in the Documentation_Set
|
||||
2. THE README SHALL link to the API reference documents for all four FastAPI services
|
||||
3. THE README SHALL link to the Helm chart configuration reference
|
||||
4. THE README SHALL link to the Docker deployment guide
|
||||
5. THE README SHALL link to all three architecture diagram documents (Kubernetes, Docker Compose, and Data Pipeline)
|
||||
6. THE README SHALL link to the per-service feature documentation
|
||||
7. THE README SHALL link to the AI agent building guide
|
||||
8. THE README SHALL link to the backup and restore guide
|
||||
9. THE README SHALL link to the observability and Prometheus metrics reference
|
||||
10. THE README SHALL replace the existing ASCII architecture diagram with the Mermaid architecture diagram or link to it
|
||||
11. THE README SHALL preserve all existing content (license, features, tech stack, project structure, deployment instructions) while adding the new documentation links
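
The linking criteria above are mechanical enough to check in CI. The sketch below scans README text for Markdown link targets and reports which expected documents are not linked. The paths in `EXPECTED_DOCS` follow the file names used in the implementation plan; the helper `missing_links` and the regular expression are illustrative and ignore reference-style links.

```python
import re

EXPECTED_DOCS = [
    "docs/services.md",
    "docs/api-reference.md",
    "docs/helm-reference.md",
    "docs/docker-deployment.md",
    "docs/architecture-kubernetes.md",
    "docs/architecture-docker-compose.md",
    "docs/architecture-data-pipeline.md",
    "docs/ai-agents.md",
    "docs/backup-restore.md",
    "docs/observability.md",
]


def missing_links(readme_text: str, expected=EXPECTED_DOCS) -> list[str]:
    """Return expected documents with no inline Markdown link in the README."""
    # Capture the target of each [text](target), dropping any #fragment.
    raw_targets = re.findall(r"\]\(([^)#\s]+)", readme_text)
    targets = {target.removeprefix("./") for target in raw_targets}
    return [doc for doc in expected if doc not in targets]


sample = "See [services](docs/services.md) and [API](./docs/api-reference.md#query-api)."
missing = missing_links(sample)
```

For the sample text, eight of the ten documents are reported as missing, which is the kind of signal a README review step could fail on.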
|
||||
@@ -0,0 +1,223 @@
|
||||
# Implementation Plan: Comprehensive Quality & Documentation
|
||||
|
||||
## Overview
|
||||
|
||||
This plan implements three pillars for the Stonks Oracle platform: (1) unit test coverage for the scheduler and ingestion services, plus fixes for pre-existing test failures; (2) an extended docker-compose.yml that includes all 13 application services and the frontend; and (3) comprehensive documentation covering services, APIs, Helm configuration, Docker deployment, architecture diagrams, AI agents, backup/restore, observability, and README resource links. Tasks are ordered so that tests come first (to catch regressions early), then Docker Compose (infrastructure), then documentation (so that it references verified code).
|
||||
|
||||
## Tasks
|
||||
|
||||
- [x] 1. Write scheduler service unit tests
|
||||
- [x] 1.1 Create `tests/test_scheduler_unit.py` with unit tests for scheduler pure functions and orchestration
|
||||
- Import scheduler functions from `services/scheduler/app.py`
|
||||
- Mock `asyncpg.Pool` (`.fetch()`, `.fetchrow()`, `.fetchval()`, `.execute()`) and `redis.asyncio.Redis` (`.rpush()`, `.set()`, `.get()`, `.incr()`, `.expire()`, `.decr()`, `.delete()`)
|
||||
- Write 8+ test cases covering: `get_cadence_for_source`, `compute_backoff`, `is_source_due`, `build_job_payload`, `schedule_cycle` (mocked DB/Redis), `check_rate_limit`, `recover_stale_documents`, `retry_failed_extractions`
|
||||
- Verify error handling: DB/Redis connection failures handled without crashing
|
||||
- Use `pytest-asyncio` for async test functions, `unittest.mock.AsyncMock` and `unittest.mock.patch`
|
||||
- _Requirements: 1.1, 1.2, 1.3, 1.4_
|
||||
|
||||
- [x] 1.2 Write additional edge-case unit tests for scheduler
|
||||
- Test boundary conditions: zero polling interval, max retry count, empty source list
|
||||
- Test rate limiting edge cases: global Polygon limit, per-type limits
|
||||
- _Requirements: 1.3, 1.4_
|
||||
|
||||
- [x] 2. Write ingestion service unit tests
|
||||
- [x] 2.1 Create `tests/test_ingestion_unit.py` with unit tests for ingestion worker
|
||||
- Import ingestion functions from `services/ingestion/worker.py`
|
||||
- Mock adapters as `AsyncMock` returning `AdapterResult` with controlled `error`, `items`, `content_hash`, `raw_payload`
|
||||
- Mock `asyncpg.Pool` for `ingestion_runs` INSERT/UPDATE, `persist_ingestion_items`, `record_retrieval_failure`
|
||||
- Mock `redis.asyncio.Redis` for dedupe checks, queue pushes, DLQ routing
|
||||
- Mock `minio.Minio` for `upload_raw_artifact`
|
||||
- Write 6+ test cases covering: successful job processing, adapter error with retry, retry exhaustion → dead-letter queue, content hash deduplication skip, cross-source dedup via `dedupe_items`, error handling paths
|
||||
- _Requirements: 2.1, 2.2, 2.3, 2.4_
|
||||
|
||||
- [x] 2.2 Write additional edge-case unit tests for ingestion
|
||||
- Test empty adapter response, partial failures, multiple items in single job
|
||||
- _Requirements: 2.1, 2.4_
|
||||
|
||||
- [x] 3. Checkpoint — Verify new unit tests pass
|
||||
- Run `pytest tests/test_scheduler_unit.py tests/test_ingestion_unit.py -x --tb=short -q`
|
||||
- Ensure all tests pass; ask the user if questions arise.
|
||||
|
||||
- [x] 4. Fix pre-existing test failures
|
||||
- [x] 4.1 Fix `tests/test_extractor_prompts.py`
|
||||
- Run the file individually to diagnose failures
|
||||
- Fix test setup (mock configuration, fixture data) or production code as needed
|
||||
- Preserve original test intent and assertions
|
||||
- If production code changes are needed, add regression tests
|
||||
- _Requirements: 3.1, 3.5_
|
||||
|
||||
- [x] 4.2 Fix `tests/test_extractor_schemas.py`
|
||||
- Run the file individually to diagnose failures
|
||||
- Fix test setup or production code as needed
|
||||
- Preserve original test intent and assertions
|
||||
- _Requirements: 3.2, 3.5_
|
||||
|
||||
- [x] 4.3 Fix `tests/test_ollama_client.py`
|
||||
- Run the file individually to diagnose failures
|
||||
- Fix test setup or production code as needed
|
||||
- Preserve original test intent and assertions
|
||||
- _Requirements: 3.3, 3.5_
|
||||
|
||||
- [x] 4.4 Fix `tests/test_filings_adapter.py`
|
||||
- Run the file individually to diagnose failures
|
||||
- Fix test setup or production code as needed
|
||||
- Preserve original test intent and assertions
|
||||
- _Requirements: 3.4, 3.5_
|
||||
|
||||
- [x] 5. Checkpoint — Full test suite green
|
||||
- Run `pytest tests/ -x --tb=short -q` and verify zero failures
|
||||
- Run `ruff check services/` and verify zero violations
|
||||
- Verify all `test_pbt_*` files pass unchanged
|
||||
- If any production code was modified, confirm regression tests exist
|
||||
- Ensure all tests pass; ask the user if questions arise.
|
||||
- _Requirements: 4.1, 4.2, 4.3, 4.4_
|
||||
|
||||
- [x] 6. Add application services to docker-compose.yml
|
||||
- [x] 6.1 Add shared environment anchor and all 14 service definitions to `docker-compose.yml`
|
||||
- Define `x-app-env` YAML anchor with common environment variables (POSTGRES_HOST, POSTGRES_PORT, POSTGRES_DB, POSTGRES_USER, POSTGRES_PASSWORD, REDIS_HOST, REDIS_PORT, MINIO_ENDPOINT, MINIO_ACCESS_KEY, MINIO_SECRET_KEY, OLLAMA_BASE_URL)
|
||||
- Add 13 application service definitions: scheduler (using `docker/Dockerfile.scheduler`), plus symbol-registry, ingestion, parser, extractor, aggregation, recommendation, trading-engine, risk-engine, broker-adapter, lake-publisher, query-api, each of the latter using `docker/Dockerfile` with the appropriate `SERVICE_CMD` build arg
|
||||
- Add dashboard service using `frontend/Dockerfile` on port 3000:8080
|
||||
- Configure `depends_on` with `condition: service_healthy` for infrastructure dependencies
|
||||
- Add health checks: FastAPI services use `curl -f http://localhost:8000/health`, workers use process liveness
|
||||
- Configure `env_file: .env` on services needing API keys (ingestion, broker-adapter, trading-engine)
|
||||
- Map host ports: symbol-registry:8001, trading-engine:8002, risk-engine:8003, query-api:8004, dashboard:3000
|
||||
- _Requirements: 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7_
|
||||
|
||||
- [x] 6.2 Validate docker-compose.yml configuration
|
||||
- Run `docker compose config` to verify the updated file parses correctly
|
||||
- _Requirements: 5.1_
|
||||
|
||||
- [x] 7. Checkpoint — Tests and Docker Compose validated
|
||||
- Run `pytest tests/ -x --tb=short -q` to confirm no regressions
|
||||
- Run `docker compose config` to confirm valid YAML
|
||||
- Ensure all tests pass; ask the user if questions arise.
|
||||
|
||||
- [x] 8. Write per-service feature documentation
|
||||
- [x] 8.1 Create `docs/services.md` documenting all 13 services
|
||||
- For each service: purpose, entry point module path, configuration environment variables, database tables read/written, Redis queues consumed/published with message schemas
|
||||
- Include queue topology table (queue name → producer → consumer)
|
||||
- Document the three signal layers (company, macro, competitive) with data flow, toggles, and weight configurations
|
||||
- Document trading engine features: position sizing, circuit breakers, reserve pool, risk tier auto-adjustment, backtesting, notifications
|
||||
- Cross-reference API documentation for services with HTTP endpoints
|
||||
- _Requirements: 6.1, 6.2, 6.3, 6.4, 6.5_
|
||||
|
||||
- [x] 9. Write API reference documentation
|
||||
- [x] 9.1 Create `docs/api-reference.md` covering all four FastAPI services
|
||||
- Document all Query API endpoints (~40+): path, method, query parameters (type, default, constraints), request body schema, response schema, error codes
|
||||
- Document all Symbol Registry API endpoints: companies CRUD, aliases, watchlists, sources, exposure profiles, competitor relationships, competitor inference
|
||||
- Document all Trading API endpoints: health/readiness, engine status, config update, pause/resume, reset, decisions audit, performance metrics/history, backtesting, notifications config/history, override orders, debug state
|
||||
- Document all Risk API endpoints: order evaluation (POST /evaluate), health, pending approvals, approval review, approval expiration
|
||||
- Inspect actual route definitions in `services/api/app.py`, `services/symbol_registry/app.py`, `services/trading/app.py`, `services/risk/app.py`
|
||||
- _Requirements: 7.1, 7.2, 7.3, 7.4, 7.5, 7.6_
|
||||
|
||||
- [x] 10. Write Helm chart configuration reference
|
||||
- [x] 10.1 Create `docs/helm-reference.md` documenting all Helm values
|
||||
- Document `image` block: registry, pullPolicy, tag
|
||||
- Document `pipelineEnabled` toggle and effect on worker replicas
|
||||
- Document `services` block: per-service structure (replicas, image, command, tier, port, secrets, resources, probes)
|
||||
- Document `config` block: all ConfigMap environment variables with defaults and descriptions
|
||||
- Document `secrets` block: core, broker, market, gmail, dashboard — injection via `--set` flags
|
||||
- Document `ingress` block: className, clusterIssuer, host mappings
|
||||
- Document analytics stack toggles: trino.enabled, hiveMetastore.enabled, superset.enabled with resources
|
||||
- Document `networkPolicies.enabled` and default-deny-ingress behavior
|
||||
- Document value override files: `values-beta.yaml`, `values-paper.yaml` and deployment stages
|
||||
- _Requirements: 8.1, 8.2, 8.3, 8.4, 8.5, 8.6, 8.7, 8.8, 8.9_
|
||||
|
||||
- [x] 11. Write Docker deployment guide
|
||||
- [x] 11.1 Create `docs/docker-deployment.md` with complete Docker deployment guide
|
||||
- Document every service with image, ports, volumes, environment variables
|
||||
- Document `.env` file format with all required/optional variables, defaults, descriptions
|
||||
- Document volume mounts and data persistence (pgdata, miniodata, ollama_models, hive_data, superset_data), reset with `docker compose down -v`
|
||||
- Document health check configurations and verification commands
|
||||
- Document Dockerfile build arguments (`SERVICE_CMD`) and custom image builds
|
||||
- Document operational commands: start, stop, restart, logs, scale, reset
|
||||
- _Requirements: 9.1, 9.2, 9.3, 9.4, 9.5, 9.6_
|
||||
|
||||
- [x] 12. Checkpoint — Documentation progress check
|
||||
- Verify that `docs/services.md`, `docs/api-reference.md`, `docs/helm-reference.md`, and `docs/docker-deployment.md` exist and render as valid Markdown
|
||||
- Ensure all tests pass; ask the user if questions arise.
|
||||
|
||||
- [x] 13. Write architecture diagrams
|
||||
- [x] 13.1 Create `docs/architecture-kubernetes.md` with Kubernetes deployment Mermaid diagram
|
||||
- Show all 13 services in `stonks-oracle` namespace grouped by tier (api, processing, trading, orchestration, analytics, frontend)
|
||||
- Show external cluster services (PostgreSQL, Redis, MinIO, Ollama) in their namespaces
|
||||
- Show Traefik ingress routes to external domains
|
||||
- Show network policy boundaries
|
||||
- Show analytics plane (Trino, Hive Metastore, Superset) and MinIO connections
|
||||
- Show Helm-managed secrets (core, broker, market, gmail) with consumer mapping
|
||||
- Distinguish API-tier (with ingress), pipeline-tier (queue-driven), and trading-tier services
|
||||
- _Requirements: 10.1, 10.2, 10.3, 10.4, 10.5, 10.6, 10.7_
|
||||
|
||||
- [x] 13.2 Create `docs/architecture-docker-compose.md` with Docker Compose Mermaid diagram
|
||||
- Show all infrastructure + application containers
|
||||
- Show host port mappings for externally accessible services
|
||||
- Show `depends_on` relationships and health check dependencies
|
||||
- Show named volumes and mount points
|
||||
- Show `.env` file providing API keys to relevant containers
|
||||
- Show internal Docker network connectivity
|
||||
- _Requirements: 11.1, 11.2, 11.3, 11.4, 11.5, 11.6_
|
||||
|
||||
  - [x] 13.3 Create `docs/architecture-data-pipeline.md` with data pipeline Mermaid diagram
    - Show complete pipeline: external sources → ingestion → parsing → extraction → aggregation → recommendation → risk → trading → broker
    - Show Redis queue topology with queue names
    - Show three signal layers as distinct paths merging at aggregation
    - Show data stores at each stage (MinIO, PostgreSQL, Redis)
    - Show trading engine decision loop
    - Show analytical branch: lake publisher → MinIO/Parquet → Trino → Superset/Dashboard
    - Show external integrations: Ollama, Alpaca, AWS SNS, Gmail
    - _Requirements: 12.1, 12.2, 12.3, 12.4, 12.5, 12.6, 12.7_

- [x] 14. Write AI agent building guide
  - [x] 14.1 Create `docs/ai-agents.md` with AI agent guide
    - Document three built-in agents (document-extractor, event-classifier, thesis-rewriter): purpose, input data, output schema, default model, system prompt structure, user prompt template
    - Document `ai_agents` table schema and registration (system-seeded vs API-created)
    - Document `agent_variants` table: create, activate, deactivate variants for A/B testing
    - Document `AgentConfigResolver` module: TTL cache (60s), COALESCE-based variant override, fallback behavior
    - Document performance logging: `agent_performance_log` table, querying for variant comparison
    - Document API endpoints: CRUD on `/api/agents`, test endpoint `/api/agents/{id}/test`
    - Include step-by-step guide: creating a new variant with different model/prompt and activating it
    - _Requirements: 13.1, 13.2, 13.3, 13.4, 13.5, 13.6, 13.7, 13.8_

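
The resolver behavior named in task 14.1 (a 60-second TTL cache, a COALESCE-style variant override, and fallback to the base agent) might look roughly like the sketch below. It is an illustration only and not the project's `AgentConfigResolver`: the class name `TTLConfigResolver`, the `AgentConfig` fields, and the two loader callables are assumptions standing in for the real `ai_agents` and `agent_variants` lookups.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentConfig:
    # Hypothetical fields; the real schema lives in the `ai_agents` table.
    model: str
    system_prompt: str


def coalesce(*values):
    """Return the first value that is not None, like SQL COALESCE."""
    return next((v for v in values if v is not None), None)


class TTLConfigResolver:
    """Illustrative stand-in for AgentConfigResolver (assumed design)."""

    def __init__(self, load_base, load_variant, ttl_seconds=60.0, clock=time.monotonic):
        self._load_base = load_base        # slug -> dict of base agent fields
        self._load_variant = load_variant  # slug -> dict of overrides, or None
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}                   # slug -> (expires_at, AgentConfig)

    def resolve(self, slug):
        now = self._clock()
        hit = self._cache.get(slug)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh: skip the loaders entirely

        base = self._load_base(slug)
        # No active variant means every field falls back to the base agent.
        variant = self._load_variant(slug) or {}
        config = AgentConfig(
            model=coalesce(variant.get("model"), base["model"]),
            system_prompt=coalesce(variant.get("system_prompt"), base["system_prompt"]),
        )
        self._cache[slug] = (now + self._ttl, config)
        return config
```

Injecting the clock keeps expiry testable without sleeping; a variant that sets only `model` leaves the base system prompt in place.
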
- [x] 15. Write backup and restore guide
  - [x] 15.1 Create `docs/backup-restore.md` with backup and restore guide
    - Document all scripts in `scripts/`: `backup-db.sh`, `restore-db.sh`, `backup-redis.sh`, `backup.sh`, `restore.sh`
    - For each backup script: CLI arguments, data captured, storage location, retention/pruning (keeps last 7)
    - For each restore script: CLI arguments, what it restores, service scale-down/up procedure, data loss implications
    - Document MinIO upload option (`--upload-minio`) for off-host storage
    - Document full nuke-and-rebuild procedure: connection termination, DB drop, Redis flush, redeploy, re-seed
    - Document recommended backup schedules and automation (cron, Kubernetes CronJobs)
    - _Requirements: 14.1, 14.2, 14.3, 14.4, 14.5, 14.6_

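
The "keeps last 7" retention rule in task 15.1 can be illustrated with a short Python sketch. The actual pruning lives in the shell scripts under `scripts/`, whose contents are not shown here, so the function name `prune_backups`, the `*.sql.gz` file pattern, and the use of modification time as the ordering key are all assumptions.

```python
from pathlib import Path


def prune_backups(backup_dir: Path, pattern: str = "*.sql.gz", keep: int = 7) -> list[Path]:
    """Delete all but the `keep` most recently modified files matching `pattern`.

    Returns the paths that were removed. The pattern is a hypothetical
    naming scheme, not necessarily what backup-db.sh produces.
    """
    files = sorted(
        backup_dir.glob(pattern),
        key=lambda p: p.stat().st_mtime,
        reverse=True,  # newest first
    )
    removed = files[keep:]
    for path in removed:
        path.unlink()
    return removed
```

Documenting the rule this way makes the data-loss implication explicit: once an eighth backup lands, the oldest one is gone unless it was uploaded off-host first.
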
- [x] 16. Write observability and metrics reference
  - [x] 16.1 Create `docs/observability.md` with observability reference
    - Document `/metrics` endpoint on query-api and Prometheus scrape configuration
    - Document all Prometheus counters, gauges, and histograms from `services/shared/metrics.py` (ingestion, parsing, extraction, aggregation, recommendation, lake, trading, alerting, DLQ, and active-jobs metrics) with names, labels, and descriptions
    - Document alerting module (`services/shared/alerting.py`): 4 alert rules, thresholds, evaluation windows, ConfigMap variables
    - Document structured JSON logging format, trace context (trace_id, span_id), log querying
    - Document dead-letter queue system: queue names (`stonks:dlq:<queue>`), routing, replay tooling
    - Document recommended Prometheus/Grafana queries for monitoring
    - _Requirements: 15.1, 15.2, 15.3, 15.4, 15.5, 15.6_

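
Two items in task 16.1 lend themselves to a small example: the `stonks:dlq:<queue>` naming scheme and structured JSON logs carrying `trace_id` and `span_id`. The sketch below uses only the standard library. The helper `dlq_name`, the class `JsonFormatter`, and the JSON keys `level`, `logger`, and `message` are assumptions for illustration; the platform's actual log schema may differ.

```python
import json
import logging


def dlq_name(queue: str) -> str:
    """Build the dead-letter queue key for a source queue."""
    return f"stonks:dlq:{queue}"


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object (assumed field names)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Trace context is expected via `extra=`; absent fields become null.
            "trace_id": getattr(record, "trace_id", None),
            "span_id": getattr(record, "span_id", None),
        }
        return json.dumps(payload)
```

A handler configured with `JsonFormatter()` would then accept calls such as `logger.info("parsed", extra={"trace_id": "abc", "span_id": "def"})`, which gives log queries a stable key to filter on.
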
- [x] 17. Update README with documentation links
  - [x] 17.1 Update `README.md` with documentation section and resource links
    - Add "Documentation" section with links to all docs: services.md, api-reference.md, helm-reference.md, docker-deployment.md, architecture-kubernetes.md, architecture-docker-compose.md, architecture-data-pipeline.md, ai-agents.md, backup-restore.md, observability.md
    - Replace ASCII architecture diagram with Mermaid diagram or link to architecture diagram docs
    - Preserve all existing content: license, features, tech stack, project structure, deployment instructions
    - _Requirements: 16.1, 16.2, 16.3, 16.4, 16.5, 16.6, 16.7, 16.8, 16.9, 16.10, 16.11_

- [x] 18. Final checkpoint: full verification
  - Run `pytest tests/ -x --tb=short -q`: zero failures
  - Run `ruff check services/`: zero violations
  - Run `docker compose config`: validates successfully
  - Verify all `test_pbt_*` files pass unchanged
  - Verify all documentation files exist in `docs/` and render valid Markdown
  - Ensure all tests pass; ask the user if questions arise.

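
The "all documentation files exist in `docs/`" check from task 18 could be automated with a few lines of Python. This is a sketch under assumptions: the file list is taken from task 17.1, all ten files are assumed to live directly under `docs/`, and the names `EXPECTED_DOCS` and `missing_docs` are hypothetical. It checks existence only, not whether the Markdown renders correctly.

```python
from pathlib import Path

# File names as listed in task 17.1; location under docs/ is assumed.
EXPECTED_DOCS = [
    "services.md",
    "api-reference.md",
    "helm-reference.md",
    "docker-deployment.md",
    "architecture-kubernetes.md",
    "architecture-docker-compose.md",
    "architecture-data-pipeline.md",
    "ai-agents.md",
    "backup-restore.md",
    "observability.md",
]


def missing_docs(docs_dir: Path) -> list[str]:
    """Return the expected documentation files that are absent from docs_dir."""
    return [name for name in EXPECTED_DOCS if not (docs_dir / name).is_file()]
```

Calling `missing_docs(Path("docs"))` from the repository root would return an empty list once every guide is in place.
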
## Notes

- Tasks marked with `*` are optional and can be skipped for a faster MVP
- Each task references specific requirements for traceability
- Checkpoints ensure incremental validation
- No property-based tests are included; the design assessment confirmed PBT is not applicable to this feature
- Existing `test_pbt_*` files (22 files) must remain passing throughout
- The implementation language is Python (with Markdown for documentation), matching the existing codebase