feat: comprehensive docs, unit tests, docker-compose app services

- Add scheduler and ingestion unit tests (test_scheduler_unit.py, test_ingestion_unit.py)
- Add all 13 app services + dashboard to docker-compose.yml
- Add full documentation suite: API reference, Helm reference, Docker deployment guide, 3 architecture diagrams (K8s, Docker Compose, data pipeline), AI agent guide, backup/restore guide, observability/metrics reference, per-service docs
- Add intelligence pipeline deep-dive docs with Mermaid diagrams
- Update README with documentation index and links
- Add specs for comprehensive-quality-docs, intelligence-pipeline-deep-dive, sanitized-pipeline-docs
@@ -0,0 +1 @@
{"specId": "e433350c-baf0-4f4f-a30e-3724f6654090", "workflowType": "requirements-first", "specType": "feature"}
@@ -0,0 +1,377 @@
# Design Document: Comprehensive Quality & Documentation

## Overview

This design covers three pillars for the Stonks Oracle platform:

1. **Test Coverage** — Close unit test gaps in the scheduler and ingestion services, fix pre-existing test failures in the extractor module, and achieve a fully green test suite (Requirements 1–4).
2. **Docker Deployment** — Extend `docker-compose.yml` to include all 13 application services plus the frontend, enabling full-platform local development without Kubernetes (Requirement 5).
3. **Documentation** — Produce comprehensive documentation covering per-service features, API references, Helm chart configuration, Docker deployment, three Mermaid architecture diagrams, AI agent building, backup/restore, observability, and README resource links (Requirements 6–16).

### Design Rationale

The platform has mature production code across 13 services but uneven test coverage and documentation. The scheduler and ingestion services lack dedicated unit tests — their logic is only exercised through integration tests. Four extractor-related test files have pre-existing failures that block CI. Documentation exists only as a local dev setup guide, a pipeline overview, and a runbook. This initiative fills those gaps systematically.

The approach prioritizes:
- **Test isolation**: Mock all external dependencies (PostgreSQL, Redis, MinIO, Ollama) so unit tests run fast and deterministically.
- **Documentation from source**: Generate API references by inspecting actual FastAPI route definitions, Helm values from `values.yaml`, and metrics from `services/shared/metrics.py`.
- **Docker parity with Kubernetes**: Mirror the Helm chart's service definitions in Docker Compose so both deployment modes stay in sync.

## Architecture

The work does not change the platform's runtime architecture. It adds:

1. **New test files** in `tests/` for scheduler and ingestion unit tests.
2. **Fixes** to existing test files and/or production code to resolve failures.
3. **New service definitions** in `docker-compose.yml` using the existing `docker/Dockerfile` with `SERVICE_CMD` build args.
4. **New documentation files** in `docs/` organized by topic.
5. **Updated `README.md`** with a documentation index and Mermaid diagram.

```mermaid
graph TD
    subgraph "Test Coverage (Reqs 1-4)"
        T1[tests/test_scheduler_unit.py]
        T2[tests/test_ingestion_unit.py]
        T3[Fix test_extractor_prompts.py]
        T4[Fix test_extractor_schemas.py]
        T5[Fix test_ollama_client.py]
        T6[Fix test_filings_adapter.py]
    end

    subgraph "Docker (Req 5)"
        D1[docker-compose.yml<br/>+ 13 app services + frontend]
    end

    subgraph "Documentation (Reqs 6-16)"
        DOC1[docs/services.md]
        DOC2[docs/api-reference.md]
        DOC3[docs/helm-reference.md]
        DOC4[docs/docker-deployment.md]
        DOC5[docs/architecture-kubernetes.md]
        DOC6[docs/architecture-docker-compose.md]
        DOC7[docs/architecture-data-pipeline.md]
        DOC8[docs/ai-agents.md]
        DOC9[docs/backup-restore.md]
        DOC10[docs/observability.md]
        DOC11[README.md update]
    end
```

## Components and Interfaces

### 1. Scheduler Unit Tests (Requirement 1)

**Target module**: `services/scheduler/app.py`

**Functions to test in isolation**:
- `get_cadence_for_source(source_type, config)` — Returns polling interval from config or defaults.
- `compute_backoff(retry_count)` — Exponential backoff with cap.
- `is_source_due(...)` — Core scheduling logic: determines if a source needs polling based on last run status, timing, retry state.
- `build_job_payload(source, aliases, now)` — Constructs the ingestion job dict.
- `schedule_cycle(pool, rds)` — Full scheduling pass (mocked DB/Redis).
- `check_rate_limit(rds, source_type, now)` — Rate limiting with per-type and global Polygon limits.
- `recover_stale_documents(pool, rds)` — Re-enqueue orphaned parsed documents.
- `retry_failed_extractions(pool, rds)` — Re-enqueue failed extractions.

**Mocking strategy**:
- `asyncpg.Pool` → `AsyncMock` with `.fetch()`, `.fetchrow()`, `.fetchval()`, `.execute()` returning canned records.
- `redis.asyncio.Redis` → `AsyncMock` with `.rpush()`, `.set()`, `.get()`, `.incr()`, `.expire()`, `.decr()`, `.delete()` tracking calls.
- Use `unittest.mock.patch` for module-level imports where needed.

**Test file**: `tests/test_scheduler_unit.py`
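
The mocking strategy above could look roughly like the following sketch. It is not the contents of `tests/test_scheduler_unit.py`: `compute_backoff` is a stand-in whose base (30 s) and cap (3600 s) are assumed values, and `enqueue_if_due` with its SQL string is a hypothetical helper invented for illustration. Only the `AsyncMock` usage reflects the strategy described here; `asyncio.run` is used so the sketch runs without pytest-asyncio.

```python
import asyncio
import json
from unittest.mock import AsyncMock


def compute_backoff(retry_count: int, base: float = 30.0, cap: float = 3600.0) -> float:
    """Stand-in: exponential backoff with a cap (base and cap are assumptions)."""
    return min(cap, base * (2 ** retry_count))


async def enqueue_if_due(pool, rds, queue: str) -> int:
    """Hypothetical helper: push one job per due source row returned by the pool."""
    rows = await pool.fetch("SELECT id, source_type FROM sources WHERE due")
    for row in rows:
        await rds.rpush(queue, json.dumps({"source_id": row["id"]}))
    return len(rows)


def test_backoff_is_capped():
    assert compute_backoff(0) == 30.0
    assert compute_backoff(3) == 240.0
    assert compute_backoff(20) == 3600.0  # far past the cap


def test_enqueue_pushes_one_job_per_due_source():
    # AsyncMock stands in for asyncpg.Pool and redis.asyncio.Redis.
    pool = AsyncMock()
    pool.fetch.return_value = [
        {"id": 1, "source_type": "news"},
        {"id": 2, "source_type": "filings"},
    ]
    rds = AsyncMock()

    count = asyncio.run(enqueue_if_due(pool, rds, "stonks:queue:ingestion"))

    assert count == 2
    assert rds.rpush.await_count == 2
    rds.rpush.assert_any_await("stonks:queue:ingestion", json.dumps({"source_id": 1}))
```

Because both dependencies are mocks, the test needs no running PostgreSQL or Redis, which is the point of the isolation goal stated in the design rationale.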

### 2. Ingestion Unit Tests (Requirement 2)

**Target module**: `services/ingestion/worker.py`

**Functions to test**:
- `process_job(job, pool, rds, minio_client, adapters)` — Main job processing with various adapter outcomes.
- Error handling paths: adapter returns `AdapterResult(error=...)`, retry exhaustion, dead-letter routing.
- Deduplication: content hash already seen in Redis, cross-source document dedup via `dedupe_items`.

**Mocking strategy**:
- Adapters → `AsyncMock` returning `AdapterResult` with controlled `error`, `items`, `content_hash`, `raw_payload`.
- `asyncpg.Pool` → `AsyncMock` for `ingestion_runs` INSERT/UPDATE, `persist_ingestion_items`, `record_retrieval_failure`.
- `redis.asyncio.Redis` → `AsyncMock` for dedupe checks, queue pushes, DLQ routing.
- `minio.Minio` → `MagicMock` for `upload_raw_artifact`.

**Test file**: `tests/test_ingestion_unit.py`
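
As a rough illustration of the error-handling and deduplication paths listed above, the sketch below uses a simplified, hypothetical `process_job` with fewer parameters than the real one. The `AdapterResult` dataclass is a stand-in carrying only some of the fields named here, and `MAX_RETRIES`, the `seen:` key prefix, and the DLQ name `stonks:dlq:ingestion` are assumptions rather than values taken from the worker.

```python
import asyncio
import json
from dataclasses import dataclass, field
from typing import Optional
from unittest.mock import AsyncMock

MAX_RETRIES = 3  # assumed limit, not taken from the real worker


@dataclass
class AdapterResult:
    """Stand-in for the real AdapterResult; only a subset of its fields."""
    error: Optional[str] = None
    items: list = field(default_factory=list)
    content_hash: Optional[str] = None


async def process_job(job: dict, rds, adapter) -> str:
    """Hypothetical, simplified flow: fetch, then dedupe or route the failure."""
    result = await adapter.fetch(job)
    if result.error:
        if job.get("retry_count", 0) + 1 >= MAX_RETRIES:
            # Retries exhausted: park the job on the dead-letter queue.
            await rds.rpush("stonks:dlq:ingestion", json.dumps(job))
            return "dead_lettered"
        return "retry"
    # SET NX returns a falsy value when the hash was already recorded.
    is_new = await rds.set(f"seen:{result.content_hash}", "1", nx=True)
    return "processed" if is_new else "duplicate"


def test_exhausted_retries_go_to_dlq():
    adapter = AsyncMock()
    adapter.fetch.return_value = AdapterResult(error="HTTP 500")
    rds = AsyncMock()

    outcome = asyncio.run(process_job({"id": 7, "retry_count": 2}, rds, adapter))

    assert outcome == "dead_lettered"
    assert rds.rpush.await_count == 1
```

The same pattern covers the duplicate path: set `rds.set.return_value = None` and assert that the outcome is `"duplicate"`.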

### 3. Extractor Test Fixes (Requirement 3)

**Target files**:
- `tests/test_extractor_prompts.py`
- `tests/test_extractor_schemas.py`
- `tests/test_ollama_client.py`
- `tests/test_filings_adapter.py`

**Approach**: Run each file individually, diagnose failures, and fix either the test setup (mock configuration, fixture data) or the production code. Preserve original test intent and assertions. If production code changes are needed, add regression tests.

### 4. Full Test Suite Green (Requirement 4)

**Verification**: Run `pytest tests/ -x --tb=short -q` and `ruff check services/` after all fixes. All existing `test_pbt_*` files must remain passing. Any production code fix must include a regression test.

### 5. Docker Compose Application Services (Requirement 5)

**Current state**: `docker-compose.yml` defines 8 infrastructure containers: the 7 infrastructure services (postgres, redis, minio, ollama, trino, hive-metastore, superset) plus `minio-init`.

**Addition**: the 13 new service definitions listed below (backend services plus the frontend dashboard):

| Service | Image Build | Command | Port | Depends On |
|---------|------------|---------|------|------------|
| scheduler | `docker/Dockerfile.scheduler` | `python -m services.scheduler.app` | — | postgres, redis |
| symbol-registry | `docker/Dockerfile` | `uvicorn services.symbol_registry.app:app --host 0.0.0.0 --port 8000` | 8001:8000 | postgres |
| ingestion | `docker/Dockerfile` | `python -m services.ingestion.worker` | — | postgres, redis, minio |
| parser | `docker/Dockerfile` | `python -m services.parser.worker` | — | postgres, redis |
| extractor | `docker/Dockerfile` | `python -m services.extractor.main` | — | postgres, redis, ollama |
| aggregation | `docker/Dockerfile` | `python -m services.aggregation.main` | — | postgres, redis |
| recommendation | `docker/Dockerfile` | `python -m services.recommendation.main` | — | postgres, redis |
| trading-engine | `docker/Dockerfile` | `uvicorn services.trading.app:app --host 0.0.0.0 --port 8000` | 8002:8000 | postgres, redis |
| risk-engine | `docker/Dockerfile` | `uvicorn services.risk.app:app --host 0.0.0.0 --port 8000` | 8003:8000 | postgres |
| broker-adapter | `docker/Dockerfile` | `python -m services.adapters.broker_service` | — | postgres, redis |
| lake-publisher | `docker/Dockerfile` | `python -m services.lake_publisher.jobs` | — | postgres, minio |
| query-api | `docker/Dockerfile` | `uvicorn services.api.app:app --host 0.0.0.0 --port 8000` | 8004:8000 | postgres, redis, minio |
| dashboard | `frontend/Dockerfile` | nginx (built-in) | 3000:8080 | query-api |

**Common environment block** (shared via `x-app-env` YAML anchor):
```yaml
POSTGRES_HOST: postgres
POSTGRES_PORT: "5432"
POSTGRES_DB: stonks
POSTGRES_USER: stonks
POSTGRES_PASSWORD: stonks_dev
REDIS_HOST: redis
REDIS_PORT: "6379"
MINIO_ENDPOINT: minio:9000
MINIO_ACCESS_KEY: minioadmin
MINIO_SECRET_KEY: minioadmin
OLLAMA_BASE_URL: http://ollama:11434
```

**`.env` file support**: `MARKET_DATA_API_KEY`, `BROKER_API_KEY`, `BROKER_API_SECRET`, `BROKER_BASE_URL` loaded via `env_file: .env` on services that need them (ingestion, broker-adapter, trading-engine).

**Health checks**: FastAPI services use `curl -f http://localhost:8000/health`; workers use process liveness checks. Infrastructure `depends_on` uses `condition: service_healthy`.

### 6. Documentation Structure (Requirements 6–16)

All documentation files are Markdown in `docs/`. The structure:

```
docs/
├── services.md                     # Req 6: Per-service feature docs
├── api-reference.md                # Req 7: All 4 FastAPI API references
├── helm-reference.md               # Req 8: Helm chart values reference
├── docker-deployment.md            # Req 9: Docker deployment guide
├── architecture-kubernetes.md      # Req 10: K8s Mermaid diagram
├── architecture-docker-compose.md  # Req 11: Docker Compose Mermaid diagram
├── architecture-data-pipeline.md   # Req 12: Data pipeline Mermaid diagram
├── ai-agents.md                    # Req 13: AI agent building guide
├── backup-restore.md               # Req 14: Backup and restore guide
├── observability.md                # Req 15: Observability & metrics reference
├── LOCAL_DEV_SETUP.md              # (existing)
├── llm-to-trade-pipeline.md        # (existing)
└── notes/
    └── runbook.md                  # (existing)
```

#### 6a. Service Feature Documentation (`docs/services.md`) — Req 6

For each of the 13 services, document:
- **Purpose**: What the service does in the pipeline.
- **Entry point**: Module path (e.g., `services.scheduler.app`).
- **Configuration**: Environment variables from `services/shared/config.py` relevant to this service.
- **Database tables**: Tables read/written by this service.
- **Redis queues**: Queue names consumed from and published to (from `services/shared/redis_keys.py`).
- **Queue message schema**: JSON structure of messages.
- **Signal layers**: For aggregation/recommendation, document the three signal layers (company, macro, competitive), their toggles (`macro_enabled`, `competitive_enabled` in `risk_configs`), and weight configurations.
- **Trading engine features**: For the trading service, document position sizing, circuit breakers, reserve pool, risk tier auto-adjustment, backtesting, and notification configuration.

Queue topology reference (from `redis_keys.py`):

| Queue | Producer | Consumer |
|-------|----------|----------|
| `stonks:queue:ingestion` | scheduler | ingestion |
| `stonks:queue:parsing` | ingestion | parser |
| `stonks:queue:extraction` | parser | extractor |
| `stonks:queue:macro_classification` | parser, scheduler | extractor |
| `stonks:queue:aggregation` | extractor | aggregation |
| `stonks:queue:recommendation` | aggregation | recommendation |
| `stonks:queue:lake_publish` | various | lake-publisher |
| `stonks:queue:broker_orders` | trading-engine, trading API | broker-adapter |
| `stonks:queue:trading_decisions` | recommendation | trading-engine |
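
The queue topology can also be read as a directed graph between services. The sketch below transcribes the table into a dictionary and walks it; the dictionary layout and the helper names `downstream` and `reachable` are illustrative and do not come from `redis_keys.py`. The "various" producers of `lake_publish` are left empty because the table does not name them.

```python
from collections import deque

# Queue -> (producers, consumer), transcribed from the table above.
QUEUE_TOPOLOGY = {
    "stonks:queue:ingestion": (("scheduler",), "ingestion"),
    "stonks:queue:parsing": (("ingestion",), "parser"),
    "stonks:queue:extraction": (("parser",), "extractor"),
    "stonks:queue:macro_classification": (("parser", "scheduler"), "extractor"),
    "stonks:queue:aggregation": (("extractor",), "aggregation"),
    "stonks:queue:recommendation": (("aggregation",), "recommendation"),
    "stonks:queue:lake_publish": ((), "lake-publisher"),  # producers listed as "various"
    "stonks:queue:broker_orders": (("trading-engine", "trading API"), "broker-adapter"),
    "stonks:queue:trading_decisions": (("recommendation",), "trading-engine"),
}


def downstream(service: str) -> list[str]:
    """Services that consume a queue this service publishes to."""
    return sorted(
        {consumer for producers, consumer in QUEUE_TOPOLOGY.values() if service in producers}
    )


def reachable(start: str) -> set[str]:
    """All services reachable from `start` by following queues (breadth-first)."""
    seen: set[str] = set()
    pending = deque([start])
    while pending:
        for nxt in downstream(pending.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                pending.append(nxt)
    return seen


print(downstream("scheduler"))
print(sorted(reachable("scheduler")))
```

A check like `"broker-adapter" in reachable("scheduler")` is one way to confirm that the documented queues connect the start of the pipeline to the broker.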

#### 6b. API Reference (`docs/api-reference.md`) — Req 7

Document all endpoints from the four FastAPI services by inspecting their route definitions:

**Query API** (`services/api/app.py`): 40+ endpoints covering companies, documents, trends, recommendations, evidence drill-down, orders, positions, portfolio, global events, macro impacts, competitive signals, trend projections, agents, dead-letter queues, pipeline control, SQL explorer, saved queries, audit trail, DevOps metrics, and Prometheus metrics.

**Symbol Registry API** (`services/symbol_registry/app.py`): Companies CRUD, aliases, watchlists, sources, exposure profiles, competitor relationships, competitor inference.

**Trading API** (`services/trading/app.py`): Health/readiness, engine status, config update, pause/resume, reset, decisions audit, performance metrics/history, backtesting, notifications config/history, override orders, debug state.

**Risk API** (`services/risk/app.py`): Order evaluation (`POST /evaluate`), health, pending approvals, approval review, approval expiration.

For each endpoint: method, path, query parameters (type, default, constraints), request body schema, response schema, error codes (4xx/5xx).

#### 6c. Helm Chart Reference (`docs/helm-reference.md`) — Req 8

Document from `infra/helm/stonks-oracle/values.yaml`:
- `image` block: registry, pullPolicy, tag
- `pipelineEnabled`: toggle and effect on worker replicas
- `services` block: per-service structure (replicas, image, command, tier, port, secrets, resources, probes)
- `config` block: all ConfigMap environment variables with defaults and descriptions
- `secrets` block: core, broker, market, gmail, dashboard — injection via `--set` flags
- `ingress` block: className, clusterIssuer, host mappings
- Analytics stack: trino, hiveMetastore, superset toggles and resources
- `networkPolicies.enabled`: default-deny-ingress behavior
- Value override files: `values-beta.yaml`, `values-paper.yaml` and their deployment stages

#### 6d. Docker Deployment Guide (`docs/docker-deployment.md`) — Req 9

- Complete service inventory with images, ports, volumes, environment variables
- `.env` file format with all required/optional variables
- Volume mounts and data persistence (pgdata, miniodata, ollama_models, hive_data, superset_data)
- Health check configurations
- Dockerfile build arguments (`SERVICE_CMD`)
- Operational commands: start, stop, restart, logs, scale, reset (`docker compose down -v`)

#### 6e. Architecture Diagrams (Reqs 10–12)

**Kubernetes diagram** (`docs/architecture-kubernetes.md`):
- `stonks-oracle` namespace with all 13 services grouped by tier (api, processing, trading, orchestration, analytics, frontend)
- External cluster services in their namespaces (postgresql-service, redis-service, minio-service, ollama-service)
- Traefik ingress routes to external domains
- Network policy boundaries
- Analytics plane (Trino, Hive Metastore, Superset)
- Helm-managed secrets (core, broker, market, gmail) with consumer mapping
- Service tier distinction (API with ingress, pipeline workers, trading)

**Docker Compose diagram** (`docs/architecture-docker-compose.md`):
- All infrastructure + application containers
- Host port mappings
- `depends_on` relationships and health check dependencies
- Named volumes and mount points
- `.env` file providing API keys
- Internal Docker network connectivity

**Data Pipeline diagram** (`docs/architecture-data-pipeline.md`):
- External sources → ingestion → parsing → extraction → aggregation → recommendation → risk → trading → broker
- Redis queue topology with queue names
- Three signal layers as distinct paths merging at aggregation
- Data stores at each stage (MinIO, PostgreSQL, Redis)
- Trading engine decision loop
- Analytical branch (lake publisher → MinIO/Parquet → Trino → Superset/Dashboard)
- External integrations (Ollama, Alpaca, AWS SNS, Gmail)

#### 6f. AI Agent Guide (`docs/ai-agents.md`) — Req 13

- Three built-in agents: document-extractor, event-classifier, thesis-rewriter
- Per-agent: purpose, input data, output schema, default model, system prompt structure, user prompt template
- `ai_agents` table schema and registration (system-seeded vs API-created)
- `agent_variants` table: create, activate, deactivate variants for A/B testing
- `AgentConfigResolver` module: TTL cache (60s default), COALESCE-based variant override, fallback behavior
- Performance logging: `agent_performance_log` table, querying for variant comparison
- API endpoints: CRUD on `/api/agents`, test endpoint `/api/agents/{id}/test`
- Step-by-step guide: creating a new variant with different model/prompt and activating it
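
To make the resolver behavior concrete for the guide, here is a minimal sketch of a TTL cache with a 60-second default and a COALESCE-style override. It is not the `AgentConfigResolver` implementation: the class `TTLCache`, the function `coalesce_config`, and the config field names are assumptions chosen for illustration, and the clock is injectable only so the expiry can be demonstrated without sleeping.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Tiny TTL cache: `loader` is called on a miss or once an entry has expired."""

    def __init__(
        self,
        loader: Callable[[str], Any],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh
        value = self._loader(key)
        self._entries[key] = (now, value)
        return value


def coalesce_config(base: dict, variant: Optional[dict]) -> dict:
    """Mirror SQL COALESCE(variant.col, base.col) for each field of `base`."""
    if variant is None:
        return dict(base)  # fallback: no active variant
    return {
        key: variant[key] if variant.get(key) is not None else value
        for key, value in base.items()
    }


# Usage with a fake clock so expiry is visible without waiting.
fake_now = [0.0]
loads = []


def load_agent(slug: str) -> dict:
    loads.append(slug)
    return {"slug": slug, "version": len(loads)}


cache = TTLCache(load_agent, clock=lambda: fake_now[0])
first = cache.get("document-extractor")
fake_now[0] = 59.0
second = cache.get("document-extractor")  # served from cache
fake_now[0] = 60.0
third = cache.get("document-extractor")   # TTL elapsed, loader runs again
```

The per-field override means a variant that changes only the model keeps the base agent's prompt, which is the behavior a COALESCE-based query would give.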

#### 6g. Backup & Restore Guide (`docs/backup-restore.md`) — Req 14

Scripts in `scripts/`:
- `backup-db.sh`: PostgreSQL dump, CLI args, storage location, retention (keeps last 7)
- `restore-db.sh`: PostgreSQL restore, service scale-down/up, data loss implications
- `backup-redis.sh`: Redis RDB snapshot backup
- `backup.sh`: Combined backup (DB + Redis), `--upload-minio` option
- `restore.sh`: Combined restore
- Full nuke-and-rebuild procedure (connection termination, DB drop, Redis flush, redeploy, re-seed)
- Recommended backup schedules and automation (cron, Kubernetes CronJobs)
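
The "keeps last 7" retention rule can be illustrated in Python, although the actual logic lives in the shell script `backup-db.sh` and may differ. In this sketch the function name `prune_backups`, the `*.sql.gz` pattern, and the assumption that file names embed a sortable timestamp are all hypothetical.

```python
import tempfile
from pathlib import Path


def prune_backups(directory: Path, keep: int = 7, pattern: str = "*.sql.gz") -> list[Path]:
    """Delete all but the `keep` newest backups; return the removed paths.

    Assumes file names sort chronologically (for example a YYYYMMDD stamp).
    """
    backups = sorted(directory.glob(pattern), key=lambda p: p.name, reverse=True)
    removed = backups[keep:]
    for path in removed:
        path.unlink()
    return removed


# Usage: ten dated dumps in a scratch directory, pruned down to seven.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for day in range(1, 11):
        (root / f"stonks-202401{day:02d}.sql.gz").write_bytes(b"")
    removed_names = sorted(p.name for p in prune_backups(root))
    remaining_names = sorted(p.name for p in root.glob("*.sql.gz"))
```

Documenting the rule this explicitly in the guide helps readers see which dumps survive a backup run.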

#### 6h. Observability Reference (`docs/observability.md`) — Req 15

- `/metrics` endpoint on query-api, Prometheus scrape configuration
- All metrics from `services/shared/metrics.py`:
  - **Ingestion**: `stonks_ingestion_jobs_total`, `stonks_ingestion_items_fetched_total`, `stonks_ingestion_items_new_total`, `stonks_ingestion_items_deduped_total`, `stonks_ingestion_errors_total`, `stonks_ingestion_adapter_duration_seconds`
  - **Parsing**: `stonks_parse_jobs_total`, `stonks_parse_quality_score`, `stonks_parse_low_quality_total`, `stonks_parse_duration_seconds`
  - **Extraction**: `stonks_extraction_jobs_total`, `stonks_extraction_attempts_total`, `stonks_extraction_retries_total`, `stonks_extraction_duration_seconds`, `stonks_extraction_confidence`, `stonks_extraction_validation_errors_total`, `stonks_extraction_tokens_total`
  - **Aggregation**: `stonks_aggregation_windows_total`, `stonks_aggregation_signals_total`, `stonks_aggregation_contradiction_score`, `stonks_aggregation_duration_seconds`
  - **Recommendation**: `stonks_recommendations_total`, `stonks_recommendations_suppressed_total`, `stonks_recommendation_confidence`
  - **Lake**: `stonks_lake_facts_published_total`, `stonks_lake_publish_duration_seconds`, `stonks_lake_publish_errors_total`, `stonks_lake_publish_bytes_total`
  - **Trading**: `stonks_orders_submitted_total`, `stonks_orders_rejected_total`, `stonks_orders_filled_total`, `stonks_orders_duplicates_prevented_total`, `stonks_risk_evaluations_total`, `stonks_risk_check_failures_total`, `stonks_positions_synced_total`
  - **Alerting**: `stonks_alerts_fired_total`, `stonks_alerts_resolved_total`, `stonks_alert_check_duration_seconds`, `stonks_alert_active`
  - **DLQ**: `stonks_dlq_items_total`, `stonks_dlq_replayed_total`, `stonks_dlq_depth`
  - **Active**: `stonks_active_jobs`
- Alerting module (`services/shared/alerting.py`): 4 alert rules (source_failures, schema_failure_spike, analytical_lag, broker_issues), thresholds, evaluation windows, ConfigMap variables
- Structured JSON logging format, trace context (trace_id, span_id)
- Dead-letter queue system: queue names (`stonks:dlq:<queue>`), routing, replay tooling
- Recommended Prometheus/Grafana queries

#### 6i. README Update — Req 16

- Add "Documentation" section with links to all docs
- Replace ASCII architecture diagram with Mermaid or link to diagram docs
- Preserve all existing content (license, features, tech stack, project structure, deployment)

## Data Models

No new database tables or schema changes are introduced. This initiative works with existing tables:

**Tables referenced in test coverage work**:
- `sources`, `companies`, `company_aliases` — scheduler source polling
- `ingestion_runs` — scheduler run tracking, ingestion job recording
- `documents`, `document_company_mentions` — ingestion persistence, stale document recovery
- `document_intelligence`, `document_impact_records` — extractor test fixtures
- `model_performance_metrics` — extractor schema validation metrics

**Tables documented** (not modified):
- All tables listed above plus `trend_windows`, `trend_history`, `trend_projections`, `recommendations`, `recommendation_evidence`, `risk_evaluations`, `orders`, `order_events`, `positions`, `portfolio_snapshots`, `trading_decisions`, `circuit_breaker_events`, `reserve_pool_ledger`, `risk_tier_history`, `backtest_runs`, `backtest_trades`, `notifications`, `global_events`, `macro_impact_records`, `exposure_profiles`, `competitor_relationships`, `competitive_signal_records`, `ai_agents`, `agent_variants`, `agent_performance_log`, `audit_events`, `watchlists`, `watchlist_members`, `retention_policies`, `market_snapshots`

## Error Handling

### Test Coverage
- **Mock failures**: Unit tests must verify that scheduler and ingestion services handle database/Redis connection failures gracefully (no crashes, proper logging).
- **Adapter errors**: Ingestion unit tests must verify retry logic with exponential backoff and dead-letter queue routing after retry exhaustion.
- **Test fix approach**: When fixing pre-existing failures, prefer fixing test setup over changing production code. If production code changes are needed, add regression tests to prevent re-introduction.

### Docker Compose
- **Health check failures**: Application services use `depends_on` with `condition: service_healthy` to wait for infrastructure. Health checks have `interval`, `timeout`, `retries`, and `start_period` configured.
- **Missing `.env` file**: Services that need API keys (ingestion, broker-adapter, trading-engine) will start but log warnings about missing keys. The platform runs in a degraded mode without external API access.
- **Build failures**: Each service uses the same base Dockerfile with `SERVICE_CMD` build arg. Build errors are isolated per service.

### Documentation
- **Stale documentation**: Documentation is generated from source code inspection. If the codebase changes after documentation is written, the docs may drift. The README links section serves as a single index to find and update docs.
- **Diagram accuracy**: Mermaid diagrams are hand-authored based on current architecture. They should be updated when services are added or removed.

## Testing Strategy

### PBT Applicability Assessment

Property-based testing is **NOT applicable** to this feature. The work consists of:
1. **Unit tests for existing services** — These are example-based tests with mocked dependencies, not pure functions with universal properties.
2. **Fixing pre-existing test failures** — Bug fixes to existing tests/code.
3. **Docker Compose configuration** — Declarative infrastructure configuration.
4. **Documentation** — Markdown files with no executable logic.

None of these involve new pure functions, parsers, serializers, or business logic where PBT would add value. The existing `test_pbt_*` files (22 files covering trading, aggregation, competitive intelligence, etc.) already provide PBT coverage for the platform's core logic and must remain passing.

### Unit Testing Strategy

**New test files**:
- `tests/test_scheduler_unit.py` — 8+ test cases covering all scheduler pure functions and the `schedule_cycle` orchestration with mocked dependencies.
- `tests/test_ingestion_unit.py` — 6+ test cases covering adapter error handling, retry logic, deduplication, and dead-letter queue routing.

**Test fix files** (existing, to be repaired):
- `tests/test_extractor_prompts.py`
- `tests/test_extractor_schemas.py`
- `tests/test_ollama_client.py`
- `tests/test_filings_adapter.py`

**Test framework**: pytest + pytest-asyncio (already configured in the project).

**Mocking approach**: `unittest.mock.AsyncMock` for async dependencies, `unittest.mock.MagicMock` for sync dependencies, `unittest.mock.patch` for module-level state.

### Verification Criteria

1. `pytest tests/ -x --tb=short -q` → zero failures
2. `ruff check services/` → zero violations
3. All 22 existing `test_pbt_*` files pass unchanged
4. `docker compose config` validates the updated docker-compose.yml
5. All documentation files render valid Markdown with working internal links
@@ -0,0 +1,236 @@
# Requirements Document

## Introduction

This initiative covers three pillars for the Stonks Oracle platform: (1) closing unit test coverage gaps across all 13 services, fixing pre-existing test failures, and ensuring every feature has proper automated tests; (2) updating the Docker Compose deployment to include all application services so users can run the full platform without Kubernetes; and (3) producing comprehensive documentation covering every feature, all API endpoints, Helm chart configuration, Docker deployment options, and three Mermaid architecture diagrams (Kubernetes deployment, Docker Compose deployment, and data pipeline), with the README updated to link to all resources.

## Glossary

- **Test_Suite**: The collection of pytest unit tests, property-based tests, and integration tests in the `tests/` directory
- **Docker_Compose_Stack**: The `docker-compose.yml` file and associated Dockerfiles that define the local development environment
- **Helm_Chart**: The Kubernetes deployment configuration at `infra/helm/stonks-oracle/` including `values.yaml`, value overrides, and templates
- **Query_API**: The FastAPI REST service at `services/api/app.py` serving analytics and dashboard queries
- **Symbol_Registry_API**: The FastAPI REST service at `services/symbol_registry/app.py` managing companies, watchlists, sources, exposure profiles, and competitor relationships
- **Trading_API**: The FastAPI REST service at `services/trading/app.py` controlling the autonomous trading engine
- **Risk_API**: The FastAPI REST service at `services/risk/app.py` evaluating order risk and managing approval workflows
- **Scheduler_Service**: The service at `services/scheduler/` that triggers ingestion cycles on a cadence
- **Ingestion_Service**: The queue worker at `services/ingestion/` that fetches market data, news, filings, and macro events
- **Extractor_Service**: The queue worker at `services/extractor/` that performs LLM-based intelligence extraction and event classification
- **Documentation_Set**: The collection of Markdown files in `docs/` that describe features, APIs, deployment, and architecture
- **Architecture_Diagram**: A Mermaid-syntax diagram showing services, data stores, external integrations, and data flow. Three diagrams are produced: Kubernetes deployment, Docker Compose deployment, and data pipeline
- **README**: The root `README.md` file serving as the project entry point

## Requirements

### Requirement 1: Scheduler Service Unit Tests

**User Story:** As a developer, I want the scheduler service to have dedicated unit tests, so that scheduling logic, cadence management, and source polling behavior are verified independently of integration tests.

#### Acceptance Criteria

1. WHEN the Test_Suite is executed for the scheduler module, THE Test_Suite SHALL include unit tests covering job enqueue logic, polling interval calculation, and source due-date evaluation
2. WHEN a scheduler unit test is run, THE Test_Suite SHALL mock all external dependencies (PostgreSQL, Redis) and test scheduling logic in isolation
3. THE Test_Suite SHALL verify that the scheduler correctly enqueues ingestion jobs for sources whose polling interval has elapsed
4. IF a database or Redis connection fails during scheduling, THEN THE Test_Suite SHALL verify that the Scheduler_Service handles the error without crashing

### Requirement 2: Ingestion Service Unit Tests

**User Story:** As a developer, I want the ingestion service to have unit tests for adapter error handling and retry logic, so that data fetching resilience is verified beyond integration tests.

#### Acceptance Criteria

1. WHEN the Test_Suite is executed for the ingestion module, THE Test_Suite SHALL include unit tests covering adapter error handling, retry logic, and deduplication behavior
2. WHEN an external API returns an error response, THE Test_Suite SHALL verify that the Ingestion_Service retries according to the configured backoff policy
3. WHEN a duplicate content hash is detected, THE Test_Suite SHALL verify that the Ingestion_Service skips re-processing the document
4. IF all retry attempts are exhausted, THEN THE Test_Suite SHALL verify that the Ingestion_Service routes the failed job to the dead-letter queue

### Requirement 3: Extractor Test Failure Fixes

**User Story:** As a developer, I want the pre-existing test failures in the extractor module to be resolved, so that the full test suite passes cleanly in CI.

#### Acceptance Criteria

1. WHEN the Test_Suite is executed, THE Test_Suite SHALL pass all tests in `test_extractor_prompts.py` without failures
2. WHEN the Test_Suite is executed, THE Test_Suite SHALL pass all tests in `test_extractor_schemas.py` without failures
3. WHEN the Test_Suite is executed, THE Test_Suite SHALL pass all tests in `test_ollama_client.py` without failures
4. WHEN the Test_Suite is executed, THE Test_Suite SHALL pass all tests in `test_filings_adapter.py` without failures
5. THE Test_Suite SHALL maintain the original test intent and assertions when fixing failures, modifying only the code under test or test setup as needed

### Requirement 4: Full Test Suite Green Status

**User Story:** As a developer, I want the entire test suite to pass, so that CI builds succeed and regressions are caught immediately.

#### Acceptance Criteria

1. WHEN `pytest tests/ -x --tb=short -q` is executed, THE Test_Suite SHALL report zero failures across all test files
2. WHEN `ruff check services/` is executed, THE Test_Suite SHALL report zero lint violations
3. THE Test_Suite SHALL maintain all existing property-based tests (files prefixed `test_pbt_*`) in a passing state
4. IF a test fix requires modifying production code, THEN THE Test_Suite SHALL include a regression test that validates the fix
|
||||
|
||||
### Requirement 5: Docker Compose Application Services
|
||||
|
||||
**User Story:** As a developer using Docker instead of Kubernetes, I want docker-compose.yml to include all 13 application services and the frontend, so that I can run the full platform locally with a single `docker compose up`.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. THE Docker_Compose_Stack SHALL define service containers for all 13 application services: scheduler, symbol-registry, ingestion, parser, extractor, aggregation, recommendation, trading-engine, risk-engine, broker-adapter, lake-publisher, query-api, and dashboard
|
||||
2. THE Docker_Compose_Stack SHALL define a frontend container serving the React dashboard via nginx on container port 8080
|
||||
3. WHEN `docker compose up` is executed, THE Docker_Compose_Stack SHALL start all infrastructure services (PostgreSQL, Redis, MinIO, Ollama, Trino, Hive Metastore, Superset) before application services using dependency ordering
|
||||
4. WHEN an application service container starts, THE Docker_Compose_Stack SHALL provide health checks that verify the service is ready to accept requests
|
||||
5. THE Docker_Compose_Stack SHALL configure environment variables for each service matching the defaults documented in `docs/LOCAL_DEV_SETUP.md`, with infrastructure hostnames pointing to Docker Compose service names
|
||||
6. THE Docker_Compose_Stack SHALL allow users to provide API keys (MARKET_DATA_API_KEY, BROKER_API_KEY, BROKER_API_SECRET) via a `.env` file without modifying docker-compose.yml
|
||||
7. IF an infrastructure dependency (PostgreSQL, Redis) is not yet healthy, THEN THE Docker_Compose_Stack SHALL delay application service startup using `depends_on` with `condition: service_healthy`
|
||||
|
||||
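Criterion 7 can be checked mechanically once the compose file has been parsed into a dictionary. The following is a sketch under assumptions: the infrastructure service names `postgres` and `redis`, the helper name `unhealthy_dependencies`, and the example data are hypothetical, and the parsing step (for example with a YAML library) is left out so the block stays standard-library only.

```python
def unhealthy_dependencies(compose, infra=("postgres", "redis")):
    """Return (service, dependency) pairs that lack `condition: service_healthy`."""
    problems = []
    for name, service in compose.get("services", {}).items():
        if name in infra:
            continue
        depends_on = service.get("depends_on", {})
        for dep in infra:
            # The short list form (depends_on: [postgres]) carries no condition at all.
            entry = depends_on.get(dep) if isinstance(depends_on, dict) else None
            if not entry or entry.get("condition") != "service_healthy":
                problems.append((name, dep))
    return problems


# Hypothetical parsed compose content: one correct service, one using the short form.
EXAMPLE = {
    "services": {
        "postgres": {"image": "postgres"},
        "redis": {"image": "redis"},
        "scheduler": {
            "depends_on": {
                "postgres": {"condition": "service_healthy"},
                "redis": {"condition": "service_healthy"},
            }
        },
        "parser": {"depends_on": ["postgres", "redis"]},
    }
}
```

A check like this could run alongside `docker compose config`, which validates syntax but does not enforce a project-specific dependency policy.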
### Requirement 6: Service Feature Documentation
|
||||
|
||||
**User Story:** As a user or contributor, I want every service documented with its purpose, configuration, queue interactions, and database tables, so that I can understand how each part of the platform works.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. THE Documentation_Set SHALL include a dedicated document for each of the 13 services describing its purpose, inputs, outputs, configuration environment variables, and database tables used
|
||||
2. WHEN a service consumes from or publishes to a Redis queue, THE Documentation_Set SHALL document the queue name, message schema, and processing behavior
|
||||
3. WHEN a service exposes HTTP endpoints, THE Documentation_Set SHALL reference the API documentation for that service
|
||||
4. THE Documentation_Set SHALL describe the three signal layers (company, macro, competitive) with their data flow, toggle mechanisms, and weight configurations
|
||||
5. THE Documentation_Set SHALL document the trading engine features including position sizing, circuit breakers, reserve pool management, risk tier auto-adjustment, backtesting, and notification configuration
|
||||
|
||||
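To make "toggle mechanisms and weight configurations" concrete, here is one way the three layers might be combined. It is purely illustrative: the requirement does not state the actual aggregation formula, so the weighted average, the renormalization over enabled layers, and the name `combine_layers` are all assumptions.

```python
def combine_layers(scores, weights, enabled):
    """Weighted average over the enabled layers, renormalized so the weights sum to 1."""
    active = [
        layer for layer in scores
        if enabled.get(layer, False) and weights.get(layer, 0) > 0
    ]
    total = sum(weights[layer] for layer in active)
    if total == 0:
        return 0.0  # nothing enabled: neutral signal (hypothetical convention)
    return sum(scores[layer] * weights[layer] for layer in active) / total


# Hypothetical example values for the three layers.
SCORES = {"company": 0.8, "macro": -0.2, "competitive": 0.4}
WEIGHTS = {"company": 0.5, "macro": 0.3, "competitive": 0.2}
```

Under this scheme, disabling a layer redistributes its weight across the remaining layers instead of shrinking the combined score. That is a design choice the service documentation would need to confirm or correct.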
### Requirement 7: API Reference Documentation
|
||||
|
||||
**User Story:** As a developer integrating with Stonks Oracle, I want a complete API reference for all four FastAPI services, so that I know every endpoint, its parameters, request/response schemas, and error codes.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. THE Documentation_Set SHALL include an API reference document covering all endpoints of the Query_API, including path, method, query parameters, response schema, and error codes
|
||||
2. THE Documentation_Set SHALL include an API reference document covering all endpoints of the Symbol_Registry_API, including CRUD operations for companies, aliases, watchlists, sources, exposure profiles, and competitor relationships
|
||||
3. THE Documentation_Set SHALL include an API reference document covering all endpoints of the Trading_API, including engine control, decision audit, performance metrics, backtesting, notifications, and manual override orders
|
||||
4. THE Documentation_Set SHALL include an API reference document covering all endpoints of the Risk_API, including order evaluation, approval workflow, and approval expiration
|
||||
5. WHEN an endpoint accepts query parameters or a request body, THE Documentation_Set SHALL document each parameter with its type, default value, and constraints
|
||||
6. WHEN an endpoint returns an error, THE Documentation_Set SHALL document the HTTP status code and error response format
|
||||
|
||||
### Requirement 8: Helm Chart Configuration Reference
|
||||
|
||||
**User Story:** As an operator deploying Stonks Oracle on Kubernetes, I want a complete reference for all Helm chart values, so that I can configure services, resources, secrets, ingress, network policies, and analytics components.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. THE Documentation_Set SHALL include a Helm configuration reference documenting every key in `values.yaml` with its type, default value, and description
|
||||
2. THE Documentation_Set SHALL document the `services` block structure including replicas, image, command, tier, port, secrets, resources, and probes for each service
|
||||
3. THE Documentation_Set SHALL document the `config` block with all ConfigMap environment variables, their defaults, and what they control
|
||||
4. THE Documentation_Set SHALL document the `secrets` block structure (core, broker, market, gmail, dashboard) and how secrets are injected via `--set` flags during deployment
|
||||
5. THE Documentation_Set SHALL document the `ingress` block including className, clusterIssuer, and host mappings
|
||||
6. THE Documentation_Set SHALL document the analytics stack toggles (trino.enabled, hiveMetastore.enabled, superset.enabled) and their resource configurations
|
||||
7. THE Documentation_Set SHALL document the `pipelineEnabled` toggle and its effect on worker service replicas
|
||||
8. THE Documentation_Set SHALL document the `networkPolicies.enabled` toggle and the default-deny-ingress behavior
|
||||
9. THE Documentation_Set SHALL document the value override files (`values-beta.yaml`, `values-paper.yaml`) and their intended deployment stages
|
||||
|
||||
### Requirement 9: Docker Deployment Guide
|
||||
|
||||
**User Story:** As a developer deploying with Docker Compose, I want a guide explaining all Docker deployment options, environment variables, volume mounts, and operational commands, so that I can run and manage the platform without Kubernetes.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. THE Documentation_Set SHALL include a Docker deployment guide documenting every service defined in docker-compose.yml with its image, ports, volumes, and environment variables
|
||||
2. THE Documentation_Set SHALL document the `.env` file format with all required and optional environment variables, their defaults, and descriptions
|
||||
3. THE Documentation_Set SHALL document volume mounts and data persistence behavior, including how to reset data with `docker compose down -v`
|
||||
4. THE Documentation_Set SHALL document health check configurations and how to verify all services are running
|
||||
5. THE Documentation_Set SHALL document the Dockerfile build arguments (SERVICE_CMD) and how to build custom service images
|
||||
6. THE Documentation_Set SHALL document operational commands for starting, stopping, restarting individual services, viewing logs, and scaling replicas
|
||||
|
||||
### Requirement 10: Kubernetes Architecture Diagram
|
||||
|
||||
**User Story:** As an operator deploying on Kubernetes, I want a Mermaid diagram showing how Stonks Oracle runs in a K8s cluster, so that I can understand the deployment topology, networking, and infrastructure dependencies.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. THE Documentation_Set SHALL include a Mermaid diagram showing all 13 application services deployed as Kubernetes Deployments within the `stonks-oracle` namespace
|
||||
2. THE diagram SHALL show external cluster services (PostgreSQL, Redis, MinIO, Ollama) in their respective namespaces with cross-namespace service references
|
||||
3. THE diagram SHALL show Traefik ingress routes mapping external domains to internal services (stonks.celestium.life → dashboard, stonks-api.celestium.life → query-api, etc.)
|
||||
4. THE diagram SHALL show network policy boundaries indicating which services can communicate with each other
|
||||
5. THE diagram SHALL show the analytics plane (Trino, Hive Metastore, Superset) deployed within the stonks-oracle namespace and their connections to MinIO
|
||||
6. THE diagram SHALL show Helm-managed secrets (core, broker, market, gmail) and which services consume them
|
||||
7. THE diagram SHALL distinguish between API-tier services (with ingress), pipeline-tier workers (queue-driven), and trading-tier services
|
||||
|
||||
### Requirement 11: Docker Compose Architecture Diagram
|
||||
|
||||
**User Story:** As a developer running the platform locally with Docker Compose, I want a Mermaid diagram showing how all containers are wired together, so that I can understand port mappings, volume mounts, and service dependencies.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. THE Documentation_Set SHALL include a Mermaid diagram showing all infrastructure containers (PostgreSQL, Redis, MinIO, Ollama, Trino, Hive Metastore, Superset) and all 13 application service containers as defined in docker-compose.yml
|
||||
2. THE diagram SHALL show host port mappings for externally accessible services (PostgreSQL:5432, Redis:6379, MinIO:9000/9001, Ollama:11434, Trino:8080, Superset:8088, Dashboard:3000, Query API:8004)
|
||||
3. THE diagram SHALL show Docker Compose `depends_on` relationships and health check dependencies between infrastructure and application services
|
||||
4. THE diagram SHALL show named volumes (pgdata, miniodata, ollama_models, hive_data, superset_data) and which containers mount them
|
||||
5. THE diagram SHALL show the `.env` file providing API keys (MARKET_DATA_API_KEY, BROKER_API_KEY, BROKER_API_SECRET) to relevant service containers
|
||||
6. THE diagram SHALL show internal Docker network connectivity between containers using Docker Compose service names as hostnames
|
||||
|
||||
### Requirement 12: Data Pipeline Architecture Diagram
|
||||
|
||||
**User Story:** As a user or contributor, I want a Mermaid diagram showing the end-to-end data pipeline from external data sources through signal processing to trade execution, so that I can understand how data flows through the system.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. THE Documentation_Set SHALL include a Mermaid diagram showing the complete data pipeline from external sources (Polygon.io, news APIs, SEC filings, macro news sources) through ingestion, parsing, extraction, aggregation, recommendation, risk evaluation, and trade execution
|
||||
2. THE diagram SHALL show the Redis queue topology connecting pipeline stages (ingestion → parsing → extraction → aggregation → recommendation → broker) with queue names
|
||||
3. THE diagram SHALL show the three signal layers (company, macro, competitive) as distinct processing paths that merge in the aggregation stage
|
||||
4. THE diagram SHALL show data stores at each stage: MinIO for raw artifacts, PostgreSQL for structured data, Redis for queues and caching
|
||||
5. THE diagram SHALL show the trading engine decision loop: recommendation polling → position sizing → risk evaluation → order execution → broker submission → fill tracking
|
||||
6. THE diagram SHALL show the analytical branch: lake publisher writing Parquet fact tables to MinIO, queryable via Trino, visualized in Superset and the React dashboard
|
||||
7. THE diagram SHALL show external integrations at their connection points: Ollama for LLM extraction, Alpaca for trade execution, AWS SNS and Gmail for notifications
|
||||
|
||||
### Requirement 13: AI Agent Building Guide
|
||||
|
||||
**User Story:** As a user or contributor, I want a guide explaining how each of the three AI agents works — document extractor, event classifier, and thesis rewriter — including how to configure them, create variants, tune prompts, and monitor performance, so that I can customize and extend the AI capabilities of the platform.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. THE Documentation_Set SHALL include an AI agent guide documenting the three built-in agents: `document-extractor` (structured intelligence extraction from news/filings), `event-classifier` (macro/geopolitical event classification), and `thesis-rewriter` (LLM-enhanced recommendation thesis generation)
|
||||
2. FOR each agent, THE Documentation_Set SHALL document its purpose, input data, output schema, default model, system prompt structure, and user prompt template
|
||||
3. THE Documentation_Set SHALL document the `ai_agents` database table schema and how agents are registered (system-seeded vs user-created via the API)
|
||||
4. THE Documentation_Set SHALL document the `agent_variants` table and how to create, activate, and deactivate variants for A/B testing different models or prompts
|
||||
5. THE Documentation_Set SHALL document the `AgentConfigResolver` module including the TTL cache (60-second default), COALESCE-based variant override logic, and fallback behavior when no DB config exists
|
||||
6. THE Documentation_Set SHALL document the agent performance logging system and how to query `agent_performance_log` to compare variant effectiveness
|
||||
7. THE Documentation_Set SHALL document the API endpoints for managing agents (CRUD on `/api/agents`) and testing agent configurations (`/api/agents/{id}/test`)
|
||||
8. THE Documentation_Set SHALL include a step-by-step guide for creating a new agent variant with a different model or prompt and activating it for live traffic
|
||||
|
||||
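The resolver behavior named in criterion 5 (a TTL cache, COALESCE-style variant overrides, and a fallback when no DB config exists) can be sketched as follows. This is not the real `AgentConfigResolver`: the class name `CachedAgentConfig`, the loader callback, the row shape with `agent` and `variant` keys, and the injectable clock are assumptions chosen to keep the example self-contained and testable.

```python
import time


def coalesce(*values):
    """Return the first value that is not None, like SQL COALESCE."""
    for value in values:
        if value is not None:
            return value
    return None


class CachedAgentConfig:
    def __init__(self, loader, defaults, ttl_seconds=60.0, clock=time.monotonic):
        self._loader = loader      # callable(slug) -> row dict, or None if no DB config
        self._defaults = defaults  # built-in fallback config per agent slug
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}

    def resolve(self, slug):
        now = self._clock()
        cached = self._cache.get(slug)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # still fresh: skip the database
        row = self._loader(slug)
        base = self._defaults[slug]
        if row is None:
            config = dict(base)  # fallback: no DB config exists
        else:
            variant = row.get("variant") or {}
            # Variant value wins, then the agent row, then the built-in default.
            config = {
                key: coalesce(variant.get(key), row["agent"].get(key), base[key])
                for key in base
            }
        self._cache[slug] = (now, config)
        return config
```

Injecting the clock means a test can advance time past the 60-second TTL without sleeping, so the cache expiry is cheap to verify.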
### Requirement 14: Backup and Restore Guide
|
||||
|
||||
**User Story:** As an operator, I want a guide documenting all backup and restore scripts, their options, storage locations, and retention policies, so that I can protect data and recover from failures.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. THE Documentation_Set SHALL include a backup and restore guide documenting every script in `scripts/` related to backup and restore: `backup-db.sh`, `restore-db.sh`, `backup-redis.sh`, `backup.sh`, and `restore.sh`
|
||||
2. FOR each backup script, THE Documentation_Set SHALL document its CLI arguments, what data it captures, where backups are stored, and retention/pruning behavior (e.g., keeps last 7)
|
||||
3. FOR each restore script, THE Documentation_Set SHALL document its CLI arguments, what it restores, the service scale-down/scale-up procedure it performs, and any data loss implications
|
||||
4. THE Documentation_Set SHALL document the MinIO upload option (`--upload-minio`) for off-host backup storage
|
||||
5. THE Documentation_Set SHALL document the full database nuke and rebuild procedure including connection termination, database drop, Redis flush, redeploy, and re-seed steps
|
||||
6. THE Documentation_Set SHALL document recommended backup schedules and how to automate backups via cron or Kubernetes CronJobs
|
||||
|
||||
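The "keeps last 7" retention rule in criterion 2 amounts to a small selection step. The sketch below shows that logic in Python only for illustration. The real scripts are shell, and the timestamped filename pattern and the function name `backups_to_prune` are assumptions.

```python
def backups_to_prune(filenames, keep=7):
    """Return the backups that fall outside the newest `keep`, oldest first.

    Assumes filenames embed a sortable timestamp (for example YYYYMMDD),
    so lexicographic order matches chronological order.
    """
    newest_first = sorted(filenames, reverse=True)
    return sorted(newest_first[keep:])


# Ten hypothetical daily backups.
EXAMPLE_BACKUPS = [f"backup-202401{day:02d}.sql.gz" for day in range(1, 11)]
```

Documenting the rule in this form makes the data-loss implication explicit: anything older than the seventh-newest file is deleted on the next run.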
### Requirement 15: Observability and Prometheus Metrics Reference
|
||||
|
||||
**User Story:** As an operator, I want a reference documenting all Prometheus metrics exposed by the platform, the alerting rules, and how to monitor pipeline health, so that I can set up dashboards and respond to incidents.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. THE Documentation_Set SHALL include an observability reference documenting the `/metrics` endpoint on the query API and how to configure Prometheus to scrape it
|
||||
2. THE Documentation_Set SHALL document all Prometheus counters, gauges, and histograms emitted by each service, including metric name, labels, and what they measure (e.g., `EXTRACTION_ATTEMPTS`, `EXTRACTION_DURATION`, `AGGREGATION_WINDOWS_COMPUTED`, `AGGREGATION_SIGNALS_PROCESSED`, `RECOMMENDATION_GENERATED`, `RECOMMENDATION_CONFIDENCE`, alerting counters)
|
||||
3. THE Documentation_Set SHALL document the alerting module (`services/shared/alerting.py`) including all alert rules, their thresholds, evaluation windows, and the ConfigMap environment variables that control them (`ALERT_SOURCE_FAILURE_THRESHOLD`, `ALERT_SCHEMA_FAILURE_RATE_THRESHOLD`, `ALERT_LAKE_LAG_THRESHOLD_MINUTES`, `ALERT_BROKER_ERROR_THRESHOLD`, etc.)
|
||||
4. THE Documentation_Set SHALL document the structured JSON logging format, trace context propagation (trace_id, span_id), and how to query logs for debugging pipeline issues
|
||||
5. THE Documentation_Set SHALL document the dead-letter queue system including queue names, how failed jobs are routed there, and how to replay them using the dead-letter tooling
|
||||
6. THE Documentation_Set SHALL document recommended Prometheus/Grafana dashboard configurations or queries for monitoring ingestion throughput, extraction latency, aggregation volume, recommendation generation rate, and trading engine activity
|
||||
|
||||
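As an illustration of the structured logging described in criterion 4, the sketch below emits one JSON object per log record and carries `trace_id` and `span_id` through the standard `extra` mechanism. The field set, the `JsonFormatter` class, and the logger name are assumptions. The platform's actual log schema should be taken from its shared logging module.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON line with optional trace context."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "trace_id": getattr(record, "trace_id", None),
            "span_id": getattr(record, "span_id", None),
        })


def demo_log_line():
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("stonks.demo")  # hypothetical logger name
    logger.handlers = [handler]
    logger.propagate = False
    logger.setLevel(logging.INFO)
    logger.info("extraction finished", extra={"trace_id": "abc123", "span_id": "def456"})
    return json.loads(stream.getvalue())
```

With one JSON object per line, filtering by `trace_id` in a log aggregator or with a line-oriented JSON tool is enough to follow a single document through the pipeline stages.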
### Requirement 16: README Resource Links
|
||||
|
||||
**User Story:** As a user landing on the repository, I want the README to link to all documentation resources, so that I can navigate to any guide, reference, or diagram from a single entry point.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. WHEN the README is updated, THE README SHALL include a documentation section with links to every document in the Documentation_Set
|
||||
2. THE README SHALL link to the API reference documents for all four FastAPI services
|
||||
3. THE README SHALL link to the Helm chart configuration reference
|
||||
4. THE README SHALL link to the Docker deployment guide
|
||||
5. THE README SHALL link to all three architecture diagram documents (Kubernetes, Docker Compose, and Data Pipeline)
|
||||
6. THE README SHALL link to the per-service feature documentation
|
||||
7. THE README SHALL link to the AI agent building guide
|
||||
8. THE README SHALL link to the backup and restore guide
|
||||
9. THE README SHALL link to the observability and Prometheus metrics reference
|
||||
10. THE README SHALL replace the existing ASCII architecture diagram with the Mermaid architecture diagram or link to it
|
||||
11. THE README SHALL preserve all existing content (license, features, tech stack, project structure, deployment instructions) while adding the new documentation links
|
||||
@@ -0,0 +1,223 @@
|
||||
# Implementation Plan: Comprehensive Quality & Documentation
|
||||
|
||||
## Overview
|
||||
|
||||
This plan implements three pillars for the Stonks Oracle platform: (1) unit test coverage for the scheduler and ingestion services plus fixes for pre-existing test failures, (2) extending docker-compose.yml with all 13 application services and the frontend, and (3) producing comprehensive documentation covering services, APIs, Helm configuration, Docker deployment, architecture diagrams, AI agents, backup/restore, observability, and README resource links. Tasks are ordered so that tests come first (to catch regressions early), then Docker Compose (infrastructure), then documentation (so that it references verified code).
|
||||
|
||||
## Tasks
|
||||
|
||||
- [x] 1. Write scheduler service unit tests
|
||||
- [x] 1.1 Create `tests/test_scheduler_unit.py` with unit tests for scheduler pure functions and orchestration
|
||||
- Import scheduler functions from `services/scheduler/app.py`
|
||||
- Mock `asyncpg.Pool` (`.fetch()`, `.fetchrow()`, `.fetchval()`, `.execute()`) and `redis.asyncio.Redis` (`.rpush()`, `.set()`, `.get()`, `.incr()`, `.expire()`, `.decr()`, `.delete()`)
|
||||
- Write 8+ test cases covering: `get_cadence_for_source`, `compute_backoff`, `is_source_due`, `build_job_payload`, `schedule_cycle` (mocked DB/Redis), `check_rate_limit`, `recover_stale_documents`, `retry_failed_extractions`
|
||||
- Verify error handling: DB/Redis connection failures are handled without crashing
|
||||
- Use `pytest-asyncio` for async test functions, `unittest.mock.AsyncMock` and `unittest.mock.patch`
|
||||
- _Requirements: 1.1, 1.2, 1.3, 1.4_
|
||||
|
||||
- [x] 1.2 Write additional edge-case unit tests for scheduler
|
||||
- Test boundary conditions: zero polling interval, max retry count, empty source list
|
||||
- Test rate limiting edge cases: global Polygon limit, per-type limits
|
||||
- _Requirements: 1.3, 1.4_
|
||||
|
||||
- [x] 2. Write ingestion service unit tests
|
||||
- [x] 2.1 Create `tests/test_ingestion_unit.py` with unit tests for ingestion worker
|
||||
- Import ingestion functions from `services/ingestion/worker.py`
|
||||
- Mock adapters as `AsyncMock` returning `AdapterResult` with controlled `error`, `items`, `content_hash`, `raw_payload`
|
||||
- Mock `asyncpg.Pool` for `ingestion_runs` INSERT/UPDATE, `persist_ingestion_items`, `record_retrieval_failure`
|
||||
- Mock `redis.asyncio.Redis` for dedupe checks, queue pushes, DLQ routing
|
||||
- Mock `minio.Minio` for `upload_raw_artifact`
|
||||
- Write 6+ test cases covering: successful job processing, adapter error with retry, retry exhaustion → dead-letter queue, content hash deduplication skip, cross-source dedup via `dedupe_items`, error handling paths
|
||||
- _Requirements: 2.1, 2.2, 2.3, 2.4_
|
||||
|
||||
- [x] 2.2 Write additional edge-case unit tests for ingestion
|
||||
- Test empty adapter response, partial failures, multiple items in single job
|
||||
- _Requirements: 2.1, 2.4_
|
||||
|
||||
- [x] 3. Checkpoint — Verify new unit tests pass
|
||||
- Run `pytest tests/test_scheduler_unit.py tests/test_ingestion_unit.py -x --tb=short -q`
|
||||
- Ensure all tests pass; ask the user if questions arise.
|
||||
|
||||
- [x] 4. Fix pre-existing test failures
|
||||
- [x] 4.1 Fix `tests/test_extractor_prompts.py`
|
||||
- Run the file individually to diagnose failures
|
||||
- Fix test setup (mock configuration, fixture data) or production code as needed
|
||||
- Preserve original test intent and assertions
|
||||
- If production code changes are needed, add regression tests
|
||||
- _Requirements: 3.1, 3.5_
|
||||
|
||||
- [x] 4.2 Fix `tests/test_extractor_schemas.py`
|
||||
- Run the file individually to diagnose failures
|
||||
- Fix test setup or production code as needed
|
||||
- Preserve original test intent and assertions
|
||||
- _Requirements: 3.2, 3.5_
|
||||
|
||||
- [x] 4.3 Fix `tests/test_ollama_client.py`
|
||||
- Run the file individually to diagnose failures
|
||||
- Fix test setup or production code as needed
|
||||
- Preserve original test intent and assertions
|
||||
- _Requirements: 3.3, 3.5_
|
||||
|
||||
- [x] 4.4 Fix `tests/test_filings_adapter.py`
|
||||
- Run the file individually to diagnose failures
|
||||
- Fix test setup or production code as needed
|
||||
- Preserve original test intent and assertions
|
||||
- _Requirements: 3.4, 3.5_
|
||||
|
||||
- [x] 5. Checkpoint — Full test suite green
|
||||
- Run `pytest tests/ -x --tb=short -q` and verify zero failures
|
||||
- Run `ruff check services/` and verify zero violations
|
||||
- Verify all `test_pbt_*` files pass unchanged
|
||||
- If any production code was modified, confirm regression tests exist
|
||||
- Ensure all tests pass; ask the user if questions arise.
|
||||
- _Requirements: 4.1, 4.2, 4.3, 4.4_
|
||||
|
||||
- [x] 6. Add application services to docker-compose.yml
|
||||
- [x] 6.1 Add shared environment anchor and all 14 service definitions to `docker-compose.yml`
|
||||
- Define `x-app-env` YAML anchor with common environment variables (POSTGRES_HOST, POSTGRES_PORT, POSTGRES_DB, POSTGRES_USER, POSTGRES_PASSWORD, REDIS_HOST, REDIS_PORT, MINIO_ENDPOINT, MINIO_ACCESS_KEY, MINIO_SECRET_KEY, OLLAMA_BASE_URL)
|
||||
- Add 13 application service definitions: scheduler (using `docker/Dockerfile.scheduler`), symbol-registry, ingestion, parser, extractor, aggregation, recommendation, trading-engine, risk-engine, broker-adapter, lake-publisher, query-api. All services other than the scheduler use `docker/Dockerfile` with the appropriate `SERVICE_CMD` build arg
|
||||
- Add dashboard service using `frontend/Dockerfile`, with host port 3000 mapped to container port 8080 (`3000:8080`)
|
||||
- Configure `depends_on` with `condition: service_healthy` for infrastructure dependencies
|
||||
- Add health checks: FastAPI services use `curl -f http://localhost:8000/health`, workers use process liveness
|
||||
- Configure `env_file: .env` on services needing API keys (ingestion, broker-adapter, trading-engine)
|
||||
- Map host ports: symbol-registry:8001, trading-engine:8002, risk-engine:8003, query-api:8004, dashboard:3000
|
||||
- _Requirements: 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7_
|
||||
|
||||
- [x] 6.2 Validate docker-compose.yml configuration
|
||||
- Run `docker compose config` to verify the updated file parses correctly
|
||||
- _Requirements: 5.1_
|
||||
|
||||
- [x] 7. Checkpoint — Tests and Docker Compose validated
|
||||
- Run `pytest tests/ -x --tb=short -q` to confirm no regressions
|
||||
- Run `docker compose config` to confirm valid YAML
|
||||
- Ensure all tests pass; ask the user if questions arise.
|
||||
|
||||
- [x] 8. Write per-service feature documentation
|
||||
- [x] 8.1 Create `docs/services.md` documenting all 13 services
|
||||
- For each service: purpose, entry point module path, configuration environment variables, database tables read/written, Redis queues consumed/published with message schemas
|
||||
- Include queue topology table (queue name → producer → consumer)
|
||||
- Document the three signal layers (company, macro, competitive) with data flow, toggles, and weight configurations
|
||||
- Document trading engine features: position sizing, circuit breakers, reserve pool, risk tier auto-adjustment, backtesting, notifications
|
||||
- Cross-reference API documentation for services with HTTP endpoints
|
||||
- _Requirements: 6.1, 6.2, 6.3, 6.4, 6.5_
|
||||
|
||||
- [x] 9. Write API reference documentation
|
||||
- [x] 9.1 Create `docs/api-reference.md` covering all four FastAPI services
|
||||
- Document all Query API endpoints (~40+): path, method, query parameters (type, default, constraints), request body schema, response schema, error codes
|
||||
- Document all Symbol Registry API endpoints: companies CRUD, aliases, watchlists, sources, exposure profiles, competitor relationships, competitor inference
|
||||
- Document all Trading API endpoints: health/readiness, engine status, config update, pause/resume, reset, decisions audit, performance metrics/history, backtesting, notifications config/history, override orders, debug state
|
||||
- Document all Risk API endpoints: order evaluation (POST /evaluate), health, pending approvals, approval review, approval expiration
|
||||
- Inspect actual route definitions in `services/api/app.py`, `services/symbol_registry/app.py`, `services/trading/app.py`, `services/risk/app.py`
|
||||
- _Requirements: 7.1, 7.2, 7.3, 7.4, 7.5, 7.6_
|
||||
|
||||
- [x] 10. Write Helm chart configuration reference
|
||||
- [x] 10.1 Create `docs/helm-reference.md` documenting all Helm values
|
||||
- Document `image` block: registry, pullPolicy, tag
|
||||
- Document `pipelineEnabled` toggle and effect on worker replicas
|
||||
- Document `services` block: per-service structure (replicas, image, command, tier, port, secrets, resources, probes)
|
||||
- Document `config` block: all ConfigMap environment variables with defaults and descriptions
|
||||
- Document `secrets` block: core, broker, market, gmail, dashboard — injection via `--set` flags
|
||||
- Document `ingress` block: className, clusterIssuer, host mappings
|
||||
- Document analytics stack toggles: trino.enabled, hiveMetastore.enabled, superset.enabled with resources
|
||||
- Document `networkPolicies.enabled` and default-deny-ingress behavior
|
||||
- Document value override files: `values-beta.yaml`, `values-paper.yaml` and deployment stages
|
||||
- _Requirements: 8.1, 8.2, 8.3, 8.4, 8.5, 8.6, 8.7, 8.8, 8.9_
|
||||
|
||||
- [x] 11. Write Docker deployment guide
|
||||
- [x] 11.1 Create `docs/docker-deployment.md` with complete Docker deployment guide
|
||||
- Document every service with image, ports, volumes, environment variables
|
||||
- Document `.env` file format with all required/optional variables, defaults, descriptions
|
||||
- Document volume mounts and data persistence (pgdata, miniodata, ollama_models, hive_data, superset_data), reset with `docker compose down -v`
|
||||
- Document health check configurations and verification commands
|
||||
- Document Dockerfile build arguments (`SERVICE_CMD`) and custom image builds
|
||||
- Document operational commands: start, stop, restart, logs, scale, reset
|
||||
- _Requirements: 9.1, 9.2, 9.3, 9.4, 9.5, 9.6_
|
||||
|
||||
- [x] 12. Checkpoint — Documentation progress check
|
||||
- Verify that `docs/services.md`, `docs/api-reference.md`, `docs/helm-reference.md`, and `docs/docker-deployment.md` exist and render as valid Markdown
|
||||
- Ensure all tests pass; ask the user if questions arise.
|
||||
|
||||
- [x] 13. Write architecture diagrams
|
||||
- [x] 13.1 Create `docs/architecture-kubernetes.md` with Kubernetes deployment Mermaid diagram
|
||||
- Show all 13 services in `stonks-oracle` namespace grouped by tier (api, processing, trading, orchestration, analytics, frontend)
|
||||
- Show external cluster services (PostgreSQL, Redis, MinIO, Ollama) in their namespaces
|
||||
- Show Traefik ingress routes to external domains
|
||||
- Show network policy boundaries
|
||||
- Show analytics plane (Trino, Hive Metastore, Superset) and MinIO connections
|
||||
- Show Helm-managed secrets (core, broker, market, gmail) with consumer mapping
|
||||
- Distinguish API-tier (with ingress), pipeline-tier (queue-driven), and trading-tier services
|
||||
- _Requirements: 10.1, 10.2, 10.3, 10.4, 10.5, 10.6, 10.7_
|
||||
|
||||
- [x] 13.2 Create `docs/architecture-docker-compose.md` with Docker Compose Mermaid diagram
|
||||
- Show all infrastructure + application containers
|
||||
- Show host port mappings for externally accessible services
|
||||
- Show `depends_on` relationships and health check dependencies
|
||||
- Show named volumes and mount points
|
||||
- Show `.env` file providing API keys to relevant containers
|
||||
- Show internal Docker network connectivity
|
||||
- _Requirements: 11.1, 11.2, 11.3, 11.4, 11.5, 11.6_
|
||||
|
||||
- [x] 13.3 Create `docs/architecture-data-pipeline.md` with data pipeline Mermaid diagram
|
||||
- Show complete pipeline: external sources → ingestion → parsing → extraction → aggregation → recommendation → risk → trading → broker
|
||||
- Show Redis queue topology with queue names
|
||||
- Show three signal layers as distinct paths merging at aggregation
|
||||
- Show data stores at each stage (MinIO, PostgreSQL, Redis)
|
||||
- Show trading engine decision loop
|
||||
- Show analytical branch: lake publisher → MinIO/Parquet → Trino → Superset/Dashboard
|
||||
- Show external integrations: Ollama, Alpaca, AWS SNS, Gmail
|
||||
- _Requirements: 12.1, 12.2, 12.3, 12.4, 12.5, 12.6, 12.7_
|
||||
|
||||
- [x] 14. Write AI agent building guide
|
||||
- [x] 14.1 Create `docs/ai-agents.md` with AI agent guide
|
||||
- Document three built-in agents: document-extractor, event-classifier, thesis-rewriter — purpose, input data, output schema, default model, system prompt structure, user prompt template
|
||||
- Document `ai_agents` table schema and registration (system-seeded vs API-created)
|
||||
- Document `agent_variants` table: create, activate, deactivate variants for A/B testing
|
||||
- Document `AgentConfigResolver` module: TTL cache (60s), COALESCE-based variant override, fallback behavior
|
||||
- Document performance logging: `agent_performance_log` table, querying for variant comparison
|
||||
- Document API endpoints: CRUD on `/api/agents`, test endpoint `/api/agents/{id}/test`
|
||||
- Include step-by-step guide: creating a new variant with different model/prompt and activating it
|
||||
- _Requirements: 13.1, 13.2, 13.3, 13.4, 13.5, 13.6, 13.7, 13.8_
|
||||
|
||||
- [x] 15. Write backup and restore guide
|
||||
- [x] 15.1 Create `docs/backup-restore.md` with backup and restore guide
|
||||
- Document all scripts in `scripts/`: `backup-db.sh`, `restore-db.sh`, `backup-redis.sh`, `backup.sh`, `restore.sh`
|
||||
- For each backup script: CLI arguments, data captured, storage location, retention/pruning (keeps last 7)
|
||||
- For each restore script: CLI arguments, what it restores, service scale-down/up procedure, data loss implications
|
||||
- Document MinIO upload option (`--upload-minio`) for off-host storage
|
||||
- Document full nuke-and-rebuild procedure: connection termination, DB drop, Redis flush, redeploy, re-seed
|
||||
- Document recommended backup schedules and automation (cron, Kubernetes CronJobs)
|
||||
- _Requirements: 14.1, 14.2, 14.3, 14.4, 14.5, 14.6_
|
||||
|
||||
- [x] 16. Write observability and metrics reference
|
||||
- [x] 16.1 Create `docs/observability.md` with observability reference
|
||||
- Document `/metrics` endpoint on query-api and Prometheus scrape configuration
|
||||
- Document all Prometheus counters, gauges, histograms from `services/shared/metrics.py` — ingestion, parsing, extraction, aggregation, recommendation, lake, trading, alerting, DLQ, active jobs metrics with names, labels, descriptions
|
||||
- Document alerting module (`services/shared/alerting.py`): 4 alert rules, thresholds, evaluation windows, ConfigMap variables
|
||||
- Document structured JSON logging format, trace context (trace_id, span_id), log querying
|
||||
- Document dead-letter queue system: queue names (`stonks:dlq:<queue>`), routing, replay tooling
|
||||
- Document recommended Prometheus/Grafana queries for monitoring
|
||||
- _Requirements: 15.1, 15.2, 15.3, 15.4, 15.5, 15.6_
|
||||
|
||||
- [x] 17. Update README with documentation links
|
||||
- [x] 17.1 Update `README.md` with documentation section and resource links
|
||||
- Add "Documentation" section with links to all docs: services.md, api-reference.md, helm-reference.md, docker-deployment.md, architecture-kubernetes.md, architecture-docker-compose.md, architecture-data-pipeline.md, ai-agents.md, backup-restore.md, observability.md
|
||||
- Replace ASCII architecture diagram with Mermaid diagram or link to architecture diagram docs
|
||||
- Preserve all existing content: license, features, tech stack, project structure, deployment instructions
|
||||
- _Requirements: 16.1, 16.2, 16.3, 16.4, 16.5, 16.6, 16.7, 16.8, 16.9, 16.10, 16.11_
|
||||
|
||||
- [x] 18. Final checkpoint — Full verification
|
||||
- Run `pytest tests/ -x --tb=short -q` — zero failures
|
||||
- Run `ruff check services/` — zero violations
|
||||
- Run `docker compose config` — validates successfully
|
||||
- Verify all `test_pbt_*` files pass unchanged
|
||||
- Verify all documentation files exist in `docs/` and render valid Markdown
|
||||
- Ensure all tests pass; ask the user if questions arise.
|
||||
|
||||
## Notes
|
||||
|
||||
- Tasks marked with `*` are optional and can be skipped for a faster MVP
|
||||
- Each task references specific requirements for traceability
|
||||
- Checkpoints ensure incremental validation
|
||||
- No property-based tests are included — the design assessment confirmed PBT is not applicable to this feature
|
||||
- Existing `test_pbt_*` files (22 files) must remain passing throughout
|
||||
- The implementation language is Python (with Markdown for documentation), matching the existing codebase
|
||||
@@ -0,0 +1 @@
|
||||
{"specId": "d2fe9091-6423-482c-a4ce-3cd72e62eb23", "workflowType": "requirements-first", "specType": "feature"}
|
||||
@@ -0,0 +1,153 @@
|
||||
# Design Document: Intelligence Pipeline Deep Dive
|
||||
|
||||
## Overview
|
||||
|
||||
This design specifies the structure, content, and creation process for a 6-page narrative deep-dive document covering the full intelligence-to-decision pipeline in Stonks Oracle. The deliverable consists of Markdown narrative pages, an index file, and standalone Mermaid diagram files — all stored under `docs/intelligence-pipeline-deep-dive/`.
|
||||
|
||||
The document targets technical readers who want to understand how raw data enters the system, gets processed by AI agents, produces structured signals, accumulates into trend summaries, and ultimately drives autonomous trading decisions. Unlike the existing reference docs (`docs/services.md`, `docs/architecture-data-pipeline.md`), this deliverable is narrative and explanatory — it tells the story of data flowing through the platform end-to-end.
|
||||
|
||||
**Key design decision**: This is a documentation-only deliverable. No application code, database schemas, or infrastructure changes are involved. The output is purely Markdown files and Mermaid diagram files.
|
||||
|
||||
### Existing Documentation Landscape
|
||||
|
||||
The codebase already has several reference documents that this deep-dive complements:
|
||||
|
||||
| Document | Purpose | Style |
|----------|---------|-------|
| `docs/architecture-data-pipeline.md` | Queue topology, data store summary, Mermaid flow diagrams | Reference diagrams + tables |
| `docs/llm-to-trade-pipeline.md` | End-to-end data flow from model output to trade | Narrative + tables + code blocks |
| `docs/services.md` | Per-service configuration, tables, queues, behaviors | Reference manual |
| `docs/ai-agents.md` | AI agent configuration, variants, A/B testing, API | Guide + reference |
|
||||
|
||||
The deep-dive document will reference these existing docs for readers who want deeper detail, while providing a cohesive narrative that connects all pipeline stages into a single story.
|
||||
|
||||
## Architecture
|
||||
|
||||
### File Organization
|
||||
|
||||
```
docs/intelligence-pipeline-deep-dive/
├── index.md
├── 01-data-ingestion-and-preparation.md
├── 02-ai-agent-processing-and-extraction.md
├── 03-signal-scoring-and-weighted-signals.md
├── 04-trend-aggregation-and-accumulating-signals.md
├── 05-recommendation-generation.md
├── 06-trading-decisions-and-execution.md
└── diagrams/
    ├── ingestion-to-extraction-flow.md
    ├── three-layer-signal-merging.md
    ├── recommendation-generation-flow.md
    ├── trading-engine-decision-loop.md
    ├── weighted-signal-computation.md
    └── trend-accumulation-escalation.md
```
|
||||
|
||||
### Content Flow
|
||||
|
||||
Each page covers one pipeline stage and ends with a transitional paragraph previewing the next page. Cross-references between pages use relative Markdown links. Diagrams are stored as standalone Mermaid files in the `diagrams/` subdirectory and linked from the narrative pages (not embedded inline).
|
||||
|
||||
```mermaid
flowchart LR
    P1["Page 1\nData Ingestion"] --> P2["Page 2\nAI Extraction"]
    P2 --> P3["Page 3\nSignal Scoring"]
    P3 --> P4["Page 4\nTrend Aggregation"]
    P4 --> P5["Page 5\nRecommendations"]
    P5 --> P6["Page 6\nTrading Execution"]
```
|
||||
|
||||
## Components and Interfaces
|
||||
|
||||
### Index File (`index.md`)
|
||||
|
||||
The index provides:
|
||||
- A brief introduction to the deep-dive document series
|
||||
- A numbered table of contents linking to all 6 pages
|
||||
- A diagrams section linking to all Mermaid diagram files
|
||||
- References to existing documentation for additional context
|
||||
|
||||
### Narrative Pages (01 through 06)
|
||||
|
||||
Each page follows a consistent structure:
|
||||
1. **Title and introduction** — what this stage does and why it matters
|
||||
2. **Narrative body** — explanatory prose describing the pipeline stage, referencing actual code modules (`services/extractor/main.py`), database tables (`document_impact_records`), Redis queues (`stonks:queue:extraction`), and Pydantic schemas (`ExtractionResult`)
|
||||
3. **Diagram references** — links to relevant Mermaid diagram files in `diagrams/`
|
||||
4. **Transition** — a closing paragraph that previews the next page
|
||||
|
||||
### Mermaid Diagram Files
|
||||
|
||||
Each diagram file contains:
1. A brief title comment
2. A single Mermaid code block
3. Service labels that include both human-readable names and Python module paths
4. Queue labels that use full Redis key patterns
5. Database references that use exact PostgreSQL table names
|
||||
|
||||
Minimum 6 diagrams covering:
|
||||
- **Ingestion-to-extraction flow**: Scheduler → Ingestion → Parser → Extractor, with queues and storage
|
||||
- **Three-layer signal merging**: Company, Macro, and Competitive layers converging into aggregation
|
||||
- **Recommendation generation flow**: Suppression → Eligibility → Thesis → Risk classification
|
||||
- **Trading engine decision loop**: Pre-trade checks → Position sizing → Order submission
|
||||
- **Weighted signal computation**: Component breakdown of the composite weight formula
|
||||
- **Trend accumulation and escalation**: How consecutive signals strengthen trends and escalate actions
|
||||
|
||||
### Page Content Mapping
|
||||
|
||||
| Page | Primary Code Modules | Key Database Tables | Key Queues |
|------|---------------------|---------------------|------------|
| 01 - Ingestion | `services/scheduler/app.py`, `services/ingestion/worker.py`, `services/parser/worker.py` | `documents`, `ingestion_runs`, `document_company_mentions` | `stonks:queue:ingestion`, `stonks:queue:parsing` |
| 02 - AI Extraction | `services/extractor/main.py`, `services/extractor/client.py`, `services/extractor/prompts.py`, `services/extractor/schemas.py`, `services/extractor/event_classifier.py`, `services/shared/agent_config.py` | `document_intelligence`, `document_impact_records`, `global_events`, `macro_impact_records`, `ai_agents`, `agent_variants` | `stonks:queue:extraction`, `stonks:queue:macro_classification`, `stonks:queue:aggregation` |
| 03 - Signal Scoring | `services/aggregation/scoring.py` | `document_impact_records`, `macro_impact_records`, `competitive_signal_records`, `risk_configs` | (none) |
| 04 - Trend Aggregation | `services/aggregation/worker.py`, `services/aggregation/contradiction.py`, `services/aggregation/projection.py`, `services/aggregation/pattern_matcher.py`, `services/aggregation/signal_propagation.py` | `trend_windows`, `trend_history`, `trend_evidence`, `trend_projections` | `stonks:queue:aggregation`, `stonks:queue:recommendation` |
| 05 - Recommendations | `services/recommendation/main.py`, `services/recommendation/suppression.py`, `services/recommendation/eligibility.py`, `services/recommendation/thesis_llm.py` | `recommendations`, `recommendation_evidence`, `risk_evaluations` | `stonks:queue:recommendation` |
| 06 - Trading | `services/trading/engine.py`, `services/trading/position_sizer.py`, `services/trading/circuit_breaker.py`, `services/trading/reserve_pool.py`, `services/trading/risk_tier_controller.py`, `services/trading/stop_loss_manager.py` | `trading_decisions`, `orders`, `positions`, `portfolio_snapshots`, `reserve_pool_ledger`, `risk_tier_history`, `circuit_breaker_events` | `stonks:queue:broker_orders` |
|
||||
|
||||
## Data Models
|
||||
|
||||
This feature produces only documentation files. There are no new data models, database tables, or schema changes.
|
||||
|
||||
The narrative pages will reference existing data models from the codebase:
|
||||
|
||||
- **`WeightedSignal`** (`services/aggregation/scoring.py`) — document reference + composite weight + sentiment + impact
|
||||
- **`SignalWeight`** (`services/aggregation/scoring.py`) — breakdown of recency, credibility, novelty, confidence gate, market context multiplier
|
||||
- **`ScoringConfig`** (`services/aggregation/scoring.py`) — tunable parameters for signal scoring
|
||||
- **`ExtractionResult`** / **`CompanyImpact`** (`services/extractor/schemas.py`) — structured JSON output from document extraction
|
||||
- **`GlobalEventSchema`** (`services/extractor/event_classifier.py`) — macro event classification output
|
||||
- **`TrendSummary`** (`services/shared/schemas.py`) — rolling trend for a ticker across a time window
|
||||
- **`Recommendation`** (`services/shared/schemas.py`) — actionable trade recommendation
|
||||
- **`TradingDecision`** (`services/trading/engine.py`) — audit record of every trading evaluation
|
||||
|
||||
## Error Handling
|
||||
|
||||
Since this is a documentation-only deliverable, there is no runtime error handling to design. The primary quality concern is **accuracy** — ensuring that all code module paths, database table names, Redis queue keys, schema field names, and configuration values referenced in the narrative match the actual codebase.
|
||||
|
||||
### Accuracy Verification Strategy
|
||||
|
||||
1. **Code module paths**: Every module path referenced in the narrative (e.g., `services/aggregation/scoring.py`) must correspond to an existing file in the repository.
|
||||
2. **Database table names**: Table names must match those defined in `infra/migrations/` SQL files.
|
||||
3. **Redis queue keys**: Queue names must match constants in `services/shared/redis_keys.py`.
|
||||
4. **Schema class names**: Pydantic model names must match their definitions in `services/shared/schemas.py` and service-specific schema files.
|
||||
5. **Configuration values**: Environment variable names and default values must match `services/shared/config.py` and service-specific configuration.
|
||||
|
||||
### Cross-Reference Integrity
|
||||
|
||||
All inter-page links (e.g., `[Page 3](03-signal-scoring-and-weighted-signals.md)`) and diagram links (e.g., `[diagram](diagrams/ingestion-to-extraction-flow.md)`) must resolve to files that exist in the deliverable.
|
||||
|
||||
## Testing Strategy
|
||||
|
||||
**Property-based testing does not apply to this feature.** The deliverable is purely documentation — Markdown narrative pages and Mermaid diagram files. There are no functions, data transformations, or code logic to test.
|
||||
|
||||
### Why PBT Does Not Apply
|
||||
|
||||
- The output is static Markdown text, not executable code
|
||||
- There are no input/output functions to verify properties against
|
||||
- There is no data transformation logic that varies with input
|
||||
- The quality criteria (narrative coherence, codebase accuracy, cross-reference integrity) are best verified through manual review
|
||||
|
||||
### Verification Approach
|
||||
|
||||
1. **File existence check**: Verify all 6 page files, the index file, and all diagram files exist at the expected paths
|
||||
2. **Link integrity**: Verify all inter-page and diagram links resolve to existing files
|
||||
3. **Mermaid syntax**: Verify each diagram file contains valid Mermaid syntax by checking for proper `flowchart` or `graph` declarations
|
||||
4. **Codebase reference spot-checks**: Verify a sample of referenced module paths, table names, and queue keys against the actual codebase
|
||||
5. **Narrative flow**: Manual review to confirm each page ends with a transition to the next and the overall story is coherent
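
The first three checks above (file existence, link integrity, Mermaid declarations) lend themselves to a small script. The following is a minimal sketch, not part of the deliverable: the helper names `broken_links` and `has_mermaid_declaration` are hypothetical, the link pattern only handles plain `[text](target)` links without titles, and the Mermaid check only looks for a `flowchart` or `graph` declaration on the first line of a fenced `mermaid` block.

```python
import re
from pathlib import Path

# Hypothetical helpers for illustration; they are not part of the repository.
FENCE = "`" * 3  # three backticks, built indirectly to keep this listing readable

# Matches [text](target) and captures the target without any #fragment.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)#\s]+)(?:#[^)]*)?\)")
# Matches an opening mermaid fence followed by a flowchart/graph declaration.
MERMAID_RE = re.compile(re.escape(FENCE) + r"mermaid\s*\n\s*(?:flowchart|graph)\b")


def broken_links(doc_root: Path) -> list[tuple[Path, str]]:
    """Return (source file, link target) pairs whose relative target is missing."""
    missing = []
    for md_file in sorted(doc_root.rglob("*.md")):
        text = md_file.read_text(encoding="utf-8")
        for target in LINK_RE.findall(text):
            if "://" in target or target.startswith("mailto:"):
                continue  # external links are out of scope for this check
            if not (md_file.parent / target).exists():
                missing.append((md_file, target))
    return missing


def has_mermaid_declaration(diagram_file: Path) -> bool:
    """True if the file has a mermaid fence that opens with flowchart/graph."""
    return bool(MERMAID_RE.search(diagram_file.read_text(encoding="utf-8")))
```

Running `broken_links(Path("docs/intelligence-pipeline-deep-dive"))` would return an empty list when every relative link resolves. Narrative flow and codebase spot-checks still need manual review.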
|
||||
@@ -0,0 +1,155 @@
|
||||
# Requirements Document
|
||||
|
||||
## Introduction
|
||||
|
||||
This specification defines a 6-page narrative deep-dive document (plus separate Mermaid diagram files) that explains the full intelligence-to-decision pipeline in Stonks Oracle. The document targets a technical reader who wants to understand how raw data enters the system, gets processed by AI agents, produces structured signals, accumulates into trend summaries, and ultimately drives autonomous trading decisions. Unlike the existing service reference and API docs, this deliverable is narrative and explanatory — it tells the story of data flowing through the platform end-to-end, referencing actual code modules, database tables, queue names, and schemas from the codebase.
|
||||
|
||||
## Glossary
|
||||
|
||||
- **Deep_Dive_Document**: The 6-page Markdown document delivered under `docs/intelligence-pipeline-deep-dive/`, consisting of pages 01 through 06 covering the full intelligence-to-decision pipeline.
|
||||
- **Mermaid_Diagram_File**: A standalone Markdown file containing a single Mermaid diagram block, stored alongside the narrative pages in `docs/intelligence-pipeline-deep-dive/diagrams/`.
|
||||
- **Pipeline**: The end-to-end data flow from external source ingestion through AI extraction, signal aggregation, recommendation generation, and autonomous trading execution.
|
||||
- **Signal_Layer**: One of three independent signal sources (Company, Macro, Competitive) that produce `WeightedSignal` objects merged by the Aggregation_Engine.
|
||||
- **Aggregation_Engine**: The `services/aggregation/` module that merges weighted signals from all three layers into `TrendSummary` objects across five time windows.
|
||||
- **Trading_Engine**: The `services/trading/engine.py` module that polls recommendations and executes autonomous paper trades through a multi-check decision loop.
|
||||
- **Extractor**: The `services/extractor/` module that uses Ollama LLM inference to produce structured JSON intelligence from documents.
|
||||
- **WeightedSignal**: The `services.aggregation.scoring.WeightedSignal` dataclass that pairs a document reference with a composite aggregation weight.
|
||||
- **TrendSummary**: The `services.shared.schemas.TrendSummary` Pydantic model representing a rolling trend for a ticker across a specific time window.
|
||||
- **Recommendation**: The `services.shared.schemas.Recommendation` Pydantic model representing an actionable trade recommendation with action, mode, confidence, thesis, and position sizing.
|
||||
- **Circuit_Breaker**: The `services/trading/circuit_breaker.py` safety mechanism that halts trading when risk thresholds (daily loss, single-position loss, volatility clustering) are breached.
|
||||
|
||||
## Requirements
|
||||
|
||||
### Requirement 1: Document Structure and File Organization
|
||||
|
||||
**User Story:** As a technical reader, I want the deep-dive organized into clearly separated pages with a consistent structure, so that I can navigate to specific pipeline stages without reading the entire document.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. THE Deep_Dive_Document SHALL consist of exactly 6 Markdown page files named `01-data-ingestion-and-preparation.md` through `06-trading-decisions-and-execution.md`, stored under `docs/intelligence-pipeline-deep-dive/`.
|
||||
2. THE Deep_Dive_Document SHALL include an `index.md` file that provides a table of contents linking to all 6 pages and all Mermaid_Diagram_Files.
|
||||
3. WHEN a page references a Mermaid diagram, THE Deep_Dive_Document SHALL link to the corresponding Mermaid_Diagram_File stored in `docs/intelligence-pipeline-deep-dive/diagrams/` rather than embedding the diagram inline.
|
||||
4. THE Deep_Dive_Document SHALL include a minimum of 4 separate Mermaid_Diagram_Files covering: (a) the ingestion-to-extraction flow, (b) the three signal layers merging into aggregation, (c) the recommendation generation pipeline, and (d) the trading engine decision loop.
|
||||
5. WHEN a page references a code module, THE Deep_Dive_Document SHALL use the full Python module path (e.g., `services/extractor/prompts.py`) rather than abbreviated names.
|
||||
6. WHEN a page references a database table, THE Deep_Dive_Document SHALL use the exact table name as defined in the PostgreSQL schema (e.g., `document_impact_records`, `trend_windows`).
|
||||
7. WHEN a page references a Redis queue, THE Deep_Dive_Document SHALL use the full key pattern as defined in `services/shared/redis_keys.py` (e.g., `stonks:queue:extraction`).
|
||||
|
||||
### Requirement 2: Page 1 — Data Ingestion and Preparation
|
||||
|
||||
**User Story:** As a technical reader, I want to understand how raw data enters Stonks Oracle and gets prepared for AI processing, so that I can trace the origin of any signal back to its external source.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. THE Deep_Dive_Document page 01 SHALL explain the four categories of input data: news articles (Polygon.io), SEC filings (EDGAR), market data (Polygon.io grouped daily and intraday bars), and macro/geopolitical events (macro news APIs).
|
||||
2. THE Deep_Dive_Document page 01 SHALL describe the Scheduler's role in orchestrating ingestion cycles, including the polling interval for each source type (`market_api`: 300s, `news_api`: 300s, `filings_api`: 3600s, `macro_news`: 600s), rate limiting, and exponential backoff.
|
||||
3. THE Deep_Dive_Document page 01 SHALL describe the Ingestion worker's adapter dispatch pattern, referencing the adapter classes (`PolygonMarketAdapter`, `PolygonNewsAdapter`, `SECEdgarAdapter`, `MacroNewsAdapter`) in `services/ingestion/`.
|
||||
4. THE Deep_Dive_Document page 01 SHALL explain content deduplication via Redis content-hash markers (`stonks:dedupe:*` with 24-hour TTL) and raw artifact storage in MinIO buckets (`stonks-raw-market`, `stonks-raw-news`, `stonks-raw-filings`).
|
||||
5. THE Deep_Dive_Document page 01 SHALL describe the Parser's role in converting raw HTML/text into normalized documents, including quality scoring with confidence levels (`high`, `medium`, `low`), company mention detection via alias matching, and the routing decision that sends `macro_event` documents to `stonks:queue:macro_classification` instead of `stonks:queue:extraction`.
|
||||
6. THE Deep_Dive_Document page 01 SHALL be written in narrative prose style with explanatory paragraphs, not as a reference table or bullet-point list.
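
To illustrate the content-hash deduplication named in criterion 4, here is a minimal sketch. It assumes the marker key is `stonks:dedupe:` followed by a SHA-256 hex digest (the source only specifies the `stonks:dedupe:*` pattern and the 24-hour TTL), and it uses a tiny in-memory stand-in so the example runs without a Redis server. With redis-py, the same `set(..., nx=True, ex=...)` call maps to `SET key value NX EX seconds`.

```python
import hashlib

DEDUPE_TTL_SECONDS = 24 * 60 * 60  # 24-hour TTL from the requirement


def dedupe_key(content: bytes) -> str:
    """Build a dedupe marker key. The SHA-256 suffix is an assumption."""
    return "stonks:dedupe:" + hashlib.sha256(content).hexdigest()


def is_new_content(client, content: bytes) -> bool:
    """Atomically claim the marker; True only the first time content is seen."""
    # SET with NX succeeds only when the key is absent; redis-py returns
    # True on success and None when the key already exists.
    return bool(client.set(dedupe_key(content), "1", nx=True, ex=DEDUPE_TTL_SECONDS))


class InMemoryRedis:
    """Illustrative stand-in that implements only set(); it ignores expiry."""

    def __init__(self):
        self._data = {}

    def set(self, name, value, nx=False, ex=None):
        if nx and name in self._data:
            return None
        self._data[name] = value
        return True
```

Because the marker is claimed in a single atomic call, two workers that fetch the same article concurrently cannot both treat it as new.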
|
||||
|
||||
### Requirement 3: Page 2 — AI Agent Processing and Structured Extraction
|
||||
|
||||
**User Story:** As a technical reader, I want to understand how the AI agents process documents and produce structured JSON output, so that I can evaluate the extraction quality and understand the schema contract.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. THE Deep_Dive_Document page 02 SHALL explain the Document Intelligence Extractor agent (`document-extractor` slug), including its entry point (`services/extractor/main.py` → `services/extractor/client.py`), the system prompt, and the user prompt template built by `build_extraction_prompt()` in `services/extractor/prompts.py`.
|
||||
2. THE Deep_Dive_Document page 02 SHALL describe the `ExtractionResult` JSON schema with all fields (summary, companies array with ticker/sentiment/impact_score/impact_horizon/catalyst_type/key_facts/risks/evidence_spans, macro_themes, novelty_score, confidence, extraction_warnings), referencing `services/extractor/schemas.py`.
|
||||
3. THE Deep_Dive_Document page 02 SHALL explain the Global Event Classifier agent (`event-classifier` slug), including its entry point (`services/extractor/event_classifier.py`), the `GlobalEvent` output schema with event_types/severity/affected_regions/affected_sectors/affected_commodities/estimated_duration/confidence, and the anti-hallucination rules that prevent classifying company-specific news as macro events.
|
||||
4. THE Deep_Dive_Document page 02 SHALL describe the JSON repair pipeline (direct parse → markdown fence stripping → `json-repair` library fallback) and the structural plus semantic validation in `services/extractor/schemas.py`, including retry logic with exponential backoff.
|
||||
5. THE Deep_Dive_Document page 02 SHALL explain the `AgentConfigResolver` mechanism (`services/shared/agent_config.py`) that enables hot-swapping models and prompts via the `ai_agents` and `agent_variants` database tables with a 60-second TTL cache.
|
||||
6. THE Deep_Dive_Document page 02 SHALL describe how extraction results are persisted to `document_intelligence` (one row per document) and `document_impact_records` (one row per company mention), and how the extractor enqueues aggregation jobs to `stonks:queue:aggregation`.
|
||||
7. THE Deep_Dive_Document page 02 SHALL be written in narrative prose style with explanatory paragraphs, not as a reference table or bullet-point list.
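
The three-stage JSON repair pipeline in criterion 4 (direct parse, markdown fence stripping, `json-repair` fallback) can be sketched as follows. This is an illustration under stated assumptions, not the extractor's code: `parse_llm_json` and `strip_markdown_fence` are hypothetical names, and the third stage imports `repair_json` from the `json_repair` package lazily so the first two stages work without it installed.

```python
import json
import re

# Matches a whole response wrapped in a markdown code fence, optionally
# tagged "json". `{3} means three backticks.
FENCE_RE = re.compile(r"^\s*`{3}(?:json)?\s*\n(.*?)\n\s*`{3}\s*$", re.DOTALL)


def strip_markdown_fence(raw: str) -> str:
    """Return the fenced payload if the text is fence-wrapped, else the text."""
    match = FENCE_RE.match(raw)
    return match.group(1) if match else raw


def parse_llm_json(raw: str):
    """Hypothetical helper: try progressively more forgiving parse stages."""
    # Stage 1: the model returned clean JSON.
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    # Stage 2: the model wrapped the JSON in a markdown fence.
    unfenced = strip_markdown_fence(raw)
    try:
        return json.loads(unfenced)
    except json.JSONDecodeError:
        pass
    # Stage 3: fall back to the json-repair library for malformed JSON.
    try:
        from json_repair import repair_json
    except ImportError as exc:
        raise ValueError("unparseable output and json-repair is not installed") from exc
    return json.loads(repair_json(unfenced))
```

Ordering the stages from cheapest to most permissive keeps well-formed output untouched by the repair heuristics. Structural and semantic validation would still run on whatever this returns.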
|
||||
|
||||
### Requirement 4: Page 3 — Signal Scoring and the WeightedSignal Abstraction
|
||||
|
||||
**User Story:** As a technical reader, I want to understand how raw extraction output gets transformed into weighted signals for decision making, so that I can reason about why certain documents influence trends more than others.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. THE Deep_Dive_Document page 03 SHALL explain the `WeightedSignal` dataclass (`services/aggregation/scoring.py`) and the composite weight formula: `combined = gate × recency × credibility × (1 + novelty_bonus) × market_context_multiplier`.
|
||||
2. THE Deep_Dive_Document page 03 SHALL describe each weight component in detail: confidence gate (threshold 0.2), recency decay (exponential half-life per window: intraday=2h, 1d=12h, 7d=72h, 30d=240h, 90d=720h), source credibility weighting (clamped [0.1, 1.0] with configurable exponent), novelty bonus (up to 25%), and market context multiplier (volatility boost up to 30%, volume surge boost 15%).
|
||||
3. THE Deep_Dive_Document page 03 SHALL explain how sentiment labels are mapped to numeric values (+1.0 positive, -1.0 negative, 0.0 neutral/mixed) via `sentiment_to_numeric()` and how the weighted sentiment average is computed across all signals.
|
||||
4. THE Deep_Dive_Document page 03 SHALL describe the three signal layers (Company, Macro, Competitive) and how each produces `WeightedSignal` objects that are concatenated into a single list before trend computation, with relative influence controlled by `MACRO_SIGNAL_WEIGHT` (0.3) and `COMPETITIVE_SIGNAL_WEIGHT` (0.2).
|
||||
5. THE Deep_Dive_Document page 03 SHALL explain the runtime toggle mechanism for macro and competitive layers via the `risk_configs` database table, including graceful degradation when a layer is disabled or fails.
|
||||
6. THE Deep_Dive_Document page 03 SHALL be written in narrative prose style with explanatory paragraphs, not as a reference table or bullet-point list.
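
A short sketch may help readers of criteria 1 to 3 see how the composite weight formula fits together. The constants come from the criteria above; everything else is an assumption made for illustration: the gate is modelled as a hard 0/1 cutoff, the novelty bonus as linear in `novelty_score`, credibility as clamped before the exponent is applied, and the market context multiplier as a value supplied by the caller. The function names are hypothetical and are not the API of `services/aggregation/scoring.py`.

```python
# Half-lives per window, in hours, as listed in criterion 2.
HALF_LIFE_HOURS = {"intraday": 2.0, "1d": 12.0, "7d": 72.0, "30d": 240.0, "90d": 720.0}
CONFIDENCE_GATE = 0.2
MAX_NOVELTY_BONUS = 0.25


def recency_weight(age_hours: float, window: str) -> float:
    """Exponential decay: the weight halves every half-life."""
    return 0.5 ** (max(age_hours, 0.0) / HALF_LIFE_HOURS[window])


def credibility_weight(source_credibility: float, exponent: float = 1.0) -> float:
    """Clamp to [0.1, 1.0], then apply the exponent (ordering is an assumption)."""
    return min(1.0, max(0.1, source_credibility)) ** exponent


def combined_weight(confidence, age_hours, window, source_credibility,
                    novelty_score, market_context_multiplier=1.0,
                    credibility_exponent=1.0) -> float:
    """combined = gate * recency * credibility * (1 + novelty_bonus) * market_ctx."""
    gate = 1.0 if confidence >= CONFIDENCE_GATE else 0.0  # hard gate: assumption
    novelty_bonus = MAX_NOVELTY_BONUS * min(1.0, max(0.0, novelty_score))
    return (gate
            * recency_weight(age_hours, window)
            * credibility_weight(source_credibility, credibility_exponent)
            * (1.0 + novelty_bonus)
            * market_context_multiplier)


def sentiment_value(label: str) -> float:
    """Mapping described for sentiment_to_numeric(): +1, -1, else 0."""
    return {"positive": 1.0, "negative": -1.0}.get(label, 0.0)


def weighted_sentiment(signals: list[tuple[str, float]]) -> float:
    """Weighted average of numeric sentiment over (label, weight) pairs."""
    total = sum(weight for _, weight in signals)
    if total == 0:
        return 0.0
    return sum(sentiment_value(label) * weight for label, weight in signals) / total
```

Under these assumptions, a fully credible 12-hour-old document in the `1d` window with no novelty receives a weight of 0.5, and any document below the 0.2 confidence gate contributes nothing.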
|
||||
|
||||
### Requirement 5: Page 4 — Trend Aggregation and Accumulating Signals
|
||||
|
||||
**User Story:** As a technical reader, I want to understand how the aggregation engine merges multiple signals, including consecutive signals that point in the same direction, to produce trend summaries that drive larger decisions, so that I can see how accumulating bearish or bullish evidence escalates the system's response.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. THE Deep_Dive_Document page 04 SHALL explain how the Aggregation_Engine (`services/aggregation/worker.py`) computes `TrendSummary` objects across five time windows (intraday, 1d, 7d, 30d, 90d) by fetching impact records, macro impacts, and competitive signals for a ticker.
|
||||
2. THE Deep_Dive_Document page 04 SHALL describe the trend direction derivation rules: bullish (avg_sentiment ≥ 0.15), bearish (avg_sentiment ≤ -0.15), mixed (contradiction > 0.10 and |avg_sentiment| < 0.30), neutral (otherwise), referencing `derive_trend_direction()` in `services/aggregation/worker.py`.
|
||||
3. THE Deep_Dive_Document page 04 SHALL explain contradiction detection (`services/aggregation/contradiction.py`), including sentiment disagreement analysis and catalyst-level disagreement, and how the contradiction score (minority_weight / total_weight) penalizes trend confidence.
|
||||
4. THE Deep_Dive_Document page 04 SHALL describe how consecutive signals in the same direction accumulate to strengthen trend_strength and confidence, explaining the evidence ranking mechanism (`rank_evidence()`) that uses composite scoring (weight, impact, recency, confidence) and the confidence computation that rewards unique source count (caps at 15 sources for 0.8 contribution) and signal agreement (log₂ scaling, saturates around 7 unique sources).
|
||||
5. THE Deep_Dive_Document page 04 SHALL explain how accumulating bearish signals across multiple documents and time windows escalate the system's response — from a neutral hold to a bearish sell recommendation — and conversely how accumulating bullish signals escalate from watch to buy, using the trend strength and confidence thresholds from the eligibility rules.
|
||||
6. THE Deep_Dive_Document page 04 SHALL describe trend projections (`services/aggregation/projection.py`), including macro decay, momentum, driving factors, and divergence detection.
|
||||
7. THE Deep_Dive_Document page 04 SHALL describe persistence to `trend_windows` (upserted each cycle), `trend_history` (time-series snapshots), `trend_evidence` (per-document rankings), and `trend_projections`.
|
||||
8. THE Deep_Dive_Document page 04 SHALL be written in narrative prose style with explanatory paragraphs, not as a reference table or bullet-point list.
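
The direction rules in criterion 2 and the contradiction score in criterion 3 can be sketched together. Two points are assumptions, because the criteria do not settle them: the sketch tests the mixed rule first (otherwise the 0.30 bound would never matter for averages beyond ±0.15), and it treats the minority weight as the smaller of the positive and negative weight totals, with neutral signals counted only in the denominator. The actual precedence in `derive_trend_direction()` should be checked against the code; the function names below are hypothetical.

```python
def contradiction_score(signals: list[tuple[float, float]]) -> float:
    """minority_weight / total_weight over (numeric sentiment, weight) pairs."""
    positive = sum(weight for sentiment, weight in signals if sentiment > 0)
    negative = sum(weight for sentiment, weight in signals if sentiment < 0)
    total = sum(weight for _, weight in signals)
    if total <= 0:
        return 0.0
    return min(positive, negative) / total


def trend_direction_sketch(avg_sentiment: float, contradiction: float) -> str:
    """Apply the thresholds from criterion 2; rule precedence is an assumption."""
    if contradiction > 0.10 and abs(avg_sentiment) < 0.30:
        return "mixed"
    if avg_sentiment >= 0.15:
        return "bullish"
    if avg_sentiment <= -0.15:
        return "bearish"
    return "neutral"
```

For example, three units of positive weight against one unit of negative weight give a contradiction score of 0.25. Under the precedence assumed here, an average sentiment of 0.2 with that score reads as mixed, while an average of 0.5 reads as bullish.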
|
||||
|
||||
### Requirement 6: Page 5 — Recommendation Generation and Signal-to-Action Translation
|
||||
|
||||
**User Story:** As a technical reader, I want to understand how trend summaries are translated into actionable recommendations with risk classification and thesis generation, so that I can see the decision logic between aggregated intelligence and trading actions.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. THE Deep_Dive_Document page 05 SHALL explain the data quality suppression layer (`services/recommendation/suppression.py`), including the six suppression checks (extraction confidence < 0.40, evidence staleness > 168h, source diversity < 1, extraction failure rate > 50%, valid document count < 2, data quality score < 0.30) and the safety suppressions for macro-only and pattern-only trend shifts.
|
||||
2. THE Deep_Dive_Document page 05 SHALL describe the eligibility evaluation (`services/recommendation/eligibility.py`), including gate checks (confidence ≥ 0.35, strength ≥ 0.10, contradiction ≤ 0.60, evidence ≥ 2, direction ≠ neutral), action mapping (BUY/SELL for strength ≥ 0.25, HOLD for weaker directional signals, WATCH otherwise), and mode escalation (informational → paper_eligible → live_eligible based on confidence and evidence thresholds).
|
||||
3. THE Deep_Dive_Document page 05 SHALL explain position sizing computation from signal quality: base 1% + confidence × strength scaling up to 10%, with contradiction penalty, evidence count penalty, and max loss percentage scaling.
|
||||
4. THE Deep_Dive_Document page 05 SHALL describe the two-layer thesis generation: deterministic thesis assembly from trend data, and optional LLM rewrite via the `thesis-rewriter` agent (`services/recommendation/thesis_llm.py`) for trading-eligible recommendations.
|
||||
5. THE Deep_Dive_Document page 05 SHALL explain risk classification (low/moderate/high/very_high) based on contradiction score, confidence, evidence count, and mode.
|
||||
6. THE Deep_Dive_Document page 05 SHALL describe persistence to `recommendations`, `recommendation_evidence`, and `risk_evaluations` tables.
|
||||
7. THE Deep_Dive_Document page 05 SHALL be written in narrative prose style with explanatory paragraphs, not as a reference table or bullet-point list.
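
The six data quality checks in criterion 1 can be read as a list of threshold predicates. The following is a minimal sketch under stated assumptions, not the contents of `services/recommendation/suppression.py`: the `QualityInputs` dataclass, its field names, the reason labels, and `suppression_reasons` are hypothetical, and only the thresholds are taken from the criterion.

```python
from dataclasses import dataclass


@dataclass
class QualityInputs:
    """Hypothetical container for the inputs the six checks need."""
    extraction_confidence: float    # 0..1
    evidence_age_hours: float       # assumption: staleness measured as age in hours
    source_diversity: int           # number of distinct sources
    extraction_failure_rate: float  # fraction of failed extractions, 0..1
    valid_document_count: int
    data_quality_score: float       # 0..1


def suppression_reasons(q: QualityInputs) -> list[str]:
    """Return a label for every failing check; an empty list means no suppression."""
    checks = [
        ("low_extraction_confidence", q.extraction_confidence < 0.40),
        ("stale_evidence", q.evidence_age_hours > 168),
        ("insufficient_source_diversity", q.source_diversity < 1),
        ("high_extraction_failure_rate", q.extraction_failure_rate > 0.50),
        ("too_few_valid_documents", q.valid_document_count < 2),
        ("low_data_quality_score", q.data_quality_score < 0.30),
    ]
    return [name for name, failed in checks if failed]
```

Values that sit exactly on a threshold pass in this sketch, because the criterion states strict inequalities.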
|
||||
|
||||
### Requirement 7: Page 6 — Trading Engine Decisions and Execution
|
||||
|
||||
**User Story:** As a technical reader, I want to understand how the trading engine uses aggregated trend data to make buy/sell/hold decisions, including position sizing, risk evaluation, and circuit breakers, so that I can trace any trade back to its intelligence origin.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. THE Deep_Dive_Document page 06 SHALL explain the Trading_Engine decision loop (`services/trading/engine.py`), including the five concurrent async tasks: decision loop (60s polling), stop-loss monitor, performance loop, risk tier scheduler, and rebalance scheduler.
|
||||
2. THE Deep_Dive_Document page 06 SHALL describe the pre-trade check sequence in order: circuit breaker check, trading window check, confidence gate (risk-tier minimum), deduplication, declining positions check, and max open positions check, explaining that the first failure short-circuits the evaluation.
|
||||
3. THE Deep_Dive_Document page 06 SHALL explain position sizing (`services/trading/position_sizer.py`), including confidence-based scaling with sample-size-dampened agreement scoring, risk tier adjustment (conservative/moderate/aggressive with specific parameter differences), correlation-aware diversification, sector exposure reduction, earnings proximity adjustment, and the absolute position cap.
|
||||
4. THE Deep_Dive_Document page 06 SHALL describe the Circuit_Breaker mechanism (`services/trading/circuit_breaker.py`), including the three trigger types (daily_loss with emergency drawdown threshold, single_position loss with ticker cooldown, volatility with stop-loss clustering detection), cooldown computation, and Redis state tracking (`stonks:trading:circuit_breaker:*`).
|
||||
5. THE Deep_Dive_Document page 06 SHALL explain the reserve pool mechanism (`services/trading/reserve_pool.py`): profit siphoning (default 20%), high-water mark rebalancing (30% threshold), emergency liquidation, and ledger tracking in `reserve_pool_ledger`.
|
||||
6. THE Deep_Dive_Document page 06 SHALL describe risk tier auto-adjustment (`services/trading/risk_tier_controller.py`), including the evaluation criteria (Sharpe ratio, drawdown, win rate) and the three tier configurations with their parameter differences (min confidence, max position %, stop-loss ATR multiplier, reward/risk ratio, max sector %, max portfolio heat).
|
||||
7. THE Deep_Dive_Document page 06 SHALL explain the order submission flow: `TradingDecision` persistence to `trading_decisions`, order job enqueue to `stonks:queue:broker_orders`, broker adapter risk evaluation, Alpaca paper trading submission, and the full audit trail from signal to broker response.
|
||||
8. THE Deep_Dive_Document page 06 SHALL be written in narrative prose style with explanatory paragraphs, not as a reference table or bullet-point list.
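
The short-circuit behavior in criterion 2 can be sketched as an ordered list of checks where the first failure ends the evaluation. This is an illustrative sketch, not the code in `services/trading/engine.py`: the context keys (`breaker_open`, `outside_window`, `duplicate`, `declining`, and so on), the reason strings, and the helper names are all hypothetical placeholders.

```python
from typing import Callable, Optional

# A check inspects the evaluation context and returns None when it passes,
# or a short rejection reason when it fails.
Check = Callable[[dict], Optional[str]]


def flag_check(key: str, reason: str) -> Check:
    """Build a placeholder check that fails when ctx[key] is truthy."""
    return lambda ctx: reason if ctx.get(key) else None


def confidence_gate(ctx: dict) -> Optional[str]:
    if ctx.get("confidence", 0.0) < ctx.get("tier_min_confidence", 0.0):
        return "confidence below risk-tier minimum"
    return None


def max_open_positions(ctx: dict) -> Optional[str]:
    if ctx.get("open_positions", 0) >= ctx.get("max_open_positions", float("inf")):
        return "max open positions reached"
    return None


# Order matters: it mirrors the sequence listed in criterion 2.
PRE_TRADE_CHECKS: list[tuple[str, Check]] = [
    ("circuit_breaker", flag_check("breaker_open", "circuit breaker is open")),
    ("trading_window", flag_check("outside_window", "outside trading window")),
    ("confidence_gate", confidence_gate),
    ("deduplication", flag_check("duplicate", "duplicate decision")),
    ("declining_positions", flag_check("declining", "declining positions present")),
    ("max_open_positions", max_open_positions),
]


def first_failure(ctx: dict) -> Optional[tuple[str, str]]:
    """Return (check_name, reason) for the first failing check, else None."""
    for name, check in PRE_TRADE_CHECKS:
        reason = check(ctx)
        if reason is not None:
            return name, reason  # short-circuit: later checks are not evaluated
    return None
```

Keeping the checks in a single ordered list makes the sequence easy to audit against the documented order.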
|
||||
|
||||
### Requirement 8: Mermaid Diagram Quality and Separation
|
||||
|
||||
**User Story:** As a technical reader, I want architecture diagrams in separate files that I can render independently, so that I can use them in presentations or embed them in other documents.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. WHEN a Mermaid_Diagram_File is created, THE Deep_Dive_Document SHALL store the diagram in a standalone Markdown file under `docs/intelligence-pipeline-deep-dive/diagrams/` with a descriptive filename (e.g., `ingestion-to-extraction-flow.md`).
|
||||
2. THE Deep_Dive_Document SHALL include at least 4 Mermaid_Diagram_Files: one for the ingestion-to-extraction pipeline, one for the three-layer signal merging, one for the recommendation generation flow, and one for the trading engine decision loop.
|
||||
3. WHEN a Mermaid diagram references a service, THE Mermaid_Diagram_File SHALL label the service with both its human-readable name and its Python module path (e.g., `Extractor\nservices/extractor/main.py`).
|
||||
4. WHEN a Mermaid diagram references a queue, THE Mermaid_Diagram_File SHALL use the full Redis key pattern (e.g., `stonks:queue:extraction`).
|
||||
5. WHEN a Mermaid diagram references a database table, THE Mermaid_Diagram_File SHALL use the exact PostgreSQL table name.
|
||||
|
||||
### Requirement 9: Narrative Style and Cross-Referencing
|
||||
|
||||
**User Story:** As a technical reader, I want the document to read as a coherent narrative rather than a reference manual, so that I can build a mental model of the full pipeline without jumping between disconnected sections.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. THE Deep_Dive_Document SHALL use narrative prose with explanatory paragraphs as the primary writing style, reserving tables and bullet lists for structured data summaries only.
|
||||
2. WHEN a page references content covered in a different page, THE Deep_Dive_Document SHALL include a Markdown link to the relevant page and section.
|
||||
3. THE Deep_Dive_Document SHALL include transitional paragraphs at the end of each page that preview what the next page covers, creating a continuous narrative flow.
|
||||
4. THE Deep_Dive_Document SHALL reference the existing documentation where appropriate (e.g., `docs/services.md`, `docs/ai-agents.md`, `docs/architecture-data-pipeline.md`, `docs/llm-to-trade-pipeline.md`) for readers who want deeper reference-level detail.
|
||||
5. IF a concept is introduced for the first time, THEN THE Deep_Dive_Document SHALL provide a brief inline explanation before using the concept in subsequent discussion.
|
||||
|
||||
### Requirement 10: Codebase Accuracy
|
||||
|
||||
**User Story:** As a developer, I want the document to reference actual code modules, database tables, and queue names from the codebase, so that I can use the document as a reliable guide when navigating the source code.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. THE Deep_Dive_Document SHALL reference code modules using paths that exist in the repository (e.g., `services/aggregation/scoring.py`, `services/trading/circuit_breaker.py`, `services/shared/schemas.py`).
|
||||
2. THE Deep_Dive_Document SHALL reference database tables using names that match the PostgreSQL schema as defined in `infra/migrations/`.
|
||||
3. THE Deep_Dive_Document SHALL reference Redis queue names using the constants defined in `services/shared/redis_keys.py` (e.g., `QUEUE_EXTRACTION`, `QUEUE_AGGREGATION`, `QUEUE_RECOMMENDATION`, `QUEUE_BROKER`).
|
||||
4. THE Deep_Dive_Document SHALL reference Pydantic schema classes using their actual class names from `services/shared/schemas.py` (e.g., `DocumentIntelligence`, `TrendSummary`, `Recommendation`, `GlobalEventSchema`, `CompanyImpact`).
|
||||
5. THE Deep_Dive_Document SHALL reference configuration environment variables using the exact names defined in `services/shared/config.py` and the service-specific configuration sections.
|
||||
@@ -0,0 +1,35 @@
|
||||
# Tasks — Intelligence Pipeline Deep Dive
|
||||
|
||||
## Task 1: Create directory structure and index file
|
||||
- [x] 1.1 Create `docs/intelligence-pipeline-deep-dive/` directory and `docs/intelligence-pipeline-deep-dive/diagrams/` subdirectory
|
||||
- [x] 1.2 Create `docs/intelligence-pipeline-deep-dive/index.md` with table of contents linking to all 6 pages and all diagram files, plus references to existing docs (`docs/services.md`, `docs/ai-agents.md`, `docs/architecture-data-pipeline.md`, `docs/llm-to-trade-pipeline.md`)
|
||||
|
||||
## Task 2: Create Mermaid diagram files
|
||||
- [x] 2.1 Create `docs/intelligence-pipeline-deep-dive/diagrams/ingestion-to-extraction-flow.md` — flowchart from Scheduler through Ingestion, Parser, to Extractor with all queues (`stonks:queue:ingestion`, `stonks:queue:parsing`, `stonks:queue:extraction`, `stonks:queue:macro_classification`), storage (MinIO buckets, PostgreSQL tables), and service module paths
|
||||
- [x] 2.2 Create `docs/intelligence-pipeline-deep-dive/diagrams/three-layer-signal-merging.md` — flowchart showing Company signals (`document_impact_records`), Macro signals (`macro_impact_records`), and Competitive signals (`competitive_signal_records`) each producing `WeightedSignal` objects that merge into the Aggregation engine (`services/aggregation/worker.py`)
|
||||
- [x] 2.3 Create `docs/intelligence-pipeline-deep-dive/diagrams/weighted-signal-computation.md` — diagram showing the composite weight formula components: confidence gate, recency decay, source credibility, novelty bonus, and market context multiplier
|
||||
- [x] 2.4 Create `docs/intelligence-pipeline-deep-dive/diagrams/trend-accumulation-escalation.md` — diagram showing how consecutive signals accumulate across time windows to escalate from neutral → watch → hold → buy/sell decisions
|
||||
- [x] 2.5 Create `docs/intelligence-pipeline-deep-dive/diagrams/recommendation-generation-flow.md` — flowchart from TrendSummary through data quality suppression, eligibility evaluation, thesis generation, risk classification, to recommendation persistence
|
||||
- [x] 2.6 Create `docs/intelligence-pipeline-deep-dive/diagrams/trading-engine-decision-loop.md` — flowchart showing the pre-trade check sequence (circuit breaker → trading window → confidence gate → dedup → declining positions → max positions), position sizing, and order submission to `stonks:queue:broker_orders`
|
||||
|
||||
## Task 3: Write Page 1 — Data Ingestion and Preparation
|
||||
- [x] 3.1 Write `docs/intelligence-pipeline-deep-dive/01-data-ingestion-and-preparation.md` covering: four input data categories (Polygon news, SEC EDGAR filings, Polygon market data, macro news APIs), Scheduler cadence polling (market_api: 300s, news_api: 300s, filings_api: 3600s, macro_news: 600s) with rate limiting and backoff, Ingestion worker adapter dispatch (`PolygonMarketAdapter`, `PolygonNewsAdapter`, `SECEdgarAdapter`, `MacroNewsAdapter`), content deduplication via Redis (`stonks:dedupe:*` with 24h TTL), raw artifact storage in MinIO (`stonks-raw-market`, `stonks-raw-news`, `stonks-raw-filings`), Parser role (HTML normalization, quality scoring, company mention detection, routing `macro_event` docs to `stonks:queue:macro_classification`). Written in narrative prose with links to diagrams and transition to Page 2.
|
||||
|
||||
## Task 4: Write Page 2 — AI Agent Processing and Structured Extraction
|
||||
- [x] 4.1 Write `docs/intelligence-pipeline-deep-dive/02-ai-agent-processing-and-extraction.md` covering: Document Intelligence Extractor agent (`document-extractor` slug, `services/extractor/main.py` → `services/extractor/client.py`, system prompt, `build_extraction_prompt()` in `services/extractor/prompts.py`), `ExtractionResult` JSON schema with all fields, Global Event Classifier agent (`event-classifier` slug, `services/extractor/event_classifier.py`, `GlobalEvent` schema, anti-hallucination rules), JSON repair pipeline (direct parse → fence stripping → `json-repair` fallback), structural + semantic validation in `services/extractor/schemas.py`, `AgentConfigResolver` mechanism (`services/shared/agent_config.py`, `ai_agents`/`agent_variants` tables, 60s TTL cache), persistence to `document_intelligence` and `document_impact_records`, aggregation job enqueue. Written in narrative prose with links to diagrams and transition to Page 3.
|
||||
|
||||
## Task 5: Write Page 3 — Signal Scoring and the WeightedSignal Abstraction
|
||||
- [x] 5.1 Write `docs/intelligence-pipeline-deep-dive/03-signal-scoring-and-weighted-signals.md` covering: `WeightedSignal` dataclass (`services/aggregation/scoring.py`), composite weight formula (`combined = gate × recency × credibility × (1 + novelty_bonus) × market_context_multiplier`), each component in detail (confidence gate threshold 0.2, recency decay half-lives per window, source credibility clamped [0.1, 1.0], novelty bonus up to 25%, market context volatility boost up to 30% and volume surge boost 15%), sentiment mapping via `sentiment_to_numeric()`, weighted sentiment average computation, three signal layers (Company, Macro weight 0.3, Competitive weight 0.2), runtime toggle via `risk_configs` table. Written in narrative prose with links to diagrams and transition to Page 4.
|
||||
|
||||
## Task 6: Write Page 4 — Trend Aggregation and Accumulating Signals
|
||||
- [x] 6.1 Write `docs/intelligence-pipeline-deep-dive/04-trend-aggregation-and-accumulating-signals.md` covering: Aggregation engine computing TrendSummary across 5 windows (intraday, 1d, 7d, 30d, 90d), trend direction rules (bullish ≥ 0.15, bearish ≤ -0.15, mixed, neutral), contradiction detection (`services/aggregation/contradiction.py`, minority_weight/total_weight), evidence ranking (`rank_evidence()` composite scoring), confidence computation (unique source count caps at 15, log₂ scaling saturates at 7 sources), how consecutive same-direction signals accumulate to escalate decisions (neutral → watch → hold → buy/sell), trend projections (`services/aggregation/projection.py`, macro decay, momentum, divergence detection), persistence to `trend_windows`, `trend_history`, `trend_evidence`, `trend_projections`. Written in narrative prose with links to diagrams and transition to Page 5.
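
For orientation, the `minority_weight/total_weight` ratio named in task 6.1 could look like the sketch below. It is not the implementation in `services/aggregation/contradiction.py`. It assumes that "minority" means the smaller of the positive-side and negative-side weight sums and that neutral signals are excluded from the total; both are assumptions, and the function name and input shape are hypothetical.

```python
def contradiction_score(signals: list[tuple[float, float]]) -> float:
    """Compute minority_weight / total_weight.

    Each signal is (sentiment, weight) with sentiment in [-1, 1] and weight >= 0.
    Assumption: signals with sentiment exactly 0 do not count toward either side.
    """
    positive = sum(w for s, w in signals if s > 0)
    negative = sum(w for s, w in signals if s < 0)
    total = positive + negative
    if total == 0:
        return 0.0  # no directional evidence, so nothing to contradict
    return min(positive, negative) / total
```

Under these assumptions the score ranges from 0.0 (all weight on one side) to 0.5 (weight split evenly).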
|
||||
|
||||
## Task 7: Write Page 5 — Recommendation Generation and Signal-to-Action Translation
|
||||
- [x] 7.1 Write `docs/intelligence-pipeline-deep-dive/05-recommendation-generation.md` covering: data quality suppression (`services/recommendation/suppression.py`, 6 checks: extraction confidence < 0.40, staleness > 168h, source diversity < 1, failure rate > 50%, valid docs < 2, quality score < 0.30, plus macro-only and pattern-only safety), eligibility evaluation (`services/recommendation/eligibility.py`, gate checks, action mapping BUY/SELL/HOLD/WATCH, mode escalation informational/paper_eligible/live_eligible), position sizing (base 1% + confidence × strength up to 10%, contradiction and evidence penalties), thesis generation (deterministic + optional LLM rewrite via `thesis-rewriter` agent), risk classification (low/moderate/high/very_high), persistence to `recommendations`, `recommendation_evidence`, `risk_evaluations`. Written in narrative prose with links to diagrams and transition to Page 6.
|
||||
|
||||
## Task 8: Write Page 6 — Trading Engine Decisions and Execution
|
||||
- [x] 8.1 Write `docs/intelligence-pipeline-deep-dive/06-trading-decisions-and-execution.md` covering: Trading engine decision loop (`services/trading/engine.py`, 5 concurrent tasks: decision loop 60s, stop-loss monitor, performance loop, risk tier scheduler, rebalance scheduler), pre-trade check sequence (circuit breaker → trading window → confidence gate → dedup → declining positions → max positions), position sizing (`services/trading/position_sizer.py`, confidence scaling, risk tier adjustment, correlation diversification, sector exposure, earnings proximity, absolute cap), circuit breaker (`services/trading/circuit_breaker.py`, daily_loss, single_position, volatility triggers, cooldown, Redis state), reserve pool (`services/trading/reserve_pool.py`, profit siphoning 20%, high-water mark 30%, emergency liquidation), risk tier auto-adjustment (`services/trading/risk_tier_controller.py`, Sharpe/drawdown/win-rate evaluation, conservative/moderate/aggressive tiers), order submission flow (TradingDecision → `stonks:queue:broker_orders` → broker adapter → Alpaca). Written in narrative prose with links to diagrams.
|
||||
|
||||
## Task 9: Update index and verify cross-references
|
||||
- [x] 9.1 Update `docs/intelligence-pipeline-deep-dive/index.md` to ensure all page links and diagram links are correct and all files exist
|
||||
- [x] 9.2 Verify all inter-page links within narrative pages resolve correctly and all diagram references point to existing files
|
||||
@@ -0,0 +1 @@
|
||||
{"specId": "e6d189b2-5861-4e24-954f-5e254246a910", "workflowType": "requirements-first", "specType": "feature"}
|
||||
@@ -0,0 +1,341 @@
|
||||
# Design Document: Sanitized Pipeline Documentation
|
||||
|
||||
## Overview
|
||||
|
||||
This design specifies the process and structure for producing a sanitized version of the 6-page intelligence pipeline deep dive documentation. The sanitized docs transform the existing `docs/intelligence-pipeline-deep-dive/` content into domain-neutral equivalents stored at `docs/sanitized-pipeline-deep-dive/`, stripping all financial, market, and trading language while preserving every engineering detail — algorithms, formulas, architectural patterns, queue topologies, database schemas, code module references, and Mermaid diagrams.
|
||||
|
||||
The deliverable is a documentation-only transformation. No application code, database schemas, or infrastructure changes are involved. The output is Markdown files and Mermaid diagram files that mirror the original structure with domain-neutral framing.
|
||||
|
||||
**Key design decision**: The sanitization is a manual content transformation guided by a defined terminology map. Each source file is read, transformed according to the mapping rules, and written to the output directory. The original files remain untouched.
|
||||
|
||||
### Source Material
|
||||
|
||||
The source documentation at `docs/intelligence-pipeline-deep-dive/` consists of:

| File | Content |
|------|---------|
| `index.md` | Table of contents, introduction, diagram links, related docs |
| `01-data-ingestion-and-preparation.md` | Scheduler, ingestion worker, deduplication, parser |
| `02-ai-agent-processing-and-extraction.md` | Document extractor, event classifier, JSON repair, validation |
| `03-signal-scoring-and-weighted-signals.md` | Composite weight formula, three signal layers, sentiment mapping |
| `04-trend-aggregation-and-accumulating-signals.md` | Time windows, trend direction, contradiction, evidence ranking, confidence |
| `05-recommendation-generation.md` | Suppression, eligibility, position sizing, thesis, risk classification |
| `06-trading-decisions-and-execution.md` | Trading engine, pre-trade checks, circuit breakers, broker adapter |
| `diagrams/ingestion-to-extraction-flow.md` | Mermaid flowchart: scheduler → ingestion → parser → extractor |
| `diagrams/three-layer-signal-merging.md` | Mermaid flowchart: three signal layers → aggregation |
| `diagrams/weighted-signal-computation.md` | Mermaid flowchart: composite weight formula breakdown |
| `diagrams/trend-accumulation-escalation.md` | Mermaid flowchart: time windows → escalation path |
| `diagrams/recommendation-generation-flow.md` | Mermaid flowchart: suppression → eligibility → thesis → risk |
| `diagrams/trading-engine-decision-loop.md` | Mermaid flowchart: pre-trade checks → position sizing → order submission |

## Architecture
|
||||
|
||||
### Output File Organization
|
||||
|
||||
The sanitized docs mirror the source structure with sanitized filenames:

```
docs/sanitized-pipeline-deep-dive/
├── index.md
├── 01-data-ingestion-and-preparation.md
├── 02-ai-agent-processing-and-extraction.md
├── 03-signal-scoring-and-weighted-signals.md
├── 04-trend-aggregation-and-accumulating-signals.md
├── 05-recommendation-generation.md
├── 06-decision-execution.md
└── diagrams/
    ├── ingestion-to-extraction-flow.md
    ├── three-layer-signal-merging.md
    ├── weighted-signal-computation.md
    ├── trend-accumulation-escalation.md
    ├── recommendation-generation-flow.md
    └── decision-engine-loop.md
```

**Filename changes from source:**
- `06-trading-decisions-and-execution.md` → `06-decision-execution.md` (removes "trading")
- `diagrams/trading-engine-decision-loop.md` → `diagrams/decision-engine-loop.md` (removes "trading")
- All other filenames are already domain-neutral and remain unchanged

### Transformation Process
|
||||
|
||||
The sanitization follows a three-pass approach for each file:
|
||||
|
||||
1. **Terminology pass**: Apply the terminology map to replace all financial/trading terms with domain-neutral equivalents. This covers inline text, headings, table cells, code blocks, and Mermaid diagram labels.
|
||||
2. **Reference pass**: Update all internal cross-references to point to sanitized filenames (e.g., `06-trading-decisions-and-execution.md` → `06-decision-execution.md`, `trading-engine-decision-loop.md` → `decision-engine-loop.md`). Remove or neutralize references to external financial docs (e.g., links to `../llm-to-trade-pipeline.md` become neutral descriptions).
|
||||
3. **Narrative pass**: Reframe example scenarios, inline illustrations, and narrative framing to use domain-neutral language. This pass handles context-dependent replacements that a simple find-and-replace cannot catch — e.g., "a bearish article about AAPL" becomes "a negative-sentiment article about Entity-A".
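
The sanitization itself is a manual transformation, but the terminology pass is mechanical enough to illustrate. The sketch below is a hypothetical helper, not part of the repository: `TERMINOLOGY_MAP` is a small excerpt of the map defined in this document, and `terminology_pass` is an assumed name. It replaces longer source terms first, so a specific pattern such as `stonks:dedupe:trading:` is rewritten before the more general `stonks:dedupe:` prefix can match.

```python
import re

# Small excerpt of the terminology map; the full map lives in this document.
TERMINOLOGY_MAP = {
    "stonks:dedupe:trading:": "app:dedupe:execution:",
    "stonks:dedupe:": "app:dedupe:",
    "stonks:queue:": "app:queue:",
    "trading_decisions": "execution_decisions",
    "trading engine": "decision execution engine",
    "pre-trade checks": "pre-execution checks",
    "bearish": "negative",
    "AAPL": "Entity-A",
}


def terminology_pass(text: str, mapping: dict[str, str] = TERMINOLOGY_MAP) -> str:
    """Replace every mapped term in a single left-to-right scan."""
    # Longest terms first: regex alternation tries alternatives in order,
    # so specific patterns win over the shorter prefixes they contain.
    terms = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(term) for term in terms))
    return pattern.sub(lambda match: mapping[match.group(0)], text)


print(terminology_pass("stonks:dedupe:trading:AAPL"))
print(terminology_pass("a bearish article about AAPL"))
```

The match is case-sensitive and purely literal, so capitalized variants and context-dependent phrasing (for example, turning "negative article" into "negative-sentiment article") are still left to the narrative pass.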
|
||||
|
||||
### Content Flow
|
||||
|
||||
The sanitized docs preserve the same page-to-page narrative flow as the originals:

```mermaid
flowchart LR
    P1["Page 1\nData Ingestion"] --> P2["Page 2\nAI Extraction"]
    P2 --> P3["Page 3\nSignal Scoring"]
    P3 --> P4["Page 4\nTrend Aggregation"]
    P4 --> P5["Page 5\nRecommendations"]
    P5 --> P6["Page 6\nDecision Execution"]
```

## Components and Interfaces
|
||||
|
||||
### Terminology Map
|
||||
|
||||
The core of the sanitization is a defined mapping from financial/trading terms to domain-neutral equivalents. The map is applied consistently across all files.
|
||||
|
||||
#### System and Provider Names

| Source Term | Sanitized Replacement |
|-------------|----------------------|
| Stonks Oracle / stonks | the platform / the system |
| Polygon.io / Polygon | external data provider / data source API |
| SEC EDGAR / SEC / EFTS | public records API / regulatory filings source |
| Alpaca / AlpacaBrokerAdapter | execution adapter / external execution API |
| Wall Street | (removed or reframed) |

#### Trading and Financial Actions

| Source Term | Sanitized Replacement |
|-------------|----------------------|
| buy | act |
| sell | defer |
| hold | monitor |
| watch | observe |
| trading engine | decision execution engine |
| paper trading / paper_eligible | simulation mode / simulation_eligible |
| live trading / live_eligible | live execution mode / production_eligible |
| trade / trading (as action) | decision / execution |
| order (broker order) | execution request |
| pre-trade checks | pre-execution checks |

#### Financial Concepts

| Source Term | Sanitized Replacement |
|-------------|----------------------|
| portfolio | resource pool / allocation pool |
| portfolio allocation | resource allocation |
| portfolio heat | pool exposure |
| portfolio snapshots | pool snapshots |
| position sizing | commitment sizing / resource allocation |
| position (open position) | commitment / active commitment |
| stop-loss | risk threshold / loss limit |
| take-profit | gain target |
| bullish | positive / favorable |
| bearish | negative / unfavorable |
| stock ticker / ticker symbol | entity identifier |
| stock market | (removed or reframed) |
| earnings / earnings call / earnings report | performance report / periodic disclosure |
| 10-K / 10-Q / 8-K | regulatory filing types |
| SEC filings | regulatory filings |
| broker / broker API | execution adapter / execution API |
| P&L | gain/loss |
| Sharpe ratio | risk-adjusted return ratio |
| drawdown | peak-to-trough decline |
| win rate | success rate |

#### Ticker Symbols and Company Names

| Source Term | Sanitized Replacement |
|-------------|----------------------|
| AAPL / Apple | Entity-A |
| TSLA / Tesla | Entity-B |
| NVDA / NVIDIA | Entity-C |
| XOM | Entity-D |
| META | Entity-E |
| Any other ticker | Entity-{letter} or "tracked entity" |

#### Redis Keys

| Source Pattern | Sanitized Pattern |
|----------------|-------------------|
| `stonks:queue:*` | `app:queue:*` |
| `stonks:dedupe:*` | `app:dedupe:*` |
| `stonks:ratelimit:*` | `app:ratelimit:*` |
| `stonks:trading:circuit_breaker:*` | `app:execution:circuit_breaker:*` |
| `stonks:dedupe:trading:*` | `app:dedupe:execution:*` |

#### MinIO Buckets

| Source Bucket | Sanitized Bucket |
|---------------|-----------------|
| `stonks-raw-market` | `app-raw-data` |
| `stonks-raw-news` | `app-raw-content` |
| `stonks-raw-filings` | `app-raw-filings` |
| `stonks-normalized` | `app-normalized` |
| `stonks-llm-prompts` | `app-llm-prompts` |
| `stonks-llm-results` | `app-llm-results` |

#### Database Tables

| Source Table | Sanitized Table |
|-------------|----------------|
| `trading_decisions` | `execution_decisions` |
| `portfolio_snapshots` | `pool_snapshots` |
| `portfolio_pct` (column) | `allocation_pct` |

All other table names (`documents`, `document_intelligence`, `trend_windows`, `recommendations`, etc.) are already domain-neutral and remain unchanged.

#### Adapter and Source Type Names

| Source Term | Sanitized Replacement |
|-------------|----------------------|
| `PolygonNewsAdapter` | `ExternalNewsAdapter` |
| `PolygonMarketAdapter` | `ExternalDataAdapter` |
| `SECEdgarAdapter` | `RegulatoryFilingsAdapter` |
| `AlpacaBrokerAdapter` | `ExecutionAdapter` |
| `broker` (source_type) | `execution_api` |
| `market_api` (source_type) | `data_api` |
| `filings_api` (source_type) | `filings_api` (unchanged, already neutral) |

### Preserved Engineering Terms
|
||||
|
||||
The following terms are explicitly preserved because they describe engineering patterns, not financial concepts:
|
||||
|
||||
- **circuit breaker** — engineering safety pattern for rate limiting and cascading failure prevention
|
||||
- **exponential backoff** — retry pattern
|
||||
- **adapter pattern** — software design pattern (only the domain-specific adapter *names* are sanitized)
|
||||
- **signal** — used in signal processing and scoring context
|
||||
- **trend**, **sentiment**, **confidence**, **contradiction**, **evidence** — data analysis terms
|
||||
- **recency decay**, **credibility weight**, **novelty bonus** — scoring algorithm terms
|
||||
- **weighted sentiment average** — mathematical computation term
|
||||
|
||||
### Preserved Technical Content
|
||||
|
||||
All of the following are preserved verbatim (with only the terminology map applied to embedded financial terms):

- Composite signal scoring formula: `combined = gate × recency × credibility × (1 + novelty_bonus) × market_context_multiplier`
- Confidence computation formula with log₂ scaling and four components
- Weighted sentiment average formula
- All threshold values, configuration parameters, and numeric constants
- All Markdown table structures containing technical parameters
- All code module path references (e.g., `services/aggregation/scoring.py`)
- Three-layer signal architecture with weight ratios (1.0, 0.3, 0.2)
- Contradiction detection algorithm and evidence ranking methodology
- All PostgreSQL table structures and column descriptions (with sanitized names where needed)
- All Redis queue patterns and operations (`rpush`/`lpop`/`blpop`)
- All MinIO storage patterns (with sanitized bucket names)
- Ollama as the LLM inference provider
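
To make the first item in this list concrete, the composite scoring formula can be sketched as a single function. This is an illustration, not the code in `services/aggregation/scoring.py`. The gate threshold (0.2), the credibility clamp ([0.1, 1.0]), and the novelty cap (25%) are the values given in the deep-dive task list; treating the gate as a hard 0/1 cutoff and recency as an exponential half-life decay are assumptions, as are the function and parameter names.

```python
def composite_weight(
    confidence: float,
    age_hours: float,
    half_life_hours: float,
    credibility: float,
    novelty_bonus: float,
    market_context_multiplier: float = 1.0,
) -> float:
    """combined = gate × recency × credibility × (1 + novelty_bonus) × market_context_multiplier"""
    # Assumption: the confidence gate is a hard cutoff at the 0.2 threshold.
    gate = 1.0 if confidence >= 0.2 else 0.0
    # Assumption: recency halves every `half_life_hours` (per-window half-life).
    recency = 0.5 ** (age_hours / half_life_hours)
    # Source credibility is clamped to [0.1, 1.0].
    clamped_credibility = min(max(credibility, 0.1), 1.0)
    # The novelty bonus is capped at 25%.
    clamped_novelty = min(max(novelty_bonus, 0.0), 0.25)
    return (
        gate
        * recency
        * clamped_credibility
        * (1.0 + clamped_novelty)
        * market_context_multiplier
    )
```

Because the formula is a pure product, a failed gate zeroes the whole weight regardless of the other factors, which is why the terminology pass must leave every factor name and constant intact.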
### Index Page Reframing
|
||||
|
||||
The sanitized `index.md` describes the system as an "AI-driven intelligence-to-decision pipeline" that:
1. Ingests data from multiple external data sources
2. Extracts structured intelligence via NLP/LLM
3. Scores and weights signals
4. Aggregates trends across time windows
5. Generates recommendations with quality gates
6. Executes decisions autonomously with safety mechanisms

References to "Stonks Oracle" are replaced with "the platform" or "the system". References to financial-specific APIs (Polygon.io, SEC EDGAR) are replaced with neutral descriptions. The "Related Documentation" section links are updated to use neutral descriptions or removed if they reference financial-specific content.

### Page 06 Reframing
|
||||
|
||||
Page 06 undergoes the most extensive reframing since it covers the trading engine. Key changes:
- Title: "Decision Execution" instead of "Trading Decisions and Execution"
- "Trading engine" → "decision execution engine"
- "Pre-trade checks" → "pre-execution checks"
- "Broker adapter" / "Alpaca" → "execution adapter" / "external execution API"
- "Paper trading" → "simulation mode"
- "Live trading" → "live execution mode"
- "Portfolio" → "resource pool" / "allocation pool"
- "Position" → "commitment" / "active commitment"
- "Stop-loss" → "risk threshold"
- "Take-profit" → "gain target"
- All order submission language reframed as "execution request submission"

### Diagram Sanitization
|
||||
|
||||
Each Mermaid diagram file receives the same terminology map treatment:
- Node labels containing financial terms are replaced
- Queue name labels are rewritten (`stonks:queue:*` → `app:queue:*`)
- Bucket name labels are rewritten (`stonks-raw-market` → `app-raw-data`)
- Table name labels are rewritten (`trading_decisions` → `execution_decisions`)
- Adapter names in node labels are replaced
- Subgraph titles containing financial terms are replaced
- The `trading-engine-decision-loop.md` diagram is renamed to `decision-engine-loop.md`

Mermaid syntax, node relationships, subgraph structures, and flow directions are preserved exactly.

## Data Models
|
||||
|
||||
This feature produces only documentation files. There are no new data models, database tables, or schema changes.
|
||||
|
||||
The sanitized narrative pages reference the same data models as the originals, with terminology-mapped names where applicable:
|
||||
|
||||
- **`WeightedSignal`** — document reference + composite weight + sentiment + impact (unchanged)
|
||||
- **`SignalWeight`** — breakdown of recency, credibility, novelty, confidence gate, market context multiplier (unchanged)
|
||||
- **`TrendSummary`** — rolling trend for an entity across a time window (unchanged)
|
||||
- **`Recommendation`** — actionable decision recommendation (reframed from "trade recommendation")
|
||||
- **`execution_decisions`** table — audit record of every decision evaluation (sanitized from `trading_decisions`)
|
||||
- **`pool_snapshots`** table — resource pool state snapshots (sanitized from `portfolio_snapshots`)
|
||||
|
||||
|
||||
## Correctness Properties
|
||||
|
||||
*A property is a characteristic or behavior that should hold true across all valid executions of a system — essentially, a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.*
|
||||
|
||||
The sanitized documentation set has one key universal property: the complete absence of financial/trading terminology across all output files. This is well-suited to property-based testing because the property must hold for *every* file in the output set, and the banned term list is large enough that systematic checking across all files provides high-value coverage.
|
||||
|
||||
### Property 1: Banned Financial Terminology Exclusion
|
||||
|
||||
*For any* file in the sanitized documentation set (`docs/sanitized-pipeline-deep-dive/`), the file content shall not contain any term from the comprehensive banned financial terminology list. The banned list includes: stock ticker symbols (AAPL, TSLA, NVDA, XOM, META, and all 50 tracked tickers), company names used as financial examples (Apple, Tesla, NVIDIA), trading action labels (buy, sell, hold, watch as action labels — BUY, SELL, HOLD, WATCH in uppercase), financial system terms (trading engine, paper trading, live trading, paper_eligible, live_eligible, portfolio, portfolio allocation, portfolio heat, portfolio snapshots, broker, Alpaca, broker adapter, broker API, stock market, Wall Street, bullish, bearish, position sizing, stop-loss), financial event terms (SEC EDGAR, SEC filings, 10-K, 10-Q, 8-K, earnings, earnings call, earnings report), provider names (Polygon.io, Polygon), system names (Stonks Oracle, stonks), and infrastructure patterns containing financial terms (stonks: prefix in Redis keys, stonks- prefix in MinIO buckets, trading_decisions table name, portfolio_snapshots table name).
|
||||
|
||||
**Validates: Requirements 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 3.10, 6.2, 7.1, 7.2, 7.3, 8.1, 8.2**
|
||||
|
||||
## Error Handling
|
||||
|
||||
Since this is a documentation-only deliverable, there is no runtime error handling to design. The primary quality concerns are:
|
||||
|
||||
### Accuracy of Terminology Replacement
|
||||
|
||||
Every financial/trading term must be replaced with its domain-neutral equivalent. Missing a single instance of "stonks" in a Redis key pattern or "AAPL" in an example scenario would violate the sanitization requirements. The terminology map defined in the Components section serves as the authoritative reference.
|
||||
|
||||
### Preservation of Technical Content
|
||||
|
||||
The sanitization must not accidentally remove or alter engineering content. Key risks:
|
||||
- **Formula corruption**: The composite weight formula contains `market_context_multiplier` — the word "market" must not be blindly replaced since it's part of a technical variable name
|
||||
- **Code path corruption**: Module paths like `services/trading/engine.py` contain "trading" — these paths reference actual files and must be preserved as-is (the code files are not being renamed)
|
||||
- **Table name corruption**: Database table names like `trading_decisions` must be sanitized in narrative text and table listings, but inline SQL/code references to the original names need case-by-case handling so that the surrounding technical content stays accurate (see the design decision below)
|
||||
|
||||
**Design decision**: Code module paths (e.g., `services/trading/engine.py`) are preserved exactly as they appear in the source, since they reference actual files in the repository. Only narrative references to concepts (e.g., "the trading engine") are sanitized. Variable names within formulas and code blocks are preserved. Database table names are sanitized in narrative descriptions and table listings, but inline code references note the sanitized name.
|
||||
|
||||
### Cross-Reference Integrity
|
||||
|
||||
All internal links must resolve to files that exist in the sanitized output:
|
||||
- Page-to-page links must use sanitized filenames
|
||||
- Diagram links must use sanitized diagram filenames
|
||||
- No links should point back to the source `docs/intelligence-pipeline-deep-dive/` directory
|
||||
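The three link rules above could be checked with a short script. The following is a minimal sketch, not the project's actual test: the function names `extract_links` and `broken_links` are hypothetical, and it assumes all internal links use standard inline Markdown syntax with relative paths.

```python
import re
from pathlib import Path

# Inline Markdown links: [text](target). Reference-style links are not handled.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
# Directory name of the unsanitized originals (from the requirements).
SOURCE_DIR_NAME = "intelligence-pipeline-deep-dive"


def extract_links(markdown: str) -> list[str]:
    """Return relative link targets, skipping URLs and in-page anchors."""
    targets = []
    for target in LINK_RE.findall(markdown):
        if target.startswith(("http://", "https://", "mailto:", "#")):
            continue
        # Drop a trailing "#fragment" so only the file path is checked.
        targets.append(target.split("#", 1)[0])
    return targets


def broken_links(root: Path) -> list[tuple[str, str]]:
    """Return (page, target) pairs that are missing or point at the source docs."""
    problems = []
    for page in sorted(root.rglob("*.md")):
        for target in extract_links(page.read_text(encoding="utf-8")):
            resolved = (page.parent / target).resolve()
            points_at_source = SOURCE_DIR_NAME in resolved.parts
            if points_at_source or not resolved.is_file():
                problems.append((str(page.relative_to(root)), target))
    return problems
```

Calling `broken_links(Path("docs/sanitized-pipeline-deep-dive"))` should return an empty list when every link resolves inside the sanitized set.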
|
||||
## Testing Strategy
|
||||
|
||||
### Why Limited PBT Applies
|
||||
|
||||
This is a documentation-only deliverable: the output is static Markdown files, not executable code with functions and data transformations. However, one universal property (banned term exclusion) is well suited to property-based testing (PBT) because it must hold across all files and involves checking a large set of terms against file content.
|
||||
|
||||
Most other requirements (structural checks, content preservation, narrative reframing) are better verified through example-based tests and manual review.
|
||||
|
||||
### Property-Based Tests
|
||||
|
||||
- **Library**: Hypothesis (Python, already in the project)
|
||||
- **Configuration**: `@settings(max_examples=100)`
|
||||
- **Property 1 implementation**: Generate random selections from the banned term list and random file selections from the sanitized docs, then verify that the selected term does not appear in the selected file's content. Alternatively, exhaustively check all banned terms against all files; because the file set is small and fixed, this is more practical as an exhaustive example-based test.
|
||||
|
||||
**Practical note**: Given the small, fixed file set (14 files), the banned term exclusion property is most practically implemented as an exhaustive check — iterate all files × all banned terms — rather than a randomized property test. This provides complete coverage rather than probabilistic coverage.
|
||||
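The exhaustive check described in the practical note might look like the sketch below. The term lists are an illustrative subset of the Property 1 list, and the helper names (`find_banned_terms`, `scan_docs`) are hypothetical. The split between case-sensitive and case-insensitive terms is an assumption: it keeps ordinary words such as "hold" or "metadata" from being flagged while still catching uppercase action labels and tickers.

```python
import re
from pathlib import Path

# Illustrative subset only; the full banned list is defined in Property 1.
CASE_SENSITIVE_TERMS = ["AAPL", "TSLA", "NVDA", "XOM", "META", "BUY", "SELL", "HOLD", "WATCH"]
CASE_INSENSITIVE_TERMS = [
    "trading engine", "paper trading", "portfolio", "broker", "bullish", "bearish",
    "stonks", "trading_decisions", "portfolio_snapshots",
]


def _pattern(term: str, ignore_case: bool) -> re.Pattern:
    # The lookarounds reject matches embedded in longer words or identifiers,
    # so "HOLD" does not match inside "THRESHOLD".
    flags = re.IGNORECASE if ignore_case else 0
    return re.compile(rf"(?<!\w){re.escape(term)}(?!\w)", flags)


PATTERNS = [(t, _pattern(t, False)) for t in CASE_SENSITIVE_TERMS] + [
    (t, _pattern(t, True)) for t in CASE_INSENSITIVE_TERMS
]


def find_banned_terms(text: str) -> list[str]:
    """Return every banned term that occurs in text, in list order."""
    return [term for term, pattern in PATTERNS if pattern.search(text)]


def scan_docs(root: Path) -> dict[str, list[str]]:
    """Map each Markdown file under root to the banned terms it contains."""
    violations = {}
    for path in sorted(root.rglob("*.md")):
        hits = find_banned_terms(path.read_text(encoding="utf-8"))
        if hits:
            violations[str(path)] = hits
    return violations
```

A test would then assert that `scan_docs(Path("docs/sanitized-pipeline-deep-dive"))` is empty. Because underscores count as word characters, identifiers such as `portfolio_snapshots` need their own entries, as shown in the list above.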
|
||||
### Example-Based Tests
|
||||
|
||||
1. **File structure verification**: Verify all expected files exist at the correct paths
|
||||
2. **Cross-reference integrity**: Parse all sanitized files, extract markdown links, verify they resolve to existing sanitized files
|
||||
3. **Mermaid syntax validation**: Verify each diagram file contains valid Mermaid `flowchart` declarations
|
||||
4. **Technical content preservation**: Spot-check that key formulas, threshold values, and code module paths are present in the sanitized docs
|
||||
5. **Terminology replacement verification**: Spot-check that key replacements appear (e.g., "decision execution engine" replaces "trading engine")
|
||||
6. **Index page framing**: Verify the index describes the system as an "AI-driven intelligence-to-decision pipeline"
|
||||
7. **Database table sanitization**: Verify `execution_decisions` appears where `trading_decisions` was, and `pool_snapshots` where `portfolio_snapshots` was
|
||||
|
||||
### Manual Review
|
||||
|
||||
- Narrative coherence and readability of the sanitized content
|
||||
- Consistency of domain-neutral framing across all pages
|
||||
- Quality of example scenario replacements (e.g., "bearish article about AAPL" → "negative-sentiment article about Entity-A")
|
||||
- Preservation of page-to-page transition flow
|
||||
@@ -0,0 +1,202 @@
|
||||
# Requirements Document
|
||||
|
||||
## Introduction
|
||||
|
||||
This feature produces a sanitized version of the existing 6-page intelligence pipeline deep dive documentation (`docs/intelligence-pipeline-deep-dive/`) for use in a work presentation. The sanitized version strips all financial, market, and trading language — stock tickers, buy/sell/hold actions, portfolio allocation, broker APIs, and domain-specific framing — and reframes the content as a general-purpose AI decision intelligence pipeline. The sanitized docs are stored as a separate doc group under `docs/sanitized-pipeline-deep-dive/`, preserving the original documents untouched. All engineering depth — algorithms, formulas, architectural patterns, queue topologies, database schemas, code module references, and Mermaid diagrams — is preserved. Only the domain-specific framing changes.
|
||||
|
||||
## Glossary
|
||||
|
||||
- **Source_Docs**: The original 6-page documentation set at `docs/intelligence-pipeline-deep-dive/`, including `index.md`, pages `01` through `06`, and the `diagrams/` subdirectory containing 6 Mermaid diagram files.
|
||||
- **Sanitized_Docs**: The output documentation set at `docs/sanitized-pipeline-deep-dive/`, mirroring the structure of Source_Docs with all financial/market/trading language replaced by domain-neutral equivalents.
|
||||
- **Sanitization_Engine**: The process (manual or automated) that transforms Source_Docs into Sanitized_Docs by applying the terminology mapping and content reframing rules defined in this document.
|
||||
- **Terminology_Map**: The defined set of financial/market/trading terms and their domain-neutral replacements used by the Sanitization_Engine.
|
||||
- **Entity_Identifier**: The domain-neutral replacement for stock ticker symbols (e.g., AAPL, TSLA) in Sanitized_Docs.
|
||||
- **Decision_Term**: A domain-neutral action term (act, defer, monitor, observe) that replaces trading actions (buy, sell, hold, watch) in Sanitized_Docs.
|
||||
- **Decision_Execution_Engine**: The domain-neutral name for the trading engine in Sanitized_Docs.
|
||||
- **Execution_Adapter**: The domain-neutral name for broker adapters and broker API references in Sanitized_Docs.
|
||||
- **Allocation_Pool**: The domain-neutral name for portfolio references in Sanitized_Docs.
|
||||
- **Commitment_Sizing**: The domain-neutral name for position sizing in Sanitized_Docs.
|
||||
|
||||
---
|
||||
|
||||
## Requirements
|
||||
|
||||
### Requirement 1: Separate Output Directory
|
||||
|
||||
**User Story:** As a presenter, I want the sanitized docs stored in a separate directory from the originals, so that the original documentation remains untouched and both versions coexist.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. THE Sanitization_Engine SHALL write all output files to `docs/sanitized-pipeline-deep-dive/`.
|
||||
2. THE Sanitization_Engine SHALL NOT modify, overwrite, or delete any file under `docs/intelligence-pipeline-deep-dive/`.
|
||||
3. THE Sanitized_Docs SHALL contain an `index.md` file at the root of `docs/sanitized-pipeline-deep-dive/`.
|
||||
4. THE Sanitized_Docs SHALL contain a `diagrams/` subdirectory under `docs/sanitized-pipeline-deep-dive/`.
|
||||
|
||||
---
|
||||
|
||||
### Requirement 2: Mirror the 6-Page Structure
|
||||
|
||||
**User Story:** As a presenter, I want the sanitized docs to mirror the same 6-page structure as the originals, so that readers familiar with the original can navigate the sanitized version identically.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. THE Sanitized_Docs SHALL contain exactly 6 numbered page files matching the naming pattern of Source_Docs: `01-*.md` through `06-*.md`.
|
||||
2. THE Sanitized_Docs SHALL contain an `index.md` with a table of contents linking to all 6 pages and all diagrams, mirroring the structure of the Source_Docs index.
|
||||
3. THE Sanitized_Docs SHALL contain one Mermaid diagram file in `diagrams/` for each diagram file present in `docs/intelligence-pipeline-deep-dive/diagrams/`.
|
||||
4. WHEN a Source_Docs page contains internal cross-references to other pages or diagrams, THE Sanitized_Docs equivalent page SHALL contain corresponding cross-references pointing to the Sanitized_Docs versions of those pages and diagrams.
|
||||
5. THE Sanitized_Docs page filenames SHALL use sanitized titles (e.g., `06-decision-execution.md` instead of `06-trading-decisions-and-execution.md`).
|
||||
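The structural criteria in Requirements 1 and 2 lend themselves to a simple filesystem check. This is a sketch under the assumption that pages are named `01-*.md` through `06-*.md` directly under the output root; `structure_problems` is a hypothetical helper name, and the per-diagram comparison against the source directory is left out.

```python
import re
from pathlib import Path

# Numbered pages: 01-*.md through 06-*.md.
PAGE_RE = re.compile(r"^(0[1-6])-.+\.md$")


def structure_problems(root: Path) -> list[str]:
    """Return human-readable problems with the sanitized docs layout."""
    problems = []
    if not (root / "index.md").is_file():
        problems.append("missing index.md")
    if not (root / "diagrams").is_dir():
        problems.append("missing diagrams/ directory")
    # Collect the page numbers actually present at the root.
    numbers = sorted(
        m.group(1) for p in root.glob("*.md") if (m := PAGE_RE.match(p.name))
    )
    expected = [f"{n:02d}" for n in range(1, 7)]
    if numbers != expected:
        problems.append(f"expected pages {expected}, found {numbers}")
    return problems
```

An empty result means the index, the `diagrams/` directory, and exactly one page per number from 01 to 06 are present.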
|
||||
---
|
||||
|
||||
### Requirement 3: Strip Financial and Trading Terminology
|
||||
|
||||
**User Story:** As a presenter, I want all financial, market, and trading language removed from the sanitized docs, so that the presentation focuses on engineering without revealing the financial domain.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. THE Sanitized_Docs SHALL NOT contain any stock ticker symbols (e.g., AAPL, TSLA, NVDA, XOM, META).
|
||||
2. THE Sanitized_Docs SHALL NOT contain the trading action terms "buy", "sell", "hold", or "watch" when used as system action labels or decision outputs.
|
||||
3. THE Sanitized_Docs SHALL NOT contain the terms "trading engine", "paper trading", "live trading", "paper_eligible", or "live_eligible".
|
||||
4. THE Sanitized_Docs SHALL NOT contain the terms "portfolio", "portfolio allocation", "portfolio heat", or "portfolio snapshots" when referring to the resource management domain concept.
|
||||
5. THE Sanitized_Docs SHALL NOT contain references to "broker", "Alpaca", "broker adapter", or "broker API".
|
||||
6. THE Sanitized_Docs SHALL NOT contain the terms "stock market", "Wall Street", "bullish", "bearish", "position sizing" (as a financial concept label), or "stop-loss" (as a financial concept label).
|
||||
7. THE Sanitized_Docs SHALL NOT contain company names used as financial examples (e.g., "Apple", "Tesla", "NVIDIA" when used in a stock/market context).
|
||||
8. THE Sanitized_Docs SHALL NOT contain the terms "SEC EDGAR", "SEC filings", "10-K", "10-Q", "8-K", "earnings", "earnings call", or "earnings report" as domain-specific financial references.
|
||||
9. THE Sanitized_Docs SHALL NOT contain references to "Polygon.io" or "Polygon" as a financial data provider name.
|
||||
10. THE Sanitized_Docs SHALL NOT contain the term "Stonks Oracle" or "stonks" as a system name.
|
||||
|
||||
---
|
||||
|
||||
### Requirement 4: Apply Domain-Neutral Terminology Mapping
|
||||
|
||||
**User Story:** As a presenter, I want consistent domain-neutral replacements for all stripped terms, so that the sanitized docs read coherently as a general-purpose AI decision intelligence pipeline.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. WHEN the Source_Docs use "stock ticker" or specific ticker symbols, THE Sanitized_Docs SHALL use "entity identifier" or "tracked entity".
|
||||
2. WHEN the Source_Docs use "buy/sell/hold/watch" as action labels, THE Sanitized_Docs SHALL use "act/defer/monitor/observe" or equivalent neutral decision terms.
|
||||
3. WHEN the Source_Docs use "trading engine", THE Sanitized_Docs SHALL use "decision execution engine" or "action engine".
|
||||
4. WHEN the Source_Docs use "portfolio", THE Sanitized_Docs SHALL use "resource pool" or "allocation pool".
|
||||
5. WHEN the Source_Docs use "broker" or "Alpaca", THE Sanitized_Docs SHALL use "execution adapter" or "external execution API".
|
||||
6. WHEN the Source_Docs use "paper trading", THE Sanitized_Docs SHALL use "simulation mode" or "dry-run mode".
|
||||
7. WHEN the Source_Docs use "live trading", THE Sanitized_Docs SHALL use "live execution mode" or "production mode".
|
||||
8. WHEN the Source_Docs use "bullish" or "bearish", THE Sanitized_Docs SHALL use "positive" or "negative" (or "favorable"/"unfavorable").
|
||||
9. WHEN the Source_Docs use "position sizing", THE Sanitized_Docs SHALL use "resource allocation" or "commitment sizing".
|
||||
10. WHEN the Source_Docs use "stop-loss", THE Sanitized_Docs SHALL use "risk threshold" or "loss limit".
|
||||
11. WHEN the Source_Docs use "Stonks Oracle" or "stonks", THE Sanitized_Docs SHALL use a neutral system name such as "the platform" or "the system".
|
||||
12. WHEN the Source_Docs use "SEC EDGAR" or "SEC filings", THE Sanitized_Docs SHALL use "regulatory filings source" or "public records API".
|
||||
13. WHEN the Source_Docs use "Polygon.io" or "Polygon", THE Sanitized_Docs SHALL use "external data provider" or "data source API".
|
||||
14. WHEN the Source_Docs use "earnings" as a catalyst type or event, THE Sanitized_Docs SHALL use "performance report" or "periodic disclosure".
|
||||
15. THE Sanitized_Docs SHALL apply the Terminology_Map consistently across all 6 pages, the index, and all diagram files.
|
||||
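If the Sanitization_Engine is automated, the mapping above could be applied in a single regex pass. The sketch below is one possible approach, not the method actually used: it picks the first replacement option from each criterion, adds "portfolio heat" from the page 06 task, and uses a hypothetical function name `sanitize_narrative`. Sorting terms longest-first ensures "portfolio heat" is matched before "portfolio", and the `/` guard is an assumed heuristic for leaving file paths untouched.

```python
import re

# Keys are lowercase; values take the first option listed in Requirement 4.
TERMINOLOGY_MAP = {
    "trading engine": "decision execution engine",
    "paper trading": "simulation mode",
    "live trading": "live execution mode",
    "portfolio heat": "pool exposure",
    "portfolio": "resource pool",
    "broker": "execution adapter",
    "bullish": "positive",
    "bearish": "negative",
    "position sizing": "resource allocation",
    "stop-loss": "risk threshold",
    "stonks oracle": "the platform",
}

# Longest terms first so multi-word phrases win over their substrings.
_ALTERNATION = "|".join(
    re.escape(term) for term in sorted(TERMINOLOGY_MAP, key=len, reverse=True)
)
# Lookarounds skip matches inside identifiers or path segments.
_TERM_RE = re.compile(rf"(?<![\w/])({_ALTERNATION})(?![\w/])", re.IGNORECASE)


def sanitize_narrative(text: str) -> str:
    """Replace mapped terms in one pass so replacements are never re-scanned."""
    return _TERM_RE.sub(lambda m: TERMINOLOGY_MAP[m.group(1).lower()], text)
```

For example, `sanitize_narrative("The trading engine flags a bearish trend.")` yields "The decision execution engine flags a negative trend." The sketch does not restore sentence-initial capitalization, so the output would still need the manual review described in the design document.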
|
||||
---
|
||||
|
||||
### Requirement 5: Preserve Engineering and Technical Depth
|
||||
|
||||
**User Story:** As a presenter, I want all engineering concepts, algorithms, formulas, and architectural details preserved, so that the sanitized docs demonstrate the technical sophistication of the system.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. THE Sanitized_Docs SHALL preserve all references to Redis queue patterns, including queue names (renamed only as required by Requirement 7) and `rpush`/`lpop`/`blpop` operations.
|
||||
2. THE Sanitized_Docs SHALL preserve all references to PostgreSQL tables, including table names (renamed only as required by Requirement 9) and column descriptions.
|
||||
3. THE Sanitized_Docs SHALL preserve all references to MinIO buckets and storage patterns, renaming buckets only as required by Requirement 8.
|
||||
4. THE Sanitized_Docs SHALL preserve all references to Ollama as the LLM inference provider.
|
||||
5. THE Sanitized_Docs SHALL preserve the composite signal scoring formula: `combined = gate × recency × credibility × (1 + novelty_bonus) × market_context_multiplier`.
|
||||
6. THE Sanitized_Docs SHALL preserve the confidence computation formula with log₂ scaling and its four components (unique source count, average extraction credibility, signal agreement with sample-size dampening, contradiction penalty).
|
||||
7. THE Sanitized_Docs SHALL preserve the weighted sentiment average formula: `weighted_avg = Σ(combined_weight × impact_score × sentiment_value) / Σ(combined_weight × impact_score)`.
|
||||
8. THE Sanitized_Docs SHALL preserve all code module path references (e.g., `services/aggregation/scoring.py`, `services/recommendation/eligibility.py`).
|
||||
9. THE Sanitized_Docs SHALL preserve the three-layer signal architecture, renaming the layers with domain-neutral labels (e.g., "Entity-Specific Signals", "Environmental Signals", "Relational Signals") while retaining the weight ratios (1.0, 0.3, 0.2).
|
||||
10. THE Sanitized_Docs SHALL preserve all threshold values, configuration parameters, and numeric constants (e.g., confidence gate of 0.2, recency half-lives per window, eligibility thresholds).
|
||||
11. THE Sanitized_Docs SHALL preserve all Markdown table structures containing technical parameters and thresholds.
|
||||
12. THE Sanitized_Docs SHALL preserve the contradiction detection algorithm, evidence ranking methodology, and trend projection computation.
|
||||
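To make the two formulas in criteria 5 and 7 concrete, here is a minimal Python transcription. It is an illustration of the arithmetic only, not the code in `services/aggregation/scoring.py`; the function names and the zero-denominator fallback of `0.0` are assumptions.

```python
from typing import Sequence, Tuple


def combined_weight(
    gate: float,
    recency: float,
    credibility: float,
    novelty_bonus: float,
    market_context_multiplier: float,
) -> float:
    """combined = gate × recency × credibility × (1 + novelty_bonus) × market_context_multiplier"""
    return gate * recency * credibility * (1 + novelty_bonus) * market_context_multiplier


def weighted_sentiment_average(signals: Sequence[Tuple[float, float, float]]) -> float:
    """Each signal is (combined_weight, impact_score, sentiment_value)."""
    numerator = sum(w * impact * sentiment for w, impact, sentiment in signals)
    denominator = sum(w * impact for w, impact, _ in signals)
    # Assumption: return a neutral 0.0 when no signal carries any weight.
    return numerator / denominator if denominator else 0.0
```

With two signals `(2.0, 0.5, 1.0)` and `(1.0, 1.0, 0.0)`, the numerator is 1.0 and the denominator is 2.0, so the weighted average is 0.5. The variable name `market_context_multiplier` is kept verbatim, in line with the design decision to preserve technical identifiers.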
|
||||
---
|
||||
|
||||
### Requirement 6: Sanitize Mermaid Diagrams
|
||||
|
||||
**User Story:** As a presenter, I want the Mermaid diagrams sanitized with the same terminology mapping as the narrative pages, so that diagrams and text are consistent.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. THE Sanitized_Docs SHALL contain one sanitized Mermaid diagram file for each of the 6 diagram files in Source_Docs.
|
||||
2. WHEN a Source_Docs diagram contains financial/trading terminology (e.g., "trading engine", "buy/sell", "paper_eligible", "bullish/bearish", ticker symbols), THE corresponding Sanitized_Docs diagram SHALL use the same domain-neutral replacements defined in the Terminology_Map.
|
||||
3. THE Sanitized_Docs diagrams SHALL preserve all Mermaid syntax, node relationships, subgraph structures, and flow directions from the Source_Docs diagrams.
|
||||
4. THE Sanitized_Docs diagrams SHALL preserve all code module path references and service names within diagram nodes.
|
||||
5. THE Sanitized_Docs diagram filenames SHALL use sanitized names where the original names contain financial terms (e.g., `decision-engine-loop.md` instead of `trading-engine-decision-loop.md`).
|
||||
|
||||
---
|
||||
|
||||
### Requirement 7: Sanitize Redis Key and Queue Name References
|
||||
|
||||
**User Story:** As a presenter, I want Redis key patterns and queue names sanitized where they contain financial terms, so that even infrastructure-level references are domain-neutral.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. WHEN a Source_Docs Redis queue name contains "stonks" (e.g., `stonks:queue:ingestion`), THE Sanitized_Docs SHALL replace "stonks" with a neutral prefix (e.g., `app:queue:ingestion`).
|
||||
2. WHEN a Source_Docs Redis key pattern contains a trading-specific segment such as "trading" or "broker" (e.g., `stonks:queue:broker_orders`, `stonks:trading:circuit_breaker:*`), THE Sanitized_Docs SHALL replace the trading-specific segment with a neutral equivalent (e.g., `app:queue:execution_orders`, `app:execution:circuit_breaker:*`).
|
||||
3. THE Sanitized_Docs SHALL apply Redis key sanitization consistently across all narrative pages and diagram files.
|
||||
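Because Redis keys are colon-delimited, the renaming rules above can be expressed as a per-segment lookup. The sketch below uses only the example mappings given in this spec; the `SEGMENT_MAP` name and the `sanitize_redis_key` helper are hypothetical.

```python
# Segment-level replacements taken from the examples in this requirement.
SEGMENT_MAP = {
    "stonks": "app",
    "trading": "execution",
    "broker_orders": "execution_orders",
}


def sanitize_redis_key(key: str) -> str:
    """Rewrite each colon-separated segment that has a neutral equivalent."""
    return ":".join(SEGMENT_MAP.get(segment, segment) for segment in key.split(":"))
```

For instance, `stonks:trading:circuit_breaker:*` becomes `app:execution:circuit_breaker:*`, while neutral segments such as `queue` and `circuit_breaker` pass through unchanged.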
|
||||
---
|
||||
|
||||
### Requirement 8: Sanitize MinIO Bucket Name References
|
||||
|
||||
**User Story:** As a presenter, I want MinIO bucket names sanitized where they contain financial terms, so that storage references are domain-neutral.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. WHEN a Source_Docs MinIO bucket name contains "stonks" (e.g., `stonks-raw-market`, `stonks-raw-news`, `stonks-normalized`), THE Sanitized_Docs SHALL replace "stonks" with a neutral prefix and replace any remaining domain-specific segment with a neutral one (e.g., `app-raw-data`, `app-raw-content`, `app-normalized`).
|
||||
2. THE Sanitized_Docs SHALL apply MinIO bucket name sanitization consistently across all narrative pages and diagram files.
|
||||
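Bucket names need slightly more than a prefix swap, since `stonks-raw-market` maps to `app-raw-data` rather than `app-raw-market`. A small sketch, assuming an explicit override table plus a prefix fallback (both names below are hypothetical):

```python
# Buckets whose suffix also changes, per the examples in this spec.
BUCKET_OVERRIDES = {
    "stonks-raw-market": "app-raw-data",
    "stonks-raw-news": "app-raw-content",
}
OLD_PREFIX = "stonks-"
NEW_PREFIX = "app-"


def sanitize_bucket(name: str) -> str:
    """Apply an explicit override if one exists, otherwise swap the prefix."""
    if name in BUCKET_OVERRIDES:
        return BUCKET_OVERRIDES[name]
    if name.startswith(OLD_PREFIX):
        return NEW_PREFIX + name[len(OLD_PREFIX):]
    return name
```

Checking the overrides first keeps the two special cases from being caught by the generic prefix rule.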
|
||||
---
|
||||
|
||||
### Requirement 9: Sanitize Database Table and Column References Where Needed
|
||||
|
||||
**User Story:** As a presenter, I want database table and column names that contain obvious financial terms sanitized, while preserving the overall schema structure.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. WHEN a Source_Docs database table name contains "trading" (e.g., `trading_decisions`), THE Sanitized_Docs SHALL use a neutral equivalent (e.g., `execution_decisions`).
|
||||
2. WHEN a Source_Docs database table or column references "portfolio" (e.g., `portfolio_snapshots`, `portfolio_pct`), THE Sanitized_Docs SHALL use a neutral equivalent (e.g., `pool_snapshots`, `allocation_pct`).
|
||||
3. THE Sanitized_Docs SHALL preserve all other database table names that do not contain financial-specific terms (e.g., `documents`, `document_intelligence`, `trend_windows`, `recommendations`).
|
||||
4. THE Sanitized_Docs SHALL apply database reference sanitization consistently across all narrative pages.
|
||||
|
||||
---
|
||||
|
||||
### Requirement 10: Sanitize Example Scenarios and Inline References
|
||||
|
||||
**User Story:** As a presenter, I want all inline examples, scenario walkthroughs, and narrative references sanitized, so that no financial context leaks through illustrative content.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. WHEN a Source_Docs page uses a specific company name or ticker in an example scenario (e.g., "a bearish article about AAPL"), THE Sanitized_Docs SHALL replace the reference with a generic entity (e.g., "a negative-sentiment article about Entity-A").
|
||||
2. WHEN a Source_Docs page describes a financial event as an example (e.g., "earnings miss", "tariff announcement affecting XOM"), THE Sanitized_Docs SHALL reframe the example using domain-neutral language (e.g., "a negative performance disclosure", "a regulatory policy change affecting Entity-B").
|
||||
3. WHEN a Source_Docs page references market-specific concepts in narrative flow (e.g., "markets move fast", "trading volume", "intraday swings"), THE Sanitized_Docs SHALL reframe using neutral language (e.g., "conditions change rapidly", "activity volume", "short-term fluctuations").
|
||||
4. THE Sanitized_Docs SHALL preserve the logical structure and teaching purpose of all example scenarios while removing the financial framing.
|
||||
|
||||
---
|
||||
|
||||
### Requirement 11: Preserve Acceptable Engineering Terms
|
||||
|
||||
**User Story:** As a presenter, I want general engineering terms that happen to overlap with financial language preserved when they describe engineering patterns, so that the technical accuracy is maintained.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. THE Sanitized_Docs SHALL preserve the term "circuit breaker" when it describes the engineering safety pattern (rate limiting, cascading failure prevention).
|
||||
2. THE Sanitized_Docs SHALL preserve the term "exponential backoff" and all retry/backoff patterns.
|
||||
3. THE Sanitized_Docs SHALL preserve all adapter pattern references (the software design pattern), renaming only the domain-specific adapter names (e.g., "AlpacaBrokerAdapter" becomes a neutral name).
|
||||
4. THE Sanitized_Docs SHALL preserve the term "signal" as used in the signal processing and scoring context.
|
||||
5. THE Sanitized_Docs SHALL preserve the terms "trend", "sentiment", "confidence", "contradiction", and "evidence" as used in the data analysis context.
|
||||
|
||||
---
|
||||
|
||||
### Requirement 12: Reframe the System Narrative
|
||||
|
||||
**User Story:** As a presenter, I want the overall system narrative reframed as a general-purpose AI decision intelligence pipeline, so that the presentation tells a coherent story without financial context.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. THE Sanitized_Docs index page SHALL describe the system as an "AI-driven intelligence-to-decision pipeline" that ingests data from multiple sources, extracts structured intelligence via NLP/LLM, scores and weights signals, aggregates trends across time windows, generates recommendations with quality gates, and executes decisions autonomously with safety mechanisms.
|
||||
2. THE Sanitized_Docs page 01 SHALL describe data ingestion from "multiple external data sources" rather than from financial-specific APIs.
|
||||
3. THE Sanitized_Docs page 06 SHALL describe "autonomous decision execution with safety mechanisms" rather than "trading decisions and execution".
|
||||
4. WHEN the Source_Docs conclusion references the "intelligence-to-decision pipeline in Stonks Oracle", THE Sanitized_Docs conclusion SHALL reference the "intelligence-to-decision pipeline" without a financial system name.
|
||||
5. THE Sanitized_Docs SHALL maintain the narrative flow where each page ends with a transition to the next page, preserving the end-to-end story structure.
|
||||
@@ -0,0 +1,47 @@
|
||||
# Tasks — Sanitized Pipeline Documentation
|
||||
|
||||
## Task 1: Create Output Directory and Index Page
|
||||
|
||||
- [x] 1.1 Create the `docs/sanitized-pipeline-deep-dive/` directory and `diagrams/` subdirectory
|
||||
- [x] 1.2 Create `docs/sanitized-pipeline-deep-dive/index.md` with sanitized content: replace "Stonks Oracle" with "the platform", replace Polygon.io/SEC EDGAR references with neutral descriptions, update all page links to use sanitized filenames (e.g., `06-decision-execution.md`), update diagram links to use sanitized names (e.g., `decision-engine-loop.md`), describe the system as an "AI-driven intelligence-to-decision pipeline", and update or remove the Related Documentation section to use neutral descriptions
|
||||
|
||||
## Task 2: Sanitize Page 01 — Data Ingestion and Preparation
|
||||
|
||||
- [x] 2.1 Create `docs/sanitized-pipeline-deep-dive/01-data-ingestion-and-preparation.md` by transforming the source page: replace "Stonks Oracle" with "the platform", replace "Polygon.io" with "external data provider", replace "SEC EDGAR"/"EFTS" with "public records API"/"regulatory filings source", replace "AlpacaBrokerAdapter" with "ExecutionAdapter", replace adapter class names (PolygonNewsAdapter → ExternalNewsAdapter, PolygonMarketAdapter → ExternalDataAdapter, SECEdgarAdapter → RegulatoryFilingsAdapter), replace all `stonks:` Redis key prefixes with `app:`, replace MinIO bucket names (stonks-raw-market → app-raw-data, stonks-raw-news → app-raw-content, stonks-raw-filings → app-raw-filings, stonks-normalized → app-normalized), replace ticker symbols (AAPL → Entity-A, etc.) and company names with generic entities, replace "broker" source_type with "execution_api", replace "SEC" references with "regulatory filings", replace "10-K"/"10-Q"/"8-K" with "regulatory filing types", replace "earnings" with "performance report", sanitize example paths (e.g., `news_api/AAPL/...` → `news_api/Entity-A/...`), update cross-references to use sanitized filenames, and preserve all engineering content (queue operations, table structures, quality scoring formula, code module paths)
|
||||
|
||||
## Task 3: Sanitize Page 02 — AI Agent Processing and Extraction
|
||||
|
||||
- [x] 3.1 Create `docs/sanitized-pipeline-deep-dive/02-ai-agent-processing-and-extraction.md` by transforming the source page: replace "Stonks Oracle" references, replace financial document type references (SEC filings → regulatory filings, earnings transcripts → performance transcripts), replace "financial document analyst" role description with "document analyst", replace ticker symbols and company names in examples (AAPL, TSLA, NVDA, XOM, META → Entity-A through Entity-E), replace "bearish"/"bullish" with "negative"/"positive", replace "earnings" catalyst type references with "performance_report", replace "stock ticker" with "entity identifier", replace "market implications" with neutral language, replace `stonks:queue:*` Redis keys with `app:queue:*`, replace MinIO bucket names (stonks-llm-prompts → app-llm-prompts, stonks-llm-results → app-llm-results, stonks-normalized → app-normalized), replace "tariff announcement affecting XOM" example with neutral equivalent, update cross-references, and preserve all engineering content (JSON repair pipeline, validation logic, AgentConfigResolver, Ollama references, code module paths, schema field descriptions)
|
||||
|
||||
## Task 4: Sanitize Page 03 — Signal Scoring and Weighted Signals
|
||||
|
||||
- [x] 4.1 Create `docs/sanitized-pipeline-deep-dive/03-signal-scoring-and-weighted-signals.md` by transforming the source page: replace "bullish"/"bearish" with "positive"/"negative" throughout, replace "trading recommendations" with "decision recommendations", replace ticker examples (AAPL, NVDA) with Entity-A/Entity-C, replace "market context" variable references carefully (preserve `market_context_multiplier` as a technical variable name but sanitize narrative references to "market conditions" → "environmental conditions"), replace "trading volume" with "activity volume", replace `stonks:queue:*` Redis keys with `app:queue:*`, replace "bullish_pct > bearish_pct" with "positive_pct > negative_pct" in signal propagation description, update cross-references, and preserve all engineering content (composite weight formula, recency decay formula, half-life tables, credibility weight computation, novelty bonus formula, weighted sentiment average formula, three-layer architecture with weight ratios 1.0/0.3/0.2, all threshold values and configuration parameters)
|
||||
|
||||
## Task 5: Sanitize Page 04 — Trend Aggregation and Accumulating Signals
|
||||
|
||||
- [x] 5.1 Create `docs/sanitized-pipeline-deep-dive/04-trend-aggregation-and-accumulating-signals.md` by transforming the source page: replace "bullish"/"bearish" with "positive"/"negative" in trend direction descriptions and TrendDirection enum values, replace "trading recommendations" with "decision recommendations", replace "BULLISH_THRESHOLD"/"BEARISH_THRESHOLD" with "POSITIVE_THRESHOLD"/"NEGATIVE_THRESHOLD", replace "paper_eligible"/"live_eligible" with "simulation_eligible"/"production_eligible", replace "paper trading"/"live trading" with "simulation mode"/"live execution mode", replace "buy"/"sell"/"hold"/"watch" action labels with "act"/"defer"/"monitor"/"observe", replace "trading_decisions" table with "execution_decisions", replace "portfolio" references, replace ticker examples (AAPL) with Entity-A, replace "earnings miss" example with "negative performance disclosure", replace `stonks:queue:*` Redis keys with `app:queue:*`, update cross-references, and preserve all engineering content (five time windows, trend direction derivation thresholds, contradiction detection algorithm, evidence ranking, confidence computation formula with log₂ scaling, trend projection computation, all persistence tables)
|
||||
|
||||
## Task 6: Sanitize Page 05 — Recommendation Generation
|
||||
|
||||
- [x] 6.1 Create `docs/sanitized-pipeline-deep-dive/05-recommendation-generation.md` by transforming the source page: replace "buy"/"sell"/"hold"/"watch" action labels with "act"/"defer"/"monitor"/"observe", replace "BUY"/"SELL"/"HOLD"/"WATCH" with "ACT"/"DEFER"/"MONITOR"/"OBSERVE", replace "paper_eligible"/"live_eligible" with "simulation_eligible"/"production_eligible", replace "paper trading"/"live trading" with "simulation mode"/"live execution mode", replace "trading engine" with "decision execution engine", replace "portfolio" with "resource pool"/"allocation pool", replace "portfolio_pct" with "allocation_pct", replace "position sizing" with "commitment sizing", replace "position" (as financial position) with "commitment", replace "stop-loss" with "risk threshold", replace "trading-eligible" with "execution-eligible", replace "trade" (as noun/verb) with "decision"/"execution", replace ticker examples (AAPL) with Entity-A, replace "earnings" catalyst references with "performance_report", replace `stonks:queue:*` Redis keys with `app:queue:*`, replace "broker adapter" with "execution adapter", replace "Alpaca" with "external execution API", update cross-references to use sanitized filenames (06-decision-execution.md), and preserve all engineering content (suppression thresholds, eligibility gates, position sizing formulas, thesis generation logic, risk classification computation, all persistence tables)
## Task 7: Sanitize Page 06 — Decision Execution
- [x] 7.1 Create `docs/sanitized-pipeline-deep-dive/06-decision-execution.md` by transforming the source page: change title to "Decision Execution", replace "trading engine" with "decision execution engine" throughout, replace "TradingEngine" class references with "DecisionEngine" in narrative (preserve code module path `services/trading/engine.py`), replace "trade"/"trading" with "decision"/"execution" in narrative, replace "pre-trade checks" with "pre-execution checks", replace "buy"/"sell" action labels with "act"/"defer", replace "paper trading"/"paper_eligible" with "simulation mode"/"simulation_eligible", replace "live trading"/"live_eligible" with "live execution mode"/"production_eligible", replace "broker"/"Alpaca" with "execution adapter"/"external execution API", replace "AlpacaBrokerAdapter" with "ExecutionAdapter" in narrative, replace "portfolio" with "resource pool"/"allocation pool", replace "portfolio heat" with "pool exposure", replace "portfolio_snapshots" with "pool_snapshots", replace "position"/"positions" (financial) with "commitment"/"commitments", replace "position sizing"/"PositionSizer" with "commitment sizing" in narrative, replace "stop-loss" with "risk threshold", replace "take-profit" with "gain target", replace "P&L" with "gain/loss", replace "Sharpe ratio" with "risk-adjusted return ratio", replace "win rate" with "success rate", replace "drawdown" with "peak-to-trough decline", replace "trading_decisions" table with "execution_decisions", replace `stonks:queue:broker_orders` with `app:queue:execution_orders`, replace `stonks:trading:circuit_breaker:*` with `app:execution:circuit_breaker:*`, replace `stonks:dedupe:trading:*` with `app:dedupe:execution:*`, replace all other `stonks:` Redis key prefixes with `app:`, replace "paper-api.alpaca.markets" with "execution-api.example.com", replace "Polygon API" with "data source API", replace ticker examples with Entity-{letter}, replace "earnings" references with "performance report"/"periodic disclosure", update cross-references to use sanitized filenames, update the Conclusion section to remove "Stonks Oracle" and financial framing, and preserve all engineering content (5 concurrent async tasks, circuit breaker algorithm, reserve pool logic, risk tier parameters table, position sizing pipeline, order submission flow, all code module paths, all threshold values)
## Task 8: Sanitize Mermaid Diagrams
- [x] 8.1 Create `docs/sanitized-pipeline-deep-dive/diagrams/ingestion-to-extraction-flow.md` by transforming the source diagram: replace `stonks:queue:*` with `app:queue:*`, replace MinIO bucket names (stonks-raw-market → app-raw-data, stonks-raw-news → app-raw-content, stonks-raw-filings → app-raw-filings, stonks-normalized → app-normalized), replace adapter names in node labels (PolygonMarketAdapter → ExternalDataAdapter, PolygonNewsAdapter → ExternalNewsAdapter, SECEdgarAdapter → RegulatoryFilingsAdapter, MacroNewsAdapter unchanged, WebScrapeAdapter unchanged), replace "AlpacaBrokerAdapter" if present, and preserve all Mermaid syntax, node relationships, subgraph structures, flow directions, and code module paths
- [x] 8.2 Create `docs/sanitized-pipeline-deep-dive/diagrams/three-layer-signal-merging.md` by transforming the source diagram: replace `stonks:queue:*` with `app:queue:*`, replace "bullish_pct > bearish_pct" if present, and preserve all Mermaid syntax and structure
- [x] 8.3 Create `docs/sanitized-pipeline-deep-dive/diagrams/weighted-signal-computation.md` by copying the source diagram with minimal changes (content is already domain-neutral — only replace any `stonks:` references if present), preserving all Mermaid syntax and structure
- [x] 8.4 Create `docs/sanitized-pipeline-deep-dive/diagrams/trend-accumulation-escalation.md` by transforming the source diagram: replace "BULLISH"/"BEARISH" with "POSITIVE"/"NEGATIVE", replace "BUY / SELL" with "ACT / DEFER", replace "paper_eligible"/"live_eligible" if present, and preserve all Mermaid syntax and structure
- [x] 8.5 Create `docs/sanitized-pipeline-deep-dive/diagrams/recommendation-generation-flow.md` by transforming the source diagram: replace `stonks:queue:*` with `app:queue:*`, replace "BUY"/"SELL"/"HOLD"/"WATCH" with "ACT"/"DEFER"/"MONITOR"/"OBSERVE", replace "paper_eligible"/"live_eligible" with "simulation_eligible"/"production_eligible", replace "portfolio" with "allocation pool", and preserve all Mermaid syntax and structure
- [x] 8.6 Create `docs/sanitized-pipeline-deep-dive/diagrams/decision-engine-loop.md` (renamed from trading-engine-decision-loop.md) by transforming the source diagram: replace "Trading Engine" with "Decision Execution Engine", replace `stonks:queue:broker_orders` with `app:queue:execution_orders`, replace `stonks:dedupe:trading:*` with `app:dedupe:execution:*`, replace `stonks:trading:circuit_breaker:*` with `app:execution:circuit_breaker:*`, replace "buy, sell" with "act, defer", replace "paper_eligible, live_eligible" with "simulation_eligible, production_eligible", replace "Alpaca paper trading" with "external execution API (simulation)", replace "portfolio" references with "resource pool"/"allocation pool", replace "Portfolio heat" with "Pool exposure", replace "portfolio_snapshots" with "pool_snapshots", replace "trading_decisions" with "execution_decisions", replace "Sharpe ratio" with "risk-adjusted return ratio", replace "drawdown" with "peak-to-trough decline", replace "win rate" with "success rate", replace "P&L" with "gain/loss", and preserve all Mermaid syntax, node relationships, subgraph structures, flow directions, and code module paths
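The Redis key rules in Tasks 7 and 8 overlap: the generic `stonks:` prefix rule would also match the more specific keys such as `stonks:queue:broker_orders`. One way to apply them safely is a single-pass substitution that tries the longest keys first. The following is a minimal sketch under that assumption; the `REPLACEMENTS` table is only a subset of the rules listed above, and `sanitize` is a hypothetical helper name, not something taken from the repository.

```python
import re

# Hypothetical subset of the replacement rules from Tasks 7 and 8.
REPLACEMENTS = {
    "stonks:queue:broker_orders": "app:queue:execution_orders",
    "stonks:trading:circuit_breaker:": "app:execution:circuit_breaker:",
    "stonks:dedupe:trading:": "app:dedupe:execution:",
    "stonks:": "app:",
    "portfolio_snapshots": "pool_snapshots",
    "trading_decisions": "execution_decisions",
}


def sanitize(text: str) -> str:
    """Apply all replacements in one pass, preferring the longest match."""
    # Longest keys first, so specific rules win over the generic "stonks:" prefix.
    keys = sorted(REPLACEMENTS, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    # A single pass means already-replaced text is never scanned again.
    return pattern.sub(lambda match: REPLACEMENTS[match.group(0)], text)
```

Because the substitution runs once over the input, a replacement result such as `app:queue:execution_orders` cannot be rewritten by a later rule, which chained `str.replace` calls would not guarantee.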
## Task 9: Verification and Cross-Reference Integrity
- [x] 9.1 Verify all sanitized files exist at the expected paths: index.md, 6 numbered pages (01-06), and 6 diagram files in diagrams/
- [x] 9.2 Verify no sanitized file contains any banned financial term: scan all files for ticker symbols (AAPL, TSLA, NVDA, XOM, META), company names (Apple, Tesla, NVIDIA as financial references), system names (Stonks Oracle, stonks), provider names (Polygon.io, Polygon, SEC EDGAR, Alpaca), financial terms (trading engine, paper trading, live trading, paper_eligible, live_eligible, portfolio, broker, bullish, bearish, position sizing, stop-loss, stock market, Wall Street, earnings, 10-K, 10-Q, 8-K), and infrastructure patterns (stonks: prefix, stonks- prefix, trading_decisions, portfolio_snapshots)
- [x] 9.3 Verify all internal cross-references resolve: parse all markdown links in sanitized files, confirm each link target exists in the sanitized output directory
- [x] 9.4 Verify key engineering content is preserved: check that the composite weight formula, confidence computation formula, weighted sentiment average formula, three-layer weight ratios (1.0, 0.3, 0.2), and key threshold values (confidence gate 0.2, eligibility confidence 0.35) appear in the sanitized docs
- [x] 9.5 Verify source files are unmodified: confirm that no files under `docs/intelligence-pipeline-deep-dive/` were changed
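Checks 9.2 and 9.3 could be automated along the following lines. This is a sketch, not the verification that was actually run: `BANNED_TERMS` is a small subset of the list in 9.2, the function names are hypothetical, and the link pattern only handles plain inline links of the form `[text](target)`.

```python
import re
from pathlib import Path

# Illustrative subset of the banned terms from 9.2 (assumption: whole-word,
# case-insensitive matching is acceptable for these particular terms).
BANNED_TERMS = ["AAPL", "Stonks Oracle", "stonks", "Alpaca",
                "portfolio", "bullish", "bearish", "stop-loss"]

# Captures the path part of an inline markdown link, without any #fragment.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)#\s]+)(?:#[^)]*)?\)")


def find_banned_terms(text: str) -> list[str]:
    """Return the banned terms that occur in the text, in list order."""
    return [
        term for term in BANNED_TERMS
        if re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE)
    ]


def find_broken_links(md_file: Path) -> list[str]:
    """Return relative link targets that do not exist next to md_file."""
    broken = []
    for target in LINK_RE.findall(md_file.read_text(encoding="utf-8")):
        if "://" in target:
            continue  # external URL, outside the scope of check 9.3
        if not (md_file.parent / target).exists():
            broken.append(target)
    return broken
```

Short ticker symbols such as META would need case-sensitive matching to avoid false positives on ordinary words, so a complete scanner would likely treat tickers separately from the other terms.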
@@ -30,18 +30,16 @@
- Ruff config: `ruff.toml` with `known-first-party = ["services"]` for consistent import sorting
- Pre-existing test failures (not regressions): `test_extractor_prompts.py`, `test_extractor_schemas.py`, `test_filings_adapter.py`, `test_ollama_client.py`
## CI/CD — GitHub Actions
- Workflow: `.github/workflows/build.yml`
- Triggers on push to `main` and PRs
- Jobs:
- `lint-and-test`: ruff lint + pytest + frontend vitest (Node 24)
- `build-services`: matrix build of all Python services → GHCR
- `build-dashboard`: frontend/Dockerfile → GHCR (TypeScript strict mode — catches unused imports)
- `build-superset`: docker/Dockerfile.superset → GHCR
## CI/CD — Woodpecker CI (Gitea) → GitHub promotion
- Woodpecker pipelines in `.woodpecker/` — triggered by push to `main` on Gitea
- Push to Gitea: `git push gitea main`
- Gitea remote: `http://admin:<password>@10.1.1.12:30300/admin/stonks-oracle.git`
- Pipeline stages: lint → pytest → frontend vitest → build all service images + dashboard + superset → push to Harbor
- ArgoCD watches Gitea `main` and auto-syncs beta/paper/live stages
- **Do NOT push directly to GitHub** — GitHub is the promotion target after CI passes
- Once Woodpecker builds and tests pass, code is promoted to GitHub (`git push origin main`)
- CI handles all image builds and pushes — do NOT manually docker push
- Check CI: `gh run list -L 3`
- Re-run failed: `gh run rerun <id> --failed`
- View failure logs: `gh run view <id> --log-failed`
- Check Woodpecker CI status from the Gitea web UI or Woodpecker dashboard
## Deploy
- Full deploy/redeploy: `bash ~/sources/kube/stonks-oracle/runmefirst.sh` (from gremlin-1)
@@ -74,7 +72,9 @@ Ingestion jobs MUST include `source_id`, `source_type`, `ticker`, `company_id`,
## Git Conventions
- Commit after each completed phase task
- Commit message format: `feat:`, `fix:`, `phase N:` prefix
- Push to `main` triggers CI
- Always push to Gitea: `git push gitea main`
- Do NOT push to GitHub (`origin`) directly — GitHub is the promotion target after CI passes
- ArgoCD syncs from Gitea automatically
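The commit message convention above could be enforced with a small check, for example in a local hook. This is a sketch only: the function name is hypothetical, and requiring a space plus some text after the colon is an assumption that goes beyond what the convention states.

```python
import re

# Accepts "feat:", "fix:", or "phase N:" followed by a space and a summary.
# The trailing "space + text" requirement is an assumption, not a stated rule.
COMMIT_PREFIX_RE = re.compile(r"^(feat|fix|phase \d+): \S")


def has_valid_prefix(message: str) -> bool:
    """Check the first line of a commit message against the prefix convention."""
    first_line = message.splitlines()[0] if message else ""
    return COMMIT_PREFIX_RE.match(first_line) is not None
```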
## Code Style
- Python 3.12, type hints everywhere
@@ -40,14 +40,17 @@ Three-layer signal aggregation engine:
- Container registry: `registry.celestium.life/stonks-oracle`
## CI/CD
- GitHub Actions workflow at `.github/workflows/build.yml`
- Push to `main` triggers: lint → pytest → frontend vitest → build all service images + dashboard + superset → push to Harbor
- Woodpecker CI pipelines in `.woodpecker/` — triggered by push to `main` on Gitea
- Push to Gitea: `git push gitea main` — this is the primary push target
- ArgoCD watches Gitea `main` and auto-syncs beta/paper/live stages
- Pipeline stages: lint → pytest → frontend vitest → build all service images + dashboard + superset → push to Harbor
- Images tagged as `registry.celestium.life/stonks-oracle/<service>:<sha>` and `:latest`
- Dashboard image: `frontend/Dockerfile` (multi-stage: node:24 → nginx-unprivileged on port 8080)
- Superset image: `docker/Dockerfile.superset` (apache/superset + trino + psycopg2)
- Python service images: `docker/Dockerfile` with `SERVICE_CMD` build arg
- Let CI handle image builds and pushes — do NOT manually `docker build && docker push`
- Check CI status: `gh run list -L 3`
- **Do NOT push directly to GitHub** — GitHub (`origin`) is the promotion target after CI builds and tests pass
- Promotion to GitHub: `git push origin main` (only after Woodpecker CI succeeds)
## Deployment Scripts
- `~/sources/kube/stonks-oracle/runmefirst.sh` — full deploy: DB setup, migrations, Helm install, rolling restart (runs from gremlin-1 at 192.168.42.254 where secrets are available)
@@ -76,9 +79,9 @@ When a full reset is needed:
- Ollama: `ollama.ollama-service.svc.cluster.local:11434` (cluster-internal), also at `http://10.1.1.12:2701` (external), GPU: 4070 Ti Super 16GB
## Database Migrations
- Located in `infra/migrations/001_*.sql` through `027_*.sql`
- Located in `infra/migrations/001_*.sql` through `030_*.sql`
- Applied automatically by `runmefirst.sh` in sorted order
- Next migration number: **029**
- Next migration number: **031**
- Key migrations:
- 016: Global news interpolation (global_events, macro_impact_records, exposure_profiles, trend_projections)
- 017: Competitive intelligence (competitor_relationships, competitive_signal_records)
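The "next migration number" noted above can be derived from the existing file names rather than tracked by hand. The sketch below assumes the `NNN_*.sql` naming shown for `infra/migrations/`; the function name and the example file names are hypothetical.

```python
import re
from collections.abc import Iterable

# Matches file names such as "030_something.sql" and captures the number.
MIGRATION_RE = re.compile(r"^(\d{3})_.*\.sql$")


def next_migration_number(filenames: Iterable[str]) -> str:
    """Return the next zero-padded migration number, e.g. "031"."""
    numbers = [
        int(match.group(1))
        for name in filenames
        if (match := MIGRATION_RE.match(name))
    ]
    # With no migrations present, numbering starts at 001.
    return f"{max(numbers, default=0) + 1:03d}"
```

Against the real directory this might be called as `next_migration_number(p.name for p in Path("infra/migrations").iterdir())`, with `Path` imported from `pathlib`.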