# Kubernetes Architecture — Stonks Oracle
This document describes the Kubernetes deployment topology for Stonks Oracle, derived from the Helm chart at `infra/helm/stonks-oracle/`.
All application workloads deploy to the `stonks-oracle` namespace. External cluster services (PostgreSQL, Redis, MinIO, Ollama) run in their own namespaces and are referenced via cross-namespace DNS.
## Deployment Diagram
```mermaid
graph TB
%% ── External traffic ──────────────────────────────────────────
internet((Internet))
subgraph traefik ["kube-system · Traefik Ingress Controller"]
direction LR
ing_dash["stonks.celestium.life"]
ing_api["stonks-api.celestium.life"]
ing_reg["stonks-registry.celestium.life"]
ing_trade["stonks-trading.celestium.life"]
ing_superset["stonks-dash.celestium.life"]
ing_trino["stonks-trino.celestium.life"]
end
internet --> traefik
%% ── stonks-oracle namespace ───────────────────────────────────
subgraph ns ["stonks-oracle namespace"]
direction TB
%% ── API Tier (ingress-facing) ─────────────────────────────
subgraph api_tier ["API Tier · tier: api"]
direction LR
query_api["query-api<br/><i>Deployment · 1 replica</i><br/>:8000<br/><i>readiness: /docs</i>"]
symbol_registry["symbol-registry<br/><i>Deployment · 1 replica</i><br/>:8000<br/><i>readiness: /docs · liveness: /docs</i>"]
end
%% ── Frontend Tier ─────────────────────────────────────────
subgraph frontend_tier ["Frontend Tier · tier: frontend"]
dashboard["dashboard<br/><i>Deployment · 1 replica</i><br/>:8080<br/><i>nginx-unprivileged</i><br/><i>readiness: / · liveness: /</i>"]
end
%% ── Trading Tier ──────────────────────────────────────────
subgraph trading_tier ["Trading Tier · tier: trading"]
direction LR
trading_engine["trading-engine<br/><i>Deployment · 1 replica</i><br/>:8000<br/><i>readiness: /ready · liveness: /health</i>"]
risk_engine["risk-engine<br/><i>Deployment · 1 replica</i><br/>:8000"]
broker_adapter["broker-adapter<br/><i>Deployment · 1 replica</i><br/><i>queue-driven worker · pipeline-gated</i>"]
end
%% ── Orchestration Tier ────────────────────────────────────
subgraph orchestration_tier ["Orchestration Tier · tier: orchestration"]
scheduler["scheduler<br/><i>Deployment · 1 replica · pipeline-gated</i><br/><i>init: migrations → seed → backfill</i>"]
end
%% ── Ingestion Tier ────────────────────────────────────────
subgraph ingestion_tier ["Ingestion Tier · tier: ingestion"]
ingestion["ingestion<br/><i>Deployment · 1 replica · pipeline-gated</i><br/><i>queue-driven worker</i>"]
end
%% ── Processing Tier (pipeline workers) ────────────────────
subgraph processing_tier ["Processing Tier · tier: processing"]
direction LR
parser["parser<br/><i>Deployment · 2 replicas · pipeline-gated</i>"]
extractor["extractor<br/><i>Deployment · 1 replica · pipeline-gated</i>"]
aggregation["aggregation<br/><i>Deployment · 4 replicas · pipeline-gated</i>"]
recommendation["recommendation<br/><i>Deployment · 1 replica · pipeline-gated</i>"]
end
%% ── Analytics Tier ────────────────────────────────────────
subgraph analytics_tier ["Analytics Tier · tier: analytics"]
direction LR
lake_publisher["lake-publisher<br/><i>Deployment · 1 replica · pipeline-gated</i><br/><i>queue-driven worker</i>"]
hive_metastore["hive-metastore<br/><i>Deployment · 1 replica</i><br/>:9083<br/><i>apache/hive:4.0.0</i><br/><i>PVC: hive-metastore-data</i>"]
trino["trino<br/><i>Deployment · 1 replica</i><br/>:8080<br/><i>trinodb/trino:latest</i><br/><i>readiness: /v1/info</i>"]
end
%% ── Superset (tier: dashboard in template) ────────────────
subgraph superset_block ["Superset · tier: dashboard"]
superset["superset<br/><i>Deployment · 1 replica</i><br/>:8088<br/><i>custom image</i><br/><i>PVC: superset-data</i><br/><i>readiness: /health</i>"]
end
%% ── Helm Secrets ──────────────────────────────────────────
subgraph secrets_block ["Helm-Managed Secrets"]
direction LR
sec_core["stonks-core-secrets<br/><i>POSTGRES_PASSWORD</i><br/><i>MINIO_ACCESS_KEY</i><br/><i>MINIO_SECRET_KEY</i><br/><i>REDIS_PASSWORD</i>"]
sec_broker["stonks-broker-secrets<br/><i>BROKER_API_KEY</i><br/><i>BROKER_API_SECRET</i><br/><i>BROKER_BASE_URL</i>"]
sec_market["stonks-market-secrets<br/><i>MARKET_DATA_API_KEY</i>"]
sec_gmail["stonks-gmail-secrets<br/><i>GMAIL_SENDER</i><br/><i>GMAIL_RECIPIENT</i><br/><i>GMAIL_APP_PASSWORD</i>"]
sec_dashboard["stonks-dashboard-secrets<br/><i>SUPERSET_SECRET_KEY</i><br/><i>SUPERSET_ADMIN_PASSWORD</i>"]
end
%% ── ConfigMap ─────────────────────────────────────────────
configmap["stonks-config<br/><i>ConfigMap</i><br/><i>All env vars from values.yaml config block</i>"]
end
%% ── External Cluster Services ─────────────────────────────────
subgraph pg_ns ["postgresql-service namespace"]
postgres[("PostgreSQL<br/>postgresql-rw:5432")]
end
subgraph redis_ns ["redis-service namespace"]
redis[("Redis<br/>redis-master:6379")]
end
subgraph minio_ns ["minio-service namespace"]
minio[("MinIO<br/>minio:80")]
end
subgraph ollama_ns ["ollama-service namespace"]
ollama[("Ollama<br/>ollama:11434<br/><i>GPU: 4070 Ti Super 16GB</i>")]
end
%% ── Ingress Routes ────────────────────────────────────────────
ing_dash -->|":8080"| dashboard
ing_api -->|":8000"| query_api
ing_reg -->|":8000"| symbol_registry
ing_trade -->|":8000"| trading_engine
ing_superset -->|":8088"| superset
ing_trino -->|":8080"| trino
%% ── Dashboard → Backend APIs ──────────────────────────────────
dashboard -.->|"/api/ proxy"| query_api
dashboard -.->|"/registry/ proxy"| symbol_registry
dashboard -.->|"/risk/ proxy"| risk_engine
%% ── Pipeline data flow (via Redis queues) ─────────────────────
scheduler -->|"enqueue jobs"| redis
ingestion -->|"stonks:queue:parsing"| redis
parser -->|"stonks:queue:extraction"| redis
extractor -->|"stonks:queue:aggregation"| redis
aggregation -->|"stonks:queue:recommendation"| redis
recommendation -->|"stonks:queue:trading_decisions"| redis
trading_engine -->|"stonks:queue:broker_orders"| redis
broker_adapter -->|"read orders"| redis
lake_publisher -->|"read stonks:queue:lake_publish"| redis
%% ── External service connections ──────────────────────────────
scheduler --> postgres
scheduler --> redis
ingestion --> postgres
ingestion --> redis
ingestion --> minio
parser --> postgres
parser --> redis
extractor --> postgres
extractor --> redis
extractor --> ollama
aggregation --> postgres
aggregation --> redis
recommendation --> postgres
recommendation --> redis
trading_engine --> postgres
trading_engine --> redis
risk_engine --> postgres
broker_adapter --> postgres
broker_adapter --> redis
lake_publisher --> postgres
lake_publisher --> minio
query_api --> postgres
query_api --> redis
query_api --> minio
symbol_registry --> postgres
%% ── Analytics plane connections ───────────────────────────────
lake_publisher -->|"Parquet → s3a://stonks-lakehouse"| minio
hive_metastore -->|"s3a:// catalog"| minio
trino -->|"thrift://hive-metastore:9083"| hive_metastore
superset -->|"trino:8080"| trino
query_api -->|"trino:8080"| trino
superset --> postgres
superset --> redis
%% ── Trading tier external egress ──────────────────────────────
trading_engine -->|"HTTPS :443<br/>Alpaca API"| internet
trading_engine -->|"SMTP :587<br/>Gmail notifications"| internet
broker_adapter -->|"HTTPS :443<br/>Alpaca API"| internet
ingestion -->|"HTTPS :443<br/>Polygon.io / News APIs"| internet
%% ── Secret consumption ────────────────────────────────────────
sec_core -.-> query_api
sec_core -.-> symbol_registry
sec_core -.-> scheduler
sec_core -.-> ingestion
sec_core -.-> parser
sec_core -.-> extractor
sec_core -.-> aggregation
sec_core -.-> recommendation
sec_core -.-> trading_engine
sec_core -.-> risk_engine
sec_core -.-> broker_adapter
sec_core -.-> lake_publisher
sec_core -.-> hive_metastore
sec_core -.-> trino
sec_core -.-> superset
sec_broker -.-> ingestion
sec_broker -.-> trading_engine
sec_broker -.-> risk_engine
sec_broker -.-> broker_adapter
sec_market -.-> ingestion
sec_market -.-> query_api
sec_gmail -.-> trading_engine
sec_dashboard -.-> superset
configmap -.-> query_api
configmap -.-> symbol_registry
configmap -.-> scheduler
configmap -.-> ingestion
configmap -.-> parser
configmap -.-> extractor
configmap -.-> aggregation
configmap -.-> recommendation
configmap -.-> trading_engine
configmap -.-> risk_engine
configmap -.-> broker_adapter
configmap -.-> lake_publisher
configmap -.-> superset
%% ── Styles ────────────────────────────────────────────────────
classDef apiSvc fill:#4a90d9,stroke:#2c5f8a,color:#fff
classDef frontendSvc fill:#50c878,stroke:#2e7d46,color:#fff
classDef tradingSvc fill:#e8a838,stroke:#b07d1a,color:#fff
classDef processSvc fill:#9b59b6,stroke:#6c3483,color:#fff
classDef orchSvc fill:#1abc9c,stroke:#148f77,color:#fff
classDef ingestionSvc fill:#e67e22,stroke:#bf6516,color:#fff
classDef analyticsSvc fill:#e74c3c,stroke:#a93226,color:#fff
classDef supersetSvc fill:#c0392b,stroke:#96281b,color:#fff
classDef extSvc fill:#95a5a6,stroke:#717d7e,color:#fff
classDef secretSvc fill:#f5f5dc,stroke:#999,color:#333
classDef configSvc fill:#dfe6e9,stroke:#999,color:#333
class query_api,symbol_registry apiSvc
class dashboard frontendSvc
class trading_engine,risk_engine,broker_adapter tradingSvc
class scheduler orchSvc
class ingestion ingestionSvc
class parser,extractor,aggregation,recommendation processSvc
class lake_publisher,hive_metastore,trino analyticsSvc
class superset supersetSvc
class postgres,redis,minio,ollama extSvc
class sec_core,sec_broker,sec_market,sec_gmail,sec_dashboard secretSvc
class configmap configSvc
```
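The Redis edges above form a chain of queue hand-offs. The following is a minimal Python sketch that models that chain as data and traces a job's path through it. It assumes each stage consumes exactly one queue and pushes to at most one. The `stonks:queue:ingestion` input comes from the worker table below. The inputs for trading-engine (`stonks:queue:trading_decisions`) and broker-adapter (`stonks:queue:broker_orders`) are assumptions, because the diagram only shows who pushes to those queues. `STAGES` and `trace` are hypothetical names, not part of the codebase.

```python
from typing import Dict, List, Optional, Tuple

# stage -> (queue it consumes, queue it pushes to). Hypothetical model of the
# diagram; trading-engine / broker-adapter inputs are assumed, not documented.
STAGES: Dict[str, Tuple[str, Optional[str]]] = {
    "ingestion": ("stonks:queue:ingestion", "stonks:queue:parsing"),
    "parser": ("stonks:queue:parsing", "stonks:queue:extraction"),
    "extractor": ("stonks:queue:extraction", "stonks:queue:aggregation"),
    "aggregation": ("stonks:queue:aggregation", "stonks:queue:recommendation"),
    "recommendation": ("stonks:queue:recommendation", "stonks:queue:trading_decisions"),
    "trading-engine": ("stonks:queue:trading_decisions", "stonks:queue:broker_orders"),
    "broker-adapter": ("stonks:queue:broker_orders", None),  # terminal: calls Alpaca
}


def trace(start: str) -> List[str]:
    """Follow queue hand-offs from `start` until a stage pushes nowhere."""
    consumers = {inp: name for name, (inp, _) in STAGES.items()}
    path = [start]
    out = STAGES[start][1]
    while out is not None:
        nxt = consumers.get(out)
        if nxt is None or nxt in path:  # dangling queue, or guard against cycles
            break
        path.append(nxt)
        out = STAGES[nxt][1]
    return path


print(" -> ".join(trace("ingestion")))
```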
## Network Policy Boundaries
The Helm chart deploys a **default-deny-ingress** policy that blocks all inbound traffic to pods in the `stonks-oracle` namespace. Each service that needs inbound connections has an explicit allow policy:
```mermaid
graph LR
subgraph netpol ["Network Policies — stonks-oracle namespace"]
direction TB
deny["🔒 default-deny-ingress<br/><i>Blocks ALL ingress to all pods</i>"]
subgraph allows ["Explicit Allow Rules"]
direction TB
np_dash["allow-dashboard-ingress<br/>dashboard :8080<br/>← kube-system (Traefik)"]
np_api["allow-query-api-ingress<br/>query-api :8000<br/>← kube-system (Traefik)<br/>← dashboard pod"]
np_reg["allow-symbol-registry-ingress<br/>symbol-registry :8000<br/>← kube-system (Traefik)<br/>← dashboard pod"]
np_trade["allow-trading-engine-ingress<br/>trading-engine :8000<br/>← kube-system (Traefik)<br/>← query-api pod<br/>← dashboard pod<br/><i>Egress: PostgreSQL :5432,</i><br/><i>Redis :6379, HTTPS :443, SMTP :587</i>"]
np_risk["allow-risk-engine-ingress<br/>risk-engine :8000<br/>← broker-adapter pod<br/>← query-api pod<br/>← dashboard pod"]
np_superset["allow-superset-ingress<br/>superset :8088<br/>← kube-system (Traefik)"]
np_trino["allow-trino-ingress<br/>trino :8080<br/>← superset pod<br/>← query-api pod<br/>← kube-system (Traefik)"]
np_hive["allow-hive-metastore-ingress<br/>hive-metastore :9083<br/>← trino pod<br/>← lake-publisher pod"]
np_broker["deny-broker-adapter-ingress<br/>broker-adapter<br/><i>No inbound traffic allowed</i>"]
end
end
style deny fill:#e74c3c,stroke:#c0392b,color:#fff
style np_broker fill:#e74c3c,stroke:#c0392b,color:#fff
style np_dash fill:#2ecc71,stroke:#27ae60,color:#fff
style np_api fill:#2ecc71,stroke:#27ae60,color:#fff
style np_reg fill:#2ecc71,stroke:#27ae60,color:#fff
style np_trade fill:#f39c12,stroke:#d68910,color:#fff
style np_risk fill:#f39c12,stroke:#d68910,color:#fff
style np_superset fill:#2ecc71,stroke:#27ae60,color:#fff
style np_trino fill:#2ecc71,stroke:#27ae60,color:#fff
style np_hive fill:#3498db,stroke:#2980b9,color:#fff
```
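Under a default-deny policy, a connection is admitted only if some allow rule names both the target port and the caller. The following Python sketch encodes the rules drawn above as a lookup table and covers ingress only (the egress rules on trading-engine are not modeled). It is a reading aid, not how the chart renders its `NetworkPolicy` objects. `"traefik"` stands for pods in `kube-system`. `ALLOW` and `ingress_allowed` are hypothetical names.

```python
from typing import Dict, Set, Tuple

# target pod -> (allowed port, allowed callers). Pods with no entry
# (broker-adapter and all pipeline workers) fall back to default deny.
ALLOW: Dict[str, Tuple[int, Set[str]]] = {
    "dashboard": (8080, {"traefik"}),
    "query-api": (8000, {"traefik", "dashboard"}),
    "symbol-registry": (8000, {"traefik", "dashboard"}),
    "trading-engine": (8000, {"traefik", "query-api", "dashboard"}),
    "risk-engine": (8000, {"broker-adapter", "query-api", "dashboard"}),
    "superset": (8088, {"traefik"}),
    "trino": (8080, {"superset", "query-api", "traefik"}),
    "hive-metastore": (9083, {"trino", "lake-publisher"}),
}


def ingress_allowed(source: str, target: str, port: int) -> bool:
    """Return True if `source` may open a connection to `target` on `port`."""
    rule = ALLOW.get(target)
    if rule is None:  # no allow rule: default-deny-ingress applies
        return False
    allowed_port, sources = rule
    return port == allowed_port and source in sources


print(ingress_allowed("dashboard", "risk-engine", 8000))  # internal proxy path
print(ingress_allowed("traefik", "risk-engine", 8000))    # not exposed externally
```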
### Services Without Ingress Allow Rules (Pipeline Workers)
The following services have **no ingress allow policy**. They are background workers that only open outbound connections (to PostgreSQL, Redis, MinIO, Ollama, and, for ingestion, external data APIs) and never need to accept connections, so the default-deny-ingress policy blocks all inbound traffic to them:
| Service | Tier | Behavior |
|---------|------|----------|
| scheduler | orchestration | Polls DB, enqueues to Redis. Runs migrations + seed + backfill as init containers |
| ingestion | ingestion | Reads from `stonks:queue:ingestion`, writes to DB/MinIO/Redis. Egress to Polygon.io/News APIs |
| parser | processing | Reads from `stonks:queue:parsing`, writes to DB/Redis |
| extractor | processing | Reads from `stonks:queue:extraction`, calls Ollama, writes to DB/Redis |
| aggregation | processing | Reads from `stonks:queue:aggregation`, writes to DB/Redis |
| recommendation | processing | Reads from `stonks:queue:recommendation`, writes to DB/Redis |
| lake-publisher | analytics | Reads from `stonks:queue:lake_publish`, writes Parquet to MinIO |
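The "reads from queue X, writes to queue Y" behavior in this table can be sketched as a small polling loop. This is a minimal Python sketch, not the services' actual worker code. It assumes jobs are JSON strings on Redis lists, consumed with a blocking left-pop and forwarded with a right-push. The `QueueClient` protocol matches the call shapes of redis-py's `Redis.blpop(keys, timeout)` and `Redis.rpush(name, *values)` (with `decode_responses=True` so values are `str`). `run_worker`, `FakeQueue`, and the 5-second timeout are hypothetical choices for illustration.

```python
import json
from collections import deque
from typing import Callable, Optional, Protocol


class QueueClient(Protocol):
    """The two Redis list calls a worker needs (same shapes as redis-py's)."""

    def blpop(self, keys: list, timeout: int = 0) -> Optional[tuple]: ...

    def rpush(self, name: str, *values: str) -> int: ...


def run_worker(client: QueueClient, in_queue: str, out_queue: Optional[str],
               handle: Callable[[dict], dict], max_jobs: Optional[int] = None) -> int:
    """Pop JSON jobs from in_queue, process them, push results to out_queue."""
    done = 0
    while max_jobs is None or done < max_jobs:
        item = client.blpop([in_queue], timeout=5)  # waits up to 5 s for a job
        if item is None:
            if max_jobs is not None:  # bounded run (e.g. tests): stop when drained
                break
            continue  # long-running worker: keep polling
        _, raw = item
        result = handle(json.loads(raw))
        if out_queue is not None:  # terminal stages (e.g. lake-publisher) push nothing
            client.rpush(out_queue, json.dumps(result))
        done += 1
    return done


class FakeQueue:
    """In-memory stand-in for Redis lists; never blocks. Local testing only."""

    def __init__(self) -> None:
        self.lists: dict = {}

    def blpop(self, keys, timeout=0):
        for key in keys:
            if self.lists.get(key):
                return key, self.lists[key].popleft()
        return None

    def rpush(self, name, *values):
        self.lists.setdefault(name, deque()).extend(values)
        return len(self.lists[name])


# Usage: one parser-style hop from the parsing queue to the extraction queue.
q = FakeQueue()
q.rpush("stonks:queue:parsing", json.dumps({"doc_id": 1}))
processed = run_worker(q, "stonks:queue:parsing", "stonks:queue:extraction",
                       lambda job: {**job, "parsed": True}, max_jobs=10)
print(processed, list(q.lists["stonks:queue:extraction"]))
```

Passing the client in as a parameter keeps the loop testable without a live Redis. In a real deployment it would be a `redis.Redis` instance pointed at the cross-namespace Redis service.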
## Service Tier Summary
| Tier | Services | Ingress? | Replicas | Pipeline-Gated? | Notes |
|------|----------|----------|----------|-----------------|-------|
| **api** | query-api, symbol-registry | Yes (Traefik) | 1 each | No | FastAPI, readiness probes on `/docs` |
| **frontend** | dashboard | Yes (Traefik) | 1 | No | nginx-unprivileged on :8080, proxies to API services |
| **trading** | trading-engine, risk-engine, broker-adapter | trading-engine: Yes; risk-engine: internal only; broker-adapter: denied | 1 each | broker-adapter only | trading-engine has egress to Alpaca + Gmail |
| **orchestration** | scheduler | No | 1 | Yes | Runs DB migrations + seed + backfill as init containers |
| **ingestion** | ingestion | No | 1 | Yes | Fetches from external APIs (Polygon.io, news, filings) |
| **processing** | parser, extractor, aggregation, recommendation | No | parser 2, extractor 1, aggregation 4, recommendation 1 | Yes | Queue-driven pipeline workers |
| **analytics** | lake-publisher, trino, hive-metastore | trino: Yes (Traefik); others: No | 1 each | lake-publisher only | trino + hive-metastore gated by `trino.enabled` / `hiveMetastore.enabled` |
| **dashboard** (Superset) | superset | Yes (Traefik) | 1 | No | Gated by `superset.enabled`, custom image with trino + psycopg2 drivers |
## Secret Consumption Map
| Secret | Keys | Consumers |
|--------|------|-----------|
| `stonks-core-secrets` | POSTGRES_PASSWORD, MINIO_ACCESS_KEY, MINIO_SECRET_KEY, REDIS_PASSWORD | All 12 backend app services (every app workload except dashboard) + hive-metastore (init), trino (init), superset |
| `stonks-broker-secrets` | BROKER_API_KEY, BROKER_API_SECRET, BROKER_BASE_URL | ingestion, trading-engine, risk-engine, broker-adapter |
| `stonks-market-secrets` | MARKET_DATA_API_KEY | ingestion, query-api |
| `stonks-gmail-secrets` | GMAIL_SENDER, GMAIL_RECIPIENT, GMAIL_APP_PASSWORD | trading-engine |
| `stonks-dashboard-secrets` | SUPERSET_SECRET_KEY, SUPERSET_ADMIN_PASSWORD | superset |
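Taken together with the ConfigMap edges in the deployment diagram, this map determines which configuration sources each pod receives. The following Python sketch derives a container's `envFrom` list from those edges. It assumes the chart injects whole Secrets and the `stonks-config` ConfigMap via `envFrom` rather than per-key `secretKeyRef` entries. The chart may do either. `SECRET_CONSUMERS`, `CONFIGMAP_CONSUMERS`, and `env_from` are hypothetical names. Kubernetes applies `envFrom` sources in order, and a later source wins on a duplicate key. The sketch therefore lists Secrets after the ConfigMap.

```python
from typing import Dict, List, Set

# Secret -> consuming workloads, copied from the deployment diagram's dashed edges.
SECRET_CONSUMERS: Dict[str, Set[str]] = {
    "stonks-core-secrets": {
        "query-api", "symbol-registry", "scheduler", "ingestion", "parser",
        "extractor", "aggregation", "recommendation", "trading-engine",
        "risk-engine", "broker-adapter", "lake-publisher",
        "hive-metastore", "trino", "superset",  # hive-metastore/trino: init only
    },
    "stonks-broker-secrets": {"ingestion", "trading-engine", "risk-engine", "broker-adapter"},
    "stonks-market-secrets": {"ingestion", "query-api"},
    "stonks-gmail-secrets": {"trading-engine"},
    "stonks-dashboard-secrets": {"superset"},
}

# Workloads the diagram connects to the stonks-config ConfigMap.
CONFIGMAP_CONSUMERS: Set[str] = {
    "query-api", "symbol-registry", "scheduler", "ingestion", "parser",
    "extractor", "aggregation", "recommendation", "trading-engine",
    "risk-engine", "broker-adapter", "lake-publisher", "superset",
}


def env_from(service: str) -> List[dict]:
    """Build a hypothetical container envFrom list for `service`."""
    refs: List[dict] = []
    if service in CONFIGMAP_CONSUMERS:
        refs.append({"configMapRef": {"name": "stonks-config"}})
    refs += [{"secretRef": {"name": name}}
             for name, consumers in SECRET_CONSUMERS.items() if service in consumers]
    return refs


print(env_from("trading-engine"))
```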
## Pipeline Toggle
Setting `pipelineEnabled: false` in `values.yaml` scales all services with `pipeline: true` to 0 replicas. This affects:
- scheduler, ingestion, parser, extractor, aggregation, recommendation, broker-adapter, lake-publisher
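API-tier services (query-api, symbol-registry), trading-tier services (trading-engine, risk-engine), the dashboard, and the analytics and Superset workloads (trino, hive-metastore, superset) are not affected by this toggle. They run whenever they are deployed, although trino, hive-metastore, and superset are still controlled by their own `trino.enabled`, `hiveMetastore.enabled`, and `superset.enabled` flags.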
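The toggle rule is: a service renders with zero replicas only when it is pipeline-gated *and* `pipelineEnabled` is false. The following is a minimal Python sketch of that rule. The nested `services.<name>.replicas` / `services.<name>.pipeline` layout of the values dict is an assumption for illustration, not the chart's documented schema. The sketch also assumes `pipelineEnabled` defaults to true when unset.

```python
from typing import Dict


def effective_replicas(values: dict) -> Dict[str, int]:
    """Replica count each service would render with under the pipeline toggle."""
    pipeline_on = values.get("pipelineEnabled", True)  # assumed default: on
    out: Dict[str, int] = {}
    for name, svc in values["services"].items():
        gated = svc.get("pipeline", False)  # only `pipeline: true` services are gated
        out[name] = svc["replicas"] if (pipeline_on or not gated) else 0
    return out


# Hypothetical values layout, using replica counts from the tier summary.
values = {
    "pipelineEnabled": False,
    "services": {
        "parser": {"replicas": 2, "pipeline": True},
        "aggregation": {"replicas": 4, "pipeline": True},
        "query-api": {"replicas": 1},
        "trading-engine": {"replicas": 1},
    },
}
print(effective_replicas(values))
```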
## External Cluster Services
These services run outside the `stonks-oracle` namespace and are referenced via cross-namespace DNS:
| Service | Namespace | DNS | Port | Notes |
|---------|-----------|-----|------|-------|
| PostgreSQL | `postgresql-service` | `postgresql-rw.postgresql-service.svc.cluster.local` | 5432 | CloudNativePG managed |
| Redis | `redis-service` | `redis-master.redis-service.svc.cluster.local` | 6379 | Password in `stonks-core-secrets` |
| MinIO | `minio-service` | `minio.minio-service.svc.cluster.local` | 80 | S3-compatible object store |
| Ollama | `ollama-service` | `ollama.ollama-service.svc.cluster.local` | 11434 | LLM inference, GPU: 4070 Ti Super 16GB |
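Every DNS name in this table follows the `<service>.<namespace>.svc.cluster.local` pattern. The following Python sketch assembles connection URLs from that pattern and the secret-injected passwords. The `REDIS_PASSWORD` and `POSTGRES_PASSWORD` variable names come from `stonks-core-secrets`. The database user `stonks` and the use of Redis DB 0 for production are assumptions. Only the paper (DB 2, `stonks_paper`) and beta (DB 1, `stonks_beta`) values are documented under Deployment Stages. Passwords are percent-encoded so characters such as `@` or `:` cannot break the URL.

```python
import os
from urllib.parse import quote


def svc_dns(service: str, namespace: str) -> str:
    """Cluster-internal FQDN for a Service living in another namespace."""
    return f"{service}.{namespace}.svc.cluster.local"


def _secret(env_var: str) -> str:
    # Value injected from stonks-core-secrets; percent-encoded for URL safety.
    return quote(os.environ.get(env_var, ""), safe="")


def redis_url(db: int = 0) -> str:
    """redis:// URL for the shared Redis; db selects the stage (0 is assumed)."""
    password = _secret("REDIS_PASSWORD")
    auth = f":{password}@" if password else ""
    return f"redis://{auth}{svc_dns('redis-master', 'redis-service')}:6379/{db}"


def postgres_dsn(database: str, user: str = "stonks") -> str:
    """URI for the CloudNativePG read-write Service (user name is hypothetical)."""
    password = _secret("POSTGRES_PASSWORD")
    auth = f"{user}:{password}" if password else user
    host = svc_dns("postgresql-rw", "postgresql-service")
    return f"postgresql://{auth}@{host}:5432/{database}"


print(redis_url(db=2))               # paper stage uses Redis DB 2
print(postgres_dsn("stonks_paper"))  # paper stage uses its own database
```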
## Analytics Plane
The analytics stack runs within the `stonks-oracle` namespace:
1. **Lake Publisher** writes Parquet fact tables to MinIO at `s3a://stonks-lakehouse/warehouse`. Pipeline-gated — scales to 0 when `pipelineEnabled: false`.
2. **Hive Metastore** (Apache Hive 4.0.0) manages table metadata, backed by an embedded Derby database persisted on a PVC (`hive-metastore-data`). It connects to MinIO for S3A filesystem access. Gated by `hiveMetastore.enabled`.
3. **Trino** queries the lakehouse via Hive Metastore (`thrift://hive-metastore:9083`). It exposes two catalogs, `lakehouse` (Hive connector) and `iceberg` (Iceberg connector), both of which read data from MinIO. Gated by `trino.enabled`. Readiness probe on `/v1/info`.
4. **Superset** connects to Trino for lakehouse queries and to PostgreSQL for its metadata DB, and uses Redis for caching. It is exposed externally via Traefik ingress and gated by `superset.enabled`. It runs a custom image (`registry.celestium.life/stonks-oracle/superset:latest`) that bundles the trino and psycopg2 drivers, with a PVC (`superset-data`) for persistence.
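Superset registers databases by SQLAlchemy URI, and the trino driver uses the form `trino://<user>@<host>:<port>/<catalog>[/<schema>]`. The following Python sketch builds that URI for the in-cluster Trino service at `trino:8080` with one of the two catalogs above. The user name `superset` and the schema name `marts` are hypothetical. Password authentication is omitted because Trino only accepts passwords over HTTPS, and the in-cluster endpoint shown here is plain HTTP.

```python
from typing import Optional
from urllib.parse import quote


def trino_sqlalchemy_uri(user: str, catalog: str, schema: Optional[str] = None,
                         host: str = "trino", port: int = 8080) -> str:
    """SQLAlchemy URI for Superset's Trino connection (no password over HTTP)."""
    uri = f"trino://{quote(user, safe='')}@{host}:{port}/{catalog}"
    return f"{uri}/{schema}" if schema else uri


print(trino_sqlalchemy_uri("superset", "lakehouse"))
print(trino_sqlalchemy_uri("superset", "iceberg", schema="marts"))
```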
## Ingress Routes
All ingress resources use the `traefik` IngressClass with TLS certificates issued by the `ca-issuer` ClusterIssuer:
| Domain | Backend Service | Port | TLS Secret |
|--------|----------------|------|------------|
| `stonks.celestium.life` | dashboard | 8080 | `stonks-dashboard-tls` |
| `stonks-api.celestium.life` | query-api | 8000 | `stonks-api-tls` |
| `stonks-registry.celestium.life` | symbol-registry | 8000 | `stonks-registry-tls` |
| `stonks-trading.celestium.life` | trading-engine | 8000 | `stonks-trading-tls` |
| `stonks-dash.celestium.life` | superset | 8088 | `stonks-dash-tls` |
| `stonks-trino.celestium.life` | trino | 8080 | `stonks-trino-tls` |
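Each row in this table corresponds to one `networking.k8s.io/v1` Ingress. The following Python sketch renders such an object as a plain dict. It assumes the `ca-issuer` ClusterIssuer is a cert-manager issuer requested through the `cert-manager.io/cluster-issuer` annotation, and that each host routes `/` to the backend with a `Prefix` match. The `<service>-ingress` object name and the bare Service names are hypothetical. The chart's actual templates may name and annotate things differently.

```python
from typing import List, Tuple

# (host, backend Service, port, TLS Secret), copied from the table above.
ROUTES: List[Tuple[str, str, int, str]] = [
    ("stonks.celestium.life", "dashboard", 8080, "stonks-dashboard-tls"),
    ("stonks-api.celestium.life", "query-api", 8000, "stonks-api-tls"),
    ("stonks-registry.celestium.life", "symbol-registry", 8000, "stonks-registry-tls"),
    ("stonks-trading.celestium.life", "trading-engine", 8000, "stonks-trading-tls"),
    ("stonks-dash.celestium.life", "superset", 8088, "stonks-dash-tls"),
    ("stonks-trino.celestium.life", "trino", 8080, "stonks-trino-tls"),
]


def ingress_manifest(host: str, service: str, port: int, tls_secret: str,
                     namespace: str = "stonks-oracle") -> dict:
    """Render one hypothetical Ingress object for a route in the table."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {
            "name": f"{service}-ingress",  # hypothetical naming scheme
            "namespace": namespace,
            "annotations": {"cert-manager.io/cluster-issuer": "ca-issuer"},
        },
        "spec": {
            "ingressClassName": "traefik",
            "tls": [{"hosts": [host], "secretName": tls_secret}],
            "rules": [{
                "host": host,
                "http": {"paths": [{
                    "path": "/",
                    "pathType": "Prefix",
                    "backend": {"service": {"name": service, "port": {"number": port}}},
                }]},
            }],
        },
    }


manifests = [ingress_manifest(*route) for route in ROUTES]
print(len(manifests), manifests[0]["spec"]["tls"])
```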
## Deployment Stages
The Helm chart supports multiple deployment stages via value override files:
| Stage | Override File | Namespace | Key Differences |
|-------|--------------|-----------|-----------------|
| **Production** | `values.yaml` (base) | `stonks-oracle` | Full analytics stack, all services |
| **Paper** | `values-paper.yaml` | `stonks-oracle` | `BROKER_MODE=paper`, `DEPLOY_STAGE=paper`, separate DB (`stonks_paper`), Redis DB 2, paper-specific ingress hostnames |
| **Beta** | `values-beta.yaml` | `stonks-oracle-beta` | `DEPLOY_STAGE=beta`, `LOG_LEVEL=DEBUG`, separate DB (`stonks_beta`), Redis DB 1, analytics stack disabled, beta-specific ingress hostnames |
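Helm starts from the chart's own `values.yaml` and overlays each `-f`/`--values` file in order. Nested maps are merged key by key, while scalars and lists are replaced outright. The following Python sketch reproduces that overlay rule so a stage's effective values can be checked offline. The keys in `base` and `beta` are hypothetical stand-ins (for example, `REDIS_DB`, the `INFO` base log level, and the `production` stage name are not documented here). Only the beta differences named in the table (`DEPLOY_STAGE=beta`, `LOG_LEVEL=DEBUG`, Redis DB 1, analytics disabled) are reflected.

```python
import copy


def merge_values(base: dict, override: dict) -> dict:
    """Overlay `override` onto `base`: maps merge recursively, everything
    else (scalars, lists) is replaced. Neither input is mutated."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical stand-ins for values.yaml and values-beta.yaml.
base = {
    "pipelineEnabled": True,
    "config": {"LOG_LEVEL": "INFO", "DEPLOY_STAGE": "production", "REDIS_DB": "0"},
    "trino": {"enabled": True},
}
beta = {
    "config": {"LOG_LEVEL": "DEBUG", "DEPLOY_STAGE": "beta", "REDIS_DB": "1"},
    "trino": {"enabled": False},  # analytics stack disabled in beta
}
print(merge_values(base, beta))
```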