feat: comprehensive docs, unit tests, docker-compose app services

- Add scheduler and ingestion unit tests (test_scheduler_unit.py, test_ingestion_unit.py)
- Add all 13 app services + dashboard to docker-compose.yml
- Add full documentation suite: API reference, Helm reference, Docker deployment guide,
  3 architecture diagrams (K8s, Docker Compose, data pipeline), AI agent guide,
  backup/restore guide, observability/metrics reference, per-service docs
- Add intelligence pipeline deep-dive docs with Mermaid diagrams
- Update README with documentation index and links
- Add specs for comprehensive-quality-docs, intelligence-pipeline-deep-dive,
  sanitized-pipeline-docs
Celes Renata
2026-04-22 02:56:41 +00:00
parent f251c53f92
commit 88ad1e8d99
57 changed files with 13318 additions and 51 deletions
@@ -0,0 +1,659 @@
# Helm Chart Configuration Reference
Complete reference for the Stonks Oracle Helm chart at `infra/helm/stonks-oracle/`.
| | |
|---|---|
| **Chart name** | `stonks-oracle` |
| **Chart version** | `0.1.0` |
| **App version** | `1.0.0` |
| **Chart type** | `application` |
Install with:
```bash
helm upgrade --install stonks-oracle infra/helm/stonks-oracle -n stonks-oracle
```
Override values per stage:
```bash
# Beta
helm upgrade --install stonks-oracle infra/helm/stonks-oracle \
  -n stonks-oracle-beta -f infra/helm/stonks-oracle/values-beta.yaml

# Paper trading
helm upgrade --install stonks-oracle infra/helm/stonks-oracle \
  -n stonks-oracle -f infra/helm/stonks-oracle/values-paper.yaml
```
---
## Table of Contents
- [image — Global Image Settings](#image--global-image-settings)
- [pipelineEnabled — Pipeline Toggle](#pipelineenabled--pipeline-toggle)
- [services — Service Deployments](#services--service-deployments)
- [config — ConfigMap Environment Variables](#config--configmap-environment-variables)
- [secrets — Kubernetes Secrets](#secrets--kubernetes-secrets)
- [ingress — Ingress Configuration](#ingress--ingress-configuration)
- [Analytics Stack — Trino, Hive Metastore, Superset](#analytics-stack--trino-hive-metastore-superset)
- [networkPolicies — Network Policy Configuration](#networkpolicies--network-policy-configuration)
- [Value Override Files](#value-override-files)
---
## `image` — Global Image Settings
Controls the container image registry, pull policy, and tag for all service deployments. Each service image is resolved as `{registry}/{service.image}:{tag}`.
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `image.registry` | string | `registry.celestium.life/stonks-oracle` | Container registry prefix. Each service appends its `image` name to this. |
| `image.pullPolicy` | string | `Always` | Kubernetes `imagePullPolicy`. Use `Always` for latest-tag workflows. |
| `image.tag` | string | `latest` | Image tag applied to all services. CI overrides this with the Git SHA via `--set image.tag=<sha>`. |
Example override:
```bash
helm upgrade --install stonks-oracle infra/helm/stonks-oracle \
  --set image.tag=abc1234
```
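
The resolution rule `{registry}/{service.image}:{tag}` can be sketched in Python, the language the services themselves are written in. This is an illustration of the string composition only, not the chart's template code; the function name `resolve_image` is hypothetical.

```python
def resolve_image(registry: str, image: str, tag: str) -> str:
    """Compose a full image reference as {registry}/{image}:{tag}."""
    return f"{registry}/{image}:{tag}"


# Defaults from the table above, with the tag overridden as CI would do.
print(resolve_image("registry.celestium.life/stonks-oracle", "scheduler", "abc1234"))
# prints registry.celestium.life/stonks-oracle/scheduler:abc1234
```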
---
## `pipelineEnabled` — Pipeline Toggle
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `pipelineEnabled` | bool | `true` | Master toggle for the data pipeline. |
When `false`, all services with `pipeline: true` in their definition are scaled to **0 replicas**. API-tier and trading-tier services continue running normally.
**Affected services** (scaled to 0 when disabled): scheduler, ingestion, parser, extractor, aggregation, recommendation, broker-adapter, lake-publisher.
**Unaffected services** (always run): symbol-registry, query-api, trading-engine, risk-engine, dashboard.
The replica count logic in the deployment template:
```yaml
replicas: {{ if and (hasKey $svc "pipeline") $svc.pipeline (not .Values.pipelineEnabled) }}0{{ else }}{{ $svc.replicas }}{{ end }}
```
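
For readers less familiar with Go templates, the same decision can be restated in Python. This is a minimal sketch that mirrors the condition above; `effective_replicas` is a hypothetical name, and the dictionaries stand in for entries under `services`.

```python
def effective_replicas(svc: dict, pipeline_enabled: bool) -> int:
    """Mirror the template: pipeline services drop to 0 when the toggle is off."""
    # A missing "pipeline" key behaves like pipeline: false (the hasKey check).
    if svc.get("pipeline", False) and not pipeline_enabled:
        return 0
    return svc["replicas"]


ingestion = {"replicas": 2, "pipeline": True}
query_api = {"replicas": 1}

print(effective_replicas(ingestion, pipeline_enabled=False))  # prints 0
print(effective_replicas(query_api, pipeline_enabled=False))  # prints 1
```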
---
## `services` — Service Deployments
Each key under `services` defines a Kubernetes Deployment. The deployments template iterates over all entries and creates a Deployment + optional Service for each.
### Per-Service Structure
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `replicas` | int | yes | Number of pod replicas. Set to 0 by `pipelineEnabled: false` for pipeline services. |
| `image` | string | yes | Image name appended to `image.registry`. Also used as the Deployment name and pod label (`app: <image>`). |
| `command` | string | no | Shell command passed as `["sh", "-c", "<command>"]`. Omit for images with a built-in entrypoint (e.g., dashboard/nginx). |
| `tier` | string | yes | Service tier label (`stonks-oracle/tier`). One of: `api`, `frontend`, `processing`, `trading`, `orchestration`, `analytics`, `ingestion`. |
| `port` | int | no | Container port. When set, a Kubernetes Service is created mapping `port → port`. |
| `pipeline` | bool | no | If `true`, replicas are set to 0 when `pipelineEnabled` is `false`. |
| `secrets` | list(string) | no | List of Secret names to mount via `envFrom.secretRef`. |
| `resources` | object | yes | Kubernetes resource requests and limits (`cpu`, `memory`). |
| `probes.readiness` | object | no | HTTP readiness probe: `path`, `port`, `initialDelay`, `period`. |
| `probes.liveness` | object | no | HTTP liveness probe: `path`, `port`, `initialDelay`, `period`. |
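
As a rough illustration of how these fields combine, the sketch below derives the main rendered properties of one service entry in Python. It follows the field descriptions in the table only; `summarize_service` is a hypothetical helper, and the real template may set more than is shown here.

```python
def summarize_service(registry: str, tag: str, svc: dict) -> dict:
    """Derive name, image, command, labels, and Service creation from one entry."""
    command = svc.get("command")
    return {
        # The image name doubles as the Deployment name and the app label.
        "name": svc["image"],
        "image": f"{registry}/{svc['image']}:{tag}",
        # A command is wrapped in a shell; without one the image entrypoint runs.
        "command": ["sh", "-c", command] if command else None,
        "labels": {"app": svc["image"], "stonks-oracle/tier": svc["tier"]},
        # A Kubernetes Service is created only when a port is set.
        "creates_service": svc.get("port") is not None,
    }


symbol_registry = {
    "replicas": 1,
    "image": "symbol-registry",
    "command": "uvicorn services.symbol_registry.app:app --host 0.0.0.0 --port 8000",
    "tier": "api",
    "port": 8000,
}
print(summarize_service("registry.celestium.life/stonks-oracle", "latest", symbol_registry))
```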
### Service Definitions
#### scheduler
| Field | Value |
|-------|-------|
| `replicas` | `1` |
| `pipeline` | `true` |
| `image` | `scheduler` |
| `command` | `python -m services.scheduler.app` |
| `tier` | `orchestration` |
| `port` | — |
| `secrets` | `stonks-core-secrets` |
| `resources.requests` | cpu: 50m, memory: 64Mi |
| `resources.limits` | cpu: 200m, memory: 128Mi |
| `probes` | — |
The scheduler deployment has two init containers (not configurable via values):
1. **run-migrations** — applies all SQL files from `infra/migrations/*.sql` in sorted order.
2. **seed-if-empty** — runs `python -m services.symbol_registry.seed` if the `companies` table is empty.
#### symbolRegistry
| Field | Value |
|-------|-------|
| `replicas` | `1` |
| `image` | `symbol-registry` |
| `command` | `uvicorn services.symbol_registry.app:app --host 0.0.0.0 --port 8000` |
| `tier` | `api` |
| `port` | `8000` |
| `secrets` | `stonks-core-secrets` |
| `resources.requests` | cpu: 100m, memory: 128Mi |
| `resources.limits` | cpu: 500m, memory: 256Mi |
| `probes.readiness` | path: `/docs`, port: 8000, initialDelay: 5s, period: 10s |
| `probes.liveness` | path: `/docs`, port: 8000, initialDelay: 10s, period: 30s |
#### ingestion
| Field | Value |
|-------|-------|
| `replicas` | `2` |
| `pipeline` | `true` |
| `image` | `ingestion` |
| `command` | `python -m services.ingestion.worker` |
| `tier` | `ingestion` |
| `port` | — |
| `secrets` | `stonks-core-secrets`, `stonks-market-secrets`, `stonks-broker-secrets` |
| `resources.requests` | cpu: 100m, memory: 128Mi |
| `resources.limits` | cpu: 500m, memory: 256Mi |
#### parser
| Field | Value |
|-------|-------|
| `replicas` | `2` |
| `pipeline` | `true` |
| `image` | `parser` |
| `command` | `python -m services.parser.worker` |
| `tier` | `processing` |
| `port` | — |
| `secrets` | `stonks-core-secrets` |
| `resources.requests` | cpu: 100m, memory: 128Mi |
| `resources.limits` | cpu: 500m, memory: 256Mi |
#### extractor
| Field | Value |
|-------|-------|
| `replicas` | `1` |
| `pipeline` | `true` |
| `image` | `extractor` |
| `command` | `python -m services.extractor.main` |
| `tier` | `processing` |
| `port` | — |
| `secrets` | `stonks-core-secrets` |
| `resources.requests` | cpu: 200m, memory: 256Mi |
| `resources.limits` | cpu: 1, memory: 512Mi |
Single replica is recommended — the extractor is bottlenecked by the shared Ollama GPU.
#### aggregation
| Field | Value |
|-------|-------|
| `replicas` | `4` |
| `pipeline` | `true` |
| `image` | `aggregation` |
| `command` | `python -m services.aggregation.main` |
| `tier` | `processing` |
| `port` | — |
| `secrets` | `stonks-core-secrets` |
| `resources.requests` | cpu: 100m, memory: 128Mi |
| `resources.limits` | cpu: 500m, memory: 256Mi |
#### recommendation
| Field | Value |
|-------|-------|
| `replicas` | `1` |
| `pipeline` | `true` |
| `image` | `recommendation` |
| `command` | `python -m services.recommendation.main` |
| `tier` | `processing` |
| `port` | — |
| `secrets` | `stonks-core-secrets` |
| `resources.requests` | cpu: 100m, memory: 128Mi |
| `resources.limits` | cpu: 500m, memory: 256Mi |
#### tradingEngine
| Field | Value |
|-------|-------|
| `replicas` | `1` |
| `image` | `trading-engine` |
| `command` | `uvicorn services.trading.app:app --host 0.0.0.0 --port 8000` |
| `tier` | `trading` |
| `port` | `8000` |
| `secrets` | `stonks-core-secrets`, `stonks-broker-secrets`, `stonks-gmail-secrets` |
| `resources.requests` | cpu: 100m, memory: 256Mi |
| `resources.limits` | cpu: 500m, memory: 512Mi |
| `probes.readiness` | path: `/ready`, port: 8000, initialDelay: 5s, period: 10s |
| `probes.liveness` | path: `/health`, port: 8000, initialDelay: 10s, period: 30s |
#### riskEngine
| Field | Value |
|-------|-------|
| `replicas` | `1` |
| `image` | `risk` |
| `command` | `uvicorn services.risk.app:app --host 0.0.0.0 --port 8000` |
| `tier` | `trading` |
| `port` | `8000` |
| `secrets` | `stonks-core-secrets`, `stonks-broker-secrets` |
| `resources.requests` | cpu: 100m, memory: 128Mi |
| `resources.limits` | cpu: 500m, memory: 256Mi |
#### brokerAdapter
| Field | Value |
|-------|-------|
| `replicas` | `1` |
| `pipeline` | `true` |
| `image` | `broker-adapter` |
| `command` | `python -m services.adapters.broker_service` |
| `tier` | `trading` |
| `port` | — |
| `secrets` | `stonks-core-secrets`, `stonks-broker-secrets` |
| `resources.requests` | cpu: 50m, memory: 64Mi |
| `resources.limits` | cpu: 200m, memory: 128Mi |
#### lakePublisher
| Field | Value |
|-------|-------|
| `replicas` | `1` |
| `pipeline` | `true` |
| `image` | `lake-publisher` |
| `command` | `python -m services.lake_publisher.jobs` |
| `tier` | `analytics` |
| `port` | — |
| `secrets` | `stonks-core-secrets` |
| `resources.requests` | cpu: 100m, memory: 128Mi |
| `resources.limits` | cpu: 500m, memory: 256Mi |
#### queryApi
| Field | Value |
|-------|-------|
| `replicas` | `1` |
| `image` | `query-api` |
| `command` | `uvicorn services.api.app:app --host 0.0.0.0 --port 8000` |
| `tier` | `api` |
| `port` | `8000` |
| `secrets` | `stonks-core-secrets` |
| `resources.requests` | cpu: 100m, memory: 128Mi |
| `resources.limits` | cpu: 500m, memory: 256Mi |
| `probes.readiness` | path: `/docs`, port: 8000, initialDelay: 5s, period: 10s |
#### dashboard
| Field | Value |
|-------|-------|
| `replicas` | `1` |
| `image` | `dashboard` |
| `command` | — (nginx built-in entrypoint) |
| `tier` | `frontend` |
| `port` | `8080` |
| `secrets` | — |
| `resources.requests` | cpu: 50m, memory: 64Mi |
| `resources.limits` | cpu: 200m, memory: 128Mi |
| `probes.readiness` | path: `/`, port: 8080, initialDelay: 3s, period: 10s |
| `probes.liveness` | path: `/`, port: 8080, initialDelay: 5s, period: 30s |
---
## `config` — ConfigMap Environment Variables
All keys under `config` are rendered into a Kubernetes ConfigMap named `stonks-config` and injected into every service pod via `envFrom.configMapRef`. Values are strings.
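
Because every value reaches the container as a string, a service has to convert types itself. The sketch below shows one way that could look; the helper names and the set of accepted truthy spellings are assumptions, not taken from the services' code.

```python
from typing import Mapping


def env_bool(env: Mapping[str, str], key: str, default: bool = False) -> bool:
    """Interpret a string flag such as TRADING_ENABLED."""
    raw = env.get(key, "")
    if raw == "":
        return default
    # Assumption: these spellings count as true; the real services may differ.
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def env_int(env: Mapping[str, str], key: str, default: int) -> int:
    raw = env.get(key, "")
    return int(raw) if raw != "" else default


def env_float(env: Mapping[str, str], key: str, default: float) -> float:
    raw = env.get(key, "")
    return float(raw) if raw != "" else default


# In a pod, `env` would be os.environ; a plain dict keeps the example runnable.
env = {"TRADING_ENABLED": "true", "OLLAMA_TIMEOUT": "240", "OLLAMA_RETRY_BASE_DELAY": "1.0"}
print(env_bool(env, "TRADING_ENABLED"), env_int(env, "OLLAMA_TIMEOUT", 60))
# prints True 240
```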
### Database
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `config.POSTGRES_HOST` | string | `postgresql-rw.postgresql-service.svc.cluster.local` | PostgreSQL hostname. Points to the CloudNativePG read-write service. |
| `config.POSTGRES_PORT` | string | `5432` | PostgreSQL port. |
| `config.POSTGRES_DB` | string | `stonks` | Database name. Override per stage (e.g., `stonks_beta`, `stonks_paper`). |
| `config.POSTGRES_USER` | string | `stonks` | Database user. Override per stage. |
| `config.REDIS_HOST` | string | `redis-master.redis-service.svc.cluster.local` | Redis hostname. |
| `config.REDIS_PORT` | string | `6379` | Redis port. |
| `config.REDIS_DB` | string | `0` | Redis database index. Use different indices per stage to isolate keys (beta: `1`, paper: `2`). |
### Object Storage
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `config.MINIO_ENDPOINT` | string | `minio.minio-service.svc.cluster.local:80` | MinIO API endpoint (host:port). |
| `config.MINIO_SECURE` | string | `false` | Use HTTPS for MinIO connections. Set to `true` if MinIO has TLS. |
### LLM / Ollama
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `config.OLLAMA_BASE_URL` | string | `""` (empty) | Ollama API base URL. Set to the cluster-internal or external Ollama endpoint. |
| `config.OLLAMA_MODEL` | string | `qwen3.5:9b-fast` | Default LLM model for extraction and classification agents. |
| `config.OLLAMA_TIMEOUT` | string | `240` | Request timeout in seconds for Ollama API calls. |
| `config.OLLAMA_MAX_RETRIES` | string | `2` | Maximum retry attempts for failed Ollama requests. |
| `config.OLLAMA_RETRY_BASE_DELAY` | string | `1.0` | Base delay in seconds for exponential backoff on Ollama retries. |
| `config.OLLAMA_RETRY_MAX_DELAY` | string | `10.0` | Maximum delay cap in seconds for Ollama retry backoff. |
| `config.OLLAMA_RETRY_BACKOFF_MULTIPLIER` | string | `2.0` | Multiplier for exponential backoff between Ollama retries. |
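
To see how the retry settings interact, the sketch below computes the wait before each retry under a common capped-exponential formula, `min(base * multiplier^attempt, max_delay)`. The formula is an assumption; the source does not show how the services actually combine these values, and `backoff_delays` is a hypothetical name.

```python
def backoff_delays(max_retries: int, base_delay: float,
                   max_delay: float, multiplier: float) -> list[float]:
    """Delay in seconds before each retry, assuming capped exponential backoff."""
    return [
        min(base_delay * multiplier ** attempt, max_delay)
        for attempt in range(max_retries)
    ]


# Chart defaults: 2 retries, base 1.0 s, cap 10.0 s, multiplier 2.0.
print(backoff_delays(2, 1.0, 10.0, 2.0))  # prints [1.0, 2.0]
# With more retries the cap takes effect.
print(backoff_delays(5, 1.0, 10.0, 2.0))  # prints [1.0, 2.0, 4.0, 8.0, 10.0]
```

Under this assumption the defaults add at most 3 seconds of waiting per request, so the 240-second `OLLAMA_TIMEOUT` dominates the worst-case latency.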
### Analytics / Trino
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `config.TRINO_HOST` | string | `trino.stonks-oracle.svc.cluster.local` | Trino coordinator hostname. |
| `config.TRINO_PORT` | string | `8080` | Trino coordinator port. |
| `config.TRINO_CATALOG` | string | `lakehouse` | Default Trino catalog for Hive-based queries. |
| `config.TRINO_SCHEMA` | string | `stonks` | Default Trino schema. |
| `config.TRINO_ICEBERG_CATALOG` | string | `iceberg` | Trino catalog for Iceberg table queries. |
### Broker / Trading
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `config.BROKER_MODE` | string | `paper` | Broker execution mode. `paper` for simulated trading, `live` for real orders. |
| `config.BROKER_PROVIDER` | string | `""` (empty) | Broker provider name (e.g., `alpaca`). |
| `config.MARKET_DATA_BASE_URL` | string | `""` (empty) | Market data API base URL (e.g., `https://api.polygon.io`). |
| `config.MARKET_DATA_PROVIDER` | string | `polygon` | Market data provider identifier. |
| `config.TRADING_ENABLED` | string | `true` | Master toggle for the trading engine. Set to `false` to disable order submission. |
| `config.TRADING_RISK_TIER` | string | `moderate` | Default risk tier for position sizing. Options: `conservative`, `moderate`, `aggressive`. |
| `config.TRADING_ABSOLUTE_POSITION_CAP` | string | `10000.0` | Maximum dollar value per position. |
| `config.TRADING_MAX_OPEN_POSITIONS` | string | `10` | Maximum number of concurrent open positions. |
### Data Retention
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `config.RETENTION_RAW_MARKET_DAYS` | string | `90` | Days to retain raw market data before cleanup. |
| `config.RETENTION_RAW_NEWS_DAYS` | string | `180` | Days to retain raw news articles. |
| `config.RETENTION_RAW_FILINGS_DAYS` | string | `365` | Days to retain raw SEC filings. |
| `config.RETENTION_NORMALIZED_DAYS` | string | `180` | Days to retain normalized/parsed documents. |
| `config.RETENTION_LLM_PROMPTS_DAYS` | string | `365` | Days to retain LLM prompt logs. |
| `config.RETENTION_LLM_RESULTS_DAYS` | string | `365` | Days to retain LLM extraction results. |
| `config.RETENTION_LAKEHOUSE_DAYS` | string | `730` | Days to retain lakehouse fact tables. |
| `config.RETENTION_AUDIT_DAYS` | string | `730` | Days to retain audit trail events. |
| `config.RETENTION_CLEANUP_INTERVAL_HOURS` | string | `24` | Hours between retention cleanup runs. |
| `config.RETENTION_BATCH_SIZE` | string | `1000` | Number of rows deleted per cleanup batch. |
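
The retention settings translate into a cutoff timestamp and a number of delete batches. The sketch below shows that arithmetic only; the function names are hypothetical, and how the cleanup job selects and deletes rows is not described in the source.

```python
import math
from datetime import datetime, timedelta, timezone


def retention_cutoff(now: datetime, retention_days: int) -> datetime:
    """Rows older than the returned timestamp are past their retention window."""
    return now - timedelta(days=retention_days)


def cleanup_batches(expired_rows: int, batch_size: int) -> int:
    """Number of delete batches needed to clear the expired rows."""
    return math.ceil(expired_rows / batch_size)


now = datetime(2026, 4, 22, tzinfo=timezone.utc)
print(retention_cutoff(now, 90).date())  # prints 2026-01-22
print(cleanup_batches(2500, 1000))       # prints 3
```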
### Logging and Deployment
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `config.LOG_LEVEL` | string | `INFO` | Python logging level. Options: `DEBUG`, `INFO`, `WARNING`, `ERROR`. |
| `config.JSON_LOGS` | string | `true` | Emit structured JSON logs when `true`. |
| `config.DEPLOY_STAGE` | string | `""` (empty) | Deployment stage identifier. Used to isolate Redis keys and MinIO buckets per stage (e.g., `beta`, `paper`). |
### Alerting
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `config.ALERT_SOURCE_FAILURE_THRESHOLD` | string | `3` | Number of consecutive source failures before firing an alert. |
| `config.ALERT_SOURCE_FAILURE_WINDOW_HOURS` | string | `6` | Time window (hours) for evaluating source failure count. |
| `config.ALERT_SCHEMA_FAILURE_RATE_THRESHOLD` | string | `0.3` | Schema validation failure rate (0.0 to 1.0) that triggers an alert. |
| `config.ALERT_SCHEMA_FAILURE_WINDOW_HOURS` | string | `1` | Time window (hours) for evaluating schema failure rate. |
| `config.ALERT_LAKE_LAG_THRESHOLD_MINUTES` | string | `60` | Minutes of lakehouse publish lag before alerting. |
| `config.ALERT_BROKER_ERROR_THRESHOLD` | string | `3` | Number of broker errors before firing an alert. |
| `config.ALERT_BROKER_ERROR_WINDOW_HOURS` | string | `1` | Time window (hours) for evaluating broker error count. |
| `config.ALERT_CHECK_INTERVAL_SECONDS` | string | `120` | Seconds between alert evaluation cycles. |
---
## `secrets` — Kubernetes Secrets
Secrets are rendered into five Kubernetes Secret objects. In the base `values.yaml`, secret values default to empty strings, except `GMAIL_SENDER` and `GMAIL_RECIPIENT`, which ship with a default address. Inject real values at deploy time using `--set` flags or a values override file.
### Secret Objects
| Secret Name | Values Key | Consumed By |
|-------------|-----------|-------------|
| `stonks-core-secrets` | `secrets.core` | All services except dashboard; also trino, hive-metastore, superset |
| `stonks-broker-secrets` | `secrets.broker` | ingestion, trading-engine, risk-engine, broker-adapter |
| `stonks-market-secrets` | `secrets.market` | ingestion |
| `stonks-gmail-secrets` | `secrets.gmail` | trading-engine |
| `stonks-dashboard-secrets` | `secrets.dashboard` | superset |
### `secrets.core`
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `POSTGRES_PASSWORD` | string | `""` | PostgreSQL password. |
| `MINIO_ACCESS_KEY` | string | `""` | MinIO access key (AWS-style). |
| `MINIO_SECRET_KEY` | string | `""` | MinIO secret key. |
| `REDIS_PASSWORD` | string | `""` | Redis authentication password. |
### `secrets.broker`
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `BROKER_API_KEY` | string | `""` | Broker API key (e.g., Alpaca paper trading key). |
| `BROKER_API_SECRET` | string | `""` | Broker API secret. |
| `BROKER_BASE_URL` | string | `""` | Broker API base URL (e.g., `https://paper-api.alpaca.markets`). |
### `secrets.market`
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `MARKET_DATA_API_KEY` | string | `""` | Market data provider API key (e.g., Polygon.io). |
### `secrets.gmail`
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `GMAIL_SENDER` | string | `celes@celestium.life` | Gmail sender address for trading notifications. |
| `GMAIL_RECIPIENT` | string | `celes@celestium.life` | Gmail recipient address for trading notifications. |
| `GMAIL_APP_PASSWORD` | string | `""` | Gmail app password for SMTP authentication. |
### `secrets.dashboard`
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `SUPERSET_SECRET_KEY` | string | `""` | Flask secret key for Superset session encryption. |
| `SUPERSET_ADMIN_PASSWORD` | string | `""` | Superset admin user password. |
### Injecting Secrets at Deploy Time
```bash
helm upgrade --install stonks-oracle infra/helm/stonks-oracle \
  -n stonks-oracle \
  --set secrets.core.POSTGRES_PASSWORD="<password>" \
  --set secrets.core.MINIO_ACCESS_KEY="<key>" \
  --set secrets.core.MINIO_SECRET_KEY="<secret>" \
  --set secrets.core.REDIS_PASSWORD="<password>" \
  --set secrets.broker.BROKER_API_KEY="<key>" \
  --set secrets.broker.BROKER_API_SECRET="<secret>" \
  --set secrets.broker.BROKER_BASE_URL="https://paper-api.alpaca.markets" \
  --set secrets.market.MARKET_DATA_API_KEY="<key>" \
  --set secrets.gmail.GMAIL_APP_PASSWORD="<password>" \
  --set secrets.dashboard.SUPERSET_SECRET_KEY="<key>" \
  --set secrets.dashboard.SUPERSET_ADMIN_PASSWORD="<password>"
```
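
When the secret values live in a secrets manager or CI variables, the same command can be assembled programmatically. The sketch below builds the argument list from a nested dictionary shaped like the `secrets` values key; `build_set_flags` and `helm_command` are hypothetical helpers, and values containing commas or other characters that `--set` treats specially would need escaping, which this sketch does not attempt.

```python
def build_set_flags(secrets: dict[str, dict[str, str]]) -> list[str]:
    """Turn {"core": {"POSTGRES_PASSWORD": "..."}} into --set arguments."""
    flags: list[str] = []
    for group, entries in secrets.items():
        for key, value in entries.items():
            if not value:
                continue  # leave unset secrets at the chart default
            flags.extend(["--set", f"secrets.{group}.{key}={value}"])
    return flags


def helm_command(secrets: dict[str, dict[str, str]]) -> list[str]:
    """Full argv for the upgrade, suitable for subprocess.run without a shell."""
    return [
        "helm", "upgrade", "--install", "stonks-oracle",
        "infra/helm/stonks-oracle", "-n", "stonks-oracle",
        *build_set_flags(secrets),
    ]


example = {"core": {"POSTGRES_PASSWORD": "pw", "REDIS_PASSWORD": ""}}
print(build_set_flags(example))
# prints ['--set', 'secrets.core.POSTGRES_PASSWORD=pw']
```

Passing the list to `subprocess.run` as an argv, rather than joining it into a shell string, avoids shell quoting problems with passwords.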
---
## `ingress` — Ingress Configuration
Controls Traefik Ingress resources with TLS via cert-manager.
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `ingress.enabled` | bool | `true` | Create Ingress resources. Set to `false` for port-forward-only access. |
| `ingress.className` | string | `traefik` | Kubernetes IngressClass name. |
| `ingress.clusterIssuer` | string | `ca-issuer` | cert-manager ClusterIssuer for TLS certificates. |
### Host Mappings
| Key | Default | Routes To | Port |
|-----|---------|-----------|------|
| `ingress.hosts.queryApi` | `stonks-api.celestium.life` | query-api Service | 8000 |
| `ingress.hosts.symbolRegistry` | `stonks-registry.celestium.life` | symbol-registry Service | 8000 |
| `ingress.hosts.dashboard` | `stonks.celestium.life` | dashboard Service | 8080 |
| `ingress.hosts.superset` | `stonks-dash.celestium.life` | superset Service | 8088 |
| `ingress.hosts.trino` | `stonks-trino.celestium.life` | trino Service | 8080 |
| `ingress.hosts.tradingEngine` | `stonks-trading.celestium.life` | trading-engine Service | 8000 |
Setting `superset` or `trino` host to an empty string (`""`) disables that Ingress resource (the template uses a conditional check).
Each Ingress resource gets a dedicated TLS secret (e.g., `stonks-api-tls`, `stonks-registry-tls`) automatically provisioned by cert-manager.
---
## Analytics Stack — Trino, Hive Metastore, Superset
The analytics stack provides SQL-based querying over the lakehouse data stored in MinIO. Each component can be independently enabled or disabled.
### `trino`
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `trino.enabled` | bool | `true` | Deploy the Trino coordinator. |
| `trino.resources.requests.cpu` | string | `500m` | CPU request. |
| `trino.resources.requests.memory` | string | `1Gi` | Memory request. |
| `trino.resources.limits.cpu` | string | `2` | CPU limit. |
| `trino.resources.limits.memory` | string | `4Gi` | Memory limit. |
When enabled, Trino deploys with two auto-configured catalogs:
- **`lakehouse`** — Hive connector for Parquet fact tables in MinIO.
- **`iceberg`** — Iceberg connector for Iceberg-format tables.
Both catalogs connect to the Hive Metastore for schema metadata and to MinIO for data via S3A. MinIO credentials are read from `stonks-core-secrets`.
### `hiveMetastore`
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `hiveMetastore.enabled` | bool | `true` | Deploy the Hive Metastore. |
| `hiveMetastore.storageSize` | string | `1Gi` | PersistentVolumeClaim size for the embedded Derby metastore database. |
| `hiveMetastore.resources.requests.cpu` | string | `200m` | CPU request. |
| `hiveMetastore.resources.requests.memory` | string | `512Mi` | Memory request. |
| `hiveMetastore.resources.limits.cpu` | string | `1` | CPU limit. |
| `hiveMetastore.resources.limits.memory` | string | `1Gi` | Memory limit. |
Uses `apache/hive:4.0.0` with an embedded Derby database. The Thrift metastore listens on port 9083. MinIO credentials are injected from `stonks-core-secrets` via an init container that generates `core-site.xml` and `metastore-site.xml`.
### `superset`
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `superset.enabled` | bool | `true` | Deploy Apache Superset. |
| `superset.storageSize` | string | `2Gi` | PersistentVolumeClaim size for Superset home directory. |
| `superset.resources.requests.cpu` | string | `200m` | CPU request. |
| `superset.resources.requests.memory` | string | `512Mi` | Memory request. |
| `superset.resources.limits.cpu` | string | `1` | CPU limit. |
| `superset.resources.limits.memory` | string | `2Gi` | Memory limit. |
Uses a custom image (`registry.celestium.life/stonks-oracle/superset`) with Trino and psycopg2 drivers pre-installed. Superset's metadata database is PostgreSQL (same cluster instance). Redis is used for caching. Credentials come from `stonks-core-secrets` and `stonks-dashboard-secrets`.
Superset listens on port 8088 with a readiness probe at `/health`.
### Disabling the Analytics Stack
To disable the entire analytics stack (e.g., in beta environments):
```yaml
trino:
  enabled: false
hiveMetastore:
  enabled: false
superset:
  enabled: false
```
---
## `networkPolicies` — Network Policy Configuration
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `networkPolicies.enabled` | bool | `true` | Deploy NetworkPolicy resources. |
When enabled, the chart creates a **default-deny-ingress** policy that blocks all inbound traffic to every pod in the namespace. Individual allow policies are then created for services that need ingress:
| Policy | Target Pod | Allowed Sources | Port |
|--------|-----------|-----------------|------|
| `allow-query-api-ingress` | `query-api` | kube-system (Traefik), dashboard | 8000 |
| `allow-symbol-registry-ingress` | `symbol-registry` | kube-system (Traefik), dashboard | 8000 |
| `allow-risk-engine-ingress` | `risk` | broker-adapter, query-api, dashboard | 8000 |
| `allow-trading-engine-ingress` | `trading-engine` | query-api, dashboard, kube-system (Traefik) | 8000 |
| `allow-superset-ingress` | `superset` | kube-system (Traefik) | 8088 |
| `allow-trino-ingress` | `trino` | superset, query-api, kube-system (Traefik) | 8080 |
| `allow-hive-metastore-ingress` | `hive-metastore` | trino, lake-publisher | 9083 |
| `allow-dashboard-ingress` | `dashboard` | kube-system (Traefik) | 8080 |
| `deny-broker-adapter-ingress` | `broker-adapter` | (none — explicit deny) | — |
The trading-engine also has egress rules allowing outbound connections to PostgreSQL (5432), Redis (6379), HTTPS (443), SMTP (587), and DNS (53).
Pipeline workers (scheduler, ingestion, parser, extractor, aggregation, recommendation, lake-publisher) have no explicit ingress allow policies — they rely on the default-deny and communicate only via outbound connections to Redis queues and PostgreSQL.
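
The combined effect of default-deny plus the allow policies can be modelled as a lookup. The sketch below encodes the table above in Python as a way to reason about which connections are admitted; it is a simplification (it treats the `kube-system` namespace as if it were a single source name) and `ingress_allowed` is a hypothetical helper, not part of the chart.

```python
# target pod -> (allowed sources, allowed port), copied from the policy table.
ALLOW = {
    "query-api": ({"kube-system", "dashboard"}, 8000),
    "symbol-registry": ({"kube-system", "dashboard"}, 8000),
    "risk": ({"broker-adapter", "query-api", "dashboard"}, 8000),
    "trading-engine": ({"query-api", "dashboard", "kube-system"}, 8000),
    "superset": ({"kube-system"}, 8088),
    "trino": ({"superset", "query-api", "kube-system"}, 8080),
    "hive-metastore": ({"trino", "lake-publisher"}, 9083),
    "dashboard": ({"kube-system"}, 8080),
}


def ingress_allowed(target: str, source: str, port: int) -> bool:
    """Default deny: traffic passes only if an allow rule matches."""
    rule = ALLOW.get(target)
    if rule is None:
        return False  # e.g. broker-adapter and the pipeline workers
    sources, allowed_port = rule
    return source in sources and port == allowed_port


print(ingress_allowed("query-api", "dashboard", 8000))      # prints True
print(ingress_allowed("broker-adapter", "ingestion", 8000)) # prints False
```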
---
## Value Override Files
The chart ships with two override files for staged deployments. ArgoCD or Kargo applies these during promotion.
### `values-beta.yaml` — Beta / Integration Testing
**Purpose**: Integration-testing environment deployed to the `stonks-oracle-beta` namespace. It shares infrastructure with paper but uses an isolated database (`stonks_beta`), a separate Redis DB index (`1`), and separate ingress hostnames.
Key overrides:
| Key | Beta Value | Reason |
|-----|-----------|--------|
| `pipelineEnabled` | `true` | Pipeline services are deployed so ArgoCD health checks pass, but the pipeline stays idle by default because `PIPELINE_DEFAULT_OFF` is `true`. |
| `config.DEPLOY_STAGE` | `beta` | Isolates Redis keys (`stonks:beta:*`) and MinIO buckets (`beta-stonks-*`). |
| `config.POSTGRES_DB` | `stonks_beta` | Separate database for beta data. |
| `config.REDIS_DB` | `1` | Separate Redis DB index. |
| `config.LOG_LEVEL` | `DEBUG` | Verbose logging for debugging. |
| `config.TRADING_ENABLED` | `false` | Safety net — no order submission in beta. |
| `config.PIPELINE_DEFAULT_OFF` | `true` | Scheduler won't enqueue jobs unless explicitly enabled. |
| `config.OLLAMA_MODEL` | `qwen3.6` | May use a different model version for testing. |
| `trino.enabled` | `false` | Analytics stack disabled in beta. |
| `hiveMetastore.enabled` | `false` | Analytics stack disabled in beta. |
| `superset.enabled` | `false` | Analytics stack disabled in beta. |
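
The `DEPLOY_STAGE` row implies a naming scheme for stage-isolated resources. The sketch below reproduces the two patterns given (`stonks:beta:*` and `beta-stonks-*`); the helper names and the example suffixes are hypothetical, and because the source does not say how an empty stage is rendered, the sketch rejects it rather than guessing.

```python
def redis_key(stage: str, suffix: str) -> str:
    """Stage-scoped Redis key following the stonks:<stage>:* pattern."""
    if not stage:
        raise ValueError("stage must be non-empty; unprefixed naming is not documented")
    return f"stonks:{stage}:{suffix}"


def bucket_name(stage: str, name: str) -> str:
    """Stage-scoped MinIO bucket following the <stage>-stonks-* pattern."""
    if not stage:
        raise ValueError("stage must be non-empty; unprefixed naming is not documented")
    return f"{stage}-stonks-{name}"


# "jobs" and "raw" are illustrative suffixes, not names taken from the project.
print(redis_key("beta", "jobs"))   # prints stonks:beta:jobs
print(bucket_name("beta", "raw"))  # prints beta-stonks-raw
```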
Beta ingress hostnames:
| Service | Hostname |
|---------|----------|
| Query API | `stonks-api-beta.celestium.life` |
| Symbol Registry | `stonks-registry-beta.celestium.life` |
| Dashboard | `stonks-beta.celestium.life` |
| Trading Engine | `stonks-trading-beta.celestium.life` |
| Superset | (disabled) |
| Trino | (disabled) |
### `values-paper.yaml` — Paper Trading
**Purpose**: Paper trading environment with real market data but simulated order execution via Alpaca's paper trading API. Deployed to the main `stonks-oracle` namespace.
Key overrides:
| Key | Paper Value | Reason |
|-----|-----------|--------|
| `config.BROKER_MODE` | `paper` | Simulated order execution. |
| `config.BROKER_PROVIDER` | `alpaca` | Alpaca paper trading API. |
| `config.TRADING_ENABLED` | `true` | Trading engine active. |
| `config.POSTGRES_DB` | `stonks_paper` | Separate database for paper trading data. |
| `config.POSTGRES_USER` | `stonks_paper` | Separate database user. |
| `config.REDIS_DB` | `2` | Separate Redis DB index. |
| `config.DEPLOY_STAGE` | `paper` | Stage identifier. |
| `config.LOG_LEVEL` | `INFO` | Standard logging. |
| `services.extractor.replicas` | `1` | Single replica (GPU bottleneck). |
Paper ingress hostnames:
| Service | Hostname |
|---------|----------|
| Query API | `stonks-paper-api.celestium.life` |
| Symbol Registry | `stonks-paper-registry.celestium.life` |
| Dashboard | `stonks-paper.celestium.life` |
| Superset | `stonks-paper-dash.celestium.life` |
| Trino | `stonks-paper-trino.celestium.life` |
| Trading Engine | `stonks-paper-trading.celestium.life` |
### Deployment Stage Progression
```
values-beta.yaml     values-paper.yaml     values.yaml (base)
Beta             →   Paper Trading     →   Production
Integration          Simulated orders      Live trading
testing              Real market data      Real orders
Pipeline OFF         Pipeline ON           Pipeline ON
Trading OFF          Trading ON            Trading ON
Analytics OFF        Analytics ON          Analytics ON
```
Promotion between stages is managed by Kargo/ArgoCD. CI sets the image tag, and the promotion pipeline applies the appropriate values file.
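
Applying a values file layers its keys over the base `values.yaml`: nested maps are merged key by key, and scalar values in the override win. The Python sketch below imitates that layering to show why a stage file only needs to list the keys it changes. It is an approximation (Helm's special handling of `null` overrides is not modelled) and `deep_merge` is a hypothetical helper.

```python
import copy


def deep_merge(base: dict, override: dict) -> dict:
    """Return base with override layered on top, merging nested maps."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


base = {
    "pipelineEnabled": True,
    "config": {"LOG_LEVEL": "INFO", "REDIS_DB": "0", "POSTGRES_DB": "stonks"},
    "trino": {"enabled": True},
}
beta = {
    "config": {"LOG_LEVEL": "DEBUG", "REDIS_DB": "1", "POSTGRES_DB": "stonks_beta"},
    "trino": {"enabled": False},
}
print(deep_merge(base, beta)["config"]["POSTGRES_DB"])  # prints stonks_beta
```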