Stonks Oracle — Operator Runbook
Cluster Access
kubectl config use-context <your-context>
# All stonks-oracle resources live in the stonks-oracle namespace
alias kso='kubectl -n stonks-oracle'
Service Overview
| Service | Type | Replicas | Notes |
|---|---|---|---|
| scheduler | CronJob-like worker | 1 | Polls sources on schedule |
| symbol-registry | FastAPI | 1 | Company/watchlist CRUD |
| ingestion | Queue worker | 2 | Fetches from adapters |
| parser | Queue worker | 2 | HTML→text extraction |
| extractor | Queue worker | 1 | LLM-based intelligence extraction |
| aggregation | Queue worker | 1 | Trend/signal aggregation |
| recommendation | Queue worker | 1 | Trade signal generation |
| risk | FastAPI | 1 | Risk evaluation + approval |
| broker-adapter | Queue worker | 1 | Paper/live order execution |
| lake-publisher | Queue worker | 1 | Iceberg table publication |
| query-api | FastAPI | 1 | Dashboard/analytics queries |
| trino | Analytics engine | 1 | SQL over lakehouse |
| superset | Dashboard | 1 | Visualization |
| hive-metastore | Metastore | 1 | Iceberg catalog backend |
Common Operations
Restart a service
kso rollout restart deployment/<service-name>
Check logs
kso logs deployment/<service-name> --tail=50 -f
# For previous crash:
kso logs <pod-name> --previous --tail=50
Scale a service
kso scale deployment/<service-name> --replicas=N
Redeploy with updated secrets
GHCR_TOKEN=$(cat /run/secrets/github_token)
helm upgrade --install stonks-oracle infra/helm/stonks-oracle \
--namespace stonks-oracle \
--set "ghcrAuth.password=$GHCR_TOKEN" \
--set 'secrets.core.POSTGRES_PASSWORD=<postgres-password>' \
--set 'secrets.core.MINIO_ACCESS_KEY=<minio-access-key>' \
--set 'secrets.core.MINIO_SECRET_KEY=<minio-secret-key>' \
--set 'secrets.core.REDIS_PASSWORD=<redis-password>'
# Then restart deployments to pick up secret changes:
for dep in $(kso get deployments -o name); do kso rollout restart "$dep"; done
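The restart loop above returns as soon as the restarts are requested, not when the new pods are ready. A minimal sketch, assuming it is acceptable to block until every rollout finishes, could wrap the same loop and wait with kubectl rollout status. The function name restart_all_and_wait and the 300s timeout are illustrative choices, not part of the existing tooling.

```shell
# The kso alias is spelled out because aliases are not expanded in scripts.
NS=stonks-oracle

# Restart every deployment in the namespace, then wait for each rollout.
# Returns non-zero if any restart or rollout fails.
# The 300s timeout is an assumed value; tune it to the slowest service.
restart_all_and_wait() {
  failed=0
  deps=$(kubectl -n "$NS" get deployments -o name) || return 1
  for dep in $deps; do
    kubectl -n "$NS" rollout restart "$dep" || failed=1
  done
  for dep in $deps; do
    kubectl -n "$NS" rollout status "$dep" --timeout=300s || failed=1
  done
  return "$failed"
}
```

Call restart_all_and_wait after the helm upgrade; a non-zero return means at least one deployment did not become ready within the timeout.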
Run database migrations
for f in infra/migrations/*.sql; do
kubectl exec -i -n postgresql-service postgresql-1 -c postgres -- psql -U postgres -d stonks < "$f"
done
Trading Mode Toggle
The current mode is set by the BROKER_MODE key in the stonks-config ConfigMap.
# Check current mode
kso get configmap stonks-config -o jsonpath='{.data.BROKER_MODE}'
# To switch modes, update values.yaml config.BROKER_MODE and helm upgrade,
# then restart broker-adapter and risk deployments.
Modes:
- paper: all orders go through paper trading simulation (default, safe)
- live: orders are submitted to the real broker API (requires operator approval workflow)
Never switch to live without:
- Confirming paper trading PnL is acceptable
- Verifying risk limits are configured in the risk_configuration table
- Enabling operator approval in the operator_approvals table
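The mode switch described in this section can be sketched as a single shell function. This is a hedged sketch, not the project's own tooling: it passes config.BROKER_MODE with --set and --reuse-values instead of editing values.yaml, and it assumes the deployments are named after the services (broker-adapter, risk). The function name switch_broker_mode is hypothetical.

```shell
NS=stonks-oracle

# Switch BROKER_MODE via Helm, then restart the two deployments that read it.
# Assumptions: --reuse-values keeps the release's existing values, and the
# deployments are named broker-adapter and risk.
switch_broker_mode() {
  mode="$1"
  case "$mode" in
    paper|live) ;;
    *) echo "usage: switch_broker_mode paper|live" >&2; return 2 ;;
  esac
  helm upgrade stonks-oracle infra/helm/stonks-oracle \
    --namespace "$NS" \
    --reuse-values \
    --set "config.BROKER_MODE=$mode" || return 1
  kubectl -n "$NS" rollout restart deployment/broker-adapter || return 1
  kubectl -n "$NS" rollout restart deployment/risk || return 1
  # Print the mode now stored in the ConfigMap as a final confirmation.
  kubectl -n "$NS" get configmap stonks-config -o jsonpath='{.data.BROKER_MODE}'
}
```

A value passed with --set is not written back to values.yaml, so a later upgrade without --reuse-values would revert it. Update values.yaml as well if the change should persist.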
Operator Approval for Live Trades
The risk engine requires explicit operator approval before executing live trades. Approvals are managed via the risk API:
# Check pending approvals
curl -s https://stonks-api.celestium.life/risk/approvals/pending
# Approve a recommendation
curl -X POST https://stonks-api.celestium.life/risk/approvals/<id>/approve
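For repeated use, the two calls above could be wrapped so that HTTP errors surface as a non-zero exit and an empty id is rejected before any request is sent. The sketch below assumes nothing about the response format, and the function names and the RISK_API variable are hypothetical.

```shell
# Base URL taken from the commands above; override RISK_API for other environments.
RISK_API="${RISK_API:-https://stonks-api.celestium.life/risk}"

# List pending approvals; -f makes curl exit non-zero on HTTP errors.
pending_approvals() {
  curl -fsS "$RISK_API/approvals/pending"
}

# Approve one recommendation by id; an empty id is refused so the
# request URL is never malformed.
approve_recommendation() {
  id="$1"
  if [ -z "$id" ]; then
    echo "usage: approve_recommendation <id>" >&2
    return 2
  fi
  curl -fsS -X POST "$RISK_API/approvals/$id/approve"
}
```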
Common Failure Modes
CrashLoopBackOff on workers
Queue workers (aggregation, extractor, recommendation, broker-adapter, lake-publisher) exit with code 0 when the queue is empty. Kubernetes restarts them, which is normal. They'll process work when messages arrive.
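To tell the expected idle exit from a real crash, the exit code of the previous container run can be read from the pod status. A minimal sketch, assuming the worker is the first container in the pod; the function names are hypothetical.

```shell
NS=stonks-oracle

# Print the exit code of the previous run of a pod's first container.
last_exit_code() {
  kubectl -n "$NS" get pod "$1" \
    -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
}

# Classify a restart: exit code 0 is the idle-queue exit described above;
# anything else warrants a look at the previous container's logs.
classify_restart() {
  code=$(last_exit_code "$1")
  if [ "$code" = "0" ]; then
    echo "$1: idle exit (code 0), expected"
  else
    echo "$1: exit code ${code:-unknown}, check: kso logs $1 --previous --tail=50"
  fi
}
```

Run classify_restart <pod-name> against any worker pod showing CrashLoopBackOff.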
PostgreSQL auth failure
Password mismatch between stonks-core-secrets.POSTGRES_PASSWORD and the actual DB user password. Fix:
kubectl exec -i -n postgresql-service postgresql-1 -c postgres -- psql -U postgres -d stonks <<'EOF'
ALTER USER stonks WITH PASSWORD '<new-password>';
EOF
Then set secrets.core.POSTGRES_PASSWORD to the same value with helm upgrade and restart the deployments.
Redis connection refused
Check Redis is running: kubectl get pods -n redis-service
If Redis master is down, restart it: kubectl rollout restart -n redis-service statefulset/redis-master
ImagePullBackOff
GHCR credentials expired or missing. Re-run helm upgrade with fresh ghcrAuth.password.
Superset won't start
Needs custom image with sqlalchemy-trino package. Stock apache/superset:latest doesn't include it.
Log Access
All services output JSON logs when JSON_LOGS=true (default).
# Stream all logs from a service
kso logs -f deployment/<service> --tail=100
# Search for errors across all pods
for pod in $(kso get pods -o name); do kso logs "$pod" --all-containers --prefix --tail=100; done | grep -i error
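A plain grep -i error also matches info-level lines that merely mention the word. Since the logs are JSON, a filter on the level field is more precise. The sketch below assumes each log line carries a "level" field with values such as error or critical; that schema is an assumption, so adjust the pattern to the actual field names.

```shell
# Keep only log lines whose JSON "level" field is error or critical
# (case-insensitive). The field name and level values are assumptions
# about the log schema, not confirmed by the services themselves.
filter_error_level() {
  grep -iE '"level"[[:space:]]*:[[:space:]]*"(error|critical)"'
}
```

Example: kso logs deployment/<service> --tail=500 | filter_error_level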
Ingress Endpoints
| URL | Service |
|---|---|
| https://stonks-api.celestium.life | Query API |
| https://stonks-registry.celestium.life | Symbol Registry |
| https://stonks-dash.celestium.life | Superset |
| https://stonks-trino.celestium.life | Trino |
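A quick reachability check over the endpoints in the table can be sketched as follows. It only reports the HTTP status of each base URL; no health-check path is assumed, so a 401, 404, or redirect still shows that the ingress is answering. The function name and the 10-second timeout are illustrative.

```shell
# Print the HTTP status code for each ingress endpoint in the table above.
# 000 means curl could not get a response at all; the 10s timeout is an assumed value.
check_endpoints() {
  for url in \
    https://stonks-api.celestium.life \
    https://stonks-registry.celestium.life \
    https://stonks-dash.celestium.life \
    https://stonks-trino.celestium.life
  do
    code=$(curl -s -o /dev/null -m 10 -w '%{http_code}' "$url")
    echo "$code $url"
  done
}
```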