# Stonks Oracle — Operator Runbook

## Cluster Access

```bash
kubectl config use-context <your-context>
# All stonks-oracle resources live in the stonks-oracle namespace
alias kso='kubectl -n stonks-oracle'
```

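Bash does not expand aliases in non-interactive shells by default, so the `kso` alias above only works at an interactive prompt. For scripts, a function with the same name can stand in for it. This is a minimal sketch that assumes nothing beyond the `stonks-oracle` namespace named above.

```bash
#!/usr/bin/env bash
# Drop any existing kso alias first; otherwise an interactive shell would
# expand it while parsing the function definition below.
unalias kso 2>/dev/null || true

# Script-safe equivalent of the kso alias: forwards all arguments to
# kubectl, scoped to the stonks-oracle namespace.
kso() {
  kubectl -n stonks-oracle "$@"
}
```

With this defined, `kso get pods` behaves the same in a script as the alias does interactively.
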
## Service Overview

| Service | Type | Replicas | Notes |
|---------|------|----------|-------|
| scheduler | CronJob-like worker | 1 | Polls sources on schedule |
| symbol-registry | FastAPI | 1 | Company/watchlist CRUD |
| ingestion | Queue worker | 2 | Fetches from adapters |
| parser | Queue worker | 2 | HTML→text extraction |
| extractor | Queue worker | 1 | LLM-based intelligence extraction |
| aggregation | Queue worker | 1 | Trend/signal aggregation |
| recommendation | Queue worker | 1 | Trade signal generation |
| risk | FastAPI | 1 | Risk evaluation + approval |
| broker-adapter | Queue worker | 1 | Paper/live order execution |
| lake-publisher | Queue worker | 1 | Iceberg table publication |
| query-api | FastAPI | 1 | Dashboard/analytics queries |
| trino | Analytics engine | 1 | SQL over lakehouse |
| superset | Dashboard | 1 | Visualization |
| hive-metastore | Metastore | 1 | Iceberg catalog backend |

## Common Operations

### Restart a service

```bash
kso rollout restart deployment/<service-name>
```

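`rollout restart` returns as soon as the restart is requested, not when the new pods are ready. A small helper that blocks until the rollout completes can make restarts easier to script. This is a sketch: the function name `restart_and_wait` and the 120-second default timeout are illustrative assumptions, not part of the deployment.

```bash
#!/usr/bin/env bash
# Hypothetical helper: restart one deployment in the stonks-oracle
# namespace and wait for the rollout to finish.
restart_and_wait() {
  local service="${1:-}"
  local timeout="${2:-120s}"   # assumed default, adjust as needed
  if [ -z "$service" ]; then
    echo "usage: restart_and_wait <service-name> [timeout]" >&2
    return 2
  fi
  # rollout status exits non-zero if the rollout does not complete in time.
  kubectl -n stonks-oracle rollout restart "deployment/$service" &&
    kubectl -n stonks-oracle rollout status "deployment/$service" --timeout="$timeout"
}
```

For example, `restart_and_wait parser 300s` restarts the parser workers and waits up to five minutes for them to become ready.
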
### Check logs

```bash
kso logs deployment/<service-name> --tail=50 -f
# Logs from the previous (crashed) container of a pod:
kso logs <pod-name> --previous --tail=50
```

### Scale a service

```bash
kso scale deployment/<service-name> --replicas=<N>
```

### Redeploy with updated secrets

```bash
GHCR_TOKEN=$(cat /run/secrets/github_token)
# Replace each <placeholder> with the real value.
# Do not commit credentials to this runbook.
helm upgrade --install stonks-oracle infra/helm/stonks-oracle \
  --namespace stonks-oracle \
  --set "ghcrAuth.password=$GHCR_TOKEN" \
  --set 'secrets.core.POSTGRES_PASSWORD=<postgres-password>' \
  --set 'secrets.core.MINIO_ACCESS_KEY=<minio-access-key>' \
  --set 'secrets.core.MINIO_SECRET_KEY=<minio-secret-key>' \
  --set 'secrets.core.REDIS_PASSWORD=<redis-password>'
# Then restart deployments to pick up secret changes:
for dep in $(kso get deployments -o name); do kso rollout restart "$dep"; done
```

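### Run database migrations

```bash
# Apply migrations in filename order; the glob expands in sorted order.
for f in infra/migrations/*.sql; do
  # ON_ERROR_STOP makes psql exit non-zero on the first failed statement.
  kubectl exec -i -n postgresql-service postgresql-1 -c postgres -- \
    psql -v ON_ERROR_STOP=1 -U postgres -d stonks < "$f" || break
done
```
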
## Trading Mode Toggle

The current mode is set via the `BROKER_MODE` key of the `stonks-config` ConfigMap.

```bash
# Check current mode
kso get configmap stonks-config -o jsonpath='{.data.BROKER_MODE}'

# To switch modes, update config.BROKER_MODE in values.yaml and run helm upgrade,
# then restart the broker-adapter and risk deployments.
```

**Modes:**
- `paper`: all orders go through the paper trading simulation (default, safe)
- `live`: orders are submitted to the real broker API (requires the operator approval workflow)

**Never switch to live without:**
1. Confirming paper trading PnL is acceptable
2. Verifying risk limits are configured in the `risk_configuration` table
3. Enabling operator approval in the `operator_approvals` table

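The switch can be wrapped in a helper that rejects unknown modes and asks for typed confirmation before going live. This is a sketch under assumptions: it overrides `config.BROKER_MODE` with `--set` and `--reuse-values` instead of editing `values.yaml`, it reuses the release name and chart path from the redeploy command, and the function name `set_broker_mode` is hypothetical. It does not perform the three pre-live checks listed above; those remain manual.

```bash
#!/usr/bin/env bash
# Hypothetical helper: switch BROKER_MODE and restart the two deployments
# that read it. Release name and chart path are assumed to match the
# redeploy command in this runbook.
set_broker_mode() {
  local mode="${1:-}"
  local answer
  case "$mode" in
    paper) ;;
    live)
      # Require typed confirmation before real orders can be sent.
      printf "Type 'live' to confirm switching to live trading: " >&2
      read -r answer
      [ "$answer" = "live" ] || { echo "aborted" >&2; return 1; }
      ;;
    *)
      echo "usage: set_broker_mode paper|live" >&2
      return 2
      ;;
  esac
  helm upgrade stonks-oracle infra/helm/stonks-oracle \
    --namespace stonks-oracle \
    --reuse-values \
    --set "config.BROKER_MODE=$mode" || return 1
  kubectl -n stonks-oracle rollout restart deployment/broker-adapter &&
    kubectl -n stonks-oracle rollout restart deployment/risk
}
```

Running `set_broker_mode paper` returns to the safe default without a prompt, which keeps the rollback path short.
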
## Operator Approval for Live Trades

The risk engine requires explicit operator approval before executing live trades.
Approvals are managed via the risk API:

```bash
# Check pending approvals
curl -s https://stonks-api.celestium.life/risk/approvals/pending

# Approve a recommendation
curl -X POST https://stonks-api.celestium.life/risk/approvals/<id>/approve
```

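Plain `curl` exits 0 even when the server answers with an HTTP error, so a failed approval can go unnoticed. The sketch below adds `--fail` so that errors surface as a non-zero exit code. The function names and the `RISK_API` variable are hypothetical, the endpoints are taken as-is from the commands above, and no authentication is shown because this runbook does not describe any.

```bash
#!/usr/bin/env bash
# Base URL copied from the commands above; RISK_API is an illustrative name.
RISK_API="https://stonks-api.celestium.life/risk"

# List recommendations waiting for operator approval.
pending_approvals() {
  curl --fail --silent --show-error "$RISK_API/approvals/pending"
}

# Approve one recommendation by id; returns non-zero on an HTTP error.
approve_recommendation() {
  local id="${1:-}"
  if [ -z "$id" ]; then
    echo "usage: approve_recommendation <id>" >&2
    return 2
  fi
  curl --fail --silent --show-error -X POST "$RISK_API/approvals/$id/approve"
}
```

Because the exit code is meaningful, a call such as `approve_recommendation <id> || echo "approval failed"` reports failures instead of hiding them.
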
## Common Failure Modes

### CrashLoopBackOff on workers
Queue workers (aggregation, extractor, recommendation, broker-adapter, lake-publisher) exit with code 0 when the queue is empty. Kubernetes restarts them and, after repeated exits, reports `CrashLoopBackOff`. This is expected; the workers process messages as soon as they arrive.

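To tell this benign restart loop apart from a real crash, inspect the exit code of the last terminated container. The sketch below reads it from the pod status. The function name `last_exit_code` is hypothetical, and it assumes the worker is the first container in the pod.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the exit code of the most recently terminated
# container instance for a pod (first container only).
last_exit_code() {
  local pod="${1:-}"
  if [ -z "$pod" ]; then
    echo "usage: last_exit_code <pod-name>" >&2
    return 2
  fi
  kubectl -n stonks-oracle get pod "$pod" \
    -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
}
```

An output of `0` matches the empty-queue behaviour; any other value warrants a look at `kso logs <pod-name> --previous`.
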
### PostgreSQL auth failure
The password in `stonks-core-secrets.POSTGRES_PASSWORD` does not match the actual password of the DB user. Fix:
```bash
kubectl exec -i -n postgresql-service postgresql-1 -c postgres -- psql -U postgres -d stonks <<'EOF'
ALTER USER stonks WITH PASSWORD '<new-password>';
EOF
```
Then update the Helm secret to the same password and restart the affected deployments.

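Before changing anything, it can help to see which password the services are actually given. The sketch below decodes one key of a Kubernetes Secret. It assumes that `stonks-core-secrets` is a Secret in the `stonks-oracle` namespace; the function name `secret_value` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the decoded value of one key in a Secret.
# Secret data is stored base64-encoded, hence the final decode step.
secret_value() {
  local secret="${1:-}"
  local key="${2:-}"
  if [ -z "$secret" ] || [ -z "$key" ]; then
    echo "usage: secret_value <secret-name> <key>" >&2
    return 2
  fi
  kubectl -n stonks-oracle get secret "$secret" -o "jsonpath={.data.$key}" | base64 -d
}
```

For example, `secret_value stonks-core-secrets POSTGRES_PASSWORD` prints the password in clear text, so avoid running it in a shared or recorded terminal session.
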
### Redis connection refused
Check that Redis is running: `kubectl get pods -n redis-service`

If the Redis master is down, restart it: `kubectl rollout restart -n redis-service statefulset/redis-master`

### ImagePullBackOff
The GHCR credentials are expired or missing. Re-run `helm upgrade` with a fresh `ghcrAuth.password`.

### Superset won't start
Superset needs a custom image that includes the `sqlalchemy-trino` package. The stock `apache/superset:latest` image does not include it.

## Log Access

All services emit JSON logs when `JSON_LOGS=true` (the default).

```bash
# Stream all logs from a service
kso logs -f deployment/<service> --tail=100

# Search for errors across all pods (kubectl logs needs a pod or selector,
# so iterate over the pods in the namespace)
for pod in $(kso get pods -o name); do kso logs "$pod" --all-containers --prefix --tail=100; done | grep -i error
```

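Grepping for the word "error" also matches informational lines that merely mention it. If the JSON logs carry a severity field, matching on that field is more precise. The sketch below assumes a `level` key with values such as `error` or `critical`; this runbook does not specify the log schema, so check a sample line and adjust the pattern. The function name `errors_only` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical filter: keep only log lines whose (assumed) "level" field
# is error or critical, ignoring case and whitespace around the colon.
errors_only() {
  grep -iE '"level"[[:space:]]*:[[:space:]]*"(error|critical)"'
}
```

It reads from standard input, so it can be appended to any of the log commands, for example `kso logs deployment/<service> --tail=500 | errors_only`.
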
## Ingress Endpoints

| URL | Service |
|-----|---------|
| https://stonks-api.celestium.life | Query API |
| https://stonks-registry.celestium.life | Symbol Registry |
| https://stonks-dash.celestium.life | Superset |
| https://stonks-trino.celestium.life | Trino |