Docker Deployment Guide
This guide covers running the full Stonks Oracle platform locally using Docker Compose. It documents every service, environment variable, volume mount, health check, and operational command.
Prerequisites
- Docker Engine 24+ and Docker Compose v2
- At least 16 GB RAM (Ollama + Trino + all services)
- API keys for Polygon.io and Alpaca (optional — platform runs in degraded mode without them)
Quick Start
# 1. Clone the repository
git clone <repo-url> && cd stonks-oracle
# 2. Configure API keys
cp .env.example .env # or edit the existing .env
# Fill in MARKET_DATA_API_KEY, BROKER_API_KEY, BROKER_API_SECRET
# 3. Start everything
docker compose up -d
# 4. Verify all services are healthy
docker compose ps
# 5. Access the dashboard (open is macOS-only; on Linux use xdg-open or browse to the URL)
open http://localhost:3000
Service Inventory
Infrastructure Services
| Service | Image | Ports | Volumes | Purpose |
|---|---|---|---|---|
| postgres | postgres:16-alpine | 5432:5432 | pgdata → /var/lib/postgresql/data, ./infra/migrations → /docker-entrypoint-initdb.d | Primary database; migrations auto-applied on first start |
| redis | redis:7-alpine | 6379:6379 | — | Queue broker, caching, deduplication |
| minio | minio/minio:latest | 9000:9000 (API), 9001:9001 (console) | miniodata → /data | Object storage for raw artifacts and lakehouse |
| minio-init | minio/mc:latest | — | — | One-shot init container that creates required buckets |
| ollama | ollama/ollama:latest | 11434:11434 | ollama_models → /root/.ollama | LLM inference server for extraction and classification |
| trino | trinodb/trino:latest | 8080:8080 | ./infra/trino/catalog → /etc/trino/catalog | SQL query engine over the lakehouse |
| hive-metastore | apache/hive:4.0.0 | 9083:9083 | hive_data → /opt/hive/data, ./infra/hive/core-site.xml → /opt/hive/conf/core-site.xml, ./infra/hive/metastore-site.xml → /opt/hive/conf/metastore-site.xml | Iceberg/Hive metadata catalog for Trino |
| superset | apache/superset:latest | 8088:8088 | superset_data → /app/superset_home | BI dashboards over Trino |
Application Services
| Service | Dockerfile | SERVICE_CMD / Command | Ports | Depends On |
|---|---|---|---|---|
| scheduler | docker/Dockerfile.scheduler | python -m services.scheduler.app | — | postgres (healthy), redis (healthy) |
| symbol-registry | docker/Dockerfile | uvicorn services.symbol_registry.app:app --host 0.0.0.0 --port 8000 | 8001:8000 | postgres (healthy) |
| ingestion | docker/Dockerfile | python -m services.ingestion.worker | — | postgres (healthy), redis (healthy), minio (healthy) |
| parser | docker/Dockerfile | python -m services.parser.worker | — | postgres (healthy), redis (healthy) |
| extractor | docker/Dockerfile | python -m services.extractor.main | — | postgres (healthy), redis (healthy), ollama (started) |
| aggregation | docker/Dockerfile | python -m services.aggregation.main | — | postgres (healthy), redis (healthy) |
| recommendation | docker/Dockerfile | python -m services.recommendation.main | — | postgres (healthy), redis (healthy) |
| trading-engine | docker/Dockerfile | uvicorn services.trading.app:app --host 0.0.0.0 --port 8000 | 8002:8000 | postgres (healthy), redis (healthy) |
| risk-engine | docker/Dockerfile | uvicorn services.risk.app:app --host 0.0.0.0 --port 8000 | 8003:8000 | postgres (healthy) |
| broker-adapter | docker/Dockerfile | python -m services.adapters.broker_service | — | postgres (healthy), redis (healthy) |
| lake-publisher | docker/Dockerfile | python -m services.lake_publisher.jobs | — | postgres (healthy), minio (healthy) |
| query-api | docker/Dockerfile | uvicorn services.api.app:app --host 0.0.0.0 --port 8000 | 8004:8000 | postgres (healthy), redis (healthy), minio (healthy) |
| dashboard | frontend/Dockerfile | nginx (built-in) | 3000:8080 | query-api (healthy) |
Port Summary
| Port | Service | Protocol |
|---|---|---|
| 3000 | Dashboard (React UI) | HTTP |
| 5432 | PostgreSQL | TCP |
| 6379 | Redis | TCP |
| 8001 | Symbol Registry API | HTTP |
| 8002 | Trading Engine API | HTTP |
| 8003 | Risk Engine API | HTTP |
| 8004 | Query API | HTTP |
| 8080 | Trino | HTTP |
| 8088 | Superset | HTTP |
| 9000 | MinIO API | HTTP |
| 9001 | MinIO Console | HTTP |
| 9083 | Hive Metastore | Thrift |
| 11434 | Ollama | HTTP |
Environment Variables
Shared Application Environment (x-app-env)
All application services inherit these variables via the x-app-env YAML anchor:
| Variable | Default | Description |
|---|---|---|
| POSTGRES_HOST | postgres | PostgreSQL hostname (Docker service name) |
| POSTGRES_PORT | 5432 | PostgreSQL port |
| POSTGRES_DB | stonks | Database name |
| POSTGRES_USER | stonks | Database user |
| POSTGRES_PASSWORD | stonks_dev | Database password |
| REDIS_HOST | redis | Redis hostname (Docker service name) |
| REDIS_PORT | 6379 | Redis port |
| MINIO_ENDPOINT | minio:9000 | MinIO API endpoint |
| MINIO_ACCESS_KEY | minioadmin | MinIO access key |
| MINIO_SECRET_KEY | minioadmin | MinIO secret key |
| OLLAMA_BASE_URL | http://ollama:11434 | Ollama LLM server URL |
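As a minimal sketch of how a service might consume the shared variables, the snippet below builds PostgreSQL and Redis connection URLs from the defaults listed above. The helper name connection_urls is hypothetical and is not taken from services/shared/config.py; the Redis database index 0 assumes the documented REDIS_DB default.

```python
import os
from typing import Mapping

# Defaults copied from the x-app-env table above.
X_APP_ENV_DEFAULTS = {
    "POSTGRES_HOST": "postgres",
    "POSTGRES_PORT": "5432",
    "POSTGRES_DB": "stonks",
    "POSTGRES_USER": "stonks",
    "POSTGRES_PASSWORD": "stonks_dev",
    "REDIS_HOST": "redis",
    "REDIS_PORT": "6379",
}


def connection_urls(env: Mapping[str, str] = os.environ) -> dict:
    """Hypothetical helper: build connection URLs from the shared variables."""
    cfg = {key: env.get(key, default) for key, default in X_APP_ENV_DEFAULTS.items()}
    return {
        "postgres": (
            f"postgresql://{cfg['POSTGRES_USER']}:{cfg['POSTGRES_PASSWORD']}"
            f"@{cfg['POSTGRES_HOST']}:{cfg['POSTGRES_PORT']}/{cfg['POSTGRES_DB']}"
        ),
        # Database 0 matches the documented REDIS_DB default.
        "redis": f"redis://{cfg['REDIS_HOST']}:{cfg['REDIS_PORT']}/0",
    }
```

Passing a plain dict instead of os.environ makes it easy to see what changes when a script runs on the host, for example connection_urls({"POSTGRES_HOST": "localhost"}).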
.env File
The .env file is loaded by ingestion, broker-adapter, and trading-engine via the env_file directive. Create it in the repository root:
# Stonks Oracle — Environment Variables
# These are loaded by ingestion, broker-adapter, and trading-engine services.
# Polygon.io market data API key (required for live data ingestion)
MARKET_DATA_API_KEY=
# Alpaca broker credentials (required for paper/live trading)
BROKER_API_KEY=
BROKER_API_SECRET=
BROKER_BASE_URL=https://paper-api.alpaca.markets
| Variable | Required | Default | Used By | Description |
|---|---|---|---|---|
| MARKET_DATA_API_KEY | No* | (empty) | ingestion | Polygon.io API key for market data fetching |
| BROKER_API_KEY | No* | (empty) | broker-adapter, trading-engine | Alpaca API key |
| BROKER_API_SECRET | No* | (empty) | broker-adapter, trading-engine | Alpaca API secret |
| BROKER_BASE_URL | No | https://paper-api.alpaca.markets | broker-adapter, trading-engine | Alpaca API base URL |
*Services start without these keys but run in degraded mode — ingestion cannot fetch market data and the broker adapter cannot execute trades.
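The degraded-mode rule can be checked before starting the stack. The sketch below is illustrative only: the function degraded_services is hypothetical, and the key-to-service mapping is transcribed from the "Used By" column above rather than from the services' own code.

```python
from typing import Mapping

# Which services rely on which .env keys, per the table above.
KEY_USERS = {
    "MARKET_DATA_API_KEY": ["ingestion"],
    "BROKER_API_KEY": ["broker-adapter", "trading-engine"],
    "BROKER_API_SECRET": ["broker-adapter", "trading-engine"],
}


def degraded_services(env: Mapping[str, str]) -> dict:
    """Map each affected service to the keys it is missing (unset or empty)."""
    report = {}
    for key, services in KEY_USERS.items():
        if not env.get(key, "").strip():
            for service in services:
                report.setdefault(service, []).append(key)
    return report
```

An empty result means all three keys are present; any other result names the services that would start in degraded mode and the keys they lack.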
Infrastructure Service Environment
PostgreSQL (postgres):
| Variable | Value | Description |
|---|---|---|
| POSTGRES_DB | stonks | Database created on first start |
| POSTGRES_USER | stonks | Superuser for the database |
| POSTGRES_PASSWORD | stonks_dev | Password for the database user |
MinIO (minio):
| Variable | Value | Description |
|---|---|---|
| MINIO_ROOT_USER | minioadmin | MinIO admin username |
| MINIO_ROOT_PASSWORD | minioadmin | MinIO admin password |
Trino (trino):
| Variable | Value | Description |
|---|---|---|
| MINIO_ACCESS_KEY | minioadmin | Passed to Trino for MinIO catalog access |
| MINIO_SECRET_KEY | minioadmin | Passed to Trino for MinIO catalog access |
Hive Metastore (hive-metastore):
| Variable | Value | Description |
|---|---|---|
| SERVICE_NAME | metastore | Tells Hive to run in metastore-only mode |
| DB_DRIVER | derby | Embedded Derby database for metadata |
Superset (superset):
| Variable | Value | Description |
|---|---|---|
| SUPERSET_SECRET_KEY | stonks-dev-secret-key-change-me | Flask secret key (change in production) |
| ADMIN_USERNAME | admin | Initial admin username |
| ADMIN_PASSWORD | admin | Initial admin password |
| ADMIN_EMAIL | admin@stonks.local | Initial admin email |
Additional Configuration Variables
All application services support additional environment variables loaded via services/shared/config.py. These can be added to individual service environment blocks or to the x-app-env anchor as needed:
| Variable | Default | Description |
|---|---|---|
| REDIS_DB | 0 | Redis database number |
| REDIS_PASSWORD | (none) | Redis password (not needed in Docker Compose) |
| MINIO_SECURE | false | Use HTTPS for MinIO |
| OLLAMA_MODEL | qwen3.5:9b | Default LLM model for extraction |
| OLLAMA_TIMEOUT | 120 | Ollama request timeout (seconds) |
| OLLAMA_MAX_RETRIES | 2 | Max retries for Ollama requests |
| TRINO_HOST | localhost | Trino hostname |
| TRINO_PORT | 8080 | Trino port |
| TRINO_CATALOG | lakehouse | Trino catalog name |
| TRINO_SCHEMA | stonks | Trino schema name |
| MARKET_DATA_BASE_URL | https://api.polygon.io | Polygon.io base URL |
| MARKET_DATA_PROVIDER | polygon | Market data provider |
| BROKER_MODE | paper | Broker mode: paper or live |
| BROKER_PROVIDER | alpaca | Broker provider |
| TRADING_ENABLED | false | Enable autonomous trading engine |
| TRADING_RISK_TIER | moderate | Risk tier: conservative, moderate, aggressive |
| TRADING_POLLING_INTERVAL_SECONDS | 60 | Recommendation polling interval |
| TRADING_MAX_OPEN_POSITIONS | 10 | Maximum concurrent open positions |
| MACRO_ENABLED | true | Enable macro signal layer |
| COMPETITIVE_ENABLED | true | Enable competitive signal layer |
| LOG_LEVEL | INFO | Logging level |
| JSON_LOGS | true | Enable structured JSON logging |
| DEPLOY_STAGE | (empty) | Deployment stage prefix for bucket names |
See services/shared/config.py for the complete list of all supported environment variables with their defaults.
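Environment variables always arrive as strings, so booleans and integers such as TRADING_ENABLED and TRADING_MAX_OPEN_POSITIONS have to be parsed. The sketch below shows one way this could look for the trading variables; TradingSettings, load_trading_settings, and the accepted truthy spellings are assumptions for illustration, not the actual contents of services/shared/config.py.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _as_bool(raw: str) -> bool:
    # Assumed convention: "1", "true", "yes", "on" (any case) mean True.
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class TradingSettings:
    enabled: bool
    risk_tier: str
    polling_interval_seconds: int
    max_open_positions: int


def load_trading_settings(env: Mapping[str, str] = os.environ) -> TradingSettings:
    """Hypothetical loader using the documented defaults."""
    tier = env.get("TRADING_RISK_TIER", "moderate")
    if tier not in {"conservative", "moderate", "aggressive"}:
        raise ValueError(f"unknown TRADING_RISK_TIER: {tier!r}")
    return TradingSettings(
        enabled=_as_bool(env.get("TRADING_ENABLED", "false")),
        risk_tier=tier,
        polling_interval_seconds=int(env.get("TRADING_POLLING_INTERVAL_SECONDS", "60")),
        max_open_positions=int(env.get("TRADING_MAX_OPEN_POSITIONS", "10")),
    )
```

Rejecting an unknown risk tier at load time surfaces a typo in the compose file immediately instead of during a trading decision.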
Volume Mounts and Data Persistence
Docker Compose defines five named volumes for persistent data:
| Volume | Mounted By | Mount Path | Contents |
|---|---|---|---|
| pgdata | postgres | /var/lib/postgresql/data | PostgreSQL database files |
| miniodata | minio | /data | MinIO object storage (raw artifacts, lakehouse Parquet files) |
| ollama_models | ollama | /root/.ollama | Downloaded LLM model weights |
| hive_data | hive-metastore | /opt/hive/data | Hive metastore Derby database |
| superset_data | superset | /app/superset_home | Superset configuration and metadata |
Bind Mounts
In addition to named volumes, several services use bind mounts for configuration:
| Service | Host Path | Container Path | Mode | Purpose |
|---|---|---|---|---|
| postgres | ./infra/migrations | /docker-entrypoint-initdb.d | rw | SQL migrations auto-applied on first start |
| trino | ./infra/trino/catalog | /etc/trino/catalog | rw | Trino catalog configuration (lakehouse, iceberg) |
| hive-metastore | ./infra/hive/core-site.xml | /opt/hive/conf/core-site.xml | ro | Hadoop core-site config for MinIO access |
| hive-metastore | ./infra/hive/metastore-site.xml | /opt/hive/conf/metastore-site.xml | ro | Hive metastore config |
Resetting Data
To destroy all persistent data and start fresh:
# Stop all containers and remove named volumes
docker compose down -v
This removes pgdata, miniodata, ollama_models, hive_data, and superset_data. The next docker compose up re-initializes PostgreSQL with migrations and re-creates the MinIO buckets (via minio-init); Ollama models must be downloaded again.
To reset only specific volumes:
docker compose down
docker volume rm stonks-oracle_pgdata # Reset database only
docker compose up -d
Note: Volume names are prefixed with the Compose project name, which defaults to the project directory name (e.g., stonks-oracle_pgdata). Use docker volume ls to see exact names.
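The naming rule can be sketched as follows. This is an approximation under stated assumptions: the normalization (lowercase, keep only a-z, 0-9, underscore and hyphen, drop leading underscores or hyphens) mirrors how Compose is understood to derive a project name from a directory, and the helper names are hypothetical. An explicit project name, as set with docker compose -p or COMPOSE_PROJECT_NAME, takes precedence over the directory name.

```python
import re
from pathlib import PurePosixPath
from typing import Optional


def normalize_project_name(name: str) -> str:
    # Approximation of Compose's rule: lowercase, keep [a-z0-9_-],
    # then drop any leading "_" or "-".
    kept = "".join(re.findall(r"[a-z0-9_-]", name.lower()))
    return kept.lstrip("_-")


def volume_name(project_dir: str, volume: str, project_name: Optional[str] = None) -> str:
    """Return the expected Docker volume name for a Compose named volume."""
    project = project_name or PurePosixPath(project_dir).name
    return f"{normalize_project_name(project)}_{volume}"
```

For a checkout at /home/dev/stonks-oracle (an illustrative path), volume_name("/home/dev/stonks-oracle", "pgdata") yields the stonks-oracle_pgdata name used in the reset command above. Treat docker volume ls as the source of truth.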
Health Checks
The application services and the core infrastructure services (postgres, redis, minio) have health checks configured; ollama does not. Docker Compose uses these checks to enforce startup ordering via depends_on with condition: service_healthy.
Infrastructure Health Checks
| Service | Test Command | Interval | Retries |
|---|---|---|---|
| postgres | pg_isready -U stonks | 5s | 5 |
| redis | redis-cli ping | 5s | 5 |
| minio | mc ready local | 5s | 5 |
Application Health Checks — FastAPI Services
FastAPI services (symbol-registry, trading-engine, risk-engine, query-api) use HTTP health endpoints. The dashboard (nginx) is checked the same way, against its root path on port 8080:
| Service | Test Command | Interval | Timeout | Retries | Start Period |
|---|---|---|---|---|---|
| symbol-registry | curl -f http://localhost:8000/health | 10s | 5s | 3 | 15s |
| trading-engine | curl -f http://localhost:8000/health | 10s | 5s | 3 | 15s |
| risk-engine | curl -f http://localhost:8000/health | 10s | 5s | 3 | 15s |
| query-api | curl -f http://localhost:8000/health | 10s | 5s | 3 | 15s |
| dashboard | curl -f http://localhost:8080/ | 10s | 5s | 3 | 10s |
Application Health Checks — Worker Services
Worker services (no HTTP endpoint) use process liveness checks:
| Service | Test Command | Interval | Timeout | Retries | Start Period |
|---|---|---|---|---|---|
| scheduler | pgrep -f 'python -m services.scheduler.app' | 10s | 5s | 3 | 15s |
| ingestion | pgrep -f 'python -m services.ingestion.worker' | 10s | 5s | 3 | 15s |
| parser | pgrep -f 'python -m services.parser.worker' | 10s | 5s | 3 | 15s |
| extractor | pgrep -f 'python -m services.extractor.main' | 10s | 5s | 3 | 15s |
| aggregation | pgrep -f 'python -m services.aggregation.main' | 10s | 5s | 3 | 15s |
| recommendation | pgrep -f 'python -m services.recommendation.main' | 10s | 5s | 3 | 15s |
| broker-adapter | pgrep -f 'python -m services.adapters.broker_service' | 10s | 5s | 3 | 15s |
| lake-publisher | pgrep -f 'python -m services.lake_publisher.jobs' | 10s | 5s | 3 | 15s |
Verifying Service Health
# Check all service statuses
docker compose ps
# Check a specific service
docker compose ps query-api
# Inspect health check details for a container
docker inspect --format='{{json .State.Health}}' stonks-oracle-query-api-1 | python -m json.tool
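The JSON printed by the docker inspect command above can be condensed into a one-line summary. The following is a small sketch, not part of the repository: summarize_health is a hypothetical helper, and it relies only on the Status, FailingStreak, and Log fields (with ExitCode and Output per log entry) that Docker reports under .State.Health. A container without a health check prints null.

```python
import json


def summarize_health(raw_json: str) -> str:
    """Summarize the JSON that docker inspect prints for .State.Health."""
    health = json.loads(raw_json)
    if health is None:
        return "no health check configured"
    status = health.get("Status", "unknown")
    streak = health.get("FailingStreak", 0)
    log = health.get("Log") or []
    last = log[-1] if log else {}
    exit_code = last.get("ExitCode")
    output = (last.get("Output") or "").strip()
    summary = f"status={status} failing_streak={streak}"
    if exit_code is not None:
        summary += f" last_exit_code={exit_code}"
    if output:
        summary += f" last_output={output!r}"
    return summary
```

One way to use it is to save the function in a script that reads standard input and pipe the docker inspect output into it in place of python -m json.tool.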
Dockerfile Build Details
docker/Dockerfile — Generic Python Service Image
Used by all Python application services except the scheduler (the dashboard is built from frontend/Dockerfile). Accepts a SERVICE_CMD build argument that determines which service the container runs.
Base image: python:3.12-slim
Build arguments:
| Argument | Default | Description |
|---|---|---|
| SERVICE_CMD | python -m services.scheduler.app | The command executed when the container starts |
What gets copied:
- requirements.txt → pip dependencies installed
- services/ → all service source code
- tests/ → test files (available for in-container testing)
- conftest.py → pytest configuration
Environment variables set:
- PYTHONDONTWRITEBYTECODE=1: no .pyc files
- PYTHONUNBUFFERED=1: unbuffered stdout/stderr for log visibility
- PYTHONPATH=/app: ensures services.* imports resolve
System packages installed: gcc, libpq-dev (PostgreSQL client library), curl (for health checks)
Security: Runs as non-root user stonks (UID 1000).
How SERVICE_CMD works: The CMD directive is sh -c "${SERVICE_CMD}", so the build argument becomes the runtime command. Each service in docker-compose.yml overrides this via the args.SERVICE_CMD build parameter:
    query-api:
      build:
        context: .
        dockerfile: docker/Dockerfile
        args:
          SERVICE_CMD: "uvicorn services.api.app:app --host 0.0.0.0 --port 8000"
docker/Dockerfile.scheduler — Scheduler Image
A specialized variant of the generic Dockerfile used only by the scheduler service. Adds postgresql-client for running database migrations via psql.
Additional contents:
- infra/migrations/ → copied to /app/infra/migrations/ for migration execution
- postgresql-client system package installed
Command: Hardcoded CMD ["python", "-m", "services.scheduler.app"] (no SERVICE_CMD argument).
docker/Dockerfile.superset — Custom Superset Image
Extends the official Apache Superset image with additional database drivers.
Base image: apache/superset:latest
Additional packages: trino[sqlalchemy], psycopg2-binary, redis
frontend/Dockerfile — Dashboard Image
Multi-stage build for the React dashboard.
Stage 1 — Build (base: node:24-alpine):
| Build Argument | Default | Description |
|---|---|---|
| VITE_QUERY_API_URL | "" | Query API base URL (empty = use relative /api/ proxy) |
| VITE_SYMBOL_REGISTRY_URL | "" | Symbol Registry base URL (empty = use relative /registry/ proxy) |
| VITE_RISK_ENGINE_URL | "" | Risk Engine base URL (empty = use relative /risk/ proxy) |
Stage 2 — Serve (base: nginxinc/nginx-unprivileged:alpine):
- Serves the built static files on port 8080
- Uses frontend/nginx.conf for SPA fallback and API reverse proxying
- Proxies /api/ → query-api:8000, /registry/ → symbol-registry:8000, /risk/ → risk-engine:8000, /trading/ → trading-engine:8000
Building Custom Images
To build a single service image locally:
# Build the query-api image
docker compose build query-api
# Build with a custom SERVICE_CMD
docker build -t my-custom-service \
--build-arg SERVICE_CMD="python -m services.my_service.main" \
-f docker/Dockerfile .
# Build the dashboard with custom API URLs
docker build -t my-dashboard \
--build-arg VITE_QUERY_API_URL="https://api.example.com" \
-f frontend/Dockerfile frontend/
# Rebuild all images
docker compose build
Dependency Ordering
Docker Compose enforces startup order using depends_on with health check conditions. The dependency graph is:
postgres (healthy) ──┬── scheduler
                     ├── symbol-registry
                     ├── ingestion
                     ├── parser
                     ├── extractor
                     ├── aggregation
                     ├── recommendation
                     ├── trading-engine
                     ├── risk-engine
                     ├── broker-adapter
                     ├── lake-publisher
                     └── query-api
redis (healthy) ─────┬── scheduler
                     ├── ingestion
                     ├── parser
                     ├── extractor
                     ├── aggregation
                     ├── recommendation
                     ├── trading-engine
                     ├── broker-adapter
                     └── query-api
minio (healthy) ─────┬── minio-init
                     ├── ingestion
                     ├── lake-publisher
                     └── query-api
ollama (started) ────── extractor
minio ───────────────── trino
hive-metastore ─────── trino
trino ──────────────── superset (via depends_on)
query-api (healthy) ── dashboard
Services with condition: service_healthy wait until the dependency's health check passes. The extractor depends on ollama with condition: service_started (no health check — Ollama may take time to load models).
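To see what the graph implies for startup, the dependencies can be grouped into waves with a topological sort. This is a sketch that only models ordering: the DEPENDS_ON mapping is transcribed by hand from the graph above, startup_waves is a hypothetical helper, and the real Compose scheduler additionally waits on health conditions before moving on.

```python
from graphlib import TopologicalSorter

# service -> services it depends on, transcribed from the graph above.
DEPENDS_ON = {
    "scheduler": {"postgres", "redis"},
    "symbol-registry": {"postgres"},
    "ingestion": {"postgres", "redis", "minio"},
    "parser": {"postgres", "redis"},
    "extractor": {"postgres", "redis", "ollama"},
    "aggregation": {"postgres", "redis"},
    "recommendation": {"postgres", "redis"},
    "trading-engine": {"postgres", "redis"},
    "risk-engine": {"postgres"},
    "broker-adapter": {"postgres", "redis"},
    "lake-publisher": {"postgres", "minio"},
    "query-api": {"postgres", "redis", "minio"},
    "minio-init": {"minio"},
    "trino": {"minio", "hive-metastore"},
    "superset": {"trino"},
    "dashboard": {"query-api"},
}


def startup_waves(graph):
    """Group services into waves; each wave depends only on earlier waves."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for index, wave in enumerate(startup_waves(DEPENDS_ON), start=1):
        print(f"wave {index}: {', '.join(wave)}")
```

With this mapping the infrastructure services form the first wave, the Python services plus minio-init and trino form the second, and dashboard and superset come last.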
Operational Commands
Starting Services
# Start all services in the background
docker compose up -d
# Start only infrastructure (useful for local development)
docker compose up -d postgres redis minio minio-init ollama
# Start a specific service and its dependencies
docker compose up -d query-api
Stopping Services
# Stop all services (preserves volumes)
docker compose down
# Stop all services and remove volumes (full reset)
docker compose down -v
# Stop a specific service
docker compose stop trading-engine
Restarting Services
# Restart a specific service
docker compose restart query-api
# Restart with a fresh build
docker compose up -d --build query-api
# Force recreate a service (picks up compose file changes)
docker compose up -d --force-recreate query-api
Viewing Logs
# Follow logs for all services
docker compose logs -f
# Follow logs for a specific service
docker compose logs -f query-api
# View last 50 lines of a service's logs
docker compose logs --tail=50 ingestion
# View logs for multiple services
docker compose logs -f scheduler ingestion extractor
Scaling Replicas
# Scale a worker service to 3 replicas
docker compose up -d --scale ingestion=3
# Scale multiple services
docker compose up -d --scale ingestion=3 --scale extractor=2
# Scale back to 1
docker compose up -d --scale ingestion=1
Note: Scaling works best for worker services (ingestion, parser, extractor, aggregation, recommendation, broker-adapter, lake-publisher) that consume from Redis queues. Do not scale FastAPI services that expose host ports without adjusting port mappings.
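The host-port restriction can be checked before running a scale command. The sketch below is illustrative: check_scale_plan is a hypothetical helper, and HOST_PORTS is copied from the Application Services table rather than read from docker-compose.yml.

```python
# Host ports published by application services, from the table above.
HOST_PORTS = {
    "symbol-registry": 8001,
    "trading-engine": 8002,
    "risk-engine": 8003,
    "query-api": 8004,
    "dashboard": 3000,
}


def check_scale_plan(plan):
    """Return warnings for services whose replicas would share one host port."""
    warnings = []
    for service, replicas in sorted(plan.items()):
        port = HOST_PORTS.get(service)
        if port is not None and replicas > 1:
            warnings.append(
                f"{service}: {replicas} replicas cannot all bind host port {port}"
            )
    return warnings
```

A plan such as {"ingestion": 3, "extractor": 2} produces no warnings, while {"query-api": 2} is flagged because both replicas would try to publish port 8004.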
Inspecting Services
# List all services and their status
docker compose ps
# View the processes running in each container
docker compose top
# Execute a command inside a running container
docker compose exec query-api python -c "from services.shared.config import load_config; print(load_config())"
# Open a psql session in the postgres container
docker compose exec postgres psql -U stonks -d stonks
Full Reset
# Nuclear option: stop everything, remove volumes, rebuild, restart
docker compose down -v
docker compose build --no-cache
docker compose up -d
This destroys all data (database, object storage, model weights, metastore, Superset config) and starts from scratch. PostgreSQL migrations are re-applied automatically. MinIO buckets are re-created by minio-init. Ollama models must be re-downloaded.
MinIO Bucket Initialization
The minio-init service runs once on startup and creates the required object storage buckets:
| Bucket | Purpose |
|---|---|
| stonks-raw-market | Raw market data from Polygon.io |
| stonks-raw-news | Raw news articles |
| stonks-raw-filings | Raw SEC filings |
| stonks-normalized | Normalized/parsed documents |
| stonks-llm-prompts | LLM prompt archives |
| stonks-llm-results | LLM extraction results |
| stonks-lakehouse | Parquet fact tables for Trino |
| stonks-audit | Audit trail artifacts |
Access the MinIO console at http://localhost:9001 (credentials: minioadmin / minioadmin).
Dashboard Reverse Proxy
The dashboard container runs nginx with reverse proxy rules that route API requests to backend services using Docker Compose service names:
| Path | Proxied To | Service |
|---|---|---|
| /api/ | http://query-api:8000 | Query API |
| /registry/ | http://symbol-registry:8000/ | Symbol Registry API |
| /risk/ | http://risk-engine:8000/ | Risk Engine API |
| /trading/ | http://trading-engine:8000/ | Trading Engine API |
All other paths serve the React SPA with try_files fallback to index.html.
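The routing table can be modelled in a few lines to show which upstream URL a browser request ends up at. This is a sketch under assumptions: it takes the targets exactly as listed (the /api/ target has no trailing slash, the others do) and applies nginx's standard proxy_pass rule, where a target without a URI part forwards the request path unchanged and a target with a URI part replaces the matched prefix. frontend/nginx.conf itself is not shown here, so the real behaviour may differ, and the request paths in the usage note are made up.

```python
from typing import Optional

# (location prefix, proxy_pass target) pairs from the table above.
ROUTES = [
    ("/api/", "http://query-api:8000"),
    ("/registry/", "http://symbol-registry:8000/"),
    ("/risk/", "http://risk-engine:8000/"),
    ("/trading/", "http://trading-engine:8000/"),
]


def resolve_upstream(path: str) -> Optional[str]:
    """Return the upstream URL for a request path, or None for the SPA."""
    for prefix, target in ROUTES:
        if not path.startswith(prefix):
            continue
        scheme, _, rest = target.partition("://")
        host, slash, uri = rest.partition("/")
        if not slash:
            # proxy_pass without a URI part: the path is forwarded as-is.
            return f"{scheme}://{host}{path}"
        # proxy_pass with a URI part: the matched prefix is replaced by it.
        return f"{scheme}://{host}/{uri}{path[len(prefix):]}"
    return None
```

Under these assumptions, resolve_upstream("/registry/symbols") gives http://symbol-registry:8000/symbols, resolve_upstream("/api/v1/signals") keeps the /api/ prefix, and any other path returns None, which corresponds to the index.html fallback.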
Troubleshooting
Service won't start
Check dependency health:
docker compose ps postgres redis minio
If an infrastructure service never becomes healthy, the application services that depend on it are not started. Check infrastructure logs:
docker compose logs postgres
Database migration errors
Migrations in ./infra/migrations/ are applied by PostgreSQL's docker-entrypoint-initdb.d mechanism, which only runs on first database initialization. If you need to re-run migrations:
docker compose down -v # Removes ALL named volumes, including pgdata
docker compose up -d # Migrations re-applied on fresh init
Ollama model not available
The extractor service needs an LLM model loaded in Ollama. Pull a model manually:
docker compose exec ollama ollama pull qwen3.5:9b
Port conflicts
If a port is already in use, modify the host port mapping in docker-compose.yml:
    query-api:
      ports:
        - "9004:8000" # Changed from 8004 to 9004
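Before editing port mappings, it can help to find out which of the stack's host ports are already taken. The sketch below is an approximate check, not a repository tool: host_port_is_free and conflicting_ports are hypothetical helpers, the port list is copied from the Port Summary table, and probing 127.0.0.1 can miss listeners bound only to other interfaces.

```python
import socket


def host_port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently bind host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False
    return True


# Host ports published by the stack, from the Port Summary table.
STACK_PORTS = [3000, 5432, 6379, 8001, 8002, 8003, 8004,
               8080, 8088, 9000, 9001, 9083, 11434]


def conflicting_ports(ports=STACK_PORTS):
    """Return the subset of ports that cannot be bound right now."""
    return [port for port in ports if not host_port_is_free(port)]


if __name__ == "__main__":
    busy = conflicting_ports()
    print("ports in use:", busy if busy else "none")
```

Run it while the stack is stopped; any port it reports belongs to another process and needs a different host-side mapping.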