stonks-oracle/docs/docker-deployment.md
Celes Renata 88ad1e8d99 feat: comprehensive docs, unit tests, docker-compose app services
- Add scheduler and ingestion unit tests (test_scheduler_unit.py, test_ingestion_unit.py)
- Add all 13 app services + dashboard to docker-compose.yml
- Add full documentation suite: API reference, Helm reference, Docker deployment guide,
  3 architecture diagrams (K8s, Docker Compose, data pipeline), AI agent guide,
  backup/restore guide, observability/metrics reference, per-service docs
- Add intelligence pipeline deep-dive docs with Mermaid diagrams
- Update README with documentation index and links
- Add specs for comprehensive-quality-docs, intelligence-pipeline-deep-dive,
  sanitized-pipeline-docs
2026-04-22 02:56:41 +00:00


Docker Deployment Guide

This guide covers running the full Stonks Oracle platform locally using Docker Compose. It documents every service, environment variable, volume mount, health check, and operational command.

Prerequisites

  • Docker Engine 24+ and Docker Compose v2
  • At least 16 GB RAM (Ollama + Trino + all services)
  • API keys for Polygon.io and Alpaca (optional — platform runs in degraded mode without them)

Quick Start

```bash
# 1. Clone the repository
git clone <repo-url> && cd stonks-oracle

# 2. Configure API keys
cp .env.example .env   # or edit the existing .env
# Fill in MARKET_DATA_API_KEY, BROKER_API_KEY, BROKER_API_SECRET

# 3. Start everything
docker compose up -d

# 4. Verify all services are healthy
docker compose ps

# 5. Access the dashboard (macOS; use xdg-open on Linux)
open http://localhost:3000
```

Service Inventory

Infrastructure Services

| Service | Image | Ports | Volumes | Purpose |
|---|---|---|---|---|
| postgres | postgres:16-alpine | 5432:5432 | pgdata:/var/lib/postgresql/data, ./infra/migrations:/docker-entrypoint-initdb.d | Primary database; migrations auto-applied on first start |
| redis | redis:7-alpine | 6379:6379 | | Queue broker, caching, deduplication |
| minio | minio/minio:latest | 9000:9000 (API), 9001:9001 (console) | miniodata:/data | Object storage for raw artifacts and lakehouse |
| minio-init | minio/mc:latest | | | One-shot init container that creates required buckets |
| ollama | ollama/ollama:latest | 11434:11434 | ollama_models:/root/.ollama | LLM inference server for extraction and classification |
| trino | trinodb/trino:latest | 8080:8080 | ./infra/trino/catalog:/etc/trino/catalog | SQL query engine over the lakehouse |
| hive-metastore | apache/hive:4.0.0 | 9083:9083 | hive_data:/opt/hive/data, ./infra/hive/core-site.xml:/opt/hive/conf/core-site.xml, ./infra/hive/metastore-site.xml:/opt/hive/conf/metastore-site.xml | Iceberg/Hive metadata catalog for Trino |
| superset | apache/superset:latest | 8088:8088 | superset_data:/app/superset_home | BI dashboards over Trino |

Application Services

| Service | Dockerfile | SERVICE_CMD / Command | Ports | Depends On |
|---|---|---|---|---|
| scheduler | docker/Dockerfile.scheduler | python -m services.scheduler.app | | postgres (healthy), redis (healthy) |
| symbol-registry | docker/Dockerfile | uvicorn services.symbol_registry.app:app --host 0.0.0.0 --port 8000 | 8001:8000 | postgres (healthy) |
| ingestion | docker/Dockerfile | python -m services.ingestion.worker | | postgres (healthy), redis (healthy), minio (healthy) |
| parser | docker/Dockerfile | python -m services.parser.worker | | postgres (healthy), redis (healthy) |
| extractor | docker/Dockerfile | python -m services.extractor.main | | postgres (healthy), redis (healthy), ollama (started) |
| aggregation | docker/Dockerfile | python -m services.aggregation.main | | postgres (healthy), redis (healthy) |
| recommendation | docker/Dockerfile | python -m services.recommendation.main | | postgres (healthy), redis (healthy) |
| trading-engine | docker/Dockerfile | uvicorn services.trading.app:app --host 0.0.0.0 --port 8000 | 8002:8000 | postgres (healthy), redis (healthy) |
| risk-engine | docker/Dockerfile | uvicorn services.risk.app:app --host 0.0.0.0 --port 8000 | 8003:8000 | postgres (healthy) |
| broker-adapter | docker/Dockerfile | python -m services.adapters.broker_service | | postgres (healthy), redis (healthy) |
| lake-publisher | docker/Dockerfile | python -m services.lake_publisher.jobs | | postgres (healthy), minio (healthy) |
| query-api | docker/Dockerfile | uvicorn services.api.app:app --host 0.0.0.0 --port 8000 | 8004:8000 | postgres (healthy), redis (healthy), minio (healthy) |
| dashboard | frontend/Dockerfile | nginx (built-in) | 3000:8080 | query-api (healthy) |

Port Summary

| Port | Service | Protocol |
|---|---|---|
| 3000 | Dashboard (React UI) | HTTP |
| 5432 | PostgreSQL | TCP |
| 6379 | Redis | TCP |
| 8001 | Symbol Registry API | HTTP |
| 8002 | Trading Engine API | HTTP |
| 8003 | Risk Engine API | HTTP |
| 8004 | Query API | HTTP |
| 8080 | Trino | HTTP |
| 8088 | Superset | HTTP |
| 9000 | MinIO API | HTTP |
| 9001 | MinIO Console | HTTP |
| 9083 | Hive Metastore | Thrift |
| 11434 | Ollama | HTTP |

Environment Variables

Shared Application Environment (x-app-env)

All application services inherit these variables via the x-app-env YAML anchor:

| Variable | Default | Description |
|---|---|---|
| POSTGRES_HOST | postgres | PostgreSQL hostname (Docker service name) |
| POSTGRES_PORT | 5432 | PostgreSQL port |
| POSTGRES_DB | stonks | Database name |
| POSTGRES_USER | stonks | Database user |
| POSTGRES_PASSWORD | stonks_dev | Database password |
| REDIS_HOST | redis | Redis hostname (Docker service name) |
| REDIS_PORT | 6379 | Redis port |
| MINIO_ENDPOINT | minio:9000 | MinIO API endpoint |
| MINIO_ACCESS_KEY | minioadmin | MinIO access key |
| MINIO_SECRET_KEY | minioadmin | MinIO secret key |
| OLLAMA_BASE_URL | http://ollama:11434 | Ollama LLM server URL |

.env File

The .env file is loaded by ingestion, broker-adapter, and trading-engine via the env_file directive. Create it in the repository root:

```bash
# Stonks Oracle: Environment Variables
# These are loaded by ingestion, broker-adapter, and trading-engine services.

# Polygon.io market data API key (required for live data ingestion)
MARKET_DATA_API_KEY=

# Alpaca broker credentials (required for paper/live trading)
BROKER_API_KEY=
BROKER_API_SECRET=
BROKER_BASE_URL=https://paper-api.alpaca.markets
```

| Variable | Required | Default | Used By | Description |
|---|---|---|---|---|
| MARKET_DATA_API_KEY | No* | (empty) | ingestion | Polygon.io API key for market data fetching |
| BROKER_API_KEY | No* | (empty) | broker-adapter, trading-engine | Alpaca API key |
| BROKER_API_SECRET | No* | (empty) | broker-adapter, trading-engine | Alpaca API secret |
| BROKER_BASE_URL | No | https://paper-api.alpaca.markets | broker-adapter, trading-engine | Alpaca API base URL |

*Services start without these keys but run in degraded mode — ingestion cannot fetch market data and the broker adapter cannot execute trades.
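Before starting the stack you can check which of these keys are still empty. The bash function below is a minimal sketch and not part of the repository: check_env_keys is a hypothetical helper name, and it assumes plain, unquoted KEY=value lines as in the template above.

```bash
# Hypothetical pre-flight helper (not part of the repo): report empty API keys in .env.
# Assumes unquoted KEY=value lines; everything after the first "=" is the value.
check_env_keys() {
  local env_file="${1:-.env}" key value missing=0
  if [[ ! -f "$env_file" ]]; then
    echo "missing file: $env_file" >&2
    return 2
  fi
  for key in MARKET_DATA_API_KEY BROKER_API_KEY BROKER_API_SECRET; do
    # Use the last matching line if a key is defined more than once.
    value="$(grep -E "^${key}=" "$env_file" | tail -n 1 | cut -d= -f2-)"
    if [[ -z "$value" ]]; then
      echo "empty: $key"
      missing=$((missing + 1))
    fi
  done
  # Exit status is the number of empty keys (0 means all credentials are set).
  return "$missing"
}

# Usage: check_env_keys .env || echo "some services will run in degraded mode"
```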

Infrastructure Service Environment

PostgreSQL (postgres):

| Variable | Value | Description |
|---|---|---|
| POSTGRES_DB | stonks | Database created on first start |
| POSTGRES_USER | stonks | Superuser for the database |
| POSTGRES_PASSWORD | stonks_dev | Password for the database user |

MinIO (minio):

| Variable | Value | Description |
|---|---|---|
| MINIO_ROOT_USER | minioadmin | MinIO admin username |
| MINIO_ROOT_PASSWORD | minioadmin | MinIO admin password |

Trino (trino):

| Variable | Value | Description |
|---|---|---|
| MINIO_ACCESS_KEY | minioadmin | Passed to Trino for MinIO catalog access |
| MINIO_SECRET_KEY | minioadmin | Passed to Trino for MinIO catalog access |

Hive Metastore (hive-metastore):

| Variable | Value | Description |
|---|---|---|
| SERVICE_NAME | metastore | Tells Hive to run in metastore-only mode |
| DB_DRIVER | derby | Embedded Derby database for metadata |

Superset (superset):

| Variable | Value | Description |
|---|---|---|
| SUPERSET_SECRET_KEY | stonks-dev-secret-key-change-me | Flask secret key (change in production) |
| ADMIN_USERNAME | admin | Initial admin username |
| ADMIN_PASSWORD | admin | Initial admin password |
| ADMIN_EMAIL | admin@stonks.local | Initial admin email |

Additional Configuration Variables

All application services support additional environment variables loaded via services/shared/config.py. These can be added to individual service environment blocks or to the x-app-env anchor as needed:

| Variable | Default | Description |
|---|---|---|
| REDIS_DB | 0 | Redis database number |
| REDIS_PASSWORD | (none) | Redis password (not needed in Docker Compose) |
| MINIO_SECURE | false | Use HTTPS for MinIO |
| OLLAMA_MODEL | qwen3.5:9b | Default LLM model for extraction |
| OLLAMA_TIMEOUT | 120 | Ollama request timeout (seconds) |
| OLLAMA_MAX_RETRIES | 2 | Max retries for Ollama requests |
| TRINO_HOST | localhost | Trino hostname |
| TRINO_PORT | 8080 | Trino port |
| TRINO_CATALOG | lakehouse | Trino catalog name |
| TRINO_SCHEMA | stonks | Trino schema name |
| MARKET_DATA_BASE_URL | https://api.polygon.io | Polygon.io base URL |
| MARKET_DATA_PROVIDER | polygon | Market data provider |
| BROKER_MODE | paper | Broker mode: paper or live |
| BROKER_PROVIDER | alpaca | Broker provider |
| TRADING_ENABLED | false | Enable autonomous trading engine |
| TRADING_RISK_TIER | moderate | Risk tier: conservative, moderate, aggressive |
| TRADING_POLLING_INTERVAL_SECONDS | 60 | Recommendation polling interval |
| TRADING_MAX_OPEN_POSITIONS | 10 | Maximum concurrent open positions |
| MACRO_ENABLED | true | Enable macro signal layer |
| COMPETITIVE_ENABLED | true | Enable competitive signal layer |
| LOG_LEVEL | INFO | Logging level |
| JSON_LOGS | true | Enable structured JSON logging |
| DEPLOY_STAGE | (empty) | Deployment stage prefix for bucket names |

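See services/shared/config.py for the complete list of supported environment variables and their defaults. Note that a default such as TRINO_HOST=localhost only reaches Trino from the host: inside a container, localhost refers to the container itself, so a service that queries Trino from within Compose would need TRINO_HOST=trino (the Docker service name).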
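A few of these variables only accept specific values. As a hedged sketch, a bash check like the one below can catch typos before the containers start. validate_setting is a hypothetical helper, the allowed values are copied from the table above, and services/shared/config.py remains the authority on what is actually accepted.

```bash
# Hypothetical helper: succeed if VALUE is acceptable for NAME per the table above.
validate_setting() {
  local name="$1" value="$2"
  case "$name" in
    BROKER_MODE)
      [[ "$value" == paper || "$value" == live ]] ;;
    TRADING_RISK_TIER)
      [[ "$value" == conservative || "$value" == moderate || "$value" == aggressive ]] ;;
    TRADING_POLLING_INTERVAL_SECONDS|TRADING_MAX_OPEN_POSITIONS|OLLAMA_TIMEOUT|OLLAMA_MAX_RETRIES)
      [[ "$value" =~ ^[0-9]+$ ]] ;;   # non-negative integers only
    *)
      return 0 ;;                      # not checked by this sketch
  esac
}

# Usage: validate_setting TRADING_RISK_TIER "$TRADING_RISK_TIER" || echo "bad TRADING_RISK_TIER"
```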


Volume Mounts and Data Persistence

Docker Compose defines five named volumes for persistent data:

| Volume | Mounted By | Mount Path | Contents |
|---|---|---|---|
| pgdata | postgres | /var/lib/postgresql/data | PostgreSQL database files |
| miniodata | minio | /data | MinIO object storage (raw artifacts, lakehouse Parquet files) |
| ollama_models | ollama | /root/.ollama | Downloaded LLM model weights |
| hive_data | hive-metastore | /opt/hive/data | Hive metastore Derby database |
| superset_data | superset | /app/superset_home | Superset configuration and metadata |

Bind Mounts

In addition to named volumes, several services use bind mounts for configuration:

| Service | Host Path | Container Path | Mode | Purpose |
|---|---|---|---|---|
| postgres | ./infra/migrations | /docker-entrypoint-initdb.d | rw | SQL migrations auto-applied on first start |
| trino | ./infra/trino/catalog | /etc/trino/catalog | rw | Trino catalog configuration (lakehouse, iceberg) |
| hive-metastore | ./infra/hive/core-site.xml | /opt/hive/conf/core-site.xml | ro | Hadoop core-site config for MinIO access |
| hive-metastore | ./infra/hive/metastore-site.xml | /opt/hive/conf/metastore-site.xml | ro | Hive metastore config |

Resetting Data

To destroy all persistent data and start fresh:

```bash
# Stop all containers and remove named volumes
docker compose down -v
```

This removes pgdata, miniodata, ollama_models, hive_data, and superset_data. The next docker compose up re-initializes PostgreSQL with the migrations and re-creates the MinIO buckets (via minio-init); the Ollama model weights are gone and must be downloaded again (see Troubleshooting).

To reset only specific volumes:

```bash
docker compose down
docker volume rm stonks-oracle_pgdata    # Reset database only
docker compose up -d
```

Note: Volume names are prefixed with the Compose project name, which defaults to the project directory name (e.g., stonks-oracle_pgdata). Use docker volume ls to see the exact names.
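If you script volume resets, the prefix can be derived instead of hard-coded. The sketch below approximates Compose v2's default project naming (lower-case the directory name, keep only a-z, 0-9, _ and -, then trim leading _ or -). The function names are hypothetical, the sketch ignores the -p flag and a top-level name: key in the compose file, and docker volume ls stays the authoritative check.

```bash
# Approximates Compose v2's default project name; ignores `-p` and a top-level `name:`.
compose_project_name() {
  if [[ -n "${COMPOSE_PROJECT_NAME:-}" ]]; then
    printf '%s\n' "$COMPOSE_PROJECT_NAME"   # an explicit name is used as given
    return
  fi
  local name
  # lower-case, keep only [a-z0-9_-], strip leading '_' or '-'
  name="$(basename "$PWD" | tr '[:upper:]' '[:lower:]' | tr -cd 'a-z0-9_-' | sed -E 's/^[_-]+//')"
  printf '%s\n' "$name"
}

# Named volumes are created as <project>_<volume>.
compose_volume_name() {
  printf '%s_%s\n' "$(compose_project_name)" "$1"
}

# Usage: docker volume rm "$(compose_volume_name pgdata)"
```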


Health Checks

Health checks are configured for postgres, redis, minio, and every application service (ollama has none; see Dependency Ordering). Docker Compose uses them to enforce startup ordering via depends_on with condition: service_healthy.

Infrastructure Health Checks

| Service | Test Command | Interval | Retries |
|---|---|---|---|
| postgres | pg_isready -U stonks | 5s | 5 |
| redis | redis-cli ping | 5s | 5 |
| minio | mc ready local | 5s | 5 |

Application Health Checks — FastAPI Services

FastAPI services (symbol-registry, trading-engine, risk-engine, query-api) use their HTTP /health endpoints; the nginx-based dashboard is checked by requesting its root page:

| Service | Test Command | Interval | Timeout | Retries | Start Period |
|---|---|---|---|---|---|
| symbol-registry | curl -f http://localhost:8000/health | 10s | 5s | 3 | 15s |
| trading-engine | curl -f http://localhost:8000/health | 10s | 5s | 3 | 15s |
| risk-engine | curl -f http://localhost:8000/health | 10s | 5s | 3 | 15s |
| query-api | curl -f http://localhost:8000/health | 10s | 5s | 3 | 15s |
| dashboard | curl -f http://localhost:8080/ | 10s | 5s | 3 | 10s |

Application Health Checks — Worker Services

Worker services (no HTTP endpoint) use process liveness checks:

| Service | Test Command | Interval | Timeout | Retries | Start Period |
|---|---|---|---|---|---|
| scheduler | pgrep -f 'python -m services.scheduler.app' | 10s | 5s | 3 | 15s |
| ingestion | pgrep -f 'python -m services.ingestion.worker' | 10s | 5s | 3 | 15s |
| parser | pgrep -f 'python -m services.parser.worker' | 10s | 5s | 3 | 15s |
| extractor | pgrep -f 'python -m services.extractor.main' | 10s | 5s | 3 | 15s |
| aggregation | pgrep -f 'python -m services.aggregation.main' | 10s | 5s | 3 | 15s |
| recommendation | pgrep -f 'python -m services.recommendation.main' | 10s | 5s | 3 | 15s |
| broker-adapter | pgrep -f 'python -m services.adapters.broker_service' | 10s | 5s | 3 | 15s |
| lake-publisher | pgrep -f 'python -m services.lake_publisher.jobs' | 10s | 5s | 3 | 15s |

Verifying Service Health

```bash
# Check all service statuses
docker compose ps

# Check a specific service
docker compose ps query-api

# Inspect health check details for a container
docker inspect --format='{{json .State.Health}}' stonks-oracle-query-api-1 | python3 -m json.tool
```
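To block a script until a container reports healthy, you can poll the same .State.Health data. Recent Compose v2 releases also offer docker compose up -d --wait, which waits for services to be running or healthy; the hypothetical wait_for_healthy function below is a sketch for cases where you want per-container control and your own timeout.

```bash
# Hypothetical helper: poll a container until it is healthy, unhealthy, or TIMEOUT elapses.
# Containers without a health check have no .State.Health, so inspect fails and we keep polling.
wait_for_healthy() {
  local container="$1" timeout="${2:-120}" interval="${3:-2}"
  local waited=0 status=""
  while (( waited < timeout )); do
    status="$(docker inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null)" || status=""
    case "$status" in
      healthy)   echo "$container: healthy"; return 0 ;;
      # "unhealthy" means the retry budget was exhausted, so stop early.
      unhealthy) echo "$container: unhealthy"; return 1 ;;
    esac
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "$container: timed out after ${timeout}s (last status: ${status:-unknown})"
  return 1
}

# Usage: wait_for_healthy stonks-oracle-query-api-1 120
```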

Dockerfile Build Details

docker/Dockerfile — Generic Python Service Image

Used by all Python application services except the scheduler (the dashboard is built from frontend/Dockerfile). Accepts a SERVICE_CMD build argument that determines which service the container runs.

Base image: python:3.12-slim

Build arguments:

| Argument | Default | Description |
|---|---|---|
| SERVICE_CMD | python -m services.scheduler.app | The command executed when the container starts |

What gets copied:

  • requirements.txt → pip dependencies installed
  • services/ → all service source code
  • tests/ → test files (available for in-container testing)
  • conftest.py → pytest configuration

Environment variables set:

  • PYTHONDONTWRITEBYTECODE=1 — no .pyc files
  • PYTHONUNBUFFERED=1 — unbuffered stdout/stderr for log visibility
  • PYTHONPATH=/app — ensures services.* imports resolve

System packages installed: gcc, libpq-dev (PostgreSQL client library), curl (for health checks)

Security: Runs as non-root user stonks (UID 1000).

How SERVICE_CMD works: The CMD directive is sh -c "${SERVICE_CMD}", so the build argument becomes the runtime command. Each service in docker-compose.yml overrides this via the args.SERVICE_CMD build parameter:

```yaml
query-api:
  build:
    context: .
    dockerfile: docker/Dockerfile
    args:
      SERVICE_CMD: "uvicorn services.api.app:app --host 0.0.0.0 --port 8000"
```
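Because the command string is only expanded when sh starts inside the container, the value has to be present in the container's environment at runtime. Docker does not expose ARG values to running containers, so the Dockerfile presumably also persists it with an ENV instruction (for example ENV SERVICE_CMD=${SERVICE_CMD}). The bash sketch below emulates that start-up step locally; run_service_cmd is a hypothetical name.

```bash
# Emulates the container start-up step `sh -c "${SERVICE_CMD}"` (hypothetical helper).
# The command string is read from the environment and re-parsed by sh,
# so quoting and word splitting follow normal shell rules.
run_service_cmd() {
  if [[ -z "${SERVICE_CMD:-}" ]]; then
    # What you would see if the build arg were never persisted as ENV.
    echo "SERVICE_CMD is empty: the build arg was not persisted as ENV" >&2
    return 1
  fi
  sh -c "$SERVICE_CMD"
}

# Usage: SERVICE_CMD='python -m services.parser.worker' run_service_cmd
```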

docker/Dockerfile.scheduler — Scheduler Image

A specialized variant of the generic Dockerfile used only by the scheduler service. Adds postgresql-client for running database migrations via psql.

Additional contents:

  • infra/migrations/ → copied to /app/infra/migrations/ for migration execution
  • postgresql-client system package installed

Command: Hardcoded CMD ["python", "-m", "services.scheduler.app"] (no SERVICE_CMD argument).

docker/Dockerfile.superset — Custom Superset Image

Extends the official Apache Superset image with additional database drivers.

Base image: apache/superset:latest

Additional packages: trino[sqlalchemy], psycopg2-binary, redis

frontend/Dockerfile — Dashboard Image

Multi-stage build for the React dashboard.

Stage 1 — Build (base: node:24-alpine):

| Build Argument | Default | Description |
|---|---|---|
| VITE_QUERY_API_URL | "" | Query API base URL (empty = use relative /api/ proxy) |
| VITE_SYMBOL_REGISTRY_URL | "" | Symbol Registry base URL (empty = use relative /registry/ proxy) |
| VITE_RISK_ENGINE_URL | "" | Risk Engine base URL (empty = use relative /risk/ proxy) |

Stage 2 — Serve (base: nginxinc/nginx-unprivileged:alpine):

  • Serves the built static files on port 8080
  • Uses frontend/nginx.conf for SPA fallback and API reverse proxying
  • Proxies /api/ → query-api:8000, /registry/ → symbol-registry:8000, /risk/ → risk-engine:8000, /trading/ → trading-engine:8000

Building Custom Images

To build a single service image locally:

```bash
# Build the query-api image
docker compose build query-api

# Build with a custom SERVICE_CMD
docker build -t my-custom-service \
  --build-arg SERVICE_CMD="python -m services.my_service.main" \
  -f docker/Dockerfile .

# Build the dashboard with custom API URLs
docker build -t my-dashboard \
  --build-arg VITE_QUERY_API_URL="https://api.example.com" \
  -f frontend/Dockerfile frontend/

# Rebuild all images
docker compose build
```

Dependency Ordering

Docker Compose enforces startup order using depends_on with health check conditions. The dependency graph is:

```text
postgres (healthy) ──┬── scheduler
                     ├── symbol-registry
                     ├── ingestion
                     ├── parser
                     ├── extractor
                     ├── aggregation
                     ├── recommendation
                     ├── trading-engine
                     ├── risk-engine
                     ├── broker-adapter
                     ├── lake-publisher
                     └── query-api

redis (healthy) ─────┬── scheduler
                     ├── ingestion
                     ├── parser
                     ├── extractor
                     ├── aggregation
                     ├── recommendation
                     ├── trading-engine
                     ├── broker-adapter
                     └── query-api

minio (healthy) ─────┬── minio-init
                     ├── ingestion
                     ├── lake-publisher
                     └── query-api

ollama (started) ────── extractor

minio ───────────────── trino
hive-metastore ─────── trino
trino ──────────────── superset (via depends_on)

query-api (healthy) ── dashboard
```

Services with condition: service_healthy wait until the dependency's health check passes. The extractor depends on ollama with condition: service_started (no health check — Ollama may take time to load models).
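Compose resolves this graph itself, but to see one start-up order consistent with it you can feed the edges to coreutils tsort. The sketch below is only an illustration: startup_order is a hypothetical name, each pair reads "dependency dependent", and the edges are transcribed from the diagram above, so they must be kept in sync with docker-compose.yml if reused.

```bash
# Print one valid start-up order for the depends_on graph above (illustration only).
startup_order() {
  tsort <<'EOF'
postgres scheduler  redis scheduler
postgres symbol-registry
postgres ingestion  redis ingestion  minio ingestion
postgres parser  redis parser
postgres extractor  redis extractor  ollama extractor
postgres aggregation  redis aggregation
postgres recommendation  redis recommendation
postgres trading-engine  redis trading-engine
postgres risk-engine
postgres broker-adapter  redis broker-adapter
postgres lake-publisher  minio lake-publisher
postgres query-api  redis query-api  minio query-api
minio minio-init
minio trino  hive-metastore trino  trino superset
query-api dashboard
EOF
}

# Usage: startup_order   # prints 21 service names, dependencies first
```

tsort guarantees only that every dependency precedes its dependents; the exact order Compose uses may differ.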


Operational Commands

Starting Services

```bash
# Start all services in the background
docker compose up -d

# Start only infrastructure (useful for local development)
docker compose up -d postgres redis minio minio-init ollama

# Start a specific service and its dependencies
docker compose up -d query-api
```

Stopping Services

```bash
# Stop all services (preserves volumes)
docker compose down

# Stop all services and remove volumes (full reset)
docker compose down -v

# Stop a specific service
docker compose stop trading-engine
```

Restarting Services

```bash
# Restart a specific service
docker compose restart query-api

# Restart with a fresh build
docker compose up -d --build query-api

# Force recreate a service (picks up compose file changes)
docker compose up -d --force-recreate query-api
```

Viewing Logs

```bash
# Follow logs for all services
docker compose logs -f

# Follow logs for a specific service
docker compose logs -f query-api

# View last 50 lines of a service's logs
docker compose logs --tail=50 ingestion

# View logs for multiple services
docker compose logs -f scheduler ingestion extractor
```

Scaling Replicas

```bash
# Scale a worker service to 3 replicas
docker compose up -d --scale ingestion=3

# Scale multiple services
docker compose up -d --scale ingestion=3 --scale extractor=2

# Scale back to 1
docker compose up -d --scale ingestion=1
```

Note: Scaling works best for worker services (ingestion, parser, extractor, aggregation, recommendation, broker-adapter, lake-publisher) that consume from Redis queues. Do not scale FastAPI services that publish host ports without first adjusting their port mappings, because every replica would try to bind the same host port.
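The rule in the note above can be encoded in a small guard so that a typo does not scale a port-bound service. This is a hedged sketch: scale_worker is a hypothetical helper, and its allow-list is copied from the note (the scheduler is deliberately left out).

```bash
# Hypothetical guard: only scale the queue-consuming workers named in the note above.
scale_worker() {
  local service="$1" replicas="$2"
  case "$service" in
    ingestion|parser|extractor|aggregation|recommendation|broker-adapter|lake-publisher) ;;
    *) echo "refusing to scale $service: not a queue worker" >&2; return 1 ;;
  esac
  if [[ ! "$replicas" =~ ^[1-9][0-9]*$ ]]; then
    echo "invalid replica count: $replicas" >&2
    return 1
  fi
  # Naming the service limits `up` to that service (plus its dependencies).
  docker compose up -d --scale "$service=$replicas" "$service"
}

# Usage: scale_worker extractor 2
```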

Inspecting Services

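```bash
# List all services and their status
docker compose ps

# Show the processes running in each container
docker compose top

# View CPU and memory usage
docker stats --no-stream

# Execute a command inside a running container
docker compose exec query-api python -c "from services.shared.config import load_config; print(load_config())"

# Open a psql session in the postgres container
docker compose exec postgres psql -U stonks -d stonks

# Open a shell in an application container
docker compose exec query-api sh
```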

Full Reset

```bash
# Nuclear option: stop everything, remove volumes, rebuild, restart
docker compose down -v
docker compose build --no-cache
docker compose up -d
```

This destroys all data (database, object storage, model weights, metastore, Superset config) and starts from scratch. PostgreSQL migrations are re-applied automatically. MinIO buckets are re-created by minio-init. Ollama models must be re-downloaded.


MinIO Bucket Initialization

The minio-init service runs once on startup and creates the required object storage buckets:

| Bucket | Purpose |
|---|---|
| stonks-raw-market | Raw market data from Polygon.io |
| stonks-raw-news | Raw news articles |
| stonks-raw-filings | Raw SEC filings |
| stonks-normalized | Normalized/parsed documents |
| stonks-llm-prompts | LLM prompt archives |
| stonks-llm-results | LLM extraction results |
| stonks-lakehouse | Parquet fact tables for Trino |
| stonks-audit | Audit trail artifacts |
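For reference, the bucket setup can be reproduced by hand with the MinIO client. The sketch below uses the standard mc alias set and mc mb --ignore-existing commands; it is not minio-init's actual entrypoint (which may differ), create_buckets is a hypothetical name, and it does not apply any DEPLOY_STAGE prefix.

```bash
# Sketch of commands equivalent to minio-init's job (its real entrypoint may differ).
STONKS_BUCKETS=(stonks-raw-market stonks-raw-news stonks-raw-filings stonks-normalized
  stonks-llm-prompts stonks-llm-results stonks-lakehouse stonks-audit)

create_buckets() {
  local endpoint="${1:-http://minio:9000}" bucket
  mc alias set local "$endpoint" "${MINIO_ROOT_USER:-minioadmin}" "${MINIO_ROOT_PASSWORD:-minioadmin}" >/dev/null || return 1
  for bucket in "${STONKS_BUCKETS[@]}"; do
    # --ignore-existing makes re-runs idempotent
    mc mb --ignore-existing "local/$bucket" || return 1
  done
}

# Usage from the host: create_buckets http://localhost:9000
```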

Access the MinIO console at http://localhost:9001 (credentials: minioadmin / minioadmin).


Dashboard Reverse Proxy

The dashboard container runs nginx with reverse proxy rules that route API requests to backend services using Docker Compose service names:

| Path | Proxied To | Service |
|---|---|---|
| /api/ | http://query-api:8000 | Query API |
| /registry/ | http://symbol-registry:8000/ | Symbol Registry API |
| /risk/ | http://risk-engine:8000/ | Risk Engine API |
| /trading/ | http://trading-engine:8000/ | Trading Engine API |

All other paths serve the React SPA with try_files fallback to index.html.
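The trailing slashes in the table matter. Under nginx's proxy_pass rule, when the upstream URL contains a URI part (even a bare /), the portion of the request path matched by the location prefix is replaced by it; when it has no URI part, the request path is forwarded unchanged. Assuming nginx.conf uses prefix locations with exactly these proxy_pass values, /registry/symbols reaches symbol-registry as /symbols, while /api/... keeps its /api/ prefix, so query-api would need to serve its routes under /api/. The hypothetical upstream_url function below models that rule.

```bash
# Models nginx proxy_pass URI handling for the table above (prefix locations assumed).
#  - no URI in proxy_pass (query-api): request path forwarded unchanged
#  - proxy_pass ending in "/": matched location prefix replaced by "/"
upstream_url() {
  local path="$1"
  case "$path" in
    /api/*)      printf 'http://query-api:8000%s\n' "$path" ;;
    /registry/*) printf 'http://symbol-registry:8000/%s\n' "${path#/registry/}" ;;
    /risk/*)     printf 'http://risk-engine:8000/%s\n' "${path#/risk/}" ;;
    /trading/*)  printf 'http://trading-engine:8000/%s\n' "${path#/trading/}" ;;
    *)           printf 'SPA: %s (try_files, falls back to /index.html)\n' "$path" ;;
  esac
}

# Usage: upstream_url /registry/symbols
```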


Troubleshooting

Service won't start

Check dependency health:

```bash
docker compose ps postgres redis minio
```

If an infrastructure service never becomes healthy, the application services that depend on it are not started. Check the infrastructure logs:

```bash
docker compose logs postgres
```

Database migration errors

Migrations in ./infra/migrations/ are applied by PostgreSQL's docker-entrypoint-initdb.d mechanism, which only runs on first database initialization. If you need to re-run migrations:

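```bash
docker compose down -v   # Removes ALL named volumes, including pgdata and ollama_models
docker compose up -d     # Migrations re-applied on fresh init
```

To keep object storage and model weights, remove only the pgdata volume instead, as described in Resetting Data.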
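If the migrations are written to be idempotent, they can also be re-applied to a running database without deleting anything. The sketch below is hedged: list_migrations and apply_migrations are hypothetical helpers, it handles only *.sql files (the postgres entrypoint also runs other file types), and it sorts with LC_ALL=C, which matches the entrypoint's locale-sorted order for typical numeric-prefixed names. The scheduler image also ships psql and the migration files, so check whether it already applies migrations on startup before relying on this.

```bash
# List *.sql migrations in a stable byte-wise order (e.g. 001_... before 010_...).
list_migrations() {
  local dir="${1:-infra/migrations}"
  find "$dir" -maxdepth 1 -type f -name '*.sql' | LC_ALL=C sort
}

# Feed each file to psql in the postgres container; stop at the first error.
apply_migrations() {
  local f
  while IFS= read -r f; do
    echo "applying $f"
    docker compose exec -T postgres psql -U stonks -d stonks -v ON_ERROR_STOP=1 < "$f" || return 1
  done < <(list_migrations "$1")
}

# Usage (from the repository root): apply_migrations infra/migrations
```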

Ollama model not available

The extractor service needs an LLM model loaded in Ollama. Pull a model manually:

```bash
docker compose exec ollama ollama pull qwen3.5:9b
```
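To avoid re-pulling a model that is already present, you can first query Ollama's GET /api/tags endpoint, which lists local models by name (including the tag). The sketch below is hypothetical (ollama_has_model and ensure_model are illustrative names) and uses grep on Ollama's compact JSON to stay dependency-free; jq would be more robust.

```bash
# Succeed if MODEL (name:tag) is listed by Ollama's /api/tags endpoint.
ollama_has_model() {
  local model="$1" url="${2:-http://localhost:11434}"
  curl -fsS "$url/api/tags" | grep -qF "\"name\":\"$model\""
}

# Pull the model only when it is missing (default taken from OLLAMA_MODEL's default).
ensure_model() {
  local model="${1:-qwen3.5:9b}"
  ollama_has_model "$model" || docker compose exec ollama ollama pull "$model"
}

# Usage: ensure_model qwen3.5:9b
```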

Port conflicts

If a port is already in use, modify the host port mapping in docker-compose.yml:

```yaml
query-api:
  ports:
    - "9004:8000"   # Changed from 8004 to 9004
```
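To find conflicts before running docker compose up, you can probe each host port from the Port Summary. This bash sketch uses bash's /dev/tcp pseudo-device (a successful connect means something is already listening on 127.0.0.1). The function names are hypothetical, and if the stack is already running every port will be reported as in use.

```bash
# Host ports from the Port Summary table.
STONKS_PORTS=(3000 5432 6379 8001 8002 8003 8004 8080 8088 9000 9001 9083 11434)

port_in_use() {
  # The subshell closes the fd opened by exec automatically.
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

check_ports() {
  local port busy=0
  for port in "${STONKS_PORTS[@]}"; do
    if port_in_use "$port"; then
      echo "port $port is in use"
      busy=1
    fi
  done
  return "$busy"
}

# Usage: check_ports || echo "adjust the host port mappings in docker-compose.yml"
```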