# Docker Deployment Guide

This guide covers running the full Stonks Oracle platform locally using Docker Compose. It documents every service, environment variable, volume mount, health check, and operational command.

## Prerequisites

- Docker Engine 24+ and Docker Compose v2
- At least 16 GB RAM (Ollama + Trino + all services)
- API keys for Polygon.io and Alpaca (optional; the platform runs in degraded mode without them)

## Quick Start

```bash
# 1. Clone the repository
git clone <repository-url> && cd stonks-oracle

# 2. Configure API keys
cp .env.example .env   # or edit the existing .env
# Fill in MARKET_DATA_API_KEY, BROKER_API_KEY, BROKER_API_SECRET

# 3. Start everything
docker compose up -d

# 4. Verify all services are healthy
docker compose ps

# 5. Open the dashboard (`open` is macOS; use xdg-open on Linux)
open http://localhost:3000
```

---

## Service Inventory

### Infrastructure Services

| Service | Image | Ports | Volumes | Purpose |
|---------|-------|-------|---------|---------|
| `postgres` | `postgres:16-alpine` | `5432:5432` | `pgdata` → `/var/lib/postgresql/data`, `./infra/migrations` → `/docker-entrypoint-initdb.d` | Primary database; migrations auto-applied on first start |
| `redis` | `redis:7-alpine` | `6379:6379` | — | Queue broker, caching, deduplication |
| `minio` | `minio/minio:latest` | `9000:9000` (API), `9001:9001` (console) | `miniodata` → `/data` | Object storage for raw artifacts and lakehouse |
| `minio-init` | `minio/mc:latest` | — | — | One-shot init container that creates required buckets |
| `ollama` | `ollama/ollama:latest` | `11434:11434` | `ollama_models` → `/root/.ollama` | LLM inference server for extraction and classification |
| `trino` | `trinodb/trino:latest` | `8080:8080` | `./infra/trino/catalog` → `/etc/trino/catalog` | SQL query engine over the lakehouse |
| `hive-metastore` | `apache/hive:4.0.0` | `9083:9083` | `hive_data` → `/opt/hive/data`, `./infra/hive/core-site.xml` → `/opt/hive/conf/core-site.xml`, `./infra/hive/metastore-site.xml` → `/opt/hive/conf/metastore-site.xml` | Iceberg/Hive metadata catalog for Trino |
| `superset` | `apache/superset:latest` | `8088:8088` | `superset_data` → `/app/superset_home` | BI dashboards over Trino |

### Application Services

| Service | Dockerfile | `SERVICE_CMD` / Command | Ports | Depends On |
|---------|-----------|------------------------|-------|------------|
| `scheduler` | `docker/Dockerfile.scheduler` | `python -m services.scheduler.app` | — | postgres (healthy), redis (healthy) |
| `symbol-registry` | `docker/Dockerfile` | `uvicorn services.symbol_registry.app:app --host 0.0.0.0 --port 8000` | `8001:8000` | postgres (healthy) |
| `ingestion` | `docker/Dockerfile` | `python -m services.ingestion.worker` | — | postgres (healthy), redis (healthy), minio (healthy) |
| `parser` | `docker/Dockerfile` | `python -m services.parser.worker` | — | postgres (healthy), redis (healthy) |
| `extractor` | `docker/Dockerfile` | `python -m services.extractor.main` | — | postgres (healthy), redis (healthy), ollama (started) |
| `aggregation` | `docker/Dockerfile` | `python -m services.aggregation.main` | — | postgres (healthy), redis (healthy) |
| `recommendation` | `docker/Dockerfile` | `python -m services.recommendation.main` | — | postgres (healthy), redis (healthy) |
| `trading-engine` | `docker/Dockerfile` | `uvicorn services.trading.app:app --host 0.0.0.0 --port 8000` | `8002:8000` | postgres (healthy), redis (healthy) |
| `risk-engine` | `docker/Dockerfile` | `uvicorn services.risk.app:app --host 0.0.0.0 --port 8000` | `8003:8000` | postgres (healthy) |
| `broker-adapter` | `docker/Dockerfile` | `python -m services.adapters.broker_service` | — | postgres (healthy), redis (healthy) |
| `lake-publisher` | `docker/Dockerfile` | `python -m services.lake_publisher.jobs` | — | postgres (healthy), minio (healthy) |
| `query-api` | `docker/Dockerfile` | `uvicorn services.api.app:app --host 0.0.0.0 --port 8000` | `8004:8000` | postgres (healthy), redis (healthy), minio (healthy) |
| `dashboard` | `frontend/Dockerfile` | nginx (built-in) | `3000:8080` | query-api (healthy) |

### Port Summary

| Port | Service | Protocol |
|------|---------|----------|
| 3000 | Dashboard (React UI) | HTTP |
| 5432 | PostgreSQL | TCP |
| 6379 | Redis | TCP |
| 8001 | Symbol Registry API | HTTP |
| 8002 | Trading Engine API | HTTP |
| 8003 | Risk Engine API | HTTP |
| 8004 | Query API | HTTP |
| 8080 | Trino | HTTP |
| 8088 | Superset | HTTP |
| 9000 | MinIO API | HTTP |
| 9001 | MinIO Console | HTTP |
| 9083 | Hive Metastore | Thrift |
| 11434 | Ollama | HTTP |

---

## Environment Variables

### Shared Application Environment (`x-app-env`)

All application services inherit these variables via the `x-app-env` YAML anchor:

| Variable | Default | Description |
|----------|---------|-------------|
| `POSTGRES_HOST` | `postgres` | PostgreSQL hostname (Docker service name) |
| `POSTGRES_PORT` | `5432` | PostgreSQL port |
| `POSTGRES_DB` | `stonks` | Database name |
| `POSTGRES_USER` | `stonks` | Database user |
| `POSTGRES_PASSWORD` | `stonks_dev` | Database password |
| `REDIS_HOST` | `redis` | Redis hostname (Docker service name) |
| `REDIS_PORT` | `6379` | Redis port |
| `MINIO_ENDPOINT` | `minio:9000` | MinIO API endpoint |
| `MINIO_ACCESS_KEY` | `minioadmin` | MinIO access key |
| `MINIO_SECRET_KEY` | `minioadmin` | MinIO secret key |
| `OLLAMA_BASE_URL` | `http://ollama:11434` | Ollama LLM server URL |

### `.env` File

The `.env` file is loaded by `ingestion`, `broker-adapter`, and `trading-engine` via the `env_file` directive. Create it in the repository root:

```dotenv
# Stonks Oracle: Environment Variables
# These are loaded by ingestion, broker-adapter, and trading-engine services.

# Polygon.io market data API key (required for live data ingestion)
MARKET_DATA_API_KEY=

# Alpaca broker credentials (required for paper/live trading)
BROKER_API_KEY=
BROKER_API_SECRET=
BROKER_BASE_URL=https://paper-api.alpaca.markets
```

| Variable | Required | Default | Used By | Description |
|----------|----------|---------|---------|-------------|
| `MARKET_DATA_API_KEY` | No\* | (empty) | ingestion | Polygon.io API key for market data fetching |
| `BROKER_API_KEY` | No\* | (empty) | broker-adapter, trading-engine | Alpaca API key |
| `BROKER_API_SECRET` | No\* | (empty) | broker-adapter, trading-engine | Alpaca API secret |
| `BROKER_BASE_URL` | No | `https://paper-api.alpaca.markets` | broker-adapter, trading-engine | Alpaca API base URL |

\*Services start without these keys but run in degraded mode: ingestion cannot fetch market data, and the broker adapter cannot execute trades.

### Infrastructure Service Environment

**PostgreSQL** (`postgres`):

| Variable | Value | Description |
|----------|-------|-------------|
| `POSTGRES_DB` | `stonks` | Database created on first start |
| `POSTGRES_USER` | `stonks` | Superuser for the database |
| `POSTGRES_PASSWORD` | `stonks_dev` | Password for the database user |

**MinIO** (`minio`):

| Variable | Value | Description |
|----------|-------|-------------|
| `MINIO_ROOT_USER` | `minioadmin` | MinIO admin username |
| `MINIO_ROOT_PASSWORD` | `minioadmin` | MinIO admin password |

**Trino** (`trino`):

| Variable | Value | Description |
|----------|-------|-------------|
| `MINIO_ACCESS_KEY` | `minioadmin` | Passed to Trino for MinIO catalog access |
| `MINIO_SECRET_KEY` | `minioadmin` | Passed to Trino for MinIO catalog access |

**Hive Metastore** (`hive-metastore`):

| Variable | Value | Description |
|----------|-------|-------------|
| `SERVICE_NAME` | `metastore` | Tells Hive to run in metastore-only mode |
| `DB_DRIVER` | `derby` | Embedded Derby database for metadata |

**Superset** (`superset`):

| Variable | Value | Description |
|----------|-------|-------------|
| `SUPERSET_SECRET_KEY` | `stonks-dev-secret-key-change-me` | Flask secret key (change in production) |
| `ADMIN_USERNAME` | `admin` | Initial admin username |
| `ADMIN_PASSWORD` | `admin` | Initial admin password |
| `ADMIN_EMAIL` | `admin@stonks.local` | Initial admin email |

### Additional Configuration Variables

All application services support additional environment variables loaded via `services/shared/config.py`. You can add them to individual service `environment` blocks or to the `x-app-env` anchor as needed:

| Variable | Default | Description |
|----------|---------|-------------|
| `REDIS_DB` | `0` | Redis database number |
| `REDIS_PASSWORD` | (none) | Redis password (not needed in Docker Compose) |
| `MINIO_SECURE` | `false` | Use HTTPS for MinIO |
| `OLLAMA_MODEL` | `qwen3.5:9b` | Default LLM model for extraction |
| `OLLAMA_TIMEOUT` | `120` | Ollama request timeout (seconds) |
| `OLLAMA_MAX_RETRIES` | `2` | Max retries for Ollama requests |
| `TRINO_HOST` | `localhost` | Trino hostname |
| `TRINO_PORT` | `8080` | Trino port |
| `TRINO_CATALOG` | `lakehouse` | Trino catalog name |
| `TRINO_SCHEMA` | `stonks` | Trino schema name |
| `MARKET_DATA_BASE_URL` | `https://api.polygon.io` | Polygon.io base URL |
| `MARKET_DATA_PROVIDER` | `polygon` | Market data provider |
| `BROKER_MODE` | `paper` | Broker mode: `paper` or `live` |
| `BROKER_PROVIDER` | `alpaca` | Broker provider |
| `TRADING_ENABLED` | `false` | Enable autonomous trading engine |
| `TRADING_RISK_TIER` | `moderate` | Risk tier: `conservative`, `moderate`, `aggressive` |
| `TRADING_POLLING_INTERVAL_SECONDS` | `60` | Recommendation polling interval (seconds) |
| `TRADING_MAX_OPEN_POSITIONS` | `10` | Maximum concurrent open positions |
| `MACRO_ENABLED` | `true` | Enable macro signal layer |
| `COMPETITIVE_ENABLED` | `true` | Enable competitive signal layer |
| `LOG_LEVEL` | `INFO` | Logging level |
| `JSON_LOGS` | `true` | Enable structured JSON logging |
| `DEPLOY_STAGE` | (empty) | Deployment stage prefix for bucket names |

Defaults that point at `localhost`, such as `TRINO_HOST`, only work for processes running on the host. Inside the Compose network, a container reaches Trino at `trino:8080`, so set `TRINO_HOST=trino` for any service that queries it.

See `services/shared/config.py` for the complete list of supported environment variables and their defaults.

---

## Volume Mounts and Data Persistence

Docker Compose defines five named volumes for persistent data:

| Volume | Mounted By | Mount Path | Contents |
|--------|-----------|------------|----------|
| `pgdata` | postgres | `/var/lib/postgresql/data` | PostgreSQL database files |
| `miniodata` | minio | `/data` | MinIO object storage (raw artifacts, lakehouse Parquet files) |
| `ollama_models` | ollama | `/root/.ollama` | Downloaded LLM model weights |
| `hive_data` | hive-metastore | `/opt/hive/data` | Hive metastore Derby database |
| `superset_data` | superset | `/app/superset_home` | Superset configuration and metadata |

### Bind Mounts

In addition to named volumes, several services use bind mounts for configuration:

| Service | Host Path | Container Path | Mode | Purpose |
|---------|-----------|---------------|------|---------|
| postgres | `./infra/migrations` | `/docker-entrypoint-initdb.d` | rw | SQL migrations auto-applied on first start |
| trino | `./infra/trino/catalog` | `/etc/trino/catalog` | rw | Trino catalog configuration (lakehouse, iceberg) |
| hive-metastore | `./infra/hive/core-site.xml` | `/opt/hive/conf/core-site.xml` | ro | Hadoop core-site config for MinIO access |
| hive-metastore | `./infra/hive/metastore-site.xml` | `/opt/hive/conf/metastore-site.xml` | ro | Hive metastore config |

### Resetting Data

To destroy all persistent data and start fresh:

```bash
# Stop all containers and remove named volumes
docker compose down -v
```

This removes `pgdata`, `miniodata`, `ollama_models`, `hive_data`, and `superset_data`. The next `docker compose up` re-initializes PostgreSQL with the migrations and re-creates the MinIO buckets (via `minio-init`). Downloaded Ollama model weights are lost and must be pulled again (see "Ollama model not available" under Troubleshooting).
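Compose prefixes each named volume with the project name, which is why volume names on disk differ from the names in the table above. The following is a rough sketch, not a project script. It assumes the default project name comes from the directory name, lowercased, with characters other than letters, digits, `_` and `-` dropped and leading `_`/`-` stripped. `COMPOSE_PROJECT_NAME` or `docker compose -p` override this. The helper names are hypothetical.

```python
import re
from pathlib import Path

# Named volumes declared in docker-compose.yml (from the table above).
NAMED_VOLUMES = ("pgdata", "miniodata", "ollama_models", "hive_data", "superset_data")


def default_project_name(project_dir: str) -> str:
    """Approximate Compose v2's default project name for a directory (assumption).

    Lowercases the directory basename, keeps only [a-z0-9_-], and strips
    leading '_' or '-'. COMPOSE_PROJECT_NAME or `-p` would override this.
    """
    base = Path(project_dir).name.lower()
    kept = "".join(re.findall(r"[a-z0-9_-]", base))
    return kept.lstrip("_-")


def compose_volume_names(project_dir: str) -> list[str]:
    """Return the prefixed names Docker uses for this project's named volumes."""
    project = default_project_name(project_dir)
    return [f"{project}_{volume}" for volume in NAMED_VOLUMES]


print(compose_volume_names("/home/me/stonks-oracle"))
```

Compare the output with `docker volume ls` before removing anything with `docker volume rm`.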
To reset only specific volumes:

```bash
docker compose down
docker volume rm stonks-oracle_pgdata   # Reset database only
docker compose up -d
```

> **Note**: Volume names are prefixed with the project directory name (e.g., `stonks-oracle_pgdata`). Use `docker volume ls` to see exact names.

---

## Health Checks

PostgreSQL, Redis, MinIO, and every application service have a health check configured. Docker Compose uses these to enforce startup ordering via `depends_on` with `condition: service_healthy`.

### Infrastructure Health Checks

| Service | Test Command | Interval | Retries |
|---------|-------------|----------|---------|
| `postgres` | `pg_isready -U stonks` | 5s | 5 |
| `redis` | `redis-cli ping` | 5s | 5 |
| `minio` | `mc ready local` | 5s | 5 |

### Application Health Checks: FastAPI Services

The FastAPI services (symbol-registry, trading-engine, risk-engine, query-api) are probed on their HTTP `/health` endpoints. The nginx-based dashboard is probed at its root path:

| Service | Test Command | Interval | Timeout | Retries | Start Period |
|---------|-------------|----------|---------|---------|-------------|
| `symbol-registry` | `curl -f http://localhost:8000/health` | 10s | 5s | 3 | 15s |
| `trading-engine` | `curl -f http://localhost:8000/health` | 10s | 5s | 3 | 15s |
| `risk-engine` | `curl -f http://localhost:8000/health` | 10s | 5s | 3 | 15s |
| `query-api` | `curl -f http://localhost:8000/health` | 10s | 5s | 3 | 15s |
| `dashboard` | `curl -f http://localhost:8080/` | 10s | 5s | 3 | 10s |

### Application Health Checks: Worker Services

Worker services have no HTTP endpoint. Instead, they use process liveness checks, where `pgrep -f` matches the service's command line:

| Service | Test Command | Interval | Timeout | Retries | Start Period |
|---------|-------------|----------|---------|---------|-------------|
| `scheduler` | `pgrep -f 'python -m services.scheduler.app'` | 10s | 5s | 3 | 15s |
| `ingestion` | `pgrep -f 'python -m services.ingestion.worker'` | 10s | 5s | 3 | 15s |
| `parser` | `pgrep -f 'python -m services.parser.worker'` | 10s | 5s | 3 | 15s |
| `extractor` | `pgrep -f 'python -m services.extractor.main'` | 10s | 5s | 3 | 15s |
| `aggregation` | `pgrep -f 'python -m services.aggregation.main'` | 10s | 5s | 3 | 15s |
| `recommendation` | `pgrep -f 'python -m services.recommendation.main'` | 10s | 5s | 3 | 15s |
| `broker-adapter` | `pgrep -f 'python -m services.adapters.broker_service'` | 10s | 5s | 3 | 15s |
| `lake-publisher` | `pgrep -f 'python -m services.lake_publisher.jobs'` | 10s | 5s | 3 | 15s |

### Verifying Service Health

```bash
# Check all service statuses
docker compose ps

# Check a specific service
docker compose ps query-api

# Inspect health check details for a container
docker inspect --format='{{json .State.Health}}' stonks-oracle-query-api-1 | python -m json.tool
```

---

## Dockerfile Build Details

### `docker/Dockerfile`: Generic Python Service Image

Used by all Python application services except the scheduler. It accepts a `SERVICE_CMD` build argument that determines which service the container runs.

**Base image**: `python:3.12-slim`

**Build arguments**:

| Argument | Default | Description |
|----------|---------|-------------|
| `SERVICE_CMD` | `python -m services.scheduler.app` | The command executed when the container starts |

**What gets copied**:

- `requirements.txt` → pip dependencies installed
- `services/` → all service source code
- `tests/` → test files (available for in-container testing)
- `conftest.py` → pytest configuration

**Environment variables set**:

- `PYTHONDONTWRITEBYTECODE=1`: no `.pyc` files
- `PYTHONUNBUFFERED=1`: unbuffered stdout/stderr for log visibility
- `PYTHONPATH=/app`: ensures `services.*` imports resolve

**System packages installed**: `gcc`, `libpq-dev` (PostgreSQL client library), `curl` (for health checks)

**Security**: Runs as non-root user `stonks` (UID 1000).

**How `SERVICE_CMD` works**: The `CMD` directive is `sh -c "${SERVICE_CMD}"`, so the build argument becomes the runtime command. `ARG` values are only available during the build, so for `sh -c` to expand `${SERVICE_CMD}` at container start, the value must also be persisted as an environment variable (`ENV SERVICE_CMD=${SERVICE_CMD}`). Each service in `docker-compose.yml` sets its own command via the `args.SERVICE_CMD` build parameter:

```yaml
query-api:
  build:
    context: .
    dockerfile: docker/Dockerfile
    args:
      SERVICE_CMD: "uvicorn services.api.app:app --host 0.0.0.0 --port 8000"
```

### `docker/Dockerfile.scheduler`: Scheduler Image

A specialized variant of the generic Dockerfile used only by the `scheduler` service. It adds `postgresql-client` so the scheduler can run database migrations via `psql`.

**Additional contents**:

- `infra/migrations/` → copied to `/app/infra/migrations/` for migration execution
- `postgresql-client` system package installed

**Command**: Hardcoded `CMD ["python", "-m", "services.scheduler.app"]` (no `SERVICE_CMD` argument).

### `docker/Dockerfile.superset`: Custom Superset Image

Extends the official Apache Superset image with additional database drivers.

**Base image**: `apache/superset:latest`

**Additional packages**: `trino[sqlalchemy]`, `psycopg2-binary`, `redis`

### `frontend/Dockerfile`: Dashboard Image

Multi-stage build for the React dashboard.

**Stage 1: Build** (base: `node:24-alpine`):

| Build Argument | Default | Description |
|---------------|---------|-------------|
| `VITE_QUERY_API_URL` | `""` | Query API base URL (empty = use relative `/api/` proxy) |
| `VITE_SYMBOL_REGISTRY_URL` | `""` | Symbol Registry base URL (empty = use relative `/registry/` proxy) |
| `VITE_RISK_ENGINE_URL` | `""` | Risk Engine base URL (empty = use relative `/risk/` proxy) |

**Stage 2: Serve** (base: `nginxinc/nginx-unprivileged:alpine`):

- Serves the built static files on port 8080
- Uses `frontend/nginx.conf` for SPA fallback and API reverse proxying
- Proxies `/api/` → `query-api:8000`, `/registry/` → `symbol-registry:8000`, `/risk/` → `risk-engine:8000`, `/trading/` → `trading-engine:8000`

### Building Custom Images

To build a single service image locally:

```bash
# Build the query-api image
docker compose build query-api

# Build with a custom SERVICE_CMD
docker build -t my-custom-service \
  --build-arg SERVICE_CMD="python -m services.my_service.main" \
  -f docker/Dockerfile .

# Build the dashboard with custom API URLs
docker build -t my-dashboard \
  --build-arg VITE_QUERY_API_URL="https://api.example.com" \
  -f frontend/Dockerfile frontend/

# Rebuild all images
docker compose build
```

---

## Dependency Ordering

Docker Compose enforces startup order using `depends_on` with health check conditions. The dependency graph is:

```
postgres (healthy) ──┬── scheduler
                     ├── symbol-registry
                     ├── ingestion
                     ├── parser
                     ├── extractor
                     ├── aggregation
                     ├── recommendation
                     ├── trading-engine
                     ├── risk-engine
                     ├── broker-adapter
                     ├── lake-publisher
                     └── query-api

redis (healthy) ─────┬── scheduler
                     ├── ingestion
                     ├── parser
                     ├── extractor
                     ├── aggregation
                     ├── recommendation
                     ├── trading-engine
                     ├── broker-adapter
                     └── query-api

minio (healthy) ─────┬── minio-init
                     ├── ingestion
                     ├── lake-publisher
                     └── query-api

ollama (started) ────── extractor
minio ───────────────── trino
hive-metastore ──────── trino
trino ───────────────── superset (via depends_on)
query-api (healthy) ─── dashboard
```

Services that use `condition: service_healthy` wait until the dependency's health check passes. Ollama has no health check, so the `extractor` depends on it with `condition: service_started`. As a result, the extractor may start before Ollama has finished loading its models.
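The ordering above can be checked by sorting the graph topologically. The sketch below uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The `DEPENDS_ON` map is transcribed by hand from the graph and the Application Services table, so it is not read from `docker-compose.yml`. `startup_waves` is a hypothetical helper: each "wave" contains services whose dependencies all appear in earlier waves. Compose applies its own health conditions on top of this ordering.

```python
from graphlib import TopologicalSorter

# service -> services it depends on (transcribed from the graph above).
DEPENDS_ON = {
    "minio-init": {"minio"},
    "scheduler": {"postgres", "redis"},
    "symbol-registry": {"postgres"},
    "ingestion": {"postgres", "redis", "minio"},
    "parser": {"postgres", "redis"},
    "extractor": {"postgres", "redis", "ollama"},
    "aggregation": {"postgres", "redis"},
    "recommendation": {"postgres", "redis"},
    "trading-engine": {"postgres", "redis"},
    "risk-engine": {"postgres"},
    "broker-adapter": {"postgres", "redis"},
    "lake-publisher": {"postgres", "minio"},
    "query-api": {"postgres", "redis", "minio"},
    "trino": {"minio", "hive-metastore"},
    "superset": {"trino"},
    "dashboard": {"query-api"},
}


def startup_waves(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group services so every service's dependencies sit in an earlier wave."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # nodes whose dependencies are all done
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(startup_waves(DEPENDS_ON), start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

The first wave contains only the infrastructure services. `superset` and `dashboard` land in the last wave, because each sits behind another application-level service.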
---

## Operational Commands

### Starting Services

```bash
# Start all services in the background
docker compose up -d

# Start only infrastructure (useful for local development)
docker compose up -d postgres redis minio minio-init ollama

# Start a specific service and its dependencies
docker compose up -d query-api
```

### Stopping Services

```bash
# Stop all services (preserves volumes)
docker compose down

# Stop all services and remove volumes (full reset)
docker compose down -v

# Stop a specific service
docker compose stop trading-engine
```

### Restarting Services

```bash
# Restart a specific service
docker compose restart query-api

# Restart with a fresh build
docker compose up -d --build query-api

# Recreate a service's container even if its configuration has not changed
docker compose up -d --force-recreate query-api
```

### Viewing Logs

```bash
# Follow logs for all services
docker compose logs -f

# Follow logs for a specific service
docker compose logs -f query-api

# View the last 50 lines of a service's logs
docker compose logs --tail=50 ingestion

# Follow logs for multiple services
docker compose logs -f scheduler ingestion extractor
```

### Scaling Replicas

```bash
# Scale a worker service to 3 replicas
docker compose up -d --scale ingestion=3

# Scale multiple services
docker compose up -d --scale ingestion=3 --scale extractor=2

# Scale back to 1
docker compose up -d --scale ingestion=1
```

> **Note**: Scaling works best for worker services (ingestion, parser, extractor, aggregation, recommendation, broker-adapter, lake-publisher) that consume from Redis queues. Do not scale FastAPI services that expose host ports without adjusting port mappings.
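The port caveat can be made concrete by looking at the short-syntax `ports` entries in `docker-compose.yml`. The sketch below shows the idea behind that warning and is not project tooling: a mapping such as `"8004:8000"` pins one host port, so a second replica would fail to bind it. `pins_host_port` and `can_scale` are hypothetical helpers. They handle only the `CONTAINER`, `HOST:CONTAINER` and `IP:HOST:CONTAINER` forms, and IPv6 addresses are not parsed. The code assumes that an empty host field or a host port range lets each replica get its own port.

```python
def pins_host_port(mapping: str) -> bool:
    """Return True if a Compose short-syntax port mapping binds one fixed host port."""
    spec = mapping.split("/", 1)[0]  # drop an optional "/tcp" or "/udp" suffix
    parts = spec.split(":")
    if len(parts) == 1:
        return False  # container port only: Docker picks an ephemeral host port
    host = parts[-2]  # HOST is second-to-last in both "H:C" and "IP:H:C"
    if not host:
        return False  # "IP::CONTAINER" also requests an ephemeral host port
    # Assumption: a host range such as "8004-8006" can give each replica a port.
    return "-" not in host


def can_scale(port_mappings: list[str]) -> bool:
    """A service can run several replicas only if no mapping pins a single host port."""
    return not any(pins_host_port(m) for m in port_mappings)


print(can_scale([]))             # True: a worker such as ingestion publishes no ports
print(can_scale(["8004:8000"]))  # False: query-api as configured in this guide
```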
### Inspecting Services

```bash
# List all services and their status
docker compose ps

# List the processes running in each service's containers
docker compose top

# View live CPU and memory usage of running containers
docker stats

# Execute a command inside a running container
docker compose exec query-api python -c "from services.shared.config import load_config; print(load_config())"

# Open a psql session in the database container
docker compose exec postgres psql -U stonks -d stonks

# Open a shell in an application container
docker compose exec query-api sh
```

### Full Reset

```bash
# Nuclear option: stop everything, remove volumes, rebuild, restart
docker compose down -v
docker compose build --no-cache
docker compose up -d
```

This destroys all data (database, object storage, model weights, metastore, Superset config) and starts from scratch. PostgreSQL migrations are re-applied automatically, and MinIO buckets are re-created by `minio-init`. Ollama models must be re-downloaded.

---

## MinIO Bucket Initialization

The `minio-init` service runs once on startup and creates the required object storage buckets:

| Bucket | Purpose |
|--------|---------|
| `stonks-raw-market` | Raw market data from Polygon.io |
| `stonks-raw-news` | Raw news articles |
| `stonks-raw-filings` | Raw SEC filings |
| `stonks-normalized` | Normalized/parsed documents |
| `stonks-llm-prompts` | LLM prompt archives |
| `stonks-llm-results` | LLM extraction results |
| `stonks-lakehouse` | Parquet fact tables for Trino |
| `stonks-audit` | Audit trail artifacts |

Access the MinIO console at `http://localhost:9001` (credentials: `minioadmin` / `minioadmin`).
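To confirm that `minio-init` did its job, you can compare the table against what the server reports. The sketch below assumes the MinIO Python SDK (`pip install minio`) and its `Minio` client and `bucket_exists` method, plus the development credentials above. It ignores any `DEPLOY_STAGE` prefix, whose exact format this guide does not specify. `missing_buckets` and `check_local_minio` are hypothetical helpers. The SDK is imported lazily so the core check has no dependency.

```python
from typing import Callable, Iterable

# Buckets created by minio-init (from the table above).
REQUIRED_BUCKETS = (
    "stonks-raw-market",
    "stonks-raw-news",
    "stonks-raw-filings",
    "stonks-normalized",
    "stonks-llm-prompts",
    "stonks-llm-results",
    "stonks-lakehouse",
    "stonks-audit",
)


def missing_buckets(
    bucket_exists: Callable[[str], bool],
    required: Iterable[str] = REQUIRED_BUCKETS,
) -> list[str]:
    """Return the required buckets for which bucket_exists(name) is False."""
    return [name for name in required if not bucket_exists(name)]


def check_local_minio() -> list[str]:
    """Query the MinIO started by docker compose (requires the `minio` package)."""
    from minio import Minio  # imported here so missing_buckets works without it

    client = Minio(
        "localhost:9000",
        access_key="minioadmin",
        secret_key="minioadmin",
        secure=False,  # the local stack serves plain HTTP (MINIO_SECURE=false)
    )
    return missing_buckets(client.bucket_exists)
```

With the stack running, `print(check_local_minio() or "all buckets present")` lists any bucket that still needs to be created.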
---

## Dashboard Reverse Proxy

The dashboard container runs nginx with reverse proxy rules that route API requests to backend services using Docker Compose service names:

| Path | Proxied To | Service |
|------|-----------|---------|
| `/api/` | `http://query-api:8000` | Query API |
| `/registry/` | `http://symbol-registry:8000/` | Symbol Registry API |
| `/risk/` | `http://risk-engine:8000/` | Risk Engine API |
| `/trading/` | `http://trading-engine:8000/` | Trading Engine API |

All other paths serve the React SPA with a `try_files` fallback to `index.html`.

Assuming each rule is a prefix `location` block, the trailing slash in the target matters. When `proxy_pass` includes a URI (such as `http://symbol-registry:8000/`), nginx replaces the matched prefix, so `/registry/health` reaches the service as `/health`. The `/api/` rule has no URI, so requests are forwarded with their path unchanged (`/api/health` stays `/api/health`).

---

## Troubleshooting

### Service won't start

Check dependency health:

```bash
docker compose ps postgres redis minio
```

If an infrastructure service never becomes healthy, Compose does not start the application services that depend on it. Check the infrastructure logs:

```bash
docker compose logs postgres
```

### Database migration errors

Migrations in `./infra/migrations/` are applied by PostgreSQL's `docker-entrypoint-initdb.d` mechanism, which only runs when the database is first initialized. To re-run migrations:

```bash
docker compose down -v   # Remove pgdata volume
docker compose up -d     # Migrations re-applied on fresh init
```

### Ollama model not available

The extractor service needs an LLM model loaded in Ollama. Pull a model manually:

```bash
docker compose exec ollama ollama pull qwen3.5:9b
```

### Port conflicts

If a port is already in use, modify the host port mapping in `docker-compose.yml`:

```yaml
query-api:
  ports:
    - "9004:8000"   # Changed from 8004 to 9004
```
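To find conflicts before running `docker compose up -d`, you can probe each host port from the Port Summary with a plain TCP connect. The sketch below is a minimal standard-library example, and `port_in_use` and `conflicting_ports` are hypothetical helpers. It only detects listeners reachable on IPv4 loopback, and it reports every port as busy if the stack is already running.

```python
import socket

# Host ports published by docker compose (from the Port Summary table).
HOST_PORTS = {
    3000: "dashboard",
    5432: "postgres",
    6379: "redis",
    8001: "symbol-registry",
    8002: "trading-engine",
    8003: "risk-engine",
    8004: "query-api",
    8080: "trino",
    8088: "superset",
    9000: "minio (API)",
    9001: "minio (console)",
    9083: "hive-metastore",
    11434: "ollama",
}


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0  # 0 means the connect succeeded


def conflicting_ports(ports: dict[int, str] = HOST_PORTS) -> dict[int, str]:
    """Map each busy host port to the Compose service that wants it."""
    return {port: name for port, name in ports.items() if port_in_use(port)}


for port, name in conflicting_ports().items():
    print(f"port {port} is already in use (needed by {name})")
```

Any service this reports can be moved to a free host port by editing its mapping, as in the example above.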