
# Docker Deployment Guide

This guide covers running the full Stonks Oracle platform locally using Docker Compose. It documents every service, environment variable, volume mount, health check, and operational command.

## Prerequisites

- Docker Engine 24+ and Docker Compose v2
- NVIDIA GPU with drivers and NVIDIA Container Toolkit (for Ollama LLM inference)
- At least 16 GB RAM (Ollama + Trino + all services)
- API keys for Polygon.io and Alpaca (optional — platform runs in degraded mode without them)

## Quick Start

```bash
# 1. Clone the repository
git clone <repo-url> && cd stonks-oracle

# 2. Configure API keys (create .env in the repo root)
cat > .env <<'EOF'
MARKET_DATA_API_KEY=your_polygon_key
BROKER_API_KEY=your_alpaca_key
BROKER_API_SECRET=your_alpaca_secret
BROKER_BASE_URL=https://paper-api.alpaca.markets
EOF

# 3. Start everything
docker compose up -d

# 4. Pull an LLM model into Ollama
docker compose exec ollama ollama pull qwen3.5:9b-fast

# 5. Seed the database
docker compose exec scheduler python -m services.symbol_registry.seed

# 6. Verify all services are healthy
docker compose ps

# 7. Access the dashboard (`open` is macOS-only; use `xdg-open` on Linux)
open http://localhost:3000
```

### Automated Deployment

The `deploy-docker.sh` script automates the full deployment to a remote host via SSH, including prerequisite installation, repository sync, environment configuration, image builds, service startup, database seeding, and Ollama model pulling:

```bash
# Deploy with defaults (GPU-accelerated Docker Ollama)
bash deploy-docker.sh

# Specify a custom Ollama model
bash deploy-docker.sh --ollama-model qwen3.6

# Deploy to a different host
bash deploy-docker.sh --host user@myserver --dir /opt/stonks
```

| Flag | Default | Description |
|------|---------|-------------|
| `--host` | `celes@192.168.42.254` | SSH target (`USER@HOST`) |
| `--ollama-url` | (auto — Docker container) | Ollama API URL |
| `--ollama-model` | `qwen3.5:9b-fast` | Ollama model to pull |
| `--dir` | `~/stonks-oracle` | Remote install directory |

The script detects the target OS and package manager (apt, dnf, yum, pacman, zypper) and installs Docker, NVIDIA drivers, and the NVIDIA Container Toolkit as needed. It also handles WSL environments and firewall configuration.

---

## Service Inventory

### Infrastructure Services

| Service | Image | Ports | Volumes | Purpose |
|---------|-------|-------|---------|---------|
| `postgres` | `postgres:16-alpine` | `5432:5432` | `pgdata` → `/var/lib/postgresql/data`, `./infra/migrations` → `/docker-entrypoint-initdb.d` | Primary database; migrations auto-applied on first start |
| `redis` | `redis:7-alpine` | `6379:6379` | — | Queue broker, caching, deduplication |
| `minio` | `minio/minio:latest` | `9000:9000` (API), `9001:9001` (console) | `miniodata` → `/data` | Object storage for raw artifacts and lakehouse |
| `minio-init` | `minio/mc:latest` | — | — | One-shot init container that creates required buckets |
| `ollama` | `ollama/ollama:latest` | `11434:11434` | `ollama_models` → `/root/.ollama` | LLM inference server for extraction and classification |
| `trino` | `trinodb/trino:latest` | `8080:8080` | `./infra/trino/catalog` → `/etc/trino/catalog` | SQL query engine over the lakehouse |
| `hive-metastore` | `apache/hive:4.0.0` | `9083:9083` | `hive_data` → `/opt/hive/data`, `./infra/hive/core-site.xml` → `/opt/hive/conf/core-site.xml`, `./infra/hive/metastore-site.xml` → `/opt/hive/conf/metastore-site.xml` | Iceberg/Hive metadata catalog for Trino |
| `superset` | `apache/superset:latest` | `8088:8088` | `superset_data` → `/app/superset_home` | BI dashboards over Trino |

### Application Services

| Service | Dockerfile | `SERVICE_CMD` / Command | Ports | Depends On |
|---------|-----------|------------------------|-------|------------|
| `scheduler` | `docker/Dockerfile.scheduler` | `python -m services.scheduler.app` | — | postgres (healthy), redis (healthy) |
| `symbol-registry` | `docker/Dockerfile` | `uvicorn services.symbol_registry.app:app --host 0.0.0.0 --port 8000` | `8001:8000` | postgres (healthy) |
| `ingestion` | `docker/Dockerfile` | `python -m services.ingestion.worker` | — | postgres (healthy), redis (healthy), minio (healthy) |
| `parser` | `docker/Dockerfile` | `python -m services.parser.worker` | — | postgres (healthy), redis (healthy) |
| `extractor` | `docker/Dockerfile` | `python -m services.extractor.main` | — | postgres (healthy), redis (healthy), ollama (started) |
| `aggregation` | `docker/Dockerfile` | `python -m services.aggregation.main` | — | postgres (healthy), redis (healthy) |
| `recommendation` | `docker/Dockerfile` | `python -m services.recommendation.main` | — | postgres (healthy), redis (healthy) |
| `trading-engine` | `docker/Dockerfile` | `uvicorn services.trading.app:app --host 0.0.0.0 --port 8000` | `8002:8000` | postgres (healthy), redis (healthy) |
| `risk-engine` | `docker/Dockerfile` | `uvicorn services.risk.app:app --host 0.0.0.0 --port 8000` | `8003:8000` | postgres (healthy) |
| `broker-adapter` | `docker/Dockerfile` | `python -m services.adapters.broker_service` | — | postgres (healthy), redis (healthy) |
| `lake-publisher` | `docker/Dockerfile` | `python -m services.lake_publisher.jobs` | — | postgres (healthy), minio (healthy) |
| `query-api` | `docker/Dockerfile` | `uvicorn services.api.app:app --host 0.0.0.0 --port 8000` | `8004:8000` | postgres (healthy), redis (healthy), minio (healthy) |
| `dashboard` | `frontend/Dockerfile` | nginx (built-in) | `3000:8080` | query-api (healthy) |

The `risk-engine` service has a Docker network alias of `risk` so the dashboard's nginx reverse proxy can resolve it as `http://risk:8000`.

### Port Summary

| Port | Service | Protocol |
|------|---------|----------|
| 3000 | Dashboard (React UI) | HTTP |
| 5432 | PostgreSQL | TCP |
| 6379 | Redis | TCP |
| 8001 | Symbol Registry API | HTTP |
| 8002 | Trading Engine API | HTTP |
| 8003 | Risk Engine API | HTTP |
| 8004 | Query API | HTTP |
| 8080 | Trino | HTTP |
| 8088 | Superset | HTTP |
| 9000 | MinIO API | HTTP |
| 9001 | MinIO Console | HTTP |
| 9083 | Hive Metastore | Thrift |
| 11434 | Ollama | HTTP |
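
To see at a glance which of these published ports are accepting connections, a small probe like the following may help. It is a minimal sketch, not part of the repository: the `HOST_PORTS` mapping is transcribed from the table above, and the helper names `port_open` and `probe` are hypothetical.

```python
import socket

# Host ports transcribed from the port summary table above.
HOST_PORTS = {
    "dashboard": 3000, "postgres": 5432, "redis": 6379,
    "symbol-registry": 8001, "trading-engine": 8002, "risk-engine": 8003,
    "query-api": 8004, "trino": 8080, "superset": 8088,
    "minio-api": 9000, "minio-console": 9001,
    "hive-metastore": 9083, "ollama": 11434,
}


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def probe(host="localhost", ports=HOST_PORTS):
    """Map each service name to whether its published port accepts connections."""
    return {name: port_open(host, port) for name, port in ports.items()}
```

Calling `probe()` on the Docker host returns a dictionary such as `{"dashboard": True, ...}`. An open port only shows that something is listening; it does not replace the health checks described later in this guide.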

---

## Environment Variables

### Shared Application Environment (`x-app-env`)

All application services inherit these variables via the `x-app-env` YAML anchor:

| Variable | Default | Description |
|----------|---------|-------------|
| `POSTGRES_HOST` | `postgres` | PostgreSQL hostname (Docker service name) |
| `POSTGRES_PORT` | `5432` | PostgreSQL port |
| `POSTGRES_DB` | `stonks` | Database name |
| `POSTGRES_USER` | `stonks` | Database user |
| `POSTGRES_PASSWORD` | `stonks_dev` | Database password |
| `REDIS_HOST` | `redis` | Redis hostname (Docker service name) |
| `REDIS_PORT` | `6379` | Redis port |
| `MINIO_ENDPOINT` | `minio:9000` | MinIO API endpoint |
| `MINIO_ACCESS_KEY` | `minioadmin` | MinIO access key |
| `MINIO_SECRET_KEY` | `minioadmin` | MinIO secret key |
| `OLLAMA_BASE_URL` | `http://ollama:11434` | Ollama LLM server URL |

### `.env` File

The `.env` file is loaded by `ingestion`, `broker-adapter`, and `trading-engine` via the `env_file` directive. Create it in the repository root:

```dotenv
# Stonks Oracle — Environment Variables
# Loaded by: ingestion, broker-adapter, trading-engine

# ── Required for live data ingestion ──
MARKET_DATA_API_KEY=

# ── Required for paper/live trading ──
BROKER_API_KEY=
BROKER_API_SECRET=
BROKER_BASE_URL=https://paper-api.alpaca.markets

# ── Trading engine settings (optional) ──
TRADING_ENABLED=true
TRADING_RISK_TIER=moderate
TRADING_MAX_OPEN_POSITIONS=15

# ── LLM model (optional) ──
OLLAMA_MODEL=qwen3.5:9b-fast

# ── Signal layers (optional) ──
MACRO_ENABLED=true
COMPETITIVE_ENABLED=true
```

| Variable | Required | Default | Used By | Description |
|----------|----------|---------|---------|-------------|
| `MARKET_DATA_API_KEY` | No* | (empty) | ingestion | Polygon.io API key for market data fetching |
| `BROKER_API_KEY` | No* | (empty) | broker-adapter, trading-engine | Alpaca API key |
| `BROKER_API_SECRET` | No* | (empty) | broker-adapter, trading-engine | Alpaca API secret |
| `BROKER_BASE_URL` | No | `https://paper-api.alpaca.markets` | broker-adapter, trading-engine | Alpaca API base URL |

*Services start without these keys but run in degraded mode — ingestion cannot fetch market data and the broker adapter cannot execute trades.
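
Before starting the stack, it can be useful to check which capabilities a given `.env` would leave degraded. The following is a minimal sketch under stated assumptions: the parser handles only simple `KEY=VALUE` lines (not the full dotenv syntax), and the names `parse_dotenv`, `KEY_FEATURES`, and `degraded_features` are hypothetical.

```python
def parse_dotenv(text):
    """Parse KEY=VALUE lines, skipping blanks and '#' comments (simplified dotenv)."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


# Which capability each key unlocks, per the table above.
KEY_FEATURES = {
    "MARKET_DATA_API_KEY": "market data ingestion (Polygon.io)",
    "BROKER_API_KEY": "trade execution (Alpaca)",
    "BROKER_API_SECRET": "trade execution (Alpaca)",
}


def degraded_features(env):
    """Return the sorted capabilities that are unavailable with this environment."""
    return sorted({feature for key, feature in KEY_FEATURES.items() if not env.get(key)})
```

For example, `degraded_features(parse_dotenv(open(".env").read()))` returns an empty list when all three keys are filled in.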

### Infrastructure Service Environment

**PostgreSQL** (`postgres`):

| Variable | Value | Description |
|----------|-------|-------------|
| `POSTGRES_DB` | `stonks` | Database created on first start |
| `POSTGRES_USER` | `stonks` | Superuser for the database |
| `POSTGRES_PASSWORD` | `stonks_dev` | Password for the database user |

**MinIO** (`minio`):

| Variable | Value | Description |
|----------|-------|-------------|
| `MINIO_ROOT_USER` | `minioadmin` | MinIO admin username |
| `MINIO_ROOT_PASSWORD` | `minioadmin` | MinIO admin password |

**Trino** (`trino`):

| Variable | Value | Description |
|----------|-------|-------------|
| `MINIO_ACCESS_KEY` | `minioadmin` | Passed to Trino for MinIO catalog access |
| `MINIO_SECRET_KEY` | `minioadmin` | Passed to Trino for MinIO catalog access |

**Hive Metastore** (`hive-metastore`):

| Variable | Value | Description |
|----------|-------|-------------|
| `SERVICE_NAME` | `metastore` | Tells Hive to run in metastore-only mode |
| `DB_DRIVER` | `derby` | Embedded Derby database for metadata |

**Superset** (`superset`):

| Variable | Value | Description |
|----------|-------|-------------|
| `SUPERSET_SECRET_KEY` | `stonks-dev-secret-key-change-me` | Flask secret key (change in production) |
| `ADMIN_USERNAME` | `admin` | Initial admin username |
| `ADMIN_PASSWORD` | `admin` | Initial admin password |
| `ADMIN_EMAIL` | `admin@stonks.local` | Initial admin email |

### Additional Configuration Variables

All application services support additional environment variables loaded via `services/shared/config.py`. These can be added to individual service `environment` blocks or to the `x-app-env` anchor as needed:

| Variable | Default | Description |
|----------|---------|-------------|
| `REDIS_DB` | `0` | Redis database number |
| `REDIS_PASSWORD` | (none) | Redis password (not needed in Docker Compose) |
| `MINIO_SECURE` | `false` | Use HTTPS for MinIO |
| `OLLAMA_MODEL` | `qwen3.5:9b` | Default LLM model for extraction |
| `OLLAMA_TIMEOUT` | `120` | Ollama request timeout (seconds) |
| `OLLAMA_MAX_RETRIES` | `2` | Max retries for Ollama requests |
| `OLLAMA_RETRY_BASE_DELAY` | `1.0` | Base delay between retries (seconds) |
| `OLLAMA_RETRY_MAX_DELAY` | `10.0` | Maximum delay between retries (seconds) |
| `OLLAMA_RETRY_BACKOFF_MULTIPLIER` | `2.0` | Backoff multiplier for retries |
| `VLLM_BASE_URL` | `http://192.168.42.254:8000` | vLLM server URL (if using vLLM instead of Ollama) |
| `VLLM_MODEL` | `RedHatAI/Qwen3.6-35B-A3B-NVFP4` | vLLM model name |
| `VLLM_TIMEOUT` | `120` | vLLM request timeout (seconds) |
| `VLLM_MAX_RETRIES` | `2` | Max retries for vLLM requests |
| `VLLM_TEMPERATURE` | `0.7` | vLLM sampling temperature |
| `VLLM_MAX_TOKENS` | `4096` | vLLM max output tokens |
| `VLLM_API_KEY` | (empty) | vLLM API key (if required) |
| `TRINO_HOST` | `localhost` | Trino hostname |
| `TRINO_PORT` | `8080` | Trino port |
| `TRINO_CATALOG` | `lakehouse` | Trino catalog name |
| `TRINO_SCHEMA` | `stonks` | Trino schema name |
| `TRINO_ICEBERG_CATALOG` | `iceberg` | Trino Iceberg catalog name |
| `MARKET_DATA_BASE_URL` | `https://api.polygon.io` | Polygon.io base URL |
| `MARKET_DATA_PROVIDER` | `polygon` | Market data provider |
| `BROKER_MODE` | `paper` | Broker mode: `paper` or `live` |
| `BROKER_PROVIDER` | `alpaca` | Broker provider |
| `TRADING_ENABLED` | `false` | Enable autonomous trading engine |
| `TRADING_RISK_TIER` | `moderate` | Risk tier: `conservative`, `moderate`, `aggressive` |
| `TRADING_POLLING_INTERVAL_SECONDS` | `60` | Recommendation polling interval |
| `TRADING_MAX_OPEN_POSITIONS` | `10` | Maximum concurrent open positions |
| `TRADING_RESERVE_SIPHON_PCT` | `0.20` | Fraction of profits siphoned to reserve pool (0.20 = 20%) |
| `TRADING_STOP_LOSS_CHECK_INTERVAL_SECONDS` | `300` | Stop-loss check interval |
| `TRADING_FAST_STOP_LOSS_INTERVAL_SECONDS` | `60` | Fast stop-loss check interval |
| `TRADING_GRADUAL_ENTRY_TRANCHES` | `3` | Number of tranches for gradual entry |
| `TRADING_GRADUAL_ENTRY_THRESHOLD_DOLLARS` | `30.0` | Dollar threshold for gradual entry |
| `TRADING_ABSOLUTE_POSITION_CAP` | `50.0` | Maximum position size (dollars) |
| `TRADING_ACTIVE_POOL_MINIMUM` | `100.0` | Minimum active pool balance |
| `TRADING_EMERGENCY_DRAWDOWN_THRESHOLD_PCT` | `0.40` | Emergency drawdown threshold as a fraction (0.40 = 40%) |
| `TRADING_RESERVE_HIGH_WATER_PCT` | `0.30` | Reserve high-water mark as a fraction (0.30 = 30%) |
| `TRADING_MICRO_TRADING_ENABLED` | `false` | Enable micro-trading mode |
| `TRADING_MICRO_TRADING_INTERVAL_SECONDS` | `300` | Micro-trading polling interval |
| `TRADING_MICRO_TRADING_ALLOCATION_CAP_PCT` | `0.03` | Micro-trading allocation cap as a fraction (0.03 = 3%) |
| `TRADING_MICRO_TRADING_MAX_DAILY` | `10` | Max micro-trades per day |
| `TRADING_MICRO_TRADING_MAX_HOLD_MINUTES` | `120` | Max micro-trade hold time |
| `TRADING_SNS_TOPIC_ARN` | (empty) | AWS SNS topic ARN for notifications |
| `TRADING_SNS_PHONE_NUMBER` | (empty) | Phone number for SNS notifications |
| `TRADING_GMAIL_SENDER` | (empty) | Gmail sender address for notifications |
| `TRADING_GMAIL_RECIPIENT` | (empty) | Gmail recipient address for notifications |
| `MACRO_ENABLED` | `true` | Enable macro signal layer |
| `MACRO_SIGNAL_WEIGHT` | `0.3` | Relative weight of macro vs company signals |
| `MACRO_CONFIDENCE_THRESHOLD` | `0.4` | Minimum confidence for macro event inclusion |
| `MACRO_SHORT_TERM_STALENESS_HOURS` | `48` | Hours before short-term events get accelerated decay |
| `PROJECTION_CONFIDENCE_THRESHOLD` | `0.3` | Minimum confidence for projections to influence recommendations |
| `COMPETITIVE_ENABLED` | `true` | Enable competitive signal layer |
| `COMPETITIVE_SIGNAL_WEIGHT` | `0.2` | Relative weight of competitive signals |
| `COMPETITIVE_PATTERN_CONFIDENCE_THRESHOLD` | `0.3` | Minimum confidence for pattern inclusion |
| `COMPETITIVE_PROPAGATION_STRENGTH_THRESHOLD` | `0.2` | Minimum strength for signal propagation |
| `COMPETITIVE_ROUTINE_LOOKBACK_DAYS` | `180` | Lookback window for routine patterns |
| `COMPETITIVE_MAJOR_DECISION_LOOKBACK_DAYS` | `365` | Lookback window for major decisions |
| `COMPETITIVE_MIN_PATTERN_SAMPLES` | `3` | Minimum samples for pattern matching |
| `COMPETITIVE_MAJOR_DECISION_WEIGHT_MULTIPLIER` | `1.3` | Weight multiplier for major decision patterns |
| `COMPETITIVE_STALENESS_WINDOW_DAYS` | `180` | Window for staleness decay on competitive signals |
| `COMPETITIVE_STALENESS_RECENT_DAYS` | `90` | Days within which signals are considered recent |
| `COMPETITIVE_STALENESS_DECAY_PENALTY` | `0.5` | Decay penalty for stale competitive signals |
| `COMPETITIVE_PROPAGATION_FAILURE_THRESHOLD` | `5` | Consecutive propagation failures before operator alert |
| `ALERT_SOURCE_FAILURE_THRESHOLD` | `3` | Consecutive source failures before alert fires |
| `ALERT_SOURCE_FAILURE_WINDOW_HOURS` | `6` | Lookback window for source failure alerting |
| `ALERT_SCHEMA_FAILURE_RATE_THRESHOLD` | `0.3` | Extraction failure rate (30%) that triggers alert |
| `ALERT_SCHEMA_FAILURE_WINDOW_HOURS` | `1` | Lookback window for schema failure spike |
| `ALERT_LAKE_LAG_THRESHOLD_MINUTES` | `60` | Minutes since last lake publish before alert |
| `ALERT_BROKER_ERROR_THRESHOLD` | `3` | Consecutive broker errors before alert |
| `ALERT_BROKER_ERROR_WINDOW_HOURS` | `1` | Lookback window for broker error alerting |
| `ALERT_CHECK_INTERVAL_SECONDS` | `120` | How often alerting rules are evaluated |
| `RETENTION_RAW_MARKET_DAYS` | `90` | Retention period for raw market data (days) |
| `RETENTION_RAW_NEWS_DAYS` | `180` | Retention period for raw news articles (days) |
| `RETENTION_RAW_FILINGS_DAYS` | `365` | Retention period for raw SEC filings (days) |
| `RETENTION_NORMALIZED_DAYS` | `180` | Retention period for normalized documents (days) |
| `RETENTION_LLM_PROMPTS_DAYS` | `365` | Retention period for LLM prompt archives (days) |
| `RETENTION_LLM_RESULTS_DAYS` | `365` | Retention period for LLM extraction results (days) |
| `RETENTION_LAKEHOUSE_DAYS` | `730` | Retention period for lakehouse Parquet files (days) |
| `RETENTION_AUDIT_DAYS` | `730` | Retention period for audit trail artifacts (days) |
| `RETENTION_CLEANUP_INTERVAL_HOURS` | `24` | How often the retention cleanup worker runs |
| `RETENTION_BATCH_SIZE` | `1000` | Number of objects processed per cleanup batch |
| `LOG_LEVEL` | `INFO` | Logging level |
| `JSON_LOGS` | `true` | Enable structured JSON logging |
| `DEPLOY_STAGE` | (empty) | Deployment stage prefix for bucket names |

See `services/shared/config.py` for the complete list of all supported environment variables with their defaults.
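
The four `OLLAMA_RETRY_*` / `OLLAMA_MAX_RETRIES` settings suggest a capped exponential backoff. The sketch below shows how such a schedule would be computed from them; the formula (`base * multiplier ** attempt`, capped at the maximum) is an assumption, and the real client may differ, for example by adding jitter. The function names are hypothetical.

```python
import os


def retry_delays(base=1.0, multiplier=2.0, max_delay=10.0, max_retries=2):
    """Delay before each retry: base * multiplier**attempt, capped at max_delay."""
    return [min(base * multiplier ** attempt, max_delay) for attempt in range(max_retries)]


def ollama_retry_delays(env=None):
    """Read the OLLAMA retry settings, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return retry_delays(
        base=float(env.get("OLLAMA_RETRY_BASE_DELAY", "1.0")),
        multiplier=float(env.get("OLLAMA_RETRY_BACKOFF_MULTIPLIER", "2.0")),
        max_delay=float(env.get("OLLAMA_RETRY_MAX_DELAY", "10.0")),
        max_retries=int(env.get("OLLAMA_MAX_RETRIES", "2")),
    )
```

Under this assumption the defaults give two retries, after 1 and 2 seconds. Raising `OLLAMA_MAX_RETRIES` to 6 would give delays of 1, 2, 4, 8, 10, and 10 seconds, because the cap takes over from the fifth retry.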

---

## LLM Provider Configuration

Stonks Oracle supports two LLM backends: **Ollama** (local, self-hosted) and **vLLM** (high-performance inference server). The active provider is configured per-agent in the `ai_agents` database table, but the connection details come from environment variables.

### Option A: Bundled Ollama (default)

The `docker-compose.yml` includes an Ollama container with GPU passthrough via the NVIDIA Container Toolkit. On first start, pull a model:

```bash
docker compose exec ollama ollama pull qwen3.5:9b-fast
```

No additional configuration needed — services connect to `http://ollama:11434` by default.

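The Ollama container requests all available NVIDIA GPUs via the `deploy.resources.reservations.devices` configuration. If the host has no NVIDIA GPU, driver, or Container Toolkit, Docker cannot satisfy this reservation and the container fails to start. To run Ollama on CPU (significantly slower), remove the device reservation from the `ollama` service.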
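
To confirm that the pull succeeded, you can ask the Ollama server which models it has locally via its `GET /api/tags` endpoint. The following is a minimal sketch; the helper names are hypothetical, and the base URL assumes you run it on the Docker host where port 11434 is published.

```python
import json
from urllib.request import urlopen


def installed_models(tags_payload):
    """Extract model names from the JSON body returned by Ollama's GET /api/tags."""
    return [m["name"] for m in tags_payload.get("models", [])]


def fetch_installed_models(base_url="http://localhost:11434", timeout=5):
    """Query a running Ollama server for its locally available models."""
    with urlopen(f"{base_url.rstrip('/')}/api/tags", timeout=timeout) as resp:
        return installed_models(json.load(resp))


def has_model(models, wanted):
    """True if `wanted` is installed; a bare name is matched against its ':latest' tag."""
    if ":" not in wanted:
        wanted = f"{wanted}:latest"
    return wanted in models
```

For example, `has_model(fetch_installed_models(), "qwen3.5:9b-fast")` should return `True` once the pull above has completed.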

### Option B: External Ollama

If Ollama is already running on the host (e.g. with GPU access), create a `docker-compose.override.yml`:

```yaml
services:
  ollama:
    entrypoint: ["true"]
    restart: "no"
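    ports: !reset []  # a plain [] is merged with the base file's ports; !reset clears them (recent Compose v2 releases)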
  extractor:
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    environment:
      OLLAMA_BASE_URL: "http://host.docker.internal:11434"
    extra_hosts:
      - "host.docker.internal:host-gateway"
  recommendation:
    environment:
      OLLAMA_BASE_URL: "http://host.docker.internal:11434"
    extra_hosts:
      - "host.docker.internal:host-gateway"
```

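This disables the bundled Ollama container and routes the `extractor` and `recommendation` services to the host's instance. Change the port in `OLLAMA_BASE_URL` if your Ollama listens on a non-standard port. On Linux, the host's Ollama must listen on an address that containers can reach: it binds to `127.0.0.1` by default, which you can change by setting `OLLAMA_HOST=0.0.0.0` for the Ollama process. For a remote Ollama instance (not on localhost), replace `host.docker.internal` with the remote IP and remove the `extra_hosts` block.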

### Option C: vLLM Server

For higher throughput or quantized models (e.g. `RedHatAI/Qwen3.6-35B-A3B-NVFP4`), point services at a vLLM server. Add to your `.env`:

```dotenv
VLLM_BASE_URL=http://192.168.42.254:8000
VLLM_MODEL=RedHatAI/Qwen3.6-35B-A3B-NVFP4
VLLM_TIMEOUT=120
VLLM_TEMPERATURE=0.7
```

Then update the `ai_agents` table to use the vLLM provider:

```sql
UPDATE ai_agents SET model_provider = 'vllm', model_name = 'RedHatAI/Qwen3.6-35B-A3B-NVFP4' WHERE active = true;
```

Or use the API:

```bash
curl -X PUT http://localhost:8004/api/admin/agents/document-extractor \
  -H 'Content-Type: application/json' \
  -d '{"model_provider": "vllm", "model_name": "RedHatAI/Qwen3.6-35B-A3B-NVFP4"}'
```

### Option D: Mixed (Ollama + vLLM)

You can run different agents on different providers. For example, use vLLM for the high-volume extractor and event classifier, and Ollama for the thesis rewriter:

```sql
UPDATE ai_agents SET model_provider = 'vllm', model_name = 'RedHatAI/Qwen3.6-35B-A3B-NVFP4' WHERE slug = 'document-extractor';
UPDATE ai_agents SET model_provider = 'vllm', model_name = 'RedHatAI/Qwen3.6-35B-A3B-NVFP4' WHERE slug = 'event-classifier';
UPDATE ai_agents SET model_provider = 'ollama', model_name = 'qwen3.5:9b-fast' WHERE slug = 'thesis-rewriter';
```

Both `OLLAMA_BASE_URL` and `VLLM_BASE_URL` must be set in the environment for mixed mode.
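
The requirement above can be checked mechanically. The sketch below maps each provider value to the variable that holds its URL and reports which variables a set of agents still needs. It is illustrative only: the mapping and the helper names are hypothetical, and the application's own provider lookup may work differently.

```python
import os

# Environment variable holding the base URL for each provider used in ai_agents.
PROVIDER_URL_VARS = {"ollama": "OLLAMA_BASE_URL", "vllm": "VLLM_BASE_URL"}


def base_url_for(provider, env=None):
    """Resolve the server URL for an agent's model_provider, or raise if unset."""
    env = os.environ if env is None else env
    try:
        var = PROVIDER_URL_VARS[provider]
    except KeyError:
        raise ValueError(f"unknown model_provider: {provider!r}") from None
    url = env.get(var)
    if not url:
        raise RuntimeError(f"{var} must be set to use the {provider} provider")
    return url.rstrip("/")


def missing_provider_vars(agents, env=None):
    """List the URL variables that the given (slug, provider) pairs need but lack."""
    env = os.environ if env is None else env
    needed = {PROVIDER_URL_VARS[p] for _, p in agents if p in PROVIDER_URL_VARS}
    return sorted(v for v in needed if not env.get(v))
```

With the three agents from the SQL above and only `OLLAMA_BASE_URL` set, `missing_provider_vars` returns `["VLLM_BASE_URL"]`.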

### Automated Deployment

The `deploy-docker.sh` script handles LLM configuration automatically; its flags and example invocations are listed under Automated Deployment in the Quick Start section. By default it uses the bundled Ollama container with GPU passthrough (NVIDIA Container Toolkit) and pulls the model given by `--ollama-model`.

If an external Ollama URL is provided via `--ollama-url`, the script instead creates a `docker-compose.override.yml` that disables the bundled container and routes services to the external instance.

---

## Volume Mounts and Data Persistence

Docker Compose defines five named volumes for persistent data:

| Volume | Mounted By | Mount Path | Contents |
|--------|-----------|------------|----------|
| `pgdata` | postgres | `/var/lib/postgresql/data` | PostgreSQL database files |
| `miniodata` | minio | `/data` | MinIO object storage (raw artifacts, lakehouse Parquet files) |
| `ollama_models` | ollama | `/root/.ollama` | Downloaded LLM model weights |
| `hive_data` | hive-metastore | `/opt/hive/data` | Hive metastore Derby database |
| `superset_data` | superset | `/app/superset_home` | Superset configuration and metadata |

### Bind Mounts

In addition to named volumes, several services use bind mounts for configuration:

| Service | Host Path | Container Path | Mode | Purpose |
|---------|-----------|---------------|------|---------|
| postgres | `./infra/migrations` | `/docker-entrypoint-initdb.d` | rw | SQL migrations auto-applied on first start |
| trino | `./infra/trino/catalog` | `/etc/trino/catalog` | rw | Trino catalog configuration (lakehouse, iceberg) |
| hive-metastore | `./infra/hive/core-site.xml` | `/opt/hive/conf/core-site.xml` | ro | Hadoop core-site config for MinIO access |
| hive-metastore | `./infra/hive/metastore-site.xml` | `/opt/hive/conf/metastore-site.xml` | ro | Hive metastore config |

### Resetting Data

To destroy all persistent data and start fresh:

```bash
# Stop all containers and remove named volumes
docker compose down -v
```

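This removes `pgdata`, `miniodata`, `ollama_models`, `hive_data`, and `superset_data`. The next `docker compose up` re-initializes PostgreSQL with migrations and re-creates the MinIO buckets (via `minio-init`). Ollama models are not restored automatically: pull them again with `docker compose exec ollama ollama pull <model>`, then re-run the database seed command from the Quick Start.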

To reset only specific volumes:

```bash
docker compose down
docker volume rm stonks-oracle_pgdata    # Reset database only
docker compose up -d
```

> **Note**: Volume names are prefixed with the Compose project name, which defaults to the project directory name (e.g., `stonks-oracle_pgdata`). Use `docker volume ls` to see exact names.
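
If you script volume operations, the full volume name can be derived rather than hard-coded. The sketch below approximates how Compose picks and normalizes the project name; it honors `COMPOSE_PROJECT_NAME` but ignores the `-p` flag and a top-level `name:` in the compose file, so treat its result as a guess and confirm with `docker volume ls`. All helper names are hypothetical.

```python
import os
import re


def normalize_project_name(name):
    """Approximate Compose's rule: lowercase, keep [a-z0-9_-], no leading '_' or '-'."""
    kept = "".join(re.findall(r"[a-z0-9_-]", name.lower()))
    return kept.lstrip("_-")


def project_name(directory, env=None):
    """Project name from COMPOSE_PROJECT_NAME, else the directory's base name."""
    env = os.environ if env is None else env
    raw = env.get("COMPOSE_PROJECT_NAME") or os.path.basename(os.path.abspath(directory))
    return normalize_project_name(raw)


def volume_name(volume, directory=".", env=None):
    """Full Docker volume name for a named volume declared in the compose file."""
    return f"{project_name(directory, env)}_{volume}"
```

Run from a checkout directory named `stonks-oracle`, `volume_name("pgdata")` yields `stonks-oracle_pgdata`, matching the example above.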

---

## Health Checks

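The services listed below have health checks configured. Docker Compose uses them to enforce startup ordering via `depends_on` with `condition: service_healthy`. The `ollama` service has no health check, so the extractor only waits for it to start (`condition: service_started`).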

### Infrastructure Health Checks

| Service | Test Command | Interval | Retries |
|---------|-------------|----------|---------|
| `postgres` | `pg_isready -U stonks` | 5s | 5 |
| `redis` | `redis-cli ping` | 5s | 5 |
| `minio` | `mc ready local` | 5s | 5 |

### Application Health Checks — FastAPI Services

FastAPI services (symbol-registry, trading-engine, risk-engine, query-api) use HTTP health endpoints, and the nginx-based dashboard is checked with an HTTP request to its root path:

| Service | Test Command | Interval | Timeout | Retries | Start Period |
|---------|-------------|----------|---------|---------|-------------|
| `symbol-registry` | `curl -f http://localhost:8000/health` | 10s | 5s | 3 | 15s |
| `trading-engine` | `curl -f http://localhost:8000/health` | 10s | 5s | 3 | 15s |
| `risk-engine` | `curl -f http://localhost:8000/health` | 10s | 5s | 3 | 15s |
| `query-api` | `curl -f http://localhost:8000/health` | 10s | 5s | 3 | 15s |
| `dashboard` | `curl -f http://localhost:8080/` | 10s | 5s | 3 | 10s |

### Application Health Checks — Worker Services

Worker services (no HTTP endpoint) use process liveness checks:

| Service | Test Command | Interval | Timeout | Retries | Start Period |
|---------|-------------|----------|---------|---------|-------------|
| `scheduler` | `pgrep -f 'python -m services.scheduler.app'` | 10s | 5s | 3 | 15s |
| `ingestion` | `pgrep -f 'python -m services.ingestion.worker'` | 10s | 5s | 3 | 15s |
| `parser` | `pgrep -f 'python -m services.parser.worker'` | 10s | 5s | 3 | 15s |
| `extractor` | `pgrep -f 'python -m services.extractor.main'` | 10s | 5s | 3 | 15s |
| `aggregation` | `pgrep -f 'python -m services.aggregation.main'` | 10s | 5s | 3 | 15s |
| `recommendation` | `pgrep -f 'python -m services.recommendation.main'` | 10s | 5s | 3 | 15s |
| `broker-adapter` | `pgrep -f 'python -m services.adapters.broker_service'` | 10s | 5s | 3 | 15s |
| `lake-publisher` | `pgrep -f 'python -m services.lake_publisher.jobs'` | 10s | 5s | 3 | 15s |

### Verifying Service Health

```bash
# Check all service statuses
docker compose ps

# Check a specific service
docker compose ps query-api

# Inspect health check details for a container (the name assumes the default project name)
docker inspect --format='{{json .State.Health}}' stonks-oracle-query-api-1 | python3 -m json.tool

# Wait for all services to be healthy
docker compose up -d --wait
```
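
For scripted checks, the JSON output of `docker compose ps --format json` is easier to work with than the table view. The sketch below parses that output and reports anything that is not running and healthy. It is a minimal example with hypothetical helper names; it relies on the `Service`, `State`, and `Health` fields and accepts both output shapes used by Compose v2 releases (a JSON array, or one object per line).

```python
import json


def parse_compose_ps(output):
    """Parse `docker compose ps --format json` output (JSON array or one object per line)."""
    text = output.strip()
    if not text:
        return []
    if text.startswith("["):
        return json.loads(text)
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def not_ready(containers):
    """Return {service: status} for containers that are not running and healthy.

    A running container with an empty Health field has no health check
    and is treated as ready.
    """
    problems = {}
    for c in containers:
        state, health = c.get("State", ""), c.get("Health", "")
        if state != "running":
            problems[c.get("Service", "?")] = state or "unknown"
        elif health not in ("", "healthy"):
            problems[c.get("Service", "?")] = health
    return problems
```

Feed it the captured output, for example from `subprocess.run(["docker", "compose", "ps", "--all", "--format", "json"], capture_output=True, text=True).stdout`. With `--all`, the one-shot `minio-init` container is expected to appear as exited once it has created the buckets.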

---

## Dockerfile Build Details

### `docker/Dockerfile` — Generic Python Service Image

Used by all Python application services except the scheduler (the dashboard is built from `frontend/Dockerfile`). Accepts a `SERVICE_CMD` build argument that determines which service the container runs.

**Base image**: `python:3.12-slim` (via Harbor proxy cache in CI)

**Build arguments**:

| Argument | Default | Description |
|----------|---------|-------------|
| `SERVICE_CMD` | `python -m services.scheduler.app` | The command executed when the container starts |
| `CACHE_BUST` | (none) | Optional cache-busting argument to force rebuild of source layers |

**What gets copied**:
- `requirements.txt` → pip dependencies installed
- `services/` → all service source code
- `scripts/` → operational scripts
- `tests/` → test files (available for in-container testing)
- `conftest.py` → pytest configuration

**Environment variables set**:
- `PYTHONDONTWRITEBYTECODE=1` — no `.pyc` files
- `PYTHONUNBUFFERED=1` — unbuffered stdout/stderr for log visibility
- `PYTHONPATH=/app` — ensures `services.*` imports resolve

**System packages installed**: `gcc`, `libpq-dev` (PostgreSQL client library), `curl` (for health checks)

**Security**: Runs as non-root user `stonks` (UID 1000).

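**How `SERVICE_CMD` works**: The `CMD` directive is `sh -c "${SERVICE_CMD}"`, so the value supplied at build time becomes the runtime command. Build arguments are not visible at runtime by themselves, so the Dockerfile must also persist the value, for example with an `ENV` instruction. Each service in `docker-compose.yml` sets its own command via the `build.args.SERVICE_CMD` parameter: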

```yaml
query-api:
  build:
    context: .
    dockerfile: docker/Dockerfile
    args:
      SERVICE_CMD: "uvicorn services.api.app:app --host 0.0.0.0 --port 8000"
```

### `docker/Dockerfile.scheduler` — Scheduler Image

A specialized variant of the generic Dockerfile used only by the `scheduler` service. Adds `postgresql-client` for running database migrations via `psql`.

**Additional contents**:
- `infra/migrations/` → copied to `/app/infra/migrations/` for migration execution
- `postgresql-client` system package installed

**Command**: Hardcoded `CMD ["python", "-m", "services.scheduler.app"]` (no `SERVICE_CMD` argument).

### `docker/Dockerfile.superset` — Custom Superset Image

Extends the official Apache Superset image with additional database drivers.

**Base image**: `apache/superset:latest` (via Harbor proxy cache in CI)

**Additional packages**: `trino[sqlalchemy]`, `psycopg2-binary`, `redis`

### `frontend/Dockerfile` — Dashboard Image

Multi-stage build for the React dashboard.

**Stage 1 — Build** (base: `node:24-alpine`):

| Build Argument | Default | Description |
|---------------|---------|-------------|
| `VITE_QUERY_API_URL` | `""` | Query API base URL (empty = use relative `/api/` proxy) |
| `VITE_SYMBOL_REGISTRY_URL` | `""` | Symbol Registry base URL (empty = use relative `/registry/` proxy) |
| `VITE_RISK_ENGINE_URL` | `""` | Risk Engine base URL (empty = use relative `/risk/` proxy) |

**Stage 2 — Serve** (base: `nginxinc/nginx-unprivileged:alpine`):
- Serves the built static files on port 8080
- Uses `frontend/nginx.conf` for SPA fallback and API reverse proxying
- Proxies `/api/` → `query-api:8000`, `/registry/` → `symbol-registry:8000`, `/risk/` → `risk:8000`, `/trading/` → `trading-engine:8000`
- SSE stream endpoint (`/api/ops/pipeline/stream`) has buffering disabled for real-time delivery
- Static assets under `/assets/` are cached with 1-year expiry

### Building Custom Images

To build a single service image locally:

```bash
# Build the query-api image
docker compose build query-api

# Build with a custom SERVICE_CMD
docker build -t my-custom-service \
  --build-arg SERVICE_CMD="python -m services.my_service.main" \
  -f docker/Dockerfile .

# Build the dashboard with custom API URLs
docker build -t my-dashboard \
  --build-arg VITE_QUERY_API_URL="https://api.example.com" \
  -f frontend/Dockerfile frontend/

# Rebuild all images
docker compose build

# Rebuild without cache (force fresh build)
docker compose build --no-cache
```

---

## Dependency Ordering

Docker Compose enforces startup order using `depends_on` with health check conditions. The dependency graph is:

```
postgres (healthy) ──┬── scheduler
                     ├── symbol-registry
                     ├── ingestion
                     ├── parser
                     ├── extractor
                     ├── aggregation
                     ├── recommendation
                     ├── trading-engine
                     ├── risk-engine
                     ├── broker-adapter
                     ├── lake-publisher
                     └── query-api

redis (healthy) ─────┬── scheduler
                     ├── ingestion
                     ├── parser
                     ├── extractor
                     ├── aggregation
                     ├── recommendation
                     ├── trading-engine
                     ├── broker-adapter
                     └── query-api

minio (healthy) ─────┬── minio-init
                     ├── ingestion
                     ├── lake-publisher
                     └── query-api

ollama (started) ────── extractor

minio ───────────────── trino
hive-metastore ─────── trino
trino ──────────────── superset (via depends_on)

query-api (healthy) ── dashboard
```

Services with `condition: service_healthy` wait until the dependency's health check passes. The `extractor` depends on `ollama` with `condition: service_started` (no health check — Ollama may take time to load models).
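
The graph above can also be expressed as data and sorted topologically, which is handy for spotting how many startup stages the stack has. The sketch below uses the standard library's `graphlib`; the `DEPENDS_ON` mapping is transcribed from the graph, and `startup_waves` is a hypothetical helper. It only models ordering and does not reflect how long Compose waits for each health check.

```python
from graphlib import TopologicalSorter

# service -> services it waits for, transcribed from the graph above.
DEPENDS_ON = {
    "scheduler": {"postgres", "redis"},
    "symbol-registry": {"postgres"},
    "ingestion": {"postgres", "redis", "minio"},
    "parser": {"postgres", "redis"},
    "extractor": {"postgres", "redis", "ollama"},
    "aggregation": {"postgres", "redis"},
    "recommendation": {"postgres", "redis"},
    "trading-engine": {"postgres", "redis"},
    "risk-engine": {"postgres"},
    "broker-adapter": {"postgres", "redis"},
    "lake-publisher": {"postgres", "minio"},
    "query-api": {"postgres", "redis", "minio"},
    "minio-init": {"minio"},
    "trino": {"minio", "hive-metastore"},
    "superset": {"trino"},
    "dashboard": {"query-api"},
}


def startup_waves(graph):
    """Group services into waves; each wave depends only on earlier waves."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

`startup_waves(DEPENDS_ON)` yields three waves: the five infrastructure services with no dependencies, then the application services plus `minio-init` and `trino`, and finally `dashboard` and `superset`.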

---

## Operational Commands

### Starting Services

```bash
# Start all services in the background
docker compose up -d

# Start all services and wait for health checks
docker compose up -d --wait

# Start only infrastructure (useful for local development)
docker compose up -d postgres redis minio minio-init ollama

# Start a specific service and its dependencies
docker compose up -d query-api
```

### Stopping Services

```bash
# Stop all services (preserves volumes)
docker compose down

# Stop all services and remove volumes (full reset)
docker compose down -v

# Stop a specific service
docker compose stop trading-engine
```

### Restarting Services

```bash
# Restart a specific service
docker compose restart query-api

# Restart with a fresh build
docker compose up -d --build query-api

# Force recreate a service (picks up compose file changes)
docker compose up -d --force-recreate query-api
```

### Viewing Logs

```bash
# Follow logs for all services
docker compose logs -f

# Follow logs for a specific service
docker compose logs -f query-api

# View last 50 lines of a service's logs
docker compose logs --tail=50 ingestion

# View logs for multiple services
docker compose logs -f scheduler ingestion extractor
```

### Scaling Replicas

```bash
# Scale a worker service to 3 replicas
docker compose up -d --scale ingestion=3

# Scale multiple services
docker compose up -d --scale ingestion=3 --scale extractor=2

# Scale back to 1
docker compose up -d --scale ingestion=1
```

> **Note**: Scaling works best for worker services (ingestion, parser, extractor, aggregation, recommendation, broker-adapter, lake-publisher) that consume from Redis queues. Do not scale FastAPI services that expose host ports without adjusting port mappings.

### Inspecting Services

```bash
# List all services and their status
docker compose ps

# List the processes running in each container
docker compose top

# Execute a command inside a running container
docker compose exec query-api python -c "from services.shared.config import load_config; print(load_config())"

# Open a psql session in the postgres container
docker compose exec postgres psql -U stonks -d stonks

# Seed the database
docker compose exec scheduler python -m services.symbol_registry.seed
```

### Full Reset

```bash
# Nuclear option: stop everything, remove volumes, rebuild, restart
docker compose down -v
docker compose build --no-cache
docker compose up -d
```

This destroys all data (database, object storage, model weights, metastore, Superset config) and starts from scratch. PostgreSQL migrations are re-applied automatically. MinIO buckets are re-created by `minio-init`. Ollama models must be re-downloaded.
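Because the reset is irreversible, it may be worth dumping the database first. The following is a minimal sketch that assumes the `stonks` user and database used in the `psql` example in this guide; the function name `backup_postgres` and the `./backups` directory are hypothetical.

```bash
# Dump the PostgreSQL database to a timestamped file before a destructive reset.
# Assumes the "stonks" user and database used elsewhere in this guide.
backup_postgres() {
  out_dir="${1:-./backups}"   # hypothetical default location
  mkdir -p "$out_dir"
  out_file="$out_dir/stonks-$(date +%Y%m%d-%H%M%S).sql"
  # -T disables TTY allocation so the dump can be redirected to a file.
  if docker compose exec -T postgres pg_dump -U stonks -d stonks > "$out_file"; then
    echo "$out_file"
  else
    rm -f "$out_file"
    echo "pg_dump failed; no backup written" >&2
    return 1
  fi
}
```

Run `backup_postgres` before `docker compose down -v`. The dump covers only PostgreSQL; MinIO objects and Ollama models are not included.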

---

## MinIO Bucket Initialization

The `minio-init` service runs once on startup and creates the required object storage buckets:

| Bucket | Purpose |
|--------|---------|
| `stonks-raw-market` | Raw market data from Polygon.io |
| `stonks-raw-news` | Raw news articles |
| `stonks-raw-filings` | Raw SEC filings |
| `stonks-normalized` | Normalized/parsed documents |
| `stonks-llm-prompts` | LLM prompt archives |
| `stonks-llm-results` | LLM extraction results |
| `stonks-lakehouse` | Parquet fact tables for Trino |
| `stonks-audit` | Audit trail artifacts |

Access the MinIO console at `http://localhost:9001` (credentials: `minioadmin` / `minioadmin`).
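To confirm from the command line that `minio-init` created every bucket, the listing from the MinIO client can be compared against the table above. This sketch assumes `mc` is installed and configured with an alias for this MinIO instance; the alias name `local` and the function name `missing_buckets` are hypothetical.

```bash
# Print any expected bucket that is absent from `mc ls` output read on stdin.
# Example (assumes a configured mc alias named "local"):
#   mc ls local | missing_buckets
missing_buckets() {
  expected="stonks-raw-market stonks-raw-news stonks-raw-filings stonks-normalized stonks-llm-prompts stonks-llm-results stonks-lakehouse stonks-audit"
  # The bucket name is the last field of each `mc ls` line, with a trailing slash.
  present="$(awk '{ name = $NF; sub(/\/$/, "", name); print name }')"
  status=0
  for bucket in $expected; do
    if ! printf '%s\n' "$present" | grep -qxF "$bucket"; then
      echo "missing: $bucket"
      status=1
    fi
  done
  return "$status"
}
```

The function prints nothing and exits with status 0 when all eight buckets are present.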

---

## Dashboard Reverse Proxy

The dashboard container runs nginx with reverse proxy rules that route API requests to backend services using Docker Compose service names:

| Path | Proxied To | Service |
|------|-----------|---------|
| `/api/` | `http://query-api:8000` | Query API |
| `/api/ops/pipeline/stream` | `http://query-api:8000` (SSE, no buffering) | Query API (real-time pipeline stream) |
| `/registry/` | `http://symbol-registry:8000/` | Symbol Registry API |
| `/risk/` | `http://risk:8000/` | Risk Engine (via network alias) |
| `/trading/` | `http://trading-engine:8000/` | Trading Engine API |

The `risk-engine` service has a network alias of `risk` in `docker-compose.yml` so the nginx upstream resolves correctly.

All other paths serve the React SPA with `try_files` fallback to `index.html`. Static assets under `/assets/` are served with 1-year cache headers.

Security headers applied: `X-Frame-Options: SAMEORIGIN`, `X-Content-Type-Options: nosniff`, `Referrer-Policy: strict-origin-when-cross-origin`.
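A quick way to check that the proxy sends these headers is to inspect a response from the dashboard. The sketch below reads response headers on stdin; the function name `check_security_headers` is hypothetical, and it assumes the dashboard is reachable at `http://localhost:3000` as in the Quick Start.

```bash
# Read HTTP response headers on stdin and report any missing security header.
# Example:
#   curl -sI http://localhost:3000/ | check_security_headers
check_security_headers() {
  # Strip carriage returns so header lines compare cleanly.
  headers="$(tr -d '\r')"
  status=0
  for expected in \
    "X-Frame-Options: SAMEORIGIN" \
    "X-Content-Type-Options: nosniff" \
    "Referrer-Policy: strict-origin-when-cross-origin"
  do
    # Header names are case-insensitive, so match with -i.
    if ! printf '%s\n' "$headers" | grep -qiF "$expected"; then
      echo "missing header: $expected"
      status=1
    fi
  done
  return "$status"
}
```

Whether the headers also appear on proxied API responses depends on how the nginx configuration scopes them, so checking one SPA path and one `/api/` path separately may be worthwhile.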

---

## Troubleshooting

### Service won't start

Check dependency health:

```bash
docker compose ps postgres redis minio
```

If an infrastructure service is unhealthy, application services that depend on it with `condition: service_healthy` will not start. Check the infrastructure logs:

```bash
docker compose logs postgres
```
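When the logs are inconclusive, the health-check history recorded by Docker often shows why a dependency is stuck. A minimal sketch follows; the function name `show_health_log` is hypothetical.

```bash
# Print the recorded health state (status, failing streak, recent probe output)
# for the first container of a Compose service.
show_health_log() {
  service="$1"
  # -a includes stopped containers, which matter when a service crashed.
  cid="$(docker compose ps -aq "$service" | head -n 1)"
  if [ -z "$cid" ]; then
    echo "no container found for service: $service" >&2
    return 1
  fi
  docker inspect --format '{{json .State.Health}}' "$cid"
}
```

For example, `show_health_log postgres` prints a JSON object containing the current status and the output of recent probes. For a service without a health check, the value is `null`.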

### Database migration errors

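Migrations in `./infra/migrations/` are applied by PostgreSQL's `docker-entrypoint-initdb.d` mechanism, which runs only when the data directory is empty (first database initialization). To re-run the migrations from scratch, remove the volumes and start again. Note that this deletes all persisted data, not just the database: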

```bash
docker compose down -v   # Removes ALL volumes, including pgdata
docker compose up -d     # Migrations re-applied on fresh init
```
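If wiping the volumes is not acceptable, the migration files can be piped into the running database instead. This is a sketch under two assumptions that the guide does not confirm: the migrations are plain `.sql` files, and they are safe to re-run against an existing schema. The function name `apply_migrations` is hypothetical.

```bash
# Apply every SQL file in infra/migrations/ to the running database, in name order.
# Assumption: migrations are plain .sql files that tolerate being re-applied.
apply_migrations() {
  dir="${1:-./infra/migrations}"
  for file in "$dir"/*.sql; do
    # An unmatched glob stays literal, so check that the file really exists.
    [ -e "$file" ] || { echo "no .sql files in $dir" >&2; return 1; }
    echo "applying $file"
    # ON_ERROR_STOP makes psql exit non-zero on the first failing statement.
    docker compose exec -T postgres psql -U stonks -d stonks -v ON_ERROR_STOP=1 < "$file" || return 1
  done
}
```

The loop stops at the first failing file, which leaves earlier files applied. Take a backup before trying this on a database you care about.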

### Ollama model not available

The extractor service needs an LLM model loaded. Pull a model manually:

```bash
# If using bundled Ollama container:
docker compose exec ollama ollama pull qwen3.5:9b-fast

# If using host Ollama:
ollama pull qwen3.5:9b-fast

# If using vLLM, ensure the model is loaded on the vLLM server
curl http://your-vllm-host:8000/v1/models
```
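To check whether the model is already present before pulling, the output of `ollama list` can be filtered by name. A minimal sketch, with the hypothetical function name `has_model`:

```bash
# Succeed if the named model appears in `ollama list` output read on stdin.
# Example:
#   docker compose exec -T ollama ollama list | has_model qwen3.5:9b-fast
has_model() {
  model="$1"
  # Skip the header row, then compare the NAME column exactly.
  awk -v want="$model" 'NR > 1 && $1 == want { found = 1 } END { exit (found ? 0 : 1) }'
}
```

The comparison is exact, so the tag (for example `:9b-fast`) must be included in the argument.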

### Ollama port conflict (address already in use)

If Ollama is already running on the host, the bundled container will fail to bind port 11434. Use the external Ollama configuration described in the "LLM Provider Configuration" section above, or use `deploy-docker.sh` which handles this automatically.

### GPU not detected by Ollama container

Ensure the NVIDIA Container Toolkit is installed and Docker is configured:

```bash
# Verify GPU passthrough works
docker run --rm --gpus all nvidia/cuda:12.8.0-base-ubuntu24.04 nvidia-smi

# If it fails, reconfigure Docker runtime
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```

### Port conflicts

If a port is already in use, modify the host port mapping in `docker-compose.yml`:

```yaml
query-api:
  ports:
    - "9004:8000"   # Changed from 8004 to 9004
```
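Before changing a mapping, it helps to identify which process holds the port. The sketch below filters `ss` output on a Linux host; the function name `listeners_on_port` is hypothetical.

```bash
# Filter `ss -ltnp` output (stdin) down to sockets listening on the given port.
# Example (sudo lets ss show the owning process):
#   sudo ss -ltnp | listeners_on_port 8004
listeners_on_port() {
  port="$1"
  # Column 4 is the local address, e.g. 0.0.0.0:8004 or [::]:8004.
  awk -v port="$port" 'NR > 1 && $4 ~ (":" port "$")'
}
```

The same check with port `11434` shows whether a host Ollama instance is blocking the bundled container.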

### Container runs out of memory

The full stack requires at least 16 GB RAM. If services are being OOM-killed:

```bash
# Check which containers are using the most memory
docker stats --no-stream

# Reduce memory usage by stopping non-essential services
docker compose stop trino hive-metastore superset
```
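To confirm that a container was stopped by the kernel OOM killer rather than by an application error, Docker's `OOMKilled` state flag can be inspected. A minimal sketch, with the hypothetical function name `list_oom_killed`:

```bash
# List containers in this Compose project that were terminated for exceeding memory.
list_oom_killed() {
  # -a includes exited containers, which is where OOM kills show up.
  for cid in $(docker compose ps -aq); do
    docker inspect --format '{{.Name}} {{.State.OOMKilled}}' "$cid"
  done | awk '$2 == "true" { sub(/^\//, "", $1); print $1 }'
}
```

An empty result means none of the project's current containers has the flag set. The flag is cleared when a container is recreated, so run this before restarting the affected service.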