# Docker Deployment Guide

This guide covers running the full Stonks Oracle platform locally using Docker Compose. It documents every service, environment variable, volume mount, health check, and operational command.

## Prerequisites

- Docker Engine 24+ and Docker Compose v2
- At least 16 GB RAM (Ollama + Trino + all services)
- API keys for Polygon.io and Alpaca (optional — platform runs in degraded mode without them)

## Quick Start

```bash
# 1. Clone the repository
git clone <repo-url> && cd stonks-oracle

# 2. Configure API keys
cp .env.example .env   # or edit the existing .env
# Fill in MARKET_DATA_API_KEY, BROKER_API_KEY, BROKER_API_SECRET

# 3. Start everything
docker compose up -d

# 4. Verify all services are healthy
docker compose ps

# 5. Access the dashboard
open http://localhost:3000   # macOS; on Linux use xdg-open
```

---

## Service Inventory

### Infrastructure Services

| Service | Image | Ports | Volumes | Purpose |
|---------|-------|-------|---------|---------|
| `postgres` | `postgres:16-alpine` | `5432:5432` | `pgdata` → `/var/lib/postgresql/data`, `./infra/migrations` → `/docker-entrypoint-initdb.d` | Primary database; migrations auto-applied on first start |
| `redis` | `redis:7-alpine` | `6379:6379` | — | Queue broker, caching, deduplication |
| `minio` | `minio/minio:latest` | `9000:9000` (API), `9001:9001` (console) | `miniodata` → `/data` | Object storage for raw artifacts and lakehouse |
| `minio-init` | `minio/mc:latest` | — | — | One-shot init container that creates required buckets |
| `ollama` | `ollama/ollama:latest` | `11434:11434` | `ollama_models` → `/root/.ollama` | LLM inference server for extraction and classification |
| `trino` | `trinodb/trino:latest` | `8080:8080` | `./infra/trino/catalog` → `/etc/trino/catalog` | SQL query engine over the lakehouse |
| `hive-metastore` | `apache/hive:4.0.0` | `9083:9083` | `hive_data` → `/opt/hive/data`, `./infra/hive/core-site.xml` → `/opt/hive/conf/core-site.xml`, `./infra/hive/metastore-site.xml` → `/opt/hive/conf/metastore-site.xml` | Iceberg/Hive metadata catalog for Trino |
| `superset` | `apache/superset:latest` | `8088:8088` | `superset_data` → `/app/superset_home` | BI dashboards over Trino |

### Application Services

| Service | Dockerfile | `SERVICE_CMD` / Command | Ports | Depends On |
|---------|-----------|------------------------|-------|------------|
| `scheduler` | `docker/Dockerfile.scheduler` | `python -m services.scheduler.app` | — | postgres (healthy), redis (healthy) |
| `symbol-registry` | `docker/Dockerfile` | `uvicorn services.symbol_registry.app:app --host 0.0.0.0 --port 8000` | `8001:8000` | postgres (healthy) |
| `ingestion` | `docker/Dockerfile` | `python -m services.ingestion.worker` | — | postgres (healthy), redis (healthy), minio (healthy) |
| `parser` | `docker/Dockerfile` | `python -m services.parser.worker` | — | postgres (healthy), redis (healthy) |
| `extractor` | `docker/Dockerfile` | `python -m services.extractor.main` | — | postgres (healthy), redis (healthy), ollama (started) |
| `aggregation` | `docker/Dockerfile` | `python -m services.aggregation.main` | — | postgres (healthy), redis (healthy) |
| `recommendation` | `docker/Dockerfile` | `python -m services.recommendation.main` | — | postgres (healthy), redis (healthy) |
| `trading-engine` | `docker/Dockerfile` | `uvicorn services.trading.app:app --host 0.0.0.0 --port 8000` | `8002:8000` | postgres (healthy), redis (healthy) |
| `risk-engine` | `docker/Dockerfile` | `uvicorn services.risk.app:app --host 0.0.0.0 --port 8000` | `8003:8000` | postgres (healthy) |
| `broker-adapter` | `docker/Dockerfile` | `python -m services.adapters.broker_service` | — | postgres (healthy), redis (healthy) |
| `lake-publisher` | `docker/Dockerfile` | `python -m services.lake_publisher.jobs` | — | postgres (healthy), minio (healthy) |
| `query-api` | `docker/Dockerfile` | `uvicorn services.api.app:app --host 0.0.0.0 --port 8000` | `8004:8000` | postgres (healthy), redis (healthy), minio (healthy) |
| `dashboard` | `frontend/Dockerfile` | nginx (built-in) | `3000:8080` | query-api (healthy) |

### Port Summary

| Port | Service | Protocol |
|------|---------|----------|
| 3000 | Dashboard (React UI) | HTTP |
| 5432 | PostgreSQL | TCP |
| 6379 | Redis | TCP |
| 8001 | Symbol Registry API | HTTP |
| 8002 | Trading Engine API | HTTP |
| 8003 | Risk Engine API | HTTP |
| 8004 | Query API | HTTP |
| 8080 | Trino | HTTP |
| 8088 | Superset | HTTP |
| 9000 | MinIO API | HTTP |
| 9001 | MinIO Console | HTTP |
| 9083 | Hive Metastore | Thrift |
| 11434 | Ollama | HTTP |

---

## Environment Variables

### Shared Application Environment (`x-app-env`)

All application services inherit these variables via the `x-app-env` YAML anchor:

| Variable | Default | Description |
|----------|---------|-------------|
| `POSTGRES_HOST` | `postgres` | PostgreSQL hostname (Docker service name) |
| `POSTGRES_PORT` | `5432` | PostgreSQL port |
| `POSTGRES_DB` | `stonks` | Database name |
| `POSTGRES_USER` | `stonks` | Database user |
| `POSTGRES_PASSWORD` | `stonks_dev` | Database password |
| `REDIS_HOST` | `redis` | Redis hostname (Docker service name) |
| `REDIS_PORT` | `6379` | Redis port |
| `MINIO_ENDPOINT` | `minio:9000` | MinIO API endpoint |
| `MINIO_ACCESS_KEY` | `minioadmin` | MinIO access key |
| `MINIO_SECRET_KEY` | `minioadmin` | MinIO secret key |
| `OLLAMA_BASE_URL` | `http://ollama:11434` | Ollama LLM server URL |

### `.env` File

The `.env` file is loaded by `ingestion`, `broker-adapter`, and `trading-engine` via the `env_file` directive. Create it in the repository root:

```dotenv
# Stonks Oracle — Environment Variables
# These are loaded by ingestion, broker-adapter, and trading-engine services.

# Polygon.io market data API key (required for live data ingestion)
MARKET_DATA_API_KEY=

# Alpaca broker credentials (required for paper/live trading)
BROKER_API_KEY=
BROKER_API_SECRET=
BROKER_BASE_URL=https://paper-api.alpaca.markets
```

| Variable | Required | Default | Used By | Description |
|----------|----------|---------|---------|-------------|
| `MARKET_DATA_API_KEY` | No* | (empty) | ingestion | Polygon.io API key for market data fetching |
| `BROKER_API_KEY` | No* | (empty) | broker-adapter, trading-engine | Alpaca API key |
| `BROKER_API_SECRET` | No* | (empty) | broker-adapter, trading-engine | Alpaca API secret |
| `BROKER_BASE_URL` | No | `https://paper-api.alpaca.markets` | broker-adapter, trading-engine | Alpaca API base URL |

*Services start without these keys but run in degraded mode — ingestion cannot fetch market data and the broker adapter cannot execute trades.
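
As a rough illustration of what degraded mode means per service, the following Python sketch reads `.env`-style text and lists the services that would be missing credentials. The helper names (`parse_dotenv`, `degraded_services`) are hypothetical and not part of the repository; the key-to-service mapping is copied from the table above, and the parser only handles plain `KEY=VALUE` lines (no quoting or inline comments).

```python
# Hypothetical helper, not part of the repository. The mapping below is
# copied from the "Used By" column of the .env table.
USED_BY = {
    "MARKET_DATA_API_KEY": ["ingestion"],
    "BROKER_API_KEY": ["broker-adapter", "trading-engine"],
    "BROKER_API_SECRET": ["broker-adapter", "trading-engine"],
}


def parse_dotenv(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def degraded_services(env: dict) -> dict:
    """Return {service: [missing keys]} for every key that is unset or empty."""
    report = {}
    for key, services in USED_BY.items():
        if not env.get(key):
            for service in services:
                report.setdefault(service, []).append(key)
    return report


if __name__ == "__main__":
    sample = "MARKET_DATA_API_KEY=\nBROKER_API_KEY=abc\nBROKER_API_SECRET=\n"
    for service, missing in degraded_services(parse_dotenv(sample)).items():
        print(f"{service}: degraded (missing {', '.join(missing)})")
```

To check a real file, pass the contents of `.env` to `parse_dotenv` instead of the sample string.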

### Infrastructure Service Environment

**PostgreSQL** (`postgres`):

| Variable | Value | Description |
|----------|-------|-------------|
| `POSTGRES_DB` | `stonks` | Database created on first start |
| `POSTGRES_USER` | `stonks` | Superuser for the database |
| `POSTGRES_PASSWORD` | `stonks_dev` | Password for the database user |

**MinIO** (`minio`):

| Variable | Value | Description |
|----------|-------|-------------|
| `MINIO_ROOT_USER` | `minioadmin` | MinIO admin username |
| `MINIO_ROOT_PASSWORD` | `minioadmin` | MinIO admin password |

**Trino** (`trino`):

| Variable | Value | Description |
|----------|-------|-------------|
| `MINIO_ACCESS_KEY` | `minioadmin` | Passed to Trino for MinIO catalog access |
| `MINIO_SECRET_KEY` | `minioadmin` | Passed to Trino for MinIO catalog access |

**Hive Metastore** (`hive-metastore`):

| Variable | Value | Description |
|----------|-------|-------------|
| `SERVICE_NAME` | `metastore` | Tells Hive to run in metastore-only mode |
| `DB_DRIVER` | `derby` | Embedded Derby database for metadata |

**Superset** (`superset`):

| Variable | Value | Description |
|----------|-------|-------------|
| `SUPERSET_SECRET_KEY` | `stonks-dev-secret-key-change-me` | Flask secret key (change in production) |
| `ADMIN_USERNAME` | `admin` | Initial admin username |
| `ADMIN_PASSWORD` | `admin` | Initial admin password |
| `ADMIN_EMAIL` | `admin@stonks.local` | Initial admin email |

### Additional Configuration Variables

All application services support additional environment variables loaded via `services/shared/config.py`. These can be added to individual service `environment` blocks or to the `x-app-env` anchor as needed:

| Variable | Default | Description |
|----------|---------|-------------|
| `REDIS_DB` | `0` | Redis database number |
| `REDIS_PASSWORD` | (none) | Redis password (not needed in Docker Compose) |
| `MINIO_SECURE` | `false` | Use HTTPS for MinIO |
| `OLLAMA_MODEL` | `qwen3.5:9b` | Default LLM model for extraction |
| `OLLAMA_TIMEOUT` | `120` | Ollama request timeout (seconds) |
| `OLLAMA_MAX_RETRIES` | `2` | Max retries for Ollama requests |
| `TRINO_HOST` | `localhost` | Trino hostname |
| `TRINO_PORT` | `8080` | Trino port |
| `TRINO_CATALOG` | `lakehouse` | Trino catalog name |
| `TRINO_SCHEMA` | `stonks` | Trino schema name |
| `MARKET_DATA_BASE_URL` | `https://api.polygon.io` | Polygon.io base URL |
| `MARKET_DATA_PROVIDER` | `polygon` | Market data provider |
| `BROKER_MODE` | `paper` | Broker mode: `paper` or `live` |
| `BROKER_PROVIDER` | `alpaca` | Broker provider |
| `TRADING_ENABLED` | `false` | Enable autonomous trading engine |
| `TRADING_RISK_TIER` | `moderate` | Risk tier: `conservative`, `moderate`, `aggressive` |
| `TRADING_POLLING_INTERVAL_SECONDS` | `60` | Recommendation polling interval |
| `TRADING_MAX_OPEN_POSITIONS` | `10` | Maximum concurrent open positions |
| `MACRO_ENABLED` | `true` | Enable macro signal layer |
| `COMPETITIVE_ENABLED` | `true` | Enable competitive signal layer |
| `LOG_LEVEL` | `INFO` | Logging level |
| `JSON_LOGS` | `true` | Enable structured JSON logging |
| `DEPLOY_STAGE` | (empty) | Deployment stage prefix for bucket names |

See `services/shared/config.py` for the complete list of all supported environment variables with their defaults.
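
For orientation, here is a minimal sketch of how such variables are commonly read with their defaults. It is not the contents of `services/shared/config.py`: `TradingSettings` and `load_trading_settings` are hypothetical names, only the four `TRADING_*` variables from the table are covered, and the set of strings accepted as "true" is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Assumption: which strings count as "true" is illustrative only.
_TRUTHY = {"1", "true", "yes", "on"}
_RISK_TIERS = {"conservative", "moderate", "aggressive"}


def _env_bool(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Read a boolean flag, falling back to the default when unset."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUTHY


@dataclass(frozen=True)
class TradingSettings:
    enabled: bool
    risk_tier: str
    polling_interval_seconds: int
    max_open_positions: int


def load_trading_settings(env: Mapping[str, str] = os.environ) -> TradingSettings:
    """Build settings from the environment using the defaults in the table."""
    risk_tier = env.get("TRADING_RISK_TIER", "moderate")
    if risk_tier not in _RISK_TIERS:
        raise ValueError(f"unknown TRADING_RISK_TIER: {risk_tier!r}")
    return TradingSettings(
        enabled=_env_bool(env, "TRADING_ENABLED", False),
        risk_tier=risk_tier,
        polling_interval_seconds=int(env.get("TRADING_POLLING_INTERVAL_SECONDS", "60")),
        max_open_positions=int(env.get("TRADING_MAX_OPEN_POSITIONS", "10")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the defaults easy to inspect without touching the real environment.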

---

## Volume Mounts and Data Persistence

Docker Compose defines five named volumes for persistent data:

| Volume | Mounted By | Mount Path | Contents |
|--------|-----------|------------|----------|
| `pgdata` | postgres | `/var/lib/postgresql/data` | PostgreSQL database files |
| `miniodata` | minio | `/data` | MinIO object storage (raw artifacts, lakehouse Parquet files) |
| `ollama_models` | ollama | `/root/.ollama` | Downloaded LLM model weights |
| `hive_data` | hive-metastore | `/opt/hive/data` | Hive metastore Derby database |
| `superset_data` | superset | `/app/superset_home` | Superset configuration and metadata |

### Bind Mounts

In addition to named volumes, several services use bind mounts for configuration:

| Service | Host Path | Container Path | Mode | Purpose |
|---------|-----------|---------------|------|---------|
| postgres | `./infra/migrations` | `/docker-entrypoint-initdb.d` | rw | SQL migrations auto-applied on first start |
| trino | `./infra/trino/catalog` | `/etc/trino/catalog` | rw | Trino catalog configuration (lakehouse, iceberg) |
| hive-metastore | `./infra/hive/core-site.xml` | `/opt/hive/conf/core-site.xml` | ro | Hadoop core-site config for MinIO access |
| hive-metastore | `./infra/hive/metastore-site.xml` | `/opt/hive/conf/metastore-site.xml` | ro | Hive metastore config |

### Resetting Data

To destroy all persistent data and start fresh:

```bash
# Stop all containers and remove named volumes
docker compose down -v
```

This removes `pgdata`, `miniodata`, `ollama_models`, `hive_data`, and `superset_data`. The next `docker compose up` re-initializes PostgreSQL with migrations and re-creates the MinIO buckets (via `minio-init`); Ollama models must be downloaded again.

To reset only specific volumes:

```bash
docker compose down
docker volume rm stonks-oracle_pgdata   # Reset database only
docker compose up -d
```

> **Note**: Volume names are prefixed with the Compose project name, which defaults to the project directory name (e.g., `stonks-oracle_pgdata`). Use `docker volume ls` to see exact names.

---

## Health Checks

The application services and the core infrastructure services (`postgres`, `redis`, `minio`) have health checks configured. Docker Compose uses these to enforce startup ordering via `depends_on` with `condition: service_healthy`.

### Infrastructure Health Checks

| Service | Test Command | Interval | Retries |
|---------|-------------|----------|---------|
| `postgres` | `pg_isready -U stonks` | 5s | 5 |
| `redis` | `redis-cli ping` | 5s | 5 |
| `minio` | `mc ready local` | 5s | 5 |

### Application Health Checks — FastAPI Services

FastAPI services (symbol-registry, trading-engine, risk-engine, query-api) use HTTP health endpoints. The `dashboard` container (nginx) is checked the same way against its root path:

| Service | Test Command | Interval | Timeout | Retries | Start Period |
|---------|-------------|----------|---------|---------|-------------|
| `symbol-registry` | `curl -f http://localhost:8000/health` | 10s | 5s | 3 | 15s |
| `trading-engine` | `curl -f http://localhost:8000/health` | 10s | 5s | 3 | 15s |
| `risk-engine` | `curl -f http://localhost:8000/health` | 10s | 5s | 3 | 15s |
| `query-api` | `curl -f http://localhost:8000/health` | 10s | 5s | 3 | 15s |
| `dashboard` | `curl -f http://localhost:8080/` | 10s | 5s | 3 | 10s |

### Application Health Checks — Worker Services

Worker services (no HTTP endpoint) use process liveness checks:

| Service | Test Command | Interval | Timeout | Retries | Start Period |
|---------|-------------|----------|---------|---------|-------------|
| `scheduler` | `pgrep -f 'python -m services.scheduler.app'` | 10s | 5s | 3 | 15s |
| `ingestion` | `pgrep -f 'python -m services.ingestion.worker'` | 10s | 5s | 3 | 15s |
| `parser` | `pgrep -f 'python -m services.parser.worker'` | 10s | 5s | 3 | 15s |
| `extractor` | `pgrep -f 'python -m services.extractor.main'` | 10s | 5s | 3 | 15s |
| `aggregation` | `pgrep -f 'python -m services.aggregation.main'` | 10s | 5s | 3 | 15s |
| `recommendation` | `pgrep -f 'python -m services.recommendation.main'` | 10s | 5s | 3 | 15s |
| `broker-adapter` | `pgrep -f 'python -m services.adapters.broker_service'` | 10s | 5s | 3 | 15s |
| `lake-publisher` | `pgrep -f 'python -m services.lake_publisher.jobs'` | 10s | 5s | 3 | 15s |

### Verifying Service Health

```bash
# Check all service statuses
docker compose ps

# Check a specific service
docker compose ps query-api

# Inspect health check details for a container
docker inspect --format='{{json .State.Health}}' stonks-oracle-query-api-1 | python -m json.tool
```
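
The JSON printed by the `docker inspect` command above can be long. The sketch below condenses it to one line. `summarize_health` is a hypothetical helper; it assumes the standard Docker health fields (`Status`, `FailingStreak`, and `Log` entries carrying `ExitCode` and `Output`) and that a container without a health check prints `null`.

```python
import json


def summarize_health(health_json: str) -> str:
    """Condense the JSON from `docker inspect --format='{{json .State.Health}}'`."""
    health = json.loads(health_json)
    if health is None:
        # Assumption: containers without a health check print `null`.
        return "no health check configured"
    status = health.get("Status", "unknown")
    streak = health.get("FailingStreak", 0)
    summary = f"status={status} failing_streak={streak}"
    log = health.get("Log") or []
    if log:
        last = log[-1]  # most recent probe result
        summary += f" last_exit_code={last.get('ExitCode')}"
        output = (last.get("Output") or "").strip()
        if output:
            summary += f" last_output={output!r}"
    return summary
```

One way to use it is to capture the `docker inspect` output with `subprocess.run(..., capture_output=True, text=True)` and pass its `stdout` to `summarize_health`.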

---

## Dockerfile Build Details

### `docker/Dockerfile` — Generic Python Service Image

Used by all Python application services except the scheduler (the dashboard has its own `frontend/Dockerfile`). Accepts a `SERVICE_CMD` build argument that determines which service the container runs.

**Base image**: `python:3.12-slim`

**Build arguments**:

| Argument | Default | Description |
|----------|---------|-------------|
| `SERVICE_CMD` | `python -m services.scheduler.app` | The command executed when the container starts |

**What gets copied**:
- `requirements.txt` → pip dependencies installed
- `services/` → all service source code
- `tests/` → test files (available for in-container testing)
- `conftest.py` → pytest configuration

**Environment variables set**:
- `PYTHONDONTWRITEBYTECODE=1` — no `.pyc` files
- `PYTHONUNBUFFERED=1` — unbuffered stdout/stderr for log visibility
- `PYTHONPATH=/app` — ensures `services.*` imports resolve

**System packages installed**: `gcc`, `libpq-dev` (PostgreSQL client library), `curl` (for health checks)

**Security**: Runs as non-root user `stonks` (UID 1000).

**How `SERVICE_CMD` works**: The `CMD` directive is `sh -c "${SERVICE_CMD}"`, so the build argument becomes the runtime command. Each service in `docker-compose.yml` overrides this via the `args.SERVICE_CMD` build parameter:

```yaml
query-api:
  build:
    context: .
    dockerfile: docker/Dockerfile
    args:
      SERVICE_CMD: "uvicorn services.api.app:app --host 0.0.0.0 --port 8000"
```

### `docker/Dockerfile.scheduler` — Scheduler Image

A specialized variant of the generic Dockerfile used only by the `scheduler` service. Adds `postgresql-client` for running database migrations via `psql`.

**Additional contents**:
- `infra/migrations/` → copied to `/app/infra/migrations/` for migration execution
- `postgresql-client` system package installed

**Command**: Hardcoded `CMD ["python", "-m", "services.scheduler.app"]` (no `SERVICE_CMD` argument).

### `docker/Dockerfile.superset` — Custom Superset Image

Extends the official Apache Superset image with additional database drivers.

**Base image**: `apache/superset:latest`

**Additional packages**: `trino[sqlalchemy]`, `psycopg2-binary`, `redis`

### `frontend/Dockerfile` — Dashboard Image

Multi-stage build for the React dashboard.

**Stage 1 — Build** (base: `node:24-alpine`):

| Build Argument | Default | Description |
|---------------|---------|-------------|
| `VITE_QUERY_API_URL` | `""` | Query API base URL (empty = use relative `/api/` proxy) |
| `VITE_SYMBOL_REGISTRY_URL` | `""` | Symbol Registry base URL (empty = use relative `/registry/` proxy) |
| `VITE_RISK_ENGINE_URL` | `""` | Risk Engine base URL (empty = use relative `/risk/` proxy) |

**Stage 2 — Serve** (base: `nginxinc/nginx-unprivileged:alpine`):
- Serves the built static files on port 8080
- Uses `frontend/nginx.conf` for SPA fallback and API reverse proxying
- Proxies `/api/` → `query-api:8000`, `/registry/` → `symbol-registry:8000`, `/risk/` → `risk-engine:8000`, `/trading/` → `trading-engine:8000`

### Building Custom Images

To build a single service image locally:

```bash
# Build the query-api image
docker compose build query-api

# Build with a custom SERVICE_CMD
docker build -t my-custom-service \
  --build-arg SERVICE_CMD="python -m services.my_service.main" \
  -f docker/Dockerfile .

# Build the dashboard with custom API URLs
docker build -t my-dashboard \
  --build-arg VITE_QUERY_API_URL="https://api.example.com" \
  -f frontend/Dockerfile frontend/

# Rebuild all images
docker compose build
```

---

## Dependency Ordering

Docker Compose enforces startup order using `depends_on` with health check conditions. The dependency graph is:

```
postgres (healthy) ──┬── scheduler
                     ├── symbol-registry
                     ├── ingestion
                     ├── parser
                     ├── extractor
                     ├── aggregation
                     ├── recommendation
                     ├── trading-engine
                     ├── risk-engine
                     ├── broker-adapter
                     ├── lake-publisher
                     └── query-api

redis (healthy) ─────┬── scheduler
                     ├── ingestion
                     ├── parser
                     ├── extractor
                     ├── aggregation
                     ├── recommendation
                     ├── trading-engine
                     ├── broker-adapter
                     └── query-api

minio (healthy) ─────┬── minio-init
                     ├── ingestion
                     ├── lake-publisher
                     └── query-api

ollama (started) ────── extractor

minio ───────────────── trino
hive-metastore ─────── trino
trino ──────────────── superset (via depends_on)

query-api (healthy) ── dashboard
```

Services with `condition: service_healthy` wait until the dependency's health check passes. The `extractor` depends on `ollama` with `condition: service_started` (no health check — Ollama may take time to load models).
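
The ordering implied by this graph can be reproduced with a topological sort. The sketch below uses Python's standard `graphlib` module on a subset of the services; the `DEPENDS_ON` mapping is transcribed by hand from the diagram and `startup_waves` is a hypothetical helper. It models ordering only, not the waiting on health checks that Compose performs.

```python
from graphlib import TopologicalSorter

# service -> services it depends on (subset, transcribed from the diagram).
DEPENDS_ON = {
    "scheduler": {"postgres", "redis"},
    "symbol-registry": {"postgres"},
    "ingestion": {"postgres", "redis", "minio"},
    "extractor": {"postgres", "redis", "ollama"},
    "query-api": {"postgres", "redis", "minio"},
    "minio-init": {"minio"},
    "trino": {"minio", "hive-metastore"},
    "superset": {"trino"},
    "dashboard": {"query-api"},
}


def startup_waves(depends_on):
    """Group services into waves; each wave depends only on earlier waves."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(startup_waves(DEPENDS_ON), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

With this subset, the infrastructure services land in the first wave, the services that depend only on them in the second, and `dashboard` and `superset` in the third.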

---

## Operational Commands

### Starting Services

```bash
# Start all services in the background
docker compose up -d

# Start only infrastructure (useful for local development)
docker compose up -d postgres redis minio minio-init ollama

# Start a specific service and its dependencies
docker compose up -d query-api
```

### Stopping Services

```bash
# Stop all services (preserves volumes)
docker compose down

# Stop all services and remove volumes (full reset)
docker compose down -v

# Stop a specific service
docker compose stop trading-engine
```

### Restarting Services

```bash
# Restart a specific service
docker compose restart query-api

# Restart with a fresh build
docker compose up -d --build query-api

# Force recreate a service (picks up compose file changes)
docker compose up -d --force-recreate query-api
```

### Viewing Logs

```bash
# Follow logs for all services
docker compose logs -f

# Follow logs for a specific service
docker compose logs -f query-api

# View last 50 lines of a service's logs
docker compose logs --tail=50 ingestion

# View logs for multiple services
docker compose logs -f scheduler ingestion extractor
```

### Scaling Replicas

```bash
# Scale a worker service to 3 replicas
docker compose up -d --scale ingestion=3

# Scale multiple services
docker compose up -d --scale ingestion=3 --scale extractor=2

# Scale back to 1
docker compose up -d --scale ingestion=1
```

> **Note**: Scaling works best for worker services (ingestion, parser, extractor, aggregation, recommendation, broker-adapter, lake-publisher) that consume from Redis queues. Do not scale FastAPI services that expose host ports without adjusting port mappings.

### Inspecting Services

```bash
# List all services and their status
docker compose ps

# List the processes running in each container
docker compose top

# Execute a command inside a running container
docker compose exec query-api python -c "from services.shared.config import load_config; print(load_config())"

# Open a psql session in the postgres container
docker compose exec postgres psql -U stonks -d stonks
```

### Full Reset

```bash
# Nuclear option: stop everything, remove volumes, rebuild, restart
docker compose down -v
docker compose build --no-cache
docker compose up -d
```

This destroys all data (database, object storage, model weights, metastore, Superset config) and starts from scratch. PostgreSQL migrations are re-applied automatically. MinIO buckets are re-created by `minio-init`. Ollama models must be re-downloaded.

---

## MinIO Bucket Initialization

The `minio-init` service runs once on startup and creates the required object storage buckets:

| Bucket | Purpose |
|--------|---------|
| `stonks-raw-market` | Raw market data from Polygon.io |
| `stonks-raw-news` | Raw news articles |
| `stonks-raw-filings` | Raw SEC filings |
| `stonks-normalized` | Normalized/parsed documents |
| `stonks-llm-prompts` | LLM prompt archives |
| `stonks-llm-results` | LLM extraction results |
| `stonks-lakehouse` | Parquet fact tables for Trino |
| `stonks-audit` | Audit trail artifacts |

Access the MinIO console at `http://localhost:9001` (credentials: `minioadmin` / `minioadmin`).

---

## Dashboard Reverse Proxy

The dashboard container runs nginx with reverse proxy rules that route API requests to backend services using Docker Compose service names:

| Path | Proxied To | Service |
|------|-----------|---------|
| `/api/` | `http://query-api:8000` | Query API |
| `/registry/` | `http://symbol-registry:8000/` | Symbol Registry API |
| `/risk/` | `http://risk-engine:8000/` | Risk Engine API |
| `/trading/` | `http://trading-engine:8000/` | Trading Engine API |

All other paths serve the React SPA with `try_files` fallback to `index.html`.
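
The routing table can be modelled in a few lines of Python. This sketch assumes the "Proxied To" column mirrors the `proxy_pass` targets in `frontend/nginx.conf` verbatim, including the trailing slashes. Under standard nginx behavior, a target with a URI part (such as a trailing `/`) replaces the matched prefix, while a target without one forwards the original path unchanged. `resolve_upstream` is a hypothetical helper that covers prefix matching only.

```python
from typing import Optional

# (location prefix, proxy_pass target) pairs copied from the table above.
ROUTES = [
    ("/api/", "http://query-api:8000"),
    ("/registry/", "http://symbol-registry:8000/"),
    ("/risk/", "http://risk-engine:8000/"),
    ("/trading/", "http://trading-engine:8000/"),
]


def resolve_upstream(path: str) -> Optional[str]:
    """Return the upstream URL for a request path, or None for SPA paths."""
    matches = [(prefix, target) for prefix, target in ROUTES if path.startswith(prefix)]
    if not matches:
        return None  # falls through to the React SPA
    # The longest matching prefix wins, as with nginx prefix locations.
    prefix, target = max(matches, key=lambda match: len(match[0]))
    scheme, _, rest = target.partition("://")
    host, slash, uri = rest.partition("/")
    if not slash:
        # Target without a URI part: the original path is forwarded unchanged.
        return f"{scheme}://{host}{path}"
    # Target with a URI part: the matched prefix is replaced by that URI.
    return f"{scheme}://{host}/{uri}{path[len(prefix):]}"


if __name__ == "__main__":
    for request_path in ("/api/v1/signals", "/registry/symbols/AAPL", "/positions"):
        print(request_path, "->", resolve_upstream(request_path))
```

If the table is accurate, this means the Query API sees the `/api/` prefix in incoming paths, while the other three services see paths with their prefix stripped.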

---

## Troubleshooting

### Service won't start

Check dependency health:

```bash
docker compose ps postgres redis minio
```

Application services declared with `condition: service_healthy` are not started until their infrastructure dependencies report healthy. If a dependency stays unhealthy, check its logs:

```bash
docker compose logs postgres
```

### Database migration errors

Migrations in `./infra/migrations/` are applied by PostgreSQL's `docker-entrypoint-initdb.d` mechanism, which only runs on first database initialization. If you need to re-run migrations:

```bash
docker compose down -v    # Removes ALL named volumes, including pgdata
docker compose up -d      # Migrations re-applied on fresh init
```

### Ollama model not available

The extractor service needs an LLM model loaded in Ollama. Pull a model manually:

```bash
docker compose exec ollama ollama pull qwen3.5:9b
```

### Port conflicts

If a port is already in use, modify the host port mapping in `docker-compose.yml`:

```yaml
query-api:
  ports:
    - "9004:8000"  # Changed from 8004 to 9004
```
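
Before editing port mappings, it can help to see which host ports are already taken. The sketch below tries to bind each host port from the Port Summary table. `port_is_free` and `conflicting_ports` are hypothetical helpers, and binding to `0.0.0.0` is an assumption meant to approximate Docker's default publish address.

```python
import socket

# Host ports from the Port Summary table.
HOST_PORTS = [3000, 5432, 6379, 8001, 8002, 8003, 8004, 8080, 8088, 9000, 9001, 9083, 11434]


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can currently bind to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False  # already in use, or binding is not permitted
    return True


def conflicting_ports(ports=HOST_PORTS, host: str = "0.0.0.0"):
    """Return the subset of ports that cannot be bound right now."""
    return [port for port in ports if not port_is_free(port, host)]


if __name__ == "__main__":
    busy = conflicting_ports()
    print("ports in use:", busy if busy else "none")
```

Run it while the stack is stopped; with the stack running, its own published ports are reported as in use.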