feat: implement dual-pipeline signal engine service

New service at services/signal_engine/ implementing concurrent heuristic (deterministic scoring) and probabilistic (Bayesian inference) pipelines that evaluate technical signals across 6 timeframes (M30-M) and produce independent BUY/WATCH/SKIP verdicts per ticker per evaluation tick.

Components:
- Input Normalizer: multi-source data assembly with sentinel fallbacks
- Signal Library: Fibonacci, MA Stack, RSI, Cup & Handle, Elliott Wave
- Multi-Timeframe Confluence Engine: weighted scoring with D/W/M anchors
- Hard Filter Engine: macro_bias, valuation, earnings proximity gating
- Heuristic Pipeline: S_total scoring with confidence-gated verdicts
- Probabilistic Pipeline: Bayesian log-odds with regime priors, entropy gating, EV_R calculation, and signal correlation penalty
- Exit Engine: stop-loss, targets, trailing ATR-based stops
- Delta Analyzer: pipeline agreement tracking with rolling Redis metrics
- Output Formatter: SignalOutput contract + Recommendation schema mapping
- Worker orchestrator: concurrent pipelines with failure isolation
- Main entry point: queue polling with fail-safe config loading

Infrastructure:
- Migration 039: signal_engine_outputs table with 3 indexes
- Helm chart: signalEngine service entry (processing tier)
- Redis key: QUEUE_SIGNAL_ENGINE constant

Tests: 390 tests (unit + property-based) covering all components
Config: dual_pipeline_enabled=false by default (safe rollout)
2026-05-02 07:32:26 +00:00
parent 7e2343ec2c
commit f468e30af0
61 changed files with 14107 additions and 184 deletions
@@ -5,6 +5,7 @@ This guide covers running the full Stonks Oracle platform locally using Docker C
 ## Prerequisites

 - Docker Engine 24+ and Docker Compose v2
+- NVIDIA GPU with drivers and NVIDIA Container Toolkit (for Ollama LLM inference)
 - At least 16 GB RAM (Ollama + Trino + all services)
 - API keys for Polygon.io and Alpaca (optional — platform runs in degraded mode without them)

@@ -14,20 +15,54 @@ This guide covers running the full Stonks Oracle platform locally using Docker C
 # 1. Clone the repository
 git clone <repo-url> && cd stonks-oracle

-# 2. Configure API keys
-cp .env.example .env   # or edit the existing .env
-# Fill in MARKET_DATA_API_KEY, BROKER_API_KEY, BROKER_API_SECRET
+# 2. Configure API keys (create .env in the repo root)
+cat > .env <<'EOF'
+MARKET_DATA_API_KEY=your_polygon_key
+BROKER_API_KEY=your_alpaca_key
+BROKER_API_SECRET=your_alpaca_secret
+BROKER_BASE_URL=https://paper-api.alpaca.markets
+EOF

 # 3. Start everything
 docker compose up -d

-# 4. Verify all services are healthy
+# 4. Pull an LLM model into Ollama
+docker compose exec ollama ollama pull qwen3.5:9b-fast
+
+# 5. Seed the database
+docker compose exec scheduler python -m services.symbol_registry.seed
+
+# 6. Verify all services are healthy
 docker compose ps

-# 5. Access the dashboard
+# 7. Access the dashboard
 open http://localhost:3000
 ```

+### Automated Deployment
+
+The `deploy-docker.sh` script automates the full deployment to a remote host via SSH, including prerequisite installation, repository sync, environment configuration, image builds, service startup, database seeding, and Ollama model pulling:
+
+```bash
+# Deploy with defaults (GPU-accelerated Docker Ollama)
+bash deploy-docker.sh
+
+# Specify a custom Ollama model
+bash deploy-docker.sh --ollama-model qwen3.6
+
+# Deploy to a different host
+bash deploy-docker.sh --host user@myserver --dir /opt/stonks
+```
+
+| Flag | Default | Description |
+|------|---------|-------------|
+| `--host` | `celes@192.168.42.254` | SSH target (`USER@HOST`) |
+| `--ollama-url` | (auto — Docker container) | Ollama API URL |
+| `--ollama-model` | `qwen3.5:9b-fast` | Ollama model to pull |
+| `--dir` | `~/stonks-oracle` | Remote install directory |
+
+The script detects the target OS and package manager (apt, dnf, yum, pacman, zypper) and installs Docker, NVIDIA drivers, and the NVIDIA Container Toolkit as needed. It also handles WSL environments and firewall configuration.
+
 ---

 ## Service Inventory
@@ -63,6 +98,8 @@ open http://localhost:3000
 | `query-api` | `docker/Dockerfile` | `uvicorn services.api.app:app --host 0.0.0.0 --port 8000` | `8004:8000` | postgres (healthy), redis (healthy), minio (healthy) |
 | `dashboard` | `frontend/Dockerfile` | nginx (built-in) | `3000:8080` | query-api (healthy) |

+The `risk-engine` service has a Docker network alias of `risk` so the dashboard's nginx reverse proxy can resolve it as `http://risk:8000`.
+
 ### Port Summary

 | Port | Service | Protocol |
@@ -109,15 +146,27 @@ The `.env` file is loaded by `ingestion`, `broker-adapter`, and `trading-engine`

 ```dotenv
 # Stonks Oracle — Environment Variables
-# These are loaded by ingestion, broker-adapter, and trading-engine services.
+# Loaded by: ingestion, broker-adapter, trading-engine

-# Polygon.io market data API key (required for live data ingestion)
+# ── Required for live data ingestion ──
 MARKET_DATA_API_KEY=

-# Alpaca broker credentials (required for paper/live trading)
+# ── Required for paper/live trading ──
 BROKER_API_KEY=
 BROKER_API_SECRET=
 BROKER_BASE_URL=https://paper-api.alpaca.markets
+
+# ── Trading engine settings (optional) ──
+TRADING_ENABLED=true
+TRADING_RISK_TIER=moderate
+TRADING_MAX_OPEN_POSITIONS=15
+
+# ── LLM model (optional) ──
+OLLAMA_MODEL=qwen3.5:9b-fast
+
+# ── Signal layers (optional) ──
+MACRO_ENABLED=true
+COMPETITIVE_ENABLED=true
 ```
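
As a rough illustration of the sample above, the following sketch parses `.env`-style text and reports which capabilities would run in degraded mode because their keys are blank. The parser, the function names, and the `FEATURE_KEYS` grouping are assumptions made for this example (Docker Compose does the real loading); only the variable names come from the sample.

```python
# Hypothetical helper, not part of the repository.
# Grouping of keys by feature is an assumption based on the sample comments.
FEATURE_KEYS = {
    "live data ingestion": ["MARKET_DATA_API_KEY"],
    "paper/live trading": ["BROKER_API_KEY", "BROKER_API_SECRET"],
}


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def degraded_features(env_text):
    """Return the features whose keys are missing or empty."""
    values = parse_env(env_text)
    return [
        feature
        for feature, keys in FEATURE_KEYS.items()
        if any(not values.get(key) for key in keys)
    ]
```

Calling `degraded_features(open(".env").read())` before `docker compose up -d` gives a quick view of what will be unavailable without the API keys.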

 | Variable | Required | Default | Used By | Description |
@@ -178,20 +227,24 @@ All application services support additional environment variables loaded via `se
 | `REDIS_DB` | `0` | Redis database number |
 | `REDIS_PASSWORD` | (none) | Redis password (not needed in Docker Compose) |
 | `MINIO_SECURE` | `false` | Use HTTPS for MinIO |
-| `OLLAMA_BASE_URL` | `http://ollama:11434` | Ollama LLM server URL |
 | `OLLAMA_MODEL` | `qwen3.5:9b` | Default LLM model for extraction |
 | `OLLAMA_TIMEOUT` | `120` | Ollama request timeout (seconds) |
 | `OLLAMA_MAX_RETRIES` | `2` | Max retries for Ollama requests |
-| `VLLM_BASE_URL` | (empty) | vLLM server URL (if using vLLM instead of Ollama) |
-| `VLLM_MODEL` | (empty) | vLLM model name (e.g. `AxionML/Qwen3.5-9B-NVFP4`) |
+| `OLLAMA_RETRY_BASE_DELAY` | `1.0` | Base delay between retries (seconds) |
+| `OLLAMA_RETRY_MAX_DELAY` | `10.0` | Maximum delay between retries (seconds) |
+| `OLLAMA_RETRY_BACKOFF_MULTIPLIER` | `2.0` | Backoff multiplier for retries |
+| `VLLM_BASE_URL` | `http://192.168.42.254:8000` | vLLM server URL (if using vLLM instead of Ollama) |
+| `VLLM_MODEL` | `RedHatAI/Qwen3.6-35B-A3B-NVFP4` | vLLM model name |
 | `VLLM_TIMEOUT` | `120` | vLLM request timeout (seconds) |
 | `VLLM_MAX_RETRIES` | `2` | Max retries for vLLM requests |
 | `VLLM_TEMPERATURE` | `0.7` | vLLM sampling temperature |
+| `VLLM_MAX_TOKENS` | `4096` | vLLM max output tokens |
 | `VLLM_API_KEY` | (empty) | vLLM API key (if required) |
 | `TRINO_HOST` | `localhost` | Trino hostname |
 | `TRINO_PORT` | `8080` | Trino port |
 | `TRINO_CATALOG` | `lakehouse` | Trino catalog name |
 | `TRINO_SCHEMA` | `stonks` | Trino schema name |
+| `TRINO_ICEBERG_CATALOG` | `iceberg` | Trino Iceberg catalog name |
 | `MARKET_DATA_BASE_URL` | `https://api.polygon.io` | Polygon.io base URL |
 | `MARKET_DATA_PROVIDER` | `polygon` | Market data provider |
 | `BROKER_MODE` | `paper` | Broker mode: `paper` or `live` |
@@ -200,12 +253,62 @@ All application services support additional environment variables loaded via `se
 | `TRADING_RISK_TIER` | `moderate` | Risk tier: `conservative`, `moderate`, `aggressive` |
 | `TRADING_POLLING_INTERVAL_SECONDS` | `60` | Recommendation polling interval |
 | `TRADING_MAX_OPEN_POSITIONS` | `10` | Maximum concurrent open positions |
+| `TRADING_RESERVE_SIPHON_PCT` | `0.20` | Percentage of profits siphoned to reserve pool |
+| `TRADING_STOP_LOSS_CHECK_INTERVAL_SECONDS` | `300` | Stop-loss check interval |
+| `TRADING_FAST_STOP_LOSS_INTERVAL_SECONDS` | `60` | Fast stop-loss check interval |
+| `TRADING_GRADUAL_ENTRY_TRANCHES` | `3` | Number of tranches for gradual entry |
+| `TRADING_GRADUAL_ENTRY_THRESHOLD_DOLLARS` | `30.0` | Dollar threshold for gradual entry |
+| `TRADING_ABSOLUTE_POSITION_CAP` | `50.0` | Maximum position size (dollars) |
+| `TRADING_ACTIVE_POOL_MINIMUM` | `100.0` | Minimum active pool balance |
+| `TRADING_EMERGENCY_DRAWDOWN_THRESHOLD_PCT` | `0.40` | Emergency drawdown threshold |
+| `TRADING_RESERVE_HIGH_WATER_PCT` | `0.30` | Reserve high-water mark percentage |
+| `TRADING_MICRO_TRADING_ENABLED` | `false` | Enable micro-trading mode |
+| `TRADING_MICRO_TRADING_INTERVAL_SECONDS` | `300` | Micro-trading polling interval |
+| `TRADING_MICRO_TRADING_ALLOCATION_CAP_PCT` | `0.03` | Micro-trading allocation cap |
+| `TRADING_MICRO_TRADING_MAX_DAILY` | `10` | Max micro-trades per day |
+| `TRADING_MICRO_TRADING_MAX_HOLD_MINUTES` | `120` | Max micro-trade hold time |
+| `TRADING_SNS_TOPIC_ARN` | (empty) | AWS SNS topic ARN for notifications |
+| `TRADING_SNS_PHONE_NUMBER` | (empty) | Phone number for SNS notifications |
+| `TRADING_GMAIL_SENDER` | (empty) | Gmail sender address for notifications |
+| `TRADING_GMAIL_RECIPIENT` | (empty) | Gmail recipient address for notifications |
 | `MACRO_ENABLED` | `true` | Enable macro signal layer |
+| `MACRO_SIGNAL_WEIGHT` | `0.3` | Relative weight of macro vs company signals |
+| `MACRO_CONFIDENCE_THRESHOLD` | `0.4` | Minimum confidence for macro event inclusion |
+| `MACRO_SHORT_TERM_STALENESS_HOURS` | `48` | Hours before short-term events get accelerated decay |
+| `PROJECTION_CONFIDENCE_THRESHOLD` | `0.3` | Minimum confidence for projections to influence recommendations |
 | `COMPETITIVE_ENABLED` | `true` | Enable competitive signal layer |
+| `COMPETITIVE_SIGNAL_WEIGHT` | `0.2` | Relative weight of competitive signals |
+| `COMPETITIVE_PATTERN_CONFIDENCE_THRESHOLD` | `0.3` | Minimum confidence for pattern inclusion |
+| `COMPETITIVE_PROPAGATION_STRENGTH_THRESHOLD` | `0.2` | Minimum strength for signal propagation |
+| `COMPETITIVE_ROUTINE_LOOKBACK_DAYS` | `180` | Lookback window for routine patterns |
+| `COMPETITIVE_MAJOR_DECISION_LOOKBACK_DAYS` | `365` | Lookback window for major decisions |
+| `COMPETITIVE_MIN_PATTERN_SAMPLES` | `3` | Minimum samples for pattern matching |
+| `COMPETITIVE_MAJOR_DECISION_WEIGHT_MULTIPLIER` | `1.3` | Weight multiplier for major decision patterns |
+| `COMPETITIVE_STALENESS_WINDOW_DAYS` | `180` | Window for staleness decay on competitive signals |
+| `COMPETITIVE_STALENESS_RECENT_DAYS` | `90` | Days within which signals are considered recent |
+| `COMPETITIVE_STALENESS_DECAY_PENALTY` | `0.5` | Decay penalty for stale competitive signals |
+| `COMPETITIVE_PROPAGATION_FAILURE_THRESHOLD` | `5` | Consecutive propagation failures before operator alert |
+| `ALERT_SOURCE_FAILURE_THRESHOLD` | `3` | Consecutive source failures before alert fires |
+| `ALERT_SOURCE_FAILURE_WINDOW_HOURS` | `6` | Lookback window for source failure alerting |
+| `ALERT_SCHEMA_FAILURE_RATE_THRESHOLD` | `0.3` | Extraction failure rate (30%) that triggers alert |
+| `ALERT_SCHEMA_FAILURE_WINDOW_HOURS` | `1` | Lookback window for schema failure spike |
+| `ALERT_LAKE_LAG_THRESHOLD_MINUTES` | `60` | Minutes since last lake publish before alert |
+| `ALERT_BROKER_ERROR_THRESHOLD` | `3` | Consecutive broker errors before alert |
+| `ALERT_BROKER_ERROR_WINDOW_HOURS` | `1` | Lookback window for broker error alerting |
+| `ALERT_CHECK_INTERVAL_SECONDS` | `120` | How often alerting rules are evaluated |
+| `RETENTION_RAW_MARKET_DAYS` | `90` | Retention period for raw market data (days) |
+| `RETENTION_RAW_NEWS_DAYS` | `180` | Retention period for raw news articles (days) |
+| `RETENTION_RAW_FILINGS_DAYS` | `365` | Retention period for raw SEC filings (days) |
+| `RETENTION_NORMALIZED_DAYS` | `180` | Retention period for normalized documents (days) |
+| `RETENTION_LLM_PROMPTS_DAYS` | `365` | Retention period for LLM prompt archives (days) |
+| `RETENTION_LLM_RESULTS_DAYS` | `365` | Retention period for LLM extraction results (days) |
+| `RETENTION_LAKEHOUSE_DAYS` | `730` | Retention period for lakehouse Parquet files (days) |
+| `RETENTION_AUDIT_DAYS` | `730` | Retention period for audit trail artifacts (days) |
+| `RETENTION_CLEANUP_INTERVAL_HOURS` | `24` | How often the retention cleanup worker runs |
+| `RETENTION_BATCH_SIZE` | `1000` | Number of objects processed per cleanup batch |
 | `LOG_LEVEL` | `INFO` | Logging level |
 | `JSON_LOGS` | `true` | Enable structured JSON logging |
 | `DEPLOY_STAGE` | (empty) | Deployment stage prefix for bucket names |
-| `TZ` | `America/Los_Angeles` | Display timezone for timestamps (set on all containers) |

 See `services/shared/config.py` for the complete list of all supported environment variables with their defaults.
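
To make the `OLLAMA_RETRY_*` settings in the table above concrete, here is a minimal sketch of how a base delay, a multiplier, and a cap usually combine into a retry schedule. It assumes plain capped exponential backoff with no jitter; the function name is hypothetical, and the actual client in `services/` may compute delays differently.

```python
# Hypothetical illustration; defaults mirror the documented values for
# OLLAMA_MAX_RETRIES, OLLAMA_RETRY_BASE_DELAY, OLLAMA_RETRY_MAX_DELAY,
# and OLLAMA_RETRY_BACKOFF_MULTIPLIER.
def retry_delays(max_retries=2, base_delay=1.0, max_delay=10.0, multiplier=2.0):
    """Seconds to wait before each retry, capped at max_delay."""
    return [
        min(base_delay * multiplier ** attempt, max_delay)
        for attempt in range(max_retries)
    ]
```

Under these assumptions the defaults give waits of 1 s and 2 s; raising `max_retries` to 5 gives 1, 2, 4, 8, and then 10 s, where the cap applies.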

@@ -217,7 +320,7 @@ Stonks Oracle supports two LLM backends: **Ollama** (local, self-hosted) and **v

 ### Option A: Bundled Ollama (default)

-The `docker-compose.yml` includes an Ollama container. On first start, pull a model:
+The `docker-compose.yml` includes an Ollama container with GPU passthrough via the NVIDIA Container Toolkit. On first start, pull a model:

 ```bash
 docker compose exec ollama ollama pull qwen3.5:9b-fast
@@ -225,6 +328,8 @@ docker compose exec ollama ollama pull qwen3.5:9b-fast

 No additional configuration needed — services connect to `http://ollama:11434` by default.

+The Ollama container requests all available NVIDIA GPUs via the `deploy.resources.reservations.devices` configuration. If no GPU is available, Ollama falls back to CPU inference (significantly slower).
+
 ### Option B: External Ollama

 If Ollama is already running on the host (e.g. with GPU access), create a `docker-compose.override.yml`:
@@ -252,15 +357,15 @@ services:
      - "host.docker.internal:host-gateway"
 ```

-This disables the bundled Ollama container and routes services to the host's instance. Replace the port if your Ollama runs on a non-standard port.
+This disables the bundled Ollama container and routes services to the host's instance. Replace the port if your Ollama runs on a non-standard port. For a remote Ollama instance (not on localhost), replace `host.docker.internal` with the remote IP and remove the `extra_hosts` block.

 ### Option C: vLLM Server

-For higher throughput or quantized models (e.g. `AxionML/Qwen3.5-9B-NVFP4`), point services at a vLLM server. Add to your `.env`:
+For higher throughput or quantized models (e.g. `RedHatAI/Qwen3.6-35B-A3B-NVFP4`), point services at a vLLM server. Add to your `.env`:

 ```dotenv
 VLLM_BASE_URL=http://192.168.42.254:8000
-VLLM_MODEL=AxionML/Qwen3.5-9B-NVFP4
+VLLM_MODEL=RedHatAI/Qwen3.6-35B-A3B-NVFP4
 VLLM_TIMEOUT=120
 VLLM_TEMPERATURE=0.7
 ```
@@ -268,7 +373,7 @@ VLLM_TEMPERATURE=0.7
 Then update the `ai_agents` table to use the vLLM provider:

 ```sql
-UPDATE ai_agents SET model_provider = 'vllm', model_name = 'AxionML/Qwen3.5-9B-NVFP4' WHERE active = true;
+UPDATE ai_agents SET model_provider = 'vllm', model_name = 'RedHatAI/Qwen3.6-35B-A3B-NVFP4' WHERE active = true;
 ```

 Or use the API:
@@ -276,7 +381,7 @@ Or use the API:
 ```bash
 curl -X PUT http://localhost:8004/api/admin/agents/document-extractor \
  -H 'Content-Type: application/json' \
-  -d '{"model_provider": "vllm", "model_name": "AxionML/Qwen3.5-9B-NVFP4"}'
+  -d '{"model_provider": "vllm", "model_name": "RedHatAI/Qwen3.6-35B-A3B-NVFP4"}'
 ```

 ### Option D: Mixed (Ollama + vLLM)
@@ -284,8 +389,8 @@ curl -X PUT http://localhost:8004/api/admin/agents/document-extractor \
 You can run different agents on different providers. For example, use vLLM for the high-volume extractor and Ollama for the thesis rewriter:

 ```sql
-UPDATE ai_agents SET model_provider = 'vllm', model_name = 'AxionML/Qwen3.5-9B-NVFP4' WHERE slug = 'document-extractor';
-UPDATE ai_agents SET model_provider = 'vllm', model_name = 'AxionML/Qwen3.5-9B-NVFP4' WHERE slug = 'event-classifier';
+UPDATE ai_agents SET model_provider = 'vllm', model_name = 'RedHatAI/Qwen3.6-35B-A3B-NVFP4' WHERE slug = 'document-extractor';
+UPDATE ai_agents SET model_provider = 'vllm', model_name = 'RedHatAI/Qwen3.6-35B-A3B-NVFP4' WHERE slug = 'event-classifier';
 UPDATE ai_agents SET model_provider = 'ollama', model_name = 'qwen3.5:9b-fast' WHERE slug = 'thesis-rewriter';
 ```
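
Mixed mode depends on each provider's base URL being present in the environment. The sketch below is a hypothetical pre-flight check (the function and the `PROVIDER_ENV` mapping are not part of the repository) that lists which base-URL variables are missing for the providers assigned to agents.

```python
import os

# Assumed mapping from model_provider values to their base-URL variables.
PROVIDER_ENV = {"ollama": "OLLAMA_BASE_URL", "vllm": "VLLM_BASE_URL"}


def missing_provider_env(providers, env=None):
    """Return the base-URL variables that are unset or empty.

    `providers` is an iterable of model_provider values, for example the
    distinct values read from the ai_agents table. Unknown providers raise
    KeyError so typos surface early.
    """
    env = os.environ if env is None else env
    needed = {PROVIDER_ENV[provider] for provider in providers}
    return sorted(name for name in needed if not env.get(name))
```

An empty result means every provider in use has a URL configured.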

@@ -293,19 +398,21 @@ Both `OLLAMA_BASE_URL` and `VLLM_BASE_URL` must be set in the environment for mi

 ### Automated Deployment

-The `deploy-docker.sh` script handles LLM configuration automatically:
+The `deploy-docker.sh` script handles LLM configuration automatically. By default it uses the bundled Docker Ollama container with GPU passthrough (NVIDIA Container Toolkit):

 ```bash
-# Auto-detect host Ollama, use default model
+# Deploy with defaults (Docker Ollama, GPU-accelerated)
 bash deploy-docker.sh

-# Specify a remote Ollama instance
-bash deploy-docker.sh --ollama-url http://10.1.1.12:2701 --ollama-model qwen3.6
+# Specify a custom model
+bash deploy-docker.sh --ollama-model qwen3.6

-# Specify a different host
+# Specify a different host and directory
 bash deploy-docker.sh --host user@myserver --dir /opt/stonks
 ```

+If an external Ollama URL is provided via `--ollama-url`, the script creates a `docker-compose.override.yml` that disables the bundled container and routes services to the external instance.
+
 ---

 ## Volume Mounts and Data Persistence
@@ -404,6 +511,9 @@ docker compose ps query-api

 # Inspect health check details for a container
 docker inspect --format='{{json .State.Health}}' stonks-oracle-query-api-1 | python -m json.tool
+
+# Wait for all services to be healthy
+docker compose up -d --wait
 ```
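
The JSON printed by the `docker inspect` command above can be condensed into a one-line summary. This is a small sketch, assuming the standard `Status`, `FailingStreak`, and `Log` fields of Docker's health state; the function name is hypothetical.

```python
import json


def summarize_health(raw):
    """Summarize the output of: docker inspect --format='{{json .State.Health}}'."""
    health = json.loads(raw)
    if health is None:  # docker prints "null" when no healthcheck is defined
        return "no healthcheck defined"
    logs = health.get("Log") or []
    last = logs[-1] if logs else {}
    return "{status} (failing streak: {streak}, last exit code: {code})".format(
        status=health.get("Status", "unknown"),
        streak=health.get("FailingStreak", 0),
        code=last.get("ExitCode", "n/a"),
    )
```

Pipe the `docker inspect` output into a script that calls `summarize_health(sys.stdin.read())` to use it from the shell.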

 ---
@@ -414,17 +524,19 @@ docker inspect --format='{{json .State.Health}}' stonks-oracle-query-api-1 | pyt

 Used by all application services except the scheduler. Accepts a `SERVICE_CMD` build argument that determines which service the container runs.

-**Base image**: `python:3.12-slim`
+**Base image**: `python:3.12-slim` (via Harbor proxy cache in CI)

 **Build arguments**:

 | Argument | Default | Description |
 |----------|---------|-------------|
 | `SERVICE_CMD` | `python -m services.scheduler.app` | The command executed when the container starts |
+| `CACHE_BUST` | (none) | Optional cache-busting argument to force rebuild of source layers |

 **What gets copied**:
 - `requirements.txt` → pip dependencies installed
 - `services/` → all service source code
+- `scripts/` → operational scripts
 - `tests/` → test files (available for in-container testing)
 - `conftest.py` → pytest configuration

@@ -462,7 +574,7 @@ A specialized variant of the generic Dockerfile used only by the `scheduler` ser

 Extends the official Apache Superset image with additional database drivers.

-**Base image**: `apache/superset:latest`
+**Base image**: `apache/superset:latest` (via Harbor proxy cache in CI)

 **Additional packages**: `trino[sqlalchemy]`, `psycopg2-binary`, `redis`

@@ -481,7 +593,9 @@ Multi-stage build for the React dashboard.
 **Stage 2 — Serve** (base: `nginxinc/nginx-unprivileged:alpine`):
 - Serves the built static files on port 8080
 - Uses `frontend/nginx.conf` for SPA fallback and API reverse proxying
-- Proxies `/api/` → `query-api:8000`, `/registry/` → `symbol-registry:8000`, `/risk/` → `risk-engine:8000`, `/trading/` → `trading-engine:8000`
+- Proxies `/api/` → `query-api:8000`, `/registry/` → `symbol-registry:8000`, `/risk/` → `risk:8000`, `/trading/` → `trading-engine:8000`
+- SSE stream endpoint (`/api/ops/pipeline/stream`) has buffering disabled for real-time delivery
+- Static assets under `/assets/` are cached with 1-year expiry

 ### Building Custom Images

@@ -503,6 +617,9 @@ docker build -t my-dashboard \

 # Rebuild all images
 docker compose build
+
+# Rebuild without cache (force fresh build)
+docker compose build --no-cache
 ```

 ---
@@ -561,6 +678,9 @@ Services with `condition: service_healthy` wait until the dependency's health ch
 # Start all services in the background
 docker compose up -d

+# Start all services and wait for health checks
+docker compose up -d --wait
+
 # Start only infrastructure (useful for local development)
 docker compose up -d postgres redis minio minio-init ollama

@@ -639,6 +759,9 @@ docker compose exec query-api python -c "from services.shared.config import load

 # Open a psql shell in the postgres container
 docker compose exec postgres psql -U stonks -d stonks
+
+# Seed the database
+docker compose exec scheduler python -m services.symbol_registry.seed
 ```

 ### Full Reset
@@ -680,13 +803,16 @@ The dashboard container runs nginx with reverse proxy rules that route API reque
 | Path | Proxied To | Service |
 |------|-----------|---------|
 | `/api/` | `http://query-api:8000` | Query API |
+| `/api/ops/pipeline/stream` | `http://query-api:8000` (SSE, no buffering) | Query API (real-time pipeline stream) |
 | `/registry/` | `http://symbol-registry:8000/` | Symbol Registry API |
 | `/risk/` | `http://risk:8000/` | Risk Engine (via network alias) |
 | `/trading/` | `http://trading-engine:8000/` | Trading Engine API |

 The `risk-engine` service has a network alias of `risk` in `docker-compose.yml` so the nginx upstream resolves correctly.

-All other paths serve the React SPA with `try_files` fallback to `index.html`.
+All other paths serve the React SPA with `try_files` fallback to `index.html`. Static assets under `/assets/` are served with 1-year cache headers.
+
+Security headers applied: `X-Frame-Options: SAMEORIGIN`, `X-Content-Type-Options: nosniff`, `Referrer-Policy: strict-origin-when-cross-origin`.
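
The proxy table above mixes targets with and without a trailing slash, which changes the path the upstream receives. The sketch below models that behavior in Python under two assumptions: every row is a plain nginx prefix location (so the longest matching prefix wins), and the targets are used verbatim as `proxy_pass` values. The `ROUTES` mapping and function name are illustrative, not taken from `frontend/nginx.conf`.

```python
from urllib.parse import urlsplit

# Illustrative copy of the proxy table; not read from frontend/nginx.conf.
ROUTES = {
    "/api/ops/pipeline/stream": "http://query-api:8000",
    "/api/": "http://query-api:8000",
    "/registry/": "http://symbol-registry:8000/",
    "/risk/": "http://risk:8000/",
    "/trading/": "http://trading-engine:8000/",
}


def resolve_upstream(path):
    """Return the upstream URL for a request path, or None for SPA paths."""
    matches = [prefix for prefix in ROUTES if path.startswith(prefix)]
    if not matches:
        return None  # served by the React SPA (try_files fallback)
    prefix = max(matches, key=len)  # longest prefix wins
    target = ROUTES[prefix]
    if urlsplit(target).path:
        # proxy_pass with a URI part: the matched prefix is replaced by it.
        return target + path[len(prefix):]
    # proxy_pass without a URI part: the request path is passed unchanged.
    return target + path
```

Under these assumptions `/risk/limits` reaches the risk engine as `/limits`, while `/api/recommendations` reaches the Query API with the `/api/` prefix intact.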

 ---

@@ -734,6 +860,19 @@ curl http://your-vllm-host:8000/v1/models

 If Ollama is already running on the host, the bundled container will fail to bind port 11434. Use the external Ollama configuration described in the "LLM Provider Configuration" section above, or use `deploy-docker.sh` which handles this automatically.

+### GPU not detected by Ollama container
+
+Ensure the NVIDIA Container Toolkit is installed and Docker is configured:
+
+```bash
+# Verify GPU passthrough works
+docker run --rm --gpus all nvidia/cuda:12.8.0-base-ubuntu24.04 nvidia-smi
+
+# If it fails, reconfigure Docker runtime
+sudo nvidia-ctk runtime configure --runtime=docker
+sudo systemctl restart docker
+```
+
 ### Port conflicts

 If a port is already in use, modify the host port mapping in `docker-compose.yml`:
@@ -743,3 +882,15 @@ query-api:
  ports:
    - "9004:8000"   # Changed from 8004 to 9004
 ```
+
+### Container runs out of memory
+
+The full stack requires at least 16 GB RAM. If services are being OOM-killed:
+
+```bash
+# Check which containers are using the most memory
+docker stats --no-stream
+
+# Reduce memory usage by stopping non-essential services
+docker compose stop trino hive-metastore superset
+```