docs: add LLM provider config (Ollama/vLLM/mixed), fix risk network alias in compose

2026-04-29 03:08:54 +00:00
parent f151747d56
commit 11c6457559
2 changed files with 124 additions and 3 deletions
@@ -312,6 +312,10 @@ services:
      <<: *app-env
    ports:
      - "8003:8000"
+    networks:
+      default:
+        aliases:
+          - risk
    depends_on:
      postgres:
        condition: service_healthy
@@ -178,9 +178,16 @@ All application services support additional environment variables loaded via `se
 | `REDIS_DB` | `0` | Redis database number |
 | `REDIS_PASSWORD` | (none) | Redis password (not needed in Docker Compose) |
 | `MINIO_SECURE` | `false` | Use HTTPS for MinIO |
+| `OLLAMA_BASE_URL` | `http://ollama:11434` | Ollama LLM server URL |
 | `OLLAMA_MODEL` | `qwen3.5:9b` | Default LLM model for extraction |
 | `OLLAMA_TIMEOUT` | `120` | Ollama request timeout (seconds) |
 | `OLLAMA_MAX_RETRIES` | `2` | Max retries for Ollama requests |
+| `VLLM_BASE_URL` | (empty) | vLLM server URL (if using vLLM instead of Ollama) |
+| `VLLM_MODEL` | (empty) | vLLM model name (e.g. `AxionML/Qwen3.5-9B-NVFP4`) |
+| `VLLM_TIMEOUT` | `120` | vLLM request timeout (seconds) |
+| `VLLM_MAX_RETRIES` | `2` | Max retries for vLLM requests |
+| `VLLM_TEMPERATURE` | `0.7` | vLLM sampling temperature |
+| `VLLM_API_KEY` | (empty) | vLLM API key (if required) |
 | `TRINO_HOST` | `localhost` | Trino hostname |
 | `TRINO_PORT` | `8080` | Trino port |
 | `TRINO_CATALOG` | `lakehouse` | Trino catalog name |
@@ -203,6 +210,103 @@ See `services/shared/config.py` for the complete list of all supported environme

 ---

+## LLM Provider Configuration
+
+Stonks Oracle supports two LLM backends: **Ollama** (local, self-hosted) and **vLLM** (high-performance inference server). The active provider is configured per-agent in the `ai_agents` database table, but the connection details come from environment variables.
+
+### Option A: Bundled Ollama (default)
+
+The `docker-compose.yml` includes an Ollama container. On first start, pull a model:
+
+```bash
+docker compose exec ollama ollama pull qwen3.5:9b-fast
+```
+
+No additional configuration is needed: services connect to `http://ollama:11434` by default.
+
+### Option B: External Ollama
+
+If Ollama is already running on the host (e.g. with GPU access), create a `docker-compose.override.yml`:
+
+```yaml
+services:
+  ollama:
+    entrypoint: ["true"]
+    restart: "no"
+    ports: []
+  extractor:
+    depends_on:
+      postgres:
+        condition: service_healthy
+      redis:
+        condition: service_healthy
+    environment:
+      OLLAMA_BASE_URL: "http://host.docker.internal:11434"
+    extra_hosts:
+      - "host.docker.internal:host-gateway"
+  recommendation:
+    environment:
+      OLLAMA_BASE_URL: "http://host.docker.internal:11434"
+    extra_hosts:
+      - "host.docker.internal:host-gateway"
+```
+
+This disables the bundled Ollama container and routes services to the host's instance. Change the port in `OLLAMA_BASE_URL` if your Ollama listens on a port other than 11434.
+
+### Option C: vLLM Server
+
+For higher throughput or quantized models (e.g. `AxionML/Qwen3.5-9B-NVFP4`), point services at a vLLM server. Add to your `.env`:
+
+```dotenv
+VLLM_BASE_URL=http://192.168.42.254:8000
+VLLM_MODEL=AxionML/Qwen3.5-9B-NVFP4
+VLLM_TIMEOUT=120
+VLLM_TEMPERATURE=0.7
+```
+
+Then update the `ai_agents` table to use the vLLM provider:
+
+```sql
+UPDATE ai_agents SET model_provider = 'vllm', model_name = 'AxionML/Qwen3.5-9B-NVFP4' WHERE active = true;
+```
+
+Or use the API:
+
+```bash
+curl -X PUT http://localhost:8004/api/admin/agents/document-extractor \
+  -H 'Content-Type: application/json' \
+  -d '{"model_provider": "vllm", "model_name": "AxionML/Qwen3.5-9B-NVFP4"}'
+```
+
+### Option D: Mixed (Ollama + vLLM)
+
+You can run different agents on different providers. For example, use vLLM for the high-volume document extractor and the event classifier, and Ollama for the thesis rewriter:
+
+```sql
+UPDATE ai_agents SET model_provider = 'vllm', model_name = 'AxionML/Qwen3.5-9B-NVFP4' WHERE slug = 'document-extractor';
+UPDATE ai_agents SET model_provider = 'vllm', model_name = 'AxionML/Qwen3.5-9B-NVFP4' WHERE slug = 'event-classifier';
+UPDATE ai_agents SET model_provider = 'ollama', model_name = 'qwen3.5:9b-fast' WHERE slug = 'thesis-rewriter';
+```
+
+Both `OLLAMA_BASE_URL` and `VLLM_BASE_URL` must be set in the environment for mixed mode.
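+
+As a minimal sketch of a mixed-mode `.env`, assuming the bundled Ollama container and the example vLLM server from Option C (substitute your own hosts and model):
+
+```dotenv
+# Ollama endpoint (the bundled container's default address)
+OLLAMA_BASE_URL=http://ollama:11434
+# vLLM endpoint and model (example values from Option C)
+VLLM_BASE_URL=http://192.168.42.254:8000
+VLLM_MODEL=AxionML/Qwen3.5-9B-NVFP4
+```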
+
+### Automated Deployment
+
+The `deploy-docker.sh` script handles LLM configuration automatically:
+
+```bash
+# Auto-detect host Ollama, use default model
+bash deploy-docker.sh
+
+# Specify a remote Ollama instance
+bash deploy-docker.sh --ollama-url http://10.1.1.12:2701 --ollama-model qwen3.6
+
+# Specify a remote host and target directory
+bash deploy-docker.sh --host user@myserver --dir /opt/stonks
+```
+
+---
+
 ## Volume Mounts and Data Persistence

 Docker Compose defines five named volumes for persistent data:
@@ -576,9 +680,11 @@ The dashboard container runs nginx with reverse proxy rules that route API reque
 |------|-----------|---------|
 | `/api/` | `http://query-api:8000` | Query API |
 | `/registry/` | `http://symbol-registry:8000/` | Symbol Registry API |
-| `/risk/` | `http://risk-engine:8000/` | Risk Engine API |
+| `/risk/` | `http://risk:8000/` | Risk Engine (via network alias) |
 | `/trading/` | `http://trading-engine:8000/` | Trading Engine API |

+The `risk-engine` service has a network alias of `risk` in `docker-compose.yml` so the nginx upstream resolves correctly.
+
 All other paths serve the React SPA with `try_files` fallback to `index.html`.

 ---
@@ -610,12 +716,23 @@ docker compose up -d     # Migrations re-applied on fresh init

 ### Ollama model not available

-The extractor service needs an LLM model loaded in Ollama. Pull a model manually:
+The extractor service needs an LLM model loaded on the active provider. Pull the model manually, or verify that it is loaded:

 ```bash
-docker compose exec ollama ollama pull qwen3.5:9b
+# If using bundled Ollama container:
+docker compose exec ollama ollama pull qwen3.5:9b-fast
+
+# If using host Ollama:
+ollama pull qwen3.5:9b-fast
+
+# If using vLLM, ensure the model is loaded on the vLLM server
+curl http://your-vllm-host:8000/v1/models
 ```

+### Ollama port conflict (address already in use)
+
+If Ollama is already running on the host, the bundled container will fail to bind port 11434. Use the external Ollama configuration described under "Option B: External Ollama" in the "LLM Provider Configuration" section above, or run `deploy-docker.sh`, which handles this automatically.
+
 ### Port conflicts

 If a port is already in use, modify the host port mapping in `docker-compose.yml`: