diff --git a/docker-compose.yml b/docker-compose.yml
index 944d79b..2c979d9 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -312,6 +312,10 @@ services:
       <<: *app-env
     ports:
       - "8003:8000"
+    networks:
+      default:
+        aliases:
+          - risk
     depends_on:
       postgres:
         condition: service_healthy
diff --git a/docs/docker-deployment.md b/docs/docker-deployment.md
index beddb66..f01ace6 100644
--- a/docs/docker-deployment.md
+++ b/docs/docker-deployment.md
@@ -178,9 +178,16 @@ All application services support additional environment variables loaded via `se
 | `REDIS_DB` | `0` | Redis database number |
 | `REDIS_PASSWORD` | (none) | Redis password (not needed in Docker Compose) |
 | `MINIO_SECURE` | `false` | Use HTTPS for MinIO |
+| `OLLAMA_BASE_URL` | `http://ollama:11434` | Ollama LLM server URL |
 | `OLLAMA_MODEL` | `qwen3.5:9b` | Default LLM model for extraction |
 | `OLLAMA_TIMEOUT` | `120` | Ollama request timeout (seconds) |
 | `OLLAMA_MAX_RETRIES` | `2` | Max retries for Ollama requests |
+| `VLLM_BASE_URL` | (empty) | vLLM server URL (if using vLLM instead of Ollama) |
+| `VLLM_MODEL` | (empty) | vLLM model name (e.g. `AxionML/Qwen3.5-9B-NVFP4`) |
+| `VLLM_TIMEOUT` | `120` | vLLM request timeout (seconds) |
+| `VLLM_MAX_RETRIES` | `2` | Max retries for vLLM requests |
+| `VLLM_TEMPERATURE` | `0.7` | vLLM sampling temperature |
+| `VLLM_API_KEY` | (empty) | vLLM API key (if required) |
 | `TRINO_HOST` | `localhost` | Trino hostname |
 | `TRINO_PORT` | `8080` | Trino port |
 | `TRINO_CATALOG` | `lakehouse` | Trino catalog name |
@@ -203,6 +210,103 @@ See `services/shared/config.py` for the complete list of all supported environme
 
 ---
 
+## LLM Provider Configuration
+
+Stonks Oracle supports two LLM backends: **Ollama** (local, self-hosted) and **vLLM** (high-performance inference server). The active provider is configured per-agent in the `ai_agents` database table, but the connection details come from environment variables.
+
+### Option A: Bundled Ollama (default)
+
+The `docker-compose.yml` includes an Ollama container. On first start, pull a model:
+
+```bash
+docker compose exec ollama ollama pull qwen3.5:9b-fast
+```
+
+No additional configuration needed — services connect to `http://ollama:11434` by default.
+
+### Option B: External Ollama
+
+If Ollama is already running on the host (e.g. with GPU access), create a `docker-compose.override.yml`:
+
+```yaml
+services:
+  ollama:
+    entrypoint: ["true"]
+    restart: "no"
+    ports: []
+  extractor:
+    depends_on:
+      postgres:
+        condition: service_healthy
+      redis:
+        condition: service_healthy
+    environment:
+      OLLAMA_BASE_URL: "http://host.docker.internal:11434"
+    extra_hosts:
+      - "host.docker.internal:host-gateway"
+  recommendation:
+    environment:
+      OLLAMA_BASE_URL: "http://host.docker.internal:11434"
+    extra_hosts:
+      - "host.docker.internal:host-gateway"
+```
+
+This disables the bundled Ollama container and routes services to the host's instance. Replace the port if your Ollama runs on a non-standard port.
+
+### Option C: vLLM Server
+
+For higher throughput or quantized models (e.g. `AxionML/Qwen3.5-9B-NVFP4`), point services at a vLLM server. Add to your `.env`:
+
+```dotenv
+VLLM_BASE_URL=http://192.168.42.254:8000
+VLLM_MODEL=AxionML/Qwen3.5-9B-NVFP4
+VLLM_TIMEOUT=120
+VLLM_TEMPERATURE=0.7
+```
+
+Then update the `ai_agents` table to use the vLLM provider:
+
+```sql
+UPDATE ai_agents SET model_provider = 'vllm', model_name = 'AxionML/Qwen3.5-9B-NVFP4' WHERE active = true;
+```
+
+Or use the API:
+
+```bash
+curl -X PUT http://localhost:8004/api/admin/agents/document-extractor \
+  -H 'Content-Type: application/json' \
+  -d '{"model_provider": "vllm", "model_name": "AxionML/Qwen3.5-9B-NVFP4"}'
+```
+
+### Option D: Mixed (Ollama + vLLM)
+
+You can run different agents on different providers. For example, use vLLM for the high-volume extractor and Ollama for the thesis rewriter:
+
+```sql
+UPDATE ai_agents SET model_provider = 'vllm', model_name = 'AxionML/Qwen3.5-9B-NVFP4' WHERE slug = 'document-extractor';
+UPDATE ai_agents SET model_provider = 'vllm', model_name = 'AxionML/Qwen3.5-9B-NVFP4' WHERE slug = 'event-classifier';
+UPDATE ai_agents SET model_provider = 'ollama', model_name = 'qwen3.5:9b-fast' WHERE slug = 'thesis-rewriter';
+```
+
+Both `OLLAMA_BASE_URL` and `VLLM_BASE_URL` must be set in the environment for mixed mode.
+
+### Automated Deployment
+
+The `deploy-docker.sh` script handles LLM configuration automatically:
+
+```bash
+# Auto-detect host Ollama, use default model
+bash deploy-docker.sh
+
+# Specify a remote Ollama instance
+bash deploy-docker.sh --ollama-url http://10.1.1.12:2701 --ollama-model qwen3.6
+
+# Specify a different host
+bash deploy-docker.sh --host user@myserver --dir /opt/stonks
+```
+
+---
+
 ## Volume Mounts and Data Persistence
 
 Docker Compose defines five named volumes for persistent data:
@@ -576,9 +680,11 @@ The dashboard container runs nginx with reverse proxy rules that route API reque
 |------|-----------|---------|
 | `/api/` | `http://query-api:8000` | Query API |
 | `/registry/` | `http://symbol-registry:8000/` | Symbol Registry API |
-| `/risk/` | `http://risk-engine:8000/` | Risk Engine API |
+| `/risk/` | `http://risk:8000/` | Risk Engine (via network alias) |
 | `/trading/` | `http://trading-engine:8000/` | Trading Engine API |
 
+The `risk-engine` service has a network alias of `risk` in `docker-compose.yml` so the nginx upstream resolves correctly.
+
 All other paths serve the React SPA with `try_files` fallback to `index.html`.
 
 ---
@@ -610,12 +716,23 @@ docker compose up -d # Migrations re-applied on fresh init
 
 ### Ollama model not available
 
-The extractor service needs an LLM model loaded in Ollama. Pull a model manually:
+The extractor service needs an LLM model loaded. Pull a model manually:
 
 ```bash
-docker compose exec ollama ollama pull qwen3.5:9b
+# If using bundled Ollama container:
+docker compose exec ollama ollama pull qwen3.5:9b-fast
+
+# If using host Ollama:
+ollama pull qwen3.5:9b-fast
+
+# If using vLLM, ensure the model is loaded on the vLLM server
+curl http://your-vllm-host:8000/v1/models
 ```
 
+### Ollama port conflict (address already in use)
+
+If Ollama is already running on the host, the bundled container will fail to bind port 11434. Use the external Ollama configuration described in the "LLM Provider Configuration" section above, or use `deploy-docker.sh` which handles this automatically.
+
 ### Port conflicts
 
 If a port is already in use, modify the host port mapping in `docker-compose.yml`: