# Stonks Oracle: Local Development Setup (Windows + Docker Desktop)

This guide walks you through setting up Stonks Oracle on a Windows machine using Docker Desktop. By the end you will have the full platform running locally: PostgreSQL, Redis, MinIO, Ollama, Trino, and all application services.

## Prerequisites

- **Windows 10/11** with WSL 2 enabled
- **Docker Desktop** for Windows (with WSL 2 backend)
- **Git** (Git for Windows or via WSL)
- **Python 3.12** (for running services outside Docker during development)
- **Node.js 24** (for frontend development)

### Install Docker Desktop

1. Download from [https://www.docker.com/products/docker-desktop/](https://www.docker.com/products/docker-desktop/)
2. During install, ensure "Use WSL 2 instead of Hyper-V" is checked
3. After install, open Docker Desktop → Settings → Resources → WSL Integration → enable for your distro
4. Allocate at least **8 GB RAM** and **4 CPUs** in Settings → Resources (Ollama needs room)

### Install Python 3.12

Download from [https://www.python.org/downloads/](https://www.python.org/downloads/) and check "Add Python to PATH" during install. Or use `winget install Python.Python.3.12` from PowerShell.

### Install Node.js 24

Download from [https://nodejs.org/](https://nodejs.org/) (LTS or Current, 24.x). Or use `winget install OpenJS.NodeJS`.

---

## 1. Register for API Accounts

You need two API accounts. Both have free tiers that work for development.

### Polygon.io (Market Data)

1. Go to [https://polygon.io/](https://polygon.io/)
2. Sign up for a free account
3. Navigate to Dashboard → API Keys
4. Copy your API key; this becomes `MARKET_DATA_API_KEY`

The free tier gives you delayed data and limited API calls. Paid tiers ($29+/mo) give real-time data and higher rate limits.

### Alpaca (Paper Trading)

1. Go to [https://alpaca.markets/](https://alpaca.markets/)
2. Sign up for a free account
3. Navigate to the **Paper Trading** dashboard (not live)
4. Go to API Keys → Generate New Key
5. Copy both the **API Key ID** and **Secret Key**; these become `BROKER_API_KEY` and `BROKER_API_SECRET`
6. Your paper trading base URL is `https://paper-api.alpaca.markets`

Alpaca paper trading is completely free with no time limit.

---

## 2. Clone the Repository

```powershell
git clone https://github.com/celesrenata/stonks-oracle.git
cd stonks-oracle
```

---

## 3. Create Your Environment File

Create a `.env` file in the project root with your API keys:

```ini
# Polygon.io
MARKET_DATA_API_KEY=your_polygon_api_key_here

# Alpaca Paper Trading
BROKER_API_KEY=your_alpaca_key_id_here
BROKER_API_SECRET=your_alpaca_secret_key_here
BROKER_BASE_URL=https://paper-api.alpaca.markets
BROKER_MODE=paper
```

This file is gitignored. Keep it safe.

---

## 4. Start Infrastructure Services

The `docker-compose.yml` in the project root defines all infrastructure services. Start them:

```powershell
docker compose up -d
```

This starts:

| Service | Port | Purpose |
|---------|------|---------|
| PostgreSQL 16 | 5432 | Primary database |
| Redis 7 | 6379 | Job queues and caching |
| MinIO | 9000 (API), 9001 (Console) | Object storage for artifacts |
| Ollama | 11434 | Local LLM inference |
| Trino | 8080 | SQL query engine for lakehouse |
| Hive Metastore | 9083 | Metadata catalog for Trino |
| Superset | 8088 | Analytics dashboards |

The `minio-init` sidecar automatically creates the required storage buckets.

### Verify everything is running

```powershell
docker compose ps
```

All services should show `running` (healthy). Give it 30-60 seconds for health checks to pass.

### Access the UIs

- **MinIO Console**: [http://localhost:9001](http://localhost:9001) (login: `minioadmin` / `minioadmin`)
- **Superset**: [http://localhost:8088](http://localhost:8088) (login: `admin` / `admin`)
- **Trino**: [http://localhost:8080](http://localhost:8080)

---

## 5. Pull the Ollama Model

Stonks Oracle uses the `qwen3.5:9b` model for document extraction and event classification.
Pull it:

```powershell
docker exec -it stonks-oracle-ollama-1 ollama pull qwen3.5:9b
```

This downloads ~5 GB. If you have a GPU and want faster inference, make sure Docker Desktop has GPU passthrough enabled (Settings → Resources → GPU). Ollama will auto-detect CUDA GPUs.

To verify the model is available:

```powershell
docker exec stonks-oracle-ollama-1 ollama list
```

---

## 6. Set Up the Python Environment

```powershell
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
```

### Verify the database migrations ran

PostgreSQL auto-runs the migration SQL files from `infra/migrations/` on first start (they are mounted into `/docker-entrypoint-initdb.d`). First confirm that the database accepts connections:

```powershell
# asyncpg.connect() is a coroutine, so run it to completion with asyncio.run()
python -c "import asyncio, asyncpg; asyncio.run(asyncpg.connect('postgresql://stonks:stonks_dev@localhost:5432/stonks')); print('Connected!')"
```

Then list the tables using the `psql` client bundled in the PostgreSQL container:

```powershell
# No -t flag: a TTY is not needed when the output is piped
docker exec -i stonks-oracle-postgres-1 psql -U stonks -d stonks -c "\dt" | Select-Object -First 20
```

You should see tables like `companies`, `documents`, `trend_windows`, `recommendations`, `orders`, etc.

### Seed the company universe

```powershell
python -m services.symbol_registry.seed
```

This populates 50 companies across 10 sectors and 46 competitor relationships.

---

## 7. Run the Application Services

You can run services directly with Python (for development) or build Docker images.

### Option A: Run directly with Python (recommended for development)

Open separate terminal windows for each service.
Each needs the virtualenv activated and environment variables set. The commands below use PowerShell syntax (`$env:NAME = "value"`); in `cmd.exe`, use `set NAME=value` instead.

```powershell
# Terminal 1: Scheduler (triggers ingestion on a cadence)
.venv\Scripts\activate
$env:MARKET_DATA_API_KEY = "your_polygon_key"
python -m services.scheduler.app

# Terminal 2: Ingestion (fetches articles, filings, market data)
.venv\Scripts\activate
$env:MARKET_DATA_API_KEY = "your_polygon_key"
python -m services.ingestion.worker

# Terminal 3: Parser (normalizes raw documents)
.venv\Scripts\activate
python -m services.parser.worker

# Terminal 4: Extractor (LLM-based intelligence extraction)
.venv\Scripts\activate
python -m services.extractor.main

# Terminal 5: Aggregation (merges signals into trend summaries)
.venv\Scripts\activate
python -m services.aggregation.main

# Terminal 6: Recommendation (generates trade recommendations)
.venv\Scripts\activate
python -m services.recommendation.main

# Terminal 7: Query API (REST API for the dashboard)
.venv\Scripts\activate
uvicorn services.api.app:app --host 0.0.0.0 --port 8000

# Terminal 8: Symbol Registry (company CRUD API)
.venv\Scripts\activate
uvicorn services.symbol_registry.app:app --host 0.0.0.0 --port 8001

# Terminal 9: Risk Engine
.venv\Scripts\activate
uvicorn services.risk.app:app --host 0.0.0.0 --port 8002

# Terminal 10: Trading Engine (autonomous paper trading)
.venv\Scripts\activate
$env:BROKER_API_KEY = "your_alpaca_key"
$env:BROKER_API_SECRET = "your_alpaca_secret"
$env:BROKER_BASE_URL = "https://paper-api.alpaca.markets"
uvicorn services.trading.app:app --host 0.0.0.0 --port 8003

# Terminal 11: Broker Adapter (executes trades via Alpaca)
.venv\Scripts\activate
$env:BROKER_API_KEY = "your_alpaca_key"
$env:BROKER_API_SECRET = "your_alpaca_secret"
$env:BROKER_BASE_URL = "https://paper-api.alpaca.markets"
python -m services.adapters.broker_service
```

Not all services are required for basic development.
The minimum set is:

- **scheduler** + **ingestion** + **parser** + **extractor**: to get data flowing
- **aggregation** + **recommendation**: to generate signals
- **query-api**: to serve the dashboard

Add the trading services when you want to test paper trading.

### Option B: Build and run as Docker containers

```powershell
# Build the Python service image
docker build -t stonks-oracle/services -f docker/Dockerfile .

# Build the frontend
docker build -t stonks-oracle/dashboard -f frontend/Dockerfile frontend/

# Run a service (example: scheduler); the backtick is PowerShell's line continuation
docker run --rm --network host `
  -e MARKET_DATA_API_KEY=your_polygon_key `
  -e POSTGRES_HOST=localhost `
  -e REDIS_HOST=localhost `
  -e MINIO_ENDPOINT=localhost:9000 `
  -e OLLAMA_BASE_URL=http://localhost:11434 `
  -e SERVICE_CMD="python -m services.scheduler.app" `
  stonks-oracle/services
```

---

## 8. Run the Frontend Dashboard

```powershell
cd frontend
npm install
npm run dev
```

The dashboard starts at [http://localhost:5173](http://localhost:5173). It proxies API requests to the backend services via Vite's dev server.

For production-like testing, build and serve with nginx:

```powershell
cd frontend
docker build -t stonks-oracle/dashboard .
docker run --rm -p 8080:8080 --network host stonks-oracle/dashboard
```

Then visit [http://localhost:8080](http://localhost:8080). Two caveats: Docker ignores `-p` when `--network host` is set, and the compose stack already publishes Trino on port 8080, so stop Trino first if the dashboard fails to bind.

---

## 9. Run Tests

### Python tests

```powershell
.venv\Scripts\activate
pip install ruff pytest pytest-asyncio hypothesis
ruff check services/
python -m pytest tests/ -x --tb=short -q
```

### Frontend tests

```powershell
cd frontend
npx vitest --run
```

---

## 10. How the Pipeline Works

Once services are running, the data flows automatically:

```
Scheduler (every 15s)
  → enqueues ingestion jobs for due sources
  → Ingestion fetches articles/filings/market data from Polygon
  → Parser normalizes raw text
  → Extractor calls Ollama to extract structured intelligence
  → Aggregation merges signals into trend summaries
  → Recommendation generates buy/sell/watch signals
  → Trading Engine evaluates and executes paper trades
  → Broker Adapter submits orders to Alpaca
```

Monitor the pipeline via Redis queues:

```powershell
docker exec stonks-oracle-redis-1 redis-cli llen stonks:queue:ingestion
docker exec stonks-oracle-redis-1 redis-cli llen stonks:queue:parsing
docker exec stonks-oracle-redis-1 redis-cli llen stonks:queue:extraction
docker exec stonks-oracle-redis-1 redis-cli llen stonks:queue:aggregation
docker exec stonks-oracle-redis-1 redis-cli llen stonks:queue:recommendation
```

---

## Environment Variable Reference

All services read configuration from environment variables with sensible defaults for local development. You only need to set the ones that differ from defaults.
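
The read-with-defaults pattern can be sketched in Python as follows. This is an illustration only, not the project's actual configuration code: the `Settings` class and `load_settings` function are hypothetical names, and the defaults are copied from the tables in this section.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings container; the real services may be organized differently."""

    postgres_host: str
    postgres_port: int
    postgres_db: str
    postgres_user: str
    postgres_password: str
    redis_host: str
    redis_port: int
    ollama_base_url: str
    market_data_api_key: str

    @property
    def postgres_dsn(self) -> str:
        # Assemble a connection URL from the individual PostgreSQL settings.
        return (
            f"postgresql://{self.postgres_user}:{self.postgres_password}"
            f"@{self.postgres_host}:{self.postgres_port}/{self.postgres_db}"
        )


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read each variable, falling back to the documented local default."""
    return Settings(
        postgres_host=env.get("POSTGRES_HOST", "localhost"),
        postgres_port=int(env.get("POSTGRES_PORT", "5432")),
        postgres_db=env.get("POSTGRES_DB", "stonks"),
        postgres_user=env.get("POSTGRES_USER", "stonks"),
        postgres_password=env.get("POSTGRES_PASSWORD", "stonks_dev"),
        redis_host=env.get("REDIS_HOST", "localhost"),
        redis_port=int(env.get("REDIS_PORT", "6379")),
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        # No useful default exists for API keys, so an empty string marks "unset".
        market_data_api_key=env.get("MARKET_DATA_API_KEY", ""),
    )


if __name__ == "__main__":
    print(load_settings().postgres_dsn)
```

Passing the environment as a mapping keeps the function easy to exercise with a plain dictionary, without touching the real process environment.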

### Required (no useful default)

| Variable | Description |
|----------|-------------|
| `MARKET_DATA_API_KEY` | Polygon.io API key |
| `BROKER_API_KEY` | Alpaca API key ID |
| `BROKER_API_SECRET` | Alpaca API secret |

### Infrastructure (defaults work with docker-compose)

| Variable | Default | Description |
|----------|---------|-------------|
| `POSTGRES_HOST` | `localhost` | PostgreSQL host |
| `POSTGRES_PORT` | `5432` | PostgreSQL port |
| `POSTGRES_DB` | `stonks` | Database name |
| `POSTGRES_USER` | `stonks` | Database user |
| `POSTGRES_PASSWORD` | `stonks_dev` | Database password |
| `REDIS_HOST` | `localhost` | Redis host |
| `REDIS_PORT` | `6379` | Redis port |
| `REDIS_PASSWORD` | *(none)* | Redis password (not set in dev) |
| `MINIO_ENDPOINT` | `localhost:9000` | MinIO API endpoint |
| `MINIO_ACCESS_KEY` | `minioadmin` | MinIO access key |
| `MINIO_SECRET_KEY` | `minioadmin` | MinIO secret key |
| `OLLAMA_BASE_URL` | `http://localhost:11434` | Ollama API URL |
| `OLLAMA_MODEL` | `qwen3.5:9b` | LLM model name |

### Trading

| Variable | Default | Description |
|----------|---------|-------------|
| `BROKER_MODE` | `paper` | Trading mode (`paper` or `live`) |
| `BROKER_PROVIDER` | `alpaca` | Broker provider |
| `BROKER_BASE_URL` | *(none)* | Alpaca API URL (set to `https://paper-api.alpaca.markets`) |

---

## 11. Integration Tests

The integration test pipeline validates all API endpoints against a live Kubernetes sandbox with realistic seed data. It deploys ephemeral infrastructure (PostgreSQL, Redis, MinIO), seeds deterministic test data, deploys all API services, and runs the full test suite with profiling.

### Prerequisites

- `kubectl` configured with access to a Kubernetes cluster
- Docker images built and pushed to GHCR (or use `:latest`)
- `envsubst` available (usually part of `gettext` package)
- `GHCR_TOKEN` environment variable set for image pulls (optional if images are public)

### Running the Full Pipeline

```bash
# Run with latest images
bash infra/inttest/run_pipeline.sh

# Run with a specific image tag
bash infra/inttest/run_pipeline.sh --image-tag abc123

# Keep the sandbox running for debugging
bash infra/inttest/run_pipeline.sh --skip-teardown

# Custom namespace and results file
bash infra/inttest/run_pipeline.sh --namespace my-test --results-file results.json
```

### CLI Options

| Option | Default | Description |
|--------|---------|-------------|
| `--image-tag TAG` | `latest` | Docker image tag to deploy |
| `--namespace NAME` | `stonks-inttest-` | Kubernetes namespace name |
| `--skip-teardown` | `false` | Leave namespace running after tests |
| `--results-file PATH` | `inttest-results.json` | Path for JSON results output |

### Exit Codes

| Code | Meaning |
|------|---------|
| 0 | All tests passed |
| 1 | One or more test failures |
| 2 | Infrastructure setup failure |

### JSON Result Contract

The pipeline produces a JSON results file (`inttest-results.json` by default) with this structure:

```json
{
  "run_id": "stonks-inttest-1705312800",
  "image_tag": "abc123",
  "started_at": "2025-01-15T12:00:00Z",
  "completed_at": "2025-01-15T12:07:30Z",
  "exit_code": 0,
  "stages": {
    "infra_deploy": {"duration_s": 45, "status": "ok"},
    "seed_data": {"duration_s": 8, "status": "ok"},
    "service_deploy": {"duration_s": 32, "status": "ok"},
    "integration_tests": {"duration_s": 28, "status": "ok"},
    "teardown": {"duration_s": 5, "status": "ok"}
  },
  "tests": {"total": 41, "passed": 41, "failed": 0, "errors": 0},
  "profiling": {
    "endpoints": {"/api/companies": {"p50_ms": 12, "p95_ms": 25, "p99_ms": 45}},
    "slow_endpoints": []
  }
}
```

### Running Tests Locally (Development)

For
faster iteration during development, you can run individual test files against local services:

```bash
# Start local services first (query-api on 8000, registry on 8001, etc.)
# Then run specific test files:
.venv/bin/python -m pytest tests/integration/test_query_api.py -v --tb=short
.venv/bin/python -m pytest tests/integration/test_registry_api.py -v --tb=short
.venv/bin/python -m pytest tests/integration/test_frontend_data_deps.py -v --tb=short

# Run with profiling output:
.venv/bin/python -m pytest tests/integration/ -v --profiling-output=profiling.json
```

Set the service URLs via environment variables:

```bash
export QUERY_API_URL=http://localhost:8000
export REGISTRY_API_URL=http://localhost:8001
export RISK_API_URL=http://localhost:8002
export TRADING_API_URL=http://localhost:8003
```

### Future: CI/CD Pipeline

This integration test runner is designed as a standalone foundation. A future CI/CD pipeline spec will consume it as one stage in a larger pipeline that includes:

- Self-hosted builds on gremlin nodes (no GitHub Actions compute costs)
- Staged promotion: beta → paper → live
- Market-hours promotion blockers (9:30–16:00 ET)
- Break-glass emergency deploy to production
- Per-stage enable/disable toggles

---

## Troubleshooting

### "Connection refused" to PostgreSQL/Redis/MinIO

Make sure Docker Desktop is running and `docker compose ps` shows all services healthy. On Windows, `localhost` should work since Docker Desktop maps ports to the host.

### Ollama model not found

Run `docker exec stonks-oracle-ollama-1 ollama pull qwen3.5:9b` and wait for the download to complete. Check available models with `docker exec stonks-oracle-ollama-1 ollama list`.

### Ollama is slow (no GPU)

Without a GPU, Ollama runs on CPU and extraction takes 2-5 minutes per document. If you have an NVIDIA GPU, ensure Docker Desktop has GPU support enabled and the NVIDIA Container Toolkit is installed. See [Ollama Docker GPU docs](https://github.com/ollama/ollama/blob/main/docs/docker.md).
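
For either Ollama issue above, it can help to check from Python which models the server reports. The sketch below queries Ollama's `/api/tags` endpoint on the default `http://localhost:11434` address; the helper names (`model_names`, `has_model`, `fetch_tags`) are illustrative and not part of the Stonks Oracle codebase.

```python
import json
import urllib.request


def model_names(tags_payload: dict) -> list[str]:
    """Extract model names from the JSON body returned by GET /api/tags."""
    return [m.get("name", "") for m in tags_payload.get("models", [])]


def has_model(tags_payload: dict, wanted: str) -> bool:
    """Return True if the exact model tag (e.g. 'qwen3.5:9b') is listed."""
    return wanted in model_names(tags_payload)


def fetch_tags(base_url: str = "http://localhost:11434", timeout: float = 5.0) -> dict:
    """Ask the Ollama server for its locally available models."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    payload = fetch_tags()
    wanted = "qwen3.5:9b"
    if has_model(payload, wanted):
        print(f"{wanted} is available")
    else:
        print(f"{wanted} is missing; found: {model_names(payload)}")
```

If the request itself fails with a connection error, the Ollama container is probably not running or port 11434 is not published.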

### Migrations didn't run

If the database is empty, the migrations may not have run on first start. You can apply them manually. PowerShell does not support `<` input redirection, so pipe each file into `psql` instead:

```powershell
# Pipe each migration into psql inside the postgres container, in order
Get-Content infra\migrations\001_initial_schema.sql | docker exec -i stonks-oracle-postgres-1 psql -U stonks -d stonks
Get-Content infra\migrations\002_documents_and_intelligence.sql | docker exec -i stonks-oracle-postgres-1 psql -U stonks -d stonks
# ... repeat for the remaining migration files (through 030)
```

Or run them all at once:

```powershell
Get-ChildItem infra\migrations\*.sql | Sort-Object Name | ForEach-Object {
    Write-Host "Applying $($_.Name)..."
    Get-Content $_.FullName | docker exec -i stonks-oracle-postgres-1 psql -U stonks -d stonks
}
```

### Frontend can't reach the API

When running the frontend with `npm run dev`, Vite proxies `/api/` requests. Make sure the Query API is running on port 8000. If using different ports, set the Vite env vars:

```powershell
$env:VITE_QUERY_API_URL = "http://localhost:8000"
$env:VITE_SYMBOL_REGISTRY_URL = "http://localhost:8001"
$env:VITE_RISK_ENGINE_URL = "http://localhost:8002"
npm run dev
```

### WSL 2 memory issues

Docker Desktop on WSL 2 can consume a lot of memory. Create or edit `%USERPROFILE%\.wslconfig`:

```ini
[wsl2]
memory=8GB
processors=4
```

Then restart WSL: `wsl --shutdown` from PowerShell.

---

## Stopping Everything

```powershell
# Stop infrastructure
docker compose down

# Stop infrastructure AND delete all data (fresh start)
docker compose down -v
```

The `-v` flag removes the named volumes (database data, MinIO objects, Ollama models). Omit it to preserve data between restarts.