# Stonks Oracle — Local Development Setup (Windows + Docker Desktop)
This guide walks you through setting up Stonks Oracle on a Windows machine using Docker Desktop. By the end you will have the full platform running locally: PostgreSQL, Redis, MinIO, Ollama, Trino, and all application services.
## Prerequisites
- **Windows 10/11** with WSL 2 enabled
- **Docker Desktop** for Windows (with WSL 2 backend)
- **Git** (Git for Windows or via WSL)
- **Python 3.12** (for running services outside Docker during development)
- **Node.js 24** (for frontend development)
### Install Docker Desktop
1. Download from [https://www.docker.com/products/docker-desktop/](https://www.docker.com/products/docker-desktop/)
2. During install, ensure "Use WSL 2 instead of Hyper-V" is checked
3. After install, open Docker Desktop → Settings → Resources → WSL Integration → enable for your distro
4. Allocate at least **8 GB RAM** and **4 CPUs** in Settings → Resources (Ollama needs room)
### Install Python 3.12
Download from [https://www.python.org/downloads/](https://www.python.org/downloads/) and check "Add Python to PATH" during install. Or use `winget install Python.Python.3.12` from PowerShell.
### Install Node.js 24
Download from [https://nodejs.org/](https://nodejs.org/) (LTS or Current, 24.x). Or use `winget install OpenJS.NodeJS`.
---
## 1. Register for API Accounts
You need two API accounts. Both have free tiers that work for development.
### Polygon.io (Market Data)
1. Go to [https://polygon.io/](https://polygon.io/)
2. Sign up for a free account
3. Navigate to Dashboard → API Keys
4. Copy your API key — this becomes `MARKET_DATA_API_KEY`
The free tier gives you delayed data and limited API calls. Paid tiers ($29+/mo) give real-time data and higher rate limits.
### Alpaca (Paper Trading)
1. Go to [https://alpaca.markets/](https://alpaca.markets/)
2. Sign up for a free account
3. Navigate to the **Paper Trading** dashboard (not live)
4. Go to API Keys → Generate New Key
5. Copy both the **API Key ID** and **Secret Key** — these become `BROKER_API_KEY` and `BROKER_API_SECRET`
6. Your paper trading base URL is `https://paper-api.alpaca.markets`
Alpaca paper trading is completely free with no time limit.
---
## 2. Clone the Repository
```powershell
git clone https://github.com/celesrenata/stonks-oracle.git
cd stonks-oracle
```
---
## 3. Create Your Environment File
Create a `.env` file in the project root with your API keys:
```ini
# Polygon.io
MARKET_DATA_API_KEY=your_polygon_api_key_here
# Alpaca Paper Trading
BROKER_API_KEY=your_alpaca_key_id_here
BROKER_API_SECRET=your_alpaca_secret_key_here
BROKER_BASE_URL=https://paper-api.alpaca.markets
BROKER_MODE=paper
```
This file is gitignored. Keep it safe.
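Before starting any service, it can help to confirm that the file contains every key this guide relies on. The following is a minimal sketch, not part of the repository: `parse_env` and `missing_keys` are hypothetical helpers, and the parser only handles the plain `KEY=value` lines shown above (no quoting or variable expansion).

```python
from pathlib import Path

# Keys this guide lists as having no useful default.
REQUIRED_KEYS = ("MARKET_DATA_API_KEY", "BROKER_API_KEY", "BROKER_API_SECRET")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return required keys that are absent or still hold a placeholder."""
    return [
        key for key in REQUIRED_KEYS
        if not values.get(key) or values[key].endswith("_here")
    ]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text(encoding="utf-8") if env_path.exists() else ""
    problems = missing_keys(parse_env(text))
    print("OK" if not problems else f"Missing or placeholder: {problems}")
```

Run it from the project root; it only reads the file and prints a summary.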
---
## 4. Start Infrastructure Services
The `docker-compose.yml` in the project root defines all infrastructure services. Start them:
```powershell
docker compose up -d
```
This starts:
| Service | Port | Purpose |
|---------|------|---------|
| PostgreSQL 16 | 5432 | Primary database |
| Redis 7 | 6379 | Job queues and caching |
| MinIO | 9000 (API), 9001 (Console) | Object storage for artifacts |
| Ollama | 11434 | Local LLM inference |
| Trino | 8080 | SQL query engine for lakehouse |
| Hive Metastore | 9083 | Metadata catalog for Trino |
| Superset | 8088 | Analytics dashboards |
The `minio-init` sidecar automatically creates the required storage buckets.
### Verify everything is running
```powershell
docker compose ps
```
All services should show `running` (healthy). Give it 30-60 seconds for health checks to pass.
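`docker compose ps` reports container state; a TCP probe is a quick way to confirm that the published ports are also reachable from the Windows host. The sketch below is illustrative and not part of the repository: `port_open` is a hypothetical helper, the port numbers are taken from the table above, and an open port only shows that something is listening, not that the service is healthy.

```python
import socket

# Host ports from the service table above.
SERVICE_PORTS = {
    "PostgreSQL": 5432,
    "Redis": 6379,
    "MinIO API": 9000,
    "MinIO Console": 9001,
    "Ollama": 11434,
    "Trino": 8080,
    "Hive Metastore": 9083,
    "Superset": 8088,
}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, port in SERVICE_PORTS.items():
        state = "open" if port_open("localhost", port) else "CLOSED"
        print(f"{name:<15} {port:>5}  {state}")
```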
### Access the UIs
- **MinIO Console**: [http://localhost:9001](http://localhost:9001) — login: `minioadmin` / `minioadmin`
- **Superset**: [http://localhost:8088](http://localhost:8088) — login: `admin` / `admin`
- **Trino**: [http://localhost:8080](http://localhost:8080)
---
## 5. Pull the Ollama Model
Stonks Oracle uses the `qwen3.5:9b` model for document extraction and event classification. Pull it:
```powershell
docker exec -it stonks-oracle-ollama-1 ollama pull qwen3.5:9b
```
This downloads ~5 GB. If you have a GPU and want faster inference, make sure Docker Desktop has GPU passthrough enabled (Settings → Resources → GPU). Ollama will auto-detect CUDA GPUs.
To verify the model is available:
```powershell
docker exec stonks-oracle-ollama-1 ollama list
```
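The same check can be made over HTTP from the host, which also confirms that port 11434 is reachable by the services. This is a sketch under stated assumptions: it uses Ollama's `GET /api/tags` endpoint, which lists locally available models, and `model_names` and `fetch_tags` are hypothetical helper names.

```python
import json
import urllib.request


def model_names(tags: dict) -> list[str]:
    """Extract model names from an Ollama /api/tags response body."""
    return [m.get("name", "") for m in tags.get("models", [])]


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """GET /api/tags from the Ollama server and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)


if __name__ == "__main__":
    wanted = "qwen3.5:9b"
    try:
        names = model_names(fetch_tags())
    except OSError as exc:  # URLError and connection errors are OSError subclasses
        print(f"Could not reach Ollama: {exc}")
    else:
        print(f"{wanted} available: {wanted in names}")
```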
---
## 6. Set Up the Python Environment
```powershell
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
```
### Verify the database migrations ran
PostgreSQL auto-runs the migration SQL files from `infra/migrations/` on first start (they are mounted into `/docker-entrypoint-initdb.d`). First confirm that the database accepts connections:
```powershell
python -c "import asyncio, asyncpg; asyncio.run(asyncpg.connect('postgresql://stonks:stonks_dev@localhost:5432/stonks')); print('Connected!')"
```
Then list the tables using the `psql` client inside the PostgreSQL container (no local install needed):
```powershell
docker exec stonks-oracle-postgres-1 psql -U stonks -d stonks -c "\dt" | Select-Object -First 20
```
You should see tables like `companies`, `documents`, `trend_windows`, `recommendations`, `orders`, etc.
### Seed the company universe
```powershell
python -m services.symbol_registry.seed
```
This populates 50 companies across 10 sectors and 46 competitor relationships.
---
## 7. Run the Application Services
You can run services directly with Python (for development) or build Docker images.
### Option A: Run directly with Python (recommended for development)
Open separate terminal windows for each service. Each needs the virtualenv activated and environment variables set:
```powershell
# Terminal 1 — Scheduler (triggers ingestion on a cadence)
.venv\Scripts\activate
$env:MARKET_DATA_API_KEY = "your_polygon_key"
python -m services.scheduler.app
# Terminal 2 — Ingestion (fetches articles, filings, market data)
.venv\Scripts\activate
$env:MARKET_DATA_API_KEY = "your_polygon_key"
python -m services.ingestion.worker
# Terminal 3 — Parser (normalizes raw documents)
.venv\Scripts\activate
python -m services.parser.worker
# Terminal 4 — Extractor (LLM-based intelligence extraction)
.venv\Scripts\activate
python -m services.extractor.main
# Terminal 5 — Aggregation (merges signals into trend summaries)
.venv\Scripts\activate
python -m services.aggregation.main
# Terminal 6 — Recommendation (generates trade recommendations)
.venv\Scripts\activate
python -m services.recommendation.main
# Terminal 7 — Query API (REST API for the dashboard)
.venv\Scripts\activate
uvicorn services.api.app:app --host 0.0.0.0 --port 8000
# Terminal 8 — Symbol Registry (company CRUD API)
.venv\Scripts\activate
uvicorn services.symbol_registry.app:app --host 0.0.0.0 --port 8001
# Terminal 9 — Risk Engine
.venv\Scripts\activate
uvicorn services.risk.app:app --host 0.0.0.0 --port 8002
# Terminal 10 — Trading Engine (autonomous paper trading)
.venv\Scripts\activate
$env:BROKER_API_KEY = "your_alpaca_key"
$env:BROKER_API_SECRET = "your_alpaca_secret"
$env:BROKER_BASE_URL = "https://paper-api.alpaca.markets"
uvicorn services.trading.app:app --host 0.0.0.0 --port 8003
# Terminal 11 — Broker Adapter (executes trades via Alpaca)
.venv\Scripts\activate
$env:BROKER_API_KEY = "your_alpaca_key"
$env:BROKER_API_SECRET = "your_alpaca_secret"
$env:BROKER_BASE_URL = "https://paper-api.alpaca.markets"
python -m services.adapters.broker_service
```
Not all services are required for basic development. The minimum set is:
- **scheduler** + **ingestion** + **parser** + **extractor** — to get data flowing
- **aggregation** + **recommendation** — to generate signals
- **query-api** — to serve the dashboard
Add the trading services when you want to test paper trading.
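If juggling many terminals is inconvenient, the minimum set can be started from one script. The following is a rough sketch and not part of the repository: `start_all` and `stop_all` are hypothetical helpers, the commands mirror the per-terminal ones above, and it assumes you run it from the project root with the virtualenv activated and `MARKET_DATA_API_KEY` already set in the shell, since child processes inherit the environment.

```python
import subprocess
import sys

PY = sys.executable

# Minimum service set named above; commands mirror the per-terminal ones.
MINIMUM_SERVICES = {
    "scheduler": [PY, "-m", "services.scheduler.app"],
    "ingestion": [PY, "-m", "services.ingestion.worker"],
    "parser": [PY, "-m", "services.parser.worker"],
    "extractor": [PY, "-m", "services.extractor.main"],
    "aggregation": [PY, "-m", "services.aggregation.main"],
    "recommendation": [PY, "-m", "services.recommendation.main"],
    "query-api": [PY, "-m", "uvicorn", "services.api.app:app",
                  "--host", "0.0.0.0", "--port", "8000"],
}


def start_all(commands: dict[str, list[str]]) -> dict[str, subprocess.Popen]:
    """Start each command as a child process that inherits this environment."""
    return {name: subprocess.Popen(cmd) for name, cmd in commands.items()}


def stop_all(procs: dict[str, subprocess.Popen]) -> None:
    """Terminate every child and wait for it to exit."""
    for proc in procs.values():
        proc.terminate()
    for proc in procs.values():
        proc.wait()


if __name__ == "__main__":
    procs = start_all(MINIMUM_SERVICES)
    try:
        # Block until the scheduler exits or Ctrl+C is pressed.
        next(iter(procs.values())).wait()
    except KeyboardInterrupt:
        pass
    finally:
        stop_all(procs)
```

All output is interleaved in one console, so separate terminals remain easier when you need to read a single service's logs.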
### Option B: Build and run as Docker containers
```powershell
# Build the Python service image
docker build -t stonks-oracle/services -f docker/Dockerfile .
# Build the frontend
docker build -t stonks-oracle/dashboard -f frontend/Dockerfile frontend/
# Run a service (example: scheduler)
docker run --rm --network host `
-e MARKET_DATA_API_KEY=your_polygon_key `
-e POSTGRES_HOST=localhost `
-e REDIS_HOST=localhost `
-e MINIO_ENDPOINT=localhost:9000 `
-e OLLAMA_BASE_URL=http://localhost:11434 `
-e SERVICE_CMD="python -m services.scheduler.app" `
stonks-oracle/services
```
---
## 8. Run the Frontend Dashboard
```powershell
cd frontend
npm install
npm run dev
```
The dashboard starts at [http://localhost:5173](http://localhost:5173). It proxies API requests to the backend services via Vite's dev server.
For production-like testing, build and serve with nginx:
```powershell
cd frontend
docker build -t stonks-oracle/dashboard .
docker run --rm -p 8080:8080 --network host stonks-oracle/dashboard
```
Then visit [http://localhost:8080](http://localhost:8080). Trino (step 4) also listens on port 8080, so stop the Trino container before starting this one.
---
## 9. Run Tests
### Python tests
```powershell
.venv\Scripts\activate
pip install ruff pytest pytest-asyncio hypothesis
ruff check services/
python -m pytest tests/ -x --tb=short -q
```
### Frontend tests
```powershell
cd frontend
npx vitest --run
```
---
## 10. How the Pipeline Works
Once services are running, the data flows automatically:
```
Scheduler (every 15s)
→ enqueues ingestion jobs for due sources
→ Ingestion fetches articles/filings/market data from Polygon
→ Parser normalizes raw text
→ Extractor calls Ollama to extract structured intelligence
→ Aggregation merges signals into trend summaries
→ Recommendation generates buy/sell/watch signals
→ Trading Engine evaluates and executes paper trades
→ Broker Adapter submits orders to Alpaca
```
Monitor the pipeline via Redis queues:
```powershell
docker exec stonks-oracle-redis-1 redis-cli llen stonks:queue:ingestion
docker exec stonks-oracle-redis-1 redis-cli llen stonks:queue:parsing
docker exec stonks-oracle-redis-1 redis-cli llen stonks:queue:extraction
docker exec stonks-oracle-redis-1 redis-cli llen stonks:queue:aggregation
docker exec stonks-oracle-redis-1 redis-cli llen stonks:queue:recommendation
```
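To watch all five queues at once, the same `redis-cli` calls can be wrapped in a small script. This is a sketch rather than project tooling: `parse_llen` and `queue_depth` are hypothetical helpers, and the container and queue names are the ones used in the commands above.

```python
import subprocess

REDIS_CONTAINER = "stonks-oracle-redis-1"
QUEUES = ("ingestion", "parsing", "extraction", "aggregation", "recommendation")


def parse_llen(output: str) -> int:
    """Parse redis-cli LLEN output: '3' when piped, '(integer) 3' on a TTY."""
    return int(output.strip().split()[-1])


def queue_depth(stage: str) -> int:
    """Run LLEN for one pipeline queue inside the Redis container."""
    result = subprocess.run(
        ["docker", "exec", REDIS_CONTAINER,
         "redis-cli", "llen", f"stonks:queue:{stage}"],
        capture_output=True, text=True, check=True,
    )
    return parse_llen(result.stdout)


if __name__ == "__main__":
    try:
        for stage in QUEUES:
            print(f"{stage:<15} {queue_depth(stage)}")
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"Could not query Redis: {exc}")
```

A queue whose length keeps growing usually suggests that the service consuming it is not running or is falling behind.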
---
## Environment Variable Reference
All services read configuration from environment variables with sensible defaults for local development. You only need to set the ones that differ from defaults.
### Required (no useful default)
| Variable | Description |
|----------|-------------|
| `MARKET_DATA_API_KEY` | Polygon.io API key |
| `BROKER_API_KEY` | Alpaca API key ID |
| `BROKER_API_SECRET` | Alpaca API secret |
### Infrastructure (defaults work with docker-compose)
| Variable | Default | Description |
|----------|---------|-------------|
| `POSTGRES_HOST` | `localhost` | PostgreSQL host |
| `POSTGRES_PORT` | `5432` | PostgreSQL port |
| `POSTGRES_DB` | `stonks` | Database name |
| `POSTGRES_USER` | `stonks` | Database user |
| `POSTGRES_PASSWORD` | `stonks_dev` | Database password |
| `REDIS_HOST` | `localhost` | Redis host |
| `REDIS_PORT` | `6379` | Redis port |
| `REDIS_PASSWORD` | *(none)* | Redis password (not set in dev) |
| `MINIO_ENDPOINT` | `localhost:9000` | MinIO API endpoint |
| `MINIO_ACCESS_KEY` | `minioadmin` | MinIO access key |
| `MINIO_SECRET_KEY` | `minioadmin` | MinIO secret key |
| `OLLAMA_BASE_URL` | `http://localhost:11434` | Ollama API URL |
| `OLLAMA_MODEL` | `qwen3.5:9b` | LLM model name |
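
As an illustration of how these defaults combine, the sketch below assembles a PostgreSQL connection URL from the environment, falling back to the values in the table. How the services actually build their configuration is not shown in this guide, so treat `postgres_dsn` as a hypothetical helper for ad hoc scripts.

```python
import os
from urllib.parse import quote

# Defaults copied from the table above.
POSTGRES_DEFAULTS = {
    "POSTGRES_HOST": "localhost",
    "POSTGRES_PORT": "5432",
    "POSTGRES_DB": "stonks",
    "POSTGRES_USER": "stonks",
    "POSTGRES_PASSWORD": "stonks_dev",
}


def postgres_dsn(env: dict[str, str]) -> str:
    """Build a PostgreSQL URL, letting env override the documented defaults."""
    cfg = {key: env.get(key, default) for key, default in POSTGRES_DEFAULTS.items()}
    # Percent-encode credentials so characters like '@' do not break the URL.
    user = quote(cfg["POSTGRES_USER"], safe="")
    password = quote(cfg["POSTGRES_PASSWORD"], safe="")
    return (
        f"postgresql://{user}:{password}"
        f"@{cfg['POSTGRES_HOST']}:{cfg['POSTGRES_PORT']}/{cfg['POSTGRES_DB']}"
    )


if __name__ == "__main__":
    print(postgres_dsn(dict(os.environ)))
```

With nothing overridden, this yields the same URL used in the connection check in step 6.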
### Trading
| Variable | Default | Description |
|----------|---------|-------------|
| `BROKER_MODE` | `paper` | Trading mode (`paper` or `live`) |
| `BROKER_PROVIDER` | `alpaca` | Broker provider |
| `BROKER_BASE_URL` | *(none)* | Alpaca API URL (set to `https://paper-api.alpaca.markets`) |
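
Because `BROKER_MODE` also accepts `live`, a small guard can catch a misconfigured shell before the trading services start. This is a hedged sketch, not something the repository is known to ship: `paper_config_errors` is a hypothetical helper that only checks the two variables above against the paper endpoint given in this guide.

```python
import os
from urllib.parse import urlparse

PAPER_HOST = "paper-api.alpaca.markets"


def paper_config_errors(env: dict[str, str]) -> list[str]:
    """Return reasons the broker settings do not look like paper trading."""
    errors = []
    mode = env.get("BROKER_MODE", "paper")  # documented default
    if mode != "paper":
        errors.append(f"BROKER_MODE is {mode!r}, expected 'paper'")
    base_url = env.get("BROKER_BASE_URL", "")
    if urlparse(base_url).hostname != PAPER_HOST:
        errors.append(f"BROKER_BASE_URL {base_url!r} is not the paper endpoint")
    return errors


if __name__ == "__main__":
    problems = paper_config_errors(dict(os.environ))
    print("Paper trading config looks OK" if not problems else "\n".join(problems))
```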
---
## Troubleshooting
### "Connection refused" to PostgreSQL/Redis/MinIO
Make sure Docker Desktop is running and `docker compose ps` shows all services healthy. On Windows, `localhost` should work since Docker Desktop maps ports to the host.
### Ollama model not found
Run `docker exec stonks-oracle-ollama-1 ollama pull qwen3.5:9b` and wait for the download to complete. Check available models with `docker exec stonks-oracle-ollama-1 ollama list`.
### Ollama is slow (no GPU)
Without a GPU, Ollama runs on CPU and extraction takes 2-5 minutes per document. If you have an NVIDIA GPU, ensure Docker Desktop has GPU support enabled and the NVIDIA Container Toolkit is installed. See [Ollama Docker GPU docs](https://github.com/ollama/ollama/blob/main/docs/docker.md).
### Migrations didn't run
If the database is empty, the migrations may not have run on first start. You can apply them manually:
```powershell
# Connect to postgres and run migrations in order
# PowerShell has no "<" input redirection, so pipe each file into psql instead
Get-Content infra\migrations\001_initial_schema.sql | docker exec -i stonks-oracle-postgres-1 psql -U stonks -d stonks
Get-Content infra\migrations\002_documents_and_intelligence.sql | docker exec -i stonks-oracle-postgres-1 psql -U stonks -d stonks
# ... repeat for all 030 migration files
```
Or run them all at once:
```powershell
Get-ChildItem infra\migrations\*.sql | Sort-Object Name | ForEach-Object {
Write-Host "Applying $($_.Name)..."
Get-Content $_.FullName | docker exec -i stonks-oracle-postgres-1 psql -U stonks -d stonks
}
```
### Frontend can't reach the API
When running the frontend with `npm run dev`, Vite proxies `/api/` requests. Make sure the Query API is running on port 8000. If using different ports, set the Vite env vars:
```powershell
$env:VITE_QUERY_API_URL = "http://localhost:8000"
$env:VITE_SYMBOL_REGISTRY_URL = "http://localhost:8001"
$env:VITE_RISK_ENGINE_URL = "http://localhost:8002"
npm run dev
```
### WSL 2 memory issues
Docker Desktop on WSL 2 can consume a lot of memory. Create or edit `%USERPROFILE%\.wslconfig`:
```ini
[wsl2]
memory=8GB
processors=4
```
Then restart WSL: `wsl --shutdown` from PowerShell.
---
## Stopping Everything
```powershell
# Stop infrastructure
docker compose down
# Stop infrastructure AND delete all data (fresh start)
docker compose down -v
```
The `-v` flag removes the named volumes (database data, MinIO objects, Ollama models). Omit it to preserve data between restarts.