# Stonks Oracle — Local Development Setup (Windows + Docker Desktop)

This guide walks you through setting up Stonks Oracle on a Windows machine using Docker Desktop. By the end you will have the full platform running locally: PostgreSQL, Redis, MinIO, Ollama, Trino, and all application services.

## Prerequisites

- **Windows 10/11** with WSL 2 enabled
- **Docker Desktop** for Windows (with WSL 2 backend)
- **Git** (Git for Windows or via WSL)
- **Python 3.12** (for running services outside Docker during development)
- **Node.js 24** (for frontend development)

### Install Docker Desktop

1. Download from [https://www.docker.com/products/docker-desktop/](https://www.docker.com/products/docker-desktop/)
2. During install, ensure "Use WSL 2 instead of Hyper-V" is checked
3. After install, open Docker Desktop → Settings → Resources → WSL Integration → enable for your distro
4. Allocate at least **8 GB RAM** and **4 CPUs** in Settings → Resources (Ollama needs room)

### Install Python 3.12

Download from [https://www.python.org/downloads/](https://www.python.org/downloads/) and check "Add Python to PATH" during install. Or use `winget install Python.Python.3.12` from PowerShell.

### Install Node.js 24

Download from [https://nodejs.org/](https://nodejs.org/) (LTS or Current, 24.x). Or use `winget install OpenJS.NodeJS`.
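
Before moving on, it can help to confirm the prerequisites are actually on your `PATH`. The following is a minimal preflight sketch, not part of the repository; the command names (`git`, `docker`, `node`, `npm`) are assumptions about how each tool is exposed on your machine.

```python
import shutil
import sys
from typing import Callable, Iterable, Optional

# Assumed command names for the tools listed under Prerequisites.
REQUIRED_TOOLS = ("git", "docker", "node", "npm")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def python_is_supported(version: tuple[int, int] = sys.version_info[:2]) -> bool:
    """Return True when the interpreter is Python 3.12 or newer."""
    return version >= (3, 12)
```

Calling `missing_tools()` from a Python prompt returns an empty list when every tool was found, and `python_is_supported()` reports whether the running interpreter meets the 3.12 requirement.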

---

## 1. Register for API Accounts

You need two API accounts. Both have free tiers that work for development.

### Polygon.io (Market Data)

1. Go to [https://polygon.io/](https://polygon.io/)
2. Sign up for a free account
3. Navigate to Dashboard → API Keys
4. Copy your API key — this becomes `MARKET_DATA_API_KEY`

The free tier gives you delayed data and limited API calls. Paid tiers ($29+/mo) give real-time data and higher rate limits.

### Alpaca (Paper Trading)

1. Go to [https://alpaca.markets/](https://alpaca.markets/)
2. Sign up for a free account
3. Navigate to the **Paper Trading** dashboard (not live)
4. Go to API Keys → Generate New Key
5. Copy both the **API Key ID** and **Secret Key** — these become `BROKER_API_KEY` and `BROKER_API_SECRET`
6. Your paper trading base URL is `https://paper-api.alpaca.markets`

Alpaca paper trading is completely free with no time limit.
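
To confirm the paper-trading keys work before wiring them into the services, you can call Alpaca's account endpoint directly. This sketch assumes the `/v2/account` route and the `APCA-API-KEY-ID` / `APCA-API-SECRET-KEY` headers described in Alpaca's public API documentation; the function names are illustrative and not part of Stonks Oracle.

```python
import json
import urllib.request

PAPER_BASE_URL = "https://paper-api.alpaca.markets"


def build_account_request(
    key_id: str, secret: str, base_url: str = PAPER_BASE_URL
) -> urllib.request.Request:
    """Build a GET request for the Alpaca account endpoint."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v2/account",
        headers={
            "APCA-API-KEY-ID": key_id,
            "APCA-API-SECRET-KEY": secret,
        },
    )


def fetch_account_status(key_id: str, secret: str) -> str:
    """Call Alpaca and return the account's status field (needs network access)."""
    request = build_account_request(key_id, secret)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["status"]
```

If the keys are wrong, `urlopen` raises `urllib.error.HTTPError`, which is usually enough to tell a bad key apart from a network problem.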

---

## 2. Clone the Repository

```powershell
git clone https://github.com/celesrenata/stonks-oracle.git
cd stonks-oracle
```

---

## 3. Create Your Environment File

Create a `.env` file in the project root with your API keys:

```ini
# Polygon.io
MARKET_DATA_API_KEY=your_polygon_api_key_here

# Alpaca Paper Trading
BROKER_API_KEY=your_alpaca_key_id_here
BROKER_API_SECRET=your_alpaca_secret_key_here
BROKER_BASE_URL=https://paper-api.alpaca.markets
BROKER_MODE=paper
```

This file is gitignored. Keep it safe.
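
This guide does not say whether the Python services load `.env` themselves; step 7 sets the variables by hand in each terminal, which suggests they read the process environment. For your own helper scripts, a small loader along these lines may be convenient. It is a sketch that handles only plain `KEY=VALUE` lines and `#` comments, and the function names are hypothetical.

```python
import os
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Copy values into os.environ without overriding variables already set."""
    values = parse_env(Path(path).read_text(encoding="utf-8"))
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values
```

Because `setdefault` is used, anything you export in the shell still takes precedence over the file.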

---

## 4. Start Infrastructure Services

The `docker-compose.yml` in the project root defines all infrastructure services. Start them:

```powershell
docker compose up -d
```

This starts:

| Service | Port | Purpose |
|---------|------|---------|
| PostgreSQL 16 | 5432 | Primary database |
| Redis 7 | 6379 | Job queues and caching |
| MinIO | 9000 (API), 9001 (Console) | Object storage for artifacts |
| Ollama | 11434 | Local LLM inference |
| Trino | 8080 | SQL query engine for lakehouse |
| Hive Metastore | 9083 | Metadata catalog for Trino |
| Superset | 8088 | Analytics dashboards |

The `minio-init` sidecar automatically creates the required storage buckets.

### Verify everything is running

```powershell
docker compose ps
```

All services should show `running` (healthy). Give it 30-60 seconds for health checks to pass.
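
If you prefer a scripted check, the sketch below probes the ports from the service table with plain TCP connections. It only shows that something is listening on each port, not that the service behind it is healthy, and the helper names are illustrative.

```python
import socket

# Ports copied from the service table in this step.
SERVICE_PORTS = {
    "PostgreSQL": 5432,
    "Redis": 6379,
    "MinIO API": 9000,
    "MinIO Console": 9001,
    "Ollama": 11434,
    "Trino": 8080,
    "Hive Metastore": 9083,
    "Superset": 8088,
}


def port_is_open(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(ports: dict[str, int] = SERVICE_PORTS) -> dict[str, bool]:
    """Map each service name to whether its port accepts connections."""
    return {name: port_is_open(port) for name, port in ports.items()}
```

Printing `check_services()` gives a quick name-to-boolean overview; any `False` entry is a good place to start with `docker compose logs`.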

### Access the UIs

- **MinIO Console**: [http://localhost:9001](http://localhost:9001) — login: `minioadmin` / `minioadmin`
- **Superset**: [http://localhost:8088](http://localhost:8088) — login: `admin` / `admin`
- **Trino**: [http://localhost:8080](http://localhost:8080)

---

## 5. Pull the Ollama Model

Stonks Oracle uses the `qwen3.5:9b` model for document extraction and event classification. Pull it:

```powershell
docker exec -it stonks-oracle-ollama-1 ollama pull qwen3.5:9b
```

This downloads ~5 GB. If you have a GPU and want faster inference, make sure Docker Desktop has GPU passthrough enabled (Settings → Resources → GPU). Ollama will auto-detect CUDA GPUs.

To verify the model is available:

```powershell
docker exec stonks-oracle-ollama-1 ollama list
```
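
The same check can be made from the host over Ollama's HTTP API, which is how the services would reach the model. This is a sketch assuming Ollama's `/api/tags` endpoint, which lists locally available models; the helper names are hypothetical.

```python
import json
import urllib.request

OLLAMA_BASE_URL = "http://localhost:11434"


def installed_models(tags_payload: dict) -> list[str]:
    """Extract model names from an Ollama /api/tags response body."""
    return [model["name"] for model in tags_payload.get("models", [])]


def model_is_installed(tags_payload: dict, wanted: str = "qwen3.5:9b") -> bool:
    """Return True if the wanted model tag appears in the payload."""
    return wanted in installed_models(tags_payload)


def fetch_tags(base_url: str = OLLAMA_BASE_URL) -> dict:
    """Fetch the model list from a running Ollama instance."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as response:
        return json.load(response)
```

With the container running, `model_is_installed(fetch_tags())` should return `True` once the pull has finished.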

---

## 6. Set Up the Python Environment

```powershell
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
```

### Verify the database migrations ran

PostgreSQL auto-runs the migration SQL files from `infra/migrations/` on first start (they are mounted into `/docker-entrypoint-initdb.d`). First check that the database accepts connections:

```powershell
python -c "import asyncio, asyncpg; loop = asyncio.new_event_loop(); conn = loop.run_until_complete(asyncpg.connect('postgresql://stonks:stonks_dev@localhost:5432/stonks')); print('Connected!'); loop.run_until_complete(conn.close())"
```

Then list the tables with the `psql` client that ships inside the PostgreSQL container (no local install needed):

```powershell
docker exec stonks-oracle-postgres-1 psql -U stonks -d stonks -c "\dt" | Select-Object -First 20
```

You should see tables like `companies`, `documents`, `trend_windows`, `recommendations`, `orders`, etc.
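
To turn that eyeball check into a script, the sketch below compares the tables in the database against the ones named in this guide. It assumes the migrations create their tables in the `public` schema and that `asyncpg` is installed from `requirements.txt`; the real schema contains more tables than the five listed here.

```python
import asyncio
from typing import Iterable

# Tables named in this guide; treat the list as a sample, not the full schema.
EXPECTED_TABLES = {"companies", "documents", "trend_windows", "recommendations", "orders"}
DSN = "postgresql://stonks:stonks_dev@localhost:5432/stonks"


def missing_tables(found: Iterable[str], expected: Iterable[str] = EXPECTED_TABLES) -> list[str]:
    """Return expected table names that are absent from `found`, sorted."""
    return sorted(set(expected) - set(found))


async def fetch_public_tables(dsn: str = DSN) -> list[str]:
    """List table names in the public schema (assumed location of the tables)."""
    import asyncpg  # third-party; imported lazily so missing_tables works without it

    conn = await asyncpg.connect(dsn)
    try:
        rows = await conn.fetch("SELECT tablename FROM pg_tables WHERE schemaname = 'public'")
        return [row["tablename"] for row in rows]
    finally:
        await conn.close()


def main() -> None:
    missing = missing_tables(asyncio.run(fetch_public_tables()))
    print("All expected tables present" if not missing else f"Missing tables: {missing}")
```

Call `main()` with the virtualenv active; a non-empty "Missing tables" list points to the manual migration steps under Troubleshooting.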

### Seed the company universe

```powershell
python -m services.symbol_registry.seed
```

This populates 50 companies across 10 sectors and 46 competitor relationships.

---

## 7. Run the Application Services

You can run services directly with Python (for development) or build Docker images.

### Option A: Run directly with Python (recommended for development)

Open separate terminal windows for each service. Each needs the virtualenv activated and environment variables set:

```powershell
# Terminal 1 — Scheduler (triggers ingestion on a cadence)
.venv\Scripts\activate
$env:MARKET_DATA_API_KEY = "your_polygon_key"
python -m services.scheduler.app

# Terminal 2 — Ingestion (fetches articles, filings, market data)
.venv\Scripts\activate
$env:MARKET_DATA_API_KEY = "your_polygon_key"
python -m services.ingestion.worker

# Terminal 3 — Parser (normalizes raw documents)
.venv\Scripts\activate
python -m services.parser.worker

# Terminal 4 — Extractor (LLM-based intelligence extraction)
.venv\Scripts\activate
python -m services.extractor.main

# Terminal 5 — Aggregation (merges signals into trend summaries)
.venv\Scripts\activate
python -m services.aggregation.main

# Terminal 6 — Recommendation (generates trade recommendations)
.venv\Scripts\activate
python -m services.recommendation.main

# Terminal 7 — Query API (REST API for the dashboard)
.venv\Scripts\activate
uvicorn services.api.app:app --host 0.0.0.0 --port 8000

# Terminal 8 — Symbol Registry (company CRUD API)
.venv\Scripts\activate
uvicorn services.symbol_registry.app:app --host 0.0.0.0 --port 8001

# Terminal 9 — Risk Engine
.venv\Scripts\activate
uvicorn services.risk.app:app --host 0.0.0.0 --port 8002

# Terminal 10 — Trading Engine (autonomous paper trading)
.venv\Scripts\activate
$env:BROKER_API_KEY = "your_alpaca_key"
$env:BROKER_API_SECRET = "your_alpaca_secret"
$env:BROKER_BASE_URL = "https://paper-api.alpaca.markets"
uvicorn services.trading.app:app --host 0.0.0.0 --port 8003

# Terminal 11 — Broker Adapter (executes trades via Alpaca)
.venv\Scripts\activate
$env:BROKER_API_KEY = "your_alpaca_key"
$env:BROKER_API_SECRET = "your_alpaca_secret"
$env:BROKER_BASE_URL = "https://paper-api.alpaca.markets"
python -m services.adapters.broker_service
```

Not all services are required for basic development. The minimum set is:

- **scheduler** + **ingestion** + **parser** + **extractor** — to get data flowing
- **aggregation** + **recommendation** — to generate signals
- **query-api** — to serve the dashboard

Add the trading services when you want to test paper trading.
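
If juggling seven terminals gets tedious, the minimum set can be started from one Python script. This is a sketch, not a tool shipped with the repository: the module paths are copied from the Option A commands, the Query API is launched through `python -m uvicorn`, and each child process inherits whatever environment variables are set in the calling shell.

```python
import subprocess
import sys

# Module paths copied from the Option A commands in this step.
MINIMUM_SERVICES = {
    "scheduler": ["-m", "services.scheduler.app"],
    "ingestion": ["-m", "services.ingestion.worker"],
    "parser": ["-m", "services.parser.worker"],
    "extractor": ["-m", "services.extractor.main"],
    "aggregation": ["-m", "services.aggregation.main"],
    "recommendation": ["-m", "services.recommendation.main"],
    "query-api": ["-m", "uvicorn", "services.api.app:app", "--host", "0.0.0.0", "--port", "8000"],
}


def build_commands(python: str = sys.executable) -> dict[str, list[str]]:
    """Return the full command line for each service."""
    return {name: [python, *args] for name, args in MINIMUM_SERVICES.items()}


def start_all() -> dict[str, subprocess.Popen]:
    """Start every service as a child process sharing this console."""
    return {name: subprocess.Popen(cmd) for name, cmd in build_commands().items()}


def stop_all(processes: dict[str, subprocess.Popen]) -> None:
    """Ask each child to exit, then wait for all of them."""
    for process in processes.values():
        process.terminate()
    for process in processes.values():
        process.wait()
```

Run it from the project root with the virtualenv active. Logs from all services interleave in one console, so separate terminals remain the better choice when you are debugging a single service.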

### Option B: Build and run as Docker containers

```powershell
# Build the Python service image
docker build -t stonks-oracle/services -f docker/Dockerfile .

# Build the frontend
docker build -t stonks-oracle/dashboard -f frontend/Dockerfile frontend/

# Run a service (example: scheduler)
docker run --rm --network host `
  -e MARKET_DATA_API_KEY=your_polygon_key `
  -e POSTGRES_HOST=localhost `
  -e REDIS_HOST=localhost `
  -e MINIO_ENDPOINT=localhost:9000 `
  -e OLLAMA_BASE_URL=http://localhost:11434 `
  -e SERVICE_CMD="python -m services.scheduler.app" `
  stonks-oracle/services
```

---

## 8. Run the Frontend Dashboard

```powershell
cd frontend
npm install
npm run dev
```

The dashboard starts at [http://localhost:5173](http://localhost:5173). It proxies API requests to the backend services via Vite's dev server.

For production-like testing, build and serve with nginx:

```powershell
cd frontend
docker build -t stonks-oracle/dashboard .
docker run --rm -p 8080:8080 --network host stonks-oracle/dashboard
```

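Then visit [http://localhost:8080](http://localhost:8080). With `--network host`, Docker discards the `-p` mapping and the container binds port 8080 directly. Trino also uses port 8080 in the compose stack (see step 4), so stop the Trino container first to avoid a conflict.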

---

## 9. Run Tests

### Python tests

```powershell
.venv\Scripts\activate
pip install ruff pytest pytest-asyncio hypothesis
ruff check services/
python -m pytest tests/ -x --tb=short -q
```

### Frontend tests

```powershell
cd frontend
npx vitest --run
```

---

## 10. How the Pipeline Works

Once services are running, the data flows automatically:

```
Scheduler (every 15s)
  → enqueues ingestion jobs for due sources
    → Ingestion fetches articles/filings/market data from Polygon
      → Parser normalizes raw text
        → Extractor calls Ollama to extract structured intelligence
          → Aggregation merges signals into trend summaries
            → Recommendation generates buy/sell/watch signals
              → Trading Engine evaluates and executes paper trades
                → Broker Adapter submits orders to Alpaca
```

Monitor the pipeline via Redis queues:

```powershell
docker exec stonks-oracle-redis-1 redis-cli llen stonks:queue:ingestion
docker exec stonks-oracle-redis-1 redis-cli llen stonks:queue:parsing
docker exec stonks-oracle-redis-1 redis-cli llen stonks:queue:extraction
docker exec stonks-oracle-redis-1 redis-cli llen stonks:queue:aggregation
docker exec stonks-oracle-redis-1 redis-cli llen stonks:queue:recommendation
```
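
To watch all five queues at once, the same `redis-cli` calls can be wrapped in a short script. This sketch shells out to `docker exec` using the container and queue names from the commands above; the function names are illustrative.

```python
import subprocess

# Container and queue names copied from the commands above.
REDIS_CONTAINER = "stonks-oracle-redis-1"
QUEUES = ("ingestion", "parsing", "extraction", "aggregation", "recommendation")


def llen_command(queue: str, container: str = REDIS_CONTAINER) -> list[str]:
    """Build the docker exec command that reports one queue's length."""
    return ["docker", "exec", container, "redis-cli", "llen", f"stonks:queue:{queue}"]


def parse_llen(output: str) -> int:
    """Parse redis-cli output: '5' when piped, '(integer) 5' on a TTY."""
    return int(output.strip().removeprefix("(integer)").strip())


def queue_depths() -> dict[str, int]:
    """Return the current length of every pipeline queue."""
    depths: dict[str, int] = {}
    for queue in QUEUES:
        result = subprocess.run(llen_command(queue), capture_output=True, text=True, check=True)
        depths[queue] = parse_llen(result.stdout)
    return depths
```

A queue whose depth keeps growing while the next one stays at zero usually means the worker between them is not running.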

---

## Environment Variable Reference

All services read configuration from environment variables with sensible defaults for local development. You only need to set the ones that differ from defaults.

### Required (no useful default)

| Variable | Description |
|----------|-------------|
| `MARKET_DATA_API_KEY` | Polygon.io API key |
| `BROKER_API_KEY` | Alpaca API key ID |
| `BROKER_API_SECRET` | Alpaca API secret |

### Infrastructure (defaults work with docker-compose)

| Variable | Default | Description |
|----------|---------|-------------|
| `POSTGRES_HOST` | `localhost` | PostgreSQL host |
| `POSTGRES_PORT` | `5432` | PostgreSQL port |
| `POSTGRES_DB` | `stonks` | Database name |
| `POSTGRES_USER` | `stonks` | Database user |
| `POSTGRES_PASSWORD` | `stonks_dev` | Database password |
| `REDIS_HOST` | `localhost` | Redis host |
| `REDIS_PORT` | `6379` | Redis port |
| `REDIS_PASSWORD` | *(none)* | Redis password (not set in dev) |
| `MINIO_ENDPOINT` | `localhost:9000` | MinIO API endpoint |
| `MINIO_ACCESS_KEY` | `minioadmin` | MinIO access key |
| `MINIO_SECRET_KEY` | `minioadmin` | MinIO secret key |
| `OLLAMA_BASE_URL` | `http://localhost:11434` | Ollama API URL |
| `OLLAMA_MODEL` | `qwen3.5:9b` | LLM model name |

### Trading

| Variable | Default | Description |
|----------|---------|-------------|
| `BROKER_MODE` | `paper` | Trading mode (`paper` or `live`) |
| `BROKER_PROVIDER` | `alpaca` | Broker provider |
| `BROKER_BASE_URL` | *(none)* | Alpaca API URL (set to `https://paper-api.alpaca.markets`) |
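
As an illustration of how these defaults combine, the sketch below builds PostgreSQL and Redis connection URLs from the environment, falling back to the values in the infrastructure table. How the services themselves assemble their connection strings is not shown in this guide, so the helper names and the Redis database index `0` are assumptions.

```python
import os
from typing import Mapping
from urllib.parse import quote

# Defaults copied from the infrastructure table above.
DEFAULTS = {
    "POSTGRES_HOST": "localhost",
    "POSTGRES_PORT": "5432",
    "POSTGRES_DB": "stonks",
    "POSTGRES_USER": "stonks",
    "POSTGRES_PASSWORD": "stonks_dev",
    "REDIS_HOST": "localhost",
    "REDIS_PORT": "6379",
}


def setting(name: str, env: Mapping[str, str] = os.environ) -> str:
    """Return the variable from env, or its documented default."""
    return env.get(name, DEFAULTS[name])


def postgres_dsn(env: Mapping[str, str] = os.environ) -> str:
    """Build a PostgreSQL URL, percent-encoding the credentials."""
    user = quote(setting("POSTGRES_USER", env), safe="")
    password = quote(setting("POSTGRES_PASSWORD", env), safe="")
    host = setting("POSTGRES_HOST", env)
    port = setting("POSTGRES_PORT", env)
    database = setting("POSTGRES_DB", env)
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"


def redis_url(env: Mapping[str, str] = os.environ) -> str:
    """Build a Redis URL; database index 0 is an assumption."""
    password = env.get("REDIS_PASSWORD")
    auth = f":{quote(password, safe='')}@" if password else ""
    return f"redis://{auth}{setting('REDIS_HOST', env)}:{setting('REDIS_PORT', env)}/0"
```

With nothing overridden, `postgres_dsn({})` yields the same connection string used in step 6.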

---

## 11. Integration Tests

The integration test pipeline validates all API endpoints against a live Kubernetes sandbox with realistic seed data. It deploys ephemeral infrastructure (PostgreSQL, Redis, MinIO), seeds deterministic test data, deploys all API services, and runs the full test suite with profiling.

### Prerequisites

- `kubectl` configured with access to a Kubernetes cluster
- Docker images built and pushed to GHCR (or use `:latest`)
- `envsubst` available (usually part of `gettext` package)
- `GHCR_TOKEN` environment variable set for image pulls (optional if images are public)

### Running the Full Pipeline

```bash
# Run with latest images
bash infra/inttest/run_pipeline.sh

# Run with a specific image tag
bash infra/inttest/run_pipeline.sh --image-tag abc123

# Keep the sandbox running for debugging
bash infra/inttest/run_pipeline.sh --skip-teardown

# Custom namespace and results file
bash infra/inttest/run_pipeline.sh --namespace my-test --results-file results.json
```

### CLI Options

| Option | Default | Description |
|--------|---------|-------------|
| `--image-tag TAG` | `latest` | Docker image tag to deploy |
| `--namespace NAME` | `stonks-inttest-<timestamp>` | Kubernetes namespace name |
| `--skip-teardown` | `false` | Leave namespace running after tests |
| `--results-file PATH` | `inttest-results.json` | Path for JSON results output |

### Exit Codes

| Code | Meaning |
|------|---------|
| 0 | All tests passed |
| 1 | One or more test failures |
| 2 | Infrastructure setup failure |

### JSON Result Contract

The pipeline produces a JSON results file (`inttest-results.json` by default) with this structure:

```json
{
  "run_id": "stonks-inttest-1705312800",
  "image_tag": "abc123",
  "started_at": "2025-01-15T12:00:00Z",
  "completed_at": "2025-01-15T12:07:30Z",
  "exit_code": 0,
  "stages": {
    "infra_deploy": {"duration_s": 45, "status": "ok"},
    "seed_data": {"duration_s": 8, "status": "ok"},
    "service_deploy": {"duration_s": 32, "status": "ok"},
    "integration_tests": {"duration_s": 28, "status": "ok"},
    "teardown": {"duration_s": 5, "status": "ok"}
  },
  "tests": {"total": 41, "passed": 41, "failed": 0, "errors": 0},
  "profiling": {
    "endpoints": {"/api/companies": {"p50_ms": 12, "p95_ms": 25, "p99_ms": 45}},
    "slow_endpoints": []
  }
}
```
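
A results file in this shape is easy to post-process. The sketch below condenses it into a short summary using the exit-code meanings from the table above. It assumes any stage status other than `"ok"` indicates a problem, since `"ok"` is the only value shown in this guide; the function names are hypothetical.

```python
import json
from pathlib import Path

# Meanings copied from the Exit Codes table.
EXIT_CODE_MEANINGS = {
    0: "All tests passed",
    1: "One or more test failures",
    2: "Infrastructure setup failure",
}


def summarize(results: dict) -> dict:
    """Condense a results document into the fields worth checking first."""
    failed_stages = sorted(
        name
        for name, stage in results.get("stages", {}).items()
        if stage.get("status") != "ok"  # assumption: non-"ok" means the stage failed
    )
    tests = results.get("tests", {})
    return {
        "outcome": EXIT_CODE_MEANINGS.get(results.get("exit_code"), "Unknown exit code"),
        "failed_stages": failed_stages,
        "tests_failed": tests.get("failed", 0) + tests.get("errors", 0),
        "slow_endpoints": results.get("profiling", {}).get("slow_endpoints", []),
    }


def summarize_file(path: str = "inttest-results.json") -> dict:
    """Load a results file from disk and summarize it."""
    return summarize(json.loads(Path(path).read_text(encoding="utf-8")))
```

Running `summarize_file()` after a pipeline run gives a quick view of which stage or test count to look at before opening the full JSON.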

### Running Tests Locally (Development)

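For faster iteration during development, you can run individual test files against local services. The commands below assume a bash shell with a Linux-style virtualenv (for example, inside WSL); in a native Windows virtualenv the interpreter is `.venv\Scripts\python` instead: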

```bash
# Start local services first (query-api on 8000, registry on 8001, etc.)
# Then run specific test files:
.venv/bin/python -m pytest tests/integration/test_query_api.py -v --tb=short
.venv/bin/python -m pytest tests/integration/test_registry_api.py -v --tb=short
.venv/bin/python -m pytest tests/integration/test_frontend_data_deps.py -v --tb=short

# Run with profiling output:
.venv/bin/python -m pytest tests/integration/ -v --profiling-output=profiling.json
```

Set the service URLs via environment variables:

```bash
export QUERY_API_URL=http://localhost:8000
export REGISTRY_API_URL=http://localhost:8001
export RISK_API_URL=http://localhost:8002
export TRADING_API_URL=http://localhost:8003
```

### Future: CI/CD Pipeline

This integration test runner is designed as a standalone foundation. A future CI/CD pipeline spec will consume it as one stage in a larger pipeline that includes:

- Self-hosted builds on gremlin nodes (no GitHub Actions compute costs)
- Staged promotion: beta → paper → live
- Market-hours promotion blockers (9:30–16:00 ET)
- Break-glass emergency deploy to production
- Per-stage enable/disable toggles

---

## Troubleshooting

### "Connection refused" to PostgreSQL/Redis/MinIO

Make sure Docker Desktop is running and `docker compose ps` shows all services healthy. On Windows, `localhost` should work since Docker Desktop maps ports to the host.

### Ollama model not found

Run `docker exec stonks-oracle-ollama-1 ollama pull qwen3.5:9b` and wait for the download to complete. Check available models with `docker exec stonks-oracle-ollama-1 ollama list`.

### Ollama is slow (no GPU)

Without a GPU, Ollama runs on CPU and extraction takes 2-5 minutes per document. If you have an NVIDIA GPU, ensure Docker Desktop has GPU support enabled and the NVIDIA Container Toolkit is installed. See [Ollama Docker GPU docs](https://github.com/ollama/ollama/blob/main/docs/docker.md).

### Migrations didn't run

If the database is empty, the migrations may not have run on first start. You can apply them manually:

```powershell
# Apply migrations one at a time, in order.
# PowerShell does not support `<` input redirection, so pipe each file in instead.
Get-Content infra\migrations\001_initial_schema.sql | docker exec -i stonks-oracle-postgres-1 psql -U stonks -d stonks
Get-Content infra\migrations\002_documents_and_intelligence.sql | docker exec -i stonks-oracle-postgres-1 psql -U stonks -d stonks
# ... repeat for all 30 migration files
```

Or run them all at once:

```powershell
Get-ChildItem infra\migrations\*.sql | Sort-Object Name | ForEach-Object {
    Write-Host "Applying $($_.Name)..."
    Get-Content $_.FullName | docker exec -i stonks-oracle-postgres-1 psql -U stonks -d stonks
}
```

### Frontend can't reach the API

When running the frontend with `npm run dev`, Vite proxies `/api/` requests. Make sure the Query API is running on port 8000. If using different ports, set the Vite env vars:

```powershell
$env:VITE_QUERY_API_URL = "http://localhost:8000"
$env:VITE_SYMBOL_REGISTRY_URL = "http://localhost:8001"
$env:VITE_RISK_ENGINE_URL = "http://localhost:8002"
npm run dev
```

### WSL 2 memory issues

Docker Desktop on WSL 2 can consume a lot of memory. Create or edit `%USERPROFILE%\.wslconfig`:

```ini
[wsl2]
memory=8GB
processors=4
```

Then restart WSL: `wsl --shutdown` from PowerShell.

---

## Stopping Everything

```powershell
# Stop infrastructure
docker compose down

# Stop infrastructure AND delete all data (fresh start)
docker compose down -v
```

The `-v` flag removes the named volumes (database data, MinIO objects, Ollama models). Omit it to preserve data between restarts.