# Stonks Oracle — Local Development Setup (Windows + Docker Desktop)

This guide walks you through setting up Stonks Oracle on a Windows machine using Docker Desktop. By the end you will have the full platform running locally: PostgreSQL, Redis, MinIO, Ollama, Trino, and all application services.

## Prerequisites

- **Windows 10/11** with WSL 2 enabled
- **Docker Desktop** for Windows (with WSL 2 backend)
- **Git** (Git for Windows or via WSL)
- **Python 3.12** (for running services outside Docker during development)
- **Node.js 24** (for frontend development)

### Install Docker Desktop

1. Download from [https://www.docker.com/products/docker-desktop/](https://www.docker.com/products/docker-desktop/)
2. During install, ensure "Use WSL 2 instead of Hyper-V" is checked
3. After install, open Docker Desktop → Settings → Resources → WSL Integration → enable for your distro
4. Give Docker at least **8 GB RAM** and **4 CPUs** (Ollama needs room). With the WSL 2 backend, Docker Desktop has no memory or CPU sliders under Settings → Resources; set these limits in `%USERPROFILE%\.wslconfig` instead (see "WSL 2 memory issues" under Troubleshooting)

### Install Python 3.12

Download from [https://www.python.org/downloads/](https://www.python.org/downloads/) and check "Add Python to PATH" during install, or run `winget install Python.Python.3.12` from PowerShell.

### Install Node.js 24

Download from [https://nodejs.org/](https://nodejs.org/) (LTS or Current, 24.x), or run `winget install OpenJS.NodeJS`.

---

## 1. Register for API Accounts

You need two API accounts. Both have free tiers that work for development.

### Polygon.io (Market Data)

1. Go to [https://polygon.io/](https://polygon.io/)
2. Sign up for a free account
3. Navigate to Dashboard → API Keys
4. Copy your API key — this becomes `MARKET_DATA_API_KEY`

The free tier has a low rate limit and no real-time data. Paid tiers start at $29/mo and raise the limits; check Polygon's pricing page for which tier includes real-time data.

### Alpaca (Paper Trading)

1. Go to [https://alpaca.markets/](https://alpaca.markets/)
2. Sign up for a free account
3. Navigate to the **Paper Trading** dashboard (not live)
4. Go to API Keys → Generate New Key
5. Copy both the **API Key ID** and **Secret Key** — these become `BROKER_API_KEY` and `BROKER_API_SECRET`
6. Your paper trading base URL is `https://paper-api.alpaca.markets`

Alpaca paper trading is completely free with no time limit.
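
To confirm both sets of keys work before going further, you can run a small standalone script such as the sketch below. It is not part of the repository, and the file name `check_keys.py` is hypothetical. It assumes the keys are in the environment variables named above (plus an optional `BROKER_BASE_URL`), calls Polygon's previous-close endpoint for `AAPL` and Alpaca's `GET /v2/account`, and uses only the standard library.

```python
"""check_keys.py: smoke-test Polygon and Alpaca paper credentials (hypothetical helper)."""
import json
import os
import urllib.parse
import urllib.request

POLYGON_PREV_CLOSE = "https://api.polygon.io/v2/aggs/ticker/AAPL/prev"
ALPACA_PAPER_URL = "https://paper-api.alpaca.markets"
REQUIRED = ("MARKET_DATA_API_KEY", "BROKER_API_KEY", "BROKER_API_SECRET")


def polygon_request(api_key: str) -> urllib.request.Request:
    # Polygon accepts the key as the apiKey query parameter.
    query = urllib.parse.urlencode({"apiKey": api_key})
    return urllib.request.Request(f"{POLYGON_PREV_CLOSE}?{query}")


def alpaca_account_request(
    key_id: str, secret: str, base_url: str = ALPACA_PAPER_URL
) -> urllib.request.Request:
    # Alpaca authenticates with two headers; /v2/account describes the (paper) account.
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v2/account",
        headers={"APCA-API-KEY-ID": key_id, "APCA-API-SECRET-KEY": secret},
    )


def fetch_json(req: urllib.request.Request) -> dict:
    # Raises urllib.error.HTTPError (e.g. 401/403) when a key is rejected.
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def main() -> None:
    missing = [name for name in REQUIRED if not os.environ.get(name)]
    if missing:
        print("Set these environment variables first:", ", ".join(missing))
        return
    prev = fetch_json(polygon_request(os.environ["MARKET_DATA_API_KEY"]))
    print("Polygon status:", prev.get("status"))
    account = fetch_json(alpaca_account_request(
        os.environ["BROKER_API_KEY"],
        os.environ["BROKER_API_SECRET"],
        os.environ.get("BROKER_BASE_URL", ALPACA_PAPER_URL),
    ))
    print("Alpaca account status:", account.get("status"))


if __name__ == "__main__":
    main()
```

Set the variables in PowerShell with `$env:NAME = "value"` and run `python check_keys.py`. An HTTP 401 or 403 error points at the key that was rejected.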

---

## 2. Clone the Repository

```powershell
git clone https://github.com/celesrenata/stonks-oracle.git
cd stonks-oracle
```

---

## 3. Create Your Environment File

Create a `.env` file in the project root with your API keys:

```ini
# Polygon.io
MARKET_DATA_API_KEY=your_polygon_api_key_here

# Alpaca Paper Trading
BROKER_API_KEY=your_alpaca_key_id_here
BROKER_API_SECRET=your_alpaca_secret_key_here
BROKER_BASE_URL=https://paper-api.alpaca.markets
BROKER_MODE=paper
```

This file is gitignored. Keep it safe.

---

## 4. Start Infrastructure Services

The `docker-compose.yml` in the project root defines all infrastructure services. Start them:

```powershell
docker compose up -d
```

This starts:

| Service | Port | Purpose |
|---------|------|---------|
| PostgreSQL 16 | 5432 | Primary database |
| Redis 7 | 6379 | Job queues and caching |
| MinIO | 9000 (API), 9001 (Console) | Object storage for artifacts |
| Ollama | 11434 | Local LLM inference |
| Trino | 8080 | SQL query engine for lakehouse |
| Hive Metastore | 9083 | Metadata catalog for Trino |
| Superset | 8088 | Analytics dashboards |

The `minio-init` sidecar automatically creates the required storage buckets.

### Verify everything is running

```powershell
docker compose ps
```

All services should show `running` (healthy). Allow 30 to 60 seconds for the health checks to pass. If `minio-init` exits after creating the buckets, it is hidden from this list; `docker compose ps -a` shows it together with its exit status.
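
If the `docker compose ps` output is hard to read, a quick probe from the Windows side shows which published ports accept connections. This is a hypothetical standard-library sketch (`check_ports.py` is not in the repository); the port list is copied from the table above, and an open port only means something is listening, not that its health check has passed.

```python
"""check_ports.py: probe the infrastructure ports published by docker compose (hypothetical helper)."""
import socket

# Host ports from the service table above.
SERVICES = {
    "PostgreSQL": 5432,
    "Redis": 6379,
    "MinIO API": 9000,
    "MinIO Console": 9001,
    "Ollama": 11434,
    "Trino": 8080,
    "Hive Metastore": 9083,
    "Superset": 8088,
}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def main(host: str = "localhost") -> None:
    for name, port in SERVICES.items():
        status = "open" if port_open(host, port) else "CLOSED"
        print(f"{name:<15} {port:>5}  {status}")


if __name__ == "__main__":
    main()
```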

### Access the UIs

- **MinIO Console**: [http://localhost:9001](http://localhost:9001) — login: `minioadmin` / `minioadmin`
- **Superset**: [http://localhost:8088](http://localhost:8088) — login: `admin` / `admin`
- **Trino**: [http://localhost:8080](http://localhost:8080)

---

## 5. Pull the Ollama Model

Stonks Oracle uses the `qwen3.5:9b` model for document extraction and event classification. Pull it:

```powershell
docker exec -it stonks-oracle-ollama-1 ollama pull qwen3.5:9b
```

This downloads about 5 GB. If you have a GPU and want faster inference, make sure Docker Desktop has GPU passthrough enabled (Settings → Resources → GPU). Ollama will auto-detect CUDA GPUs.

The container names used in this guide (`stonks-oracle-ollama-1`, `stonks-oracle-postgres-1`, `stonks-oracle-redis-1`) follow Compose's `<project>-<service>-<index>` pattern, where the project name defaults to the directory name. Adjust them if you cloned into a differently named folder.

To verify the model is available:

```powershell
docker exec stonks-oracle-ollama-1 ollama list
```
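
The services reach Ollama through `OLLAMA_BASE_URL`, so it can also help to check the model over HTTP from the host. The sketch below (hypothetical, standard library only) calls Ollama's `GET /api/tags` endpoint, which lists the locally available models, and reads `OLLAMA_BASE_URL` and `OLLAMA_MODEL` with the defaults from the Environment Variable Reference.

```python
"""check_ollama.py: confirm the extraction model is available (hypothetical helper)."""
import json
import os
import urllib.request

BASE_URL = os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434")
MODEL = os.environ.get("OLLAMA_MODEL", "qwen3.5:9b")


def installed_models(tags: dict) -> list[str]:
    """Return model names from the JSON body of Ollama's GET /api/tags."""
    return [entry["name"] for entry in tags.get("models", [])]


def has_model(tags: dict, model: str) -> bool:
    # Exact match: a model pulled without a tag is listed as "<name>:latest".
    return model in installed_models(tags)


def main() -> None:
    try:
        with urllib.request.urlopen(f"{BASE_URL.rstrip('/')}/api/tags", timeout=5) as resp:
            tags = json.load(resp)
    except OSError as exc:  # URLError and timeouts are both OSError subclasses
        print(f"Ollama is not reachable at {BASE_URL}: {exc}")
        return
    if has_model(tags, MODEL):
        print(f"{MODEL} is available")
    else:
        print(f"{MODEL} is missing; installed: {', '.join(installed_models(tags)) or 'none'}")


if __name__ == "__main__":
    main()
```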

---

## 6. Set Up the Python Environment

```powershell
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
```

If PowerShell refuses to run the activation script because of its execution policy, run `Set-ExecutionPolicy -Scope CurrentUser RemoteSigned` once and try again.

### Verify the database migrations ran

PostgreSQL auto-runs the migration SQL files from `infra/migrations/` the first time the container starts with an empty data volume (they are mounted into `/docker-entrypoint-initdb.d`). Verify that the database is reachable and has tables:

```powershell
python -c "import asyncio, asyncpg; loop = asyncio.new_event_loop(); conn = loop.run_until_complete(asyncpg.connect('postgresql://stonks:stonks_dev@localhost:5432/stonks')); print('Connected! User tables:', loop.run_until_complete(conn.fetchval('SELECT count(*) FROM pg_stat_user_tables'))); loop.run_until_complete(conn.close())"
```

Or, more simply, run `psql` inside the PostgreSQL container (no local install needed):

```powershell
docker exec stonks-oracle-postgres-1 psql -U stonks -d stonks -c "\dt" | Select-Object -First 20
```

You should see tables like `companies`, `documents`, `trend_windows`, `recommendations`, `orders`, etc.
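
To turn that visual check into a repeatable one, the sketch below lists the tables in the `public` schema with asyncpg (the connection check above relies on it too) and reports which of the tables named above are missing. The script, the `public` schema assumption, and the `EXPECTED` set are illustrative; the connection settings follow the `POSTGRES_*` defaults in the Environment Variable Reference.

```python
"""check_tables.py: report missing tables after migrations (hypothetical helper)."""
import asyncio
import os
from collections.abc import Mapping
from urllib.parse import quote

# Tables named in this guide; a partial list, not the full schema.
EXPECTED = frozenset({"companies", "documents", "trend_windows", "recommendations", "orders"})


def dsn_from_env(env: Mapping[str, str] = os.environ) -> str:
    """Build a DSN from the POSTGRES_* variables, falling back to the local-dev defaults."""
    user = quote(env.get("POSTGRES_USER", "stonks"), safe="")
    password = quote(env.get("POSTGRES_PASSWORD", "stonks_dev"), safe="")
    host = env.get("POSTGRES_HOST", "localhost")
    port = env.get("POSTGRES_PORT", "5432")
    db = env.get("POSTGRES_DB", "stonks")
    return f"postgresql://{user}:{password}@{host}:{port}/{db}"


def missing_tables(found: set[str], expected: frozenset[str] = EXPECTED) -> set[str]:
    return set(expected - found)


async def list_tables(dsn: str) -> set[str]:
    import asyncpg  # imported here so the helpers above work without asyncpg installed

    conn = await asyncpg.connect(dsn)
    try:
        rows = await conn.fetch("SELECT tablename FROM pg_tables WHERE schemaname = 'public'")
        return {row["tablename"] for row in rows}
    finally:
        await conn.close()


def main() -> None:
    try:
        found = asyncio.run(list_tables(dsn_from_env()))
    except Exception as exc:  # diagnostic script: report any import/connection error and stop
        print(f"Could not list tables: {exc!r}")
        return
    gone = sorted(missing_tables(found))
    print(f"{len(found)} tables in public; missing: {', '.join(gone) or 'none'}")


if __name__ == "__main__":
    main()
```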

### Seed the company universe

```powershell
python -m services.symbol_registry.seed
```

This populates 50 companies across 10 sectors and 46 competitor relationships.

---

## 7. Run the Application Services

You can run services directly with Python (for development) or build Docker images.

### Option A: Run directly with Python (recommended for development)

Open a separate terminal window for each service. Each one needs the virtualenv activated and the relevant environment variables set. In PowerShell, set an environment variable for the current session with `$env:NAME = "value"`. The cmd.exe form `set NAME=value` does not work there: `set` is an alias for `Set-Variable`, so it creates a PowerShell variable instead of an environment variable.

```powershell
# Terminal 1 — Scheduler (triggers ingestion on a cadence)
.venv\Scripts\activate
$env:MARKET_DATA_API_KEY = "your_polygon_key"
python -m services.scheduler.app

# Terminal 2 — Ingestion (fetches articles, filings, market data)
.venv\Scripts\activate
$env:MARKET_DATA_API_KEY = "your_polygon_key"
python -m services.ingestion.worker

# Terminal 3 — Parser (normalizes raw documents)
.venv\Scripts\activate
python -m services.parser.worker

# Terminal 4 — Extractor (LLM-based intelligence extraction)
.venv\Scripts\activate
python -m services.extractor.main

# Terminal 5 — Aggregation (merges signals into trend summaries)
.venv\Scripts\activate
python -m services.aggregation.main

# Terminal 6 — Recommendation (generates trade recommendations)
.venv\Scripts\activate
python -m services.recommendation.main

# Terminal 7 — Query API (REST API for the dashboard)
.venv\Scripts\activate
uvicorn services.api.app:app --host 0.0.0.0 --port 8000

# Terminal 8 — Symbol Registry (company CRUD API)
.venv\Scripts\activate
uvicorn services.symbol_registry.app:app --host 0.0.0.0 --port 8001

# Terminal 9 — Risk Engine
.venv\Scripts\activate
uvicorn services.risk.app:app --host 0.0.0.0 --port 8002

# Terminal 10 — Trading Engine (autonomous paper trading)
.venv\Scripts\activate
$env:BROKER_API_KEY = "your_alpaca_key"
$env:BROKER_API_SECRET = "your_alpaca_secret"
$env:BROKER_BASE_URL = "https://paper-api.alpaca.markets"
uvicorn services.trading.app:app --host 0.0.0.0 --port 8003

# Terminal 11 — Broker Adapter (executes trades via Alpaca)
.venv\Scripts\activate
$env:BROKER_API_KEY = "your_alpaca_key"
$env:BROKER_API_SECRET = "your_alpaca_secret"
$env:BROKER_BASE_URL = "https://paper-api.alpaca.markets"
python -m services.adapters.broker_service
```

Not all services are required for basic development. The minimum set is:

- **scheduler** + **ingestion** + **parser** + **extractor** — to get data flowing
- **aggregation** + **recommendation** — to generate signals
- **query-api** — to serve the dashboard

Add the trading services when you want to test paper trading.
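
Eleven terminals is a lot to juggle. As an alternative, a small launcher can start a group of services as child processes of one terminal. This is a hypothetical sketch (`run_dev.py` is not part of the repository): the module paths and ports are copied from the commands above, every child inherits the launching shell's environment (so set the `$env:` variables first), and all output is interleaved in a single console.

```python
"""run_dev.py: start several Stonks Oracle services from one terminal (hypothetical helper)."""
import subprocess
import sys

UVICORN = ["-m", "uvicorn"]
HOST = ["--host", "0.0.0.0"]

# Module targets and ports copied from the per-terminal commands.
SERVICES: dict[str, list[str]] = {
    "scheduler": ["-m", "services.scheduler.app"],
    "ingestion": ["-m", "services.ingestion.worker"],
    "parser": ["-m", "services.parser.worker"],
    "extractor": ["-m", "services.extractor.main"],
    "aggregation": ["-m", "services.aggregation.main"],
    "recommendation": ["-m", "services.recommendation.main"],
    "query-api": [*UVICORN, "services.api.app:app", *HOST, "--port", "8000"],
    "symbol-registry": [*UVICORN, "services.symbol_registry.app:app", *HOST, "--port", "8001"],
    "risk": [*UVICORN, "services.risk.app:app", *HOST, "--port", "8002"],
    "trading": [*UVICORN, "services.trading.app:app", *HOST, "--port", "8003"],
    "broker-adapter": ["-m", "services.adapters.broker_service"],
}
# The "minimum set" for basic development.
MINIMAL = ["scheduler", "ingestion", "parser", "extractor", "aggregation", "recommendation", "query-api"]


def command_for(name: str) -> list[str]:
    # sys.executable is the interpreter running this script; run it with the .venv python.
    return [sys.executable, *SERVICES[name]]


def resolve(argv: list[str]) -> list[str]:
    if argv == ["minimal"]:
        return MINIMAL
    if argv == ["all"]:
        return list(SERVICES)
    return argv


def main(argv: list[str]) -> None:
    names = resolve(argv)
    unknown = [n for n in names if n not in SERVICES]
    if not names or unknown:
        print("usage: python run_dev.py minimal | all | <service> ...")
        print("services:", ", ".join(SERVICES))
        return
    # Children inherit this process's environment (API keys, POSTGRES_HOST, ...).
    procs = [subprocess.Popen(command_for(n)) for n in names]
    try:
        for proc in procs:
            proc.wait()
    except KeyboardInterrupt:
        for proc in procs:
            proc.terminate()
        for proc in procs:
            proc.wait()


if __name__ == "__main__":
    main(sys.argv[1:])
```

`python run_dev.py minimal` starts the minimum set and Ctrl+C stops them all. Because the logs interleave, separate terminals remain easier when you are debugging one service.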

### Option B: Build and run as Docker containers

```powershell
# Build the Python service image
docker build -t stonks-oracle/services -f docker/Dockerfile .

# Build the frontend
docker build -t stonks-oracle/dashboard -f frontend/Dockerfile frontend/

# Run a service (example: scheduler)
docker run --rm `
  -e MARKET_DATA_API_KEY=your_polygon_key `
  -e POSTGRES_HOST=host.docker.internal `
  -e REDIS_HOST=host.docker.internal `
  -e MINIO_ENDPOINT=host.docker.internal:9000 `
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 `
  -e SERVICE_CMD="python -m services.scheduler.app" `
  stonks-oracle/services
```

PowerShell continues a command onto the next line with a trailing backtick (`` ` ``). Inside a container, `localhost` refers to the container itself, so the example points the service at `host.docker.internal`, Docker Desktop's hostname for the Windows host, where `docker compose` publishes the infrastructure ports.

---

## 8. Run the Frontend Dashboard

```powershell
cd frontend
npm install
npm run dev
```

The dashboard starts at [http://localhost:5173](http://localhost:5173). Vite's dev server proxies API requests to the backend services.

For production-like testing, build the image and serve it with nginx. Trino already publishes host port 8080, so map the container's port 8080 to a free host port such as 3000:

```powershell
cd frontend
docker build -t stonks-oracle/dashboard .
docker run --rm -p 3000:8080 stonks-oracle/dashboard
```

Then visit [http://localhost:3000](http://localhost:3000).

---

## 9. Run Tests

### Python tests

```powershell
.venv\Scripts\activate
pip install ruff pytest pytest-asyncio hypothesis
ruff check services/
python -m pytest tests/ -x --tb=short -q
```

### Frontend tests

```powershell
cd frontend
npx vitest --run
```

---

## 10. How the Pipeline Works

Once services are running, the data flows automatically:

```
Scheduler (every 15s)
→ enqueues ingestion jobs for due sources
→ Ingestion fetches articles/filings/market data from Polygon
→ Parser normalizes raw text
→ Extractor calls Ollama to extract structured intelligence
→ Aggregation merges signals into trend summaries
→ Recommendation generates buy/sell/watch signals
→ Trading Engine evaluates and executes paper trades
→ Broker Adapter submits orders to Alpaca
```

Monitor the pipeline via Redis queues:

```powershell
docker exec stonks-oracle-redis-1 redis-cli llen stonks:queue:ingestion
docker exec stonks-oracle-redis-1 redis-cli llen stonks:queue:parsing
docker exec stonks-oracle-redis-1 redis-cli llen stonks:queue:extraction
docker exec stonks-oracle-redis-1 redis-cli llen stonks:queue:aggregation
docker exec stonks-oracle-redis-1 redis-cli llen stonks:queue:recommendation
```
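
The same check can be scripted. A minimal sketch with redis-py (an assumption: install it with `pip install redis` if your environment lacks it) that reads `REDIS_HOST`, `REDIS_PORT`, and `REDIS_PASSWORD` with the documented defaults and uses the queue names from the commands above; `queue_depths.py` is hypothetical.

```python
"""queue_depths.py: print the length of each pipeline queue (hypothetical helper)."""
import os
from typing import Protocol

QUEUE_PREFIX = "stonks:queue:"
QUEUES = ("ingestion", "parsing", "extraction", "aggregation", "recommendation")


class SupportsLlen(Protocol):
    def llen(self, name: str) -> int: ...


def queue_depths(client: SupportsLlen, queues: tuple[str, ...] = QUEUES) -> dict[str, int]:
    """Return {queue: length}; each queue must be a Redis list, as LLEN requires."""
    return {queue: client.llen(QUEUE_PREFIX + queue) for queue in queues}


def main() -> None:
    try:
        import redis  # redis-py
    except ImportError:
        print("Install redis-py first: pip install redis")
        return
    client = redis.Redis(
        host=os.environ.get("REDIS_HOST", "localhost"),
        port=int(os.environ.get("REDIS_PORT", "6379")),
        password=os.environ.get("REDIS_PASSWORD") or None,
    )
    try:
        for queue, depth in queue_depths(client).items():
            print(f"{queue:<15} {depth}")
    except redis.exceptions.ConnectionError as exc:
        print(f"Redis is not reachable: {exc}")


if __name__ == "__main__":
    main()
```

This makes one round trip per queue, which is fine for five queues; redis-py's `pipeline()` could batch them if the list grows.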

---

## Environment Variable Reference

All services read configuration from environment variables, with defaults that suit local development. You only need to set the ones that differ from the defaults.

### Required (no useful default)

| Variable | Description |
|----------|-------------|
| `MARKET_DATA_API_KEY` | Polygon.io API key |
| `BROKER_API_KEY` | Alpaca API key ID |
| `BROKER_API_SECRET` | Alpaca API secret |

### Infrastructure (defaults work with docker-compose)

| Variable | Default | Description |
|----------|---------|-------------|
| `POSTGRES_HOST` | `localhost` | PostgreSQL host |
| `POSTGRES_PORT` | `5432` | PostgreSQL port |
| `POSTGRES_DB` | `stonks` | Database name |
| `POSTGRES_USER` | `stonks` | Database user |
| `POSTGRES_PASSWORD` | `stonks_dev` | Database password |
| `REDIS_HOST` | `localhost` | Redis host |
| `REDIS_PORT` | `6379` | Redis port |
| `REDIS_PASSWORD` | *(none)* | Redis password (not set in dev) |
| `MINIO_ENDPOINT` | `localhost:9000` | MinIO API endpoint |
| `MINIO_ACCESS_KEY` | `minioadmin` | MinIO access key |
| `MINIO_SECRET_KEY` | `minioadmin` | MinIO secret key |
| `OLLAMA_BASE_URL` | `http://localhost:11434` | Ollama API URL |
| `OLLAMA_MODEL` | `qwen3.5:9b` | LLM model name |

### Trading

| Variable | Default | Description |
|----------|---------|-------------|
| `BROKER_MODE` | `paper` | Trading mode (`paper` or `live`) |
| `BROKER_PROVIDER` | `alpaca` | Broker provider |
| `BROKER_BASE_URL` | *(none)* | Alpaca API URL; set to `https://paper-api.alpaca.markets` for paper trading |
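
The pattern these tables describe (required keys with no default, everything else falling back to a local value) can be sketched as follows. This is not the project's actual configuration module: the `Settings` class, the subset of fields, the fail-fast check on all three required keys (real services may only need some of them), and the `BROKER_MODE` validation are illustrative assumptions.

```python
"""Sketch of environment-driven settings (illustrative, not the project's config module)."""
import os
from collections.abc import Mapping
from dataclasses import dataclass

REQUIRED = ("MARKET_DATA_API_KEY", "BROKER_API_KEY", "BROKER_API_SECRET")


@dataclass(frozen=True)
class Settings:
    market_data_api_key: str
    broker_api_key: str
    broker_api_secret: str
    postgres_host: str
    postgres_port: int
    redis_host: str
    redis_port: int
    ollama_base_url: str
    ollama_model: str
    broker_mode: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        # Fail fast at startup instead of on the first API call.
        raise RuntimeError("missing required environment variables: " + ", ".join(missing))
    broker_mode = env.get("BROKER_MODE", "paper")
    if broker_mode not in ("paper", "live"):
        raise ValueError(f"BROKER_MODE must be 'paper' or 'live', got {broker_mode!r}")
    return Settings(
        market_data_api_key=env["MARKET_DATA_API_KEY"],
        broker_api_key=env["BROKER_API_KEY"],
        broker_api_secret=env["BROKER_API_SECRET"],
        postgres_host=env.get("POSTGRES_HOST", "localhost"),
        postgres_port=int(env.get("POSTGRES_PORT", "5432")),
        redis_host=env.get("REDIS_HOST", "localhost"),
        redis_port=int(env.get("REDIS_PORT", "6379")),
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        ollama_model=env.get("OLLAMA_MODEL", "qwen3.5:9b"),
        broker_mode=broker_mode,
    )
```

Passing the environment as a mapping keeps the loader easy to test with a plain dict.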

---

## 11. Integration Tests

The integration test pipeline validates all API endpoints against a live Kubernetes sandbox with realistic seed data. It deploys ephemeral infrastructure (PostgreSQL, Redis, MinIO), seeds deterministic test data, deploys all API services, and runs the full test suite with profiling.

### Prerequisites

- A bash shell (WSL or Git Bash on Windows); the commands in this section are bash, not PowerShell
- `kubectl` configured with access to a Kubernetes cluster
- Docker images built and pushed to GHCR (or use `:latest`)
- `envsubst` available (usually part of the `gettext` package)
- `GHCR_TOKEN` environment variable set for image pulls (optional if the images are public)

### Running the Full Pipeline

```bash
# Run with latest images
bash infra/inttest/run_pipeline.sh

# Run with a specific image tag
bash infra/inttest/run_pipeline.sh --image-tag abc123

# Keep the sandbox running for debugging
bash infra/inttest/run_pipeline.sh --skip-teardown

# Custom namespace and results file
bash infra/inttest/run_pipeline.sh --namespace my-test --results-file results.json
```

### CLI Options

| Option | Default | Description |
|--------|---------|-------------|
| `--image-tag TAG` | `latest` | Docker image tag to deploy |
| `--namespace NAME` | `stonks-inttest-<timestamp>` | Kubernetes namespace name |
| `--skip-teardown` | `false` | Leave namespace running after tests |
| `--results-file PATH` | `inttest-results.json` | Path for JSON results output |

### Exit Codes

| Code | Meaning |
|------|---------|
| 0 | All tests passed |
| 1 | One or more test failures |
| 2 | Infrastructure setup failure |

### JSON Result Contract

The pipeline writes a JSON results file (`inttest-results.json` by default) with this structure:

```json
{
  "run_id": "stonks-inttest-1705312800",
  "image_tag": "abc123",
  "started_at": "2025-01-15T12:00:00Z",
  "completed_at": "2025-01-15T12:07:30Z",
  "exit_code": 0,
  "stages": {
    "infra_deploy": {"duration_s": 45, "status": "ok"},
    "seed_data": {"duration_s": 8, "status": "ok"},
    "service_deploy": {"duration_s": 32, "status": "ok"},
    "integration_tests": {"duration_s": 28, "status": "ok"},
    "teardown": {"duration_s": 5, "status": "ok"}
  },
  "tests": {"total": 41, "passed": 41, "failed": 0, "errors": 0},
  "profiling": {
    "endpoints": {"/api/companies": {"p50_ms": 12, "p95_ms": 25, "p99_ms": 45}},
    "slow_endpoints": []
  }
}
```
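
Because the results file has a fixed shape, other tooling can consume it directly. The sketch below (a hypothetical `summarize_inttest.py`) turns a results file into a short report and maps `exit_code` using the exit-code table above. It assumes the keys shown in the example and prints `slow_endpoints` entries as-is, since the example does not show their format.

```python
"""summarize_inttest.py: short report from the integration-test results JSON (hypothetical helper)."""
import json
import sys
from pathlib import Path

EXIT_MEANINGS = {0: "all tests passed", 1: "one or more test failures", 2: "infrastructure setup failure"}


def summarize(results: dict) -> list[str]:
    code = results["exit_code"]
    tests = results["tests"]
    lines = [
        f"{results['run_id']} (tag {results['image_tag']}): exit {code}, "
        f"{EXIT_MEANINGS.get(code, 'unknown exit code')}",
        f"tests: {tests['passed']}/{tests['total']} passed, "
        f"{tests['failed']} failed, {tests['errors']} errors",
    ]
    for stage, info in results.get("stages", {}).items():
        lines.append(f"  {stage:<18} {info['status']:<6} {info['duration_s']}s")
    profiling = results.get("profiling", {})
    for endpoint, pct in profiling.get("endpoints", {}).items():
        lines.append(f"  {endpoint} p50={pct['p50_ms']}ms p95={pct['p95_ms']}ms p99={pct['p99_ms']}ms")
    slow = profiling.get("slow_endpoints", [])
    if slow:
        # Entry format is not shown in the example, so each entry is printed as-is.
        lines.append("slow endpoints: " + ", ".join(map(str, slow)))
    return lines


def main(argv: list[str]) -> None:
    path = Path(argv[0] if argv else "inttest-results.json")
    if not path.is_file():
        print(f"{path} not found; run infra/inttest/run_pipeline.sh first")
        return
    print("\n".join(summarize(json.loads(path.read_text(encoding="utf-8")))))


if __name__ == "__main__":
    main(sys.argv[1:])
```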

### Running Tests Locally (Development)

For faster iteration during development, you can run individual test files against local services. The commands below use bash and the Linux virtualenv layout (`.venv/bin/python`); from PowerShell, use `.venv\Scripts\python` instead.

```bash
# Start local services first (query-api on 8000, registry on 8001, etc.)
# Then run specific test files:
.venv/bin/python -m pytest tests/integration/test_query_api.py -v --tb=short
.venv/bin/python -m pytest tests/integration/test_registry_api.py -v --tb=short
.venv/bin/python -m pytest tests/integration/test_frontend_data_deps.py -v --tb=short

# Run with profiling output:
.venv/bin/python -m pytest tests/integration/ -v --profiling-output=profiling.json
```

The tests take the service URLs from environment variables; set them before running pytest:

```bash
export QUERY_API_URL=http://localhost:8000
export REGISTRY_API_URL=http://localhost:8001
export RISK_API_URL=http://localhost:8002
export TRADING_API_URL=http://localhost:8003
```

### Future: CI/CD Pipeline

This integration test runner is designed as a standalone foundation. A future CI/CD pipeline spec will consume it as one stage in a larger pipeline that includes:

- Self-hosted builds on gremlin nodes (no GitHub Actions compute costs)
- Staged promotion: beta → paper → live
- Market-hours promotion blockers (9:30–16:00 ET)
- Break-glass emergency deploy to production
- Per-stage enable/disable toggles

---

## Troubleshooting

### "Connection refused" to PostgreSQL/Redis/MinIO

Make sure Docker Desktop is running and `docker compose ps` shows all services healthy. On Windows, `localhost` works from the host because Docker Desktop forwards the published ports to it; from inside another container, use `host.docker.internal` instead.

### Ollama model not found

Run `docker exec stonks-oracle-ollama-1 ollama pull qwen3.5:9b` and wait for the download to complete. Check the available models with `docker exec stonks-oracle-ollama-1 ollama list`.

### Ollama is slow (no GPU)

Without a GPU, Ollama runs on the CPU and extraction takes 2-5 minutes per document. If you have an NVIDIA GPU, ensure Docker Desktop has GPU support enabled and the NVIDIA Container Toolkit is installed. See [Ollama Docker GPU docs](https://github.com/ollama/ollama/blob/main/docs/docker.md).

### Migrations didn't run

PostgreSQL only runs the files in `/docker-entrypoint-initdb.d` when it starts with an empty data directory, so a volume left over from an earlier run skips them. If the database is empty, apply the migrations manually. PowerShell does not support the `<` input redirection operator, so pipe each file in with `Get-Content`:

```powershell
# Connect to postgres and run migrations in order
Get-Content infra\migrations\001_initial_schema.sql | docker exec -i stonks-oracle-postgres-1 psql -U stonks -d stonks
Get-Content infra\migrations\002_documents_and_intelligence.sql | docker exec -i stonks-oracle-postgres-1 psql -U stonks -d stonks
# ... repeat for each remaining migration file, in order, through 030
```

Or run them all at once:

```powershell
Get-ChildItem infra\migrations\*.sql | Sort-Object Name | ForEach-Object {
    Write-Host "Applying $($_.Name)..."
    Get-Content $_.FullName | docker exec -i stonks-oracle-postgres-1 psql -U stonks -d stonks
}
```

### Frontend can't reach the API

When running the frontend with `npm run dev`, Vite proxies `/api/` requests. Make sure the Query API is running on port 8000. If you use different ports, set the Vite environment variables in the same PowerShell session before starting the dev server:

```powershell
$env:VITE_QUERY_API_URL = "http://localhost:8000"
$env:VITE_SYMBOL_REGISTRY_URL = "http://localhost:8001"
$env:VITE_RISK_ENGINE_URL = "http://localhost:8002"
npm run dev
```

### WSL 2 memory issues

Docker Desktop on WSL 2 can consume a lot of memory. Create or edit `%USERPROFILE%\.wslconfig`:

```ini
[wsl2]
memory=8GB
processors=4
```

Then run `wsl --shutdown` from PowerShell and restart Docker Desktop so the new limits take effect.

---

## Stopping Everything

```powershell
# Stop infrastructure
docker compose down

# Stop infrastructure AND delete all data (fresh start)
docker compose down -v
```

The `-v` flag removes the named volumes (database data, MinIO objects, Ollama models). Omit it to preserve data between restarts.