Celes Renata c85c0068a2 fix: clean up utcnow deprecation warnings, fix 12 failing tests, add CI/CD pipeline manifests
- Replace all datetime.utcnow() with datetime.now(tz=timezone.utc) across 8 files
- Fix 12 failing tests to match current implementation behavior
- Fix pytest_plugins in non-top-level conftest (moved to root conftest.py)
- Auto-fix 189 lint issues (import sorting, unused imports)
- Add CI/CD pipeline infrastructure (ARC, ArgoCD, Kargo manifests)
- Add values-beta.yaml and values-paper.yaml for staged deployments
- Update GitHub Actions workflow to use self-hosted-gremlin runners
- Add integration-test job to CI pipeline

Result: 1596 passed, 0 failed, 0 warnings
2026-04-18 03:59:28 +00:00

# Stonks Oracle — Local Development Setup (Windows + Docker Desktop)
This guide walks you through setting up Stonks Oracle on a Windows machine using Docker Desktop. By the end you will have the full platform running locally: PostgreSQL, Redis, MinIO, Ollama, Trino, and all application services.
## Prerequisites
- **Windows 10/11** with WSL 2 enabled
- **Docker Desktop** for Windows (with WSL 2 backend)
- **Git** (Git for Windows or via WSL)
- **Python 3.12** (for running services outside Docker during development)
- **Node.js 24** (for frontend development)
### Install Docker Desktop
1. Download from [https://www.docker.com/products/docker-desktop/](https://www.docker.com/products/docker-desktop/)
2. During install, ensure "Use WSL 2 instead of Hyper-V" is checked
3. After install, open Docker Desktop → Settings → Resources → WSL Integration → enable for your distro
4. Allocate at least **8 GB RAM** and **4 CPUs** in Settings → Resources (Ollama needs room)
### Install Python 3.12
Download from [https://www.python.org/downloads/](https://www.python.org/downloads/) and check "Add Python to PATH" during install. Or use `winget install Python.Python.3.12` from PowerShell.
### Install Node.js 24
Download from [https://nodejs.org/](https://nodejs.org/) (LTS or Current, 24.x). Or use `winget install OpenJS.NodeJS`.
---
## 1. Register for API Accounts
You need two API accounts. Both have free tiers that work for development.
### Polygon.io (Market Data)
1. Go to [https://polygon.io/](https://polygon.io/)
2. Sign up for a free account
3. Navigate to Dashboard → API Keys
4. Copy your API key — this becomes `MARKET_DATA_API_KEY`
The free tier gives you delayed data and limited API calls. Paid tiers ($29+/mo) give real-time data and higher rate limits.
### Alpaca (Paper Trading)
1. Go to [https://alpaca.markets/](https://alpaca.markets/)
2. Sign up for a free account
3. Navigate to the **Paper Trading** dashboard (not live)
4. Go to API Keys → Generate New Key
5. Copy both the **API Key ID** and **Secret Key** — these become `BROKER_API_KEY` and `BROKER_API_SECRET`
6. Your paper trading base URL is `https://paper-api.alpaca.markets`
Alpaca paper trading is completely free with no time limit.
---
## 2. Clone the Repository
```powershell
git clone https://github.com/celesrenata/stonks-oracle.git
cd stonks-oracle
```
---
## 3. Create Your Environment File
Create a `.env` file in the project root with your API keys:
```ini
# Polygon.io
MARKET_DATA_API_KEY=your_polygon_api_key_here
# Alpaca Paper Trading
BROKER_API_KEY=your_alpaca_key_id_here
BROKER_API_SECRET=your_alpaca_secret_key_here
BROKER_BASE_URL=https://paper-api.alpaca.markets
BROKER_MODE=paper
```
This file is gitignored. Keep it safe.
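
The guide does not say whether the services read `.env` themselves, so the following is only a minimal sketch of loading it into the process environment before starting a service from Python. The helper names `parse_env_text` and `load_env_file` are hypothetical and not part of the repository.

```python
import os
from pathlib import Path


def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Copy values into os.environ without overriding variables already set."""
    values = parse_env_text(Path(path).read_text(encoding="utf-8"))
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values
```

Calling `load_env_file()` from the project root would make the keys visible to any child process started afterwards. Existing variables win, so a value set in the terminal still takes precedence over the file.
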
---
## 4. Start Infrastructure Services
The `docker-compose.yml` in the project root defines all infrastructure services. Start them:
```powershell
docker compose up -d
```
This starts:
| Service | Port | Purpose |
|---------|------|---------|
| PostgreSQL 16 | 5432 | Primary database |
| Redis 7 | 6379 | Job queues and caching |
| MinIO | 9000 (API), 9001 (Console) | Object storage for artifacts |
| Ollama | 11434 | Local LLM inference |
| Trino | 8080 | SQL query engine for lakehouse |
| Hive Metastore | 9083 | Metadata catalog for Trino |
| Superset | 8088 | Analytics dashboards |
The `minio-init` sidecar automatically creates the required storage buckets.
### Verify everything is running
```powershell
docker compose ps
```
All services should show `running` (healthy). Give it 30-60 seconds for health checks to pass.
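
As a complement to `docker compose ps`, the sketch below checks that each port from the service table accepts TCP connections from the host. It only proves that something is listening, not that the service is healthy. The names `SERVICE_PORTS`, `port_is_open`, and `check_services` are made up for this example.

```python
import socket

# Host ports copied from the service table above.
SERVICE_PORTS = {
    "PostgreSQL": 5432,
    "Redis": 6379,
    "MinIO API": 9000,
    "MinIO Console": 9001,
    "Ollama": 11434,
    "Trino": 8080,
    "Hive Metastore": 9083,
    "Superset": 8088,
}


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "localhost") -> dict[str, bool]:
    """Map each service name to whether its port currently accepts connections."""
    return {name: port_is_open(host, port) for name, port in SERVICE_PORTS.items()}
```

Printing `check_services()` gives a quick overview of which ports are reachable; any `False` entry is a reason to look at that container's logs.
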
### Access the UIs
- **MinIO Console**: [http://localhost:9001](http://localhost:9001) — login: `minioadmin` / `minioadmin`
- **Superset**: [http://localhost:8088](http://localhost:8088) — login: `admin` / `admin`
- **Trino**: [http://localhost:8080](http://localhost:8080)
---
## 5. Pull the Ollama Model
Stonks Oracle uses the `qwen3.5:9b` model for document extraction and event classification. Pull it:
```powershell
docker exec -it stonks-oracle-ollama-1 ollama pull qwen3.5:9b
```
This downloads ~5 GB. If you have a GPU and want faster inference, make sure Docker Desktop has GPU passthrough enabled (Settings → Resources → GPU). Ollama will auto-detect CUDA GPUs.
To verify the model is available:
```powershell
docker exec stonks-oracle-ollama-1 ollama list
```
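
The same check can be scripted against Ollama's HTTP API, which lists local models at `/api/tags`. This is a minimal sketch assuming the default port from the table in step 4; the function names are hypothetical.

```python
import json
from urllib.request import urlopen


def model_names(tags_payload: dict) -> list[str]:
    """Extract model names from the JSON body returned by GET /api/tags."""
    return [model["name"] for model in tags_payload.get("models", [])]


def has_model(tags_payload: dict, wanted: str) -> bool:
    """Return True if the wanted model tag appears in the payload."""
    return wanted in model_names(tags_payload)


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """Fetch the list of locally available models from a running Ollama server."""
    with urlopen(f"{base_url}/api/tags", timeout=5) as response:
        return json.load(response)
```

With the container running, `has_model(fetch_tags(), "qwen3.5:9b")` should return `True` once the pull has finished.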
---
## 6. Set Up the Python Environment
```powershell
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
```
### Verify the database migrations ran
PostgreSQL runs the migration SQL files from `infra/migrations/` automatically on first start (they are mounted into `/docker-entrypoint-initdb.d`). First confirm that the database accepts connections:
```powershell
python -c "import asyncio, asyncpg; asyncio.run(asyncpg.connect('postgresql://stonks:stonks_dev@localhost:5432/stonks')); print('Connected!')"
```
Or, more simply, list the tables with the `psql` client inside the PostgreSQL container:
```powershell
docker exec stonks-oracle-postgres-1 psql -U stonks -d stonks -c "\dt" | Select-Object -First 20
```
You should see tables like `companies`, `documents`, `trend_windows`, `recommendations`, `orders`, etc.
### Seed the company universe
```powershell
python -m services.symbol_registry.seed
```
This populates 50 companies across 10 sectors and 46 competitor relationships.
---
## 7. Run the Application Services
You can run services directly with Python (for development) or build Docker images.
### Option A: Run directly with Python (recommended for development)
Open separate terminal windows for each service. Each needs the virtualenv activated and environment variables set:
```powershell
# Terminal 1 — Scheduler (triggers ingestion on a cadence)
.venv\Scripts\activate
$env:MARKET_DATA_API_KEY = "your_polygon_key"
python -m services.scheduler.app
# Terminal 2 — Ingestion (fetches articles, filings, market data)
.venv\Scripts\activate
$env:MARKET_DATA_API_KEY = "your_polygon_key"
python -m services.ingestion.worker
# Terminal 3 — Parser (normalizes raw documents)
.venv\Scripts\activate
python -m services.parser.worker
# Terminal 4 — Extractor (LLM-based intelligence extraction)
.venv\Scripts\activate
python -m services.extractor.main
# Terminal 5 — Aggregation (merges signals into trend summaries)
.venv\Scripts\activate
python -m services.aggregation.main
# Terminal 6 — Recommendation (generates trade recommendations)
.venv\Scripts\activate
python -m services.recommendation.main
# Terminal 7 — Query API (REST API for the dashboard)
.venv\Scripts\activate
uvicorn services.api.app:app --host 0.0.0.0 --port 8000
# Terminal 8 — Symbol Registry (company CRUD API)
.venv\Scripts\activate
uvicorn services.symbol_registry.app:app --host 0.0.0.0 --port 8001
# Terminal 9 — Risk Engine
.venv\Scripts\activate
uvicorn services.risk.app:app --host 0.0.0.0 --port 8002
# Terminal 10 — Trading Engine (autonomous paper trading)
.venv\Scripts\activate
$env:BROKER_API_KEY = "your_alpaca_key"
$env:BROKER_API_SECRET = "your_alpaca_secret"
$env:BROKER_BASE_URL = "https://paper-api.alpaca.markets"
uvicorn services.trading.app:app --host 0.0.0.0 --port 8003
# Terminal 11 — Broker Adapter (executes trades via Alpaca)
.venv\Scripts\activate
$env:BROKER_API_KEY = "your_alpaca_key"
$env:BROKER_API_SECRET = "your_alpaca_secret"
$env:BROKER_BASE_URL = "https://paper-api.alpaca.markets"
python -m services.adapters.broker_service
```
Not all services are required for basic development. The minimum set is:
- **scheduler** + **ingestion** + **parser** + **extractor** — to get data flowing
- **aggregation** + **recommendation** — to generate signals
- **query-api** — to serve the dashboard
Add the trading services when you want to test paper trading.
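
If juggling many terminals gets tedious, the minimum set can also be started from one script. The sketch below assumes the module paths from Option A and launches each service as a child process; `MINIMUM_SERVICES`, `build_command`, and `start_all` are hypothetical names, and the script does no log routing or restart handling.

```python
import subprocess
import sys

# Module invocations copied from the Option A commands above.
MINIMUM_SERVICES = {
    "scheduler": ["-m", "services.scheduler.app"],
    "ingestion": ["-m", "services.ingestion.worker"],
    "parser": ["-m", "services.parser.worker"],
    "extractor": ["-m", "services.extractor.main"],
    "aggregation": ["-m", "services.aggregation.main"],
    "recommendation": ["-m", "services.recommendation.main"],
    "query-api": ["-m", "uvicorn", "services.api.app:app",
                  "--host", "0.0.0.0", "--port", "8000"],
}


def build_command(name: str, python: str = sys.executable) -> list[str]:
    """Return the full argv for one service, using the given interpreter."""
    return [python, *MINIMUM_SERVICES[name]]


def start_all() -> dict[str, subprocess.Popen]:
    """Start every minimum-set service; children inherit the current environment."""
    return {name: subprocess.Popen(build_command(name)) for name in MINIMUM_SERVICES}
```

Run it from the activated virtualenv with the API keys already set, so the child processes inherit both. Calling `terminate()` on each returned `Popen` object stops the services again.
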
### Option B: Build and run as Docker containers
```powershell
# Build the Python service image
docker build -t stonks-oracle/services -f docker/Dockerfile .
# Build the frontend
docker build -t stonks-oracle/dashboard -f frontend/Dockerfile frontend/
# Run a service (example: scheduler)
docker run --rm --network host `
-e MARKET_DATA_API_KEY=your_polygon_key `
-e POSTGRES_HOST=localhost `
-e REDIS_HOST=localhost `
-e MINIO_ENDPOINT=localhost:9000 `
-e OLLAMA_BASE_URL=http://localhost:11434 `
-e SERVICE_CMD="python -m services.scheduler.app" `
stonks-oracle/services
```
---
## 8. Run the Frontend Dashboard
```powershell
cd frontend
npm install
npm run dev
```
The dashboard starts at [http://localhost:5173](http://localhost:5173). It proxies API requests to the backend services via Vite's dev server.
For production-like testing, build and serve with nginx:
```powershell
cd frontend
docker build -t stonks-oracle/dashboard .
docker run --rm -p 8080:8080 --network host stonks-oracle/dashboard
```
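Then visit [http://localhost:8080](http://localhost:8080). Trino from step 4 is also published on port 8080, so stop the Trino container first if the two conflict.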
---
## 9. Run Tests
### Python tests
```powershell
.venv\Scripts\activate
pip install ruff pytest pytest-asyncio hypothesis
ruff check services/
python -m pytest tests/ -x --tb=short -q
```
### Frontend tests
```powershell
cd frontend
npx vitest --run
```
---
## 10. How the Pipeline Works
Once services are running, the data flows automatically:
```
Scheduler (every 15s)
→ enqueues ingestion jobs for due sources
→ Ingestion fetches articles/filings/market data from Polygon
→ Parser normalizes raw text
→ Extractor calls Ollama to extract structured intelligence
→ Aggregation merges signals into trend summaries
→ Recommendation generates buy/sell/watch signals
→ Trading Engine evaluates and executes paper trades
→ Broker Adapter submits orders to Alpaca
```
Monitor the pipeline via Redis queues:
```powershell
docker exec stonks-oracle-redis-1 redis-cli llen stonks:queue:ingestion
docker exec stonks-oracle-redis-1 redis-cli llen stonks:queue:parsing
docker exec stonks-oracle-redis-1 redis-cli llen stonks:queue:extraction
docker exec stonks-oracle-redis-1 redis-cli llen stonks:queue:aggregation
docker exec stonks-oracle-redis-1 redis-cli llen stonks:queue:recommendation
```
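
The same queue depths can be read from Python. This sketch takes any client object with an `llen` method, which is what `redis.Redis` from the `redis` package provides; the `redis` package being installed is an assumption, and `QUEUE_STAGES`, `queue_key`, and `queue_depths` are names invented here.

```python
# Stage names taken from the redis-cli commands above.
QUEUE_STAGES = ["ingestion", "parsing", "extraction", "aggregation", "recommendation"]


def queue_key(stage: str) -> str:
    """Build the Redis list key for one pipeline stage."""
    return f"stonks:queue:{stage}"


def queue_depths(client) -> dict[str, int]:
    """Return the current length of every pipeline queue.

    `client` is any object exposing llen(key), for example
    redis.Redis(host="localhost", port=6379).
    """
    return {stage: int(client.llen(queue_key(stage))) for stage in QUEUE_STAGES}
```

A depth that keeps growing for one stage while the next stays at zero usually points at the worker between them.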
---
## Environment Variable Reference
All services read configuration from environment variables with sensible defaults for local development. You only need to set the ones that differ from defaults.
### Required (no useful default)
| Variable | Description |
|----------|-------------|
| `MARKET_DATA_API_KEY` | Polygon.io API key |
| `BROKER_API_KEY` | Alpaca API key ID |
| `BROKER_API_SECRET` | Alpaca API secret |
### Infrastructure (defaults work with docker-compose)
| Variable | Default | Description |
|----------|---------|-------------|
| `POSTGRES_HOST` | `localhost` | PostgreSQL host |
| `POSTGRES_PORT` | `5432` | PostgreSQL port |
| `POSTGRES_DB` | `stonks` | Database name |
| `POSTGRES_USER` | `stonks` | Database user |
| `POSTGRES_PASSWORD` | `stonks_dev` | Database password |
| `REDIS_HOST` | `localhost` | Redis host |
| `REDIS_PORT` | `6379` | Redis port |
| `REDIS_PASSWORD` | *(none)* | Redis password (not set in dev) |
| `MINIO_ENDPOINT` | `localhost:9000` | MinIO API endpoint |
| `MINIO_ACCESS_KEY` | `minioadmin` | MinIO access key |
| `MINIO_SECRET_KEY` | `minioadmin` | MinIO secret key |
| `OLLAMA_BASE_URL` | `http://localhost:11434` | Ollama API URL |
| `OLLAMA_MODEL` | `qwen3.5:9b` | LLM model name |
### Trading
| Variable | Default | Description |
|----------|---------|-------------|
| `BROKER_MODE` | `paper` | Trading mode (`paper` or `live`) |
| `BROKER_PROVIDER` | `alpaca` | Broker provider |
| `BROKER_BASE_URL` | *(none)* | Alpaca API URL (set to `https://paper-api.alpaca.markets`) |
---
## 11. Integration Tests
The integration test pipeline validates all API endpoints against a live Kubernetes sandbox with realistic seed data. It deploys ephemeral infrastructure (PostgreSQL, Redis, MinIO), seeds deterministic test data, deploys all API services, and runs the full test suite with profiling.
### Prerequisites
- `kubectl` configured with access to a Kubernetes cluster
- Docker images built and pushed to GHCR (or use `:latest`)
- `envsubst` available (usually part of `gettext` package)
- `GHCR_TOKEN` environment variable set for image pulls (optional if images are public)
### Running the Full Pipeline
```bash
# Run with latest images
bash infra/inttest/run_pipeline.sh
# Run with a specific image tag
bash infra/inttest/run_pipeline.sh --image-tag abc123
# Keep the sandbox running for debugging
bash infra/inttest/run_pipeline.sh --skip-teardown
# Custom namespace and results file
bash infra/inttest/run_pipeline.sh --namespace my-test --results-file results.json
```
### CLI Options
| Option | Default | Description |
|--------|---------|-------------|
| `--image-tag TAG` | `latest` | Docker image tag to deploy |
| `--namespace NAME` | `stonks-inttest-<timestamp>` | Kubernetes namespace name |
| `--skip-teardown` | `false` | Leave namespace running after tests |
| `--results-file PATH` | `inttest-results.json` | Path for JSON results output |
### Exit Codes
| Code | Meaning |
|------|---------|
| 0 | All tests passed |
| 1 | One or more test failures |
| 2 | Infrastructure setup failure |
### JSON Result Contract
The pipeline produces a JSON results file (`inttest-results.json` by default) with this structure:
```json
{
"run_id": "stonks-inttest-1705312800",
"image_tag": "abc123",
"started_at": "2025-01-15T12:00:00Z",
"completed_at": "2025-01-15T12:07:30Z",
"exit_code": 0,
"stages": {
"infra_deploy": {"duration_s": 45, "status": "ok"},
"seed_data": {"duration_s": 8, "status": "ok"},
"service_deploy": {"duration_s": 32, "status": "ok"},
"integration_tests": {"duration_s": 28, "status": "ok"},
"teardown": {"duration_s": 5, "status": "ok"}
},
"tests": {"total": 41, "passed": 41, "failed": 0, "errors": 0},
"profiling": {
"endpoints": {"/api/companies": {"p50_ms": 12, "p95_ms": 25, "p99_ms": 45}},
"slow_endpoints": []
}
}
```
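
A consumer of this file might look like the sketch below, which loads the results, lists stages that did not finish cleanly, and prints a one-line summary. The example above only shows the status `"ok"`, so treating every other status as a failure is an assumption, and the function names are hypothetical.

```python
import json

# Meanings copied from the exit code table above.
EXIT_MEANINGS = {
    0: "all tests passed",
    1: "one or more test failures",
    2: "infrastructure setup failure",
}


def load_results(path: str = "inttest-results.json") -> dict:
    """Read the JSON results file written by the pipeline."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def failed_stages(results: dict) -> list[str]:
    """Return stage names whose status is anything other than "ok" (assumed failure)."""
    return [
        name
        for name, stage in results.get("stages", {}).items()
        if stage.get("status") != "ok"
    ]


def summarize(results: dict) -> str:
    """Format a one-line summary of the run."""
    tests = results["tests"]
    code = results["exit_code"]
    meaning = EXIT_MEANINGS.get(code, "unknown exit code")
    return f"{results['run_id']}: {tests['passed']}/{tests['total']} passed, exit {code} ({meaning})"
```

For example, `print(summarize(load_results()))` after a run gives a line suitable for a CI log or chat notification.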
### Running Tests Locally (Development)
For faster iteration during development, you can run individual test files against local services:
```bash
# Start local services first (query-api on 8000, registry on 8001, etc.)
# Then run specific test files:
.venv/bin/python -m pytest tests/integration/test_query_api.py -v --tb=short
.venv/bin/python -m pytest tests/integration/test_registry_api.py -v --tb=short
.venv/bin/python -m pytest tests/integration/test_frontend_data_deps.py -v --tb=short
# Run with profiling output:
.venv/bin/python -m pytest tests/integration/ -v --profiling-output=profiling.json
```
Set the service URLs via environment variables:
```bash
export QUERY_API_URL=http://localhost:8000
export REGISTRY_API_URL=http://localhost:8001
export RISK_API_URL=http://localhost:8002
export TRADING_API_URL=http://localhost:8003
```
### Future: CI/CD Pipeline
This integration test runner is designed as a standalone foundation. A future CI/CD pipeline spec will consume it as one stage in a larger pipeline that includes:
- Self-hosted builds on gremlin nodes (no GitHub Actions compute costs)
- Staged promotion: beta → paper → live
- Market-hours promotion blockers (9:30–16:00 ET)
- Break-glass emergency deploy to production
- Per-stage enable/disable toggles
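
The market-hours blocker is not specified yet, so the following is purely an illustration of how such a check could work. It assumes the window is weekdays from 9:30 (inclusive) to 16:00 (exclusive) in `America/New_York` and ignores exchange holidays; `promotion_blocked` is a hypothetical name.

```python
from datetime import datetime, time
from zoneinfo import ZoneInfo

EASTERN = ZoneInfo("America/New_York")
MARKET_OPEN = time(9, 30)
MARKET_CLOSE = time(16, 0)


def promotion_blocked(now: datetime) -> bool:
    """Return True if `now` falls inside the assumed market-hours window.

    `now` must be timezone-aware; it is converted to Eastern time first.
    Weekends are never blocked, and holidays are not considered.
    """
    local = now.astimezone(EASTERN)
    if local.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return False
    return MARKET_OPEN <= local.time() < MARKET_CLOSE
```

A promotion step could call `promotion_blocked(datetime.now(tz=EASTERN))` and refuse to proceed while it returns `True`, unless the break-glass path is used.
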
---
## Troubleshooting
### "Connection refused" to PostgreSQL/Redis/MinIO
Make sure Docker Desktop is running and `docker compose ps` shows all services healthy. On Windows, `localhost` should work since Docker Desktop maps ports to the host.
### Ollama model not found
Run `docker exec stonks-oracle-ollama-1 ollama pull qwen3.5:9b` and wait for the download to complete. Check available models with `docker exec stonks-oracle-ollama-1 ollama list`.
### Ollama is slow (no GPU)
Without a GPU, Ollama runs on CPU and extraction takes 2-5 minutes per document. If you have an NVIDIA GPU, ensure Docker Desktop has GPU support enabled and the NVIDIA Container Toolkit is installed. See [Ollama Docker GPU docs](https://github.com/ollama/ollama/blob/main/docs/docker.md).
### Migrations didn't run
If the database is empty, the migrations may not have run on first start. You can apply them manually:
```powershell
# Apply the migrations in order (PowerShell does not support `<` input redirection, so pipe each file in)
Get-Content infra\migrations\001_initial_schema.sql | docker exec -i stonks-oracle-postgres-1 psql -U stonks -d stonks
Get-Content infra\migrations\002_documents_and_intelligence.sql | docker exec -i stonks-oracle-postgres-1 psql -U stonks -d stonks
# ... repeat for all 30 migration files, in order
```
Or run them all at once:
```powershell
Get-ChildItem infra\migrations\*.sql | Sort-Object Name | ForEach-Object {
Write-Host "Applying $($_.Name)..."
Get-Content $_.FullName | docker exec -i stonks-oracle-postgres-1 psql -U stonks -d stonks
}
```
### Frontend can't reach the API
When running the frontend with `npm run dev`, Vite proxies `/api/` requests. Make sure the Query API is running on port 8000. If using different ports, set the Vite env vars:
```powershell
$env:VITE_QUERY_API_URL = "http://localhost:8000"
$env:VITE_SYMBOL_REGISTRY_URL = "http://localhost:8001"
$env:VITE_RISK_ENGINE_URL = "http://localhost:8002"
npm run dev
```
### WSL 2 memory issues
Docker Desktop on WSL 2 can consume a lot of memory. Create or edit `%USERPROFILE%\.wslconfig`:
```ini
[wsl2]
memory=8GB
processors=4
```
Then restart WSL: `wsl --shutdown` from PowerShell.
---
## Stopping Everything
```powershell
# Stop infrastructure
docker compose down
# Stop infrastructure AND delete all data (fresh start)
docker compose down -v
```
The `-v` flag removes the named volumes (database data, MinIO objects, Ollama models). Omit it to preserve data between restarts.