Celes Renata c85c0068a2 fix: clean up utcnow deprecation warnings, fix 12 failing tests, add CI/CD pipeline manifests
- Replace all datetime.utcnow() with datetime.now(tz=timezone.utc) across 8 files
- Fix 12 failing tests to match current implementation behavior
- Fix pytest_plugins in non-top-level conftest (moved to root conftest.py)
- Auto-fix 189 lint issues (import sorting, unused imports)
- Add CI/CD pipeline infrastructure (ARC, ArgoCD, Kargo manifests)
- Add values-beta.yaml and values-paper.yaml for staged deployments
- Update GitHub Actions workflow to use self-hosted-gremlin runners
- Add integration-test job to CI pipeline

Result: 1596 passed, 0 failed, 0 warnings
2026-04-18 03:59:28 +00:00


Stonks Oracle — Local Development Setup (Windows + Docker Desktop)

This guide walks you through setting up Stonks Oracle on a Windows machine using Docker Desktop. By the end you will have the full platform running locally: PostgreSQL, Redis, MinIO, Ollama, Trino, and all application services.

Prerequisites

  • Windows 10/11 with WSL 2 enabled
  • Docker Desktop for Windows (with WSL 2 backend)
  • Git (Git for Windows or via WSL)
  • Python 3.12 (for running services outside Docker during development)
  • Node.js 24 (for frontend development)

Install Docker Desktop

  1. Download from https://www.docker.com/products/docker-desktop/
  2. During install, ensure "Use WSL 2 instead of Hyper-V" is checked
  3. After install, open Docker Desktop → Settings → Resources → WSL Integration → enable for your distro
  4. Allocate at least 8 GB RAM and 4 CPUs in Settings → Resources (Ollama needs room)

Install Python 3.12

Download from https://www.python.org/downloads/ and check "Add Python to PATH" during install. Or use winget install Python.Python.3.12 from PowerShell.

Install Node.js 24

Download from https://nodejs.org/ (LTS or Current, 24.x). Or use winget install OpenJS.NodeJS.


1. Register for API Accounts

You need two API accounts. Both have free tiers that work for development.

Polygon.io (Market Data)

  1. Go to https://polygon.io/
  2. Sign up for a free account
  3. Navigate to Dashboard → API Keys
  4. Copy your API key — this becomes MARKET_DATA_API_KEY

The free tier gives you delayed data and limited API calls. Paid tiers ($29+/mo) give real-time data and higher rate limits.

Alpaca (Paper Trading)

  1. Go to https://alpaca.markets/
  2. Sign up for a free account
  3. Navigate to the Paper Trading dashboard (not live)
  4. Go to API Keys → Generate New Key
  5. Copy both the API Key ID and Secret Key — these become BROKER_API_KEY and BROKER_API_SECRET
  6. Your paper trading base URL is https://paper-api.alpaca.markets

Alpaca paper trading is completely free with no time limit.


2. Clone the Repository

git clone https://github.com/celesrenata/stonks-oracle.git
cd stonks-oracle

3. Create Your Environment File

Create a .env file in the project root with your API keys:

# Polygon.io
MARKET_DATA_API_KEY=your_polygon_api_key_here

# Alpaca Paper Trading
BROKER_API_KEY=your_alpaca_key_id_here
BROKER_API_SECRET=your_alpaca_secret_key_here
BROKER_BASE_URL=https://paper-api.alpaca.markets
BROKER_MODE=paper

This file is gitignored. Keep it safe.
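Before starting any services, it can help to confirm that the .env file defines the keys this guide relies on. The following is a minimal sketch, not part of the repository: parse_env and missing_keys are hypothetical helper names, and the parser only understands simple KEY=value lines and # comments.

```python
from pathlib import Path

# Keys this guide asks you to put in .env (taken from the example above).
REQUIRED_KEYS = ("MARKET_DATA_API_KEY", "BROKER_API_KEY", "BROKER_API_SECRET")


def parse_env(text: str) -> dict:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: dict) -> list:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    if not env_path.exists():
        print("No .env file found in the current directory")
    else:
        missing = missing_keys(parse_env(env_path.read_text(encoding="utf-8")))
        print("Missing keys:", ", ".join(missing) if missing else "none")
```

Run it from the project root. It only checks that the keys are present, not that the values are valid credentials.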


4. Start Infrastructure Services

The docker-compose.yml in the project root defines all infrastructure services. Start them:

docker compose up -d

This starts:

Service         Port                        Purpose
PostgreSQL 16   5432                        Primary database
Redis 7         6379                        Job queues and caching
MinIO           9000 (API), 9001 (Console)  Object storage for artifacts
Ollama          11434                       Local LLM inference
Trino           8080                        SQL query engine for lakehouse
Hive Metastore  9083                        Metadata catalog for Trino
Superset        8088                        Analytics dashboards

The minio-init sidecar automatically creates the required storage buckets.

Verify everything is running

docker compose ps

All services should show running (healthy). Give it 30-60 seconds for health checks to pass.
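If you prefer a scripted check alongside docker compose ps, the sketch below probes the host ports from the service table. It is an illustration under stated assumptions: port_open is a hypothetical helper, and an open port only shows that something is listening, not that the service has passed its health check.

```python
import socket

# Host ports as listed in the service table of this guide.
SERVICE_PORTS = {
    "PostgreSQL": 5432,
    "Redis": 6379,
    "MinIO API": 9000,
    "MinIO Console": 9001,
    "Ollama": 11434,
    "Trino": 8080,
    "Hive Metastore": 9083,
    "Superset": 8088,
}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, port in SERVICE_PORTS.items():
        state = "open" if port_open("localhost", port) else "closed"
        print(f"{name:<15} {port:>5}  {state}")
```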

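Access the UIs

  • MinIO Console: http://localhost:9001
  • Superset: http://localhost:8088
  • Trino: http://localhost:8080
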

5. Pull the Ollama Model

Stonks Oracle uses the qwen3.5:9b model for document extraction and event classification. Pull it:

docker exec -it stonks-oracle-ollama-1 ollama pull qwen3.5:9b

This downloads ~5 GB. If you have a GPU and want faster inference, make sure Docker Desktop has GPU passthrough enabled (Settings → Resources → GPU). Ollama will auto-detect CUDA GPUs.

To verify the model is available:

docker exec stonks-oracle-ollama-1 ollama list
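
The same check can be made over HTTP, which is closer to how a service would talk to Ollama. This is a minimal sketch: it assumes Ollama is reachable on the default port from this guide and uses its list-models endpoint (GET /api/tags). The helper names model_names and fetch_tags are hypothetical.

```python
import json
import urllib.request

OLLAMA_BASE_URL = "http://localhost:11434"  # default used in this guide
MODEL = "qwen3.5:9b"


def model_names(tags_payload: dict) -> list:
    """Extract model names from an Ollama /api/tags response body."""
    return [m.get("name", "") for m in tags_payload.get("models", [])]


def fetch_tags(base_url: str = OLLAMA_BASE_URL, timeout: float = 5.0) -> dict:
    """Call Ollama's list-models endpoint and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    try:
        names = model_names(fetch_tags())
    except OSError as exc:  # URLError is a subclass of OSError
        print(f"Could not reach Ollama: {exc}")
    else:
        print(f"{MODEL} available:", MODEL in names)
```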

6. Set Up the Python Environment

python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt

Verify the database migrations ran

PostgreSQL auto-runs the migration SQL files from infra/migrations/ on first start (they are mounted into /docker-entrypoint-initdb.d). Verify:

python -c "import asyncio, asyncpg; conn = asyncio.run(asyncpg.connect('postgresql://stonks:stonks_dev@localhost:5432/stonks')); print('Connected!')"

Or more simply, run psql inside the Postgres container to list the tables:

docker exec stonks-oracle-postgres-1 psql -U stonks -d stonks -c "\dt"

You should see tables like companies, documents, trend_windows, recommendations, orders, etc.

Seed the company universe

python -m services.symbol_registry.seed

This populates 50 companies across 10 sectors and 46 competitor relationships.


7. Run the Application Services

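You can run services directly with Python (Option A, best for development) or build and run them as Docker containers (Option B).

Option A: Run services directly with Python

Open a separate terminal window for each service. Each needs the virtualenv activated and environment variables set. The set commands below use Command Prompt syntax; in PowerShell, use $env:NAME = "value" instead: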

# Terminal 1 — Scheduler (triggers ingestion on a cadence)
.venv\Scripts\activate
set MARKET_DATA_API_KEY=your_polygon_key
python -m services.scheduler.app

# Terminal 2 — Ingestion (fetches articles, filings, market data)
.venv\Scripts\activate
set MARKET_DATA_API_KEY=your_polygon_key
python -m services.ingestion.worker

# Terminal 3 — Parser (normalizes raw documents)
.venv\Scripts\activate
python -m services.parser.worker

# Terminal 4 — Extractor (LLM-based intelligence extraction)
.venv\Scripts\activate
python -m services.extractor.main

# Terminal 5 — Aggregation (merges signals into trend summaries)
.venv\Scripts\activate
python -m services.aggregation.main

# Terminal 6 — Recommendation (generates trade recommendations)
.venv\Scripts\activate
python -m services.recommendation.main

# Terminal 7 — Query API (REST API for the dashboard)
.venv\Scripts\activate
uvicorn services.api.app:app --host 0.0.0.0 --port 8000

# Terminal 8 — Symbol Registry (company CRUD API)
.venv\Scripts\activate
uvicorn services.symbol_registry.app:app --host 0.0.0.0 --port 8001

# Terminal 9 — Risk Engine
.venv\Scripts\activate
uvicorn services.risk.app:app --host 0.0.0.0 --port 8002

# Terminal 10 — Trading Engine (autonomous paper trading)
.venv\Scripts\activate
set BROKER_API_KEY=your_alpaca_key
set BROKER_API_SECRET=your_alpaca_secret
set BROKER_BASE_URL=https://paper-api.alpaca.markets
uvicorn services.trading.app:app --host 0.0.0.0 --port 8003

# Terminal 11 — Broker Adapter (executes trades via Alpaca)
.venv\Scripts\activate
set BROKER_API_KEY=your_alpaca_key
set BROKER_API_SECRET=your_alpaca_secret
set BROKER_BASE_URL=https://paper-api.alpaca.markets
python -m services.adapters.broker_service

Not all services are required for basic development. The minimum set is:

  • scheduler + ingestion + parser + extractor — to get data flowing
  • aggregation + recommendation — to generate signals
  • query-api — to serve the dashboard

Add the trading services when you want to test paper trading.

Option B: Build and run as Docker containers

# Build the Python service image
docker build -t stonks-oracle/services -f docker/Dockerfile .

# Build the frontend
docker build -t stonks-oracle/dashboard -f frontend/Dockerfile frontend/

# Run a service (example: scheduler)
docker run --rm --network host ^
  -e MARKET_DATA_API_KEY=your_polygon_key ^
  -e POSTGRES_HOST=localhost ^
  -e REDIS_HOST=localhost ^
  -e MINIO_ENDPOINT=localhost:9000 ^
  -e OLLAMA_BASE_URL=http://localhost:11434 ^
  -e SERVICE_CMD="python -m services.scheduler.app" ^
  stonks-oracle/services

8. Run the Frontend Dashboard

cd frontend
npm install
npm run dev

The dashboard starts at http://localhost:5173. It proxies API requests to the backend services via Vite's dev server.

For production-like testing, build and serve with nginx:

cd frontend
docker build -t stonks-oracle/dashboard .
docker run --rm -p 8080:8080 --network host stonks-oracle/dashboard

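Then visit http://localhost:8080. Note that the compose stack also publishes Trino on host port 8080, so the two can conflict; stop the Trino container first if the dashboard fails to bind.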


9. Run Tests

Python tests

.venv\Scripts\activate
pip install ruff pytest pytest-asyncio hypothesis
ruff check services/
python -m pytest tests/ -x --tb=short -q

Frontend tests

cd frontend
npx vitest --run

10. How the Pipeline Works

Once services are running, the data flows automatically:

Scheduler (every 15s)
  → enqueues ingestion jobs for due sources
    → Ingestion fetches articles/filings/market data from Polygon
      → Parser normalizes raw text
        → Extractor calls Ollama to extract structured intelligence
          → Aggregation merges signals into trend summaries
            → Recommendation generates buy/sell/watch signals
              → Trading Engine evaluates and executes paper trades
                → Broker Adapter submits orders to Alpaca

Monitor the pipeline via Redis queues:

docker exec stonks-oracle-redis-1 redis-cli llen stonks:queue:ingestion
docker exec stonks-oracle-redis-1 redis-cli llen stonks:queue:parsing
docker exec stonks-oracle-redis-1 redis-cli llen stonks:queue:extraction
docker exec stonks-oracle-redis-1 redis-cli llen stonks:queue:aggregation
docker exec stonks-oracle-redis-1 redis-cli llen stonks:queue:recommendation
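
To watch all five queues at once, the same LLEN calls can be wrapped in a few lines of Python. This is a sketch under stated assumptions: queue_depths and first_backlog are hypothetical helpers, the threshold of 100 is an arbitrary illustration, and the client is any object with a Redis-style llen method.

```python
from typing import Optional, Protocol

# Queue names in pipeline order, taken from the redis-cli commands above.
QUEUES = ("ingestion", "parsing", "extraction", "aggregation", "recommendation")


class SupportsLlen(Protocol):
    """Anything with a Redis-style llen(name) method."""

    def llen(self, name: str) -> int: ...


def queue_depths(client: SupportsLlen, prefix: str = "stonks:queue:") -> dict:
    """Return {stage: pending job count} for every pipeline queue."""
    return {stage: int(client.llen(prefix + stage)) for stage in QUEUES}


def first_backlog(depths: dict, threshold: int = 100) -> Optional[str]:
    """Return the earliest stage whose queue is deeper than threshold, if any."""
    for stage in QUEUES:
        if depths.get(stage, 0) > threshold:
            return stage
    return None
```

With the redis package installed (pip install redis), passing redis.Redis(host="localhost", port=6379) as the client should work, since it exposes llen. A queue that keeps growing usually points at the worker consuming it, for example a slow extractor when Ollama runs on CPU.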

Environment Variable Reference

All services read configuration from environment variables with sensible defaults for local development. You only need to set the ones that differ from defaults.

Required (no useful default)

Variable             Description
MARKET_DATA_API_KEY  Polygon.io API key
BROKER_API_KEY       Alpaca API key ID
BROKER_API_SECRET    Alpaca API secret

Infrastructure (defaults work with docker-compose)

Variable           Default                 Description
POSTGRES_HOST      localhost               PostgreSQL host
POSTGRES_PORT      5432                    PostgreSQL port
POSTGRES_DB        stonks                  Database name
POSTGRES_USER      stonks                  Database user
POSTGRES_PASSWORD  stonks_dev              Database password
REDIS_HOST         localhost               Redis host
REDIS_PORT         6379                    Redis port
REDIS_PASSWORD     (none)                  Redis password (not set in dev)
MINIO_ENDPOINT     localhost:9000          MinIO API endpoint
MINIO_ACCESS_KEY   minioadmin              MinIO access key
MINIO_SECRET_KEY   minioadmin              MinIO secret key
OLLAMA_BASE_URL    http://localhost:11434  Ollama API URL
OLLAMA_MODEL       qwen3.5:9b              LLM model name
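
The "environment variable with a local default" pattern can be sketched as follows. This guide does not show how the services actually load their configuration, so treat this as an illustration only: setting and postgres_dsn are hypothetical names, and the defaults are copied from the table above.

```python
import os
from collections.abc import Mapping
from urllib.parse import quote

# Defaults copied from the infrastructure table in this guide.
DEFAULTS = {
    "POSTGRES_HOST": "localhost",
    "POSTGRES_PORT": "5432",
    "POSTGRES_DB": "stonks",
    "POSTGRES_USER": "stonks",
    "POSTGRES_PASSWORD": "stonks_dev",
}


def setting(name: str, env: Mapping = os.environ) -> str:
    """Return env[name] if set and non-empty, else the documented default."""
    return env.get(name) or DEFAULTS[name]


def postgres_dsn(env: Mapping = os.environ) -> str:
    """Build a PostgreSQL connection URL from the environment."""
    user = quote(setting("POSTGRES_USER", env), safe="")
    password = quote(setting("POSTGRES_PASSWORD", env), safe="")
    host = setting("POSTGRES_HOST", env)
    port = setting("POSTGRES_PORT", env)
    database = setting("POSTGRES_DB", env)
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"
```

With nothing set, postgres_dsn() yields the same connection URL used in the connectivity check earlier in this guide. Overriding a single variable, such as POSTGRES_HOST, changes only that part.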

Trading

Variable         Default  Description
BROKER_MODE      paper    Trading mode (paper or live)
BROKER_PROVIDER  alpaca   Broker provider
BROKER_BASE_URL  (none)   Alpaca API URL (set to https://paper-api.alpaca.markets)

11. Integration Tests

The integration test pipeline validates all API endpoints against a live Kubernetes sandbox with realistic seed data. It deploys ephemeral infrastructure (PostgreSQL, Redis, MinIO), seeds deterministic test data, deploys all API services, and runs the full test suite with profiling.

Prerequisites

  • kubectl configured with access to a Kubernetes cluster
  • Docker images built and pushed to GHCR (or use :latest)
  • envsubst available (usually part of gettext package)
  • GHCR_TOKEN environment variable set for image pulls (optional if images are public)

Running the Full Pipeline

# Run with latest images
bash infra/inttest/run_pipeline.sh

# Run with a specific image tag
bash infra/inttest/run_pipeline.sh --image-tag abc123

# Keep the sandbox running for debugging
bash infra/inttest/run_pipeline.sh --skip-teardown

# Custom namespace and results file
bash infra/inttest/run_pipeline.sh --namespace my-test --results-file results.json

CLI Options

Option               Default                     Description
--image-tag TAG      latest                      Docker image tag to deploy
--namespace NAME     stonks-inttest-<timestamp>  Kubernetes namespace name
--skip-teardown      false                       Leave namespace running after tests
--results-file PATH  inttest-results.json        Path for JSON results output

Exit Codes

Code  Meaning
0     All tests passed
1     One or more test failures
2     Infrastructure setup failure

JSON Result Contract

The pipeline produces a JSON results file (inttest-results.json by default) with this structure:

{
  "run_id": "stonks-inttest-1705312800",
  "image_tag": "abc123",
  "started_at": "2025-01-15T12:00:00Z",
  "completed_at": "2025-01-15T12:07:30Z",
  "exit_code": 0,
  "stages": {
    "infra_deploy": {"duration_s": 45, "status": "ok"},
    "seed_data": {"duration_s": 8, "status": "ok"},
    "service_deploy": {"duration_s": 32, "status": "ok"},
    "integration_tests": {"duration_s": 28, "status": "ok"},
    "teardown": {"duration_s": 5, "status": "ok"}
  },
  "tests": {"total": 41, "passed": 41, "failed": 0, "errors": 0},
  "profiling": {
    "endpoints": {"/api/companies": {"p50_ms": 12, "p95_ms": 25, "p99_ms": 45}},
    "slow_endpoints": []
  }
}
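
A consumer of this file might summarize a run as in the sketch below. It relies only on the fields shown in the example above. The helper names are hypothetical, and treating any stage status other than "ok" as a failure is an assumption, since the example shows only "ok".

```python
import json
from datetime import datetime
from pathlib import Path


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp with a trailing Z as UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def wall_clock_seconds(results: dict) -> float:
    """Seconds between started_at and completed_at."""
    delta = parse_ts(results["completed_at"]) - parse_ts(results["started_at"])
    return delta.total_seconds()


def failed_stages(results: dict) -> list:
    """Names of stages whose status is not "ok" (assumed to mean failure)."""
    return [name for name, info in results["stages"].items()
            if info.get("status") != "ok"]


def summarize(results: dict) -> str:
    """One-line summary of a pipeline run."""
    tests = results["tests"]
    return (f"{results['run_id']}: {tests['passed']}/{tests['total']} passed, "
            f"exit code {results['exit_code']}, {wall_clock_seconds(results):.0f}s")


if __name__ == "__main__":
    path = Path("inttest-results.json")  # default results file name
    if path.exists():
        print(summarize(json.loads(path.read_text(encoding="utf-8"))))
    else:
        print(f"{path} not found; run the pipeline first")
```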

Running Tests Locally (Development)

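For faster iteration during development, you can run individual test files against local services. The commands below use Linux/WSL syntax; in a Windows shell, use .venv\Scripts\python instead of .venv/bin/python and set instead of export: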

# Start local services first (query-api on 8000, registry on 8001, etc.)
# Then run specific test files:
.venv/bin/python -m pytest tests/integration/test_query_api.py -v --tb=short
.venv/bin/python -m pytest tests/integration/test_registry_api.py -v --tb=short
.venv/bin/python -m pytest tests/integration/test_frontend_data_deps.py -v --tb=short

# Run with profiling output:
.venv/bin/python -m pytest tests/integration/ -v --profiling-output=profiling.json

Set the service URLs via environment variables:

export QUERY_API_URL=http://localhost:8000
export REGISTRY_API_URL=http://localhost:8001
export RISK_API_URL=http://localhost:8002
export TRADING_API_URL=http://localhost:8003

Future: CI/CD Pipeline

This integration test runner is designed as a standalone foundation. A future CI/CD pipeline spec will consume it as one stage in a larger pipeline that includes:

  • Self-hosted builds on gremlin nodes (no GitHub Actions compute costs)
  • Staged promotion: beta → paper → live
  • Market-hours promotion blockers (9:30-16:00 ET)
  • Break-glass emergency deploy to production
  • Per-stage enable/disable toggles

Troubleshooting

"Connection refused" to PostgreSQL/Redis/MinIO

Make sure Docker Desktop is running and docker compose ps shows all services healthy. On Windows, localhost should work since Docker Desktop maps ports to the host.

Ollama model not found

Run docker exec stonks-oracle-ollama-1 ollama pull qwen3.5:9b and wait for the download to complete. Check available models with docker exec stonks-oracle-ollama-1 ollama list.

Ollama is slow (no GPU)

Without a GPU, Ollama runs on CPU and extraction takes 2-5 minutes per document. If you have an NVIDIA GPU, ensure Docker Desktop has GPU support enabled and the NVIDIA Container Toolkit is installed. See Ollama Docker GPU docs.

Migrations didn't run

If the database is empty, the migrations may not have run on first start. You can apply them manually from Command Prompt (PowerShell does not support the < input redirect):

# Connect to postgres and run migrations in order
docker exec -i stonks-oracle-postgres-1 psql -U stonks -d stonks < infra/migrations/001_initial_schema.sql
docker exec -i stonks-oracle-postgres-1 psql -U stonks -d stonks < infra/migrations/002_documents_and_intelligence.sql
# ... repeat for all 030 migration files

Or run them all at once from PowerShell:

Get-ChildItem infra\migrations\*.sql | Sort-Object Name | ForEach-Object {
    Write-Host "Applying $($_.Name)..."
    Get-Content $_.FullName | docker exec -i stonks-oracle-postgres-1 psql -U stonks -d stonks
}

Frontend can't reach the API

When running the frontend with npm run dev, Vite proxies /api/ requests. Make sure the Query API is running on port 8000. If using different ports, set the Vite env vars:

set VITE_QUERY_API_URL=http://localhost:8000
set VITE_SYMBOL_REGISTRY_URL=http://localhost:8001
set VITE_RISK_ENGINE_URL=http://localhost:8002
npm run dev

WSL 2 memory issues

Docker Desktop on WSL 2 can consume a lot of memory. Create or edit %USERPROFILE%\.wslconfig:

[wsl2]
memory=8GB
processors=4

Then restart WSL by running wsl --shutdown from PowerShell.


Stopping Everything

# Stop infrastructure
docker compose down

# Stop infrastructure AND delete all data (fresh start)
docker compose down -v

The -v flag removes the named volumes (database data, MinIO objects, Ollama models). Omit it to preserve data between restarts.