admin/stonks-oracle

Celes Renata 007189c0a5


fix: handle plain-text thinking blocks and disable think mode

The model outputs 'Thinking Process:' as plain text (not in <think> tags).
Updated _strip_thinking_block to handle both XML tags and plain-text
reasoning patterns. Also:
- Added rule 7 to system prompt: 'Do NOT show your thinking process'
- Set think=False in Ollama payload to disable Qwen3 thinking mode
- Added fallback regex to extract thesis from after thinking blocks

2026-04-29 15:50:49 +00:00


Stonks Oracle

Licensed under the Business Source License 1.1. Production use requires written approval from the author. See LICENSE for details.

AI-powered market intelligence and autonomous paper-trading platform. Ingests market data, company news, and regulatory filings; extracts structured intelligence with local LLMs; aggregates signals across three layers (company, macro, competitive); and autonomously executes paper trades — all self-hosted on Kubernetes.

Documentation

Document	Description
Service Reference	All 13 services — purpose, configuration, queue topology, database tables
API Reference	Complete endpoint reference for Query API, Symbol Registry, Trading, and Risk services
Helm Chart Reference	All Helm values: services, config, secrets, ingress, network policies, analytics stack
Docker Deployment Guide	Docker Compose setup, environment variables, volumes, operational commands
Kubernetes Architecture	Mermaid diagram of the K8s deployment topology, namespaces, ingress, and secrets
Docker Compose Architecture	Mermaid diagram of all containers, port mappings, volumes, and dependencies
Data Pipeline Architecture	Mermaid diagram of the end-to-end data pipeline, queue topology, and signal layers
AI Agents Guide	Built-in agents, variant management, prompt tuning, and performance monitoring
Backup & Restore Guide	Backup scripts, restore procedures, retention policies, and disaster recovery
Observability Reference	Prometheus metrics, alerting rules, structured logging, and dead-letter queues

What It Does

Stonks Oracle tracks 50 companies across 10 sectors. It monitors multiple data sources, runs every article and filing through a local Ollama model to extract structured intelligence, aggregates those signals into rolling trend summaries with contradiction detection, and generates explainable trade recommendations. An autonomous trading engine then evaluates those recommendations and executes paper trades through Alpaca without manual intervention.

Everything is auditable — raw artifacts, prompts, model outputs, decision traces, and trade execution logs are preserved. Historical data flows into a MinIO-backed lakehouse queryable via Trino and visualized through Superset dashboards and a built-in React dashboard.

Architecture

flowchart LR
    subgraph sources ["Data Sources"]
        polygon["Polygon.io"]
        sec["SEC EDGAR"]
        macro_src["Macro News"]
    end

    subgraph pipeline ["Signal Processing"]
        scheduler["Scheduler"]
        ingestion["Ingestion"]
        parser["Parser"]
        extractor["Extractor"]
        aggregation["Aggregation"]
        recommendation["Recommendation"]
    end

    subgraph trading ["Trading"]
        risk["Risk Engine"]
        engine["Trading Engine"]
        broker["Broker Adapter"]
        alpaca["Alpaca (paper)"]
    end

    subgraph analytics ["Analytics"]
        lake["Lake Publisher"]
        trino["Trino"]
        superset["Superset"]
        dashboard["Dashboard"]
    end

    sources --> scheduler --> ingestion --> parser --> extractor --> aggregation --> recommendation
    recommendation --> risk --> engine --> broker --> alpaca
    aggregation --> lake --> trino --> superset
    trino --> dashboard

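For detailed architecture diagrams, see the Kubernetes Architecture, Docker Compose Architecture, and Data Pipeline Architecture documents listed under Documentation.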

Two planes:

Operational — ingestion, parsing, extraction, aggregation, recommendations, risk evaluation, autonomous trading, trade execution (PostgreSQL, Redis, MinIO)
Analytical — historical fact tables, SQL queries, dashboards (MinIO/Parquet, Trino, Superset)

Signal Layers

The aggregation engine merges signals from three independent layers via a unified WeightedSignal abstraction. The macro and competitive layers can each be toggled at runtime; no restart required.

Layer	Source	What It Does
Layer 1: Company	News, filings, market data	Document intelligence extraction → per-company impact records → trend windows
Layer 2: Macro	Global news, geopolitical events	Ollama-based event classification → exposure profile matching → per-company macro impact scores
Layer 3: Competitive	Historical platform data	Pattern mining on past catalyst outcomes → cross-company signal propagation via competitor relationships

Pattern-only or macro-only trend shifts are forced to informational mode (suppression safety)
Macro weight default: 0.3, competitive weight default: 0.2
Toggles: macro_enabled and competitive_enabled in risk_configs
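A minimal sketch of how the layer merge and the informational-mode rule could fit together. Only the default weights (0.3 macro, 0.2 competitive), the toggle names, and the suppression rule come from this README. The `WeightedSignal` fields, the company-layer weight of 1.0, the normalized weighted average, and `combine_layers` are illustrative assumptions, not the project's code.

```python
from dataclasses import dataclass

# Macro (0.3) and competitive (0.2) defaults come from the README;
# the company-layer weight of 1.0 is an assumption.
DEFAULT_WEIGHTS = {"company": 1.0, "macro": 0.3, "competitive": 0.2}


@dataclass(frozen=True)
class WeightedSignal:
    """Hypothetical shape of one layer's contribution for a single company."""

    layer: str    # "company", "macro", or "competitive"
    score: float  # directional score, assumed to lie in [-1, 1]
    weight: float


def combine_layers(signals, risk_config):
    """Merge enabled layers into one score and decide whether it may drive trades.

    `risk_config` mirrors the macro_enabled / competitive_enabled toggles.
    Returns (score, mode) with mode "actionable" or "informational".
    """
    enabled = {
        "company": True,
        "macro": risk_config.get("macro_enabled", True),
        "competitive": risk_config.get("competitive_enabled", True),
    }
    active = [s for s in signals if enabled.get(s.layer, False) and s.weight > 0]
    total_weight = sum(s.weight for s in active)
    if total_weight == 0:
        return 0.0, "informational"
    score = sum(s.score * s.weight for s in active) / total_weight
    # Suppression safety: without company-layer evidence the shift is informational only.
    has_company = any(s.layer == "company" for s in active)
    return score, ("actionable" if has_company else "informational")


# Usage: a bullish company signal partly offset by a bearish macro signal.
signals = [
    WeightedSignal("company", 0.6, DEFAULT_WEIGHTS["company"]),
    WeightedSignal("macro", -0.2, DEFAULT_WEIGHTS["macro"]),
]
combine_layers(signals, {"macro_enabled": True})  # (~0.415, "actionable")
```

Normalizing by the total active weight means a disabled layer simply drops out and the remaining layers rescale, which is one plausible way a toggle could take effect without a restart.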

Tracked Universe

50 companies across 10 sectors: Technology, Consumer Cyclical, Financial Services, Healthcare, Energy, Communication Services, Industrials, Consumer Defensive, Real Estate, Utilities.

46 competitor relationships (direct_rival, same_sector, overlapping_products, supply_chain_adjacent).

Seed data: python -m services.symbol_registry.seed

Features

Autonomous Trading Engine

Continuous decision loop that polls for actionable recommendations and executes paper trades without manual intervention. Capabilities include:
Confidence-based position sizing, with sample-size-dampened agreement scoring to prevent thin-evidence inflation
Dynamic ATR-based stop-loss and take-profit
Circuit breakers: daily loss cap, single-position loss, volatility detection
Reserve pool management (auto-siphon from profits)
Risk tier auto-adjustment (conservative/moderate/aggressive based on trailing performance)
Portfolio rebalancing (sector and concentration limits)
Gradual entry (multi-tranche orders)
Correlation-aware diversification
Earnings calendar awareness
Portfolio heat management
Tax-lot tracking with wash sale detection
Performance tracking (Sharpe, drawdown, win rate, profit factor)
Backtesting against historical data
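Two of the sizing and exit mechanisms above, sketched under stated assumptions. The README confirms only that sizing is confidence-based with sample-size dampening and that stops are ATR-based. The n / (n + k) dampening factor, k = 5, the 5% maximum allocation, the simple-mean ATR, and the 2x/3x multipliers are illustrative choices, and names such as `dampened_agreement` and `atr_stops` are hypothetical.

```python
from statistics import mean


def dampened_agreement(agreement, n_samples, k=5):
    """Shrink an agreement ratio toward 0 when few signals back it.

    The factor n / (n + k) tends to 1 as evidence accumulates; k is an assumed prior strength.
    """
    if n_samples <= 0:
        return 0.0
    return agreement * n_samples / (n_samples + k)


def position_size(capital, confidence, agreement, n_samples, max_fraction=0.05):
    """Dollar allocation: a capped fraction of capital scaled by clamped confidence."""
    confidence = max(0.0, min(1.0, confidence))
    return capital * max_fraction * confidence * dampened_agreement(agreement, n_samples)


def average_true_range(highs, lows, closes, period=14):
    """Simple mean of the last `period` true ranges (Wilder smoothing omitted)."""
    true_ranges = [
        max(highs[i] - lows[i], abs(highs[i] - closes[i - 1]), abs(lows[i] - closes[i - 1]))
        for i in range(1, len(closes))
    ]
    if not true_ranges:
        raise ValueError("need at least two bars")
    return mean(true_ranges[-period:])


def atr_stops(entry_price, atr, stop_mult=2.0, target_mult=3.0):
    """Stop-loss and take-profit for a long position at assumed ATR multiples."""
    return entry_price - stop_mult * atr, entry_price + target_mult * atr
```

With one supporting signal the dampening factor is 1/6, so a perfect agreement ratio alone cannot produce a full-size position; that is the kind of thin-evidence inflation the engine is described as guarding against.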

AI Agent Management

Configurable AI agents (document extractor, event classifier, thesis rewriter) with database-driven model/prompt resolution. 60-second TTL cache for hot-swapping models without restarts. Agent performance logging with variant attribution for future A/B testing support.
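The 60-second hot-swap behavior can be illustrated with a small time-to-live cache. This is a sketch only: `TTLCache`, the `loader` callback standing in for the database lookup of an agent's model and prompt, and the injectable clock are assumptions, not the project's API.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Cache agent config lookups for a fixed time-to-live (60 s per the README)."""

    def __init__(self, loader: Callable[[str], dict], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._loader = loader  # hypothetical: e.g. a DB query for the agent's model/prompt
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, agent_name: str) -> dict:
        now = self._clock()
        cached = self._entries.get(agent_name)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # still fresh: serve from memory
        value = self._loader(agent_name)  # miss or expired: reload from the source of truth
        self._entries[agent_name] = (now, value)
        return value
```

A model or prompt changed in the database is picked up by the first lookup after the entry expires, so a swap takes effect within at most one TTL without restarting the service.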

Global News Interpolation

Macro/geopolitical event ingestion from dedicated sources. Ollama-based classification by impact type, severity, affected regions, and sectors. Company exposure profiles (geographic revenue mix, supply chain regions, commodity dependencies, market position tier) map events to per-company macro impact scores with resilience modifiers. Forward-looking trend projections combine company momentum with macro trajectories.
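One hypothetical way to turn an exposure profile into a per-company macro impact score. The README names the inputs (affected regions, geographic revenue mix, commodity dependencies, resilience modifiers) but not the formula; the fields, the 0.5 commodity weight, and the multiplicative `direction * severity * exposure * (1 - resilience)` form below are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ExposureProfile:
    """Hypothetical subset of a company exposure profile."""

    revenue_by_region: dict = field(default_factory=dict)  # region -> revenue share (0..1)
    commodities: set = field(default_factory=set)          # commodity dependencies
    resilience: float = 0.0  # assumed scale: 0 = fully exposed, 1 = fully insulated


@dataclass
class GlobalEvent:
    """Hypothetical classified macro event."""

    direction: float  # -1 (adverse) .. +1 (favorable)
    severity: float   # 0..1
    regions: set = field(default_factory=set)
    commodities: set = field(default_factory=set)


def macro_impact(event, profile, commodity_weight=0.5):
    """direction * severity * exposure * (1 - resilience), with exposure capped at 1."""
    regional = sum(share for region, share in profile.revenue_by_region.items()
                   if region in event.regions)
    commodity = commodity_weight if event.commodities & profile.commodities else 0.0
    exposure = min(1.0, regional + commodity)
    return event.direction * event.severity * exposure * (1.0 - profile.resilience)
```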

Competitive Intelligence

Historical pattern mining on the platform's own data — how similar catalyst types resolved in the past for a company and its competitors. Cross-company signal propagation via competitor relationships. Major corporate decision tracking (M&A, restructuring, leadership changes) with extended lookback windows. Auto-inference of competitor relationships from sector matching and document co-mentions.
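Cross-company propagation could be sketched as below. The four relationship types are the ones listed under Tracked Universe; the signed per-type weights (for example, treating a direct rival's good news as mildly negative) and the `propagate` helper are illustrative assumptions, not documented behavior. Ticker symbols in the test are placeholders.

```python
# Relationship types are from the README; the signed weights are illustrative assumptions.
RELATIONSHIP_WEIGHTS = {
    "direct_rival": -0.5,           # assumed: a rival's gain is partly our loss
    "overlapping_products": -0.3,
    "same_sector": 0.3,             # assumed: sector-wide sentiment spills over
    "supply_chain_adjacent": 0.4,
}


def propagate(source, score, relationships, strength=1.0):
    """Spread one company's signal to related companies.

    relationships: iterable of (symbol_a, symbol_b, relationship_type), treated as undirected.
    Returns {related_symbol: propagated_score}; unknown types contribute 0.
    """
    propagated = {}
    for a, b, rel_type in relationships:
        if source not in (a, b):
            continue
        other = b if a == source else a
        weight = RELATIONSHIP_WEIGHTS.get(rel_type, 0.0)
        propagated[other] = propagated.get(other, 0.0) + score * weight * strength
    return propagated
```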

Data Ingestion

Grouped daily market data from Polygon.io (OHLCV bars, corporate actions)
Company news via news APIs with full article scraping
SEC filings and regulatory events
Macro/geopolitical news from dedicated sources
Content hash deduplication, rate limiting, retries, raw artifact preservation in MinIO
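A minimal sketch of content-hash deduplication and retry with exponential backoff. The SHA-256 choice, the in-memory `SeenHashes` store (a production service would need a persistent one), and the retry parameters are assumptions; rate limiting is not shown.

```python
import hashlib
import random
import time


def content_hash(body: bytes) -> str:
    """SHA-256 hex digest of the raw artifact bytes."""
    return hashlib.sha256(body).hexdigest()


class SeenHashes:
    """In-memory stand-in for a persistent seen-hash store (assumed, e.g. a DB column)."""

    def __init__(self):
        self._seen = set()

    def is_new(self, body: bytes) -> bool:
        digest = content_hash(body)
        if digest in self._seen:
            return False
        self._seen.add(digest)
        return True


def fetch_with_retries(fetch, attempts=4, base_delay=0.5, retry_on=(OSError,), sleep=time.sleep):
    """Call fetch(); on a retryable error wait base_delay * 2**attempt (plus jitter) and retry."""
    for attempt in range(attempts):
        try:
            return fetch()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt) * (1 + random.random()))
```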

AI-Powered Extraction

Local Ollama models with schema-constrained JSON output
Per-document intelligence: sentiment, catalysts, impact horizon, key facts, risks, macro themes
Per-company impact records when a document mentions multiple companies
Schema and semantic validation with retry on invalid outputs
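The extraction loop can be sketched with an injectable `generate` callable so it runs without a model server. In a real service, `generate` would presumably POST the payload to Ollama's `/api/generate` endpoint and return the `response` field. Ollama accepts a JSON schema in `format` for structured outputs, and `think: false` disables thinking mode on models that support it. The schema fields, enum values, prompt text, and helper names are hypothetical.

```python
import json

# Hypothetical subset of the extraction schema; the real one is defined in the services.
EXTRACTION_SCHEMA = {
    "type": "object",
    "properties": {
        "sentiment": {"type": "number", "minimum": -1, "maximum": 1},
        "catalysts": {"type": "array", "items": {"type": "string"}},
        "impact_horizon": {"type": "string", "enum": ["intraday", "short", "medium", "long"]},
    },
    "required": ["sentiment", "catalysts", "impact_horizon"],
}


def build_payload(model, document_text):
    """Request body for Ollama's POST /api/generate, constrained by a JSON schema."""
    return {
        "model": model,
        "prompt": f"Extract structured intelligence as JSON:\n\n{document_text}",
        "format": EXTRACTION_SCHEMA,  # structured output: decoding follows the schema
        "stream": False,
        "think": False,  # turn off thinking mode on models that support it
    }


def validate(result):
    """Checks beyond JSON parsing; returns a list of problems (empty means valid)."""
    if not isinstance(result, dict):
        return ["not a JSON object"]
    problems = [f"missing {key}" for key in EXTRACTION_SCHEMA["required"] if key not in result]
    sentiment = result.get("sentiment")
    if not isinstance(sentiment, (int, float)) or not -1 <= sentiment <= 1:
        problems.append("sentiment out of range")
    allowed = EXTRACTION_SCHEMA["properties"]["impact_horizon"]["enum"]
    if result.get("impact_horizon") not in allowed:
        problems.append("unknown impact_horizon")
    return problems


def extract(generate, model, document_text, max_attempts=3):
    """`generate(payload)` returns the model's raw text; invalid output triggers a retry."""
    payload = build_payload(model, document_text)
    problems = []
    for _ in range(max_attempts):
        try:
            result = json.loads(generate(payload))
        except json.JSONDecodeError:
            problems = ["invalid JSON"]
            continue
        problems = validate(result)
        if not problems:
            return result
    raise ValueError(f"extraction failed after {max_attempts} attempts: {problems}")
```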

Trend Aggregation

Rolling company-level trend summaries across 5 windows (intraday, 1d, 7d, 30d, 90d)
Recency decay, source credibility weighting, document novelty scoring
Contradiction detection with explicit disagreement representation
Sector and market-level rollups incorporating macro event impacts
Forward-looking trend projections with driving factor explanations
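Recency decay, credibility weighting, novelty scoring, and contradiction detection can be combined roughly as follows. This is a sketch under assumptions: the exponential half-life decay, the multiplicative weight, and the 30% two-sided contradiction threshold are illustrative, and each of the five windows would presumably use its own half-life.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    sentiment: float      # -1..1 from extraction
    age_hours: float      # time since publication
    credibility: float    # 0..1 source credibility (assumed scale)
    novelty: float = 1.0  # 0..1, lower for near-duplicate coverage


def trend_score(evidence, half_life_hours=24.0, contradiction_share=0.3):
    """Weighted mean sentiment with exponential recency decay.

    Each item's weight is 0.5 ** (age / half_life) * credibility * novelty. A window is
    flagged as contradictory when both the bullish and the bearish side carry more than
    `contradiction_share` of the total weight.
    """
    bullish = bearish = total = weighted_sum = 0.0
    for e in evidence:
        w = 0.5 ** (e.age_hours / half_life_hours) * e.credibility * e.novelty
        weighted_sum += w * e.sentiment
        total += w
        if e.sentiment > 0:
            bullish += w
        elif e.sentiment < 0:
            bearish += w
    if total == 0:
        return {"score": 0.0, "contradiction": False}
    contradiction = bullish / total > contradiction_share and bearish / total > contradiction_share
    return {"score": weighted_sum / total, "contradiction": contradiction}
```

Reporting the contradiction flag alongside the score, rather than letting opposing evidence silently cancel to zero, is one way to keep disagreement explicit.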

Paper Trading

Alpaca paper trading integration (3 accounts max per Alpaca owner)
Full reset: liquidates broker positions, cancels orders, clears local DB, syncs capital from broker's actual account balance
No manual capital controls — engine capital always derived from broker state on reset
Moderate risk tier default, auto-adjustable
Full execution audit trail from signal to broker response
Operator approval workflow available for live mode

Notification Service

AWS SNS for SMS alerts on critical events (circuit breaker triggers, risk tier changes, large trades)
Gmail API for email alerts and daily performance summaries
Configurable alert channels and thresholds

Lakehouse and SQL Analytics

Parquet fact tables on MinIO with Hive-compatible partitioning
Iceberg table metadata for schema evolution
Trino SQL engine for ad-hoc queries
Fact tables: market bars, documents, extractions, trade signals, orders, fills, positions, PnL, global events, macro impacts, competitive signals, trend projections
Apache Superset for pre-built dashboards
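Hive-compatible partitioning stores each partition column as a `column=value` directory in the object key. The helper below is a hypothetical sketch; the bucket, table, and partition column names are placeholders, and the project's actual layout is not documented here.

```python
from datetime import date
from urllib.parse import quote


def hive_partition_key(bucket, table, partitions, filename):
    """Build an object key with Hive-style `column=value` directories.

    partitions: ordered (column, value) pairs. Values are percent-escaped so a '/'
    or '=' inside a value cannot break the directory structure.
    """
    segments = []
    for column, value in partitions:
        text = value.isoformat() if isinstance(value, date) else str(value)
        segments.append(f"{column}={quote(text, safe='')}")
    return "/".join([f"s3://{bucket}", table, *segments, filename])
```

For writing the Parquet files themselves, `pyarrow.dataset.write_dataset` with `partitioning_flavor="hive"` produces the same directory layout; whether the lake publisher uses it is not stated here.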

Web Dashboard

React/TypeScript SPA with Tailwind CSS
Company, watchlist, and source management
Document timeline with intelligence drill-down
Trend visualization with evidence chain navigation (company, macro, and competitive signals distinguished)
Trading engine overview: risk tier, circuit breaker status, active/reserve pool, portfolio heat, P&L
Portfolio composition, trade history, backtesting panel
Global events browser, macro exposure panels, trend projection visualization
Competitor relationship management, historical pattern explorer, corporate decision timeline
DevOps dashboards: pipeline health, ingestion throughput, model performance
Interactive SQL explorer with Monaco Editor and chart builder

Observability

Structured JSON logging across all services
Prometheus metrics for every pipeline stage
Alerting for source failures, schema failure spikes, analytical lag, broker issues, and trading anomalies
Dead-letter queues with replay tooling
Data retention and lifecycle controls
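Structured JSON logging can be done with the standard library alone. This formatter is a sketch; the field names and the `extra={"fields": {...}}` convention for attaching context are assumptions rather than the project's logging schema.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def __init__(self, service):
        super().__init__()
        self.service = service

    def format(self, record):
        payload = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S%z"),
            "level": record.levelname,
            "service": self.service,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Extra context attached via extra={"fields": {...}} (assumed convention).
        fields = getattr(record, "fields", None)
        if isinstance(fields, dict):
            payload.update(fields)
        if record.exc_info:
            payload["exc_info"] = self.formatException(record.exc_info)
        return json.dumps(payload, default=str)


if __name__ == "__main__":
    log = logging.getLogger("extractor")
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter(service="extractor"))
    log.addHandler(handler)
    log.warning("schema validation failed", extra={"fields": {"document_id": "doc-123", "attempt": 2}})
```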

Services

Service	Description
`scheduler`	Triggers ingestion cycles based on source polling intervals
`symbol-registry`	Manages companies, aliases, watchlists, sources, exposure profiles, and competitor relationships
`ingestion`	Fetches market data, news, filings, and macro events from external APIs
`parser`	Normalizes raw HTML/text, reduces boilerplate, scores parse quality
`extractor`	Runs Ollama extraction to produce document intelligence and global event classifications
`aggregation`	Computes rolling trend summaries with contradiction detection and trend projections
`recommendation`	Generates trade recommendations from aggregated evidence across all signal layers
`risk`	Evaluates orders against portfolio risk controls
`trading-engine`	Autonomous decision loop: position sizing, stop-loss, circuit breakers, reserve pool, rebalancing
`broker-adapter`	Interfaces with Alpaca for paper/live order execution
`lake-publisher`	Writes analytical Parquet datasets to MinIO
`query-api`	REST API for all operational and analytical queries
`dashboard`	React SPA served via nginx

Tech Stack

Language: Python 3.12, TypeScript (frontend)
AI: Ollama (local LLM inference with structured JSON output)
Databases: PostgreSQL 16, Redis 7
Object Storage: MinIO (S3-compatible)
Lakehouse: Parquet + Hive partitioning + Iceberg metadata
SQL Engine: Trino
BI: Apache Superset
Frontend: React 19, Vite, TanStack Router/Query, Recharts, Monaco Editor, Tailwind CSS
Infrastructure: Kubernetes (k3s), Helm, Traefik ingress, cert-manager
CI/CD: Woodpecker CI and GitHub Actions → Harbor container registry (migrated from GHCR)
Broker: Alpaca (paper trading)
Market Data: Polygon.io
Notifications: AWS SNS (SMS), Gmail API (email)

Project Structure

├── services/
│   ├── shared/          # Config, schemas, Redis keys, logging, audit
│   ├── scheduler/       # Job scheduling and source polling
│   ├── symbol_registry/ # Company, source, exposure profile, competitor management API
│   ├── ingestion/       # External API adapters and raw artifact storage
│   ├── parser/          # HTML parsing, boilerplate reduction, quality scoring
│   ├── extractor/       # Ollama extraction, event classification, schema validation
│   ├── aggregation/     # Trend computation, contradiction detection, projections
│   ├── recommendation/  # Recommendation generation and suppression
│   ├── risk/            # Risk evaluation and approval workflow
│   ├── trading/         # Autonomous trading engine, backtester, performance tracker
│   ├── adapters/        # Broker API integration
│   ├── lake_publisher/  # Parquet fact table publication
│   └── api/             # Query API (FastAPI)
├── frontend/            # React dashboard SPA
├── infra/
│   ├── helm/            # Helm chart for Kubernetes deployment
│   ├── k8s/             # Raw Kubernetes manifests
│   ├── migrations/      # PostgreSQL schema migrations
│   ├── trino/           # Trino catalog configuration
│   ├── hive/            # Hive metastore configuration
│   ├── minio/           # MinIO lifecycle policies
│   └── superset/        # Superset configuration
├── scripts/             # Backup/restore scripts (backup-db.sh, restore-db.sh, backup-redis.sh)
├── dashboards/          # Superset dashboard JSON exports
├── tests/               # Python test suite
└── docker/              # Dockerfiles for services and Superset

Local Development

Prerequisites: Python 3.12, Node.js 24, Docker

# Start infrastructure
docker compose up -d

# Install Python dependencies
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Run tests
python -m pytest tests/ -x --tb=short -q

# Frontend
cd frontend
npm install
npx vitest --run

Deployment

The platform runs on Kubernetes (k3s cluster, 4 NixOS nodes). Full deployment is handled by runmefirst.sh, which sets up the database, runs migrations, and deploys via Helm.

# Full deploy (from gremlin-1 where secrets are available):
bash ~/sources/kube/stonks-oracle/runmefirst.sh

# Quick Helm upgrade after CI builds new images:
helm upgrade --install stonks-oracle infra/helm/stonks-oracle -n stonks-oracle

# Restart a specific service:
kubectl rollout restart deployment/<service-name> -n stonks-oracle

Secrets are stored in ~/sources/kube/stonks-oracle/ on the deploy host — not in the repo. The deploy script reads them from disk and injects them via Helm --set flags. See the runbook for operational details.

Live Endpoints

Service	URL
Dashboard	https://stonks.celestium.life
Query API	https://stonks-api.celestium.life
Symbol Registry	https://stonks-registry.celestium.life
Trading Engine	https://stonks-trading.celestium.life
Superset	https://stonks-dash.celestium.life
Trino	https://stonks-trino.celestium.life

License

Business Source License 1.1; see LICENSE.