docs: rewrite README and runbook for current platform state

README: updated architecture diagram, three signal layers, tracked
universe, autonomous trading engine, global news interpolation,
competitive intelligence, paper trading, notification service,
updated services table, project structure, deployment, endpoints.

Runbook: updated service overview, deployment via runmefirst.sh,
secrets management (keys in kube dir not repo), backup/restore
scripts, trading engine operations, signal layer toggles, database
nuke & rebuild, monitoring, CI/CD, removed hardcoded secrets.
Celes Renata
2026-04-16 02:06:18 +00:00
parent e652a62dbc
commit 9aae57f3e1
2 changed files with 366 additions and 134 deletions
+113 -76
@@ -1,130 +1,159 @@
# Stonks Oracle
AI-powered market intelligence and autonomous paper-trading platform. Ingests market data, company news, and regulatory filings; extracts structured intelligence with local LLMs; aggregates signals across three layers (company, macro, competitive); and autonomously executes paper trades — all self-hosted on Kubernetes.
## What It Does
Stonks Oracle tracks 50 companies across 10 sectors. It monitors multiple data sources, runs every article and filing through a local Ollama model to extract structured intelligence, aggregates those signals into rolling trend summaries with contradiction detection, and generates explainable trade recommendations. An autonomous trading engine then evaluates those recommendations and executes paper trades through Alpaca without manual intervention.
Everything is auditable — raw artifacts, prompts, model outputs, decision traces, and trade execution logs are preserved. Historical data flows into a MinIO-backed lakehouse queryable via Trino and visualized through Superset dashboards and a built-in React dashboard.
## Architecture
```
                             ┌───────────────────────────────────────┐
                             │ Signal Aggregation                    │
                             │                                       │
┌───────────┐  ┌───────────┐ │ ┌──────────┐  ┌───────────┐           │
│ Scheduler │─▶│ Ingestion │─┼▶│  Parser  │─▶│ Extractor │           │
└───────────┘  └───────────┘ │ └──────────┘  └─────┬─────┘           │
                             │                     │                 │
                             │        ┌────────────┘                 │
                             │        ▼                              │
                             │ ┌─────────────┐    ┌────────────────┐ │
                             │ │ Aggregation │───▶│ Recommendation │ │
                             │ └──────┬──────┘    └───────┬────────┘ │
                             │        │                   │          │
                             │  Macro signals        Competitive     │
                             │  + Competitive        signals         │
                             │  signals merged                       │
                             └────────────────────────────┬──────────┘
                                                          │
       ┌──────────────────────────────────────────────────┘
       ▼
┌─────────────┐    ┌────────────────┐    ┌────────────────┐
│ Risk Engine │───▶│ Trading Engine │───▶│ Broker Adapter │
└─────────────┘    └────────────────┘    └───────┬────────┘
           ┌────────────────┐                    │
           │ Lake Publisher │                    ▼
           └───────┬────────┘             Alpaca (paper)
                   │
     ┌─────────────┼──────────────┐
     ▼             ▼              ▼
┌──────────┐  ┌──────────┐  ┌───────────┐
│  Trino   │  │ Superset │  │ Dashboard │
└──────────┘  └──────────┘  └───────────┘
```
Two planes:
- **Operational** — ingestion, parsing, extraction, aggregation, recommendations, risk evaluation, autonomous trading, trade execution (PostgreSQL, Redis, MinIO)
- **Analytical** — historical fact tables, SQL queries, dashboards (MinIO/Parquet, Trino, Superset)
## Signal Layers
The aggregation engine merges signals from three independent layers via a unified `WeightedSignal` abstraction. The macro and competitive layers each have a runtime toggle, so either can be switched on or off without a restart.
| Layer | Source | What It Does |
|-------|--------|-------------|
| **Layer 1: Company** | News, filings, market data | Document intelligence extraction → per-company impact records → trend windows |
| **Layer 2: Macro** | Global news, geopolitical events | Ollama-based event classification → exposure profile matching → per-company macro impact scores |
| **Layer 3: Competitive** | Historical platform data | Pattern mining on past catalyst outcomes → cross-company signal propagation via competitor relationships |
- Trend shifts driven only by competitive pattern signals or only by macro signals are forced into informational-only mode (suppression safety)
- Macro weight default: 0.3, competitive weight default: 0.2
- Toggles: `macro_enabled` and `competitive_enabled` in `risk_configs`
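As a rough illustration of how the three layers could combine, the sketch below merges per-layer scores as a weighted average, honours the two runtime toggles, and applies the suppression rule above. It is a minimal sketch under assumptions: the `WeightedSignal` fields, the company-layer weight of 1.0, and the `make_signal`/`merge_signals` helpers are hypothetical; only the macro (0.3) and competitive (0.2) defaults and the toggle names come from this README.

```python
from dataclasses import dataclass

# Macro and competitive defaults come from the README;
# the company-layer weight of 1.0 is an assumption.
DEFAULT_WEIGHTS = {"company": 1.0, "macro": 0.3, "competitive": 0.2}


@dataclass(frozen=True)
class WeightedSignal:
    """Hypothetical shape of one layer's signal for one company."""
    layer: str     # "company", "macro", or "competitive"
    score: float   # directional score in [-1.0, 1.0]
    weight: float  # weight applied when merging


def make_signal(layer: str, score: float) -> WeightedSignal:
    return WeightedSignal(layer, score, DEFAULT_WEIGHTS[layer])


def merge_signals(signals, macro_enabled=True, competitive_enabled=True):
    """Return (merged_score, informational_only) for one company."""
    enabled = {"company": True, "macro": macro_enabled, "competitive": competitive_enabled}
    active = [s for s in signals if enabled.get(s.layer, False)]
    total_weight = sum(s.weight for s in active)
    if total_weight == 0:
        return 0.0, True  # nothing to act on
    merged = sum(s.score * s.weight for s in active) / total_weight
    # Suppression safety: without company-layer evidence the shift is informational only.
    informational_only = not any(s.layer == "company" for s in active)
    return merged, informational_only
```

For example, a company score of 0.6 combined with a macro headwind of -0.5 and a competitive tailwind of 0.4 merges to about 0.35. Disabling the macro layer raises the merged score to about 0.57.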
## Tracked Universe
50 companies across 10 sectors: Technology, Consumer Cyclical, Financial Services, Healthcare, Energy, Communication Services, Industrials, Consumer Defensive, Real Estate, Utilities.
46 competitor relationships (direct_rival, same_sector, overlapping_products, supply_chain_adjacent).
Seed data: `python -m services.symbol_registry.seed`
## Features
### Autonomous Trading Engine
Continuous decision loop that polls for actionable recommendations and executes paper trades without manual intervention. It includes:
- Confidence-based position sizing
- Dynamic stop-loss and take-profit levels (ATR-based)
- Circuit breakers (daily loss cap, single-position loss, volatility detection)
- Reserve pool management (auto-siphon from profits)
- Risk tier auto-adjustment (conservative/moderate/aggressive, based on trailing performance)
- Portfolio rebalancing (sector and concentration limits)
- Gradual entry (multi-tranche orders)
- Correlation-aware diversification
- Earnings calendar awareness
- Portfolio heat management
- Tax-lot tracking with wash sale detection
- Performance tracking (Sharpe ratio, drawdown, win rate, profit factor)
- Backtesting against historical data
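Two of these mechanisms, ATR-based stops and confidence-based sizing, can be sketched together. This is an illustration under assumptions, not the engine's code. The simple-average ATR (Wilder's smoothing is omitted), the 2x/3x ATR stop and target multiples, the 1% risk budget, and the 10% position cap are all hypothetical parameters.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Bar:
    high: float
    low: float
    close: float


def average_true_range(bars, period=14):
    """Simple-average ATR over the last `period` bars (Wilder smoothing omitted)."""
    if len(bars) < period + 1:
        raise ValueError("need at least period + 1 bars")
    true_ranges = [
        max(cur.high - cur.low, abs(cur.high - prev.close), abs(cur.low - prev.close))
        for prev, cur in zip(bars[-period - 1:-1], bars[-period:])
    ]
    return sum(true_ranges) / period


def plan_entry(entry, atr, confidence, capital,
               risk_per_trade=0.01, max_position_frac=0.10,
               stop_mult=2.0, target_mult=3.0):
    """Return (shares, stop_loss, take_profit) for a long entry.

    Every default parameter here is hypothetical, not the engine's configuration.
    """
    if atr <= 0 or not 0.0 <= confidence <= 1.0:
        raise ValueError("atr must be positive and confidence in [0, 1]")
    stop_loss = entry - stop_mult * atr
    take_profit = entry + target_mult * atr
    risk_per_share = entry - stop_loss
    # Risk budget scales with confidence: higher confidence risks more capital.
    budget = capital * risk_per_trade * confidence
    shares_by_risk = math.floor(budget / risk_per_share)
    # Concentration cap: never exceed max_position_frac of capital in one name.
    shares_by_cap = math.floor(capital * max_position_frac / entry)
    return min(shares_by_risk, shares_by_cap), stop_loss, take_profit
```

With $100,000 of capital, an entry at $100, an ATR of 2.0, and 0.5 confidence, the stop lands at $96 and the target at $106. The risk budget allows 125 shares, which the position cap trims to 100.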
### Global News Interpolation
Macro/geopolitical event ingestion from dedicated sources. Ollama-based classification by impact type, severity, affected regions, and sectors. Company exposure profiles (geographic revenue mix, supply chain regions, commodity dependencies, market position tier) map events to per-company macro impact scores with resilience modifiers. Forward-looking trend projections combine company momentum with macro trajectories.
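A per-company macro impact score could combine event severity, the overlap between the event and the company's exposure profile, and a resilience modifier for the market position tier. The sketch below is one hypothetical formulation. The additive exposure bonuses, the tier modifiers, and every field name are assumptions rather than the platform's scoring rules.

```python
from dataclasses import dataclass

# Hypothetical resilience modifiers: stronger market positions dampen macro shocks.
TIER_RESILIENCE = {"leader": 0.3, "challenger": 0.15, "niche": 0.0}


@dataclass(frozen=True)
class GlobalEvent:
    severity: float        # 0..1, from event classification
    direction: int         # +1 tailwind, -1 headwind
    regions: frozenset
    sectors: frozenset
    commodities: frozenset = frozenset()


@dataclass
class ExposureProfile:
    sector: str
    revenue_by_region: dict          # region -> share of revenue
    supply_chain_regions: frozenset
    commodities: frozenset
    market_position_tier: str        # "leader", "challenger", or "niche"


def macro_impact(event: GlobalEvent, profile: ExposureProfile) -> float:
    """Signed per-company impact score in [-1, 1] (hypothetical formulation)."""
    # Geographic exposure: share of revenue earned in affected regions.
    exposure = sum(share for region, share in profile.revenue_by_region.items()
                   if region in event.regions)
    # Additive bonuses for indirect exposure channels (weights are assumptions).
    if profile.supply_chain_regions & event.regions:
        exposure += 0.2
    if profile.commodities & event.commodities:
        exposure += 0.2
    if profile.sector in event.sectors:
        exposure += 0.1
    exposure = min(exposure, 1.0)
    resilience = TIER_RESILIENCE.get(profile.market_position_tier, 0.0)
    return event.direction * event.severity * exposure * (1.0 - resilience)
```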
### Competitive Intelligence
Historical pattern mining on the platform's own data — how similar catalyst types resolved in the past for a company and its competitors. Cross-company signal propagation via competitor relationships. Major corporate decision tracking (M&A, restructuring, leadership changes) with extended lookback windows. Auto-inference of competitor relationships from sector matching and document co-mentions.
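Cross-company propagation can be pictured as spilling a fraction of one company's signal onto related companies, scaled by relationship strength and type. In this sketch, the per-type factors, the `strength` value on each relationship, the undirected treatment of relationships, and the cutoff are all hypothetical. Only the four relationship type names come from this README.

```python
from collections import defaultdict

# Hypothetical spillover factors per relationship type (type names from the README).
PROPAGATION_FACTOR = {
    "direct_rival": 0.5,
    "overlapping_products": 0.4,
    "supply_chain_adjacent": 0.3,
    "same_sector": 0.2,
}


def propagate(source_symbol, source_score, relationships, min_abs=0.01):
    """Spill a company's signal onto related companies.

    relationships: iterable of (symbol_a, symbol_b, relationship_type, strength),
    treated as undirected; strength is a hypothetical 0..1 confidence.
    """
    spill = defaultdict(float)
    for a, b, rel_type, strength in relationships:
        if source_symbol not in (a, b):
            continue
        target = b if a == source_symbol else a
        value = source_score * strength * PROPAGATION_FACTOR.get(rel_type, 0.0)
        if abs(value) >= min_abs:  # drop negligible spillover
            spill[target] += value
    return dict(spill)
```

The tickers in a call such as `propagate("NVDA", 0.6, [("AMD", "NVDA", "direct_rival", 0.9)])` are illustrative only.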
### Data Ingestion
- Grouped daily market data from Polygon.io (OHLCV bars, corporate actions)
- Company news via news APIs with full article scraping
- SEC filings and regulatory events
- Macro/geopolitical news from dedicated sources
- Content hash deduplication, rate limiting, retries, raw artifact preservation in MinIO
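Content hash deduplication typically hashes a normalized form of the text, so trivial whitespace or case differences do not register as new documents. This sketch assumes SHA-256 over NFKC-normalized, whitespace-collapsed, lowercased text and keeps seen hashes in memory. The platform's actual normalization rules and hash store (for example, a unique database column or a Redis set) are not specified in this README.

```python
import hashlib
import re
import unicodedata


def content_hash(text: str) -> str:
    """SHA-256 of a normalized form of the text (normalization rules are assumptions)."""
    normalized = unicodedata.normalize("NFKC", text)
    normalized = re.sub(r"\s+", " ", normalized).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class Deduplicator:
    """In-memory stand-in for a shared store of seen content hashes."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def is_new(self, text: str) -> bool:
        digest = content_hash(text)
        if digest in self._seen:
            return False
        self._seen.add(digest)
        return True
```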
### AI-Powered Extraction
- Local Ollama models with schema-constrained JSON output
- Per-document intelligence: sentiment, catalysts, impact horizon, key facts, risks, macro themes
- Per-company impact records when a document mentions multiple companies
- Schema and semantic validation with retry on invalid outputs
- Prompt, model metadata, and raw output preservation for reproducibility
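Schema-constrained extraction with validation and retry might look roughly like the following. Ollama's `/api/generate` endpoint accepts a JSON schema in its `format` field and returns the generated text in `response`. The model name, field list, sentiment labels, and retry count here are assumptions. The generator is injectable, so the validation loop can be exercised without a running Ollama server.

```python
import json
import urllib.request

SENTIMENTS = ["positive", "negative", "neutral", "mixed"]   # hypothetical label set
REQUIRED_FIELDS = {"sentiment": str, "catalysts": list, "key_facts": list, "risks": list}

# JSON schema passed to Ollama's `format` field to constrain decoding.
EXTRACTION_SCHEMA = {
    "type": "object",
    "properties": {
        "sentiment": {"type": "string", "enum": SENTIMENTS},
        "catalysts": {"type": "array", "items": {"type": "string"}},
        "key_facts": {"type": "array", "items": {"type": "string"}},
        "risks": {"type": "array", "items": {"type": "string"}},
    },
    "required": list(REQUIRED_FIELDS),
}


def ollama_generate(prompt, model="llama3.1", host="http://localhost:11434"):
    """Call Ollama's /api/generate with a schema-constrained format (model name assumed)."""
    payload = {"model": model, "prompt": prompt, "format": EXTRACTION_SCHEMA, "stream": False}
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["response"]


def validate(raw: str) -> dict:
    """Schema and semantic checks; raises ValueError on any problem."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("top-level value is not an object")
    for name, expected in REQUIRED_FIELDS.items():
        if not isinstance(data.get(name), expected):
            raise ValueError(f"field {name!r} missing or not {expected.__name__}")
    if data["sentiment"] not in SENTIMENTS:
        raise ValueError(f"unknown sentiment {data['sentiment']!r}")
    return data


def extract_with_retry(prompt, generate=ollama_generate, max_attempts=3):
    """Return (validated_output, attempts_used); raise after max_attempts failures."""
    errors = []
    for attempt in range(1, max_attempts + 1):
        raw = generate(prompt)
        try:
            return validate(raw), attempt
        except ValueError as exc:
            errors.append(f"attempt {attempt}: {exc}")
    raise RuntimeError("extraction failed: " + "; ".join(errors))
```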
### Trend Aggregation
- Rolling company-level trend summaries across 5 windows (intraday, 1d, 7d, 30d, 90d)
- Recency decay, source credibility weighting, document novelty scoring
- Contradiction detection with explicit disagreement representation
- Evidence ranking with top supporting and opposing documents
- Sector and market-level rollups incorporating macro event impacts
- Forward-looking trend projections with driving factor explanations
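The weighting and contradiction logic can be sketched as follows. Each piece of evidence is weighted by exponential recency decay, source credibility, and novelty. A window is flagged as contradictory when the minority side still carries a large share of the weighted mass. The window lengths, half-lives, and the 0.35 contradiction ratio are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical (max_age_hours, half_life_hours) per trend window.
WINDOWS = {
    "intraday": (6.5, 2.0),
    "1d": (24.0, 8.0),
    "7d": (168.0, 48.0),
    "30d": (720.0, 168.0),
    "90d": (2160.0, 504.0),
}


@dataclass(frozen=True)
class Evidence:
    sentiment: float    # -1..1, from document extraction
    age_hours: float
    credibility: float  # 0..1, per-source weight
    novelty: float      # 0..1, lower for near-duplicate coverage


def summarize(evidence, window="7d", contradiction_ratio=0.35):
    """Return a weighted trend score in [-1, 1] and a contradiction flag."""
    max_age, half_life = WINDOWS[window]
    positive = negative = 0.0
    for e in evidence:
        if e.age_hours > max_age:
            continue  # outside this window
        # Exponential recency decay: weight halves every half_life hours.
        weight = 0.5 ** (e.age_hours / half_life) * e.credibility * e.novelty
        if e.sentiment > 0:
            positive += weight * e.sentiment
        elif e.sentiment < 0:
            negative += weight * -e.sentiment
    total = positive + negative
    if total == 0:
        return {"score": 0.0, "contradiction": False}
    # Contradiction: the minority side still carries a large share of the mass.
    minority_share = min(positive, negative) / total
    return {"score": (positive - negative) / total,
            "contradiction": minority_share >= contradiction_ratio}
```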
### Trade Recommendations
- Explainable recommendation objects with action, thesis, confidence, and cited evidence
- Deterministic eligibility scoring separated from action mapping
- Position sizing based on portfolio rules
- Data quality suppression — low-confidence or stale data forces informational-only mode
- Optional LLM thesis rewriting for analyst-quality prose
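Keeping eligibility separate from action mapping lets one deterministic score gate every action. In this hypothetical sketch, eligibility depends only on confidence, evidence coverage, and contradiction. A separate function maps direction to an action and forces informational-only mode for low-eligibility or stale inputs. The thresholds and action labels are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrendInput:
    score: float          # merged trend score, -1..1
    confidence: float     # 0..1
    evidence_count: int
    data_age_hours: float
    contradiction: bool


def eligibility(t: TrendInput) -> float:
    """Deterministic 0..1 eligibility; deliberately ignores direction."""
    if t.evidence_count == 0:
        return 0.0
    coverage = min(t.evidence_count / 5, 1.0)  # saturates at 5 documents (assumed)
    penalty = 0.5 if t.contradiction else 1.0
    return t.confidence * coverage * penalty


def map_action(t: TrendInput, min_eligibility=0.4, max_age_hours=24.0, threshold=0.25):
    """Map a trend to (action, eligibility); suppress to informational when weak or stale."""
    score = eligibility(t)
    if score < min_eligibility or t.data_age_hours > max_age_hours:
        return "informational", score
    if t.score >= threshold:
        return "buy", score
    if t.score <= -threshold:
        return "sell", score
    return "hold", score
```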
### Risk Engine and Trading
- Paper trading mode and live trading mode as separate environments
- Hard blocks: max position size, daily loss cap, sector exposure limits, symbol cooldowns
- Operator approval workflow for live trading
- Idempotent order submission with duplicate prevention
- Fail-closed behavior on broker outages
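A hard-block check in this spirit returns every violated rule rather than stopping at the first, and it treats an unhealthy broker connection as a block (fail-closed). The limit values, field names, and the `evaluate_order` function are hypothetical. This README does not publish the actual limits.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RiskLimits:
    """Hypothetical limit values."""
    max_position_value: float = 10_000.0
    daily_loss_cap: float = 2_000.0
    max_sector_exposure: float = 0.30        # fraction of equity
    cooldown: timedelta = timedelta(hours=4)


@dataclass
class PortfolioState:
    equity: float
    realized_pnl_today: float
    sector_values: dict      # sector -> current market value
    last_trade_at: dict      # symbol -> datetime of last trade
    broker_healthy: bool = True


def evaluate_order(symbol: str, sector: str, order_value: float,
                   state: PortfolioState, limits: RiskLimits, now: datetime):
    """Return (allowed, blocks); any block rejects the order."""
    blocks = []
    if not state.broker_healthy:
        blocks.append("broker_unavailable")  # fail closed rather than guess
    if order_value > limits.max_position_value:
        blocks.append("max_position_size")
    if -state.realized_pnl_today >= limits.daily_loss_cap:
        blocks.append("daily_loss_cap")
    projected = state.sector_values.get(sector, 0.0) + order_value
    if state.equity <= 0 or projected / state.equity > limits.max_sector_exposure:
        blocks.append("sector_exposure")
    last = state.last_trade_at.get(symbol)
    if last is not None and now - last < limits.cooldown:
        blocks.append("symbol_cooldown")
    return not blocks, blocks
```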
### Paper Trading
- $100k paper capital via Alpaca integration
- Moderate risk tier default, auto-adjustable
- Full execution audit trail from signal to broker response
- Operator approval workflow available for live mode
### Notification Service
- AWS SNS for SMS alerts on critical events (circuit breaker triggers, risk tier changes, large trades)
- Gmail API for email alerts and daily performance summaries
- Configurable alert channels and thresholds
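Alert routing with configurable thresholds could be as simple as mapping alert kinds to channels. The routing table, the large-trade threshold, and the `Alert` type are hypothetical. The SMS helper uses boto3's SNS `publish` call with a `PhoneNumber` target and imports boto3 lazily, so the routing logic runs without AWS credentials. The Gmail path is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    kind: str            # e.g. "circuit_breaker", "risk_tier_change", "trade", "daily_summary"
    message: str
    trade_value: float = 0.0


# Hypothetical routing configuration.
SMS_KINDS = {"circuit_breaker", "risk_tier_change"}
LARGE_TRADE_THRESHOLD = 5_000.0


def channels_for(alert: Alert) -> list[str]:
    """Decide which channels receive an alert."""
    if alert.kind == "daily_summary":
        return ["email"]
    if alert.kind in SMS_KINDS:
        return ["sms", "email"]
    if alert.kind == "trade" and alert.trade_value >= LARGE_TRADE_THRESHOLD:
        return ["sms", "email"]
    return []  # below threshold: no notification


def send_sms(phone_number: str, message: str) -> None:
    """Publish an SMS through AWS SNS (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so routing can be tested without AWS

    boto3.client("sns").publish(PhoneNumber=phone_number, Message=message)
```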
### Lakehouse and SQL Analytics
- Parquet fact tables on MinIO with Hive-compatible partitioning
- Iceberg table metadata for schema evolution
- Trino SQL engine for ad-hoc queries
- Fact tables: market bars, documents, extractions, trade signals, orders, fills, positions, PnL, global events, macro impacts, competitive signals, trend projections
- Apache Superset for pre-built dashboards
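Hive-compatible partitioning encodes partition columns as `key=value` path segments, which the Hive connector can use for partition pruning. The sketch builds and parses such paths. The bucket name, table name, `dt`/`symbol` partition keys, URI scheme, and file naming are assumptions.

```python
from datetime import date
from typing import Optional


def partition_path(bucket: str, table: str, trade_date: date,
                   symbol: Optional[str] = None, part: int = 0) -> str:
    """Build a Hive-style object path: one key=value segment per partition column."""
    segments = [f"s3a://{bucket}", table, f"dt={trade_date.isoformat()}"]
    if symbol is not None:
        segments.append(f"symbol={symbol}")
    segments.append(f"part-{part:05d}.parquet")
    return "/".join(segments)


def parse_partitions(path: str) -> dict:
    """Recover partition values from a Hive-style path."""
    return dict(seg.split("=", 1) for seg in path.split("/") if "=" in seg)
```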
### Web Dashboard
- React/TypeScript SPA with Tailwind CSS
- Company, watchlist, and source management
- Document timeline with intelligence drill-down
- Recommendation review with full provenance
- Order and position tracking with audit trails
- Trading mode controls, risk configuration, approval workflow
- Trend visualization with evidence chain navigation (company, macro, and competitive signals distinguished)
- Trading engine overview: risk tier, circuit breaker status, active/reserve pool, portfolio heat, P&L
- Portfolio composition, trade history, backtesting panel
- Global events browser, macro exposure panels, trend projection visualization
- Competitor relationship management, historical pattern explorer, corporate decision timeline
- DevOps dashboards: pipeline health, ingestion throughput, model performance
- Interactive SQL explorer with Monaco Editor and chart builder
- Pre-built analytical dashboards: symbol overview, sentiment heatmap, prediction accuracy, paper trading PnL, model quality
### Observability
- Structured JSON logging across all services
- Prometheus metrics for every pipeline stage
- Alerting for source failures, schema failure spikes, analytical lag, broker issues, and trading anomalies
- Dead-letter queues with replay tooling
- Data retention and lifecycle controls
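Structured JSON logging can be done with the standard `logging` module and a custom formatter. This is a minimal sketch. The key names and the convention of passing structured fields as `extra={"fields": {...}}` are assumptions, not the interface of the shared logging module.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Structured fields passed as extra={"fields": {...}} (an assumed convention).
        fields = getattr(record, "fields", None)
        if isinstance(fields, dict):
            payload.update(fields)
        if record.exc_info:
            payload["exc"] = self.formatException(record.exc_info)
        return json.dumps(payload, default=str)


def get_logger(service: str) -> logging.Logger:
    """Return a logger for `service` that writes JSON lines to stderr."""
    logger = logging.getLogger(service)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger
```

A call such as `get_logger("extractor").info("document extracted", extra={"fields": {"symbol": "AAPL"}})` emits one JSON line with `symbol` as a top-level key.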
## Services
| Service | Description |
|---------|-------------|
| `scheduler` | Triggers ingestion cycles based on source polling intervals |
| `symbol-registry` | Manages companies, aliases, watchlists, sources, exposure profiles, and competitor relationships |
| `ingestion` | Fetches market data, news, filings, and macro events from external APIs |
| `parser` | Normalizes raw HTML/text, reduces boilerplate, scores parse quality |
| `extractor` | Runs Ollama extraction to produce document intelligence and global event classifications |
| `aggregation` | Computes rolling trend summaries with contradiction detection and trend projections |
| `recommendation` | Generates trade recommendations from aggregated evidence across all signal layers |
| `risk` | Evaluates orders against portfolio risk controls |
| `trading-engine` | Autonomous decision loop: position sizing, stop-loss, circuit breakers, reserve pool, rebalancing |
| `broker-adapter` | Interfaces with Alpaca for paper/live order execution |
| `lake-publisher` | Writes analytical Parquet datasets to MinIO |
| `query-api` | REST API for all operational and analytical queries |
| `dashboard` | React SPA served via nginx |
@@ -143,6 +172,7 @@ Two planes:
- **CI/CD**: GitHub Actions → GHCR container registry
- **Broker**: Alpaca (paper trading)
- **Market Data**: Polygon.io
- **Notifications**: AWS SNS (SMS), Gmail API (email)
## Project Structure
@@ -150,13 +180,14 @@ Two planes:
├── services/
│ ├── shared/ # Config, schemas, Redis keys, logging, audit
│ ├── scheduler/ # Job scheduling and source polling
│ ├── symbol_registry/ # Company, source, exposure profile, competitor management API
│ ├── ingestion/ # External API adapters and raw artifact storage
│ ├── parser/ # HTML parsing, boilerplate reduction, quality scoring
│ ├── extractor/ # Ollama extraction, event classification, schema validation
│ ├── aggregation/ # Trend computation, contradiction detection, projections
│ ├── recommendation/ # Recommendation generation and suppression
│ ├── risk/ # Risk evaluation and approval workflow
│ ├── trading/ # Autonomous trading engine, backtester, performance tracker
│ ├── adapters/ # Broker API integration
│ ├── lake_publisher/ # Parquet fact table publication
│ └── api/ # Query API (FastAPI)
@@ -169,6 +200,7 @@ Two planes:
│ ├── hive/ # Hive metastore configuration
│ ├── minio/ # MinIO lifecycle policies
│ └── superset/ # Superset configuration
├── scripts/ # Backup/restore scripts (backup-db.sh, restore-db.sh, backup-redis.sh)
├── dashboards/ # Superset dashboard JSON exports
├── tests/ # Python test suite
└── docker/ # Dockerfiles for services and Superset
@@ -198,17 +230,21 @@ npx vitest --run
## Deployment
The platform runs on Kubernetes (k3s cluster, 4 NixOS nodes). Full deployment is handled by `runmefirst.sh`, which sets up the database, runs migrations, and deploys via Helm.
```bash
# Full deploy (from gremlin-1 where secrets are available):
bash ~/sources/kube/stonks-oracle/runmefirst.sh
# Quick Helm upgrade after CI builds new images:
helm upgrade --install stonks-oracle infra/helm/stonks-oracle -n stonks-oracle
# Restart a specific service:
kubectl rollout restart deployment/<service-name> -n stonks-oracle
```
Secrets are stored in `~/sources/kube/stonks-oracle/` on the deploy host — not in the repo. The deploy script reads them from disk and injects them via Helm `--set` flags. See the [runbook](docs/notes/runbook.md) for operational details.
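The secrets step could look roughly like the sketch below, which reads one file per secret from the deploy-host directory and appends matching `--set` flags to a Helm upgrade command. The secret file names and Helm value keys are hypothetical. The release name, chart path, and namespace match the `helm upgrade` command in the deployment block.

```python
from pathlib import Path

# Hypothetical mapping: secret file name on the deploy host -> Helm value key.
SECRET_FILES = {
    "alpaca-api-key": "broker.alpaca.apiKey",
    "alpaca-secret-key": "broker.alpaca.secretKey",
    "polygon-api-key": "ingestion.polygon.apiKey",
}


def helm_args(secret_dir: str,
              release: str = "stonks-oracle",
              chart: str = "infra/helm/stonks-oracle",
              namespace: str = "stonks-oracle") -> list[str]:
    """Build a helm upgrade command that injects secrets read from disk."""
    args = ["helm", "upgrade", "--install", release, chart, "-n", namespace]
    base = Path(secret_dir).expanduser()
    for filename, key in SECRET_FILES.items():
        path = base / filename
        if not path.is_file():
            raise FileNotFoundError(f"missing secret file: {path}")
        value = path.read_text().strip()
        args += ["--set", f"{key}={value}"]
    return args
```

The resulting list can be passed to `subprocess.run(args, check=True)`. Helm also provides `--set-file`, which reads a value directly from a file path instead of taking it inline.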
## Live Endpoints
| Service | URL |
|---------|-----|
| Dashboard | https://stonks.celestium.life |
| Query API | https://stonks-api.celestium.life |
| Symbol Registry | https://stonks-registry.celestium.life |
| Trading Engine | https://stonks-trading.celestium.life |
| Superset | https://stonks-dash.celestium.life |
| Trino | https://stonks-trino.celestium.life |