# Stonks Oracle

AI-powered market intelligence and paper-trading platform. Ingests market data, company news, and regulatory filings; extracts structured intelligence with local LLMs; computes trend summaries and trade recommendations; and optionally executes paper trades, all self-hosted on Kubernetes.

## What It Does

Stonks Oracle monitors tracked companies across multiple data sources and runs every article and filing through a local Ollama model to extract structured intelligence (sentiment, catalysts, risks, key facts). It aggregates those signals into rolling trend summaries with contradiction detection, then generates explainable trade recommendations with risk controls.

Everything is auditable: raw artifacts, prompts, model outputs, and decision traces are preserved. Historical data flows into a MinIO-backed lakehouse that is queryable via Trino and visualized through Superset dashboards and a built-in React dashboard.

## Architecture

```
┌─────────────┐    ┌──────────┐    ┌──────────┐    ┌─────────────┐
│  Scheduler  │───▶│ Ingestion│───▶│  Parser  │───▶│  Extractor  │
└─────────────┘    └──────────┘    └──────────┘    └──────┬──────┘
                                                          │
       ┌──────────────────────────────────────────────────┘
       ▼
┌─────────────┐    ┌────────────────┐    ┌──────────────┐
│ Aggregation │───▶│ Recommendation │───▶│ Risk Engine  │
└─────────────┘    └────────────────┘    └──────┬───────┘
                                                │
       ┌────────────────────────────────────────┘
       ▼
┌──────────────┐    ┌────────────────┐
│Broker Adapter│    │ Lake Publisher │
└──────────────┘    └────────────────┘
                            │
       ┌────────────────────┘
       ▼
┌──────────┐    ┌──────────┐    ┌───────────┐
│  Trino   │    │ Superset │    │ Dashboard │
└──────────┘    └──────────┘    └───────────┘
```

Two planes:

- **Operational**: ingestion, parsing, extraction, aggregation, recommendations, risk evaluation, trade execution (PostgreSQL, Redis, MinIO)
- **Analytical**: historical fact tables, SQL queries, dashboards (MinIO/Parquet, Trino, Superset)

## Features

### Data Ingestion

- Market data via Polygon.io (quotes, OHLCV bars, corporate actions)
- Company news via news APIs with full article scraping
- SEC filings and regulatory events
- Configurable polling intervals, rate limiting, retries, and backoff
- Content hash deduplication across all sources
- Raw artifact preservation in MinIO for full auditability

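
To make content hash deduplication concrete, here is a minimal sketch. The normalization step (lowercasing, whitespace collapsing), the choice of SHA-256, and the `Deduplicator` class are all assumptions for illustration; the README does not say how the ingestion service computes or stores its hashes.

```python
import hashlib
import re


def content_hash(text: str) -> str:
    """Hash normalized text so trivially different copies collide."""
    # Hypothetical normalization: collapse whitespace runs and lowercase.
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class Deduplicator:
    """In-memory stand-in for a shared seen-hash store (for example a Redis set)."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def is_new(self, text: str) -> bool:
        """Return True the first time a document body is seen, False afterwards."""
        digest = content_hash(text)
        if digest in self._seen:
            return False
        self._seen.add(digest)
        return True


dedup = Deduplicator()
first = dedup.is_new("ACME beats Q3 estimates")
repeat = dedup.is_new("acme  beats Q3   estimates\n")  # same text, different spacing and case
other = dedup.is_new("ACME misses Q3 estimates")
```

Hashing the normalized body rather than the URL lets the same article arriving from two sources count once, at the cost of missing near-duplicates that differ by even one word.
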
### AI-Powered Extraction

- Local Ollama models with schema-constrained JSON output
- Per-document intelligence: sentiment, catalysts, impact horizon, key facts, risks, macro themes
- Per-company impact records when a document mentions multiple companies
- Schema and semantic validation with retry on invalid outputs
- Prompt, model metadata, and raw output preservation for reproducibility

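
The validate-and-retry loop described above could look roughly like the sketch below. It uses a stub in place of the Ollama call so it runs standalone; the two-field schema, the sentiment labels, and the function names `validate` and `extract_with_retry` are hypothetical and far smaller than the real document intelligence object.

```python
import json
from typing import Callable

ALLOWED_SENTIMENTS = {"positive", "neutral", "negative"}  # assumed label set


def validate(raw: str) -> dict:
    """Parse model output and enforce a small, assumed schema."""
    data = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if data.get("sentiment") not in ALLOWED_SENTIMENTS:
        raise ValueError("sentiment missing or not an allowed label")
    confidence = data.get("confidence")
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("confidence must be a number")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be within [0, 1]")
    return data


def extract_with_retry(
    generate: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> dict:
    """Call the model until its output validates or attempts run out."""
    last_error: Exception | None = None
    for _ in range(max_attempts):
        raw = generate(prompt)
        try:
            return validate(raw)
        except ValueError as exc:
            last_error = exc
    raise RuntimeError(f"no valid output after {max_attempts} attempts: {last_error}")


# Stub model: truncated JSON first, then an out-of-range value, then a valid object.
replies = iter([
    '{"sentiment": "positive"',
    '{"sentiment": "positive", "confidence": 1.7}',
    '{"sentiment": "positive", "confidence": 0.8}',
])
result = extract_with_retry(lambda prompt: next(replies), "Summarize the filing.")
```

The first failure is a schema problem (unparseable JSON) and the second is a semantic one (a confidence outside its range); treating both as `ValueError` keeps the retry loop uniform.
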
### Trend Aggregation

- Rolling company-level trend summaries across 5 windows (intraday, 1d, 7d, 30d, 90d)
- Recency decay, source credibility weighting, and document novelty scoring
- Contradiction detection with explicit disagreement representation
- Sector and market-level rollups
- Evidence ranking with top supporting and opposing documents

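
As an illustration of how recency decay, credibility weighting, and contradiction detection can combine, the sketch below assumes an exponential half-life decay and a simple bullish-versus-bearish weight split. The formula, the 24-hour half-life, and the 0.5 contradiction threshold are illustrative assumptions, not the aggregation service's actual parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    sentiment: float    # -1.0 (bearish) to 1.0 (bullish)
    age_hours: float    # time since publication
    credibility: float  # 0.0 to 1.0 source weight


def decay_weight(age_hours: float, half_life_hours: float) -> float:
    """Exponential decay: the weight halves every half_life_hours."""
    return 0.5 ** (age_hours / half_life_hours)


def summarize(signals: list[Signal], half_life_hours: float = 24.0) -> dict:
    """Weighted mean sentiment plus a simple disagreement measure."""
    weights = [s.credibility * decay_weight(s.age_hours, half_life_hours) for s in signals]
    total = sum(weights)
    if total == 0:
        return {"score": 0.0, "disagreement": 0.0, "contradictory": False}
    score = sum(w * s.sentiment for w, s in zip(weights, signals)) / total
    bullish = sum(w for w, s in zip(weights, signals) if s.sentiment > 0)
    bearish = sum(w for w, s in zip(weights, signals) if s.sentiment < 0)
    # 0.0 when all weight sits on one side; 1.0 when bullish and bearish
    # weight are equal and no signal is neutral.
    disagreement = 2 * min(bullish, bearish) / total
    return {
        "score": score,
        "disagreement": disagreement,
        "contradictory": disagreement >= 0.5,  # assumed threshold
    }


summary = summarize([
    Signal(sentiment=0.8, age_hours=0.0, credibility=1.0),
    Signal(sentiment=-0.6, age_hours=24.0, credibility=1.0),
])
```

Reporting `disagreement` next to `score` keeps a mildly positive average built from strongly opposed documents distinguishable from one built from uniformly mild documents.
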
### Trade Recommendations

- Explainable recommendation objects with action, thesis, confidence, and cited evidence
- Deterministic eligibility scoring separated from action mapping
- Position sizing based on portfolio rules
- Data quality suppression: low-confidence or stale data forces informational-only mode
- Optional LLM thesis rewriting for analyst-quality prose

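
One way to keep eligibility scoring separate from action mapping, with data quality suppression layered on top, is sketched below. The scoring formula, the thresholds, and every name (`TrendInput`, `eligibility`, `map_action`, `recommend`) are hypothetical; the README only states that these concerns are separated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrendInput:
    score: float                 # aggregated sentiment, -1.0 to 1.0
    confidence: float            # 0.0 to 1.0
    newest_doc_age_hours: float  # age of the freshest supporting document


def eligibility(trend: TrendInput) -> float:
    """Deterministic score: direction scaled by confidence (assumed formula)."""
    return trend.score * trend.confidence


def map_action(score: float, threshold: float = 0.3) -> str:
    """Translate a score into an action; knows nothing about how it was scored."""
    if score >= threshold:
        return "buy"
    if score <= -threshold:
        return "sell"
    return "hold"


def recommend(
    trend: TrendInput, min_confidence: float = 0.5, max_age_hours: float = 48.0
) -> dict:
    """Suppress to informational-only when inputs are weak or stale."""
    score = eligibility(trend)
    suppressed = (
        trend.confidence < min_confidence
        or trend.newest_doc_age_hours > max_age_hours
    )
    action = "informational" if suppressed else map_action(score)
    return {"action": action, "eligibility": score, "suppressed": suppressed}


fresh = recommend(TrendInput(score=0.8, confidence=0.9, newest_doc_age_hours=2.0))
stale = recommend(TrendInput(score=0.8, confidence=0.9, newest_doc_age_hours=100.0))
```

Because `map_action` only sees a number, the thresholds can be tuned or replaced without touching the scoring logic, and the same score always maps to the same action.
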
### Risk Engine and Trading

- Paper trading mode and live trading mode as separate environments
- Hard blocks: max position size, daily loss cap, sector exposure limits, symbol cooldowns
- Operator approval workflow for live trading
- Idempotent order submission with duplicate prevention
- Fail-closed behavior on broker outages
- Full execution audit trail from signal to broker response

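
The sketch below shows how hard blocks, fail-closed behavior, and idempotent submission might fit together. It covers only two of the listed limits, and the `Order`, `Limits`, and `OrderGateway` types, the key format, and the reason strings are assumptions made for the example rather than the risk service's real interface.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    signal_id: str
    symbol: str
    side: str
    notional: float


@dataclass(frozen=True)
class Limits:
    max_position_notional: float
    daily_loss_cap: float


def hard_blocks(
    order: Order, limits: Limits, realized_loss_today: float, broker_healthy: bool
) -> list[str]:
    """Return every violated rule; an empty list means the order may proceed."""
    reasons = []
    if not broker_healthy:
        reasons.append("broker_unavailable")  # fail closed on outages
    if order.notional > limits.max_position_notional:
        reasons.append("max_position_size")
    if realized_loss_today >= limits.daily_loss_cap:
        reasons.append("daily_loss_cap")
    return reasons


def idempotency_key(order: Order) -> str:
    """The same signal and order parameters always produce the same key."""
    payload = f"{order.signal_id}|{order.symbol}|{order.side}|{order.notional:.2f}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class OrderGateway:
    """In-memory stand-in for a broker adapter with duplicate prevention."""

    def __init__(self) -> None:
        self._submitted: dict[str, Order] = {}

    def submit(self, order: Order) -> bool:
        key = idempotency_key(order)
        if key in self._submitted:
            return False  # duplicate: already sent for this signal
        self._submitted[key] = order
        return True


limits = Limits(max_position_notional=5_000.0, daily_loss_cap=500.0)
order = Order(signal_id="sig-1", symbol="ACME", side="buy", notional=2_500.0)
gateway = OrderGateway()

blocked = hard_blocks(order, limits, realized_loss_today=120.0, broker_healthy=True)
accepted = not blocked and gateway.submit(order)
retried = gateway.submit(order)  # a retry of the same order is rejected
```

Returning all violated rules instead of stopping at the first gives the audit trail a complete explanation for each rejected order.
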
### Lakehouse and SQL Analytics

- Parquet fact tables on MinIO with Hive-compatible partitioning
- Iceberg table metadata for schema evolution
- Trino SQL engine for ad-hoc analytical queries
- Fact tables: market bars, documents, extractions, trade signals, orders, fills, positions, PnL, prediction vs outcome
- Apache Superset for pre-built dashboards

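
Hive-compatible partitioning encodes partition columns as `key=value` directory names in the object path. The sketch below builds and parses such a prefix; the bucket name `lake`, the table name, and the choice of `dt` and `symbol` as partition columns are assumptions, since the README does not list the actual layout.

```python
from datetime import date


def partition_path(table: str, symbol: str, day: date, bucket: str = "lake") -> str:
    """Build an s3:// prefix using Hive-style key=value directory names."""
    return f"s3://{bucket}/{table}/dt={day.isoformat()}/symbol={symbol}/"


def parse_partitions(path: str) -> dict[str, str]:
    """Recover partition columns from the key=value path segments."""
    return dict(seg.split("=", 1) for seg in path.split("/") if "=" in seg)


prefix = partition_path("market_bars", "ACME", date(2024, 1, 2))
columns = parse_partitions(prefix)
```

With this layout, a query engine that understands Hive partitioning can skip whole directories when a query filters on a partition column.
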
### Web Dashboard

- React/TypeScript SPA with Tailwind CSS
- Company, watchlist, and source management
- Document timeline with intelligence drill-down
- Trend visualization with evidence chain navigation
- Recommendation review with full provenance
- Order and position tracking with audit trails
- Trading mode controls, risk configuration, approval workflow
- DevOps dashboards: pipeline health, ingestion throughput, model performance, source coverage
- Interactive SQL explorer with Monaco Editor and chart builder
- Pre-built analytical dashboards: symbol overview, sentiment heatmap, prediction accuracy, paper trading PnL, model quality

### Observability

- Structured JSON logging across all services
- Prometheus metrics for every pipeline stage
- Alerting for source failures, schema failure spikes, analytical lag, and broker issues
- Dead-letter queues with replay tooling
- Data retention and lifecycle controls

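
Structured JSON logging can be done with the standard library alone, as in the sketch below. The field names (`level`, `service`, `message`) and the convention of passing context through a `fields` key in `extra` are assumptions; the shared logging module in this repository may use a different shape.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "service": record.name,
            "message": record.getMessage(),
        }
        # Structured context passed via logging's `extra` argument.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)


stream = io.StringIO()  # stands in for stdout so the output can be inspected
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("ingestion")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # avoid duplicate output through the root logger

logger.info("fetch complete", extra={"fields": {"source": "news", "documents": 3}})
line = stream.getvalue().strip()
```

One JSON object per line keeps the logs easy for a log collector to parse and lets alerts filter on fields such as `source` without regular expressions.
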
### Global News Interpolation *(planned)*

- Macro/geopolitical event ingestion and Ollama-based classification
- Company exposure profiles (geographic revenue mix, supply chain, commodities, market position)
- Per-company macro impact scoring with resilience modifiers
- Macro signals blended into trend aggregation with configurable weight
- Runtime toggle to enable/disable macro signal layer
- Forward-looking trend projections combining company momentum with macro trajectories
- Dashboard pages for global events, macro exposure panels, and projection visualization

## Services

| Service | Description |
|---------|-------------|
| `scheduler` | Triggers ingestion cycles based on source polling intervals |
| `symbol-registry` | Manages companies, aliases, watchlists, sources, and exposure profiles |
| `ingestion` | Fetches market data, news, and filings from external APIs |
| `parser` | Normalizes raw HTML/text, reduces boilerplate, scores parse quality |
| `extractor` | Runs Ollama extraction to produce document intelligence objects |
| `aggregation` | Computes rolling trend summaries with contradiction detection |
| `recommendation` | Generates trade recommendations from aggregated evidence |
| `risk` | Evaluates orders against portfolio risk controls |
| `broker-adapter` | Interfaces with broker APIs for paper/live trading |
| `lake-publisher` | Writes analytical Parquet datasets to MinIO |
| `query-api` | REST API for all operational and analytical queries |
| `dashboard` | React SPA served via nginx |

## Tech Stack

- **Language**: Python 3.12, TypeScript (frontend)
- **AI**: Ollama (local LLM inference with structured JSON output)
- **Databases**: PostgreSQL 16, Redis 7
- **Object Storage**: MinIO (S3-compatible)
- **Lakehouse**: Parquet + Hive partitioning + Iceberg metadata
- **SQL Engine**: Trino
- **BI**: Apache Superset
- **Frontend**: React 19, Vite, TanStack Router/Query, Recharts, Monaco Editor, Tailwind CSS
- **Infrastructure**: Kubernetes (k3s), Helm, Traefik ingress, cert-manager
- **CI/CD**: GitHub Actions → GitHub Container Registry (GHCR)
- **Broker**: Alpaca (paper trading)
- **Market Data**: Polygon.io

## Project Structure

```
├── services/
│   ├── shared/           # Config, schemas, Redis keys, logging, audit
│   ├── scheduler/        # Job scheduling and source polling
│   ├── symbol_registry/  # Company and source management API
│   ├── ingestion/        # External API adapters and raw artifact storage
│   ├── parser/           # HTML parsing, boilerplate reduction, quality scoring
│   ├── extractor/        # Ollama extraction and schema validation
│   ├── aggregation/      # Trend computation and contradiction detection
│   ├── recommendation/   # Recommendation generation and suppression
│   ├── risk/             # Risk evaluation and approval workflow
│   ├── adapters/         # Broker API integration
│   ├── lake_publisher/   # Parquet fact table publication
│   └── api/              # Query API (FastAPI)
├── frontend/             # React dashboard SPA
├── infra/
│   ├── helm/             # Helm chart for Kubernetes deployment
│   ├── k8s/              # Raw Kubernetes manifests
│   ├── migrations/       # PostgreSQL schema migrations
│   ├── trino/            # Trino catalog configuration
│   ├── hive/             # Hive metastore configuration
│   ├── minio/            # MinIO lifecycle policies
│   └── superset/         # Superset configuration
├── dashboards/           # Superset dashboard JSON exports
├── tests/                # Python test suite
└── docker/               # Dockerfiles for services and Superset
```

## Local Development

Prerequisites: Python 3.12, Node.js 24, Docker

```bash
# Start infrastructure
docker compose up -d

# Install Python dependencies
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Run tests
python -m pytest tests/ -x --tb=short -q

# Frontend
cd frontend
npm install
npx vitest --run
```

## Deployment

The platform runs on Kubernetes with Helm:

```bash
# CI builds and pushes images automatically on push to main
# Deploy to cluster:
helm upgrade --install stonks-oracle infra/helm/stonks-oracle -n stonks-oracle

# Restart a specific service:
kubectl rollout restart deployment/<service-name> -n stonks-oracle
```

## Live Endpoints

| Service | URL |
|---------|-----|
| Dashboard | https://stonks.celestium.life |
| Query API | https://stonks-api.celestium.life |
| Symbol Registry | https://stonks-registry.celestium.life |
| Superset | https://stonks-dash.celestium.life |
| Trino | https://stonks-trino.celestium.life |

## License

Private repository.