feat: competitive intelligence & historical pattern matching layer
@@ -0,0 +1,224 @@
# Stonks Oracle

AI-powered market intelligence and paper-trading platform. Ingests market data, company news, and regulatory filings; extracts structured intelligence with local LLMs; computes trend summaries and trade recommendations; and optionally executes paper trades, all self-hosted on Kubernetes.

## What It Does

Stonks Oracle monitors tracked companies across multiple data sources, runs every article and filing through a local Ollama model to extract structured intelligence (sentiment, catalysts, risks, key facts), aggregates those signals into rolling trend summaries with contradiction detection, and generates explainable trade recommendations with risk controls.

Everything is auditable: raw artifacts, prompts, model outputs, and decision traces are preserved. Historical data flows into a MinIO-backed lakehouse queryable via Trino and visualized through Superset dashboards and a built-in React dashboard.

## Architecture

```
┌─────────────┐    ┌──────────┐    ┌──────────┐    ┌─────────────┐
│  Scheduler  │───▶│ Ingestion│───▶│  Parser  │───▶│  Extractor  │
└─────────────┘    └──────────┘    └──────────┘    └──────┬──────┘
                                                          │
       ┌──────────────────────────────────────────────────┘
       ▼
┌─────────────┐    ┌────────────────┐    ┌──────────────┐
│ Aggregation │───▶│ Recommendation │───▶│ Risk Engine  │
└─────────────┘    └────────────────┘    └──────┬───────┘
                                                │
       ┌────────────────────────────────────────┘
       ▼
┌──────────────┐    ┌────────────────┐
│Broker Adapter│    │ Lake Publisher │
└──────────────┘    └────────────────┘
                            │
       ┌────────────────────┘
       ▼
┌──────────┐  ┌──────────┐  ┌───────────┐
│  Trino   │  │ Superset │  │ Dashboard │
└──────────┘  └──────────┘  └───────────┘
```

Two planes:
- **Operational**: ingestion, parsing, extraction, aggregation, recommendations, risk evaluation, trade execution (PostgreSQL, Redis, MinIO)
- **Analytical**: historical fact tables, SQL queries, dashboards (MinIO/Parquet, Trino, Superset)

## Features

### Data Ingestion
- Market data via Polygon.io (quotes, OHLCV bars, corporate actions)
- Company news via news APIs with full article scraping
- SEC filings and regulatory events
- Configurable polling intervals, rate limiting, retries, and backoff
- Content hash deduplication across all sources
- Raw artifact preservation in MinIO for full auditability
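
Content hash deduplication could be sketched roughly as follows. This is a minimal illustration, not the repository's implementation: the normalization rule, the names `content_hash` and `Deduplicator`, and the in-memory set (standing in for a shared store such as Redis or PostgreSQL) are all assumptions.

```python
import hashlib


def content_hash(text: str) -> str:
    """Hash a document body after collapsing whitespace and lowercasing.

    The normalization here is an assumed rule for illustration only.
    """
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class Deduplicator:
    """Tracks hashes already seen; the set is a stand-in for a shared store."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def is_new(self, text: str) -> bool:
        """Return True only the first time a given content hash is observed."""
        digest = content_hash(text)
        if digest in self._seen:
            return False
        self._seen.add(digest)
        return True
```

Hashing normalized text rather than raw bytes means the same article fetched from two sources with different whitespace or casing collapses to a single record.
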

### AI-Powered Extraction
- Local Ollama models with schema-constrained JSON output
- Per-document intelligence: sentiment, catalysts, impact horizon, key facts, risks, macro themes
- Per-company impact records when a document mentions multiple companies
- Schema and semantic validation with retry on invalid outputs
- Prompt, model metadata, and raw output preservation for reproducibility
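
Validation with retry on invalid outputs might look something like the sketch below. The field names (`sentiment`, `confidence`, `key_facts`), the allowed sentiment labels, and the `generate` callable (a placeholder for whatever wraps the Ollama call) are assumptions made for illustration; the actual schema lives in the extractor service.

```python
import json
from typing import Callable, Optional

ALLOWED_SENTIMENTS = {"positive", "neutral", "negative"}  # assumed label set


def validate_extraction(raw: str) -> dict:
    """Parse model output and apply schema and semantic checks."""
    data = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if data.get("sentiment") not in ALLOWED_SENTIMENTS:
        raise ValueError("sentiment missing or not an allowed label")
    confidence = data.get("confidence")
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("confidence must be a number")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    if not isinstance(data.get("key_facts"), list):
        raise ValueError("key_facts must be a list")
    return data


def extract_with_retry(
    generate: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> dict:
    """Call the model until its output validates or attempts run out."""
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            return validate_extraction(generate(prompt))
        except ValueError as exc:
            last_error = exc  # keep the most recent failure for the trace
    raise RuntimeError(
        f"extraction failed after {max_attempts} attempts"
    ) from last_error
```

Passing the model call in as a function keeps the retry loop testable without a running Ollama instance.
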

### Trend Aggregation
- Rolling company-level trend summaries across 5 windows (intraday, 1d, 7d, 30d, 90d)
- Recency decay, source credibility weighting, and document novelty scoring
- Contradiction detection with explicit disagreement representation
- Sector and market-level rollups
- Evidence ranking with top supporting and opposing documents
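
One way to combine recency decay, credibility weighting, and a disagreement measure is sketched below. The exponential half-life decay, the `Signal` fields, and the disagreement formula are assumptions chosen for illustration, not the aggregation service's actual scoring.

```python
from dataclasses import dataclass


@dataclass
class Signal:
    sentiment: float    # -1.0 (bearish) to 1.0 (bullish)
    age_hours: float    # time since publication
    credibility: float  # source weight in [0, 1]


def signal_weight(signal: Signal, half_life_hours: float) -> float:
    """Exponential recency decay scaled by source credibility."""
    decay = 0.5 ** (signal.age_hours / half_life_hours)
    return decay * signal.credibility


def summarize(signals: list[Signal], half_life_hours: float = 24.0) -> dict:
    """Weighted mean sentiment plus a disagreement score in [0, 1]."""
    weights = [signal_weight(s, half_life_hours) for s in signals]
    total = sum(weights)
    if total == 0:
        return {"score": 0.0, "disagreement": 0.0}
    score = sum(w * s.sentiment for w, s in zip(weights, signals)) / total
    pos = sum(w for w, s in zip(weights, signals) if s.sentiment > 0)
    neg = sum(w for w, s in zip(weights, signals) if s.sentiment < 0)
    # Share of directional weight on the minority side, doubled so an even
    # split yields 1.0 and a unanimous set yields 0.0.
    directional = pos + neg
    disagreement = 2 * min(pos, neg) / directional if directional else 0.0
    return {"score": score, "disagreement": disagreement}
```

Reporting disagreement alongside the score keeps a near-zero average from hiding two strong, opposing camps of evidence.
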

### Trade Recommendations
- Explainable recommendation objects with action, thesis, confidence, and cited evidence
- Deterministic eligibility scoring separated from action mapping
- Position sizing based on portfolio rules
- Data quality suppression: low-confidence or stale data forces informational-only mode
- Optional LLM thesis rewriting for analyst-quality prose
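
The split between eligibility scoring, suppression, and action mapping can be illustrated with a small sketch. The thresholds, the `TrendSummary` fields, and the action labels are hypothetical values picked for the example.

```python
from dataclasses import dataclass


@dataclass
class TrendSummary:
    score: float                 # aggregated sentiment, -1.0 to 1.0
    confidence: float            # 0.0 to 1.0
    newest_doc_age_hours: float  # staleness of the freshest evidence


def eligibility(trend: TrendSummary) -> float:
    """Deterministic score: signal strength scaled by confidence."""
    return abs(trend.score) * trend.confidence


def is_suppressed(
    trend: TrendSummary, min_confidence: float = 0.4, max_age_hours: float = 48.0
) -> bool:
    """Low-confidence or stale data disables actionable output."""
    return (
        trend.confidence < min_confidence
        or trend.newest_doc_age_hours > max_age_hours
    )


def map_action(trend: TrendSummary, threshold: float = 0.3) -> str:
    """Map a scored trend to an action; suppression is checked first."""
    if is_suppressed(trend):
        return "informational"
    if eligibility(trend) < threshold:
        return "hold"
    return "buy" if trend.score > 0 else "sell"
```

Because `eligibility` is a pure function of the trend, the same inputs always produce the same score, and the mapping thresholds can change without touching the scoring logic.
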

### Risk Engine and Trading
- Paper trading mode and live trading mode as separate environments
- Hard blocks: max position size, daily loss cap, sector exposure limits, symbol cooldowns
- Operator approval workflow for live trading
- Idempotent order submission with duplicate prevention
- Fail-closed behavior on broker outages
- Full execution audit trail from signal to broker response
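
A rough sketch of hard blocks, fail-closed handling, and idempotent submission follows. The `Order` and `Limits` shapes, the block reason strings, and the key derivation are assumptions for illustration; only two of the listed limits are modeled, and the in-memory set stands in for a durable store.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    signal_id: str
    symbol: str
    side: str
    notional: float


@dataclass(frozen=True)
class Limits:
    max_position_notional: float
    daily_loss_cap: float


def hard_blocks(
    order: Order, daily_loss: float, broker_healthy: bool, limits: Limits
) -> list[str]:
    """Return every reason the order must be rejected; empty means allowed."""
    reasons = []
    if not broker_healthy:
        reasons.append("broker_unavailable")  # fail closed on outages
    if order.notional > limits.max_position_notional:
        reasons.append("max_position_size")
    if daily_loss >= limits.daily_loss_cap:
        reasons.append("daily_loss_cap")
    return reasons


def idempotency_key(order: Order) -> str:
    """Derive a stable key so a retried signal maps to the same order."""
    raw = f"{order.signal_id}:{order.symbol}:{order.side}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class OrderLedger:
    """Remembers submitted keys; a set stands in for a durable store."""

    def __init__(self) -> None:
        self._submitted: set[str] = set()

    def submit(self, order: Order) -> bool:
        """Return False when the same order was already submitted."""
        key = idempotency_key(order)
        if key in self._submitted:
            return False
        self._submitted.add(key)
        return True
```

Returning all block reasons rather than the first one gives the audit trail a complete picture of why an order was stopped.
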

### Lakehouse and SQL Analytics
- Parquet fact tables on MinIO with Hive-compatible partitioning
- Iceberg table metadata for schema evolution
- Trino SQL engine for ad-hoc analytical queries
- Fact tables: market bars, documents, extractions, trade signals, orders, fills, positions, PnL, prediction vs outcome
- Apache Superset for pre-built dashboards
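
Hive-compatible partitioning encodes partition columns as `key=value` directory segments. The sketch below shows the idea; the bucket name `lake`, the table name, and the choice of `dt` and `symbol` as partition columns are assumptions, not the project's actual layout.

```python
from datetime import date


def partition_path(table: str, symbol: str, day: date, bucket: str = "lake") -> str:
    """Build an object-store prefix using key=value directory segments.

    Partition columns (dt, symbol) and the default bucket are hypothetical.
    """
    return (
        f"s3://{bucket}/{table}/"
        f"dt={day.isoformat()}/symbol={symbol.upper()}/"
    )
```

For example, `partition_path("market_bars", "aapl", date(2024, 1, 2))` yields `s3://lake/market_bars/dt=2024-01-02/symbol=AAPL/`, a layout that engines such as Trino can prune by partition value.
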

### Web Dashboard
- React/TypeScript SPA with Tailwind CSS
- Company, watchlist, and source management
- Document timeline with intelligence drill-down
- Trend visualization with evidence chain navigation
- Recommendation review with full provenance
- Order and position tracking with audit trails
- Trading mode controls, risk configuration, approval workflow
- DevOps dashboards: pipeline health, ingestion throughput, model performance, source coverage
- Interactive SQL explorer with Monaco Editor and chart builder
- Pre-built analytical dashboards: symbol overview, sentiment heatmap, prediction accuracy, paper trading PnL, model quality

### Observability
- Structured JSON logging across all services
- Prometheus metrics for every pipeline stage
- Alerting for source failures, schema failure spikes, analytical lag, and broker issues
- Dead-letter queues with replay tooling
- Data retention and lifecycle controls
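
Structured JSON logging can be done with the standard library alone. The following is a minimal sketch; the `JsonFormatter` name and the extra fields `service` and `document_id` are assumptions, and the shared logging module in this repository may differ.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Fields passed via `extra=` become attributes on the record.
        for key in ("service", "document_id"):  # hypothetical field names
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        return json.dumps(payload)


def get_logger(name: str) -> logging.Logger:
    """Return a logger that emits JSON lines to stderr."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

A call such as `get_logger("extractor").info("parsed", extra={"service": "extractor"})` then emits one JSON object per line, which log collectors can index without custom parsing.
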

### Global News Interpolation *(planned)*
- Macro/geopolitical event ingestion and Ollama-based classification
- Company exposure profiles (geographic revenue mix, supply chain, commodities, market position)
- Per-company macro impact scoring with resilience modifiers
- Macro signals blended into trend aggregation with configurable weight
- Runtime toggle to enable/disable macro signal layer
- Forward-looking trend projections combining company momentum with macro trajectories
- Dashboard pages for global events, macro exposure panels, and projection visualization
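
Since this layer is still planned, the following is only a guess at how a configurable macro weight and runtime toggle might interact. The linear blend, the function name `blend_trend`, and the default weight are all assumptions.

```python
def blend_trend(
    company_score: float,
    macro_score: float,
    macro_weight: float = 0.2,   # hypothetical default
    macro_enabled: bool = True,  # stands in for the runtime toggle
) -> float:
    """Linearly blend a company trend score with a macro impact score."""
    if not macro_enabled:
        return company_score  # toggle off: macro layer has no effect
    if not 0.0 <= macro_weight <= 1.0:
        raise ValueError("macro_weight must lie in [0, 1]")
    return (1.0 - macro_weight) * company_score + macro_weight * macro_score
```

With the toggle off the function returns the company score unchanged, so disabling the macro layer cannot alter existing trend summaries.
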

## Services

| Service | Description |
|---------|-------------|
| `scheduler` | Triggers ingestion cycles based on source polling intervals |
| `symbol-registry` | Manages companies, aliases, watchlists, sources, and exposure profiles |
| `ingestion` | Fetches market data, news, and filings from external APIs |
| `parser` | Normalizes raw HTML/text, reduces boilerplate, scores parse quality |
| `extractor` | Runs Ollama extraction to produce document intelligence objects |
| `aggregation` | Computes rolling trend summaries with contradiction detection |
| `recommendation` | Generates trade recommendations from aggregated evidence |
| `risk` | Evaluates orders against portfolio risk controls |
| `broker-adapter` | Interfaces with broker APIs for paper/live trading |
| `lake-publisher` | Writes analytical Parquet datasets to MinIO |
| `query-api` | REST API for all operational and analytical queries |
| `dashboard` | React SPA served via nginx |

## Tech Stack

- **Language**: Python 3.12, TypeScript (frontend)
- **AI**: Ollama (local LLM inference with structured JSON output)
- **Databases**: PostgreSQL 16, Redis 7
- **Object Storage**: MinIO (S3-compatible)
- **Lakehouse**: Parquet + Hive partitioning + Iceberg metadata
- **SQL Engine**: Trino
- **BI**: Apache Superset
- **Frontend**: React 19, Vite, TanStack Router/Query, Recharts, Monaco Editor, Tailwind CSS
- **Infrastructure**: Kubernetes (k3s), Helm, Traefik ingress, cert-manager
- **CI/CD**: GitHub Actions → GHCR container registry
- **Broker**: Alpaca (paper trading)
- **Market Data**: Polygon.io

## Project Structure

```
├── services/
│   ├── shared/              # Config, schemas, Redis keys, logging, audit
│   ├── scheduler/           # Job scheduling and source polling
│   ├── symbol_registry/     # Company and source management API
│   ├── ingestion/           # External API adapters and raw artifact storage
│   ├── parser/              # HTML parsing, boilerplate reduction, quality scoring
│   ├── extractor/           # Ollama extraction and schema validation
│   ├── aggregation/         # Trend computation and contradiction detection
│   ├── recommendation/      # Recommendation generation and suppression
│   ├── risk/                # Risk evaluation and approval workflow
│   ├── adapters/            # Broker API integration
│   ├── lake_publisher/      # Parquet fact table publication
│   └── api/                 # Query API (FastAPI)
├── frontend/                # React dashboard SPA
├── infra/
│   ├── helm/                # Helm chart for Kubernetes deployment
│   ├── k8s/                 # Raw Kubernetes manifests
│   ├── migrations/          # PostgreSQL schema migrations
│   ├── trino/               # Trino catalog configuration
│   ├── hive/                # Hive metastore configuration
│   ├── minio/               # MinIO lifecycle policies
│   └── superset/            # Superset configuration
├── dashboards/              # Superset dashboard JSON exports
├── tests/                   # Python test suite
└── docker/                  # Dockerfiles for services and Superset
```

## Local Development

Prerequisites: Python 3.12, Node.js 24, Docker

```bash
# Start infrastructure
docker compose up -d

# Install Python dependencies
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Run tests
python -m pytest tests/ -x --tb=short -q

# Frontend
cd frontend
npm install
npx vitest --run
```

## Deployment

The platform runs on Kubernetes with Helm:

```bash
# CI builds and pushes images automatically on push to main
# Deploy to cluster:
helm upgrade --install stonks-oracle infra/helm/stonks-oracle -n stonks-oracle

# Restart a specific service:
kubectl rollout restart deployment/<service-name> -n stonks-oracle
```

## Live Endpoints

| Service | URL |
|---------|-----|
| Dashboard | https://stonks.celestium.life |
| Query API | https://stonks-api.celestium.life |
| Symbol Registry | https://stonks-registry.celestium.life |
| Superset | https://stonks-dash.celestium.life |
| Trino | https://stonks-trino.celestium.life |

## License

Private repository.