481 lines
15 KiB
Markdown
481 lines
15 KiB
Markdown
# Stonks Oracle - Design
|
|
|
|
## 1. Purpose
|
|
Stonks Oracle is a Kubernetes-native AI market intelligence and trading platform. It ingests structured market data, company news, filings, and curated web content; preserves raw artifacts in MinIO; extracts structured intelligence objects with local Ollama models; aggregates signals into trend and recommendation outputs; optionally executes trades through a broker integration; and publishes historical datasets into a local lakehouse for Athena-like querying and QuickSight-like dashboards.
|
|
|
|
This design prioritizes:
|
|
- deterministic data contracts
|
|
- auditability of every AI-derived conclusion
|
|
- safe paper-trading-first automation
|
|
- self-hosted analytics on MinIO-backed datasets
|
|
- clear separation between operational state and analytical state
|
|
|
|
## 2. Architecture Summary
|
|
The platform is split into two planes:
|
|
|
|
### 2.1 Operational plane
|
|
Handles ingestion, parsing, structured extraction, signal generation, risk evaluation, trade execution, and control APIs.
|
|
|
|
Primary stores:
|
|
- PostgreSQL for operational state and transactional records
|
|
- Redis for queues, locks, and hot cache state
|
|
- MinIO for raw artifacts, prompts, model outputs, and exported datasets
|
|
|
|
### 2.2 Analytical plane
|
|
Handles historical fact storage, SQL query access, research, scorecards, and dashboards.
|
|
|
|
Primary components:
|
|
- MinIO as S3-compatible object store
|
|
- Hive-compatible partition layout for query compatibility
|
|
- Iceberg tables as the preferred lakehouse abstraction for managed analytical datasets
|
|
- Trino as the Athena-like SQL query engine
|
|
- Apache Superset as the QuickSight-like dashboard and exploration layer
|
|
|
|
## 3. External Integrations
|
|
|
|
### 3.1 Market Data API
|
|
Used for:
|
|
- quotes
|
|
- OHLCV bars
|
|
- reference data
|
|
- corporate actions
|
|
- earnings calendars
|
|
- optional market news or fundamentals
|
|
|
|
### 3.2 News API
|
|
Used for:
|
|
- company-linked headlines
|
|
- publisher metadata
|
|
- article URLs
|
|
- article summaries when licensed
|
|
|
|
### 3.3 Filings / Regulatory API
|
|
Used for:
|
|
- SEC-style company submissions
|
|
- 8-K, 10-Q, 10-K, and related filings
|
|
- structured issuer event discovery
|
|
|
|
### 3.4 Web Scraper
|
|
Used for:
|
|
- full article body retrieval when API content is partial
|
|
- investor relations pages
|
|
- curated press release sources
|
|
- transcript or presentation retrieval when permitted
|
|
|
|
### 3.5 Broker API
|
|
Used for:
|
|
- paper-trading simulation or sandbox trading
|
|
- live order submission when enabled
|
|
- order acknowledgements and rejections
|
|
- fills and cancellations
|
|
- positions and account balances
|
|
|
|
## 4. Logical Components
|
|
|
|
### 4.1 Symbol Registry Service
|
|
Responsibilities:
|
|
- manage companies, aliases, watchlists, sectors, and source configurations
|
|
- manage source trust or credibility policies
|
|
- manage symbol-to-document matching rules
|
|
|
|
### 4.2 Scheduler / Orchestrator
|
|
Responsibilities:
|
|
- trigger market, news, filings, and scrape jobs
|
|
- manage polling cadences by source class
|
|
- coordinate backoff, retries, and dedupe windows
|
|
- publish downstream jobs to workers
|
|
|
|
### 4.3 Ingestion Adapters
|
|
Subcomponents:
|
|
- Market data adapter
|
|
- News API adapter
|
|
- Filings adapter
|
|
- Broker event adapter
|
|
|
|
Responsibilities:
|
|
- fetch external payloads
|
|
- preserve raw responses in MinIO
|
|
- normalize metadata into PostgreSQL
|
|
- emit processing jobs for parsing or publication
|
|
|
|
### 4.4 Scraper / Parser Service
|
|
Responsibilities:
|
|
- fetch and render source pages
|
|
- extract normalized text and metadata
|
|
- reduce boilerplate and duplicated template text
|
|
- score parser quality and extraction confidence
|
|
- persist normalized artifacts
|
|
|
|
### 4.5 Ollama Extraction Service
|
|
Responsibilities:
|
|
- call local Ollama models using schema-constrained JSON output
|
|
- produce canonical document intelligence objects
|
|
- preserve prompts, schemas, model metadata, and raw outputs
|
|
- validate schema and semantic consistency
|
|
- retry invalid generations under policy
|
|
|
|
### 4.6 Aggregation Engine
|
|
Responsibilities:
|
|
- combine document intelligence with market context
|
|
- compute rolling trend summaries by company, sector, and market
|
|
- track contradiction and agreement signals
|
|
- score evidence with recency decay and source weighting
|
|
|
|
### 4.7 Recommendation Engine
|
|
Responsibilities:
|
|
- generate explainable recommendation objects from aggregated evidence
|
|
- separate deterministic eligibility scoring from final action mapping
|
|
- produce suggested action, thesis, horizon, and invalidation conditions
|
|
- publish analytical prediction facts to the lake
|
|
|
|
### 4.8 Risk Engine
|
|
Responsibilities:
|
|
- enforce guardrails such as max position size, daily loss cap, exposure by sector, symbol cooldowns, news shock lockouts, and operator approval rules
|
|
- determine whether a recommendation is eligible for paper or live execution
|
|
- block ambiguous or unsafe orders before broker submission
|
|
|
|
### 4.9 Broker Adapter
|
|
Responsibilities:
|
|
- abstract one or more trading APIs
|
|
- support paper mode and live mode
|
|
- record submission, acknowledgement, rejection, fill, and cancellation events
|
|
- guarantee idempotent order submission keys
|
|
- publish order and fill facts to both PostgreSQL and the analytical lake
|
|
|
|
### 4.10 Lake Publisher
|
|
Responsibilities:
|
|
- transform operational records into analytics-friendly fact datasets
|
|
- publish append-only partitioned tables to MinIO
|
|
- maintain Iceberg metadata or equivalent lakehouse metadata
|
|
- expose datasets such as predictions, outcomes, fills, bars, and PnL
|
|
|
|
### 4.11 Query API / Dashboard
|
|
Responsibilities:
|
|
- expose companies, documents, trends, recommendations, and orders
|
|
- provide evidence drill-down and audit views
|
|
- provide operator controls for live-trading enablement and review queues
|
|
- expose links into analytical dashboards and query tools
|
|
|
|
### 4.12 SQL Query Engine and BI Layer
|
|
Components:
|
|
- Trino coordinator and workers
|
|
- Hive Metastore or Iceberg catalog service
|
|
- Apache Superset
|
|
|
|
Responsibilities:
|
|
- provide Athena-like SQL access to MinIO-hosted tables
|
|
- support dashboard datasets and ad hoc exploration
|
|
- support joins between market facts, AI predictions, and executed trades
|
|
|
|
## 5. Storage Model
|
|
|
|
### 5.1 Operational stores
|
|
#### PostgreSQL
|
|
Used for:
|
|
- companies and aliases
|
|
- watchlists and source configs
|
|
- article and filing metadata
|
|
- document intelligence objects
|
|
- trend summaries
|
|
- recommendations
|
|
- risk evaluations
|
|
- orders and execution events
|
|
- control-plane state and audit records
|
|
|
|
#### Redis
|
|
Used for:
|
|
- distributed locks for symbol-source retrieval
|
|
- ingestion rate-limit counters
|
|
- job queue state
|
|
- retry backoff state
|
|
- dedupe markers
|
|
- cache for hot API and dashboard views
|
|
|
|
#### MinIO object storage
|
|
Used for:
|
|
- raw API payloads
|
|
- raw article HTML and normalized text
|
|
- prompts, schemas, and raw model results
|
|
- exported analytical datasets
|
|
- audit traces and reproducibility bundles
|
|
|
|
### 5.2 MinIO bucket layout
|
|
Recommended buckets:
|
|
- `stonks-raw-market` — raw market API payloads
|
|
- `stonks-raw-news` — raw news API payloads and article HTML
|
|
- `stonks-raw-filings` — raw filings and issuer event payloads
|
|
- `stonks-normalized` — cleaned text and parser outputs
|
|
- `stonks-llm-prompts` — prompts and schemas used
|
|
- `stonks-llm-results` — raw model outputs and validation reports
|
|
- `stonks-lakehouse` — partitioned analytical datasets and table metadata
|
|
- `stonks-audit` — execution traces and exported reports
|
|
|
|
Suggested raw object path pattern:
|
|
```text
|
|
/{stage}/{symbol}/{yyyy}/{mm}/{dd}/{document_id}/{artifact_type}.json
|
|
/{stage}/{symbol}/{yyyy}/{mm}/{dd}/{document_id}/{artifact_type}.html
|
|
```
|
|
|
|
Suggested analytical path pattern:
|
|
```text
|
|
/warehouse/{table_name}/dt={yyyy-mm-dd}/symbol={ticker}/part-*.parquet
|
|
```
|
|
|
|
### 5.3 Lakehouse model
|
|
Preferred design:
|
|
- Parquet files stored in MinIO
|
|
- Hive-compatible partitioning for interoperability
|
|
- Iceberg table metadata for managed analytical tables
|
|
- Trino catalogs for SQL access
|
|
|
|
Rationale:
|
|
- Hive-compatible layouts preserve broad engine compatibility
|
|
- Iceberg improves schema evolution, partition handling, and table maintenance
|
|
- Trino can query MinIO-backed object storage and supports both Hive and Iceberg catalogs
|
|
|
|
## 6. Data Model
|
|
|
|
### 6.1 PostgreSQL schema outline
|
|
Core tables:
|
|
- `companies`
|
|
- `company_aliases`
|
|
- `watchlists`
|
|
- `watchlist_members`
|
|
- `sources`
|
|
- `api_credentials_refs`
|
|
- `ingestion_runs`
|
|
- `market_snapshots`
|
|
- `documents`
|
|
- `document_versions`
|
|
- `document_company_mentions`
|
|
- `document_intelligence`
|
|
- `document_impact_records`
|
|
- `trend_windows`
|
|
- `recommendations`
|
|
- `recommendation_evidence`
|
|
- `risk_evaluations`
|
|
- `broker_accounts`
|
|
- `orders`
|
|
- `order_events`
|
|
- `positions`
|
|
- `audit_events`
|
|
|
|
### 6.2 Article or document metadata record
|
|
```json
|
|
{
|
|
"document_id": "uuid",
|
|
"document_type": "article|filing|transcript|press_release",
|
|
"symbol_candidates": ["AAPL", "MSFT"],
|
|
"source_type": "news_api",
|
|
"publisher": "string",
|
|
"url": "string",
|
|
"canonical_url": "string",
|
|
"title": "string",
|
|
"published_at": "2026-04-09T00:00:00Z",
|
|
"retrieved_at": "2026-04-09T00:00:00Z",
|
|
"language": "en",
|
|
"content_hash": "sha256",
|
|
"storage_refs": {
|
|
"raw_html": "s3://...",
|
|
"raw_payload": "s3://..."
|
|
}
|
|
}
|
|
```
|
|
|
|
### 6.3 Document intelligence schema
|
|
```json
|
|
{
|
|
"document_id": "uuid",
|
|
"summary": "string",
|
|
"companies": [
|
|
{
|
|
"ticker": "AAPL",
|
|
"company_name": "Apple Inc.",
|
|
"relevance": 0.95,
|
|
"sentiment": "positive",
|
|
"impact_score": 0.71,
|
|
"impact_horizon": "1d_30d",
|
|
"catalyst_type": "earnings|product|legal|macro|supply_chain|m_and_a|rating_change|other",
|
|
"key_facts": ["string"],
|
|
"risks": ["string"],
|
|
"evidence_spans": ["string"]
|
|
}
|
|
],
|
|
"macro_themes": ["rates", "ai_capex"],
|
|
"novelty_score": 0.64,
|
|
"source_credibility": 0.8,
|
|
"extraction_warnings": ["ambiguous_ticker_reference"],
|
|
"confidence": 0.86,
|
|
"model": {
|
|
"provider": "ollama",
|
|
"model_name": "gpt-oss:20b",
|
|
"prompt_version": "document-intel-v2",
|
|
"schema_version": "2.0.0"
|
|
}
|
|
}
|
|
```
|
|
|
|
### 6.4 Trend summary schema
|
|
```json
|
|
{
|
|
"entity_type": "company",
|
|
"entity_id": "AAPL",
|
|
"window": "7d",
|
|
"trend_direction": "bullish|bearish|mixed|neutral",
|
|
"trend_strength": 0.68,
|
|
"confidence": 0.74,
|
|
"top_supporting_evidence": ["document_id_1", "document_id_2"],
|
|
"top_opposing_evidence": ["document_id_3"],
|
|
"dominant_catalysts": ["product", "analyst_rating"],
|
|
"material_risks": ["regulatory scrutiny"],
|
|
"contradiction_score": 0.22
|
|
}
|
|
```
|
|
|
|
### 6.5 Recommendation schema
|
|
```json
|
|
{
|
|
"recommendation_id": "uuid",
|
|
"ticker": "AAPL",
|
|
"action": "buy|sell|hold|watch",
|
|
"mode": "informational|paper_eligible|live_eligible",
|
|
"confidence": 0.72,
|
|
"time_horizon": "swing_1d_10d",
|
|
"thesis": "string",
|
|
"invalidation_conditions": ["string"],
|
|
"position_sizing": {
|
|
"portfolio_pct": 0.02,
|
|
"max_loss_pct": 0.005
|
|
},
|
|
"evidence_refs": ["document_id_1", "document_id_2"],
|
|
"model_metadata": {
|
|
"version": "recommendation-v1"
|
|
}
|
|
}
|
|
```
|
|
|
|
## 7. Analytical Lake Datasets
|
|
The analytical plane should expose the following logical fact tables:
|
|
- `lake.market_bars`
|
|
- `lake.market_quotes`
|
|
- `lake.company_events`
|
|
- `lake.documents`
|
|
- `lake.document_extractions`
|
|
- `lake.trade_signals`
|
|
- `lake.trade_orders`
|
|
- `lake.trade_fills`
|
|
- `lake.positions_daily`
|
|
- `lake.pnl_daily`
|
|
- `lake.prediction_vs_outcome`
|
|
|
|
Recommended partitioning examples:
|
|
- market data: partition by `dt`, optional symbol transform later
|
|
- documents: partition by `dt` and maybe `source_type`
|
|
- predictions: partition by `dt` and `model_version`
|
|
- fills and PnL: partition by `dt` and broker account
|
|
|
|
## 8. Data Flows
|
|
|
|
### 8.1 Market and document ingestion flow
|
|
1. Scheduler selects due symbols and sources.
|
|
2. Adapters fetch market, news, and filings payloads.
|
|
3. Raw payloads are written to MinIO.
|
|
4. Metadata records are written to PostgreSQL.
|
|
5. New documents are emitted to parser jobs.
|
|
|
|
### 8.2 Extraction flow
|
|
1. Parser produces normalized text and confidence score.
|
|
2. Extraction worker sends document to Ollama with schema-bound output.
|
|
3. Validator checks schema and semantic consistency.
|
|
4. Canonical intelligence object is stored in PostgreSQL and MinIO.
|
|
5. Aggregation jobs are triggered for impacted symbols.
|
|
|
|
### 8.3 Recommendation and trade flow
|
|
1. Aggregation engine updates trend windows.
|
|
2. Recommendation engine emits a recommendation object.
|
|
3. Risk engine determines eligibility and allowed execution mode.
|
|
4. Broker adapter places paper or live orders when authorized.
|
|
5. Broker events update PostgreSQL and publish analytical facts to the lake.
|
|
|
|
### 8.4 Lake publication flow
|
|
1. Operational records are transformed into analytical facts.
|
|
2. Facts are written as partitioned Parquet files to MinIO.
|
|
3. Table metadata is updated through Iceberg or equivalent catalog operations.
|
|
4. Trino exposes the datasets for SQL.
|
|
5. Superset uses Trino datasets for dashboards and ad hoc exploration.
|
|
|
|
## 9. Query and Dashboard Surface
|
|
|
|
### 9.1 Operational API
|
|
Should expose:
|
|
- company and watchlist configuration
|
|
- source health and job state
|
|
- document timelines and evidence
|
|
- recommendation history
|
|
- order history and audit trail
|
|
- risk configuration and trading mode
|
|
|
|
### 9.2 Analytical surface
|
|
Should expose:
|
|
- SQL access through Trino
|
|
- dashboard datasets in Superset
|
|
- scorecards for prediction accuracy and PnL
|
|
- evidence-to-outcome drill-down views
|
|
- model performance and extraction failure dashboards
|
|
|
|
Suggested starter dashboards:
|
|
- symbol overview
|
|
- market sentiment heatmap
|
|
- prediction confidence vs realized move
|
|
- paper trading PnL
|
|
- model extraction quality
|
|
- source coverage and ingestion lag
|
|
|
|
## 10. Reliability and Safety
|
|
- Broker submission must be idempotent.
|
|
- Live trading must be disabled by default.
|
|
- Paper trading must be the first enabled execution mode.
|
|
- Invalid model output must not advance to trade execution.
|
|
- Low-quality document extraction must not influence live trading.
|
|
- All analytical publication jobs should be replayable.
|
|
- Every recommendation and order should be reproducible from saved prompts, source refs, and model metadata.
|
|
|
|
## 11. Deployment Notes
|
|
Recommended Kubernetes workloads:
|
|
- `symbol-registry-api`
|
|
- `scheduler`
|
|
- `market-adapter-worker`
|
|
- `news-adapter-worker`
|
|
- `filings-adapter-worker`
|
|
- `scraper-worker`
|
|
- `parser-worker`
|
|
- `ollama-extractor-worker`
|
|
- `aggregation-worker`
|
|
- `recommendation-worker`
|
|
- `risk-engine-api`
|
|
- `broker-adapter`
|
|
- `lake-publisher`
|
|
- `trino-coordinator`
|
|
- `trino-worker`
|
|
- `superset-web`
|
|
- `postgres`
|
|
- `redis`
|
|
- `minio`
|
|
|
|
## 12. Deliberate Scope Boundaries for v1
|
|
Included in v1:
|
|
- tracked watchlists
|
|
- market, news, filings, and broker integrations
|
|
- Ollama structured extraction
|
|
- trend aggregation and recommendation objects
|
|
- paper trading with strict controls
|
|
- MinIO-backed analytics lake
|
|
- Trino and Superset self-hosted analytics
|
|
|
|
Deferred from v1:
|
|
- options trading
|
|
- full order book or tick-level market microstructure
|
|
- online model retraining
|
|
- fully autonomous live trading with no approval workflow
|
|
- advanced portfolio optimization beyond basic sizing and risk caps
|