Stonks Oracle - Design
1. Purpose
Stonks Oracle is a Kubernetes-native AI market intelligence and trading platform. It ingests structured market data, company news, filings, and curated web content; preserves raw artifacts in MinIO; extracts structured intelligence objects with local Ollama models; aggregates signals into trend and recommendation outputs; optionally executes trades through a broker integration; and publishes historical datasets into a local lakehouse for Athena-like querying and QuickSight-like dashboards.
This design prioritizes:
- deterministic data contracts
- auditability of every AI-derived conclusion
- safe paper-trading-first automation
- self-hosted analytics on MinIO-backed datasets
- clear separation between operational state and analytical state
2. Architecture Summary
The platform is split into two planes:
2.1 Operational plane
Handles ingestion, parsing, structured extraction, signal generation, risk evaluation, trade execution, and control APIs.
Primary stores:
- PostgreSQL for operational state and transactional records
- Redis for queues, locks, and hot cache state
- MinIO for raw artifacts, prompts, model outputs, and exported datasets
2.2 Analytical plane
Handles historical fact storage, SQL query access, research, scorecards, and dashboards.
Primary components:
- MinIO as S3-compatible object store
- Hive-compatible partition layout for query compatibility
- Iceberg tables as the preferred lakehouse abstraction for managed analytical datasets
- Trino as the Athena-like SQL query engine
- Apache Superset as the QuickSight-like dashboard and exploration layer
3. External Integrations
3.1 Market Data API
Used for:
- quotes
- OHLCV bars
- reference data
- corporate actions
- earnings calendars
- optional market news or fundamentals
3.2 News API
Used for:
- company-linked headlines
- publisher metadata
- article URLs
- article summaries when licensed
3.3 Filings / Regulatory API
Used for:
- SEC-style company submissions
- 8-K, 10-Q, 10-K, and related filings
- structured issuer event discovery
3.4 Web Scraper
Used for:
- full article body retrieval when API content is partial
- investor relations pages
- curated press release sources
- transcript or presentation retrieval when permitted
3.5 Broker API
Used for:
- paper-trading simulation or sandbox trading
- live order submission when enabled
- order acknowledgements and rejections
- fills and cancellations
- positions and account balances
4. Logical Components
4.1 Symbol Registry Service
Responsibilities:
- manage companies, aliases, watchlists, sectors, and source configurations
- manage source trust or credibility policies
- manage symbol-to-document matching rules
4.2 Scheduler / Orchestrator
Responsibilities:
- trigger market, news, filings, and scrape jobs
- manage polling cadences by source class
- coordinate backoff, retries, and dedupe windows
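The backoff and dedupe-window responsibilities can be illustrated with a minimal sketch. The names `backoff_delay` and `DedupeWindow`, the default base and cap values, and the in-memory dictionary are all assumptions made for illustration; the design itself places this state in Redis and does not prescribe a specific backoff formula.

```python
import random
from typing import Dict, Optional


def backoff_delay(attempt: int, base: float = 2.0, cap: float = 300.0,
                  rng: Optional[random.Random] = None) -> float:
    """Exponential backoff with full jitter.

    Returns a delay in seconds drawn uniformly from
    [0, min(cap, base * 2**attempt)].
    """
    rng = rng or random.Random()
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


class DedupeWindow:
    """Treats a repeated key as a duplicate while it is inside the window."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._first_seen: Dict[str, float] = {}

    def is_duplicate(self, key: str, now: float) -> bool:
        seen_at = self._first_seen.get(key)
        if seen_at is not None and now - seen_at < self.window_seconds:
            return True
        # First sighting, or the previous window has expired: start a new one.
        self._first_seen[key] = now
        return False
```

A scheduler loop might call `is_duplicate("AAPL:news_api", now)` before enqueueing a fetch and sleep for `backoff_delay(attempt)` after a failed one.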
- publish downstream jobs to workers
4.3 Ingestion Adapters
Subcomponents:
- Market data adapter
- News API adapter
- Filings adapter
- Broker event adapter
Responsibilities:
- fetch external payloads
- preserve raw responses in MinIO
- normalize metadata into PostgreSQL
- emit processing jobs for parsing or publication
4.4 Scraper / Parser Service
Responsibilities:
- fetch and render source pages
- extract normalized text and metadata
- reduce boilerplate and duplicated template text
- score parser quality and extraction confidence
- persist normalized artifacts
4.5 Ollama Extraction Service
Responsibilities:
- call local Ollama models using schema-constrained JSON output
- produce canonical document intelligence objects
- preserve prompts, schemas, model metadata, and raw outputs
- validate schema and semantic consistency
- retry invalid generations under policy
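The validate-then-retry responsibility might look roughly like the sketch below. The model call is injected as a plain callable so the example stays independent of any particular Ollama client; `extract_with_retry`, `parse_and_validate`, the required-key set, and the default of three attempts are illustrative assumptions rather than the platform's actual policy.

```python
import json
from typing import Callable, List, Optional, Tuple

# Assumed minimal set of required top-level keys for this sketch.
REQUIRED_KEYS = {"document_id", "summary", "companies", "confidence"}


def parse_and_validate(raw: str) -> Optional[dict]:
    """Return the parsed object, or None if it is not acceptable."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict) or not REQUIRED_KEYS.issubset(obj):
        return None
    conf = obj["confidence"]
    if isinstance(conf, bool) or not isinstance(conf, (int, float)):
        return None
    if not 0.0 <= conf <= 1.0:
        return None
    return obj


def extract_with_retry(generate: Callable[[str], str], prompt: str,
                       max_attempts: int = 3) -> Tuple[Optional[dict], List[str]]:
    """Call the model until output validates or attempts run out.

    Every raw output is returned so it can be preserved for audit,
    including the rejected ones.
    """
    raw_outputs: List[str] = []
    for _ in range(max_attempts):
        raw = generate(prompt)
        raw_outputs.append(raw)
        obj = parse_and_validate(raw)
        if obj is not None:
            return obj, raw_outputs
    return None, raw_outputs
```

Returning the rejected outputs alongside the accepted one keeps failed generations available for the audit trail instead of discarding them.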
4.6 Aggregation Engine
Responsibilities:
- combine document intelligence with market context
- compute rolling trend summaries by company, sector, and market
- track contradiction and agreement signals
- score evidence with recency decay and source weighting
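One way to read "recency decay and source weighting" is an exponentially decayed, credibility-weighted average. The sketch below assumes a half-life decay, a signed impact value per document, and a 72-hour default half-life; none of these choices come from the design, which leaves the scoring formula open.

```python
from typing import Iterable, Tuple


def decayed_weight(age_hours: float, source_credibility: float,
                   half_life_hours: float = 72.0) -> float:
    """Weight that halves every `half_life_hours`, scaled by credibility."""
    if age_hours < 0 or half_life_hours <= 0:
        raise ValueError("age must be >= 0 and half-life must be > 0")
    return source_credibility * 0.5 ** (age_hours / half_life_hours)


def weighted_signal(evidence: Iterable[Tuple[float, float, float]],
                    half_life_hours: float = 72.0) -> float:
    """Weighted mean of signed impacts.

    Each evidence item is (signed_impact, age_hours, source_credibility).
    Returns 0.0 when there is no usable evidence.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for signed_impact, age_hours, credibility in evidence:
        w = decayed_weight(age_hours, credibility, half_life_hours)
        weighted_sum += w * signed_impact
        total_weight += w
    return weighted_sum / total_weight if total_weight > 0 else 0.0
```

With this weighting, a fresh bullish document and an equally credible bearish one that is one half-life old net out to a mildly bullish signal rather than cancelling.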
4.7 Recommendation Engine
Responsibilities:
- generate explainable recommendation objects from aggregated evidence
- separate deterministic eligibility scoring from final action mapping
- produce suggested action, thesis, horizon, and invalidation conditions
- publish analytical prediction facts to the lake
4.8 Risk Engine
Responsibilities:
- enforce guardrails such as max position size, daily loss cap, exposure by sector, symbol cooldowns, news shock lockouts, and operator approval rules
- determine whether a recommendation is eligible for paper or live execution
- block ambiguous or unsafe orders before broker submission
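A few of these guardrails can be sketched as a pure function that returns the reasons an order is blocked. The class names, field names, and threshold values below are hypothetical placeholders; news shock lockouts and operator approval rules are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class RiskLimits:
    # Placeholder thresholds, expressed as fractions of portfolio equity.
    max_position_pct: float = 0.05
    daily_loss_cap_pct: float = 0.02
    max_sector_exposure_pct: float = 0.25


@dataclass(frozen=True)
class PortfolioState:
    daily_loss_pct: float = 0.0
    sector_exposure_pct: Dict[str, float] = field(default_factory=dict)
    symbols_in_cooldown: FrozenSet[str] = frozenset()


def blocking_reasons(ticker: str, sector: str, portfolio_pct: float,
                     state: PortfolioState,
                     limits: RiskLimits = RiskLimits()) -> List[str]:
    """Return every guardrail the proposed order violates (empty = eligible)."""
    reasons: List[str] = []
    if portfolio_pct > limits.max_position_pct:
        reasons.append("max_position_size")
    if state.daily_loss_pct >= limits.daily_loss_cap_pct:
        reasons.append("daily_loss_cap")
    projected = state.sector_exposure_pct.get(sector, 0.0) + portfolio_pct
    if projected > limits.max_sector_exposure_pct:
        reasons.append("sector_exposure")
    if ticker in state.symbols_in_cooldown:
        reasons.append("symbol_cooldown")
    return reasons
```

Collecting all violated guardrails, rather than stopping at the first, gives the audit record a complete explanation for each blocked order.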
4.9 Broker Adapter
Responsibilities:
- abstract one or more trading APIs
- support paper mode and live mode
- record submission, acknowledgement, rejection, fill, and cancellation events
- guarantee idempotent order submission by attaching a unique idempotency key to each order
- publish order and fill facts to both PostgreSQL and the analytical lake
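Idempotent submission is commonly achieved by deriving a deterministic key from the fields that define an order and refusing to send the same key twice. The sketch below assumes the key is a SHA-256 hash over the recommendation, account, ticker, side, and quantity; the function and class names are hypothetical, and a real adapter would persist keys in PostgreSQL rather than in memory.

```python
import hashlib
from typing import Callable, Dict


def order_idempotency_key(recommendation_id: str, account_id: str,
                          ticker: str, side: str, quantity: int) -> str:
    """Deterministic key: the same logical order always hashes the same."""
    canonical = "|".join([
        recommendation_id, account_id,
        ticker.upper(), side.lower(), str(quantity),
    ])
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class IdempotentSubmitter:
    """Sends each key at most once and replays the stored result after that."""

    def __init__(self) -> None:
        self._results: Dict[str, str] = {}

    def submit(self, key: str, send: Callable[[], str]) -> str:
        if key not in self._results:
            self._results[key] = send()
        return self._results[key]
```

A retry after a timeout then calls `submit` with the same key and receives the original broker order reference instead of creating a second order.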
4.10 Lake Publisher
Responsibilities:
- transform operational records into analytics-friendly fact datasets
- publish append-only partitioned tables to MinIO
- maintain Iceberg metadata or equivalent lakehouse metadata
- expose datasets such as predictions, outcomes, fills, bars, and PnL
4.11 Query API / Dashboard
Responsibilities:
- expose companies, documents, trends, recommendations, and orders
- provide evidence drill-down and audit views
- provide operator controls for live-trading enablement and review queues
- expose links into analytical dashboards and query tools
4.12 SQL Query Engine and BI Layer
Components:
- Trino coordinator and workers
- Hive Metastore or Iceberg catalog service
- Apache Superset
Responsibilities:
- provide Athena-like SQL access to MinIO-hosted tables
- support dashboard datasets and ad hoc exploration
- support joins between market facts, AI predictions, and executed trades
5. Storage Model
5.1 Operational stores
PostgreSQL
Used for:
- companies and aliases
- watchlists and source configs
- article and filing metadata
- document intelligence objects
- trend summaries
- recommendations
- risk evaluations
- orders and execution events
- control-plane state and audit records
Redis
Used for:
- distributed locks for symbol-source retrieval
- ingestion rate-limit counters
- job queue state
- retry backoff state
- dedupe markers
- cache for hot API and dashboard views
MinIO object storage
Used for:
- raw API payloads
- raw article HTML and normalized text
- prompts, schemas, and raw model results
- exported analytical datasets
- audit traces and reproducibility bundles
5.2 MinIO bucket layout
Recommended buckets:
- stonks-raw-market: raw market API payloads
- stonks-raw-news: raw news API payloads and article HTML
- stonks-raw-filings: raw filings and issuer event payloads
- stonks-normalized: cleaned text and parser outputs
- stonks-llm-prompts: prompts and schemas used
- stonks-llm-results: raw model outputs and validation reports
- stonks-lakehouse: partitioned analytical datasets and table metadata
- stonks-audit: execution traces and exported reports
Suggested raw object path pattern:
/{stage}/{symbol}/{yyyy}/{mm}/{dd}/{document_id}/{artifact_type}.json
/{stage}/{symbol}/{yyyy}/{mm}/{dd}/{document_id}/{artifact_type}.html
Suggested analytical path pattern:
/warehouse/{table_name}/dt={yyyy-mm-dd}/symbol={ticker}/part-*.parquet
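The two path patterns above can be produced by small helper functions, sketched here. The function names and the explicit `extension` argument are assumptions for illustration; the layout itself follows the patterns as written.

```python
from datetime import date


def raw_object_key(stage: str, symbol: str, day: date, document_id: str,
                   artifact_type: str, extension: str) -> str:
    """Build /{stage}/{symbol}/{yyyy}/{mm}/{dd}/{document_id}/{artifact_type}.{ext}."""
    return (
        f"/{stage}/{symbol}/{day:%Y}/{day:%m}/{day:%d}/"
        f"{document_id}/{artifact_type}.{extension}"
    )


def analytical_prefix(table_name: str, day: date, ticker: str) -> str:
    """Build the Hive-style partition prefix that holds the part-*.parquet files."""
    return f"/warehouse/{table_name}/dt={day.isoformat()}/symbol={ticker}/"
```

For example, `raw_object_key("raw", "AAPL", date(2026, 4, 9), "doc-1", "payload", "json")` yields `/raw/AAPL/2026/04/09/doc-1/payload.json`.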
5.3 Lakehouse model
Preferred design:
- Parquet files stored in MinIO
- Hive-compatible partitioning for interoperability
- Iceberg table metadata for managed analytical tables
- Trino catalogs for SQL access
Rationale:
- Hive-compatible layouts preserve broad engine compatibility
- Iceberg improves schema evolution, partition handling, and table maintenance
- Trino can query MinIO-backed object storage and supports both Hive and Iceberg catalogs
6. Data Model
6.1 PostgreSQL schema outline
Core tables:
- companies
- company_aliases
- watchlists
- watchlist_members
- sources
- api_credentials_refs
- ingestion_runs
- market_snapshots
- documents
- document_versions
- document_company_mentions
- document_intelligence
- document_impact_records
- trend_windows
- recommendations
- recommendation_evidence
- risk_evaluations
- broker_accounts
- orders
- order_events
- positions
- audit_events
6.2 Article or document metadata record
{
  "document_id": "uuid",
  "document_type": "article|filing|transcript|press_release",
  "symbol_candidates": ["AAPL", "MSFT"],
  "source_type": "news_api",
  "publisher": "string",
  "url": "string",
  "canonical_url": "string",
  "title": "string",
  "published_at": "2026-04-09T00:00:00Z",
  "retrieved_at": "2026-04-09T00:00:00Z",
  "language": "en",
  "content_hash": "sha256",
  "storage_refs": {
    "raw_html": "s3://...",
    "raw_payload": "s3://..."
  }
}
6.3 Document intelligence schema
{
  "document_id": "uuid",
  "summary": "string",
  "companies": [
    {
      "ticker": "AAPL",
      "company_name": "Apple Inc.",
      "relevance": 0.95,
      "sentiment": "positive",
      "impact_score": 0.71,
      "impact_horizon": "1d_30d",
      "catalyst_type": "earnings|product|legal|macro|supply_chain|m_and_a|rating_change|other",
      "key_facts": ["string"],
      "risks": ["string"],
      "evidence_spans": ["string"]
    }
  ],
  "macro_themes": ["rates", "ai_capex"],
  "novelty_score": 0.64,
  "source_credibility": 0.8,
  "extraction_warnings": ["ambiguous_ticker_reference"],
  "confidence": 0.86,
  "model": {
    "provider": "ollama",
    "model_name": "gpt-oss:20b",
    "prompt_version": "document-intel-v2",
    "schema_version": "2.0.0"
  }
}
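Beyond structural schema validation, a semantic check over this object might look like the sketch below. It assumes that every score field is a fraction in [0, 1] and that each company entry must carry at least one evidence span; both rules are illustrative assumptions, while the catalyst vocabulary is taken from the schema above.

```python
from typing import List

CATALYST_TYPES = {
    "earnings", "product", "legal", "macro",
    "supply_chain", "m_and_a", "rating_change", "other",
}
TOP_LEVEL_SCORES = ("novelty_score", "source_credibility", "confidence")
COMPANY_SCORES = ("relevance", "impact_score")


def _is_unit_fraction(value: object) -> bool:
    """True for a real number in [0, 1]; booleans are rejected."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return 0.0 <= value <= 1.0


def semantic_errors(doc: dict) -> List[str]:
    """Return human-readable problems; an empty list means the object passes."""
    errors: List[str] = []
    for name in TOP_LEVEL_SCORES:
        if not _is_unit_fraction(doc.get(name)):
            errors.append(f"{name} must be a number in [0, 1]")
    for i, company in enumerate(doc.get("companies", [])):
        for name in COMPANY_SCORES:
            if not _is_unit_fraction(company.get(name)):
                errors.append(f"companies[{i}].{name} must be a number in [0, 1]")
        if company.get("catalyst_type") not in CATALYST_TYPES:
            errors.append(f"companies[{i}].catalyst_type is not a known catalyst")
        if not company.get("evidence_spans"):
            errors.append(f"companies[{i}] has no evidence_spans")
    return errors
```

Requiring evidence spans ties each company-level claim back to source text, which supports the auditability goal stated in the purpose section.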
6.4 Trend summary schema
{
  "entity_type": "company",
  "entity_id": "AAPL",
  "window": "7d",
  "trend_direction": "bullish|bearish|mixed|neutral",
  "trend_strength": 0.68,
  "confidence": 0.74,
  "top_supporting_evidence": ["document_id_1", "document_id_2"],
  "top_opposing_evidence": ["document_id_3"],
  "dominant_catalysts": ["product", "analyst_rating"],
  "material_risks": ["regulatory scrutiny"],
  "contradiction_score": 0.22
}
6.5 Recommendation schema
{
  "recommendation_id": "uuid",
  "ticker": "AAPL",
  "action": "buy|sell|hold|watch",
  "mode": "informational|paper_eligible|live_eligible",
  "confidence": 0.72,
  "time_horizon": "swing_1d_10d",
  "thesis": "string",
  "invalidation_conditions": ["string"],
  "position_sizing": {
    "portfolio_pct": 0.02,
    "max_loss_pct": 0.005
  },
  "evidence_refs": ["document_id_1", "document_id_2"],
  "model_metadata": {
    "version": "recommendation-v1"
  }
}
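To make the `position_sizing` fields concrete, the sketch below turns them into a share count and a stop price for a long position. It assumes both `portfolio_pct` and `max_loss_pct` are fractions of total account equity, which the schema does not state explicitly; the function name and the whole-share rounding are also assumptions.

```python
import math
from typing import Optional, Tuple


def size_long_position(equity: float, price: float, portfolio_pct: float,
                       max_loss_pct: float) -> Optional[Tuple[int, float]]:
    """Return (shares, stop_price), or None if not even one share fits.

    shares     = floor(equity * portfolio_pct / price)
    stop_price = price - (equity * max_loss_pct) / shares, floored at 0
    """
    if equity <= 0 or price <= 0:
        raise ValueError("equity and price must be positive")
    shares = math.floor(equity * portfolio_pct / price)
    if shares == 0:
        return None
    loss_budget = equity * max_loss_pct
    stop_price = max(0.0, price - loss_budget / shares)
    return shares, stop_price
```

Under these assumptions, an account with 100,000 in equity buying at 200 with the example values above would hold 10 shares (2,000 notional) with a loss budget of 500. Because the budget is a quarter of the notional here, the implied stop sits at 150, so in practice a tighter invalidation condition would likely trigger first.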
7. Analytical Lake Datasets
The analytical plane should expose the following logical fact tables:
- lake.market_bars
- lake.market_quotes
- lake.company_events
- lake.documents
- lake.document_extractions
- lake.trade_signals
- lake.trade_orders
- lake.trade_fills
- lake.positions_daily
- lake.pnl_daily
- lake.prediction_vs_outcome
Recommended partitioning examples:
- market data: partition by dt, with an optional symbol transform later
- documents: partition by dt and possibly source_type
- predictions: partition by dt and model_version
- fills and PnL: partition by dt and broker account
8. Data Flows
8.1 Market and document ingestion flow
- Scheduler selects due symbols and sources.
- Adapters fetch market, news, and filings payloads.
- Raw payloads are written to MinIO.
- Metadata records are written to PostgreSQL.
- New documents are emitted to parser jobs.
8.2 Extraction flow
- Parser produces normalized text and confidence score.
- Extraction worker sends document to Ollama with schema-bound output.
- Validator checks schema and semantic consistency.
- Canonical intelligence object is stored in PostgreSQL and MinIO.
- Aggregation jobs are triggered for impacted symbols.
8.3 Recommendation and trade flow
- Aggregation engine updates trend windows.
- Recommendation engine emits a recommendation object.
- Risk engine determines eligibility and allowed execution mode.
- Broker adapter places paper or live orders when authorized.
- Broker events update PostgreSQL and publish analytical facts to the lake.
8.4 Lake publication flow
- Operational records are transformed into analytical facts.
- Facts are written as partitioned Parquet files to MinIO.
- Table metadata is updated through Iceberg or equivalent catalog operations.
- Trino exposes the datasets for SQL.
- Superset uses Trino datasets for dashboards and ad hoc exploration.
9. Query and Dashboard Surface
9.1 Operational API
Should expose:
- company and watchlist configuration
- source health and job state
- document timelines and evidence
- recommendation history
- order history and audit trail
- risk configuration and trading mode
9.2 Analytical surface
Should expose:
- SQL access through Trino
- dashboard datasets in Superset
- scorecards for prediction accuracy and PnL
- evidence-to-outcome drill-down views
- model performance and extraction failure dashboards
Suggested starter dashboards:
- symbol overview
- market sentiment heatmap
- prediction confidence vs realized move
- paper trading PnL
- model extraction quality
- source coverage and ingestion lag
10. Reliability and Safety
- Broker submission must be idempotent.
- Live trading must be disabled by default.
- Paper trading must be the first enabled execution mode.
- Invalid model output must not advance to trade execution.
- Low-quality document extraction must not influence live trading.
- All analytical publication jobs should be replayable.
- Every recommendation and order should be reproducible from saved prompts, source refs, and model metadata.
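Several of these rules can be combined into a single gate that decides the execution mode of a recommendation. The sketch below reuses the `mode` values from the recommendation schema; the function name, the operator-approval flag, and the 0.8 confidence threshold for live eligibility are assumptions, not part of the design.

```python
from enum import Enum


class Mode(str, Enum):
    INFORMATIONAL = "informational"
    PAPER_ELIGIBLE = "paper_eligible"
    LIVE_ELIGIBLE = "live_eligible"


def resolve_mode(*, output_valid: bool, extraction_confidence: float,
                 paper_enabled: bool = False, live_enabled: bool = False,
                 operator_approved: bool = False,
                 min_live_confidence: float = 0.8) -> Mode:
    """Map safety flags to the most permissive mode they allow.

    Both trading modes default to disabled, and live eligibility is
    only reachable once paper trading is enabled.
    """
    if not output_valid:
        # Invalid model output never advances to any execution path.
        return Mode.INFORMATIONAL
    if not paper_enabled:
        return Mode.INFORMATIONAL
    if (live_enabled and operator_approved
            and extraction_confidence >= min_live_confidence):
        return Mode.LIVE_ELIGIBLE
    # Low-quality extraction can reach paper trading at most.
    return Mode.PAPER_ELIGIBLE
```

Keyword-only arguments with conservative defaults mean a caller that forgets a flag ends up in the informational mode rather than in a trading one.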
11. Deployment Notes
Recommended Kubernetes workloads:
- symbol-registry-api
- scheduler
- market-adapter-worker
- news-adapter-worker
- filings-adapter-worker
- scraper-worker
- parser-worker
- ollama-extractor-worker
- aggregation-worker
- recommendation-worker
- risk-engine-api
- broker-adapter
- lake-publisher
- trino-coordinator
- trino-worker
- superset-web
- postgres
- redis
- minio
12. Deliberate Scope Boundaries for v1
Included in v1:
- tracked watchlists
- market, news, filings, and broker integrations
- Ollama structured extraction
- trend aggregation and recommendation objects
- paper trading with strict controls
- MinIO-backed analytics lake
- Trino and Superset self-hosted analytics
Deferred from v1:
- options trading
- full order book or tick-level market microstructure
- online model retraining
- fully autonomous live trading with no approval workflow
- advanced portfolio optimization beyond basic sizing and risk caps