stonks-oracle/tasks.md
Celes Renata 8cfc4f423b initial commit
2026-04-11 02:15:06 -07:00

Stonks Oracle - Tasks

Phase 0 - Project Setup

  • Create repository structure for services, shared schemas, infrastructure, lakehouse, and dashboards
  • Choose implementation language for services (Python preferred for scraping/LLM workflows)
  • Add local development stack with MinIO, PostgreSQL, Redis, Ollama, Trino, and Superset
  • Add Kubernetes manifests or Helm chart skeletons for all core components
  • Add CI pipeline for linting, tests, container builds, schema checks, and lake dataset validation

Phase 1 - Core Data and Infrastructure

  • Create PostgreSQL schema migrations for companies, watchlists, sources, documents, document intelligence, trends, recommendations, orders, positions, and audit records
  • Create MinIO bucket provisioning and lifecycle policies
  • Create Redis key conventions and queue abstractions
  • Implement shared config loader for environment variables and secrets
  • Implement shared typed JSON schemas for document intelligence, trend summaries, and recommendations
  • Stand up initial Trino catalog configuration for MinIO-backed datasets
  • Stand up Superset with environment-backed datasource configuration

Phase 2 - Symbol Registry and Source Management

  • Build symbol registry API endpoints for companies, aliases, watchlists, and sources
  • Add source credibility, retention policy, and access policy fields
  • Add source classes for market data API, news API, filings API, web scrape, and broker adapter
  • Add admin validation for duplicate tickers, invalid URLs, and unsupported source types
  • Add seed data support for an initial tracked watchlist

Phase 3 - External API Adapters

  • Implement scheduler for symbol and source polling windows
  • Implement market data API adapter interface
  • Implement first concrete market data provider adapter
  • Implement news API adapter interface
  • Implement first concrete news API provider adapter
  • Implement filings or regulatory adapter interface
  • Implement first concrete filings provider adapter
  • Implement broker API adapter interface for paper trading and order events
  • Implement rate-limit coordination, retries, and backoff across adapters
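
The retry and backoff task above could start from something like the following sketch. It assumes exponential backoff with full jitter, which is one common choice and not something this task list prescribes; `RetryableError`, `backoff_delays`, and `call_with_retries` are hypothetical names introduced here for illustration.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, HTTP 429/5xx)."""


def backoff_delays(attempts: int, base: float = 0.5, cap: float = 30.0) -> list[float]:
    """Upper bound for each wait: base * 2**n, clipped at cap."""
    return [min(cap, base * (2 ** n)) for n in range(attempts)]


def call_with_retries(
    fn: Callable[[], T],
    attempts: int = 5,
    base: float = 0.5,
    cap: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[float, float], float] = random.uniform,
) -> T:
    """Call fn, retrying RetryableError with jittered exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delays = backoff_delays(attempts - 1, base, cap)
    for attempt in range(attempts):
        try:
            return fn()
        except RetryableError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            # Full jitter: wait a random time between 0 and the current bound.
            sleep(rng(0.0, delays[attempt]))
    raise AssertionError("unreachable")
```

Injecting `sleep` and `rng` keeps the helper testable without real waiting. Coordinating rate limits across adapters would need shared state (for example in Redis), which this sketch does not cover.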

Phase 4 - Ingestion Pipeline

  • Implement web scraper worker for curated URLs and article pages
  • Implement canonical URL normalization and content hashing
  • Implement raw artifact upload to MinIO
  • Implement metadata persistence in PostgreSQL for market payloads, documents, and broker events
  • Implement retry and failure tracking for source retrieval
  • Implement dedupe logic across article and filing sources
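
As a rough illustration of the URL normalization and content hashing tasks above, the sketch below uses only the Python standard library. The list of tracking parameters and the normalization rules (lowercased host, dropped default ports, sorted query, no fragment) are assumptions made for this example, and the function names are hypothetical.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed set of tracking parameters to strip; extend per source as needed.
TRACKING_PARAMS = {
    "utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content",
    "fbclid", "gclid",
}
DEFAULT_PORTS = {"http": 80, "https": 443}


def canonicalize_url(url: str) -> str:
    """Normalize a URL so trivially different forms map to the same string.

    Simplification: userinfo is dropped and IPv6 literals are not handled.
    """
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    # Keep the port only when it is not the default for the scheme.
    netloc = host if port in (None, DEFAULT_PORTS.get(scheme)) else f"{host}:{port}"
    path = parts.path.rstrip("/") or "/"
    query = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k.lower() not in TRACKING_PARAMS
    )
    # The fragment is discarded because it does not change the fetched resource.
    return urlunsplit((scheme, netloc, path, urlencode(query), ""))


def content_hash(text: str) -> str:
    """SHA-256 of the text after collapsing runs of whitespace."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

Two documents with the same `content_hash` can then be treated as exact duplicates; near-duplicate detection (syndicated articles with small edits) would need a different technique.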

Phase 5 - Parsing and Normalization

  • Implement HTML-to-text parsing pipeline
  • Implement boilerplate reduction and body extraction heuristics
  • Implement parser quality scoring and confidence flags
  • Implement company mention detection using ticker, alias, and name matching
  • Persist normalized text and parser outputs to MinIO and PostgreSQL
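
The mention detection task above might begin as plain pattern matching. The sketch below is one possible starting point; the `Company` record and `find_mentions` function are hypothetical, and the matching rules (case-sensitive tickers, case-insensitive names and aliases) are assumptions for this example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Company:
    """Hypothetical registry record used only for this example."""
    ticker: str
    name: str
    aliases: tuple[str, ...] = ()


def _phrase_pattern(phrase: str) -> str:
    # Match the phrase only when it is not embedded in a longer word.
    return rf"(?<!\w){re.escape(phrase)}(?!\w)"


def find_mentions(text: str, companies: list[Company]) -> dict[str, list[str]]:
    """Return {ticker: [match kinds]} for every company mentioned in text."""
    hits: dict[str, list[str]] = {}
    for company in companies:
        kinds = []
        # Tickers are matched case-sensitively, with an optional "$" prefix.
        ticker_re = rf"(?<![A-Za-z0-9])\$?{re.escape(company.ticker)}(?![A-Za-z0-9])"
        if re.search(ticker_re, text):
            kinds.append("ticker")
        if re.search(_phrase_pattern(company.name), text, re.IGNORECASE):
            kinds.append("name")
        if any(re.search(_phrase_pattern(a), text, re.IGNORECASE) for a in company.aliases):
            kinds.append("alias")
        if kinds:
            hits[company.ticker] = kinds
    return hits
```

Single-letter or common-word tickers will still produce false positives with this approach, so recording the match kind lets later stages weight a ticker-only hit lower than a name hit.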

Phase 6 - Ollama Structured Extraction

  • Build extraction prompt templates with anti-hallucination instructions
  • Build JSON schema definitions for document intelligence extraction
  • Implement Ollama client wrapper using structured output format
  • Implement schema validation and semantic validation layers
  • Persist prompts, model metadata, raw outputs, validation reports, and final intelligence objects
  • Add retry behavior for invalid or incomplete model responses
  • Add model performance metrics and dashboards
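
The validation and retry tasks above could be layered roughly as follows. This sketch deliberately leaves out the Ollama client: `generate` is a hypothetical callable returning the model's raw text, and the three fields (`ticker`, `sentiment`, `summary`) are an assumed minimal schema, not the project's actual one.

```python
import json
from typing import Callable, Optional

# Assumed minimal schema: field name -> accepted Python type(s).
REQUIRED_FIELDS = {"ticker": str, "sentiment": (int, float), "summary": str}


def validate_extraction(raw: str, known_tickers: set[str]) -> tuple[Optional[dict], list[str]]:
    """Return (data, errors); data is None whenever any check fails."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return None, ["top-level value must be an object"]

    # Schema layer: required fields with the expected types.
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in data:
            errors.append(f"missing field: {name}")
        elif isinstance(data[name], bool) or not isinstance(data[name], expected):
            errors.append(f"wrong type for field: {name}")
    if errors:
        return None, errors

    # Semantic layer: values must make sense, not just parse.
    if data["ticker"] not in known_tickers:
        errors.append("ticker not in registry")
    if not -1.0 <= data["sentiment"] <= 1.0:
        errors.append("sentiment out of range")
    return (None if errors else data), errors


def extract_with_retries(
    generate: Callable[[], str],
    known_tickers: set[str],
    max_attempts: int = 3,
) -> tuple[Optional[dict], list[list[str]]]:
    """Call generate until an output validates; keep every validation report."""
    reports = []
    for _ in range(max_attempts):
        data, errors = validate_extraction(generate(), known_tickers)
        reports.append(errors)
        if data is not None:
            return data, reports
    return None, reports
```

Returning every report, including those from failed attempts, matches the task of persisting validation reports alongside the final intelligence object.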

Phase 7 - Aggregation and Trend Engine

  • Implement recency decay and source credibility weighting
  • Integrate market context features into aggregation windows
  • Implement company-level rolling window aggregation
  • Implement contradiction detection and disagreement representation
  • Implement sector and market rollups
  • Implement evidence ranking for supporting and opposing documents
  • Persist trend windows and evidence mappings
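
One way to combine recency decay with source credibility weighting is a weighted mean, sketched below. The exponential half-life form, the 48-hour default, and the `Signal` fields are assumptions chosen for illustration; the task list does not fix a decay function.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Signal:
    """Hypothetical per-document signal feeding a trend window."""
    sentiment: float    # assumed range [-1, 1]
    age_hours: float    # time since publication
    credibility: float  # assumed range [0, 1], from the source registry


def decay_weight(age_hours: float, half_life_hours: float = 48.0) -> float:
    """Exponential decay: the weight halves every half_life_hours."""
    return 0.5 ** (age_hours / half_life_hours)


def weighted_sentiment(signals: list[Signal], half_life_hours: float = 48.0) -> Optional[float]:
    """Credibility- and recency-weighted mean sentiment, or None without usable evidence."""
    weights = [s.credibility * decay_weight(s.age_hours, half_life_hours) for s in signals]
    total = sum(weights)
    if total == 0:
        return None  # no documents, or only zero-credibility sources
    return sum(w * s.sentiment for w, s in zip(weights, signals)) / total
```

For example, a fresh positive document and a two-day-old negative document of equal credibility get weights 1 and 0.5, so the window score is (1 - 0.5) / 1.5, or about 0.33.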

Phase 8 - Recommendation Engine

  • Design deterministic recommendation eligibility logic
  • Implement recommendation generation from aggregated scores and evidence
  • Add optional LLM wording layer for thesis generation only
  • Persist recommendation objects and evidence citations
  • Add suppression logic for low-quality data or low confidence
  • Publish prediction facts to analytical tables
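
The eligibility and suppression tasks above can be pictured as a pure function of the trend summary, which keeps the decision deterministic and leaves any LLM to wording only. All thresholds, field names, and action labels in this sketch are placeholders, not values taken from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrendSummary:
    """Hypothetical aggregate for one company and window."""
    ticker: str
    score: float         # assumed range [-1, 1]
    confidence: float    # assumed range [0, 1]
    document_count: int


def recommend(
    trend: TrendSummary,
    min_confidence: float = 0.6,
    min_documents: int = 3,
    threshold: float = 0.25,
) -> dict:
    """Map a trend summary to an action, suppressing weak evidence first."""
    result = {"ticker": trend.ticker, "action": "none", "suppressed_reason": None}
    # Suppression checks run before any scoring so weak data never yields a call.
    if trend.document_count < min_documents:
        result["suppressed_reason"] = "insufficient_evidence"
        return result
    if trend.confidence < min_confidence:
        result["suppressed_reason"] = "low_confidence"
        return result
    if trend.score >= threshold:
        result["action"] = "buy"
    elif trend.score <= -threshold:
        result["action"] = "sell"
    else:
        result["action"] = "hold"
    return result
```

Because the same input always produces the same output, a replay dataset can be used to regression-test the rules.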

Phase 9 - Risk Engine and Trade Adapter

  • Implement portfolio and account risk configuration model
  • Implement hard blocks for max position size, sector exposure, daily loss limits, and news-shock lockouts
  • Implement paper trading adapter behavior and state sync
  • Integrate first broker API in sandbox mode
  • Implement idempotent order submission keys and duplicate prevention
  • Implement full execution audit trail
  • Add operator approval workflow for live trading mode
  • Publish order, fill, and position facts to analytical tables
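
For the idempotent submission task above, one possible approach derives the key from the order's identifying fields and refuses to resend a key already seen. In this sketch `order_key`, `OrderSubmitter`, and the `send` callable are hypothetical, and the in-memory dictionary stands in for what would need to be a durable store in a real deployment.

```python
import hashlib
from typing import Callable


def order_key(recommendation_id: str, ticker: str, side: str, quantity: int) -> str:
    """Deterministic key: the same logical order always hashes to the same value."""
    payload = "|".join([recommendation_id, ticker.upper(), side.lower(), str(quantity)])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class OrderSubmitter:
    """Submits each logical order at most once.

    `send` is a hypothetical broker call taking (key, ticker, side, quantity)
    and returning the broker's order id.
    """

    def __init__(self, send: Callable[[str, str, str, int], str]) -> None:
        self._send = send
        self._seen: dict[str, str] = {}  # idempotency key -> broker order id

    def submit(self, recommendation_id: str, ticker: str, side: str, quantity: int) -> tuple[str, bool]:
        """Return (broker_order_id, was_submitted_now)."""
        key = order_key(recommendation_id, ticker, side, quantity)
        if key in self._seen:
            return self._seen[key], False  # duplicate: reuse the earlier result
        broker_id = self._send(key, ticker, side, quantity)
        self._seen[key] = broker_id
        return broker_id, True
```

This does not by itself resolve ambiguous states, such as a send that times out after the broker accepted the order; the fail-closed validation listed under Phase 13 covers that case.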

Phase 10 - Lakehouse and SQL Analytics

  • Define analytical fact tables for bars, documents, extractions, signals, orders, fills, positions, and PnL
  • Implement Parquet writers for analytical datasets
  • Implement Hive-compatible partition layout conventions on MinIO
  • Implement Iceberg table creation and metadata management for analytical datasets
  • Implement lake publisher jobs from operational data into analytical fact tables
  • Configure Trino catalogs for Hive and/or Iceberg access to MinIO
  • Add example SQL views for prediction-vs-outcome and paper-trade scorecards
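
A Hive-compatible layout encodes partition columns as `key=value` directories in the object path. The helper below sketches one such convention; the `lake` prefix, the partition columns (`dt`, `symbol`), and the default file name are assumptions for this example.

```python
from datetime import date
from urllib.parse import quote


def partition_path(
    dataset: str,
    symbol: str,
    day: date,
    filename: str = "part-0000.parquet",
    prefix: str = "lake",
) -> str:
    """Build an object key with Hive-style key=value partition directories."""
    # Percent-encode the symbol so characters such as "/" cannot break the layout.
    safe_symbol = quote(symbol.upper(), safe="")
    return f"{prefix}/{dataset}/dt={day.isoformat()}/symbol={safe_symbol}/{filename}"


print(partition_path("bars", "aapl", date(2026, 4, 11)))
# lake/bars/dt=2026-04-11/symbol=AAPL/part-0000.parquet
```

Partitioning by date first suits time-bounded queries; if most queries filter on a single symbol, the reverse order may work better.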

Phase 11 - Query API and Dashboard

  • Build APIs for companies, document timelines, trend summaries, recommendations, and order history
  • Build evidence drill-down view linking recommendations to source documents and raw artifacts
  • Build admin controls for source health, symbol configs, and trading mode
  • Build operational dashboard for ingestion throughput, model failures, and source coverage gaps
  • Build Superset starter dashboards for symbol overview, sentiment heatmap, PnL, and prediction accuracy

Phase 12 - Observability and Hardening

  • Add structured logs and distributed tracing across services
  • Add Prometheus metrics for ingestion, parsing, extraction, aggregation, lake publication, and trading
  • Add alerting for source failures, schema failure spikes, analytical lag, and broker issues
  • Add dead-letter queues and replay tooling
  • Add data retention and lifecycle controls for raw and derived artifacts
  • Add security review for secrets, network policies, trading isolation, and dashboard access control

Phase 13 - Verification and Rollout

  • Create replay dataset from archived documents for deterministic extraction testing
  • Create integration tests for the full ingest-to-recommendation flow
  • Create paper trading simulation scenarios
  • Validate fail-closed behavior for broker outages and ambiguous order states
  • Validate lake publication and Trino query correctness over partitioned MinIO datasets
  • Run shadow mode before enabling any live execution
  • Prepare operator runbook and incident response procedures

MVP Scope

  • Track 5 to 10 symbols
  • Ingest one market data API, one news API, and one filings source per symbol group
  • Persist raw artifacts to MinIO and metadata to PostgreSQL
  • Extract structured document intelligence through Ollama
  • Generate 7-day company trend summaries with market context
  • Produce paper-trade recommendations only
  • Publish analytical facts for bars, signals, and paper trades into MinIO
  • Expose a simple dashboard with evidence, trend cards, and prediction-vs-outcome views