Stonks Oracle - Tasks
Phase 0 - Project Setup
- Create repository structure for services, shared schemas, infrastructure, lakehouse, and dashboards
- Choose implementation language for services (Python preferred for scraping/LLM workflows)
- Add local development stack with MinIO, PostgreSQL, Redis, Ollama, Trino, and Superset
- Add Kubernetes manifests or Helm chart skeletons for all core components
- Add CI pipeline for linting, tests, container builds, schema checks, and lake dataset validation
Phase 1 - Core Data and Infrastructure
- Create PostgreSQL schema migrations for companies, watchlists, sources, documents, document intelligence, trends, recommendations, orders, positions, and audit records
- Create MinIO bucket provisioning and lifecycle policies
- Create Redis key conventions and queue abstractions
- Implement shared config loader for environment variables and secrets
- Implement shared typed JSON schemas for document intelligence, trend summaries, and recommendations
- Stand up initial Trino catalog configuration for MinIO-backed datasets
- Stand up Superset with environment-backed datasource configuration
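As an illustration of the Redis key conventions task above, here is a minimal sketch of a key builder. The `stonks` prefix, the `<app>:<env>:<domain>:<part>` layout, and the function name `redis_key` are assumptions made for the example, not decisions recorded in this plan.

```python
# Hypothetical convention: <app>:<env>:<domain>:<part>[:<part>...]
APP_PREFIX = "stonks"  # assumed application prefix


def redis_key(env: str, domain: str, *parts: str) -> str:
    """Build a colon-delimited Redis key and reject unsafe segments."""
    segments = (APP_PREFIX, env, domain, *parts)
    for segment in segments:
        # Empty segments, embedded delimiters, and whitespace would make
        # keys ambiguous or hard to match with SCAN patterns.
        if not segment or ":" in segment or any(ch.isspace() for ch in segment):
            raise ValueError(f"invalid key segment: {segment!r}")
    return ":".join(segments)


if __name__ == "__main__":
    print(redis_key("dev", "queue", "ingest"))
    print(redis_key("prod", "lock", "symbol", "AAPL"))
```

Routing every key through one helper keeps queue, lock, and cache names consistent across services, which is the main point of agreeing on a convention early.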
Phase 2 - Symbol Registry and Source Management
- Build symbol registry API endpoints for companies, aliases, watchlists, and sources
- Add source credibility, retention policy, and access policy fields
- Add source classes for market data API, news API, filings API, web scrape, and broker adapter
- Add admin validation for duplicate tickers, invalid URLs, and unsupported source types
- Add seed data support for an initial tracked watchlist
Phase 3 - External API Adapters
- Implement scheduler for symbol and source polling windows
- Implement market data API adapter interface
- Implement first concrete market data provider adapter
- Implement news API adapter interface
- Implement first concrete news API provider adapter
- Implement filings or regulatory adapter interface
- Implement first concrete filings provider adapter
- Implement broker API adapter interface for paper trading and order events
- Implement rate-limit coordination, retries, and backoff across adapters
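The retry and backoff task above could start from something like the following sketch: capped exponential backoff with full jitter. The names `backoff_delays` and `call_with_retry`, and the default values, are hypothetical. Cross-adapter rate-limit coordination (for example through Redis) is deliberately left out.

```python
import random
import time
from typing import Callable, Iterator, Tuple, Type, TypeVar

T = TypeVar("T")


def backoff_delays(retries: int, base: float = 0.5, cap: float = 30.0) -> Iterator[float]:
    """Yield one capped exponential delay (in seconds) per retry."""
    for n in range(retries):
        yield min(cap, base * 2 ** n)


def call_with_retry(
    fn: Callable[[], T],
    attempts: int = 4,
    base: float = 0.5,
    cap: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Call fn up to `attempts` times, sleeping with full jitter between tries."""
    for delay in backoff_delays(attempts - 1, base, cap):
        try:
            return fn()
        except retry_on:
            # Full jitter: sleep a random amount up to the current ceiling
            # so that many workers do not retry in lockstep.
            sleep(random.uniform(0, delay))
    # Final attempt: any exception propagates to the caller.
    return fn()
```

Passing `sleep` as a parameter keeps the helper testable without real waiting. In practice `retry_on` would be narrowed to transient errors such as timeouts and HTTP 429 or 5xx responses.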
Phase 4 - Ingestion Pipeline
- Implement web scraper worker for curated URLs and article pages
- Implement canonical URL normalization and content hashing
- Implement raw artifact upload to MinIO
- Implement metadata persistence in PostgreSQL for market payloads, documents, and broker events
- Implement retry and failure tracking for source retrieval
- Implement dedupe logic across article and filing sources
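A rough sketch of the canonical URL normalization and content hashing tasks above, using only the standard library. The list of tracking parameters and the exact normalization rules (dropping fragments, trailing slashes, default ports, and user info) are assumptions; the real rules would need tuning per source.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed set of tracking parameters to drop; extend per source.
TRACKING_KEYS = {"fbclid", "gclid"}
TRACKING_PREFIXES = ("utm_",)
DEFAULT_PORTS = {"http": 80, "https": 443}


def canonical_url(url: str) -> str:
    """Normalize a URL so that trivially different links compare equal."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()  # user info is dropped on purpose
    if parts.port and parts.port != DEFAULT_PORTS.get(scheme):
        host = f"{host}:{parts.port}"
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_KEYS and not key.startswith(TRACKING_PREFIXES)
    )
    path = parts.path.rstrip("/") or "/"
    # The fragment is discarded because it never reaches the server.
    return urlunsplit((scheme, host, path, urlencode(query), ""))


def content_hash(text: str) -> str:
    """SHA-256 of whitespace-normalized text, usable as a dedupe key."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

An exact hash only catches identical bodies after whitespace normalization. Near-duplicate detection across syndicated articles would need a different technique and is not shown here.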
Phase 5 - Parsing and Normalization
- Implement HTML-to-text parsing pipeline
- Implement boilerplate reduction and body extraction heuristics
- Implement parser quality scoring and confidence flags
- Implement company mention detection using ticker, alias, and name matching
- Persist normalized text and parser outputs to MinIO and PostgreSQL
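The company mention detection task above could begin with plain pattern matching, as in this sketch. The `Company` shape and the `find_mentions` name are hypothetical. Tickers are matched case-sensitively as whole tokens (with an optional leading `$`) to avoid matching ordinary words, while names and aliases are matched case-insensitively.

```python
import re
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Company:
    ticker: str
    name: str
    aliases: Tuple[str, ...] = ()


def _has_term(term: str, text: str) -> bool:
    # Lookarounds instead of \b so names ending in "." still match.
    pattern = rf"(?<!\w){re.escape(term)}(?!\w)"
    return re.search(pattern, text, re.IGNORECASE) is not None


def find_mentions(text: str, companies: Iterable[Company]) -> Dict[str, List[str]]:
    """Map each mentioned ticker to the kinds of match that fired."""
    hits: Dict[str, List[str]] = {}
    for company in companies:
        matched: List[str] = []
        ticker_pattern = rf"(?<![A-Za-z0-9])\$?{re.escape(company.ticker)}(?![A-Za-z0-9])"
        if re.search(ticker_pattern, text):
            matched.append("ticker")
        if _has_term(company.name, text):
            matched.append("name")
        if any(_has_term(alias, text) for alias in company.aliases):
            matched.append("alias")
        if matched:
            hits[company.ticker] = matched
    return hits
```

Recording which kind of match fired lets later stages treat a bare single-letter ticker hit as weaker evidence than a full company name.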
Phase 6 - Ollama Structured Extraction
- Build extraction prompt templates with anti-hallucination instructions
- Build JSON schema definitions for document intelligence extraction
- Implement Ollama client wrapper using structured output format
- Implement schema validation and semantic validation layers
- Persist prompts, model metadata, raw outputs, validation reports, and final intelligence objects
- Add retry behavior for invalid or incomplete model responses
- Add model performance metrics and dashboards
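To make the two validation layers above concrete, here is a small standard-library sketch. The field names (`ticker`, `sentiment`, `confidence`, `evidence_quotes`) are assumptions about what the document intelligence schema might contain, and a real implementation would more likely use a JSON Schema validator for the structural layer. The semantic layer checks that quoted evidence actually appears in the source text, which is one way to catch hallucinated citations.

```python
import json
from typing import List, Set

# Hypothetical required fields and their expected JSON types.
REQUIRED_FIELDS = {
    "ticker": str,
    "sentiment": (int, float),
    "confidence": (int, float),
    "evidence_quotes": list,
}


def validate_extraction(raw: str, source_text: str, known_tickers: Set[str]) -> List[str]:
    """Return a list of problems; an empty list means the output is accepted."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]

    # Layer 1: structural checks (presence and type).
    errors = []
    for key, expected in REQUIRED_FIELDS.items():
        if key not in data:
            errors.append(f"missing field: {key}")
        elif isinstance(data[key], bool) or not isinstance(data[key], expected):
            errors.append(f"wrong type for field: {key}")
    if errors:
        return errors

    # Layer 2: semantic checks against known state and the source document.
    if data["ticker"] not in known_tickers:
        errors.append(f"unknown ticker: {data['ticker']}")
    if not -1 <= data["sentiment"] <= 1:
        errors.append("sentiment out of range [-1, 1]")
    if not 0 <= data["confidence"] <= 1:
        errors.append("confidence out of range [0, 1]")
    for quote in data["evidence_quotes"]:
        if not isinstance(quote, str) or quote not in source_text:
            errors.append(f"quote not found in source: {quote!r}")
    return errors
```

A non-empty result can drive the retry behavior listed above, and the error list itself is a natural candidate for the persisted validation report.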
Phase 7 - Aggregation and Trend Engine
- Implement recency decay and source credibility weighting
- Integrate market context features into aggregation windows
- Implement company-level rolling window aggregation
- Implement contradiction detection and disagreement representation
- Implement sector and market rollups
- Implement evidence ranking for supporting and opposing documents
- Persist trend windows and evidence mappings
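One simple way to combine recency decay with source credibility weighting, sketched under assumptions: each document contributes a sentiment score, its weight is the source credibility multiplied by an exponential half-life decay, and the window score is the weighted mean. The 48-hour half-life and the `Signal` fields are illustrative choices, not values fixed by this plan.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Signal:
    sentiment: float    # document-level score, assumed in [-1, 1]
    credibility: float  # source credibility, assumed in [0, 1]
    age_hours: float    # time since publication


def decay_weight(age_hours: float, half_life_hours: float = 48.0) -> float:
    """Exponential decay: the weight halves every `half_life_hours`."""
    return 0.5 ** (age_hours / half_life_hours)


def weighted_sentiment(
    signals: Sequence[Signal], half_life_hours: float = 48.0
) -> Optional[float]:
    """Credibility- and recency-weighted mean sentiment, or None if no weight."""
    weights = [s.credibility * decay_weight(s.age_hours, half_life_hours) for s in signals]
    total = sum(weights)
    if total == 0:
        return None  # nothing usable in the window
    return sum(w * s.sentiment for w, s in zip(weights, signals)) / total
```

For example, a fresh positive document and a 48-hour-old negative one from equally credible sources get weights 1 and 0.5, so the window score is (1 - 0.5) / 1.5, or about 0.33.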
Phase 8 - Recommendation Engine
- Design deterministic recommendation eligibility logic
- Implement recommendation generation from aggregated scores and evidence
- Add optional LLM wording layer for thesis generation only
- Persist recommendation objects and evidence citations
- Add suppression logic for low-quality data or low confidence
- Publish prediction facts to analytical tables
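The deterministic eligibility and suppression tasks above might look roughly like this. All thresholds, field names, and the `decide` function are hypothetical. The point is that the same trend summary always yields the same action, with suppression reasons recorded explicitly and no model call involved.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class Action(str, Enum):
    BUY = "buy"
    SELL = "sell"
    HOLD = "hold"
    SUPPRESSED = "suppressed"


@dataclass(frozen=True)
class TrendSummary:
    score: float           # aggregated signal, assumed in [-1, 1]
    confidence: float      # assumed in [0, 1]
    document_count: int
    parser_quality: float  # mean parser quality, assumed in [0, 1]


def decide(
    trend: TrendSummary,
    *,
    min_confidence: float = 0.6,
    min_documents: int = 3,
    min_quality: float = 0.5,
    threshold: float = 0.3,
) -> Tuple[Action, List[str]]:
    """Return an action plus the reasons for suppression, if any."""
    reasons = []
    if trend.confidence < min_confidence:
        reasons.append("low confidence")
    if trend.document_count < min_documents:
        reasons.append("too few documents")
    if trend.parser_quality < min_quality:
        reasons.append("low parser quality")
    if reasons:
        return Action.SUPPRESSED, reasons
    if trend.score >= threshold:
        return Action.BUY, []
    if trend.score <= -threshold:
        return Action.SELL, []
    return Action.HOLD, []
```

Keeping this logic free of LLM calls matches the plan's intent that the optional wording layer only phrases the thesis and never changes the decision.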
Phase 9 - Risk Engine and Trade Adapter
- Implement portfolio and account risk configuration model
- Implement hard blocks for max position size, sector exposure, daily loss limits, and news-shock lockouts
- Implement paper trading adapter behavior and state sync
- Integrate first broker API in sandbox mode
- Implement idempotent order submission keys and duplicate prevention
- Implement full execution audit trail
- Add operator approval workflow for live trading mode
- Publish order, fill, and position facts to analytical tables
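For the idempotent order submission task above, one possible approach is to derive a deterministic key from the fields that define order intent and to refuse to send the same key twice. In this sketch the in-memory dictionary stands in for a durable store (for example a unique constraint in PostgreSQL), and `order_key`, `OrderSubmitter`, and the `send` callback are hypothetical names.

```python
import hashlib
from typing import Any, Callable, Dict


def order_key(
    account_id: str, recommendation_id: str, symbol: str, side: str, quantity: int
) -> str:
    """Deterministic key: the same order intent always hashes to the same value."""
    payload = "|".join(
        [account_id, recommendation_id, symbol.upper(), side.lower(), str(quantity)]
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class OrderSubmitter:
    """Sends each distinct order at most once per process lifetime."""

    def __init__(self, send: Callable[[str, Dict[str, Any]], Any]) -> None:
        self._send = send
        # Assumption: a real system would persist this mapping durably.
        self._results: Dict[str, Any] = {}

    def submit(self, **order: Any) -> Any:
        key = order_key(**order)
        if key in self._results:
            # Duplicate intent: return the earlier result, do not resend.
            return self._results[key]
        result = self._send(key, order)
        self._results[key] = result
        return result
```

If the broker accepts a client-supplied order identifier, passing the same key through to it would extend duplicate protection across process restarts; whether a given broker supports that is outside the scope of this sketch.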
Phase 10 - Lakehouse and SQL Analytics
- Define analytical fact tables for bars, documents, extractions, signals, orders, fills, positions, and PnL
- Implement Parquet writers for analytical datasets
- Implement Hive-compatible partition layout conventions on MinIO
- Implement Iceberg table creation and metadata management for analytical datasets
- Implement lake publisher jobs from operational data into analytical fact tables
- Configure Trino catalogs for Hive and/or Iceberg access to MinIO
- Add example SQL views for prediction-vs-outcome and paper-trade scorecards
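The Hive-compatible partition layout task above refers to the `key=value` directory convention. A minimal sketch of an object-key builder follows; the `warehouse` root, the table name, and the choice of `dt` and `symbol` as partition columns are assumptions for illustration, and percent-encoding of values is used here simply to keep them path-safe.

```python
from datetime import date
from typing import Mapping
from urllib.parse import quote


def partition_path(
    table: str,
    partitions: Mapping[str, object],
    filename: str,
    root: str = "warehouse",  # assumed prefix inside the bucket
) -> str:
    """Build an object key using Hive-style key=value partition directories."""
    segments = [
        # Encode values so "/" or "=" inside a value cannot break the layout.
        f"{column}={quote(str(value), safe='')}"
        for column, value in partitions.items()
    ]
    return "/".join([root, table, *segments, filename])


if __name__ == "__main__":
    print(
        partition_path(
            "fact_signals",
            {"dt": date(2024, 1, 15), "symbol": "AAPL"},
            "part-00000.parquet",
        )
    )
```

Partition column order matters because it becomes the directory nesting, so putting the date first keeps time-range scans cheap for the scorecard queries.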
Phase 11 - Query API and Dashboard
- Build APIs for companies, document timelines, trend summaries, recommendations, and order history
- Build evidence drill-down view linking recommendations to source documents and raw artifacts
- Build admin controls for source health, symbol configs, and trading mode
- Build operational dashboard for ingestion throughput, model failures, and source coverage gaps
- Build Superset starter dashboards for symbol overview, sentiment heatmap, PnL, and prediction accuracy
Phase 12 - Observability and Hardening
- Add structured logs and distributed tracing across services
- Add Prometheus metrics for ingestion, parsing, extraction, aggregation, lake publication, and trading
- Add alerting for source failures, schema failure spikes, analytical lag, and broker issues
- Add dead-letter queues and replay tooling
- Add data retention and lifecycle controls for raw and derived artifacts
- Add security review for secrets, network policies, trading isolation, and dashboard access control
Phase 13 - Verification and Rollout
- Create replay dataset from archived documents for deterministic extraction testing
- Create integration tests for the full ingest-to-recommendation flow
- Create paper trading simulation scenarios
- Validate fail-closed behavior for broker outages and ambiguous order states
- Validate lake publication and Trino query correctness over partitioned MinIO datasets
- Run shadow mode before enabling any live execution
- Prepare operator runbook and incident response procedures
Recommended First Vertical Slice
- Track 5 to 10 symbols
- Ingest one market data API, one news API, and one filings source per symbol group
- Persist raw artifacts to MinIO and metadata to PostgreSQL
- Extract structured document intelligence through Ollama
- Generate 7-day company trend summaries with market context
- Produce paper-trade recommendations only
- Publish analytical facts for bars, signals, and paper trades into MinIO
- Expose a simple dashboard with evidence, trend cards, and prediction-vs-outcome views