# Stonks Oracle - Tasks

## Phase 0 - Project Setup

- [ ] Create repository structure for services, shared schemas, infrastructure, lakehouse, and dashboards
- [ ] Choose implementation language for services (Python preferred for scraping/LLM workflows)
- [ ] Add local development stack with MinIO, PostgreSQL, Redis, Ollama, Trino, and Superset
- [ ] Add Kubernetes manifests or Helm chart skeletons for all core components
- [ ] Add CI pipeline for linting, tests, container builds, schema checks, and lake dataset validation

## Phase 1 - Core Data and Infrastructure

- [ ] Create PostgreSQL schema migrations for companies, watchlists, sources, documents, document intelligence, trends, recommendations, orders, positions, and audit records
- [ ] Create MinIO bucket provisioning and lifecycle policies
- [ ] Create Redis key conventions and queue abstractions
- [ ] Implement shared config loader for environment variables and secrets
- [ ] Implement shared typed JSON schemas for document intelligence, trend summaries, and recommendations
- [ ] Stand up initial Trino catalog configuration for MinIO-backed datasets
- [ ] Stand up Superset with environment-backed datasource configuration

## Phase 2 - Symbol Registry and Source Management

- [ ] Build symbol registry API endpoints for companies, aliases, watchlists, and sources
- [ ] Add source credibility, retention policy, and access policy fields
- [ ] Add source classes for market data API, news API, filings API, web scrape, and broker adapter
- [ ] Add admin validation for duplicate tickers, invalid URLs, and unsupported source types
- [ ] Add seed data support for an initial tracked watchlist

## Phase 3 - External API Adapters

- [ ] Implement scheduler for symbol and source polling windows
- [ ] Implement market data API adapter interface
- [ ] Implement first concrete market data provider adapter
- [ ] Implement news API adapter interface
- [ ] Implement first concrete news API provider adapter
- [ ] Implement filings or regulatory adapter interface
- [ ] Implement first concrete filings provider adapter
- [ ] Implement broker API adapter interface for paper trading and order events
- [ ] Implement rate-limit coordination, retries, and backoff across adapters

## Phase 4 - Ingestion Pipeline

- [ ] Implement web scraper worker for curated URLs and article pages
- [ ] Implement canonical URL normalization and content hashing
- [ ] Implement raw artifact upload to MinIO
- [ ] Implement metadata persistence in PostgreSQL for market payloads, documents, and broker events
- [ ] Implement retry and failure tracking for source retrieval
- [ ] Implement dedupe logic across article and filing sources

## Phase 5 - Parsing and Normalization

- [ ] Implement HTML-to-text parsing pipeline
- [ ] Implement boilerplate reduction and body extraction heuristics
- [ ] Implement parser quality scoring and confidence flags
- [ ] Implement company mention detection using ticker, alias, and name matching
- [ ] Persist normalized text and parser outputs to MinIO and PostgreSQL

## Phase 6 - Ollama Structured Extraction

- [ ] Build extraction prompt templates with anti-hallucination instructions
- [ ] Build JSON schema definitions for document intelligence extraction
- [ ] Implement Ollama client wrapper using structured output format
- [ ] Implement schema validation and semantic validation layers
- [ ] Persist prompts, model metadata, raw outputs, validation reports, and final intelligence objects
- [ ] Add retry behavior for invalid or incomplete model responses
- [ ] Add model performance metrics and dashboards

## Phase 7 - Aggregation and Trend Engine

- [ ] Implement recency decay and source credibility weighting
- [ ] Integrate market context features into aggregation windows
- [ ] Implement company-level rolling window aggregation
- [ ] Implement contradiction detection and disagreement representation
- [ ] Implement sector and market rollups
- [ ] Implement evidence ranking for supporting and opposing documents
- [ ] Persist trend windows and evidence mappings

## Phase 8 - Recommendation Engine

- [ ] Design deterministic recommendation eligibility logic
- [ ] Implement recommendation generation from aggregated scores and evidence
- [ ] Add optional LLM wording layer for thesis generation only
- [ ] Persist recommendation objects and evidence citations
- [ ] Add suppression logic for low-quality data or low confidence
- [ ] Publish prediction facts to analytical tables

## Phase 9 - Risk Engine and Trade Adapter

- [ ] Implement portfolio and account risk configuration model
- [ ] Implement hard blocks for max position size, sector exposure, daily loss limits, and news-shock lockouts
- [ ] Implement paper trading adapter behavior and state sync
- [ ] Integrate first broker API in sandbox mode
- [ ] Implement idempotent order submission keys and duplicate prevention
- [ ] Implement full execution audit trail
- [ ] Add operator approval workflow for live trading mode
- [ ] Publish order, fill, and position facts to analytical tables

## Phase 10 - Lakehouse and SQL Analytics

- [ ] Define analytical fact tables for bars, documents, extractions, signals, orders, fills, positions, and PnL
- [ ] Implement Parquet writers for analytical datasets
- [ ] Implement Hive-compatible partition layout conventions on MinIO
- [ ] Implement Iceberg table creation and metadata management for analytical datasets
- [ ] Implement lake publisher jobs from operational data into analytical fact tables
- [ ] Configure Trino catalogs for Hive and/or Iceberg access to MinIO
- [ ] Add example SQL views for prediction-vs-outcome and paper-trade scorecards

## Phase 11 - Query API and Dashboard

- [ ] Build APIs for companies, document timelines, trend summaries, recommendations, and order history
- [ ] Build evidence drill-down view linking recommendations to source documents and raw artifacts
- [ ] Build admin controls for source health, symbol configs, and trading mode
- [ ] Build operational dashboard for ingestion throughput, model failures, and source coverage gaps
- [ ] Build Superset starter dashboards for symbol overview, sentiment heatmap, PnL, and prediction accuracy

## Phase 12 - Observability and Hardening

- [ ] Add structured logs and distributed tracing across services
- [ ] Add Prometheus metrics for ingestion, parsing, extraction, aggregation, lake publication, and trading
- [ ] Add alerting for source failures, schema failure spikes, analytical lag, and broker issues
- [ ] Add dead-letter queues and replay tooling
- [ ] Add data retention and lifecycle controls for raw and derived artifacts
- [ ] Add security review for secrets, network policies, trading isolation, and dashboard access control

## Phase 13 - Verification and Rollout

- [ ] Create replay dataset from archived documents for deterministic extraction testing
- [ ] Create integration tests for the full ingest-to-recommendation flow
- [ ] Create paper trading simulation scenarios
- [ ] Validate fail-closed behavior for broker outages and ambiguous order states
- [ ] Validate lake publication and Trino query correctness over partitioned MinIO datasets
- [ ] Run shadow mode before enabling any live execution
- [ ] Prepare operator runbook and incident response procedures

## Recommended First Vertical Slice

- [ ] Track 5 to 10 symbols
- [ ] Ingest one market data API, one news API, and one filings source per symbol group
- [ ] Persist raw artifacts to MinIO and metadata to PostgreSQL
- [ ] Extract structured document intelligence through Ollama
- [ ] Generate 7-day company trend summaries with market context
- [ ] Produce paper-trade recommendations only
- [ ] Publish analytical facts for bars, signals, and paper trades into MinIO
- [ ] Expose a simple dashboard with evidence, trend cards, and prediction-vs-outcome views
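
As a starting point for the 7-day trend summaries in the vertical slice, the recency decay and source credibility weighting listed under Phase 7 might be sketched as below. This is only one possible approach. The `DocSignal` fields, the exponential half-life form, and the 2-day default are all assumptions for illustration and are not decided anywhere in this task list.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class DocSignal:
    """Hypothetical per-document signal; field names are illustrative."""
    published_at: datetime
    sentiment: float    # assumed range: -1.0 (bearish) to 1.0 (bullish)
    credibility: float  # assumed range: 0.0 to 1.0, taken from the source record


def trend_score(signals, now, window_days=7, half_life_days=2.0):
    """Credibility- and recency-weighted mean sentiment over the window.

    Returns None when no document falls inside the window, so a caller
    can suppress a recommendation instead of acting on an empty signal.
    """
    window = timedelta(days=window_days)
    weighted_sum = 0.0
    weight_total = 0.0
    for s in signals:
        age = now - s.published_at
        # Skip documents dated in the future or older than the window.
        if age < timedelta(0) or age > window:
            continue
        age_days = age.total_seconds() / 86400.0
        # Weight halves every `half_life_days` and scales with credibility.
        weight = s.credibility * 0.5 ** (age_days / half_life_days)
        weighted_sum += weight * s.sentiment
        weight_total += weight
    if weight_total == 0.0:
        return None
    return weighted_sum / weight_total


if __name__ == "__main__":
    now = datetime(2024, 1, 8, tzinfo=timezone.utc)
    docs = [
        DocSignal(now, 1.0, 1.0),                       # fresh, bullish
        DocSignal(now - timedelta(days=2), -1.0, 1.0),  # one half-life old, bearish
    ]
    print(trend_score(docs, now))
```

Returning `None` for an empty window keeps the "suppress on low-quality data" decision in the recommendation layer rather than hiding it behind a neutral score of zero.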
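
For the ingestion side of the slice, the canonical URL normalization and content hashing task from Phase 4 could start from something like the following standard-library sketch. The tracking-parameter list, the trailing-slash rule, and the whitespace-collapsing step are assumptions chosen for illustration; real sources may need different rules.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed set of tracking parameters to drop; extend per source as needed.
TRACKING_PARAMS = {
    "utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content",
    "fbclid", "gclid",
}

DEFAULT_PORTS = {"http": 80, "https": 443}


def canonicalize_url(url: str) -> str:
    """Return a normalized form of `url` for use as a dedupe key.

    Lowercases scheme and host, drops default ports, userinfo, fragments,
    and tracking parameters, sorts the remaining query parameters, and
    strips a trailing slash. IPv6 literal hosts are not handled here.
    """
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    if port is not None and DEFAULT_PORTS.get(scheme) != port:
        host = f"{host}:{port}"
    path = parts.path.rstrip("/") or "/"
    query = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k.lower() not in TRACKING_PARAMS
    )
    return urlunsplit((scheme, host, path, urlencode(query), ""))


def content_hash(text: str) -> str:
    """SHA-256 of the text after collapsing runs of whitespace."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(canonicalize_url(
        "HTTPS://Example.com:443/News/Story/?utm_source=x&b=2&a=1#top"
    ))
    print(content_hash("Earnings  beat\nestimates"))
```

Keeping the URL key and the content hash separate lets the dedupe step catch both the same article reached through different links and the same text syndicated under different URLs.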