# Stonks Oracle - Tasks

## Phase 0 - Project Setup
- [ ] Create repository structure for services, shared schemas, infrastructure, lakehouse, and dashboards
- [ ] Choose implementation language for services (Python preferred for scraping/LLM workflows)
- [ ] Add local development stack with MinIO, PostgreSQL, Redis, Ollama, Trino, and Superset
- [ ] Add Kubernetes manifests or Helm chart skeletons for all core components
- [ ] Add CI pipeline for linting, tests, container builds, schema checks, and lake dataset validation

## Phase 1 - Core Data and Infrastructure
- [ ] Create PostgreSQL schema migrations for companies, watchlists, sources, documents, document intelligence, trends, recommendations, orders, positions, and audit records
- [ ] Create MinIO bucket provisioning and lifecycle policies
- [ ] Create Redis key conventions and queue abstractions
- [ ] Implement shared config loader for environment variables and secrets
- [ ] Implement shared typed JSON schemas for document intelligence, trend summaries, and recommendations
- [ ] Stand up initial Trino catalog configuration for MinIO-backed datasets
- [ ] Stand up Superset with environment-backed datasource configuration
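
One possible shape for the Redis key conventions is shown below as a minimal Python sketch. The `stonks` prefix, the `<namespace>:<env>:<kind>:<id>` ordering, and the helper names `redis_key`, `queue_key`, and `dead_letter_key` are hypothetical, not a settled convention.

```python
NAMESPACE = "stonks"  # hypothetical global key prefix


def redis_key(env: str, kind: str, *parts: str) -> str:
    """Build a colon-delimited key such as 'stonks:dev:lock:source:42'.

    Segments must be non-empty and must not contain ':' so keys stay parseable.
    """
    segments = [NAMESPACE, env, kind, *parts]
    for segment in segments:
        if not segment or ":" in segment:
            raise ValueError(f"invalid key segment: {segment!r}")
    return ":".join(segments)


def queue_key(env: str, stage: str) -> str:
    """One Redis list per pipeline stage, e.g. 'stonks:prod:queue:parse'."""
    return redis_key(env, "queue", stage)


def dead_letter_key(env: str, stage: str) -> str:
    """Hypothetical dead-letter list paired with each stage queue."""
    return redis_key(env, "queue", stage, "dead")
```

Routing every key through one builder keeps keys for a single environment discoverable with a pattern such as `stonks:dev:*`.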

## Phase 2 - Symbol Registry and Source Management
- [ ] Build symbol registry API endpoints for companies, aliases, watchlists, and sources
- [ ] Add source credibility, retention policy, and access policy fields
- [ ] Add source classes for market data API, news API, filings API, web scrape, and broker adapter
- [ ] Add admin validation for duplicate tickers, invalid URLs, and unsupported source types
- [ ] Add seed data support for an initial tracked watchlist
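
The admin validation rules can be sketched in Python using only the standard library. The snake_case source-type identifiers are assumptions that mirror the source classes listed above, and the function names are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical identifiers for the Phase 2 source classes.
SUPPORTED_SOURCE_TYPES = frozenset(
    {"market_data_api", "news_api", "filings_api", "web_scrape", "broker_adapter"}
)


def find_duplicate_tickers(tickers: list[str]) -> list[str]:
    """Return tickers that appear more than once, compared case-insensitively."""
    seen: set[str] = set()
    duplicates: list[str] = []
    for ticker in tickers:
        normalized = ticker.strip().upper()
        if normalized in seen and normalized not in duplicates:
            duplicates.append(normalized)
        seen.add(normalized)
    return duplicates


def is_valid_source_url(url: str) -> bool:
    """Accept only absolute http(s) URLs that include a host."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def validate_source(url: str, source_type: str) -> list[str]:
    """Return validation errors for one source row; an empty list means valid."""
    errors: list[str] = []
    if source_type not in SUPPORTED_SOURCE_TYPES:
        errors.append(f"unsupported source type: {source_type}")
    if not is_valid_source_url(url):
        errors.append(f"invalid URL: {url}")
    return errors
```

Returning every error instead of stopping at the first one lets the admin UI report all problems in a row at once.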

## Phase 3 - External API Adapters
- [ ] Implement scheduler for symbol and source polling windows
- [ ] Implement market data API adapter interface
- [ ] Implement first concrete market data provider adapter
- [ ] Implement news API adapter interface
- [ ] Implement first concrete news API provider adapter
- [ ] Implement filings or regulatory adapter interface
- [ ] Implement first concrete filings provider adapter
- [ ] Implement broker API adapter interface for paper trading and order events
- [ ] Implement rate-limit coordination, retries, and backoff across adapters

## Phase 4 - Ingestion Pipeline
- [ ] Implement web scraper worker for curated URLs and article pages
- [ ] Implement canonical URL normalization and content hashing
- [ ] Implement raw artifact upload to MinIO
- [ ] Implement metadata persistence in PostgreSQL for market payloads, documents, and broker events
- [ ] Implement retry and failure tracking for source retrieval
- [ ] Implement dedupe logic across article and filing sources

## Phase 5 - Parsing and Normalization
- [ ] Implement HTML-to-text parsing pipeline
- [ ] Implement boilerplate reduction and body extraction heuristics
- [ ] Implement parser quality scoring and confidence flags
- [ ] Implement company mention detection using ticker, alias, and name matching
- [ ] Persist normalized text and parser outputs to MinIO and PostgreSQL

## Phase 6 - Ollama Structured Extraction
- [ ] Build extraction prompt templates with anti-hallucination instructions
- [ ] Build JSON schema definitions for document intelligence extraction
- [ ] Implement Ollama client wrapper using structured output format
- [ ] Implement schema validation and semantic validation layers
- [ ] Persist prompts, model metadata, raw outputs, validation reports, and final intelligence objects
- [ ] Add retry behavior for invalid or incomplete model responses
- [ ] Add model performance metrics and dashboards
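
The two validation layers can be kept separate, as in this Python sketch. A model response can be valid JSON of the requested shape and still be semantically wrong, even when Ollama is asked for structured output. The field names below are a hypothetical minimal document-intelligence shape, not the project's schema. The quote-grounding check is one assumed anti-hallucination rule.

```python
import json

# Hypothetical minimal shape; the authoritative schema would be the shared
# JSON schema defined in Phase 1.
REQUIRED_FIELDS = {
    "tickers": list,
    "sentiment": (int, float),
    "confidence": (int, float),
    "evidence_quotes": list,
}


def validate_extraction(raw_output: str, source_text: str, known_tickers: set[str]) -> list[str]:
    """Structural checks first, then semantic checks; an empty list means accepted."""
    try:
        obj = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(obj, dict):
        return ["top-level value must be an object"]

    # Structural layer: required fields with expected types (bool is rejected
    # explicitly because it is a subclass of int in Python).
    errors: list[str] = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in obj:
            errors.append(f"missing field: {name}")
        elif isinstance(obj[name], bool) or not isinstance(obj[name], expected):
            errors.append(f"wrong type for field: {name}")
    if errors:
        return errors

    # Semantic layer: value ranges, known symbols, and quotes grounded in the source.
    if not -1.0 <= obj["sentiment"] <= 1.0:
        errors.append("sentiment out of range [-1, 1]")
    if not 0.0 <= obj["confidence"] <= 1.0:
        errors.append("confidence out of range [0, 1]")
    if not all(isinstance(t, str) for t in obj["tickers"]):
        errors.append("tickers must be strings")
    else:
        unknown = sorted(set(obj["tickers"]) - known_tickers)
        if unknown:
            errors.append(f"unknown tickers: {unknown}")
    for quote in obj["evidence_quotes"]:
        if not isinstance(quote, str) or quote not in source_text:
            errors.append(f"evidence quote not found in source: {quote!r}")
    return errors
```

A non-empty error list could drive the retry behavior for invalid responses and be persisted as the validation report.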

## Phase 7 - Aggregation and Trend Engine
- [ ] Implement recency decay and source credibility weighting
- [ ] Integrate market context features into aggregation windows
- [ ] Implement company-level rolling window aggregation
- [ ] Implement contradiction detection and disagreement representation
- [ ] Implement sector and market rollups
- [ ] Implement evidence ranking for supporting and opposing documents
- [ ] Persist trend windows and evidence mappings

## Phase 8 - Recommendation Engine
- [ ] Design deterministic recommendation eligibility logic
- [ ] Implement recommendation generation from aggregated scores and evidence
- [ ] Add optional LLM wording layer for thesis generation only
- [ ] Persist recommendation objects and evidence citations
- [ ] Add suppression logic for low-quality data or low confidence
- [ ] Publish prediction facts to analytical tables

## Phase 9 - Risk Engine and Trade Adapter
- [ ] Implement portfolio and account risk configuration model
- [ ] Implement hard blocks for max position size, sector exposure, daily loss limits, and news-shock lockouts
- [ ] Implement paper trading adapter behavior and state sync
- [ ] Integrate first broker API in sandbox mode
- [ ] Implement idempotent order submission keys and duplicate prevention
- [ ] Implement full execution audit trail
- [ ] Add operator approval workflow for live trading mode
- [ ] Publish order, fill, and position facts to analytical tables

## Phase 10 - Lakehouse and SQL Analytics
- [ ] Define analytical fact tables for bars, documents, extractions, signals, orders, fills, positions, and PnL
- [ ] Implement Parquet writers for analytical datasets
- [ ] Implement Hive-compatible partition layout conventions on MinIO
- [ ] Implement Iceberg table creation and metadata management for analytical datasets
- [ ] Implement lake publisher jobs from operational data into analytical fact tables
- [ ] Configure Trino catalogs for Hive and/or Iceberg access to MinIO
- [ ] Add example SQL views for prediction-vs-outcome and paper-trade scorecards

## Phase 11 - Query API and Dashboard
- [ ] Build APIs for companies, document timelines, trend summaries, recommendations, and order history
- [ ] Build evidence drill-down view linking recommendations to source documents and raw artifacts
- [ ] Build admin controls for source health, symbol configs, and trading mode
- [ ] Build operational dashboard for ingestion throughput, model failures, and source coverage gaps
- [ ] Build Superset starter dashboards for symbol overview, sentiment heatmap, PnL, and prediction accuracy

## Phase 12 - Observability and Hardening
- [ ] Add structured logs and distributed tracing across services
- [ ] Add Prometheus metrics for ingestion, parsing, extraction, aggregation, lake publication, and trading
- [ ] Add alerting for source failures, schema failure spikes, analytical lag, and broker issues
- [ ] Add dead-letter queues and replay tooling
- [ ] Add data retention and lifecycle controls for raw and derived artifacts
- [ ] Add security review for secrets, network policies, trading isolation, and dashboard access control

## Phase 13 - Verification and Rollout
- [ ] Create replay dataset from archived documents for deterministic extraction testing
- [ ] Create integration tests for the full ingest-to-recommendation flow
- [ ] Create paper trading simulation scenarios
- [ ] Validate fail-closed behavior for broker outages and ambiguous order states
- [ ] Validate lake publication and Trino query correctness over partitioned MinIO datasets
- [ ] Run shadow mode before enabling any live execution
- [ ] Prepare operator runbook and incident response procedures

## Recommended First Vertical Slice
- [ ] Track 5 to 10 symbols
- [ ] Ingest one market data API, one news API, and one filings source per symbol group
- [ ] Persist raw artifacts to MinIO and metadata to PostgreSQL
- [ ] Extract structured document intelligence through Ollama
- [ ] Generate 7-day company trend summaries with market context
- [ ] Produce paper-trade recommendations only
- [ ] Publish analytical facts for bars, signals, and paper trades into MinIO
- [ ] Expose a simple dashboard with evidence, trend cards, and prediction-vs-outcome views