# Stonks Oracle - Tasks
## Phase 0 - Project Setup

- [ ] Create repository structure for services, shared schemas, infrastructure, lakehouse, and dashboards
- [ ] Choose implementation language for services (Python preferred for scraping/LLM workflows)
- [ ] Add local development stack with MinIO, PostgreSQL, Redis, Ollama, Trino, and Superset
- [ ] Add Kubernetes manifests or Helm chart skeletons for all core components
- [ ] Add CI pipeline for linting, tests, container builds, schema checks, and lake dataset validation
## Phase 1 - Core Data and Infrastructure

- [ ] Create PostgreSQL schema migrations for companies, watchlists, sources, documents, document intelligence, trends, recommendations, orders, positions, and audit records
- [ ] Create MinIO bucket provisioning and lifecycle policies
- [ ] Create Redis key conventions and queue abstractions
- [ ] Implement shared config loader for environment variables and secrets
- [ ] Implement shared typed JSON schemas for document intelligence, trend summaries, and recommendations
- [ ] Stand up initial Trino catalog configuration for MinIO-backed datasets
- [ ] Stand up Superset with environment-backed datasource configuration
## Phase 2 - Symbol Registry and Source Management

- [ ] Build symbol registry API endpoints for companies, aliases, watchlists, and sources
- [ ] Add source credibility, retention policy, and access policy fields
- [ ] Add source classes for market data API, news API, filings API, web scrape, and broker adapter
- [ ] Add admin validation for duplicate tickers, invalid URLs, and unsupported source types
- [ ] Add seed data support for an initial tracked watchlist
## Phase 3 - External API Adapters

- [ ] Implement scheduler for symbol and source polling windows
- [ ] Implement market data API adapter interface
- [ ] Implement first concrete market data provider adapter
- [ ] Implement news API adapter interface
- [ ] Implement first concrete news API provider adapter
- [ ] Implement filings or regulatory adapter interface
- [ ] Implement first concrete filings provider adapter
- [ ] Implement broker API adapter interface for paper trading and order events
- [ ] Implement rate-limit coordination, retries, and backoff across adapters
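
As a starting point for the retry and backoff task, here is a minimal sketch of exponential backoff with full jitter. `RetryableError`, `backoff_delays`, and `call_with_retries` are hypothetical names, the defaults are placeholders, and cross-adapter rate-limit coordination is not covered.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, HTTP 429/5xx)."""


def backoff_delays(max_attempts: int, base: float = 0.5, cap: float = 30.0) -> List[float]:
    """Upper bound of the sleep before each retry: base * 2**n, capped at `cap`."""
    return [min(cap, base * (2 ** n)) for n in range(max_attempts - 1)]


def call_with_retries(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base: float = 0.5,
    cap: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying RetryableError with full-jitter exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delays = backoff_delays(max_attempts, base, cap)
    for attempt in range(max_attempts):
        try:
            return fn()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            # Full jitter: sleep a random amount up to the exponential bound.
            sleep(random.uniform(0.0, delays[attempt]))
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so tests can record delays instead of waiting. Each adapter would decide which provider errors to map onto `RetryableError`.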
## Phase 4 - Ingestion Pipeline

- [ ] Implement web scraper worker for curated URLs and article pages
- [ ] Implement canonical URL normalization and content hashing
- [ ] Implement raw artifact upload to MinIO
- [ ] Implement metadata persistence in PostgreSQL for market payloads, documents, and broker events
- [ ] Implement retry and failure tracking for source retrieval
- [ ] Implement dedupe logic across article and filing sources
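
URL normalization and content hashing could look roughly like the sketch below, which uses only the standard library. The tracking-parameter list and the whitespace-collapsing rule are assumptions, and the function names are hypothetical.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these query parameters never change the page content.
TRACKING_PARAMS = {
    "utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content",
    "fbclid", "gclid",
}
DEFAULT_PORTS = {"http": 80, "https": 443}


def canonicalize_url(url: str) -> str:
    """Lowercase scheme/host, drop default ports, fragments, and tracking params."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    if parts.port and DEFAULT_PORTS.get(scheme) != parts.port:
        host = f"{host}:{parts.port}"  # keep only non-default ports
    path = parts.path.rstrip("/") or "/"
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    )
    # The fragment is dropped because it never reaches the server.
    return urlunsplit((scheme, host, path, urlencode(query), ""))


def content_hash(text: str) -> str:
    """SHA-256 of the text after collapsing runs of whitespace."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

Two fetches whose canonical URL or content hash match are candidates for the dedupe step. This sketch drops URL credentials and does not handle bracketed IPv6 hosts.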
## Phase 5 - Parsing and Normalization

- [ ] Implement HTML-to-text parsing pipeline
- [ ] Implement boilerplate reduction and body extraction heuristics
- [ ] Implement parser quality scoring and confidence flags
- [ ] Implement company mention detection using ticker, alias, and name matching
- [ ] Persist normalized text and parser outputs to MinIO and PostgreSQL
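
A first pass at mention detection by ticker, alias, and name could be plain regex matching, as sketched below. The `Company` shape and `find_mentions` are hypothetical. Treating tickers as case-sensitive while names and aliases are case-insensitive is an assumption, made so that short tickers match fewer ordinary words.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Company:
    ticker: str
    name: str
    aliases: Tuple[str, ...] = ()


# Require a non-alphanumeric character (or text boundary) on both sides.
_LEFT = r"(?<![A-Za-z0-9])"
_RIGHT = r"(?![A-Za-z0-9])"


def find_mentions(text: str, companies: List[Company]) -> Dict[str, List[str]]:
    """Return {ticker: [matched terms]} for every company mentioned in text."""
    found: Dict[str, List[str]] = {}
    for company in companies:
        hits: List[str] = []
        # Tickers: case-sensitive, optionally written as a cashtag ("$F").
        ticker_pattern = rf"{_LEFT}\$?{re.escape(company.ticker)}{_RIGHT}"
        if re.search(ticker_pattern, text):
            hits.append(company.ticker)
        # Names and aliases: case-insensitive whole-term match.
        for term in (company.name, *company.aliases):
            if re.search(rf"{_LEFT}{re.escape(term)}{_RIGHT}", text, flags=re.IGNORECASE):
                hits.append(term)
        if hits:
            found[company.ticker] = hits
    return found
```

Single-letter tickers will still produce false positives on initials and similar text, so a real implementation would likely combine these hits with the parser confidence flags.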
## Phase 6 - Ollama Structured Extraction

- [ ] Build extraction prompt templates with anti-hallucination instructions
- [ ] Build JSON schema definitions for document intelligence extraction
- [ ] Implement Ollama client wrapper using structured output format
- [ ] Implement schema validation and semantic validation layers
- [ ] Persist prompts, model metadata, raw outputs, validation reports, and final intelligence objects
- [ ] Add retry behavior for invalid or incomplete model responses
- [ ] Add model performance metrics and dashboards
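
The two validation layers might be separated as in the sketch below: a schema layer checks shape and types, then a semantic layer checks values against the registry and the source text. The field names (`ticker`, `sentiment`, `confidence`, `evidence_quotes`) are hypothetical placeholders for the real document intelligence schema. The quote check is one possible anti-hallucination guard, not a complete one.

```python
import json
from typing import List, Set

# Hypothetical minimal shape for one extraction result.
REQUIRED_FIELDS = {
    "ticker": str,
    "sentiment": (int, float),
    "confidence": (int, float),
    "evidence_quotes": list,
}


def validate_extraction(raw_output: str, source_text: str, known_tickers: Set[str]) -> List[str]:
    """Return validation errors; an empty list means the output is accepted."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]

    # Schema layer: required fields and basic types (bool is not a number here).
    errors: List[str] = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in data:
            errors.append(f"missing field: {field}")
        elif isinstance(data[field], bool) or not isinstance(data[field], expected):
            errors.append(f"wrong type for field: {field}")
    if errors:
        return errors

    # Semantic layer: values must make sense against registry and source text.
    if data["ticker"] not in known_tickers:
        errors.append("ticker not in symbol registry")
    if not -1.0 <= data["sentiment"] <= 1.0:
        errors.append("sentiment out of range [-1, 1]")
    if not 0.0 <= data["confidence"] <= 1.0:
        errors.append("confidence out of range [0, 1]")
    for quote in data["evidence_quotes"]:
        if not isinstance(quote, str) or quote not in source_text:
            errors.append("evidence quote not found in source text")
    return errors
```

The returned error list can be persisted as the validation report and used to decide whether to retry the model call.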
## Phase 7 - Aggregation and Trend Engine

- [ ] Implement recency decay and source credibility weighting
- [ ] Integrate market context features into aggregation windows
- [ ] Implement company-level rolling window aggregation
- [ ] Implement contradiction detection and disagreement representation
- [ ] Implement sector and market rollups
- [ ] Implement evidence ranking for supporting and opposing documents
- [ ] Persist trend windows and evidence mappings
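
One way to combine recency decay with source credibility is an exponentially decayed weighted mean, sketched below. The half-life form, the 48-hour default, and the `Signal` fields are assumptions for illustration, not a settled design.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Signal:
    sentiment: float    # document-level score in [-1, 1]
    age_hours: float    # time since publication
    credibility: float  # source credibility in [0, 1]


def decay_weight(age_hours: float, half_life_hours: float) -> float:
    """Weight halves every `half_life_hours`; a brand-new document weighs 1.0."""
    return 0.5 ** (age_hours / half_life_hours)


def weighted_sentiment(signals: List[Signal], half_life_hours: float = 48.0) -> Optional[float]:
    """Credibility- and recency-weighted mean sentiment, or None without evidence."""
    weights = [s.credibility * decay_weight(s.age_hours, half_life_hours) for s in signals]
    total = sum(weights)
    if total == 0:
        return None  # no usable evidence in the window
    return sum(w * s.sentiment for w, s in zip(weights, signals)) / total
```

For example, a fresh positive document and a two-day-old negative one of equal credibility average to about +0.33 rather than 0, because the older document carries half the weight.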
## Phase 8 - Recommendation Engine

- [ ] Design deterministic recommendation eligibility logic
- [ ] Implement recommendation generation from aggregated scores and evidence
- [ ] Add optional LLM wording layer for thesis generation only
- [ ] Persist recommendation objects and evidence citations
- [ ] Add suppression logic for low-quality data or low confidence
- [ ] Publish prediction facts to analytical tables
## Phase 9 - Risk Engine and Trade Adapter

- [ ] Implement portfolio and account risk configuration model
- [ ] Implement hard blocks for max position size, sector exposure, daily loss limits, and news-shock lockouts
- [ ] Implement paper trading adapter behavior and state sync
- [ ] Integrate first broker API in sandbox mode
- [ ] Implement idempotent order submission keys and duplicate prevention
- [ ] Implement full execution audit trail
- [ ] Add operator approval workflow for live trading mode
- [ ] Publish order, fill, and position facts to analytical tables
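
Idempotent order submission can be sketched by deriving a key from the fields that define "the same order" and refusing to send a key twice. The choice of fields and the `OrderSubmitter` class are hypothetical. The in-memory dict stands in for a durable store, such as a unique constraint in PostgreSQL.

```python
import hashlib
import json
from typing import Any, Callable, Dict


def idempotency_key(account_id: str, recommendation_id: str, symbol: str,
                    side: str, quantity: int) -> str:
    """Deterministic key: identical order intents always hash to the same value."""
    payload = json.dumps(
        {
            "account_id": account_id,
            "recommendation_id": recommendation_id,
            "symbol": symbol,
            "side": side,
            "quantity": str(quantity),
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class OrderSubmitter:
    """Sends each distinct order at most once and replays the stored result."""

    def __init__(self, send: Callable[[str, Dict[str, Any]], Dict[str, Any]]) -> None:
        self._send = send
        self._results: Dict[str, Dict[str, Any]] = {}  # placeholder for a durable store

    def submit(self, **order: Any) -> Dict[str, Any]:
        key = idempotency_key(**order)
        if key in self._results:
            return self._results[key]  # duplicate: do not contact the broker again
        result = self._send(key, order)
        self._results[key] = result
        return result
```

If the broker API accepts a client-supplied order ID, passing the same key there would add a second layer of duplicate protection on the broker side.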
## Phase 10 - Lakehouse and SQL Analytics

- [ ] Define analytical fact tables for bars, documents, extractions, signals, orders, fills, positions, and PnL
- [ ] Implement Parquet writers for analytical datasets
- [ ] Implement Hive-compatible partition layout conventions on MinIO
- [ ] Implement Iceberg table creation and metadata management for analytical datasets
- [ ] Implement lake publisher jobs from operational data into analytical fact tables
- [ ] Configure Trino catalogs for Hive and/or Iceberg access to MinIO
- [ ] Add example SQL views for prediction-vs-outcome and paper-trade scorecards
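
For the Hive-compatible layout, object keys follow the `column=value` directory convention. The sketch below builds and parses such keys. The partition columns (`symbol`, `dt`), their order, and the file naming are assumptions to be settled when the fact tables are defined.

```python
from datetime import date
from typing import Dict
from urllib.parse import quote, unquote


def partition_key(dataset: str, symbol: str, trading_day: date, part: int = 0) -> str:
    """Build an object key such as bars/symbol=AAPL/dt=2024-01-02/part-00000.parquet."""
    # Percent-encode the symbol so a "/" in it cannot create extra path segments.
    safe_symbol = quote(symbol, safe="")
    return f"{dataset}/symbol={safe_symbol}/dt={trading_day.isoformat()}/part-{part:05d}.parquet"


def parse_partitions(key: str) -> Dict[str, str]:
    """Recover partition column values from the column=value path segments."""
    partitions: Dict[str, str] = {}
    for segment in key.split("/"):
        if "=" in segment:
            column, value = segment.split("=", 1)
            partitions[column] = unquote(value)
    return partitions
```

Iceberg tables track their data files through table metadata, so this convention matters mainly for the Hive-style datasets.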
## Phase 11 - Query API and Dashboard

- [ ] Build APIs for companies, document timelines, trend summaries, recommendations, and order history
- [ ] Build evidence drill-down view linking recommendations to source documents and raw artifacts
- [ ] Build admin controls for source health, symbol configs, and trading mode
- [ ] Build operational dashboard for ingestion throughput, model failures, and source coverage gaps
- [ ] Build Superset starter dashboards for symbol overview, sentiment heatmap, PnL, and prediction accuracy
## Phase 12 - Observability and Hardening

- [ ] Add structured logs and distributed tracing across services
- [ ] Add Prometheus metrics for ingestion, parsing, extraction, aggregation, lake publication, and trading
- [ ] Add alerting for source failures, schema failure spikes, analytical lag, and broker issues
- [ ] Add dead-letter queues and replay tooling
- [ ] Add data retention and lifecycle controls for raw and derived artifacts
- [ ] Add security review for secrets, network policies, trading isolation, and dashboard access control
## Phase 13 - Verification and Rollout

- [ ] Create replay dataset from archived documents for deterministic extraction testing
- [ ] Create integration tests for the full ingest-to-recommendation flow
- [ ] Create paper trading simulation scenarios
- [ ] Validate fail-closed behavior for broker outages and ambiguous order states
- [ ] Validate lake publication and Trino query correctness over partitioned MinIO datasets
- [ ] Run shadow mode before enabling any live execution
- [ ] Prepare operator runbook and incident response procedures
## Recommended First Vertical Slice

- [ ] Track 5 to 10 symbols
- [ ] Ingest one market data API, one news API, and one filings source per symbol group
- [ ] Persist raw artifacts to MinIO and metadata to PostgreSQL
- [ ] Extract structured document intelligence through Ollama
- [ ] Generate 7-day company trend summaries with market context
- [ ] Produce paper-trade recommendations only
- [ ] Publish analytical facts for bars, signals, and paper trades into MinIO
- [ ] Expose a simple dashboard with evidence, trend cards, and prediction-vs-outcome views