Stonks Oracle - Tasks
Phase 0 - Project Setup
- Create repository structure for services, shared schemas, infrastructure, lakehouse, and dashboards
- Choose implementation language for services (Python preferred for scraping/LLM workflows)
- Add local development stack with MinIO, PostgreSQL, Redis, Ollama, Trino, and Superset
- Add Kubernetes manifests or Helm chart skeletons for all core components
- Add CI pipeline for linting, tests, container builds, schema checks, and lake dataset validation
Phase 1 - Core Data and Infrastructure
- Create PostgreSQL schema migrations for companies, watchlists, sources, documents, document intelligence, trends, recommendations, orders, positions, and audit records
- Create MinIO bucket provisioning and lifecycle policies
- Create Redis key conventions and queue abstractions
- Implement shared config loader for environment variables and secrets
- Implement shared typed JSON schemas for document intelligence, trend summaries, and recommendations
- Stand up initial Trino catalog configuration for MinIO-backed datasets
- Stand up Superset with environment-backed datasource configuration
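A minimal sketch of the shared config loader, assuming configuration arrives only through environment variables. The `Settings` fields and the variable names (`POSTGRES_DSN`, `REDIS_URL`, `MINIO_ENDPOINT`, `OLLAMA_URL`) are hypothetical placeholders, not names defined by this plan.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings object; the real field set is not specified here."""
    postgres_dsn: str
    redis_url: str
    minio_endpoint: str
    ollama_url: str


# Assumed mapping from Settings field to environment variable name.
ENV_NAMES = {
    "postgres_dsn": "POSTGRES_DSN",
    "redis_url": "REDIS_URL",
    "minio_endpoint": "MINIO_ENDPOINT",
    "ollama_url": "OLLAMA_URL",
}


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from the environment, failing fast on missing values."""
    env = os.environ if env is None else env
    missing = sorted(var for var in ENV_NAMES.values() if not env.get(var))
    if missing:
        raise RuntimeError("missing required settings: " + ", ".join(missing))
    return Settings(**{field: env[var] for field, var in ENV_NAMES.items()})
```

Passing `env` explicitly keeps the loader testable without touching the process environment; services would normally call `load_settings()` once at startup.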
Phase 2 - Symbol Registry and Source Management
- Build symbol registry API endpoints for companies, aliases, watchlists, and sources
- Add source credibility, retention policy, and access policy fields
- Add source classes for market data API, news API, filings API, web scrape, and broker adapter
- Add admin validation for duplicate tickers, invalid URLs, and unsupported source types
- Add seed data support for an initial tracked watchlist
Phase 3 - External API Adapters
- Implement scheduler for symbol and source polling windows
- Implement market data API adapter interface
- Implement first concrete market data provider adapter
- Implement news API adapter interface
- Implement first concrete news API provider adapter
- Implement filings or regulatory adapter interface
- Implement first concrete filings provider adapter
- Implement broker API adapter interface for paper trading and order events
- Implement rate-limit coordination, retries, and backoff across adapters
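The retry and backoff task could start from something like the sketch below: capped exponential backoff with full jitter around a single adapter call. `RetryableError` is a hypothetical exception that adapters would raise for rate limits or transient failures. Coordination across adapters (for example a shared rate-limit budget in Redis) is not covered here.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical marker for transient failures (HTTP 429, timeouts)."""


def backoff_delays(attempts: int, base: float = 0.5, cap: float = 30.0) -> List[float]:
    """Upper bounds for each wait: base * 2**i, capped at `cap` seconds."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]


def call_with_retries(
    fn: Callable[[], T],
    attempts: int = 5,
    base: float = 0.5,
    cap: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> T:
    """Call `fn`, retrying on RetryableError with jittered exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    ceilings = backoff_delays(attempts - 1, base, cap)
    for attempt in range(attempts):
        try:
            return fn()
        except RetryableError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Full jitter: wait a random fraction of the current ceiling.
            sleep(rng() * ceilings[attempt])
    raise AssertionError("unreachable")
```

Injecting `sleep` and `rng` keeps the behavior deterministic in tests; production callers can rely on the defaults.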
Phase 4 - Ingestion Pipeline
- Implement web scraper worker for curated URLs and article pages
- Implement canonical URL normalization and content hashing
- Implement raw artifact upload to MinIO
- Implement metadata persistence in PostgreSQL for market payloads, documents, and broker events
- Implement retry and failure tracking for source retrieval
- Implement dedupe logic across article and filing sources
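As an illustration of the URL normalization and content hashing tasks, the sketch below uses only the standard library. The tracking-parameter list and the whitespace-collapsing rule are assumptions; the plan does not specify which normalizations apply.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed set of query parameters that never change the article content.
TRACKING_PREFIXES = ("utm_",)
TRACKING_PARAMS = {"fbclid", "gclid"}
DEFAULT_PORTS = {("http", 80), ("https", 443)}


def canonicalize_url(url: str) -> str:
    """Lowercase scheme and host, drop default ports, fragments, and
    tracking parameters, sort the remaining query, trim trailing slashes."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    if parts.port and (scheme, parts.port) not in DEFAULT_PORTS:
        host = f"{host}:{parts.port}"
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS and not key.startswith(TRACKING_PREFIXES)
    )
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((scheme, host, path, urlencode(query), ""))


def content_hash(text: str) -> str:
    """SHA-256 of the text after collapsing runs of whitespace."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

Two fetches of the same article then share a canonical URL or a content hash, which gives the dedupe step a stable key. Near-duplicate detection (syndicated rewrites) would need a fuzzier technique than an exact hash.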
Phase 5 - Parsing and Normalization
- Implement HTML-to-text parsing pipeline
- Implement boilerplate reduction and body extraction heuristics
- Implement parser quality scoring and confidence flags
- Implement company mention detection using ticker, alias, and name matching
- Persist normalized text and parser outputs to MinIO and PostgreSQL
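Company mention detection by ticker, alias, and name could be prototyped as below. This is a sketch under two assumptions: tickers match case-sensitively as whole tokens (optionally prefixed with `$`), and aliases or names match case-insensitively. The `companies` mapping shape is hypothetical.

```python
import re
from typing import Iterable, Mapping, Set

# Token boundaries: no letter or digit directly before or after the match.
_LEFT = r"(?<![A-Za-z0-9])"
_RIGHT = r"(?![A-Za-z0-9])"


def find_mentions(text: str, companies: Mapping[str, Iterable[str]]) -> Set[str]:
    """Return tickers whose symbol, alias, or name appears in `text`.

    `companies` maps a ticker to its aliases and names.
    """
    found: Set[str] = set()
    for ticker, names in companies.items():
        # Tickers are case-sensitive so "V" does not match the word "very".
        if re.search(rf"{_LEFT}\$?{re.escape(ticker)}{_RIGHT}", text):
            found.add(ticker)
            continue
        for name in names:
            pattern = rf"{_LEFT}{re.escape(name)}{_RIGHT}"
            if re.search(pattern, text, re.IGNORECASE):
                found.add(ticker)
                break
    return found
```

Short tickers such as `V` remain ambiguous even with boundary checks, so a production matcher would likely combine this with context or a confidence score.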
Phase 6 - Ollama Structured Extraction
- Build extraction prompt templates with anti-hallucination instructions
- Build JSON schema definitions for document intelligence extraction
- Implement Ollama client wrapper using structured output format
- Implement schema validation and semantic validation layers
- Persist prompts, model metadata, raw outputs, validation reports, and final intelligence objects
- Add retry behavior for invalid or incomplete model responses
- Add model performance metrics and dashboards
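The two validation layers can be pictured with a standard-library sketch: a structural pass over the model's JSON, then semantic checks against known data. The field names (`summary`, `sentiment`, `confidence`, `tickers`) and allowed sentiment values are assumptions for illustration, not the project's actual schema.

```python
import json
from typing import List, Set

# Hypothetical required fields and their expected JSON types.
REQUIRED_FIELDS = {
    "summary": str,
    "sentiment": str,
    "confidence": (int, float),
    "tickers": list,
}
ALLOWED_SENTIMENTS = {"positive", "neutral", "negative"}


def validate_extraction(raw: str, known_tickers: Set[str]) -> List[str]:
    """Return a list of validation errors; an empty list means valid."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(obj, dict):
        return ["top-level value must be an object"]

    # Layer 1: structural checks (presence and type of each field).
    errors: List[str] = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in obj:
            errors.append(f"missing field: {field}")
        elif isinstance(obj[field], bool) or not isinstance(obj[field], expected):
            errors.append(f"wrong type for field: {field}")
    if errors:
        return errors

    # Layer 2: semantic checks (values must make sense for this system).
    if obj["sentiment"] not in ALLOWED_SENTIMENTS:
        errors.append("sentiment must be positive, neutral, or negative")
    if not 0.0 <= obj["confidence"] <= 1.0:
        errors.append("confidence must be between 0 and 1")
    unknown = sorted(
        str(t) for t in obj["tickers"]
        if not isinstance(t, str) or t not in known_tickers
    )
    if unknown:
        errors.append("unknown tickers: " + ", ".join(unknown))
    return errors
```

A non-empty error list is what the retry task would act on, and the same list can be stored as the validation report.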
Phase 7 - Aggregation and Trend Engine
- Implement recency decay and source credibility weighting
- Integrate market context features into aggregation windows
- Implement company-level rolling window aggregation
- Implement contradiction detection and disagreement representation
- Implement sector and market rollups
- Implement evidence ranking for supporting and opposing documents
- Persist trend windows and evidence mappings
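One possible reading of recency decay combined with source credibility weighting is an exponentially decayed, credibility-scaled average. The half-life form, the 24-hour default, and the `Signal` fields below are assumptions; the plan does not fix a formula.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Signal:
    """Hypothetical per-document signal fed into a trend window."""
    sentiment: float    # -1.0 (bearish) to 1.0 (bullish)
    age_hours: float    # time since publication
    credibility: float  # source credibility, 0.0 to 1.0


def signal_weight(signal: Signal, half_life_hours: float = 24.0) -> float:
    """Credibility scaled by a decay that halves every `half_life_hours`."""
    decay = 0.5 ** (signal.age_hours / half_life_hours)
    return signal.credibility * decay


def weighted_sentiment(signals: Iterable[Signal], half_life_hours: float = 24.0) -> float:
    """Weight-normalized mean sentiment; 0.0 when there is no usable weight."""
    pairs = [(s.sentiment, signal_weight(s, half_life_hours)) for s in signals]
    total = sum(weight for _, weight in pairs)
    if total == 0:
        return 0.0
    return sum(value * weight for value, weight in pairs) / total
```

For example, a fresh bullish document and a day-old bearish one from equally credible sources yield weights 1.0 and 0.5, so the window leans bullish at 1/3.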
Phase 8 - Recommendation Engine
- Design deterministic recommendation eligibility logic
- Implement recommendation generation from aggregated scores and evidence
- Add optional LLM wording layer for thesis generation only
- Persist recommendation objects and evidence citations
- Add suppression logic for low-quality data or low confidence
- Publish prediction facts to analytical tables
Phase 9 - Risk Engine and Trade Adapter
- Implement portfolio and account risk configuration model
- Implement hard blocks for max position size, sector exposure, daily loss limits, and news-shock lockouts
- Implement paper trading adapter behavior and state sync
- Integrate first broker API in sandbox mode
- Implement idempotent order submission keys and duplicate prevention
- Implement full execution audit trail
- Add operator approval workflow for live trading mode
- Publish order, fill, and position facts to analytical tables
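Idempotent order submission is often built on a deterministic key derived from the order's identifying fields. The sketch below assumes the key is a hash of recommendation ID, ticker, side, and quantity, and uses an in-memory ledger as a stand-in for a PostgreSQL table with a unique constraint; both choices are illustrative.

```python
import hashlib
from typing import Dict


def idempotency_key(recommendation_id: str, ticker: str, side: str, quantity: int) -> str:
    """Same logical order always yields the same key, regardless of casing."""
    payload = f"{recommendation_id}|{ticker.upper()}|{side.lower()}|{quantity}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class OrderLedger:
    """In-memory stand-in for a table with a unique index on the key."""

    def __init__(self) -> None:
        self._orders: Dict[str, dict] = {}

    def submit_once(self, key: str, order: dict) -> bool:
        """Record the order and return True, or return False if already seen."""
        if key in self._orders:
            return False
        self._orders[key] = order
        return True
```

A retry after a timeout then reuses the same key and is rejected as a duplicate instead of producing a second order. Many broker APIs also accept a client-supplied order ID that can carry this key, though that depends on the broker chosen.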
Phase 10 - Lakehouse and SQL Analytics
- Define analytical fact tables for bars, documents, extractions, signals, orders, fills, positions, and PnL
- Implement Parquet writers for analytical datasets
- Implement Hive-compatible partition layout conventions on MinIO
- Implement Iceberg table creation and metadata management for analytical datasets
- Implement lake publisher jobs from operational data into analytical fact tables
- Configure Trino catalogs for Hive and/or Iceberg access to MinIO
- Add example SQL views for prediction-vs-outcome and paper-trade scorecards
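Hive-compatible layouts encode partition values as `key=value` directory names. The sketch below assumes partitioning by date and ticker; the partition columns (`dt`, `ticker`) and the file name are hypothetical choices, not ones made by this plan.

```python
from datetime import date


def partition_path(
    table: str,
    ticker: str,
    day: date,
    filename: str = "part-0000.parquet",
) -> str:
    """Object key under the lakehouse bucket using key=value partition dirs."""
    return f"{table}/dt={day.isoformat()}/ticker={ticker.upper()}/{filename}"
```

For example, `partition_path("documents", "aapl", date(2024, 1, 5))` yields `documents/dt=2024-01-05/ticker=AAPL/part-0000.parquet`. Putting the date first lets queries with a date filter prune partitions before looking at tickers.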
Phase 11 - Query API and Dashboard
- Build APIs for companies, document timelines, trend summaries, recommendations, and order history
- Build evidence drill-down view linking recommendations to source documents and raw artifacts
- Build admin controls for source health, symbol configs, and trading mode
- Build operational dashboard for ingestion throughput, model failures, and source coverage gaps
- Build Superset starter dashboards for symbol overview, sentiment heatmap, PnL, and prediction accuracy
Phase 12 - Observability and Hardening
- Add structured logs and distributed tracing across services
- Add Prometheus metrics for ingestion, parsing, extraction, aggregation, lake publication, and trading
- Add alerting for source failures, schema failure spikes, analytical lag, and broker issues
- Add dead-letter queues and replay tooling
- Add data retention and lifecycle controls for raw and derived artifacts
- Add security review for secrets, network policies, trading isolation, and dashboard access control
Phase 13 - Verification and Rollout
- Create replay dataset from archived documents for deterministic extraction testing
- Create integration tests for the full ingest-to-recommendation flow
- Create paper trading simulation scenarios
- Validate fail-closed behavior for broker outages and ambiguous order states
- Validate lake publication and Trino query correctness over partitioned MinIO datasets
- Run shadow mode (moved to Phase 15.6, post-deployment)
- Prepare operator runbook (moved to Phase 15.7, post-deployment)
Phase 14 - Local Docker Build Validation
- 14. Build and validate all Docker containers locally
- 14.1 Build all 11 service containers locally using the Makefile
- Run `make build` to build scheduler, symbol-registry, ingestion, parser, extractor, aggregation, recommendation, risk, broker-adapter, lake-publisher, and query-api images
- Fix any build failures (missing dependencies, import errors, syntax issues)
- Requirements: N1, 12.1
- 14.2 Validate schema and logic consistency across all services
- Run the full test suite with `pytest tests/ -x --tb=short -q` to catch import errors, schema mismatches, and logic inconsistencies
- Verify all shared schemas in `services/shared/schemas.py` are consistent with what each service expects
- Verify config loader fields match the configmap and secrets definitions
- Fix any mismatches found between services, schemas, migrations, and K8s manifests
- Requirements: 5.2, 5.3, 9.2, N2
- 14.3 Verify each container starts without immediate crash
- Run each built image with `docker run --rm` and a quick health check or `--help` flag to confirm the entrypoint resolves
- Fix any runtime import errors or missing module paths
- Requirements: N1
Phase 15 - CI Validation, Helm Deployment, and Cluster Rollout
- 15. Commit, push, validate CI, create Helm chart, and deploy to cluster
- 15.1 Commit and push code to GitHub
- Configure git with SSH key for the private repo
- Commit all current changes with message `phase 14-15: docker build validation and helm deployment`
- Push to main branch
- Requirements: N1
- 15.2 Validate GitHub Actions workflow builds containers
- Monitor the GitHub Actions run to confirm lint-and-test and build-services jobs succeed
- Fix any CI failures and re-push if needed
- Requirements: N1
- 15.3 Create Helm chart for stonks-oracle deployment
- Create `infra/helm/stonks-oracle/Chart.yaml` with chart metadata
- Create `infra/helm/stonks-oracle/values.yaml` with configurable image tags, replica counts, resource limits, and environment references
- Create Helm templates for all deployments, services, configmap, secrets, ingress, and network policies from existing K8s manifests
- Add imagePullSecrets configuration for GHCR private registry access
- Add a template for a Kubernetes Secret of type `kubernetes.io/dockerconfigjson` for GHCR authentication
- Requirements: N1, 8.2
- 15.4 Configure GHCR image pull authentication on the cluster
- Create a `docker-registry` secret in the `stonks-oracle` namespace with GHCR credentials (using a GitHub PAT or deploy key)
- Reference the imagePullSecret in all deployment specs via the Helm values
- Requirements: 8.2, N1
- 15.5 Deploy stonks-oracle to the cluster via Helm
- Run `helm install` or `helm upgrade --install` targeting the `stonks-oracle` namespace
- Verify all pods reach Running/Ready state
- Verify services and ingress endpoints are reachable
- Debug and fix any deployment issues (CrashLoopBackOff, image pull errors, config mismatches)
- Requirements: N1, 12.1
- 15.6 Run shadow mode before enabling any live execution
- Confirm all services are running and processing in paper-only mode
- Validate end-to-end data flow from ingestion through recommendation without live trades
- Requirements: N5, 8.1
- 15.7 Prepare operator runbook and incident response procedures
- Document service restart procedures, log access, and common failure modes
- Document how to toggle trading modes and approve live execution
- Requirements: 8.2, 12.1
Phase 17 - First Vertical Slice: Live Pipeline End-to-End
- 17. Activate the full data pipeline for a set of tracked symbols
- 17.1 Seed initial symbols and configure sources via the dashboard
- Use the dashboard Companies page to add 5-10 symbols (e.g. AAPL, MSFT, GOOGL, AMZN, TSLA, NVDA, META, JPM, V, UNH)
- For each company, add sources via the Company Detail → Sources tab: one `market_api` source (Polygon), one `news_api` source, one `filings_api` source
- Configure source `config` JSON with the correct Polygon endpoint patterns per ticker
- Verify companies and sources appear in the dashboard and via `curl https://stonks-registry.celestium.life/companies`
- Requirements: 1.1, 1.2, 1.3, 2.1
- 17.2 Wire the scheduler to enqueue ingestion jobs for active sources
- Verify the scheduler service reads active companies and sources from PostgreSQL
- Verify it enqueues Redis jobs for each source on its polling interval
- Check scheduler logs: `kubectl logs -n stonks-oracle deployment/scheduler --tail=50`
- Confirm ingestion queue has items: check Redis keys via `kubectl exec`
- Fix any issues with the scheduler → source → Redis queue flow
- Requirements: 3.1, 3.2
- 17.3 Validate ingestion workers pull data from Polygon and persist to MinIO/PostgreSQL
- Check ingestion worker logs for successful API calls to Polygon
- Verify raw market data artifacts land in MinIO `stonks-raw-market` bucket
- Verify document metadata rows appear in PostgreSQL `documents` table
- Verify `ingestion_runs` table shows completed runs with `items_fetched > 0`
- Debug and fix any adapter errors (auth, rate limits, response parsing)
- Requirements: 4.1, 4.2, 4.3
- 17.4 Validate parser normalizes documents and extractor produces intelligence
- Check parser worker logs for document processing
- Verify normalized text appears in MinIO `stonks-normalized` bucket
- Verify `parse_quality_score` and `parse_confidence` are set on documents
- Check extractor worker logs for Ollama calls
- Verify `document_intelligence` rows appear with summaries, sentiments, and impact records
- Verify `document_impact_records` link intelligence to companies
- Debug any Ollama connection or schema validation issues
- Requirements: 5.1, 5.2, 6.1, 6.2, 6.3
- 17.5 Validate aggregation produces trend summaries
- Check aggregation worker logs for trend window generation
- Verify `trend_windows` table has entries with direction, strength, confidence
- Verify `trend_evidence` table links trends to contributing documents
- Verify contradiction scores are computed
- Check the dashboard Trends page shows trend cards with real data
- Requirements: 7.1, 7.2, 7.3
- 17.6 Validate recommendation engine produces paper-trade recommendations
- Check recommendation worker logs for recommendation generation
- Verify `recommendations` table has entries with action, confidence, thesis
- Verify `recommendation_evidence` links recommendations to documents
- Verify risk evaluation runs and `risk_evaluations` table has entries
- Check the dashboard Recommendations page shows real recommendations
- Requirements: 8.1, 8.2, 8.3
- 17.7 Validate lake publisher writes analytical facts to MinIO/Trino
- Check lake-publisher worker logs for Parquet writes
- Verify MinIO `stonks-lakehouse` bucket has partitioned Parquet files
- Verify Trino can query the lakehouse tables via the SQL Explorer page
- Test a sample query: `SELECT * FROM lakehouse.stonks.documents LIMIT 10`
- Requirements: 10.1, 10.2, 10.3
- 17.8 Validate dashboard shows live data across all pages
- Verify Home page shows non-zero metrics (active companies, documents, recommendations)
- Verify Companies page lists seeded companies with source counts
- Verify Documents page shows ingested documents with parse quality badges
- Verify Trends page shows trend cards with real direction/strength
- Verify Recommendations page shows generated recommendations
- Verify Ops Pipeline page shows document stage counts
- Verify Ops Ingestion page shows throughput data
- Take screenshots or curl key endpoints to confirm
- Requirements: 13.1, 13.2, 13.3, 13.4, 13.6
Phase 16 - Web Dashboard Frontend
- 16. Build React web dashboard for full platform control and analytics
- 16.1 Scaffold React project with Vite, TypeScript, Tailwind, and routing
- Initialize `frontend/` directory with `npm create vite@latest` using the React-TS template
- Install core dependencies: `@tanstack/react-router`, `@tanstack/react-query`, `tailwindcss`, `recharts`, `@monaco-editor/react`, `lucide-react`
- Configure Tailwind with a dark-mode-friendly color palette
- Create the app shell with sidebar navigation layout and route definitions for all sections
- Set up TanStack Query provider with default stale/cache times
- Create a shared API client module that targets the Query API, Symbol Registry, and Risk Engine base URLs (configurable via env vars)
- Requirements: 13.1
- 16.2 Build the API client layer and shared components
- Create typed API hooks using TanStack Query for each API domain: companies, documents, trends, recommendations, orders, positions, admin/trading, admin/sources, ops endpoints
- Build reusable UI components: DataTable (sortable, filterable, paginated), StatusBadge, ConfidenceBar, TrendArrow, DateRangeSelector, TickerFilter, LoadingSpinner, ErrorBoundary
- Build a shared layout component with sidebar, breadcrumbs, and top-bar health indicator
- Requirements: 13.1, 13.2
- 16.3 Implement company and source management pages
- Build `/companies` list page with searchable, sortable table showing ticker, name, sector, active status, source count
- Build `/companies/:id` detail page with editable fields (sector, industry, market cap, active toggle), tabs for aliases, sources, and document history
- Build source add/edit form with source type selector, config JSON editor, credibility slider, retention days, access policy dropdown
- Build `/companies/:id/aliases` management with add/delete alias forms
- Wire watchlist CRUD pages at `/watchlists` with member management
- Requirements: 13.2, 1.1, 1.2, 1.3
- 16.4 Implement document timeline and intelligence drill-down pages
- Build `/documents` list page with filterable timeline: title, type, source, ticker mentions, published date, parse quality badge, extraction status
- Build `/documents/:id` detail page showing full intelligence extraction, company impacts with sentiment/score, key facts, risks, macro themes, and links to raw MinIO artifacts
- Add evidence chain visualization showing document → intelligence → impact records
- Requirements: 13.3, 11.1, 11.2
- 16.5 Implement trend summary and evidence chain pages
- Build `/trends` list page with company trend cards showing direction indicator, strength bar, confidence score, contradiction score, and window selector
- Build `/trends/:id` detail page with full evidence drill-down: contributing documents, intelligence objects, rank scores, weight breakdowns
- Add expandable evidence list on trend cards for quick preview
- Requirements: 13.3, 6.5, 10.4
- 16.6 Implement recommendation review and order tracking pages
- Build `/recommendations` list page with filterable table: ticker, action badge, mode, confidence bar, thesis preview, timestamp
- Build `/recommendations/:id` detail page with full evidence drill-down, risk evaluation display, and linked orders
- Build `/orders` list page with status badges, fill info, and expandable audit trail
- Build `/orders/:id` detail page with decision trace, order events timeline, and full audit trail
- Build `/positions` page with current positions table showing unrealized/realized PnL, entry/current prices
- Requirements: 13.4, 11.1, 11.2, 11.3
- 16.7 Implement trading controls and risk management pages
- Build `/trading` page with trading mode toggle (paper/live/disabled) with confirmation dialog
- Build pending approvals queue with approve/reject buttons and review note input
- Build risk configuration editor form for max position size, daily loss cap, sector exposure, cooldown periods
- Build active lockouts display with type, reason, and expiration countdown
- Requirements: 13.5, 8.1, 8.2
- 16.8 Implement DevOps monitoring dashboards
- Build `/ops/pipeline` page with pipeline health summary: document stage counts, parsing quality distribution, extraction validation rates, trend generation stats
- Build `/ops/ingestion` page with time-series charts (Recharts) for ingestion throughput, success/failure rates by source type, configurable time bucket selector
- Build `/ops/model` page with model performance metrics: success rate gauge, latency percentile chart, retry rate, confidence distribution histogram, recent failures table
- Build `/ops/coverage` page with company × source type coverage matrix, stale source indicators, and coverage gap alerts
- Requirements: 13.6, 12.1, 12.2, 12.3
- 16.9 Implement SQL query explorer (Athena-like)
- Add a `/api/analytics/query` proxy endpoint to the Query API that forwards SQL to Trino, enforces row limits, and returns structured `{columns, rows, row_count, elapsed_ms}` results
- Add a `/api/analytics/schema` endpoint that returns Trino catalog/schema/table/column metadata for the schema browser
- Build `/analytics/query` page with Monaco Editor SQL input, schema browser sidebar, execute button, and results table with virtual scrolling
- Add chart builder panel: select chart type (line, bar, scatter, pie, heatmap), map result columns to axes, render via Recharts
- Add saved queries: persist to PostgreSQL via a new `/api/analytics/saved-queries` CRUD endpoint, display saved query list with load/delete
- Requirements: 13.7, 10.1, 10.3
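Row-limit enforcement in the query proxy might be approached by wrapping the user's statement in an outer `SELECT ... LIMIT`. The sketch below is one conservative option, not the project's implementation. It assumes only single `SELECT` or `WITH` statements are allowed and rejects any inner semicolon outright, including semicolons inside string literals. The function name and the default limit are hypothetical.

```python
def enforce_row_limit(sql: str, max_rows: int = 1000) -> str:
    """Wrap a read-only statement so it returns at most `max_rows` rows."""
    if max_rows < 1:
        raise ValueError("max_rows must be positive")
    statement = sql.strip().rstrip(";").strip()
    if not statement:
        raise ValueError("empty query")
    if ";" in statement:
        # Conservative: also rejects semicolons inside string literals.
        raise ValueError("only a single statement is allowed")
    if not statement.lower().startswith(("select", "with")):
        raise ValueError("only read queries are allowed")
    # The newline before ")" keeps a trailing "--" comment from hiding it.
    return f"SELECT * FROM (\n{statement}\n) AS limited LIMIT {max_rows}"
```

The prefix check is not a security boundary on its own; a read-only Trino user for the proxy would be the stronger control, with this wrapper acting as a guard against oversized result sets.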
- 16.10 Implement pre-built analytical dashboards (QuickSight-like)
- Build `/analytics/dashboards` gallery page listing available dashboards with preview thumbnails
- Build Symbol Overview dashboard: company card grid with trend direction, latest recommendation, position status, sourced from API data
- Build Sentiment Heatmap dashboard: sector × time matrix colored by aggregated sentiment, sourced from Trino query
- Build Prediction Accuracy dashboard: scatter plot of predicted confidence vs realized price move, sourced from Trino `prediction_vs_outcome` table
- Build Paper Trading PnL dashboard: equity curve line chart, daily PnL bars, win rate metrics, sourced from Trino `pnl_daily` table
- Build Model Quality dashboard: extraction success rate over time, latency distribution, retry rate charts
- Add date range selector and ticker filter controls shared across all dashboards
- Requirements: 13.8, 10.2, 10.4
- 16.11 Build home/overview page
- Build `/` home page with system health summary card, recent activity feed, key metrics (active companies, documents today, recommendations today, pipeline status)
- Add quick-nav cards linking to each major section
- Add alert banner for critical issues (source failures, pipeline bottlenecks)
- Requirements: 13.1, 13.6
- 16.12 Add Dockerfile, CI build, Helm template, and deploy to cluster
- Create `frontend/Dockerfile` using multi-stage build: node for build, nginx for serve
- Add `dashboard` service to `.github/workflows/build.yml` matrix
- Add `dashboard` deployment, service, and ingress to Helm chart values and templates
- Configure ingress at `stonks.celestium.life` (or similar)
- Build, push, and deploy to cluster
- Verify dashboard is accessible and all pages load data from the APIs
- Requirements: 13.1, N1