Stonks Oracle - Tasks

Phase 0 - Project Setup

  • Create repository structure for services, shared schemas, infrastructure, lakehouse, and dashboards
  • Choose implementation language for services (Python preferred for scraping/LLM workflows)
  • Add local development stack with MinIO, PostgreSQL, Redis, Ollama, Trino, and Superset
  • Add Kubernetes manifests or Helm chart skeletons for all core components
  • Add CI pipeline for linting, tests, container builds, schema checks, and lake dataset validation

Phase 1 - Core Data and Infrastructure

  • Create PostgreSQL schema migrations for companies, watchlists, sources, documents, document intelligence, trends, recommendations, orders, positions, and audit records
  • Create MinIO bucket provisioning and lifecycle policies
  • Create Redis key conventions and queue abstractions
  • Implement shared config loader for environment variables and secrets
  • Implement shared typed JSON schemas for document intelligence, trend summaries, and recommendations
  • Stand up initial Trino catalog configuration for MinIO-backed datasets
  • Stand up Superset with environment-backed datasource configuration
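A minimal sketch of the shared config loader. It assumes settings are read from environment variables first and then from secret files mounted into the pod, one file per key named after the lower-cased setting (an assumed convention, similar to what a Kubernetes Secret volume provides). The setting names (POSTGRES_DSN, REDIS_URL, MINIO_ENDPOINT, OLLAMA_URL, POLL_INTERVAL_SECONDS) and the Settings fields are hypothetical and would need to match the real configmap and secrets.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


class ConfigError(RuntimeError):
    """Raised when a required setting is missing."""


@dataclass(frozen=True)
class Settings:
    # Hypothetical settings; align names with the real configmap and secrets.
    postgres_dsn: str
    redis_url: str
    minio_endpoint: str
    ollama_url: str
    poll_interval_seconds: int = 300


def _lookup(name: str, env: Mapping[str, str], secrets_dir: Optional[Path]) -> Optional[str]:
    """Environment variables win; otherwise read <secrets_dir>/<name lower-cased> if present."""
    if name in env:
        return env[name]
    if secrets_dir is not None:
        candidate = secrets_dir / name.lower()
        if candidate.is_file():
            return candidate.read_text().strip()
    return None


def load_settings(env: Optional[Mapping[str, str]] = None,
                  secrets_dir: Optional[Path] = None) -> Settings:
    env = os.environ if env is None else env

    def required(name: str) -> str:
        value = _lookup(name, env, secrets_dir)
        if value is None:
            raise ConfigError(f"missing required setting: {name}")
        return value

    interval = _lookup("POLL_INTERVAL_SECONDS", env, secrets_dir)
    return Settings(
        postgres_dsn=required("POSTGRES_DSN"),
        redis_url=required("REDIS_URL"),
        minio_endpoint=required("MINIO_ENDPOINT"),
        ollama_url=required("OLLAMA_URL"),
        poll_interval_seconds=int(interval) if interval is not None else 300,
    )
```

Failing fast on a missing required key at startup turns a configmap or secret mismatch into one clear error message instead of a crash deep inside a worker.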

Phase 2 - Symbol Registry and Source Management

  • Build symbol registry API endpoints for companies, aliases, watchlists, and sources
  • Add source credibility, retention policy, and access policy fields
  • Add source classes for market data API, news API, filings API, web scrape, and broker adapter
  • Add admin validation for duplicate tickers, invalid URLs, and unsupported source types
  • Add seed data support for an initial tracked watchlist
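The admin validation rules could start as plain functions that return a list of error messages, which the registry API would turn into a 4xx response. This is a sketch under assumptions: market_api, news_api, and filings_api are the source type names used in Phase 17, while web_scrape, broker, and the ticker format rule are hypothetical.

```python
import re
from typing import Iterable, List, Optional
from urllib.parse import urlparse

# market_api, news_api, and filings_api follow the names used in Phase 17;
# web_scrape and broker are hypothetical identifiers for the other source classes.
SUPPORTED_SOURCE_TYPES = frozenset({"market_api", "news_api", "filings_api", "web_scrape", "broker"})
# Assumed ticker rule: 1-5 letters with an optional one-letter class suffix (e.g. BRK.B).
TICKER_RE = re.compile(r"^[A-Z]{1,5}(\.[A-Z])?$")


def validate_new_company(ticker: str, existing_tickers: Iterable[str]) -> List[str]:
    """Reject malformed tickers and case-insensitive duplicates."""
    errors = []
    normalized = ticker.strip().upper()
    if not TICKER_RE.match(normalized):
        errors.append(f"invalid ticker format: {ticker!r}")
    if normalized in {t.strip().upper() for t in existing_tickers}:
        errors.append(f"duplicate ticker: {normalized}")
    return errors


def validate_source(source_type: str, url: Optional[str]) -> List[str]:
    """Reject unknown source types and URLs that are not absolute http(s) URLs."""
    errors = []
    if source_type not in SUPPORTED_SOURCE_TYPES:
        errors.append(f"unsupported source type: {source_type!r}")
    if url is not None:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            errors.append(f"invalid URL: {url!r}")
    return errors
```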

Phase 3 - External API Adapters

  • Implement scheduler for symbol and source polling windows
  • Implement market data API adapter interface
  • Implement first concrete market data provider adapter
  • Implement news API adapter interface
  • Implement first concrete news API provider adapter
  • Implement filings or regulatory adapter interface
  • Implement first concrete filings provider adapter
  • Implement broker API adapter interface for paper trading and order events
  • Implement rate-limit coordination, retries, and backoff across adapters
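One common pattern for adapter retries and rate limiting is capped exponential backoff with full jitter, combined with a token bucket per provider. The sketch below is in-process only; coordinating limits across adapter pods would need the bucket state in shared storage such as Redis. RetryableError, call_with_retries, and TokenBucket are hypothetical names, and the default delays are placeholders.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Raised by an adapter for transient failures such as timeouts or HTTP 429/5xx."""


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  rng: Optional[random.Random] = None) -> float:
    """Full jitter: a uniform delay in [0, min(cap, base * 2**attempt)] seconds."""
    rng = rng or random.Random()
    return rng.uniform(0.0, min(cap, base * (2 ** attempt)))


def call_with_retries(fn: Callable[[], T], max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry only RetryableError; any other exception propagates immediately."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise
            sleep(backoff_delay(attempt))
    raise AssertionError("unreachable")


class TokenBucket:
    """Allows bursts up to capacity, refilling at rate_per_sec tokens per second."""

    def __init__(self, rate_per_sec: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_per_sec
        self.capacity = float(capacity)
        self.tokens = float(capacity)
        self.clock = clock
        self.updated = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Jitter spreads retries from many workers so they do not hit a recovering provider in lockstep; injecting sleep and clock keeps both pieces testable without real waiting.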

Phase 4 - Ingestion Pipeline

  • Implement web scraper worker for curated URLs and article pages
  • Implement canonical URL normalization and content hashing
  • Implement raw artifact upload to MinIO
  • Implement metadata persistence in PostgreSQL for market payloads, documents, and broker events
  • Implement retry and failure tracking for source retrieval
  • Implement dedupe logic across article and filing sources
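A minimal sketch of canonical URL normalization and content hashing using only the standard library. The normalization rules are assumptions, not a standard: lower-case scheme and host, drop default ports, user info, fragments, and a hypothetical list of tracking parameters, sort the remaining query parameters, and strip trailing slashes. The content hash is SHA-256 over whitespace-collapsed text, so formatting-only differences do not defeat dedupe.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical tracking-parameter list; extend it per source as needed.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "utm_term",
                   "utm_content", "gclid", "fbclid"}
DEFAULT_PORTS = {"http": 80, "https": 443}


def canonicalize_url(url: str) -> str:
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()   # .hostname also drops any user:password@
    if ":" in host:                          # re-bracket IPv6 literals
        host = f"[{host}]"
    port = parts.port
    netloc = host if port is None or DEFAULT_PORTS.get(scheme) == port else f"{host}:{port}"
    path = parts.path.rstrip("/") or "/"
    query = urlencode(sorted(
        (k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k.lower() not in TRACKING_PARAMS
    ))
    # The fragment is never sent to the server, so it is dropped.
    return urlunsplit((scheme, netloc, path, query, ""))


def content_hash(text: str) -> str:
    """SHA-256 of the text with all whitespace runs collapsed to single spaces."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

Storing both values lets dedupe catch the same article under two URLs (equal content hash) as well as the same URL fetched twice (equal canonical URL).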

Phase 5 - Parsing and Normalization

  • Implement HTML-to-text parsing pipeline
  • Implement boilerplate reduction and body extraction heuristics
  • Implement parser quality scoring and confidence flags
  • Implement company mention detection using ticker, alias, and name matching
  • Persist normalized text and parser outputs to MinIO and PostgreSQL
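Company mention detection could compile two patterns per company: a case-sensitive one for the ticker or $-cashtag, and a case-insensitive alternation over the name and aliases, ordered longest first so that "Apple Inc." is counted once rather than also as "Apple". Lookarounds are used instead of \b so that terms ending in punctuation still match. The Company shape is a hypothetical stand-in for the registry rows.

```python
import re
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Company:
    ticker: str
    name: str
    aliases: Tuple[str, ...] = ()


def build_patterns(companies: Iterable[Company]) -> Dict[str, Tuple[re.Pattern, re.Pattern]]:
    """One case-sensitive ticker pattern and one case-insensitive name/alias pattern per company."""
    patterns = {}
    for c in companies:
        # "$AAPL" or a standalone "AAPL", not part of a longer word.
        ticker_pat = re.compile(rf"(?<![\w$])\$?{re.escape(c.ticker)}(?!\w)")
        terms = sorted({t for t in (c.name, *c.aliases) if t.strip()}, key=len, reverse=True)
        if not terms:
            raise ValueError(f"{c.ticker}: company needs a non-empty name")
        # Longest term first so "Apple Inc." wins over "Apple" at the same position.
        name_pat = re.compile(
            r"(?<!\w)(?:" + "|".join(re.escape(t) for t in terms) + r")(?!\w)",
            re.IGNORECASE,
        )
        patterns[c.ticker] = (ticker_pat, name_pat)
    return patterns


def detect_mentions(text: str, patterns: Dict[str, Tuple[re.Pattern, re.Pattern]]) -> Dict[str, int]:
    """Count ticker plus name/alias hits per company; companies with no hits are omitted."""
    counts = {}
    for ticker, (ticker_pat, name_pat) in patterns.items():
        n = len(ticker_pat.findall(text)) + len(name_pat.findall(text))
        if n:
            counts[ticker] = n
    return counts
```

Short or dictionary-word tickers such as V, and company names that are also common words, will still produce false positives with pure pattern matching, so these counts probably need to feed into the confidence flags rather than be trusted outright.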

Phase 6 - Ollama Structured Extraction

  • Build extraction prompt templates with anti-hallucination instructions
  • Build JSON schema definitions for document intelligence extraction
  • Implement Ollama client wrapper using structured output format
  • Implement schema validation and semantic validation layers
  • Persist prompts, model metadata, raw outputs, validation reports, and final intelligence objects
  • Add retry behavior for invalid or incomplete model responses
  • Add model performance metrics and dashboards
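Even with Ollama's structured output enabled, the response still needs checking, and the split between schema validation, semantic validation, and retry might look like the sketch below. It assumes a hypothetical generate callable that wraps the Ollama client and returns raw response text, and a deliberately small document-intelligence shape (a summary plus impacts with ticker, sentiment, and an optional evidence_quote); the real shape lives in the shared schemas. Requiring evidence quotes to appear verbatim in the document is one cheap anti-hallucination check.

```python
import json
from typing import Any, Callable, Dict, List, Optional, Set, Tuple


def schema_errors(obj: Any) -> List[str]:
    """Structural checks on a deliberately small, hypothetical intelligence shape."""
    if not isinstance(obj, dict):
        return ["top-level value must be an object"]
    errors = []
    if not isinstance(obj.get("summary"), str):
        errors.append("summary: expected string")
    impacts = obj.get("impacts")
    if not isinstance(impacts, list):
        return errors + ["impacts: expected array"]
    for i, impact in enumerate(impacts):
        if not isinstance(impact, dict) or not isinstance(impact.get("ticker"), str):
            errors.append(f"impacts[{i}].ticker: expected string")
            continue
        sentiment = impact.get("sentiment")
        if isinstance(sentiment, bool) or not isinstance(sentiment, (int, float)):
            errors.append(f"impacts[{i}].sentiment: expected number")
    return errors


def semantic_errors(obj: Dict[str, Any], document_text: str, known_tickers: Set[str]) -> List[str]:
    """Checks that need context: value ranges, registry membership, grounding in the source text."""
    errors = []
    for i, impact in enumerate(obj["impacts"]):
        if impact["ticker"] not in known_tickers:
            errors.append(f"impacts[{i}]: ticker {impact['ticker']} is not in the registry")
        if not -1.0 <= impact["sentiment"] <= 1.0:
            errors.append(f"impacts[{i}]: sentiment outside [-1, 1]")
        quote = impact.get("evidence_quote")
        if quote and quote not in document_text:
            errors.append(f"impacts[{i}]: evidence_quote not found in document")
    return errors


def extract_with_retries(generate: Callable[[str], str], prompt: str, document_text: str,
                         known_tickers: Set[str], max_attempts: int = 3
                         ) -> Tuple[Optional[Dict[str, Any]], List[List[str]]]:
    """Return (intelligence or None, one validation report per attempt)."""
    reports: List[List[str]] = []
    for _ in range(max_attempts):
        raw = generate(prompt)
        try:
            obj = json.loads(raw)
        except json.JSONDecodeError as exc:
            reports.append([f"invalid JSON: {exc.msg}"])
            continue
        # Semantic checks assume valid structure, so they run only after schema checks pass.
        errors = schema_errors(obj) or semantic_errors(obj, document_text, known_tickers)
        reports.append(errors)
        if not errors:
            return obj, reports
    return None, reports
```

Keeping every per-attempt report makes it straightforward to persist the validation reports alongside the raw outputs, as the tasks above require.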

Phase 7 - Aggregation and Trend Engine

  • Implement recency decay and source credibility weighting
  • Integrate market context features into aggregation windows
  • Implement company-level rolling window aggregation
  • Implement contradiction detection and disagreement representation
  • Implement sector and market rollups
  • Implement evidence ranking for supporting and opposing documents
  • Persist trend windows and evidence mappings
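Recency decay and credibility weighting can be combined multiplicatively: each document's weight is its source credibility times 0.5 raised to (age / half-life). The sketch assumes sentiment in [-1, 1], credibility in [0, 1], and a placeholder 24-hour half-life. It also shows one possible contradiction measure, 1 - |Σ w·s| / Σ w·|s|, which is 0 when all weighted signals agree in sign and approaches 1 as bullish and bearish evidence cancel out; this is an illustrative choice, not a defined project metric.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Signal:
    sentiment: float        # -1.0 (bearish) .. 1.0 (bullish)
    credibility: float      # 0.0 .. 1.0, from the source registry
    published_at: datetime  # same timezone convention as `now`


def decay_weight(age: timedelta, half_life: timedelta) -> float:
    """0.5 ** (age / half_life); future-dated items are clamped to age 0."""
    return 0.5 ** (max(age.total_seconds(), 0.0) / half_life.total_seconds())


def aggregate(signals: Iterable[Signal], now: datetime,
              half_life: timedelta = timedelta(hours=24)) -> Optional[Tuple[float, float]]:
    """Return (weighted sentiment, contradiction) for one window, or None if all weights are zero."""
    total = signed = unsigned = 0.0
    for s in signals:
        w = s.credibility * decay_weight(now - s.published_at, half_life)
        total += w
        signed += w * s.sentiment
        unsigned += w * abs(s.sentiment)
    if total == 0.0:
        return None
    contradiction = 1.0 - abs(signed) / unsigned if unsigned > 0.0 else 0.0
    return signed / total, contradiction
```

For example, a fresh +0.8 article and a day-old -0.8 article from equally credible sources get weights 1.0 and 0.5, giving a weighted sentiment of 0.4 / 1.5 ≈ 0.27 and a contradiction of 1 - 0.4 / 1.2 ≈ 0.67.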

Phase 8 - Recommendation Engine

  • Design deterministic recommendation eligibility logic
  • Implement recommendation generation from aggregated scores and evidence
  • Add optional LLM wording layer for thesis generation only
  • Persist recommendation objects and evidence citations
  • Add suppression logic for low-quality data or low confidence
  • Publish prediction facts to analytical tables
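Deterministic eligibility and suppression can be a pure function from an aggregated trend snapshot to an action, so the optional LLM wording layer only phrases a decision it cannot change. The field names, threshold values, and the buy/sell/hold vocabulary below are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TrendSnapshot:
    ticker: str
    direction: str             # "up", "down", or "flat"
    strength: float            # 0..1
    confidence: float          # 0..1
    contradiction: float       # 0..1
    document_count: int
    mean_parse_quality: float  # 0..1


@dataclass(frozen=True)
class Thresholds:
    # Placeholder values; tune against the replay dataset before relying on them.
    min_confidence: float = 0.6
    min_strength: float = 0.3
    max_contradiction: float = 0.4
    min_documents: int = 3
    min_parse_quality: float = 0.5


@dataclass(frozen=True)
class Decision:
    ticker: str
    action: str                    # "buy", "sell", or "hold"
    suppressed_by: Tuple[str, ...]


def decide(t: TrendSnapshot, th: Thresholds = Thresholds()) -> Decision:
    """Pure function of its inputs: the same snapshot and thresholds always give the same decision."""
    if t.direction not in ("up", "down", "flat"):
        raise ValueError(f"unknown direction: {t.direction!r}")
    reasons = []
    if t.document_count < th.min_documents:
        reasons.append("insufficient evidence")
    if t.mean_parse_quality < th.min_parse_quality:
        reasons.append("low data quality")
    if t.confidence < th.min_confidence:
        reasons.append("low confidence")
    if t.contradiction > th.max_contradiction:
        reasons.append("sources disagree")
    if reasons or t.direction == "flat" or t.strength < th.min_strength:
        return Decision(t.ticker, "hold", tuple(reasons))
    return Decision(t.ticker, "buy" if t.direction == "up" else "sell", ())
```

Recording every suppression reason, not just the first, gives the evidence drill-down a complete explanation of why no trade was proposed.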

Phase 9 - Risk Engine and Trade Adapter

  • Implement portfolio and account risk configuration model
  • Implement hard blocks for max position size, sector exposure, daily loss limits, and news-shock lockouts
  • Implement paper trading adapter behavior and state sync
  • Integrate first broker API in sandbox mode
  • Implement idempotent order submission keys and duplicate prevention
  • Implement full execution audit trail
  • Add operator approval workflow for live trading mode
  • Publish order, fill, and position facts to analytical tables
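The hard blocks and idempotent submission keys could be sketched as below. Everything here is hypothetical: the data shapes, the assumption that only buys add exposure (no short selling), and the in-memory SubmissionLog, which stands in for a UNIQUE constraint on an idempotency-key column in PostgreSQL. Unknown sector data fails closed rather than skipping the check.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)
class OrderIntent:
    recommendation_id: str
    ticker: str
    side: str           # "buy" or "sell"
    quantity: int
    limit_price: float


@dataclass(frozen=True)
class RiskLimits:
    max_position_value: float    # per ticker, in account currency
    max_sector_exposure: float   # fraction of portfolio value, 0..1
    daily_loss_limit: float      # positive amount, e.g. 1000.0


@dataclass
class PortfolioState:
    portfolio_value: float
    position_values: Dict[str, float]   # ticker -> current market value
    sector_of: Dict[str, str]           # ticker -> sector
    realized_pnl_today: float
    locked_tickers: Set[str] = field(default_factory=set)  # active news-shock lockouts


def hard_blocks(intent: OrderIntent, limits: RiskLimits, state: PortfolioState) -> List[str]:
    """Return every violated hard limit; an empty list means the order may proceed."""
    blocks = []
    if intent.ticker in state.locked_tickers:
        blocks.append("news-shock lockout active")
    if state.realized_pnl_today <= -limits.daily_loss_limit:
        blocks.append("daily loss limit reached")
    if intent.side == "buy":  # assuming no short selling, only buys add exposure
        order_value = intent.quantity * intent.limit_price
        if state.position_values.get(intent.ticker, 0.0) + order_value > limits.max_position_value:
            blocks.append("max position size exceeded")
        sector = state.sector_of.get(intent.ticker)
        if sector is None:
            blocks.append("unknown sector")  # fail closed rather than skip the check
        else:
            exposure = order_value + sum(
                value for ticker, value in state.position_values.items()
                if state.sector_of.get(ticker) == sector
            )
            if exposure > limits.max_sector_exposure * state.portfolio_value:
                blocks.append("sector exposure limit exceeded")
    return blocks


def idempotency_key(intent: OrderIntent) -> str:
    """Deterministic key; the price is left out so a re-priced retry still dedupes."""
    raw = f"{intent.recommendation_id}|{intent.ticker}|{intent.side}|{intent.quantity}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class SubmissionLog:
    """In-memory stand-in for a UNIQUE(idempotency_key) column in PostgreSQL."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()

    def try_submit(self, intent: OrderIntent) -> bool:
        key = idempotency_key(intent)
        if key in self._seen:
            return False  # duplicate: do not call the broker again
        self._seen.add(key)
        return True
```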

Phase 10 - Lakehouse and SQL Analytics

  • Define analytical fact tables for bars, documents, extractions, signals, orders, fills, positions, and PnL
  • Implement Parquet writers for analytical datasets
  • Implement Hive-compatible partition layout conventions on MinIO
  • Implement Iceberg table creation and metadata management for analytical datasets
  • Implement lake publisher jobs from operational data into analytical fact tables
  • Configure Trino catalogs for Hive and/or Iceberg access to MinIO
  • Add example SQL views for prediction-vs-outcome and paper-trade scorecards
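Hive-compatible layout means one column=value directory level per partition column. If the Parquet writers use pyarrow, pyarrow.parquet.write_to_dataset with partition_cols produces this layout directly; the hypothetical helper below is only for jobs that build object keys themselves. For Iceberg tables the file layout is tracked in table metadata, so this convention mainly matters for the Hive-catalog datasets.

```python
from datetime import date
from typing import Mapping, Union
from urllib.parse import quote


def lake_object_key(table: str, partitions: Mapping[str, Union[str, int, date]], file_name: str) -> str:
    """Object key inside the lake bucket, e.g. documents/dt=2024-01-02/ticker=AAPL/part-00000.parquet.

    Partition columns appear in mapping order, so callers must always pass them in the
    same order, matching the order declared for the table."""
    segments = []
    for column, value in partitions.items():
        text = value.isoformat() if isinstance(value, date) else str(value)
        # Percent-encode so "/" or "=" in a value cannot create extra path levels.
        # This approximates, but is not identical to, Hive's own escaping rules.
        segments.append(f"{column}={quote(text, safe='')}")
    return "/".join([table, *segments, file_name])
```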

Phase 11 - Query API and Dashboard

  • Build APIs for companies, document timelines, trend summaries, recommendations, and order history
  • Build evidence drill-down view linking recommendations to source documents and raw artifacts
  • Build admin controls for source health, symbol configs, and trading mode
  • Build operational dashboard for ingestion throughput, model failures, and source coverage gaps
  • Build Superset starter dashboards for symbol overview, sentiment heatmap, PnL, and prediction accuracy

Phase 12 - Observability and Hardening

  • Add structured logs and distributed tracing across services
  • Add Prometheus metrics for ingestion, parsing, extraction, aggregation, lake publication, and trading
  • Add alerting for source failures, schema failure spikes, analytical lag, and broker issues
  • Add dead-letter queues and replay tooling
  • Add data retention and lifecycle controls for raw and derived artifacts
  • Add security review for secrets, network policies, trading isolation, and dashboard access control
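Dead-letter queue semantics can be sketched independently of Redis: a failed message is requeued with an attempt counter until it hits a limit, then parked in a dead-letter queue with its last error, and replay tooling moves selected dead letters back with a fresh count. The in-memory deques below are a hypothetical stand-in for two Redis lists, and the message envelope ({"payload", "attempts", "last_error"}) is an assumed format.

```python
import json
from collections import deque
from dataclasses import dataclass, field
from typing import Any, Callable, Deque, Dict


@dataclass
class QueuePair:
    """In-memory stand-in for a work queue and its dead-letter queue (e.g. two Redis lists)."""
    main: Deque[str] = field(default_factory=deque)
    dead: Deque[str] = field(default_factory=deque)


def enqueue(q: QueuePair, payload: Dict[str, Any]) -> None:
    q.main.append(json.dumps({"payload": payload, "attempts": 0}))


def process_one(q: QueuePair, handler: Callable[[Dict[str, Any]], None], max_attempts: int = 3) -> bool:
    """Handle one message; requeue on failure, dead-letter after max_attempts. False if idle."""
    if not q.main:
        return False
    msg = json.loads(q.main.popleft())
    try:
        handler(msg["payload"])
    except Exception as exc:  # any handler failure counts as an attempt
        msg["attempts"] += 1
        msg["last_error"] = repr(exc)
        (q.dead if msg["attempts"] >= max_attempts else q.main).append(json.dumps(msg))
    return True


def replay_dead_letters(q: QueuePair, predicate: Callable[[Dict[str, Any]], bool] = lambda m: True) -> int:
    """Move matching dead letters back to the main queue with a fresh attempt count."""
    kept: Deque[str] = deque()
    replayed = 0
    while q.dead:
        raw = q.dead.popleft()
        msg = json.loads(raw)
        if predicate(msg):
            msg["attempts"] = 0
            msg.pop("last_error", None)
            q.main.append(json.dumps(msg))
            replayed += 1
        else:
            kept.append(raw)
    q.dead = kept
    return replayed
```

The predicate lets an operator replay only a subset, for example messages whose last_error mentions a parser bug that has since been fixed.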

Phase 13 - Verification and Rollout

  • Create replay dataset from archived documents for deterministic extraction testing
  • Create integration tests for the full ingest-to-recommendation flow
  • Create paper trading simulation scenarios
  • Validate fail-closed behavior for broker outages and ambiguous order states
  • Validate lake publication and Trino query correctness over partitioned MinIO datasets
  • Run shadow mode (moved to Phase 15.6, post-deployment)
  • Prepare operator runbook (moved to Phase 15.7, post-deployment)

Phase 14 - Local Docker Build Validation

  • 14. Build and validate all Docker containers locally
  • 14.1 Build all 11 service containers locally using the Makefile
    • Run make build to build scheduler, symbol-registry, ingestion, parser, extractor, aggregation, recommendation, risk, broker-adapter, lake-publisher, and query-api images
    • Fix any build failures (missing dependencies, import errors, syntax issues)
    • Requirements: N1, 12.1
  • 14.2 Validate schema and logic consistency across all services
    • Run the full test suite with pytest tests/ -x --tb=short -q to catch import errors, schema mismatches, and logic inconsistencies
    • Verify all shared schemas in services/shared/schemas.py are consistent with what each service expects
    • Verify config loader fields match the configmap and secrets definitions
    • Fix any mismatches found between services, schemas, migrations, and K8s manifests
    • Requirements: 5.2, 5.3, 9.2, N2
  • 14.3 Verify each container starts without immediate crash
    • Run each built image with docker run --rm and a quick health check or --help flag to confirm the entrypoint resolves
    • Fix any runtime import errors or missing module paths
    • Requirements: N1

Phase 15 - CI Validation, Helm Deployment, and Cluster Rollout

  • 15. Commit, push, validate CI, create Helm chart, and deploy to cluster
  • 15.1 Commit and push code to GitHub
    • Configure git with SSH key for the private repo
    • Commit all current changes with the message "phase 14-15: docker build validation and helm deployment"
    • Push to main branch
    • Requirements: N1
  • 15.2 Validate GitHub Actions workflow builds containers
    • Monitor the GitHub Actions run to confirm lint-and-test and build-services jobs succeed
    • Fix any CI failures and re-push if needed
    • Requirements: N1
  • 15.3 Create Helm chart for stonks-oracle deployment
    • Create infra/helm/stonks-oracle/Chart.yaml with chart metadata
    • Create infra/helm/stonks-oracle/values.yaml with configurable image tags, replica counts, resource limits, and environment references
    • Create Helm templates for all deployments, services, configmap, secrets, ingress, and network policies from existing K8s manifests
    • Add imagePullSecrets configuration for GHCR private registry access
    • Add a template for a Kubernetes Secret of type kubernetes.io/dockerconfigjson for GHCR authentication
    • Requirements: N1, 8.2
  • 15.4 Configure GHCR image pull authentication on the cluster
    • Create a docker-registry secret in the stonks-oracle namespace with GHCR credentials (a GitHub personal access token with the read:packages scope; SSH deploy keys cannot authenticate to GHCR)
    • Reference the imagePullSecret in all deployment specs via the Helm values
    • Requirements: 8.2, N1
  • 15.5 Deploy stonks-oracle to the cluster via Helm
    • Run helm install or helm upgrade --install targeting the stonks-oracle namespace
    • Verify all pods reach Running/Ready state
    • Verify services and ingress endpoints are reachable
    • Debug and fix any deployment issues (CrashLoopBackOff, image pull errors, config mismatches)
    • Requirements: N1, 12.1
  • 15.6 Run shadow mode before enabling any live execution
    • Confirm all services are running and processing in paper-only mode
    • Validate end-to-end data flow from ingestion through recommendation without live trades
    • Requirements: N5, 8.1
  • 15.7 Prepare operator runbook and incident response procedures
    • Document service restart procedures, log access, and common failure modes
    • Document how to toggle trading modes and approve live execution
    • Requirements: 8.2, 12.1

Phase 17 - First Vertical Slice: Live Pipeline End-to-End

  • 17. Activate the full data pipeline for a set of tracked symbols

  • 17.1 Seed initial symbols and configure sources via the dashboard

    • Use the dashboard Companies page to add 5-10 symbols (e.g. AAPL, MSFT, GOOGL, AMZN, TSLA, NVDA, META, JPM, V, UNH)
    • For each company, add sources via the Company Detail → Sources tab: one market_api source (Polygon), one news_api source, one filings_api source
    • Configure source config JSON with the correct Polygon endpoint patterns per ticker
    • Verify companies and sources appear in the dashboard and via curl https://stonks-registry.celestium.life/companies
    • Requirements: 1.1, 1.2, 1.3, 2.1
  • 17.2 Wire the scheduler to enqueue ingestion jobs for active sources

    • Verify the scheduler service reads active companies and sources from PostgreSQL
    • Verify it enqueues Redis jobs for each source on its polling interval
    • Check scheduler logs: kubectl logs -n stonks-oracle deployment/scheduler --tail=50
    • Confirm ingestion queue has items: check Redis keys via kubectl exec
    • Fix any issues with the scheduler → source → Redis queue flow
    • Requirements: 3.1, 3.2
  • 17.3 Validate ingestion workers pull data from Polygon and persist to MinIO/PostgreSQL

    • Check ingestion worker logs for successful API calls to Polygon
    • Verify raw market data artifacts land in MinIO stonks-raw-market bucket
    • Verify document metadata rows appear in PostgreSQL documents table
    • Verify ingestion_runs table shows completed runs with items_fetched > 0
    • Debug and fix any adapter errors (auth, rate limits, response parsing)
    • Requirements: 4.1, 4.2, 4.3
  • 17.4 Validate parser normalizes documents and extractor produces intelligence

    • Check parser worker logs for document processing
    • Verify normalized text appears in MinIO stonks-normalized bucket
    • Verify parse_quality_score and parse_confidence are set on documents
    • Check extractor worker logs for Ollama calls
    • Verify document_intelligence rows appear with summaries, sentiments, and impact records
    • Verify document_impact_records link intelligence to companies
    • Debug any Ollama connection or schema validation issues
    • Requirements: 5.1, 5.2, 6.1, 6.2, 6.3
  • 17.5 Validate aggregation produces trend summaries

    • Check aggregation worker logs for trend window generation
    • Verify trend_windows table has entries with direction, strength, confidence
    • Verify trend_evidence table links trends to contributing documents
    • Verify contradiction scores are computed
    • Check the dashboard Trends page shows trend cards with real data
    • Requirements: 7.1, 7.2, 7.3
  • 17.6 Validate recommendation engine produces paper-trade recommendations

    • Check recommendation worker logs for recommendation generation
    • Verify recommendations table has entries with action, confidence, thesis
    • Verify recommendation_evidence links recommendations to documents
    • Verify risk evaluation runs and risk_evaluations table has entries
    • Check the dashboard Recommendations page shows real recommendations
    • Requirements: 8.1, 8.2, 8.3
  • 17.7 Validate lake publisher writes analytical facts to MinIO/Trino

    • Check lake-publisher worker logs for Parquet writes
    • Verify MinIO stonks-lakehouse bucket has partitioned Parquet files
    • Verify Trino can query the lakehouse tables via the SQL Explorer page
    • Test a sample query: SELECT * FROM lakehouse.stonks.documents LIMIT 10
    • Requirements: 10.1, 10.2, 10.3
  • 17.8 Validate dashboard shows live data across all pages

    • Verify Home page shows non-zero metrics (active companies, documents, recommendations)
    • Verify Companies page lists seeded companies with source counts
    • Verify Documents page shows ingested documents with parse quality badges
    • Verify Trends page shows trend cards with real direction/strength
    • Verify Recommendations page shows generated recommendations
    • Verify Ops Pipeline page shows document stage counts
    • Verify Ops Ingestion page shows throughput data
    • Take screenshots or curl key endpoints to confirm
    • Requirements: 13.1, 13.2, 13.3, 13.4, 13.6

Phase 16 - Web Dashboard Frontend

  • 16. Build React web dashboard for full platform control and analytics

  • 16.1 Scaffold React project with Vite, TypeScript, Tailwind, and routing

    • Initialize frontend/ directory with npm create vite@latest using the React-TS template
    • Install core dependencies: @tanstack/react-router, @tanstack/react-query, tailwindcss, recharts, @monaco-editor/react, lucide-react
    • Configure Tailwind with a dark-mode-friendly color palette
    • Create the app shell with sidebar navigation layout and route definitions for all sections
    • Set up TanStack Query provider with default stale/cache times
    • Create a shared API client module that targets the Query API, Symbol Registry, and Risk Engine base URLs (configurable via env vars)
    • Requirements: 13.1
  • 16.2 Build the API client layer and shared components

    • Create typed API hooks using TanStack Query for each API domain: companies, documents, trends, recommendations, orders, positions, admin/trading, admin/sources, ops endpoints
    • Build reusable UI components: DataTable (sortable, filterable, paginated), StatusBadge, ConfidenceBar, TrendArrow, DateRangeSelector, TickerFilter, LoadingSpinner, ErrorBoundary
    • Build a shared layout component with sidebar, breadcrumbs, and top-bar health indicator
    • Requirements: 13.1, 13.2
  • 16.3 Implement company and source management pages

    • Build /companies list page with searchable, sortable table showing ticker, name, sector, active status, source count
    • Build /companies/:id detail page with editable fields (sector, industry, market cap, active toggle), tabs for aliases, sources, and document history
    • Build source add/edit form with source type selector, config JSON editor, credibility slider, retention days, access policy dropdown
    • Build /companies/:id/aliases management with add/delete alias forms
    • Wire watchlist CRUD pages at /watchlists with member management
    • Requirements: 13.2, 1.1, 1.2, 1.3
  • 16.4 Implement document timeline and intelligence drill-down pages

    • Build /documents list page with filterable timeline: title, type, source, ticker mentions, published date, parse quality badge, extraction status
    • Build /documents/:id detail page showing full intelligence extraction, company impacts with sentiment/score, key facts, risks, macro themes, and links to raw MinIO artifacts
    • Add evidence chain visualization showing document → intelligence → impact records
    • Requirements: 13.3, 11.1, 11.2
  • 16.5 Implement trend summary and evidence chain pages

    • Build /trends list page with company trend cards showing direction indicator, strength bar, confidence score, contradiction score, and window selector
    • Build /trends/:id detail page with full evidence drill-down: contributing documents, intelligence objects, rank scores, weight breakdowns
    • Add expandable evidence list on trend cards for quick preview
    • Requirements: 13.3, 6.5, 10.4
  • 16.6 Implement recommendation review and order tracking pages

    • Build /recommendations list page with filterable table: ticker, action badge, mode, confidence bar, thesis preview, timestamp
    • Build /recommendations/:id detail page with full evidence drill-down, risk evaluation display, and linked orders
    • Build /orders list page with status badges, fill info, and expandable audit trail
    • Build /orders/:id detail page with decision trace, order events timeline, and full audit trail
    • Build /positions page with current positions table showing unrealized/realized PnL, entry/current prices
    • Requirements: 13.4, 11.1, 11.2, 11.3
  • 16.7 Implement trading controls and risk management pages

    • Build /trading page with trading mode toggle (paper/live/disabled) with confirmation dialog
    • Build pending approvals queue with approve/reject buttons and review note input
    • Build risk configuration editor form for max position size, daily loss cap, sector exposure, cooldown periods
    • Build active lockouts display with type, reason, and expiration countdown
    • Requirements: 13.5, 8.1, 8.2
  • 16.8 Implement DevOps monitoring dashboards

    • Build /ops/pipeline page with pipeline health summary: document stage counts, parsing quality distribution, extraction validation rates, trend generation stats
    • Build /ops/ingestion page with time-series charts (Recharts) for ingestion throughput, success/failure rates by source type, configurable time bucket selector
    • Build /ops/model page with model performance metrics: success rate gauge, latency percentile chart, retry rate, confidence distribution histogram, recent failures table
    • Build /ops/coverage page with company × source type coverage matrix, stale source indicators, and coverage gap alerts
    • Requirements: 13.6, 12.1, 12.2, 12.3
  • 16.9 Implement SQL query explorer (Athena-like)

    • Add a /api/analytics/query proxy endpoint to the Query API that forwards SQL to Trino, enforces row limits, and returns structured {columns, rows, row_count, elapsed_ms} results
    • Add a /api/analytics/schema endpoint that returns Trino catalog/schema/table/column metadata for the schema browser
    • Build /analytics/query page with Monaco Editor SQL input, schema browser sidebar, execute button, and results table with virtual scrolling
    • Add chart builder panel: select chart type (line, bar, scatter, pie, heatmap), map result columns to axes, render via Recharts
    • Add saved queries: persist to PostgreSQL via a new /api/analytics/saved-queries CRUD endpoint, display saved query list with load/delete
    • Requirements: 13.7, 10.1, 10.3
  • 16.10 Implement pre-built analytical dashboards (QuickSight-like)

    • Build /analytics/dashboards gallery page listing available dashboards with preview thumbnails
    • Build Symbol Overview dashboard: company card grid with trend direction, latest recommendation, position status, sourced from API data
    • Build Sentiment Heatmap dashboard: sector × time matrix colored by aggregated sentiment, sourced from Trino query
    • Build Prediction Accuracy dashboard: scatter plot of predicted confidence vs realized price move, sourced from Trino prediction_vs_outcome table
    • Build Paper Trading PnL dashboard: equity curve line chart, daily PnL bars, win rate metrics, sourced from Trino pnl_daily table
    • Build Model Quality dashboard: extraction success rate over time, latency distribution, retry rate charts
    • Add date range selector and ticker filter controls shared across all dashboards
    • Requirements: 13.8, 10.2, 10.4
  • 16.11 Build home/overview page

    • Build / home page with system health summary card, recent activity feed, key metrics (active companies, documents today, recommendations today, pipeline status)
    • Add quick-nav cards linking to each major section
    • Add alert banner for critical issues (source failures, pipeline bottlenecks)
    • Requirements: 13.1, 13.6
  • 16.12 Add Dockerfile, CI build, Helm template, and deploy to cluster

    • Create frontend/Dockerfile using multi-stage build: node for build, nginx for serve
    • Add dashboard service to .github/workflows/build.yml matrix
    • Add dashboard deployment, service, and ingress to Helm chart values and templates
    • Configure ingress at stonks.celestium.life (or similar)
    • Build, push, and deploy to cluster
    • Verify dashboard is accessible and all pages load data from the APIs
    • Requirements: 13.1, N1
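For the 16.9 query proxy, one way to enforce row limits without rewriting the user's SQL is to fetch at most max_rows + 1 rows through a PEP 249 (DB-API) cursor and report whether the result was truncated. The sketch assumes a DB-API connection (the trino Python client exposes one through trino.dbapi.connect), a hypothetical truncated field added to the {columns, rows, row_count, elapsed_ms} response, and a crude read-only guard. Whether closing the cursor early also cancels the query inside Trino depends on the client, so a server-side limit such as a maximum execution time is still worth configuring, along with a read-only Trino user.

```python
import re
import time
from typing import Any, Dict, List, Sequence

# Conservative deny-list: it can reject harmless queries (e.g. a column named "update"),
# so it should complement, not replace, a read-only Trino user.
_WRITE_KEYWORDS = re.compile(
    r"\b(insert|update|delete|merge|drop|create|alter|truncate|grant|revoke|call)\b",
    re.IGNORECASE,
)


def check_read_only(sql: str) -> str:
    """Return the single statement without a trailing semicolon, or raise ValueError."""
    stmt = sql.strip().rstrip(";").strip()
    if ";" in stmt:
        raise ValueError("only one statement is allowed")
    if not re.match(r"(select|with)\b", stmt, re.IGNORECASE):
        raise ValueError("only SELECT or WITH queries are allowed")
    if _WRITE_KEYWORDS.search(stmt):
        raise ValueError("write and DDL keywords are not allowed")
    return stmt


def shape_result(description: Sequence[Sequence[Any]], rows: List[Sequence[Any]],
                 max_rows: int, elapsed_ms: int) -> Dict[str, Any]:
    """Build the {columns, rows, row_count, elapsed_ms} payload, plus a truncated flag."""
    kept = [list(r) for r in rows[:max_rows]]
    return {
        "columns": [d[0] for d in description],
        "rows": kept,
        "row_count": len(kept),
        "truncated": len(rows) > max_rows,
        "elapsed_ms": elapsed_ms,
    }


def run_capped_query(conn: Any, sql: str, max_rows: int = 1000) -> Dict[str, Any]:
    """Run a read-only query on a PEP 249 connection, fetching at most max_rows + 1 rows."""
    stmt = check_read_only(sql)
    start = time.monotonic()
    cur = conn.cursor()
    try:
        cur.execute(stmt)
        rows = cur.fetchmany(max_rows + 1)  # the extra row only signals truncation
        description = cur.description
    finally:
        cur.close()
    return shape_result(description, rows, max_rows, round((time.monotonic() - start) * 1000))
```

Capping at fetch time rather than wrapping the query in an outer LIMIT keeps the user's ORDER BY semantics intact.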