admin/stonks-oracle

Celes Renata 4f2f113cda phase 17: fix text[]/varchar[] type mismatch in coverage-gaps SQL

2026-04-12 04:15:00 -07:00

Stonks Oracle - Tasks

Phase 0 - Project Setup

Create repository structure for services, shared schemas, infrastructure, lakehouse, and dashboards
Choose implementation language for services (Python preferred for scraping/LLM workflows)
Add local development stack with MinIO, PostgreSQL, Redis, Ollama, Trino, and Superset
Add Kubernetes manifests or Helm chart skeletons for all core components
Add CI pipeline for linting, tests, container builds, schema checks, and lake dataset validation

Phase 1 - Core Data and Infrastructure

Create PostgreSQL schema migrations for companies, watchlists, sources, documents, document intelligence, trends, recommendations, orders, positions, and audit records
Create MinIO bucket provisioning and lifecycle policies
Create Redis key conventions and queue abstractions
Implement shared config loader for environment variables and secrets
Implement shared typed JSON schemas for document intelligence, trend summaries, and recommendations
Stand up initial Trino catalog configuration for MinIO-backed datasets
Stand up Superset with environment-backed datasource configuration
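
The shared config loader item above could be sketched roughly as follows. This is a minimal illustration, not the project's loader: the `Settings` fields and the environment variable names (`POSTGRES_DSN`, `REDIS_URL`, `MINIO_ENDPOINT`, `OLLAMA_URL`) are assumptions chosen to match the components in the local development stack.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    # Hypothetical fields; the real loader may expose different ones.
    postgres_dsn: str
    redis_url: str
    minio_endpoint: str
    ollama_url: str


# Maps each Settings field to the (assumed) environment variable that feeds it.
REQUIRED_VARS = {
    "postgres_dsn": "POSTGRES_DSN",
    "redis_url": "REDIS_URL",
    "minio_endpoint": "MINIO_ENDPOINT",
    "ollama_url": "OLLAMA_URL",
}


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping (default: os.environ), failing fast on gaps."""
    env = os.environ if env is None else env
    missing = sorted(var for var in REQUIRED_VARS.values() if not env.get(var))
    if missing:
        # Report every missing variable at once so a service fails with one clear error.
        raise RuntimeError("missing required settings: " + ", ".join(missing))
    return Settings(**{field: env[var] for field, var in REQUIRED_VARS.items()})
```

Passing the mapping explicitly keeps the loader testable without mutating the process environment.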

Phase 2 - Symbol Registry and Source Management

Build symbol registry API endpoints for companies, aliases, watchlists, and sources
Add source credibility, retention policy, and access policy fields
Add source classes for market data API, news API, filings API, web scrape, and broker adapter
Add admin validation for duplicate tickers, invalid URLs, and unsupported source types
Add seed data support for an initial tracked watchlist

Phase 3 - External API Adapters

Implement scheduler for symbol and source polling windows
Implement market data API adapter interface
Implement first concrete market data provider adapter
Implement news API adapter interface
Implement first concrete news API provider adapter
Implement filings or regulatory adapter interface
Implement first concrete filings provider adapter
Implement broker API adapter interface for paper trading and order events
Implement rate-limit coordination, retries, and backoff across adapters
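
One way to approach the retries and backoff item is exponential backoff with full jitter. The sketch below is an assumption about shape only: `backoff_delays` and `call_with_retries` are hypothetical helper names, and cross-adapter rate-limit coordination (for example through Redis) is left out.

```python
import random
import time


def backoff_delays(max_attempts, base=0.5, cap=30.0, rng=random.random):
    """Yield one full-jitter delay per retry: uniform in [0, min(cap, base * 2**n))."""
    for attempt in range(max_attempts - 1):
        yield rng() * min(cap, base * 2 ** attempt)


def call_with_retries(fn, max_attempts=5, retry_on=(Exception,), sleep=time.sleep, **backoff):
    """Call fn() until it succeeds or max_attempts calls have failed."""
    delays = backoff_delays(max_attempts, **backoff)
    while True:
        try:
            return fn()
        except retry_on:
            delay = next(delays, None)
            if delay is None:
                # Retry budget exhausted: re-raise the last error to the caller.
                raise
            sleep(delay)
```

An adapter would typically narrow `retry_on` to transient errors (timeouts, HTTP 429 or 5xx) so that authentication failures surface immediately.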

Phase 4 - Ingestion Pipeline

Implement web scraper worker for curated URLs and article pages
Implement canonical URL normalization and content hashing
Implement raw artifact upload to MinIO
Implement metadata persistence in PostgreSQL for market payloads, documents, and broker events
Implement retry and failure tracking for source retrieval
Implement dedupe logic across article and filing sources
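
Canonical URL normalization and content hashing might look like the sketch below. The specific rules (dropping fragments, default ports, `utm_*`, `fbclid`, and `gclid` parameters, and trailing slashes) are illustrative assumptions, not the project's actual policy.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed list of tracking parameters to strip before comparing URLs.
TRACKING_PARAMS = {"fbclid", "gclid"}
TRACKING_PREFIXES = ("utm_",)
DEFAULT_PORTS = {"http": 80, "https": 443}


def canonicalize_url(url: str) -> str:
    """Lowercase scheme/host, drop default port, fragment, and tracking params; sort the query."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()  # note: any userinfo is dropped here
    if parts.port and parts.port != DEFAULT_PORTS.get(scheme):
        host = f"{host}:{parts.port}"
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS and not key.startswith(TRACKING_PREFIXES)
    )
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((scheme, host, path, urlencode(query), ""))


def content_hash(text: str) -> str:
    """SHA-256 over whitespace-collapsed text, so trivial reflows hash identically."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

Dedupe can then key on the canonical URL first and fall back to the content hash when the same article appears under different URLs.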

Phase 5 - Parsing and Normalization

Implement HTML-to-text parsing pipeline
Implement boilerplate reduction and body extraction heuristics
Implement parser quality scoring and confidence flags
Implement company mention detection using ticker, alias, and name matching
Persist normalized text and parser outputs to MinIO and PostgreSQL
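
Company mention detection by ticker, alias, and name could start from plain regular expressions, as in this sketch. The matching policy is an assumption: tickers match case-sensitively (optionally with a `$` prefix) to avoid colliding with ordinary words, while names and aliases match case-insensitively. `build_matcher` and `find_mentions` are hypothetical names.

```python
import re


def build_matcher(companies):
    """companies maps ticker -> list of names/aliases; returns compiled patterns per ticker."""
    patterns = {}
    for ticker, names in companies.items():
        ticker_re = re.compile(rf"(?<![A-Za-z0-9])\$?{re.escape(ticker)}(?![A-Za-z0-9])")
        # Longest names first so "Apple Inc." wins over "Apple" at the same position.
        ordered = sorted(names, key=len, reverse=True)
        name_re = None
        if ordered:
            alternatives = "|".join(re.escape(name) for name in ordered)
            name_re = re.compile(rf"(?<!\w)(?:{alternatives})(?!\w)", re.IGNORECASE)
        patterns[ticker] = (ticker_re, name_re)
    return patterns


def find_mentions(text, patterns):
    """Return {ticker: mention_count} for every company mentioned at least once."""
    counts = {}
    for ticker, (ticker_re, name_re) in patterns.items():
        total = len(ticker_re.findall(text))
        if name_re is not None:
            total += len(name_re.findall(text))
        if total:
            counts[ticker] = total
    return counts
```

Single-letter tickers such as V remain ambiguous under any regex policy, so a real pipeline would likely combine this with parser confidence flags.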

Phase 6 - Ollama Structured Extraction

Build extraction prompt templates with anti-hallucination instructions
Build JSON schema definitions for document intelligence extraction
Implement Ollama client wrapper using structured output format
Implement schema validation and semantic validation layers
Persist prompts, model metadata, raw outputs, validation reports, and final intelligence objects
Add retry behavior for invalid or incomplete model responses
Add model performance metrics and dashboards
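
The validation and retry items could fit together roughly as shown here. This is a sketch under stated assumptions: the three fields (`summary`, `sentiment`, `confidence`) and the allowed sentiment labels are placeholders for the real document intelligence schema, and `generate` is any caller-supplied function that sends a prompt to the model and returns its raw text, so no particular Ollama client API is assumed.

```python
import json

# Placeholder schema: field name -> accepted Python type(s).
REQUIRED_FIELDS = {"summary": str, "sentiment": str, "confidence": (int, float)}
ALLOWED_SENTIMENTS = {"positive", "negative", "neutral"}


def validate_intelligence(raw: str):
    """Return (obj, errors): structural checks first, then semantic checks."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc.msg}"]
    if not isinstance(obj, dict):
        return None, ["top-level value must be an object"]
    errors = [
        f"missing or mistyped field: {name}"
        for name, expected in REQUIRED_FIELDS.items()
        if not isinstance(obj.get(name), expected)
    ]
    if not errors:
        if obj["sentiment"] not in ALLOWED_SENTIMENTS:
            errors.append(f"unknown sentiment: {obj['sentiment']}")
        if not 0.0 <= obj["confidence"] <= 1.0:
            errors.append("confidence must be within [0, 1]")
    return (None if errors else obj), errors


def extract_with_retries(generate, prompt, max_attempts=3):
    """Call generate(prompt) until a response validates; keep every attempt for auditing."""
    attempts = []
    for _ in range(max_attempts):
        raw = generate(prompt)
        obj, errors = validate_intelligence(raw)
        attempts.append({"raw": raw, "errors": errors})
        if obj is not None:
            return obj, attempts
    return None, attempts
```

Returning the full attempt list makes it straightforward to persist raw outputs and validation reports alongside the final object.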

Phase 7 - Aggregation and Trend Engine

Implement recency decay and source credibility weighting
Integrate market context features into aggregation windows
Implement company-level rolling window aggregation
Implement contradiction detection and disagreement representation
Implement sector and market rollups
Implement evidence ranking for supporting and opposing documents
Persist trend windows and evidence mappings
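
Recency decay and source credibility weighting can be combined into a single weight per document. The sketch below assumes exponential decay with a 24-hour half-life and a multiplicative credibility factor; both choices, and the function names, are illustrative rather than taken from the project.

```python
def decay_weight(age_hours: float, half_life_hours: float = 24.0) -> float:
    """Exponential recency decay: the weight halves every half_life_hours."""
    return 0.5 ** (age_hours / half_life_hours)


def weighted_sentiment(signals, half_life_hours: float = 24.0) -> float:
    """Weighted mean of sentiment scores.

    signals is an iterable of (sentiment in [-1, 1], age_hours, credibility in [0, 1]).
    Each document contributes with weight credibility * decay_weight(age).
    """
    numerator = 0.0
    denominator = 0.0
    for sentiment, age_hours, credibility in signals:
        weight = credibility * decay_weight(age_hours, half_life_hours)
        numerator += weight * sentiment
        denominator += weight
    # With no usable evidence, report a neutral score instead of dividing by zero.
    return numerator / denominator if denominator else 0.0
```

For example, a fresh positive document and a day-old negative one of equal credibility give (1 - 0.5) / 1.5, or about 0.33, so the newer signal dominates without erasing the older one.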

Phase 8 - Recommendation Engine

Design deterministic recommendation eligibility logic
Implement recommendation generation from aggregated scores and evidence
Add optional LLM wording layer for thesis generation only
Persist recommendation objects and evidence citations
Add suppression logic for low-quality data or low confidence
Publish prediction facts to analytical tables
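
Deterministic eligibility and suppression could be expressed as a pure function over a trend summary, as sketched here. The `TrendSummary` fields loosely follow the trend attributes named elsewhere in this task list (direction, strength, confidence, contradiction score); the thresholds and action labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrendSummary:
    direction: str        # "up", "down", or "flat"
    strength: float       # 0..1
    confidence: float     # 0..1
    contradiction: float  # 0..1, higher means sources disagree more
    evidence_count: int


def decide(trend, min_confidence=0.6, min_strength=0.3, max_contradiction=0.5, min_evidence=3):
    """Return (action, reasons); action is "suppress" whenever any quality gate fails."""
    reasons = []
    if trend.confidence < min_confidence:
        reasons.append("confidence below threshold")
    if trend.strength < min_strength:
        reasons.append("strength below threshold")
    if trend.contradiction > max_contradiction:
        reasons.append("evidence is too contradictory")
    if trend.evidence_count < min_evidence:
        reasons.append("not enough supporting documents")
    if reasons:
        return "suppress", reasons
    # No model call is involved: the same inputs always produce the same action.
    action = {"up": "buy", "down": "sell"}.get(trend.direction, "hold")
    return action, []
```

Keeping the decision free of LLM output is what lets the optional wording layer stay limited to thesis text.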

Phase 9 - Risk Engine and Trade Adapter

Implement portfolio and account risk configuration model
Implement hard blocks for max position size, sector exposure, daily loss limits, and news-shock lockouts
Implement paper trading adapter behavior and state sync
Integrate first broker API in sandbox mode
Implement idempotent order submission keys and duplicate prevention
Implement full execution audit trail
Add operator approval workflow for live trading mode
Publish order, fill, and position facts to analytical tables
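
Idempotent order submission keys might be derived from the fields that define an order's intent, as in this sketch. The key inputs and the in-memory `OrderGate` are assumptions for illustration; a deployed service would more plausibly enforce uniqueness with a PostgreSQL unique constraint or a Redis `SET` with the `NX` option.

```python
import hashlib


def order_idempotency_key(recommendation_id: str, symbol: str, side: str, quantity: float) -> str:
    """Derive a stable key so retries of the same intent map to the same order."""
    payload = "|".join([recommendation_id, symbol.upper(), side.lower(), f"{quantity:.8f}"])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class OrderGate:
    """In-memory duplicate guard (illustrative only; state is lost on restart)."""

    def __init__(self):
        self._seen = set()

    def submit(self, key: str, send) -> bool:
        """Invoke send() at most once per key; return False for duplicates."""
        if key in self._seen:
            return False
        # Record the key before sending: if send() fails midway the order state is
        # ambiguous, and failing closed means never resubmitting automatically.
        self._seen.add(key)
        send()
        return True
```

Recording the key before the broker call trades automatic retries for safety, which matches the fail-closed behavior this plan validates in Phase 13.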

Phase 10 - Lakehouse and SQL Analytics

Define analytical fact tables for bars, documents, extractions, signals, orders, fills, positions, and PnL
Implement Parquet writers for analytical datasets
Implement Hive-compatible partition layout conventions on MinIO
Implement Iceberg table creation and metadata management for analytical datasets
Implement lake publisher jobs from operational data into analytical fact tables
Configure Trino catalogs for Hive and/or Iceberg access to MinIO
Add example SQL views for prediction-vs-outcome and paper-trade scorecards
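
A Hive-compatible partition layout uses `key=value` directory segments in the object key. The sketch below assumes partitioning by date and ticker under a `warehouse` prefix; the partition columns, the prefix, and both helper names are hypothetical.

```python
from datetime import date


def partition_path(table: str, day: date, ticker: str, prefix: str = "warehouse") -> str:
    """Build a Hive-style object key prefix: <prefix>/<table>/dt=YYYY-MM-DD/ticker=SYM/."""
    return f"{prefix}/{table}/dt={day.isoformat()}/ticker={ticker.upper()}/"


def parse_partitions(key: str) -> dict:
    """Recover partition values from the key=value segments of an object key."""
    return dict(segment.split("=", 1) for segment in key.split("/") if "=" in segment)
```

A Parquet writer would append its file name to the returned prefix, for example `partition_path("documents", date(2026, 4, 12), "aapl") + "part-0.parquet"`.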

Phase 11 - Query API and Dashboard

Build APIs for companies, document timelines, trend summaries, recommendations, and order history
Build evidence drill-down view linking recommendations to source documents and raw artifacts
Build admin controls for source health, symbol configs, and trading mode
Build operational dashboard for ingestion throughput, model failures, and source coverage gaps
Build Superset starter dashboards for symbol overview, sentiment heatmap, PnL, and prediction accuracy

Phase 12 - Observability and Hardening

Add structured logs and distributed tracing across services
Add Prometheus metrics for ingestion, parsing, extraction, aggregation, lake publication, and trading
Add alerting for source failures, schema failure spikes, analytical lag, and broker issues
Add dead-letter queues and replay tooling
Add data retention and lifecycle controls for raw and derived artifacts
Add security review for secrets, network policies, trading isolation, and dashboard access control

Phase 13 - Verification and Rollout

Create replay dataset from archived documents for deterministic extraction testing
Create integration tests for the full ingest-to-recommendation flow
Create paper trading simulation scenarios
Validate fail-closed behavior for broker outages and ambiguous order states
Validate lake publication and Trino query correctness over partitioned MinIO datasets
~~Run shadow mode~~ moved to Phase 15.6 (post-deployment)
~~Prepare operator runbook~~ moved to Phase 15.7 (post-deployment)

Phase 14 - Local Docker Build Validation

14. Build and validate all Docker containers locally
14.1 Build all 11 service containers locally using the Makefile
- Run `make build` to build scheduler, symbol-registry, ingestion, parser, extractor, aggregation, recommendation, risk, broker-adapter, lake-publisher, and query-api images
- Fix any build failures (missing dependencies, import errors, syntax issues)
- Requirements: N1, 12.1
14.2 Validate schema and logic consistency across all services
- Run the full test suite with `pytest tests/ -x --tb=short -q` to catch import errors, schema mismatches, and logic inconsistencies
- Verify all shared schemas in services/shared/schemas.py are consistent with what each service expects
- Verify config loader fields match the configmap and secrets definitions
- Fix any mismatches found between services, schemas, migrations, and K8s manifests
- Requirements: 5.2, 5.3, 9.2, N2
14.3 Verify each container starts without immediate crash
- Run each built image with `docker run --rm` and a quick health check or `--help` flag to confirm the entrypoint resolves
- Fix any runtime import errors or missing module paths
- Requirements: N1

Phase 15 - CI Validation, Helm Deployment, and Cluster Rollout

15. Commit, push, validate CI, create Helm chart, and deploy to cluster
15.1 Commit and push code to GitHub
- Configure git with SSH key for the private repo
- Commit all current changes with message "phase 14-15: docker build validation and helm deployment"
- Push to main branch
- Requirements: N1
15.2 Validate GitHub Actions workflow builds containers
- Monitor the GitHub Actions run to confirm lint-and-test and build-services jobs succeed
- Fix any CI failures and re-push if needed
- Requirements: N1
15.3 Create Helm chart for stonks-oracle deployment
- Create infra/helm/stonks-oracle/Chart.yaml with chart metadata
- Create infra/helm/stonks-oracle/values.yaml with configurable image tags, replica counts, resource limits, and environment references
- Create Helm templates for all deployments, services, configmap, secrets, ingress, and network policies from existing K8s manifests
- Add imagePullSecrets configuration for GHCR private registry access
- Add a template for a Kubernetes Secret of type kubernetes.io/dockerconfigjson for GHCR authentication
- Requirements: N1, 8.2
15.4 Configure GHCR image pull authentication on the cluster
- Create a docker-registry secret in the stonks-oracle namespace with GHCR credentials (using a GitHub PAT or deploy key)
- Reference the imagePullSecret in all deployment specs via the Helm values
- Requirements: 8.2, N1
15.5 Deploy stonks-oracle to the cluster via Helm
- Run `helm install` or `helm upgrade --install` targeting the stonks-oracle namespace
- Verify all pods reach Running/Ready state
- Verify services and ingress endpoints are reachable
- Debug and fix any deployment issues (CrashLoopBackOff, image pull errors, config mismatches)
- Requirements: N1, 12.1
15.6 Run shadow mode before enabling any live execution
- Confirm all services are running and processing in paper-only mode
- Validate end-to-end data flow from ingestion through recommendation without live trades
- Requirements: N5, 8.1
15.7 Prepare operator runbook and incident response procedures
- Document service restart procedures, log access, and common failure modes
- Document how to toggle trading modes and approve live execution
- Requirements: 8.2, 12.1

Phase 17 - First Vertical Slice: Live Pipeline End-to-End

17. Activate the full data pipeline for a set of tracked symbols
17.1 Seed initial symbols and configure sources via the dashboard
- Use the dashboard Companies page to add 5-10 symbols (e.g. AAPL, MSFT, GOOGL, AMZN, TSLA, NVDA, META, JPM, V, UNH)
- For each company, add sources via the Company Detail → Sources tab: one market_api source (Polygon), one news_api source, one filings_api source
- Configure source config JSON with the correct Polygon endpoint patterns per ticker
- Verify companies and sources appear in the dashboard and via curl https://stonks-registry.celestium.life/companies
- Requirements: 1.1, 1.2, 1.3, 2.1
17.2 Wire the scheduler to enqueue ingestion jobs for active sources
- Verify the scheduler service reads active companies and sources from PostgreSQL
- Verify it enqueues Redis jobs for each source on its polling interval
- Check scheduler logs: kubectl logs -n stonks-oracle deployment/scheduler --tail=50
- Confirm ingestion queue has items: check Redis keys via kubectl exec
- Fix any issues with the scheduler → source → Redis queue flow
- Requirements: 3.1, 3.2
17.3 Validate ingestion workers pull data from Polygon and persist to MinIO/PostgreSQL
- Check ingestion worker logs for successful API calls to Polygon
- Verify raw market data artifacts land in MinIO stonks-raw-market bucket
- Verify document metadata rows appear in PostgreSQL documents table
- Verify ingestion_runs table shows completed runs with items_fetched > 0
- Debug and fix any adapter errors (auth, rate limits, response parsing)
- Requirements: 4.1, 4.2, 4.3
17.4 Validate parser normalizes documents and extractor produces intelligence
- Check parser worker logs for document processing
- Verify normalized text appears in MinIO stonks-normalized bucket
- Verify parse_quality_score and parse_confidence are set on documents
- Check extractor worker logs for Ollama calls
- Verify document_intelligence rows appear with summaries, sentiments, and impact records
- Verify document_impact_records link intelligence to companies
- Debug any Ollama connection or schema validation issues
- Requirements: 5.1, 5.2, 6.1, 6.2, 6.3
17.5 Validate aggregation produces trend summaries
- Check aggregation worker logs for trend window generation
- Verify trend_windows table has entries with direction, strength, confidence
- Verify trend_evidence table links trends to contributing documents
- Verify contradiction scores are computed
- Check the dashboard Trends page shows trend cards with real data
- Requirements: 7.1, 7.2, 7.3
17.6 Validate recommendation engine produces paper-trade recommendations
- Check recommendation worker logs for recommendation generation
- Verify recommendations table has entries with action, confidence, thesis
- Verify recommendation_evidence links recommendations to documents
- Verify risk evaluation runs and risk_evaluations table has entries
- Check the dashboard Recommendations page shows real recommendations
- Requirements: 8.1, 8.2, 8.3
17.7 Validate lake publisher writes analytical facts to MinIO/Trino
- Check lake-publisher worker logs for Parquet writes
- Verify MinIO stonks-lakehouse bucket has partitioned Parquet files
- Verify Trino can query the lakehouse tables via the SQL Explorer page
- Test a sample query: SELECT * FROM lakehouse.stonks.documents LIMIT 10
- Requirements: 10.1, 10.2, 10.3
17.8 Validate dashboard shows live data across all pages
- Verify Home page shows non-zero metrics (active companies, documents, recommendations)
- Verify Companies page lists seeded companies with source counts
- Verify Documents page shows ingested documents with parse quality badges
- Verify Trends page shows trend cards with real direction/strength
- Verify Recommendations page shows generated recommendations
- Verify Ops Pipeline page shows document stage counts
- Verify Ops Ingestion page shows throughput data
- Take screenshots or curl key endpoints to confirm
- Requirements: 13.1, 13.2, 13.3, 13.4, 13.6

Phase 16 - Web Dashboard Frontend

16. Build React web dashboard for full platform control and analytics
16.1 Scaffold React project with Vite, TypeScript, Tailwind, and routing
- Initialize frontend/ directory with npm create vite@latest using the React-TS template
- Install core dependencies: @tanstack/react-router, @tanstack/react-query, tailwindcss, recharts, @monaco-editor/react, lucide-react
- Configure Tailwind with a dark-mode-friendly color palette
- Create the app shell with sidebar navigation layout and route definitions for all sections
- Set up TanStack Query provider with default stale/cache times
- Create a shared API client module that targets the Query API, Symbol Registry, and Risk Engine base URLs (configurable via env vars)
- Requirements: 13.1
16.2 Build the API client layer and shared components
- Create typed API hooks using TanStack Query for each API domain: companies, documents, trends, recommendations, orders, positions, admin/trading, admin/sources, ops endpoints
- Build reusable UI components: DataTable (sortable, filterable, paginated), StatusBadge, ConfidenceBar, TrendArrow, DateRangeSelector, TickerFilter, LoadingSpinner, ErrorBoundary
- Build a shared layout component with sidebar, breadcrumbs, and top-bar health indicator
- Requirements: 13.1, 13.2
16.3 Implement company and source management pages
- Build /companies list page with searchable, sortable table showing ticker, name, sector, active status, source count
- Build /companies/:id detail page with editable fields (sector, industry, market cap, active toggle), tabs for aliases, sources, and document history
- Build source add/edit form with source type selector, config JSON editor, credibility slider, retention days, access policy dropdown
- Build /companies/:id/aliases management with add/delete alias forms
- Wire watchlist CRUD pages at /watchlists with member management
- Requirements: 13.2, 1.1, 1.2, 1.3
16.4 Implement document timeline and intelligence drill-down pages
- Build /documents list page with filterable timeline: title, type, source, ticker mentions, published date, parse quality badge, extraction status
- Build /documents/:id detail page showing full intelligence extraction, company impacts with sentiment/score, key facts, risks, macro themes, and links to raw MinIO artifacts
- Add evidence chain visualization showing document → intelligence → impact records
- Requirements: 13.3, 11.1, 11.2
16.5 Implement trend summary and evidence chain pages
- Build /trends list page with company trend cards showing direction indicator, strength bar, confidence score, contradiction score, and window selector
- Build /trends/:id detail page with full evidence drill-down: contributing documents, intelligence objects, rank scores, weight breakdowns
- Add expandable evidence list on trend cards for quick preview
- Requirements: 13.3, 6.5, 10.4
16.6 Implement recommendation review and order tracking pages
- Build /recommendations list page with filterable table: ticker, action badge, mode, confidence bar, thesis preview, timestamp
- Build /recommendations/:id detail page with full evidence drill-down, risk evaluation display, and linked orders
- Build /orders list page with status badges, fill info, and expandable audit trail
- Build /orders/:id detail page with decision trace, order events timeline, and full audit trail
- Build /positions page with current positions table showing unrealized/realized PnL, entry/current prices
- Requirements: 13.4, 11.1, 11.2, 11.3
16.7 Implement trading controls and risk management pages
- Build /trading page with trading mode toggle (paper/live/disabled) with confirmation dialog
- Build pending approvals queue with approve/reject buttons and review note input
- Build risk configuration editor form for max position size, daily loss cap, sector exposure, cooldown periods
- Build active lockouts display with type, reason, and expiration countdown
- Requirements: 13.5, 8.1, 8.2
16.8 Implement DevOps monitoring dashboards
- Build /ops/pipeline page with pipeline health summary: document stage counts, parsing quality distribution, extraction validation rates, trend generation stats
- Build /ops/ingestion page with time-series charts (Recharts) for ingestion throughput, success/failure rates by source type, configurable time bucket selector
- Build /ops/model page with model performance metrics: success rate gauge, latency percentile chart, retry rate, confidence distribution histogram, recent failures table
- Build /ops/coverage page with company × source type coverage matrix, stale source indicators, and coverage gap alerts
- Requirements: 13.6, 12.1, 12.2, 12.3
16.9 Implement SQL query explorer (Athena-like)
- Add a /api/analytics/query proxy endpoint to the Query API that forwards SQL to Trino, enforces row limits, and returns structured {columns, rows, row_count, elapsed_ms} results
- Add a /api/analytics/schema endpoint that returns Trino catalog/schema/table/column metadata for the schema browser
- Build /analytics/query page with Monaco Editor SQL input, schema browser sidebar, execute button, and results table with virtual scrolling
- Add chart builder panel: select chart type (line, bar, scatter, pie, heatmap), map result columns to axes, render via Recharts
- Add saved queries: persist to PostgreSQL via a new /api/analytics/saved-queries CRUD endpoint, display saved query list with load/delete
- Requirements: 13.7, 10.1, 10.3
16.10 Implement pre-built analytical dashboards (QuickSight-like)
- Build /analytics/dashboards gallery page listing available dashboards with preview thumbnails
- Build Symbol Overview dashboard: company card grid with trend direction, latest recommendation, position status, sourced from API data
- Build Sentiment Heatmap dashboard: sector × time matrix colored by aggregated sentiment, sourced from Trino query
- Build Prediction Accuracy dashboard: scatter plot of predicted confidence vs realized price move, sourced from Trino prediction_vs_outcome table
- Build Paper Trading PnL dashboard: equity curve line chart, daily PnL bars, win rate metrics, sourced from Trino pnl_daily table
- Build Model Quality dashboard: extraction success rate over time, latency distribution, retry rate charts
- Add date range selector and ticker filter controls shared across all dashboards
- Requirements: 13.8, 10.2, 10.4
16.11 Build home/overview page
- Build / home page with system health summary card, recent activity feed, key metrics (active companies, documents today, recommendations today, pipeline status)
- Add quick-nav cards linking to each major section
- Add alert banner for critical issues (source failures, pipeline bottlenecks)
- Requirements: 13.1, 13.6
16.12 Add Dockerfile, CI build, Helm template, and deploy to cluster
- Create frontend/Dockerfile using multi-stage build: node for build, nginx for serve
- Add dashboard service to .github/workflows/build.yml matrix
- Add dashboard deployment, service, and ingress to Helm chart values and templates
- Configure ingress at stonks.celestium.life (or similar)
- Build, push, and deploy to cluster
- Verify dashboard is accessible and all pages load data from the APIs
- Requirements: 13.1, N1
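
For the row-limit enforcement described in task 16.9, one conservative approach is to accept a single read-only statement and wrap it in an outer `LIMIT`. This is a sketch only: `enforce_row_limit` is a hypothetical helper, the keyword check is not a substitute for a SQL parser or for read-only Trino credentials, and it deliberately rejects any query containing a semicolon, including inside string literals.

```python
def enforce_row_limit(sql: str, max_rows: int = 1000) -> str:
    """Wrap a single SELECT/WITH statement in an outer LIMIT before forwarding it."""
    if max_rows < 1:
        raise ValueError("max_rows must be positive")
    statement = sql.strip().rstrip(";").strip()
    if not statement:
        raise ValueError("empty query")
    if ";" in statement:
        # Conservative: also rejects semicolons inside string literals.
        raise ValueError("multiple statements are not allowed")
    first_word = statement.split(None, 1)[0].lower()
    if first_word not in {"select", "with"}:
        raise ValueError("only read-only SELECT queries are allowed")
    return f"SELECT * FROM ({statement}) AS limited LIMIT {int(max_rows)}"
```

Wrapping rather than appending `LIMIT` means a user-supplied limit larger than the cap is still bounded by the outer query.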
