phase 14-15: docker build validation and helm deployment

Celes Renata
2026-04-11 11:59:45 -07:00
parent 7394d241c9
commit ce10afa034
179 changed files with 32559 additions and 576 deletions
+15 -6
@@ -3,9 +3,18 @@
 Apache Superset dashboard configurations and starter datasets for Stonks Oracle.
 
 ## Starter Dashboards
-- Symbol Overview — company profile, source health, recent documents
-- Sentiment Heatmap — market-wide sentiment by sector and symbol
-- Prediction Accuracy — predicted signals vs realized price moves
-- Paper Trading PnL — paper trade performance and position tracking
-- Model Quality — extraction success rates, latency, and confidence distributions
-- Source Coverage — ingestion throughput, source failures, and coverage gaps
+See `starter/` for dashboard definitions covering:
+- Symbol Overview — company profiles, source health, recent documents, and market snapshots
+- Sentiment Heatmap — market-wide sentiment by sector and symbol, catalyst analysis
+- Prediction Accuracy — predicted signals vs realized price moves, confidence calibration
+- Paper Trading PnL — cumulative PnL, position snapshots, order history, and scorecards
+
+## Operational Dashboards
+See `operational/` for dashboard definitions covering:
+- Ingestion Throughput — documents/hour by source type, success/failure rates, stale sources
+- Model Extraction Quality — success rates, latency percentiles, validation failures, confidence distributions
+- Source Coverage & Gaps — per-symbol source type matrix, missing sources, failure heatmap
+
+Starter dashboards are powered by the Trino `lakehouse` catalog over MinIO-backed analytical tables.
+Operational dashboards query the Query API `/api/ops/*` endpoints.
+All dashboards can be imported into Superset via the UI or CLI.
+22
@@ -0,0 +1,22 @@
# Operational Dashboards
Superset dashboard definitions for Stonks Oracle operational monitoring.
## Dashboards
- Ingestion Throughput — documents ingested per hour by source type, success/failure rates
- Model Extraction Quality — extraction success rates, latency percentiles, validation failures
- Source Coverage & Gaps — per-symbol source coverage, symbols missing expected source types, stale sources, and a failure heatmap
## Data Sources
These dashboards query the Query API operational endpoints:
- `/api/ops/ingestion/throughput` — time-bucketed ingestion metrics
- `/api/ops/ingestion/summary` — aggregate ingestion stats
- `/api/ops/model/failures` — recent extraction failures
- `/api/ops/model/performance` — model performance summary
- `/api/ops/pipeline/health` — pipeline stage health
- `/api/ops/sources/coverage-gaps` — source coverage analysis
## Setup
Import the dashboard JSON files into Superset via the Superset UI or CLI.
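The chart queries in these definitions are written in PostgreSQL syntax (for example `::numeric` casts and `NOW() - INTERVAL '24 hours'`) against the operational tables (`ingestion_runs`, `sources`, `companies`, `model_performance_metrics`, `documents`), so they need a PostgreSQL datasource in Superset rather than the Trino `lakehouse` catalog.
The Query API endpoints listed above provide supplementary real-time operational data.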
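The endpoint list above can also be exercised outside Superset with a small client. This is an illustrative sketch only: the base URL `http://query-api:8000` is a hypothetical placeholder, the short keys in `OPS_ENDPOINTS` are made up for the example, and it assumes each endpoint answers a plain `GET` with a JSON body; neither the HTTP method nor the response schema is specified in this README.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Hypothetical base URL for the Query API service; adjust for your deployment.
QUERY_API_BASE = "http://query-api:8000"

# Operational endpoint paths as listed in the README; the dict keys are illustrative.
OPS_ENDPOINTS = {
    "throughput": "/api/ops/ingestion/throughput",
    "summary": "/api/ops/ingestion/summary",
    "model_failures": "/api/ops/model/failures",
    "model_performance": "/api/ops/model/performance",
    "pipeline_health": "/api/ops/pipeline/health",
    "coverage_gaps": "/api/ops/sources/coverage-gaps",
}


def ops_url(name: str, base: str = QUERY_API_BASE) -> str:
    """Absolute URL for one endpoint; an absolute path replaces any path on `base`."""
    return urljoin(base, OPS_ENDPOINTS[name])


def fetch_ops(name: str, base: str = QUERY_API_BASE, timeout: float = 10.0):
    """GET an endpoint and decode its JSON body (GET and JSON are assumptions)."""
    with urlopen(ops_url(name, base), timeout=timeout) as resp:
        return json.load(resp)
```

For example, `ops_url("throughput")` resolves to `http://query-api:8000/api/ops/ingestion/throughput`; `fetch_ops` needs a running service.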
@@ -0,0 +1,75 @@
{
"dashboard_title": "Ingestion Throughput",
"description": "Operational dashboard for monitoring ingestion pipeline throughput, success rates, and item counts across source types.",
"slug": "ingestion-throughput",
"position_json": {
"HEADER_ID": {"id": "HEADER_ID", "type": "HEADER", "meta": {"text": "Ingestion Throughput"}},
"ROW-1": {
"type": "ROW",
"children": ["CHART-throughput-timeseries", "CHART-source-type-breakdown"]
},
"ROW-2": {
"type": "ROW",
"children": ["CHART-success-failure-rate", "CHART-items-fetched"]
},
"ROW-3": {
"type": "ROW",
"children": ["CHART-stale-sources", "CHART-active-companies"]
}
},
"metadata": {
"refresh_frequency": 300,
"default_filters": "{}",
"color_scheme": "supersetColors"
},
"charts": [
{
"slice_name": "Ingestion Runs Over Time",
"viz_type": "echarts_timeseries_bar",
"description": "Ingestion run counts bucketed by hour, stacked by source type",
"datasource_type": "query",
"query": "SELECT date_trunc('hour', ir.started_at) AS bucket, ir.source_type, COUNT(*) AS run_count, COUNT(*) FILTER (WHERE ir.status = 'completed') AS completed, COUNT(*) FILTER (WHERE ir.status = 'failed') AS failed FROM ingestion_runs ir WHERE ir.started_at >= NOW() - INTERVAL '24 hours' GROUP BY 1, 2 ORDER BY 1",
"params": {
"x_axis": "bucket",
"metrics": ["run_count"],
"groupby": ["source_type"],
"time_grain_sqla": "PT1H"
}
},
{
"slice_name": "Source Type Breakdown",
"viz_type": "pie",
"description": "Distribution of ingestion runs by source type in the last 24h",
"datasource_type": "query",
"query": "SELECT ir.source_type, COUNT(*) AS runs FROM ingestion_runs ir WHERE ir.started_at >= NOW() - INTERVAL '24 hours' GROUP BY ir.source_type ORDER BY runs DESC"
},
{
"slice_name": "Success vs Failure Rate",
"viz_type": "echarts_timeseries_line",
"description": "Hourly success and failure counts over time",
"datasource_type": "query",
"query": "SELECT date_trunc('hour', ir.started_at) AS bucket, COUNT(*) FILTER (WHERE ir.status = 'completed') AS completed, COUNT(*) FILTER (WHERE ir.status = 'failed') AS failed, ROUND(COUNT(*) FILTER (WHERE ir.status = 'completed')::numeric / NULLIF(COUNT(*), 0), 3) AS success_rate FROM ingestion_runs ir WHERE ir.started_at >= NOW() - INTERVAL '24 hours' GROUP BY 1 ORDER BY 1"
},
{
"slice_name": "Items Fetched Over Time",
"viz_type": "echarts_timeseries_bar",
"description": "Total items fetched and new items per hour",
"datasource_type": "query",
"query": "SELECT date_trunc('hour', ir.started_at) AS bucket, COALESCE(SUM(ir.items_fetched), 0) AS items_fetched, COALESCE(SUM(ir.items_new), 0) AS items_new FROM ingestion_runs ir WHERE ir.started_at >= NOW() - INTERVAL '24 hours' GROUP BY 1 ORDER BY 1"
},
{
"slice_name": "Stale Sources",
"viz_type": "table",
"description": "Sources with no successful run in the last 24 hours",
"datasource_type": "query",
"query": "SELECT c.ticker, s.source_type, s.source_name, MAX(ir.started_at) FILTER (WHERE ir.status = 'completed') AS last_success, COUNT(*) FILTER (WHERE ir.status = 'failed' AND ir.started_at >= NOW() - INTERVAL '24 hours') AS recent_failures FROM sources s JOIN companies c ON c.id = s.company_id LEFT JOIN ingestion_runs ir ON ir.source_id = s.id WHERE s.active = TRUE AND c.active = TRUE GROUP BY c.ticker, s.source_type, s.source_name HAVING MAX(ir.started_at) FILTER (WHERE ir.status = 'completed') < NOW() - INTERVAL '24 hours' OR MAX(ir.started_at) FILTER (WHERE ir.status = 'completed') IS NULL ORDER BY c.ticker"
},
{
"slice_name": "Active Companies Ingested",
"viz_type": "big_number_total",
"description": "Count of distinct companies with ingestion activity in the last 24h",
"datasource_type": "query",
"query": "SELECT COUNT(DISTINCT company_id) AS active_companies FROM ingestion_runs WHERE started_at >= NOW() - INTERVAL '24 hours'"
}
]
}
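The "Success vs Failure Rate" chart buckets runs with `date_trunc('hour', started_at)` and divides completed runs by all runs, guarding the division with `NULLIF`. A minimal Python sketch of the same arithmetic, assuming runs arrive as hypothetical `(started_at, status)` pairs, makes one consequence visible: runs with any status other than `completed` or `failed` still count toward the denominator, exactly as `COUNT(*)` does in the SQL.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def hourly_success_rate(runs, now: datetime, window: timedelta = timedelta(hours=24)):
    """Per-hour (bucket, completed, failed, success_rate) rows for recent runs.

    `runs` is an iterable of (started_at, status) pairs; this input shape is an
    assumption. Datetimes may be naive or aware, but must be consistent with `now`.
    """
    cutoff = now - window  # started_at >= NOW() - INTERVAL '24 hours'
    counts = defaultdict(lambda: [0, 0, 0])  # bucket -> [total, completed, failed]
    for started_at, status in runs:
        if started_at < cutoff:
            continue
        bucket = started_at.replace(minute=0, second=0, microsecond=0)  # date_trunc('hour', ...)
        c = counts[bucket]
        c[0] += 1  # COUNT(*) counts every status, including in-progress runs
        if status == "completed":
            c[1] += 1
        elif status == "failed":
            c[2] += 1
    rows = []
    for bucket in sorted(counts):
        total, completed, failed = counts[bucket]
        # Mirrors NULLIF(COUNT(*), 0); a bucket only exists if it saw a run.
        rate = round(completed / total, 3) if total else None
        rows.append((bucket, completed, failed, rate))
    return rows
```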
+94
@@ -0,0 +1,94 @@
{
"dashboard_title": "Model Extraction Quality",
"description": "Operational dashboard for monitoring Ollama extraction success rates, latency, validation failures, and confidence distributions.",
"slug": "model-extraction-quality",
"position_json": {
"HEADER_ID": {"id": "HEADER_ID", "type": "HEADER", "meta": {"text": "Model Extraction Quality"}},
"ROW-1": {
"type": "ROW",
"children": ["CHART-success-rate-kpi", "CHART-avg-latency-kpi", "CHART-avg-confidence-kpi", "CHART-retry-rate-kpi"]
},
"ROW-2": {
"type": "ROW",
"children": ["CHART-extraction-timeseries", "CHART-validation-status-pie"]
},
"ROW-3": {
"type": "ROW",
"children": ["CHART-latency-percentiles", "CHART-confidence-distribution"]
},
"ROW-4": {
"type": "ROW",
"children": ["CHART-recent-failures-table"]
}
},
"metadata": {
"refresh_frequency": 300,
"default_filters": "{}",
"color_scheme": "supersetColors"
},
"charts": [
{
"slice_name": "Extraction Success Rate",
"viz_type": "big_number_total",
"description": "Overall extraction success rate in the last 24h",
"datasource_type": "query",
"query": "SELECT ROUND(COUNT(*) FILTER (WHERE success)::numeric / NULLIF(COUNT(*), 0), 4) AS success_rate FROM model_performance_metrics WHERE recorded_at >= NOW() - INTERVAL '24 hours'"
},
{
"slice_name": "Avg Extraction Latency",
"viz_type": "big_number_total",
"description": "Average extraction duration in milliseconds",
"datasource_type": "query",
"query": "SELECT ROUND(AVG(total_duration_ms)::numeric, 0) AS avg_latency_ms FROM model_performance_metrics WHERE recorded_at >= NOW() - INTERVAL '24 hours'"
},
{
"slice_name": "Avg Confidence Score",
"viz_type": "big_number_total",
"description": "Average confidence of successful extractions",
"datasource_type": "query",
"query": "SELECT ROUND(AVG(confidence)::numeric, 3) AS avg_confidence FROM model_performance_metrics WHERE recorded_at >= NOW() - INTERVAL '24 hours' AND success = TRUE"
},
{
"slice_name": "Avg Retry Count",
"viz_type": "big_number_total",
"description": "Average retries per extraction attempt",
"datasource_type": "query",
"query": "SELECT ROUND(AVG(retry_count)::numeric, 2) AS avg_retries FROM model_performance_metrics WHERE recorded_at >= NOW() - INTERVAL '24 hours'"
},
{
"slice_name": "Extractions Over Time",
"viz_type": "echarts_timeseries_bar",
"description": "Hourly extraction counts split by success/failure",
"datasource_type": "query",
"query": "SELECT date_trunc('hour', recorded_at) AS bucket, COUNT(*) FILTER (WHERE success) AS successful, COUNT(*) FILTER (WHERE NOT success) AS failed FROM model_performance_metrics WHERE recorded_at >= NOW() - INTERVAL '24 hours' GROUP BY 1 ORDER BY 1"
},
{
"slice_name": "Validation Status Distribution",
"viz_type": "pie",
"description": "Breakdown of extraction validation outcomes",
"datasource_type": "query",
"query": "SELECT validation_status, COUNT(*) AS count FROM model_performance_metrics WHERE recorded_at >= NOW() - INTERVAL '24 hours' GROUP BY validation_status"
},
{
"slice_name": "Latency Percentiles Over Time",
"viz_type": "echarts_timeseries_line",
"description": "P50, P95, P99 extraction latency per hour",
"datasource_type": "query",
"query": "SELECT date_trunc('hour', recorded_at) AS bucket, ROUND(PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY total_duration_ms)::numeric, 0) AS p50_ms, ROUND(PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY total_duration_ms)::numeric, 0) AS p95_ms, ROUND(PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY total_duration_ms)::numeric, 0) AS p99_ms FROM model_performance_metrics WHERE recorded_at >= NOW() - INTERVAL '24 hours' GROUP BY 1 ORDER BY 1"
},
{
"slice_name": "Confidence Distribution",
"viz_type": "histogram",
"description": "Distribution of extraction confidence scores",
"datasource_type": "query",
"query": "SELECT CASE WHEN confidence >= 0.9 THEN '0.9-1.0' WHEN confidence >= 0.8 THEN '0.8-0.9' WHEN confidence >= 0.7 THEN '0.7-0.8' WHEN confidence >= 0.6 THEN '0.6-0.7' WHEN confidence >= 0.5 THEN '0.5-0.6' ELSE '0.0-0.5' END AS confidence_bucket, COUNT(*) AS count FROM model_performance_metrics WHERE recorded_at >= NOW() - INTERVAL '24 hours' AND success = TRUE GROUP BY 1 ORDER BY 1"
},
{
"slice_name": "Recent Extraction Failures",
"viz_type": "table",
"description": "Most recent failed extractions with error details",
"datasource_type": "query",
"query": "SELECT mpm.ticker, mpm.model_name, mpm.validation_status, mpm.validation_error_count, mpm.attempt_count, mpm.total_duration_ms, mpm.recorded_at, d.title, d.document_type FROM model_performance_metrics mpm LEFT JOIN documents d ON d.id = mpm.document_id WHERE mpm.success = FALSE AND mpm.recorded_at >= NOW() - INTERVAL '24 hours' ORDER BY mpm.recorded_at DESC LIMIT 50"
}
]
}
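The latency chart relies on `PERCENTILE_CONT`, which interpolates linearly between the two rows that bracket the requested fraction instead of returning an observed value. The sketch below reproduces that definition in Python (NULLs ignored, an empty group yields `None`), which can help sanity-check the P50/P95/P99 figures against raw `total_duration_ms` values. The function name is hypothetical.

```python
import math


def percentile_cont(values, fraction):
    """Continuous percentile with linear interpolation between neighbouring rows.

    Mirrors PERCENTILE_CONT(fraction) WITHIN GROUP (ORDER BY x): NULLs (None)
    are ignored and an empty group yields None, like SQL NULL.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    ordered = sorted(v for v in values if v is not None)
    if not ordered:
        return None
    pos = fraction * (len(ordered) - 1)  # zero-based fractional row position
    lower = math.floor(pos)
    upper = math.ceil(pos)
    weight = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * weight
```

With durations `[100, 200, 300, 400, 1000]`, P50 is 300 and P95 lands 0.8 of the way from 400 to 1000, i.e. 880, which the chart then rounds to whole milliseconds.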
@@ -0,0 +1,51 @@
{
"dashboard_title": "Source Coverage & Gaps",
"description": "Operational dashboard for identifying source coverage gaps, stale sources, and symbols missing expected data feeds.",
"slug": "source-coverage-gaps",
"position_json": {
"HEADER_ID": {"id": "HEADER_ID", "type": "HEADER", "meta": {"text": "Source Coverage & Gaps"}},
"ROW-1": {
"type": "ROW",
"children": ["CHART-coverage-matrix", "CHART-missing-types-table"]
},
"ROW-2": {
"type": "ROW",
"children": ["CHART-stale-sources-table", "CHART-failure-heatmap"]
}
},
"metadata": {
"refresh_frequency": 600,
"default_filters": "{}",
"color_scheme": "supersetColors"
},
"charts": [
{
"slice_name": "Source Coverage Matrix",
"viz_type": "table",
"description": "Per-symbol source type coverage showing active source counts",
"datasource_type": "query",
"query": "SELECT c.ticker, c.legal_name, c.sector, COUNT(s.id) FILTER (WHERE s.active) AS active_sources, COUNT(s.id) FILTER (WHERE s.source_type = 'market_api' AND s.active) AS market_sources, COUNT(s.id) FILTER (WHERE s.source_type = 'news_api' AND s.active) AS news_sources, COUNT(s.id) FILTER (WHERE s.source_type = 'filings_api' AND s.active) AS filings_sources, COUNT(s.id) FILTER (WHERE s.source_type = 'web_scrape' AND s.active) AS web_scrape_sources, COUNT(s.id) FILTER (WHERE s.source_type = 'broker' AND s.active) AS broker_sources FROM companies c LEFT JOIN sources s ON s.company_id = c.id WHERE c.active = TRUE GROUP BY c.ticker, c.legal_name, c.sector ORDER BY c.ticker"
},
{
"slice_name": "Symbols Missing Source Types",
"viz_type": "table",
"description": "Companies that lack one or more expected source types (market_api, news_api, filings_api)",
"datasource_type": "query",
"query": "SELECT c.ticker, c.legal_name, c.sector, ARRAY_AGG(DISTINCT s.source_type) FILTER (WHERE s.active) AS active_types FROM companies c LEFT JOIN sources s ON s.company_id = c.id AND s.active = TRUE WHERE c.active = TRUE GROUP BY c.ticker, c.legal_name, c.sector HAVING NOT ARRAY['market_api', 'news_api', 'filings_api'] <@ ARRAY_AGG(DISTINCT s.source_type) FILTER (WHERE s.active) OR ARRAY_AGG(DISTINCT s.source_type) FILTER (WHERE s.active) IS NULL ORDER BY c.ticker"
},
{
"slice_name": "Stale Sources (No Success in 24h)",
"viz_type": "table",
"description": "Active sources that have not completed a successful ingestion run in the last 24 hours",
"datasource_type": "query",
"query": "SELECT c.ticker, s.source_type, s.source_name, MAX(ir.started_at) FILTER (WHERE ir.status = 'completed') AS last_success, MAX(ir.started_at) AS last_attempt, COUNT(*) FILTER (WHERE ir.status = 'failed' AND ir.started_at >= NOW() - INTERVAL '24 hours') AS recent_failures FROM sources s JOIN companies c ON c.id = s.company_id LEFT JOIN ingestion_runs ir ON ir.source_id = s.id WHERE s.active = TRUE AND c.active = TRUE GROUP BY c.ticker, s.source_type, s.source_name HAVING MAX(ir.started_at) FILTER (WHERE ir.status = 'completed') < NOW() - INTERVAL '24 hours' OR MAX(ir.started_at) FILTER (WHERE ir.status = 'completed') IS NULL ORDER BY c.ticker, s.source_type"
},
{
"slice_name": "Source Failure Heatmap",
"viz_type": "heatmap",
"description": "Failure counts by source type and ticker in the last 24h",
"datasource_type": "query",
"query": "SELECT c.ticker, ir.source_type, COUNT(*) FILTER (WHERE ir.status = 'failed') AS failures FROM ingestion_runs ir JOIN companies c ON c.id = ir.company_id WHERE ir.started_at >= NOW() - INTERVAL '24 hours' GROUP BY c.ticker, ir.source_type HAVING COUNT(*) FILTER (WHERE ir.status = 'failed') > 0 ORDER BY failures DESC"
}
]
}
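Two rules in this dashboard are easy to misread in SQL form. A company is flagged when its active source types do not include all of `market_api`, `news_api`, and `filings_api`, or when it has no active sources at all (which makes `ARRAY_AGG` return NULL). A source is stale when its last successful run is missing or older than 24 hours. A minimal Python restatement of both rules, with hypothetical function names:

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional

# Source types the "Symbols Missing Source Types" chart expects per company.
EXPECTED_SOURCE_TYPES = frozenset({"market_api", "news_api", "filings_api"})


def missing_source_types(active_types: Optional[Iterable[str]]) -> frozenset:
    """Expected source types absent from a company's active sources.

    None mirrors ARRAY_AGG over zero rows returning NULL; such a company is
    missing every expected type. Extra types (e.g. 'broker') are ignored.
    """
    return EXPECTED_SOURCE_TYPES - set(active_types or ())


def is_stale(last_success: Optional[datetime], now: datetime,
             max_age: timedelta = timedelta(hours=24)) -> bool:
    """True if the source never succeeded or last succeeded before now - max_age."""
    return last_success is None or last_success < now - max_age
```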
+29
@@ -0,0 +1,29 @@
# Starter Dashboards
Superset dashboard definitions for Stonks Oracle research, analysis, and trading review.
## Dashboards
- Symbol Overview — company profiles, source health, recent documents, and market snapshots
- Sentiment Heatmap — market-wide sentiment by sector and symbol, catalyst analysis, contradiction tracking
- Prediction Accuracy — predicted signals vs realized price moves, confidence calibration, per-symbol accuracy
- Paper Trading PnL — cumulative PnL, daily performance, position snapshots, order history, and scorecards
## Data Sources
These dashboards query the Trino `lakehouse` catalog over MinIO-backed analytical fact tables:
- `lakehouse.stonks.documents` — ingested document metadata
- `lakehouse.stonks.document_extractions` — AI extraction outputs
- `lakehouse.stonks.trade_signals` — aggregated trend signals
- `lakehouse.stonks.market_bars` — OHLCV bar data
- `lakehouse.stonks.prediction_vs_outcome` — prediction accuracy tracking
- `lakehouse.stonks.pnl_daily` — daily PnL records
- `lakehouse.stonks.positions_daily` — end-of-day position snapshots
- `lakehouse.stonks.trade_orders` — order submission records
- `lakehouse.stonks.trade_fills` — fill and execution records
## Setup
1. Import the dashboard JSON files into Superset via the Superset UI or CLI
2. Ensure the Trino datasource is configured: `trino://trino@trino:8080/lakehouse/stonks`
3. Create the lakehouse views from `lakehouse/views/` for additional drill-down capability
## Trino Connection
The dashboards use the default Superset Trino connection configured in `infra/superset/superset_config.py`.
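The datasource string in step 2 is a SQLAlchemy URI in the `trino://<user>@<host>:<port>/<catalog>/<schema>` layout. The sketch below splits it with the standard library to show which part is which; `parse_trino_uri` is a hypothetical helper, not part of Superset.

```python
from urllib.parse import urlsplit


def parse_trino_uri(uri: str) -> dict:
    """Split trino://<user>@<host>:<port>/<catalog>/<schema> into its parts."""
    parts = urlsplit(uri)
    if parts.scheme != "trino":
        raise ValueError(f"expected a trino:// URI, got {uri!r}")
    # Path segments after the host are the catalog and (optionally) the schema.
    segments = [s for s in parts.path.split("/") if s]
    return {
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "catalog": segments[0] if len(segments) > 0 else None,
        "schema": segments[1] if len(segments) > 1 else None,
    }
```

These fields correspond to the `host`, `port`, `user`, `catalog`, and `schema` keyword arguments of the `trino` Python client's `trino.dbapi.connect`.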
+124
@@ -0,0 +1,124 @@
{
"dashboard_title": "Paper Trading PnL",
"description": "Paper trading performance tracking with PnL curves, position snapshots, order history, and trade detail drill-down.",
"slug": "paper-trading-pnl",
"position_json": {
"HEADER_ID": {"id": "HEADER_ID", "type": "HEADER", "meta": {"text": "Paper Trading PnL"}},
"ROW-1": {
"type": "ROW",
"children": ["CHART-total-net-pnl-kpi", "CHART-win-rate-kpi", "CHART-total-orders-kpi", "CHART-active-positions-kpi"]
},
"ROW-2": {
"type": "ROW",
"children": ["CHART-cumulative-pnl-timeseries", "CHART-daily-pnl-bar"]
},
"ROW-3": {
"type": "ROW",
"children": ["CHART-pnl-by-symbol", "CHART-order-status-pie"]
},
"ROW-4": {
"type": "ROW",
"children": ["CHART-positions-table"]
},
"ROW-5": {
"type": "ROW",
"children": ["CHART-scorecard-table"]
},
"ROW-6": {
"type": "ROW",
"children": ["CHART-recent-orders-table"]
}
},
"metadata": {
"refresh_frequency": 300,
"default_filters": "{}",
"color_scheme": "supersetColors"
},
"charts": [
{
"slice_name": "Total Net PnL",
"viz_type": "big_number_total",
"description": "Cumulative net PnL across all paper trading activity",
"datasource_type": "trino",
"query": "SELECT ROUND(SUM(net_pnl), 2) AS total_net_pnl FROM lakehouse.stonks.pnl_daily WHERE execution_mode = 'paper'"
},
{
"slice_name": "Win Rate",
"viz_type": "big_number_total",
"description": "Fraction of trading days with positive net PnL",
"datasource_type": "trino",
"query": "SELECT ROUND(CAST(COUNT(CASE WHEN day_pnl > 0 THEN 1 END) AS DOUBLE) / NULLIF(COUNT(*), 0), 4) AS win_rate FROM (SELECT dt, SUM(net_pnl) AS day_pnl FROM lakehouse.stonks.pnl_daily WHERE execution_mode = 'paper' GROUP BY dt) AS daily"
},
{
"slice_name": "Total Orders",
"viz_type": "big_number_total",
"description": "Total paper trade orders submitted",
"datasource_type": "trino",
"query": "SELECT COUNT(DISTINCT order_id) AS total_orders FROM lakehouse.stonks.trade_orders WHERE execution_mode = 'paper'"
},
{
"slice_name": "Active Positions",
"viz_type": "big_number_total",
"description": "Number of symbols with open positions as of the latest snapshot",
"datasource_type": "trino",
"query": "SELECT COUNT(DISTINCT ticker) AS active_positions FROM lakehouse.stonks.positions_daily WHERE execution_mode = 'paper' AND quantity <> 0 AND dt = (SELECT MAX(dt) FROM lakehouse.stonks.positions_daily WHERE execution_mode = 'paper')"
},
{
"slice_name": "Cumulative PnL Over Time",
"viz_type": "echarts_timeseries_line",
"description": "Running cumulative net PnL across all paper trades",
"datasource_type": "trino",
"query": "SELECT dt AS bucket, SUM(net_pnl) AS daily_net_pnl, SUM(SUM(net_pnl)) OVER (ORDER BY dt) AS cumulative_pnl FROM lakehouse.stonks.pnl_daily WHERE execution_mode = 'paper' GROUP BY dt ORDER BY dt"
},
{
"slice_name": "Daily PnL",
"viz_type": "echarts_timeseries_bar",
"description": "Daily net PnL for paper trading; the query also returns realized and unrealized components",
"datasource_type": "trino",
"query": "SELECT dt AS bucket, ROUND(SUM(net_pnl), 2) AS daily_pnl, ROUND(SUM(realized_pnl), 2) AS realized, ROUND(SUM(unrealized_pnl), 2) AS unrealized FROM lakehouse.stonks.pnl_daily WHERE execution_mode = 'paper' GROUP BY dt ORDER BY dt",
"params": {
"x_axis": "bucket",
"metrics": ["daily_pnl"]
}
},
{
"slice_name": "PnL by Symbol",
"viz_type": "echarts_timeseries_bar",
"description": "Total net PnL per symbol for paper trading",
"datasource_type": "trino",
"query": "SELECT ticker, ROUND(SUM(net_pnl), 2) AS total_pnl, ROUND(SUM(realized_pnl), 2) AS realized_pnl, ROUND(SUM(fees), 2) AS total_fees FROM lakehouse.stonks.pnl_daily WHERE execution_mode = 'paper' GROUP BY ticker ORDER BY total_pnl DESC",
"params": {
"x_axis": "ticker",
"metrics": ["total_pnl"]
}
},
{
"slice_name": "Order Status Distribution",
"viz_type": "pie",
"description": "Breakdown of paper trade order statuses",
"datasource_type": "trino",
"query": "SELECT status, COUNT(*) AS count FROM lakehouse.stonks.trade_orders WHERE execution_mode = 'paper' GROUP BY status ORDER BY count DESC"
},
{
"slice_name": "Current Positions",
"viz_type": "table",
"description": "Latest position snapshot for all paper trading symbols",
"datasource_type": "trino",
"query": "SELECT p.ticker, p.quantity, ROUND(p.avg_entry_price, 2) AS avg_entry, ROUND(p.close_price, 2) AS close_price, ROUND(p.market_value, 2) AS market_value, ROUND(p.unrealized_pnl, 2) AS unrealized_pnl, p.snapshot_at FROM lakehouse.stonks.positions_daily p WHERE p.execution_mode = 'paper' AND p.dt = (SELECT MAX(dt) FROM lakehouse.stonks.positions_daily WHERE execution_mode = 'paper') ORDER BY ABS(p.unrealized_pnl) DESC"
},
{
"slice_name": "Paper Trade Scorecard",
"viz_type": "table",
"description": "Per-symbol paper trading scorecard with win rates, PnL, and order counts",
"datasource_type": "trino",
"query": "SELECT pnl.ticker, COUNT(DISTINCT pnl.dt) AS trading_days, ROUND(SUM(pnl.net_pnl), 2) AS total_net_pnl, ROUND(AVG(pnl.net_pnl), 2) AS avg_daily_pnl, ROUND(CAST(COUNT(CASE WHEN pnl.net_pnl > 0 THEN 1 END) AS DOUBLE) / NULLIF(COUNT(*), 0), 4) AS win_rate, ROUND(MIN(pnl.net_pnl), 2) AS worst_day, ROUND(MAX(pnl.net_pnl), 2) AS best_day, ROUND(SUM(pnl.fees), 2) AS total_fees, MIN(pnl.dt) AS first_trade, MAX(pnl.dt) AS last_trade FROM lakehouse.stonks.pnl_daily pnl WHERE pnl.execution_mode = 'paper' GROUP BY pnl.ticker ORDER BY total_net_pnl DESC"
},
{
"slice_name": "Recent Orders",
"viz_type": "table",
"description": "Most recent paper trade orders with fill details",
"datasource_type": "trino",
"query": "SELECT o.ticker, o.side, o.order_type, o.quantity, ROUND(o.limit_price, 2) AS limit_price, o.status, f.fill_price, f.fill_quantity, f.commission, o.submitted_at, f.filled_at FROM lakehouse.stonks.trade_orders o LEFT JOIN lakehouse.stonks.trade_fills f ON o.order_id = f.order_id AND o.dt = f.dt WHERE o.execution_mode = 'paper' ORDER BY o.submitted_at DESC LIMIT 50"
}
]
}
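`pnl_daily` appears to hold one row per ticker per day (the scorecard groups it by `ticker` and counts `DISTINCT dt`), so portfolio-level figures need a roll-up to one total per `dt` first; the cumulative chart does this with `SUM(SUM(net_pnl)) OVER (ORDER BY dt)`. The Python sketch below mirrors that roll-up and derives a day-level win rate from it. The `(dt, ticker, net_pnl)` row shape and the function names are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date
from typing import Iterable, List, Optional, Tuple

Row = Tuple[date, str, float]  # (dt, ticker, net_pnl); assumed row shape


def portfolio_daily_pnl(rows: Iterable[Row]) -> List[Tuple[date, float, float]]:
    """Roll per-ticker rows up to (dt, daily_net_pnl, cumulative_pnl)."""
    daily = defaultdict(float)
    for dt, _ticker, net_pnl in rows:
        daily[dt] += net_pnl  # SUM(net_pnl) ... GROUP BY dt
    result = []
    running = 0.0
    for dt in sorted(daily):
        running += daily[dt]  # SUM(SUM(net_pnl)) OVER (ORDER BY dt)
        result.append((dt, round(daily[dt], 2), round(running, 2)))
    return result


def day_win_rate(rows: Iterable[Row]) -> Optional[float]:
    """Fraction of trading days whose portfolio-level net PnL is positive."""
    days = portfolio_daily_pnl(rows)
    if not days:
        return None  # NULLIF(COUNT(*), 0) makes the SQL ratio NULL
    wins = sum(1 for _dt, pnl, _cum in days if pnl > 0)
    return round(wins / len(days), 4)
```

For rows `(d1, AAA, 100)`, `(d1, BBB, -40)`, `(d2, AAA, -30)`, `(d2, BBB, -10)`, `(d3, AAA, 25.5)`, the day-level win rate is 2/3, whereas counting raw ticker rows gives 2/5; the two agree only when a single symbol trades per day.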
+125
@@ -0,0 +1,125 @@
{
"dashboard_title": "Prediction Accuracy",
"description": "Predicted signals vs realized price moves, confidence calibration, and model accuracy tracking.",
"slug": "prediction-accuracy",
"position_json": {
"HEADER_ID": {"id": "HEADER_ID", "type": "HEADER", "meta": {"text": "Prediction Accuracy"}},
"ROW-1": {
"type": "ROW",
"children": ["CHART-overall-hit-rate-kpi", "CHART-total-predictions-kpi", "CHART-avg-confidence-kpi", "CHART-avg-move-kpi"]
},
"ROW-2": {
"type": "ROW",
"children": ["CHART-hit-rate-timeseries", "CHART-outcome-distribution-pie"]
},
"ROW-3": {
"type": "ROW",
"children": ["CHART-confidence-calibration", "CHART-confidence-vs-move-scatter"]
},
"ROW-4": {
"type": "ROW",
"children": ["CHART-accuracy-by-symbol", "CHART-accuracy-by-action"]
},
"ROW-5": {
"type": "ROW",
"children": ["CHART-recent-predictions-table"]
}
},
"metadata": {
"refresh_frequency": 600,
"default_filters": "{}",
"color_scheme": "supersetColors"
},
"charts": [
{
"slice_name": "Overall Hit Rate",
"viz_type": "big_number_total",
"description": "Fraction of predictions with correct directional outcome over the last 30 days",
"datasource_type": "trino",
"query": "SELECT ROUND(CAST(COUNT(CASE WHEN outcome = 'correct' THEN 1 END) AS DOUBLE) / NULLIF(COUNT(*), 0), 4) AS hit_rate FROM lakehouse.stonks.prediction_vs_outcome WHERE dt >= CURRENT_DATE - INTERVAL '30' DAY"
},
{
"slice_name": "Total Predictions (30d)",
"viz_type": "big_number_total",
"description": "Total evaluated predictions in the last 30 days",
"datasource_type": "trino",
"query": "SELECT COUNT(*) AS total_predictions FROM lakehouse.stonks.prediction_vs_outcome WHERE dt >= CURRENT_DATE - INTERVAL '30' DAY"
},
{
"slice_name": "Avg Predicted Confidence",
"viz_type": "big_number_total",
"description": "Average confidence of predictions in the last 30 days",
"datasource_type": "trino",
"query": "SELECT ROUND(AVG(predicted_confidence), 3) AS avg_confidence FROM lakehouse.stonks.prediction_vs_outcome WHERE dt >= CURRENT_DATE - INTERVAL '30' DAY"
},
{
"slice_name": "Avg Realized Move",
"viz_type": "big_number_total",
"description": "Average absolute realized price move percentage",
"datasource_type": "trino",
"query": "SELECT ROUND(AVG(ABS(actual_move_pct)), 3) AS avg_abs_move FROM lakehouse.stonks.prediction_vs_outcome WHERE dt >= CURRENT_DATE - INTERVAL '30' DAY"
},
{
"slice_name": "Daily Hit Rate",
"viz_type": "echarts_timeseries_line",
"description": "Daily prediction hit rate over the last 30 days",
"datasource_type": "trino",
"query": "SELECT dt AS bucket, COUNT(*) AS total, COUNT(CASE WHEN outcome = 'correct' THEN 1 END) AS correct, ROUND(CAST(COUNT(CASE WHEN outcome = 'correct' THEN 1 END) AS DOUBLE) / NULLIF(COUNT(*), 0), 4) AS hit_rate FROM lakehouse.stonks.prediction_vs_outcome WHERE dt >= CURRENT_DATE - INTERVAL '30' DAY GROUP BY dt ORDER BY dt"
},
{
"slice_name": "Outcome Distribution",
"viz_type": "pie",
"description": "Breakdown of prediction outcomes (correct, incorrect, neutral) over the last 30 days",
"datasource_type": "trino",
"query": "SELECT outcome, COUNT(*) AS count FROM lakehouse.stonks.prediction_vs_outcome WHERE dt >= CURRENT_DATE - INTERVAL '30' DAY GROUP BY outcome ORDER BY count DESC"
},
{
"slice_name": "Confidence Calibration",
"viz_type": "echarts_timeseries_bar",
"description": "Hit rate by confidence bucket to assess calibration quality",
"datasource_type": "trino",
"query": "SELECT CASE WHEN predicted_confidence >= 0.8 THEN '0.8-1.0 (high)' WHEN predicted_confidence >= 0.6 THEN '0.6-0.8 (medium)' WHEN predicted_confidence >= 0.4 THEN '0.4-0.6 (low)' ELSE '0.0-0.4 (very low)' END AS confidence_bucket, COUNT(*) AS total, COUNT(CASE WHEN outcome = 'correct' THEN 1 END) AS correct, ROUND(CAST(COUNT(CASE WHEN outcome = 'correct' THEN 1 END) AS DOUBLE) / NULLIF(COUNT(*), 0), 4) AS hit_rate FROM lakehouse.stonks.prediction_vs_outcome WHERE dt >= CURRENT_DATE - INTERVAL '30' DAY GROUP BY 1 ORDER BY 1",
"params": {
"x_axis": "confidence_bucket",
"metrics": ["hit_rate"]
}
},
{
"slice_name": "Confidence vs Realized Move",
"viz_type": "echarts_timeseries_scatter",
"description": "Scatter plot of predicted confidence vs actual realized move percentage",
"datasource_type": "trino",
"query": "SELECT ticker, predicted_confidence, actual_move_pct, predicted_action, outcome, dt FROM lakehouse.stonks.prediction_vs_outcome WHERE dt >= CURRENT_DATE - INTERVAL '30' DAY ORDER BY dt DESC",
"params": {
"x_axis": "predicted_confidence",
"y_axis": "actual_move_pct",
"groupby": ["outcome"]
}
},
{
"slice_name": "Accuracy by Symbol",
"viz_type": "table",
"description": "Per-symbol prediction accuracy summary",
"datasource_type": "trino",
"query": "SELECT ticker, COUNT(*) AS predictions, COUNT(CASE WHEN outcome = 'correct' THEN 1 END) AS correct, COUNT(CASE WHEN outcome = 'incorrect' THEN 1 END) AS incorrect, ROUND(CAST(COUNT(CASE WHEN outcome = 'correct' THEN 1 END) AS DOUBLE) / NULLIF(COUNT(*), 0), 4) AS hit_rate, ROUND(AVG(predicted_confidence), 3) AS avg_confidence, ROUND(AVG(actual_move_pct), 3) AS avg_move_pct, ROUND(AVG(ABS(actual_move_pct)), 3) AS avg_abs_move_pct FROM lakehouse.stonks.prediction_vs_outcome WHERE dt >= CURRENT_DATE - INTERVAL '30' DAY GROUP BY ticker ORDER BY hit_rate DESC"
},
{
"slice_name": "Accuracy by Action Type",
"viz_type": "echarts_timeseries_bar",
"description": "Hit rate broken down by predicted action (buy, sell, hold, watch)",
"datasource_type": "trino",
"query": "SELECT predicted_action, COUNT(*) AS total, COUNT(CASE WHEN outcome = 'correct' THEN 1 END) AS correct, ROUND(CAST(COUNT(CASE WHEN outcome = 'correct' THEN 1 END) AS DOUBLE) / NULLIF(COUNT(*), 0), 4) AS hit_rate, ROUND(AVG(predicted_confidence), 3) AS avg_confidence FROM lakehouse.stonks.prediction_vs_outcome WHERE dt >= CURRENT_DATE - INTERVAL '30' DAY GROUP BY predicted_action ORDER BY predicted_action",
"params": {
"x_axis": "predicted_action",
"metrics": ["hit_rate"]
}
},
{
"slice_name": "Recent Predictions",
"viz_type": "table",
"description": "Most recent evaluated predictions with outcomes",
"datasource_type": "trino",
"query": "SELECT ticker, predicted_action, ROUND(predicted_confidence, 3) AS confidence, ROUND(actual_move_pct, 3) AS actual_move_pct, outcome, horizon_days, model_version, predicted_at, evaluated_at FROM lakehouse.stonks.prediction_vs_outcome WHERE dt >= CURRENT_DATE - INTERVAL '14' DAY ORDER BY evaluated_at DESC LIMIT 50"
}
]
}
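The calibration chart groups predictions at the `CASE` cut points 0.4, 0.6, and 0.8 and computes a hit rate per bucket. The sketch below restates that logic in Python so bucket boundaries can be unit-tested; the `(predicted_confidence, outcome)` input shape and function names are assumptions, and a `None` confidence falls into the lowest bucket, matching how a NULL falls through to the SQL `ELSE` branch.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def confidence_bucket(conf: Optional[float]) -> str:
    """Bucket label using the same cut points as the chart's CASE expression."""
    if conf is None:
        return "0.0-0.4 (very low)"  # NULL fails every WHEN and lands in ELSE
    if conf >= 0.8:
        return "0.8-1.0 (high)"
    if conf >= 0.6:
        return "0.6-0.8 (medium)"
    if conf >= 0.4:
        return "0.4-0.6 (low)"
    return "0.0-0.4 (very low)"


def calibration(
    predictions: Iterable[Tuple[Optional[float], str]]
) -> Dict[str, Tuple[int, int, float]]:
    """Map bucket -> (total, correct, hit_rate) over (confidence, outcome) pairs."""
    counts: Dict[str, List[int]] = {}
    for conf, outcome in predictions:
        c = counts.setdefault(confidence_bucket(conf), [0, 0])
        c[0] += 1                          # COUNT(*)
        c[1] += int(outcome == "correct")  # COUNT(CASE WHEN outcome = 'correct' ...)
    return {b: (t, k, round(k / t, 4)) for b, (t, k) in sorted(counts.items())}
```

For a well-calibrated model the hit rate should rise from the `very low` bucket to the `high` bucket; `neutral` outcomes count in each bucket's denominator, as they do in the SQL.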
+120
@@ -0,0 +1,120 @@
{
"dashboard_title": "Sentiment Heatmap",
"description": "Market-wide sentiment visualization by sector and symbol, with trend direction and catalyst analysis.",
"slug": "sentiment-heatmap",
"position_json": {
"HEADER_ID": {"id": "HEADER_ID", "type": "HEADER", "meta": {"text": "Sentiment Heatmap"}},
"ROW-1": {
"type": "ROW",
"children": ["CHART-bullish-count-kpi", "CHART-bearish-count-kpi", "CHART-mixed-count-kpi", "CHART-avg-contradiction-kpi"]
},
"ROW-2": {
"type": "ROW",
"children": ["CHART-sentiment-heatmap"]
},
"ROW-3": {
"type": "ROW",
"children": ["CHART-sentiment-timeseries", "CHART-catalyst-breakdown"]
},
"ROW-4": {
"type": "ROW",
"children": ["CHART-contradiction-scatter", "CHART-sentiment-distribution"]
},
"ROW-5": {
"type": "ROW",
"children": ["CHART-symbol-sentiment-detail"]
}
},
"metadata": {
"refresh_frequency": 300,
"default_filters": "{}",
"color_scheme": "supersetColors"
},
"charts": [
{
"slice_name": "Bullish Signals (7d)",
"viz_type": "big_number_total",
"description": "Count of bullish trend signals in the last 7 days",
"datasource_type": "trino",
"query": "SELECT COUNT(*) AS bullish_count FROM lakehouse.stonks.trade_signals WHERE trend_direction = 'bullish' AND dt >= CURRENT_DATE - INTERVAL '7' DAY"
},
{
"slice_name": "Bearish Signals (7d)",
"viz_type": "big_number_total",
"description": "Count of bearish trend signals in the last 7 days",
"datasource_type": "trino",
"query": "SELECT COUNT(*) AS bearish_count FROM lakehouse.stonks.trade_signals WHERE trend_direction = 'bearish' AND dt >= CURRENT_DATE - INTERVAL '7' DAY"
},
{
"slice_name": "Mixed Signals (7d)",
"viz_type": "big_number_total",
"description": "Count of mixed or neutral trend signals in the last 7 days",
"datasource_type": "trino",
"query": "SELECT COUNT(*) AS mixed_count FROM lakehouse.stonks.trade_signals WHERE trend_direction IN ('mixed', 'neutral') AND dt >= CURRENT_DATE - INTERVAL '7' DAY"
},
{
"slice_name": "Avg Contradiction Score (7d)",
"viz_type": "big_number_total",
"description": "Average contradiction score across all signals in the last 7 days",
"datasource_type": "trino",
"query": "SELECT ROUND(AVG(contradiction_score), 3) AS avg_contradiction FROM lakehouse.stonks.trade_signals WHERE dt >= CURRENT_DATE - INTERVAL '7' DAY"
},
{
"slice_name": "Sentiment Heatmap by Symbol",
"viz_type": "heatmap",
"description": "Daily average sentiment score (positive = 1, negative = -1, otherwise 0) by symbol over the last 14 days",
"datasource_type": "trino",
"query": "SELECT de.ticker, de.dt, ROUND(AVG(de.impact_score), 3) AS avg_impact, AVG(CASE WHEN de.sentiment = 'positive' THEN 1.0 WHEN de.sentiment = 'negative' THEN -1.0 ELSE 0.0 END) AS sentiment_score FROM lakehouse.stonks.document_extractions de WHERE de.dt >= CURRENT_DATE - INTERVAL '14' DAY GROUP BY de.ticker, de.dt ORDER BY de.ticker, de.dt",
"params": {
"x_axis": "dt",
"y_axis": "ticker",
"metric": "sentiment_score"
}
},
{
"slice_name": "Sentiment Trend Over Time",
"viz_type": "echarts_timeseries_line",
"description": "Daily average sentiment score across all symbols over the last 30 days",
"datasource_type": "trino",
"query": "SELECT de.dt AS bucket, ROUND(AVG(CASE WHEN de.sentiment = 'positive' THEN 1.0 WHEN de.sentiment = 'negative' THEN -1.0 ELSE 0.0 END), 3) AS avg_sentiment, COUNT(*) AS extraction_count FROM lakehouse.stonks.document_extractions de WHERE de.dt >= CURRENT_DATE - INTERVAL '30' DAY GROUP BY de.dt ORDER BY de.dt"
},
{
"slice_name": "Catalyst Type Breakdown",
"viz_type": "pie",
"description": "Distribution of catalyst types across extractions in the last 14 days",
"datasource_type": "trino",
"query": "SELECT catalyst_type, COUNT(*) AS count FROM lakehouse.stonks.document_extractions WHERE dt >= CURRENT_DATE - INTERVAL '14' DAY AND catalyst_type IS NOT NULL GROUP BY catalyst_type ORDER BY count DESC"
},
{
"slice_name": "Contradiction vs Confidence",
"viz_type": "echarts_timeseries_scatter",
"description": "Scatter plot of contradiction score vs confidence for signals in the last 14 days",
"datasource_type": "trino",
"query": "SELECT ticker, confidence, contradiction_score, trend_strength, trend_direction, dt FROM lakehouse.stonks.trade_signals WHERE dt >= CURRENT_DATE - INTERVAL '14' DAY ORDER BY dt DESC",
"params": {
"x_axis": "confidence",
"y_axis": "contradiction_score",
"groupby": ["trend_direction"]
}
},
{
"slice_name": "Sentiment Distribution by Symbol",
"viz_type": "echarts_timeseries_bar",
"description": "Count of positive, negative, and neutral extractions per symbol in the last 14 days",
"datasource_type": "trino",
"query": "SELECT ticker, sentiment, COUNT(*) AS count FROM lakehouse.stonks.document_extractions WHERE dt >= CURRENT_DATE - INTERVAL '14' DAY GROUP BY ticker, sentiment ORDER BY ticker, sentiment",
"params": {
"x_axis": "ticker",
"metrics": ["count"],
"groupby": ["sentiment"]
}
},
{
"slice_name": "Symbol Sentiment Detail",
"viz_type": "table",
"description": "Per-symbol sentiment summary with extraction counts, sentiment breakdown, average impact, confidence, and novelty, plus the latest trend signal",
"datasource_type": "trino",
"query": "SELECT de.ticker, COUNT(*) AS extractions, ROUND(AVG(de.impact_score), 3) AS avg_impact, ROUND(AVG(de.confidence), 3) AS avg_confidence, ROUND(AVG(de.novelty_score), 3) AS avg_novelty, COUNT(CASE WHEN de.sentiment = 'positive' THEN 1 END) AS positive_count, COUNT(CASE WHEN de.sentiment = 'negative' THEN 1 END) AS negative_count, COUNT(CASE WHEN de.sentiment = 'neutral' THEN 1 END) AS neutral_count, ls.latest_trend, ls.latest_trend_strength FROM lakehouse.stonks.document_extractions de LEFT JOIN (SELECT ticker, max_by(trend_direction, generated_at) AS latest_trend, max_by(trend_strength, generated_at) AS latest_trend_strength FROM lakehouse.stonks.trade_signals GROUP BY ticker) ls ON de.ticker = ls.ticker WHERE de.dt >= CURRENT_DATE - INTERVAL '14' DAY GROUP BY de.ticker, ls.latest_trend, ls.latest_trend_strength ORDER BY de.ticker"
}
]
}
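The heatmap's `sentiment_score` metric averages a per-extraction weight rather than a raw label. A minimal Python sketch of that calculation, assuming rows shaped as hypothetical `(ticker, dt, sentiment)` tuples; the sample tickers and dates are illustrative only, not real data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Weights mirroring the heatmap's CASE expression:
# 'positive' -> +1.0, 'negative' -> -1.0, anything else (incl. NULL) -> 0.0.
SENTIMENT_WEIGHTS = {"positive": 1.0, "negative": -1.0}

Row = Tuple[str, str, Optional[str]]  # hypothetical shape: (ticker, dt, sentiment)


def sentiment_value(label: Optional[str]) -> float:
    """Map one extraction's sentiment label to its numeric weight."""
    return SENTIMENT_WEIGHTS.get(label or "", 0.0)


def daily_sentiment(rows: Iterable[Row]) -> Dict[Tuple[str, str], float]:
    """Average sentiment per (ticker, dt), like GROUP BY de.ticker, de.dt.

    Every row counts toward the denominator, as AVG(CASE ... ELSE 0.0 END) does.
    """
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for ticker, dt, label in rows:
        key = (ticker, dt)
        totals[key] += sentiment_value(label)
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Hypothetical sample rows, not real data.
rows = [
    ("AAPL", "2026-04-10", "positive"),
    ("AAPL", "2026-04-10", "negative"),
    ("AAPL", "2026-04-10", "positive"),
    ("MSFT", "2026-04-10", "neutral"),
]
scores = daily_sentiment(rows)
print(round(scores[("AAPL", "2026-04-10")], 3))  # 0.333
print(scores[("MSFT", "2026-04-10")])  # 0.0
```

Because neutral and unlabeled extractions contribute 0 but still count toward the average, a symbol with many neutral documents is pulled toward 0 even if its few labeled ones are strongly positive.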
+104
@@ -0,0 +1,104 @@
{
"dashboard_title": "Symbol Overview",
"description": "Company profiles, source health, recent documents, and market snapshot for tracked symbols.",
"slug": "symbol-overview",
"position_json": {
"HEADER_ID": {"id": "HEADER_ID", "type": "HEADER", "meta": {"text": "Symbol Overview"}},
"ROW-1": {
"type": "ROW",
"children": ["CHART-tracked-symbols-kpi", "CHART-total-documents-kpi", "CHART-total-extractions-kpi", "CHART-active-signals-kpi"]
},
"ROW-2": {
"type": "ROW",
"children": ["CHART-company-summary-table"]
},
"ROW-3": {
"type": "ROW",
"children": ["CHART-recent-documents-timeseries", "CHART-document-type-breakdown"]
},
"ROW-4": {
"type": "ROW",
"children": ["CHART-latest-prices-table"]
},
"ROW-5": {
"type": "ROW",
"children": ["CHART-recent-documents-table"]
}
},
"metadata": {
"refresh_frequency": 300,
"default_filters": "{}",
"color_scheme": "supersetColors"
},
"charts": [
{
"slice_name": "Tracked Symbols",
"viz_type": "big_number_total",
"description": "Count of distinct symbols with documents in the last 30 days",
"datasource_type": "trino",
"query": "SELECT COUNT(DISTINCT ticker) AS tracked_symbols FROM lakehouse.stonks.documents WHERE dt >= CURRENT_DATE - INTERVAL '30' DAY"
},
{
"slice_name": "Total Documents (30d)",
"viz_type": "big_number_total",
"description": "Total documents ingested in the last 30 days",
"datasource_type": "trino",
"query": "SELECT COUNT(*) AS total_documents FROM lakehouse.stonks.documents WHERE dt >= CURRENT_DATE - INTERVAL '30' DAY"
},
{
"slice_name": "Total Extractions (30d)",
"viz_type": "big_number_total",
"description": "Total AI extractions completed in the last 30 days",
"datasource_type": "trino",
"query": "SELECT COUNT(*) AS total_extractions FROM lakehouse.stonks.document_extractions WHERE dt >= CURRENT_DATE - INTERVAL '30' DAY"
},
{
"slice_name": "Active Signals (7d)",
"viz_type": "big_number_total",
"description": "Trade signals generated in the last 7 days",
"datasource_type": "trino",
"query": "SELECT COUNT(*) AS active_signals FROM lakehouse.stonks.trade_signals WHERE dt >= CURRENT_DATE - INTERVAL '7' DAY"
},
{
"slice_name": "Company Summary",
"viz_type": "table",
"description": "Per-symbol summary with document counts, extraction counts, latest signal, and latest price",
"datasource_type": "trino",
"query": "WITH docs AS (SELECT ticker, COUNT(DISTINCT document_id) AS documents_30d, MAX(published_at) AS latest_document_at FROM lakehouse.stonks.documents WHERE dt >= CURRENT_DATE - INTERVAL '30' DAY GROUP BY ticker), ext AS (SELECT ticker, COUNT(DISTINCT document_id) AS extractions_30d FROM lakehouse.stonks.document_extractions WHERE dt >= CURRENT_DATE - INTERVAL '30' DAY GROUP BY ticker), sig AS (SELECT ticker, MAX(generated_at) AS latest_signal_at, max_by(trend_direction, generated_at) AS latest_trend FROM lakehouse.stonks.trade_signals GROUP BY ticker), px AS (SELECT ticker, max_by(close_price, bar_timestamp) AS latest_close FROM lakehouse.stonks.market_bars GROUP BY ticker) SELECT docs.ticker, docs.documents_30d, COALESCE(ext.extractions_30d, 0) AS extractions_30d, docs.latest_document_at, sig.latest_signal_at, sig.latest_trend, px.latest_close FROM docs LEFT JOIN ext ON docs.ticker = ext.ticker LEFT JOIN sig ON docs.ticker = sig.ticker LEFT JOIN px ON docs.ticker = px.ticker ORDER BY docs.ticker"
},
{
"slice_name": "Documents Ingested Over Time",
"viz_type": "echarts_timeseries_bar",
"description": "Daily document ingestion counts by source type over the last 30 days",
"datasource_type": "trino",
"query": "SELECT dt AS bucket, source_type, COUNT(*) AS doc_count FROM lakehouse.stonks.documents WHERE dt >= CURRENT_DATE - INTERVAL '30' DAY GROUP BY dt, source_type ORDER BY dt",
"params": {
"x_axis": "bucket",
"metrics": ["doc_count"],
"groupby": ["source_type"],
"time_grain_sqla": "P1D"
}
},
{
"slice_name": "Document Type Breakdown",
"viz_type": "pie",
"description": "Distribution of documents by type in the last 30 days",
"datasource_type": "trino",
"query": "SELECT document_type, COUNT(*) AS count FROM lakehouse.stonks.documents WHERE dt >= CURRENT_DATE - INTERVAL '30' DAY GROUP BY document_type ORDER BY count DESC"
},
{
"slice_name": "Latest Prices by Symbol",
"viz_type": "table",
"description": "Most recent bar for each tracked symbol: OHLC prices, volume, and VWAP",
"datasource_type": "trino",
"query": "SELECT mb.ticker, mb.close_price, mb.open_price, mb.high_price, mb.low_price, mb.volume, mb.vwap, mb.bar_timestamp FROM lakehouse.stonks.market_bars mb INNER JOIN (SELECT ticker, MAX(bar_timestamp) AS max_ts FROM lakehouse.stonks.market_bars GROUP BY ticker) latest ON mb.ticker = latest.ticker AND mb.bar_timestamp = latest.max_ts ORDER BY mb.ticker"
},
{
"slice_name": "Recent Documents",
"viz_type": "table",
"description": "The 50 most recently retrieved documents across all symbols from the last 7 days",
"datasource_type": "trino",
"query": "SELECT ticker, document_type, source_type, title, publisher, published_at, retrieved_at, confidence FROM lakehouse.stonks.documents WHERE dt >= CURRENT_DATE - INTERVAL '7' DAY ORDER BY retrieved_at DESC LIMIT 50"
}
]
}
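The "latest" columns in these dashboards are meant to take each value from the most recent row per ticker. Taking `MAX()` over a text column such as `trend_direction` would instead return the alphabetically greatest label. A minimal Python sketch of the intended semantics, using hypothetical sample signal rows; in Trino the same idea is commonly expressed with the `max_by(value, timestamp)` aggregate.

```python
from typing import Any, Dict, Iterable, Mapping, Tuple


def latest_per_ticker(
    rows: Iterable[Mapping[str, Any]], ts_key: str, value_key: str
) -> Dict[str, Any]:
    """For each ticker, return value_key from the row with the greatest ts_key.

    Comparable to Trino's max_by(value, ts) with GROUP BY ticker. On ties the
    first row seen wins (Trino makes no such ordering guarantee).
    """
    best: Dict[str, Tuple[Any, Any]] = {}
    for row in rows:
        ticker, ts = row["ticker"], row[ts_key]
        if ticker not in best or ts > best[ticker][0]:
            best[ticker] = (ts, row[value_key])
    return {ticker: value for ticker, (_, value) in best.items()}


# Hypothetical sample signals; ISO-8601 strings sort chronologically.
signals = [
    {"ticker": "AAPL", "generated_at": "2026-04-10T14:00:00", "trend_direction": "neutral"},
    {"ticker": "AAPL", "generated_at": "2026-04-11T09:30:00", "trend_direction": "bearish"},
    {"ticker": "MSFT", "generated_at": "2026-04-11T10:00:00", "trend_direction": "bullish"},
]

print(latest_per_ticker(signals, "generated_at", "trend_direction"))
# {'AAPL': 'bearish', 'MSFT': 'bullish'}
print(max(s["trend_direction"] for s in signals if s["ticker"] == "AAPL"))
# neutral  (alphabetical maximum, not the latest signal)
```

The same reasoning applies to prices: the latest close is the `close_price` of the bar with the greatest `bar_timestamp`, not the largest close among recent bars.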