33 lines
1.5 KiB
Markdown
33 lines
1.5 KiB
Markdown
# Lakehouse Schemas
|
|
|
|
Analytical fact table definitions for MinIO-backed datasets queried via Trino.
|
|
|
|
All tables use Hive-compatible partition layouts on MinIO (`s3a://stonks-lakehouse/warehouse/`)
|
|
and are defined in the `lakehouse.stonks` schema. Parquet is the storage format.
|
|
|
|
## Fact Tables
|
|
- `lake.market_bars` — OHLCV bar data per symbol per interval
|
|
- `lake.market_quotes` — bid/ask quote snapshots
|
|
- `lake.company_events` — corporate actions, earnings, filings, and issuer events
|
|
- `lake.documents` — ingested document metadata (articles, filings, transcripts)
|
|
- `lake.document_extractions` — AI extraction outputs per document per company
|
|
- `lake.trade_signals` — aggregated trend signals and recommendation actions
|
|
- `lake.trade_orders` — order submission records (paper and live)
|
|
- `lake.trade_fills` — fill and execution records from broker
|
|
- `lake.positions_daily` — end-of-day position snapshots
|
|
- `lake.pnl_daily` — daily PnL records per symbol per account
|
|
- `lake.prediction_vs_outcome` — prediction accuracy tracking
|
|
- `lake.model_performance` — extraction model performance metrics
|
|
|
|
## Partitioning
|
|
- Most tables partition by `dt` (date)
|
|
- `document_extractions`, `prediction_vs_outcome`, and `model_performance` also partition by `model_version`
|
|
|
|
## Trino Catalogs
|
|
- `lakehouse` catalog (Hive connector) for external Hive-compatible tables
|
|
- `iceberg` catalog (Iceberg connector) for managed Iceberg tables
|
|
|
|
## Views
|
|
Example SQL views for dashboards and ad hoc analysis are in `lakehouse/views/`.
|
|
See `lakehouse/views/README.md` for details.
|