phase 0+1: project scaffold, k8s manifests, CI pipeline, steering, hooks, tests

- Repository structure for all services, infra, lakehouse, dashboards
- K8s manifests targeting stonks-oracle namespace with GHCR images
- Ingress via Traefik with ca-issuer TLS for internal services
- ConfigMap wired to existing cluster services (pg, redis, minio, ollama)
- GitHub Actions workflow for lint, test, multi-service container builds
- Dockerfile with build-arg CMD per service
- Makefile for local build/push/deploy
- Steering rules for TDD workflow, K8s conventions, project context
- Agent hooks for lint-on-save, test-on-save, k8s-validate, phase-commit
- Ruff linter config, all lint issues fixed
- 14 passing tests for schemas, config, redis keys
- PostgreSQL migrations, Trino catalogs, Superset config, MinIO lifecycle
Celes Renata
2026-04-11 03:25:08 -07:00
parent 8cfc4f423b
commit ebea70573b
90 changed files with 3590 additions and 19 deletions
@@ -0,0 +1,40 @@
# Kubernetes Manifests — Stonks Oracle
All manifests target the `stonks-oracle` namespace.
## Prerequisites (already running in cluster)
- `postgresql-service` — PostgreSQL
- `redis-service` — Redis
- `minio-service` / `minio-operator` — MinIO
- `ollama-service` — Ollama LLM
## Shared Configuration
- `namespace.yaml` — namespace definition
- `configmap.yaml` — environment config referencing existing cluster services
- `secrets.yaml` — credentials (update before deploying)
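
As a rough illustration of how a service might consume these settings, the sketch below reads the PostgreSQL values from the environment. The key names come from `configmap.yaml` and `secrets.yaml` in this directory. The `PostgresSettings` class and `load_postgres_settings` function are hypothetical names used only for this example, not part of the project code.

```python
import os
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class PostgresSettings:
    """Hypothetical container for the PostgreSQL values injected via envFrom."""

    host: str
    port: int
    database: str
    user: str
    password: str

    def dsn(self) -> str:
        # Percent-encode credentials so special characters survive in the URL.
        user = quote(self.user, safe="")
        password = quote(self.password, safe="")
        return f"postgresql://{user}:{password}@{self.host}:{self.port}/{self.database}"


def load_postgres_settings(env=None) -> PostgresSettings:
    """Build settings from a mapping; defaults to the process environment."""
    env = os.environ if env is None else env
    return PostgresSettings(
        host=env["POSTGRES_HOST"],
        port=int(env.get("POSTGRES_PORT", "5432")),
        database=env["POSTGRES_DB"],
        user=env["POSTGRES_USER"],
        password=env["POSTGRES_PASSWORD"],  # supplied by the Secret, not the ConfigMap
    )
```

Passing a plain dict instead of `os.environ` keeps the loader easy to unit test without touching the real environment.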
## Application Workloads
- `symbol-registry.yaml` — company/watchlist/source management API
- `scheduler.yaml` — polling orchestrator
- `ingestion-worker.yaml` — fetches external data, stores raw artifacts
- `parser-worker.yaml` — HTML-to-text, normalization, quality scoring
- `extractor-worker.yaml` — Ollama structured extraction
- `aggregation-worker.yaml` — trend summaries and signal aggregation
- `recommendation-worker.yaml` — trade recommendation generation
- `risk-engine.yaml` — risk controls and trade eligibility API
- `broker-adapter.yaml` — paper/live trading adapter
- `lake-publisher.yaml` — operational-to-analytical fact publisher
- `query-api.yaml` — analytics and admin API
## Analytics Infrastructure
- `hive-metastore.yaml` — Hive Metastore for Trino catalog
- `trino.yaml` — SQL query engine with Hive + Iceberg catalogs
- `superset.yaml` — dashboard and exploration layer
## Deploy
```bash
kubectl apply -f infra/k8s/namespace.yaml
kubectl apply -f infra/k8s/configmap.yaml
kubectl apply -f infra/k8s/secrets.yaml
kubectl apply -f infra/k8s/
```
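
The same ordering can be scripted. This is a minimal sketch, assuming the manifests live under `infra/k8s` as in the commands above. `apply_commands` is a hypothetical helper that only builds the command lists and leaves execution to the caller.

```python
from pathlib import PurePosixPath

# Manifests that other resources depend on, applied first and in this order.
PRIORITY = ["namespace.yaml", "configmap.yaml", "secrets.yaml"]


def apply_commands(manifest_dir: str = "infra/k8s") -> list[list[str]]:
    """Return kubectl invocations: prerequisites first, then the whole directory."""
    base = PurePosixPath(manifest_dir)
    commands = [["kubectl", "apply", "-f", str(base / name)] for name in PRIORITY]
    # Applying the directory last picks up every remaining manifest;
    # re-applying the first three is harmless because `kubectl apply` is idempotent.
    commands.append(["kubectl", "apply", "-f", f"{base}/"])
    return commands


if __name__ == "__main__":
    for command in apply_commands():
        print(" ".join(command))
```

To actually run the commands, each list could be passed to `subprocess.run(command, check=True)` so that a failed apply stops the sequence.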
@@ -0,0 +1,34 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: aggregation-worker
  namespace: stonks-oracle
  labels:
    app: aggregation-worker
    app.kubernetes.io/part-of: stonks-oracle
spec:
  replicas: 1
  selector:
    matchLabels:
      app: aggregation-worker
  template:
    metadata:
      labels:
        app: aggregation-worker
    spec:
      containers:
        - name: aggregation-worker
          image: ghcr.io/celesrenata/stonks-oracle/aggregation:latest
          imagePullPolicy: Always
          envFrom:
            - configMapRef:
                name: stonks-config
            - secretRef:
                name: stonks-secrets
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi
@@ -0,0 +1,34 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: broker-adapter
  namespace: stonks-oracle
  labels:
    app: broker-adapter
    app.kubernetes.io/part-of: stonks-oracle
spec:
  replicas: 1
  selector:
    matchLabels:
      app: broker-adapter
  template:
    metadata:
      labels:
        app: broker-adapter
    spec:
      containers:
        - name: broker-adapter
          image: ghcr.io/celesrenata/stonks-oracle/broker-adapter:latest
          imagePullPolicy: Always
          envFrom:
            - configMapRef:
                name: stonks-config
            - secretRef:
                name: stonks-secrets
          resources:
            requests:
              cpu: 50m
              memory: 64Mi
            limits:
              cpu: 200m
              memory: 128Mi
@@ -0,0 +1,39 @@
apiVersion: v1
kind: ConfigMap
metadata:
  name: stonks-config
  namespace: stonks-oracle
  labels:
    app.kubernetes.io/part-of: stonks-oracle
data:
  # PostgreSQL — existing cluster service
  POSTGRES_HOST: "postgresql-rw.postgresql-service.svc.cluster.local"
  POSTGRES_PORT: "5432"
  POSTGRES_DB: "stonks"
  POSTGRES_USER: "stonks"

  # Redis — existing cluster service
  REDIS_HOST: "redis-master.redis-service.svc.cluster.local"
  REDIS_PORT: "6379"
  REDIS_DB: "0"

  # MinIO — existing cluster service
  MINIO_ENDPOINT: "minio.minio-service.svc.cluster.local:80"
  MINIO_SECURE: "false"

  # Ollama — existing cluster service
  OLLAMA_BASE_URL: "http://ollama.ollama-service.svc.cluster.local:11434"
  OLLAMA_MODEL: "llama3.1:8b"
  OLLAMA_TIMEOUT: "120"

  # Trino — deployed in stonks-oracle namespace
  TRINO_HOST: "trino.stonks-oracle.svc.cluster.local"
  TRINO_PORT: "8080"
  TRINO_CATALOG: "lakehouse"
  TRINO_SCHEMA: "stonks"

  # Broker
  BROKER_MODE: "paper"

  # General
  LOG_LEVEL: "INFO"
@@ -0,0 +1,34 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: extractor-worker
  namespace: stonks-oracle
  labels:
    app: extractor-worker
    app.kubernetes.io/part-of: stonks-oracle
spec:
  replicas: 1
  selector:
    matchLabels:
      app: extractor-worker
  template:
    metadata:
      labels:
        app: extractor-worker
    spec:
      containers:
        - name: extractor-worker
          image: ghcr.io/celesrenata/stonks-oracle/extractor:latest
          imagePullPolicy: Always
          envFrom:
            - configMapRef:
                name: stonks-config
            - secretRef:
                name: stonks-secrets
          resources:
            requests:
              cpu: 200m
              memory: 256Mi
            limits:
              cpu: "1"
              memory: 512Mi
@@ -0,0 +1,68 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hive-metastore
  namespace: stonks-oracle
  labels:
    app: hive-metastore
    app.kubernetes.io/part-of: stonks-oracle
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hive-metastore
  template:
    metadata:
      labels:
        app: hive-metastore
    spec:
      containers:
        - name: hive-metastore
          image: apache/hive:4.0.0
          ports:
            - containerPort: 9083
          env:
            - name: SERVICE_NAME
              value: metastore
            - name: DB_DRIVER
              value: derby
            - name: SERVICE_OPTS
              value: "-Djavax.jdo.option.ConnectionURL=jdbc:derby:/opt/hive/data/metastore_db;create=true"
          volumeMounts:
            - name: hive-data
              mountPath: /opt/hive/data
          resources:
            requests:
              cpu: 200m
              memory: 512Mi
            limits:
              cpu: "1"
              memory: 1Gi
      volumes:
        - name: hive-data
          persistentVolumeClaim:
            claimName: hive-metastore-data
---
apiVersion: v1
kind: Service
metadata:
  name: hive-metastore
  namespace: stonks-oracle
spec:
  selector:
    app: hive-metastore
  ports:
    - port: 9083
      targetPort: 9083
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: hive-metastore-data
  namespace: stonks-oracle
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
@@ -0,0 +1,34 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ingestion-worker
  namespace: stonks-oracle
  labels:
    app: ingestion-worker
    app.kubernetes.io/part-of: stonks-oracle
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ingestion-worker
  template:
    metadata:
      labels:
        app: ingestion-worker
    spec:
      containers:
        - name: ingestion-worker
          image: ghcr.io/celesrenata/stonks-oracle/ingestion:latest
          imagePullPolicy: Always
          envFrom:
            - configMapRef:
                name: stonks-config
            - secretRef:
                name: stonks-secrets
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi
@@ -0,0 +1,99 @@
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: stonks-query-api-https
  namespace: stonks-oracle
  annotations:
    cert-manager.io/cluster-issuer: ca-issuer
spec:
  ingressClassName: traefik
  tls:
    - hosts:
        - stonks-api.celestium.life
      secretName: stonks-api-tls
  rules:
    - host: stonks-api.celestium.life
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: query-api
                port:
                  number: 8000
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: stonks-registry-https
  namespace: stonks-oracle
  annotations:
    cert-manager.io/cluster-issuer: ca-issuer
spec:
  ingressClassName: traefik
  tls:
    - hosts:
        - stonks-registry.celestium.life
      secretName: stonks-registry-tls
  rules:
    - host: stonks-registry.celestium.life
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: symbol-registry-api
                port:
                  number: 8000
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: stonks-superset-https
  namespace: stonks-oracle
  annotations:
    cert-manager.io/cluster-issuer: ca-issuer
spec:
  ingressClassName: traefik
  tls:
    - hosts:
        - stonks-dash.celestium.life
      secretName: stonks-dash-tls
  rules:
    - host: stonks-dash.celestium.life
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: superset
                port:
                  number: 8088
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: stonks-trino-https
  namespace: stonks-oracle
  annotations:
    cert-manager.io/cluster-issuer: ca-issuer
spec:
  ingressClassName: traefik
  tls:
    - hosts:
        - stonks-trino.celestium.life
      secretName: stonks-trino-tls
  rules:
    - host: stonks-trino.celestium.life
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: trino
                port:
                  number: 8080
@@ -0,0 +1,34 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: lake-publisher
  namespace: stonks-oracle
  labels:
    app: lake-publisher
    app.kubernetes.io/part-of: stonks-oracle
spec:
  replicas: 1
  selector:
    matchLabels:
      app: lake-publisher
  template:
    metadata:
      labels:
        app: lake-publisher
    spec:
      containers:
        - name: lake-publisher
          image: ghcr.io/celesrenata/stonks-oracle/lake-publisher:latest
          imagePullPolicy: Always
          envFrom:
            - configMapRef:
                name: stonks-config
            - secretRef:
                name: stonks-secrets
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi
@@ -0,0 +1,6 @@
apiVersion: v1
kind: Namespace
metadata:
  name: stonks-oracle
  labels:
    app.kubernetes.io/part-of: stonks-oracle
@@ -0,0 +1,34 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: parser-worker
  namespace: stonks-oracle
  labels:
    app: parser-worker
    app.kubernetes.io/part-of: stonks-oracle
spec:
  replicas: 2
  selector:
    matchLabels:
      app: parser-worker
  template:
    metadata:
      labels:
        app: parser-worker
    spec:
      containers:
        - name: parser-worker
          image: ghcr.io/celesrenata/stonks-oracle/parser:latest
          imagePullPolicy: Always
          envFrom:
            - configMapRef:
                name: stonks-config
            - secretRef:
                name: stonks-secrets
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi
@@ -0,0 +1,54 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: query-api
  namespace: stonks-oracle
  labels:
    app: query-api
    app.kubernetes.io/part-of: stonks-oracle
spec:
  replicas: 1
  selector:
    matchLabels:
      app: query-api
  template:
    metadata:
      labels:
        app: query-api
    spec:
      containers:
        - name: query-api
          image: ghcr.io/celesrenata/stonks-oracle/query-api:latest
          imagePullPolicy: Always
          ports:
            - containerPort: 8000
          envFrom:
            - configMapRef:
                name: stonks-config
            - secretRef:
                name: stonks-secrets
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi
          readinessProbe:
            httpGet:
              path: /docs
              port: 8000
            initialDelaySeconds: 5
            periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: query-api
  namespace: stonks-oracle
spec:
  selector:
    app: query-api
  ports:
    - port: 8000
      targetPort: 8000
@@ -0,0 +1,34 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: recommendation-worker
  namespace: stonks-oracle
  labels:
    app: recommendation-worker
    app.kubernetes.io/part-of: stonks-oracle
spec:
  replicas: 1
  selector:
    matchLabels:
      app: recommendation-worker
  template:
    metadata:
      labels:
        app: recommendation-worker
    spec:
      containers:
        - name: recommendation-worker
          image: ghcr.io/celesrenata/stonks-oracle/recommendation:latest
          imagePullPolicy: Always
          envFrom:
            - configMapRef:
                name: stonks-config
            - secretRef:
                name: stonks-secrets
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi
@@ -0,0 +1,48 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: risk-engine
  namespace: stonks-oracle
  labels:
    app: risk-engine
    app.kubernetes.io/part-of: stonks-oracle
spec:
  replicas: 1
  selector:
    matchLabels:
      app: risk-engine
  template:
    metadata:
      labels:
        app: risk-engine
    spec:
      containers:
        - name: risk-engine
          image: ghcr.io/celesrenata/stonks-oracle/risk:latest
          imagePullPolicy: Always
          ports:
            - containerPort: 8000
          envFrom:
            - configMapRef:
                name: stonks-config
            - secretRef:
                name: stonks-secrets
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi
---
apiVersion: v1
kind: Service
metadata:
  name: risk-engine
  namespace: stonks-oracle
spec:
  selector:
    app: risk-engine
  ports:
    - port: 8000
      targetPort: 8000
@@ -0,0 +1,34 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scheduler
  namespace: stonks-oracle
  labels:
    app: scheduler
    app.kubernetes.io/part-of: stonks-oracle
spec:
  replicas: 1
  selector:
    matchLabels:
      app: scheduler
  template:
    metadata:
      labels:
        app: scheduler
    spec:
      containers:
        - name: scheduler
          image: ghcr.io/celesrenata/stonks-oracle/scheduler:latest
          imagePullPolicy: Always
          envFrom:
            - configMapRef:
                name: stonks-config
            - secretRef:
                name: stonks-secrets
          resources:
            requests:
              cpu: 50m
              memory: 64Mi
            limits:
              cpu: 200m
              memory: 128Mi
@@ -0,0 +1,17 @@
apiVersion: v1
kind: Secret
metadata:
  name: stonks-secrets
  namespace: stonks-oracle
  labels:
    app.kubernetes.io/part-of: stonks-oracle
type: Opaque
stringData:
  POSTGRES_PASSWORD: "changeme"
  MINIO_ACCESS_KEY: "changeme"
  MINIO_SECRET_KEY: "changeme"
  REDIS_PASSWORD: ""
  BROKER_API_KEY: ""
  BROKER_API_SECRET: ""
  BROKER_BASE_URL: ""
  SUPERSET_SECRET_KEY: "stonks-superset-secret-change-me"
@@ -0,0 +1,105 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: superset
  namespace: stonks-oracle
  labels:
    app: superset
    app.kubernetes.io/part-of: stonks-oracle
spec:
  replicas: 1
  selector:
    matchLabels:
      app: superset
  template:
    metadata:
      labels:
        app: superset
    spec:
      containers:
        - name: superset
          image: apache/superset:latest
          ports:
            - containerPort: 8088
          env:
            - name: SUPERSET_SECRET_KEY
              valueFrom:
                secretKeyRef:
                  name: stonks-secrets
                  key: SUPERSET_SECRET_KEY
            - name: ADMIN_USERNAME
              value: admin
            - name: ADMIN_PASSWORD
              value: admin
            - name: ADMIN_EMAIL
              value: admin@stonks.local
          volumeMounts:
            - name: superset-home
              mountPath: /app/superset_home
            - name: superset-config
              mountPath: /app/pythonpath/superset_config.py
              subPath: superset_config.py
          resources:
            requests:
              cpu: 200m
              memory: 512Mi
            limits:
              cpu: "1"
              memory: 2Gi
          readinessProbe:
            httpGet:
              path: /health
              port: 8088
            initialDelaySeconds: 30
            periodSeconds: 15
      volumes:
        - name: superset-home
          persistentVolumeClaim:
            claimName: superset-data
        - name: superset-config
          configMap:
            name: superset-config
---
apiVersion: v1
kind: Service
metadata:
  name: superset
  namespace: stonks-oracle
spec:
  selector:
    app: superset
  ports:
    - port: 8088
      targetPort: 8088
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: superset-data
  namespace: stonks-oracle
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: superset-config
  namespace: stonks-oracle
data:
  superset_config.py: |
    import os
    SECRET_KEY = os.getenv("SUPERSET_SECRET_KEY", "stonks-dev-secret-key-change-me")
    # NOTE: Superset reads SQLALCHEMY_DATABASE_URI as its own metadata database, not as a
    # query datasource; Trino is normally registered as a database connection instead.
    SQLALCHEMY_DATABASE_URI = "trino://trino@trino.stonks-oracle.svc.cluster.local:8080/lakehouse/stonks"
    FEATURE_FLAGS = {"ENABLE_TEMPLATE_PROCESSING": True}
    CACHE_CONFIG = {
        "CACHE_TYPE": "RedisCache",
        "CACHE_DEFAULT_TIMEOUT": 300,
        "CACHE_KEY_PREFIX": "superset_",
        # Default matches REDIS_HOST in stonks-config; this pod does not load that ConfigMap.
        "CACHE_REDIS_HOST": os.getenv("REDIS_HOST", "redis-master.redis-service.svc.cluster.local"),
        "CACHE_REDIS_PORT": int(os.getenv("REDIS_PORT", "6379")),
        "CACHE_REDIS_DB": 1,
    }
@@ -0,0 +1,60 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: symbol-registry-api
  namespace: stonks-oracle
  labels:
    app: symbol-registry-api
    app.kubernetes.io/part-of: stonks-oracle
spec:
  replicas: 1
  selector:
    matchLabels:
      app: symbol-registry-api
  template:
    metadata:
      labels:
        app: symbol-registry-api
    spec:
      containers:
        - name: symbol-registry-api
          image: ghcr.io/celesrenata/stonks-oracle/symbol-registry:latest
          imagePullPolicy: Always
          ports:
            - containerPort: 8000
          envFrom:
            - configMapRef:
                name: stonks-config
            - secretRef:
                name: stonks-secrets
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi
          readinessProbe:
            httpGet:
              path: /docs
              port: 8000
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /docs
              port: 8000
            initialDelaySeconds: 10
            periodSeconds: 30
---
apiVersion: v1
kind: Service
metadata:
  name: symbol-registry-api
  namespace: stonks-oracle
spec:
  selector:
    app: symbol-registry-api
  ports:
    - port: 8000
      targetPort: 8000
@@ -0,0 +1,79 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: trino
  namespace: stonks-oracle
  labels:
    app: trino
    app.kubernetes.io/part-of: stonks-oracle
spec:
  replicas: 1
  selector:
    matchLabels:
      app: trino
  template:
    metadata:
      labels:
        app: trino
    spec:
      containers:
        - name: trino
          image: trinodb/trino:latest
          ports:
            - containerPort: 8080
          volumeMounts:
            - name: catalog-config
              mountPath: /etc/trino/catalog
          resources:
            requests:
              cpu: 500m
              memory: 1Gi
            limits:
              cpu: "2"
              memory: 4Gi
          readinessProbe:
            httpGet:
              path: /v1/info
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 10
      volumes:
        - name: catalog-config
          configMap:
            name: trino-catalog
---
apiVersion: v1
kind: Service
metadata:
  name: trino
  namespace: stonks-oracle
spec:
  selector:
    app: trino
  ports:
    - port: 8080
      targetPort: 8080
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: trino-catalog
  namespace: stonks-oracle
data:
  iceberg.properties: |
    connector.name=iceberg
    iceberg.catalog.type=hive_metastore
    hive.metastore.uri=thrift://hive-metastore.stonks-oracle.svc.cluster.local:9083
    hive.s3.endpoint=http://minio.minio-service.svc.cluster.local:80
    hive.s3.path-style-access=true
    hive.s3.aws-access-key=changeme
    hive.s3.aws-secret-key=changeme
  lakehouse.properties: |
    connector.name=hive
    hive.metastore.uri=thrift://hive-metastore.stonks-oracle.svc.cluster.local:9083
    hive.s3.endpoint=http://minio.minio-service.svc.cluster.local:80
    hive.s3.path-style-access=true
    hive.s3.aws-access-key=changeme
    hive.s3.aws-secret-key=changeme
    hive.non-managed-table-writes-enabled=true
    hive.s3select-pushdown.enabled=true
@@ -0,0 +1,99 @@
-- Stonks Oracle - Initial PostgreSQL Schema
-- Phase 1: Core data model
-- Extensions
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
CREATE EXTENSION IF NOT EXISTS "pgcrypto";
-- ============================================================
-- Companies and Watchlists
-- ============================================================
CREATE TABLE companies (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
ticker VARCHAR(20) NOT NULL,
legal_name VARCHAR(500) NOT NULL,
exchange VARCHAR(50),
sector VARCHAR(200),
industry VARCHAR(200),
market_cap_bucket VARCHAR(50),
active BOOLEAN NOT NULL DEFAULT TRUE,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE(ticker, exchange)
);
CREATE TABLE company_aliases (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
company_id UUID NOT NULL REFERENCES companies(id) ON DELETE CASCADE,
alias VARCHAR(500) NOT NULL,
alias_type VARCHAR(50) NOT NULL DEFAULT 'brand',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_company_aliases_company ON company_aliases(company_id);
CREATE INDEX idx_company_aliases_alias ON company_aliases(alias);
CREATE TABLE watchlists (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
name VARCHAR(200) NOT NULL UNIQUE,
description TEXT,
active BOOLEAN NOT NULL DEFAULT TRUE,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE TABLE watchlist_members (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
watchlist_id UUID NOT NULL REFERENCES watchlists(id) ON DELETE CASCADE,
company_id UUID NOT NULL REFERENCES companies(id) ON DELETE CASCADE,
added_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE(watchlist_id, company_id)
);
-- ============================================================
-- Sources and Credentials
-- ============================================================
CREATE TABLE sources (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
company_id UUID NOT NULL REFERENCES companies(id) ON DELETE CASCADE,
source_type VARCHAR(50) NOT NULL,
source_name VARCHAR(200) NOT NULL,
config JSONB NOT NULL DEFAULT '{}',
credibility_score FLOAT DEFAULT 0.5,
retention_days INTEGER DEFAULT 365,
access_policy VARCHAR(50) DEFAULT 'internal',
active BOOLEAN NOT NULL DEFAULT TRUE,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_sources_company ON sources(company_id);
CREATE INDEX idx_sources_type ON sources(source_type);
CREATE TABLE api_credentials_refs (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
provider VARCHAR(100) NOT NULL UNIQUE,
secret_ref VARCHAR(500) NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- ============================================================
-- Ingestion Tracking
-- ============================================================
CREATE TABLE ingestion_runs (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
source_id UUID REFERENCES sources(id),
company_id UUID REFERENCES companies(id),
source_type VARCHAR(50) NOT NULL,
status VARCHAR(50) NOT NULL DEFAULT 'pending',
started_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
completed_at TIMESTAMPTZ,
items_fetched INTEGER DEFAULT 0,
items_new INTEGER DEFAULT 0,
error_message TEXT,
retry_count INTEGER DEFAULT 0,
next_retry_at TIMESTAMPTZ
);
CREATE INDEX idx_ingestion_runs_status ON ingestion_runs(status);
CREATE INDEX idx_ingestion_runs_source ON ingestion_runs(source_id);
@@ -0,0 +1,114 @@
-- Stonks Oracle - Documents and Intelligence Schema
-- ============================================================
-- Market Snapshots
-- ============================================================
CREATE TABLE market_snapshots (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
company_id UUID NOT NULL REFERENCES companies(id),
ticker VARCHAR(20) NOT NULL,
snapshot_type VARCHAR(50) NOT NULL,
data JSONB NOT NULL,
source_provider VARCHAR(100),
captured_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
storage_ref VARCHAR(1000),
content_hash VARCHAR(128)
);
CREATE INDEX idx_market_snapshots_ticker ON market_snapshots(ticker, captured_at DESC);
CREATE INDEX idx_market_snapshots_hash ON market_snapshots(content_hash);
-- ============================================================
-- Documents
-- ============================================================
CREATE TABLE documents (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
document_type VARCHAR(50) NOT NULL,
source_type VARCHAR(50) NOT NULL,
publisher VARCHAR(500),
url TEXT,
canonical_url TEXT,
title TEXT,
published_at TIMESTAMPTZ,
retrieved_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
language VARCHAR(10) DEFAULT 'en',
content_hash VARCHAR(128) NOT NULL,
raw_storage_ref VARCHAR(1000),
normalized_storage_ref VARCHAR(1000),
parse_quality_score FLOAT,
parse_confidence VARCHAR(20) DEFAULT 'unknown',
status VARCHAR(50) NOT NULL DEFAULT 'ingested',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE UNIQUE INDEX idx_documents_hash ON documents(content_hash);
CREATE INDEX idx_documents_status ON documents(status);
CREATE INDEX idx_documents_published ON documents(published_at DESC);
CREATE TABLE document_versions (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
document_id UUID NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
version INTEGER NOT NULL DEFAULT 1,
content_hash VARCHAR(128) NOT NULL,
storage_ref VARCHAR(1000),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE TABLE document_company_mentions (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
document_id UUID NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
company_id UUID NOT NULL REFERENCES companies(id),
ticker VARCHAR(20) NOT NULL,
mention_type VARCHAR(50) DEFAULT 'direct',
confidence FLOAT DEFAULT 0.5,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_doc_mentions_doc ON document_company_mentions(document_id);
CREATE INDEX idx_doc_mentions_company ON document_company_mentions(company_id);
-- ============================================================
-- Document Intelligence (AI Extraction)
-- ============================================================
CREATE TABLE document_intelligence (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
document_id UUID NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
summary TEXT,
macro_themes JSONB DEFAULT '[]',
novelty_score FLOAT,
source_credibility FLOAT,
extraction_warnings JSONB DEFAULT '[]',
confidence FLOAT,
model_provider VARCHAR(50),
model_name VARCHAR(200),
prompt_version VARCHAR(100),
schema_version VARCHAR(50),
raw_output_ref VARCHAR(1000),
prompt_ref VARCHAR(1000),
validation_status VARCHAR(50) DEFAULT 'pending',
validation_errors JSONB DEFAULT '[]',
retry_count INTEGER DEFAULT 0,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_doc_intel_document ON document_intelligence(document_id);
CREATE INDEX idx_doc_intel_validation ON document_intelligence(validation_status);
CREATE TABLE document_impact_records (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
intelligence_id UUID NOT NULL REFERENCES document_intelligence(id) ON DELETE CASCADE,
company_id UUID NOT NULL REFERENCES companies(id),
ticker VARCHAR(20) NOT NULL,
relevance FLOAT,
sentiment VARCHAR(20),
impact_score FLOAT,
impact_horizon VARCHAR(50),
catalyst_type VARCHAR(50),
key_facts JSONB DEFAULT '[]',
risks JSONB DEFAULT '[]',
evidence_spans JSONB DEFAULT '[]',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_impact_intel ON document_impact_records(intelligence_id);
CREATE INDEX idx_impact_company ON document_impact_records(company_id);
@@ -0,0 +1,160 @@
-- Stonks Oracle - Trends, Recommendations, Orders Schema
-- ============================================================
-- Trend Windows
-- ============================================================
CREATE TABLE trend_windows (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
entity_type VARCHAR(50) NOT NULL DEFAULT 'company',
entity_id VARCHAR(100) NOT NULL,
-- "window" is quoted because WINDOW is a reserved word in PostgreSQL
"window" VARCHAR(20) NOT NULL,
trend_direction VARCHAR(20) NOT NULL DEFAULT 'neutral',
trend_strength FLOAT DEFAULT 0.5,
confidence FLOAT DEFAULT 0.5,
top_supporting_evidence JSONB DEFAULT '[]',
top_opposing_evidence JSONB DEFAULT '[]',
dominant_catalysts JSONB DEFAULT '[]',
material_risks JSONB DEFAULT '[]',
contradiction_score FLOAT DEFAULT 0.0,
market_context JSONB DEFAULT '{}',
generated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_trends_entity ON trend_windows(entity_type, entity_id, "window");
CREATE INDEX idx_trends_generated ON trend_windows(generated_at DESC);
-- ============================================================
-- Recommendations
-- ============================================================
CREATE TABLE recommendations (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
ticker VARCHAR(20) NOT NULL,
company_id UUID REFERENCES companies(id),
action VARCHAR(20) NOT NULL DEFAULT 'watch',
mode VARCHAR(30) NOT NULL DEFAULT 'informational',
confidence FLOAT DEFAULT 0.5,
time_horizon VARCHAR(50),
thesis TEXT,
invalidation_conditions JSONB DEFAULT '[]',
portfolio_pct FLOAT DEFAULT 0.02,
max_loss_pct FLOAT DEFAULT 0.005,
model_version VARCHAR(100),
generated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_recommendations_ticker ON recommendations(ticker, generated_at DESC);
CREATE INDEX idx_recommendations_mode ON recommendations(mode);
CREATE TABLE recommendation_evidence (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
recommendation_id UUID NOT NULL REFERENCES recommendations(id) ON DELETE CASCADE,
document_id UUID REFERENCES documents(id),
intelligence_id UUID REFERENCES document_intelligence(id),
evidence_type VARCHAR(50) DEFAULT 'supporting',
weight FLOAT DEFAULT 1.0,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_rec_evidence_rec ON recommendation_evidence(recommendation_id);
-- ============================================================
-- Risk Evaluations
-- ============================================================
CREATE TABLE risk_evaluations (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
recommendation_id UUID NOT NULL REFERENCES recommendations(id),
eligible BOOLEAN NOT NULL DEFAULT FALSE,
allowed_mode VARCHAR(30) DEFAULT 'informational',
rejection_reasons JSONB DEFAULT '[]',
risk_checks JSONB DEFAULT '{}',
evaluated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_risk_eval_rec ON risk_evaluations(recommendation_id);
-- ============================================================
-- Broker Accounts and Orders
-- ============================================================
CREATE TABLE broker_accounts (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
provider VARCHAR(100) NOT NULL,
account_id VARCHAR(200) NOT NULL,
mode VARCHAR(20) NOT NULL DEFAULT 'paper',
config JSONB DEFAULT '{}',
active BOOLEAN NOT NULL DEFAULT TRUE,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE TABLE orders (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
recommendation_id UUID REFERENCES recommendations(id),
broker_account_id UUID REFERENCES broker_accounts(id),
ticker VARCHAR(20) NOT NULL,
side VARCHAR(10) NOT NULL,
order_type VARCHAR(20) NOT NULL DEFAULT 'market',
quantity NUMERIC NOT NULL,
limit_price NUMERIC,
stop_price NUMERIC,
status VARCHAR(30) NOT NULL DEFAULT 'pending',
idempotency_key VARCHAR(200) NOT NULL UNIQUE,
broker_order_id VARCHAR(200),
decision_trace JSONB DEFAULT '{}',
submitted_at TIMESTAMPTZ,
acknowledged_at TIMESTAMPTZ,
filled_at TIMESTAMPTZ,
cancelled_at TIMESTAMPTZ,
rejected_at TIMESTAMPTZ,
rejection_reason TEXT,
fill_price NUMERIC,
fill_quantity NUMERIC,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_orders_ticker ON orders(ticker, created_at DESC);
CREATE INDEX idx_orders_status ON orders(status);
CREATE INDEX idx_orders_idempotency ON orders(idempotency_key);
CREATE TABLE order_events (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
order_id UUID NOT NULL REFERENCES orders(id) ON DELETE CASCADE,
event_type VARCHAR(50) NOT NULL,
data JSONB DEFAULT '{}',
broker_timestamp TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_order_events_order ON order_events(order_id);
-- ============================================================
-- Positions
-- ============================================================
CREATE TABLE positions (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
broker_account_id UUID REFERENCES broker_accounts(id),
ticker VARCHAR(20) NOT NULL,
quantity NUMERIC NOT NULL DEFAULT 0,
avg_entry_price NUMERIC,
current_price NUMERIC,
unrealized_pnl NUMERIC,
realized_pnl NUMERIC DEFAULT 0,
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_positions_ticker ON positions(ticker);
-- ============================================================
-- Audit Events
-- ============================================================
CREATE TABLE audit_events (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
event_type VARCHAR(100) NOT NULL,
entity_type VARCHAR(100),
entity_id UUID,
actor VARCHAR(200) DEFAULT 'system',
data JSONB DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_audit_events_type ON audit_events(event_type, created_at DESC);
CREATE INDEX idx_audit_events_entity ON audit_events(entity_type, entity_id);
@@ -0,0 +1,14 @@
{
"Rules": [
{
"ID": "raw-retention-365d",
"Status": "Enabled",
"Filter": {
"Prefix": ""
},
"Expiration": {
"Days": 365
}
}
]
}
@@ -0,0 +1,23 @@
"""Apache Superset configuration for Stonks Oracle."""
import os
# Superset secret key
SECRET_KEY = os.getenv("SUPERSET_SECRET_KEY", "stonks-dev-secret-key-change-me")
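# NOTE: Superset reads SQLALCHEMY_DATABASE_URI as its own metadata database, not as a
# query datasource; Trino is normally registered as a database connection instead.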
SQLALCHEMY_DATABASE_URI = "trino://trino@trino:8080/lakehouse/stonks"
# Feature flags
FEATURE_FLAGS = {
"ENABLE_TEMPLATE_PROCESSING": True,
}
# Cache config (Redis-backed)
CACHE_CONFIG = {
"CACHE_TYPE": "RedisCache",
"CACHE_DEFAULT_TIMEOUT": 300,
"CACHE_KEY_PREFIX": "superset_",
"CACHE_REDIS_HOST": os.getenv("REDIS_HOST", "redis"),
"CACHE_REDIS_PORT": int(os.getenv("REDIS_PORT", "6379")),
"CACHE_REDIS_DB": 1,
}
@@ -0,0 +1,7 @@
connector.name=iceberg
iceberg.catalog.type=hive_metastore
hive.metastore.uri=thrift://hive-metastore:9083
hive.s3.endpoint=http://minio:9000
hive.s3.path-style-access=true
hive.s3.aws-access-key=minioadmin
hive.s3.aws-secret-key=minioadmin
@@ -0,0 +1,8 @@
connector.name=hive
hive.metastore.uri=thrift://hive-metastore:9083
hive.s3.endpoint=http://minio:9000
hive.s3.path-style-access=true
hive.s3.aws-access-key=minioadmin
hive.s3.aws-secret-key=minioadmin
hive.non-managed-table-writes-enabled=true
hive.s3select-pushdown.enabled=true