update steering docs and hooks for current project state

Celes Renata
2026-04-11 20:41:57 -07:00
parent 99e17be282
commit 37d5f9b01c
8 changed files with 177 additions and 67 deletions
+13 -8
@@ -1,14 +1,19 @@
--- ---
name: Lint Python on Save name: Lint on Save
description: Run ruff linter when any Python file is saved description: Run linter when Python or TypeScript files are saved
version: "1.0" version: "2.0"
trigger: trigger:
type: onSave type: onSave
filePattern: "**/*.py" filePattern: "**/*.{py,ts,tsx}"
--- ---
When any Python file is saved: When a file is saved:
1. Run `ruff check {filePath}` on the saved file 1. If it's a Python file (`*.py`):
2. If there are fixable issues, run `ruff check --fix {filePath}` to auto-fix - Run `nix-shell -p ruff --run "ruff check {filePath}"` on the saved file
3. Report any remaining issues concisely - If there are fixable issues, run `nix-shell -p ruff --run "ruff check --fix {filePath}"`
- Report any remaining issues concisely
2. If it's a TypeScript/React file (`*.ts` or `*.tsx`) under `frontend/`:
- Run `npx tsc --noEmit` from the `frontend/` directory to check types
- Report any type errors concisely
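The dispatch this hook describes can be sketched in Python. This is a minimal illustration under stated assumptions, not the hook runner's implementation: `lint_commands` is a hypothetical helper, and it covers only the first check for each file type (the `--fix` pass and the reporting step are left out).

```python
from __future__ import annotations

from pathlib import PurePosixPath


def lint_commands(file_path: str) -> list[str]:
    """Return the shell commands the hook text describes for a saved file.

    Hypothetical helper for illustration only.
    """
    path = PurePosixPath(file_path)
    if path.suffix == ".py":
        # ruff is not installed in .venv/, so it is run through nix-shell.
        return [f'nix-shell -p ruff --run "ruff check {file_path}"']
    if path.suffix in {".ts", ".tsx"} and path.parts[:1] == ("frontend",):
        # tsc checks the whole project and must run from frontend/.
        return ["cd frontend && npx tsc --noEmit"]
    # Anything else (including TypeScript outside frontend/) is not linted.
    return []
```

Returning an empty list for unmatched files keeps the caller simple: it can run whatever comes back without a separate "should I lint" check.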
+5 -2
@@ -1,7 +1,7 @@
--- ---
name: Phase Commit and Push name: Phase Commit and Push
description: Commit and push after completing a spec phase task description: Commit, push, and verify CI after completing a phase task
version: "1.0" version: "2.0"
trigger: trigger:
type: manual type: manual
--- ---
@@ -13,3 +13,6 @@ When triggered manually after completing a phase:
3. Run `git commit -m "{message}"` 3. Run `git commit -m "{message}"`
4. Run `git push origin main` 4. Run `git push origin main`
5. Report the commit SHA and confirm push succeeded 5. Report the commit SHA and confirm push succeeded
6. Wait 30 seconds, then check CI status with `nix-shell -p gh --run "gh run list -L 1"`
7. If CI is still running, report that and suggest checking back later
8. If CI failed, run `nix-shell -p gh --run "gh run view --log-failed"` and report the error
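Steps 6 to 8 amount to a three-way decision on the latest workflow run. The sketch below assumes the run is available as a record with `status` and `conclusion` fields, for example from `gh run list -L 1 --json status,conclusion`; the hook itself uses plain `gh run list -L 1`, so the JSON form and the helper name `ci_next_step` are assumptions made for illustration.

```python
from __future__ import annotations


def ci_next_step(run: dict[str, str]) -> str:
    """Map one workflow run record to the action described in steps 6-8.

    Hypothetical helper; `run` is assumed to carry `status` and
    `conclusion` keys as reported by the GitHub CLI in JSON mode.
    """
    status = run.get("status", "")
    conclusion = run.get("conclusion", "")
    if status != "completed":
        # Step 7: queued or in-progress runs are reported, not awaited.
        return "CI is still running; check back later"
    if conclusion == "success":
        return "CI passed"
    # Step 8: any other conclusion is treated as a failure to inspect.
    return 'nix-shell -p gh --run "gh run view --log-failed"'
```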
+14 -9
@@ -1,16 +1,21 @@
--- ---
name: Run Tests on Save name: Run Tests on Save
description: Automatically run relevant tests when a Python service file is saved description: Run relevant tests when service or frontend files are saved
version: "1.0" version: "2.0"
trigger: trigger:
type: onSave type: onSave
filePattern: "services/**/*.py" filePattern: "{services/**/*.py,frontend/src/**/*.{ts,tsx}}"
--- ---
When a Python file under `services/` is saved: When a file is saved:
1. Identify which service module was modified (e.g. `services/ingestion/worker.py` → `ingestion`) 1. If it's a Python file under `services/`:
2. Look for corresponding tests in `tests/` matching the service name - Identify the service module (e.g. `services/ingestion/worker.py` → `ingestion`)
3. Run `pytest tests/test_{service_name}*.py -x --tb=short -q` if test files exist - Look for corresponding tests in `tests/` matching the service name
4. If no specific test file exists, run `ruff check` on the modified file to catch syntax/lint issues - Run `python -m pytest tests/test_{service_name}*.py -x --tb=short -q` if test files exist
5. Report results concisely — only show failures or a one-line success confirmation - If no specific test file exists, run lint check only
- Report results concisely
2. If it's a TypeScript/React file under `frontend/src/`:
- Run `npx vitest --run` from the `frontend/` directory
- Report results concisely — only show failures or a one-line success
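The Python branch of this hook (path to service name to test files) can be sketched as follows. Both function names are hypothetical, and the sketch assumes a service module is the first directory under `services/`, as in the `services/ingestion/worker.py` example.

```python
from __future__ import annotations

from pathlib import Path, PurePosixPath


def service_name(file_path: str) -> str | None:
    """Return the service module for a file under services/, else None."""
    parts = PurePosixPath(file_path).parts
    if len(parts) >= 3 and parts[0] == "services" and file_path.endswith(".py"):
        return parts[1]
    return None


def pytest_command(file_path: str, tests_dir: Path = Path("tests")) -> list[str] | None:
    """Build the pytest invocation from the hook, or None if no tests match."""
    name = service_name(file_path)
    if name is None:
        return None
    matches = sorted(tests_dir.glob(f"test_{name}*.py"))
    if not matches:
        # No matching test file: the hook falls back to a lint check only.
        return None
    return ["python", "-m", "pytest", *map(str, matches), "-x", "--tb=short", "-q"]
```

Expanding the glob in Python rather than passing `tests/test_{service_name}*.py` to a shell makes the "if test files exist" condition explicit.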
+16 -10
@@ -1,16 +1,22 @@
--- ---
name: Validate K8s Manifests name: Validate Helm & K8s on Save
description: Validate Kubernetes YAML when manifest files are saved description: Validate Helm templates and K8s manifests when infrastructure files are saved
version: "1.0" version: "2.0"
trigger: trigger:
type: onSave type: onSave
filePattern: "infra/k8s/**/*.yaml" filePattern: "infra/**/*.{yaml,yml,tpl}"
--- ---
When a Kubernetes manifest YAML file is saved: When a Helm or K8s manifest file is saved:
1. Parse the YAML to check for syntax errors 1. If it's under `infra/helm/`:
2. Verify required fields exist (apiVersion, kind, metadata) - Run `helm template stonks-oracle infra/helm/stonks-oracle -n stonks-oracle` to validate template rendering
3. Check that namespace is set to `stonks-oracle` for application resources - Check for template syntax errors
4. Verify image references point to `ghcr.io/celesrenata/stonks-oracle/` - Verify the output contains expected resource types (Deployment, Service, Ingress, NetworkPolicy)
5. Report any issues found - Report any rendering errors concisely
2. If it's under `infra/k8s/`:
- Parse the YAML to check for syntax errors
- Verify required fields exist (apiVersion, kind, metadata)
- Check that namespace is set to `stonks-oracle`
- Report any issues found
+27 -22
@@ -3,44 +3,49 @@
## Local Environment ## Local Environment
- Python 3.12 via NixOS, virtualenv at `.venv/` - Python 3.12 via NixOS, virtualenv at `.venv/`
- Always use `.venv/bin/python` or activate with `source .venv/bin/activate` before running Python commands - Always use `.venv/bin/python` or activate with `source .venv/bin/activate` before running Python commands
- When running `pytest`, `ruff`, or any Python tool, use the `.venv` — e.g. `python -m pytest` (not bare `pytest` which may resolve to system Python) - For tools not available in `.venv/` (ruff, gh, etc.), use `nix-shell -p <pkg> --run "<cmd>"`
- Node.js 24 available for frontend work; `frontend/` has its own `node_modules/` - Node.js 24 for frontend; `frontend/` has its own `node_modules/`
- Frontend tests: `cd frontend && npx vitest --run`
- Python tests: `nix-shell -p ruff --run "ruff check services/"` then `python -m pytest tests/ -x --tb=short -q`
## Workflow ## Workflow
1. Write or update tests for the target behavior 1. Write or update tests for the target behavior
2. Implement the minimal code to pass 2. Implement the minimal code to pass
3. Debug failures, fix, re-run 3. Debug failures, fix, re-run
4. Commit and push after each phase completes 4. Commit and push — CI builds images automatically
5. GitHub Actions CI automatically builds container images and pushes to GHCR 5. Deploy: `helm upgrade --install stonks-oracle infra/helm/stonks-oracle -n stonks-oracle`
6. Deploy to cluster via Helm or `kubectl apply` 6. Restart changed services: `kubectl rollout restart deployment/<name> -n stonks-oracle`
## Testing ## Testing
- Use `pytest` with `pytest-asyncio` for async code - Python: `pytest` with `pytest-asyncio` for async code, tests in `tests/`
- Tests live in the top-level `tests/` directory - Frontend: Vitest + MSW (Mock Service Worker) for deterministic API mocking, tests in `frontend/src/test/`
- Run tests with `python -m pytest tests/ -x --tb=short -q` - Run Python tests: `python -m pytest tests/ -x --tb=short -q`
- Focus on core logic, not mocking infrastructure - Run frontend tests: `cd frontend && npx vitest --run`
- Lint Python: `nix-shell -p ruff --run "ruff check services/"`
## CI/CD — GitHub Actions ## CI/CD — GitHub Actions
- Workflow file: `.github/workflows/build.yml` - Workflow: `.github/workflows/build.yml`
- Triggers on push to `main` and PRs - Triggers on push to `main` and PRs
- Jobs: - Jobs:
- `lint-and-test`: runs ruff lint + pytest on ubuntu with Python 3.12 - `lint-and-test`: ruff lint + pytest + frontend vitest (Node 24)
- `build-services`: matrix build of all Python services via `docker/Dockerfile`, pushes to GHCR with `:<sha>` and `:latest` tags - `build-services`: matrix build of all Python services → GHCR
 - `build-dashboard`: builds `frontend/Dockerfile` separately, pushes `dashboard` image to GHCR - `build-dashboard`: frontend/Dockerfile → GHCR
- CI handles image building and pushing — do NOT manually `docker push` unless CI is broken or you need to bypass it - `build-superset`: docker/Dockerfile.superset → GHCR
- After pushing to `main`, wait for CI to complete before deploying (check GitHub Actions status) - CI handles all image builds and pushes — do NOT manually docker push
- If you need to build locally for testing: `make build` or `docker build` directly, but let CI do the GHCR push - Check CI: `nix-shell -p gh --run "gh run list -L 3"`
- Re-run failed: `nix-shell -p gh --run "gh run rerun <id> --failed"`
## Deploy ## Deploy
- Helm chart at `infra/helm/stonks-oracle/` - Full deploy/redeploy: `~/sources/kube/stonks-oracle/runmefirst.sh`
- Deploy: `helm upgrade --install stonks-oracle infra/helm/stonks-oracle -n stonks-oracle` - Full teardown: `~/sources/kube/stonks-oracle/runmelast.sh`
- Alternative raw manifests: `kubectl apply -f infra/k8s/` - Quick Helm upgrade: `helm upgrade --install stonks-oracle infra/helm/stonks-oracle -n stonks-oracle`
- To restart a deployment after CI pushes new images: `kubectl rollout restart deployment/<name> -n stonks-oracle` - Restart single service: `kubectl rollout restart deployment/<name> -n stonks-oracle`
- Check pods: `kubectl get pods -n stonks-oracle`
## Git Conventions ## Git Conventions
- Commit after each completed phase task - Commit after each completed phase task
- Commit message format: `phase N: short description` - Commit message format: `phase N: short description`
- Push to `main` branch triggers CI - Push to `main` triggers CI
## Code Style ## Code Style
- Python 3.12, type hints everywhere - Python 3.12, type hints everywhere
@@ -49,9 +54,9 @@
- asyncio + asyncpg/aioredis for async I/O - asyncio + asyncpg/aioredis for async I/O
- Minimal dependencies, prefer stdlib where possible - Minimal dependencies, prefer stdlib where possible
- Frontend: React 19, TypeScript strict mode, Tailwind CSS, TanStack Router/Query - Frontend: React 19, TypeScript strict mode, Tailwind CSS, TanStack Router/Query
- UUID fields from asyncpg must be converted to str via `_row_dict()` helpers
## Documentation ## Documentation
- Do NOT create large summary/success markdown files after each step - Do NOT create large summary/success markdown files after each step
- Keep notes short, concise, and organized under `docs/notes/` - Keep notes short, concise, and organized under `docs/notes/`
- Name note files to match the task they relate to (e.g. `docs/notes/phase0-k8s-manifests.md`)
- If a note isn't useful for future reference, don't write it - If a note isn't useful for future reference, don't write it
+43
@@ -0,0 +1,43 @@
---
inclusion: fileMatch
fileMatchPattern: "frontend/**"
---
# Frontend Conventions
## Stack
- React 19, TypeScript strict mode, Vite 8
- Tailwind CSS with custom dark theme (surface-*, brand-* colors)
- TanStack Router (file-based routes in `routes.tsx`)
- TanStack Query for data fetching (hooks in `api/hooks.ts`)
- Recharts for charts, Monaco Editor for SQL, Lucide for icons
## API Client
- `api/client.ts` — shared fetch wrapper with `apiGet`, `apiPost`, `apiPut`, `apiDelete`
- Three API bases: `query` (→ `/api/`), `registry` (→ `/registry/`), `risk` (→ `/risk/`)
- Base URLs use `||` fallback (not `??`) because Vite inlines empty string for undefined env vars
- All hooks in `api/hooks.ts` — typed with TanStack Query
## Testing
- Vitest + MSW (Mock Service Worker) for deterministic tests
- Test setup: `src/test/setup.ts` starts MSW server
- Mock handlers: `src/test/mocks/handlers.ts`
- Test helper: `src/test/render.tsx` provides `renderRoute(path)` with QueryClient + Router
- Run: `npx vitest --run`
## Components
- Shared UI in `components/ui.tsx`: StatusBadge, ConfidenceBar, TrendArrow, DateRangeSelector, TickerFilter, LoadingSpinner, ErrorBoundary, Card
- DataTable in `components/DataTable.tsx`: generic sortable/filterable/paginated table
- AppLayout in `components/AppLayout.tsx`: sidebar nav + main content area
## Docker
- `frontend/Dockerfile`: multi-stage node:24-alpine → nginxinc/nginx-unprivileged:alpine
- Listens on port 8080 (not 80) for K8s security context compatibility
- `frontend/nginx.conf`: SPA fallback + `/api/`, `/registry/`, `/risk/` reverse proxies
## Adding a New Page
1. Create `src/pages/MyPage.tsx`
2. Add route in `src/routes.tsx`
3. Add nav item in `components/AppLayout.tsx` navItems array
4. Add API hooks in `api/hooks.ts` if needed
5. Add MSW handler in `test/mocks/handlers.ts`
6. Add test in `test/pages.test.tsx`
+28 -7
@@ -1,21 +1,41 @@
--- ---
inclusion: fileMatch inclusion: fileMatch
fileMatchPattern: "infra/k8s/**" fileMatchPattern: "infra/**"
--- ---
# Kubernetes Conventions # Kubernetes & Helm Conventions
## Namespace ## Namespace
All Stonks Oracle workloads deploy to `stonks-oracle` namespace. All Stonks Oracle workloads deploy to `stonks-oracle` namespace.
The namespace is NOT managed by Helm — it's created by `runmefirst.sh` with Helm ownership labels.
## Helm Chart
- Chart at `infra/helm/stonks-oracle/`
- Services defined in `values.yaml` under `services:` — the deployments template iterates over them
- Adding a new service: add entry to `values.yaml`, add network policy if it needs ingress, add ingress if it needs external access
- Dashboard uses nginx-unprivileged on port 8080 (not 80)
- Superset uses custom image `ghcr.io/celesrenata/stonks-oracle/superset:latest` with trino + psycopg2 drivers
## TLS ## TLS
- Internal services: use `ca-issuer` ClusterIssuer (local CA) - Internal services: use `ca-issuer` ClusterIssuer (local CA)
- Public-facing services (Superset, Query API): use `celestium-le-production` ClusterIssuer (Let's Encrypt) - Annotate ingress with `cert-manager.io/cluster-issuer: ca-issuer`
- Annotate ingress with `cert-manager.io/cluster-issuer`
## Ingress ## Ingress
- Traefik ingress controller - Traefik ingress controller
- Domain pattern: `<service>.celestium.life` - Domain pattern: `<service>.celestium.life`
- Always create both HTTP and HTTPS ingress rules - Dashboard: `stonks.celestium.life`
- Query API: `stonks-api.celestium.life`
- Symbol Registry: `stonks-registry.celestium.life`
- Superset: `stonks-dash.celestium.life`
- Trino: `stonks-trino.celestium.life`
## Network Policies
- `default-deny-ingress` blocks all ingress by default
- Each service that needs ingress must have an explicit allow policy
- Dashboard needs: ingress from kube-system (Traefik) on 8080
- Query API needs: ingress from kube-system + dashboard pod on 8000
- Symbol Registry needs: ingress from kube-system + dashboard pod on 8000
- Risk Engine needs: ingress from broker-adapter + query-api + dashboard on 8000
- When adding a new externally-accessible service, add both an ingress AND a network policy
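The default-deny model with per-service allow rules can be illustrated as a lookup table. This is a toy model for reasoning about the rules listed above, not a rendering of the actual NetworkPolicy manifests; the service keys (`query-api`, `symbol-registry`, `risk`) are illustrative labels and `ingress_allowed` is a hypothetical helper.

```python
from __future__ import annotations

# target service -> (allowed sources, allowed port), taken from the list above
ALLOW_RULES: dict[str, tuple[frozenset[str], int]] = {
    "dashboard": (frozenset({"kube-system"}), 8080),
    "query-api": (frozenset({"kube-system", "dashboard"}), 8000),
    "symbol-registry": (frozenset({"kube-system", "dashboard"}), 8000),
    "risk": (frozenset({"broker-adapter", "query-api", "dashboard"}), 8000),
}


def ingress_allowed(source: str, target: str, port: int) -> bool:
    """Return True only if an explicit allow rule covers this traffic."""
    rule = ALLOW_RULES.get(target)
    if rule is None:
        # No explicit policy: default-deny-ingress applies.
        return False
    sources, allowed_port = rule
    return source in sources and port == allowed_port
```

A service missing from the table is unreachable, which mirrors why a new externally-accessible service needs a network policy in addition to its ingress.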
## Service References ## Service References
- PostgreSQL: `postgresql-rw.postgresql-service.svc.cluster.local:5432` - PostgreSQL: `postgresql-rw.postgresql-service.svc.cluster.local:5432`
@@ -25,9 +45,10 @@ All Stonks Oracle workloads deploy to `stonks-oracle` namespace.
## Images ## Images
- All images from `ghcr.io/celesrenata/stonks-oracle/<service>:latest` - All images from `ghcr.io/celesrenata/stonks-oracle/<service>:latest`
- Use `imagePullPolicy: Always` in production - Use `imagePullPolicy: Always`
- Use `imagePullSecrets` referencing `ghcr-secret` if repo is private - Use `imagePullSecrets` referencing `ghcr-credentials`
## Labels ## Labels
- `app.kubernetes.io/part-of: stonks-oracle` - `app.kubernetes.io/part-of: stonks-oracle`
- `app: <service-name>` - `app: <service-name>`
- `stonks-oracle/tier: <tier>` (api, frontend, processing, trading, orchestration, analytics)
+31 -9
@@ -7,34 +7,56 @@ Python monorepo with services under `services/`, infrastructure under `infra/`,
## Local Dev Environment ## Local Dev Environment
- NixOS dev environment, Python 3.12 - NixOS dev environment, Python 3.12
- Virtual environment at `.venv/` — always use it for Python commands - Virtual environment at `.venv/` — always use it for Python commands
- For tools not in `.venv/` (like `ruff`, `gh`), use `nix-shell -p <pkg> --run "<cmd>"`
- Node.js 24 for frontend (`frontend/` directory) - Node.js 24 for frontend (`frontend/` directory)
- Docker available locally for image builds - Docker available locally for image builds (but let CI handle pushes)
## Live Endpoints
- Dashboard: `https://stonks.celestium.life`
- Query API: `https://stonks-api.celestium.life`
- Symbol Registry: `https://stonks-registry.celestium.life`
- Superset: `https://stonks-dash.celestium.life`
- Trino: `https://stonks-trino.celestium.life`
## Infrastructure ## Infrastructure
- Kubernetes cluster: 4x NixOS nodes (gremlin-1 through gremlin-4), reachable via `kubectl`, `virtctl`, `ssh root@gremlin-{1,2,3,4}` - Kubernetes cluster: 4x NixOS nodes (gremlin-1 through gremlin-4), reachable via `kubectl`, `virtctl`, `ssh root@gremlin-{1,2,3,4}`
- NixOS configs stored at `/etc/nixos` on gremlin-1, git-pushed to other hosts - NixOS configs stored at `/etc/nixos` on gremlin-1, git-pushed to other hosts
- Ingress: Traefik, domain `*.celestium.life` - Ingress: Traefik, domain `*.celestium.life`
- Cert-Manager: `ca-issuer` (local CA) for internal services, `celestium-le-production` (Let's Encrypt) for public-facing - Cert-Manager: `ca-issuer` (local CA) for internal services
- Container registry: `ghcr.io/celesrenata/stonks-oracle` - Container registry: `ghcr.io/celesrenata/stonks-oracle`
## CI/CD ## CI/CD
- GitHub Actions workflow at `.github/workflows/build.yml` - GitHub Actions workflow at `.github/workflows/build.yml`
- Push to `main` triggers: lint → test → build all service images + dashboard image → push to GHCR - Push to `main` triggers: lint → pytest → frontend vitest → build all service images + dashboard + superset → push to GHCR
- Images tagged as `ghcr.io/celesrenata/stonks-oracle/<service>:<sha>` and `:latest` - Images tagged as `ghcr.io/celesrenata/stonks-oracle/<service>:<sha>` and `:latest`
- Dashboard image built from `frontend/Dockerfile` (multi-stage: node → nginx) - Dashboard image: `frontend/Dockerfile` (multi-stage: node:24 → nginx-unprivileged on port 8080)
- Python service images built from `docker/Dockerfile` with `SERVICE_CMD` build arg - Superset image: `docker/Dockerfile.superset` (apache/superset + trino + psycopg2)
- Let CI handle image builds and pushes — only build locally for testing or when CI is unavailable - Python service images: `docker/Dockerfile` with `SERVICE_CMD` build arg
- Let CI handle image builds and pushes — do NOT manually `docker build && docker push`
- Check CI status: `nix-shell -p gh --run "gh run list -L 3"`
## Deployment Scripts
- `~/sources/kube/stonks-oracle/runmefirst.sh` — full deploy: DB setup, migrations, Helm install, rolling restart
- `~/sources/kube/stonks-oracle/runmelast.sh` — teardown: Helm uninstall, clean resources (preserves DB/MinIO/Redis)
- After CI builds, deploy with: `helm upgrade --install stonks-oracle infra/helm/stonks-oracle -n stonks-oracle`
- Restart a single service: `kubectl rollout restart deployment/<name> -n stonks-oracle`
## API Secrets
- Stored as files in repo root (gitignored): `polygon.io.key`, `alpaca.key`, `alpaca.secret`, `alpaca.url`
- GitHub token at `/run/secrets/github_token`
- Injected into K8s secrets via `runmefirst.sh` Helm `--set` flags
## Existing Cluster Services (do NOT redeploy these) ## Existing Cluster Services (do NOT redeploy these)
- PostgreSQL: `postgresql-rw.postgresql-service.svc.cluster.local:5432` - PostgreSQL: `postgresql-rw.postgresql-service.svc.cluster.local:5432`
- Redis: `redis-master.redis-service.svc.cluster.local:6379` - Redis: `redis-master.redis-service.svc.cluster.local:6379`
- MinIO: `minio.minio-service.svc.cluster.local:80` (API), console at `minio-crawler-console.minio-service.svc.cluster.local:9090` - MinIO: `minio.minio-service.svc.cluster.local:80` (API)
- Ollama: `ollama.ollama-service.svc.cluster.local:11434` (cluster-internal), also at `http://10.1.1.12:2701` (external), GPU: 4070 Ti Super 16GB - Ollama: `ollama.ollama-service.svc.cluster.local:11434` (cluster-internal), also at `http://10.1.1.12:2701` (external), GPU: 4070 Ti Super 16GB
## Key Conventions ## Key Conventions
- All services use `services/shared/config.py` for configuration via env vars - All services use `services/shared/config.py` for configuration via env vars
- Redis queues defined in `services/shared/redis_keys.py` - Redis queues defined in `services/shared/redis_keys.py`
- Pydantic schemas in `services/shared/schemas.py` - Pydantic schemas in `services/shared/schemas.py`
- K8s manifests in `infra/k8s/`, Helm chart in `infra/helm/stonks-oracle/`, all in `stonks-oracle` namespace - Helm chart in `infra/helm/stonks-oracle/`, all in `stonks-oracle` namespace
- Lakehouse DDL in `lakehouse/schemas/` - Lakehouse DDL in `lakehouse/schemas/`
- Crawler patterns inspired by Noctipede (`~/sources/splinterstice/noctipede`): BeautifulSoup + requests with retry adapters, content hashing, boilerplate stripping, quality scoring - Frontend proxies: `/api/` → query-api:8000, `/registry/` → symbol-registry:8000, `/risk/` → risk:8000
- Network policies: default-deny with explicit allow rules per service
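The frontend proxy mapping listed under Key Conventions can be sketched as a prefix lookup. This only models which upstream a request path is routed to; it makes no claim about how `frontend/nginx.conf` rewrites the path, and `upstream_for` is a hypothetical name.

```python
from __future__ import annotations

# Path prefix -> upstream, as listed in the conventions above.
PROXIES: dict[str, str] = {
    "/api/": "query-api:8000",
    "/registry/": "symbol-registry:8000",
    "/risk/": "risk:8000",
}


def upstream_for(path: str) -> str | None:
    """Return the upstream for a request path, or None for the SPA fallback."""
    for prefix, upstream in PROXIES.items():
        if path.startswith(prefix):
            return upstream
    # No proxy prefix matched: the request is served by the dashboard itself.
    return None
```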