diff --git a/.kiro/hooks/lint-on-save.md b/.kiro/hooks/lint-on-save.md
index 32b2047..5472cca 100644
--- a/.kiro/hooks/lint-on-save.md
+++ b/.kiro/hooks/lint-on-save.md
@@ -1,14 +1,19 @@
 ---
-name: Lint Python on Save
-description: Run ruff linter when any Python file is saved
-version: "1.0"
+name: Lint on Save
+description: Run linter when Python or TypeScript files are saved
+version: "2.0"
 trigger:
   type: onSave
-  filePattern: "**/*.py"
+  filePattern: "**/*.{py,ts,tsx}"
 ---
 
-When any Python file is saved:
+When a file is saved:
 
-1. Run `ruff check {filePath}` on the saved file
-2. If there are fixable issues, run `ruff check --fix {filePath}` to auto-fix
-3. Report any remaining issues concisely
+1. If it's a Python file (`*.py`):
+   - Run `nix-shell -p ruff --run "ruff check {filePath}"` on the saved file
+   - If there are fixable issues, run `nix-shell -p ruff --run "ruff check --fix {filePath}"`
+   - Report any remaining issues concisely
+
+2. If it's a TypeScript/React file (`*.ts` or `*.tsx`) under `frontend/`:
+   - Run `npx tsc --noEmit` from the `frontend/` directory to check types
+   - Report any type errors concisely
diff --git a/.kiro/hooks/phase-commit.md b/.kiro/hooks/phase-commit.md
index 024bf73..0fff379 100644
--- a/.kiro/hooks/phase-commit.md
+++ b/.kiro/hooks/phase-commit.md
@@ -1,7 +1,7 @@
 ---
 name: Phase Commit and Push
-description: Commit and push after completing a spec phase task
-version: "1.0"
+description: Commit, push, and verify CI after completing a phase task
+version: "2.0"
 trigger:
   type: manual
 ---
@@ -13,3 +13,6 @@ When triggered manually after completing a phase:
 3. Run `git commit -m "{message}"`
 4. Run `git push origin main`
 5. Report the commit SHA and confirm push succeeded
+6. Wait 30 seconds, then check CI status with `nix-shell -p gh --run "gh run list -L 1"`
+7. If CI is still running, report that and suggest checking back later
+8. If CI failed, run `nix-shell -p gh --run "gh run view --log-failed"` and report the error
diff --git a/.kiro/hooks/run-tests-on-save.md b/.kiro/hooks/run-tests-on-save.md
index a56e247..c035b59 100644
--- a/.kiro/hooks/run-tests-on-save.md
+++ b/.kiro/hooks/run-tests-on-save.md
@@ -1,16 +1,21 @@
 ---
 name: Run Tests on Save
-description: Automatically run relevant tests when a Python service file is saved
-version: "1.0"
+description: Run relevant tests when service or frontend files are saved
+version: "2.0"
 trigger:
   type: onSave
-  filePattern: "services/**/*.py"
+  filePattern: "{services/**/*.py,frontend/src/**/*.{ts,tsx}}"
 ---
 
-When a Python file under `services/` is saved:
+When a file is saved:
 
-1. Identify which service module was modified (e.g. `services/ingestion/worker.py` → `ingestion`)
-2. Look for corresponding tests in `tests/` matching the service name
-3. Run `pytest tests/test_{service_name}*.py -x --tb=short -q` if test files exist
-4. If no specific test file exists, run `ruff check` on the modified file to catch syntax/lint issues
-5. Report results concisely — only show failures or a one-line success confirmation
+1. If it's a Python file under `services/`:
+   - Identify the service module (e.g. `services/ingestion/worker.py` → `ingestion`)
+   - Look for corresponding tests in `tests/` matching the service name
+   - Run `python -m pytest tests/test_{service_name}*.py -x --tb=short -q` if test files exist
+   - If no specific test file exists, run lint check only
+   - Report results concisely
+
+2. If it's a TypeScript/React file under `frontend/src/`:
+   - Run `npx vitest --run` from the `frontend/` directory
+   - Report results concisely — only show failures or a one-line success
diff --git a/.kiro/hooks/validate-k8s-on-save.md b/.kiro/hooks/validate-k8s-on-save.md
index 3d294c0..e675762 100644
--- a/.kiro/hooks/validate-k8s-on-save.md
+++ b/.kiro/hooks/validate-k8s-on-save.md
@@ -1,16 +1,22 @@
 ---
-name: Validate K8s Manifests
-description: Validate Kubernetes YAML when manifest files are saved
-version: "1.0"
+name: Validate Helm & K8s on Save
+description: Validate Helm templates and K8s manifests when infrastructure files are saved
+version: "2.0"
 trigger:
   type: onSave
-  filePattern: "infra/k8s/**/*.yaml"
+  filePattern: "infra/**/*.{yaml,yml,tpl}"
 ---
 
-When a Kubernetes manifest YAML file is saved:
+When a Helm or K8s manifest file is saved:
 
-1. Parse the YAML to check for syntax errors
-2. Verify required fields exist (apiVersion, kind, metadata)
-3. Check that namespace is set to `stonks-oracle` for application resources
-4. Verify image references point to `ghcr.io/celesrenata/stonks-oracle/`
-5. Report any issues found
+1. If it's under `infra/helm/`:
+   - Run `helm template stonks-oracle infra/helm/stonks-oracle -n stonks-oracle` to validate template rendering
+   - Check for template syntax errors
+   - Verify the output contains expected resource types (Deployment, Service, Ingress, NetworkPolicy)
+   - Report any rendering errors concisely
+
+2. If it's under `infra/k8s/`:
+   - Parse the YAML to check for syntax errors
+   - Verify required fields exist (apiVersion, kind, metadata)
+   - Check that namespace is set to `stonks-oracle`
+   - Report any issues found
diff --git a/.kiro/steering/development-process.md b/.kiro/steering/development-process.md
index 2146d95..166dd3b 100644
--- a/.kiro/steering/development-process.md
+++ b/.kiro/steering/development-process.md
@@ -3,44 +3,49 @@
 ## Local Environment
 - Python 3.12 via NixOS, virtualenv at `.venv/`
 - Always use `.venv/bin/python` or activate with `source .venv/bin/activate` before running Python commands
-- When running `pytest`, `ruff`, or any Python tool, use the `.venv` — e.g. `python -m pytest` (not bare `pytest` which may resolve to system Python)
-- Node.js 24 available for frontend work; `frontend/` has its own `node_modules/`
+- For tools not available in `.venv/` (ruff, gh, etc.), use `nix-shell -p --run ""`
+- Node.js 24 for frontend; `frontend/` has its own `node_modules/`
+- Frontend tests: `cd frontend && npx vitest --run`
+- Python tests: `nix-shell -p ruff --run "ruff check services/"` then `python -m pytest tests/ -x --tb=short -q`
 
 ## Workflow
 1. Write or update tests for the target behavior
 2. Implement the minimal code to pass
 3. Debug failures, fix, re-run
-4. Commit and push after each phase completes
-5. GitHub Actions CI automatically builds container images and pushes to GHCR
-6. Deploy to cluster via Helm or `kubectl apply`
+4. Commit and push — CI builds images automatically
+5. Deploy: `helm upgrade --install stonks-oracle infra/helm/stonks-oracle -n stonks-oracle`
+6. Restart changed services: `kubectl rollout restart deployment/ -n stonks-oracle`
 
 ## Testing
-- Use `pytest` with `pytest-asyncio` for async code
-- Tests live in the top-level `tests/` directory
-- Run tests with `python -m pytest tests/ -x --tb=short -q`
-- Focus on core logic, not mocking infrastructure
+- Python: `pytest` with `pytest-asyncio` for async code, tests in `tests/`
+- Frontend: Vitest + MSW (Mock Service Worker) for deterministic API mocking, tests in `frontend/src/test/`
+- Run Python tests: `python -m pytest tests/ -x --tb=short -q`
+- Run frontend tests: `cd frontend && npx vitest --run`
+- Lint Python: `nix-shell -p ruff --run "ruff check services/"`
 
 ## CI/CD — GitHub Actions
-- Workflow file: `.github/workflows/build.yml`
+- Workflow: `.github/workflows/build.yml`
 - Triggers on push to `main` and PRs
 - Jobs:
-  - `lint-and-test`: runs ruff lint + pytest on ubuntu with Python 3.12
-  - `build-services`: matrix build of all Python services via `docker/Dockerfile`, pushes to GHCR with `:` and `:latest` tags
-  - `build-dashboard`: builds `frontend/Dockerfile` separately, pushes `dashboard` image to GHCR
-- CI handles image building and pushing — do NOT manually `docker push` unless CI is broken or you need to bypass it
-- After pushing to `main`, wait for CI to complete before deploying (check GitHub Actions status)
-- If you need to build locally for testing: `make build` or `docker build` directly, but let CI do the GHCR push
+  - `lint-and-test`: ruff lint + pytest + frontend vitest (Node 24)
+  - `build-services`: matrix build of all Python services → GHCR
+  - `build-dashboard`: frontend/Dockerfile → GHCR
+  - `build-superset`: docker/Dockerfile.superset → GHCR
+- CI handles all image builds and pushes — do NOT manually docker push
+- Check CI: `nix-shell -p gh --run "gh run list -L 3"`
+- Re-run failed: `nix-shell -p gh --run "gh run rerun --failed"`
 
 ## Deploy
-- Helm chart at `infra/helm/stonks-oracle/`
-- Deploy: `helm upgrade --install stonks-oracle infra/helm/stonks-oracle -n stonks-oracle`
-- Alternative raw manifests: `kubectl apply -f infra/k8s/`
-- To restart a deployment after CI pushes new images: `kubectl rollout restart deployment/ -n stonks-oracle`
+- Full deploy/redeploy: `~/sources/kube/stonks-oracle/runmefirst.sh`
+- Full teardown: `~/sources/kube/stonks-oracle/runmelast.sh`
+- Quick Helm upgrade: `helm upgrade --install stonks-oracle infra/helm/stonks-oracle -n stonks-oracle`
+- Restart single service: `kubectl rollout restart deployment/ -n stonks-oracle`
+- Check pods: `kubectl get pods -n stonks-oracle`
 
 ## Git Conventions
 - Commit after each completed phase task
 - Commit message format: `phase N: short description`
-- Push to `main` branch triggers CI
+- Push to `main` triggers CI
 
 ## Code Style
 - Python 3.12, type hints everywhere
@@ -49,9 +54,9 @@
 - asyncio + asyncpg/aioredis for async I/O
 - Minimal dependencies, prefer stdlib where possible
 - Frontend: React 19, TypeScript strict mode, Tailwind CSS, TanStack Router/Query
+- UUID fields from asyncpg must be converted to str via `_row_dict()` helpers
 
 ## Documentation
 - Do NOT create large summary/success markdown files after each step
 - Keep notes short, concise, and organized under `docs/notes/`
-- Name note files to match the task they relate to (e.g. `docs/notes/phase0-k8s-manifests.md`)
 - If a note isn't useful for future reference, don't write it
diff --git a/.kiro/steering/frontend-conventions.md b/.kiro/steering/frontend-conventions.md
new file mode 100644
index 0000000..6df0b4c
--- /dev/null
+++ b/.kiro/steering/frontend-conventions.md
@@ -0,0 +1,43 @@
+---
+inclusion: fileMatch
+fileMatchPattern: "frontend/**"
+---
+# Frontend Conventions
+
+## Stack
+- React 19, TypeScript strict mode, Vite 8
+- Tailwind CSS with custom dark theme (surface-*, brand-* colors)
+- TanStack Router (file-based routes in `routes.tsx`)
+- TanStack Query for data fetching (hooks in `api/hooks.ts`)
+- Recharts for charts, Monaco Editor for SQL, Lucide for icons
+
+## API Client
+- `api/client.ts` — shared fetch wrapper with `apiGet`, `apiPost`, `apiPut`, `apiDelete`
+- Three API bases: `query` (→ `/api/`), `registry` (→ `/registry/`), `risk` (→ `/risk/`)
+- Base URLs use `||` fallback (not `??`) because Vite inlines empty string for undefined env vars
+- All hooks in `api/hooks.ts` — typed with TanStack Query
+
+## Testing
+- Vitest + MSW (Mock Service Worker) for deterministic tests
+- Test setup: `src/test/setup.ts` starts MSW server
+- Mock handlers: `src/test/mocks/handlers.ts`
+- Test helper: `src/test/render.tsx` provides `renderRoute(path)` with QueryClient + Router
+- Run: `npx vitest --run`
+
+## Components
+- Shared UI in `components/ui.tsx`: StatusBadge, ConfidenceBar, TrendArrow, DateRangeSelector, TickerFilter, LoadingSpinner, ErrorBoundary, Card
+- DataTable in `components/DataTable.tsx`: generic sortable/filterable/paginated table
+- AppLayout in `components/AppLayout.tsx`: sidebar nav + main content area
+
+## Docker
+- `frontend/Dockerfile`: multi-stage node:24-alpine → nginxinc/nginx-unprivileged:alpine
+- Listens on port 8080 (not 80) for K8s security context compatibility
+- `frontend/nginx.conf`: SPA fallback + `/api/`, `/registry/`, `/risk/` reverse proxies
+
+## Adding a New Page
+1. Create `src/pages/MyPage.tsx`
+2. Add route in `src/routes.tsx`
+3. Add nav item in `components/AppLayout.tsx` navItems array
+4. Add API hooks in `api/hooks.ts` if needed
+5. Add MSW handler in `test/mocks/handlers.ts`
+6. Add test in `test/pages.test.tsx`
diff --git a/.kiro/steering/kubernetes-conventions.md b/.kiro/steering/kubernetes-conventions.md
index 9a49a85..e00cecc 100644
--- a/.kiro/steering/kubernetes-conventions.md
+++ b/.kiro/steering/kubernetes-conventions.md
@@ -1,21 +1,41 @@
 ---
 inclusion: fileMatch
-fileMatchPattern: "infra/k8s/**"
+fileMatchPattern: "infra/**"
 ---
-# Kubernetes Conventions
+# Kubernetes & Helm Conventions
 
 ## Namespace
 All Stonks Oracle workloads deploy to `stonks-oracle` namespace.
+The namespace is NOT managed by Helm — it's created by `runmefirst.sh` with Helm ownership labels.
+
+## Helm Chart
+- Chart at `infra/helm/stonks-oracle/`
+- Services defined in `values.yaml` under `services:` — the deployments template iterates over them
+- Adding a new service: add entry to `values.yaml`, add network policy if it needs ingress, add ingress if it needs external access
+- Dashboard uses nginx-unprivileged on port 8080 (not 80)
+- Superset uses custom image `ghcr.io/celesrenata/stonks-oracle/superset:latest` with trino + psycopg2 drivers
 
 ## TLS
 - Internal services: use `ca-issuer` ClusterIssuer (local CA)
-- Public-facing services (Superset, Query API): use `celestium-le-production` ClusterIssuer (Let's Encrypt)
-- Annotate ingress with `cert-manager.io/cluster-issuer`
+- Annotate ingress with `cert-manager.io/cluster-issuer: ca-issuer`
 
 ## Ingress
 - Traefik ingress controller
 - Domain pattern: `.celestium.life`
-- Always create both HTTP and HTTPS ingress rules
+- Dashboard: `stonks.celestium.life`
+- Query API: `stonks-api.celestium.life`
+- Symbol Registry: `stonks-registry.celestium.life`
+- Superset: `stonks-dash.celestium.life`
+- Trino: `stonks-trino.celestium.life`
+
+## Network Policies
+- `default-deny-ingress` blocks all ingress by default
+- Each service that needs ingress must have an explicit allow policy
+- Dashboard needs: ingress from kube-system (Traefik) on 8080
+- Query API needs: ingress from kube-system + dashboard pod on 8000
+- Symbol Registry needs: ingress from kube-system + dashboard pod on 8000
+- Risk Engine needs: ingress from broker-adapter + query-api + dashboard on 8000
+- When adding a new externally-accessible service, add both an ingress AND a network policy
 
 ## Service References
 - PostgreSQL: `postgresql-rw.postgresql-service.svc.cluster.local:5432`
@@ -25,9 +45,10 @@ All Stonks Oracle workloads deploy to `stonks-oracle` namespace.
 
 ## Images
 - All images from `ghcr.io/celesrenata/stonks-oracle/:latest`
-- Use `imagePullPolicy: Always` in production
-- Use `imagePullSecrets` referencing `ghcr-secret` if repo is private
+- Use `imagePullPolicy: Always`
+- Use `imagePullSecrets` referencing `ghcr-credentials`
 
 ## Labels
 - `app.kubernetes.io/part-of: stonks-oracle`
 - `app: `
+- `stonks-oracle/tier: ` (api, frontend, processing, trading, orchestration, analytics)
diff --git a/.kiro/steering/project-context.md b/.kiro/steering/project-context.md
index 031bdf0..cab269c 100644
--- a/.kiro/steering/project-context.md
+++ b/.kiro/steering/project-context.md
@@ -7,34 +7,56 @@ Python monorepo with services under `services/`, infrastructure under `infra/`,
 ## Local Dev Environment
 - NixOS dev environment, Python 3.12
 - Virtual environment at `.venv/` — always use it for Python commands
+- For tools not in `.venv/` (like `ruff`, `gh`), use `nix-shell -p --run ""`
 - Node.js 24 for frontend (`frontend/` directory)
-- Docker available locally for image builds
+- Docker available locally for image builds (but let CI handle pushes)
+
+## Live Endpoints
+- Dashboard: `https://stonks.celestium.life`
+- Query API: `https://stonks-api.celestium.life`
+- Symbol Registry: `https://stonks-registry.celestium.life`
+- Superset: `https://stonks-dash.celestium.life`
+- Trino: `https://stonks-trino.celestium.life`
 
 ## Infrastructure
 - Kubernetes cluster: 4x NixOS nodes (gremlin-1 through gremlin-4), reachable via `kubectl`, `virtctl`, `ssh root@gremlin-{1,2,3,4}`
 - NixOS configs stored at `/etc/nixos` on gremlin-1, git-pushed to other hosts
 - Ingress: Traefik, domain `*.celestium.life`
-- Cert-Manager: `ca-issuer` (local CA) for internal services, `celestium-le-production` (Let's Encrypt) for public-facing
+- Cert-Manager: `ca-issuer` (local CA) for internal services
 - Container registry: `ghcr.io/celesrenata/stonks-oracle`
 
 ## CI/CD
 - GitHub Actions workflow at `.github/workflows/build.yml`
-- Push to `main` triggers: lint → test → build all service images + dashboard image → push to GHCR
+- Push to `main` triggers: lint → pytest → frontend vitest → build all service images + dashboard + superset → push to GHCR
 - Images tagged as `ghcr.io/celesrenata/stonks-oracle/:` and `:latest`
-- Dashboard image built from `frontend/Dockerfile` (multi-stage: node → nginx)
-- Python service images built from `docker/Dockerfile` with `SERVICE_CMD` build arg
-- Let CI handle image builds and pushes — only build locally for testing or when CI is unavailable
+- Dashboard image: `frontend/Dockerfile` (multi-stage: node:24 → nginx-unprivileged on port 8080)
+- Superset image: `docker/Dockerfile.superset` (apache/superset + trino + psycopg2)
+- Python service images: `docker/Dockerfile` with `SERVICE_CMD` build arg
+- Let CI handle image builds and pushes — do NOT manually `docker build && docker push`
+- Check CI status: `nix-shell -p gh --run "gh run list -L 3"`
+
+## Deployment Scripts
+- `~/sources/kube/stonks-oracle/runmefirst.sh` — full deploy: DB setup, migrations, Helm install, rolling restart
+- `~/sources/kube/stonks-oracle/runmelast.sh` — teardown: Helm uninstall, clean resources (preserves DB/MinIO/Redis)
+- After CI builds, deploy with: `helm upgrade --install stonks-oracle infra/helm/stonks-oracle -n stonks-oracle`
+- Restart a single service: `kubectl rollout restart deployment/ -n stonks-oracle`
+
+## API Secrets
+- Stored as files in repo root (gitignored): `polygon.io.key`, `alpaca.key`, `alpaca.secret`, `alpaca.url`
+- GitHub token at `/run/secrets/github_token`
+- Injected into K8s secrets via `runmefirst.sh` Helm `--set` flags
 
 ## Existing Cluster Services (do NOT redeploy these)
 - PostgreSQL: `postgresql-rw.postgresql-service.svc.cluster.local:5432`
 - Redis: `redis-master.redis-service.svc.cluster.local:6379`
-- MinIO: `minio.minio-service.svc.cluster.local:80` (API), console at `minio-crawler-console.minio-service.svc.cluster.local:9090`
+- MinIO: `minio.minio-service.svc.cluster.local:80` (API)
 - Ollama: `ollama.ollama-service.svc.cluster.local:11434` (cluster-internal), also at `http://10.1.1.12:2701` (external), GPU: 4070 Ti Super 16GB
 
 ## Key Conventions
 - All services use `services/shared/config.py` for configuration via env vars
 - Redis queues defined in `services/shared/redis_keys.py`
 - Pydantic schemas in `services/shared/schemas.py`
-- K8s manifests in `infra/k8s/`, Helm chart in `infra/helm/stonks-oracle/`, all in `stonks-oracle` namespace
+- Helm chart in `infra/helm/stonks-oracle/`, all in `stonks-oracle` namespace
 - Lakehouse DDL in `lakehouse/schemas/`
-- Crawler patterns inspired by Noctipede (`~/sources/splinterstice/noctipede`): BeautifulSoup + requests with retry adapters, content hashing, boilerplate stripping, quality scoring
+- Frontend proxies: `/api/` → query-api:8000, `/registry/` → symbol-registry:8000, `/risk/` → risk:8000
+- Network policies: default-deny with explicit allow rules per service
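
Review note on `.kiro/hooks/run-tests-on-save.md`: the two branches the updated hook describes in prose can be sketched as a small routing function. This is only an illustration of the described behavior, not code from the repository. The function name `hook_command_for` is hypothetical, and the sketch assumes that a service file always sits at least one directory below `services/`. It does not check whether matching test files exist, and it leaves out the lint-only fallback.

```python
from pathlib import PurePosixPath
from typing import Optional


def hook_command_for(saved_path: str) -> Optional[str]:
    """Return the shell command the hook text describes for a saved file.

    Hypothetical helper: returns None when the path matches neither branch.
    """
    path = PurePosixPath(saved_path)
    parts = path.parts

    # Branch 1: Python file under services/<service>/...
    # Assumption: the first directory below services/ is the service name,
    # e.g. services/ingestion/worker.py -> ingestion.
    if len(parts) >= 3 and parts[0] == "services" and path.suffix == ".py":
        service_name = parts[1]
        # The glob is left for the shell to expand, as in the hook text.
        return f"python -m pytest tests/test_{service_name}*.py -x --tb=short -q"

    # Branch 2: TypeScript/React file under frontend/src/
    if parts[:2] == ("frontend", "src") and path.suffix in {".ts", ".tsx"}:
        # The hook runs this from the frontend/ directory.
        return "npx vitest --run"

    return None
```

For example, `hook_command_for("services/ingestion/worker.py")` yields the `test_ingestion*.py` pytest invocation, while any path outside both trees yields `None`, which matches the hook staying silent for files its `filePattern` does not cover.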