Development Process — Test-Develop-Debug
Local Environment
- Ubuntu dev machine, Python 3.12, virtualenv at `.venv/`
- Always use `.venv/bin/python` or activate with `source .venv/bin/activate` before running Python commands
- Node.js 24 via nvm: always load nvm before running Node/npm/npx commands: `export NVM_DIR="$HOME/.nvm" && [ -s "$NVM_DIR/nvm.sh" ] && . "$NVM_DIR/nvm.sh" && nvm use 24`
- For tools not available in `.venv/` (ruff, gh, etc.), install via pip or apt as needed
- Frontend tests: load nvm first, then `cd frontend && npx vitest --run`
- Python tests: `.venv/bin/ruff check services/` then `.venv/bin/python -m pytest tests/ -x --tb=short -q`
Workflow
- Write or update tests for the target behavior
- Implement the minimal code to pass
- Debug failures, fix, re-run
- Commit and push — CI builds images automatically
- Deploy: `helm upgrade --install stonks-oracle infra/helm/stonks-oracle -n stonks-oracle`
- Restart changed services: `kubectl rollout restart deployment/<name> -n stonks-oracle`
Testing
- Python: `pytest` with `pytest-asyncio` for async code, tests in `tests/`
- Property-based tests: Hypothesis with `@settings(max_examples=100)`, files prefixed `test_pbt_*`
- Frontend: Vitest + MSW (Mock Service Worker) for deterministic API mocking, tests in `frontend/src/test/`
- Run Python tests: `python -m pytest tests/ -x --tb=short -q`
- Run frontend tests: `cd frontend && npx vitest --run`
- Lint Python: `.venv/bin/ruff check services/`
- Always run `.venv/bin/ruff check services/` before committing Python changes; CI will reject the push if ruff fails
- Ruff auto-fix: `.venv/bin/ruff check --fix services/` (fixes import sorting and other auto-fixable issues)
- Ruff is pinned to `ruff==0.15.10` in `requirements.txt`; CI uses the same version
- Ruff config: `ruff.toml` with `known-first-party = ["services"]` for consistent import sorting
- Pre-existing test failures (not regressions): `test_extractor_prompts.py`, `test_extractor_schemas.py`, `test_filings_adapter.py`, `test_ollama_client.py`
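The lint-then-test order described in this section can be scripted as one pre-commit check. The following is a minimal sketch, assuming the ruff and pytest commands listed in this document and a working directory at the repository root; `run_checks` and its `runner` parameter are hypothetical names, not part of the repo.

```python
import subprocess
from collections.abc import Callable, Sequence

# Commands copied from this document; paths are relative to the repo root.
CHECKS: list[list[str]] = [
    [".venv/bin/ruff", "check", "services/"],
    [".venv/bin/python", "-m", "pytest", "tests/", "-x", "--tb=short", "-q"],
]


def _run(cmd: Sequence[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(list(cmd), check=False).returncode


def run_checks(runner: Callable[[Sequence[str]], int] = _run) -> int:
    """Run ruff, then pytest; stop at the first non-zero exit code."""
    for cmd in CHECKS:
        code = runner(cmd)
        if code != 0:
            print(f"FAILED: {' '.join(cmd)}")
            return code
    return 0
```

Calling `raise SystemExit(run_checks())` from a small script would run both steps and propagate the failing exit code. The injectable `runner` is only there so the ordering logic can be exercised without invoking the real tools.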
CI/CD — GitHub Actions
- Workflow: `.github/workflows/build.yml`
- Triggers on push to `main` and PRs
- Jobs:
  - `lint-and-test`: ruff lint + pytest + frontend vitest (Node 24)
  - `build-services`: matrix build of all Python services → GHCR
  - `build-dashboard`: frontend/Dockerfile → GHCR (TypeScript strict mode, which catches unused imports)
  - `build-superset`: docker/Dockerfile.superset → GHCR
- CI handles all image builds and pushes; do NOT manually docker push
- Check CI: `gh run list -L 3`
- Re-run failed: `gh run rerun <id> --failed`
- View failure logs: `gh run view <id> --log-failed`
Deploy
- Full deploy/redeploy: `bash ~/sources/kube/stonks-oracle/runmefirst.sh` (from gremlin-1)
- Full teardown: `bash ~/sources/kube/stonks-oracle/runmelast.sh` (from gremlin-1)
- Quick Helm upgrade: `helm upgrade --install stonks-oracle infra/helm/stonks-oracle -n stonks-oracle`
- Restart single service: `kubectl rollout restart deployment/<name> -n stonks-oracle`
- Restart multiple: `kubectl rollout restart deployment/aggregation deployment/query-api deployment/dashboard -n stonks-oracle`
- Check pods: `kubectl get pods -n stonks-oracle`
- Check logs: `kubectl logs deployment/<name> -n stonks-oracle --tail=30`
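When several services change at once, the restart command can be assembled from a list of deployment names. This is a small sketch only; `rollout_restart_cmd` is a hypothetical helper, and the deployment names are the ones used as examples in this section.

```python
import shlex

NAMESPACE = "stonks-oracle"  # namespace used throughout this document


def rollout_restart_cmd(deployments: list[str], namespace: str = NAMESPACE) -> list[str]:
    """Build a `kubectl rollout restart` argv for one or more deployments."""
    if not deployments:
        raise ValueError("at least one deployment name is required")
    targets = [f"deployment/{name}" for name in deployments]
    return ["kubectl", "rollout", "restart", *targets, "-n", namespace]


cmd = rollout_restart_cmd(["aggregation", "query-api", "dashboard"])
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run` as-is; printing it with `shlex.join` gives a line that can be reviewed before running.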
Manually Triggering Ingestion
The scheduler runs on a cadence, but to trigger immediately:
# From a scheduler pod:
kubectl exec -n stonks-oracle <scheduler-pod> -- python -c "
import redis, json, asyncio, asyncpg
async def enqueue():
    pool = await asyncpg.create_pool(dsn='postgresql://stonks:<password>@postgresql-rw.postgresql-service.svc.cluster.local:5432/stonks')
    r = redis.from_url('redis://:<password>@redis-master.redis-service.svc.cluster.local:6379/0')
    rows = await pool.fetch('SELECT s.id AS source_id, s.source_type, s.config, c.id AS company_id, c.ticker FROM sources s JOIN companies c ON c.id = s.company_id WHERE s.active = TRUE AND c.active = TRUE')
    for row in rows:
        cfg = row['config'] if isinstance(row['config'], dict) else {}
        r.rpush('stonks:queue:ingestion', json.dumps({'source_id': str(row['source_id']), 'source_type': row['source_type'], 'ticker': row['ticker'], 'company_id': str(row['company_id']), 'config': cfg}))
    await pool.close()
asyncio.run(enqueue())
"
Ingestion jobs MUST include source_id, source_type, ticker, company_id, and config.
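The required-fields rule can be enforced before a job reaches the queue. The sketch below is an illustration under stated assumptions: `build_ingestion_job` and `_normalize_config` are hypothetical helpers, and the row is taken to be a plain dict with the column aliases from the query above. Unlike the one-liner above, it also parses `config` when it arrives as a JSON string, since asyncpg returns `json`/`jsonb` columns as `str` unless a codec is registered; whether that applies depends on the actual column type.

```python
import json
from typing import Any

REQUIRED_FIELDS = ("source_id", "source_type", "ticker", "company_id", "config")


def _normalize_config(raw: Any) -> dict[str, Any]:
    """Return config as a dict; fall back to {} for anything unusable."""
    if isinstance(raw, dict):
        return raw
    if isinstance(raw, str):
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            return {}
        return parsed if isinstance(parsed, dict) else {}
    return {}


def build_ingestion_job(row: dict[str, Any]) -> str:
    """Serialize one sources/companies row into an ingestion job payload."""
    missing = [name for name in REQUIRED_FIELDS if name not in row]
    if missing:
        raise ValueError(f"ingestion job is missing fields: {missing}")
    return json.dumps({
        "source_id": str(row["source_id"]),      # UUID -> str for JSON
        "source_type": row["source_type"],
        "ticker": row["ticker"],
        "company_id": str(row["company_id"]),    # UUID -> str for JSON
        "config": _normalize_config(row["config"]),
    })
```

The resulting string is what would be pushed with `r.rpush('stonks:queue:ingestion', ...)`; a row that lacks any required key raises `ValueError` instead of enqueuing a partial job.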
Git Conventions
- Commit after each completed phase task
- Commit message format: `feat:`, `fix:`, `phase N:` prefix
- Push to `main` triggers CI
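The prefix convention can be checked mechanically, for example from a commit-msg hook. This is a rough sketch: `has_valid_prefix` is a hypothetical name, and the pattern assumes that `N` is an integer and that a space and a non-empty subject follow the colon, neither of which this document specifies.

```python
import re

# Prefixes listed in this document: "feat:", "fix:", "phase N:".
# Assumption: N is an integer and a subject follows after one space.
COMMIT_PREFIX = re.compile(r"^(feat|fix|phase \d+): \S")


def has_valid_prefix(message: str) -> bool:
    """Return True if the first line starts with an accepted prefix."""
    first_line = message.splitlines()[0] if message else ""
    return COMMIT_PREFIX.match(first_line) is not None
```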
Code Style
- Python 3.12, type hints everywhere
- Pydantic for data validation
- FastAPI for HTTP services
- asyncio + asyncpg/aioredis for async I/O
- Minimal dependencies, prefer stdlib where possible
- Frontend: React 19, TypeScript strict mode, Tailwind CSS, TanStack Router/Query, Recharts for charts
- UUID fields from asyncpg must be converted to str via `_row_dict()` helpers
- asyncpg interval parameters must be Python `timedelta` objects, not SQL strings
- Recharts v3 callbacks: do NOT add explicit type annotations on `formatter`/`tickFormatter`; let TypeScript infer
Common Pitfalls
- asyncpg expects `timedelta` for `$N::interval` params, not strings like `'7 days'`
- asyncpg expects UUID objects/strings for `$N::uuid` params; synthetic IDs like `pattern:AAPL:earnings:7d` will fail
- The `competitor_relationships` table uses UUID company IDs; queries must join through `companies` to match by ticker
- The dashboard Docker build uses TypeScript strict mode; unused imports that pass local diagnostics will fail in CI
- Ingestion jobs require `source_id` from the `sources` table; don't just pass `ticker`
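The first two pitfalls can be guarded against before a query is issued. A minimal sketch, with `days_interval` and `is_uuid` as hypothetical helper names; it uses only the standard library and does not touch the database.

```python
import uuid
from datetime import timedelta


def days_interval(days: int) -> timedelta:
    """Value to bind to a $N::interval parameter instead of a string like '7 days'."""
    return timedelta(days=days)


def is_uuid(value: str) -> bool:
    """Return True if value parses as a UUID and can be bound to $N::uuid."""
    try:
        uuid.UUID(value)
    except ValueError:
        return False
    return True


print(days_interval(7))                     # 7 days, 0:00:00
print(is_uuid("pattern:AAPL:earnings:7d"))  # False
```

A caller can use `is_uuid` to route synthetic IDs away from UUID-typed queries instead of letting asyncpg raise on the bind.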
Documentation
- Do NOT create large summary/success markdown files after each step
- Keep notes short, concise, and organized under `docs/notes/`
- If a note isn't useful for future reference, don't write it