Celes Renata c85c0068a2 fix: clean up utcnow deprecation warnings, fix 12 failing tests, add CI/CD pipeline manifests
- Replace all datetime.utcnow() with datetime.now(tz=timezone.utc) across 8 files
- Fix 12 failing tests to match current implementation behavior
- Fix pytest_plugins in non-top-level conftest (moved to root conftest.py)
- Auto-fix 189 lint issues (import sorting, unused imports)
- Add CI/CD pipeline infrastructure (ARC, ArgoCD, Kargo manifests)
- Add values-beta.yaml and values-paper.yaml for staged deployments
- Update GitHub Actions workflow to use self-hosted-gremlin runners
- Add integration-test job to CI pipeline

Result: 1596 passed, 0 failed, 0 warnings
2026-04-18 03:59:28 +00:00

# Integration Test Pipeline — Design
## Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ Kubernetes Job: inttest-runner │
│ │
│ Stage 1: Create namespace stonks-inttest-{id} │
│ Stage 2: Deploy infra (postgres, redis, minio) │
│ Stage 3: Run migrations + seed data │
│ Stage 4: Deploy services (query-api, registry, risk, trading) │
│ Stage 5: Wait for readiness │
│ Stage 6: Run API integration tests (pytest) │
│ Stage 7: Run frontend render tests (vitest + fetch) │
│ Stage 8: Collect profiling data │
│ Stage 9: Teardown namespace │
└─────────────────────────────────────────────────────────────────┘
```
## Implementation Approach
### Option A: Kubernetes Job with Python test runner (chosen)
A single Kubernetes Job that:
1. Creates ephemeral infra via `kubectl apply` (postgres, redis, minio manifests)
2. Runs a Python seed script directly against the DB
3. Deploys service pods pointing at the ephemeral infra
4. Runs pytest-based integration tests against the live services
5. Collects timing metrics
6. Tears down everything
### Why not Helm?
The sandbox doesn't need Helm's complexity. Plain manifests are simpler, faster, and easier to debug. The infra is ephemeral — no upgrades, no rollbacks.
## Components
### 1. Sandbox Manifests (`infra/inttest/`)
```
infra/inttest/
├── namespace.yaml # Namespace template
├── postgres.yaml # PostgreSQL 16 StatefulSet (no PV)
├── redis.yaml # Redis 7 Deployment
├── minio.yaml # MinIO Deployment + bucket init Job
├── services.yaml # All 4 API services (query-api, registry, risk, trading)
└── runner.yaml # The test runner Job itself
```
Each manifest uses a `${NAMESPACE}` placeholder, which the runner script substitutes at runtime with `envsubst`.
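The placeholder convention can be illustrated in Python as well. This is only a sketch: the runner script in this spec performs the substitution with `envsubst`, and the `render_manifest` helper and the sample manifest fragment below are hypothetical.

```python
from string import Template

# Hypothetical manifest fragment; the real manifests live in infra/inttest/.
MANIFEST = """\
apiVersion: v1
kind: Service
metadata:
  name: postgres
  namespace: ${NAMESPACE}
"""


def render_manifest(text: str, namespace: str) -> str:
    """Replace ${NAMESPACE} and leave any other $-placeholders untouched."""
    return Template(text).safe_substitute(NAMESPACE=namespace)


if __name__ == "__main__":
    print(render_manifest(MANIFEST, "stonks-inttest-1705312800"))
```

`safe_substitute` is used so that an unrelated placeholder in a manifest is passed through instead of raising an error.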
### 2. Seed Script (`tests/integration/seed_sandbox.py`)
Pure SQL + MinIO operations. No external API calls. Inserts:
- Companies, sources, aliases, competitor relationships
- Documents with intelligence records
- Trend windows with evidence and projections
- Recommendations with evidence citations
- Orders, positions, trading decisions
- Global events with macro impacts
- AI agents with variants and performance logs
- Trading engine config, portfolio snapshots
- MinIO objects (normalized text files)
All UUIDs are deterministic (hardcoded) for reproducible assertions.
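A minimal sketch of how stable seed identifiers could be produced. The spec hardcodes its UUIDs; deriving them with `uuid.uuid5` from a fixed namespace is an alternative with the same reproducibility property, shown here as an assumption. `SEED_NAMESPACE`, `seed_id`, and the ticker list are hypothetical.

```python
import uuid

# Hypothetical fixed namespace; any constant UUID works as long as it never changes.
SEED_NAMESPACE = uuid.UUID("12345678-1234-5678-1234-567812345678")


def seed_id(kind: str, key: str) -> uuid.UUID:
    """Derive the same UUID for the same (kind, key) pair on every run."""
    return uuid.uuid5(SEED_NAMESPACE, f"{kind}:{key}")


# Illustrative tickers only; the real seed set is defined in seed_sandbox.py.
SEED_COMPANY_IDS = {ticker: seed_id("company", ticker) for ticker in ("AAPL", "MSFT")}

if __name__ == "__main__":
    for ticker, company_id in SEED_COMPANY_IDS.items():
        print(ticker, company_id)
```

Because the IDs depend only on constants, tests can import the same mapping and assert against exact values without querying the database first.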
### 3. Integration Tests (`tests/integration/test_api_endpoints.py`)
pytest-based tests that call every API endpoint the frontend depends on:
```python
# Assumed to be provided by the test module or conftest:
#   client        - an async HTTP client bound to the sandbox query-api
#   SEED_TREND_ID - the UUID of a trend inserted by seed_sandbox.py


class TestQueryAPI:
    async def test_companies_list(self):
        resp = await client.get("/api/companies")
        assert resp.status_code == 200
        data = resp.json()
        assert len(data) >= 5
        assert all("ticker" in c for c in data)

    async def test_trend_detail(self):
        resp = await client.get(f"/api/trends/{SEED_TREND_ID}")
        assert resp.status_code == 200
        data = resp.json()
        assert data["trend_direction"] in ("bullish", "bearish", "mixed")
        assert 0 <= data["confidence"] <= 1

    # ... 40+ test functions
```
### 4. Frontend Render Tests (`tests/integration/test_frontend_renders.py`)
Uses the live sandbox APIs (not MSW mocks) to verify each page's data dependencies:
```python
class TestFrontendDataDeps:
    """Verify every API call each frontend page makes returns valid data."""

    async def test_home_page_deps(self):
        # Home page calls: companies, pipeline health, ingestion summary,
        # recommendations (list abridged here)
        for endpoint in [
            "/api/companies",
            "/api/ops/pipeline/health",
            "/api/ops/ingestion/summary",
        ]:
            resp = await client.get(endpoint)
            assert resp.status_code == 200

    async def test_company_detail_deps(self):
        # CompanyDetail calls 12 different endpoints; four are shown here
        company_id = SEED_COMPANY_IDS["AAPL"]
        for endpoint in [
            f"/api/companies/{company_id}",
            f"/api/companies/{company_id}/sources",
            "/api/companies/AAPL/macro-impacts",  # keyed by ticker, not UUID
            f"/api/companies/{company_id}/competitors",
        ]:
            resp = await client.get(endpoint)
            assert resp.status_code == 200
```
### 5. Profiler (`tests/integration/profiler.py`)
Wraps each test with timing:
- Records wall-clock time per API call
- Computes P50/P95/P99 across all calls
- Outputs a summary table at the end
- Flags any endpoint > 500ms as "slow"
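The behaviour listed above could be sketched as follows. This is an illustration, not the actual `profiler.py`: the `Profiler` class, the nearest-rank percentile method, and the rule that an endpoint is slow when its worst sample exceeds 500 ms are all assumptions.

```python
import math
import time
from collections import defaultdict
from contextlib import contextmanager

SLOW_THRESHOLD_MS = 500  # from the spec: anything above this is flagged "slow"


class Profiler:
    def __init__(self):
        # endpoint -> list of wall-clock durations in milliseconds
        self.samples = defaultdict(list)

    @contextmanager
    def measure(self, endpoint):
        """Time the enclosed block and record it against the endpoint."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.samples[endpoint].append((time.perf_counter() - start) * 1000)

    @staticmethod
    def percentile(values, pct):
        """Nearest-rank percentile: smallest sample with pct% of samples at or below it."""
        ordered = sorted(values)
        rank = max(1, math.ceil(pct / 100 * len(ordered)))
        return ordered[rank - 1]

    def summary(self):
        """Return {endpoint: {"p50_ms": ..., "p95_ms": ..., "p99_ms": ...}}."""
        return {
            endpoint: {f"p{p}_ms": self.percentile(values, p) for p in (50, 95, 99)}
            for endpoint, values in self.samples.items()
        }

    def slow_endpoints(self):
        # Assumption: "slow" means the worst observed call exceeded the threshold.
        return sorted(e for e, v in self.samples.items() if max(v) > SLOW_THRESHOLD_MS)
```

A test would wrap each request in `with profiler.measure("/api/companies"):` and the runner would write `summary()` and `slow_endpoints()` out at the end of the session.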
### 6. Runner Script (`infra/inttest/run_pipeline.sh`)
Standalone orchestration script with a well-defined CLI contract so any CI/CD system (or a human) can invoke it. The future CI/CD pipeline spec will call this script as a stage.
**CLI interface:**
```
Usage: bash infra/inttest/run_pipeline.sh [OPTIONS]

Options:
  --image-tag TAG       Docker image tag to deploy (default: latest)
  --namespace NAME      Override namespace name (default: stonks-inttest-<timestamp>)
  --skip-teardown       Leave namespace running after tests (for debugging)
  --results-file PATH   Path for JSON results output (default: inttest-results.json)

Exit codes:
  0   All tests passed
  1   One or more test failures
  2   Infrastructure setup failure (postgres/redis/minio/services didn't start)
```
**JSON result contract** (`inttest-results.json`):
```json
{
  "run_id": "stonks-inttest-1705312800",
  "image_tag": "abc123",
  "started_at": "2025-01-15T12:00:00Z",
  "completed_at": "2025-01-15T12:07:30Z",
  "exit_code": 0,
  "stages": {
    "infra_deploy": {"duration_s": 45.2, "status": "ok"},
    "seed_data": {"duration_s": 8.1, "status": "ok"},
    "service_deploy": {"duration_s": 32.5, "status": "ok"},
    "integration_tests": {"duration_s": 28.3, "status": "ok"},
    "teardown": {"duration_s": 5.0, "status": "ok"}
  },
  "tests": {
    "total": 41,
    "passed": 41,
    "failed": 0,
    "errors": 0
  },
  "profiling": {
    "endpoints": {
      "/api/companies": {"p50_ms": 12, "p95_ms": 25, "p99_ms": 45},
      ...
    },
    "slow_endpoints": []
  }
}
```
This contract is designed so the future CI/CD pipeline can:
1. Parse `exit_code` to decide whether to promote to the next stage
2. Parse `profiling.slow_endpoints` to flag performance regressions
3. Archive the full JSON as a build artifact
4. Display `tests.passed`/`tests.failed` in a dashboard
```bash
#!/bin/bash
set -euo pipefail

# Parse CLI args
IMAGE_TAG="latest"
NAMESPACE="stonks-inttest-$(date +%s)"
SKIP_TEARDOWN=false
RESULTS_FILE="inttest-results.json"
while [[ $# -gt 0 ]]; do
  case $1 in
    --image-tag)     IMAGE_TAG="$2"; shift 2 ;;
    --namespace)     NAMESPACE="$2"; shift 2 ;;
    --skip-teardown) SKIP_TEARDOWN=true; shift ;;
    --results-file)  RESULTS_FILE="$2"; shift 2 ;;
    *) echo "Unknown option: $1" >&2; exit 2 ;;
  esac
done

# Cleanup function (always runs, even on failure)
cleanup() {
  if [ "$SKIP_TEARDOWN" = false ]; then
    kubectl delete namespace "$NAMESPACE" --wait=false 2>/dev/null || true
  fi
}
trap cleanup EXIT

# Any failure before the tests start is an infrastructure failure (exit code 2)
infra_fail() { echo "Infrastructure setup failed: $1" >&2; exit 2; }

# Stage 1: Create namespace
kubectl create namespace "$NAMESPACE" || infra_fail "namespace"

# Stage 2: Deploy infra
kubectl create configmap postgres-migrations --from-file=infra/migrations/ -n "$NAMESPACE" || infra_fail "migrations configmap"
export NAMESPACE
envsubst < infra/inttest/postgres.yaml | kubectl apply -n "$NAMESPACE" -f - || infra_fail "postgres"
envsubst < infra/inttest/redis.yaml | kubectl apply -n "$NAMESPACE" -f - || infra_fail "redis"
envsubst < infra/inttest/minio.yaml | kubectl apply -n "$NAMESPACE" -f - || infra_fail "minio"
kubectl wait --for=condition=ready pod -l app=postgres -n "$NAMESPACE" --timeout=120s || infra_fail "postgres not ready"
kubectl wait --for=condition=ready pod -l app=redis -n "$NAMESPACE" --timeout=60s || infra_fail "redis not ready"
kubectl wait --for=condition=ready pod -l app=minio -n "$NAMESPACE" --timeout=60s || infra_fail "minio not ready"

# Stage 3: Seed data (run from a pod with DB access)
# ... seed runner pod ...

# Stage 4: Deploy services (using specified image tag)
envsubst < infra/inttest/services.yaml | sed "s/:latest/:${IMAGE_TAG}/g" | kubectl apply -n "$NAMESPACE" -f - || infra_fail "services"
kubectl wait --for=condition=ready pod -l tier=api -n "$NAMESPACE" --timeout=120s || infra_fail "services not ready"

# Stage 5: Run integration tests
envsubst < infra/inttest/runner.yaml | sed "s/:latest/:${IMAGE_TAG}/g" | kubectl apply -n "$NAMESPACE" -f - || infra_fail "test runner"
# A failed or timed-out Job must not abort the script before results are collected
TEST_EXIT=0
kubectl wait --for=condition=complete job/inttest-runner -n "$NAMESPACE" --timeout=600s || TEST_EXIT=1

# Stage 6: Collect results (assumes the runner Job prints the JSON result document to stdout)
kubectl logs job/inttest-runner -n "$NAMESPACE" > "$RESULTS_FILE" || true

# Stage 7: Teardown (handled by trap)
exit "$TEST_EXIT"
```
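On the consuming side, the contract can be read in a few lines. The following is a minimal sketch, assuming the JSON shape shown earlier in this section; `promotion_decision` is a hypothetical helper and not part of this spec, and the sample payload is trimmed to the fields it reads.

```python
import json

# Trimmed sample payload following the result contract in this section.
SAMPLE = json.loads("""
{
  "run_id": "stonks-inttest-1705312800",
  "exit_code": 0,
  "tests": {"total": 41, "passed": 41, "failed": 0, "errors": 0},
  "profiling": {"endpoints": {}, "slow_endpoints": []}
}
""")


def promotion_decision(results: dict) -> tuple[bool, list[str]]:
    """Return (promote, warnings) from a parsed inttest-results.json payload."""
    tests = results["tests"]
    promote = (
        results["exit_code"] == 0
        and tests["failed"] == 0
        and tests["errors"] == 0
    )
    # Slow endpoints are reported as warnings; they do not block promotion here.
    warnings = [f"slow endpoint: {e}" for e in results["profiling"]["slow_endpoints"]]
    return promote, warnings


if __name__ == "__main__":
    promote, warnings = promotion_decision(SAMPLE)
    print("promote" if promote else "block", warnings)
```

Treating slow endpoints as warnings and not as failures is a design choice made for this sketch; a stricter pipeline could block on them.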
## Profiling Strategy
### What to measure
1. **Seed insertion time** — how long to populate all tables
2. **Service startup time** — time from pod creation to readiness
3. **API response times** — per-endpoint P50/P95/P99
4. **Memory usage** — `kubectl top pods` snapshot during tests
### Performance targets
| Metric | Target | Action if exceeded |
|--------|--------|--------------------|
| Seed insertion | < 30s | Batch INSERT optimization |
| Service startup | < 30s each | Reduce import time, lazy loading |
| API P95 | < 200ms | Query optimization, indexes |
| API P99 | < 500ms | Connection pooling, caching |
| Total pipeline | < 10 min | Parallelize stages |
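The table above can be checked mechanically. The sketch below is one possible encoding: the thresholds and actions are copied from the table, while the metric key names and the `exceeded_targets` helper are hypothetical.

```python
# (threshold, action if exceeded); units are encoded in the hypothetical key names.
TARGETS = {
    "seed_insertion_s": (30.0, "Batch INSERT optimization"),
    "service_startup_s": (30.0, "Reduce import time, lazy loading"),
    "api_p95_ms": (200.0, "Query optimization, indexes"),
    "api_p99_ms": (500.0, "Connection pooling, caching"),
    "total_pipeline_s": (600.0, "Parallelize stages"),  # 10 min
}


def exceeded_targets(measured: dict[str, float]) -> list[tuple[str, str]]:
    """Return (metric, suggested action) for every value that misses its target.

    Targets are strict upper bounds ("< 30s"), so a value equal to the
    threshold counts as exceeded.
    """
    return [
        (name, TARGETS[name][1])
        for name, value in measured.items()
        if name in TARGETS and value >= TARGETS[name][0]
    ]


if __name__ == "__main__":
    print(exceeded_targets({"seed_insertion_s": 8.1, "api_p95_ms": 240.0}))
```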
### Optimization opportunities to discover
- Slow SQL queries (missing indexes, N+1 patterns)
- Heavy service startup (import chains)
- Inefficient aggregation math
- Unnecessary serialization overhead
- Connection pool sizing
## Data Flow
```
Seed Script
├── PostgreSQL: companies, documents, trends, recommendations, orders, ...
├── MinIO: normalized text files, audit artifacts
└── Redis: (empty — no queue state needed for API tests)
Integration Tests
├── Query API ← PostgreSQL (read-only queries)
├── Symbol Registry ← PostgreSQL (CRUD operations)
├── Risk Engine ← PostgreSQL (evaluation + approvals)
└── Trading Engine ← PostgreSQL + Redis (status, decisions, backtest)
```
## Namespace Lifecycle
```
CREATE namespace
→ Deploy postgres, redis, minio
→ Wait for healthy
→ Run migrations (init container)
→ Run seed script
→ Deploy services
→ Wait for ready
→ Run tests
→ Collect results
→ DELETE namespace (always, even on failure)
```
## Integration Contract for Future CI/CD Pipeline
This spec produces a standalone runner (`infra/inttest/run_pipeline.sh`) with a well-defined contract. A future spec ("CI/CD Deployment Pipeline") will consume it as one stage in a larger pipeline:
```
┌─────────────────────────────────────────────────────────────────────────┐
│ Future CI/CD Pipeline (separate spec) │
│ │
│ 1. Git push → webhook to self-hosted runner on gremlin nodes │
│ 2. Lint + Unit Tests (ruff, pytest, vitest) │
│ 3. Docker Build → push to GHCR (self-hosted, no GH Actions compute) │
│ 4. ┌──────────────────────────────────────────────────────────┐ │
│ │ Integration Tests (THIS SPEC) │ │
│ │ bash infra/inttest/run_pipeline.sh --image-tag $SHA │ │
│ │ → reads inttest-results.json │ │
│ │ → exit code 0 = promote, 1 = block │ │
│ └──────────────────────────────────────────────────────────┘ │
│ 5. Promote to beta namespace (if tests pass) │
│ 6. Promote to paper namespace (manual gate or auto) │
│ 7. Promote to live namespace (market-hours blocker + break-glass) │
│ │
│ Each stage has enable/disable toggle. │
│ Promotions blocked during market hours (9:30–16:00 ET) unless │
│ break-glass is activated. │
└─────────────────────────────────────────────────────────────────────────┘
```
**What this spec provides to the future pipeline:**
- `infra/inttest/run_pipeline.sh` — callable with `--image-tag` to test any build
- `inttest-results.json` — machine-readable results for promotion decisions
- Exit codes for pass/fail gating
- `--skip-teardown` for debugging failed runs
- All K8s manifests in `infra/inttest/` for sandbox lifecycle
- Deterministic seed data and comprehensive API test coverage