# Integration Test Pipeline — Design

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│  Kubernetes Job: inttest-runner                                 │
│                                                                 │
│  Stage 1: Create namespace stonks-inttest-{id}                  │
│  Stage 2: Deploy infra (postgres, redis, minio)                 │
│  Stage 3: Run migrations + seed data                            │
│  Stage 4: Deploy services (query-api, registry, risk, trading)  │
│  Stage 5: Wait for readiness                                    │
│  Stage 6: Run API integration tests (pytest)                    │
│  Stage 7: Run frontend render tests (vitest + fetch)            │
│  Stage 8: Collect profiling data                                │
│  Stage 9: Teardown namespace                                    │
└─────────────────────────────────────────────────────────────────┘
```

## Implementation Approach

### Option A: Kubernetes Job with Python test runner (chosen)
A single Kubernetes Job that:
1. Creates ephemeral infra via `kubectl apply` (postgres, redis, minio manifests)
2. Runs a Python seed script directly against the DB
3. Deploys service pods pointing at the ephemeral infra
4. Runs pytest-based integration tests against the live services
5. Collects timing metrics
6. Tears down everything

### Why not Helm?
The sandbox doesn't need Helm's complexity. Plain manifests are simpler, faster, and easier to debug. The infra is ephemeral, so there are no upgrades or rollbacks to manage.

## Components

### 1. Sandbox Manifests (`infra/inttest/`)

```
infra/inttest/
├── namespace.yaml      # Namespace template
├── postgres.yaml       # PostgreSQL 16 StatefulSet (no PV)
├── redis.yaml          # Redis 7 Deployment
├── minio.yaml          # MinIO Deployment + bucket init Job
├── services.yaml       # All 4 API services (query-api, registry, risk, trading)
└── runner.yaml         # The test runner Job itself
```

Each manifest uses a `${NAMESPACE}` placeholder that is substituted at runtime (the runner script pipes each file through `envsubst`).
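
The placeholder substitution can be sketched in Python as follows. This is an illustration only: it uses `string.Template` as a stand-in for the `envsubst` call in the runner script, and `render_manifest` and the sample manifest text are hypothetical.

```python
from string import Template


def render_manifest(manifest_text: str, namespace: str) -> str:
    """Fill in ${NAMESPACE}; unknown ${...} placeholders are left as-is."""
    return Template(manifest_text).safe_substitute(NAMESPACE=namespace)


# Hypothetical manifest fragment, not copied from infra/inttest/.
manifest = """\
apiVersion: v1
kind: Namespace
metadata:
  name: ${NAMESPACE}
  labels:
    purpose: inttest
"""

rendered = render_manifest(manifest, "stonks-inttest-1705312800")
print(rendered)
```

Using `safe_substitute` rather than `substitute` means a manifest that contains other `${...}` references does not raise an error; only `NAMESPACE` is replaced.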

### 2. Seed Script (`tests/integration/seed_sandbox.py`)

Pure SQL + MinIO operations. No external API calls. Inserts:
- Companies, sources, aliases, competitor relationships
- Documents with intelligence records
- Trend windows with evidence and projections
- Recommendations with evidence citations
- Orders, positions, trading decisions
- Global events with macro impacts
- AI agents with variants and performance logs
- Trading engine config, portfolio snapshots
- MinIO objects (normalized text files)

All UUIDs are deterministic (hardcoded) for reproducible assertions.
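
The seed script hardcodes its UUIDs. As a rough sketch of a different route to the same stability, the example below derives name-based UUIDv5 values instead of listing literals. The namespace UUID, the `seed_uuid` helper, and the `MSFT`/`NVDA` tickers are assumptions for illustration; only `SEED_COMPANY_IDS` and `AAPL` appear in the tests.

```python
import uuid

# Hypothetical fixed namespace; any constant UUID gives stable results.
SEED_NAMESPACE = uuid.UUID("12345678-1234-5678-1234-567812345678")


def seed_uuid(kind: str, key: str) -> uuid.UUID:
    """Derive a stable UUIDv5 from a record kind and its natural key."""
    return uuid.uuid5(SEED_NAMESPACE, f"{kind}:{key}")


# The same inputs always produce the same IDs, so tests can assert on them.
SEED_COMPANY_IDS = {
    ticker: seed_uuid("company", ticker) for ticker in ("AAPL", "MSFT", "NVDA")
}

print(SEED_COMPANY_IDS["AAPL"])
```

Either approach works for reproducible assertions. Literal constants are easier to grep for, while derived IDs avoid copy-paste collisions as the seed set grows.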

### 3. Integration Tests (`tests/integration/test_api_endpoints.py`)

pytest-based tests that call every API endpoint the frontend depends on:

```python
# `client` (an async HTTP client) and SEED_TREND_ID are assumed to come from
# shared fixtures and seed constants that are not shown here.
class TestQueryAPI:
    async def test_companies_list(self):
        resp = await client.get("/api/companies")
        assert resp.status_code == 200
        data = resp.json()
        assert len(data) >= 5
        assert all("ticker" in c for c in data)

    async def test_trend_detail(self):
        resp = await client.get(f"/api/trends/{SEED_TREND_ID}")
        assert resp.status_code == 200
        data = resp.json()
        assert data["trend_direction"] in ("bullish", "bearish", "mixed")
        assert 0 <= data["confidence"] <= 1

    # ... 40+ test functions
```

### 4. Frontend Render Tests (`tests/integration/test_frontend_renders.py`)

Uses the live sandbox APIs (not MSW mocks) to verify each page's data dependencies:

```python
class TestFrontendDataDeps:
    """Verify every API call each frontend page makes returns valid data."""

    async def test_home_page_deps(self):
        # Home page calls: companies, pipeline health, ingestion summary, recommendations
        for endpoint in ["/api/companies", "/api/ops/pipeline/health", "/api/ops/ingestion/summary"]:
            resp = await client.get(endpoint)
            assert resp.status_code == 200

    async def test_company_detail_deps(self):
        # CompanyDetail calls 12 different endpoints (subset shown)
        company_id = SEED_COMPANY_IDS["AAPL"]
        for endpoint in [
            f"/api/companies/{company_id}",
            f"/api/companies/{company_id}/sources",
            "/api/companies/AAPL/macro-impacts",  # keyed by ticker, not UUID
            f"/api/companies/{company_id}/competitors",
        ]:
            resp = await client.get(endpoint)
            assert resp.status_code == 200
```

### 5. Profiler (`tests/integration/profiler.py`)

Wraps each test with timing:
- Records wall-clock time per API call
- Computes P50/P95/P99 across all calls
- Outputs a summary table at the end
- Flags any endpoint > 500ms as "slow"
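
A minimal sketch of the percentile and "slow" logic described above. It assumes nearest-rank percentiles and treats an endpoint as slow when its P99 exceeds 500 ms. Both choices, along with the `percentile` and `summarize` names, are assumptions and not taken from `profiler.py`.

```python
SLOW_THRESHOLD_MS = 500  # threshold from the "slow" rule above


def percentile(sorted_ms: list[float], pct: int) -> float:
    """Nearest-rank percentile of an ascending, non-empty list."""
    # Integer ceiling of pct * n / 100 avoids float rounding surprises.
    rank = max(1, (pct * len(sorted_ms) + 99) // 100)
    return sorted_ms[rank - 1]


def summarize(samples: dict[str, list[float]]) -> dict:
    """Build per-endpoint P50/P95/P99 stats plus the list of slow endpoints."""
    endpoints = {}
    slow = []
    for endpoint, timings in samples.items():
        ordered = sorted(timings)
        stats = {f"p{p}_ms": percentile(ordered, p) for p in (50, 95, 99)}
        endpoints[endpoint] = stats
        # Assumption: "slow" means the P99 is above the threshold.
        if stats["p99_ms"] > SLOW_THRESHOLD_MS:
            slow.append(endpoint)
    return {"endpoints": endpoints, "slow_endpoints": slow}


report = summarize({
    "/api/companies": list(range(1, 101)),  # 1..100 ms
    "/api/slow": [100, 700],                # hypothetical slow endpoint
})
print(report["slow_endpoints"])
```

The returned dictionary has the same shape as the `profiling` section of the JSON result contract, so it could be embedded there directly.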

### 6. Runner Script (`infra/inttest/run_pipeline.sh`)

Standalone orchestration script with a well-defined CLI contract so any CI/CD system (or a human) can invoke it. The future CI/CD pipeline spec will call this script as a stage.

**CLI interface:**
```
Usage: bash infra/inttest/run_pipeline.sh [OPTIONS]

Options:
  --image-tag TAG       Docker image tag to deploy (default: latest)
  --namespace NAME      Override namespace name (default: stonks-inttest-<timestamp>)
  --skip-teardown       Leave namespace running after tests (for debugging)
  --results-file PATH   Path for JSON results output (default: inttest-results.json)

Exit codes:
  0  All tests passed
  1  One or more test failures
  2  Infrastructure setup failure (postgres/redis/minio/services didn't start)
```

**JSON result contract** (`inttest-results.json`):
```json
{
  "run_id": "stonks-inttest-1705312800",
  "image_tag": "abc123",
  "started_at": "2025-01-15T12:00:00Z",
  "completed_at": "2025-01-15T12:07:30Z",
  "exit_code": 0,
  "stages": {
    "infra_deploy": {"duration_s": 45.2, "status": "ok"},
    "seed_data": {"duration_s": 8.1, "status": "ok"},
    "service_deploy": {"duration_s": 32.5, "status": "ok"},
    "integration_tests": {"duration_s": 28.3, "status": "ok"},
    "teardown": {"duration_s": 5.0, "status": "ok"}
  },
  "tests": {
    "total": 41,
    "passed": 41,
    "failed": 0,
    "errors": 0
  },
  "profiling": {
    "endpoints": {
      "/api/companies": {"p50_ms": 12, "p95_ms": 25, "p99_ms": 45},
      ...
    },
    "slow_endpoints": []
  }
}
```

This contract is designed so the future CI/CD pipeline can:
1. Parse `exit_code` to decide whether to promote to the next stage
2. Parse `profiling.slow_endpoints` to flag performance regressions
3. Archive the full JSON as a build artifact
4. Display `tests.passed`/`tests.failed` in a dashboard
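
As an illustration of how a consumer might read this contract, the sketch below turns a results document into a promote/block decision. The `promotion_decision` helper and its return shape are hypothetical; the field names come from the JSON contract above.

```python
import json


def promotion_decision(results_json: str) -> dict:
    """Summarize a results document into a promote/block decision."""
    results = json.loads(results_json)
    tests = results["tests"]
    return {
        # Exit code 0 is the only value that allows promotion.
        "promote": results["exit_code"] == 0,
        "slow_endpoints": results["profiling"]["slow_endpoints"],
        "summary": f'{tests["passed"]}/{tests["total"]} passed, {tests["failed"]} failed',
    }


# Trimmed-down sample document using the contract's field names.
sample = json.dumps({
    "exit_code": 0,
    "tests": {"total": 41, "passed": 41, "failed": 0, "errors": 0},
    "profiling": {"endpoints": {}, "slow_endpoints": []},
})

decision = promotion_decision(sample)
print(decision)
```

A pipeline could archive the raw JSON unchanged and feed only this small summary to a dashboard or gate.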

**Script outline:**
```bash
#!/bin/bash
set -euo pipefail

# Parse CLI args
IMAGE_TAG="latest"
NAMESPACE="stonks-inttest-$(date +%s)"
SKIP_TEARDOWN=false
RESULTS_FILE="inttest-results.json"

while [[ $# -gt 0 ]]; do
  case $1 in
    --image-tag)     IMAGE_TAG="$2"; shift 2 ;;
    --namespace)     NAMESPACE="$2"; shift 2 ;;
    --skip-teardown) SKIP_TEARDOWN=true; shift ;;
    --results-file)  RESULTS_FILE="$2"; shift 2 ;;
    *) echo "Unknown option: $1" >&2; exit 2 ;;
  esac
done

# Cleanup function (always runs, even on failure)
cleanup() {
  if [ "$SKIP_TEARDOWN" = false ]; then
    kubectl delete namespace "$NAMESPACE" --wait=false 2>/dev/null || true
  fi
}
trap cleanup EXIT

# Any failure before the tests start is an infrastructure failure (exit code 2)
trap 'echo "Infrastructure setup failed" >&2; exit 2' ERR

# Stage 1: Create namespace
kubectl create namespace "$NAMESPACE"

# Stage 2: Deploy infra
kubectl create configmap postgres-migrations --from-file=infra/migrations/ -n "$NAMESPACE"
export NAMESPACE
# Restrict envsubst to ${NAMESPACE} so other $-references in the manifests survive
envsubst '${NAMESPACE}' < infra/inttest/postgres.yaml | kubectl apply -n "$NAMESPACE" -f -
envsubst '${NAMESPACE}' < infra/inttest/redis.yaml | kubectl apply -n "$NAMESPACE" -f -
envsubst '${NAMESPACE}' < infra/inttest/minio.yaml | kubectl apply -n "$NAMESPACE" -f -
kubectl wait --for=condition=ready pod -l app=postgres -n "$NAMESPACE" --timeout=120s
kubectl wait --for=condition=ready pod -l app=redis -n "$NAMESPACE" --timeout=60s
kubectl wait --for=condition=ready pod -l app=minio -n "$NAMESPACE" --timeout=60s

# Stage 3: Seed data (run from a pod with DB access)
# ... seed runner pod ...

# Stage 4: Deploy services (using specified image tag)
envsubst '${NAMESPACE}' < infra/inttest/services.yaml | sed "s/:latest/:${IMAGE_TAG}/g" | kubectl apply -n "$NAMESPACE" -f -
kubectl wait --for=condition=ready pod -l tier=api -n "$NAMESPACE" --timeout=120s

# Stage 5: Run integration tests
envsubst '${NAMESPACE}' < infra/inttest/runner.yaml | sed "s/:latest/:${IMAGE_TAG}/g" | kubectl apply -n "$NAMESPACE" -f -
trap - ERR  # from here on, failures are test failures, not infra failures
# A failed Job never reaches "complete", so record the outcome instead of aborting
TEST_EXIT=0
kubectl wait --for=condition=complete job/inttest-runner -n "$NAMESPACE" --timeout=600s || TEST_EXIT=1

# Stage 6: Collect results (also after a test failure)
# Assumes the runner Job prints only the JSON result document to stdout
kubectl logs job/inttest-runner -n "$NAMESPACE" > "$RESULTS_FILE"

# Stage 7: Teardown (handled by trap)
exit "$TEST_EXIT"
```

## Profiling Strategy

### What to measure
1. **Seed insertion time**: how long to populate all tables
2. **Service startup time**: time from pod creation to readiness
3. **API response times**: per-endpoint P50/P95/P99
4. **Memory usage**: `kubectl top pods` snapshot during tests

### Performance targets
| Metric | Target | Action if exceeded |
|--------|--------|--------------------|
| Seed insertion | < 30s | Batch INSERT optimization |
| Service startup | < 30s each | Reduce import time, lazy loading |
| API P95 | < 200ms | Query optimization, indexes |
| API P99 | < 500ms | Connection pooling, caching |
| Total pipeline | < 10 min | Parallelize stages |
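
The targets table can be checked mechanically. The sketch below encodes the limits and suggested actions from the table; the metric key names and the `exceeded_targets` helper are made up for this example.

```python
# Limits and actions from the table above; key names are hypothetical.
TARGETS = {
    "seed_insertion_s": (30, "Batch INSERT optimization"),
    "service_startup_s": (30, "Reduce import time, lazy loading"),
    "api_p95_ms": (200, "Query optimization, indexes"),
    "api_p99_ms": (500, "Connection pooling, caching"),
    "total_pipeline_s": (600, "Parallelize stages"),  # 10 min
}


def exceeded_targets(measured: dict[str, float]) -> dict[str, str]:
    """Return {metric: suggested action} for each metric at or over its limit."""
    return {
        name: action
        for name, (limit, action) in TARGETS.items()
        # Targets are strict upper bounds ("< limit"), so equality also fails.
        if name in measured and measured[name] >= limit
    }


actions = exceeded_targets({"seed_insertion_s": 8.1, "api_p95_ms": 250})
print(actions)
```

Metrics missing from the measurement are skipped, so partial runs (for example, a run that fails before the tests start) can still be evaluated.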

### Optimization opportunities to discover
- Slow SQL queries (missing indexes, N+1 patterns)
- Heavy service startup (import chains)
- Inefficient aggregation math
- Unnecessary serialization overhead
- Connection pool sizing

## Data Flow

```
Seed Script
  ├── PostgreSQL: companies, documents, trends, recommendations, orders, ...
  ├── MinIO: normalized text files, audit artifacts
  └── Redis: (empty; no queue state needed for API tests)

Integration Tests
  ├── Query API       ← PostgreSQL (read-only queries)
  ├── Symbol Registry ← PostgreSQL (CRUD operations)
  ├── Risk Engine     ← PostgreSQL (evaluation + approvals)
  └── Trading Engine  ← PostgreSQL + Redis (status, decisions, backtest)
```

## Namespace Lifecycle

```
CREATE namespace
  → Deploy postgres, redis, minio
  → Wait for healthy
  → Run migrations (init container)
  → Run seed script
  → Deploy services
  → Wait for ready
  → Run tests
  → Collect results
  → DELETE namespace (always, even on failure, unless --skip-teardown is set)
```
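
The "always delete, even on failure" rule maps naturally onto a context manager. The sketch below is a Python analogue of the `trap cleanup EXIT` pattern in the runner script, not part of the pipeline itself; `ephemeral_namespace`, `kubectl`, and the injectable `run` parameter are hypothetical names.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator, Sequence

Runner = Callable[[Sequence[str]], None]


def kubectl(args: Sequence[str]) -> None:
    """Default runner: shell out to kubectl and raise on a non-zero exit."""
    subprocess.run(["kubectl", *args], check=True)


@contextmanager
def ephemeral_namespace(
    name: str, run: Runner = kubectl, skip_teardown: bool = False
) -> Iterator[str]:
    """Create the namespace, yield it, and delete it even if the body raises."""
    run(["create", "namespace", name])
    try:
        yield name
    finally:
        # Mirrors --skip-teardown: leave the namespace in place for debugging.
        if not skip_teardown:
            run(["delete", "namespace", name, "--wait=false"])
```

Passing a fake `run` callable (for example `calls.append` on a list) makes the lifecycle testable without a cluster.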

## Integration Contract for Future CI/CD Pipeline

This spec produces a standalone runner (`infra/inttest/run_pipeline.sh`) with a well-defined contract. A future spec ("CI/CD Deployment Pipeline") will consume it as one stage in a larger pipeline:

```
┌─────────────────────────────────────────────────────────────────────────┐
│  Future CI/CD Pipeline (separate spec)                                  │
│                                                                         │
│  1. Git push → webhook to self-hosted runner on gremlin nodes           │
│  2. Lint + Unit Tests (ruff, pytest, vitest)                            │
│  3. Docker Build → push to GHCR (self-hosted, no GH Actions compute)    │
│  4. ┌──────────────────────────────────────────────────────────┐        │
│     │  Integration Tests (THIS SPEC)                           │        │
│     │  bash infra/inttest/run_pipeline.sh --image-tag $SHA     │        │
│     │    → reads inttest-results.json                          │        │
│     │    → exit code 0 = promote, 1 = block                    │        │
│     └──────────────────────────────────────────────────────────┘        │
│  5. Promote to beta namespace (if tests pass)                           │
│  6. Promote to paper namespace (manual gate or auto)                    │
│  7. Promote to live namespace (market-hours blocker + break-glass)      │
│                                                                         │
│  Each stage has an enable/disable toggle.                               │
│  Promotions are blocked during market hours (9:30–16:00 ET) unless      │
│  break-glass is activated.                                              │
└─────────────────────────────────────────────────────────────────────────┘
```
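
The market-hours rule in the diagram could be expressed roughly as below. This sketch assumes the caller already supplies the current wall-clock time in ET, treats 16:00 as the first allowed minute, and ignores weekends and holidays. None of these details are specified here, and `promotion_allowed` is a hypothetical name.

```python
from datetime import time

MARKET_OPEN = time(9, 30)   # ET, from the diagram above
MARKET_CLOSE = time(16, 0)  # ET, treated as an exclusive bound (assumption)


def promotion_allowed(now_et: time, break_glass: bool = False) -> bool:
    """Block promotions from 9:30 up to 16:00 ET unless break-glass is active."""
    in_market_hours = MARKET_OPEN <= now_et < MARKET_CLOSE
    return break_glass or not in_market_hours


print(promotion_allowed(time(10, 0)))                    # inside market hours
print(promotion_allowed(time(10, 0), break_glass=True))  # break-glass override
```

Keeping the time-zone conversion outside the function leaves the rule itself easy to unit test.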

**What this spec provides to the future pipeline:**
- `infra/inttest/run_pipeline.sh`: callable with `--image-tag` to test any build
- `inttest-results.json`: machine-readable results for promotion decisions
- Exit codes for pass/fail gating
- `--skip-teardown` for debugging failed runs
- All K8s manifests in `infra/inttest/` for sandbox lifecycle
- Deterministic seed data and comprehensive API test coverage