stonks-oracle/.kiro/specs/integration-test-pipeline/design.md

# Integration Test Pipeline — Design

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                    Kubernetes Job: inttest-runner                │
│                                                                 │
│  Stage 1: Create namespace stonks-inttest-{id}                  │
│  Stage 2: Deploy infra (postgres, redis, minio)                 │
│  Stage 3: Run migrations + seed data                            │
│  Stage 4: Deploy services (query-api, registry, risk, trading)  │
│  Stage 5: Wait for readiness                                    │
│  Stage 6: Run API integration tests (pytest)                    │
│  Stage 7: Run frontend render tests (pytest, live APIs)         │
│  Stage 8: Collect profiling data                                │
│  Stage 9: Teardown namespace                                    │
└─────────────────────────────────────────────────────────────────┘
```

## Implementation Approach

### Option A: Kubernetes Job with Python test runner (chosen)
A single Kubernetes Job that:
1. Creates ephemeral infra via `kubectl apply` (postgres, redis, minio manifests)
2. Runs a Python seed script directly against the DB
3. Deploys service pods pointing at the ephemeral infra
4. Runs pytest-based integration tests against the live services
5. Collects timing metrics
6. Tears down everything

### Why not Helm?
The sandbox doesn't need Helm's complexity. Plain manifests are simpler, faster, and easier to debug. The infra is ephemeral — no upgrades, no rollbacks.

## Components

### 1. Sandbox Manifests (`infra/inttest/`)

```
infra/inttest/
├── namespace.yaml          # Namespace template
├── postgres.yaml           # PostgreSQL 16 StatefulSet (no PV)
├── redis.yaml              # Redis 7 Deployment
├── minio.yaml              # MinIO Deployment + bucket init Job
├── services.yaml           # All 4 API services (query-api, registry, risk, trading)
└── runner.yaml             # The test runner Job itself
```

Each manifest uses a `${NAMESPACE}` placeholder, which the runner script substitutes at runtime with `envsubst`.
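The runner script does this substitution with `envsubst`. For tooling written in Python, a rough equivalent could look like the sketch below. The function name `render_manifest` and the sample manifest are illustrative assumptions, not part of the spec.

```python
from string import Template


def render_manifest(text: str, namespace: str) -> str:
    """Fill in ${NAMESPACE}; unknown placeholders are left as they are."""
    # safe_substitute never raises on placeholders it does not know about.
    return Template(text).safe_substitute(NAMESPACE=namespace)


# Hypothetical manifest fragment, for illustration only.
manifest = """\
apiVersion: v1
kind: Service
metadata:
  name: postgres
  namespace: ${NAMESPACE}
"""

rendered = render_manifest(manifest, "stonks-inttest-1736942400")
print(rendered)
```

Limiting substitution to a single known variable avoids accidentally blanking other `$`-expressions that may appear in a manifest.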

### 2. Seed Script (`tests/integration/seed_sandbox.py`)

Pure SQL + MinIO operations. No external API calls. Inserts:
- Companies, sources, aliases, competitor relationships
- Documents with intelligence records
- Trend windows with evidence and projections
- Recommendations with evidence citations
- Orders, positions, trading decisions
- Global events with macro impacts
- AI agents with variants and performance logs
- Trading engine config, portfolio snapshots
- MinIO objects (normalized text files)

All UUIDs are deterministic (hardcoded) for reproducible assertions.
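A minimal sketch of how hardcoded IDs might be organised so that tests can import them. The UUID values, the `company_rows` helper, and the `companies (id, ticker)` table shape are all assumptions made for illustration; the real values belong to `seed_sandbox.py`.

```python
import uuid

# Hypothetical values; the real IDs are defined by the seed script.
SEED_COMPANY_IDS = {
    "AAPL": "00000000-0000-4000-8000-000000000001",
    "MSFT": "00000000-0000-4000-8000-000000000002",
}

# Assumed table and column names. ON CONFLICT makes re-running the seed harmless.
INSERT_COMPANY_SQL = (
    "INSERT INTO companies (id, ticker) VALUES (%s, %s) "
    "ON CONFLICT (id) DO NOTHING"
)


def company_rows(ids: dict[str, str]) -> list[tuple[str, str]]:
    """Return (id, ticker) parameter rows in a stable order.

    uuid.UUID() raises ValueError on a malformed ID, so typos fail fast.
    """
    return [(str(uuid.UUID(ids[ticker])), ticker) for ticker in sorted(ids)]


rows = company_rows(SEED_COMPANY_IDS)
```

Each row can then be passed as parameters to `INSERT_COMPANY_SQL` by whichever PostgreSQL driver the seed script uses.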

### 3. Integration Tests (`tests/integration/test_api_endpoints.py`)

pytest-based tests that call every API endpoint the frontend depends on:

```python
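import pytest

# Assumed: the seed script exports its hardcoded UUIDs as module constants.
from seed_sandbox import SEED_TREND_ID

# `client` is assumed to be an httpx.AsyncClient fixture (defined in conftest.py)
# whose base_url points at the sandbox query-api service.


@pytest.mark.asyncio
class TestQueryAPI:
    async def test_companies_list(self, client):
        resp = await client.get("/api/companies")
        assert resp.status_code == 200
        data = resp.json()
        assert len(data) >= 5
        assert all("ticker" in c for c in data)

    async def test_trend_detail(self, client):
        resp = await client.get(f"/api/trends/{SEED_TREND_ID}")
        assert resp.status_code == 200
        data = resp.json()
        assert data["trend_direction"] in ("bullish", "bearish", "mixed")
        assert 0 <= data["confidence"] <= 1
    # ... 40+ test functions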
```

### 4. Frontend Render Tests (`tests/integration/test_frontend_renders.py`)

Uses the live sandbox APIs (not MSW mocks) to verify each page's data dependencies:

```python
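import pytest

# Assumed: the seed script exports its hardcoded UUIDs as module constants.
from seed_sandbox import SEED_COMPANY_IDS

# `client` is assumed to be an httpx.AsyncClient fixture defined in conftest.py.


@pytest.mark.asyncio
class TestFrontendDataDeps:
    """Verify every API call each frontend page makes returns valid data."""

    async def test_home_page_deps(self, client):
        # Home page calls: companies, pipeline health, ingestion summary, recommendations
        # (the recommendations endpoint is not shown in this excerpt)
        for endpoint in ["/api/companies", "/api/ops/pipeline/health", "/api/ops/ingestion/summary"]:
            resp = await client.get(endpoint)
            assert resp.status_code == 200, endpoint  # name the failing endpoint

    async def test_company_detail_deps(self, client):
        # CompanyDetail calls 12 different endpoints; four are shown here
        company_id = SEED_COMPANY_IDS["AAPL"]
        for endpoint in [
            f"/api/companies/{company_id}",
            f"/api/companies/{company_id}/sources",
            "/api/companies/AAPL/macro-impacts",  # keyed by ticker, not by ID
            f"/api/companies/{company_id}/competitors",
        ]:
            resp = await client.get(endpoint)
            assert resp.status_code == 200, endpoint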
```

### 5. Profiler (`tests/integration/profiler.py`)

Wraps each test with timing:
- Records wall-clock time per API call
- Computes P50/P95/P99 across all calls
- Outputs a summary table at the end
- Flags any endpoint > 500ms as "slow"
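The behaviour listed above could be sketched as follows. This is an illustration under assumptions, not the actual `profiler.py`: the `Profiler` class and its method names are hypothetical, percentiles use the nearest-rank method, and the 500 ms "slow" threshold is applied to each endpoint's P99 (the design does not say which statistic it applies to).

```python
import math
import time
from collections import defaultdict
from contextlib import contextmanager

SLOW_THRESHOLD_MS = 500  # from the design; applied to P99 here (assumption)


class Profiler:
    def __init__(self):
        self.samples = defaultdict(list)  # endpoint -> wall-clock durations in ms

    @contextmanager
    def track(self, endpoint):
        """Time the enclosed block and record it, even if the block raises."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.samples[endpoint].append((time.perf_counter() - start) * 1000)

    @staticmethod
    def percentile(values, pct):
        """Nearest-rank percentile of a non-empty list."""
        ordered = sorted(values)
        rank = max(1, math.ceil(pct / 100 * len(ordered)))
        return ordered[rank - 1]

    def summary(self):
        """Build the per-endpoint stats and the list of slow endpoints."""
        endpoints = {
            ep: {f"p{p}_ms": round(self.percentile(v, p), 1) for p in (50, 95, 99)}
            for ep, v in self.samples.items()
        }
        slow = sorted(
            ep for ep, stats in endpoints.items()
            if stats["p99_ms"] > SLOW_THRESHOLD_MS
        )
        return {"endpoints": endpoints, "slow_endpoints": slow}
```

A test would wrap each request as `with profiler.track("/api/companies"): ...` and the runner would serialise `profiler.summary()` into the `profiling` section of the results file.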

### 6. Runner Script (`infra/inttest/run_pipeline.sh`)

Standalone orchestration script with a well-defined CLI contract so any CI/CD system (or a human) can invoke it. The future CI/CD pipeline spec will call this script as a stage.

**CLI interface:**
```
Usage: bash infra/inttest/run_pipeline.sh [OPTIONS]

Options:
  --image-tag TAG       Docker image tag to deploy (default: latest)
  --namespace NAME      Override namespace name (default: stonks-inttest-<timestamp>)
  --skip-teardown       Leave namespace running after tests (for debugging)
  --results-file PATH   Path for JSON results output (default: inttest-results.json)

Exit codes:
  0  All tests passed
  1  One or more test failures
  2  Infrastructure setup failure (postgres/redis/minio/services didn't start)
```

**JSON result contract** (`inttest-results.json`):
```json
{
  "run_id": "stonks-inttest-1736942400",
  "image_tag": "abc123",
  "started_at": "2025-01-15T12:00:00Z",
  "completed_at": "2025-01-15T12:07:30Z",
  "exit_code": 0,
  "stages": {
    "infra_deploy": {"duration_s": 45.2, "status": "ok"},
    "seed_data": {"duration_s": 8.1, "status": "ok"},
    "service_deploy": {"duration_s": 32.5, "status": "ok"},
    "integration_tests": {"duration_s": 28.3, "status": "ok"},
    "teardown": {"duration_s": 5.0, "status": "ok"}
  },
  "tests": {
    "total": 41,
    "passed": 41,
    "failed": 0,
    "errors": 0
  },
  "profiling": {
    "endpoints": {
      "/api/companies": {"p50_ms": 12, "p95_ms": 25, "p99_ms": 45},
      ...
    },
    "slow_endpoints": []
  }
}
```

This contract is designed so the future CI/CD pipeline can:
1. Parse `exit_code` to decide whether to promote to the next stage
2. Parse `profiling.slow_endpoints` to flag performance regressions
3. Archive the full JSON as a build artifact
4. Display `tests.passed`/`tests.failed` in a dashboard
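As a rough illustration of a consumer of this contract, the sketch below reads the JSON and derives a promotion decision. The function name `evaluate_results` and the shape of its return value are assumptions; only the JSON field names come from the contract above.

```python
import json


def evaluate_results(raw: str) -> dict:
    """Turn the results JSON into a small decision record for a pipeline stage."""
    results = json.loads(raw)
    tests = results["tests"]
    slow = results["profiling"]["slow_endpoints"]
    return {
        "promote": results["exit_code"] == 0,   # gate on the recorded exit code
        "perf_regression": bool(slow),          # any slow endpoint is flagged
        "summary": f'{tests["passed"]}/{tests["total"]} passed, {tests["failed"]} failed',
    }


# Trimmed-down sample matching the contract's field names.
sample = json.dumps({
    "exit_code": 0,
    "tests": {"total": 41, "passed": 41, "failed": 0, "errors": 0},
    "profiling": {"endpoints": {}, "slow_endpoints": []},
})
decision = evaluate_results(sample)
print(decision["summary"])
```

Whether a performance regression should block promotion or only raise a warning is left to the future pipeline spec.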

```bash
#!/bin/bash
set -euo pipefail

# Parse CLI args
IMAGE_TAG="latest"
NAMESPACE="stonks-inttest-$(date +%s)"
SKIP_TEARDOWN=false
RESULTS_FILE="inttest-results.json"

while [[ $# -gt 0 ]]; do
  case $1 in
    --image-tag) IMAGE_TAG="$2"; shift 2 ;;
    --namespace) NAMESPACE="$2"; shift 2 ;;
    --skip-teardown) SKIP_TEARDOWN=true; shift ;;
    --results-file) RESULTS_FILE="$2"; shift 2 ;;
    *) echo "Unknown option: $1"; exit 2 ;;
  esac
done

# Cleanup function (always runs, even on failure)
cleanup() {
  if [ "$SKIP_TEARDOWN" = false ]; then
    kubectl delete namespace "$NAMESPACE" --wait=false 2>/dev/null || true
  fi
}
trap cleanup EXIT

# Stages 1-4 are infrastructure setup: any failure exits 2 (see exit codes above).
# The EXIT trap still runs cleanup afterwards.
trap 'exit 2' ERR

# Stage 1: Create namespace
kubectl create namespace "$NAMESPACE"

# Stage 2: Deploy infra
kubectl create configmap postgres-migrations --from-file=infra/migrations/ -n "$NAMESPACE"
export NAMESPACE
envsubst < infra/inttest/postgres.yaml | kubectl apply -n "$NAMESPACE" -f -
envsubst < infra/inttest/redis.yaml | kubectl apply -n "$NAMESPACE" -f -
envsubst < infra/inttest/minio.yaml | kubectl apply -n "$NAMESPACE" -f -
kubectl wait --for=condition=ready pod -l app=postgres -n "$NAMESPACE" --timeout=120s
kubectl wait --for=condition=ready pod -l app=redis -n "$NAMESPACE" --timeout=60s
kubectl wait --for=condition=ready pod -l app=minio -n "$NAMESPACE" --timeout=60s

# Stage 3: Seed data (run from a pod with DB access)
# ... seed runner pod ...

# Stage 4: Deploy services (using specified image tag)
envsubst < infra/inttest/services.yaml | sed "s/:latest/:${IMAGE_TAG}/g" | kubectl apply -n "$NAMESPACE" -f -
kubectl wait --for=condition=ready pod -l tier=api -n "$NAMESPACE" --timeout=120s

# Stage 5: Run integration tests
envsubst < infra/inttest/runner.yaml | sed "s/:latest/:${IMAGE_TAG}/g" | kubectl apply -n "$NAMESPACE" -f -
trap - ERR  # from here on, a failure means test failure (exit 1), not infra failure
TEST_EXIT=0
# A failed Job never reaches "complete", so this wait returns non-zero on timeout
kubectl wait --for=condition=complete job/inttest-runner -n "$NAMESPACE" --timeout=600s || TEST_EXIT=1

# Stage 6: Collect results (even when tests failed)
kubectl logs job/inttest-runner -n "$NAMESPACE" > "$RESULTS_FILE" || true

# Stage 7: Teardown (handled by trap)
exit "$TEST_EXIT"
```

## Profiling Strategy

### What to measure
1. **Seed insertion time** — how long to populate all tables
2. **Service startup time** — time from pod creation to readiness
3. **API response times** — per-endpoint P50/P95/P99
4. **Memory usage** — `kubectl top pods` snapshot during tests

### Performance targets
| Metric | Target | Action if exceeded |
|--------|--------|--------------------|
| Seed insertion | < 30s | Batch INSERT optimization |
| Service startup | < 30s each | Reduce import time, lazy loading |
| API P95 | < 200ms | Query optimization, indexes |
| API P99 | < 500ms | Connection pooling, caching |
| Total pipeline | < 10 min | Parallelize stages |
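One way to check a run against these targets is sketched below. The metric key names and the `exceeded_targets` helper are hypothetical; the limits are taken from the table, with "10 min" expressed as 600 seconds.

```python
# Limits from the table above; every target is a strict upper bound ("< X").
TARGETS = {
    "seed_insertion_s": 30.0,
    "service_startup_s": 30.0,   # applies to each service individually
    "api_p95_ms": 200.0,
    "api_p99_ms": 500.0,
    "total_pipeline_s": 600.0,   # 10 min
}


def exceeded_targets(measured: dict[str, float]) -> list[str]:
    """Return the metrics that miss their target, in table order.

    Metrics absent from `measured` are skipped rather than treated as failures.
    """
    return [
        name for name, limit in TARGETS.items()
        if name in measured and measured[name] >= limit
    ]


# Example values from the sample results JSON, plus an invented slow P95.
misses = exceeded_targets({"seed_insertion_s": 8.1, "api_p95_ms": 240.0})
print(misses)
```

Each name returned maps to the "Action if exceeded" column of the table.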

### Optimization opportunities to discover
- Slow SQL queries (missing indexes, N+1 patterns)
- Heavy service startup (import chains)
- Inefficient aggregation math
- Unnecessary serialization overhead
- Connection pool sizing

## Data Flow

```
Seed Script
  ├── PostgreSQL: companies, documents, trends, recommendations, orders, ...
  ├── MinIO: normalized text files, audit artifacts
  └── Redis: (empty — no queue state needed for API tests)

Integration Tests
  ├── Query API ← PostgreSQL (read-only queries)
  ├── Symbol Registry ← PostgreSQL (CRUD operations)
  ├── Risk Engine ← PostgreSQL (evaluation + approvals)
  └── Trading Engine ← PostgreSQL + Redis (status, decisions, backtest)
```

## Namespace Lifecycle

```
CREATE namespace
  → Deploy postgres, redis, minio
    → Wait for healthy
      → Run migrations (init container)
        → Run seed script
          → Deploy services
            → Wait for ready
              → Run tests
                → Collect results
                  → DELETE namespace (always, even on failure, unless --skip-teardown is set)
```
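The "always delete, even on failure" guarantee is what the runner script gets from `trap cleanup EXIT`. The same idea can be sketched in Python as a context manager; the name `sandbox_namespace` and the injectable `run` parameter are assumptions made so the sketch can be exercised without a cluster.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def sandbox_namespace(name, skip_teardown=False, run=subprocess.run):
    """Create a namespace and delete it on exit, even if the body raises."""
    run(["kubectl", "create", "namespace", name], check=True)
    try:
        yield name
    finally:
        if not skip_teardown:
            # check=False: a failed delete should not mask the original error.
            run(["kubectl", "delete", "namespace", name, "--wait=false"], check=False)


# Dry run with a fake runner that records the kubectl verb instead of executing.
calls = []


def fake_run(cmd, check):
    calls.append(cmd[1])


try:
    with sandbox_namespace("stonks-inttest-demo", run=fake_run):
        raise RuntimeError("simulated test failure")
except RuntimeError:
    pass

print(calls)
```

Passing `skip_teardown=True` mirrors the `--skip-teardown` flag and leaves the namespace in place for debugging.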

## Integration Contract for Future CI/CD Pipeline

This spec produces a standalone runner (`infra/inttest/run_pipeline.sh`) with a well-defined contract. A future spec ("CI/CD Deployment Pipeline") will consume it as one stage in a larger pipeline:

```
┌─────────────────────────────────────────────────────────────────────────┐
│  Future CI/CD Pipeline (separate spec)                                  │
│                                                                         │
│  1. Git push → webhook to self-hosted runner on gremlin nodes           │
│  2. Lint + Unit Tests (ruff, pytest, vitest)                            │
│  3. Docker Build → push to GHCR (self-hosted, no GH Actions compute)   │
│  4. ┌──────────────────────────────────────────────────────────┐        │
│     │  Integration Tests (THIS SPEC)                           │        │
│     │  bash infra/inttest/run_pipeline.sh --image-tag $SHA     │        │
│     │  → reads inttest-results.json                            │        │
│     │  → exit code 0 = promote, 1 or 2 = block                 │        │
│     └──────────────────────────────────────────────────────────┘        │
│  5. Promote to beta namespace (if tests pass)                           │
│  6. Promote to paper namespace (manual gate or auto)                    │
│  7. Promote to live namespace (market-hours blocker + break-glass)      │
│                                                                         │
│  Each stage has enable/disable toggle.                                  │
│  Promotions blocked during market hours (9:30–16:00 ET) unless          │
│  break-glass is activated.                                              │
└─────────────────────────────────────────────────────────────────────────┘
```
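The market-hours rule in the diagram could be expressed roughly as below. This belongs to the future pipeline spec, so treat it purely as a sketch: the function name is hypothetical, the caller is assumed to pass a time already converted to US Eastern, the window is treated as half-open (16:00 itself is allowed), and weekends and market holidays are not considered.

```python
from datetime import datetime, time

MARKET_OPEN = time(9, 30)
MARKET_CLOSE = time(16, 0)


def promotion_blocked(now_et: datetime, break_glass: bool = False) -> bool:
    """Return True if a promotion should be blocked at `now_et` (Eastern time)."""
    if break_glass:
        return False  # break-glass overrides the market-hours blocker
    return MARKET_OPEN <= now_et.time() < MARKET_CLOSE


mid_session = datetime(2025, 1, 15, 10, 0)
print(promotion_blocked(mid_session))
print(promotion_blocked(mid_session, break_glass=True))
```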

**What this spec provides to the future pipeline:**
- `infra/inttest/run_pipeline.sh` — callable with `--image-tag` to test any build
- `inttest-results.json` — machine-readable results for promotion decisions
- Exit codes for pass/fail gating
- `--skip-teardown` for debugging failed runs
- All K8s manifests in `infra/inttest/` for sandbox lifecycle
- Deterministic seed data and comprehensive API test coverage