stonks-oracle/.kiro/specs/integration-test-pipeline/design.md
Celes Renata c85c0068a2 fix: clean up utcnow deprecation warnings, fix 12 failing tests, add CI/CD pipeline manifests
2026-04-18 03:59:28 +00:00


Integration Test Pipeline — Design

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    Kubernetes Job: inttest-runner                │
│                                                                 │
│  Stage 1: Create namespace stonks-inttest-{id}                  │
│  Stage 2: Deploy infra (postgres, redis, minio)                 │
│  Stage 3: Run migrations + seed data                            │
│  Stage 4: Deploy services (query-api, registry, risk, trading)  │
│  Stage 5: Wait for readiness                                    │
│  Stage 6: Run API integration tests (pytest)                    │
│  Stage 7: Run frontend render tests (vitest + fetch)            │
│  Stage 8: Collect profiling data                                │
│  Stage 9: Teardown namespace                                    │
└─────────────────────────────────────────────────────────────────┘

Implementation Approach

Option A: Kubernetes Job with Python test runner (chosen)

A single Kubernetes Job that:

  1. Creates ephemeral infra via kubectl apply (postgres, redis, minio manifests)
  2. Runs a Python seed script directly against the DB
  3. Deploys service pods pointing at the ephemeral infra
  4. Runs pytest-based integration tests against the live services
  5. Collects timing metrics
  6. Tears down everything

Why not Helm?

The sandbox doesn't need Helm's complexity. Plain manifests are simpler, faster, and easier to debug. The infra is ephemeral — no upgrades, no rollbacks.

Components

1. Sandbox Manifests (infra/inttest/)

infra/inttest/
├── namespace.yaml          # Namespace template
├── postgres.yaml           # PostgreSQL 16 StatefulSet (no PV)
├── redis.yaml              # Redis 7 Deployment
├── minio.yaml              # MinIO Deployment + bucket init Job
├── services.yaml           # All 4 API services (query-api, registry, risk, trading)
└── runner.yaml             # The test runner Job itself

Each manifest uses a ${NAMESPACE} placeholder, substituted at runtime with envsubst.
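The substitution step can be illustrated in Python with `string.Template`, which uses the same `${NAME}` syntax. This is only a sketch: the manifest fragment below is hypothetical, and unlike envsubst, `substitute()` raises an error for any placeholder that has no value.

```python
from string import Template

# Hypothetical manifest fragment; the real files live in infra/inttest/.
MANIFEST = """\
apiVersion: v1
kind: Service
metadata:
  name: redis
  namespace: ${NAMESPACE}
"""


def render_manifest(text: str, namespace: str) -> str:
    """Substitute ${NAMESPACE}; raises KeyError if another placeholder is unresolved."""
    return Template(text).substitute(NAMESPACE=namespace)


rendered = render_manifest(MANIFEST, "stonks-inttest-1705312800")
print(rendered)
```

Failing on unresolved placeholders is a deliberate choice in this sketch, since a half-rendered manifest applied to the cluster is harder to debug than an early error.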

2. Seed Script (tests/integration/seed_sandbox.py)

Pure SQL + MinIO operations. No external API calls. Inserts:

  • Companies, sources, aliases, competitor relationships
  • Documents with intelligence records
  • Trend windows with evidence and projections
  • Recommendations with evidence citations
  • Orders, positions, trading decisions
  • Global events with macro impacts
  • AI agents with variants and performance logs
  • Trading engine config, portfolio snapshots
  • MinIO objects (normalized text files)

All UUIDs are deterministic (hardcoded) for reproducible assertions.
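A minimal sketch of how hardcoded IDs might be kept honest, assuming they are stored in a dictionary such as the `SEED_COMPANY_IDS` used by the tests. The UUID values and the `MSFT` entry are hypothetical placeholders, and `validate_seed_ids` is an illustrative helper, not part of the seed script.

```python
import uuid

# Hypothetical placeholder values; the real IDs are defined by the seed script.
SEED_COMPANY_IDS = {
    "AAPL": "00000000-0000-4000-8000-000000000001",
    "MSFT": "00000000-0000-4000-8000-000000000002",
}


def validate_seed_ids(ids: dict[str, str]) -> None:
    """Fail fast if a hardcoded ID is malformed or used twice."""
    parsed = [uuid.UUID(value) for value in ids.values()]  # ValueError if malformed
    if len(set(parsed)) != len(parsed):
        raise ValueError("duplicate seed UUID")


validate_seed_ids(SEED_COMPANY_IDS)
```

Because the IDs never change between runs, a test can request `/api/companies/{id}` for a known company and assert on exact field values.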

3. Integration Tests (tests/integration/test_api_endpoints.py)

pytest-based tests that call every API endpoint the frontend depends on:

class TestQueryAPI:
    async def test_companies_list(self):
        resp = await client.get("/api/companies")
        assert resp.status_code == 200
        data = resp.json()
        assert len(data) >= 5
        assert all("ticker" in c for c in data)

    async def test_trend_detail(self):
        resp = await client.get(f"/api/trends/{SEED_TREND_ID}")
        assert resp.status_code == 200
        data = resp.json()
        assert data["trend_direction"] in ("bullish", "bearish", "mixed")
        assert 0 <= data["confidence"] <= 1
    # ... 40+ test functions

4. Frontend Render Tests (tests/integration/test_frontend_renders.py)

Uses the live sandbox APIs (not MSW mocks) to verify each page's data dependencies:

class TestFrontendDataDeps:
    """Verify every API call each frontend page makes returns valid data."""

    async def test_home_page_deps(self):
        # Home page calls: companies, pipeline health, ingestion summary, recommendations
        for endpoint in ["/api/companies", "/api/ops/pipeline/health", "/api/ops/ingestion/summary"]:
            resp = await client.get(endpoint)
            assert resp.status_code == 200

    async def test_company_detail_deps(self):
        # CompanyDetail calls 12 different endpoints (subset shown)
        company_id = SEED_COMPANY_IDS["AAPL"]
        for endpoint in [
            f"/api/companies/{company_id}",
            f"/api/companies/{company_id}/sources",
            "/api/companies/AAPL/macro-impacts",
            f"/api/companies/{company_id}/competitors",
        ]:
            resp = await client.get(endpoint)
            assert resp.status_code == 200

5. Profiler (tests/integration/profiler.py)

Wraps each test with timing:

  • Records wall-clock time per API call
  • Computes P50/P95/P99 across all calls
  • Outputs a summary table at the end
  • Flags any endpoint > 500ms as "slow"
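The bullets above could be sketched as follows. The class and method names are hypothetical, the nearest-rank percentile method is an assumption (the design does not name one), and "slow" is interpreted here as any single call exceeding 500 ms.

```python
import math
import time
from collections import defaultdict
from contextlib import contextmanager

SLOW_THRESHOLD_MS = 500  # from the design: endpoints above this are "slow"


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile (assumed method)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


class Profiler:
    def __init__(self) -> None:
        self.samples: dict[str, list[float]] = defaultdict(list)

    def record(self, endpoint: str, elapsed_ms: float) -> None:
        self.samples[endpoint].append(elapsed_ms)

    @contextmanager
    def measure(self, endpoint: str):
        """Record the wall-clock time of the wrapped block, even if it raises."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.record(endpoint, (time.perf_counter() - start) * 1000)

    def summary(self) -> dict:
        """Per-endpoint P50/P95/P99 plus the list of slow endpoints."""
        endpoints = {
            ep: {f"p{p}_ms": percentile(times, p) for p in (50, 95, 99)}
            for ep, times in self.samples.items()
        }
        slow = sorted(
            ep for ep, times in self.samples.items() if max(times) > SLOW_THRESHOLD_MS
        )
        return {"endpoints": endpoints, "slow_endpoints": slow}


profiler = Profiler()
for ms in (10, 12, 14, 25, 45):
    profiler.record("/api/companies", ms)
profiler.record("/api/ops/pipeline/health", 650)
report = profiler.summary()
print(report["slow_endpoints"])
```

The `summary()` shape mirrors the `profiling` object of the JSON result contract, so the output can be embedded in inttest-results.json without conversion.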

6. Runner Script (infra/inttest/run_pipeline.sh)

Standalone orchestration script with a well-defined CLI contract so any CI/CD system (or a human) can invoke it. The future CI/CD pipeline spec will call this script as a stage.

CLI interface:

Usage: bash infra/inttest/run_pipeline.sh [OPTIONS]

Options:
  --image-tag TAG       Docker image tag to deploy (default: latest)
  --namespace NAME      Override namespace name (default: stonks-inttest-<timestamp>)
  --skip-teardown       Leave namespace running after tests (for debugging)
  --results-file PATH   Path for JSON results output (default: inttest-results.json)

Exit codes:
  0  All tests passed
  1  One or more test failures
  2  Infrastructure setup failure (postgres/redis/minio/services didn't start)

JSON result contract (inttest-results.json):

{
  "run_id": "stonks-inttest-1705312800",
  "image_tag": "abc123",
  "started_at": "2025-01-15T12:00:00Z",
  "completed_at": "2025-01-15T12:07:30Z",
  "exit_code": 0,
  "stages": {
    "infra_deploy": {"duration_s": 45.2, "status": "ok"},
    "seed_data": {"duration_s": 8.1, "status": "ok"},
    "service_deploy": {"duration_s": 32.5, "status": "ok"},
    "integration_tests": {"duration_s": 28.3, "status": "ok"},
    "teardown": {"duration_s": 5.0, "status": "ok"}
  },
  "tests": {
    "total": 41,
    "passed": 41,
    "failed": 0,
    "errors": 0
  },
  "profiling": {
    "endpoints": {
      "/api/companies": {"p50_ms": 12, "p95_ms": 25, "p99_ms": 45},
      ...
    },
    "slow_endpoints": []
  }
}

This contract is designed so the future CI/CD pipeline can:

  1. Parse exit_code to decide whether to promote to the next stage
  2. Parse profiling.slow_endpoints to flag performance regressions
  3. Archive the full JSON as a build artifact
  4. Display tests.passed/tests.failed in a dashboard
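
A consumer of this contract might look like the sketch below. `GateResult` and `evaluate_results` are hypothetical names, and the sketch assumes `slow_endpoints` is a list of endpoint path strings (the contract only shows it empty). Promotion depends on `exit_code` alone; slow endpoints and test counts are surfaced as warnings.

```python
import json
from dataclasses import dataclass, field


@dataclass
class GateResult:
    promote: bool
    warnings: list[str] = field(default_factory=list)


def evaluate_results(raw_json: str) -> GateResult:
    """Decide promotion from exit_code; collect warnings for the dashboard."""
    results = json.loads(raw_json)
    promote = results["exit_code"] == 0
    # Assumption: each entry of slow_endpoints is an endpoint path string.
    warnings = [f"slow endpoint: {ep}" for ep in results["profiling"]["slow_endpoints"]]
    tests = results["tests"]
    if tests["failed"] or tests["errors"]:
        warnings.append(
            f"{tests['failed']} failed, {tests['errors']} errors of {tests['total']}"
        )
    return GateResult(promote, warnings)


# Inline sample; a real gate would read the file written by run_pipeline.sh.
sample = json.dumps({
    "exit_code": 1,
    "tests": {"total": 41, "passed": 40, "failed": 1, "errors": 0},
    "profiling": {"endpoints": {}, "slow_endpoints": ["/api/companies"]},
})
result = evaluate_results(sample)
print(result.promote, result.warnings)
```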

#!/bin/bash
set -euo pipefail

# Parse CLI args
IMAGE_TAG="latest"
NAMESPACE="stonks-inttest-$(date +%s)"
SKIP_TEARDOWN=false
RESULTS_FILE="inttest-results.json"

while [[ $# -gt 0 ]]; do
  case $1 in
    --image-tag) IMAGE_TAG="$2"; shift 2 ;;
    --namespace) NAMESPACE="$2"; shift 2 ;;
    --skip-teardown) SKIP_TEARDOWN=true; shift ;;
    --results-file) RESULTS_FILE="$2"; shift 2 ;;
    *) echo "Unknown option: $1"; exit 2 ;;
  esac
done

# Cleanup function (always runs, even on failure)
cleanup() {
  if [ "$SKIP_TEARDOWN" = false ]; then
    kubectl delete namespace "$NAMESPACE" --wait=false 2>/dev/null || true
  fi
}
trap cleanup EXIT

# Setup failures (stages 1-4 and runner submission) exit with code 2, per the CLI contract
trap 'exit 2' ERR

# Stage 1: Create namespace
kubectl create namespace "$NAMESPACE"

# Stage 2: Deploy infra
kubectl create configmap postgres-migrations --from-file=infra/migrations/ -n "$NAMESPACE"
export NAMESPACE
envsubst < infra/inttest/postgres.yaml | kubectl apply -n "$NAMESPACE" -f -
envsubst < infra/inttest/redis.yaml | kubectl apply -n "$NAMESPACE" -f -
envsubst < infra/inttest/minio.yaml | kubectl apply -n "$NAMESPACE" -f -
kubectl wait --for=condition=ready pod -l app=postgres -n "$NAMESPACE" --timeout=120s
kubectl wait --for=condition=ready pod -l app=redis -n "$NAMESPACE" --timeout=60s
kubectl wait --for=condition=ready pod -l app=minio -n "$NAMESPACE" --timeout=60s

# Stage 3: Seed data (run from a pod with DB access)
# ... seed runner pod ...

# Stage 4: Deploy services (using specified image tag)
envsubst < infra/inttest/services.yaml | sed "s/:latest/:${IMAGE_TAG}/g" | kubectl apply -n "$NAMESPACE" -f -
kubectl wait --for=condition=ready pod -l tier=api -n "$NAMESPACE" --timeout=120s

# Stage 5: Run integration tests (a failed or timed-out Job maps to exit code 1)
envsubst < infra/inttest/runner.yaml | sed "s/:latest/:${IMAGE_TAG}/g" | kubectl apply -n "$NAMESPACE" -f -
trap - ERR
TEST_EXIT=0
kubectl wait --for=condition=complete job/inttest-runner -n "$NAMESPACE" --timeout=600s || TEST_EXIT=1

# Stage 6: Collect results (also when tests failed)
kubectl logs job/inttest-runner -n "$NAMESPACE" > "$RESULTS_FILE"

# Stage 7: Teardown (handled by the EXIT trap)
exit "$TEST_EXIT"

Profiling Strategy

What to measure

  1. Seed insertion time — how long to populate all tables
  2. Service startup time — time from pod creation to readiness
  3. API response times — per-endpoint P50/P95/P99
  4. Memory usage: kubectl top pods snapshot during tests

Performance targets

| Metric          | Target     | Action if exceeded               |
|-----------------|------------|----------------------------------|
| Seed insertion  | < 30s      | Batch INSERT optimization        |
| Service startup | < 30s each | Reduce import time, lazy loading |
| API P95         | < 200ms    | Query optimization, indexes      |
| API P99         | < 500ms    | Connection pooling, caching      |
| Total pipeline  | < 10 min   | Parallelize stages               |
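
As a rough illustration, these targets could be checked against the JSON results file like this. `target_violations` is a hypothetical helper; it skips the per-service startup target because the result contract only reports an aggregate `service_deploy` duration, and it measures total pipeline time as `completed_at` minus `started_at`.

```python
from datetime import datetime

# Targets from the performance table (durations in seconds, latencies in ms).
SEED_MAX_S = 30.0
P95_MAX_MS = 200.0
P99_MAX_MS = 500.0
PIPELINE_MAX_S = 600.0  # 10 min


def _parse(ts: str) -> datetime:
    # fromisoformat() accepts a trailing "Z" only on Python 3.11+, so normalize it.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def target_violations(results: dict) -> list[str]:
    """Return one message per exceeded target; an empty list means all are met."""
    violations = []
    seed_s = results["stages"]["seed_data"]["duration_s"]
    if seed_s >= SEED_MAX_S:
        violations.append(f"seed insertion took {seed_s}s")
    for endpoint, stats in results["profiling"]["endpoints"].items():
        if stats["p95_ms"] >= P95_MAX_MS:
            violations.append(f"{endpoint} P95 is {stats['p95_ms']}ms")
        if stats["p99_ms"] >= P99_MAX_MS:
            violations.append(f"{endpoint} P99 is {stats['p99_ms']}ms")
    total_s = (_parse(results["completed_at"]) - _parse(results["started_at"])).total_seconds()
    if total_s >= PIPELINE_MAX_S:
        violations.append(f"pipeline took {total_s:.0f}s")
    return violations


example = {
    "started_at": "2025-01-15T12:00:00Z",
    "completed_at": "2025-01-15T12:07:30Z",
    "stages": {"seed_data": {"duration_s": 8.1, "status": "ok"}},
    "profiling": {"endpoints": {"/api/companies": {"p50_ms": 12, "p95_ms": 25, "p99_ms": 45}}},
}
print(target_violations(example))  # prints []
```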

Optimization opportunities to discover

  • Slow SQL queries (missing indexes, N+1 patterns)
  • Heavy service startup (import chains)
  • Inefficient aggregation math
  • Unnecessary serialization overhead
  • Connection pool sizing

Data Flow

Seed Script
  ├── PostgreSQL: companies, documents, trends, recommendations, orders, ...
  ├── MinIO: normalized text files, audit artifacts
  └── Redis: (empty — no queue state needed for API tests)

Integration Tests
  ├── Query API ← PostgreSQL (read-only queries)
  ├── Symbol Registry ← PostgreSQL (CRUD operations)
  ├── Risk Engine ← PostgreSQL (evaluation + approvals)
  └── Trading Engine ← PostgreSQL + Redis (status, decisions, backtest)

Namespace Lifecycle

CREATE namespace
  → Deploy postgres, redis, minio
    → Wait for healthy
      → Run migrations (init container)
        → Run seed script
          → Deploy services
            → Wait for ready
              → Run tests
                → Collect results
                  → DELETE namespace (always, even on failure)
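
The "always delete, even on failure" guarantee maps naturally onto a context manager. The sketch below assumes a hypothetical `run` callable that executes a command (for example a thin wrapper around `subprocess.run(..., check=True)`); injecting it keeps the example runnable without a cluster.

```python
from contextlib import contextmanager
from typing import Callable, Iterator

# Hypothetical command runner type; a real one would shell out to kubectl.
Runner = Callable[[list[str]], None]


@contextmanager
def ephemeral_namespace(
    name: str, run: Runner, skip_teardown: bool = False
) -> Iterator[str]:
    """Create the namespace, yield it, and delete it even if the body raises."""
    run(["kubectl", "create", "namespace", name])
    try:
        yield name
    finally:
        if not skip_teardown:
            run(["kubectl", "delete", "namespace", name, "--wait=false"])


# Demo with a recording runner instead of a real kubectl call.
calls: list[list[str]] = []
try:
    with ephemeral_namespace("stonks-inttest-demo", calls.append):
        raise RuntimeError("simulated test failure")
except RuntimeError:
    pass
print(calls[-1])
```

The `skip_teardown` parameter plays the same role as the `--skip-teardown` flag of run_pipeline.sh: the namespace is left in place for debugging.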

Integration Contract for Future CI/CD Pipeline

This spec produces a standalone runner (infra/inttest/run_pipeline.sh) with a well-defined contract. A future spec ("CI/CD Deployment Pipeline") will consume it as one stage in a larger pipeline:

┌─────────────────────────────────────────────────────────────────────────┐
│  Future CI/CD Pipeline (separate spec)                                  │
│                                                                         │
│  1. Git push → webhook to self-hosted runner on gremlin nodes           │
│  2. Lint + Unit Tests (ruff, pytest, vitest)                            │
│  3. Docker Build → push to GHCR (self-hosted, no GH Actions compute)   │
│  4. ┌──────────────────────────────────────────────────────────┐        │
│     │  Integration Tests (THIS SPEC)                           │        │
│     │  bash infra/inttest/run_pipeline.sh --image-tag $SHA     │        │
│     │  → reads inttest-results.json                            │        │
│     │  → exit code 0 = promote, 1 = block                     │        │
│     └──────────────────────────────────────────────────────────┘        │
│  5. Promote to beta namespace (if tests pass)                           │
│  6. Promote to paper namespace (manual gate or auto)                    │
│  7. Promote to live namespace (market-hours blocker + break-glass)      │
│                                                                         │
│  Each stage has enable/disable toggle.                                  │
│  Promotions blocked during market hours (9:30-16:00 ET) unless          │
│  break-glass is activated.                                              │
└─────────────────────────────────────────────────────────────────────────┘
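
The market-hours rule in the diagram could be expressed as a small predicate. This is a sketch for the future pipeline spec, not part of this one: `promotion_allowed` is a hypothetical name, the caller is assumed to pass a time already expressed in US Eastern time, the close boundary is treated as exclusive, and weekends and market holidays are not modeled.

```python
from datetime import datetime, time

MARKET_OPEN = time(9, 30)
MARKET_CLOSE = time(16, 0)


def promotion_allowed(now_et: datetime, break_glass: bool = False) -> bool:
    """Allow promotion outside 9:30-16:00 ET, or at any time with break-glass.

    now_et must already be in US Eastern time (assumption of this sketch).
    """
    in_market_hours = MARKET_OPEN <= now_et.time() < MARKET_CLOSE
    return break_glass or not in_market_hours


print(promotion_allowed(datetime(2025, 1, 15, 10, 0)))        # False
print(promotion_allowed(datetime(2025, 1, 15, 17, 0)))        # True
print(promotion_allowed(datetime(2025, 1, 15, 10, 0), True))  # True
```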

What this spec provides to the future pipeline:

  • infra/inttest/run_pipeline.sh — callable with --image-tag to test any build
  • inttest-results.json — machine-readable results for promotion decisions
  • Exit codes for pass/fail gating
  • --skip-teardown for debugging failed runs
  • All K8s manifests in infra/inttest/ for sandbox lifecycle
  • Deterministic seed data and comprehensive API test coverage