spec: integration test pipeline — requirements, design, and tasks

Celes Renata
2026-04-18 00:12:49 +00:00
parent ee5fd30398
commit 40227a4eb2
3 changed files with 326 additions and 0 deletions
@@ -0,0 +1,219 @@
# Integration Test Pipeline — Design
## Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ Kubernetes Job: inttest-runner │
│ │
│ Stage 1: Create namespace stonks-inttest-{id} │
│ Stage 2: Deploy infra (postgres, redis, minio) │
│ Stage 3: Run migrations + seed data │
│ Stage 4: Deploy services (query-api, registry, risk, trading) │
│ Stage 5: Wait for readiness │
│ Stage 6: Run API integration tests (pytest) │
│ Stage 7: Run frontend render tests (vitest + fetch) │
│ Stage 8: Collect profiling data │
│ Stage 9: Teardown namespace │
└─────────────────────────────────────────────────────────────────┘
```
## Implementation Approach
### Option A: Kubernetes Job with Python test runner (chosen)
A single Kubernetes Job that:
1. Creates ephemeral infra via `kubectl apply` (postgres, redis, minio manifests)
2. Runs a Python seed script directly against the DB
3. Deploys service pods pointing at the ephemeral infra
4. Runs pytest-based integration tests against the live services
5. Collects timing metrics
6. Tears down everything
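Between deploying services and running tests, the runner has to block until everything is ready. A minimal sketch of a polling helper, assuming readiness is exposed as a callable (the name `wait_until` and its parameters are hypothetical, not part of the spec):

```python
import time


def wait_until(is_ready, timeout_s=120.0, interval_s=2.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll is_ready() until it returns True or timeout_s elapses.

    sleep and clock are injectable so the helper can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Example: a fake readiness probe that succeeds on the third poll.
attempts = {"count": 0}


def fake_probe():
    attempts["count"] += 1
    return attempts["count"] >= 3


print(wait_until(fake_probe, timeout_s=5, interval_s=0, sleep=lambda s: None))
```

In the real runner the probe would likely be an HTTP health check or a `kubectl wait` call; the polling logic stays the same either way.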
### Why not Helm?
The sandbox doesn't need Helm's complexity. Plain manifests are simpler, faster, and easier to debug. The infra is ephemeral — no upgrades, no rollbacks.
## Components
### 1. Sandbox Manifests (`infra/inttest/`)
```
infra/inttest/
├── namespace.yaml # Namespace template
├── postgres.yaml # PostgreSQL 16 StatefulSet (no PV)
├── redis.yaml # Redis 7 Deployment
├── minio.yaml # MinIO Deployment + bucket init Job
├── services.yaml # All 4 API services (query-api, registry, risk, trading)
└── runner.yaml # The test runner Job itself
```
Each manifest uses a `${NAMESPACE}` placeholder, which `envsubst` substitutes at runtime.
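If the substitution ever needs to happen from Python rather than the shell, the standard library's `string.Template` understands the same `${NAME}` syntax. This is only a sketch; `render_manifest` is a hypothetical helper, not something the spec defines:

```python
from string import Template


def render_manifest(manifest_text: str, namespace: str) -> str:
    """Replace ${NAMESPACE} in a manifest, leaving other placeholders untouched."""
    # safe_substitute leaves placeholders it has no value for as-is instead of raising.
    return Template(manifest_text).safe_substitute(NAMESPACE=namespace)


example = "metadata:\n  name: postgres\n  namespace: ${NAMESPACE}\n"
print(render_manifest(example, "stonks-inttest-1713400000"))
```

Using `safe_substitute` rather than `substitute` means a stray `$` elsewhere in a manifest does not abort rendering.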
### 2. Seed Script (`tests/integration/seed_sandbox.py`)
Pure SQL + MinIO operations. No external API calls. Inserts:
- Companies, sources, aliases, competitor relationships
- Documents with intelligence records
- Trend windows with evidence and projections
- Recommendations with evidence citations
- Orders, positions, trading decisions
- Global events with macro impacts
- AI agents with variants and performance logs
- Trading engine config, portfolio snapshots
- MinIO objects (normalized text files)
All UUIDs are deterministic (hardcoded) for reproducible assertions.
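One possible way to keep the IDs deterministic without hand-writing every literal is to derive them with `uuid.uuid5` from a fixed namespace. This is an alternative to literally hardcoded values, shown as a sketch; the namespace UUID, the `seed_id` helper, and all tickers other than AAPL are assumptions made for illustration:

```python
import uuid

# Hypothetical fixed namespace UUID; any constant value works as long as it never changes.
SEED_NAMESPACE = uuid.UUID("12345678-1234-5678-1234-567812345678")


def seed_id(kind: str, key: str) -> uuid.UUID:
    """Derive a stable UUID from an entity kind and a natural key."""
    return uuid.uuid5(SEED_NAMESPACE, f"{kind}:{key}")


# Illustrative tickers only; the spec requires at least five companies.
SEED_COMPANY_IDS = {
    ticker: seed_id("company", ticker)
    for ticker in ("AAPL", "MSFT", "NVDA", "AMZN", "GOOG")
}

print(SEED_COMPANY_IDS["AAPL"])
```

Because `uuid5` is a pure function of its inputs, the seed script and the tests can both compute the same ID without sharing a generated file.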
### 3. Integration Tests (`tests/integration/test_api_endpoints.py`)
pytest-based tests that call every API endpoint the frontend depends on:
```python
# `client` and SEED_TREND_ID are expected to come from conftest.py (HTTP client and seed IDs).
class TestQueryAPI:
    async def test_companies_list(self):
        resp = await client.get("/api/companies")
        assert resp.status_code == 200
        data = resp.json()
        assert len(data) >= 5
        assert all("ticker" in c for c in data)

    async def test_trend_detail(self):
        resp = await client.get(f"/api/trends/{SEED_TREND_ID}")
        assert resp.status_code == 200
        data = resp.json()
        assert data["trend_direction"] in ("bullish", "bearish", "mixed")
        assert 0 <= data["confidence"] <= 1

    # ... 40+ test functions
```
### 4. Frontend Render Tests (`tests/integration/test_frontend_renders.py`)
Uses the live sandbox APIs (not MSW mocks) to verify each page's data dependencies:
```python
class TestFrontendDataDeps:
    """Verify every API call each frontend page makes returns valid data."""

    async def test_home_page_deps(self):
        # Home page calls: companies, pipeline health, ingestion summary, recommendations
        for endpoint in ["/api/companies", "/api/ops/pipeline/health", "/api/ops/ingestion/summary"]:
            resp = await client.get(endpoint)
            assert resp.status_code == 200

    async def test_company_detail_deps(self):
        # CompanyDetail calls 12 different endpoints; four are shown here
        company_id = SEED_COMPANY_IDS["AAPL"]
        for endpoint in [
            f"/api/companies/{company_id}",
            f"/api/companies/{company_id}/sources",
            "/api/companies/AAPL/macro-impacts",  # keyed by ticker, not company ID
            f"/api/companies/{company_id}/competitors",
        ]:
            resp = await client.get(endpoint)
            assert resp.status_code == 200
```
### 5. Profiler (`tests/integration/profiler.py`)
Wraps each test with timing:
- Records wall-clock time per API call
- Computes P50/P95/P99 across all calls
- Outputs a summary table at the end
- Flags any endpoint > 500ms as "slow"
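The bullets above could be sketched roughly as follows. This is not the actual `profiler.py`; the class and method names are hypothetical, percentiles use the nearest-rank method, and "slow" is assumed to mean that any single call exceeded 500 ms:

```python
import math
import time
from collections import defaultdict
from contextlib import contextmanager

SLOW_THRESHOLD_MS = 500.0  # threshold taken from the design text


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers (pct in 1..100)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


class Profiler:
    def __init__(self):
        self.samples_ms = defaultdict(list)

    @contextmanager
    def measure(self, endpoint):
        """Record the wall-clock time of the wrapped block under `endpoint`."""
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            self.samples_ms[endpoint].append(elapsed_ms)

    def summary(self):
        """Per-endpoint call count, P50/P95/P99, and a slow flag."""
        report = {}
        for endpoint, samples in self.samples_ms.items():
            report[endpoint] = {
                "calls": len(samples),
                "p50_ms": percentile(samples, 50),
                "p95_ms": percentile(samples, 95),
                "p99_ms": percentile(samples, 99),
                "slow": max(samples) > SLOW_THRESHOLD_MS,
            }
        return report
```

A test would wrap each request as `with profiler.measure("/api/companies"): ...` and print `profiler.summary()` at the end of the session.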
### 6. Runner Script (`tests/integration/run_pipeline.sh`)
Orchestrates the full pipeline:
```bash
#!/bin/bash
set -euo pipefail

# Exported so that envsubst can substitute ${NAMESPACE} in the manifests.
export NAMESPACE="stonks-inttest-$(date +%s)"
PROFILING_OUTPUT="inttest-results-${NAMESPACE}.json"

# Teardown is registered as an EXIT trap so the namespace is deleted even when a stage fails.
trap 'kubectl delete namespace "$NAMESPACE" --wait=false' EXIT

# Stage 1: Create namespace
kubectl create namespace "$NAMESPACE"

# Stage 2: Deploy infra
envsubst < infra/inttest/postgres.yaml | kubectl apply -n "$NAMESPACE" -f -
envsubst < infra/inttest/redis.yaml | kubectl apply -n "$NAMESPACE" -f -
envsubst < infra/inttest/minio.yaml | kubectl apply -n "$NAMESPACE" -f -
kubectl wait --for=condition=ready pod -l app=postgres -n "$NAMESPACE" --timeout=120s
kubectl wait --for=condition=ready pod -l app=redis -n "$NAMESPACE" --timeout=60s
kubectl wait --for=condition=ready pod -l app=minio -n "$NAMESPACE" --timeout=60s

# Stage 3: Run migrations + seed
# `kubectl run --restart=Never` creates a bare Pod, not a Job, so wait on the pod phase
# (the jsonpath form of `kubectl wait` needs kubectl 1.23 or newer).
kubectl run seed-runner --image=ghcr.io/celesrenata/stonks-oracle/query-api:latest \
  -n "$NAMESPACE" --restart=Never --env="POSTGRES_HOST=postgres" ... \
  -- python -c "import asyncio; from tests.integration.seed_sandbox import seed; asyncio.run(seed())"
kubectl wait --for=jsonpath='{.status.phase}'=Succeeded pod/seed-runner -n "$NAMESPACE" --timeout=120s

# Stage 4: Deploy services
envsubst < infra/inttest/services.yaml | kubectl apply -n "$NAMESPACE" -f -
kubectl wait --for=condition=ready pod -l tier=api -n "$NAMESPACE" --timeout=120s

# Stage 5: Run integration tests
kubectl run test-runner --image=ghcr.io/celesrenata/stonks-oracle/query-api:latest \
  -n "$NAMESPACE" --restart=Never \
  -- python -m pytest tests/integration/ -v --tb=short
# Record a failure instead of exiting so the logs are still collected (600s is an assumed timeout).
TEST_STATUS=0
kubectl wait --for=jsonpath='{.status.phase}'=Succeeded pod/test-runner -n "$NAMESPACE" --timeout=600s || TEST_STATUS=1

# Stage 6: Collect results
kubectl logs pod/test-runner -n "$NAMESPACE" > "$PROFILING_OUTPUT"

# Stage 7: Teardown is handled by the EXIT trap above.
exit "$TEST_STATUS"
```
## Profiling Strategy
### What to measure
1. **Seed insertion time** — how long to populate all tables
2. **Service startup time** — time from pod creation to readiness
3. **API response times** — per-endpoint P50/P95/P99
4. **Memory usage** — `kubectl top pods` snapshot during tests
### Performance targets
| Metric | Target | Action if exceeded |
|--------|--------|--------------------|
| Seed insertion | < 30s | Batch INSERT optimization |
| Service startup | < 30s each | Reduce import time, lazy loading |
| API P95 | < 200ms | Query optimization, indexes |
| API P99 | < 500ms | Connection pooling, caching |
| Total pipeline | < 10 min | Parallelize stages |
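The targets in the table could be checked mechanically at the end of a run. A small sketch, assuming the measurements arrive as a flat dictionary; the key names and the `exceeded_targets` helper are hypothetical:

```python
# Targets copied from the table above (10 min expressed as 600 s). Key names are hypothetical.
TARGETS = {
    "seed_insertion_s": 30.0,
    "service_startup_s": 30.0,
    "api_p95_ms": 200.0,
    "api_p99_ms": 500.0,
    "total_pipeline_s": 600.0,
}


def exceeded_targets(measured):
    """Return {metric: (measured, target)} for every metric that misses its target.

    The table states targets as strict upper bounds ("< 30s"), so a value
    equal to the target counts as a miss.
    """
    return {
        name: (value, TARGETS[name])
        for name, value in measured.items()
        if name in TARGETS and value >= TARGETS[name]
    }


print(exceeded_targets({"api_p95_ms": 250.0, "seed_insertion_s": 12.0}))
```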
### Optimization opportunities to discover
- Slow SQL queries (missing indexes, N+1 patterns)
- Heavy service startup (import chains)
- Inefficient aggregation math
- Unnecessary serialization overhead
- Connection pool sizing
## Data Flow
```
Seed Script
├── PostgreSQL: companies, documents, trends, recommendations, orders, ...
├── MinIO: normalized text files, audit artifacts
└── Redis: (empty — no queue state needed for API tests)
Integration Tests
├── Query API ← PostgreSQL (read-only queries)
├── Symbol Registry ← PostgreSQL (CRUD operations)
├── Risk Engine ← PostgreSQL (evaluation + approvals)
└── Trading Engine ← PostgreSQL + Redis (status, decisions, backtest)
```
## Namespace Lifecycle
```
CREATE namespace
→ Deploy postgres, redis, minio
→ Wait for healthy
→ Run migrations (init container)
→ Run seed script
→ Deploy services
→ Wait for ready
→ Run tests
→ Collect results
→ DELETE namespace (always, even on failure)
```
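The "always, even on failure" guarantee maps naturally onto `try`/`finally`. A minimal sketch, assuming each stage is wrapped as a zero-argument callable; `run_lifecycle` is a hypothetical name:

```python
from typing import Callable, Iterable


def run_lifecycle(steps: Iterable[Callable[[], None]],
                  teardown: Callable[[], None]) -> bool:
    """Run steps in order and always call teardown.

    Returns True if every step completed, False if one raised.
    """
    try:
        for step in steps:
            step()
        return True
    except Exception as exc:
        # Later steps are skipped once a step fails; teardown still runs below.
        print(f"pipeline failed: {exc}")
        return False
    finally:
        teardown()
```

In the real runner each step would shell out to `kubectl`, and `teardown` would delete the namespace.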
@@ -0,0 +1,76 @@
# Integration Test Pipeline — Requirements
## Overview
End-to-end integration test pipeline that runs in Kubernetes, spinning up isolated infrastructure (PostgreSQL, Redis, MinIO), seeding realistic data, deploying all application services, and validating every frontend page's data dependencies against live API responses. Includes profiling for performance optimization.
## Functional Requirements
### FR-1: Pipeline Stages
1. **Lint** — ruff check on Python, eslint on frontend
2. **Unit Tests** — pytest + vitest against local mocks
3. **Build** — Docker images for all services + dashboard
4. **Deploy Sandbox** — ephemeral namespace with own PostgreSQL, Redis, MinIO (no Ollama — too heavy for CI)
5. **Seed Data** — populate DB and S3 with enough data to exercise every frontend component
6. **Integration Tests** — HTTP-level validation of every API endpoint the frontend depends on
7. **Frontend E2E** — render every page against the live sandbox APIs, assert no errors and expected data
8. **Profiling** — measure and report timing for each pipeline stage and each API endpoint
9. **Teardown** — delete the ephemeral namespace and all resources
### FR-2: Sandbox Infrastructure
- PostgreSQL 16 (ephemeral, no persistent volume)
- Redis 7 (ephemeral)
- MinIO (ephemeral, with bucket initialization)
- All application services (query-api, symbol-registry, risk, trading-engine) running against sandbox infra
- No Ollama — LLM-dependent services (extractor, recommendation thesis rewriter) are excluded from integration tests
- No Trino/Hive/Superset — analytical stack excluded (not needed for frontend validation)
### FR-3: Seed Data Coverage
The seed data must exercise every frontend page. Minimum:
- 5 companies with sources, aliases, competitor relationships
- 10 documents (mix of news, filings, macro_event) with intelligence extraction records
- 5 trend windows with projections and evidence
- 5 recommendations with evidence citations
- 3 orders (filled, pending, cancelled) with events and audit trail
- 2 positions with P&L
- 2 global events with macro impact records across multiple companies
- 2 competitive signal records
- 2 historical pattern records
- 1 trading engine config + 1 trading decision
- 1 portfolio snapshot
- 3 AI agents with 1 variant each + performance log entries
- 1 risk config with macro_enabled and competitive_enabled
- MinIO buckets with at least 1 object in stonks-normalized
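A seed script could verify these minimums after insertion, before any test runs. The sketch below covers only some of the rows listed above, and the table names are assumptions, since the spec does not name the underlying tables:

```python
# Minimum row counts from FR-3. Table names are hypothetical.
MINIMUM_ROWS = {
    "companies": 5,
    "documents": 10,
    "trend_windows": 5,
    "recommendations": 5,
    "orders": 3,
    "positions": 2,
    "global_events": 2,
    "agents": 3,
}


def coverage_gaps(actual_counts):
    """Return {table: (actual, minimum)} for every table below its minimum.

    Tables missing from actual_counts are treated as having zero rows.
    """
    gaps = {}
    for table, minimum in MINIMUM_ROWS.items():
        actual = actual_counts.get(table, 0)
        if actual < minimum:
            gaps[table] = (actual, minimum)
    return gaps


print(coverage_gaps({"companies": 5, "documents": 7}))
```

The counts themselves would come from `SELECT count(*)` queries against the sandbox database.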
### FR-4: API Endpoint Validation
Test every endpoint the frontend calls:
- **Query API** (17 endpoints): companies, documents, trends, recommendations, orders, positions, macro events, pipeline health, ingestion summary, coverage gaps, agents, variants
- **Symbol Registry** (8 endpoints): companies CRUD, sources, aliases, competitors, exposure profiles
- **Risk Engine** (4 endpoints): evaluate, approvals pending/review, health
- **Trading Engine** (12 endpoints): status, config, decisions, metrics, backtest, notifications, override
### FR-5: Frontend Page Validation
For each of the 17 frontend pages, verify:
- The page renders without JavaScript errors
- All API calls return 200 with non-empty data
- Key data fields are present (e.g., company has ticker, trend has direction)
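The second and third checks can be expressed as one pure function that the per-page tests share. This is a sketch under the assumption that responses are JSON objects or lists of objects; `check_payload` is a hypothetical helper:

```python
def check_payload(status_code, payload, required_fields=()):
    """Return a list of problems for one API response; an empty list means OK."""
    problems = []
    if status_code != 200:
        problems.append(f"expected 200, got {status_code}")
    if payload in (None, [], {}, ""):
        problems.append("empty payload")
        return problems
    # Treat a single object like a one-element list so both shapes are handled.
    records = payload if isinstance(payload, list) else [payload]
    for field in required_fields:
        if any(not isinstance(r, dict) or field not in r for r in records):
            problems.append(f"missing field: {field}")
    return problems


print(check_payload(200, [{"ticker": "AAPL"}], ["ticker"]))
print(check_payload(200, [], ["ticker"]))
```

Returning a list of problems instead of asserting immediately lets one test report every broken dependency of a page in a single failure message.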
### FR-6: Profiling & Reporting
- Wall-clock time for each pipeline stage
- P50/P95/P99 response times for each API endpoint
- Total seed data insertion time
- Memory usage of each service pod
- Final pass/fail summary with details on any failures
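One way to capture per-stage wall-clock time and fold it into a final report is a small context manager. The report layout and the names `stage` and `build_report` are assumptions; the spec only says the output is a JSON report:

```python
import json
import time
from contextlib import contextmanager

stage_timings = {}  # stage name -> elapsed seconds


@contextmanager
def stage(name):
    """Time a pipeline stage, recording the duration even if the stage raises."""
    start = time.monotonic()
    try:
        yield
    finally:
        stage_timings[name] = round(time.monotonic() - start, 3)


def build_report(endpoint_stats, failures):
    """Serialize stage timings, endpoint stats, and failures into one JSON document."""
    return json.dumps(
        {
            "passed": not failures,
            "failures": failures,
            "stages_s": stage_timings,
            "endpoints": endpoint_stats,
        },
        indent=2,
        sort_keys=True,
    )


with stage("seed"):
    pass  # the real stage would run the seed script here

print(build_report({}, []))
```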
## Non-Functional Requirements
### NFR-1: Isolation
Each pipeline run uses a unique Kubernetes namespace (`stonks-inttest-{run-id}`) that is fully cleaned up on completion (success or failure).
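Namespace names must be valid DNS labels (at most 63 characters, lowercase alphanumerics and `-`), so the run ID is worth validating up front. A sketch with a hypothetical `namespace_for_run` helper; falling back to a random ID is an assumption that avoids collisions between runs started in the same second:

```python
import re
import uuid

# DNS label: lowercase alphanumerics or '-', starting and ending with an alphanumeric.
DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def namespace_for_run(run_id=None):
    """Build stonks-inttest-{run-id}, generating a random run ID if none is given."""
    run_id = (run_id or uuid.uuid4().hex[:12]).lower()
    name = f"stonks-inttest-{run_id}"
    if len(name) > 63 or not DNS_LABEL.fullmatch(name):
        raise ValueError(f"invalid namespace name: {name!r}")
    return name


print(namespace_for_run("1713400000"))
```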
### NFR-2: Speed
Target: full pipeline completes in under 10 minutes. Seed data insertion under 30 seconds. API validation under 60 seconds.
### NFR-3: Reproducibility
Seed data is deterministic (fixed UUIDs, timestamps). No external API calls (Polygon, Alpaca). All data is synthetic.
### NFR-4: CI Integration
Pipeline can be triggered from GitHub Actions as a separate workflow, or manually via `kubectl apply`.
@@ -0,0 +1,31 @@
# Integration Test Pipeline — Tasks
## Phase 1: Sandbox Infrastructure Manifests
- [ ] 1. Create `infra/inttest/postgres.yaml` — PostgreSQL 16 Deployment with migrations as init container, no PV
- [ ] 2. Create `infra/inttest/redis.yaml` — Redis 7 Deployment, no persistence
- [ ] 3. Create `infra/inttest/minio.yaml` — MinIO Deployment + bucket init Job
- [ ] 4. Create `infra/inttest/services.yaml` — query-api, symbol-registry, risk, trading-engine Deployments pointing at sandbox infra
- [ ] 5. Create `infra/inttest/runner.yaml` — test runner Job template
## Phase 2: Seed Data
- [ ] 6. Create `tests/integration/seed_sandbox.py` — deterministic seed script with fixed UUIDs for 5 companies, 10 documents, 5 trends, 5 recommendations, 3 orders, 2 positions, 2 global events, 2 competitive signals, 3 agents, trading config, portfolio snapshot
- [ ] 7. Create `tests/integration/seed_minio.py` — seed MinIO buckets with sample normalized text files
## Phase 3: API Integration Tests
- [ ] 8. Create `tests/integration/conftest.py` — pytest fixtures for HTTP client, base URLs, seed IDs
- [ ] 9. Create `tests/integration/test_query_api.py` — tests for all 17 query API endpoints
- [ ] 10. Create `tests/integration/test_registry_api.py` — tests for all 8 symbol registry endpoints
- [ ] 11. Create `tests/integration/test_risk_api.py` — tests for all 4 risk engine endpoints
- [ ] 12. Create `tests/integration/test_trading_api.py` — tests for all 12 trading engine endpoints
- [ ] 13. Create `tests/integration/test_frontend_data_deps.py` — tests verifying every frontend page's API dependencies return valid data
## Phase 4: Profiling
- [ ] 14. Create `tests/integration/profiler.py` — timing wrapper that records per-endpoint latency and produces a summary report
- [ ] 15. Add profiling output to test runner (JSON report with P50/P95/P99 per endpoint, stage timings)
## Phase 5: Pipeline Runner
- [ ] 16. Create `infra/inttest/run_pipeline.sh` — orchestration script that creates namespace, deploys infra, seeds, deploys services, runs tests, collects results, tears down
- [ ] 17. Create `.github/workflows/integration.yml` — GitHub Actions workflow that triggers the pipeline on demand or on PR
## Phase 6: Documentation
- [ ] 18. Add integration test section to `docs/LOCAL_DEV_SETUP.md` with instructions for running locally