diff --git a/.kiro/specs/trading-feedback-engine/.config.kiro b/.kiro/specs/trading-feedback-engine/.config.kiro
new file mode 100644
index 0000000..0679511
--- /dev/null
+++ b/.kiro/specs/trading-feedback-engine/.config.kiro
@@ -0,0 +1 @@
+{"specId": "d76705a8-fb91-4fce-b59e-c4b3b0dbbd83", "workflowType": "requirements-first", "specType": "feature"}
diff --git a/.kiro/specs/trading-feedback-engine/design.md b/.kiro/specs/trading-feedback-engine/design.md
new file mode 100644
index 0000000..95444f6
--- /dev/null
+++ b/.kiro/specs/trading-feedback-engine/design.md
@@ -0,0 +1,802 @@
+# Design Document — Trading Feedback Engine
+
+## Overview
+
+This design adds a periodic trading performance reporting system to Stonks Oracle. The system collects trading data (P&L, recommendations, positions, risk metrics, model quality), generates structured JSON reports with AI-powered summaries, validates report metrics against live data, and stores reports for retrieval via API.
+
+The core challenge is fitting AI summarization within the 8k-token context window of the `qwen3.5:9b-fast` model on the local Ollama instance. The design addresses this with a chunking strategy that serializes report section data into ≤6,000-character chunks, summarizes each chunk independently, then merges chunk summaries into a final section summary. This hierarchical summarization approach keeps each LLM call well within the token budget while producing coherent narratives.
+
+### Design Rationale
+
+A trading system without periodic performance feedback forces the operator to manually query tables and compute metrics. The feedback engine closes this gap by:
+
+1. **Automating data collection** — pulling from 7+ tables (trading_decisions, orders, positions, portfolio_snapshots, recommendations, prediction_outcomes, model_metric_snapshots) into a single structured report
+2. **AI-powered summarization** — using the existing agent infrastructure (ai_agents, AgentConfigResolver, llm_factory) to generate natural-language summaries that highlight trends and anomalies
+3. **Cross-validation** — comparing computed metrics against live validation data (prediction_outcomes, model_metric_snapshots) and flagging discrepancies >5%
+4. **Persistent storage** — storing reports as JSONB for historical comparison and trend analysis
+5. **Scheduled generation** — daily (after market close) and weekly (Saturday) reports via Redis queue jobs
+
+The design reuses existing infrastructure: asyncpg for persistence, FastAPI for API endpoints, Redis queues for async job processing, the ai_agents/AgentConfigResolver/llm_factory stack for LLM access, and TanStack Query hooks on the frontend.
+
+---
+
+## Architecture
+
+### High-Level Data Flow
+
+```mermaid
+flowchart TD
+    subgraph "Scheduling (Trigger)"
+        A[Scheduler Service] -->|after 16:30 ET daily| B[Redis Queue<br/>stonks:queue:report_generation]
+        A -->|Saturday weekly| B
+        C[Manual API Trigger] --> B
+    end
+
+    subgraph "Report Generation (Async Worker)"
+        B --> D[Report Generator<br/>services/reporting/generator.py]
+        D -->|1. Collect| E[Data Collector<br/>services/reporting/collector.py]
+        E -->|queries| F[(trading_decisions<br/>orders, positions<br/>portfolio_snapshots<br/>recommendations)]
+        D -->|2. Build sections| G[Section Builder<br/>services/reporting/sections.py]
+        G -->|P&L, accuracy,<br/>positions, risk,<br/>model quality| H[Report Sections]
+        D -->|3. Validate| I[Report Validator<br/>services/reporting/validator.py]
+        I -->|cross-check| J[(prediction_outcomes<br/>model_metric_snapshots)]
+        D -->|4. Summarize| K[AI Summarizer<br/>services/reporting/summarizer.py]
+        K -->|chunk & summarize| L[Report_Summarizer_Agent<br/>via AgentConfigResolver<br/>+ llm_factory]
+        D -->|5. Store| M[(trading_reports table)]
+    end
+
+    subgraph "API Layer"
+        N[GET /api/reports] -->|paginated list| M
+        O[GET /api/reports/:id] -->|full report| M
+    end
+
+    subgraph "Frontend"
+        P[useReports hook] --> N
+        Q[useReport hook] --> O
+    end
+```
+
+### Scheduling Strategy
+
+| Component | Trigger | Cadence |
+|-----------|---------|---------|
+| Daily Report | Scheduler after 16:30 ET | Every trading day |
+| Weekly Report | Scheduler on Saturday | Weekly (Mon–Fri coverage) |
+| Report Generator Worker | Redis queue consumer | On-demand from queue |
+| AI Summarizer | Called by generator | Per report section |
+
+### Chunking Strategy
+
+The `qwen3.5:9b-fast` model has an 8k-token context window. With the system prompt (~200 tokens) and response budget (~200 tokens), roughly 7,600 tokens remain for input. At ~4 chars/token for structured data, that's ~30,400 characters. The 6,000-character chunk limit provides a 5x safety margin to account for JSON overhead, prompt framing, and tokenization variance.
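The budget arithmetic above can be reproduced in a few lines. This is only a sketch: the constant names are illustrative, "8k" is treated as 8,000 tokens, and the 4 chars/token ratio is the rough estimate quoted in the paragraph, not a measured value.

```python
# Figures taken from the paragraph above; names are illustrative only.
CONTEXT_TOKENS = 8_000        # "8k" context window, treated as 8,000 tokens
SYSTEM_PROMPT_TOKENS = 200    # approximate system prompt size
RESPONSE_TOKENS = 200         # approximate response budget
CHARS_PER_TOKEN = 4           # rough estimate for structured data
CHUNK_SIZE_LIMIT = 6_000      # characters per chunk

# Tokens left for input once prompt and response are reserved.
input_tokens = CONTEXT_TOKENS - SYSTEM_PROMPT_TOKENS - RESPONSE_TOKENS
# Approximate character capacity of that input budget.
input_chars = input_tokens * CHARS_PER_TOKEN
# How many chunk-size multiples fit into the input budget.
safety_margin = input_chars / CHUNK_SIZE_LIMIT

print(input_tokens, input_chars, round(safety_margin, 2))  # 7600 30400 5.07
```

Note that the agent seed later in this document sets `max_tokens` to 1024. If the model actually used that full output budget, the same calculation would leave 6,776 input tokens (about 27,100 characters), a margin of roughly 4.5x rather than 5x.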
+
+```mermaid
+flowchart LR
+    A[Section Data<br/>e.g. 15,000 chars] --> B{> 6,000 chars?}
+    B -->|No| C[Single LLM call<br/>→ summary]
+    B -->|Yes| D[Split into chunks<br/>≤ 6,000 chars each]
+    D --> E[Chunk 1 → LLM → summary 1]
+    D --> F[Chunk 2 → LLM → summary 2]
+    D --> G[Chunk N → LLM → summary N]
+    E --> H[Merge summaries<br/>→ final LLM call<br/>→ section summary]
+    F --> H
+    G --> H
+```
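One possible shape for the splitting step is sketched below. It is not the project's implementation: it assumes chunks should break on line boundaries where possible and fall back to a hard split only when a single line is longer than the limit, so that every chunk stays within the limit and the chunks concatenate back to the input.

```python
def chunk_data(serialized: str, max_chars: int = 6000) -> list[str]:
    """Split text into chunks of at most max_chars, preferring line boundaries."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not serialized:
        return [""]  # empty input yields exactly one empty chunk

    chunks: list[str] = []
    current = ""
    # keepends=True preserves line terminators, so joining chunks restores the input.
    for line in serialized.splitlines(keepends=True):
        # A single oversized line cannot fit in any chunk: hard-split it.
        while len(line) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:max_chars])
            line = line[max_chars:]
        # Start a new chunk when this line would overflow the current one.
        if len(current) + len(line) > max_chars:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks


print(chunk_data("ab\ncd\n", 3))   # ['ab\n', 'cd\n']
print(chunk_data("x" * 13, 5))     # ['xxxxx', 'xxxxx', 'xxx']
```

Splitting on line boundaries keeps each serialized row intact when the data is written one record per line; the hard-split fallback exists only so the size limit holds for arbitrary input.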
+
+---
+
+## Components and Interfaces
+
+### New Modules
+
+| Module | File | Responsibility |
+|--------|------|----------------|
+| Report Data Collector | `services/reporting/collector.py` | Queries trading data for a reporting period |
+| Report Section Builder | `services/reporting/sections.py` | Builds structured report sections from raw data |
+| Report Validator | `services/reporting/validator.py` | Cross-checks metrics against validation tables |
+| AI Summarizer | `services/reporting/summarizer.py` | Chunks data and generates AI summaries |
+| Report Generator | `services/reporting/generator.py` | Orchestrates the full report generation pipeline |
+| Report Models | `services/reporting/models.py` | Pydantic models for report structure and serialization |
+
+### Modified Modules
+
+| Module | File | Changes |
+|--------|------|---------|
+| Query API | `services/api/app.py` | 2 new `/api/reports` endpoints |
+| Redis Keys | `services/shared/redis_keys.py` | New `QUEUE_REPORT_GENERATION` constant |
+| Frontend Hooks | `frontend/src/api/hooks.ts` | 2 new report hooks |
+| DB Migration | `infra/migrations/038_trading_reports.sql` | New table + agent seed |
+
+### Component Interface Details
+
+#### 1. Report Models (`services/reporting/models.py`)
+
+```python
+from __future__ import annotations
+from datetime import date, datetime
+from enum import Enum
+from pydantic import BaseModel, Field
+
+
+class ReportType(str, Enum):
+    DAILY = "daily"
+    WEEKLY = "weekly"
+
+
+class ValidationStatus(str, Enum):
+    PASSED = "passed"
+    WARNINGS = "warnings"
+
+
+class ValidationWarning(BaseModel):
+    field_name: str
+    computed_value: float
+    snapshot_value: float
+    pct_difference: float
+
+
+class PLSection(BaseModel):
+    realized_pnl: float
+    unrealized_pnl: float
+    daily_return: float
+    cumulative_return: float
+    win_count: int
+    loss_count: int
+    win_rate: float
+    profit_factor: float
+    sharpe_ratio: float
+    summary: str = ""
+    validation_warnings: list[ValidationWarning] = Field(default_factory=list)
+
+
+class RecommendationAccuracySection(BaseModel):
+    total_evaluated: int
+    act_count: int
+    skip_count: int
+    acted_win_rate: float
+    avg_confidence_acted: float
+    avg_confidence_skipped: float
+    summary: str = ""
+    validation_warnings: list[ValidationWarning] = Field(default_factory=list)
+
+
+class PositionDetail(BaseModel):
+    ticker: str
+    entry_price: float
+    current_or_exit_price: float
+    pnl: float
+    pnl_pct: float
+    hold_duration_hours: float
+    status: str  # "open" or "closed"
+
+
+class PositionPerformanceSection(BaseModel):
+    positions: list[PositionDetail] = Field(default_factory=list)
+    summary: str = ""
+
+
+class RiskMetricsSection(BaseModel):
+    current_risk_tier: str
+    portfolio_heat: float
+    max_drawdown: float
+    current_drawdown_pct: float
+    reserve_pool_balance: float
+    circuit_breaker_event_count: int
+    summary: str = ""
+
+
+class ModelQualityWindow(BaseModel):
+    lookback: str
+    win_rate: float | None
+    directional_accuracy: float | None
+    information_coefficient: float | None
+    calibration_error: float | None
+    brier_score: float | None
+
+
+class ModelQualitySection(BaseModel):
+    windows: list[ModelQualityWindow] = Field(default_factory=list)
+    summary: str = ""
+    validation_warnings: list[ValidationWarning] = Field(default_factory=list)
+
+
+class ReportData(BaseModel):
+    """Top-level report structure stored as JSONB."""
+    pnl: PLSection
+    recommendation_accuracy: RecommendationAccuracySection
+    position_performance: PositionPerformanceSection
+    risk_metrics: RiskMetricsSection
+    model_quality: ModelQualitySection
+    executive_summary: str = ""
+    validation_status: ValidationStatus = ValidationStatus.PASSED
+    generated_at: datetime
+    period_start: date
+    period_end: date
+    report_type: ReportType
+```
+
+#### 2. Report Data Collector (`services/reporting/collector.py`)
+
+```python
+from __future__ import annotations
+from dataclasses import dataclass
+from datetime import date
+import asyncpg
+
+
+@dataclass
+class CollectedData:
+    """Raw data collected for a reporting period."""
+    trading_decisions: list[dict]
+    orders: list[dict]
+    open_positions: list[dict]
+    closed_positions: list[dict]
+    portfolio_snapshot: dict | None
+    previous_portfolio_snapshot: dict | None
+    recommendations: list[dict]
+    prediction_outcomes: list[dict]
+    model_metric_snapshots: list[dict]
+    circuit_breaker_events: list[dict]
+    reserve_pool_balance: float
+
+
+async def collect_report_data(
+    pool: asyncpg.Pool,
+    period_start: date,
+    period_end: date,
+) -> CollectedData:
+    """Query all trading data for the reporting period.
+
+    Queries: trading_decisions, orders, positions, portfolio_snapshots,
+    recommendations, prediction_outcomes, model_metric_snapshots,
+    circuit_breaker_events, reserve_pool_ledger.
+
+    Returns CollectedData with all raw query results.
+    If no trading_decisions exist, returns empty lists (zero-activity).
+    """
+    ...
+```
+
+#### 3. Report Section Builder (`services/reporting/sections.py`)
+
+```python
+from __future__ import annotations
+from services.reporting.models import (
+    PLSection, RecommendationAccuracySection,
+    PositionPerformanceSection, PositionDetail,
+    RiskMetricsSection, ModelQualitySection, ModelQualityWindow,
+)
+from services.reporting.collector import CollectedData
+
+
+def build_pnl_section(data: CollectedData) -> PLSection:
+    """Build P&L section from collected data.
+
+    Computes realized/unrealized P&L, daily return, cumulative return,
+    win/loss counts, win rate, profit factor, and Sharpe ratio from
+    portfolio_snapshot and closed positions.
+    """
+    ...
+
+
+def build_recommendation_accuracy_section(data: CollectedData) -> RecommendationAccuracySection:
+    """Build recommendation accuracy section.
+
+    Joins trading_decisions with prediction_outcomes to compute
+    act/skip breakdown, win rate of acted recommendations, and
+    average confidence of acted vs skipped.
+    """
+    ...
+
+
+def build_position_performance_section(data: CollectedData) -> PositionPerformanceSection:
+    """Build position performance section.
+
+    Lists each position (open and closed) with entry price,
+    current/exit price, P&L, P&L%, and hold duration.
+    """
+    ...
+
+
+def build_risk_metrics_section(data: CollectedData) -> RiskMetricsSection:
+    """Build risk metrics section.
+
+    Extracts current risk tier, portfolio heat, max drawdown,
+    current drawdown %, reserve pool balance, and circuit breaker
+    event count from collected data.
+    """
+    ...
+
+
+def build_model_quality_section(data: CollectedData) -> ModelQualitySection:
+    """Build model quality section.
+
+    Extracts latest model_metric_snapshot values for 7d, 30d, 90d
+    lookback windows.
+    """
+    ...
+```
+
+#### 4. Report Validator (`services/reporting/validator.py`)
+
+```python
+from __future__ import annotations
+from services.reporting.models import (
+    ModelQualitySection, RecommendationAccuracySection,
+    ReportData, ValidationStatus, ValidationWarning,
+)
+
+
+DISCREPANCY_THRESHOLD_PCT = 5.0
+
+
+def validate_recommendation_accuracy(
+    section: RecommendationAccuracySection,
+    prediction_outcomes: list[dict],
+) -> list[ValidationWarning]:
+    """Cross-reference reported win rates with prediction_outcomes.
+
+    Compares computed win rate against direction_correct/profitable
+    fields from prediction_outcomes for the same tickers and period.
+    Returns warnings for discrepancies > 5%.
+    """
+    ...
+
+
+def validate_model_quality(
+    section: ModelQualitySection,
+    metric_snapshots: list[dict],
+) -> list[ValidationWarning]:
+    """Compare reported model quality metrics against model_metric_snapshots.
+
+    Flags discrepancies > 5% between computed and snapshot values
+    for win_rate, directional_accuracy, IC, ECE, and Brier score.
+    """
+    ...
+
+
+def compute_validation_status(report: ReportData) -> ValidationStatus:
+    """Determine overall validation status.
+
+    Returns 'passed' if no warnings across all sections,
+    'warnings' if any section has validation warnings.
+    """
+    ...
+```
+
+#### 5. AI Summarizer (`services/reporting/summarizer.py`)
+
+```python
+from __future__ import annotations
+import asyncpg
+from services.shared.agent_config import AgentConfigResolver
+
+
+CHUNK_SIZE_LIMIT = 6000  # characters per chunk
+MAX_SUMMARY_WORDS = 200  # per section summary
+MAX_EXECUTIVE_SUMMARY_WORDS = 300
+
+
+def chunk_data(serialized: str, max_chars: int = CHUNK_SIZE_LIMIT) -> list[str]:
+    """Split serialized data into chunks of at most max_chars.
+
+    Prefers newline boundaries to avoid breaking JSON structures; a single
+    line longer than max_chars is split mid-line so the limit always holds.
+    Each chunk is ≤ max_chars characters, and concatenating the chunks
+    in order reconstructs the input.
+    Returns at least one chunk (even if empty input).
+    """
+    ...
+
+
+async def summarize_section(
+    pool: asyncpg.Pool,
+    resolver: AgentConfigResolver,
+    section_name: str,
+    section_data: str,
+) -> str:
+    """Generate AI summary for a report section.
+
+    1. Serialize section data to string
+    2. Chunk if > CHUNK_SIZE_LIMIT
+    3. Summarize each chunk via Report_Summarizer_Agent
+    4. If multiple chunks, merge summaries with a final LLM call
+    5. Log each invocation to agent_performance_log
+    6. On failure after max_retries, fall back to deterministic summary
+
+    Uses AgentConfigResolver to resolve agent config by slug
+    'report-summarizer', then llm_factory to build the LLM client.
+    """
+    ...
+
+
+def build_deterministic_summary(section_name: str, section_data: dict) -> str:
+    """Build a fallback deterministic summary from raw metrics.
+
+    Produces a template-based text summary when AI summarization fails.
+    """
+    ...
+
+
+async def generate_executive_summary(
+    pool: asyncpg.Pool,
+    resolver: AgentConfigResolver,
+    section_summaries: dict[str, str],
+) -> str:
+    """Generate executive summary from all section summaries.
+
+    Concatenates section summaries, chunks if needed, and produces
+    a ≤300-word synthesis via the Report_Summarizer_Agent.
+    Falls back to concatenated section summaries on failure.
+    """
+    ...
+```
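The control flow described in the `summarize_section` docstring (summarize each chunk, merge, retry, fall back) might look roughly like the sketch below. The LLM is abstracted as a hypothetical async callable `llm_call(prompt) -> str`; the real code would resolve the agent through `AgentConfigResolver` and `llm_factory`, whose signatures are not shown in this document, and the prompt strings here are placeholders.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable

# Hypothetical stand-in for the LLM client: prompt in, summary text out.
LLMCall = Callable[[str], Awaitable[str]]


async def call_with_retries(llm_call: LLMCall, prompt: str, max_retries: int = 2) -> str | None:
    """Return a non-empty response, or None once every attempt has failed."""
    for _ in range(max_retries + 1):  # first attempt plus max_retries retries
        try:
            text = (await llm_call(prompt)).strip()
        except Exception:
            continue  # timeouts and other errors count as a failed attempt
        if text:  # an empty response is treated as a failure
            return text
    return None


async def summarize_chunks(chunks: list[str], llm_call: LLMCall, fallback: str) -> str:
    """Summarize each chunk, then merge; use the fallback if any chunk fails."""
    if not chunks:
        return fallback
    partials: list[str] = []
    for chunk in chunks:
        summary = await call_with_retries(llm_call, f"Summarize:\n{chunk}")
        if summary is None:
            return fallback  # retries exhausted: deterministic summary
        partials.append(summary)
    if len(partials) == 1:
        return partials[0]  # single chunk needs no merge call
    joined = "\n".join(partials)
    merged = await call_with_retries(llm_call, f"Merge these summaries:\n{joined}")
    # If the merge call fails, keep the chunk summaries joined with newlines.
    return merged if merged is not None else joined


async def _demo_llm(prompt: str) -> str:
    return "merged" if prompt.startswith("Merge") else "part"


print(asyncio.run(summarize_chunks(["a", "b"], _demo_llm, "fallback")))  # merged
```

Passing the fallback text in as a plain string keeps this sketch independent of `build_deterministic_summary`; in the real module the fallback would presumably be built from the section's raw metrics.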
+
+#### 6. Report Generator (`services/reporting/generator.py`)
+
+```python
+from __future__ import annotations
+from datetime import date
+import asyncpg
+from services.reporting.models import ReportData, ReportType
+
+
+async def generate_report(
+    pool: asyncpg.Pool,
+    report_type: ReportType,
+    period_start: date,
+    period_end: date,
+) -> ReportData:
+    """Orchestrate full report generation.
+
+    1. Collect data via collector
+    2. Build sections via section builder
+    3. Validate sections via validator
+    4. Generate AI summaries via summarizer
+    5. Generate executive summary
+    6. Assemble final ReportData
+    """
+    ...
+
+
+async def store_report(
+    pool: asyncpg.Pool,
+    report: ReportData,
+) -> str:
+    """Store report in trading_reports table.
+
+    Uses INSERT ... ON CONFLICT (report_type, period_start, period_end)
+    DO UPDATE to handle regeneration of existing reports.
+
+    Returns the report UUID.
+    """
+    ...
+
+
+async def process_report_job(
+    pool: asyncpg.Pool,
+    job: dict,
+) -> None:
+    """Process a report generation job from the Redis queue.
+
+    Deserializes job payload, calls generate_report + store_report.
+    Handles retries with exponential backoff (up to 3 retries: 30s, 60s, 120s).
+    Rejects duplicate jobs for the same report_type + period.
+    """
+    ...
+```
+
+#### 7. API Endpoints (added to `services/api/app.py`)
+
+| Endpoint | Method | Parameters | Returns |
+|----------|--------|------------|---------|
+| `/api/reports` | GET | `report_type`, `start_date`, `end_date`, `limit`, `offset` (query) | Paginated list: id, report_type, period_start, period_end, validation_status, generated_at |
+| `/api/reports/{report_id}` | GET | `report_id` (path) | Full report including report_data JSONB |
+
+#### 8. Frontend Hooks (added to `frontend/src/api/hooks.ts`)
+
+```typescript
+export interface ReportListItem {
+  id: string;
+  report_type: string;
+  period_start: string;
+  period_end: string;
+  validation_status: string;
+  generated_at: string;
+}
+
+export interface ReportDetail extends ReportListItem {
+  report_data: Record<string, unknown>;
+  created_at: string;
+}
+
+export function useReports(params?: {
+  report_type?: string;
+  start_date?: string;
+  end_date?: string;
+  limit?: number;
+  offset?: number;
+}) {
+  const qs = new URLSearchParams();
+  if (params?.report_type) qs.set('report_type', params.report_type);
+  if (params?.start_date) qs.set('start_date', params.start_date);
+  if (params?.end_date) qs.set('end_date', params.end_date);
+  if (params?.limit) qs.set('limit', String(params.limit));
+  if (params?.offset) qs.set('offset', String(params.offset));
+  const path = `/api/reports${qs.toString() ? '?' + qs : ''}`;
+  return useGet(['reports', params], 'query', path);
+}
+
+export function useReport(id: string | undefined) {
+  return useGet(
+    ['report', id], 'query', `/api/reports/${id}`, !!id
+  );
+}
+```
+
+---
+
+## Data Models
+
+### Database Schema (Migration 038)
+
+#### trading_reports
+
+```sql
+CREATE TABLE IF NOT EXISTS trading_reports (
+    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+    report_type VARCHAR(20) NOT NULL,
+    period_start DATE NOT NULL,
+    period_end DATE NOT NULL,
+    report_data JSONB NOT NULL,
+    validation_status VARCHAR(20) NOT NULL DEFAULT 'passed',
+    generated_at TIMESTAMPTZ NOT NULL,
+    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
+    CONSTRAINT uq_trading_reports_period UNIQUE (report_type, period_start, period_end),
+    CONSTRAINT chk_report_type CHECK (report_type IN ('daily', 'weekly'))
+);
+
+CREATE INDEX IF NOT EXISTS idx_trading_reports_type ON trading_reports(report_type);
+CREATE INDEX IF NOT EXISTS idx_trading_reports_period ON trading_reports(period_start, period_end);
+CREATE INDEX IF NOT EXISTS idx_trading_reports_generated ON trading_reports(generated_at DESC);
+```
+
+#### Report Summarizer Agent Seed
+
+```sql
+INSERT INTO ai_agents (name, slug, purpose, model_provider, model_name, system_prompt, prompt_version, schema_version, temperature, max_tokens, timeout_seconds, max_retries, source)
+SELECT * FROM (VALUES
+    (
+        'Report Summarizer',
+        'report-summarizer',
+        'Generates concise natural-language summaries of trading performance report sections. Processes chunked data within the 8k-token context window.',
+        'ollama',
+        'qwen3.5:9b-fast',
+        E'You are a concise financial performance analyst. You summarize trading performance data into clear, professional prose.\n\nSTRICT RULES:\n1. Do NOT fabricate any data not present in the input.\n2. Do NOT add opinions, predictions, or recommendations.\n3. Keep each summary under 200 words.\n4. Highlight notable trends, outliers, and changes from prior periods.\n5. Use precise numbers from the input data.\n6. Use a neutral, professional tone.\n7. Return ONLY the summary text. No JSON, no markdown, no commentary.',
+        'report-summarizer-v1',
+        '1.0.0',
+        0.0,
+        1024,
+        60,
+        2,
+        'system'
+    )
+) AS v(name, slug, purpose, model_provider, model_name, system_prompt, prompt_version, schema_version, temperature, max_tokens, timeout_seconds, max_retries, source)
+WHERE NOT EXISTS (SELECT 1 FROM ai_agents WHERE slug = 'report-summarizer');
+```
+
+### Report JSONB Structure
+
+The `report_data` column stores a JSON object matching the `ReportData` Pydantic model:
+
+```json
+{
+  "pnl": {
+    "realized_pnl": 125.50,
+    "unrealized_pnl": -30.20,
+    "daily_return": 0.012,
+    "cumulative_return": 0.085,
+    "win_count": 8,
+    "loss_count": 3,
+    "win_rate": 0.727,
+    "profit_factor": 2.15,
+    "sharpe_ratio": 1.42,
+    "summary": "AI-generated summary...",
+    "validation_warnings": []
+  },
+  "recommendation_accuracy": {
+    "total_evaluated": 15,
+    "act_count": 8,
+    "skip_count": 7,
+    "acted_win_rate": 0.75,
+    "avg_confidence_acted": 0.72,
+    "avg_confidence_skipped": 0.48,
+    "summary": "AI-generated summary...",
+    "validation_warnings": []
+  },
+  "position_performance": {
+    "positions": [
+      {
+        "ticker": "AAPL",
+        "entry_price": 185.50,
+        "current_or_exit_price": 192.30,
+        "pnl": 68.00,
+        "pnl_pct": 3.66,
+        "hold_duration_hours": 72.5,
+        "status": "open"
+      }
+    ],
+    "summary": "AI-generated summary..."
+  },
+  "risk_metrics": {
+    "current_risk_tier": "moderate",
+    "portfolio_heat": 0.12,
+    "max_drawdown": 0.08,
+    "current_drawdown_pct": 0.03,
+    "reserve_pool_balance": 450.00,
+    "circuit_breaker_event_count": 1,
+    "summary": "AI-generated summary..."
+  },
+  "model_quality": {
+    "windows": [
+      {
+        "lookback": "7d",
+        "win_rate": 0.65,
+        "directional_accuracy": 0.62,
+        "information_coefficient": 0.08,
+        "calibration_error": 0.12,
+        "brier_score": 0.22
+      }
+    ],
+    "summary": "AI-generated summary...",
+    "validation_warnings": []
+  },
+  "executive_summary": "AI-generated executive summary...",
+  "validation_status": "passed",
+  "generated_at": "2025-01-15T21:30:00Z",
+  "period_start": "2025-01-15",
+  "period_end": "2025-01-15",
+  "report_type": "daily"
+}
+```
+
+
+---
+
+## Correctness Properties
+
+*A property is a characteristic or behavior that should hold true across all valid executions of a system — essentially, a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.*
+
+The following properties were derived from the acceptance criteria through systematic prework analysis. After reflection, 5 unique properties remain. Report section structure checks (3.1–3.5) are subsumed by the round-trip property — if a ReportData object survives serialization and deserialization, its structure is correct by construction (Pydantic enforces required fields). Validation status computation (4.4) is subsumed by the discrepancy detection property. ISO 8601 datetime formatting (8.4) is verified as part of the round-trip property since Pydantic's JSON serialization uses ISO 8601 by default and the round-trip would fail if datetimes were mangled.
+
+### Property 1: Chunking Round-Trip and Size Constraint
+
+*For any* input string, splitting it into chunks with a maximum size limit SHALL produce chunks where (a) every chunk is ≤ the size limit in characters, (b) no chunk is empty (except when the input itself is empty, which produces exactly one empty chunk), and (c) concatenating all chunks in order reconstructs the original input string.
+
+**Validates: Requirements 2.2**
+
+### Property 2: Report Serialization Round-Trip
+
+*For any* valid ReportData object (with valid P&L, recommendation accuracy, position performance, risk metrics, and model quality sections), serializing to JSON and then deserializing back SHALL produce a ReportData object equivalent to the original. All datetime fields in the serialized JSON SHALL be in ISO 8601 format.
+
+**Validates: Requirements 8.1, 8.2, 8.3, 8.4**
+
+### Property 3: Validation Discrepancy Detection Correctness
+
+*For any* pair of computed metric value and snapshot metric value (both finite, non-negative floats), the validation function SHALL produce a warning if and only if the percentage difference exceeds 5%. The percentage difference SHALL be computed as `|computed - snapshot| / snapshot * 100` when snapshot > 0, and SHALL flag any non-zero computed value when snapshot is 0.
+
+**Validates: Requirements 4.1, 4.2, 4.3, 4.4**
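As an illustration of Property 3, the discrepancy rule could be written as below. The function names and the dict-shaped warning are assumptions made for this sketch; the NULL-snapshot and zero-snapshot behavior follows the Validation Edge Cases table in this document, and inputs are assumed finite and non-negative as the property states.

```python
from __future__ import annotations

DISCREPANCY_THRESHOLD_PCT = 5.0


def pct_difference(computed: float, snapshot: float) -> float:
    """Percentage difference relative to the snapshot value."""
    if snapshot == 0:
        # Zero snapshot: no difference if both are zero, otherwise flag as 100%.
        return 0.0 if computed == 0 else 100.0
    return abs(computed - snapshot) / snapshot * 100


def check_metric(field_name: str, computed: float, snapshot: float | None) -> dict | None:
    """Return a warning dict if the discrepancy exceeds the threshold, else None."""
    if snapshot is None:
        return None  # NULL snapshot: validation is skipped for this metric
    diff = pct_difference(computed, snapshot)
    if diff > DISCREPANCY_THRESHOLD_PCT:
        return {
            "field_name": field_name,
            "computed_value": computed,
            "snapshot_value": snapshot,
            "pct_difference": diff,
        }
    return None


print(check_metric("win_rate", 104.0, 100.0))          # None (about 4% apart)
print(check_metric("win_rate", 106.0, 100.0) is None)  # False (about 6% apart)
```

Because the comparison is a strict `>` on floating-point values, inputs that sit exactly on the 5% boundary can land on either side after rounding; tests for the boundary case may want to pick values that are exactly representable.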
+
+### Property 4: Recommendation Accuracy Aggregation
+
+*For any* non-empty list of trading decisions with associated prediction outcomes (each having a boolean `direction_correct`, boolean `profitable`, and float `excess_return_vs_spy`), the computed win rate SHALL equal the count of profitable outcomes divided by total outcomes, the directional accuracy SHALL equal the count of direction-correct outcomes divided by total outcomes, and the average excess return SHALL equal the arithmetic mean of all excess_return_vs_spy values. Both rates SHALL be in [0.0, 1.0], and the average excess return SHALL be finite.
+
+**Validates: Requirements 1.4**
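The three aggregates in Property 4 reduce to two ratios and a mean. A minimal sketch, assuming each outcome is a dict with the three keys named in the property (the function name and return shape are illustrative):

```python
from statistics import fmean


def aggregate_outcomes(outcomes: list[dict]) -> dict[str, float]:
    """Compute win rate, directional accuracy, and mean excess return."""
    if not outcomes:
        raise ValueError("outcomes must be non-empty")
    total = len(outcomes)
    return {
        # Share of outcomes flagged profitable.
        "win_rate": sum(1 for o in outcomes if o["profitable"]) / total,
        # Share of outcomes where the predicted direction was correct.
        "directional_accuracy": sum(1 for o in outcomes if o["direction_correct"]) / total,
        # Arithmetic mean of excess return versus SPY.
        "avg_excess_return": fmean(o["excess_return_vs_spy"] for o in outcomes),
    }


sample = [
    {"profitable": True, "direction_correct": True, "excess_return_vs_spy": 0.5},
    {"profitable": True, "direction_correct": False, "excess_return_vs_spy": -0.25},
    {"profitable": False, "direction_correct": False, "excess_return_vs_spy": 0.0},
    {"profitable": True, "direction_correct": True, "excess_return_vs_spy": 0.75},
]
print(aggregate_outcomes(sample))
# {'win_rate': 0.75, 'directional_accuracy': 0.5, 'avg_excess_return': 0.25}
```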
+
+### Property 5: Portfolio Period-Over-Period Delta Computation
+
+*For any* two valid portfolio snapshots (current and previous) with non-negative portfolio_value, active_pool, reserve_pool, and finite cumulative_return, the period-over-period deltas SHALL equal (current - previous) for each field. When no previous snapshot exists, the deltas SHALL be zero.
+
+**Validates: Requirements 1.3**
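Property 5 amounts to a field-wise subtraction with a zero default. A sketch, assuming snapshots are plain dicts keyed by the four field names listed in the property (the function name is hypothetical):

```python
from __future__ import annotations

DELTA_FIELDS = ("portfolio_value", "active_pool", "reserve_pool", "cumulative_return")


def period_deltas(current: dict, previous: dict | None) -> dict[str, float]:
    """Return current - previous per field, or zeros when no previous snapshot exists."""
    if previous is None:
        return {field: 0.0 for field in DELTA_FIELDS}
    return {field: current[field] - previous[field] for field in DELTA_FIELDS}


current = {"portfolio_value": 1050.0, "active_pool": 800.0,
           "reserve_pool": 250.0, "cumulative_return": 0.125}
previous = {"portfolio_value": 1000.0, "active_pool": 775.0,
            "reserve_pool": 225.0, "cumulative_return": 0.0625}
print(period_deltas(current, previous))
# {'portfolio_value': 50.0, 'active_pool': 25.0, 'reserve_pool': 25.0, 'cumulative_return': 0.0625}
```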
+
+---
+
+## Error Handling
+
+### Data Collection Failures
+
+| Scenario | Handling |
+|----------|----------|
+| No trading_decisions for period | Generate zero-activity report with note "No trading activity during this period" |
+| No portfolio_snapshot for period | Use most recent snapshot before period_start; if none exists, use zero values |
+| No prediction_outcomes for period | Skip recommendation accuracy validation; set validation_warnings noting missing data |
+| No model_metric_snapshots for period | Model quality section shows NULL values for all metrics |
+| Database connection failure during collection | Propagate error to job processor for retry |
+
+### AI Summarization Failures
+
+| Scenario | Handling |
+|----------|----------|
+| LLM timeout (>60s) | Retry up to max_retries (from agent config, default 2) |
+| LLM returns empty response | Treat as failure, retry |
+| LLM returns response > 200 words | Truncate to 200 words at sentence boundary |
+| All LLM retries exhausted | Fall back to deterministic template summary |
+| AgentConfigResolver returns None (agent not found) | Log error, use deterministic summary for all sections |
+| Chunk merge LLM call fails | Use concatenation of chunk summaries (joined with newlines) |
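The "truncate to 200 words at sentence boundary" rule in the table above leaves some room for interpretation. One possible reading, sketched here with a hypothetical helper name: clip to the word limit, then cut back to the last sentence-ending punctuation mark if one exists.

```python
import re


def truncate_words(text: str, max_words: int = 200) -> str:
    """Clip text to max_words, ending at the last complete sentence when possible."""
    words = text.split()
    if len(words) <= max_words:
        return text.strip()
    clipped = " ".join(words[:max_words])
    # Sentence ends: ., !, or ? followed by whitespace or end of string.
    # Requiring whitespace after the mark avoids cutting inside numbers like 1.42.
    ends = [m.end() for m in re.finditer(r"[.!?](?=\s|$)", clipped)]
    return clipped[: ends[-1]] if ends else clipped


print(truncate_words("One two three. Four five six. Seven eight nine.", 5))
# One two three.
```

When truncation happens, runs of whitespace are collapsed to single spaces as a side effect of the word split; text within the limit is returned unchanged apart from stripping.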
+
+### Validation Edge Cases
+
+| Scenario | Handling |
+|----------|----------|
+| Snapshot value is 0 and computed value is non-zero | Flag as warning with pct_difference = 100.0 |
+| Both snapshot and computed values are 0 | No warning (0% difference) |
+| Snapshot value is NULL | Skip validation for that metric, no warning |
+| Computed value is NaN or infinity | Replace with 0.0, log warning |
+| No prediction_outcomes to cross-reference | Skip recommendation accuracy validation entirely |
+
+### Report Storage Failures
+
+| Scenario | Handling |
+|----------|----------|
+| Unique constraint violation on insert | Use ON CONFLICT DO UPDATE to upsert |
+| JSONB serialization failure | Log error with report structure, propagate to job processor |
+| Report exceeds PostgreSQL JSONB size limit (~255 MB) | Extremely unlikely given report structure; log error if it occurs |
+
+### Job Processing Failures
+
+| Scenario | Handling |
+|----------|----------|
+| Job fails on first attempt | Retry with exponential backoff: 30s, 60s, 120s |
+| Job fails after 3 retries | Mark job as failed, log error with full context |
+| Duplicate job submitted for same period | Reject with log message, return without error |
+| Redis connection failure | Job stays in queue, picked up on reconnection |
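The 30s, 60s, 120s schedule in the table above is a doubling sequence. A small sketch of how the delays could be derived (the function name and parameters are illustrative, not taken from the codebase):

```python
def backoff_delays(base_seconds: int = 30, max_retries: int = 3) -> list[int]:
    """Delay before each retry: the base delay doubled on every attempt."""
    return [base_seconds * 2**attempt for attempt in range(max_retries)]


print(backoff_delays())  # [30, 60, 120]
```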
+
+---
+
+## Testing Strategy
+
+### Property-Based Tests (Hypothesis)
+
+Property-based tests use the Hypothesis library with `@settings(max_examples=100)`. Test files are prefixed `test_pbt_*` per project convention.
+
+| Property | Test File | What It Tests |
+|----------|-----------|---------------|
+| Property 1: Chunking Round-Trip | `tests/test_pbt_report_chunking.py` | `chunk_data()` preserves content and respects size limits |
+| Property 2: Report Serialization Round-Trip | `tests/test_pbt_report_serialization.py` | `ReportData.model_dump_json()` → `ReportData.model_validate_json()` round-trip |
+| Property 3: Validation Discrepancy Detection | `tests/test_pbt_report_validation.py` | Discrepancy detection correctly flags >5% differences |
+| Property 4: Recommendation Accuracy Aggregation | `tests/test_pbt_report_sections.py` | `build_recommendation_accuracy_section()` computes correct aggregates |
+| Property 5: Portfolio Delta Computation | `tests/test_pbt_report_sections.py` | `build_pnl_section()` computes correct period-over-period deltas |
+
+Each property test is tagged with a comment referencing the design property:
+```python
+# Feature: trading-feedback-engine, Property 1: Chunking round-trip and size constraint
+```
+
+### Unit Tests (pytest)
+
+| Test File | Coverage |
+|-----------|----------|
+| `tests/test_report_sections.py` | Section builders with known inputs, edge cases (empty data, single position, zero-activity) |
+| `tests/test_report_validator.py` | Specific discrepancy scenarios, boundary cases (exactly 5%), NULL snapshot values |
+| `tests/test_report_summarizer.py` | Deterministic fallback summary, chunk splitting edge cases (empty input, single char) |
+| `tests/test_report_models.py` | Pydantic model validation, enum constraints, default values |
+| `tests/test_report_generator.py` | Orchestration with mocked dependencies, zero-activity report, upsert behavior |
+
+### Integration Tests
+
+| Test File | Coverage |
+|-----------|----------|
+| `tests/test_report_api.py` | API endpoints with seeded database, pagination, filtering by report_type and date range |
+| `tests/test_report_storage.py` | Store/retrieve round-trip against real asyncpg pool, upsert behavior, unique constraint |
+
+### Frontend Tests (Vitest)
+
+| Test File | Coverage |
+|-----------|----------|
+| `frontend/src/test/reports.test.ts` | useReports and useReport hooks with MSW mocks, loading/error states |
+
+### Test Configuration
+
+- Python PBT: Hypothesis with `@settings(max_examples=100)`, files prefixed `test_pbt_*`
+- Python unit/integration: pytest with pytest-asyncio for async code
+- Frontend: Vitest with MSW for deterministic API mocking
+- Lint: `ruff check services/` before all commits
+- CI: Woodpecker runs all tests automatically on push to Gitea
diff --git a/.kiro/specs/trading-feedback-engine/requirements.md b/.kiro/specs/trading-feedback-engine/requirements.md
new file mode 100644
index 0000000..6fce314
--- /dev/null
+++ b/.kiro/specs/trading-feedback-engine/requirements.md
@@ -0,0 +1,117 @@
+# Requirements Document
+
+## Introduction
+
+The Trading Feedback Engine generates periodic performance reports from the Stonks Oracle trading system. Reports cover trading P&L, recommendation accuracy, position performance, risk metrics, and model quality trends. An AI agent (registered in the `ai_agents` table) summarizes sections of the report by processing data in small chunks that fit within the 8k-token context window. Reports are validated against live data from the prediction outcomes and model metric snapshots tables, stored in the database for retrieval, and exposed via API endpoints.
+
+## Glossary
+
+- **Feedback_Engine**: The backend service that orchestrates report generation, data collection, AI summarization, and report storage.
+- **Report_Summarizer_Agent**: The AI agent registered in the `ai_agents` table that generates natural-language summaries for report sections. Uses the existing `AgentConfigResolver` and `llm_factory` infrastructure.
+- **Report**: A structured JSON document containing trading performance metrics, AI-generated summaries, and validation data for a specific period (daily or weekly).
+- **Report_Section**: A self-contained portion of a report (e.g., P&L summary, recommendation accuracy, position performance) that can be independently generated and summarized.
+- **Chunk**: A subset of data rows small enough to fit within the 8k-token context window when serialized, allowing the Report_Summarizer_Agent to process it in a single LLM call.
+- **Portfolio_Snapshot**: A daily record in the `portfolio_snapshots` table containing portfolio value, pool balances, returns, win/loss counts, Sharpe ratio, max drawdown, and risk tier.
+- **Prediction_Outcome**: A record in the `prediction_outcomes` table containing realized returns, direction correctness, and excess returns vs benchmarks for a prediction at a specific horizon.
+- **Model_Metric_Snapshot**: A record in the `model_metric_snapshots` table containing aggregate model quality metrics (win rate, IC, ECE, Brier score) for a lookback/horizon combination.
+- **Trading_Decision**: A record in the `trading_decisions` table capturing the act/skip decision, skip reason, position sizing, risk tier, circuit breaker status, and decision trace for a recommendation evaluation.
+- **Validation_Data**: Live data from `prediction_outcomes`, `model_metric_snapshots`, and `signal_evidence_links` used to cross-check report claims against actual measured performance.
+- **Query_API**: The existing FastAPI service (`services/api/app.py`) that serves HTTP endpoints for the dashboard and external consumers.
+
+## Requirements
+
+### Requirement 1: Report Data Collection
+
+**User Story:** As a trader, I want the feedback engine to collect all relevant trading data for a reporting period, so that reports reflect the complete picture of trading activity.
+
+#### Acceptance Criteria
+
+1. WHEN a report generation is triggered for a date range, THE Feedback_Engine SHALL query trading_decisions, orders, positions, portfolio_snapshots, recommendations, prediction_outcomes, and model_metric_snapshots for that period.
+2. WHEN collecting trading decision data, THE Feedback_Engine SHALL include the decision type, skip reason, ticker, computed position size, risk tier, circuit breaker status, and correlation check result for each Trading_Decision.
+3. WHEN collecting portfolio data, THE Feedback_Engine SHALL retrieve the most recent Portfolio_Snapshot within the reporting period and compute period-over-period changes in portfolio value, active pool, reserve pool, and cumulative return.
+4. WHEN collecting recommendation accuracy data, THE Feedback_Engine SHALL join recommendations with Prediction_Outcomes to compute win rate, directional accuracy, and average excess return vs SPY for the period.
+5. IF no trading_decisions exist for the requested period, THEN THE Feedback_Engine SHALL generate a report with zero-activity sections and a note indicating no trading occurred.
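
As a rough illustration of criterion 4, the sketch below aggregates already-joined rows into the three accuracy figures. The `Outcome` class and `aggregate_accuracy` function are hypothetical names for this example only; the field names mirror `direction_correct`, `profitable`, and excess return vs SPY as described for Prediction_Outcomes, and the zero-row branch reflects the zero-activity case in criterion 5.

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass(frozen=True)
class Outcome:
    """Hypothetical stand-in for one recommendation joined to its prediction outcome."""
    direction_correct: bool
    profitable: bool
    excess_return_vs_spy: float


def aggregate_accuracy(outcomes: list[Outcome]) -> dict[str, float]:
    """Compute win rate, directional accuracy, and mean excess return vs SPY."""
    total = len(outcomes)
    if total == 0:
        # Zero-activity period: report zeros instead of dividing by zero.
        return {
            "win_rate": 0.0,
            "directional_accuracy": 0.0,
            "avg_excess_return_vs_spy": 0.0,
        }
    return {
        # Booleans sum as 0/1, so each ratio is a fraction in [0.0, 1.0].
        "win_rate": sum(o.profitable for o in outcomes) / total,
        "directional_accuracy": sum(o.direction_correct for o in outcomes) / total,
        "avg_excess_return_vs_spy": fmean(o.excess_return_vs_spy for o in outcomes),
    }
```

Keeping the aggregation a pure function over in-memory rows makes it straightforward to exercise with property-based tests, independent of the database queries that produce the rows.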
+
+### Requirement 2: Chunked AI Summarization
+
+**User Story:** As a trader, I want AI-generated summaries in my reports, so that I can quickly understand performance trends without reading raw numbers.
+
+#### Acceptance Criteria
+
+1. THE Report_Summarizer_Agent SHALL be registered in the `ai_agents` table with slug `report-summarizer`, model `qwen3.5:9b-fast`, and source `system`.
+2. WHEN generating a summary for a Report_Section, THE Feedback_Engine SHALL serialize the section data into Chunks of no more than 6,000 characters each to stay within the 8k-token context window.
+3. WHEN a Report_Section contains data that exceeds a single Chunk, THE Feedback_Engine SHALL split the data into multiple Chunks, summarize each Chunk independently, and then produce a final merged summary from the individual Chunk summaries.
+4. WHEN invoking the Report_Summarizer_Agent, THE Feedback_Engine SHALL use the existing `AgentConfigResolver` and `llm_factory` infrastructure to resolve model configuration and build the LLM client.
+5. WHEN invoking the Report_Summarizer_Agent, THE Feedback_Engine SHALL log each invocation to the `agent_performance_log` table with agent_id, success status, duration_ms, and token estimates.
+6. IF the Report_Summarizer_Agent fails after max_retries, THEN THE Feedback_Engine SHALL fall back to a deterministic text summary built from the raw metrics and continue report generation.
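
Criteria 2 and 3 can be sketched as follows. This is a minimal illustration, not the shipped `services/reporting/summarizer.py`: `summarize` is a hypothetical callable standing in for the Report_Summarizer_Agent call, and hard-splitting a single line longer than the limit is an assumption the requirements do not spell out.

```python
from typing import Callable

CHUNK_SIZE_LIMIT = 6000  # characters, per acceptance criterion 2


def chunk_data(serialized: str, max_chars: int = CHUNK_SIZE_LIMIT) -> list[str]:
    """Split text into chunks of at most max_chars, preferring line boundaries."""
    if max_chars < 1:
        raise ValueError("max_chars must be positive")
    chunks: list[str] = []
    current = ""
    for line in serialized.splitlines(keepends=True):
        # Assumption: a single line longer than the limit is hard-split.
        pieces = [line[i:i + max_chars] for i in range(0, len(line), max_chars)]
        for piece in pieces:
            if current and len(current) + len(piece) > max_chars:
                chunks.append(current)
                current = ""
            current += piece
    # Flush the tail; empty input still yields exactly one (empty) chunk.
    if current or not chunks:
        chunks.append(current)
    return chunks


def summarize_hierarchically(serialized: str, summarize: Callable[[str], str]) -> str:
    """Summarize each chunk, then merge the partial summaries in one more call."""
    partials = [summarize(chunk) for chunk in chunk_data(serialized)]
    if len(partials) == 1:
        return partials[0]
    return summarize("\n".join(partials))
```

Because line endings are kept, joining the chunks reproduces the input exactly. The merge step here assumes the combined partial summaries fit in one call, which is plausible when each summary is capped at a few hundred words but is not enforced by this sketch.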
+
+### Requirement 3: Report Structure and Content
+
+**User Story:** As a trader, I want reports to cover P&L, recommendation accuracy, position performance, risk metrics, and model quality, so that I have a comprehensive view of system performance.
+
+#### Acceptance Criteria
+
+1. THE Report SHALL contain a P&L section with realized P&L, unrealized P&L, daily return, cumulative return, win count, loss count, win rate, profit factor, and Sharpe ratio for the reporting period.
+2. THE Report SHALL contain a recommendation accuracy section with total recommendations evaluated, act/skip breakdown, win rate of acted-upon recommendations, and average confidence of acted vs skipped recommendations.
+3. THE Report SHALL contain a position performance section listing each position held during the period with ticker, entry price, current or exit price, unrealized or realized P&L, P&L percentage, and hold duration.
+4. THE Report SHALL contain a risk metrics section with current risk tier, portfolio heat, max drawdown, current drawdown percentage, reserve pool balance, and a count of circuit breaker events during the period.
+5. THE Report SHALL contain a model quality section with the latest Model_Metric_Snapshot values for win rate, directional accuracy, information coefficient, calibration error (ECE), and Brier score across the 7d, 30d, and 90d lookback windows.
+6. THE Report SHALL contain an AI-generated executive summary that synthesizes the key findings from all sections into a concise narrative of no more than 300 words.
+
+### Requirement 4: Report Validation Against Live Data
+
+**User Story:** As a trader, I want report metrics to be cross-checked against live validation data, so that I can trust the accuracy of the reported numbers.
+
+#### Acceptance Criteria
+
+1. WHEN generating the recommendation accuracy section, THE Feedback_Engine SHALL cross-reference reported win rates with the `direction_correct` and `profitable` fields from Prediction_Outcomes for the same tickers and period.
+2. WHEN generating the model quality section, THE Feedback_Engine SHALL compare the reported metrics against the most recent Model_Metric_Snapshot records and flag discrepancies greater than 5% between computed and snapshot values.
+3. WHEN a validation discrepancy is detected, THE Feedback_Engine SHALL include a `validation_warnings` array in the report section with the field name, computed value, snapshot value, and percentage difference.
+4. THE Report SHALL include a `validation_status` field set to `passed` when no discrepancies exceed 5%, or `warnings` when one or more discrepancies are detected.
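
The 5% rule can be illustrated with a small check. This is a sketch under stated assumptions: `check_discrepancy` and `validation_status` are hypothetical helper names, the `ValidationWarning` fields follow criterion 3, and the edge-case handling (zero snapshot, missing snapshot, NaN computed value) follows the rules listed for the validator in this change's implementation plan.

```python
import math
from dataclasses import dataclass
from typing import Optional

DISCREPANCY_THRESHOLD_PCT = 5.0


@dataclass(frozen=True)
class ValidationWarning:
    field: str
    computed_value: float
    snapshot_value: float
    pct_difference: float


def check_discrepancy(
    field: str, computed: float, snapshot: Optional[float]
) -> Optional[ValidationWarning]:
    """Return a warning when computed and snapshot differ by more than 5%."""
    if snapshot is None:
        return None  # no snapshot value to compare against: skip
    if math.isnan(computed):
        computed = 0.0  # NaN is treated as 0.0
    if snapshot == 0:
        if computed == 0:
            return None  # both zero: no discrepancy
        pct = 100.0  # any non-zero value against a zero snapshot counts as 100%
    else:
        pct = abs(computed - snapshot) / abs(snapshot) * 100.0
    if pct > DISCREPANCY_THRESHOLD_PCT:
        return ValidationWarning(field, computed, snapshot, pct)
    return None


def validation_status(warnings: list[ValidationWarning]) -> str:
    """Map the collected warnings to the report-level status."""
    return "warnings" if warnings else "passed"
```

For example, a computed win rate of 0.52 against a snapshot of 0.50 differs by about 4% and passes, while 0.60 against 0.50 differs by about 20% and yields a warning.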
+
+### Requirement 5: Report Storage and Retrieval
+
+**User Story:** As a trader, I want reports stored in the database and accessible via API, so that I can review historical performance at any time.
+
+#### Acceptance Criteria
+
+1. THE Feedback_Engine SHALL store each generated Report as a row in a `trading_reports` table with columns for id (UUID), report_type (daily/weekly), period_start (DATE), period_end (DATE), report_data (JSONB), validation_status (VARCHAR), generated_at (TIMESTAMPTZ), and created_at (TIMESTAMPTZ).
+2. THE Feedback_Engine SHALL enforce a unique constraint on (report_type, period_start, period_end) to prevent duplicate reports for the same period.
+3. WHEN a report for an existing period is regenerated, THE Feedback_Engine SHALL update the existing row with the new report_data, validation_status, and generated_at timestamp.
+4. THE Query_API SHALL expose a `GET /api/reports` endpoint that returns a paginated list of reports with id, report_type, period_start, period_end, validation_status, and generated_at.
+5. THE Query_API SHALL expose a `GET /api/reports/{report_id}` endpoint that returns the full report including report_data JSONB.
+6. THE Query_API SHALL support filtering reports by report_type and date range via query parameters on the `GET /api/reports` endpoint.
+
+### Requirement 6: Periodic Report Generation
+
+**User Story:** As a trader, I want reports generated automatically on a daily and weekly schedule, so that I always have up-to-date performance feedback.
+
+#### Acceptance Criteria
+
+1. THE Feedback_Engine SHALL generate a daily report after market close (after 16:30 ET) covering the current trading day.
+2. THE Feedback_Engine SHALL generate a weekly report on Saturday covering the Monday-through-Friday trading week.
+3. WHEN a scheduled report generation is triggered, THE Feedback_Engine SHALL enqueue a report generation job on a Redis queue for asynchronous processing.
+4. IF a report generation job fails, THEN THE Feedback_Engine SHALL retry the job up to 3 times with exponential backoff before marking the job as failed.
+5. WHILE a report generation job is in progress for a given period, THE Feedback_Engine SHALL reject duplicate job submissions for the same report_type and period.
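
Criterion 4 might be realized along these lines. The sketch reads "retry up to 3 times" as one initial attempt plus three retries, and takes the 30-second base delay from the implementation plan in this change; both readings are assumptions. `run_with_retries` and the injectable `sleep` parameter are hypothetical, chosen so the schedule can be tested without waiting.

```python
import time
from typing import Callable

MAX_RETRIES = 3
BASE_DELAY_SECONDS = 30.0


def backoff_delay(retry_number: int) -> float:
    """Delay before the given retry (1-based): 30s, 60s, 120s."""
    return BASE_DELAY_SECONDS * 2 ** (retry_number - 1)


def run_with_retries(
    job: Callable[[], None],
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run job once, then retry up to MAX_RETRIES times with doubling delays.

    Returns True on success and False once every retry has failed.
    """
    for attempt in range(MAX_RETRIES + 1):
        try:
            job()
            return True
        except Exception:
            if attempt == MAX_RETRIES:
                return False  # retries exhausted: caller marks the job failed
            sleep(backoff_delay(attempt + 1))
    return False
```

Passing a recording function as `sleep` in tests shows the delays 30, 60, and 120 seconds without actually blocking.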
+
+### Requirement 7: Agent Registration and Editability
+
+**User Story:** As a trader, I want the report summarizer agent registered in the ai_agents table, so that I can edit its prompts, model, and parameters through the existing agent management API.
+
+#### Acceptance Criteria
+
+1. THE Feedback_Engine SHALL register the Report_Summarizer_Agent in the `ai_agents` table via a database migration with slug `report-summarizer`, source `system`, model_provider `ollama`, and model_name `qwen3.5:9b-fast`.
+2. THE Report_Summarizer_Agent system prompt SHALL instruct the model to produce concise financial performance summaries, avoid fabricating data not present in the input, and keep each summary under 200 words.
+3. THE Report_Summarizer_Agent SHALL support variant creation and activation through the existing agent variants system, allowing A/B testing of different summarization prompts.
+4. WHEN the Report_Summarizer_Agent configuration is updated via the agent management API, THE Feedback_Engine SHALL pick up the new configuration within 60 seconds via the `AgentConfigResolver` TTL cache.
+
+### Requirement 8: Report Serialization Round-Trip
+
+**User Story:** As a developer, I want report data to survive serialization and deserialization without data loss, so that stored reports are always faithful to the generated content.
+
+#### Acceptance Criteria
+
+1. THE Feedback_Engine SHALL serialize Report objects to JSON for storage in the `report_data` JSONB column.
+2. THE Feedback_Engine SHALL deserialize stored JSON back into Report objects for API responses.
+3. FOR ALL valid Report objects, serializing to JSON then deserializing back SHALL produce an equivalent Report object (round-trip property).
+4. THE Feedback_Engine SHALL use ISO 8601 format for all datetime fields in serialized reports.
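
The round-trip property in criterion 3 can be demonstrated with a standard-library stand-in. `MiniReport` is a hypothetical, much-reduced substitute for the real report model (which this change implements with Pydantic); it only shows the shape of the property and the ISO 8601 handling from criterion 4.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date, datetime, timezone


@dataclass(frozen=True)
class MiniReport:
    """Hypothetical reduced report used only to illustrate the round trip."""
    report_type: str
    period_start: date
    period_end: date
    generated_at: datetime
    executive_summary: str

    def to_json(self) -> str:
        payload = asdict(self)
        # Dates and datetimes are written in ISO 8601 form.
        for key in ("period_start", "period_end", "generated_at"):
            payload[key] = payload[key].isoformat()
        return json.dumps(payload)

    @classmethod
    def from_json(cls, raw: str) -> "MiniReport":
        payload = json.loads(raw)
        return cls(
            report_type=payload["report_type"],
            period_start=date.fromisoformat(payload["period_start"]),
            period_end=date.fromisoformat(payload["period_end"]),
            generated_at=datetime.fromisoformat(payload["generated_at"]),
            executive_summary=payload["executive_summary"],
        )


report = MiniReport(
    report_type="daily",
    period_start=date(2025, 1, 15),
    period_end=date(2025, 1, 15),
    generated_at=datetime(2025, 1, 15, 21, 30, tzinfo=timezone.utc),
    executive_summary="Flat day, no circuit breaker events.",
)
# Round-trip property: deserialize(serialize(x)) == x
assert MiniReport.from_json(report.to_json()) == report
```

Using timezone-aware datetimes keeps the offset in the serialized string, so equality after the round trip does not depend on the server's local timezone.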
diff --git a/.kiro/specs/trading-feedback-engine/tasks.md b/.kiro/specs/trading-feedback-engine/tasks.md
new file mode 100644
index 0000000..fff79af
--- /dev/null
+++ b/.kiro/specs/trading-feedback-engine/tasks.md
@@ -0,0 +1,195 @@
+# Implementation Plan: Trading Feedback Engine
+
+## Overview
+
+Add a periodic trading performance reporting system to Stonks Oracle. The system collects trading data, generates structured JSON reports with AI-powered summaries, validates metrics against live data, and stores reports for retrieval via API. Implementation follows the four-phase approach from the design: foundation → validation & AI → generator & API → scheduling & tests.
+
+## Tasks
+
+- [x] 1. Database migration 038 — trading_reports table and report-summarizer agent
+ - [x] 1.1 Create `infra/migrations/038_trading_reports.sql`
+ - Create `trading_reports` table with columns: id (UUID PK, gen_random_uuid()), report_type (VARCHAR(20) NOT NULL), period_start (DATE NOT NULL), period_end (DATE NOT NULL), report_data (JSONB NOT NULL), validation_status (VARCHAR(20) NOT NULL DEFAULT 'passed'), generated_at (TIMESTAMPTZ NOT NULL), created_at (TIMESTAMPTZ NOT NULL DEFAULT NOW())
+ - Add UNIQUE constraint on (report_type, period_start, period_end)
+ - Add CHECK constraint: report_type IN ('daily', 'weekly')
+ - Create indexes: idx_trading_reports_type, idx_trading_reports_period, idx_trading_reports_generated
+ - Seed Report_Summarizer_Agent into ai_agents table with slug 'report-summarizer', model_provider 'ollama', model_name 'qwen3.5:9b-fast', source 'system', temperature 0.0, max_tokens 1024, timeout_seconds 60, max_retries 2
+ - Use WHERE NOT EXISTS guard on agent insert to be idempotent
+ - _Requirements: 5.1, 5.2, 7.1, 7.2_
+
+ - [x] 1.2 Add `QUEUE_REPORT_GENERATION` constant to `services/shared/redis_keys.py`
+ - Add `QUEUE_REPORT_GENERATION = "report_generation"` following existing queue naming convention
+ - _Requirements: 6.3_
+
+- [x] 2. Phase 1 — Report models, data collector, and section builders
+ - [x] 2.1 Create report models (`services/reporting/models.py`)
+ - Create `services/reporting/__init__.py`
+ - Define enums: ReportType (daily, weekly), ValidationStatus (passed, warnings)
+ - Define Pydantic models: ValidationWarning, PLSection, RecommendationAccuracySection, PositionDetail, PositionPerformanceSection, RiskMetricsSection, ModelQualityWindow, ModelQualitySection, ReportData
+ - ReportData includes all sections, executive_summary, validation_status, generated_at, period_start, period_end, report_type
+ - _Requirements: 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 8.1, 8.2, 8.4_
+
+ - [x] 2.2 Implement data collector (`services/reporting/collector.py`)
+ - Define CollectedData dataclass with fields: trading_decisions, orders, open_positions, closed_positions, portfolio_snapshot, previous_portfolio_snapshot, recommendations, prediction_outcomes, model_metric_snapshots, circuit_breaker_events, reserve_pool_balance
+ - Implement `collect_report_data(pool, period_start, period_end)` → CollectedData
+ - Query trading_decisions, orders, positions (open + closed), portfolio_snapshots (current + previous), recommendations, prediction_outcomes, model_metric_snapshots, circuit_breaker_events, reserve_pool_ledger for the period
+ - Return empty lists for tables with no data (zero-activity case)
+ - Use `_row_dict()` pattern for UUID conversion from asyncpg rows
+ - _Requirements: 1.1, 1.2, 1.3, 1.4, 1.5_
+
+ - [x] 2.3 Implement section builders (`services/reporting/sections.py`)
+ - Implement `build_pnl_section(data: CollectedData) -> PLSection` — compute realized/unrealized P&L, daily return, cumulative return, win/loss counts, win rate, profit factor, Sharpe ratio from portfolio_snapshot and closed positions
+ - Implement `build_recommendation_accuracy_section(data: CollectedData) -> RecommendationAccuracySection` — join trading_decisions with prediction_outcomes, compute act/skip breakdown, win rate of acted, avg confidence acted vs skipped
+ - Implement `build_position_performance_section(data: CollectedData) -> PositionPerformanceSection` — list each position with ticker, entry price, current/exit price, P&L, P&L%, hold duration
+ - Implement `build_risk_metrics_section(data: CollectedData) -> RiskMetricsSection` — extract risk tier, portfolio heat, max drawdown, current drawdown %, reserve pool balance, circuit breaker event count
+ - Implement `build_model_quality_section(data: CollectedData) -> ModelQualitySection` — extract model_metric_snapshot values for 7d, 30d, 90d lookback windows
+ - Handle zero-activity gracefully (zero values, empty lists)
+ - _Requirements: 1.3, 1.4, 3.1, 3.2, 3.3, 3.4, 3.5_
+
+- [x] 3. Checkpoint — Verify foundation modules
+ - Ensure all tests pass, ask the user if questions arise.
+ - Run `.venv/bin/ruff check services/reporting/`
+ - Run `.venv/bin/python -m pytest tests/ -x --tb=short -q -k "report"` to verify models and section builders
+
+- [x] 4. Phase 2 — Report validator and AI summarizer
+ - [x] 4.1 Implement report validator (`services/reporting/validator.py`)
+ - Define `DISCREPANCY_THRESHOLD_PCT = 5.0`
+ - Implement `validate_recommendation_accuracy(section, prediction_outcomes)` → list[ValidationWarning] — compare computed win rate against direction_correct/profitable from prediction_outcomes, flag >5% discrepancies
+ - Implement `validate_model_quality(section, metric_snapshots)` → list[ValidationWarning] — compare reported metrics against model_metric_snapshots for win_rate, directional_accuracy, IC, ECE, Brier score, flag >5% discrepancies
+ - Implement `compute_validation_status(report: ReportData)` → ValidationStatus — return 'passed' if no warnings, 'warnings' if any section has validation_warnings
+ - Handle edge cases: snapshot=0 with computed≠0 → 100% difference; both=0 → no warning; snapshot=NULL → skip; computed=NaN → replace with 0.0
+ - _Requirements: 4.1, 4.2, 4.3, 4.4_
+
+ - [x] 4.2 Implement AI summarizer (`services/reporting/summarizer.py`)
+ - Define constants: CHUNK_SIZE_LIMIT=6000, MAX_SUMMARY_WORDS=200, MAX_EXECUTIVE_SUMMARY_WORDS=300
+ - Implement `chunk_data(serialized: str, max_chars: int)` → list[str] — split on newline boundaries, each chunk ≤ max_chars, at least one chunk returned
+ - Implement `summarize_section(pool, resolver, section_name, section_data)` → str — serialize, chunk if needed, summarize each chunk via Report_Summarizer_Agent (resolved by slug 'report-summarizer'), merge if multiple chunks, log to agent_performance_log, fall back to deterministic on failure
+ - Implement `build_deterministic_summary(section_name, section_data)` → str — template-based fallback summary from raw metrics
+ - Implement `generate_executive_summary(pool, resolver, section_summaries)` → str — concatenate section summaries, chunk if needed, produce ≤300-word synthesis, fall back to concatenation on failure
+ - Use AgentConfigResolver + llm_factory for LLM access
+ - Log each invocation to agent_performance_log with agent_id, success, duration_ms, token estimates
+ - _Requirements: 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 3.6_
+
+- [x] 5. Checkpoint — Verify validator and summarizer
+ - Ensure all tests pass, ask the user if questions arise.
+ - Run `.venv/bin/ruff check services/reporting/`
+ - Run `.venv/bin/python -m pytest tests/ -x --tb=short -q -k "report"` to verify validator and summarizer
+
+- [x] 6. Phase 3 — Report generator orchestrator and API endpoints
+ - [x] 6.1 Implement report generator (`services/reporting/generator.py`)
+ - Implement `generate_report(pool, report_type, period_start, period_end)` → ReportData — orchestrate: collect data → build sections → validate → summarize → assemble ReportData
+ - Implement `store_report(pool, report)` → str (UUID) — INSERT ... ON CONFLICT (report_type, period_start, period_end) DO UPDATE for upsert, return report id
+ - Implement `process_report_job(pool, job: dict)` → None — deserialize job payload, call generate_report + store_report, handle retries with exponential backoff (30s, 60s, 120s up to 3 attempts), reject duplicate jobs for same report_type + period
+ - _Requirements: 5.1, 5.2, 5.3, 6.3, 6.4, 6.5_
+
+ - [x] 6.2 Add API endpoints to `services/api/app.py`
+ - Add `GET /api/reports` — paginated list with query params: report_type, start_date, end_date, limit (default 20), offset (default 0); returns id, report_type, period_start, period_end, validation_status, generated_at
+ - Add `GET /api/reports/{report_id}` — full report including report_data JSONB
+ - Use asyncpg pool from existing app state
+ - Return 404 for non-existent report_id
+ - _Requirements: 5.4, 5.5, 5.6_
+
+ - [x] 6.3 Add frontend hooks to `frontend/src/api/hooks.ts`
+ - Add `ReportListItem` and `ReportDetail` TypeScript interfaces
+ - Implement `useReports(params?)` hook — builds query string from report_type, start_date, end_date, limit, offset; uses `useGet` with 'query' base
+ - Implement `useReport(id)` hook — fetches single report by id, enabled only when id is defined
+ - _Requirements: 5.4, 5.5_
+
+- [x] 7. Checkpoint — Verify generator and API
+ - Ensure all tests pass, ask the user if questions arise.
+ - Run `.venv/bin/ruff check services/`
+ - Run `.venv/bin/python -m pytest tests/ -x --tb=short -q -k "report"` to verify generator and API endpoints
+
+- [x] 8. Phase 4 — Scheduling, property-based tests, unit tests, and frontend tests
+ - [x] 8.1 Wire Redis queue integration and scheduler
+ - Add report generation job consumer to the scheduler service that listens on `stonks:queue:report_generation`
+ - Add daily report trigger (after 16:30 ET on trading days) and weekly report trigger (Saturday) to the scheduler
+ - Job payload: `{"report_type": "daily"|"weekly", "period_start": "YYYY-MM-DD", "period_end": "YYYY-MM-DD"}`
+ - _Requirements: 6.1, 6.2, 6.3, 6.4, 6.5_
+
+ - [x] 8.2 Write property test: Chunking Round-Trip and Size Constraint
+ - **Property 1: Chunking Round-Trip and Size Constraint**
+ - File: `tests/test_pbt_report_chunking.py`
+ - Use Hypothesis `@settings(max_examples=100)` with a single `@given(st.text(), st.integers(min_value=1, max_value=10000))` so the input text and `max_chars` are generated together
+ - Assert: every chunk ≤ max_chars, no empty chunks (except empty input → one empty chunk), concatenation of chunks == original input
+ - **Validates: Requirements 2.2**
+
+ - [x] 8.3 Write property test: Report Serialization Round-Trip
+ - **Property 2: Report Serialization Round-Trip**
+ - File: `tests/test_pbt_report_serialization.py`
+ - Use Hypothesis with custom strategies for ReportData (valid PLSection, RecommendationAccuracySection, etc.)
+ - Assert: `ReportData.model_validate_json(report.model_dump_json())` == original report
+ - Assert: all datetime fields in serialized JSON are ISO 8601 format
+ - **Validates: Requirements 8.1, 8.2, 8.3, 8.4**
+
+ - [x] 8.4 Write property test: Validation Discrepancy Detection Correctness
+ - **Property 3: Validation Discrepancy Detection Correctness**
+ - File: `tests/test_pbt_report_validation.py`
+ - Use Hypothesis with `@given(st.floats(min_value=0, max_value=1e6), st.floats(min_value=0, max_value=1e6))`
+ - Assert: warning iff |computed - snapshot| / snapshot * 100 > 5% (when snapshot > 0); flag any non-zero computed when snapshot == 0; no warning when both == 0
+ - **Validates: Requirements 4.1, 4.2, 4.3, 4.4**
+
+ - [x] 8.5 Write property test: Recommendation Accuracy Aggregation
+ - **Property 4: Recommendation Accuracy Aggregation**
+ - File: `tests/test_pbt_report_sections.py`
+ - Use Hypothesis with lists of trading decisions + prediction outcomes (direction_correct bool, profitable bool, excess_return_vs_spy float)
+ - Assert: win_rate == count(profitable) / total, directional_accuracy == count(direction_correct) / total, avg excess return == mean(excess_return_vs_spy), all rates in [0.0, 1.0]
+ - **Validates: Requirements 1.4**
+
+ - [x] 8.6 Write property test: Portfolio Period-Over-Period Delta Computation
+ - **Property 5: Portfolio Period-Over-Period Delta Computation**
+ - File: `tests/test_pbt_report_sections.py`
+ - Use Hypothesis with two portfolio snapshots (non-negative portfolio_value, active_pool, reserve_pool, finite cumulative_return)
+ - Assert: deltas == (current - previous) for each field; when no previous snapshot, deltas == 0
+ - **Validates: Requirements 1.3**
+
+ - [x] 8.7 Write unit tests for section builders
+ - File: `tests/test_report_sections.py`
+ - Test each section builder with known inputs and expected outputs
+ - Test edge cases: empty data (zero-activity), single position, no portfolio snapshot
+ - _Requirements: 3.1, 3.2, 3.3, 3.4, 3.5_
+
+ - [x] 8.8 Write unit tests for report validator
+ - File: `tests/test_report_validator.py`
+ - Test specific discrepancy scenarios: exactly 5% (no warning), 5.1% (warning), snapshot=0 computed≠0, both=0, NULL snapshot
+ - _Requirements: 4.1, 4.2, 4.3, 4.4_
+
+ - [x] 8.9 Write unit tests for AI summarizer
+ - File: `tests/test_report_summarizer.py`
+ - Test deterministic fallback summary generation
+ - Test chunk_data edge cases: empty input, single character, exactly at limit, one char over limit
+ - _Requirements: 2.2, 2.6_
+
+ - [x] 8.10 Write unit tests for report generator
+ - File: `tests/test_report_generator.py`
+ - Test orchestration with mocked dependencies (collector, sections, validator, summarizer)
+ - Test zero-activity report generation
+ - Test upsert behavior (regeneration of existing report)
+ - _Requirements: 5.1, 5.2, 5.3_
+
+ - [x] 8.11 Write API integration tests
+ - File: `tests/test_report_api.py`
+ - Test GET /api/reports with pagination, filtering by report_type and date range
+ - Test GET /api/reports/{report_id} with valid and invalid IDs
+ - _Requirements: 5.4, 5.5, 5.6_
+
+ - [x] 8.12 Write frontend hook tests
+ - File: `frontend/src/test/reports.test.ts`
+ - Test useReports and useReport hooks with MSW mocks
+ - Test loading and error states
+ - _Requirements: 5.4, 5.5_
+
+- [x] 9. Final checkpoint — Full test suite and lint
+ - Ensure all tests pass, ask the user if questions arise.
+ - Run `.venv/bin/ruff check services/`
+ - Run `.venv/bin/python -m pytest tests/ -x --tb=short -q -k "report"`
+ - Run frontend tests: `cd frontend && npx vitest --run`
+
+## Notes
+
+- Tasks marked with `*` are optional and can be skipped for a faster MVP
+- Each task references specific requirements for traceability
+- Checkpoints ensure incremental validation after each phase
+- Property tests validate the 5 universal correctness properties from the design document
+- Unit tests validate specific examples and edge cases
+- The design document contains full interface signatures — use those as the implementation guide
+- Always run `.venv/bin/ruff check services/` before committing Python changes
diff --git a/frontend/src/api/hooks.ts b/frontend/src/api/hooks.ts
index ccec86f..82cd150 100644
--- a/frontend/src/api/hooks.ts
+++ b/frontend/src/api/hooks.ts
@@ -1051,3 +1051,44 @@ export function useValidationAttributionLayers(lookback = '30d', horizon = '7d')
const path = `/api/validation/attribution/layers${qs.toString() ? '?' + qs : ''}`;
return useGet(['validation-attribution-layers', lookback, horizon], 'query', path);
}
+
+// ---------------------------------------------------------------------------
+// Trading Reports
+// ---------------------------------------------------------------------------
+
+export interface ReportListItem {
+ id: string;
+ report_type: string;
+ period_start: string;
+ period_end: string;
+ validation_status: string;
+ generated_at: string;
+}
+
+export interface ReportDetail extends ReportListItem {
+ report_data: Record<string, unknown>;
+ created_at: string;
+}
+
+export function useReports(params?: {
+ report_type?: string;
+ start_date?: string;
+ end_date?: string;
+ limit?: number;
+ offset?: number;
+}) {
+ const qs = new URLSearchParams();
+ if (params?.report_type) qs.set('report_type', params.report_type);
+ if (params?.start_date) qs.set('start_date', params.start_date);
+ if (params?.end_date) qs.set('end_date', params.end_date);
+ if (params?.limit) qs.set('limit', String(params.limit));
+ if (params?.offset) qs.set('offset', String(params.offset));
+ const path = `/api/reports${qs.toString() ? '?' + qs : ''}`;
+ return useGet<ReportListItem[]>(['reports', params], 'query', path);
+}
+
+export function useReport(id: string | undefined) {
+ return useGet<ReportDetail>(
+ ['report', id], 'query', `/api/reports/${id}`, !!id
+ );
+}
diff --git a/frontend/src/test/mocks/handlers.ts b/frontend/src/test/mocks/handlers.ts
index afb365c..4ab2d0d 100644
--- a/frontend/src/test/mocks/handlers.ts
+++ b/frontend/src/test/mocks/handlers.ts
@@ -334,6 +334,17 @@ export const handlers = [
return HttpResponse.json({ enabled: body.enabled, previous_enabled: true, toggled_by: 'operator' });
}),
+ // Trading Reports
+ http.get('/api/reports', () => HttpResponse.json([
+ { id: 'rpt-1', report_type: 'daily', period_start: '2025-01-15', period_end: '2025-01-15', validation_status: 'passed', generated_at: '2025-01-15T21:30:00Z' },
+ ])),
+ http.get('/api/reports/:id', ({ params }) => {
+ if (params.id === 'rpt-1') {
+ return HttpResponse.json({ id: 'rpt-1', report_type: 'daily', period_start: '2025-01-15', period_end: '2025-01-15', report_data: { pnl: { realized_pnl: 125.5 }, executive_summary: 'Test' }, validation_status: 'passed', generated_at: '2025-01-15T21:30:00Z', created_at: '2025-01-15T21:30:05Z' });
+ }
+ return new HttpResponse(null, { status: 404 });
+ }),
+
// Validation: Model Quality & Calibration endpoints
http.get('/api/validation/summary', () => HttpResponse.json(mockValidationSummary)),
http.get('/api/validation/calibration', () => HttpResponse.json(mockValidationCalibration)),
diff --git a/frontend/src/test/reports.test.ts b/frontend/src/test/reports.test.ts
new file mode 100644
index 0000000..47c13d5
--- /dev/null
+++ b/frontend/src/test/reports.test.ts
@@ -0,0 +1,155 @@
+/**
+ * Frontend hook tests for trading reports.
+ *
+ * Tests useReports and useReport hooks with MSW mocks.
+ * Requirements validated: 5.4, 5.5
+ */
+import { renderHook, waitFor } from '@testing-library/react';
+import { QueryClient, QueryClientProvider } from '@tanstack/react-query';
+import { http, HttpResponse } from 'msw';
+import { type ReactNode, createElement } from 'react';
+import { describe, expect, it } from 'vitest';
+import { useReports, useReport } from '../api/hooks';
+import { server } from './mocks/server';
+
+const mockReportList = [
+ {
+ id: 'rpt-1',
+ report_type: 'daily',
+ period_start: '2025-01-15',
+ period_end: '2025-01-15',
+ validation_status: 'passed',
+ generated_at: '2025-01-15T21:30:00Z',
+ },
+ {
+ id: 'rpt-2',
+ report_type: 'weekly',
+ period_start: '2025-01-13',
+ period_end: '2025-01-17',
+ validation_status: 'warnings',
+ generated_at: '2025-01-18T10:00:00Z',
+ },
+];
+
+const mockReportDetail = {
+ id: 'rpt-1',
+ report_type: 'daily',
+ period_start: '2025-01-15',
+ period_end: '2025-01-15',
+ validation_status: 'passed',
+ generated_at: '2025-01-15T21:30:00Z',
+ created_at: '2025-01-15T21:30:05Z',
+ report_data: {
+ pnl: { realized_pnl: 125.5, unrealized_pnl: -30.2 },
+ executive_summary: 'Test executive summary',
+ },
+};
+
+function createWrapper() {
+ const queryClient = new QueryClient({
+ defaultOptions: {
+ queries: { retry: false, gcTime: 0 },
+ },
+ });
+ return function Wrapper({ children }: { children: ReactNode }) {
+ return createElement(QueryClientProvider, { client: queryClient }, children);
+ };
+}
+
+describe('useReports', () => {
+ it('fetches report list with default params', async () => {
+ server.use(
+ http.get('/api/reports', () => HttpResponse.json(mockReportList)),
+ );
+
+ const { result } = renderHook(() => useReports(), {
+ wrapper: createWrapper(),
+ });
+
+ await waitFor(() => expect(result.current.isSuccess).toBe(true));
+
+ expect(result.current.data).toHaveLength(2);
+ expect(result.current.data![0].id).toBe('rpt-1');
+ expect(result.current.data![0].report_type).toBe('daily');
+ expect(result.current.data![1].report_type).toBe('weekly');
+ });
+
+ it('passes query params for filtering', async () => {
+ let capturedUrl = '';
+ server.use(
+ http.get('/api/reports', ({ request }) => {
+ capturedUrl = request.url;
+ return HttpResponse.json([mockReportList[0]]);
+ }),
+ );
+
+ const { result } = renderHook(
+ () => useReports({ report_type: 'daily', limit: 10 }),
+ { wrapper: createWrapper() },
+ );
+
+ await waitFor(() => expect(result.current.isSuccess).toBe(true));
+
+ expect(capturedUrl).toContain('report_type=daily');
+ expect(capturedUrl).toContain('limit=10');
+ expect(result.current.data).toHaveLength(1);
+ });
+
+ it('handles error state', async () => {
+ server.use(
+ http.get('/api/reports', () =>
+ new HttpResponse(null, { status: 500 }),
+ ),
+ );
+
+ const { result } = renderHook(() => useReports(), {
+ wrapper: createWrapper(),
+ });
+
+ await waitFor(() => expect(result.current.isError).toBe(true));
+ });
+});
+
+describe('useReport', () => {
+ it('fetches single report by id', async () => {
+ server.use(
+ http.get('/api/reports/rpt-1', () =>
+ HttpResponse.json(mockReportDetail),
+ ),
+ );
+
+ const { result } = renderHook(() => useReport('rpt-1'), {
+ wrapper: createWrapper(),
+ });
+
+ await waitFor(() => expect(result.current.isSuccess).toBe(true));
+
+ expect(result.current.data!.id).toBe('rpt-1');
+ expect(result.current.data!.report_data).toBeDefined();
+ expect(result.current.data!.report_data.pnl).toBeDefined();
+ expect(result.current.data!.created_at).toBe('2025-01-15T21:30:05Z');
+ });
+
+ it('does not fetch when id is undefined', async () => {
+ const { result } = renderHook(() => useReport(undefined), {
+ wrapper: createWrapper(),
+ });
+
+ // Should stay in idle/loading state without fetching
+ expect(result.current.isFetching).toBe(false);
+ });
+
+ it('handles 404 error', async () => {
+ server.use(
+ http.get('/api/reports/nonexistent', () =>
+ new HttpResponse(null, { status: 404 }),
+ ),
+ );
+
+ const { result } = renderHook(() => useReport('nonexistent'), {
+ wrapper: createWrapper(),
+ });
+
+ await waitFor(() => expect(result.current.isError).toBe(true));
+ });
+});
diff --git a/infra/migrations/038_trading_reports.sql b/infra/migrations/038_trading_reports.sql
new file mode 100644
index 0000000..df01aa8
--- /dev/null
+++ b/infra/migrations/038_trading_reports.sql
@@ -0,0 +1,50 @@
+-- Migration 038: Trading Reports
+-- Creates the trading_reports table for storing periodic performance reports
+-- and seeds the Report Summarizer AI agent for report section summarization.
+
+-- ============================================================================
+-- Table: trading_reports
+-- Stores daily and weekly trading performance reports as structured JSONB
+-- ============================================================================
+CREATE TABLE IF NOT EXISTS trading_reports (
+ id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+ report_type VARCHAR(20) NOT NULL,
+ period_start DATE NOT NULL,
+ period_end DATE NOT NULL,
+ report_data JSONB NOT NULL,
+ validation_status VARCHAR(20) NOT NULL DEFAULT 'passed',
+ generated_at TIMESTAMPTZ NOT NULL,
+ created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
+ CONSTRAINT uq_trading_reports_period UNIQUE (report_type, period_start, period_end),
+ CONSTRAINT chk_report_type CHECK (report_type IN ('daily', 'weekly'))
+);
+
+CREATE INDEX IF NOT EXISTS idx_trading_reports_type ON trading_reports(report_type);
+CREATE INDEX IF NOT EXISTS idx_trading_reports_period ON trading_reports(period_start, period_end);
+CREATE INDEX IF NOT EXISTS idx_trading_reports_generated ON trading_reports(generated_at DESC);
+
+-- ============================================================================
+-- Seed: Report Summarizer Agent
+-- Generates concise natural-language summaries of trading performance report
+-- sections. Uses chunked data within the 8k-token context window.
+-- Only inserted if the slug does not already exist (idempotent).
+-- ============================================================================
+INSERT INTO ai_agents (name, slug, purpose, model_provider, model_name, system_prompt, prompt_version, schema_version, temperature, max_tokens, timeout_seconds, max_retries, source)
+SELECT * FROM (VALUES
+ (
+ 'Report Summarizer',
+ 'report-summarizer',
+ 'Generates concise natural-language summaries of trading performance report sections. Processes chunked data within the 8k-token context window.',
+ 'ollama',
+ 'qwen3.5:9b-fast',
+ E'You are a concise financial performance analyst. You summarize trading performance data into clear, professional prose.\n\nSTRICT RULES:\n1. Do NOT fabricate any data not present in the input.\n2. Do NOT add opinions, predictions, or recommendations.\n3. Keep each summary under 200 words.\n4. Highlight notable trends, outliers, and changes from prior periods.\n5. Use precise numbers from the input data.\n6. Use a neutral, professional tone.\n7. Return ONLY the summary text. No JSON, no markdown, no commentary.',
+ 'report-summarizer-v1',
+ '1.0.0',
+ 0.0,
+ 1024,
+ 60,
+ 2,
+ 'system'
+ )
+) AS v(name, slug, purpose, model_provider, model_name, system_prompt, prompt_version, schema_version, temperature, max_tokens, timeout_seconds, max_retries, source)
+WHERE NOT EXISTS (SELECT 1 FROM ai_agents WHERE slug = 'report-summarizer');
diff --git a/services/api/app.py b/services/api/app.py
index a06cfce..8f7c757 100644
--- a/services/api/app.py
+++ b/services/api/app.py
@@ -4107,3 +4107,112 @@ async def get_validation_attribution_layers(
"lookback": lookback,
"horizon": horizon,
}
+
+
+# ---------------------------------------------------------------------------
+# Trading Reports
+# ---------------------------------------------------------------------------
+
+
+@app.get("/api/reports")
+async def list_reports(
+ report_type: Optional[str] = None,
+ start_date: Optional[str] = None,
+ end_date: Optional[str] = None,
+    limit: int = Query(default=20, ge=1, le=100),
+ offset: int = Query(default=0, ge=0),
+):
+ """Paginated list of trading reports with optional filtering.
+
+ Query params:
+ - report_type: 'daily' or 'weekly'
+ - start_date: ISO date (YYYY-MM-DD) — filter period_start >= this
+ - end_date: ISO date (YYYY-MM-DD) — filter period_end <= this
+ - limit: max results (default 20, max 100)
+ - offset: pagination offset (default 0)
+
+ Requirements: 5.4, 5.5, 5.6
+ """
+ conditions: list[str] = []
+ params: list[Any] = []
+ idx = 1
+
+ if report_type:
+ if report_type not in ("daily", "weekly"):
+ raise HTTPException(400, "report_type must be 'daily' or 'weekly'")
+ conditions.append(f"report_type = ${idx}")
+ params.append(report_type)
+ idx += 1
+
+ if start_date:
+ try:
+ from datetime import date as _date
+            start = _date.fromisoformat(start_date)
+        except ValueError:
+            raise HTTPException(400, "start_date must be YYYY-MM-DD")
+        conditions.append(f"period_start >= ${idx}::date")
+        params.append(start)  # asyncpg expects datetime.date for a ::date parameter
+ idx += 1
+
+ if end_date:
+ try:
+ from datetime import date as _date
+            end = _date.fromisoformat(end_date)
+        except ValueError:
+            raise HTTPException(400, "end_date must be YYYY-MM-DD")
+        conditions.append(f"period_end <= ${idx}::date")
+        params.append(end)  # asyncpg expects datetime.date for a ::date parameter
+ idx += 1
+
+ where = f"WHERE {' AND '.join(conditions)}" if conditions else ""
+
+ query = f"""
+ SELECT id, report_type, period_start, period_end,
+ validation_status, generated_at
+ FROM trading_reports
+ {where}
+ ORDER BY generated_at DESC
+ LIMIT ${idx} OFFSET ${idx + 1}
+ """
+ params.extend([limit, offset])
+
+ rows = await pool.fetch(query, *params)
+ return [
+ {
+ "id": str(r["id"]),
+ "report_type": r["report_type"],
+ "period_start": r["period_start"].isoformat(),
+ "period_end": r["period_end"].isoformat(),
+ "validation_status": r["validation_status"],
+ "generated_at": r["generated_at"].isoformat(),
+ }
+ for r in rows
+ ]
+
+
+@app.get("/api/reports/{report_id}")
+async def get_report(report_id: str):
+ """Fetch a single report including full report_data JSONB.
+
+ Requirements: 5.4, 5.5
+ """
+ row = await pool.fetchrow(
+ """SELECT id, report_type, period_start, period_end,
+ report_data, validation_status, generated_at, created_at
+ FROM trading_reports
+ WHERE id = $1::uuid""",
+ report_id,
+ )
+ if row is None:
+ raise HTTPException(404, "Report not found")
+
+ return {
+ "id": str(row["id"]),
+ "report_type": row["report_type"],
+ "period_start": row["period_start"].isoformat(),
+ "period_end": row["period_end"].isoformat(),
+ "report_data": json.loads(row["report_data"]) if isinstance(row["report_data"], str) else row["report_data"],
+ "validation_status": row["validation_status"],
+ "generated_at": row["generated_at"].isoformat(),
+ "created_at": row["created_at"].isoformat(),
+ }
diff --git a/services/reporting/__init__.py b/services/reporting/__init__.py
new file mode 100644
index 0000000..8b13789
--- /dev/null
+++ b/services/reporting/__init__.py
@@ -0,0 +1 @@
+
diff --git a/services/reporting/collector.py b/services/reporting/collector.py
new file mode 100644
index 0000000..91acd08
--- /dev/null
+++ b/services/reporting/collector.py
@@ -0,0 +1,306 @@
+"""Data collector for trading performance reports.
+
+Queries all relevant trading data for a reporting period and returns
+a CollectedData bundle for downstream section builders.
+"""
+
+from __future__ import annotations
+
+import logging
+import uuid
+from dataclasses import dataclass, field
+from datetime import date
+from typing import Any
+
+import asyncpg
+
+logger = logging.getLogger(__name__)
+
+
+@dataclass
+class CollectedData:
+ """Raw data collected for a reporting period."""
+
+ trading_decisions: list[dict] = field(default_factory=list)
+ orders: list[dict] = field(default_factory=list)
+ open_positions: list[dict] = field(default_factory=list)
+ closed_positions: list[dict] = field(default_factory=list)
+ portfolio_snapshot: dict | None = None
+ previous_portfolio_snapshot: dict | None = None
+ recommendations: list[dict] = field(default_factory=list)
+ prediction_outcomes: list[dict] = field(default_factory=list)
+ model_metric_snapshots: list[dict] = field(default_factory=list)
+ circuit_breaker_events: list[dict] = field(default_factory=list)
+ reserve_pool_balance: float = 0.0
+
+
+def _row_dict(row: asyncpg.Record) -> dict[str, Any]:
+ """Convert asyncpg Record to dict with UUID→str coercion."""
+ d = dict(row)
+ for k, v in d.items():
+ if isinstance(v, uuid.UUID):
+ d[k] = str(v)
+ return d
+
+
+async def collect_report_data(
+ pool: asyncpg.Pool,
+ period_start: date,
+ period_end: date,
+) -> CollectedData:
+ """Query all trading data for the reporting period.
+
+ Queries: trading_decisions, orders, positions, portfolio_snapshots,
+ recommendations, prediction_outcomes, model_metric_snapshots,
+ circuit_breaker_events, reserve_pool_ledger.
+
+ Returns CollectedData with all raw query results.
+    Tables with no rows for the period yield empty lists (or None for snapshots).
+ """
+ async with pool.acquire() as conn:
+ trading_decisions = await _fetch_trading_decisions(conn, period_start, period_end)
+ orders = await _fetch_orders(conn, period_start, period_end)
+ open_positions = await _fetch_open_positions(conn)
+ closed_positions = await _fetch_closed_positions(conn, period_start, period_end)
+ portfolio_snapshot = await _fetch_portfolio_snapshot(conn, period_start, period_end)
+ previous_portfolio_snapshot = await _fetch_previous_portfolio_snapshot(conn, period_start)
+ recommendations = await _fetch_recommendations(conn, period_start, period_end)
+ prediction_outcomes = await _fetch_prediction_outcomes(conn, period_start, period_end)
+ model_metric_snapshots = await _fetch_model_metric_snapshots(conn, period_start, period_end)
+ circuit_breaker_events = await _fetch_circuit_breaker_events(conn, period_start, period_end)
+ reserve_pool_balance = await _fetch_reserve_pool_balance(conn)
+
+ return CollectedData(
+ trading_decisions=trading_decisions,
+ orders=orders,
+ open_positions=open_positions,
+ closed_positions=closed_positions,
+ portfolio_snapshot=portfolio_snapshot,
+ previous_portfolio_snapshot=previous_portfolio_snapshot,
+ recommendations=recommendations,
+ prediction_outcomes=prediction_outcomes,
+ model_metric_snapshots=model_metric_snapshots,
+ circuit_breaker_events=circuit_breaker_events,
+ reserve_pool_balance=reserve_pool_balance,
+ )
+
+
+async def _fetch_trading_decisions(
+ conn: asyncpg.Connection,
+ period_start: date,
+ period_end: date,
+) -> list[dict]:
+ """Fetch trading decisions created within the period."""
+ rows = await conn.fetch(
+ """SELECT id, recommendation_id, decision, skip_reason, ticker,
+ computed_position_size, computed_share_quantity,
+ risk_tier_at_decision, portfolio_heat_at_decision,
+ active_pool_at_decision, reserve_pool_at_decision,
+ circuit_breaker_status, correlation_check_result,
+ sector_exposure_check_result, earnings_proximity_flag,
+ is_micro_trade, decision_trace, created_at
+ FROM trading_decisions
+ WHERE created_at >= $1::date AND created_at < ($2::date + INTERVAL '1 day')
+ ORDER BY created_at""",
+ period_start,
+ period_end,
+ )
+ return [_row_dict(r) for r in rows]
+
+
+async def _fetch_orders(
+ conn: asyncpg.Connection,
+ period_start: date,
+ period_end: date,
+) -> list[dict]:
+ """Fetch orders created within the period."""
+ rows = await conn.fetch(
+ """SELECT id, recommendation_id, broker_account_id, ticker, side,
+ order_type, quantity, limit_price, stop_price, status,
+ broker_order_id, fill_price, fill_quantity,
+ submitted_at, filled_at, cancelled_at, rejected_at,
+ rejection_reason, created_at
+ FROM orders
+ WHERE created_at >= $1::date AND created_at < ($2::date + INTERVAL '1 day')
+ ORDER BY created_at""",
+ period_start,
+ period_end,
+ )
+ return [_row_dict(r) for r in rows]
+
+
+async def _fetch_open_positions(conn: asyncpg.Connection) -> list[dict]:
+ """Fetch currently open positions (quantity > 0)."""
+ rows = await conn.fetch(
+ """SELECT id, broker_account_id, ticker, quantity,
+ avg_entry_price, current_price,
+ unrealized_pnl, realized_pnl, updated_at
+ FROM positions
+ WHERE quantity > 0
+ ORDER BY ticker""",
+ )
+ return [_row_dict(r) for r in rows]
+
+
+async def _fetch_closed_positions(
+ conn: asyncpg.Connection,
+ period_start: date,
+ period_end: date,
+) -> list[dict]:
+ """Fetch positions closed during the period (quantity = 0, updated within period)."""
+ rows = await conn.fetch(
+ """SELECT id, broker_account_id, ticker, quantity,
+ avg_entry_price, current_price,
+ unrealized_pnl, realized_pnl, updated_at
+ FROM positions
+ WHERE quantity = 0
+ AND updated_at >= $1::date
+ AND updated_at < ($2::date + INTERVAL '1 day')
+ ORDER BY updated_at""",
+ period_start,
+ period_end,
+ )
+ return [_row_dict(r) for r in rows]
+
+
+async def _fetch_portfolio_snapshot(
+ conn: asyncpg.Connection,
+ period_start: date,
+ period_end: date,
+) -> dict | None:
+ """Fetch the most recent portfolio snapshot within the period."""
+ row = await conn.fetchrow(
+ """SELECT id, snapshot_date, portfolio_value, active_pool, reserve_pool,
+ daily_return, cumulative_return, unrealized_pnl, realized_pnl,
+ win_count, loss_count, win_rate, sharpe_ratio,
+ max_drawdown, current_drawdown_pct, portfolio_heat,
+ risk_tier, positions, metrics, created_at
+ FROM portfolio_snapshots
+ WHERE snapshot_date >= $1 AND snapshot_date <= $2
+ ORDER BY snapshot_date DESC
+ LIMIT 1""",
+ period_start,
+ period_end,
+ )
+ return _row_dict(row) if row else None
+
+
+async def _fetch_previous_portfolio_snapshot(
+ conn: asyncpg.Connection,
+ period_start: date,
+) -> dict | None:
+ """Fetch the most recent portfolio snapshot before the period start."""
+ row = await conn.fetchrow(
+ """SELECT id, snapshot_date, portfolio_value, active_pool, reserve_pool,
+ daily_return, cumulative_return, unrealized_pnl, realized_pnl,
+ win_count, loss_count, win_rate, sharpe_ratio,
+ max_drawdown, current_drawdown_pct, portfolio_heat,
+ risk_tier, positions, metrics, created_at
+ FROM portfolio_snapshots
+ WHERE snapshot_date < $1
+ ORDER BY snapshot_date DESC
+ LIMIT 1""",
+ period_start,
+ )
+ return _row_dict(row) if row else None
+
+
+async def _fetch_recommendations(
+ conn: asyncpg.Connection,
+ period_start: date,
+ period_end: date,
+) -> list[dict]:
+ """Fetch recommendations created within the period."""
+ rows = await conn.fetch(
+ """SELECT id, ticker, company_id, action, mode, confidence,
+ time_horizon, thesis, portfolio_pct, max_loss_pct,
+ model_version, generated_at, created_at
+ FROM recommendations
+ WHERE created_at >= $1::date AND created_at < ($2::date + INTERVAL '1 day')
+ ORDER BY created_at""",
+ period_start,
+ period_end,
+ )
+ return [_row_dict(r) for r in rows]
+
+
+async def _fetch_prediction_outcomes(
+ conn: asyncpg.Connection,
+ period_start: date,
+ period_end: date,
+) -> list[dict]:
+ """Fetch prediction outcomes evaluated within the period."""
+ rows = await conn.fetch(
+ """SELECT po.id, po.prediction_id, po.evaluated_at, po.horizon,
+ po.future_price, po.future_return,
+ po.spy_future_price, po.spy_return,
+ po.sector_etf_future_price, po.sector_etf_return,
+ po.excess_return_vs_spy, po.excess_return_vs_sector,
+ po.direction_correct, po.profitable,
+ ps.ticker, ps.direction, ps.action, ps.confidence
+ FROM prediction_outcomes po
+ JOIN prediction_snapshots ps ON ps.id = po.prediction_id
+ WHERE po.evaluated_at >= $1::date
+ AND po.evaluated_at < ($2::date + INTERVAL '1 day')
+ ORDER BY po.evaluated_at""",
+ period_start,
+ period_end,
+ )
+ return [_row_dict(r) for r in rows]
+
+
+async def _fetch_model_metric_snapshots(
+ conn: asyncpg.Connection,
+ period_start: date,
+ period_end: date,
+) -> list[dict]:
+ """Fetch model metric snapshots generated within the period."""
+ rows = await conn.fetch(
+ """SELECT id, generated_at, lookback_window, horizon,
+ prediction_count, win_rate, directional_accuracy,
+ information_coefficient, rank_information_coefficient,
+ avg_return, avg_excess_return_vs_spy,
+ avg_excess_return_vs_sector,
+ calibration_error, brier_score,
+ buy_win_rate, sell_win_rate, hold_win_rate,
+ created_at
+ FROM model_metric_snapshots
+ WHERE generated_at >= $1::date
+ AND generated_at < ($2::date + INTERVAL '1 day')
+ ORDER BY generated_at DESC""",
+ period_start,
+ period_end,
+ )
+ return [_row_dict(r) for r in rows]
+
+
+async def _fetch_circuit_breaker_events(
+ conn: asyncpg.Connection,
+ period_start: date,
+ period_end: date,
+) -> list[dict]:
+ """Fetch circuit breaker events from trading decisions within the period.
+
+ Circuit breaker events are trading decisions where
+ circuit_breaker_status is not 'clear' (i.e. a breaker was active).
+ """
+ rows = await conn.fetch(
+ """SELECT id, recommendation_id, decision, ticker,
+ circuit_breaker_status, decision_trace, created_at
+ FROM trading_decisions
+ WHERE circuit_breaker_status != 'clear'
+ AND created_at >= $1::date
+ AND created_at < ($2::date + INTERVAL '1 day')
+ ORDER BY created_at""",
+ period_start,
+ period_end,
+ )
+ return [_row_dict(r) for r in rows]
+
+
+async def _fetch_reserve_pool_balance(conn: asyncpg.Connection) -> float:
+ """Fetch the latest reserve pool balance."""
+ row = await conn.fetchrow(
+ "SELECT balance_after FROM reserve_pool_ledger ORDER BY created_at DESC LIMIT 1",
+ )
+ return float(row["balance_after"]) if row else 0.0
diff --git a/services/reporting/generator.py b/services/reporting/generator.py
new file mode 100644
index 0000000..3d65443
--- /dev/null
+++ b/services/reporting/generator.py
@@ -0,0 +1,279 @@
+"""Report generator — orchestrates collection, building, validation, summarization, and storage.
+
+Provides three public functions:
+- generate_report: full pipeline from data collection to assembled ReportData
+- store_report: upsert into trading_reports table
+- process_report_job: Redis queue job handler with retry and dedup
+
+Requirements: 5.1, 5.2, 5.3, 6.3, 6.4, 6.5
+Design: Report Generator
+"""
+
+from __future__ import annotations
+
+import asyncio
+import logging
+from datetime import date, datetime, timezone
+
+import asyncpg
+
+from services.reporting.collector import collect_report_data
+from services.reporting.models import ReportData, ReportType
+from services.reporting.sections import (
+ build_model_quality_section,
+ build_pnl_section,
+ build_position_performance_section,
+ build_recommendation_accuracy_section,
+ build_risk_metrics_section,
+)
+from services.reporting.summarizer import (
+ generate_executive_summary,
+ summarize_section,
+)
+from services.reporting.validator import (
+ compute_validation_status,
+ validate_model_quality,
+ validate_recommendation_accuracy,
+)
+from services.shared.agent_config import AgentConfigResolver
+
+logger = logging.getLogger(__name__)
+
+# ---------------------------------------------------------------------------
+# Retry configuration for process_report_job
+# ---------------------------------------------------------------------------
+
+_MAX_RETRIES = 3
+_BACKOFF_SECONDS = (30, 60, 120)
+
+# In-memory set of in-progress job keys, used to reject duplicates within this
+# process only. Key format: "{report_type}:{period_start}:{period_end}"
+_in_progress_jobs: set[str] = set()
+
+
+# ---------------------------------------------------------------------------
+# generate_report
+# ---------------------------------------------------------------------------
+
+
+async def generate_report(
+ pool: asyncpg.Pool,
+ report_type: ReportType,
+ period_start: date,
+ period_end: date,
+) -> ReportData:
+ """Orchestrate full report generation.
+
+ 1. Collect data via collector
+ 2. Build all 5 sections via section builders
+ 3. Validate recommendation_accuracy and model_quality via validator
+ 4. Create AgentConfigResolver and summarize each section
+ 5. Generate executive summary
+ 6. Assemble final ReportData
+ """
+ # 1. Collect data
+ data = await collect_report_data(pool, period_start, period_end)
+
+ # 2. Build sections
+ pnl = build_pnl_section(data)
+ rec_accuracy = build_recommendation_accuracy_section(data)
+ position_perf = build_position_performance_section(data)
+ risk_metrics = build_risk_metrics_section(data)
+ model_quality = build_model_quality_section(data)
+
+ # 3. Validate
+ rec_warnings = validate_recommendation_accuracy(
+ rec_accuracy, data.prediction_outcomes,
+ )
+ rec_accuracy.validation_warnings = rec_warnings
+
+ mq_warnings = validate_model_quality(
+ model_quality, data.model_metric_snapshots,
+ )
+ model_quality.validation_warnings = mq_warnings
+
+ # 4. Summarize each section
+ resolver = AgentConfigResolver(pool)
+
+ pnl.summary = await summarize_section(
+ pool, resolver, "pnl", pnl.model_dump(),
+ )
+ rec_accuracy.summary = await summarize_section(
+ pool, resolver, "recommendation_accuracy", rec_accuracy.model_dump(),
+ )
+ position_perf.summary = await summarize_section(
+ pool, resolver, "position_performance", position_perf.model_dump(),
+ )
+ risk_metrics.summary = await summarize_section(
+ pool, resolver, "risk_metrics", risk_metrics.model_dump(),
+ )
+ model_quality.summary = await summarize_section(
+ pool, resolver, "model_quality", model_quality.model_dump(),
+ )
+
+ # 5. Generate executive summary
+ section_summaries = {
+ "pnl": pnl.summary,
+ "recommendation_accuracy": rec_accuracy.summary,
+ "position_performance": position_perf.summary,
+ "risk_metrics": risk_metrics.summary,
+ "model_quality": model_quality.summary,
+ }
+ executive_summary = await generate_executive_summary(
+ pool, resolver, section_summaries,
+ )
+
+ # 6. Assemble ReportData
+ report = ReportData(
+ pnl=pnl,
+ recommendation_accuracy=rec_accuracy,
+ position_performance=position_perf,
+ risk_metrics=risk_metrics,
+ model_quality=model_quality,
+ executive_summary=executive_summary,
+ generated_at=datetime.now(timezone.utc),
+ period_start=period_start,
+ period_end=period_end,
+ report_type=ReportType(report_type),
+ )
+
+ # Set validation status based on all warnings
+ report.validation_status = compute_validation_status(report)
+
+ return report
+
+
+# ---------------------------------------------------------------------------
+# store_report
+# ---------------------------------------------------------------------------
+
+_UPSERT_SQL = """\
+INSERT INTO trading_reports
+ (report_type, period_start, period_end, report_data, validation_status, generated_at)
+VALUES
+ ($1, $2, $3, $4::jsonb, $5, $6)
+ON CONFLICT (report_type, period_start, period_end)
+DO UPDATE SET
+ report_data = EXCLUDED.report_data,
+ validation_status = EXCLUDED.validation_status,
+ generated_at = EXCLUDED.generated_at
+RETURNING id
+"""
+
+
+async def store_report(
+ pool: asyncpg.Pool,
+ report: ReportData,
+) -> str:
+ """Store report in trading_reports table via upsert.
+
+ Uses INSERT ... ON CONFLICT (report_type, period_start, period_end)
+ DO UPDATE to handle regeneration of existing reports.
+
+ Returns the report UUID as a string.
+ """
+ row = await pool.fetchrow(
+ _UPSERT_SQL,
+ report.report_type.value,
+ report.period_start,
+ report.period_end,
+ report.model_dump_json(),
+ report.validation_status.value,
+ report.generated_at,
+ )
+ report_id = str(row["id"]) # type: ignore[index]
+ logger.info(
+ "Stored report %s (type=%s, period=%s to %s)",
+ report_id,
+ report.report_type.value,
+ report.period_start,
+ report.period_end,
+ )
+ return report_id
+
+
+# ---------------------------------------------------------------------------
+# process_report_job
+# ---------------------------------------------------------------------------
+
+
+def _job_key(report_type: str, period_start: str, period_end: str) -> str:
+ """Build a dedup key for an in-progress job."""
+ return f"{report_type}:{period_start}:{period_end}"
+
+
+async def process_report_job(
+ pool: asyncpg.Pool,
+ job: dict,
+) -> None:
+ """Process a report generation job from the Redis queue.
+
+ Deserializes job payload, calls generate_report + store_report.
+    Makes up to 3 attempts, sleeping 30s and then 60s between them (the 120s entry is unused at 3 attempts).
+ Rejects duplicate jobs for the same report_type + period.
+
+ Expected job payload::
+
+ {
+ "report_type": "daily" | "weekly",
+ "period_start": "YYYY-MM-DD",
+ "period_end": "YYYY-MM-DD"
+ }
+ """
+ report_type_str = job.get("report_type", "")
+ period_start_str = job.get("period_start", "")
+ period_end_str = job.get("period_end", "")
+
+ # Validate payload
+ try:
+ report_type = ReportType(report_type_str)
+ period_start = date.fromisoformat(period_start_str)
+ period_end = date.fromisoformat(period_end_str)
+ except (ValueError, TypeError) as exc:
+ logger.error("Invalid report job payload: %s — %s", job, exc)
+ return
+
+ # Reject duplicate in-progress jobs
+ key = _job_key(report_type_str, period_start_str, period_end_str)
+ if key in _in_progress_jobs:
+ logger.warning(
+ "Duplicate report job rejected (already in progress): %s", key,
+ )
+ return
+
+ _in_progress_jobs.add(key)
+ try:
+ last_error: Exception | None = None
+ for attempt in range(_MAX_RETRIES):
+ try:
+ report = await generate_report(
+ pool, report_type, period_start, period_end,
+ )
+ await store_report(pool, report)
+ logger.info(
+ "Report job completed: %s (attempt %d)", key, attempt + 1,
+ )
+ return
+ except Exception as exc:
+ last_error = exc
+ if attempt < _MAX_RETRIES - 1:
+ backoff = _BACKOFF_SECONDS[attempt]
+ logger.warning(
+ "Report job %s failed (attempt %d/%d): %s — retrying in %ds",
+ key,
+ attempt + 1,
+ _MAX_RETRIES,
+ exc,
+ backoff,
+ )
+ await asyncio.sleep(backoff)
+
+ # All retries exhausted
+ logger.error(
+ "Report job %s failed after %d attempts: %s",
+ key,
+ _MAX_RETRIES,
+ last_error,
+ )
+ finally:
+ _in_progress_jobs.discard(key)
diff --git a/services/reporting/models.py b/services/reporting/models.py
new file mode 100644
index 0000000..3ec0464
--- /dev/null
+++ b/services/reporting/models.py
@@ -0,0 +1,104 @@
+from __future__ import annotations
+
+from datetime import date, datetime
+from enum import Enum
+
+from pydantic import BaseModel, Field
+
+
+class ReportType(str, Enum):
+ DAILY = "daily"
+ WEEKLY = "weekly"
+
+
+class ValidationStatus(str, Enum):
+ PASSED = "passed"
+ WARNINGS = "warnings"
+
+
+class ValidationWarning(BaseModel):
+ field_name: str
+ computed_value: float
+ snapshot_value: float
+ pct_difference: float
+
+
+class PLSection(BaseModel):
+ realized_pnl: float
+ unrealized_pnl: float
+ daily_return: float
+ cumulative_return: float
+ win_count: int
+ loss_count: int
+ win_rate: float
+ profit_factor: float
+ sharpe_ratio: float
+ summary: str = ""
+ validation_warnings: list[ValidationWarning] = Field(default_factory=list)
+
+
+class RecommendationAccuracySection(BaseModel):
+ total_evaluated: int
+ act_count: int
+ skip_count: int
+ acted_win_rate: float
+ avg_confidence_acted: float
+ avg_confidence_skipped: float
+ summary: str = ""
+ validation_warnings: list[ValidationWarning] = Field(default_factory=list)
+
+
+class PositionDetail(BaseModel):
+ ticker: str
+ entry_price: float
+ current_or_exit_price: float
+ pnl: float
+ pnl_pct: float
+ hold_duration_hours: float
+ status: str # "open" or "closed"
+
+
+class PositionPerformanceSection(BaseModel):
+ positions: list[PositionDetail] = Field(default_factory=list)
+ summary: str = ""
+
+
+class RiskMetricsSection(BaseModel):
+ current_risk_tier: str
+ portfolio_heat: float
+ max_drawdown: float
+ current_drawdown_pct: float
+ reserve_pool_balance: float
+ circuit_breaker_event_count: int
+ summary: str = ""
+
+
+class ModelQualityWindow(BaseModel):
+ lookback: str
+ win_rate: float | None
+ directional_accuracy: float | None
+ information_coefficient: float | None
+ calibration_error: float | None
+ brier_score: float | None
+
+
+class ModelQualitySection(BaseModel):
+ windows: list[ModelQualityWindow] = Field(default_factory=list)
+ summary: str = ""
+ validation_warnings: list[ValidationWarning] = Field(default_factory=list)
+
+
+class ReportData(BaseModel):
+ """Top-level report structure stored as JSONB."""
+
+ pnl: PLSection
+ recommendation_accuracy: RecommendationAccuracySection
+ position_performance: PositionPerformanceSection
+ risk_metrics: RiskMetricsSection
+ model_quality: ModelQualitySection
+ executive_summary: str = ""
+ validation_status: ValidationStatus = ValidationStatus.PASSED
+ generated_at: datetime
+ period_start: date
+ period_end: date
+ report_type: ReportType
diff --git a/services/reporting/sections.py b/services/reporting/sections.py
new file mode 100644
index 0000000..2dd6c56
--- /dev/null
+++ b/services/reporting/sections.py
@@ -0,0 +1,370 @@
+"""Section builders for trading performance reports.
+
+Each builder takes a CollectedData bundle and returns a typed Pydantic
+section model. All builders handle zero-activity gracefully by returning
+zero values and empty lists when no data is available.
+"""
+
+from __future__ import annotations
+
+import logging
+from datetime import datetime, timezone
+
+from services.reporting.collector import CollectedData
+from services.reporting.models import (
+ ModelQualitySection,
+ ModelQualityWindow,
+ PLSection,
+ PositionDetail,
+ PositionPerformanceSection,
+ RecommendationAccuracySection,
+ RiskMetricsSection,
+)
+
+logger = logging.getLogger(__name__)
+
+
+def build_pnl_section(data: CollectedData) -> PLSection:
+ """Build P&L section from collected data.
+
+    Reads realized/unrealized P&L, daily return, cumulative return,
+    win/loss counts, win rate, and Sharpe ratio from portfolio_snapshot;
+    computes profit factor from closed positions.
+ """
+ snap = data.portfolio_snapshot
+
+ if snap is None:
+ return PLSection(
+ realized_pnl=0.0,
+ unrealized_pnl=0.0,
+ daily_return=0.0,
+ cumulative_return=0.0,
+ win_count=0,
+ loss_count=0,
+ win_rate=0.0,
+ profit_factor=0.0,
+ sharpe_ratio=0.0,
+ )
+
+    # Compute profit factor from closed positions:
+    # sum of gains / sum of absolute losses (reported as 0.0 when there are no losses)
+ gains = 0.0
+ losses = 0.0
+ for pos in data.closed_positions:
+ rpnl = float(pos.get("realized_pnl", 0) or 0)
+ if rpnl > 0:
+ gains += rpnl
+ elif rpnl < 0:
+ losses += abs(rpnl)
+
+ profit_factor = (gains / losses) if losses > 0 else 0.0
+
+ return PLSection(
+ realized_pnl=float(snap.get("realized_pnl", 0) or 0),
+ unrealized_pnl=float(snap.get("unrealized_pnl", 0) or 0),
+ daily_return=float(snap.get("daily_return", 0) or 0),
+ cumulative_return=float(snap.get("cumulative_return", 0) or 0),
+ win_count=int(snap.get("win_count", 0) or 0),
+ loss_count=int(snap.get("loss_count", 0) or 0),
+ win_rate=float(snap.get("win_rate", 0) or 0),
+ profit_factor=profit_factor,
+ sharpe_ratio=float(snap.get("sharpe_ratio", 0) or 0),
+ )
+
+
+def build_recommendation_accuracy_section(
+ data: CollectedData,
+) -> RecommendationAccuracySection:
+ """Build recommendation accuracy section.
+
+    Matches trading_decisions to recommendations (by recommendation_id) and
+    to prediction_outcomes (by ticker) to compute the act/skip breakdown,
+    the win rate of acted recommendations, and average confidence per group.
+ """
+ if not data.trading_decisions:
+ return RecommendationAccuracySection(
+ total_evaluated=0,
+ act_count=0,
+ skip_count=0,
+ acted_win_rate=0.0,
+ avg_confidence_acted=0.0,
+ avg_confidence_skipped=0.0,
+ )
+
+    # Matching strategy
+    # -----------------
+    # The collector joins prediction_outcomes with prediction_snapshots
+    # (po.prediction_id = ps.id), so each outcome carries ticker, direction,
+    # action, and confidence from its snapshot.
+    #
+    # trading_decisions reference recommendations via recommendation_id, but
+    # the collected rows carry no direct key from a recommendation to a
+    # prediction outcome. Confidence is therefore looked up by
+    # recommendation_id, and profitability is looked up by ticker.
+
+ # Build recommendation_id -> recommendation dict for confidence lookup
+ rec_by_id: dict[str, dict] = {}
+ for rec in data.recommendations:
+ rec_id = str(rec.get("id", ""))
+ if rec_id:
+ rec_by_id[rec_id] = rec
+
+    # Build ticker -> list of prediction_outcomes for profitability lookup.
+    # Confidence follows trading_decision.recommendation_id ->
+    # recommendation.id; profitability is looked up with the trading
+    # decision's ticker against the outcome ticker (which comes from
+    # prediction_snapshots). Outcomes keep the collector's order.
+ outcome_by_ticker: dict[str, list[dict]] = {}
+ for po in data.prediction_outcomes:
+ ticker = po.get("ticker", "")
+ if ticker:
+ outcome_by_ticker.setdefault(ticker, []).append(po)
+
+ act_count = 0
+ skip_count = 0
+ acted_wins = 0
+ acted_total_with_outcome = 0
+ confidence_acted: list[float] = []
+ confidence_skipped: list[float] = []
+
+ for td in data.trading_decisions:
+ decision = str(td.get("decision", "")).lower()
+ rec_id = str(td.get("recommendation_id", ""))
+ rec = rec_by_id.get(rec_id, {})
+ conf = rec.get("confidence")
+ ticker = td.get("ticker", "")
+
+ if decision == "act":
+ act_count += 1
+ if conf is not None:
+ confidence_acted.append(float(conf))
+
+ # Check profitability from prediction_outcomes for this ticker
+ ticker_outcomes = outcome_by_ticker.get(ticker, [])
+ if ticker_outcomes:
+                # Last outcome in collector order (most recent if sorted ascending)
+ latest = ticker_outcomes[-1]
+ acted_total_with_outcome += 1
+ if latest.get("profitable"):
+ acted_wins += 1
+ else:
+ skip_count += 1
+ if conf is not None:
+ confidence_skipped.append(float(conf))
+
+ total_evaluated = act_count + skip_count
+ acted_win_rate = (
+ (acted_wins / acted_total_with_outcome)
+ if acted_total_with_outcome > 0
+ else 0.0
+ )
+ avg_confidence_acted = (
+ (sum(confidence_acted) / len(confidence_acted))
+ if confidence_acted
+ else 0.0
+ )
+ avg_confidence_skipped = (
+ (sum(confidence_skipped) / len(confidence_skipped))
+ if confidence_skipped
+ else 0.0
+ )
+
+ return RecommendationAccuracySection(
+ total_evaluated=total_evaluated,
+ act_count=act_count,
+ skip_count=skip_count,
+ acted_win_rate=acted_win_rate,
+ avg_confidence_acted=avg_confidence_acted,
+ avg_confidence_skipped=avg_confidence_skipped,
+ )
+
+
+def build_position_performance_section(
+ data: CollectedData,
+) -> PositionPerformanceSection:
+ """Build position performance section.
+
+ Lists each position (open and closed) with entry price,
+ current/exit price, P&L, P&L%, and hold duration.
+ """
+ positions: list[PositionDetail] = []
+ now = datetime.now(timezone.utc)
+
+ # Open positions
+ for pos in data.open_positions:
+ entry_price = float(pos.get("avg_entry_price", 0) or 0)
+ current_price = float(pos.get("current_price", 0) or 0)
+ quantity = float(pos.get("quantity", 0) or 0)
+
+ pnl = (current_price - entry_price) * quantity
+ cost_basis = entry_price * quantity
+ pnl_pct = (pnl / cost_basis * 100) if cost_basis > 0 else 0.0
+
+ # Hold duration from updated_at to now
+ updated_at = pos.get("updated_at")
+ hold_hours = _compute_hold_hours(updated_at, now)
+
+ positions.append(
+ PositionDetail(
+ ticker=pos.get("ticker", ""),
+ entry_price=entry_price,
+ current_or_exit_price=current_price,
+ pnl=pnl,
+ pnl_pct=pnl_pct,
+ hold_duration_hours=hold_hours,
+ status="open",
+ )
+ )
+
+ # Closed positions
+ for pos in data.closed_positions:
+ entry_price = float(pos.get("avg_entry_price", 0) or 0)
+ current_price = float(pos.get("current_price", 0) or 0)
+ realized_pnl = float(pos.get("realized_pnl", 0) or 0)
+
+        # Closed positions have quantity 0 in the DB, so the original cost
+        # basis cannot be computed from the row (entry_price * quantity
+        # would be 0). Instead, estimate the original quantity as
+        # |realized_pnl / (exit - entry)| and derive P&L% from the estimated
+        # cost. This assumes current_price holds the exit price.
+        #
+        # Algebraically the estimate reduces to
+        #   sign(realized_pnl) * |exit - entry| / entry * 100,
+        # so only the sign of realized_pnl affects the result. When the exit
+        # price equals the entry price, or the entry price is missing, no
+        # estimate is possible and P&L% is reported as 0.
+        price_diff = current_price - entry_price
+        if entry_price > 0 and price_diff != 0:
+            est_quantity = abs(realized_pnl / price_diff)
+            est_cost = entry_price * est_quantity
+            pnl_pct = (realized_pnl / est_cost * 100) if est_cost > 0 else 0.0
+        else:
+            pnl_pct = 0.0
+
+ updated_at = pos.get("updated_at")
+ hold_hours = _compute_hold_hours(updated_at, now)
+
+ positions.append(
+ PositionDetail(
+ ticker=pos.get("ticker", ""),
+ entry_price=entry_price,
+ current_or_exit_price=current_price,
+ pnl=realized_pnl,
+ pnl_pct=pnl_pct,
+ hold_duration_hours=hold_hours,
+ status="closed",
+ )
+ )
+
+ return PositionPerformanceSection(positions=positions)
+
+
+def _compute_hold_hours(updated_at: datetime | str | None, now: datetime) -> float:
+ """Compute hold duration in hours from updated_at to now."""
+ if updated_at is None:
+ return 0.0
+ if isinstance(updated_at, str):
+ try:
+ updated_at = datetime.fromisoformat(updated_at)
+ except (ValueError, TypeError):
+ return 0.0
+ if not isinstance(updated_at, datetime):
+ return 0.0
+ # Ensure timezone-aware comparison
+ if updated_at.tzinfo is None:
+ updated_at = updated_at.replace(tzinfo=timezone.utc)
+ delta = now - updated_at
+ return max(delta.total_seconds() / 3600.0, 0.0)
+
+
+def build_risk_metrics_section(data: CollectedData) -> RiskMetricsSection:
+ """Build risk metrics section.
+
+ Extracts current risk tier, portfolio heat, max drawdown,
+ current drawdown %, reserve pool balance, and circuit breaker
+ event count from collected data.
+ """
+ snap = data.portfolio_snapshot
+
+ if snap is None:
+ return RiskMetricsSection(
+ current_risk_tier="unknown",
+ portfolio_heat=0.0,
+ max_drawdown=0.0,
+ current_drawdown_pct=0.0,
+ reserve_pool_balance=data.reserve_pool_balance,
+ circuit_breaker_event_count=len(data.circuit_breaker_events),
+ )
+
+ return RiskMetricsSection(
+ current_risk_tier=str(snap.get("risk_tier", "unknown") or "unknown"),
+ portfolio_heat=float(snap.get("portfolio_heat", 0) or 0),
+ max_drawdown=float(snap.get("max_drawdown", 0) or 0),
+ current_drawdown_pct=float(snap.get("current_drawdown_pct", 0) or 0),
+ reserve_pool_balance=data.reserve_pool_balance,
+ circuit_breaker_event_count=len(data.circuit_breaker_events),
+ )
+
+
+def build_model_quality_section(data: CollectedData) -> ModelQualitySection:
+ """Build model quality section.
+
+ Extracts latest model_metric_snapshot values for 7d, 30d, 90d
+ lookback windows.
+ """
+ if not data.model_metric_snapshots:
+ return ModelQualitySection(windows=[])
+
+ # Group by lookback_window, take the latest (first in list since
+ # collector orders by generated_at DESC)
+ target_windows = {"7d", "30d", "90d"}
+ latest_by_window: dict[str, dict] = {}
+
+ for snap in data.model_metric_snapshots:
+ window = snap.get("lookback_window", "")
+ if window in target_windows and window not in latest_by_window:
+ latest_by_window[window] = snap
+
+ windows: list[ModelQualityWindow] = []
+ for w in ("7d", "30d", "90d"):
+ snap = latest_by_window.get(w)
+ if snap is None:
+ windows.append(
+ ModelQualityWindow(
+ lookback=w,
+ win_rate=None,
+ directional_accuracy=None,
+ information_coefficient=None,
+ calibration_error=None,
+ brier_score=None,
+ )
+ )
+ else:
+ windows.append(
+ ModelQualityWindow(
+ lookback=w,
+ win_rate=_safe_float(snap.get("win_rate")),
+ directional_accuracy=_safe_float(snap.get("directional_accuracy")),
+ information_coefficient=_safe_float(
+ snap.get("information_coefficient")
+ ),
+ calibration_error=_safe_float(snap.get("calibration_error")),
+ brier_score=_safe_float(snap.get("brier_score")),
+ )
+ )
+
+ return ModelQualitySection(windows=windows)
+
+
+def _safe_float(value: object) -> float | None:
+ """Convert a value to float, returning None for None/invalid values."""
+ if value is None:
+ return None
+ try:
+ f = float(value) # type: ignore[arg-type]
+ # Replace NaN/inf with None
+ if f != f or f == float("inf") or f == float("-inf"):
+ return None
+ return f
+ except (ValueError, TypeError):
+ return None
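Reviewer note on the closed-position branch of `build_position_performance_section`: the P&L% there is derived from an estimated quantity, which is easy to misread. The following is a minimal sketch of that arithmetic in isolation, assuming `current_price` holds the exit price. The helper name `estimate_closed_pnl_pct` is hypothetical and does not appear in the diff.

```python
def estimate_closed_pnl_pct(
    entry_price: float, exit_price: float, realized_pnl: float
) -> float:
    """Hypothetical helper mirroring the closed-position P&L% estimate."""
    price_diff = exit_price - entry_price
    if entry_price <= 0 or price_diff == 0:
        # No usable entry price or no price movement: nothing to estimate.
        return 0.0
    # Recover the original quantity from the realized P&L per share.
    est_quantity = abs(realized_pnl / price_diff)
    est_cost = entry_price * est_quantity
    return (realized_pnl / est_cost * 100) if est_cost > 0 else 0.0


if __name__ == "__main__":
    # Bought at 100, sold at 110, realized 50 (five shares).
    print(estimate_closed_pnl_pct(100.0, 110.0, 50.0))
    # Bought at 100, sold at 90, realized -20 (two shares).
    print(estimate_closed_pnl_pct(100.0, 90.0, -20.0))
```

Because the estimated quantity cancels out, the result depends only on the two prices and the sign of `realized_pnl`. If the original quantity were stored on closed rows, it could be used directly instead of this estimate.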
diff --git a/services/reporting/summarizer.py b/services/reporting/summarizer.py
new file mode 100644
index 0000000..950d54f
--- /dev/null
+++ b/services/reporting/summarizer.py
@@ -0,0 +1,437 @@
+"""AI-powered report summarizer with chunking and deterministic fallback.
+
+Generates natural-language summaries for trading performance report sections
+using the Report_Summarizer_Agent (resolved via AgentConfigResolver + llm_factory).
+Data is chunked to fit within the 8k-token context window of the local model.
+
+Requirements: 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 3.6
+Design: AI Summarizer
+"""
+from __future__ import annotations
+
+import json
+import logging
+import time
+
+import asyncpg
+
+from services.extractor.llm_factory import build_llm_client
+from services.shared.agent_config import AgentConfigResolver, ResolvedAgentConfig
+from services.shared.config import load_config
+
+logger = logging.getLogger(__name__)
+
+# ---------------------------------------------------------------------------
+# Constants
+# ---------------------------------------------------------------------------
+
+CHUNK_SIZE_LIMIT = 6000 # characters per chunk
+MAX_SUMMARY_WORDS = 200 # per section summary
+MAX_EXECUTIVE_SUMMARY_WORDS = 300
+
+_REPORT_SUMMARIZER_SLUG = "report-summarizer"
+
+
+# ---------------------------------------------------------------------------
+# Chunking
+# ---------------------------------------------------------------------------
+
+
+def chunk_data(serialized: str, max_chars: int = CHUNK_SIZE_LIMIT) -> list[str]:
+ """Split serialized data into chunks of at most *max_chars* characters.
+
+ Splits on newline boundaries to avoid breaking JSON structures.
+ Each chunk is ≤ *max_chars* characters. Returns at least one chunk
+ (even for empty input).
+
+ Round-trip property: ``"".join(chunk_data(s, n)) == s`` for all *s*.
+
+ If a single line (including its trailing newline) exceeds *max_chars*,
+ it is included as its own chunk (we never break mid-line).
+ """
+ if not serialized:
+ return [""]
+
+ # Split into segments where each segment includes its trailing "\n"
+ # (except possibly the last one if the string doesn't end with "\n").
+ # This preserves the exact original when chunks are concatenated.
+ segments: list[str] = []
+ start = 0
+ while start < len(serialized):
+ nl = serialized.find("\n", start)
+ if nl == -1:
+ # Last segment, no trailing newline
+ segments.append(serialized[start:])
+ break
+ else:
+ # Include the newline in this segment
+ segments.append(serialized[start : nl + 1])
+ start = nl + 1
+
+ chunks: list[str] = []
+ current_parts: list[str] = []
+ current_len = 0
+
+ for segment in segments:
+ if current_parts and current_len + len(segment) > max_chars:
+ # Flush current chunk
+ chunks.append("".join(current_parts))
+ current_parts = [segment]
+ current_len = len(segment)
+ else:
+ current_parts.append(segment)
+ current_len += len(segment)
+
+ # Flush remaining
+ if current_parts:
+ chunks.append("".join(current_parts))
+
+ return chunks if chunks else [""]
+
+
+# ---------------------------------------------------------------------------
+# Performance logging
+# ---------------------------------------------------------------------------
+
+
+async def _log_performance(
+ pool: asyncpg.Pool,
+ resolved: ResolvedAgentConfig,
+ success: bool,
+ duration_ms: int,
+ input_text: str,
+ output_text: str,
+ error_message: str | None = None,
+) -> None:
+ """Insert a row into agent_performance_log for a summarizer invocation."""
+ try:
+ await pool.execute(
+ """INSERT INTO agent_performance_log
+ (agent_id, variant_id, document_id, ticker, success,
+ duration_ms, confidence, retry_count,
+ input_tokens, output_tokens, error_message)
+ VALUES ($1::uuid, $2::uuid, $3, $4, $5, $6, $7, $8, $9, $10, $11)""",
+ resolved.agent_id,
+ resolved.variant_id,
+ None, # no document_id for report summaries
+ None, # no ticker for report summaries
+ success,
+ duration_ms,
+ 0.0, # no confidence score for summaries
+ 0,
+ len(input_text) // 4, # token estimate
+ len(output_text) // 4, # token estimate
+ error_message,
+ )
+ except Exception:
+ logger.warning("Failed to log summarizer performance", exc_info=True)
+
+
+# ---------------------------------------------------------------------------
+# LLM summarization helpers
+# ---------------------------------------------------------------------------
+
+
+async def _summarize_chunk(
+ resolved: ResolvedAgentConfig,
+ section_name: str,
+ chunk: str,
+) -> str:
+ """Summarize a single chunk via the Report_Summarizer_Agent LLM client.
+
+ Returns the raw text output from the model.
+ Raises on failure so the caller can handle retries / fallback.
+ """
+ cfg = load_config()
+ client = build_llm_client(resolved, cfg.ollama, cfg.vllm)
+ try:
+ prompts = {
+ "system": resolved.system_prompt,
+ "user": f"Summarize this {section_name} data:\n{chunk}",
+ }
+ attempt = await client.call_llm(
+ prompts=prompts,
+ json_schema={}, # plain text, no structured output
+ document_text="",
+ )
+ if attempt.error:
+ raise RuntimeError(f"LLM error: {attempt.error}")
+ if not attempt.raw_output.strip():
+ raise RuntimeError("LLM returned empty response")
+ return attempt.raw_output.strip()
+ finally:
+ await client.close()
+
+
+async def _merge_summaries(
+ resolved: ResolvedAgentConfig,
+ section_name: str,
+ summaries: list[str],
+) -> str:
+ """Merge multiple chunk summaries into a single coherent summary."""
+ combined = "\n\n".join(summaries)
+ cfg = load_config()
+ client = build_llm_client(resolved, cfg.ollama, cfg.vllm)
+ try:
+ prompts = {
+ "system": resolved.system_prompt,
+ "user": (
+ f"Merge these {section_name} summaries into a single coherent "
+ f"summary of no more than {MAX_SUMMARY_WORDS} words:\n{combined}"
+ ),
+ }
+ attempt = await client.call_llm(
+ prompts=prompts,
+ json_schema={},
+ document_text="",
+ )
+ if attempt.error:
+ raise RuntimeError(f"LLM merge error: {attempt.error}")
+ if not attempt.raw_output.strip():
+ raise RuntimeError("LLM returned empty merge response")
+ return attempt.raw_output.strip()
+ finally:
+ await client.close()
+
+
+# ---------------------------------------------------------------------------
+# Section summarization
+# ---------------------------------------------------------------------------
+
+
+async def summarize_section(
+ pool: asyncpg.Pool,
+ resolver: AgentConfigResolver,
+ section_name: str,
+ section_data: dict,
+) -> str:
+ """Generate AI summary for a report section.
+
+ 1. Serialize section data to JSON string
+ 2. Chunk if > CHUNK_SIZE_LIMIT
+ 3. Summarize each chunk via Report_Summarizer_Agent
+ 4. If multiple chunks, merge summaries with a final LLM call
+ 5. Log each invocation to agent_performance_log
+ 6. On failure, fall back to deterministic summary
+ """
+ resolved = await resolver.resolve(_REPORT_SUMMARIZER_SLUG)
+ if resolved is None:
+ logger.error(
+ "Report summarizer agent not found (slug=%s) — using deterministic fallback",
+ _REPORT_SUMMARIZER_SLUG,
+ )
+ return build_deterministic_summary(section_name, section_data)
+
+ serialized = json.dumps(section_data, indent=2, default=str)
+ chunks = chunk_data(serialized)
+
+ start = time.monotonic()
+ try:
+ # Summarize each chunk
+ chunk_summaries: list[str] = []
+ for chunk in chunks:
+ summary = await _summarize_chunk(resolved, section_name, chunk)
+ chunk_summaries.append(summary)
+
+ # Merge if multiple chunks
+ if len(chunk_summaries) > 1:
+ try:
+ final_summary = await _merge_summaries(
+ resolved, section_name, chunk_summaries,
+ )
+ except Exception:
+ # Merge failed — fall back to concatenation of chunk summaries
+ logger.warning(
+ "Chunk merge LLM call failed for section %s — concatenating summaries",
+ section_name,
+ )
+ final_summary = "\n".join(chunk_summaries)
+ else:
+ final_summary = chunk_summaries[0]
+
+ # Truncate to MAX_SUMMARY_WORDS at sentence boundary
+ words = final_summary.split()
+ if len(words) > MAX_SUMMARY_WORDS:
+ truncated = " ".join(words[:MAX_SUMMARY_WORDS])
+ # Try to end at a sentence boundary
+ last_period = truncated.rfind(".")
+ if last_period > len(truncated) // 2:
+ truncated = truncated[: last_period + 1]
+ final_summary = truncated
+
+ duration_ms = int((time.monotonic() - start) * 1000)
+ await _log_performance(
+ pool, resolved, True, duration_ms, serialized, final_summary,
+ )
+ return final_summary
+
+ except Exception as exc:
+ duration_ms = int((time.monotonic() - start) * 1000)
+ logger.warning(
+ "AI summarization failed for section %s: %s — using deterministic fallback",
+ section_name,
+ exc,
+ )
+ await _log_performance(
+ pool, resolved, False, duration_ms, serialized, "",
+ error_message=str(exc),
+ )
+ return build_deterministic_summary(section_name, section_data)
+
+
+# ---------------------------------------------------------------------------
+# Deterministic fallback summaries
+# ---------------------------------------------------------------------------
+
+_DETERMINISTIC_TEMPLATES: dict[str, str] = {
+ "pnl": (
+ "P&L Summary: Realized P&L ${realized_pnl}, unrealized ${unrealized_pnl}, "
+ "daily return {daily_return}%, win rate {win_rate}%."
+ ),
+ "recommendation_accuracy": (
+ "Recommendation Accuracy: {total_evaluated} evaluated, "
+ "{act_count} acted ({acted_win_rate}% win rate), "
+ "{skip_count} skipped. "
+ "Avg confidence acted {avg_confidence_acted}, skipped {avg_confidence_skipped}."
+ ),
+ "position_performance": (
+ "Position Performance: {position_count} positions tracked during the period."
+ ),
+ "risk_metrics": (
+ "Risk Metrics: Risk tier {current_risk_tier}, portfolio heat {portfolio_heat}, "
+ "max drawdown {max_drawdown}, current drawdown {current_drawdown_pct}%, "
+ "reserve pool ${reserve_pool_balance}, "
+ "{circuit_breaker_event_count} circuit breaker events."
+ ),
+ "model_quality": (
+ "Model Quality: {window_count} lookback windows evaluated."
+ ),
+}
+
+
+def build_deterministic_summary(section_name: str, section_data: dict) -> str:
+ """Build a fallback deterministic summary from raw metrics.
+
+ Produces a template-based text summary when AI summarization fails.
+ """
+ template = _DETERMINISTIC_TEMPLATES.get(section_name)
+ if template is None:
+ # Generic fallback for unknown sections
+ return f"{section_name} summary: {len(section_data)} metrics reported."
+
+ try:
+ # Prepare template variables with safe defaults
+ data = dict(section_data)
+
+ # Add computed fields for templates that need them
+ if section_name == "position_performance":
+ positions = data.get("positions", [])
+ data["position_count"] = len(positions)
+ elif section_name == "model_quality":
+ windows = data.get("windows", [])
+ data["window_count"] = len(windows)
+
+ return template.format(**data)
+ except (KeyError, ValueError, TypeError) as exc:
+ logger.warning(
+ "Deterministic summary template failed for %s: %s",
+ section_name,
+ exc,
+ )
+ return f"{section_name} summary: data available but template formatting failed."
+
+
+# ---------------------------------------------------------------------------
+# Executive summary
+# ---------------------------------------------------------------------------
+
+
+async def generate_executive_summary(
+ pool: asyncpg.Pool,
+ resolver: AgentConfigResolver,
+ section_summaries: dict[str, str],
+) -> str:
+ """Generate executive summary from all section summaries.
+
+ Concatenates section summaries, chunks if needed, and produces
+ a ≤300-word synthesis via the Report_Summarizer_Agent.
+ Falls back to concatenated section summaries on failure.
+ """
+ resolved = await resolver.resolve(_REPORT_SUMMARIZER_SLUG)
+ concatenated = "\n\n".join(
+ f"{name}: {summary}" for name, summary in section_summaries.items()
+ )
+
+ if resolved is None:
+ logger.error(
+ "Report summarizer agent not found — using concatenated summaries as executive summary",
+ )
+ return concatenated
+
+ chunks = chunk_data(concatenated)
+
+ start = time.monotonic()
+ try:
+ # Summarize chunks if needed
+ if len(chunks) > 1:
+ chunk_summaries: list[str] = []
+ for chunk in chunks:
+ summary = await _summarize_chunk(resolved, "executive", chunk)
+ chunk_summaries.append(summary)
+ input_text = "\n\n".join(chunk_summaries)
+ else:
+ input_text = chunks[0]
+
+ # Final executive summary call
+ cfg = load_config()
+ client = build_llm_client(resolved, cfg.ollama, cfg.vllm)
+ try:
+ prompts = {
+ "system": resolved.system_prompt,
+ "user": (
+ f"Synthesize these trading performance section summaries into "
+ f"a concise executive summary of no more than "
+ f"{MAX_EXECUTIVE_SUMMARY_WORDS} words:\n{input_text}"
+ ),
+ }
+ attempt = await client.call_llm(
+ prompts=prompts,
+ json_schema={},
+ document_text="",
+ )
+ finally:
+ await client.close()
+
+ if attempt.error:
+ raise RuntimeError(f"Executive summary LLM error: {attempt.error}")
+ if not attempt.raw_output.strip():
+ raise RuntimeError("Executive summary LLM returned empty response")
+
+ executive = attempt.raw_output.strip()
+
+ # Truncate to MAX_EXECUTIVE_SUMMARY_WORDS at sentence boundary
+ words = executive.split()
+ if len(words) > MAX_EXECUTIVE_SUMMARY_WORDS:
+ truncated = " ".join(words[:MAX_EXECUTIVE_SUMMARY_WORDS])
+ last_period = truncated.rfind(".")
+ if last_period > len(truncated) // 2:
+ truncated = truncated[: last_period + 1]
+ executive = truncated
+
+ duration_ms = int((time.monotonic() - start) * 1000)
+ await _log_performance(
+ pool, resolved, True, duration_ms, concatenated, executive,
+ )
+ return executive
+
+ except Exception as exc:
+ duration_ms = int((time.monotonic() - start) * 1000)
+ logger.warning(
+ "Executive summary generation failed: %s — using concatenated summaries",
+ exc,
+ )
+ await _log_performance(
+ pool, resolved, False, duration_ms, concatenated, "",
+ error_message=str(exc),
+ )
+ return concatenated
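Reviewer note on `chunk_data` in `services/reporting/summarizer.py`: the docstring promises a round-trip property and allows an oversized line to form its own chunk. The sketch below re-implements the same greedy line packing in a standalone form so both properties can be checked without the service imports. The function name `chunk_by_lines` is an assumption made for illustration, not the code under review.

```python
from __future__ import annotations


def chunk_by_lines(text: str, max_chars: int = 6000) -> list[str]:
    """Greedy line-based chunking; concatenating the chunks restores text."""
    if not text:
        return [""]
    # Split on "\n" and re-attach the newline to every segment but the last.
    parts = text.split("\n")
    segments = [p + "\n" for p in parts[:-1]]
    if parts[-1]:
        segments.append(parts[-1])

    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for seg in segments:
        if current and current_len + len(seg) > max_chars:
            # Adding this line would overflow: flush and start a new chunk.
            chunks.append("".join(current))
            current, current_len = [seg], len(seg)
        else:
            current.append(seg)
            current_len += len(seg)
    if current:
        chunks.append("".join(current))
    return chunks


if __name__ == "__main__":
    sample = "\n".join(f"line {i}" for i in range(50))
    pieces = chunk_by_lines(sample, max_chars=40)
    print(len(pieces), "".join(pieces) == sample)
```

One consequence worth noting: a single line longer than `max_chars` still becomes a chunk larger than the limit, so the size bound only holds when every line fits within it.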
diff --git a/services/reporting/validator.py b/services/reporting/validator.py
new file mode 100644
index 0000000..031e5ba
--- /dev/null
+++ b/services/reporting/validator.py
@@ -0,0 +1,175 @@
+"""Report validator — cross-checks computed metrics against live data.
+
+Compares report section values against prediction_outcomes and
+model_metric_snapshots, flagging discrepancies that exceed the
+configured threshold.
+"""
+
+from __future__ import annotations
+
+import logging
+import math
+
+from services.reporting.models import (
+ ModelQualitySection,
+ RecommendationAccuracySection,
+ ReportData,
+ ValidationStatus,
+ ValidationWarning,
+)
+
+logger = logging.getLogger(__name__)
+
+DISCREPANCY_THRESHOLD_PCT = 5.0
+
+
+def _sanitize(value: float | None) -> float:
+ """Replace None, NaN, and infinity with 0.0."""
+ if value is None:
+ return 0.0
+ if math.isnan(value) or math.isinf(value):
+ return 0.0
+ return value
+
+
+def _check_discrepancy(
+ field_name: str,
+ computed: float,
+ snapshot: float,
+) -> ValidationWarning | None:
+    """Compare computed vs snapshot; warn above DISCREPANCY_THRESHOLD_PCT.
+
+    Edge cases:
+    - snapshot=0 and computed≠0 → 100% difference → warning
+    - both=0 → 0% difference → no warning
+    - NULL snapshot values are skipped by callers before this is called
+    """
+ computed = _sanitize(computed)
+ snapshot = _sanitize(snapshot)
+
+ if snapshot == 0.0 and computed == 0.0:
+ return None
+
+ if snapshot == 0.0:
+ # Non-zero computed with zero snapshot → 100% discrepancy
+ pct_diff = 100.0
+ else:
+ pct_diff = abs(computed - snapshot) / abs(snapshot) * 100.0
+
+ if pct_diff > DISCREPANCY_THRESHOLD_PCT:
+ return ValidationWarning(
+ field_name=field_name,
+ computed_value=computed,
+ snapshot_value=snapshot,
+ pct_difference=round(pct_diff, 4),
+ )
+ return None
+
+
+def validate_recommendation_accuracy(
+ section: RecommendationAccuracySection,
+ prediction_outcomes: list[dict],
+) -> list[ValidationWarning]:
+ """Cross-reference reported win rates with prediction_outcomes.
+
+ Computes win_rate from prediction_outcomes (count profitable / total)
+ and compares against section.acted_win_rate. Returns warnings for
+ discrepancies > 5%.
+ """
+ warnings: list[ValidationWarning] = []
+
+ if not prediction_outcomes:
+ return warnings
+
+ total = len(prediction_outcomes)
+ profitable_count = sum(
+ 1 for po in prediction_outcomes if po.get("profitable")
+ )
+ computed_win_rate = profitable_count / total if total > 0 else 0.0
+
+ w = _check_discrepancy(
+ "acted_win_rate",
+ section.acted_win_rate,
+ computed_win_rate,
+ )
+ if w is not None:
+ warnings.append(w)
+
+ return warnings
+
+
+def validate_model_quality(
+ section: ModelQualitySection,
+ metric_snapshots: list[dict],
+) -> list[ValidationWarning]:
+ """Compare reported model quality metrics against model_metric_snapshots.
+
+ For each window in the section, finds the matching snapshot by
+ lookback_window and compares win_rate, directional_accuracy,
+ information_coefficient, calibration_error, and brier_score.
+ Flags discrepancies > 5%.
+ """
+ warnings: list[ValidationWarning] = []
+
+ if not metric_snapshots:
+ return warnings
+
+ # Build lookup: lookback_window → latest snapshot (first match since
+ # collector orders by generated_at DESC)
+ snap_by_window: dict[str, dict] = {}
+ for snap in metric_snapshots:
+ window = snap.get("lookback_window", "")
+ if window and window not in snap_by_window:
+ snap_by_window[window] = snap
+
+ metric_fields = [
+ ("win_rate", "win_rate"),
+ ("directional_accuracy", "directional_accuracy"),
+ ("information_coefficient", "information_coefficient"),
+ ("calibration_error", "calibration_error"),
+ ("brier_score", "brier_score"),
+ ]
+
+ for mq_window in section.windows:
+ snap = snap_by_window.get(mq_window.lookback)
+ if snap is None:
+ continue
+
+ for section_attr, snap_key in metric_fields:
+ section_value = getattr(mq_window, section_attr, None)
+ snapshot_value = snap.get(snap_key)
+
+ # NULL snapshot → skip
+ if snapshot_value is None:
+ continue
+ # NULL section value → skip
+ if section_value is None:
+ continue
+
+ snapshot_float = _sanitize(float(snapshot_value))
+ section_float = _sanitize(section_value)
+
+ w = _check_discrepancy(
+ f"{mq_window.lookback}_{section_attr}",
+ section_float,
+ snapshot_float,
+ )
+ if w is not None:
+ warnings.append(w)
+
+ return warnings
+
+
+def compute_validation_status(report: ReportData) -> ValidationStatus:
+ """Determine overall validation status.
+
+ Returns 'passed' if no warnings across all sections,
+ 'warnings' if any section has validation warnings.
+ """
+ if report.pnl.validation_warnings:
+ return ValidationStatus.WARNINGS
+ if report.recommendation_accuracy.validation_warnings:
+ return ValidationStatus.WARNINGS
+ if report.model_quality.validation_warnings:
+ return ValidationStatus.WARNINGS
+ return ValidationStatus.PASSED
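Reviewer note on `_check_discrepancy` in `services/reporting/validator.py`: the comparison is a relative difference against the snapshot value, with special handling for a zero snapshot. The sketch below isolates that calculation. The names `pct_discrepancy` and `THRESHOLD_PCT` are assumptions for illustration, with the threshold mirroring `DISCREPANCY_THRESHOLD_PCT` from the diff.

```python
import math

THRESHOLD_PCT = 5.0  # mirrors DISCREPANCY_THRESHOLD_PCT in the diff


def pct_discrepancy(computed: float, snapshot: float) -> float:
    """Relative difference in percent, measured against the snapshot."""
    # Treat NaN and infinity as 0.0, as the validator's sanitizer does.
    computed = computed if math.isfinite(computed) else 0.0
    snapshot = snapshot if math.isfinite(snapshot) else 0.0
    if snapshot == 0.0:
        # Zero baseline: identical zeros agree, anything else is 100% off.
        return 0.0 if computed == 0.0 else 100.0
    return abs(computed - snapshot) / abs(snapshot) * 100.0


if __name__ == "__main__":
    for computed, snapshot in [(0.52, 0.50), (0.56, 0.50), (0.30, 0.0)]:
        diff = pct_discrepancy(computed, snapshot)
        print(computed, snapshot, diff > THRESHOLD_PCT)
```

Since the measure is relative, small absolute differences on metrics near zero (for example an information coefficient of 0.01 vs 0.02) produce large percentages; an absolute tolerance could be considered for such fields.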
diff --git a/services/scheduler/app.py b/services/scheduler/app.py
index df48b86..969efe3 100644
--- a/services/scheduler/app.py
+++ b/services/scheduler/app.py
@@ -10,8 +10,9 @@ import asyncio
import json
import logging
import os
-from datetime import datetime, timezone
+from datetime import datetime, timedelta, timezone
from typing import Any, Optional
+from zoneinfo import ZoneInfo
import asyncpg
import redis.asyncio as aioredis
@@ -26,6 +27,7 @@ from services.shared.redis_keys import (
QUEUE_INGESTION,
QUEUE_MACRO_CLASSIFICATION,
QUEUE_PREFIX,
+ QUEUE_REPORT_GENERATION,
lock_key,
queue_key,
rate_limit_key,
@@ -498,6 +500,163 @@ async def schedule_cycle(pool: asyncpg.Pool, rds: aioredis.Redis) -> int:
return enqueued
+# ---------------------------------------------------------------------------
+# Report generation: queue consumer + scheduled triggers
+# Requirements: 6.1, 6.2, 6.3, 6.4, 6.5
+# ---------------------------------------------------------------------------
+
+# Eastern Time zone for market-close checks
+_ET = ZoneInfo("America/New_York")
+
+# How often to check the report generation queue (every N cycles)
+# 15s tick × 4 cycles = ~1 minute
+REPORT_CONSUMER_CYCLE_INTERVAL: int = 4
+
+# How often to check report scheduling triggers (every N cycles)
+# 15s tick × 20 cycles = ~5 minutes
+REPORT_SCHEDULE_CYCLE_INTERVAL: int = 20
+
+# Redis key prefix for report schedule dedup markers
+_REPORT_DEDUPE_PREFIX = f"{QUEUE_PREFIX}:report_dedupe"
+_REPORT_DEDUPE_TTL = 86400 # 24 hours — prevents re-enqueuing same report within a day
+
+
+def _report_dedupe_key(report_type: str, period_start: str, period_end: str) -> str:
+ """Build a Redis key for deduplicating report schedule triggers."""
+ return f"{_REPORT_DEDUPE_PREFIX}:{report_type}:{period_start}:{period_end}"
+
+
+async def consume_report_generation_jobs(
+ pool: asyncpg.Pool,
+ rds: aioredis.Redis,
+) -> int:
+ """Pop and process jobs from the report generation queue.
+
+ Pops up to 5 jobs per invocation to avoid blocking the scheduler loop.
+ Each job is deserialized and handed to process_report_job from the
+ reporting generator module.
+
+ Returns the number of jobs processed.
+ """
+ from services.reporting.generator import process_report_job
+
+ report_queue = queue_key(QUEUE_REPORT_GENERATION)
+ processed = 0
+
+ for _ in range(5):
+ raw = await rds.lpop(report_queue)
+ if raw is None:
+ break
+
+ try:
+ job = json.loads(raw)
+ except (json.JSONDecodeError, TypeError):
+ logger.error("Invalid report generation job payload: %s", raw)
+ continue
+
+ logger.info(
+ "Processing report generation job: type=%s period=%s to %s",
+ job.get("report_type"),
+ job.get("period_start"),
+ job.get("period_end"),
+ )
+
+ try:
+ await process_report_job(pool, job)
+ processed += 1
+ except Exception:
+ logger.exception(
+ "Failed to process report generation job: %s", job,
+ )
+
+ if processed > 0:
+ logger.info("Processed %d report generation jobs", processed)
+ return processed
+
+
+async def maybe_enqueue_daily_report(
+ rds: aioredis.Redis,
+ now_et: datetime,
+) -> bool:
+ """Enqueue a daily report job if it's after 16:30 ET on a weekday.
+
+ Uses a Redis dedupe key to avoid re-enqueuing the same daily report.
+ Returns True if a job was enqueued, False otherwise.
+ """
+ # Only on weekdays (Mon=0 .. Fri=4)
+ if now_et.weekday() > 4:
+ return False
+
+ # Only after 16:30 ET
+ if now_et.hour < 16 or (now_et.hour == 16 and now_et.minute < 30):
+ return False
+
+ today = now_et.date()
+ period_start = today.isoformat()
+ period_end = today.isoformat()
+
+ dedupe = _report_dedupe_key("daily", period_start, period_end)
+ created = await rds.set(dedupe, "1", nx=True, ex=_REPORT_DEDUPE_TTL)
+ if not created:
+ return False
+
+ job = json.dumps({
+ "report_type": "daily",
+ "period_start": period_start,
+ "period_end": period_end,
+ })
+ await rds.rpush(queue_key(QUEUE_REPORT_GENERATION), job)
+ logger.info("Enqueued daily report for %s", period_start)
+ return True
+
+
+async def maybe_enqueue_weekly_report(
+ rds: aioredis.Redis,
+ now_et: datetime,
+) -> bool:
+ """Enqueue a weekly report job on Saturday.
+
+ Covers the previous Monday through Friday.
+ Uses a Redis dedupe key to avoid re-enqueuing the same weekly report.
+ Returns True if a job was enqueued, False otherwise.
+ """
+ # Only on Saturday (weekday() == 5)
+ if now_et.weekday() != 5:
+ return False
+
+ today = now_et.date()
+ # Previous Monday = today - 5 days, previous Friday = today - 1 day
+ period_start = (today - timedelta(days=5)).isoformat()
+ period_end = (today - timedelta(days=1)).isoformat()
+
+ dedupe = _report_dedupe_key("weekly", period_start, period_end)
+ created = await rds.set(dedupe, "1", nx=True, ex=_REPORT_DEDUPE_TTL)
+ if not created:
+ return False
+
+ job = json.dumps({
+ "report_type": "weekly",
+ "period_start": period_start,
+ "period_end": period_end,
+ })
+ await rds.rpush(queue_key(QUEUE_REPORT_GENERATION), job)
+ logger.info(
+ "Enqueued weekly report for %s to %s", period_start, period_end,
+ )
+ return True
+
+
+async def check_report_schedule(rds: aioredis.Redis) -> None:
+ """Check if daily or weekly report triggers should fire.
+
+ Called periodically from the main loop. Uses Eastern Time to evaluate the
+ post-close daily trigger (16:30 ET) and the day of week.
+ """
+ now_et = datetime.now(tz=_ET)
+ await maybe_enqueue_daily_report(rds, now_et)
+ await maybe_enqueue_weekly_report(rds, now_et)
+
+
async def enqueue_periodic_aggregation(pool: asyncpg.Pool, rds: aioredis.Redis) -> int:
"""Enqueue aggregation jobs for all active tickers.
@@ -544,6 +703,8 @@ async def main() -> None:
retry_counter = 0
cleanup_counter = 0
aggregation_counter = 0
+ report_consumer_counter = 0
+ report_schedule_counter = 0
try:
while True:
try:
@@ -576,6 +737,16 @@ async def main() -> None:
if aggregation_counter >= AGGREGATION_CYCLE_INTERVAL:
aggregation_counter = 0
await enqueue_periodic_aggregation(pool, rds)
+ # Consume report generation jobs (~1 minute)
+ report_consumer_counter += 1
+ if report_consumer_counter >= REPORT_CONSUMER_CYCLE_INTERVAL:
+ report_consumer_counter = 0
+ await consume_report_generation_jobs(pool, rds)
+ # Check report schedule triggers (~5 minutes)
+ report_schedule_counter += 1
+ if report_schedule_counter >= REPORT_SCHEDULE_CYCLE_INTERVAL:
+ report_schedule_counter = 0
+ await check_report_schedule(rds)
finally:
await release_lock(rds, "scheduler_cycle")
except Exception:
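Reviewer sketch: the schedule gates added in the scheduler hunks above (weekday plus 16:30 ET for the daily report, Saturday for the weekly report covering the previous Monday through Friday) can be expressed as pure functions, which makes them testable without Redis. The names `daily_report_due`, `weekly_period`, and `DAILY_TRIGGER` are hypothetical and do not appear in the diff; `now_et` is assumed to already be expressed in Eastern Time.

```python
from __future__ import annotations

from datetime import date, datetime, time, timedelta

# Hypothetical constant: the 16:30 trigger time used by the diff, read as ET.
DAILY_TRIGGER = time(16, 30)


def daily_report_due(now_et: datetime) -> bool:
    """Return True on Mon-Fri at or after 16:30 (now_et assumed to be in ET)."""
    # weekday(): Mon=0 .. Fri=4, Sat=5, Sun=6
    return now_et.weekday() <= 4 and now_et.time() >= DAILY_TRIGGER


def weekly_period(now_et: datetime) -> tuple[date, date] | None:
    """On Saturday, return (previous Monday, previous Friday); otherwise None."""
    if now_et.weekday() != 5:
        return None
    today = now_et.date()
    # Saturday - 5 days = Monday, Saturday - 1 day = Friday
    return today - timedelta(days=5), today - timedelta(days=1)
```

Keeping the date arithmetic separate from the Redis calls is only one possible layout; the diff's `maybe_enqueue_*` functions inline the same checks before the dedupe `SET NX`.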
diff --git a/services/shared/redis_keys.py b/services/shared/redis_keys.py
index e176486..131d212 100644
--- a/services/shared/redis_keys.py
+++ b/services/shared/redis_keys.py
@@ -68,6 +68,8 @@ QUEUE_LAKE_PUBLISH = "lake_publish"
QUEUE_TRADE = "trade"
QUEUE_BROKER = "broker_orders"
QUEUE_MACRO_CLASSIFICATION = "macro_classification"
+QUEUE_REPORT_GENERATION = "report_generation"
+QUEUE_REPORT_GENERATION = "report_generation"
# --- Trading engine ---
QUEUE_TRADING_DECISIONS = "trading_decisions"
diff --git a/tests/test_pbt_report_chunking.py b/tests/test_pbt_report_chunking.py
new file mode 100644
index 0000000..5b43c9f
--- /dev/null
+++ b/tests/test_pbt_report_chunking.py
@@ -0,0 +1,110 @@
+# Feature: trading-feedback-engine, Property 1: Chunking round-trip and size constraint
+"""Property-based tests for report data chunking.
+
+Feature: trading-feedback-engine
+
+Tests the chunking round-trip and size constraint property from the design
+specification: for any input string, splitting it into chunks with a maximum
+size limit produces chunks where (a) every chunk is ≤ the size limit in
+characters (for chunks that don't contain a single oversized line), (b) no
+chunk is empty (except when the input itself is empty, which produces exactly
+one empty chunk), and (c) concatenating all chunks in order reconstructs the
+original input string.
+"""
+from __future__ import annotations
+
+from hypothesis import given, settings
+from hypothesis import strategies as st
+
+from services.reporting.summarizer import chunk_data
+
+# ---------------------------------------------------------------------------
+# Property 1: Chunking Round-Trip and Size Constraint
+# Validates: Requirements 2.2
+# ---------------------------------------------------------------------------
+
+
+@given(
+ text=st.text(),
+ max_chars=st.integers(min_value=1, max_value=10000),
+)
+@settings(max_examples=100)
+def test_chunk_data_round_trip(text: str, max_chars: int) -> None:
+ """**Validates: Requirements 2.2**
+
+ For any input string and any max_chars ≥ 1, concatenating all chunks
+ produced by chunk_data SHALL reconstruct the original input string
+ exactly (round-trip property).
+ """
+ chunks = chunk_data(text, max_chars)
+ reconstructed = "".join(chunks)
+ assert reconstructed == text, (
+ f"Round-trip failed: concatenation of {len(chunks)} chunks does not "
+ f"equal original input.\n"
+ f" original length: {len(text)}\n"
+ f" reconstructed length: {len(reconstructed)}\n"
+ f" max_chars: {max_chars}"
+ )
+
+
+@given(
+ text=st.text(),
+ max_chars=st.integers(min_value=1, max_value=10000),
+)
+@settings(max_examples=100)
+def test_chunk_data_no_empty_chunks(text: str, max_chars: int) -> None:
+ """**Validates: Requirements 2.2**
+
+ For any input string and any max_chars ≥ 1, chunk_data SHALL produce
+ no empty chunks — except when the input itself is empty, in which case
+ it SHALL produce exactly one empty chunk.
+ """
+ chunks = chunk_data(text, max_chars)
+
+ if text == "":
+ assert chunks == [""], (
+ f"Empty input should produce exactly [''], got {chunks!r}"
+ )
+ else:
+ for i, chunk in enumerate(chunks):
+ assert chunk != "", (
+ f"Chunk {i} is empty for non-empty input.\n"
+ f" input length: {len(text)}\n"
+ f" max_chars: {max_chars}\n"
+ f" total chunks: {len(chunks)}"
+ )
+
+
+@given(
+ text=st.text(),
+ max_chars=st.integers(min_value=1, max_value=10000),
+)
+@settings(max_examples=100)
+def test_chunk_data_size_constraint(text: str, max_chars: int) -> None:
+ """**Validates: Requirements 2.2**
+
+ For any input string and any max_chars ≥ 1, every chunk produced by
+ chunk_data SHALL be ≤ max_chars in length — UNLESS the chunk contains
+ a single line that by itself exceeds max_chars (since chunk_data never
+ breaks mid-line, such a line is emitted as its own chunk).
+
+ A chunk is considered "oversized due to a single long line" when it
+ consists of exactly one segment (a line with its trailing newline, or
+ the final line without one) whose length exceeds max_chars.
+ """
+ chunks = chunk_data(text, max_chars)
+
+ for i, chunk in enumerate(chunks):
+ if len(chunk) > max_chars:
+ # This chunk exceeds the limit. It must be because it contains
+ # a single line that is itself longer than max_chars.
+ # A single-segment chunk has at most one newline (at the end).
+ lines_in_chunk = chunk.split("\n")
+ # If the chunk ends with \n, split produces a trailing empty string
+ non_empty_lines = [ln for ln in lines_in_chunk if ln]
+ assert len(non_empty_lines) <= 1, (
+ f"Chunk {i} exceeds max_chars={max_chars} "
+ f"(len={len(chunk)}) but contains multiple non-empty lines, "
+ f"which should not happen.\n"
+ f" lines: {non_empty_lines!r}"
+ )
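Reviewer sketch: the three properties tested above (round-trip, no empty chunks, size limit except for a single oversized line) are satisfied by a greedy, line-preserving packer. The function below is a hypothetical `chunk_lines`, written only to illustrate the contract; it is not the `chunk_data` implementation from `services.reporting.summarizer`, which is not shown in this part of the diff.

```python
from __future__ import annotations


def chunk_lines(text: str, max_chars: int) -> list[str]:
    """Greedily pack whole lines into chunks of at most max_chars characters.

    A line longer than max_chars is emitted as its own oversized chunk,
    because lines are never split.
    """
    if text == "":
        return [""]

    # Split into segments that keep their trailing "\n".
    parts = text.split("\n")
    segments = [p + "\n" for p in parts[:-1]]
    if parts[-1]:
        segments.append(parts[-1])

    chunks: list[str] = []
    current = ""
    for seg in segments:
        # Flush before adding a segment that would push the chunk over the limit.
        if current and len(current) + len(seg) > max_chars:
            chunks.append(current)
            current = ""
        current += seg
    if current:
        chunks.append(current)
    return chunks
```

Because a chunk is flushed before any segment that would overflow it, a chunk can exceed `max_chars` only when it consists of exactly one segment, which is the exception the size-constraint test allows.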
diff --git a/tests/test_pbt_report_sections.py b/tests/test_pbt_report_sections.py
new file mode 100644
index 0000000..cb07b66
--- /dev/null
+++ b/tests/test_pbt_report_sections.py
@@ -0,0 +1,423 @@
+# Feature: trading-feedback-engine, Property 4: Recommendation accuracy aggregation
+# Feature: trading-feedback-engine, Property 5: Portfolio period-over-period delta computation
+"""Property-based tests for report section builders.
+
+Feature: trading-feedback-engine
+
+Property 4 tests the recommendation accuracy aggregation property from the
+design specification: for any non-empty list of trading decisions with
+associated prediction outcomes, the computed acted_win_rate SHALL equal the
+count of profitable outcomes divided by total acted outcomes with prediction
+data, and all rate values SHALL be in [0.0, 1.0].
+
+Property 5 tests the portfolio period-over-period delta computation property
+from the design specification: for any two valid portfolio snapshots (current
+and previous), the period-over-period deltas SHALL equal (current - previous)
+for each field. When no previous snapshot exists, the deltas SHALL be zero.
+"""
+from __future__ import annotations
+
+import uuid
+
+from hypothesis import given, settings
+from hypothesis import strategies as st
+
+from services.reporting.collector import CollectedData
+from services.reporting.sections import (
+ build_pnl_section,
+ build_recommendation_accuracy_section,
+)
+
+# ---------------------------------------------------------------------------
+# Property 4: Recommendation Accuracy Aggregation
+# Validates: Requirements 1.4
+# ---------------------------------------------------------------------------
+
+# Strategy: generate a list of unique tickers, then build matching
+# trading_decisions, recommendations, and prediction_outcomes.
+
+_ticker_strategy = st.text(
+ alphabet=st.characters(whitelist_categories=("Lu",)),
+ min_size=1,
+ max_size=5,
+)
+
+_confidence_strategy = st.floats(
+ min_value=0.0, max_value=1.0, allow_nan=False, allow_infinity=False,
+)
+
+_excess_return_strategy = st.floats(
+ min_value=-1.0, max_value=1.0, allow_nan=False, allow_infinity=False,
+)
+
+
+@st.composite
+def recommendation_accuracy_data(draw: st.DrawFn) -> tuple[CollectedData, dict]:
+ """Generate CollectedData with matching trading decisions, recommendations,
+ and prediction outcomes for testing recommendation accuracy.
+
+ Returns (CollectedData, expected_values) where expected_values contains
+ the independently computed expected results.
+ """
+ # Generate 1-20 trading decisions with unique tickers
+ n = draw(st.integers(min_value=1, max_value=20))
+ tickers = [draw(_ticker_strategy) for _ in range(n)]
+ # Ensure unique tickers by appending index
+ tickers = [f"{t}{i}" for i, t in enumerate(tickers)]
+
+ decisions = draw(
+ st.lists(
+ st.sampled_from(["act", "skip"]),
+ min_size=n,
+ max_size=n,
+ )
+ )
+ confidences = draw(
+ st.lists(
+ _confidence_strategy,
+ min_size=n,
+ max_size=n,
+ )
+ )
+ profitable_flags = draw(
+ st.lists(
+ st.booleans(),
+ min_size=n,
+ max_size=n,
+ )
+ )
+ direction_correct_flags = draw(
+ st.lists(
+ st.booleans(),
+ min_size=n,
+ max_size=n,
+ )
+ )
+ excess_returns = draw(
+ st.lists(
+ _excess_return_strategy,
+ min_size=n,
+ max_size=n,
+ )
+ )
+
+ trading_decisions = []
+ recommendations = []
+ prediction_outcomes = []
+
+ # Track expected values
+ exp_act_count = 0
+ exp_skip_count = 0
+ exp_acted_wins = 0
+ exp_acted_with_outcome = 0
+ exp_confidence_acted: list[float] = []
+ exp_confidence_skipped: list[float] = []
+
+ for i in range(n):
+ rec_id = str(uuid.uuid4())
+ ticker = tickers[i]
+ decision = decisions[i]
+ confidence = confidences[i]
+ profitable = profitable_flags[i]
+ direction_correct = direction_correct_flags[i]
+ excess_return = excess_returns[i]
+
+ trading_decisions.append(
+ {
+ "id": str(uuid.uuid4()),
+ "recommendation_id": rec_id,
+ "decision": decision,
+ "ticker": ticker,
+ }
+ )
+ recommendations.append(
+ {
+ "id": rec_id,
+ "confidence": confidence,
+ }
+ )
+ prediction_outcomes.append(
+ {
+ "ticker": ticker,
+ "profitable": profitable,
+ "direction_correct": direction_correct,
+ "excess_return_vs_spy": excess_return,
+ }
+ )
+
+ if decision == "act":
+ exp_act_count += 1
+ exp_confidence_acted.append(confidence)
+ # Every acted decision has a matching prediction outcome by ticker
+ exp_acted_with_outcome += 1
+ if profitable:
+ exp_acted_wins += 1
+ else:
+ exp_skip_count += 1
+ exp_confidence_skipped.append(confidence)
+
+ data = CollectedData(
+ trading_decisions=trading_decisions,
+ recommendations=recommendations,
+ prediction_outcomes=prediction_outcomes,
+ )
+
+ exp_acted_win_rate = (
+ (exp_acted_wins / exp_acted_with_outcome)
+ if exp_acted_with_outcome > 0
+ else 0.0
+ )
+ exp_avg_confidence_acted = (
+ (sum(exp_confidence_acted) / len(exp_confidence_acted))
+ if exp_confidence_acted
+ else 0.0
+ )
+ exp_avg_confidence_skipped = (
+ (sum(exp_confidence_skipped) / len(exp_confidence_skipped))
+ if exp_confidence_skipped
+ else 0.0
+ )
+
+ expected = {
+ "total_evaluated": exp_act_count + exp_skip_count,
+ "act_count": exp_act_count,
+ "skip_count": exp_skip_count,
+ "acted_win_rate": exp_acted_win_rate,
+ "avg_confidence_acted": exp_avg_confidence_acted,
+ "avg_confidence_skipped": exp_avg_confidence_skipped,
+ }
+
+ return data, expected
+
+
+@given(data_and_expected=recommendation_accuracy_data())
+@settings(max_examples=100)
+def test_recommendation_accuracy_aggregation(
+ data_and_expected: tuple[CollectedData, dict],
+) -> None:
+ """**Validates: Requirements 1.4**
+
+ For any non-empty list of trading decisions with associated prediction
+ outcomes, the computed acted_win_rate SHALL equal the count of profitable
+ outcomes divided by total acted outcomes with prediction data, act/skip
+ counts SHALL match, average confidence values SHALL match, and all rate
+ values SHALL be in [0.0, 1.0].
+ """
+ data, expected = data_and_expected
+ section = build_recommendation_accuracy_section(data)
+
+ # Verify act/skip counts
+ assert section.total_evaluated == expected["total_evaluated"], (
+ f"total_evaluated mismatch: got {section.total_evaluated}, "
+ f"expected {expected['total_evaluated']}"
+ )
+ assert section.act_count == expected["act_count"], (
+ f"act_count mismatch: got {section.act_count}, "
+ f"expected {expected['act_count']}"
+ )
+ assert section.skip_count == expected["skip_count"], (
+ f"skip_count mismatch: got {section.skip_count}, "
+ f"expected {expected['skip_count']}"
+ )
+
+ # Verify acted win rate
+ assert abs(section.acted_win_rate - expected["acted_win_rate"]) < 1e-9, (
+ f"acted_win_rate mismatch: got {section.acted_win_rate}, "
+ f"expected {expected['acted_win_rate']}"
+ )
+
+ # Verify average confidence values
+ assert abs(section.avg_confidence_acted - expected["avg_confidence_acted"]) < 1e-9, (
+ f"avg_confidence_acted mismatch: got {section.avg_confidence_acted}, "
+ f"expected {expected['avg_confidence_acted']}"
+ )
+ assert abs(section.avg_confidence_skipped - expected["avg_confidence_skipped"]) < 1e-9, (
+ f"avg_confidence_skipped mismatch: got {section.avg_confidence_skipped}, "
+ f"expected {expected['avg_confidence_skipped']}"
+ )
+
+ # All rate values must be in [0.0, 1.0]
+ assert 0.0 <= section.acted_win_rate <= 1.0, (
+ f"acted_win_rate out of range: {section.acted_win_rate}"
+ )
+ assert 0.0 <= section.avg_confidence_acted <= 1.0, (
+ f"avg_confidence_acted out of range: {section.avg_confidence_acted}"
+ )
+ assert 0.0 <= section.avg_confidence_skipped <= 1.0, (
+ f"avg_confidence_skipped out of range: {section.avg_confidence_skipped}"
+ )
+
+
+# ---------------------------------------------------------------------------
+# Property 5: Portfolio Period-Over-Period Delta Computation
+# Validates: Requirements 1.3
+# ---------------------------------------------------------------------------
+
+_non_negative_float = st.floats(
+ min_value=0.0, max_value=1e8, allow_nan=False, allow_infinity=False,
+)
+
+_finite_float = st.floats(
+ min_value=-1e6, max_value=1e6, allow_nan=False, allow_infinity=False,
+)
+
+
+@st.composite
+def portfolio_snapshot_pair(draw: st.DrawFn) -> tuple[dict, dict]:
+ """Generate a pair of portfolio snapshots (current, previous) with
+ non-negative portfolio_value, active_pool, reserve_pool, and finite
+ cumulative_return.
+ """
+ current = {
+ "portfolio_value": draw(_non_negative_float),
+ "active_pool": draw(_non_negative_float),
+ "reserve_pool": draw(_non_negative_float),
+ "cumulative_return": draw(_finite_float),
+ "realized_pnl": draw(_finite_float),
+ "unrealized_pnl": draw(_finite_float),
+ "daily_return": draw(_finite_float),
+ "win_count": draw(st.integers(min_value=0, max_value=10000)),
+ "loss_count": draw(st.integers(min_value=0, max_value=10000)),
+ "win_rate": draw(
+ st.floats(
+ min_value=0.0, max_value=1.0,
+ allow_nan=False, allow_infinity=False,
+ )
+ ),
+ "sharpe_ratio": draw(_finite_float),
+ }
+ previous = {
+ "portfolio_value": draw(_non_negative_float),
+ "active_pool": draw(_non_negative_float),
+ "reserve_pool": draw(_non_negative_float),
+ "cumulative_return": draw(_finite_float),
+ "realized_pnl": draw(_finite_float),
+ "unrealized_pnl": draw(_finite_float),
+ "daily_return": draw(_finite_float),
+ "win_count": draw(st.integers(min_value=0, max_value=10000)),
+ "loss_count": draw(st.integers(min_value=0, max_value=10000)),
+ "win_rate": draw(
+ st.floats(
+ min_value=0.0, max_value=1.0,
+ allow_nan=False, allow_infinity=False,
+ )
+ ),
+ "sharpe_ratio": draw(_finite_float),
+ }
+ return current, previous
+
+
+@given(snapshots=portfolio_snapshot_pair())
+@settings(max_examples=100)
+def test_portfolio_delta_with_both_snapshots(
+ snapshots: tuple[dict, dict],
+) -> None:
+ """**Validates: Requirements 1.3**
+
+ For any two valid portfolio snapshots (current and previous), the
+ period-over-period deltas SHALL equal (current - previous). This test
+ checks cumulative_return, realized_pnl, and unrealized_pnl.
+
+ build_pnl_section extracts values from a single snapshot, so we build
+ one section per snapshot and verify that the difference between the
+ two section outputs matches (current - previous) for each checked field.
+ """
+ current_snap, previous_snap = snapshots
+
+ # Build sections from current and previous snapshots
+ data_current = CollectedData(portfolio_snapshot=current_snap)
+ data_previous = CollectedData(portfolio_snapshot=previous_snap)
+
+ section_current = build_pnl_section(data_current)
+ section_previous = build_pnl_section(data_previous)
+
+ # Verify deltas: current section values - previous section values
+ # should equal current snapshot values - previous snapshot values
+ delta_cumulative = section_current.cumulative_return - section_previous.cumulative_return
+ expected_delta_cumulative = (
+ float(current_snap["cumulative_return"])
+ - float(previous_snap["cumulative_return"])
+ )
+ assert abs(delta_cumulative - expected_delta_cumulative) < 1e-9, (
+ f"cumulative_return delta mismatch: "
+ f"got {delta_cumulative}, expected {expected_delta_cumulative}"
+ )
+
+ delta_realized = section_current.realized_pnl - section_previous.realized_pnl
+ expected_delta_realized = (
+ float(current_snap["realized_pnl"])
+ - float(previous_snap["realized_pnl"])
+ )
+ assert abs(delta_realized - expected_delta_realized) < 1e-9, (
+ f"realized_pnl delta mismatch: "
+ f"got {delta_realized}, expected {expected_delta_realized}"
+ )
+
+ delta_unrealized = section_current.unrealized_pnl - section_previous.unrealized_pnl
+ expected_delta_unrealized = (
+ float(current_snap["unrealized_pnl"])
+ - float(previous_snap["unrealized_pnl"])
+ )
+ assert abs(delta_unrealized - expected_delta_unrealized) < 1e-9, (
+ f"unrealized_pnl delta mismatch: "
+ f"got {delta_unrealized}, expected {expected_delta_unrealized}"
+ )
+
+ # Verify that section values faithfully reflect snapshot values
+ assert abs(section_current.cumulative_return - float(current_snap["cumulative_return"])) < 1e-9
+ assert abs(section_current.realized_pnl - float(current_snap["realized_pnl"])) < 1e-9
+ assert abs(section_current.unrealized_pnl - float(current_snap["unrealized_pnl"])) < 1e-9
+ assert abs(section_current.daily_return - float(current_snap["daily_return"])) < 1e-9
+ assert abs(section_current.win_rate - float(current_snap["win_rate"])) < 1e-9
+
+
+@given(
+ portfolio_value=_non_negative_float,
+ active_pool=_non_negative_float,
+ reserve_pool=_non_negative_float,
+ cumulative_return=_finite_float,
+)
+@settings(max_examples=100)
+def test_portfolio_delta_no_previous_snapshot(
+ portfolio_value: float,
+ active_pool: float,
+ reserve_pool: float,
+ cumulative_return: float,
+) -> None:
+ """**Validates: Requirements 1.3**
+
+ When portfolio_snapshot is None, build_pnl_section SHALL return zero
+ for every field, so a delta computed against a missing previous
+ snapshot starts from an all-zero baseline.
+ """
+ # When portfolio_snapshot is None, build_pnl_section returns all zeros
+ data_no_snapshot = CollectedData(portfolio_snapshot=None)
+ section = build_pnl_section(data_no_snapshot)
+
+ assert section.realized_pnl == 0.0, (
+ f"Expected 0.0 realized_pnl with no snapshot, got {section.realized_pnl}"
+ )
+ assert section.unrealized_pnl == 0.0, (
+ f"Expected 0.0 unrealized_pnl with no snapshot, got {section.unrealized_pnl}"
+ )
+ assert section.daily_return == 0.0, (
+ f"Expected 0.0 daily_return with no snapshot, got {section.daily_return}"
+ )
+ assert section.cumulative_return == 0.0, (
+ f"Expected 0.0 cumulative_return with no snapshot, got {section.cumulative_return}"
+ )
+ assert section.win_count == 0, (
+ f"Expected 0 win_count with no snapshot, got {section.win_count}"
+ )
+ assert section.loss_count == 0, (
+ f"Expected 0 loss_count with no snapshot, got {section.loss_count}"
+ )
+ assert section.win_rate == 0.0, (
+ f"Expected 0.0 win_rate with no snapshot, got {section.win_rate}"
+ )
+ assert section.sharpe_ratio == 0.0, (
+ f"Expected 0.0 sharpe_ratio with no snapshot, got {section.sharpe_ratio}"
+ )
+ assert section.profit_factor == 0.0, (
+ f"Expected 0.0 profit_factor with no snapshot, got {section.profit_factor}"
+ )
diff --git a/tests/test_pbt_report_serialization.py b/tests/test_pbt_report_serialization.py
new file mode 100644
index 0000000..44b0bd2
--- /dev/null
+++ b/tests/test_pbt_report_serialization.py
@@ -0,0 +1,245 @@
+# Feature: trading-feedback-engine, Property 2: Report serialization round-trip
+"""Property-based tests for report serialization round-trip.
+
+Feature: trading-feedback-engine
+
+Tests the report serialization round-trip property from the design
+specification: for any valid ReportData object (with valid P&L,
+recommendation accuracy, position performance, risk metrics, and model
+quality sections), serializing to JSON and then deserializing back SHALL
+produce a ReportData object equivalent to the original. All datetime fields
+in the serialized JSON SHALL be in ISO 8601 format.
+"""
+from __future__ import annotations
+
+import json
+import re
+from datetime import date, datetime, timezone
+
+from hypothesis import given, settings
+from hypothesis import strategies as st
+
+from services.reporting.models import (
+ ModelQualitySection,
+ ModelQualityWindow,
+ PLSection,
+ PositionDetail,
+ PositionPerformanceSection,
+ RecommendationAccuracySection,
+ ReportData,
+ ReportType,
+ RiskMetricsSection,
+ ValidationStatus,
+ ValidationWarning,
+)
+
+# ---------------------------------------------------------------------------
+# Property 2: Report Serialization Round-Trip
+# Validates: Requirements 8.1, 8.2, 8.3, 8.4
+# ---------------------------------------------------------------------------
+
+# ISO 8601 patterns: one for datetime values, one for date-only values
+_ISO8601_DATETIME_RE = re.compile(
+ r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}" # YYYY-MM-DDTHH:MM:SS
+ r"(?:\.\d+)?" # optional fractional seconds
+ r"(?:Z|[+-]\d{2}:\d{2})?$" # optional timezone
+)
+_ISO8601_DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")
+
+# ---------------------------------------------------------------------------
+# Hypothesis strategies for each model
+# ---------------------------------------------------------------------------
+
+_finite_float = st.floats(allow_nan=False, allow_infinity=False)
+_non_negative_finite_float = st.floats(
+ min_value=0.0, allow_nan=False, allow_infinity=False,
+)
+_rate_float = st.floats(
+ min_value=0.0, max_value=1.0, allow_nan=False, allow_infinity=False,
+)
+_optional_finite_float = st.one_of(st.none(), _finite_float)
+
+_validation_warning_strategy = st.builds(
+ ValidationWarning,
+ field_name=st.text(min_size=1, max_size=50),
+ computed_value=_finite_float,
+ snapshot_value=_finite_float,
+ pct_difference=_non_negative_finite_float,
+)
+
+_pnl_section_strategy = st.builds(
+ PLSection,
+ realized_pnl=_finite_float,
+ unrealized_pnl=_finite_float,
+ daily_return=_finite_float,
+ cumulative_return=_finite_float,
+ win_count=st.integers(min_value=0, max_value=10000),
+ loss_count=st.integers(min_value=0, max_value=10000),
+ win_rate=_rate_float,
+ profit_factor=_non_negative_finite_float,
+ sharpe_ratio=_finite_float,
+ summary=st.text(max_size=200),
+ validation_warnings=st.lists(
+ _validation_warning_strategy, min_size=0, max_size=3,
+ ),
+)
+
+_recommendation_accuracy_strategy = st.builds(
+ RecommendationAccuracySection,
+ total_evaluated=st.integers(min_value=0, max_value=10000),
+ act_count=st.integers(min_value=0, max_value=10000),
+ skip_count=st.integers(min_value=0, max_value=10000),
+ acted_win_rate=_rate_float,
+ avg_confidence_acted=_rate_float,
+ avg_confidence_skipped=_rate_float,
+ summary=st.text(max_size=200),
+ validation_warnings=st.lists(
+ _validation_warning_strategy, min_size=0, max_size=3,
+ ),
+)
+
+_position_detail_strategy = st.builds(
+ PositionDetail,
+ ticker=st.text(min_size=1, max_size=10),
+ entry_price=_finite_float,
+ current_or_exit_price=_finite_float,
+ pnl=_finite_float,
+ pnl_pct=_finite_float,
+ hold_duration_hours=_non_negative_finite_float,
+ status=st.sampled_from(["open", "closed"]),
+)
+
+_position_performance_strategy = st.builds(
+ PositionPerformanceSection,
+ positions=st.lists(_position_detail_strategy, min_size=0, max_size=5),
+ summary=st.text(max_size=200),
+)
+
+_risk_metrics_strategy = st.builds(
+ RiskMetricsSection,
+ current_risk_tier=st.sampled_from(["low", "moderate", "high", "critical"]),
+ portfolio_heat=_non_negative_finite_float,
+ max_drawdown=_non_negative_finite_float,
+ current_drawdown_pct=_non_negative_finite_float,
+ reserve_pool_balance=_non_negative_finite_float,
+ circuit_breaker_event_count=st.integers(min_value=0, max_value=100),
+ summary=st.text(max_size=200),
+)
+
+_model_quality_window_strategy = st.builds(
+ ModelQualityWindow,
+ lookback=st.sampled_from(["7d", "30d", "90d"]),
+ win_rate=_optional_finite_float,
+ directional_accuracy=_optional_finite_float,
+ information_coefficient=_optional_finite_float,
+ calibration_error=_optional_finite_float,
+ brier_score=_optional_finite_float,
+)
+
+_model_quality_strategy = st.builds(
+ ModelQualitySection,
+ windows=st.lists(_model_quality_window_strategy, min_size=0, max_size=3),
+ summary=st.text(max_size=200),
+ validation_warnings=st.lists(
+ _validation_warning_strategy, min_size=0, max_size=3,
+ ),
+)
+
+# Use timezone-aware datetimes for generated_at
+_aware_datetime_strategy = st.datetimes(
+ min_value=datetime(2020, 1, 1),
+ max_value=datetime(2030, 12, 31),
+ timezones=st.just(timezone.utc),
+)
+
+_date_strategy = st.dates(
+ min_value=date(2020, 1, 1),
+ max_value=date(2030, 12, 31),
+)
+
+_report_data_strategy = st.builds(
+ ReportData,
+ pnl=_pnl_section_strategy,
+ recommendation_accuracy=_recommendation_accuracy_strategy,
+ position_performance=_position_performance_strategy,
+ risk_metrics=_risk_metrics_strategy,
+ model_quality=_model_quality_strategy,
+ executive_summary=st.text(max_size=300),
+ validation_status=st.sampled_from(list(ValidationStatus)),
+ generated_at=_aware_datetime_strategy,
+ period_start=_date_strategy,
+ period_end=_date_strategy,
+ report_type=st.sampled_from(list(ReportType)),
+)
+
+
+# ---------------------------------------------------------------------------
+# Helper: recursively find all datetime-like string values in parsed JSON
+# ---------------------------------------------------------------------------
+
+_DATETIME_FIELD_NAMES = {"generated_at"}
+_DATE_FIELD_NAMES = {"period_start", "period_end"}
+
+
+def _collect_datetime_strings(
+ obj: object,
+ key: str | None = None,
+) -> list[tuple[str, str]]:
+ """Walk parsed JSON and collect (field_name, value) for datetime fields."""
+ results: list[tuple[str, str]] = []
+ if isinstance(obj, dict):
+ for k, v in obj.items():
+ results.extend(_collect_datetime_strings(v, k))
+ elif isinstance(obj, list):
+ for item in obj:
+ results.extend(_collect_datetime_strings(item, key))
+ elif isinstance(obj, str) and key is not None:
+ if key in _DATETIME_FIELD_NAMES or key in _DATE_FIELD_NAMES:
+ results.append((key, obj))
+ return results
+
+
+# ---------------------------------------------------------------------------
+# Property tests
+# ---------------------------------------------------------------------------
+
+
+@given(report=_report_data_strategy)
+@settings(max_examples=100)
+def test_report_serialization_round_trip(report: ReportData) -> None:
+ """**Validates: Requirements 8.1, 8.2, 8.3, 8.4**
+
+ For any valid ReportData object, serializing to JSON and then
+ deserializing back SHALL produce a ReportData object equivalent
+ to the original.
+ """
+ json_str = report.model_dump_json()
+ restored = ReportData.model_validate_json(json_str)
+ assert restored == report, (
+ f"Round-trip failed: deserialized report differs from original.\n"
+ f" report_type: {report.report_type}\n"
+ f" period: {report.period_start} → {report.period_end}\n"
+ f" generated_at: {report.generated_at}"
+ )
+
+
+@given(report=_report_data_strategy)
+@settings(max_examples=100)
+def test_report_datetime_fields_iso8601(report: ReportData) -> None:
+ """**Validates: Requirements 8.4**
+
+ All datetime fields in the serialized JSON SHALL be in ISO 8601 format.
+ """
+ json_str = report.model_dump_json()
+ parsed = json.loads(json_str)
+ dt_fields = _collect_datetime_strings(parsed)
+
+ for field_name, value in dt_fields:
+ if field_name in _DATETIME_FIELD_NAMES:
+ assert _ISO8601_DATETIME_RE.match(value), (
+ f"Datetime field '{field_name}' is not ISO 8601: {value!r}"
+ )
+ elif field_name in _DATE_FIELD_NAMES:
+ assert _ISO8601_DATE_RE.match(value), (
+ f"Date field '{field_name}' is not ISO 8601: {value!r}"
+ )
diff --git a/tests/test_pbt_report_validation.py b/tests/test_pbt_report_validation.py
new file mode 100644
index 0000000..624886e
--- /dev/null
+++ b/tests/test_pbt_report_validation.py
@@ -0,0 +1,127 @@
+# Feature: trading-feedback-engine, Property 3: Validation discrepancy detection correctness
+"""Property-based tests for report validation discrepancy detection.
+
+Feature: trading-feedback-engine
+
+Tests the validation discrepancy detection correctness property from the
+design specification: for any pair of computed metric value and snapshot
+metric value (both finite, non-negative floats), the validation function
+SHALL produce a warning if and only if the percentage difference exceeds 5%.
+The percentage difference SHALL be computed as |computed - snapshot| /
+snapshot * 100 when snapshot > 0, and SHALL flag any non-zero computed value
+when snapshot is 0.
+"""
+from __future__ import annotations
+
+import math
+
+from hypothesis import given, settings
+from hypothesis import strategies as st
+
+from services.reporting.validator import (
+ DISCREPANCY_THRESHOLD_PCT,
+ _check_discrepancy,
+)
+
+# ---------------------------------------------------------------------------
+# Property 3: Validation Discrepancy Detection Correctness
+# Validates: Requirements 4.1, 4.2, 4.3, 4.4
+# ---------------------------------------------------------------------------
+
+# Strategy: finite, non-negative floats in [0, 1e6]
+_metric_float = st.floats(
+ min_value=0, max_value=1e6, allow_nan=False, allow_infinity=False,
+)
+
+
+@given(computed=_metric_float, snapshot=_metric_float)
+@settings(max_examples=100)
+def test_discrepancy_detection_correctness(
+ computed: float,
+ snapshot: float,
+) -> None:
+ """**Validates: Requirements 4.1, 4.2, 4.3, 4.4**
+
+ For any pair of computed and snapshot values (finite, non-negative):
+ - Both zero → no warning
+ - Snapshot zero, computed non-zero → warning (100% discrepancy)
+ - Snapshot > 0 → warning iff |computed - snapshot| / snapshot * 100 > 5%
+ """
+ result = _check_discrepancy("test_field", computed, snapshot)
+
+ if snapshot == 0.0 and computed == 0.0:
+ # Both zero → no discrepancy
+ assert result is None, (
+ f"Expected no warning when both values are 0, got {result}"
+ )
+ elif snapshot == 0.0:
+ # Non-zero computed with zero snapshot → always a warning
+ assert result is not None, (
+ f"Expected warning for non-zero computed={computed} with "
+ f"snapshot=0, got None"
+ )
+ assert result.pct_difference == 100.0, (
+ f"Expected 100% discrepancy for zero snapshot, "
+ f"got {result.pct_difference}%"
+ )
+ else:
+ # Normal case: snapshot > 0
+ expected_pct = abs(computed - snapshot) / snapshot * 100.0
+ if expected_pct > DISCREPANCY_THRESHOLD_PCT:
+ assert result is not None, (
+ f"Expected warning for {expected_pct:.4f}% discrepancy "
+ f"(computed={computed}, snapshot={snapshot}), got None"
+ )
+ # When expected_pct is inf (very small snapshot), both should be inf
+ if math.isinf(expected_pct):
+ assert math.isinf(result.pct_difference), (
+ f"Expected inf pct_difference, got {result.pct_difference}"
+ )
+ else:
+ assert abs(result.pct_difference - round(expected_pct, 4)) < 1e-6, (
+ f"Percentage difference mismatch: "
+ f"expected {round(expected_pct, 4)}, "
+ f"got {result.pct_difference}"
+ )
+ else:
+ assert result is None, (
+ f"Expected no warning for {expected_pct:.4f}% discrepancy "
+ f"(computed={computed}, snapshot={snapshot}), "
+ f"got warning with pct_difference={result.pct_difference}"
+ )
+
+
+@given(computed=_metric_float, snapshot=_metric_float)
+@settings(max_examples=100)
+def test_discrepancy_threshold_is_five_percent(
+ computed: float,
+ snapshot: float,
+) -> None:
+ """**Validates: Requirements 4.1, 4.2, 4.3, 4.4**
+
+ Verify that DISCREPANCY_THRESHOLD_PCT = 5.0 is the threshold used:
+ the function produces a warning if and only if the discrepancy
+ strictly exceeds 5% (a discrepancy of exactly 5% does not warn).
+ """
+ assert DISCREPANCY_THRESHOLD_PCT == 5.0, (
+ f"Expected threshold of 5.0%, got {DISCREPANCY_THRESHOLD_PCT}%"
+ )
+
+ result = _check_discrepancy("threshold_check", computed, snapshot)
+
+ if snapshot == 0.0 and computed == 0.0:
+ assert result is None
+ elif snapshot == 0.0:
+ # 100% > 5% → always warning
+ assert result is not None
+ else:
+ pct = abs(computed - snapshot) / snapshot * 100.0
+ should_warn = pct > 5.0
+ if should_warn:
+ assert result is not None, (
+ f"Discrepancy {pct:.4f}% > 5% but no warning produced"
+ )
+ else:
+ assert result is None, (
+ f"Discrepancy {pct:.4f}% <= 5% but warning produced"
+ )
diff --git a/tests/test_report_api.py b/tests/test_report_api.py
new file mode 100644
index 0000000..9bcccb6
--- /dev/null
+++ b/tests/test_report_api.py
@@ -0,0 +1,256 @@
+"""API integration tests for trading report endpoints.
+
+Tests GET /api/reports (list with pagination/filtering) and
+GET /api/reports/{report_id} (detail with full report_data).
+
+Uses httpx.AsyncClient with the FastAPI app and mocks the module-level
+``pool`` variable in services.api.app.
+
+Requirements validated: 5.4, 5.5, 5.6
+"""
+from __future__ import annotations
+
+import uuid
+from datetime import date, datetime, timezone
+from unittest.mock import AsyncMock, patch
+
+import httpx
+import pytest
+
+from services.api.app import app
+
+# ── Helpers ──────────────────────────────────────────────────────────────
+
+
+class FakeRecord(dict):
+ """Dict subclass that behaves like an asyncpg Record for bracket access."""
+
+ def __getattr__(self, name: str):
+ try:
+ return self[name]
+ except KeyError:
+ raise AttributeError(name) from None
+
+
+def _make_list_record(**overrides) -> FakeRecord:
+ """Build a FakeRecord matching the list-endpoint SELECT columns."""
+ defaults = {
+ "id": uuid.uuid4(),
+ "report_type": "daily",
+ "period_start": date(2025, 1, 15),
+ "period_end": date(2025, 1, 15),
+ "validation_status": "passed",
+ "generated_at": datetime(2025, 1, 15, 21, 30, tzinfo=timezone.utc),
+ }
+ defaults.update(overrides)
+ return FakeRecord(**defaults)
+
+
+def _make_detail_record(**overrides) -> FakeRecord:
+ """Build a FakeRecord matching the detail-endpoint SELECT columns."""
+ defaults = {
+ "id": uuid.uuid4(),
+ "report_type": "daily",
+ "period_start": date(2025, 1, 15),
+ "period_end": date(2025, 1, 15),
+ "report_data": {
+ "pnl": {"realized_pnl": 125.50, "unrealized_pnl": -30.20},
+ "executive_summary": "Test summary",
+ },
+ "validation_status": "passed",
+ "generated_at": datetime(2025, 1, 15, 21, 30, tzinfo=timezone.utc),
+ "created_at": datetime(2025, 1, 15, 21, 30, 5, tzinfo=timezone.utc),
+ }
+ defaults.update(overrides)
+ return FakeRecord(**defaults)
+
+
+_POOL_PATCH = "services.api.app.pool"
+
+
+# ═══════════════════════════════════════════════════════════════════════
+# 1. GET /api/reports — list endpoint
+# Requirements validated: 5.4, 5.6
+# ═══════════════════════════════════════════════════════════════════════
+
+
+class TestListReports:
+ """Tests for GET /api/reports."""
+
+ @pytest.mark.asyncio
+ async def test_default_pagination(self) -> None:
+ """List reports with no params returns rows using default limit/offset."""
+ r1 = _make_list_record()
+ r2 = _make_list_record(
+ report_type="weekly",
+ period_start=date(2025, 1, 13),
+ period_end=date(2025, 1, 17),
+ )
+ mock_pool = AsyncMock()
+ mock_pool.fetch = AsyncMock(return_value=[r1, r2])
+
+ with patch(_POOL_PATCH, mock_pool):
+ async with httpx.AsyncClient(
+ transport=httpx.ASGITransport(app=app), base_url="http://test"
+ ) as client:
+ resp = await client.get("/api/reports")
+
+ assert resp.status_code == 200
+ data = resp.json()
+ assert len(data) == 2
+ # UUID fields are serialized as strings
+ assert data[0]["id"] == str(r1["id"])
+ assert data[0]["report_type"] == "daily"
+ assert data[0]["period_start"] == "2025-01-15"
+ assert data[0]["period_end"] == "2025-01-15"
+ assert data[0]["validation_status"] == "passed"
+ assert "generated_at" in data[0]
+
+ # pool.fetch called with default limit=20, offset=0
+ call_args = mock_pool.fetch.call_args
+ sql = call_args[0][0]
+ assert "LIMIT" in sql
+ assert "OFFSET" in sql
+ # Last two positional args are limit and offset
+ assert call_args[0][-2] == 20
+ assert call_args[0][-1] == 0
+
+ @pytest.mark.asyncio
+ async def test_filter_by_report_type(self) -> None:
+ """Filtering by report_type=weekly passes the value to the query."""
+ r1 = _make_list_record(report_type="weekly")
+ mock_pool = AsyncMock()
+ mock_pool.fetch = AsyncMock(return_value=[r1])
+
+ with patch(_POOL_PATCH, mock_pool):
+ async with httpx.AsyncClient(
+ transport=httpx.ASGITransport(app=app), base_url="http://test"
+ ) as client:
+ resp = await client.get("/api/reports", params={"report_type": "weekly"})
+
+ assert resp.status_code == 200
+ data = resp.json()
+ assert len(data) == 1
+ assert data[0]["report_type"] == "weekly"
+
+ # Verify the SQL includes a report_type condition
+ call_args = mock_pool.fetch.call_args
+ sql = call_args[0][0]
+ assert "report_type" in sql
+ # "weekly" should be among the positional params
+ assert "weekly" in call_args[0]
+
+ @pytest.mark.asyncio
+ async def test_filter_by_date_range(self) -> None:
+ """Filtering by start_date and end_date passes dates to the query."""
+ mock_pool = AsyncMock()
+ mock_pool.fetch = AsyncMock(return_value=[])
+
+ with patch(_POOL_PATCH, mock_pool):
+ async with httpx.AsyncClient(
+ transport=httpx.ASGITransport(app=app), base_url="http://test"
+ ) as client:
+ resp = await client.get(
+ "/api/reports",
+ params={"start_date": "2025-01-01", "end_date": "2025-01-31"},
+ )
+
+ assert resp.status_code == 200
+ call_args = mock_pool.fetch.call_args
+ sql = call_args[0][0]
+ assert "period_start" in sql
+ assert "period_end" in sql
+ # Date strings should be among the positional params
+ assert "2025-01-01" in call_args[0]
+ assert "2025-01-31" in call_args[0]
+
+ @pytest.mark.asyncio
+ async def test_invalid_report_type_returns_400(self) -> None:
+ """An invalid report_type value returns HTTP 400."""
+ mock_pool = AsyncMock()
+
+ with patch(_POOL_PATCH, mock_pool):
+ async with httpx.AsyncClient(
+ transport=httpx.ASGITransport(app=app), base_url="http://test"
+ ) as client:
+ resp = await client.get(
+ "/api/reports", params={"report_type": "monthly"}
+ )
+
+ assert resp.status_code == 400
+ assert "daily" in resp.json()["detail"].lower() or "weekly" in resp.json()["detail"].lower()
+ # pool.fetch should NOT have been called
+ mock_pool.fetch.assert_not_awaited()
+
+ @pytest.mark.asyncio
+ async def test_invalid_date_format_returns_400(self) -> None:
+ """A malformed start_date returns HTTP 400."""
+ mock_pool = AsyncMock()
+
+ with patch(_POOL_PATCH, mock_pool):
+ async with httpx.AsyncClient(
+ transport=httpx.ASGITransport(app=app), base_url="http://test"
+ ) as client:
+ resp = await client.get(
+ "/api/reports", params={"start_date": "not-a-date"}
+ )
+
+ assert resp.status_code == 400
+ assert "YYYY-MM-DD" in resp.json()["detail"]
+ mock_pool.fetch.assert_not_awaited()
+
+
+# ═══════════════════════════════════════════════════════════════════════
+# 2. GET /api/reports/{report_id} — detail endpoint
+# Requirements validated: 5.4, 5.5
+# ═══════════════════════════════════════════════════════════════════════
+
+
+class TestGetReport:
+ """Tests for GET /api/reports/{report_id}."""
+
+ @pytest.mark.asyncio
+ async def test_valid_id_returns_full_report(self) -> None:
+ """A valid report_id returns the full report including report_data."""
+ record = _make_detail_record()
+ mock_pool = AsyncMock()
+ mock_pool.fetchrow = AsyncMock(return_value=record)
+
+ report_id = str(record["id"])
+
+ with patch(_POOL_PATCH, mock_pool):
+ async with httpx.AsyncClient(
+ transport=httpx.ASGITransport(app=app), base_url="http://test"
+ ) as client:
+ resp = await client.get(f"/api/reports/{report_id}")
+
+ assert resp.status_code == 200
+ data = resp.json()
+ assert data["id"] == report_id
+ assert data["report_type"] == "daily"
+ assert data["period_start"] == "2025-01-15"
+ assert data["period_end"] == "2025-01-15"
+ assert data["validation_status"] == "passed"
+ assert "generated_at" in data
+ assert "created_at" in data
+ # report_data is included as a dict
+ assert isinstance(data["report_data"], dict)
+ assert data["report_data"]["pnl"]["realized_pnl"] == 125.50
+ assert data["report_data"]["executive_summary"] == "Test summary"
+
+ @pytest.mark.asyncio
+ async def test_nonexistent_id_returns_404(self) -> None:
+ """A non-existent report_id returns HTTP 404."""
+ mock_pool = AsyncMock()
+ mock_pool.fetchrow = AsyncMock(return_value=None)
+
+ fake_id = str(uuid.uuid4())
+
+ with patch(_POOL_PATCH, mock_pool):
+ async with httpx.AsyncClient(
+ transport=httpx.ASGITransport(app=app), base_url="http://test"
+ ) as client:
+ resp = await client.get(f"/api/reports/{fake_id}")
+
+ assert resp.status_code == 404
+ assert "not found" in resp.json()["detail"].lower()
diff --git a/tests/test_report_collector.py b/tests/test_report_collector.py
new file mode 100644
index 0000000..5be6a1f
--- /dev/null
+++ b/tests/test_report_collector.py
@@ -0,0 +1,264 @@
+"""Unit tests for the report data collector.
+
+Tests the CollectedData dataclass defaults, _row_dict UUID conversion,
+and collect_report_data with mocked asyncpg pool.
+
+Requirements: 1.1, 1.2, 1.3, 1.4, 1.5
+"""
+from __future__ import annotations
+
+import uuid
+from datetime import date
+from unittest.mock import AsyncMock, MagicMock
+
+import pytest
+
+from services.reporting.collector import CollectedData, _row_dict, collect_report_data
+
+# ===================================================================
+# _row_dict tests
+# ===================================================================
+
+
+class TestRowDict:
+ """Tests for _row_dict UUID→str conversion."""
+
+ def test_uuid_fields_converted_to_str(self):
+ """UUID values in the record are converted to strings."""
+ test_uuid = uuid.uuid4()
+
+ # A plain dict subclass is enough to exercise _row_dict.
+ class FakeRecord(dict):
+ pass
+
+ record = FakeRecord(id=test_uuid, name="test", count=42)
+ result = _row_dict(record)
+
+ assert result["id"] == str(test_uuid)
+ assert result["name"] == "test"
+ assert result["count"] == 42
+
+ def test_no_uuid_fields_unchanged(self):
+ """Non-UUID values pass through unchanged."""
+
+ class FakeRecord(dict):
+ pass
+
+ record = FakeRecord(ticker="AAPL", price=185.50, active=True)
+ result = _row_dict(record)
+
+ assert result["ticker"] == "AAPL"
+ assert result["price"] == 185.50
+ assert result["active"] is True
+
+ def test_multiple_uuid_fields(self):
+ """Multiple UUID fields are all converted."""
+
+ class FakeRecord(dict):
+ pass
+
+ id1 = uuid.uuid4()
+ id2 = uuid.uuid4()
+ record = FakeRecord(id=id1, recommendation_id=id2, ticker="MSFT")
+ result = _row_dict(record)
+
+ assert result["id"] == str(id1)
+ assert result["recommendation_id"] == str(id2)
+ assert result["ticker"] == "MSFT"
+
+ def test_empty_record(self):
+ """Empty record returns empty dict."""
+
+ class FakeRecord(dict):
+ pass
+
+ record = FakeRecord()
+ result = _row_dict(record)
+ assert result == {}
+
+
+# ===================================================================
+# CollectedData defaults
+# ===================================================================
+
+
+class TestCollectedDataDefaults:
+ """Tests for CollectedData dataclass default values."""
+
+ def test_default_empty_lists(self):
+ """All list fields default to empty lists."""
+ data = CollectedData()
+ assert data.trading_decisions == []
+ assert data.orders == []
+ assert data.open_positions == []
+ assert data.closed_positions == []
+ assert data.recommendations == []
+ assert data.prediction_outcomes == []
+ assert data.model_metric_snapshots == []
+ assert data.circuit_breaker_events == []
+
+ def test_default_none_snapshots(self):
+ """Snapshot fields default to None."""
+ data = CollectedData()
+ assert data.portfolio_snapshot is None
+ assert data.previous_portfolio_snapshot is None
+
+ def test_default_zero_balance(self):
+ """Reserve pool balance defaults to 0.0."""
+ data = CollectedData()
+ assert data.reserve_pool_balance == 0.0
+
+ def test_independent_list_instances(self):
+ """Each CollectedData instance has independent list instances."""
+ data1 = CollectedData()
+ data2 = CollectedData()
+ data1.trading_decisions.append({"id": "test"})
+ assert data2.trading_decisions == []
+
+
+# ===================================================================
+# collect_report_data with mocked pool
+# ===================================================================
+
+
+def _make_mock_pool():
+ """Create a mock asyncpg pool with async context manager support."""
+ pool = MagicMock()
+ conn = AsyncMock()
+
+ # pool.acquire() returns a sync object that supports async context manager
+ ctx = MagicMock()
+ ctx.__aenter__ = AsyncMock(return_value=conn)
+ ctx.__aexit__ = AsyncMock(return_value=False)
+ pool.acquire.return_value = ctx
+
+ return pool, conn
+
+
+class TestCollectReportData:
+ """Tests for collect_report_data with mocked asyncpg."""
+
+ @pytest.mark.asyncio
+ async def test_zero_activity_returns_empty_lists(self):
+ """When no data exists, all lists are empty and snapshots are None."""
+ pool, conn = _make_mock_pool()
+
+ # All queries return empty results
+ conn.fetch.return_value = []
+ conn.fetchrow.return_value = None
+
+ result = await collect_report_data(
+ pool, date(2025, 1, 15), date(2025, 1, 15)
+ )
+
+ assert isinstance(result, CollectedData)
+ assert result.trading_decisions == []
+ assert result.orders == []
+ assert result.open_positions == []
+ assert result.closed_positions == []
+ assert result.portfolio_snapshot is None
+ assert result.previous_portfolio_snapshot is None
+ assert result.recommendations == []
+ assert result.prediction_outcomes == []
+ assert result.model_metric_snapshots == []
+ assert result.circuit_breaker_events == []
+ assert result.reserve_pool_balance == 0.0
+
+ @pytest.mark.asyncio
+ async def test_queries_use_correct_date_range(self):
+ """Each collector query runs exactly once for the requested period."""
+ pool, conn = _make_mock_pool()
+ conn.fetch.return_value = []
+ conn.fetchrow.return_value = None
+
+ start = date(2025, 1, 13)
+ end = date(2025, 1, 17)
+
+ await collect_report_data(pool, start, end)
+
+ # Verify fetch was called (trading_decisions, orders, open_positions,
+ # closed_positions, recommendations, prediction_outcomes,
+ # model_metric_snapshots, circuit_breaker_events)
+ assert conn.fetch.call_count == 8
+
+ # Verify fetchrow was called (portfolio_snapshot, previous_snapshot,
+ # reserve_pool_balance)
+ assert conn.fetchrow.call_count == 3
+
+ @pytest.mark.asyncio
+ async def test_reserve_pool_balance_from_ledger(self):
+ """Reserve pool balance is read from the latest ledger entry."""
+ pool, conn = _make_mock_pool()
+ conn.fetch.return_value = []
+
+ # Mock fetchrow to return different values for different queries
+ balance_row = {"balance_after": 450.75}
+
+ call_count = 0
+
+ async def mock_fetchrow(query, *args):
+ nonlocal call_count
+ call_count += 1
+ if "reserve_pool_ledger" in query:
+ return balance_row
+ return None
+
+ conn.fetchrow.side_effect = mock_fetchrow
+
+ result = await collect_report_data(
+ pool, date(2025, 1, 15), date(2025, 1, 15)
+ )
+
+ assert result.reserve_pool_balance == 450.75
+
+ @pytest.mark.asyncio
+ async def test_portfolio_snapshots_populated(self):
+ """Portfolio snapshot and previous snapshot are populated when data exists."""
+ pool, conn = _make_mock_pool()
+ conn.fetch.return_value = []
+
+ current_snapshot = {
+ "id": uuid.uuid4(),
+ "snapshot_date": date(2025, 1, 15),
+ "portfolio_value": 10500.0,
+ "active_pool": 8000.0,
+ "reserve_pool": 2500.0,
+ "cumulative_return": 0.05,
+ }
+ previous_snapshot = {
+ "id": uuid.uuid4(),
+ "snapshot_date": date(2025, 1, 14),
+ "portfolio_value": 10000.0,
+ "active_pool": 7500.0,
+ "reserve_pool": 2500.0,
+ "cumulative_return": 0.0,
+ }
+
+ call_count = 0
+
+ async def mock_fetchrow(query, *args):
+ nonlocal call_count
+ call_count += 1
+ if "reserve_pool_ledger" in query:
+ return None
+ if "snapshot_date >=" in query:
+ # current snapshot query (snapshot_date >= $1 AND snapshot_date <= $2)
+ return current_snapshot
+ if "snapshot_date <" in query:
+ # previous snapshot query (snapshot_date < $1)
+ return previous_snapshot
+ return None
+
+ conn.fetchrow.side_effect = mock_fetchrow
+
+ result = await collect_report_data(
+ pool, date(2025, 1, 15), date(2025, 1, 15)
+ )
+
+ assert result.portfolio_snapshot is not None
+ assert result.portfolio_snapshot["portfolio_value"] == 10500.0
+ # UUID fields should be converted to str
+ assert isinstance(result.portfolio_snapshot["id"], str)
+
+ assert result.previous_portfolio_snapshot is not None
+ assert result.previous_portfolio_snapshot["portfolio_value"] == 10000.0
diff --git a/tests/test_report_generator.py b/tests/test_report_generator.py
new file mode 100644
index 0000000..f540ddd
--- /dev/null
+++ b/tests/test_report_generator.py
@@ -0,0 +1,678 @@
+"""Unit tests for report generator orchestrator.
+
+Tests the orchestration flow in services.reporting.generator with mocked
+dependencies (collector, section builders, validator, summarizer).
+
+Requirements validated: 5.1, 5.2, 5.3
+"""
+from __future__ import annotations
+
+import uuid
+from datetime import date, datetime, timezone
+from unittest.mock import AsyncMock, patch
+
+import pytest
+
+from services.reporting.collector import CollectedData
+from services.reporting.generator import (
+ _in_progress_jobs,
+ generate_report,
+ process_report_job,
+ store_report,
+)
+from services.reporting.models import (
+ ModelQualitySection,
+ ModelQualityWindow,
+ PLSection,
+ PositionPerformanceSection,
+ RecommendationAccuracySection,
+ ReportData,
+ ReportType,
+ RiskMetricsSection,
+ ValidationStatus,
+)
+
+# ── Helpers ──────────────────────────────────────────────────────────────
+
+
+def _make_report_data(**overrides: object) -> ReportData:
+ """Build a minimal valid ReportData for testing."""
+ defaults = {
+ "pnl": PLSection(
+ realized_pnl=100.0,
+ unrealized_pnl=-20.0,
+ daily_return=0.01,
+ cumulative_return=0.05,
+ win_count=5,
+ loss_count=2,
+ win_rate=0.71,
+ profit_factor=2.0,
+ sharpe_ratio=1.2,
+ summary="P&L summary",
+ ),
+ "recommendation_accuracy": RecommendationAccuracySection(
+ total_evaluated=10,
+ act_count=6,
+ skip_count=4,
+ acted_win_rate=0.67,
+ avg_confidence_acted=0.75,
+ avg_confidence_skipped=0.40,
+ summary="Rec accuracy summary",
+ ),
+ "position_performance": PositionPerformanceSection(
+ positions=[],
+ summary="Position summary",
+ ),
+ "risk_metrics": RiskMetricsSection(
+ current_risk_tier="moderate",
+ portfolio_heat=0.12,
+ max_drawdown=0.06,
+ current_drawdown_pct=0.02,
+ reserve_pool_balance=500.0,
+ circuit_breaker_event_count=0,
+ summary="Risk summary",
+ ),
+ "model_quality": ModelQualitySection(
+ windows=[
+ ModelQualityWindow(
+ lookback="7d",
+ win_rate=0.65,
+ directional_accuracy=0.62,
+ information_coefficient=0.08,
+ calibration_error=0.12,
+ brier_score=0.22,
+ ),
+ ],
+ summary="Model quality summary",
+ ),
+ "executive_summary": "Executive summary text",
+ "validation_status": ValidationStatus.PASSED,
+ "generated_at": datetime(2025, 1, 15, 21, 30, tzinfo=timezone.utc),
+ "period_start": date(2025, 1, 15),
+ "period_end": date(2025, 1, 15),
+ "report_type": ReportType.DAILY,
+ }
+ defaults.update(overrides)
+ return ReportData(**defaults)
+
+
+def _empty_collected_data() -> CollectedData:
+ """Build a zero-activity CollectedData."""
+ return CollectedData()
+
+
+def _mock_pool() -> AsyncMock:
+ """Create a mock asyncpg pool."""
+ pool = AsyncMock()
+ return pool
+
+
+# Patch targets (all in the generator module namespace)
+_PATCH_COLLECT = "services.reporting.generator.collect_report_data"
+_PATCH_BUILD_PNL = "services.reporting.generator.build_pnl_section"
+_PATCH_BUILD_REC = "services.reporting.generator.build_recommendation_accuracy_section"
+_PATCH_BUILD_POS = "services.reporting.generator.build_position_performance_section"
+_PATCH_BUILD_RISK = "services.reporting.generator.build_risk_metrics_section"
+_PATCH_BUILD_MQ = "services.reporting.generator.build_model_quality_section"
+_PATCH_VALIDATE_REC = "services.reporting.generator.validate_recommendation_accuracy"
+_PATCH_VALIDATE_MQ = "services.reporting.generator.validate_model_quality"
+_PATCH_COMPUTE_STATUS = "services.reporting.generator.compute_validation_status"
+_PATCH_SUMMARIZE = "services.reporting.generator.summarize_section"
+_PATCH_EXEC_SUMMARY = "services.reporting.generator.generate_executive_summary"
+_PATCH_RESOLVER = "services.reporting.generator.AgentConfigResolver"
+
+
+# ═══════════════════════════════════════════════════════════════════════
+# 1. generate_report — orchestration flow
+# Requirements validated: 5.1
+# ═══════════════════════════════════════════════════════════════════════
+
+
+class TestGenerateReport:
+ """Tests for generate_report orchestration."""
+
+ @pytest.mark.asyncio
+ async def test_orchestration_calls_all_steps(self) -> None:
+ """generate_report calls the collector, builders, validators, and summarizers."""
+ pool = _mock_pool()
+ collected = _empty_collected_data()
+
+ pnl = PLSection(
+ realized_pnl=0, unrealized_pnl=0, daily_return=0,
+ cumulative_return=0, win_count=0, loss_count=0,
+ win_rate=0, profit_factor=0, sharpe_ratio=0,
+ )
+ rec = RecommendationAccuracySection(
+ total_evaluated=0, act_count=0, skip_count=0,
+ acted_win_rate=0, avg_confidence_acted=0, avg_confidence_skipped=0,
+ )
+ pos = PositionPerformanceSection()
+ risk = RiskMetricsSection(
+ current_risk_tier="low", portfolio_heat=0, max_drawdown=0,
+ current_drawdown_pct=0, reserve_pool_balance=0,
+ circuit_breaker_event_count=0,
+ )
+ mq = ModelQualitySection()
+
+ with (
+ patch(_PATCH_COLLECT, new_callable=AsyncMock, return_value=collected) as mock_collect,
+ patch(_PATCH_BUILD_PNL, return_value=pnl) as mock_pnl,
+ patch(_PATCH_BUILD_REC, return_value=rec) as mock_rec,
+ patch(_PATCH_BUILD_POS, return_value=pos) as mock_pos,
+ patch(_PATCH_BUILD_RISK, return_value=risk) as mock_risk,
+ patch(_PATCH_BUILD_MQ, return_value=mq) as mock_mq,
+ patch(_PATCH_VALIDATE_REC, return_value=[]) as mock_val_rec,
+ patch(_PATCH_VALIDATE_MQ, return_value=[]) as mock_val_mq,
+ patch(_PATCH_COMPUTE_STATUS, return_value=ValidationStatus.PASSED) as mock_status,
+ patch(_PATCH_SUMMARIZE, new_callable=AsyncMock, return_value="summary") as mock_sum,
+ patch(_PATCH_EXEC_SUMMARY, new_callable=AsyncMock, return_value="exec summary") as mock_exec,
+ patch(_PATCH_RESOLVER) as mock_resolver_cls,
+ ):
+ result = await generate_report(
+ pool, ReportType.DAILY, date(2025, 1, 15), date(2025, 1, 15),
+ )
+
+ # Collector called with pool and dates
+ mock_collect.assert_awaited_once_with(pool, date(2025, 1, 15), date(2025, 1, 15))
+
+ # All section builders called with collected data
+ mock_pnl.assert_called_once_with(collected)
+ mock_rec.assert_called_once_with(collected)
+ mock_pos.assert_called_once_with(collected)
+ mock_risk.assert_called_once_with(collected)
+ mock_mq.assert_called_once_with(collected)
+
+ # Validators called
+ mock_val_rec.assert_called_once_with(rec, collected.prediction_outcomes)
+ mock_val_mq.assert_called_once_with(mq, collected.model_metric_snapshots)
+
+ # Summarizer called 5 times (one per section)
+ assert mock_sum.await_count == 5
+
+ # Executive summary called
+ mock_exec.assert_awaited_once()
+
+ # Validation status computed
+ mock_status.assert_called_once()
+
+ # Result is a ReportData
+ assert isinstance(result, ReportData)
+ assert result.report_type == ReportType.DAILY
+ assert result.period_start == date(2025, 1, 15)
+ assert result.period_end == date(2025, 1, 15)
+ assert result.executive_summary == "exec summary"
+
+ @pytest.mark.asyncio
+ async def test_zero_activity_report(self) -> None:
+ """generate_report handles zero-activity data (empty CollectedData)."""
+ pool = _mock_pool()
+ collected = _empty_collected_data()
+
+ pnl = PLSection(
+ realized_pnl=0, unrealized_pnl=0, daily_return=0,
+ cumulative_return=0, win_count=0, loss_count=0,
+ win_rate=0, profit_factor=0, sharpe_ratio=0,
+ )
+ rec = RecommendationAccuracySection(
+ total_evaluated=0, act_count=0, skip_count=0,
+ acted_win_rate=0, avg_confidence_acted=0, avg_confidence_skipped=0,
+ )
+ pos = PositionPerformanceSection()
+ risk = RiskMetricsSection(
+ current_risk_tier="unknown", portfolio_heat=0, max_drawdown=0,
+ current_drawdown_pct=0, reserve_pool_balance=0,
+ circuit_breaker_event_count=0,
+ )
+ mq = ModelQualitySection()
+
+ with (
+ patch(_PATCH_COLLECT, new_callable=AsyncMock, return_value=collected),
+ patch(_PATCH_BUILD_PNL, return_value=pnl),
+ patch(_PATCH_BUILD_REC, return_value=rec),
+ patch(_PATCH_BUILD_POS, return_value=pos),
+ patch(_PATCH_BUILD_RISK, return_value=risk),
+ patch(_PATCH_BUILD_MQ, return_value=mq),
+ patch(_PATCH_VALIDATE_REC, return_value=[]),
+ patch(_PATCH_VALIDATE_MQ, return_value=[]),
+ patch(_PATCH_COMPUTE_STATUS, return_value=ValidationStatus.PASSED),
+ patch(_PATCH_SUMMARIZE, new_callable=AsyncMock, return_value="No activity"),
+ patch(_PATCH_EXEC_SUMMARY, new_callable=AsyncMock, return_value="No trading activity"),
+ patch(_PATCH_RESOLVER),
+ ):
+ result = await generate_report(
+ pool, ReportType.DAILY, date(2025, 1, 15), date(2025, 1, 15),
+ )
+
+ assert result.pnl.realized_pnl == 0.0
+ assert result.pnl.win_count == 0
+ assert result.recommendation_accuracy.total_evaluated == 0
+ assert result.position_performance.positions == []
+ assert result.risk_metrics.current_risk_tier == "unknown"
+ assert result.validation_status == ValidationStatus.PASSED
+
+ @pytest.mark.asyncio
+ async def test_validation_warnings_attached(self) -> None:
+ """Validation warnings from validators are attached to sections."""
+ pool = _mock_pool()
+ collected = _empty_collected_data()
+
+ from services.reporting.models import ValidationWarning
+
+ rec_warning = ValidationWarning(
+ field_name="acted_win_rate",
+ computed_value=0.80,
+ snapshot_value=0.60,
+ pct_difference=33.33,
+ )
+
+ pnl = PLSection(
+ realized_pnl=0, unrealized_pnl=0, daily_return=0,
+ cumulative_return=0, win_count=0, loss_count=0,
+ win_rate=0, profit_factor=0, sharpe_ratio=0,
+ )
+ rec = RecommendationAccuracySection(
+ total_evaluated=5, act_count=3, skip_count=2,
+ acted_win_rate=0.80, avg_confidence_acted=0.7, avg_confidence_skipped=0.4,
+ )
+ pos = PositionPerformanceSection()
+ risk = RiskMetricsSection(
+ current_risk_tier="moderate", portfolio_heat=0.1, max_drawdown=0.05,
+ current_drawdown_pct=0.02, reserve_pool_balance=100,
+ circuit_breaker_event_count=0,
+ )
+ mq = ModelQualitySection()
+
+ with (
+ patch(_PATCH_COLLECT, new_callable=AsyncMock, return_value=collected),
+ patch(_PATCH_BUILD_PNL, return_value=pnl),
+ patch(_PATCH_BUILD_REC, return_value=rec),
+ patch(_PATCH_BUILD_POS, return_value=pos),
+ patch(_PATCH_BUILD_RISK, return_value=risk),
+ patch(_PATCH_BUILD_MQ, return_value=mq),
+ patch(_PATCH_VALIDATE_REC, return_value=[rec_warning]),
+ patch(_PATCH_VALIDATE_MQ, return_value=[]),
+ patch(_PATCH_COMPUTE_STATUS, return_value=ValidationStatus.WARNINGS),
+ patch(_PATCH_SUMMARIZE, new_callable=AsyncMock, return_value="summary"),
+ patch(_PATCH_EXEC_SUMMARY, new_callable=AsyncMock, return_value="exec"),
+ patch(_PATCH_RESOLVER),
+ ):
+ result = await generate_report(
+ pool, ReportType.DAILY, date(2025, 1, 15), date(2025, 1, 15),
+ )
+
+ assert result.validation_status == ValidationStatus.WARNINGS
+ assert len(result.recommendation_accuracy.validation_warnings) == 1
+ assert result.recommendation_accuracy.validation_warnings[0].field_name == "acted_win_rate"
+
+ @pytest.mark.asyncio
+ async def test_weekly_report_type(self) -> None:
+ """generate_report correctly sets weekly report type."""
+ pool = _mock_pool()
+ collected = _empty_collected_data()
+
+ pnl = PLSection(
+ realized_pnl=0, unrealized_pnl=0, daily_return=0,
+ cumulative_return=0, win_count=0, loss_count=0,
+ win_rate=0, profit_factor=0, sharpe_ratio=0,
+ )
+ rec = RecommendationAccuracySection(
+ total_evaluated=0, act_count=0, skip_count=0,
+ acted_win_rate=0, avg_confidence_acted=0, avg_confidence_skipped=0,
+ )
+ pos = PositionPerformanceSection()
+ risk = RiskMetricsSection(
+ current_risk_tier="low", portfolio_heat=0, max_drawdown=0,
+ current_drawdown_pct=0, reserve_pool_balance=0,
+ circuit_breaker_event_count=0,
+ )
+ mq = ModelQualitySection()
+
+ with (
+ patch(_PATCH_COLLECT, new_callable=AsyncMock, return_value=collected),
+ patch(_PATCH_BUILD_PNL, return_value=pnl),
+ patch(_PATCH_BUILD_REC, return_value=rec),
+ patch(_PATCH_BUILD_POS, return_value=pos),
+ patch(_PATCH_BUILD_RISK, return_value=risk),
+ patch(_PATCH_BUILD_MQ, return_value=mq),
+ patch(_PATCH_VALIDATE_REC, return_value=[]),
+ patch(_PATCH_VALIDATE_MQ, return_value=[]),
+ patch(_PATCH_COMPUTE_STATUS, return_value=ValidationStatus.PASSED),
+ patch(_PATCH_SUMMARIZE, new_callable=AsyncMock, return_value="summary"),
+ patch(_PATCH_EXEC_SUMMARY, new_callable=AsyncMock, return_value="exec"),
+ patch(_PATCH_RESOLVER),
+ ):
+ result = await generate_report(
+ pool, ReportType.WEEKLY, date(2025, 1, 13), date(2025, 1, 17),
+ )
+
+ assert result.report_type == ReportType.WEEKLY
+ assert result.period_start == date(2025, 1, 13)
+ assert result.period_end == date(2025, 1, 17)
+
+
+# ═══════════════════════════════════════════════════════════════════════
+# 2. store_report — upsert behavior
+# Requirements validated: 5.2, 5.3
+# ═══════════════════════════════════════════════════════════════════════
+
+
+class TestStoreReport:
+ """Tests for store_report upsert behavior."""
+
+ @pytest.mark.asyncio
+ async def test_store_calls_upsert_sql(self) -> None:
+ """store_report calls pool.fetchrow with the upsert SQL and correct params."""
+ pool = _mock_pool()
+ report_id = str(uuid.uuid4())
+ pool.fetchrow = AsyncMock(return_value={"id": report_id})
+
+ report = _make_report_data()
+ result = await store_report(pool, report)
+
+ assert result == report_id
+ pool.fetchrow.assert_awaited_once()
+
+ call_args = pool.fetchrow.call_args
+ sql = call_args[0][0]
+ assert "INSERT INTO trading_reports" in sql
+ assert "ON CONFLICT" in sql
+ assert "DO UPDATE" in sql
+
+ # Verify the positional parameters
+ assert call_args[0][1] == report.report_type.value
+ assert call_args[0][2] == report.period_start
+ assert call_args[0][3] == report.period_end
+ # param 4 is the JSON string
+ assert call_args[0][4] == report.model_dump_json()
+ assert call_args[0][5] == report.validation_status.value
+ assert call_args[0][6] == report.generated_at
+
+ @pytest.mark.asyncio
+ async def test_store_returns_uuid_string(self) -> None:
+ """store_report returns the UUID as a string."""
+ pool = _mock_pool()
+ expected_id = str(uuid.uuid4())
+ pool.fetchrow = AsyncMock(return_value={"id": expected_id})
+
+ report = _make_report_data()
+ result = await store_report(pool, report)
+
+ assert isinstance(result, str)
+ assert result == expected_id
+
+ @pytest.mark.asyncio
+ async def test_store_upsert_regeneration(self) -> None:
+ """store_report handles regeneration (upsert) for existing period."""
+ pool = _mock_pool()
+ report_id = str(uuid.uuid4())
+ pool.fetchrow = AsyncMock(return_value={"id": report_id})
+
+ # First store
+ report1 = _make_report_data()
+ result1 = await store_report(pool, report1)
+
+ # Second store (regeneration) — same period, different data
+ report2 = _make_report_data(
+ executive_summary="Updated executive summary",
+ generated_at=datetime(2025, 1, 15, 22, 0, tzinfo=timezone.utc),
+ )
+ result2 = await store_report(pool, report2)
+
+ # Both calls succeed (upsert handles the conflict)
+ assert result1 == report_id
+ assert result2 == report_id
+ assert pool.fetchrow.await_count == 2
+
+
+# ═══════════════════════════════════════════════════════════════════════
+# 3. process_report_job — job processing
+# Requirements validated: 5.1, 5.3
+# ═══════════════════════════════════════════════════════════════════════
+
+
+class TestProcessReportJob:
+ """Tests for process_report_job."""
+
+ @pytest.mark.asyncio
+ async def test_valid_job_calls_generate_and_store(self) -> None:
+ """A valid job payload triggers generate_report and store_report."""
+ pool = _mock_pool()
+ report = _make_report_data()
+
+ with (
+ patch(
+ "services.reporting.generator.generate_report",
+ new_callable=AsyncMock,
+ return_value=report,
+ ) as mock_gen,
+ patch(
+ "services.reporting.generator.store_report",
+ new_callable=AsyncMock,
+ return_value=str(uuid.uuid4()),
+ ) as mock_store,
+ ):
+ job = {
+ "report_type": "daily",
+ "period_start": "2025-01-15",
+ "period_end": "2025-01-15",
+ }
+ await process_report_job(pool, job)
+
+ mock_gen.assert_awaited_once_with(
+ pool, ReportType.DAILY, date(2025, 1, 15), date(2025, 1, 15),
+ )
+ mock_store.assert_awaited_once_with(pool, report)
+
+ @pytest.mark.asyncio
+ async def test_invalid_report_type_returns_early(self) -> None:
+ """An invalid report_type in the job payload causes early return."""
+ pool = _mock_pool()
+
+ with (
+ patch(
+ "services.reporting.generator.generate_report",
+ new_callable=AsyncMock,
+ ) as mock_gen,
+ ):
+ job = {
+ "report_type": "invalid_type",
+ "period_start": "2025-01-15",
+ "period_end": "2025-01-15",
+ }
+ await process_report_job(pool, job)
+
+ mock_gen.assert_not_awaited()
+
+ @pytest.mark.asyncio
+ async def test_invalid_date_returns_early(self) -> None:
+ """An invalid date in the job payload causes early return."""
+ pool = _mock_pool()
+
+ with (
+ patch(
+ "services.reporting.generator.generate_report",
+ new_callable=AsyncMock,
+ ) as mock_gen,
+ ):
+ job = {
+ "report_type": "daily",
+ "period_start": "not-a-date",
+ "period_end": "2025-01-15",
+ }
+ await process_report_job(pool, job)
+
+ mock_gen.assert_not_awaited()
+
+ @pytest.mark.asyncio
+ async def test_missing_fields_returns_early(self) -> None:
+ """Missing fields in the job payload cause an early return."""
+ pool = _mock_pool()
+
+ with (
+ patch(
+ "services.reporting.generator.generate_report",
+ new_callable=AsyncMock,
+ ) as mock_gen,
+ ):
+ job = {}
+ await process_report_job(pool, job)
+
+ mock_gen.assert_not_awaited()
+
+ @pytest.mark.asyncio
+ async def test_duplicate_job_rejected(self) -> None:
+ """A duplicate in-progress job is rejected without calling generate_report."""
+ pool = _mock_pool()
+ key = "daily:2025-01-20:2025-01-20"
+
+ # Simulate an in-progress job
+ _in_progress_jobs.add(key)
+ try:
+ with (
+ patch(
+ "services.reporting.generator.generate_report",
+ new_callable=AsyncMock,
+ ) as mock_gen,
+ ):
+ job = {
+ "report_type": "daily",
+ "period_start": "2025-01-20",
+ "period_end": "2025-01-20",
+ }
+ await process_report_job(pool, job)
+
+ mock_gen.assert_not_awaited()
+ finally:
+ _in_progress_jobs.discard(key)
+
+ @pytest.mark.asyncio
+ async def test_job_cleans_up_in_progress_on_success(self) -> None:
+ """After successful completion, the job key is removed from _in_progress_jobs."""
+ pool = _mock_pool()
+ report = _make_report_data(
+ period_start=date(2025, 1, 21),
+ period_end=date(2025, 1, 21),
+ )
+ key = "daily:2025-01-21:2025-01-21"
+
+ with (
+ patch(
+ "services.reporting.generator.generate_report",
+ new_callable=AsyncMock,
+ return_value=report,
+ ),
+ patch(
+ "services.reporting.generator.store_report",
+ new_callable=AsyncMock,
+ return_value=str(uuid.uuid4()),
+ ),
+ ):
+ job = {
+ "report_type": "daily",
+ "period_start": "2025-01-21",
+ "period_end": "2025-01-21",
+ }
+ await process_report_job(pool, job)
+
+ assert key not in _in_progress_jobs
+
+ @pytest.mark.asyncio
+ async def test_job_cleans_up_in_progress_on_failure(self) -> None:
+ """After all retries fail, the job key is still removed from _in_progress_jobs."""
+ pool = _mock_pool()
+ key = "daily:2025-01-22:2025-01-22"
+
+ with (
+ patch(
+ "services.reporting.generator.generate_report",
+ new_callable=AsyncMock,
+ side_effect=RuntimeError("DB down"),
+ ),
+ patch("asyncio.sleep", new_callable=AsyncMock),
+ ):
+ job = {
+ "report_type": "daily",
+ "period_start": "2025-01-22",
+ "period_end": "2025-01-22",
+ }
+ await process_report_job(pool, job)
+
+ assert key not in _in_progress_jobs
+
+ @pytest.mark.asyncio
+ async def test_retries_on_failure(self) -> None:
+ """process_report_job retries after transient failures and succeeds on the third attempt."""
+ pool = _mock_pool()
+ report = _make_report_data(
+ period_start=date(2025, 1, 23),
+ period_end=date(2025, 1, 23),
+ )
+
+ call_count = 0
+
+ async def _gen_side_effect(*args, **kwargs):
+ nonlocal call_count
+ call_count += 1
+ if call_count < 3:
+ raise RuntimeError("Transient error")
+ return report
+
+ with (
+ patch(
+ "services.reporting.generator.generate_report",
+ new_callable=AsyncMock,
+ side_effect=_gen_side_effect,
+ ),
+ patch(
+ "services.reporting.generator.store_report",
+ new_callable=AsyncMock,
+ return_value=str(uuid.uuid4()),
+ ) as mock_store,
+ patch("asyncio.sleep", new_callable=AsyncMock) as mock_sleep,
+ ):
+ job = {
+ "report_type": "daily",
+ "period_start": "2025-01-23",
+ "period_end": "2025-01-23",
+ }
+ await process_report_job(pool, job)
+
+ # generate_report called 3 times (2 failures + 1 success)
+ assert call_count == 3
+ # store_report called once on success
+ mock_store.assert_awaited_once()
+ # sleep called twice (between retries)
+ assert mock_sleep.await_count == 2
+
+ @pytest.mark.asyncio
+ async def test_weekly_job(self) -> None:
+ """A weekly job payload is processed correctly."""
+ pool = _mock_pool()
+ report = _make_report_data(
+ report_type=ReportType.WEEKLY,
+ period_start=date(2025, 1, 13),
+ period_end=date(2025, 1, 17),
+ )
+
+ with (
+ patch(
+ "services.reporting.generator.generate_report",
+ new_callable=AsyncMock,
+ return_value=report,
+ ) as mock_gen,
+ patch(
+ "services.reporting.generator.store_report",
+ new_callable=AsyncMock,
+ return_value=str(uuid.uuid4()),
+ ),
+ ):
+ job = {
+ "report_type": "weekly",
+ "period_start": "2025-01-13",
+ "period_end": "2025-01-17",
+ }
+ await process_report_job(pool, job)
+
+ mock_gen.assert_awaited_once_with(
+ pool, ReportType.WEEKLY, date(2025, 1, 13), date(2025, 1, 17),
+ )
diff --git a/tests/test_report_sections.py b/tests/test_report_sections.py
new file mode 100644
index 0000000..5119f29
--- /dev/null
+++ b/tests/test_report_sections.py
@@ -0,0 +1,578 @@
+"""Unit tests for report section builders.
+
+Tests each section builder from services.reporting.sections with known
+inputs and expected outputs, including edge cases for zero-activity,
+single positions, and missing portfolio snapshots.
+
+Requirements validated: 3.1, 3.2, 3.3, 3.4, 3.5
+"""
+from __future__ import annotations
+
+import uuid
+from datetime import datetime, timezone
+
+from services.reporting.collector import CollectedData
+from services.reporting.models import (
+ ModelQualitySection,
+ PLSection,
+ PositionPerformanceSection,
+ RecommendationAccuracySection,
+ RiskMetricsSection,
+)
+from services.reporting.sections import (
+ build_model_quality_section,
+ build_pnl_section,
+ build_position_performance_section,
+ build_recommendation_accuracy_section,
+ build_risk_metrics_section,
+)
+
+# ── Helpers ──────────────────────────────────────────────────────────────
+
+
+def _make_snapshot(**overrides: object) -> dict:
+ """Build a portfolio snapshot dict with sensible defaults."""
+ snap = {
+ "realized_pnl": 100.0,
+ "unrealized_pnl": -20.0,
+ "daily_return": 0.015,
+ "cumulative_return": 0.08,
+ "win_count": 7,
+ "loss_count": 3,
+ "win_rate": 0.7,
+ "sharpe_ratio": 1.5,
+ "portfolio_heat": 0.12,
+ "max_drawdown": 0.06,
+ "current_drawdown_pct": 0.02,
+ "risk_tier": "moderate",
+ }
+ snap.update(overrides)
+ return snap
+
+
+def _make_closed_position(
+ ticker: str,
+ entry: float,
+ exit_price: float,
+ realized_pnl: float,
+ updated_at: datetime | None = None,
+) -> dict:
+ """Build a closed position dict."""
+ return {
+ "id": str(uuid.uuid4()),
+ "ticker": ticker,
+ "avg_entry_price": entry,
+ "current_price": exit_price,
+ "realized_pnl": realized_pnl,
+ "quantity": 0,
+ "updated_at": updated_at or datetime(2025, 1, 15, 20, 0, tzinfo=timezone.utc),
+ }
+
+
+def _make_open_position(
+ ticker: str,
+ entry: float,
+ current: float,
+ quantity: float,
+ updated_at: datetime | None = None,
+) -> dict:
+ """Build an open position dict."""
+ return {
+ "id": str(uuid.uuid4()),
+ "ticker": ticker,
+ "avg_entry_price": entry,
+ "current_price": current,
+ "quantity": quantity,
+ "updated_at": updated_at or datetime(2025, 1, 14, 10, 0, tzinfo=timezone.utc),
+ }
+
+
+# ═══════════════════════════════════════════════════════════════════════
+# 1. build_pnl_section
+# Requirements validated: 3.1
+# ═══════════════════════════════════════════════════════════════════════
+
+
+class TestBuildPnlSection:
+ """Tests for build_pnl_section."""
+
+ def test_with_portfolio_snapshot(self) -> None:
+ """Section values are extracted from the portfolio snapshot."""
+ snap = _make_snapshot()
+ data = CollectedData(portfolio_snapshot=snap)
+ section = build_pnl_section(data)
+
+ assert isinstance(section, PLSection)
+ assert section.realized_pnl == 100.0
+ assert section.unrealized_pnl == -20.0
+ assert section.daily_return == 0.015
+ assert section.cumulative_return == 0.08
+ assert section.win_count == 7
+ assert section.loss_count == 3
+ assert section.win_rate == 0.7
+ assert section.sharpe_ratio == 1.5
+
+ def test_no_snapshot_returns_zeros(self) -> None:
+ """When no portfolio snapshot exists, all values are zero."""
+ data = CollectedData(portfolio_snapshot=None)
+ section = build_pnl_section(data)
+
+ assert section.realized_pnl == 0.0
+ assert section.unrealized_pnl == 0.0
+ assert section.daily_return == 0.0
+ assert section.cumulative_return == 0.0
+ assert section.win_count == 0
+ assert section.loss_count == 0
+ assert section.win_rate == 0.0
+ assert section.profit_factor == 0.0
+ assert section.sharpe_ratio == 0.0
+
+ def test_profit_factor_from_closed_positions(self) -> None:
+ """Profit factor = sum(gains) / abs(sum(losses)) from closed positions."""
+ snap = _make_snapshot()
+ closed = [
+ _make_closed_position("AAPL", 100.0, 110.0, 50.0), # gain
+ _make_closed_position("MSFT", 200.0, 190.0, -20.0), # loss
+ _make_closed_position("GOOG", 150.0, 160.0, 30.0), # gain
+ ]
+ data = CollectedData(portfolio_snapshot=snap, closed_positions=closed)
+ section = build_pnl_section(data)
+
+ # gains = 50 + 30 = 80, losses = 20
+ expected_pf = 80.0 / 20.0
+ assert abs(section.profit_factor - expected_pf) < 1e-9
+
+ def test_profit_factor_no_losses(self) -> None:
+ """When there are no losses, profit factor is 0.0 (no divisor)."""
+ snap = _make_snapshot()
+ closed = [
+ _make_closed_position("AAPL", 100.0, 110.0, 50.0),
+ ]
+ data = CollectedData(portfolio_snapshot=snap, closed_positions=closed)
+ section = build_pnl_section(data)
+
+ assert section.profit_factor == 0.0
+
+ def test_profit_factor_no_closed_positions(self) -> None:
+ """When there are no closed positions, profit factor is 0.0."""
+ snap = _make_snapshot()
+ data = CollectedData(portfolio_snapshot=snap, closed_positions=[])
+ section = build_pnl_section(data)
+
+ assert section.profit_factor == 0.0
+
+ def test_snapshot_with_none_values(self) -> None:
+ """Snapshot fields that are None are coerced to zero."""
+ snap = _make_snapshot(
+ realized_pnl=None,
+ unrealized_pnl=None,
+ daily_return=None,
+ win_count=None,
+ )
+ data = CollectedData(portfolio_snapshot=snap)
+ section = build_pnl_section(data)
+
+ assert section.realized_pnl == 0.0
+ assert section.unrealized_pnl == 0.0
+ assert section.daily_return == 0.0
+ assert section.win_count == 0
+
+
+# ═══════════════════════════════════════════════════════════════════════
+# 2. build_recommendation_accuracy_section
+# Requirements validated: 3.2
+# ═══════════════════════════════════════════════════════════════════════
+
+
+class TestBuildRecommendationAccuracySection:
+ """Tests for build_recommendation_accuracy_section."""
+
+ def test_with_act_and_skip_decisions(self) -> None:
+ """Correctly counts act/skip and computes win rate and confidence."""
+ rec_id_1 = str(uuid.uuid4())
+ rec_id_2 = str(uuid.uuid4())
+ rec_id_3 = str(uuid.uuid4())
+
+ data = CollectedData(
+ trading_decisions=[
+ {"id": "td1", "recommendation_id": rec_id_1, "decision": "act", "ticker": "AAPL"},
+ {"id": "td2", "recommendation_id": rec_id_2, "decision": "skip", "ticker": "MSFT"},
+ {"id": "td3", "recommendation_id": rec_id_3, "decision": "act", "ticker": "GOOG"},
+ ],
+ recommendations=[
+ {"id": rec_id_1, "confidence": 0.8},
+ {"id": rec_id_2, "confidence": 0.3},
+ {"id": rec_id_3, "confidence": 0.9},
+ ],
+ prediction_outcomes=[
+ {"ticker": "AAPL", "profitable": True, "direction_correct": True},
+ {"ticker": "GOOG", "profitable": False, "direction_correct": False},
+ ],
+ )
+ section = build_recommendation_accuracy_section(data)
+
+ assert isinstance(section, RecommendationAccuracySection)
+ assert section.total_evaluated == 3
+ assert section.act_count == 2
+ assert section.skip_count == 1
+ # 1 win out of 2 acted with outcomes
+ assert abs(section.acted_win_rate - 0.5) < 1e-9
+ # avg confidence acted = (0.8 + 0.9) / 2 = 0.85
+ assert abs(section.avg_confidence_acted - 0.85) < 1e-9
+ # avg confidence skipped = 0.3
+ assert abs(section.avg_confidence_skipped - 0.3) < 1e-9
+
+ def test_no_decisions_returns_zeros(self) -> None:
+ """When there are no trading decisions, all values are zero."""
+ data = CollectedData(trading_decisions=[])
+ section = build_recommendation_accuracy_section(data)
+
+ assert section.total_evaluated == 0
+ assert section.act_count == 0
+ assert section.skip_count == 0
+ assert section.acted_win_rate == 0.0
+ assert section.avg_confidence_acted == 0.0
+ assert section.avg_confidence_skipped == 0.0
+
+ def test_all_act_decisions(self) -> None:
+ """When all decisions are 'act', skip_count is 0."""
+ rec_id = str(uuid.uuid4())
+ data = CollectedData(
+ trading_decisions=[
+ {"id": "td1", "recommendation_id": rec_id, "decision": "act", "ticker": "AAPL"},
+ ],
+ recommendations=[
+ {"id": rec_id, "confidence": 0.75},
+ ],
+ prediction_outcomes=[
+ {"ticker": "AAPL", "profitable": True, "direction_correct": True},
+ ],
+ )
+ section = build_recommendation_accuracy_section(data)
+
+ assert section.act_count == 1
+ assert section.skip_count == 0
+ assert section.acted_win_rate == 1.0
+ assert abs(section.avg_confidence_acted - 0.75) < 1e-9
+ assert section.avg_confidence_skipped == 0.0
+
+ def test_act_without_prediction_outcome(self) -> None:
+ """When an acted decision has no matching prediction outcome, win rate is 0."""
+ rec_id = str(uuid.uuid4())
+ data = CollectedData(
+ trading_decisions=[
+ {"id": "td1", "recommendation_id": rec_id, "decision": "act", "ticker": "AAPL"},
+ ],
+ recommendations=[
+ {"id": rec_id, "confidence": 0.6},
+ ],
+ prediction_outcomes=[], # no outcomes
+ )
+ section = build_recommendation_accuracy_section(data)
+
+ assert section.act_count == 1
+ assert section.acted_win_rate == 0.0
+
+
+# ═══════════════════════════════════════════════════════════════════════
+# 3. build_position_performance_section
+# Requirements validated: 3.3
+# ═══════════════════════════════════════════════════════════════════════
+
+
+class TestBuildPositionPerformanceSection:
+ """Tests for build_position_performance_section."""
+
+ def test_with_open_positions(self) -> None:
+ """Open positions are listed with computed P&L and P&L%."""
+ pos = _make_open_position("AAPL", 150.0, 160.0, 10.0)
+ data = CollectedData(open_positions=[pos])
+ section = build_position_performance_section(data)
+
+ assert isinstance(section, PositionPerformanceSection)
+ assert len(section.positions) == 1
+
+ p = section.positions[0]
+ assert p.ticker == "AAPL"
+ assert p.entry_price == 150.0
+ assert p.current_or_exit_price == 160.0
+ assert p.status == "open"
+ # pnl = (160 - 150) * 10 = 100
+ assert abs(p.pnl - 100.0) < 1e-9
+ # pnl_pct = 100 / (150 * 10) * 100 = 6.666...%
+ assert abs(p.pnl_pct - (100.0 / 1500.0 * 100)) < 1e-6
+
+ def test_with_closed_positions(self) -> None:
+ """Closed positions use realized_pnl directly."""
+ pos = _make_closed_position("MSFT", 200.0, 210.0, 50.0)
+ data = CollectedData(closed_positions=[pos])
+ section = build_position_performance_section(data)
+
+ assert len(section.positions) == 1
+ p = section.positions[0]
+ assert p.ticker == "MSFT"
+ assert p.status == "closed"
+ assert p.pnl == 50.0
+
+ def test_empty_positions(self) -> None:
+ """When there are no positions, the list is empty."""
+ data = CollectedData(open_positions=[], closed_positions=[])
+ section = build_position_performance_section(data)
+
+ assert isinstance(section, PositionPerformanceSection)
+ assert len(section.positions) == 0
+
+ def test_mixed_open_and_closed(self) -> None:
+ """Both open and closed positions appear in the output."""
+ open_pos = _make_open_position("AAPL", 150.0, 160.0, 10.0)
+ closed_pos = _make_closed_position("GOOG", 100.0, 90.0, -25.0)
+ data = CollectedData(open_positions=[open_pos], closed_positions=[closed_pos])
+ section = build_position_performance_section(data)
+
+ assert len(section.positions) == 2
+ tickers = {p.ticker for p in section.positions}
+ assert tickers == {"AAPL", "GOOG"}
+
+ statuses = {p.ticker: p.status for p in section.positions}
+ assert statuses["AAPL"] == "open"
+ assert statuses["GOOG"] == "closed"
+
+ def test_single_position(self) -> None:
+ """A single open position is handled correctly."""
+ pos = _make_open_position("TSLA", 250.0, 250.0, 5.0)
+ data = CollectedData(open_positions=[pos])
+ section = build_position_performance_section(data)
+
+ assert len(section.positions) == 1
+ p = section.positions[0]
+ # pnl = (250 - 250) * 5 = 0
+ assert p.pnl == 0.0
+ assert p.pnl_pct == 0.0
+
+ def test_hold_duration_computed(self) -> None:
+ """Hold duration is computed from updated_at to now."""
+ # Use a fixed updated_at far enough in the past to get a positive duration
+ updated = datetime(2025, 1, 10, 12, 0, tzinfo=timezone.utc)
+ pos = _make_open_position("AAPL", 100.0, 110.0, 1.0, updated_at=updated)
+ data = CollectedData(open_positions=[pos])
+ section = build_position_performance_section(data)
+
+ # Hold duration should be positive (since updated_at is in the past)
+ assert section.positions[0].hold_duration_hours > 0.0
+
+
+# ═══════════════════════════════════════════════════════════════════════
+# 4. build_risk_metrics_section
+# Requirements validated: 3.4
+# ═══════════════════════════════════════════════════════════════════════
+
+
+class TestBuildRiskMetricsSection:
+ """Tests for build_risk_metrics_section."""
+
+ def test_with_snapshot(self) -> None:
+ """Risk metrics are extracted from the portfolio snapshot."""
+ snap = _make_snapshot(
+ risk_tier="high",
+ portfolio_heat=0.25,
+ max_drawdown=0.10,
+ current_drawdown_pct=0.05,
+ )
+ data = CollectedData(
+ portfolio_snapshot=snap,
+ reserve_pool_balance=500.0,
+ circuit_breaker_events=[{"id": "cb1"}, {"id": "cb2"}],
+ )
+ section = build_risk_metrics_section(data)
+
+ assert isinstance(section, RiskMetricsSection)
+ assert section.current_risk_tier == "high"
+ assert section.portfolio_heat == 0.25
+ assert section.max_drawdown == 0.10
+ assert section.current_drawdown_pct == 0.05
+ assert section.reserve_pool_balance == 500.0
+ assert section.circuit_breaker_event_count == 2
+
+ def test_no_snapshot(self) -> None:
+ """When no snapshot exists, risk tier is 'unknown' and metrics are zero."""
+ data = CollectedData(
+ portfolio_snapshot=None,
+ reserve_pool_balance=300.0,
+ circuit_breaker_events=[],
+ )
+ section = build_risk_metrics_section(data)
+
+ assert section.current_risk_tier == "unknown"
+ assert section.portfolio_heat == 0.0
+ assert section.max_drawdown == 0.0
+ assert section.current_drawdown_pct == 0.0
+ assert section.reserve_pool_balance == 300.0
+ assert section.circuit_breaker_event_count == 0
+
+ def test_circuit_breaker_count(self) -> None:
+ """Circuit breaker event count matches the number of events."""
+ events = [{"id": f"cb{i}"} for i in range(5)]
+ data = CollectedData(
+ portfolio_snapshot=_make_snapshot(),
+ circuit_breaker_events=events,
+ reserve_pool_balance=0.0,
+ )
+ section = build_risk_metrics_section(data)
+
+ assert section.circuit_breaker_event_count == 5
+
+ def test_zero_circuit_breaker_events(self) -> None:
+ """Zero circuit breaker events when list is empty."""
+ data = CollectedData(
+ portfolio_snapshot=_make_snapshot(),
+ circuit_breaker_events=[],
+ reserve_pool_balance=100.0,
+ )
+ section = build_risk_metrics_section(data)
+
+ assert section.circuit_breaker_event_count == 0
+
+
+# ═══════════════════════════════════════════════════════════════════════
+# 5. build_model_quality_section
+# Requirements validated: 3.5
+# ═══════════════════════════════════════════════════════════════════════
+
+
+class TestBuildModelQualitySection:
+ """Tests for build_model_quality_section."""
+
+ def test_with_all_windows(self) -> None:
+ """Model quality section extracts metrics for 7d, 30d, 90d windows."""
+ snapshots = [
+ {
+ "lookback_window": "7d",
+ "generated_at": "2025-01-15T20:00:00Z",
+ "win_rate": 0.65,
+ "directional_accuracy": 0.62,
+ "information_coefficient": 0.08,
+ "calibration_error": 0.12,
+ "brier_score": 0.22,
+ },
+ {
+ "lookback_window": "30d",
+ "generated_at": "2025-01-15T20:00:00Z",
+ "win_rate": 0.60,
+ "directional_accuracy": 0.58,
+ "information_coefficient": 0.06,
+ "calibration_error": 0.15,
+ "brier_score": 0.25,
+ },
+ {
+ "lookback_window": "90d",
+ "generated_at": "2025-01-15T20:00:00Z",
+ "win_rate": 0.55,
+ "directional_accuracy": 0.53,
+ "information_coefficient": 0.04,
+ "calibration_error": 0.18,
+ "brier_score": 0.28,
+ },
+ ]
+ data = CollectedData(model_metric_snapshots=snapshots)
+ section = build_model_quality_section(data)
+
+ assert isinstance(section, ModelQualitySection)
+ assert len(section.windows) == 3
+
+ by_lookback = {w.lookback: w for w in section.windows}
+ assert by_lookback["7d"].win_rate == 0.65
+ assert by_lookback["7d"].directional_accuracy == 0.62
+ assert by_lookback["7d"].information_coefficient == 0.08
+ assert by_lookback["7d"].calibration_error == 0.12
+ assert by_lookback["7d"].brier_score == 0.22
+
+ assert by_lookback["30d"].win_rate == 0.60
+ assert by_lookback["90d"].win_rate == 0.55
+
+ def test_no_snapshots(self) -> None:
+ """When there are no model metric snapshots, windows list is empty."""
+ data = CollectedData(model_metric_snapshots=[])
+ section = build_model_quality_section(data)
+
+ assert isinstance(section, ModelQualitySection)
+ assert len(section.windows) == 0
+
+ def test_partial_windows(self) -> None:
+ """When only some lookback windows are present, missing ones get None values."""
+ snapshots = [
+ {
+ "lookback_window": "7d",
+ "generated_at": "2025-01-15T20:00:00Z",
+ "win_rate": 0.70,
+ "directional_accuracy": 0.68,
+ "information_coefficient": 0.10,
+ "calibration_error": 0.08,
+ "brier_score": 0.18,
+ },
+ ]
+ data = CollectedData(model_metric_snapshots=snapshots)
+ section = build_model_quality_section(data)
+
+ assert len(section.windows) == 3
+ by_lookback = {w.lookback: w for w in section.windows}
+
+ # 7d has values
+ assert by_lookback["7d"].win_rate == 0.70
+
+ # 30d and 90d have None values
+ assert by_lookback["30d"].win_rate is None
+ assert by_lookback["30d"].directional_accuracy is None
+ assert by_lookback["90d"].win_rate is None
+ assert by_lookback["90d"].brier_score is None
+
+ def test_takes_latest_snapshot_per_window(self) -> None:
+ """When multiple snapshots exist for a window, the first (latest) is used."""
+ snapshots = [
+ {
+ "lookback_window": "7d",
+ "generated_at": "2025-01-15T20:00:00Z",
+ "win_rate": 0.70,
+ "directional_accuracy": None,
+ "information_coefficient": None,
+ "calibration_error": None,
+ "brier_score": None,
+ },
+ {
+ "lookback_window": "7d",
+ "generated_at": "2025-01-14T20:00:00Z",
+ "win_rate": 0.50,
+ "directional_accuracy": None,
+ "information_coefficient": None,
+ "calibration_error": None,
+ "brier_score": None,
+ },
+ ]
+ data = CollectedData(model_metric_snapshots=snapshots)
+ section = build_model_quality_section(data)
+
+ by_lookback = {w.lookback: w for w in section.windows}
+ # Collector orders by generated_at DESC, so first entry (0.70) is latest
+ assert by_lookback["7d"].win_rate == 0.70
+
+ def test_none_metric_values(self) -> None:
+ """Snapshot with None metric values produces None in the window."""
+ snapshots = [
+ {
+ "lookback_window": "7d",
+ "generated_at": "2025-01-15T20:00:00Z",
+ "win_rate": None,
+ "directional_accuracy": None,
+ "information_coefficient": None,
+ "calibration_error": None,
+ "brier_score": None,
+ },
+ ]
+ data = CollectedData(model_metric_snapshots=snapshots)
+ section = build_model_quality_section(data)
+
+ w = section.windows[0]
+ assert w.win_rate is None
+ assert w.directional_accuracy is None
+ assert w.information_coefficient is None
+ assert w.calibration_error is None
+ assert w.brier_score is None
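
Reviewer note (not part of the patch): the section-builder tests above pin down two small formulas, the profit factor over closed positions and the P&L of an open position. The sketch below restates them as standalone functions, assuming the behavior the tests describe. The helper names `profit_factor` and `open_position_pnl` are hypothetical and do not come from `services.reporting.sections`.

```python
def profit_factor(realized_pnls: list[float]) -> float:
    """Sum of gains divided by the absolute sum of losses.

    Returns 0.0 when there are no losses (or no positions at all),
    matching the convention asserted in the tests above.
    """
    gains = sum(p for p in realized_pnls if p > 0)
    losses = abs(sum(p for p in realized_pnls if p < 0))
    return gains / losses if losses > 0 else 0.0


def open_position_pnl(entry: float, current: float, quantity: float) -> tuple[float, float]:
    """Return (pnl, pnl_pct) for an open position.

    pnl     = (current - entry) * quantity
    pnl_pct = pnl / (entry * quantity) * 100, or 0.0 when the cost basis is zero
              (the zero-cost guard is an assumption, not confirmed by the tests).
    """
    pnl = (current - entry) * quantity
    cost_basis = entry * quantity
    pnl_pct = pnl / cost_basis * 100 if cost_basis else 0.0
    return pnl, pnl_pct
```

With the test fixtures, closed positions of +50, -20 and +30 give 80 / 20, and an AAPL position bought at 150, now at 160, with quantity 10 gives a P&L of 100 on a 1,500 cost basis.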
diff --git a/tests/test_report_summarizer.py b/tests/test_report_summarizer.py
new file mode 100644
index 0000000..3a3248f
--- /dev/null
+++ b/tests/test_report_summarizer.py
@@ -0,0 +1,203 @@
+"""Unit tests for AI summarizer.
+
+Tests the deterministic fallback summary generation and chunk_data edge cases
+from services.reporting.summarizer.
+
+Requirements validated: 2.2, 2.6
+"""
+from __future__ import annotations
+
+from services.reporting.summarizer import build_deterministic_summary, chunk_data
+
+# ═══════════════════════════════════════════════════════════════════════
+# 1. chunk_data — edge cases
+# Requirements validated: 2.2
+# ═══════════════════════════════════════════════════════════════════════
+
+
+class TestChunkDataEdgeCases:
+ """Tests for chunk_data edge cases."""
+
+ def test_empty_input_returns_single_empty_chunk(self) -> None:
+ """Empty input produces exactly one empty-string chunk."""
+ result = chunk_data("", max_chars=100)
+ assert result == [""]
+
+ def test_single_character_returns_one_chunk(self) -> None:
+ """A single character fits in one chunk."""
+ result = chunk_data("x", max_chars=100)
+ assert result == ["x"]
+
+ def test_exactly_at_limit_returns_one_chunk(self) -> None:
+ """A string exactly at the limit fits in one chunk."""
+ data = "a" * 50
+ result = chunk_data(data, max_chars=50)
+ assert result == [data]
+
+ def test_one_char_over_limit_with_newline_returns_two_chunks(self) -> None:
+ """A string one char over the limit (with a newline) splits into two chunks."""
+ # 25 chars + newline + 25 chars = 51 chars total, limit=50
+ data = "a" * 25 + "\n" + "b" * 25
+ result = chunk_data(data, max_chars=50)
+ assert len(result) == 2
+ # First chunk: "aaa...a\n" (26 chars), second chunk: "bbb...b" (25 chars)
+ assert result[0] == "a" * 25 + "\n"
+ assert result[1] == "b" * 25
+ # Round-trip: concatenation reconstructs original
+ assert "".join(result) == data
+
+ def test_no_newlines_in_long_string_returns_one_chunk(self) -> None:
+ """A long string with no newlines is never broken mid-line — stays as one chunk."""
+ data = "x" * 200
+ result = chunk_data(data, max_chars=50)
+ # No newlines means no split points, so the entire string is one chunk
+ assert result == [data]
+
+ def test_multiple_newlines_proper_splitting(self) -> None:
+ """Multiple newlines produce proper splitting at line boundaries."""
+ # 3 lines: two of 30 chars each (29 + newline) and a final one of 29 chars
+ line_a = "a" * 29 + "\n" # 30 chars
+ line_b = "b" * 29 + "\n" # 30 chars
+ line_c = "c" * 29 # 29 chars
+ data = line_a + line_b + line_c # 89 chars total
+ result = chunk_data(data, max_chars=60)
+ # First chunk: line_a + line_b = 60 chars (exactly at limit)
+ # Second chunk: line_c = 29 chars
+ assert len(result) == 2
+ assert result[0] == line_a + line_b
+ assert result[1] == line_c
+ assert "".join(result) == data
+
+ def test_round_trip_concatenation(self) -> None:
+ """Concatenating all chunks reconstructs the original string."""
+ data = "line1\nline2\nline3\nline4\n"
+ result = chunk_data(data, max_chars=12)
+ assert "".join(result) == data
+
+ def test_max_chars_one(self) -> None:
+ """With max_chars=1, each line-segment becomes its own chunk."""
+ data = "a\nb"
+ result = chunk_data(data, max_chars=1)
+ # "a\n" is 2 chars but no split point within it, so it's one chunk
+ # "b" is 1 char, another chunk
+ assert "".join(result) == data
+ assert len(result) >= 2
+
+
+# ═══════════════════════════════════════════════════════════════════════
+# 2. build_deterministic_summary — section type templates
+# Requirements validated: 2.6
+# ═══════════════════════════════════════════════════════════════════════
+
+
+class TestBuildDeterministicSummary:
+ """Tests for build_deterministic_summary with each section type."""
+
+ def test_pnl_section(self) -> None:
+ """P&L section uses the pnl template with realized_pnl, unrealized_pnl, etc."""
+ data = {
+ "realized_pnl": 125.50,
+ "unrealized_pnl": -30.20,
+ "daily_return": 1.2,
+ "win_rate": 72.7,
+ }
+ result = build_deterministic_summary("pnl", data)
+ assert "125.5" in result
+ assert "-30.2" in result
+ assert "1.2" in result
+ assert "72.7" in result
+ assert result.startswith("P&L Summary:")
+
+ def test_recommendation_accuracy_section(self) -> None:
+ """Recommendation accuracy section uses the template with total_evaluated, act_count, etc."""
+ data = {
+ "total_evaluated": 15,
+ "act_count": 8,
+ "acted_win_rate": 75.0,
+ "skip_count": 7,
+ "avg_confidence_acted": 0.72,
+ "avg_confidence_skipped": 0.48,
+ }
+ result = build_deterministic_summary("recommendation_accuracy", data)
+ assert "15" in result
+ assert "8" in result
+ assert "75" in result
+ assert "7" in result
+ assert result.startswith("Recommendation Accuracy:")
+
+ def test_position_performance_section(self) -> None:
+ """Position performance section uses the template with position count."""
+ data = {
+ "positions": [
+ {"ticker": "AAPL", "pnl": 68.0},
+ {"ticker": "MSFT", "pnl": -12.0},
+ {"ticker": "GOOG", "pnl": 25.0},
+ ],
+ }
+ result = build_deterministic_summary("position_performance", data)
+ assert "3" in result
+ assert "Position Performance:" in result
+
+ def test_position_performance_empty_positions(self) -> None:
+ """Position performance with no positions reports 0."""
+ data = {"positions": []}
+ result = build_deterministic_summary("position_performance", data)
+ assert "0" in result
+
+ def test_risk_metrics_section(self) -> None:
+ """Risk metrics section uses the template with risk_tier, portfolio_heat, etc."""
+ data = {
+ "current_risk_tier": "moderate",
+ "portfolio_heat": 0.12,
+ "max_drawdown": 0.08,
+ "current_drawdown_pct": 3.0,
+ "reserve_pool_balance": 450.00,
+ "circuit_breaker_event_count": 1,
+ }
+ result = build_deterministic_summary("risk_metrics", data)
+ assert "moderate" in result
+ assert "0.12" in result
+ assert "0.08" in result
+ assert "3" in result
+ assert "450" in result
+ assert "1" in result
+ assert result.startswith("Risk Metrics:")
+
+ def test_model_quality_section(self) -> None:
+ """Model quality section uses the template with window count."""
+ data = {
+ "windows": [
+ {"lookback": "7d"},
+ {"lookback": "30d"},
+ {"lookback": "90d"},
+ ],
+ }
+ result = build_deterministic_summary("model_quality", data)
+ assert "3" in result
+ assert "Model Quality:" in result
+
+ def test_model_quality_no_windows(self) -> None:
+ """Model quality with no windows reports 0."""
+ data = {"windows": []}
+ result = build_deterministic_summary("model_quality", data)
+ assert "0" in result
+
+ def test_unknown_section_generic_fallback(self) -> None:
+ """An unknown section name produces a generic fallback summary."""
+ data = {"metric_a": 1, "metric_b": 2, "metric_c": 3}
+ result = build_deterministic_summary("unknown_section", data)
+ assert "unknown_section" in result
+ assert "3 metrics reported" in result
+
+ def test_unknown_section_empty_data(self) -> None:
+ """An unknown section with empty data reports 0 metrics."""
+ result = build_deterministic_summary("totally_new", {})
+ assert "totally_new" in result
+ assert "0 metrics reported" in result
+
+ def test_pnl_missing_key_falls_back(self) -> None:
+ """P&L template with missing keys falls back to an error message."""
+ data = {"realized_pnl": 100.0} # missing other keys
+ result = build_deterministic_summary("pnl", data)
+ # Should fall back to the error message since template.format() will raise KeyError
+ assert "template formatting failed" in result
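Reviewer note: the tests above pin down three behaviours of `build_deterministic_summary`: a per-section template, a generic fallback for unknown sections, and an error string when a template key is missing. A minimal sketch of that contract follows. It is inferred only from the assertions in these tests, and the real function in the reporting service may differ. The name `sketch_deterministic_summary`, the `SECTION_TEMPLATES` table, and all template wording beyond the asserted prefixes are hypothetical.

```python
from __future__ import annotations

from typing import Any

# Hypothetical templates: only the prefixes ("P&L Summary:", ...) and the
# field names come from the tests; the rest of the wording is made up.
SECTION_TEMPLATES: dict[str, str] = {
    "pnl": (
        "P&L Summary: realized {realized_pnl}, unrealized {unrealized_pnl}, "
        "daily return {daily_return}%, win rate {win_rate}%."
    ),
    "position_performance": "Position Performance: {position_count} positions tracked.",
    "model_quality": "Model Quality: {window_count} lookback windows reported.",
}


def sketch_deterministic_summary(section: str, data: dict[str, Any]) -> str:
    """Render a template-based summary without calling an LLM."""
    template = SECTION_TEMPLATES.get(section)
    if template is None:
        # Unknown section: generic fallback that only counts the metrics.
        return f"{section}: {len(data)} metrics reported."

    # Derived counts for list-valued sections (assumed field names).
    fields = dict(data)
    fields["position_count"] = len(data.get("positions", []))
    fields["window_count"] = len(data.get("windows", []))

    try:
        return template.format(**fields)
    except KeyError as exc:
        # A missing metric yields an error string instead of raising.
        return f"{section}: template formatting failed (missing {exc})."
```

Returning a string on `KeyError`, rather than raising, means a single missing metric produces a degraded summary instead of aborting the fallback path, which is what `test_pnl_missing_key_falls_back` expects.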
diff --git a/tests/test_report_validator.py b/tests/test_report_validator.py
new file mode 100644
index 0000000..2c97efe
--- /dev/null
+++ b/tests/test_report_validator.py
@@ -0,0 +1,551 @@
+"""Unit tests for report validator.
+
+Tests the validation functions from services.reporting.validator with
+specific discrepancy scenarios, boundary cases, and edge cases.
+
+Requirements validated: 4.1, 4.2, 4.3, 4.4
+"""
+from __future__ import annotations
+
+from datetime import date, datetime, timezone
+
+from services.reporting.models import (
+ ModelQualitySection,
+ ModelQualityWindow,
+ PLSection,
+ PositionPerformanceSection,
+ RecommendationAccuracySection,
+ ReportData,
+ ReportType,
+ RiskMetricsSection,
+ ValidationStatus,
+)
+from services.reporting.validator import (
+ _check_discrepancy,
+ compute_validation_status,
+ validate_model_quality,
+ validate_recommendation_accuracy,
+)
+
+# ── Helpers ──────────────────────────────────────────────────────────────
+
+
+def _make_report(**overrides: object) -> ReportData:
+ """Build a minimal ReportData with sensible defaults."""
+ defaults: dict = {
+ "pnl": PLSection(
+ realized_pnl=0.0,
+ unrealized_pnl=0.0,
+ daily_return=0.0,
+ cumulative_return=0.0,
+ win_count=0,
+ loss_count=0,
+ win_rate=0.0,
+ profit_factor=0.0,
+ sharpe_ratio=0.0,
+ ),
+ "recommendation_accuracy": RecommendationAccuracySection(
+ total_evaluated=0,
+ act_count=0,
+ skip_count=0,
+ acted_win_rate=0.0,
+ avg_confidence_acted=0.0,
+ avg_confidence_skipped=0.0,
+ ),
+ "position_performance": PositionPerformanceSection(),
+ "risk_metrics": RiskMetricsSection(
+ current_risk_tier="moderate",
+ portfolio_heat=0.0,
+ max_drawdown=0.0,
+ current_drawdown_pct=0.0,
+ reserve_pool_balance=0.0,
+ circuit_breaker_event_count=0,
+ ),
+ "model_quality": ModelQualitySection(),
+ "generated_at": datetime(2025, 1, 15, 21, 30, tzinfo=timezone.utc),
+ "period_start": date(2025, 1, 15),
+ "period_end": date(2025, 1, 15),
+ "report_type": ReportType.DAILY,
+ }
+ defaults.update(overrides)
+ return ReportData(**defaults)
+
+
+# ═══════════════════════════════════════════════════════════════════════
+# 1. _check_discrepancy — boundary tests
+# Requirements validated: 4.1, 4.2, 4.3
+# ═══════════════════════════════════════════════════════════════════════
+
+
+class TestCheckDiscrepancy:
+ """Tests for _check_discrepancy boundary and edge cases."""
+
+ def test_exactly_5_percent_no_warning(self) -> None:
+ """Exactly 5% discrepancy does NOT trigger a warning (threshold is >5%)."""
+ # snapshot=100, computed=105 → |105-100|/100*100 = 5.0%
+ result = _check_discrepancy("test_field", 105.0, 100.0)
+ assert result is None
+
+ def test_just_above_5_percent_triggers_warning(self) -> None:
+ """5.1% discrepancy triggers a warning."""
+ # snapshot=100, computed=105.1 → |105.1-100|/100*100 = 5.1%
+ result = _check_discrepancy("test_field", 105.1, 100.0)
+ assert result is not None
+ assert result.field_name == "test_field"
+ assert result.computed_value == 105.1
+ assert result.snapshot_value == 100.0
+ assert abs(result.pct_difference - 5.1) < 0.01
+
+ def test_snapshot_zero_computed_nonzero_warns(self) -> None:
+ """snapshot=0 with computed≠0 → 100% discrepancy → warning."""
+ result = _check_discrepancy("test_field", 42.0, 0.0)
+ assert result is not None
+ assert result.pct_difference == 100.0
+
+ def test_both_zero_no_warning(self) -> None:
+ """Both snapshot=0 and computed=0 → no warning."""
+ result = _check_discrepancy("test_field", 0.0, 0.0)
+ assert result is None
+
+ def test_large_discrepancy(self) -> None:
+ """A large discrepancy (50%) triggers a warning."""
+ # snapshot=100, computed=150 → 50%
+ result = _check_discrepancy("big_diff", 150.0, 100.0)
+ assert result is not None
+ assert abs(result.pct_difference - 50.0) < 0.01
+
+ def test_small_discrepancy_no_warning(self) -> None:
+ """A small discrepancy (1%) does not trigger a warning."""
+ # snapshot=100, computed=101 → 1%
+ result = _check_discrepancy("small_diff", 101.0, 100.0)
+ assert result is None
+
+ def test_computed_below_snapshot(self) -> None:
+ """Discrepancy is detected when computed < snapshot too."""
+ # snapshot=100, computed=94 → 6%
+ result = _check_discrepancy("below", 94.0, 100.0)
+ assert result is not None
+ assert abs(result.pct_difference - 6.0) < 0.01
+
+ def test_nan_computed_sanitized_to_zero(self) -> None:
+ """NaN computed value is sanitized to 0.0 before comparison."""
+ result = _check_discrepancy("nan_field", float("nan"), 100.0)
+ # sanitized computed=0.0, snapshot=100 → 100% discrepancy
+ assert result is not None
+ assert result.computed_value == 0.0
+ assert result.pct_difference == 100.0
+
+ def test_inf_computed_sanitized_to_zero(self) -> None:
+ """Infinity computed value is sanitized to 0.0 before comparison."""
+ result = _check_discrepancy("inf_field", float("inf"), 100.0)
+ assert result is not None
+ assert result.computed_value == 0.0
+
+ def test_snapshot_zero_computed_zero_small(self) -> None:
+ """snapshot=0.0 and computed=0.0 exactly → no warning."""
+ result = _check_discrepancy("zero_zero", 0.0, 0.0)
+ assert result is None
+
+
+# ═══════════════════════════════════════════════════════════════════════
+# 2. validate_recommendation_accuracy
+# Requirements validated: 4.1
+# ═══════════════════════════════════════════════════════════════════════
+
+
+class TestValidateRecommendationAccuracy:
+ """Tests for validate_recommendation_accuracy."""
+
+ def test_matching_data_no_warnings(self) -> None:
+ """When section win rate matches prediction outcomes, no warnings."""
+ # 2 out of 4 profitable → 0.5 win rate
+ section = RecommendationAccuracySection(
+ total_evaluated=4,
+ act_count=4,
+ skip_count=0,
+ acted_win_rate=0.5,
+ avg_confidence_acted=0.7,
+ avg_confidence_skipped=0.0,
+ )
+ outcomes = [
+ {"profitable": True},
+ {"profitable": False},
+ {"profitable": True},
+ {"profitable": False},
+ ]
+ warnings = validate_recommendation_accuracy(section, outcomes)
+ assert warnings == []
+
+ def test_discrepancy_triggers_warning(self) -> None:
+ """When section win rate differs >5% from outcomes, a warning is raised."""
+ # outcomes: 1/2 profitable → 0.5, section says 0.8 → 60% discrepancy
+ section = RecommendationAccuracySection(
+ total_evaluated=2,
+ act_count=2,
+ skip_count=0,
+ acted_win_rate=0.8,
+ avg_confidence_acted=0.7,
+ avg_confidence_skipped=0.0,
+ )
+ outcomes = [
+ {"profitable": True},
+ {"profitable": False},
+ ]
+ warnings = validate_recommendation_accuracy(section, outcomes)
+ assert len(warnings) == 1
+ assert warnings[0].field_name == "acted_win_rate"
+
+ def test_no_outcomes_returns_empty(self) -> None:
+ """When there are no prediction outcomes, validation is skipped."""
+ section = RecommendationAccuracySection(
+ total_evaluated=5,
+ act_count=3,
+ skip_count=2,
+ acted_win_rate=0.6,
+ avg_confidence_acted=0.7,
+ avg_confidence_skipped=0.4,
+ )
+ warnings = validate_recommendation_accuracy(section, [])
+ assert warnings == []
+
+ def test_all_profitable_matching(self) -> None:
+ """All outcomes profitable and section says 1.0 → no warning."""
+ section = RecommendationAccuracySection(
+ total_evaluated=3,
+ act_count=3,
+ skip_count=0,
+ acted_win_rate=1.0,
+ avg_confidence_acted=0.9,
+ avg_confidence_skipped=0.0,
+ )
+ outcomes = [
+ {"profitable": True},
+ {"profitable": True},
+ {"profitable": True},
+ ]
+ warnings = validate_recommendation_accuracy(section, outcomes)
+ assert warnings == []
+
+
+# ═══════════════════════════════════════════════════════════════════════
+# 3. validate_model_quality
+# Requirements validated: 4.2, 4.3
+# ═══════════════════════════════════════════════════════════════════════
+
+
+class TestValidateModelQuality:
+ """Tests for validate_model_quality."""
+
+ def test_matching_data_no_warnings(self) -> None:
+ """When section metrics match snapshots, no warnings are produced."""
+ section = ModelQualitySection(
+ windows=[
+ ModelQualityWindow(
+ lookback="7d",
+ win_rate=0.65,
+ directional_accuracy=0.62,
+ information_coefficient=0.08,
+ calibration_error=0.12,
+ brier_score=0.22,
+ ),
+ ],
+ )
+ snapshots = [
+ {
+ "lookback_window": "7d",
+ "win_rate": 0.65,
+ "directional_accuracy": 0.62,
+ "information_coefficient": 0.08,
+ "calibration_error": 0.12,
+ "brier_score": 0.22,
+ },
+ ]
+ warnings = validate_model_quality(section, snapshots)
+ assert warnings == []
+
+ def test_discrepancy_triggers_warnings(self) -> None:
+ """When section metrics differ >5% from snapshots, warnings are raised."""
+ section = ModelQualitySection(
+ windows=[
+ ModelQualityWindow(
+ lookback="7d",
+ win_rate=0.80, # snapshot says 0.65 → ~23% off
+ directional_accuracy=0.62,
+ information_coefficient=0.08,
+ calibration_error=0.12,
+ brier_score=0.22,
+ ),
+ ],
+ )
+ snapshots = [
+ {
+ "lookback_window": "7d",
+ "win_rate": 0.65,
+ "directional_accuracy": 0.62,
+ "information_coefficient": 0.08,
+ "calibration_error": 0.12,
+ "brier_score": 0.22,
+ },
+ ]
+ warnings = validate_model_quality(section, snapshots)
+ assert len(warnings) == 1
+ assert warnings[0].field_name == "7d_win_rate"
+
+ def test_null_snapshot_value_skipped(self) -> None:
+ """When a snapshot metric is NULL (None), that metric is skipped."""
+ section = ModelQualitySection(
+ windows=[
+ ModelQualityWindow(
+ lookback="7d",
+ win_rate=0.65,
+ directional_accuracy=0.62,
+ information_coefficient=0.08,
+ calibration_error=0.12,
+ brier_score=0.22,
+ ),
+ ],
+ )
+ snapshots = [
+ {
+ "lookback_window": "7d",
+ "win_rate": None, # NULL → skip
+ "directional_accuracy": None,
+ "information_coefficient": None,
+ "calibration_error": None,
+ "brier_score": None,
+ },
+ ]
+ warnings = validate_model_quality(section, snapshots)
+ assert warnings == []
+
+ def test_no_snapshots_returns_empty(self) -> None:
+ """When there are no metric snapshots, validation is skipped."""
+ section = ModelQualitySection(
+ windows=[
+ ModelQualityWindow(
+ lookback="7d",
+ win_rate=0.65,
+ directional_accuracy=0.62,
+ information_coefficient=0.08,
+ calibration_error=0.12,
+ brier_score=0.22,
+ ),
+ ],
+ )
+ warnings = validate_model_quality(section, [])
+ assert warnings == []
+
+ def test_multiple_windows_validated(self) -> None:
+ """Validation runs across all lookback windows."""
+ section = ModelQualitySection(
+ windows=[
+ ModelQualityWindow(
+ lookback="7d",
+ win_rate=0.65,
+ directional_accuracy=0.62,
+ information_coefficient=0.08,
+ calibration_error=0.12,
+ brier_score=0.22,
+ ),
+ ModelQualityWindow(
+ lookback="30d",
+ win_rate=0.90, # snapshot says 0.60 → 50% off
+ directional_accuracy=0.58,
+ information_coefficient=0.06,
+ calibration_error=0.15,
+ brier_score=0.25,
+ ),
+ ],
+ )
+ snapshots = [
+ {
+ "lookback_window": "7d",
+ "win_rate": 0.65,
+ "directional_accuracy": 0.62,
+ "information_coefficient": 0.08,
+ "calibration_error": 0.12,
+ "brier_score": 0.22,
+ },
+ {
+ "lookback_window": "30d",
+ "win_rate": 0.60,
+ "directional_accuracy": 0.58,
+ "information_coefficient": 0.06,
+ "calibration_error": 0.15,
+ "brier_score": 0.25,
+ },
+ ]
+ warnings = validate_model_quality(section, snapshots)
+ # Only 30d_win_rate should be flagged
+ assert len(warnings) == 1
+ assert warnings[0].field_name == "30d_win_rate"
+
+ def test_null_section_value_skipped(self) -> None:
+ """When a section metric is None, that metric is skipped."""
+ section = ModelQualitySection(
+ windows=[
+ ModelQualityWindow(
+ lookback="7d",
+ win_rate=None,
+ directional_accuracy=None,
+ information_coefficient=None,
+ calibration_error=None,
+ brier_score=None,
+ ),
+ ],
+ )
+ snapshots = [
+ {
+ "lookback_window": "7d",
+ "win_rate": 0.65,
+ "directional_accuracy": 0.62,
+ "information_coefficient": 0.08,
+ "calibration_error": 0.12,
+ "brier_score": 0.22,
+ },
+ ]
+ warnings = validate_model_quality(section, snapshots)
+ assert warnings == []
+
+ def test_no_matching_window_in_snapshots(self) -> None:
+ """When section has a window not in snapshots, it is skipped."""
+ section = ModelQualitySection(
+ windows=[
+ ModelQualityWindow(
+ lookback="90d",
+ win_rate=0.55,
+ directional_accuracy=0.53,
+ information_coefficient=0.04,
+ calibration_error=0.18,
+ brier_score=0.28,
+ ),
+ ],
+ )
+ snapshots = [
+ {
+ "lookback_window": "7d",
+ "win_rate": 0.65,
+ "directional_accuracy": 0.62,
+ "information_coefficient": 0.08,
+ "calibration_error": 0.12,
+ "brier_score": 0.22,
+ },
+ ]
+ warnings = validate_model_quality(section, snapshots)
+ assert warnings == []
+
+
+# ═══════════════════════════════════════════════════════════════════════
+# 4. compute_validation_status
+# Requirements validated: 4.4
+# ═══════════════════════════════════════════════════════════════════════
+
+
+class TestComputeValidationStatus:
+ """Tests for compute_validation_status."""
+
+ def test_no_warnings_returns_passed(self) -> None:
+ """When no sections have warnings, status is PASSED."""
+ report = _make_report()
+ status = compute_validation_status(report)
+ assert status == ValidationStatus.PASSED
+
+ def test_pnl_warnings_returns_warnings(self) -> None:
+ """When P&L section has warnings, status is WARNINGS."""
+ from services.reporting.models import ValidationWarning
+
+ report = _make_report(
+ pnl=PLSection(
+ realized_pnl=0.0,
+ unrealized_pnl=0.0,
+ daily_return=0.0,
+ cumulative_return=0.0,
+ win_count=0,
+ loss_count=0,
+ win_rate=0.0,
+ profit_factor=0.0,
+ sharpe_ratio=0.0,
+ validation_warnings=[
+ ValidationWarning(
+ field_name="test",
+ computed_value=1.0,
+ snapshot_value=0.5,
+ pct_difference=100.0,
+ ),
+ ],
+ ),
+ )
+ status = compute_validation_status(report)
+ assert status == ValidationStatus.WARNINGS
+
+ def test_recommendation_accuracy_warnings_returns_warnings(self) -> None:
+ """When recommendation accuracy section has warnings, status is WARNINGS."""
+ from services.reporting.models import ValidationWarning
+
+ report = _make_report(
+ recommendation_accuracy=RecommendationAccuracySection(
+ total_evaluated=0,
+ act_count=0,
+ skip_count=0,
+ acted_win_rate=0.0,
+ avg_confidence_acted=0.0,
+ avg_confidence_skipped=0.0,
+ validation_warnings=[
+ ValidationWarning(
+ field_name="acted_win_rate",
+ computed_value=0.8,
+ snapshot_value=0.5,
+ pct_difference=60.0,
+ ),
+ ],
+ ),
+ )
+ status = compute_validation_status(report)
+ assert status == ValidationStatus.WARNINGS
+
+ def test_model_quality_warnings_returns_warnings(self) -> None:
+ """When model quality section has warnings, status is WARNINGS."""
+ from services.reporting.models import ValidationWarning
+
+ report = _make_report(
+ model_quality=ModelQualitySection(
+ validation_warnings=[
+ ValidationWarning(
+ field_name="7d_win_rate",
+ computed_value=0.9,
+ snapshot_value=0.65,
+ pct_difference=38.46,
+ ),
+ ],
+ ),
+ )
+ status = compute_validation_status(report)
+ assert status == ValidationStatus.WARNINGS
+
+ def test_multiple_sections_with_warnings(self) -> None:
+ """When multiple sections have warnings, status is still WARNINGS."""
+ from services.reporting.models import ValidationWarning
+
+ w = ValidationWarning(
+ field_name="x",
+ computed_value=1.0,
+ snapshot_value=0.0,
+ pct_difference=100.0,
+ )
+ report = _make_report(
+ pnl=PLSection(
+ realized_pnl=0.0,
+ unrealized_pnl=0.0,
+ daily_return=0.0,
+ cumulative_return=0.0,
+ win_count=0,
+ loss_count=0,
+ win_rate=0.0,
+ profit_factor=0.0,
+ sharpe_ratio=0.0,
+ validation_warnings=[w],
+ ),
+ model_quality=ModelQualitySection(validation_warnings=[w]),
+ )
+ status = compute_validation_status(report)
+ assert status == ValidationStatus.WARNINGS
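Reviewer note: the boundary tests in `TestCheckDiscrepancy` imply a specific rule: the percentage difference is measured relative to the snapshot, only values strictly above 5% warn, a zero snapshot with a nonzero computed value counts as 100%, and non-finite computed values are treated as 0.0. The sketch below restates that rule under those assumptions. It is not the validator's actual code. `SketchWarning`, `sketch_check_discrepancy`, and `threshold_pct` are hypothetical names, and sanitizing the snapshot value as well is an extra assumption the tests do not cover.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchWarning:
    """Hypothetical stand-in for the ValidationWarning model."""

    field_name: str
    computed_value: float
    snapshot_value: float
    pct_difference: float


def _finite_or_zero(value: float) -> float:
    # NaN and +/-inf are replaced by 0.0 before any comparison.
    return value if math.isfinite(value) else 0.0


def sketch_check_discrepancy(
    field_name: str,
    computed: float,
    snapshot: float,
    threshold_pct: float = 5.0,
) -> SketchWarning | None:
    computed = _finite_or_zero(computed)
    snapshot = _finite_or_zero(snapshot)  # assumption: tests only cover computed

    if snapshot == 0.0:
        if computed == 0.0:
            return None  # both zero: nothing to compare
        pct = 100.0  # zero baseline with nonzero computed: treated as 100%
    else:
        pct = abs(computed - snapshot) * 100.0 / abs(snapshot)

    if pct <= threshold_pct:
        return None  # the threshold is strict: exactly 5% does not warn
    return SketchWarning(field_name, computed, snapshot, pct)
```

Under this rule `sketch_check_discrepancy("f", 105.0, 100.0)` returns `None` while `105.1` against `100.0` produces a warning, matching the two boundary tests.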