feat: trading feedback engine — periodic performance reports with AI summarization
ci/woodpecker/push/test Pipeline was successful
ci/woodpecker/push/build-2 Pipeline was successful
ci/woodpecker/push/build-3 Pipeline was successful
ci/woodpecker/push/build-1 Pipeline was successful
ci/woodpecker/push/finalize Pipeline was successful
Build and Push / lint-and-test (push) Has been cancelled
Build and Push / build-services (map[cmd:python -m services.adapters.broker_adapter name:broker-adapter]) (push) Has been cancelled
Build and Push / build-services (map[cmd:python -m services.aggregation.worker name:aggregation]) (push) Has been cancelled
Build and Push / build-services (map[cmd:python -m services.extractor.worker name:extractor]) (push) Has been cancelled
Build and Push / build-services (map[cmd:python -m services.ingestion.worker name:ingestion]) (push) Has been cancelled
Build and Push / build-services (map[cmd:python -m services.lake_publisher.worker name:lake-publisher]) (push) Has been cancelled
Build and Push / build-services (map[cmd:python -m services.parser.worker name:parser]) (push) Has been cancelled
Build and Push / build-services (map[cmd:python -m services.recommendation.worker name:recommendation]) (push) Has been cancelled
Build and Push / build-services (map[cmd:python -m services.scheduler.app name:scheduler]) (push) Has been cancelled
Build and Push / build-services (map[cmd:uvicorn services.api.app:app --host 0.0.0.0 --port 8000 name:query-api]) (push) Has been cancelled
Build and Push / build-services (map[cmd:uvicorn services.risk.app:app --host 0.0.0.0 --port 8000 name:risk]) (push) Has been cancelled
Build and Push / build-services (map[cmd:uvicorn services.symbol_registry.app:app --host 0.0.0.0 --port 8000 name:symbol-registry]) (push) Has been cancelled
Build and Push / build-services (map[cmd:uvicorn services.trading.app:app --host 0.0.0.0 --port 8000 name:trading-engine]) (push) Has been cancelled
Build and Push / build-dashboard (push) Has been cancelled
Build and Push / build-superset (push) Has been cancelled
Build and Push / integration-test (push) Has been cancelled
Build and Push / beta-gate (push) Has been cancelled

- Migration 038: trading_reports table + report-summarizer agent seed
- 6 reporting modules: models, collector, sections, validator, summarizer, generator
- API endpoints: GET /api/reports (paginated, filterable), GET /api/reports/{id}
- Frontend hooks: useReports, useReport with TanStack Query
- Scheduler: daily (after 16:30 ET) and weekly (Saturday) report triggers
- Redis queue consumer for async report generation with retry/dedup
- 5 property-based tests (chunking, serialization, validation, accuracy, deltas)
- 109 unit/integration tests across all modules
- 6 frontend hook tests with MSW mocks
Celes Renata
2026-05-01 22:13:09 +00:00
parent 376fcb4bb4
commit bc077bfcc8
28 changed files with 6771 additions and 1 deletions
@@ -0,0 +1 @@
{"specId": "d76705a8-fb91-4fce-b59e-c4b3b0dbbd83", "workflowType": "requirements-first", "specType": "feature"}
@@ -0,0 +1,802 @@
# Design Document — Trading Feedback Engine
## Overview
This design adds a periodic trading performance reporting system to Stonks Oracle. The system collects trading data (P&L, recommendations, positions, risk metrics, model quality), generates structured JSON reports with AI-powered summaries, validates report metrics against live data, and stores reports for retrieval via API.
The core challenge is fitting AI summarization within the 8k-token context window of the `qwen3.5:9b-fast` model on the local Ollama instance. The design addresses this with a chunking strategy that serializes report section data into ≤6,000-character chunks, summarizes each chunk independently, then merges chunk summaries into a final section summary. This hierarchical summarization approach keeps each LLM call well within the token budget while producing coherent narratives.
### Design Rationale
A trading system without periodic performance feedback forces the operator to manually query tables and compute metrics. The feedback engine closes this gap by:
1. **Automating data collection** — pulling from 7+ tables (trading_decisions, orders, positions, portfolio_snapshots, recommendations, prediction_outcomes, model_metric_snapshots) into a single structured report
2. **AI-powered summarization** — using the existing agent infrastructure (ai_agents, AgentConfigResolver, llm_factory) to generate natural-language summaries that highlight trends and anomalies
3. **Cross-validation** — comparing computed metrics against live validation data (prediction_outcomes, model_metric_snapshots) and flagging discrepancies >5%
4. **Persistent storage** — storing reports as JSONB for historical comparison and trend analysis
5. **Scheduled generation** — daily (after market close) and weekly (Saturday) reports via Redis queue jobs
The design reuses existing infrastructure: asyncpg for persistence, FastAPI for API endpoints, Redis queues for async job processing, the ai_agents/AgentConfigResolver/llm_factory stack for LLM access, and TanStack Query hooks on the frontend.
---
## Architecture
### High-Level Data Flow
```mermaid
flowchart TD
subgraph "Scheduling (Trigger)"
A[Scheduler Service] -->|after 16:30 ET daily| B[Redis Queue<br/>stonks:queue:report_generation]
A -->|Saturday weekly| B
C[Manual API Trigger] --> B
end
subgraph "Report Generation (Async Worker)"
B --> D[Report Generator<br/>services/reporting/generator.py]
D -->|1. Collect| E[Data Collector<br/>services/reporting/collector.py]
E -->|queries| F[(trading_decisions<br/>orders, positions<br/>portfolio_snapshots<br/>recommendations)]
D -->|2. Build sections| G[Section Builder<br/>services/reporting/sections.py]
G -->|P&L, accuracy,<br/>positions, risk,<br/>model quality| H[Report Sections]
D -->|3. Validate| I[Report Validator<br/>services/reporting/validator.py]
I -->|cross-check| J[(prediction_outcomes<br/>model_metric_snapshots)]
D -->|4. Summarize| K[AI Summarizer<br/>services/reporting/summarizer.py]
K -->|chunk & summarize| L[Report_Summarizer_Agent<br/>via AgentConfigResolver<br/>+ llm_factory]
D -->|5. Store| M[(trading_reports table)]
end
subgraph "API Layer"
N[GET /api/reports] -->|paginated list| M
O[GET /api/reports/:id] -->|full report| M
end
subgraph "Frontend"
P[useReports hook] --> N
Q[useReport hook] --> O
end
```
### Scheduling Strategy
| Component | Trigger | Cadence |
|-----------|---------|---------|
| Daily Report | Scheduler after 16:30 ET | Every trading day |
| Weekly Report | Scheduler on Saturday | Weekly (Mon-Fri coverage) |
| Report Generator Worker | Redis queue consumer | On-demand from queue |
| AI Summarizer | Called by generator | Per report section |
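The trigger rules in the table can be sketched as two pure functions. This is a minimal sketch under three assumptions: the scheduler passes the current time already converted to US Eastern time, every weekday counts as a trading day (exchange holidays are ignored), and the function names are hypothetical rather than taken from `services/scheduler`.

```python
from datetime import date, datetime, time, timedelta

# Assumption: callers pass a naive datetime already in US Eastern time.
DAILY_CUTOFF_ET = time(16, 30)


def is_trading_day(day: date) -> bool:
    # Simplification: Monday-Friday only; exchange holidays are NOT handled.
    return day.weekday() < 5


def daily_report_due(now_et: datetime) -> bool:
    """True at or after 16:30 ET on a trading day."""
    return is_trading_day(now_et.date()) and now_et.time() >= DAILY_CUTOFF_ET


def weekly_report_period(saturday: date) -> tuple[date, date]:
    """Return the Monday-Friday span covered by a report generated on `saturday`."""
    if saturday.weekday() != 5:
        raise ValueError(f"{saturday} is not a Saturday")
    friday = saturday - timedelta(days=1)
    monday = friday - timedelta(days=4)
    return monday, friday
```

In the scheduler, `now_et` could come from `datetime.now(ZoneInfo("America/New_York"))`. Doing the timezone conversion outside these helpers means they can be tested without timezone data.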
### Chunking Strategy
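The `qwen3.5:9b-fast` model has an 8k-token context window. Taking 8k as 8,000 tokens and reserving ~200 tokens for the system prompt and ~200 tokens for the response leaves roughly 7,600 tokens for input. At ~4 characters per token for structured data, that is ~30,400 characters. The 6,000-character chunk limit therefore provides a roughly 5x safety margin, which absorbs JSON overhead, prompt framing, and tokenization variance. If the agent's full `max_tokens` budget of 1,024 is reserved for the response instead, ~6,776 input tokens (~27,100 characters) remain, which is still a margin of about 4.5x.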
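The budget arithmetic can be written out with a hypothetical helper. Like the paragraph, it takes 8k as 8,000 tokens and assumes ~4 characters per token. The second call reserves the agent's full `max_tokens` of 1,024 from the migration seed to show the worst-case margin.

```python
def input_char_budget(
    context_tokens: int,
    system_prompt_tokens: int,
    response_tokens: int,
    chars_per_token: float = 4.0,
) -> float:
    """Characters of input that fit once prompt and response budgets are reserved."""
    input_tokens = context_tokens - system_prompt_tokens - response_tokens
    return input_tokens * chars_per_token


CHUNK_SIZE_LIMIT = 6000

# ~200-token system prompt and ~200-token response reserved.
budget = input_char_budget(8000, 200, 200)
print(budget, round(budget / CHUNK_SIZE_LIMIT, 2))  # 30400.0 5.07

# Reserving the agent's full max_tokens=1024 for the response instead.
strict = input_char_budget(8000, 200, 1024)
print(strict, round(strict / CHUNK_SIZE_LIMIT, 2))  # 27104.0 4.52
```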
```mermaid
flowchart LR
A[Section Data<br/>e.g. 15,000 chars] --> B{> 6,000 chars?}
B -->|No| C[Single LLM call<br/>→ summary]
B -->|Yes| D[Split into chunks<br/>≤ 6,000 chars each]
D --> E[Chunk 1 → LLM → summary 1]
D --> F[Chunk 2 → LLM → summary 2]
D --> G[Chunk N → LLM → summary N]
E --> H[Merge summaries<br/>→ final LLM call<br/>→ section summary]
F --> H
G --> H
```
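The sketch below shows one way `chunk_data()` could satisfy Property 1. It is an illustration and not necessarily the committed implementation. It packs whole lines (with their terminators) into chunks and hard-splits any single line longer than the limit. Empty input returns `[""]`, and concatenating the chunks always reproduces the input.

```python
def chunk_data(serialized: str, max_chars: int = 6000) -> list[str]:
    """Pack whole lines into chunks of at most max_chars characters."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not serialized:
        return [""]  # Property 1: empty input yields exactly one empty chunk
    chunks: list[str] = []
    current = ""
    # keepends=True preserves line terminators, so "".join(chunks) == serialized.
    for line in serialized.splitlines(keepends=True):
        # A single line longer than the limit is hard-split into fixed-size pieces.
        while len(line) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:max_chars])
            line = line[max_chars:]
        if len(current) + len(line) > max_chars:
            chunks.append(current)  # non-empty here, since len(line) <= max_chars
            current = line
        else:
            current += line
    if current:
        chunks.append(current)
    return chunks
```

`str.splitlines` treats several characters besides `\n` as line boundaries (for example `\r` and form feed). Serializing section data with `json.dumps(data, indent=2)` puts each field on its own line, so hard splits inside a JSON value become rare.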
---
## Components and Interfaces
### New Modules
| Module | File | Responsibility |
|--------|------|----------------|
| Report Data Collector | `services/reporting/collector.py` | Queries trading data for a reporting period |
| Report Section Builder | `services/reporting/sections.py` | Builds structured report sections from raw data |
| Report Validator | `services/reporting/validator.py` | Cross-checks metrics against validation tables |
| AI Summarizer | `services/reporting/summarizer.py` | Chunks data and generates AI summaries |
| Report Generator | `services/reporting/generator.py` | Orchestrates the full report generation pipeline |
| Report Models | `services/reporting/models.py` | Pydantic models for report structure and serialization |
### Modified Modules
| Module | File | Changes |
|--------|------|---------|
| Query API | `services/api/app.py` | 2 new `/api/reports` endpoints |
| Redis Keys | `services/shared/redis_keys.py` | New `QUEUE_REPORT_GENERATION` constant |
| Frontend Hooks | `frontend/src/api/hooks.ts` | 2 new report hooks |
| DB Migration | `infra/migrations/038_trading_reports.sql` | New table + agent seed |
### Component Interface Details
#### 1. Report Models (`services/reporting/models.py`)
```python
from __future__ import annotations

from datetime import date, datetime
from enum import Enum

from pydantic import BaseModel, Field


class ReportType(str, Enum):
    DAILY = "daily"
    WEEKLY = "weekly"


class ValidationStatus(str, Enum):
    PASSED = "passed"
    WARNINGS = "warnings"


class ValidationWarning(BaseModel):
    field_name: str
    computed_value: float
    snapshot_value: float
    pct_difference: float


class PLSection(BaseModel):
    realized_pnl: float
    unrealized_pnl: float
    daily_return: float
    cumulative_return: float
    win_count: int
    loss_count: int
    win_rate: float
    profit_factor: float
    sharpe_ratio: float
    summary: str = ""
    validation_warnings: list[ValidationWarning] = Field(default_factory=list)


class RecommendationAccuracySection(BaseModel):
    total_evaluated: int
    act_count: int
    skip_count: int
    acted_win_rate: float
    avg_confidence_acted: float
    avg_confidence_skipped: float
    summary: str = ""
    validation_warnings: list[ValidationWarning] = Field(default_factory=list)


class PositionDetail(BaseModel):
    ticker: str
    entry_price: float
    current_or_exit_price: float
    pnl: float
    pnl_pct: float
    hold_duration_hours: float
    status: str  # "open" or "closed"


class PositionPerformanceSection(BaseModel):
    positions: list[PositionDetail] = Field(default_factory=list)
    summary: str = ""


class RiskMetricsSection(BaseModel):
    current_risk_tier: str
    portfolio_heat: float
    max_drawdown: float
    current_drawdown_pct: float
    reserve_pool_balance: float
    circuit_breaker_event_count: int
    summary: str = ""


class ModelQualityWindow(BaseModel):
    lookback: str
    win_rate: float | None
    directional_accuracy: float | None
    information_coefficient: float | None
    calibration_error: float | None
    brier_score: float | None


class ModelQualitySection(BaseModel):
    windows: list[ModelQualityWindow] = Field(default_factory=list)
    summary: str = ""
    validation_warnings: list[ValidationWarning] = Field(default_factory=list)


class ReportData(BaseModel):
    """Top-level report structure stored as JSONB."""

    pnl: PLSection
    recommendation_accuracy: RecommendationAccuracySection
    position_performance: PositionPerformanceSection
    risk_metrics: RiskMetricsSection
    model_quality: ModelQualitySection
    executive_summary: str = ""
    validation_status: ValidationStatus = ValidationStatus.PASSED
    generated_at: datetime
    period_start: date
    period_end: date
    report_type: ReportType
```
#### 2. Report Data Collector (`services/reporting/collector.py`)
```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date

import asyncpg


@dataclass
class CollectedData:
    """Raw data collected for a reporting period."""

    trading_decisions: list[dict]
    orders: list[dict]
    open_positions: list[dict]
    closed_positions: list[dict]
    portfolio_snapshot: dict | None
    previous_portfolio_snapshot: dict | None
    recommendations: list[dict]
    prediction_outcomes: list[dict]
    model_metric_snapshots: list[dict]
    circuit_breaker_events: list[dict]
    reserve_pool_balance: float


async def collect_report_data(
    pool: asyncpg.Pool,
    period_start: date,
    period_end: date,
) -> CollectedData:
    """Query all trading data for the reporting period.

    Queries: trading_decisions, orders, positions, portfolio_snapshots,
    recommendations, prediction_outcomes, model_metric_snapshots,
    circuit_breaker_events, reserve_pool_ledger.

    Returns CollectedData with all raw query results.
    If no trading_decisions exist, returns empty lists (zero-activity).
    """
    ...
```
#### 3. Report Section Builder (`services/reporting/sections.py`)
```python
from __future__ import annotations

from services.reporting.collector import CollectedData
from services.reporting.models import (
    ModelQualitySection,
    ModelQualityWindow,
    PLSection,
    PositionDetail,
    PositionPerformanceSection,
    RecommendationAccuracySection,
    RiskMetricsSection,
)


def build_pnl_section(data: CollectedData) -> PLSection:
    """Build P&L section from collected data.

    Computes realized/unrealized P&L, daily return, cumulative return,
    win/loss counts, win rate, profit factor, and Sharpe ratio from
    portfolio_snapshot and closed positions.
    """
    ...


def build_recommendation_accuracy_section(data: CollectedData) -> RecommendationAccuracySection:
    """Build recommendation accuracy section.

    Joins trading_decisions with prediction_outcomes to compute
    act/skip breakdown, win rate of acted recommendations, and
    average confidence of acted vs skipped.
    """
    ...


def build_position_performance_section(data: CollectedData) -> PositionPerformanceSection:
    """Build position performance section.

    Lists each position (open and closed) with entry price,
    current/exit price, P&L, P&L%, and hold duration.
    """
    ...


def build_risk_metrics_section(data: CollectedData) -> RiskMetricsSection:
    """Build risk metrics section.

    Extracts current risk tier, portfolio heat, max drawdown,
    current drawdown %, reserve pool balance, and circuit breaker
    event count from collected data.
    """
    ...


def build_model_quality_section(data: CollectedData) -> ModelQualitySection:
    """Build model quality section.

    Extracts latest model_metric_snapshot values for 7d, 30d, 90d
    lookback windows.
    """
    ...
```
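Below is a minimal sketch of the P&L aggregates that `build_pnl_section()` describes. It works on plain lists rather than `CollectedData`. The design does not pin down the formulas, so the following are assumptions:

- Profit factor is gross profit divided by gross loss.
- Break-even trades count as neither wins nor losses.
- The Sharpe ratio is annualized with sqrt(252) and a zero risk-free rate.

Undefined ratios return 0.0, which mirrors the "replace NaN or infinity with 0.0" rule in the validation edge cases. The glossary notes that `portfolio_snapshots` already carries a Sharpe ratio, so the real builder may read that value instead of recomputing it.

```python
import math
import statistics

TRADING_DAYS_PER_YEAR = 252  # assumption: annualization factor


def pnl_metrics(closed_pnls: list[float], daily_returns: list[float]) -> dict[str, float]:
    """Win/loss counts, win rate, profit factor, and annualized Sharpe ratio."""
    wins = [p for p in closed_pnls if p > 0]
    losses = [p for p in closed_pnls if p < 0]  # break-even (0.0) trades are ignored
    decided = len(wins) + len(losses)
    gross_profit = sum(wins)
    gross_loss = -sum(losses)  # positive magnitude

    win_rate = len(wins) / decided if decided else 0.0
    # With no losing trades the ratio is infinite; report 0.0 instead.
    profit_factor = gross_profit / gross_loss if gross_loss > 0 else 0.0

    sharpe_ratio = 0.0
    if len(daily_returns) >= 2:  # sample standard deviation needs two points
        sd = statistics.stdev(daily_returns)
        if sd > 0:
            sharpe_ratio = statistics.mean(daily_returns) / sd * math.sqrt(TRADING_DAYS_PER_YEAR)

    return {
        "win_count": len(wins),
        "loss_count": len(losses),
        "win_rate": win_rate,
        "profit_factor": profit_factor,
        "sharpe_ratio": sharpe_ratio,
    }
```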
#### 4. Report Validator (`services/reporting/validator.py`)
```python
from __future__ import annotations

from services.reporting.models import (
    ModelQualitySection,
    RecommendationAccuracySection,
    ReportData,
    ValidationStatus,
    ValidationWarning,
)

DISCREPANCY_THRESHOLD_PCT = 5.0


def validate_recommendation_accuracy(
    section: RecommendationAccuracySection,
    prediction_outcomes: list[dict],
) -> list[ValidationWarning]:
    """Cross-reference reported win rates with prediction_outcomes.

    Compares the computed win rate against the direction_correct/profitable
    fields from prediction_outcomes for the same tickers and period.
    Returns warnings for discrepancies greater than DISCREPANCY_THRESHOLD_PCT (5%).
    """
    ...


def validate_model_quality(
    section: ModelQualitySection,
    metric_snapshots: list[dict],
) -> list[ValidationWarning]:
    """Compare reported model quality metrics against model_metric_snapshots.

    Flags discrepancies greater than 5% between computed and snapshot values
    for win_rate, directional_accuracy, IC, ECE, and Brier score.
    """
    ...


def compute_validation_status(report: ReportData) -> ValidationStatus:
    """Determine overall validation status.

    Returns 'passed' if no section has validation warnings,
    'warnings' if any section has validation warnings.
    """
    ...
```
#### 5. AI Summarizer (`services/reporting/summarizer.py`)
```python
from __future__ import annotations

import asyncpg

from services.shared.agent_config import AgentConfigResolver

CHUNK_SIZE_LIMIT = 6000  # characters per chunk
MAX_SUMMARY_WORDS = 200  # per section summary
MAX_EXECUTIVE_SUMMARY_WORDS = 300


def chunk_data(serialized: str, max_chars: int = CHUNK_SIZE_LIMIT) -> list[str]:
    """Split serialized data into chunks of at most max_chars.

    Splits on newline boundaries to avoid breaking JSON structures.
    Each chunk is ≤ max_chars characters.
    Returns at least one chunk (even for empty input).
    """
    ...


async def summarize_section(
    pool: asyncpg.Pool,
    resolver: AgentConfigResolver,
    section_name: str,
    section_data: str,
) -> str:
    """Generate AI summary for a report section.

    1. Serialize section data to string
    2. Chunk if > CHUNK_SIZE_LIMIT
    3. Summarize each chunk via Report_Summarizer_Agent
    4. If multiple chunks, merge summaries with a final LLM call
    5. Log each invocation to agent_performance_log
    6. On failure after max_retries, fall back to deterministic summary

    Uses AgentConfigResolver to resolve agent config by slug
    'report-summarizer', then llm_factory to build the LLM client.
    """
    ...


def build_deterministic_summary(section_name: str, section_data: dict) -> str:
    """Build a fallback deterministic summary from raw metrics.

    Produces a template-based text summary when AI summarization fails.
    """
    ...


async def generate_executive_summary(
    pool: asyncpg.Pool,
    resolver: AgentConfigResolver,
    section_summaries: dict[str, str],
) -> str:
    """Generate executive summary from all section summaries.

    Concatenates section summaries, chunks if needed, and produces
    a ≤300-word synthesis via the Report_Summarizer_Agent.
    Falls back to concatenated section summaries on failure.
    """
    ...
```
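The hierarchical flow in the `summarize_section()` docstring, together with the retry and fallback rules from the error-handling tables, can be sketched independently of the agent stack. The sketch makes these assumptions:

- A hypothetical async `llm(prompt) -> str` callable stands in for the client built by `AgentConfigResolver` and `llm_factory`.
- `max_retries=2` means two retries after the first attempt.
- The prompt wording is made up.

Logging to `agent_performance_log` is omitted.

```python
import asyncio
from collections.abc import Awaitable, Callable

# Hypothetical: sends one prompt to the Report_Summarizer_Agent's LLM, returns text.
LLMCall = Callable[[str], Awaitable[str]]


async def call_with_retries(
    llm: LLMCall, prompt: str, max_retries: int = 2, timeout_s: float = 60.0
) -> str:
    """One attempt plus `max_retries` retries; empty replies count as failures."""
    last_error: Exception | None = None
    for _ in range(max_retries + 1):
        try:
            reply = await asyncio.wait_for(llm(prompt), timeout=timeout_s)
        except Exception as exc:  # includes asyncio.TimeoutError
            last_error = exc
            continue
        if reply.strip():
            return reply.strip()
        last_error = ValueError("empty LLM response")
    raise RuntimeError("LLM retries exhausted") from last_error


async def summarize_chunks(
    llm: LLMCall,
    section_name: str,
    chunks: list[str],
    fallback: str,
    max_retries: int = 2,
) -> str:
    """Summarize each chunk, then merge; degrade to fallback or concatenation."""
    try:
        partials = [
            await call_with_retries(llm, f"Summarize this {section_name} data:\n{chunk}", max_retries)
            for chunk in chunks
        ]
    except RuntimeError:
        return fallback  # deterministic template summary
    if len(partials) == 1:
        return partials[0]
    merge_prompt = "Merge these partial summaries into one section summary:\n" + "\n".join(partials)
    try:
        return await call_with_retries(llm, merge_prompt, max_retries)
    except RuntimeError:
        return "\n".join(partials)  # merge failed: join chunk summaries with newlines
```

Because the LLM call is injected as a plain callable, unit tests can exercise the fallback paths with fakes instead of a live Ollama instance.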
#### 6. Report Generator (`services/reporting/generator.py`)
```python
from __future__ import annotations
from datetime import date
import asyncpg
from services.reporting.models import ReportData, ReportType
async def generate_report(
pool: asyncpg.Pool,
report_type: ReportType,
period_start: date,
period_end: date,
) -> ReportData:
"""Orchestrate full report generation.
1. Collect data via collector
2. Build sections via section builder
3. Validate sections via validator
4. Generate AI summaries via summarizer
5. Generate executive summary
6. Assemble final ReportData
"""
...
async def store_report(
pool: asyncpg.Pool,
report: ReportData,
) -> str:
"""Store report in trading_reports table.
Uses INSERT ... ON CONFLICT (report_type, period_start, period_end)
DO UPDATE to handle regeneration of existing reports.
Returns the report UUID.
"""
...
async def process_report_job(
pool: asyncpg.Pool,
job: dict,
) -> None:
"""Process a report generation job from the Redis queue.
Deserializes job payload, calls generate_report + store_report.
Handles retries with exponential backoff (up to 3 attempts).
Rejects duplicate jobs for the same report_type + period.
"""
...
```
#### 7. API Endpoints (added to `services/api/app.py`)
| Endpoint | Method | Parameters | Returns |
|----------|--------|------------|---------|
| `GET /api/reports` | GET | `report_type`, `start_date`, `end_date`, `limit`, `offset` | Paginated list: id, report_type, period_start, period_end, validation_status, generated_at |
| `GET /api/reports/{report_id}` | GET | — | Full report including report_data JSONB |
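The sketch below shows how the list endpoint's filters could map onto a parameterized query for asyncpg, which uses `$1`, `$2`, ... placeholders. Several details are assumptions not stated in the design:

- The filter semantics: reports whose period starts on or after `start_date` and ends on or before `end_date`.
- The default page size of 50 and the cap of 100.
- The function name.

```python
from __future__ import annotations

from datetime import date

MAX_LIMIT = 100  # hypothetical cap on page size


def build_list_query(
    report_type: str | None = None,
    start_date: date | None = None,
    end_date: date | None = None,
    limit: int = 50,
    offset: int = 0,
) -> tuple[str, list[object]]:
    """Return (sql, args) for GET /api/reports; user input only goes into args."""
    clauses: list[str] = []
    args: list[object] = []
    if report_type is not None:
        args.append(report_type)
        clauses.append(f"report_type = ${len(args)}")
    if start_date is not None:
        args.append(start_date)
        clauses.append(f"period_start >= ${len(args)}")
    if end_date is not None:
        args.append(end_date)
        clauses.append(f"period_end <= ${len(args)}")
    where = f" WHERE {' AND '.join(clauses)}" if clauses else ""
    args.append(max(1, min(limit, MAX_LIMIT)))
    args.append(max(0, offset))
    sql = (
        "SELECT id, report_type, period_start, period_end, validation_status, generated_at"
        f" FROM trading_reports{where}"
        f" ORDER BY generated_at DESC LIMIT ${len(args) - 1} OFFSET ${len(args)}"
    )
    return sql, args
```

The handler would then run `await pool.fetch(sql, *args)`. Building the SQL from a fixed set of clauses keeps every user-supplied value in a bind parameter.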
#### 8. Frontend Hooks (added to `frontend/src/api/hooks.ts`)
```typescript
export interface ReportListItem {
id: string;
report_type: string;
period_start: string;
period_end: string;
validation_status: string;
generated_at: string;
}
export interface ReportDetail extends ReportListItem {
report_data: Record<string, unknown>;
created_at: string;
}
export function useReports(params?: {
report_type?: string;
start_date?: string;
end_date?: string;
limit?: number;
offset?: number;
}) {
const qs = new URLSearchParams();
if (params?.report_type) qs.set('report_type', params.report_type);
if (params?.start_date) qs.set('start_date', params.start_date);
if (params?.end_date) qs.set('end_date', params.end_date);
if (params?.limit) qs.set('limit', String(params.limit));
if (params?.offset) qs.set('offset', String(params.offset));
const path = `/api/reports${qs.toString() ? '?' + qs : ''}`;
return useGet<ReportListItem[]>(['reports', params], 'query', path);
}
export function useReport(id: string | undefined) {
return useGet<ReportDetail>(
['report', id], 'query', `/api/reports/${id}`, !!id
);
}
```
---
## Data Models
### Database Schema (Migration 038)
#### trading_reports
```sql
CREATE TABLE IF NOT EXISTS trading_reports (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
report_type VARCHAR(20) NOT NULL,
period_start DATE NOT NULL,
period_end DATE NOT NULL,
report_data JSONB NOT NULL,
validation_status VARCHAR(20) NOT NULL DEFAULT 'passed',
generated_at TIMESTAMPTZ NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
CONSTRAINT uq_trading_reports_period UNIQUE (report_type, period_start, period_end),
CONSTRAINT chk_report_type CHECK (report_type IN ('daily', 'weekly'))
);
CREATE INDEX IF NOT EXISTS idx_trading_reports_type ON trading_reports(report_type);
CREATE INDEX IF NOT EXISTS idx_trading_reports_period ON trading_reports(period_start, period_end);
CREATE INDEX IF NOT EXISTS idx_trading_reports_generated ON trading_reports(generated_at DESC);
```
#### Report Summarizer Agent Seed
```sql
INSERT INTO ai_agents (name, slug, purpose, model_provider, model_name, system_prompt, prompt_version, schema_version, temperature, max_tokens, timeout_seconds, max_retries, source)
SELECT * FROM (VALUES
(
'Report Summarizer',
'report-summarizer',
'Generates concise natural-language summaries of trading performance report sections. Processes chunked data within the 8k-token context window.',
'ollama',
'qwen3.5:9b-fast',
E'You are a concise financial performance analyst. You summarize trading performance data into clear, professional prose.\n\nSTRICT RULES:\n1. Do NOT fabricate any data not present in the input.\n2. Do NOT add opinions, predictions, or recommendations.\n3. Keep each summary under 200 words.\n4. Highlight notable trends, outliers, and changes from prior periods.\n5. Use precise numbers from the input data.\n6. Use a neutral, professional tone.\n7. Return ONLY the summary text. No JSON, no markdown, no commentary.',
'report-summarizer-v1',
'1.0.0',
0.0,
1024,
60,
2,
'system'
)
) AS v(name, slug, purpose, model_provider, model_name, system_prompt, prompt_version, schema_version, temperature, max_tokens, timeout_seconds, max_retries, source)
WHERE NOT EXISTS (SELECT 1 FROM ai_agents WHERE slug = 'report-summarizer');
```
### Report JSONB Structure
The `report_data` column stores a JSON object matching the `ReportData` Pydantic model:
```json
{
"pnl": {
"realized_pnl": 125.50,
"unrealized_pnl": -30.20,
"daily_return": 0.012,
"cumulative_return": 0.085,
"win_count": 8,
"loss_count": 3,
"win_rate": 0.727,
"profit_factor": 2.15,
"sharpe_ratio": 1.42,
"summary": "AI-generated summary...",
"validation_warnings": []
},
"recommendation_accuracy": {
"total_evaluated": 15,
"act_count": 8,
"skip_count": 7,
"acted_win_rate": 0.75,
"avg_confidence_acted": 0.72,
"avg_confidence_skipped": 0.48,
"summary": "AI-generated summary...",
"validation_warnings": []
},
"position_performance": {
"positions": [
{
"ticker": "AAPL",
"entry_price": 185.50,
"current_or_exit_price": 192.30,
"pnl": 68.00,
"pnl_pct": 3.67,
"hold_duration_hours": 72.5,
"status": "open"
}
],
"summary": "AI-generated summary..."
},
"risk_metrics": {
"current_risk_tier": "moderate",
"portfolio_heat": 0.12,
"max_drawdown": 0.08,
"current_drawdown_pct": 0.03,
"reserve_pool_balance": 450.00,
"circuit_breaker_event_count": 1,
"summary": "AI-generated summary..."
},
"model_quality": {
"windows": [
{
"lookback": "7d",
"win_rate": 0.65,
"directional_accuracy": 0.62,
"information_coefficient": 0.08,
"calibration_error": 0.12,
"brier_score": 0.22
}
],
"summary": "AI-generated summary...",
"validation_warnings": []
},
"executive_summary": "AI-generated executive summary...",
"validation_status": "passed",
"generated_at": "2025-01-15T21:30:00Z",
"period_start": "2025-01-15",
"period_end": "2025-01-15",
"report_type": "daily"
}
```
---
## Correctness Properties
*A property is a characteristic or behavior that should hold true across all valid executions of a system — essentially, a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.*
The following properties were derived from the acceptance criteria through systematic prework analysis. After reflection, 5 unique properties remain. Report section structure checks (3.1-3.5) are subsumed by the round-trip property: if a ReportData object survives serialization and deserialization, its structure is correct by construction (Pydantic enforces required fields). Validation status computation (4.4) is subsumed by the discrepancy detection property. ISO 8601 datetime formatting (8.4) is verified as part of the round-trip property, since Pydantic's JSON serialization uses ISO 8601 by default and the round-trip would fail if datetimes were mangled.
### Property 1: Chunking Round-Trip and Size Constraint
*For any* input string, splitting it into chunks with a maximum size limit SHALL produce chunks where (a) every chunk is ≤ the size limit in characters, (b) no chunk is empty (except when the input itself is empty, which produces exactly one empty chunk), and (c) concatenating all chunks in order reconstructs the original input string.
**Validates: Requirements 2.2**
### Property 2: Report Serialization Round-Trip
*For any* valid ReportData object (with valid P&L, recommendation accuracy, position performance, risk metrics, and model quality sections), serializing to JSON and then deserializing back SHALL produce a ReportData object equivalent to the original. All datetime fields in the serialized JSON SHALL be in ISO 8601 format.
**Validates: Requirements 8.1, 8.2, 8.3, 8.4**
### Property 3: Validation Discrepancy Detection Correctness
*For any* pair of computed metric value and snapshot metric value (both finite, non-negative floats), the validation function SHALL produce a warning if and only if the percentage difference exceeds 5%. The percentage difference SHALL be computed as `|computed - snapshot| / snapshot * 100` when snapshot > 0, and SHALL flag any non-zero computed value when snapshot is 0.
**Validates: Requirements 4.1, 4.2, 4.3, 4.4**
### Property 4: Recommendation Accuracy Aggregation
*For any* non-empty list of trading decisions with associated prediction outcomes (each having a boolean `direction_correct`, boolean `profitable`, and float `excess_return_vs_spy`), the computed win rate SHALL equal the count of profitable outcomes divided by total outcomes, the directional accuracy SHALL equal the count of direction-correct outcomes divided by total outcomes, and the average excess return SHALL equal the arithmetic mean of all `excess_return_vs_spy` values. Both rates SHALL lie in [0.0, 1.0], and the average SHALL be finite.
**Validates: Requirements 1.4**
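Property 4 can be written as a plain function. A Hypothesis test could use it as the oracle to compare `build_recommendation_accuracy_section()` against. The `Outcome` dataclass is a hypothetical stand-in for a `prediction_outcomes` row carrying the three fields the property names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """Hypothetical stand-in for one prediction_outcomes row."""

    direction_correct: bool
    profitable: bool
    excess_return_vs_spy: float


def aggregate_outcomes(outcomes: list[Outcome]) -> tuple[float, float, float]:
    """Return (win_rate, directional_accuracy, avg_excess_return) per Property 4."""
    if not outcomes:
        raise ValueError("Property 4 only covers non-empty inputs")
    n = len(outcomes)
    win_rate = sum(o.profitable for o in outcomes) / n  # True counts as 1
    directional_accuracy = sum(o.direction_correct for o in outcomes) / n
    avg_excess_return = sum(o.excess_return_vs_spy for o in outcomes) / n
    return win_rate, directional_accuracy, avg_excess_return
```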
### Property 5: Portfolio Period-Over-Period Delta Computation
*For any* two valid portfolio snapshots (current and previous) with non-negative portfolio_value, active_pool, reserve_pool, and finite cumulative_return, the period-over-period deltas SHALL equal (current - previous) for each field. When no previous snapshot exists, the deltas SHALL be zero.
**Validates: Requirements 1.3**
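Property 5 can be expressed as a hypothetical helper. The field names come from the property statement and are assumed to match the `portfolio_snapshots` columns.

```python
from __future__ import annotations

SNAPSHOT_FIELDS = ("portfolio_value", "active_pool", "reserve_pool", "cumulative_return")


def snapshot_deltas(
    current: dict[str, float], previous: dict[str, float] | None
) -> dict[str, float]:
    """Period-over-period change per field; all zeros when there is no prior snapshot."""
    if previous is None:
        return {field: 0.0 for field in SNAPSHOT_FIELDS}
    return {field: current[field] - previous[field] for field in SNAPSHOT_FIELDS}
```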
---
## Error Handling
### Data Collection Failures
| Scenario | Handling |
|----------|----------|
| No trading_decisions for period | Generate zero-activity report with note "No trading activity during this period" |
| No portfolio_snapshot for period | Use most recent snapshot before period_start; if none exists, use zero values |
| No prediction_outcomes for period | Skip recommendation accuracy validation; set validation_warnings noting missing data |
| No model_metric_snapshots for period | Model quality section shows NULL values for all metrics |
| Database connection failure during collection | Propagate error to job processor for retry |
### AI Summarization Failures
| Scenario | Handling |
|----------|----------|
| LLM timeout (>60s) | Retry up to max_retries (from agent config, default 2) |
| LLM returns empty response | Treat as failure, retry |
| LLM returns response > 200 words | Truncate to 200 words at sentence boundary |
| All LLM retries exhausted | Fall back to deterministic template summary |
| AgentConfigResolver returns None (agent not found) | Log error, use deterministic summary for all sections |
| Chunk merge LLM call fails | Use concatenation of chunk summaries (joined with newlines) |
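The "truncate to 200 words at sentence boundary" rule could be implemented as follows. The sketch assumes a sentence ends at `.`, `!`, or `?` followed by whitespace or end of text, so decimals such as `0.727` are not split. If no boundary exists, it falls back to a hard word cut. Once truncation happens, whitespace is normalized to single spaces.

```python
import re

# A sentence ends at ., ! or ? followed by whitespace or end of text,
# so decimals like "0.727" are not treated as boundaries.
_SENTENCE_END = re.compile(r"[.!?](?=\s|$)")


def truncate_at_sentence(text: str, max_words: int = 200) -> str:
    """Keep at most max_words words, cutting after the last complete sentence."""
    words = text.split()
    if len(words) <= max_words:
        return text.strip()
    clipped = " ".join(words[:max_words])  # whitespace collapses to single spaces
    ends = [m.end() for m in _SENTENCE_END.finditer(clipped)]
    # Cut after the last complete sentence; hard word cut if there is none.
    return clipped[: ends[-1]] if ends else clipped
```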
### Validation Edge Cases
| Scenario | Handling |
|----------|----------|
| Snapshot value is 0 and computed value is non-zero | Flag as warning with pct_difference = 100.0 |
| Both snapshot and computed values are 0 | No warning (0% difference) |
| Snapshot value is NULL | Skip validation for that metric, no warning |
| Computed value is NaN or infinity | Replace with 0.0, log warning |
| No prediction_outcomes to cross-reference | Skip recommendation accuracy validation entirely |
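The sketch below is a single-metric check that follows Property 3 and the edge-case table above. `DiscrepancyWarning` is a stdlib stand-in for the Pydantic `ValidationWarning` model. Two choices in it are assumptions:

- The denominator uses `abs(snapshot)`, so negative metrics such as IC are handled.
- The percentage is rounded to six decimals, so floating-point error cannot push a difference that is exactly 5% in decimal terms over the threshold.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

DISCREPANCY_THRESHOLD_PCT = 5.0


@dataclass(frozen=True)
class DiscrepancyWarning:
    """Stdlib stand-in for the ValidationWarning Pydantic model."""

    field_name: str
    computed_value: float
    snapshot_value: float
    pct_difference: float


def check_discrepancy(
    field_name: str, computed: float, snapshot: float | None
) -> DiscrepancyWarning | None:
    if snapshot is None:
        return None  # NULL snapshot: skip this metric, no warning
    if not math.isfinite(computed):
        computed = 0.0  # NaN/inf replaced with 0.0 (real code would also log)
    if snapshot == 0:
        if computed == 0:
            return None  # both zero: 0% difference
        pct = 100.0  # zero snapshot, non-zero computed value
    else:
        pct = round(abs(computed - snapshot) / abs(snapshot) * 100, 6)
    if pct <= DISCREPANCY_THRESHOLD_PCT:
        return None  # only differences strictly above 5% are flagged
    return DiscrepancyWarning(field_name, computed, snapshot, pct)
```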
### Report Storage Failures
| Scenario | Handling |
|----------|----------|
| Unique constraint violation on insert | Use ON CONFLICT DO UPDATE to upsert |
| JSONB serialization failure | Log error with report structure, propagate to job processor |
| Report exceeds PostgreSQL JSONB size limit (~255 MB) | Extremely unlikely given report structure; log error if it occurs |
### Job Processing Failures
| Scenario | Handling |
|----------|----------|
| Job fails on first attempt | Retry with exponential backoff: 30s, 60s, 120s |
| Job fails after 3 retries | Mark job as failed, log error with full context |
| Duplicate job submitted for same period | Reject with log message, return without error |
| Redis connection failure | Job stays in queue, picked up on reconnection |
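The retry and dedup rows above can be sketched with an injected `sleep`, which makes the backoff schedule testable without waiting. The sketch relies on several assumptions:

- The payload keys.
- The in-memory `seen` set.
- Releasing the key after a permanent failure.

A multi-worker deployment would more likely hold the dedup key in Redis, for example with `SET key value NX EX ttl`.

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Callable

BACKOFF_SECONDS = (30, 60, 120)  # delays before retries 1, 2, 3


def dedup_key(job: dict) -> str:
    # Hypothetical payload keys: one job per report_type + period.
    return f"{job['report_type']}:{job['period_start']}:{job['period_end']}"


async def run_with_backoff(
    task: Callable[[], Awaitable[None]],
    delays: tuple[int, ...] = BACKOFF_SECONDS,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> bool:
    """Run task once, then retry after each delay; False when every attempt fails."""
    for attempt in range(len(delays) + 1):
        try:
            await task()
            return True
        except Exception:
            if attempt == len(delays):
                return False  # retries exhausted: caller marks the job failed
            await sleep(delays[attempt])
    return False


async def process_job(
    job: dict,
    task: Callable[[], Awaitable[None]],
    seen: set[str],
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> str:
    key = dedup_key(job)
    if key in seen:
        return "duplicate"  # reject quietly, no error raised
    seen.add(key)
    if await run_with_backoff(task, sleep=sleep):
        return "done"
    seen.discard(key)  # allow a later resubmission after a permanent failure
    return "failed"
```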
---
## Testing Strategy
### Property-Based Tests (Hypothesis)
Property-based tests use the Hypothesis library with `@settings(max_examples=100)`. Test files are prefixed `test_pbt_*` per project convention.
| Property | Test File | What It Tests |
|----------|-----------|---------------|
| Property 1: Chunking Round-Trip | `tests/test_pbt_report_chunking.py` | `chunk_data()` preserves content and respects size limits |
| Property 2: Report Serialization Round-Trip | `tests/test_pbt_report_serialization.py` | `ReportData.model_dump_json()` → `ReportData.model_validate_json()` round-trip |
| Property 3: Validation Discrepancy Detection | `tests/test_pbt_report_validation.py` | Discrepancy detection correctly flags >5% differences |
| Property 4: Recommendation Accuracy Aggregation | `tests/test_pbt_report_sections.py` | `build_recommendation_accuracy_section()` computes correct aggregates |
| Property 5: Portfolio Delta Computation | `tests/test_pbt_report_sections.py` | `build_pnl_section()` computes correct period-over-period deltas |
Each property test is tagged with a comment referencing the design property:
```python
# Feature: trading-feedback-engine, Property 1: Chunking round-trip and size constraint
```
### Unit Tests (pytest)
| Test File | Coverage |
|-----------|----------|
| `tests/test_report_sections.py` | Section builders with known inputs, edge cases (empty data, single position, zero-activity) |
| `tests/test_report_validator.py` | Specific discrepancy scenarios, boundary cases (exactly 5%), NULL snapshot values |
| `tests/test_report_summarizer.py` | Deterministic fallback summary, chunk splitting edge cases (empty input, single char) |
| `tests/test_report_models.py` | Pydantic model validation, enum constraints, default values |
| `tests/test_report_generator.py` | Orchestration with mocked dependencies, zero-activity report, upsert behavior |
### Integration Tests
| Test File | Coverage |
|-----------|----------|
| `tests/test_report_api.py` | API endpoints with seeded database, pagination, filtering by report_type and date range |
| `tests/test_report_storage.py` | Store/retrieve round-trip against real asyncpg pool, upsert behavior, unique constraint |
### Frontend Tests (Vitest)
| Test File | Coverage |
|-----------|----------|
| `frontend/src/test/reports.test.ts` | useReports and useReport hooks with MSW mocks, loading/error states |
### Test Configuration
- Python PBT: Hypothesis with `@settings(max_examples=100)`, files prefixed `test_pbt_*`
- Python unit/integration: pytest with pytest-asyncio for async code
- Frontend: Vitest with MSW for deterministic API mocking
- Lint: `ruff check services/` before all commits
- CI: Woodpecker runs all tests automatically on push to Gitea
@@ -0,0 +1,117 @@
# Requirements Document
## Introduction
The Trading Feedback Engine generates periodic performance reports from the Stonks Oracle trading system. Reports cover trading P&L, recommendation accuracy, position performance, risk metrics, and model quality trends. An AI agent (registered in the `ai_agents` table) summarizes sections of the report by processing data in small chunks that fit within the 8k-token context window. Reports are validated against live data from the prediction outcomes and model metric snapshots tables, stored in the database for retrieval, and exposed via API endpoints.
## Glossary
- **Feedback_Engine**: The backend service that orchestrates report generation, data collection, AI summarization, and report storage.
- **Report_Summarizer_Agent**: The AI agent registered in the `ai_agents` table that generates natural-language summaries for report sections. Uses the existing `AgentConfigResolver` and `llm_factory` infrastructure.
- **Report**: A structured JSON document containing trading performance metrics, AI-generated summaries, and validation data for a specific period (daily or weekly).
- **Report_Section**: A self-contained portion of a report (e.g., P&L summary, recommendation accuracy, position performance) that can be independently generated and summarized.
- **Chunk**: A subset of data rows small enough to fit within the 8k-token context window when serialized, allowing the Report_Summarizer_Agent to process it in a single LLM call.
- **Portfolio_Snapshot**: A daily record in the `portfolio_snapshots` table containing portfolio value, pool balances, returns, win/loss counts, Sharpe ratio, max drawdown, and risk tier.
- **Prediction_Outcome**: A record in the `prediction_outcomes` table containing realized returns, direction correctness, and excess returns vs benchmarks for a prediction at a specific horizon.
- **Model_Metric_Snapshot**: A record in the `model_metric_snapshots` table containing aggregate model quality metrics (win rate, IC, ECE, Brier score) for a lookback/horizon combination.
- **Trading_Decision**: A record in the `trading_decisions` table capturing the act/skip decision, skip reason, position sizing, risk tier, circuit breaker status, and decision trace for a recommendation evaluation.
- **Validation_Data**: Live data from `prediction_outcomes`, `model_metric_snapshots`, and `signal_evidence_links` used to cross-check report claims against actual measured performance.
- **Query_API**: The existing FastAPI service (`services/api/app.py`) that serves HTTP endpoints for the dashboard and external consumers.
## Requirements
### Requirement 1: Report Data Collection
**User Story:** As a trader, I want the feedback engine to collect all relevant trading data for a reporting period, so that reports reflect the complete picture of trading activity.
#### Acceptance Criteria
1. WHEN a report generation is triggered for a date range, THE Feedback_Engine SHALL query trading_decisions, orders, positions, portfolio_snapshots, recommendations, prediction_outcomes, and model_metric_snapshots for that period.
2. WHEN collecting trading decision data, THE Feedback_Engine SHALL include the decision type, skip reason, ticker, computed position size, risk tier, circuit breaker status, and correlation check result for each Trading_Decision.
3. WHEN collecting portfolio data, THE Feedback_Engine SHALL retrieve the most recent Portfolio_Snapshot within the reporting period and compute period-over-period changes in portfolio value, active pool, reserve pool, and cumulative return.
4. WHEN collecting recommendation accuracy data, THE Feedback_Engine SHALL join recommendations with Prediction_Outcomes to compute win rate, directional accuracy, and average excess return vs SPY for the period.
5. IF no trading_decisions exist for the requested period, THEN THE Feedback_Engine SHALL generate a report with zero-activity sections and a note indicating no trading occurred.
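Criteria 3 and 4 reduce to simple arithmetic once the rows are collected. Below is a minimal sketch. It assumes prediction outcomes have already been joined to the period's recommendations. `Outcome`, `accuracy_metrics`, and `portfolio_deltas` are hypothetical names, not the collector's real interface. Returning zeros for an empty period matches the zero-activity report in criterion 5.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Outcome:
    # Hypothetical row shape mirroring the prediction_outcomes fields above.
    direction_correct: bool
    profitable: bool
    excess_return_vs_spy: float


def accuracy_metrics(outcomes: list[Outcome]) -> dict[str, float]:
    """Win rate, directional accuracy, and mean excess return vs SPY.

    Returns zeros when there are no outcomes (zero-activity period).
    """
    total = len(outcomes)
    if total == 0:
        return {"win_rate": 0.0, "directional_accuracy": 0.0, "avg_excess_return_vs_spy": 0.0}
    return {
        "win_rate": sum(o.profitable for o in outcomes) / total,
        "directional_accuracy": sum(o.direction_correct for o in outcomes) / total,
        "avg_excess_return_vs_spy": sum(o.excess_return_vs_spy for o in outcomes) / total,
    }


def portfolio_deltas(
    current: dict[str, float], previous: Optional[dict[str, float]]
) -> dict[str, float]:
    """Period-over-period change per field; 0.0 when no previous snapshot exists."""
    fields = ("portfolio_value", "active_pool", "reserve_pool", "cumulative_return")
    if previous is None:
        return {f: 0.0 for f in fields}
    return {f: current[f] - previous[f] for f in fields}
```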
### Requirement 2: Chunked AI Summarization
**User Story:** As a trader, I want AI-generated summaries in my reports, so that I can quickly understand performance trends without reading raw numbers.
#### Acceptance Criteria
1. THE Report_Summarizer_Agent SHALL be registered in the `ai_agents` table with slug `report-summarizer`, model `qwen3.5:9b-fast`, and source `system`.
2. WHEN generating a summary for a Report_Section, THE Feedback_Engine SHALL serialize the section data into Chunks of no more than 6,000 characters each to stay within the 8k-token context window.
3. WHEN a Report_Section contains data that exceeds a single Chunk, THE Feedback_Engine SHALL split the data into multiple Chunks, summarize each Chunk independently, and then produce a final merged summary from the individual Chunk summaries.
4. WHEN invoking the Report_Summarizer_Agent, THE Feedback_Engine SHALL use the existing `AgentConfigResolver` and `llm_factory` infrastructure to resolve model configuration and build the LLM client.
5. WHEN invoking the Report_Summarizer_Agent, THE Feedback_Engine SHALL log each invocation to the `agent_performance_log` table with agent_id, success status, duration_ms, and token estimates.
6. IF the Report_Summarizer_Agent fails after max_retries, THEN THE Feedback_Engine SHALL fall back to a deterministic text summary built from the raw metrics and continue report generation.
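Criteria 2, 3, and 6 describe a chunk, map, merge, and fallback pipeline. The sketch below uses a plain callable `llm(prompt) -> str` in place of the client built through `AgentConfigResolver` and `llm_factory`. It omits retries and `agent_performance_log` writes. `summarize_chunked` and its `max_merge_rounds` guard are hypothetical. `chunk_data` prefers newline boundaries but hard-splits any line longer than the limit. As a result, every chunk respects the size cap and the chunks concatenate back to the exact input.

```python
from typing import Callable

CHUNK_SIZE_LIMIT = 6000


def chunk_data(serialized: str, max_chars: int = CHUNK_SIZE_LIMIT) -> list[str]:
    """Split text into chunks of at most max_chars, preferring newline boundaries.

    Lines longer than max_chars are hard-split. Joining the chunks reproduces
    the input exactly; empty input yields [""].
    """
    if max_chars < 1:
        raise ValueError("max_chars must be >= 1")
    if not serialized:
        return [""]
    chunks: list[str] = []
    current = ""
    for line in serialized.splitlines(keepends=True):
        # Hard-split any single line that cannot fit in one chunk.
        while len(line) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:max_chars])
            line = line[max_chars:]
        if len(current) + len(line) > max_chars:
            chunks.append(current)
            current = line
        else:
            current += line
    if current:
        chunks.append(current)
    return chunks


def summarize_chunked(
    section_name: str,
    serialized: str,
    llm: Callable[[str], str],
    fallback: Callable[[str, str], str],
    max_chars: int = CHUNK_SIZE_LIMIT,
    max_merge_rounds: int = 3,
) -> str:
    """Summarize each chunk, merge the partial summaries, fall back on any error."""
    try:
        partials = [llm(chunk) for chunk in chunk_data(serialized, max_chars)]
        rounds = 0
        while len(partials) > 1:
            rounds += 1
            if rounds > max_merge_rounds:
                raise RuntimeError("chunk summaries did not converge")
            # Merge pass: re-chunk the joined summaries in case they exceed the limit.
            partials = [llm(chunk) for chunk in chunk_data("\n".join(partials), max_chars)]
        return partials[0]
    except Exception:
        # Any LLM failure (timeout, bad response) yields the deterministic summary.
        return fallback(section_name, serialized)
```

The bounded merge loop re-chunks the joined partial summaries if they themselves exceed the limit. If the model does not shorten its input, the loop gives up and returns the deterministic summary instead of looping forever.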
### Requirement 3: Report Structure and Content
**User Story:** As a trader, I want reports to cover P&L, recommendation accuracy, position performance, risk metrics, and model quality, so that I have a comprehensive view of system performance.
#### Acceptance Criteria
1. THE Report SHALL contain a P&L section with realized P&L, unrealized P&L, daily return, cumulative return, win count, loss count, win rate, profit factor, and Sharpe ratio for the reporting period.
2. THE Report SHALL contain a recommendation accuracy section with total recommendations evaluated, act/skip breakdown, win rate of acted-upon recommendations, and average confidence of acted vs skipped recommendations.
3. THE Report SHALL contain a position performance section listing each position held during the period with ticker, entry price, current or exit price, unrealized or realized P&L, P&L percentage, and hold duration.
4. THE Report SHALL contain a risk metrics section with current risk tier, portfolio heat, max drawdown, current drawdown percentage, reserve pool balance, and a count of circuit breaker events during the period.
5. THE Report SHALL contain a model quality section with the latest Model_Metric_Snapshot values for win rate, directional accuracy, information coefficient, calibration error (ECE), and Brier score across the 7d, 30d, and 90d lookback windows.
6. THE Report SHALL contain an AI-generated executive summary that synthesizes the key findings from all sections into a concise narrative of no more than 300 words.
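Most of the P&L section is derived from closed-position P&L. Below is a minimal sketch of the counting rules. It assumes break-even trades count as neither wins nor losses and that profit factor is reported as `None` when there are no losing trades. The spec fixes neither convention. `PnLStats`, `pnl_stats`, and `cap_words` are hypothetical names. `cap_words` shows one way to enforce the 300-word limit in criterion 6 if the model overruns it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PnLStats:
    win_count: int
    loss_count: int
    win_rate: float
    profit_factor: Optional[float]  # None when there are no losing trades (assumed)


def pnl_stats(realized_pnls: list[float]) -> PnLStats:
    """Win/loss counts, win rate, and profit factor (gross profit / gross loss)."""
    wins = [p for p in realized_pnls if p > 0]
    losses = [p for p in realized_pnls if p < 0]  # break-even trades are excluded
    decided = len(wins) + len(losses)
    gross_loss = -sum(losses)
    return PnLStats(
        win_count=len(wins),
        loss_count=len(losses),
        win_rate=len(wins) / decided if decided else 0.0,
        profit_factor=sum(wins) / gross_loss if gross_loss > 0 else None,
    )


def cap_words(text: str, max_words: int = 300) -> str:
    """Truncate text to at most max_words whitespace-separated words."""
    words = text.split()
    return text if len(words) <= max_words else " ".join(words[:max_words])
```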
### Requirement 4: Report Validation Against Live Data
**User Story:** As a trader, I want report metrics to be cross-checked against live validation data, so that I can trust the accuracy of the reported numbers.
#### Acceptance Criteria
1. WHEN generating the recommendation accuracy section, THE Feedback_Engine SHALL cross-reference reported win rates with the `direction_correct` and `profitable` fields from Prediction_Outcomes for the same tickers and period.
2. WHEN generating the model quality section, THE Feedback_Engine SHALL compare the reported metrics against the most recent Model_Metric_Snapshot records and flag discrepancies greater than 5% between computed and snapshot values.
3. WHEN a validation discrepancy is detected, THE Feedback_Engine SHALL include a `validation_warnings` array in the report section with the field name, computed value, snapshot value, and percentage difference.
4. THE Report SHALL include a `validation_status` field set to `passed` when no discrepancies exceed 5%, or `warnings` when one or more discrepancies are detected.
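The 5% rule in criteria 2 through 4, combined with the edge cases listed in task 4.1, can be sketched as a single comparison function. The sketch takes the percentage relative to the absolute snapshot value, so negative metrics such as IC behave sensibly. That choice is an assumption. The warning fields follow criterion 3, but the names are illustrative. A difference of exactly 5% does not produce a warning.

```python
import math
from dataclasses import dataclass
from typing import Optional

DISCREPANCY_THRESHOLD_PCT = 5.0


@dataclass
class ValidationWarning:
    field: str
    computed_value: float
    snapshot_value: float
    pct_difference: float


def check_discrepancy(
    field: str, computed: float, snapshot: Optional[float]
) -> Optional[ValidationWarning]:
    """Return a warning when computed and snapshot differ by more than 5%."""
    if snapshot is None:  # NULL snapshot value: nothing to compare against
        return None
    if math.isnan(computed):  # NaN computed values are treated as 0.0
        computed = 0.0
    if snapshot == 0:
        if computed == 0:
            return None
        pct = 100.0  # zero snapshot with non-zero computed counts as 100%
    else:
        pct = abs(computed - snapshot) / abs(snapshot) * 100.0
    if pct <= DISCREPANCY_THRESHOLD_PCT:
        return None
    return ValidationWarning(field, computed, snapshot, round(pct, 4))


def validation_status(warnings: list[ValidationWarning]) -> str:
    """'passed' when no warnings were raised, otherwise 'warnings'."""
    return "warnings" if warnings else "passed"
```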
### Requirement 5: Report Storage and Retrieval
**User Story:** As a trader, I want reports stored in the database and accessible via API, so that I can review historical performance at any time.
#### Acceptance Criteria
1. THE Feedback_Engine SHALL store each generated Report as a row in a `trading_reports` table with columns for id (UUID), report_type (daily/weekly), period_start (DATE), period_end (DATE), report_data (JSONB), validation_status (VARCHAR), generated_at (TIMESTAMPTZ), and created_at (TIMESTAMPTZ).
2. THE Feedback_Engine SHALL enforce a unique constraint on (report_type, period_start, period_end) to prevent duplicate reports for the same period.
3. WHEN a report for an existing period is regenerated, THE Feedback_Engine SHALL update the existing row with the new report_data, validation_status, and generated_at timestamp.
4. THE Query_API SHALL expose a `GET /api/reports` endpoint that returns a paginated list of reports with id, report_type, period_start, period_end, validation_status, and generated_at.
5. THE Query_API SHALL expose a `GET /api/reports/{report_id}` endpoint that returns the full report including report_data JSONB.
6. THE Query_API SHALL support filtering reports by report_type and date range via query parameters on the `GET /api/reports` endpoint.
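Criteria 2, 3, and 6 map onto two SQL statements. The sketch assumes asyncpg-style `$n` placeholders. It interprets the date-range filter as matching reports whose period falls entirely within `[start_date, end_date]`, which is also an assumption. `build_list_query` and `UPSERT_SQL` are hypothetical names, and `report_data` is assumed to be passed as a JSON string.

```python
from datetime import date
from typing import Any, Optional

_LIST_COLUMNS = "id, report_type, period_start, period_end, validation_status, generated_at"

# Upsert relying on the UNIQUE (report_type, period_start, period_end) constraint.
UPSERT_SQL = """
INSERT INTO trading_reports
    (report_type, period_start, period_end, report_data, validation_status, generated_at)
VALUES ($1, $2, $3, $4::jsonb, $5, $6)
ON CONFLICT (report_type, period_start, period_end) DO UPDATE
SET report_data = EXCLUDED.report_data,
    validation_status = EXCLUDED.validation_status,
    generated_at = EXCLUDED.generated_at
RETURNING id
"""


def build_list_query(
    report_type: Optional[str] = None,
    start_date: Optional[date] = None,
    end_date: Optional[date] = None,
    limit: int = 20,
    offset: int = 0,
) -> tuple[str, list[Any]]:
    """Build a parameterized SELECT for GET /api/reports."""
    clauses: list[str] = []
    args: list[Any] = []
    if report_type is not None:
        args.append(report_type)
        clauses.append(f"report_type = ${len(args)}")
    if start_date is not None:
        args.append(start_date)
        clauses.append(f"period_start >= ${len(args)}")
    if end_date is not None:
        args.append(end_date)
        clauses.append(f"period_end <= ${len(args)}")
    where = f" WHERE {' AND '.join(clauses)}" if clauses else ""
    args.extend([limit, offset])
    sql = (
        f"SELECT {_LIST_COLUMNS} FROM trading_reports{where} "
        f"ORDER BY period_start DESC LIMIT ${len(args) - 1} OFFSET ${len(args)}"
    )
    return sql, args
```

Because `RETURNING id` also fires on the `DO UPDATE` branch, regenerating a report returns the id of the existing row rather than a new one. Filter values are bound as parameters and never interpolated into the SQL text.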
### Requirement 6: Periodic Report Generation
**User Story:** As a trader, I want reports generated automatically on a daily and weekly schedule, so that I always have up-to-date performance feedback.
#### Acceptance Criteria
1. THE Feedback_Engine SHALL generate a daily report on each trading day after market close (after 16:30 ET) covering that trading day.
2. THE Feedback_Engine SHALL generate a weekly report on Saturday covering the Monday-through-Friday trading week.
3. WHEN a scheduled report generation is triggered, THE Feedback_Engine SHALL enqueue a report generation job on a Redis queue for asynchronous processing.
4. IF a report generation job fails, THEN THE Feedback_Engine SHALL retry the job up to 3 times with exponential backoff before marking the job as failed.
5. WHILE a report generation job is in progress for a given period, THE Feedback_Engine SHALL reject duplicate job submissions for the same report_type and period.
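The scheduling rules in criteria 2, 4, and 5 come down to three small helpers. The sketch rests on a few assumptions. The 30/60/120-second schedule is taken from task 6.1, and the dedup key format is hypothetical. `InFlightGuard` is an in-memory stand-in for a shared lock such as a Redis `SET key value NX EX <ttl>`, which a multi-process deployment would need instead.

```python
from datetime import date, timedelta


def weekly_period(run_date: date) -> tuple[date, date]:
    """Monday-Friday of the week ending on the most recent Friday (inclusive of run_date)."""
    # weekday(): Monday == 0 ... Friday == 4 ... Sunday == 6
    days_since_friday = (run_date.weekday() - 4) % 7
    friday = run_date - timedelta(days=days_since_friday)
    return friday - timedelta(days=4), friday


def backoff_delays(base_seconds: int = 30, retries: int = 3) -> list[int]:
    """Exponential backoff schedule; the defaults give 30, 60, 120 seconds."""
    return [base_seconds * 2**i for i in range(retries)]


def job_dedup_key(report_type: str, period_start: date, period_end: date) -> str:
    """Hypothetical key identifying one report job."""
    return f"report_job:{report_type}:{period_start.isoformat()}:{period_end.isoformat()}"


class InFlightGuard:
    """In-memory stand-in for a shared lock; rejects duplicate in-progress jobs."""

    def __init__(self) -> None:
        self._active: set[str] = set()

    def try_acquire(self, key: str) -> bool:
        if key in self._active:
            return False
        self._active.add(key)
        return True

    def release(self, key: str) -> None:
        self._active.discard(key)
```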
### Requirement 7: Agent Registration and Editability
**User Story:** As a trader, I want the report summarizer agent registered in the ai_agents table, so that I can edit its prompts, model, and parameters through the existing agent management API.
#### Acceptance Criteria
1. THE Feedback_Engine SHALL register the Report_Summarizer_Agent in the `ai_agents` table via a database migration with slug `report-summarizer`, source `system`, model_provider `ollama`, and model_name `qwen3.5:9b-fast`.
2. THE Report_Summarizer_Agent system prompt SHALL instruct the model to produce concise financial performance summaries, avoid fabricating data not present in the input, and keep each summary under 200 words.
3. THE Report_Summarizer_Agent SHALL support variant creation and activation through the existing agent variants system, allowing A/B testing of different summarization prompts.
4. WHEN the Report_Summarizer_Agent configuration is updated via the agent management API, THE Feedback_Engine SHALL pick up the new configuration within 60 seconds via the `AgentConfigResolver` TTL cache.
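Criterion 4 relies on a time-based cache. The class below is an illustrative TTL cache, not the actual `AgentConfigResolver`. It takes an injectable clock so the 60-second behaviour can be checked without sleeping. `TTLConfigCache` and its `fetch` callable are hypothetical.

```python
import time
from typing import Any, Callable, Optional


class TTLConfigCache:
    """Re-fetches an entry once it is older than ttl_seconds.

    An edited agent config therefore becomes visible within one TTL window.
    """

    def __init__(
        self,
        fetch: Callable[[str], Any],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, Any]] = {}

    def get(self, slug: str) -> Any:
        now = self._clock()
        cached: Optional[tuple[float, Any]] = self._entries.get(slug)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # still fresh
        value = self._fetch(slug)  # expired or missing: reload from the source
        self._entries[slug] = (now, value)
        return value
```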
### Requirement 8: Report Serialization Round-Trip
**User Story:** As a developer, I want report data to survive serialization and deserialization without data loss, so that stored reports are always faithful to the generated content.
#### Acceptance Criteria
1. THE Feedback_Engine SHALL serialize Report objects to JSON for storage in the `report_data` JSONB column.
2. THE Feedback_Engine SHALL deserialize stored JSON back into Report objects for API responses.
3. FOR ALL valid Report objects, serializing to JSON then deserializing back SHALL produce an equivalent Report object (round-trip property).
4. THE Feedback_Engine SHALL use ISO 8601 format for all datetime fields in serialized reports.
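The plan implements this with Pydantic (`model_dump_json()` and `model_validate_json()`, task 8.3). To show the property without a third-party dependency, the sketch below uses the standard library and a trimmed, hypothetical `MiniReport` dataclass. Dates and datetimes are encoded with `isoformat()` and decoded with `fromisoformat()`.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date, datetime, timezone


@dataclass
class MiniReport:
    # Hypothetical, trimmed-down stand-in for ReportData.
    report_type: str
    period_start: date
    period_end: date
    generated_at: datetime
    executive_summary: str


def to_json(report: MiniReport) -> str:
    data = asdict(report)
    for key in ("period_start", "period_end", "generated_at"):
        data[key] = data[key].isoformat()  # ISO 8601 strings
    return json.dumps(data)


def from_json(raw: str) -> MiniReport:
    data = json.loads(raw)
    return MiniReport(
        report_type=data["report_type"],
        period_start=date.fromisoformat(data["period_start"]),
        period_end=date.fromisoformat(data["period_end"]),
        generated_at=datetime.fromisoformat(data["generated_at"]),
        executive_summary=data["executive_summary"],
    )


report = MiniReport(
    "daily", date(2025, 3, 7), date(2025, 3, 7),
    datetime(2025, 3, 7, 21, 0, tzinfo=timezone.utc), "Flat day.",
)
assert from_json(to_json(report)) == report  # round-trip property
```

Storing `generated_at` as a timezone-aware UTC datetime keeps the offset in the ISO 8601 string, so the decoded value compares equal to the original.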
@@ -0,0 +1,195 @@
# Implementation Plan: Trading Feedback Engine
## Overview
Add a periodic trading performance reporting system to Stonks Oracle. The system collects trading data, generates structured JSON reports with AI-powered summaries, validates metrics against live data, and stores reports for retrieval via API. Implementation follows the four-phase approach from the design: foundation → validation & AI → generator & API → scheduling & tests.
## Tasks
- [x] 1. Database migration 038 — trading_reports table and report-summarizer agent
- [x] 1.1 Create `infra/migrations/038_trading_reports.sql`
- Create `trading_reports` table with columns: id (UUID PK, gen_random_uuid()), report_type (VARCHAR(20) NOT NULL), period_start (DATE NOT NULL), period_end (DATE NOT NULL), report_data (JSONB NOT NULL), validation_status (VARCHAR(20) NOT NULL DEFAULT 'passed'), generated_at (TIMESTAMPTZ NOT NULL), created_at (TIMESTAMPTZ NOT NULL DEFAULT NOW())
- Add UNIQUE constraint on (report_type, period_start, period_end)
- Add CHECK constraint: report_type IN ('daily', 'weekly')
- Create indexes: idx_trading_reports_type, idx_trading_reports_period, idx_trading_reports_generated
- Seed Report_Summarizer_Agent into ai_agents table with slug 'report-summarizer', model_provider 'ollama', model_name 'qwen3.5:9b-fast', source 'system', temperature 0.0, max_tokens 1024, timeout_seconds 60, max_retries 2
- Use WHERE NOT EXISTS guard on agent insert to be idempotent
- _Requirements: 5.1, 5.2, 7.1, 7.2_
- [x] 1.2 Add `QUEUE_REPORT_GENERATION` constant to `services/shared/redis_keys.py`
- Add `QUEUE_REPORT_GENERATION = "report_generation"` following existing queue naming convention
- _Requirements: 6.3_
- [x] 2. Phase 1 — Report models, data collector, and section builders
- [x] 2.1 Create report models (`services/reporting/models.py`)
- Create `services/reporting/__init__.py`
- Define enums: ReportType (daily, weekly), ValidationStatus (passed, warnings)
- Define Pydantic models: ValidationWarning, PLSection, RecommendationAccuracySection, PositionDetail, PositionPerformanceSection, RiskMetricsSection, ModelQualityWindow, ModelQualitySection, ReportData
- ReportData includes all sections, executive_summary, validation_status, generated_at, period_start, period_end, report_type
- _Requirements: 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 8.1, 8.2, 8.4_
- [x] 2.2 Implement data collector (`services/reporting/collector.py`)
- Define CollectedData dataclass with fields: trading_decisions, orders, open_positions, closed_positions, portfolio_snapshot, previous_portfolio_snapshot, recommendations, prediction_outcomes, model_metric_snapshots, circuit_breaker_events, reserve_pool_balance
- Implement `collect_report_data(pool, period_start, period_end)` → CollectedData
- Query trading_decisions, orders, positions (open + closed), portfolio_snapshots (current + previous), recommendations, prediction_outcomes, model_metric_snapshots, circuit_breaker_events, reserve_pool_ledger for the period
- Return empty lists for tables with no data (zero-activity case)
- Use `_row_dict()` pattern for UUID conversion from asyncpg rows
- _Requirements: 1.1, 1.2, 1.3, 1.4, 1.5_
- [x] 2.3 Implement section builders (`services/reporting/sections.py`)
- Implement `build_pnl_section(data: CollectedData) -> PLSection` — compute realized/unrealized P&L, daily return, cumulative return, win/loss counts, win rate, profit factor, Sharpe ratio from portfolio_snapshot and closed positions
- Implement `build_recommendation_accuracy_section(data: CollectedData) -> RecommendationAccuracySection` — join trading_decisions with prediction_outcomes, compute act/skip breakdown, win rate of acted, avg confidence acted vs skipped
- Implement `build_position_performance_section(data: CollectedData) -> PositionPerformanceSection` — list each position with ticker, entry price, current/exit price, P&L, P&L%, hold duration
- Implement `build_risk_metrics_section(data: CollectedData) -> RiskMetricsSection` — extract risk tier, portfolio heat, max drawdown, current drawdown %, reserve pool balance, circuit breaker event count
- Implement `build_model_quality_section(data: CollectedData) -> ModelQualitySection` — extract model_metric_snapshot values for 7d, 30d, 90d lookback windows
- Handle zero-activity gracefully (zero values, empty lists)
- _Requirements: 1.3, 1.4, 3.1, 3.2, 3.3, 3.4, 3.5_
- [x] 3. Checkpoint — Verify foundation modules
- Ensure all tests pass; ask the user if questions arise.
- Run `.venv/bin/ruff check services/reporting/`
- Run `.venv/bin/python -m pytest tests/ -x --tb=short -q -k "report"` to verify models and section builders
- [x] 4. Phase 2 — Report validator and AI summarizer
- [x] 4.1 Implement report validator (`services/reporting/validator.py`)
- Define `DISCREPANCY_THRESHOLD_PCT = 5.0`
- Implement `validate_recommendation_accuracy(section, prediction_outcomes)` → list[ValidationWarning] — compare computed win rate against direction_correct/profitable from prediction_outcomes, flag >5% discrepancies
- Implement `validate_model_quality(section, metric_snapshots)` → list[ValidationWarning] — compare reported metrics against model_metric_snapshots for win_rate, directional_accuracy, IC, ECE, Brier score, flag >5% discrepancies
- Implement `compute_validation_status(report: ReportData)` → ValidationStatus — return 'passed' if no warnings, 'warnings' if any section has validation_warnings
- Handle edge cases: snapshot=0 with computed≠0 → 100% difference; both=0 → no warning; snapshot=NULL → skip; computed=NaN → replace with 0.0
- _Requirements: 4.1, 4.2, 4.3, 4.4_
- [x] 4.2 Implement AI summarizer (`services/reporting/summarizer.py`)
- Define constants: CHUNK_SIZE_LIMIT=6000, MAX_SUMMARY_WORDS=200, MAX_EXECUTIVE_SUMMARY_WORDS=300
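- Implement `chunk_data(serialized: str, max_chars: int)` → list[str]: split on newline boundaries where possible and hard-split any single line longer than max_chars; each chunk ≤ max_chars, concatenation of chunks reproduces the input, at least one chunk returned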
- Implement `summarize_section(pool, resolver, section_name, section_data)` → str — serialize, chunk if needed, summarize each chunk via Report_Summarizer_Agent (resolved by slug 'report-summarizer'), merge if multiple chunks, log to agent_performance_log, fall back to deterministic on failure
- Implement `build_deterministic_summary(section_name, section_data)` → str — template-based fallback summary from raw metrics
- Implement `generate_executive_summary(pool, resolver, section_summaries)` → str — concatenate section summaries, chunk if needed, produce ≤300-word synthesis, fall back to concatenation on failure
- Use AgentConfigResolver + llm_factory for LLM access
- Log each invocation to agent_performance_log with agent_id, success, duration_ms, token estimates
- _Requirements: 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 3.6_
- [x] 5. Checkpoint — Verify validator and summarizer
- Ensure all tests pass; ask the user if questions arise.
- Run `.venv/bin/ruff check services/reporting/`
- Run `.venv/bin/python -m pytest tests/ -x --tb=short -q -k "report"` to verify validator and summarizer
- [x] 6. Phase 3 — Report generator orchestrator and API endpoints
- [x] 6.1 Implement report generator (`services/reporting/generator.py`)
- Implement `generate_report(pool, report_type, period_start, period_end)` → ReportData — orchestrate: collect data → build sections → validate → summarize → assemble ReportData
- Implement `store_report(pool, report)` → str (UUID) — INSERT ... ON CONFLICT (report_type, period_start, period_end) DO UPDATE for upsert, return report id
- Implement `process_report_job(pool, job: dict)` → None: deserialize job payload, call generate_report + store_report, retry failures up to 3 times with exponential backoff (30s, 60s, 120s) before marking the job failed, reject duplicate jobs for same report_type + period
- _Requirements: 5.1, 5.2, 5.3, 6.3, 6.4, 6.5_
- [x] 6.2 Add API endpoints to `services/api/app.py`
- Add `GET /api/reports` — paginated list with query params: report_type, start_date, end_date, limit (default 20), offset (default 0); returns id, report_type, period_start, period_end, validation_status, generated_at
- Add `GET /api/reports/{report_id}` — full report including report_data JSONB
- Use asyncpg pool from existing app state
- Return 404 for non-existent report_id
- _Requirements: 5.4, 5.5, 5.6_
- [x] 6.3 Add frontend hooks to `frontend/src/api/hooks.ts`
- Add `ReportListItem` and `ReportDetail` TypeScript interfaces
- Implement `useReports(params?)` hook — builds query string from report_type, start_date, end_date, limit, offset; uses `useGet` with 'query' base
- Implement `useReport(id)` hook — fetches single report by id, enabled only when id is defined
- _Requirements: 5.4, 5.5_
- [x] 7. Checkpoint — Verify generator and API
- Ensure all tests pass; ask the user if questions arise.
- Run `.venv/bin/ruff check services/`
- Run `.venv/bin/python -m pytest tests/ -x --tb=short -q -k "report"` to verify generator and API endpoints
- [x] 8. Phase 4 — Scheduling, property-based tests, unit tests, and frontend tests
- [x] 8.1 Wire Redis queue integration and scheduler
- Add report generation job consumer to the scheduler service that listens on `stonks:queue:report_generation`
- Add daily report trigger (after 16:30 ET on trading days) and weekly report trigger (Saturday) to the scheduler
- Job payload: `{"report_type": "daily"|"weekly", "period_start": "YYYY-MM-DD", "period_end": "YYYY-MM-DD"}`
- _Requirements: 6.1, 6.2, 6.3, 6.4, 6.5_
- [x] 8.2 Write property test: Chunking Round-Trip and Size Constraint
- **Property 1: Chunking Round-Trip and Size Constraint**
- File: `tests/test_pbt_report_chunking.py`
- Use Hypothesis `@settings(max_examples=100)` with a single `@given(st.text(), st.integers(min_value=1, max_value=10000))` decorator (Hypothesis does not allow applying `@given` more than once to a test)
- Assert: every chunk ≤ max_chars, no empty chunks (except empty input → one empty chunk), concatenation of chunks == original input
- **Validates: Requirements 2.2**
- [x] 8.3 Write property test: Report Serialization Round-Trip
- **Property 2: Report Serialization Round-Trip**
- File: `tests/test_pbt_report_serialization.py`
- Use Hypothesis with custom strategies for ReportData (valid PLSection, RecommendationAccuracySection, etc.)
- Assert: `ReportData.model_validate_json(report.model_dump_json())` == original report
- Assert: all datetime fields in serialized JSON are ISO 8601 format
- **Validates: Requirements 8.1, 8.2, 8.3, 8.4**
- [x] 8.4 Write property test: Validation Discrepancy Detection Correctness
- **Property 3: Validation Discrepancy Detection Correctness**
- File: `tests/test_pbt_report_validation.py`
- Use Hypothesis with `@given(st.floats(min_value=0, max_value=1e6), st.floats(min_value=0, max_value=1e6))`
- Assert: warning iff |computed - snapshot| / snapshot * 100 > 5 (when snapshot > 0); flag any non-zero computed when snapshot == 0; no warning when both == 0
- **Validates: Requirements 4.1, 4.2, 4.3, 4.4**
- [x] 8.5 Write property test: Recommendation Accuracy Aggregation
- **Property 4: Recommendation Accuracy Aggregation**
- File: `tests/test_pbt_report_sections.py`
- Use Hypothesis with lists of trading decisions + prediction outcomes (direction_correct bool, profitable bool, excess_return_vs_spy float)
- Assert: win_rate == count(profitable) / total, directional_accuracy == count(direction_correct) / total, avg excess return == mean(excess_return_vs_spy), all rates in [0.0, 1.0], and all values 0.0 when the input lists are empty
- **Validates: Requirements 1.4**
- [x] 8.6 Write property test: Portfolio Period-Over-Period Delta Computation
- **Property 5: Portfolio Period-Over-Period Delta Computation**
- File: `tests/test_pbt_report_sections.py`
- Use Hypothesis with two portfolio snapshots (non-negative portfolio_value, active_pool, reserve_pool, finite cumulative_return)
- Assert: deltas == (current - previous) for each field; when no previous snapshot, deltas == 0
- **Validates: Requirements 1.3**
- [x] 8.7 Write unit tests for section builders
- File: `tests/test_report_sections.py`
- Test each section builder with known inputs and expected outputs
- Test edge cases: empty data (zero-activity), single position, no portfolio snapshot
- _Requirements: 3.1, 3.2, 3.3, 3.4, 3.5_
- [x] 8.8 Write unit tests for report validator
- File: `tests/test_report_validator.py`
- Test specific discrepancy scenarios: exactly 5% (no warning), 5.1% (warning), snapshot=0 computed≠0, both=0, NULL snapshot
- _Requirements: 4.1, 4.2, 4.3, 4.4_
- [x] 8.9 Write unit tests for AI summarizer
- File: `tests/test_report_summarizer.py`
- Test deterministic fallback summary generation
- Test chunk_data edge cases: empty input, single character, exactly at limit, one char over limit
- _Requirements: 2.2, 2.6_
- [x] 8.10 Write unit tests for report generator
- File: `tests/test_report_generator.py`
- Test orchestration with mocked dependencies (collector, sections, validator, summarizer)
- Test zero-activity report generation
- Test upsert behavior (regeneration of existing report)
- _Requirements: 5.1, 5.2, 5.3_
- [x] 8.11 Write API integration tests
- File: `tests/test_report_api.py`
- Test GET /api/reports with pagination, filtering by report_type and date range
- Test GET /api/reports/{report_id} with valid and invalid IDs
- _Requirements: 5.4, 5.5, 5.6_
- [x] 8.12 Write frontend hook tests
- File: `frontend/src/test/reports.test.ts`
- Test useReports and useReport hooks with MSW mocks
- Test loading and error states
- _Requirements: 5.4, 5.5_
- [x] 9. Final checkpoint — Full test suite and lint
- Ensure all tests pass; ask the user if questions arise.
- Run `.venv/bin/ruff check services/`
- Run `.venv/bin/python -m pytest tests/ -x --tb=short -q -k "report"`
- Run frontend tests: `cd frontend && npx vitest --run`
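Before calling `generate_report`, `process_report_job` needs to validate the queued payload from task 8.1. Below is a minimal sketch. It assumes the payload arrives as a JSON string. `ReportJob` and `parse_report_job` are hypothetical names.

```python
import json
from dataclasses import dataclass
from datetime import date

VALID_REPORT_TYPES = {"daily", "weekly"}


@dataclass(frozen=True)
class ReportJob:
    report_type: str
    period_start: date
    period_end: date


def parse_report_job(raw: str) -> ReportJob:
    """Validate a queued payload: known report_type, ISO dates, start <= end."""
    payload = json.loads(raw)
    report_type = payload["report_type"]
    if report_type not in VALID_REPORT_TYPES:
        raise ValueError(f"unknown report_type: {report_type!r}")
    start = date.fromisoformat(payload["period_start"])
    end = date.fromisoformat(payload["period_end"])
    if start > end:
        raise ValueError("period_start must not be after period_end")
    return ReportJob(report_type, start, end)
```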
## Notes
- Tasks marked with `*` are optional and can be skipped for faster MVP
- Each task references specific requirements for traceability
- Checkpoints ensure incremental validation after each phase
- Property tests validate the 5 universal correctness properties from the design document
- Unit tests validate specific examples and edge cases
- The design document contains full interface signatures — use those as the implementation guide
- Always run `.venv/bin/ruff check services/` before committing Python changes