# Implementation Plan: Trading Feedback Engine

## Overview

Add a periodic trading performance reporting system to Stonks Oracle. The system collects trading data, generates structured JSON reports with AI-powered summaries, validates metrics against live data, and stores reports for retrieval via API.

Implementation follows the four-phase approach from the design: foundation → validation & AI → generator & API → scheduling & tests.

## Tasks

- [x] 1. Database migration 038: trading_reports table and report-summarizer agent
  - [x] 1.1 Create `infra/migrations/038_trading_reports.sql`
    - Create `trading_reports` table with columns: id (UUID PK, gen_random_uuid()), report_type (VARCHAR(20) NOT NULL), period_start (DATE NOT NULL), period_end (DATE NOT NULL), report_data (JSONB NOT NULL), validation_status (VARCHAR(20) NOT NULL DEFAULT 'passed'), generated_at (TIMESTAMPTZ NOT NULL), created_at (TIMESTAMPTZ NOT NULL DEFAULT NOW())
    - Add UNIQUE constraint on (report_type, period_start, period_end)
    - Add CHECK constraint: report_type IN ('daily', 'weekly')
    - Create indexes: idx_trading_reports_type, idx_trading_reports_period, idx_trading_reports_generated
    - Seed Report_Summarizer_Agent into the ai_agents table with slug 'report-summarizer', model_provider 'ollama', model_name 'qwen3.5:9b-fast', source 'system', temperature 0.0, max_tokens 1024, timeout_seconds 60, max_retries 2
    - Use a WHERE NOT EXISTS guard on the agent insert so the migration is idempotent
    - _Requirements: 5.1, 5.2, 7.1, 7.2_
  - [x] 1.2 Add `QUEUE_REPORT_GENERATION` constant to `services/shared/redis_keys.py`
    - Add `QUEUE_REPORT_GENERATION = "report_generation"` following the existing queue naming convention
    - _Requirements: 6.3_
- [x] 2. Phase 1: Report models, data collector, and section builders
  - [x] 2.1 Create report models (`services/reporting/models.py`)
    - Create `services/reporting/__init__.py`
    - Define enums: ReportType (daily, weekly), ValidationStatus (passed, warnings)
    - Define Pydantic models: ValidationWarning, PLSection, RecommendationAccuracySection, PositionDetail, PositionPerformanceSection, RiskMetricsSection, ModelQualityWindow, ModelQualitySection, ReportData
    - ReportData includes all sections, executive_summary, validation_status, generated_at, period_start, period_end, report_type
    - _Requirements: 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 8.1, 8.2, 8.4_
  - [x] 2.2 Implement data collector (`services/reporting/collector.py`)
    - Define CollectedData dataclass with fields: trading_decisions, orders, open_positions, closed_positions, portfolio_snapshot, previous_portfolio_snapshot, recommendations, prediction_outcomes, model_metric_snapshots, circuit_breaker_events, reserve_pool_balance
    - Implement `collect_report_data(pool, period_start, period_end)` → CollectedData
    - Query trading_decisions, orders, positions (open + closed), portfolio_snapshots (current + previous), recommendations, prediction_outcomes, model_metric_snapshots, circuit_breaker_events, and reserve_pool_ledger for the period
    - Return empty lists for tables with no data (zero-activity case)
    - Use the `_row_dict()` pattern for UUID conversion from asyncpg rows
    - _Requirements: 1.1, 1.2, 1.3, 1.4, 1.5_
  - [x] 2.3 Implement section builders (`services/reporting/sections.py`)
    - Implement `build_pnl_section(data: CollectedData) -> PLSection`: compute realized/unrealized P&L, daily return, cumulative return, win/loss counts, win rate, profit factor, and Sharpe ratio from portfolio_snapshot and closed positions
    - Implement `build_recommendation_accuracy_section(data: CollectedData) -> RecommendationAccuracySection`: join trading_decisions with prediction_outcomes; compute the act/skip breakdown, win rate of acted decisions, and average confidence of acted vs. skipped decisions
    - Implement `build_position_performance_section(data: CollectedData) -> PositionPerformanceSection`: list each position with ticker, entry price, current/exit price, P&L, P&L %, and hold duration
    - Implement `build_risk_metrics_section(data: CollectedData) -> RiskMetricsSection`: extract risk tier, portfolio heat, max drawdown, current drawdown %, reserve pool balance, and circuit breaker event count
    - Implement `build_model_quality_section(data: CollectedData) -> ModelQualitySection`: extract model_metric_snapshot values for the 7d, 30d, and 90d lookback windows
    - Handle zero activity gracefully (zero values, empty lists)
    - _Requirements: 1.3, 1.4, 3.1, 3.2, 3.3, 3.4, 3.5_
- [x] 3. Checkpoint: Verify foundation modules
  - Ensure all tests pass; ask the user if questions arise.
  - Run `.venv/bin/ruff check services/reporting/`
  - Run `.venv/bin/python -m pytest tests/ -x --tb=short -q -k "report"` to verify models and section builders
- [x] 4. Phase 2: Report validator and AI summarizer
  - [x] 4.1 Implement report validator (`services/reporting/validator.py`)
    - Define `DISCREPANCY_THRESHOLD_PCT = 5.0`
    - Implement `validate_recommendation_accuracy(section, prediction_outcomes)` → list[ValidationWarning]: compare the computed win rate against direction_correct/profitable from prediction_outcomes and flag discrepancies >5%
    - Implement `validate_model_quality(section, metric_snapshots)` → list[ValidationWarning]: compare reported metrics against model_metric_snapshots for win_rate, directional_accuracy, IC, ECE, and Brier score, and flag discrepancies >5%
    - Implement `compute_validation_status(report: ReportData)` → ValidationStatus: return 'passed' if no section has validation_warnings, otherwise 'warnings'
    - Handle edge cases: snapshot = 0 with computed ≠ 0 → 100% difference; both = 0 → no warning; snapshot = NULL → skip; computed = NaN → replace with 0.0
    - _Requirements: 4.1, 4.2, 4.3, 4.4_
  - [x] 4.2 Implement AI summarizer (`services/reporting/summarizer.py`)
    - Define constants: CHUNK_SIZE_LIMIT=6000, MAX_SUMMARY_WORDS=200, MAX_EXECUTIVE_SUMMARY_WORDS=300
    - Implement `chunk_data(serialized: str, max_chars: int)` → list[str]: split on newline boundaries, hard-splitting any single line longer than max_chars, so that every chunk is ≤ max_chars and at least one chunk is returned
    - Implement `summarize_section(pool, resolver, section_name, section_data)` → str: serialize the section, chunk it if needed, summarize each chunk via Report_Summarizer_Agent (resolved by slug 'report-summarizer'), merge the summaries if there are multiple chunks, log to agent_performance_log, and fall back to the deterministic summary on failure
    - Implement `build_deterministic_summary(section_name, section_data)` → str: template-based fallback summary built from raw metrics
    - Implement `generate_executive_summary(pool, resolver, section_summaries)` → str: concatenate the section summaries, chunk if needed, produce a ≤300-word synthesis, and fall back to plain concatenation on failure
    - Use AgentConfigResolver + llm_factory for LLM access
    - Log each invocation to agent_performance_log with agent_id, success, duration_ms, and token estimates
    - _Requirements: 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 3.6_
- [x] 5. Checkpoint: Verify validator and summarizer
  - Ensure all tests pass; ask the user if questions arise.
  - Run `.venv/bin/ruff check services/reporting/`
  - Run `.venv/bin/python -m pytest tests/ -x --tb=short -q -k "report"` to verify validator and summarizer
- [x] 6. Phase 3: Report generator orchestrator and API endpoints
  - [x] 6.1 Implement report generator (`services/reporting/generator.py`)
    - Implement `generate_report(pool, report_type, period_start, period_end)` → ReportData: orchestrate collect data → build sections → validate → summarize → assemble ReportData
    - Implement `store_report(pool, report)` → str (UUID): upsert via `INSERT ... ON CONFLICT (report_type, period_start, period_end) DO UPDATE` and return the report id
    - Implement `process_report_job(pool, job: dict)` → None: deserialize the job payload, call generate_report + store_report, handle retries with exponential backoff (30s, 60s, 120s, up to 3 attempts), and reject duplicate jobs for the same report_type + period
    - _Requirements: 5.1, 5.2, 5.3, 6.3, 6.4, 6.5_
  - [x] 6.2 Add API endpoints to `services/api/app.py`
    - Add `GET /api/reports`: paginated list with query params report_type, start_date, end_date, limit (default 20), and offset (default 0); returns id, report_type, period_start, period_end, validation_status, generated_at
    - Add `GET /api/reports/{report_id}`: full report including the report_data JSONB
    - Use the asyncpg pool from existing app state
    - Return 404 for a non-existent report_id
    - _Requirements: 5.4, 5.5, 5.6_
  - [x] 6.3 Add frontend hooks to `frontend/src/api/hooks.ts`
    - Add `ReportListItem` and `ReportDetail` TypeScript interfaces
    - Implement `useReports(params?)` hook: builds a query string from report_type, start_date, end_date, limit, offset; uses `useGet` with 'query' base
    - Implement `useReport(id)` hook: fetches a single report by id, enabled only when id is defined
    - _Requirements: 5.4, 5.5_
- [x] 7. Checkpoint: Verify generator and API
  - Ensure all tests pass; ask the user if questions arise.
  - Run `.venv/bin/ruff check services/`
  - Run `.venv/bin/python -m pytest tests/ -x --tb=short -q -k "report"` to verify generator and API endpoints
- [x] 8. Phase 4: Scheduling, property-based tests, unit tests, and frontend tests
  - [x] 8.1 Wire Redis queue integration and scheduler
    - Add a report generation job consumer to the scheduler service that listens on `stonks:queue:report_generation`
    - Add a daily report trigger (after 16:30 ET on trading days) and a weekly report trigger (Saturday) to the scheduler
    - Job payload: `{"report_type": "daily"|"weekly", "period_start": "YYYY-MM-DD", "period_end": "YYYY-MM-DD"}`
    - _Requirements: 6.1, 6.2, 6.3, 6.4, 6.5_
  - [x] 8.2 Write property test: Chunking Round-Trip and Size Constraint
    - **Property 1: Chunking Round-Trip and Size Constraint**
    - File: `tests/test_pbt_report_chunking.py`
    - Use Hypothesis `@settings(max_examples=100)` with a single `@given(st.text(), st.integers(min_value=1, max_value=10000))` (Hypothesis rejects applying `@given` more than once to the same test)
    - Assert: every chunk ≤ max_chars, no empty chunks (except empty input → one empty chunk), and concatenation of the chunks == original input
    - **Validates: Requirements 2.2**
  - [x] 8.3 Write property test: Report Serialization Round-Trip
    - **Property 2: Report Serialization Round-Trip**
    - File: `tests/test_pbt_report_serialization.py`
    - Use Hypothesis with custom strategies for ReportData (valid PLSection, RecommendationAccuracySection, etc.), generating finite floats only, since NaN never compares equal to itself
    - Assert: `ReportData.model_validate_json(report.model_dump_json()) == report`
    - Assert: all datetime fields in the serialized JSON are in ISO 8601 format
    - **Validates: Requirements 8.1, 8.2, 8.3, 8.4**
  - [x] 8.4 Write property test: Validation Discrepancy Detection Correctness
    - **Property 3: Validation Discrepancy Detection Correctness**
    - File: `tests/test_pbt_report_validation.py`
    - Use Hypothesis with `@given(st.floats(min_value=0, max_value=1e6), st.floats(min_value=0, max_value=1e6))`
    - Assert: a warning is raised iff `abs(computed - snapshot) / snapshot * 100 > DISCREPANCY_THRESHOLD_PCT` (5.0) when snapshot > 0; any non-zero computed value is flagged when snapshot == 0; no warning when both == 0
    - **Validates: Requirements 4.1, 4.2, 4.3, 4.4**
  - [x] 8.5 Write property test: Recommendation Accuracy Aggregation
    - **Property 4: Recommendation Accuracy Aggregation**
    - File: `tests/test_pbt_report_sections.py`
    - Use Hypothesis with lists of trading decisions + prediction outcomes (direction_correct bool, profitable bool, excess_return_vs_spy finite float)
    - Assert: win_rate == count(profitable) / total, directional_accuracy == count(direction_correct) / total, average excess return ≈ mean(excess_return_vs_spy) (within float tolerance), and all rates are in [0.0, 1.0] (0.0 when total == 0)
    - **Validates: Requirements 1.4**
  - [x] 8.6 Write property test: Portfolio Period-Over-Period Delta Computation
    - **Property 5: Portfolio Period-Over-Period Delta Computation**
    - File: `tests/test_pbt_report_sections.py`
    - Use Hypothesis with two portfolio snapshots (non-negative portfolio_value, active_pool, reserve_pool; finite cumulative_return)
    - Assert: deltas == (current - previous) for each field; when there is no previous snapshot, deltas == 0
    - **Validates: Requirements 1.3**
  - [x] 8.7 Write unit tests for section builders
    - File: `tests/test_report_sections.py`
    - Test each section builder with known inputs and expected outputs
    - Test edge cases: empty data (zero activity), single position, no portfolio snapshot
    - _Requirements: 3.1, 3.2, 3.3, 3.4, 3.5_
  - [x] 8.8 Write unit tests for report validator
    - File: `tests/test_report_validator.py`
    - Test specific discrepancy scenarios: exactly 5% (no warning), 5.1% (warning), snapshot = 0 with computed ≠ 0, both = 0, NULL snapshot
    - _Requirements: 4.1, 4.2, 4.3, 4.4_
  - [x] 8.9 Write unit tests for AI summarizer
    - File: `tests/test_report_summarizer.py`
    - Test deterministic fallback summary generation
    - Test chunk_data edge cases: empty input, single character, exactly at the limit, one character over the limit
    - _Requirements: 2.2, 2.6_
  - [x] 8.10 Write unit tests for report generator
    - File: `tests/test_report_generator.py`
    - Test orchestration with mocked dependencies (collector, sections, validator, summarizer)
    - Test zero-activity report generation
    - Test upsert behavior (regeneration of an existing report)
    - _Requirements: 5.1, 5.2, 5.3_
  - [x] 8.11 Write API integration tests
    - File: `tests/test_report_api.py`
    - Test GET /api/reports with pagination and filtering by report_type and date range
    - Test GET /api/reports/{report_id} with valid and invalid IDs
    - _Requirements: 5.4, 5.5, 5.6_
  - [x] 8.12 Write frontend hook tests
    - File: `frontend/src/test/reports.test.ts`
    - Test the useReports and useReport hooks with MSW mocks
    - Test loading and error states
    - _Requirements: 5.4, 5.5_
- [x] 9. Final checkpoint: Full test suite and lint
  - Ensure all tests pass; ask the user if questions arise.
  - Run `.venv/bin/ruff check services/`
  - Run `.venv/bin/python -m pytest tests/ -x --tb=short -q -k "report"`
  - Run frontend tests: `cd frontend && npx vitest --run`

## Notes

- Tasks marked with `*` are optional and can be skipped for a faster MVP
- Each task references specific requirements for traceability
- Checkpoints ensure incremental validation after each phase
- Property tests validate the 5 universal correctness properties from the design document
- Unit tests validate specific examples and edge cases
- The design document contains full interface signatures; use those as the implementation guide
- Always run `.venv/bin/ruff check services/` before committing Python changes
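
For reference, the `chunk_data` contract from task 4.2 can be sketched so that it satisfies Property 1 (task 8.2). This is an illustration under stated assumptions, not the code in `services/reporting/summarizer.py`: it assumes each chunk keeps its lines' trailing newlines (so joining the chunks reproduces the input exactly) and that any single line longer than `max_chars` is hard-split. `_iter_lines` is a hypothetical helper introduced only for this sketch.

```python
from collections.abc import Iterator


def _iter_lines(text: str) -> Iterator[str]:
    """Hypothetical helper: yield the lines of `text`, each keeping its trailing newline."""
    start = 0
    while start < len(text):
        end = text.find("\n", start)
        if end == -1:
            yield text[start:]  # final line without a newline
            return
        yield text[start : end + 1]
        start = end + 1


def chunk_data(serialized: str, max_chars: int) -> list[str]:
    """Sketch: pack whole lines into chunks of at most `max_chars` characters.

    Invariants (Property 1): every chunk has len <= max_chars, no chunk is
    empty unless the input is empty, and "".join(chunks) == serialized.
    """
    if max_chars < 1:
        raise ValueError("max_chars must be >= 1")
    if not serialized:
        return [""]  # empty input -> exactly one empty chunk

    chunks: list[str] = []
    current = ""
    for line in _iter_lines(serialized):
        # A line too long for any chunk is hard-split into max_chars pieces,
        # after flushing whatever is pending so the order is preserved.
        while len(line) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:max_chars])
            line = line[max_chars:]
        # Start a new chunk when the (remaining) line would overflow the current one.
        if len(current) + len(line) > max_chars:
            chunks.append(current)
            current = line
        else:
            current += line
    if current:
        chunks.append(current)
    return chunks
```

With this sketch, `chunk_data("a\nb\nc", 4)` returns `["a\nb\n", "c"]` and `chunk_data("abcdefg", 3)` returns `["abc", "def", "g"]`. The hard-split branch is what lets the size constraint hold for arbitrary `st.text()` input; a splitter that only breaks on newlines would fail Property 1 whenever a single line exceeds `max_chars`.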