# Implementation Plan: Trading Feedback Engine

## Overview

Add a periodic trading performance reporting system to Stonks Oracle. The system collects trading data, generates structured JSON reports with AI-powered summaries, validates metrics against live data, and stores reports for retrieval via API. Implementation follows the four-phase approach from the design: foundation → validation & AI → generator & API → scheduling & tests.

## Tasks

- [x] 1. Database migration 038 — trading_reports table and report-summarizer agent
  - [x] 1.1 Create `infra/migrations/038_trading_reports.sql`
    - Create `trading_reports` table with columns: id (UUID PK, gen_random_uuid()), report_type (VARCHAR(20) NOT NULL), period_start (DATE NOT NULL), period_end (DATE NOT NULL), report_data (JSONB NOT NULL), validation_status (VARCHAR(20) NOT NULL DEFAULT 'passed'), generated_at (TIMESTAMPTZ NOT NULL), created_at (TIMESTAMPTZ NOT NULL DEFAULT NOW())
    - Add UNIQUE constraint on (report_type, period_start, period_end)
    - Add CHECK constraint: report_type IN ('daily', 'weekly')
    - Create indexes: idx_trading_reports_type, idx_trading_reports_period, idx_trading_reports_generated
    - Seed Report_Summarizer_Agent into ai_agents table with slug 'report-summarizer', model_provider 'ollama', model_name 'qwen3.5:9b-fast', source 'system', temperature 0.0, max_tokens 1024, timeout_seconds 60, max_retries 2
    - Use WHERE NOT EXISTS guard on agent insert to be idempotent
    - _Requirements: 5.1, 5.2, 7.1, 7.2_

  - [x] 1.2 Add `QUEUE_REPORT_GENERATION` constant to `services/shared/redis_keys.py`
    - Add `QUEUE_REPORT_GENERATION = "report_generation"` following existing queue naming convention
    - _Requirements: 6.3_

- [x] 2. Phase 1 — Report models, data collector, and section builders
  - [x] 2.1 Create report models (`services/reporting/models.py`)
    - Create `services/reporting/__init__.py`
    - Define enums: ReportType (daily, weekly), ValidationStatus (passed, warnings)
    - Define Pydantic models: ValidationWarning, PLSection, RecommendationAccuracySection, PositionDetail, PositionPerformanceSection, RiskMetricsSection, ModelQualityWindow, ModelQualitySection, ReportData
    - ReportData includes all sections, executive_summary, validation_status, generated_at, period_start, period_end, report_type
    - _Requirements: 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 8.1, 8.2, 8.4_

  - [x] 2.2 Implement data collector (`services/reporting/collector.py`)
    - Define CollectedData dataclass with fields: trading_decisions, orders, open_positions, closed_positions, portfolio_snapshot, previous_portfolio_snapshot, recommendations, prediction_outcomes, model_metric_snapshots, circuit_breaker_events, reserve_pool_balance
    - Implement `collect_report_data(pool, period_start, period_end)` → CollectedData
    - Query trading_decisions, orders, positions (open + closed), portfolio_snapshots (current + previous), recommendations, prediction_outcomes, model_metric_snapshots, circuit_breaker_events, reserve_pool_ledger for the period
    - Return empty lists for tables with no data (zero-activity case)
    - Use the `_row_dict()` pattern for UUID conversion from asyncpg rows
    - _Requirements: 1.1, 1.2, 1.3, 1.4, 1.5_

  - [x] 2.3 Implement section builders (`services/reporting/sections.py`)
    - Implement `build_pnl_section(data: CollectedData) -> PLSection` — compute realized/unrealized P&L, daily return, cumulative return, win/loss counts, win rate, profit factor, Sharpe ratio from portfolio_snapshot and closed positions
    - Implement `build_recommendation_accuracy_section(data: CollectedData) -> RecommendationAccuracySection` — join trading_decisions with prediction_outcomes, compute act/skip breakdown, win rate of acted, avg confidence acted vs skipped
    - Implement `build_position_performance_section(data: CollectedData) -> PositionPerformanceSection` — list each position with ticker, entry price, current/exit price, P&L, P&L%, hold duration
    - Implement `build_risk_metrics_section(data: CollectedData) -> RiskMetricsSection` — extract risk tier, portfolio heat, max drawdown, current drawdown %, reserve pool balance, circuit breaker event count
    - Implement `build_model_quality_section(data: CollectedData) -> ModelQualitySection` — extract model_metric_snapshot values for 7d, 30d, 90d lookback windows
    - Handle zero-activity gracefully (zero values, empty lists)
    - _Requirements: 1.3, 1.4, 3.1, 3.2, 3.3, 3.4, 3.5_
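
Here is a minimal sketch of the win/loss, profit factor, and Sharpe arithmetic that `build_pnl_section` needs. It assumes closed-position P&L values and daily returns have already been pulled out of `CollectedData` as plain floats. `compute_pnl_stats` and `PnLStats` are hypothetical names. The following conventions are assumptions, not taken from the design:

- break-even trades count as neither wins nor losses;
- profit factor is `None` when there are no losing trades;
- Sharpe is annualized with a 0% risk-free rate over 252 trading days.

```python
import math
import statistics
from dataclasses import dataclass
from typing import Optional


@dataclass
class PnLStats:
    """Hypothetical container for the P&L figures a PLSection would carry."""
    wins: int
    losses: int
    win_rate: float
    profit_factor: Optional[float]
    sharpe_ratio: float


def compute_pnl_stats(closed_pnls: list[float], daily_returns: list[float]) -> PnLStats:
    # Assumption: a trade with P&L == 0 is neither a win nor a loss.
    wins = sum(1 for p in closed_pnls if p > 0)
    losses = sum(1 for p in closed_pnls if p < 0)
    decided = wins + losses
    # Zero-activity case: report 0.0 instead of dividing by zero.
    win_rate = wins / decided if decided else 0.0

    gross_profit = sum(p for p in closed_pnls if p > 0)
    gross_loss = -sum(p for p in closed_pnls if p < 0)
    # Assumption: profit factor is undefined (None) without any losing trades.
    profit_factor = gross_profit / gross_loss if gross_loss > 0 else None

    # Assumption: annualized Sharpe, 0% risk-free rate, 252 trading days.
    sharpe = 0.0
    if len(daily_returns) >= 2:
        sd = statistics.stdev(daily_returns)
        if sd > 0:
            sharpe = statistics.mean(daily_returns) / sd * math.sqrt(252)

    return PnLStats(wins, losses, win_rate, profit_factor, sharpe)
```

Returning neutral values (0.0, `None`) rather than raising keeps the zero-activity requirement in task 2.3 simple for the callers.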

- [x] 3. Checkpoint — Verify foundation modules
  - Ensure all tests pass; ask the user if questions arise.
  - Run `.venv/bin/ruff check services/reporting/`
  - Run `.venv/bin/python -m pytest tests/ -x --tb=short -q -k "report"` to verify models and section builders
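
Properties 4 and 5 (tasks 8.5 and 8.6) pin down two computations behind the task 2.3 builders. The sketch below covers both. It assumes outcomes and snapshots arrive as simple Python objects rather than asyncpg rows. `Outcome`, `aggregate_accuracy`, `snapshot_deltas`, and the 0.0 fallback for an empty outcome list are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Outcome:
    """Hypothetical stand-in for one prediction_outcomes row joined to a decision."""
    direction_correct: bool
    profitable: bool
    excess_return_vs_spy: float


def aggregate_accuracy(outcomes: list[Outcome]) -> dict[str, float]:
    """Rates as stated in Property 4; all zeros for the zero-activity case (assumed)."""
    total = len(outcomes)
    if total == 0:
        return {"win_rate": 0.0, "directional_accuracy": 0.0, "avg_excess_return": 0.0}
    return {
        # sum() over bools counts the True values, so each rate stays in [0.0, 1.0]
        "win_rate": sum(o.profitable for o in outcomes) / total,
        "directional_accuracy": sum(o.direction_correct for o in outcomes) / total,
        "avg_excess_return": sum(o.excess_return_vs_spy for o in outcomes) / total,
    }


SNAPSHOT_FIELDS = ("portfolio_value", "active_pool", "reserve_pool", "cumulative_return")


def snapshot_deltas(
    current: dict[str, float], previous: Optional[dict[str, float]]
) -> dict[str, float]:
    """Period-over-period deltas as stated in Property 5; zeros with no previous snapshot."""
    if previous is None:
        return {name: 0.0 for name in SNAPSHOT_FIELDS}
    return {name: current[name] - previous[name] for name in SNAPSHOT_FIELDS}
```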

- [x] 4. Phase 2 — Report validator and AI summarizer
  - [x] 4.1 Implement report validator (`services/reporting/validator.py`)
    - Define `DISCREPANCY_THRESHOLD_PCT = 5.0`
    - Implement `validate_recommendation_accuracy(section, prediction_outcomes)` → list[ValidationWarning] — compare computed win rate against direction_correct/profitable from prediction_outcomes, flag >5% discrepancies
    - Implement `validate_model_quality(section, metric_snapshots)` → list[ValidationWarning] — compare reported metrics against model_metric_snapshots for win_rate, directional_accuracy, IC, ECE, Brier score, flag >5% discrepancies
    - Implement `compute_validation_status(report: ReportData)` → ValidationStatus — return 'passed' if no warnings, 'warnings' if any section has validation_warnings
    - Handle edge cases: snapshot=0 with computed≠0 → 100% difference; both=0 → no warning; snapshot=NULL → skip; computed=NaN → replace with 0.0
    - _Requirements: 4.1, 4.2, 4.3, 4.4_

  - [x] 4.2 Implement AI summarizer (`services/reporting/summarizer.py`)
    - Define constants: CHUNK_SIZE_LIMIT=6000, MAX_SUMMARY_WORDS=200, MAX_EXECUTIVE_SUMMARY_WORDS=300
    - Implement `chunk_data(serialized: str, max_chars: int)` → list[str] — split on newline boundaries, hard-splitting any single line longer than max_chars; each chunk ≤ max_chars, at least one chunk returned
    - Implement `summarize_section(pool, resolver, section_name, section_data)` → str — serialize, chunk if needed, summarize each chunk via Report_Summarizer_Agent (resolved by slug 'report-summarizer'), merge if multiple chunks, log to agent_performance_log, fall back to deterministic on failure
    - Implement `build_deterministic_summary(section_name, section_data)` → str — template-based fallback summary from raw metrics
    - Implement `generate_executive_summary(pool, resolver, section_summaries)` → str — concatenate section summaries, chunk if needed, produce ≤300-word synthesis, fall back to concatenation on failure
    - Use AgentConfigResolver + llm_factory for LLM access
    - Log each invocation to agent_performance_log with agent_id, success, duration_ms, token estimates
    - _Requirements: 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 3.6_
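
The threshold and edge-case rules in task 4.1 reduce to a small pure function, sketched below. The sketch compares against the absolute value of the snapshot, since metrics such as IC can be negative; that choice is an assumption. `discrepancy_pct` and `should_warn` are hypothetical helpers, not the validator's actual interface.

The function multiplies before dividing. For round inputs such as 105 vs 100, this yields exactly 5.0, so the "exactly 5% → no warning" unit test in task 8.8 is not tripped by floating-point rounding.

```python
import math
from typing import Optional

DISCREPANCY_THRESHOLD_PCT = 5.0


def discrepancy_pct(computed: float, snapshot: Optional[float]) -> Optional[float]:
    """Percentage difference between a computed metric and its stored snapshot.

    Returns None when the comparison is skipped (NULL snapshot).
    """
    if snapshot is None:
        return None  # snapshot=NULL -> skip, never a warning
    if math.isnan(computed):
        computed = 0.0  # computed=NaN -> replaced with 0.0
    if snapshot == 0:
        # both=0 -> no difference; snapshot=0 with computed != 0 -> 100%
        return 0.0 if computed == 0 else 100.0
    # Multiply before dividing so round inputs (e.g. 105 vs 100) give exactly 5.0.
    return abs(computed - snapshot) * 100.0 / abs(snapshot)


def should_warn(computed: float, snapshot: Optional[float]) -> bool:
    """True only when the discrepancy strictly exceeds the threshold."""
    pct = discrepancy_pct(computed, snapshot)
    return pct is not None and pct > DISCREPANCY_THRESHOLD_PCT
```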

- [x] 5. Checkpoint — Verify validator and summarizer
  - Ensure all tests pass; ask the user if questions arise.
  - Run `.venv/bin/ruff check services/reporting/`
  - Run `.venv/bin/python -m pytest tests/ -x --tb=short -q -k "report"` to verify validator and summarizer
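
The `build_deterministic_summary` fallback in task 4.2 only needs to turn raw metrics into readable text within the word budget. Below is a minimal template-based sketch. It assumes `section_data` is the section model dumped to a flat dict, for example via Pydantic's `model_dump()`. The output format and the `truncate_words` helper are hypothetical.

```python
from typing import Any

MAX_SUMMARY_WORDS = 200


def truncate_words(text: str, max_words: int) -> str:
    """Collapse whitespace and keep at most max_words words."""
    return " ".join(text.split()[:max_words])


def build_deterministic_summary(section_name: str, section_data: dict[str, Any]) -> str:
    """Template fallback used when the LLM call fails (sketch, not the real module)."""
    label = section_name.replace("_", " ").title()
    facts: list[str] = []
    for key, value in section_data.items():
        name = key.replace("_", " ")
        if isinstance(value, bool):  # check bool before int: bool subclasses int
            facts.append(f"{name}: {'yes' if value else 'no'}")
        elif isinstance(value, float):
            facts.append(f"{name}: {value:.4f}")
        elif isinstance(value, (int, str)):
            facts.append(f"{name}: {value}")
        elif isinstance(value, (list, tuple)):
            facts.append(f"{name}: {len(value)} item(s)")
        # None and nested dicts are skipped to keep the fallback short.
    body = "; ".join(facts) if facts else "no activity recorded"
    return truncate_words(f"{label} (fallback summary): {body}.", MAX_SUMMARY_WORDS)
```

Because the fallback uses no model, it cannot fail for the same reasons as the LLM path. It therefore always gives the executive summary something to concatenate.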

- [x] 6. Phase 3 — Report generator orchestrator and API endpoints
  - [x] 6.1 Implement report generator (`services/reporting/generator.py`)
    - Implement `generate_report(pool, report_type, period_start, period_end)` → ReportData — orchestrate: collect data → build sections → validate → summarize → assemble ReportData
    - Implement `store_report(pool, report)` → str (UUID) — INSERT ... ON CONFLICT (report_type, period_start, period_end) DO UPDATE for upsert, return report id
    - Implement `process_report_job(pool, job: dict)` → None — deserialize job payload, call generate_report + store_report, handle retries with exponential backoff (30s, 60s, 120s up to 3 attempts), reject duplicate jobs for same report_type + period
    - _Requirements: 5.1, 5.2, 5.3, 6.3, 6.4, 6.5_

  - [x] 6.2 Add API endpoints to `services/api/app.py`
    - Add `GET /api/reports` — paginated list with query params: report_type, start_date, end_date, limit (default 20), offset (default 0); returns id, report_type, period_start, period_end, validation_status, generated_at
    - Add `GET /api/reports/{report_id}` — full report including report_data JSONB
    - Use the asyncpg pool from the existing app state
    - Return 404 for non-existent report_id
    - _Requirements: 5.4, 5.5, 5.6_

  - [x] 6.3 Add frontend hooks to `frontend/src/api/hooks.ts`
    - Add `ReportListItem` and `ReportDetail` TypeScript interfaces
    - Implement `useReports(params?)` hook — builds query string from report_type, start_date, end_date, limit, offset; uses `useGet` with 'query' base
    - Implement `useReport(id)` hook — fetches single report by id, enabled only when id is defined
    - _Requirements: 5.4, 5.5_
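
One way to build the filtered, paginated query behind `GET /api/reports` (task 6.2) is with asyncpg's positional `$n` placeholders, as sketched below. The sketch makes these assumptions:

- the date filters mean `period_start >= start_date` and `period_end <= end_date`;
- results are ordered newest first;
- `limit` is clamped to a hypothetical `MAX_LIMIT` of 100.

```python
from datetime import date
from typing import Any, Optional

MAX_LIMIT = 100  # hypothetical upper bound on page size


def build_report_list_query(
    report_type: Optional[str] = None,
    start_date: Optional[date] = None,
    end_date: Optional[date] = None,
    limit: int = 20,
    offset: int = 0,
) -> tuple[str, list[Any]]:
    """Return (sql, args) for asyncpg; placeholders are numbered as args are added."""
    clauses: list[str] = []
    args: list[Any] = []
    if report_type is not None:
        args.append(report_type)
        clauses.append(f"report_type = ${len(args)}")
    if start_date is not None:
        args.append(start_date)
        clauses.append(f"period_start >= ${len(args)}")
    if end_date is not None:
        args.append(end_date)
        clauses.append(f"period_end <= ${len(args)}")
    where = f" WHERE {' AND '.join(clauses)}" if clauses else ""
    # Clamp pagination so a bad query string cannot request an unbounded page.
    args.extend([max(1, min(limit, MAX_LIMIT)), max(0, offset)])
    sql = (
        "SELECT id, report_type, period_start, period_end, validation_status, generated_at "
        f"FROM trading_reports{where} "
        f"ORDER BY generated_at DESC LIMIT ${len(args) - 1} OFFSET ${len(args)}"
    )
    return sql, args
```

Inside the handler, the pair would be passed as `await pool.fetch(sql, *args)`. Keeping user-supplied values in `args` instead of formatting them into the SQL string avoids injection.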

- [x] 7. Checkpoint — Verify generator and API
  - Ensure all tests pass; ask the user if questions arise.
  - Run `.venv/bin/ruff check services/`
  - Run `.venv/bin/python -m pytest tests/ -x --tb=short -q -k "report"` to verify generator and API endpoints

- [x] 8. Phase 4 — Scheduling, property-based tests, unit tests, and frontend tests
  - [x] 8.1 Wire Redis queue integration and scheduler
    - Add a report generation job consumer to the scheduler service that listens on `stonks:queue:report_generation`
    - Add a daily report trigger (after 16:30 ET on trading days) and a weekly report trigger (Saturday) to the scheduler
    - Job payload: `{"report_type": "daily"|"weekly", "period_start": "YYYY-MM-DD", "period_end": "YYYY-MM-DD"}`
    - _Requirements: 6.1, 6.2, 6.3, 6.4, 6.5_

  - [x] 8.2 Write property test: Chunking Round-Trip and Size Constraint
    - **Property 1: Chunking Round-Trip and Size Constraint**
    - File: `tests/test_pbt_report_chunking.py`
    - Use Hypothesis `@settings(max_examples=100)` with a single `@given(st.text(), st.integers(min_value=1, max_value=10000))`; Hypothesis rejects stacking two `@given` decorators on one test
    - Assert: every chunk ≤ max_chars, no empty chunks (except empty input → one empty chunk), concatenation of chunks == original input
    - **Validates: Requirements 2.2**

  - [x] 8.3 Write property test: Report Serialization Round-Trip
    - **Property 2: Report Serialization Round-Trip**
    - File: `tests/test_pbt_report_serialization.py`
    - Use Hypothesis with custom strategies for ReportData (valid PLSection, RecommendationAccuracySection, etc.)
    - Assert: `ReportData.model_validate_json(report.model_dump_json())` == original report
    - Assert: all datetime fields in serialized JSON are ISO 8601 format
    - **Validates: Requirements 8.1, 8.2, 8.3, 8.4**

  - [x] 8.4 Write property test: Validation Discrepancy Detection Correctness
    - **Property 3: Validation Discrepancy Detection Correctness**
    - File: `tests/test_pbt_report_validation.py`
    - Use Hypothesis with `@given(st.floats(min_value=0, max_value=1e6), st.floats(min_value=0, max_value=1e6))`
    - Assert: a warning is raised iff |computed - snapshot| / snapshot * 100 > 5.0 (`DISCREPANCY_THRESHOLD_PCT`) when snapshot > 0; any non-zero computed value is flagged when snapshot == 0; no warning when both == 0
    - **Validates: Requirements 4.1, 4.2, 4.3, 4.4**

  - [x] 8.5 Write property test: Recommendation Accuracy Aggregation
    - **Property 4: Recommendation Accuracy Aggregation**
    - File: `tests/test_pbt_report_sections.py`
    - Use Hypothesis with lists of trading decisions + prediction outcomes (direction_correct bool, profitable bool, excess_return_vs_spy float)
    - Assert: win_rate == count(profitable) / total, directional_accuracy == count(direction_correct) / total, avg excess return == mean(excess_return_vs_spy), all rates in [0.0, 1.0]
    - **Validates: Requirements 1.4**

  - [x] 8.6 Write property test: Portfolio Period-Over-Period Delta Computation
    - **Property 5: Portfolio Period-Over-Period Delta Computation**
    - File: `tests/test_pbt_report_sections.py`
    - Use Hypothesis with two portfolio snapshots (non-negative portfolio_value, active_pool, reserve_pool, finite cumulative_return)
    - Assert: deltas == (current - previous) for each field; when no previous snapshot, deltas == 0
    - **Validates: Requirements 1.3**

  - [x] 8.7 Write unit tests for section builders
    - File: `tests/test_report_sections.py`
    - Test each section builder with known inputs and expected outputs
    - Test edge cases: empty data (zero-activity), single position, no portfolio snapshot
    - _Requirements: 3.1, 3.2, 3.3, 3.4, 3.5_

  - [x] 8.8 Write unit tests for report validator
    - File: `tests/test_report_validator.py`
    - Test specific discrepancy scenarios: exactly 5% (no warning), 5.1% (warning), snapshot=0 computed≠0, both=0, NULL snapshot
    - _Requirements: 4.1, 4.2, 4.3, 4.4_

  - [x] 8.9 Write unit tests for AI summarizer
    - File: `tests/test_report_summarizer.py`
    - Test deterministic fallback summary generation
    - Test chunk_data edge cases: empty input, single character, exactly at limit, one char over limit
    - _Requirements: 2.2, 2.6_

  - [x] 8.10 Write unit tests for report generator
    - File: `tests/test_report_generator.py`
    - Test orchestration with mocked dependencies (collector, sections, validator, summarizer)
    - Test zero-activity report generation
    - Test upsert behavior (regeneration of existing report)
    - _Requirements: 5.1, 5.2, 5.3_

  - [x] 8.11 Write API integration tests
    - File: `tests/test_report_api.py`
    - Test GET /api/reports with pagination, filtering by report_type and date range
    - Test GET /api/reports/{report_id} with valid and invalid IDs
    - _Requirements: 5.4, 5.5, 5.6_

  - [x] 8.12 Write frontend hook tests
    - File: `frontend/src/test/reports.test.ts`
    - Test useReports and useReport hooks with MSW mocks
    - Test loading and error states
    - _Requirements: 5.4, 5.5_
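
To satisfy Property 1 (task 8.2), `chunk_data` has to do more than split on newlines. A single line longer than `max_chars` must itself be hard-split, or the size constraint fails. Below is a minimal sketch under that reading. It uses `str.splitlines(keepends=True)` so that joining the chunks reproduces the input exactly. The `ValueError` for `max_chars < 1` is an assumed guard.

```python
def chunk_data(serialized: str, max_chars: int) -> list[str]:
    """Split text into chunks of at most max_chars, preferring line boundaries."""
    if max_chars < 1:
        raise ValueError("max_chars must be >= 1")
    if not serialized:
        return [""]  # empty input -> exactly one empty chunk
    chunks: list[str] = []
    current = ""
    # keepends=True preserves the line terminators, so "".join(chunks) == serialized.
    for line in serialized.splitlines(keepends=True):
        # Hard-split any line that cannot fit in a chunk on its own.
        while len(line) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:max_chars])
            line = line[max_chars:]
        if len(current) + len(line) > max_chars:
            chunks.append(current)  # current is non-empty here, so no empty chunks
            current = line
        else:
            current += line
    if current:
        chunks.append(current)
    return chunks
```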

- [x] 9. Final checkpoint — Full test suite and lint
  - Ensure all tests pass; ask the user if questions arise.
  - Run `.venv/bin/ruff check services/`
  - Run `.venv/bin/python -m pytest tests/ -x --tb=short -q -k "report"`
  - Run frontend tests: `cd frontend && npx vitest --run`
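
The trigger rules in task 8.1 can be expressed as a pure function that the scheduler calls on each tick. The sketch below assumes:

- the caller has already converted the current time to US/Eastern, so no time-zone library is needed here;
- a "trading day" is a weekday not in a caller-supplied holiday set;
- the Saturday weekly report covers the Monday to Friday just ended.

`due_jobs` and `is_trading_day` are hypothetical names. Payload keys follow the format given in task 8.1.

```python
from datetime import date, datetime, time, timedelta

DAILY_CUTOFF_ET = time(16, 30)
SATURDAY = 5  # date.weekday(): Monday=0 ... Sunday=6


def is_trading_day(d: date, holidays: frozenset = frozenset()) -> bool:
    """Weekday and not a caller-supplied market holiday (no exchange calendar here)."""
    return d.weekday() < 5 and d not in holidays


def due_jobs(now_et: datetime, holidays: frozenset = frozenset()) -> list[dict[str, str]]:
    """Report jobs due at now_et, a naive datetime already expressed in US/Eastern."""
    today = now_et.date()
    jobs: list[dict[str, str]] = []
    if is_trading_day(today, holidays) and now_et.time() > DAILY_CUTOFF_ET:
        day = today.isoformat()
        jobs.append({"report_type": "daily", "period_start": day, "period_end": day})
    if today.weekday() == SATURDAY:
        monday = today - timedelta(days=5)
        friday = today - timedelta(days=1)
        jobs.append(
            {
                "report_type": "weekly",
                "period_start": monday.isoformat(),
                "period_end": friday.isoformat(),
            }
        )
    return jobs
```

The function returns the same payload on every tick after 16:30. It therefore relies on the duplicate-job rejection in `process_report_job` or the upsert in `store_report` (task 6.1) to keep repeated triggers harmless.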

## Notes

- Tasks marked with `*` are optional and can be skipped for a faster MVP
- Each task references specific requirements for traceability
- Checkpoints ensure incremental validation after each phase
- Property tests validate the 5 universal correctness properties from the design document
- Unit tests validate specific examples and edge cases
- The design document contains full interface signatures — use those as the implementation guide
- Always run `.venv/bin/ruff check services/` before committing Python changes