- Migration 038: trading_reports table + report-summarizer agent seed
- 6 reporting modules: models, collector, sections, validator, summarizer, generator
- API endpoints: GET /api/reports (paginated, filterable), GET /api/reports/{id}
- Frontend hooks: useReports, useReport with TanStack Query
- Scheduler: daily (after 16:30 ET) and weekly (Saturday) report triggers
- Redis queue consumer for async report generation with retry/dedup
- 5 property-based tests (chunking, serialization, validation, accuracy, deltas)
- 109 unit/integration tests across all modules
- 6 frontend hook tests with MSW mocks
Implementation Plan: Trading Feedback Engine
Overview
Add a periodic trading performance reporting system to Stonks Oracle. The system collects trading data, generates structured JSON reports with AI-powered summaries, validates metrics against live data, and stores reports for retrieval via API. Implementation follows the four-phase approach from the design: foundation → validation & AI → generator & API → scheduling & tests.
Tasks
- 1. Database migration 038: trading_reports table and report-summarizer agent
  - 1.1 Create `infra/migrations/038_trading_reports.sql`
    - Create `trading_reports` table with columns: id (UUID PK, gen_random_uuid()), report_type (VARCHAR(20) NOT NULL), period_start (DATE NOT NULL), period_end (DATE NOT NULL), report_data (JSONB NOT NULL), validation_status (VARCHAR(20) NOT NULL DEFAULT 'passed'), generated_at (TIMESTAMPTZ NOT NULL), created_at (TIMESTAMPTZ NOT NULL DEFAULT NOW())
    - Add UNIQUE constraint on (report_type, period_start, period_end)
    - Add CHECK constraint: report_type IN ('daily', 'weekly')
    - Create indexes: idx_trading_reports_type, idx_trading_reports_period, idx_trading_reports_generated
    - Seed Report_Summarizer_Agent into ai_agents table with slug 'report-summarizer', model_provider 'ollama', model_name 'qwen3.5:9b-fast', source 'system', temperature 0.0, max_tokens 1024, timeout_seconds 60, max_retries 2
    - Use WHERE NOT EXISTS guard on agent insert to be idempotent
    - Requirements: 5.1, 5.2, 7.1, 7.2
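The `WHERE NOT EXISTS` guard from task 1.1 can be tried out without a Postgres instance. The sketch below uses Python's built-in `sqlite3` purely as a stand-in: the real migration targets Postgres, and the trimmed `ai_agents` column list and the `seed_twice` helper are assumptions made for illustration. It shows that running the seed statement twice leaves a single row.

```python
import sqlite3

# Stand-in schema: only a few of the ai_agents columns named in the task
# (assumption; the real table lives in Postgres and has more columns).
SCHEMA = """
CREATE TABLE ai_agents (
    id INTEGER PRIMARY KEY,
    slug TEXT NOT NULL,
    model_provider TEXT NOT NULL,
    model_name TEXT NOT NULL
)
"""

# INSERT ... SELECT ... WHERE NOT EXISTS inserts only when the slug is absent.
SEED = """
INSERT INTO ai_agents (slug, model_provider, model_name)
SELECT 'report-summarizer', 'ollama', 'qwen3.5:9b-fast'
WHERE NOT EXISTS (
    SELECT 1 FROM ai_agents WHERE slug = 'report-summarizer'
)
"""


def seed_twice() -> int:
    """Apply the seed statement twice and return the resulting row count."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(SCHEMA)
        conn.execute(SEED)
        conn.execute(SEED)  # second run finds the slug and inserts nothing
        (count,) = conn.execute("SELECT COUNT(*) FROM ai_agents").fetchone()
        return count
    finally:
        conn.close()
```

The same statement shape carries over to Postgres, which is what makes re-running the migration safe.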
  - 1.2 Add `QUEUE_REPORT_GENERATION` constant to `services/shared/redis_keys.py`
    - Add `QUEUE_REPORT_GENERATION = "report_generation"` following existing queue naming convention
    - Requirements: 6.3
- 2. Phase 1: Report models, data collector, and section builders
  - 2.1 Create report models (`services/reporting/models.py`)
    - Create `services/reporting/__init__.py`
    - Define enums: ReportType (daily, weekly), ValidationStatus (passed, warnings)
    - Define Pydantic models: ValidationWarning, PLSection, RecommendationAccuracySection, PositionDetail, PositionPerformanceSection, RiskMetricsSection, ModelQualityWindow, ModelQualitySection, ReportData
    - ReportData includes all sections, executive_summary, validation_status, generated_at, period_start, period_end, report_type
    - Requirements: 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 8.1, 8.2, 8.4
  - 2.2 Implement data collector (`services/reporting/collector.py`)
    - Define CollectedData dataclass with fields: trading_decisions, orders, open_positions, closed_positions, portfolio_snapshot, previous_portfolio_snapshot, recommendations, prediction_outcomes, model_metric_snapshots, circuit_breaker_events, reserve_pool_balance
    - Implement `collect_report_data(pool, period_start, period_end)` → CollectedData
    - Query trading_decisions, orders, positions (open + closed), portfolio_snapshots (current + previous), recommendations, prediction_outcomes, model_metric_snapshots, circuit_breaker_events, reserve_pool_ledger for the period
    - Return empty lists for tables with no data (zero-activity case)
    - Use `_row_dict()` pattern for UUID conversion from asyncpg rows
    - Requirements: 1.1, 1.2, 1.3, 1.4, 1.5
  - 2.3 Implement section builders (`services/reporting/sections.py`)
    - Implement `build_pnl_section(data: CollectedData) -> PLSection`: compute realized/unrealized P&L, daily return, cumulative return, win/loss counts, win rate, profit factor, Sharpe ratio from portfolio_snapshot and closed positions
    - Implement `build_recommendation_accuracy_section(data: CollectedData) -> RecommendationAccuracySection`: join trading_decisions with prediction_outcomes, compute act/skip breakdown, win rate of acted, avg confidence acted vs skipped
    - Implement `build_position_performance_section(data: CollectedData) -> PositionPerformanceSection`: list each position with ticker, entry price, current/exit price, P&L, P&L%, hold duration
    - Implement `build_risk_metrics_section(data: CollectedData) -> RiskMetricsSection`: extract risk tier, portfolio heat, max drawdown, current drawdown %, reserve pool balance, circuit breaker event count
    - Implement `build_model_quality_section(data: CollectedData) -> ModelQualitySection`: extract model_metric_snapshot values for 7d, 30d, 90d lookback windows
    - Handle zero-activity gracefully (zero values, empty lists)
    - Requirements: 1.3, 1.4, 3.1, 3.2, 3.3, 3.4, 3.5
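To make the zero-activity requirement concrete, here is a minimal sketch of the win/loss portion of a P&L builder. `PnLStats` and `summarize_closed_pnl` are hypothetical names, not the `PLSection` model from the design. Two conventions are assumptions: break-even trades count as neither win nor loss, and the profit factor is reported as 0.0 when there are no losing trades.

```python
from dataclasses import dataclass


@dataclass
class PnLStats:
    """Hypothetical subset of the P&L section fields."""
    realized_pnl: float
    win_count: int
    loss_count: int
    win_rate: float
    profit_factor: float


def summarize_closed_pnl(closed_pnls: list[float]) -> PnLStats:
    """Aggregate realized P&L values of closed positions."""
    wins = [p for p in closed_pnls if p > 0]
    losses = [p for p in closed_pnls if p < 0]
    decided = len(wins) + len(losses)  # break-even trades are excluded (assumption)
    gross_profit = float(sum(wins))
    gross_loss = -float(sum(losses))
    return PnLStats(
        realized_pnl=float(sum(closed_pnls)),
        win_count=len(wins),
        loss_count=len(losses),
        # Zero-activity case: fall back to 0.0 instead of dividing by zero.
        win_rate=len(wins) / decided if decided else 0.0,
        # Assumption: no losses yields 0.0, which keeps the value JSON-safe.
        profit_factor=gross_profit / gross_loss if gross_loss else 0.0,
    )
```

An empty input produces all-zero fields, which matches the "zero values, empty lists" handling listed above.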
- 3. Checkpoint: Verify foundation modules
  - Ensure all tests pass; ask the user if questions arise.
  - Run `.venv/bin/ruff check services/reporting/`
  - Run `.venv/bin/python -m pytest tests/ -x --tb=short -q -k "report"` to verify models and section builders
- 4. Phase 2: Report validator and AI summarizer
  - 4.1 Implement report validator (`services/reporting/validator.py`)
    - Define `DISCREPANCY_THRESHOLD_PCT = 5.0`
    - Implement `validate_recommendation_accuracy(section, prediction_outcomes)` → list[ValidationWarning]: compare computed win rate against direction_correct/profitable from prediction_outcomes, flag >5% discrepancies
    - Implement `validate_model_quality(section, metric_snapshots)` → list[ValidationWarning]: compare reported metrics against model_metric_snapshots for win_rate, directional_accuracy, IC, ECE, Brier score, flag >5% discrepancies
    - Implement `compute_validation_status(report: ReportData)` → ValidationStatus: return 'passed' if no warnings, 'warnings' if any section has validation_warnings
    - Handle edge cases: snapshot=0 with computed≠0 → 100% difference; both=0 → no warning; snapshot=NULL → skip; computed=NaN → replace with 0.0
    - Requirements: 4.1, 4.2, 4.3, 4.4
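The edge cases listed for the validator reduce to a single comparison routine. The sketch below is an illustration under stated assumptions, not the design's interface: `discrepancy_pct` and `needs_warning` are hypothetical helper names, and a NULL snapshot is modelled as `None`.

```python
import math
from typing import Optional

DISCREPANCY_THRESHOLD_PCT = 5.0


def discrepancy_pct(computed: float, snapshot: Optional[float]) -> Optional[float]:
    """Percent difference between computed and snapshot values.

    Returns None when the snapshot is missing and the check should be skipped.
    """
    if snapshot is None:
        return None  # snapshot=NULL: skip
    if math.isnan(computed):
        computed = 0.0  # computed=NaN: replace with 0.0
    if snapshot == 0:
        # both=0: no difference; snapshot=0 with computed != 0: 100% difference
        return 0.0 if computed == 0 else 100.0
    # Multiply before dividing so an exact 5% gap stays exactly 5.0.
    return abs(computed - snapshot) * 100.0 / abs(snapshot)


def needs_warning(computed: float, snapshot: Optional[float]) -> bool:
    """True when the discrepancy strictly exceeds the threshold."""
    pct = discrepancy_pct(computed, snapshot)
    return pct is not None and pct > DISCREPANCY_THRESHOLD_PCT
```

Because the comparison is strict, a discrepancy of exactly 5% does not raise a warning, which agrees with the "exactly 5% (no warning), 5.1% (warning)" unit-test scenarios in task 8.8.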
  - 4.2 Implement AI summarizer (`services/reporting/summarizer.py`)
    - Define constants: CHUNK_SIZE_LIMIT=6000, MAX_SUMMARY_WORDS=200, MAX_EXECUTIVE_SUMMARY_WORDS=300
    - Implement `chunk_data(serialized: str, max_chars: int)` → list[str]: split on newline boundaries, each chunk ≤ max_chars, at least one chunk returned
    - Implement `summarize_section(pool, resolver, section_name, section_data)` → str: serialize, chunk if needed, summarize each chunk via Report_Summarizer_Agent (resolved by slug 'report-summarizer'), merge if multiple chunks, log to agent_performance_log, fall back to deterministic on failure
    - Implement `build_deterministic_summary(section_name, section_data)` → str: template-based fallback summary from raw metrics
    - Implement `generate_executive_summary(pool, resolver, section_summaries)` → str: concatenate section summaries, chunk if needed, produce ≤300-word synthesis, fall back to concatenation on failure
    - Use AgentConfigResolver + llm_factory for LLM access
    - Log each invocation to agent_performance_log with agent_id, success, duration_ms, token estimates
    - Requirements: 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 3.6
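The `chunk_data` contract (newline-boundary splits, size cap, at least one chunk) together with the round-trip property from task 8.2 leaves little room in the implementation. A minimal sketch follows. One behaviour is an assumption not spelled out in the plan: a single line longer than `max_chars` is hard-split mid-line so the size cap still holds.

```python
def chunk_data(serialized: str, max_chars: int) -> list[str]:
    """Split text into chunks of at most max_chars, preferring newline boundaries.

    Guarantees: "".join(chunks) == serialized, and no chunk is empty
    unless the input itself is empty (which yields [""]).
    """
    if max_chars < 1:
        raise ValueError("max_chars must be at least 1")
    if not serialized:
        return [""]

    # Keep each "\n" attached to its line so concatenation round-trips.
    lines = serialized.split("\n")
    pieces = [line + "\n" for line in lines[:-1]]
    if lines[-1]:
        pieces.append(lines[-1])

    chunks: list[str] = []
    current = ""
    for piece in pieces:
        # Assumption: oversized lines are hard-split to respect the cap.
        while len(piece) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(piece[:max_chars])
            piece = piece[max_chars:]
        if len(current) + len(piece) > max_chars:
            chunks.append(current)
            current = ""
        current += piece
    if current:
        chunks.append(current)
    return chunks
```

For example, `chunk_data("a\nb\nc", 2)` yields three chunks, each ending at a line boundary, and joining them reproduces the input.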
- 5. Checkpoint: Verify validator and summarizer
  - Ensure all tests pass; ask the user if questions arise.
  - Run `.venv/bin/ruff check services/reporting/`
  - Run `.venv/bin/python -m pytest tests/ -x --tb=short -q -k "report"` to verify validator and summarizer
- 6. Phase 3: Report generator orchestrator and API endpoints
  - 6.1 Implement report generator (`services/reporting/generator.py`)
    - Implement `generate_report(pool, report_type, period_start, period_end)` → ReportData: orchestrate collect data → build sections → validate → summarize → assemble ReportData
    - Implement `store_report(pool, report)` → str (UUID): INSERT ... ON CONFLICT (report_type, period_start, period_end) DO UPDATE for upsert, return report id
    - Implement `process_report_job(pool, job: dict)` → None: deserialize job payload, call generate_report + store_report, handle retries with exponential backoff (30s, 60s, 120s up to 3 attempts), reject duplicate jobs for same report_type + period
    - Requirements: 5.1, 5.2, 5.3, 6.3, 6.4, 6.5
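The retry and dedup rules for `process_report_job` can be sketched independently of Redis. Everything named here (`retry_delay`, `job_key`, `InFlightJobs`) is hypothetical, and the in-memory set only stands in for whatever shared store the real consumer uses. The sketch also assumes that "3 attempts" means up to three retries, with delays of 30, 60 and 120 seconds.

```python
from typing import Optional

BASE_DELAY_SECONDS = 30
MAX_RETRIES = 3  # assumption: three retries, delayed 30s, 60s, 120s


def retry_delay(retry_number: int) -> Optional[int]:
    """Seconds to wait before retry N (1-based); None once retries are exhausted."""
    if retry_number < 1:
        raise ValueError("retry_number is 1-based")
    if retry_number > MAX_RETRIES:
        return None
    return BASE_DELAY_SECONDS * 2 ** (retry_number - 1)


def job_key(job: dict) -> str:
    """Dedup key: one job per report_type + period."""
    return f"{job['report_type']}:{job['period_start']}:{job['period_end']}"


class InFlightJobs:
    """In-memory stand-in for a shared dedup store."""

    def __init__(self) -> None:
        self._keys: set[str] = set()

    def try_claim(self, job: dict) -> bool:
        """Return True if the job was claimed, False if it is a duplicate."""
        key = job_key(job)
        if key in self._keys:
            return False
        self._keys.add(key)
        return True

    def release(self, job: dict) -> None:
        """Forget a finished or permanently failed job."""
        self._keys.discard(job_key(job))
```

Since `store_report` upserts on the same three fields, a duplicate that slips past the dedup check would overwrite rather than duplicate a stored report.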
  - 6.2 Add API endpoints to `services/api/app.py`
    - Add `GET /api/reports`: paginated list with query params: report_type, start_date, end_date, limit (default 20), offset (default 0); returns id, report_type, period_start, period_end, validation_status, generated_at
    - Add `GET /api/reports/{report_id}`: full report including report_data JSONB
    - Use asyncpg pool from existing app state
    - Return 404 for non-existent report_id
    - Requirements: 5.4, 5.5, 5.6
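One way to back the filterable list endpoint is a small query builder that emits asyncpg-style positional placeholders (`$1`, `$2`, ...). The sketch below is illustrative only: `build_list_query` is a hypothetical helper, and both the ordering (`generated_at DESC`) and the date-filter semantics (`period_start >= start_date`, `period_end <= end_date`) are assumptions the plan does not specify.

```python
from datetime import date
from typing import Any, Optional

LIST_COLUMNS = (
    "id, report_type, period_start, period_end, validation_status, generated_at"
)


def build_list_query(
    report_type: Optional[str] = None,
    start_date: Optional[date] = None,
    end_date: Optional[date] = None,
    limit: int = 20,
    offset: int = 0,
) -> tuple[str, list[Any]]:
    """Build the SQL text and positional arguments for GET /api/reports."""
    clauses: list[str] = []
    args: list[Any] = []
    if report_type is not None:
        args.append(report_type)
        clauses.append(f"report_type = ${len(args)}")
    if start_date is not None:
        args.append(start_date)
        clauses.append(f"period_start >= ${len(args)}")
    if end_date is not None:
        args.append(end_date)
        clauses.append(f"period_end <= ${len(args)}")
    where = f" WHERE {' AND '.join(clauses)}" if clauses else ""
    # limit and offset always take the last two placeholder positions.
    args.extend([limit, offset])
    sql = (
        f"SELECT {LIST_COLUMNS} FROM trading_reports{where} "
        f"ORDER BY generated_at DESC LIMIT ${len(args) - 1} OFFSET ${len(args)}"
    )
    return sql, args
```

Passing user input only through the argument list, never through string interpolation, keeps the filter values parameterized.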
  - 6.3 Add frontend hooks to `frontend/src/api/hooks.ts`
    - Add `ReportListItem` and `ReportDetail` TypeScript interfaces
    - Implement `useReports(params?)` hook: builds query string from report_type, start_date, end_date, limit, offset; uses `useGet` with 'query' base
    - Implement `useReport(id)` hook: fetches single report by id, enabled only when id is defined
    - Requirements: 5.4, 5.5
- 7. Checkpoint: Verify generator and API
  - Ensure all tests pass; ask the user if questions arise.
  - Run `.venv/bin/ruff check services/`
  - Run `.venv/bin/python -m pytest tests/ -x --tb=short -q -k "report"` to verify generator and API endpoints
- 8. Phase 4: Scheduling, property-based tests, unit tests, and frontend tests
  - 8.1 Wire Redis queue integration and scheduler
    - Add report generation job consumer to the scheduler service that listens on `stonks:queue:report_generation`
    - Add daily report trigger (after 16:30 ET on trading days) and weekly report trigger (Saturday) to the scheduler
    - Job payload: `{"report_type": "daily"|"weekly", "period_start": "YYYY-MM-DD", "period_end": "YYYY-MM-DD"}`
    - Requirements: 6.1, 6.2, 6.3, 6.4, 6.5
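The job payload format above is simple enough to pin down with a builder and parser pair. This is a sketch with hypothetical helper names (`build_job_payload`, `parse_job_payload`). The check that the period does not end before it starts is an added assumption, not a stated requirement.

```python
import json
from datetime import date

QUEUE_KEY = "stonks:queue:report_generation"  # queue name from task 8.1
VALID_TYPES = ("daily", "weekly")


def build_job_payload(report_type: str, period_start: date, period_end: date) -> str:
    """Serialize a report job as the JSON string pushed onto the queue."""
    if report_type not in VALID_TYPES:
        raise ValueError(f"unknown report_type: {report_type!r}")
    if period_end < period_start:
        raise ValueError("period_end precedes period_start")
    return json.dumps(
        {
            "report_type": report_type,
            "period_start": period_start.isoformat(),  # YYYY-MM-DD
            "period_end": period_end.isoformat(),
        }
    )


def parse_job_payload(raw: str) -> tuple[str, date, date]:
    """Inverse of build_job_payload; raises on malformed input."""
    job = json.loads(raw)
    report_type = job["report_type"]
    if report_type not in VALID_TYPES:
        raise ValueError(f"unknown report_type: {report_type!r}")
    return (
        report_type,
        date.fromisoformat(job["period_start"]),
        date.fromisoformat(job["period_end"]),
    )
```

Validating on both the producer and consumer side means a malformed job fails fast instead of consuming retry attempts.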
  - 8.2 Write property test: Chunking Round-Trip and Size Constraint
    - Property 1: Chunking Round-Trip and Size Constraint
    - File: `tests/test_pbt_report_chunking.py`
    - Use Hypothesis `@settings(max_examples=100)` with a single `@given(st.text(), st.integers(min_value=1, max_value=10000))`
    - Assert: every chunk ≤ max_chars, no empty chunks (except empty input → one empty chunk), concatenation of chunks == original input
    - Validates: Requirements 2.2
  - 8.3 Write property test: Report Serialization Round-Trip
    - Property 2: Report Serialization Round-Trip
    - File: `tests/test_pbt_report_serialization.py`
    - Use Hypothesis with custom strategies for ReportData (valid PLSection, RecommendationAccuracySection, etc.)
    - Assert: `ReportData.model_validate_json(report.model_dump_json())` == original report
    - Assert: all datetime fields in serialized JSON are ISO 8601 format
    - Validates: Requirements 8.1, 8.2, 8.3, 8.4
  - 8.4 Write property test: Validation Discrepancy Detection Correctness
    - Property 3: Validation Discrepancy Detection Correctness
    - File: `tests/test_pbt_report_validation.py`
    - Use Hypothesis with `@given(st.floats(min_value=0, max_value=1e6), st.floats(min_value=0, max_value=1e6))`
    - Assert: warning iff |computed - snapshot| / snapshot * 100 > 5% (when snapshot > 0); flag any non-zero computed when snapshot == 0; no warning when both == 0
    - Validates: Requirements 4.1, 4.2, 4.3, 4.4
  - 8.5 Write property test: Recommendation Accuracy Aggregation
    - Property 4: Recommendation Accuracy Aggregation
    - File: `tests/test_pbt_report_sections.py`
    - Use Hypothesis with lists of trading decisions + prediction outcomes (direction_correct bool, profitable bool, excess_return_vs_spy float)
    - Assert: win_rate == count(profitable) / total, directional_accuracy == count(direction_correct) / total, avg excess return == mean(excess_return_vs_spy), all rates in [0.0, 1.0]
    - Validates: Requirements 1.4
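Property 4 effectively specifies the aggregation itself. A reference sketch that satisfies it is shown here. `Outcome` and `aggregate_accuracy` are hypothetical names, and returning zeros for an empty input is an assumption carried over from the zero-activity handling in task 2.3.

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass
class Outcome:
    """Hypothetical view of one prediction_outcomes row."""
    direction_correct: bool
    profitable: bool
    excess_return_vs_spy: float


def aggregate_accuracy(outcomes: list[Outcome]) -> dict[str, float]:
    """Compute win rate, directional accuracy and mean excess return."""
    total = len(outcomes)
    if total == 0:
        # Assumption: zero-activity periods report zeros rather than failing.
        return {"win_rate": 0.0, "directional_accuracy": 0.0, "avg_excess_return": 0.0}
    return {
        # Booleans sum as 0/1, so these are counts divided by the total.
        "win_rate": sum(o.profitable for o in outcomes) / total,
        "directional_accuracy": sum(o.direction_correct for o in outcomes) / total,
        "avg_excess_return": fmean([o.excess_return_vs_spy for o in outcomes]),
    }
```

Both rates are counts divided by the same total, so they always fall in [0.0, 1.0], which is the bound the property asserts.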
  - 8.6 Write property test: Portfolio Period-Over-Period Delta Computation
    - Property 5: Portfolio Period-Over-Period Delta Computation
    - File: `tests/test_pbt_report_sections.py`
    - Use Hypothesis with two portfolio snapshots (non-negative portfolio_value, active_pool, reserve_pool, finite cumulative_return)
    - Assert: deltas == (current - previous) for each field; when no previous snapshot, deltas == 0
    - Validates: Requirements 1.3
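Property 5 likewise describes a computation small enough to sketch directly. The snapshot is modelled as a plain dict and `compute_deltas` is a hypothetical name. The field list comes from the property statement above.

```python
from typing import Optional

DELTA_FIELDS = ("portfolio_value", "active_pool", "reserve_pool", "cumulative_return")


def compute_deltas(current: dict, previous: Optional[dict]) -> dict[str, float]:
    """Period-over-period change for each tracked snapshot field."""
    if previous is None:
        # No earlier snapshot to compare against: every delta is zero.
        return {field: 0.0 for field in DELTA_FIELDS}
    return {field: current[field] - previous[field] for field in DELTA_FIELDS}
```

Deltas can be negative even though the underlying pool values are non-negative, so consumers should not clamp them.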
  - 8.7 Write unit tests for section builders
    - File: `tests/test_report_sections.py`
    - Test each section builder with known inputs and expected outputs
    - Test edge cases: empty data (zero-activity), single position, no portfolio snapshot
    - Requirements: 3.1, 3.2, 3.3, 3.4, 3.5
  - 8.8 Write unit tests for report validator
    - File: `tests/test_report_validator.py`
    - Test specific discrepancy scenarios: exactly 5% (no warning), 5.1% (warning), snapshot=0 computed≠0, both=0, NULL snapshot
    - Requirements: 4.1, 4.2, 4.3, 4.4
  - 8.9 Write unit tests for AI summarizer
    - File: `tests/test_report_summarizer.py`
    - Test deterministic fallback summary generation
    - Test chunk_data edge cases: empty input, single character, exactly at limit, one char over limit
    - Requirements: 2.2, 2.6
  - 8.10 Write unit tests for report generator
    - File: `tests/test_report_generator.py`
    - Test orchestration with mocked dependencies (collector, sections, validator, summarizer)
    - Test zero-activity report generation
    - Test upsert behavior (regeneration of existing report)
    - Requirements: 5.1, 5.2, 5.3
  - 8.11 Write API integration tests
    - File: `tests/test_report_api.py`
    - Test GET /api/reports with pagination, filtering by report_type and date range
    - Test GET /api/reports/{report_id} with valid and invalid IDs
    - Requirements: 5.4, 5.5, 5.6
  - 8.12 Write frontend hook tests
    - File: `frontend/src/test/reports.test.ts`
    - Test useReports and useReport hooks with MSW mocks
    - Test loading and error states
    - Requirements: 5.4, 5.5
- 9. Final checkpoint: Full test suite and lint
  - Ensure all tests pass; ask the user if questions arise.
  - Run `.venv/bin/ruff check services/`
  - Run `.venv/bin/python -m pytest tests/ -x --tb=short -q -k "report"`
  - Run frontend tests: `cd frontend && npx vitest --run`
Notes
- Tasks marked with `*` are optional and can be skipped for faster MVP
- Each task references specific requirements for traceability
- Checkpoints ensure incremental validation after each phase
- Property tests validate the 5 universal correctness properties from the design document
- Unit tests validate specific examples and edge cases
- The design document contains full interface signatures — use those as the implementation guide
- Always run `.venv/bin/ruff check services/` before committing Python changes