stonks-oracle/.kiro/specs/trading-feedback-engine/tasks.md
Celes Renata bc077bfcc8
feat: trading feedback engine — periodic performance reports with AI summarization
- Migration 038: trading_reports table + report-summarizer agent seed
- 6 reporting modules: models, collector, sections, validator, summarizer, generator
- API endpoints: GET /api/reports (paginated, filterable), GET /api/reports/{id}
- Frontend hooks: useReports, useReport with TanStack Query
- Scheduler: daily (after 16:30 ET) and weekly (Saturday) report triggers
- Redis queue consumer for async report generation with retry/dedup
- 5 property-based tests (chunking, serialization, validation, accuracy, deltas)
- 109 unit/integration tests across all modules
- 6 frontend hook tests with MSW mocks
2026-05-01 22:13:09 +00:00


# Implementation Plan: Trading Feedback Engine
## Overview
Add a periodic trading performance reporting system to Stonks Oracle. The system collects trading data, generates structured JSON reports with AI-powered summaries, validates metrics against live data, and stores reports for retrieval via API. Implementation follows the four-phase approach from the design: foundation → validation & AI → generator & API → scheduling & tests.
## Tasks
- [x] 1. Database migration 038 — trading_reports table and report-summarizer agent
- [x] 1.1 Create `infra/migrations/038_trading_reports.sql`
- Create `trading_reports` table with columns: id (UUID PK, gen_random_uuid()), report_type (VARCHAR(20) NOT NULL), period_start (DATE NOT NULL), period_end (DATE NOT NULL), report_data (JSONB NOT NULL), validation_status (VARCHAR(20) NOT NULL DEFAULT 'passed'), generated_at (TIMESTAMPTZ NOT NULL), created_at (TIMESTAMPTZ NOT NULL DEFAULT NOW())
- Add UNIQUE constraint on (report_type, period_start, period_end)
- Add CHECK constraint: report_type IN ('daily', 'weekly')
- Create indexes: idx_trading_reports_type, idx_trading_reports_period, idx_trading_reports_generated
- Seed Report_Summarizer_Agent into ai_agents table with slug 'report-summarizer', model_provider 'ollama', model_name 'qwen3.5:9b-fast', source 'system', temperature 0.0, max_tokens 1024, timeout_seconds 60, max_retries 2
- Use WHERE NOT EXISTS guard on agent insert to be idempotent
- _Requirements: 5.1, 5.2, 7.1, 7.2_
- [x] 1.2 Add `QUEUE_REPORT_GENERATION` constant to `services/shared/redis_keys.py`
- Add `QUEUE_REPORT_GENERATION = "report_generation"` following the existing queue naming convention
- _Requirements: 6.3_
- [x] 2. Phase 1 — Report models, data collector, and section builders
- [x] 2.1 Create report models (`services/reporting/models.py`)
- Create `services/reporting/__init__.py`
- Define enums: ReportType (daily, weekly), ValidationStatus (passed, warnings)
- Define Pydantic models: ValidationWarning, PLSection, RecommendationAccuracySection, PositionDetail, PositionPerformanceSection, RiskMetricsSection, ModelQualityWindow, ModelQualitySection, ReportData
- ReportData includes all sections, executive_summary, validation_status, generated_at, period_start, period_end, report_type
- _Requirements: 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 8.1, 8.2, 8.4_
- [x] 2.2 Implement data collector (`services/reporting/collector.py`)
- Define CollectedData dataclass with fields: trading_decisions, orders, open_positions, closed_positions, portfolio_snapshot, previous_portfolio_snapshot, recommendations, prediction_outcomes, model_metric_snapshots, circuit_breaker_events, reserve_pool_balance
- Implement `collect_report_data(pool, period_start, period_end)` → CollectedData
- Query trading_decisions, orders, positions (open + closed), portfolio_snapshots (current + previous), recommendations, prediction_outcomes, model_metric_snapshots, circuit_breaker_events, reserve_pool_ledger for the period
- Return empty lists for tables with no data (zero-activity case)
- Use `_row_dict()` pattern for UUID conversion from asyncpg rows
- _Requirements: 1.1, 1.2, 1.3, 1.4, 1.5_
- [x] 2.3 Implement section builders (`services/reporting/sections.py`)
- Implement `build_pnl_section(data: CollectedData) -> PLSection` — compute realized/unrealized P&L, daily return, cumulative return, win/loss counts, win rate, profit factor, Sharpe ratio from portfolio_snapshot and closed positions
- Implement `build_recommendation_accuracy_section(data: CollectedData) -> RecommendationAccuracySection` — join trading_decisions with prediction_outcomes, compute the act/skip breakdown, the win rate of acted-on decisions, and average confidence for acted vs skipped decisions
- Implement `build_position_performance_section(data: CollectedData) -> PositionPerformanceSection` — list each position with ticker, entry price, current/exit price, P&L, P&L%, hold duration
- Implement `build_risk_metrics_section(data: CollectedData) -> RiskMetricsSection` — extract risk tier, portfolio heat, max drawdown, current drawdown %, reserve pool balance, circuit breaker event count
- Implement `build_model_quality_section(data: CollectedData) -> ModelQualitySection` — extract model_metric_snapshot values for 7d, 30d, 90d lookback windows
- Handle zero-activity gracefully (zero values, empty lists)
- _Requirements: 1.3, 1.4, 3.1, 3.2, 3.3, 3.4, 3.5_
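
Zero-activity handling matters most for the P&L statistics, because several of them divide by counts or by volatility. The sketch below shows the arithmetic under these assumptions:

- `realized_pnls` holds one realized P&L per closed position.
- `daily_returns` holds per-day portfolio returns for the period.
- Annualization uses 252 trading days with a zero risk-free rate.
- Profit factor is reported as 0.0 when there are no losing trades.

None of these choices comes from the design:

```python
import math
from dataclasses import dataclass
from statistics import mean, stdev

TRADING_DAYS_PER_YEAR = 252  # assumption: annualization factor for daily returns


@dataclass
class PnLStats:
    win_count: int
    loss_count: int
    win_rate: float
    profit_factor: float
    sharpe_ratio: float


def compute_pnl_stats(realized_pnls: list[float], daily_returns: list[float]) -> PnLStats:
    """Hypothetical core of build_pnl_section; empty inputs give all-zero stats."""
    wins = [p for p in realized_pnls if p > 0]
    losses = [p for p in realized_pnls if p < 0]
    closed = len(realized_pnls)  # breakeven trades count as closed but not as wins
    win_rate = len(wins) / closed if closed else 0.0

    gross_profit = sum(wins)
    gross_loss = -sum(losses)
    # Assumption: report 0.0 instead of infinity when there are no losing trades.
    profit_factor = gross_profit / gross_loss if gross_loss > 0 else 0.0

    # Annualized Sharpe ratio, zero risk-free rate; needs >= 2 returns and non-zero volatility.
    sharpe = 0.0
    if len(daily_returns) >= 2:
        volatility = stdev(daily_returns)
        if volatility > 0:
            sharpe = mean(daily_returns) / volatility * math.sqrt(TRADING_DAYS_PER_YEAR)

    return PnLStats(len(wins), len(losses), win_rate, profit_factor, sharpe)
```

Each guard in the sketch maps a division that would fail on an empty period to an explicit 0.0.
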
- [x] 3. Checkpoint — Verify foundation modules
- Ensure all tests pass; ask the user if questions arise.
- Run `.venv/bin/ruff check services/reporting/`
- Run `.venv/bin/python -m pytest tests/ -x --tb=short -q -k "report"` to verify models and section builders
- [x] 4. Phase 2 — Report validator and AI summarizer
- [x] 4.1 Implement report validator (`services/reporting/validator.py`)
- Define `DISCREPANCY_THRESHOLD_PCT = 5.0`
- Implement `validate_recommendation_accuracy(section, prediction_outcomes)` → list[ValidationWarning] — compare computed win rate against direction_correct/profitable from prediction_outcomes, flag >5% discrepancies
- Implement `validate_model_quality(section, metric_snapshots)` → list[ValidationWarning] — compare reported metrics against model_metric_snapshots for win_rate, directional_accuracy, IC, ECE, Brier score, flag >5% discrepancies
- Implement `compute_validation_status(report: ReportData)` → ValidationStatus — return 'passed' if no warnings, 'warnings' if any section has validation_warnings
- Handle edge cases: snapshot=0 with computed≠0 → 100% difference; both=0 → no warning; snapshot=NULL → skip; computed=NaN → replace with 0.0
- _Requirements: 4.1, 4.2, 4.3, 4.4_
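
The discrepancy rule and its edge cases fit in one comparison function. In this minimal sketch, `check_discrepancy` is a hypothetical name, and the dataclass is a stand-in for the Pydantic `ValidationWarning` model. Dividing by `abs(snapshot)` is an assumption; it keeps negative metrics such as IC comparable:

```python
import math
from dataclasses import dataclass
from typing import Optional

DISCREPANCY_THRESHOLD_PCT = 5.0


@dataclass
class ValidationWarning:  # stand-in for the Pydantic model in models.py
    metric: str
    computed: float
    snapshot: float
    difference_pct: float


def check_discrepancy(
    metric: str, computed: float, snapshot: Optional[float]
) -> Optional[ValidationWarning]:
    """Return a warning when computed and snapshot differ by more than the threshold."""
    if snapshot is None:
        return None  # NULL snapshot: nothing to compare against, skip
    if math.isnan(computed):
        computed = 0.0  # NaN computed value is replaced with 0.0
    if snapshot == 0:
        if computed == 0:
            return None  # both zero: no warning
        diff_pct = 100.0  # snapshot 0, computed non-zero: treated as 100% difference
    else:
        diff_pct = abs(computed - snapshot) * 100.0 / abs(snapshot)
    if diff_pct > DISCREPANCY_THRESHOLD_PCT:  # strictly greater: exactly 5% passes
        return ValidationWarning(metric, computed, snapshot, diff_pct)
    return None
```

The comparison runs on floats, so the "exactly 5%" case in task 8.8 is easiest to pin down with exactly representable values, such as 105.0 against 100.0. Values like 0.525 against 0.5 have no exact binary representation and may land just above or below the threshold.
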
- [x] 4.2 Implement AI summarizer (`services/reporting/summarizer.py`)
- Define constants: CHUNK_SIZE_LIMIT=6000, MAX_SUMMARY_WORDS=200, MAX_EXECUTIVE_SUMMARY_WORDS=300
- Implement `chunk_data(serialized: str, max_chars: int)` → list[str] — split on newline boundaries, each chunk ≤ max_chars, at least one chunk returned
- Implement `summarize_section(pool, resolver, section_name, section_data)` → str — serialize, chunk if needed, summarize each chunk via Report_Summarizer_Agent (resolved by slug 'report-summarizer'), merge if multiple chunks, log to agent_performance_log, and fall back to `build_deterministic_summary` on failure
- Implement `build_deterministic_summary(section_name, section_data)` → str — template-based fallback summary from raw metrics
- Implement `generate_executive_summary(pool, resolver, section_summaries)` → str — concatenate section summaries, chunk if needed, produce ≤300-word synthesis, fall back to concatenation on failure
- Use AgentConfigResolver + llm_factory for LLM access
- Log each invocation to agent_performance_log with agent_id, success, duration_ms, token estimates
- _Requirements: 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 3.6_
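
Below is one possible `chunk_data`, sketched under two assumptions. Whole lines are packed greedily. Any single line longer than `max_chars` is hard-split mid-line, since otherwise the size limit in Property 1 could not hold for long JSON lines. Each line keeps its trailing newline, so joining the chunks reproduces the input exactly. `CHUNK_SIZE_LIMIT` is the constant listed above; `_split_keep_newlines` is a hypothetical helper:

```python
CHUNK_SIZE_LIMIT = 6000


def _split_keep_newlines(text: str) -> list[str]:
    """Split on newlines while keeping the newline at the end of each line."""
    parts = text.split("\n")
    lines = [part + "\n" for part in parts[:-1]]
    if parts[-1]:  # trailing text without a final newline
        lines.append(parts[-1])
    return lines


def chunk_data(serialized: str, max_chars: int = CHUNK_SIZE_LIMIT) -> list[str]:
    """Greedily pack lines into chunks of at most max_chars; over-long lines are split."""
    if max_chars < 1:
        raise ValueError("max_chars must be at least 1")
    if not serialized:
        return [""]  # always return at least one chunk
    chunks: list[str] = []
    current = ""
    for line in _split_keep_newlines(serialized):
        if len(line) > max_chars:
            # A line longer than the limit has to be hard-split mid-line.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(line[i:i + max_chars] for i in range(0, len(line), max_chars))
        elif len(current) + len(line) > max_chars:
            chunks.append(current)  # current is non-empty here
            current = line
        else:
            current += line
    if current:
        chunks.append(current)
    return chunks
```

For example, `chunk_data("line one\nline two\nline three\n", 12)` yields three chunks, one per line, because no two lines fit together within 12 characters.
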
- [x] 5. Checkpoint — Verify validator and summarizer
- Ensure all tests pass; ask the user if questions arise.
- Run `.venv/bin/ruff check services/reporting/`
- Run `.venv/bin/python -m pytest tests/ -x --tb=short -q -k "report"` to verify validator and summarizer
- [x] 6. Phase 3 — Report generator orchestrator and API endpoints
- [x] 6.1 Implement report generator (`services/reporting/generator.py`)
- Implement `generate_report(pool, report_type, period_start, period_end)` → ReportData — orchestrate: collect data → build sections → validate → summarize → assemble ReportData
- Implement `store_report(pool, report)` → str (UUID) — INSERT ... ON CONFLICT (report_type, period_start, period_end) DO UPDATE for upsert, return report id
- Implement `process_report_job(pool, job: dict)` → None — deserialize job payload, call generate_report + store_report, handle retries with exponential backoff (30s, 60s, 120s up to 3 attempts), reject duplicate jobs for same report_type + period
- _Requirements: 5.1, 5.2, 5.3, 6.3, 6.4, 6.5_
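
The retry and duplicate-rejection rules in `process_report_job` can be sketched with an injectable handler and sleep function, which keeps them testable without Redis or a database. The sketch reads "30s, 60s, 120s up to 3 attempts" as three retries after the first try, so each listed delay is used once. That interpretation is an assumption, as are the in-process `in_flight` set and all names here:

```python
import asyncio
from collections.abc import Awaitable, Callable

BACKOFF_BASE_SECONDS = 30
MAX_RETRIES = 3  # assumption: three retries after the first attempt


def backoff_delay(retry_number: int) -> int:
    """Delay before retry 1, 2, 3: 30, 60, 120 seconds (doubling each time)."""
    return BACKOFF_BASE_SECONDS * 2 ** (retry_number - 1)


def job_key(job: dict) -> tuple[str, str, str]:
    """Jobs for the same report type and period are treated as duplicates."""
    return (job["report_type"], job["period_start"], job["period_end"])


async def run_with_retries(
    job: dict,
    handler: Callable[[dict], Awaitable[None]],
    in_flight: set[tuple[str, str, str]],
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> bool:
    """Run handler(job) with backoff; return False if a duplicate is already running."""
    key = job_key(job)
    if key in in_flight:
        return False  # reject duplicate job for the same report_type + period
    in_flight.add(key)
    try:
        for attempt in range(MAX_RETRIES + 1):
            try:
                await handler(job)
                return True
            except Exception:
                if attempt == MAX_RETRIES:
                    raise  # retries exhausted: surface the last error
                await sleep(backoff_delay(attempt + 1))
    finally:
        in_flight.discard(key)
    return False  # unreachable: the loop always returns or raises
```

An in-process set only deduplicates within one worker. Across several consumers, a Redis `SET key value NX EX <ttl>` lock keyed on report type and period would serve the same purpose.
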
- [x] 6.2 Add API endpoints to `services/api/app.py`
- Add `GET /api/reports` — paginated list with query params: report_type, start_date, end_date, limit (default 20), offset (default 0); returns id, report_type, period_start, period_end, validation_status, generated_at
- Add `GET /api/reports/{report_id}` — full report including report_data JSONB
- Use asyncpg pool from existing app state
- Return 404 for non-existent report_id
- _Requirements: 5.4, 5.5, 5.6_
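
Filtering and pagination for the list endpoint reduce to building one parameterized query for asyncpg, which uses `$1`, `$2`, ... placeholders. The sketch below assumes three things not stated in the plan:

- `start_date` bounds `period_start` and `end_date` bounds `period_end`.
- Results are ordered newest first.
- Page size is capped by `MAX_LIMIT`.

```python
from datetime import date
from typing import Any, Optional

MAX_LIMIT = 100  # assumption: cap on page size


def build_report_list_query(
    report_type: Optional[str] = None,
    start_date: Optional[date] = None,
    end_date: Optional[date] = None,
    limit: int = 20,
    offset: int = 0,
) -> tuple[str, list[Any]]:
    """Return (sql, args) for the paginated report list, with $n placeholders."""
    clauses: list[str] = []
    args: list[Any] = []
    if report_type is not None:
        args.append(report_type)
        clauses.append(f"report_type = ${len(args)}")
    if start_date is not None:
        args.append(start_date)
        clauses.append(f"period_start >= ${len(args)}")
    if end_date is not None:
        args.append(end_date)
        clauses.append(f"period_end <= ${len(args)}")
    where = f" WHERE {' AND '.join(clauses)}" if clauses else ""
    args.extend([max(1, min(limit, MAX_LIMIT)), max(0, offset)])  # clamp pagination
    sql = (
        "SELECT id, report_type, period_start, period_end, validation_status, generated_at"
        f" FROM trading_reports{where}"
        f" ORDER BY generated_at DESC LIMIT ${len(args) - 1} OFFSET ${len(args)}"
    )
    return sql, args
```

The handler would then run `rows = await pool.fetch(sql, *args)`. Only values travel as parameters; the SQL text is assembled from fixed fragments, so user input never reaches the query string.
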
- [x] 6.3 Add frontend hooks to `frontend/src/api/hooks.ts`
- Add `ReportListItem` and `ReportDetail` TypeScript interfaces
- Implement `useReports(params?)` hook — builds query string from report_type, start_date, end_date, limit, offset; uses `useGet` with 'query' base
- Implement `useReport(id)` hook — fetches single report by id, enabled only when id is defined
- _Requirements: 5.4, 5.5_
- [x] 7. Checkpoint — Verify generator and API
- Ensure all tests pass; ask the user if questions arise.
- Run `.venv/bin/ruff check services/`
- Run `.venv/bin/python -m pytest tests/ -x --tb=short -q -k "report"` to verify generator and API endpoints
- [x] 8. Phase 4 — Scheduling, property-based tests, unit tests, and frontend tests
- [x] 8.1 Wire Redis queue integration and scheduler
- Add report generation job consumer to the scheduler service that listens on `stonks:queue:report_generation`
- Add daily report trigger (after 16:30 ET on trading days) and weekly report trigger (Saturday) to the scheduler
- Job payload: `{"report_type": "daily"|"weekly", "period_start": "YYYY-MM-DD", "period_end": "YYYY-MM-DD"}`
- _Requirements: 6.1, 6.2, 6.3, 6.4, 6.5_
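
The two triggers and the job payload can be sketched as a pure function of the current Eastern Time wall clock. The sketch assumes:

- The caller has already converted to ET, for example with `zoneinfo.ZoneInfo("America/New_York")`.
- Trading days are Monday to Friday, with exchange holidays ignored.
- The Saturday weekly report covers the Monday-to-Friday week just ended.

```python
from datetime import datetime, time, timedelta

DAILY_CUTOFF_ET = time(16, 30)


def due_report_jobs(now_et: datetime) -> list[dict[str, str]]:
    """Return the job payloads that are due at the given ET wall-clock time."""
    jobs: list[dict[str, str]] = []
    today = now_et.date()
    weekday = today.weekday()  # Monday == 0 ... Sunday == 6
    # Daily report: weekdays only (holidays ignored), at or after 16:30 ET.
    if weekday < 5 and now_et.time() >= DAILY_CUTOFF_ET:
        day = today.isoformat()
        jobs.append({"report_type": "daily", "period_start": day, "period_end": day})
    # Weekly report: on Saturday, covering the Monday-Friday just ended.
    if weekday == 5:
        monday = today - timedelta(days=5)
        friday = today - timedelta(days=1)
        jobs.append({
            "report_type": "weekly",
            "period_start": monday.isoformat(),
            "period_end": friday.isoformat(),
        })
    return jobs
```

A scheduler loop could call this periodically and push each payload, JSON-encoded, onto the queue. `store_report` upserts on (report_type, period_start, period_end), so a trigger that fires twice overwrites the report instead of duplicating it.
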
- [x] 8.2 Write property test: Chunking Round-Trip and Size Constraint
- **Property 1: Chunking Round-Trip and Size Constraint**
- File: `tests/test_pbt_report_chunking.py`
- Use Hypothesis `@settings(max_examples=100)` with a single `@given(st.text(), st.integers(min_value=1, max_value=10000))`; Hypothesis rejects applying `@given` twice to the same test
- Assert: every chunk ≤ max_chars, no empty chunks (except empty input → one empty chunk), concatenation of chunks == original input
- **Validates: Requirements 2.2**
- [x] 8.3 Write property test: Report Serialization Round-Trip
- **Property 2: Report Serialization Round-Trip**
- File: `tests/test_pbt_report_serialization.py`
- Use Hypothesis with custom strategies for ReportData (valid PLSection, RecommendationAccuracySection, etc.)
- Assert: `ReportData.model_validate_json(report.model_dump_json())` == original report
- Assert: all datetime fields in serialized JSON are ISO 8601 format
- **Validates: Requirements 8.1, 8.2, 8.3, 8.4**
- [x] 8.4 Write property test: Validation Discrepancy Detection Correctness
- **Property 3: Validation Discrepancy Detection Correctness**
- File: `tests/test_pbt_report_validation.py`
- Use Hypothesis with `@given(st.floats(min_value=0, max_value=1e6), st.floats(min_value=0, max_value=1e6))`
- Assert: a warning is raised iff |computed - snapshot| / snapshot * 100 > 5.0 (when snapshot > 0); any non-zero computed value is flagged when snapshot == 0; no warning when both == 0
- **Validates: Requirements 4.1, 4.2, 4.3, 4.4**
- [x] 8.5 Write property test: Recommendation Accuracy Aggregation
- **Property 4: Recommendation Accuracy Aggregation**
- File: `tests/test_pbt_report_sections.py`
- Use Hypothesis with lists of trading decisions + prediction outcomes (direction_correct bool, profitable bool, excess_return_vs_spy float)
- Assert: win_rate == count(profitable) / total, directional_accuracy == count(direction_correct) / total, avg excess return == mean(excess_return_vs_spy), all rates in [0.0, 1.0]
- **Validates: Requirements 1.4**
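
Property 4 needs an oracle to compare the section builder against. Below is a minimal version of that aggregation. It assumes each outcome row carries the three fields named above and that an empty period yields 0.0 for every rate; `Outcome` and `aggregate_accuracy` are hypothetical names:

```python
from dataclasses import dataclass


@dataclass
class Outcome:  # hypothetical shape of one prediction_outcomes row
    direction_correct: bool
    profitable: bool
    excess_return_vs_spy: float


def aggregate_accuracy(outcomes: list[Outcome]) -> dict[str, float]:
    """Expected values for Property 4; rates stay in [0.0, 1.0]."""
    total = len(outcomes)
    if total == 0:  # assumption: zero activity reports zero rates
        return {"win_rate": 0.0, "directional_accuracy": 0.0, "avg_excess_return": 0.0}
    return {
        "win_rate": sum(o.profitable for o in outcomes) / total,
        "directional_accuracy": sum(o.direction_correct for o in outcomes) / total,
        "avg_excess_return": sum(o.excess_return_vs_spy for o in outcomes) / total,
    }
```
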
- [x] 8.6 Write property test: Portfolio Period-Over-Period Delta Computation
- **Property 5: Portfolio Period-Over-Period Delta Computation**
- File: `tests/test_pbt_report_sections.py`
- Use Hypothesis with two portfolio snapshots (non-negative portfolio_value, active_pool, reserve_pool, finite cumulative_return)
- Assert: deltas == (current - previous) for each field; when no previous snapshot, deltas == 0
- **Validates: Requirements 1.3**
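
The expected deltas for Property 5 are a field-by-field subtraction with a zero fallback. In this sketch, the `PortfolioSnapshot` dataclass is a hypothetical stand-in for a `portfolio_snapshots` row:

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class PortfolioSnapshot:  # hypothetical stand-in for a portfolio_snapshots row
    portfolio_value: float
    active_pool: float
    reserve_pool: float
    cumulative_return: float


def snapshot_deltas(
    current: PortfolioSnapshot, previous: Optional[PortfolioSnapshot]
) -> dict[str, float]:
    """current - previous per field; all zeros when there is no previous snapshot."""
    if previous is None:
        return {f.name: 0.0 for f in fields(current)}
    return {f.name: getattr(current, f.name) - getattr(previous, f.name) for f in fields(current)}
```
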
- [x] 8.7 Write unit tests for section builders
- File: `tests/test_report_sections.py`
- Test each section builder with known inputs and expected outputs
- Test edge cases: empty data (zero-activity), single position, no portfolio snapshot
- _Requirements: 3.1, 3.2, 3.3, 3.4, 3.5_
- [x] 8.8 Write unit tests for report validator
- File: `tests/test_report_validator.py`
- Test specific discrepancy scenarios: exactly 5% (no warning), 5.1% (warning), snapshot=0 computed≠0, both=0, NULL snapshot
- _Requirements: 4.1, 4.2, 4.3, 4.4_
- [x] 8.9 Write unit tests for AI summarizer
- File: `tests/test_report_summarizer.py`
- Test deterministic fallback summary generation
- Test chunk_data edge cases: empty input, single character, exactly at limit, one char over limit
- _Requirements: 2.2, 2.6_
- [x] 8.10 Write unit tests for report generator
- File: `tests/test_report_generator.py`
- Test orchestration with mocked dependencies (collector, sections, validator, summarizer)
- Test zero-activity report generation
- Test upsert behavior (regeneration of existing report)
- _Requirements: 5.1, 5.2, 5.3_
- [x] 8.11 Write API integration tests
- File: `tests/test_report_api.py`
- Test GET /api/reports with pagination, filtering by report_type and date range
- Test GET /api/reports/{report_id} with valid and invalid IDs
- _Requirements: 5.4, 5.5, 5.6_
- [x] 8.12 Write frontend hook tests
- File: `frontend/src/test/reports.test.ts`
- Test useReports and useReport hooks with MSW mocks
- Test loading and error states
- _Requirements: 5.4, 5.5_
- [x] 9. Final checkpoint — Full test suite and lint
- Ensure all tests pass; ask the user if questions arise.
- Run `.venv/bin/ruff check services/`
- Run `.venv/bin/python -m pytest tests/ -x --tb=short -q -k "report"`
- Run frontend tests: `cd frontend && npx vitest --run`
## Notes
- Tasks marked with `*` are optional and can be skipped for a faster MVP; no task in this plan currently carries the marker
- Each task references specific requirements for traceability
- Checkpoints ensure incremental validation after each phase
- Property tests validate the 5 universal correctness properties from the design document
- Unit tests validate specific examples and edge cases
- The design document contains full interface signatures — use those as the implementation guide
- Always run `.venv/bin/ruff check services/` before committing Python changes