stonks-oracle/.kiro/specs/trading-feedback-engine/tasks.md
Celes Renata bc077bfcc8
feat: trading feedback engine — periodic performance reports with AI summarization
- Migration 038: trading_reports table + report-summarizer agent seed
- 6 reporting modules: models, collector, sections, validator, summarizer, generator
- API endpoints: GET /api/reports (paginated, filterable), GET /api/reports/{id}
- Frontend hooks: useReports, useReport with TanStack Query
- Scheduler: daily (after 16:30 ET) and weekly (Saturday) report triggers
- Redis queue consumer for async report generation with retry/dedup
- 5 property-based tests (chunking, serialization, validation, accuracy, deltas)
- 109 unit/integration tests across all modules
- 6 frontend hook tests with MSW mocks
2026-05-01 22:13:09 +00:00


Implementation Plan: Trading Feedback Engine

Overview

Add a periodic trading performance reporting system to Stonks Oracle. The system collects trading data, generates structured JSON reports with AI-powered summaries, validates metrics against live data, and stores reports for retrieval via API. Implementation follows the four-phase approach from the design: foundation → validation & AI → generator & API → scheduling & tests.

Tasks

  • 1. Database migration 038 — trading_reports table and report-summarizer agent

    • 1.1 Create infra/migrations/038_trading_reports.sql

      • Create trading_reports table with columns: id (UUID PK, gen_random_uuid()), report_type (VARCHAR(20) NOT NULL), period_start (DATE NOT NULL), period_end (DATE NOT NULL), report_data (JSONB NOT NULL), validation_status (VARCHAR(20) NOT NULL DEFAULT 'passed'), generated_at (TIMESTAMPTZ NOT NULL), created_at (TIMESTAMPTZ NOT NULL DEFAULT NOW())
      • Add UNIQUE constraint on (report_type, period_start, period_end)
      • Add CHECK constraint: report_type IN ('daily', 'weekly')
      • Create indexes: idx_trading_reports_type, idx_trading_reports_period, idx_trading_reports_generated
      • Seed Report_Summarizer_Agent into ai_agents table with slug 'report-summarizer', model_provider 'ollama', model_name 'qwen3.5:9b-fast', source 'system', temperature 0.0, max_tokens 1024, timeout_seconds 60, max_retries 2
      • Use WHERE NOT EXISTS guard on agent insert to be idempotent
      • Requirements: 5.1, 5.2, 7.1, 7.2
    • 1.2 Add QUEUE_REPORT_GENERATION constant to services/shared/redis_keys.py

      • Add QUEUE_REPORT_GENERATION = "report_generation" following existing queue naming convention
      • Requirements: 6.3
  • 2. Phase 1 — Report models, data collector, and section builders

    • 2.1 Create report models (services/reporting/models.py)

      • Create services/reporting/__init__.py
      • Define enums: ReportType (daily, weekly), ValidationStatus (passed, warnings)
      • Define Pydantic models: ValidationWarning, PLSection, RecommendationAccuracySection, PositionDetail, PositionPerformanceSection, RiskMetricsSection, ModelQualityWindow, ModelQualitySection, ReportData
      • ReportData includes all sections, executive_summary, validation_status, generated_at, period_start, period_end, report_type
      • Requirements: 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 8.1, 8.2, 8.4
    • 2.2 Implement data collector (services/reporting/collector.py)

      • Define CollectedData dataclass with fields: trading_decisions, orders, open_positions, closed_positions, portfolio_snapshot, previous_portfolio_snapshot, recommendations, prediction_outcomes, model_metric_snapshots, circuit_breaker_events, reserve_pool_balance
      • Implement collect_report_data(pool, period_start, period_end) → CollectedData
      • Query trading_decisions, orders, positions (open + closed), portfolio_snapshots (current + previous), recommendations, prediction_outcomes, model_metric_snapshots, circuit_breaker_events, reserve_pool_ledger for the period
      • Return empty lists for tables with no data (zero-activity case)
      • Use _row_dict() pattern for UUID conversion from asyncpg rows
      • Requirements: 1.1, 1.2, 1.3, 1.4, 1.5
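The UUID conversion mentioned in task 2.2 could look roughly like the sketch below. The plan names a `_row_dict()` pattern but does not show it, so the function name `row_dict`, its signature, and its behavior here are assumptions for illustration only, not the project's actual helper.

```python
import uuid
from typing import Any, Mapping


def row_dict(row: Mapping[str, Any]) -> dict[str, Any]:
    """Copy a row into a plain dict, turning UUID values into strings.

    Hypothetical helper: it accepts any mapping-like row object that
    exposes .items(), and leaves non-UUID values untouched.
    """
    return {
        key: str(value) if isinstance(value, uuid.UUID) else value
        for key, value in row.items()
    }
```

Converting UUIDs to strings at the collection boundary keeps the rest of the pipeline JSON-serializable without custom encoders.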
    • 2.3 Implement section builders (services/reporting/sections.py)

      • Implement build_pnl_section(data: CollectedData) -> PLSection — compute realized/unrealized P&L, daily return, cumulative return, win/loss counts, win rate, profit factor, Sharpe ratio from portfolio_snapshot and closed positions
      • Implement build_recommendation_accuracy_section(data: CollectedData) -> RecommendationAccuracySection — join trading_decisions with prediction_outcomes, compute act/skip breakdown, win rate of acted, avg confidence acted vs skipped
      • Implement build_position_performance_section(data: CollectedData) -> PositionPerformanceSection — list each position with ticker, entry price, current/exit price, P&L, P&L%, hold duration
      • Implement build_risk_metrics_section(data: CollectedData) -> RiskMetricsSection — extract risk tier, portfolio heat, max drawdown, current drawdown %, reserve pool balance, circuit breaker event count
      • Implement build_model_quality_section(data: CollectedData) -> ModelQualitySection — extract model_metric_snapshot values for 7d, 30d, 90d lookback windows
      • Handle zero-activity gracefully (zero values, empty lists)
      • Requirements: 1.3, 1.4, 3.1, 3.2, 3.3, 3.4, 3.5
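As a rough illustration of the win/loss arithmetic in `build_pnl_section`, the sketch below aggregates a list of per-position realized P&L values. `PnLStats` and `pnl_stats` are hypothetical names standing in for a subset of `PLSection`. Treating break-even trades as neither wins nor losses, and reporting a profit factor of 0.0 when there are no losses, are assumptions rather than rules from the design.

```python
from dataclasses import dataclass


@dataclass
class PnLStats:
    """Hypothetical stand-in for a subset of the PLSection fields."""
    realized_pnl: float
    win_count: int
    loss_count: int
    win_rate: float
    profit_factor: float


def pnl_stats(closed_pnls: list[float]) -> PnLStats:
    """Aggregate realized P&L of closed positions into summary metrics."""
    wins = [p for p in closed_pnls if p > 0]
    losses = [p for p in closed_pnls if p < 0]
    decided = len(wins) + len(losses)  # assumption: break-even trades are ignored
    gross_profit = sum(wins)
    gross_loss = -sum(losses)
    return PnLStats(
        realized_pnl=sum(closed_pnls),
        win_count=len(wins),
        loss_count=len(losses),
        # Zero-activity case: fall back to 0.0 instead of dividing by zero.
        win_rate=len(wins) / decided if decided else 0.0,
        # Assumption: 0.0 (not infinity) when there are no losing trades.
        profit_factor=gross_profit / gross_loss if gross_loss else 0.0,
    )
```

With an empty input every field is zero, which matches the zero-activity requirement in the task list.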
  • 3. Checkpoint — Verify foundation modules

    • Ensure all tests pass; ask the user if questions arise.
    • Run .venv/bin/ruff check services/reporting/
    • Run .venv/bin/python -m pytest tests/ -x --tb=short -q -k "report" to verify models and section builders
  • 4. Phase 2 — Report validator and AI summarizer

    • 4.1 Implement report validator (services/reporting/validator.py)

      • Define DISCREPANCY_THRESHOLD_PCT = 5.0
      • Implement validate_recommendation_accuracy(section, prediction_outcomes) → list[ValidationWarning] — compare computed win rate against direction_correct/profitable from prediction_outcomes, flag >5% discrepancies
      • Implement validate_model_quality(section, metric_snapshots) → list[ValidationWarning] — compare reported metrics against model_metric_snapshots for win_rate, directional_accuracy, IC, ECE, Brier score, flag >5% discrepancies
      • Implement compute_validation_status(report: ReportData) → ValidationStatus — return 'passed' if no warnings, 'warnings' if any section has validation_warnings
      • Handle edge cases: snapshot=0 with computed≠0 → 100% difference; both=0 → no warning; snapshot=NULL → skip; computed=NaN → replace with 0.0
      • Requirements: 4.1, 4.2, 4.3, 4.4
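The edge cases listed in task 4.1 can be folded into one comparison helper. The sketch below is one possible shape, not the validator's actual interface: `discrepancy_pct` and `should_warn` are hypothetical names, and only `DISCREPANCY_THRESHOLD_PCT` comes from the plan.

```python
import math
from typing import Optional

DISCREPANCY_THRESHOLD_PCT = 5.0


def discrepancy_pct(computed: float, snapshot: Optional[float]) -> Optional[float]:
    """Return the percentage difference, or None when no comparison applies."""
    if snapshot is None:
        return None  # snapshot=NULL: skip the comparison
    if math.isnan(computed):
        computed = 0.0  # computed=NaN: replace with 0.0
    if snapshot == 0:
        # both=0: no difference; snapshot=0 with computed!=0: 100% difference
        return 0.0 if computed == 0 else 100.0
    return abs(computed - snapshot) / abs(snapshot) * 100.0


def should_warn(computed: float, snapshot: Optional[float]) -> bool:
    """True when the discrepancy strictly exceeds the threshold."""
    pct = discrepancy_pct(computed, snapshot)
    return pct is not None and pct > DISCREPANCY_THRESHOLD_PCT
```

The strict `>` comparison means a difference of exactly 5% does not warn, in line with the unit-test scenarios in task 8.8.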
    • 4.2 Implement AI summarizer (services/reporting/summarizer.py)

      • Define constants: CHUNK_SIZE_LIMIT=6000, MAX_SUMMARY_WORDS=200, MAX_EXECUTIVE_SUMMARY_WORDS=300
      • Implement chunk_data(serialized: str, max_chars: int) → list[str] — split on newline boundaries, each chunk ≤ max_chars, at least one chunk returned
      • Implement summarize_section(pool, resolver, section_name, section_data) → str — serialize, chunk if needed, summarize each chunk via Report_Summarizer_Agent (resolved by slug 'report-summarizer'), merge if multiple chunks, log to agent_performance_log, fall back to deterministic on failure
      • Implement build_deterministic_summary(section_name, section_data) → str — template-based fallback summary from raw metrics
      • Implement generate_executive_summary(pool, resolver, section_summaries) → str — concatenate section summaries, chunk if needed, produce ≤300-word synthesis, fall back to concatenation on failure
      • Use AgentConfigResolver + llm_factory for LLM access
      • Log each invocation to agent_performance_log with agent_id, success, duration_ms, token estimates
      • Requirements: 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 3.6
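A minimal sketch of `chunk_data` that satisfies the stated contract (split on line boundaries, every chunk at most `max_chars`, at least one chunk returned, concatenation equals the input). Hard-splitting a single line that is longer than the limit is an assumption; the plan does not say how oversized lines are handled.

```python
def chunk_data(serialized: str, max_chars: int) -> list[str]:
    """Split text into chunks of at most max_chars, preferring line boundaries."""
    if max_chars < 1:
        raise ValueError("max_chars must be >= 1")
    if not serialized:
        return [""]  # empty input still yields exactly one (empty) chunk

    chunks: list[str] = []
    current = ""
    # keepends=True preserves line terminators so the chunks join back losslessly.
    for line in serialized.splitlines(keepends=True):
        # Assumption: a line longer than the limit is cut at max_chars.
        while len(line) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:max_chars])
            line = line[max_chars:]
        # Start a new chunk when this line would overflow the current one.
        if len(current) + len(line) > max_chars:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks
```

With `CHUNK_SIZE_LIMIT=6000`, a section that serializes to fewer than 6000 characters comes back as a single chunk, so the merge step is only needed for larger sections.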
  • 5. Checkpoint — Verify validator and summarizer

    • Ensure all tests pass; ask the user if questions arise.
    • Run .venv/bin/ruff check services/reporting/
    • Run .venv/bin/python -m pytest tests/ -x --tb=short -q -k "report" to verify validator and summarizer
  • 6. Phase 3 — Report generator orchestrator and API endpoints

    • 6.1 Implement report generator (services/reporting/generator.py)

      • Implement generate_report(pool, report_type, period_start, period_end) → ReportData — orchestrate: collect data → build sections → validate → summarize → assemble ReportData
      • Implement store_report(pool, report) → str (UUID) — INSERT ... ON CONFLICT (report_type, period_start, period_end) DO UPDATE for upsert, return report id
      • Implement process_report_job(pool, job: dict) → None — deserialize job payload, call generate_report + store_report, handle retries with exponential backoff (30s, 60s, 120s up to 3 attempts), reject duplicate jobs for same report_type + period
      • Requirements: 5.1, 5.2, 5.3, 6.3, 6.4, 6.5
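The retry and duplicate-rejection rules for `process_report_job` could be expressed with two small helpers. This is a sketch under assumptions: `backoff_delay` and `dedup_key` are hypothetical names, the key format is invented for illustration, and "30s, 60s, 120s" is read here as the delay before the first, second, and third retry.

```python
from typing import Optional

BASE_DELAY_SECONDS = 30
MAX_RETRIES = 3  # assumption: three retries, giving delays of 30s, 60s, 120s


def backoff_delay(retry_number: int) -> Optional[int]:
    """Seconds to wait before the given retry (1-based), or None to give up."""
    if retry_number < 1 or retry_number > MAX_RETRIES:
        return None
    return BASE_DELAY_SECONDS * 2 ** (retry_number - 1)


def dedup_key(job: dict) -> str:
    """Identity of a job: same report_type and period means the same job.

    Hypothetical key format; the design may use a different scheme.
    """
    return f"{job['report_type']}:{job['period_start']}:{job['period_end']}"
```

A consumer could store `dedup_key(job)` in a set of in-flight jobs and reject any incoming job whose key is already present.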
    • 6.2 Add API endpoints to services/api/app.py

      • Add GET /api/reports — paginated list with query params: report_type, start_date, end_date, limit (default 20), offset (default 0); returns id, report_type, period_start, period_end, validation_status, generated_at
      • Add GET /api/reports/{report_id} — full report including report_data JSONB
      • Use asyncpg pool from existing app state
      • Return 404 for non-existent report_id
      • Requirements: 5.4, 5.5, 5.6
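For the filterable list endpoint, one way to assemble the query is to build the WHERE clause and the positional parameters together. The sketch below uses `$n` placeholders, which is the parameter style asyncpg expects. `build_list_query` is a hypothetical helper, and the filter semantics (`period_start >= start_date`, `period_end <= end_date`) and the ordering by `generated_at` are assumptions.

```python
from datetime import date
from typing import Any, Optional


def build_list_query(
    report_type: Optional[str] = None,
    start_date: Optional[date] = None,
    end_date: Optional[date] = None,
    limit: int = 20,
    offset: int = 0,
) -> tuple[str, list[Any]]:
    """Return (sql, params) for the paginated report list."""
    clauses: list[str] = []
    params: list[Any] = []
    # Each filter appends its value first so len(params) is its placeholder index.
    if report_type is not None:
        params.append(report_type)
        clauses.append(f"report_type = ${len(params)}")
    if start_date is not None:
        params.append(start_date)
        clauses.append(f"period_start >= ${len(params)}")
    if end_date is not None:
        params.append(end_date)
        clauses.append(f"period_end <= ${len(params)}")
    where = f" WHERE {' AND '.join(clauses)}" if clauses else ""
    params.extend([limit, offset])
    sql = (
        "SELECT id, report_type, period_start, period_end,"
        " validation_status, generated_at"
        f" FROM trading_reports{where}"
        f" ORDER BY generated_at DESC"
        f" LIMIT ${len(params) - 1} OFFSET ${len(params)}"
    )
    return sql, params
```

The returned pair can be passed to a pool call such as `await pool.fetch(sql, *params)`; user input only ever travels in `params`, never in the SQL string.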
    • 6.3 Add frontend hooks to frontend/src/api/hooks.ts

      • Add ReportListItem and ReportDetail TypeScript interfaces
      • Implement useReports(params?) hook — builds query string from report_type, start_date, end_date, limit, offset; uses useGet with 'query' base
      • Implement useReport(id) hook — fetches single report by id, enabled only when id is defined
      • Requirements: 5.4, 5.5
  • 7. Checkpoint — Verify generator and API

    • Ensure all tests pass; ask the user if questions arise.
    • Run .venv/bin/ruff check services/
    • Run .venv/bin/python -m pytest tests/ -x --tb=short -q -k "report" to verify generator and API endpoints
  • 8. Phase 4 — Scheduling, property-based tests, unit tests, and frontend tests

    • 8.1 Wire Redis queue integration and scheduler

      • Add report generation job consumer to the scheduler service that listens on stonks:queue:report_generation
      • Add daily report trigger (after 16:30 ET on trading days) and weekly report trigger (Saturday) to the scheduler
      • Job payload: {"report_type": "daily"|"weekly", "period_start": "YYYY-MM-DD", "period_end": "YYYY-MM-DD"}
      • Requirements: 6.1, 6.2, 6.3, 6.4, 6.5
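The job payload from task 8.1 can be produced by two small builders. This is a sketch: `daily_job` and `weekly_job` are hypothetical names, and the weekly period covering the Monday through Friday before the Saturday trigger is an assumption, since the plan only fixes the trigger day.

```python
import json
from datetime import date, timedelta


def daily_job(trading_day: date) -> dict:
    """Payload for a daily report covering a single trading day."""
    day = trading_day.isoformat()  # YYYY-MM-DD
    return {"report_type": "daily", "period_start": day, "period_end": day}


def weekly_job(saturday: date) -> dict:
    """Payload for a weekly report triggered on a Saturday.

    Assumption: the period is the Monday..Friday immediately before it.
    """
    if saturday.weekday() != 5:  # Monday is 0, so Saturday is 5
        raise ValueError("weekly reports are triggered on Saturdays")
    return {
        "report_type": "weekly",
        "period_start": (saturday - timedelta(days=5)).isoformat(),
        "period_end": (saturday - timedelta(days=1)).isoformat(),
    }


# Serialized form as it would be pushed onto the queue.
payload = json.dumps(weekly_job(date(2026, 5, 2)))
```

Keeping the dates as ISO strings in the payload means the consumer can parse them with `date.fromisoformat` without any custom JSON handling.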
    • 8.2 Write property test: Chunking Round-Trip and Size Constraint

      • Property 1: Chunking Round-Trip and Size Constraint
      • File: tests/test_pbt_report_chunking.py
      • Use Hypothesis @settings(max_examples=100) with a single @given(st.text(), st.integers(min_value=1, max_value=10000)); both strategies go in one @given call rather than two stacked decorators
      • Assert: every chunk ≤ max_chars, no empty chunks (except empty input → one empty chunk), concatenation of chunks == original input
      • Validates: Requirements 2.2
    • 8.3 Write property test: Report Serialization Round-Trip

      • Property 2: Report Serialization Round-Trip
      • File: tests/test_pbt_report_serialization.py
      • Use Hypothesis with custom strategies for ReportData (valid PLSection, RecommendationAccuracySection, etc.)
      • Assert: ReportData.model_validate_json(report.model_dump_json()) == original report
      • Assert: all datetime fields in serialized JSON are ISO 8601 format
      • Validates: Requirements 8.1, 8.2, 8.3, 8.4
    • 8.4 Write property test: Validation Discrepancy Detection Correctness

      • Property 3: Validation Discrepancy Detection Correctness
      • File: tests/test_pbt_report_validation.py
      • Use Hypothesis with @given(st.floats(min_value=0, max_value=1e6), st.floats(min_value=0, max_value=1e6))
      • Assert: warning iff |computed - snapshot| / snapshot * 100 > 5% (when snapshot > 0); flag any non-zero computed when snapshot == 0; no warning when both == 0
      • Validates: Requirements 4.1, 4.2, 4.3, 4.4
    • 8.5 Write property test: Recommendation Accuracy Aggregation

      • Property 4: Recommendation Accuracy Aggregation
      • File: tests/test_pbt_report_sections.py
      • Use Hypothesis with lists of trading decisions + prediction outcomes (direction_correct bool, profitable bool, excess_return_vs_spy float)
      • Assert: win_rate == count(profitable) / total, directional_accuracy == count(direction_correct) / total, avg excess return == mean(excess_return_vs_spy), all rates in [0.0, 1.0]
      • Validates: Requirements 1.4
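The aggregation that Property 4 checks can be written as a plain reference function, which also works as the oracle inside the property test. `aggregate_outcomes` is a hypothetical name, the dict keys mirror the field names listed above, and returning zeros for an empty input is an assumption based on the zero-activity handling elsewhere in the plan.

```python
from statistics import fmean


def aggregate_outcomes(outcomes: list[dict]) -> dict:
    """Compute win rate, directional accuracy, and mean excess return."""
    total = len(outcomes)
    if total == 0:
        # Assumption: zero-activity periods report zeros rather than failing.
        return {"win_rate": 0.0, "directional_accuracy": 0.0, "avg_excess_return": 0.0}
    return {
        "win_rate": sum(1 for o in outcomes if o["profitable"]) / total,
        "directional_accuracy": sum(1 for o in outcomes if o["direction_correct"]) / total,
        "avg_excess_return": fmean([o["excess_return_vs_spy"] for o in outcomes]),
    }
```

Because both rates are counts divided by the total, they always fall in [0.0, 1.0], which is the range assertion in the property.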
    • 8.6 Write property test: Portfolio Period-Over-Period Delta Computation

      • Property 5: Portfolio Period-Over-Period Delta Computation
      • File: tests/test_pbt_report_sections.py
      • Use Hypothesis with two portfolio snapshots (non-negative portfolio_value, active_pool, reserve_pool, finite cumulative_return)
      • Assert: deltas == (current - previous) for each field; when no previous snapshot, deltas == 0
      • Validates: Requirements 1.3
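The period-over-period delta in Property 5 reduces to a field-wise subtraction with a zero fallback. In the sketch below, `snapshot_deltas` and `DELTA_FIELDS` are hypothetical names; the field list is taken from the property description.

```python
from typing import Optional

DELTA_FIELDS = ("portfolio_value", "active_pool", "reserve_pool", "cumulative_return")


def snapshot_deltas(current: dict, previous: Optional[dict]) -> dict:
    """Return current minus previous per field, or zeros with no previous snapshot."""
    if previous is None:
        return {field: 0.0 for field in DELTA_FIELDS}
    return {field: current[field] - previous[field] for field in DELTA_FIELDS}
```

A property test can generate two snapshots, call this function, and assert that `previous[field] + delta[field]` is close to `current[field]` for every field.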
    • 8.7 Write unit tests for section builders

      • File: tests/test_report_sections.py
      • Test each section builder with known inputs and expected outputs
      • Test edge cases: empty data (zero-activity), single position, no portfolio snapshot
      • Requirements: 3.1, 3.2, 3.3, 3.4, 3.5
    • 8.8 Write unit tests for report validator

      • File: tests/test_report_validator.py
      • Test specific discrepancy scenarios: exactly 5% (no warning), 5.1% (warning), snapshot=0 computed≠0, both=0, NULL snapshot
      • Requirements: 4.1, 4.2, 4.3, 4.4
    • 8.9 Write unit tests for AI summarizer

      • File: tests/test_report_summarizer.py
      • Test deterministic fallback summary generation
      • Test chunk_data edge cases: empty input, single character, exactly at limit, one char over limit
      • Requirements: 2.2, 2.6
    • 8.10 Write unit tests for report generator

      • File: tests/test_report_generator.py
      • Test orchestration with mocked dependencies (collector, sections, validator, summarizer)
      • Test zero-activity report generation
      • Test upsert behavior (regeneration of existing report)
      • Requirements: 5.1, 5.2, 5.3
    • 8.11 Write API integration tests

      • File: tests/test_report_api.py
      • Test GET /api/reports with pagination, filtering by report_type and date range
      • Test GET /api/reports/{report_id} with valid and invalid IDs
      • Requirements: 5.4, 5.5, 5.6
    • 8.12 Write frontend hook tests

      • File: frontend/src/test/reports.test.ts
      • Test useReports and useReport hooks with MSW mocks
      • Test loading and error states
      • Requirements: 5.4, 5.5
  • 9. Final checkpoint — Full test suite and lint

    • Ensure all tests pass; ask the user if questions arise.
    • Run .venv/bin/ruff check services/
    • Run .venv/bin/python -m pytest tests/ -x --tb=short -q -k "report"
    • Run frontend tests: cd frontend && npx vitest --run

Notes

  • Tasks marked with * are optional and can be skipped for faster MVP
  • Each task references specific requirements for traceability
  • Checkpoints ensure incremental validation after each phase
  • Property tests validate the 5 universal correctness properties from the design document
  • Unit tests validate specific examples and edge cases
  • The design document contains full interface signatures — use those as the implementation guide
  • Always run .venv/bin/ruff check services/ before committing Python changes