stonks-oracle/.kiro/specs/trading-feedback-engine/design.md
Celes Renata bc077bfcc8
feat: trading feedback engine — periodic performance reports with AI summarization
- Migration 038: trading_reports table + report-summarizer agent seed
- 6 reporting modules: models, collector, sections, validator, summarizer, generator
- API endpoints: GET /api/reports (paginated, filterable), GET /api/reports/{id}
- Frontend hooks: useReports, useReport with TanStack Query
- Scheduler: daily (after 16:30 ET) and weekly (Saturday) report triggers
- Redis queue consumer for async report generation with retry/dedup
- 5 property-based tests (chunking, serialization, validation, accuracy, deltas)
- 109 unit/integration tests across all modules
- 6 frontend hook tests with MSW mocks
2026-05-01 22:13:09 +00:00


Design Document — Trading Feedback Engine

Overview

This design adds a periodic trading performance reporting system to Stonks Oracle. The system collects trading data (P&L, recommendations, positions, risk metrics, model quality), generates structured JSON reports with AI-powered summaries, validates report metrics against live data, and stores reports for retrieval via API.

The core challenge is fitting AI summarization within the 8k-token context window of the qwen3.5:9b-fast model on the local Ollama instance. The design addresses this with a chunking strategy that serializes report section data into ≤6,000-character chunks, summarizes each chunk independently, then merges chunk summaries into a final section summary. This hierarchical summarization approach keeps each LLM call well within the token budget while producing coherent narratives.

Design Rationale

A trading system without periodic performance feedback forces the operator to manually query tables and compute metrics. The feedback engine closes this gap by:

  1. Automating data collection — pulling from 7+ tables (trading_decisions, orders, positions, portfolio_snapshots, recommendations, prediction_outcomes, model_metric_snapshots) into a single structured report
  2. AI-powered summarization — using the existing agent infrastructure (ai_agents, AgentConfigResolver, llm_factory) to generate natural-language summaries that highlight trends and anomalies
  3. Cross-validation — comparing computed metrics against live validation data (prediction_outcomes, model_metric_snapshots) and flagging discrepancies >5%
  4. Persistent storage — storing reports as JSONB for historical comparison and trend analysis
  5. Scheduled generation — daily (after market close) and weekly (Saturday) reports via Redis queue jobs

The design reuses existing infrastructure: asyncpg for persistence, FastAPI for API endpoints, Redis queues for async job processing, the ai_agents/AgentConfigResolver/llm_factory stack for LLM access, and TanStack Query hooks on the frontend.


Architecture

High-Level Data Flow

flowchart TD
    subgraph "Scheduling (Trigger)"
        A[Scheduler Service] -->|after 16:30 ET daily| B[Redis Queue<br/>stonks:queue:report_generation]
        A -->|Saturday weekly| B
        C[Manual API Trigger] --> B
    end

    subgraph "Report Generation (Async Worker)"
        B --> D[Report Generator<br/>services/reporting/generator.py]
        D -->|1. Collect| E[Data Collector<br/>services/reporting/collector.py]
        E -->|queries| F[(trading_decisions<br/>orders, positions<br/>portfolio_snapshots<br/>recommendations)]
        D -->|2. Build sections| G[Section Builder<br/>services/reporting/sections.py]
        G -->|P&L, accuracy,<br/>positions, risk,<br/>model quality| H[Report Sections]
        D -->|3. Validate| I[Report Validator<br/>services/reporting/validator.py]
        I -->|cross-check| J[(prediction_outcomes<br/>model_metric_snapshots)]
        D -->|4. Summarize| K[AI Summarizer<br/>services/reporting/summarizer.py]
        K -->|chunk & summarize| L[Report_Summarizer_Agent<br/>via AgentConfigResolver<br/>+ llm_factory]
        D -->|5. Store| M[(trading_reports table)]
    end

    subgraph "API Layer"
        N[GET /api/reports] -->|paginated list| M
        O[GET /api/reports/:id] -->|full report| M
    end

    subgraph "Frontend"
        P[useReports hook] --> N
        Q[useReport hook] --> O
    end

Scheduling Strategy

| Component | Trigger | Cadence |
|---|---|---|
| Daily Report | Scheduler after 16:30 ET | Every trading day |
| Weekly Report | Scheduler on Saturday | Weekly (Mon-Fri coverage) |
| Report Generator Worker | Redis queue consumer | On-demand from queue |
| AI Summarizer | Called by generator | Per report section |
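
The trigger rules in the table can be sketched as a pure function. This is a minimal illustration, not the scheduler's actual code: `due_reports` and `DAILY_CUTOFF` are hypothetical names, `now_et` is assumed to be wall-clock time already converted to US Eastern, and a "trading day" is assumed to be any weekday (exchange holidays are ignored).

```python
from datetime import date, datetime, time, timedelta

DAILY_CUTOFF = time(16, 30)  # "after 16:30 ET" from the scheduling table


def due_reports(now_et: datetime) -> list[tuple[str, date, date]]:
    """Return (report_type, period_start, period_end) tuples due at now_et.

    Assumptions: now_et is already in US Eastern wall-clock time, and every
    weekday counts as a trading day (no holiday calendar).
    """
    due: list[tuple[str, date, date]] = []
    today = now_et.date()
    weekday = today.weekday()  # Monday == 0 ... Sunday == 6

    # Daily report: weekdays only, once the cutoff has passed.
    if weekday < 5 and now_et.time() >= DAILY_CUTOFF:
        due.append(("daily", today, today))

    # Weekly report: on Saturday, covering the Monday-Friday week just ended.
    if weekday == 5:
        monday = today - timedelta(days=5)
        friday = today - timedelta(days=1)
        due.append(("weekly", monday, friday))

    return due
```

Keeping the decision free of I/O makes it easy to unit-test; the real scheduler would still need to enqueue the job and guard against enqueuing the same period twice.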

Chunking Strategy

The qwen3.5:9b-fast model has an 8k-token context window. With the system prompt (~200 tokens) and response budget (~200 tokens), roughly 7,600 tokens remain for input. At ~4 chars/token for structured data, that's ~30,400 characters. The 6,000-character chunk limit provides a 5x safety margin to account for JSON overhead, prompt framing, and tokenization variance.
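
The budget arithmetic above can be reproduced in a few lines. The constant names are illustrative, and the 8k window is taken as 8,000 tokens because that is what the figures in the paragraph imply.

```python
# Token budget for one summarization call (figures from the paragraph above).
CONTEXT_WINDOW_TOKENS = 8_000   # assumption: "8k" read as 8,000
SYSTEM_PROMPT_TOKENS = 200
RESPONSE_BUDGET_TOKENS = 200
CHARS_PER_TOKEN = 4             # rough estimate for structured data
CHUNK_SIZE_LIMIT = 6_000        # characters per chunk

input_tokens = CONTEXT_WINDOW_TOKENS - SYSTEM_PROMPT_TOKENS - RESPONSE_BUDGET_TOKENS
input_chars = input_tokens * CHARS_PER_TOKEN
safety_margin = input_chars / CHUNK_SIZE_LIMIT

print(input_tokens, input_chars, round(safety_margin, 2))  # 7600 30400 5.07
```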

flowchart LR
    A[Section Data<br/>e.g. 15,000 chars] --> B{> 6,000 chars?}
    B -->|No| C[Single LLM call<br/>→ summary]
    B -->|Yes| D[Split into chunks<br/>≤ 6,000 chars each]
    D --> E[Chunk 1 → LLM → summary 1]
    D --> F[Chunk 2 → LLM → summary 2]
    D --> G[Chunk N → LLM → summary N]
    E --> H[Merge summaries<br/>→ final LLM call<br/>→ section summary]
    F --> H
    G --> H
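
The flow in the diagram can be sketched as follows. This is a hedged illustration: `summarize_hierarchically` is a hypothetical helper, and the `summarize` callable stands in for a call to the Report_Summarizer_Agent so the control flow can be shown without an LLM.

```python
from typing import Callable


def summarize_hierarchically(
    chunks: list[str],
    summarize: Callable[[str], str],
) -> str:
    """Summarize each chunk, then merge the partial summaries in one final call.

    `summarize` is a stand-in for the LLM call; any str -> str function works.
    """
    if not chunks:
        return ""
    partials = [summarize(chunk) for chunk in chunks]
    if len(partials) == 1:
        # Data fit in a single chunk: one LLM call, no merge step.
        return partials[0]
    # Multiple chunks: merge the partial summaries with a final call.
    return summarize("\n".join(partials))
```

For N chunks this makes N + 1 calls. The sketch does not handle the case where the joined partial summaries themselves exceed the chunk limit; with summaries capped at about 200 words each, that would take a large number of chunks.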

Components and Interfaces

New Modules

| Module | File | Responsibility |
|---|---|---|
| Report Data Collector | services/reporting/collector.py | Queries trading data for a reporting period |
| Report Section Builder | services/reporting/sections.py | Builds structured report sections from raw data |
| Report Validator | services/reporting/validator.py | Cross-checks metrics against validation tables |
| AI Summarizer | services/reporting/summarizer.py | Chunks data and generates AI summaries |
| Report Generator | services/reporting/generator.py | Orchestrates the full report generation pipeline |
| Report Models | services/reporting/models.py | Pydantic models for report structure and serialization |

Modified Modules

| Module | File | Changes |
|---|---|---|
| Query API | services/api/app.py | 2 new /api/reports endpoints |
| Redis Keys | services/shared/redis_keys.py | New QUEUE_REPORT_GENERATION constant |
| Frontend Hooks | frontend/src/api/hooks.ts | 2 new report hooks |
| DB Migration | infra/migrations/038_trading_reports.sql | New table + agent seed |

Component Interface Details

1. Report Models (services/reporting/models.py)

from __future__ import annotations
from datetime import date, datetime
from enum import Enum
from pydantic import BaseModel, Field


class ReportType(str, Enum):
    DAILY = "daily"
    WEEKLY = "weekly"


class ValidationStatus(str, Enum):
    PASSED = "passed"
    WARNINGS = "warnings"


class ValidationWarning(BaseModel):
    field_name: str
    computed_value: float
    snapshot_value: float
    pct_difference: float


class PLSection(BaseModel):
    realized_pnl: float
    unrealized_pnl: float
    daily_return: float
    cumulative_return: float
    win_count: int
    loss_count: int
    win_rate: float
    profit_factor: float
    sharpe_ratio: float
    summary: str = ""
    validation_warnings: list[ValidationWarning] = Field(default_factory=list)


class RecommendationAccuracySection(BaseModel):
    total_evaluated: int
    act_count: int
    skip_count: int
    acted_win_rate: float
    avg_confidence_acted: float
    avg_confidence_skipped: float
    summary: str = ""
    validation_warnings: list[ValidationWarning] = Field(default_factory=list)


class PositionDetail(BaseModel):
    ticker: str
    entry_price: float
    current_or_exit_price: float
    pnl: float
    pnl_pct: float
    hold_duration_hours: float
    status: str  # "open" or "closed"


class PositionPerformanceSection(BaseModel):
    positions: list[PositionDetail] = Field(default_factory=list)
    summary: str = ""


class RiskMetricsSection(BaseModel):
    current_risk_tier: str
    portfolio_heat: float
    max_drawdown: float
    current_drawdown_pct: float
    reserve_pool_balance: float
    circuit_breaker_event_count: int
    summary: str = ""


class ModelQualityWindow(BaseModel):
    lookback: str
    win_rate: float | None
    directional_accuracy: float | None
    information_coefficient: float | None
    calibration_error: float | None
    brier_score: float | None


class ModelQualitySection(BaseModel):
    windows: list[ModelQualityWindow] = Field(default_factory=list)
    summary: str = ""
    validation_warnings: list[ValidationWarning] = Field(default_factory=list)


class ReportData(BaseModel):
    """Top-level report structure stored as JSONB."""
    pnl: PLSection
    recommendation_accuracy: RecommendationAccuracySection
    position_performance: PositionPerformanceSection
    risk_metrics: RiskMetricsSection
    model_quality: ModelQualitySection
    executive_summary: str = ""
    validation_status: ValidationStatus = ValidationStatus.PASSED
    generated_at: datetime
    period_start: date
    period_end: date
    report_type: ReportType

2. Report Data Collector (services/reporting/collector.py)

from __future__ import annotations
from dataclasses import dataclass
from datetime import date
import asyncpg


@dataclass
class CollectedData:
    """Raw data collected for a reporting period."""
    trading_decisions: list[dict]
    orders: list[dict]
    open_positions: list[dict]
    closed_positions: list[dict]
    portfolio_snapshot: dict | None
    previous_portfolio_snapshot: dict | None
    recommendations: list[dict]
    prediction_outcomes: list[dict]
    model_metric_snapshots: list[dict]
    circuit_breaker_events: list[dict]
    reserve_pool_balance: float


async def collect_report_data(
    pool: asyncpg.Pool,
    period_start: date,
    period_end: date,
) -> CollectedData:
    """Query all trading data for the reporting period.

    Queries: trading_decisions, orders, positions, portfolio_snapshots,
    recommendations, prediction_outcomes, model_metric_snapshots,
    circuit_breaker_events, reserve_pool_ledger.

    Returns CollectedData with all raw query results.
    If no trading_decisions exist, returns empty lists (zero-activity).
    """
    ...

3. Report Section Builder (services/reporting/sections.py)

from __future__ import annotations
from services.reporting.models import (
    PLSection, RecommendationAccuracySection,
    PositionPerformanceSection, PositionDetail,
    RiskMetricsSection, ModelQualitySection, ModelQualityWindow,
)
from services.reporting.collector import CollectedData


def build_pnl_section(data: CollectedData) -> PLSection:
    """Build P&L section from collected data.

    Computes realized/unrealized P&L, daily return, cumulative return,
    win/loss counts, win rate, profit factor, and Sharpe ratio from
    portfolio_snapshot and closed positions.
    """
    ...


def build_recommendation_accuracy_section(data: CollectedData) -> RecommendationAccuracySection:
    """Build recommendation accuracy section.

    Joins trading_decisions with prediction_outcomes to compute
    act/skip breakdown, win rate of acted recommendations, and
    average confidence of acted vs skipped.
    """
    ...


def build_position_performance_section(data: CollectedData) -> PositionPerformanceSection:
    """Build position performance section.

    Lists each position (open and closed) with entry price,
    current/exit price, P&L, P&L%, and hold duration.
    """
    ...


def build_risk_metrics_section(data: CollectedData) -> RiskMetricsSection:
    """Build risk metrics section.

    Extracts current risk tier, portfolio heat, max drawdown,
    current drawdown %, reserve pool balance, and circuit breaker
    event count from collected data.
    """
    ...


def build_model_quality_section(data: CollectedData) -> ModelQualitySection:
    """Build model quality section.

    Extracts latest model_metric_snapshot values for 7d, 30d, 90d
    lookback windows.
    """
    ...

4. Report Validator (services/reporting/validator.py)

from __future__ import annotations
from services.reporting.models import (
    ModelQualitySection, RecommendationAccuracySection,
    ReportData, ValidationStatus, ValidationWarning,
)


DISCREPANCY_THRESHOLD_PCT = 5.0


def validate_recommendation_accuracy(
    section: "RecommendationAccuracySection",
    prediction_outcomes: list[dict],
) -> list[ValidationWarning]:
    """Cross-reference reported win rates with prediction_outcomes.

    Compares computed win rate against direction_correct/profitable
    fields from prediction_outcomes for the same tickers and period.
    Returns warnings for discrepancies > 5%.
    """
    ...


def validate_model_quality(
    section: "ModelQualitySection",
    metric_snapshots: list[dict],
) -> list[ValidationWarning]:
    """Compare reported model quality metrics against model_metric_snapshots.

    Flags discrepancies > 5% between computed and snapshot values
    for win_rate, directional_accuracy, IC, ECE, and Brier score.
    """
    ...


def compute_validation_status(report: ReportData) -> ValidationStatus:
    """Determine overall validation status.

    Returns 'passed' if no warnings across all sections,
    'warnings' if any section has validation warnings.
    """
    ...

5. AI Summarizer (services/reporting/summarizer.py)

from __future__ import annotations
import asyncpg
from services.shared.agent_config import AgentConfigResolver


CHUNK_SIZE_LIMIT = 6000  # characters per chunk
MAX_SUMMARY_WORDS = 200  # per section summary
MAX_EXECUTIVE_SUMMARY_WORDS = 300


def chunk_data(serialized: str, max_chars: int = CHUNK_SIZE_LIMIT) -> list[str]:
    """Split serialized data into chunks of at most max_chars.

    Splits on newline boundaries to avoid breaking JSON structures.
    Each chunk is ≤ max_chars characters.
    Returns at least one chunk (even if empty input).
    """
    ...


async def summarize_section(
    pool: asyncpg.Pool,
    resolver: AgentConfigResolver,
    section_name: str,
    section_data: str,
) -> str:
    """Generate AI summary for a report section.

    1. Serialize section data to string
    2. Chunk if > CHUNK_SIZE_LIMIT
    3. Summarize each chunk via Report_Summarizer_Agent
    4. If multiple chunks, merge summaries with a final LLM call
    5. Log each invocation to agent_performance_log
    6. On failure after max_retries, fall back to deterministic summary

    Uses AgentConfigResolver to resolve agent config by slug
    'report-summarizer', then llm_factory to build the LLM client.
    """
    ...


def build_deterministic_summary(section_name: str, section_data: dict) -> str:
    """Build a fallback deterministic summary from raw metrics.

    Produces a template-based text summary when AI summarization fails.
    """
    ...


async def generate_executive_summary(
    pool: asyncpg.Pool,
    resolver: AgentConfigResolver,
    section_summaries: dict[str, str],
) -> str:
    """Generate executive summary from all section summaries.

    Concatenates section summaries, chunks if needed, and produces
    a ≤300-word synthesis via the Report_Summarizer_Agent.
    Falls back to concatenated section summaries on failure.
    """
    ...
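
A minimal sketch of what the deterministic fallback could look like, assuming the section has already been dumped to a plain dict. The output template and the `SKIPPED_KEYS` constant are illustrative choices, not taken from the project.

```python
SKIPPED_KEYS = ("summary", "validation_warnings")  # not metrics


def build_deterministic_summary(section_name: str, section_data: dict) -> str:
    """Template-based fallback: list the section's scalar metrics in order."""
    parts: list[str] = []
    for key, value in section_data.items():
        # Skip non-metric keys, nested structures, and missing values.
        if key in SKIPPED_KEYS or isinstance(value, (list, dict)) or value is None:
            continue
        label = key.replace("_", " ")
        text = f"{value:.2f}" if isinstance(value, float) else str(value)
        parts.append(f"{label}: {text}")
    if not parts:
        return f"{section_name}: no metrics available for this period."
    return f"{section_name}: " + "; ".join(parts) + "."
```

Because the text is built only from values already present in the section, the fallback cannot introduce figures that are absent from the data.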

6. Report Generator (services/reporting/generator.py)

from __future__ import annotations
from datetime import date
import asyncpg
from services.reporting.models import ReportData, ReportType


async def generate_report(
    pool: asyncpg.Pool,
    report_type: ReportType,
    period_start: date,
    period_end: date,
) -> ReportData:
    """Orchestrate full report generation.

    1. Collect data via collector
    2. Build sections via section builder
    3. Validate sections via validator
    4. Generate AI summaries via summarizer
    5. Generate executive summary
    6. Assemble final ReportData
    """
    ...


async def store_report(
    pool: asyncpg.Pool,
    report: ReportData,
) -> str:
    """Store report in trading_reports table.

    Uses INSERT ... ON CONFLICT (report_type, period_start, period_end)
    DO UPDATE to handle regeneration of existing reports.

    Returns the report UUID.
    """
    ...


async def process_report_job(
    pool: asyncpg.Pool,
    job: dict,
) -> None:
    """Process a report generation job from the Redis queue.

    Deserializes job payload, calls generate_report + store_report.
    Handles retries with exponential backoff (up to 3 attempts).
    Rejects duplicate jobs for the same report_type + period.
    """
    ...

7. API Endpoints (added to services/api/app.py)

| Endpoint | Method | Parameters | Returns |
|---|---|---|---|
| GET /api/reports | GET | report_type, start_date, end_date, limit, offset | Paginated list: id, report_type, period_start, period_end, validation_status, generated_at |
| GET /api/reports/{report_id} | GET | report_id (path) | Full report including report_data JSONB |
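
One way the list endpoint might translate its optional filters into a parameterized query is sketched below. `build_reports_query` is a hypothetical helper; the filter semantics (`period_start >= start_date`, `period_end <= end_date`), the default `limit` of 20, and the ordering are assumptions. The `$n` placeholders follow the positional style asyncpg uses.

```python
from datetime import date
from typing import Optional

LIST_COLUMNS = "id, report_type, period_start, period_end, validation_status, generated_at"


def build_reports_query(
    report_type: Optional[str] = None,
    start_date: Optional[date] = None,
    end_date: Optional[date] = None,
    limit: int = 20,
    offset: int = 0,
) -> tuple[str, list]:
    """Build a SELECT with $n placeholders and the matching argument list."""
    clauses: list[str] = []
    args: list = []
    if report_type is not None:
        args.append(report_type)
        clauses.append(f"report_type = ${len(args)}")
    if start_date is not None:
        args.append(start_date)
        clauses.append(f"period_start >= ${len(args)}")
    if end_date is not None:
        args.append(end_date)
        clauses.append(f"period_end <= ${len(args)}")
    where = f" WHERE {' AND '.join(clauses)}" if clauses else ""
    # LIMIT and OFFSET always take the last two placeholder positions.
    args.extend([limit, offset])
    sql = (
        f"SELECT {LIST_COLUMNS} FROM trading_reports{where}"
        f" ORDER BY generated_at DESC"
        f" LIMIT ${len(args) - 1} OFFSET ${len(args)}"
    )
    return sql, args
```

Only fixed clause fragments are interpolated into the SQL string; user-supplied values travel separately in `args`, which avoids SQL injection.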

8. Frontend Hooks (added to frontend/src/api/hooks.ts)

export interface ReportListItem {
  id: string;
  report_type: string;
  period_start: string;
  period_end: string;
  validation_status: string;
  generated_at: string;
}

export interface ReportDetail extends ReportListItem {
  report_data: Record<string, unknown>;
  created_at: string;
}

export function useReports(params?: {
  report_type?: string;
  start_date?: string;
  end_date?: string;
  limit?: number;
  offset?: number;
}) {
  const qs = new URLSearchParams();
  if (params?.report_type) qs.set('report_type', params.report_type);
  if (params?.start_date) qs.set('start_date', params.start_date);
  if (params?.end_date) qs.set('end_date', params.end_date);
  if (params?.limit) qs.set('limit', String(params.limit));
  if (params?.offset) qs.set('offset', String(params.offset));
  const path = `/api/reports${qs.toString() ? '?' + qs : ''}`;
  return useGet<ReportListItem[]>(['reports', params], 'query', path);
}

export function useReport(id: string | undefined) {
  return useGet<ReportDetail>(
    ['report', id], 'query', `/api/reports/${id}`, !!id
  );
}

Data Models

Database Schema (Migration 038)

trading_reports

CREATE TABLE IF NOT EXISTS trading_reports (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    report_type VARCHAR(20) NOT NULL,
    period_start DATE NOT NULL,
    period_end DATE NOT NULL,
    report_data JSONB NOT NULL,
    validation_status VARCHAR(20) NOT NULL DEFAULT 'passed',
    generated_at TIMESTAMPTZ NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    CONSTRAINT uq_trading_reports_period UNIQUE (report_type, period_start, period_end),
    CONSTRAINT chk_report_type CHECK (report_type IN ('daily', 'weekly'))
);

CREATE INDEX IF NOT EXISTS idx_trading_reports_type ON trading_reports(report_type);
CREATE INDEX IF NOT EXISTS idx_trading_reports_period ON trading_reports(period_start, period_end);
CREATE INDEX IF NOT EXISTS idx_trading_reports_generated ON trading_reports(generated_at DESC);

Report Summarizer Agent Seed

INSERT INTO ai_agents (name, slug, purpose, model_provider, model_name, system_prompt, prompt_version, schema_version, temperature, max_tokens, timeout_seconds, max_retries, source)
SELECT * FROM (VALUES
    (
        'Report Summarizer',
        'report-summarizer',
        'Generates concise natural-language summaries of trading performance report sections. Processes chunked data within the 8k-token context window.',
        'ollama',
        'qwen3.5:9b-fast',
        E'You are a concise financial performance analyst. You summarize trading performance data into clear, professional prose.\n\nSTRICT RULES:\n1. Do NOT fabricate any data not present in the input.\n2. Do NOT add opinions, predictions, or recommendations.\n3. Keep each summary under 200 words.\n4. Highlight notable trends, outliers, and changes from prior periods.\n5. Use precise numbers from the input data.\n6. Use a neutral, professional tone.\n7. Return ONLY the summary text. No JSON, no markdown, no commentary.',
        'report-summarizer-v1',
        '1.0.0',
        0.0,
        1024,
        60,
        2,
        'system'
    )
) AS v(name, slug, purpose, model_provider, model_name, system_prompt, prompt_version, schema_version, temperature, max_tokens, timeout_seconds, max_retries, source)
WHERE NOT EXISTS (SELECT 1 FROM ai_agents WHERE slug = 'report-summarizer');

Report JSONB Structure

The report_data column stores a JSON object matching the ReportData Pydantic model:

{
  "pnl": {
    "realized_pnl": 125.50,
    "unrealized_pnl": -30.20,
    "daily_return": 0.012,
    "cumulative_return": 0.085,
    "win_count": 8,
    "loss_count": 3,
    "win_rate": 0.727,
    "profit_factor": 2.15,
    "sharpe_ratio": 1.42,
    "summary": "AI-generated summary...",
    "validation_warnings": []
  },
  "recommendation_accuracy": {
    "total_evaluated": 15,
    "act_count": 8,
    "skip_count": 7,
    "acted_win_rate": 0.75,
    "avg_confidence_acted": 0.72,
    "avg_confidence_skipped": 0.48,
    "summary": "AI-generated summary...",
    "validation_warnings": []
  },
  "position_performance": {
    "positions": [
      {
        "ticker": "AAPL",
        "entry_price": 185.50,
        "current_or_exit_price": 192.30,
        "pnl": 68.00,
        "pnl_pct": 3.66,
        "hold_duration_hours": 72.5,
        "status": "open"
      }
    ],
    "summary": "AI-generated summary..."
  },
  "risk_metrics": {
    "current_risk_tier": "moderate",
    "portfolio_heat": 0.12,
    "max_drawdown": 0.08,
    "current_drawdown_pct": 0.03,
    "reserve_pool_balance": 450.00,
    "circuit_breaker_event_count": 1,
    "summary": "AI-generated summary..."
  },
  "model_quality": {
    "windows": [
      {
        "lookback": "7d",
        "win_rate": 0.65,
        "directional_accuracy": 0.62,
        "information_coefficient": 0.08,
        "calibration_error": 0.12,
        "brier_score": 0.22
      }
    ],
    "summary": "AI-generated summary...",
    "validation_warnings": []
  },
  "executive_summary": "AI-generated executive summary...",
  "validation_status": "passed",
  "generated_at": "2025-01-15T21:30:00Z",
  "period_start": "2025-01-15",
  "period_end": "2025-01-15",
  "report_type": "daily"
}

Correctness Properties

A property is a characteristic or behavior that should hold true across all valid executions of a system — essentially, a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.

The following properties were derived from the acceptance criteria through systematic prework analysis. After reflection, 5 unique properties remain. Report section structure checks (3.1-3.5) are subsumed by the round-trip property: if a ReportData object survives serialization and deserialization, its structure is correct by construction (Pydantic enforces required fields). Validation status computation (4.4) is subsumed by the discrepancy detection property. ISO 8601 datetime formatting (8.4) is verified as part of the round-trip property since Pydantic's JSON serialization uses ISO 8601 by default and the round-trip would fail if datetimes were mangled.

Property 1: Chunking Round-Trip and Size Constraint

For any input string, splitting it into chunks with a maximum size limit SHALL produce chunks where (a) every chunk is ≤ the size limit in characters, (b) no chunk is empty (except when the input itself is empty, which produces exactly one empty chunk), and (c) concatenating all chunks in order reconstructs the original input string.

Validates: Requirements 2.2
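
A possible `chunk_data` that satisfies all three clauses of Property 1 is sketched below. It prefers newline boundaries and keeps the line endings inside the chunks so that concatenation is lossless. The hard split of lines longer than the limit is an assumption added here so that clause (a) holds for arbitrary input.

```python
def chunk_data(serialized: str, max_chars: int = 6000) -> list[str]:
    """Split text into chunks of at most max_chars, preferring line boundaries."""
    if max_chars < 1:
        raise ValueError("max_chars must be positive")
    if not serialized:
        return [""]  # empty input yields exactly one empty chunk

    chunks: list[str] = []
    current = ""
    for line in serialized.splitlines(keepends=True):
        # A single line longer than the limit is hard-split (assumption).
        while len(line) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:max_chars])
            line = line[max_chars:]
        # Start a new chunk when this line would overflow the current one.
        if len(current) + len(line) > max_chars:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks
```

For example, `chunk_data("a\nb\nc\n", 4)` returns `["a\nb\n", "c\n"]`: the first two lines fill the four-character budget exactly, and the third starts a new chunk.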

Property 2: Report Serialization Round-Trip

For any valid ReportData object (with valid P&L, recommendation accuracy, position performance, risk metrics, and model quality sections), serializing to JSON and then deserializing back SHALL produce a ReportData object equivalent to the original. All datetime fields in the serialized JSON SHALL be in ISO 8601 format.

Validates: Requirements 8.1, 8.2, 8.3, 8.4

Property 3: Validation Discrepancy Detection Correctness

For any pair of computed metric value and snapshot metric value (both finite, non-negative floats), the validation function SHALL produce a warning if and only if the percentage difference exceeds 5%. The percentage difference SHALL be computed as |computed - snapshot| / snapshot * 100 when snapshot > 0, and SHALL flag any non-zero computed value when snapshot is 0.

Validates: Requirements 4.1, 4.2, 4.3, 4.4
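
The formula in Property 3 translates directly into code. The sketch below uses hypothetical helper names and assumes both inputs are finite and non-negative, as the property requires.

```python
DISCREPANCY_THRESHOLD_PCT = 5.0


def pct_difference(computed: float, snapshot: float) -> float:
    """|computed - snapshot| / snapshot * 100, with the snapshot == 0 rule."""
    if snapshot > 0:
        return abs(computed - snapshot) / snapshot * 100
    # snapshot == 0: any non-zero computed value counts as a 100% difference.
    return 100.0 if computed != 0 else 0.0


def is_discrepancy(
    computed: float,
    snapshot: float,
    threshold_pct: float = DISCREPANCY_THRESHOLD_PCT,
) -> bool:
    """True if and only if the percentage difference exceeds the threshold."""
    return pct_difference(computed, snapshot) > threshold_pct
```

The comparison is strict, so a difference of exactly 5% is not flagged in exact arithmetic. With floating-point inputs, rounding can push a borderline value to either side, which is worth keeping in mind when writing boundary tests.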

Property 4: Recommendation Accuracy Aggregation

For any non-empty list of trading decisions with associated prediction outcomes (each having a boolean direction_correct, boolean profitable, and float excess_return_vs_spy), the computed win rate SHALL equal the count of profitable outcomes divided by total outcomes, the directional accuracy SHALL equal the count of direction-correct outcomes divided by total outcomes, and the average excess return SHALL equal the arithmetic mean of all excess_return_vs_spy values. The two rates SHALL be in [0.0, 1.0], and the average SHALL be finite.

Validates: Requirements 1.4
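
The three aggregates in Property 4 can be sketched as below. The `Outcome` dataclass and `aggregate_outcomes` are hypothetical stand-ins for rows of prediction_outcomes and the real section builder.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    """Stand-in for one prediction_outcomes row (illustrative only)."""
    direction_correct: bool
    profitable: bool
    excess_return_vs_spy: float


def aggregate_outcomes(outcomes: list[Outcome]) -> dict[str, float]:
    """Compute win rate, directional accuracy, and mean excess return."""
    if not outcomes:
        raise ValueError("outcomes must be non-empty")
    total = len(outcomes)
    return {
        # bool is an int subclass, so summing booleans counts the True values.
        "win_rate": sum(o.profitable for o in outcomes) / total,
        "directional_accuracy": sum(o.direction_correct for o in outcomes) / total,
        "avg_excess_return": sum(o.excess_return_vs_spy for o in outcomes) / total,
    }
```

Both rates are counts divided by the total, so they fall in [0.0, 1.0] by construction; the mean is finite whenever every input value is finite.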

Property 5: Portfolio Period-Over-Period Delta Computation

For any two valid portfolio snapshots (current and previous) with non-negative portfolio_value, active_pool, reserve_pool, and finite cumulative_return, the period-over-period deltas SHALL equal (current - previous) for each field. When no previous snapshot exists, the deltas SHALL be zero.

Validates: Requirements 1.3
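
Property 5 reduces to a field-wise subtraction. In this sketch the snapshots are plain dicts keyed by the four field names from the property; `compute_deltas` and `DELTA_FIELDS` are illustrative names.

```python
from typing import Optional

DELTA_FIELDS = ("portfolio_value", "active_pool", "reserve_pool", "cumulative_return")


def compute_deltas(current: dict, previous: Optional[dict]) -> dict[str, float]:
    """Return current - previous per field, or zeros without a previous snapshot."""
    if previous is None:
        return {field: 0.0 for field in DELTA_FIELDS}
    return {field: current[field] - previous[field] for field in DELTA_FIELDS}
```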


Error Handling

Data Collection Failures

| Scenario | Handling |
|---|---|
| No trading_decisions for period | Generate zero-activity report with note "No trading activity during this period" |
| No portfolio_snapshot for period | Use most recent snapshot before period_start; if none exists, use zero values |
| No prediction_outcomes for period | Skip recommendation accuracy validation; set validation_warnings noting missing data |
| No model_metric_snapshots for period | Model quality section shows NULL values for all metrics |
| Database connection failure during collection | Propagate error to job processor for retry |

AI Summarization Failures

| Scenario | Handling |
|---|---|
| LLM timeout (>60s) | Retry up to max_retries (from agent config, default 2) |
| LLM returns empty response | Treat as failure, retry |
| LLM returns response > 200 words | Truncate to 200 words at sentence boundary |
| All LLM retries exhausted | Fall back to deterministic template summary |
| AgentConfigResolver returns None (agent not found) | Log error, use deterministic summary for all sections |
| Chunk merge LLM call fails | Use concatenation of chunk summaries (joined with newlines) |
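
The "truncate to 200 words at sentence boundary" rule could be implemented along these lines. `truncate_to_words` is a hypothetical helper, and its sentence detection is a simple heuristic: `.`, `!`, or `?` followed by whitespace or the end of the text. Abbreviations such as "U.S." would be treated as sentence ends.

```python
import re

# Sentence-ending punctuation followed by whitespace or end of text.
SENTENCE_END = re.compile(r"[.!?](?=\s|$)")


def truncate_to_words(text: str, max_words: int = 200) -> str:
    """Clip text to max_words, then cut back to the last complete sentence."""
    words = text.split()
    if len(words) <= max_words:
        return text.strip()
    clipped = " ".join(words[:max_words])
    ends = [match.end() for match in SENTENCE_END.finditer(clipped)]
    if ends:
        return clipped[: ends[-1]]
    # No sentence boundary inside the limit: keep the hard word cut.
    return clipped
```

Because the lookahead requires whitespace or end of text after the punctuation, a decimal such as `1.42` is not mistaken for a sentence end.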

Validation Edge Cases

| Scenario | Handling |
|---|---|
| Snapshot value is 0 and computed value is non-zero | Flag as warning with pct_difference = 100.0 |
| Both snapshot and computed values are 0 | No warning (0% difference) |
| Snapshot value is NULL | Skip validation for that metric, no warning |
| Computed value is NaN or infinity | Replace with 0.0, log warning |
| No prediction_outcomes to cross-reference | Skip recommendation accuracy validation entirely |
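
The NaN/infinity row can be handled with a small guard applied before validation. `sanitize_metric` and the logger name are hypothetical; `math.isfinite` is false for NaN and for both infinities.

```python
import logging
import math

logger = logging.getLogger("reporting.validator")  # illustrative logger name


def sanitize_metric(field_name: str, value: float) -> float:
    """Replace NaN or infinite metric values with 0.0 and log a warning."""
    if math.isfinite(value):
        return value
    logger.warning("Non-finite value for %s (%r); replaced with 0.0", field_name, value)
    return 0.0
```

Sanitizing first also keeps the stored report valid JSON, since NaN and Infinity are not part of the JSON standard.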

Report Storage Failures

| Scenario | Handling |
|---|---|
| Unique constraint violation on insert | Use ON CONFLICT DO UPDATE to upsert |
| JSONB serialization failure | Log error with report structure, propagate to job processor |
| Report exceeds PostgreSQL JSONB size limit (~255 MB) | Extremely unlikely given report structure; log error if it occurs |

Job Processing Failures

| Scenario | Handling |
|---|---|
| Job fails on first attempt | Retry with exponential backoff: 30s, 60s, 120s |
| Job fails after 3 retries | Mark job as failed, log error with full context |
| Duplicate job submitted for same period | Reject with log message, return without error |
| Redis connection failure | Job stays in queue, picked up on reconnection |
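
The backoff schedule in the table can be sketched as follows. This reading assumes one initial attempt followed by up to three retries; `backoff_delays` and `run_with_retries` are hypothetical helpers, and `sleep` is injectable so the logic can be exercised without waiting.

```python
import asyncio
from typing import Awaitable, Callable, Sequence


def backoff_delays(base_seconds: float = 30.0, retries: int = 3) -> list[float]:
    """Exponential schedule: base, 2 * base, 4 * base, ..."""
    return [base_seconds * 2 ** i for i in range(retries)]


async def run_with_retries(
    job: Callable[[], Awaitable[None]],
    delays: Sequence[float],
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> bool:
    """Run job once, then retry after each delay. Return True on success."""
    for delay in [0.0, *delays]:
        if delay:
            await sleep(delay)
        try:
            await job()
            return True
        except Exception:
            # A real worker would log the failure with job context here.
            continue
    return False  # all attempts failed: caller marks the job as failed
```

`backoff_delays()` returns `[30.0, 60.0, 120.0]`, matching the schedule in the table. Duplicate rejection is not shown; it would need a key derived from report_type and period, checked before the job runs.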

Testing Strategy

Property-Based Tests (Hypothesis)

Property-based tests use the Hypothesis library with @settings(max_examples=100). Test files are prefixed test_pbt_* per project convention.

| Property | Test File | What It Tests |
|---|---|---|
| Property 1: Chunking Round-Trip | tests/test_pbt_report_chunking.py | chunk_data() preserves content and respects size limits |
| Property 2: Report Serialization Round-Trip | tests/test_pbt_report_serialization.py | ReportData.model_dump_json() → ReportData.model_validate_json() round-trip |
| Property 3: Validation Discrepancy Detection | tests/test_pbt_report_validation.py | Discrepancy detection correctly flags >5% differences |
| Property 4: Recommendation Accuracy Aggregation | tests/test_pbt_report_sections.py | build_recommendation_accuracy_section() computes correct aggregates |
| Property 5: Portfolio Delta Computation | tests/test_pbt_report_sections.py | build_pnl_section() computes correct period-over-period deltas |

Each property test is tagged with a comment referencing the design property:

# Feature: trading-feedback-engine, Property 1: Chunking round-trip and size constraint

Unit Tests (pytest)

| Test File | Coverage |
|---|---|
| tests/test_report_sections.py | Section builders with known inputs, edge cases (empty data, single position, zero-activity) |
| tests/test_report_validator.py | Specific discrepancy scenarios, boundary cases (exactly 5%), NULL snapshot values |
| tests/test_report_summarizer.py | Deterministic fallback summary, chunk splitting edge cases (empty input, single char) |
| tests/test_report_models.py | Pydantic model validation, enum constraints, default values |
| tests/test_report_generator.py | Orchestration with mocked dependencies, zero-activity report, upsert behavior |

Integration Tests

| Test File | Coverage |
|---|---|
| tests/test_report_api.py | API endpoints with seeded database, pagination, filtering by report_type and date range |
| tests/test_report_storage.py | Store/retrieve round-trip against real asyncpg pool, upsert behavior, unique constraint |

Frontend Tests (Vitest)

| Test File | Coverage |
|---|---|
| frontend/src/test/reports.test.ts | useReports and useReport hooks with MSW mocks, loading/error states |

Test Configuration

  • Python PBT: Hypothesis with @settings(max_examples=100), files prefixed test_pbt_*
  • Python unit/integration: pytest with pytest-asyncio for async code
  • Frontend: Vitest with MSW for deterministic API mocking
  • Lint: ruff check services/ before all commits
  • CI: Woodpecker runs all tests automatically on push to Gitea