# Replay Dataset for Deterministic Extraction Testing

Archived document fixtures used to verify that the extraction pipeline produces consistent, schema-valid results across code changes.

Each fixture is a JSON file containing:

- `document_id`: stable identifier for the fixture
- `document_type`: `article`, `filing`, `transcript`, or `press_release`
- `document_text`: normalized text as it would arrive from the parser
- `known_tickers`: ticker hints passed to the extraction prompt
- `expected_extraction`: the expected extraction result (schema-valid)
- `metadata`: fixture provenance info (`created_at`, `description`, `schema_version`)

The replay runner (`tests/test_replay_extraction.py`) loads these fixtures, validates the expected outputs against the current extraction schema, and optionally runs them through a live Ollama instance for end-to-end checks.
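
The fixture layout described above can be sanity-checked with a few lines of Python. What follows is a minimal sketch, not the code in `tests/test_replay_extraction.py`: the helper names `check_fixture` and `load_fixtures` are hypothetical, the example values are placeholders, and the shape of `expected_extraction` is left empty because it is defined by the extraction schema rather than by this document. The sketch only checks the top-level keys, the `document_type` values, and the `metadata` fields listed above.

```python
import json
from pathlib import Path

# Field names and document types are taken from the fixture description above.
# Function names and the example values below are hypothetical.
REQUIRED_KEYS = {
    "document_id",
    "document_type",
    "document_text",
    "known_tickers",
    "expected_extraction",
    "metadata",
}
DOCUMENT_TYPES = {"article", "filing", "transcript", "press_release"}
METADATA_KEYS = {"created_at", "description", "schema_version"}


def check_fixture(fixture: dict) -> list[str]:
    """Return structural problems; an empty list means the layout looks right."""
    problems = []
    missing = REQUIRED_KEYS - fixture.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if fixture.get("document_type") not in DOCUMENT_TYPES:
        problems.append(f"unknown document_type: {fixture.get('document_type')!r}")
    metadata = fixture.get("metadata")
    if isinstance(metadata, dict):
        missing_meta = METADATA_KEYS - metadata.keys()
        if missing_meta:
            problems.append(f"missing metadata fields: {sorted(missing_meta)}")
    elif "metadata" in fixture:
        problems.append("metadata is not a JSON object")
    return problems


def load_fixtures(directory: Path) -> dict[str, dict]:
    """Load every *.json file in `directory`, keyed by file name."""
    return {
        path.name: json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(directory.glob("*.json"))
    }


# Placeholder fixture for illustration only; real fixtures carry real documents.
EXAMPLE_FIXTURE = {
    "document_id": "example-article-001",
    "document_type": "article",
    "document_text": "Placeholder text.",
    "known_tickers": ["XYZ"],
    "expected_extraction": {},  # real shape comes from the extraction schema
    "metadata": {
        "created_at": "2024-01-01T00:00:00Z",
        "description": "Illustrative fixture",
        "schema_version": "1",
    },
}

if __name__ == "__main__":
    print(check_fixture(EXAMPLE_FIXTURE))
```

A check like this catches malformed fixtures early, but it does not replace validating `expected_extraction` against the current extraction schema, which is the replay runner's job.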