phase 14-15: docker build validation and helm deployment
@@ -0,0 +1,16 @@
# Replay Dataset for Deterministic Extraction Testing

This dataset contains archived document fixtures used to verify that the
extraction pipeline produces consistent, schema-valid results across code
changes.

Each fixture is a JSON file containing:
- `document_id`: stable identifier for the fixture
- `document_type`: one of `article`, `filing`, `transcript`, or `press_release`
- `document_text`: normalized text as it would arrive from the parser
- `known_tickers`: ticker hints passed to the extraction prompt
- `expected_extraction`: the expected extraction result (schema-valid)
- `metadata`: fixture provenance (`created_at`, `description`, `schema_version`)

The replay runner (`tests/test_replay_extraction.py`) loads these fixtures,
validates the expected outputs against the current extraction schema, and
optionally runs them through a live Ollama instance for end-to-end checks.
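As a rough illustration of the fixture layout described above, the sketch below loads every `*.json` file in a directory and checks the top-level shape only. It is not the code in `tests/test_replay_extraction.py`. The helper names `check_fixture` and `load_fixtures`, the sample values in `SAMPLE`, and the assumption that `known_tickers` is a list are all hypothetical. The contents of `expected_extraction` depend on the real extraction schema, which is not shown here, so the sketch leaves it empty and does not validate it.

```python
import json
import tempfile
from pathlib import Path

# Top-level keys and allowed values as listed in this README.
REQUIRED_KEYS = {
    "document_id", "document_type", "document_text",
    "known_tickers", "expected_extraction", "metadata",
}
DOCUMENT_TYPES = {"article", "filing", "transcript", "press_release"}
METADATA_KEYS = {"created_at", "description", "schema_version"}

# Hypothetical fixture; every value here is made up for illustration.
SAMPLE = {
    "document_id": "example-press-release-001",
    "document_type": "press_release",
    "document_text": "Example Corp announced quarterly results today.",
    "known_tickers": ["EXMP"],  # assumed to be a list of strings
    "expected_extraction": {},  # real shape is defined by the extraction schema
    "metadata": {
        "created_at": "2024-01-01T00:00:00Z",
        "description": "Hypothetical fixture for illustration",
        "schema_version": "1",
    },
}


def check_fixture(fixture: dict) -> list:
    """Return a list of shape problems; an empty list means none were found."""
    problems = []
    missing = REQUIRED_KEYS - set(fixture)
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if fixture.get("document_type") not in DOCUMENT_TYPES:
        problems.append(f"unknown document_type: {fixture.get('document_type')!r}")
    metadata = fixture.get("metadata")
    if isinstance(metadata, dict):
        missing_meta = METADATA_KEYS - set(metadata)
        if missing_meta:
            problems.append(f"missing metadata keys: {sorted(missing_meta)}")
    elif "metadata" in fixture:
        problems.append("metadata must be an object")
    return problems


def load_fixtures(directory: Path) -> list:
    """Load every *.json file in the directory, in sorted filename order."""
    return [
        json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(directory.glob("*.json"))
    ]


if __name__ == "__main__":
    # Write the sample to a temporary directory, then load and check it.
    with tempfile.TemporaryDirectory() as tmp:
        fixture_path = Path(tmp) / "example.json"
        fixture_path.write_text(json.dumps(SAMPLE, indent=2), encoding="utf-8")
        for fixture in load_fixtures(Path(tmp)):
            print(fixture["document_id"], check_fixture(fixture))
```

Loading in sorted filename order keeps the replay order stable between runs, which fits the goal of deterministic testing. A check of `expected_extraction` against the actual schema would slot into `check_fixture` once that schema is available.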