phase 14-15: docker build validation and helm deployment

2026-04-11 11:59:45 -07:00
parent 7394d241c9
commit ce10afa034
179 changed files with 32559 additions and 576 deletions
@@ -0,0 +1,16 @@
+# Replay Dataset for Deterministic Extraction Testing
+
+Archived document fixtures used to verify that the extraction pipeline
+produces consistent, schema-valid results across code changes.
+
+Each fixture is a JSON file containing:
+- `document_id`: stable identifier for the fixture
+- `document_type`: one of `article`, `filing`, `transcript`, or `press_release`
+- `document_text`: normalized text as it would arrive from the parser
+- `known_tickers`: ticker hints passed to the extraction prompt
+- `expected_extraction`: the expected extraction result (schema-valid)
+- `metadata`: fixture provenance (`created_at`, `description`, `schema_version`)
+
+The replay runner (`tests/test_replay_extraction.py`) loads these fixtures,
+validates the expected outputs against the current extraction schema, and
+optionally runs them through a live Ollama instance for end-to-end checks.
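
As a rough illustration of the fixture layout described above, the sketch below loads fixture files and checks that each one carries the six documented top-level keys and a recognized `document_type`. It is not the code in `tests/test_replay_extraction.py`: the names `REQUIRED_KEYS`, `DOCUMENT_TYPES`, `check_fixture`, and `load_fixtures` are hypothetical, as is the assumption that fixtures sit as `*.json` files directly in one directory. Validation of `expected_extraction` against the extraction schema and the optional Ollama run are left out.

```python
import json
from pathlib import Path

# Top-level keys listed in the README above.
REQUIRED_KEYS = {
    "document_id",
    "document_type",
    "document_text",
    "known_tickers",
    "expected_extraction",
    "metadata",
}

# Document types listed in the README above.
DOCUMENT_TYPES = {"article", "filing", "transcript", "press_release"}


def check_fixture(fixture: dict) -> list[str]:
    """Return a list of structural problems; an empty list means none were found.

    Hypothetical helper: it only checks the outer shape of a fixture,
    not the contents of `expected_extraction`.
    """
    problems = []
    missing = REQUIRED_KEYS - set(fixture)
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    doc_type = fixture.get("document_type")
    if doc_type not in DOCUMENT_TYPES:
        problems.append(f"unknown document_type: {doc_type!r}")
    return problems


def load_fixtures(fixture_dir: Path) -> list[dict]:
    """Load every *.json file in fixture_dir, sorted by name for a stable order.

    Assumes a flat directory of JSON files, which may not match the real layout.
    """
    return [
        json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(fixture_dir.glob("*.json"))
    ]
```

A test could call `load_fixtures` on the fixture directory and assert that `check_fixture` returns an empty list for each entry before moving on to schema validation.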