fix(extractor): streaming with guardrails + catalyst_type normalization

- Switch Ollama calls from non-streaming to streaming with early termination
- Add loop detection, max token limit, and stall timeout guards
- Add catalyst_type alias normalizer to handle model hallucinations
- Add explicit enum values in extraction prompt for catalyst_type
- Add streaming config knobs to OllamaConfig
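
A minimal sketch of how the three guards listed above could wrap a streamed response. This is not the code from this commit: `consume_stream`, its parameter names, and its default values are hypothetical, each chunk is counted as roughly one token, and the stall check only measures the gap after a chunk arrives (a real client would also need a read timeout on the socket).

```python
import time
from typing import Callable, Iterable, Tuple


def consume_stream(
    chunks: Iterable[str],
    max_tokens: int = 2048,          # hypothetical default
    stall_timeout_s: float = 30.0,   # hypothetical default
    loop_window: int = 20,           # hypothetical default
    loop_repeats: int = 3,           # hypothetical default
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[str, str]:
    """Collect streamed chunks, stopping early when a guard trips.

    Returns (text, reason), where reason is one of
    "done", "max_tokens", "stall", or "loop".
    """
    parts = []
    last_seen = clock()
    for count, chunk in enumerate(chunks, start=1):
        now = clock()
        # Stall guard: the gap since the previous chunk was too long.
        if now - last_seen > stall_timeout_s:
            return "".join(parts), "stall"
        last_seen = now
        parts.append(chunk)
        # Token guard: each chunk is treated as roughly one token.
        if count >= max_tokens:
            return "".join(parts), "max_tokens"
        # Loop guard: the last `loop_window` characters repeat
        # `loop_repeats` times back to back at the end of the output.
        text = "".join(parts)
        tail = text[-loop_window:]
        if len(text) >= loop_window * loop_repeats and text.endswith(tail * loop_repeats):
            return text, "loop"
    return "".join(parts), "done"
```

The caller can treat any reason other than "done" as a signal to discard or retry the extraction. Re-joining the text on every chunk keeps the sketch short; a production version would likely track the tail incrementally.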
Celes Renata
2026-04-12 15:28:20 -07:00
parent 527be42f82
commit cd782d1552
4 changed files with 116 additions and 14 deletions
@@ -114,6 +114,8 @@ Fill these fields:
For each company entry fill: ticker, company_name, relevance (0-1), sentiment, impact_score (0-1), impact_horizon, catalyst_type, key_facts (list), risks (list), evidence_spans (verbatim quotes from text).
catalyst_type MUST be exactly one of: earnings, product, legal, macro, supply_chain, m_and_a, rating_change, other. Use "other" if none of the specific categories fit.
--- DOCUMENT TEXT ---
{document_text}
--- END DOCUMENT TEXT ---"""
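
The prompt line above restricts catalyst_type to eight values; the alias normalizer mentioned in the commit message is the fallback for when the model ignores that. A rough sketch of what such a normalizer might look like follows. The allowed set is taken from the prompt, but `normalize_catalyst_type` and every entry in the alias table are assumptions made for illustration, not the mapping used in this commit.

```python
from typing import Any

# Allowed values, copied from the extraction prompt.
ALLOWED = frozenset({
    "earnings", "product", "legal", "macro",
    "supply_chain", "m_and_a", "rating_change", "other",
})

# Hypothetical aliases; the real table in the commit may differ.
ALIASES = {
    "m&a": "m_and_a",
    "merger": "m_and_a",
    "acquisition": "m_and_a",
    "lawsuit": "legal",
    "upgrade": "rating_change",
    "downgrade": "rating_change",
}


def normalize_catalyst_type(raw: Any) -> str:
    """Map a model-produced value onto the allowed enum, else "other"."""
    if not isinstance(raw, str):
        return "other"
    # Canonicalize case, surrounding whitespace, and separators.
    key = raw.strip().lower().replace("-", "_").replace(" ", "_")
    if key in ALLOWED:
        return key
    # Unknown values fall back to "other", matching the prompt's rule.
    return ALIASES.get(key, "other")
```

For example, `normalize_catalyst_type("Supply Chain")` yields `"supply_chain"`, and an unrecognized value such as `"weather"` falls back to `"other"`.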