Celes Renata
00044af993
fix: switch to think=false with json-repair — 20x faster extraction
2026-04-15 02:54:39 +00:00
Celes Renata
4f2ae23d42
fix: set num_predict=16384 so model has token budget for thinking + content
2026-04-15 01:47:00 +00:00
Celes Renata
46b069a748
fix: switch to non-streaming Ollama calls — streaming breaks thinking mode
2026-04-15 01:19:17 +00:00
Celes Renata
ffe19eb23a
fix: handle empty ticker in MinIO storage paths, clean up debug log
2026-04-15 00:39:53 +00:00
Celes Renata
8b5b692d3c
fix: update stall timer during thinking phase to prevent premature stream abort
2026-04-15 00:06:49 +00:00
Celes Renata
01726af360
fix: remove think=false (Ollama bug #14645), bump max_tokens to 32k
2026-04-14 23:50:28 +00:00
Celes Renata
cd782d1552
fix(extractor): streaming with guardrails + catalyst_type normalization
- Switch Ollama calls from non-streaming to streaming with early termination
- Add loop detection, max token limit, and stall timeout guards
- Add catalyst_type alias normalizer to handle model hallucinations
- Add explicit enum values in extraction prompt for catalyst_type
- Add streaming config knobs to OllamaConfig
2026-04-12 15:28:20 -07:00
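Note on cd782d1552: the three guards named in the commit body (loop detection, max token limit, stall timeout) could look roughly like the sketch below. This is an illustration only, not the code from the commit. The names StreamGuards, is_looping, and consume_stream, and all default values, are hypothetical; the real OllamaConfig fields are not shown in this log. It assumes each streamed chunk arrives as a plain string.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass
class StreamGuards:
    # Hypothetical knobs; the real OllamaConfig field names are not in the log.
    max_tokens: int = 4096         # abort after this many chunks
    stall_timeout_s: float = 30.0  # abort if no non-empty chunk for this long
    loop_window: int = 20          # length of the tail compared for repetition
    loop_repeats: int = 3          # consecutive repeats that count as a loop


def is_looping(text: str, window: int, repeats: int) -> bool:
    """True if the last `window` characters occur `repeats` times in a row."""
    if len(text) < window * repeats:
        return False
    tail = text[-window:]
    return text.endswith(tail * repeats)


def consume_stream(
    chunks: Iterable[str],
    guards: StreamGuards,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[str, str]:
    """Accumulate chunks and stop early when a guard trips.

    Returns (text_so_far, reason), where reason is one of
    "done", "stall", "max_tokens", or "loop".
    """
    text = ""
    last_progress = clock()
    for count, chunk in enumerate(chunks, start=1):
        now = clock()
        # The gap is only measured when a chunk arrives; a fully blocked
        # iterator would need a socket-level timeout as well.
        if now - last_progress > guards.stall_timeout_s:
            return text, "stall"
        if chunk:
            last_progress = now
        text += chunk
        if count >= guards.max_tokens:
            return text, "max_tokens"
        if is_looping(text, guards.loop_window, guards.loop_repeats):
            return text, "loop"
    return text, "done"
```

With this shape, a caller can branch on the returned reason, for example retrying or discarding the partial text when the reason is "loop" or "stall". The clock parameter is there so the stall guard can be tested without sleeping.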
Celes Renata
6e2f174b19
phase 17: disable qwen3.5 thinking mode (think:false) to reduce latency and improve structured output
2026-04-12 12:35:24 -07:00
Celes Renata
45f0c03639
phase 17: add request-level URL logging to OllamaClient for proxy debugging
2026-04-12 12:32:44 -07:00
Celes Renata
ce10afa034
phase 14-15: docker build validation and helm deployment
2026-04-11 11:59:45 -07:00