Commit Graph

11 Commits

Author SHA1 Message Date
Celes Renata aa67523acd fix: ensure JSON output instruction in system prompt override + retry on ValueError 2026-04-17 17:03:58 +00:00
Celes Renata 523d3ea749 fix: catch ValueError in classification retry loop + add debug logging for raw output 2026-04-17 17:00:32 +00:00
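The two commits above both concern retrying classification when parsing raises ValueError. A minimal sketch of that pattern follows. The names classify_with_retry, parse_json_object, call_model, and max_attempts are hypothetical and do not come from the repository.

```python
import json
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def parse_json_object(raw: str) -> dict:
    """Hypothetical parser: decode JSON and require a top-level object."""
    data = json.loads(raw)  # raises json.JSONDecodeError, a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return data


def classify_with_retry(
    call_model: Callable[[], str],
    parse: Callable[[str], dict],
    max_attempts: int = 3,
) -> dict:
    """Call the model and parse its output, retrying when parsing raises ValueError."""
    last_error: Optional[ValueError] = None
    for attempt in range(1, max_attempts + 1):
        raw = call_model()
        # Log the raw output so malformed responses can be inspected later.
        logger.debug("attempt %d raw output: %r", attempt, raw)
        try:
            return parse(raw)
        except ValueError as exc:
            last_error = exc
    raise ValueError(f"classification failed after {max_attempts} attempts") from last_error
```

Because json.JSONDecodeError subclasses ValueError, a single except clause covers both malformed JSON and validation failures raised by the parser itself.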
Celes Renata f054e97b5b fix: event classifier unwraps single-element list from model output
Model sometimes returns [{...}] instead of {...}. Now unwraps
single-element lists before parsing the event fields.
2026-04-17 16:44:57 +00:00
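The unwrapping described in this commit can be sketched as follows. This is an illustration under assumptions, not the repository's code: unwrap_single_element_list and parse_event are hypothetical names.

```python
import json
from typing import Any


def unwrap_single_element_list(payload: Any) -> Any:
    """Return the sole element of a one-item list; leave anything else unchanged."""
    if isinstance(payload, list) and len(payload) == 1:
        return payload[0]
    return payload


def parse_event(raw: str) -> dict:
    """Decode model output, accepting either {...} or [{...}]."""
    payload = unwrap_single_element_list(json.loads(raw))
    if not isinstance(payload, dict):
        # Lists with zero or several elements are still rejected.
        raise ValueError(f"expected a JSON object, got {type(payload).__name__}")
    return payload
```

Only the single-element case is unwrapped, so a response containing several objects still fails loudly instead of silently dropping data.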
Celes Renata 76ff7ae00a fix: ruff import sort order 2026-04-17 16:37:30 +00:00
Celes Renata 1394e6168b fix: event classifier now strips markdown fences and repairs JSON
_parse_classification_response receives raw model output (with thinking
tags, markdown fences, etc.) but was calling json.loads directly.
Now uses _strip_markdown_fences + _repair_json from the client module
before parsing, matching what _call_ollama does for extractions.
2026-04-17 16:35:13 +00:00
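A rough sketch of the cleanup step this commit describes is shown below. The repository's _strip_markdown_fences and _repair_json are not reproduced in the log, so clean_model_output is a hypothetical stand-in. It assumes thinking output is wrapped in <think>...</think> tags and limits "repair" to removing trailing commas.

```python
import json
import re

# Assumption: reasoning is wrapped in <think>...</think>; the real tag may differ.
_THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)
# "`{3}" matches a three-backtick Markdown fence, optionally followed by a language tag.
_FENCE_RE = re.compile(r"`{3}[\w-]*[ \t]*\n(.*?)`{3}", re.DOTALL)
# Matches a comma that directly precedes a closing brace or bracket.
_TRAILING_COMMA_RE = re.compile(r",\s*([}\]])")


def clean_model_output(raw: str) -> str:
    """Strip thinking tags and Markdown fences, then drop trailing commas."""
    text = _THINK_RE.sub("", raw).strip()
    match = _FENCE_RE.search(text)
    if match:
        text = match.group(1).strip()
    # Minimal repair only; this would also alter a ",}" that appears inside a string value.
    return _TRAILING_COMMA_RE.sub(r"\1", text)


def parse_cleaned(raw: str) -> dict:
    """Clean the raw output first, then hand it to json.loads."""
    return json.loads(clean_model_output(raw))
```

The point of the ordering is that json.loads only ever sees cleaned text, which is the behavior the commit message attributes to the extraction path.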
Celes Renata 759d868e3b fix: event classifier was blocked by extraction schema validation
_call_ollama validates against the document extraction schema, which
doesn't match event classification output. The event classifier was
checking 'if attempt.error is None' before trying its own parsing,
so it never got to parse the valid event JSON — 956 consecutive
failures.

Now tries _parse_classification_response whenever raw_output exists,
regardless of the extraction validation error.
2026-04-17 16:28:39 +00:00
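The control-flow change in this commit can be illustrated roughly as follows. The Attempt dataclass and both functions are hypothetical stand-ins. Only the attribute names error and raw_output are taken from the commit message.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Attempt:
    """Hypothetical stand-in for the result object returned by _call_ollama."""
    raw_output: Optional[str]
    error: Optional[str]


def parse_classification(raw: str) -> dict:
    """Stand-in for _parse_classification_response; here it only decodes JSON."""
    return json.loads(raw)


def classify(attempt: Attempt) -> Optional[dict]:
    # Before the fix (per the commit message), parsing was gated on
    # `attempt.error is None`, so an extraction-schema error blocked it.
    # After the fix, parsing runs whenever raw output exists.
    if attempt.raw_output:
        return parse_classification(attempt.raw_output)
    return None
```

With this gate, an attempt that carries an extraction validation error but valid event JSON is still parsed, while an attempt with no output at all is skipped.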
Celes Renata c501ccea40 fix: default model to qwen3.5:9b + improve event classifier prompt
- Migration 026 and OllamaConfig now default to qwen3.5:9b instead of
  llama3.1:8b. Existing deployments keep their current model (qwen3.5:9b-fast)
  since the migration uses WHERE NOT EXISTS on slug.

- Event classifier system prompt expanded with macro-vs-company filtering:
  explicitly instructs the model to NOT classify single-company news
  (lawsuits, earnings, management changes, debt crises) as macro events.
  Sets severity=low and confidence<0.3 for company-specific articles.
  Reserves 'critical' severity for multi-country/global market events.
  Prevents over-tagging event_types by requiring direct description.

- Updated test_system_prompt_is_concise threshold to accommodate the
  expanded prompt (300 → 1000 chars).
2026-04-17 02:53:38 +00:00
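The WHERE NOT EXISTS behavior mentioned in the first bullet can be sketched like this. The table name llm_config, its columns, and the slug value "default" are assumptions for illustration, and sqlite3 is used only to keep the example self-contained. The project's actual database and schema are not shown in the log.

```python
import sqlite3

# Insert the default row only if no row with the same slug exists yet.
SEED_SQL = """
INSERT INTO llm_config (slug, model)
SELECT :slug, :model
WHERE NOT EXISTS (SELECT 1 FROM llm_config WHERE slug = :slug)
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the hypothetical config table."""
    conn.execute("CREATE TABLE llm_config (slug TEXT PRIMARY KEY, model TEXT NOT NULL)")


def seed_default_model(conn: sqlite3.Connection, slug: str, model: str) -> None:
    """Seed a default model without overwriting an existing row for the slug."""
    conn.execute(SEED_SQL, {"slug": slug, "model": model})
    conn.commit()


def current_model(conn: sqlite3.Connection, slug: str):
    """Return the configured model for a slug, or None if absent."""
    row = conn.execute("SELECT model FROM llm_config WHERE slug = ?", (slug,)).fetchone()
    return row[0] if row else None
```

On a fresh database the seed inserts qwen3.5:9b. On a database that already has a row for the slug, the statement inserts nothing, so the existing model stays in place.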
Celes Renata 693d9e0d60 fix: reduce LLM timeouts — truncate docs to 8k/6k chars, cut num_predict 16k→4k, tighten prompts, trim anti-hallucination rules 2026-04-16 18:56:11 +00:00
Celes Renata 3ff910433f fix: reject empty LLM classifications for global events
When the LLM returns empty summary and no key facts, raise ValueError
so the retry logic kicks in instead of persisting an empty event.
Also strip whitespace from summary and filter empty key_facts entries.

Cleaned up 17 empty events from the database.
2026-04-15 19:46:31 +00:00
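The validation described in this commit might look something like the sketch below. The function name normalize_classification and the exact field layout are assumptions. The commit message only names the summary and key_facts fields.

```python
def normalize_classification(data: dict) -> dict:
    """Strip the summary, drop blank key facts, and reject fully empty results."""
    summary = str(data.get("summary") or "").strip()
    key_facts = [
        fact.strip()
        for fact in (data.get("key_facts") or [])
        if isinstance(fact, str) and fact.strip()
    ]
    if not summary and not key_facts:
        # Raising ValueError lets the caller's retry logic run
        # instead of persisting an empty event.
        raise ValueError("empty classification: no summary and no key facts")
    return {**data, "summary": summary, "key_facts": key_facts}
```

Normalizing before the emptiness check means a summary made only of whitespace counts as empty.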
Celes Renata d8ea58104c fix: lint errors (import sorting, unused vars) 2026-04-14 19:48:19 +00:00
Celes Renata f7a11d14ea feat: competitive intelligence & historical pattern matching layer 2026-04-14 19:42:48 +00:00