Workers (ingestion, parser, extractor, aggregation, recommendation,
broker, lake-publisher) now check the pipeline:enabled Redis flag on
each loop iteration and sleep when disabled.
The toggle endpoint flushes all pipeline queues on disable so queued
jobs don't resume when workers eventually check. Broker/trading queues
are excluded from flush to avoid dropping in-flight orders.
- Migration 026: update seed defaults from ollama to vllm/AxionML
- Migration 031: fix existing rows still on old ollama defaults
- Helm values: set OLLAMA_BASE_URL to cluster ollama endpoint (was empty)
- Extractor: guard against switching to ollama when base_url is empty
- OllamaClient: validate base_url on construction to fail fast
- LLMClient Protocol for provider-agnostic inference
- VLLMClient for OpenAI-compatible /v1/chat/completions API
- LLM client factory with provider routing (ollama/vllm)
- VLLMConfig with VLLM_* environment variable loading
- Updated extractor worker with health check and provider switching
- Updated event classifier to use LLMClient protocol
- Helm values for vLLM configuration
- 18 unit tests + 6 property-based tests
- Full backward compatibility preserved
_parse_classification_response receives raw model output (with thinking
tags, markdown fences, etc.) but was calling json.loads directly.
Now uses _strip_markdown_fences + _repair_json from the client module
before parsing, matching what _call_ollama does for extractions.
_call_ollama validates against the document extraction schema, which
doesn't match event classification output. The event classifier was
checking 'if attempt.error is None' before trying its own parsing,
so it never got to parse the valid event JSON — 956 consecutive
failures.
Now tries _parse_classification_response whenever raw_output exists,
regardless of the extraction validation error.
- Recommendation worker now resolves thesis-rewriter config from DB
and passes ollama_config to generate_recommendation. Thesis rewriting
is now active when the thesis-rewriter agent exists in ai_agents.
Refreshes config every 50 jobs.
- Event classifier now resolves its own config separately from the
document extractor via 'event-classifier' slug. Uses a separate
OllamaClient when the model differs from the extractor. Refreshes
alongside the extractor every 100 jobs.
- Document extractor was already wired (existing code).
- Added 8 unit tests for AgentConfigResolver covering: DB resolution,
variant override, not-found, DB errors, TTL caching, cache refresh,
and invalidation.
- Migration 026 and OllamaConfig now default to qwen3.5:9b instead of
llama3.1:8b. Existing deployments keep their current model (qwen3.5:9b-fast)
since the migration uses WHERE NOT EXISTS on slug.
- Event classifier system prompt expanded with macro-vs-company filtering:
explicitly instructs the model to NOT classify single-company news
(lawsuits, earnings, management changes, debt crises) as macro events.
Sets severity=low and confidence<0.3 for company-specific articles.
Reserves 'critical' severity for multi-country/global market events.
Prevents over-tagging event_types by requiring direct description.
- Updated test_system_prompt_is_concise threshold to accommodate the
expanded prompt (300 → 1000 chars).
When the LLM returns empty summary and no key facts, raise ValueError
so the retry logic kicks in instead of persisting an empty event.
Also strip whitespace from summary and filter empty key_facts entries.
Cleaned up 17 empty events from the database.
- Switch Ollama calls from non-streaming to streaming with early termination
- Add loop detection, max token limit, and stall timeout guards
- Add catalyst_type alias normalizer to handle model hallucinations
- Add explicit enum values in extraction prompt for catalyst_type
- Add streaming config knobs to OllamaConfig