# Tasks ## Task 1: LLM Client Protocol and VLLMConfig - [x] 1.1 Create `services/shared/llm_protocol.py` with `LLMClient` Protocol defining `call_llm(prompts, json_schema, document_text) -> ExtractionAttempt` and `close()` methods - [x] 1.2 Add `VLLMConfig` dataclass to `services/shared/config.py` with fields: `base_url`, `model`, `timeout`, `max_retries`, `retry_base_delay`, `retry_max_delay`, `retry_backoff_multiplier`, `max_tokens`, `temperature`, `api_key` - [x] 1.3 Add `vllm: VLLMConfig` field to `AppConfig` dataclass - [x] 1.4 Add `VLLM_*` environment variable loading to `load_config()` function - [x] 1.5 Add public `call_llm()` method to `OllamaClient` in `services/extractor/client.py` that delegates to `_call_ollama()` ## Task 2: VLLMClient Implementation - [x] 2.1 Create `services/extractor/vllm_client.py` with `VLLMClient` class that satisfies the `LLMClient` protocol - [x] 2.2 Implement `call_llm()` method that sends POST to `/v1/chat/completions` with OpenAI-compatible payload (`model`, `messages`, `max_tokens`, `temperature`, `response_format`) - [x] 2.3 Implement response parsing: extract content from `choices[0].message.content`, apply `_strip_markdown_fences()` and `_repair_json()` - [x] 2.4 Implement error handling: map timeout → `timeout`, HTTP errors → `http_{code}`, connection errors → `connection_error: {details}`, empty response → `empty_model_response` - [x] 2.5 Implement `close()` method to release the underlying `httpx.AsyncClient` - [x] 2.6 Implement `check_vllm_health(base_url, timeout=10.0)` async function that GETs `/v1/models` and returns bool ## Task 3: LLM Client Factory - [x] 3.1 Create `services/extractor/llm_factory.py` with `build_llm_client()` function that returns `OllamaClient` or `VLLMClient` based on resolved `model_provider` - [x] 3.2 Implement `build_config_from_resolved()` function that creates provider-specific config from `ResolvedAgentConfig` and base configs - [x] 3.3 Handle unknown provider values: log warning and fall back to `OllamaClient` ## Task 4: Update Extractor Worker for Provider Abstraction - [x] 4.1 Update `services/extractor/main.py` to import and use `build_llm_client()` from the factory instead of directly constructing `OllamaClient` - [x] 4.2 Replace `_build_ollama_config_from_resolved()` usage with the factory's `build_config_from_resolved()` for both extractor and classifier clients - [x] 4.3 Add vLLM health check call at startup when resolved config specifies `model_provider = "vllm"`, with fallback to Ollama on failure - [x] 4.4 Update config refresh logic (every 100 jobs) to detect provider changes, close old client, and construct new client via factory - [x] 4.5 Add INFO-level logging for provider switches including old/new provider, model name, and variant ID ## Task 5: Update Event Classifier for Provider Abstraction - [x] 5.1 Update `classify_global_event()` in `services/extractor/event_classifier.py` to accept `LLMClient` protocol type instead of `Any` for the client parameter - [x] 5.2 Replace `ollama_client._call_ollama()` calls with `client.call_llm()` calls - [x] 5.3 Update `ModelMetadata.provider` assignment to use the actual provider string from the client (detect from config type or pass explicitly) - [x] 5.4 Update retry logic to use client config attributes instead of accessing `ollama_client._base_delay` and `ollama_client._backoff_multiplier` directly ## Task 6: Helm Configuration - [x] 6.1 Add `VLLM_BASE_URL`, `VLLM_MODEL`, `VLLM_TIMEOUT`, `VLLM_MAX_RETRIES`, `VLLM_TEMPERATURE`, and `VLLM_API_KEY` entries to the `config:` section in `infra/helm/stonks-oracle/values.yaml` ## Task 7: Unit Tests for VLLMClient - [x] 7.1 Create `tests/test_vllm_client.py` with test for VLLMClient sending correct payload to `/v1/chat/completions` using mock httpx transport - [x] 7.2 Add test for VLLMClient extracting content from `choices[0].message.content` - [x] 7.3 Add test for VLLMClient handling empty choices array returning `empty_model_response` error - [x] 7.4 Add test for VLLMClient handling HTTP timeout returning `timeout` error - [x] 7.5 Add test for VLLMClient handling HTTP 500 returning `http_500` retryable error - [x] 7.6 Add test for VLLMClient handling HTTP 400 returning `http_400` non-retryable error - [x] 7.7 Add test for VLLMClient handling connection error returning `connection_error: ...` - [x] 7.8 Add test for VLLMClient applying markdown fence stripping and JSON repair to response - [x] 7.9 Add test for VLLMClient including temperature and response_format in payload - [x] 7.10 Add test for health check success returning True and logging INFO - [x] 7.11 Add test for health check failure returning False and logging WARNING - [x] 7.12 Add test for OllamaClient.call_llm() delegating to _call_ollama() - [x] 7.13 Add test for VLLMConfig loading from environment variables - [x] 7.14 Add test for AppConfig including vllm field with correct defaults ## Task 8: Unit Tests for LLM Factory - [x] 8.1 Add tests to `tests/test_vllm_client.py` for factory returning OllamaClient when provider is "ollama" - [x] 8.2 Add test for factory returning VLLMClient when provider is "vllm" - [x] 8.3 Add test for factory returning OllamaClient when provider is empty string (default) - [x] 8.4 Add test for factory returning OllamaClient with warning when provider is unknown value ## Task 9: Property-Based Tests - [x] 9.1 Create `tests/test_pbt_llm_provider.py` with property test for factory routing: for all model_provider in {"ollama", "vllm", "", None}, factory returns correct client type [PBT] - [x] 9.2 Add property test for error string format consistency: for all HTTP status codes (100-599), `_is_retryable()` classifies them consistently [PBT] - [x] 9.3 Add property test for VLLMClient request payload structure: for all generated prompt dicts, payload contains required OpenAI fields and excludes Ollama-specific fields [PBT] - [x] 9.4 Add property test for JSON repair idempotence: for all valid JSON strings, `_repair_json()` is idempotent [PBT] - [x] 9.5 Add property test for markdown fence stripping: for all strings, wrapping in fences then stripping recovers the original [PBT] - [x] 9.6 Add property test for VLLMConfig defaults: for all default-constructed instances, invariants hold (timeout > 0, max_retries >= 0, 0 <= temperature <= 2, max_tokens > 0) [PBT] ## Task 10: Verification and Backward Compatibility - [x] 10.1 Run existing `tests/test_ollama_client.py` to verify no regressions - [x] 10.2 Run `ruff check services/` to verify no lint errors in modified files - [x] 10.3 Run full test suite `python -m pytest tests/ -x --tb=short -q` to verify all tests pass