117b693b19
- LLMClient Protocol for provider-agnostic inference - VLLMClient for OpenAI-compatible /v1/chat/completions API - LLM client factory with provider routing (ollama/vllm) - VLLMConfig with VLLM_* environment variable loading - Updated extractor worker with health check and provider switching - Updated event classifier to use LLMClient protocol - Helm values for vLLM configuration - 18 unit tests + 6 property-based tests - Full backward compatibility preserved
6.5 KiB
6.5 KiB
Tasks
Task 1: LLM Client Protocol and VLLMConfig
- 1.1 Create
services/shared/llm_protocol.pywithLLMClientProtocol definingcall_llm(prompts, json_schema, document_text) -> ExtractionAttemptandclose()methods - 1.2 Add
VLLMConfigdataclass toservices/shared/config.pywith fields:base_url,model,timeout,max_retries,retry_base_delay,retry_max_delay,retry_backoff_multiplier,max_tokens,temperature,api_key - 1.3 Add
vllm: VLLMConfigfield toAppConfigdataclass - 1.4 Add
VLLM_*environment variable loading toload_config()function - 1.5 Add public
call_llm()method toOllamaClientinservices/extractor/client.pythat delegates to_call_ollama()
Task 2: VLLMClient Implementation
- 2.1 Create
services/extractor/vllm_client.pywithVLLMClientclass that satisfies theLLMClientprotocol - 2.2 Implement
call_llm()method that sends POST to/v1/chat/completionswith OpenAI-compatible payload (model,messages,max_tokens,temperature,response_format) - 2.3 Implement response parsing: extract content from
choices[0].message.content, apply_strip_markdown_fences()and_repair_json() - 2.4 Implement error handling: map timeout →
timeout, HTTP errors →http_{code}, connection errors →connection_error: {details}, empty response →empty_model_response - 2.5 Implement
close()method to release the underlyinghttpx.AsyncClient - 2.6 Implement
check_vllm_health(base_url, timeout=10.0)async function that GETs/v1/modelsand returns bool
Task 3: LLM Client Factory
- 3.1 Create
services/extractor/llm_factory.pywithbuild_llm_client()function that returnsOllamaClientorVLLMClientbased on resolvedmodel_provider - 3.2 Implement
build_config_from_resolved()function that creates provider-specific config fromResolvedAgentConfigand base configs - 3.3 Handle unknown provider values: log warning and fall back to
OllamaClient
Task 4: Update Extractor Worker for Provider Abstraction
- 4.1 Update
services/extractor/main.pyto import and usebuild_llm_client()from the factory instead of directly constructingOllamaClient - 4.2 Replace
_build_ollama_config_from_resolved()usage with the factory'sbuild_config_from_resolved()for both extractor and classifier clients - 4.3 Add vLLM health check call at startup when resolved config specifies
model_provider = "vllm", with fallback to Ollama on failure - 4.4 Update config refresh logic (every 100 jobs) to detect provider changes, close old client, and construct new client via factory
- 4.5 Add INFO-level logging for provider switches including old/new provider, model name, and variant ID
Task 5: Update Event Classifier for Provider Abstraction
- 5.1 Update
classify_global_event()inservices/extractor/event_classifier.pyto acceptLLMClientprotocol type instead ofAnyfor the client parameter - 5.2 Replace
ollama_client._call_ollama()calls withclient.call_llm()calls - 5.3 Update
ModelMetadata.providerassignment to use the actual provider string from the client (detect from config type or pass explicitly) - 5.4 Update retry logic to use client config attributes instead of accessing
ollama_client._base_delayandollama_client._backoff_multiplierdirectly
Task 6: Helm Configuration
- 6.1 Add
VLLM_BASE_URL,VLLM_MODEL,VLLM_TIMEOUT,VLLM_MAX_RETRIES,VLLM_TEMPERATURE, andVLLM_API_KEYentries to theconfig:section ininfra/helm/stonks-oracle/values.yaml
Task 7: Unit Tests for VLLMClient
- 7.1 Create
tests/test_vllm_client.pywith test for VLLMClient sending correct payload to/v1/chat/completionsusing mock httpx transport - 7.2 Add test for VLLMClient extracting content from
choices[0].message.content - 7.3 Add test for VLLMClient handling empty choices array returning
empty_model_responseerror - 7.4 Add test for VLLMClient handling HTTP timeout returning
timeouterror - 7.5 Add test for VLLMClient handling HTTP 500 returning
http_500retryable error - 7.6 Add test for VLLMClient handling HTTP 400 returning
http_400non-retryable error - 7.7 Add test for VLLMClient handling connection error returning
connection_error: ... - 7.8 Add test for VLLMClient applying markdown fence stripping and JSON repair to response
- 7.9 Add test for VLLMClient including temperature and response_format in payload
- 7.10 Add test for health check success returning True and logging INFO
- 7.11 Add test for health check failure returning False and logging WARNING
- 7.12 Add test for OllamaClient.call_llm() delegating to _call_ollama()
- 7.13 Add test for VLLMConfig loading from environment variables
- 7.14 Add test for AppConfig including vllm field with correct defaults
Task 8: Unit Tests for LLM Factory
- 8.1 Add tests to
tests/test_vllm_client.pyfor factory returning OllamaClient when provider is "ollama" - 8.2 Add test for factory returning VLLMClient when provider is "vllm"
- 8.3 Add test for factory returning OllamaClient when provider is empty string (default)
- 8.4 Add test for factory returning OllamaClient with warning when provider is unknown value
Task 9: Property-Based Tests
- 9.1 Create
tests/test_pbt_llm_provider.pywith property test for factory routing: for all model_provider in {"ollama", "vllm", "", None}, factory returns correct client type [PBT] - 9.2 Add property test for error string format consistency: for all HTTP status codes (100-599),
_is_retryable()classifies them consistently [PBT] - 9.3 Add property test for VLLMClient request payload structure: for all generated prompt dicts, payload contains required OpenAI fields and excludes Ollama-specific fields [PBT]
- 9.4 Add property test for JSON repair idempotence: for all valid JSON strings,
_repair_json()is idempotent [PBT] - 9.5 Add property test for markdown fence stripping: for all strings, wrapping in fences then stripping recovers the original [PBT]
- 9.6 Add property test for VLLMConfig defaults: for all default-constructed instances, invariants hold (timeout > 0, max_retries >= 0, 0 <= temperature <= 2, max_tokens > 0) [PBT]
Task 10: Verification and Backward Compatibility
- 10.1 Run existing
tests/test_ollama_client.pyto verify no regressions - 10.2 Run
ruff check services/to verify no lint errors in modified files - 10.3 Run full test suite
python -m pytest tests/ -x --tb=short -qto verify all tests pass