feat: add remote vLLM support with provider abstraction layer

- LLMClient Protocol for provider-agnostic inference
- VLLMClient for OpenAI-compatible /v1/chat/completions API
- LLM client factory with provider routing (ollama/vllm)
- VLLMConfig with VLLM_* environment variable loading
- Updated extractor worker with health check and provider switching
- Updated event classifier to use LLMClient protocol
- Helm values for vLLM configuration
- 18 unit tests + 6 property-based tests
- Full backward compatibility preserved
Celes Renata
2026-04-23 08:17:23 +00:00
parent 63e4fb96ea
commit 117b693b19
15 changed files with 1876 additions and 77 deletions
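
The provider abstraction described in the commit message could look roughly like the sketch below. It is not the code from this commit: the method names (`generate`, `health_check`), the factory name `create_llm_client`, and the stub client bodies are all assumptions made for illustration. Only the names `LLMClient`, `VLLMClient`, and the `ollama`/`vllm` provider keys come from the message.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@runtime_checkable
class LLMClient(Protocol):
    """Hypothetical provider-agnostic interface; real method names may differ."""

    def generate(self, prompt: str) -> str: ...

    def health_check(self) -> bool: ...


@dataclass
class OllamaClient:
    """Stub standing in for the existing Ollama client (assumed shape)."""

    model: str

    def generate(self, prompt: str) -> str:
        return f"[ollama:{self.model}] {prompt}"

    def health_check(self) -> bool:
        return True


@dataclass
class VLLMClient:
    """Stub for the vLLM client; no network calls are made in this sketch."""

    base_url: str
    model: str

    def generate(self, prompt: str) -> str:
        # A real client would POST to f"{self.base_url}/v1/chat/completions".
        return f"[vllm:{self.model}] {prompt}"

    def health_check(self) -> bool:
        return True


def create_llm_client(provider: str, model: str, base_url: str = "") -> LLMClient:
    """Route a provider name to a client; unknown names fail loudly."""
    provider = provider.strip().lower()
    if provider == "ollama":
        return OllamaClient(model=model)
    if provider == "vllm":
        return VLLMClient(base_url=base_url, model=model)
    raise ValueError(f"unknown LLM provider: {provider!r}")
```

Because callers such as the extractor worker and event classifier would depend only on the `LLMClient` protocol, switching providers reduces to changing the string passed to the factory.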
@@ -181,6 +181,12 @@ config:
 OLLAMA_RETRY_BASE_DELAY: "1.0"
 OLLAMA_RETRY_MAX_DELAY: "10.0"
 OLLAMA_RETRY_BACKOFF_MULTIPLIER: "2.0"
+VLLM_BASE_URL: "http://192.168.42.254:8000"
+VLLM_MODEL: "RedHatAI/Qwen3.6-35B-A3B-NVFP4"
+VLLM_TIMEOUT: "120"
+VLLM_MAX_RETRIES: "2"
+VLLM_TEMPERATURE: "0.7"
+VLLM_API_KEY: ""
 TRINO_HOST: "trino.stonks-oracle.svc.cluster.local"
 TRINO_PORT: "8080"
 TRINO_CATALOG: "lakehouse"
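
The `VLLM_*` keys in this hunk are presumably read by the `VLLMConfig` mentioned in the commit message. A minimal sketch of such loading follows. The field names, the `from_env` classmethod, the fallback defaults, and the treatment of an empty `VLLM_API_KEY` as "no key" are assumptions, not the commit's actual implementation.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class VLLMConfig:
    """Hypothetical config holder for the VLLM_* settings."""

    base_url: str
    model: str
    timeout: float
    max_retries: int
    temperature: float
    api_key: Optional[str]

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "VLLMConfig":
        # Accept an explicit mapping so tests do not have to mutate os.environ.
        env = os.environ if env is None else env
        return cls(
            # Assumed default; trailing slash dropped so paths join cleanly.
            base_url=env.get("VLLM_BASE_URL", "http://localhost:8000").rstrip("/"),
            model=env.get("VLLM_MODEL", ""),
            timeout=float(env.get("VLLM_TIMEOUT", "120")),
            max_retries=int(env.get("VLLM_MAX_RETRIES", "2")),
            temperature=float(env.get("VLLM_TEMPERATURE", "0.7")),
            # Helm sets VLLM_API_KEY to ""; treat empty as "no key" (assumption).
            api_key=env.get("VLLM_API_KEY") or None,
        )


# Usage with the values from the Helm hunk above.
cfg = VLLMConfig.from_env({
    "VLLM_BASE_URL": "http://192.168.42.254:8000",
    "VLLM_MODEL": "RedHatAI/Qwen3.6-35B-A3B-NVFP4",
    "VLLM_TIMEOUT": "120",
    "VLLM_MAX_RETRIES": "2",
    "VLLM_TEMPERATURE": "0.7",
    "VLLM_API_KEY": "",
})
```

Helm passes every value as a quoted string, so the numeric fields are converted explicitly. A malformed value such as `VLLM_TIMEOUT: "fast"` would raise `ValueError` when the config is loaded rather than on the first request.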