stonks-oracle

admin/stonks-oracle

Fork 0

Commit Graph

Author	SHA1	Message	Date
Celes Renata	0437943863	fix: reduce vLLM default max_tokens to 4096, update model to AxionML/Qwen3.5-9B-NVFP4 The model's max_model_len is 16384 — requesting 32768 output tokens caused HTTP 400 from vLLM. 4096 is a safe default for extraction output.	2026-04-23 19:49:34 +00:00
Celes Renata	117b693b19	feat: add remote vLLM support with provider abstraction layer - LLMClient Protocol for provider-agnostic inference - VLLMClient for OpenAI-compatible /v1/chat/completions API - LLM client factory with provider routing (ollama/vllm) - VLLMConfig with VLLM_* environment variable loading - Updated extractor worker with health check and provider switching - Updated event classifier to use LLMClient protocol - Helm values for vLLM configuration - 18 unit tests + 6 property-based tests - Full backward compatibility preserved	2026-04-23 08:17:23 +00:00

Author

SHA1

Message

Date

Celes Renata

0437943863

fix: reduce vLLM default max_tokens to 4096, update model to AxionML/Qwen3.5-9B-NVFP4

The model's max_model_len is 16384 — requesting 32768 output tokens
caused HTTP 400 from vLLM. 4096 is a safe default for extraction output.

2026-04-23 19:49:34 +00:00

Celes Renata

117b693b19

feat: add remote vLLM support with provider abstraction layer

- LLMClient Protocol for provider-agnostic inference
- VLLMClient for OpenAI-compatible /v1/chat/completions API
- LLM client factory with provider routing (ollama/vllm)
- VLLMConfig with VLLM_* environment variable loading
- Updated extractor worker with health check and provider switching
- Updated event classifier to use LLMClient protocol
- Helm values for vLLM configuration
- 18 unit tests + 6 property-based tests
- Full backward compatibility preserved

2026-04-23 08:17:23 +00:00

2 Commits