From 6880f11c2668a97a466c7a2bbb1eff9f31635d53 Mon Sep 17 00:00:00 2001
From: Celes Renata
Date: Wed, 29 Apr 2026 16:11:44 +0000
Subject: [PATCH] fix: add /no_think inline tag to disable Qwen3 thinking mode

The vLLM deployment is not respecting chat_template_kwargs, so thinking
mode cannot be disabled through the request. Qwen3 models also accept
/no_think as an inline suffix in the user message to turn thinking off.
Because the switch lives in the prompt text rather than in a server-side
option, it does not depend on the serving backend (vLLM, Ollama, SGLang)
passing template arguments through.
---
 services/recommendation/thesis_llm.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/services/recommendation/thesis_llm.py b/services/recommendation/thesis_llm.py
index 1ff0248..482ef24 100644
--- a/services/recommendation/thesis_llm.py
+++ b/services/recommendation/thesis_llm.py
@@ -76,7 +76,7 @@ Rewrite the following structured thesis into clear, professional analyst prose.
 {context_block}
 --- END CONTEXT ---
 
-Return ONLY the rewritten thesis. No other text."""
+Return ONLY the rewritten thesis. No other text. /no_think"""
 
     return {
         "system": THESIS_SYSTEM_PROMPT,
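The same soft switch can be sketched outside the prompt template. This is a minimal illustration, not code from this patch: `with_no_think` and `strip_think_block` are hypothetical helper names. The second helper exists because, per the Qwen3 model documentation, a turn carrying /no_think may still produce a `<think></think>` block with empty content, so callers that parse the thesis text should tolerate a leading think block rather than assume it is gone.

```python
import re

# Qwen3 soft switch appended to the user turn (same text the patch adds).
NO_THINK_TAG = "/no_think"

# Matches one leading <think>...</think> block (empty or not) plus the
# whitespace around it. DOTALL lets ".*?" span newlines inside the block.
_THINK_BLOCK = re.compile(r"^\s*<think>.*?</think>\s*", re.DOTALL)


def with_no_think(user_prompt: str) -> str:
    """Append /no_think to the end of the prompt, at most once."""
    stripped = user_prompt.rstrip()
    if stripped.endswith(NO_THINK_TAG):
        # Already tagged: leave it alone so repeated calls are idempotent.
        return stripped
    # Same line, separated by a space, mirroring the patched template.
    return f"{stripped} {NO_THINK_TAG}"


def strip_think_block(completion: str) -> str:
    """Remove a leading <think>...</think> block from a completion, if present."""
    return _THINK_BLOCK.sub("", completion, count=1).strip()
```

For example, `with_no_think("Return ONLY the rewritten thesis. No other text.")` yields the same final line the patch hard-codes, and `strip_think_block("<think>\n\n</think>\n\nShares look undervalued.")` returns `"Shares look undervalued."`. Stripping any leading think block, not only an empty one, also covers a backend that ignores the switch and reasons anyway.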