Pull requests: vllm-project/vllm
Pull requests list
[ROCm][CI][Bugfix] Disable Flash/MemEfficient SDP on ROCm to avoid HF Transformers accuracy issues ci/build multi-modality Related to multi-modality (#4194) rocm Related to AMD ROCm
#29909 opened Dec 2, 2025 by AndreasKaratzas Loading…
[BUGFIX] llama_4_scaling wrongly passed to DeepseekAttention deepseek Related to DeepSeek models llama Related to Llama models ready ONLY add when PR is ready to merge/full CI is needed
#29908 opened Dec 2, 2025 by juliendenize Loading…
5 tasks
[CI/Build] Avoid duplicate empty inputs test for common multimodal generation tests multi-modality Related to multi-modality (#4194)
[Bugfix] Free requests to avoid a KV Cache exhaustion during VLLM_NIXL_ABORT_REQUEST_TIMEOUT kv-connector v1
#29906 opened Dec 2, 2025 by hasB4K Loading…
Gigachat 3 tool parser and tests documentation Improvements or additions to documentation frontend tool-calling
#29905 opened Dec 2, 2025 by ajpqs Loading…
4 of 5 tasks
SigLIP example add chat_template documentation Improvements or additions to documentation needs-rebase
#29902 opened Dec 2, 2025 by piood Loading…
[Kernel][Quantization][MoE] add marlin kernel support for turing (sm75) ci/build
#29901 opened Dec 2, 2025 by jinzhen-lin Loading…
[Core] Fix standalone runs of test_reset_prefix_cache_e2e v1
#29899 opened Dec 2, 2025 by markmc Loading…
[Compile] Fix torch warning TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled ready ONLY add when PR is ready to merge/full CI is needed v1
#29897 opened Dec 2, 2025 by yewentao256 Loading…
feat(model): Add BitsAndBytes quantization support for Qwen3-Omni-MoE qwen Related to Qwen models ready ONLY add when PR is ready to merge/full CI is needed
#29896 opened Dec 2, 2025 by Navanit-git Loading…
5 tasks
[DOC] Add Arm to list of compute resources providers documentation Improvements or additions to documentation
#29894 opened Dec 2, 2025 by fadara01 Loading…
2 tasks
[BugFix] Fix assert in build_for_cudagraph_capture nvidia ready ONLY add when PR is ready to merge/full CI is needed v1
[Bugfix] respect user-defined flash attention version in ViT attentions
#29889 opened Dec 2, 2025 by cjackal Loading…
3 tasks done
[ROCm][Perf] Enable shuffle kv cache layout and assembly paged attention kernel for AiterFlashAttentionBackend rocm Related to AMD ROCm v1
#29887 opened Dec 2, 2025 by ganyi1996ppo Loading…
5 tasks
DRAFT: Mistral large 3 Extended Blackwell Support ci/build nvidia performance Performance-related issues v1
[bugfix] fix MiniMaxM2ReasoningParser streaming output not separating reasoning_content.
#29882 opened Dec 2, 2025 by JaviS-Rei Loading…
Improve and supplement LoRA tuning documentation performance Performance-related issues
#29880 opened Dec 2, 2025 by caozuoba Loading…
5 tasks
Scheduler: Prevent async loads from exploding the KV Cache v1
#29877 opened Dec 2, 2025 by orozery Loading…
add toolparser for deepseek v3.2 reusing qwen xml parser deepseek Related to DeepSeek models frontend qwen Related to Qwen models tool-calling
#29874 opened Dec 2, 2025 by wenmengzhou Loading…