[PCP&DCP] move CUDAGraph check for PCP&DCP to the check func of platforms #29952

pisceskkk · 2025-12-03T08:19:10Z

Since NPU already provides FullGraph support for both PCP and DCP, I believe we should move the CUDA graph support check out of the generic code and into each platform's own check function.
Currently, CUDA only supports PIECEWISE mode with PCP/DCP, but it's unclear whether ROCm behaves the same way as CUDA. Additionally, based on each platform's graph-mode selection logic, backends other than CUDA and ROCm should not need any additional handling. If my understanding is incorrect, please feel free to point it out.

CC @LucasWilkinson @FENP @zhenwenqi2024
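As a rough illustration of the proposed move, the toy sketch below puts the PCP/DCP graph-mode restriction in a per-platform `check_and_update_config` hook instead of generic config code. Everything here is a simplified, hypothetical stand-in (`ToyConfig`, its field names, the three-value `CUDAGraphMode`, `NpuPlatform`); it is not vLLM's actual code, and real configs and graph modes are richer.

```python
from dataclasses import dataclass
from enum import Enum


class CUDAGraphMode(Enum):
    # Hypothetical, simplified subset of graph modes for illustration only.
    NONE = "none"
    PIECEWISE = "piecewise"
    FULL = "full"


@dataclass
class ToyConfig:
    # Hypothetical field names standing in for the real parallel/compilation config.
    prefill_context_parallel_size: int = 1
    decode_context_parallel_size: int = 1
    cudagraph_mode: CUDAGraphMode = CUDAGraphMode.FULL

    @property
    def uses_context_parallel(self) -> bool:
        return (self.prefill_context_parallel_size > 1
                or self.decode_context_parallel_size > 1)


class Platform:
    @classmethod
    def check_and_update_config(cls, config: ToyConfig) -> None:
        # Generic default: no PCP/DCP-specific graph-mode decision here.
        pass


class CudaPlatform(Platform):
    @classmethod
    def check_and_update_config(cls, config: ToyConfig) -> None:
        # CUDA in this sketch: only PIECEWISE graphs are allowed with PCP/DCP.
        if (config.uses_context_parallel
                and config.cudagraph_mode == CUDAGraphMode.FULL):
            config.cudagraph_mode = CUDAGraphMode.PIECEWISE


class NpuPlatform(Platform):
    # In this sketch the NPU platform keeps the default hook,
    # so FULL graphs stay enabled with PCP/DCP.
    pass


cuda_cfg = ToyConfig(decode_context_parallel_size=2)
CudaPlatform.check_and_update_config(cuda_cfg)
npu_cfg = ToyConfig(decode_context_parallel_size=2)
NpuPlatform.check_and_update_config(npu_cfg)
print(cuda_cfg.cudagraph_mode, npu_cfg.cudagraph_mode)
```

The design point is that each platform decides for itself; a platform that supports full graphs with PCP/DCP simply does not override the restriction.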

gemini-code-assist

Code Review

This pull request refactors the CUDAGraph compatibility checks for Prefill Context Parallelism (PCP) and Decode Context Parallelism (DCP) by moving them from the generic VllmConfig into the platform-specific check_and_update_config functions in cuda.py and rocm.py. This is a good architectural improvement, as it correctly isolates platform-specific logic. However, it duplicates the same check in the CUDA and ROCm platform files. I've left comments suggesting that the duplicated logic be factored out into a shared helper function to improve maintainability.

vllm/platforms/cuda.py

vllm/platforms/rocm.py
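The shared-helper refactor suggested in this review could look roughly like the sketch below. The helper name `restrict_cudagraph_for_context_parallel`, the `ToyConfig` fields, and the warning text are all hypothetical illustrations; this is not necessarily what the PR ended up merging.

```python
import logging
from dataclasses import dataclass
from enum import Enum

logger = logging.getLogger(__name__)


class CUDAGraphMode(Enum):
    # Hypothetical, simplified subset of graph modes.
    NONE = "none"
    PIECEWISE = "piecewise"
    FULL = "full"


@dataclass
class ToyConfig:
    # Hypothetical stand-in for the real config objects.
    prefill_context_parallel_size: int = 1
    decode_context_parallel_size: int = 1
    cudagraph_mode: CUDAGraphMode = CUDAGraphMode.FULL


def restrict_cudagraph_for_context_parallel(config: ToyConfig) -> bool:
    """Hypothetical shared helper: downgrade FULL graphs to PIECEWISE when
    PCP or DCP is enabled. Returns True if the config was changed."""
    uses_cp = (config.prefill_context_parallel_size > 1
               or config.decode_context_parallel_size > 1)
    if uses_cp and config.cudagraph_mode == CUDAGraphMode.FULL:
        logger.warning("FULL CUDA graphs are not supported with PCP/DCP on "
                       "this platform; falling back to PIECEWISE.")
        config.cudagraph_mode = CUDAGraphMode.PIECEWISE
        return True
    return False


class CudaPlatform:
    @classmethod
    def check_and_update_config(cls, config: ToyConfig) -> None:
        restrict_cudagraph_for_context_parallel(config)


class RocmPlatform:
    @classmethod
    def check_and_update_config(cls, config: ToyConfig) -> None:
        # Same rule as CUDA, without copy-pasting the check.
        restrict_cudagraph_for_context_parallel(config)
```

Keeping the rule in one function means that if ROCm later turns out to behave differently from CUDA, only its hook needs to change.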

Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>

LucasWilkinson

LGTM; we should really try to turn on DCP with full-CGs on CUDA; do you know what the blocker is here?

pisceskkk · 2025-12-04T10:35:33Z

Thanks for the review!

> we should really try to turn on DCP with full-CGs on CUDA; do you know what the blocker is here?

I believe the DCP design itself inherently supports full-CGs. The precision issues seen with full-CGs may be caused by certain buffers not being correctly persisted between graph capture and replay.
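To illustrate the general constraint behind this hypothesis: a captured CUDA graph replays against the exact memory it saw at capture time, so per-step inputs must be written into persistent buffers in place (PyTorch's CUDA graph docs do this with `static_input.copy_(...)`). The pure-Python toy below simulates that behavior; `ToyGraph` and `ToyRunner` are hypothetical and are not vLLM or PyTorch APIs, and the sketch does not claim to identify which DCP buffer is actually at fault.

```python
class ToyGraph:
    """Toy model of a captured CUDA graph: replay always reads the exact
    buffer object that existed at capture time (like a fixed device address)."""

    def __init__(self, fn, buffer):
        self._fn = fn
        self._buffer = buffer  # reference fixed at "capture" time

    def replay(self):
        return self._fn(self._buffer)


class ToyRunner:
    def __init__(self, batch_size):
        # Metadata buffer allocated once, before capture.
        self.seq_lens = [0] * batch_size
        self.graph = ToyGraph(sum, self.seq_lens)

    def prepare_persistent(self, lens):
        # Correct pattern: write new values into the captured buffer in place.
        self.seq_lens[:] = lens

    def prepare_rebind(self, lens):
        # Buggy pattern: rebinding allocates a new list the graph never sees.
        self.seq_lens = list(lens)


runner = ToyRunner(batch_size=3)
runner.prepare_persistent([4, 5, 6])
print(runner.graph.replay())  # 15: replay sees the updated values

buggy = ToyRunner(batch_size=3)
buggy.prepare_rebind([4, 5, 6])
print(buggy.graph.replay())  # 0: replay still reads the stale captured buffer
```

In the buggy path nothing fails loudly; the graph silently computes with stale data, which is why a missing persistent buffer tends to surface as an accuracy problem rather than a crash.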

pisceskkk requested review from ProExpertProg, WoosukKwon, hmellor, houseroad, mgoin, robertgshaw2-redhat, tjtanaa, tlrmchlsmth, yewentao256 and youkaichao as code owners December 3, 2025 08:19

mergify bot added the nvidia and rocm labels Dec 3, 2025

github-project-automation bot added this to NVIDIA Dec 3, 2025

gemini-code-assist bot reviewed Dec 3, 2025

vllm/platforms/cuda.py (resolved)

vllm/platforms/rocm.py (resolved)

move CUDAGraph check for PCP&DCP to the check func of platforms

4ffa8e7

Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>

pisceskkk force-pushed the cudagraph-fix branch from cc7ad31 to 4ffa8e7 December 4, 2025 02:39

LucasWilkinson approved these changes Dec 4, 2025

github-project-automation bot moved this to In review in NVIDIA Dec 4, 2025

LucasWilkinson added the ready label Dec 4, 2025

Merge branch 'main' into cudagraph-fix

639abd9
