Skip to content

Conversation

@pisceskkk
Copy link
Contributor

@pisceskkk pisceskkk commented Dec 3, 2025

Since NPU already provides FullGraph support for both PCP and DCP, I believe we should relocate the graph support determination logic from the generic code to platform-specific judgment functions.
Currently, CUDA only supports PIECEWISE mode, but it's unclear whether ROCM behaves consistently with CUDA. Additionally, based on the graph mode determination logic for each platform, backends other than CUDA and ROCM should not require additional handling. If my understanding is incorrect, please feel free to point it out.

CC @LucasWilkinson @FENP @zhenwenqi2024

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the CUDAGraph compatibility checks for Prefill Context Parallelism (PCP) and Decode Context Parallelism (DCP) by moving them from the generic VllmConfig to platform-specific check_and_update_config functions in cuda.py and rocm.py. This is a good architectural improvement, as it correctly isolates platform-specific logic. However, this change introduces code duplication between the CUDA and ROCm platform files. I've left comments suggesting to refactor the duplicated logic into a shared helper function to improve maintainability.

Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
Copy link
Collaborator

@LucasWilkinson LucasWilkinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM; we should really try to turn on DCP with full-CGs on CUDA; do you know what the blocker is here?

@github-project-automation github-project-automation bot moved this to In review in NVIDIA Dec 4, 2025
@LucasWilkinson LucasWilkinson added the ready ONLY add when PR is ready to merge/full CI is needed label Dec 4, 2025
@pisceskkk
Copy link
Contributor Author

Thanks for review!

we should really try to turn on DCP with full-CGs on CUDA; do you know what the blocker is here?

I believe the DCP solution itself inherently supports full-CGs. The precision issues may be caused by certain buffers not being correctly persisted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

nvidia ready ONLY add when PR is ready to merge/full CI is needed rocm Related to AMD ROCm

3 participants