
Conversation

@nvmbreughe (Contributor) commented Nov 21, 2025

📌 Description

A unified API for the MNNVL and single-node AllReduce kernels.

The backend is chosen during workspace creation. It can either be picked explicitly, or the "auto" backend can be used to let a heuristic select the best backend.
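To make the design concrete, here is an illustrative plain-Python sketch (no FlashInfer dependency) of the dispatch scheme described above: the backend is bound at workspace-creation time, and fused ops later dispatch on the workspace's type. The class names `TrtllmWorkspace`/`MnnvlWorkspace` and the heuristic are hypothetical stand-ins, not the PR's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class TrtllmWorkspace:
    # Hypothetical single-node backend workspace.
    max_token_num: int
    hidden_dim: int


@dataclass
class MnnvlWorkspace:
    # Hypothetical multi-node NVLink (MNNVL) backend workspace.
    max_token_num: int
    hidden_dim: int


def create_allreduce_fusion_workspace(backend, max_token_num, hidden_dim,
                                      topology="single_node"):
    """Bind the backend at creation time, as described in the PR.

    backend: "trtllm", "mnnvl", or "auto" (heuristic picks for you).
    """
    if backend == "auto":
        # Toy heuristic: multi-node topologies get the MNNVL kernels.
        backend = "mnnvl" if topology == "multi_node" else "trtllm"
    cls = {"trtllm": TrtllmWorkspace, "mnnvl": MnnvlWorkspace}[backend]
    return cls(max_token_num, hidden_dim)
```

The fused allreduce entry point then only needs the workspace argument; its concrete type tells the library which kernel family to launch.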

🔍 Related Issues

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

coderabbitai bot commented Nov 21, 2025

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


max_token_num: int,
hidden_dim: int,
dtype: torch.dtype,
topology: str,
Collaborator:
Suggested change:
-    topology: str,
+    topology: Literal["single_node", "multi_node"],
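The point of the suggested `Literal` annotation is that static checkers (mypy/pyright) can reject invalid topology strings at call sites. A minimal sketch of the pattern (the runtime `ValueError` guard here is an extra illustration, not part of the suggestion):

```python
from typing import Literal

Topology = Literal["single_node", "multi_node"]


def create_workspace(topology: Topology = "single_node") -> str:
    # At runtime any string is still accepted; the Literal only
    # constrains what type checkers allow at call sites, so a
    # runtime check is still useful for untyped callers.
    if topology not in ("single_node", "multi_node"):
        raise ValueError(f"unknown topology: {topology!r}")
    return topology
```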
max_token_num: int = None,
hidden_dim: int = None,
dtype: torch.dtype = None,
topology: str = "single_node",
I don't think it is needed longer term, since we will use the same PyTorch symmetric-memory API to allocate symmetric memory for single- and multi-node (under the covers, PyTorch/NCCL/NVSHMEM will detect the platform and decide on the right memory allocation handle).

input: torch.Tensor,
workspace: AllReduceFusionWorkspace,
pattern: int,
launch_with_pdl: bool = False,

Why is it an advantage to give PDL control to the user?

Args:
input: Input tensor [token_num, hidden_dim]
workspace: Workspace object (type determines backend)
pattern: Fusion pattern (AllReduceFusionPattern constant, 0-5)

Are they all 2-kernel overlap, or are some of them real fusion kernels?

Contributor Author:

With one-shot MNNVL it's real fusion, and I think it's similar for the trtllm_ar kernels. It's just two-shot MNNVL that is the 2-kernel overlap.
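Whichever kernel strategy is used (one-shot fused or two-shot overlapped), the result is the same math. As an illustration of the semantics of an allreduce + residual + RMSNorm fusion pattern, here is a plain NumPy reference (not the PR's kernels; just the expected output they should agree with):

```python
import numpy as np


def rmsnorm(x, weight, eps=1e-6):
    # RMSNorm over the hidden (last) dimension.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight


def allreduce_rmsnorm_reference(shards, residual, weight):
    """Reference semantics: sum per-rank shards (the allreduce),
    add the residual, then RMS-normalize. Returns (normed, hidden)
    so the pre-norm residual stream is also available, as fused
    kernels typically expose both."""
    reduced = np.sum(shards, axis=0)   # allreduce (sum over ranks)
    hidden = reduced + residual        # residual add
    return rmsnorm(hidden, weight), hidden
```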

},
heuristic_func=_workspace_creation_heuristic,
)
def create_allreduce_fusion_workspace(

Could create_allreduce_fusion_workspace take an optional workspace argument? If the workspace is big enough (or too big), this is a no-op (maybe just updating the backend selection). If it is too small, destroy the current workspace and allocate a bigger one.

When we switch to a memory pool, we should be able to call create_allreduce_fusion_workspace at each forward pass and memory will just get reused from the mempool (instead of new allocations).
CC @Amir-19

- Workspace(max_token_num=2048, hidden_dim=4096) can handle:
- (token_num=2048, hidden_dim=4096) ✓
- (token_num=1024, hidden_dim=4096) ✓
- (token_num=4096, hidden_dim=2048) ✓ (same total size)

I only see the framework adjusting the number of tokens; hidden_dim should be fixed per model.
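The docstring above states the compatibility rule in terms of total size rather than per-dimension bounds. A one-line sketch of that rule (hypothetical helper name), which also shows why the hidden_dim flexibility is harmless even if, per the comment, frameworks only vary token_num in practice:

```python
def workspace_can_handle(max_token_num, max_hidden_dim,
                         token_num, hidden_dim):
    """Compatibility rule from the docstring: a workspace sized for
    max_token_num x max_hidden_dim elements can serve any request
    with the same or smaller total element count."""
    return token_num * hidden_dim <= max_token_num * max_hidden_dim
```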

... max_token_num=2048,
... hidden_dim=4096,
... dtype=torch.bfloat16,
... topology="single_node"

Could we add a check now to detect the topology, before we switch to the mempool allocation?
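One possible shape for such a check, sketched from standard torchrun environment variables (WORLD_SIZE and LOCAL_WORLD_SIZE). This is only a heuristic illustration; a production check would more likely query the NVLink fabric / MNNVL capability directly:

```python
import os


def detect_topology(env=os.environ):
    """Guess "single_node" vs "multi_node" from torchrun-style env
    vars: WORLD_SIZE is the global rank count, LOCAL_WORLD_SIZE the
    per-node count. If the global count exceeds the per-node count,
    ranks must span multiple nodes."""
    world = int(env.get("WORLD_SIZE", "1"))
    local = int(env.get("LOCAL_WORLD_SIZE", str(world)))
    return "multi_node" if world > local else "single_node"
```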

