[SimplifyCFG] Fix `SimplifyCFG` pass to skip folding when both blocks contain convergence loop/entry intrinsics. #166452

luciechoi · 2025-11-04T21:52:32Z

Fixes a bug #165642. Similar fix is being made in IndVarSimplify pass to account for convergence tokens.

LLVM Spec states that only a single loop / entry convergence token can be included in a basic block.

This PR fixes the issue in SimplifyCFG pass so that when a basic block and its predecessor both contain such convergence intrinsics, it skips merging the two blocks.

llvmbot · 2025-11-04T21:53:06Z

@llvm/pr-subscribers-llvm-transforms

Author: Lucie Choi (luciechoi)

Changes

Fixes a bug #165642

LLVM Spec states that only a single convergence intrinsic can be included in a basic block.

This PR fixes the issue in SimplifyCFG pass so that when a basic block and its predecessor both contain convergence intrinsics, it skips merging the two blocks.

Full diff: https://github.com/llvm/llvm-project/pull/166452.diff

2 Files Affected:

(modified) llvm/lib/Transforms/Utils/BasicBlockUtils.cpp (+9)
(added) llvm/test/Transforms/SimplifyCFG/skip-merging-duplicate-convergence-instrinsics.ll (+68)

diff --git a/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp b/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp index 11db0ec487328..c1b6140abb471 100644 --- a/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp +++ b/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp @@ -230,6 +230,15 @@ bool llvm::MergeBlockIntoPredecessor(BasicBlock *BB, DomTreeUpdater *DTU, // Don't break self-loops. if (PredBB == BB) return false; + // Don't break if both the basic block and the predecessor contain convergent + // intrinsics. + for (Instruction &I : *BB) + if (isa<ConvergenceControlInst>(I)) { + for (Instruction &I : *PredBB) + if (isa<ConvergenceControlInst>(I)) + return false; + } + // Don't break unwinding instructions or terminators with other side-effects. Instruction *PTI = PredBB->getTerminator(); if (PTI->isSpecialTerminator() || PTI->mayHaveSideEffects()) diff --git a/llvm/test/Transforms/SimplifyCFG/skip-merging-duplicate-convergence-instrinsics.ll b/llvm/test/Transforms/SimplifyCFG/skip-merging-duplicate-convergence-instrinsics.ll new file mode 100644 index 0000000000000..d5ae64f6897e3 --- /dev/null +++ b/llvm/test/Transforms/SimplifyCFG/skip-merging-duplicate-convergence-instrinsics.ll @@ -0,0 +1,68 @@ +; NOTE: Assertions have been autogenerated by utils/update_test_checks.py +; RUN: opt < %s -S -passes=simplifycfg | FileCheck %s + +declare token @llvm.experimental.convergence.entry() #0 + +define void @nested(i32 %tidx, i32 %tidy, ptr %array) #0 { +; CHECK-LABEL: @nested( +; CHECK-NEXT: entry: +; CHECK-NEXT: [[TMP0:%.*]] = tail call token @llvm.experimental.convergence.entry() +; CHECK-NEXT: [[TMP1:%.*]] = or i32 [[TIDY:%.*]], [[TIDX:%.*]] +; CHECK-NEXT: [[OR_COND_I:%.*]] = icmp eq i32 [[TMP1]], 0 +; CHECK-NEXT: br label [[FOR_COND_I:%.*]] +; CHECK: for.cond.i: +; CHECK-NEXT: [[TMP2:%.*]] = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token [[TMP0]]) ] +; CHECK-NEXT: br label [[FOR_COND1_I:%.*]] +; CHECK: for.cond1.i: +; CHECK-NEXT: [[CMP2_I:%.*]] = phi i1 [ false, [[FOR_BODY4_I:%.*]] ], [ true, [[FOR_COND_I]] ] +; CHECK-NEXT: [[TMP3:%.*]] = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token [[TMP2]]) ] +; CHECK-NEXT: br i1 [[CMP2_I]], label [[FOR_BODY4_I]], label [[EXIT:%.*]] +; CHECK: for.body4.i: +; CHECK-NEXT: br i1 [[OR_COND_I]], label [[IF_THEN_I:%.*]], label [[FOR_COND1_I]] +; CHECK: if.then.i: +; CHECK-NEXT: [[HLSL_WAVE_ACTIVE_MAX7_I:%.*]] = call spir_func i32 @llvm.spv.wave.reduce.umax.i32(i32 0) [ "convergencectrl"(token [[TMP3]]) ] +; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, ptr [[ARRAY:%.*]], i32 0 +; CHECK-NEXT: store i32 [[HLSL_WAVE_ACTIVE_MAX7_I]], ptr [[TMP4]], align 4 +; CHECK-NEXT: br label [[EXIT]] +; CHECK: exit: +; CHECK-NEXT: ret void +; +entry: + %0 = tail call token @llvm.experimental.convergence.entry() + %2 = or i32 %tidy, %tidx + %or.cond.i = icmp eq i32 %2, 0 + br label %for.cond.i + +for.cond.i: + %3 = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %0) ] + br label %for.cond1.i + +for.cond1.i: + %cmp2.i = phi i1 [ false, %for.body4.i ], [ true, %for.cond.i ] + %4 = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %3) ] + br i1 %cmp2.i, label %for.body4.i, label %cleanup.i.loopexit + +for.body4.i: + br i1 %or.cond.i, label %if.then.i, label %for.cond1.i + +if.then.i: + %hlsl.wave.active.max7.i = call spir_func i32 @llvm.spv.wave.reduce.umax.i32(i32 0) [ "convergencectrl"(token %4) ] + %5 = getelementptr inbounds i32, ptr %array, i32 0 + store i32 %hlsl.wave.active.max7.i, ptr %5, align 4 + br label %cleanup.i + +cleanup.i.loopexit: + br label %cleanup.i + +cleanup.i: + br label %exit + +exit: + ret void +} + +declare token @llvm.experimental.convergence.loop() #0 + +declare i32 @llvm.spv.wave.reduce.umax.i32(i32) #0 + +attributes #0 = { convergent }

rnk · 2025-11-05T00:12:43Z

It seems bad that the convergence control LangRef rules make it illegal to merge basic blocks connected by a single direct branch. If this every happens, it's a sign that the loop is degenerate, i.e. it runs no more than once. Could the backends requiring convergence be taught to tolerate the possibility of multiple convergence control intrinsics, perhaps simply by running a simple cleanup pass that eliminates degenerate loops and replaces the loop token with the token used to create it? Does that transform preserve semantics, or can it change observable behavior?

It seems like the best outcome would be that we relax the LangRef rules so fewer passes are required to scan for this non-local information.

s-perron · 2025-11-05T20:47:40Z

It seems like the best outcome would be that we relax the LangRef rules so fewer passes are required to scan for this non-local information.

@ssahasra Do you know which backends use the convergence tokens? Do you have any thoughts on modifying the spec?

I think the SPIR-V backend could handle having two tokens defined in the same BB as long as the second is a loop that uses the first. We just have to do a type of copy propagation on it.

However, I'm not convinced many transforms will benefit from the rule change. We still need any transform that replicates code to be skipped if it replicates the token. How many transformations merge two block without replicating code? We will still need to add a legality check to other passes.

Keenuts

an initial comment regarding the test file (waiting to see if the discussion over the change makes progress)

llvm/test/Transforms/SimplifyCFG/skip-merging-duplicate-convergence-instrinsics.ll

luciechoi · 2025-11-18T20:00:25Z

@rnk @s-perron @Keenuts friendly ping for review, thanks!

github-actions · 2025-11-18T20:30:01Z

🐧 Linux x64 Test Results

186626 tests passed
4889 tests skipped

s-perron · 2025-11-19T15:18:05Z

llvm/lib/Transforms/Utils/BasicBlockUtils.cpp

+ for (Instruction &I : *BB)
+ if (isa<ConvergenceControlInst>(I)) {
+ for (Instruction &I : *PredBB)
+ if (isa<ConvergenceControlInst>(I))
+ return false;
+ }


This is a move expensive check because it has to traverse potentially two basic blocks. We should move it later. I would make it the last check before deciding to merge.

Also the way this is written is a little unclear. First if we know that there can be at most one convergence token per basic block, we could make this two separate checks and take advantage of short-circuit evaluation:

bool HasConvergenceToken(const BasicBlock *BB) { for (const Instruction &I : *BB) if (isa<ConvergenceControlInst>(I)) return true; return false; }

Then the check becomes:

if (HasConvergenceToken(BB) && HasConvervenceToken(PredBB)) return false;

Thanks, updated. Also modified the condition to check for the convergence token kind, as it seems the two anchor tokens are allowed in the same block.

llvm-ci · 2025-11-29T23:28:09Z

LLVM Buildbot has detected a new failure on builder ppc64le-flang-rhel-clang running on ppc64le-flang-rhel-test while building llvm at step 6 "test-build-unified-tree-check-flang".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/157/builds/42628

Here is the relevant piece of the build log for the reference

Step 6 (test-build-unified-tree-check-flang) failure: 1200 seconds without output running [b'ninja', b'check-flang'], attempting to kill ... PASS: Flang :: Semantics/synchronization04b.f90 (4021 of 4031) PASS: Flang :: Transforms/stack-arrays-hlfir.f90 (4022 of 4031) PASS: Flang :: Transforms/OpenACC/acc-implicit-data-fortran.F90 (4023 of 4031) PASS: Flang :: Semantics/widening.f90 (4024 of 4031) PASS: Flang :: Semantics/unsigned-errors.f90 (4025 of 4031) PASS: Flang :: Transforms/simplify-fir-operations.fir (4026 of 4031) PASS: Flang :: Transforms/tbaa-local-alloc-threshold.fir (4027 of 4031) PASS: Flang :: Transforms/stack-arrays.f90 (4028 of 4031) PASS: Flang :: Driver/omp-driver-offload.f90 (4029 of 4031) PASS: Flang :: Intrinsics/math-codegen.fir (4030 of 4031) command timed out: 1200 seconds without output running [b'ninja', b'check-flang'], attempting to kill process killed by signal 9 program finished with exit code -1 elapsedTime=3012.969949

… contain convergence loop/entry intrinsics. (llvm#166452) Fixes a bug llvm#165642. [Similar fix](llvm#165643) is being made in `IndVarSimplify` pass to account for convergence tokens. [LLVM Spec](https://llvm.org/docs/ConvergentOperations.html#llvm-experimental-convergence-loop) states that only a single loop / entry convergence token can be included in a basic block. This PR fixes the issue in `SimplifyCFG` pass so that when a basic block and its predecessor both contain such convergence intrinsics, it skips merging the two blocks.

llvmbot added the llvm:transforms label Nov 4, 2025

luciechoi requested review from Keenuts and s-perron November 4, 2025 21:53

luciechoi requested a review from rnk November 4, 2025 21:53

Keenuts reviewed Nov 12, 2025

View reviewed changes

llvm/test/Transforms/SimplifyCFG/skip-merging-duplicate-convergence-instrinsics.ll Outdated Show resolved Hide resolved

luciechoi force-pushed the cfgsimplify branch from 6228972 to 0cad7f0 Compare November 18, 2025 19:59

Keenuts approved these changes Nov 19, 2025

View reviewed changes

s-perron reviewed Nov 19, 2025

View reviewed changes

luciechoi added 3 commits November 26, 2025 20:34

Fix simplifycfg pass

98758e5

Remove hlsl&spv specific functions

72d5693

Change the conditon to check for convergence token kind

0117763

luciechoi force-pushed the cfgsimplify branch from 9047980 to 0117763 Compare November 26, 2025 21:30

Merge branch 'main' into cfgsimplify

0d32bec

luciechoi changed the title ~~[SimplifyCFG] Fix SimplifyCFG pass to skip folding when both blocks contain convergence intrinsics.~~ [SimplifyCFG] Fix SimplifyCFG pass to skip folding when both blocks contain convergence loop/entry intrinsics. Nov 26, 2025

s-perron approved these changes Nov 28, 2025

View reviewed changes

luciechoi merged commit 9ffd2e4 into llvm:main Nov 29, 2025
10 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[SimplifyCFG] Fix `SimplifyCFG` pass to skip folding when both blocks contain convergence loop/entry intrinsics. #166452

[SimplifyCFG] Fix `SimplifyCFG` pass to skip folding when both blocks contain convergence loop/entry intrinsics. #166452

luciechoi commented Nov 4, 2025 •

edited

Loading

llvmbot commented Nov 4, 2025

rnk commented Nov 5, 2025

s-perron commented Nov 5, 2025

Keenuts left a comment

Uh oh!

luciechoi commented Nov 18, 2025

github-actions bot commented Nov 18, 2025 •

edited

Loading

s-perron Nov 19, 2025

luciechoi Nov 26, 2025

Uh oh!

llvm-ci commented Nov 29, 2025

Labels

6 participants

[SimplifyCFG] Fix SimplifyCFG pass to skip folding when both blocks contain convergence loop/entry intrinsics. #166452

[SimplifyCFG] Fix SimplifyCFG pass to skip folding when both blocks contain convergence loop/entry intrinsics. #166452

Conversation

luciechoi commented Nov 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

llvmbot commented Nov 4, 2025

rnk commented Nov 5, 2025

s-perron commented Nov 5, 2025

Keenuts left a comment

Choose a reason for hiding this comment

Uh oh!

luciechoi commented Nov 18, 2025

github-actions bot commented Nov 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🐧 Linux x64 Test Results

s-perron Nov 19, 2025

Choose a reason for hiding this comment

luciechoi Nov 26, 2025

Choose a reason for hiding this comment

Uh oh!

llvm-ci commented Nov 29, 2025

Labels

6 participants

[SimplifyCFG] Fix `SimplifyCFG` pass to skip folding when both blocks contain convergence loop/entry intrinsics. #166452

[SimplifyCFG] Fix `SimplifyCFG` pass to skip folding when both blocks contain convergence loop/entry intrinsics. #166452

luciechoi commented Nov 4, 2025 •

edited

Loading

github-actions bot commented Nov 18, 2025 •

edited

Loading