Skip to content

Conversation

@luciechoi
Copy link
Contributor

@luciechoi luciechoi commented Nov 4, 2025

Fixes a bug #165642. Similar fix is being made in IndVarSimplify pass to account for convergence tokens.

LLVM Spec states that only a single loop / entry convergence token can be included in a basic block.

This PR fixes the issue in SimplifyCFG pass so that when a basic block and its predecessor both contain such convergence intrinsics, it skips merging the two blocks.

@llvmbot
Copy link
Member

llvmbot commented Nov 4, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Lucie Choi (luciechoi)

Changes

Fixes a bug #165642

LLVM Spec states that only a single convergence intrinsic can be included in a basic block.

This PR fixes the issue in SimplifyCFG pass so that when a basic block and its predecessor both contain convergence intrinsics, it skips merging the two blocks.


Full diff: https://github.com/llvm/llvm-project/pull/166452.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Utils/BasicBlockUtils.cpp (+9)
  • (added) llvm/test/Transforms/SimplifyCFG/skip-merging-duplicate-convergence-instrinsics.ll (+68)
diff --git a/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp b/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp index 11db0ec487328..c1b6140abb471 100644 --- a/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp +++ b/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp @@ -230,6 +230,15 @@ bool llvm::MergeBlockIntoPredecessor(BasicBlock *BB, DomTreeUpdater *DTU, // Don't break self-loops. if (PredBB == BB) return false; + // Don't break if both the basic block and the predecessor contain convergent + // intrinsics. + for (Instruction &I : *BB) + if (isa<ConvergenceControlInst>(I)) { + for (Instruction &I : *PredBB) + if (isa<ConvergenceControlInst>(I)) + return false; + } + // Don't break unwinding instructions or terminators with other side-effects. Instruction *PTI = PredBB->getTerminator(); if (PTI->isSpecialTerminator() || PTI->mayHaveSideEffects()) diff --git a/llvm/test/Transforms/SimplifyCFG/skip-merging-duplicate-convergence-instrinsics.ll b/llvm/test/Transforms/SimplifyCFG/skip-merging-duplicate-convergence-instrinsics.ll new file mode 100644 index 0000000000000..d5ae64f6897e3 --- /dev/null +++ b/llvm/test/Transforms/SimplifyCFG/skip-merging-duplicate-convergence-instrinsics.ll @@ -0,0 +1,68 @@ +; NOTE: Assertions have been autogenerated by utils/update_test_checks.py +; RUN: opt < %s -S -passes=simplifycfg | FileCheck %s + +declare token @llvm.experimental.convergence.entry() #0 + +define void @nested(i32 %tidx, i32 %tidy, ptr %array) #0 { +; CHECK-LABEL: @nested( +; CHECK-NEXT: entry: +; CHECK-NEXT: [[TMP0:%.*]] = tail call token @llvm.experimental.convergence.entry() +; CHECK-NEXT: [[TMP1:%.*]] = or i32 [[TIDY:%.*]], [[TIDX:%.*]] +; CHECK-NEXT: [[OR_COND_I:%.*]] = icmp eq i32 [[TMP1]], 0 +; CHECK-NEXT: br label [[FOR_COND_I:%.*]] +; CHECK: for.cond.i: +; CHECK-NEXT: [[TMP2:%.*]] = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token [[TMP0]]) ] +; CHECK-NEXT: br label [[FOR_COND1_I:%.*]] +; CHECK: for.cond1.i: +; CHECK-NEXT: [[CMP2_I:%.*]] = phi i1 [ false, [[FOR_BODY4_I:%.*]] ], [ true, [[FOR_COND_I]] ] +; CHECK-NEXT: [[TMP3:%.*]] = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token [[TMP2]]) ] +; CHECK-NEXT: br i1 [[CMP2_I]], label [[FOR_BODY4_I]], label [[EXIT:%.*]] +; CHECK: for.body4.i: +; CHECK-NEXT: br i1 [[OR_COND_I]], label [[IF_THEN_I:%.*]], label [[FOR_COND1_I]] +; CHECK: if.then.i: +; CHECK-NEXT: [[HLSL_WAVE_ACTIVE_MAX7_I:%.*]] = call spir_func i32 @llvm.spv.wave.reduce.umax.i32(i32 0) [ "convergencectrl"(token [[TMP3]]) ] +; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, ptr [[ARRAY:%.*]], i32 0 +; CHECK-NEXT: store i32 [[HLSL_WAVE_ACTIVE_MAX7_I]], ptr [[TMP4]], align 4 +; CHECK-NEXT: br label [[EXIT]] +; CHECK: exit: +; CHECK-NEXT: ret void +; +entry: + %0 = tail call token @llvm.experimental.convergence.entry() + %2 = or i32 %tidy, %tidx + %or.cond.i = icmp eq i32 %2, 0 + br label %for.cond.i + +for.cond.i: + %3 = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %0) ] + br label %for.cond1.i + +for.cond1.i: + %cmp2.i = phi i1 [ false, %for.body4.i ], [ true, %for.cond.i ] + %4 = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %3) ] + br i1 %cmp2.i, label %for.body4.i, label %cleanup.i.loopexit + +for.body4.i: + br i1 %or.cond.i, label %if.then.i, label %for.cond1.i + +if.then.i: + %hlsl.wave.active.max7.i = call spir_func i32 @llvm.spv.wave.reduce.umax.i32(i32 0) [ "convergencectrl"(token %4) ] + %5 = getelementptr inbounds i32, ptr %array, i32 0 + store i32 %hlsl.wave.active.max7.i, ptr %5, align 4 + br label %cleanup.i + +cleanup.i.loopexit: + br label %cleanup.i + +cleanup.i: + br label %exit + +exit: + ret void +} + +declare token @llvm.experimental.convergence.loop() #0 + +declare i32 @llvm.spv.wave.reduce.umax.i32(i32) #0 + +attributes #0 = { convergent } 
@luciechoi luciechoi requested a review from rnk November 4, 2025 21:53
@rnk
Copy link
Collaborator

rnk commented Nov 5, 2025

It seems bad that the convergence control LangRef rules make it illegal to merge basic blocks connected by a single direct branch. If this every happens, it's a sign that the loop is degenerate, i.e. it runs no more than once. Could the backends requiring convergence be taught to tolerate the possibility of multiple convergence control intrinsics, perhaps simply by running a simple cleanup pass that eliminates degenerate loops and replaces the loop token with the token used to create it? Does that transform preserve semantics, or can it change observable behavior?

It seems like the best outcome would be that we relax the LangRef rules so fewer passes are required to scan for this non-local information.

@s-perron
Copy link
Contributor

s-perron commented Nov 5, 2025

It seems like the best outcome would be that we relax the LangRef rules so fewer passes are required to scan for this non-local information.

@ssahasra Do you know which backends use the convergence tokens? Do you have any thoughts on modifying the spec?

I think the SPIR-V backend could handle having two tokens defined in the same BB as long as the second is a loop that uses the first. We just have to do a type of copy propagation on it.

However, I'm not convinced many transforms will benefit from the rule change. We still need any transform that replicates code to be skipped if it replicates the token. How many transformations merge two block without replicating code? We will still need to add a legality check to other passes.

Copy link
Contributor

@Keenuts Keenuts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an initial comment regarding the test file (waiting to see if the discussion over the change makes progress)

@luciechoi
Copy link
Contributor Author

@rnk @s-perron @Keenuts friendly ping for review, thanks!

@github-actions
Copy link

github-actions bot commented Nov 18, 2025

🐧 Linux x64 Test Results

  • 186626 tests passed
  • 4889 tests skipped
Comment on lines 235 to 240
for (Instruction &I : *BB)
if (isa<ConvergenceControlInst>(I)) {
for (Instruction &I : *PredBB)
if (isa<ConvergenceControlInst>(I))
return false;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a move expensive check because it has to traverse potentially two basic blocks. We should move it later. I would make it the last check before deciding to merge.

Also the way this is written is a little unclear. First if we know that there can be at most one convergence token per basic block, we could make this two separate checks and take advantage of short-circuit evaluation:

bool HasConvergenceToken(const BasicBlock *BB) { for (const Instruction &I : *BB) if (isa<ConvergenceControlInst>(I)) return true; return false; } 

Then the check becomes:

if (HasConvergenceToken(BB) && HasConvervenceToken(PredBB)) return false; 
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, updated. Also modified the condition to check for the convergence token kind, as it seems the two anchor tokens are allowed in the same block.

@luciechoi luciechoi changed the title [SimplifyCFG] Fix SimplifyCFG pass to skip folding when both blocks contain convergence intrinsics. [SimplifyCFG] Fix SimplifyCFG pass to skip folding when both blocks contain convergence loop/entry intrinsics. Nov 26, 2025
@luciechoi luciechoi merged commit 9ffd2e4 into llvm:main Nov 29, 2025
10 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Nov 29, 2025

LLVM Buildbot has detected a new failure on builder ppc64le-flang-rhel-clang running on ppc64le-flang-rhel-test while building llvm at step 6 "test-build-unified-tree-check-flang".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/157/builds/42628

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-flang) failure: 1200 seconds without output running [b'ninja', b'check-flang'], attempting to kill ... PASS: Flang :: Semantics/synchronization04b.f90 (4021 of 4031) PASS: Flang :: Transforms/stack-arrays-hlfir.f90 (4022 of 4031) PASS: Flang :: Transforms/OpenACC/acc-implicit-data-fortran.F90 (4023 of 4031) PASS: Flang :: Semantics/widening.f90 (4024 of 4031) PASS: Flang :: Semantics/unsigned-errors.f90 (4025 of 4031) PASS: Flang :: Transforms/simplify-fir-operations.fir (4026 of 4031) PASS: Flang :: Transforms/tbaa-local-alloc-threshold.fir (4027 of 4031) PASS: Flang :: Transforms/stack-arrays.f90 (4028 of 4031) PASS: Flang :: Driver/omp-driver-offload.f90 (4029 of 4031) PASS: Flang :: Intrinsics/math-codegen.fir (4030 of 4031) command timed out: 1200 seconds without output running [b'ninja', b'check-flang'], attempting to kill process killed by signal 9 program finished with exit code -1 elapsedTime=3012.969949 
aahrun pushed a commit to aahrun/llvm-project that referenced this pull request Dec 1, 2025
… contain convergence loop/entry intrinsics. (llvm#166452) Fixes a bug llvm#165642. [Similar fix](llvm#165643) is being made in `IndVarSimplify` pass to account for convergence tokens. [LLVM Spec](https://llvm.org/docs/ConvergentOperations.html#llvm-experimental-convergence-loop) states that only a single loop / entry convergence token can be included in a basic block. This PR fixes the issue in `SimplifyCFG` pass so that when a basic block and its predecessor both contain such convergence intrinsics, it skips merging the two blocks.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

6 participants