- Notifications
You must be signed in to change notification settings - Fork 15.3k
[SimplifyCFG] Fix SimplifyCFG pass to skip folding when both blocks contain convergence loop/entry intrinsics. #166452
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| @llvm/pr-subscribers-llvm-transforms Author: Lucie Choi (luciechoi) ChangesFixes a bug #165642 LLVM Spec states that only a single convergence intrinsic can be included in a basic block. This PR fixes the issue in Full diff: https://github.com/llvm/llvm-project/pull/166452.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp b/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp index 11db0ec487328..c1b6140abb471 100644 --- a/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp +++ b/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp @@ -230,6 +230,15 @@ bool llvm::MergeBlockIntoPredecessor(BasicBlock *BB, DomTreeUpdater *DTU, // Don't break self-loops. if (PredBB == BB) return false; + // Don't break if both the basic block and the predecessor contain convergent + // intrinsics. + for (Instruction &I : *BB) + if (isa<ConvergenceControlInst>(I)) { + for (Instruction &I : *PredBB) + if (isa<ConvergenceControlInst>(I)) + return false; + } + // Don't break unwinding instructions or terminators with other side-effects. Instruction *PTI = PredBB->getTerminator(); if (PTI->isSpecialTerminator() || PTI->mayHaveSideEffects()) diff --git a/llvm/test/Transforms/SimplifyCFG/skip-merging-duplicate-convergence-instrinsics.ll b/llvm/test/Transforms/SimplifyCFG/skip-merging-duplicate-convergence-instrinsics.ll new file mode 100644 index 0000000000000..d5ae64f6897e3 --- /dev/null +++ b/llvm/test/Transforms/SimplifyCFG/skip-merging-duplicate-convergence-instrinsics.ll @@ -0,0 +1,68 @@ +; NOTE: Assertions have been autogenerated by utils/update_test_checks.py +; RUN: opt < %s -S -passes=simplifycfg | FileCheck %s + +declare token @llvm.experimental.convergence.entry() #0 + +define void @nested(i32 %tidx, i32 %tidy, ptr %array) #0 { +; CHECK-LABEL: @nested( +; CHECK-NEXT: entry: +; CHECK-NEXT: [[TMP0:%.*]] = tail call token @llvm.experimental.convergence.entry() +; CHECK-NEXT: [[TMP1:%.*]] = or i32 [[TIDY:%.*]], [[TIDX:%.*]] +; CHECK-NEXT: [[OR_COND_I:%.*]] = icmp eq i32 [[TMP1]], 0 +; CHECK-NEXT: br label [[FOR_COND_I:%.*]] +; CHECK: for.cond.i: +; CHECK-NEXT: [[TMP2:%.*]] = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token [[TMP0]]) ] +; CHECK-NEXT: br label [[FOR_COND1_I:%.*]] +; CHECK: for.cond1.i: +; CHECK-NEXT: [[CMP2_I:%.*]] = phi i1 [ false, [[FOR_BODY4_I:%.*]] ], [ true, [[FOR_COND_I]] ] +; CHECK-NEXT: [[TMP3:%.*]] = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token [[TMP2]]) ] +; CHECK-NEXT: br i1 [[CMP2_I]], label [[FOR_BODY4_I]], label [[EXIT:%.*]] +; CHECK: for.body4.i: +; CHECK-NEXT: br i1 [[OR_COND_I]], label [[IF_THEN_I:%.*]], label [[FOR_COND1_I]] +; CHECK: if.then.i: +; CHECK-NEXT: [[HLSL_WAVE_ACTIVE_MAX7_I:%.*]] = call spir_func i32 @llvm.spv.wave.reduce.umax.i32(i32 0) [ "convergencectrl"(token [[TMP3]]) ] +; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, ptr [[ARRAY:%.*]], i32 0 +; CHECK-NEXT: store i32 [[HLSL_WAVE_ACTIVE_MAX7_I]], ptr [[TMP4]], align 4 +; CHECK-NEXT: br label [[EXIT]] +; CHECK: exit: +; CHECK-NEXT: ret void +; +entry: + %0 = tail call token @llvm.experimental.convergence.entry() + %2 = or i32 %tidy, %tidx + %or.cond.i = icmp eq i32 %2, 0 + br label %for.cond.i + +for.cond.i: + %3 = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %0) ] + br label %for.cond1.i + +for.cond1.i: + %cmp2.i = phi i1 [ false, %for.body4.i ], [ true, %for.cond.i ] + %4 = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %3) ] + br i1 %cmp2.i, label %for.body4.i, label %cleanup.i.loopexit + +for.body4.i: + br i1 %or.cond.i, label %if.then.i, label %for.cond1.i + +if.then.i: + %hlsl.wave.active.max7.i = call spir_func i32 @llvm.spv.wave.reduce.umax.i32(i32 0) [ "convergencectrl"(token %4) ] + %5 = getelementptr inbounds i32, ptr %array, i32 0 + store i32 %hlsl.wave.active.max7.i, ptr %5, align 4 + br label %cleanup.i + +cleanup.i.loopexit: + br label %cleanup.i + +cleanup.i: + br label %exit + +exit: + ret void +} + +declare token @llvm.experimental.convergence.loop() #0 + +declare i32 @llvm.spv.wave.reduce.umax.i32(i32) #0 + +attributes #0 = { convergent } |
| It seems bad that the convergence control LangRef rules make it illegal to merge basic blocks connected by a single direct branch. If this every happens, it's a sign that the loop is degenerate, i.e. it runs no more than once. Could the backends requiring convergence be taught to tolerate the possibility of multiple convergence control intrinsics, perhaps simply by running a simple cleanup pass that eliminates degenerate loops and replaces the loop token with the token used to create it? Does that transform preserve semantics, or can it change observable behavior? It seems like the best outcome would be that we relax the LangRef rules so fewer passes are required to scan for this non-local information. |
@ssahasra Do you know which backends use the convergence tokens? Do you have any thoughts on modifying the spec? I think the SPIR-V backend could handle having two tokens defined in the same BB as long as the second is a loop that uses the first. We just have to do a type of copy propagation on it. However, I'm not convinced many transforms will benefit from the rule change. We still need any transform that replicates code to be skipped if it replicates the token. How many transformations merge two block without replicating code? We will still need to add a legality check to other passes. |
Keenuts left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
an initial comment regarding the test file (waiting to see if the discussion over the change makes progress)
llvm/test/Transforms/SimplifyCFG/skip-merging-duplicate-convergence-instrinsics.ll Outdated Show resolved Hide resolved
6228972 to 0cad7f0 Compare 🐧 Linux x64 Test Results
|
| for (Instruction &I : *BB) | ||
| if (isa<ConvergenceControlInst>(I)) { | ||
| for (Instruction &I : *PredBB) | ||
| if (isa<ConvergenceControlInst>(I)) | ||
| return false; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a move expensive check because it has to traverse potentially two basic blocks. We should move it later. I would make it the last check before deciding to merge.
Also the way this is written is a little unclear. First if we know that there can be at most one convergence token per basic block, we could make this two separate checks and take advantage of short-circuit evaluation:
bool HasConvergenceToken(const BasicBlock *BB) { for (const Instruction &I : *BB) if (isa<ConvergenceControlInst>(I)) return true; return false; } Then the check becomes:
if (HasConvergenceToken(BB) && HasConvervenceToken(PredBB)) return false; There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, updated. Also modified the condition to check for the convergence token kind, as it seems the two anchor tokens are allowed in the same block.
9047980 to 0117763 Compare SimplifyCFG pass to skip folding when both blocks contain convergence intrinsics.SimplifyCFG pass to skip folding when both blocks contain convergence loop/entry intrinsics. | LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/157/builds/42628 Here is the relevant piece of the build log for the reference |
… contain convergence loop/entry intrinsics. (llvm#166452) Fixes a bug llvm#165642. [Similar fix](llvm#165643) is being made in `IndVarSimplify` pass to account for convergence tokens. [LLVM Spec](https://llvm.org/docs/ConvergentOperations.html#llvm-experimental-convergence-loop) states that only a single loop / entry convergence token can be included in a basic block. This PR fixes the issue in `SimplifyCFG` pass so that when a basic block and its predecessor both contain such convergence intrinsics, it skips merging the two blocks.
Fixes a bug #165642. Similar fix is being made in
IndVarSimplifypass to account for convergence tokens.LLVM Spec states that only a single loop / entry convergence token can be included in a basic block.
This PR fixes the issue in
SimplifyCFGpass so that when a basic block and its predecessor both contain such convergence intrinsics, it skips merging the two blocks.