Skip to content

Conversation

@frederik-h
Copy link
Contributor

@frederik-h frederik-h commented Sep 15, 2025

Extend the existing "scalarize" function which is used for the
fp-integer conversion instruction expansion to BinaryOperator instructions
and reuse it for the frem expansion; a similar function for scalarizing BinaryOperator
instructions exists in the ExpandLargeDivRem pass and this change is a step towards
merging that pass with ExpandFp.

Further refactoring: Scalarize directly instead of using the "ReplaceVector" as a worklist, rename "Replace" vector to "Worklist", and hoist a check for unsupported scalable vectors to the top of the instruction visiting loop.

Extend the existing "scalarize" function which is used for the fp-integer conversion instruction expansion to BinaryOperator instructions and reuse it for the frem expansion. Furthermore, extract a function to dispatch instructions to the scalar and vector queues and hoist a check for scalable vectors to the top of the instruction visiting loop.
@llvmbot
Copy link
Member

llvmbot commented Sep 15, 2025

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-llvm-globalisel

Author: Frederik Harwath (frederik-h)

Changes

Extend the existing "scalarize" function which is used for the
fp-integer conversion instruction expansion to BinaryOperator instructions
and reuse it for the frem expansion; a similar function for scalarizing BinaryOperator
instructions exists in the ExpandLargeDivRem pass and this change is a step towards
merging that pass with ExpandFp.

Further refactoring: Extract a function to dispatch instructions to the scalar and
vector queues and hoist a check for scalable vectors to the top of the
instruction visiting loop.


Patch is 383.21 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/158588.diff

4 Files Affected:

  • (modified) llvm/lib/CodeGen/ExpandFp.cpp (+45-65)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/frem.ll (+140-140)
  • (modified) llvm/test/CodeGen/AMDGPU/frem.ll (+650-650)
  • (modified) llvm/test/Transforms/ExpandFp/AMDGPU/frem.ll (+140-140)
diff --git a/llvm/lib/CodeGen/ExpandFp.cpp b/llvm/lib/CodeGen/ExpandFp.cpp index 9cc6c6a706c58..e336da7b914f0 100644 --- a/llvm/lib/CodeGen/ExpandFp.cpp +++ b/llvm/lib/CodeGen/ExpandFp.cpp @@ -356,8 +356,9 @@ Value *FRemExpander::buildFRem(Value *X, Value *Y, static bool expandFRem(BinaryOperator &I, std::optional<SimplifyQuery> &SQ) { LLVM_DEBUG(dbgs() << "Expanding instruction: " << I << '\n'); - Type *ReturnTy = I.getType(); - assert(FRemExpander::canExpandType(ReturnTy->getScalarType())); + Type *Ty = I.getType(); + assert(Ty->isFloatingPointTy() && "Instruction should have been scalarized"); + assert(FRemExpander::canExpandType(Ty)); FastMathFlags FMF = I.getFastMathFlags(); // TODO Make use of those flags for optimization? @@ -368,32 +369,10 @@ static bool expandFRem(BinaryOperator &I, std::optional<SimplifyQuery> &SQ) { B.setFastMathFlags(FMF); B.SetCurrentDebugLocation(I.getDebugLoc()); - Type *ElemTy = ReturnTy->getScalarType(); - const FRemExpander Expander = FRemExpander::create(B, ElemTy); - - Value *Ret; - if (ReturnTy->isFloatingPointTy()) - Ret = FMF.approxFunc() - ? Expander.buildApproxFRem(I.getOperand(0), I.getOperand(1)) - : Expander.buildFRem(I.getOperand(0), I.getOperand(1), SQ); - else { - auto *VecTy = cast<FixedVectorType>(ReturnTy); - - // This could use SplitBlockAndInsertForEachLane but the interface - // is a bit awkward for a constant number of elements and it will - // boil down to the same code. - // TODO Expand the FRem instruction only once and reuse the code. - Value *Nums = I.getOperand(0); - Value *Denums = I.getOperand(1); - Ret = PoisonValue::get(I.getType()); - for (int I = 0, E = VecTy->getNumElements(); I != E; ++I) { - Value *Num = B.CreateExtractElement(Nums, I); - Value *Denum = B.CreateExtractElement(Denums, I); - Value *Rem = FMF.approxFunc() ? Expander.buildApproxFRem(Num, Denum) - : Expander.buildFRem(Num, Denum, SQ); - Ret = B.CreateInsertElement(Ret, Rem, I); - } - } + const FRemExpander Expander = FRemExpander::create(B, Ty); + Value *Ret = FMF.approxFunc() + ? Expander.buildApproxFRem(I.getOperand(0), I.getOperand(1)) + : Expander.buildFRem(I.getOperand(0), I.getOperand(1), SQ); I.replaceAllUsesWith(Ret); Ret->takeName(&I); @@ -948,12 +927,21 @@ static void scalarize(Instruction *I, SmallVectorImpl<Instruction *> &Replace) { Value *Result = PoisonValue::get(VTy); for (unsigned Idx = 0; Idx < NumElements; ++Idx) { Value *Ext = Builder.CreateExtractElement(I->getOperand(0), Idx); - Value *Cast = Builder.CreateCast(cast<CastInst>(I)->getOpcode(), Ext, - I->getType()->getScalarType()); - Result = Builder.CreateInsertElement(Result, Cast, Idx); - if (isa<Instruction>(Cast)) - Replace.push_back(cast<Instruction>(Cast)); + Value *Op; + if (isa<BinaryOperator>(I)) + Op = Builder.CreateBinOp( + cast<BinaryOperator>(I)->getOpcode(), Ext, + Builder.CreateExtractElement(I->getOperand(1), Idx)); + else + Op = Builder.CreateCast(cast<CastInst>(I)->getOpcode(), Ext, + I->getType()->getScalarType()); + Result = Builder.CreateInsertElement(Result, Op, Idx); + if (auto *ScalarizedI = dyn_cast<Instruction>(Op)) { + ScalarizedI->copyIRFlags(I, true); + Replace.push_back(ScalarizedI); + } } + I->replaceAllUsesWith(Result); I->dropAllReferences(); I->eraseFromParent(); @@ -989,6 +977,16 @@ static bool targetSupportsFrem(const TargetLowering &TLI, Type *Ty) { return TLI.getLibcallName(fremToLibcall(Ty->getScalarType())); } +static void enqueueInstruction(Instruction &I, + SmallVector<Instruction *, 4> &Replace, + SmallVector<Instruction *, 4> &ReplaceVector) { + + if (I.getOperand(0)->getType()->isVectorTy()) + ReplaceVector.push_back(&I); + else + Replace.push_back(&I); +} + static bool runImpl(Function &F, const TargetLowering &TLI, AssumptionCache *AC) { SmallVector<Instruction *, 4> Replace; @@ -1004,55 +1002,37 @@ static bool runImpl(Function &F, const TargetLowering &TLI, return false; for (auto &I : instructions(F)) { - switch (I.getOpcode()) { - case Instruction::FRem: { - Type *Ty = I.getType(); - // TODO: This pass doesn't handle scalable vectors. - if (Ty->isScalableTy()) - continue; - - if (targetSupportsFrem(TLI, Ty) || - !FRemExpander::canExpandType(Ty->getScalarType())) - continue; - - Replace.push_back(&I); - Modified = true; + Type *Ty = I.getType(); + // TODO: This pass doesn't handle scalable vectors. + if (Ty->isScalableTy()) + continue; + switch (I.getOpcode()) { + case Instruction::FRem: + if (!targetSupportsFrem(TLI, Ty) && + FRemExpander::canExpandType(Ty->getScalarType())) { + enqueueInstruction(I, Replace, ReplaceVector); + Modified = true; + } break; - } case Instruction::FPToUI: case Instruction::FPToSI: { - // TODO: This pass doesn't handle scalable vectors. - if (I.getOperand(0)->getType()->isScalableTy()) - continue; - - auto *IntTy = cast<IntegerType>(I.getType()->getScalarType()); + auto *IntTy = cast<IntegerType>(Ty->getScalarType()); if (IntTy->getIntegerBitWidth() <= MaxLegalFpConvertBitWidth) continue; - if (I.getOperand(0)->getType()->isVectorTy()) - ReplaceVector.push_back(&I); - else - Replace.push_back(&I); + enqueueInstruction(I, Replace, ReplaceVector); Modified = true; break; } case Instruction::UIToFP: case Instruction::SIToFP: { - // TODO: This pass doesn't handle scalable vectors. - if (I.getOperand(0)->getType()->isScalableTy()) - continue; - auto *IntTy = cast<IntegerType>(I.getOperand(0)->getType()->getScalarType()); if (IntTy->getIntegerBitWidth() <= MaxLegalFpConvertBitWidth) continue; - if (I.getOperand(0)->getType()->isVectorTy()) - ReplaceVector.push_back(&I); - else - Replace.push_back(&I); - Modified = true; + enqueueInstruction(I, Replace, ReplaceVector); break; } default: diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/frem.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/frem.ll index 302b2395642d0..c87cfbdfe87b7 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/frem.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/frem.ll @@ -1048,7 +1048,7 @@ define amdgpu_kernel void @frem_v2f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; CI-NEXT: v_cvt_f32_f16_e64 v1, |s1| ; CI-NEXT: v_cmp_ngt_f32_e32 vcc, v2, v1 ; CI-NEXT: s_cbranch_vccz .LBB9_2 -; CI-NEXT: ; %bb.1: ; %frem.else +; CI-NEXT: ; %bb.1: ; %frem.else20 ; CI-NEXT: s_and_b32 s2, s0, 0x8000 ; CI-NEXT: v_cmp_eq_f32_e32 vcc, v2, v1 ; CI-NEXT: v_mov_b32_e32 v0, s2 @@ -1059,7 +1059,7 @@ define amdgpu_kernel void @frem_v2f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; CI-NEXT: s_xor_b32 s2, s2, 1 ; CI-NEXT: s_cmp_lg_u32 s2, 0 ; CI-NEXT: s_cbranch_scc1 .LBB9_8 -; CI-NEXT: ; %bb.3: ; %frem.compute +; CI-NEXT: ; %bb.3: ; %frem.compute19 ; CI-NEXT: v_frexp_mant_f32_e32 v3, v1 ; CI-NEXT: v_frexp_exp_i32_f32_e32 v6, v1 ; CI-NEXT: v_ldexp_f32_e64 v1, v3, 1 @@ -1084,10 +1084,10 @@ define amdgpu_kernel void @frem_v2f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; CI-NEXT: v_cmp_ge_i32_e32 vcc, 11, v2 ; CI-NEXT: v_div_fixup_f32 v3, v3, v1, 1.0 ; CI-NEXT: s_cbranch_vccnz .LBB9_6 -; CI-NEXT: ; %bb.4: ; %frem.loop_body.preheader +; CI-NEXT: ; %bb.4: ; %frem.loop_body27.preheader ; CI-NEXT: v_add_i32_e32 v2, vcc, 11, v5 ; CI-NEXT: v_sub_i32_e32 v2, vcc, v2, v6 -; CI-NEXT: .LBB9_5: ; %frem.loop_body +; CI-NEXT: .LBB9_5: ; %frem.loop_body27 ; CI-NEXT: ; =>This Inner Loop Header: Depth=1 ; CI-NEXT: v_mov_b32_e32 v5, v4 ; CI-NEXT: v_mul_f32_e32 v4, v5, v3 @@ -1103,7 +1103,7 @@ define amdgpu_kernel void @frem_v2f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; CI-NEXT: s_branch .LBB9_7 ; CI-NEXT: .LBB9_6: ; CI-NEXT: v_mov_b32_e32 v5, v4 -; CI-NEXT: .LBB9_7: ; %frem.loop_exit +; CI-NEXT: .LBB9_7: ; %frem.loop_exit28 ; CI-NEXT: v_add_i32_e32 v2, vcc, -10, v2 ; CI-NEXT: v_ldexp_f32_e32 v2, v5, v2 ; CI-NEXT: v_mul_f32_e32 v3, v2, v3 @@ -1126,7 +1126,7 @@ define amdgpu_kernel void @frem_v2f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; CI-NEXT: ; implicit-def: $vgpr1 ; CI-NEXT: v_cmp_ngt_f32_e32 vcc, v3, v2 ; CI-NEXT: s_cbranch_vccz .LBB9_10 -; CI-NEXT: ; %bb.9: ; %frem.else20 +; CI-NEXT: ; %bb.9: ; %frem.else ; CI-NEXT: s_and_b32 s4, s2, 0x8000 ; CI-NEXT: v_cmp_eq_f32_e32 vcc, v3, v2 ; CI-NEXT: v_mov_b32_e32 v1, s4 @@ -1137,7 +1137,7 @@ define amdgpu_kernel void @frem_v2f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; CI-NEXT: s_xor_b32 s4, s4, 1 ; CI-NEXT: s_cmp_lg_u32 s4, 0 ; CI-NEXT: s_cbranch_scc1 .LBB9_16 -; CI-NEXT: ; %bb.11: ; %frem.compute19 +; CI-NEXT: ; %bb.11: ; %frem.compute ; CI-NEXT: v_frexp_mant_f32_e32 v4, v2 ; CI-NEXT: v_frexp_exp_i32_f32_e32 v7, v2 ; CI-NEXT: v_ldexp_f32_e64 v2, v4, 1 @@ -1162,10 +1162,10 @@ define amdgpu_kernel void @frem_v2f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; CI-NEXT: v_cmp_ge_i32_e32 vcc, 11, v3 ; CI-NEXT: v_div_fixup_f32 v4, v4, v2, 1.0 ; CI-NEXT: s_cbranch_vccnz .LBB9_14 -; CI-NEXT: ; %bb.12: ; %frem.loop_body27.preheader +; CI-NEXT: ; %bb.12: ; %frem.loop_body.preheader ; CI-NEXT: v_add_i32_e32 v3, vcc, 11, v6 ; CI-NEXT: v_sub_i32_e32 v3, vcc, v3, v7 -; CI-NEXT: .LBB9_13: ; %frem.loop_body27 +; CI-NEXT: .LBB9_13: ; %frem.loop_body ; CI-NEXT: ; =>This Inner Loop Header: Depth=1 ; CI-NEXT: v_mov_b32_e32 v6, v5 ; CI-NEXT: v_mul_f32_e32 v5, v6, v4 @@ -1181,7 +1181,7 @@ define amdgpu_kernel void @frem_v2f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; CI-NEXT: s_branch .LBB9_15 ; CI-NEXT: .LBB9_14: ; CI-NEXT: v_mov_b32_e32 v6, v5 -; CI-NEXT: .LBB9_15: ; %frem.loop_exit28 +; CI-NEXT: .LBB9_15: ; %frem.loop_exit ; CI-NEXT: v_add_i32_e32 v3, vcc, -10, v3 ; CI-NEXT: v_ldexp_f32_e32 v3, v6, v3 ; CI-NEXT: v_mul_f32_e32 v4, v3, v4 @@ -1239,7 +1239,7 @@ define amdgpu_kernel void @frem_v2f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; VI-NEXT: v_cvt_f32_f16_e64 v1, |s1| ; VI-NEXT: v_cmp_ngt_f32_e32 vcc, v2, v1 ; VI-NEXT: s_cbranch_vccz .LBB9_2 -; VI-NEXT: ; %bb.1: ; %frem.else +; VI-NEXT: ; %bb.1: ; %frem.else20 ; VI-NEXT: s_and_b32 s2, s0, 0x8000 ; VI-NEXT: v_cmp_eq_f32_e32 vcc, v2, v1 ; VI-NEXT: v_mov_b32_e32 v0, s2 @@ -1250,7 +1250,7 @@ define amdgpu_kernel void @frem_v2f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; VI-NEXT: s_xor_b32 s2, s2, 1 ; VI-NEXT: s_cmp_lg_u32 s2, 0 ; VI-NEXT: s_cbranch_scc1 .LBB9_8 -; VI-NEXT: ; %bb.3: ; %frem.compute +; VI-NEXT: ; %bb.3: ; %frem.compute19 ; VI-NEXT: v_frexp_mant_f32_e32 v3, v1 ; VI-NEXT: v_frexp_exp_i32_f32_e32 v6, v1 ; VI-NEXT: v_ldexp_f32 v1, v3, 1 @@ -1275,10 +1275,10 @@ define amdgpu_kernel void @frem_v2f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; VI-NEXT: v_cmp_ge_i32_e32 vcc, 11, v2 ; VI-NEXT: v_div_fixup_f32 v3, v3, v1, 1.0 ; VI-NEXT: s_cbranch_vccnz .LBB9_6 -; VI-NEXT: ; %bb.4: ; %frem.loop_body.preheader +; VI-NEXT: ; %bb.4: ; %frem.loop_body27.preheader ; VI-NEXT: v_add_u32_e32 v2, vcc, 11, v5 ; VI-NEXT: v_sub_u32_e32 v2, vcc, v2, v6 -; VI-NEXT: .LBB9_5: ; %frem.loop_body +; VI-NEXT: .LBB9_5: ; %frem.loop_body27 ; VI-NEXT: ; =>This Inner Loop Header: Depth=1 ; VI-NEXT: v_mov_b32_e32 v5, v4 ; VI-NEXT: v_mul_f32_e32 v4, v5, v3 @@ -1294,7 +1294,7 @@ define amdgpu_kernel void @frem_v2f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; VI-NEXT: s_branch .LBB9_7 ; VI-NEXT: .LBB9_6: ; VI-NEXT: v_mov_b32_e32 v5, v4 -; VI-NEXT: .LBB9_7: ; %frem.loop_exit +; VI-NEXT: .LBB9_7: ; %frem.loop_exit28 ; VI-NEXT: v_add_u32_e32 v2, vcc, -10, v2 ; VI-NEXT: v_ldexp_f32 v2, v5, v2 ; VI-NEXT: v_mul_f32_e32 v3, v2, v3 @@ -1317,7 +1317,7 @@ define amdgpu_kernel void @frem_v2f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; VI-NEXT: ; implicit-def: $vgpr1 ; VI-NEXT: v_cmp_ngt_f32_e32 vcc, v3, v2 ; VI-NEXT: s_cbranch_vccz .LBB9_10 -; VI-NEXT: ; %bb.9: ; %frem.else20 +; VI-NEXT: ; %bb.9: ; %frem.else ; VI-NEXT: s_and_b32 s3, s4, 0x8000 ; VI-NEXT: v_cmp_eq_f32_e32 vcc, v3, v2 ; VI-NEXT: v_mov_b32_e32 v1, s3 @@ -1328,7 +1328,7 @@ define amdgpu_kernel void @frem_v2f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; VI-NEXT: s_xor_b32 s3, s3, 1 ; VI-NEXT: s_cmp_lg_u32 s3, 0 ; VI-NEXT: s_cbranch_scc1 .LBB9_16 -; VI-NEXT: ; %bb.11: ; %frem.compute19 +; VI-NEXT: ; %bb.11: ; %frem.compute ; VI-NEXT: v_frexp_mant_f32_e32 v4, v2 ; VI-NEXT: v_frexp_exp_i32_f32_e32 v7, v2 ; VI-NEXT: v_ldexp_f32 v2, v4, 1 @@ -1353,10 +1353,10 @@ define amdgpu_kernel void @frem_v2f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; VI-NEXT: v_cmp_ge_i32_e32 vcc, 11, v3 ; VI-NEXT: v_div_fixup_f32 v4, v4, v2, 1.0 ; VI-NEXT: s_cbranch_vccnz .LBB9_14 -; VI-NEXT: ; %bb.12: ; %frem.loop_body27.preheader +; VI-NEXT: ; %bb.12: ; %frem.loop_body.preheader ; VI-NEXT: v_add_u32_e32 v3, vcc, 11, v6 ; VI-NEXT: v_sub_u32_e32 v3, vcc, v3, v7 -; VI-NEXT: .LBB9_13: ; %frem.loop_body27 +; VI-NEXT: .LBB9_13: ; %frem.loop_body ; VI-NEXT: ; =>This Inner Loop Header: Depth=1 ; VI-NEXT: v_mov_b32_e32 v6, v5 ; VI-NEXT: v_mul_f32_e32 v5, v6, v4 @@ -1372,7 +1372,7 @@ define amdgpu_kernel void @frem_v2f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; VI-NEXT: s_branch .LBB9_15 ; VI-NEXT: .LBB9_14: ; VI-NEXT: v_mov_b32_e32 v6, v5 -; VI-NEXT: .LBB9_15: ; %frem.loop_exit28 +; VI-NEXT: .LBB9_15: ; %frem.loop_exit ; VI-NEXT: v_add_u32_e32 v3, vcc, -10, v3 ; VI-NEXT: v_ldexp_f32 v3, v6, v3 ; VI-NEXT: v_mul_f32_e32 v4, v3, v4 @@ -1427,7 +1427,7 @@ define amdgpu_kernel void @frem_v4f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; CI-NEXT: v_cvt_f32_f16_e64 v1, |s2| ; CI-NEXT: v_cmp_ngt_f32_e32 vcc, v2, v1 ; CI-NEXT: s_cbranch_vccz .LBB10_2 -; CI-NEXT: ; %bb.1: ; %frem.else +; CI-NEXT: ; %bb.1: ; %frem.else86 ; CI-NEXT: s_and_b32 s0, s4, 0x8000 ; CI-NEXT: v_cmp_eq_f32_e32 vcc, v2, v1 ; CI-NEXT: v_mov_b32_e32 v0, s0 @@ -1438,7 +1438,7 @@ define amdgpu_kernel void @frem_v4f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; CI-NEXT: s_xor_b32 s0, s0, 1 ; CI-NEXT: s_cmp_lg_u32 s0, 0 ; CI-NEXT: s_cbranch_scc1 .LBB10_8 -; CI-NEXT: ; %bb.3: ; %frem.compute +; CI-NEXT: ; %bb.3: ; %frem.compute85 ; CI-NEXT: v_frexp_mant_f32_e32 v3, v1 ; CI-NEXT: v_frexp_exp_i32_f32_e32 v6, v1 ; CI-NEXT: v_ldexp_f32_e64 v1, v3, 1 @@ -1463,10 +1463,10 @@ define amdgpu_kernel void @frem_v4f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; CI-NEXT: v_cmp_ge_i32_e32 vcc, 11, v2 ; CI-NEXT: v_div_fixup_f32 v3, v3, v1, 1.0 ; CI-NEXT: s_cbranch_vccnz .LBB10_6 -; CI-NEXT: ; %bb.4: ; %frem.loop_body.preheader +; CI-NEXT: ; %bb.4: ; %frem.loop_body93.preheader ; CI-NEXT: v_add_i32_e32 v2, vcc, 11, v5 ; CI-NEXT: v_sub_i32_e32 v2, vcc, v2, v6 -; CI-NEXT: .LBB10_5: ; %frem.loop_body +; CI-NEXT: .LBB10_5: ; %frem.loop_body93 ; CI-NEXT: ; =>This Inner Loop Header: Depth=1 ; CI-NEXT: v_mov_b32_e32 v5, v4 ; CI-NEXT: v_mul_f32_e32 v4, v5, v3 @@ -1482,7 +1482,7 @@ define amdgpu_kernel void @frem_v4f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; CI-NEXT: s_branch .LBB10_7 ; CI-NEXT: .LBB10_6: ; CI-NEXT: v_mov_b32_e32 v5, v4 -; CI-NEXT: .LBB10_7: ; %frem.loop_exit +; CI-NEXT: .LBB10_7: ; %frem.loop_exit94 ; CI-NEXT: v_add_i32_e32 v2, vcc, -10, v2 ; CI-NEXT: v_ldexp_f32_e32 v2, v5, v2 ; CI-NEXT: v_mul_f32_e32 v3, v2, v3 @@ -1505,7 +1505,7 @@ define amdgpu_kernel void @frem_v4f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; CI-NEXT: ; implicit-def: $vgpr1 ; CI-NEXT: v_cmp_ngt_f32_e32 vcc, v3, v2 ; CI-NEXT: s_cbranch_vccz .LBB10_10 -; CI-NEXT: ; %bb.9: ; %frem.else20 +; CI-NEXT: ; %bb.9: ; %frem.else53 ; CI-NEXT: s_and_b32 s1, s6, 0x8000 ; CI-NEXT: v_cmp_eq_f32_e32 vcc, v3, v2 ; CI-NEXT: v_mov_b32_e32 v1, s1 @@ -1516,7 +1516,7 @@ define amdgpu_kernel void @frem_v4f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; CI-NEXT: s_xor_b32 s1, s1, 1 ; CI-NEXT: s_cmp_lg_u32 s1, 0 ; CI-NEXT: s_cbranch_scc1 .LBB10_16 -; CI-NEXT: ; %bb.11: ; %frem.compute19 +; CI-NEXT: ; %bb.11: ; %frem.compute52 ; CI-NEXT: v_frexp_mant_f32_e32 v4, v2 ; CI-NEXT: v_frexp_exp_i32_f32_e32 v7, v2 ; CI-NEXT: v_ldexp_f32_e64 v2, v4, 1 @@ -1541,10 +1541,10 @@ define amdgpu_kernel void @frem_v4f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; CI-NEXT: v_cmp_ge_i32_e32 vcc, 11, v3 ; CI-NEXT: v_div_fixup_f32 v4, v4, v2, 1.0 ; CI-NEXT: s_cbranch_vccnz .LBB10_14 -; CI-NEXT: ; %bb.12: ; %frem.loop_body27.preheader +; CI-NEXT: ; %bb.12: ; %frem.loop_body60.preheader ; CI-NEXT: v_add_i32_e32 v3, vcc, 11, v6 ; CI-NEXT: v_sub_i32_e32 v3, vcc, v3, v7 -; CI-NEXT: .LBB10_13: ; %frem.loop_body27 +; CI-NEXT: .LBB10_13: ; %frem.loop_body60 ; CI-NEXT: ; =>This Inner Loop Header: Depth=1 ; CI-NEXT: v_mov_b32_e32 v6, v5 ; CI-NEXT: v_mul_f32_e32 v5, v6, v4 @@ -1560,7 +1560,7 @@ define amdgpu_kernel void @frem_v4f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; CI-NEXT: s_branch .LBB10_15 ; CI-NEXT: .LBB10_14: ; CI-NEXT: v_mov_b32_e32 v6, v5 -; CI-NEXT: .LBB10_15: ; %frem.loop_exit28 +; CI-NEXT: .LBB10_15: ; %frem.loop_exit61 ; CI-NEXT: v_add_i32_e32 v3, vcc, -10, v3 ; CI-NEXT: v_ldexp_f32_e32 v3, v6, v3 ; CI-NEXT: v_mul_f32_e32 v4, v3, v4 @@ -1581,7 +1581,7 @@ define amdgpu_kernel void @frem_v4f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; CI-NEXT: ; implicit-def: $vgpr2 ; CI-NEXT: v_cmp_ngt_f32_e32 vcc, v4, v3 ; CI-NEXT: s_cbranch_vccz .LBB10_18 -; CI-NEXT: ; %bb.17: ; %frem.else53 +; CI-NEXT: ; %bb.17: ; %frem.else20 ; CI-NEXT: s_and_b32 s1, s5, 0x8000 ; CI-NEXT: v_cmp_eq_f32_e32 vcc, v4, v3 ; CI-NEXT: v_mov_b32_e32 v2, s1 @@ -1592,7 +1592,7 @@ define amdgpu_kernel void @frem_v4f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; CI-NEXT: s_xor_b32 s1, s1, 1 ; CI-NEXT: s_cmp_lg_u32 s1, 0 ; CI-NEXT: s_cbranch_scc1 .LBB10_24 -; CI-NEXT: ; %bb.19: ; %frem.compute52 +; CI-NEXT: ; %bb.19: ; %frem.compute19 ; CI-NEXT: v_frexp_mant_f32_e32 v5, v3 ; CI-NEXT: v_frexp_exp_i32_f32_e32 v8, v3 ; CI-NEXT: v_ldexp_f32_e64 v3, v5, 1 @@ -1617,10 +1617,10 @@ define amdgpu_kernel void @frem_v4f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; CI-NEXT: v_cmp_ge_i32_e32 vcc, 11, v4 ; CI-NEXT: v_div_fixup_f32 v5, v5, v3, 1.0 ; CI-NEXT: s_cbranch_vccnz .LBB10_22 -; CI-NEXT: ; %bb.20: ; %frem.loop_body60.preheader +; CI-NEXT: ; %bb.20: ; %frem.loop_body27.preheader ; CI-NEXT: v_add_i32_e32 v4, vcc, 11, v7 ; CI-NEXT: v_sub_i32_e32 v4, vcc, v4, v8 -; CI-NEXT: .LBB10_21: ; %frem.loop_body60 +; CI-NEXT: .LBB10_21: ; %frem.loop_body27 ; CI-NEXT: ; =>This Inner Loop Header: Depth=1 ; CI-NEXT: v_mov_b32_e32 v7, v6 ; CI-NEXT: v_mul_f32_e32 v6, v7, v5 @@ -1636,7 +1636,7 @@ define amdgpu_kernel void @frem_v4f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; CI-NEXT: s_branch .LBB10_23 ; CI-NEXT: .LBB10_22: ; CI-NEXT: v_mov_b32_e32 v7, v6 -; CI-NEXT: .LBB10_23: ; %frem.loop_exit61 +; CI-NEXT: .LBB10_23: ; %frem.loop_exit28 ; CI-NEXT: v_add_i32_e32 v4, vcc, -10, v4 ; CI-NEXT: v_ldexp_f32_e32 v4, v7, v4 ; CI-NEXT: v_mul_f32_e32 v5, v4, v5 @@ -1659,7 +1659,7 @@ define amdgpu_kernel void @frem_v4f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; CI-NEXT: ; implicit-def: $vgpr3 ; CI-NEXT: v_cmp_ngt_f32_e32 vcc, v5, v4 ; CI-NEXT: s_cbranch_vccz .LBB10_26 -; CI-NEXT: ; %bb.25: ; %frem.else86 +; CI-NEXT: ; %bb.25: ; %frem.else ; CI-NEXT: s_and_b32 s1, s7, 0x8000 ; CI-NEXT: v_cmp_eq_f32_e32 vcc, v5, v4 ; CI-NEXT: v_mov_b32_e32 v3, s1 @@ -1670,7 +1670,7 @@ define amdgpu_kernel void @frem_v4f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; CI-NEXT: s_xor_b32 s1, s1, 1 ; ... [truncated] 
@llvmbot
Copy link
Member

llvmbot commented Sep 15, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Frederik Harwath (frederik-h)

Changes

Extend the existing "scalarize" function which is used for the
fp-integer conversion instruction expansion to BinaryOperator instructions
and reuse it for the frem expansion; a similar function for scalarizing BinaryOperator
instructions exists in the ExpandLargeDivRem pass and this change is a step towards
merging that pass with ExpandFp.

Further refactoring: Extract a function to dispatch instructions to the scalar and
vector queues and hoist a check for scalable vectors to the top of the
instruction visiting loop.


Patch is 383.21 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/158588.diff

4 Files Affected:

  • (modified) llvm/lib/CodeGen/ExpandFp.cpp (+45-65)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/frem.ll (+140-140)
  • (modified) llvm/test/CodeGen/AMDGPU/frem.ll (+650-650)
  • (modified) llvm/test/Transforms/ExpandFp/AMDGPU/frem.ll (+140-140)
diff --git a/llvm/lib/CodeGen/ExpandFp.cpp b/llvm/lib/CodeGen/ExpandFp.cpp index 9cc6c6a706c58..e336da7b914f0 100644 --- a/llvm/lib/CodeGen/ExpandFp.cpp +++ b/llvm/lib/CodeGen/ExpandFp.cpp @@ -356,8 +356,9 @@ Value *FRemExpander::buildFRem(Value *X, Value *Y, static bool expandFRem(BinaryOperator &I, std::optional<SimplifyQuery> &SQ) { LLVM_DEBUG(dbgs() << "Expanding instruction: " << I << '\n'); - Type *ReturnTy = I.getType(); - assert(FRemExpander::canExpandType(ReturnTy->getScalarType())); + Type *Ty = I.getType(); + assert(Ty->isFloatingPointTy() && "Instruction should have been scalarized"); + assert(FRemExpander::canExpandType(Ty)); FastMathFlags FMF = I.getFastMathFlags(); // TODO Make use of those flags for optimization? @@ -368,32 +369,10 @@ static bool expandFRem(BinaryOperator &I, std::optional<SimplifyQuery> &SQ) { B.setFastMathFlags(FMF); B.SetCurrentDebugLocation(I.getDebugLoc()); - Type *ElemTy = ReturnTy->getScalarType(); - const FRemExpander Expander = FRemExpander::create(B, ElemTy); - - Value *Ret; - if (ReturnTy->isFloatingPointTy()) - Ret = FMF.approxFunc() - ? Expander.buildApproxFRem(I.getOperand(0), I.getOperand(1)) - : Expander.buildFRem(I.getOperand(0), I.getOperand(1), SQ); - else { - auto *VecTy = cast<FixedVectorType>(ReturnTy); - - // This could use SplitBlockAndInsertForEachLane but the interface - // is a bit awkward for a constant number of elements and it will - // boil down to the same code. - // TODO Expand the FRem instruction only once and reuse the code. - Value *Nums = I.getOperand(0); - Value *Denums = I.getOperand(1); - Ret = PoisonValue::get(I.getType()); - for (int I = 0, E = VecTy->getNumElements(); I != E; ++I) { - Value *Num = B.CreateExtractElement(Nums, I); - Value *Denum = B.CreateExtractElement(Denums, I); - Value *Rem = FMF.approxFunc() ? Expander.buildApproxFRem(Num, Denum) - : Expander.buildFRem(Num, Denum, SQ); - Ret = B.CreateInsertElement(Ret, Rem, I); - } - } + const FRemExpander Expander = FRemExpander::create(B, Ty); + Value *Ret = FMF.approxFunc() + ? Expander.buildApproxFRem(I.getOperand(0), I.getOperand(1)) + : Expander.buildFRem(I.getOperand(0), I.getOperand(1), SQ); I.replaceAllUsesWith(Ret); Ret->takeName(&I); @@ -948,12 +927,21 @@ static void scalarize(Instruction *I, SmallVectorImpl<Instruction *> &Replace) { Value *Result = PoisonValue::get(VTy); for (unsigned Idx = 0; Idx < NumElements; ++Idx) { Value *Ext = Builder.CreateExtractElement(I->getOperand(0), Idx); - Value *Cast = Builder.CreateCast(cast<CastInst>(I)->getOpcode(), Ext, - I->getType()->getScalarType()); - Result = Builder.CreateInsertElement(Result, Cast, Idx); - if (isa<Instruction>(Cast)) - Replace.push_back(cast<Instruction>(Cast)); + Value *Op; + if (isa<BinaryOperator>(I)) + Op = Builder.CreateBinOp( + cast<BinaryOperator>(I)->getOpcode(), Ext, + Builder.CreateExtractElement(I->getOperand(1), Idx)); + else + Op = Builder.CreateCast(cast<CastInst>(I)->getOpcode(), Ext, + I->getType()->getScalarType()); + Result = Builder.CreateInsertElement(Result, Op, Idx); + if (auto *ScalarizedI = dyn_cast<Instruction>(Op)) { + ScalarizedI->copyIRFlags(I, true); + Replace.push_back(ScalarizedI); + } } + I->replaceAllUsesWith(Result); I->dropAllReferences(); I->eraseFromParent(); @@ -989,6 +977,16 @@ static bool targetSupportsFrem(const TargetLowering &TLI, Type *Ty) { return TLI.getLibcallName(fremToLibcall(Ty->getScalarType())); } +static void enqueueInstruction(Instruction &I, + SmallVector<Instruction *, 4> &Replace, + SmallVector<Instruction *, 4> &ReplaceVector) { + + if (I.getOperand(0)->getType()->isVectorTy()) + ReplaceVector.push_back(&I); + else + Replace.push_back(&I); +} + static bool runImpl(Function &F, const TargetLowering &TLI, AssumptionCache *AC) { SmallVector<Instruction *, 4> Replace; @@ -1004,55 +1002,37 @@ static bool runImpl(Function &F, const TargetLowering &TLI, return false; for (auto &I : instructions(F)) { - switch (I.getOpcode()) { - case Instruction::FRem: { - Type *Ty = I.getType(); - // TODO: This pass doesn't handle scalable vectors. - if (Ty->isScalableTy()) - continue; - - if (targetSupportsFrem(TLI, Ty) || - !FRemExpander::canExpandType(Ty->getScalarType())) - continue; - - Replace.push_back(&I); - Modified = true; + Type *Ty = I.getType(); + // TODO: This pass doesn't handle scalable vectors. + if (Ty->isScalableTy()) + continue; + switch (I.getOpcode()) { + case Instruction::FRem: + if (!targetSupportsFrem(TLI, Ty) && + FRemExpander::canExpandType(Ty->getScalarType())) { + enqueueInstruction(I, Replace, ReplaceVector); + Modified = true; + } break; - } case Instruction::FPToUI: case Instruction::FPToSI: { - // TODO: This pass doesn't handle scalable vectors. - if (I.getOperand(0)->getType()->isScalableTy()) - continue; - - auto *IntTy = cast<IntegerType>(I.getType()->getScalarType()); + auto *IntTy = cast<IntegerType>(Ty->getScalarType()); if (IntTy->getIntegerBitWidth() <= MaxLegalFpConvertBitWidth) continue; - if (I.getOperand(0)->getType()->isVectorTy()) - ReplaceVector.push_back(&I); - else - Replace.push_back(&I); + enqueueInstruction(I, Replace, ReplaceVector); Modified = true; break; } case Instruction::UIToFP: case Instruction::SIToFP: { - // TODO: This pass doesn't handle scalable vectors. - if (I.getOperand(0)->getType()->isScalableTy()) - continue; - auto *IntTy = cast<IntegerType>(I.getOperand(0)->getType()->getScalarType()); if (IntTy->getIntegerBitWidth() <= MaxLegalFpConvertBitWidth) continue; - if (I.getOperand(0)->getType()->isVectorTy()) - ReplaceVector.push_back(&I); - else - Replace.push_back(&I); - Modified = true; + enqueueInstruction(I, Replace, ReplaceVector); break; } default: diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/frem.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/frem.ll index 302b2395642d0..c87cfbdfe87b7 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/frem.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/frem.ll @@ -1048,7 +1048,7 @@ define amdgpu_kernel void @frem_v2f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; CI-NEXT: v_cvt_f32_f16_e64 v1, |s1| ; CI-NEXT: v_cmp_ngt_f32_e32 vcc, v2, v1 ; CI-NEXT: s_cbranch_vccz .LBB9_2 -; CI-NEXT: ; %bb.1: ; %frem.else +; CI-NEXT: ; %bb.1: ; %frem.else20 ; CI-NEXT: s_and_b32 s2, s0, 0x8000 ; CI-NEXT: v_cmp_eq_f32_e32 vcc, v2, v1 ; CI-NEXT: v_mov_b32_e32 v0, s2 @@ -1059,7 +1059,7 @@ define amdgpu_kernel void @frem_v2f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; CI-NEXT: s_xor_b32 s2, s2, 1 ; CI-NEXT: s_cmp_lg_u32 s2, 0 ; CI-NEXT: s_cbranch_scc1 .LBB9_8 -; CI-NEXT: ; %bb.3: ; %frem.compute +; CI-NEXT: ; %bb.3: ; %frem.compute19 ; CI-NEXT: v_frexp_mant_f32_e32 v3, v1 ; CI-NEXT: v_frexp_exp_i32_f32_e32 v6, v1 ; CI-NEXT: v_ldexp_f32_e64 v1, v3, 1 @@ -1084,10 +1084,10 @@ define amdgpu_kernel void @frem_v2f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; CI-NEXT: v_cmp_ge_i32_e32 vcc, 11, v2 ; CI-NEXT: v_div_fixup_f32 v3, v3, v1, 1.0 ; CI-NEXT: s_cbranch_vccnz .LBB9_6 -; CI-NEXT: ; %bb.4: ; %frem.loop_body.preheader +; CI-NEXT: ; %bb.4: ; %frem.loop_body27.preheader ; CI-NEXT: v_add_i32_e32 v2, vcc, 11, v5 ; CI-NEXT: v_sub_i32_e32 v2, vcc, v2, v6 -; CI-NEXT: .LBB9_5: ; %frem.loop_body +; CI-NEXT: .LBB9_5: ; %frem.loop_body27 ; CI-NEXT: ; =>This Inner Loop Header: Depth=1 ; CI-NEXT: v_mov_b32_e32 v5, v4 ; CI-NEXT: v_mul_f32_e32 v4, v5, v3 @@ -1103,7 +1103,7 @@ define amdgpu_kernel void @frem_v2f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; CI-NEXT: s_branch .LBB9_7 ; CI-NEXT: .LBB9_6: ; CI-NEXT: v_mov_b32_e32 v5, v4 -; CI-NEXT: .LBB9_7: ; %frem.loop_exit +; CI-NEXT: .LBB9_7: ; %frem.loop_exit28 ; CI-NEXT: v_add_i32_e32 v2, vcc, -10, v2 ; CI-NEXT: v_ldexp_f32_e32 v2, v5, v2 ; CI-NEXT: v_mul_f32_e32 v3, v2, v3 @@ -1126,7 +1126,7 @@ define amdgpu_kernel void @frem_v2f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; CI-NEXT: ; implicit-def: $vgpr1 ; CI-NEXT: v_cmp_ngt_f32_e32 vcc, v3, v2 ; CI-NEXT: s_cbranch_vccz .LBB9_10 -; CI-NEXT: ; %bb.9: ; %frem.else20 +; CI-NEXT: ; %bb.9: ; %frem.else ; CI-NEXT: s_and_b32 s4, s2, 0x8000 ; CI-NEXT: v_cmp_eq_f32_e32 vcc, v3, v2 ; CI-NEXT: v_mov_b32_e32 v1, s4 @@ -1137,7 +1137,7 @@ define amdgpu_kernel void @frem_v2f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; CI-NEXT: s_xor_b32 s4, s4, 1 ; CI-NEXT: s_cmp_lg_u32 s4, 0 ; CI-NEXT: s_cbranch_scc1 .LBB9_16 -; CI-NEXT: ; %bb.11: ; %frem.compute19 +; CI-NEXT: ; %bb.11: ; %frem.compute ; CI-NEXT: v_frexp_mant_f32_e32 v4, v2 ; CI-NEXT: v_frexp_exp_i32_f32_e32 v7, v2 ; CI-NEXT: v_ldexp_f32_e64 v2, v4, 1 @@ -1162,10 +1162,10 @@ define amdgpu_kernel void @frem_v2f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; CI-NEXT: v_cmp_ge_i32_e32 vcc, 11, v3 ; CI-NEXT: v_div_fixup_f32 v4, v4, v2, 1.0 ; CI-NEXT: s_cbranch_vccnz .LBB9_14 -; CI-NEXT: ; %bb.12: ; %frem.loop_body27.preheader +; CI-NEXT: ; %bb.12: ; %frem.loop_body.preheader ; CI-NEXT: v_add_i32_e32 v3, vcc, 11, v6 ; CI-NEXT: v_sub_i32_e32 v3, vcc, v3, v7 -; CI-NEXT: .LBB9_13: ; %frem.loop_body27 +; CI-NEXT: .LBB9_13: ; %frem.loop_body ; CI-NEXT: ; =>This Inner Loop Header: Depth=1 ; CI-NEXT: v_mov_b32_e32 v6, v5 ; CI-NEXT: v_mul_f32_e32 v5, v6, v4 @@ -1181,7 +1181,7 @@ define amdgpu_kernel void @frem_v2f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; CI-NEXT: s_branch .LBB9_15 ; CI-NEXT: .LBB9_14: ; CI-NEXT: v_mov_b32_e32 v6, v5 -; CI-NEXT: .LBB9_15: ; %frem.loop_exit28 +; CI-NEXT: .LBB9_15: ; %frem.loop_exit ; CI-NEXT: v_add_i32_e32 v3, vcc, -10, v3 ; CI-NEXT: v_ldexp_f32_e32 v3, v6, v3 ; CI-NEXT: v_mul_f32_e32 v4, v3, v4 @@ -1239,7 +1239,7 @@ define amdgpu_kernel void @frem_v2f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; VI-NEXT: v_cvt_f32_f16_e64 v1, |s1| ; VI-NEXT: v_cmp_ngt_f32_e32 vcc, v2, v1 ; VI-NEXT: s_cbranch_vccz .LBB9_2 -; VI-NEXT: ; %bb.1: ; %frem.else +; VI-NEXT: ; %bb.1: ; %frem.else20 ; VI-NEXT: s_and_b32 s2, s0, 0x8000 ; VI-NEXT: v_cmp_eq_f32_e32 vcc, v2, v1 ; VI-NEXT: v_mov_b32_e32 v0, s2 @@ -1250,7 +1250,7 @@ define amdgpu_kernel void @frem_v2f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; VI-NEXT: s_xor_b32 s2, s2, 1 ; VI-NEXT: s_cmp_lg_u32 s2, 0 ; VI-NEXT: s_cbranch_scc1 .LBB9_8 -; VI-NEXT: ; %bb.3: ; %frem.compute +; VI-NEXT: ; %bb.3: ; %frem.compute19 ; VI-NEXT: v_frexp_mant_f32_e32 v3, v1 ; VI-NEXT: v_frexp_exp_i32_f32_e32 v6, v1 ; VI-NEXT: v_ldexp_f32 v1, v3, 1 @@ -1275,10 +1275,10 @@ define amdgpu_kernel void @frem_v2f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; VI-NEXT: v_cmp_ge_i32_e32 vcc, 11, v2 ; VI-NEXT: v_div_fixup_f32 v3, v3, v1, 1.0 ; VI-NEXT: s_cbranch_vccnz .LBB9_6 -; VI-NEXT: ; %bb.4: ; %frem.loop_body.preheader +; VI-NEXT: ; %bb.4: ; %frem.loop_body27.preheader ; VI-NEXT: v_add_u32_e32 v2, vcc, 11, v5 ; VI-NEXT: v_sub_u32_e32 v2, vcc, v2, v6 -; VI-NEXT: .LBB9_5: ; %frem.loop_body +; VI-NEXT: .LBB9_5: ; %frem.loop_body27 ; VI-NEXT: ; =>This Inner Loop Header: Depth=1 ; VI-NEXT: v_mov_b32_e32 v5, v4 ; VI-NEXT: v_mul_f32_e32 v4, v5, v3 @@ -1294,7 +1294,7 @@ define amdgpu_kernel void @frem_v2f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; VI-NEXT: s_branch .LBB9_7 ; VI-NEXT: .LBB9_6: ; VI-NEXT: v_mov_b32_e32 v5, v4 -; VI-NEXT: .LBB9_7: ; %frem.loop_exit +; VI-NEXT: .LBB9_7: ; %frem.loop_exit28 ; VI-NEXT: v_add_u32_e32 v2, vcc, -10, v2 ; VI-NEXT: v_ldexp_f32 v2, v5, v2 ; VI-NEXT: v_mul_f32_e32 v3, v2, v3 @@ -1317,7 +1317,7 @@ define amdgpu_kernel void @frem_v2f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; VI-NEXT: ; implicit-def: $vgpr1 ; VI-NEXT: v_cmp_ngt_f32_e32 vcc, v3, v2 ; VI-NEXT: s_cbranch_vccz .LBB9_10 -; VI-NEXT: ; %bb.9: ; %frem.else20 +; VI-NEXT: ; %bb.9: ; %frem.else ; VI-NEXT: s_and_b32 s3, s4, 0x8000 ; VI-NEXT: v_cmp_eq_f32_e32 vcc, v3, v2 ; VI-NEXT: v_mov_b32_e32 v1, s3 @@ -1328,7 +1328,7 @@ define amdgpu_kernel void @frem_v2f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; VI-NEXT: s_xor_b32 s3, s3, 1 ; VI-NEXT: s_cmp_lg_u32 s3, 0 ; VI-NEXT: s_cbranch_scc1 .LBB9_16 -; VI-NEXT: ; %bb.11: ; %frem.compute19 +; VI-NEXT: ; %bb.11: ; %frem.compute ; VI-NEXT: v_frexp_mant_f32_e32 v4, v2 ; VI-NEXT: v_frexp_exp_i32_f32_e32 v7, v2 ; VI-NEXT: v_ldexp_f32 v2, v4, 1 @@ -1353,10 +1353,10 @@ define amdgpu_kernel void @frem_v2f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; VI-NEXT: v_cmp_ge_i32_e32 vcc, 11, v3 ; VI-NEXT: v_div_fixup_f32 v4, v4, v2, 1.0 ; VI-NEXT: s_cbranch_vccnz .LBB9_14 -; VI-NEXT: ; %bb.12: ; %frem.loop_body27.preheader +; VI-NEXT: ; %bb.12: ; %frem.loop_body.preheader ; VI-NEXT: v_add_u32_e32 v3, vcc, 11, v6 ; VI-NEXT: v_sub_u32_e32 v3, vcc, v3, v7 -; VI-NEXT: .LBB9_13: ; %frem.loop_body27 +; VI-NEXT: .LBB9_13: ; %frem.loop_body ; VI-NEXT: ; =>This Inner Loop Header: Depth=1 ; VI-NEXT: v_mov_b32_e32 v6, v5 ; VI-NEXT: v_mul_f32_e32 v5, v6, v4 @@ -1372,7 +1372,7 @@ define amdgpu_kernel void @frem_v2f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; VI-NEXT: s_branch .LBB9_15 ; VI-NEXT: .LBB9_14: ; VI-NEXT: v_mov_b32_e32 v6, v5 -; VI-NEXT: .LBB9_15: ; %frem.loop_exit28 +; VI-NEXT: .LBB9_15: ; %frem.loop_exit ; VI-NEXT: v_add_u32_e32 v3, vcc, -10, v3 ; VI-NEXT: v_ldexp_f32 v3, v6, v3 ; VI-NEXT: v_mul_f32_e32 v4, v3, v4 @@ -1427,7 +1427,7 @@ define amdgpu_kernel void @frem_v4f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; CI-NEXT: v_cvt_f32_f16_e64 v1, |s2| ; CI-NEXT: v_cmp_ngt_f32_e32 vcc, v2, v1 ; CI-NEXT: s_cbranch_vccz .LBB10_2 -; CI-NEXT: ; %bb.1: ; %frem.else +; CI-NEXT: ; %bb.1: ; %frem.else86 ; CI-NEXT: s_and_b32 s0, s4, 0x8000 ; CI-NEXT: v_cmp_eq_f32_e32 vcc, v2, v1 ; CI-NEXT: v_mov_b32_e32 v0, s0 @@ -1438,7 +1438,7 @@ define amdgpu_kernel void @frem_v4f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; CI-NEXT: s_xor_b32 s0, s0, 1 ; CI-NEXT: s_cmp_lg_u32 s0, 0 ; CI-NEXT: s_cbranch_scc1 .LBB10_8 -; CI-NEXT: ; %bb.3: ; %frem.compute +; CI-NEXT: ; %bb.3: ; %frem.compute85 ; CI-NEXT: v_frexp_mant_f32_e32 v3, v1 ; CI-NEXT: v_frexp_exp_i32_f32_e32 v6, v1 ; CI-NEXT: v_ldexp_f32_e64 v1, v3, 1 @@ -1463,10 +1463,10 @@ define amdgpu_kernel void @frem_v4f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; CI-NEXT: v_cmp_ge_i32_e32 vcc, 11, v2 ; CI-NEXT: v_div_fixup_f32 v3, v3, v1, 1.0 ; CI-NEXT: s_cbranch_vccnz .LBB10_6 -; CI-NEXT: ; %bb.4: ; %frem.loop_body.preheader +; CI-NEXT: ; %bb.4: ; %frem.loop_body93.preheader ; CI-NEXT: v_add_i32_e32 v2, vcc, 11, v5 ; CI-NEXT: v_sub_i32_e32 v2, vcc, v2, v6 -; CI-NEXT: .LBB10_5: ; %frem.loop_body +; CI-NEXT: .LBB10_5: ; %frem.loop_body93 ; CI-NEXT: ; =>This Inner Loop Header: Depth=1 ; CI-NEXT: v_mov_b32_e32 v5, v4 ; CI-NEXT: v_mul_f32_e32 v4, v5, v3 @@ -1482,7 +1482,7 @@ define amdgpu_kernel void @frem_v4f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; CI-NEXT: s_branch .LBB10_7 ; CI-NEXT: .LBB10_6: ; CI-NEXT: v_mov_b32_e32 v5, v4 -; CI-NEXT: .LBB10_7: ; %frem.loop_exit +; CI-NEXT: .LBB10_7: ; %frem.loop_exit94 ; CI-NEXT: v_add_i32_e32 v2, vcc, -10, v2 ; CI-NEXT: v_ldexp_f32_e32 v2, v5, v2 ; CI-NEXT: v_mul_f32_e32 v3, v2, v3 @@ -1505,7 +1505,7 @@ define amdgpu_kernel void @frem_v4f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; CI-NEXT: ; implicit-def: $vgpr1 ; CI-NEXT: v_cmp_ngt_f32_e32 vcc, v3, v2 ; CI-NEXT: s_cbranch_vccz .LBB10_10 -; CI-NEXT: ; %bb.9: ; %frem.else20 +; CI-NEXT: ; %bb.9: ; %frem.else53 ; CI-NEXT: s_and_b32 s1, s6, 0x8000 ; CI-NEXT: v_cmp_eq_f32_e32 vcc, v3, v2 ; CI-NEXT: v_mov_b32_e32 v1, s1 @@ -1516,7 +1516,7 @@ define amdgpu_kernel void @frem_v4f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; CI-NEXT: s_xor_b32 s1, s1, 1 ; CI-NEXT: s_cmp_lg_u32 s1, 0 ; CI-NEXT: s_cbranch_scc1 .LBB10_16 -; CI-NEXT: ; %bb.11: ; %frem.compute19 +; CI-NEXT: ; %bb.11: ; %frem.compute52 ; CI-NEXT: v_frexp_mant_f32_e32 v4, v2 ; CI-NEXT: v_frexp_exp_i32_f32_e32 v7, v2 ; CI-NEXT: v_ldexp_f32_e64 v2, v4, 1 @@ -1541,10 +1541,10 @@ define amdgpu_kernel void @frem_v4f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; CI-NEXT: v_cmp_ge_i32_e32 vcc, 11, v3 ; CI-NEXT: v_div_fixup_f32 v4, v4, v2, 1.0 ; CI-NEXT: s_cbranch_vccnz .LBB10_14 -; CI-NEXT: ; %bb.12: ; %frem.loop_body27.preheader +; CI-NEXT: ; %bb.12: ; %frem.loop_body60.preheader ; CI-NEXT: v_add_i32_e32 v3, vcc, 11, v6 ; CI-NEXT: v_sub_i32_e32 v3, vcc, v3, v7 -; CI-NEXT: .LBB10_13: ; %frem.loop_body27 +; CI-NEXT: .LBB10_13: ; %frem.loop_body60 ; CI-NEXT: ; =>This Inner Loop Header: Depth=1 ; CI-NEXT: v_mov_b32_e32 v6, v5 ; CI-NEXT: v_mul_f32_e32 v5, v6, v4 @@ -1560,7 +1560,7 @@ define amdgpu_kernel void @frem_v4f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; CI-NEXT: s_branch .LBB10_15 ; CI-NEXT: .LBB10_14: ; CI-NEXT: v_mov_b32_e32 v6, v5 -; CI-NEXT: .LBB10_15: ; %frem.loop_exit28 +; CI-NEXT: .LBB10_15: ; %frem.loop_exit61 ; CI-NEXT: v_add_i32_e32 v3, vcc, -10, v3 ; CI-NEXT: v_ldexp_f32_e32 v3, v6, v3 ; CI-NEXT: v_mul_f32_e32 v4, v3, v4 @@ -1581,7 +1581,7 @@ define amdgpu_kernel void @frem_v4f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; CI-NEXT: ; implicit-def: $vgpr2 ; CI-NEXT: v_cmp_ngt_f32_e32 vcc, v4, v3 ; CI-NEXT: s_cbranch_vccz .LBB10_18 -; CI-NEXT: ; %bb.17: ; %frem.else53 +; CI-NEXT: ; %bb.17: ; %frem.else20 ; CI-NEXT: s_and_b32 s1, s5, 0x8000 ; CI-NEXT: v_cmp_eq_f32_e32 vcc, v4, v3 ; CI-NEXT: v_mov_b32_e32 v2, s1 @@ -1592,7 +1592,7 @@ define amdgpu_kernel void @frem_v4f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; CI-NEXT: s_xor_b32 s1, s1, 1 ; CI-NEXT: s_cmp_lg_u32 s1, 0 ; CI-NEXT: s_cbranch_scc1 .LBB10_24 -; CI-NEXT: ; %bb.19: ; %frem.compute52 +; CI-NEXT: ; %bb.19: ; %frem.compute19 ; CI-NEXT: v_frexp_mant_f32_e32 v5, v3 ; CI-NEXT: v_frexp_exp_i32_f32_e32 v8, v3 ; CI-NEXT: v_ldexp_f32_e64 v3, v5, 1 @@ -1617,10 +1617,10 @@ define amdgpu_kernel void @frem_v4f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; CI-NEXT: v_cmp_ge_i32_e32 vcc, 11, v4 ; CI-NEXT: v_div_fixup_f32 v5, v5, v3, 1.0 ; CI-NEXT: s_cbranch_vccnz .LBB10_22 -; CI-NEXT: ; %bb.20: ; %frem.loop_body60.preheader +; CI-NEXT: ; %bb.20: ; %frem.loop_body27.preheader ; CI-NEXT: v_add_i32_e32 v4, vcc, 11, v7 ; CI-NEXT: v_sub_i32_e32 v4, vcc, v4, v8 -; CI-NEXT: .LBB10_21: ; %frem.loop_body60 +; CI-NEXT: .LBB10_21: ; %frem.loop_body27 ; CI-NEXT: ; =>This Inner Loop Header: Depth=1 ; CI-NEXT: v_mov_b32_e32 v7, v6 ; CI-NEXT: v_mul_f32_e32 v6, v7, v5 @@ -1636,7 +1636,7 @@ define amdgpu_kernel void @frem_v4f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; CI-NEXT: s_branch .LBB10_23 ; CI-NEXT: .LBB10_22: ; CI-NEXT: v_mov_b32_e32 v7, v6 -; CI-NEXT: .LBB10_23: ; %frem.loop_exit61 +; CI-NEXT: .LBB10_23: ; %frem.loop_exit28 ; CI-NEXT: v_add_i32_e32 v4, vcc, -10, v4 ; CI-NEXT: v_ldexp_f32_e32 v4, v7, v4 ; CI-NEXT: v_mul_f32_e32 v5, v4, v5 @@ -1659,7 +1659,7 @@ define amdgpu_kernel void @frem_v4f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; CI-NEXT: ; implicit-def: $vgpr3 ; CI-NEXT: v_cmp_ngt_f32_e32 vcc, v5, v4 ; CI-NEXT: s_cbranch_vccz .LBB10_26 -; CI-NEXT: ; %bb.25: ; %frem.else86 +; CI-NEXT: ; %bb.25: ; %frem.else ; CI-NEXT: s_and_b32 s1, s7, 0x8000 ; CI-NEXT: v_cmp_eq_f32_e32 vcc, v5, v4 ; CI-NEXT: v_mov_b32_e32 v3, s1 @@ -1670,7 +1670,7 @@ define amdgpu_kernel void @frem_v4f16(ptr addrspace(1) %out, ptr addrspace(1) %i ; CI-NEXT: s_xor_b32 s1, s1, 1 ; ... [truncated] 
* Remove redundant assertion * Use "dyn_cast" instad of "isa".
This naming is more aligned with other passes.
@frederik-h frederik-h requested a review from arsenm September 15, 2025 16:07
@github-actions
Copy link

github-actions bot commented Sep 15, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@frederik-h frederik-h merged commit 7314565 into llvm:main Oct 13, 2025
9 checks passed
@frederik-h frederik-h deleted the expand-fp-unify-scalarization branch October 13, 2025 07:18
@llvm-ci
Copy link
Collaborator

llvm-ci commented Oct 13, 2025

LLVM Buildbot has detected a new failure on builder llvm-clang-x86_64-expensive-checks-ubuntu running on as-builder-4 while building llvm at step 7 "test-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/187/builds/12459

Here is the relevant piece of the build log for the reference
Step 7 (test-check-all) failure: Test just built components: check-all completed (failure) ******************** TEST 'LLVM :: CodeGen/AMDGPU/itofp.i128.ll' FAILED ******************** Exit Code: 2 Command Output (stdout): -- # RUN: at line 2 /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/llc -global-isel=0 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/test/CodeGen/AMDGPU/itofp.i128.ll | /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/FileCheck -check-prefixes=GCN,SDAG /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/test/CodeGen/AMDGPU/itofp.i128.ll # executed command: /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/llc -global-isel=0 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 # .---command stderr------------ # | Pass modifies its input and doesn't report it: Expand fp # | Pass modifies its input and doesn't report it # | UNREACHABLE executed at /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1404! # | PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace and instructions to reproduce the bug. # | Stack dump: # | 0.	Program arguments: /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/llc -global-isel=0 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 # | 1.	Running pass 'Function Pass Manager' on module '<stdin>'. # | 2.	Running pass 'Expand fp' on function '@sitofp_i128_to_f32' # | #0 0x000061958f2a09f8 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/llc+0x7dcc9f8) # | #1 0x000061958f29e105 llvm::sys::RunSignalHandlers() (/home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/llc+0x7dca105) # | #2 0x000061958f2a17c1 SignalHandler(int, siginfo_t*, void*) Signals.cpp:0:0 # | #3 0x0000738518a45330 (/lib/x86_64-linux-gnu/libc.so.6+0x45330) # | #4 0x0000738518a9eb2c pthread_kill (/lib/x86_64-linux-gnu/libc.so.6+0x9eb2c) # | #5 0x0000738518a4527e raise (/lib/x86_64-linux-gnu/libc.so.6+0x4527e) # | #6 0x0000738518a288ff abort (/lib/x86_64-linux-gnu/libc.so.6+0x288ff) # | #7 0x000061958f206d0f (/home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/llc+0x7d32d0f) # | #8 0x000061958e814912 (/home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/llc+0x7340912) # | #9 0x000061958e81c6a2 llvm::FPPassManager::runOnModule(llvm::Module&) (/home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/llc+0x73486a2) # | #10 0x000061958e81514a llvm::legacy::PassManagerImpl::run(llvm::Module&) (/home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/llc+0x734114a) # | #11 0x000061958c1ad6c5 compileModule(char**, llvm::LLVMContext&) llc.cpp:0:0 # | #12 0x000061958c1aacdd main (/home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/llc+0x4cd6cdd) # | #13 0x0000738518a2a1ca (/lib/x86_64-linux-gnu/libc.so.6+0x2a1ca) # | #14 0x0000738518a2a28b __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a28b) # | #15 0x000061958c1a6765 _start (/home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/llc+0x4cd2765) # `----------------------------- # error: command failed with exit status: -6 # executed command: /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/FileCheck -check-prefixes=GCN,SDAG /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/test/CodeGen/AMDGPU/itofp.i128.ll # .---command stderr------------ # | FileCheck error: '<stdin>' is empty. # | FileCheck command line: /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/FileCheck -check-prefixes=GCN,SDAG /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/test/CodeGen/AMDGPU/itofp.i128.ll # `----------------------------- # error: command failed with exit status: 2 -- ******************** 
@llvm-ci
Copy link
Collaborator

llvm-ci commented Oct 13, 2025

LLVM Buildbot has detected a new failure on builder sanitizer-x86_64-linux-bootstrap-ubsan running on sanitizer-buildbot4 while building llvm at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/25/builds/12285

Here is the relevant piece of the build log for the reference
Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure) ... llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:531: note: using lld-link: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build2_ubsan/bin/lld-link llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:531: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build2_ubsan/bin/ld64.lld llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:531: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build2_ubsan/bin/wasm-ld llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:531: note: using ld.lld: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build2_ubsan/bin/ld.lld llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:531: note: using lld-link: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build2_ubsan/bin/lld-link llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:531: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build2_ubsan/bin/ld64.lld llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:531: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build2_ubsan/bin/wasm-ld llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/lit/lit/main.py:74: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds. -- Testing: 89308 tests, 64 workers -- Testing: 0.. 10.. 20.. 30 FAIL: Clangd Unit Tests :: ./ClangdTests/26/84 (28824 of 89308) ******************** TEST 'Clangd Unit Tests :: ./ClangdTests/26/84' FAILED ******************** Script(shard): -- GTEST_OUTPUT=json:/home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build2_ubsan/tools/clang/tools/extra/clangd/unittests/./ClangdTests-Clangd Unit Tests-801416-26-84.json GTEST_SHUFFLE=0 GTEST_TOTAL_SHARDS=84 GTEST_SHARD_INDEX=26 /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build2_ubsan/tools/clang/tools/extra/clangd/unittests/./ClangdTests -- Script: -- /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build2_ubsan/tools/clang/tools/extra/clangd/unittests/./ClangdTests --gtest_filter=CompletionTest.DynamicIndexIncludeInsertion -- ASTWorker building file /clangd-test/foo_impl.cpp version null with command [/clangd-test] clang -ffreestanding /clangd-test/foo_impl.cpp Driver produced command: cc1 -cc1 -triple x86_64-unknown-linux-gnu -fsyntax-only -disable-free -clear-ast-before-backend -main-file-name foo_impl.cpp -mrelocation-model pic -pic-level 2 -pic-is-pie -mframe-pointer=all -fmath-errno -ffp-contract=on -fno-rounding-math -mconstructor-aliases -ffreestanding -target-cpu x86-64 -tune-cpu generic -debugger-tuning=gdb -fdebug-compilation-dir=/clangd-test -fcoverage-compilation-dir=/clangd-test -resource-dir lib/clang/22 -internal-isystem lib/clang/22/include -internal-isystem /usr/local/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -fdeprecated-macro -ferror-limit 19 -fgnuc-version=4.2.1 -fskip-odr-check-in-gmf -fcxx-exceptions -fexceptions -no-round-trip-args -faddrsig -D__GCC_HAVE_DWARF2_CFI_ASM=1 -x c++ /clangd-test/foo_impl.cpp Building first preamble for /clangd-test/foo_impl.cpp version null Built preamble of size 238484 for file /clangd-test/foo_impl.cpp version null in 4.42 seconds indexed preamble AST for /clangd-test/foo_impl.cpp version null: symbol slab: 2 symbols, 4680 bytes ref slab: 0 symbols, 0 refs, 128 bytes relations slab: 0 relations, 24 bytes Build dynamic index for header symbols with estimated memory usage of 7136 bytes /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang-tools-extra/clangd/unittests/CodeCompleteTests.cpp:1034: Failure Value of: Server.blockUntilIdleForTest() Actual: false Expected: true indexed file AST for /clangd-test/foo_impl.cpp version null: symbol slab: 1 symbols, 4448 bytes ref slab: 2 symbols, 2 refs, 4272 bytes relations slab: 0 relations, 24 bytes Build dynamic index for main-file symbols with estimated memory usage of 11576 bytes /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang-tools-extra/clangd/unittests/CodeCompleteTests.cpp:1034 Value of: Server.blockUntilIdleForTest() Actual: false Expected: true Step 14 (stage3/ubsan check) failure: stage3/ubsan check (failure) ... llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:531: note: using lld-link: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build2_ubsan/bin/lld-link llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:531: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build2_ubsan/bin/ld64.lld llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:531: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build2_ubsan/bin/wasm-ld llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:531: note: using ld.lld: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build2_ubsan/bin/ld.lld llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:531: note: using lld-link: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build2_ubsan/bin/lld-link llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:531: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build2_ubsan/bin/ld64.lld llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:531: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build2_ubsan/bin/wasm-ld llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/lit/lit/main.py:74: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds. -- Testing: 89308 tests, 64 workers -- Testing: 0.. 10.. 20.. 30 FAIL: Clangd Unit Tests :: ./ClangdTests/26/84 (28824 of 89308) ******************** TEST 'Clangd Unit Tests :: ./ClangdTests/26/84' FAILED ******************** Script(shard): -- GTEST_OUTPUT=json:/home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build2_ubsan/tools/clang/tools/extra/clangd/unittests/./ClangdTests-Clangd Unit Tests-801416-26-84.json GTEST_SHUFFLE=0 GTEST_TOTAL_SHARDS=84 GTEST_SHARD_INDEX=26 /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build2_ubsan/tools/clang/tools/extra/clangd/unittests/./ClangdTests -- Script: -- /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build2_ubsan/tools/clang/tools/extra/clangd/unittests/./ClangdTests --gtest_filter=CompletionTest.DynamicIndexIncludeInsertion -- ASTWorker building file /clangd-test/foo_impl.cpp version null with command [/clangd-test] clang -ffreestanding /clangd-test/foo_impl.cpp Driver produced command: cc1 -cc1 -triple x86_64-unknown-linux-gnu -fsyntax-only -disable-free -clear-ast-before-backend -main-file-name foo_impl.cpp -mrelocation-model pic -pic-level 2 -pic-is-pie -mframe-pointer=all -fmath-errno -ffp-contract=on -fno-rounding-math -mconstructor-aliases -ffreestanding -target-cpu x86-64 -tune-cpu generic -debugger-tuning=gdb -fdebug-compilation-dir=/clangd-test -fcoverage-compilation-dir=/clangd-test -resource-dir lib/clang/22 -internal-isystem lib/clang/22/include -internal-isystem /usr/local/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -fdeprecated-macro -ferror-limit 19 -fgnuc-version=4.2.1 -fskip-odr-check-in-gmf -fcxx-exceptions -fexceptions -no-round-trip-args -faddrsig -D__GCC_HAVE_DWARF2_CFI_ASM=1 -x c++ /clangd-test/foo_impl.cpp Building first preamble for /clangd-test/foo_impl.cpp version null Built preamble of size 238484 for file /clangd-test/foo_impl.cpp version null in 4.42 seconds indexed preamble AST for /clangd-test/foo_impl.cpp version null: symbol slab: 2 symbols, 4680 bytes ref slab: 0 symbols, 0 refs, 128 bytes relations slab: 0 relations, 24 bytes Build dynamic index for header symbols with estimated memory usage of 7136 bytes /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang-tools-extra/clangd/unittests/CodeCompleteTests.cpp:1034: Failure Value of: Server.blockUntilIdleForTest() Actual: false Expected: true indexed file AST for /clangd-test/foo_impl.cpp version null: symbol slab: 1 symbols, 4448 bytes ref slab: 2 symbols, 2 refs, 4272 bytes relations slab: 0 relations, 24 bytes Build dynamic index for main-file symbols with estimated memory usage of 11576 bytes /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang-tools-extra/clangd/unittests/CodeCompleteTests.cpp:1034 Value of: Server.blockUntilIdleForTest() Actual: false Expected: true 
DharuniRAcharya pushed a commit to DharuniRAcharya/llvm-project that referenced this pull request Oct 13, 2025
Extend the existing "scalarize" function which is used for the fp-integer conversion instruction expansion to BinaryOperator instructions and reuse it for the frem expansion; a similar function for scalarizing BinaryOperator instructions exists in the ExpandLargeDivRem pass and this change is a step towards merging that pass with ExpandFp. Further refactoring: Scalarize directly instead of using the "ReplaceVector" as a worklist, rename "Replace" vector to "Worklist", and hoist a check for unsupported scalable vectors to the top of the instruction visiting loop.
@RKSimon
Copy link
Collaborator

RKSimon commented Oct 13, 2025

@frederik-h This is breaking EXPENSIVE_CHECKS builds - are you able to take a look soon or should we revert in the meantime?

@frederik-h
Copy link
Contributor Author

@frederik-h This is breaking EXPENSIVE_CHECKS builds - are you able to take a look soon or should we revert in the meantime?

The fix is here: #163153. @arsenm just pointed out an improvement during the review, but I guess we should just commit the minimal fix if this takes longer.

frederik-h added a commit that referenced this pull request Oct 13, 2025
The last change to the pass in PR #158588 lost the assignment to the "Modified" variable for one of the pass optimizations. Add it back. This fixes the test failure in `CodeGen/AMDGPU/itofp.i128.bf.ll` (in a `LLVM_ENABLE_EXPENSIVE_CHECKS=ON` build).
akadutta pushed a commit to akadutta/llvm-project that referenced this pull request Oct 14, 2025
Extend the existing "scalarize" function which is used for the fp-integer conversion instruction expansion to BinaryOperator instructions and reuse it for the frem expansion; a similar function for scalarizing BinaryOperator instructions exists in the ExpandLargeDivRem pass and this change is a step towards merging that pass with ExpandFp. Further refactoring: Scalarize directly instead of using the "ReplaceVector" as a worklist, rename "Replace" vector to "Worklist", and hoist a check for unsupported scalable vectors to the top of the instruction visiting loop.
akadutta pushed a commit to akadutta/llvm-project that referenced this pull request Oct 14, 2025
The last change to the pass in PR llvm#158588 lost the assignment to the "Modified" variable for one of the pass optimizations. Add it back. This fixes the test failure in `CodeGen/AMDGPU/itofp.i128.bf.ll` (in a `LLVM_ENABLE_EXPENSIVE_CHECKS=ON` build).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment