@frederik-h
Contributor

This is a small refactoring that sets the return value of the runImpl function, which indicates whether the IR has been changed, in a single place instead of setting it separately wherever a supported instruction is inserted into the worklist.
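The pattern described above can be sketched outside of LLVM as follows. This is a minimal illustration, not the pass itself: `Opcode`, `Inst`, `collectWork`, and the rule that the toy target never supports `FRem` are hypothetical stand-ins, and only the name `ShouldHandleInst` is taken from the patch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for LLVM's opcodes and instructions.
enum class Opcode { Add, FRem, FPToSI, SIToFP };

struct Inst {
  Opcode Op;
  unsigned BitWidth; // width of the integer type involved, if any
};

// Collects the instructions that need expansion and reports whether any
// were found. All per-opcode decisions live in one predicate, so the
// "Modified" flag is assigned in exactly one place.
bool collectWork(const std::vector<Inst> &Insts, unsigned MaxLegalBitWidth,
                 std::vector<const Inst *> &Worklist) {
  auto ShouldHandleInst = [&](const Inst &I) {
    switch (I.Op) {
    case Opcode::FRem:
      return true; // assumption: the toy target never supports frem
    case Opcode::FPToSI:
    case Opcode::SIToFP:
      return I.BitWidth > MaxLegalBitWidth;
    default:
      return false;
    }
  };

  bool Modified = false;
  for (const Inst &I : Insts) {
    if (!ShouldHandleInst(I))
      continue;

    Worklist.push_back(&I);
    Modified = true; // the only assignment to the flag
  }
  return Modified;
}
```

Calling `collectWork` with a list that contains only unsupported-free instructions (for example a single `Add`) leaves the worklist empty and returns `false`; any instruction accepted by the predicate is queued and flips the result to `true`.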

@frederik-h
Contributor Author

This change was suggested in the review of PR #163153.

The "dyn_cast" needs to be there as witnessed by the test case in CodeGen/AMDGPU/frem.ll with two constant vector operands. Refactor the loop that visits the instructions to allow for a single assignment to the "Modified" variable.
This reflects the structure of the instruction-visiting function and is more readable.
@llvmbot
Member

llvmbot commented Oct 16, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Frederik Harwath (frederik-h)

Changes

This is a small refactoring that sets the return value of the runImpl function, which indicates whether the IR has been changed, in a single place instead of setting it separately wherever a supported instruction is inserted into the worklist.


Patch is 64.96 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/163542.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/ExpandFp.cpp (+33-28)
  • (modified) llvm/test/CodeGen/AMDGPU/frem.ll (+1358)
diff --git a/llvm/lib/CodeGen/ExpandFp.cpp b/llvm/lib/CodeGen/ExpandFp.cpp
index 04c700869cd69..2b5ced3915a2c 100644
--- a/llvm/lib/CodeGen/ExpandFp.cpp
+++ b/llvm/lib/CodeGen/ExpandFp.cpp
@@ -993,7 +993,6 @@ static void addToWorklist(Instruction &I,
 static bool runImpl(Function &F, const TargetLowering &TLI,
                     AssumptionCache *AC) {
   SmallVector<Instruction *, 4> Worklist;
-  bool Modified = false;
 
   unsigned MaxLegalFpConvertBitWidth =
       TLI.getMaxLargeFPConvertBitWidthSupported();
@@ -1003,50 +1002,49 @@ static bool runImpl(Function &F, const TargetLowering &TLI,
   if (MaxLegalFpConvertBitWidth >= llvm::IntegerType::MAX_INT_BITS)
     return false;
 
-  for (auto It = inst_begin(&F), End = inst_end(F); It != End;) {
-    Instruction &I = *It++;
+  auto ShouldHandleInst = [&](Instruction &I) {
     Type *Ty = I.getType();
     // TODO: This pass doesn't handle scalable vectors.
     if (Ty->isScalableTy())
-      continue;
+      return false;
 
     switch (I.getOpcode()) {
     case Instruction::FRem:
-      if (!targetSupportsFrem(TLI, Ty) &&
-          FRemExpander::canExpandType(Ty->getScalarType())) {
-        addToWorklist(I, Worklist);
-        Modified = true;
-      }
-      break;
+      return !targetSupportsFrem(TLI, Ty) &&
+             FRemExpander::canExpandType(Ty->getScalarType());
+
     case Instruction::FPToUI:
     case Instruction::FPToSI: {
       auto *IntTy = cast<IntegerType>(Ty->getScalarType());
-      if (IntTy->getIntegerBitWidth() <= MaxLegalFpConvertBitWidth)
-        continue;
-
-      addToWorklist(I, Worklist);
-      Modified = true;
-      break;
+      return IntTy->getIntegerBitWidth() > MaxLegalFpConvertBitWidth;
     }
+
     case Instruction::UIToFP:
     case Instruction::SIToFP: {
       auto *IntTy =
           cast<IntegerType>(I.getOperand(0)->getType()->getScalarType());
-      if (IntTy->getIntegerBitWidth() <= MaxLegalFpConvertBitWidth)
-        continue;
-
-      addToWorklist(I, Worklist);
-      Modified = true;
-      break;
+      return IntTy->getIntegerBitWidth() > MaxLegalFpConvertBitWidth;
     }
-    default:
-      break;
     }
+
+    return false;
+  };
+
+  bool Modified = false;
+  for (auto It = inst_begin(&F), End = inst_end(F); It != End;) {
+    Instruction &I = *It++;
+    if (!ShouldHandleInst(I))
+      continue;
+
+    addToWorklist(I, Worklist);
+    Modified = true;
   }
 
   while (!Worklist.empty()) {
     Instruction *I = Worklist.pop_back_val();
-    if (I->getOpcode() == Instruction::FRem) {
+
+    switch (I->getOpcode()) {
+    case Instruction::FRem: {
       auto SQ = [&]() -> std::optional<SimplifyQuery> {
         if (AC) {
           auto Res = std::make_optional<SimplifyQuery>(
@@ -1058,11 +1056,18 @@ static bool runImpl(Function &F, const TargetLowering &TLI,
       }();
 
       expandFRem(cast<BinaryOperator>(*I), SQ);
-    } else if (I->getOpcode() == Instruction::FPToUI ||
-               I->getOpcode() == Instruction::FPToSI) {
+      break;
+    }
+
+    case Instruction::FPToUI:
+    case Instruction::FPToSI:
       expandFPToI(I);
-    } else {
+      break;
+
+    case Instruction::UIToFP:
+    case Instruction::SIToFP:
       expandIToFP(I);
+      break;
     }
   }
 
diff --git a/llvm/test/CodeGen/AMDGPU/frem.ll b/llvm/test/CodeGen/AMDGPU/frem.ll
index 415828f32f920..901ce6146cc9b 100644
--- a/llvm/test/CodeGen/AMDGPU/frem.ll
+++ b/llvm/test/CodeGen/AMDGPU/frem.ll
@@ -17589,5 +17589,1363 @@ define amdgpu_kernel void @frem_v2f64(ptr addrspace(1) %out, ptr addrspace(1) %i
   ret void
 }
 
+
+define amdgpu_kernel void @frem_v2f64_const_zero_num(ptr addrspace(1) %out, ptr addrspace(1) %in) #0 {
+; SI-LABEL: frem_v2f64_const_zero_num:
+; SI: ; %bb.0:
+; SI-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x9
+; SI-NEXT: s_mov_b32 s7, 0xf000
+; SI-NEXT: s_mov_b32 s6, -1
+; SI-NEXT: s_waitcnt lgkmcnt(0)
+; SI-NEXT: s_mov_b32 s4, s2
+; SI-NEXT: s_mov_b32 s5, s3
+; SI-NEXT: buffer_load_dwordx4 v[0:3], off, s[4:7], 0
+; SI-NEXT: s_waitcnt vmcnt(0)
+; SI-NEXT: v_cmp_nlg_f64_e32 vcc, 0, v[0:1]
+; SI-NEXT: s_and_b64 s[2:3], vcc, exec
+; SI-NEXT: s_cselect_b32 s8, 0x7ff80000, 0
+; SI-NEXT: s_mov_b32 s2, s6
+; SI-NEXT: s_mov_b32 s3, s7
+; SI-NEXT: v_cmp_nlg_f64_e32 vcc, 0, v[2:3]
+; SI-NEXT: s_and_b64 s[4:5], vcc, exec
+; SI-NEXT: s_cselect_b32 s4, 0x7ff80000, 0
+; SI-NEXT: v_mov_b32_e32 v0, 0
+; SI-NEXT: v_mov_b32_e32 v1, s8
+; SI-NEXT: v_mov_b32_e32 v3, s4
+; SI-NEXT: v_mov_b32_e32 v2, v0
+; SI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0
+; SI-NEXT: s_endpgm
+;
+; CI-LABEL: frem_v2f64_const_zero_num:
+; CI: ; %bb.0:
+; CI-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x9
+; CI-NEXT: s_mov_b32 s7, 0xf000
+; CI-NEXT: s_mov_b32 s6, -1
+; CI-NEXT: s_waitcnt lgkmcnt(0)
+; CI-NEXT: s_mov_b32 s4, s2
+; CI-NEXT: s_mov_b32 s5, s3
+; CI-NEXT: buffer_load_dwordx4 v[0:3], off, s[4:7], 0
+; CI-NEXT: s_waitcnt vmcnt(0)
+; CI-NEXT: v_cmp_nlg_f64_e32 vcc, 0, v[0:1]
+; CI-NEXT: v_mov_b32_e32 v0, 0
+; CI-NEXT: s_and_b64 s[2:3], vcc, exec
+; CI-NEXT: v_cmp_nlg_f64_e32 vcc, 0, v[2:3]
+; CI-NEXT: s_cselect_b32 s8, 0x7ff80000, 0
+; CI-NEXT: s_mov_b32 s2, s6
+; CI-NEXT: s_mov_b32 s3, s7
+; CI-NEXT: v_mov_b32_e32 v1, s8
+; CI-NEXT: v_mov_b32_e32 v2, v0
+; CI-NEXT: s_and_b64 s[4:5], vcc, exec
+; CI-NEXT: s_cselect_b32 s4, 0x7ff80000, 0
+; CI-NEXT: v_mov_b32_e32 v3, s4
+; CI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0
+; CI-NEXT: s_endpgm
+;
+; VI-LABEL: frem_v2f64_const_zero_num:
+; VI: ; %bb.0:
+; VI-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x24
+; VI-NEXT: s_waitcnt lgkmcnt(0)
+; VI-NEXT: v_mov_b32_e32 v0, s2
+; VI-NEXT: v_mov_b32_e32 v1, s3
+; VI-NEXT: flat_load_dwordx4 v[0:3], v[0:1]
+; VI-NEXT: v_mov_b32_e32 v4, s0
+; VI-NEXT: v_mov_b32_e32 v5, s1
+; VI-NEXT: s_waitcnt vmcnt(0)
+; VI-NEXT: v_cmp_nlg_f64_e32 vcc, 0, v[0:1]
+; VI-NEXT: v_mov_b32_e32 v0, 0
+; VI-NEXT: s_and_b64 s[2:3], vcc, exec
+; VI-NEXT: v_cmp_nlg_f64_e32 vcc, 0, v[2:3]
+; VI-NEXT: s_cselect_b32 s2, 0x7ff80000, 0
+; VI-NEXT: v_mov_b32_e32 v1, s2
+; VI-NEXT: v_mov_b32_e32 v2, v0
+; VI-NEXT: s_and_b64 s[0:1], vcc, exec
+; VI-NEXT: s_cselect_b32 s0, 0x7ff80000, 0
+; VI-NEXT: v_mov_b32_e32 v3, s0
+; VI-NEXT: flat_store_dwordx4 v[4:5], v[0:3]
+; VI-NEXT: s_endpgm
+;
+; GFX9-LABEL: frem_v2f64_const_zero_num:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x24
+; GFX9-NEXT: v_mov_b32_e32 v0, 0
+; GFX9-NEXT: s_waitcnt lgkmcnt(0)
+; GFX9-NEXT: global_load_dwordx4 v[1:4], v0, s[2:3]
+; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: v_cmp_nlg_f64_e32 vcc, 0, v[1:2]
+; GFX9-NEXT: v_mov_b32_e32 v2, v0
+; GFX9-NEXT: s_and_b64 s[2:3], vcc, exec
+; GFX9-NEXT: v_cmp_nlg_f64_e32 vcc, 0, v[3:4]
+; GFX9-NEXT: s_cselect_b32 s4, 0x7ff80000, 0
+; GFX9-NEXT: v_mov_b32_e32 v1, s4
+; GFX9-NEXT: s_and_b64 s[2:3], vcc, exec
+; GFX9-NEXT: s_cselect_b32 s2, 0x7ff80000, 0
+; GFX9-NEXT: v_mov_b32_e32 v3, s2
+; GFX9-NEXT: global_store_dwordx4 v0, v[0:3], s[0:1]
+; GFX9-NEXT: s_endpgm
+;
+; GFX10-LABEL: frem_v2f64_const_zero_num:
+; GFX10: ; %bb.0:
+; GFX10-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x24
+; GFX10-NEXT: v_mov_b32_e32 v0, 0
+; GFX10-NEXT: s_waitcnt lgkmcnt(0)
+; GFX10-NEXT: global_load_dwordx4 v[1:4], v0, s[2:3]
+; GFX10-NEXT: s_waitcnt vmcnt(0)
+; GFX10-NEXT: v_cmp_nlg_f64_e32 vcc_lo, 0, v[1:2]
+; GFX10-NEXT: v_mov_b32_e32 v2, v0
+; GFX10-NEXT: s_and_b32 s2, vcc_lo, exec_lo
+; GFX10-NEXT: v_cmp_nlg_f64_e32 vcc_lo, 0, v[3:4]
+; GFX10-NEXT: s_cselect_b32 s2, 0x7ff80000, 0
+; GFX10-NEXT: v_mov_b32_e32 v1, s2
+; GFX10-NEXT: s_and_b32 s3, vcc_lo, exec_lo
+; GFX10-NEXT: s_cselect_b32 s3, 0x7ff80000, 0
+; GFX10-NEXT: v_mov_b32_e32 v3, s3
+; GFX10-NEXT: global_store_dwordx4 v0, v[0:3], s[0:1]
+; GFX10-NEXT: s_endpgm
+;
+; GFX11-LABEL: frem_v2f64_const_zero_num:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_load_b128 s[0:3], s[4:5], 0x24
+; GFX11-NEXT: v_mov_b32_e32 v0, 0
+; GFX11-NEXT: s_waitcnt lgkmcnt(0)
+; GFX11-NEXT: global_load_b128 v[1:4], v0, s[2:3]
+; GFX11-NEXT: s_waitcnt vmcnt(0)
+; GFX11-NEXT: v_cmp_nlg_f64_e32 vcc_lo, 0, v[1:2]
+; GFX11-NEXT: s_and_b32 s2, vcc_lo, exec_lo
+; GFX11-NEXT: v_cmp_nlg_f64_e32 vcc_lo, 0, v[3:4]
+; GFX11-NEXT: s_cselect_b32 s2, 0x7ff80000, 0
+; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1) | instskip(SKIP_2) | instid1(SALU_CYCLE_1)
+; GFX11-NEXT: v_dual_mov_b32 v1, s2 :: v_dual_mov_b32 v2, v0
+; GFX11-NEXT: s_and_b32 s3, vcc_lo, exec_lo
+; GFX11-NEXT: s_cselect_b32 s3, 0x7ff80000, 0
+; GFX11-NEXT: v_mov_b32_e32 v3, s3
+; GFX11-NEXT: global_store_b128 v0, v[0:3], s[0:1]
+; GFX11-NEXT: s_endpgm
+;
+; GFX1150-LABEL: frem_v2f64_const_zero_num:
+; GFX1150: ; %bb.0:
+; GFX1150-NEXT: s_load_b128 s[0:3], s[4:5], 0x24
+; GFX1150-NEXT: v_mov_b32_e32 v0, 0
+; GFX1150-NEXT: s_waitcnt lgkmcnt(0)
+; GFX1150-NEXT: global_load_b128 v[1:4], v0, s[2:3]
+; GFX1150-NEXT: s_waitcnt vmcnt(0)
+; GFX1150-NEXT: v_cmp_nlg_f64_e32 vcc_lo, 0, v[1:2]
+; GFX1150-NEXT: s_and_b32 s2, vcc_lo, exec_lo
+; GFX1150-NEXT: v_cmp_nlg_f64_e32 vcc_lo, 0, v[3:4]
+; GFX1150-NEXT: s_cselect_b32 s2, 0x7ff80000, 0
+; GFX1150-NEXT: s_delay_alu instid0(SALU_CYCLE_1) | instskip(SKIP_2) | instid1(SALU_CYCLE_1)
+; GFX1150-NEXT: v_dual_mov_b32 v1, s2 :: v_dual_mov_b32 v2, v0
+; GFX1150-NEXT: s_and_b32 s3, vcc_lo, exec_lo
+; GFX1150-NEXT: s_cselect_b32 s3, 0x7ff80000, 0
+; GFX1150-NEXT: v_mov_b32_e32 v3, s3
+; GFX1150-NEXT: global_store_b128 v0, v[0:3], s[0:1]
+; GFX1150-NEXT: s_endpgm
+;
+; GFX1200-LABEL: frem_v2f64_const_zero_num:
+; GFX1200: ; %bb.0:
+; GFX1200-NEXT: s_load_b128 s[0:3], s[4:5], 0x24
+; GFX1200-NEXT: v_mov_b32_e32 v0, 0
+; GFX1200-NEXT: s_wait_kmcnt 0x0
+; GFX1200-NEXT: global_load_b128 v[1:4], v0, s[2:3]
+; GFX1200-NEXT: s_wait_loadcnt 0x0
+; GFX1200-NEXT: v_cmp_nlg_f64_e32 vcc_lo, 0, v[1:2]
+; GFX1200-NEXT: s_and_b32 s2, vcc_lo, exec_lo
+; GFX1200-NEXT: v_cmp_nlg_f64_e32 vcc_lo, 0, v[3:4]
+; GFX1200-NEXT: s_cselect_b32 s2, 0x7ff80000, 0
+; GFX1200-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
+; GFX1200-NEXT: v_dual_mov_b32 v1, s2 :: v_dual_mov_b32 v2, v0
+; GFX1200-NEXT: s_and_b32 s3, vcc_lo, exec_lo
+; GFX1200-NEXT: s_cselect_b32 s3, 0x7ff80000, 0
+; GFX1200-NEXT: s_wait_alu 0xfffe
+; GFX1200-NEXT: v_mov_b32_e32 v3, s3
+; GFX1200-NEXT: global_store_b128 v0, v[0:3], s[0:1]
+; GFX1200-NEXT: s_endpgm
+  %r0 = load <2 x double>, ptr addrspace(1) %in, align 16
+  %r1 = frem <2 x double> <double 0.0, double 0.0>, %r0
+  store <2 x double> %r1, ptr addrspace(1) %out, align 16
+  ret void
+}
+
+define amdgpu_kernel void @frem_v2f64_const_one_denum(ptr addrspace(1) %out, ptr addrspace(1) %in) #0 {
+; SI-LABEL: frem_v2f64_const_one_denum:
+; SI: ; %bb.0:
+; SI-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x9
+; SI-NEXT: s_mov_b32 s7, 0xf000
+; SI-NEXT: s_mov_b32 s6, -1
+; SI-NEXT: s_waitcnt lgkmcnt(0)
+; SI-NEXT: s_mov_b32 s4, s2
+; SI-NEXT: s_mov_b32 s5, s3
+; SI-NEXT: buffer_load_dwordx4 v[0:3], off, s[4:7], 0
+; SI-NEXT: s_waitcnt vmcnt(0)
+; SI-NEXT: v_cmp_ngt_f64_e64 s[2:3], |v[0:1]|, 1.0
+; SI-NEXT: s_and_b64 vcc, exec, s[2:3]
+; SI-NEXT: s_cbranch_vccz .LBB15_2
+; SI-NEXT: ; %bb.1: ; %frem.else16
+; SI-NEXT: v_and_b32_e32 v4, 0x80000000, v1
+; SI-NEXT: v_cmp_eq_f64_e64 vcc, |v[0:1]|, 1.0
+; SI-NEXT: v_cndmask_b32_e32 v5, v1, v4, vcc
+; SI-NEXT: v_cndmask_b32_e64 v4, v0, 0, vcc
+; SI-NEXT: s_mov_b64 vcc, exec
+; SI-NEXT: s_cbranch_execz .LBB15_3
+; SI-NEXT: s_branch .LBB15_8
+; SI-NEXT: .LBB15_2:
+; SI-NEXT: ; implicit-def: $vgpr4_vgpr5
+; SI-NEXT: s_mov_b64 vcc, 0
+; SI-NEXT: .LBB15_3: ; %frem.compute15
+; SI-NEXT: s_brev_b32 s4, -2
+; SI-NEXT: v_and_b32_e32 v6, 0x7fffffff, v1
+; SI-NEXT: s_mov_b32 s2, 0
+; SI-NEXT: s_mov_b32 s3, 0x7ff00000
+; SI-NEXT: v_cmp_lt_f64_e64 vcc, |v[0:1]|, s[2:3]
+; SI-NEXT: v_frexp_mant_f64_e64 v[4:5], |v[0:1]|
+; SI-NEXT: v_cndmask_b32_e32 v5, v6, v5, vcc
+; SI-NEXT: v_cndmask_b32_e32 v4, v0, v4, vcc
+; SI-NEXT: v_frexp_exp_i32_f64_e32 v6, v[0:1]
+; SI-NEXT: s_and_b64 s[2:3], vcc, exec
+; SI-NEXT: v_readfirstlane_b32 s2, v6
+; SI-NEXT: s_cselect_b32 s3, s2, 0
+; SI-NEXT: s_mov_b32 s2, -1
+; SI-NEXT: s_add_i32 s5, s3, -1
+; SI-NEXT: v_ldexp_f64 v[5:6], v[4:5], 26
+; SI-NEXT: s_cmp_lt_i32 s5, 27
+; SI-NEXT: s_cbranch_scc1 .LBB15_7
+; SI-NEXT: ; %bb.4: ; %frem.loop_body23.preheader
+; SI-NEXT: s_add_i32 s5, s3, 25
+; SI-NEXT: v_mov_b32_e32 v9, 0x43300000
+; SI-NEXT: v_mov_b32_e32 v4, 0
+; SI-NEXT: s_mov_b32 s3, 0x432fffff
+; SI-NEXT: .LBB15_5: ; %frem.loop_body23
+; SI-NEXT: ; =>This Inner Loop Header: Depth=1
+; SI-NEXT: v_mov_b32_e32 v8, v6
+; SI-NEXT: v_mov_b32_e32 v7, v5
+; SI-NEXT: v_bfi_b32 v5, s4, v9, v8
+; SI-NEXT: v_add_f64 v[10:11], v[7:8], v[4:5]
+; SI-NEXT: v_add_f64 v[5:6], v[10:11], -v[4:5]
+; SI-NEXT: v_cmp_gt_f64_e64 vcc, |v[7:8]|, s[2:3]
+; SI-NEXT: v_cndmask_b32_e32 v6, v6, v8, vcc
+; SI-NEXT: v_cndmask_b32_e32 v5, v5, v7, vcc
+; SI-NEXT: v_add_f64 v[5:6], v[7:8], -v[5:6]
+; SI-NEXT: v_cmp_gt_f64_e32 vcc, 0, v[5:6]
+; SI-NEXT: v_add_f64 v[10:11], v[5:6], 1.0
+; SI-NEXT: v_cndmask_b32_e32 v6, v6, v11, vcc
+; SI-NEXT: v_cndmask_b32_e32 v5, v5, v10, vcc
+; SI-NEXT: v_ldexp_f64 v[5:6], v[5:6], 26
+; SI-NEXT: s_sub_i32 s5, s5, 26
+; SI-NEXT: s_cmp_gt_i32 s5, 26
+; SI-NEXT: s_cbranch_scc1 .LBB15_5
+; SI-NEXT: ; %bb.6: ; %Flow50
+; SI-NEXT: v_mov_b32_e32 v5, v7
+; SI-NEXT: v_mov_b32_e32 v6, v8
+; SI-NEXT: .LBB15_7: ; %frem.loop_exit24
+; SI-NEXT: s_sub_i32 s2, s5, 25
+; SI-NEXT: v_ldexp_f64 v[4:5], v[5:6], s2
+; SI-NEXT: s_mov_b32 s2, -1
+; SI-NEXT: s_mov_b32 s3, 0x432fffff
+; SI-NEXT: v_cmp_gt_f64_e64 vcc, |v[4:5]|, s[2:3]
+; SI-NEXT: s_brev_b32 s2, -2
+; SI-NEXT: v_mov_b32_e32 v6, 0x43300000
+; SI-NEXT: v_bfi_b32 v7, s2, v6, v5
+; SI-NEXT: v_mov_b32_e32 v6, 0
+; SI-NEXT: v_add_f64 v[8:9], v[4:5], v[6:7]
+; SI-NEXT: v_add_f64 v[6:7], v[8:9], -v[6:7]
+; SI-NEXT: v_cndmask_b32_e32 v7, v7, v5, vcc
+; SI-NEXT: v_cndmask_b32_e32 v6, v6, v4, vcc
+; SI-NEXT: v_add_f64 v[4:5], v[4:5], -v[6:7]
+; SI-NEXT: v_cmp_gt_f64_e32 vcc, 0, v[4:5]
+; SI-NEXT: v_add_f64 v[6:7], v[4:5], 1.0
+; SI-NEXT: v_cndmask_b32_e32 v4, v4, v6, vcc
+; SI-NEXT: v_cndmask_b32_e32 v5, v5, v7, vcc
+; SI-NEXT: v_bfi_b32 v5, s2, v5, v1
+; SI-NEXT: .LBB15_8:
+; SI-NEXT: v_cmp_ngt_f64_e64 s[2:3], |v[2:3]|, 1.0
+; SI-NEXT: s_and_b64 vcc, exec, s[2:3]
+; SI-NEXT: s_cbranch_vccz .LBB15_10
+; SI-NEXT: ; %bb.9: ; %frem.else
+; SI-NEXT: v_and_b32_e32 v6, 0x80000000, v3
+; SI-NEXT: v_cmp_eq_f64_e64 vcc, |v[2:3]|, 1.0
+; SI-NEXT: v_cndmask_b32_e32 v7, v3, v6, vcc
+; SI-NEXT: v_cndmask_b32_e64 v6, v2, 0, vcc
+; SI-NEXT: s_mov_b64 vcc, exec
+; SI-NEXT: s_cbranch_execz .LBB15_11
+; SI-NEXT: s_branch .LBB15_16
+; SI-NEXT: .LBB15_10:
+; SI-NEXT: ; implicit-def: $vgpr6_vgpr7
+; SI-NEXT: s_mov_b64 vcc, 0
+; SI-NEXT: .LBB15_11: ; %frem.compute
+; SI-NEXT: s_brev_b32 s4, -2
+; SI-NEXT: v_and_b32_e32 v8, 0x7fffffff, v3
+; SI-NEXT: s_mov_b32 s2, 0
+; SI-NEXT: s_mov_b32 s3, 0x7ff00000
+; SI-NEXT: v_cmp_lt_f64_e64 vcc, |v[2:3]|, s[2:3]
+; SI-NEXT: v_frexp_mant_f64_e64 v[6:7], |v[2:3]|
+; SI-NEXT: v_cndmask_b32_e32 v7, v8, v7, vcc
+; SI-NEXT: v_cndmask_b32_e32 v6, v2, v6, vcc
+; SI-NEXT: v_frexp_exp_i32_f64_e32 v8, v[2:3]
+; SI-NEXT: s_and_b64 s[2:3], vcc, exec
+; SI-NEXT: v_readfirstlane_b32 s2, v8
+; SI-NEXT: s_cselect_b32 s3, s2, 0
+; SI-NEXT: s_mov_b32 s2, -1
+; SI-NEXT: s_add_i32 s5, s3, -1
+; SI-NEXT: v_ldexp_f64 v[7:8], v[6:7], 26
+; SI-NEXT: s_cmp_lt_i32 s5, 27
+; SI-NEXT: s_cbranch_scc1 .LBB15_15
+; SI-NEXT: ; %bb.12: ; %frem.loop_body.preheader
+; SI-NEXT: s_add_i32 s5, s3, 25
+; SI-NEXT: v_mov_b32_e32 v11, 0x43300000
+; SI-NEXT: v_mov_b32_e32 v6, 0
+; SI-NEXT: s_mov_b32 s3, 0x432fffff
+; SI-NEXT: .LBB15_13: ; %frem.loop_body
+; SI-NEXT: ; =>This Inner Loop Header: Depth=1
+; SI-NEXT: v_mov_b32_e32 v10, v8
+; SI-NEXT: v_mov_b32_e32 v9, v7
+; SI-NEXT: v_bfi_b32 v7, s4, v11, v10
+; SI-NEXT: v_add_f64 v[12:13], v[9:10], v[6:7]
+; SI-NEXT: v_add_f64 v[7:8], v[12:13], -v[6:7]
+; SI-NEXT: v_cmp_gt_f64_e64 vcc, |v[9:10]|, s[2:3]
+; SI-NEXT: v_cndmask_b32_e32 v8, v8, v10, vcc
+; SI-NEXT: v_cndmask_b32_e32 v7, v7, v9, vcc
+; SI-NEXT: v_add_f64 v[7:8], v[9:10], -v[7:8]
+; SI-NEXT: v_cmp_gt_f64_e32 vcc, 0, v[7:8]
+; SI-NEXT: v_add_f64 v[12:13], v[7:8], 1.0
+; SI-NEXT: v_cndmask_b32_e32 v8, v8, v13, vcc
+; SI-NEXT: v_cndmask_b32_e32 v7, v7, v12, vcc
+; SI-NEXT: v_ldexp_f64 v[7:8], v[7:8], 26
+; SI-NEXT: s_sub_i32 s5, s5, 26
+; SI-NEXT: s_cmp_gt_i32 s5, 26
+; SI-NEXT: s_cbranch_scc1 .LBB15_13
+; SI-NEXT: ; %bb.14: ; %Flow
+; SI-NEXT: v_mov_b32_e32 v7, v9
+; SI-NEXT: v_mov_b32_e32 v8, v10
+; SI-NEXT: .LBB15_15: ; %frem.loop_exit
+; SI-NEXT: s_sub_i32 s2, s5, 25
+; SI-NEXT: v_ldexp_f64 v[6:7], v[7:8], s2
+; SI-NEXT: s_mov_b32 s2, -1
+; SI-NEXT: s_mov_b32 s3, 0x432fffff
+; SI-NEXT: v_cmp_gt_f64_e64 vcc, |v[6:7]|, s[2:3]
+; SI-NEXT: s_brev_b32 s2, -2
+; SI-NEXT: v_mov_b32_e32 v8, 0x43300000
+; SI-NEXT: v_bfi_b32 v9, s2, v8, v7
+; SI-NEXT: v_mov_b32_e32 v8, 0
+; SI-NEXT: v_add_f64 v[10:11], v[6:7], v[8:9]
+; SI-NEXT: v_add_f64 v[8:9], v[10:11], -v[8:9]
+; SI-NEXT: v_cndmask_b32_e32 v9, v9, v7, vcc
+; SI-NEXT: v_cndmask_b32_e32 v8, v8, v6, vcc
+; SI-NEXT: v_add_f64 v[6:7], v[6:7], -v[8:9]
+; SI-NEXT: v_cmp_gt_f64_e32 vcc, 0, v[6:7]
+; SI-NEXT: v_add_f64 v[8:9], v[6:7], 1.0
+; SI-NEXT: v_cndmask_b32_e32 v6, v6, v8, vcc
+; SI-NEXT: v_cndmask_b32_e32 v7, v7, v9, vcc
+; SI-NEXT: v_bfi_b32 v7, s2, v7, v3
+; SI-NEXT: .LBB15_16: ; %Flow49
+; SI-NEXT: s_mov_b32 s4, 0
+; SI-NEXT: s_mov_b32 s5, 0x7ff00000
+; SI-NEXT: v_cmp_nge_f64_e64 vcc, |v[0:1]|, s[4:5]
+; SI-NEXT: v_mov_b32_e32 v8, 0x7ff80000
+; SI-NEXT: v_cndmask_b32_e32 v1, v8, v5, vcc
+; SI-NEXT: v_cndmask_b32_e32 v0, 0, v4, vcc
+; SI-NEXT: s_mov_b32 s3, 0xf000
+; SI-NEXT: s_mov_b32 s2, -1
+; SI-NEXT: v_cmp_nge_f64_e64 vcc, |v[2:3]|, s[4:5]
+; SI-NEXT: v_cndmask_b32_e32 v3, v8, v7, vcc
+; SI-NEXT: v_cndmask_b32_e32 v2, 0, v6, vcc
+; SI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0
+; SI-NEXT: s_endpgm
+;
+; CI-LABEL: frem_v2f64_const_one_denum:
+; CI: ; %bb.0:
+; CI-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x9
+; CI-NEXT: s_mov_b32 s7, 0xf000
+; CI-NEXT: s_mov_b32 s6, -1
+; CI-NEXT: s_waitcnt lgkmcnt(0)
+; CI-NEXT: s_mov_b32 s4, s2
+; CI-NEXT: s_mov_b32 s5, s3
+; CI-NEXT: buffer_load_dwordx4 v[0:3], off, s[4:7], 0
+; CI-NEXT: s_waitcnt vmcnt(0)
+; CI-NEXT: v_cmp_ngt_f64_e64 s[2:3], |v[0:1]|, 1.0
+; CI-NEXT: s_and_b64 vcc, exec, s[2:3]
+; CI-NEXT: s_cbranch_vccz .LBB15_2
+; CI-NEXT: ; %bb.1: ; %frem.else16
+; CI-NEXT: v_cmp_eq_f64_e64 vcc, |v[0:1]|, 1.0
+; CI-NEXT: v_and_b32_e32 v4, 0x80000000, v1
+; CI-NEXT: v_cndmask_b32_e32 v5, v1, v4, vcc
+; CI-NEXT: v_cndmask_b32_e64 v4, v0, 0, vcc
+; CI-NEXT: s_cbranch_execz .LBB15_3
+; CI-NEXT: s_branch .LBB15_8
+; CI-NEXT: .LBB15_2:
+; CI-NEXT: ; implicit-def: $vgpr4_vgpr5
+; CI-NEXT: .LBB15_3: ; %frem.compute15
+; CI-NEXT: v_frexp_mant_f64_e64 v[4:5], |v[0:1]|
+; CI-NEXT: v_frexp_exp_i32_f64_e32 v6, v[0:1]
+; CI-NEXT: v_ldexp_f64 v[4:5], v[4:5], 26
+; CI-NEXT: v_add_i32_e32 v8, vcc, -1, v6
+; CI-NEXT: v_cmp_gt_i32_e32 vcc, 27, v8
+; CI-NEXT: s_cbranch_vccnz .LBB15_7
+; CI-NEXT: ; %bb.4: ; %frem.loop_body23.preheader
+; CI-NEXT: v_add_i32_e32 v8, vcc, 25, v6
+; CI-NEXT: .LBB15_5: ; %frem.loop_body23
+; CI-NEXT: ; =>This Inner Loop Header: Depth=1
+; CI-NEXT: v_mov_b32_e32 v7, v5
+; CI-NEXT: v_mov_b32_e32 v6, v4
+; CI-NEXT: v_rndne_f64_e32 v[4:5], v[6:7]
+; CI-NEXT: v_add_f64 v[4:5], v[6:7], -v[4:5]
+; CI-NEXT: v_cmp_gt_f64_e32 vcc, 0, v[4:5]
+; CI-NEXT: v_add_f64 v[9:10], v[4:5], 1.0
+; CI-NEXT: v_cndmask_b32_e32 v5, v5,... [truncated]
frederik-h removed the request for review from FreddyLeaf on Oct 20, 2025 05:57
frederik-h merged commit 46a866a into llvm:main on Oct 20, 2025
10 checks passed
frederik-h deleted the expand-fp-refactor-modified branch on Oct 20, 2025 08:24
@llvm-ci
Collaborator

llvm-ci commented Oct 20, 2025

LLVM Buildbot has detected a new failure on builder llvm-clang-x86_64-sie-ubuntu-fast running on sie-linux-worker while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/144/builds/38187

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'lld :: ELF/linkerscript/empty-relaplt-dyntags.test' FAILED ********************
Exit Code: 250

Command Output (stdout):
--
# RUN: at line 2
/home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/build/bin/llvm-mc -filetype=obj -triple=x86_64-pc-linux /dev/null -o /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/build/tools/lld/test/ELF/linkerscript/Output/empty-relaplt-dyntags.test.tmp.o
# executed command: /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/build/bin/llvm-mc -filetype=obj -triple=x86_64-pc-linux /dev/null -o /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/build/tools/lld/test/ELF/linkerscript/Output/empty-relaplt-dyntags.test.tmp.o
# note: command had no output on stdout or stderr
# RUN: at line 3
/home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/build/bin/ld.lld -shared /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/build/tools/lld/test/ELF/linkerscript/Output/empty-relaplt-dyntags.test.tmp.o -T /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/llvm-project/lld/test/ELF/linkerscript/empty-relaplt-dyntags.test -o /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/build/tools/lld/test/ELF/linkerscript/Output/empty-relaplt-dyntags.test.tmp
# executed command: /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/build/bin/ld.lld -shared /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/build/tools/lld/test/ELF/linkerscript/Output/empty-relaplt-dyntags.test.tmp.o -T /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/llvm-project/lld/test/ELF/linkerscript/empty-relaplt-dyntags.test -o /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/build/tools/lld/test/ELF/linkerscript/Output/empty-relaplt-dyntags.test.tmp
# .---command stderr------------
# | terminate called after throwing an instance of 'std::system_error'
# | what(): Resource temporarily unavailable
# | PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace and instructions to reproduce the bug.
# | ld.lld: error: failed to write output '/home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/build/tools/lld/test/ELF/linkerscript/Output/empty-relaplt-dyntags.test.tmp': No such file or directory
# | #0 0x00006248b7533fb0 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/build/bin/ld.lld+0x657fb0)
# | #1 0x00006248b7530c6f llvm::sys::RunSignalHandlers() (/home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/build/bin/ld.lld+0x654c6f)
# | #2 0x00006248b7530dc2 SignalHandler(int, siginfo_t*, void*) Signals.cpp:0:0
# | #3 0x000078c31377c520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
# | #4 0x000078c3137d09fc __pthread_kill_implementation ./nptl/pthread_kill.c:44:76
# | #5 0x000078c3137d09fc __pthread_kill_internal ./nptl/pthread_kill.c:78:10
# | #6 0x000078c3137d09fc pthread_kill ./nptl/pthread_kill.c:89:10
# | #7 0x000078c31377c476 gsignal ./signal/../sysdeps/posix/raise.c:27:6
# | #8 0x000078c3137627f3 abort ./stdlib/abort.c:81:7
# | #9 0x000078c313b0eb9e (/lib/x86_64-linux-gnu/libstdc++.so.6+0xa2b9e)
# | #10 0x000078c313b1a20c (/lib/x86_64-linux-gnu/libstdc++.so.6+0xae20c)
# | #11 0x000078c313b1a277 (/lib/x86_64-linux-gnu/libstdc++.so.6+0xae277)
# | #12 0x000078c313b1a4d8 (/lib/x86_64-linux-gnu/libstdc++.so.6+0xae4d8)
# | #13 0x000078c313b1183c std::__throw_system_error(int) (/lib/x86_64-linux-gnu/libstdc++.so.6+0xa583c)
# | #14 0x000078c313b4833d (/lib/x86_64-linux-gnu/libstdc++.so.6+0xdc33d)
# | #15 0x00006248baad569e std::thread::_State_impl<std::thread::_Invoker<std::tuple<llvm::parallel::detail::(anonymous namespace)::ThreadPoolExecutor::ThreadPoolExecutor(llvm::ThreadPoolStrategy)::'lambda'()>>>::_M_run() Parallel.cpp:0:0
# | #16 0x000078c313b48253 (/lib/x86_64-linux-gnu/libstdc++.so.6+0xdc253)
# | #17 0x000078c3137ceac3 start_thread ./nptl/pthread_create.c:442:8
# | #18 0x000078c313860850 ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:83:0
# `-----------------------------
# error: command failed with exit status: 250

--

********************
@llvm-ci
Collaborator

llvm-ci commented Oct 20, 2025

LLVM Buildbot has detected a new failure on builder clang-m68k-linux-cross running on suse-gary-m68k-cross while building llvm at step 5 "ninja check 1".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/27/builds/17750

Here is the relevant piece of the build log for the reference
Step 5 (ninja check 1) failure: stage 1 checked (failure)
******************** TEST 'LLVM :: tools/dsymutil/X86/objc.test' FAILED ********************
Exit Code: 250

Command Output (stdout):
--
# RUN: at line 1
/var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/stage1/bin/dsymutil --verify-dwarf=output -f -oso-prepend-path=/var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/llvm/test/tools/dsymutil/X86/.. /var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/llvm/test/tools/dsymutil/X86/../Inputs/objc.macho.x86_64 -o /var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/stage1/test/tools/dsymutil/X86/Output/objc.test.tmp.d4
# executed command: /var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/stage1/bin/dsymutil --verify-dwarf=output -f -oso-prepend-path=/var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/llvm/test/tools/dsymutil/X86/.. /var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/llvm/test/tools/dsymutil/X86/../Inputs/objc.macho.x86_64 -o /var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/stage1/test/tools/dsymutil/X86/Output/objc.test.tmp.d4
# .---command stderr------------
# | warning: /var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/llvm/test/tools/dsymutil/X86/../Inputs/objc.macho.x86_64.o: timestamp mismatch between object file (2024-10-09 15:48:55.357038027) and debug map (2015-03-27 02:12:27.000000000)
# `-----------------------------
# RUN: at line 2
/var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/stage1/bin/llvm-dwarfdump -apple-types -apple-objc /var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/stage1/test/tools/dsymutil/X86/Output/objc.test.tmp.d4 | /var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/stage1/bin/FileCheck /var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/llvm/test/tools/dsymutil/X86/objc.test
# executed command: /var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/stage1/bin/llvm-dwarfdump -apple-types -apple-objc /var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/stage1/test/tools/dsymutil/X86/Output/objc.test.tmp.d4
# executed command: /var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/stage1/bin/FileCheck /var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/llvm/test/tools/dsymutil/X86/objc.test
# RUN: at line 5
/var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/stage1/bin/dsymutil --verify-dwarf=output --accelerator='Dwarf' -f -oso-prepend-path=/var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/llvm/test/tools/dsymutil/X86/.. /var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/llvm/test/tools/dsymutil/X86/../Inputs/objc.macho.x86_64 -o /var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/stage1/test/tools/dsymutil/X86/Output/objc.test.tmp.d5
# executed command: /var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/stage1/bin/dsymutil --verify-dwarf=output --accelerator=Dwarf -f -oso-prepend-path=/var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/llvm/test/tools/dsymutil/X86/.. /var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/llvm/test/tools/dsymutil/X86/../Inputs/objc.macho.x86_64 -o /var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/stage1/test/tools/dsymutil/X86/Output/objc.test.tmp.d5
# .---command stderr------------
# | warning: /var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/llvm/test/tools/dsymutil/X86/../Inputs/objc.macho.x86_64.o: timestamp mismatch between object file (2024-10-09 15:48:55.357038027) and debug map (2015-03-27 02:12:27.000000000)
# | terminate called after throwing an instance of 'std::system_error'
# | what(): Resource temporarily unavailable
# | PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace and instructions to reproduce the bug.
# | #0 0x00000000014eea9b llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/stage1/bin/dsymutil+0x14eea9b)
# | #1 0x00000000014ebd3a SignalHandler(int, siginfo_t*, void*) Signals.cpp:0:0
# | #2 0x00007fe559e41580 __restore_rt (/lib64/libc.so.6+0x41580)
# | #3 0x00007fe559e9a25c __pthread_kill_implementation (/lib64/libc.so.6+0x9a25c)
# | #4 0x00007fe559e414b6 gsignal (/lib64/libc.so.6+0x414b6)
# | #5 0x00007fe559e2891a abort (/lib64/libc.so.6+0x2891a)
# | #6 0x00007fe55a2adc4d (/lib64/libstdc++.so.6+0xadc4d)
# | #7 0x00007fe55a2bf2ec (/lib64/libstdc++.so.6+0xbf2ec)
# | #8 0x00007fe55a2ad7f5 std::unexpected() (/lib64/libstdc++.so.6+0xad7f5)
# | #9 0x00007fe55a2bf578 (/lib64/libstdc++.so.6+0xbf578)
# | #10 0x00007fe55a2b0ffd std::__throw_system_error(int) (/lib64/libstdc++.so.6+0xb0ffd)
# | #11 0x00007fe55a2b1055 (/lib64/libstdc++.so.6+0xb1055)
# | #12 0x000000000147c743 std::thread::_State_impl<std::thread::_Invoker<std::tuple<llvm::parallel::detail::(anonymous namespace)::ThreadPoolExecutor::ThreadPoolExecutor(llvm::ThreadPoolStrategy)::'lambda'()>>>::_M_run() Parallel.cpp:0:0
# | #13 0x00007fe55a2ed424 (/lib64/libstdc++.so.6+0xed424)
# | #14 0x00007fe559e983b2 start_thread (/lib64/libc.so.6+0x983b2)
# | #15 0x00007fe559f1d5fc __GI___clone3 (/lib64/libc.so.6+0x11d5fc)
# `-----------------------------
# error: command failed with exit status: 250

--

********************
Lukacma pushed a commit to Lukacma/llvm-project that referenced this pull request Oct 29, 2025
Set the return value of the runImpl function, which indicates whether the IR has been changed, in a single place instead of setting it separately for each instruction when it is inserted into the worklist. Further changes: replace the if-else chain in the worklist processing loop with a switch, and add test cases which demonstrate that the "scalarize" function does not always add items to the worklist, so a worklist emptiness check cannot be used for the runImpl return value.
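The point about the worklist emptiness check can be illustrated with a toy model. Everything here is hypothetical (`Value`, `scalarizeToy`, `runToy`, `Result`); in particular, the idea that element operations on constants simply fold away and therefore never reach the worklist is an assumption made for the sketch, not something the commit message spells out.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model: an element operation is either folded to a constant
// or remains a real instruction that still needs expansion.
struct Value {
  bool IsConstant;
};

// Toy counterpart of a scalarizing step. It always rewrites the vector
// operation, but only queues the element operations that are still
// instructions (assumption: constant elements fold away).
void scalarizeToy(const std::vector<Value> &Elements,
                  std::vector<const Value *> &Worklist) {
  for (const Value &E : Elements)
    if (!E.IsConstant)
      Worklist.push_back(&E);
}

struct Result {
  bool Modified;
  bool WorklistEmpty;
};

Result runToy(const std::vector<std::vector<Value>> &VectorOps) {
  std::vector<const Value *> Worklist;
  bool Modified = false;
  for (const std::vector<Value> &Op : VectorOps) {
    scalarizeToy(Op, Worklist);
    Modified = true; // the vector op was rewritten regardless of what was queued
  }
  return {Modified, Worklist.empty()};
}
```

With a single vector operation whose elements are all constant, `runToy` reports `Modified == true` while the worklist stays empty, which is why an explicit flag is tracked in this sketch rather than derived from the worklist.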
aokblast pushed a commit to aokblast/llvm-project that referenced this pull request Oct 30, 2025