Skip to content

Conversation

@diggerlin
Copy link
Contributor

@diggerlin diggerlin commented Jun 13, 2025

LBARX loads a byte from memory into a register, automatically setting the remaining bits of the register to zero. If a subsequent RLWINM instruction is used to clear those same bits (which LBARX has already set to zero), the RLWINM is redundant and can be eliminated.

these redundant clear instructions are introduced by 85a9f2e.

@llvmbot
Copy link
Member

llvmbot commented Jun 13, 2025

@llvm/pr-subscribers-backend-powerpc

Author: zhijian lin (diggerlin)

Changes

Patch is 106.82 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/144089.diff

5 Files Affected:

  • (modified) llvm/lib/Target/PowerPC/PPCMIPeephole.cpp (+34-1)
  • (modified) llvm/test/CodeGen/PowerPC/PR35812-neg-cmpxchg.ll (+45-47)
  • (modified) llvm/test/CodeGen/PowerPC/all-atomics.ll (+950-963)
  • (modified) llvm/test/CodeGen/PowerPC/atomics-regression.ll (-68)
  • (modified) llvm/test/CodeGen/PowerPC/loop-comment.ll (+6-7)
diff --git a/llvm/lib/Target/PowerPC/PPCMIPeephole.cpp b/llvm/lib/Target/PowerPC/PPCMIPeephole.cpp index 18350650bfe2d..736443d555b9f 100644 --- a/llvm/lib/Target/PowerPC/PPCMIPeephole.cpp +++ b/llvm/lib/Target/PowerPC/PPCMIPeephole.cpp @@ -1281,7 +1281,40 @@ bool PPCMIPeephole::simplifyCode() { Simplified = true; break; } - case PPC::RLWINM: + case PPC::RLWINM: { + Register SrcReg = MI.getOperand(1).getReg(); + MachineInstr *DefMI = MRI->getVRegDef(SrcReg); + + if (DefMI) { + unsigned Opcode = DefMI->getOpcode(); + if (Opcode == PPC::LBARX || Opcode == PPC::LHARX) { + unsigned SH = MI.getOperand(2).getImm(); + unsigned MB = MI.getOperand(3).getImm(); + unsigned ME = MI.getOperand(4).getImm(); + + // LBARX already sets the upper 24 bits of the destination register + // to zero. If the register is cleared to zero in the upper 24 bits + // using RLWINM later, we eliminate the RLWINM. Same applies to + // LHARX. + if (SH == 0 && ME == 31 && + ((MB == 24 && Opcode == PPC::LBARX) || + (MB == 16 && Opcode == PPC::LHARX))) { + Register SrcReg = MI.getOperand(0).getReg(); + Register DstReg = + DefMI->getOperand(0).getReg(); + + // Replace all uses of RLWINM's result with LBARX result + MRI->replaceRegWith(DstReg, SrcReg); + addRegToUpdate(DstReg); + addRegToUpdate(SrcReg); + ToErase = &MI; + Simplified = true; + break; + } + } + } + [[fall_through]]; + } case PPC::RLWINM_rec: case PPC::RLWINM8: case PPC::RLWINM8_rec: { diff --git a/llvm/test/CodeGen/PowerPC/PR35812-neg-cmpxchg.ll b/llvm/test/CodeGen/PowerPC/PR35812-neg-cmpxchg.ll index 1a8dabc5ad719..b7852c3c3e6e0 100644 --- a/llvm/test/CodeGen/PowerPC/PR35812-neg-cmpxchg.ll +++ b/llvm/test/CodeGen/PowerPC/PR35812-neg-cmpxchg.ll @@ -18,54 +18,52 @@ define signext i32 @main() nounwind { ; CHECK-NEXT: sth 3, 46(1) ; CHECK-NEXT: addi 3, 1, 46 ; CHECK-NEXT: lharx 4, 0, 3 -; CHECK-NEXT: clrlwi 4, 4, 16 -; CHECK-NEXT: cmplwi 4, 33059 -; CHECK-NEXT: bne 0, .LBB0_4 -; CHECK-NEXT: # %bb.1: # %cmpxchg.fencedstore +; CHECK-NEXT: cmplwi 4, 33059 +; CHECK-NEXT: bne 0, .LBB0_4 +; CHECK-NEXT: # %bb.1: # %cmpxchg.fencedstore ; CHECK-NEXT: sync ; CHECK-NEXT: li 4, 234 -; CHECK-NEXT: .p2align 5 -; CHECK-NEXT: .LBB0_2: # %cmpxchg.trystore -; CHECK-NEXT: # =>This Inner Loop Header: Depth=1 +; CHECK-NEXT: .p2align 5 +; CHECK-NEXT: .LBB0_2: # %cmpxchg.trystore +; CHECK-NEXT: # ; CHECK-NEXT: sthcx. 4, 0, 3 -; CHECK-NEXT: beq 0, .LBB0_7 -; CHECK-NEXT: # %bb.3: # %cmpxchg.releasedload -; CHECK-NEXT: # in Loop: Header=BB0_2 Depth=1 +; CHECK-NEXT: beq 0, .LBB0_7 +; CHECK-NEXT: # %bb.3: # %cmpxchg.releasedload +; CHECK-NEXT: # ; CHECK-NEXT: lharx 5, 0, 3 -; CHECK-NEXT: clrlwi 5, 5, 16 -; CHECK-NEXT: cmplwi 5, 33059 -; CHECK-NEXT: beq 0, .LBB0_2 -; CHECK-NEXT: .LBB0_4: # %cmpxchg.nostore +; CHECK-NEXT: cmplwi 5, 33059 +; CHECK-NEXT: beq 0, .LBB0_2 +; CHECK-NEXT: .LBB0_4: # %cmpxchg.nostore ; CHECK-NEXT: lwsync ; CHECK-NEXT: b .LBB0_8 -; CHECK-NEXT: .LBB0_5: # %L.B0000 +; CHECK-NEXT: .LBB0_5: # %L.B0000 ; CHECK-NEXT: lhz 3, 46(1) -; CHECK-NEXT: cmplwi 3, 234 -; CHECK-NEXT: bne 0, .LBB0_9 -; CHECK-NEXT: # %bb.6: # %L.B0001 +; CHECK-NEXT: cmplwi 3, 234 +; CHECK-NEXT: bne 0, .LBB0_9 +; CHECK-NEXT: # %bb.6: # %L.B0001 ; CHECK-NEXT: addis 3, 2, .L_MergedGlobals@toc@ha ; CHECK-NEXT: addi 3, 3, .L_MergedGlobals@toc@l ; CHECK-NEXT: bl puts ; CHECK-NEXT: nop ; CHECK-NEXT: li 3, 0 ; CHECK-NEXT: b .LBB0_11 -; CHECK-NEXT: .LBB0_7: # %cmpxchg.success +; CHECK-NEXT: .LBB0_7: # %cmpxchg.success ; CHECK-NEXT: lwsync ; CHECK-NEXT: b .LBB0_5 -; CHECK-NEXT: .LBB0_8: # %L.B0003 +; CHECK-NEXT: .LBB0_8: # %L.B0003 ; CHECK-NEXT: addis 3, 2, .L_MergedGlobals@toc@ha ; CHECK-NEXT: addi 3, 3, .L_MergedGlobals@toc@l ; CHECK-NEXT: addi 3, 3, 16 ; CHECK-NEXT: b .LBB0_10 -; CHECK-NEXT: .LBB0_9: # %L.B0005 +; CHECK-NEXT: .LBB0_9: # %L.B0005 ; CHECK-NEXT: addis 3, 2, .L_MergedGlobals@toc@ha ; CHECK-NEXT: addi 3, 3, .L_MergedGlobals@toc@l ; CHECK-NEXT: addi 3, 3, 64 -; CHECK-NEXT: .LBB0_10: # %L.B0003 +; CHECK-NEXT: .LBB0_10: # %L.B0003 ; CHECK-NEXT: bl puts ; CHECK-NEXT: nop ; CHECK-NEXT: li 3, 1 -; CHECK-NEXT: .LBB0_11: # %L.B0003 +; CHECK-NEXT: .LBB0_11: # %L.B0003 ; CHECK-NEXT: addi 1, 1, 48 ; CHECK-NEXT: ld 0, 16(1) ; CHECK-NEXT: mtlr 0 @@ -83,62 +81,62 @@ define signext i32 @main() nounwind { ; CHECK-P7-NEXT: rlwinm 4, 4, 3, 27, 27 ; CHECK-P7-NEXT: lwarx 5, 0, 3 ; CHECK-P7-NEXT: srw 6, 5, 4 -; CHECK-P7-NEXT: clrlwi 6, 6, 16 -; CHECK-P7-NEXT: cmplwi 6, 33059 -; CHECK-P7-NEXT: bne 0, .LBB0_4 -; CHECK-P7-NEXT: # %bb.1: # %cmpxchg.fencedstore +; CHECK-P7-NEXT: clrlwi 6, 6, 16 +; CHECK-P7-NEXT: cmplwi 6, 33059 +; CHECK-P7-NEXT: bne 0, .LBB0_4 +; CHECK-P7-NEXT: # %bb.1: # %cmpxchg.fencedstore ; CHECK-P7-NEXT: lis 6, 0 ; CHECK-P7-NEXT: li 7, 234 ; CHECK-P7-NEXT: sync ; CHECK-P7-NEXT: ori 6, 6, 65535 ; CHECK-P7-NEXT: slw 7, 7, 4 ; CHECK-P7-NEXT: slw 6, 6, 4 -; CHECK-P7-NEXT: not 6, 6 -; CHECK-P7-NEXT: .p2align 4 -; CHECK-P7-NEXT: .LBB0_2: # %cmpxchg.trystore -; CHECK-P7-NEXT: # =>This Inner Loop Header: Depth=1 +; CHECK-P7-NEXT: not 6, 6 +; CHECK-P7-NEXT: .p2align 4 +; CHECK-P7-NEXT: .LBB0_2: # %cmpxchg.trystore +; CHECK-P7-NEXT: # ; CHECK-P7-NEXT: and 5, 5, 6 ; CHECK-P7-NEXT: or 5, 5, 7 ; CHECK-P7-NEXT: stwcx. 5, 0, 3 -; CHECK-P7-NEXT: beq 0, .LBB0_7 -; CHECK-P7-NEXT: # %bb.3: # %cmpxchg.releasedload -; CHECK-P7-NEXT: # in Loop: Header=BB0_2 Depth=1 +; CHECK-P7-NEXT: beq 0, .LBB0_7 +; CHECK-P7-NEXT: # %bb.3: # %cmpxchg.releasedload +; CHECK-P7-NEXT: # ; CHECK-P7-NEXT: lwarx 5, 0, 3 ; CHECK-P7-NEXT: srw 8, 5, 4 -; CHECK-P7-NEXT: clrlwi 8, 8, 16 -; CHECK-P7-NEXT: cmplwi 8, 33059 -; CHECK-P7-NEXT: beq 0, .LBB0_2 -; CHECK-P7-NEXT: .LBB0_4: # %cmpxchg.nostore +; CHECK-P7-NEXT: clrlwi 8, 8, 16 +; CHECK-P7-NEXT: cmplwi 8, 33059 +; CHECK-P7-NEXT: beq 0, .LBB0_2 +; CHECK-P7-NEXT: .LBB0_4: # %cmpxchg.nostore ; CHECK-P7-NEXT: lwsync ; CHECK-P7-NEXT: b .LBB0_8 -; CHECK-P7-NEXT: .LBB0_5: # %L.B0000 +; CHECK-P7-NEXT: .LBB0_5: # %L.B0000 ; CHECK-P7-NEXT: lhz 3, 46(1) -; CHECK-P7-NEXT: cmplwi 3, 234 -; CHECK-P7-NEXT: bne 0, .LBB0_9 -; CHECK-P7-NEXT: # %bb.6: # %L.B0001 +; CHECK-P7-NEXT: cmplwi 3, 234 +; CHECK-P7-NEXT: bne 0, .LBB0_9 +; CHECK-P7-NEXT: # %bb.6: # %L.B0001 ; CHECK-P7-NEXT: addis 3, 2, .L_MergedGlobals@toc@ha ; CHECK-P7-NEXT: addi 3, 3, .L_MergedGlobals@toc@l ; CHECK-P7-NEXT: bl puts ; CHECK-P7-NEXT: nop ; CHECK-P7-NEXT: li 3, 0 ; CHECK-P7-NEXT: b .LBB0_11 -; CHECK-P7-NEXT: .LBB0_7: # %cmpxchg.success +; CHECK-P7-NEXT: .LBB0_7: # %cmpxchg.success ; CHECK-P7-NEXT: lwsync ; CHECK-P7-NEXT: b .LBB0_5 -; CHECK-P7-NEXT: .LBB0_8: # %L.B0003 +; CHECK-P7-NEXT: .LBB0_8: # %L.B0003 ; CHECK-P7-NEXT: addis 3, 2, .L_MergedGlobals@toc@ha ; CHECK-P7-NEXT: addi 3, 3, .L_MergedGlobals@toc@l ; CHECK-P7-NEXT: addi 3, 3, 16 ; CHECK-P7-NEXT: b .LBB0_10 -; CHECK-P7-NEXT: .LBB0_9: # %L.B0005 +; CHECK-P7-NEXT: .LBB0_9: # %L.B0005 ; CHECK-P7-NEXT: addis 3, 2, .L_MergedGlobals@toc@ha ; CHECK-P7-NEXT: addi 3, 3, .L_MergedGlobals@toc@l ; CHECK-P7-NEXT: addi 3, 3, 64 -; CHECK-P7-NEXT: .LBB0_10: # %L.B0003 +; CHECK-P7-NEXT: .LBB0_10: # %L.B0003 ; CHECK-P7-NEXT: bl puts ; CHECK-P7-NEXT: nop ; CHECK-P7-NEXT: li 3, 1 -; CHECK-P7-NEXT: .LBB0_11: # %L.B0003 +; CHECK-P7-NEXT: .LBB0_11: # %L.B0003 ; CHECK-P7-NEXT: addi 1, 1, 48 ; CHECK-P7-NEXT: ld 0, 16(1) ; CHECK-P7-NEXT: mtlr 0 diff --git a/llvm/test/CodeGen/PowerPC/all-atomics.ll b/llvm/test/CodeGen/PowerPC/all-atomics.ll index 67cee358882ff..07afea75aec67 100644 --- a/llvm/test/CodeGen/PowerPC/all-atomics.ll +++ b/llvm/test/CodeGen/PowerPC/all-atomics.ll @@ -4336,959 +4336,943 @@ entry: define dso_local void @test_compare_and_swap() local_unnamed_addr #0 { ; CHECK-LABEL: test_compare_and_swap: ; CHECK: # %bb.0: # %entry -; CHECK-NEXT: addis 4, 2, sc@toc@ha -; CHECK-NEXT: addis 3, 2, uc@toc@ha -; CHECK-NEXT: std 27, -40(1) # 8-byte Folded Spill -; CHECK-NEXT: std 28, -32(1) # 8-byte Folded Spill -; CHECK-NEXT: std 29, -24(1) # 8-byte Folded Spill -; CHECK-NEXT: std 30, -16(1) # 8-byte Folded Spill -; CHECK-NEXT: addi 6, 4, sc@toc@l -; CHECK-NEXT: lbz 7, uc@toc@l(3) -; CHECK-NEXT: lbz 8, sc@toc@l(4) -; CHECK-NEXT: lbarx 5, 0, 6 -; CHECK-NEXT: clrlwi 9, 5, 24 -; CHECK-NEXT: cmplw 9, 7 -; CHECK-NEXT: bne 0, .LBB3_4 -; CHECK-NEXT: # %bb.1: # %cmpxchg.fencedstore276 -; CHECK-NEXT: sync -; CHECK-NEXT: .p2align 5 -; CHECK-NEXT: .LBB3_2: # %cmpxchg.trystore275 -; CHECK-NEXT: # =>This Inner Loop Header: Depth=1 -; CHECK-NEXT: stbcx. 8, 0, 6 -; CHECK-NEXT: beq 0, .LBB3_4 -; CHECK-NEXT: # %bb.3: # %cmpxchg.releasedload274 -; CHECK-NEXT: # in Loop: Header=BB3_2 Depth=1 -; CHECK-NEXT: lbarx 5, 0, 6 -; CHECK-NEXT: clrlwi 9, 5, 24 -; CHECK-NEXT: cmplw 9, 7 -; CHECK-NEXT: beq 0, .LBB3_2 -; CHECK-NEXT: .LBB3_4: # %cmpxchg.nostore272 -; CHECK-NEXT: addi 7, 3, uc@toc@l -; CHECK-NEXT: lwsync -; CHECK-NEXT: stb 5, sc@toc@l(4) -; CHECK-NEXT: lbz 9, uc@toc@l(3) -; CHECK-NEXT: lbarx 8, 0, 7 -; CHECK-NEXT: clrlwi 10, 8, 24 -; CHECK-NEXT: cmplw 10, 9 -; CHECK-NEXT: bne 0, .LBB3_8 -; CHECK-NEXT: # %bb.5: # %cmpxchg.fencedstore257 -; CHECK-NEXT: sync -; CHECK-NEXT: clrlwi 5, 5, 24 -; CHECK-NEXT: .p2align 5 -; CHECK-NEXT: .LBB3_6: # %cmpxchg.trystore256 -; CHECK-NEXT: # =>This Inner Loop Header: Depth=1 -; CHECK-NEXT: stbcx. 5, 0, 7 -; CHECK-NEXT: beq 0, .LBB3_8 -; CHECK-NEXT: # %bb.7: # %cmpxchg.releasedload255 -; CHECK-NEXT: # in Loop: Header=BB3_6 Depth=1 -; CHECK-NEXT: lbarx 8, 0, 7 -; CHECK-NEXT: clrlwi 10, 8, 24 -; CHECK-NEXT: cmplw 10, 9 -; CHECK-NEXT: beq 0, .LBB3_6 -; CHECK-NEXT: .LBB3_8: # %cmpxchg.nostore253 -; CHECK-NEXT: addis 5, 2, ss@toc@ha -; CHECK-NEXT: lwsync -; CHECK-NEXT: stb 8, uc@toc@l(3) -; CHECK-NEXT: clrlwi 10, 8, 24 -; CHECK-NEXT: lbz 11, sc@toc@l(4) -; CHECK-NEXT: addi 8, 5, ss@toc@l -; CHECK-NEXT: lharx 9, 0, 8 -; CHECK-NEXT: clrlwi 12, 9, 16 -; CHECK-NEXT: cmplw 12, 10 -; CHECK-NEXT: bne 0, .LBB3_12 -; CHECK-NEXT: # %bb.9: # %cmpxchg.fencedstore238 -; CHECK-NEXT: extsb 11, 11 -; CHECK-NEXT: sync -; CHECK-NEXT: clrlwi 11, 11, 16 -; CHECK-NEXT: .p2align 5 -; CHECK-NEXT: .LBB3_10: # %cmpxchg.trystore237 -; CHECK-NEXT: # =>This Inner Loop Header: Depth=1 -; CHECK-NEXT: sthcx. 11, 0, 8 -; CHECK-NEXT: beq 0, .LBB3_12 -; CHECK-NEXT: # %bb.11: # %cmpxchg.releasedload236 -; CHECK-NEXT: # in Loop: Header=BB3_10 Depth=1 -; CHECK-NEXT: lharx 9, 0, 8 -; CHECK-NEXT: clrlwi 12, 9, 16 -; CHECK-NEXT: cmplw 12, 10 -; CHECK-NEXT: beq 0, .LBB3_10 -; CHECK-NEXT: .LBB3_12: # %cmpxchg.nostore234 -; CHECK-NEXT: lwsync -; CHECK-NEXT: sth 9, ss@toc@l(5) -; CHECK-NEXT: addis 5, 2, us@toc@ha -; CHECK-NEXT: lbz 11, uc@toc@l(3) -; CHECK-NEXT: lbz 12, sc@toc@l(4) -; CHECK-NEXT: addi 9, 5, us@toc@l -; CHECK-NEXT: lharx 10, 0, 9 -; CHECK-NEXT: clrlwi 0, 10, 16 -; CHECK-NEXT: cmplw 0, 11 -; CHECK-NEXT: bne 0, .LBB3_16 -; CHECK-NEXT: # %bb.13: # %cmpxchg.fencedstore219 -; CHECK-NEXT: extsb 12, 12 -; CHECK-NEXT: sync -; CHECK-NEXT: clrlwi 12, 12, 16 -; CHECK-NEXT: .p2align 5 -; CHECK-NEXT: .LBB3_14: # %cmpxchg.trystore218 -; CHECK-NEXT: # =>This Inner Loop Header: Depth=1 -; CHECK-NEXT: sthcx. 12, 0, 9 -; CHECK-NEXT: beq 0, .LBB3_16 -; CHECK-NEXT: # %bb.15: # %cmpxchg.releasedload217 -; CHECK-NEXT: # in Loop: Header=BB3_14 Depth=1 -; CHECK-NEXT: lharx 10, 0, 9 -; CHECK-NEXT: clrlwi 0, 10, 16 -; CHECK-NEXT: cmplw 0, 11 -; CHECK-NEXT: beq 0, .LBB3_14 -; CHECK-NEXT: .LBB3_16: # %cmpxchg.nostore215 -; CHECK-NEXT: lwsync -; CHECK-NEXT: sth 10, us@toc@l(5) -; CHECK-NEXT: addis 5, 2, si@toc@ha -; CHECK-NEXT: lbz 12, uc@toc@l(3) -; CHECK-NEXT: lbz 0, sc@toc@l(4) -; CHECK-NEXT: addi 10, 5, si@toc@l -; CHECK-NEXT: lwarx 11, 0, 10 -; CHECK-NEXT: cmplw 11, 12 -; CHECK-NEXT: bne 0, .LBB3_20 -; CHECK-NEXT: # %bb.17: # %cmpxchg.fencedstore200 -; CHECK-NEXT: extsb 0, 0 -; CHECK-NEXT: sync -; CHECK-NEXT: .p2align 5 -; CHECK-NEXT: .LBB3_18: # %cmpxchg.trystore199 -; CHECK-NEXT: # =>This Inner Loop Header: Depth=1 -; CHECK-NEXT: stwcx. 0, 0, 10 -; CHECK-NEXT: beq 0, .LBB3_20 -; CHECK-NEXT: # %bb.19: # %cmpxchg.releasedload198 -; CHECK-NEXT: # in Loop: Header=BB3_18 Depth=1 -; CHECK-NEXT: lwarx 11, 0, 10 -; CHECK-NEXT: cmplw 11, 12 -; CHECK-NEXT: beq 0, .LBB3_18 -; CHECK-NEXT: .LBB3_20: # %cmpxchg.nostore196 -; CHECK-NEXT: lwsync -; CHECK-NEXT: stw 11, si@toc@l(5) -; CHECK-NEXT: addis 5, 2, ui@toc@ha -; CHECK-NEXT: lbz 0, uc@toc@l(3) -; CHECK-NEXT: lbz 30, sc@toc@l(4) -; CHECK-NEXT: addi 11, 5, ui@toc@l -; CHECK-NEXT: lwarx 12, 0, 11 -; CHECK-NEXT: cmplw 12, 0 -; CHECK-NEXT: bne 0, .LBB3_24 -; CHECK-NEXT: # %bb.21: # %cmpxchg.fencedstore181 -; CHECK-NEXT: extsb 30, 30 -; CHECK-NEXT: sync -; CHECK-NEXT: .p2align 5 -; CHECK-NEXT: .LBB3_22: # %cmpxchg.trystore180 -; CHECK-NEXT: # =>This Inner Loop Header: Depth=1 -; CHECK-NEXT: stwcx. 30, 0, 11 -; CHECK-NEXT: beq 0, .LBB3_24 -; CHECK-NEXT: # %bb.23: # %cmpxchg.releasedload179 -; CHECK-NEXT: # in Loop: Header=BB3_22 Depth=1 -; CHECK-NEXT: lwarx 12, 0, 11 -; CHECK-NEXT: cmplw 12, 0 -; CHECK-NEXT: beq 0, .LBB3_22 -; CHECK-NEXT: .LBB3_24: # %cmpxchg.nostore177 -; CHECK-NEXT: addis 30, 2, sll@toc@ha -; CHECK-NEXT: lwsync -; CHECK-NEXT: stw 12, ui@toc@l(5) -; CHECK-NEXT: lbz 29, uc@toc@l(3) -; CHECK-NEXT: lbz 28, sc@toc@l(4) -; CHECK-NEXT: addi 12, 30, sll@toc@l -; CHECK-NEXT: ldarx 0, 0, 12 -; CHECK-NEXT: cmpld 0, 29 -; CHECK-NEXT: bne 0, .LBB3_28 -; CHECK-NEXT: # %bb.25: # %cmpxchg.fencedstore162 -; CHECK-NEXT: extsb 28, 28 -; CHECK-NEXT: sync -; CHECK-NEXT: .p2align 5 -; CHECK-NEXT: .LBB3_26: # %cmpxchg.trystore161 -; CHECK-NEXT: # =>This Inner Loop Header: Depth=1 -; CHECK-NEXT: stdcx. 28, 0, 12 -; CHECK-NEXT: beq 0, .LBB3_28 -; CHECK-NEXT: # %bb.27: # %cmpxchg.releasedload160 -; CHECK-NEXT: # in Loop: Header=BB3_26 Depth=1 -; CHECK-NEXT: ldarx 0, 0, 12 -; CHECK-NEXT: cmpld 0, 29 -; CHECK-NEXT: beq 0, .LBB3_26 -; CHECK-NEXT: .LBB3_28: # %cmpxchg.nostore158 -; CHECK-NEXT: lwsync -; CHECK-NEXT: std 0, sll@toc@l(30) -; CHECK-NEXT: addis 30, 2, ull@toc@ha -; CHECK-NEXT: lbz 28, uc@toc@l(3) -; CHECK-NEXT: lbz 27, sc@toc@l(4) -; CHECK-NEXT: addi 0, 30, ull@toc@l -; CHECK-NEXT: ldarx 29, 0, 0 -; CHECK-NEXT: cmpld 29, 28 -; CHECK-NEXT: bne 0, .LBB3_32 -; CHECK-NEXT: # %bb.29: # %cmpxchg.fencedstore143 -; CHECK-NEXT: extsb 27, 27 -; CHECK-NEXT: sync -; CHECK-NEXT: .p2align 5 -; CHECK-NEXT: .LBB3_30: # %cmpxchg.trystore142 -; CHECK-NEXT: # =>This Inner Loop Header: Depth=1 -; CHECK-NEXT: stdcx. 27, 0, 0 -; CHECK-NEXT: beq 0, .LBB3_32 -; CHECK-NEXT: # %bb.31: # %cmpxchg.releasedload141 -; CHECK-NEXT: # in Loop: Header=BB3_30 Depth=1 -; CHECK-NEXT: ldarx 29, 0, 0 -; CHECK-NEXT: cmpld 29, 28 -; CHECK-NEXT: beq 0, .LBB3_30 -; CHECK-NEXT: .LBB3_32: # %cmpxchg.nostore139 -; CHECK-NEXT: lwsync -; CHECK-NEXT: std 29, ull@toc@l(30) -; CHECK-NEXT: lbz 30, uc@toc@l(3) -; CHECK-NEXT: lbz 29, sc@toc@l(4) -; CHECK-NEXT: lbarx 28, 0, 6 -; CHECK-NEXT: clrlwi 28, 28, 24 -; CHECK-NEXT: cmplw 28, 30 -; CHECK-NEXT: bne 0, .LBB3_36 -; CHECK-NEXT: # %bb.33: # %cmpxchg.fencedstore124 -; CHECK-NEXT: sync -; CHECK-NEXT: .p2align 5 -; CHECK-NEXT: .LBB3_34: # %cmpxchg.trystore123 -; CHECK-NEXT: # =>This Inner Loop Header: Depth=1 -; CHECK-NEXT: stbcx. 29, 0, 6 -; CHECK-NEXT: beq 0, .LBB3_37 -; CHECK-NEXT: # %bb.35: # %cmpxchg.releasedload122 -; CHECK-NEXT: # in Loop: Header=BB3_34 Depth=1 -; CHECK-NEXT: lbarx 28, 0, 6 -; CHECK-NEXT: clrlwi 28, 28, 24 -; CHECK-NEXT: cmplw 28, 30 -; CHECK-NEXT: beq 0, .LBB3_34 -; CHECK-NEXT: .LBB3_36: # %cmpxchg.nostore120 -; CHECK-NEXT: lwsync -; CHECK-NEXT: crxor 20, 20, 20 -; CHECK-NEXT: b .LBB3_38 -; CHECK-NEXT: .LBB3_37: # %cmpxchg.success121 -; CHECK-NEXT: lwsync -; CHECK-NEXT: creqv 20, 20, 20 -; CHECK-NEXT: .LBB3_38: # %cmpxchg.end118 -; CHECK-NEXT: li 6, 0 -; CHECK-NEXT: li 30, 1 -; CHECK-NEXT: isel 6, 30, 6, 20 -; CHECK-NEXT: lbz 30, sc@toc@l(4) -; CHECK-NEXT: stw 6, ui@toc@l(5) -; CHECK-NEXT: lbz 6, uc@toc@l(3) -; CHECK-NEXT: lbarx 29, 0, 7 -; CHECK-NEXT: clrlwi 29, 29, 24 -; CHECK-NEXT: cmplw 29, 6 -; CHECK-NEXT: bne 0, .LBB3_42 -; CHECK-NEXT: # %bb.39: # %cmpxchg.fencedstore105 -; CHECK-NEXT: sync -; CHECK-NEXT: .p2align 5 -; CHECK-NEXT: .LBB3_40: # %cmpxchg.trystore104 -; CHECK-NEXT: # =>This Inner Loop Header: Depth=1 -; CHECK-NEXT: stbcx. 30, 0, 7 -; CHECK-NEX... [truncated] 
@github-actions
Copy link

github-actions bot commented Jun 13, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@diggerlin diggerlin force-pushed the digger/elimate-RLWINM branch from bcfc75f to 4bbf278 Compare June 13, 2025 15:33
@diggerlin diggerlin changed the title [PowerPC] [PowerPC] eliminate RLWINM instruction following LBARX as possible Jun 13, 2025
Register SrcReg = MI.getOperand(1).getReg();
MachineInstr *DefMI = MRI->getVRegDef(SrcReg);

if (DefMI) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

early exit for if (!DefMI)

// Replace all uses of RLWINM's result with LBARX result
MRI->replaceRegWith(DstReg, SrcReg);
addRegToUpdate(DstReg);
addRegToUpdate(SrcReg);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang format.

// LBARX already sets the upper 24 bits of the destination register
// to zero. If the register is cleared to zero in the upper 24 bits
// using RLWINM later, we eliminate the RLWINM. Same applies to
// LHARX.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ISA defines the opcodes in terms of 64-bit register bit positions, which is probably clearer. It's the upper 56 bits, 32-bit mode just can't see all of them. Also the bit numbers are different for lharx so saying the same applies is not quite right.

break;
}
case PPC::RLWINM:
case PPC::RLWINM: {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why this is done in PPCMIPeephole rather than fixing it in the DAG stage?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Roland. It seems like a simple patch such as the following should accomplish what you're after:

diff --git a/llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp b/llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp index f921032356d6..d35d4d91918a 100644 --- a/llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp +++ b/llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp @@ -4961,6 +4961,21 @@ bool PPCDAGToDAGISel::tryAsSingleRLWINM(SDNode *N) { // If this is just a masked value where the input is not handled, and // is not a rotate-left (handled by a pattern in the .td file), emit rlwinm if (isRunOfOnes(Imm, MB, ME) && Val.getOpcode() != ISD::ROTL) { + // The result of LBARX/LHARX do not need to be cleared as the instructions + // implicitly clear the upper bits. + unsigned AlreadyCleared = 0; + if (Val.getOpcode() == ISD::INTRINSIC_W_CHAIN) { + auto IntrinsicID = Val.getConstantOperandVal(1); + if (IntrinsicID == Intrinsic::ppc_lbarx) + AlreadyCleared = 24; + else if (IntrinsicID == Intrinsic::ppc_lharx) + AlreadyCleared = 16; + } + if (AlreadyCleared == MB) { + ReplaceUses(SDValue(N, 0), N->getOperand(0)); + return true; + } + SDValue Ops[] = {Val, getI32Imm(0, dl), getI32Imm(MB, dl), getI32Imm(ME, dl)}; CurDAG->SelectNodeTo(N, PPC::RLWINM, MVT::i32, Ops); 
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the code.

Copy link
Member

@nemanjai nemanjai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The description should probably mention that these redundant clear instructions are introduced by 85a9f2e14859b.

; PPC64LE-NEXT: .p2align 5
; PPC64LE-NEXT: .LBB0_1: # %cmpxchg.start
; PPC64LE-NEXT: # =>This Inner Loop Header: Depth=1
; PPC64LE-NEXT: .p2align 5
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did this change?

Copy link
Contributor Author

@diggerlin diggerlin Jun 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did this change?

the change is caused by run the update_llc_test_checks.py, I created NFC PR #144411 first , after the NFC PR, I will rebased the patch.

break;
}
case PPC::RLWINM:
case PPC::RLWINM: {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Roland. It seems like a simple patch such as the following should accomplish what you're after:

diff --git a/llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp b/llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp index f921032356d6..d35d4d91918a 100644 --- a/llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp +++ b/llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp @@ -4961,6 +4961,21 @@ bool PPCDAGToDAGISel::tryAsSingleRLWINM(SDNode *N) { // If this is just a masked value where the input is not handled, and // is not a rotate-left (handled by a pattern in the .td file), emit rlwinm if (isRunOfOnes(Imm, MB, ME) && Val.getOpcode() != ISD::ROTL) { + // The result of LBARX/LHARX do not need to be cleared as the instructions + // implicitly clear the upper bits. + unsigned AlreadyCleared = 0; + if (Val.getOpcode() == ISD::INTRINSIC_W_CHAIN) { + auto IntrinsicID = Val.getConstantOperandVal(1); + if (IntrinsicID == Intrinsic::ppc_lbarx) + AlreadyCleared = 24; + else if (IntrinsicID == Intrinsic::ppc_lharx) + AlreadyCleared = 16; + } + if (AlreadyCleared == MB) { + ReplaceUses(SDValue(N, 0), N->getOperand(0)); + return true; + } + SDValue Ops[] = {Val, getI32Imm(0, dl), getI32Imm(MB, dl), getI32Imm(ME, dl)}; CurDAG->SelectNodeTo(N, PPC::RLWINM, MVT::i32, Ops); 
diggerlin added a commit that referenced this pull request Jun 18, 2025
…ll-atomics.ll,loop-comment.ll etc (#144411) Run the update_llc_test_checks.py for all-atomics.ll,loop-comment.ll ,PR35812-neg-cmpxchg.ll (Pre-commit patch for the #144089)
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Jun 18, 2025
…ks.py for all-atomics.ll,loop-comment.ll etc (#144411) Run the update_llc_test_checks.py for all-atomics.ll,loop-comment.ll ,PR35812-neg-cmpxchg.ll (Pre-commit patch for the llvm/llvm-project#144089)
@diggerlin diggerlin force-pushed the digger/elimate-RLWINM branch from 4bbf278 to fc8753b Compare June 19, 2025 18:36
AlreadyCleared = 24;
else if (IntrinsicID == Intrinsic::ppc_lharx)
AlreadyCleared = 16;
if (AlreadyCleared != 0 && AlreadyCleared == MB) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you also have to know that ME is 31? Otherwise it clears bits on the right as well.

Copy link
Contributor Author

@diggerlin diggerlin Jun 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since there is AlreadyCleared != 0 , that is means , it is Intrinsic::ppc_lharx or Intrinsic::ppc_lbarx only, if it is the first operand of and is Intrinsic::ppc_lharx or IntrinsicID::ppc_lbarx , it means the second operand of the and SDNOD is 0xff(for Intrinsic::ppc_lbarx) or 0xffff(for Intrinsic::ppc_lharx), it hint ME is 31 implicitly. I think I can add assert after line 4973

assert(ME == 31 && "ME must be 31");

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mask will normally be what is expected, but you haven't proven it is. I don't think there is anything stopping the user from writing atomic_var & my_own_mask, in which case that and and the auto-generated one for data size will likely be merged into one and with a narrower mask by optimization. So I think it's safer to put in an if check even if it is an unusual case.

@diggerlin diggerlin requested a review from RolandF77 June 20, 2025 15:52
Copy link
Collaborator

@RolandF77 RolandF77 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@diggerlin diggerlin merged commit e99d8bb into llvm:main Jun 25, 2025
6 of 7 checks passed
anthonyhatran pushed a commit to anthonyhatran/llvm-project that referenced this pull request Jun 26, 2025
…lvm#144089) LBARX loads a byte from memory into a register, automatically setting the remaining bits of the register to zero. If a subsequent RLWINM instruction is used to clear those same bits (which LBARX has already set to zero), the RLWINM is redundant and can be eliminated. these redundant clear instructions are introduced by 85a9f2e.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

5 participants