
Conversation

@preames (Collaborator) commented on Oct 30, 2025

We have two mechanisms used in inline spilling for folding a load into a consuming instruction. One is used for stack reloads, the other for other load instructions (usually argument loads). We currently only implement the optimization for the first case, but stackmaps have generic support in target-independent code for the second. We can go ahead and set the flag to enable that optimization.

The primary motivation for this is that if we enable load rematerialization without it, we run into crashes where we can't make progress through rematerialization.

We probably should also enable the other foldMemoryOperand hook for RISCV-specific instructions, but that's a separate optimization.
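
For reference, the two mechanisms are the two foldMemoryOperand overloads in TargetInstrInfo, each backed by a corresponding target foldMemoryOperandImpl hook: one takes a stack slot (frame index), the other takes the load instruction itself. Below is a declaration sketch paraphrased from llvm/include/llvm/CodeGen/TargetInstrInfo.h; parameter lists may differ slightly from the current tree.

```cpp
// Declaration sketch only, paraphrased from TargetInstrInfo.h; not verbatim.
class TargetInstrInfo {
public:
  // Variant 1: fold a reload from a stack slot (FrameIndex) into MI.
  // This is the path RISC-V already implements (see getFoldedOpcode below).
  MachineInstr *foldMemoryOperand(MachineInstr &MI, ArrayRef<unsigned> Ops,
                                  int FrameIndex, LiveIntervals *LIS = nullptr,
                                  VirtRegMap *VRM = nullptr) const;

  // Variant 2: fold an arbitrary load instruction (e.g. an argument load)
  // into MI. Stackmap/patchpoint users are handled generically in
  // target-independent code; other users defer to the target hook.
  MachineInstr *foldMemoryOperand(MachineInstr &MI, ArrayRef<unsigned> Ops,
                                  MachineInstr &LoadMI,
                                  LiveIntervals *LIS = nullptr) const;

protected:
  // Target hooks mirroring the two variants. This patch does not add a
  // RISC-V implementation of the LoadMI form; it only sets canFoldAsLoad so
  // the generic stackmap support can kick in.
  virtual MachineInstr *foldMemoryOperandImpl(
      MachineFunction &MF, MachineInstr &MI, ArrayRef<unsigned> Ops,
      MachineBasicBlock::iterator InsertPt, int FrameIndex,
      LiveIntervals *LIS = nullptr, VirtRegMap *VRM = nullptr) const;
  virtual MachineInstr *foldMemoryOperandImpl(
      MachineFunction &MF, MachineInstr &MI, ArrayRef<unsigned> Ops,
      MachineBasicBlock::iterator InsertPt, MachineInstr &LoadMI,
      LiveIntervals *LIS = nullptr) const;
};
```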

@llvmbot (Member) commented on Oct 30, 2025

@llvm/pr-subscribers-backend-risc-v

Author: Philip Reames (preames)

Changes

We have two mechanisms used in inline spilling for folding a load into a consuming instruction. One is used for stack reloads, the other for other load instructions (usually argument loads). We currently only implement the optimization for the first case, but stackmaps have generic support in target-independent code for the second. We can go ahead and set the flag to enable that optimization.

The primary motivation for this is that if we enable load rematerialization without it, we run into crashes where we can't make progress through rematerialization.

We probably should also enable the other foldMemoryOperand hook for RISCV-specific instructions, but that's a separate optimization.


Full diff: https://github.com/llvm/llvm-project/pull/165761.diff

5 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfo.cpp (+1-1)
  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfo.td (+3-1)
  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfoD.td (+1)
  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfoF.td (+1)
  • (modified) llvm/test/CodeGen/RISCV/rv64-stackmap.ll (+4-4)
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
index 912b82d294f44..069f2b0c1e885 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
@@ -869,7 +869,7 @@ std::optional<unsigned> getFoldedOpcode(MachineFunction &MF, MachineInstr &MI,
   }
 }
 
-// This is the version used during inline spilling
+// This is the version used during InlineSpiller::spillAroundUses
 MachineInstr *RISCVInstrInfo::foldMemoryOperandImpl(
     MachineFunction &MF, MachineInstr &MI, ArrayRef<unsigned> Ops,
     MachineBasicBlock::iterator InsertPt, int FrameIndex, LiveIntervals *LIS,
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.td b/llvm/lib/Target/RISCV/RISCVInstrInfo.td
index 7c89686ebfb3c..9cb53fb27a2d2 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.td
@@ -768,7 +768,7 @@ def BGE : BranchCC_rri<0b101, "bge">;
 def BLTU : BranchCC_rri<0b110, "bltu">;
 def BGEU : BranchCC_rri<0b111, "bgeu">;
 
-let IsSignExtendingOpW = 1 in {
+let IsSignExtendingOpW = 1, canFoldAsLoad = 1 in {
 def LB : Load_ri<0b000, "lb">, Sched<[WriteLDB, ReadMemBase]>;
 def LH : Load_ri<0b001, "lh">, Sched<[WriteLDH, ReadMemBase]>;
 def LW : Load_ri<0b010, "lw">, Sched<[WriteLDW, ReadMemBase]>;
@@ -889,8 +889,10 @@ def CSRRCI : CSR_ii<0b111, "csrrci">;
 /// RV64I instructions
 
 let Predicates = [IsRV64] in {
+let canFoldAsLoad = 1 in {
 def LWU : Load_ri<0b110, "lwu">, Sched<[WriteLDW, ReadMemBase]>;
 def LD : Load_ri<0b011, "ld">, Sched<[WriteLDD, ReadMemBase]>;
+}
 def SD : Store_rri<0b011, "sd">, Sched<[WriteSTD, ReadStoreData, ReadMemBase]>;
 
 let IsSignExtendingOpW = 1 in {
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoD.td b/llvm/lib/Target/RISCV/RISCVInstrInfoD.td
index afac37d6337d4..4ffe3e62ac501 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfoD.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfoD.td
@@ -71,6 +71,7 @@ defvar DExtsRV64 = [DExt, ZdinxExt];
 //===----------------------------------------------------------------------===//
 
 let Predicates = [HasStdExtD] in {
+let canFoldAsLoad = 1 in
 def FLD : FPLoad_r<0b011, "fld", FPR64, WriteFLD64>;
 
 // Operands for stores are in the order srcreg, base, offset rather than
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoF.td b/llvm/lib/Target/RISCV/RISCVInstrInfoF.td
index 6571d998246a7..b30f8ec820c15 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfoF.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfoF.td
@@ -330,6 +330,7 @@ class PseudoFROUND<DAGOperand Ty, ValueType vt, ValueType intvt = XLenVT>
 //===----------------------------------------------------------------------===//
 
 let Predicates = [HasStdExtF] in {
+let canFoldAsLoad = 1 in
 def FLW : FPLoad_r<0b010, "flw", FPR32, WriteFLD32>;
 
 // Operands for stores are in the order srcreg, base, offset rather than
diff --git a/llvm/test/CodeGen/RISCV/rv64-stackmap.ll b/llvm/test/CodeGen/RISCV/rv64-stackmap.ll
index c3183a1a3e036..9aefa90684dd3 100644
--- a/llvm/test/CodeGen/RISCV/rv64-stackmap.ll
+++ b/llvm/test/CodeGen/RISCV/rv64-stackmap.ll
@@ -38,8 +38,8 @@
 ; CHECK-NEXT: .quad liveConstant
 ; CHECK-NEXT: .quad 0
 ; CHECK-NEXT: .quad 1
-; CHECK-NEXT: .quad spilledValue
-; CHECK-NEXT: .quad 144
+; CHECK-NEXT: .quad liveArgs
+; CHECK-NEXT: .quad 0
 ; CHECK-NEXT: .quad 1
 ; CHECK-NEXT: .quad directFrameIdx
 ; CHECK-NEXT: .quad 48
@@ -278,7 +278,7 @@ define void @liveConstant() {
 ;
 ; Verify 28 stack map entries.
 ;
-; CHECK-LABEL: .word .L{{.*}}-spilledValue
+; CHECK-LABEL: .word .L{{.*}}-liveArgs
 ; CHECK-NEXT: .half 0
 ; CHECK-NEXT: .half 28
 ;
@@ -290,7 +290,7 @@ define void @liveConstant() {
 ; CHECK-NEXT: .half 2
 ; CHECK-NEXT: .half 0
 ; CHECK-NEXT: .word
-define void @spilledValue(i64 %arg0, i64 %arg1, i64 %arg2, i64 %arg3, i64 %arg4, i64 %l0, i64 %l1, i64 %l2, i64 %l3, i64 %l4, i64 %l5, i64 %l6, i64 %l7, i64 %l8, i64 %l9, i64 %l10, i64 %l11, i64 %l12, i64 %l13, i64 %l14, i64 %l15, i64 %l16, i64 %l17, i64 %l18, i64 %l19, i64 %l20, i64 %l21, i64 %l22, i64 %l23, i64 %l24, i8 %l25, i16 zeroext %l26, i32 signext %l27) {
+define void @liveArgs(i64 %arg0, i64 %arg1, i64 %arg2, i64 %arg3, i64 %arg4, i64 %l0, i64 %l1, i64 %l2, i64 %l3, i64 %l4, i64 %l5, i64 %l6, i64 %l7, i64 %l8, i64 %l9, i64 %l10, i64 %l11, i64 %l12, i64 %l13, i64 %l14, i64 %l15, i64 %l16, i64 %l17, i64 %l18, i64 %l19, i64 %l20, i64 %l21, i64 %l22, i64 %l23, i64 %l24, i8 %l25, i16 zeroext %l26, i32 signext %l27) {
 entry:
   call void (i64, i32, ptr, i32, ...) @llvm.experimental.patchpoint.void(i64 11, i32 28, ptr null, i32 5, i64 %arg0, i64 %arg1, i64 %arg2, i64 %arg3, i64 %arg4, i64 %l0, i64 %l1, i64 %l2, i64 %l3, i64 %l4, i64 %l5, i64 %l6, i64 %l7, i64 %l8, i64 %l9, i64 %l10, i64 %l11, i64 %l12, i64 %l13, i64 %l14, i64 %l15, i64 %l16, i64 %l17, i64 %l18, i64 %l19, i64 %l20, i64 %l21, i64 %l22, i64 %l23, i64 %l24, i8 %l25, i16 %l26, i32 %l27)
   ret void
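
As a note on where the new canFoldAsLoad bit gets consumed: during inline spilling, the rematerialization path first checks that flag on the defining load and then tries to fold the load straight into its single user instead of materializing it into a fresh register. Below is a minimal sketch of that decision using a hypothetical tryFoldRematLoad helper; it is an illustration against the TargetInstrInfo API shown earlier, not the upstream InlineSpiller code.

```cpp
#include "llvm/ADT/ArrayRef.h"
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/TargetInstrInfo.h"

using namespace llvm;

// Hypothetical helper for illustration: try folding the original load that
// defines a rematerializable value directly into its user. For stackmap and
// patchpoint users the folding itself is done by target-independent code,
// which is why just setting canFoldAsLoad on LB/LH/LW/LWU/LD/FLW/FLD is
// enough to unblock this path on RISC-V.
static bool tryFoldRematLoad(const TargetInstrInfo &TII, MachineInstr &UseMI,
                             ArrayRef<unsigned> Ops, MachineInstr &OrigLoad) {
  if (!OrigLoad.canFoldAsLoad())
    return false; // Flag not set: fall back to remat into a register.
  return TII.foldMemoryOperand(UseMI, Ops, OrigLoad) != nullptr;
}
```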
@preames requested review from lukel97 and topperc on October 30, 2025, 18:24
@lukel97 (Contributor) left a comment:

LGTM

@preames enabled auto-merge (squash) on October 31, 2025, 14:10
@preames merged commit e72876a into llvm:main on Oct 31, 2025
7 of 9 checks passed
@topperc (Collaborator) commented on Oct 31, 2025

Was the title supposed to say "Mark" instead of "Mask"?

@preames (Collaborator, Author) commented on Oct 31, 2025

> Was the title supposed to say "Mark" instead of "Mask"?

Yep, that was a typo.

DEBADRIBASAK pushed a commit to DEBADRIBASAK/llvm-project that referenced this pull request Nov 3, 2025
…lvm#165761)

We have two mechanisms used in inline spilling for folding a load into a consuming instruction. One is used for stack reloads, the other for other load instructions (usually argument loads). We currently only implement optimizations for the first case, but stackmaps have generic support in target independent code for the other. We can go ahead and set the flag to enable that optimization.

The primary motivation for this is that if we enable load rematerialization without it, we run into crashes where we can't make progress through rematerialization.

We probably should enable the other foldMemoryOperand hook for RISCV specific instructions, but that's a separate optimization.
lukel97 added a commit that referenced this pull request Nov 24, 2025
In some workloads we see an argument passed on the stack where it is loaded, only for it to be immediately spilled to a different slot on the stack and then reloaded from that spill slot later on. We can avoid the unnecessary spill by marking loads as rematerializable and just directly loading from where the argument was originally passed on the stack.

TargetTransformInfo::isReMaterializableImpl checks to make sure that any loads are `MI.isDereferenceableInvariantLoad()`, so we should be able to move the load down to the remat site.

This gives a 14.8% reduction in spills in 544.nab_r on rva23u64 -O3, and a few other smaller reductions on llvm-test-suite. I didn't find any benchmarks where the number of spills/reloads increased.

Related: #165761
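
The safety condition referenced in that commit message is queryable directly on the instruction. Below is a minimal sketch of the kind of check involved, using a hypothetical isSafeToRematLoad helper for illustration; it is not the actual remat hook, and exact signatures may differ between LLVM releases.

```cpp
#include "llvm/CodeGen/MachineInstr.h"

using namespace llvm;

// Hypothetical helper for illustration: a load can be rematerialized
// (re-executed at the remat site) only if the memory it reads is both
// dereferenceable and invariant, e.g. a fixed stack slot holding an incoming
// argument. MachineInstr::isDereferenceableInvariantLoad() inspects the
// instruction's memory operands for exactly that property.
static bool isSafeToRematLoad(const MachineInstr &MI) {
  return MI.mayLoad() && MI.isDereferenceableInvariantLoad();
}
```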
aadeshps-mcw pushed a commit to aadeshps-mcw/llvm-project that referenced this pull request Nov 26, 2025
In some workloads we see an argument passed on the stack where it is loaded, only for it to be immediately spilled to a different slot on the stack and then reloaded from that spill slot later on. We can avoid the unnecessary spill by marking loads as rematerializable and just directly loading from where the argument was originally passed on the stack.

TargetTransformInfo::isReMaterializableImpl checks to make sure that any loads are `MI.isDereferenceableInvariantLoad()`, so we should be able to move the load down to the remat site.

This gives a 14.8% reduction in spills in 544.nab_r on rva23u64 -O3, and a few other smaller reductions on llvm-test-suite. I didn't find any benchmarks where the number of spills/reloads increased.

Related: llvm#165761
Priyanshu3820 pushed a commit to Priyanshu3820/llvm-project that referenced this pull request Nov 26, 2025
In some workloads we see an argument passed on the stack where it is loaded, only for it to be immediately spilled to a different slot on the stack and then reloaded from that spill slot later on. We can avoid the unnecessary spill by marking loads as rematerializable and just directly loading from where the argument was originally passed on the stack.

TargetTransformInfo::isReMaterializableImpl checks to make sure that any loads are `MI.isDereferenceableInvariantLoad()`, so we should be able to move the load down to the remat site.

This gives a 14.8% reduction in spills in 544.nab_r on rva23u64 -O3, and a few other smaller reductions on llvm-test-suite. I didn't find any benchmarks where the number of spills/reloads increased.

Related: llvm#165761