- Notifications
You must be signed in to change notification settings - Fork 15.3k
[RISCV] Use LD_RV32/SD_RV32 for spills and reloads when Zilsd is enabled #153595
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| @llvm/pr-subscribers-backend-risc-v Author: Sudharsan Veeravalli (svs-quic) ChangesWe are currently unilaterally using Full diff: https://github.com/llvm/llvm-project/pull/153595.diff 2 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp index 085064eee896a..cd75f6709e1e8 100644 --- a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp +++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp @@ -658,7 +658,12 @@ void RISCVInstrInfo::storeRegToStackSlot(MachineBasicBlock &MBB, } else if (RISCV::GPRF32RegClass.hasSubClassEq(RC)) { Opcode = RISCV::SW_INX; } else if (RISCV::GPRPairRegClass.hasSubClassEq(RC)) { - Opcode = RISCV::PseudoRV32ZdinxSD; + if (STI.hasStdExtZilsd() && !STI.is64Bit()) { + Opcode = RISCV::SD_RV32; + } else { + assert(STI.hasStdExtZdinx() && !STI.is64Bit()); + Opcode = RISCV::PseudoRV32ZdinxSD; + } } else if (RISCV::FPR16RegClass.hasSubClassEq(RC)) { Opcode = RISCV::FSH; } else if (RISCV::FPR32RegClass.hasSubClassEq(RC)) { @@ -742,7 +747,12 @@ void RISCVInstrInfo::loadRegFromStackSlot( } else if (RISCV::GPRF32RegClass.hasSubClassEq(RC)) { Opcode = RISCV::LW_INX; } else if (RISCV::GPRPairRegClass.hasSubClassEq(RC)) { - Opcode = RISCV::PseudoRV32ZdinxLD; + if (STI.hasStdExtZilsd() && !STI.is64Bit()) { + Opcode = RISCV::LD_RV32; + } else { + assert(STI.hasStdExtZdinx() && !STI.is64Bit()); + Opcode = RISCV::PseudoRV32ZdinxLD; + } } else if (RISCV::FPR16RegClass.hasSubClassEq(RC)) { Opcode = RISCV::FLH; } else if (RISCV::FPR32RegClass.hasSubClassEq(RC)) { diff --git a/llvm/test/CodeGen/RISCV/zdinx-spill.ll b/llvm/test/CodeGen/RISCV/zdinx-spill.ll index 6f206fe571c17..526392b0280cb 100644 --- a/llvm/test/CodeGen/RISCV/zdinx-spill.ll +++ b/llvm/test/CodeGen/RISCV/zdinx-spill.ll @@ -1,44 +1,81 @@ ; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5 -; RUN: llc < %s -mtriple=riscv32 -mattr=+zdinx -verify-machineinstrs -stop-after=prologepilog | FileCheck %s +; RUN: llc < %s -mtriple=riscv32 -mattr=+zdinx -verify-machineinstrs -stop-after=prologepilog | FileCheck %s -check-prefix=ZDINX +; RUN: llc < %s -mtriple=riscv32 -mattr=+zdinx,+zilsd -verify-machineinstrs -stop-after=prologepilog | FileCheck %s -check-prefix=ZDINX-ZILSD declare void @bar() define double @foo(double %x) nounwind { - ; CHECK-LABEL: name: foo - ; CHECK: bb.0 (%ir-block.0): - ; CHECK-NEXT: liveins: $x10, $x11, $x8, $x9, $x18, $x19, $x20, $x21, $x22, $x23, $x24, $x25, $x26, $x27 - ; CHECK-NEXT: {{ $}} - ; CHECK-NEXT: $x2 = frame-setup ADDI $x2, -64 - ; CHECK-NEXT: frame-setup SW killed $x8, $x2, 60 :: (store (s32) into %stack.1) - ; CHECK-NEXT: frame-setup SW killed $x9, $x2, 56 :: (store (s32) into %stack.2) - ; CHECK-NEXT: frame-setup SW killed $x18, $x2, 52 :: (store (s32) into %stack.3) - ; CHECK-NEXT: frame-setup SW killed $x19, $x2, 48 :: (store (s32) into %stack.4) - ; CHECK-NEXT: frame-setup SW killed $x20, $x2, 44 :: (store (s32) into %stack.5) - ; CHECK-NEXT: frame-setup SW killed $x21, $x2, 40 :: (store (s32) into %stack.6) - ; CHECK-NEXT: frame-setup SW killed $x22, $x2, 36 :: (store (s32) into %stack.7) - ; CHECK-NEXT: frame-setup SW killed $x23, $x2, 32 :: (store (s32) into %stack.8) - ; CHECK-NEXT: frame-setup SW killed $x24, $x2, 28 :: (store (s32) into %stack.9) - ; CHECK-NEXT: frame-setup SW killed $x25, $x2, 24 :: (store (s32) into %stack.10) - ; CHECK-NEXT: frame-setup SW killed $x26, $x2, 20 :: (store (s32) into %stack.11) - ; CHECK-NEXT: frame-setup SW killed $x27, $x2, 16 :: (store (s32) into %stack.12) - ; CHECK-NEXT: renamable $x10_x11 = nofpexcept FADD_D_IN32X killed renamable $x10_x11, renamable $x10_x11, 7, implicit $frm - ; CHECK-NEXT: PseudoRV32ZdinxSD killed renamable $x10_x11, $x2, 8 :: (store (s64) into %stack.0, align 4) - ; CHECK-NEXT: INLINEASM &"", 1 /* sideeffect attdialect */, 12 /* clobber */, implicit-def dead early-clobber $x6, 12 /* clobber */, implicit-def dead early-clobber $x7, 12 /* clobber */, implicit-def dead early-clobber $x8, 12 /* clobber */, implicit-def dead early-clobber $x9, 12 /* clobber */, implicit-def dead early-clobber $x10, 12 /* clobber */, implicit-def dead early-clobber $x11, 12 /* clobber */, implicit-def dead early-clobber $x12, 12 /* clobber */, implicit-def dead early-clobber $x13, 12 /* clobber */, implicit-def dead early-clobber $x14, 12 /* clobber */, implicit-def dead early-clobber $x15, 12 /* clobber */, implicit-def dead early-clobber $x16, 12 /* clobber */, implicit-def dead early-clobber $x17, 12 /* clobber */, implicit-def dead early-clobber $x18, 12 /* clobber */, implicit-def dead early-clobber $x19, 12 /* clobber */, implicit-def dead early-clobber $x20, 12 /* clobber */, implicit-def dead early-clobber $x21, 12 /* clobber */, implicit-def dead early-clobber $x22, 12 /* clobber */, implicit-def dead early-clobber $x23, 12 /* clobber */, implicit-def dead early-clobber $x24, 12 /* clobber */, implicit-def dead early-clobber $x25, 12 /* clobber */, implicit-def dead early-clobber $x26, 12 /* clobber */, implicit-def dead early-clobber $x27, 12 /* clobber */, implicit-def dead early-clobber $x28, 12 /* clobber */, implicit-def dead early-clobber $x29, 12 /* clobber */, implicit-def dead early-clobber $x31 - ; CHECK-NEXT: renamable $x10_x11 = PseudoRV32ZdinxLD $x2, 8 :: (load (s64) from %stack.0, align 4) - ; CHECK-NEXT: $x8 = frame-destroy LW $x2, 60 :: (load (s32) from %stack.1) - ; CHECK-NEXT: $x9 = frame-destroy LW $x2, 56 :: (load (s32) from %stack.2) - ; CHECK-NEXT: $x18 = frame-destroy LW $x2, 52 :: (load (s32) from %stack.3) - ; CHECK-NEXT: $x19 = frame-destroy LW $x2, 48 :: (load (s32) from %stack.4) - ; CHECK-NEXT: $x20 = frame-destroy LW $x2, 44 :: (load (s32) from %stack.5) - ; CHECK-NEXT: $x21 = frame-destroy LW $x2, 40 :: (load (s32) from %stack.6) - ; CHECK-NEXT: $x22 = frame-destroy LW $x2, 36 :: (load (s32) from %stack.7) - ; CHECK-NEXT: $x23 = frame-destroy LW $x2, 32 :: (load (s32) from %stack.8) - ; CHECK-NEXT: $x24 = frame-destroy LW $x2, 28 :: (load (s32) from %stack.9) - ; CHECK-NEXT: $x25 = frame-destroy LW $x2, 24 :: (load (s32) from %stack.10) - ; CHECK-NEXT: $x26 = frame-destroy LW $x2, 20 :: (load (s32) from %stack.11) - ; CHECK-NEXT: $x27 = frame-destroy LW $x2, 16 :: (load (s32) from %stack.12) - ; CHECK-NEXT: $x2 = frame-destroy ADDI $x2, 64 - ; CHECK-NEXT: PseudoRET implicit $x10, implicit $x11 + ; ZDINX-LABEL: name: foo + ; ZDINX: bb.0 (%ir-block.0): + ; ZDINX-NEXT: liveins: $x10, $x11, $x8, $x9, $x18, $x19, $x20, $x21, $x22, $x23, $x24, $x25, $x26, $x27 + ; ZDINX-NEXT: {{ $}} + ; ZDINX-NEXT: $x2 = frame-setup ADDI $x2, -64 + ; ZDINX-NEXT: frame-setup SW killed $x8, $x2, 60 :: (store (s32) into %stack.1) + ; ZDINX-NEXT: frame-setup SW killed $x9, $x2, 56 :: (store (s32) into %stack.2) + ; ZDINX-NEXT: frame-setup SW killed $x18, $x2, 52 :: (store (s32) into %stack.3) + ; ZDINX-NEXT: frame-setup SW killed $x19, $x2, 48 :: (store (s32) into %stack.4) + ; ZDINX-NEXT: frame-setup SW killed $x20, $x2, 44 :: (store (s32) into %stack.5) + ; ZDINX-NEXT: frame-setup SW killed $x21, $x2, 40 :: (store (s32) into %stack.6) + ; ZDINX-NEXT: frame-setup SW killed $x22, $x2, 36 :: (store (s32) into %stack.7) + ; ZDINX-NEXT: frame-setup SW killed $x23, $x2, 32 :: (store (s32) into %stack.8) + ; ZDINX-NEXT: frame-setup SW killed $x24, $x2, 28 :: (store (s32) into %stack.9) + ; ZDINX-NEXT: frame-setup SW killed $x25, $x2, 24 :: (store (s32) into %stack.10) + ; ZDINX-NEXT: frame-setup SW killed $x26, $x2, 20 :: (store (s32) into %stack.11) + ; ZDINX-NEXT: frame-setup SW killed $x27, $x2, 16 :: (store (s32) into %stack.12) + ; ZDINX-NEXT: renamable $x10_x11 = nofpexcept FADD_D_IN32X killed renamable $x10_x11, renamable $x10_x11, 7, implicit $frm + ; ZDINX-NEXT: PseudoRV32ZdinxSD killed renamable $x10_x11, $x2, 8 :: (store (s64) into %stack.0, align 4) + ; ZDINX-NEXT: INLINEASM &"", 1 /* sideeffect attdialect */, 12 /* clobber */, implicit-def dead early-clobber $x6, 12 /* clobber */, implicit-def dead early-clobber $x7, 12 /* clobber */, implicit-def dead early-clobber $x8, 12 /* clobber */, implicit-def dead early-clobber $x9, 12 /* clobber */, implicit-def dead early-clobber $x10, 12 /* clobber */, implicit-def dead early-clobber $x11, 12 /* clobber */, implicit-def dead early-clobber $x12, 12 /* clobber */, implicit-def dead early-clobber $x13, 12 /* clobber */, implicit-def dead early-clobber $x14, 12 /* clobber */, implicit-def dead early-clobber $x15, 12 /* clobber */, implicit-def dead early-clobber $x16, 12 /* clobber */, implicit-def dead early-clobber $x17, 12 /* clobber */, implicit-def dead early-clobber $x18, 12 /* clobber */, implicit-def dead early-clobber $x19, 12 /* clobber */, implicit-def dead early-clobber $x20, 12 /* clobber */, implicit-def dead early-clobber $x21, 12 /* clobber */, implicit-def dead early-clobber $x22, 12 /* clobber */, implicit-def dead early-clobber $x23, 12 /* clobber */, implicit-def dead early-clobber $x24, 12 /* clobber */, implicit-def dead early-clobber $x25, 12 /* clobber */, implicit-def dead early-clobber $x26, 12 /* clobber */, implicit-def dead early-clobber $x27, 12 /* clobber */, implicit-def dead early-clobber $x28, 12 /* clobber */, implicit-def dead early-clobber $x29, 12 /* clobber */, implicit-def dead early-clobber $x31 + ; ZDINX-NEXT: renamable $x10_x11 = PseudoRV32ZdinxLD $x2, 8 :: (load (s64) from %stack.0, align 4) + ; ZDINX-NEXT: $x8 = frame-destroy LW $x2, 60 :: (load (s32) from %stack.1) + ; ZDINX-NEXT: $x9 = frame-destroy LW $x2, 56 :: (load (s32) from %stack.2) + ; ZDINX-NEXT: $x18 = frame-destroy LW $x2, 52 :: (load (s32) from %stack.3) + ; ZDINX-NEXT: $x19 = frame-destroy LW $x2, 48 :: (load (s32) from %stack.4) + ; ZDINX-NEXT: $x20 = frame-destroy LW $x2, 44 :: (load (s32) from %stack.5) + ; ZDINX-NEXT: $x21 = frame-destroy LW $x2, 40 :: (load (s32) from %stack.6) + ; ZDINX-NEXT: $x22 = frame-destroy LW $x2, 36 :: (load (s32) from %stack.7) + ; ZDINX-NEXT: $x23 = frame-destroy LW $x2, 32 :: (load (s32) from %stack.8) + ; ZDINX-NEXT: $x24 = frame-destroy LW $x2, 28 :: (load (s32) from %stack.9) + ; ZDINX-NEXT: $x25 = frame-destroy LW $x2, 24 :: (load (s32) from %stack.10) + ; ZDINX-NEXT: $x26 = frame-destroy LW $x2, 20 :: (load (s32) from %stack.11) + ; ZDINX-NEXT: $x27 = frame-destroy LW $x2, 16 :: (load (s32) from %stack.12) + ; ZDINX-NEXT: $x2 = frame-destroy ADDI $x2, 64 + ; ZDINX-NEXT: PseudoRET implicit $x10, implicit $x11 + ; + ; ZDINX-ZILSD-LABEL: name: foo + ; ZDINX-ZILSD: bb.0 (%ir-block.0): + ; ZDINX-ZILSD-NEXT: liveins: $x10, $x11, $x8, $x9, $x18, $x19, $x20, $x21, $x22, $x23, $x24, $x25, $x26, $x27 + ; ZDINX-ZILSD-NEXT: {{ $}} + ; ZDINX-ZILSD-NEXT: $x2 = frame-setup ADDI $x2, -64 + ; ZDINX-ZILSD-NEXT: frame-setup SW killed $x8, $x2, 60 :: (store (s32) into %stack.1) + ; ZDINX-ZILSD-NEXT: frame-setup SW killed $x9, $x2, 56 :: (store (s32) into %stack.2) + ; ZDINX-ZILSD-NEXT: frame-setup SW killed $x18, $x2, 52 :: (store (s32) into %stack.3) + ; ZDINX-ZILSD-NEXT: frame-setup SW killed $x19, $x2, 48 :: (store (s32) into %stack.4) + ; ZDINX-ZILSD-NEXT: frame-setup SW killed $x20, $x2, 44 :: (store (s32) into %stack.5) + ; ZDINX-ZILSD-NEXT: frame-setup SW killed $x21, $x2, 40 :: (store (s32) into %stack.6) + ; ZDINX-ZILSD-NEXT: frame-setup SW killed $x22, $x2, 36 :: (store (s32) into %stack.7) + ; ZDINX-ZILSD-NEXT: frame-setup SW killed $x23, $x2, 32 :: (store (s32) into %stack.8) + ; ZDINX-ZILSD-NEXT: frame-setup SW killed $x24, $x2, 28 :: (store (s32) into %stack.9) + ; ZDINX-ZILSD-NEXT: frame-setup SW killed $x25, $x2, 24 :: (store (s32) into %stack.10) + ; ZDINX-ZILSD-NEXT: frame-setup SW killed $x26, $x2, 20 :: (store (s32) into %stack.11) + ; ZDINX-ZILSD-NEXT: frame-setup SW killed $x27, $x2, 16 :: (store (s32) into %stack.12) + ; ZDINX-ZILSD-NEXT: renamable $x10_x11 = nofpexcept FADD_D_IN32X killed renamable $x10_x11, renamable $x10_x11, 7, implicit $frm + ; ZDINX-ZILSD-NEXT: SD_RV32 killed renamable $x10_x11, $x2, 8 :: (store (s64) into %stack.0, align 4) + ; ZDINX-ZILSD-NEXT: INLINEASM &"", 1 /* sideeffect attdialect */, 12 /* clobber */, implicit-def dead early-clobber $x6, 12 /* clobber */, implicit-def dead early-clobber $x7, 12 /* clobber */, implicit-def dead early-clobber $x8, 12 /* clobber */, implicit-def dead early-clobber $x9, 12 /* clobber */, implicit-def dead early-clobber $x10, 12 /* clobber */, implicit-def dead early-clobber $x11, 12 /* clobber */, implicit-def dead early-clobber $x12, 12 /* clobber */, implicit-def dead early-clobber $x13, 12 /* clobber */, implicit-def dead early-clobber $x14, 12 /* clobber */, implicit-def dead early-clobber $x15, 12 /* clobber */, implicit-def dead early-clobber $x16, 12 /* clobber */, implicit-def dead early-clobber $x17, 12 /* clobber */, implicit-def dead early-clobber $x18, 12 /* clobber */, implicit-def dead early-clobber $x19, 12 /* clobber */, implicit-def dead early-clobber $x20, 12 /* clobber */, implicit-def dead early-clobber $x21, 12 /* clobber */, implicit-def dead early-clobber $x22, 12 /* clobber */, implicit-def dead early-clobber $x23, 12 /* clobber */, implicit-def dead early-clobber $x24, 12 /* clobber */, implicit-def dead early-clobber $x25, 12 /* clobber */, implicit-def dead early-clobber $x26, 12 /* clobber */, implicit-def dead early-clobber $x27, 12 /* clobber */, implicit-def dead early-clobber $x28, 12 /* clobber */, implicit-def dead early-clobber $x29, 12 /* clobber */, implicit-def dead early-clobber $x31 + ; ZDINX-ZILSD-NEXT: renamable $x10_x11 = LD_RV32 $x2, 8 :: (load (s64) from %stack.0, align 4) + ; ZDINX-ZILSD-NEXT: $x8 = frame-destroy LW $x2, 60 :: (load (s32) from %stack.1) + ; ZDINX-ZILSD-NEXT: $x9 = frame-destroy LW $x2, 56 :: (load (s32) from %stack.2) + ; ZDINX-ZILSD-NEXT: $x18 = frame-destroy LW $x2, 52 :: (load (s32) from %stack.3) + ; ZDINX-ZILSD-NEXT: $x19 = frame-destroy LW $x2, 48 :: (load (s32) from %stack.4) + ; ZDINX-ZILSD-NEXT: $x20 = frame-destroy LW $x2, 44 :: (load (s32) from %stack.5) + ; ZDINX-ZILSD-NEXT: $x21 = frame-destroy LW $x2, 40 :: (load (s32) from %stack.6) + ; ZDINX-ZILSD-NEXT: $x22 = frame-destroy LW $x2, 36 :: (load (s32) from %stack.7) + ; ZDINX-ZILSD-NEXT: $x23 = frame-destroy LW $x2, 32 :: (load (s32) from %stack.8) + ; ZDINX-ZILSD-NEXT: $x24 = frame-destroy LW $x2, 28 :: (load (s32) from %stack.9) + ; ZDINX-ZILSD-NEXT: $x25 = frame-destroy LW $x2, 24 :: (load (s32) from %stack.10) + ; ZDINX-ZILSD-NEXT: $x26 = frame-destroy LW $x2, 20 :: (load (s32) from %stack.11) + ; ZDINX-ZILSD-NEXT: $x27 = frame-destroy LW $x2, 16 :: (load (s32) from %stack.12) + ; ZDINX-ZILSD-NEXT: $x2 = frame-destroy ADDI $x2, 64 + ; ZDINX-ZILSD-NEXT: PseudoRET implicit $x10, implicit $x11 %a = fadd double %x, %x call void asm sideeffect "", "~{x6},~{x7},~{x8},~{x9},~{x10},~{x11},~{x12},~{x13},~{x14},~{x15},~{x16},~{x17},~{x18},~{x19},~{x20},~{x21},~{x22},~{x23},~{x24},~{x25},~{x26},~{x27},~{x28},~{x29},~{xr0},~{x31}"() ret double %a |
| I'll see if I can come up with a test case where we generate a PseudoRV32ZdinxSD/LD when we don't have Zdinx enabled (but Zilsd is). |
| ; ZDINX-ZILSD-NEXT: frame-setup SW killed $x26, $x2, 20 :: (store (s32) into %stack.11) | ||
| ; ZDINX-ZILSD-NEXT: frame-setup SW killed $x27, $x2, 16 :: (store (s32) into %stack.12) | ||
| ; ZDINX-ZILSD-NEXT: renamable $x10_x11 = nofpexcept FADD_D_IN32X killed renamable $x10_x11, renamable $x10_x11, 7, implicit $frm | ||
| ; ZDINX-ZILSD-NEXT: SD_RV32 killed renamable $x10_x11, $x2, 8 :: (store (s64) into %stack.0, align 4) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This says align 4. SD_RV32 requires 8 byte alignment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:( everything is always a few more steps more complex than hoped.
We have a few ideas around addressing this so SD_RV32 is usable for these spills, but none are fundamentally easy. We'll be back with a larger proposal around this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I looked into this once.
One thing I do remember is that the alignment associated with the FrameIndex object might be wrong if the stack can't be realigned.
I'm now wondering if spills of D registers on RV32E can reliably use FSD/FLD since the stack is only 4 byte aligned. The D register spill would require stack realignment, but that might not be possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. The spill alignment for pairs matches the spill alignment for GPRs, rather than matching the natural alignment of pairs.
I think increasing the spill width is possible, but would cause issues on ilp32e as you point out, as we find it hard to spill D registers already with that combo, due to stack realignment.
Thankfully, the spill alignment is not an ABI detail, so we can try to make better choices, but annoyingly it's also controlled by tablegen info which we cannot override easily (this does not warrant new HWModes, which would be the only way to change it).
We're thinking about other ideas here, including about how we are handling alignment on rv32 - but we're also in the final days before our own release freeze, so capacity for our team to do a larger fix is not available immediately.
| Part of the reason we started looking at this is that we hit a verifier error after the pseudo expansion pass since the Something like: Before the Pseudo is expanded the code looks like: And after: Is there a way we can add the implicit register info or does the expansion happen too late in the pipeline for it to cause any issues? |
We are currently unilaterally using
PseudoRV32ZdinxSD/LDfor spills and reloads when the register class isGPRPairRegClassbut we can useLD_RV32/SD_RV32whenZilsdis enabled. This isn't an issue since thePseudoRV32ZdinxSD/LDgets expanded into individualLW/SW'slate but I don't think it's right to use it even whenZdinxis not enabled.