Skip to content

Conversation

@wangpc-pp
Copy link
Contributor

We add a MI flag to indicate the constraint and set this flag to
true for the second instruction of fusible pairs in pre-regalloc
macrofusion.

Then, we add register allocation hints for it.

During regalloc, the allocator will choose the same register
according to the hint.

This is a PoC currently.

tryAddHint(MO, MI.getOperand(0), NeedGPRC);
}
}
if (MI.isFusible() && OpIdx == 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't we need to check that the "first" instuction is part of a fusible pair?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need to, because this has been done in MacroFusion mutation?

// this instruction.
Unpredictable = 1 << 16, // Instruction with unpredictable condition.
NoConvergent = 1 << 17, // Call does not require convergence guarantees.
Fusible = 1 << 18, // Instruction is the second of a fusible pair.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the point to add a flag to simplify the code which detects fusible patterns? I realise that some processors may have a lot of fusions, but is adding a new flag really the right solution?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the point to add a flag to simplify the code which detects fusible patterns?

Yes!

I realise that some processors may have a lot of fusions, but is adding a new flag really the right solution?

I think this can be feasible, there are already some values in MIFlag that are not so common, like NoConvergent (SIMT only I think), Unpredictable (X86 only), etc.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the second fusion instruction is an instruction with 2 register sources instead of 1 register and an immediate. We'll need to know which operand to constrain.

Copy link
Contributor Author

@wangpc-pp wangpc-pp Jan 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the second fusion instruction is an instruction with 2 register sources

Currently, I think this is a uncommon case and I don't know if there are such fusions. Do you have any example? Maybe integer multiply-add fusion?
Even if so, I think we can know the constrainted operands via its opcode and constrain them in the getRegAllocationHints implementation. We don't think we need to record this info in target-independent part.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the RISC-V section of this page https://en.wikichip.org/wiki/macro-operation_fusion which are also listed here https://xiangshan-doc.readthedocs.io/zh-cn/latest/frontend/decode/

slli rd, rs1, {1,2,3} add rd, rd, rs2 

There are a few others on https://xiangshan-doc.readthedocs.io/zh-cn/latest/frontend/decode/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I didn't take a deep look into XiangShan's fusions (facepalming).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the second fusion instruction is an instruction with 2 register sources instead of 1 register and an immediate. We'll need to know which operand to constrain.

Then we need to handle them case by case via opcode when adding hints.

wangpc-pp added a commit to wangpc-pp/llvm-project that referenced this pull request Jan 10, 2024
This can save some time to know whether MacroFusion mutation is running in post-ra scheduler. And this can be used in llvm#77461.
@llvmbot
Copy link
Member

llvmbot commented Jan 11, 2024

@llvm/pr-subscribers-backend-risc-v

Author: Wang Pengcheng (wangpc-pp)

Changes

We add a MI flag to indicate the constraint and set this flag to
true for the second instruction of fusible pairs in pre-regalloc
macrofusion.

Then, we add register allocation hints for it.

During regalloc, the allocator will choose the same register
according to the hint.

This is a PoC currently.


Full diff: https://github.com/llvm/llvm-project/pull/77461.diff

4 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/MachineInstr.h (+4)
  • (modified) llvm/lib/CodeGen/MacroFusion.cpp (+7)
  • (modified) llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp (+2)
  • (added) llvm/test/CodeGen/RISCV/pr76779.ll (+37)
diff --git a/llvm/include/llvm/CodeGen/MachineInstr.h b/llvm/include/llvm/CodeGen/MachineInstr.h index bd72ac23fc9c08..f694c27a3f7f7f 100644 --- a/llvm/include/llvm/CodeGen/MachineInstr.h +++ b/llvm/include/llvm/CodeGen/MachineInstr.h @@ -114,6 +114,7 @@ class MachineInstr // this instruction. Unpredictable = 1 << 16, // Instruction with unpredictable condition. NoConvergent = 1 << 17, // Call does not require convergence guarantees. + Fusible = 1 << 18, // Instruction is the second of a fusible pair. }; private: @@ -1030,6 +1031,9 @@ class MachineInstr return hasProperty(MCID::Convergent, Type); } + /// Return true if this instruction is fusible. + bool isFusible() const { return getFlag(Fusible); } + /// Returns true if the specified instruction has a delay slot /// which must be filled by the code generator. bool hasDelaySlot(QueryType Type = AnyInBundle) const { diff --git a/llvm/lib/CodeGen/MacroFusion.cpp b/llvm/lib/CodeGen/MacroFusion.cpp index 5bd6ca0978a4b1..e4aa4636442db2 100644 --- a/llvm/lib/CodeGen/MacroFusion.cpp +++ b/llvm/lib/CodeGen/MacroFusion.cpp @@ -13,6 +13,7 @@ #include "llvm/CodeGen/MacroFusion.h" #include "llvm/ADT/Statistic.h" +#include "llvm/CodeGen/MachineFunction.h" #include "llvm/CodeGen/MachineInstr.h" #include "llvm/CodeGen/ScheduleDAG.h" #include "llvm/CodeGen/ScheduleDAGInstrs.h" @@ -128,6 +129,12 @@ bool llvm::fuseInstructionPair(ScheduleDAGInstrs &DAG, SUnit &FirstSU, } } + // Mark the second instruction of fusible pair as MachineInstr::Fusible if + // this mutation is running in pre-ra scheduler. + if (!DAG.MF.getProperties().hasProperty( + MachineFunctionProperties::Property::NoVRegs)) + SecondSU.getInstr()->setFlag(MachineInstr::Fusible); + ++NumFused; return true; } diff --git a/llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp b/llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp index 24f8d600f1eafc..741bb51d069437 100644 --- a/llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp +++ b/llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp @@ -823,6 +823,8 @@ bool RISCVRegisterInfo::getRegAllocationHints( tryAddHint(MO, MI.getOperand(0), NeedGPRC); } } + if (MI.isFusible() && OpIdx == 1) + tryAddHint(MO, MI.getOperand(0), false); } for (MCPhysReg OrderReg : Order) diff --git a/llvm/test/CodeGen/RISCV/pr76779.ll b/llvm/test/CodeGen/RISCV/pr76779.ll new file mode 100644 index 00000000000000..cec132251e3c4c --- /dev/null +++ b/llvm/test/CodeGen/RISCV/pr76779.ll @@ -0,0 +1,37 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py +;RUN: llc < %s -mtriple=riscv64 -mattr=+f -target-abi=lp64f \ +;RUN: | FileCheck %s --check-prefix=NOFUSION +;RUN: llc < %s -mtriple=riscv64 -mattr=+f,+lui-addi-fusion \ +;RUN: -target-abi=lp64f | FileCheck %s --check-prefix=FUSION +;RUN: llc < %s -mtriple=riscv64 -mattr=+f,+lui-addi-fusion,+use-postra-scheduler \ +;RUN: -target-abi=lp64f | FileCheck %s --check-prefixes=FUSION-POSTRA + +define void @foo(i32 noundef signext %0, i32 noundef signext %1) { +; NOFUSION-LABEL: foo: +; NOFUSION: # %bb.0: +; NOFUSION-NEXT: lui a0, 3014 +; NOFUSION-NEXT: addiw a2, a0, 334 +; NOFUSION-NEXT: mv a0, a1 +; NOFUSION-NEXT: mv a1, a2 +; NOFUSION-NEXT: tail bar +; +; FUSION-LABEL: foo: +; FUSION: # %bb.0: +; FUSION-NEXT: lui a2, 3014 +; FUSION-NEXT: addiw a2, a2, 334 +; FUSION-NEXT: mv a0, a1 +; FUSION-NEXT: mv a1, a2 +; FUSION-NEXT: tail bar +; +; FUSION-POSTRA-LABEL: foo: +; FUSION-POSTRA: # %bb.0: +; FUSION-POSTRA-NEXT: lui a2, 3014 +; FUSION-POSTRA-NEXT: addiw a2, a2, 334 +; FUSION-POSTRA-NEXT: mv a0, a1 +; FUSION-POSTRA-NEXT: mv a1, a2 +; FUSION-POSTRA-NEXT: tail bar + tail call void @bar(i32 noundef signext %1, i32 noundef signext 12345678) + ret void +} + +declare void @bar(i32 noundef signext, i32 noundef signext) 
… fusible pair We add a MI flag to indicate the constraint and set this flag to true for the second instruction of fusible pairs in pre-regalloc macrofusion. Then, we add register allocation hints for it. During regalloc, the allocator will choose the same register according to the hint. This is a PoC currently.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

4 participants