[RISCV] Schedule RVV instructions with compatible type first #95924
base: main
Conversation
Created using spr 1.3.6-beta.1
@llvm/pr-subscribers-backend-risc-v

Author: Pengcheng Wang (wangpc-pp)

Changes

This can reduce some vtype toggles. This can be done in pre-RA scheduling as we have moved insertion of vsetvli after the first RA.

Currently, this is just a PoC and I'd like to gather some feedback.

Full diff: https://github.com/llvm/llvm-project/pull/95924.diff

7 Files Affected:
```diff
diff --git a/llvm/include/llvm/CodeGen/MachineScheduler.h b/llvm/include/llvm/CodeGen/MachineScheduler.h
index b15abf040058e..d1b5b83e5300b 100644
--- a/llvm/include/llvm/CodeGen/MachineScheduler.h
+++ b/llvm/include/llvm/CodeGen/MachineScheduler.h
@@ -1349,14 +1349,6 @@ class PostGenericScheduler : public GenericSchedulerBase {
   void pickNodeFromQueue(SchedBoundary &Zone, SchedCandidate &Cand);
 };
 
-/// Create the standard converging machine scheduler. This will be used as the
-/// default scheduler if the target does not set a default.
-/// Adds default DAG mutations.
-ScheduleDAGMILive *createGenericSchedLive(MachineSchedContext *C);
-
-/// Create a generic scheduler with no vreg liveness or DAG mutation passes.
-ScheduleDAGMI *createGenericSchedPostRA(MachineSchedContext *C);
-
 /// If ReorderWhileClustering is set to true, no attempt will be made to
 /// reduce reordering due to store clustering.
 std::unique_ptr<ScheduleDAGMutation>
@@ -1375,6 +1367,41 @@ std::unique_ptr<ScheduleDAGMutation>
 createCopyConstrainDAGMutation(const TargetInstrInfo *TII,
                                const TargetRegisterInfo *TRI);
 
+/// Create the standard converging machine scheduler. This will be used as the
+/// default scheduler if the target does not set a default.
+/// Adds default DAG mutations.
+template <typename Strategy = GenericScheduler>
+ScheduleDAGMILive *createGenericSchedLive(MachineSchedContext *C) {
+  ScheduleDAGMILive *DAG =
+      new ScheduleDAGMILive(C, std::make_unique<Strategy>(C));
+  // Register DAG post-processors.
+  //
+  // FIXME: extend the mutation API to allow earlier mutations to instantiate
+  // data and pass it to later mutations. Have a single mutation that gathers
+  // the interesting nodes in one pass.
+  DAG->addMutation(createCopyConstrainDAGMutation(DAG->TII, DAG->TRI));
+
+  const TargetSubtargetInfo &STI = C->MF->getSubtarget();
+  // Add MacroFusion mutation if fusions are not empty.
+  const auto &MacroFusions = STI.getMacroFusions();
+  if (!MacroFusions.empty())
+    DAG->addMutation(createMacroFusionDAGMutation(MacroFusions));
+  return DAG;
+}
+
+/// Create a generic scheduler with no vreg liveness or DAG mutation passes.
+template <typename Strategy = PostGenericScheduler>
+ScheduleDAGMI *createGenericSchedPostRA(MachineSchedContext *C) {
+  ScheduleDAGMI *DAG = new ScheduleDAGMI(C, std::make_unique<Strategy>(C),
+                                         /*RemoveKillFlags=*/true);
+  const TargetSubtargetInfo &STI = C->MF->getSubtarget();
+  // Add MacroFusion mutation if fusions are not empty.
+  const auto &MacroFusions = STI.getMacroFusions();
+  if (!MacroFusions.empty())
+    DAG->addMutation(createMacroFusionDAGMutation(MacroFusions));
+  return DAG;
+}
+
 } // end namespace llvm
 
 #endif // LLVM_CODEGEN_MACHINESCHEDULER_H
diff --git a/llvm/lib/CodeGen/MachineScheduler.cpp b/llvm/lib/CodeGen/MachineScheduler.cpp
index cf72f74380835..ac792ad4d5484 100644
--- a/llvm/lib/CodeGen/MachineScheduler.cpp
+++ b/llvm/lib/CodeGen/MachineScheduler.cpp
@@ -2701,7 +2701,7 @@ void SchedBoundary::bumpNode(SUnit *SU) {
   unsigned NextCycle = CurrCycle;
   switch (SchedModel->getMicroOpBufferSize()) {
   case 0:
-    assert(ReadyCycle <= CurrCycle && "Broken PendingQueue");
+    // assert(ReadyCycle <= CurrCycle && "Broken PendingQueue");
     break;
   case 1:
     if (ReadyCycle > NextCycle) {
@@ -3847,26 +3847,6 @@ void GenericScheduler::schedNode(SUnit *SU, bool IsTopNode) {
   }
 }
 
-/// Create the standard converging machine scheduler. This will be used as the
-/// default scheduler if the target does not set a default.
-ScheduleDAGMILive *llvm::createGenericSchedLive(MachineSchedContext *C) {
-  ScheduleDAGMILive *DAG =
-      new ScheduleDAGMILive(C, std::make_unique<GenericScheduler>(C));
-  // Register DAG post-processors.
-  //
-  // FIXME: extend the mutation API to allow earlier mutations to instantiate
-  // data and pass it to later mutations. Have a single mutation that gathers
-  // the interesting nodes in one pass.
-  DAG->addMutation(createCopyConstrainDAGMutation(DAG->TII, DAG->TRI));
-
-  const TargetSubtargetInfo &STI = C->MF->getSubtarget();
-  // Add MacroFusion mutation if fusions are not empty.
-  const auto &MacroFusions = STI.getMacroFusions();
-  if (!MacroFusions.empty())
-    DAG->addMutation(createMacroFusionDAGMutation(MacroFusions));
-  return DAG;
-}
-
 static ScheduleDAGInstrs *createConvergingSched(MachineSchedContext *C) {
   return createGenericSchedLive(C);
 }
@@ -4139,18 +4119,6 @@ void PostGenericScheduler::schedNode(SUnit *SU, bool IsTopNode) {
   }
 }
 
-ScheduleDAGMI *llvm::createGenericSchedPostRA(MachineSchedContext *C) {
-  ScheduleDAGMI *DAG =
-      new ScheduleDAGMI(C, std::make_unique<PostGenericScheduler>(C),
-                        /*RemoveKillFlags=*/true);
-  const TargetSubtargetInfo &STI = C->MF->getSubtarget();
-  // Add MacroFusion mutation if fusions are not empty.
-  const auto &MacroFusions = STI.getMacroFusions();
-  if (!MacroFusions.empty())
-    DAG->addMutation(createMacroFusionDAGMutation(MacroFusions));
-  return DAG;
-}
-
 //===----------------------------------------------------------------------===//
 // ILP Scheduler. Currently for experimental analysis of heuristics.
 //===----------------------------------------------------------------------===//
diff --git a/llvm/lib/Target/RISCV/CMakeLists.txt b/llvm/lib/Target/RISCV/CMakeLists.txt
index 8715403f3839a..fe3f213b253f7 100644
--- a/llvm/lib/Target/RISCV/CMakeLists.txt
+++ b/llvm/lib/Target/RISCV/CMakeLists.txt
@@ -44,6 +44,7 @@ add_llvm_target(RISCVCodeGen
   RISCVISelDAGToDAG.cpp
   RISCVISelLowering.cpp
   RISCVMachineFunctionInfo.cpp
+  RISCVMachineScheduler.cpp
   RISCVMergeBaseOffset.cpp
   RISCVOptWInstrs.cpp
   RISCVPostRAExpandPseudoInsts.cpp
diff --git a/llvm/lib/Target/RISCV/RISCVMachineScheduler.cpp b/llvm/lib/Target/RISCV/RISCVMachineScheduler.cpp
new file mode 100644
index 0000000000000..d993d840c3d3a
--- /dev/null
+++ b/llvm/lib/Target/RISCV/RISCVMachineScheduler.cpp
@@ -0,0 +1,83 @@
+//===- RISCVMachineScheduler.cpp - MI Scheduler for RISC-V ----------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "RISCVMachineScheduler.h"
+#include "MCTargetDesc/RISCVBaseInfo.h"
+#include "MCTargetDesc/RISCVMCTargetDesc.h"
+#include "RISCVInstrInfo.h"
+#include "RISCVSubtarget.h"
+#include "llvm/CodeGen/MachineOperand.h"
+#include "llvm/CodeGen/MachineScheduler.h"
+#include "llvm/CodeGen/ScheduleDAG.h"
+#include "llvm/MC/MCInstrDesc.h"
+#include "llvm/Support/Debug.h"
+#include "llvm/TargetParser/RISCVTargetParser.h"
+
+using namespace llvm;
+
+#define DEBUG_TYPE "riscv-prera-sched-strategy"
+
+static cl::opt<bool> EnableScheduleSameVType(
+    "riscv-enable-schedule-same-vtype", cl::init(false), cl::Hidden,
+    cl::desc("Enable scheduling RVV instructions with same vtype first"));
+
+SUnit *RISCVPreRAMachineSchedStrategy::pickNode(bool &IsTopNode) {
+  if (EnableScheduleSameVType) {
+    for (SUnit *SU : Bot.Available) {
+      MachineInstr *MI = SU->getInstr();
+      const MCInstrDesc &Desc = MI->getDesc();
+      if (RISCVII::hasSEWOp(Desc.TSFlags)) {
+        unsigned CurVSEW = MI->getOperand(RISCVII::getSEWOpNum(Desc)).getImm();
+        RISCVII::VLMUL CurVLMUL = RISCVII::getLMul(Desc.TSFlags);
+        if (CurVSEW == PrevVSEW && CurVLMUL == PrevVLMUL) {
+          Bot.removeReady(SU);
+          IsTopNode = true;
+          return SU;
+        }
+      }
+    }
+    for (SUnit *SU : Bot.Pending) {
+      MachineInstr *MI = SU->getInstr();
+      const MCInstrDesc &Desc = MI->getDesc();
+      if (RISCVII::hasSEWOp(Desc.TSFlags)) {
+        unsigned CurVSEW = MI->getOperand(RISCVII::getSEWOpNum(Desc)).getImm();
+        RISCVII::VLMUL CurVLMUL = RISCVII::getLMul(Desc.TSFlags);
+        if (CurVSEW == PrevVSEW && CurVLMUL == PrevVLMUL) {
+          Bot.removeReady(SU);
+          IsTopNode = false;
+          return SU;
+        }
+      }
+    }
+  }
+  return GenericScheduler::pickNode(IsTopNode);
+}
+
+bool RISCVPreRAMachineSchedStrategy::tryCandidate(SchedCandidate &Cand,
+                                                  SchedCandidate &TryCand,
+                                                  SchedBoundary *Zone) const {
+  bool OriginalResult = GenericScheduler::tryCandidate(Cand, TryCand, Zone);
+
+  return OriginalResult;
+}
+
+void RISCVPreRAMachineSchedStrategy::schedNode(SUnit *SU, bool IsTopNode) {
+  GenericScheduler::schedNode(SU, IsTopNode);
+  MachineInstr *MI = SU->getInstr();
+  const MCInstrDesc &Desc = MI->getDesc();
+  if (RISCVII::hasSEWOp(Desc.TSFlags)) {
+    PrevVSEW = MI->getOperand(RISCVII::getSEWOpNum(Desc)).getImm();
+    PrevVLMUL = RISCVII::getLMul(Desc.TSFlags);
+  }
+  LLVM_DEBUG(dbgs() << "Previous scheduled Unit: ";
+             dbgs() << "SU(" << SU->NodeNum << ") - "; SU->getInstr()->dump(););
+  LLVM_DEBUG(dbgs() << "Previous VSEW : " << (1 << PrevVSEW) << "\n";
+             auto LMUL = RISCVVType::decodeVLMUL(PrevVLMUL);
+             dbgs() << "Previous VLMUL: m" << (LMUL.second ? "f" : "")
+                    << LMUL.first << "\n";);
+}
diff --git a/llvm/lib/Target/RISCV/RISCVMachineScheduler.h b/llvm/lib/Target/RISCV/RISCVMachineScheduler.h
new file mode 100644
index 0000000000000..bd806cef57dcb
--- /dev/null
+++ b/llvm/lib/Target/RISCV/RISCVMachineScheduler.h
@@ -0,0 +1,42 @@
+//===--- RISCVMachineScheduler.h - Custom RISC-V MI scheduler ---*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// Custom RISC-V MI scheduler.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_LIB_TARGET_RISCV_RISCVMACHINESCHEDULER_H
+#define LLVM_LIB_TARGET_RISCV_RISCVMACHINESCHEDULER_H
+
+#include "llvm/CodeGen/MachineScheduler.h"
+#include "llvm/TargetParser/RISCVTargetParser.h"
+
+namespace llvm {
+
+/// A GenericScheduler implementation for RISCV pre RA scheduling.
+class RISCVPreRAMachineSchedStrategy : public GenericScheduler {
+private:
+  RISCVII::VLMUL PrevVLMUL;
+  unsigned PrevVSEW;
+
+public:
+  RISCVPreRAMachineSchedStrategy(const MachineSchedContext *C)
+      : GenericScheduler(C) {}
+
+protected:
+  SUnit *pickNode(bool &IsTopNode) override;
+
+  bool tryCandidate(SchedCandidate &Cand, SchedCandidate &TryCand,
+                    SchedBoundary *Zone) const override;
+
+  void schedNode(SUnit *SU, bool IsTopNode) override;
+};
+
+} // end namespace llvm
+
+#endif
diff --git a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
index 35d0b3408d09f..e0dcbbddc3f53 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
@@ -14,6 +14,7 @@
 #include "MCTargetDesc/RISCVBaseInfo.h"
 #include "RISCV.h"
 #include "RISCVMachineFunctionInfo.h"
+#include "RISCVMachineScheduler.h"
 #include "RISCVTargetObjectFile.h"
 #include "RISCVTargetTransformInfo.h"
 #include "TargetInfo/RISCVTargetInfo.h"
@@ -340,12 +341,11 @@ class RISCVPassConfig : public TargetPassConfig {
 
   ScheduleDAGInstrs *
   createMachineScheduler(MachineSchedContext *C) const override {
-    ScheduleDAGMILive *DAG = nullptr;
-    if (EnableMISchedLoadClustering) {
-      DAG = createGenericSchedLive(C);
+    ScheduleDAGMILive *DAG =
+        createGenericSchedLive<RISCVPreRAMachineSchedStrategy>(C);
+    if (EnableMISchedLoadClustering)
       DAG->addMutation(createLoadClusterDAGMutation(
           DAG->TII, DAG->TRI, /*ReorderWhileClustering=*/true));
-    }
     return DAG;
   }
diff --git a/llvm/test/CodeGen/RISCV/rvv/schedule.ll b/llvm/test/CodeGen/RISCV/rvv/schedule.ll
new file mode 100644
index 0000000000000..baf15ef400df5
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/rvv/schedule.ll
@@ -0,0 +1,49 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=riscv64 -mcpu=sifive-x280 -verify-machineinstrs < %s \
+; RUN:   | FileCheck %s --check-prefix=DEFAULT
+; RUN: llc -mtriple=riscv64 -mcpu=sifive-x280 -riscv-enable-schedule-same-vtype -verify-machineinstrs < %s \
+; RUN:   | FileCheck %s --check-prefix=SAME-VTYPE-FIRST
+
+define <vscale x 1 x i64> @test(<vscale x 1 x i64> %v64_0, <vscale x 1 x i64> %v64_1, <vscale x 1 x i32> %v32_0, <vscale x 1 x i32> %v32_1) {
+; DEFAULT-LABEL: test:
+; DEFAULT:       # %bb.0: # %entry
+; DEFAULT-NEXT:    vsetvli a0, zero, e64, m1, ta, ma
+; DEFAULT-NEXT:    vdiv.vv v12, v8, v9
+; DEFAULT-NEXT:    vsetvli zero, zero, e32, mf2, ta, ma
+; DEFAULT-NEXT:    vdiv.vv v13, v10, v11
+; DEFAULT-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
+; DEFAULT-NEXT:    vadd.vv v8, v8, v9
+; DEFAULT-NEXT:    vsetvli zero, zero, e32, mf2, ta, ma
+; DEFAULT-NEXT:    vadd.vv v9, v10, v11
+; DEFAULT-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
+; DEFAULT-NEXT:    vadd.vv v8, v8, v12
+; DEFAULT-NEXT:    vsetvli zero, zero, e32, mf2, ta, ma
+; DEFAULT-NEXT:    vadd.vv v9, v9, v13
+; DEFAULT-NEXT:    vwadd.wv v8, v8, v9
+; DEFAULT-NEXT:    ret
+;
+; SAME-VTYPE-FIRST-LABEL: test:
+; SAME-VTYPE-FIRST:       # %bb.0: # %entry
+; SAME-VTYPE-FIRST-NEXT:    vsetvli a0, zero, e64, m1, ta, ma
+; SAME-VTYPE-FIRST-NEXT:    vadd.vv v12, v8, v9
+; SAME-VTYPE-FIRST-NEXT:    vdiv.vv v8, v8, v9
+; SAME-VTYPE-FIRST-NEXT:    vadd.vv v8, v12, v8
+; SAME-VTYPE-FIRST-NEXT:    vsetvli zero, zero, e32, mf2, ta, ma
+; SAME-VTYPE-FIRST-NEXT:    vadd.vv v9, v10, v11
+; SAME-VTYPE-FIRST-NEXT:    vdiv.vv v10, v10, v11
+; SAME-VTYPE-FIRST-NEXT:    vadd.vv v9, v9, v10
+; SAME-VTYPE-FIRST-NEXT:    vwadd.wv v8, v8, v9
+; SAME-VTYPE-FIRST-NEXT:    ret
+entry:
+  %0 = add <vscale x 1 x i64> %v64_0, %v64_1
+  %1 = add <vscale x 1 x i32> %v32_0, %v32_1
+  %2 = sdiv <vscale x 1 x i64> %v64_0, %v64_1
+  %3 = sdiv <vscale x 1 x i32> %v32_0, %v32_1
+  %4 = add <vscale x 1 x i64> %0, %2
+  %5 = add <vscale x 1 x i32> %1, %3
+
+  %6 = sext <vscale x 1 x i32> %5 to <vscale x 1 x i64>
+  %7 = add <vscale x 1 x i64> %4, %6
+  ret <vscale x 1 x i64> %7
+}
```
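As background for the vsetvli churn visible in the DEFAULT output above: every RVV instruction executes under the vtype CSR (SEW, LMUL, tail/mask policy), and an extra vsetvli is required whenever consecutive vector instructions disagree on those settings. A minimal standalone sketch of the RVV 1.0 vtype field layout, where the enum and function names are illustrative and not LLVM's RISCVII helpers:

```cpp
#include <cassert>
#include <cstdint>

// RVV 1.0 vtype CSR layout: vlmul in bits [2:0], vsew in bits [5:3],
// vta in bit 6, vma in bit 7. Enumerator values follow the spec encodings.
enum VSew : uint8_t { E8 = 0, E16 = 1, E32 = 2, E64 = 3 };
enum VLMul : uint8_t { M1 = 0, M2 = 1, M4 = 2, M8 = 3, MF8 = 5, MF4 = 6, MF2 = 7 };

uint8_t encodeVType(VSew Sew, VLMul LMul, bool TailAgnostic, bool MaskAgnostic) {
  return static_cast<uint8_t>(LMul) | (static_cast<uint8_t>(Sew) << 3) |
         (TailAgnostic ? 1u << 6 : 0u) | (MaskAgnostic ? 1u << 7 : 0u);
}

// A vtype toggle (an extra vsetvli) is needed between two vector
// instructions exactly when their effective vtype settings differ.
bool needsToggle(uint8_t PrevVType, uint8_t CurVType) {
  return PrevVType != CurVType;
}
```

Scheduling same-vtype instructions back to back makes needsToggle false for adjacent pairs, which is exactly the toggle reduction this patch targets: the SAME-VTYPE-FIRST output above carries two vsetvli instructions instead of seven.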
```
; SAME-VTYPE-FIRST-NEXT:    vadd.vv v12, v8, v9
; SAME-VTYPE-FIRST-NEXT:    vdiv.vv v8, v8, v9
; SAME-VTYPE-FIRST-NEXT:    vadd.vv v8, v12, v8
; SAME-VTYPE-FIRST-NEXT:    vsetvli zero, zero, e32, mf2, ta, ma
```
This is a pretty cool idea. Do you know how this impacts performance on a benchmark like SPEC?
This is a pretty neat idea.

There are a few things we want to balance:

1. Reduce the instruction count due to the number of vtype toggles
2. Avoid stalls due to latency (dependent result not ready)
3. Avoid stalls due to resource consumption (resources not available)

I am curious how we will be able to balance these three. In the current state of this patch, we are prioritizing (1) and falling back to GenericScheduler::pickNode(IsTopNode) to handle (2) and (3) only in the cases when we don't have the ability to do (1). It is unclear to me whether (1) should be so important that we ignore (2) and (3).

It would be nice to have some data on how the currently proposed approach impacts the performance of benchmarks. I'd also be curious to explore balancing heuristic (1) with (2) and (3) to see how that impacts performance.
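One way to make criterion (1) a tie-breaker rather than an override can be sketched as a standalone toy; the struct, its field names, and the priority order below are hypothetical illustrations, not code from this patch:

```cpp
#include <cassert>

// Toy scheduling candidate: lower is better for stalls and latency;
// SameVType records whether it matches the previously scheduled vtype.
struct Cand {
  int StallCycles;     // (3) stalls from resource consumption
  int CriticalLatency; // (2) latency until dependents are ready
  bool SameVType;      // (1) no vtype toggle needed
};

// Returns true if A should be scheduled before B. Criteria (3) and (2)
// dominate; the vtype match in (1) only breaks the remaining ties.
bool betterCand(const Cand &A, const Cand &B) {
  if (A.StallCycles != B.StallCycles)
    return A.StallCycles < B.StallCycles;
  if (A.CriticalLatency != B.CriticalLatency)
    return A.CriticalLatency < B.CriticalLatency;
  return A.SameVType && !B.SameVType;
}
```

Under this ordering a candidate with fewer modeled stalls still wins even when the other one avoids a vsetvli, which is the "tie-break only" policy suggested in the comments below.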
Interesting prototype! @michaelmaitland already responded with a good summary of the concerns, so let me just second him. My default would be to assume that vtype toggles are pretty cheap, and that we should purely be using (1) to tie-break when (2) and (3) don't order scheduling, but I'll freely admit I don't have any strong data on this. I'd encourage you to run a few workloads and see what you get with different heuristics.
Created using spr 1.3.6-beta.1
```cpp
      if (SUnit *SU = FindPotentialRVVInstruction(Top, true))
        return SU;
    } else {
      if (SUnit *SU =
```
In the GenericScheduler, we tend not to pick from the Pending queues. It is usually better to move the node to Available and have the pick functions take only from Available. Otherwise, there are two cases we run into:

1. HazardRecognizers try to keep nodes on the Pending queue, and this code here will ignore that. It will be really hard to keep the intended functionality of HazardRecognizers if we pick from Pending.
2. A node is Pending because it will lead to stalls according to the scheduler model. Picking from it ignores the scheduler model.

Do we really need to pick from Pending here?
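The Pending/Available distinction described above can be modeled with a small standalone toy (hypothetical names, not LLVM's SchedBoundary API): nodes wait in Pending until their modeled ready cycle, and the picker only consults Available after releasing matured nodes, advancing the cycle rather than reaching into Pending:

```cpp
#include <cassert>
#include <vector>

struct Node {
  int Id;
  int ReadyCycle; // cycle at which the scheduler model says it can issue
};

struct Boundary {
  int CurCycle = 0;
  std::vector<Node> Pending, Available;

  // Move any node whose ready cycle has arrived from Pending to Available.
  void releasePending() {
    for (auto It = Pending.begin(); It != Pending.end();) {
      if (It->ReadyCycle <= CurCycle) {
        Available.push_back(*It);
        It = Pending.erase(It);
      } else {
        ++It;
      }
    }
  }

  // Pick only from Available; advance the cycle when nothing is ready
  // instead of taking from Pending and ignoring modeled stalls.
  // Assumes at least one node remains somewhere.
  Node pick() {
    releasePending();
    while (Available.empty()) {
      ++CurCycle;
      releasePending();
    }
    Node N = Available.back();
    Available.pop_back();
    return N;
  }
};
```

Picking directly from Pending would correspond to returning a node whose ReadyCycle is still in the future, which is exactly the stall the scheduler model is trying to avoid.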
No, I think we shouldn't. I did this just because I wanted a quick prototype 😄.
I will make it reasonable later.
I gave some more thought on this. I think we can reuse GenericScheduler::pickNode and we should instead be overriding pickNodeFromQueue. pickNodeFromQueue is where the real picking of the node happens. GenericScheduler::pickNode just gets the correct queue based on direction and passes it to pickNodeFromQueue. Something like this:

```cpp
SUnit *RISCVPreRAMachineSchedStrategy::pickNodeFromQueue(
    SchedBoundary &Zone, const CandPolicy &ZonePolicy,
    const RegPressureTracker &RPTracker, SchedCandidate &Cand) {
  SchedCandidate RVVCand = FindRVVCandidate(Zone);
  GenericScheduler::pickNodeFromQueue(Zone, ZonePolicy, RPTracker, Cand);
  // Pass SchedBoundary only when comparing nodes from the same boundary.
  SchedBoundary *ZoneArg = Cand.AtTop == RVVCand.AtTop ? &Zone : nullptr;
  // TODO: we need to add our own heuristics here or inside an overridden
  // tryCandidate to make sure that we balance clustering RVV with same vtype
  // with the existing heuristics such as register pressure, latency,
  // resource usage, etc.
  if (tryCandidate(RVVCand, Cand, ZoneArg))
    return RVVCand;
  return Cand;
}
```

```cpp
      if (SUnit *SU =
              FindPotentialRVVInstructionInQueue(Top, Top.Pending, true))
        return SU;
    }
```
In the bidirectional case, do you need to use tryCandidate to compare whether the Top or Bot candidate is better?
Yeah, good point! Will do it in next revision!
We need auto-vectorization to generate RVV instructions, so I just tested the runtime of TSVC on a K230 board.

We can see some improvements and some regressions as well. In total, we don't have much gain here (about 0.65%).
Besides latency and instruction count, register pressure also needs to be considered. IIRC ...

EDIT: I forgot RISCVVSETVLIInsertion is after RA, so you can ignore this idea. You probably need to balance grouping vtypes, latencies, register pressure, and resource usage at the same time; otherwise an individual-pass approach will undo changes made in the first pass.
We could use a mutation to constrain the same group of vtype instructions, instead of having vsetvli insertion create barriers between instructions. Based on my experience, this approach still disrupts some patterns in step 3. At best, it eliminates some vsetvli instructions; at worst, it introduces additional spills and reloads. This doesn't seem ideal.
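For concreteness, the mutation idea mentioned here might look like the following standalone toy (not LLVM's ScheduleDAGMutation API; all names are made up): chain nodes that share a vtype with artificial ordering edges so the scheduler is biased toward keeping them adjacent:

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

struct SU {
  int Id;
  int VType; // an opaque key standing in for the (SEW, LMUL) pair
};

// Toy "mutation": for each vtype class, chain its nodes in their current
// order with artificial edges (Prev -> Next). A real mutation would also
// have to verify each edge does not create a cycle with true dependencies.
std::vector<std::pair<int, int>>
addVTypeClusterEdges(const std::vector<SU> &DAG) {
  std::map<int, int> LastOfVType; // vtype key -> id of last node seen
  std::vector<std::pair<int, int>> Edges;
  for (const SU &U : DAG) {
    auto It = LastOfVType.find(U.VType);
    if (It != LastOfVType.end())
      Edges.push_back({It->second, U.Id}); // artificial edge Prev -> U
    LastOfVType[U.VType] = U.Id;
  }
  return Edges;
}
```

The downside described in the comment shows up here too: artificial edges lengthen live ranges of values produced early in a chain, which is where the extra spills and reloads come from.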
I have thought about the mutation approach before but haven't tried it. I think that can be another feasible approach. Do you have a prototype that can be evaluated?
Sure. I have two prototypes to share.

They both reuse the vsetvli pass's VSETVLInfo to check whether two instructions have the same configuration. The second one is more like the approach in this patch but modifying ... I think these PoCs still have room to improve, but they are enough to be evaluated for comparison data. (Like the mutation could use the ...) If you find anything useful in these prototypes, feel free to integrate it into this patch.

Mutation SPEC2k17 data

Custom Sched SPEC2k17 data
@BeMg Hi Piyou, I don't have much time to push this PR further. If you have interest and time for this, you can continue the work you have done before, thanks! :-)
Created using spr 1.3.6-beta.1
I'd like to push this forward. I don't know if @BeMg has made some progress here? I want to divide this into several parts:
Created using spr 1.3.6-beta.1
🐧 Linux x64 Test Results
Failed Tests (click on a test name to see its output)

- LLVM :: CodeGen/AMDGPU/amdgcn-cs-chain-intrinsic-dyn-vgpr-w32.ll (Likely Already Failing) — this test is already failing at the base commit.

If these failures are unrelated to your changes (for example tests are broken or flaky at HEAD), please open an issue at https://github.com/llvm/llvm-project/issues and add the
This can reduce some vtype toggles.

This can be done in pre-RA scheduling as we have moved insertion of vsetvli after the first RA.

Currently, we override tryCandidate and add a new heuristic based on comparison of vtypes.
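The vtype-comparison heuristic described above can be sketched standalone; the VType struct and the tri-state return convention are illustrative only (the actual patch reads SEW and LMUL from machine-instruction operands via RISCVII helpers):

```cpp
#include <cassert>
#include <optional>

// Illustrative stand-in for an instruction's vtype: element width in bits
// and log2 of the register group multiplier (m1 = 0, mf2 = -1, m2 = 1, ...).
struct VType {
  unsigned SEW;
  int LMULLog2;
  bool operator==(const VType &O) const {
    return SEW == O.SEW && LMULLog2 == O.LMULLog2;
  }
};

// Tie-break in the spirit of an overridden tryCandidate: prefer the
// candidate whose vtype matches the last scheduled vector instruction,
// avoiding a vsetvli toggle. Returns +1 if TryCand wins, -1 if Cand wins,
// 0 if this heuristic does not decide.
int compareVType(const std::optional<VType> &Prev,
                 const std::optional<VType> &Cand,
                 const std::optional<VType> &TryCand) {
  if (!Prev)
    return 0; // no vector instruction scheduled yet
  bool CandMatch = Cand && *Cand == *Prev;
  bool TryMatch = TryCand && *TryCand == *Prev;
  if (CandMatch == TryMatch)
    return 0;
  return TryMatch ? 1 : -1;
}
```

Returning 0 when neither or both candidates match leaves the decision to the generic heuristics (register pressure, latency, resources), which keeps the vtype preference a tie-breaker in line with the reviewers' suggestions.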