Skip to content

Conversation

@lucas-rami
Copy link
Contributor

@lucas-rami lucas-rami commented Feb 25, 2025

The MI scheduler skips regions containing a single MI during scheduling. This can prevent targets that perform multi-stage scheduling and move MIs between regions during some stages to reason correctly about the entire IR, since some MIs will not be assigned to a region at the beginning.

This makes the machine scheduler no longer skip single-MI regions. Only a few unit tests are affected (mainly those which check for the scheduler's debug output).

@lucas-rami lucas-rami added backend:AMDGPU mi-sched machine instruction scheduler labels Feb 25, 2025
@llvmbot
Copy link
Member

llvmbot commented Feb 25, 2025

@llvm/pr-subscribers-backend-powerpc
@llvm/pr-subscribers-backend-arm

@llvm/pr-subscribers-backend-amdgpu

Author: Lucas Ramirez (lucas-rami)

Changes

The MI scheduler skips regions containing a single MI during scheduling. This can prevent targets that perform multi-stage scheduling and move MIs between regions during some stages to reason correctly about the entire IR, since some MIs will not be assigned to a region at the beginning.

This adds a flag to ScheduleDAGInstrs to tell the scheduler to not skip over single-MI regions. Only the AMDGPU target currently enables this flag, so scheduling behavior is unaffected for all other targets.


Full diff: https://github.com/llvm/llvm-project/pull/128739.diff

5 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/ScheduleDAGInstrs.h (+8)
  • (modified) llvm/lib/CodeGen/MachineScheduler.cpp (+2-1)
  • (modified) llvm/lib/CodeGen/ScheduleDAGInstrs.cpp (+3-2)
  • (modified) llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp (+4-1)
  • (modified) llvm/test/CodeGen/AMDGPU/debug-value-scheduler-liveins.mir (+2)
diff --git a/llvm/include/llvm/CodeGen/ScheduleDAGInstrs.h b/llvm/include/llvm/CodeGen/ScheduleDAGInstrs.h index aaa10e684687c..82240745c2772 100644 --- a/llvm/include/llvm/CodeGen/ScheduleDAGInstrs.h +++ b/llvm/include/llvm/CodeGen/ScheduleDAGInstrs.h @@ -124,6 +124,9 @@ namespace llvm { /// rescheduling). bool RemoveKillFlags; + /// True if regions with a single MI should be scheduled. + bool ScheduleSingleMIRegions = false; + /// The standard DAG builder does not normally include terminators as DAG /// nodes because it does not create the necessary dependencies to prevent /// reordering. A specialized scheduler can override @@ -288,6 +291,11 @@ namespace llvm { return Topo.IsReachable(SU, TargetSU); } + /// Whether regions with a single MI should be scheduled. + bool shouldScheduleSingleMIRegions() const { + return ScheduleSingleMIRegions; + } + /// Returns an iterator to the top of the current scheduling region. MachineBasicBlock::iterator begin() const { return RegionBegin; } diff --git a/llvm/lib/CodeGen/MachineScheduler.cpp b/llvm/lib/CodeGen/MachineScheduler.cpp index 0da7535031a7d..b9903ee832d31 100644 --- a/llvm/lib/CodeGen/MachineScheduler.cpp +++ b/llvm/lib/CodeGen/MachineScheduler.cpp @@ -770,6 +770,7 @@ void MachineSchedulerBase::scheduleRegions(ScheduleDAGInstrs &Scheduler, MBBRegionsVector MBBRegions; getSchedRegions(&*MBB, MBBRegions, Scheduler.doMBBSchedRegionsTopDown()); + bool ScheduleSingleMI = Scheduler.shouldScheduleSingleMIRegions(); for (const SchedRegion &R : MBBRegions) { MachineBasicBlock::iterator I = R.RegionBegin; MachineBasicBlock::iterator RegionEnd = R.RegionEnd; @@ -780,7 +781,7 @@ void MachineSchedulerBase::scheduleRegions(ScheduleDAGInstrs &Scheduler, Scheduler.enterRegion(&*MBB, I, RegionEnd, NumRegionInstrs); // Skip empty scheduling regions (0 or 1 schedulable instructions). - if (I == RegionEnd || I == std::prev(RegionEnd)) { + if (I == RegionEnd || (!ScheduleSingleMI && I == std::prev(RegionEnd))) { // Close the current region. Bundle the terminator if needed. // This invalidates 'RegionEnd' and 'I'. Scheduler.exitRegion(); diff --git a/llvm/lib/CodeGen/ScheduleDAGInstrs.cpp b/llvm/lib/CodeGen/ScheduleDAGInstrs.cpp index a26804707dd1f..cd652659dfdef 100644 --- a/llvm/lib/CodeGen/ScheduleDAGInstrs.cpp +++ b/llvm/lib/CodeGen/ScheduleDAGInstrs.cpp @@ -116,8 +116,9 @@ ScheduleDAGInstrs::ScheduleDAGInstrs(MachineFunction &mf, bool RemoveKillFlags) : ScheduleDAG(mf), MLI(mli), MFI(mf.getFrameInfo()), RemoveKillFlags(RemoveKillFlags), - UnknownValue(UndefValue::get( - Type::getVoidTy(mf.getFunction().getContext()))), Topo(SUnits, &ExitSU) { + UnknownValue( + UndefValue::get(Type::getVoidTy(mf.getFunction().getContext()))), + Topo(SUnits, &ExitSU) { DbgValues.clear(); const TargetSubtargetInfo &ST = mf.getSubtarget(); diff --git a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp index 176586e3fbbb6..dbab18b7ae46f 100644 --- a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp +++ b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp @@ -759,7 +759,10 @@ GCNScheduleDAGMILive::GCNScheduleDAGMILive( MFI(*MF.getInfo<SIMachineFunctionInfo>()), StartingOccupancy(MFI.getOccupancy()), MinOccupancy(StartingOccupancy), RegionLiveOuts(this, /*IsLiveOut=*/true) { - + // We want regions with a single MI to be scheduled so that we can reason on + // them correctlt during scheduling stages that move MIs between regions (e.g. + // rematerialization). + ScheduleSingleMIRegions = true; LLVM_DEBUG(dbgs() << "Starting occupancy is " << StartingOccupancy << ".\n"); if (RelaxedOcc) { MinOccupancy = std::min(MFI.getMinAllowedOccupancy(), StartingOccupancy); diff --git a/llvm/test/CodeGen/AMDGPU/debug-value-scheduler-liveins.mir b/llvm/test/CodeGen/AMDGPU/debug-value-scheduler-liveins.mir index d415346b49b28..2a08c52e447ba 100644 --- a/llvm/test/CodeGen/AMDGPU/debug-value-scheduler-liveins.mir +++ b/llvm/test/CodeGen/AMDGPU/debug-value-scheduler-liveins.mir @@ -2,6 +2,8 @@ # RUN: llc -mtriple=amdgcn -mcpu=gfx908 -passes=machine-scheduler %s -o - -debug-only=machine-scheduler 2>&1 | FileCheck %s # REQUIRES: asserts +# CHECK: ********** MI Scheduling ********** +# CHECK-NEXT: test_get_liveins:%bb.0 # CHECK: ********** MI Scheduling ********** # CHECK-NEXT: test_get_liveins:%bb.1 # CHECK: Region live-in pressure: VGPRs: 1 AGPRs: 0, SGPRs: 0, LVGPR WT: 0, LSGPR WT: 0 
@lucas-rami lucas-rami requested review from arsenm, bcahoon, davemgreen and jrbyrnes and removed request for arsenm and bcahoon February 25, 2025 16:32
Co-authored-by: Jay Foad <jay.foad@gmail.com>
Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testcase where this makes a difference?

bool RemoveKillFlags;

/// True if regions with a single MI should be scheduled.
bool ScheduleSingleMIRegions = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Over-configuration? Can you just unconditionally do this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the flag and made this the generic scheduler behavior. The only unit test that breaks in a way I cannot really understand is misched-branch-targets.mir. I "fixed" it anyway, but not sure if the change break something here.

Copy link
Contributor

@shiltian shiltian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I looking at the right diff here? The only code change shown here is the removal of I == std::prev(RegionEnd).

@lucas-rami
Copy link
Contributor Author

lucas-rami commented Feb 26, 2025

Testcase where this makes a difference?

I spent some time trying to come up with one but didn't manage to. This change is motivated by ongoing pre-RA rematerialization work on the AMDGPU backend (see #125885) which requires that we are able to track regions with a single MI. The current implementation follows a different logic to identify rematerializable instructions, and I can't manage to make this fail/succeed depending on whether single-MI regions are filtered out/in.

Am I looking at the right diff here? The only code change shown here is the removal of I == std::prev(RegionEnd).

A few unit tests across various targets were also updated manually, and some previous changes reverted.

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the comment about the flag from the message

@lucas-rami lucas-rami merged commit 15e295d into llvm:main Feb 27, 2025
6 of 10 checks passed
@nikic
Copy link
Contributor

nikic commented Feb 27, 2025

@jayfoad
Copy link
Contributor

jayfoad commented Feb 28, 2025

The MI scheduler skips regions containing a single MI during scheduling. This can prevent targets that perform multi-stage scheduling and move MIs between regions during some stages to reason correctly about the entire IR, since some MIs will not be assigned to a region at the beginning.

I thought the scheduler only ever built the DAG for one region at a time. What's the mechanism for having DAG for multiple regions live at the same time?

@lucas-rami
Copy link
Contributor Author

lucas-rami commented Feb 28, 2025

Can we not do this on targets that don't need it?

I am ok reintroducing the flag to only conditionally do this for targets that care (off-by-default).

I thought the scheduler only ever built the DAG for one region at a time. What's the mechanism for having DAG for multiple regions live at the same time?

The AMDGPU backend doesn't actually schedule in the ScheduleDAGInstrs::schedule hook, it only collects the regions there. Scheduling only happens in the ScheduleDAGInstrs::finalizeSchedule hook, once all regions have been collected. In the end it still schedule regions one at a time, but there may be some instruction movement across regions between scheduling stages.

joaosaffran pushed a commit to joaosaffran/llvm-project that referenced this pull request Mar 3, 2025
…m#128739) The MI scheduler skips regions containing a single MI during scheduling. This can prevent targets that perform multi-stage scheduling and move MIs between regions during some stages to reason correctly about the entire IR, since some MIs will not be assigned to a region at the beginning. This makes the machine scheduler no longer skip single-MI regions. Only a few unit tests are affected (mainly those which check for the scheduler's debug output).
lucas-rami added a commit that referenced this pull request Mar 4, 2025
Following 15e295d the machine scheduler no longer filters-out single-MI regions when emitting regions to schedule. While this has no functional impact at the moment, it generally has a negative compile-time impact (see #128739). Since all targets but AMDGPU do not care for this behavior, this introduces an off-by-default flag to `ScheduleDAGInstrs` to control whether such regions are going to be scheduled, effectively reverting 15e295d for all targets but AMDGPU (currently the only target enabling this flag).
jrbyrnes pushed a commit to jrbyrnes/llvm-project that referenced this pull request May 1, 2025
) Following 15e295d the machine scheduler no longer filters-out single-MI regions when emitting regions to schedule. While this has no functional impact at the moment, it generally has a negative compile-time impact (see llvm#128739). Since all targets but AMDGPU do not care for this behavior, this introduces an off-by-default flag to `ScheduleDAGInstrs` to control whether such regions are going to be scheduled, effectively reverting 15e295d for all targets but AMDGPU (currently the only target enabling this flag). Change-Id: Ib38f9b7e8d2bb1073cb43ed7e58eaf251ffdce48
@lucas-rami lucas-rami deleted the single-mi-region-scheduling branch November 17, 2025 10:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment