[LICM] Promote conditional, loop-invariant memory accesses to scalars with intrinsic #93999
base: main
Conversation
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. If you have received no comments on your PR for a week, you can request a review. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-backend-powerpc @llvm/pr-subscribers-backend-amdgpu @llvm/pr-subscribers-backend-aarch64

Author: Ivan Shumakov (ii-sc)

Changes

There is a missed opportunity to promote conditional stores in the LICM pass. This was implemented in an old patch: https://reviews.llvm.org/D115244. That patch has a minor flaw: on some architectures the masked store intrinsic lowers to vector instructions. For example, on RISC-V this code

```c
int res;
void test(int * restrict a, int N) {
  for (int i = 0; i < N; ++i)
    if (a[i])
      ++res;
}
```

translates to assembly that uses vector instructions:

```
.LBB0_2:                                # %for.cond.cleanup
        andi     a4, a4, 1
        vsetivli zero, 1, e32, mf2, ta, ma
        vmv.v.x  v9, a4
        vmv.v.x  v8, a3
        vsetvli  zero, zero, e8, mf8, ta, ma
        vmsne.vi v0, v9, 0
        vse32.v  v8, (a2), v0.t
```

which is unnecessary here. I have implemented …

Patch is 32.01 KiB, truncated to 20.00 KiB below; full version: https://github.com/llvm/llvm-project/pull/93999.diff

20 Files Affected:
diff --git a/llvm/include/llvm/IR/IRBuilder.h b/llvm/include/llvm/IR/IRBuilder.h index 40a9cf507248a..06fc284c7a824 100644 --- a/llvm/include/llvm/IR/IRBuilder.h +++ b/llvm/include/llvm/IR/IRBuilder.h @@ -818,6 +818,10 @@ class IRBuilderBase { /// Create a call to llvm.threadlocal.address intrinsic. CallInst *CreateThreadLocalAddress(Value *Ptr); + // Create a call to a Conditional Store intrinsic + CallInst *CreateConditionalStore(Value *Val, Value *Ptr, Align Alignment, + Value *Condition); + /// Create a call to Masked Load intrinsic CallInst *CreateMaskedLoad(Type *Ty, Value *Ptr, Align Alignment, Value *Mask, Value *PassThru = nullptr, const Twine &Name = ""); diff --git a/llvm/include/llvm/IR/Intrinsics.td b/llvm/include/llvm/IR/Intrinsics.td index 107442623ab7b..95a9f6cc04de2 100644 --- a/llvm/include/llvm/IR/Intrinsics.td +++ b/llvm/include/llvm/IR/Intrinsics.td @@ -2322,6 +2322,21 @@ def int_vp_is_fpclass: llvm_i32_ty], [IntrNoMem, IntrSpeculatable, ImmArg<ArgIndex<1>>]>; +//===-------------------------- Conditional Intrinsics --------------------===// +// + +def int_conditional_store: + DefaultAttrsIntrinsic</*ret_types*/[], + /*param_types*/[/*Val*/llvm_any_ty, + /*Ptr*/llvm_anyptr_ty, + /*Alignment*/llvm_i32_ty, + /*Condition*/llvm_i1_ty], + /*intr_properties*/[IntrWriteMem, + IntrArgMemOnly, + IntrWillReturn, + /*Alignment is a constant*/ImmArg<ArgIndex<2>>, + NoCapture<ArgIndex<1>>]>; + //===-------------------------- Masked Intrinsics -------------------------===// // def int_masked_load: diff --git a/llvm/include/llvm/InitializePasses.h b/llvm/include/llvm/InitializePasses.h index 9ba75d491c1c9..c36f035e00bdc 100644 --- a/llvm/include/llvm/InitializePasses.h +++ b/llvm/include/llvm/InitializePasses.h @@ -169,6 +169,7 @@ void initializeLoopUnrollPass(PassRegistry&); void initializeLowerAtomicLegacyPassPass(PassRegistry&); void initializeLowerConstantIntrinsicsPass(PassRegistry&); void initializeLowerEmuTLSPass(PassRegistry&); +void initializeLowerConditionalStoreIntrinsicLegacyPass(PassRegistry &); void initializeLowerGlobalDtorsLegacyPassPass(PassRegistry &); void initializeLowerIntrinsicsPass(PassRegistry&); void initializeLowerInvokeLegacyPassPass(PassRegistry&); diff --git a/llvm/include/llvm/LinkAllPasses.h b/llvm/include/llvm/LinkAllPasses.h index 30e7c22f31460..042a0e8768380 100644 --- a/llvm/include/llvm/LinkAllPasses.h +++ b/llvm/include/llvm/LinkAllPasses.h @@ -90,6 +90,7 @@ namespace { (void) llvm::createLoopStrengthReducePass(); (void) llvm::createLoopUnrollPass(); (void) llvm::createLowerConstantIntrinsicsPass(); + (void)llvm::createLowerConditionalStoreIntrinsicPass(); (void) llvm::createLowerGlobalDtorsLegacyPass(); (void) llvm::createLowerInvokePass(); (void) llvm::createLowerSwitchPass(); diff --git a/llvm/include/llvm/Transforms/Scalar.h b/llvm/include/llvm/Transforms/Scalar.h index f74a49785e11b..962f5c75a01e6 100644 --- a/llvm/include/llvm/Transforms/Scalar.h +++ b/llvm/include/llvm/Transforms/Scalar.h @@ -143,6 +143,11 @@ Pass *createMergeICmpsLegacyPass(); FunctionPass *createInferAddressSpacesPass(unsigned AddressSpace = ~0u); extern char &InferAddressSpacesID; +//===----------------------------------------------------------------------===// +// +// Lower conditional store intrinsic +FunctionPass *createLowerConditionalStoreIntrinsicPass(); + //===----------------------------------------------------------------------===// // // TLSVariableHoist - This pass reduce duplicated TLS address call. 
diff --git a/llvm/include/llvm/Transforms/Scalar/LowerConditionalStoreIntrinsic.h b/llvm/include/llvm/Transforms/Scalar/LowerConditionalStoreIntrinsic.h new file mode 100644 index 0000000000000..f3b6f6ce2a185 --- /dev/null +++ b/llvm/include/llvm/Transforms/Scalar/LowerConditionalStoreIntrinsic.h @@ -0,0 +1,30 @@ +//===- LowerConditionalStoreIntrinsic.h -------------------------*- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// +/// \file +/// +/// Pass for early lowering of conditional store. +/// +//===----------------------------------------------------------------------===// + +#ifndef LLVM_TRANSFORMS_SCALAR_LOWERCONDSTOREINTRINSIC_H +#define LLVM_TRANSFORMS_SCALAR_LOWERCONDSTOREINTRINSIC_H + +#include "llvm/IR/PassManager.h" + +namespace llvm { + +class Function; + +struct LowerConditionalStoreIntrinsicPass + : PassInfoMixin<LowerConditionalStoreIntrinsicPass> { + PreservedAnalyses run(Function &F, FunctionAnalysisManager &); +}; + +} // namespace llvm + +#endif \ No newline at end of file diff --git a/llvm/lib/IR/IRBuilder.cpp b/llvm/lib/IR/IRBuilder.cpp index b32799355d692..4a0ba04a86e24 100644 --- a/llvm/lib/IR/IRBuilder.cpp +++ b/llvm/lib/IR/IRBuilder.cpp @@ -566,6 +566,22 @@ Instruction *IRBuilderBase::CreateNoAliasScopeDeclaration(Value *Scope) { return CreateCall(FnIntrinsic, {Scope}); } +/// Create a call to a Conditional Store intrinsic. +/// \p Val - data to be stored, +/// \p Ptr - base pointer for the store +/// \p Alignment - alignment of the destination location +/// \p Condition - boolean that indicates if store should be performed +CallInst *IRBuilderBase::CreateConditionalStore(Value *Val, Value *Ptr, + Align Alignment, + Value *Condition) { + auto *PtrTy = cast<PointerType>(Ptr->getType()); + Type *DataTy = Val->getType(); + Type *OverloadedTypes[] = {DataTy, PtrTy}; + Value *Ops[] = {Val, Ptr, getInt32(Alignment.value()), Condition}; + return CreateMaskedIntrinsic(Intrinsic::conditional_store, Ops, + OverloadedTypes); +} + /// Create a call to a Masked Load intrinsic. 
/// \p Ty - vector type to load /// \p Ptr - base pointer for the load diff --git a/llvm/lib/Passes/PassBuilder.cpp b/llvm/lib/Passes/PassBuilder.cpp index 734ca4d5deec9..fe6b75c3b8edd 100644 --- a/llvm/lib/Passes/PassBuilder.cpp +++ b/llvm/lib/Passes/PassBuilder.cpp @@ -232,6 +232,7 @@ #include "llvm/Transforms/Scalar/LoopUnrollPass.h" #include "llvm/Transforms/Scalar/LoopVersioningLICM.h" #include "llvm/Transforms/Scalar/LowerAtomicPass.h" +#include "llvm/Transforms/Scalar/LowerConditionalStoreIntrinsic.h" #include "llvm/Transforms/Scalar/LowerConstantIntrinsics.h" #include "llvm/Transforms/Scalar/LowerExpectIntrinsic.h" #include "llvm/Transforms/Scalar/LowerGuardIntrinsic.h" diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp index 1892e16a06528..382c57c0375e0 100644 --- a/llvm/lib/Passes/PassBuilderPipelines.cpp +++ b/llvm/lib/Passes/PassBuilderPipelines.cpp @@ -110,6 +110,7 @@ #include "llvm/Transforms/Scalar/LoopUnrollAndJamPass.h" #include "llvm/Transforms/Scalar/LoopUnrollPass.h" #include "llvm/Transforms/Scalar/LoopVersioningLICM.h" +#include "llvm/Transforms/Scalar/LowerConditionalStoreIntrinsic.h" #include "llvm/Transforms/Scalar/LowerConstantIntrinsics.h" #include "llvm/Transforms/Scalar/LowerExpectIntrinsic.h" #include "llvm/Transforms/Scalar/LowerMatrixIntrinsics.h" @@ -499,7 +500,7 @@ PassBuilder::buildO1FunctionSimplificationPipeline(OptimizationLevel Level, FPM.addPass(createFunctionToLoopPassAdaptor(std::move(LPM2), /*UseMemorySSA=*/false, /*UseBlockFrequencyInfo=*/false)); - + FPM.addPass(LowerConditionalStoreIntrinsicPass()); // Delete small array after loop unroll. FPM.addPass(SROAPass(SROAOptions::ModifyCFG)); @@ -691,7 +692,7 @@ PassBuilder::buildFunctionSimplificationPipeline(OptimizationLevel Level, FPM.addPass(createFunctionToLoopPassAdaptor(std::move(LPM2), /*UseMemorySSA=*/false, /*UseBlockFrequencyInfo=*/false)); - + FPM.addPass(LowerConditionalStoreIntrinsicPass()); // Delete small array after loop unroll. FPM.addPass(SROAPass(SROAOptions::ModifyCFG)); @@ -744,7 +745,7 @@ PassBuilder::buildFunctionSimplificationPipeline(OptimizationLevel Level, LICMPass(PTO.LicmMssaOptCap, PTO.LicmMssaNoAccForPromotionCap, /*AllowSpeculation=*/true), /*UseMemorySSA=*/true, /*UseBlockFrequencyInfo=*/false)); - + FPM.addPass(LowerConditionalStoreIntrinsicPass()); FPM.addPass(CoroElidePass()); invokeScalarOptimizerLateEPCallbacks(FPM, Level); @@ -1279,6 +1280,7 @@ void PassBuilder::addVectorPasses(OptimizationLevel Level, SimplifyCFGPass(SimplifyCFGOptions().convertSwitchRangeToICmp(true))); ExtraPasses.addPass(InstCombinePass()); FPM.addPass(std::move(ExtraPasses)); + FPM.addPass(LowerConditionalStoreIntrinsicPass()); } // Now that we've formed fast to execute loop structures, we do further @@ -1354,6 +1356,7 @@ void PassBuilder::addVectorPasses(OptimizationLevel Level, LICMPass(PTO.LicmMssaOptCap, PTO.LicmMssaNoAccForPromotionCap, /*AllowSpeculation=*/true), /*UseMemorySSA=*/true, /*UseBlockFrequencyInfo=*/false)); + FPM.addPass(LowerConditionalStoreIntrinsicPass()); // Now that we've vectorized and unrolled loops, we may have more refined // alignment information, try to re-derive it here. 
@@ -1950,6 +1953,7 @@ PassBuilder::buildLTODefaultPipeline(OptimizationLevel Level, LICMPass(PTO.LicmMssaOptCap, PTO.LicmMssaNoAccForPromotionCap, /*AllowSpeculation=*/true), /*USeMemorySSA=*/true, /*UseBlockFrequencyInfo=*/false)); + MainFPM.addPass(LowerConditionalStoreIntrinsicPass()); if (RunNewGVN) MainFPM.addPass(NewGVNPass()); diff --git a/llvm/lib/Passes/PassRegistry.def b/llvm/lib/Passes/PassRegistry.def index 50682ca4970f1..04d01f7d7310e 100644 --- a/llvm/lib/Passes/PassRegistry.def +++ b/llvm/lib/Passes/PassRegistry.def @@ -385,6 +385,7 @@ FUNCTION_PASS("lower-allow-check", LowerAllowCheckPass()) FUNCTION_PASS("lower-atomic", LowerAtomicPass()) FUNCTION_PASS("lower-constant-intrinsics", LowerConstantIntrinsicsPass()) FUNCTION_PASS("lower-expect", LowerExpectIntrinsicPass()) +FUNCTION_PASS("lower-conditional-store", LowerConditionalStoreIntrinsicPass()) FUNCTION_PASS("lower-guard-intrinsic", LowerGuardIntrinsicPass()) FUNCTION_PASS("lower-invoke", LowerInvokePass()) FUNCTION_PASS("lower-switch", LowerSwitchPass()) diff --git a/llvm/lib/Target/AArch64/AArch64TargetMachine.cpp b/llvm/lib/Target/AArch64/AArch64TargetMachine.cpp index 945ab5cf1f303..db8eb7a952335 100644 --- a/llvm/lib/Target/AArch64/AArch64TargetMachine.cpp +++ b/llvm/lib/Target/AArch64/AArch64TargetMachine.cpp @@ -619,6 +619,9 @@ void AArch64PassConfig::addIRPasses() { // Do loop invariant code motion in case part of the lowered result is // invariant. addPass(createLICMPass()); + // This pass expands conditional store intrinsics, + // which are not supported in the target + addPass(createLowerConditionalStoreIntrinsicPass()); } TargetPassConfig::addIRPasses(); diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp index dbbfe34a63863..a865c81d430e4 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp @@ -1046,8 +1046,12 @@ void AMDGPUPassConfig::addIRPasses() { // Try to hoist loop invariant parts of divisions AMDGPUCodeGenPrepare may // have expanded. - if (TM.getOptLevel() > CodeGenOptLevel::Less) + if (TM.getOptLevel() > CodeGenOptLevel::Less) { addPass(createLICMPass()); + // This pass expands conditional store intrinsics, + // which are not supported in the target + addPass(createLowerConditionalStoreIntrinsicPass()); + } } TargetPassConfig::addIRPasses(); diff --git a/llvm/lib/Target/PowerPC/PPCTargetMachine.cpp b/llvm/lib/Target/PowerPC/PPCTargetMachine.cpp index 714cf69827a1e..3650f4c2121e6 100644 --- a/llvm/lib/Target/PowerPC/PPCTargetMachine.cpp +++ b/llvm/lib/Target/PowerPC/PPCTargetMachine.cpp @@ -485,6 +485,9 @@ void PPCPassConfig::addIRPasses() { // Do loop invariant code motion in case part of the lowered result is // invariant. 
addPass(createLICMPass()); + // This pass expands conditional store intrinsics, + // which are not supported in the target + addPass(createLowerConditionalStoreIntrinsicPass()); } TargetPassConfig::addIRPasses(); diff --git a/llvm/lib/Transforms/Scalar/CMakeLists.txt b/llvm/lib/Transforms/Scalar/CMakeLists.txt index ba09ebf8b04c4..bdecc839729c9 100644 --- a/llvm/lib/Transforms/Scalar/CMakeLists.txt +++ b/llvm/lib/Transforms/Scalar/CMakeLists.txt @@ -48,6 +48,7 @@ add_llvm_component_library(LLVMScalarOpts LoopUnrollAndJamPass.cpp LoopVersioningLICM.cpp LowerAtomicPass.cpp + LowerConditionalStoreIntrinsic.cpp LowerConstantIntrinsics.cpp LowerExpectIntrinsic.cpp LowerGuardIntrinsic.cpp diff --git a/llvm/lib/Transforms/Scalar/LICM.cpp b/llvm/lib/Transforms/Scalar/LICM.cpp index 5eccf7b4adb65..009b5f3bb7bcc 100644 --- a/llvm/lib/Transforms/Scalar/LICM.cpp +++ b/llvm/lib/Transforms/Scalar/LICM.cpp @@ -157,6 +157,11 @@ cl::opt<unsigned> llvm::SetLicmMssaOptCap( cl::desc("Enable imprecision in LICM in pathological cases, in exchange " "for faster compile. Caps the MemorySSA clobbering calls.")); +cl::opt<bool> SetLicmConditionalAccessPromotion( + "licm-conditional-access-promotion", cl::Hidden, cl::init(true), + cl::desc("Enable promotion of conditional accesses of loop-invariant" + " locations")); + // Experimentally, memory promotion carries less importance than sinking and // hoisting. Limit when we do promotion when using MemorySSA, in order to save // compile time. @@ -1819,6 +1824,12 @@ class LoopPromoter : public LoadAndStorePromoter { AAMDNodes AATags; ICFLoopSafetyInfo &SafetyInfo; bool CanInsertStoresInExitBlocks; + // This flag will be used to make sure that every sinken, conditional store + // instruction is executed conditionally within the exit blocks. In the + // preheader, it is initialized to 0. In every basic block containing a + // conditional store it is raised. + bool ConditionalAccessShouldBePromoted; + SSAUpdater &FlagSSAUpdater; ArrayRef<const Instruction *> Uses; // We're about to add a use of V in a loop exit block. 
Insert an LCSSA phi @@ -1839,6 +1850,17 @@ class LoopPromoter : public LoadAndStorePromoter { return PN; } + void promoteConditionalAccess(BasicBlock *ExitBlock, Value *LiveInValue, + Value *PtrToExitBB, + BasicBlock::iterator InsertPos) { + Value *FlagValue = FlagSSAUpdater.GetValueInMiddleOfBlock(ExitBlock); + IRBuilder<> Builder(&*InsertPos); + Type *DataType = LiveInValue->getType(); + Value *Ptr = Builder.CreatePointerCast(PtrToExitBB, + PointerType::getUnqual(DataType)); + Builder.CreateConditionalStore(LiveInValue, Ptr, Alignment, FlagValue); + } + public: LoopPromoter(Value *SP, ArrayRef<const Instruction *> Insts, SSAUpdater &S, SmallVectorImpl<BasicBlock *> &LEB, @@ -1846,13 +1868,17 @@ class LoopPromoter : public LoadAndStorePromoter { SmallVectorImpl<MemoryAccess *> &MSSAIP, PredIteratorCache &PIC, MemorySSAUpdater &MSSAU, LoopInfo &li, DebugLoc dl, Align Alignment, bool UnorderedAtomic, const AAMDNodes &AATags, - ICFLoopSafetyInfo &SafetyInfo, bool CanInsertStoresInExitBlocks) + ICFLoopSafetyInfo &SafetyInfo, bool CanInsertStoresInExitBlocks, + bool ConditionalAccessShouldBePromoted, + SSAUpdater &FlagSSAUpdater) : LoadAndStorePromoter(Insts, S), SomePtr(SP), LoopExitBlocks(LEB), LoopInsertPts(LIP), MSSAInsertPts(MSSAIP), PredCache(PIC), MSSAU(MSSAU), LI(li), DL(std::move(dl)), Alignment(Alignment), UnorderedAtomic(UnorderedAtomic), AATags(AATags), SafetyInfo(SafetyInfo), - CanInsertStoresInExitBlocks(CanInsertStoresInExitBlocks), Uses(Insts) {} + CanInsertStoresInExitBlocks(CanInsertStoresInExitBlocks), + ConditionalAccessShouldBePromoted(ConditionalAccessShouldBePromoted), + FlagSSAUpdater(FlagSSAUpdater), Uses(Insts) {} void insertStoresInLoopExitBlocks() { // Insert stores after in the loop exit blocks. Each exit block gets a @@ -1866,6 +1892,10 @@ class LoopPromoter : public LoadAndStorePromoter { LiveInValue = maybeInsertLCSSAPHI(LiveInValue, ExitBlock); Value *Ptr = maybeInsertLCSSAPHI(SomePtr, ExitBlock); BasicBlock::iterator InsertPos = LoopInsertPts[i]; + if (ConditionalAccessShouldBePromoted) { + promoteConditionalAccess(ExitBlock, LiveInValue, Ptr, InsertPos); + continue; + } StoreInst *NewSI = new StoreInst(LiveInValue, Ptr, InsertPos); if (UnorderedAtomic) NewSI->setOrdering(AtomicOrdering::Unordered); @@ -2042,6 +2072,9 @@ bool llvm::promoteLoopAccessesToScalars( bool SawNotAtomic = false; AAMDNodes AATags; + bool SawConditionalLIStore = false; + StringRef PointerOperandName; + const DataLayout &MDL = Preheader->getModule()->getDataLayout(); // If there are reads outside the promoted set, then promoting stores is @@ -2119,6 +2152,12 @@ bool llvm::promoteLoopAccessesToScalars( if (StoreSafety == StoreSafetyUnknown) StoreSafety = StoreSafe; Alignment = std::max(Alignment, InstAlignment); + } else if (SetLicmConditionalAccessPromotion && + (!SawConditionalLIStore || (InstAlignment > Alignment))) { + SawConditionalLIStore = true; + if (PointerOperandName.empty()) + PointerOperandName = Store->getPointerOperand()->getName(); + Alignment = std::max(Alignment, InstAlignment); } // If a store dominates all exit blocks, it is safe to sink. @@ -2199,6 +2238,29 @@ bool llvm::promoteLoopAccessesToScalars( // If we cannot hoist the load either, give up. 
return false; + const bool PromoteConditionalAccesses = + SetLicmConditionalAccessPromotion && SawConditionalLIStore; + bool ConditionalAccessShouldBePromoted = false; + SmallVector<PHINode *, 16> FlagPHIs; + SSAUpdater FlagSSAUpdater(&FlagPHIs); + if (StoreSafety == StoreSafetyUnknown && PromoteConditionalAccesses) { + ConditionalAccessShouldBePromoted = true; + // If we are allowed to promote conditional stores, store promotion is safe + StoreSafety = StoreSafe; + Type *Int1Ty = Type::getInt1Ty(Preheader->getParent()->getContext()); + FlagSSAUpdater.Initialize(Int1Ty, PointerOperandName.str() + ".flag"); + // Initialize the flag with 0 in the preheader. + FlagSSAUpdater.AddAvailableValue(Preheader, + ConstantInt::get(Int1Ty, + /* Value */ 0)); + for (auto *UI : LoopUses) + if (StoreInst *ConditionalLIStore = dyn_cast<StoreInst>(UI)) + // Raise the flag if a conditional store happened. + FlagSSAUpdater.AddAvailableValue(ConditionalLIStore->getParent(), + ConstantInt::get(Int1Ty, + /* Value */ 1)); + } + // Lets do the promotion! if (StoreSafety == StoreSafe) { LLVM_DEBUG(dbgs() << "LICM: Promoting load/store of the value: " << *SomePtr @@ -2228,7 +2290,8 @@ bool llvm::promoteLoopAccessesToScalars( LoopPromoter Promoter(SomePtr, LoopUses, SSA, ExitBlocks, InsertPts, MSSAInsertPts, PIC, MSSAU, *LI,... [truncated] |
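To make the intent of the LICM change easier to see than the truncated diff allows, below is a minimal hand-written LLVM IR sketch of what promoting the conditional store to `res` from the C example above could look like. It is not taken from the patch or its tests; the block and value names, the exact loop shape, and the intrinsic's mangled suffix (`.i32.p0`) are assumptions. The promoted value and an `i1` flag travel through the loop in registers: the flag starts as `false` in the preheader, is set to `true` in the block that originally contained the store, and the exit block writes `res` back through the new intrinsic only when the flag is set.

```llvm
@res = global i32 0

define void @test(ptr noalias %a, i32 %N) {
entry:
  %guard = icmp sgt i32 %N, 0
  br i1 %guard, label %preheader, label %exit

preheader:
  ; the load of res has been hoisted out of the loop
  %res.init = load i32, ptr @res, align 4
  br label %loop

loop:
  %i = phi i32 [ 0, %preheader ], [ %i.next, %latch ]
  %res.cur = phi i32 [ %res.init, %preheader ], [ %res.next, %latch ]
  ; flag: starts false, becomes true in any iteration that would have stored
  %flag = phi i1 [ false, %preheader ], [ %flag.next, %latch ]
  %gep = getelementptr inbounds i32, ptr %a, i32 %i
  %v = load i32, ptr %gep, align 4
  %nonzero = icmp ne i32 %v, 0
  br i1 %nonzero, label %if.then, label %latch

if.then:
  %inc = add nsw i32 %res.cur, 1
  br label %latch

latch:
  %res.next = phi i32 [ %inc, %if.then ], [ %res.cur, %loop ]
  %flag.next = phi i1 [ true, %if.then ], [ %flag, %loop ]
  %i.next = add nuw nsw i32 %i, 1
  %done = icmp eq i32 %i.next, %N
  br i1 %done, label %exit.loopexit, label %loop

exit.loopexit:
  ; write res back only if some iteration actually stored to it
  call void @llvm.conditional.store.i32.p0(i32 %res.next, ptr @res, i32 4, i1 %flag.next)
  br label %exit

exit:
  ret void
}

declare void @llvm.conditional.store.i32.p0(i32, ptr, i32 immarg, i1)
```

A later `lower-conditional-store` pass is then expected to rewrite that call into an ordinary branch plus store (see the sketch at the end of the review thread below).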
llvm/include/llvm/IR/Intrinsics.td Outdated
| DefaultAttrsIntrinsic</*ret_types*/[], | ||
| /*param_types*/[/*Val*/llvm_any_ty, | ||
| /*Ptr*/llvm_anyptr_ty, | ||
| /*Alignment*/llvm_i32_ty, |
The alignment can just come from the argument attribute; it doesn't need to be an explicit parameter
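For what it's worth, a minimal sketch of how the lowering side could read the alignment from a call-site align parameter attribute instead of an explicit i32 operand (the helper name and the ABI-alignment fallback are assumptions, not part of the patch):

#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Instructions.h"
#include "llvm/Support/Alignment.h"

using namespace llvm;

// Hypothetical helper: take the alignment from the parameter attribute on the
// pointer argument (argument 1 of llvm.conditional.store in this patch) and
// fall back to the ABI alignment of the stored value's type when it is absent.
static Align getConditionalStoreAlign(const CallInst &CI, const DataLayout &DL) {
  if (MaybeAlign A = CI.getParamAlign(1))
    return *A;
  return DL.getABITypeAlign(CI.getArgOperand(0)->getType());
}

CreateConditionalStore from this patch would then presumably attach that attribute when it builds the call.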
llvm/include/llvm/IR/Intrinsics.td Outdated
| /*Condition*/llvm_i1_ty], | ||
| /*intr_properties*/[IntrWriteMem, | ||
| IntrArgMemOnly, | ||
| IntrWillReturn, |
I assume IntrWillReturn is implied by DefaultAttrsIntrinsic
| //===-------------------------- Conditional Intrinsics --------------------===// | ||
| // | ||
| | ||
| def int_conditional_store: |
Missing LangRef change
| | ||
| } // namespace llvm | ||
| | ||
| #endif No newline at end of file |
Missing newline at end of file
| | ||
| class Function; | ||
| | ||
| struct LowerConditionalStoreIntrinsicPass |
Does this really need its own dedicated pass? Can it just go in one of the other intrinsic lowering passes?
The main reason the conditional store intrinsic is needed is that it is quite hard to change control flow in the LICM pass, so llvm.conditional.store is intended to be lowered as soon as possible after the LICM pass. Almost all intrinsics live until the backend, so if we lowered llvm.conditional.store there, we would miss potential control-flow optimizations in the middle end.
| CallInst *CI = dyn_cast<CallInst>(&Instr); | ||
| if (!CI) | ||
| return false; | ||
| | ||
| Function *Fn = CI->getCalledFunction(); |
dyn_cast, and don't need to go through getCalledFunction
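Presumably something along these lines, going through IntrinsicInst so that neither getCalledFunction nor a name comparison is needed (a sketch only; Intrinsic::conditional_store is the enum generated from this patch's int_conditional_store definition):

#include "llvm/IR/IntrinsicInst.h"

using namespace llvm;

// Recognize the new intrinsic directly by its intrinsic ID.
static bool isConditionalStore(const Instruction &I) {
  const auto *II = dyn_cast<IntrinsicInst>(&I);
  return II && II->getIntrinsicID() == Intrinsic::conditional_store;
}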
| auto *Ptr = Instr.getOperand(1); | ||
| auto *AlignmentVal = dyn_cast<ConstantInt>(Instr.getOperand(2)); | ||
| auto Alignment = MaybeAlign(AlignmentVal->getValue().getLimitedValue()); | ||
| assert(AlignmentVal && "Invalid intrinsic operands"); |
cast instead of dyn_cast and assert (but this doesn't matter if the align just comes from the attributes)
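If the explicit alignment operand is kept after all, the suggested change amounts to roughly the following (a sketch; cast<> already aborts on a mismatched type in builds with assertions enabled, so the separate assert becomes redundant):

#include "llvm/IR/Constants.h"
#include "llvm/IR/Instructions.h"
#include "llvm/Support/Alignment.h"

using namespace llvm;

// Operand 2 of llvm.conditional.store is an immediate (ImmArg) in this patch,
// so it is guaranteed to be a ConstantInt and a plain cast<> is sufficient.
static Align getExplicitAlign(const CallInst &CI) {
  return Align(cast<ConstantInt>(CI.getArgOperand(2))->getZExtValue());
}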
| ; CHECK: br label %4 | ||
| ; CHECK: 4: | ||
| ; CHECK: ret void | ||
| ; CHECK: } No newline at end of file |
Missing newline
Just some high level comments:
- Is this really a universally profitable transform? This adds extra phi nodes to track the condition, plus a conditional store after the loop. This seems like it would be non-profitable at least in the case where the store was originally inside a cold block in the loop.
- It seems like you are lowering the new conditional.store intrinsic immediately after LICM by scheduling an additional pass (after every single LICM run!). That seems pretty pointless to me. If you are going to do that, you may as well directly update the CFG in LICM. If you're going to introduce that kind of intrinsic, I would expect it to be lowered in the early backend.
- Of course, if you do that, then we also need some middle-end optimization support for the intrinsic.
Mostly though I'm not really convinced that we want to do this transform at all, at least not as an unconditional canonicalization transform.
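For readers trying to picture the trade-off being discussed, here is a rough source-level sketch of what the flag-based promotion in this patch does (C++ with made-up names; the actual transform works on IR, where the flag and the promoted value become phi nodes maintained by SSAUpdater):

int g; // conditionally written, loop-invariant location

// Shape of the loop before promotion.
void before(const int *a, int n) {
  for (int i = 0; i < n; ++i)
    if (a[i] > 0)
      g = a[i]; // conditional store to the invariant location
}

// Rough effect of the promotion: the stored value lives in a scalar, a flag
// records whether any store happened, and a single conditional store (the
// llvm.conditional.store emitted by LICM) runs in the loop exit block.
void after(const int *a, int n) {
  int promoted = g;  // load hoisted to the preheader
  bool flag = false; // flag initialized to 0 in the preheader
  for (int i = 0; i < n; ++i)
    if (a[i] > 0) {
      promoted = a[i];
      flag = true;   // raised in every block that contained a conditional store
    }
  if (flag)          // condition of the sunk store
    g = promoted;
}

The profitability question above is whether carrying this extra state through the loop is worth removing the store from the loop body.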
| High level comment... I generally think that allowing the hoisting of predicated scalar loads and stores outside the loop is a good overall idea, but there's a ton of details for which I don't have a clearly formed opinion. (For anyone reading older reviews, note that I feel I have less clarity on this than I used to...)
One of the major concerns here is that while there are cases where hoisting a predicated store or load out of a loop is hugely profitable, there are also a bunch of cases where it isn't, and may even be strongly negative. My original thought was that we'd treat the predicated form as a canonical form and reverse the transform if needed, but, well, I'm generally less sure that's a good idea today.
Representation wise, I would be tempted to extend our existing masked.load and masked.store intrinsics to allow non-vector types. That seems "cleanest" to me. When I'd glanced at this a while back, I'd tentatively wanted to represent scalars as 1 x Type vectors, but I think I convinced myself that just doesn't work out well.
I think there's a bit of a decision point here. If we want to restrict ourselves to only performing profitable hoists/sinks, then we probably don't want an intrinsic at all and should just have LICM rewrite the CFG. If we think predicated scalar loads and stores are a decent canonical form (in at least some cases), then we need to plumb them through the optimizer and codegen. Lowering them early really feels like a halfway point between the two options above, and honestly, I suspect it is the worst of both worlds.
Another thought - unrelated to the above - is that predicated loads and stores don't have to be conditional in their lowering. If you have a safe memory region, you can do a select of two addresses instead. Depending on the target, this can be better or worse overall, but it might better integrate with e.g. our existing select lowering. One interesting point is that LICM is in a good spot to identify dereferenceable alternate memory locations for loads. Stores are harder since you have to know that no one reads from it - maybe a dummy alloca? (This idea is very much unexplored, and might turn out to be a terrible suggestion - take this as brainstorming, nothing more.) |
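As a small illustration of the select-of-two-addresses idea above (purely a sketch; dummy stands in for the "safe memory region", and for stores it must additionally never be read by anything):

int dummy; // writable scratch location that nothing ever reads

// Branch-free form of a predicated store: always store, but redirect the
// address to the scratch location when the condition is false. Whether this
// beats a branch or a masked store depends on the target.
void predicatedStore(int *p, int value, bool cond) {
  int *dest = cond ? p : &dummy; // the "select of two addresses"
  *dest = value;
}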
The conditional store promotion option is now disabled by default; I had missed this aspect at first.
There are cases where LLVM can further optimize the lowered control flow. For example, this code can be optimized with early lowering: backend-lowering.txt contains the assembly from the initial patch, where the masked store was lowered in the backend. |
llvm/include/llvm/LinkAllPasses.h Outdated
| (void) llvm::createLoopStrengthReducePass(); | ||
| (void) llvm::createLoopUnrollPass(); | ||
| (void) llvm::createLowerConstantIntrinsicsPass(); | ||
| (void)llvm::createLowerConditionalStoreIntrinsicPass(); |
| (void)llvm::createLowerConditionalStoreIntrinsicPass(); | |
| (void) llvm::createLowerConditionalStoreIntrinsicPass(); |
llvm/lib/Transforms/Scalar/LICM.cpp Outdated
| AAMDNodes AATags; | ||
| ICFLoopSafetyInfo &SafetyInfo; | ||
| bool CanInsertStoresInExitBlocks; | ||
| // This flag will be used to make sure that every sinken, conditional store |
| // This flag will be used to make sure that every sinken, conditional store | |
| // This flag will be used to make sure that every sunk conditional store |
| } | ||
| | ||
| static void lowerCondStoreIntr(Instruction &Instr) { | ||
| LLVM_DEBUG(dbgs() << "Found basic with conditional store: " |
Did you mean "Found conditional store intrinsic"?
| | ||
| FunctionPass *llvm::createLowerConditionalStoreIntrinsicPass() { | ||
| return new LowerConditionalStoreIntrinsicLegacy(); | ||
| } No newline at end of file |
Nit: missing empty line
llvm/lib/Passes/PassRegistry.def Outdated
| FUNCTION_PASS("lower-atomic", LowerAtomicPass()) | ||
| FUNCTION_PASS("lower-constant-intrinsics", LowerConstantIntrinsicsPass()) | ||
| FUNCTION_PASS("lower-expect", LowerExpectIntrinsicPass()) | ||
| FUNCTION_PASS("lower-conditional-store", LowerConditionalStoreIntrinsicPass()) |
Should probably be named the same as the DEBUG_TYPE (which is lower-cond-store-intrinsic), or the DEBUG_TYPE should be renamed.
| @preames @nikic Sorry for the delay. I have been fixing some issues with this patch. I am about to add commits that make this optimization correct. (There are currently some troubles with the MemorySSA metadata.)
Probably one of the reasons to use an intrinsic instead of a CFG change in LICM is that a CFG change would require numerous modifications to the LICM code, and I am not sure it would not introduce bugs. A new intrinsic is a good option because it introduces a new way of solving such issues. In my opinion, the combination of conditional and masked stores conveys the programmer's intentions better than a masked store alone, because, for example, using a vector intrinsic in non-vector code might confuse other developers (and, perhaps, some optimizations). The fact that the current intrinsic is lowered immediately after LICM might change in the future, once it is used by other optimizations. |