- Notifications
You must be signed in to change notification settings - Fork 15.3k
[flang][OpenMP] Basic mapping of do concurrent ... reduce to OpenMP #146033
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[flang][OpenMP] Basic mapping of do concurrent ... reduce to OpenMP #146033
Conversation
| @llvm/pr-subscribers-flang-fir-hlfir Author: Kareem Ergawy (ergawy) ChangesNow that we have changes introduced by #145837, mapping reductions from TODO: Add tests Full diff: https://github.com/llvm/llvm-project/pull/146033.diff 1 Files Affected:
diff --git a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp index 709cf1d0938fa..31076f6eb328f 100644 --- a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp +++ b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp @@ -312,6 +312,19 @@ class DoConcurrentConversion bool isComposite) const { mlir::omp::WsloopOperands wsloopClauseOps; + auto cloneFIRRegionToOMP = [&rewriter](mlir::Region &firRegion, + mlir::Region &ompRegion) { + if (!firRegion.empty()) { + rewriter.cloneRegionBefore(firRegion, ompRegion, ompRegion.begin()); + auto firYield = + mlir::cast<fir::YieldOp>(ompRegion.back().getTerminator()); + rewriter.setInsertionPoint(firYield); + rewriter.create<mlir::omp::YieldOp>(firYield.getLoc(), + firYield.getOperands()); + rewriter.eraseOp(firYield); + } + }; + // For `local` (and `local_init`) opernads, emit corresponding `private` // clauses and attach these clauses to the workshare loop. if (!loop.getLocalVars().empty()) @@ -326,50 +339,65 @@ class DoConcurrentConversion TODO(localizer.getLoc(), "local_init conversion is not supported yet"); - auto oldIP = rewriter.saveInsertionPoint(); + mlir::OpBuilder::InsertionGuard guard(rewriter); rewriter.setInsertionPointAfter(localizer); + auto privatizer = rewriter.create<mlir::omp::PrivateClauseOp>( localizer.getLoc(), sym.getLeafReference().str() + ".omp", localizer.getTypeAttr().getValue(), mlir::omp::DataSharingClauseType::Private); - if (!localizer.getInitRegion().empty()) { - rewriter.cloneRegionBefore(localizer.getInitRegion(), - privatizer.getInitRegion(), - privatizer.getInitRegion().begin()); - auto firYield = mlir::cast<fir::YieldOp>( - privatizer.getInitRegion().back().getTerminator()); - rewriter.setInsertionPoint(firYield); - rewriter.create<mlir::omp::YieldOp>(firYield.getLoc(), - firYield.getOperands()); - rewriter.eraseOp(firYield); - } - - if (!localizer.getDeallocRegion().empty()) { - rewriter.cloneRegionBefore(localizer.getDeallocRegion(), - privatizer.getDeallocRegion(), - privatizer.getDeallocRegion().begin()); - auto firYield = mlir::cast<fir::YieldOp>( - privatizer.getDeallocRegion().back().getTerminator()); - rewriter.setInsertionPoint(firYield); - rewriter.create<mlir::omp::YieldOp>(firYield.getLoc(), - firYield.getOperands()); - rewriter.eraseOp(firYield); - } - - rewriter.restoreInsertionPoint(oldIP); + cloneFIRRegionToOMP(localizer.getInitRegion(), + privatizer.getInitRegion()); + cloneFIRRegionToOMP(localizer.getDeallocRegion(), + privatizer.getDeallocRegion()); wsloopClauseOps.privateVars.push_back(op); wsloopClauseOps.privateSyms.push_back( mlir::SymbolRefAttr::get(privatizer)); } + if (!loop.getReduceVars().empty()) { + for (auto [op, byRef, sym, arg] : llvm::zip_equal( + loop.getReduceVars(), loop.getReduceByrefAttr().asArrayRef(), + loop.getReduceSymsAttr().getAsRange<mlir::SymbolRefAttr>(), + loop.getRegionReduceArgs())) { + auto firReducer = + mlir::SymbolTable::lookupNearestSymbolFrom<fir::DeclareReductionOp>( + loop, sym); + + mlir::OpBuilder::InsertionGuard guard(rewriter); + rewriter.setInsertionPointAfter(firReducer); + + auto ompReducer = rewriter.create<mlir::omp::DeclareReductionOp>( + firReducer.getLoc(), sym.getLeafReference().str() + ".omp", + firReducer.getTypeAttr().getValue()); + + cloneFIRRegionToOMP(firReducer.getAllocRegion(), + ompReducer.getAllocRegion()); + cloneFIRRegionToOMP(firReducer.getInitializerRegion(), + ompReducer.getInitializerRegion()); + cloneFIRRegionToOMP(firReducer.getReductionRegion(), + ompReducer.getReductionRegion()); + cloneFIRRegionToOMP(firReducer.getAtomicReductionRegion(), + ompReducer.getAtomicReductionRegion()); + cloneFIRRegionToOMP(firReducer.getCleanupRegion(), + ompReducer.getCleanupRegion()); + + wsloopClauseOps.reductionVars.push_back(op); + wsloopClauseOps.reductionByref.push_back(byRef); + wsloopClauseOps.reductionSyms.push_back( + mlir::SymbolRefAttr::get(ompReducer)); + } + } + auto wsloopOp = rewriter.create<mlir::omp::WsloopOp>(loop.getLoc(), wsloopClauseOps); wsloopOp.setComposite(isComposite); Fortran::common::openmp::EntryBlockArgs wsloopArgs; wsloopArgs.priv.vars = wsloopClauseOps.privateVars; + wsloopArgs.reduction.vars = wsloopClauseOps.reductionVars; Fortran::common::openmp::genEntryBlock(rewriter, wsloopArgs, wsloopOp.getRegion()); @@ -393,7 +421,8 @@ class DoConcurrentConversion clauseOps.loopLowerBounds.size()))) rewriter.replaceAllUsesWith(loopNestArg, wsloopArg); - for (unsigned i = 0; i < loop.getLocalVars().size(); ++i) + for (unsigned i = 0; + i < loop.getLocalVars().size() + loop.getReduceVars().size(); ++i) loopNestOp.getRegion().eraseArgument(clauseOps.loopLowerBounds.size()); return loopNestOp; |
b99122b to c99a502 Compare
tblah left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM but please could you add a test that covers every region.
5f665c9 to 105ac7a Compare c99a502 to 34a5df1 Compare 105ac7a to 3e2bdf5 Compare 34a5df1 to 185b43f Compare
Thanks for the review. Done. |
mjklemm left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
3e2bdf5 to 15ed831 Compare 185b43f to 8c25fe7 Compare
clementval left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
15ed831 to 53f9c0d Compare 8c25fe7 to 39e6ed7 Compare …lled in OpenMP and OpenACC This PR proposes re-modelling `reduce` specifiers to match OpenMP and OpenACC. In particular, this PR includes the following: * A new `fir` op: `fir.delcare_reduction` which is identical to OpenMP's `omp.declare_reduction` op. * Updating the `reduce` clause on `fir.do_concurrent.loop` to use the new op. * Re-uses the `ReductionProcessor` component to emit reductions for `do concurrent` just like we do for OpenMP. To do this, the `ReductionProcessor` had to be refactored to be more generalized. * Upates mapping `do concurrent` to `fir.loop ... unordered` nests using the new reduction model. Unfortunately, this is a big PR that would be difficult to divide up in smaller parts because the bottom of the changes are the `fir` table-gen changes to `do concurrent`. However, doing these MLIR changes cascades to the other parts that have to be modified to not break things. This PR goes in the same direction we went for `private/local` speicifiers. Now the `do concurrent` and OpenMP (and OpenACC) dialects are modelled in essentially the same way which makes mapping between them more trivial, hopefully. PR stack: - llvm#145837 (this one) - llvm#146025 - llvm#146028 - llvm#146033
With llvm#145837, the `ReductionProcessor` component is now used by both OpenMP and `do concurrent`. Therefore, this PR moves it to a shared location: `flang/Lower/Support`. PR stack: - llvm#145837 - llvm#146025 (this one) - llvm#146028 - llvm#146033
Re-organizes the op definition a little bit and removes a method that does not add much value to the API. PR stack: - llvm#145837 - llvm#146025 - llvm#146028 (this one) - llvm#146033
Now that we have changes introduced by llvm#145837, mapping reductions from `do concurrent` to OpenMP is almost trivial. This PR adds such mapping. PR stack: - llvm#145837 - llvm#146025 - llvm#146028 - llvm#146033 (this one)
…lled in OpenMP and OpenACC (#145837) This PR proposes re-modelling `reduce` specifiers to match OpenMP and OpenACC. In particular, this PR includes the following: * A new `fir` op: `fir.delcare_reduction` which is identical to OpenMP's `omp.declare_reduction` op. * Updating the `reduce` clause on `fir.do_concurrent.loop` to use the new op. * Re-uses the `ReductionProcessor` component to emit reductions for `do concurrent` just like we do for OpenMP. To do this, the `ReductionProcessor` had to be refactored to be more generalized. * Upates mapping `do concurrent` to `fir.loop ... unordered` nests using the new reduction model. Unfortunately, this is a big PR that would be difficult to divide up in smaller parts because the bottom of the changes are the `fir` table-gen changes to `do concurrent`. However, doing these MLIR changes cascades to the other parts that have to be modified to not break things. This PR goes in the same direction we went for `private/local` speicifiers. Now the `do concurrent` and OpenMP (and OpenACC) dialects are modelled in essentially the same way which makes mapping between them more trivial, hopefully. PR stack: - #145837 (this one) - #146025 - #146028 - #146033
53f9c0d to 8e67de7 Compare …ns are modelled in OpenMP and OpenACC (#145837) This PR proposes re-modelling `reduce` specifiers to match OpenMP and OpenACC. In particular, this PR includes the following: * A new `fir` op: `fir.delcare_reduction` which is identical to OpenMP's `omp.declare_reduction` op. * Updating the `reduce` clause on `fir.do_concurrent.loop` to use the new op. * Re-uses the `ReductionProcessor` component to emit reductions for `do concurrent` just like we do for OpenMP. To do this, the `ReductionProcessor` had to be refactored to be more generalized. * Upates mapping `do concurrent` to `fir.loop ... unordered` nests using the new reduction model. Unfortunately, this is a big PR that would be difficult to divide up in smaller parts because the bottom of the changes are the `fir` table-gen changes to `do concurrent`. However, doing these MLIR changes cascades to the other parts that have to be modified to not break things. This PR goes in the same direction we went for `private/local` speicifiers. Now the `do concurrent` and OpenMP (and OpenACC) dialects are modelled in essentially the same way which makes mapping between them more trivial, hopefully. PR stack: - llvm/llvm-project#145837 (this one) - llvm/llvm-project#146025 - llvm/llvm-project#146028 - llvm/llvm-project#146033
8e67de7 to df63fea Compare … (#146025) With #145837, the `ReductionProcessor` component is now used by both OpenMP and `do concurrent`. Therefore, this PR moves it to a shared location: `flang/Lower/Support`. PR stack: - llvm/llvm-project#145837 - llvm/llvm-project#146025 (this one) - llvm/llvm-project#146028 - llvm/llvm-project#146033
Now that we have changes introduced by #145837, mapping reductions from `do concurrent` to OpenMP is almost trivial. This PR adds such mapping.
39e6ed7 to c1834bb Compare …defintion (#146028) Re-organizes the op definition a little bit and removes a method that does not add much value to the API. PR stack: - llvm/llvm-project#145837 - llvm/llvm-project#146025 - llvm/llvm-project#146028 (this one) - llvm/llvm-project#146033
…` to OpenMP (#146033) Now that we have changes introduced by #145837, mapping reductions from `do concurrent` to OpenMP is almost trivial. This PR adds such mapping. PR stack: - llvm/llvm-project#145837 - llvm/llvm-project#146025 - llvm/llvm-project#146028 - llvm/llvm-project#146033 (this one)
Now that we have changes introduced by #145837, mapping reductions from
do concurrentto OpenMP is almost trivial. This PR adds such mapping.PR stack:
reduceto match reductions are modelled in OpenMP and OpenACC #145837ReductionProcessortoLower/Support. #146025fir_DoConcurrentLoopOp's defintion #146028do concurrent ... reduceto OpenMP #146033 (this one)