Skip to content

Conversation

@ergawy
Copy link
Member

@ergawy ergawy commented Jun 27, 2025

@llvmbot llvmbot added flang Flang issues not falling into any other category flang:fir-hlfir labels Jun 27, 2025
@llvmbot
Copy link
Member

llvmbot commented Jun 27, 2025

@llvm/pr-subscribers-flang-fir-hlfir

Author: Kareem Ergawy (ergawy)

Changes

Now that we have changes introduced by #145837, mapping reductions from do concurrent to OpenMP is almost trivial. This PR adds such mapping.

TODO: Add tests


Full diff: https://github.com/llvm/llvm-project/pull/146033.diff

1 Files Affected:

  • (modified) flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp (+56-27)
diff --git a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp index 709cf1d0938fa..31076f6eb328f 100644 --- a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp +++ b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp @@ -312,6 +312,19 @@ class DoConcurrentConversion bool isComposite) const { mlir::omp::WsloopOperands wsloopClauseOps; + auto cloneFIRRegionToOMP = [&rewriter](mlir::Region &firRegion, + mlir::Region &ompRegion) { + if (!firRegion.empty()) { + rewriter.cloneRegionBefore(firRegion, ompRegion, ompRegion.begin()); + auto firYield = + mlir::cast<fir::YieldOp>(ompRegion.back().getTerminator()); + rewriter.setInsertionPoint(firYield); + rewriter.create<mlir::omp::YieldOp>(firYield.getLoc(), + firYield.getOperands()); + rewriter.eraseOp(firYield); + } + }; + // For `local` (and `local_init`) opernads, emit corresponding `private` // clauses and attach these clauses to the workshare loop. if (!loop.getLocalVars().empty()) @@ -326,50 +339,65 @@ class DoConcurrentConversion TODO(localizer.getLoc(), "local_init conversion is not supported yet"); - auto oldIP = rewriter.saveInsertionPoint(); + mlir::OpBuilder::InsertionGuard guard(rewriter); rewriter.setInsertionPointAfter(localizer); + auto privatizer = rewriter.create<mlir::omp::PrivateClauseOp>( localizer.getLoc(), sym.getLeafReference().str() + ".omp", localizer.getTypeAttr().getValue(), mlir::omp::DataSharingClauseType::Private); - if (!localizer.getInitRegion().empty()) { - rewriter.cloneRegionBefore(localizer.getInitRegion(), - privatizer.getInitRegion(), - privatizer.getInitRegion().begin()); - auto firYield = mlir::cast<fir::YieldOp>( - privatizer.getInitRegion().back().getTerminator()); - rewriter.setInsertionPoint(firYield); - rewriter.create<mlir::omp::YieldOp>(firYield.getLoc(), - firYield.getOperands()); - rewriter.eraseOp(firYield); - } - - if (!localizer.getDeallocRegion().empty()) { - rewriter.cloneRegionBefore(localizer.getDeallocRegion(), - privatizer.getDeallocRegion(), - privatizer.getDeallocRegion().begin()); - auto firYield = mlir::cast<fir::YieldOp>( - privatizer.getDeallocRegion().back().getTerminator()); - rewriter.setInsertionPoint(firYield); - rewriter.create<mlir::omp::YieldOp>(firYield.getLoc(), - firYield.getOperands()); - rewriter.eraseOp(firYield); - } - - rewriter.restoreInsertionPoint(oldIP); + cloneFIRRegionToOMP(localizer.getInitRegion(), + privatizer.getInitRegion()); + cloneFIRRegionToOMP(localizer.getDeallocRegion(), + privatizer.getDeallocRegion()); wsloopClauseOps.privateVars.push_back(op); wsloopClauseOps.privateSyms.push_back( mlir::SymbolRefAttr::get(privatizer)); } + if (!loop.getReduceVars().empty()) { + for (auto [op, byRef, sym, arg] : llvm::zip_equal( + loop.getReduceVars(), loop.getReduceByrefAttr().asArrayRef(), + loop.getReduceSymsAttr().getAsRange<mlir::SymbolRefAttr>(), + loop.getRegionReduceArgs())) { + auto firReducer = + mlir::SymbolTable::lookupNearestSymbolFrom<fir::DeclareReductionOp>( + loop, sym); + + mlir::OpBuilder::InsertionGuard guard(rewriter); + rewriter.setInsertionPointAfter(firReducer); + + auto ompReducer = rewriter.create<mlir::omp::DeclareReductionOp>( + firReducer.getLoc(), sym.getLeafReference().str() + ".omp", + firReducer.getTypeAttr().getValue()); + + cloneFIRRegionToOMP(firReducer.getAllocRegion(), + ompReducer.getAllocRegion()); + cloneFIRRegionToOMP(firReducer.getInitializerRegion(), + ompReducer.getInitializerRegion()); + cloneFIRRegionToOMP(firReducer.getReductionRegion(), + ompReducer.getReductionRegion()); + cloneFIRRegionToOMP(firReducer.getAtomicReductionRegion(), + ompReducer.getAtomicReductionRegion()); + cloneFIRRegionToOMP(firReducer.getCleanupRegion(), + ompReducer.getCleanupRegion()); + + wsloopClauseOps.reductionVars.push_back(op); + wsloopClauseOps.reductionByref.push_back(byRef); + wsloopClauseOps.reductionSyms.push_back( + mlir::SymbolRefAttr::get(ompReducer)); + } + } + auto wsloopOp = rewriter.create<mlir::omp::WsloopOp>(loop.getLoc(), wsloopClauseOps); wsloopOp.setComposite(isComposite); Fortran::common::openmp::EntryBlockArgs wsloopArgs; wsloopArgs.priv.vars = wsloopClauseOps.privateVars; + wsloopArgs.reduction.vars = wsloopClauseOps.reductionVars; Fortran::common::openmp::genEntryBlock(rewriter, wsloopArgs, wsloopOp.getRegion()); @@ -393,7 +421,8 @@ class DoConcurrentConversion clauseOps.loopLowerBounds.size()))) rewriter.replaceAllUsesWith(loopNestArg, wsloopArg); - for (unsigned i = 0; i < loop.getLocalVars().size(); ++i) + for (unsigned i = 0; + i < loop.getLocalVars().size() + loop.getReduceVars().size(); ++i) loopNestOp.getRegion().eraseArgument(clauseOps.loopLowerBounds.size()); return loopNestOp; 
Copy link
Contributor

@tblah tblah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but please could you add a test that covers every region.

@ergawy ergawy force-pushed the users/ergawy/convert_locality_specs_to_clauses_7 branch from 5f665c9 to 105ac7a Compare June 30, 2025 04:19
@ergawy ergawy force-pushed the users/ergawy/convert_locality_specs_to_clauses_8 branch from c99a502 to 34a5df1 Compare June 30, 2025 04:23
@ergawy ergawy force-pushed the users/ergawy/convert_locality_specs_to_clauses_7 branch from 105ac7a to 3e2bdf5 Compare June 30, 2025 04:40
@ergawy ergawy force-pushed the users/ergawy/convert_locality_specs_to_clauses_8 branch from 34a5df1 to 185b43f Compare June 30, 2025 05:28
@ergawy
Copy link
Member Author

ergawy commented Jun 30, 2025

LGTM but please could you add a test that covers every region.

Thanks for the review. Done.

Copy link
Contributor

@mjklemm mjklemm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ergawy ergawy force-pushed the users/ergawy/convert_locality_specs_to_clauses_7 branch from 3e2bdf5 to 15ed831 Compare July 2, 2025 15:42
@ergawy ergawy force-pushed the users/ergawy/convert_locality_specs_to_clauses_8 branch from 185b43f to 8c25fe7 Compare July 2, 2025 15:47
Copy link
Contributor

@clementval clementval left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ergawy ergawy force-pushed the users/ergawy/convert_locality_specs_to_clauses_7 branch from 15ed831 to 53f9c0d Compare July 9, 2025 07:18
@ergawy ergawy force-pushed the users/ergawy/convert_locality_specs_to_clauses_8 branch from 8c25fe7 to 39e6ed7 Compare July 9, 2025 07:18
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Jul 10, 2025
…lled in OpenMP and OpenACC This PR proposes re-modelling `reduce` specifiers to match OpenMP and OpenACC. In particular, this PR includes the following: * A new `fir` op: `fir.delcare_reduction` which is identical to OpenMP's `omp.declare_reduction` op. * Updating the `reduce` clause on `fir.do_concurrent.loop` to use the new op. * Re-uses the `ReductionProcessor` component to emit reductions for `do concurrent` just like we do for OpenMP. To do this, the `ReductionProcessor` had to be refactored to be more generalized. * Upates mapping `do concurrent` to `fir.loop ... unordered` nests using the new reduction model. Unfortunately, this is a big PR that would be difficult to divide up in smaller parts because the bottom of the changes are the `fir` table-gen changes to `do concurrent`. However, doing these MLIR changes cascades to the other parts that have to be modified to not break things. This PR goes in the same direction we went for `private/local` speicifiers. Now the `do concurrent` and OpenMP (and OpenACC) dialects are modelled in essentially the same way which makes mapping between them more trivial, hopefully. PR stack: - llvm#145837 (this one) - llvm#146025 - llvm#146028 - llvm#146033
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Jul 10, 2025
With llvm#145837, the `ReductionProcessor` component is now used by both OpenMP and `do concurrent`. Therefore, this PR moves it to a shared location: `flang/Lower/Support`. PR stack: - llvm#145837 - llvm#146025 (this one) - llvm#146028 - llvm#146033
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Jul 10, 2025
Re-organizes the op definition a little bit and removes a method that does not add much value to the API. PR stack: - llvm#145837 - llvm#146025 - llvm#146028 (this one) - llvm#146033
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Jul 10, 2025
Now that we have changes introduced by llvm#145837, mapping reductions from `do concurrent` to OpenMP is almost trivial. This PR adds such mapping. PR stack: - llvm#145837 - llvm#146025 - llvm#146028 - llvm#146033 (this one)
ergawy added a commit that referenced this pull request Jul 11, 2025
…lled in OpenMP and OpenACC (#145837) This PR proposes re-modelling `reduce` specifiers to match OpenMP and OpenACC. In particular, this PR includes the following: * A new `fir` op: `fir.delcare_reduction` which is identical to OpenMP's `omp.declare_reduction` op. * Updating the `reduce` clause on `fir.do_concurrent.loop` to use the new op. * Re-uses the `ReductionProcessor` component to emit reductions for `do concurrent` just like we do for OpenMP. To do this, the `ReductionProcessor` had to be refactored to be more generalized. * Upates mapping `do concurrent` to `fir.loop ... unordered` nests using the new reduction model. Unfortunately, this is a big PR that would be difficult to divide up in smaller parts because the bottom of the changes are the `fir` table-gen changes to `do concurrent`. However, doing these MLIR changes cascades to the other parts that have to be modified to not break things. This PR goes in the same direction we went for `private/local` speicifiers. Now the `do concurrent` and OpenMP (and OpenACC) dialects are modelled in essentially the same way which makes mapping between them more trivial, hopefully. PR stack: - #145837 (this one) - #146025 - #146028 - #146033
@ergawy ergawy force-pushed the users/ergawy/convert_locality_specs_to_clauses_7 branch from 53f9c0d to 8e67de7 Compare July 11, 2025 04:41
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Jul 11, 2025
…ns are modelled in OpenMP and OpenACC (#145837) This PR proposes re-modelling `reduce` specifiers to match OpenMP and OpenACC. In particular, this PR includes the following: * A new `fir` op: `fir.delcare_reduction` which is identical to OpenMP's `omp.declare_reduction` op. * Updating the `reduce` clause on `fir.do_concurrent.loop` to use the new op. * Re-uses the `ReductionProcessor` component to emit reductions for `do concurrent` just like we do for OpenMP. To do this, the `ReductionProcessor` had to be refactored to be more generalized. * Upates mapping `do concurrent` to `fir.loop ... unordered` nests using the new reduction model. Unfortunately, this is a big PR that would be difficult to divide up in smaller parts because the bottom of the changes are the `fir` table-gen changes to `do concurrent`. However, doing these MLIR changes cascades to the other parts that have to be modified to not break things. This PR goes in the same direction we went for `private/local` speicifiers. Now the `do concurrent` and OpenMP (and OpenACC) dialects are modelled in essentially the same way which makes mapping between them more trivial, hopefully. PR stack: - llvm/llvm-project#145837 (this one) - llvm/llvm-project#146025 - llvm/llvm-project#146028 - llvm/llvm-project#146033
ergawy added a commit that referenced this pull request Jul 11, 2025
With #145837, the `ReductionProcessor` component is now used by both OpenMP and `do concurrent`. Therefore, this PR moves it to a shared location: `flang/Lower/Support`. PR stack: - #145837 - #146025 (this one) - #146028 - #146033
@ergawy ergawy force-pushed the users/ergawy/convert_locality_specs_to_clauses_7 branch from 8e67de7 to df63fea Compare July 11, 2025 05:45
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Jul 11, 2025
… (#146025) With #145837, the `ReductionProcessor` component is now used by both OpenMP and `do concurrent`. Therefore, this PR moves it to a shared location: `flang/Lower/Support`. PR stack: - llvm/llvm-project#145837 - llvm/llvm-project#146025 (this one) - llvm/llvm-project#146028 - llvm/llvm-project#146033
ergawy added a commit that referenced this pull request Jul 11, 2025
…146028) Re-organizes the op definition a little bit and removes a method that does not add much value to the API. PR stack: - #145837 - #146025 - #146028 (this one) - #146033
Base automatically changed from users/ergawy/convert_locality_specs_to_clauses_7 to main July 11, 2025 06:30
ergawy added 2 commits July 11, 2025 01:31
Now that we have changes introduced by #145837, mapping reductions from `do concurrent` to OpenMP is almost trivial. This PR adds such mapping.
@ergawy ergawy force-pushed the users/ergawy/convert_locality_specs_to_clauses_8 branch from 39e6ed7 to c1834bb Compare July 11, 2025 06:31
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Jul 11, 2025
…defintion (#146028) Re-organizes the op definition a little bit and removes a method that does not add much value to the API. PR stack: - llvm/llvm-project#145837 - llvm/llvm-project#146025 - llvm/llvm-project#146028 (this one) - llvm/llvm-project#146033
@ergawy ergawy merged commit 0e9b7b0 into main Jul 11, 2025
9 checks passed
@ergawy ergawy deleted the users/ergawy/convert_locality_specs_to_clauses_8 branch July 11, 2025 07:19
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Jul 11, 2025
…` to OpenMP (#146033) Now that we have changes introduced by #145837, mapping reductions from `do concurrent` to OpenMP is almost trivial. This PR adds such mapping. PR stack: - llvm/llvm-project#145837 - llvm/llvm-project#146025 - llvm/llvm-project#146028 - llvm/llvm-project#146033 (this one)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

flang:fir-hlfir flang Flang issues not falling into any other category

6 participants