- Notifications
You must be signed in to change notification settings - Fork 15.3k
[OpenMP][flang] Add initial support for by-ref reductions on the GPU #165714
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| @llvm/pr-subscribers-clang-codegen @llvm/pr-subscribers-flang-fir-hlfir Author: Kareem Ergawy (ergawy) ChangesAdds initial support for GPU by-ref reductions. In particular, this diff adds support for reductions on scalar allocatables where reductions happen on loops nested in integer :: i real, allocatable :: scalar_alloc allocate(scalar_alloc) scalar_alloc = 0 !$omp target map(tofrom: scalar_alloc) !$omp parallel do reduction(+: scalar_alloc) do i = 1, 1000000 scalar_alloc = scalar_alloc + 1 end do !$omp end targetThis PR supports by-ref reductions on the intra- and inter-warp levels. So far, there are still steps to be takens for full support of by-ref reductions, for example:
Patch is 54.34 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/165714.diff 34 Files Affected:
diff --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp index fddeba98adccc..9a8c75073aa4c 100644 --- a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp +++ b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp @@ -1784,8 +1784,8 @@ void CGOpenMPRuntimeGPU::emitReduction( llvm::OpenMPIRBuilder::InsertPointTy AfterIP = cantFail(OMPBuilder.createReductionsGPU( - OmpLoc, AllocaIP, CodeGenIP, ReductionInfos, false, TeamsReduction, - llvm::OpenMPIRBuilder::ReductionGenCBKind::Clang, + OmpLoc, AllocaIP, CodeGenIP, ReductionInfos, {}, false, + TeamsReduction, llvm::OpenMPIRBuilder::ReductionGenCBKind::Clang, CGF.getTarget().getGridValue(), C.getLangOpts().OpenMPCUDAReductionBufNum, RTLoc)); CGF.Builder.restoreIP(AfterIP); diff --git a/flang/include/flang/Optimizer/Dialect/FIROps.td b/flang/include/flang/Optimizer/Dialect/FIROps.td index 58a317cf5d691..ff4dab1136ee9 100644 --- a/flang/include/flang/Optimizer/Dialect/FIROps.td +++ b/flang/include/flang/Optimizer/Dialect/FIROps.td @@ -3743,7 +3743,8 @@ def fir_DeclareReductionOp : fir_Op<"declare_reduction", [IsolatedFromAbove, }]; let arguments = (ins SymbolNameAttr:$sym_name, - TypeAttr:$type); + TypeAttr:$type, + OptionalAttr<TypeAttr>:$byref_element_type); let regions = (region MaxSizedRegion<1>:$allocRegion, AnyRegion:$initializerRegion, diff --git a/flang/lib/Lower/Support/ReductionProcessor.cpp b/flang/lib/Lower/Support/ReductionProcessor.cpp index 605a5b6b20b94..e02cd8fac823b 100644 --- a/flang/lib/Lower/Support/ReductionProcessor.cpp +++ b/flang/lib/Lower/Support/ReductionProcessor.cpp @@ -573,10 +573,15 @@ OpType ReductionProcessor::createDeclareReduction( mlir::OpBuilder modBuilder(module.getBodyRegion()); mlir::Type valTy = fir::unwrapRefType(type); + mlir::TypeAttr boxedTy{}; + if (!isByRef) type = valTy; - decl = OpType::create(modBuilder, loc, reductionOpName, type); + if (isByRef) + boxedTy = mlir::TypeAttr::get(fir::unwrapPassByRefType(valTy)); + + decl = OpType::create(modBuilder, loc, reductionOpName, type, boxedTy); createReductionAllocAndInitRegions(converter, loc, decl, redId, type, isByRef); diff --git a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp index 1229018bd9b3e..11609ea7b6040 100644 --- a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp +++ b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp @@ -851,7 +851,8 @@ class DoConcurrentConversion if (!ompReducer) { ompReducer = mlir::omp::DeclareReductionOp::create( rewriter, firReducer.getLoc(), ompReducerName, - firReducer.getTypeAttr().getValue()); + firReducer.getTypeAttr().getValue(), + firReducer.getByrefElementTypeAttr()); cloneFIRRegionToOMP(rewriter, firReducer.getAllocRegion(), ompReducer.getAllocRegion()); diff --git a/flang/test/Lower/OpenMP/delayed-privatization-reduction-byref.f90 b/flang/test/Lower/OpenMP/delayed-privatization-reduction-byref.f90 index 4b6a643f94059..4c7b6ac5f5f9b 100644 --- a/flang/test/Lower/OpenMP/delayed-privatization-reduction-byref.f90 +++ b/flang/test/Lower/OpenMP/delayed-privatization-reduction-byref.f90 @@ -22,7 +22,7 @@ subroutine red_and_delayed_private ! CHECK-SAME: @[[PRIVATIZER_SYM:.*]] : i32 ! CHECK-LABEL: omp.declare_reduction -! CHECK-SAME: @[[REDUCTION_SYM:.*]] : !fir.ref<i32> alloc +! CHECK-SAME: @[[REDUCTION_SYM:.*]] : !fir.ref<i32> attributes {byref_element_type = i32} alloc ! CHECK-LABEL: _QPred_and_delayed_private ! CHECK: omp.parallel diff --git a/flang/test/Lower/OpenMP/parallel-reduction-allocatable-array.f90 b/flang/test/Lower/OpenMP/parallel-reduction-allocatable-array.f90 index 41c7d69ebb3ba..f56875dcb518b 100644 --- a/flang/test/Lower/OpenMP/parallel-reduction-allocatable-array.f90 +++ b/flang/test/Lower/OpenMP/parallel-reduction-allocatable-array.f90 @@ -18,7 +18,7 @@ program reduce end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_heap_Uxi32 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_heap_Uxi32 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>> attributes {byref_element_type = !fir.array<?xi32>} alloc { ! CHECK: %[[VAL_10:.*]] = fir.alloca !fir.box<!fir.heap<!fir.array<?xi32>>> ! CHECK: omp.yield(%[[VAL_10]] : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/parallel-reduction-array-lb.f90 b/flang/test/Lower/OpenMP/parallel-reduction-array-lb.f90 index aa91e1e0e8b15..d9ba3bed464f8 100644 --- a/flang/test/Lower/OpenMP/parallel-reduction-array-lb.f90 +++ b/flang/test/Lower/OpenMP/parallel-reduction-array-lb.f90 @@ -12,7 +12,7 @@ program reduce end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_3x2xi32 : !fir.ref<!fir.box<!fir.array<3x2xi32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_3x2xi32 : !fir.ref<!fir.box<!fir.array<3x2xi32>>> {{.*}} alloc { ! CHECK: %[[VAL_15:.*]] = fir.alloca !fir.box<!fir.array<3x2xi32>> ! CHECK: omp.yield(%[[VAL_15]] : !fir.ref<!fir.box<!fir.array<3x2xi32>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/parallel-reduction-array.f90 b/flang/test/Lower/OpenMP/parallel-reduction-array.f90 index 59595de338d50..636660f279e85 100644 --- a/flang/test/Lower/OpenMP/parallel-reduction-array.f90 +++ b/flang/test/Lower/OpenMP/parallel-reduction-array.f90 @@ -17,7 +17,7 @@ program reduce print *,i end program -! CPU-LABEL: omp.declare_reduction @add_reduction_byref_box_3xi32 : !fir.ref<!fir.box<!fir.array<3xi32>>> alloc { +! CPU-LABEL: omp.declare_reduction @add_reduction_byref_box_3xi32 : !fir.ref<!fir.box<!fir.array<3xi32>>> attributes {byref_element_type = !fir.array<3xi32>} alloc { ! CPU: %[[VAL_8:.*]] = fir.alloca !fir.box<!fir.array<3xi32>> ! CPU: omp.yield(%[[VAL_8]] : !fir.ref<!fir.box<!fir.array<3xi32>>>) ! CPU-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/parallel-reduction-array2.f90 b/flang/test/Lower/OpenMP/parallel-reduction-array2.f90 index 14338c6f50817..9cf8a63427ed1 100644 --- a/flang/test/Lower/OpenMP/parallel-reduction-array2.f90 +++ b/flang/test/Lower/OpenMP/parallel-reduction-array2.f90 @@ -13,7 +13,7 @@ program reduce print *,i end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_3xi32 : !fir.ref<!fir.box<!fir.array<3xi32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_3xi32 : !fir.ref<!fir.box<!fir.array<3xi32>>> {{.*}} alloc { ! CHECK: %[[VAL_8:.*]] = fir.alloca !fir.box<!fir.array<3xi32>> ! CHECK: omp.yield(%[[VAL_8]] : !fir.ref<!fir.box<!fir.array<3xi32>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/parallel-reduction-pointer-array.f90 b/flang/test/Lower/OpenMP/parallel-reduction-pointer-array.f90 index 36344458d1cae..3de2ba8f61f8e 100644 --- a/flang/test/Lower/OpenMP/parallel-reduction-pointer-array.f90 +++ b/flang/test/Lower/OpenMP/parallel-reduction-pointer-array.f90 @@ -19,7 +19,7 @@ program reduce end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_ptr_Uxi32 : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_ptr_Uxi32 : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>> attributes {byref_element_type = !fir.array<?xi32>} alloc { ! CHECK: %[[VAL_3:.*]] = fir.alloca !fir.box<!fir.ptr<!fir.array<?xi32>>> ! CHECK: omp.yield(%[[VAL_3]] : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/parallel-reduction3.f90 b/flang/test/Lower/OpenMP/parallel-reduction3.f90 index 9af18378f0ae0..da337378862be 100644 --- a/flang/test/Lower/OpenMP/parallel-reduction3.f90 +++ b/flang/test/Lower/OpenMP/parallel-reduction3.f90 @@ -1,7 +1,7 @@ ! RUN: bbc -emit-hlfir -fopenmp -o - %s 2>&1 | FileCheck %s ! RUN: %flang_fc1 -emit-hlfir -fopenmp -o - %s 2>&1 | FileCheck %s -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxi32 : !fir.ref<!fir.box<!fir.array<?xi32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxi32 : !fir.ref<!fir.box<!fir.array<?xi32>>> {{.*}} alloc { ! CHECK: %[[VAL_8:.*]] = fir.alloca !fir.box<!fir.array<?xi32>> ! CHECK: omp.yield(%[[VAL_8]] : !fir.ref<!fir.box<!fir.array<?xi32>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/reduction-array-intrinsic.f90 b/flang/test/Lower/OpenMP/reduction-array-intrinsic.f90 index 8b94d51f986f5..4a0593ff9eca4 100644 --- a/flang/test/Lower/OpenMP/reduction-array-intrinsic.f90 +++ b/flang/test/Lower/OpenMP/reduction-array-intrinsic.f90 @@ -9,7 +9,7 @@ subroutine max_array_reduction(l, r) !$omp end parallel end subroutine -! CHECK-LABEL: omp.declare_reduction @max_byref_box_Uxi32 : !fir.ref<!fir.box<!fir.array<?xi32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @max_byref_box_Uxi32 : !fir.ref<!fir.box<!fir.array<?xi32>>> {{.*}} alloc { ! CHECK: %[[VAL_3:.*]] = fir.alloca !fir.box<!fir.array<?xi32>> ! CHECK: omp.yield(%[[VAL_3]] : !fir.ref<!fir.box<!fir.array<?xi32>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/sections-array-reduction.f90 b/flang/test/Lower/OpenMP/sections-array-reduction.f90 index 2f2808cebfc0c..0dbe9e3673395 100644 --- a/flang/test/Lower/OpenMP/sections-array-reduction.f90 +++ b/flang/test/Lower/OpenMP/sections-array-reduction.f90 @@ -14,7 +14,7 @@ subroutine sectionsReduction(x) end subroutine -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxf32 : !fir.ref<!fir.box<!fir.array<?xf32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxf32 : !fir.ref<!fir.box<!fir.array<?xf32>>> {{.*}} alloc { ! [...] ! CHECK: omp.yield ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/taskgroup-task-array-reduction.f90 b/flang/test/Lower/OpenMP/taskgroup-task-array-reduction.f90 index 18a4f75b86309..3a63bb09c59de 100644 --- a/flang/test/Lower/OpenMP/taskgroup-task-array-reduction.f90 +++ b/flang/test/Lower/OpenMP/taskgroup-task-array-reduction.f90 @@ -1,7 +1,7 @@ ! RUN: bbc -emit-hlfir -fopenmp -fopenmp-version=50 -o - %s 2>&1 | FileCheck %s ! RUN: %flang_fc1 -emit-hlfir -fopenmp -fopenmp-version=50 -o - %s 2>&1 | FileCheck %s -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxf32 : !fir.ref<!fir.box<!fir.array<?xf32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxf32 : !fir.ref<!fir.box<!fir.array<?xf32>>> {{.*}} alloc { ! [...] ! CHECK: omp.yield ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-allocatable-array-minmax.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-allocatable-array-minmax.f90 index 2cd953de0dffa..ed81577ecce16 100644 --- a/flang/test/Lower/OpenMP/wsloop-reduction-allocatable-array-minmax.f90 +++ b/flang/test/Lower/OpenMP/wsloop-reduction-allocatable-array-minmax.f90 @@ -32,7 +32,7 @@ program reduce15 print *,"min: ", mins end program -! CHECK-LABEL: omp.declare_reduction @min_byref_box_heap_Uxi32 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>> alloc { +! CHECK-LABEL: omp.declare_reduction @min_byref_box_heap_Uxi32 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>> {{.*}} alloc { ! CHECK: %[[VAL_3:.*]] = fir.alloca !fir.box<!fir.heap<!fir.array<?xi32>>> ! CHECK: omp.yield(%[[VAL_3]] : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>) ! CHECK-LABEL: } init { @@ -93,7 +93,7 @@ program reduce15 ! CHECK: omp.yield ! CHECK: } -! CHECK-LABEL: omp.declare_reduction @max_byref_box_heap_Uxi32 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>> alloc { +! CHECK-LABEL: omp.declare_reduction @max_byref_box_heap_Uxi32 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>> {{.*}} alloc { ! CHECK: %[[VAL_3:.*]] = fir.alloca !fir.box<!fir.heap<!fir.array<?xi32>>> ! CHECK: omp.yield(%[[VAL_3]] : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-allocatable.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-allocatable.f90 index 663851cba46c6..d8c0a36db126e 100644 --- a/flang/test/Lower/OpenMP/wsloop-reduction-allocatable.f90 +++ b/flang/test/Lower/OpenMP/wsloop-reduction-allocatable.f90 @@ -18,7 +18,7 @@ program reduce end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_heap_i32 : !fir.ref<!fir.box<!fir.heap<i32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_heap_i32 : !fir.ref<!fir.box<!fir.heap<i32>>> attributes {byref_element_type = i32} alloc { ! CHECK: %[[VAL_2:.*]] = fir.alloca !fir.box<!fir.heap<i32>> ! CHECK: omp.yield(%[[VAL_2]] : !fir.ref<!fir.box<!fir.heap<i32>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-array-assumed-shape.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-array-assumed-shape.f90 index 209ee9a4e0cef..28acb8f19531f 100644 --- a/flang/test/Lower/OpenMP/wsloop-reduction-array-assumed-shape.f90 +++ b/flang/test/Lower/OpenMP/wsloop-reduction-array-assumed-shape.f90 @@ -22,7 +22,7 @@ subroutine reduce(r) end subroutine end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxf64 : !fir.ref<!fir.box<!fir.array<?xf64>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxf64 : !fir.ref<!fir.box<!fir.array<?xf64>>> {{.*}} alloc { ! CHECK: %[[VAL_8:.*]] = fir.alloca !fir.box<!fir.array<?xf64>> ! CHECK: omp.yield(%[[VAL_8]] : !fir.ref<!fir.box<!fir.array<?xf64>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-array-lb.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-array-lb.f90 index 2233a74600948..ec448cf20f111 100644 --- a/flang/test/Lower/OpenMP/wsloop-reduction-array-lb.f90 +++ b/flang/test/Lower/OpenMP/wsloop-reduction-array-lb.f90 @@ -11,7 +11,7 @@ program reduce !$omp end parallel do end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_2xi32 : !fir.ref<!fir.box<!fir.array<2xi32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_2xi32 : !fir.ref<!fir.box<!fir.array<2xi32>>> {{.*}} alloc { ! CHECK: } combiner { ! CHECK: ^bb0(%[[ARG0:.*]]: !fir.ref<!fir.box<!fir.array<2xi32>>>, %[[ARG1:.*]]: !fir.ref<!fir.box<!fir.array<2xi32>>>): ! CHECK: %[[ARR0:.*]] = fir.load %[[ARG0]] : !fir.ref<!fir.box<!fir.array<2xi32>>> diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-array-lb2.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-array-lb2.f90 index 211bde19da8db..9da05a290ec21 100644 --- a/flang/test/Lower/OpenMP/wsloop-reduction-array-lb2.f90 +++ b/flang/test/Lower/OpenMP/wsloop-reduction-array-lb2.f90 @@ -19,7 +19,7 @@ subroutine sub(a, lb, ub) end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxi32 : !fir.ref<!fir.box<!fir.array<?xi32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxi32 : !fir.ref<!fir.box<!fir.array<?xi32>>> {{.*}} alloc { ! CHECK: } combiner { ! CHECK: ^bb0(%[[ARG0:.*]]: !fir.ref<!fir.box<!fir.array<?xi32>>>, %[[ARG1:.*]]: !fir.ref<!fir.box<!fir.array<?xi32>>>): ! CHECK: %[[ARR0:.*]] = fir.load %[[ARG0]] : !fir.ref<!fir.box<!fir.array<?xi32>>> diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-array.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-array.f90 index afaeba27c5eae..14b657c8e180d 100644 --- a/flang/test/Lower/OpenMP/wsloop-reduction-array.f90 +++ b/flang/test/Lower/OpenMP/wsloop-reduction-array.f90 @@ -14,7 +14,7 @@ program reduce print *,r end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_2xi32 : !fir.ref<!fir.box<!fir.array<2xi32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_2xi32 : !fir.ref<!fir.box<!fir.array<2xi32>>> attributes {byref_element_type = !fir.array<2xi32>} alloc { ! CHECK: %[[VAL_8:.*]] = fir.alloca !fir.box<!fir.array<2xi32>> ! CHECK: omp.yield(%[[VAL_8]] : !fir.ref<!fir.box<!fir.array<2xi32>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-array2.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-array2.f90 index 25b2e97a1b7f7..d0a0c38e4ccb1 100644 --- a/flang/test/Lower/OpenMP/wsloop-reduction-array2.f90 +++ b/flang/test/Lower/OpenMP/wsloop-reduction-array2.f90 @@ -14,7 +14,7 @@ program reduce print *,r end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_2xi32 : !fir.ref<!fir.box<!fir.array<2xi32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_2xi32 : !fir.ref<!fir.box<!fir.array<2xi32>>> {{.*}} alloc { ! CHECK: %[[VAL_8:.*]] = fir.alloca !fir.box<!fir.array<2xi32>> ! CHECK: omp.yield(%[[VAL_8]] : !fir.ref<!fir.box<!fir.array<2xi32>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-multiple-clauses.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-multiple-clauses.f90 index edd2bcb1d6be8..60a162d8f8002 100644 --- a/flang/test/Lower/OpenMP/wsloop-reduction-multiple-clauses.f90 +++ b/flang/test/Lower/OpenMP/wsloop-reduction-multiple-clauses.f90 @@ -24,7 +24,7 @@ program main endprogram -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_3x3xf64 : !fir.ref<!fir.box<!fir.array<3x3xf64>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_3x3xf64 : !fir.ref<!fir.box<!fir.array<3x3xf64>>> {{.*}} alloc { ! CHECK: %[[VAL_3:.*]] = fir.alloca !fir.box<!fir.array<3x3xf64>> ! CHECK: omp.yield(%[[VAL_3]] : !fir.ref<!fir.box<!fir.array<3x3xf64>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-pointer.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-pointer.f90 index 27b726376fbeb..f640f5caddf76 100644 --- a/flang/test/Lower/OpenMP/wsloop-reduction-pointer.f90 +++ b/flang/test/Lower/OpenMP/wsloop-reduction-pointer.f90 @@ -18,7 +18,7 @@ program reduce_pointer deallocate(v) end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_ptr_i32 : !fir.ref<!fir.box<!fir.ptr<i32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_ptr_i32 : !fir.ref<!fir.box<!fir.ptr<i32>>> {{.*}} alloc { ! CHECK: %[[VAL_3:.*]] = fir.alloca !fir.box<!fir.ptr<i32>> ! CHECK: omp.yield(%[[VAL_3]] : !fir.ref<!fir.box<!fir.ptr<i32>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/do_concurrent_reduce_allocatable.f90 b/flang/test/Lower/do_concurrent_reduce_allocatable.f90 index 873fd10dd1b97..4fb67c094b594 100644 --- a/flang/test/Lower/do_concurrent_reduce_allocatable.f90 +++ b/flang/test/Lower/do_concurrent_reduce_allocatable.f90 @@ -8,7 +8,7 @@ subroutine do_concurrent_allocatable end do end subroutine -! CHECK: fir.declare_reduction @[[RED_OP:.*]] : ![[RED_TYPE:.*]] alloc { +! CHECK: fir.declare_reduction @[[RED_OP:.*]] : ![[RED_TYPE:.*]] attributes {byref_element_type = !fir.array<?x?xf32>} alloc { ! CHECK: %[[ALLOC:.*]] = fir.alloca ! CHECK: fir.yield(%[[ALLOC]] : ![[RED_TYPE]]) ! CHECK: } init { diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h index 5331cb5abdc6f..f4192f9b49fd9 100644 --- a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h +++ b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h @@ -1448,11 +1448,15 @@ class OpenMPIRBuilder { ReductionInfo(Type *ElementType, Value *Variable, Value *PrivateVariable, EvalKind EvaluationKind, ReductionGenCBTy ReductionGen, ReductionGenClangCBTy ReductionGenClang, - ReductionGenAtomicCBTy AtomicReductionGen) + ReductionGenAtomicCBTy AtomicReductionGen, + Type *ByRefAllocatedType = nullptr, + Type *ByRefElementType = nullptr) : ElementType(ElementType), Vari... [truncated] |
| @llvm/pr-subscribers-clang Author: Kareem Ergawy (ergawy) ChangesAdds initial support for GPU by-ref reductions. In particular, this diff adds support for reductions on scalar allocatables where reductions happen on loops nested in integer :: i real, allocatable :: scalar_alloc allocate(scalar_alloc) scalar_alloc = 0 !$omp target map(tofrom: scalar_alloc) !$omp parallel do reduction(+: scalar_alloc) do i = 1, 1000000 scalar_alloc = scalar_alloc + 1 end do !$omp end targetThis PR supports by-ref reductions on the intra- and inter-warp levels. So far, there are still steps to be takens for full support of by-ref reductions, for example:
Patch is 54.34 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/165714.diff 34 Files Affected:
diff --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp index fddeba98adccc..9a8c75073aa4c 100644 --- a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp +++ b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp @@ -1784,8 +1784,8 @@ void CGOpenMPRuntimeGPU::emitReduction( llvm::OpenMPIRBuilder::InsertPointTy AfterIP = cantFail(OMPBuilder.createReductionsGPU( - OmpLoc, AllocaIP, CodeGenIP, ReductionInfos, false, TeamsReduction, - llvm::OpenMPIRBuilder::ReductionGenCBKind::Clang, + OmpLoc, AllocaIP, CodeGenIP, ReductionInfos, {}, false, + TeamsReduction, llvm::OpenMPIRBuilder::ReductionGenCBKind::Clang, CGF.getTarget().getGridValue(), C.getLangOpts().OpenMPCUDAReductionBufNum, RTLoc)); CGF.Builder.restoreIP(AfterIP); diff --git a/flang/include/flang/Optimizer/Dialect/FIROps.td b/flang/include/flang/Optimizer/Dialect/FIROps.td index 58a317cf5d691..ff4dab1136ee9 100644 --- a/flang/include/flang/Optimizer/Dialect/FIROps.td +++ b/flang/include/flang/Optimizer/Dialect/FIROps.td @@ -3743,7 +3743,8 @@ def fir_DeclareReductionOp : fir_Op<"declare_reduction", [IsolatedFromAbove, }]; let arguments = (ins SymbolNameAttr:$sym_name, - TypeAttr:$type); + TypeAttr:$type, + OptionalAttr<TypeAttr>:$byref_element_type); let regions = (region MaxSizedRegion<1>:$allocRegion, AnyRegion:$initializerRegion, diff --git a/flang/lib/Lower/Support/ReductionProcessor.cpp b/flang/lib/Lower/Support/ReductionProcessor.cpp index 605a5b6b20b94..e02cd8fac823b 100644 --- a/flang/lib/Lower/Support/ReductionProcessor.cpp +++ b/flang/lib/Lower/Support/ReductionProcessor.cpp @@ -573,10 +573,15 @@ OpType ReductionProcessor::createDeclareReduction( mlir::OpBuilder modBuilder(module.getBodyRegion()); mlir::Type valTy = fir::unwrapRefType(type); + mlir::TypeAttr boxedTy{}; + if (!isByRef) type = valTy; - decl = OpType::create(modBuilder, loc, reductionOpName, type); + if (isByRef) + boxedTy = mlir::TypeAttr::get(fir::unwrapPassByRefType(valTy)); + + decl = OpType::create(modBuilder, loc, reductionOpName, type, boxedTy); createReductionAllocAndInitRegions(converter, loc, decl, redId, type, isByRef); diff --git a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp index 1229018bd9b3e..11609ea7b6040 100644 --- a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp +++ b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp @@ -851,7 +851,8 @@ class DoConcurrentConversion if (!ompReducer) { ompReducer = mlir::omp::DeclareReductionOp::create( rewriter, firReducer.getLoc(), ompReducerName, - firReducer.getTypeAttr().getValue()); + firReducer.getTypeAttr().getValue(), + firReducer.getByrefElementTypeAttr()); cloneFIRRegionToOMP(rewriter, firReducer.getAllocRegion(), ompReducer.getAllocRegion()); diff --git a/flang/test/Lower/OpenMP/delayed-privatization-reduction-byref.f90 b/flang/test/Lower/OpenMP/delayed-privatization-reduction-byref.f90 index 4b6a643f94059..4c7b6ac5f5f9b 100644 --- a/flang/test/Lower/OpenMP/delayed-privatization-reduction-byref.f90 +++ b/flang/test/Lower/OpenMP/delayed-privatization-reduction-byref.f90 @@ -22,7 +22,7 @@ subroutine red_and_delayed_private ! CHECK-SAME: @[[PRIVATIZER_SYM:.*]] : i32 ! CHECK-LABEL: omp.declare_reduction -! CHECK-SAME: @[[REDUCTION_SYM:.*]] : !fir.ref<i32> alloc +! CHECK-SAME: @[[REDUCTION_SYM:.*]] : !fir.ref<i32> attributes {byref_element_type = i32} alloc ! CHECK-LABEL: _QPred_and_delayed_private ! CHECK: omp.parallel diff --git a/flang/test/Lower/OpenMP/parallel-reduction-allocatable-array.f90 b/flang/test/Lower/OpenMP/parallel-reduction-allocatable-array.f90 index 41c7d69ebb3ba..f56875dcb518b 100644 --- a/flang/test/Lower/OpenMP/parallel-reduction-allocatable-array.f90 +++ b/flang/test/Lower/OpenMP/parallel-reduction-allocatable-array.f90 @@ -18,7 +18,7 @@ program reduce end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_heap_Uxi32 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_heap_Uxi32 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>> attributes {byref_element_type = !fir.array<?xi32>} alloc { ! CHECK: %[[VAL_10:.*]] = fir.alloca !fir.box<!fir.heap<!fir.array<?xi32>>> ! CHECK: omp.yield(%[[VAL_10]] : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/parallel-reduction-array-lb.f90 b/flang/test/Lower/OpenMP/parallel-reduction-array-lb.f90 index aa91e1e0e8b15..d9ba3bed464f8 100644 --- a/flang/test/Lower/OpenMP/parallel-reduction-array-lb.f90 +++ b/flang/test/Lower/OpenMP/parallel-reduction-array-lb.f90 @@ -12,7 +12,7 @@ program reduce end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_3x2xi32 : !fir.ref<!fir.box<!fir.array<3x2xi32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_3x2xi32 : !fir.ref<!fir.box<!fir.array<3x2xi32>>> {{.*}} alloc { ! CHECK: %[[VAL_15:.*]] = fir.alloca !fir.box<!fir.array<3x2xi32>> ! CHECK: omp.yield(%[[VAL_15]] : !fir.ref<!fir.box<!fir.array<3x2xi32>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/parallel-reduction-array.f90 b/flang/test/Lower/OpenMP/parallel-reduction-array.f90 index 59595de338d50..636660f279e85 100644 --- a/flang/test/Lower/OpenMP/parallel-reduction-array.f90 +++ b/flang/test/Lower/OpenMP/parallel-reduction-array.f90 @@ -17,7 +17,7 @@ program reduce print *,i end program -! CPU-LABEL: omp.declare_reduction @add_reduction_byref_box_3xi32 : !fir.ref<!fir.box<!fir.array<3xi32>>> alloc { +! CPU-LABEL: omp.declare_reduction @add_reduction_byref_box_3xi32 : !fir.ref<!fir.box<!fir.array<3xi32>>> attributes {byref_element_type = !fir.array<3xi32>} alloc { ! CPU: %[[VAL_8:.*]] = fir.alloca !fir.box<!fir.array<3xi32>> ! CPU: omp.yield(%[[VAL_8]] : !fir.ref<!fir.box<!fir.array<3xi32>>>) ! CPU-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/parallel-reduction-array2.f90 b/flang/test/Lower/OpenMP/parallel-reduction-array2.f90 index 14338c6f50817..9cf8a63427ed1 100644 --- a/flang/test/Lower/OpenMP/parallel-reduction-array2.f90 +++ b/flang/test/Lower/OpenMP/parallel-reduction-array2.f90 @@ -13,7 +13,7 @@ program reduce print *,i end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_3xi32 : !fir.ref<!fir.box<!fir.array<3xi32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_3xi32 : !fir.ref<!fir.box<!fir.array<3xi32>>> {{.*}} alloc { ! CHECK: %[[VAL_8:.*]] = fir.alloca !fir.box<!fir.array<3xi32>> ! CHECK: omp.yield(%[[VAL_8]] : !fir.ref<!fir.box<!fir.array<3xi32>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/parallel-reduction-pointer-array.f90 b/flang/test/Lower/OpenMP/parallel-reduction-pointer-array.f90 index 36344458d1cae..3de2ba8f61f8e 100644 --- a/flang/test/Lower/OpenMP/parallel-reduction-pointer-array.f90 +++ b/flang/test/Lower/OpenMP/parallel-reduction-pointer-array.f90 @@ -19,7 +19,7 @@ program reduce end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_ptr_Uxi32 : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_ptr_Uxi32 : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>> attributes {byref_element_type = !fir.array<?xi32>} alloc { ! CHECK: %[[VAL_3:.*]] = fir.alloca !fir.box<!fir.ptr<!fir.array<?xi32>>> ! CHECK: omp.yield(%[[VAL_3]] : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/parallel-reduction3.f90 b/flang/test/Lower/OpenMP/parallel-reduction3.f90 index 9af18378f0ae0..da337378862be 100644 --- a/flang/test/Lower/OpenMP/parallel-reduction3.f90 +++ b/flang/test/Lower/OpenMP/parallel-reduction3.f90 @@ -1,7 +1,7 @@ ! RUN: bbc -emit-hlfir -fopenmp -o - %s 2>&1 | FileCheck %s ! RUN: %flang_fc1 -emit-hlfir -fopenmp -o - %s 2>&1 | FileCheck %s -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxi32 : !fir.ref<!fir.box<!fir.array<?xi32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxi32 : !fir.ref<!fir.box<!fir.array<?xi32>>> {{.*}} alloc { ! CHECK: %[[VAL_8:.*]] = fir.alloca !fir.box<!fir.array<?xi32>> ! CHECK: omp.yield(%[[VAL_8]] : !fir.ref<!fir.box<!fir.array<?xi32>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/reduction-array-intrinsic.f90 b/flang/test/Lower/OpenMP/reduction-array-intrinsic.f90 index 8b94d51f986f5..4a0593ff9eca4 100644 --- a/flang/test/Lower/OpenMP/reduction-array-intrinsic.f90 +++ b/flang/test/Lower/OpenMP/reduction-array-intrinsic.f90 @@ -9,7 +9,7 @@ subroutine max_array_reduction(l, r) !$omp end parallel end subroutine -! CHECK-LABEL: omp.declare_reduction @max_byref_box_Uxi32 : !fir.ref<!fir.box<!fir.array<?xi32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @max_byref_box_Uxi32 : !fir.ref<!fir.box<!fir.array<?xi32>>> {{.*}} alloc { ! CHECK: %[[VAL_3:.*]] = fir.alloca !fir.box<!fir.array<?xi32>> ! CHECK: omp.yield(%[[VAL_3]] : !fir.ref<!fir.box<!fir.array<?xi32>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/sections-array-reduction.f90 b/flang/test/Lower/OpenMP/sections-array-reduction.f90 index 2f2808cebfc0c..0dbe9e3673395 100644 --- a/flang/test/Lower/OpenMP/sections-array-reduction.f90 +++ b/flang/test/Lower/OpenMP/sections-array-reduction.f90 @@ -14,7 +14,7 @@ subroutine sectionsReduction(x) end subroutine -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxf32 : !fir.ref<!fir.box<!fir.array<?xf32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxf32 : !fir.ref<!fir.box<!fir.array<?xf32>>> {{.*}} alloc { ! [...] ! CHECK: omp.yield ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/taskgroup-task-array-reduction.f90 b/flang/test/Lower/OpenMP/taskgroup-task-array-reduction.f90 index 18a4f75b86309..3a63bb09c59de 100644 --- a/flang/test/Lower/OpenMP/taskgroup-task-array-reduction.f90 +++ b/flang/test/Lower/OpenMP/taskgroup-task-array-reduction.f90 @@ -1,7 +1,7 @@ ! RUN: bbc -emit-hlfir -fopenmp -fopenmp-version=50 -o - %s 2>&1 | FileCheck %s ! RUN: %flang_fc1 -emit-hlfir -fopenmp -fopenmp-version=50 -o - %s 2>&1 | FileCheck %s -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxf32 : !fir.ref<!fir.box<!fir.array<?xf32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxf32 : !fir.ref<!fir.box<!fir.array<?xf32>>> {{.*}} alloc { ! [...] ! CHECK: omp.yield ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-allocatable-array-minmax.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-allocatable-array-minmax.f90 index 2cd953de0dffa..ed81577ecce16 100644 --- a/flang/test/Lower/OpenMP/wsloop-reduction-allocatable-array-minmax.f90 +++ b/flang/test/Lower/OpenMP/wsloop-reduction-allocatable-array-minmax.f90 @@ -32,7 +32,7 @@ program reduce15 print *,"min: ", mins end program -! CHECK-LABEL: omp.declare_reduction @min_byref_box_heap_Uxi32 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>> alloc { +! CHECK-LABEL: omp.declare_reduction @min_byref_box_heap_Uxi32 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>> {{.*}} alloc { ! CHECK: %[[VAL_3:.*]] = fir.alloca !fir.box<!fir.heap<!fir.array<?xi32>>> ! CHECK: omp.yield(%[[VAL_3]] : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>) ! CHECK-LABEL: } init { @@ -93,7 +93,7 @@ program reduce15 ! CHECK: omp.yield ! CHECK: } -! CHECK-LABEL: omp.declare_reduction @max_byref_box_heap_Uxi32 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>> alloc { +! CHECK-LABEL: omp.declare_reduction @max_byref_box_heap_Uxi32 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>> {{.*}} alloc { ! CHECK: %[[VAL_3:.*]] = fir.alloca !fir.box<!fir.heap<!fir.array<?xi32>>> ! CHECK: omp.yield(%[[VAL_3]] : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-allocatable.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-allocatable.f90 index 663851cba46c6..d8c0a36db126e 100644 --- a/flang/test/Lower/OpenMP/wsloop-reduction-allocatable.f90 +++ b/flang/test/Lower/OpenMP/wsloop-reduction-allocatable.f90 @@ -18,7 +18,7 @@ program reduce end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_heap_i32 : !fir.ref<!fir.box<!fir.heap<i32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_heap_i32 : !fir.ref<!fir.box<!fir.heap<i32>>> attributes {byref_element_type = i32} alloc { ! CHECK: %[[VAL_2:.*]] = fir.alloca !fir.box<!fir.heap<i32>> ! CHECK: omp.yield(%[[VAL_2]] : !fir.ref<!fir.box<!fir.heap<i32>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-array-assumed-shape.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-array-assumed-shape.f90 index 209ee9a4e0cef..28acb8f19531f 100644 --- a/flang/test/Lower/OpenMP/wsloop-reduction-array-assumed-shape.f90 +++ b/flang/test/Lower/OpenMP/wsloop-reduction-array-assumed-shape.f90 @@ -22,7 +22,7 @@ subroutine reduce(r) end subroutine end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxf64 : !fir.ref<!fir.box<!fir.array<?xf64>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxf64 : !fir.ref<!fir.box<!fir.array<?xf64>>> {{.*}} alloc { ! CHECK: %[[VAL_8:.*]] = fir.alloca !fir.box<!fir.array<?xf64>> ! CHECK: omp.yield(%[[VAL_8]] : !fir.ref<!fir.box<!fir.array<?xf64>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-array-lb.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-array-lb.f90 index 2233a74600948..ec448cf20f111 100644 --- a/flang/test/Lower/OpenMP/wsloop-reduction-array-lb.f90 +++ b/flang/test/Lower/OpenMP/wsloop-reduction-array-lb.f90 @@ -11,7 +11,7 @@ program reduce !$omp end parallel do end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_2xi32 : !fir.ref<!fir.box<!fir.array<2xi32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_2xi32 : !fir.ref<!fir.box<!fir.array<2xi32>>> {{.*}} alloc { ! CHECK: } combiner { ! CHECK: ^bb0(%[[ARG0:.*]]: !fir.ref<!fir.box<!fir.array<2xi32>>>, %[[ARG1:.*]]: !fir.ref<!fir.box<!fir.array<2xi32>>>): ! CHECK: %[[ARR0:.*]] = fir.load %[[ARG0]] : !fir.ref<!fir.box<!fir.array<2xi32>>> diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-array-lb2.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-array-lb2.f90 index 211bde19da8db..9da05a290ec21 100644 --- a/flang/test/Lower/OpenMP/wsloop-reduction-array-lb2.f90 +++ b/flang/test/Lower/OpenMP/wsloop-reduction-array-lb2.f90 @@ -19,7 +19,7 @@ subroutine sub(a, lb, ub) end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxi32 : !fir.ref<!fir.box<!fir.array<?xi32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxi32 : !fir.ref<!fir.box<!fir.array<?xi32>>> {{.*}} alloc { ! CHECK: } combiner { ! CHECK: ^bb0(%[[ARG0:.*]]: !fir.ref<!fir.box<!fir.array<?xi32>>>, %[[ARG1:.*]]: !fir.ref<!fir.box<!fir.array<?xi32>>>): ! CHECK: %[[ARR0:.*]] = fir.load %[[ARG0]] : !fir.ref<!fir.box<!fir.array<?xi32>>> diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-array.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-array.f90 index afaeba27c5eae..14b657c8e180d 100644 --- a/flang/test/Lower/OpenMP/wsloop-reduction-array.f90 +++ b/flang/test/Lower/OpenMP/wsloop-reduction-array.f90 @@ -14,7 +14,7 @@ program reduce print *,r end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_2xi32 : !fir.ref<!fir.box<!fir.array<2xi32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_2xi32 : !fir.ref<!fir.box<!fir.array<2xi32>>> attributes {byref_element_type = !fir.array<2xi32>} alloc { ! CHECK: %[[VAL_8:.*]] = fir.alloca !fir.box<!fir.array<2xi32>> ! CHECK: omp.yield(%[[VAL_8]] : !fir.ref<!fir.box<!fir.array<2xi32>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-array2.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-array2.f90 index 25b2e97a1b7f7..d0a0c38e4ccb1 100644 --- a/flang/test/Lower/OpenMP/wsloop-reduction-array2.f90 +++ b/flang/test/Lower/OpenMP/wsloop-reduction-array2.f90 @@ -14,7 +14,7 @@ program reduce print *,r end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_2xi32 : !fir.ref<!fir.box<!fir.array<2xi32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_2xi32 : !fir.ref<!fir.box<!fir.array<2xi32>>> {{.*}} alloc { ! CHECK: %[[VAL_8:.*]] = fir.alloca !fir.box<!fir.array<2xi32>> ! CHECK: omp.yield(%[[VAL_8]] : !fir.ref<!fir.box<!fir.array<2xi32>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-multiple-clauses.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-multiple-clauses.f90 index edd2bcb1d6be8..60a162d8f8002 100644 --- a/flang/test/Lower/OpenMP/wsloop-reduction-multiple-clauses.f90 +++ b/flang/test/Lower/OpenMP/wsloop-reduction-multiple-clauses.f90 @@ -24,7 +24,7 @@ program main endprogram -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_3x3xf64 : !fir.ref<!fir.box<!fir.array<3x3xf64>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_3x3xf64 : !fir.ref<!fir.box<!fir.array<3x3xf64>>> {{.*}} alloc { ! CHECK: %[[VAL_3:.*]] = fir.alloca !fir.box<!fir.array<3x3xf64>> ! CHECK: omp.yield(%[[VAL_3]] : !fir.ref<!fir.box<!fir.array<3x3xf64>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-pointer.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-pointer.f90 index 27b726376fbeb..f640f5caddf76 100644 --- a/flang/test/Lower/OpenMP/wsloop-reduction-pointer.f90 +++ b/flang/test/Lower/OpenMP/wsloop-reduction-pointer.f90 @@ -18,7 +18,7 @@ program reduce_pointer deallocate(v) end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_ptr_i32 : !fir.ref<!fir.box<!fir.ptr<i32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_ptr_i32 : !fir.ref<!fir.box<!fir.ptr<i32>>> {{.*}} alloc { ! CHECK: %[[VAL_3:.*]] = fir.alloca !fir.box<!fir.ptr<i32>> ! CHECK: omp.yield(%[[VAL_3]] : !fir.ref<!fir.box<!fir.ptr<i32>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/do_concurrent_reduce_allocatable.f90 b/flang/test/Lower/do_concurrent_reduce_allocatable.f90 index 873fd10dd1b97..4fb67c094b594 100644 --- a/flang/test/Lower/do_concurrent_reduce_allocatable.f90 +++ b/flang/test/Lower/do_concurrent_reduce_allocatable.f90 @@ -8,7 +8,7 @@ subroutine do_concurrent_allocatable end do end subroutine -! CHECK: fir.declare_reduction @[[RED_OP:.*]] : ![[RED_TYPE:.*]] alloc { +! CHECK: fir.declare_reduction @[[RED_OP:.*]] : ![[RED_TYPE:.*]] attributes {byref_element_type = !fir.array<?x?xf32>} alloc { ! CHECK: %[[ALLOC:.*]] = fir.alloca ! CHECK: fir.yield(%[[ALLOC]] : ![[RED_TYPE]]) ! CHECK: } init { diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h index 5331cb5abdc6f..f4192f9b49fd9 100644 --- a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h +++ b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h @@ -1448,11 +1448,15 @@ class OpenMPIRBuilder { ReductionInfo(Type *ElementType, Value *Variable, Value *PrivateVariable, EvalKind EvaluationKind, ReductionGenCBTy ReductionGen, ReductionGenClangCBTy ReductionGenClang, - ReductionGenAtomicCBTy AtomicReductionGen) + ReductionGenAtomicCBTy AtomicReductionGen, + Type *ByRefAllocatedType = nullptr, + Type *ByRefElementType = nullptr) : ElementType(ElementType), Vari... [truncated] |
| @llvm/pr-subscribers-flang-openmp Author: Kareem Ergawy (ergawy) ChangesAdds initial support for GPU by-ref reductions. In particular, this diff adds support for reductions on scalar allocatables where reductions happen on loops nested in integer :: i real, allocatable :: scalar_alloc allocate(scalar_alloc) scalar_alloc = 0 !$omp target map(tofrom: scalar_alloc) !$omp parallel do reduction(+: scalar_alloc) do i = 1, 1000000 scalar_alloc = scalar_alloc + 1 end do !$omp end targetThis PR supports by-ref reductions on the intra- and inter-warp levels. So far, there are still steps to be takens for full support of by-ref reductions, for example:
Patch is 54.34 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/165714.diff 34 Files Affected:
diff --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp index fddeba98adccc..9a8c75073aa4c 100644 --- a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp +++ b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp @@ -1784,8 +1784,8 @@ void CGOpenMPRuntimeGPU::emitReduction( llvm::OpenMPIRBuilder::InsertPointTy AfterIP = cantFail(OMPBuilder.createReductionsGPU( - OmpLoc, AllocaIP, CodeGenIP, ReductionInfos, false, TeamsReduction, - llvm::OpenMPIRBuilder::ReductionGenCBKind::Clang, + OmpLoc, AllocaIP, CodeGenIP, ReductionInfos, {}, false, + TeamsReduction, llvm::OpenMPIRBuilder::ReductionGenCBKind::Clang, CGF.getTarget().getGridValue(), C.getLangOpts().OpenMPCUDAReductionBufNum, RTLoc)); CGF.Builder.restoreIP(AfterIP); diff --git a/flang/include/flang/Optimizer/Dialect/FIROps.td b/flang/include/flang/Optimizer/Dialect/FIROps.td index 58a317cf5d691..ff4dab1136ee9 100644 --- a/flang/include/flang/Optimizer/Dialect/FIROps.td +++ b/flang/include/flang/Optimizer/Dialect/FIROps.td @@ -3743,7 +3743,8 @@ def fir_DeclareReductionOp : fir_Op<"declare_reduction", [IsolatedFromAbove, }]; let arguments = (ins SymbolNameAttr:$sym_name, - TypeAttr:$type); + TypeAttr:$type, + OptionalAttr<TypeAttr>:$byref_element_type); let regions = (region MaxSizedRegion<1>:$allocRegion, AnyRegion:$initializerRegion, diff --git a/flang/lib/Lower/Support/ReductionProcessor.cpp b/flang/lib/Lower/Support/ReductionProcessor.cpp index 605a5b6b20b94..e02cd8fac823b 100644 --- a/flang/lib/Lower/Support/ReductionProcessor.cpp +++ b/flang/lib/Lower/Support/ReductionProcessor.cpp @@ -573,10 +573,15 @@ OpType ReductionProcessor::createDeclareReduction( mlir::OpBuilder modBuilder(module.getBodyRegion()); mlir::Type valTy = fir::unwrapRefType(type); + mlir::TypeAttr boxedTy{}; + if (!isByRef) type = valTy; - decl = OpType::create(modBuilder, loc, reductionOpName, type); + if (isByRef) + boxedTy = mlir::TypeAttr::get(fir::unwrapPassByRefType(valTy)); + + decl = OpType::create(modBuilder, loc, reductionOpName, type, boxedTy); createReductionAllocAndInitRegions(converter, loc, decl, redId, type, isByRef); diff --git a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp index 1229018bd9b3e..11609ea7b6040 100644 --- a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp +++ b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp @@ -851,7 +851,8 @@ class DoConcurrentConversion if (!ompReducer) { ompReducer = mlir::omp::DeclareReductionOp::create( rewriter, firReducer.getLoc(), ompReducerName, - firReducer.getTypeAttr().getValue()); + firReducer.getTypeAttr().getValue(), + firReducer.getByrefElementTypeAttr()); cloneFIRRegionToOMP(rewriter, firReducer.getAllocRegion(), ompReducer.getAllocRegion()); diff --git a/flang/test/Lower/OpenMP/delayed-privatization-reduction-byref.f90 b/flang/test/Lower/OpenMP/delayed-privatization-reduction-byref.f90 index 4b6a643f94059..4c7b6ac5f5f9b 100644 --- a/flang/test/Lower/OpenMP/delayed-privatization-reduction-byref.f90 +++ b/flang/test/Lower/OpenMP/delayed-privatization-reduction-byref.f90 @@ -22,7 +22,7 @@ subroutine red_and_delayed_private ! CHECK-SAME: @[[PRIVATIZER_SYM:.*]] : i32 ! CHECK-LABEL: omp.declare_reduction -! CHECK-SAME: @[[REDUCTION_SYM:.*]] : !fir.ref<i32> alloc +! CHECK-SAME: @[[REDUCTION_SYM:.*]] : !fir.ref<i32> attributes {byref_element_type = i32} alloc ! CHECK-LABEL: _QPred_and_delayed_private ! CHECK: omp.parallel diff --git a/flang/test/Lower/OpenMP/parallel-reduction-allocatable-array.f90 b/flang/test/Lower/OpenMP/parallel-reduction-allocatable-array.f90 index 41c7d69ebb3ba..f56875dcb518b 100644 --- a/flang/test/Lower/OpenMP/parallel-reduction-allocatable-array.f90 +++ b/flang/test/Lower/OpenMP/parallel-reduction-allocatable-array.f90 @@ -18,7 +18,7 @@ program reduce end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_heap_Uxi32 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_heap_Uxi32 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>> attributes {byref_element_type = !fir.array<?xi32>} alloc { ! CHECK: %[[VAL_10:.*]] = fir.alloca !fir.box<!fir.heap<!fir.array<?xi32>>> ! CHECK: omp.yield(%[[VAL_10]] : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/parallel-reduction-array-lb.f90 b/flang/test/Lower/OpenMP/parallel-reduction-array-lb.f90 index aa91e1e0e8b15..d9ba3bed464f8 100644 --- a/flang/test/Lower/OpenMP/parallel-reduction-array-lb.f90 +++ b/flang/test/Lower/OpenMP/parallel-reduction-array-lb.f90 @@ -12,7 +12,7 @@ program reduce end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_3x2xi32 : !fir.ref<!fir.box<!fir.array<3x2xi32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_3x2xi32 : !fir.ref<!fir.box<!fir.array<3x2xi32>>> {{.*}} alloc { ! CHECK: %[[VAL_15:.*]] = fir.alloca !fir.box<!fir.array<3x2xi32>> ! CHECK: omp.yield(%[[VAL_15]] : !fir.ref<!fir.box<!fir.array<3x2xi32>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/parallel-reduction-array.f90 b/flang/test/Lower/OpenMP/parallel-reduction-array.f90 index 59595de338d50..636660f279e85 100644 --- a/flang/test/Lower/OpenMP/parallel-reduction-array.f90 +++ b/flang/test/Lower/OpenMP/parallel-reduction-array.f90 @@ -17,7 +17,7 @@ program reduce print *,i end program -! CPU-LABEL: omp.declare_reduction @add_reduction_byref_box_3xi32 : !fir.ref<!fir.box<!fir.array<3xi32>>> alloc { +! CPU-LABEL: omp.declare_reduction @add_reduction_byref_box_3xi32 : !fir.ref<!fir.box<!fir.array<3xi32>>> attributes {byref_element_type = !fir.array<3xi32>} alloc { ! CPU: %[[VAL_8:.*]] = fir.alloca !fir.box<!fir.array<3xi32>> ! CPU: omp.yield(%[[VAL_8]] : !fir.ref<!fir.box<!fir.array<3xi32>>>) ! CPU-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/parallel-reduction-array2.f90 b/flang/test/Lower/OpenMP/parallel-reduction-array2.f90 index 14338c6f50817..9cf8a63427ed1 100644 --- a/flang/test/Lower/OpenMP/parallel-reduction-array2.f90 +++ b/flang/test/Lower/OpenMP/parallel-reduction-array2.f90 @@ -13,7 +13,7 @@ program reduce print *,i end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_3xi32 : !fir.ref<!fir.box<!fir.array<3xi32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_3xi32 : !fir.ref<!fir.box<!fir.array<3xi32>>> {{.*}} alloc { ! CHECK: %[[VAL_8:.*]] = fir.alloca !fir.box<!fir.array<3xi32>> ! CHECK: omp.yield(%[[VAL_8]] : !fir.ref<!fir.box<!fir.array<3xi32>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/parallel-reduction-pointer-array.f90 b/flang/test/Lower/OpenMP/parallel-reduction-pointer-array.f90 index 36344458d1cae..3de2ba8f61f8e 100644 --- a/flang/test/Lower/OpenMP/parallel-reduction-pointer-array.f90 +++ b/flang/test/Lower/OpenMP/parallel-reduction-pointer-array.f90 @@ -19,7 +19,7 @@ program reduce end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_ptr_Uxi32 : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_ptr_Uxi32 : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>> attributes {byref_element_type = !fir.array<?xi32>} alloc { ! CHECK: %[[VAL_3:.*]] = fir.alloca !fir.box<!fir.ptr<!fir.array<?xi32>>> ! CHECK: omp.yield(%[[VAL_3]] : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/parallel-reduction3.f90 b/flang/test/Lower/OpenMP/parallel-reduction3.f90 index 9af18378f0ae0..da337378862be 100644 --- a/flang/test/Lower/OpenMP/parallel-reduction3.f90 +++ b/flang/test/Lower/OpenMP/parallel-reduction3.f90 @@ -1,7 +1,7 @@ ! RUN: bbc -emit-hlfir -fopenmp -o - %s 2>&1 | FileCheck %s ! RUN: %flang_fc1 -emit-hlfir -fopenmp -o - %s 2>&1 | FileCheck %s -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxi32 : !fir.ref<!fir.box<!fir.array<?xi32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxi32 : !fir.ref<!fir.box<!fir.array<?xi32>>> {{.*}} alloc { ! CHECK: %[[VAL_8:.*]] = fir.alloca !fir.box<!fir.array<?xi32>> ! CHECK: omp.yield(%[[VAL_8]] : !fir.ref<!fir.box<!fir.array<?xi32>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/reduction-array-intrinsic.f90 b/flang/test/Lower/OpenMP/reduction-array-intrinsic.f90 index 8b94d51f986f5..4a0593ff9eca4 100644 --- a/flang/test/Lower/OpenMP/reduction-array-intrinsic.f90 +++ b/flang/test/Lower/OpenMP/reduction-array-intrinsic.f90 @@ -9,7 +9,7 @@ subroutine max_array_reduction(l, r) !$omp end parallel end subroutine -! CHECK-LABEL: omp.declare_reduction @max_byref_box_Uxi32 : !fir.ref<!fir.box<!fir.array<?xi32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @max_byref_box_Uxi32 : !fir.ref<!fir.box<!fir.array<?xi32>>> {{.*}} alloc { ! CHECK: %[[VAL_3:.*]] = fir.alloca !fir.box<!fir.array<?xi32>> ! CHECK: omp.yield(%[[VAL_3]] : !fir.ref<!fir.box<!fir.array<?xi32>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/sections-array-reduction.f90 b/flang/test/Lower/OpenMP/sections-array-reduction.f90 index 2f2808cebfc0c..0dbe9e3673395 100644 --- a/flang/test/Lower/OpenMP/sections-array-reduction.f90 +++ b/flang/test/Lower/OpenMP/sections-array-reduction.f90 @@ -14,7 +14,7 @@ subroutine sectionsReduction(x) end subroutine -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxf32 : !fir.ref<!fir.box<!fir.array<?xf32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxf32 : !fir.ref<!fir.box<!fir.array<?xf32>>> {{.*}} alloc { ! [...] ! CHECK: omp.yield ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/taskgroup-task-array-reduction.f90 b/flang/test/Lower/OpenMP/taskgroup-task-array-reduction.f90 index 18a4f75b86309..3a63bb09c59de 100644 --- a/flang/test/Lower/OpenMP/taskgroup-task-array-reduction.f90 +++ b/flang/test/Lower/OpenMP/taskgroup-task-array-reduction.f90 @@ -1,7 +1,7 @@ ! RUN: bbc -emit-hlfir -fopenmp -fopenmp-version=50 -o - %s 2>&1 | FileCheck %s ! RUN: %flang_fc1 -emit-hlfir -fopenmp -fopenmp-version=50 -o - %s 2>&1 | FileCheck %s -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxf32 : !fir.ref<!fir.box<!fir.array<?xf32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxf32 : !fir.ref<!fir.box<!fir.array<?xf32>>> {{.*}} alloc { ! [...] ! CHECK: omp.yield ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-allocatable-array-minmax.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-allocatable-array-minmax.f90 index 2cd953de0dffa..ed81577ecce16 100644 --- a/flang/test/Lower/OpenMP/wsloop-reduction-allocatable-array-minmax.f90 +++ b/flang/test/Lower/OpenMP/wsloop-reduction-allocatable-array-minmax.f90 @@ -32,7 +32,7 @@ program reduce15 print *,"min: ", mins end program -! CHECK-LABEL: omp.declare_reduction @min_byref_box_heap_Uxi32 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>> alloc { +! CHECK-LABEL: omp.declare_reduction @min_byref_box_heap_Uxi32 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>> {{.*}} alloc { ! CHECK: %[[VAL_3:.*]] = fir.alloca !fir.box<!fir.heap<!fir.array<?xi32>>> ! CHECK: omp.yield(%[[VAL_3]] : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>) ! CHECK-LABEL: } init { @@ -93,7 +93,7 @@ program reduce15 ! CHECK: omp.yield ! CHECK: } -! CHECK-LABEL: omp.declare_reduction @max_byref_box_heap_Uxi32 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>> alloc { +! CHECK-LABEL: omp.declare_reduction @max_byref_box_heap_Uxi32 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>> {{.*}} alloc { ! CHECK: %[[VAL_3:.*]] = fir.alloca !fir.box<!fir.heap<!fir.array<?xi32>>> ! CHECK: omp.yield(%[[VAL_3]] : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-allocatable.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-allocatable.f90 index 663851cba46c6..d8c0a36db126e 100644 --- a/flang/test/Lower/OpenMP/wsloop-reduction-allocatable.f90 +++ b/flang/test/Lower/OpenMP/wsloop-reduction-allocatable.f90 @@ -18,7 +18,7 @@ program reduce end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_heap_i32 : !fir.ref<!fir.box<!fir.heap<i32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_heap_i32 : !fir.ref<!fir.box<!fir.heap<i32>>> attributes {byref_element_type = i32} alloc { ! CHECK: %[[VAL_2:.*]] = fir.alloca !fir.box<!fir.heap<i32>> ! CHECK: omp.yield(%[[VAL_2]] : !fir.ref<!fir.box<!fir.heap<i32>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-array-assumed-shape.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-array-assumed-shape.f90 index 209ee9a4e0cef..28acb8f19531f 100644 --- a/flang/test/Lower/OpenMP/wsloop-reduction-array-assumed-shape.f90 +++ b/flang/test/Lower/OpenMP/wsloop-reduction-array-assumed-shape.f90 @@ -22,7 +22,7 @@ subroutine reduce(r) end subroutine end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxf64 : !fir.ref<!fir.box<!fir.array<?xf64>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxf64 : !fir.ref<!fir.box<!fir.array<?xf64>>> {{.*}} alloc { ! CHECK: %[[VAL_8:.*]] = fir.alloca !fir.box<!fir.array<?xf64>> ! CHECK: omp.yield(%[[VAL_8]] : !fir.ref<!fir.box<!fir.array<?xf64>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-array-lb.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-array-lb.f90 index 2233a74600948..ec448cf20f111 100644 --- a/flang/test/Lower/OpenMP/wsloop-reduction-array-lb.f90 +++ b/flang/test/Lower/OpenMP/wsloop-reduction-array-lb.f90 @@ -11,7 +11,7 @@ program reduce !$omp end parallel do end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_2xi32 : !fir.ref<!fir.box<!fir.array<2xi32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_2xi32 : !fir.ref<!fir.box<!fir.array<2xi32>>> {{.*}} alloc { ! CHECK: } combiner { ! CHECK: ^bb0(%[[ARG0:.*]]: !fir.ref<!fir.box<!fir.array<2xi32>>>, %[[ARG1:.*]]: !fir.ref<!fir.box<!fir.array<2xi32>>>): ! CHECK: %[[ARR0:.*]] = fir.load %[[ARG0]] : !fir.ref<!fir.box<!fir.array<2xi32>>> diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-array-lb2.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-array-lb2.f90 index 211bde19da8db..9da05a290ec21 100644 --- a/flang/test/Lower/OpenMP/wsloop-reduction-array-lb2.f90 +++ b/flang/test/Lower/OpenMP/wsloop-reduction-array-lb2.f90 @@ -19,7 +19,7 @@ subroutine sub(a, lb, ub) end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxi32 : !fir.ref<!fir.box<!fir.array<?xi32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxi32 : !fir.ref<!fir.box<!fir.array<?xi32>>> {{.*}} alloc { ! CHECK: } combiner { ! CHECK: ^bb0(%[[ARG0:.*]]: !fir.ref<!fir.box<!fir.array<?xi32>>>, %[[ARG1:.*]]: !fir.ref<!fir.box<!fir.array<?xi32>>>): ! CHECK: %[[ARR0:.*]] = fir.load %[[ARG0]] : !fir.ref<!fir.box<!fir.array<?xi32>>> diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-array.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-array.f90 index afaeba27c5eae..14b657c8e180d 100644 --- a/flang/test/Lower/OpenMP/wsloop-reduction-array.f90 +++ b/flang/test/Lower/OpenMP/wsloop-reduction-array.f90 @@ -14,7 +14,7 @@ program reduce print *,r end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_2xi32 : !fir.ref<!fir.box<!fir.array<2xi32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_2xi32 : !fir.ref<!fir.box<!fir.array<2xi32>>> attributes {byref_element_type = !fir.array<2xi32>} alloc { ! CHECK: %[[VAL_8:.*]] = fir.alloca !fir.box<!fir.array<2xi32>> ! CHECK: omp.yield(%[[VAL_8]] : !fir.ref<!fir.box<!fir.array<2xi32>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-array2.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-array2.f90 index 25b2e97a1b7f7..d0a0c38e4ccb1 100644 --- a/flang/test/Lower/OpenMP/wsloop-reduction-array2.f90 +++ b/flang/test/Lower/OpenMP/wsloop-reduction-array2.f90 @@ -14,7 +14,7 @@ program reduce print *,r end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_2xi32 : !fir.ref<!fir.box<!fir.array<2xi32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_2xi32 : !fir.ref<!fir.box<!fir.array<2xi32>>> {{.*}} alloc { ! CHECK: %[[VAL_8:.*]] = fir.alloca !fir.box<!fir.array<2xi32>> ! CHECK: omp.yield(%[[VAL_8]] : !fir.ref<!fir.box<!fir.array<2xi32>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-multiple-clauses.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-multiple-clauses.f90 index edd2bcb1d6be8..60a162d8f8002 100644 --- a/flang/test/Lower/OpenMP/wsloop-reduction-multiple-clauses.f90 +++ b/flang/test/Lower/OpenMP/wsloop-reduction-multiple-clauses.f90 @@ -24,7 +24,7 @@ program main endprogram -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_3x3xf64 : !fir.ref<!fir.box<!fir.array<3x3xf64>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_3x3xf64 : !fir.ref<!fir.box<!fir.array<3x3xf64>>> {{.*}} alloc { ! CHECK: %[[VAL_3:.*]] = fir.alloca !fir.box<!fir.array<3x3xf64>> ! CHECK: omp.yield(%[[VAL_3]] : !fir.ref<!fir.box<!fir.array<3x3xf64>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-pointer.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-pointer.f90 index 27b726376fbeb..f640f5caddf76 100644 --- a/flang/test/Lower/OpenMP/wsloop-reduction-pointer.f90 +++ b/flang/test/Lower/OpenMP/wsloop-reduction-pointer.f90 @@ -18,7 +18,7 @@ program reduce_pointer deallocate(v) end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_ptr_i32 : !fir.ref<!fir.box<!fir.ptr<i32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_ptr_i32 : !fir.ref<!fir.box<!fir.ptr<i32>>> {{.*}} alloc { ! CHECK: %[[VAL_3:.*]] = fir.alloca !fir.box<!fir.ptr<i32>> ! CHECK: omp.yield(%[[VAL_3]] : !fir.ref<!fir.box<!fir.ptr<i32>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/do_concurrent_reduce_allocatable.f90 b/flang/test/Lower/do_concurrent_reduce_allocatable.f90 index 873fd10dd1b97..4fb67c094b594 100644 --- a/flang/test/Lower/do_concurrent_reduce_allocatable.f90 +++ b/flang/test/Lower/do_concurrent_reduce_allocatable.f90 @@ -8,7 +8,7 @@ subroutine do_concurrent_allocatable end do end subroutine -! CHECK: fir.declare_reduction @[[RED_OP:.*]] : ![[RED_TYPE:.*]] alloc { +! CHECK: fir.declare_reduction @[[RED_OP:.*]] : ![[RED_TYPE:.*]] attributes {byref_element_type = !fir.array<?x?xf32>} alloc { ! CHECK: %[[ALLOC:.*]] = fir.alloca ! CHECK: fir.yield(%[[ALLOC]] : ![[RED_TYPE]]) ! CHECK: } init { diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h index 5331cb5abdc6f..f4192f9b49fd9 100644 --- a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h +++ b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h @@ -1448,11 +1448,15 @@ class OpenMPIRBuilder { ReductionInfo(Type *ElementType, Value *Variable, Value *PrivateVariable, EvalKind EvaluationKind, ReductionGenCBTy ReductionGen, ReductionGenClangCBTy ReductionGenClang, - ReductionGenAtomicCBTy AtomicReductionGen) + ReductionGenAtomicCBTy AtomicReductionGen, + Type *ByRefAllocatedType = nullptr, + Type *ByRefElementType = nullptr) : ElementType(ElementType), Vari... [truncated] |
| @llvm/pr-subscribers-mlir-llvm Author: Kareem Ergawy (ergawy) ChangesAdds initial support for GPU by-ref reductions. In particular, this diff adds support for reductions on scalar allocatables where reductions happen on loops nested in integer :: i real, allocatable :: scalar_alloc allocate(scalar_alloc) scalar_alloc = 0 !$omp target map(tofrom: scalar_alloc) !$omp parallel do reduction(+: scalar_alloc) do i = 1, 1000000 scalar_alloc = scalar_alloc + 1 end do !$omp end targetThis PR supports by-ref reductions on the intra- and inter-warp levels. So far, there are still steps to be takens for full support of by-ref reductions, for example:
Patch is 54.34 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/165714.diff 34 Files Affected:
diff --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp index fddeba98adccc..9a8c75073aa4c 100644 --- a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp +++ b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp @@ -1784,8 +1784,8 @@ void CGOpenMPRuntimeGPU::emitReduction( llvm::OpenMPIRBuilder::InsertPointTy AfterIP = cantFail(OMPBuilder.createReductionsGPU( - OmpLoc, AllocaIP, CodeGenIP, ReductionInfos, false, TeamsReduction, - llvm::OpenMPIRBuilder::ReductionGenCBKind::Clang, + OmpLoc, AllocaIP, CodeGenIP, ReductionInfos, {}, false, + TeamsReduction, llvm::OpenMPIRBuilder::ReductionGenCBKind::Clang, CGF.getTarget().getGridValue(), C.getLangOpts().OpenMPCUDAReductionBufNum, RTLoc)); CGF.Builder.restoreIP(AfterIP); diff --git a/flang/include/flang/Optimizer/Dialect/FIROps.td b/flang/include/flang/Optimizer/Dialect/FIROps.td index 58a317cf5d691..ff4dab1136ee9 100644 --- a/flang/include/flang/Optimizer/Dialect/FIROps.td +++ b/flang/include/flang/Optimizer/Dialect/FIROps.td @@ -3743,7 +3743,8 @@ def fir_DeclareReductionOp : fir_Op<"declare_reduction", [IsolatedFromAbove, }]; let arguments = (ins SymbolNameAttr:$sym_name, - TypeAttr:$type); + TypeAttr:$type, + OptionalAttr<TypeAttr>:$byref_element_type); let regions = (region MaxSizedRegion<1>:$allocRegion, AnyRegion:$initializerRegion, diff --git a/flang/lib/Lower/Support/ReductionProcessor.cpp b/flang/lib/Lower/Support/ReductionProcessor.cpp index 605a5b6b20b94..e02cd8fac823b 100644 --- a/flang/lib/Lower/Support/ReductionProcessor.cpp +++ b/flang/lib/Lower/Support/ReductionProcessor.cpp @@ -573,10 +573,15 @@ OpType ReductionProcessor::createDeclareReduction( mlir::OpBuilder modBuilder(module.getBodyRegion()); mlir::Type valTy = fir::unwrapRefType(type); + mlir::TypeAttr boxedTy{}; + if (!isByRef) type = valTy; - decl = OpType::create(modBuilder, loc, reductionOpName, type); + if (isByRef) + boxedTy = mlir::TypeAttr::get(fir::unwrapPassByRefType(valTy)); + + decl = OpType::create(modBuilder, loc, reductionOpName, type, boxedTy); createReductionAllocAndInitRegions(converter, loc, decl, redId, type, isByRef); diff --git a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp index 1229018bd9b3e..11609ea7b6040 100644 --- a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp +++ b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp @@ -851,7 +851,8 @@ class DoConcurrentConversion if (!ompReducer) { ompReducer = mlir::omp::DeclareReductionOp::create( rewriter, firReducer.getLoc(), ompReducerName, - firReducer.getTypeAttr().getValue()); + firReducer.getTypeAttr().getValue(), + firReducer.getByrefElementTypeAttr()); cloneFIRRegionToOMP(rewriter, firReducer.getAllocRegion(), ompReducer.getAllocRegion()); diff --git a/flang/test/Lower/OpenMP/delayed-privatization-reduction-byref.f90 b/flang/test/Lower/OpenMP/delayed-privatization-reduction-byref.f90 index 4b6a643f94059..4c7b6ac5f5f9b 100644 --- a/flang/test/Lower/OpenMP/delayed-privatization-reduction-byref.f90 +++ b/flang/test/Lower/OpenMP/delayed-privatization-reduction-byref.f90 @@ -22,7 +22,7 @@ subroutine red_and_delayed_private ! CHECK-SAME: @[[PRIVATIZER_SYM:.*]] : i32 ! CHECK-LABEL: omp.declare_reduction -! CHECK-SAME: @[[REDUCTION_SYM:.*]] : !fir.ref<i32> alloc +! CHECK-SAME: @[[REDUCTION_SYM:.*]] : !fir.ref<i32> attributes {byref_element_type = i32} alloc ! CHECK-LABEL: _QPred_and_delayed_private ! CHECK: omp.parallel diff --git a/flang/test/Lower/OpenMP/parallel-reduction-allocatable-array.f90 b/flang/test/Lower/OpenMP/parallel-reduction-allocatable-array.f90 index 41c7d69ebb3ba..f56875dcb518b 100644 --- a/flang/test/Lower/OpenMP/parallel-reduction-allocatable-array.f90 +++ b/flang/test/Lower/OpenMP/parallel-reduction-allocatable-array.f90 @@ -18,7 +18,7 @@ program reduce end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_heap_Uxi32 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_heap_Uxi32 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>> attributes {byref_element_type = !fir.array<?xi32>} alloc { ! CHECK: %[[VAL_10:.*]] = fir.alloca !fir.box<!fir.heap<!fir.array<?xi32>>> ! CHECK: omp.yield(%[[VAL_10]] : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/parallel-reduction-array-lb.f90 b/flang/test/Lower/OpenMP/parallel-reduction-array-lb.f90 index aa91e1e0e8b15..d9ba3bed464f8 100644 --- a/flang/test/Lower/OpenMP/parallel-reduction-array-lb.f90 +++ b/flang/test/Lower/OpenMP/parallel-reduction-array-lb.f90 @@ -12,7 +12,7 @@ program reduce end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_3x2xi32 : !fir.ref<!fir.box<!fir.array<3x2xi32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_3x2xi32 : !fir.ref<!fir.box<!fir.array<3x2xi32>>> {{.*}} alloc { ! CHECK: %[[VAL_15:.*]] = fir.alloca !fir.box<!fir.array<3x2xi32>> ! CHECK: omp.yield(%[[VAL_15]] : !fir.ref<!fir.box<!fir.array<3x2xi32>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/parallel-reduction-array.f90 b/flang/test/Lower/OpenMP/parallel-reduction-array.f90 index 59595de338d50..636660f279e85 100644 --- a/flang/test/Lower/OpenMP/parallel-reduction-array.f90 +++ b/flang/test/Lower/OpenMP/parallel-reduction-array.f90 @@ -17,7 +17,7 @@ program reduce print *,i end program -! CPU-LABEL: omp.declare_reduction @add_reduction_byref_box_3xi32 : !fir.ref<!fir.box<!fir.array<3xi32>>> alloc { +! CPU-LABEL: omp.declare_reduction @add_reduction_byref_box_3xi32 : !fir.ref<!fir.box<!fir.array<3xi32>>> attributes {byref_element_type = !fir.array<3xi32>} alloc { ! CPU: %[[VAL_8:.*]] = fir.alloca !fir.box<!fir.array<3xi32>> ! CPU: omp.yield(%[[VAL_8]] : !fir.ref<!fir.box<!fir.array<3xi32>>>) ! CPU-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/parallel-reduction-array2.f90 b/flang/test/Lower/OpenMP/parallel-reduction-array2.f90 index 14338c6f50817..9cf8a63427ed1 100644 --- a/flang/test/Lower/OpenMP/parallel-reduction-array2.f90 +++ b/flang/test/Lower/OpenMP/parallel-reduction-array2.f90 @@ -13,7 +13,7 @@ program reduce print *,i end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_3xi32 : !fir.ref<!fir.box<!fir.array<3xi32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_3xi32 : !fir.ref<!fir.box<!fir.array<3xi32>>> {{.*}} alloc { ! CHECK: %[[VAL_8:.*]] = fir.alloca !fir.box<!fir.array<3xi32>> ! CHECK: omp.yield(%[[VAL_8]] : !fir.ref<!fir.box<!fir.array<3xi32>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/parallel-reduction-pointer-array.f90 b/flang/test/Lower/OpenMP/parallel-reduction-pointer-array.f90 index 36344458d1cae..3de2ba8f61f8e 100644 --- a/flang/test/Lower/OpenMP/parallel-reduction-pointer-array.f90 +++ b/flang/test/Lower/OpenMP/parallel-reduction-pointer-array.f90 @@ -19,7 +19,7 @@ program reduce end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_ptr_Uxi32 : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_ptr_Uxi32 : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>> attributes {byref_element_type = !fir.array<?xi32>} alloc { ! CHECK: %[[VAL_3:.*]] = fir.alloca !fir.box<!fir.ptr<!fir.array<?xi32>>> ! CHECK: omp.yield(%[[VAL_3]] : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/parallel-reduction3.f90 b/flang/test/Lower/OpenMP/parallel-reduction3.f90 index 9af18378f0ae0..da337378862be 100644 --- a/flang/test/Lower/OpenMP/parallel-reduction3.f90 +++ b/flang/test/Lower/OpenMP/parallel-reduction3.f90 @@ -1,7 +1,7 @@ ! RUN: bbc -emit-hlfir -fopenmp -o - %s 2>&1 | FileCheck %s ! RUN: %flang_fc1 -emit-hlfir -fopenmp -o - %s 2>&1 | FileCheck %s -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxi32 : !fir.ref<!fir.box<!fir.array<?xi32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxi32 : !fir.ref<!fir.box<!fir.array<?xi32>>> {{.*}} alloc { ! CHECK: %[[VAL_8:.*]] = fir.alloca !fir.box<!fir.array<?xi32>> ! CHECK: omp.yield(%[[VAL_8]] : !fir.ref<!fir.box<!fir.array<?xi32>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/reduction-array-intrinsic.f90 b/flang/test/Lower/OpenMP/reduction-array-intrinsic.f90 index 8b94d51f986f5..4a0593ff9eca4 100644 --- a/flang/test/Lower/OpenMP/reduction-array-intrinsic.f90 +++ b/flang/test/Lower/OpenMP/reduction-array-intrinsic.f90 @@ -9,7 +9,7 @@ subroutine max_array_reduction(l, r) !$omp end parallel end subroutine -! CHECK-LABEL: omp.declare_reduction @max_byref_box_Uxi32 : !fir.ref<!fir.box<!fir.array<?xi32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @max_byref_box_Uxi32 : !fir.ref<!fir.box<!fir.array<?xi32>>> {{.*}} alloc { ! CHECK: %[[VAL_3:.*]] = fir.alloca !fir.box<!fir.array<?xi32>> ! CHECK: omp.yield(%[[VAL_3]] : !fir.ref<!fir.box<!fir.array<?xi32>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/sections-array-reduction.f90 b/flang/test/Lower/OpenMP/sections-array-reduction.f90 index 2f2808cebfc0c..0dbe9e3673395 100644 --- a/flang/test/Lower/OpenMP/sections-array-reduction.f90 +++ b/flang/test/Lower/OpenMP/sections-array-reduction.f90 @@ -14,7 +14,7 @@ subroutine sectionsReduction(x) end subroutine -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxf32 : !fir.ref<!fir.box<!fir.array<?xf32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxf32 : !fir.ref<!fir.box<!fir.array<?xf32>>> {{.*}} alloc { ! [...] ! CHECK: omp.yield ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/taskgroup-task-array-reduction.f90 b/flang/test/Lower/OpenMP/taskgroup-task-array-reduction.f90 index 18a4f75b86309..3a63bb09c59de 100644 --- a/flang/test/Lower/OpenMP/taskgroup-task-array-reduction.f90 +++ b/flang/test/Lower/OpenMP/taskgroup-task-array-reduction.f90 @@ -1,7 +1,7 @@ ! RUN: bbc -emit-hlfir -fopenmp -fopenmp-version=50 -o - %s 2>&1 | FileCheck %s ! RUN: %flang_fc1 -emit-hlfir -fopenmp -fopenmp-version=50 -o - %s 2>&1 | FileCheck %s -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxf32 : !fir.ref<!fir.box<!fir.array<?xf32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxf32 : !fir.ref<!fir.box<!fir.array<?xf32>>> {{.*}} alloc { ! [...] ! CHECK: omp.yield ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-allocatable-array-minmax.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-allocatable-array-minmax.f90 index 2cd953de0dffa..ed81577ecce16 100644 --- a/flang/test/Lower/OpenMP/wsloop-reduction-allocatable-array-minmax.f90 +++ b/flang/test/Lower/OpenMP/wsloop-reduction-allocatable-array-minmax.f90 @@ -32,7 +32,7 @@ program reduce15 print *,"min: ", mins end program -! CHECK-LABEL: omp.declare_reduction @min_byref_box_heap_Uxi32 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>> alloc { +! CHECK-LABEL: omp.declare_reduction @min_byref_box_heap_Uxi32 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>> {{.*}} alloc { ! CHECK: %[[VAL_3:.*]] = fir.alloca !fir.box<!fir.heap<!fir.array<?xi32>>> ! CHECK: omp.yield(%[[VAL_3]] : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>) ! CHECK-LABEL: } init { @@ -93,7 +93,7 @@ program reduce15 ! CHECK: omp.yield ! CHECK: } -! CHECK-LABEL: omp.declare_reduction @max_byref_box_heap_Uxi32 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>> alloc { +! CHECK-LABEL: omp.declare_reduction @max_byref_box_heap_Uxi32 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>> {{.*}} alloc { ! CHECK: %[[VAL_3:.*]] = fir.alloca !fir.box<!fir.heap<!fir.array<?xi32>>> ! CHECK: omp.yield(%[[VAL_3]] : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-allocatable.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-allocatable.f90 index 663851cba46c6..d8c0a36db126e 100644 --- a/flang/test/Lower/OpenMP/wsloop-reduction-allocatable.f90 +++ b/flang/test/Lower/OpenMP/wsloop-reduction-allocatable.f90 @@ -18,7 +18,7 @@ program reduce end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_heap_i32 : !fir.ref<!fir.box<!fir.heap<i32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_heap_i32 : !fir.ref<!fir.box<!fir.heap<i32>>> attributes {byref_element_type = i32} alloc { ! CHECK: %[[VAL_2:.*]] = fir.alloca !fir.box<!fir.heap<i32>> ! CHECK: omp.yield(%[[VAL_2]] : !fir.ref<!fir.box<!fir.heap<i32>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-array-assumed-shape.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-array-assumed-shape.f90 index 209ee9a4e0cef..28acb8f19531f 100644 --- a/flang/test/Lower/OpenMP/wsloop-reduction-array-assumed-shape.f90 +++ b/flang/test/Lower/OpenMP/wsloop-reduction-array-assumed-shape.f90 @@ -22,7 +22,7 @@ subroutine reduce(r) end subroutine end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxf64 : !fir.ref<!fir.box<!fir.array<?xf64>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxf64 : !fir.ref<!fir.box<!fir.array<?xf64>>> {{.*}} alloc { ! CHECK: %[[VAL_8:.*]] = fir.alloca !fir.box<!fir.array<?xf64>> ! CHECK: omp.yield(%[[VAL_8]] : !fir.ref<!fir.box<!fir.array<?xf64>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-array-lb.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-array-lb.f90 index 2233a74600948..ec448cf20f111 100644 --- a/flang/test/Lower/OpenMP/wsloop-reduction-array-lb.f90 +++ b/flang/test/Lower/OpenMP/wsloop-reduction-array-lb.f90 @@ -11,7 +11,7 @@ program reduce !$omp end parallel do end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_2xi32 : !fir.ref<!fir.box<!fir.array<2xi32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_2xi32 : !fir.ref<!fir.box<!fir.array<2xi32>>> {{.*}} alloc { ! CHECK: } combiner { ! CHECK: ^bb0(%[[ARG0:.*]]: !fir.ref<!fir.box<!fir.array<2xi32>>>, %[[ARG1:.*]]: !fir.ref<!fir.box<!fir.array<2xi32>>>): ! CHECK: %[[ARR0:.*]] = fir.load %[[ARG0]] : !fir.ref<!fir.box<!fir.array<2xi32>>> diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-array-lb2.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-array-lb2.f90 index 211bde19da8db..9da05a290ec21 100644 --- a/flang/test/Lower/OpenMP/wsloop-reduction-array-lb2.f90 +++ b/flang/test/Lower/OpenMP/wsloop-reduction-array-lb2.f90 @@ -19,7 +19,7 @@ subroutine sub(a, lb, ub) end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxi32 : !fir.ref<!fir.box<!fir.array<?xi32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxi32 : !fir.ref<!fir.box<!fir.array<?xi32>>> {{.*}} alloc { ! CHECK: } combiner { ! CHECK: ^bb0(%[[ARG0:.*]]: !fir.ref<!fir.box<!fir.array<?xi32>>>, %[[ARG1:.*]]: !fir.ref<!fir.box<!fir.array<?xi32>>>): ! CHECK: %[[ARR0:.*]] = fir.load %[[ARG0]] : !fir.ref<!fir.box<!fir.array<?xi32>>> diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-array.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-array.f90 index afaeba27c5eae..14b657c8e180d 100644 --- a/flang/test/Lower/OpenMP/wsloop-reduction-array.f90 +++ b/flang/test/Lower/OpenMP/wsloop-reduction-array.f90 @@ -14,7 +14,7 @@ program reduce print *,r end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_2xi32 : !fir.ref<!fir.box<!fir.array<2xi32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_2xi32 : !fir.ref<!fir.box<!fir.array<2xi32>>> attributes {byref_element_type = !fir.array<2xi32>} alloc { ! CHECK: %[[VAL_8:.*]] = fir.alloca !fir.box<!fir.array<2xi32>> ! CHECK: omp.yield(%[[VAL_8]] : !fir.ref<!fir.box<!fir.array<2xi32>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-array2.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-array2.f90 index 25b2e97a1b7f7..d0a0c38e4ccb1 100644 --- a/flang/test/Lower/OpenMP/wsloop-reduction-array2.f90 +++ b/flang/test/Lower/OpenMP/wsloop-reduction-array2.f90 @@ -14,7 +14,7 @@ program reduce print *,r end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_2xi32 : !fir.ref<!fir.box<!fir.array<2xi32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_2xi32 : !fir.ref<!fir.box<!fir.array<2xi32>>> {{.*}} alloc { ! CHECK: %[[VAL_8:.*]] = fir.alloca !fir.box<!fir.array<2xi32>> ! CHECK: omp.yield(%[[VAL_8]] : !fir.ref<!fir.box<!fir.array<2xi32>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-multiple-clauses.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-multiple-clauses.f90 index edd2bcb1d6be8..60a162d8f8002 100644 --- a/flang/test/Lower/OpenMP/wsloop-reduction-multiple-clauses.f90 +++ b/flang/test/Lower/OpenMP/wsloop-reduction-multiple-clauses.f90 @@ -24,7 +24,7 @@ program main endprogram -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_3x3xf64 : !fir.ref<!fir.box<!fir.array<3x3xf64>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_3x3xf64 : !fir.ref<!fir.box<!fir.array<3x3xf64>>> {{.*}} alloc { ! CHECK: %[[VAL_3:.*]] = fir.alloca !fir.box<!fir.array<3x3xf64>> ! CHECK: omp.yield(%[[VAL_3]] : !fir.ref<!fir.box<!fir.array<3x3xf64>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-pointer.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-pointer.f90 index 27b726376fbeb..f640f5caddf76 100644 --- a/flang/test/Lower/OpenMP/wsloop-reduction-pointer.f90 +++ b/flang/test/Lower/OpenMP/wsloop-reduction-pointer.f90 @@ -18,7 +18,7 @@ program reduce_pointer deallocate(v) end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_ptr_i32 : !fir.ref<!fir.box<!fir.ptr<i32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_ptr_i32 : !fir.ref<!fir.box<!fir.ptr<i32>>> {{.*}} alloc { ! CHECK: %[[VAL_3:.*]] = fir.alloca !fir.box<!fir.ptr<i32>> ! CHECK: omp.yield(%[[VAL_3]] : !fir.ref<!fir.box<!fir.ptr<i32>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/do_concurrent_reduce_allocatable.f90 b/flang/test/Lower/do_concurrent_reduce_allocatable.f90 index 873fd10dd1b97..4fb67c094b594 100644 --- a/flang/test/Lower/do_concurrent_reduce_allocatable.f90 +++ b/flang/test/Lower/do_concurrent_reduce_allocatable.f90 @@ -8,7 +8,7 @@ subroutine do_concurrent_allocatable end do end subroutine -! CHECK: fir.declare_reduction @[[RED_OP:.*]] : ![[RED_TYPE:.*]] alloc { +! CHECK: fir.declare_reduction @[[RED_OP:.*]] : ![[RED_TYPE:.*]] attributes {byref_element_type = !fir.array<?x?xf32>} alloc { ! CHECK: %[[ALLOC:.*]] = fir.alloca ! CHECK: fir.yield(%[[ALLOC]] : ![[RED_TYPE]]) ! CHECK: } init { diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h index 5331cb5abdc6f..f4192f9b49fd9 100644 --- a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h +++ b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h @@ -1448,11 +1448,15 @@ class OpenMPIRBuilder { ReductionInfo(Type *ElementType, Value *Variable, Value *PrivateVariable, EvalKind EvaluationKind, ReductionGenCBTy ReductionGen, ReductionGenClangCBTy ReductionGenClang, - ReductionGenAtomicCBTy AtomicReductionGen) + ReductionGenAtomicCBTy AtomicReductionGen, + Type *ByRefAllocatedType = nullptr, + Type *ByRefElementType = nullptr) : ElementType(ElementType), Vari... [truncated] |
| @llvm/pr-subscribers-mlir-openmp Author: Kareem Ergawy (ergawy) ChangesAdds initial support for GPU by-ref reductions. In particular, this diff adds support for reductions on scalar allocatables where reductions happen on loops nested in integer :: i real, allocatable :: scalar_alloc allocate(scalar_alloc) scalar_alloc = 0 !$omp target map(tofrom: scalar_alloc) !$omp parallel do reduction(+: scalar_alloc) do i = 1, 1000000 scalar_alloc = scalar_alloc + 1 end do !$omp end targetThis PR supports by-ref reductions on the intra- and inter-warp levels. So far, there are still steps to be takens for full support of by-ref reductions, for example:
Patch is 54.34 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/165714.diff 34 Files Affected:
diff --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp index fddeba98adccc..9a8c75073aa4c 100644 --- a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp +++ b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp @@ -1784,8 +1784,8 @@ void CGOpenMPRuntimeGPU::emitReduction( llvm::OpenMPIRBuilder::InsertPointTy AfterIP = cantFail(OMPBuilder.createReductionsGPU( - OmpLoc, AllocaIP, CodeGenIP, ReductionInfos, false, TeamsReduction, - llvm::OpenMPIRBuilder::ReductionGenCBKind::Clang, + OmpLoc, AllocaIP, CodeGenIP, ReductionInfos, {}, false, + TeamsReduction, llvm::OpenMPIRBuilder::ReductionGenCBKind::Clang, CGF.getTarget().getGridValue(), C.getLangOpts().OpenMPCUDAReductionBufNum, RTLoc)); CGF.Builder.restoreIP(AfterIP); diff --git a/flang/include/flang/Optimizer/Dialect/FIROps.td b/flang/include/flang/Optimizer/Dialect/FIROps.td index 58a317cf5d691..ff4dab1136ee9 100644 --- a/flang/include/flang/Optimizer/Dialect/FIROps.td +++ b/flang/include/flang/Optimizer/Dialect/FIROps.td @@ -3743,7 +3743,8 @@ def fir_DeclareReductionOp : fir_Op<"declare_reduction", [IsolatedFromAbove, }]; let arguments = (ins SymbolNameAttr:$sym_name, - TypeAttr:$type); + TypeAttr:$type, + OptionalAttr<TypeAttr>:$byref_element_type); let regions = (region MaxSizedRegion<1>:$allocRegion, AnyRegion:$initializerRegion, diff --git a/flang/lib/Lower/Support/ReductionProcessor.cpp b/flang/lib/Lower/Support/ReductionProcessor.cpp index 605a5b6b20b94..e02cd8fac823b 100644 --- a/flang/lib/Lower/Support/ReductionProcessor.cpp +++ b/flang/lib/Lower/Support/ReductionProcessor.cpp @@ -573,10 +573,15 @@ OpType ReductionProcessor::createDeclareReduction( mlir::OpBuilder modBuilder(module.getBodyRegion()); mlir::Type valTy = fir::unwrapRefType(type); + mlir::TypeAttr boxedTy{}; + if (!isByRef) type = valTy; - decl = OpType::create(modBuilder, loc, reductionOpName, type); + if (isByRef) + boxedTy = mlir::TypeAttr::get(fir::unwrapPassByRefType(valTy)); + + decl = OpType::create(modBuilder, loc, reductionOpName, type, boxedTy); createReductionAllocAndInitRegions(converter, loc, decl, redId, type, isByRef); diff --git a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp index 1229018bd9b3e..11609ea7b6040 100644 --- a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp +++ b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp @@ -851,7 +851,8 @@ class DoConcurrentConversion if (!ompReducer) { ompReducer = mlir::omp::DeclareReductionOp::create( rewriter, firReducer.getLoc(), ompReducerName, - firReducer.getTypeAttr().getValue()); + firReducer.getTypeAttr().getValue(), + firReducer.getByrefElementTypeAttr()); cloneFIRRegionToOMP(rewriter, firReducer.getAllocRegion(), ompReducer.getAllocRegion()); diff --git a/flang/test/Lower/OpenMP/delayed-privatization-reduction-byref.f90 b/flang/test/Lower/OpenMP/delayed-privatization-reduction-byref.f90 index 4b6a643f94059..4c7b6ac5f5f9b 100644 --- a/flang/test/Lower/OpenMP/delayed-privatization-reduction-byref.f90 +++ b/flang/test/Lower/OpenMP/delayed-privatization-reduction-byref.f90 @@ -22,7 +22,7 @@ subroutine red_and_delayed_private ! CHECK-SAME: @[[PRIVATIZER_SYM:.*]] : i32 ! CHECK-LABEL: omp.declare_reduction -! CHECK-SAME: @[[REDUCTION_SYM:.*]] : !fir.ref<i32> alloc +! CHECK-SAME: @[[REDUCTION_SYM:.*]] : !fir.ref<i32> attributes {byref_element_type = i32} alloc ! CHECK-LABEL: _QPred_and_delayed_private ! CHECK: omp.parallel diff --git a/flang/test/Lower/OpenMP/parallel-reduction-allocatable-array.f90 b/flang/test/Lower/OpenMP/parallel-reduction-allocatable-array.f90 index 41c7d69ebb3ba..f56875dcb518b 100644 --- a/flang/test/Lower/OpenMP/parallel-reduction-allocatable-array.f90 +++ b/flang/test/Lower/OpenMP/parallel-reduction-allocatable-array.f90 @@ -18,7 +18,7 @@ program reduce end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_heap_Uxi32 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_heap_Uxi32 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>> attributes {byref_element_type = !fir.array<?xi32>} alloc { ! CHECK: %[[VAL_10:.*]] = fir.alloca !fir.box<!fir.heap<!fir.array<?xi32>>> ! CHECK: omp.yield(%[[VAL_10]] : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/parallel-reduction-array-lb.f90 b/flang/test/Lower/OpenMP/parallel-reduction-array-lb.f90 index aa91e1e0e8b15..d9ba3bed464f8 100644 --- a/flang/test/Lower/OpenMP/parallel-reduction-array-lb.f90 +++ b/flang/test/Lower/OpenMP/parallel-reduction-array-lb.f90 @@ -12,7 +12,7 @@ program reduce end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_3x2xi32 : !fir.ref<!fir.box<!fir.array<3x2xi32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_3x2xi32 : !fir.ref<!fir.box<!fir.array<3x2xi32>>> {{.*}} alloc { ! CHECK: %[[VAL_15:.*]] = fir.alloca !fir.box<!fir.array<3x2xi32>> ! CHECK: omp.yield(%[[VAL_15]] : !fir.ref<!fir.box<!fir.array<3x2xi32>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/parallel-reduction-array.f90 b/flang/test/Lower/OpenMP/parallel-reduction-array.f90 index 59595de338d50..636660f279e85 100644 --- a/flang/test/Lower/OpenMP/parallel-reduction-array.f90 +++ b/flang/test/Lower/OpenMP/parallel-reduction-array.f90 @@ -17,7 +17,7 @@ program reduce print *,i end program -! CPU-LABEL: omp.declare_reduction @add_reduction_byref_box_3xi32 : !fir.ref<!fir.box<!fir.array<3xi32>>> alloc { +! CPU-LABEL: omp.declare_reduction @add_reduction_byref_box_3xi32 : !fir.ref<!fir.box<!fir.array<3xi32>>> attributes {byref_element_type = !fir.array<3xi32>} alloc { ! CPU: %[[VAL_8:.*]] = fir.alloca !fir.box<!fir.array<3xi32>> ! CPU: omp.yield(%[[VAL_8]] : !fir.ref<!fir.box<!fir.array<3xi32>>>) ! CPU-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/parallel-reduction-array2.f90 b/flang/test/Lower/OpenMP/parallel-reduction-array2.f90 index 14338c6f50817..9cf8a63427ed1 100644 --- a/flang/test/Lower/OpenMP/parallel-reduction-array2.f90 +++ b/flang/test/Lower/OpenMP/parallel-reduction-array2.f90 @@ -13,7 +13,7 @@ program reduce print *,i end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_3xi32 : !fir.ref<!fir.box<!fir.array<3xi32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_3xi32 : !fir.ref<!fir.box<!fir.array<3xi32>>> {{.*}} alloc { ! CHECK: %[[VAL_8:.*]] = fir.alloca !fir.box<!fir.array<3xi32>> ! CHECK: omp.yield(%[[VAL_8]] : !fir.ref<!fir.box<!fir.array<3xi32>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/parallel-reduction-pointer-array.f90 b/flang/test/Lower/OpenMP/parallel-reduction-pointer-array.f90 index 36344458d1cae..3de2ba8f61f8e 100644 --- a/flang/test/Lower/OpenMP/parallel-reduction-pointer-array.f90 +++ b/flang/test/Lower/OpenMP/parallel-reduction-pointer-array.f90 @@ -19,7 +19,7 @@ program reduce end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_ptr_Uxi32 : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_ptr_Uxi32 : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>> attributes {byref_element_type = !fir.array<?xi32>} alloc { ! CHECK: %[[VAL_3:.*]] = fir.alloca !fir.box<!fir.ptr<!fir.array<?xi32>>> ! CHECK: omp.yield(%[[VAL_3]] : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xi32>>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/parallel-reduction3.f90 b/flang/test/Lower/OpenMP/parallel-reduction3.f90 index 9af18378f0ae0..da337378862be 100644 --- a/flang/test/Lower/OpenMP/parallel-reduction3.f90 +++ b/flang/test/Lower/OpenMP/parallel-reduction3.f90 @@ -1,7 +1,7 @@ ! RUN: bbc -emit-hlfir -fopenmp -o - %s 2>&1 | FileCheck %s ! RUN: %flang_fc1 -emit-hlfir -fopenmp -o - %s 2>&1 | FileCheck %s -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxi32 : !fir.ref<!fir.box<!fir.array<?xi32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxi32 : !fir.ref<!fir.box<!fir.array<?xi32>>> {{.*}} alloc { ! CHECK: %[[VAL_8:.*]] = fir.alloca !fir.box<!fir.array<?xi32>> ! CHECK: omp.yield(%[[VAL_8]] : !fir.ref<!fir.box<!fir.array<?xi32>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/reduction-array-intrinsic.f90 b/flang/test/Lower/OpenMP/reduction-array-intrinsic.f90 index 8b94d51f986f5..4a0593ff9eca4 100644 --- a/flang/test/Lower/OpenMP/reduction-array-intrinsic.f90 +++ b/flang/test/Lower/OpenMP/reduction-array-intrinsic.f90 @@ -9,7 +9,7 @@ subroutine max_array_reduction(l, r) !$omp end parallel end subroutine -! CHECK-LABEL: omp.declare_reduction @max_byref_box_Uxi32 : !fir.ref<!fir.box<!fir.array<?xi32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @max_byref_box_Uxi32 : !fir.ref<!fir.box<!fir.array<?xi32>>> {{.*}} alloc { ! CHECK: %[[VAL_3:.*]] = fir.alloca !fir.box<!fir.array<?xi32>> ! CHECK: omp.yield(%[[VAL_3]] : !fir.ref<!fir.box<!fir.array<?xi32>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/sections-array-reduction.f90 b/flang/test/Lower/OpenMP/sections-array-reduction.f90 index 2f2808cebfc0c..0dbe9e3673395 100644 --- a/flang/test/Lower/OpenMP/sections-array-reduction.f90 +++ b/flang/test/Lower/OpenMP/sections-array-reduction.f90 @@ -14,7 +14,7 @@ subroutine sectionsReduction(x) end subroutine -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxf32 : !fir.ref<!fir.box<!fir.array<?xf32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxf32 : !fir.ref<!fir.box<!fir.array<?xf32>>> {{.*}} alloc { ! [...] ! CHECK: omp.yield ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/taskgroup-task-array-reduction.f90 b/flang/test/Lower/OpenMP/taskgroup-task-array-reduction.f90 index 18a4f75b86309..3a63bb09c59de 100644 --- a/flang/test/Lower/OpenMP/taskgroup-task-array-reduction.f90 +++ b/flang/test/Lower/OpenMP/taskgroup-task-array-reduction.f90 @@ -1,7 +1,7 @@ ! RUN: bbc -emit-hlfir -fopenmp -fopenmp-version=50 -o - %s 2>&1 | FileCheck %s ! RUN: %flang_fc1 -emit-hlfir -fopenmp -fopenmp-version=50 -o - %s 2>&1 | FileCheck %s -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxf32 : !fir.ref<!fir.box<!fir.array<?xf32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxf32 : !fir.ref<!fir.box<!fir.array<?xf32>>> {{.*}} alloc { ! [...] ! CHECK: omp.yield ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-allocatable-array-minmax.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-allocatable-array-minmax.f90 index 2cd953de0dffa..ed81577ecce16 100644 --- a/flang/test/Lower/OpenMP/wsloop-reduction-allocatable-array-minmax.f90 +++ b/flang/test/Lower/OpenMP/wsloop-reduction-allocatable-array-minmax.f90 @@ -32,7 +32,7 @@ program reduce15 print *,"min: ", mins end program -! CHECK-LABEL: omp.declare_reduction @min_byref_box_heap_Uxi32 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>> alloc { +! CHECK-LABEL: omp.declare_reduction @min_byref_box_heap_Uxi32 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>> {{.*}} alloc { ! CHECK: %[[VAL_3:.*]] = fir.alloca !fir.box<!fir.heap<!fir.array<?xi32>>> ! CHECK: omp.yield(%[[VAL_3]] : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>) ! CHECK-LABEL: } init { @@ -93,7 +93,7 @@ program reduce15 ! CHECK: omp.yield ! CHECK: } -! CHECK-LABEL: omp.declare_reduction @max_byref_box_heap_Uxi32 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>> alloc { +! CHECK-LABEL: omp.declare_reduction @max_byref_box_heap_Uxi32 : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>> {{.*}} alloc { ! CHECK: %[[VAL_3:.*]] = fir.alloca !fir.box<!fir.heap<!fir.array<?xi32>>> ! CHECK: omp.yield(%[[VAL_3]] : !fir.ref<!fir.box<!fir.heap<!fir.array<?xi32>>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-allocatable.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-allocatable.f90 index 663851cba46c6..d8c0a36db126e 100644 --- a/flang/test/Lower/OpenMP/wsloop-reduction-allocatable.f90 +++ b/flang/test/Lower/OpenMP/wsloop-reduction-allocatable.f90 @@ -18,7 +18,7 @@ program reduce end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_heap_i32 : !fir.ref<!fir.box<!fir.heap<i32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_heap_i32 : !fir.ref<!fir.box<!fir.heap<i32>>> attributes {byref_element_type = i32} alloc { ! CHECK: %[[VAL_2:.*]] = fir.alloca !fir.box<!fir.heap<i32>> ! CHECK: omp.yield(%[[VAL_2]] : !fir.ref<!fir.box<!fir.heap<i32>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-array-assumed-shape.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-array-assumed-shape.f90 index 209ee9a4e0cef..28acb8f19531f 100644 --- a/flang/test/Lower/OpenMP/wsloop-reduction-array-assumed-shape.f90 +++ b/flang/test/Lower/OpenMP/wsloop-reduction-array-assumed-shape.f90 @@ -22,7 +22,7 @@ subroutine reduce(r) end subroutine end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxf64 : !fir.ref<!fir.box<!fir.array<?xf64>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxf64 : !fir.ref<!fir.box<!fir.array<?xf64>>> {{.*}} alloc { ! CHECK: %[[VAL_8:.*]] = fir.alloca !fir.box<!fir.array<?xf64>> ! CHECK: omp.yield(%[[VAL_8]] : !fir.ref<!fir.box<!fir.array<?xf64>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-array-lb.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-array-lb.f90 index 2233a74600948..ec448cf20f111 100644 --- a/flang/test/Lower/OpenMP/wsloop-reduction-array-lb.f90 +++ b/flang/test/Lower/OpenMP/wsloop-reduction-array-lb.f90 @@ -11,7 +11,7 @@ program reduce !$omp end parallel do end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_2xi32 : !fir.ref<!fir.box<!fir.array<2xi32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_2xi32 : !fir.ref<!fir.box<!fir.array<2xi32>>> {{.*}} alloc { ! CHECK: } combiner { ! CHECK: ^bb0(%[[ARG0:.*]]: !fir.ref<!fir.box<!fir.array<2xi32>>>, %[[ARG1:.*]]: !fir.ref<!fir.box<!fir.array<2xi32>>>): ! CHECK: %[[ARR0:.*]] = fir.load %[[ARG0]] : !fir.ref<!fir.box<!fir.array<2xi32>>> diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-array-lb2.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-array-lb2.f90 index 211bde19da8db..9da05a290ec21 100644 --- a/flang/test/Lower/OpenMP/wsloop-reduction-array-lb2.f90 +++ b/flang/test/Lower/OpenMP/wsloop-reduction-array-lb2.f90 @@ -19,7 +19,7 @@ subroutine sub(a, lb, ub) end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxi32 : !fir.ref<!fir.box<!fir.array<?xi32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxi32 : !fir.ref<!fir.box<!fir.array<?xi32>>> {{.*}} alloc { ! CHECK: } combiner { ! CHECK: ^bb0(%[[ARG0:.*]]: !fir.ref<!fir.box<!fir.array<?xi32>>>, %[[ARG1:.*]]: !fir.ref<!fir.box<!fir.array<?xi32>>>): ! CHECK: %[[ARR0:.*]] = fir.load %[[ARG0]] : !fir.ref<!fir.box<!fir.array<?xi32>>> diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-array.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-array.f90 index afaeba27c5eae..14b657c8e180d 100644 --- a/flang/test/Lower/OpenMP/wsloop-reduction-array.f90 +++ b/flang/test/Lower/OpenMP/wsloop-reduction-array.f90 @@ -14,7 +14,7 @@ program reduce print *,r end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_2xi32 : !fir.ref<!fir.box<!fir.array<2xi32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_2xi32 : !fir.ref<!fir.box<!fir.array<2xi32>>> attributes {byref_element_type = !fir.array<2xi32>} alloc { ! CHECK: %[[VAL_8:.*]] = fir.alloca !fir.box<!fir.array<2xi32>> ! CHECK: omp.yield(%[[VAL_8]] : !fir.ref<!fir.box<!fir.array<2xi32>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-array2.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-array2.f90 index 25b2e97a1b7f7..d0a0c38e4ccb1 100644 --- a/flang/test/Lower/OpenMP/wsloop-reduction-array2.f90 +++ b/flang/test/Lower/OpenMP/wsloop-reduction-array2.f90 @@ -14,7 +14,7 @@ program reduce print *,r end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_2xi32 : !fir.ref<!fir.box<!fir.array<2xi32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_2xi32 : !fir.ref<!fir.box<!fir.array<2xi32>>> {{.*}} alloc { ! CHECK: %[[VAL_8:.*]] = fir.alloca !fir.box<!fir.array<2xi32>> ! CHECK: omp.yield(%[[VAL_8]] : !fir.ref<!fir.box<!fir.array<2xi32>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-multiple-clauses.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-multiple-clauses.f90 index edd2bcb1d6be8..60a162d8f8002 100644 --- a/flang/test/Lower/OpenMP/wsloop-reduction-multiple-clauses.f90 +++ b/flang/test/Lower/OpenMP/wsloop-reduction-multiple-clauses.f90 @@ -24,7 +24,7 @@ program main endprogram -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_3x3xf64 : !fir.ref<!fir.box<!fir.array<3x3xf64>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_3x3xf64 : !fir.ref<!fir.box<!fir.array<3x3xf64>>> {{.*}} alloc { ! CHECK: %[[VAL_3:.*]] = fir.alloca !fir.box<!fir.array<3x3xf64>> ! CHECK: omp.yield(%[[VAL_3]] : !fir.ref<!fir.box<!fir.array<3x3xf64>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-pointer.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-pointer.f90 index 27b726376fbeb..f640f5caddf76 100644 --- a/flang/test/Lower/OpenMP/wsloop-reduction-pointer.f90 +++ b/flang/test/Lower/OpenMP/wsloop-reduction-pointer.f90 @@ -18,7 +18,7 @@ program reduce_pointer deallocate(v) end program -! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_ptr_i32 : !fir.ref<!fir.box<!fir.ptr<i32>>> alloc { +! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_ptr_i32 : !fir.ref<!fir.box<!fir.ptr<i32>>> {{.*}} alloc { ! CHECK: %[[VAL_3:.*]] = fir.alloca !fir.box<!fir.ptr<i32>> ! CHECK: omp.yield(%[[VAL_3]] : !fir.ref<!fir.box<!fir.ptr<i32>>>) ! CHECK-LABEL: } init { diff --git a/flang/test/Lower/do_concurrent_reduce_allocatable.f90 b/flang/test/Lower/do_concurrent_reduce_allocatable.f90 index 873fd10dd1b97..4fb67c094b594 100644 --- a/flang/test/Lower/do_concurrent_reduce_allocatable.f90 +++ b/flang/test/Lower/do_concurrent_reduce_allocatable.f90 @@ -8,7 +8,7 @@ subroutine do_concurrent_allocatable end do end subroutine -! CHECK: fir.declare_reduction @[[RED_OP:.*]] : ![[RED_TYPE:.*]] alloc { +! CHECK: fir.declare_reduction @[[RED_OP:.*]] : ![[RED_TYPE:.*]] attributes {byref_element_type = !fir.array<?x?xf32>} alloc { ! CHECK: %[[ALLOC:.*]] = fir.alloca ! CHECK: fir.yield(%[[ALLOC]] : ![[RED_TYPE]]) ! CHECK: } init { diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h index 5331cb5abdc6f..f4192f9b49fd9 100644 --- a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h +++ b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h @@ -1448,11 +1448,15 @@ class OpenMPIRBuilder { ReductionInfo(Type *ElementType, Value *Variable, Value *PrivateVariable, EvalKind EvaluationKind, ReductionGenCBTy ReductionGen, ReductionGenClangCBTy ReductionGenClang, - ReductionGenAtomicCBTy AtomicReductionGen) + ReductionGenAtomicCBTy AtomicReductionGen, + Type *ByRefAllocatedType = nullptr, + Type *ByRefElementType = nullptr) : ElementType(ElementType), Vari... [truncated] |
e23a9e7 to 643dafc Compare | ✅ With the latest revision this PR passed the C/C++ code formatter. |
23caa56 to 1fdd4e7 Compare | // pointer to the descriptor of the by-ref reduction element. | ||
| ShuffleType = RI.ByRefElementType; | ||
| | ||
| ShuffleSrcAddr = Builder.CreateGEP(RI.ByRefAllocatedType, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is somewhat leaking the Fortran descriptor struct to the OMP IR builder. I think we can abstract this away somehow but I do not know if it will worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should do this. It defeats the purpose of OMPIRBuilder. It would be better to add optional regions to the mlir reduction op which extract the information we want and then inline them here. I suspect we will need to read much more out of the box to support variable length arrays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to add optional regions to the mlir reduction op which extract the information we want and then inline them here.
The issue I see with this is that we would extend the op to model what we need for the GPU on the MLIR level.
It defeats the purpose of OMPIRBuilder.
If we have the regions, the OMPIRBuilder will still have to be aware of them, I think. At least in the form of callbacks passed from MLIR to LLVM translation so that we inline these regions at the proper locations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it would have to understand these new regions. I was just imagining the region would allow us to keep the codegen for accessing elements of a box inside of flang. I'm worried that one day some other MLIR user may want to lower through this (e.g. maybe the efforts to represent the clang AST in MLIR) and then hit strange bugs from these bits of code that make assumptions about the use of fortran descriptors.
So something like this might work
omp.declare_reduction @add_reduction_byref_box_?xi32 : !fir.ref<!fir.box<!fir.array<?xi32>>> alloc { %0 = fir.alloca !fir.box<!fir.array<?xi32>> omp.yield(%0 : !fir.ref<!fir.box<!fir.array<?xi32>>>) } init { ^bb0(%arg0: !fir.ref<!fir.box<!fir.array<?xi32>>>, %arg1: !fir.ref<!fir.box<!fir.array<?xi32>>>): ... omp.yield(%arg1 : !fir.ref<!fir.box<!fir.array<?xi32>>>) } combiner { ^bb0(%arg0: !fir.ref<!fir.box<!fir.array<?xi32>>>, %arg1: !fir.ref<!fir.box<!fir.array<?xi32>>>): ... omp.yield(%arg0 : !fir.ref<!fir.box<!fir.array<?xi32>>>) } cleanup { ^bb0(%arg0: !fir.ref<!fir.box<!fir.array<?xi32>>>): ... omp.yield } data_size { ^bb0(%arg0: !fir.ref<!fir.box<!fir.array<?xi32>>>): %c1 = arith.constant 0 : i32 %box = fir.load %arg0 %ele_size = fir.box_elesize %box %dims:3 = fir.box_dims %box, %c1 %num_elements = ... %size = arith.muli %ele_size %num_elements omp.yield(%size) } data_addr { ^bb0(%arg0: !fir.ref<!fir.box<!fir.array<?xi32>>>): %box = fir.load %arg0 %addr = fir.box_addr %box omp.yield(%addr) } Then other lowering routes (e.g. via mlir's memref) would be free to populate these regions differently. I guess there should be some sensible default for types which aren't passed via a descriptor.
For the Fortran case, it is important we use fir.box_elesize because the size might not be known at compile time (I think one example of that is polymorphic types). If it isn't done already, this could easily be canonicalised within flang to a constant for types which do have a size known at compile time. Similarly, a constant size should be generated where possible so that MLIR can fold the whole data_size region into a constant where possible.
I'm mentioning all of this now because there is an issue asking about declare reduction inside of target regions so you might want to consider a design which is future proof. However, if you would rather not do that then I'm happy so long as we have the right error checks to ensure the compiler doesn't crash or produce nonsense code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late reply, I had to take some off. I think your suggestion would be cleaner that my current approach. The only downside I see is that, now we are leaking both Fortran and GPU-implemenation details to the op's definition. But it will be clearer what the op is encapsulating. I will prototype your suggestion and see how it looks like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm okay with this either way. It is not clear to me that any other front-end is likely to use the data_addr or data_size regions so it may be specific to flang. On the other hand, hard-coding how to access the data in the OMPIRBuilder is not as clean. Prototyping seems like a good idea. Btw, what would the default implementation be for the data_addr region, just a yield %arg0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, what would the default implementation be for the data_addr region, just a
yield %arg0?
I think it should be empty. For value reductions, it won't be printed and won't be used by any parts of the code-gen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
7da2821 contains an implementation to support scalar allocatables and const-shaped arrays.
Meinersbur left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| const ReductionInfo &RI = En.value(); | ||
| Value *LHS = RI.Variable; | ||
| Type *ValueType = RI.ElementType; | ||
| Value *RedValue = RI.Variable; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Value *RedValue = RI.Variable; | |
| Value *LHS = RI.Variable; |
Why the rename?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To match the naming used in CPU reduction.
I think RedValue gives more context for the code than just LHS.
| Thanks for the review Michael.
Yes, exactly. The main problem for reduction by reference is that, prior to this PR, we were shuffling (from remote lanes within the same warp or across different warps within the block) pointers/references to the private reduction values rather than the private reduction values themselves. The same problem is also present for inter-team reduction-value combination for which I have a fix and will open a follow-up PR. This PR only focuses on single-team (i.e. intra- and inter-warp reductions).
This is determined by the ReductionProcessor (@tblah please correct me if I am wrong here). |
Isn't |
1fdd4e7 to c02ae7b Compare
Right, I think I did not properly understand the quote your shared. |
Meinersbur left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the explanation
The main problem for reduction by reference is that, prior to this PR, we were shuffling (from remote lanes within the same warp or across different warps within the block) pointers/references to the private reduction values rather than the private reduction values themselves.
this PR makes sense to me. Would be good if it could be added to the summary.
LGTM
| if (!isByRef) | ||
| type = valTy; | ||
| | ||
| decl = OpType::create(modBuilder, loc, reductionOpName, type); | ||
| if (isByRef) | ||
| boxedTy = mlir::TypeAttr::get(fir::unwrapPassByRefType(valTy)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
| if (!isByRef) | |
| type = valTy; | |
| decl = OpType::create(modBuilder, loc, reductionOpName, type); | |
| if (isByRef) | |
| boxedTy = mlir::TypeAttr::get(fir::unwrapPassByRefType(valTy)); | |
| if (isByRef) | |
| boxedTy = mlir::TypeAttr::get(fir::unwrapPassByRefType(valTy)); | |
| else | |
| type = valTy; |
I might be mising something, but for which types that would not be the case? The attribute stores the actual type on which the reduction operator will apply so I think it should be convertible, otherwise that would mean the compiler does not support reductions by-value on that type.
No, I will try to do that.
Shuffling reinterprets the data as bytes and moves the size of a type regardless of what the type original was (see: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp#L2454). |
I think some time in the past, translation of a literal box (not a pointer to a box) from fir into LLVMIR (dialect) wasn't implemented. Boxes are weird because they have a different size depending on the rank (possibly only known at runtime) and whether the addendum is required. It may now be implemented. If you already tried all of the weirder fortran types and didn't crash the compiler then that's okay.
Thanks for the explanation. I suspect that implementation won't work correctly in cases where the element size is only known dynamically. That's okay so long as we catch these cases and refuse to compile it instead of crashing the compiler or generating broken code. |
c02ae7b to 7da2821 Compare
tblah left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM so long as the people who understand GPUs are happy too :)
I would suggest restricting the new data pointer region to only a single block, but that's up to you.
30b6e73 to d92d501 Compare
bhandarkar-pranav left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR, @ergawy. I had some comments in-flight that I decided to publish before i pick up the rest of the review first thing tomorrow. Sorry about the delay.
| mlir::OpBuilder modBuilder(module.getBodyRegion()); | ||
| mlir::Type valTy = fir::unwrapRefType(type); | ||
| // For by-ref reductions, we want to keep track of the | ||
| // boxed/referenced/allocated type. For example, a for `real, allocatable` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sed 's/a for/for a/'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| be executed after the reduction has completed. | ||
| 6. The DataPtrPtr region specifies how to access the base address of a | ||
| boxed-value. This is used, in particular, for GPU reductions in order | ||
| know where partial reduction resutls are stored in remote lanes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sed s/resutls/results/
Also, at the top of this numbered list there is a line that says
This requires two mandatory and three optional regions. It should be updated to say 4 optional regions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deon 😛
| @@ -1017,6 +1020,31 @@ makeAtomicReductionGen(omp::DeclareReductionOp decl, | |||
| return atomicGen; | |||
| } | |||
| | |||
| static OwningDataPtrPtrReductionGen | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For completeness, could you please add a comment here about this function just like the other makexxxGen functions above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
d92d501 to 98751f2 Compare Adds initial support for GPU by-ref reductions. In particular, this diff adds support for reductions on scalar allocatables where reductions happen on loops nested in `target` regions. For example: ```fortran integer :: i real, allocatable :: scalar_alloc allocate(scalar_alloc) scalar_alloc = 0 !$omp target map(tofrom: scalar_alloc) !$omp parallel do reduction(+: scalar_alloc) do i = 1, 1000000 scalar_alloc = scalar_alloc + 1 end do !$omp end target ``` This PR supports by-ref reductions on the intra- and inter-warp levels. So far, there are still steps to be takens for full support of by-ref reductions, for example: * Support inter-block value combination is still not supported. Therefore, `target teams distribute parallel do` is still not supported. * Support for dynamically-sized arrays still needs to be added. * Support for more than one allocatable/array on the same `reduction` clause.
98751f2 to 8f2acf6 Compare
jsjodin left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
bhandarkar-pranav left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Ignore the nit if you don't have any other changes to make. Thank you for this PR.
| contains the value of the thread-local reduction accumulator. This will | ||
| be executed after the reduction has completed. | ||
| 6. The DataPtrPtr region specifies how to access the base address of a | ||
| boxed-value. This is used, in particular, for GPU reductions in order |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
< nit >: in order to know
| LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/129/builds/33900 Here is the relevant piece of the build log for the reference |
The reported problem does not seem to be relevant to the PR changes. Anyone, please let me know if I missed something. |
Extends the work started in #165714 by supporting team reductions. Similar to what was done in #165714, this PR introduces proper allocations, loads, and stores for by-ref reductions in teams-related callbacks: * `_omp_reduction_list_to_global_copy_func`, * `_omp_reduction_list_to_global_reduce_func`, * `_omp_reduction_global_to_list_copy_func`, and * `_omp_reduction_global_to_list_reduce_func`.
…lvm#165714) Adds initial support for GPU by-ref reductions. The main problem for reduction by reference is that, prior to this PR, we were shuffling (from remote lanes within the same warp or across different warps within the block) pointers/references to the private reduction values rather than the private reduction values themselves. In particular, this diff adds support for reductions on scalar allocatables where reductions happen on loops nested in `target` regions. For example: ```fortran integer :: i real, allocatable :: scalar_alloc allocate(scalar_alloc) scalar_alloc = 0 !$omp target map(tofrom: scalar_alloc) !$omp parallel do reduction(+: scalar_alloc) do i = 1, 1000000 scalar_alloc = scalar_alloc + 1 end do !$omp end target ``` This PR supports by-ref reductions on the intra- and inter-warp levels. So far, there are still steps to be takens for full support of by-ref reductions, for example: * Support inter-block value combination is still not supported. Therefore, `target teams distribute parallel do` is still not supported. * Support for dynamically-sized arrays still needs to be added. * Support for more than one allocatable/array on the same `reduction` clause.
…lvm#165714) Adds initial support for GPU by-ref reductions. The main problem for reduction by reference is that, prior to this PR, we were shuffling (from remote lanes within the same warp or across different warps within the block) pointers/references to the private reduction values rather than the private reduction values themselves. In particular, this diff adds support for reductions on scalar allocatables where reductions happen on loops nested in `target` regions. For example: ```fortran integer :: i real, allocatable :: scalar_alloc allocate(scalar_alloc) scalar_alloc = 0 !$omp target map(tofrom: scalar_alloc) !$omp parallel do reduction(+: scalar_alloc) do i = 1, 1000000 scalar_alloc = scalar_alloc + 1 end do !$omp end target ``` This PR supports by-ref reductions on the intra- and inter-warp levels. So far, there are still steps to be takens for full support of by-ref reductions, for example: * Support inter-block value combination is still not supported. Therefore, `target teams distribute parallel do` is still not supported. * Support for dynamically-sized arrays still needs to be added. * Support for more than one allocatable/array on the same `reduction` clause.
Extends the work started in #165714 by supporting team reductions. Similar to what was done in #165714, this PR introduces proper allocations, loads, and stores for by-ref reductions in teams-related callbacks: * `_omp_reduction_list_to_global_copy_func`, * `_omp_reduction_list_to_global_reduce_func`, * `_omp_reduction_global_to_list_copy_func`, and * `_omp_reduction_global_to_list_reduce_func`.
… static arrays Adds smoke tests for the changes introduced in upstream PR: llvm/llvm-project#165714. These tests cannot be added as offloading tests upstream (yet) since they need to link wit h flang's device RT lib.
… static arrays (#1718) * [smoke-fort][OpenMP] Add tests for GPU reductions on allocatables and static arrays Adds smoke tests for the changes introduced in upstream PR: llvm/llvm-project#165714. These tests cannot be added as offloading tests upstream (yet) since they need to link wit h flang's device RT lib. * review comments, Michael

Adds initial support for GPU by-ref reductions. The main problem for reduction by reference is that, prior to this PR, we were shuffling (from remote lanes within the same warp or across different warps within the block) pointers/references to the private reduction values rather than the private reduction values themselves.
In particular, this diff adds support for reductions on scalar allocatables where reductions happen on loops nested in
targetregions. For example:This PR supports by-ref reductions on the intra- and inter-warp levels.
So far, there are still steps to be takens for full support of by-ref reductions, for example:
target teams distribute parallel dois still not supported.reductionclause.