Skip to content

Conversation

@jdoerfert
Copy link
Member

The runtime needs to know about the acceptable launch bounds, especially if the compiler (middle- or backend) assumed those bounds. While this patch does not yet inform the runtime, it stores the bounds in a place that can/will be accessed and is associated with the kernel.

@jdoerfert jdoerfert requested review from jhuber6 and shiltian October 25, 2023 21:37
@llvmbot llvmbot added flang:openmp llvm:transforms clang:openmp OpenMP related changes to Clang openmp:libomptarget OpenMP offload runtime labels Oct 25, 2023
@llvmbot
Copy link
Member

llvmbot commented Oct 25, 2023

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-flang-openmp

Author: Johannes Doerfert (jdoerfert)

Changes

The runtime needs to know about the acceptable launch bounds, especially if the compiler (middle- or backend) assumed those bounds. While this patch does not yet inform the runtime, it stores the bounds in a place that can/will be accessed and is associated with the kernel.


Patch is 166.78 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/70257.diff

31 Files Affected:

  • (modified) llvm/include/llvm/Frontend/OpenMP/OMPKinds.def (+1-1)
  • (modified) llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp (+17-10)
  • (modified) llvm/lib/Transforms/IPO/OpenMPOpt.cpp (+29)
  • (modified) llvm/test/Transforms/OpenMP/always_inline_device.ll (+3-3)
  • (modified) llvm/test/Transforms/OpenMP/custom_state_machines.ll (+41-41)
  • (modified) llvm/test/Transforms/OpenMP/custom_state_machines_pre_lto.ll (+57-57)
  • (modified) llvm/test/Transforms/OpenMP/custom_state_machines_remarks.ll (+3-3)
  • (modified) llvm/test/Transforms/OpenMP/deduplication_target.ll (+2-2)
  • (modified) llvm/test/Transforms/OpenMP/get_hardware_num_threads_in_block_fold.ll (+7-7)
  • (modified) llvm/test/Transforms/OpenMP/global_constructor.ll (+2-2)
  • (modified) llvm/test/Transforms/OpenMP/globalization_remarks.ll (+2-2)
  • (modified) llvm/test/Transforms/OpenMP/gpu_state_machine_function_ptr_replacement.ll (+2-2)
  • (modified) llvm/test/Transforms/OpenMP/is_spmd_exec_mode_fold.ll (+9-9)
  • (modified) llvm/test/Transforms/OpenMP/nested_parallelism.ll (+5-5)
  • (modified) llvm/test/Transforms/OpenMP/parallel_deletion.ll (+25-25)
  • (modified) llvm/test/Transforms/OpenMP/parallel_level_fold.ll (+7-7)
  • (modified) llvm/test/Transforms/OpenMP/remove_globalization.ll (+4-4)
  • (modified) llvm/test/Transforms/OpenMP/replace_globalization.ll (+7-7)
  • (modified) llvm/test/Transforms/OpenMP/single_threaded_execution.ll (+2-2)
  • (modified) llvm/test/Transforms/OpenMP/spmdization.ll (+43-43)
  • (modified) llvm/test/Transforms/OpenMP/spmdization_assumes.ll (+3-3)
  • (modified) llvm/test/Transforms/OpenMP/spmdization_constant_prop.ll (+3-3)
  • (modified) llvm/test/Transforms/OpenMP/spmdization_guarding.ll (+4-4)
  • (modified) llvm/test/Transforms/OpenMP/spmdization_guarding_two_reaching_kernels.ll (+7-7)
  • (modified) llvm/test/Transforms/OpenMP/spmdization_indirect.ll (+13-13)
  • (modified) llvm/test/Transforms/OpenMP/spmdization_kernel_env_dep.ll (+3-3)
  • (modified) llvm/test/Transforms/OpenMP/spmdization_no_guarding_two_reaching_kernels.ll (+7-7)
  • (modified) llvm/test/Transforms/OpenMP/spmdization_remarks.ll (+3-3)
  • (modified) llvm/test/Transforms/OpenMP/value-simplify-openmp-opt.ll (+3-3)
  • (modified) llvm/test/Transforms/PhaseOrdering/openmp-opt-module.ll (+2-2)
  • (modified) openmp/libomptarget/include/Environment.h (+7)
diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def b/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def index 4823c4cc6b833ec..a06ec62350afae9 100644 --- a/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def +++ b/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def @@ -97,7 +97,7 @@ __OMP_STRUCT_TYPE(AsyncInfo, __tgt_async_info, false, Int8Ptr) __OMP_STRUCT_TYPE(DependInfo, kmp_dep_info, false, SizeTy, SizeTy, Int8) __OMP_STRUCT_TYPE(Task, kmp_task_ompbuilder_t, false, VoidPtr, VoidPtr, Int32, VoidPtr, VoidPtr) __OMP_STRUCT_TYPE(ConfigurationEnvironment, ConfigurationEnvironmentTy, false, - Int8, Int8, Int8) + Int8, Int8, Int8, Int32, Int32, Int32, Int32) __OMP_STRUCT_TYPE(DynamicEnvironment, DynamicEnvironmentTy, false, Int16) __OMP_STRUCT_TYPE(KernelEnvironment, KernelEnvironmentTy, false, ConfigurationEnvironment, IdentPtr, DynamicEnvironmentPtr) diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp index 5b24e9fe2e0c5bd..e3880eb39d82d7f 100644 --- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp +++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp @@ -4108,18 +4108,21 @@ OpenMPIRBuilder::createTargetInit(const LocationDescription &Loc, bool IsSPMD) { uint32_t SrcLocStrSize; Constant *SrcLocStr = getOrCreateSrcLocStr(Loc, SrcLocStrSize); Constant *Ident = getOrCreateIdent(SrcLocStr, SrcLocStrSize); - ConstantInt *IsSPMDVal = ConstantInt::getSigned( - IntegerType::getInt8Ty(Int8->getContext()), - IsSPMD ? OMP_TGT_EXEC_MODE_SPMD : OMP_TGT_EXEC_MODE_GENERIC); - ConstantInt *UseGenericStateMachineVal = ConstantInt::getSigned( - IntegerType::getInt8Ty(Int8->getContext()), !IsSPMD); - ConstantInt *MayUseNestedParallelismVal = - ConstantInt::getSigned(IntegerType::getInt8Ty(Int8->getContext()), true); - ConstantInt *DebugIndentionLevelVal = - ConstantInt::getSigned(IntegerType::getInt16Ty(Int8->getContext()), 0); + Constant *IsSPMDVal = ConstantInt::getSigned( + Int8, IsSPMD ? OMP_TGT_EXEC_MODE_SPMD : OMP_TGT_EXEC_MODE_GENERIC); + Constant *UseGenericStateMachineVal = ConstantInt::getSigned(Int8, !IsSPMD); + Constant *MayUseNestedParallelismVal = ConstantInt::getSigned(Int8, true); + Constant *DebugIndentionLevelVal = ConstantInt::getSigned(Int16, 0); - // We need to strip the debug prefix to get the correct kernel name. Function *Kernel = Builder.GetInsertBlock()->getParent(); + auto [MinThreadsVal, MaxThreadsVal] = readThreadBoundsForKernel(*Kernel); + auto [MinTeamsVal, MaxTeamsVal] = readTeamBoundsForKernel(*Kernel); + Constant *MinThreads = ConstantInt::getSigned(Int32, MinThreadsVal); + Constant *MaxThreads = ConstantInt::getSigned(Int32, MaxThreadsVal); + Constant *MinTeams = ConstantInt::getSigned(Int32, MinTeamsVal); + Constant *MaxTeams = ConstantInt::getSigned(Int32, MaxTeamsVal); + + // We need to strip the debug prefix to get the correct kernel name. StringRef KernelName = Kernel->getName(); const std::string DebugPrefix = "_debug__"; if (KernelName.ends_with(DebugPrefix)) @@ -4150,6 +4153,10 @@ OpenMPIRBuilder::createTargetInit(const LocationDescription &Loc, bool IsSPMD) { UseGenericStateMachineVal, MayUseNestedParallelismVal, IsSPMDVal, + MinThreads, + MinTeams, + MaxThreads, + MaxTeams, }); Constant *KernelEnvironmentInitializer = ConstantStruct::get( KernelEnvironment, { diff --git a/llvm/lib/Transforms/IPO/OpenMPOpt.cpp b/llvm/lib/Transforms/IPO/OpenMPOpt.cpp index e50e74ea6c0d5aa..e9dbcd334321f34 100644 --- a/llvm/lib/Transforms/IPO/OpenMPOpt.cpp +++ b/llvm/lib/Transforms/IPO/OpenMPOpt.cpp @@ -191,6 +191,10 @@ namespace KernelInfo { // uint8_t UseGenericStateMachine; // uint8_t MayUseNestedParallelism; // llvm::omp::OMPTgtExecModeFlags ExecMode; +// int32_t MinThreads; +// int32_t MaxThreads; +// int32_t MinTeams; +// int32_t MaxTeams; // }; // struct DynamicEnvironmentTy { @@ -217,6 +221,10 @@ KERNEL_ENVIRONMENT_IDX(Ident, 1) KERNEL_ENVIRONMENT_CONFIGURATION_IDX(UseGenericStateMachine, 0) KERNEL_ENVIRONMENT_CONFIGURATION_IDX(MayUseNestedParallelism, 1) KERNEL_ENVIRONMENT_CONFIGURATION_IDX(ExecMode, 2) +KERNEL_ENVIRONMENT_CONFIGURATION_IDX(MinThreads, 3) +KERNEL_ENVIRONMENT_CONFIGURATION_IDX(MaxThreads, 4) +KERNEL_ENVIRONMENT_CONFIGURATION_IDX(MinTeams, 5) +KERNEL_ENVIRONMENT_CONFIGURATION_IDX(MaxTeams, 6) #undef KERNEL_ENVIRONMENT_CONFIGURATION_IDX @@ -241,6 +249,10 @@ KERNEL_ENVIRONMENT_GETTER(Configuration, ConstantStruct) KERNEL_ENVIRONMENT_CONFIGURATION_GETTER(UseGenericStateMachine) KERNEL_ENVIRONMENT_CONFIGURATION_GETTER(MayUseNestedParallelism) KERNEL_ENVIRONMENT_CONFIGURATION_GETTER(ExecMode) +KERNEL_ENVIRONMENT_CONFIGURATION_GETTER(MinThreads) +KERNEL_ENVIRONMENT_CONFIGURATION_GETTER(MaxThreads) +KERNEL_ENVIRONMENT_CONFIGURATION_GETTER(MinTeams) +KERNEL_ENVIRONMENT_CONFIGURATION_GETTER(MaxTeams) #undef KERNEL_ENVIRONMENT_CONFIGURATION_GETTER @@ -3636,6 +3648,10 @@ struct AAKernelInfoFunction : AAKernelInfo { KERNEL_ENVIRONMENT_CONFIGURATION_SETTER(UseGenericStateMachine) KERNEL_ENVIRONMENT_CONFIGURATION_SETTER(MayUseNestedParallelism) KERNEL_ENVIRONMENT_CONFIGURATION_SETTER(ExecMode) + KERNEL_ENVIRONMENT_CONFIGURATION_SETTER(MinThreads) + KERNEL_ENVIRONMENT_CONFIGURATION_SETTER(MaxThreads) + KERNEL_ENVIRONMENT_CONFIGURATION_SETTER(MinTeams) + KERNEL_ENVIRONMENT_CONFIGURATION_SETTER(MaxTeams) #undef KERNEL_ENVIRONMENT_CONFIGURATION_SETTER @@ -3723,6 +3739,19 @@ struct AAKernelInfoFunction : AAKernelInfo { else setExecModeOfKernelEnvironment(AssumedExecModeC); + auto *Int32Ty = Type::getInt32Ty(Fn->getContext()); + auto [MinThreads, MaxThreads] = + OpenMPIRBuilder::readThreadBoundsForKernel(*Fn); + if (MinThreads) + setMinThreadsOfKernelEnvironment(ConstantInt::get(Int32Ty, MinThreads)); + if (MaxThreads) + setMaxThreadsOfKernelEnvironment(ConstantInt::get(Int32Ty, MaxThreads)); + auto [MinTeams, MaxTeams] = OpenMPIRBuilder::readTeamBoundsForKernel(*Fn); + if (MinTeams) + setMinTeamsOfKernelEnvironment(ConstantInt::get(Int32Ty, MinTeams)); + if (MaxTeams) + setMaxTeamsOfKernelEnvironment(ConstantInt::get(Int32Ty, MaxTeams)); + ConstantInt *MayUseNestedParallelismC = KernelInfo::getMayUseNestedParallelismFromKernelEnvironment(KernelEnvC); ConstantInt *AssumedMayUseNestedParallelismC = ConstantInt::get( diff --git a/llvm/test/Transforms/OpenMP/always_inline_device.ll b/llvm/test/Transforms/OpenMP/always_inline_device.ll index 5478924cf828b68..993f097d741188c 100644 --- a/llvm/test/Transforms/OpenMP/always_inline_device.ll +++ b/llvm/test/Transforms/OpenMP/always_inline_device.ll @@ -3,19 +3,19 @@ %struct.ident_t = type { i32, i32, i32, i32, ptr } %struct.KernelEnvironmentTy = type { %struct.ConfigurationEnvironmentTy, ptr, ptr } -%struct.ConfigurationEnvironmentTy = type { i8, i8, i8 } +%struct.ConfigurationEnvironmentTy = type { i8, i8, i8, i32, i32, i32, i32 } @0 = private unnamed_addr constant [23 x i8] c";unknown;unknown;0;0;;\00", align 1 @1 = private unnamed_addr constant %struct.ident_t { i32 0, i32 2, i32 0, i32 0, ptr @0 }, align 8 @G = external global i8 -@kernel_environment = local_unnamed_addr constant %struct.KernelEnvironmentTy { %struct.ConfigurationEnvironmentTy { i8 1, i8 0, i8 1 }, ptr @1, ptr null } +@kernel_environment = local_unnamed_addr constant %struct.KernelEnvironmentTy { %struct.ConfigurationEnvironmentTy { i8 1, i8 0, i8 1, i32 0, i32 0, i32 0, i32 0 }, ptr @1, ptr null } ; Function Attrs: convergent norecurse nounwind ;. ; CHECK: @[[GLOB0:[0-9]+]] = private unnamed_addr constant [23 x i8] c" ; CHECK: @[[GLOB1:[0-9]+]] = private unnamed_addr constant [[STRUCT_IDENT_T:%.*]] { i32 0, i32 2, i32 0, i32 0, ptr @[[GLOB0]] }, align 8 ; CHECK: @[[G:[a-zA-Z0-9_$"\\.-]+]] = external global i8 -; CHECK: @[[KERNEL_ENVIRONMENT:[a-zA-Z0-9_$"\\.-]+]] = local_unnamed_addr constant [[STRUCT_KERNELENVIRONMENTTY:%.*]] { [[STRUCT_CONFIGURATIONENVIRONMENTTY:%.*]] { i8 0, i8 0, i8 3 }, ptr @[[GLOB1]], ptr null } +; CHECK: @[[KERNEL_ENVIRONMENT:[a-zA-Z0-9_$"\\.-]+]] = local_unnamed_addr constant [[STRUCT_KERNELENVIRONMENTTY:%.*]] { [[STRUCT_CONFIGURATIONENVIRONMENTTY:%.*]] { i8 0, i8 0, i8 3, i32 0, i32 0, i32 0, i32 0 }, ptr @[[GLOB1]], ptr null } ;. define weak void @__omp_offloading_fd02_c0934fc2_foo_l4() #0 { ; CHECK: Function Attrs: norecurse nounwind diff --git a/llvm/test/Transforms/OpenMP/custom_state_machines.ll b/llvm/test/Transforms/OpenMP/custom_state_machines.ll index 9eefb4d5fa509e6..89a298ff195493c 100644 --- a/llvm/test/Transforms/OpenMP/custom_state_machines.ll +++ b/llvm/test/Transforms/OpenMP/custom_state_machines.ll @@ -121,7 +121,7 @@ %struct.ident_t = type { i32, i32, i32, i32, ptr } %struct.KernelEnvironmentTy = type { %struct.ConfigurationEnvironmentTy, ptr, ptr } -%struct.ConfigurationEnvironmentTy = type { i8, i8, i8 } +%struct.ConfigurationEnvironmentTy = type { i8, i8, i8, i32, i32, i32, i32 } @0 = private unnamed_addr constant [23 x i8] c";unknown;unknown;0;0;;\00", align 1 @1 = private unnamed_addr constant %struct.ident_t { i32 0, i32 2, i32 0, i32 0, ptr @0 }, align 8 @@ -129,14 +129,14 @@ @G = external global i32, align 4 @3 = private unnamed_addr constant %struct.ident_t { i32 0, i32 322, i32 2, i32 0, ptr @0 }, align 8 -@__omp_offloading_14_a36502b_no_state_machine_needed_l14_kernel_environment = local_unnamed_addr constant %struct.KernelEnvironmentTy { %struct.ConfigurationEnvironmentTy { i8 1, i8 0, i8 1 }, ptr @1, ptr null } -@__omp_offloading_14_a36502b_simple_state_machine_l22_kernel_environment = local_unnamed_addr constant %struct.KernelEnvironmentTy { %struct.ConfigurationEnvironmentTy { i8 1, i8 0, i8 1 }, ptr @1, ptr null } -@__omp_offloading_14_a36502b_simple_state_machine_interprocedural_l39_kernel_environment = local_unnamed_addr constant %struct.KernelEnvironmentTy { %struct.ConfigurationEnvironmentTy { i8 1, i8 0, i8 1 }, ptr @1, ptr null } -@__omp_offloading_14_a36502b_simple_state_machine_with_fallback_l55_kernel_environment = local_unnamed_addr constant %struct.KernelEnvironmentTy { %struct.ConfigurationEnvironmentTy { i8 1, i8 0, i8 1 }, ptr @1, ptr null } -@__omp_offloading_14_a36502b_simple_state_machine_no_openmp_attr_l66_kernel_environment = local_unnamed_addr constant %struct.KernelEnvironmentTy { %struct.ConfigurationEnvironmentTy { i8 1, i8 0, i8 1 }, ptr @1, ptr null } -@__omp_offloading_14_a36502b_simple_state_machine_pure_l77_kernel_environment = local_unnamed_addr constant %struct.KernelEnvironmentTy { %struct.ConfigurationEnvironmentTy { i8 1, i8 0, i8 1 }, ptr @1, ptr null } -@__omp_offloading_14_a36502b_simple_state_machine_interprocedural_nested_recursive_l92_kernel_environment = local_unnamed_addr constant %struct.KernelEnvironmentTy { %struct.ConfigurationEnvironmentTy { i8 1, i8 0, i8 1 }, ptr @1, ptr null } -@__omp_offloading_14_a36502b_no_state_machine_weak_callee_l112_kernel_environment = local_unnamed_addr constant %struct.KernelEnvironmentTy { %struct.ConfigurationEnvironmentTy { i8 1, i8 0, i8 1 }, ptr @1, ptr null } +@__omp_offloading_14_a36502b_no_state_machine_needed_l14_kernel_environment = local_unnamed_addr constant %struct.KernelEnvironmentTy { %struct.ConfigurationEnvironmentTy { i8 1, i8 0, i8 1, i32 0, i32 0, i32 0, i32 0 }, ptr @1, ptr null } +@__omp_offloading_14_a36502b_simple_state_machine_l22_kernel_environment = local_unnamed_addr constant %struct.KernelEnvironmentTy { %struct.ConfigurationEnvironmentTy { i8 1, i8 0, i8 1, i32 0, i32 0, i32 0, i32 0 }, ptr @1, ptr null } +@__omp_offloading_14_a36502b_simple_state_machine_interprocedural_l39_kernel_environment = local_unnamed_addr constant %struct.KernelEnvironmentTy { %struct.ConfigurationEnvironmentTy { i8 1, i8 0, i8 1, i32 0, i32 0, i32 0, i32 0 }, ptr @1, ptr null } +@__omp_offloading_14_a36502b_simple_state_machine_with_fallback_l55_kernel_environment = local_unnamed_addr constant %struct.KernelEnvironmentTy { %struct.ConfigurationEnvironmentTy { i8 1, i8 0, i8 1, i32 0, i32 0, i32 0, i32 0 }, ptr @1, ptr null } +@__omp_offloading_14_a36502b_simple_state_machine_no_openmp_attr_l66_kernel_environment = local_unnamed_addr constant %struct.KernelEnvironmentTy { %struct.ConfigurationEnvironmentTy { i8 1, i8 0, i8 1, i32 0, i32 0, i32 0, i32 0 }, ptr @1, ptr null } +@__omp_offloading_14_a36502b_simple_state_machine_pure_l77_kernel_environment = local_unnamed_addr constant %struct.KernelEnvironmentTy { %struct.ConfigurationEnvironmentTy { i8 1, i8 0, i8 1, i32 0, i32 0, i32 0, i32 0 }, ptr @1, ptr null } +@__omp_offloading_14_a36502b_simple_state_machine_interprocedural_nested_recursive_l92_kernel_environment = local_unnamed_addr constant %struct.KernelEnvironmentTy { %struct.ConfigurationEnvironmentTy { i8 1, i8 0, i8 1, i32 0, i32 0, i32 0, i32 0 }, ptr @1, ptr null } +@__omp_offloading_14_a36502b_no_state_machine_weak_callee_l112_kernel_environment = local_unnamed_addr constant %struct.KernelEnvironmentTy { %struct.ConfigurationEnvironmentTy { i8 1, i8 0, i8 1, i32 0, i32 0, i32 0, i32 0 }, ptr @1, ptr null } define weak void @__omp_offloading_14_a36502b_no_state_machine_needed_l14() #0 { entry: @@ -840,14 +840,14 @@ attributes #9 = { convergent nounwind readonly willreturn } ; AMDGPU: @[[GLOB2:[0-9]+]] = private unnamed_addr constant [[STRUCT_IDENT_T:%.*]] { i32 0, i32 2, i32 2, i32 0, ptr @[[GLOB0]] }, align 8 ; AMDGPU: @[[G:[a-zA-Z0-9_$"\\.-]+]] = external global i32, align 4 ; AMDGPU: @[[GLOB3:[0-9]+]] = private unnamed_addr constant [[STRUCT_IDENT_T:%.*]] { i32 0, i32 322, i32 2, i32 0, ptr @[[GLOB0]] }, align 8 -; AMDGPU: @[[__OMP_OFFLOADING_14_A36502B_NO_STATE_MACHINE_NEEDED_L14_KERNEL_ENVIRONMENT:[a-zA-Z0-9_$"\\.-]+]] = local_unnamed_addr constant [[STRUCT_KERNELENVIRONMENTTY:%.*]] { [[STRUCT_CONFIGURATIONENVIRONMENTTY:%.*]] { i8 0, i8 0, i8 1 }, ptr @[[GLOB1]], ptr null } -; AMDGPU: @[[__OMP_OFFLOADING_14_A36502B_SIMPLE_STATE_MACHINE_L22_KERNEL_ENVIRONMENT:[a-zA-Z0-9_$"\\.-]+]] = local_unnamed_addr constant [[STRUCT_KERNELENVIRONMENTTY:%.*]] { [[STRUCT_CONFIGURATIONENVIRONMENTTY:%.*]] { i8 0, i8 1, i8 1 }, ptr @[[GLOB1]], ptr null } -; AMDGPU: @[[__OMP_OFFLOADING_14_A36502B_SIMPLE_STATE_MACHINE_INTERPROCEDURAL_L39_KERNEL_ENVIRONMENT:[a-zA-Z0-9_$"\\.-]+]] = local_unnamed_addr constant [[STRUCT_KERNELENVIRONMENTTY:%.*]] { [[STRUCT_CONFIGURATIONENVIRONMENTTY:%.*]] { i8 0, i8 1, i8 1 }, ptr @[[GLOB1]], ptr null } -; AMDGPU: @[[__OMP_OFFLOADING_14_A36502B_SIMPLE_STATE_MACHINE_WITH_FALLBACK_L55_KERNEL_ENVIRONMENT:[a-zA-Z0-9_$"\\.-]+]] = local_unnamed_addr constant [[STRUCT_KERNELENVIRONMENTTY:%.*]] { [[STRUCT_CONFIGURATIONENVIRONMENTTY:%.*]] { i8 0, i8 1, i8 1 }, ptr @[[GLOB1]], ptr null } -; AMDGPU: @[[__OMP_OFFLOADING_14_A36502B_SIMPLE_STATE_MACHINE_NO_OPENMP_ATTR_L66_KERNEL_ENVIRONMENT:[a-zA-Z0-9_$"\\.-]+]] = local_unnamed_addr constant [[STRUCT_KERNELENVIRONMENTTY:%.*]] { [[STRUCT_CONFIGURATIONENVIRONMENTTY:%.*]] { i8 0, i8 1, i8 1 }, ptr @[[GLOB1]], ptr null } -; AMDGPU: @[[__OMP_OFFLOADING_14_A36502B_SIMPLE_STATE_MACHINE_PURE_L77_KERNEL_ENVIRONMENT:[a-zA-Z0-9_$"\\.-]+]] = local_unnamed_addr constant [[STRUCT_KERNELENVIRONMENTTY:%.*]] { [[STRUCT_CONFIGURATIONENVIRONMENTTY:%.*]] { i8 0, i8 1, i8 1 }, ptr @[[GLOB1]], ptr null } -; AMDGPU: @[[__OMP_OFFLOADING_14_A36502B_SIMPLE_STATE_MACHINE_INTERPROCEDURAL_NESTED_RECURSIVE_L92_KERNEL_ENVIRONMENT:[a-zA-Z0-9_$"\\.-]+]] = local_unnamed_addr constant [[STRUCT_KERNELENVIRONMENTTY:%.*]] { [[STRUCT_CONFIGURATIONENVIRONMENTTY:%.*]] { i8 0, i8 1, i8 3 }, ptr @[[GLOB1]], ptr null } -; AMDGPU: @[[__OMP_OFFLOADING_14_A36502B_NO_STATE_MACHINE_WEAK_CALLEE_L112_KERNEL_ENVIRONMENT:[a-zA-Z0-9_$"\\.-]+]] = local_unnamed_addr constant [[STRUCT_KERNELENVIRONMENTTY:%.*]] { [[STRUCT_CONFIGURATIONENVIRONMENTTY:%.*]] { i8 0, i8 0, i8 1 }, ptr @[[GLOB1]], ptr null } +; AMDGPU: @[[__OMP_OFFLOADING_14_A36502B_NO_STATE_MACHINE_NEEDED_L14_KERNEL_ENVIRONMENT:[a-zA-Z0-9_$"\\.-]+]] = local_unnamed_addr constant [[STRUCT_KERNELENVIRONMENTTY:%.*]] { [[STRUCT_CONFIGURATIONENVIRONMENTTY:%.*]] { i8 0, i8 0, i8 1, i32 0, i32 0, i32 0, i32 0 }, ptr @[[GLOB1]], ptr null } +; AMDGPU: @[[__OMP_OFFLOADING_14_A36502B_SIMPLE_STATE_MACHINE_L22_KERNEL_ENVIRONMENT:[a-zA-Z0-9_$"\\.-]+]] = local_unnamed_addr constant [[STRUCT_KERNELENVIRONMENTTY:%.*]] { [[STRUCT_CONFIGURATIONENVIRONMENTTY:%.*]] { i8 0, i8 1, i8 1, i32 0, i32 0, i32 0, i32 0 }, ptr @[[GLOB1]], ptr null } +; AMDGPU: @[[__OMP_OFFLOADING_14_A36502B_SIMPLE_STATE_MACHINE_INTERPROCEDURAL_L39_KERNEL_ENVIRONMENT:[a-zA-Z0-9_$"\\.-]+]] = local_unnamed_addr constant [[STRUCT_KERNELENVIRONMENTTY:%.*]] { [[STRUCT_CONFIGURATIONENVIRONMENTTY:%.*]] { i8 0, i8 1, i8 1, i32 0, i32 0, i32 0, i32 0 }, ptr @[[GLOB1]], ptr null } +; AMDGPU: @[[__OMP_OFFLOADING_14_A36502B_SIMPLE_STATE_MACHINE_WITH_FALLBACK_L55_KERNEL_ENVIRONMENT:[a-zA-Z0-9_$"\\.-]+]] = local_unnamed_addr constant [[STRUCT_KERNELENVIRONMENTTY:%.*]] { [[STRUCT_CONFIGURATIONENVIRONMENTTY:%.*]] { i8 0, i8 1, i8 1, i32 0, i32 0, i32 0, i32 0 }, ptr @[[GLOB1]], ptr null } +; AMDGPU: @[[__OMP_OFFLOADING_14_A36502B_SIMPLE_STATE_MACHINE_NO_OPENMP_ATTR_L66_KERNEL_ENVIRONMENT:[a-zA-Z0-9_$"\\.-]+]] = local_unnamed_addr constant [[STRUCT_KERNELENVIRONMENTTY:%.*]] { [[STRUCT_CONFIGURATIONENVIRONMENTTY:%.*]] { i8 0, i8 1, i8 1, i32 0, i32 0, i32 0, i32 0 }, ptr @[[GLOB1]], ptr null } +; AMDGPU: @[[__OMP_OFFLOADING_14_A36502B_SIMPLE_STATE_MACHINE_PURE_L77_KERNEL_ENVIRONMENT:[a-zA-Z0-9_$"\\.-]+]] = local_unnamed_addr constant [[STRUCT_KERNELENVIRONMENTTY:%.*]] { [[STRUCT_CONFIGURATIONENVIRONMENTTY:%.*]] { i8 0, i8 1, i8 1, i32 0, i32 0, i32 0, i32 0 }, ptr @[[GLOB1]], ptr null } +; AMDGPU: @[[__OMP_OFFLOADING_14_A36502B_SIMPLE_STATE_MACHINE_INTERPROCEDURAL_NESTED_RECURSIVE_L92_KERNEL_ENVIRONMENT:[a-zA-Z0-9_$"\\.-]+]] = local_unnamed_addr constant [[STRUCT_KERNELENVIRONMENTTY:%.*]] { [[STRUCT_CONFIGURATIONENVIRONMENTTY:%.*]] { i8 0, i8 1, i8 3, i32 0, i32 0, i32 0, i32 0 }, ptr @[[GLOB1]], ptr null } +; AMDGPU: @[[__OMP_OFFLOADING_14_A36502B_NO_STATE_MACHINE_WEAK_CALLEE_L112_KERNEL_ENVIRONMENT:[a-zA-Z0-9_$"\\.-]+]] = local_unnamed_addr constant [[STRUCT_KERNELENVIRONMENTTY:%.*]] { [[STRUCT_CONFIGURATIONENVIRONMENTTY:%.*]] { i8 0, i8 0, i8 1, i32 0, i32 0, i32 0, i32 0 }, ptr @[[GLOB1]], ptr null } ; AMDGPU: @[[__OMP_OUTLINED__2_WRAPPER_ID:[a-zA-Z0-9_$"\\.-]+]] = private constant i8 undef ; AMDGPU: @[[__OMP_OUTLINED__3_WRAPPER_ID:[a-zA-Z0-9_$"\\.-]+]] = private constant i8 undef ; AMDGPU: @[[__OMP_OUTLINED__5_WRAPPER_ID:[a-zA-Z0-9_$"\\.-]+]] = private constant i8 undef @@ -863,14 +863,14 @@ attributes #9 = { convergent nounwind readonly willreturn } ; NVPTX: @[[GLOB2:[0-9]+]] = private unnamed_addr constant [[STRUCT_IDENT_T:%.*]] { i32 0, i32 2, i32 2, i32 0, ptr @[[GLOB0]] }, align 8 ; NVPTX: @[[G:[a-zA-Z0-9_$"\\.-]+]] = external global i32, align 4 ; NVPTX: @[[GLOB3:[0-9]+]] = private unnamed_addr constant [[STRUCT_IDENT_T:%.*]] { i32 0, i32 322, i32 2, i32 0, ptr @[[GLOB0]] }, align 8 -; NVPTX: @[[__OMP_OFFLOADING_14_A36502B_NO_STATE_MACHINE_NEEDED_L14_KERNEL_ENVIRONMENT:[a-zA-Z0-9_$"\\.-]+]] = local_unnamed_addr constant [[STRUCT_KERNELENVIRONMENTTY:%.*]] { [[STRUCT_CONFIGURATIONENVIRONMENTTY:%.*]] { i8 0, i8 0, i8 1 }, ptr @[[GLOB1]], ptr null } -; NVPTX: @[[__OMP_OFFLOADING_14_A36502B_SIMPLE_STATE_MACHINE_L22_KERNEL_ENVIRONMENT:[a-zA-Z0-9_$"\\.-]+]] = local_unnamed_addr constant [[STRUCT_KERNELENVIRONMENTTY:%.*]] { [[STRUCT_CONFIGURATIONENVIRONMENTTY:%.*]] { i8 0, i8 1, i8 1 }, ptr @[[GLOB1]], ptr null } -; NVPTX: @[[__OMP_OFFLOADING_14_A36502B_SIMPLE_STATE_MACHINE_INTERPROCEDURAL_L39_KERNEL_ENVIRONMENT:[a-zA-Z0-9_$"\\.-]+]] = local_unnamed_addr constant [[STRUCT_KERNELENVIRONMENTTY:%.*]] { [[STRUCT_CONFIGURATIONENVIRONMENTTY:%.*]] { i8 0, i8 1, i8 1 }, ptr @[[GLOB1]], ptr null } -; NVPTX: @[[__OMP_OFFLOADING_14_A36502B_SIMPLE_STATE_MACHINE_WITH_FALLBACK_L55_KERNEL_ENVIRONMENT:[a-zA-Z0-9_$"\\.-]+]] = local_unnamed_addr constant [[STRUCT_KERNELENVIRONMENTTY:%.*]] { [[STRUCT_CONFIGURATIONENVIRONMENTTY:%.*]] { i8 0, i8 1, i8 1 }, ptr @[[GLOB1]], ptr null } -; NVPTX: @[[__OMP_OFFLOADING_14_A365... [truncated] 
Copy link
Contributor

@jhuber6 jhuber6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trivial change, lots of tests impacted per the norm

@jdoerfert
Copy link
Member Author

Trivial change, lots of tests impacted per the norm

I take this as a good thing, honestly, even if I don't have tooling to do the update and it takes a little longer.

The runtime needs to know about the acceptable launch bounds, especially if the compiler (middle- or backend) assumed those bounds. While this patch does not yet inform the runtime, it stores the bounds in a place that can/will be accessed and is associated with the kernel.
@jdoerfert jdoerfert force-pushed the min_max_threads_teams_kernel_env branch from 336b4d0 to 2702498 Compare October 26, 2023 21:46
@jdoerfert jdoerfert merged commit c2a1249 into llvm:main Oct 26, 2023
@joker-eph
Copy link
Collaborator

joker-eph commented Oct 26, 2023

Seems like this broke the MLIR bot, the omptarget-region-device-llvm.mlir is failing?
(the pre-merge test on this PR showed the issue: https://buildkite.com/llvm-project/github-pull-requests/builds/10573#018b6df3-e024-40ac-8445-17129f9947b7 )

joker-eph added a commit that referenced this pull request Oct 27, 2023
…Environment (#70257)" This reverts commit c2a1249. The MLIR bots are broken with an omp test failure.
joker-eph added a commit that referenced this pull request Oct 27, 2023
…e KernelEnvironment (#70257)"" This reverts commit ddbaa11. Reapply the original commit, the broken test was repaired in 5e51363 in the meantime.
@jdoerfert jdoerfert deleted the min_max_threads_teams_kernel_env branch October 29, 2023 18:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clang:openmp OpenMP related changes to Clang flang:openmp llvm:transforms openmp:libomptarget OpenMP offload runtime

4 participants