[clang-repl][CUDA] Move CUDA module registration to beginning of global_ctors #66658
Conversation
@llvm/pr-subscribers-clang @llvm/pr-subscribers-clang-codegen

Changes: CUDA device code needs to be registered with the runtime before kernels can be launched. This is done through a global constructor. User code in the Clang interpreter is also executed through global_ctors. This patch ensures kernels can be launched in the same interpreter iteration they are defined in, by placing the registration first in the list. This allows #include-ing a large portion of code that defines device functions and also launches kernels in clang-repl.

Full diff: https://github.com/llvm/llvm-project/pull/66658.diff (2 files affected)
```diff
diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp
index 8b0c9340775cbe9..783865409c778f5 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -794,7 +794,7 @@ void CodeGenModule::Release() {
     AddGlobalCtor(ObjCInitFunction);
   if (Context.getLangOpts().CUDA && CUDARuntime) {
     if (llvm::Function *CudaCtorFunction = CUDARuntime->finalizeModule())
-      AddGlobalCtor(CudaCtorFunction);
+      AddGlobalCtor(CudaCtorFunction, 0);
   }
   if (OpenMPRuntime) {
     if (llvm::Function *OpenMPRequiresDirectiveRegFun =
diff --git a/clang/test/Interpreter/CUDA/launch-same-ptu.cu b/clang/test/Interpreter/CUDA/launch-same-ptu.cu
new file mode 100644
index 000000000000000..93e203a47212fbf
--- /dev/null
+++ b/clang/test/Interpreter/CUDA/launch-same-ptu.cu
@@ -0,0 +1,21 @@
+// Tests __device__ function calls
+// RUN: cat %s | clang-repl --cuda | FileCheck %s
+
+extern "C" int printf(const char*, ...);
+
+int var;
+int* devptr = nullptr;
+printf("cudaMalloc: %d\n", cudaMalloc((void **) &devptr, sizeof(int)));
+// CHECK: cudaMalloc: 0
+
+__device__ inline void test_device(int* value) { *value = 42; } __global__ void test_kernel(int* value) { test_device(value); } test_kernel<<<1,1>>>(devptr);
+printf("CUDA Error: %d\n", cudaGetLastError());
+// CHECK-NEXT: CUDA Error: 0
+
+printf("cudaMemcpy: %d\n", cudaMemcpy(&var, devptr, sizeof(int), cudaMemcpyDeviceToHost));
+// CHECK-NEXT: cudaMemcpy: 0
+
+printf("Value: %d\n", var);
+// CHECK-NEXT: Value: 42
+
+%quit
```
fb806d7 to bed2919 Compare

```diff
 if (Context.getLangOpts().CUDA && CUDARuntime) {
   if (llvm::Function *CudaCtorFunction = CUDARuntime->finalizeModule())
-    AddGlobalCtor(CudaCtorFunction);
+    AddGlobalCtor(CudaCtorFunction, /*Priority=*/0);
```
> User code in the Clang interpreter is also executed through global_ctors. This patch ensures kernels can be launched in the same iteration they are defined in by making the registration first in the list.
This sounds like an application-specific problem that may be addressable by lowering priority of user code initializers.
In general, I'm very reluctant to change the initialization order to be different from what NVCC generates. We do need to interoperate with NVIDIA's libraries and the change in initialization order is potentially risky. Considering that we have no practical way to test it, and that it appears to address something that affects only one application (and may be dealt with on the app level), I do not think we should change the priority for the clang-generated kernel registration code.
The underlying issue is not actually clang-repl specific; it also affects clang. For example, this seems to succeed with nvcc but fails with clang:

```cpp
#include <cstdio>

__global__ void kernel() {}

class C {
public:
  C() {
    kernel<<<1, 1>>>();
    printf("Error: %d\n", cudaGetLastError());
  }
};

C c;

int main() {}
```

This is fixed by this patch. Maybe we can look for a proper solution to this?
This is a very contrived example. While I agree that it currently does not work with CUDA, I am still not convinced that it is a problem that needs to be solved in clang.
Let's assume you've set the priority at X. Launching kernels from dynamic initializers with higher priority will still be broken, so the patch does not solve the problem conceptually.
If you set the priority of CUDA kernel initializers at the highest level (is that the intent of priority=0?), can you guarantee that kernel registration never depends on anything else that was expected to be initialized before it? We also no longer have any wiggle room to run anything before kernel registration when we need to.
@MaskRay Fangrui, WDYT about bumping dynamic initializer priority in principle? Is there anything else we need to worry about?
@Artem-B, I don’t think @argentite is pushing for this particular solution to the problem. It seems we agree that this is a problem and that the behavior of clang diverges from the reference implementation. I believe we should figure out how to fix it.

Rather than changing the priority, we could reserve a slot for the kernel registration that respects the init order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd start with checking what NVCC generates for the initializers. Considering that ultimately we need to conform to CUDA runtime expectations, and given the lack of documentation, NVCC-generated code is the only reference we have.

Compile your example with -keep and see what the NVCC-generated registration code looks like.
@argentite ping.
I have the same fear as @Artem-B: higher-than-default priorities are also sometimes reserved. We really need to see what nvcc does here, but what I could imagine (at least how I would solve it) is putting the constructor at the same priority, but before all other constructors in the list.
@argentite, could we revisit this?
cc: @hahnjo

Curious if this helps with a recent issue we spotted! If yes, I would like to debug more and try finishing up the work left here.