Skip to content

Conversation

@yxsamliu
Copy link
Collaborator

@yxsamliu yxsamliu commented Dec 7, 2023

Add a function attribute "amdgpu-lib-fun" to indicate that the function needs special handling in backend. Basically it will not be internalized so that it will not be removed by DCE after internalization. This is to keep the library functions that are not called by users' code but will be called by instructions generated by LLVM passes or instruction selection, e.g. sanitizers or lowering of 128 bit integer divisioin.

@llvmbot llvmbot added clang Clang issues not falling into any other category backend:AMDGPU clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:codegen IR generation bugs: mangling, exceptions, etc. labels Dec 7, 2023
@llvmbot
Copy link
Member

llvmbot commented Dec 7, 2023

@llvm/pr-subscribers-clang-codegen

Author: Yaxun (Sam) Liu (yxsamliu)

Changes

Add a function attribute "amdgpu-lib-fun" to indicate that the function needs special handling in backend. Basically it will not be internalized so that it will not be removed by DCE after internalization. This is to keep the library functions that are not called by users' code but will be called by instructions generated by LLVM passes or instruction selection, e.g. sanitizers or lowering of 128 bit integer divisioin.


Full diff: https://github.com/llvm/llvm-project/pull/74737.diff

5 Files Affected:

  • (modified) clang/include/clang/Basic/Attr.td (+7)
  • (modified) clang/include/clang/Basic/AttrDocs.td (+11)
  • (modified) clang/lib/CodeGen/Targets/AMDGPU.cpp (+2)
  • (modified) clang/test/CodeGenCUDA/amdgpu-func-attrs.cu (+8)
  • (modified) clang/test/CodeGenOpenCL/amdgpu-attrs.cl (+4)
diff --git a/clang/include/clang/Basic/Attr.td b/clang/include/clang/Basic/Attr.td index 121ed203829ce..676faddd2d1ac 100644 --- a/clang/include/clang/Basic/Attr.td +++ b/clang/include/clang/Basic/Attr.td @@ -2011,6 +2011,13 @@ def AMDGPUNumVGPR : InheritableAttr { let Subjects = SubjectList<[Function], ErrorDiag, "kernel functions">; } +def AMDGPULibFun : InheritableAttr { + let Spellings = [Clang<"amdgpu_lib_fun">]; + let Documentation = [AMDGPULibFunDocs]; + let Subjects = SubjectList<[Function]>; + let SimpleHandler = 1; +} + def AMDGPUKernelCall : DeclOrTypeAttr { let Spellings = [Clang<"amdgpu_kernel">]; let Documentation = [Undocumented]; diff --git a/clang/include/clang/Basic/AttrDocs.td b/clang/include/clang/Basic/AttrDocs.td index 88f7c65e6e847..c2c77d4d1d817 100644 --- a/clang/include/clang/Basic/AttrDocs.td +++ b/clang/include/clang/Basic/AttrDocs.td @@ -2693,6 +2693,17 @@ An error will be given if: }]; } +def AMDGPULibFunDocs : Documentation { + let Category = DocCatAMDGPUAttributes; + let Content = [{ +The ``amdgpu_lib_fun`` attribute can be applied to a function for AMDGPU target +to indicate it is a library function which are handled specially in backend. +An AMDGPU library function is not internalized and can be used to fullfill +calls generated by LLVM passes or instruction selection. Unused AMDGPU library +functions will be eliminated by the backend. + }]; +} + def DocCatCallingConvs : DocumentationCategory<"Calling Conventions"> { let Content = [{ Clang supports several different calling conventions, depending on the target diff --git a/clang/lib/CodeGen/Targets/AMDGPU.cpp b/clang/lib/CodeGen/Targets/AMDGPU.cpp index b654e3f12af8d..1a56ee3692d07 100644 --- a/clang/lib/CodeGen/Targets/AMDGPU.cpp +++ b/clang/lib/CodeGen/Targets/AMDGPU.cpp @@ -356,6 +356,8 @@ void AMDGPUTargetCodeGenInfo::setFunctionDeclAttributes( if (NumVGPR != 0) F->addFnAttr("amdgpu-num-vgpr", llvm::utostr(NumVGPR)); } + if (FD->getAttr<AMDGPULibFunAttr>()) + F->addFnAttr("amdgpu-lib-fun"); } /// Emits control constants used to change per-architecture behaviour in the diff --git a/clang/test/CodeGenCUDA/amdgpu-func-attrs.cu b/clang/test/CodeGenCUDA/amdgpu-func-attrs.cu index 89add87919c12..e319cd4809e0d 100644 --- a/clang/test/CodeGenCUDA/amdgpu-func-attrs.cu +++ b/clang/test/CodeGenCUDA/amdgpu-func-attrs.cu @@ -8,6 +8,9 @@ // RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -emit-llvm \ // RUN: -o - -x hip %s -munsafe-fp-atomics \ // RUN: | FileCheck -check-prefix=NO-UNSAFE-FP-ATOMICS %s +// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa \ +// RUN: -fcuda-is-device -emit-llvm -o - -x hip %s \ +// RUN: | FileCheck %s #include "Inputs/cuda.h" @@ -15,8 +18,13 @@ __device__ void test() { // UNSAFE-FP-ATOMICS: define{{.*}} void @_Z4testv() [[ATTR:#[0-9]+]] } +__attribute__((amdgpu_lib_fun)) __device__ void lib_fun() { +// CHECK: define{{.*}} void @_Z7lib_funv() [[LIB_FUN:#[0-9]+]] +} + // Make sure this is silently accepted on other targets. // NO-UNSAFE-FP-ATOMICS-NOT: "amdgpu-unsafe-fp-atomics" // UNSAFE-FP-ATOMICS-DAG: attributes [[ATTR]] = {{.*}}"amdgpu-unsafe-fp-atomics"="true" +// CHECK-DAG: attributes [[LIB_FUN]] = {{.*}}"amdgpu-lib-fun" diff --git a/clang/test/CodeGenOpenCL/amdgpu-attrs.cl b/clang/test/CodeGenOpenCL/amdgpu-attrs.cl index b0dfc97b53b2c..bce7739c7a429 100644 --- a/clang/test/CodeGenOpenCL/amdgpu-attrs.cl +++ b/clang/test/CodeGenOpenCL/amdgpu-attrs.cl @@ -147,6 +147,9 @@ kernel void default_kernel() { // CHECK: define{{.*}} amdgpu_kernel void @default_kernel() [[DEFAULT_KERNEL_ATTRS:#[0-9]+]] } +__attribute__((amdgpu_lib_fun)) void lib_fun() { +// CHECK: define{{.*}} void @lib_fun() [[LIB_FUN:#[0-9]+]] +} // Make sure this is silently accepted on other targets. // X86-NOT: "amdgpu-flat-work-group-size" @@ -191,3 +194,4 @@ kernel void default_kernel() { // CHECK-DAG: attributes [[A_FUNCTION]] = {{.*}} // CHECK-DAG: attributes [[DEFAULT_KERNEL_ATTRS]] = {{.*}} "amdgpu-flat-work-group-size"="1,256" +// CHECK-DAG: attributes [[LIB_FUN]] = {{.*}} "amdgpu-lib-fun" 
@llvmbot
Copy link
Member

llvmbot commented Dec 7, 2023

@llvm/pr-subscribers-clang

Author: Yaxun (Sam) Liu (yxsamliu)

Changes

Add a function attribute "amdgpu-lib-fun" to indicate that the function needs special handling in backend. Basically it will not be internalized so that it will not be removed by DCE after internalization. This is to keep the library functions that are not called by users' code but will be called by instructions generated by LLVM passes or instruction selection, e.g. sanitizers or lowering of 128 bit integer divisioin.


Full diff: https://github.com/llvm/llvm-project/pull/74737.diff

5 Files Affected:

  • (modified) clang/include/clang/Basic/Attr.td (+7)
  • (modified) clang/include/clang/Basic/AttrDocs.td (+11)
  • (modified) clang/lib/CodeGen/Targets/AMDGPU.cpp (+2)
  • (modified) clang/test/CodeGenCUDA/amdgpu-func-attrs.cu (+8)
  • (modified) clang/test/CodeGenOpenCL/amdgpu-attrs.cl (+4)
diff --git a/clang/include/clang/Basic/Attr.td b/clang/include/clang/Basic/Attr.td index 121ed203829ce..676faddd2d1ac 100644 --- a/clang/include/clang/Basic/Attr.td +++ b/clang/include/clang/Basic/Attr.td @@ -2011,6 +2011,13 @@ def AMDGPUNumVGPR : InheritableAttr { let Subjects = SubjectList<[Function], ErrorDiag, "kernel functions">; } +def AMDGPULibFun : InheritableAttr { + let Spellings = [Clang<"amdgpu_lib_fun">]; + let Documentation = [AMDGPULibFunDocs]; + let Subjects = SubjectList<[Function]>; + let SimpleHandler = 1; +} + def AMDGPUKernelCall : DeclOrTypeAttr { let Spellings = [Clang<"amdgpu_kernel">]; let Documentation = [Undocumented]; diff --git a/clang/include/clang/Basic/AttrDocs.td b/clang/include/clang/Basic/AttrDocs.td index 88f7c65e6e847..c2c77d4d1d817 100644 --- a/clang/include/clang/Basic/AttrDocs.td +++ b/clang/include/clang/Basic/AttrDocs.td @@ -2693,6 +2693,17 @@ An error will be given if: }]; } +def AMDGPULibFunDocs : Documentation { + let Category = DocCatAMDGPUAttributes; + let Content = [{ +The ``amdgpu_lib_fun`` attribute can be applied to a function for AMDGPU target +to indicate it is a library function which are handled specially in backend. +An AMDGPU library function is not internalized and can be used to fullfill +calls generated by LLVM passes or instruction selection. Unused AMDGPU library +functions will be eliminated by the backend. + }]; +} + def DocCatCallingConvs : DocumentationCategory<"Calling Conventions"> { let Content = [{ Clang supports several different calling conventions, depending on the target diff --git a/clang/lib/CodeGen/Targets/AMDGPU.cpp b/clang/lib/CodeGen/Targets/AMDGPU.cpp index b654e3f12af8d..1a56ee3692d07 100644 --- a/clang/lib/CodeGen/Targets/AMDGPU.cpp +++ b/clang/lib/CodeGen/Targets/AMDGPU.cpp @@ -356,6 +356,8 @@ void AMDGPUTargetCodeGenInfo::setFunctionDeclAttributes( if (NumVGPR != 0) F->addFnAttr("amdgpu-num-vgpr", llvm::utostr(NumVGPR)); } + if (FD->getAttr<AMDGPULibFunAttr>()) + F->addFnAttr("amdgpu-lib-fun"); } /// Emits control constants used to change per-architecture behaviour in the diff --git a/clang/test/CodeGenCUDA/amdgpu-func-attrs.cu b/clang/test/CodeGenCUDA/amdgpu-func-attrs.cu index 89add87919c12..e319cd4809e0d 100644 --- a/clang/test/CodeGenCUDA/amdgpu-func-attrs.cu +++ b/clang/test/CodeGenCUDA/amdgpu-func-attrs.cu @@ -8,6 +8,9 @@ // RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -emit-llvm \ // RUN: -o - -x hip %s -munsafe-fp-atomics \ // RUN: | FileCheck -check-prefix=NO-UNSAFE-FP-ATOMICS %s +// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa \ +// RUN: -fcuda-is-device -emit-llvm -o - -x hip %s \ +// RUN: | FileCheck %s #include "Inputs/cuda.h" @@ -15,8 +18,13 @@ __device__ void test() { // UNSAFE-FP-ATOMICS: define{{.*}} void @_Z4testv() [[ATTR:#[0-9]+]] } +__attribute__((amdgpu_lib_fun)) __device__ void lib_fun() { +// CHECK: define{{.*}} void @_Z7lib_funv() [[LIB_FUN:#[0-9]+]] +} + // Make sure this is silently accepted on other targets. // NO-UNSAFE-FP-ATOMICS-NOT: "amdgpu-unsafe-fp-atomics" // UNSAFE-FP-ATOMICS-DAG: attributes [[ATTR]] = {{.*}}"amdgpu-unsafe-fp-atomics"="true" +// CHECK-DAG: attributes [[LIB_FUN]] = {{.*}}"amdgpu-lib-fun" diff --git a/clang/test/CodeGenOpenCL/amdgpu-attrs.cl b/clang/test/CodeGenOpenCL/amdgpu-attrs.cl index b0dfc97b53b2c..bce7739c7a429 100644 --- a/clang/test/CodeGenOpenCL/amdgpu-attrs.cl +++ b/clang/test/CodeGenOpenCL/amdgpu-attrs.cl @@ -147,6 +147,9 @@ kernel void default_kernel() { // CHECK: define{{.*}} amdgpu_kernel void @default_kernel() [[DEFAULT_KERNEL_ATTRS:#[0-9]+]] } +__attribute__((amdgpu_lib_fun)) void lib_fun() { +// CHECK: define{{.*}} void @lib_fun() [[LIB_FUN:#[0-9]+]] +} // Make sure this is silently accepted on other targets. // X86-NOT: "amdgpu-flat-work-group-size" @@ -191,3 +194,4 @@ kernel void default_kernel() { // CHECK-DAG: attributes [[A_FUNCTION]] = {{.*}} // CHECK-DAG: attributes [[DEFAULT_KERNEL_ATTRS]] = {{.*}} "amdgpu-flat-work-group-size"="1,256" +// CHECK-DAG: attributes [[LIB_FUN]] = {{.*}} "amdgpu-lib-fun" 
@llvmbot
Copy link
Member

llvmbot commented Dec 7, 2023

@llvm/pr-subscribers-backend-amdgpu

Author: Yaxun (Sam) Liu (yxsamliu)

Changes

Add a function attribute "amdgpu-lib-fun" to indicate that the function needs special handling in backend. Basically it will not be internalized so that it will not be removed by DCE after internalization. This is to keep the library functions that are not called by users' code but will be called by instructions generated by LLVM passes or instruction selection, e.g. sanitizers or lowering of 128 bit integer divisioin.


Full diff: https://github.com/llvm/llvm-project/pull/74737.diff

5 Files Affected:

  • (modified) clang/include/clang/Basic/Attr.td (+7)
  • (modified) clang/include/clang/Basic/AttrDocs.td (+11)
  • (modified) clang/lib/CodeGen/Targets/AMDGPU.cpp (+2)
  • (modified) clang/test/CodeGenCUDA/amdgpu-func-attrs.cu (+8)
  • (modified) clang/test/CodeGenOpenCL/amdgpu-attrs.cl (+4)
diff --git a/clang/include/clang/Basic/Attr.td b/clang/include/clang/Basic/Attr.td index 121ed203829cec..676faddd2d1aca 100644 --- a/clang/include/clang/Basic/Attr.td +++ b/clang/include/clang/Basic/Attr.td @@ -2011,6 +2011,13 @@ def AMDGPUNumVGPR : InheritableAttr { let Subjects = SubjectList<[Function], ErrorDiag, "kernel functions">; } +def AMDGPULibFun : InheritableAttr { + let Spellings = [Clang<"amdgpu_lib_fun">]; + let Documentation = [AMDGPULibFunDocs]; + let Subjects = SubjectList<[Function]>; + let SimpleHandler = 1; +} + def AMDGPUKernelCall : DeclOrTypeAttr { let Spellings = [Clang<"amdgpu_kernel">]; let Documentation = [Undocumented]; diff --git a/clang/include/clang/Basic/AttrDocs.td b/clang/include/clang/Basic/AttrDocs.td index 88f7c65e6e847b..c2c77d4d1d8171 100644 --- a/clang/include/clang/Basic/AttrDocs.td +++ b/clang/include/clang/Basic/AttrDocs.td @@ -2693,6 +2693,17 @@ An error will be given if: }]; } +def AMDGPULibFunDocs : Documentation { + let Category = DocCatAMDGPUAttributes; + let Content = [{ +The ``amdgpu_lib_fun`` attribute can be applied to a function for AMDGPU target +to indicate it is a library function which are handled specially in backend. +An AMDGPU library function is not internalized and can be used to fullfill +calls generated by LLVM passes or instruction selection. Unused AMDGPU library +functions will be eliminated by the backend. + }]; +} + def DocCatCallingConvs : DocumentationCategory<"Calling Conventions"> { let Content = [{ Clang supports several different calling conventions, depending on the target diff --git a/clang/lib/CodeGen/Targets/AMDGPU.cpp b/clang/lib/CodeGen/Targets/AMDGPU.cpp index b654e3f12af8d4..1a56ee3692d072 100644 --- a/clang/lib/CodeGen/Targets/AMDGPU.cpp +++ b/clang/lib/CodeGen/Targets/AMDGPU.cpp @@ -356,6 +356,8 @@ void AMDGPUTargetCodeGenInfo::setFunctionDeclAttributes( if (NumVGPR != 0) F->addFnAttr("amdgpu-num-vgpr", llvm::utostr(NumVGPR)); } + if (FD->getAttr<AMDGPULibFunAttr>()) + F->addFnAttr("amdgpu-lib-fun"); } /// Emits control constants used to change per-architecture behaviour in the diff --git a/clang/test/CodeGenCUDA/amdgpu-func-attrs.cu b/clang/test/CodeGenCUDA/amdgpu-func-attrs.cu index 89add87919c12d..e319cd4809e0dd 100644 --- a/clang/test/CodeGenCUDA/amdgpu-func-attrs.cu +++ b/clang/test/CodeGenCUDA/amdgpu-func-attrs.cu @@ -8,6 +8,9 @@ // RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -emit-llvm \ // RUN: -o - -x hip %s -munsafe-fp-atomics \ // RUN: | FileCheck -check-prefix=NO-UNSAFE-FP-ATOMICS %s +// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa \ +// RUN: -fcuda-is-device -emit-llvm -o - -x hip %s \ +// RUN: | FileCheck %s #include "Inputs/cuda.h" @@ -15,8 +18,13 @@ __device__ void test() { // UNSAFE-FP-ATOMICS: define{{.*}} void @_Z4testv() [[ATTR:#[0-9]+]] } +__attribute__((amdgpu_lib_fun)) __device__ void lib_fun() { +// CHECK: define{{.*}} void @_Z7lib_funv() [[LIB_FUN:#[0-9]+]] +} + // Make sure this is silently accepted on other targets. // NO-UNSAFE-FP-ATOMICS-NOT: "amdgpu-unsafe-fp-atomics" // UNSAFE-FP-ATOMICS-DAG: attributes [[ATTR]] = {{.*}}"amdgpu-unsafe-fp-atomics"="true" +// CHECK-DAG: attributes [[LIB_FUN]] = {{.*}}"amdgpu-lib-fun" diff --git a/clang/test/CodeGenOpenCL/amdgpu-attrs.cl b/clang/test/CodeGenOpenCL/amdgpu-attrs.cl index b0dfc97b53b2c5..bce7739c7a429f 100644 --- a/clang/test/CodeGenOpenCL/amdgpu-attrs.cl +++ b/clang/test/CodeGenOpenCL/amdgpu-attrs.cl @@ -147,6 +147,9 @@ kernel void default_kernel() { // CHECK: define{{.*}} amdgpu_kernel void @default_kernel() [[DEFAULT_KERNEL_ATTRS:#[0-9]+]] } +__attribute__((amdgpu_lib_fun)) void lib_fun() { +// CHECK: define{{.*}} void @lib_fun() [[LIB_FUN:#[0-9]+]] +} // Make sure this is silently accepted on other targets. // X86-NOT: "amdgpu-flat-work-group-size" @@ -191,3 +194,4 @@ kernel void default_kernel() { // CHECK-DAG: attributes [[A_FUNCTION]] = {{.*}} // CHECK-DAG: attributes [[DEFAULT_KERNEL_ATTRS]] = {{.*}} "amdgpu-flat-work-group-size"="1,256" +// CHECK-DAG: attributes [[LIB_FUN]] = {{.*}} "amdgpu-lib-fun" 
@yxsamliu
Copy link
Collaborator Author

this patch is used by #74741

@yxsamliu
Copy link
Collaborator Author

ping

@Artem-B
Copy link
Member

Artem-B commented Jan 9, 2024

This sounds like it may be useful outside of AMDGPU back-end.
@jhuber6 this is something that may come handy for implementing general library functions.

@jhuber6
Copy link
Contributor

jhuber6 commented Jan 9, 2024

My use-case is more to be able to write functions like is_wavefrontsize64() in regular C++ code. This would require some way to emit builtins for these.

I believe the use-case here is a workaround for the issues caused by library ordering? I'm guessing this is related to the problems caused by prematurely optimizing out library functions that later passes wanted to depend on.

@Artem-B
Copy link
Member

Artem-B commented Jan 9, 2024

I was thinking of implementing libm/libc for nvptx, which would produce an IR library . We'll still need to keep the functions around if they are not used explicitly, because we may need them to fulfill libcalls later in the compilation pipeline. Sort of a libdevice replacement which can be used for libcall materialization.

But you're right, with RDC object files used for offloading it's probably not necessary.

@jhuber6
Copy link
Contributor

jhuber6 commented Jan 9, 2024

I was thinking of implementing libm/libc for nvptx, which would produce an IR library . We'll still need to keep the functions around if they are not used explicitly, because we may need them to fulfill libcalls later in the compilation pipeline. Sort of a libdevice replacement which can be used for libcall materialization.

But you're right, with RDC object files used for offloading it's probably not necessary.

That's one problem I'm unsure of how to solve currently. Right now when doing LTO, there's a list of "libfuncs" that backends can emit. If the function is one of these we can't interalize / optimize out the symbol. I was attempting to relax this in https://reviews.llvm.org/D154364 at some point, because ideally we don't want to do this if the backend doesn't use them, but we don't have that logic right now.

Right now, the issue is how to handle divergence for different targets. So, for libc/libm we just build the same library N times for each architecture. This allows us to use things like __CUDA_ARCH__ and __has_builtin as normal because it has a unique file for each architecture. However, I really don't think that N files is a scalable solution and would like to be able to create generic IR for a single file. Basically I'd like to have something like libdevice.bc where it's just one file. The problem is that we don't have a good, robust way to express this. Nvidia uses their reflection you're well aware of, and AMD uses external globals which need to be resolved by some link job.

One reason I'd like this is because I'd really like to be able to provide my crt1.o and libc.a as exported targets such that someone can do clang++ --target=amdgcn-amd-amdhsa -mcpu=native foo.cpp crt1.o -lc and have it work correctly. Right now fishing out the correct file requires linker wrapper magic.

@MaskRay
Copy link
Member

MaskRay commented Jan 9, 2024

An AMDGPU library function is not internalized and can be used to fullfill calls generated by LLVM passes or instruction selection.

I am confused by the description of "internalized". Do you refer to LTO internalization? You can leverage llvm.used to disable LTO internalization.

@jhuber6
Copy link
Contributor

jhuber6 commented Jan 9, 2024

An AMDGPU library function is not internalized and can be used to fullfill calls generated by LLVM passes or instruction selection.

I am confused by the description of "internalized". Do you refer to LTO internalization? You can leverage llvm.used to disable LTO internalization.

My guess is that the function should be considered used and then thrown away by the backend.

@yxsamliu
Copy link
Collaborator Author

An AMDGPU library function is not internalized and can be used to fullfill calls generated by LLVM passes or instruction selection.

I am confused by the description of "internalized". Do you refer to LTO internalization? You can leverage llvm.used to disable LTO internalization.

Yes I mean LTO internalization. We want keep them to the backend but we also want to remove them if they are not used by the backend. llvm.used won't tell us that we can remove them since it could be specified by the users for non-amdgpu-library functions.

@jhuber6
Copy link
Contributor

jhuber6 commented Jan 12, 2024

An AMDGPU library function is not internalized and can be used to fullfill calls generated by LLVM passes or instruction selection.

I am confused by the description of "internalized". Do you refer to LTO internalization? You can leverage llvm.used to disable LTO internalization.

Yes I mean LTO internalization. We want keep them to the backend but we also want to remove them if they are not used by the backend. llvm.used won't tell us that we can remove them since it could be specified by the users for non-amdgpu-library functions.

I wonder if we could just define another llvm.used similar to llvm.compiler.used for this special case where the variable can be thrown away by the backend.

@yxsamliu
Copy link
Collaborator Author

An AMDGPU library function is not internalized and can be used to fullfill calls generated by LLVM passes or instruction selection.

I am confused by the description of "internalized". Do you refer to LTO internalization? You can leverage llvm.used to disable LTO internalization.

Yes I mean LTO internalization. We want keep them to the backend but we also want to remove them if they are not used by the backend. llvm.used won't tell us that we can remove them since it could be specified by the users for non-amdgpu-library functions.

I wonder if we could just define another llvm.used similar to llvm.compiler.used for this special case where the variable can be thrown away by the backend.

we need this attribute because AMDGPU target does not support ISA level linking, otherwise we could just link with a library after LLVM codegen. I doubt this attribute is generic enough to introduce something like llvm.used. Also, we do not need to prevent linker from discarding the symbol, therefore it is unnecessary to put them in some global variables. A simple function attribute is suffice.

Add a function attribute "amdgpu-lib-fun" to indicate that the function needs special handling in backend. Basically it will not be internalized so that it will not be removed by DCE after internalization. This is to keep the library functions that are not called by users' code but will be called by instructions generated by LLVM passes or instruction selection, e.g. sanitizers or lowering of 128 bit integer divisioin.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend:AMDGPU clang:codegen IR generation bugs: mangling, exceptions, etc. clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category

5 participants