@boomanaiden154
Contributor

@boomanaiden154 boomanaiden154 commented Nov 4, 2025

This helps to clean up any dead stores that come up at the end of the destructor. The motivating example was a refactoring in libc++'s basic_string implementation in 8dae17b that added a zeroing store into the destructor, causing a large performance regression on an internal workload. We also saw a ~0.2% performance increase on an internal server workload when enabling this.

I also tested this against all of the non-flaky tests in our large C++ codebase and found a minimal number of issues that all happened to be in user code.
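To make the kind of dead store being discussed concrete, here is a minimal sketch. `TinyString` and `measure` are hypothetical names invented for this illustration and are not libc++'s `basic_string`; the assumption is only that a destructor ends with zeroing stores to its own members.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical string-like class used only for illustration.
class TinyString {
public:
  explicit TinyString(const char *s)
      : size_(std::strlen(s)), data_(new char[size_ + 1]) {
    std::memcpy(data_, s, size_ + 1);
  }
  TinyString(const TinyString &) = delete;
  TinyString &operator=(const TinyString &) = delete;

  ~TinyString() {
    delete[] data_;
    // Zeroing stores at the very end of the destructor. No valid program can
    // read these members once the destructor returns, so the stores are dead.
    // Marking `this` as dead_on_return is what lets DSE prove that.
    data_ = nullptr;
    size_ = 0;
  }

  std::size_t size() const { return size_; }

private:
  std::size_t size_; // declared first so it is initialized before data_
  char *data_;
};

// Creates and destroys a TinyString; the destructor runs on return.
std::size_t measure(const char *s) {
  TinyString t(s);
  return t.size();
}
```

The observable behavior of `measure` is the same whether or not the trailing stores are removed, which is why eliminating them is only a performance question.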

@boomanaiden154 boomanaiden154 force-pushed the clang-destructor-dead-on-return branch from 7fb6339 to 7a3dec4 Compare December 1, 2025 16:11
@boomanaiden154 boomanaiden154 marked this pull request as ready for review December 1, 2025 17:57
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:codegen IR generation bugs: mangling, exceptions, etc. clang:openmp OpenMP related changes to Clang labels Dec 1, 2025
@llvmbot
Member

llvmbot commented Dec 1, 2025

@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-clang

Author: Aiden Grossman (boomanaiden154)

Changes

This helps to clean up any dead stores that come up at the end of the destructor. The motivating example was a refactoring in libc++'s basic_string implementation in 8dae17b that added a zeroing store into the destructor, causing a large performance regression on an internal workload. We also saw a ~0.2% performance increase on an internal server workload when enabling this.

I also tested this against all of the non-flaky tests in our large C++ codebase and found a minimal number of issues that all happened to be in user code.


Patch is 5.29 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/166276.diff

109 Files Affected:

  • (modified) clang/lib/CodeGen/CGCall.cpp (+11-1)
  • (modified) clang/test/CodeGen/paren-list-agg-init.cpp (+4-4)
  • (modified) clang/test/CodeGen/temporary-lifetime.cpp (+6-6)
  • (modified) clang/test/CodeGenCXX/amdgcn-automatic-variable.cpp (+1-1)
  • (modified) clang/test/CodeGenCXX/amdgcn-func-arg.cpp (+3-3)
  • (modified) clang/test/CodeGenCXX/control-flow-in-stmt-expr.cpp (+4-4)
  • (modified) clang/test/CodeGenCXX/cxx2a-destroying-delete.cpp (+3-3)
  • (modified) clang/test/CodeGenCXX/for-range.cpp (+12-12)
  • (modified) clang/test/CodeGenCXX/gh62818.cpp (+1-1)
  • (modified) clang/test/CodeGenCXX/nrvo.cpp (+116-116)
  • (modified) clang/test/CodeGenCXX/pr13396.cpp (+2-2)
  • (modified) clang/test/CodeGenCXX/ptrauth-apple-kext-indirect-virtual-dtor-call.cpp (+3-3)
  • (modified) clang/test/CodeGenObjCXX/objc-struct-cxx-abi.mm (+3-3)
  • (modified) clang/test/CodeGenObjCXX/ptrauth-struct-cxx-abi.mm (+1-1)
  • (modified) clang/test/DebugInfo/CXX/bpf-structors.cpp (+1-1)
  • (modified) clang/test/DebugInfo/CXX/trivial_abi.cpp (+1-1)
  • (modified) clang/test/OpenMP/amdgcn_target_global_constructor.cpp (+3-3)
  • (modified) clang/test/OpenMP/distribute_firstprivate_codegen.cpp (+141-141)
  • (modified) clang/test/OpenMP/distribute_lastprivate_codegen.cpp (+145-145)
  • (modified) clang/test/OpenMP/distribute_parallel_for_firstprivate_codegen.cpp (+188-188)
  • (modified) clang/test/OpenMP/distribute_parallel_for_lastprivate_codegen.cpp (+211-211)
  • (modified) clang/test/OpenMP/distribute_parallel_for_num_threads_codegen.cpp (+24-24)
  • (modified) clang/test/OpenMP/distribute_parallel_for_private_codegen.cpp (+88-88)
  • (modified) clang/test/OpenMP/distribute_parallel_for_simd_firstprivate_codegen.cpp (+112-112)
  • (modified) clang/test/OpenMP/distribute_parallel_for_simd_lastprivate_codegen.cpp (+144-144)
  • (modified) clang/test/OpenMP/distribute_parallel_for_simd_num_threads_codegen.cpp (+510-510)
  • (modified) clang/test/OpenMP/distribute_parallel_for_simd_private_codegen.cpp (+132-132)
  • (modified) clang/test/OpenMP/distribute_private_codegen.cpp (+64-64)
  • (modified) clang/test/OpenMP/distribute_simd_firstprivate_codegen.cpp (+88-88)
  • (modified) clang/test/OpenMP/distribute_simd_lastprivate_codegen.cpp (+116-116)
  • (modified) clang/test/OpenMP/distribute_simd_private_codegen.cpp (+108-108)
  • (modified) clang/test/OpenMP/for_firstprivate_codegen.cpp (+40-40)
  • (modified) clang/test/OpenMP/for_lastprivate_codegen.cpp (+202-202)
  • (modified) clang/test/OpenMP/for_linear_codegen.cpp (+102-102)
  • (modified) clang/test/OpenMP/for_private_codegen.cpp (+33-33)
  • (modified) clang/test/OpenMP/for_reduction_codegen.cpp (+647-311)
  • (modified) clang/test/OpenMP/master_taskloop_firstprivate_codegen.cpp (+1-1)
  • (modified) clang/test/OpenMP/master_taskloop_in_reduction_codegen.cpp (+58-58)
  • (modified) clang/test/OpenMP/master_taskloop_lastprivate_codegen.cpp (+1-1)
  • (modified) clang/test/OpenMP/master_taskloop_private_codegen.cpp (+1-1)
  • (modified) clang/test/OpenMP/master_taskloop_simd_firstprivate_codegen.cpp (+1-1)
  • (modified) clang/test/OpenMP/master_taskloop_simd_in_reduction_codegen.cpp (+71-71)
  • (modified) clang/test/OpenMP/master_taskloop_simd_lastprivate_codegen.cpp (+1-1)
  • (modified) clang/test/OpenMP/master_taskloop_simd_private_codegen.cpp (+1-1)
  • (modified) clang/test/OpenMP/parallel_copyin_codegen.cpp (+70-70)
  • (modified) clang/test/OpenMP/parallel_firstprivate_codegen.cpp (+74-74)
  • (modified) clang/test/OpenMP/parallel_for_linear_codegen.cpp (+13-13)
  • (modified) clang/test/OpenMP/parallel_master_codegen.cpp (+8-8)
  • (modified) clang/test/OpenMP/parallel_master_taskloop_firstprivate_codegen.cpp (+200-200)
  • (modified) clang/test/OpenMP/parallel_master_taskloop_lastprivate_codegen.cpp (+231-231)
  • (modified) clang/test/OpenMP/parallel_master_taskloop_simd_firstprivate_codegen.cpp (+213-213)
  • (modified) clang/test/OpenMP/parallel_master_taskloop_simd_lastprivate_codegen.cpp (+306-306)
  • (modified) clang/test/OpenMP/parallel_private_codegen.cpp (+49-49)
  • (modified) clang/test/OpenMP/parallel_reduction_codegen.cpp (+113-113)
  • (modified) clang/test/OpenMP/scope_codegen.cpp (+358-358)
  • (modified) clang/test/OpenMP/sections_firstprivate_codegen.cpp (+41-41)
  • (modified) clang/test/OpenMP/sections_lastprivate_codegen.cpp (+83-83)
  • (modified) clang/test/OpenMP/sections_private_codegen.cpp (+30-30)
  • (modified) clang/test/OpenMP/sections_reduction_codegen.cpp (+43-43)
  • (modified) clang/test/OpenMP/simd_private_taskloop_codegen.cpp (+112-112)
  • (modified) clang/test/OpenMP/single_codegen.cpp (+455-455)
  • (modified) clang/test/OpenMP/single_firstprivate_codegen.cpp (+41-41)
  • (modified) clang/test/OpenMP/single_private_codegen.cpp (+30-30)
  • (modified) clang/test/OpenMP/target_has_device_addr_codegen.cpp (+65-65)
  • (modified) clang/test/OpenMP/target_in_reduction_codegen.cpp (+6-6)
  • (modified) clang/test/OpenMP/target_parallel_generic_loop_codegen-1.cpp (+84-84)
  • (modified) clang/test/OpenMP/target_teams_distribute_firstprivate_codegen.cpp (+68-68)
  • (modified) clang/test/OpenMP/target_teams_distribute_lastprivate_codegen.cpp (+118-118)
  • (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_firstprivate_codegen.cpp (+186-186)
  • (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_lastprivate_codegen.cpp (+172-172)
  • (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_private_codegen.cpp (+150-150)
  • (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_simd_firstprivate_codegen.cpp (+216-216)
  • (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_simd_lastprivate_codegen.cpp (+144-144)
  • (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_simd_private_codegen.cpp (+194-194)
  • (modified) clang/test/OpenMP/target_teams_distribute_private_codegen.cpp (+58-58)
  • (modified) clang/test/OpenMP/target_teams_distribute_simd_firstprivate_codegen.cpp (+98-98)
  • (modified) clang/test/OpenMP/target_teams_distribute_simd_lastprivate_codegen.cpp (+116-116)
  • (modified) clang/test/OpenMP/target_teams_distribute_simd_private_codegen.cpp (+106-106)
  • (modified) clang/test/OpenMP/target_teams_generic_loop_private_codegen.cpp (+102-102)
  • (modified) clang/test/OpenMP/task_codegen.cpp (+3057-1972)
  • (modified) clang/test/OpenMP/task_in_reduction_codegen.cpp (+57-57)
  • (modified) clang/test/OpenMP/taskloop_firstprivate_codegen.cpp (+1-1)
  • (modified) clang/test/OpenMP/taskloop_in_reduction_codegen.cpp (+58-58)
  • (modified) clang/test/OpenMP/taskloop_lastprivate_codegen.cpp (+1-1)
  • (modified) clang/test/OpenMP/taskloop_private_codegen.cpp (+1-1)
  • (modified) clang/test/OpenMP/taskloop_simd_firstprivate_codegen.cpp (+1-1)
  • (modified) clang/test/OpenMP/taskloop_simd_in_reduction_codegen.cpp (+71-71)
  • (modified) clang/test/OpenMP/taskloop_simd_lastprivate_codegen.cpp (+1-1)
  • (modified) clang/test/OpenMP/taskloop_simd_private_codegen.cpp (+1-1)
  • (modified) clang/test/OpenMP/teams_distribute_firstprivate_codegen.cpp (+68-68)
  • (modified) clang/test/OpenMP/teams_distribute_lastprivate_codegen.cpp (+139-139)
  • (modified) clang/test/OpenMP/teams_distribute_parallel_for_firstprivate_codegen.cpp (+100-100)
  • (modified) clang/test/OpenMP/teams_distribute_parallel_for_lastprivate_codegen.cpp (+205-205)
  • (modified) clang/test/OpenMP/teams_distribute_parallel_for_num_threads_codegen.cpp (+14-14)
  • (modified) clang/test/OpenMP/teams_distribute_parallel_for_private_codegen.cpp (+82-82)
  • (modified) clang/test/OpenMP/teams_distribute_parallel_for_simd_firstprivate_codegen.cpp (+130-130)
  • (modified) clang/test/OpenMP/teams_distribute_parallel_for_simd_lastprivate_codegen.cpp (+144-144)
  • (modified) clang/test/OpenMP/teams_distribute_parallel_for_simd_num_threads_codegen.cpp (+252-252)
  • (modified) clang/test/OpenMP/teams_distribute_parallel_for_simd_private_codegen.cpp (+130-130)
  • (modified) clang/test/OpenMP/teams_distribute_private_codegen.cpp (+58-58)
  • (modified) clang/test/OpenMP/teams_distribute_simd_firstprivate_codegen.cpp (+98-98)
  • (modified) clang/test/OpenMP/teams_distribute_simd_lastprivate_codegen.cpp (+116-116)
  • (modified) clang/test/OpenMP/teams_distribute_simd_private_codegen.cpp (+106-106)
  • (modified) clang/test/OpenMP/teams_firstprivate_codegen.cpp (+74-74)
  • (modified) clang/test/OpenMP/teams_generic_loop_private_codegen.cpp (+58-58)
  • (modified) clang/test/OpenMP/teams_private_codegen.cpp (+80-80)
  • (modified) clang/test/OpenMP/threadprivate_codegen.cpp (+392-392)
  • (modified) clang/test/utils/update_cc_test_checks/Inputs/basic-cplusplus.cpp.expected (+13-13)
  • (modified) clang/test/utils/update_cc_test_checks/Inputs/explicit-template-instantiation.cpp.expected (+3-3)
diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp
index efacb3cc04c01..ee6e13fd1c1a5 100644
--- a/clang/lib/CodeGen/CGCall.cpp
+++ b/clang/lib/CodeGen/CGCall.cpp
@@ -2767,7 +2767,8 @@ void CodeGenModule::ConstructAttributeList(StringRef Name,
   }

   // Apply `nonnull`, `dereferenceable(N)` and `align N` to the `this` argument,
-  // unless this is a thunk function.
+  // unless this is a thunk function. Add dead_on_return to the `this` argument
+  // in base class destructors to aid in DSE.
   // FIXME: fix this properly, https://reviews.llvm.org/D100388
   if (FI.isInstanceMethod() && !IRFunctionArgs.hasInallocaArg() &&
       !FI.arg_begin()->type->isVoidPointerType() && !IsThunk) {
@@ -2800,6 +2801,15 @@ void CodeGenModule::ConstructAttributeList(StringRef Name,
                           .getAsAlign();
     Attrs.addAlignmentAttr(Alignment);

+    if (isa_and_nonnull<CXXDestructorDecl>(
+            CalleeInfo.getCalleeDecl().getDecl())) {
+      auto *ClassDecl = dyn_cast<CXXRecordDecl>(
+          CalleeInfo.getCalleeDecl().getDecl()->getDeclContext());
+      if (ClassDecl->getNumBases() == 0 && ClassDecl->getNumVBases() == 0) {
+        Attrs.addAttribute(llvm::Attribute::DeadOnReturn);
+      }
+    }
+
     ArgAttrs[IRArgs.first] = llvm::AttributeSet::get(getLLVMContext(), Attrs);
   }

diff --git a/clang/test/CodeGen/paren-list-agg-init.cpp b/clang/test/CodeGen/paren-list-agg-init.cpp
index e30777ecc07d6..561bf2b5eb9c4 100644
--- a/clang/test/CodeGen/paren-list-agg-init.cpp
+++ b/clang/test/CodeGen/paren-list-agg-init.cpp
@@ -394,9 +394,9 @@ namespace gh61145 {
   // a.k.a. Vec::Vec(Vec&&)
   // CHECK-NEXT: call void @_ZN7gh611453VecC1EOS0_(ptr noundef nonnull align 1 dereferenceable(1) [[AGG_TMP_ENSURED]], ptr noundef nonnull align 1 dereferenceable(1) [[V]])
   // a.k.a. S1::~S1()
-  // CHECK-NEXT: call void @_ZN7gh611452S1D1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[AGG_TMP_ENSURED]])
+  // CHECK-NEXT: call void @_ZN7gh611452S1D1Ev(ptr dead_on_return noundef nonnull align 1 dereferenceable(1) [[AGG_TMP_ENSURED]])
   // a.k.a.Vec::~Vec()
-  // CHECK-NEXT: call void @_ZN7gh611453VecD1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[V]])
+  // CHECK-NEXT: call void @_ZN7gh611453VecD1Ev(ptr dead_on_return noundef nonnull align 1 dereferenceable(1) [[V]])
   // CHECK-NEXT: ret void
   template <int I>
   void make1() {
@@ -416,9 +416,9 @@ namespace gh61145 {
   // CHECK-NEXT: [[C:%.*c.*]] = getelementptr inbounds nuw [[STRUCT_S2]], ptr [[AGG_TMP_ENSURED]], i32 0, i32
   // CHECK-NEXT: store i8 0, ptr [[C]], align 1
   // a.k.a. S2::~S2()
-  // CHECK-NEXT: call void @_ZN7gh611452S2D1Ev(ptr noundef nonnull align 1 dereferenceable(2) [[AGG_TMP_ENSURED]])
+  // CHECK-NEXT: call void @_ZN7gh611452S2D1Ev(ptr dead_on_return noundef nonnull align 1 dereferenceable(2) [[AGG_TMP_ENSURED]])
   // a.k.a. Vec::~Vec()
-  // CHECK-NEXT: call void @_ZN7gh611453VecD1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[V]])
+  // CHECK-NEXT: call void @_ZN7gh611453VecD1Ev(ptr dead_on_return noundef nonnull align 1 dereferenceable(1) [[V]])
   // CHECK-NEXT: ret void
   template <int I>
   void make2() {
diff --git a/clang/test/CodeGen/temporary-lifetime.cpp b/clang/test/CodeGen/temporary-lifetime.cpp
index 04087292b2c70..44d1235f15c86 100644
--- a/clang/test/CodeGen/temporary-lifetime.cpp
+++ b/clang/test/CodeGen/temporary-lifetime.cpp
@@ -24,12 +24,12 @@ void Test1() {
   // CHECK-DTOR: call void @llvm.lifetime.start.p0(ptr nonnull %[[ADDR:.+]])
   // CHECK-DTOR: call void @_ZN1AC1Ev(ptr nonnull {{[^,]*}} %[[VAR:[^ ]+]])
   // CHECK-DTOR: call void @_Z3FooIRK1AEvOT_
-  // CHECK-DTOR: call void @_ZN1AD1Ev(ptr nonnull {{[^,]*}} %[[VAR]])
+  // CHECK-DTOR: call void @_ZN1AD1Ev(ptr dead_on_return nonnull {{[^,]*}} %[[VAR]])
   // CHECK-DTOR: call void @llvm.lifetime.end.p0(ptr nonnull %[[ADDR]])
   // CHECK-DTOR: call void @llvm.lifetime.start.p0(ptr nonnull %[[ADDR:.+]])
   // CHECK-DTOR: call void @_ZN1AC1Ev(ptr nonnull {{[^,]*}} %[[VAR:[^ ]+]])
   // CHECK-DTOR: call void @_Z3FooIRK1AEvOT_
-  // CHECK-DTOR: call void @_ZN1AD1Ev(ptr nonnull {{[^,]*}} %[[VAR]])
+  // CHECK-DTOR: call void @_ZN1AD1Ev(ptr dead_on_return nonnull {{[^,]*}} %[[VAR]])
   // CHECK-DTOR: call void @llvm.lifetime.end.p0(ptr nonnull %[[ADDR]])
   // CHECK-DTOR: }

@@ -61,9 +61,9 @@ void Test2() {
   // CHECK-DTOR: call void @llvm.lifetime.start.p0(ptr nonnull %[[ADDR2:.+]])
   // CHECK-DTOR: call void @_ZN1AC1Ev(ptr nonnull {{[^,]*}} %[[VAR2:[^ ]+]])
   // CHECK-DTOR: call void @_Z3FooIRK1AEvOT_
-  // CHECK-DTOR: call void @_ZN1AD1Ev(ptr nonnull {{[^,]*}} %[[VAR2]])
+  // CHECK-DTOR: call void @_ZN1AD1Ev(ptr dead_on_return nonnull {{[^,]*}} %[[VAR2]])
   // CHECK-DTOR: call void @llvm.lifetime.end.p0(ptr nonnull %[[ADDR2]])
-  // CHECK-DTOR: call void @_ZN1AD1Ev(ptr nonnull {{[^,]*}} %[[VAR1]])
+  // CHECK-DTOR: call void @_ZN1AD1Ev(ptr dead_on_return nonnull {{[^,]*}} %[[VAR1]])
   // CHECK-DTOR: call void @llvm.lifetime.end.p0(ptr nonnull %[[ADDR1]])
   // CHECK-DTOR: }

@@ -155,12 +155,12 @@ void Test7() {
   // CHECK-DTOR: call void @llvm.lifetime.start.p0(ptr nonnull %[[ADDR:.+]])
   // CHECK-DTOR: call void @_Z3BazI1AET_v({{.*}} %[[SLOT:[^ ]+]])
   // CHECK-DTOR: call void @_Z3FooI1AEvOT_({{.*}} %[[SLOT]])
-  // CHECK-DTOR: call void @_ZN1AD1Ev(ptr nonnull {{[^,]*}} %[[SLOT]])
+  // CHECK-DTOR: call void @_ZN1AD1Ev(ptr dead_on_return nonnull {{[^,]*}} %[[SLOT]])
   // CHECK-DTOR: call void @llvm.lifetime.end.p0(ptr nonnull %[[ADDR]])
   // CHECK-DTOR: call void @llvm.lifetime.start.p0(ptr nonnull %[[ADDR:.+]])
   // CHECK-DTOR: call void @_Z3BazI1AET_v({{.*}} %[[SLOT:[^ ]+]])
   // CHECK-DTOR: call void @_Z3FooI1AEvOT_({{.*}} %[[SLOT]])
-  // CHECK-DTOR: call void @_ZN1AD1Ev(ptr nonnull {{[^,]*}} %[[SLOT]])
+  // CHECK-DTOR: call void @_ZN1AD1Ev(ptr dead_on_return nonnull {{[^,]*}} %[[SLOT]])
   // CHECK-DTOR: call void @llvm.lifetime.end.p0(ptr nonnull %[[ADDR]])
   // CHECK-DTOR: }
   Foo(Baz<A>());
diff --git a/clang/test/CodeGenCXX/amdgcn-automatic-variable.cpp b/clang/test/CodeGenCXX/amdgcn-automatic-variable.cpp
index 3c2a624bd4f95..e05f8133321c7 100644
--- a/clang/test/CodeGenCXX/amdgcn-automatic-variable.cpp
+++ b/clang/test/CodeGenCXX/amdgcn-automatic-variable.cpp
@@ -75,7 +75,7 @@ int x;
 // CHECK-NEXT: [[A:%.*]] = alloca [[CLASS_A:%.*]], align 4, addrspace(5)
 // CHECK-NEXT: [[A_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[A]] to ptr
 // CHECK-NEXT: call void @_ZN1AC1Ev(ptr noundef nonnull align 4 dereferenceable(4) [[A_ASCAST]])
-// CHECK-NEXT: call void @_ZN1AD1Ev(ptr noundef nonnull align 4 dereferenceable(4) [[A_ASCAST]])
+// CHECK-NEXT: call void @_ZN1AD1Ev(ptr dead_on_return noundef nonnull align 4 dereferenceable(4) [[A_ASCAST]])
 // CHECK-NEXT: ret void
 //
 void func3() {
diff --git a/clang/test/CodeGenCXX/amdgcn-func-arg.cpp b/clang/test/CodeGenCXX/amdgcn-func-arg.cpp
index a5f83dc91b038..bc20c33ec4d0f 100644
--- a/clang/test/CodeGenCXX/amdgcn-func-arg.cpp
+++ b/clang/test/CodeGenCXX/amdgcn-func-arg.cpp
@@ -43,9 +43,9 @@ void func_with_indirect_arg(A a) {
 // CHECK-NEXT: call void @llvm.memcpy.p0.p0.i64(ptr align 4 [[AGG_TMP_ASCAST]], ptr align 4 [[A_ASCAST]], i64 4, i1 false)
 // CHECK-NEXT: [[AGG_TMP_ASCAST_ASCAST:%.*]] = addrspacecast ptr [[AGG_TMP_ASCAST]] to ptr addrspace(5)
 // CHECK-NEXT: call void @_Z22func_with_indirect_arg1A(ptr addrspace(5) noundef [[AGG_TMP_ASCAST_ASCAST]])
-// CHECK-NEXT: call void @_ZN1AD1Ev(ptr noundef nonnull align 4 dereferenceable(4) [[AGG_TMP_ASCAST]])
+// CHECK-NEXT: call void @_ZN1AD1Ev(ptr dead_on_return noundef nonnull align 4 dereferenceable(4) [[AGG_TMP_ASCAST]])
 // CHECK-NEXT: call void @_Z17func_with_ref_argR1A(ptr noundef nonnull align 4 dereferenceable(4) [[A_ASCAST]])
-// CHECK-NEXT: call void @_ZN1AD1Ev(ptr noundef nonnull align 4 dereferenceable(4) [[A_ASCAST]])
+// CHECK-NEXT: call void @_ZN1AD1Ev(ptr dead_on_return noundef nonnull align 4 dereferenceable(4) [[A_ASCAST]])
 // CHECK-NEXT: ret void
 //
 void test_indirect_arg_auto() {
@@ -61,7 +61,7 @@ void test_indirect_arg_auto() {
 // CHECK-NEXT: call void @llvm.memcpy.p0.p0.i64(ptr align 4 [[AGG_TMP_ASCAST]], ptr align 4 addrspacecast (ptr addrspace(1) @g_a to ptr), i64 4, i1 false)
 // CHECK-NEXT: [[AGG_TMP_ASCAST_ASCAST:%.*]] = addrspacecast ptr [[AGG_TMP_ASCAST]] to ptr addrspace(5)
 // CHECK-NEXT: call void @_Z22func_with_indirect_arg1A(ptr addrspace(5) noundef [[AGG_TMP_ASCAST_ASCAST]])
-// CHECK-NEXT: call void @_ZN1AD1Ev(ptr noundef nonnull align 4 dereferenceable(4) [[AGG_TMP_ASCAST]])
+// CHECK-NEXT: call void @_ZN1AD1Ev(ptr dead_on_return noundef nonnull align 4 dereferenceable(4) [[AGG_TMP_ASCAST]])
 // CHECK-NEXT: call void @_Z17func_with_ref_argR1A(ptr noundef nonnull align 4 dereferenceable(4) addrspacecast (ptr addrspace(1) @g_a to ptr))
 // CHECK-NEXT: ret void
 //
diff --git a/clang/test/CodeGenCXX/control-flow-in-stmt-expr.cpp b/clang/test/CodeGenCXX/control-flow-in-stmt-expr.cpp
index 4eafa720e0cb4..a764ba31539eb 100644
--- a/clang/test/CodeGenCXX/control-flow-in-stmt-expr.cpp
+++ b/clang/test/CodeGenCXX/control-flow-in-stmt-expr.cpp
@@ -217,7 +217,7 @@ void ArrayInit() {
   // CHECK: [[ARRAY_DESTROY_BODY2]]:
   // CHECK-NEXT: %arraydestroy.elementPast = phi ptr [ %1, %cleanup ], [ %arraydestroy.element, %[[ARRAY_DESTROY_BODY2]] ]
   // CHECK-NEXT: %arraydestroy.element = getelementptr inbounds %struct.Printy, ptr %arraydestroy.elementPast, i64 -1
-  // CHECK-NEXT: call void @_ZN6PrintyD1Ev(ptr noundef nonnull align 8 dereferenceable(8) %arraydestroy.element)
+  // CHECK-NEXT: call void @_ZN6PrintyD1Ev(ptr dead_on_return noundef nonnull align 8 dereferenceable(8) %arraydestroy.element)
   // CHECK-NEXT: %arraydestroy.done = icmp eq ptr %arraydestroy.element, %arr
   // CHECK-NEXT: br i1 %arraydestroy.done, label %[[ARRAY_DESTROY_DONE2]], label %[[ARRAY_DESTROY_BODY2]]

@@ -265,7 +265,7 @@ void ArraySubobjects() {
   // CHECK: [[ARRAY_DESTROY_BODY]]:
   // CHECK-NEXT: %arraydestroy.elementPast = phi ptr [ %0, %if.then ], [ %arraydestroy.element, %[[ARRAY_DESTROY_BODY]] ]
   // CHECK-NEXT: %arraydestroy.element = getelementptr inbounds %struct.Printy, ptr %arraydestroy.elementPast, i64 -1
-  // CHECK-NEXT: call void @_ZN6PrintyD1Ev(ptr noundef nonnull align 8 dereferenceable(8) %arraydestroy.element)
+  // CHECK-NEXT: call void @_ZN6PrintyD1Ev(ptr dead_on_return noundef nonnull align 8 dereferenceable(8) %arraydestroy.element)
   // CHECK-NEXT: %arraydestroy.done = icmp eq ptr %arraydestroy.element, %arr2
   // CHECK-NEXT: br i1 %arraydestroy.done, label %[[ARRAY_DESTROY_DONE]], label %[[ARRAY_DESTROY_BODY]]

@@ -277,7 +277,7 @@ void ArraySubobjects() {
   // CHECK: [[ARRAY_DESTROY_BODY2]]:
   // CHECK-NEXT: %arraydestroy.elementPast4 = phi ptr [ %1, %[[ARRAY_DESTROY_DONE]] ], [ %arraydestroy.element5, %[[ARRAY_DESTROY_BODY2]] ]
   // CHECK-NEXT: %arraydestroy.element5 = getelementptr inbounds %struct.Printy, ptr %arraydestroy.elementPast4, i64 -1
-  // CHECK-NEXT: call void @_ZN6PrintyD1Ev(ptr noundef nonnull align 8 dereferenceable(8) %arraydestroy.element5)
+  // CHECK-NEXT: call void @_ZN6PrintyD1Ev(ptr dead_on_return noundef nonnull align 8 dereferenceable(8) %arraydestroy.element5)
   // CHECK-NEXT: %arraydestroy.done6 = icmp eq ptr %arraydestroy.element5, [[ARRAY_BEGIN]]
   // CHECK-NEXT: br i1 %arraydestroy.done6, label %[[ARRAY_DESTROY_DONE2:.+]], label %[[ARRAY_DESTROY_BODY2]]

@@ -384,7 +384,7 @@ void NewArrayInit() {
   // CHECK: arraydestroy.body:
   // CHECK-NEXT: %arraydestroy.elementPast = phi ptr [ %{{.*}}, %if.then ], [ %arraydestroy.element, %arraydestroy.body ]
   // CHECK-NEXT: %arraydestroy.element = getelementptr inbounds %struct.Printy, ptr %arraydestroy.elementPast, i64 -1
-  // CHECK-NEXT: call void @_ZN6PrintyD1Ev(ptr noundef nonnull align 8 dereferenceable(8) %arraydestroy.element)
+  // CHECK-NEXT: call void @_ZN6PrintyD1Ev(ptr dead_on_return noundef nonnull align 8 dereferenceable(8) %arraydestroy.element)
   // CHECK-NEXT: %arraydestroy.done = icmp eq ptr %arraydestroy.element, %0
   // CHECK-NEXT: br i1 %arraydestroy.done, label %arraydestroy.done{{.*}}, label %arraydestroy.body

diff --git a/clang/test/CodeGenCXX/cxx2a-destroying-delete.cpp b/clang/test/CodeGenCXX/cxx2a-destroying-delete.cpp
index 24b1a4dd42977..af29120feb5bb 100644
--- a/clang/test/CodeGenCXX/cxx2a-destroying-delete.cpp
+++ b/clang/test/CodeGenCXX/cxx2a-destroying-delete.cpp
@@ -41,11 +41,11 @@ void glob_delete_A(A *a) { ::delete a; }
 // CHECK: icmp eq ptr %[[a]], null
 // CHECK: br i1

-// CHECK-ITANIUM: call void @_ZN1AD1Ev(ptr noundef nonnull align 8 dereferenceable(8) %[[a]])
+// CHECK-ITANIUM: call void @_ZN1AD1Ev(ptr dead_on_return noundef nonnull align 8 dereferenceable(8) %[[a]])
 // CHECK-ITANIUM-NEXT: call void @_ZdlPvm(ptr noundef %[[a]], i64 noundef 8)
-// CHECK-MSABI64: call void @"??1A@@QEAA@XZ"(ptr noundef nonnull align 8 dereferenceable(8) %[[a]])
+// CHECK-MSABI64: call void @"??1A@@QEAA@XZ"(ptr dead_on_return noundef nonnull align 8 dereferenceable(8) %[[a]])
 // CHECK-MSABI64-NEXT: call void @"??3@YAXPEAX_K@Z"(ptr noundef %[[a]], i64 noundef 8)
-// CHECK-MSABI32: call x86_thiscallcc void @"??1A@@QAE@XZ"(ptr noundef nonnull align 4 dereferenceable(4) %[[a]])
+// CHECK-MSABI32: call x86_thiscallcc void @"??1A@@QAE@XZ"(ptr dead_on_return noundef nonnull align 4 dereferenceable(4) %[[a]])
 // CHECK-MSABI32-NEXT: call void @"??3@YAXPAXI@Z"(ptr noundef %[[a]], i32 noundef 4)

 struct B {
diff --git a/clang/test/CodeGenCXX/for-range.cpp b/clang/test/CodeGenCXX/for-range.cpp
index 088a34647c374..b9706855f658c 100644
--- a/clang/test/CodeGenCXX/for-range.cpp
+++ b/clang/test/CodeGenCXX/for-range.cpp
@@ -53,7 +53,7 @@ extern B array[5];
 // CHECK: for.body:
 // CHECK-NEXT: [[TMP2:%.*]] = load ptr, ptr [[__BEGIN1]], align 8
 // CHECK-NEXT: call void @_ZN1BC1ERKS_(ptr noundef nonnull align 1 dereferenceable(1) [[B]], ptr noundef nonnull align 1 dereferenceable(1) [[TMP2]])
-// CHECK-NEXT: call void @_ZN1BD1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[B]]) #[[ATTR3:[0-9]+]]
+// CHECK-NEXT: call void @_ZN1BD1Ev(ptr dead_on_return noundef nonnull align 1 dereferenceable(1) [[B]]) #[[ATTR3:[0-9]+]]
 // CHECK-NEXT: br label [[FOR_INC:%.*]]
 // CHECK: for.inc:
 // CHECK-NEXT: [[TMP3:%.*]] = load ptr, ptr [[__BEGIN1]], align 8
@@ -61,7 +61,7 @@ extern B array[5];
 // CHECK-NEXT: store ptr [[INCDEC_PTR]], ptr [[__BEGIN1]], align 8
 // CHECK-NEXT: br label [[FOR_COND]]
 // CHECK: for.end:
-// CHECK-NEXT: call void @_ZN1AD1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[A]]) #[[ATTR3]]
+// CHECK-NEXT: call void @_ZN1AD1Ev(ptr dead_on_return noundef nonnull align 1 dereferenceable(1) [[A]]) #[[ATTR3]]
 // CHECK-NEXT: ret void
 //
 void for_array() {
@@ -81,10 +81,10 @@ void for_array() {
 // CHECK-NEXT: call void @_ZN1AC1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[A]])
 // CHECK-NEXT: call void @_ZN1CC1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[REF_TMP]])
 // CHECK-NEXT: store ptr [[REF_TMP]], ptr [[__RANGE1]], align 8
-// CHECK-NEXT: [[TMP0:%.*]] = load ptr, ptr [[__RANGE1]], align 8
+// CHECK-NEXT: [[TMP0:%.*]] = load ptr, ptr [[__RANGE1]], align 8, !nonnull [[META2:![0-9]+]]
 // CHECK-NEXT: [[CALL:%.*]] = call noundef ptr @_Z5beginR1C(ptr noundef nonnull align 1 dereferenceable(1) [[TMP0]])
 // CHECK-NEXT: store ptr [[CALL]], ptr [[__BEGIN1]], align 8
-// CHECK-NEXT: [[TMP1:%.*]] = load ptr, ptr [[__RANGE1]], align 8
+// CHECK-NEXT: [[TMP1:%.*]] = load ptr, ptr [[__RANGE1]], align 8, !nonnull [[META2]]
 // CHECK-NEXT: [[CALL1:%.*]] = call noundef ptr @_Z3endR1C(ptr noundef nonnull align 1 dereferenceable(1) [[TMP1]])
 // CHECK-NEXT: store ptr [[CALL1]], ptr [[__END1]], align 8
 // CHECK-NEXT: br label [[FOR_COND:%.*]]
@@ -94,12 +94,12 @@ void for_array() {
 // CHECK-NEXT: [[CMP:%.*]] = icmp ne ptr [[TMP2]], [[TMP3]]
 // CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY:%.*]], label [[FOR_COND_CLEANUP:%.*]]
 // CHECK: for.cond.cleanup:
-// CHECK-NEXT: call void @_ZN1CD1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[REF_TMP]]) #[[ATTR3]]
+// CHECK-NEXT: call void @_ZN1CD1Ev(ptr dead_on_return noundef nonnull align 1 dereferenceable(1) [[REF_TMP]]) #[[ATTR3]]
 // CHECK-NEXT: br label [[FOR_END:%.*]]
 // CHECK: for.body:
 // CHECK-NEXT: [[TMP4:%.*]] = load ptr, ptr [[__BEGIN1]], align 8
 // CHECK-NEXT: call void @_ZN1BC1ERKS_(ptr noundef nonnull align 1 dereferenceable(1) [[B]], ptr noundef nonnull align 1 dereferenceable(1) [[TMP4]])
-// CHECK-NEXT: call void @_ZN1BD1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[B]]) #[[ATTR3]]
+// CHECK-NEXT: call void @_ZN1BD1Ev(ptr dead_on_return noundef nonnull align 1 dereferenceable(1) [[B]]) #[[ATTR3]]
 // CHECK-NEXT: br label [[FOR_INC:%.*]]
 // CHECK: for.inc:
 // CHECK-NEXT: [[TMP5:%.*]] = load ptr, ptr [[__BEGIN1]], align 8
@@ -107,7 +107,7 @@ void for_array() {
 // CHECK-NEXT: store ptr [[INCDEC_PTR]], ptr [[__BEGIN1]], align 8
 // CHECK-NEXT: br label [[FOR_COND]]
 // CHECK: for.end:
-// CHECK-NEXT: call void @_ZN1AD1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[A]]) #[[ATTR3]]
+// CHECK-NEXT: call void @_ZN1AD1Ev(ptr dead_on_return noundef nonnull align 1 dereferenceable(1) [[A]]) #[[ATTR3]]
 // CHECK-NEXT: ret void
 //
 void for_range() {
@@ -127,10 +127,10 @@ void for_range() {
 // CHECK-NEXT: call void @_ZN1AC1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[A]])
 // CHECK-NEXT: call void @_ZN1DC1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[REF_TMP]])
 // CHECK-NEXT: store ptr [[REF_TMP]], ptr [[__RANGE1]], align 8
-// CHECK-NEXT: [[TMP0:%.*]] = load ptr, ptr [[__RANGE1]], align 8
+// CHECK-NEXT: [[TMP0:%.*]] = load ptr, ptr [[__RANGE1]], align 8, !nonnull [[META2]]
 // CHECK-NEXT: [[CALL:%.*]] = call noundef ptr @_ZN1D5beginEv(ptr noundef nonnull align 1 dereferenceable(1) [[TMP0]])
 // CHECK-NEXT: store ptr [[CALL]], ptr [[__BEGIN1]], align 8
-// CHECK-NEXT: [[TMP1:%.*]] = load ptr, ptr [[__RANGE1]], align 8
+// CHECK-NEXT: [[TMP1:%.*]] = load ptr, ptr [[__RANGE1]], align 8, !nonnull [[META2]]
 // CHECK-NEXT: [[CALL1:%.*]] = call noundef ptr @_ZN1D3endEv(ptr noundef nonnull align 1 dereferenceable(1) [[TMP1]])
 // CHECK-NEXT: store ptr [[CALL1]], ptr [[__END1]], align 8
 // CHECK-NEXT: br label [[FOR_COND:%.*]]
@@ -140,12 +140,12 @@ void for_range() {
 // CHECK-NEXT: [[CMP:%.*]] = icmp ne ptr [[TMP2]], [[TMP3]]
 // CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY:%.*]], label [[FOR_COND_CLEANUP:%.*]]
 // CHECK: for.cond.cleanup:
-// CHECK-NEXT: call void @_ZN1DD1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[REF_TMP]]) #[[ATTR3]]
+// CHECK-NEXT: call void @_ZN1DD1Ev(ptr dead_on_return noundef nonnull align 1 dereferenceable(1) [[REF_TMP]]) #[[ATTR3]]
 // CHECK-NEXT: br label [[FOR_END:%.*]]
 // CHECK: for.body:
 // CHECK-NEXT: [[TMP4:%.*]] = load ptr, ptr [[__BEGIN1]], align 8
 // CHECK-NEXT: call void @_ZN1BC1ERKS_(ptr noundef nonnull align 1 dereferenceable(1) [[B]], ptr noundef nonnull align 1 dereferenceable(1) [[TMP4]])
-// CHECK-NEXT: call void @_ZN1BD1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[B]]) #[[ATTR3]]
+// CHECK-NEXT: call void @_ZN1BD1Ev(ptr dead_on_return noundef nonnull align 1 dereferenceable(1) [[B]]) #[[ATTR3]]
 // CHECK-NEXT: br label [[FOR_INC:%.*]]
 // CHECK: for.inc:
 // CHECK-NEXT: [[TMP5:%.*]] = load ptr, ptr [[__BEGIN1]], align 8
@@ -153,7 +153,7 @@ void for_range() {
 // CHECK-NEXT: store ptr [[INCDEC_PTR]], ptr [[__BEGIN1]], align 8
 // CHECK-NEXT: br label [[FOR_COND]]
 // CHECK: for.end:
-// CHECK-NEXT: call void @_ZN1AD1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[A]]) #[[ATTR3]]
+// CHECK-NEXT: call void @_ZN1AD1Ev(ptr dead_on_return noundef nonnull align 1 dereferenceable(1) [[A]]) #[[ATTR3]]
 // CHECK-NEXT: ret void
 //
 void for_member_range() {
diff --git a/clang/test/CodeGenCXX/gh62818.cpp b/clang/test/CodeGenCXX/gh62818.cpp
index ec91b40fca077..f903679cd6b68 100644
--- a/clang/test/CodeGenCXX/gh62818.cpp
+++ b/clang/test/CodeGenCXX/gh62818.c... [truncated]
@boomanaiden154
Contributor Author

The actual change is in the first commit (7a3dec4). I've separated the test changes out into 6ec0952 to hopefully make review a bit easier.

CC @philnik777, who came up with the idea and requested this.

@rnk
Collaborator

rnk commented Dec 1, 2025

This optimization exploits the fact that it's undefined behavior to read from an object after it's been destroyed. Given the overall shift in how the industry feels about compilers exploiting undefined behavior, I want to push to add a flag to control this. Think of the people who use -fno-delete-null-pointer-checks: they are the kind of people who will want to disable this optimization. This optimization should absolutely be on by default; we'd just have a way to opt out, mentioned in release notes, etc.

I'd also like to better understand why base classes matter for this annotation. Until very recently, basic_string used a bunch of compressed pair empty bases instead of [[no_unique_address]], so adding a base class might create a surprising performance regression with the change as written.
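The concern about base classes can be sketched with two layouts of the same data. Everything here is hypothetical (`EmptyAlloc`, `RepWithBase`, `RepWithMember`, `hasAllocBase`), and `hasAllocBase` is only a toy stand-in for the `getNumBases() == 0` condition in the patch, not the check Clang actually performs.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for a stateless allocator.
struct EmptyAlloc {};

// Older style: the empty allocator is a base class (empty-base optimization).
struct RepWithBase : EmptyAlloc {
  char *data = nullptr;
  std::size_t size = 0;
  ~RepWithBase() { data = nullptr; size = 0; }
};

// Newer style: the empty allocator is a member that may share storage.
struct RepWithMember {
  [[no_unique_address]] EmptyAlloc alloc;
  char *data = nullptr;
  std::size_t size = 0;
  ~RepWithMember() { data = nullptr; size = 0; }
};

// Toy version of a "has a base class" test. It only knows about EmptyAlloc;
// the patch instead queries the class declaration inside the compiler.
template <class T>
constexpr bool hasAllocBase() {
  return std::is_base_of_v<EmptyAlloc, T>;
}
```

Both destructors end with the same trailing stores, but under the patch as written only `~RepWithMember` would qualify for the attribute, because `RepWithBase` has a (non-virtual, empty) base.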

Collaborator

@rnk rnk left a comment

Overall, we should do it; this is a valuable optimization. (Commenting twice to push out the inline comments.)

.getAsAlign();
Attrs.addAlignmentAttr(Alignment);

if (isa_and_nonnull<CXXDestructorDecl>(
Collaborator

Code golf: You can CSE the CalleeInfo.getCalleeDecl().getDecl() if you use dyn_cast_or_null.
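The shape of that suggestion can be sketched without the Clang headers. The types below (`Decl`, `RecordDecl`, `DestructorDecl`, `MethodDecl`) and `wantsDeadOnReturn` are hypothetical stand-ins, and `dynamic_cast` plays the role of a null-tolerant cast, since it returns null for a null input. This is an analogy under those assumptions, not the code that would land in CGCall.cpp.

```cpp
#include <cassert>

// Hypothetical stand-ins for a declaration hierarchy; not Clang's classes.
struct Decl {
  virtual ~Decl() = default;
};
struct RecordDecl : Decl {
  unsigned numBases = 0;
  unsigned numVBases = 0;
};
struct DestructorDecl : Decl {
  const RecordDecl *parent = nullptr;
};
struct MethodDecl : Decl {};

// One lookup and one cast: the result of the cast is both the type test and
// the pointer used afterwards, so the callee is not fetched a second time.
bool wantsDeadOnReturn(const Decl *callee) {
  if (const auto *dtor = dynamic_cast<const DestructorDecl *>(callee)) {
    const RecordDecl *cls = dtor->parent;
    return cls && cls->numBases == 0 && cls->numVBases == 0;
  }
  return false;
}
```

Compared with a separate type check followed by a second lookup, the single cast keeps the null handling and the use of the result in one place.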

CalleeInfo.getCalleeDecl().getDecl())) {
auto *ClassDecl = dyn_cast<CXXRecordDecl>(
CalleeInfo.getCalleeDecl().getDecl()->getDeclContext());
if (ClassDecl->getNumBases() == 0 && ClassDecl->getNumVBases() == 0) {
Collaborator

Why do we have to limit this to only destructors of classes with no bases? Whatever the reason (caution, incremental change, etc), comments here would be appreciated.

Contributor

@rjmccall rjmccall Dec 1, 2025

Virtual base subobjects aren't dead after a base subobject destructor call, but yeah, I can't think of a reason to limit this because of non-virtual bases alone. And even virtual bases are dead after a complete-object destructor call.
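The point about virtual bases follows from standard C++ destruction order, which a small program can record. The class names and the `destructionLog` helper are made up for this sketch; the assumption is a diamond with one shared virtual base.

```cpp
#include <cassert>
#include <string>

// Records the order in which destructor bodies run.
std::string &destructionLog() {
  static std::string log;
  return log;
}

struct VBase {
  int value = 42;
  ~VBase() { destructionLog() += 'V'; }
};

struct Right : virtual VBase {
  ~Right() { destructionLog() += 'R'; }
};

struct Left : virtual VBase {
  // Runs after ~Right() has already returned, yet the shared VBase subobject
  // is still alive here, so reading `value` is well defined.
  ~Left() { destructionLog() += (value == 42 ? 'L' : '?'); }
};

struct Derived : Left, Right {
  ~Derived() { destructionLog() += 'D'; }
};

// Destroys one Derived object and returns the recorded order.
std::string destroyOneDerived() {
  destructionLog().clear();
  { Derived d; }
  return destructionLog();
}
```

Here `Right`'s destructor is invoked on a base subobject, and `VBase` outlives it; only once the complete `Derived` object has been destroyed is `VBase` gone as well. In Itanium mangling terms, the `D1` symbols in the test updates above are complete-object destructors and `D2` would be the base-object variant.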

Collaborator

@efriedma-quic efriedma-quic left a comment

Is the potential undefined behavior here checked by msan? Do we need to disable adding this attribute if msan is enabled?

@boomanaiden154
Contributor Author

Is the potential undefined behavior here checked by msan? Do we need to disable adding this attribute if msan is enabled?

Yes, use-after-destroy is checked by msan, but I don't believe we need to explicitly disable anything here to handle msan. Based on my understanding, the msan instrumentation will add a call to __sanitizer_dtor_callback_fields at the end of the destructor, passing in the this pointer along with the object size. __sanitizer_dtor_callback_fields then modifies the shadow memory, which is a separate object that is not marked dead by this change.
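A toy model may help picture why the shadow update survives. Nothing below is MSan's real interface or data structure: `poisonedRanges`, `toyDtorCallback`, `toyIsPoisoned`, and `Tracked` are hypothetical, and the map merely stands in for state that lives outside the destroyed object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <new>

// Toy shadow state, stored apart from the objects it describes.
std::map<std::uintptr_t, std::size_t> &poisonedRanges() {
  static std::map<std::uintptr_t, std::size_t> ranges;
  return ranges;
}

// Hypothetical stand-in for a runtime callback; not MSan's real function.
void toyDtorCallback(const void *ptr, std::size_t size) {
  poisonedRanges()[reinterpret_cast<std::uintptr_t>(ptr)] = size;
}

bool toyIsPoisoned(const void *ptr) {
  return poisonedRanges().count(reinterpret_cast<std::uintptr_t>(ptr)) != 0;
}

struct Tracked {
  int field = 7;
  ~Tracked() {
    field = 0;                            // store into the object itself
    toyDtorCallback(this, sizeof(*this)); // update to the separate shadow state
  }
};

// Builds and destroys a Tracked in local storage, then inspects the shadow.
bool destroyAndCheck() {
  poisonedRanges().clear();
  alignas(Tracked) unsigned char storage[sizeof(Tracked)];
  Tracked *obj = new (storage) Tracked();
  bool before = toyIsPoisoned(storage);
  obj->~Tracked();
  bool after = toyIsPoisoned(storage);
  return !before && after;
}
```

In this model the store to `field` targets memory reachable through `this`, while the callback writes to a different object entirely, which matches the argument that marking `this` dead does not make the shadow update removable.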

CC @thurstond, who is more familiar with the internals of msan and would be able to confirm.

Labels

clang:codegen IR generation bugs: mangling, exceptions, etc. clang:openmp OpenMP related changes to Clang clang Clang issues not falling into any other category

5 participants