Skip to content

Conversation

@goldsteinn
Copy link
Contributor

@goldsteinn goldsteinn commented Sep 12, 2023

[Inliner] Fix bug when propagating poison generating return attributes
Poison generating return attributes can't be propagated the same as
others, as they can change the behavior of other uses and/or create UB
where it otherwise wouldn't have occurred.

For example:

define nonnull ptr @foo() { %p = call ptr @bar() call void @use(ptr %p) ret ptr %p } 

If we inline @foo and propagate nonnull to @bar, it could change
the behavior of @use as instead of taking null, @use will
now be passed poison.

This can be even worth in a case like:

define nonnull ptr @foo() { %p = call noundef ptr @bar() ret ptr %p } 

Where propagating nonnull to @bar will cause UB on null return
of @bar (noundef + poison) where it previously wouldn't
have occurred.

To fix this, we only propagate poison generating return attributes if
either 1) The only use of the callsite to propagate too is return and
the callsite to propagate too doesn't have noundef. Or 2) the
callsite to be be inlined has noundef.

The former case ensures no new UB or poison values will be
added. The latter is UB anyways if the value is poison so we can go
ahead without worrying about behavior changes.

@goldsteinn goldsteinn requested review from a team as code owners September 12, 2023 00:29
@llvmbot llvmbot added clang Clang issues not falling into any other category backend:X86 llvm:transforms labels Sep 12, 2023
@llvmbot
Copy link
Member

llvmbot commented Sep 12, 2023

@llvm/pr-subscribers-backend-x86

Changes
  • Add some additional tests for progagating attributes before inlining; NFC
  • Regen checks for old test; NFC
  • Split some tests for c/cpp; NFC
  • Use "best" ret attribute when propagating attributes during inlining
  • Also propagate noundef and align ret attributes during inlining
  • Propagate callee function memory access attributes before inlining
  • Propagate callee argument memory access attributes before inlining
  • Propagate some additional callee argument attributes before inlining
  • Propagate some callee function attributes to callsites before inlining

--

Patch is 373.25 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/66036.diff

45 Files Affected:

  • (modified) clang/test/CodeGen/SystemZ/builtins-systemz-zvector.c (+9-9)
  • (modified) clang/test/CodeGen/SystemZ/builtins-systemz-zvector2.c (+4-4)
  • (modified) clang/test/CodeGen/X86/bitscan-builtins.c (-3)
  • (added) clang/test/CodeGen/X86/bitscan-builtins.cpp (+71)
  • (modified) clang/test/CodeGen/X86/popcnt-builtins.c (-2)
  • (added) clang/test/CodeGen/X86/popcnt-builtins.cpp (+67)
  • (modified) clang/test/CodeGen/X86/rot-intrinsics.c (-7)
  • (added) clang/test/CodeGen/X86/rot-intrinsics.cpp (+165)
  • (modified) clang/test/CodeGen/X86/x86-bswap.c (-1)
  • (added) clang/test/CodeGen/X86/x86-bswap.cpp (+44)
  • (modified) clang/test/CodeGen/aarch64-ls64.c (+3-3)
  • (modified) clang/test/CodeGen/aarch64-sme-intrinsics/aarch64-sme-attrs.cpp (+5-5)
  • (modified) clang/test/CodeGen/fp-contract-fast-pragma.cpp (+1-1)
  • (modified) clang/test/CodeGen/fp-contract-on-pragma.cpp (+1-1)
  • (modified) clang/test/CodeGen/fp-contract-pragma.cpp (+1-1)
  • (modified) clang/test/CodeGenCUDA/cuda-builtin-vars.cu (+12-12)
  • (modified) clang/test/Headers/__clang_hip_cmath.hip (+10-10)
  • (modified) clang/test/Headers/__clang_hip_math.hip (+524-524)
  • (modified) clang/test/Headers/__clang_hip_math_ocml_rounded_ops.hip (+44-44)
  • (modified) clang/test/Headers/amdgcn_openmp_device_math.c (+12-12)
  • (modified) clang/test/Headers/amdgcn_openmp_device_math_constexpr.cpp (+12-12)
  • (modified) clang/test/Headers/hip-header.hip (+5-5)
  • (modified) clang/test/Headers/nvptx_device_cmath_functions.cpp (+5-5)
  • (modified) clang/test/Headers/nvptx_device_cmath_functions_cxx17.cpp (+5-5)
  • (modified) clang/test/Headers/nvptx_device_math_functions.c (+12-7)
  • (modified) clang/test/Headers/nvptx_device_math_functions.cpp (+5-5)
  • (modified) clang/test/Headers/nvptx_device_math_functions_cxx17.cpp (+5-5)
  • (modified) clang/test/Headers/nvptx_device_math_modf.cpp (+4-4)
  • (modified) clang/test/Headers/nvptx_device_math_sin.cpp (+4-4)
  • (modified) clang/test/Headers/nvptx_device_math_sin_cos.cpp (+6-6)
  • (modified) clang/test/Headers/openmp_device_math_isnan.cpp (+4-4)
  • (modified) clang/test/OpenMP/bug57757.cpp (+1-1)
  • (modified) llvm/include/llvm/Support/ModRef.h (+7)
  • (modified) llvm/lib/Transforms/Utils/InlineFunction.cpp (+187)
  • (added) llvm/test/Transforms/Inline/access-attributes-prop.ll (+498)
  • (modified) llvm/test/Transforms/Inline/assumptions-from-callsite-attrs.ll (+1-1)
  • (modified) llvm/test/Transforms/Inline/byval.ll (+122-73)
  • (modified) llvm/test/Transforms/Inline/memprof_inline.ll (+2-2)
  • (modified) llvm/test/Transforms/Inline/noalias-calls-always.ll (+6-6)
  • (modified) llvm/test/Transforms/Inline/noalias-calls.ll (+10-10)
  • (added) llvm/test/Transforms/Inline/ret_attr_align_and_noundef.ll (+171)
  • (modified) llvm/test/Transforms/Inline/ret_attr_update.ll (+1-1)
  • (modified) llvm/test/Transforms/SampleProfile/norepeated-icp-3.ll (+1-1)
  • (modified) llvm/test/Transforms/SampleProfile/norepeated-icp-4.ll (+1-1)
  • (modified) llvm/test/Transforms/SampleProfile/uniqname.ll (+3-3)
diff --git a/clang/test/CodeGen/SystemZ/builtins-systemz-zvector.c b/clang/test/CodeGen/SystemZ/builtins-systemz-zvector.c index 44f8cbe2cc01739..642b08ac68ef122 100644 --- a/clang/test/CodeGen/SystemZ/builtins-systemz-zvector.c +++ b/clang/test/CodeGen/SystemZ/builtins-systemz-zvector.c @@ -636,31 +636,31 @@ void test_core(void) { // CHECK-ASM: vlbb vsc = vec_load_len(cptrsc, idx); - // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr %{{.*}}) + // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr readonly %{{.*}}) // CHECK-ASM: vll vuc = vec_load_len(cptruc, idx); - // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr %{{.*}}) + // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr readonly %{{.*}}) // CHECK-ASM: vll vss = vec_load_len(cptrss, idx); - // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr %{{.*}}) + // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr readonly %{{.*}}) // CHECK-ASM: vll vus = vec_load_len(cptrus, idx); - // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr %{{.*}}) + // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr readonly %{{.*}}) // CHECK-ASM: vll vsi = vec_load_len(cptrsi, idx); - // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr %{{.*}}) + // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr readonly %{{.*}}) // CHECK-ASM: vll vui = vec_load_len(cptrui, idx); - // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr %{{.*}}) + // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr readonly %{{.*}}) // CHECK-ASM: vll vsl = vec_load_len(cptrsl, idx); - // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr %{{.*}}) + // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr readonly %{{.*}}) // CHECK-ASM: vll vul = vec_load_len(cptrul, idx); - // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr %{{.*}}) + // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr readonly %{{.*}}) // CHECK-ASM: vll vd = vec_load_len(cptrd, idx); - // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr %{{.*}}) + // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr readonly %{{.*}}) // CHECK-ASM: vll vec_store_len(vsc, ptrsc, idx); diff --git a/clang/test/CodeGen/SystemZ/builtins-systemz-zvector2.c b/clang/test/CodeGen/SystemZ/builtins-systemz-zvector2.c index 416ca0ddd1b4fe2..3f02565dfb488ce 100644 --- a/clang/test/CodeGen/SystemZ/builtins-systemz-zvector2.c +++ b/clang/test/CodeGen/SystemZ/builtins-systemz-zvector2.c @@ -207,10 +207,10 @@ void test_core(void) { // CHECK-ASM: vlbb vf = vec_load_len(cptrf, idx); - // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr %{{.*}}) + // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr readonly %{{.*}}) // CHECK-ASM: vll vd = vec_load_len(cptrd, idx); - // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr %{{.*}}) + // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr readonly %{{.*}}) // CHECK-ASM: vll vec_store_len(vf, ptrf, idx); @@ -221,10 +221,10 @@ void test_core(void) { // CHECK-ASM: vstl vuc = vec_load_len_r(cptruc, 0); - // CHECK: call <16 x i8> @llvm.s390.vlrl(i32 0, ptr %{{.*}}) + // CHECK: call <16 x i8> @llvm.s390.vlrl(i32 0, ptr readonly %{{.*}}) // CHECK-ASM: vlrl %{{.*}}, 0(%{{.*}}), 0 vuc = vec_load_len_r(cptruc, idx); - // CHECK: call <16 x i8> @llvm.s390.vlrl(i32 %{{.*}}, ptr %{{.*}}) + // CHECK: call <16 x i8> @llvm.s390.vlrl(i32 %{{.*}}, ptr readonly %{{.*}}) // CHECK-ASM: vlrlr vec_store_len_r(vuc, ptruc, 0); diff --git a/clang/test/CodeGen/X86/bitscan-builtins.c b/clang/test/CodeGen/X86/bitscan-builtins.c index a5a7808a82a234d..d75f7c1ab2f1ed4 100644 --- a/clang/test/CodeGen/X86/bitscan-builtins.c +++ b/clang/test/CodeGen/X86/bitscan-builtins.c @@ -1,9 +1,6 @@ // RUN: %clang_cc1 -x c -ffreestanding %s -triple=x86_64-unknown-unknown -emit-llvm -o - | FileCheck %s -// RUN: %clang_cc1 -x c++ -std=c++11 -ffreestanding %s -triple=x86_64-unknown-unknown -emit-llvm -o - | FileCheck %s - // PR33722 // RUN: %clang_cc1 -x c -ffreestanding %s -triple x86_64-unknown-unknown -fms-extensions -fms-compatibility-version=19.00 -emit-llvm -o - | FileCheck %s -// RUN: %clang_cc1 -x c++ -ffreestanding %s -triple x86_64-unknown-unknown -fms-extensions -fms-compatibility-version=19.00 -emit-llvm -o - | FileCheck %s #include diff --git a/clang/test/CodeGen/X86/bitscan-builtins.cpp b/clang/test/CodeGen/X86/bitscan-builtins.cpp new file mode 100644 index 000000000000000..ab8a0fa2f24773b --- /dev/null +++ b/clang/test/CodeGen/X86/bitscan-builtins.cpp @@ -0,0 +1,71 @@ +// RUN: %clang_cc1 -x c++ -std=c++11 -ffreestanding %s -triple=x86_64-unknown-unknown -emit-llvm -o - | FileCheck %s + +// PR33722 +// RUN: %clang_cc1 -x c++ -ffreestanding %s -triple x86_64-unknown-unknown -fms-extensions -fms-compatibility-version=19.00 -emit-llvm -o - | FileCheck %s + +#include + +int test_bit_scan_forward(int a) { +// CHECK-LABEL: test_bit_scan_forward +// CHECK: %[[call:.*]] = call noundef i32 @llvm.cttz.i32(i32 %{{.*}}, i1 true) +// CHECK: ret i32 %[[call]] + return _bit_scan_forward(a); +} + +int test_bit_scan_reverse(int a) { +// CHECK-LABEL: test_bit_scan_reverse +// CHECK: %[[call:.*]] = call i32 @llvm.ctlz.i32(i32 %{{.*}}, i1 true) +// CHECK: %[[sub:.*]] = sub nsw i32 31, %[[call]] +// CHECK: ret i32 %[[sub]] + return _bit_scan_reverse(a); +} + +int test__bsfd(int X) { +// CHECK-LABEL: test__bsfd +// CHECK: %[[call:.*]] = call noundef i32 @llvm.cttz.i32(i32 %{{.*}}, i1 true) + return __bsfd(X); +} + +int test__bsfq(long long X) { +// CHECK-LABEL: test__bsfq +// CHECK: %[[call:.*]] = call i64 @llvm.cttz.i64(i64 %{{.*}}, i1 true) + return __bsfq(X); +} + +int test__bsrd(int X) { +// CHECK-LABEL: test__bsrd +// CHECK: %[[call:.*]] = call i32 @llvm.ctlz.i32(i32 %{{.*}}, i1 true) +// CHECK: %[[sub:.*]] = sub nsw i32 31, %[[call]] + return __bsrd(X); +} + +int test__bsrq(long long X) { +// CHECK-LABEL: test__bsrq +// CHECK: %[[call:.*]] = call i64 @llvm.ctlz.i64(i64 %{{.*}}, i1 true) +// CHECK: %[[cast:.*]] = trunc i64 %[[call]] to i32 +// CHECK: %[[sub:.*]] = sub nsw i32 63, %[[cast]] + return __bsrq(X); +} + +// Test constexpr handling. +#if defined(__cplusplus) && (__cplusplus >= 201103L) + +char bsf_0[_bit_scan_forward(0x00000001) == 0 ? 1 : -1]; +char bsf_1[_bit_scan_forward(0x10000000) == 28 ? 1 : -1]; + +char bsr_0[_bit_scan_reverse(0x00000001) == 0 ? 1 : -1]; +char bsr_1[_bit_scan_reverse(0x01000000) == 24 ? 1 : -1]; + +char bsfd_0[__bsfd(0x00000008) == 3 ? 1 : -1]; +char bsfd_1[__bsfd(0x00010008) == 3 ? 1 : -1]; + +char bsrd_0[__bsrd(0x00000010) == 4 ? 1 : -1]; +char bsrd_1[__bsrd(0x00100100) == 20 ? 1 : -1]; + +char bsfq_0[__bsfq(0x0000000800000000ULL) == 35 ? 1 : -1]; +char bsfq_1[__bsfq(0x0004000000000000ULL) == 50 ? 1 : -1]; + +char bsrq_0[__bsrq(0x0000100800000000ULL) == 44 ? 1 : -1]; +char bsrq_1[__bsrq(0x0004000100000000ULL) == 50 ? 1 : -1]; + +#endif diff --git a/clang/test/CodeGen/X86/popcnt-builtins.c b/clang/test/CodeGen/X86/popcnt-builtins.c index e59ffaa031a6a10..b9bc666a1e28ad2 100644 --- a/clang/test/CodeGen/X86/popcnt-builtins.c +++ b/clang/test/CodeGen/X86/popcnt-builtins.c @@ -1,7 +1,5 @@ // RUN: %clang_cc1 -x c -ffreestanding %s -triple=x86_64-apple-darwin -target-feature +popcnt -emit-llvm -o - | FileCheck %s --check-prefixes=CHECK,CHECK-POPCNT -// RUN: %clang_cc1 -x c++ -std=c++11 -ffreestanding %s -triple=x86_64-apple-darwin -target-feature +popcnt -emit-llvm -o - | FileCheck %s --check-prefixes=CHECK,CHECK-POPCNT // RUN: %clang_cc1 -x c -ffreestanding %s -triple=x86_64-apple-darwin -emit-llvm -o - | FileCheck %s -// RUN: %clang_cc1 -x c++ -std=c++11 -ffreestanding %s -triple=x86_64-apple-darwin -emit-llvm -o - | FileCheck %s #include diff --git a/clang/test/CodeGen/X86/popcnt-builtins.cpp b/clang/test/CodeGen/X86/popcnt-builtins.cpp new file mode 100644 index 000000000000000..42996c547e0af9b --- /dev/null +++ b/clang/test/CodeGen/X86/popcnt-builtins.cpp @@ -0,0 +1,67 @@ +// RUN: %clang_cc1 -x c++ -std=c++11 -ffreestanding %s -triple=x86_64-apple-darwin -target-feature +popcnt -emit-llvm -o - | FileCheck %s --check-prefixes=CHECK,CHECK-POPCNT +// RUN: %clang_cc1 -x c++ -std=c++11 -ffreestanding %s -triple=x86_64-apple-darwin -emit-llvm -o - | FileCheck %s + +#include + +#ifdef __POPCNT__ +int test_mm_popcnt_u32(unsigned int __X) { + //CHECK-POPCNT: call noundef i32 @llvm.ctpop.i32 + return _mm_popcnt_u32(__X); +} +#endif + +int test_popcnt32(unsigned int __X) { + //CHECK: call noundef i32 @llvm.ctpop.i32 + return _popcnt32(__X); +} + +int test__popcntd(unsigned int __X) { + //CHECK: call noundef i32 @llvm.ctpop.i32 + return __popcntd(__X); +} + +#ifdef __x86_64__ +#ifdef __POPCNT__ +long long test_mm_popcnt_u64(unsigned long long __X) { + //CHECK-POPCNT: call i64 @llvm.ctpop.i64 + return _mm_popcnt_u64(__X); +} +#endif + +long long test_popcnt64(unsigned long long __X) { + //CHECK: call i64 @llvm.ctpop.i64 + return _popcnt64(__X); +} + +long long test__popcntq(unsigned long long __X) { + //CHECK: call i64 @llvm.ctpop.i64 + return __popcntq(__X); +} +#endif + +// Test constexpr handling. +#if defined(__cplusplus) && (__cplusplus >= 201103L) +#if defined(__POPCNT__) +char ctpop32_0[_mm_popcnt_u32(0x00000000) == 0 ? 1 : -1]; +char ctpop32_1[_mm_popcnt_u32(0x000000F0) == 4 ? 1 : -1]; +#endif + +char popcnt32_0[_popcnt32(0x00000000) == 0 ? 1 : -1]; +char popcnt32_1[_popcnt32(0x100000F0) == 5 ? 1 : -1]; + +char popcntd_0[__popcntd(0x00000000) == 0 ? 1 : -1]; +char popcntd_1[__popcntd(0x00F000F0) == 8 ? 1 : -1]; + +#ifdef __x86_64__ +#if defined(__POPCNT__) +char ctpop64_0[_mm_popcnt_u64(0x0000000000000000ULL) == 0 ? 1 : -1]; +char ctpop64_1[_mm_popcnt_u64(0xF000000000000001ULL) == 5 ? 1 : -1]; +#endif + +char popcnt64_0[_popcnt64(0x0000000000000000ULL) == 0 ? 1 : -1]; +char popcnt64_1[_popcnt64(0xF00000F000000001ULL) == 9 ? 1 : -1]; + +char popcntq_0[__popcntq(0x0000000000000000ULL) == 0 ? 1 : -1]; +char popcntq_1[__popcntq(0xF000010000300001ULL) == 8 ? 1 : -1]; +#endif +#endif diff --git a/clang/test/CodeGen/X86/rot-intrinsics.c b/clang/test/CodeGen/X86/rot-intrinsics.c index f8c78119a1c4a77..70bda329ce860c9 100644 --- a/clang/test/CodeGen/X86/rot-intrinsics.c +++ b/clang/test/CodeGen/X86/rot-intrinsics.c @@ -5,13 +5,6 @@ // RUN: %clang_cc1 -x c -fms-extensions -fms-compatibility -fms-compatibility-version=17.00 -ffreestanding %s -triple=i686-windows-msvc -target-feature +sse2 -emit-llvm -o - -Wall -Werror | FileCheck %s --check-prefixes CHECK,CHECK-32BIT-LONG // RUN: %clang_cc1 -x c -fms-extensions -fms-compatibility -fms-compatibility-version=17.00 -ffreestanding %s -triple=x86_64-windows-msvc -target-feature +sse2 -emit-llvm -o - -Wall -Werror | FileCheck %s --check-prefixes CHECK,CHECK-32BIT-LONG -// RUN: %clang_cc1 -x c++ -std=c++11 -ffreestanding -triple i686--linux -emit-llvm %s -o - | FileCheck %s --check-prefixes CHECK,CHECK-32BIT-LONG -// RUN: %clang_cc1 -x c++ -std=c++11 -ffreestanding -triple x86_64--linux -emit-llvm %s -o - | FileCheck %s --check-prefixes CHECK,CHECK-64BIT-LONG -// RUN: %clang_cc1 -x c++ -std=c++11 -fms-extensions -fms-compatibility -ffreestanding %s -triple=i686-windows-msvc -target-feature +sse2 -emit-llvm -o - -Wall -Werror | FileCheck %s --check-prefixes CHECK,CHECK-32BIT-LONG -// RUN: %clang_cc1 -x c++ -std=c++11 -fms-extensions -fms-compatibility -ffreestanding %s -triple=x86_64-windows-msvc -target-feature +sse2 -emit-llvm -o - -Wall -Werror | FileCheck %s --check-prefixes CHECK,CHECK-32BIT-LONG -// RUN: %clang_cc1 -x c++ -std=c++11 -fms-extensions -fms-compatibility -fms-compatibility-version=17.00 -ffreestanding %s -triple=i686-windows-msvc -target-feature +sse2 -emit-llvm -o - -Wall -Werror | FileCheck %s --check-prefixes CHECK,CHECK-32BIT-LONG -// RUN: %clang_cc1 -x c++ -std=c++11 -fms-extensions -fms-compatibility -fms-compatibility-version=17.00 -ffreestanding %s -triple=x86_64-windows-msvc -target-feature +sse2 -emit-llvm -o - -Wall -Werror | FileCheck %s --check-prefixes CHECK,CHECK-32BIT-LONG - #include unsigned char test__rolb(unsigned char value, int shift) { diff --git a/clang/test/CodeGen/X86/rot-intrinsics.cpp b/clang/test/CodeGen/X86/rot-intrinsics.cpp new file mode 100644 index 000000000000000..90afa91f335e83d --- /dev/null +++ b/clang/test/CodeGen/X86/rot-intrinsics.cpp @@ -0,0 +1,165 @@ +// RUN: %clang_cc1 -x c++ -std=c++11 -ffreestanding -triple i686--linux -emit-llvm %s -o - | FileCheck %s --check-prefixes CHECK,CHECK-32BIT-LONG-NO-COMPAT17 +// RUN: %clang_cc1 -x c++ -std=c++11 -ffreestanding -triple x86_64--linux -emit-llvm %s -o - | FileCheck %s --check-prefixes CHECK,CHECK-64BIT-LONG +// RUN: %clang_cc1 -x c++ -std=c++11 -fms-extensions -fms-compatibility -ffreestanding %s -triple=i686-windows-msvc -target-feature +sse2 -emit-llvm -o - -Wall -Werror | FileCheck %s --check-prefixes CHECK,CHECK-32BIT-LONG-NO-COMPAT17 +// RUN: %clang_cc1 -x c++ -std=c++11 -fms-extensions -fms-compatibility -ffreestanding %s -triple=x86_64-windows-msvc -target-feature +sse2 -emit-llvm -o - -Wall -Werror | FileCheck %s --check-prefixes CHECK,CHECK-32BIT-LONG-NO-COMPAT17 +// RUN: %clang_cc1 -x c++ -std=c++11 -fms-extensions -fms-compatibility -fms-compatibility-version=17.00 -ffreestanding %s -triple=i686-windows-msvc -target-feature +sse2 -emit-llvm -o - -Wall -Werror | FileCheck %s --check-prefixes CHECK,CHECK-32BIT-LONG-COMPAT17 +// RUN: %clang_cc1 -x c++ -std=c++11 -fms-extensions -fms-compatibility -fms-compatibility-version=17.00 -ffreestanding %s -triple=x86_64-windows-msvc -target-feature +sse2 -emit-llvm -o - -Wall -Werror | FileCheck %s --check-prefixes CHECK,CHECK-32BIT-LONG-COMPAT17 + +#include + +unsigned char test__rolb(unsigned char value, int shift) { +// CHECK-LABEL: test__rolb +// CHECK: [[R:%.*]] = call noundef i8 @llvm.fshl.i8(i8 [[X:%.*]], i8 [[X]], i8 [[Y:%.*]]) +// CHECK: ret i8 [[R]] + return __rolb(value, shift); +} + +unsigned short test__rolw(unsigned short value, int shift) { +// CHECK-LABEL: test__rolw +// CHECK: [[R:%.*]] = call noundef i16 @llvm.fshl.i16(i16 [[X:%.*]], i16 [[X]], i16 [[Y:%.*]]) +// CHECK: ret i16 [[R]] + return __rolw(value, shift); +} + +unsigned int test__rold(unsigned int value, int shift) { +// CHECK-LABEL: test__rold +// CHECK: [[R:%.*]] = call noundef i32 @llvm.fshl.i32(i32 [[X:%.*]], i32 [[X]], i32 [[Y:%.*]]) +// CHECK: ret i32 [[R]] + return __rold(value, shift); +} + +#if defined(__x86_64__) +unsigned long test__rolq(unsigned long value, int shift) { +// CHECK-LONG-LABEL: test__rolq +// CHECK-LONG: [[R:%.*]] = call i64 @llvm.fshl.i64(i64 [[X:%.*]], i64 [[X]], i64 [[Y:%.*]]) +// CHECK-LONG: ret i64 [[R]] + return __rolq(value, shift); +} +#endif + +unsigned char test__rorb(unsigned char value, int shift) { +// CHECK-LABEL: test__rorb +// CHECK: [[R:%.*]] = call noundef i8 @llvm.fshr.i8(i8 [[X:%.*]], i8 [[X]], i8 [[Y:%.*]]) +// CHECK: ret i8 [[R]] + return __rorb(value, shift); +} + +unsigned short test__rorw(unsigned short value, int shift) { +// CHECK-LABEL: test__rorw +// CHECK: [[R:%.*]] = call noundef i16 @llvm.fshr.i16(i16 [[X:%.*]], i16 [[X]], i16 [[Y:%.*]]) +// CHECK: ret i16 [[R]] + return __rorw(value, shift); +} + +unsigned int test__rord(unsigned int value, int shift) { +// CHECK-LABEL: test__rord +// CHECK: [[R:%.*]] = call noundef i32 @llvm.fshr.i32(i32 [[X:%.*]], i32 [[X]], i32 [[Y:%.*]]) +// CHECK: ret i32 [[R]] + return __rord(value, shift); +} + +#if defined(__x86_64__) +unsigned long test__rorq(unsigned long value, int shift) { +// CHECK-LONG-LABEL: test__rorq +// CHECK-LONG: [[R:%.*]] = call i64 @llvm.fshr.i64(i64 [[X:%.*]], i64 [[X]], i64 [[Y:%.*]]) +// CHECK-LONG: ret i64 [[R]] + return __rorq(value, shift); +} +#endif + +unsigned short test_rotwl(unsigned short value, int shift) { +// CHECK-LABEL: test_rotwl +// CHECK: [[R:%.*]] = call noundef i16 @llvm.fshl.i16(i16 [[X:%.*]], i16 [[X]], i16 [[Y:%.*]]) +// CHECK: ret i16 [[R]] + return _rotwl(value, shift); +} + +unsigned int test_rotl(unsigned int value, int shift) { +// CHECK-32BIT-LONG-COMPAT17-LABEL: test_rotl +// CHECK-32BIT-LONG-COMPAT17: [[R:%.*]] = call i32 @llvm.fshl.i32(i32 [[X:%.*]], i32 [[X]], i32 [[Y:%.*]]) +// CHECK-32BIT-LONG-COMPAT17: ret i32 [[R]] +// +// CHECK-32BIT-LONG-NO-COMPAT17-LABEL: test_rotl +// CHECK-32BIT-LONG-NO-COMPAT17: [[R:%.*]] = call noundef i32 @llvm.fshl.i32(i32 [[X:%.*]], i32 [[X]], i32 [[Y:%.*]]) +// CHECK-32BIT-LONG-NO-COMPAT17: ret i32 [[R]] + return _rotl(value, shift); +} + +unsigned long test_lrotl(unsigned long value, int shift) { +// CHECK-32BIT-LONG-COMPAT17-LABEL: test_lrotl +// CHECK-32BIT-LONG-COMPAT17: [[R:%.*]] = call i32 @llvm.fshl.i32(i32 [[X:%.*]], i32 [[X]], i32 [[Y:%.*]]) +// CHECK-32BIT-LONG-COMPAT17: ret i32 [[R]] +// +// CHECK-32BIT-LONG-NO-COMPAT17-LABEL: test_lrotl +// CHECK-32BIT-LONG-NO-COMPAT17: [[R:%.*]] = call noundef i32 @llvm.fshl.i32(i32 [[X:%.*]], i32 [[X]], i32 [[Y:%.*]]) +// CHECK-32BIT-LONG-NO-COMPAT17: ret i32 [[R]] +// +// CHECK-64BIT-LONG-LABEL: test_lrotl +// CHECK-64BIT-LONG: [[R:%.*]] = call noundef i64 @llvm.fshl.i64(i64 [[X:%.*]], i64 [[X]], i64 [[Y:%.*]]) +// CHECK-64BIT-LONG: ret i64 [[R]] + return _lrotl(value, shift); +} + + +unsigned short test_ro... 
@llvmbot
Copy link
Member

llvmbot commented Sep 12, 2023

@llvm/pr-subscribers-llvm-transforms

Changes
  • Add some additional tests for progagating attributes before inlining; NFC
  • Regen checks for old test; NFC
  • Split some tests for c/cpp; NFC
  • Use "best" ret attribute when propagating attributes during inlining
  • Also propagate noundef and align ret attributes during inlining
  • Propagate callee function memory access attributes before inlining
  • Propagate callee argument memory access attributes before inlining
  • Propagate some additional callee argument attributes before inlining
  • Propagate some callee function attributes to callsites before inlining

--

Patch is 373.25 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/66036.diff

45 Files Affected:

  • (modified) clang/test/CodeGen/SystemZ/builtins-systemz-zvector.c (+9-9)
  • (modified) clang/test/CodeGen/SystemZ/builtins-systemz-zvector2.c (+4-4)
  • (modified) clang/test/CodeGen/X86/bitscan-builtins.c (-3)
  • (added) clang/test/CodeGen/X86/bitscan-builtins.cpp (+71)
  • (modified) clang/test/CodeGen/X86/popcnt-builtins.c (-2)
  • (added) clang/test/CodeGen/X86/popcnt-builtins.cpp (+67)
  • (modified) clang/test/CodeGen/X86/rot-intrinsics.c (-7)
  • (added) clang/test/CodeGen/X86/rot-intrinsics.cpp (+165)
  • (modified) clang/test/CodeGen/X86/x86-bswap.c (-1)
  • (added) clang/test/CodeGen/X86/x86-bswap.cpp (+44)
  • (modified) clang/test/CodeGen/aarch64-ls64.c (+3-3)
  • (modified) clang/test/CodeGen/aarch64-sme-intrinsics/aarch64-sme-attrs.cpp (+5-5)
  • (modified) clang/test/CodeGen/fp-contract-fast-pragma.cpp (+1-1)
  • (modified) clang/test/CodeGen/fp-contract-on-pragma.cpp (+1-1)
  • (modified) clang/test/CodeGen/fp-contract-pragma.cpp (+1-1)
  • (modified) clang/test/CodeGenCUDA/cuda-builtin-vars.cu (+12-12)
  • (modified) clang/test/Headers/__clang_hip_cmath.hip (+10-10)
  • (modified) clang/test/Headers/__clang_hip_math.hip (+524-524)
  • (modified) clang/test/Headers/__clang_hip_math_ocml_rounded_ops.hip (+44-44)
  • (modified) clang/test/Headers/amdgcn_openmp_device_math.c (+12-12)
  • (modified) clang/test/Headers/amdgcn_openmp_device_math_constexpr.cpp (+12-12)
  • (modified) clang/test/Headers/hip-header.hip (+5-5)
  • (modified) clang/test/Headers/nvptx_device_cmath_functions.cpp (+5-5)
  • (modified) clang/test/Headers/nvptx_device_cmath_functions_cxx17.cpp (+5-5)
  • (modified) clang/test/Headers/nvptx_device_math_functions.c (+12-7)
  • (modified) clang/test/Headers/nvptx_device_math_functions.cpp (+5-5)
  • (modified) clang/test/Headers/nvptx_device_math_functions_cxx17.cpp (+5-5)
  • (modified) clang/test/Headers/nvptx_device_math_modf.cpp (+4-4)
  • (modified) clang/test/Headers/nvptx_device_math_sin.cpp (+4-4)
  • (modified) clang/test/Headers/nvptx_device_math_sin_cos.cpp (+6-6)
  • (modified) clang/test/Headers/openmp_device_math_isnan.cpp (+4-4)
  • (modified) clang/test/OpenMP/bug57757.cpp (+1-1)
  • (modified) llvm/include/llvm/Support/ModRef.h (+7)
  • (modified) llvm/lib/Transforms/Utils/InlineFunction.cpp (+187)
  • (added) llvm/test/Transforms/Inline/access-attributes-prop.ll (+498)
  • (modified) llvm/test/Transforms/Inline/assumptions-from-callsite-attrs.ll (+1-1)
  • (modified) llvm/test/Transforms/Inline/byval.ll (+122-73)
  • (modified) llvm/test/Transforms/Inline/memprof_inline.ll (+2-2)
  • (modified) llvm/test/Transforms/Inline/noalias-calls-always.ll (+6-6)
  • (modified) llvm/test/Transforms/Inline/noalias-calls.ll (+10-10)
  • (added) llvm/test/Transforms/Inline/ret_attr_align_and_noundef.ll (+171)
  • (modified) llvm/test/Transforms/Inline/ret_attr_update.ll (+1-1)
  • (modified) llvm/test/Transforms/SampleProfile/norepeated-icp-3.ll (+1-1)
  • (modified) llvm/test/Transforms/SampleProfile/norepeated-icp-4.ll (+1-1)
  • (modified) llvm/test/Transforms/SampleProfile/uniqname.ll (+3-3)
diff --git a/clang/test/CodeGen/SystemZ/builtins-systemz-zvector.c b/clang/test/CodeGen/SystemZ/builtins-systemz-zvector.c index 44f8cbe2cc01739..642b08ac68ef122 100644 --- a/clang/test/CodeGen/SystemZ/builtins-systemz-zvector.c +++ b/clang/test/CodeGen/SystemZ/builtins-systemz-zvector.c @@ -636,31 +636,31 @@ void test_core(void) { // CHECK-ASM: vlbb vsc = vec_load_len(cptrsc, idx); - // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr %{{.*}}) + // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr readonly %{{.*}}) // CHECK-ASM: vll vuc = vec_load_len(cptruc, idx); - // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr %{{.*}}) + // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr readonly %{{.*}}) // CHECK-ASM: vll vss = vec_load_len(cptrss, idx); - // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr %{{.*}}) + // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr readonly %{{.*}}) // CHECK-ASM: vll vus = vec_load_len(cptrus, idx); - // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr %{{.*}}) + // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr readonly %{{.*}}) // CHECK-ASM: vll vsi = vec_load_len(cptrsi, idx); - // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr %{{.*}}) + // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr readonly %{{.*}}) // CHECK-ASM: vll vui = vec_load_len(cptrui, idx); - // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr %{{.*}}) + // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr readonly %{{.*}}) // CHECK-ASM: vll vsl = vec_load_len(cptrsl, idx); - // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr %{{.*}}) + // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr readonly %{{.*}}) // CHECK-ASM: vll vul = vec_load_len(cptrul, idx); - // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr %{{.*}}) + // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr readonly %{{.*}}) // CHECK-ASM: vll vd = vec_load_len(cptrd, idx); - // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr %{{.*}}) + // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr readonly %{{.*}}) // CHECK-ASM: vll vec_store_len(vsc, ptrsc, idx); diff --git a/clang/test/CodeGen/SystemZ/builtins-systemz-zvector2.c b/clang/test/CodeGen/SystemZ/builtins-systemz-zvector2.c index 416ca0ddd1b4fe2..3f02565dfb488ce 100644 --- a/clang/test/CodeGen/SystemZ/builtins-systemz-zvector2.c +++ b/clang/test/CodeGen/SystemZ/builtins-systemz-zvector2.c @@ -207,10 +207,10 @@ void test_core(void) { // CHECK-ASM: vlbb vf = vec_load_len(cptrf, idx); - // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr %{{.*}}) + // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr readonly %{{.*}}) // CHECK-ASM: vll vd = vec_load_len(cptrd, idx); - // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr %{{.*}}) + // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr readonly %{{.*}}) // CHECK-ASM: vll vec_store_len(vf, ptrf, idx); @@ -221,10 +221,10 @@ void test_core(void) { // CHECK-ASM: vstl vuc = vec_load_len_r(cptruc, 0); - // CHECK: call <16 x i8> @llvm.s390.vlrl(i32 0, ptr %{{.*}}) + // CHECK: call <16 x i8> @llvm.s390.vlrl(i32 0, ptr readonly %{{.*}}) // CHECK-ASM: vlrl %{{.*}}, 0(%{{.*}}), 0 vuc = vec_load_len_r(cptruc, idx); - // CHECK: call <16 x i8> @llvm.s390.vlrl(i32 %{{.*}}, ptr %{{.*}}) + // CHECK: call <16 x i8> @llvm.s390.vlrl(i32 %{{.*}}, ptr readonly %{{.*}}) // CHECK-ASM: vlrlr vec_store_len_r(vuc, ptruc, 0); diff --git a/clang/test/CodeGen/X86/bitscan-builtins.c b/clang/test/CodeGen/X86/bitscan-builtins.c index a5a7808a82a234d..d75f7c1ab2f1ed4 100644 --- a/clang/test/CodeGen/X86/bitscan-builtins.c +++ b/clang/test/CodeGen/X86/bitscan-builtins.c @@ -1,9 +1,6 @@ // RUN: %clang_cc1 -x c -ffreestanding %s -triple=x86_64-unknown-unknown -emit-llvm -o - | FileCheck %s -// RUN: %clang_cc1 -x c++ -std=c++11 -ffreestanding %s -triple=x86_64-unknown-unknown -emit-llvm -o - | FileCheck %s - // PR33722 // RUN: %clang_cc1 -x c -ffreestanding %s -triple x86_64-unknown-unknown -fms-extensions -fms-compatibility-version=19.00 -emit-llvm -o - | FileCheck %s -// RUN: %clang_cc1 -x c++ -ffreestanding %s -triple x86_64-unknown-unknown -fms-extensions -fms-compatibility-version=19.00 -emit-llvm -o - | FileCheck %s #include diff --git a/clang/test/CodeGen/X86/bitscan-builtins.cpp b/clang/test/CodeGen/X86/bitscan-builtins.cpp new file mode 100644 index 000000000000000..ab8a0fa2f24773b --- /dev/null +++ b/clang/test/CodeGen/X86/bitscan-builtins.cpp @@ -0,0 +1,71 @@ +// RUN: %clang_cc1 -x c++ -std=c++11 -ffreestanding %s -triple=x86_64-unknown-unknown -emit-llvm -o - | FileCheck %s + +// PR33722 +// RUN: %clang_cc1 -x c++ -ffreestanding %s -triple x86_64-unknown-unknown -fms-extensions -fms-compatibility-version=19.00 -emit-llvm -o - | FileCheck %s + +#include + +int test_bit_scan_forward(int a) { +// CHECK-LABEL: test_bit_scan_forward +// CHECK: %[[call:.*]] = call noundef i32 @llvm.cttz.i32(i32 %{{.*}}, i1 true) +// CHECK: ret i32 %[[call]] + return _bit_scan_forward(a); +} + +int test_bit_scan_reverse(int a) { +// CHECK-LABEL: test_bit_scan_reverse +// CHECK: %[[call:.*]] = call i32 @llvm.ctlz.i32(i32 %{{.*}}, i1 true) +// CHECK: %[[sub:.*]] = sub nsw i32 31, %[[call]] +// CHECK: ret i32 %[[sub]] + return _bit_scan_reverse(a); +} + +int test__bsfd(int X) { +// CHECK-LABEL: test__bsfd +// CHECK: %[[call:.*]] = call noundef i32 @llvm.cttz.i32(i32 %{{.*}}, i1 true) + return __bsfd(X); +} + +int test__bsfq(long long X) { +// CHECK-LABEL: test__bsfq +// CHECK: %[[call:.*]] = call i64 @llvm.cttz.i64(i64 %{{.*}}, i1 true) + return __bsfq(X); +} + +int test__bsrd(int X) { +// CHECK-LABEL: test__bsrd +// CHECK: %[[call:.*]] = call i32 @llvm.ctlz.i32(i32 %{{.*}}, i1 true) +// CHECK: %[[sub:.*]] = sub nsw i32 31, %[[call]] + return __bsrd(X); +} + +int test__bsrq(long long X) { +// CHECK-LABEL: test__bsrq +// CHECK: %[[call:.*]] = call i64 @llvm.ctlz.i64(i64 %{{.*}}, i1 true) +// CHECK: %[[cast:.*]] = trunc i64 %[[call]] to i32 +// CHECK: %[[sub:.*]] = sub nsw i32 63, %[[cast]] + return __bsrq(X); +} + +// Test constexpr handling. +#if defined(__cplusplus) && (__cplusplus >= 201103L) + +char bsf_0[_bit_scan_forward(0x00000001) == 0 ? 1 : -1]; +char bsf_1[_bit_scan_forward(0x10000000) == 28 ? 1 : -1]; + +char bsr_0[_bit_scan_reverse(0x00000001) == 0 ? 1 : -1]; +char bsr_1[_bit_scan_reverse(0x01000000) == 24 ? 1 : -1]; + +char bsfd_0[__bsfd(0x00000008) == 3 ? 1 : -1]; +char bsfd_1[__bsfd(0x00010008) == 3 ? 1 : -1]; + +char bsrd_0[__bsrd(0x00000010) == 4 ? 1 : -1]; +char bsrd_1[__bsrd(0x00100100) == 20 ? 1 : -1]; + +char bsfq_0[__bsfq(0x0000000800000000ULL) == 35 ? 1 : -1]; +char bsfq_1[__bsfq(0x0004000000000000ULL) == 50 ? 1 : -1]; + +char bsrq_0[__bsrq(0x0000100800000000ULL) == 44 ? 1 : -1]; +char bsrq_1[__bsrq(0x0004000100000000ULL) == 50 ? 1 : -1]; + +#endif diff --git a/clang/test/CodeGen/X86/popcnt-builtins.c b/clang/test/CodeGen/X86/popcnt-builtins.c index e59ffaa031a6a10..b9bc666a1e28ad2 100644 --- a/clang/test/CodeGen/X86/popcnt-builtins.c +++ b/clang/test/CodeGen/X86/popcnt-builtins.c @@ -1,7 +1,5 @@ // RUN: %clang_cc1 -x c -ffreestanding %s -triple=x86_64-apple-darwin -target-feature +popcnt -emit-llvm -o - | FileCheck %s --check-prefixes=CHECK,CHECK-POPCNT -// RUN: %clang_cc1 -x c++ -std=c++11 -ffreestanding %s -triple=x86_64-apple-darwin -target-feature +popcnt -emit-llvm -o - | FileCheck %s --check-prefixes=CHECK,CHECK-POPCNT // RUN: %clang_cc1 -x c -ffreestanding %s -triple=x86_64-apple-darwin -emit-llvm -o - | FileCheck %s -// RUN: %clang_cc1 -x c++ -std=c++11 -ffreestanding %s -triple=x86_64-apple-darwin -emit-llvm -o - | FileCheck %s #include diff --git a/clang/test/CodeGen/X86/popcnt-builtins.cpp b/clang/test/CodeGen/X86/popcnt-builtins.cpp new file mode 100644 index 000000000000000..42996c547e0af9b --- /dev/null +++ b/clang/test/CodeGen/X86/popcnt-builtins.cpp @@ -0,0 +1,67 @@ +// RUN: %clang_cc1 -x c++ -std=c++11 -ffreestanding %s -triple=x86_64-apple-darwin -target-feature +popcnt -emit-llvm -o - | FileCheck %s --check-prefixes=CHECK,CHECK-POPCNT +// RUN: %clang_cc1 -x c++ -std=c++11 -ffreestanding %s -triple=x86_64-apple-darwin -emit-llvm -o - | FileCheck %s + +#include + +#ifdef __POPCNT__ +int test_mm_popcnt_u32(unsigned int __X) { + //CHECK-POPCNT: call noundef i32 @llvm.ctpop.i32 + return _mm_popcnt_u32(__X); +} +#endif + +int test_popcnt32(unsigned int __X) { + //CHECK: call noundef i32 @llvm.ctpop.i32 + return _popcnt32(__X); +} + +int test__popcntd(unsigned int __X) { + //CHECK: call noundef i32 @llvm.ctpop.i32 + return __popcntd(__X); +} + +#ifdef __x86_64__ +#ifdef __POPCNT__ +long long test_mm_popcnt_u64(unsigned long long __X) { + //CHECK-POPCNT: call i64 @llvm.ctpop.i64 + return _mm_popcnt_u64(__X); +} +#endif + +long long test_popcnt64(unsigned long long __X) { + //CHECK: call i64 @llvm.ctpop.i64 + return _popcnt64(__X); +} + +long long test__popcntq(unsigned long long __X) { + //CHECK: call i64 @llvm.ctpop.i64 + return __popcntq(__X); +} +#endif + +// Test constexpr handling. +#if defined(__cplusplus) && (__cplusplus >= 201103L) +#if defined(__POPCNT__) +char ctpop32_0[_mm_popcnt_u32(0x00000000) == 0 ? 1 : -1]; +char ctpop32_1[_mm_popcnt_u32(0x000000F0) == 4 ? 1 : -1]; +#endif + +char popcnt32_0[_popcnt32(0x00000000) == 0 ? 1 : -1]; +char popcnt32_1[_popcnt32(0x100000F0) == 5 ? 1 : -1]; + +char popcntd_0[__popcntd(0x00000000) == 0 ? 1 : -1]; +char popcntd_1[__popcntd(0x00F000F0) == 8 ? 1 : -1]; + +#ifdef __x86_64__ +#if defined(__POPCNT__) +char ctpop64_0[_mm_popcnt_u64(0x0000000000000000ULL) == 0 ? 1 : -1]; +char ctpop64_1[_mm_popcnt_u64(0xF000000000000001ULL) == 5 ? 1 : -1]; +#endif + +char popcnt64_0[_popcnt64(0x0000000000000000ULL) == 0 ? 1 : -1]; +char popcnt64_1[_popcnt64(0xF00000F000000001ULL) == 9 ? 1 : -1]; + +char popcntq_0[__popcntq(0x0000000000000000ULL) == 0 ? 1 : -1]; +char popcntq_1[__popcntq(0xF000010000300001ULL) == 8 ? 1 : -1]; +#endif +#endif diff --git a/clang/test/CodeGen/X86/rot-intrinsics.c b/clang/test/CodeGen/X86/rot-intrinsics.c index f8c78119a1c4a77..70bda329ce860c9 100644 --- a/clang/test/CodeGen/X86/rot-intrinsics.c +++ b/clang/test/CodeGen/X86/rot-intrinsics.c @@ -5,13 +5,6 @@ // RUN: %clang_cc1 -x c -fms-extensions -fms-compatibility -fms-compatibility-version=17.00 -ffreestanding %s -triple=i686-windows-msvc -target-feature +sse2 -emit-llvm -o - -Wall -Werror | FileCheck %s --check-prefixes CHECK,CHECK-32BIT-LONG // RUN: %clang_cc1 -x c -fms-extensions -fms-compatibility -fms-compatibility-version=17.00 -ffreestanding %s -triple=x86_64-windows-msvc -target-feature +sse2 -emit-llvm -o - -Wall -Werror | FileCheck %s --check-prefixes CHECK,CHECK-32BIT-LONG -// RUN: %clang_cc1 -x c++ -std=c++11 -ffreestanding -triple i686--linux -emit-llvm %s -o - | FileCheck %s --check-prefixes CHECK,CHECK-32BIT-LONG -// RUN: %clang_cc1 -x c++ -std=c++11 -ffreestanding -triple x86_64--linux -emit-llvm %s -o - | FileCheck %s --check-prefixes CHECK,CHECK-64BIT-LONG -// RUN: %clang_cc1 -x c++ -std=c++11 -fms-extensions -fms-compatibility -ffreestanding %s -triple=i686-windows-msvc -target-feature +sse2 -emit-llvm -o - -Wall -Werror | FileCheck %s --check-prefixes CHECK,CHECK-32BIT-LONG -// RUN: %clang_cc1 -x c++ -std=c++11 -fms-extensions -fms-compatibility -ffreestanding %s -triple=x86_64-windows-msvc -target-feature +sse2 -emit-llvm -o - -Wall -Werror | FileCheck %s --check-prefixes CHECK,CHECK-32BIT-LONG -// RUN: %clang_cc1 -x c++ -std=c++11 -fms-extensions -fms-compatibility -fms-compatibility-version=17.00 -ffreestanding %s -triple=i686-windows-msvc -target-feature +sse2 -emit-llvm -o - -Wall -Werror | FileCheck %s --check-prefixes CHECK,CHECK-32BIT-LONG -// RUN: %clang_cc1 -x c++ -std=c++11 -fms-extensions -fms-compatibility -fms-compatibility-version=17.00 -ffreestanding %s -triple=x86_64-windows-msvc -target-feature +sse2 -emit-llvm -o - -Wall -Werror | FileCheck %s --check-prefixes CHECK,CHECK-32BIT-LONG - #include unsigned char test__rolb(unsigned char value, int shift) { diff --git a/clang/test/CodeGen/X86/rot-intrinsics.cpp b/clang/test/CodeGen/X86/rot-intrinsics.cpp new file mode 100644 index 000000000000000..90afa91f335e83d --- /dev/null +++ b/clang/test/CodeGen/X86/rot-intrinsics.cpp @@ -0,0 +1,165 @@ +// RUN: %clang_cc1 -x c++ -std=c++11 -ffreestanding -triple i686--linux -emit-llvm %s -o - | FileCheck %s --check-prefixes CHECK,CHECK-32BIT-LONG-NO-COMPAT17 +// RUN: %clang_cc1 -x c++ -std=c++11 -ffreestanding -triple x86_64--linux -emit-llvm %s -o - | FileCheck %s --check-prefixes CHECK,CHECK-64BIT-LONG +// RUN: %clang_cc1 -x c++ -std=c++11 -fms-extensions -fms-compatibility -ffreestanding %s -triple=i686-windows-msvc -target-feature +sse2 -emit-llvm -o - -Wall -Werror | FileCheck %s --check-prefixes CHECK,CHECK-32BIT-LONG-NO-COMPAT17 +// RUN: %clang_cc1 -x c++ -std=c++11 -fms-extensions -fms-compatibility -ffreestanding %s -triple=x86_64-windows-msvc -target-feature +sse2 -emit-llvm -o - -Wall -Werror | FileCheck %s --check-prefixes CHECK,CHECK-32BIT-LONG-NO-COMPAT17 +// RUN: %clang_cc1 -x c++ -std=c++11 -fms-extensions -fms-compatibility -fms-compatibility-version=17.00 -ffreestanding %s -triple=i686-windows-msvc -target-feature +sse2 -emit-llvm -o - -Wall -Werror | FileCheck %s --check-prefixes CHECK,CHECK-32BIT-LONG-COMPAT17 +// RUN: %clang_cc1 -x c++ -std=c++11 -fms-extensions -fms-compatibility -fms-compatibility-version=17.00 -ffreestanding %s -triple=x86_64-windows-msvc -target-feature +sse2 -emit-llvm -o - -Wall -Werror | FileCheck %s --check-prefixes CHECK,CHECK-32BIT-LONG-COMPAT17 + +#include + +unsigned char test__rolb(unsigned char value, int shift) { +// CHECK-LABEL: test__rolb +// CHECK: [[R:%.*]] = call noundef i8 @llvm.fshl.i8(i8 [[X:%.*]], i8 [[X]], i8 [[Y:%.*]]) +// CHECK: ret i8 [[R]] + return __rolb(value, shift); +} + +unsigned short test__rolw(unsigned short value, int shift) { +// CHECK-LABEL: test__rolw +// CHECK: [[R:%.*]] = call noundef i16 @llvm.fshl.i16(i16 [[X:%.*]], i16 [[X]], i16 [[Y:%.*]]) +// CHECK: ret i16 [[R]] + return __rolw(value, shift); +} + +unsigned int test__rold(unsigned int value, int shift) { +// CHECK-LABEL: test__rold +// CHECK: [[R:%.*]] = call noundef i32 @llvm.fshl.i32(i32 [[X:%.*]], i32 [[X]], i32 [[Y:%.*]]) +// CHECK: ret i32 [[R]] + return __rold(value, shift); +} + +#if defined(__x86_64__) +unsigned long test__rolq(unsigned long value, int shift) { +// CHECK-LONG-LABEL: test__rolq +// CHECK-LONG: [[R:%.*]] = call i64 @llvm.fshl.i64(i64 [[X:%.*]], i64 [[X]], i64 [[Y:%.*]]) +// CHECK-LONG: ret i64 [[R]] + return __rolq(value, shift); +} +#endif + +unsigned char test__rorb(unsigned char value, int shift) { +// CHECK-LABEL: test__rorb +// CHECK: [[R:%.*]] = call noundef i8 @llvm.fshr.i8(i8 [[X:%.*]], i8 [[X]], i8 [[Y:%.*]]) +// CHECK: ret i8 [[R]] + return __rorb(value, shift); +} + +unsigned short test__rorw(unsigned short value, int shift) { +// CHECK-LABEL: test__rorw +// CHECK: [[R:%.*]] = call noundef i16 @llvm.fshr.i16(i16 [[X:%.*]], i16 [[X]], i16 [[Y:%.*]]) +// CHECK: ret i16 [[R]] + return __rorw(value, shift); +} + +unsigned int test__rord(unsigned int value, int shift) { +// CHECK-LABEL: test__rord +// CHECK: [[R:%.*]] = call noundef i32 @llvm.fshr.i32(i32 [[X:%.*]], i32 [[X]], i32 [[Y:%.*]]) +// CHECK: ret i32 [[R]] + return __rord(value, shift); +} + +#if defined(__x86_64__) +unsigned long test__rorq(unsigned long value, int shift) { +// CHECK-LONG-LABEL: test__rorq +// CHECK-LONG: [[R:%.*]] = call i64 @llvm.fshr.i64(i64 [[X:%.*]], i64 [[X]], i64 [[Y:%.*]]) +// CHECK-LONG: ret i64 [[R]] + return __rorq(value, shift); +} +#endif + +unsigned short test_rotwl(unsigned short value, int shift) { +// CHECK-LABEL: test_rotwl +// CHECK: [[R:%.*]] = call noundef i16 @llvm.fshl.i16(i16 [[X:%.*]], i16 [[X]], i16 [[Y:%.*]]) +// CHECK: ret i16 [[R]] + return _rotwl(value, shift); +} + +unsigned int test_rotl(unsigned int value, int shift) { +// CHECK-32BIT-LONG-COMPAT17-LABEL: test_rotl +// CHECK-32BIT-LONG-COMPAT17: [[R:%.*]] = call i32 @llvm.fshl.i32(i32 [[X:%.*]], i32 [[X]], i32 [[Y:%.*]]) +// CHECK-32BIT-LONG-COMPAT17: ret i32 [[R]] +// +// CHECK-32BIT-LONG-NO-COMPAT17-LABEL: test_rotl +// CHECK-32BIT-LONG-NO-COMPAT17: [[R:%.*]] = call noundef i32 @llvm.fshl.i32(i32 [[X:%.*]], i32 [[X]], i32 [[Y:%.*]]) +// CHECK-32BIT-LONG-NO-COMPAT17: ret i32 [[R]] + return _rotl(value, shift); +} + +unsigned long test_lrotl(unsigned long value, int shift) { +// CHECK-32BIT-LONG-COMPAT17-LABEL: test_lrotl +// CHECK-32BIT-LONG-COMPAT17: [[R:%.*]] = call i32 @llvm.fshl.i32(i32 [[X:%.*]], i32 [[X]], i32 [[Y:%.*]]) +// CHECK-32BIT-LONG-COMPAT17: ret i32 [[R]] +// +// CHECK-32BIT-LONG-NO-COMPAT17-LABEL: test_lrotl +// CHECK-32BIT-LONG-NO-COMPAT17: [[R:%.*]] = call noundef i32 @llvm.fshl.i32(i32 [[X:%.*]], i32 [[X]], i32 [[Y:%.*]]) +// CHECK-32BIT-LONG-NO-COMPAT17: ret i32 [[R]] +// +// CHECK-64BIT-LONG-LABEL: test_lrotl +// CHECK-64BIT-LONG: [[R:%.*]] = call noundef i64 @llvm.fshl.i64(i64 [[X:%.*]], i64 [[X]], i64 [[Y:%.*]]) +// CHECK-64BIT-LONG: ret i64 [[R]] + return _lrotl(value, shift); +} + + +unsigned short test_ro... 
@llvmbot
Copy link
Member

llvmbot commented Sep 12, 2023

@llvm/pr-subscribers-clang

Changes
  • Add some additional tests for progagating attributes before inlining; NFC
  • Regen checks for old test; NFC
  • Split some tests for c/cpp; NFC
  • Use "best" ret attribute when propagating attributes during inlining
  • Also propagate noundef and align ret attributes during inlining
  • Propagate callee function memory access attributes before inlining
  • Propagate callee argument memory access attributes before inlining
  • Propagate some additional callee argument attributes before inlining
  • Propagate some callee function attributes to callsites before inlining

--

Patch is 373.25 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/66036.diff

45 Files Affected:

  • (modified) clang/test/CodeGen/SystemZ/builtins-systemz-zvector.c (+9-9)
  • (modified) clang/test/CodeGen/SystemZ/builtins-systemz-zvector2.c (+4-4)
  • (modified) clang/test/CodeGen/X86/bitscan-builtins.c (-3)
  • (added) clang/test/CodeGen/X86/bitscan-builtins.cpp (+71)
  • (modified) clang/test/CodeGen/X86/popcnt-builtins.c (-2)
  • (added) clang/test/CodeGen/X86/popcnt-builtins.cpp (+67)
  • (modified) clang/test/CodeGen/X86/rot-intrinsics.c (-7)
  • (added) clang/test/CodeGen/X86/rot-intrinsics.cpp (+165)
  • (modified) clang/test/CodeGen/X86/x86-bswap.c (-1)
  • (added) clang/test/CodeGen/X86/x86-bswap.cpp (+44)
  • (modified) clang/test/CodeGen/aarch64-ls64.c (+3-3)
  • (modified) clang/test/CodeGen/aarch64-sme-intrinsics/aarch64-sme-attrs.cpp (+5-5)
  • (modified) clang/test/CodeGen/fp-contract-fast-pragma.cpp (+1-1)
  • (modified) clang/test/CodeGen/fp-contract-on-pragma.cpp (+1-1)
  • (modified) clang/test/CodeGen/fp-contract-pragma.cpp (+1-1)
  • (modified) clang/test/CodeGenCUDA/cuda-builtin-vars.cu (+12-12)
  • (modified) clang/test/Headers/__clang_hip_cmath.hip (+10-10)
  • (modified) clang/test/Headers/__clang_hip_math.hip (+524-524)
  • (modified) clang/test/Headers/__clang_hip_math_ocml_rounded_ops.hip (+44-44)
  • (modified) clang/test/Headers/amdgcn_openmp_device_math.c (+12-12)
  • (modified) clang/test/Headers/amdgcn_openmp_device_math_constexpr.cpp (+12-12)
  • (modified) clang/test/Headers/hip-header.hip (+5-5)
  • (modified) clang/test/Headers/nvptx_device_cmath_functions.cpp (+5-5)
  • (modified) clang/test/Headers/nvptx_device_cmath_functions_cxx17.cpp (+5-5)
  • (modified) clang/test/Headers/nvptx_device_math_functions.c (+12-7)
  • (modified) clang/test/Headers/nvptx_device_math_functions.cpp (+5-5)
  • (modified) clang/test/Headers/nvptx_device_math_functions_cxx17.cpp (+5-5)
  • (modified) clang/test/Headers/nvptx_device_math_modf.cpp (+4-4)
  • (modified) clang/test/Headers/nvptx_device_math_sin.cpp (+4-4)
  • (modified) clang/test/Headers/nvptx_device_math_sin_cos.cpp (+6-6)
  • (modified) clang/test/Headers/openmp_device_math_isnan.cpp (+4-4)
  • (modified) clang/test/OpenMP/bug57757.cpp (+1-1)
  • (modified) llvm/include/llvm/Support/ModRef.h (+7)
  • (modified) llvm/lib/Transforms/Utils/InlineFunction.cpp (+187)
  • (added) llvm/test/Transforms/Inline/access-attributes-prop.ll (+498)
  • (modified) llvm/test/Transforms/Inline/assumptions-from-callsite-attrs.ll (+1-1)
  • (modified) llvm/test/Transforms/Inline/byval.ll (+122-73)
  • (modified) llvm/test/Transforms/Inline/memprof_inline.ll (+2-2)
  • (modified) llvm/test/Transforms/Inline/noalias-calls-always.ll (+6-6)
  • (modified) llvm/test/Transforms/Inline/noalias-calls.ll (+10-10)
  • (added) llvm/test/Transforms/Inline/ret_attr_align_and_noundef.ll (+171)
  • (modified) llvm/test/Transforms/Inline/ret_attr_update.ll (+1-1)
  • (modified) llvm/test/Transforms/SampleProfile/norepeated-icp-3.ll (+1-1)
  • (modified) llvm/test/Transforms/SampleProfile/norepeated-icp-4.ll (+1-1)
  • (modified) llvm/test/Transforms/SampleProfile/uniqname.ll (+3-3)
diff --git a/clang/test/CodeGen/SystemZ/builtins-systemz-zvector.c b/clang/test/CodeGen/SystemZ/builtins-systemz-zvector.c index 44f8cbe2cc01739..642b08ac68ef122 100644 --- a/clang/test/CodeGen/SystemZ/builtins-systemz-zvector.c +++ b/clang/test/CodeGen/SystemZ/builtins-systemz-zvector.c @@ -636,31 +636,31 @@ void test_core(void) { // CHECK-ASM: vlbb vsc = vec_load_len(cptrsc, idx); - // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr %{{.*}}) + // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr readonly %{{.*}}) // CHECK-ASM: vll vuc = vec_load_len(cptruc, idx); - // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr %{{.*}}) + // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr readonly %{{.*}}) // CHECK-ASM: vll vss = vec_load_len(cptrss, idx); - // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr %{{.*}}) + // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr readonly %{{.*}}) // CHECK-ASM: vll vus = vec_load_len(cptrus, idx); - // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr %{{.*}}) + // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr readonly %{{.*}}) // CHECK-ASM: vll vsi = vec_load_len(cptrsi, idx); - // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr %{{.*}}) + // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr readonly %{{.*}}) // CHECK-ASM: vll vui = vec_load_len(cptrui, idx); - // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr %{{.*}}) + // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr readonly %{{.*}}) // CHECK-ASM: vll vsl = vec_load_len(cptrsl, idx); - // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr %{{.*}}) + // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr readonly %{{.*}}) // CHECK-ASM: vll vul = vec_load_len(cptrul, idx); - // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr %{{.*}}) + // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr readonly %{{.*}}) // CHECK-ASM: vll vd = vec_load_len(cptrd, idx); - // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr %{{.*}}) + // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr readonly %{{.*}}) // CHECK-ASM: vll vec_store_len(vsc, ptrsc, idx); diff --git a/clang/test/CodeGen/SystemZ/builtins-systemz-zvector2.c b/clang/test/CodeGen/SystemZ/builtins-systemz-zvector2.c index 416ca0ddd1b4fe2..3f02565dfb488ce 100644 --- a/clang/test/CodeGen/SystemZ/builtins-systemz-zvector2.c +++ b/clang/test/CodeGen/SystemZ/builtins-systemz-zvector2.c @@ -207,10 +207,10 @@ void test_core(void) { // CHECK-ASM: vlbb vf = vec_load_len(cptrf, idx); - // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr %{{.*}}) + // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr readonly %{{.*}}) // CHECK-ASM: vll vd = vec_load_len(cptrd, idx); - // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr %{{.*}}) + // CHECK: call <16 x i8> @llvm.s390.vll(i32 %{{.*}}, ptr readonly %{{.*}}) // CHECK-ASM: vll vec_store_len(vf, ptrf, idx); @@ -221,10 +221,10 @@ void test_core(void) { // CHECK-ASM: vstl vuc = vec_load_len_r(cptruc, 0); - // CHECK: call <16 x i8> @llvm.s390.vlrl(i32 0, ptr %{{.*}}) + // CHECK: call <16 x i8> @llvm.s390.vlrl(i32 0, ptr readonly %{{.*}}) // CHECK-ASM: vlrl %{{.*}}, 0(%{{.*}}), 0 vuc = vec_load_len_r(cptruc, idx); - // CHECK: call <16 x i8> @llvm.s390.vlrl(i32 %{{.*}}, ptr %{{.*}}) + // CHECK: call <16 x i8> @llvm.s390.vlrl(i32 %{{.*}}, ptr readonly %{{.*}}) // CHECK-ASM: vlrlr vec_store_len_r(vuc, ptruc, 0); diff --git a/clang/test/CodeGen/X86/bitscan-builtins.c b/clang/test/CodeGen/X86/bitscan-builtins.c index a5a7808a82a234d..d75f7c1ab2f1ed4 100644 --- a/clang/test/CodeGen/X86/bitscan-builtins.c +++ b/clang/test/CodeGen/X86/bitscan-builtins.c @@ -1,9 +1,6 @@ // RUN: %clang_cc1 -x c -ffreestanding %s -triple=x86_64-unknown-unknown -emit-llvm -o - | FileCheck %s -// RUN: %clang_cc1 -x c++ -std=c++11 -ffreestanding %s -triple=x86_64-unknown-unknown -emit-llvm -o - | FileCheck %s - // PR33722 // RUN: %clang_cc1 -x c -ffreestanding %s -triple x86_64-unknown-unknown -fms-extensions -fms-compatibility-version=19.00 -emit-llvm -o - | FileCheck %s -// RUN: %clang_cc1 -x c++ -ffreestanding %s -triple x86_64-unknown-unknown -fms-extensions -fms-compatibility-version=19.00 -emit-llvm -o - | FileCheck %s #include diff --git a/clang/test/CodeGen/X86/bitscan-builtins.cpp b/clang/test/CodeGen/X86/bitscan-builtins.cpp new file mode 100644 index 000000000000000..ab8a0fa2f24773b --- /dev/null +++ b/clang/test/CodeGen/X86/bitscan-builtins.cpp @@ -0,0 +1,71 @@ +// RUN: %clang_cc1 -x c++ -std=c++11 -ffreestanding %s -triple=x86_64-unknown-unknown -emit-llvm -o - | FileCheck %s + +// PR33722 +// RUN: %clang_cc1 -x c++ -ffreestanding %s -triple x86_64-unknown-unknown -fms-extensions -fms-compatibility-version=19.00 -emit-llvm -o - | FileCheck %s + +#include + +int test_bit_scan_forward(int a) { +// CHECK-LABEL: test_bit_scan_forward +// CHECK: %[[call:.*]] = call noundef i32 @llvm.cttz.i32(i32 %{{.*}}, i1 true) +// CHECK: ret i32 %[[call]] + return _bit_scan_forward(a); +} + +int test_bit_scan_reverse(int a) { +// CHECK-LABEL: test_bit_scan_reverse +// CHECK: %[[call:.*]] = call i32 @llvm.ctlz.i32(i32 %{{.*}}, i1 true) +// CHECK: %[[sub:.*]] = sub nsw i32 31, %[[call]] +// CHECK: ret i32 %[[sub]] + return _bit_scan_reverse(a); +} + +int test__bsfd(int X) { +// CHECK-LABEL: test__bsfd +// CHECK: %[[call:.*]] = call noundef i32 @llvm.cttz.i32(i32 %{{.*}}, i1 true) + return __bsfd(X); +} + +int test__bsfq(long long X) { +// CHECK-LABEL: test__bsfq +// CHECK: %[[call:.*]] = call i64 @llvm.cttz.i64(i64 %{{.*}}, i1 true) + return __bsfq(X); +} + +int test__bsrd(int X) { +// CHECK-LABEL: test__bsrd +// CHECK: %[[call:.*]] = call i32 @llvm.ctlz.i32(i32 %{{.*}}, i1 true) +// CHECK: %[[sub:.*]] = sub nsw i32 31, %[[call]] + return __bsrd(X); +} + +int test__bsrq(long long X) { +// CHECK-LABEL: test__bsrq +// CHECK: %[[call:.*]] = call i64 @llvm.ctlz.i64(i64 %{{.*}}, i1 true) +// CHECK: %[[cast:.*]] = trunc i64 %[[call]] to i32 +// CHECK: %[[sub:.*]] = sub nsw i32 63, %[[cast]] + return __bsrq(X); +} + +// Test constexpr handling. +#if defined(__cplusplus) && (__cplusplus >= 201103L) + +char bsf_0[_bit_scan_forward(0x00000001) == 0 ? 1 : -1]; +char bsf_1[_bit_scan_forward(0x10000000) == 28 ? 1 : -1]; + +char bsr_0[_bit_scan_reverse(0x00000001) == 0 ? 1 : -1]; +char bsr_1[_bit_scan_reverse(0x01000000) == 24 ? 1 : -1]; + +char bsfd_0[__bsfd(0x00000008) == 3 ? 1 : -1]; +char bsfd_1[__bsfd(0x00010008) == 3 ? 1 : -1]; + +char bsrd_0[__bsrd(0x00000010) == 4 ? 1 : -1]; +char bsrd_1[__bsrd(0x00100100) == 20 ? 1 : -1]; + +char bsfq_0[__bsfq(0x0000000800000000ULL) == 35 ? 1 : -1]; +char bsfq_1[__bsfq(0x0004000000000000ULL) == 50 ? 1 : -1]; + +char bsrq_0[__bsrq(0x0000100800000000ULL) == 44 ? 1 : -1]; +char bsrq_1[__bsrq(0x0004000100000000ULL) == 50 ? 1 : -1]; + +#endif diff --git a/clang/test/CodeGen/X86/popcnt-builtins.c b/clang/test/CodeGen/X86/popcnt-builtins.c index e59ffaa031a6a10..b9bc666a1e28ad2 100644 --- a/clang/test/CodeGen/X86/popcnt-builtins.c +++ b/clang/test/CodeGen/X86/popcnt-builtins.c @@ -1,7 +1,5 @@ // RUN: %clang_cc1 -x c -ffreestanding %s -triple=x86_64-apple-darwin -target-feature +popcnt -emit-llvm -o - | FileCheck %s --check-prefixes=CHECK,CHECK-POPCNT -// RUN: %clang_cc1 -x c++ -std=c++11 -ffreestanding %s -triple=x86_64-apple-darwin -target-feature +popcnt -emit-llvm -o - | FileCheck %s --check-prefixes=CHECK,CHECK-POPCNT // RUN: %clang_cc1 -x c -ffreestanding %s -triple=x86_64-apple-darwin -emit-llvm -o - | FileCheck %s -// RUN: %clang_cc1 -x c++ -std=c++11 -ffreestanding %s -triple=x86_64-apple-darwin -emit-llvm -o - | FileCheck %s #include diff --git a/clang/test/CodeGen/X86/popcnt-builtins.cpp b/clang/test/CodeGen/X86/popcnt-builtins.cpp new file mode 100644 index 000000000000000..42996c547e0af9b --- /dev/null +++ b/clang/test/CodeGen/X86/popcnt-builtins.cpp @@ -0,0 +1,67 @@ +// RUN: %clang_cc1 -x c++ -std=c++11 -ffreestanding %s -triple=x86_64-apple-darwin -target-feature +popcnt -emit-llvm -o - | FileCheck %s --check-prefixes=CHECK,CHECK-POPCNT +// RUN: %clang_cc1 -x c++ -std=c++11 -ffreestanding %s -triple=x86_64-apple-darwin -emit-llvm -o - | FileCheck %s + +#include + +#ifdef __POPCNT__ +int test_mm_popcnt_u32(unsigned int __X) { + //CHECK-POPCNT: call noundef i32 @llvm.ctpop.i32 + return _mm_popcnt_u32(__X); +} +#endif + +int test_popcnt32(unsigned int __X) { + //CHECK: call noundef i32 @llvm.ctpop.i32 + return _popcnt32(__X); +} + +int test__popcntd(unsigned int __X) { + //CHECK: call noundef i32 @llvm.ctpop.i32 + return __popcntd(__X); +} + +#ifdef __x86_64__ +#ifdef __POPCNT__ +long long test_mm_popcnt_u64(unsigned long long __X) { + //CHECK-POPCNT: call i64 @llvm.ctpop.i64 + return _mm_popcnt_u64(__X); +} +#endif + +long long test_popcnt64(unsigned long long __X) { + //CHECK: call i64 @llvm.ctpop.i64 + return _popcnt64(__X); +} + +long long test__popcntq(unsigned long long __X) { + //CHECK: call i64 @llvm.ctpop.i64 + return __popcntq(__X); +} +#endif + +// Test constexpr handling. +#if defined(__cplusplus) && (__cplusplus >= 201103L) +#if defined(__POPCNT__) +char ctpop32_0[_mm_popcnt_u32(0x00000000) == 0 ? 1 : -1]; +char ctpop32_1[_mm_popcnt_u32(0x000000F0) == 4 ? 1 : -1]; +#endif + +char popcnt32_0[_popcnt32(0x00000000) == 0 ? 1 : -1]; +char popcnt32_1[_popcnt32(0x100000F0) == 5 ? 1 : -1]; + +char popcntd_0[__popcntd(0x00000000) == 0 ? 1 : -1]; +char popcntd_1[__popcntd(0x00F000F0) == 8 ? 1 : -1]; + +#ifdef __x86_64__ +#if defined(__POPCNT__) +char ctpop64_0[_mm_popcnt_u64(0x0000000000000000ULL) == 0 ? 1 : -1]; +char ctpop64_1[_mm_popcnt_u64(0xF000000000000001ULL) == 5 ? 1 : -1]; +#endif + +char popcnt64_0[_popcnt64(0x0000000000000000ULL) == 0 ? 1 : -1]; +char popcnt64_1[_popcnt64(0xF00000F000000001ULL) == 9 ? 1 : -1]; + +char popcntq_0[__popcntq(0x0000000000000000ULL) == 0 ? 1 : -1]; +char popcntq_1[__popcntq(0xF000010000300001ULL) == 8 ? 1 : -1]; +#endif +#endif diff --git a/clang/test/CodeGen/X86/rot-intrinsics.c b/clang/test/CodeGen/X86/rot-intrinsics.c index f8c78119a1c4a77..70bda329ce860c9 100644 --- a/clang/test/CodeGen/X86/rot-intrinsics.c +++ b/clang/test/CodeGen/X86/rot-intrinsics.c @@ -5,13 +5,6 @@ // RUN: %clang_cc1 -x c -fms-extensions -fms-compatibility -fms-compatibility-version=17.00 -ffreestanding %s -triple=i686-windows-msvc -target-feature +sse2 -emit-llvm -o - -Wall -Werror | FileCheck %s --check-prefixes CHECK,CHECK-32BIT-LONG // RUN: %clang_cc1 -x c -fms-extensions -fms-compatibility -fms-compatibility-version=17.00 -ffreestanding %s -triple=x86_64-windows-msvc -target-feature +sse2 -emit-llvm -o - -Wall -Werror | FileCheck %s --check-prefixes CHECK,CHECK-32BIT-LONG -// RUN: %clang_cc1 -x c++ -std=c++11 -ffreestanding -triple i686--linux -emit-llvm %s -o - | FileCheck %s --check-prefixes CHECK,CHECK-32BIT-LONG -// RUN: %clang_cc1 -x c++ -std=c++11 -ffreestanding -triple x86_64--linux -emit-llvm %s -o - | FileCheck %s --check-prefixes CHECK,CHECK-64BIT-LONG -// RUN: %clang_cc1 -x c++ -std=c++11 -fms-extensions -fms-compatibility -ffreestanding %s -triple=i686-windows-msvc -target-feature +sse2 -emit-llvm -o - -Wall -Werror | FileCheck %s --check-prefixes CHECK,CHECK-32BIT-LONG -// RUN: %clang_cc1 -x c++ -std=c++11 -fms-extensions -fms-compatibility -ffreestanding %s -triple=x86_64-windows-msvc -target-feature +sse2 -emit-llvm -o - -Wall -Werror | FileCheck %s --check-prefixes CHECK,CHECK-32BIT-LONG -// RUN: %clang_cc1 -x c++ -std=c++11 -fms-extensions -fms-compatibility -fms-compatibility-version=17.00 -ffreestanding %s -triple=i686-windows-msvc -target-feature +sse2 -emit-llvm -o - -Wall -Werror | FileCheck %s --check-prefixes CHECK,CHECK-32BIT-LONG -// RUN: %clang_cc1 -x c++ -std=c++11 -fms-extensions -fms-compatibility -fms-compatibility-version=17.00 -ffreestanding %s -triple=x86_64-windows-msvc -target-feature +sse2 -emit-llvm -o - -Wall -Werror | FileCheck %s --check-prefixes CHECK,CHECK-32BIT-LONG - #include unsigned char test__rolb(unsigned char value, int shift) { diff --git a/clang/test/CodeGen/X86/rot-intrinsics.cpp b/clang/test/CodeGen/X86/rot-intrinsics.cpp new file mode 100644 index 000000000000000..90afa91f335e83d --- /dev/null +++ b/clang/test/CodeGen/X86/rot-intrinsics.cpp @@ -0,0 +1,165 @@ +// RUN: %clang_cc1 -x c++ -std=c++11 -ffreestanding -triple i686--linux -emit-llvm %s -o - | FileCheck %s --check-prefixes CHECK,CHECK-32BIT-LONG-NO-COMPAT17 +// RUN: %clang_cc1 -x c++ -std=c++11 -ffreestanding -triple x86_64--linux -emit-llvm %s -o - | FileCheck %s --check-prefixes CHECK,CHECK-64BIT-LONG +// RUN: %clang_cc1 -x c++ -std=c++11 -fms-extensions -fms-compatibility -ffreestanding %s -triple=i686-windows-msvc -target-feature +sse2 -emit-llvm -o - -Wall -Werror | FileCheck %s --check-prefixes CHECK,CHECK-32BIT-LONG-NO-COMPAT17 +// RUN: %clang_cc1 -x c++ -std=c++11 -fms-extensions -fms-compatibility -ffreestanding %s -triple=x86_64-windows-msvc -target-feature +sse2 -emit-llvm -o - -Wall -Werror | FileCheck %s --check-prefixes CHECK,CHECK-32BIT-LONG-NO-COMPAT17 +// RUN: %clang_cc1 -x c++ -std=c++11 -fms-extensions -fms-compatibility -fms-compatibility-version=17.00 -ffreestanding %s -triple=i686-windows-msvc -target-feature +sse2 -emit-llvm -o - -Wall -Werror | FileCheck %s --check-prefixes CHECK,CHECK-32BIT-LONG-COMPAT17 +// RUN: %clang_cc1 -x c++ -std=c++11 -fms-extensions -fms-compatibility -fms-compatibility-version=17.00 -ffreestanding %s -triple=x86_64-windows-msvc -target-feature +sse2 -emit-llvm -o - -Wall -Werror | FileCheck %s --check-prefixes CHECK,CHECK-32BIT-LONG-COMPAT17 + +#include + +unsigned char test__rolb(unsigned char value, int shift) { +// CHECK-LABEL: test__rolb +// CHECK: [[R:%.*]] = call noundef i8 @llvm.fshl.i8(i8 [[X:%.*]], i8 [[X]], i8 [[Y:%.*]]) +// CHECK: ret i8 [[R]] + return __rolb(value, shift); +} + +unsigned short test__rolw(unsigned short value, int shift) { +// CHECK-LABEL: test__rolw +// CHECK: [[R:%.*]] = call noundef i16 @llvm.fshl.i16(i16 [[X:%.*]], i16 [[X]], i16 [[Y:%.*]]) +// CHECK: ret i16 [[R]] + return __rolw(value, shift); +} + +unsigned int test__rold(unsigned int value, int shift) { +// CHECK-LABEL: test__rold +// CHECK: [[R:%.*]] = call noundef i32 @llvm.fshl.i32(i32 [[X:%.*]], i32 [[X]], i32 [[Y:%.*]]) +// CHECK: ret i32 [[R]] + return __rold(value, shift); +} + +#if defined(__x86_64__) +unsigned long test__rolq(unsigned long value, int shift) { +// CHECK-LONG-LABEL: test__rolq +// CHECK-LONG: [[R:%.*]] = call i64 @llvm.fshl.i64(i64 [[X:%.*]], i64 [[X]], i64 [[Y:%.*]]) +// CHECK-LONG: ret i64 [[R]] + return __rolq(value, shift); +} +#endif + +unsigned char test__rorb(unsigned char value, int shift) { +// CHECK-LABEL: test__rorb +// CHECK: [[R:%.*]] = call noundef i8 @llvm.fshr.i8(i8 [[X:%.*]], i8 [[X]], i8 [[Y:%.*]]) +// CHECK: ret i8 [[R]] + return __rorb(value, shift); +} + +unsigned short test__rorw(unsigned short value, int shift) { +// CHECK-LABEL: test__rorw +// CHECK: [[R:%.*]] = call noundef i16 @llvm.fshr.i16(i16 [[X:%.*]], i16 [[X]], i16 [[Y:%.*]]) +// CHECK: ret i16 [[R]] + return __rorw(value, shift); +} + +unsigned int test__rord(unsigned int value, int shift) { +// CHECK-LABEL: test__rord +// CHECK: [[R:%.*]] = call noundef i32 @llvm.fshr.i32(i32 [[X:%.*]], i32 [[X]], i32 [[Y:%.*]]) +// CHECK: ret i32 [[R]] + return __rord(value, shift); +} + +#if defined(__x86_64__) +unsigned long test__rorq(unsigned long value, int shift) { +// CHECK-LONG-LABEL: test__rorq +// CHECK-LONG: [[R:%.*]] = call i64 @llvm.fshr.i64(i64 [[X:%.*]], i64 [[X]], i64 [[Y:%.*]]) +// CHECK-LONG: ret i64 [[R]] + return __rorq(value, shift); +} +#endif + +unsigned short test_rotwl(unsigned short value, int shift) { +// CHECK-LABEL: test_rotwl +// CHECK: [[R:%.*]] = call noundef i16 @llvm.fshl.i16(i16 [[X:%.*]], i16 [[X]], i16 [[Y:%.*]]) +// CHECK: ret i16 [[R]] + return _rotwl(value, shift); +} + +unsigned int test_rotl(unsigned int value, int shift) { +// CHECK-32BIT-LONG-COMPAT17-LABEL: test_rotl +// CHECK-32BIT-LONG-COMPAT17: [[R:%.*]] = call i32 @llvm.fshl.i32(i32 [[X:%.*]], i32 [[X]], i32 [[Y:%.*]]) +// CHECK-32BIT-LONG-COMPAT17: ret i32 [[R]] +// +// CHECK-32BIT-LONG-NO-COMPAT17-LABEL: test_rotl +// CHECK-32BIT-LONG-NO-COMPAT17: [[R:%.*]] = call noundef i32 @llvm.fshl.i32(i32 [[X:%.*]], i32 [[X]], i32 [[Y:%.*]]) +// CHECK-32BIT-LONG-NO-COMPAT17: ret i32 [[R]] + return _rotl(value, shift); +} + +unsigned long test_lrotl(unsigned long value, int shift) { +// CHECK-32BIT-LONG-COMPAT17-LABEL: test_lrotl +// CHECK-32BIT-LONG-COMPAT17: [[R:%.*]] = call i32 @llvm.fshl.i32(i32 [[X:%.*]], i32 [[X]], i32 [[Y:%.*]]) +// CHECK-32BIT-LONG-COMPAT17: ret i32 [[R]] +// +// CHECK-32BIT-LONG-NO-COMPAT17-LABEL: test_lrotl +// CHECK-32BIT-LONG-NO-COMPAT17: [[R:%.*]] = call noundef i32 @llvm.fshl.i32(i32 [[X:%.*]], i32 [[X]], i32 [[Y:%.*]]) +// CHECK-32BIT-LONG-NO-COMPAT17: ret i32 [[R]] +// +// CHECK-64BIT-LONG-LABEL: test_lrotl +// CHECK-64BIT-LONG: [[R:%.*]] = call noundef i64 @llvm.fshl.i64(i64 [[X:%.*]], i64 [[X]], i64 [[Y:%.*]]) +// CHECK-64BIT-LONG: ret i64 [[R]] + return _lrotl(value, shift); +} + + +unsigned short test_ro... 
@goldsteinn goldsteinn changed the title inliner attrs [Inliner] Improve attribute propagation to callsites when inlining. Sep 12, 2023
@goldsteinn
Copy link
Contributor Author

@RKSimon
Copy link
Collaborator

RKSimon commented Sep 13, 2023

Split some tests for c/cpp; NFC

Why was this necessary?

@goldsteinn
Copy link
Contributor Author

Split some tests for c/cpp; NFC

Why was this necessary?

Different noundef properties for some of the intrinsics. Since there are multiple unique checks thought it was simpler to just split than add c/cxx version for each of the existing checks.

@nikic
Copy link
Contributor

nikic commented Sep 15, 2023

It occurs to me that the current return attribute propagation is currently buggy for poison-generating attributes: https://llvm.godbolt.org/z/x8n18q9Mj

In this case the argument to use() will now be poison as well, while before inlining only the return value was poison.

This code needs to distinguish poison and UB generating attributes.

@goldsteinn
Copy link
Contributor Author

It occurs to me that the current return attribute propagation is currently buggy for poison-generating attributes: https://llvm.godbolt.org/z/x8n18q9Mj

In this case the argument to use() will now be poison as well, while before inlining only the return value was poison.

This code needs to distinguish poison and UB generating attributes.

Good catch. I think this means basically nonnull, noundef, and align can only be propagated if there are no other uses in to-be-inlined function.
That sound right or do you see any more robust way forward?

@nikic
Copy link
Contributor

nikic commented Sep 18, 2023

It occurs to me that the current return attribute propagation is currently buggy for poison-generating attributes: https://llvm.godbolt.org/z/x8n18q9Mj
In this case the argument to use() will now be poison as well, while before inlining only the return value was poison.
This code needs to distinguish poison and UB generating attributes.

Good catch. I think this means basically nonnull, noundef, and align can only be propagated if there are no other uses in to-be-inlined function. That sound right or do you see any more robust way forward?

Limiting poison-generating attributes to one-use while keeping the rest of the logic (with guarantee-to-transfer) would be the simplest way to make the existing code correct. There are two additional ways to generalize:

  • If the return is also noundef, we don't need one-use.
  • Just one-use is enough for poison-generating, we don't need guaranteed-to-transfer, BUT: we need to be careful about an implicit "use" in the call itself. That is, if the call we're transfering to is noundef and we only check one-use but not guaranteed-to-transfer, we would convert poison into UB at that point.
@goldsteinn
Copy link
Contributor Author

It occurs to me that the current return attribute propagation is currently buggy for poison-generating attributes: https://llvm.godbolt.org/z/x8n18q9Mj
In this case the argument to use() will now be poison as well, while before inlining only the return value was poison.
This code needs to distinguish poison and UB generating attributes.

Good catch. I think this means basically nonnull, noundef, and align can only be propagated if there are no other uses in to-be-inlined function. That sound right or do you see any more robust way forward?

Limiting poison-generating attributes to one-use while keeping the rest of the logic (with guarantee-to-transfer) would be the simplest way to make the existing code correct. There are two additional ways to generalize:

  • If the return is also noundef, we don't need one-use.
    Is that right for nonnull?

I.e:

define noundef nonnull ptr @foo() { %b = call ptr @bar() call void @use(ptr %p) willreturn nounwind ret ptr %b } 

If we add nonnull to @bar during inlining it can still make %p poison for its use in @use. I get it will trigger proper UB at the return b.c the noundef, but is it okay to potentially trigger it early at the call to @use?

  • Just one-use is enough for poison-generating, we don't need guaranteed-to-transfer, BUT: we need to be careful about an implicit "use" in the call itself. That is, if the call we're transfering to is noundef and we only check one-use but not guaranteed-to-transfer, we would convert poison into UB at that point.

Oh I see, you mean something like:

define nonnull ptr @foo() { %b = call noundef ptr @bar() ret ptr %b } 

So it would convert poison ret to full UB if we transfer nonnull.
The otherway around too I guess i.e if we have:

define noundef ptr @foo() { %b = call nonull ptr @bar() ret ptr %b } 

We also need to be careful about transferring the noundef.

I think with one-use + guranteed to transfer (the latter is a precondition for transferring at all)
in the latter we would get UB either way, but we are moving to the point up a tiny bit.
I don't understand how this could ever be okay in the former case though? Isn't the former case
just straight up converting poison to UB which is a no-go?

@nikic
Copy link
Contributor

nikic commented Sep 18, 2023

define noundef nonnull ptr @foo() { %b = call ptr @bar() call void @use(ptr %p) willreturn nounwind ret ptr %b } 

If we add nonnull to @bar during inlining it can still make %p poison for its use in @use. I get it will trigger proper UB at the return b.c the noundef, but is it okay to potentially trigger it early at the call to @use?

Yes, this is fine, because the UB will be hit anyway. We can move it backwards, as long as it's guaranteed to execute.

  • Just one-use is enough for poison-generating, we don't need guaranteed-to-transfer, BUT: we need to be careful about an implicit "use" in the call itself. That is, if the call we're transfering to is noundef and we only check one-use but not guaranteed-to-transfer, we would convert poison into UB at that point.

Oh I see, you mean something like:

define nonnull ptr @foo() { %b = call noundef ptr @bar() ret ptr %b } 

So it would convert poison ret to full UB if we transfer nonnull. The otherway around too I guess i.e if we have:

define noundef ptr @foo() { %b = call nonull ptr @bar() ret ptr %b } 

We also need to be careful about transferring the noundef.

I think with one-use + guranteed to transfer (the latter is a precondition for transferring at all) in the latter we would get UB either way, but we are moving to the point up a tiny bit. I don't understand how this could ever be okay in the former case though? Isn't the former case just straight up converting poison to UB which is a no-go?

You are right, the former case isn't correct even with one-use to guaranteed-to-transfer. That case would only work if we also had noundef on the foo return value.

@goldsteinn
Copy link
Contributor Author

define noundef nonnull ptr @foo() { %b = call ptr @bar() call void @use(ptr %p) willreturn nounwind ret ptr %b } 

If we add nonnull to @bar during inlining it can still make %p poison for its use in @use. I get it will trigger proper UB at the return b.c the noundef, but is it okay to potentially trigger it early at the call to @use?

Yes, this is fine, because the UB will be hit anyway. We can move it backwards, as long as it's guaranteed to execute.

  • Just one-use is enough for poison-generating, we don't need guaranteed-to-transfer, BUT: we need to be careful about an implicit "use" in the call itself. That is, if the call we're transfering to is noundef and we only check one-use but not guaranteed-to-transfer, we would convert poison into UB at that point.

Oh I see, you mean something like:

define nonnull ptr @foo() { %b = call noundef ptr @bar() ret ptr %b } 

So it would convert poison ret to full UB if we transfer nonnull. The otherway around too I guess i.e if we have:

define noundef ptr @foo() { %b = call nonull ptr @bar() ret ptr %b } 

We also need to be careful about transferring the noundef.
I think with one-use + guranteed to transfer (the latter is a precondition for transferring at all) in the latter we would get UB either way, but we are moving to the point up a tiny bit. I don't understand how this could ever be okay in the former case though? Isn't the former case just straight up converting poison to UB which is a no-go?

You are right, the former case isn't correct even with one-use to guaranteed-to-transfer. That case would only work if we also had noundef on the foo return value.

@nikic, on the former case, could we just strip noundef from the ret attributes?
Just a hunch, but think nonnull, align,... are generally more useful.

@goldsteinn
Copy link
Contributor Author

``

It occurs to me that the current return attribute propagation is currently buggy for poison-generating attributes: https://llvm.godbolt.org/z/x8n18q9Mj

In this case the argument to use() will now be poison as well, while before inlining only the return value was poison.

This code needs to distinguish poison and UB generating attributes.

Fixed with new commit ([Inliner] Fix bug...).

@jdoerfert
Copy link
Member

I don't think 10 commit PRs are helpful. This should be 3 PRs at least. One is trivial, one is probably easy, and then one with the new stuff.

@goldsteinn
Copy link
Contributor Author

I don't think 10 commit PRs are helpful. This should be 3 PRs at least. One is trivial, one is probably easy, and then one with the new stuff.

So PR based on PR? The thing is there are dependencies between commits.
My expectation is that I will push commits as they are approved, not the whole PR
at once.

@jdoerfert
Copy link
Member

I don't think 10 commit PRs are helpful. This should be 3 PRs at least. One is trivial, one is probably easy, and then one with the new stuff.

So PR based on PR? The thing is there are dependencies between commits. My expectation is that I will push commits as they are approved, not the whole PR at once.

We always had dependences between commits. Just mention it in the commit message.
This PR contains arguably different things. It simply makes it harder to review it. The guidance on this is clear, make review chunks as small if possible. The RFC commits, the fix, and the new functionality do not need to be commingled.

@goldsteinn goldsteinn changed the title [Inliner] Improve attribute propagation to callsites when inlining. [Inliner] Fix bug when propagating poison generating return attributes Sep 26, 2023
@goldsteinn
Copy link
Contributor Author

I don't think 10 commit PRs are helpful. This should be 3 PRs at least. One is trivial, one is probably easy, and then one with the new stuff.

So PR based on PR? The thing is there are dependencies between commits. My expectation is that I will push commits as they are approved, not the whole PR at once.

We always had dependences between commits. Just mention it in the commit message. This PR contains arguably different things. It simply makes it harder to review it. The guidance on this is clear, make review chunks as small if possible. The RFC commits, the fix, and the new functionality do not need to be commingled.

Okay, I can't really see a reasonable way to create a proper series. I am going to submit this piece-meal.
At the moment only up to the bugfix submitted.
Do you by chance know a reasonable way to create a series of PRs s.t the individual commits can be
reasonably reviewed while also keeping all the commits together?
I look at SO answers like: https://stackoverflow.com/questions/30768148/git-split-pull-request-into-smaller-prs-based-upon-the-new-directories-in-the
but they only seem to work if the changes to to distinct files.

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second commit is missing test diffs -- either tests are missing or improperly committed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this do anything useful over a hasOneUse check? We only allow the case where the call and return are in the same block, so there can only be one ReturnInst user.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not at the moment, I guess my feeling is we would like to be able to handle uses that can't cause UB (something like an ptrmask use) which will require looping. Secondly, this seems more "correct" should our constraints above change, and will be roughly the same cost as hasOneUse in the meantime (will be at most 2 iter).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See caller12_todo test as an example of what we should be able to do.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO we should only introduce this loop once it actually does something useful. For now this is just confusing dead code.

@goldsteinn
Copy link
Contributor Author

Second commit is missing test diffs -- either tests are missing or improperly committed.

Sorry, had gotten messed up when splitting. Fixed.

@github-actions
Copy link

github-actions bot commented Sep 28, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines 1454 to 1466
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I would write this either as

Suggested change
bool Okay = true;
if (!CB.hasRetAttr(Attribute::NoUndef)) {
// Cover case 3 where introducing a poison generating attribute can
// cause immediate UB.
Okay = !RetVal->hasRetAttr(Attribute::NoUndef);
// This is an overly conservative way to check that there are no other
// uses whose behavior may change/may introduce UB if we potentially
// return poison.
// TODO: Iterative through uses and only bail if we have a potentially
// dangerous use.
Okay &= RetVal->hasOneUse();
}
if (Okay)
// Comments...
if (CB.hasRetAttr(Attribute::NoUndef) ||
(!RetVal->hasRetAttr(Attribute::NoUndef) && RetVal->hasOneUse()))

or move it into a function with early returns, rather than using an Okay flag.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, although imo it makes the comments a bit more difficult to read.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Iterate

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dso_local shouldn't be needed.

Poison generating return attributes can't be propagated the same as others, as they can change the behavior of other uses and/or create UB where it otherwise wouldn't have occurred. For example: ``` define nonnull ptr @foo() { %p = call ptr @bar() call void @use(ptr %p) ret ptr %p } ``` If we inline `@foo` and propagate `nonnull` to `@bar`, it could change the behavior of `@use` as instead of taking `null`, `@use` will now be passed `poison`. This can be even worth in a case like: ``` define nonnull ptr @foo() { %p = call noundef ptr @bar() ret ptr %p } ``` Where propagating `nonnull` to `@bar` will cause UB on `null` return of `@bar` (`noundef` + `poison`) where it previously wouldn't have occurred. To fix this, we only propagate poison generating return attributes if either 1) The only use of the callsite to propagate too is return and the callsite to propagate too doesn't have `noundef`. Or 2) the callsite to be be inlined has `noundef`. The former case ensures no new UB or `poison` values will be added. The latter is UB anyways if the value is `poison` so we can go ahead without worrying about behavior changes.
@goldsteinn
Copy link
Contributor Author

Pushed.

@goldsteinn goldsteinn closed this Sep 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend:X86 clang Clang issues not falling into any other category llvm:transforms

5 participants