[SVE2.1][Clang][LLVM]Int/FP reduce builtin in Clang and LLVM intrinsic #69926

CarolineConcatto · 2023-10-23T13:26:26Z

This patch implements the builtins in Clang
and the LLVM-IR intrinsic for the following:

// Variants are also available for:
// _s8, _s16, _u16, _s32, _u32, _s64, _u64,
// _f16, _f32, _f64
uint8x16_t svaddqv[_u8](svbool_t pg, svuint8_t zn);

// Variants are also available for:
// _s8, _u16, _s16, _u32, _s32, _u64, _s64
uint8x16_t svandqv[_u8](svbool_t pg, svuint8_t zn);
uint8x16_t sveorqv[_u8](svbool_t pg, svuint8_t zn);
uint8x16_t svorqv[_u8](svbool_t pg, svuint8_t zn);

// Variants are also available for:
// _s8, _u16, _s16, _u32, _s32, _u64, _s64
uint8x16_t svmaxqv[_u8](svbool_t pg, svuint8_t zn);
uint8x16_t svminqv[_u8](svbool_t pg, svuint8_t zn);

// Variants are also available for _f32, _f64
float16x8_t svmaxnmqv[_f16](svbool_t pg, svfloat16_t zn);
float16x8_t svminnmqv[_f16](svbool_t pg, svfloat16_t zn);

The intrinsics are defined according to ACLE PR #257 [1].

The reduction instructions take scalable vectors as input and produce fixed-length vectors as output, so we changed SVEEmitter to emit the fixed vector types when the NEON header (arm_neon.h) is not present.

[1] ARM-software/acle#257

Co-author: Dinar Temirbulatov <dinar.temirbulatov@arm.com>
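To make the "scalable input, fixed output" shape concrete, here is a minimal scalar sketch of a quadword add reduction for unsigned 8-bit elements. It is a model only, not the Clang builtin or the LLVM lowering: `model_addqv_u8`, `QV_LANES_U8`, and the byte-per-element predicate are hypothetical names and simplifications introduced for illustration. It assumes that lane i of the 128-bit result accumulates element i of every 128-bit segment of the input, and that inactive elements contribute zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of 8-bit lanes in one 128-bit (quadword) segment. */
#define QV_LANES_U8 16

/*
 * Scalar model (assumption, for illustration only) of an unsigned 8-bit
 * quadword add reduction.
 *   pg  : one flag per element, non-zero means the element is active
 *   zn  : n input elements; n stands in for the scalable vector length
 *   out : the fixed 16-lane result
 */
void model_addqv_u8(const uint8_t *pg, const uint8_t *zn, size_t n,
                    uint8_t out[QV_LANES_U8]) {
  /* The input is treated as a whole number of 128-bit segments. */
  assert(n % QV_LANES_U8 == 0);

  /* Start from the additive identity in every result lane. */
  for (size_t lane = 0; lane < QV_LANES_U8; ++lane)
    out[lane] = 0;

  /* Element i belongs to result lane (i mod 16); inactive elements are skipped. */
  for (size_t i = 0; i < n; ++i) {
    if (pg[i])
      out[i % QV_LANES_U8] = (uint8_t)(out[i % QV_LANES_U8] + zn[i]);
  }
}
```

With a 32-element input holding 0..31 and every element active, lane 0 of the result is 0 + 16 and lane 15 is 15 + 31. The result always has 16 lanes regardless of `n`, which is why the builtins return a fixed type such as `uint8x16_t`.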


llvmbot · 2023-10-23T13:27:35Z

@llvm/pr-subscribers-backend-x86
@llvm/pr-subscribers-llvm-ir
@llvm/pr-subscribers-backend-aarch64
@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-clang

Author: None (CarolineConcatto)

Changes


Patch is 94.81 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/69926.diff

12 Files Affected:

(modified) clang/include/clang/Basic/TargetBuiltins.h (+1-1)
(modified) clang/include/clang/Basic/arm_sve.td (+17)
(modified) clang/include/clang/Basic/arm_sve_sme_incl.td (+2)
(modified) clang/lib/CodeGen/CGBuiltin.cpp (+4)
(added) clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_fp_reduce.c (+285)
(added) clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_int_reduce.c (+784)
(modified) clang/utils/TableGen/SveEmitter.cpp (+32-3)
(modified) llvm/include/llvm/IR/IntrinsicsAArch64.td (+21)
(modified) llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td (+13-13)
(modified) llvm/lib/Target/AArch64/SVEInstrFormats.td (+11-2)
(added) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-fp-reduce.ll (+189)
(added) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-int-reduce.ll (+356)

diff --git a/clang/include/clang/Basic/TargetBuiltins.h b/clang/include/clang/Basic/TargetBuiltins.h
index 8f7881abf26f7f4..c9f9cbec7493bfc 100644
--- a/clang/include/clang/Basic/TargetBuiltins.h
+++ b/clang/include/clang/Basic/TargetBuiltins.h
@@ -309,7 +309,7 @@ namespace clang {
     bool isTupleSet() const { return Flags & IsTupleSet; }
     bool isReadZA() const { return Flags & IsReadZA; }
     bool isWriteZA() const { return Flags & IsWriteZA; }
-
+    bool isReductionQV() const { return Flags & IsReductionQV; }
     uint64_t getBits() const { return Flags; }
     bool isFlagSet(uint64_t Flag) const { return Flags & Flag; }
   };
diff --git a/clang/include/clang/Basic/arm_sve.td b/clang/include/clang/Basic/arm_sve.td
index b5baafedd139602..e8fef1e7a8dfb0d 100644
--- a/clang/include/clang/Basic/arm_sve.td
+++ b/clang/include/clang/Basic/arm_sve.td
@@ -1859,6 +1859,23 @@ def SVBGRP : SInst<"svbgrp[_{d}]", "ddd", "UcUsUiUl", MergeNone, "aarch64_sv
 def SVBGRP_N : SInst<"svbgrp[_n_{d}]", "dda", "UcUsUiUl", MergeNone, "aarch64_sve_bgrp_x">;
 }
 
+// Standalone sve2.1 builtins
+let TargetGuard = "sve2p1" in {
+def SVORQV : SInst<"svorqv[_{d}]", "{Pd", "csilUcUsUiUl", MergeNone, "aarch64_sve_orqv", [IsReductionQV]>;
+def SVEORQV : SInst<"sveorqv[_{d}]", "{Pd", "csilUcUsUiUl", MergeNone, "aarch64_sve_eorqv", [IsReductionQV]>;
+def SVADDQV : SInst<"svaddqv[_{d}]", "{Pd", "hfdcsilUcUsUiUl", MergeNone, "aarch64_sve_addqv", [IsReductionQV]>;
+def SVANDQV : SInst<"svandqv[_{d}]", "{Pd", "csilUcUsUiUl", MergeNone, "aarch64_sve_andqv", [IsReductionQV]>;
+def SVSMAXQV : SInst<"svmaxqv[_{d}]", "{Pd", "csil", MergeNone, "aarch64_sve_smaxqv", [IsReductionQV]>;
+def SVUMAXQV : SInst<"svmaxqv[_{d}]", "{Pd", "UcUsUiUl", MergeNone, "aarch64_sve_umaxqv", [IsReductionQV]>;
+def SVSMINQV : SInst<"svminqv[_{d}]", "{Pd", "csil", MergeNone, "aarch64_sve_sminqv", [IsReductionQV]>;
+def SVUMINQV : SInst<"svminqv[_{d}]", "{Pd", "UcUsUiUl", MergeNone, "aarch64_sve_uminqv", [IsReductionQV]>;
+
+def SVFMAXNMQV: SInst<"svmaxnmqv[_{d}]", "{Pd", "hfd", MergeNone, "aarch64_sve_fmaxnmqv", [IsReductionQV]>;
+def SVFMINNMQV: SInst<"svminnmqv[_{d}]", "{Pd", "hfd", MergeNone, "aarch64_sve_fminnmqv", [IsReductionQV]>;
+def SVFMAXQV: SInst<"svmaxqv[_{d}]", "{Pd", "hfd", MergeNone, "aarch64_sve_fmaxqv", [IsReductionQV]>;
+def SVFMINQV: SInst<"svminqv[_{d}]", "{Pd", "hfd", MergeNone, "aarch64_sve_fminqv", [IsReductionQV]>;
+}
+
 let TargetGuard = "sve2p1" in {
 def SVFCLAMP : SInst<"svclamp[_{d}]", "dddd", "hfd", MergeNone, "aarch64_sve_fclamp", [], []>;
 def SVPTRUE_COUNT : SInst<"svptrue_{d}", "}v", "QcQsQiQl", MergeNone, "aarch64_sve_ptrue_{d}", [IsOverloadNone], []>;
diff --git a/clang/include/clang/Basic/arm_sve_sme_incl.td b/clang/include/clang/Basic/arm_sve_sme_incl.td
index 3a7a5b51b25801e..9fe497173b56ac6 100644
--- a/clang/include/clang/Basic/arm_sve_sme_incl.td
+++ b/clang/include/clang/Basic/arm_sve_sme_incl.td
@@ -128,6 +128,7 @@
 // Z: const pointer to uint64_t
 
 // Prototype modifiers added for SVE2p1
+// {: 128b vector
 // }: svcount_t
 
 class MergeType<int val, string suffix=""> {
@@ -224,6 +225,7 @@ def IsSharedZA : FlagType<0x8000000000>;
 def IsPreservesZA : FlagType<0x10000000000>;
 def IsReadZA : FlagType<0x20000000000>;
 def IsWriteZA : FlagType<0x40000000000>;
+def IsReductionQV : FlagType<0x80000000000>;
 
 // These must be kept in sync with the flags in include/clang/Basic/TargetBuiltins.h
 class ImmCheckType<int val> {
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index e1211bb8949b665..86e77db4b914571 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -9834,6 +9834,10 @@ CodeGenFunction::getSVEOverloadTypes(const SVETypeFlags &TypeFlags,
   if (TypeFlags.isOverloadCvt())
     return {Ops[0]->getType(), Ops.back()->getType()};
 
+  if (TypeFlags.isReductionQV() && !ResultType->isScalableTy() &&
+      ResultType->isVectorTy())
+    return {ResultType, Ops[1]->getType()};
+
   assert(TypeFlags.isOverloadDefault() && "Unexpected value for overloads");
   return {DefaultType};
 }
diff --git a/clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_fp_reduce.c b/clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_fp_reduce.c
new file mode 100644
index 000000000000000..e58cf4e49a37f92
--- /dev/null
+++ b/clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_fp_reduce.c
@@ -0,0 +1,285 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// REQUIRES: aarch64-registered-target
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve2p1 -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve2p1 -S -O1 -Werror -Wall -emit-llvm -o - -x c++ %s | FileCheck %s -check-prefix=CPP-CHECK
+// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve2p1 -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve2p1 -S -O1 -Werror -Wall -emit-llvm -o - -x c++ %s | FileCheck %s -check-prefix=CPP-CHECK
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve2p1 -S -disable-O0-optnone -Werror -Wall -o /dev/null %s
+#include <arm_neon.h>
+#include <arm_sve.h>
+
+#ifdef SVE_OVERLOADED_FORMS
+// A simple used,unused... macro, long enough to represent any SVE builtin.
+#define SVE_ACLE_FUNC(A1,A2_UNUSED,A3,A4_UNUSED) A1##A3
+#else
+#define SVE_ACLE_FUNC(A1,A2,A3,A4) A1##A2##A3##A4
+#endif
+
+// FADDQV
+
+// CHECK-LABEL: @test_svaddqv_f16(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> [[PG:%.*]])
+// CHECK-NEXT: [[TMP1:%.*]] = tail call <8 x half> @llvm.aarch64.sve.addqv.v8f16.nxv8f16(<vscale x 8 x i1> [[TMP0]], <vscale x 8 x half> [[OP:%.*]])
+// CHECK-NEXT: ret <8 x half> [[TMP1]]
+//
+// CPP-CHECK-LABEL: @_Z16test_svaddqv_f16u10__SVBool_tu13__SVFloat16_t(
+// CPP-CHECK-NEXT: entry:
+// CPP-CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> [[PG:%.*]])
+// CPP-CHECK-NEXT: [[TMP1:%.*]] = tail call <8 x half> @llvm.aarch64.sve.addqv.v8f16.nxv8f16(<vscale x 8 x i1> [[TMP0]], <vscale x 8 x half> [[OP:%.*]])
+// CPP-CHECK-NEXT: ret <8 x half> [[TMP1]]
+//
+float16x8_t test_svaddqv_f16(svbool_t pg, svfloat16_t op)
+{
+  return SVE_ACLE_FUNC(svaddqv,,_f16,)(pg, op);
+}
+
+// CHECK-LABEL: @test_svaddqv_f32(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> [[PG:%.*]])
+// CHECK-NEXT: [[TMP1:%.*]] = tail call <4 x float> @llvm.aarch64.sve.addqv.v4f32.nxv4f32(<vscale x 4 x i1> [[TMP0]], <vscale x 4 x float> [[OP:%.*]])
+// CHECK-NEXT: ret <4 x float> [[TMP1]]
+//
+// CPP-CHECK-LABEL: @_Z16test_svaddqv_f32u10__SVBool_tu13__SVFloat32_t(
+// CPP-CHECK-NEXT: entry:
+// CPP-CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> [[PG:%.*]])
+// CPP-CHECK-NEXT: [[TMP1:%.*]] = tail call <4 x float> @llvm.aarch64.sve.addqv.v4f32.nxv4f32(<vscale x 4 x i1> [[TMP0]], <vscale x 4 x float> [[OP:%.*]])
+// CPP-CHECK-NEXT: ret <4 x float> [[TMP1]]
+//
+float32x4_t test_svaddqv_f32(svbool_t pg, svfloat32_t op)
+{
+  return SVE_ACLE_FUNC(svaddqv,,_f32,)(pg, op);
+}
+
+// CHECK-LABEL: @test_svaddqv_f64(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> [[PG:%.*]])
+// CHECK-NEXT: [[TMP1:%.*]] = tail call <2 x double> @llvm.aarch64.sve.addqv.v2f64.nxv2f64(<vscale x 2 x i1> [[TMP0]], <vscale x 2 x double> [[OP:%.*]])
+// CHECK-NEXT: ret <2 x double> [[TMP1]]
+//
+// CPP-CHECK-LABEL: @_Z16test_svaddqv_f64u10__SVBool_tu13__SVFloat64_t(
+// CPP-CHECK-NEXT: entry:
+// CPP-CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> [[PG:%.*]])
+// CPP-CHECK-NEXT: [[TMP1:%.*]] = tail call <2 x double> @llvm.aarch64.sve.addqv.v2f64.nxv2f64(<vscale x 2 x i1> [[TMP0]], <vscale x 2 x double> [[OP:%.*]])
+// CPP-CHECK-NEXT: ret <2 x double> [[TMP1]]
+//
+float64x2_t test_svaddqv_f64(svbool_t pg, svfloat64_t op)
+{
+  return SVE_ACLE_FUNC(svaddqv,,_f64,)(pg, op);
+}
+
+
+// FMAXQV
+
+// CHECK-LABEL: @test_svmaxqv_f16(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> [[PG:%.*]])
+// CHECK-NEXT: [[TMP1:%.*]] = tail call <8 x half> @llvm.aarch64.sve.fmaxqv.v8f16.nxv8f16(<vscale x 8 x i1> [[TMP0]], <vscale x 8 x half> [[OP:%.*]])
+// CHECK-NEXT: ret <8 x half> [[TMP1]]
+//
+// CPP-CHECK-LABEL: @_Z16test_svmaxqv_f16u10__SVBool_tu13__SVFloat16_t(
+// CPP-CHECK-NEXT: entry:
+// CPP-CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> [[PG:%.*]])
+// CPP-CHECK-NEXT: [[TMP1:%.*]] = tail call <8 x half> @llvm.aarch64.sve.fmaxqv.v8f16.nxv8f16(<vscale x 8 x i1> [[TMP0]], <vscale x 8 x half> [[OP:%.*]])
+// CPP-CHECK-NEXT: ret <8 x half> [[TMP1]]
+//
+float16x8_t test_svmaxqv_f16(svbool_t pg, svfloat16_t op)
+{
+  return SVE_ACLE_FUNC(svmaxqv,,_f16,)(pg, op);
+}
+
+// CHECK-LABEL: @test_svmaxqv_f32(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> [[PG:%.*]])
+// CHECK-NEXT: [[TMP1:%.*]] = tail call <4 x float> @llvm.aarch64.sve.fmaxqv.v4f32.nxv4f32(<vscale x 4 x i1> [[TMP0]], <vscale x 4 x float> [[OP:%.*]])
+// CHECK-NEXT: ret <4 x float> [[TMP1]]
+//
+// CPP-CHECK-LABEL: @_Z16test_svmaxqv_f32u10__SVBool_tu13__SVFloat32_t(
+// CPP-CHECK-NEXT: entry:
+// CPP-CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> [[PG:%.*]])
+// CPP-CHECK-NEXT: [[TMP1:%.*]] = tail call <4 x float> @llvm.aarch64.sve.fmaxqv.v4f32.nxv4f32(<vscale x 4 x i1> [[TMP0]], <vscale x 4 x float> [[OP:%.*]])
+// CPP-CHECK-NEXT: ret <4 x float> [[TMP1]]
+//
+float32x4_t test_svmaxqv_f32(svbool_t pg, svfloat32_t op)
+{
+  return SVE_ACLE_FUNC(svmaxqv,,_f32,)(pg, op);
+}
+
+// CHECK-LABEL: @test_svmaxqv_f64(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> [[PG:%.*]])
+// CHECK-NEXT: [[TMP1:%.*]] = tail call <2 x double> @llvm.aarch64.sve.fmaxqv.v2f64.nxv2f64(<vscale x 2 x i1> [[TMP0]], <vscale x 2 x double> [[OP:%.*]])
+// CHECK-NEXT: ret <2 x double> [[TMP1]]
+//
+// CPP-CHECK-LABEL: @_Z16test_svmaxqv_f64u10__SVBool_tu13__SVFloat64_t(
+// CPP-CHECK-NEXT: entry:
+// CPP-CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> [[PG:%.*]])
+// CPP-CHECK-NEXT: [[TMP1:%.*]] = tail call <2 x double> @llvm.aarch64.sve.fmaxqv.v2f64.nxv2f64(<vscale x 2 x i1> [[TMP0]], <vscale x 2 x double> [[OP:%.*]])
+// CPP-CHECK-NEXT: ret <2 x double> [[TMP1]]
+//
+float64x2_t test_svmaxqv_f64(svbool_t pg, svfloat64_t op)
+{
+  return SVE_ACLE_FUNC(svmaxqv,,_f64,)(pg, op);
+}
+
+
+// FMINQV
+
+// CHECK-LABEL: @test_svminqv_f16(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> [[PG:%.*]])
+// CHECK-NEXT: [[TMP1:%.*]] = tail call <8 x half> @llvm.aarch64.sve.fminqv.v8f16.nxv8f16(<vscale x 8 x i1> [[TMP0]], <vscale x 8 x half> [[OP:%.*]])
+// CHECK-NEXT: ret <8 x half> [[TMP1]]
+//
+// CPP-CHECK-LABEL: @_Z16test_svminqv_f16u10__SVBool_tu13__SVFloat16_t(
+// CPP-CHECK-NEXT: entry:
+// CPP-CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> [[PG:%.*]])
+// CPP-CHECK-NEXT: [[TMP1:%.*]] = tail call <8 x half> @llvm.aarch64.sve.fminqv.v8f16.nxv8f16(<vscale x 8 x i1> [[TMP0]], <vscale x 8 x half> [[OP:%.*]])
+// CPP-CHECK-NEXT: ret <8 x half> [[TMP1]]
+//
+float16x8_t test_svminqv_f16(svbool_t pg, svfloat16_t op)
+{
+  return SVE_ACLE_FUNC(svminqv,,_f16,)(pg, op);
+}
+
+// CHECK-LABEL: @test_svminqv_f32(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> [[PG:%.*]])
+// CHECK-NEXT: [[TMP1:%.*]] = tail call <4 x float> @llvm.aarch64.sve.fminqv.v4f32.nxv4f32(<vscale x 4 x i1> [[TMP0]], <vscale x 4 x float> [[OP:%.*]])
+// CHECK-NEXT: ret <4 x float> [[TMP1]]
+//
+// CPP-CHECK-LABEL: @_Z16test_svminqv_f32u10__SVBool_tu13__SVFloat32_t(
+// CPP-CHECK-NEXT: entry:
+// CPP-CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> [[PG:%.*]])
+// CPP-CHECK-NEXT: [[TMP1:%.*]] = tail call <4 x float> @llvm.aarch64.sve.fminqv.v4f32.nxv4f32(<vscale x 4 x i1> [[TMP0]], <vscale x 4 x float> [[OP:%.*]])
+// CPP-CHECK-NEXT: ret <4 x float> [[TMP1]]
+//
+float32x4_t test_svminqv_f32(svbool_t pg, svfloat32_t op)
+{
+  return SVE_ACLE_FUNC(svminqv,,_f32,)(pg, op);
+}
+
+// CHECK-LABEL: @test_svminqv_f64(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> [[PG:%.*]])
+// CHECK-NEXT: [[TMP1:%.*]] = tail call <2 x double> @llvm.aarch64.sve.fminqv.v2f64.nxv2f64(<vscale x 2 x i1> [[TMP0]], <vscale x 2 x double> [[OP:%.*]])
+// CHECK-NEXT: ret <2 x double> [[TMP1]]
+//
+// CPP-CHECK-LABEL: @_Z16test_svminqv_f64u10__SVBool_tu13__SVFloat64_t(
+// CPP-CHECK-NEXT: entry:
+// CPP-CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> [[PG:%.*]])
+// CPP-CHECK-NEXT: [[TMP1:%.*]] = tail call <2 x double> @llvm.aarch64.sve.fminqv.v2f64.nxv2f64(<vscale x 2 x i1> [[TMP0]], <vscale x 2 x double> [[OP:%.*]])
+// CPP-CHECK-NEXT: ret <2 x double> [[TMP1]]
+//
+float64x2_t test_svminqv_f64(svbool_t pg, svfloat64_t op)
+{
+  return SVE_ACLE_FUNC(svminqv,,_f64,)(pg, op);
+}
+
+
+// FMAXNMQV
+
+// CHECK-LABEL: @test_svmaxnmqv_f16(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> [[PG:%.*]])
+// CHECK-NEXT: [[TMP1:%.*]] = tail call <8 x half> @llvm.aarch64.sve.fmaxnmqv.v8f16.nxv8f16(<vscale x 8 x i1> [[TMP0]], <vscale x 8 x half> [[OP:%.*]])
+// CHECK-NEXT: ret <8 x half> [[TMP1]]
+//
+// CPP-CHECK-LABEL: @_Z18test_svmaxnmqv_f16u10__SVBool_tu13__SVFloat16_t(
+// CPP-CHECK-NEXT: entry:
+// CPP-CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> [[PG:%.*]])
+// CPP-CHECK-NEXT: [[TMP1:%.*]] = tail call <8 x half> @llvm.aarch64.sve.fmaxnmqv.v8f16.nxv8f16(<vscale x 8 x i1> [[TMP0]], <vscale x 8 x half> [[OP:%.*]])
+// CPP-CHECK-NEXT: ret <8 x half> [[TMP1]]
+//
+float16x8_t test_svmaxnmqv_f16(svbool_t pg, svfloat16_t op)
+{
+  return SVE_ACLE_FUNC(svmaxnmqv,,_f16,)(pg, op);
+}
+
+// CHECK-LABEL: @test_svmaxnmqv_f32(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> [[PG:%.*]])
+// CHECK-NEXT: [[TMP1:%.*]] = tail call <4 x float> @llvm.aarch64.sve.fmaxnmqv.v4f32.nxv4f32(<vscale x 4 x i1> [[TMP0]], <vscale x 4 x float> [[OP:%.*]])
+// CHECK-NEXT: ret <4 x float> [[TMP1]]
+//
+// CPP-CHECK-LABEL: @_Z18test_svmaxnmqv_f32u10__SVBool_tu13__SVFloat32_t(
+// CPP-CHECK-NEXT: entry:
+// CPP-CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> [[PG:%.*]])
+// CPP-CHECK-NEXT: [[TMP1:%.*]] = tail call <4 x float> @llvm.aarch64.sve.fmaxnmqv.v4f32.nxv4f32(<vscale x 4 x i1> [[TMP0]], <vscale x 4 x float> [[OP:%.*]])
+// CPP-CHECK-NEXT: ret <4 x float> [[TMP1]]
+//
+float32x4_t test_svmaxnmqv_f32(svbool_t pg, svfloat32_t op)
+{
+  return SVE_ACLE_FUNC(svmaxnmqv,,_f32,)(pg, op);
+}
+
+// CHECK-LABEL: @test_svmaxnmqv_f64(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> [[PG:%.*]])
+// CHECK-NEXT: [[TMP1:%.*]] = tail call <2 x double> @llvm.aarch64.sve.fmaxnmqv.v2f64.nxv2f64(<vscale x 2 x i1> [[TMP0]], <vscale x 2 x double> [[OP:%.*]])
+// CHECK-NEXT: ret <2 x double> [[TMP1]]
+//
+// CPP-CHECK-LABEL: @_Z18test_svmaxnmqv_f64u10__SVBool_tu13__SVFloat64_t(
+// CPP-CHECK-NEXT: entry:
+// CPP-CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> [[PG:%.*]])
+// CPP-CHECK-NEXT: [[TMP1:%.*]] = tail call <2 x double> @llvm.aarch64.sve.fmaxnmqv.v2f64.nxv2f64(<vscale x 2 x i1> [[TMP0]], <vscale x 2 x double> [[OP:%.*]])
+// CPP-CHECK-NEXT: ret <2 x double> [[TMP1]]
+//
+float64x2_t test_svmaxnmqv_f64(svbool_t pg, svfloat64_t op)
+{
+  return SVE_ACLE_FUNC(svmaxnmqv,,_f64,)(pg, op);
+}
+
+
+// FMINNMQV
+
+// CHECK-LABEL: @test_svminnmqv_f16(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> [[PG:%.*]])
+// CHECK-NEXT: [[TMP1:%.*]] = tail call <8 x half> @llvm.aarch64.sve.fminnmqv.v8f16.nxv8f16(<vscale x 8 x i1> [[TMP0]], <vscale x 8 x half> [[OP:%.*]])
+// CHECK-NEXT: ret <8 x half> [[TMP1]]
+//
+// CPP-CHECK-LABEL: @_Z18test_svminnmqv_f16u10__SVBool_tu13__SVFloat16_t(
+// CPP-CHECK-NEXT: entry:
+// CPP-CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> [[PG:%.*]])
+// CPP-CHECK-NEXT: [[TMP1:%.*]] = tail call <8 x half> @llvm.aarch64.sve.fminnmqv.v8f16.nxv8f16(<vscale x 8 x i1> [[TMP0]], <vscale x 8 x half> [[OP:%.*]])
+// CPP-CHECK-NEXT: ret <8 x half> [[TMP1]]
+//
+float16x8_t test_svminnmqv_f16(svbool_t pg, svfloat16_t op)
+{
+  return SVE_ACLE_FUNC(svminnmqv,,_f16,)(pg, op);
+}
+
+// CHECK-LABEL: @test_svminnmqv_f32(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> [[PG:%.*]])
+// CHECK-NEXT: [[TMP1:%.*]] = tail call <4 x float> @llvm.aarch64.sve.fminnmqv.v4f32.nxv4f32(<vscale x 4 x i1> [[TMP0]], <vscale x 4 x float> [[OP:%.*]])
+// CHECK-NEXT: ret <4 x float> [[TMP1]]
+//
+// CPP-CHECK-LABEL: @_Z18test_svminnmqv_f32u10__SVBool_tu13__SVFloat32_t(
+// CPP-CHECK-NEXT: entry:
+// CPP-CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> [[PG:%.*]])
+// CPP-CHECK-NEXT: [[TMP1:%.*]] = tail call <4 x float> @llvm.aarch64.sve.fminnmqv.v4f32.nxv4f32(<vscale x 4 x i1> [[TMP0]], <vscale x 4 x float> [[OP:%.*]])
+// CPP-CHECK-NEXT: ret <4 x float> [[TMP1]]
+//
+float32x4_t test_svminnmqv_f32(svbool_t pg, svfloat32_t op)
+{
+  return SVE_ACLE_FUNC(svminnmqv,,_f32,)(pg, op);
+}
+
+// CHECK-LABEL: @test_svminnmqv_f64(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> [[PG:%.*]])
+// CHECK-NEXT: [[TMP1:%.*]] = tail call <2 x double> @llvm.aarch64.sve.fminnmqv.v2f64.nxv2f64(<vscale x 2 x i1> [[TMP0]], <vscale x 2 x double> [[OP:%.*]])
+// CHECK-NEXT: ret <2 x double> [[TMP1]]
+//
+// CPP-CHECK-LABEL: @_Z18test_svminnmqv_f64u10__SVBool_tu13__SVFloat64_t(
+// CPP-CHECK-NEXT: entry:
+// CPP-CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> [[PG:%.*]])
+// CPP-CHECK-NEXT: [[TMP1:%.*]] = tail call <2 x double> @llvm.aarch64.sve.fminnmqv.v2f64.nxv2f64(<vscale x 2 x i1> [[TMP0]], <vscale x 2 x double> [[OP:%.*]])
+// CPP-CHECK-NEXT: ret <2 x double> [[TMP1]]
+//
+float64x2_t test_svminnmqv_f64(svbool_t pg, svfloat64_t op)
+{
+  return SVE_ACLE_FUNC(svminnmqv,,_f64,)(pg, op);
+}
diff --git a/clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_int_reduc... [truncated]
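The test in the diff above selects between a fully spelled and an overloaded intrinsic name through the token-pasting macro `SVE_ACLE_FUNC`. The sketch below reproduces the two macro bodies under hypothetical names (`FULL_FORM`, `OVERLOADED_FORM`) so that both can be inspected in one translation unit; the stringification helpers and the accessor functions are also introduced here purely for illustration and are not part of the patch.

```c
#include <assert.h>
#include <string.h>

/* The two SVE_ACLE_FUNC bodies from the test, renamed (hypothetical names)
 * so that both variants can coexist in one file. */
#define OVERLOADED_FORM(A1, A2_UNUSED, A3, A4_UNUSED) A1##A3
#define FULL_FORM(A1, A2, A3, A4) A1##A2##A3##A4

/* Two-level stringification: the argument is macro-expanded before '#'. */
#define STRINGIFY_IMPL(x) #x
#define STRINGIFY(x) STRINGIFY_IMPL(x)

/* Name produced by the full form with the arguments used in the test. */
const char *full_name(void) {
  return STRINGIFY(FULL_FORM(svaddqv, , _f16, ));
}

/* Name produced by the overloaded form with the same arguments. */
const char *overloaded_name(void) {
  return STRINGIFY(OVERLOADED_FORM(svaddqv, , _f16, ));
}

/* Overloaded form when the type suffix sits in the second (dropped) slot. */
const char *overloaded_name_suffix_in_a2(void) {
  return STRINGIFY(OVERLOADED_FORM(svaddqv, _f16, , ));
}

/* Small self-check of the three expansions. */
void check_expansions(void) {
  assert(strcmp(full_name(), "svaddqv_f16") == 0);
  assert(strcmp(overloaded_name(), "svaddqv_f16") == 0);
  assert(strcmp(overloaded_name_suffix_in_a2(), "svaddqv") == 0);
}
```

With the arguments exactly as they appear in the excerpt, `(svaddqv,,_f16,)`, both forms paste to `svaddqv_f16`, because the overloaded form keeps the first and third slots. The suffix is dropped only when it is passed in the second slot.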

dtemirbulatov

LGTM.



github-actions · 2023-11-24T12:50:51Z

:white_check_mark: With the latest revision this PR passed the C/C++ code formatter.


llvmbot added the clang, backend:AArch64, clang:frontend, clang:codegen, and llvm:ir labels Oct 23, 2023

CarolineConcatto requested review from dtemirbulatov, hassnaaHamdi, momchil-velikov and sdesmalen-arm and removed request for dtemirbulatov October 23, 2023 13:28

dtemirbulatov approved these changes Oct 27, 2023


sdesmalen-arm reviewed Nov 2, 2023

clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_fp_reduce.c (resolved)

clang/utils/TableGen/SveEmitter.cpp (outdated, resolved)

llvmbot added the backend:X86 and clang:headers labels Nov 8, 2023

CarolineConcatto added 2 commits November 24, 2023 12:03

[Clang][AArch64] Add fix vector types to header into SVE

e8ec7c8

This patch is needed for the reduction instructions in SVE2.1. It adds a new header to SVE with all the fixed vector types. The new types are only added if NEON is not declared.

Remove arm_neon_types.h

4367437

CarolineConcatto added 7 commits November 24, 2023 14:22

Fix missing arm_neon_types.h

32133d9

Remove arm_neon_types.h from NeonEmmiter

Fix header arm_vector_type to work with any header combination

c15c8ba

Now any combination between arm_neon.h and arm_sve.h should work

Merge branch 'main' into refactor-assembly-class

ff9295a

Use neon_vector_type for neon and sve headers

3240896

Fix tests fails

5cbbb3e

Restore NeonEmitter.cpp

1fcdfd2

Merge branch 'llvm:main' into refactor-assembly-class

fffac83
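The commits above describe declaring the fixed vector types once, however arm_neon.h and arm_sve.h are combined. The following is a minimal sketch of one way such a guard can work, not the header that Clang actually generates. The guard macro `DEMO_FIXED_VECTOR_TYPES_H` and the `demo_` types are hypothetical, and the GCC/Clang `vector_size` attribute stands in for the attribute the real headers use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* First inclusion of a hypothetical shared types header, for example
 * reached through the first of the two top-level headers. */
#ifndef DEMO_FIXED_VECTOR_TYPES_H
#define DEMO_FIXED_VECTOR_TYPES_H
typedef uint8_t demo_uint8x16_t __attribute__((vector_size(16)));
typedef float demo_float32x4_t __attribute__((vector_size(16)));
#endif

/* Second inclusion, reached through the other top-level header: the guard
 * is already defined, so the typedefs are not declared again. */
#ifndef DEMO_FIXED_VECTOR_TYPES_H
#define DEMO_FIXED_VECTOR_TYPES_H
typedef uint8_t demo_uint8x16_t __attribute__((vector_size(16)));
typedef float demo_float32x4_t __attribute__((vector_size(16)));
#endif

/* Lane counts follow from the 128-bit size of each fixed vector type. */
size_t demo_lanes_u8(void) {
  return sizeof(demo_uint8x16_t) / sizeof(uint8_t);
}

size_t demo_lanes_f32(void) {
  return sizeof(demo_float32x4_t) / sizeof(float);
}

/* Sanity check: both types occupy exactly one 128-bit quadword. */
void demo_check_layout(void) {
  assert(sizeof(demo_uint8x16_t) == 16);
  assert(sizeof(demo_float32x4_t) == 16);
}
```

Because both inclusions share one guard, the order in which the two top-level headers are included does not matter in this sketch, which is the property the "any combination" commit message asks for.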

CarolineConcatto merged commit f2464ca into llvm:main Dec 13, 2023

CarolineConcatto deleted the refactor-assembly-class branch December 13, 2023 15:46
