Skip to content

Conversation

@CarolineConcatto
Copy link
Contributor

This patch implements the builtins in Clang
and the LLVM-IR intrinsic for the following:

// Variants are also available for:
// _s8, _s16, _u16, _s32, _u32, _s64, _u64,
// _f16, _f32, _f64uint8x16_t svaddqv[_u8](svbool_t pg, svuint8_t zn);

// Variants are also available for:
// _s8, _u16, _s16, _u32, _s32, _u64, _s64
uint8x16_t svandqv[_u8](svbool_t pg, svuint8_t zn); uint8x16_t sveorqv[_u8](svbool_t pg, svuint8_t zn); uint8x16_t svorqv[_u8](svbool_t pg, svuint8_t zn);

// Variants are also available for:
// _s8, _u16, _s16, _u32, _s32, _u64, _s64;
uint8x16_t svmaxqv[_u8](svbool_t pg, svuint8_t zn); uint8x16_t svminqv[_u8](svbool_t pg, svuint8_t zn);

// Variants are also available for _f32, _f64
float16x8_t svmaxnmqv[_f16](svbool_t pg, svfloat16_t zn); float16x8_t svminnmqv[_f16](svbool_t pg, svfloat16_t zn);

According to the PR#257[1]

The reduction instruction uses scalable vectors as input and fixed vectors as output, therefore we changed SVEEmitter to emit fixed vector types in case the neon header(arm_neon.h) is not present.

[1]ARM-software/acle#257

Co-author: Dinar Temirbulatov dinar.temirbulatov@arm.com

This patch implements the builtins in Clang and the LLVM-IR intrinsic for the following: // Variants are also available for: // _s8, _s16, _u16, _s32, _u32, _s64, _u64, // _f16, _f32, _f64uint8x16_t svaddqv[_u8](svbool_t pg, svuint8_t zn); // Variants are also available for: // _s8, _u16, _s16, _u32, _s32, _u64, _s64 uint8x16_t svandqv[_u8](svbool_t pg, svuint8_t zn); uint8x16_t sveorqv[_u8](svbool_t pg, svuint8_t zn); uint8x16_t svorqv[_u8](svbool_t pg, svuint8_t zn); // Variants are also available for: // _s8, _u16, _s16, _u32, _s32, _u64, _s64; uint8x16_t svmaxqv[_u8](svbool_t pg, svuint8_t zn); uint8x16_t svminqv[_u8](svbool_t pg, svuint8_t zn); // Variants are also available for _f32, _f64 float16x8_t svmaxnmqv[_f16](svbool_t pg, svfloat16_t zn); float16x8_t svminnmqv[_f16](svbool_t pg, svfloat16_t zn); According to the PR#257[1] The reduction instruction uses scalable vectors as input and fixed vectors as output, therefore we changed SVEEmitter to emit fixed vector types in case the neon header(arm_neon.h) is not present. [1]ARM-software/acle#257 Co-author: Dinar Temirbulatov <dinar.temirbulatov@arm.com>
@llvmbot llvmbot added clang Clang issues not falling into any other category backend:AArch64 clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:codegen IR generation bugs: mangling, exceptions, etc. llvm:ir labels Oct 23, 2023
@llvmbot
Copy link
Member

llvmbot commented Oct 23, 2023

@llvm/pr-subscribers-backend-x86
@llvm/pr-subscribers-llvm-ir
@llvm/pr-subscribers-backend-aarch64
@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-clang

Author: None (CarolineConcatto)

Changes

This patch implements the builtins in Clang
and the LLVM-IR intrinsic for the following:

// Variants are also available for:
// _s8, _s16, _u16, _s32, _u32, _s64, _u64,
// _f16, _f32, _f64uint8x16_t svaddqv[_u8](svbool_t pg, svuint8_t zn);

// Variants are also available for:
// _s8, _u16, _s16, _u32, _s32, _u64, _s64
uint8x16_t svandqv[_u8](svbool_t pg, svuint8_t zn); uint8x16_t sveorqv[_u8](svbool_t pg, svuint8_t zn); uint8x16_t svorqv[_u8](svbool_t pg, svuint8_t zn);

// Variants are also available for:
// _s8, _u16, _s16, _u32, _s32, _u64, _s64;
uint8x16_t svmaxqv[_u8](svbool_t pg, svuint8_t zn); uint8x16_t svminqv[_u8](svbool_t pg, svuint8_t zn);

// Variants are also available for _f32, _f64
float16x8_t svmaxnmqv[_f16](svbool_t pg, svfloat16_t zn); float16x8_t svminnmqv[_f16](svbool_t pg, svfloat16_t zn);

According to the PR#257[1]

The reduction instruction uses scalable vectors as input and fixed vectors as output, therefore we changed SVEEmitter to emit fixed vector types in case the neon header(arm_neon.h) is not present.

[1]ARM-software/acle#257

Co-author: Dinar Temirbulatov <dinar.temirbulatov@arm.com>


Patch is 94.81 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/69926.diff

12 Files Affected:

  • (modified) clang/include/clang/Basic/TargetBuiltins.h (+1-1)
  • (modified) clang/include/clang/Basic/arm_sve.td (+17)
  • (modified) clang/include/clang/Basic/arm_sve_sme_incl.td (+2)
  • (modified) clang/lib/CodeGen/CGBuiltin.cpp (+4)
  • (added) clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_fp_reduce.c (+285)
  • (added) clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_int_reduce.c (+784)
  • (modified) clang/utils/TableGen/SveEmitter.cpp (+32-3)
  • (modified) llvm/include/llvm/IR/IntrinsicsAArch64.td (+21)
  • (modified) llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td (+13-13)
  • (modified) llvm/lib/Target/AArch64/SVEInstrFormats.td (+11-2)
  • (added) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-fp-reduce.ll (+189)
  • (added) llvm/test/CodeGen/AArch64/sve2p1-intrinsics-int-reduce.ll (+356)
diff --git a/clang/include/clang/Basic/TargetBuiltins.h b/clang/include/clang/Basic/TargetBuiltins.h index 8f7881abf26f7f4..c9f9cbec7493bfc 100644 --- a/clang/include/clang/Basic/TargetBuiltins.h +++ b/clang/include/clang/Basic/TargetBuiltins.h @@ -309,7 +309,7 @@ namespace clang { bool isTupleSet() const { return Flags & IsTupleSet; } bool isReadZA() const { return Flags & IsReadZA; } bool isWriteZA() const { return Flags & IsWriteZA; } - + bool isReductionQV() const { return Flags & IsReductionQV; } uint64_t getBits() const { return Flags; } bool isFlagSet(uint64_t Flag) const { return Flags & Flag; } }; diff --git a/clang/include/clang/Basic/arm_sve.td b/clang/include/clang/Basic/arm_sve.td index b5baafedd139602..e8fef1e7a8dfb0d 100644 --- a/clang/include/clang/Basic/arm_sve.td +++ b/clang/include/clang/Basic/arm_sve.td @@ -1859,6 +1859,23 @@ def SVBGRP : SInst<"svbgrp[_{d}]", "ddd", "UcUsUiUl", MergeNone, "aarch64_sv def SVBGRP_N : SInst<"svbgrp[_n_{d}]", "dda", "UcUsUiUl", MergeNone, "aarch64_sve_bgrp_x">; } +// Standalone sve2.1 builtins +let TargetGuard = "sve2p1" in { +def SVORQV : SInst<"svorqv[_{d}]", "{Pd", "csilUcUsUiUl", MergeNone, "aarch64_sve_orqv", [IsReductionQV]>; +def SVEORQV : SInst<"sveorqv[_{d}]", "{Pd", "csilUcUsUiUl", MergeNone, "aarch64_sve_eorqv", [IsReductionQV]>; +def SVADDQV : SInst<"svaddqv[_{d}]", "{Pd", "hfdcsilUcUsUiUl", MergeNone, "aarch64_sve_addqv", [IsReductionQV]>; +def SVANDQV : SInst<"svandqv[_{d}]", "{Pd", "csilUcUsUiUl", MergeNone, "aarch64_sve_andqv", [IsReductionQV]>; +def SVSMAXQV : SInst<"svmaxqv[_{d}]", "{Pd", "csil", MergeNone, "aarch64_sve_smaxqv", [IsReductionQV]>; +def SVUMAXQV : SInst<"svmaxqv[_{d}]", "{Pd", "UcUsUiUl", MergeNone, "aarch64_sve_umaxqv", [IsReductionQV]>; +def SVSMINQV : SInst<"svminqv[_{d}]", "{Pd", "csil", MergeNone, "aarch64_sve_sminqv", [IsReductionQV]>; +def SVUMINQV : SInst<"svminqv[_{d}]", "{Pd", "UcUsUiUl", MergeNone, "aarch64_sve_uminqv", [IsReductionQV]>; + +def SVFMAXNMQV: SInst<"svmaxnmqv[_{d}]", "{Pd", "hfd", MergeNone, "aarch64_sve_fmaxnmqv", [IsReductionQV]>; +def SVFMINNMQV: SInst<"svminnmqv[_{d}]", "{Pd", "hfd", MergeNone, "aarch64_sve_fminnmqv", [IsReductionQV]>; +def SVFMAXQV: SInst<"svmaxqv[_{d}]", "{Pd", "hfd", MergeNone, "aarch64_sve_fmaxqv", [IsReductionQV]>; +def SVFMINQV: SInst<"svminqv[_{d}]", "{Pd", "hfd", MergeNone, "aarch64_sve_fminqv", [IsReductionQV]>; +} + let TargetGuard = "sve2p1" in { def SVFCLAMP : SInst<"svclamp[_{d}]", "dddd", "hfd", MergeNone, "aarch64_sve_fclamp", [], []>; def SVPTRUE_COUNT : SInst<"svptrue_{d}", "}v", "QcQsQiQl", MergeNone, "aarch64_sve_ptrue_{d}", [IsOverloadNone], []>; diff --git a/clang/include/clang/Basic/arm_sve_sme_incl.td b/clang/include/clang/Basic/arm_sve_sme_incl.td index 3a7a5b51b25801e..9fe497173b56ac6 100644 --- a/clang/include/clang/Basic/arm_sve_sme_incl.td +++ b/clang/include/clang/Basic/arm_sve_sme_incl.td @@ -128,6 +128,7 @@ // Z: const pointer to uint64_t // Prototype modifiers added for SVE2p1 +// {: 128b vector // }: svcount_t class MergeType<int val, string suffix=""> { @@ -224,6 +225,7 @@ def IsSharedZA : FlagType<0x8000000000>; def IsPreservesZA : FlagType<0x10000000000>; def IsReadZA : FlagType<0x20000000000>; def IsWriteZA : FlagType<0x40000000000>; +def IsReductionQV : FlagType<0x80000000000>; // These must be kept in sync with the flags in include/clang/Basic/TargetBuiltins.h class ImmCheckType<int val> { diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp index e1211bb8949b665..86e77db4b914571 100644 --- a/clang/lib/CodeGen/CGBuiltin.cpp +++ b/clang/lib/CodeGen/CGBuiltin.cpp @@ -9834,6 +9834,10 @@ CodeGenFunction::getSVEOverloadTypes(const SVETypeFlags &TypeFlags, if (TypeFlags.isOverloadCvt()) return {Ops[0]->getType(), Ops.back()->getType()}; + if (TypeFlags.isReductionQV() && !ResultType->isScalableTy() && + ResultType->isVectorTy()) + return {ResultType, Ops[1]->getType()}; + assert(TypeFlags.isOverloadDefault() && "Unexpected value for overloads"); return {DefaultType}; } diff --git a/clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_fp_reduce.c b/clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_fp_reduce.c new file mode 100644 index 000000000000000..e58cf4e49a37f92 --- /dev/null +++ b/clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_fp_reduce.c @@ -0,0 +1,285 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py +// REQUIRES: aarch64-registered-target +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve2p1 -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve2p1 -S -O1 -Werror -Wall -emit-llvm -o - -x c++ %s | FileCheck %s -check-prefix=CPP-CHECK +// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve2p1 -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve2p1 -S -O1 -Werror -Wall -emit-llvm -o - -x c++ %s | FileCheck %s -check-prefix=CPP-CHECK +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve2p1 -S -disable-O0-optnone -Werror -Wall -o /dev/null %s +#include <arm_neon.h> +#include <arm_sve.h> + +#ifdef SVE_OVERLOADED_FORMS +// A simple used,unused... macro, long enough to represent any SVE builtin. +#define SVE_ACLE_FUNC(A1,A2_UNUSED,A3,A4_UNUSED) A1##A3 +#else +#define SVE_ACLE_FUNC(A1,A2,A3,A4) A1##A2##A3##A4 +#endif + +// FADDQV + +// CHECK-LABEL: @test_svaddqv_f16( +// CHECK-NEXT: entry: +// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> [[PG:%.*]]) +// CHECK-NEXT: [[TMP1:%.*]] = tail call <8 x half> @llvm.aarch64.sve.addqv.v8f16.nxv8f16(<vscale x 8 x i1> [[TMP0]], <vscale x 8 x half> [[OP:%.*]]) +// CHECK-NEXT: ret <8 x half> [[TMP1]] +// +// CPP-CHECK-LABEL: @_Z16test_svaddqv_f16u10__SVBool_tu13__SVFloat16_t( +// CPP-CHECK-NEXT: entry: +// CPP-CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> [[PG:%.*]]) +// CPP-CHECK-NEXT: [[TMP1:%.*]] = tail call <8 x half> @llvm.aarch64.sve.addqv.v8f16.nxv8f16(<vscale x 8 x i1> [[TMP0]], <vscale x 8 x half> [[OP:%.*]]) +// CPP-CHECK-NEXT: ret <8 x half> [[TMP1]] +// +float16x8_t test_svaddqv_f16(svbool_t pg, svfloat16_t op) +{ + return SVE_ACLE_FUNC(svaddqv,,_f16,)(pg, op); +} + +// CHECK-LABEL: @test_svaddqv_f32( +// CHECK-NEXT: entry: +// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> [[PG:%.*]]) +// CHECK-NEXT: [[TMP1:%.*]] = tail call <4 x float> @llvm.aarch64.sve.addqv.v4f32.nxv4f32(<vscale x 4 x i1> [[TMP0]], <vscale x 4 x float> [[OP:%.*]]) +// CHECK-NEXT: ret <4 x float> [[TMP1]] +// +// CPP-CHECK-LABEL: @_Z16test_svaddqv_f32u10__SVBool_tu13__SVFloat32_t( +// CPP-CHECK-NEXT: entry: +// CPP-CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> [[PG:%.*]]) +// CPP-CHECK-NEXT: [[TMP1:%.*]] = tail call <4 x float> @llvm.aarch64.sve.addqv.v4f32.nxv4f32(<vscale x 4 x i1> [[TMP0]], <vscale x 4 x float> [[OP:%.*]]) +// CPP-CHECK-NEXT: ret <4 x float> [[TMP1]] +// +float32x4_t test_svaddqv_f32(svbool_t pg, svfloat32_t op) +{ + return SVE_ACLE_FUNC(svaddqv,,_f32,)(pg, op); +} + +// CHECK-LABEL: @test_svaddqv_f64( +// CHECK-NEXT: entry: +// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> [[PG:%.*]]) +// CHECK-NEXT: [[TMP1:%.*]] = tail call <2 x double> @llvm.aarch64.sve.addqv.v2f64.nxv2f64(<vscale x 2 x i1> [[TMP0]], <vscale x 2 x double> [[OP:%.*]]) +// CHECK-NEXT: ret <2 x double> [[TMP1]] +// +// CPP-CHECK-LABEL: @_Z16test_svaddqv_f64u10__SVBool_tu13__SVFloat64_t( +// CPP-CHECK-NEXT: entry: +// CPP-CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> [[PG:%.*]]) +// CPP-CHECK-NEXT: [[TMP1:%.*]] = tail call <2 x double> @llvm.aarch64.sve.addqv.v2f64.nxv2f64(<vscale x 2 x i1> [[TMP0]], <vscale x 2 x double> [[OP:%.*]]) +// CPP-CHECK-NEXT: ret <2 x double> [[TMP1]] +// +float64x2_t test_svaddqv_f64(svbool_t pg, svfloat64_t op) +{ + return SVE_ACLE_FUNC(svaddqv,,_f64,)(pg, op); +} + + +// FMAXQV + +// CHECK-LABEL: @test_svmaxqv_f16( +// CHECK-NEXT: entry: +// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> [[PG:%.*]]) +// CHECK-NEXT: [[TMP1:%.*]] = tail call <8 x half> @llvm.aarch64.sve.fmaxqv.v8f16.nxv8f16(<vscale x 8 x i1> [[TMP0]], <vscale x 8 x half> [[OP:%.*]]) +// CHECK-NEXT: ret <8 x half> [[TMP1]] +// +// CPP-CHECK-LABEL: @_Z16test_svmaxqv_f16u10__SVBool_tu13__SVFloat16_t( +// CPP-CHECK-NEXT: entry: +// CPP-CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> [[PG:%.*]]) +// CPP-CHECK-NEXT: [[TMP1:%.*]] = tail call <8 x half> @llvm.aarch64.sve.fmaxqv.v8f16.nxv8f16(<vscale x 8 x i1> [[TMP0]], <vscale x 8 x half> [[OP:%.*]]) +// CPP-CHECK-NEXT: ret <8 x half> [[TMP1]] +// +float16x8_t test_svmaxqv_f16(svbool_t pg, svfloat16_t op) +{ + return SVE_ACLE_FUNC(svmaxqv,,_f16,)(pg, op); +} + +// CHECK-LABEL: @test_svmaxqv_f32( +// CHECK-NEXT: entry: +// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> [[PG:%.*]]) +// CHECK-NEXT: [[TMP1:%.*]] = tail call <4 x float> @llvm.aarch64.sve.fmaxqv.v4f32.nxv4f32(<vscale x 4 x i1> [[TMP0]], <vscale x 4 x float> [[OP:%.*]]) +// CHECK-NEXT: ret <4 x float> [[TMP1]] +// +// CPP-CHECK-LABEL: @_Z16test_svmaxqv_f32u10__SVBool_tu13__SVFloat32_t( +// CPP-CHECK-NEXT: entry: +// CPP-CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> [[PG:%.*]]) +// CPP-CHECK-NEXT: [[TMP1:%.*]] = tail call <4 x float> @llvm.aarch64.sve.fmaxqv.v4f32.nxv4f32(<vscale x 4 x i1> [[TMP0]], <vscale x 4 x float> [[OP:%.*]]) +// CPP-CHECK-NEXT: ret <4 x float> [[TMP1]] +// +float32x4_t test_svmaxqv_f32(svbool_t pg, svfloat32_t op) +{ + return SVE_ACLE_FUNC(svmaxqv,,_f32,)(pg, op); +} + +// CHECK-LABEL: @test_svmaxqv_f64( +// CHECK-NEXT: entry: +// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> [[PG:%.*]]) +// CHECK-NEXT: [[TMP1:%.*]] = tail call <2 x double> @llvm.aarch64.sve.fmaxqv.v2f64.nxv2f64(<vscale x 2 x i1> [[TMP0]], <vscale x 2 x double> [[OP:%.*]]) +// CHECK-NEXT: ret <2 x double> [[TMP1]] +// +// CPP-CHECK-LABEL: @_Z16test_svmaxqv_f64u10__SVBool_tu13__SVFloat64_t( +// CPP-CHECK-NEXT: entry: +// CPP-CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> [[PG:%.*]]) +// CPP-CHECK-NEXT: [[TMP1:%.*]] = tail call <2 x double> @llvm.aarch64.sve.fmaxqv.v2f64.nxv2f64(<vscale x 2 x i1> [[TMP0]], <vscale x 2 x double> [[OP:%.*]]) +// CPP-CHECK-NEXT: ret <2 x double> [[TMP1]] +// +float64x2_t test_svmaxqv_f64(svbool_t pg, svfloat64_t op) +{ + return SVE_ACLE_FUNC(svmaxqv,,_f64,)(pg, op); +} + + +// FMINQV + +// CHECK-LABEL: @test_svminqv_f16( +// CHECK-NEXT: entry: +// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> [[PG:%.*]]) +// CHECK-NEXT: [[TMP1:%.*]] = tail call <8 x half> @llvm.aarch64.sve.fminqv.v8f16.nxv8f16(<vscale x 8 x i1> [[TMP0]], <vscale x 8 x half> [[OP:%.*]]) +// CHECK-NEXT: ret <8 x half> [[TMP1]] +// +// CPP-CHECK-LABEL: @_Z16test_svminqv_f16u10__SVBool_tu13__SVFloat16_t( +// CPP-CHECK-NEXT: entry: +// CPP-CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> [[PG:%.*]]) +// CPP-CHECK-NEXT: [[TMP1:%.*]] = tail call <8 x half> @llvm.aarch64.sve.fminqv.v8f16.nxv8f16(<vscale x 8 x i1> [[TMP0]], <vscale x 8 x half> [[OP:%.*]]) +// CPP-CHECK-NEXT: ret <8 x half> [[TMP1]] +// +float16x8_t test_svminqv_f16(svbool_t pg, svfloat16_t op) +{ + return SVE_ACLE_FUNC(svminqv,,_f16,)(pg, op); +} + +// CHECK-LABEL: @test_svminqv_f32( +// CHECK-NEXT: entry: +// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> [[PG:%.*]]) +// CHECK-NEXT: [[TMP1:%.*]] = tail call <4 x float> @llvm.aarch64.sve.fminqv.v4f32.nxv4f32(<vscale x 4 x i1> [[TMP0]], <vscale x 4 x float> [[OP:%.*]]) +// CHECK-NEXT: ret <4 x float> [[TMP1]] +// +// CPP-CHECK-LABEL: @_Z16test_svminqv_f32u10__SVBool_tu13__SVFloat32_t( +// CPP-CHECK-NEXT: entry: +// CPP-CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> [[PG:%.*]]) +// CPP-CHECK-NEXT: [[TMP1:%.*]] = tail call <4 x float> @llvm.aarch64.sve.fminqv.v4f32.nxv4f32(<vscale x 4 x i1> [[TMP0]], <vscale x 4 x float> [[OP:%.*]]) +// CPP-CHECK-NEXT: ret <4 x float> [[TMP1]] +// +float32x4_t test_svminqv_f32(svbool_t pg, svfloat32_t op) +{ + return SVE_ACLE_FUNC(svminqv,,_f32,)(pg, op); +} + +// CHECK-LABEL: @test_svminqv_f64( +// CHECK-NEXT: entry: +// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> [[PG:%.*]]) +// CHECK-NEXT: [[TMP1:%.*]] = tail call <2 x double> @llvm.aarch64.sve.fminqv.v2f64.nxv2f64(<vscale x 2 x i1> [[TMP0]], <vscale x 2 x double> [[OP:%.*]]) +// CHECK-NEXT: ret <2 x double> [[TMP1]] +// +// CPP-CHECK-LABEL: @_Z16test_svminqv_f64u10__SVBool_tu13__SVFloat64_t( +// CPP-CHECK-NEXT: entry: +// CPP-CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> [[PG:%.*]]) +// CPP-CHECK-NEXT: [[TMP1:%.*]] = tail call <2 x double> @llvm.aarch64.sve.fminqv.v2f64.nxv2f64(<vscale x 2 x i1> [[TMP0]], <vscale x 2 x double> [[OP:%.*]]) +// CPP-CHECK-NEXT: ret <2 x double> [[TMP1]] +// +float64x2_t test_svminqv_f64(svbool_t pg, svfloat64_t op) +{ + return SVE_ACLE_FUNC(svminqv,,_f64,)(pg, op); +} + + +// FMAXNMQV + +// CHECK-LABEL: @test_svmaxnmqv_f16( +// CHECK-NEXT: entry: +// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> [[PG:%.*]]) +// CHECK-NEXT: [[TMP1:%.*]] = tail call <8 x half> @llvm.aarch64.sve.fmaxnmqv.v8f16.nxv8f16(<vscale x 8 x i1> [[TMP0]], <vscale x 8 x half> [[OP:%.*]]) +// CHECK-NEXT: ret <8 x half> [[TMP1]] +// +// CPP-CHECK-LABEL: @_Z18test_svmaxnmqv_f16u10__SVBool_tu13__SVFloat16_t( +// CPP-CHECK-NEXT: entry: +// CPP-CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> [[PG:%.*]]) +// CPP-CHECK-NEXT: [[TMP1:%.*]] = tail call <8 x half> @llvm.aarch64.sve.fmaxnmqv.v8f16.nxv8f16(<vscale x 8 x i1> [[TMP0]], <vscale x 8 x half> [[OP:%.*]]) +// CPP-CHECK-NEXT: ret <8 x half> [[TMP1]] +// +float16x8_t test_svmaxnmqv_f16(svbool_t pg, svfloat16_t op) +{ + return SVE_ACLE_FUNC(svmaxnmqv,,_f16,)(pg, op); +} + +// CHECK-LABEL: @test_svmaxnmqv_f32( +// CHECK-NEXT: entry: +// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> [[PG:%.*]]) +// CHECK-NEXT: [[TMP1:%.*]] = tail call <4 x float> @llvm.aarch64.sve.fmaxnmqv.v4f32.nxv4f32(<vscale x 4 x i1> [[TMP0]], <vscale x 4 x float> [[OP:%.*]]) +// CHECK-NEXT: ret <4 x float> [[TMP1]] +// +// CPP-CHECK-LABEL: @_Z18test_svmaxnmqv_f32u10__SVBool_tu13__SVFloat32_t( +// CPP-CHECK-NEXT: entry: +// CPP-CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> [[PG:%.*]]) +// CPP-CHECK-NEXT: [[TMP1:%.*]] = tail call <4 x float> @llvm.aarch64.sve.fmaxnmqv.v4f32.nxv4f32(<vscale x 4 x i1> [[TMP0]], <vscale x 4 x float> [[OP:%.*]]) +// CPP-CHECK-NEXT: ret <4 x float> [[TMP1]] +// +float32x4_t test_svmaxnmqv_f32(svbool_t pg, svfloat32_t op) +{ + return SVE_ACLE_FUNC(svmaxnmqv,,_f32,)(pg, op); +} + +// CHECK-LABEL: @test_svmaxnmqv_f64( +// CHECK-NEXT: entry: +// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> [[PG:%.*]]) +// CHECK-NEXT: [[TMP1:%.*]] = tail call <2 x double> @llvm.aarch64.sve.fmaxnmqv.v2f64.nxv2f64(<vscale x 2 x i1> [[TMP0]], <vscale x 2 x double> [[OP:%.*]]) +// CHECK-NEXT: ret <2 x double> [[TMP1]] +// +// CPP-CHECK-LABEL: @_Z18test_svmaxnmqv_f64u10__SVBool_tu13__SVFloat64_t( +// CPP-CHECK-NEXT: entry: +// CPP-CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> [[PG:%.*]]) +// CPP-CHECK-NEXT: [[TMP1:%.*]] = tail call <2 x double> @llvm.aarch64.sve.fmaxnmqv.v2f64.nxv2f64(<vscale x 2 x i1> [[TMP0]], <vscale x 2 x double> [[OP:%.*]]) +// CPP-CHECK-NEXT: ret <2 x double> [[TMP1]] +// +float64x2_t test_svmaxnmqv_f64(svbool_t pg, svfloat64_t op) +{ + return SVE_ACLE_FUNC(svmaxnmqv,,_f64,)(pg, op); +} + + +// FMINNMQV + +// CHECK-LABEL: @test_svminnmqv_f16( +// CHECK-NEXT: entry: +// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> [[PG:%.*]]) +// CHECK-NEXT: [[TMP1:%.*]] = tail call <8 x half> @llvm.aarch64.sve.fminnmqv.v8f16.nxv8f16(<vscale x 8 x i1> [[TMP0]], <vscale x 8 x half> [[OP:%.*]]) +// CHECK-NEXT: ret <8 x half> [[TMP1]] +// +// CPP-CHECK-LABEL: @_Z18test_svminnmqv_f16u10__SVBool_tu13__SVFloat16_t( +// CPP-CHECK-NEXT: entry: +// CPP-CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> [[PG:%.*]]) +// CPP-CHECK-NEXT: [[TMP1:%.*]] = tail call <8 x half> @llvm.aarch64.sve.fminnmqv.v8f16.nxv8f16(<vscale x 8 x i1> [[TMP0]], <vscale x 8 x half> [[OP:%.*]]) +// CPP-CHECK-NEXT: ret <8 x half> [[TMP1]] +// +float16x8_t test_svminnmqv_f16(svbool_t pg, svfloat16_t op) +{ + return SVE_ACLE_FUNC(svminnmqv,,_f16,)(pg, op); +} + +// CHECK-LABEL: @test_svminnmqv_f32( +// CHECK-NEXT: entry: +// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> [[PG:%.*]]) +// CHECK-NEXT: [[TMP1:%.*]] = tail call <4 x float> @llvm.aarch64.sve.fminnmqv.v4f32.nxv4f32(<vscale x 4 x i1> [[TMP0]], <vscale x 4 x float> [[OP:%.*]]) +// CHECK-NEXT: ret <4 x float> [[TMP1]] +// +// CPP-CHECK-LABEL: @_Z18test_svminnmqv_f32u10__SVBool_tu13__SVFloat32_t( +// CPP-CHECK-NEXT: entry: +// CPP-CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> [[PG:%.*]]) +// CPP-CHECK-NEXT: [[TMP1:%.*]] = tail call <4 x float> @llvm.aarch64.sve.fminnmqv.v4f32.nxv4f32(<vscale x 4 x i1> [[TMP0]], <vscale x 4 x float> [[OP:%.*]]) +// CPP-CHECK-NEXT: ret <4 x float> [[TMP1]] +// +float32x4_t test_svminnmqv_f32(svbool_t pg, svfloat32_t op) +{ + return SVE_ACLE_FUNC(svminnmqv,,_f32,)(pg, op); +} + +// CHECK-LABEL: @test_svminnmqv_f64( +// CHECK-NEXT: entry: +// CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> [[PG:%.*]]) +// CHECK-NEXT: [[TMP1:%.*]] = tail call <2 x double> @llvm.aarch64.sve.fminnmqv.v2f64.nxv2f64(<vscale x 2 x i1> [[TMP0]], <vscale x 2 x double> [[OP:%.*]]) +// CHECK-NEXT: ret <2 x double> [[TMP1]] +// +// CPP-CHECK-LABEL: @_Z18test_svminnmqv_f64u10__SVBool_tu13__SVFloat64_t( +// CPP-CHECK-NEXT: entry: +// CPP-CHECK-NEXT: [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> [[PG:%.*]]) +// CPP-CHECK-NEXT: [[TMP1:%.*]] = tail call <2 x double> @llvm.aarch64.sve.fminnmqv.v2f64.nxv2f64(<vscale x 2 x i1> [[TMP0]], <vscale x 2 x double> [[OP:%.*]]) +// CPP-CHECK-NEXT: ret <2 x double> [[TMP1]] +// +float64x2_t test_svminnmqv_f64(svbool_t pg, svfloat64_t op) +{ + return SVE_ACLE_FUNC(svminnmqv,,_f64,)(pg, op); +} diff --git a/clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_int_reduc... [truncated] 
Copy link
Contributor

@dtemirbulatov dtemirbulatov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Add a new header for neon and sve This patch implements the builtins in Clang and the LLVM-IR intrinsic for the following: // Variants are also available for: // _s8, _s16, _u16, _s32, _u32, _s64, _u64, // _f16, _f32, _f64uint8x16_t svaddqv[_u8](svbool_t pg, svuint8_t zn); // Variants are also available for: // _s8, _u16, _s16, _u32, _s32, _u64, _s64 uint8x16_t svandqv[_u8](svbool_t pg, svuint8_t zn); uint8x16_t sveorqv[_u8](svbool_t pg, svuint8_t zn); uint8x16_t svorqv[_u8](svbool_t pg, svuint8_t zn); // Variants are also available for: // _s8, _u16, _s16, _u32, _s32, _u64, _s64; uint8x16_t svmaxqv[_u8](svbool_t pg, svuint8_t zn); uint8x16_t svminqv[_u8](svbool_t pg, svuint8_t zn); // Variants are also available for _f32, _f64 float16x8_t svmaxnmqv[_f16](svbool_t pg, svfloat16_t zn); float16x8_t svminnmqv[_f16](svbool_t pg, svfloat16_t zn); According to the PR#257[1] The reduction instruction uses scalable vectors as input and fixed vectors as output, therefore we changed SVEEmitter to emit fixed vector types in case the neon header(arm_neon.h) is not present. [1]ARM-software/acle#257 Co-author by: Dinar Temirbulatov <dinar.temirbulatov@arm.com>
@llvmbot llvmbot added backend:X86 clang:headers Headers provided by Clang, e.g. for intrinsics labels Nov 8, 2023
This patch is needed for the reduction instructions in sve2.1 It add ta new header to sve with all the fixed vector types. The new types are only added if neon is not declared.
@github-actions
Copy link

github-actions bot commented Nov 24, 2023

:white_check_mark: With the latest revision this PR passed the C/C++ code formatter. 
@CarolineConcatto CarolineConcatto merged commit f2464ca into llvm:main Dec 13, 2023
@CarolineConcatto CarolineConcatto deleted the refactor-assembly-class branch December 13, 2023 15:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend:AArch64 backend:X86 clang:codegen IR generation bugs: mangling, exceptions, etc. clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:headers Headers provided by Clang, e.g. for intrinsics clang Clang issues not falling into any other category llvm:ir

4 participants