
Conversation


@MacDue MacDue commented Dec 9, 2024

This is a split-off from #109833 and only adds code relating to checking if a struct-returning call can be vectorized.

This initial patch only allows the case where all users of the struct return are extractvalue operations that can be widened.

```llvm
%call = tail call { float, float } @foo(float %in_val)
%extract_a = extractvalue { float, float } %call, 0
%extract_b = extractvalue { float, float } %call, 1
```

Note: The tests require the VFABI changes from #119000 to pass.
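For context, the end goal (not implemented by this patch, which only adds the legality checks) is for a scalar call like the one above to become a single vector call whose struct elements are vectors. A hypothetical sketch, assuming a fixed-width vector variant `@fixed_vec_foo` and a VF of 2 (both names and widths are illustrative, not part of this patch):

```llvm
%wide.call = call { <2 x float>, <2 x float> } @fixed_vec_foo(<2 x float> %wide.in)
%wide.a = extractvalue { <2 x float>, <2 x float> } %wide.call, 0
%wide.b = extractvalue { <2 x float>, <2 x float> } %wide.call, 1
```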

@llvmbot llvmbot added the vectorizers, llvm:analysis, and llvm:transforms labels Dec 9, 2024

llvmbot commented Dec 9, 2024

@llvm/pr-subscribers-llvm-ir
@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-vectorizers

Author: Benjamin Maxwell (MacDue)

Changes

This is a split-off from #109833 and only adds code relating to checking if a struct-returning call can be vectorized.

This initial patch only allows the case where all users of the struct return are extractvalue operations that can be widened.

```llvm
%call = tail call { float, float } @foo(float %in_val) #0
%extract_a = extractvalue { float, float } %call, 0
%extract_b = extractvalue { float, float } %call, 1
```

Note: The tests require the VFABI changes from #119000 to pass.


Patch is 22.42 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/119221.diff

7 Files Affected:

  • (modified) llvm/include/llvm/Analysis/VectorUtils.h (+4)
  • (modified) llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h (+10)
  • (modified) llvm/lib/Analysis/VectorUtils.cpp (+15)
  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp (+15-2)
  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+9)
  • (added) llvm/test/Transforms/LoopVectorize/AArch64/scalable-struct-return.ll (+97)
  • (added) llvm/test/Transforms/LoopVectorize/struct-return.ll (+268)
```diff
diff --git a/llvm/include/llvm/Analysis/VectorUtils.h b/llvm/include/llvm/Analysis/VectorUtils.h
index c1016dd7bdddbd..2f89ff562ae752 100644
--- a/llvm/include/llvm/Analysis/VectorUtils.h
+++ b/llvm/include/llvm/Analysis/VectorUtils.h
@@ -140,6 +140,10 @@ inline Type *ToVectorTy(Type *Scalar, unsigned VF) {
   return ToVectorTy(Scalar, ElementCount::getFixed(VF));
 }
 
+/// Returns true if the call return type `Ty` can be widened by the loop
+/// vectorizer.
+bool canWidenCallReturnType(Type *Ty);
+
 /// Identify if the intrinsic is trivially vectorizable.
 /// This method returns true if the intrinsic's argument types are all scalars
 /// for the scalar form of the intrinsic and all vectors (or scalars handled by
diff --git a/llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h b/llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
index dc7e484a40a452..0bbec848702372 100644
--- a/llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
+++ b/llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
@@ -417,6 +417,10 @@ class LoopVectorizationLegality {
   /// has a vectorized variant available.
   bool hasVectorCallVariants() const { return VecCallVariantsFound; }
 
+  /// Returns true if there is at least one function call in the loop which
+  /// returns a struct type and needs to be vectorized.
+  bool hasStructVectorCall() const { return StructVecVecCallFound; }
+
   unsigned getNumStores() const { return LAI->getNumStores(); }
   unsigned getNumLoads() const { return LAI->getNumLoads(); }
 
@@ -639,6 +643,12 @@ class LoopVectorizationLegality {
   /// the use of those function variants.
   bool VecCallVariantsFound = false;
 
+  /// If we find a call (to be vectorized) that returns a struct type, record
+  /// that so we can bail out until this is supported.
+  /// TODO: Remove this flag once vectorizing calls with struct returns is
+  /// supported.
+  bool StructVecVecCallFound = false;
+
   /// Indicates whether this loop has an uncountable early exit, i.e. an
   /// uncountable exiting block that is not the latch.
   bool HasUncountableEarlyExit = false;
diff --git a/llvm/lib/Analysis/VectorUtils.cpp b/llvm/lib/Analysis/VectorUtils.cpp
index 5f7aa530342489..4b47154b6d972a 100644
--- a/llvm/lib/Analysis/VectorUtils.cpp
+++ b/llvm/lib/Analysis/VectorUtils.cpp
@@ -39,6 +39,21 @@ static cl::opt<unsigned> MaxInterleaveGroupFactor(
     cl::desc("Maximum factor for an interleaved access group (default = 8)"),
     cl::init(8));
 
+/// Returns true if the call return type `Ty` can be widened by the loop
+/// vectorizer.
+bool llvm::canWidenCallReturnType(Type *Ty) {
+  Type *ElTy = Ty;
+  // For now, only allow widening non-packed literal structs where all
+  // element types are the same. This simplifies the cost model and
+  // conversion between scalar and wide types.
+  if (auto *StructTy = dyn_cast<StructType>(Ty);
+      StructTy && !StructTy->isPacked() && StructTy->isLiteral() &&
+      StructTy->containsHomogeneousTypes()) {
+    ElTy = StructTy->elements().front();
+  }
+  return VectorType::isValidElementType(ElTy);
+}
+
 /// Return true if all of the intrinsic's arguments and return type are scalars
 /// for the scalar form of the intrinsic, and vectors for the vector form of the
 /// intrinsic (except operands that are marked as always being scalar by
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
index f1568781252c06..5276b17dd7df1e 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
@@ -943,11 +943,24 @@ bool LoopVectorizationLegality::canVectorizeInstrs() {
       if (CI && !VFDatabase::getMappings(*CI).empty())
         VecCallVariantsFound = true;
 
+      auto canWidenInstruction = [this](Instruction const &Inst) {
+        Type *InstTy = Inst.getType();
+        if (isa<CallInst>(Inst) && isa<StructType>(InstTy) &&
+            canWidenCallReturnType(InstTy)) {
+          StructVecVecCallFound = true;
+          // For now, we can only widen struct values returned from calls where
+          // all users are extractvalue instructions.
+          return llvm::all_of(Inst.uses(), [](auto &Use) {
+            return isa<ExtractValueInst>(Use.getUser());
+          });
+        }
+        return VectorType::isValidElementType(InstTy) || InstTy->isVoidTy();
+      };
+
       // Check that the instruction return type is vectorizable.
       // We can't vectorize casts from vector type to scalar type.
       // Also, we can't vectorize extractelement instructions.
-      if ((!VectorType::isValidElementType(I.getType()) &&
-           !I.getType()->isVoidTy()) ||
+      if (!canWidenInstruction(I) ||
           (isa<CastInst>(I) &&
            !VectorType::isValidElementType(I.getOperand(0)->getType())) ||
           isa<ExtractElementInst>(I)) {
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 37118702762956..af10c127678277 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -10004,6 +10004,15 @@ bool LoopVectorizePass::processLoop(Loop *L) {
     return false;
   }
 
+  if (LVL.hasStructVectorCall()) {
+    constexpr StringLiteral FailureMessage(
+        "Auto-vectorization of calls that return struct types is not yet "
+        "supported");
+    reportVectorizationFailure(FailureMessage, FailureMessage,
+                               "StructCallVectorizationUnsupported", ORE, L);
+    return false;
+  }
+
   // Entrance to the VPlan-native vectorization path. Outer loops are processed
   // here. They may require CFG and instruction level transformations before
   // even evaluating whether vectorization is profitable.
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/scalable-struct-return.ll b/llvm/test/Transforms/LoopVectorize/AArch64/scalable-struct-return.ll
new file mode 100644
index 00000000000000..0454272d3f3dd6
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/scalable-struct-return.ll
@@ -0,0 +1,97 @@
+; RUN: opt < %s -mattr=+sve -passes=loop-vectorize,dce,instcombine -force-vector-interleave=1 -prefer-predicate-over-epilogue=predicate-dont-vectorize -S | FileCheck %s
+; RUN: opt < %s -mattr=+sve -passes=loop-vectorize,dce,instcombine -force-vector-interleave=1 -prefer-predicate-over-epilogue=predicate-dont-vectorize -pass-remarks-analysis=loop-vectorize -disable-output -S 2>&1 | FileCheck %s --check-prefix=CHECK-REMARKS
+
+target triple = "aarch64-unknown-linux-gnu"
+
+; Tests basic vectorization of scalable homogeneous struct literal returns.
+
+; TODO: Support vectorization in this case.
+; CHECK-REMARKS: remark: {{.*}} loop not vectorized: Auto-vectorization of calls that return struct types is not yet supported
+define void @struct_return_f32_widen(ptr noalias %in, ptr noalias writeonly %out_a, ptr noalias writeonly %out_b) {
+; CHECK-LABEL: define void @struct_return_f32_widen
+; CHECK-NOT: vector.body:
+entry:
+  br label %for.body
+
+for.body:
+  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
+  %arrayidx = getelementptr inbounds float, ptr %in, i64 %iv
+  %in_val = load float, ptr %arrayidx, align 4
+  %call = tail call { float, float } @foo(float %in_val) #0
+  %extract_a = extractvalue { float, float } %call, 0
+  %extract_b = extractvalue { float, float } %call, 1
+  %arrayidx2 = getelementptr inbounds float, ptr %out_a, i64 %iv
+  store float %extract_a, ptr %arrayidx2, align 4
+  %arrayidx4 = getelementptr inbounds float, ptr %out_b, i64 %iv
+  store float %extract_b, ptr %arrayidx4, align 4
+  %iv.next = add nuw nsw i64 %iv, 1
+  %exitcond.not = icmp eq i64 %iv.next, 1024
+  br i1 %exitcond.not, label %exit, label %for.body
+
+exit:
+  ret void
+}
+
+; TODO: Support vectorization in this case.
+; CHECK-REMARKS: remark: {{.*}} loop not vectorized: Auto-vectorization of calls that return struct types is not yet supported
+define void @struct_return_f64_widen(ptr noalias %in, ptr noalias writeonly %out_a, ptr noalias writeonly %out_b) {
+; CHECK-LABEL: define void @struct_return_f64_widen
+; CHECK-NOT: vector.body:
+entry:
+  br label %for.body
+
+for.body:
+  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
+  %arrayidx = getelementptr inbounds double, ptr %in, i64 %iv
+  %in_val = load double, ptr %arrayidx, align 8
+  %call = tail call { double, double } @bar(double %in_val) #1
+  %extract_a = extractvalue { double, double } %call, 0
+  %extract_b = extractvalue { double, double } %call, 1
+  %arrayidx2 = getelementptr inbounds double, ptr %out_a, i64 %iv
+  store double %extract_a, ptr %arrayidx2, align 8
+  %arrayidx4 = getelementptr inbounds double, ptr %out_b, i64 %iv
+  store double %extract_b, ptr %arrayidx4, align 8
+  %iv.next = add nuw nsw i64 %iv, 1
+  %exitcond.not = icmp eq i64 %iv.next, 1024
+  br i1 %exitcond.not, label %exit, label %for.body
+
+exit:
+  ret void
+}
+
+; TODO: Support vectorization in this case.
+; CHECK-REMARKS: remark: {{.*}} loop not vectorized: Auto-vectorization of calls that return struct types is not yet supported
+define void @struct_return_f32_widen_rt_checks(ptr %in, ptr writeonly %out_a, ptr writeonly %out_b) {
+; CHECK-LABEL: define void @struct_return_f32_widen_rt_checks
+; CHECK-NOT: vector.body:
+entry:
+  br label %for.body
+
+for.body:
+  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
+  %arrayidx = getelementptr inbounds float, ptr %in, i64 %iv
+  %in_val = load float, ptr %arrayidx, align 4
+  %call = tail call { float, float } @foo(float %in_val) #0
+  %extract_a = extractvalue { float, float } %call, 0
+  %extract_b = extractvalue { float, float } %call, 1
+  %arrayidx2 = getelementptr inbounds float, ptr %out_a, i64 %iv
+  store float %extract_a, ptr %arrayidx2, align 4
+  %arrayidx4 = getelementptr inbounds float, ptr %out_b, i64 %iv
+  store float %extract_b, ptr %arrayidx4, align 4
+  %iv.next = add nuw nsw i64 %iv, 1
+  %exitcond.not = icmp eq i64 %iv.next, 1024
+  br i1 %exitcond.not, label %exit, label %for.body
+
+exit:
+  ret void
+}
+
+declare { float, float } @foo(float)
+declare { double, double } @bar(double)
+
+declare { <vscale x 4 x float>, <vscale x 4 x float> } @scalable_vec_masked_foo(<vscale x 4 x float>, <vscale x 4 x i1>)
+declare { <vscale x 2 x double>, <vscale x 2 x double> } @scalable_vec_masked_bar(<vscale x 2 x double>, <vscale x 2 x i1>)
+
+
+attributes #0 = { nounwind "vector-function-abi-variant"="_ZGVsMxv_foo(scalable_vec_masked_foo)" }
+attributes #1 = { nounwind "vector-function-abi-variant"="_ZGVsMxv_bar(scalable_vec_masked_bar)" }
diff --git a/llvm/test/Transforms/LoopVectorize/struct-return.ll b/llvm/test/Transforms/LoopVectorize/struct-return.ll
new file mode 100644
index 00000000000000..1ac0c1670b9dc3
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/struct-return.ll
@@ -0,0 +1,268 @@
+; RUN: opt < %s -passes=loop-vectorize,dce,instcombine -force-vector-width=2 -force-vector-interleave=1 -S | FileCheck %s
+; RUN: opt < %s -passes=loop-vectorize,dce,instcombine -force-vector-width=2 -force-vector-interleave=1 -pass-remarks-analysis=loop-vectorize -disable-output -S 2>&1 | FileCheck %s --check-prefix=CHECK-REMARKS
+
+target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
+
+; Tests basic vectorization of homogeneous struct literal returns.
+
+; TODO: Support vectorization in this case.
+; CHECK-REMARKS: remark: {{.*}} loop not vectorized: Auto-vectorization of calls that return struct types is not yet supported
+define void @struct_return_f32_widen(ptr noalias %in, ptr noalias writeonly %out_a, ptr noalias writeonly %out_b) {
+; CHECK-LABEL: define void @struct_return_f32_widen
+; CHECK-NOT: vector.body:
+entry:
+  br label %for.body
+
+for.body:
+  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
+  %arrayidx = getelementptr inbounds float, ptr %in, i64 %iv
+  %in_val = load float, ptr %arrayidx, align 4
+  %call = tail call { float, float } @foo(float %in_val) #0
+  %extract_a = extractvalue { float, float } %call, 0
+  %extract_b = extractvalue { float, float } %call, 1
+  %arrayidx2 = getelementptr inbounds float, ptr %out_a, i64 %iv
+  store float %extract_a, ptr %arrayidx2, align 4
+  %arrayidx4 = getelementptr inbounds float, ptr %out_b, i64 %iv
+  store float %extract_b, ptr %arrayidx4, align 4
+  %iv.next = add nuw nsw i64 %iv, 1
+  %exitcond.not = icmp eq i64 %iv.next, 1024
+  br i1 %exitcond.not, label %exit, label %for.body
+
+exit:
+  ret void
+}
+
+; TODO: Support vectorization in this case.
+; CHECK-REMARKS: remark: {{.*}} loop not vectorized: Auto-vectorization of calls that return struct types is not yet supported
+define void @struct_return_f64_widen(ptr noalias %in, ptr noalias writeonly %out_a, ptr noalias writeonly %out_b) {
+; CHECK-LABEL: define void @struct_return_f64_widen
+; CHECK-NOT: vector.body:
+entry:
+  br label %for.body
+
+for.body:
+  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
+  %arrayidx = getelementptr inbounds double, ptr %in, i64 %iv
+  %in_val = load double, ptr %arrayidx, align 8
+  %call = tail call { double, double } @bar(double %in_val) #1
+  %extract_a = extractvalue { double, double } %call, 0
+  %extract_b = extractvalue { double, double } %call, 1
+  %arrayidx2 = getelementptr inbounds double, ptr %out_a, i64 %iv
+  store double %extract_a, ptr %arrayidx2, align 8
+  %arrayidx4 = getelementptr inbounds double, ptr %out_b, i64 %iv
+  store double %extract_b, ptr %arrayidx4, align 8
+  %iv.next = add nuw nsw i64 %iv, 1
+  %exitcond.not = icmp eq i64 %iv.next, 1024
+  br i1 %exitcond.not, label %exit, label %for.body
+
+exit:
+  ret void
+}
+
+; TODO: Support vectorization in this case.
+; CHECK-REMARKS: remark: {{.*}} loop not vectorized: Auto-vectorization of calls that return struct types is not yet supported
+define void @struct_return_f32_replicate(ptr noalias %in, ptr noalias writeonly %out_a, ptr noalias writeonly %out_b) {
+; CHECK-LABEL: define void @struct_return_f32_replicate
+; CHECK-NOT: vector.body:
+entry:
+  br label %for.body
+
+for.body:
+  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
+  %arrayidx = getelementptr inbounds float, ptr %in, i64 %iv
+  %in_val = load float, ptr %arrayidx, align 4
+  ; #3 does not have a fixed-size vector mapping (so replication is used)
+  %call = tail call { float, float } @foo(float %in_val) #3
+  %extract_a = extractvalue { float, float } %call, 0
+  %extract_b = extractvalue { float, float } %call, 1
+  %arrayidx2 = getelementptr inbounds float, ptr %out_a, i64 %iv
+  store float %extract_a, ptr %arrayidx2, align 4
+  %arrayidx4 = getelementptr inbounds float, ptr %out_b, i64 %iv
+  store float %extract_b, ptr %arrayidx4, align 4
+  %iv.next = add nuw nsw i64 %iv, 1
+  %exitcond.not = icmp eq i64 %iv.next, 1024
+  br i1 %exitcond.not, label %exit, label %for.body
+
+exit:
+  ret void
+}
+
+; TODO: Support vectorization in this case.
+; CHECK-REMARKS: remark: {{.*}} loop not vectorized: Auto-vectorization of calls that return struct types is not yet supported
+define void @struct_return_f32_widen_rt_checks(ptr %in, ptr writeonly %out_a, ptr writeonly %out_b) {
+; CHECK-LABEL: define void @struct_return_f32_widen_rt_checks
+; CHECK-NOT: vector.body:
+entry:
+  br label %for.body
+
+for.body:
+  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
+  %arrayidx = getelementptr inbounds float, ptr %in, i64 %iv
+  %in_val = load float, ptr %arrayidx, align 4
+  %call = tail call { float, float } @foo(float %in_val) #0
+  %extract_a = extractvalue { float, float } %call, 0
+  %extract_b = extractvalue { float, float } %call, 1
+  %arrayidx2 = getelementptr inbounds float, ptr %out_a, i64 %iv
+  store float %extract_a, ptr %arrayidx2, align 4
+  %arrayidx4 = getelementptr inbounds float, ptr %out_b, i64 %iv
+  store float %extract_b, ptr %arrayidx4, align 4
+  %iv.next = add nuw nsw i64 %iv, 1
+  %exitcond.not = icmp eq i64 %iv.next, 1024
+  br i1 %exitcond.not, label %exit, label %for.body
+
+exit:
+  ret void
+}
+
+; Negative test. Widening structs with mixed element types is not supported.
+; CHECK-REMARKS-COUNT: remark: {{.*}} loop not vectorized: instruction return type cannot be vectorized
+define void @negative_mixed_element_type_struct_return(ptr noalias %in, ptr noalias writeonly %out_a, ptr noalias writeonly %out_b) {
+; CHECK-LABEL: define void @negative_mixed_element_type_struct_return
+; CHECK-NOT: vector.body:
+; CHECK-NOT: call {{.*}} @fixed_vec_baz
+entry:
+  br label %for.body
+
+for.body:
+  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
+  %arrayidx = getelementptr inbounds float, ptr %in, i64 %iv
+  %in_val = load float, ptr %arrayidx, align 4
+  %call = tail call { float, i32 } @baz(float %in_val) #2
+  %extract_a = extractvalue { float, i32 } %call, 0
+  %extract_b = extractvalue { float, i32 } %call, 1
+  %arrayidx2 = getelementptr inbounds float, ptr %out_a, i64 %iv
+  store float %extract_a, ptr %arrayidx2, align 4
+  %arrayidx4 = getelementptr inbounds i32, ptr %out_b, i64 %iv
+  store i32 %extract_b, ptr %arrayidx4, align 4
+  %iv.next = add nuw nsw i64 %iv, 1
+  %exitcond.not = icmp eq i64 %iv.next, 1024
+  br i1 %exitcond.not, label %exit, label %for.body
+
+exit:
+  ret void
+}
+
+%named_struct = type { double, double }
+
+; Negative test. Widening non-literal structs is not supported.
+; CHECK-REMARKS-COUNT: remark: {{.*}} loop not vectorized: instruction return type cannot be vectorized
+define void @test_named_struct_return(ptr noalias readonly %in, ptr noalias writeonly %out_a, ptr noalias writeonly %out_b) {
+; CHECK-LABEL: define void @test_named_struct_return
+; CHECK-NOT: vector.body:
+; CHECK-NOT: call {{.*}} @fixed_vec_bar
+entry:
+  br label %for.body
+
+for.body:
+  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
+  %arrayidx = getelementptr inbounds double, ptr %in, i64 %iv
+  %in_val = load double, ptr %arrayidx, align 8
+  %call = tail call %named_struct @bar_named(double %in_val) #4
+  %extract_a = extractvalue %named_struct %call, 0
+  %extract_b = extractvalue %named_struct %call, 1
+  %arrayidx2 = getelementptr inbounds double, ptr %out_a, i64 %iv
+  store double %extract_a, ptr %arrayidx2, align 8
+  %arrayidx4 = getelementptr inbounds double, ptr %out_b, i64 %iv
+  store double %extract_b, ptr %arrayidx4, align 8
+  %iv.next = add nuw nsw i64 %iv, 1
+  %exitcond.not = icmp eq i64 %iv.next, 1024
+  br i1 %exitcond.not, label %exit, label %for.body
+
+exit:
+  ret void
+}
+
+; TODO: Allow mixed-struct type vectorization and mark overflow intrinsics as trivially vectorizable.
+; CHECK-REMARKS: remark: {{.*}} loop not vectorized: call instruction cannot be vectorized
+define void @test_overflow_intrinsic(ptr noalias readonly %in, ptr noalias writeonly %out_a, ptr noalias writeonly %out_b) {
+; CHECK-LABEL: define void @test_overflow_intrinsic
+; CHECK-NOT: vector.body:
+; CHECK-NOT: @llvm.sadd.with.overflow.v{{.+}}i32
+entry:
+  br label %for.body
+
+for.body:
+  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
+  %arrayidx = getelementptr inbounds float, ptr %in, i64 %iv
+  %in_val = load i32, ptr %arrayidx, align 4
+  %call = tail call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %in_val, i32 %in_val)
+  %extract_ret = extractvalue { i32, i1 } %call, 0
+  %extract_overflow = extractvalue { i32, i1 } %call, 1
+  %zext_overflow = zext i1 %extract_overflow to i8
+  %arrayidx2 = getelementptr inbounds i32, ptr %out_a, i64 %iv
+  store i32 %extract_ret, ptr %arrayidx2, align 4
+  %arrayidx4 = getelementptr inbounds i8, ptr %out_b, i64 %iv
+  store i8 %zext_overflow, ptr %arrayidx4, align 4
+  %iv.next = add nuw nsw i64 %iv, 1
+  %exitcond.not = icmp eq i64 %iv.next, 1024
+  br i1 %exitcond.not, label %exit, label %for.body
+
+exit:
+  ret void
+}
+
+; Negative test. Widening struct loads is not supported.
+; CHECK-REMARKS: remark: {{.*}} loop not vectorized: instruction return type cannot be vectorized
+define void @negative_struct_load(ptr noalias %in, ptr noalias writeonly %out_a, ptr noalias writeonly %out_b) {
+; CHECK-LABEL: define void @negative_struct_load
+; CHECK-NOT: vector.body:
+entry:
+  br label %for.body
+
+for.body:
+  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
+  %arrayidx = getelementptr... [truncated]
```
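To make the legality rule in the patch concrete, here is a small standalone sketch (no LLVM dependency; all type and function names are illustrative stand-ins, not the real LLVM API) of the shape of the `canWidenCallReturnType` check: a struct return is only considered widenable if it is a non-packed literal struct whose elements all share a single vectorizable scalar type.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustrative stand-in for an LLVM struct return type: element type names
// plus the packed/literal flags the patch inspects.
struct StructInfo {
  bool IsPacked;
  bool IsLiteral;
  std::vector<std::string> ElementTypes; // e.g. {"float", "float"}
};

// Stand-in for VectorType::isValidElementType: accept a few scalar types.
static bool isValidElementType(const std::string &Ty) {
  return Ty == "float" || Ty == "double" || Ty == "i32" || Ty == "i64";
}

// Mirrors StructType::containsHomogeneousTypes: all elements identical.
static bool containsHomogeneousTypes(const StructInfo &S) {
  for (const std::string &Ty : S.ElementTypes)
    if (Ty != S.ElementTypes.front())
      return false;
  return true;
}

// Shape of the patch's rule for struct returns: only non-packed, literal,
// homogeneous structs with a vectorizable element type are widenable.
static bool canWidenStructReturn(const StructInfo &S) {
  if (S.IsPacked || !S.IsLiteral || S.ElementTypes.empty() ||
      !containsHomogeneousTypes(S))
    return false;
  return isValidElementType(S.ElementTypes.front());
}
```

Under this rule `{ float, float }` is accepted, while `{ float, i32 }` (mixed elements) and named (non-literal) struct types are rejected, matching the negative tests the patch adds.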
@MacDue MacDue force-pushed the loop_vec_struct_legality branch from cfff02f to d5b8cfc on December 18, 2024 at 10:52

MacDue commented Dec 18, 2024

This PR is now ready (since #119000 has landed).


@SamTebbs33 SamTebbs33 left a comment


Looks good and simple to me, with one remark.


MacDue commented Jan 6, 2025

Post-holidays ping :)

MacDue added 6 commits January 6, 2025 10:47
This is a split-off from llvm#109833 and only adds code relating to checking if a struct-returning call can be vectorized.

This initial patch only allows the case where all users of the struct return are `extractvalue` operations that can be widened.

```
%call = tail call { float, float } @foo(float %in_val) #0
%extract_a = extractvalue { float, float } %call, 0
%extract_b = extractvalue { float, float } %call, 1
```

Note: The tests require the VFABI changes from llvm#119000 to pass.
@MacDue MacDue force-pushed the loop_vec_struct_legality branch from f1b0fcc to b09f359 on January 6, 2025 at 11:05

@david-arm david-arm left a comment


LGTM! Thanks for making all the changes. Perhaps wait a day in case @fhahn has any comments?


MacDue commented Jan 7, 2025

Thanks for the reviews, I'll land this tomorrow if there's no further comments :)


@fhahn fhahn left a comment


Thanks for the latest updates and the additional tests! A few more final comments inline


@fhahn fhahn left a comment


LGTM, thanks

@MacDue MacDue merged commit f88ef1b into llvm:main Jan 9, 2025
8 checks passed
@MacDue MacDue deleted the loop_vec_struct_legality branch January 9, 2025 10:14
