
@FreddyLeaf
Contributor

Add a "denormal-fp-math-bf16" function attribute and honor it in the X86 backend, treating an absent attribute as "preserve-sign,preserve-sign".

@llvmbot
Member

llvmbot commented Apr 29, 2024

@llvm/pr-subscribers-llvm-ir

Author: Freddy Ye (FreddyLeaf)

Changes

Add a "denormal-fp-math-bf16" function attribute and honor it in the X86 backend, treating an absent attribute as "preserve-sign,preserve-sign".
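
The default handling described above can be sketched outside of LLVM as follows. This is a minimal illustration, not the patch's code: `Bf16ConvertFeatures`, `bf16UsesFtzDaz` and `usesNativeBf16Truncate` are hypothetical names, and the feature fields only stand in for the subtarget checks (`hasAVXNECONVERT()`, `hasBF16()`, `hasVLX()`) that the patch combines with the new flag.

```cpp
#include <optional>
#include <string_view>

// Hypothetical feature summary; field names are illustrative only.
struct Bf16ConvertFeatures {
  bool avxNeConvert = false; // stands in for hasAVXNECONVERT()
  bool avx512bf16 = false;   // stands in for hasBF16()
  bool avx512vl = false;     // stands in for hasVLX()
};

// An absent or empty "denormal-fp-math-bf16" value falls back to the
// default, "preserve-sign,preserve-sign"; any other value turns the
// FTZ/DAZ assumption off.
bool bf16UsesFtzDaz(std::optional<std::string_view> attrValue) {
  if (!attrValue || attrValue->empty())
    return true;
  return *attrValue == "preserve-sign,preserve-sign";
}

// The f32 -> bf16 truncation uses the hardware convert instruction only
// when the instruction is available AND the FTZ/DAZ assumption holds.
bool usesNativeBf16Truncate(const Bf16ConvertFeatures &features,
                            std::optional<std::string_view> attrValue) {
  const bool hasConvertInstr =
      (features.avx512bf16 && features.avx512vl) || features.avxNeConvert;
  return hasConvertInstr && bf16UsesFtzDaz(attrValue);
}
```

In the patch's `bfloat-ftz-daz.ll` test, the first case corresponds to the `FTZDAZ` checks (`vcvtneps2bf16`) and the second to the `NOFTZDAZ` checks (a call to `__truncsfbf2`).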


Full diff: https://github.com/llvm/llvm-project/pull/90425.diff

10 Files Affected:

  • (modified) llvm/docs/LangRef.rst (+11)
  • (modified) llvm/include/llvm/CodeGen/CommandFlags.h (+1)
  • (modified) llvm/lib/CodeGen/CommandFlags.cpp (+16)
  • (modified) llvm/lib/Target/X86/X86ISelLowering.cpp (+6-3)
  • (modified) llvm/lib/Target/X86/X86Subtarget.cpp (+3-1)
  • (modified) llvm/lib/Target/X86/X86Subtarget.h (+6-2)
  • (modified) llvm/lib/Target/X86/X86TargetMachine.cpp (+10-1)
  • (added) llvm/test/CodeGen/X86/bfloat-ftz-daz.ll (+78)
  • (added) llvm/test/Other/opt-override-denormal-fp-math-bf16.ll (+23)
  • (modified) llvm/test/Other/opt-override-denormal-fp-math-mixed.ll (+17-3)
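
Of the files listed above, the `CommandFlags.cpp` change forwards a `-denormal-fp-math-bf16` command-line option to functions that do not already carry the attribute. A rough sketch of that rule, assuming a plain string map in place of LLVM's attribute list (`AttrMap` and `applyBF16DenormalFlag` are hypothetical names, not LLVM APIs):

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for a function's string attributes.
using AttrMap = std::map<std::string, std::string>;

// A mode given on the command line is applied only to functions that lack
// the attribute, and the single mode is used for both halves of the
// "<mode>,<mode>" value.
void applyBF16DenormalFlag(AttrMap &attrs,
                           const std::optional<std::string> &flagMode) {
  const std::string key = "denormal-fp-math-bf16";
  if (!flagMode) // flag not passed: leave the function untouched
    return;
  if (attrs.count(key) != 0) // an explicit attribute wins over the flag
    return;
  attrs[key] = *flagMode + "," + *flagMode;
}
```

This matches what the new `opt-override-denormal-fp-math-bf16.ll` test checks: a function without the attribute picks up `"ieee,ieee"` under `-denormal-fp-math-bf16=ieee`, while a function already marked `"preserve-sign,ieee"` keeps its value.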
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index f169ab941c457b..4f37348f55a7cb 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -2408,6 +2408,17 @@ example:
     attempt is made to diagnose unsupported uses. Currently this
     attribute is respected by the AMDGPU and NVPTX backends.
 
+``"denormal-fp-math-bf16"``
+    Same as ``"denormal-fp-math"``, but only controls the behavior of
+    the Brain Float16 type (or vectors of Brain Float16). If both
+    are present, this overrides ``"denormal-fp-math"``. Not all targets
+    support separately setting the denormal mode per type, and no
+    attempt is made to diagnose unsupported uses. Currently this
+    attribute is respected by the X86 backend.
+
+    If this attribute is not specified, the default is
+    ``"preserve-sign,preserve-sign"``.
+
 ``"thunk"``
     This attribute indicates that the function will delegate to some other
     function with a tail call. The prototype of a thunk should not be used for
diff --git a/llvm/include/llvm/CodeGen/CommandFlags.h b/llvm/include/llvm/CodeGen/CommandFlags.h
index 244dabd38cf65b..58d5c810553fa5 100644
--- a/llvm/include/llvm/CodeGen/CommandFlags.h
+++ b/llvm/include/llvm/CodeGen/CommandFlags.h
@@ -71,6 +71,7 @@ bool getEnableNoTrappingFPMath();
 
 DenormalMode::DenormalModeKind getDenormalFPMath();
 DenormalMode::DenormalModeKind getDenormalFP32Math();
+DenormalMode::DenormalModeKind getDenormalBF16Math();
 
 bool getEnableHonorSignDependentRoundingFPMath();
 
diff --git a/llvm/lib/CodeGen/CommandFlags.cpp b/llvm/lib/CodeGen/CommandFlags.cpp
index 14ac4b2102c2fa..9005005cf050f8 100644
--- a/llvm/lib/CodeGen/CommandFlags.cpp
+++ b/llvm/lib/CodeGen/CommandFlags.cpp
@@ -73,6 +73,7 @@ CGOPT(bool, EnableNoTrappingFPMath)
 CGOPT(bool, EnableAIXExtendedAltivecABI)
 CGOPT(DenormalMode::DenormalModeKind, DenormalFPMath)
 CGOPT(DenormalMode::DenormalModeKind, DenormalFP32Math)
+CGOPT(DenormalMode::DenormalModeKind, DenormalBF16Math)
 CGOPT(bool, EnableHonorSignDependentRoundingFPMath)
 CGOPT(FloatABI::ABIType, FloatABIForCalls)
 CGOPT(FPOpFusion::FPOpFusionMode, FuseFPOps)
@@ -277,6 +278,13 @@ codegen::RegisterCodeGenFlags::RegisterCodeGenFlags() {
       DenormFlagEnumOptions);
   CGBINDOPT(DenormalFP32Math);
 
+  static cl::opt<DenormalMode::DenormalModeKind> DenormalBF16Math(
+      "denormal-fp-math-bf16",
+      cl::desc("Select which denormal numbers the code is permitted to require "
+               "for bfloat"),
+      cl::init(DenormalMode::PreserveSign), DenormFlagEnumOptions);
+  CGBINDOPT(DenormalBF16Math);
+
   static cl::opt<bool> EnableHonorSignDependentRoundingFPMath(
       "enable-sign-dependent-rounding-fp-math", cl::Hidden,
       cl::desc("Force codegen to assume rounding mode can change dynamically"),
@@ -719,6 +727,14 @@ void codegen::setFunctionAttributes(StringRef CPU, StringRef Features,
                           DenormalMode(DenormKind, DenormKind).str());
   }
 
+  if (DenormalBF16MathView->getNumOccurrences() > 0 &&
+      !F.hasFnAttribute("denormal-fp-math-bf16")) {
+    // FIXME: Command line flag should expose separate input/output modes.
+    DenormalMode::DenormalModeKind DenormKind = getDenormalBF16Math();
+    NewAttrs.addAttribute("denormal-fp-math-bf16",
+                          DenormalMode(DenormKind, DenormKind).str());
+  }
+
   if (TrapFuncNameView->getNumOccurrences() > 0)
     for (auto &B : F)
       for (auto &I : B)
diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index f0cec6224e84e4..8877493b7c7add 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -2283,7 +2283,7 @@ X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM,
     }
   }
 
-  if (!Subtarget.useSoftFloat() &&
+  if (!Subtarget.useSoftFloat() && Subtarget.getDenormalMathFTZDAZBF16() &&
       (Subtarget.hasAVXNECONVERT() || Subtarget.hasBF16())) {
     addRegisterClass(MVT::v8bf16, Subtarget.hasAVX512() ? &X86::VR128XRegClass
                                                         : &X86::VR128RegClass);
@@ -8740,6 +8740,7 @@ X86TargetLowering::LowerBUILD_VECTOR(SDValue Op, SelectionDAG &DAG) const {
     return LowerBUILD_VECTORvXi1(Op, dl, DAG, Subtarget);
 
   if (VT.getVectorElementType() == MVT::bf16 &&
+      Subtarget.getDenormalMathFTZDAZBF16() &&
       (Subtarget.hasAVXNECONVERT() || Subtarget.hasBF16()))
     return LowerBUILD_VECTORvXbf16(Op, DAG, Subtarget);
 
@@ -21536,6 +21537,7 @@ SDValue X86TargetLowering::LowerFP_ROUND(SDValue Op, SelectionDAG &DAG) const {
 
   if (VT.getScalarType() == MVT::bf16) {
     if (SVT.getScalarType() == MVT::f32 &&
+        Subtarget.getDenormalMathFTZDAZBF16() &&
         ((Subtarget.hasBF16() && Subtarget.hasVLX()) ||
          Subtarget.hasAVXNECONVERT()))
       return Op;
@@ -21644,8 +21646,9 @@ SDValue X86TargetLowering::LowerFP_TO_BF16(SDValue Op,
   SDLoc DL(Op);
 
   MVT SVT = Op.getOperand(0).getSimpleValueType();
-  if (SVT == MVT::f32 && ((Subtarget.hasBF16() && Subtarget.hasVLX()) ||
-                          Subtarget.hasAVXNECONVERT())) {
+  if (SVT == MVT::f32 && Subtarget.getDenormalMathFTZDAZBF16() &&
+      ((Subtarget.hasBF16() && Subtarget.hasVLX()) ||
+       Subtarget.hasAVXNECONVERT())) {
     SDValue Res;
     Res = DAG.getNode(ISD::SCALAR_TO_VECTOR, DL, MVT::v4f32, Op.getOperand(0));
     Res = DAG.getNode(X86ISD::CVTNEPS2BF16, DL, MVT::v8bf16, Res);
diff --git a/llvm/lib/Target/X86/X86Subtarget.cpp b/llvm/lib/Target/X86/X86Subtarget.cpp
index c2e6ddd7e7fa2c..150236332ac20d 100644
--- a/llvm/lib/Target/X86/X86Subtarget.cpp
+++ b/llvm/lib/Target/X86/X86Subtarget.cpp
@@ -324,12 +324,14 @@ X86Subtarget::X86Subtarget(const Triple &TT, StringRef CPU, StringRef TuneCPU,
                            StringRef FS, const X86TargetMachine &TM,
                            MaybeAlign StackAlignOverride,
                            unsigned PreferVectorWidthOverride,
-                           unsigned RequiredVectorWidth)
+                           unsigned RequiredVectorWidth,
+                           bool DenormalMathFTZDAZBF16)
     : X86GenSubtargetInfo(TT, CPU, TuneCPU, FS),
       PICStyle(PICStyles::Style::None), TM(TM), TargetTriple(TT),
       StackAlignOverride(StackAlignOverride),
       PreferVectorWidthOverride(PreferVectorWidthOverride),
       RequiredVectorWidth(RequiredVectorWidth),
+      DenormalMathFTZDAZBF16(DenormalMathFTZDAZBF16),
       InstrInfo(initializeSubtargetDependencies(CPU, TuneCPU, FS)),
       TLInfo(TM, *this), FrameLowering(*this, getStackAlignment()) {
   // Determine the PICStyle based on the target selected.
diff --git a/llvm/lib/Target/X86/X86Subtarget.h b/llvm/lib/Target/X86/X86Subtarget.h
index a458b5f9ec8fbb..0b69dbf192e9f3 100644
--- a/llvm/lib/Target/X86/X86Subtarget.h
+++ b/llvm/lib/Target/X86/X86Subtarget.h
@@ -106,6 +106,9 @@ class X86Subtarget final : public X86GenSubtargetInfo {
   /// Required vector width from function attribute.
   unsigned RequiredVectorWidth;
 
+  /// Denormal math for bfloat from function attribute.
+  bool DenormalMathFTZDAZBF16 = false;
+
   X86SelectionDAGInfo TSInfo;
   // Ordering here is important. X86InstrInfo initializes X86RegisterInfo which
   // X86TargetLowering needs.
@@ -119,8 +122,8 @@ class X86Subtarget final : public X86GenSubtargetInfo {
   ///
   X86Subtarget(const Triple &TT, StringRef CPU, StringRef TuneCPU, StringRef FS,
                const X86TargetMachine &TM, MaybeAlign StackAlignOverride,
-               unsigned PreferVectorWidthOverride,
-               unsigned RequiredVectorWidth);
+               unsigned PreferVectorWidthOverride, unsigned RequiredVectorWidth,
+               bool DenormalMathFTZDAZBF16);
 
   const X86TargetLowering *getTargetLowering() const override {
     return &TLInfo;
@@ -238,6 +241,7 @@ class X86Subtarget final : public X86GenSubtargetInfo {
 
   unsigned getPreferVectorWidth() const { return PreferVectorWidth; }
   unsigned getRequiredVectorWidth() const { return RequiredVectorWidth; }
+  bool getDenormalMathFTZDAZBF16() const { return DenormalMathFTZDAZBF16; }
 
   // Helper functions to determine when we should allow widening to 512-bit
   // during codegen.
diff --git a/llvm/lib/Target/X86/X86TargetMachine.cpp b/llvm/lib/Target/X86/X86TargetMachine.cpp
index 86b456019c4e56..ecb67fc887e26b 100644
--- a/llvm/lib/Target/X86/X86TargetMachine.cpp
+++ b/llvm/lib/Target/X86/X86TargetMachine.cpp
@@ -304,6 +304,15 @@ X86TargetMachine::getSubtargetImpl(const Function &F) const {
     }
   }
 
+  // Extract denormal-fp-math-bf16 attribute.
+  bool DenormalMathFTZDAZBF16 = true;
+  Attribute DenormalBF16MathAttr = F.getFnAttribute("denormal-fp-math-bf16");
+  if (DenormalBF16MathAttr.isValid()) {
+    StringRef Val = DenormalBF16MathAttr.getValueAsString();
+    if (Val != "" && Val != "preserve-sign,preserve-sign")
+      DenormalMathFTZDAZBF16 = false;
+  }
+
   // Add CPU to the Key.
   Key += CPU;
 
@@ -339,7 +348,7 @@ X86TargetMachine::getSubtargetImpl(const Function &F) const {
     I = std::make_unique<X86Subtarget>(
         TargetTriple, CPU, TuneCPU, FS, *this,
         MaybeAlign(F.getParent()->getOverrideStackAlignment()),
-        PreferVectorWidthOverride, RequiredVectorWidth);
+        PreferVectorWidthOverride, RequiredVectorWidth, DenormalMathFTZDAZBF16);
   }
   return I.get();
 }
diff --git a/llvm/test/CodeGen/X86/bfloat-ftz-daz.ll b/llvm/test/CodeGen/X86/bfloat-ftz-daz.ll
new file mode 100644
index 00000000000000..66c3bf22a3f9f0
--- /dev/null
+++ b/llvm/test/CodeGen/X86/bfloat-ftz-daz.ll
@@ -0,0 +1,78 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -mtriple=x86_64-linux-gnu -mattr=avxneconvert | FileCheck %s --check-prefixes=FTZDAZ
+; RUN: llc < %s -mtriple=x86_64-linux-gnu -denormal-fp-math-bf16=ieee -mattr=avxneconvert | FileCheck %s --check-prefixes=NOFTZDAZ
+; RUN: llc < %s -mtriple=x86_64-linux-gnu -denormal-fp-math-bf16=preserve-sign -mattr=avxneconvert | FileCheck %s --check-prefixes=FTZDAZ
+; RUN: llc < %s -mtriple=x86_64-linux-gnu -denormal-fp-math=ieee -mattr=avxneconvert | FileCheck %s --check-prefixes=FTZDAZ
+; RUN: llc < %s -mtriple=x86_64-linux-gnu -denormal-fp-math=ieee -denormal-fp-math-bf16=ieee -mattr=avxneconvert | FileCheck %s --check-prefixes=NOFTZDAZ
+
+define void @add_default_attr(ptr %pa, ptr %pb, ptr %pc) nounwind {
+; FTZDAZ-LABEL: add_default_attr:
+; FTZDAZ: # %bb.0:
+; FTZDAZ-NEXT: movzwl (%rsi), %eax
+; FTZDAZ-NEXT: shll $16, %eax
+; FTZDAZ-NEXT: vmovd %eax, %xmm0
+; FTZDAZ-NEXT: movzwl (%rdi), %eax
+; FTZDAZ-NEXT: shll $16, %eax
+; FTZDAZ-NEXT: vmovd %eax, %xmm1
+; FTZDAZ-NEXT: vaddss %xmm0, %xmm1, %xmm0
+; FTZDAZ-NEXT: {vex} vcvtneps2bf16 %xmm0, %xmm0
+; FTZDAZ-NEXT: vpextrw $0, %xmm0, (%rdx)
+; FTZDAZ-NEXT: retq
+;
+; NOFTZDAZ-LABEL: add_default_attr:
+; NOFTZDAZ: # %bb.0:
+; NOFTZDAZ-NEXT: pushq %rbx
+; NOFTZDAZ-NEXT: movq %rdx, %rbx
+; NOFTZDAZ-NEXT: movzwl (%rsi), %eax
+; NOFTZDAZ-NEXT: shll $16, %eax
+; NOFTZDAZ-NEXT: vmovd %eax, %xmm0
+; NOFTZDAZ-NEXT: movzwl (%rdi), %eax
+; NOFTZDAZ-NEXT: shll $16, %eax
+; NOFTZDAZ-NEXT: vmovd %eax, %xmm1
+; NOFTZDAZ-NEXT: vaddss %xmm0, %xmm1, %xmm0
+; NOFTZDAZ-NEXT: callq __truncsfbf2@PLT
+; NOFTZDAZ-NEXT: vpextrw $0, %xmm0, (%rbx)
+; NOFTZDAZ-NEXT: popq %rbx
+; NOFTZDAZ-NEXT: retq
+  %a = load bfloat, ptr %pa
+  %b = load bfloat, ptr %pb
+  %add = fadd bfloat %a, %b
+  store bfloat %add, ptr %pc
+  ret void
+}
+
+define void @add_no_ftz_daz_attr(ptr %pa, ptr %pb, ptr %pc) nounwind "denormal-fp-math-bf16"="ieee,ieee" {
+; FTZDAZ-LABEL: add_no_ftz_daz_attr:
+; FTZDAZ: # %bb.0:
+; FTZDAZ-NEXT: movzwl (%rsi), %eax
+; FTZDAZ-NEXT: shll $16, %eax
+; FTZDAZ-NEXT: vmovd %eax, %xmm0
+; FTZDAZ-NEXT: movzwl (%rdi), %eax
+; FTZDAZ-NEXT: shll $16, %eax
+; FTZDAZ-NEXT: vmovd %eax, %xmm1
+; FTZDAZ-NEXT: vaddss %xmm0, %xmm1, %xmm0
+; FTZDAZ-NEXT: {vex} vcvtneps2bf16 %xmm0, %xmm0
+; FTZDAZ-NEXT: vpextrw $0, %xmm0, (%rdx)
+; FTZDAZ-NEXT: retq
+;
+; NOFTZDAZ-LABEL: add_no_ftz_daz_attr:
+; NOFTZDAZ: # %bb.0:
+; NOFTZDAZ-NEXT: pushq %rbx
+; NOFTZDAZ-NEXT: movq %rdx, %rbx
+; NOFTZDAZ-NEXT: movzwl (%rsi), %eax
+; NOFTZDAZ-NEXT: shll $16, %eax
+; NOFTZDAZ-NEXT: vmovd %eax, %xmm0
+; NOFTZDAZ-NEXT: movzwl (%rdi), %eax
+; NOFTZDAZ-NEXT: shll $16, %eax
+; NOFTZDAZ-NEXT: vmovd %eax, %xmm1
+; NOFTZDAZ-NEXT: vaddss %xmm0, %xmm1, %xmm0
+; NOFTZDAZ-NEXT: callq __truncsfbf2@PLT
+; NOFTZDAZ-NEXT: vpextrw $0, %xmm0, (%rbx)
+; NOFTZDAZ-NEXT: popq %rbx
+; NOFTZDAZ-NEXT: retq
+  %a = load bfloat, ptr %pa
+  %b = load bfloat, ptr %pb
+  %add = fadd bfloat %a, %b
+  store bfloat %add, ptr %pc
+  ret void
+}
diff --git a/llvm/test/Other/opt-override-denormal-fp-math-bf16.ll b/llvm/test/Other/opt-override-denormal-fp-math-bf16.ll
new file mode 100644
index 00000000000000..0524d9354cf14d
--- /dev/null
+++ b/llvm/test/Other/opt-override-denormal-fp-math-bf16.ll
@@ -0,0 +1,23 @@
+; RUN: opt -S -denormal-fp-math-bf16=ieee %s | FileCheck -check-prefixes=IEEE,ALL %s
+; RUN: opt -S -denormal-fp-math-bf16=preserve-sign %s | FileCheck -check-prefixes=PRESERVESIGN,ALL %s
+; RUN: opt -S -denormal-fp-math-bf16=positive-zero %s | FileCheck -check-prefixes=POSITIVEZERO,ALL %s
+
+; ALL: @no_denormal_fp_math_f32_attr() [[NOATTR:#[0-9]+]] {
+define i32 @no_denormal_fp_math_f32_attr() #0 {
+entry:
+  ret i32 0
+}
+
+; ALL: denormal_fp_math_attr_preserve_sign_ieee() [[ATTR:#[0-9]+]] {
+define i32 @denormal_fp_math_attr_preserve_sign_ieee() #1 {
+entry:
+  ret i32 0
+}
+
+; ALL-DAG: attributes [[ATTR]] = { nounwind "denormal-fp-math-bf16"="preserve-sign,ieee" }
+; IEEE-DAG: attributes [[NOATTR]] = { nounwind "denormal-fp-math-bf16"="ieee,ieee" }
+; PRESERVESIGN-DAG: attributes [[NOATTR]] = { nounwind "denormal-fp-math-bf16"="preserve-sign,preserve-sign" }
+; POSITIVEZERO-DAG: attributes [[NOATTR]] = { nounwind "denormal-fp-math-bf16"="positive-zero,positive-zero" }
+
+attributes #0 = { nounwind }
+attributes #1 = { nounwind "denormal-fp-math-bf16"="preserve-sign,ieee" }
diff --git a/llvm/test/Other/opt-override-denormal-fp-math-mixed.ll b/llvm/test/Other/opt-override-denormal-fp-math-mixed.ll
index 306fc78a2183a2..e14320007b5895 100644
--- a/llvm/test/Other/opt-override-denormal-fp-math-mixed.ll
+++ b/llvm/test/Other/opt-override-denormal-fp-math-mixed.ll
@@ -6,11 +6,17 @@
 ; RUN: opt -S -denormal-fp-math-f32=preserve-sign %s | FileCheck -check-prefixes=PRESERVESIGNF32,ALL %s
 ; RUN: opt -S -denormal-fp-math-f32=positive-zero %s | FileCheck -check-prefixes=POSITIVEZEROF32,ALL %s
 
+; RUN: opt -S -denormal-fp-math-bf16=ieee %s | FileCheck -check-prefixes=IEEEBF16,ALL %s
+; RUN: opt -S -denormal-fp-math-bf16=preserve-sign %s | FileCheck -check-prefixes=PRESERVESIGNBF16,ALL %s
+; RUN: opt -S -denormal-fp-math-bf16=positive-zero %s | FileCheck -check-prefixes=POSITIVEZEROBF16,ALL %s
+
 ; RUN: opt -S -denormal-fp-math=ieee -denormal-fp-math-f32=ieee %s | FileCheck -check-prefixes=IEEE-BOTH,ALL %s
 ; RUN: opt -S -denormal-fp-math=preserve-sign -denormal-fp-math-f32=preserve-sign %s | FileCheck -check-prefixes=PRESERVESIGN-BOTH,ALL %s
 ; RUN: opt -S -denormal-fp-math=positive-zero -denormal-fp-math-f32=positive-zero %s | FileCheck -check-prefixes=POSITIVEZERO-BOTH,ALL %s
 
-
+; RUN: opt -S -denormal-fp-math=ieee -denormal-fp-math-bf16=ieee %s | FileCheck -check-prefixes=IEEE-BOTH2,ALL %s
+; RUN: opt -S -denormal-fp-math=preserve-sign -denormal-fp-math-bf16=preserve-sign %s | FileCheck -check-prefixes=PRESERVESIGN-BOTH2,ALL %s
+; RUN: opt -S -denormal-fp-math=positive-zero -denormal-fp-math-bf16=positive-zero %s | FileCheck -check-prefixes=POSITIVEZERO-BOTH2,ALL %s
 
 ; ALL: @no_denormal_fp_math_attrs() [[NOATTR:#[0-9]+]] {
 define i32 @no_denormal_fp_math_attrs() #0 {
@@ -24,7 +30,7 @@ entry:
   ret i32 0
 }
 
-; ALL-DAG: attributes [[ATTR]] = { nounwind "denormal-fp-math"="preserve-sign,ieee" "denormal-fp-math-f32"="preserve-sign,ieee" }
+; ALL-DAG: attributes [[ATTR]] = { nounwind "denormal-fp-math"="preserve-sign,ieee" "denormal-fp-math-bf16"="preserve-sign,ieee" "denormal-fp-math-f32"="preserve-sign,ieee" }
 
 ; IEEE-DAG: attributes [[NOATTR]] = { nounwind "denormal-fp-math"="ieee,ieee" }
 ; PRESERVESIGN-DAG: attributes [[NOATTR]] = { nounwind "denormal-fp-math"="preserve-sign,preserve-sign" }
@@ -34,9 +40,17 @@ entry:
 ; PRESERVESIGNF32-DAG: attributes [[NOATTR]] = { nounwind "denormal-fp-math-f32"="preserve-sign,preserve-sign" }
 ; POSITIVEZEROF32-DAG: attributes [[NOATTR]] = { nounwind "denormal-fp-math-f32"="positive-zero,positive-zero" }
 
+; IEEEBF16-DAG: attributes [[NOATTR]] = { nounwind "denormal-fp-math-bf16"="ieee,ieee" }
+; PRESERVESIGNBF16-DAG: attributes [[NOATTR]] = { nounwind "denormal-fp-math-bf16"="preserve-sign,preserve-sign" }
+; POSITIVEZEROBF16-DAG: attributes [[NOATTR]] = { nounwind "denormal-fp-math-bf16"="positive-zero,positive-zero" }
+
 ; IEEE-BOTH-DAG: attributes [[NOATTR]] = { nounwind "denormal-fp-math"="ieee,ieee" "denormal-fp-math-f32"="ieee,ieee" }
 ; PRESERVESIGN-BOTH-DAG: attributes [[NOATTR]] = { nounwind "denormal-fp-math"="preserve-sign,preserve-sign" "denormal-fp-math-f32"="preserve-sign,preserve-sign" }
 ; POSITIVEZERO-BOTH-DAG: attributes [[NOATTR]] = { nounwind "denormal-fp-math"="positive-zero,positive-zero" "denormal-fp-math-f32"="positive-zero,positive-zero" }
 
+; IEEE-BOTH2-DAG: attributes [[NOATTR]] = { nounwind "denormal-fp-math"="ieee,ieee" "denormal-fp-math-bf16"="ieee,ieee" }
+; PRESERVESIGN-BOTH2-DAG: attributes [[NOATTR]] = { nounwind "denormal-fp-math"="preserve-sign,preserve-sign" "denormal-fp-math-bf16"="preserve-sign,preserve-sign" }
+; POSITIVEZERO-BOTH2-DAG: attributes [[NOATTR]] = { nounwind "denormal-fp-math"="positive-zero,positive-zero" "denormal-fp-math-bf16"="positive-zero,positive-zero" }
+
 attributes #0 = { nounwind }
-attributes #1 = { nounwind "denormal-fp-math"="preserve-sign,ieee" "denormal-fp-math-f32"="preserve-sign,ieee" }
+attributes #1 = { nounwind "denormal-fp-math"="preserve-sign,ieee" "denormal-fp-math-bf16"="preserve-sign,ieee" "denormal-fp-math-f32"="preserve-sign,ieee" }
@llvmbot
Member

llvmbot commented Apr 29, 2024

@llvm/pr-subscribers-backend-x86

Contributor

@arsenm arsenm left a comment

Does X86 actually have a separately controllable bfloat denormal mode? This does not control the mode. It only informs the code generator that this is the mode. Without an underlying mode control bit, this is not correct/useful

@FreddyLeaf
Contributor Author

> Does X86 actually have a separately controllable bfloat denormal mode? This does not control the mode. It only informs the code generator that this is the mode. Without an underlying mode control bit, this is not correct/useful

You are right: X86 hardware currently has a denormal-mode control bit only for f32/f64 (in MXCSR, IIRC). But even for the -denormal-fp-math attribute, I didn't find any backend handling that differs by attribute value. The motivation of this patch is to respect -denormal-fp-math in the x86 backend, since all x86 bf16 instructions have only one denormal mode, which is flush to zero both for input and output. They can't honor ieee or dynamic. If the attribute is set to other than "preserve-sign", I was considering to handle it by something like expand bf16 to f32. Anyway, I'm OK with continuing to ignore -denormal-fp-math here. I'm also interested in whether other targets have this issue with bf16 and future fp8 variants.
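
For readers less familiar with the attribute values mentioned here ("ieee", "preserve-sign", "dynamic"): per LangRef, the value is a comma-separated pair where the first entry describes flushing of results and the second describes handling of denormal inputs. The following is a minimal standalone sketch of that convention, assuming a single value applies to both. `DenormKind`, `DenormPair`, `parse_kind`, and `parse_denormal_attr` are hypothetical names for illustration and are not LLVM's API.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical stand-ins for illustration; not LLVM's DenormalMode.
enum class DenormKind { IEEE, PreserveSign, PositiveZero, Dynamic };

struct DenormPair {
  DenormKind output;  // how denormal results are treated
  DenormKind input;   // how denormal operands are treated
};

// Map one component of the attribute string to a kind.
inline std::optional<DenormKind> parse_kind(std::string_view text) {
  if (text == "ieee") return DenormKind::IEEE;
  if (text == "preserve-sign") return DenormKind::PreserveSign;
  if (text == "positive-zero") return DenormKind::PositiveZero;
  if (text == "dynamic") return DenormKind::Dynamic;
  return std::nullopt;
}

// Parse "output,input"; a lone "output" is applied to both components.
inline std::optional<DenormPair> parse_denormal_attr(std::string_view value) {
  const std::size_t comma = value.find(',');
  const auto output = parse_kind(value.substr(0, comma));
  if (!output) return std::nullopt;
  if (comma == std::string_view::npos) return DenormPair{*output, *output};
  const auto input = parse_kind(value.substr(comma + 1));
  if (!input) return std::nullopt;
  return DenormPair{*output, *input};
}
```

With this sketch, `"preserve-sign,ieee"` yields output `PreserveSign` and input `IEEE`, which mirrors the pair used in the `opt-override-denormal-fp-math-bf16.ll` test in this patch.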

@arsenm arsenm requested a review from jcranmer-intel May 7, 2024 21:18
@arsenm arsenm added the floating-point Floating-point math label May 7, 2024
@arsenm
Contributor

arsenm commented May 7, 2024

> since all x86 bf16 instructions have only one denormal mode, which is flush to zero both for input and output.

So all the instructions are just defective?

> If the attribute is set to other than "preserve-sign", I was considering to handle it by something like expand bf16 to f32.

Sigh. I guess this would be the correct way to handle it
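
To make the difference between the two behaviors concrete, here is a minimal sketch of a float to bf16 conversion with round-to-nearest-even, once keeping denormals and once flushing them to a sign-preserving zero. The function names are hypothetical, and this is neither the X86 lowering nor the `__truncsfbf2` library routine; it only illustrates the "ieee" versus "preserve-sign" behaviors under discussion.

```cpp
#include <cstdint>
#include <cstring>

// Round a float to bfloat16 bits with round-to-nearest-even, keeping
// denormal values (IEEE-style behavior).
inline std::uint16_t float_to_bf16_ieee(float value) {
  std::uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  if ((bits & 0x7F800000u) == 0x7F800000u && (bits & 0x007FFFFFu) != 0) {
    // NaN: truncate and set the quiet bit so the result stays a NaN.
    return static_cast<std::uint16_t>((bits >> 16) | 0x0040u);
  }
  // Adding 0x7FFF plus the lowest kept bit rounds ties to even.
  const std::uint32_t rounding_bias = 0x7FFFu + ((bits >> 16) & 1u);
  return static_cast<std::uint16_t>((bits + rounding_bias) >> 16);
}

// Same conversion, but denormal inputs become a zero with the same sign.
// bf16 shares float's 8-bit exponent, so a normal float never rounds to a
// bf16 denormal; flushing the input is therefore enough for this conversion.
inline std::uint16_t float_to_bf16_flush(float value) {
  std::uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  if ((bits & 0x7F800000u) == 0) {
    // Exponent field is zero: the input is a zero or a denormal.
    return static_cast<std::uint16_t>((bits & 0x80000000u) >> 16);
  }
  return float_to_bf16_ieee(value);
}
```

For a denormal input such as `1e-39f`, the first function returns a nonzero bf16 denormal while the second returns `0x0000` (or `0x8000` for the negated input). Normal values convert identically in both.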

@FreddyLeaf
Contributor Author

FreddyLeaf commented May 8, 2024

> So all the instructions are just defective?

I'm afraid so. BF16 instructions belong to two CPUIDs of AVX512_BF16 and AVX_NE_CONVERT. E.g.

The first one explicitly mentions it in the SDM, but the second one (in the ISE) doesn't. E.g.:
[image]

But since both are documented as not consulting or updating MXCSR, I suppose the AVX_NE_CONVERT instructions have the same denormal handling.

@arsenm
Contributor

arsenm commented May 9, 2024

> I'm afraid so. BF16 instructions belong to two CPUIDs of AVX512_BF16 and AVX_NE_CONVERT. E.g.

Is it all the operations, or just these specific dot products? Are there specific bf16<->float conversions that have the same issue? What about basic arithmetic operations?

@andykaylor
Contributor

> I'm afraid so. BF16 instructions belong to two CPUIDs of AVX512_BF16 and AVX_NE_CONVERT. E.g.

> Is it all the operations, or just these specific dot products? Are there specific bf16<->float conversions that have the same issue? What about basic arithmetic operations?

Is this really a case of "defective instructions" or is it just a difference between the way that Intel processors understand the bfloat16 type compared to other architectures? The Intel white paper on bfloat16 (https://www.intel.com/content/www/us/en/content-details/671279/bfloat16-hardware-numerics-definition.html) says, "There is no need to support denormals; FP32, and therefore also BF16, offer more than enough range for deep learning training tasks."

@FreddyLeaf
Contributor Author

FreddyLeaf commented May 10, 2024

> Is it all the operations, or just these specific dot products? Are there specific bf16<->float conversions that have the same issue? What about basic arithmetic operations?

x86 only supports bf16<->float conversions and these dot product operations. The float->bf16 instruction also mentions this in the SDM, but bf16->float doesn't. It is still documented as not consulting or updating MXCSR, so I suppose it won't flush to zero either, since that would require consulting MXCSR.

@phoebewang
Contributor

We don't have bf16->float conversion instructions. The current compiler implementation simply uses a 16-bit shift, so we don't explicitly do DAZ. The dot product instructions are documented as using both DAZ and FTZ in the calculation. Together with the float->bf16 instructions, we can say all native instructions always use DAZ and FTZ.

In contrast, the FP16 type never does DAZ/FTZ (https://cdrdv2-public.intel.com/678970/intel-avx512-fp16.pdf), though it's not controlled by MXCSR either.
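
The 16-bit shift mentioned above can be sketched as follows. This is an illustrative standalone helper (the name `bf16_bits_to_float` is hypothetical), not the code the backend emits; it shows why no DAZ happens on this path, since the bits are copied unchanged into the upper half of a float.

```cpp
#include <cstdint>
#include <cstring>

// Widen a bfloat16 bit pattern to float by placing it in the top 16 bits
// of a 32-bit word. Nothing is flushed: a bf16 denormal becomes the float
// denormal with the same value.
inline float bf16_bits_to_float(std::uint16_t bits) {
  const std::uint32_t wide = static_cast<std::uint32_t>(bits) << 16;
  float result;
  std::memcpy(&result, &wide, sizeof(result));
  return result;
}
```

For example, `0x3F80` widens to `1.0f`, and the smallest positive bf16 denormal `0x0001` widens to a nonzero float denormal rather than to zero.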

@FreddyLeaf
Contributor Author

> We don't have bf16->float conversion instructions.

Intel added them in AVX_NE_CONVERT, memory input only.

@phoebewang
Contributor

> We don't have bf16->float conversion instructions.

> Intel added them in AVX_NE_CONVERT, memory input only.

Ok, they use DAZ as well.

@arsenm
Contributor

arsenm commented May 20, 2024

> Is this really a case of "defective instructions" or is it just a difference between the way that Intel processors understand the bfloat16 type compared to other architectures?

As a format, it's just IEEE with a different combination of mantissa and exponent widths. Denormals have a specific, and clear meaning here and there's no implied flushing on computation

> The Intel white paper on bfloat16 (https://www.intel.com/content/www/us/en/content-details/671279/bfloat16-hardware-numerics-definition.html) says, "There is no need to support denormals; FP32, and therefore also BF16, offer more than enough range for deep learning training tasks."

I don't know how to parse this comment. Denormals, in what type? I almost read this as "you don't need to handle fp16 denormals if you process in bf16 instead". At worst it's a subjective value judgement that bad behavior is OK, but I'm not sure that's what it's really saying

@andykaylor
Contributor

> Is this really a case of "defective instructions" or is it just a difference between the way that Intel processors understand the bfloat16 type compared to other architectures?

> As a format, it's just IEEE with a different combination of mantissa and exponent widths. Denormals have a specific, and clear meaning here and there's no implied flushing on computation

I don't accept that. Denormals have a specific and clear meaning in the IEEE types, but once you say "This is like the IEEE type except..." you can no longer assume anything that isn't part of the specification for the new type. Is there an accepted standard specification for this type? I'm not sure how this would be adjudicated apart from reference to actual implementations defining an ad hoc standard.

Here's something that comes close to a definition for the type, though it isn't presented as a formal specification:

https://cloud.google.com/blog/products/ai-machine-learning/bfloat16-the-secret-to-high-performance-on-cloud-tpus

That document says, "To ensure identical behavior for underflows, overflows, and NaNs, bfloat16 has the same exponent size as FP32. However, bfloat16 handles denormals differently from FP32: it flushes them to zero."

> The Intel white paper on bfloat16 (https://www.intel.com/content/www/us/en/content-details/671279/bfloat16-hardware-numerics-definition.html) says, "There is no need to support denormals; FP32, and therefore also BF16, offer more than enough range for deep learning training tasks."

> I don't know how to parse this comment. Denormals, in what type? I almost read this as "you don't need to handle fp16 denormals if you process in bf16 instead". At worst it's a subjective value judgement that bad behavior is OK, but I'm not sure that's what it's really saying

I will admit that doesn't sound like a technical specification so much as a marketing pitch. I haven't spoken to any of the Intel hardware engineers responsible for the BF16 implementation in Intel processors, and any opinions I am expressing here do not represent Intel's official position. I'm just offering my interpretation, and my interpretation is that support for denormals is not required for bfloat16.

@arsenm
Contributor

arsenm commented Jun 7, 2024

> I don't accept that. Denormals have a specific and clear meaning in the IEEE types, but once you say "This is like the IEEE type except..." you can no longer assume anything that isn't part of the specification for the new type.

The IEEE formats are specified as M mantissa bits and E exponent bits. Just because the standard didn't prescribe this particular combination of bits as a suggested format doesn't mean it's some wild thing with no obligation to be consistent. Denormals are a set of values in the encoding, which bfloat certainly has. The choice for a computation to drop bits on the floor when a value would need to be encoded as a denormal is somewhat orthogonal to the format itself.
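
The point that denormals are simply a set of encodings can be illustrated with a small classifier parameterized on E exponent bits and M mantissa bits. This is a sketch under the usual IEEE-style layout (sign bit, then exponent, then mantissa); `FpClass`, `classify`, and `classify_bf16` are hypothetical names, and bfloat16 is taken as E = 8, M = 7.

```cpp
#include <cstdint>

enum class FpClass { Zero, Denormal, Normal, Infinity, NaN };

// Classify a bit pattern of a format with E exponent bits and M mantissa
// bits. The sign bit is ignored because it does not affect the class.
template <unsigned E, unsigned M>
constexpr FpClass classify(std::uint32_t bits) {
  const std::uint32_t all_ones = (1u << E) - 1u;
  const std::uint32_t mantissa = bits & ((1u << M) - 1u);
  const std::uint32_t exponent = (bits >> M) & all_ones;
  if (exponent == 0) {
    // Zero exponent field: zero if the mantissa is clear, else denormal.
    return mantissa == 0 ? FpClass::Zero : FpClass::Denormal;
  }
  if (exponent == all_ones) {
    return mantissa == 0 ? FpClass::Infinity : FpClass::NaN;
  }
  return FpClass::Normal;
}

// bfloat16: 1 sign bit, 8 exponent bits, 7 mantissa bits.
constexpr FpClass classify_bf16(std::uint16_t bits) {
  return classify<8, 7>(bits);
}
```

Under this layout, bf16 patterns `0x0001` through `0x007F` (and their negatives) are denormals, regardless of whether a given instruction chooses to flush them.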

