CodeGen: Add -denormal-fp-math-bf16 flag #90425
base: main
Respect default value as "preserve-sign,preserve-sign" for X86 backend.
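As a rough illustration of the behaviour this description refers to, the sketch below models the two decisions the patch makes: turning the `-denormal-fp-math-bf16` command-line flag into a function attribute string, and turning that attribute into the yes/no choice the X86 backend uses. It is a standalone simplification, not the LLVM code: `DenormKind`, `kindName`, `bf16AttrFromFlag`, and `bf16UsesFtzDaz` are hypothetical names, and only the three mode kinds exercised by the tests in this PR are modelled.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Simplified stand-in for llvm::DenormalMode::DenormalModeKind (assumption:
// only the three kinds used by the tests in this PR are modelled).
enum class DenormKind { IEEE, PreserveSign, PositiveZero };

// Hypothetical helper: spell a kind the way it appears in the attribute value.
std::string kindName(DenormKind K) {
  switch (K) {
  case DenormKind::IEEE:
    return "ieee";
  case DenormKind::PreserveSign:
    return "preserve-sign";
  case DenormKind::PositiveZero:
    return "positive-zero";
  }
  return "ieee"; // not reached for valid enumerators
}

// Step 1, modelled on the setFunctionAttributes hunk: the flag only adds an
// attribute when it was given on the command line and the function does not
// already carry one. The same kind is used for both output and input modes.
std::optional<std::string> bf16AttrFromFlag(bool FlagGiven, DenormKind Kind,
                                            bool FunctionHasAttr) {
  if (!FlagGiven || FunctionHasAttr)
    return std::nullopt;
  return kindName(Kind) + "," + kindName(Kind);
}

// Step 2, modelled on the getSubtargetImpl hunk: a missing or empty attribute,
// or exactly "preserve-sign,preserve-sign", keeps the flush-to-zero lowering;
// any other value turns it off.
bool bf16UsesFtzDaz(const std::optional<std::string> &AttrValue) {
  if (!AttrValue)
    return true;
  return AttrValue->empty() || *AttrValue == "preserve-sign,preserve-sign";
}
```

Under this model, a function with no attribute and no flag keeps the default lowering (the `vcvtneps2bf16` sequence in the `FTZDAZ` checks of the new test), while `-denormal-fp-math-bf16=ieee` yields `"ieee,ieee"` and selects the path the `NOFTZDAZ` checks expect, which calls `__truncsfbf2`.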
@llvm/pr-subscribers-llvm-ir

Author: Freddy Ye (FreddyLeaf)

Changes

Respect default value as "preserve-sign,preserve-sign" for X86 backend.

Full diff: https://github.com/llvm/llvm-project/pull/90425.diff

10 Files Affected:

diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index f169ab941c457b..4f37348f55a7cb 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -2408,6 +2408,17 @@ example:
     attempt is made to diagnose unsupported uses. Currently this
     attribute is respected by the AMDGPU and NVPTX backends.
 
+``"denormal-fp-math-bf16"``
+    Same as ``"denormal-fp-math"``, but only controls the behavior of
+    the Brain Float16 type (or vectors of Brain Float16). If both are
+    are present, this overrides ``"denormal-fp-math"``. Not all targets
+    support separately setting the denormal mode per type, and no
+    attempt is made to diagnose unsupported uses. Currently this
+    attribute is respected by the X86 backend.
+
+    If this is attribute is not specified, the default is
+    ``"preserve-sign,preserve-sign"``.
+
 ``"thunk"``
     This attribute indicates that the function will delegate to some other
     function with a tail call. The prototype of a thunk should not be used for
diff --git a/llvm/include/llvm/CodeGen/CommandFlags.h b/llvm/include/llvm/CodeGen/CommandFlags.h
index 244dabd38cf65b..58d5c810553fa5 100644
--- a/llvm/include/llvm/CodeGen/CommandFlags.h
+++ b/llvm/include/llvm/CodeGen/CommandFlags.h
@@ -71,6 +71,7 @@ bool getEnableNoTrappingFPMath();
 
 DenormalMode::DenormalModeKind getDenormalFPMath();
 DenormalMode::DenormalModeKind getDenormalFP32Math();
+DenormalMode::DenormalModeKind getDenormalBF16Math();
 
 bool getEnableHonorSignDependentRoundingFPMath();
 
diff --git a/llvm/lib/CodeGen/CommandFlags.cpp b/llvm/lib/CodeGen/CommandFlags.cpp
index 14ac4b2102c2fa..9005005cf050f8 100644
--- a/llvm/lib/CodeGen/CommandFlags.cpp
+++ b/llvm/lib/CodeGen/CommandFlags.cpp
@@ -73,6 +73,7 @@ CGOPT(bool, EnableNoTrappingFPMath)
 CGOPT(bool, EnableAIXExtendedAltivecABI)
 CGOPT(DenormalMode::DenormalModeKind, DenormalFPMath)
 CGOPT(DenormalMode::DenormalModeKind, DenormalFP32Math)
+CGOPT(DenormalMode::DenormalModeKind, DenormalBF16Math)
 CGOPT(bool, EnableHonorSignDependentRoundingFPMath)
 CGOPT(FloatABI::ABIType, FloatABIForCalls)
 CGOPT(FPOpFusion::FPOpFusionMode, FuseFPOps)
@@ -277,6 +278,13 @@ codegen::RegisterCodeGenFlags::RegisterCodeGenFlags() {
       DenormFlagEnumOptions);
   CGBINDOPT(DenormalFP32Math);
 
+  static cl::opt<DenormalMode::DenormalModeKind> DenormalBF16Math(
+      "denormal-fp-math-bf16",
+      cl::desc("Select which denormal numbers the code is permitted to require "
+               "for bfloat"),
+      cl::init(DenormalMode::PreserveSign), DenormFlagEnumOptions);
+  CGBINDOPT(DenormalBF16Math);
+
   static cl::opt<bool> EnableHonorSignDependentRoundingFPMath(
       "enable-sign-dependent-rounding-fp-math", cl::Hidden,
       cl::desc("Force codegen to assume rounding mode can change dynamically"),
@@ -719,6 +727,14 @@ void codegen::setFunctionAttributes(StringRef CPU, StringRef Features,
                          DenormalMode(DenormKind, DenormKind).str());
   }
 
+  if (DenormalBF16MathView->getNumOccurrences() > 0 &&
+      !F.hasFnAttribute("denormal-fp-math-bf16")) {
+    // FIXME: Command line flag should expose separate input/output modes.
+    DenormalMode::DenormalModeKind DenormKind = getDenormalBF16Math();
+    NewAttrs.addAttribute("denormal-fp-math-bf16",
+                          DenormalMode(DenormKind, DenormKind).str());
+  }
+
   if (TrapFuncNameView->getNumOccurrences() > 0)
     for (auto &B : F)
       for (auto &I : B)
diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index f0cec6224e84e4..8877493b7c7add 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -2283,7 +2283,7 @@ X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM,
     }
   }
 
-  if (!Subtarget.useSoftFloat() &&
+  if (!Subtarget.useSoftFloat() && Subtarget.getDenormalMathFTZDAZBF16() &&
       (Subtarget.hasAVXNECONVERT() || Subtarget.hasBF16())) {
     addRegisterClass(MVT::v8bf16, Subtarget.hasAVX512() ? &X86::VR128XRegClass
                                                         : &X86::VR128RegClass);
@@ -8740,6 +8740,7 @@ X86TargetLowering::LowerBUILD_VECTOR(SDValue Op, SelectionDAG &DAG) const {
     return LowerBUILD_VECTORvXi1(Op, dl, DAG, Subtarget);
 
   if (VT.getVectorElementType() == MVT::bf16 &&
+      Subtarget.getDenormalMathFTZDAZBF16() &&
       (Subtarget.hasAVXNECONVERT() || Subtarget.hasBF16()))
     return LowerBUILD_VECTORvXbf16(Op, DAG, Subtarget);
 
@@ -21536,6 +21537,7 @@ SDValue X86TargetLowering::LowerFP_ROUND(SDValue Op, SelectionDAG &DAG) const {
 
   if (VT.getScalarType() == MVT::bf16) {
     if (SVT.getScalarType() == MVT::f32 &&
+        Subtarget.getDenormalMathFTZDAZBF16() &&
         ((Subtarget.hasBF16() && Subtarget.hasVLX()) ||
          Subtarget.hasAVXNECONVERT()))
       return Op;
@@ -21644,8 +21646,9 @@ SDValue X86TargetLowering::LowerFP_TO_BF16(SDValue Op,
   SDLoc DL(Op);
 
   MVT SVT = Op.getOperand(0).getSimpleValueType();
-  if (SVT == MVT::f32 && ((Subtarget.hasBF16() && Subtarget.hasVLX()) ||
-                          Subtarget.hasAVXNECONVERT())) {
+  if (SVT == MVT::f32 && Subtarget.getDenormalMathFTZDAZBF16() &&
+      ((Subtarget.hasBF16() && Subtarget.hasVLX()) ||
+       Subtarget.hasAVXNECONVERT())) {
     SDValue Res;
     Res = DAG.getNode(ISD::SCALAR_TO_VECTOR, DL, MVT::v4f32, Op.getOperand(0));
     Res = DAG.getNode(X86ISD::CVTNEPS2BF16, DL, MVT::v8bf16, Res);
diff --git a/llvm/lib/Target/X86/X86Subtarget.cpp b/llvm/lib/Target/X86/X86Subtarget.cpp
index c2e6ddd7e7fa2c..150236332ac20d 100644
--- a/llvm/lib/Target/X86/X86Subtarget.cpp
+++ b/llvm/lib/Target/X86/X86Subtarget.cpp
@@ -324,12 +324,14 @@ X86Subtarget::X86Subtarget(const Triple &TT, StringRef CPU, StringRef TuneCPU,
                            StringRef FS, const X86TargetMachine &TM,
                            MaybeAlign StackAlignOverride,
                            unsigned PreferVectorWidthOverride,
-                           unsigned RequiredVectorWidth)
+                           unsigned RequiredVectorWidth,
+                           bool DenormalMathFTZDAZBF16)
     : X86GenSubtargetInfo(TT, CPU, TuneCPU, FS),
       PICStyle(PICStyles::Style::None), TM(TM), TargetTriple(TT),
       StackAlignOverride(StackAlignOverride),
       PreferVectorWidthOverride(PreferVectorWidthOverride),
       RequiredVectorWidth(RequiredVectorWidth),
+      DenormalMathFTZDAZBF16(DenormalMathFTZDAZBF16),
       InstrInfo(initializeSubtargetDependencies(CPU, TuneCPU, FS)),
       TLInfo(TM, *this), FrameLowering(*this, getStackAlignment()) {
   // Determine the PICStyle based on the target selected.
diff --git a/llvm/lib/Target/X86/X86Subtarget.h b/llvm/lib/Target/X86/X86Subtarget.h
index a458b5f9ec8fbb..0b69dbf192e9f3 100644
--- a/llvm/lib/Target/X86/X86Subtarget.h
+++ b/llvm/lib/Target/X86/X86Subtarget.h
@@ -106,6 +106,9 @@ class X86Subtarget final : public X86GenSubtargetInfo {
   /// Required vector width from function attribute.
   unsigned RequiredVectorWidth;
 
+  /// Denormal math for bfloat from function attribute.
+  bool DenormalMathFTZDAZBF16 = false;
+
   X86SelectionDAGInfo TSInfo;
   // Ordering here is important. X86InstrInfo initializes X86RegisterInfo which
   // X86TargetLowering needs.
@@ -119,8 +122,8 @@ class X86Subtarget final : public X86GenSubtargetInfo {
   ///
   X86Subtarget(const Triple &TT, StringRef CPU, StringRef TuneCPU, StringRef FS,
                const X86TargetMachine &TM, MaybeAlign StackAlignOverride,
-               unsigned PreferVectorWidthOverride,
-               unsigned RequiredVectorWidth);
+               unsigned PreferVectorWidthOverride, unsigned RequiredVectorWidth,
+               bool DenormalMathFTZDAZBF16);
 
   const X86TargetLowering *getTargetLowering() const override {
     return &TLInfo;
@@ -238,6 +241,7 @@ class X86Subtarget final : public X86GenSubtargetInfo {
 
   unsigned getPreferVectorWidth() const { return PreferVectorWidth; }
   unsigned getRequiredVectorWidth() const { return RequiredVectorWidth; }
+  bool getDenormalMathFTZDAZBF16() const { return DenormalMathFTZDAZBF16; }
 
   // Helper functions to determine when we should allow widening to 512-bit
   // during codegen.
diff --git a/llvm/lib/Target/X86/X86TargetMachine.cpp b/llvm/lib/Target/X86/X86TargetMachine.cpp
index 86b456019c4e56..ecb67fc887e26b 100644
--- a/llvm/lib/Target/X86/X86TargetMachine.cpp
+++ b/llvm/lib/Target/X86/X86TargetMachine.cpp
@@ -304,6 +304,15 @@ X86TargetMachine::getSubtargetImpl(const Function &F) const {
     }
   }
 
+  // Extract denormal-fp-math-bf16 attribute.
+  bool DenormalMathFTZDAZBF16 = true;
+  Attribute DenormalBF16MathAttr = F.getFnAttribute("denormal-fp-math-bf16");
+  if (DenormalBF16MathAttr.isValid()) {
+    StringRef Val = DenormalBF16MathAttr.getValueAsString();
+    if (Val != "" && Val != "preserve-sign,preserve-sign")
+      DenormalMathFTZDAZBF16 = false;
+  }
+
   // Add CPU to the Key.
   Key += CPU;
 
@@ -339,7 +348,7 @@ X86TargetMachine::getSubtargetImpl(const Function &F) const {
     I = std::make_unique<X86Subtarget>(
         TargetTriple, CPU, TuneCPU, FS, *this,
         MaybeAlign(F.getParent()->getOverrideStackAlignment()),
-        PreferVectorWidthOverride, RequiredVectorWidth);
+        PreferVectorWidthOverride, RequiredVectorWidth, DenormalMathFTZDAZBF16);
   }
   return I.get();
 }
diff --git a/llvm/test/CodeGen/X86/bfloat-ftz-daz.ll b/llvm/test/CodeGen/X86/bfloat-ftz-daz.ll
new file mode 100644
index 00000000000000..66c3bf22a3f9f0
--- /dev/null
+++ b/llvm/test/CodeGen/X86/bfloat-ftz-daz.ll
@@ -0,0 +1,78 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -mtriple=x86_64-linux-gnu -mattr=avxneconvert | FileCheck %s --check-prefixes=FTZDAZ
+; RUN: llc < %s -mtriple=x86_64-linux-gnu -denormal-fp-math-bf16=ieee -mattr=avxneconvert | FileCheck %s --check-prefixes=NOFTZDAZ
+; RUN: llc < %s -mtriple=x86_64-linux-gnu -denormal-fp-math-bf16=preserve-sign -mattr=avxneconvert | FileCheck %s --check-prefixes=FTZDAZ
+; RUN: llc < %s -mtriple=x86_64-linux-gnu -denormal-fp-math=ieee -mattr=avxneconvert | FileCheck %s --check-prefixes=FTZDAZ
+; RUN: llc < %s -mtriple=x86_64-linux-gnu -denormal-fp-math=ieee -denormal-fp-math-bf16=ieee -mattr=avxneconvert | FileCheck %s --check-prefixes=NOFTZDAZ
+
+define void @add_default_attr(ptr %pa, ptr %pb, ptr %pc) nounwind {
+; FTZDAZ-LABEL: add_default_attr:
+; FTZDAZ:       # %bb.0:
+; FTZDAZ-NEXT:    movzwl (%rsi), %eax
+; FTZDAZ-NEXT:    shll $16, %eax
+; FTZDAZ-NEXT:    vmovd %eax, %xmm0
+; FTZDAZ-NEXT:    movzwl (%rdi), %eax
+; FTZDAZ-NEXT:    shll $16, %eax
+; FTZDAZ-NEXT:    vmovd %eax, %xmm1
+; FTZDAZ-NEXT:    vaddss %xmm0, %xmm1, %xmm0
+; FTZDAZ-NEXT:    {vex} vcvtneps2bf16 %xmm0, %xmm0
+; FTZDAZ-NEXT:    vpextrw $0, %xmm0, (%rdx)
+; FTZDAZ-NEXT:    retq
+;
+; NOFTZDAZ-LABEL: add_default_attr:
+; NOFTZDAZ:       # %bb.0:
+; NOFTZDAZ-NEXT:    pushq %rbx
+; NOFTZDAZ-NEXT:    movq %rdx, %rbx
+; NOFTZDAZ-NEXT:    movzwl (%rsi), %eax
+; NOFTZDAZ-NEXT:    shll $16, %eax
+; NOFTZDAZ-NEXT:    vmovd %eax, %xmm0
+; NOFTZDAZ-NEXT:    movzwl (%rdi), %eax
+; NOFTZDAZ-NEXT:    shll $16, %eax
+; NOFTZDAZ-NEXT:    vmovd %eax, %xmm1
+; NOFTZDAZ-NEXT:    vaddss %xmm0, %xmm1, %xmm0
+; NOFTZDAZ-NEXT:    callq __truncsfbf2@PLT
+; NOFTZDAZ-NEXT:    vpextrw $0, %xmm0, (%rbx)
+; NOFTZDAZ-NEXT:    popq %rbx
+; NOFTZDAZ-NEXT:    retq
+  %a = load bfloat, ptr %pa
+  %b = load bfloat, ptr %pb
+  %add = fadd bfloat %a, %b
+  store bfloat %add, ptr %pc
+  ret void
+}
+
+define void @add_no_ftz_daz_attr(ptr %pa, ptr %pb, ptr %pc) nounwind "denormal-fp-math-bf16"="ieee,ieee" {
+; FTZDAZ-LABEL: add_no_ftz_daz_attr:
+; FTZDAZ:       # %bb.0:
+; FTZDAZ-NEXT:    movzwl (%rsi), %eax
+; FTZDAZ-NEXT:    shll $16, %eax
+; FTZDAZ-NEXT:    vmovd %eax, %xmm0
+; FTZDAZ-NEXT:    movzwl (%rdi), %eax
+; FTZDAZ-NEXT:    shll $16, %eax
+; FTZDAZ-NEXT:    vmovd %eax, %xmm1
+; FTZDAZ-NEXT:    vaddss %xmm0, %xmm1, %xmm0
+; FTZDAZ-NEXT:    {vex} vcvtneps2bf16 %xmm0, %xmm0
+; FTZDAZ-NEXT:    vpextrw $0, %xmm0, (%rdx)
+; FTZDAZ-NEXT:    retq
+;
+; NOFTZDAZ-LABEL: add_no_ftz_daz_attr:
+; NOFTZDAZ:       # %bb.0:
+; NOFTZDAZ-NEXT:    pushq %rbx
+; NOFTZDAZ-NEXT:    movq %rdx, %rbx
+; NOFTZDAZ-NEXT:    movzwl (%rsi), %eax
+; NOFTZDAZ-NEXT:    shll $16, %eax
+; NOFTZDAZ-NEXT:    vmovd %eax, %xmm0
+; NOFTZDAZ-NEXT:    movzwl (%rdi), %eax
+; NOFTZDAZ-NEXT:    shll $16, %eax
+; NOFTZDAZ-NEXT:    vmovd %eax, %xmm1
+; NOFTZDAZ-NEXT:    vaddss %xmm0, %xmm1, %xmm0
+; NOFTZDAZ-NEXT:    callq __truncsfbf2@PLT
+; NOFTZDAZ-NEXT:    vpextrw $0, %xmm0, (%rbx)
+; NOFTZDAZ-NEXT:    popq %rbx
+; NOFTZDAZ-NEXT:    retq
+  %a = load bfloat, ptr %pa
+  %b = load bfloat, ptr %pb
+  %add = fadd bfloat %a, %b
+  store bfloat %add, ptr %pc
+  ret void
+}
diff --git a/llvm/test/Other/opt-override-denormal-fp-math-bf16.ll b/llvm/test/Other/opt-override-denormal-fp-math-bf16.ll
new file mode 100644
index 00000000000000..0524d9354cf14d
--- /dev/null
+++ b/llvm/test/Other/opt-override-denormal-fp-math-bf16.ll
@@ -0,0 +1,23 @@
+; RUN: opt -S -denormal-fp-math-bf16=ieee %s | FileCheck -check-prefixes=IEEE,ALL %s
+; RUN: opt -S -denormal-fp-math-bf16=preserve-sign %s | FileCheck -check-prefixes=PRESERVESIGN,ALL %s
+; RUN: opt -S -denormal-fp-math-bf16=positive-zero %s | FileCheck -check-prefixes=POSITIVEZERO,ALL %s
+
+; ALL: @no_denormal_fp_math_f32_attr() [[NOATTR:#[0-9]+]] {
+define i32 @no_denormal_fp_math_f32_attr() #0 {
+entry:
+  ret i32 0
+}
+
+; ALL: denormal_fp_math_attr_preserve_sign_ieee() [[ATTR:#[0-9]+]] {
+define i32 @denormal_fp_math_attr_preserve_sign_ieee() #1 {
+entry:
+  ret i32 0
+}
+
+; ALL-DAG: attributes [[ATTR]] = { nounwind "denormal-fp-math-bf16"="preserve-sign,ieee" }
+; IEEE-DAG: attributes [[NOATTR]] = { nounwind "denormal-fp-math-bf16"="ieee,ieee" }
+; PRESERVESIGN-DAG: attributes [[NOATTR]] = { nounwind "denormal-fp-math-bf16"="preserve-sign,preserve-sign" }
+; POSITIVEZERO-DAG: attributes [[NOATTR]] = { nounwind "denormal-fp-math-bf16"="positive-zero,positive-zero" }
+
+attributes #0 = { nounwind }
+attributes #1 = { nounwind "denormal-fp-math-bf16"="preserve-sign,ieee" }
diff --git a/llvm/test/Other/opt-override-denormal-fp-math-mixed.ll b/llvm/test/Other/opt-override-denormal-fp-math-mixed.ll
index 306fc78a2183a2..e14320007b5895 100644
--- a/llvm/test/Other/opt-override-denormal-fp-math-mixed.ll
+++ b/llvm/test/Other/opt-override-denormal-fp-math-mixed.ll
@@ -6,11 +6,17 @@
 ; RUN: opt -S -denormal-fp-math-f32=preserve-sign %s | FileCheck -check-prefixes=PRESERVESIGNF32,ALL %s
 ; RUN: opt -S -denormal-fp-math-f32=positive-zero %s | FileCheck -check-prefixes=POSITIVEZEROF32,ALL %s
 
+; RUN: opt -S -denormal-fp-math-bf16=ieee %s | FileCheck -check-prefixes=IEEEBF16,ALL %s
+; RUN: opt -S -denormal-fp-math-bf16=preserve-sign %s | FileCheck -check-prefixes=PRESERVESIGNBF16,ALL %s
+; RUN: opt -S -denormal-fp-math-bf16=positive-zero %s | FileCheck -check-prefixes=POSITIVEZEROBF16,ALL %s
+
 ; RUN: opt -S -denormal-fp-math=ieee -denormal-fp-math-f32=ieee %s | FileCheck -check-prefixes=IEEE-BOTH,ALL %s
 ; RUN: opt -S -denormal-fp-math=preserve-sign -denormal-fp-math-f32=preserve-sign %s | FileCheck -check-prefixes=PRESERVESIGN-BOTH,ALL %s
 ; RUN: opt -S -denormal-fp-math=positive-zero -denormal-fp-math-f32=positive-zero %s | FileCheck -check-prefixes=POSITIVEZERO-BOTH,ALL %s
-
+; RUN: opt -S -denormal-fp-math=ieee -denormal-fp-math-bf16=ieee %s | FileCheck -check-prefixes=IEEE-BOTH2,ALL %s
+; RUN: opt -S -denormal-fp-math=preserve-sign -denormal-fp-math-bf16=preserve-sign %s | FileCheck -check-prefixes=PRESERVESIGN-BOTH2,ALL %s
+; RUN: opt -S -denormal-fp-math=positive-zero -denormal-fp-math-bf16=positive-zero %s | FileCheck -check-prefixes=POSITIVEZERO-BOTH2,ALL %s
 
 ; ALL: @no_denormal_fp_math_attrs() [[NOATTR:#[0-9]+]] {
 define i32 @no_denormal_fp_math_attrs() #0 {
@@ -24,7 +30,7 @@ entry:
   ret i32 0
 }
 
-; ALL-DAG: attributes [[ATTR]] = { nounwind "denormal-fp-math"="preserve-sign,ieee" "denormal-fp-math-f32"="preserve-sign,ieee" }
+; ALL-DAG: attributes [[ATTR]] = { nounwind "denormal-fp-math"="preserve-sign,ieee" "denormal-fp-math-bf16"="preserve-sign,ieee" "denormal-fp-math-f32"="preserve-sign,ieee" }
 ; IEEE-DAG: attributes [[NOATTR]] = { nounwind "denormal-fp-math"="ieee,ieee" }
 ; PRESERVESIGN-DAG: attributes [[NOATTR]] = { nounwind "denormal-fp-math"="preserve-sign,preserve-sign" }
@@ -34,9 +40,17 @@ entry:
 ; PRESERVESIGNF32-DAG: attributes [[NOATTR]] = { nounwind "denormal-fp-math-f32"="preserve-sign,preserve-sign" }
 ; POSITIVEZEROF32-DAG: attributes [[NOATTR]] = { nounwind "denormal-fp-math-f32"="positive-zero,positive-zero" }
 
+; IEEEBF16-DAG: attributes [[NOATTR]] = { nounwind "denormal-fp-math-bf16"="ieee,ieee" }
+; PRESERVESIGNBF16-DAG: attributes [[NOATTR]] = { nounwind "denormal-fp-math-bf16"="preserve-sign,preserve-sign" }
+; POSITIVEZEROBF16-DAG: attributes [[NOATTR]] = { nounwind "denormal-fp-math-bf16"="positive-zero,positive-zero" }
+
 ; IEEE-BOTH-DAG: attributes [[NOATTR]] = { nounwind "denormal-fp-math"="ieee,ieee" "denormal-fp-math-f32"="ieee,ieee" }
 ; PRESERVESIGN-BOTH-DAG: attributes [[NOATTR]] = { nounwind "denormal-fp-math"="preserve-sign,preserve-sign" "denormal-fp-math-f32"="preserve-sign,preserve-sign" }
 ; POSITIVEZERO-BOTH-DAG: attributes [[NOATTR]] = { nounwind "denormal-fp-math"="positive-zero,positive-zero" "denormal-fp-math-f32"="positive-zero,positive-zero" }
 
+; IEEE-BOTH2-DAG: attributes [[NOATTR]] = { nounwind "denormal-fp-math"="ieee,ieee" "denormal-fp-math-bf16"="ieee,ieee" }
+; PRESERVESIGN-BOTH2-DAG: attributes [[NOATTR]] = { nounwind "denormal-fp-math"="preserve-sign,preserve-sign" "denormal-fp-math-bf16"="preserve-sign,preserve-sign" }
+; POSITIVEZERO-BOTH2-DAG: attributes [[NOATTR]] = { nounwind "denormal-fp-math"="positive-zero,positive-zero" "denormal-fp-math-bf16"="positive-zero,positive-zero" }
+
 attributes #0 = { nounwind }
-attributes #1 = { nounwind "denormal-fp-math"="preserve-sign,ieee" "denormal-fp-math-f32"="preserve-sign,ieee" }
+attributes #1 = { nounwind "denormal-fp-math"="preserve-sign,ieee" "denormal-fp-math-bf16"="preserve-sign,ieee" "denormal-fp-math-f32"="preserve-sign,ieee" }
@llvm/pr-subscribers-backend-x86

Author: Freddy Ye (FreddyLeaf)

Changes

Respect default value as "preserve-sign,preserve-sign" for X86 backend.

Full diff: https://github.com/llvm/llvm-project/pull/90425.diff

10 Files Affected
arsenm left a comment
Does X86 actually have a separately controllable bfloat denormal mode? This does not control the mode. It only informs the code generator that this is the mode. Without an underlying mode control bit, this is not correct/useful
You are right, X86 hardware currently only has a control bit for the denormal mode of f32/f64 (in MXCSR, IIRC). But even for attribute of |
So all the instructions are just defective?
Sigh. I guess this would be the correct way to handle it |
Is it all the operations, or just these specific dot products? Are there specific bf16<->float conversions that have the same issue? What about basic arithmetic operations? |
Is this really a case of "defective instructions" or is it just a difference between the way that Intel processors understand the bfloat16 type compared to other architectures? The Intel white paper on bfloat16 (https://www.intel.com/content/www/us/en/content-details/671279/bfloat16-hardware-numerics-definition.html) says, "There is no need to support denormals; FP32, and therefore also BF16, offer more than enough range for deep learning training tasks." |
x86 only supports bf16<->float conversions and these dot product operations. The SDM also mentions this for the float->bf16 instruction, but not for bf16->float. The SDM does still say that it does not consult or update MXCSR, so I suppose it won't flush to zero either, since that would require consulting MXCSR. |
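To make the float->bf16 case concrete, here is a minimal software sketch of a round-to-nearest-even conversion with an optional "denormals are zero" step on the f32 input. It is an illustration only, not the SDM's definition of any instruction: the names `floatToBF16`, `bitsToFloat`, and the `daz` parameter are hypothetical, and the NaN handling (forcing the quiet bit) is an assumption.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: reinterpret a 32-bit pattern as a float.
inline float bitsToFloat(std::uint32_t u) {
  float f;
  std::memcpy(&f, &u, sizeof f);
  return f;
}

// Software sketch of float -> bfloat16 with round-to-nearest-even.
// If `daz` is true, an f32 denormal input is treated as a zero of the
// same sign before conversion (an assumed model, not hardware-verified).
inline std::uint16_t floatToBF16(float f, bool daz) {
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  const std::uint32_t exp = u & 0x7F800000u;
  const std::uint32_t man = u & 0x007FFFFFu;
  if (exp == 0x7F800000u && man != 0)  // NaN: keep it a NaN (set quiet bit)
    return static_cast<std::uint16_t>((u >> 16) | 0x0040u);
  if (daz && exp == 0 && man != 0)     // f32 denormal input -> signed zero
    return static_cast<std::uint16_t>((u >> 16) & 0x8000u);
  // Round to nearest, ties to even, on the 16 discarded bits.
  u += 0x7FFFu + ((u >> 16) & 1u);
  return static_cast<std::uint16_t>(u >> 16);
}
```

Because bf16 and f32 share the same exponent width, an f32 normal never converts to a bf16 denormal in this sketch. With `daz` set, the result is therefore never a bf16 denormal, so no separate output flush is needed here.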
| We don't have In contrast, the FP16 type never does DAZ/FTZ (https://cdrdv2-public.intel.com/678970/intel-avx512-fp16.pdf), though it's not controlled by MXCSR either. |
Intel added them in AVX_NE_CONVERT, memory input only. |
Ok, they use DAZ as well. |
As a format, it's just IEEE with a different combination of mantissa and exponent widths. Denormals have a specific, and clear meaning here and there's no implied flushing on computation
I don't know how to parse this comment. Denormals, in what type? I almost read this as "you don't need to handle fp16 denormals if you process in bf16 instead". At worst it's a subjective value judgement that bad behavior is OK, but I'm not sure that's what it's really saying |
I don't accept that. Denormals have a specific and clear meaning in the IEEE types, but once you say "This is like the IEEE type except..." you can no longer assume anything that isn't part of the specification for the new type. Is there an accepted standard specification for this type? I'm not sure how this would be adjudicated apart from reference to actual implementations defining an ad hoc standard. Here's something that comes close to a definition for the type, though it isn't presented as a formal specification: That document says, "To ensure identical behavior for underflows, overflows, and NaNs, bfloat16 has the same exponent size as FP32. However, bfloat16 handles denormals differently from FP32: it flushes them to zero."
I will admit that doesn't sound like a technical specification so much as a marketing pitch. I haven't spoken to any of the Intel hardware engineers responsible for the BF16 implementation in Intel processors, and any opinions I am expressing here do not represent Intel's official position. I'm just offering my interpretation, and my interpretation is that support for denormals is not required for bfloat16. |
The IEEE formats are specified as M mantissa bits and E exponent bits. Just because the standard didn't prescribe this particular combination of bits as a suggested format doesn't mean it's some wild thing with no obligation to be consistent. Denormals are a set of values in the encoding, which bfloat certainly has. The choice for a computation to drop bits on the floor when a value would need to be encoded as a denormal is somewhat orthogonal to the format itself. |
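The distinction between the encoding and the flushing behavior can be sketched in a few lines. The snippet below assumes the usual bfloat16 layout (1 sign bit, 8 exponent bits, 7 mantissa bits) and models the three modes exercised by the tests in this PR (`ieee`, `preserve-sign`, `positive-zero`). All names (`DenormMode`, `isBF16Denormal`, `flushBF16`, `checkExamples`) are hypothetical and are not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the three mode names used in the tests.
enum class DenormMode { IEEE, PreserveSign, PositiveZero };

// A bf16 denormal has an all-zero exponent field and a nonzero mantissa.
constexpr bool isBF16Denormal(std::uint16_t bits) {
  return (bits & 0x7F80u) == 0 && (bits & 0x007Fu) != 0;
}

// Apply a flushing mode to one bf16 bit pattern.
constexpr std::uint16_t flushBF16(std::uint16_t bits, DenormMode mode) {
  if (!isBF16Denormal(bits))
    return bits;                  // zeros, normals, inf, NaN are untouched
  switch (mode) {
  case DenormMode::IEEE:
    return bits;                  // denormal is kept as-is
  case DenormMode::PreserveSign:
    return bits & 0x8000u;        // flush to zero, keep the sign bit
  case DenormMode::PositiveZero:
    return 0;                     // flush to +0.0
  }
  return bits;
}

// Small self-check showing the intended behavior.
inline void checkExamples() {
  assert(isBF16Denormal(0x0001));   // smallest positive denormal
  assert(!isBF16Denormal(0x0080));  // smallest positive normal
  assert(flushBF16(0x8001, DenormMode::PreserveSign) == 0x8000);
}
```

The encoding test (`isBF16Denormal`) is independent of the mode, which is the point made above: the format has denormal values regardless of whether a given computation chooses to flush them.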
