[X86] Miscompile when using -ftrapping-math #164802

@abhishek-kaushik22

The following C++ code raises a floating-point exception when compiled with -ftrapping-math, but runs fine without the flag.

    #include <immintrin.h>
    #include <cstdint>
    #include <iostream>
    #include <cfenv>

    __attribute__((noinline))
    void masked_div_store(double* a, double* b, uint8_t mask) {
        // Convert i8 mask to __mmask8
        __mmask8 k = static_cast<__mmask8>(mask);

        // Masked load from a and b
        __m512d va = _mm512_maskz_loadu_pd(k, a); // zero-masked load
        __m512d vb = _mm512_maskz_loadu_pd(k, b); // zero-masked load

        // Masked divide: va = va / vb
        __m512d result = _mm512_mask_div_pd(_mm512_setzero_pd(), k, va, vb);

        // Masked store back to a
        _mm512_mask_storeu_pd(a, k, result);
    }

    int main() {
        const auto res = feenableexcept(FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW);
        double a[4] = {8.0, 16.0, 24.0, 32.0};
        double b[4] = {2.0, 4.0, 6.0, 8.0};
        uint8_t mask = 0xF; // binary: 00001111, enables lanes 0 to 3

        masked_div_store(a, b, mask);

        std::cout << "Result in a after masked division:\n";
        for (int i = 0; i < 4; ++i) {
            std::cout << "a[" << i << "] = " << a[i] << "\n";
        }
        return 0;
    }

    bash$ clang++ -O3 -mavx512f test2.cpp -o no_trap.exe -fuse-ld=lld
    bash$ ./no_trap.exe
    Result in a after masked division:
    a[0] = 4
    a[1] = 4
    a[2] = 4
    a[3] = 4
    bash$ clang++ -O3 -mavx512f -ftrapping-math test2.cpp -o trap.exe -fuse-ld=lld
    bash$ ./trap.exe
    Floating point exception (core dumped)

Without the flag, LLVM generates:

    masked_div_store(double*, double*, unsigned char):
            kmovw   k1, edx
            vmovupd zmm0 {k1} {z}, zmmword ptr [rdi]
            vdivpd  zmm0 {k1} {z}, zmm0, zmmword ptr [rsi]
            vmovupd zmmword ptr [rdi] {k1}, zmm0
            vzeroupper
            ret

but with the flag, it generates:

    masked_div_store(double*, double*, unsigned char):
            kmovw   k1, edx
            vmovupd zmm0 {k1} {z}, zmmword ptr [rdi]
            vmovupd zmm1 {k1} {z}, zmmword ptr [rsi]
            vdivpd  zmm0, zmm0, zmm1
            vmovapd zmm0 {k1} {z}, zmm0
            vmovupd zmmword ptr [rdi] {k1}, zmm0
            vzeroupper
            ret

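The problem here is that we do a full-width division instead of a masked one. The zero-masking loads leave lanes 4 to 7 as 0.0 in both operands, so the unmasked vdivpd computes 0.0 / 0.0 in those lanes and raises the invalid-operation exception, which traps because FE_INVALID is enabled.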
When the flag is specified, the division instruction in LLVM IR is:

%div.i = tail call noundef <8 x double> @llvm.experimental.constrained.fdiv.v8f64(<8 x double> %1, <8 x double> %2, metadata !"round.tonearest", metadata !"fpexcept.strict") #9

This is represented as a strict_fdiv node in the DAG, and there is no pattern to select a masked variant for strict_fp opcodes (I did find commit dbcc139, which removed them).
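To illustrate why the full-width division traps while the masked one does not, here is a minimal scalar sketch of the eight-lane sequence. It is not LLVM code and uses no intrinsics. The names `Vec8`, `maskz_load` and `division_raises_invalid` are hypothetical helpers introduced only for this illustration. The sketch checks the FE_INVALID status flag with `fetestexcept` instead of enabling traps, and uses `volatile` so the divisions should happen at run time rather than being folded by the compiler.

```cpp
#include <array>
#include <cassert>
#include <cfenv>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of a 512-bit vector of doubles (8 lanes).
using Vec8 = std::array<double, 8>;

// Models a zero-masking load: lanes whose mask bit is clear are set to 0.0
// and their memory is never read, so a 4-element array is fine with mask 0xF.
inline Vec8 maskz_load(std::uint8_t k, const double* p) {
    assert(p != nullptr);
    Vec8 v{};  // all lanes start as 0.0
    for (std::size_t i = 0; i < 8; ++i) {
        if (k & (1u << i)) v[i] = p[i];
    }
    return v;
}

// Divides a by b lane by lane after zero-masked loads.
// full_width == true  models the unmasked vdivpd (every lane is divided).
// full_width == false models the masked vdivpd (only enabled lanes are divided).
// Returns true if the division raised the FE_INVALID status flag.
inline bool division_raises_invalid(std::uint8_t k, const double* a,
                                    const double* b, bool full_width) {
    const Vec8 va = maskz_load(k, a);
    const Vec8 vb = maskz_load(k, b);

    std::feclearexcept(FE_ALL_EXCEPT);
    for (std::size_t i = 0; i < 8; ++i) {
        const bool lane_enabled = (k & (1u << i)) != 0;
        if (!full_width && !lane_enabled) continue;  // masked lanes do no arithmetic

        // volatile keeps the division from being constant-folded away.
        volatile double num = va[i];
        volatile double den = vb[i];
        volatile double quotient = num / den;  // 0.0 / 0.0 in disabled lanes
        (void)quotient;
    }
    return std::fetestexcept(FE_INVALID) != 0;
}
```

With the arrays from the report and mask 0xF, `division_raises_invalid(0xF, a, b, true)` should report the invalid flag because lanes 4 to 7 divide 0.0 by 0.0, while the same call with `full_width` set to `false` should not, since only lanes 0 to 3 are computed.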

Godbolt: https://godbolt.org/z/dMr83s35E
