
Question 1

As far as I can tell, the MSVC implementation of signaling_NaN does not comply with IEEE 754-2019, the latest version of the IEEE floating-point standard. Unfortunately, I do not have a copy of the ...

Question 2

I'm working with floating-point numbers in Bigloo Scheme, and I encountered a precision issue when performing a simple multiplication: (* 0.005 1e-9) ;; => 5.0000000000000005e-12 I was expecting ...
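The effect in this excerpt is not specific to Bigloo. A minimal sketch in Python (a swapped-in language, not the asker's Scheme code), assuming the goal is only to print the product in scientific notation with a fixed number of digits; the names `product` and `formatted` are illustrative:

```python
# Neither 0.005 nor 1e-9 is exactly representable in binary64, so the
# rounded product need not be the double closest to 5e-12.
product = 0.005 * 1e-9

# Limit the printed precision instead of printing the shortest
# round-trip representation of the double.
formatted = format(product, ".3e")
print(formatted)
```

Formatting to a fixed precision changes only the printed text; the stored double is unchanged.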

Question 3

I'm not sure whether this is possible at all. It is a typical problem: the value in the db is 4.725, but the UI shows 4.7250000000000005. And there are a lot of other values that produce this kind of ...

Question 4

I am writing a C++ header-only library that does some floating-point math. Since the library is header-only, I realized that sooner or later a library user will include it in a project where -ffast-...

Question 5

I've written a little test program that triggers FPU exceptions through feraiseexcept(): #include <iostream> #include <cfenv> using namespace std; int main() { auto test = []( int exc,...

Question 6

Consider the following code #include <iostream> int main() { const std::string s("5.87747175e-39"); float f = std::stof(s); std::cout << s << " - " <<...

Question 7

In ECMAScript, given a non-negative, finite double x, is the following assertion always true? Math.sqrt(x) === Math.pow(x, 0.5) I know that both Math.sqrt() and Math.pow() are implementation-...

Question 8

This is a follow-up to my previous question here. Additional restrictions highlighted in bold. Given two nonzero, finite, double-precision (a.k.a. binary64) floating point numbers x and y, is it ...

Question 9

Given two nonzero, finite, double-precision floating point numbers x and y, is it always true that the equality x * y == ((x * y) / y) * y holds under default IEEE 754 semantics? I've searched ...
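One way to explore the identity in this excerpt empirically is a randomized search. The following is a sketch only, assuming Python floats are IEEE 754 binary64 with round-to-nearest (true on common platforms); `identity_holds` and `search` are hypothetical helper names, and a search that finds nothing proves nothing about the general claim.

```python
import random

def identity_holds(x: float, y: float) -> bool:
    """Return True if x * y == ((x * y) / y) * y in float arithmetic."""
    p = x * y
    return p == (p / y) * y

def search(trials: int, seed: int = 0):
    """Return the first (x, y) pair found that violates the identity, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.uniform(-1e3, 1e3)
        y = rng.uniform(-1e3, 1e3)
        if x == 0.0 or y == 0.0:
            continue  # the question restricts x and y to nonzero values
        if not identity_holds(x, y):
            return (x, y)
    return None

print(search(10_000))
```

The sampled range is arbitrary; widening it toward the overflow and underflow thresholds would exercise the edge cases the question is most likely concerned with.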

Question 10

I noticed that the result of Math.pow(10, -4) differs between JavaScript and C#. JavaScript Math.pow C# Math.Pow In JavaScript, it seems the result is expressed as an approximation, possibly due to ...

Question 11

Numbers sometimes cannot be expressed exactly when they are represented in double precision or single precision. Of course working with bigdecimal is a solution, I know that. Let's come to my question:...

Question 12

I am implementing emulation of ARM float16_t for X64 using SSE; the idea is to have bit-exact values on both platforms. I mostly finished the implementation, except for one thing, I cannot correctly ...

Question 13

I was reading about the floating point implementation from the comments of a ziglings.org exercise, and I came across this info about it. // Floating further: // // As an example, Zig's f16 is a IEEE ...

Question 14

I'm trying to understand how NumPy implements round-to-nearest-even when converting to a lower-precision format, in this case Float32 to Float16, specifically the case when the number is normal ...
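Round-to-nearest-even into the binary16 subnormal range can be observed without NumPy. This sketch uses the standard library's `struct` module with its `"e"` (half-precision) format as a stand-in; it is not NumPy's conversion routine, and `to_half_bits` is a hypothetical helper name.

```python
import struct

def to_half_bits(value: float) -> int:
    """Round a Python float to binary16 and return the 16-bit pattern."""
    return struct.unpack("<H", struct.pack("<e", value))[0]

tiny = 2.0 ** -24  # smallest positive binary16 subnormal

# Each input lies exactly halfway between two adjacent subnormals,
# so the result shows which way the tie is broken.
print(hex(to_half_bits(0.5 * tiny)))  # tie between 0 and 1 * tiny
print(hex(to_half_bits(1.5 * tiny)))  # tie between 1 * tiny and 2 * tiny
print(hex(to_half_bits(2.5 * tiny)))  # tie between 2 * tiny and 3 * tiny
```

Under round-to-nearest-even, each tie resolves to the neighbor with an even significand, which here means the bit patterns 0x0, 0x2 and 0x2.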

Question 15

I want to convert a double to a string with a given number of decimal places in C++ as well as in C# and I want the results of those conversions to be the same in both languages. Especially C++ ...


Does the MSVC implementation of `signaling_NaN` comply with the latest IEEE floating-point standard?

How to get consistent scientific notation with limited precision in Bigloo Scheme?

How to make TypeORM auto-fix all floating-point values according to their db schema type?

Good practices guidelines for `ffast-math`

How to trigger exactly one SSE exception

c++ std::stof() throws out_of_range for 5.87747175e-39

Are Math.sqrt(x) and Math.pow(x, 0.5) equivalent?

If xy ≠ 2ⁿ, does it follow that x * y = ((x * y) / y) * y under IEEE 754 semantics?

Is it always true that x * y = ((x * y) / y) * y under IEEE 754 semantics?

Why does Math.pow(10, -4) produce different results in JavaScript and C#?

Java Double Precision - Rounding - %f specifier

float16_t rounding on ARM NEON

Floating Point: Why does the implicit 1 change the value of the fractional part?

NumPy Float to HalfFloat conversion RNE when result is subnormal

How to achieve same double to string conversion rounding results in C++ and C#?
