1,525 questions
8 votes
1 answer
176 views
Does the MSVC implementation of `signaling_NaN` comply with the latest IEEE floating-point standard?
As far as I can tell, the MSVC implementation of signaling_NaN does not comply with IEEE 754-2019, the latest version of the IEEE floating-point standard. Unfortunately, I do not have a copy of the ...
1 vote
2 answers
92 views
How to get consistent scientific notation with limited precision in Bigloo Scheme?
I'm working with floating-point numbers in Bigloo Scheme, and I encountered a precision issue when performing a simple multiplication: (* 0.005 1e-9) ;; => 5.0000000000000005e-12 I was expecting ...
0 votes
0 answers
44 views
How to make TypeORM auto-fix all floating-point values according to their DB schema type?
Not sure if this is possible at all. It is a typical problem: the value in the DB is 4.725, but the UI shows 4.7250000000000005. There are a lot of other values that produce this kind of ...
6 votes
1 answer
181 views
Good practices guidelines for `ffast-math`
I am writing a header-only C++ library that does some floating-point math. Since the library is header-only, I realized that a user will sooner or later include it in a project where -ffast-...
3 votes
2 answers
129 views
How to trigger exactly *one* SSE exception
I've written a little test program that triggers FPU exceptions through feraiseexcept(): #include <iostream> #include <cfenv> using namespace std; int main() { auto test = []( int exc,...
3 votes
3 answers
145 views
C++ std::stof() throws out_of_range for 5.87747175e-39
Consider the following code #include <iostream> int main() { const std::string s("5.87747175e-39"); float f = std::stof(s); std::cout << s << " - " <<...
1 vote
3 answers
212 views
Are Math.sqrt(x) and Math.pow(x, 0.5) equivalent?
In ECMAScript, given a non-negative, finite double x, is the following assertion always true? Math.sqrt(x) === Math.pow(x, 0.5) I know that both Math.sqrt() and Math.pow() are implementation-...
8 votes
1 answer
159 views
If x*y ≠ 2ⁿ, does it follow that x * y = ((x * y) / y) * y under IEEE 754 semantics?
This is a follow-up to my previous question here. Additional restrictions highlighted in bold. Given two nonzero, finite, double-precision (a.k.a. binary64) floating point numbers x and y, is it ...
7 votes
1 answer
131 views
Is it always true that x * y = ((x * y) / y) * y under IEEE 754 semantics?
Given two nonzero, finite, double-precision floating point numbers x and y, is it always true that the equality x * y == ((x * y) / y) * y holds under default IEEE 754 semantics? I've searched ...
2 votes
1 answer
217 views
Why does Math.pow(10, -4) produce different results in JavaScript and C#?
I noticed that the result of Math.pow(10, -4) differs between JavaScript and C#. JavaScript Math.pow C# Math.Pow In JavaScript, it seems the result is expressed as an approximation, possibly due to ...
1 vote
2 answers
117 views
Java Double Precision - Rounding - %f specifier
Numbers sometimes cannot be expressed exactly when they are represented in double precision or single precision. Of course, working with BigDecimal is one solution; I know that. On to my question:...
2 votes
1 answer
102 views
float16_t rounding on ARM NEON
I am implementing emulation of ARM float16_t for x64 using SSE; the idea is to have bit-exact values on both platforms. I have mostly finished the implementation, except for one thing: I cannot correctly ...
2 votes
2 answers
113 views
Floating Point: Why does the implicit 1 change the value of the fractional part?
I was reading about the floating-point implementation in the comments of a ziglings.org exercise, and I came across this: // Floating further: // // As an example, Zig's f16 is a IEEE ...
2 votes
1 answer
84 views
NumPy Float to HalfFloat conversion RNE when result is subnormal
I'm trying to understand how NumPy implements round-to-nearest-even when converting to a lower-precision format, in this case Float32 to Float16, specifically the case when the number is normal ...
11 votes
1 answer
661 views
How to achieve same double to string conversion rounding results in C++ and C#?
I want to convert a double to a string with a given number of decimal places in C++ as well as in C# and I want the results of those conversions to be the same in both languages. Especially C++ ...