
If floating point numbers can sometimes be inaccurate, why do they exist? What scenarios would someone want to use floating point numbers?

  • Programmers use floating point numbers for the same reason non-programmers use them. There are some situations where the best representation is captured with a floating point number. Commented Jun 5, 2019 at 19:28
  • Q: If a hammer can sometimes cause damage to a thumb, why do people use hammers? A: The benefits outweigh the costs, and understanding the problems and limitations limits the potential for damage. Commented Jun 5, 2019 at 19:31
  • They are accurate: a double carries about 17 significant decimal digits, while most sensors in physics give fewer than 6. So there is no need for hundreds of digits in these computations, and doubles are perfect for many problems. But their accuracy is not perfect, and that can cause problems from time to time if you are not aware of it. Commented Jun 5, 2019 at 19:45
  • Are you prepared to grant that there are many many many applications for doing computations with something approximating real numbers? E.g., I want to calculate a rocket trajectory, or model a chemical reaction, or extract frequency data from an audio signal, or perform image recognition, or compute a failure probability, or <insert thousands of other real-world possibilities here>? If so, all we have to persuade you of is that floating-point numbers are a reasonable tool for performing these calculations. If not, I think we have a bigger task and this question may be off-topic ... Commented Jun 5, 2019 at 19:53
  • A float is also cheaper than a double. If you deal with large arrays (like matrices with millions of entries), you have to consider the size of the RAM (heap and stack) in order to be able to store your array. Commented Jun 5, 2019 at 20:05

1 Answer


(First, note that floating-point numbers are exact. It is floating-point arithmetic that approximates real-number arithmetic. This distinction is important for designing good floating-point software and writing proofs about it.)
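That exactness can be made concrete. Python's decimal module (used here purely as an illustration) can print the exact rational value that a double actually stores:

```python
from decimal import Decimal

# The literal 0.1 is rounded to the nearest 64-bit binary value.
# That stored value is exact -- it just isn't exactly 1/10.
stored = Decimal(0.1)
print(stored)
# 0.1000000000000000055511151231257827021181583404541015625
```

The stored number is a perfectly definite binary rational; the approximation happened once, at conversion time, not continuously thereafter.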

People use floating-point arithmetic because it is useful for working with numbers of diverse magnitudes. Consider using floating-point to design and construct a building or other structure. When the designer specifies a beam or cable that is 10 meters long, the actual delivered cable will not be 10 meters long. If you measure it and convert the result to a 32-bit float¹, the conversion may introduce an error, but that error will be less than one micrometer. Your measurement of the cable will have more error than that. So the floating-point error is minuscule and does not matter in this simple measurement.
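A quick sketch of that claim, round-tripping a hypothetical 10.1-meter measurement through a 32-bit float with Python's standard struct module:

```python
import struct

measured = 10.1  # meters; an invented measurement near 10 m

# Round-trip through IEEE-754 binary32 to expose the conversion error.
as_float32 = struct.unpack('f', struct.pack('f', measured))[0]
error = abs(as_float32 - measured)
print(error)  # roughly 4e-7 m -- well under one micrometer
```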

When many calculations are done, these rounding errors can not only accumulate but combine in surprising ways. If float is not sufficient, we can use double², in which the initial error for a measurement around 10 meters would be under 2 femtometers (a femtometer is 10⁻¹⁵ meters).
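Two classic sketches of such surprises in Python: absorption, where adding a small number to a huge one loses it entirely, and accumulated rounding in a running sum (math.fsum is the standard-library remedy for the latter):

```python
import math

# Absorption: 1e16 + 1 rounds back to 1e16, so the 1 vanishes.
print(1e16 + 1 - 1e16)   # 0.0, not 1.0

# Accumulation: ten copies of 0.1 each carry a tiny rounding error.
total = sum([0.1] * 10)
print(total == 1.0)      # False

# math.fsum tracks the lost low-order bits and recovers the exact sum.
print(math.fsum([0.1] * 10) == 1.0)  # True
```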

So floating-point has plenty of precision for normal physical uses: Measuring and designing objects, processing audio or radio signals, evaluating hypotheses from physics or chemistry, and so on. When floating-point is used well, the representation errors and rounding errors in floating-point arithmetic simply do not matter. They are too small to notice; they have no observable effects on the work being done.

Issues using floating-point arise when novices accustomed to the rigidity of most integer arithmetic are surprised by how floating-point behaves. Although an error might be one part in 9•10¹⁵, if it means the result is 6.99999999999999911182158029987476766109466552734375 instead of 7 in a number that is then converted to int, they get the wrong result and do not understand how their program went wrong. Mostly this error arises among students and Stack Overflow question-askers; it is not a problem in practice, when floating-point code is used by people who have learned the basics of floating-point arithmetic.
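A minimal Python reproduction of that trap: a near-integer sum truncates downward when converted to int, while rounding gives the intended answer:

```python
# Ten copies of 0.1 sum to just under 1.0.
total = sum([0.1] * 10)

# int() truncates toward zero, so the novice sees 0 where they expect 1.
print(int(total))    # 0

# Rounding to the nearest integer avoids the trap.
print(round(total))  # 1
```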

Issues also arise because, as mentioned above, errors can combine in surprising ways. Matrix operations, for example, can be “unstable,” meaning they tend to amplify errors. Thus, although a floating-point format may have plenty of precision, results may have great errors (compared to real-number arithmetic) due to mathematical properties of the data and operations.
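A standard illustration of such amplification (the data values below are invented): the one-pass variance formula E[x²] − (E[x])² subtracts two nearly equal quantities of order 10¹⁶, so rounding error swamps the true answer, while the algebraically equivalent two-pass formula stays accurate:

```python
xs = [1e8, 1e8 + 1, 1e8 + 2]  # population variance is exactly 2/3
n = len(xs)
mean = sum(xs) / n

# One-pass: cancels two nearly equal ~1e16 sums -- unstable.
naive = sum(x * x for x in xs) / n - mean * mean

# Two-pass: subtract the mean first so the squared terms stay small.
two_pass = sum((x - mean) ** 2 for x in xs) / n

print(naive, two_pass)  # naive is far from 2/3; two_pass is accurate
```

Both formulas are identical in real-number arithmetic; only their floating-point behavior differs, which is exactly the "mathematical properties of the data and operations" at work.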

Nonetheless, floating-point is very useful for some work where it would be a burden to use integer arithmetic. When the numbers are diverse in magnitude, it is hard to write integer arithmetic that handles them. Either the scaling has to be designed in advance (which limits what data a program can work with) or it has to be managed by the program, which is essentially a reinvention of floating-point.
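A sketch of what that manual scaling looks like (the micrometer scale and variable names are invented for illustration): integer arithmetic works only after a scale is fixed in advance, and every multiplication forces the programmer to track the scale by hand:

```python
SCALE = 1_000_000  # store lengths as integer micrometers, chosen up front

beam_um = 10 * SCALE            # a 10 m beam
offset_um = 250                 # a 250 um adjustment
total_um = beam_um + offset_um  # exact integer addition: fine

# Multiplying two lengths squares the scale, so the result must be
# rescaled by hand -- and every formula needs its own bookkeeping...
area_m2 = (beam_um * beam_um) / (SCALE * SCALE)

# ...which is exactly the significand-and-exponent management that
# floating-point hardware already automates.
print(total_um, area_m2)
```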

Footnotes

¹ IEEE-754 basic 32-bit binary floating-point, which has a sign, an eight-bit exponent, and a 24-bit significand.

² IEEE-754 basic 64-bit binary floating-point, which has a sign, an eleven-bit exponent, and a 53-bit significand.
