
I've been reading the floating point guide to try to clarify some points about floating-point numbers, and I assume Python's decimal library is an implementation of the "Limited-Precision Decimal" format mentioned on the linked page.

It mentions that "Limited-Precision Decimal" is "Basically the same as a IEEE 754 binary floating-point, except that the exponent is interpreted as base 10. As a result, there are no unexpected rounding errors. Also, this kind of format is relatively compact and fast, but usually slower than binary formats."

Is Python decimal implemented the same way? If all else is equal in the representation besides the exponent being interpreted differently, why is it slower and why isn't this representation always preferred over the IEEE 754 implementation? Finally, why does using the exponent as base 10 prevent unexpected rounding errors?

Thanks!


2 Answers


It mentions that "Limited-Precision Decimal" [...] Is Python decimal implemented the same way?

No. Internally, Python's Decimal uses a base-10 exponent along with an arbitrarily large integer coefficient (the significand). Since the size of the integer is unlimited, the potential precision is unlimited too.
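
You can see that representation directly: the documented Decimal.as_tuple() method returns the sign, the digits of the integer coefficient, and the base-10 exponent. For example:

>>> from decimal import Decimal
>>> Decimal('0.2').as_tuple()
DecimalTuple(sign=0, digits=(2,), exponent=-1)
>>> Decimal('123.45').as_tuple()
DecimalTuple(sign=0, digits=(1, 2, 3, 4, 5), exponent=-2)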

Why is Python's Decimal class slower?

There are a few reasons for this. First, adding two Decimal values with different exponents requires repeated multiplication by ten to bring them to a common exponent, and multiplying by ten is more expensive than multiplying by two on a computer that works in binary. Second, doing an exact calculation requires more digits of precision than doing an approximate one. Third, IEEE 754 floating point has hardware acceleration because it's such a common operation.
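
To make the first reason concrete, here is a minimal sketch in plain Python (purely illustrative; it is not how the decimal module is actually implemented) of why adding values with different exponents involves multiplying coefficients by ten:

def add_decimals(a_coeff, a_exp, b_coeff, b_exp):
    # The integer coefficients can only be added once the exponents match,
    # so scale the operand with the larger exponent down by factors of ten.
    while a_exp > b_exp:
        a_coeff *= 10
        a_exp -= 1
    while b_exp > a_exp:
        b_coeff *= 10
        b_exp -= 1
    return a_coeff + b_coeff, a_exp

# 1.23 + 45.6789 is (123, -2) + (456789, -4):
print(add_decimals(123, -2, 456789, -4))  # (469089, -4), i.e. 46.9089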

why isn't this representation always preferred over the IEEE 754 implementation?

Speed is a feature, and not all calculations benefit from being done exactly. Inexact calculation is more widespread than you might think: Excel, for example, uses floating-point numbers internally, yet it has hundreds of millions of users, so evidently you can get pretty far with floating point alone.

Finally, why does using the exponent as base 10 prevent unexpected rounding errors?

The key word in that sentence is "unexpected." You wouldn't be surprised to learn that a base 10 number system can't represent the number 1/3 without rounding it. We understand and are okay with not being able to represent 1/3, 1/7, and 1/9 perfectly accurately. But people are much less accepting of computer systems which can't represent 1/5 accurately.

If you tried to represent 0.2 in binary, you'd get 0.0011(0011), with the 0011 part repeating forever. A floating point number doesn't have an infinite number of bits, so it rounds off everything after 53 bits (assuming double precision) and approximates it.
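
You can see that approximation from Python itself: converting a float to Decimal is exact, so it reveals the value that the rounding to 53 bits actually produced.

>>> from decimal import Decimal
>>> Decimal(0.2)   # the exact value of the binary float nearest to 0.2
Decimal('0.200000000000000011102230246251565404236316680908203125')
>>> (0.2).hex()    # the same value written as a binary (hexadecimal) float
'0x1.999999999999ap-3'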

This is not to say that Decimal is perfectly accurate. There are lots of situations that force rounding. For example, if you took the square root of two, that's an irrational number, and can't be represented as an exact decimal.

Example:

>>> from decimal import Decimal
>>> Decimal(2).sqrt()
Decimal('1.414213562373095048801688724')
>>> Decimal(2).sqrt() ** 2
Decimal('1.999999999999999999999999999')

Decimal is a way of doing math that agrees with the answer you'd get by doing it with pencil and paper. For this, it trades off speed and memory use.


7 Comments

Thanks so much for the detailed answer! A few follow-up questions if you don't mind: So if the exponent is in base 10, we are now interpreting the exponent as 10^1, 10^5, etc. Are we also interpreting the significand/mantissa any differently than before, or is it still 2^-1, 2^-2, and so on?
"First, adding two Decimal values of different exponents requires multiplying by ten, and multiplying by ten is more expensive than multiplying by two on a computer which uses binary." can you clarify this point? Not totally getting why multiplication is involved here in the first place although I assume it's to make the exponents equal to allow the numbers to be added.
"Are we also interpreting the significand/mantissa any differently than before or is it still 2^-1, 2^-2, and so on?" The mantissa is interpreted as an integer times the exponent. The formula is here.
"Not totally getting why multiplication is involved [...] I assume it's to make the exponents equal to allow the numbers to be added" Yes, that's why. "I would think this only comes into play if you actually specify a precision greater than that afforded by the standard floating point numbers?" Yes, but if you specify the same level of precision, reasons 1 and 3 still apply.
"Any chance you have a concrete example showing actual bits of how this would be represented with Decimal" To represent 0.2 with a Decimal object, you would have an integer with a value of 2 and an exponent with a value of -1. I don't know of a calculator to show a bitwise representation of it like a float.

It mentions that "Limited-Precision Decimal" is "Basically the same as a IEEE 754 binary floating-point, except that the exponent is interpreted as base 10. As a result, there are no unexpected rounding errors. Also, this kind of format is relatively compact and fast, but usually slower than binary formats."

Is Python decimal implemented the same way?

No, Python's Decimal is an arbitrary-precision decimal. Conceptually, it's a pair of ints: One storing the significand, and one storing the power of ten. So, for example, Decimal('3.141592653589793238462643383') is stored as the pair of numbers (3141592653589793238462643383, -27), representing the number 3141592653589793238462643383 × 10⁻²⁷.
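
You can confirm that pair with the as_tuple() method, which exposes the coefficient digits and the exponent:

>>> from decimal import Decimal
>>> Decimal('3.141592653589793238462643383').as_tuple()
DecimalTuple(sign=0, digits=(3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3, 2, 3, 8, 4, 6, 2, 6, 4, 3, 3, 8, 3), exponent=-27)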

As stated in your linked page, Decimal “has the ability to increase the length of the significand (possibly also the exponent) as required”.

>>> import decimal
>>> from decimal import Decimal
>>> Decimal(2).sqrt()
Decimal('1.414213562373095048801688724')
>>> decimal.getcontext().prec = 64
>>> Decimal(2).sqrt()
Decimal('1.414213562373095048801688724209698078569671875376948073176679738')

The advantage of this approach is that you can perform calculations to very high precision. The disadvantage is that storage of arbitrary-precision integers requires dynamic memory allocation and the overhead of running garbage collection (or manually calling free in C). Also, the more digits you use in a calculation, the more time it takes.

Limited-precision decimal uses a fixed-width format. For example, you could implement a decimal class with 1 bit for a sign, 50 bits to store a 15-decimal-digit significand, and 13 bits for an exponent (allowing exponents up to about ±4000), which can be stored in a 64-bit machine word. This approach gives less-precise calculations, but uses less memory and is faster compared to arbitrary precision.
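
As a sketch of what such a fixed-width layout could look like (purely illustrative; this is not IEEE 754 decimal64 or the format of any real library), the three fields could be packed into one 64-bit integer:

EXP_BITS, COEFF_BITS = 13, 50
EXP_BIAS = 1 << (EXP_BITS - 1)   # 4096

def pack(sign, coeff, exp):
    # sign: 0 or 1; coeff: up to 15 decimal digits; exp: roughly -4096..4095
    assert 0 <= coeff < 10**15 and -EXP_BIAS <= exp < EXP_BIAS
    return (sign << 63) | ((exp + EXP_BIAS) << COEFF_BITS) | coeff

def unpack(word):
    sign = word >> 63
    exp = ((word >> COEFF_BITS) & ((1 << EXP_BITS) - 1)) - EXP_BIAS
    coeff = word & ((1 << COEFF_BITS) - 1)
    return sign, coeff, exp

# 123.45 stored as coefficient 12345 with exponent -2:
word = pack(0, 12345, -2)
print(hex(word))      # fits in a single 64-bit machine word
print(unpack(word))   # (0, 12345, -2)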

Anyhow, the arithmetic operations (+, -, *, /) on the Decimal class are implemented in such a way as to automatically find a common exponent for addition and subtraction (e.g., treating 1.23 + 45.6789 as 1.2300 + 45.6789 so the decimal places match), add or subtract exponents for multiplication and division, and round the significand if the maximum number of significant digits has been exceeded. This is conveniently hidden from the programmer, but it does take CPU cycles.
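
A quick interactive session shows that behavior: the exponents are handled for you, and results are rounded to the context's precision (28 significant digits by default).

>>> from decimal import Decimal
>>> Decimal('1.5E3') * Decimal('2E-1')   # coefficients multiplied, exponents added
Decimal('3.0E+2')
>>> Decimal(1) / Decimal(7)              # rounded to 28 significant digits
Decimal('0.1428571428571428571428571429')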

why is it slower

See the previous section where I talked about scaling exponents and rounding to a specified number of significant digits. This must be done in software, whereas float calculations are done in hardware (unless you're working on a retrocomputer or embedded system without a floating-point unit). For Python-style arbitrary-precision decimal, there's the additional slowdown of having to dynamically allocate and deallocate memory for the numbers.
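
If you want to measure the gap on your own machine, the standard timeit module makes a quick comparison; the exact numbers depend on your CPU and Python build, so none are quoted here.

import timeit

# Multiply two floats vs. two Decimals one million times each (timeit's
# default number of loops). Float arithmetic runs in hardware; Decimal
# arithmetic runs in software.
float_time = timeit.timeit('x * y', setup='x, y = 1.1, 2.2')
decimal_time = timeit.timeit(
    'x * y',
    setup="from decimal import Decimal; x, y = Decimal('1.1'), Decimal('2.2')")
print(f'float:   {float_time:.3f} s')
print(f'Decimal: {decimal_time:.3f} s')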

and why isn't this representation always preferred over the IEEE 754 implementation?

Because often, having “exact” results is meaningless.

  • Physical measurements have uncertainty. A man who weighs “200 pounds” may actually weigh between 197 and 203 pounds depending on how much food and water he happens to have in his body at the time. Even copies of the standard kilogram, specifically engineered to be as accurate as possible, have tolerances on the order of 100 µg, limiting measurements to 10 significant digits of accuracy, which makes IEEE 754 double precision (53 significant bits ≈ 15.95 significant digits) more than enough.
  • Decimal is only exact for decimal numbers, i.e., rationals whose denominator is a product of 2's and 5's. It can't exactly represent non-decimal fractions like 1/3 or 1/29. If you need exact arbitrary fractions, there's another class for that (fractions.Fraction in the standard library; see the short example after this list).
  • Irrational-valued functions like sqrt, sin, or log can't be stored exactly (unless you have a full computer algebra system), so must be approximated.
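
As promised above, fractions.Fraction stores an exact numerator/denominator pair, so fractions like 1/3 and 1/29 stay exact through arithmetic:

>>> from fractions import Fraction
>>> Fraction(1, 3) + Fraction(1, 29)   # exact: no rounding at any step
Fraction(32, 87)
>>> Fraction(1, 3) * 3
Fraction(1, 1)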

In these cases, there's little practical advantage to Decimal's higher precision. Using floats instead saves memory and CPU time, which can be noticeable if you're working with arrays of millions of numbers.

Finally, why does using the exponent as base 10 prevent unexpected rounding errors?

Using a base-ten exponent doesn't prevent all rounding errors. For example,

>>> Decimal(1) / 3 * 3
Decimal('0.9999999999999999999999999999')
>>> Decimal(2).sqrt() ** 2
Decimal('1.999999999999999999999999999')

However, it prevents the particular class of rounding errors that result from float not exactly representing decimal fractions, e.g., that 0.01 is really 0x147AE147AE147B × 2⁻⁵⁹ = 0.01000000000000000020816681711721685132943093776702880859375. This tends to show up in financial calculations. For example, calculating sales tax (which is 8.25% where I live) on a $2 item:

>>> 2.00 * 0.0825
0.165
>>> round(_, 2)
0.17
>>> Decimal('2.00') * Decimal('0.0825')
Decimal('0.165000')
>>> round(_, 2)
Decimal('0.16')

The first round call is incorrect because 0.165 is really 0.1650000000000000077715611723760957829654216766357421875, which rounds up to 17 cents instead of down to 16 cents (assuming the round-half-even rule) as intended.

