12

I ran the following code to find out the difference between float64 and double in numpy. The result is interesting: the multiplication with type double takes almost twice the time of the multiplication with float64. Could someone shed some light on this?

import time
import numpy as np

datalen = 100000
times = 10000
a = np.random.rand(datalen)
b = np.random.rand(datalen)
da = np.float64(a)
db = np.float64(a)
dda = np.double(a)
ddb = np.double(b)

tic = time.time()
for k in range(times):
    dd = dda * ddb
toc = time.time()
print (toc - tic), 'time taken for double'

tic = time.time()
for k in range(times):
    d = da * db
toc = time.time()
print (toc - tic), 'time taken for float64'
0

3 Answers 3

12
import numpy as np
np.double is np.float64  # returns True

In theory, both should be the same.

I felt my answer was misleading, and I didn't notice the bug mentioned in the other answers. So here is the update:

The benchmark difference wasn't caused by the types used - check out the other answers; it was caused by a bug in the benchmark code.

Regarding types: according to the documentation, it is up to the platform implementation to decide how many bits are used to represent a double: "Double-precision floating-point number type, compatible with Python float and C double." (numpy docs). Hence, on x86_64 it is simply an alias for np.float64; on other platforms that may not be true. Because the choice depends on the system, you benefit from the native performance, but it may introduce portability bugs.

If you run the benchmark with the fix applied, the measured times should be almost the same.
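A sketch of the benchmark with the fix applied, keeping the question's variable names. The original typo was `db = np.float64(a)`; here `db` is built from `b`, and the repetition count is reduced just to keep the sketch quick:

```python
import time
import numpy as np

datalen = 100000
times = 100  # fewer repetitions than the original, just to keep the sketch quick

a = np.random.rand(datalen)
b = np.random.rand(datalen)

da = np.asarray(a, dtype=np.float64)
db = np.asarray(b, dtype=np.float64)   # built from b, fixing the original typo
dda = np.asarray(a, dtype=np.double)
ddb = np.asarray(b, dtype=np.double)

tic = time.time()
for k in range(times):
    dd = dda * ddb
toc = time.time()
t_double = toc - tic
print(t_double, 'time taken for double')

tic = time.time()
for k in range(times):
    d = da * db
toc = time.time()
t_float64 = toc - tic
print(t_float64, 'time taken for float64')
```

With both loops computing a * b, the two timings now measure the same work and should come out nearly identical.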


1 Comment

That is what I expected, but the time taken is almost double.
11

I think you're comparing apples with oranges.

The first benchmark is effectively a * b, but the second is a * a.

I suspect far fewer cache misses for the latter.

2 Comments

Sorry, I didn't see that error in the code - thank you for pointing it out. When I keep it a * b for both cases, the times taken are the same. Thank you.
Cache misses, maybe, but more likely an optimized squaring algorithm in numpy. Squaring can be done in half the time of general multiplication: whether using schoolbook shift-and-add, Karatsuba, or other methods, some nice properties allow simplification. Granted, if it goes straight to hardware, it's unclear whether such an optimization is happening.
7

Many array operations are typically limited only by the time it takes to load the array elements from the RAM into your cache.

So you'd expect the operation to be faster if each value is smaller or if you don't need to load as many elements into your cache.
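The size effect can be sketched like this; float32 is used here purely as a half-width example, and the actual timings depend on your machine:

```python
import time
import numpy as np

n = 10_000_000
a64 = np.random.rand(n)           # float64: 8 bytes per element
a32 = a64.astype(np.float32)      # float32: 4 bytes per element, half the data

for arr in (a64, a32):
    tic = time.time()
    for _ in range(5):
        out = arr * arr           # memory-bound elementwise multiply
    print(arr.dtype, arr.nbytes, 'bytes,', time.time() - tic, 'seconds')
```

On a machine where the multiply is memory-bound, the float32 pass streams half as many bytes and tends to finish faster.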

That's why your timings were different: np.float64 and np.double don't actually make copies, so in the first case the arrays are identical while they are not in the second case:

>>> da is db
True
>>> dda is ddb
False

So the first case can be optimized: you only need to load values from one array instead of two, effectively doubling the memory bandwidth.
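A small sketch of that aliasing effect: multiplying an array by itself streams half as much memory as multiplying two distinct arrays. The `bench` helper and array size here are illustrative choices, and any speedup depends on array size, cache sizes, and hardware:

```python
import time
import numpy as np

n = 10_000_000
a = np.random.rand(n)
b = np.random.rand(n)
alias = a  # no copy: `alias is a` is True, like the accidental da/db pair

def bench(x, y, reps=5):
    tic = time.time()
    for _ in range(reps):
        out = x * y
    return time.time() - tic, out

t_same, out_same = bench(a, alias)   # one array streamed from RAM
t_diff, out_diff = bench(a, b)       # two arrays streamed from RAM

print('a * a:', t_same, 's, a * b:', t_diff, 's')
```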

However, there's another thing to consider: np.float64 and np.double should be identical on most machines, but that's not guaranteed. np.float64 is a fixed-size float type (always 64 bits), while np.double depends on the machine and/or compiler. On most machines doubles are 64 bits, but that's not always guaranteed! So you might see speed differences if your double and float64 have different widths:

>>> np.double(10).itemsize
8
>>> np.float64(10).itemsize
8
>>> np.float64 is np.double
True

1 Comment

da is db returns True because of a coding bug... it's in the line: db = np.float64(a)...
