
Can someone explain why these two operations deliver different results? Does it have to do with some sort of maximum value? I don't mean the difference in timing, but in the calculated result.

    l = list(range(100000000))
    a = np.arange(100000000)

    %time np.sum(a ** 2)
    CPU times: user 132 ms, sys: 217 ms, total: 348 ms
    Wall time: 347 ms
    662921401752298880

    %time sum([x ** 2 for x in l])
    CPU times: user 23.8 s, sys: 1.32 s, total: 25.1 s
    Wall time: 25.1 s
    333333328333333350000000
  • NumPy uses int64, which overflows past a certain limit. Python ints automatically switch to big integers that do not overflow, so the Python list comprehension has the right result (see the sketch after these comments). Commented Feb 13, 2021 at 20:40
  • Then maybe you could try dtype='object' for the np.array (could not try myself). Commented Feb 13, 2021 at 20:52
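
A minimal sketch of what the commenters describe, assuming a platform where np.arange defaults to int64 (the value of n is illustrative, chosen just so the sum of squares exceeds 2**63 - 1):

    import numpy as np

    # assumes np.arange defaults to int64 on this platform
    n = 10_000_000  # small enough to run quickly, large enough to overflow int64

    exact = sum(i * i for i in range(n))      # Python ints: arbitrary precision
    wrapped = int(np.sum(np.arange(n) ** 2))  # int64 arithmetic wraps modulo 2**64

    print(exact)    # 333333283333335000000
    print(wrapped)  # a different, wrapped value

    # The NumPy result is the exact result reduced modulo 2**64 and
    # reinterpreted as a signed 64-bit integer:
    assert (exact - wrapped) % 2**64 == 0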

2 Answers


As pointed out by pLOPeGG, BLimitless and KaPy3141, NumPy's integers can overflow. You can circumvent that by specifying dtype='object' (with a minor speedup over the pure-Python version):

    In [1]: import numpy as np

    In [2]: n = 10_000_000

    In [3]: %timeit np.sum(np.arange(n, dtype='object')**2)
    2.28 s ± 56.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

    In [4]: %timeit sum(i**2 for i in range(n))
    2.6 s ± 534 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

    # also, just for completeness' sake:
    # multiplying with itself is considerably faster than squaring explicitly
    In [5]: %timeit sum(i*i for i in range(n))
    1.29 s ± 242 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

    # fastest thing I could come up with:
    In [6]: %timeit a = np.arange(n, dtype='object'); np.sum(a * a)
    939 ms ± 74.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
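
As a quick sanity check (not part of the original answer): the object-dtype result should match the exact Python sum, since the array elements are plain Python ints that never wrap around:

    import numpy as np

    n = 10_000_000
    a = np.arange(n, dtype='object')  # elements are Python ints, so no wrap-around
    assert np.sum(a * a) == sum(i * i for i in range(n))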

5 Comments

Hi Stefan, thank you for your explanations, I just tried it myself. Do you know why the Python list calculation is even faster than the NumPy calculation?
It's not, though; the versions using NumPy still outperform it.
Usually object-dtype math takes about the same time as lists. Both use the same sort of object referencing, not the faster compiled C methods used for numeric dtypes.
Generally, accessing the object protocol through NumPy slows things down. When you find yourself using dtype=object, it's generally a safe bet to switch to using a list instead.
Thank you all for your answers. For me the list calculation was even faster than NumPy, which is why I was curious. hpaulj's and Mad Physicist's answers make sense to me =)

@robin has it right: NumPy is overflowing its fixed-width integer type. This is a known issue and will hopefully be fixed soon. These links have more info: https://github.com/numpy/numpy/issues/8987 and https://github.com/numpy/numpy/issues/10964
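
A small sketch illustrating the silent wrap-around those issues discuss (the exact warning behaviour varies by NumPy version; the value 3_037_000_500 is chosen just above sqrt(2**63)):

    import numpy as np

    a = np.array([3_037_000_500], dtype=np.int64)  # just above sqrt(2**63)
    print(a * a)           # wraps around silently to a negative value
    print(int(a[0]) ** 2)  # Python int gives the true result: 9223372037000250000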

