0

I'm using python and apparently the slowest part of my program is doing simple additions on float variables.

It takes about 35seconds to do around 400,000,000 additions/multiplications.

I'm trying to figure out what is the fastest way I can do this math.

This is how the structure of my code looks like. Example (dummy) code:

def func(x, y, z): loop_count = 30 a = [0,1,2,3,4,5,6,7,8,9,10,11,12,...35 elements] b = [0,11,22,33,44,55,66,77,88,99,1010,1111,1212,...35 elements] p = [0,0,0,0,0,0,0,0,0,0,0,0,0,...35 elements] for i in range(loop_count - 1): c = p[i-1] d = a[i] + c * a[i+1] e = min(2, a[i]) + c * b[i] f = e * x g = y + d * c .... and so on p[i] = d + e + f + s + g5 + f4 + h7 * t5 + y8 return sum(p) 

func() is called about 200k times. The loop_count is about 30. And I have ~20 multiplications and ~45 additions and ~10 uses of min/max

I was wondering if there is a method for me to declare all these as ctypes.c_float and do addition in C using stdlib or something similar ?

Note that the p[i] calculated at the end of the loop is used as c in the next loop iteration. For iteration 0, it just uses p[-1] which is 0 in this case.

My constraints:

  • I need to use python. While I understand plain math would be faster in C/Java/etc. I cannot use it due to a bunch of other things I do in python which cannot be done in C in this same program.
  • I tried writing this with cython, but it caused a bunch of issues with the environment I need to run this in. So, again - not an option.
23
  • 1
    numpy is not about parallelization. Are you sure you understand you constraints? Commented Sep 18, 2018 at 3:45
  • Note that your code will produce an IndexError in the first iteration of the for loop because of p[i-1] with i of 0 value. Commented Sep 18, 2018 at 3:47
  • @blhsing It would use the last index of p - which in the first iteration is 0. And in the second iteration is the p[0] which is set in the last line Commented Sep 18, 2018 at 3:48
  • @AbdealiJK Ah indeed. Had a brain malfunction moment. Commented Sep 18, 2018 at 3:49
  • @StephenRauch - removed it to see what answers come about, what i meant to say was that this cannot be modified to become vector math. But maybe my understanding of numpy's capabilities is limited and I am not able to see an easy way to use it. Commented Sep 18, 2018 at 3:50

1 Answer 1

0

I think you should consider using numpy. You did not mention any constraint.

Example case of a simple dot operation (x.y)

import datetime import numpy as np x = range(0,10000000,1) y = range(0,20000000,2) for i in range(0, len(x)): x[i] = x[i] * 0.00001 y[i] = y[i] * 0.00001 now = datetime.datetime.now() z = 0 for i in range(0, len(x)): z = z+x[i]*y[i] print "handmade dot=", datetime.datetime.now()-now print z x = np.arange(0.0, 10000000.0*0.00001, 0.00001) y = np.arange(0.0, 10000000.0*0.00002, 0.00002) now = datetime.datetime.now() z = np.dot(x,y) print 'numpy dot =',datetime.datetime.now()-now print z 

outputs

handmade dot= 0:00:02.559000 66666656666.7 numpy dot = 0:00:00.019000 66666656666.7 

numpy is more than 100x times faster.

The reason is that numpy encapsulates a C library that does the dot operation with compiled code. In the full python you have a list of potentially generic objects, casting, ...

Sign up to request clarification or add additional context in comments.

3 Comments

While I completely agree with that, i do not have a dot operation here. In the current method, all I can do is do simple + and - with np.float -> which was 3 times slower than doing it with normal python floats PS: I did have a constraint that numpy/numba wont work as i cant vectorize this math, but @stephen Raunch's first comment made me revert that because I assumed maybe I missed something
I réalize that you don't have a dot opération but i didn't have a simpler example. And you are not saying everything because judging from thé code you posted i don't sée why it can't bé vectorize. Post thé code you think can't bé vectorized and we'll discuss it then. Sounds good ?
In the above example, the p[i] is being calculated in the last line of the For loop. And p[i-1] is being used in the first line of the For loop. And all the calculations internally do not have a closed form expression as it has mins and max. In such a case, I can never compute the next for-loop because p[i-1] needs the previous for loop to execute. Hence IMO not vectorizable

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.