
Is there a speedy way to do this:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([1, 2])
c = np.array([a, b])
result = magic(c)
```

where magic() is the functionality I want, and result should be np.array([10, 3]), i.e. a numpy array containing the sum of each of the input arrays.

  • Why not sums = [sum(arr) for arr in [a, b]]? Commented Sep 18, 2015 at 23:07
  • Are a, b, etc. longer or shorter than c, typically? Is c already created, or can we bypass that step? Commented Sep 18, 2015 at 23:29
  • Well, I was hoping for a nice numpy implementation that avoids loops. @Oliver W. tells me that operations on unequal-length arrays are not efficient in numpy, so maybe the answer is to think of a different way to store my data... Commented Sep 18, 2015 at 23:32
  • @askewchan a and b can vary between length 1 and a few hundred. c typically has a length of a few tens, and c is already created. Commented Sep 18, 2015 at 23:41
  • np.add.reduceat(d, ind) is a fast way of summing uneven blocks of the array d. But constructing d (np.hstack(c)) and ind takes time. Commented Sep 19, 2015 at 0:11
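The reduceat idea from the last comment can be sketched on the question's own data; d and ind are the names used in that comment (flattened values and block start offsets):

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([1, 2])

# Flatten the ragged collection, then sum each block at its start offset.
d = np.hstack([a, b])        # [1, 2, 3, 4, 1, 2]
ind = np.array([0, len(a)])  # each block starts where the previous one ends
result = np.add.reduceat(d, ind)
print(result)                # [10  3]
```

np.add.reduceat sums d[ind[k]:ind[k+1]] for each k (and through the end of d for the last offset), which is exactly the per-sub-array sum the question asks for.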

2 Answers


Here's a compilation of the suggestions (answers and comments) and their timings:

```python
import numpy as np

# dtype=object is required for ragged arrays in recent NumPy versions
c = np.array([np.random.rand(np.random.randint(1, 300)) for i in range(50)],
             dtype=object)

def oliver(arr):
    res = np.empty_like(arr)
    for enu, subarr in enumerate(arr):
        res[enu] = np.sum(subarr)
    return res

def reut(arr):
    return np.array([a.sum() for a in arr])

def hpaulj(arr):
    d = np.concatenate(arr)
    l = np.array([len(a) for a in arr])  # block lengths (a list, not a map
                                         # object, so it works on Python 3)
    i = np.cumsum(l) - l                 # block start offsets
    return np.add.reduceat(d, i)
```

And their times:

```
In [94]: timeit oliver(c)
1000 loops, best of 3: 457 µs per loop

In [95]: timeit reut(c)
1000 loops, best of 3: 317 µs per loop

In [96]: timeit hpaulj(c)
10000 loops, best of 3: 94.4 µs per loop
```

It was somewhat tricky to implement @hpaulj's suggestion, but I think I got it (and it is the fastest of the three if you use concatenate instead of hstack).
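As a self-contained sanity check (not part of the original answer), the reduceat-based approach reproduces the expected result on the small example from the question:

```python
import numpy as np

# The question's data, stored as an object array of unequal-length arrays.
c = np.array([np.array([1, 2, 3, 4]), np.array([1, 2])], dtype=object)

d = np.concatenate(c)              # flattened values: [1, 2, 3, 4, 1, 2]
l = np.array([len(a) for a in c])  # block lengths:    [4, 2]
i = np.cumsum(l) - l               # start offsets:    [0, 4]
print(np.add.reduceat(d, i))       # [10  3]
```

The cumsum-minus-lengths trick converts block lengths into the start index of each block, which is the index array reduceat needs.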



Of the many possible solutions, this is one:

```python
import numpy as np

def your_magic(arr):
    res = np.empty_like(arr)
    for enu, subarr in enumerate(arr):
        res[enu] = np.sum(subarr)
    return res
```

Be mindful, though, that building a numpy array out of unequal-length arrays is not efficient at all; it is essentially equivalent to storing the arrays in a plain Python list. This is why the returned array res in the function above will in general have dtype object.
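A short illustration of the point above (the exact native integer dtype in the comparison case is platform-dependent, e.g. int64 on most systems):

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([1, 2])

# Unequal-length rows cannot form a contiguous 2-D block, so NumPy stores
# references to Python objects instead (dtype=object is required explicitly
# in recent NumPy versions).
c = np.array([a, b], dtype=object)
print(c.dtype)                            # object

# Equal-length rows, by contrast, give a contiguous numeric array.
print(np.array([[1, 2], [3, 4]]).dtype)   # a native integer dtype
```

Operations on the object array fall back to per-element Python calls, which is why the loop-based solutions above cannot beat a hand-rolled list comprehension by much.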

Comments

  • Thanks! When you say "not efficient", are we talking memory, speed, or both?
  • @fen Speed: numpy was designed for computing on contiguous blocks of memory. By forcing it into a list of objects, you're adding unnecessary housekeeping under the hood. Have a look at this thread, in particular the first comment and first post. @Reut's comment on your post is also an elegant way to define your magic function (although a case could be made for using the numpy method sum rather than the Python builtin).
