
As part of a batch Euclidean distance computation, I'm computing

(X * X).sum(axis=1) 

where X is a rather large 2-d array. This works fine, but it constructs a temporary array of the same size as X. Is there any way to get rid of this temporary, but retain the efficiency of a vectorized operation?
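
For concreteness, here is a minimal sketch of the baseline (the shape and data below are placeholders, not from the actual problem):

import numpy as np

X = np.random.rand(10000, 512)   # stand-in for the large 2-d array
sq_norms = (X * X).sum(axis=1)   # X * X allocates a temporary the size of X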

The obvious candidate,

np.array([np.dot(row, row) for row in X]) 

works, but uses a Python list as a temporary, making it rather slow.
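
One variant that at least avoids the list temporary is np.fromiter over a generator; as a sketch, it still pays for the Python-level loop on each row, so it is only a partial fix:

np.fromiter((np.dot(row, row) for row in X), dtype=X.dtype, count=len(X))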

Without the axis, the memory-efficient form would be

(X * X).sum() => np.dot(X.ravel(), X.ravel()) 

and I know that, when axis=1, it's equivalent to

np.diag(np.dot(X, X.T)) 

which got me looking into generalizations of dot such as np.inner, np.tensordot and np.einsum, but I can't figure out how they would solve my problem.
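
A quick check (the shape here is a placeholder) confirms both equivalences, though note that np.diag(np.dot(X, X.T)) is even worse memory-wise, since it materializes the full n-by-n product just to read its diagonal:

import numpy as np

X = np.random.rand(100, 20)
assert np.allclose((X * X).sum(), np.dot(X.ravel(), X.ravel()))
assert np.allclose((X * X).sum(axis=1), np.diag(np.dot(X, X.T)))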

  • You should check the following: stackoverflow.com/questions/17527340/… For large arrays, the memory layout apparently becomes very important.
  • @usethedeathstar: I'm aware of that; Euclidean distance is a dot product plus the sum of two of the above operations, and the layout is already optimized for the dot product.

1 Answer


The einsum equivalent is:

np.einsum('ij,ij->i', X, X) 

I am not sure how it works internally, though, so it may or may not solve your problem.
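
A quick sanity check against the original expression, plus a variant that writes the result into a preallocated array via einsum's out= argument (the shape is a placeholder):

import numpy as np

X = np.random.rand(1000, 50)
out = np.empty(X.shape[0])
np.einsum('ij,ij->i', X, X, out=out)    # no X-sized temporary is built
assert np.allclose(out, (X * X).sum(axis=1))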


Comments

A first test shows that it's extremely fast. I'll do some memory profiling.
Einsum is probably one of the best functions numpy has for handling large arrays. It is compiled entirely in C and takes advantage of SSE2, which numpy's ufuncs will not use until 1.8. It also tries to avoid holding large numpy intermediates in memory. Source code is here.
The other alternative is inner1d: from numpy.core.umath_tests import inner1d; inner1d(X, X). But np.einsum is faster most of the time, or on most systems (see the sketch after these comments).
@Jaime: is that even supposed to be a public function?
I think it is still experimental, but it is here to stay. inner1d and matrix_multiply are the first two instances of gufuncs, where the 'g' is for 'generalized'; here's what the API docs say. In numpy 1.8 there will be a whole batch of linear algebra gufuncs, including those two; browse the PR to see what will be added. The best thing about gufuncs is that they make it really, really easy to extend numpy with C when you are dealing with more complicated algorithms.
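
A sketch comparing the two approaches from these comments; numpy.core.umath_tests is a semi-private module that is deprecated or absent in newer NumPy releases, hence the guarded import:

import numpy as np

X = np.random.rand(2000, 300)            # placeholder shape
ref = np.einsum('ij,ij->i', X, X)

try:
    from numpy.core.umath_tests import inner1d   # gufunc with signature (i),(i)->()
    assert np.allclose(inner1d(X, X), ref)
except ImportError:
    pass                                         # not available in this NumPy build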