As part of a batch Euclidean distance computation, I'm computing
(X * X).sum(axis=1)
where X is a rather large 2-D array. This works fine, but it constructs a temporary array of the same size as X. Is there any way to get rid of this temporary while retaining the efficiency of a vectorized operation?
The obvious candidate,
np.array([np.dot(row, row) for row in X])
works, but it builds a Python list as a temporary, which makes it rather slow.
Without the axis, the memory-efficient form would be
(X * X).sum() => np.dot(X.ravel(), X.ravel())
and I know that, when axis=1, it's equivalent to
np.diag(np.dot(X, X.T))
which got me looking into generalizations of dot such as np.inner, np.tensordot and np.einsum, but I can't figure out how they would solve my problem.
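For reference, here is a minimal sketch that checks the equivalences mentioned above on a small random array. The array shape, the seed, and the variable names are assumptions made for illustration only. The last candidate uses np.einsum with the subscripts 'ij,ij->i', which sums X[i, j] * X[i, j] over j for each row; it is offered as one possible way to express the row-wise sum of squares, not as a measured or definitive solution.

```python
import numpy as np

# Small stand-in for the large 2-D array (shape and seed are arbitrary).
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))

# Original expression: allocates a temporary with the same shape as X.
baseline = (X * X).sum(axis=1)

# Row-by-row dot products: correct, but loops at the Python level.
by_rows = np.array([np.dot(row, row) for row in X])

# Diagonal of the Gram matrix: correct, but builds an n-by-n intermediate.
via_diag = np.diag(np.dot(X, X.T))

# einsum candidate: multiply matching elements and sum over j for each row i.
via_einsum = np.einsum('ij,ij->i', X, X)

# The no-axis case from the question, for comparison.
total = np.dot(X.ravel(), X.ravel())

print(np.allclose(baseline, by_rows))
print(np.allclose(baseline, via_diag))
print(np.allclose(baseline, via_einsum))
print(np.isclose(total, (X * X).sum()))
```

All four checks should agree up to floating-point rounding. Whether the einsum form is actually faster or lighter on memory for a given array would need to be timed on the real data.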