NumPy: How to avoid this loop?

Question

Is there a way to avoid this loop in order to optimize the code?

    import numpy as np

    cLoss = 0
    dist_ = np.array([0,1,0,1,1,0,0,1,1,0])  # just an example, longer in reality
    TLabels = np.array([-1,1,1,1,1,-1,-1,1,-1,-1])  # just an example, longer in reality
    t = float(dist_.size)
    for i in range(len(dist_)):
        labels = TLabels[dist_ == dist_[i]]
        cLoss += 1 - TLabels[i]*(1. * np.sum(labels)/t)
    print(cLoss)

Note: dist_ and TLabels are both numpy arrays with the same shape (t,1)

Well, I believe it's correct: TLabels[dist_ == dist_[i]] returns the values of TLabels at the indices where dist_ == dist_[i]. For example, let dist_ = array([2,1,2]) and TLabels = array([1,2,3]); then dist_ == dist_[0] returns array([True,False,True]), so TLabels[dist_ == dist_[0]] = array([1,3]).
– farhawa, Commented Jun 7, 2015 at 11:01
Just to be clear, are the arrays (t,1) or (t,)? Where is cLoss initialized?
– hpaulj, Commented Jun 7, 2015 at 18:00
You need to turn this into a full running (cut and paste) example, with output. Otherwise we won't take it seriously.
– hpaulj, Commented Jun 7, 2015 at 18:09
Is cLoss initially 0 or []? And why the return? You aren't defining a function.
– hpaulj, Commented Jun 7, 2015 at 19:41
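
The boolean-mask indexing described in the first comment can be checked directly. This is only a small sketch using the arrays from that comment; `mask` and `selected` are names introduced here for illustration.

```python
import numpy as np

# Small arrays taken from the comment above.
dist_ = np.array([2, 1, 2])
TLabels = np.array([1, 2, 3])

mask = dist_ == dist_[0]   # elementwise comparison gives a boolean array
selected = TLabels[mask]   # keeps the TLabels entries where mask is True

print(mask.tolist())       # [True, False, True]
print(selected.tolist())   # [1, 3]
```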

Thomas Baruchel · Accepted Answer · 2015-06-07 21:02:43Z

I am not sure exactly what you want to do, but are you aware of scipy.ndimage.measurements for computing on arrays with labels? It looks like you want something like:

    import scipy.ndimage

    cLoss = len(dist_) - sum(TLabels * scipy.ndimage.measurements.sum(TLabels, dist_, dist_) / len(dist_))
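
For readers without SciPy at hand, the same per-group sum can probably be sketched with NumPy alone. This is not the answer's own code: `inverse` and `group_sums` are names introduced here, the arrays are the example from the question, and the sketch assumes 1-D inputs (call `.ravel()` first if the arrays have shape (t,1)).

```python
import numpy as np

dist_ = np.array([0, 1, 0, 1, 1, 0, 0, 1, 1, 0])
TLabels = np.array([-1, 1, 1, 1, 1, -1, -1, 1, -1, -1])
t = float(dist_.size)

# inverse[i] is the position of dist_[i] among the sorted unique values
_, inverse = np.unique(dist_, return_inverse=True)

# group_sums[k] is the sum of TLabels over the k-th distinct dist_ value
group_sums = np.bincount(inverse, weights=TLabels)

# hand each group's sum back to its members, as the loop does one by one
cLoss = np.sum(1 - TLabels * group_sums[inverse] / t)
print(cLoss)  # approximately 8.2 for this example
```

Using `return_inverse` rather than indexing by `dist_` directly keeps the sketch valid when `dist_` holds arbitrary values instead of small non-negative integers.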

hpaulj · Accepted Answer · 2015-06-07 20:57:28Z

I first wonder, what is labels at each step in the loop?

With dist_ = array([2,1,2]) and TLabels=array([1,2,3])

I get

    [-1 1]
    [1]
    [-1 1]

The different lengths immediately raise a warning flag: it may be difficult to vectorize this.

With the longer arrays in the edited example

    [-1 1 -1 -1 -1]
    [ 1 1 1 1 -1]
    [-1 1 -1 -1 -1]
    [ 1 1 1 1 -1]
    [ 1 1 1 1 -1]
    [-1 1 -1 -1 -1]
    [-1 1 -1 -1 -1]
    [ 1 1 1 1 -1]
    [ 1 1 1 1 -1]
    [-1 1 -1 -1 -1]

The labels vectors are all the same length. Is that normal, or just a coincidence of values?

Drop a couple of elements off of dist_, and labels are:

    In [375]: for i in range(len(dist_)):
       .....:     labels = TLabels[dist_ == dist_[i]]
       .....:     v = (1.*np.sum(labels)/t); v1 = 1-TLabels[i]*v
       .....:     print(labels, v, TLabels[i], v1)
       .....:     cLoss += v1
       .....:
    (array([-1, 1, -1, -1]), -0.25, -1, 0.75)
    (array([1, 1, 1, 1]), 0.5, 1, 0.5)
    (array([-1, 1, -1, -1]), -0.25, 1, 1.25)
    (array([1, 1, 1, 1]), 0.5, 1, 0.5)
    (array([1, 1, 1, 1]), 0.5, 1, 0.5)
    (array([-1, 1, -1, -1]), -0.25, -1, 0.75)
    (array([-1, 1, -1, -1]), -0.25, -1, 0.75)
    (array([1, 1, 1, 1]), 0.5, 1, 0.5)

Again different lengths of labels, but really only a few calculations: there is one v value for each distinct dist_ value.

Without working out all the details, it looks like you are just calculating labels*labels for each distinct dist_ value, and then summing those.
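
That observation can be made a little more concrete. Every element of a group is multiplied by the same group sum, so the loop's total appears to reduce to t minus the sum of the squared group sums divided by t. The sketch below checks this against the question's loop on the question's example; `loop_loss` and `closed_form_loss` are hypothetical helper names, not part of the original post.

```python
import numpy as np

def loop_loss(dist_, TLabels):
    """Reference implementation: the loop from the question."""
    t = float(dist_.size)
    cLoss = 0.0
    for i in range(len(dist_)):
        labels = TLabels[dist_ == dist_[i]]
        cLoss += 1 - TLabels[i] * (np.sum(labels) / t)
    return cLoss

def closed_form_loss(dist_, TLabels):
    """t - sum_g S_g**2 / t, where S_g is the TLabels sum of group g."""
    t = float(dist_.size)
    # one pass per distinct dist_ value instead of one per element
    group_sums = np.array([TLabels[dist_ == g].sum() for g in np.unique(dist_)])
    return t - np.sum(group_sums ** 2) / t

dist_ = np.array([0, 1, 0, 1, 1, 0, 0, 1, 1, 0])
TLabels = np.array([-1, 1, 1, 1, 1, -1, -1, 1, -1, -1])
print(loop_loss(dist_, TLabels), closed_form_loss(dist_, TLabels))
```

Both calls should agree up to floating-point rounding, since the two group sums here are -3 and 3.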

This looks like a group-by problem. You want to divide dist_ into groups with a common value, and sum some function of their corresponding TLabels values. Python's itertools has a groupby function, and so does pandas. itertools.groupby only groups consecutive equal items, so it requires dist_ to be sorted first; pandas groupby does not.

Try sorting dist_ and see if that adds any clarity to the problem.
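
A minimal sketch of the sort-then-group idea with `itertools.groupby`, using the example arrays from the question. The names `order`, `pairs`, and `s` are introduced here for illustration; this is one possible reading of the suggestion, not a definitive implementation.

```python
import itertools
import numpy as np

dist_ = np.array([0, 1, 0, 1, 1, 0, 0, 1, 1, 0])
TLabels = np.array([-1, 1, 1, 1, 1, -1, -1, 1, -1, -1])
t = float(dist_.size)

# Sort both arrays by dist_ so that equal values sit next to each other,
# which is what itertools.groupby needs.
order = np.argsort(dist_, kind="stable")
pairs = zip(dist_[order].tolist(), TLabels[order].tolist())

cLoss = 0.0
for value, group in itertools.groupby(pairs, key=lambda p: p[0]):
    labels = [label for _, label in group]
    s = sum(labels)  # one sum per distinct dist_ value
    # each member of the group contributes 1 - label * s / t
    cLoss += sum(1 - label * s / t for label in labels)

print(cLoss)  # approximately 8.2 for this example
```

After sorting, the group structure is explicit: one sum per distinct dist_ value, reused for every member of that group.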

percusse · Accepted Answer · 2015-06-07 20:43:06Z

I'm not sure if this is any better, since I didn't exactly understand why you might want to do this. Many variables in your loop take only two values, so they can be computed in advance.

Also the entries of dist_ can be used as a boolean switch but I used an explicit copy anyhow.

    import numpy as np

    dist_ = np.array([0,1,0,1,1,0,0,1,1,0])
    TLabels = np.array([-1,1,1,1,1,-1,-1,1,-1,-1])
    t = len(dist_)
    dist_zeros = dist_ == 0
    # one_zero_sum[0] is for dist_ == 0, one_zero_sum[1] for dist_ == 1
    one_zero_sum = [sum(TLabels[dist_zeros])/t, sum(TLabels[~dist_zeros])/t]
    cLoss = sum([1-x*one_zero_sum[dist_[y]] for y,x in enumerate(TLabels)])

which results in cLoss = 8.2. I am using Python 3, so I didn't check whether this is true division in Python 2.
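
The list comprehension above could itself be replaced by integer-array indexing, assuming dist_ contains only 0 and 1 as in the example. In this sketch `lookup` is a name introduced here, and t is made a float as an assumption so that the division is true division on Python 2 as well.

```python
import numpy as np

dist_ = np.array([0, 1, 0, 1, 1, 0, 0, 1, 1, 0])
TLabels = np.array([-1, 1, 1, 1, 1, -1, -1, 1, -1, -1])
t = float(len(dist_))  # float, so the division below is never integer division

dist_zeros = dist_ == 0
# lookup[0] is the scaled sum for dist_ == 0, lookup[1] for dist_ == 1
lookup = np.array([TLabels[dist_zeros].sum(), TLabels[~dist_zeros].sum()]) / t

# dist_ indexes the lookup table, giving one scaled sum per element
cLoss = np.sum(1 - TLabels * lookup[dist_])
print(cLoss)  # approximately 8.2 for this example
```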
