
Suppose I have a set of measurements that were obtained by varying two parameters, knob_1 and knob_2 (in practice there are a lot more):

    import numpy as np
    import pandas as pd

    data = np.empty((6, 3), dtype=float)  # the np.float alias was removed in NumPy 1.24
    data[:, 0] = [3, 4, 5, 3, 4, 5]
    data[:, 1] = [1, 1, 1, 2, 2, 2]
    data[:, 2] = np.random.random(6)
    df = pd.DataFrame(data, columns=['knob_1', 'knob_2', 'signal'])

i.e., df is

       knob_1  knob_2    signal
    0       3       1  0.076571
    1       4       1  0.488965
    2       5       1  0.506059
    3       3       2  0.415414
    4       4       2  0.771212
    5       5       2  0.502188

Now, considering each parameter on its own, I want to find the minimum value that was measured for each setting of this parameter (ignoring the settings of all other parameters). The pedestrian way of doing this is:

    new_index = []
    new_data = []
    for param in df.columns:
        if param == 'signal':
            continue
        group = df.groupby(param)['signal'].min()
        for (k, v) in group.items():
            new_index.append((param, k))
            new_data.append(v)

    new_index = pd.MultiIndex.from_tuples(new_index, names=('parameter', 'value'))
    df2 = pd.Series(index=new_index, data=new_data)

The resulting df2 is (from a different random draw than the table above):

    parameter  value
    knob_1     3        0.495674
               4        0.277030
               5        0.398806
    knob_2     1        0.485933
               2        0.277030
    dtype: float64

Is there a better way to do this, in particular to get rid of the inner loop?

It seems to me that the result of the df.groupby operation already has everything I need - if only there was a way to somehow create a MultiIndex from it without going through the list of tuples.

1 Answer


Use the keys argument of pd.concat():

    pd.concat([df.groupby('knob_1')['signal'].min(),
               df.groupby('knob_2')['signal'].min()],
              keys=['knob_1', 'knob_2'],
              names=['parameter', 'value'])
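With many knobs, the same call can be built from the column list instead of being written out by hand. The following is only a sketch under a few assumptions: every column other than 'signal' is a parameter, and the names `params`, `by_concat` and `by_melt` are made up for illustration. The signal values are fixed (copied from the example table in the question) so that the result is reproducible. A `melt`-based variant is included as a possible alternative that avoids the explicit loop entirely.

```python
import pandas as pd

# Fixed signal values (taken from the example table) so the output is reproducible.
df = pd.DataFrame({
    'knob_1': [3, 4, 5, 3, 4, 5],
    'knob_2': [1, 1, 1, 2, 2, 2],
    'signal': [0.076571, 0.488965, 0.506059, 0.415414, 0.771212, 0.502188],
})

# Assumption: every column except the measured one is a parameter.
params = [c for c in df.columns if c != 'signal']

# One groupby per parameter; concat stacks the results under an outer
# 'parameter' level, so no list of tuples is needed.
by_concat = pd.concat(
    [df.groupby(p)['signal'].min() for p in params],
    keys=params,
    names=['parameter', 'value'],
)

# Alternative: reshape to long form (one row per parameter/value pair),
# then group once on both levels.
by_melt = (
    df.melt(id_vars='signal', var_name='parameter', value_name='value')
      .groupby(['parameter', 'value'])['signal']
      .min()
)

print(by_concat)
```

The `concat` version keeps the parameters in column order, whereas the `melt` version sorts them by name because `groupby` sorts its keys. `melt` also puts all parameter values into a single column, so it is probably best suited to parameters that share a dtype.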