Suppose I have a set of measurements that were obtained by varying two parameters, knob_1 and knob_2 (in practice there are a lot more):
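
    import numpy as np
    import pandas as pd

    # np.float was removed in recent NumPy releases; the builtin float is equivalent
    data = np.empty((6, 3), dtype=float)
    data[:, 0] = [3, 4, 5, 3, 4, 5]
    data[:, 1] = [1, 1, 1, 2, 2, 2]
    data[:, 2] = np.random.random(6)
    df = pd.DataFrame(data, columns=['knob_1', 'knob_2', 'signal'])

i.e., df is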

       knob_1  knob_2    signal
    0       3       1  0.076571
    1       4       1  0.488965
    2       5       1  0.506059
    3       3       2  0.415414
    4       4       2  0.771212
    5       5       2  0.502188

Now, considering each parameter on its own, I want to find the minimum value that was measured for each setting of this parameter (ignoring the settings of all other parameters). The pedestrian way of doing this is:

    new_index = []
    new_data = []
    for param in df.columns:
        if param == 'signal':
            continue
        group = df.groupby(param)['signal'].min()
        for (k, v) in group.items():
            new_index.append((param, k))
            new_data.append(v)

    new_index = pd.MultiIndex.from_tuples(new_index, names=('parameter', 'value'))
    df2 = pd.Series(index=new_index, data=new_data)

The resulting df2 is:

    parameter  value
    knob_1     3        0.495674
               4        0.277030
               5        0.398806
    knob_2     1        0.485933
               2        0.277030
    dtype: float64

Is there a better way to do this, in particular to get rid of the inner loop?
It seems to me that the result of the df.groupby operation already has everything I need - if only there were a way to create a MultiIndex from it without going through the list of tuples.
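
To make the idea concrete, here is a rough sketch of the kind of thing I have in mind. I am assuming that pd.concat with keys and names can stack the per-parameter groupby results and label the two index levels; it may not be the best way to do this. The signal values are the ones from the table above, hard-coded so the example is reproducible, and the names params and result are only for illustration.

```python
import pandas as pd

# Same layout as df above, with the signal column fixed instead of random
df = pd.DataFrame({
    "knob_1": [3, 4, 5, 3, 4, 5],
    "knob_2": [1, 1, 1, 2, 2, 2],
    "signal": [0.076571, 0.488965, 0.506059, 0.415414, 0.771212, 0.502188],
})

# Every column except the measured signal is treated as a parameter
params = [c for c in df.columns if c != "signal"]

# One groupby per parameter; concat prepends the parameter name as an
# outer index level, so no list of tuples has to be built by hand
result = pd.concat(
    [df.groupby(p)["signal"].min() for p in params],
    keys=params,
    names=["parameter", "value"],
)
print(result)
```

This still iterates over the parameters, but the inner loop over the group items is gone.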