I have a df as follows:

    ContextID  EscAct_Curr_A  StepID
    7289973    0.122100122    1
    7289973    0              2
    7289973    0              2
    7289973    0.122100122    2
    7289973    0.122100122    2
    7289973    0.122100122    2
    7289973    0.122100122    2
    7289999    0.244200245    1
    7289999    0.244200245    1
    7289999    0.244200245    1
    7289999    0.244200245    2
    7289999    0.366300374    2
    7289999    0.366300374    2
    7289999    0.366300374    2
    7290025    0.122100122    1
    7290025    0.122100122    1
    7290025    0.122100122    2
    7290025    0              2
    7290025    0              2
    7290025    0.122100122    2

What I want to do is group the rows by StepID and store each group in its own df.
For instance, all the rows with StepID 1 should be saved to one df, say s1, all the rows with StepID 2 to another df, say s2, and so on. I have 24 such StepIDs. Once that is done, I want to run a machine learning algorithm on each df and draw a scatter plot of the results.
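To make the split concrete, here is a minimal sketch of one way it could be done with `groupby`, assuming the df is called `X` as in the code further down. The dictionary name `frames` is made up for illustration, and the frame holds only a few rows copied from the sample above.

```python
import pandas as pd

# A few rows copied from the sample data, same column names.
X = pd.DataFrame({
    "ContextID": [7289973, 7289973, 7289999, 7289999, 7290025],
    "EscAct_Curr_A": [0.122100122, 0.0, 0.244200245, 0.366300374, 0.122100122],
    "StepID": [1, 2, 1, 2, 2],
})

# One sub-frame per StepID, keyed by the StepID value.
# .copy() makes each sub-frame independent of X, so later column
# assignments do not trigger SettingWithCopyWarning.
frames = {step_id: group.copy() for step_id, group in X.groupby("StepID")}

s1 = frames[1]  # all rows with StepID 1
s2 = frames[2]  # all rows with StepID 2
```

A dictionary keyed by StepID avoids having to create 24 separately named variables.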
What I have done:
For StepID 1:

    s1 = X.loc[X['StepID'] == 1]
    s1_array = s1.iloc[:,1].values.astype(float).reshape(-1,1)
    min_max_scaler = preprocessing.MinMaxScaler()
    scaled_array_s1 = min_max_scaler.fit_transform(s1_array)
    s1.iloc[:,1]=scaled_array_s1
    ocsvm = OneClassSVM(nu = 0.1, kernel = 'rbf', gamma = 'scale')
    s1['y_ocsvm1'] = ocsvm.fit_predict(s1.values[:,[1]])

For StepID 2:

    s2 = X.loc[X['StepID'] == 2]
    s2_array = s2.iloc[:,1].values.astype(float).reshape(-1,1)
    min_max_scaler = preprocessing.MinMaxScaler()
    scaled_array_s2 = min_max_scaler.fit_transform(s2_array)
    s2.iloc[:,1]=scaled_array_s2
    ocsvm = OneClassSVM(nu = 0.1, kernel = 'rbf', gamma = 'scale')
    s2['y_ocsvm2'] = ocsvm.fit_predict(s2.values[:,[1]])

Plotting the scatter plot:

    fig, ax = plt.subplots()
    ax.scatter(s1.values[s1['y_ocsvm1'] == 1, 2], s1.values[s1['y_ocsvm1'] == 1, 1], c = 'green', label = 'Normal')
    ax.scatter(s1.values[s1['y_ocsvm1'] == -1, 2], s1.values[s1['y_ocsvm1'] == -1, 1], c = 'red', label = 'Outlier')
    ax.scatter(s2.values[s2['y_ocsvm2'] == 1, 2], s2.values[s2['y_ocsvm2'] == 1, 1], c = 'green')
    ax.scatter(s2.values[s2['y_ocsvm2'] == -1, 2], s2.values[s2['y_ocsvm2'] == -1, 1], c = 'red')
    plt.legend()

This code does exactly what I want, but writing it out for 24 different StepIDs is very tedious. So I would like to know whether there is a more compact way to achieve this, perhaps with a loop or a function.
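For reference, the kind of compact version I have in mind might look roughly like the sketch below. It is only an illustration under assumptions: the helper names `label_outliers` and `plot_outliers` and the single result column `y_ocsvm` are made up, the data is the sample from the top of the post, and the scaler and `OneClassSVM` settings are simply the ones used above.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import OneClassSVM

# Sample data from the top of the post.
X = pd.DataFrame({
    "ContextID": [7289973] * 7 + [7289999] * 7 + [7290025] * 6,
    "EscAct_Curr_A": [
        0.122100122, 0, 0, 0.122100122, 0.122100122, 0.122100122, 0.122100122,
        0.244200245, 0.244200245, 0.244200245, 0.244200245,
        0.366300374, 0.366300374, 0.366300374,
        0.122100122, 0.122100122, 0.122100122, 0, 0, 0.122100122,
    ],
    "StepID": [1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 2, 2, 2, 2, 1, 1, 2, 2, 2, 2],
})


def label_outliers(df, value_col="EscAct_Curr_A", group_col="StepID"):
    """Scale value_col and fit a OneClassSVM separately for each group_col value."""
    parts = []
    for _, group in df.groupby(group_col):
        group = group.copy()  # independent copy, so assignments are safe
        values = group[[value_col]].to_numpy(dtype=float)  # shape (n, 1)
        scaled = MinMaxScaler().fit_transform(values)
        group[value_col] = scaled.ravel()
        model = OneClassSVM(nu=0.1, kernel="rbf", gamma="scale")
        group["y_ocsvm"] = model.fit_predict(scaled)  # 1 = normal, -1 = outlier
        parts.append(group)
    # Put the groups back together in the original row order.
    return pd.concat(parts).sort_index()


def plot_outliers(result, value_col="EscAct_Curr_A", group_col="StepID"):
    """Scatter StepID against the scaled value, coloured by the OCSVM label."""
    fig, ax = plt.subplots()
    for label, colour, name in [(1, "green", "Normal"), (-1, "red", "Outlier")]:
        subset = result[result["y_ocsvm"] == label]
        ax.scatter(subset[group_col], subset[value_col], c=colour, label=name)
    ax.set_xlabel(group_col)
    ax.set_ylabel(value_col)
    ax.legend()
    return ax


result = label_outliers(X)
```

Calling `plot_outliers(result)` followed by `plt.show()` would then draw all StepIDs on one axis with only two `scatter` calls, since the labels live in a single column instead of one column per StepID.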