
I have a DataFrame, X, as follows:

    ContextID  EscAct_Curr_A  StepID
    7289973    0.122100122    1
    7289973    0              2
    7289973    0              2
    7289973    0.122100122    2
    7289973    0.122100122    2
    7289973    0.122100122    2
    7289973    0.122100122    2
    7289999    0.244200245    1
    7289999    0.244200245    1
    7289999    0.244200245    1
    7289999    0.244200245    2
    7289999    0.366300374    2
    7289999    0.366300374    2
    7289999    0.366300374    2
    7290025    0.122100122    1
    7290025    0.122100122    1
    7290025    0.122100122    2
    7290025    0              2
    7290025    0              2
    7290025    0.122100122    2
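To experiment with the snippets in this thread, the sample data above can be loaded into a DataFrame. This is a minimal sketch that assumes pandas, that the frame is called X (as in the code in the question), and that the three columns are exactly the ones shown.

```python
import pandas as pd

# Sample data from the question, one list per column (20 rows)
X = pd.DataFrame({
    "ContextID": [7289973] * 7 + [7289999] * 7 + [7290025] * 6,
    "EscAct_Curr_A": [
        0.122100122, 0, 0, 0.122100122, 0.122100122, 0.122100122, 0.122100122,
        0.244200245, 0.244200245, 0.244200245, 0.244200245,
        0.366300374, 0.366300374, 0.366300374,
        0.122100122, 0.122100122, 0.122100122, 0, 0, 0.122100122,
    ],
    "StepID": [
        1, 2, 2, 2, 2, 2, 2,
        1, 1, 1, 2, 2, 2, 2,
        1, 1, 2, 2, 2, 2,
    ],
})

# Number of rows per StepID
print(X.groupby("StepID").size())
```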

What I want to do is group the rows by StepID and create a separate DataFrame for each group.

For instance, all the rows with StepID 1 should go into one DataFrame, say s1, all the rows with StepID 2 into another, say s2, and so on. I have 24 such StepIDs. After that, I want to run a machine learning algorithm on each of them and plot the results as a scatter plot.
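For the splitting step alone, one option (a sketch, assuming pandas and the column names shown above) is to let groupby do the work and keep the pieces in a dict keyed by StepID, rather than creating 24 separate variables. Here groups[1] plays the role of s1, groups[2] of s2, and so on.

```python
import pandas as pd

X = pd.DataFrame({
    "ContextID": [7289973] * 7 + [7289999] * 7 + [7290025] * 6,
    "EscAct_Curr_A": [
        0.122100122, 0, 0, 0.122100122, 0.122100122, 0.122100122, 0.122100122,
        0.244200245, 0.244200245, 0.244200245, 0.244200245,
        0.366300374, 0.366300374, 0.366300374,
        0.122100122, 0.122100122, 0.122100122, 0, 0, 0.122100122,
    ],
    "StepID": [1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 2, 2, 2, 2, 1, 1, 2, 2, 2, 2],
})

# One independent DataFrame per StepID; .copy() so later column edits
# don't trigger SettingWithCopyWarning
groups = {step_id: grp.copy() for step_id, grp in X.groupby("StepID")}

for step_id, grp in groups.items():
    print(step_id, len(grp))
```

With 24 StepIDs the dict simply has 24 entries, and any per-step processing becomes a loop over groups.items().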

What I have done:

For StepID 1

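    from sklearn import preprocessing
    from sklearn.svm import OneClassSVM

    s1 = X.loc[X['StepID'] == 1].copy()  # .copy() avoids SettingWithCopyWarning
    s1_array = s1.iloc[:, 1].values.astype(float).reshape(-1, 1)
    min_max_scaler = preprocessing.MinMaxScaler()
    scaled_array_s1 = min_max_scaler.fit_transform(s1_array)
    s1.iloc[:, 1] = scaled_array_s1.ravel()  # write scaled values back to EscAct_Curr_A
    ocsvm = OneClassSVM(nu=0.1, kernel='rbf', gamma='scale')
    s1['y_ocsvm1'] = ocsvm.fit_predict(s1.values[:, [1]])  # +1 = normal, -1 = outlier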

For StepID 2

    s2 = X.loc[X['StepID'] == 2].copy()
    s2_array = s2.iloc[:, 1].values.astype(float).reshape(-1, 1)
    min_max_scaler = preprocessing.MinMaxScaler()
    scaled_array_s2 = min_max_scaler.fit_transform(s2_array)
    s2.iloc[:, 1] = scaled_array_s2.ravel()
    ocsvm = OneClassSVM(nu=0.1, kernel='rbf', gamma='scale')
    s2['y_ocsvm2'] = ocsvm.fit_predict(s2.values[:, [1]])

Plotting the scatter plot:

    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    # x = StepID (column 2), y = scaled EscAct_Curr_A (column 1)
    ax.scatter(s1.values[s1['y_ocsvm1'] == 1, 2], s1.values[s1['y_ocsvm1'] == 1, 1], c='green', label='Normal')
    ax.scatter(s1.values[s1['y_ocsvm1'] == -1, 2], s1.values[s1['y_ocsvm1'] == -1, 1], c='red', label='Outlier')
    ax.scatter(s2.values[s2['y_ocsvm2'] == 1, 2], s2.values[s2['y_ocsvm2'] == 1, 1], c='green')
    ax.scatter(s2.values[s2['y_ocsvm2'] == -1, 2], s2.values[s2['y_ocsvm2'] == -1, 1], c='red')
    plt.legend()

This code does exactly what I want, but writing it out for 24 different StepIDs is very tedious. Is there a more compact way to achieve this, perhaps with a loop or a function?

1 Answer


About 95% of your code is repeated; the only thing that really changes is the step ID. So you can put it in a function and call it once for each ID:

    def waka(step_id, X=X):
        # Rows for this StepID only (copy so the assignment below is safe)
        s = X.loc[X['StepID'] == step_id].copy()
        s_array = s.iloc[:, 1].values.astype(float).reshape(-1, 1)
        min_max_scaler = preprocessing.MinMaxScaler()
        scaled_array_s = min_max_scaler.fit_transform(s_array)
        s.iloc[:, 1] = scaled_array_s.ravel()
        ocsvm = OneClassSVM(nu=0.1, kernel='rbf', gamma='scale')
        return ocsvm.fit_predict(s.values[:, [1]])
        # OR! replace the return above with these two lines to get the labelled DataFrame back:
        # s['y_ocsvm'] = ocsvm.fit_predict(s.values[:, [1]])
        # return s

You can store the results in a list or dict and plot them later.
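A minimal sketch of that idea, assuming pandas and scikit-learn and the column names from the question. label_outliers is a hypothetical name for the "OR!" variant of waka (it returns the labelled DataFrame); the results are kept in a dict keyed by StepID, so all 24 steps are handled by one line.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import OneClassSVM

X = pd.DataFrame({
    "ContextID": [7289973] * 7 + [7289999] * 7 + [7290025] * 6,
    "EscAct_Curr_A": [
        0.122100122, 0, 0, 0.122100122, 0.122100122, 0.122100122, 0.122100122,
        0.244200245, 0.244200245, 0.244200245, 0.244200245,
        0.366300374, 0.366300374, 0.366300374,
        0.122100122, 0.122100122, 0.122100122, 0, 0, 0.122100122,
    ],
    "StepID": [1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 2, 2, 2, 2, 1, 1, 2, 2, 2, 2],
})


def label_outliers(step_id, X):
    """Hypothetical helper: min-max scale EscAct_Curr_A within one StepID,
    then label each row with a One-Class SVM (+1 normal, -1 outlier)."""
    s = X.loc[X["StepID"] == step_id].copy()
    scaled = MinMaxScaler().fit_transform(s[["EscAct_Curr_A"]].astype(float))
    s["EscAct_Curr_A"] = scaled.ravel()
    ocsvm = OneClassSVM(nu=0.1, kernel="rbf", gamma="scale")
    s["y_ocsvm"] = ocsvm.fit_predict(s[["EscAct_Curr_A"]].to_numpy())
    return s


# One labelled DataFrame per StepID, replacing s1, s2, ..., s24
results = {step_id: label_outliers(step_id, X) for step_id in sorted(X["StepID"].unique())}

# Optionally stitch everything back into a single DataFrame
all_labelled = pd.concat(results.values())

for step_id, s in results.items():
    print(step_id, (s["y_ocsvm"] == -1).sum(), "outliers")
```

Using a single y_ocsvm column name for every step (instead of y_ocsvm1, y_ocsvm2, ...) is what makes the later concat and plotting loop straightforward.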


3 Comments

This reduces some effort. Thanks!
Is it possible to simplify the scatter plot code in the same way?
You can create lists of colors and labels and then call scatter inside a loop such as for i, s in enumerate(your_list_with_result_data):
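A sketch of the loop described in that comment, assuming matplotlib, pandas and scikit-learn, and that each result DataFrame carries a y_ocsvm column (+1 normal, -1 outlier) as in the per-StepID labelling above. The colours and legend names are taken from the question; only the first group gets legend labels so "Normal" and "Outlier" appear once. The Agg backend line is an assumption so the sketch runs without a display; drop it in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import OneClassSVM

X = pd.DataFrame({
    "ContextID": [7289973] * 7 + [7289999] * 7 + [7290025] * 6,
    "EscAct_Curr_A": [
        0.122100122, 0, 0, 0.122100122, 0.122100122, 0.122100122, 0.122100122,
        0.244200245, 0.244200245, 0.244200245, 0.244200245,
        0.366300374, 0.366300374, 0.366300374,
        0.122100122, 0.122100122, 0.122100122, 0, 0, 0.122100122,
    ],
    "StepID": [1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 2, 2, 2, 2, 1, 1, 2, 2, 2, 2],
})

# Scale and label each StepID group, as in the answer's function
results = {}
for step_id, grp in X.groupby("StepID"):
    s = grp.copy()
    s["EscAct_Curr_A"] = MinMaxScaler().fit_transform(s[["EscAct_Curr_A"]]).ravel()
    ocsvm = OneClassSVM(nu=0.1, kernel="rbf", gamma="scale")
    s["y_ocsvm"] = ocsvm.fit_predict(s[["EscAct_Curr_A"]].to_numpy())
    results[step_id] = s

# Label value -> (colour, legend name), matching the question's plot
styles = {1: ("green", "Normal"), -1: ("red", "Outlier")}

fig, ax = plt.subplots()
for i, s in enumerate(results.values()):
    for label, (color, name) in styles.items():
        mask = s["y_ocsvm"] == label
        ax.scatter(
            s.loc[mask, "StepID"],
            s.loc[mask, "EscAct_Curr_A"],
            c=color,
            label=name if i == 0 else None,  # label only the first group to avoid duplicate legend entries
        )
ax.set_xlabel("StepID")
ax.set_ylabel("EscAct_Curr_A (scaled)")
ax.legend()
```

Adding a StepID only means adding an entry to results; the plotting loop itself does not change.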
