Scaling does not speed up the SVM model

I tried to standardize the training data, which has 629,145 rows and 24 features:

import pandas as pd
from sklearn import svm
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# dtype='object' would load every column as strings; let pandas infer numeric types
df = pd.read_csv('mydata.csv')

# manually choosing 24 features
# (note: 'TotLen Fwd Pkts' and 'Fwd IAT Min' each appear twice in this list,
# so only 22 distinct columns are selected)
X = df.loc[:, ['Bwd Pkt Len Min', 'Subflow Fwd Byts', 'TotLen Fwd Pkts', 'TotLen Fwd Pkts',
               'Bwd Pkt Len Std', 'Flow IAT Min', 'Fwd IAT Min', 'Flow IAT Mean',
               'Flow Duration', 'Flow IAT Std', 'Active Min', 'Active Mean', 'Fwd IAT Min',
               'Bwd IAT Mean', 'Fwd IAT Mean', 'Init Fwd Win Byts', 'ACK Flag Cnt',
               'Fwd PSH Flags', 'SYN Flag Cnt', 'Fwd Pkts/s', 'Bwd Pkts/s',
               'Init Bwd Win Byts', 'PSH Flag Cnt', 'Pkt Size Avg']]
Y = df['Label']

# 60% training and 40% test
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.4, random_state=42)

scaler = StandardScaler()  # was missing: scaler was used below without being created
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Create an SVM classifier
clf = svm.SVC(kernel='rbf')  # not a linear kernel
clf.fit(X_train, y_train)

After six hours the SVM still has not converged. The same data trains very quickly with other algorithms such as RF, and I know this is expected, since SVM is considered computationally expensive compared to KMeans and RF. I have read quite a few questions/answers and articles.
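(Not from the original post, but a common workaround worth noting: if the RBF kernel is not essential, a linear SVM trained with stochastic gradient descent scales linearly with the number of rows, whereas libsvm-based `SVC` scales closer to quadratically. A minimal sketch on synthetic data standing in for mydata.csv:)

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in for the real data: 24 numeric features, binary label.
X = rng.normal(size=(50_000, 24))
y = (X[:, 0] > 0).astype(int)

# SGDClassifier with hinge loss is a linear SVM trained by SGD;
# fitting cost grows linearly with the number of rows, so 600k+ rows stay tractable.
clf = make_pipeline(StandardScaler(), SGDClassifier(loss='hinge', random_state=0))
clf.fit(X, y)
print(clf.score(X, y))
```

On this separable toy problem it fits in seconds; whether the linear decision boundary is adequate for the real traffic features would need to be checked against the RBF model on a subsample.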

  1. How can I track and analyze the problem visually (for example, by plotting the cost function or something similar)?
  2. Can parameter tuning (the C parameter) help and speed this up?
  3. What would you advise?
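Regarding question 1: `SVC` does not expose a loss curve to plot, but `svm.SVC(verbose=True)` prints libsvm's iteration output, which at least shows that the fit is progressing. A rough empirical check is to time `fit` on growing subsamples and extrapolate, since libsvm training scales between roughly O(n²) and O(n³) in the number of rows. A sketch with synthetic data (the shapes, not the values, mirror the real dataset):

```python
import time
import numpy as np
from sklearn import svm
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = rng.normal(size=(8_000, 24))           # stand-in for the 24 selected features
y = (X[:, 0] + X[:, 1] > 0).astype(int)    # stand-in for the Label column
X = StandardScaler().fit_transform(X)

timings = []
for n in (1_000, 2_000, 4_000, 8_000):
    clf = svm.SVC(kernel='rbf')
    start = time.perf_counter()
    clf.fit(X[:n], y[:n])
    timings.append((n, time.perf_counter() - start))

for n, t in timings:
    print(f"n={n:>5}: {t:.2f}s")
# If doubling n far more than doubles the time, fitting all 629,145 rows
# with kernel SVC is impractical, regardless of scaling or C.
```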

Thanks very much