$\begingroup$

I am using sklearn.svm.SVC() to train and test a model on my dataset. 80% of the data is used for training and 20% for testing.

Here is my Python code:

    import pandas as pd
    from sklearn import metrics, svm
    from sklearn.metrics import confusion_matrix
    from sklearn.model_selection import train_test_split

    data = pd.read_csv(trainPath, header=0)
    X = data.iloc[:, 5:17].values   # feature columns
    y = data.iloc[:, 17:18].values  # label column, shape (n_samples, 1)

    # 80% training, 20% testing
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    print(X_train.dtype, y_train.dtype)  # float64 int64

    clf = svm.SVC(kernel='linear').fit(X_train, y_train.ravel())
    print('done')

    y_pred = clf.predict(X_test)
    print("Accuracy:", metrics.accuracy_score(y_test, y_pred))
    print("Precision:", metrics.precision_score(y_test, y_pred))
    print("Recall:", metrics.recall_score(y_test, y_pred))

    tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
    print(tn, fp, fn, tp)

With data.shape = 30,000 x 13, it runs for around 15 minutes.

With data.shape = 130,000 x 13, it runs for more than 1 hour.

Why does it take so long? I don't think this is normal.

  • i5, 2.8GHz, 16.0 GB memory
$\endgroup$

1 Answer

$\begingroup$

From the scikit-learn documentation:

The implementation is based on libsvm. The fit time scales at least quadratically with the number of samples and may be impractical beyond tens of thousands of samples. For large datasets consider using sklearn.svm.LinearSVC or sklearn.linear_model.SGDClassifier instead, possibly after a sklearn.kernel_approximation.Nystroem transformer.
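The last option in that quote (a Nystroem transformer followed by SGDClassifier) might look roughly like the sketch below. It is only an illustration under assumptions: the data is synthetic (`make_classification` standing in for the real CSV), and the RBF kernel, `n_components=100`, and the `StandardScaler` step are arbitrary choices for the example, not tuned recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data (12 features, as in the question).
X, y = make_classification(
    n_samples=5000, n_features=12, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Approximate an RBF-kernel SVM: scale the features, map them through a
# Nystroem kernel approximation, then fit a linear model with hinge loss.
approx_svm = make_pipeline(
    StandardScaler(),
    Nystroem(kernel="rbf", n_components=100, random_state=0),
    SGDClassifier(loss="hinge", random_state=0),
)
approx_svm.fit(X_train, y_train)

nystroem_accuracy = approx_svm.score(X_test, y_test)
print("Accuracy:", nystroem_accuracy)
```

The cost of this approach grows roughly linearly with the number of samples for a fixed `n_components`, which is why it is suggested for large datasets; whether the approximation is accurate enough depends on the data.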

You can replace

    clf = svm.SVC(kernel='linear').fit(X_train, y_train.ravel())

with

    from sklearn.svm import LinearSVC

    clf = LinearSVC(random_state=0, tol=1e-5)
    clf.fit(X_train, y_train.ravel())
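To get a feel for the difference on your own machine, a small timing comparison along these lines may help. This is a sketch, not a benchmark: it uses synthetic data from `make_classification` instead of the real CSV, the sample size of 3000 is an arbitrary choice to keep it quick, and the `StandardScaler` step is an addition of mine (feature scaling usually helps both solvers converge). The helper `fit_seconds` is a hypothetical name introduced for the example.

```python
import time

from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC

# Synthetic stand-in for the real data (12 features, as in the question).
X, y = make_classification(
    n_samples=3000, n_features=12, n_informative=6, random_state=0
)


def fit_seconds(model):
    """Fit the model on (X, y) and return the wall-clock fit time in seconds."""
    start = time.perf_counter()
    model.fit(X, y)
    return time.perf_counter() - start


# Same linear decision function family, two different solvers:
# SVC uses libsvm, LinearSVC uses liblinear.
kernel_svc = make_pipeline(StandardScaler(), SVC(kernel="linear"))
linear_svc = make_pipeline(StandardScaler(), LinearSVC(random_state=0, tol=1e-5))

timings = {
    "SVC(kernel='linear')": fit_seconds(kernel_svc),
    "LinearSVC": fit_seconds(linear_svc),
}
for name, seconds in timings.items():
    print(f"{name}: {seconds:.2f} s")
```

If LinearSVC emits a ConvergenceWarning, scaling the features or raising `max_iter` are the usual remedies. Increasing `n_samples` in the sketch should make the gap between the two solvers more visible.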
$\endgroup$
  • $\begingroup$ How about the speed of LinearSVC? $\endgroup$ Commented Aug 20, 2019 at 10:51
  • $\begingroup$ On the LIBLINEAR web page it's reported that, for the same dataset, the linear SVM gets 97% in 3 seconds whilst the SVM gets 96.8% in 346 seconds: csie.ntu.edu.tw/~cjlin/liblinear $\endgroup$ Commented Aug 20, 2019 at 10:56
