$\begingroup$

I built an SVM classifier, but its ROC curve is inverted: the AUC is only 0.08. I used the same datasets to build a logistic regression classifier and a decision tree classifier, and the ROC curves for those look good.

Here is my code for the SVM:

from sklearn.svm import SVC

svm = SVC(max_iter=12, probability=True)
svm.fit(train_x_sm, train_y_sm)
svm_test_y = svm.predict(X=test_x)
svm_roc = plot_roc_curve(svm, test_x, test_y)
plt.show()

Could anyone tell me what is wrong with my code?

$\endgroup$

2 Answers

$\begingroup$

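For a binary classification problem, an AUC below 0.5 means the classifier ranks examples worse than random guessing (AUC = 0.5). An AUC as low as 0.08 suggests the scores are systematically reversed rather than merely uninformative.

Possible reasons: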

  • Your classifier is over-fitted on the training set and performs very poorly on the test set.
  • Your test sample might be very small.
  • Your classifier is returning the probability of the negative class (-1). You then get a score close to 0 for class +1 and close to 1 for class -1. If your ROC method expects positive (+1) examples to score higher than negative (-1) ones, the curve comes out reversed.

A valid strategy is to simply invert the predictions:

invert_prob = 1 - prob
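To see why inverting works, here is a minimal sketch in plain Python. It computes the AUC as the fraction of (positive, negative) pairs that are ranked correctly. The `auc` helper and the toy `labels` and `prob` values are made up for illustration and are not taken from the question's data.

```python
def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs where the positive
    example gets the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y != 1]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: the scores behave like probabilities of the
# *negative* class, so positives (label 1) mostly receive low values.
labels = [1, 1, 1, 0, 0, 0]
prob = [0.1, 0.2, 0.6, 0.5, 0.8, 0.9]

invert_prob = [1 - p for p in prob]

print("AUC with original scores:", auc(labels, prob))
print("AUC with inverted scores:", auc(labels, invert_prob))
```

The two values add up to 1, so a curve with AUC 0.08 would become one with AUC 0.92 after inversion. With scikit-learn, it is probably worth checking the estimator's `classes_` attribute first, since the columns of `predict_proba` follow that order; a reversed curve may just mean the wrong column or the wrong positive label was used.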

Reference: ROC

$\endgroup$
$\begingroup$

One potential fix is to remove max_iter=12, which restores the scikit-learn default of max_iter=-1 (no iteration limit). Such a low value can stop the solver long before it converges and lead to bad scores, as the following example shows:

from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
# Note: plot_roc_curve was removed in scikit-learn 1.2;
# newer versions provide RocCurveDisplay.from_estimator instead.
from sklearn.metrics import plot_roc_curve
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2)

# Cap the solver at 12 iterations to reproduce the problem
model = SVC(max_iter=12, probability=True)
model.fit(X_train, y_train)
plot_roc_curve(model, X_test, y_test)

results in

[Figure: ROC curve with max_iter=12]

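However, executing exactly the same code again (still with max_iter=12) gives a totally different result, since nothing in it sets random_state and each run therefore uses a different train/test split: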

[Figure: ROC curve with max_iter=12, second run]

After removing max_iter=12, the code consistently produces higher AUCs, around $0.95$ to $0.99$.
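If the iteration cap was only there to keep training time manageable, one option to try is standardizing the features before fitting, since SVMs with an RBF kernel are sensitive to feature scale and often converge much faster on scaled data. The sketch below is an assumption about what might help, not a diagnosis of the original dataset. It reuses the breast cancer data from above, fixes `random_state` so the result is repeatable, and computes the AUC with `roc_auc_score` on `decision_function` scores instead of plotting.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

data = load_breast_cancer()
# Fixed seed and stratified split so repeated runs give the same result.
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=0, stratify=data.target
)

# Standardize the features, then fit an RBF SVC with the default
# max_iter=-1 (no iteration cap).
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
model.fit(X_train, y_train)

# decision_function returns a continuous score; for a binary problem,
# higher values point towards model.classes_[1].
scores = model.decision_function(X_test)
auc_value = roc_auc_score(y_test, scores)
print(f"AUC: {auc_value:.3f}")
```

Using `decision_function` also avoids `probability=True`, which makes fitting slower because scikit-learn runs an internal cross-validation to calibrate the probabilities.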

$\endgroup$
  • $\begingroup$ I set a small value for max_iter because with max_iter = -1 the program takes a really long time and I don't know whether it will ever stop. After I posted this question, I changed the SVM kernel from the default rbf to sigmoid, and this time I got a good ROC curve with an AUC of 0.94. So maybe the kernel is the issue? $\endgroup$ Commented Jul 28, 2020 at 18:27
  • $\begingroup$ @MMMMMay Have a look at my example above: with max_iter=12 your results can fluctuate a lot. What happens if you use rbf as the kernel and fit the model 10 times? Do you always get a low AUC? $\endgroup$ Commented Jul 28, 2020 at 18:40
  • $\begingroup$ Yes, the AUC is always low. $\endgroup$ Commented Jul 28, 2020 at 19:06
