I know the question is two years old and the technical answer was given in the comments, but a more elaborate answer might help others still struggling with the concepts.
OP's ROC curve is wrong because they used the predicted values (class labels) of the model instead of the predicted probabilities.
What does this mean?
When a model is trained, it learns the relationship between the input variables and the output variable. For each observation it is shown, the model learns how probable it is that the observation belongs to a certain class. When the model is then presented with the test data, it estimates for each unseen observation the probability that it belongs to a given class.
How does the model decide whether an observation belongs to a class? Suppose that during testing the model receives an observation for which it estimates a probability of 51% of belonging to Class X. How does it decide whether or not to label it as Class X? The researcher sets a threshold telling the model that all observations with a probability below 50% must be classified as Y and all those above as X. Sometimes the researcher sets a stricter threshold because they are more interested in correctly predicting one class, such as X, than in classifying every observation equally well.
So your trained model has estimated a probability for each of your observations, but the threshold ultimately decides which class each observation is assigned to.
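As a sketch, thresholding the probabilities might look like this (the probability values here are made up for illustration):

```python
import numpy as np

# Hypothetical probability estimates of belonging to Class X
proba_x = np.array([0.10, 0.40, 0.51, 0.70, 0.95])

# Default rule: label as X (1) when P(X) is at least 50%, else Y (0)
labels_default = (proba_x >= 0.5).astype(int)
print(labels_default)  # [0 0 1 1 1]

# Stricter rule: only very confident observations are labelled X
labels_strict = (proba_x >= 0.9).astype(int)
print(labels_strict)   # [0 0 0 0 1]
```

The same probabilities produce different labels under different thresholds; nothing about the model changed, only the decision rule.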
Why does this matter?
The ROC curve plots one point (the false positive rate against the true positive rate) for each threshold level. This lets the researcher see the trade-off between the FPR and TPR across all thresholds.
So when you pass the predicted values instead of the predicted probabilities to your ROC function, you get only one point, because those values were computed at one specific threshold: that point is the TPR and FPR of your model at that threshold level.
What you need to do is use the probabilities instead and let the threshold vary.
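You can see this directly with sklearn's `roc_curve` (the labels and probabilities below are toy values, made up for illustration): hard labels give essentially a single operating point, while probabilities give one candidate point per distinct score.

```python
import numpy as np
from sklearn.metrics import roc_curve

y_true = np.array([0, 1, 0, 1, 1, 0, 1, 0])
y_proba = np.array([0.2, 0.8, 0.6, 0.4, 0.9, 0.55, 0.7, 0.3])
y_pred = (y_proba >= 0.5).astype(int)  # hard labels at one fixed threshold

# With hard labels the only possible scores are 0 and 1,
# so the "curve" is a single real operating point
fpr_l, tpr_l, thr_l = roc_curve(y_true, y_pred)

# With probabilities, each distinct score becomes a candidate threshold,
# tracing out a full curve
fpr_p, tpr_p, thr_p = roc_curve(y_true, y_proba)

print(len(thr_l), len(thr_p))  # the probability-based curve has more points
```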
Run your model as such:
```python
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier()
knn_model = knn.fit(X_train, y_train)

# Use the predicted labels for your confusion matrix
knn_y_model = knn_model.predict(X=X_test)

# Use the probabilities for your ROC and precision-recall curves
knn_y_proba = knn_model.predict_proba(X=X_test)
```
When creating your confusion matrix, use the predicted labels of your model:
```python
from sklearn.metrics import confusion_matrix
from mlxtend.plotting import plot_confusion_matrix
import matplotlib.pyplot as plt

fig, ax = plot_confusion_matrix(
    conf_mat=confusion_matrix(y_test, knn_y_model),
    show_absolute=True,
    show_normed=True,
    colorbar=True,
)
plt.title("Confusion matrix - KNN")
plt.ylabel('True label')
plt.xlabel('Predicted label')
```
When creating your ROC curve, use the probabilities:
```python
import scikitplot as skplt
import matplotlib.pyplot as plt

skplt.metrics.plot_roc(y_test, knn_y_proba)
plt.title("ROC Curves - K-Nearest Neighbors")
```
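The same logic applies to the precision-recall curve mentioned earlier: it also needs the probabilities, not the labels. A minimal sketch with plain sklearn (note that `predict_proba` returns one column per class, so sklearn's curve functions expect the positive-class scores, e.g. `knn_y_proba[:, 1]` for a binary problem; the values below are toy data for illustration):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Toy binary labels and positive-class probabilities
y_true = np.array([0, 1, 0, 1, 1, 0, 1, 0])
y_score = np.array([0.2, 0.8, 0.6, 0.4, 0.9, 0.55, 0.7, 0.3])

# One candidate threshold per distinct score, just like the ROC curve
precision, recall, thresholds = precision_recall_curve(y_true, y_score)
print(len(thresholds))
```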