Plotting precision-recall curve using plot_precision_recall_curve and precision_recall_curve results in different plots

Question

I am plotting the precision-recall curves for my models which I have built using an imbalanced dataset.

I initially plotted the precision-recall curve for my models using the plot_precision_recall_curve function directly, like so:

# split into train/test sets trainX, testX, trainy, testy = train_test_split(X, y, test_size=0.5, random_state=2, stratify=y) dt = DecisionTreeClassifier() dt.fit(trainX,trainy) from sklearn.metrics import plot_precision_recall_curve plot_precision_recall_curve(dt, testX, testy, ax = plt.gca(), name = "Decision Tree")

Which resulted in this plot:

However, I then wanted to apply threshold tuning to achieve the optimal F0.5 score for my models. To do this, I plotted the precision-recall curve like so:

 # predict probabilities y_pred = dt.predict_proba(X_test) # keep probabilities for the positive outcome only y_pred = y_pred[:, 1] precision, recall, thresholds = precision_recall_curve(testy, y_pred) # convert to F0.5 score beta = 0.5 f05score = ( (1 + pow(0.5, 2)) * precision * recall ) / (pow(0.5, 2)* precision + recall ) # locate the index of the largest f 0.5 score ix = argmax(f05score) no_skill = len(testy[testy['0']==1]) / len(testy) pyplot.plot([0,1], [no_skill,no_skill], linestyle='--', label='No Skill') pyplot.plot(recall, precision, marker='.', label='DT', zorder=1) # set zorder so dots appear over line pyplot.scatter(recall[ix], precision[ix], marker='o', color='black', label='Best F0.5 Score', zorder=2) # axis labels pyplot.xlabel('Recall') pyplot.ylabel('Precision') pyplot.legend() # show the plot pyplot.show()

Which resulted in this plot:

Therefore, as you can see from both precision-recall curves, they look different and I want to know why this is as I assume they should look the same? Are there any mistakes in my code?

It's the same plot. In second plot you interpolate in the PR domain (and you shouldn't). — usεr11852
– usεr11852, Commented Oct 18, 2022 at 2:44
I see, so how do I stop the interpolation from happening in the second plot? @usεr11852 — sums22
– sums22, Commented Oct 18, 2022 at 8:37
You just change what we plot, we don't have enough points there. — usεr11852
– usεr11852, Commented Oct 18, 2022 at 10:58
@usεr11852 could you include an answer that clearly shows how to do this? Currently I am plotting precision-recall pairs for different thresholds which I calculated through: precision, recall, thresholds = precision_recall_curve(testy, y_pred). How do I modify this code to return more precision-recall pairs? — sums22
– sums22, Commented Oct 20, 2022 at 8:52
I think you don't plot everything, for example, there should be a point at approximately 0 recall, 0.22 precision. Where is that? — usεr11852
– usεr11852, Commented Oct 20, 2022 at 11:10

Peter DeWeirdt · Accepted Answer · 2025-08-04 20:05:59Z

I was looking into this discrepancy as well, and I think it comes down to matplotlib's drawstyle. If you change your plot function in the second block to

pyplot.plot(recall, precision, marker='.', label='DT', zorder=1, drawstyle='steps-post')

you should get the same plot from both code blocks

Stack Exchange Network

Plotting precision-recall curve using plot_precision_recall_curve and precision_recall_curve results in different plots

1 Answer 1

Hot Network Questions

Plotting precision-recall curve using plot_precision_recall_curve and precision_recall_curve results in different plots

1 Answer 1

Related

Hot Network Questions