I am plotting precision-recall curves for models that I have built on an imbalanced dataset.
I initially plotted the precision-recall curve for my models using the plot_precision_recall_curve function directly, like so:
```python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
# note: plot_precision_recall_curve was deprecated in scikit-learn 1.0
# and removed in 1.2 in favour of PrecisionRecallDisplay.from_estimator
from sklearn.metrics import plot_precision_recall_curve
import matplotlib.pyplot as plt

# split into train/test sets
trainX, testX, trainy, testy = train_test_split(X, y, test_size=0.5, random_state=2, stratify=y)
dt = DecisionTreeClassifier()
dt.fit(trainX, trainy)
plot_precision_recall_curve(dt, testX, testy, ax=plt.gca(), name="Decision Tree")
```

Which resulted in this plot:
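For reference, here is a self-contained version of that setup on a synthetic imbalanced dataset (`make_classification` and its parameters are assumptions, since the original data isn't shown). The arrays that `precision_recall_curve` returns are what the plotting helper draws:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import precision_recall_curve

# synthetic imbalanced dataset: roughly 10% positive class
X, y = make_classification(n_samples=2000, weights=[0.9], random_state=2)
trainX, testX, trainy, testy = train_test_split(
    X, y, test_size=0.5, random_state=2, stratify=y)

dt = DecisionTreeClassifier(random_state=2)
dt.fit(trainX, trainy)

# the curve the plotting helper shows is built from these arrays
y_score = dt.predict_proba(testX)[:, 1]
precision, recall, thresholds = precision_recall_curve(testy, y_score)
```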
However, I then wanted to apply threshold tuning to achieve the optimal F0.5 score for my models. To do this, I plotted the precision-recall curve like so:
```python
from numpy import argmax
from sklearn.metrics import precision_recall_curve
import matplotlib.pyplot as plt

# predict probabilities
y_pred = dt.predict_proba(testX)
# keep probabilities for the positive outcome only
y_pred = y_pred[:, 1]
precision, recall, thresholds = precision_recall_curve(testy, y_pred)
# convert to F0.5 score
beta = 0.5
f05score = ((1 + beta**2) * precision * recall) / (beta**2 * precision + recall)
# locate the index of the largest F0.5 score
ix = argmax(f05score)
# a no-skill classifier's precision equals the positive class rate
no_skill = sum(testy == 1) / len(testy)
plt.plot([0, 1], [no_skill, no_skill], linestyle='--', label='No Skill')
plt.plot(recall, precision, marker='.', label='DT', zorder=1)
# set zorder so the best-score dot appears over the line
plt.scatter(recall[ix], precision[ix], marker='o', color='black',
            label='Best F0.5 Score', zorder=2)
# axis labels
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.legend()
# show the plot
plt.show()
```

Which resulted in this plot:
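As a sanity check on the F0.5 arithmetic, the vectorized formula should agree with scikit-learn's `fbeta_score` once the probabilities are thresholded at the selected cut-off (`precision_recall_curve` point `i` corresponds to predicting positive when the score is `>= thresholds[i]`). This sketch uses an assumed synthetic dataset and a logistic regression stand-in for the model:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve, fbeta_score

X, y = make_classification(n_samples=2000, weights=[0.9], random_state=2)
trainX, testX, trainy, testy = train_test_split(
    X, y, test_size=0.5, random_state=2, stratify=y)
clf = LogisticRegression(max_iter=1000).fit(trainX, trainy)
y_score = clf.predict_proba(testX)[:, 1]

precision, recall, thresholds = precision_recall_curve(testy, y_score)
beta = 0.5
# tiny epsilon guards against 0/0 at degenerate points
f05 = (1 + beta**2) * precision * recall / (beta**2 * precision + recall + 1e-12)
ix = np.argmax(f05[:-1])  # last point (recall=0) has no threshold

# applying the best threshold should reproduce the same F0.5 score
y_hat = (y_score >= thresholds[ix]).astype(int)
check = fbeta_score(testy, y_hat, beta=0.5)
```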
As you can see, the two precision-recall curves look different, even though I assumed they should be the same. Why is that? Are there any mistakes in my code?

