
For a simple binary classification problem, I would like to find the threshold that maximizes the F1 score, which is the harmonic mean of precision and recall. Is there anything built into scikit-learn that does this? Right now, I am simply calling

precision, recall, thresholds = precision_recall_curve(y_test, y_test_predicted_probas) 

Then I can compute the F1 score from the values at each index of the precision and recall arrays:

curr_f1 = compute_f1(precision[index], recall[index]) 

Is there a better way of doing this, or is this how the library was intended to be used? Thanks.

  • Note: I am using an XGBoost classifier with binary logistic output, if that changes anything. Commented Jul 16, 2019 at 15:37

2 Answers


precision_recall_curve returns the precision, recall, and threshold values as NumPy arrays.
Just use NumPy functions to find the threshold that maximizes the F1 score:

import numpy as np

# Element-wise F1 for every point on the precision-recall curve
f1_scores = 2 * recall * precision / (recall + precision)
print('Best threshold: ', thresholds[np.argmax(f1_scores)])
print('Best F1-Score: ', np.max(f1_scores))
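As a sanity check, here is a minimal end-to-end sketch of the approach above. The synthetic dataset and the LogisticRegression model are assumptions that stand in for the asker's data and XGBoost classifier, and names such as best_index and check_f1 are only illustrative. It also drops the final precision/recall point, because those arrays are one element longer than thresholds, and guards against a zero denominator so that argmax cannot land on a NaN.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_recall_curve
from sklearn.model_selection import train_test_split

# Assumption: synthetic, imbalanced data standing in for the real problem
X, y = make_classification(n_samples=1000, weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# Assumption: any classifier exposing predict_proba would work here
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_test_predicted_probas = clf.predict_proba(X_test)[:, 1]

precision, recall, thresholds = precision_recall_curve(
    y_test, y_test_predicted_probas
)

# The last (precision=1, recall=0) point has no matching threshold, so drop it
precision, recall = precision[:-1], recall[:-1]

# Leave F1 at 0 wherever precision + recall == 0 instead of producing NaN
denom = precision + recall
f1_scores = np.divide(
    2 * precision * recall, denom, out=np.zeros_like(denom), where=denom != 0
)

best_index = int(np.argmax(f1_scores))
best_threshold = thresholds[best_index]
best_f1 = f1_scores[best_index]

# Recompute F1 directly from hard predictions at the chosen threshold
y_pred = (y_test_predicted_probas >= best_threshold).astype(int)
check_f1 = f1_score(y_test, y_pred)
```

Predicting the positive class whenever the probability is greater than or equal to best_threshold should reproduce best_f1, which is what check_f1 is there to confirm.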

1 Comment

What if we care about the "weighted F1 score", to account for unbalanced classes?
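One possible way to handle the weighted case, offered as a sketch rather than an established recipe: the arrays from precision_recall_curve describe only the positive class, so the weighted F1 score can be computed separately for each candidate threshold with f1_score(..., average="weighted"). The labels and scores below are synthetic assumptions, and the variable names are only illustrative.

```python
import numpy as np
from sklearn.metrics import f1_score

# Assumption: synthetic imbalanced labels and scores, for illustration only
rng = np.random.default_rng(0)
y_true = (rng.random(500) < 0.2).astype(int)
y_scores = np.clip(0.3 * y_true + rng.normal(0.4, 0.15, size=500), 0.0, 1.0)

# Every distinct score is a candidate cut-off
candidate_thresholds = np.unique(y_scores)

# Weighted F1 averages the per-class F1 scores, weighted by class support
weighted_f1 = np.array([
    f1_score(
        y_true,
        (y_scores >= t).astype(int),
        average="weighted",
        zero_division=0,  # avoid warnings when one class is never predicted
    )
    for t in candidate_thresholds
])

best_index = int(np.argmax(weighted_f1))
best_threshold = candidate_thresholds[best_index]
best_weighted_f1 = weighted_f1[best_index]
```

This loop calls f1_score once per distinct score, so for large test sets it may be worth subsampling the candidate thresholds.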

Sometimes precision_recall_curve returns a few thresholds that are too high for the data, so you end up with points where both precision and recall are zero. The F1 denominator is then zero, which produces NaNs. To ensure correct output, use np.divide to divide only where the denominator is nonzero:

import numpy as np
from sklearn.metrics import precision_recall_curve

precision, recall, thresholds = precision_recall_curve(y_test, y_test_predicted_probas)
numerator = 2 * recall * precision
denom = recall + precision
# Entries where denom == 0 keep the 0 from `out` instead of becoming NaN
f1_scores = np.divide(numerator, denom, out=np.zeros_like(denom), where=(denom != 0))
max_f1 = np.max(f1_scores)
max_f1_thresh = thresholds[np.argmax(f1_scores)]
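To see why this matters, here is a small NumPy-only sketch using made-up precision and recall values (an assumption, not output from a real model). With plain division the 0/0 entry becomes NaN, and np.argmax then returns the index of that NaN instead of the best score.

```python
import numpy as np

# Assumption: hand-picked values; the middle point has precision = recall = 0
precision = np.array([0.5, 0.0, 1.0])
recall = np.array([1.0, 0.0, 0.0])

numerator = 2 * recall * precision
denom = recall + precision

# Plain division: the 0/0 entry turns into NaN (warnings silenced here)
with np.errstate(divide="ignore", invalid="ignore"):
    naive_f1 = numerator / denom

# Guarded division: entries with denom == 0 keep the 0 supplied through `out`
safe_f1 = np.divide(numerator, denom, out=np.zeros_like(denom), where=(denom != 0))

naive_argmax = int(np.argmax(naive_f1))  # lands on the NaN entry
safe_argmax = int(np.argmax(safe_f1))    # lands on the genuinely best F1
```

Here naive_argmax points at the NaN entry (index 1), while safe_argmax points at index 0, where F1 is 2/3.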

1 Comment

That's a nifty usage of the out argument -- for other people who were briefly confused like me, f1_scores will default to zero where denom is zero.
