
My problem is the following: I have a binary logistic regression model, trained on a very imbalanced dataset, that outputs a probability for each prediction. As can be seen in the images, as the threshold is increased there is a certain point at which it stops predicting. I am researching calibration techniques to try to make it work better, but I thought maybe I could get some direction here.

I've tried giving weights to the classes, but it didn't seem to help much.

Is it a probability calibration problem?
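For reference, one way to check whether it is a calibration problem is to plot a reliability curve. Below is a minimal sketch with scikit-learn; the synthetic dataset and variable names are made up for illustration:

# Minimal sketch: reliability (calibration) curve for a logistic regression
# on an imbalanced binary problem. Dataset and names are illustrative only.
import matplotlib.pyplot as plt
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, stratify=y, random_state=0)

proba = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_val)[:, 1]
frac_pos, mean_pred = calibration_curve(y_val, proba, n_bins=10)

plt.plot(mean_pred, frac_pos, "o-", label="model")
plt.plot([0, 1], [0, 1], "--", label="perfectly calibrated")
plt.xlabel("mean predicted probability")
plt.ylabel("fraction of positives")
plt.legend()
plt.show()

If the model's curve stays close to the diagonal, the probabilities are already well calibrated and the issue lies elsewhere (e.g. the threshold choice).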

The three graphs below are shown in no particular order.

Thanks in advance.

[images 1, 2, 3]

  • Just wondering if you tried finding the AUC of the precision-recall curves for these models? Commented Mar 5, 2020 at 17:37
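For reference, the AUC of a precision-recall curve that the comment asks about can be computed with scikit-learn; a minimal sketch on a made-up imbalanced dataset:

# Minimal sketch: PR-AUC (average precision) for an imbalanced binary problem.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

scores = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
print("average precision (PR-AUC):", average_precision_score(y_te, scores))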

1 Answer


Given a confusion matrix:

                 predicted
                (+)     (-)
              ---------------
         (+)  |  TP  |  FN  |
actual        ---------------
         (-)  |  FP  |  TN  |
              ---------------

we know that:

Precision = TP / (TP + FP)
Recall    = TP / (TP + FN)
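A quick sketch tying these formulas to code (the labels and predictions below are made-up toy data):

# Minimal sketch: precision and recall from a confusion matrix.
from sklearn.metrics import confusion_matrix, precision_score, recall_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("precision:", tp / (tp + fp), "vs sklearn:", precision_score(y_true, y_pred))
print("recall:   ", tp / (tp + fn), "vs sklearn:", recall_score(y_true, y_pred))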

"As can be seen in the images, as the threshold is increased there is a certain point at which it stops predicting"

That's not true: it only stops predicting one class accurately, which is understandable, because you moved the threshold so far that all of your predictions land on the other class.

Don't optimise the cut-off level by hand. A random forest, for example, will determine it for you implicitly, or you can simply do proper hyperparameter optimisation yourself.
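As an illustration of the latter option, here is a minimal hyperparameter-optimisation sketch with scikit-learn, using an imbalance-aware scorer; the dataset and parameter grid are made up for illustration:

# Minimal sketch: cross-validated hyperparameter search scored by PR-AUC.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=0)

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1, 10], "class_weight": [None, "balanced"]},
    scoring="average_precision",   # PR-AUC instead of plain accuracy
    cv=5,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)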

