Revisions to what to do with 0.5 class probabilities ? [closed]

added 51 characters in body

edited Apr 24, 2023 at 17:45


The question's focus on 0.5 conceals an important fact: each and every threshold applied to a continuous prediction implies some number of errors (false positives or false negatives). The question "How do I set a threshold?" is not answerable in a vacuum, but instead depends on the application & the cost of errors. It is important to consider the cost of an error alongside the probability of the error -- amputating a limb is dramatically different from administering an unnecessary dose of antibiotics.
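To make the cost argument concrete, here is a minimal sketch of how a cutoff could follow from error costs rather than from 0.5. It assumes the model's probabilities are well calibrated and that correct decisions cost nothing; the function name and all cost figures are hypothetical, chosen only for illustration.

```python
def cost_minimizing_threshold(cost_fp, cost_fn):
    """Cutoff on a calibrated probability p that minimizes expected cost.

    Acting (predicting positive) costs (1 - p) * cost_fp in expectation;
    not acting costs p * cost_fn. Acting is cheaper exactly when
    p > cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)


# Hypothetical costs in arbitrary units, not taken from any real study.
# Cheap false positive (an unnecessary dose of antibiotics):
antibiotics_cutoff = cost_minimizing_threshold(cost_fp=1, cost_fn=20)
# Very expensive false positive (an unnecessary amputation):
amputation_cutoff = cost_minimizing_threshold(cost_fp=50, cost_fn=10)

print(f"antibiotics cutoff: {antibiotics_cutoff:.3f}")
print(f"amputation cutoff:  {amputation_cutoff:.3f}")
```

Under these assumptions, 0.5 is the cost-minimizing cutoff only when the two kinds of error cost the same.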

Even if you are compelled to choose a cutoff for some reason, it is worthwhile to consider what error rates you can tolerate. Receiver Operating Characteristic (ROC) curves are a partial answer to that question, framing the choice of a cutoff as achieving a higher (lower) true positive rate at the cost of a higher (lower) false positive rate. That said, deciding on the appropriate TPR/FPR tradeoff is also contextual & depends on the goals of the model and how it is applied.
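As a rough illustration of the tradeoff an ROC curve traces out, the sketch below computes the false positive rate and true positive rate at each candidate cutoff. The helper name `roc_points` and the scores and labels are made up for this example; in practice a library routine would normally be used instead.

```python
def roc_points(scores, labels):
    """Return (threshold, FPR, TPR) for each distinct score used as a cutoff.

    A case is predicted positive when its score is >= the threshold.
    Labels are 1 for positive cases and 0 for negative cases.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((t, fp / n_neg, tp / n_pos))
    return points


# Made-up predictions and outcomes, purely for illustration.
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]

for t, fpr, tpr in roc_points(scores, labels):
    print(f"threshold={t:.2f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Lowering the cutoff never decreases either rate, so each cutoff is one point on the curve; which point is acceptable still depends on the application.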

deleted 4 characters in body


edited Apr 24, 2023 at 15:57

Sycorax ♦


The question's focus on 0.5 conceals an important fact: each and every threshold applied to a continuous prediction implies some number of errors (false positives or false negatives). The question "How do I set a threshold?" is not answerable in a vacuum, but instead depends on the application & the cost of errors. It is important to consider the cost of an error alongside the probability of the error -- amputating a limb is dramatically different from administering an unnecessary dose of antibiotics.

Even if you are compelled to choose a cutoff for some reason, it is worthwhile to consider what error rates you can tolerate. Receiver Operating Characteristic (ROC) curves are a partial answer to that question, framing the choice of a cutoff as achieving a higher (lower) true positive rate at the cost of a higher (lower) false positive rate. This is also contextual & depends on the goals of the model and how it is applied.


answered Apr 24, 2023 at 15:35

Sycorax ♦


The question's focus on 0.5 conceals an important fact: every and any threshold applied to a continuous prediction implies some number of errors (false positives or false negatives). The question is not answerable in a vacuum, but instead depends on the cost of errors. It is important to consider the cost of an error alongside the probability of the error -- amputating a limb is dramatically different from administering an unnecessary dose of antibiotics.

Even if you are compelled to choose a cutoff for some reason, it is worthwhile to consider what error rates you can tolerate. Receiver Operating Characteristic (ROC) curves are a partial answer to that question, framing the choice of a cutoff as achieving a higher (lower) true positive rate at the cost of a higher (lower) false positive rate.
