
Consider a dataset and two binary classes, CLASS_A and CLASS_B. These two classes are not necessarily independent: say CLASS_A = "buy an apple" and CLASS_B = "buy an orange". An observation can have both = 1, neither, or only one of the two.

Suppose we train a model such as XGBClassifier for each class (separately) and obtain two models: MODEL_A to predict CLASS_A and MODEL_B to predict CLASS_B. Using the predict_proba method, according to the documentation we obtain the "probability of each X example being of a given class."

My first question is: can I compare the outputs of predict_proba of the two models and say that if, for observation x_i, MODEL_A gives 0.4 for CLASS_A and MODEL_B gives 0.2 for CLASS_B, then x_i is more likely to belong to CLASS_A than to CLASS_B? (Twice as likely?)

To clarify, I'm considering two different models, each applied to a different binary output of the same training dataset, not a single multiclass model.
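For concreteness, here is a minimal sketch of the setup I mean (synthetic data; all names such as y_apple and model_a are made up for illustration):

    import numpy as np
    from xgboost import XGBClassifier

    rng = np.random.default_rng(0)
    n = 1000
    X = rng.normal(size=(n, 5))                               # shared feature matrix
    y_apple = (X[:, 0] + rng.normal(size=n) > 0).astype(int)  # CLASS_A = "buy an apple"
    y_orange = (X[:, 1] + rng.normal(size=n) > 0).astype(int) # CLASS_B = "buy an orange"

    model_a = XGBClassifier().fit(X, y_apple)    # MODEL_A
    model_b = XGBClassifier().fit(X, y_orange)   # MODEL_B

    x_i = X[:1]                                  # a single observation, shape (1, 5)
    p_a = model_a.predict_proba(x_i)[0, 1]       # P(CLASS_A = 1) for x_i
    p_b = model_b.predict_proba(x_i)[0, 1]       # P(CLASS_B = 1) for x_i
    print(p_a, p_b)  # e.g. 0.4 vs 0.2 -- is x_i "twice as likely" to be CLASS_A?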

My second question is: would the above also hold if the training set is different? Meaning the training and scoring are done on different observations and different variables, and then I get 0.4 for observation x_i from MODEL_A with CLASS_A and 0.2 for observation x_j from MODEL_B with CLASS_B.

  • Cross-posted at stats.stackexchange.com/q/658423/232706 — Commented Dec 8, 2024 at 3:46
  • In the first setting, you mean CLASS_A and CLASS_B are independent(ish)? Any row can be both A and B, just A, just B, or neither? (I had assumed A was "not B".) — Commented Dec 9, 2024 at 22:23
  • No, I'm not assuming that they are independent: say CLASS_A = "buy an apple" and CLASS_B = "buy an orange". An observation can have both = 1, neither, or only one of the two. I'm editing the question accordingly. — Commented Dec 10, 2024 at 9:00

1 Answer


In both settings, the answer is "kinda". If your models are "well-calibrated" (and that's the terminology you should search on to do further research), then the predicted probabilities really do represent some sort of decent estimate of the true probabilities, and you can compare them that way. But that's often notably untrue of gradient boosted trees, and calibration depends heavily on the class prevalence in your training datasets, so a different set of observations may well make the calibration of the two models significantly different.
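One rough way to check this is scikit-learn's calibration_curve on held-out data. A sketch, assuming model_a, X_test, and y_test_a are your fitted model and a held-out split (the names are illustrative):

    from sklearn.calibration import calibration_curve

    # Raw predicted probabilities for CLASS_A on held-out data.
    p_a = model_a.predict_proba(X_test)[:, 1]

    # Bin the predictions and compare the mean prediction in each bin with the
    # observed fraction of positives; well-calibrated models track the diagonal,
    # i.e. a predicted 0.4 corresponds to roughly 40% positives in that bin.
    frac_pos, mean_pred = calibration_curve(y_test_a, p_a, n_bins=10)
    for mp, fp in zip(mean_pred, frac_pos):
        print(f"predicted ~{mp:.2f} -> observed {fp:.2f}")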

  • Thank you for your answer, but I'm not sure I got it. In the first setting, is it a yes? To clarify, I'm considering two different models, each applied to a different binary output of the same training dataset, not a single multiclass model. Regarding the second answer, I understand that the claim holds only if the results are "well-calibrated", which is something I will search on, and that it may not even be feasible with gradient boosted trees. Is that it? — Commented Dec 9, 2024 at 21:58
  • You've got it on the second one. There are degrees of calibration; GBMs tend to be overconfident, pushing predicted probabilities toward 0 and 1, but sometimes they're close enough that I think the comparisons you're after would be OK. — Commented Dec 9, 2024 at 22:22
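If the raw probabilities do turn out to be overconfident, one standard option is to wrap the model in scikit-learn's CalibratedClassifierCV. A sketch, where X_train and y_train_a are assumed training splits rather than anything from the thread:

    from sklearn.calibration import CalibratedClassifierCV
    from xgboost import XGBClassifier

    # Fit the GBM inside a cross-validated calibrator; isotonic regression then
    # maps the raw (possibly overconfident) scores to calibrated probabilities.
    calibrated_a = CalibratedClassifierCV(XGBClassifier(), method="isotonic", cv=5)
    calibrated_a.fit(X_train, y_train_a)            # assumed training split for CLASS_A
    p_a = calibrated_a.predict_proba(X_test)[:, 1]  # calibrated P(CLASS_A = 1)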
