Consider a dataset and two binary classes, CLASS_A and CLASS_B. These two classes are not necessarily independent or mutually exclusive. Let's say that CLASS_A = "buy an apple" and CLASS_B = "buy an orange". An observation can have both labels equal to 1, neither, or only one of the two.
Suppose we train a model such as XGBClassifier for each class separately and obtain two models: MODEL_A to predict CLASS_A and MODEL_B to predict CLASS_B. According to the documentation, the method predict_proba returns the probability of each example in X belonging to a given class.
My first question is: can I compare the predict_proba outputs of the two models for their respective classes? That is, if for observation x_i I get 0.4 from MODEL_A for CLASS_A and 0.2 from MODEL_B for CLASS_B, can I say that x_i is more likely to belong to CLASS_A than to CLASS_B? (Twice as likely?)
To clarify, I'm considering two different models, each one trained on a different binary target from the same training dataset, not a single multiclass model.
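To make the setup concrete, here is a minimal sketch of what I mean. It is only an illustration under loud assumptions: the data are synthetic, and a small hand-rolled logistic regression stands in for XGBClassifier so that the snippet runs without any dependencies. The helper names fit_logistic and predict_proba are hypothetical, not the XGBoost API. The two scores come from separately fitted models, so nothing forces them to sum to 1.

```python
import math
import random

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Synthetic data (assumption): one feature matrix and two binary labels
# that share a feature, so they are neither independent nor exclusive.
n = 500
X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(n)]
y_a = [1 if random.random() < sigmoid(1.5 * x[0]) else 0 for x in X]  # "buy an apple"
y_b = [1 if random.random() < sigmoid(x[0] + x[1] - 1.0) else 0 for x in X]  # "buy an orange"

def fit_logistic(X, y, lr=0.1, epochs=200):
    """Stand-in for XGBClassifier.fit: logistic regression by batch gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict_proba(model, x):
    """Return the model's estimate of P(label = 1 | x)."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Two separate binary models fitted on the same observations.
model_a = fit_logistic(X, y_a)  # MODEL_A for CLASS_A
model_b = fit_logistic(X, y_b)  # MODEL_B for CLASS_B

# The comparison I am asking about, for a single observation x_i.
x_i = X[0]
p_a = predict_proba(model_a, x_i)
p_b = predict_proba(model_b, x_i)
print(f"MODEL_A: {p_a:.3f}  MODEL_B: {p_b:.3f}  ratio: {p_a / p_b:.2f}")
```

The printed ratio is the quantity I would like to interpret as "how many times more likely CLASS_A is than CLASS_B" for x_i.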
My second question is: would the above still hold if the training sets are different? That is, each model is trained and scored on different observations and different variables, and I then get 0.4 for observation x_i from MODEL_A for CLASS_A and 0.2 for observation x_j from MODEL_B for CLASS_B.