I am taking a machine learning course, and the lecture slides contain a claim that seems to contradict the recommended book.
The problem is as follows. There are three classifiers:
- classifier A providing better performance in the lower range of the thresholds,
- classifier B providing better performance in the higher range of the thresholds,
- classifier C, which we get by flipping a p-coin (a coin that lands heads with probability p) and selecting one of the two classifiers accordingly.
What will be the performance of classifier C, as viewed on a ROC curve?
The lecture slides state that just by flipping this coin, we get the magical "convex hull" of classifier A's and B's ROC curves.
I don't understand this point. How can we gain information just by flipping a coin?
The lecture slide

What the book says
The recommended book (Data Mining... by Ian H. Witten, Eibe Frank and Mark A. Hall) on the other hand states that:
To see this, choose a particular probability cutoff for method A that gives true and false positive rates of tA and fA, respectively, and another cutoff for method B that gives tB and fB. If you use these two schemes at random with probabilities p and q, where p + q = 1, then you will get true and false positive rates of p . tA + q . tB and p . fA + q . fB. This represents a point lying on the straight line joining the points (tA, fA) and (tB, fB), and by varying p and q you can trace out the whole line between these two points.
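The arithmetic in this passage can be checked with a small simulation. This is only a sketch: the function names and the operating points (0.6, 0.1) and (0.9, 0.5) are made up for illustration, and each trial assumes one positive and one negative example whose outcomes follow the chosen method's rates.

```python
import random


def mixed_rates(t_a, f_a, t_b, f_b, p):
    """Expected (true positive rate, false positive rate) when method A is
    used with probability p and method B with probability q = 1 - p."""
    q = 1.0 - p
    return p * t_a + q * t_b, p * f_a + q * f_b


def simulate_mixed_rates(t_a, f_a, t_b, f_b, p, n=100_000, seed=0):
    """Monte Carlo estimate of the same two rates.

    Each trial flips the p-coin to pick a method, then draws one positive
    and one negative example that the method labels according to its rates.
    """
    rng = random.Random(seed)
    true_pos = false_pos = 0
    for _ in range(n):
        t, f = (t_a, f_a) if rng.random() < p else (t_b, f_b)
        true_pos += rng.random() < t   # positive example flagged positive
        false_pos += rng.random() < f  # negative example flagged positive
    return true_pos / n, false_pos / n


# Made-up operating points, for illustration only.
expected = mixed_rates(0.6, 0.1, 0.9, 0.5, p=0.25)
estimated = simulate_mixed_rates(0.6, 0.1, 0.9, 0.5, p=0.25)
print(expected, estimated)
```

The simulated rates should land close to the closed-form ones, which is the book's point: the randomized scheme sits on the straight line between the two chosen operating points.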
In my understanding, what the book says is that to actually gain information and reach the convex hull we need to do something more advanced than simply flipping a p-coin.
AFAIK, the correct way (as suggested by the book) is the following:
- we should find an optimal threshold Oa for classifier A
- we should find an optimal threshold Ob for classifier B
Then define C as follows:
- If t < Oa, use classifier A with t
- If t > Ob, use classifier B with t
- If Oa < t < Ob, pick randomly between classifier A with threshold Oa and classifier B with threshold Ob, with a probability that depends linearly on where t lies between Oa and Ob.
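The three cases above could be sketched as follows. This is my reading of the procedure, not code from the book or the slides: `classify_c`, `score_a`, and `score_b` are hypothetical names, a prediction is assumed positive when a score is at least the threshold, Oa is assumed smaller than Ob, and the boundary cases t = Oa and t = Ob (which the description leaves open) are assigned to A and B respectively.

```python
import random


def classify_c(score_a, score_b, t, o_a, o_b, rng=random):
    """Predict one example with classifier C (assumes o_a < o_b).

    score_a, score_b: scores that classifiers A and B give the example.
    t: requested threshold; o_a, o_b: the chosen thresholds Oa and Ob.
    """
    if t <= o_a:
        # Low range: use classifier A at the requested threshold.
        return score_a >= t
    if t >= o_b:
        # High range: use classifier B at the requested threshold.
        return score_b >= t
    # In between: weight grows linearly from 0 at Oa to 1 at Ob.
    w = (t - o_a) / (o_b - o_a)
    if rng.random() < w:
        return score_b >= o_b  # B at its fixed threshold Ob
    return score_a >= o_a      # A at its fixed threshold Oa


# Example call with made-up scores and thresholds.
print(classify_c(score_a=0.5, score_b=0.1, t=0.45, o_a=0.4, o_b=0.6))
```

Note that in the middle branch the thresholds actually applied are always Oa or Ob, never t itself.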
Is this correct? If yes, there are a few key differences compared to what the slides suggest.
- It's not a simple coin flip but a more involved algorithm, which needs manually chosen points and picks a classifier based on which region t falls into.
- It never uses classifier A or B with threshold values between Oa and Ob.
Can you explain this problem to me, and the correct way to understand it if my understanding is wrong?
What would happen if we simply flipped a p-coin, as the slides suggest? I would expect a ROC curve that lies between A and B, but is never "better" than the better of the two at any given point.
I really don't see how the slides could be correct. The probabilistic calculation on the left-hand side of the slide doesn't make sense to me.
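One way to probe the coin-flip question numerically is with made-up ROC points. The sketch below is an illustration under assumptions, not the slides' or the book's procedure: `roc_a` and `roc_b` are invented operating points written as (false positive rate, true positive rate), which is the usual ROC axis order and the reverse of the book's (t, f) notation. It builds the upper convex hull of all points and mixes one point from A with one from B using a p-coin.

```python
def cross(o, a, b):
    """Cross product of vectors o->a and o->b; negative means a clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def upper_hull(points):
    """Upper convex hull of ROC points, with (0, 0) and (1, 1) added."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for pt in pts:
        # Drop earlier points that fall on or below the new upper edge.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], pt) >= 0:
            hull.pop()
        hull.append(pt)
    return hull


def hull_tpr(hull, fpr):
    """True positive rate of the hull at a given false positive rate."""
    best = 0.0
    for (x0, y0), (x1, y1) in zip(hull, hull[1:]):
        if x0 <= fpr <= x1:
            if x1 == x0:  # vertical edge: take its top end
                best = max(best, y0, y1)
            else:         # linear interpolation along the edge
                best = max(best, y0 + (y1 - y0) * (fpr - x0) / (x1 - x0))
    return best


def mix(point_a, point_b, p):
    """Operating point when A's point is used with probability p, B's otherwise."""
    q = 1.0 - p
    return (p * point_a[0] + q * point_b[0], p * point_a[1] + q * point_b[1])


# Invented operating points, for illustration only.
roc_a = [(0.1, 0.6), (0.3, 0.7), (0.6, 0.8)]
roc_b = [(0.1, 0.3), (0.3, 0.65), (0.6, 0.95)]
hull = upper_hull(roc_a + roc_b)
print(hull)
print(mix(roc_a[0], roc_b[2], p=0.6), hull_tpr(hull, 0.3))
```

With these invented points, mixing A's point (0.1, 0.6) with B's point (0.6, 0.95) using p = 0.6 gives a false positive rate of 0.3 and a true positive rate of about 0.74, which is above both A's 0.7 and B's 0.65 at that rate. No mixture of one point from each classifier ends up above the hull, and the mixture that reaches it combines two different fixed operating points rather than one shared threshold.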
Update: I found the article written by the original author who invented the convex hull method: http://www.bmva.org/bmvc/1998/pdf/p082.pdf

