The lecture slides are right.
Now, it might happen that these two classifiers are too extreme for our needs; we'd like both types of errors to have similar weights. Instead of using classifier A (orange dot) or B (blue dot), we'd prefer to attain a performance that is in between them. As the course says, one can attain that result by just flipping a coin and choosing one of the classifiers at random.
Just by flipping a coin, how can we gain information?
We don't gain information. Our new randomized classifier is not simply "better" than A or B; its performance is sort of an average of A and B, with respect to the costs assigned to each type of error. That may or may not be beneficial to us, depending on what our costs are.
AFAIK, the correct way (as suggested by the book) is the following ... Is this correct?
Not really. The correct way is simply: flip a coin with probability $p$, choose a classifier (the optimal A or the optimal B), and classify the observed sample using that classifier.
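A minimal sketch of this coin-flip procedure (the classifier functions, thresholds, and numbers below are made up for illustration; they are not from the course):

```python
import random

def randomized_classifier(x, clf_a, clf_b, p):
    """With probability p classify x using clf_a, otherwise using clf_b."""
    return clf_a(x) if random.random() < p else clf_b(x)

# Two hypothetical threshold classifiers on a scalar feature,
# just to illustrate the coin flip mechanism.
clf_a = lambda x: x > 0.7   # conservative: accepts rarely
clf_b = lambda x: x > 0.3   # eager: accepts often

random.seed(0)
x, p = 0.5, 0.4             # a point where A rejects and B accepts
n = 100_000
accepts = sum(randomized_classifier(x, clf_a, clf_b, p) for _ in range(n))
print(accepts / n)          # close to 1 - p = 0.6: in between A and B
```

Over many samples, the acceptance rate sits between the two individual classifiers' decisions whenever they disagree, which is exactly how the randomized classifier interpolates their error rates.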
Let me change the notation a little (you use $t$ for the measured value and $t_A$ for the true-positive rate, which can be confusing). Let $x$ be the measured value.
Method A has an "optimal point" that gives true- and false-positive rates (TPA and FPA in the graph, respectively). This point corresponds to a threshold or, more generally[*], to an optimal decision boundary that determines an "optimal acceptance/rejection region" for A. The same goes for B. (But the thresholds and the boundaries are not related.)
The answer to your first question is basically yes, except that the probability of the coin is (in some sense) arbitrary. The final classifier would be:
- If $x$ belongs to the "optimal acceptance region" for A (conservative), use classifier A (i.e., accept it).
- If $x$ belongs to the "optimal rejection region" for B (eager), use classifier B (i.e., reject it).
- Elsewhere, flip a coin with probability $p$ and use classifier A or B.
(Corrected: actually, the lecture slides are completely right; we can just flip the coin in any case. See diagrams.)
You can use any fixed $p$ in the range $(0,1)$; it depends on whether you want to be more or less conservative, i.e., whether you want to be nearer to one of the points or in the middle.
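Concretely, the coin-flip classifier's operating point is the convex combination $p\,(\mathrm{TPR}_A, \mathrm{FPR}_A) + (1-p)\,(\mathrm{TPR}_B, \mathrm{FPR}_B)$, so you can solve for the $p$ that hits a target rate. A sketch with made-up operating points for A and B:

```python
def mix_rates(rate_a, rate_b, p):
    """Operating point of the coin-flip classifier: a convex combination."""
    return p * rate_a + (1 - p) * rate_b

def p_for_target_tpr(tpr_a, tpr_b, target):
    """Solve target = p*tpr_a + (1-p)*tpr_b for p (needs tpr_a != tpr_b)."""
    return (target - tpr_b) / (tpr_a - tpr_b)

# Made-up operating points for methods A and B.
tpr_a, fpr_a = 0.60, 0.10   # conservative A
tpr_b, fpr_b = 0.95, 0.50   # eager B

p = p_for_target_tpr(tpr_a, tpr_b, target=0.80)
print(round(p, 4))                           # 0.4286
print(round(mix_rates(fpr_a, fpr_b, p), 4))  # resulting FPR: 0.3286
```

This is just the geometry of the segment joining A and B on the ROC plot: every fixed $p$ picks one point on it.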
[*] You should be general here: if you think in terms of a single scalar threshold, all this makes little sense; a one-dimensional feature with a threshold-based classifier does not give you enough degrees of freedom to have two different classifiers, A and B, that perform along different curves as the free parameter (decision boundary = threshold) varies. In other words, A and B are called "methods" or "systems", not "classifiers", because A is a whole family of classifiers, parametrized by some (scalar) parameter that determines a decision boundary, which is not just a scalar threshold.