The approach you are considering is essentially a multi-class SVM, specifically the one-versus-the-rest approach.
Here is how I would describe the problem. The support vector machine, for example, is fundamentally a two-class classifier.
In practice, however, we often have to tackle problems involving K > 2 classes. Various methods have therefore been proposed for combining multiple two-class SVMs in order to build a multi-class classifier.
One commonly used approach (Vapnik, 1998) is to construct K separate SVMs, in which the kth model y_k(x) is trained using the data from class C_k as the positive examples and the data from the remaining K − 1 classes as the negative examples. This is known as the one-versus-the-rest approach, and a new input x is assigned to the class whose classifier responds most strongly:

y(x) = max_k y_k(x)
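To make the construction concrete, here is a minimal sketch in Python using scikit-learn; the dataset, kernel, and library choice are my assumptions, not something from your question:

```python
# Minimal sketch of one-versus-the-rest: K separate binary SVMs,
# prediction by the largest decision value.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
classes = np.unique(y)  # K = 3 here

# Train the kth model on class C_k vs. the remaining K - 1 classes.
models = []
for k in classes:
    clf = SVC(kernel="linear")
    clf.fit(X, (y == k).astype(int))  # 1 = positive class C_k, 0 = the rest
    models.append(clf)

# y(x) = max_k y_k(x): assign x to the class with the largest response.
scores = np.column_stack([m.decision_function(X) for m in models])
y_pred = classes[np.argmax(scores, axis=1)]
print("training accuracy:", np.mean(y_pred == y))
```

Note that scikit-learn already ships this wiring as `sklearn.multiclass.OneVsRestClassifier`, which is usually preferable to hand-rolling it; the loop above is only meant to mirror the description.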
Unfortunately, this heuristic approach suffers from the problem that the different classifiers were trained on different tasks, and there is no guarantee that the real-valued quantities y_k(x) for different classifiers will have appropriate scales.
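A common mitigation for the scale problem (standard practice, though not part of the excerpt above) is to calibrate each classifier's output into a probability via Platt scaling, so the K scores become directly comparable. Continuing from the snippet above:

```python
# Sketch: Platt-scale each binary SVM so its output is P(class k | x),
# putting all K scores on a common [0, 1] scale.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.svm import LinearSVC

calibrated = [
    CalibratedClassifierCV(LinearSVC(), method="sigmoid", cv=3)
    .fit(X, (y == k).astype(int))
    for k in classes
]
probs = np.column_stack([m.predict_proba(X)[:, 1] for m in calibrated])
y_pred_cal = classes[np.argmax(probs, axis=1)]
```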
Another problem with the one-versus-the-rest approach is that the training sets are imbalanced. For instance, if we have ten classes each with equal numbers of training data points, then the individual classifiers are trained on data sets comprising 90% negative examples and only 10% positive examples, and the symmetry of the original problem is lost.
This is most likely why you are getting poor accuracy.
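If you want to stay with one-versus-the-rest despite these issues, one common fix for the imbalance (again a sketch, assuming scikit-learn) is to reweight the two classes inside each binary subproblem:

```python
# Sketch: counter the 90%/10% imbalance in each binary subproblem by
# penalizing mistakes on the rare positive class more heavily.
from sklearn.svm import SVC

# class_weight="balanced" sets each class's weight inversely proportional
# to its frequency, restoring some of the lost symmetry.
clf = SVC(kernel="linear", class_weight="balanced")
clf.fit(X, (y == 0).astype(int))  # e.g., the class-0-vs-rest subproblem
```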
PS: In most cases, accuracy is not a good measure for evaluating a classifier, particularly when the classes are imbalanced.
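Reusing `y` and `y_pred` from the first snippet, a per-class report and a confusion matrix give a much clearer picture than a single accuracy number:

```python
# Per-class precision/recall/F1 plus the confusion matrix reveal which
# classes are being sacrificed, which a single accuracy figure hides.
from sklearn.metrics import classification_report, confusion_matrix

print(classification_report(y, y_pred))
print(confusion_matrix(y, y_pred))
```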
References:
- Vapnik, V. (1998). Statistical Learning Theory. Wiley-Interscience, New York.
- Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.