2
$\begingroup$

I'm using SMO, Logistic Regression, Bayesian Network and Simple CART algorithms for classification. Results from WEKA:

| Algorithm | Sensitivity (%) | Specificity (%) | Overall accuracy (%) |
| --- | --- | --- | --- |
| Bayesian Network | 57.49 | 76.09 | 65.24 |
| Logistic Regression | 64.73 | 69.86 | 66.87 |
| SMO | 54.32 | 79.20 | 64.69 |
| Simple CART | 71.88 | 61.51 | 67.56 |

SMO gives the best result for my classification problem, since it correctly classifies 79.20% of the class that is important to me. I want to increase this accuracy by stacking, so I tried combining some of these classifiers. In most cases I could not increase the accuracy, but stacking SMO with Logistic Regression gave a small improvement.

How can I explain why stacking SMO with Logistic Regression works better than the other combinations?

Is there any generalization, such as "combining tree classifiers gives good results in stacking"? What should I pay attention to while stacking?

EDIT:

| | Bayesian Network | Logistic Reg. | SMO | CART |
| --- | --- | --- | --- | --- |
| Kappa statistic | 0.3196 | 0.3367 | 0.3158 | 0.3335 |
| Mean absolute error | 0.3517 | 0.4164 | 0.3531 | 0.4107 |
| Root mean squared error | 0.5488 | 0.4548 | 0.5942 | 0.4547 |
| Relative absolute error (%) | 72.3389 | 85.65 | 72.6299 | 84.477 |
| Root relative squared error (%) | 111.3076 | 92.2452 | 120.5239 | 92.2318 |
| Weighted avg. F-measure | 0.653 | 0.671 | 0.676 | 92.2318 |
| ROC area | 0.725 | 0.727 | 0.668 | 0.721 |

The total number of instances is 25106; 14641 of them belong to class a and 10465 to class b.

=== Confusion Matrix of Simple CART ===
     a     b   <-- classified as
 10524  4117 |  a = 0
  4028  6437 |  b = 1

=== Confusion Matrix of SMO ===
     a     b   <-- classified as
  7953  6688 |  a = 0
  2177  8288 |  b = 1

=== Confusion Matrix of Logistic Regression ===
     a     b   <-- classified as
  9477  5164 |  a = 0
  3154  7311 |  b = 1
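For reference, the sensitivity and specificity in the first table can be recomputed directly from these matrices; here is a minimal Python sketch of the arithmetic for SMO (treating class a as the positive class):

```python
# Sensitivity/specificity check from the SMO confusion matrix above
# (class a treated as "positive", class b as "negative").
tp, fn = 7953, 6688   # class a instances: correctly / incorrectly classified
fp, tn = 2177, 8288   # class b instances: incorrectly / correctly classified

sensitivity = tp / (tp + fn)                   # 7953 / 14641 ~= 0.5432
specificity = tn / (fp + tn)                   # 8288 / 10465 ~= 0.7920
accuracy    = (tp + tn) / (tp + fn + fp + tn)  # 16241 / 25106 ~= 0.6469

print(f"sensitivity={sensitivity:.4f}, specificity={specificity:.4f}, accuracy={accuracy:.4f}")
```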

Since SMO is successful on class b and CART is successful on class a, I tried to ensemble these two algorithms, but I could not increase the accuracy. Then I tried combining SMO with Logistic Regression, and the accuracy increased a little. Why is ensembling SMO with Logistic Regression better than ensembling SMO with CART? Is there any explanation?

$\endgroup$
2
  • $\begingroup$ In addition to the answer of @lollercoaster: I found the paper by Ludmila I. Kuncheva and Christopher J. Whitaker titled "Measures of Diversity in Classifier Ensembles and Their Relationship with the Ensemble Accuracy". I found it very explanatory about diversity. $\endgroup$ Commented Jul 26, 2015 at 15:37
  • $\begingroup$ what does SMO mean? $\endgroup$ Commented Apr 15, 2020 at 21:13

2 Answers

1
$\begingroup$

To directly answer your question about stacking: you should care about minimizing (1) bias and (2) variance. This is obvious, but in practice it often comes down to simply having models which are "diverse". (I apologize that the link is behind a paywall, but there are a few others like it and you may well find it other ways.)

You don't want ensembles of like-minded models - they will make the same mistakes and reinforce each other.

In the case of stacking, what is happening? You are letting the outputs of the probabilistic classifiers on the actual feature input become the new features. A diverse set of classifiers, each of which can in some way give a signal about edge cases, is desirable. If classifier 1 is terrible at classes A, B, and C but fantastic at class D, or at a certain edge case, it is still a good contribution to the ensemble.
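To make that concrete, here is a minimal sketch of the idea outside WEKA, using scikit-learn's StackingClassifier. The data is synthetic and the estimators only loosely mirror the ones in the question: SVC stands in for SMO, DecisionTreeClassifier for Simple CART, and GaussianNB is only a rough stand-in for the Bayesian network.

```python
# Minimal stacking sketch on synthetic data (replace with your own features/labels).
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

base_learners = [
    ("smo_like", SVC(probability=True, random_state=0)),      # stand-in for SMO
    ("logreg", LogisticRegression(max_iter=1000)),
    ("cart_like", DecisionTreeClassifier(random_state=0)),    # stand-in for Simple CART
    ("nb", GaussianNB()),                                     # rough stand-in for the Bayesian network
]

# The base learners' cross-validated predicted probabilities become the
# meta-features; a logistic regression combines them.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    stack_method="predict_proba",
    cv=5,
)

print("stacked accuracy:", cross_val_score(stack, X, y, cv=5, scoring="accuracy").mean())
```

The meta-learner only ever sees the base models' probability outputs, which is why diversity among those outputs matters more than the raw strength of any single base model.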

This is why neural nets are so good at what they do in image recognition - deep nets are in fact recursive logistic regression stacking ensembles! Nowadays people don't always use the sigmoid activation and there are many layer architectures, but it's the same general idea.

What I would recommend is trying to maximize the diversity of your ensemble by using some of the similarity metrics on the classifiers' prediction output vectors (e.g., Dietterich's kappa statistic) during training. Here is another good reference.
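As a sketch of how such a diversity check might look, Cohen's kappa between two classifiers' prediction vectors measures how often they agree beyond chance, so a lower pairwise kappa indicates a more diverse (and often more promising) pair to stack. The arrays below are placeholders; in practice you would export each classifier's per-instance predictions, e.g. via WEKA's "Output predictions" option.

```python
# Pairwise agreement (Cohen's kappa) between two classifiers' prediction vectors.
import numpy as np
from sklearn.metrics import cohen_kappa_score

preds_smo  = np.array([0, 1, 1, 0, 1, 0, 0, 1])   # placeholder predictions, same test instances
preds_cart = np.array([0, 1, 0, 0, 1, 1, 0, 1])   # placeholder predictions, same test instances

kappa = cohen_kappa_score(preds_smo, preds_cart)
print(f"pairwise kappa = {kappa:.3f}")  # lower agreement -> more diverse pair
```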

Hope that helps.

$\endgroup$
5
  • $\begingroup$ Thanks for your attention. I will try what you suggested and will get back to you. $\endgroup$ Commented Jul 10, 2015 at 6:11
  • $\begingroup$ Great. If you think that has answered your question, please mark it as such. In any case, best of luck. $\endgroup$ Commented Jul 10, 2015 at 15:09
  • $\begingroup$ I think I should ensemble SMO with CART according to your suggestion. I edited my question. $\endgroup$ Commented Jul 10, 2015 at 19:48
  • $\begingroup$ A few things: 1) Why not try each combination of the algorithms you have? 2) For that matter, why aren't you adding other algorithms? Random Forest and KNN come to mind as strong for difficult problems. 3) At some point stacking will not help: you'll need more examples, better features, and hyperparameter tuning. That goes beyond the scope of this post and is more than someone can answer in a comment. Stacking, unfortunately, is not a solution in itself, merely an augmentation to the basics. $\endgroup$ Commented Jul 10, 2015 at 22:29
  • $\begingroup$ Actually, these are the algorithms with which I had the best results, and I tried each combination of them. I didn't want to just say "I tried everything and these are the best results". I wanted to add some logical explanation, like: I selected Logistic Regression because it has ... properties, which SMO doesn't have, to improve SMO. Thanks for your answer. $\endgroup$ Commented Jul 10, 2015 at 22:42
0
$\begingroup$

Read the following by MLWave: http://mlwave.com/kaggle-ensembling-guide/

This is a very good starting point for stacking and ensembles.

$\endgroup$
1
  • $\begingroup$ SE discourages link-only answers -- summarize the content that is relevant to the answer? $\endgroup$ Commented Jul 7, 2015 at 8:23
