Is it possible to do much better than the 2.8% test error on MNIST using Random Forests?
Probably, yes - but probably not with the raw features you get by default. Decision trees generally don't work well on high-dimensional problems like this, because each split considers only one feature at a time. Random Forests extend the usefulness of decision trees, but they inherit the same limitation. Beating 2.8% with a Random Forest will probably require some feature pre-processing to transform the raw pixels into a more useful representation.
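A minimal sketch of the kind of pre-processing I mean (my own illustration, not a recipe guaranteed to beat 2.8%): project the 784 raw pixel values onto a smaller set of PCA components before fitting the forest, so each split acts on a combination of pixels rather than a single one. The 50 components and 500 trees below are arbitrary choices you would want to tune.

```python
from sklearn.datasets import fetch_openml
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Load MNIST (70,000 images, 784 raw pixel features each).
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=10000, random_state=0)

# PCA mixes many pixels into each component, so a single tree split now
# separates the data along a learned direction instead of one pixel value.
clf = make_pipeline(
    PCA(n_components=50),                       # arbitrary, tune as needed
    RandomForestClassifier(n_estimators=500, n_jobs=-1, random_state=0),
)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```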
Neural Networks and kernel SVMs are implicitly doing some of that feature transformation/engineering for you. So in some sense it's impressive that Random Forests get decently close without any extra work (indeed, the real reason RF became popular is that it's stupidly easy to get "good enough" results).
I thought the general consensus was that Random Forests are usually at least as good as kernel SVMs.
There is no such consensus. They are very different algorithms with different strengths and weaknesses: on many problems their accuracies are similar, on others SVMs win by a good margin, and on some Random Forests win by a good margin.
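A rough way to see this for yourself (my own sketch, not from the discussion above) is to cross-validate both models on the same data; which one wins depends heavily on the dataset and on tuning. The hyperparameters below are arbitrary defaults.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Small built-in digits dataset keeps the comparison quick to run.
X, y = load_digits(return_X_y=True)

rf = RandomForestClassifier(n_estimators=300, random_state=0)
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10, gamma="scale"))

# 5-fold cross-validated accuracy for each model.
for name, model in [("random forest", rf), ("rbf-kernel svm", svm)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```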