k-NN generalizes only in a very restrictive sense: it relies solely on a smoothness prior (the continuity assumption), which says that patterns close in feature space are likely to belong to the same class. No functional regularity in the pattern distribution can be recovered by k-NN.
Thus, it requires a representative training sample, which can be extremely large, especially in high-dimensional feature spaces. Worse, such a sample may simply be unavailable. Consequently, k-NN cannot learn invariants. If patterns can be subjected to certain transformations without changing their labels, and the training sample does not contain patterns transformed in all admissible ways, k-NN will never recognize transformed patterns that were not presented during training. This is true, e.g., for shifted or rotated images, unless they are converted to some invariant representation before running k-NN. k-NN cannot even abstract away irrelevant features.
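A minimal sketch of the shift-invariance problem. Here the patterns, labels, and class names are all hypothetical, chosen only for illustration: a shifted copy of a training pattern ends up no closer (in Euclidean distance) to its own class than to the other class, so k-NN gains nothing from the fact that the shift preserves the label.

```python
import numpy as np

# Hypothetical "patterns": spike vectors; shifting a pattern preserves its label.
a = np.array([0.0, 1.0, 0.0, 0.0, 0.0])   # class A, as seen in training
b = np.array([1.0, 0.0, 0.0, 0.0, 0.0])   # class B, as seen in training
a_shifted = np.roll(a, 2)                  # still class A, but never seen in training

# Euclidean distances a k-NN classifier would compare:
d_to_a = np.linalg.norm(a_shifted - a)
d_to_b = np.linalg.norm(a_shifted - b)

# Both distances are sqrt(2): the shifted pattern is equidistant from both
# training patterns, so k-NN cannot tell that it belongs to class A.
print(d_to_a, d_to_b)
```

In raw pixel space the label-preserving shift looks like an arbitrary jump, which is why such patterns must be mapped to an invariant representation first.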
Another, somewhat artificial, example is the following. Imagine that patterns belonging to different classes are distributed periodically (e.g., according to the sign of a sine: where sin(x) < 0 the patterns belong to one class, and where sin(x) > 0 to the other). The training set is finite, so it occupies a finite region, and outside this region the recognition error approaches 50%. One can imagine a logistic regression with periodic basis functions that would perform much better in this case. Other methods can likewise learn other regularities in pattern distributions and extrapolate well.
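The periodic example above can be sketched as follows. This is a toy simulation under assumed settings (the region sizes, sample counts, and the hand-rolled 1-NN are all illustrative): 1-NN trained on a finite region is near chance far outside it, while a classifier that thresholds the periodic basis feature sin(x), the decision a logistic regression on that feature would converge to, classifies perfectly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Labels follow the sign of sin(x): class 1 where sin(x) > 0, else class 0.
def label(x):
    return (np.sin(x) > 0).astype(int)

# Training data confined to a finite region [0, 4*pi].
X_train = rng.uniform(0, 4 * np.pi, 200)
y_train = label(X_train)

# Test data far outside the training region.
X_test = rng.uniform(20 * np.pi, 24 * np.pi, 200)
y_test = label(X_test)

# 1-NN: each test point takes the label of its nearest training point.
def one_nn_predict(X_train, y_train, X_test):
    idx = np.abs(X_test[:, None] - X_train[None, :]).argmin(axis=1)
    return y_train[idx]

knn_acc = (one_nn_predict(X_train, y_train, X_test) == y_test).mean()

# Thresholding the periodic basis feature sin(x) at 0 recovers the true rule.
periodic_acc = ((np.sin(X_test) > 0).astype(int) == y_test).mean()

print(f"1-NN accuracy outside training region: {knn_acc:.2f}")   # near 0.5
print(f"Periodic-feature accuracy: {periodic_acc:.2f}")          # 1.00
```

All test points lie to the right of the training region, so 1-NN assigns them all the label of the largest training point; its accuracy is just the base rate of that class, about 50%.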
So, if one suspects that the available data set is not representative, or invariance to some transformations of the patterns must be achieved, then this is a case in which one should move beyond k-NN.