Which algorithm should I choose and why?

Question

My friend was reading a textbook and had this question:

Suppose that you observe $(X_1,Y_1),\ldots,(X_{100},Y_{100})$, which you assume to be i.i.d. copies of a random pair $(X,Y)$ taking values in $\mathbb{R}^2 \times \{1,2\}$. You plot the data and see the following:

where black circles represent those $X_i$ with $Y_i=1$ and the red triangles represent those $X_i$ with $Y_i=2$. A practitioner tells you that their misclassification costs are equal, $c_1 = c_2 = 1$, and would like advice on which algorithm to use for prediction. Given the options:

Linear discriminant analysis;
K-nearest neighbours with $K=5$;
K-nearest neighbours with $K=90$.

Which of these would be the best algorithm here? I think it should be K-nearest neighbours with $K=5$, since I'd expect the accuracy to get worse as $K$ gets bigger. What would be your choice and why?

stans · Accepted Answer · 2021-01-21 08:49:15Z

You can choose the optimal method using cross-validation. If your sample size is relatively small, use leave-one-out cross-validation... I would not be surprised if $K = 5$ worked well. Linear discriminant analysis (LDA) will not work here, because it implies a linear decision boundary, unless you enlarge the set of predictors with non-linear transformations.
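As a rough illustration of the leave-one-out comparison described above, here is a minimal pure-Python sketch. The plot from the question is not available, so the data are a hypothetical stand-in (class 1 inside the unit disc, class 2 on a surrounding ring); that layout, and the helper names `make_blob_and_ring`, `knn_predict`, `lda_predict` and `loocv_error`, are assumptions made for the example, not anything taken from the textbook.

```python
import math
import random


def make_blob_and_ring(n_per_class=50, seed=0):
    """Hypothetical stand-in for the missing plot (an assumption, not the real data):
    class 1 fills the unit disc, class 2 lies on a ring of radius 4 to 5 around it."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_per_class):
        r, a = rng.uniform(0.0, 1.0), rng.uniform(0.0, 2.0 * math.pi)
        X.append((r * math.cos(a), r * math.sin(a)))
        y.append(1)
    for i in range(n_per_class):
        r, a = rng.uniform(4.0, 5.0), 2.0 * math.pi * i / n_per_class
        X.append((r * math.cos(a), r * math.sin(a)))
        y.append(2)
    return X, y


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training points; ties go to class 1."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = [train_y[i] for i in order[:k]]
    return 1 if votes.count(1) >= votes.count(2) else 2


def lda_predict(train_X, train_y, x):
    """Two-class LDA in 2D with a pooled covariance, hence a linear boundary."""
    groups = {c: [p for p, t in zip(train_X, train_y) if t == c] for c in (1, 2)}
    means = {c: (sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g))
             for c, g in groups.items()}
    # Pooled within-class covariance matrix [[a, b], [b, d]].
    a = b = d = 0.0
    for c, g in groups.items():
        for p in g:
            dx, dy = p[0] - means[c][0], p[1] - means[c][1]
            a += dx * dx
            b += dx * dy
            d += dy * dy
    dof = len(train_X) - 2
    a, b, d = a / dof, b / dof, d / dof
    det = a * d - b * b
    scores = {}
    for c, g in groups.items():
        mx, my = means[c]
        # w = Sigma^{-1} mu_c, using the closed-form inverse of a 2x2 matrix.
        wx = (d * mx - b * my) / det
        wy = (a * my - b * mx) / det
        prior = math.log(len(g) / len(train_X))
        scores[c] = wx * x[0] + wy * x[1] - 0.5 * (wx * mx + wy * my) + prior
    return 1 if scores[1] >= scores[2] else 2


def loocv_error(X, y, predict):
    """Leave-one-out error: refit on all points but one, test on the held-out one."""
    errors = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        errors += predict(train_X, train_y, X[i]) != y[i]
    return errors / len(X)


X, y = make_blob_and_ring()
results = {
    "LDA": loocv_error(X, y, lda_predict),
    "KNN, K=5": loocv_error(X, y, lambda tx, ty, x: knn_predict(tx, ty, x, 5)),
    "KNN, K=90": loocv_error(X, y, lambda tx, ty, x: knn_predict(tx, ty, x, 90)),
}
for name, err in results.items():
    print(f"{name}: LOOCV error = {err:.2f}")
```

On this particular synthetic layout, $K=5$ classifies every held-out point correctly, while $K=90$ misclassifies the whole ring class (LOOCV error 0.5): with only 99 training points, a 90-neighbour vote is close to a vote over the entire sample. Real data with a different shape or class balance could rank the three methods differently, which is why the cross-validation step matters.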

Also, the picture above is a classic case where support vector machines (SVM) with a Gaussian kernel could be of use. R has a friendly implementation of SVM in the "kernlab" package.

Hi. But why would I not use $K=90$? – Slim Shady, Jan 21, 2021 at 8:50

I did not say you shouldn't. What I meant: use cross-validation to decide. – stans, Jan 21, 2021 at 8:51

No, I mean my question was, given you only have this graph and you had to choose between $K=90$ or $K=5$, what would you choose? And why? – Slim Shady, Jan 21, 2021 at 8:52

Why would you base your decision on the graph only? Is this a homework problem? – stans, Jan 21, 2021 at 9:01

I personally wouldn't decide only based on a graph, but the question is from a textbook. It's not homework though! I'd like to hear a good explanation as to why someone would choose one algo over the other, given they only have this graph :) – Slim Shady, Jan 21, 2021 at 9:04
