
In some articles, it is said that KNN uses the Hamming distance for one-hot encoded categorical variables. Does the scikit-learn implementation of KNN work the same way?

Also, are there any other ways to handle categorical input variables when using KNN?


1 Answer


As stated in the docs, KNeighborsClassifier from scikit-learn uses the Minkowski distance by default.
Other metrics can be used; the docs for scikit-learn's DistanceMetric class give a good overview of what is available.
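As a minimal sketch of how this could look in practice (not taken from the post; the column names and toy data are invented): one-hot encode the categorical columns, then pass metric="hamming" to KNeighborsClassifier.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import OneHotEncoder

# Toy data with two categorical features (values are made up for illustration).
X = pd.DataFrame({
    "color": ["red", "blue", "red", "green"],
    "size": ["S", "M", "L", "M"],
})
y = [0, 1, 0, 1]

# One-hot encode the categorical columns into 0/1 indicator columns.
enc = OneHotEncoder(handle_unknown="ignore")
X_enc = enc.fit_transform(X).toarray()

# On binary indicator vectors, the Hamming distance is the fraction of
# positions that differ; brute-force neighbour search supports it directly.
knn = KNeighborsClassifier(n_neighbors=3, metric="hamming", algorithm="brute")
knn.fit(X_enc, y)

X_new = pd.DataFrame({"color": ["red"], "size": ["M"]})
print(knn.predict(enc.transform(X_new).toarray()))
```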

  • OK, the Minkowski distance can be used for continuous input variables, but what about discrete/categorical input variables? – Commented Jan 19, 2022 at 6:22
  • No free lunch, so it's hard to give a straight answer. But I would experiment with 'jaccard' and 'matching'. They are also in the docs, so should work like a charm (see the sketch after these comments). – Commented Jun 28, 2022 at 11:47
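As a rough illustration of the comment above (my own sketch, not part of the original answer), the "jaccard" metric can be passed the same way; whether it beats Hamming is data-dependent. The boolean matrix below is a made-up one-hot example.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy one-hot encoded matrix (boolean dtype so the set-based metrics apply).
X_bool = np.array([
    [1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 0, 1, 0, 1],
], dtype=bool)

# Jaccard distance ignores positions where both vectors are 0, which can
# behave differently from Hamming on sparse one-hot data.
nn = NearestNeighbors(n_neighbors=2, metric="jaccard", algorithm="brute")
nn.fit(X_bool)

distances, indices = nn.kneighbors(X_bool)
print(indices)    # nearest neighbours of each row (the row itself comes first)
print(distances)  # corresponding Jaccard distances
```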
