
In some articles, it is said that KNN uses the Hamming distance for one-hot encoded categorical variables. Does the scikit-learn implementation of KNN work the same way?

Also, are there any other ways to handle categorical input variables when using KNN?


1 Answer


As stated in the docs, KNeighborsClassifier from scikit-learn uses the Minkowski distance by default.
Other metrics can be used; the docs for scikit-learn's DistanceMetric class give a good overview of what is available.
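As a minimal sketch of how this could look in practice (not taken from the post; the column names and toy data are invented): one-hot encode the categorical columns, then pass metric="hamming" to KNeighborsClassifier.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import OneHotEncoder

# Toy data with two categorical features (values are made up for illustration).
X = pd.DataFrame({
    "color": ["red", "blue", "red", "green"],
    "size": ["S", "M", "L", "M"],
})
y = [0, 1, 0, 1]

# One-hot encode the categorical columns into 0/1 indicator columns.
enc = OneHotEncoder(handle_unknown="ignore")
X_enc = enc.fit_transform(X).toarray()

# On binary indicator vectors, the Hamming distance is the fraction of
# positions that differ; brute-force neighbour search supports it directly.
knn = KNeighborsClassifier(n_neighbors=3, metric="hamming", algorithm="brute")
knn.fit(X_enc, y)

X_new = pd.DataFrame({"color": ["red"], "size": ["M"]})
print(knn.predict(enc.transform(X_new).toarray()))
```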

  • OK, the Minkowski distance can be used for continuous input variables, but what about discrete/categorical input variables? – Commented Jan 19, 2022 at 6:22
  • No free lunch, so it's hard to give a straight answer. But I would experiment with 'jaccard' and 'matching'. They are also in the docs, so should work like a charm (see the sketch after these comments). – Commented Jun 28, 2022 at 11:47
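As a rough illustration of the comment above (my own sketch, not part of the original answer), the "jaccard" metric can be passed the same way; whether it beats Hamming is data-dependent. The boolean matrix below is a made-up one-hot example.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy one-hot encoded matrix (boolean dtype so the set-based metrics apply).
X_bool = np.array([
    [1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 0, 1, 0, 1],
], dtype=bool)

# Jaccard distance ignores positions where both vectors are 0, which can
# behave differently from Hamming on sparse one-hot data.
nn = NearestNeighbors(n_neighbors=2, metric="jaccard", algorithm="brute")
nn.fit(X_bool)

distances, indices = nn.kneighbors(X_bool)
print(indices)    # nearest neighbours of each row (the row itself comes first)
print(distances)  # corresponding Jaccard distances
```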
