Nearest Neighbor Recommendation System w/ categorical variables

Question

I would like to build a recommendation system:

No ratings are available at the time of recommendation, so a purely context-based recommendation system is needed.
The input features are the answers to a questionnaire (all categorical).

My idea is the following:

Find the most similar users based on their questionnaire answers, using a suitable distance measure.
Use the past recommendations of these users, on the assumption that they are relevant and meaningful for the new user.

When choosing the encoding and distance measure, my problem is that all variables are categorical, ranging from binary questions to questions with 20 unique values. One-hot encoding has its drawbacks with multicollinearity, and I'm unsure about it because variables with 20 unique possibilities might get too strong an emphasis.

Does anyone have a recommendation for a possible approach? Thanks a lot!

Clustering Categorical Data using Gower distance (in Python): link
– Vladislav Gladkikh, Commented Oct 5, 2022 at 5:32

sconfluentus · Accepted Answer · 2022-10-05 03:23:56Z

In R there is a package called dprep, and it holds a magical method called knngow(). This is a KNN algorithm that uses the Gower distance (not a physical distance like Euclidean or Manhattan).

It is especially useful for nominal and ordinal variables that are stored as binary or multi-level factors, because it scores each variable on its own scale: every variable contributes a dissimilarity between 0 and 1, so a question with many levels does not automatically weigh more than a binary one.
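To make that concrete, here is a minimal sketch in Python (not the dprep code) of what the Gower dissimilarity reduces to when every variable is nominal: the fraction of questions on which two users answered differently. The function name gower_categorical and the sample answers are made up for illustration, and ordinal variables, which some implementations treat by rank, are not handled here.

```python
def gower_categorical(a, b):
    """Gower dissimilarity for two records with only nominal variables.

    Each variable contributes 0 if the two answers match and 1 otherwise;
    the result is the mean contribution, so it always lies in [0, 1].
    """
    if len(a) != len(b):
        raise ValueError("both records must answer the same questions")
    if not a:
        raise ValueError("records must contain at least one answer")
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches / len(a)


# Hypothetical questionnaire answers: one binary question and two
# multi-level questions. Each question counts once, whatever its
# number of possible values.
user_u = ["yes", "red", "A"]
user_v = ["no", "red", "B"]
distance_uv = gower_categorical(user_u, user_v)
print(distance_uv)
```

Because each question is scored as a single match or mismatch, no encoding step is needed and the number of levels does not inflate the distance.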

There is a dearth of good tutorials or information on it, but it is a solid step in the right direction for you because it solves the distance dilemma under the hood.

cran.r-project.org/web/packages/dprep/index.html: "Package ‘dprep’ was removed from the CRAN repository." Why?
– Vladislav Gladkikh, Commented Oct 5, 2022 at 2:58
