In all implementations of recommender systems I've seen so far, the train-test split is performed in this manner:
    +------+------+--------+
    | user | item | rating |
    +------+------+--------+
    | u1   | i1   |  2.3   |
    | u2   | i2   |  5.3   |
    | u1   | i4   |  1.0   |
    | u3   | i5   |  1.6   |
    | ...  | ...  |  ...   |
    +------+------+--------+

This is transformed into a rating matrix of the form:
    +------+-------+-------+-------+-------+-------+-----+
    | user | item1 | item2 | item3 | item4 | item5 | ... |
    +------+-------+-------+-------+-------+-------+-----+
    | u1   |  2.3  |  1.7  |  0.5  |  1.0  |  NaN  | ... |
    | u2   |  NaN  |  5.3  |  1.0  |  0.2  |  4.3  | ... |
    | u3   |  NaN  |  NaN  |  2.1  |  1.3  |  1.6  | ... |
    +------+-------+-------+-------+-------+-------+-----+

where NaN marks an item that the user has not rated.
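For concreteness, here is a minimal sketch of that long-to-wide transformation in pandas (the column names and toy values are mine, matching the tables above):

```python
import pandas as pd

# Long-format ratings, one (user, item, rating) triple per row.
ratings = pd.DataFrame({
    "user":   ["u1", "u2", "u1", "u3"],
    "item":   ["i1", "i2", "i4", "i5"],
    "rating": [2.3, 5.3, 1.0, 1.6],
})

# Pivot into the user x item rating matrix; unrated pairs become NaN.
rating_matrix = ratings.pivot(index="user", columns="item", values="rating")
```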
Now, from each row (user) of the matrix, a certain percentage of the numeric (non-NaN) values is removed and set aside into a new matrix, which serves as the test set. The model is then trained on the initial matrix with the test samples removed, and the goal of the recommender is to fill in the missing values with the smallest possible error.
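A minimal NumPy sketch of that per-user holdout (the function name and fraction are my own choices, not from any library):

```python
import numpy as np

rng = np.random.default_rng(0)

def rating_holdout_split(R, test_frac=0.25):
    """Move test_frac of each user's observed (non-NaN) ratings
    into a test matrix, replacing them with NaN in the train copy."""
    train = R.copy()
    test = np.full_like(R, np.nan)
    for u in range(R.shape[0]):
        observed = np.flatnonzero(~np.isnan(R[u]))
        n_test = int(round(test_frac * observed.size))
        held_out = rng.choice(observed, size=n_test, replace=False)
        test[u, held_out] = R[u, held_out]
        train[u, held_out] = np.nan
    return train, test

# The rating matrix from the example above.
R = np.array([[2.3,    1.7,    0.5, 1.0, np.nan],
              [np.nan, 5.3,    1.0, 0.2, 4.3],
              [np.nan, np.nan, 2.1, 1.3, 1.6]])
train, test = rating_holdout_split(R)
```

Each of the three users has four observed ratings here, so one rating per user ends up in the test matrix; train and test are disjoint by construction.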
My question is: can the train-test split instead be done user-wise? For example, keep a set of users separate, train the recommender on the remaining users, and then try to predict the ratings for the held-out users. I know this goes somewhat against the idea that "if a recommender does not know you, it cannot recommend something you like", but I am wondering whether something like k-NN could work here.
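To make the idea concrete, here is one way such a user-wise scheme could look as a sketch (all names and the similarity choice are mine, not from any particular library): hold out whole users, and for a held-out user with a few known ratings, score similarity to each training user on the co-rated items, then predict the unseen ratings as an average over the k nearest training users.

```python
import numpy as np

def knn_predict_for_new_user(train_R, new_ratings, k=2):
    """Predict the NaN entries of new_ratings (one held-out user's row)
    from the k training users most similar on co-rated items."""
    sims = np.full(train_R.shape[0], -np.inf)
    for u in range(train_R.shape[0]):
        both = ~np.isnan(train_R[u]) & ~np.isnan(new_ratings)
        if both.any():
            # Negative mean absolute difference as a simple similarity.
            sims[u] = -np.mean(np.abs(train_R[u, both] - new_ratings[both]))
    neighbours = np.argsort(sims)[-k:]  # indices of the k most similar users
    pred = new_ratings.copy()
    for i in np.flatnonzero(np.isnan(new_ratings)):
        vals = train_R[neighbours, i]
        vals = vals[~np.isnan(vals)]
        if vals.size:
            pred[i] = vals.mean()
    return pred

# Three training users, one held-out user who has only rated item 0.
train_R = np.array([[5.0, 4.0, 1.0],
                    [4.0, 5.0, 2.0],
                    [1.0, 1.0, 5.0]])
new_user = np.array([5.0, np.nan, np.nan])
pred = knn_predict_for_new_user(train_R, new_user, k=2)
```

Note this still needs at least a few known ratings for the new user to compute similarities; with zero ratings it degenerates to the cold-start problem the quoted objection describes.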