If my dataframe looks like this:
    user  item  property_1  property_2  property_3  rating
    u1    i1    90.2        0           NaN         0
    u1    i2    80.2        1           0.90        1
    u1    i3    70.2        1           NaN         1
    u2    i2    80.2        1           0.90        0
    u2    i4    80.4        0           0.10        1
    u3    i1    90.2        0           NaN         1
    u3    i4    80.4        0           0.10        1
    u3    i5    93.9        1           0.33        0
    u3    i6    90.9        0           0.55        0
    u4    i1    90.2        0           NaN         0
    u4    i6    90.9        0           0.55        1
    u4    i7    50.2        1           NaN         1

and I want to predict what rating a user would give to an item using these properties, what method should I apply? Ideally something that looks at the user-item pairs.
I used XGBoost for classification with property_1, property_2 and property_3 as features and obtained good results, but my model doesn't know that several users rated the same item, does it? It cannot see that users and items appear multiple times, even though there are no duplicate rows. For example, the second and fourth rows have the same properties but different ratings, because the users are different:
    user  item  property_1  property_2  property_3  rating
    u1    i2    80.2        1           0.90        1
    u2    i2    80.2        1           0.90        0

I already have a separate collaborative filtering model that works well, but it doesn't look at the properties of the items, which is something I want to use. And if I add item as a feature column, I get the error:
ValueError: DataFrame.dtypes for data must be int, float, bool or categorical. When categorical type is supplied, DMatrix parameter `enable_categorical` must be set to `True`.item