I'm reading *Collaborative Filtering for Implicit Feedback Datasets* (Hu, Koren, and Volinsky). On page 6 they describe their evaluation strategy, which they define as the mean Expected Percentile Ranking, with the following formula:
$$\overline{\text{rank}} = \frac{\sum_{u,i} r^t_{ui} \text{rank}_{ui}}{\sum_{u,i} r^t_{ui}}$$
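To make sure I'm reading the formula correctly, here is my plain-Python translation of it (the function name and the `(r, rank)` pair layout are my own, purely for illustration):

```python
def mean_expected_percentile_rank(pairs):
    """Compute the mean Expected Percentile Ranking.

    pairs: iterable of (r_ui, rank_ui) tuples, where r_ui is the observed
    implicit rating of item i by user u in the test set, and rank_ui is the
    percentile rank of item i in user u's predicted list
    (0.0 = best predicted, 1.0 = worst predicted).
    """
    numerator = sum(r * rank for r, rank in pairs)
    denominator = sum(r for r, _ in pairs)
    return numerator / denominator
```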
This is the same formula that DataCamp presents as the appropriate error metric for implicit recommendation engines, except that they call it the "Rank Ordering Error Metric". I'm implementing the system in Spark, so I've defined a test dataset to try things out:
```python
from pyspark.sql import Window
from pyspark.sql.functions import col, desc, percent_rank

test_df = spark.createDataFrame(
    [
        ("A", "Fish",      1, 1),
        ("A", "Dogs",      2, 2),
        ("A", "Cats",      3, 3),
        ("A", "Elephants", 4, 4),
        ("B", "Fish",      1, 1),
        ("B", "Dogs",      2, 2),
        ("B", "Cats",      3, 3),
        ("B", "Elephants", 4, 4),
    ],
    ["Customer", "Item", "ImplicitRating", "PredictedRating"],
)

rankWindow = Window.partitionBy("Customer").orderBy(desc("PredictedRating"))

test_df \
    .withColumn("RankUI", percent_rank().over(rankWindow)) \
    .withColumn("RankUIxRating", col("RankUI") * col("ImplicitRating")) \
    .show()
```

and the output is:
```
+--------+---------+--------------+---------------+------------------+------------------+
|Customer|     Item|ImplicitRating|PredictedRating|            RankUI|     RankUIxRating|
+--------+---------+--------------+---------------+------------------+------------------+
|       B|Elephants|             4|              4|               0.0|               0.0|
|       B|     Cats|             3|              3|0.3333333333333333|               1.0|
|       B|     Dogs|             2|              2|0.6666666666666666|1.3333333333333333|
|       B|     Fish|             1|              1|               1.0|               1.0|
|       A|Elephants|             4|              4|               0.0|               0.0|
|       A|     Cats|             3|              3|0.3333333333333333|               1.0|
|       A|     Dogs|             2|              2|0.6666666666666666|1.3333333333333333|
|       A|     Fish|             1|              1|               1.0|               1.0|
+--------+---------+--------------+---------------+------------------+------------------+
```

I'm effectively modelling a perfect prediction here by setting the predicted "rating" to match the ImplicitRating. My problem is that plugging those values into the formula above gives me...
$$\overline{\text{rank}} = \frac{\sum_{u,i} r^t_{ui} \text{rank}_{ui}}{\sum_{u,i} r^t_{ui}} = \frac{0.0+1.0+1.\overline{3}+1.0+0.0+1.0+1.\overline{3}+1.0}{4+3+2+1+4+3+2+1} = \frac{6.\overline{6}}{20} = 0.\overline{3}$$
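For what it's worth, doing that arithmetic programmatically (with the values copied straight from the Spark output above) gives the same answer:

```python
# (ImplicitRating, RankUI) pairs copied from the Spark output above;
# users A and B are identical, hence the * 2
pairs = [(4, 0.0), (3, 1 / 3), (2, 2 / 3), (1, 1.0)] * 2

numerator = sum(r * rank for r, rank in pairs)  # sum of r_ui * rank_ui, = 6.66...
denominator = sum(r for r, _ in pairs)          # sum of r_ui, = 20
rank_bar = numerator / denominator
print(rank_bar)  # ≈ 0.333...
```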
Given that the paper is explicit that lower values of $\overline{\text{rank}}$ are better, and that the authors achieved values as low as ~8%, I'm confused as to how that can be, given my result here: a perfect prediction still scores 33%.
What am I doing wrong?