I have searched on Stack Exchange and found a couple of topics like this and this, but they are not quite relevant to my problem (or at least I don't know how to make them relevant to it).
Anyway, say I have two sets of prediction results, as shown by df1 and df2:
```python
import pandas as pd

y_truth = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
y_predicted_rank1 = [6, 1, 7, 2, 8, 3, 9, 4, 10, 5]
y_predicted_rank2 = [4, 1, 7, 2, 8, 3, 9, 6, 10, 5]

df1 = pd.DataFrame({'tag': y_truth, 'predicted_rank': y_predicted_rank1}).sort_values('predicted_rank')
df2 = pd.DataFrame({'tag': y_truth, 'predicted_rank': y_predicted_rank2}).sort_values('predicted_rank')

print(df1)
#    tag  predicted_rank
# 1    1               1
# 3    1               2
# 5    1               3
# 7    1               4
# 9    1               5
# 0    0               6
# 2    0               7
# 4    0               8
# 6    0               9
# 8    0              10

print(df2)
#    tag  predicted_rank
# 1    1               1
# 3    1               2
# 5    1               3
# 0    0               4
# 9    1               5
# 7    1               6
# 2    0               7
# 4    0               8
# 6    0               9
# 8    0              10
```

By looking at them, I know df1 is better than df2, since in df2 a negative sample (tag 0) was predicted to have rank #4, ahead of two positives. So my question is: what metric can be used here so that I can (mathematically) show that df1 is better than df2?
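For context, one rank-based candidate I'm aware of is ROC AUC, which for a ranking reduces to the Mann-Whitney U statistic: the fraction of (positive, negative) pairs where the positive is ranked ahead of the negative. Here is a minimal sketch with scikit-learn (assuming it's acceptable to negate the ranks so that a higher score means "more likely positive", which is what `roc_auc_score` expects):

```python
from sklearn.metrics import roc_auc_score

y_truth = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
y_predicted_rank1 = [6, 1, 7, 2, 8, 3, 9, 4, 10, 5]
y_predicted_rank2 = [4, 1, 7, 2, 8, 3, 9, 6, 10, 5]

# Negate the ranks so rank 1 becomes the highest score,
# since roc_auc_score treats larger scores as more positive.
print(roc_auc_score(y_truth, [-r for r in y_predicted_rank1]))  # 1.0
print(roc_auc_score(y_truth, [-r for r in y_predicted_rank2]))  # 0.92
```

That 0.92 comes from the negative at rank #4 being ordered ahead of the two positives at ranks #5 and #6, so 23 of the 25 (positive, negative) pairs are ordered correctly. But I'm not sure whether AUC is the right or standard choice for this kind of ranking comparison, hence the question.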