How to derive false positive and false negative from top-k accuracy?

Question

I am working on the following "equality identification" problem and become quite confused on how to reasonably define false positive and false negative in my case.

Problem:

Suppose I have a large set of graphs $G$, where for every graph $g \in G$, there is one and only one graph $g'$ that is identical to $g$. I am training a model $M$ which, given two graphs $g, g' \in G$, can decide how similar these two graphs are. The output of $M$, $o =M(g, g')$, denotes a similarity score of these two graphs. $o$ is a floating number, ranging from $[0, 1]$.

To evaluate the performance of $M$, I am measuring the top-$k$ accuracy of its predications in the following very standard way:

for g \in G: for t \in G: // suppose here t != g and therefore comparison is meaningful accuracy[(g, t)] = M(g, t) correct_prediction = 0 total_prediction = 0 for g \in G: total_prediction += 1 sort_accuracy_array_on_graph (g); // we sort all comparisons on $g$ with other graphs \in G if g' in top-$k$ most similarity graphs with g: // g' is the ground truth, the "identical" graph with g correct_prediction += 1 correct_prediction / total_prediction // the average top-$k$ accuracy

OK, so this seems very standard approach to computing the top-$k$ accuracy of $M$. However, currently I want to take one step further and somewhat measure the false positive and false negative of $M$. I understand that FP and FN are both very standard metrics in data mining, but just somewhat become very confused on how to define it in my problem. For instance, I can define False Negative in the following way:

False Negative of $M$: conceptually, FN denotes that $g'$, the identical graph of $g$, does not appear in top-$k$ most similarity graph of $g$. Then, perhaps FN is interchangeable with top-$k$ accuracy in my case?
False Positive of $M$: but if we accept that FN is interchangeable with top-$k$ accuracy in my case, then how to define FP of $M$? FP denotes that $M$ aggressively treats graph $t$, where $t != g'$, as the "similar" graph of $g$ and appear in its top-$k$ most similar set. Then isn't it indicating that FP is ALSO interchangeable with top-$k$ accuracy in my case? That seems suspicious because in my understanding, FP should not equal to FN? There must be something wrong.

Am I clear on this confusion? Any suggestion would be appreciated. Thank you!

dapetillo · Accepted Answer · 2020-07-02 16:19:54Z

I think your confusion comes from the fact that the false negative of $M$ as you thought it, is not top-$k$ accuracy but perhaps $1$ $-$ top-$k$. Moreover, how do you evaluate the condition if g' in top-$k$ most similarity graphs with g?

In any case, for your case I thought the following.

Let's say that if the model $M$ returns a value $o > o^*$, where $o, o^* \in [0, 1]$ (and $o^*$ is a threshold value decided by a human), then we assume that $g \equiv g'$; $g \neq g'$ otherwise.

Thus, a false negative of $M$ is when $o < o^*$ but in reality, $g \equiv g'$.

A false positive of $M$ instead is when $o > o^*$ but in reality, $g \neq g'$.

Of course your result will depend on what value(s) of $o^*$ you choose, but now it should be more clear what are FN and FP.

Thank you for the reply. Yes, deciding a threshold $T$ makes a lot of sense to me. — lllllllllllll
– lllllllllllll, Commented Jul 2, 2020 at 16:22
Let me think more before accepting your answer. Thank you :) — lllllllllllll
– lllllllllllll, Commented Jul 2, 2020 at 16:23

Stack Exchange Network

How to derive false positive and false negative from top-k accuracy?

1 Answer 1

Hot Network Questions

How to derive false positive and false negative from top-k accuracy?

1 Answer 1

Related

Hot Network Questions