0
$\begingroup$

I am working on the following "equality identification" problem and become quite confused on how to reasonably define false positive and false negative in my case.

Problem:

Suppose I have a large set of graphs $G$, where for every graph $g \in G$, there is one and only one graph $g'$ that is identical to $g$. I am training a model $M$ which, given two graphs $g, g' \in G$, can decide how similar these two graphs are. The output of $M$, $o =M(g, g')$, denotes a similarity score of these two graphs. $o$ is a floating number, ranging from $[0, 1]$.

To evaluate the performance of $M$, I am measuring the top-$k$ accuracy of its predications in the following very standard way:

for g \in G: for t \in G: // suppose here t != g and therefore comparison is meaningful accuracy[(g, t)] = M(g, t) correct_prediction = 0 total_prediction = 0 for g \in G: total_prediction += 1 sort_accuracy_array_on_graph (g); // we sort all comparisons on $g$ with other graphs \in G if g' in top-$k$ most similarity graphs with g: // g' is the ground truth, the "identical" graph with g correct_prediction += 1 correct_prediction / total_prediction // the average top-$k$ accuracy 

OK, so this seems very standard approach to computing the top-$k$ accuracy of $M$. However, currently I want to take one step further and somewhat measure the false positive and false negative of $M$. I understand that FP and FN are both very standard metrics in data mining, but just somewhat become very confused on how to define it in my problem. For instance, I can define False Negative in the following way:

  • False Negative of $M$: conceptually, FN denotes that $g'$, the identical graph of $g$, does not appear in top-$k$ most similarity graph of $g$. Then, perhaps FN is interchangeable with top-$k$ accuracy in my case?

  • False Positive of $M$: but if we accept that FN is interchangeable with top-$k$ accuracy in my case, then how to define FP of $M$? FP denotes that $M$ aggressively treats graph $t$, where $t != g'$, as the "similar" graph of $g$ and appear in its top-$k$ most similar set. Then isn't it indicating that FP is ALSO interchangeable with top-$k$ accuracy in my case? That seems suspicious because in my understanding, FP should not equal to FN? There must be something wrong.

Am I clear on this confusion? Any suggestion would be appreciated. Thank you!

$\endgroup$

1 Answer 1

1
$\begingroup$

I think your confusion comes from the fact that the false negative of $M$ as you thought it, is not top-$k$ accuracy but perhaps $1$ $-$ top-$k$. Moreover, how do you evaluate the condition if g' in top-$k$ most similarity graphs with g?

In any case, for your case I thought the following.

Let's say that if the model $M$ returns a value $o > o^*$, where $o, o^* \in [0, 1]$ (and $o^*$ is a threshold value decided by a human), then we assume that $g \equiv g'$; $g \neq g'$ otherwise.

Thus, a false negative of $M$ is when $o < o^*$ but in reality, $g \equiv g'$.

A false positive of $M$ instead is when $o > o^*$ but in reality, $g \neq g'$.

Of course your result will depend on what value(s) of $o^*$ you choose, but now it should be more clear what are FN and FP.

$\endgroup$
2
  • $\begingroup$ Thank you for the reply. Yes, deciding a threshold $T$ makes a lot of sense to me. $\endgroup$ Commented Jul 2, 2020 at 16:22
  • $\begingroup$ Let me think more before accepting your answer. Thank you :) $\endgroup$ Commented Jul 2, 2020 at 16:23

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.