edited Sep 15, 2021 at 9:26

I'd like to quote from Jurafsky and Martin's Speech and Language Processing:

For named entities, the entity rather than the word is the unit of response.

In your case, the First Bank of Chicago should count as a single response, and it should be predicted as ORG ORG ORG ORG as a whole; otherwise the whole entity counts as wrong (a false positive, a false negative, or both).

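If the predicted BIO tags are O B-ORG I-ORG I-ORG, that is a boundary error, and the whole entity counts as wrong. The predicted span Bank of Chicago matches no gold entity, so it is a false positive, and the gold entity First Bank of Chicago is never found, so it is a false negative: two demerits.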

However, if the predicted tags are O O O O, the entity is simply missed, and there is only one demerit: one false negative.
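
A minimal sketch of exact-match, entity-level counting for these two predictions, in plain Python. The helper names (`extract_entities`, `entity_counts`, `f1`) are hypothetical and not taken from nereval or any other library, and ignoring a stray I- tag that has no preceding B- tag is an assumption of this sketch.

```python
def extract_entities(tags):
    """Return the set of (start, end, label) spans encoded by a BIO tag sequence."""
    entities, start, label = set(), None, None
    # A sentinel "O" at the end closes an entity that runs to the last token.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and tag[2:] == label
        if start is not None and not continues:
            entities.add((start, i, label))  # end index is exclusive
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        # Assumption: an I- tag with no open entity is ignored.
    return entities


def entity_counts(gold_tags, pred_tags):
    """Count true positives, false positives, false negatives per entity."""
    gold, pred = extract_entities(gold_tags), extract_entities(pred_tags)
    tp = len(gold & pred)   # exact span and label match
    fp = len(pred - gold)   # predicted entities that match no gold entity
    fn = len(gold - pred)   # gold entities that were not predicted exactly
    return tp, fp, fn


def f1(tp, fp, fn):
    """Entity-level F1 from the counts; 0.0 when nothing is predicted or expected."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


gold = ["B-ORG", "I-ORG", "I-ORG", "I-ORG"]      # First Bank of Chicago
boundary = ["O", "B-ORG", "I-ORG", "I-ORG"]      # only "Bank of Chicago" tagged
nothing = ["O", "O", "O", "O"]                   # entity missed entirely

print(entity_counts(gold, boundary))  # (tp, fp, fn) for the boundary error
print(entity_counts(gold, nothing))   # (tp, fp, fn) for the all-O guess
```

The boundary error yields one false positive plus one false negative, while the all-O guess yields a single false negative, so the all-O guess is penalized less.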

In the article Doing Named Entity Recognition? Don't optimize for F1, Chris Manning argues that F1 encourages a model to tag everything as O when it is unsure, because boundary errors and label-boundary errors are each penalized twice (a false positive and a false negative), whereas predicting nothing costs only one false negative.

Side note:
an implementation of the entity-level F1 score: https://github.com/jantrienes/nereval

answered Sep 14, 2021 at 1:53

Lerner Zhang
