Description
I'm looking at the implications of this line:
nervaluate/src/nervaluate/evaluate.py
Line 265 in 36fd20e
for true in spurious_tags:
In particular, I have a fair amount of data that's really unbalanced, so assigning each spurious prediction to every class completely washes out the precision metrics for rare classes. For example, if I end up with 100 spurious tags out of 2000 true entities, and one of my classes only has 20 examples, the precision on that class is now computed with a denominator of 120, regardless of which predicted classes made up the 100 spurious tags.
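To make the arithmetic concrete, here is a minimal sketch. It assumes precision is computed as correct / (correct + incorrect + partial + spurious), and that all 20 predictions for the rare class are correct and none of the 100 spurious tags were actually predicted as that class. The `precision` helper is hypothetical, written only for this illustration, and is not nervaluate's API.

```python
def precision(correct, incorrect=0, partial=0, spurious=0):
    """Hypothetical helper: precision = correct / all predictions counted for the class."""
    actual = correct + incorrect + partial + spurious
    return correct / actual if actual else 0.0


rare_correct = 20     # assumed: every prediction for the rare class is right
total_spurious = 100  # spurious tags across the whole dataset

# Every spurious tag is counted against every class.
precision_spread = precision(rare_correct, spurious=total_spurious)

# Spurious tags are counted only against the class that was predicted
# (assumed here: none of them were predicted as the rare class).
precision_own_class = precision(rare_correct, spurious=0)

print(f"spread over all classes: {precision_spread:.3f}")
print(f"predicted class only:    {precision_own_class:.3f}")
```

Under these assumptions the rare class drops from 20/20 to 20/120 purely because of predictions made for other classes.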
Shouldn't the spurious tag just be a false positive for the predicted class, and a false negative for the "outside" class (which we kind of don't care about)? I could maybe be convinced otherwise, but wanted to suggest the change, because this is how I'm currently using this code privately.
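A rough sketch of the two counting strategies, side by side. This is standalone illustration code, not a patch against evaluate.py: the function names, the class labels, and the representation of spurious predictions as a flat list of predicted labels are all assumptions made for the example.

```python
from collections import Counter


def spurious_counts_all_classes(spurious_labels, classes):
    """Behavior described above: every spurious tag counts against every class."""
    return {c: len(spurious_labels) for c in classes}


def spurious_counts_predicted_class(spurious_labels, classes):
    """Suggested behavior: a spurious tag is a false positive only for its predicted class."""
    counts = Counter(spurious_labels)
    return {c: counts.get(c, 0) for c in classes}


# Hypothetical labels: 100 spurious predictions, none of them for the rare class.
classes = ["PER", "ORG", "RARE"]
spurious_labels = ["PER"] * 60 + ["ORG"] * 40

spread = spurious_counts_all_classes(spurious_labels, classes)
per_class = spurious_counts_predicted_class(spurious_labels, classes)

print(spread)
print(per_class)
```

With per-class attribution, the per-class spurious counts still sum to the overall total, so an aggregate precision computed from the overall count would be unaffected; only the per-class breakdown changes.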