Timeline for Do infrequent examples screw up classifiers? If so, when is it okay to remove the infrequent examples from the data?
Current License: CC BY-SA 3.0
4 events
| when | what | | by | license | comment |
|---|---|---|---|---|---|
| Jun 15, 2011 at 17:18 | comment | added | paul | | @Matt It makes a difference, not a huge one. Kappa goes from .45 to .48, 61% to 65% correct, mean absolute error from .21 to .26. I cut it off at >=50 examples. There's still a big range, the smallest being 50 instances and the largest 372 (not a huge dataset because I'm starting with a subset of my full set, 808 instances). If I make it >50, I get 69% correct, kappa=0.52 and mean error = 0.29, which is probably not bad for what I'm doing. |
| Jun 14, 2011 at 23:29 | answer | added | doug | | timeline score: 3 |
| Jun 14, 2011 at 23:17 | comment | added | Matt Parker | | Have you fit the model both with and without the infrequent classes? What happens? |
| Jun 14, 2011 at 22:15 | history | asked | paul | CC BY-SA 3.0 | |
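
The comparison discussed in the comments (fit with and without the infrequent classes, then compare kappa) could be sketched roughly as follows. This is only an illustration, not paul's actual setup: the function names `drop_infrequent_classes` and `cohen_kappa` are hypothetical, and the kappa here is plain unweighted Cohen's kappa computed by hand, which may differ from whatever tool produced the numbers quoted above.

```python
from collections import Counter


def drop_infrequent_classes(examples, labels, min_count=50):
    """Keep only examples whose class has at least `min_count` instances.

    `min_count=50` mirrors the ">=50 examples" cutoff from the comment;
    the right threshold for other data is an open question.
    """
    counts = Counter(labels)
    kept = [(x, y) for x, y in zip(examples, labels) if counts[y] >= min_count]
    return [x for x, _ in kept], [y for _, y in kept]


def cohen_kappa(y_true, y_pred):
    """Unweighted Cohen's kappa between true and predicted labels."""
    n = len(y_true)
    # Observed agreement: fraction of exact matches.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: product of the marginal class frequencies, summed.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        # Both label lists contain a single shared class; kappa is undefined.
        return float("nan")
    return (observed - expected) / (1.0 - expected)
```

To reproduce the kind of comparison described, one would train and evaluate once on the full data and once on the output of `drop_infrequent_classes`, then compare the two kappa values. Note that the filtered run is scored on an easier problem (fewer classes), so a higher kappa there does not by itself show that the rare classes were hurting the classifier.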