Timeline for Do infrequent examples screw up classifiers? If so, when is it okay to remove the infrequent examples from the data?

4 events

when	what		by	license	comment
Jun 15, 2011 at 17:18	comment	added	paul		@Matt It makes a difference, but not a huge one. Kappa goes from .45 to .48, accuracy from 61% to 65% correct, and mean absolute error from .21 to .26. I cut it off at >=50 examples. There's still a big range, the smallest class having 50 instances and the largest 372 (not a huge dataset, because I'm starting with a subset of my full set, 808 instances). If I make it >50, I get 69% correct, kappa=0.52 and mean error = 0.29, which is probably not bad for what I'm doing.
Jun 14, 2011 at 23:29	answer	added	doug		timeline score: 3
Jun 14, 2011 at 23:17	comment	added	Matt Parker		Have you fit the model both with and without the infrequent classes? What happens?
Jun 14, 2011 at 22:15	history	asked	paul	CC BY-SA 3.0
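
The comparison discussed in the comments (drop classes below a minimum example count, then compare kappa with and without them) could be sketched roughly as below. This is only an illustration under assumptions: the function names `drop_rare_classes` and `cohen_kappa` and the toy labels are hypothetical, and nothing here reflects the classifier or tooling paul actually used.

```python
from collections import Counter


def drop_rare_classes(labels, min_count):
    """Return indices of examples whose class has at least min_count examples.

    Hypothetical helper: min_count=50 would mirror the ">=50" cutoff
    mentioned in the comment.
    """
    counts = Counter(labels)
    return [i for i, y in enumerate(labels) if counts[y] >= min_count]


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between two labelings, corrected for chance."""
    n = len(y_true)
    # Observed agreement: fraction of examples where the labels match.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: product of the marginal class frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        return 0.0  # degenerate case: one class only, kappa is undefined
    return (observed - expected) / (1.0 - expected)


# Toy data (made up): class "c" has a single example and gets filtered out.
labels = ["a"] * 4 + ["b"] * 4 + ["c"]
keep = drop_rare_classes(labels, min_count=2)
filtered_labels = [labels[i] for i in keep]
```

To reproduce the with/without comparison, one would fit the same model on the full data and on the rows selected by `keep`, then compute `cohen_kappa` on held-out predictions for each. Note that the two kappa values are measured on different label sets, so they are not strictly comparable.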