Timeline for Precision, Recall and/or F1? Which should I use? or something different?

Current License: CC BY-SA 4.0

12 events

when	what		by	license	comment
Aug 3, 2021 at 21:10	comment	added	Stephan Kolassa		Finally, yes, precision and recall are also improper. Actually, they are not scoring rules at all. This earlier thread is on accuracy, but the argument applies to precision and recall as well. If you want to, you could open a question here on CV to ask for a deeper explanation, and possibly link to it from here. I would love to promise I'll answer, but I'm really starved for CV time right now - sorry. But there are other people out there. Like @Dave.
Aug 3, 2021 at 21:07	comment	added	Stephan Kolassa		As to which rule to choose, Why is LogLoss preferred over other proper scoring rules? specifically presents arguments for and against the log and the Brier score. It also contains pointers to literature on how to choose a scoring rule. I personally like the log score, because it hits you on the head hard if something "impossible" occurs. That is, if you see an outcome you assigned a probability of zero to, the log score will be infinite. I consider this a Good Thing, others feel it's a bug.
Aug 3, 2021 at 21:06	comment	added	Stephan Kolassa		Yes, the "proper" is crucial. Any mapping that maps a probabilistic prediction and an actual outcome to a score is a scoring rule, but a proper scoring rule is one that is optimized (in expectation) by the true density. So you really want to use proper scoring rules. See the tag wiki, which you may already have read.
Aug 3, 2021 at 16:50	comment	added	Dave		Frank Harrell describes the log-loss as the gold standard, due to its relationship to maximum likelihood estimation. Other strictly proper scoring rules exist, but it might help if you can explain why that might not work for your task. // Regarding the comments last week, as much as I appreciate it, I have to laugh at me being mentioned as the authority on this, since I learned about proper scoring rules on here from @StephanKolassa (among a few others).
Aug 3, 2021 at 16:25	history	edited	Panda	CC BY-SA 4.0	deleted 3 characters in body
Aug 2, 2021 at 10:25	comment	added	Panda		That was some heavy reading for me, coming from 0 experience in this field. I don't know if I have fully understood the recommendations. 1) From what I understand, the use of accuracy is wrong, which I believe is the same reason I originally decided on recall and precision of only the 2 minority classes, but I don't think I grasped why these are also invalid compared with proper scoring rules. 2) Am I right that you are specifically saying to use "proper scoring rules" rather than just "scoring rules"? 3) Given the last paragraph in my question, is there a recommended scoring rule, or how can I find out?
Jul 30, 2021 at 13:19	comment	added	Stephan Kolassa		Instead, use probabilistic classifications, and evaluate these using proper scoring rules. On class balance, see Are unbalanced datasets problematic, and (how) does oversampling (purport to) help?
Jul 30, 2021 at 13:19	comment	added	Stephan Kolassa		Don't use accuracy, precision, recall, sensitivity, specificity, or the F1 score. Every criticism at the following threads applies equally to all of these, and indeed to all evaluation metrics that rely on hard classifications: Why is accuracy not the best measure for assessing classification models? Is accuracy an improper scoring rule in a binary classification setting? Classification probability threshold
Jul 30, 2021 at 13:19	comment	added	Stephan Kolassa		@AryaMcCarthy - fear not. When Dave is off, I take up the slack. We provide a round-the-clock drumbeat for better practices.
Jul 30, 2021 at 12:20	comment	added	Arya McCarthy		I believe (hope) Dave will chime in soon about proper scoring rules and how class imbalance isn’t a problem. If not, you can still search for these topics on the site.
Jul 30, 2021 at 11:41	review	First posts
Jul 30, 2021 at 14:18
Jul 30, 2021 at 11:37	history	asked	Panda	CC BY-SA 4.0
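
Illustrative note on the comments above (not part of the original thread): the claims that a proper scoring rule is optimized in expectation by the true probability, and that the log score becomes infinite when an outcome given probability zero occurs, can be checked numerically. The sketch below is a minimal one under stated assumptions: a binary outcome, a made-up true probability of 0.3, scores oriented so that lower is better, and a 0.5 threshold for the hard classification. The function names are hypothetical and written only for this example.

```python
import math


def log_score(p, y):
    """Negative log-likelihood of outcome y (0 or 1) when the forecast is P(y=1) = p.

    Lower is better. Returns infinity if the observed outcome had probability 0.
    """
    q = p if y == 1 else 1.0 - p
    return math.inf if q == 0.0 else -math.log(q)


def brier_score(p, y):
    """Squared difference between the forecast probability and the outcome."""
    return (p - y) ** 2


def error_rate(p, y, threshold=0.5):
    """Misclassification indicator after thresholding (1 - accuracy for one case).

    The 0.5 threshold is an assumption made for this illustration.
    """
    predicted_positive = p >= threshold
    return float(predicted_positive != (y == 1))


def expected_score(score, p, true_p):
    """Expected score of forecast p when the outcome is 1 with probability true_p."""
    return true_p * score(p, 1) + (1.0 - true_p) * score(p, 0)


true_p = 0.3  # assumed true probability of the positive class
grid = [i / 100 for i in range(1, 100)]  # candidate forecasts 0.01 .. 0.99

# Both proper scoring rules are minimized, in expectation, by the true probability.
best_log = min(grid, key=lambda p: expected_score(log_score, p, true_p))
best_brier = min(grid, key=lambda p: expected_score(brier_score, p, true_p))

# The thresholded error rate cannot tell forecasts on the same side of 0.5 apart.
error_at_001 = expected_score(error_rate, 0.01, true_p)
error_at_030 = expected_score(error_rate, 0.30, true_p)
error_at_049 = expected_score(error_rate, 0.49, true_p)

# An "impossible" outcome: probability 0 was assigned to an outcome that occurred.
log_impossible = log_score(0.0, 1)
brier_impossible = brier_score(0.0, 1)

print("best forecast under log score:  ", best_log)
print("best forecast under Brier score:", best_brier)
print("expected error rate at 0.01, 0.30, 0.49:", error_at_001, error_at_030, error_at_049)
print("log / Brier score for an impossible outcome:", log_impossible, brier_impossible)
```

In this setup the expected error rate is identical for forecasts of 0.01, 0.30 and 0.49, so it gives no reward for reporting the true probability, whereas both the log score and the Brier score single out 0.3. The last line shows the behaviour described for the log score: it is infinite for the impossible outcome, while the Brier score stays bounded.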