Timeline for "How much of neural network overconfidence in predictions can be attributed to modelers optimizing threshold-based metrics?"
Current License: CC BY-SA 4.0
14 events
| when | what | by | license | comment |
|---|---|---|---|---|
| Sep 18, 2023 at 16:49 | answer added | D.W. | | timeline score: 4 |
| Sep 7, 2023 at 11:01 | comment added | Dave | | @seanv507 The “confidence” in “overconfidence” seems to refer more to the colloquial sense of the word than to anything formal about a standard error or a confidence interval. |
| Sep 7, 2023 at 10:57 | comment added | seanv507 | | You are making a common error in "But then I figure that the model would be less confident in its predictions". If I have 9 cats and 1 dog in my sample, then my estimate is 90%, but my confidence depends on the sample size (10 vs. 1000, etc.). |
| Nov 17, 2021 at 17:04 | comment added | Dave | | @StephanKolassa I found an ICML paper by Guo, "On calibration of modern neural networks", that seems to align with what I posit. I think Guo misses some elements of calibration, but the paper does mention that log loss (the paper calls it "NLL", if you are doing CTRL+F) can be overfitted without overfitting the accuracy based on the category with the highest probability. |
| Nov 17, 2021 at 14:33 | history edited | Dave | | edited tags |
| Nov 2, 2021 at 7:28 | answer added | HXD | | timeline score: 3 |
| Jul 26, 2021 at 7:10 | comment added | Dikran Marsupial | | @StephanKolassa Indeed it can. LR can even overfit when it is not over-parameterised, which is why regularised (ridge) logistic regression is a very useful tool to have in your statistics toolbox. |
| Jul 26, 2021 at 7:08 | answer added | Dikran Marsupial | | timeline score: 6 |
| Jul 2, 2021 at 19:25 | comment added | Stephan Kolassa | | @Dave: Yes, that makes sense. Logistic regression can also overfit if you over-parameterize it. And conversely, I would not expect a simple network architecture to overfit badly. |
| Jul 2, 2021 at 16:48 | comment added | Dave | | @StephanKolassa Why would that be so unique to neural networks and not logistic regression? Is it a matter of a neural network having (perhaps) millions of parameters but the logistic regression maybe having dozens? |
| Jun 30, 2021 at 15:40 | history edited | Dave | CC BY-SA 4.0 | edited title |
| Jun 30, 2021 at 15:20 | comment added | Aleksejs Fomins | | I think the key term to google is Expected Calibration Error (ECE). I suspect this post will answer your question: alondaks.com/2017/12/31/… |
| Jun 30, 2021 at 14:38 | comment added | Stephan Kolassa | | Good question. I suspect part of the answer is that you can overfit to proper scoring rules just as easily as to other KPIs if you use them in-sample. After all, OLS is fitted by maximizing the log likelihood, which is the log score, a proper scoring rule; but that OLS can overfit is common knowledge. |
| Jun 30, 2021 at 14:35 | history asked | Dave | CC BY-SA 4.0 | |
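
seanv507's comment above distinguishes the point estimate from the uncertainty around it. As a rough illustration of that distinction, here is a minimal sketch using a Wald (normal-approximation) interval for a binomial proportion; the choice of interval and the 9/10 vs. 900/1000 numbers are just for illustration, not part of the original discussion.

```python
import math

def wald_interval(successes, n, z=1.96):
    """Point estimate and approximate 95% Wald interval for a binomial proportion."""
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)

# Same 90% point estimate from 9/10 and from 900/1000, but very different uncertainty.
for successes, n in [(9, 10), (900, 1000)]:
    p_hat, lo, hi = wald_interval(successes, n)
    print(f"n={n:4d}: estimate={p_hat:.2f}, approx. 95% CI = ({lo:.2f}, {hi:.2f})")
```

The interval for n=10 spans most of (0.7, 1.0), while the interval for n=1000 is only a few percentage points wide, even though the estimate is 0.90 in both cases.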
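Dave's comment on Guo's paper and Stephan Kolassa's comment on the log score both turn on the fact that accuracy depends only on which class receives the highest probability, while log loss (NLL) penalizes the probability values themselves. A hedged sketch of that point under an assumed toy setup (true positive-class probability fixed at 0.8; the function names and simulation details are not from the original thread):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
# Toy task: the true positive-class probability is 0.8 for every case.
y = rng.binomial(1, 0.8, size=n)

def log_loss(y_true, p_pred):
    """Mean negative log-likelihood ('NLL' in Guo's terminology) for binary labels."""
    return -np.mean(y_true * np.log(p_pred) + (1 - y_true) * np.log(1 - p_pred))

def accuracy(y_true, p_pred):
    """Accuracy after thresholding the predicted probability at 0.5."""
    return np.mean((p_pred > 0.5) == y_true)

for q in (0.80, 0.99):  # calibrated vs. overconfident predicted probability
    p = np.full(n, q)
    print(f"predict {q:.2f}: accuracy = {accuracy(y, p):.3f}, log loss = {log_loss(y, p):.3f}")
```

Both predictions give roughly 80% accuracy, but the overconfident 0.99 prediction nearly doubles the log loss, which is the sense in which a proper scoring rule can deteriorate without any change in a threshold-based metric.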
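Aleksejs Fomins's comment points to Expected Calibration Error (ECE). As a rough sketch of what that metric measures, the binary, reliability-diagram-style version below bins the predicted positive-class probability and compares it with the empirical frequency in each bin; standard multiclass ECE instead bins the max-class confidence, and the equal-width binning and bin count here are assumptions, not a reference implementation.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Equal-width-bin ECE for a binary problem: weighted mean absolute gap between
    the average predicted probability and the observed frequency in each bin."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (probs > lo) & (probs <= hi)
        if in_bin.any():
            mean_confidence = probs[in_bin].mean()   # average predicted probability in the bin
            empirical_freq = labels[in_bin].mean()   # observed positive-class frequency in the bin
            ece += in_bin.mean() * abs(empirical_freq - mean_confidence)
    return ece

# Calibrated probabilities give a small ECE; pushing them toward 1 inflates it.
rng = np.random.default_rng(0)
p_true = rng.uniform(0.05, 0.95, size=5_000)
y = rng.binomial(1, p_true)
print(expected_calibration_error(p_true, y))          # close to 0
print(expected_calibration_error(p_true ** 0.25, y))  # noticeably larger
```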