Timeline for How do I intentionally design an overfitting neural network?
Current License: CC BY-SA 4.0
33 events
| when | what | by | license | comment |
|---|---|---|---|---|
| Jul 1, 2020 at 20:37 | comment added | eric_kernfeld | | For readers like me who struggle to understand what overfitting is exactly (despite ample folklore), this answer may be helpful. |
| Jul 1, 2020 at 18:35 | history protected | gung - Reinstate Monica | | |
| Jul 1, 2020 at 15:38 | answer added | gdelab | | timeline score: 2 |
| Jul 1, 2020 at 10:06 | comment added | Aleksandr Dubinsky | | @StephanKolassa It's more fundamental. Even without dropout, the many randomly initialized neurons do ensembling. Dropout, SGD, etc. enhance it but aren't crucial. But again, it would make for great research. |
| Jul 1, 2020 at 9:57 | comment added | Stephan Kolassa | | @AleksandrDubinsky: you may well be right. I suspect that this magic comes from the regularization/pruning/dropout that is usually applied automatically, and which the OP thought about turning off. |
| Jul 1, 2020 at 9:54 | comment added | Aleksandr Dubinsky | | @StephanKolassa I'm pretty sure the magic of NN models is that they avoid doing precisely this. That's what makes them great on high-dimensional data. Still, it would make for a very interesting experiment. I suppose that the more spurious features in each sample, the larger the model would have to be in order to regularize over the additional variance. |
| Jul 1, 2020 at 6:53 | comment added | Stephan Kolassa | | @AleksandrDubinsky: I am not an expert on neural networks, which is why I am suggesting this as a comment rather than posting it as an answer, and so I don't have a reference. I am linking to another answer of mine as an illustration, since I am most familiar with what I wrote myself. |
| Jul 1, 2020 at 4:49 | answer added | Wololo | | timeline score: 4 |
| Jul 1, 2020 at 4:10 | answer added | D.W. | | timeline score: 5 |
| Jul 1, 2020 at 3:21 | comment added | skrrrt | | Not sure about neural networks, but with a decision tree with no max depth you could surely overfit. |
| Jul 1, 2020 at 2:02 | comment added | SpiderRico | | Regarding batch size, I think using larger batches helps over-fitting. There are some experimental results showing that the variance in stochastic gradients has a regularizing effect, so with large batch sizes you reduce that variance. Anyway, obtaining good training but bad validation accuracy is trivial: just memorize the training dataset. |
| Jun 30, 2020 at 22:37 | answer added | Peteris | | timeline score: 8 |
| Jun 30, 2020 at 20:39 | comment added | Aleksandr Dubinsky | | @StephanKolassa Can you cite a source showing that this would actually ruin the performance of a neural network? You link to yourself making the same assertion. |
| Jun 30, 2020 at 18:37 | answer added | csiz | | timeline score: 2 |
| Jun 30, 2020 at 18:35 | answer added | Aleksandr Dubinsky | | timeline score: 2 |
| Jun 30, 2020 at 16:55 | history became hot network question | | | |
| Jun 30, 2020 at 12:00 | history tweeted | twitter.com/StackStats/status/1277934915187859457 | | |
| Jun 30, 2020 at 10:40 | history edited | Rahn | CC BY-SA 4.0 | added 15 characters in body |
| Jun 30, 2020 at 10:33 | history edited | Rahn | CC BY-SA 4.0 | added 39 characters in body |
| Jun 30, 2020 at 10:20 | answer added | HXD | | timeline score: 3 |
| Jun 30, 2020 at 9:46 | comment added | Stephan Kolassa | | @DikranMarsupial: yes, as I wrote, "completely random features". |
| Jun 30, 2020 at 9:35 | comment added | Dikran Marsupial | | @StephanKolassa I assume you mean adding additional input features/attributes, rather than additional training samples (random input data could mean either)? |
| Jun 30, 2020 at 9:32 | comment added | Stephan Kolassa | | I am not talking about adding layers. I am talking about adding random input data. |
| Jun 30, 2020 at 9:27 | answer added | Dikran Marsupial | | timeline score: 16 |
| Jun 30, 2020 at 9:25 | comment added | Rahn | | @StephanKolassa my experience says that simply adding more layers/channels doesn't usually improve training performance. |
| Jun 30, 2020 at 9:22 | comment added | Tim | | @StephanKolassa I guess that raises the question: what is "allowed" for making it overfit? For example, using random labels at the training stage but correct ones at the test stage would work. |
| Jun 30, 2020 at 9:21 | comment added | Stephan Kolassa | | Add a lot of completely random features to your net. Unless you prune/regularize, your net will latch on to the spurious correlations and do better and better in training, and worse and worse in testing/validation. You can even overfit on the test set; it's just a question of sifting through enough random data. See here. |
| Jun 30, 2020 at 9:18 | comment added | Rahn | | @StephanKolassa could you elaborate? |
| Jun 30, 2020 at 9:17 | comment added | Stephan Kolassa | | @Tim: wouldn't just adding massive amounts of totally random data do the trick? |
| Jun 30, 2020 at 9:03 | history edited | Rahn | CC BY-SA 4.0 | deleted 1 character in body |
| Jun 30, 2020 at 9:00 | comment added | Tim | | Nothing "guarantees" overfitting. If there were something like that, we would simply not use it when building neural networks. |
| Jun 30, 2020 at 8:56 | history edited | Rahn | CC BY-SA 4.0 | added 30 characters in body |
| Jun 30, 2020 at 8:51 | history asked | Rahn | CC BY-SA 4.0 | |
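
The comments from Stephan Kolassa (add completely random features), Tim (train on random labels), and SpiderRico (just memorize the training set) in the timeline above all point at one recipe: give the network nothing real to learn, remove every form of regularization, make the model much larger than the dataset, and train for a long time. Below is a minimal sketch of that recipe, assuming PyTorch; the dataset sizes, layer widths, learning rate, and epoch count are illustrative choices, not taken from the question or any of the posted answers.

```python
# Minimal sketch (illustrative only): memorize random labels on noise inputs with an
# over-parameterized, unregularized MLP. Expect near-perfect training accuracy and
# chance-level validation accuracy, i.e. deliberate overfitting.
import torch
import torch.nn as nn

torch.manual_seed(0)

n_train, n_features, n_classes = 256, 20, 2
X_train = torch.randn(n_train, n_features)          # pure-noise inputs
y_train = torch.randint(0, n_classes, (n_train,))   # random labels: nothing real to learn
X_val = torch.randn(n_train, n_features)
y_val = torch.randint(0, n_classes, (n_train,))

# Far more parameters than training points; no dropout, no weight decay, no early stopping.
model = nn.Sequential(
    nn.Linear(n_features, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, n_classes),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)  # note: no weight_decay
loss_fn = nn.CrossEntropyLoss()

for epoch in range(2000):                            # full-batch training, run long
    opt.zero_grad()
    loss = loss_fn(model(X_train), y_train)
    loss.backward()
    opt.step()

def accuracy(X, y):
    with torch.no_grad():
        return (model(X).argmax(dim=1) == y).float().mean().item()

print(f"train acc: {accuracy(X_train, y_train):.2f}, val acc: {accuracy(X_val, y_val):.2f}")
```

The gap between the two printed numbers (roughly 1.0 versus roughly 0.5 here) is exactly the train/validation divergence the question is asking how to produce on purpose.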
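skrrrt's decision-tree remark is even easier to demonstrate. A short scikit-learn sketch under the same illustrative-noise assumptions: a tree with no depth limit memorizes the training noise perfectly and scores at chance on held-out noise.

```python
# Illustrative sketch: an unpruned decision tree overfits pure noise by construction.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(200, 10)), rng.integers(0, 2, size=200)
X_val, y_val = rng.normal(size=(200, 10)), rng.integers(0, 2, size=200)

tree = DecisionTreeClassifier(max_depth=None)   # unlimited depth: splits until leaves are pure
tree.fit(X_train, y_train)
print("train acc:", tree.score(X_train, y_train))  # 1.0
print("val acc:", tree.score(X_val, y_val))        # ~0.5 (chance)
```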