Timeline for How do I intentionally design an overfitting neural network?
Current License: CC BY-SA 4.0
33 events
| when | what | by | license | comment |
|---|---|---|---|---|
| Jul 1, 2020 at 20:37 | comment added | eric_kernfeld | | For readers like me who struggle to understand what overfitting is exactly (despite ample folklore), this answer may be helpful. |
| Jul 1, 2020 at 18:35 | history protected | gung - Reinstate Monica | | |
| Jul 1, 2020 at 15:38 | answer added | gdelab | | timeline score: 2 |
| Jul 1, 2020 at 10:06 | comment added | Aleksandr Dubinsky | | @StephanKolassa It's more fundamental. Even without dropout, the many randomly initialized neurons do ensembling. Dropout, SGD, etc. enhance it but aren't crucial. But again, it would make for great research. |
| Jul 1, 2020 at 9:57 | comment added | Stephan Kolassa | | @AleksandrDubinsky: you may well be right. I suspect that this magic comes from the regularization/pruning/dropout that is usually applied automatically, and which the OP thought about turning off. |
| Jul 1, 2020 at 9:54 | comment added | Aleksandr Dubinsky | | @StephanKolassa I'm pretty sure the magic of NN models is that they avoid doing precisely this. That's what makes them great on high-dimensional data. Still, it would make for a very interesting experiment. I suppose that the more spurious features in each sample, the larger the model would have to be in order to regularize over the additional variance. |
| Jul 1, 2020 at 6:53 | comment added | Stephan Kolassa | | @AleksandrDubinsky: I am not an expert on neural networks, which is why I am suggesting this as a comment rather than posting it as an answer, and so I don't have a reference. I am linking to another answer of mine as an illustration, since I am most familiar with what I wrote myself. |
| Jul 1, 2020 at 4:49 | answer added | Wololo | | timeline score: 4 |
| Jul 1, 2020 at 4:10 | answer added | D.W. | | timeline score: 5 |
| Jul 1, 2020 at 3:21 | comment added | skrrrt | | Not sure about neural networks, but with a decision tree with no max depth you could surely overfit. |
| Jul 1, 2020 at 2:02 | comment added | SpiderRico | | Regarding batch size, I think using larger batches helps over-fitting. There are some experimental results showing that the variance in stochastic gradients has a regularizing effect, so with large batch sizes you reduce that variance. Anyway, obtaining good training but bad validation accuracy is trivial: just memorize the training dataset. |
| Jun 30, 2020 at 22:37 | answer added | Peteris | | timeline score: 8 |
| Jun 30, 2020 at 20:39 | comment added | Aleksandr Dubinsky | | @StephanKolassa Can you cite a source showing that this would actually ruin the performance of a neural network? You link to yourself making the same assertion. |
| Jun 30, 2020 at 18:37 | answer added | csiz | | timeline score: 2 |
| Jun 30, 2020 at 18:35 | answer added | Aleksandr Dubinsky | | timeline score: 2 |
| Jun 30, 2020 at 16:55 | history became hot network question | | | |
| Jun 30, 2020 at 12:00 | history tweeted | twitter.com/StackStats/status/1277934915187859457 | | |
| Jun 30, 2020 at 10:40 | history edited | Rahn | CC BY-SA 4.0 | added 15 characters in body |
| Jun 30, 2020 at 10:33 | history edited | Rahn | CC BY-SA 4.0 | added 39 characters in body |
| Jun 30, 2020 at 10:20 | answer added | HXD | | timeline score: 3 |
| Jun 30, 2020 at 9:46 | comment added | Stephan Kolassa | | @DikranMarsupial: yes, as I wrote, "completely random features". |
| Jun 30, 2020 at 9:35 | comment added | Dikran Marsupial | | @StephanKolassa I assume you mean adding additional input features/attributes, rather than additional training samples (random input data could mean either)? |
| Jun 30, 2020 at 9:32 | comment added | Stephan Kolassa | | I am not talking about adding layers. I am talking about adding random input data. |
| Jun 30, 2020 at 9:27 | answer added | Dikran Marsupial | | timeline score: 16 |
| Jun 30, 2020 at 9:25 | comment added | Rahn | | @StephanKolassa my experience says that simply adding more layers/channels doesn't usually improve training performance. |
| Jun 30, 2020 at 9:22 | comment added | Tim | | @StephanKolassa I guess that raises the question: what is "allowed" for making it overfit? For example, using random labels at the training stage but correct ones at the test stage would work. |
| Jun 30, 2020 at 9:21 | comment added | Stephan Kolassa | | Add a lot of completely random features to your net. Unless you prune/regularize, your net will latch on to the spurious correlations and do better and better in training, and worse and worse in testing/validation. You can even overfit on the test set; it's just a question of sifting through enough random data. See here. |
| Jun 30, 2020 at 9:18 | comment added | Rahn | | @StephanKolassa could you elaborate? |
| Jun 30, 2020 at 9:17 | comment added | Stephan Kolassa | | @Tim: wouldn't just adding massive amounts of totally random data do the trick? |
| Jun 30, 2020 at 9:03 | history edited | Rahn | CC BY-SA 4.0 | deleted 1 character in body |
| Jun 30, 2020 at 9:00 | comment added | Tim | | Nothing "guarantees" overfitting. If there were something like that, we would simply not use it when building neural networks. |
| Jun 30, 2020 at 8:56 | history edited | Rahn | CC BY-SA 4.0 | added 30 characters in body |
| Jun 30, 2020 at 8:51 | history asked | Rahn | CC BY-SA 4.0 | |
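
The comments from Stephan Kolassa (add completely random features), Tim (train on random labels), and SpiderRico (just memorize the training set) in the timeline above all point at one recipe: give the network nothing real to learn, remove every form of regularization, make the model much larger than the dataset, and train for a long time. Below is a minimal sketch of that recipe, assuming PyTorch; the dataset sizes, layer widths, learning rate, and epoch count are illustrative choices, not taken from the question or any of the posted answers.

```python
# Minimal sketch (illustrative only): memorize random labels on noise inputs with an
# over-parameterized, unregularized MLP. Expect near-perfect training accuracy and
# chance-level validation accuracy, i.e. deliberate overfitting.
import torch
import torch.nn as nn

torch.manual_seed(0)

n_train, n_features, n_classes = 256, 20, 2
X_train = torch.randn(n_train, n_features)          # pure-noise inputs
y_train = torch.randint(0, n_classes, (n_train,))   # random labels: nothing real to learn
X_val = torch.randn(n_train, n_features)
y_val = torch.randint(0, n_classes, (n_train,))

# Far more parameters than training points; no dropout, no weight decay, no early stopping.
model = nn.Sequential(
    nn.Linear(n_features, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, n_classes),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)  # note: no weight_decay
loss_fn = nn.CrossEntropyLoss()

for epoch in range(2000):                            # full-batch training, run long
    opt.zero_grad()
    loss = loss_fn(model(X_train), y_train)
    loss.backward()
    opt.step()

def accuracy(X, y):
    with torch.no_grad():
        return (model(X).argmax(dim=1) == y).float().mean().item()

print(f"train acc: {accuracy(X_train, y_train):.2f}, val acc: {accuracy(X_val, y_val):.2f}")
```

The gap between the two printed numbers (roughly 1.0 versus roughly 0.5 here) is exactly the train/validation divergence the question is asking how to produce on purpose.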
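skrrrt's decision-tree remark is even easier to demonstrate. A short scikit-learn sketch under the same illustrative-noise assumptions: a tree with no depth limit memorizes the training noise perfectly and scores at chance on held-out noise.

```python
# Illustrative sketch: an unpruned decision tree overfits pure noise by construction.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(200, 10)), rng.integers(0, 2, size=200)
X_val, y_val = rng.normal(size=(200, 10)), rng.integers(0, 2, size=200)

tree = DecisionTreeClassifier(max_depth=None)   # unlimited depth: splits until leaves are pure
tree.fit(X_train, y_train)
print("train acc:", tree.score(X_train, y_train))  # 1.0
print("val acc:", tree.score(X_val, y_val))        # ~0.5 (chance)
```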