33 events
when | what | by | license | comment
Jul 1, 2020 at 20:37 comment added eric_kernfeld For readers like me who struggle to understand what overfitting is exactly (despite ample folklore), this answer may be helpful.
Jul 1, 2020 at 18:35 history protected gung - Reinstate Monica
Jul 1, 2020 at 15:38 answer added gdelab timeline score: 2
Jul 1, 2020 at 10:06 comment added Aleksandr Dubinsky @StephanKolassa It's more fundamental. Even without dropout, the many randomly-initialized neurons do ensembling. Dropout, SGD, etc. enhance it but aren't crucial. But again, it would make for great research.
Jul 1, 2020 at 9:57 comment added Stephan Kolassa @AleksandrDubinsky: you may well be right. I suspect that this magic comes from the regularization/pruning/dropout that is usually applied automatically, and which OP thought about turning off.
Jul 1, 2020 at 9:54 comment added Aleksandr Dubinsky @StephanKolassa I'm pretty sure the magic of NN models is that they avoid doing precisely this. That's what makes them great on high-dimensional data. Still, it would make for a very interesting experiment. I suppose that the more spurious features in each sample, the larger the model would have to be in order to regularize over the additional variance.
Jul 1, 2020 at 6:53 comment added Stephan Kolassa @AleksandrDubinsky: I am not an expert on neural networks, which is why I am suggesting this as a comment, not posting as an answer, and so I don't have a reference. I am linking to another answer of mine as an illustration, since I am most familiar with what I wrote myself.
Jul 1, 2020 at 4:49 answer added Wololo timeline score: 4
Jul 1, 2020 at 4:10 answer added D.W. timeline score: 5
Jul 1, 2020 at 3:21 comment added skrrrt Not sure about neural networks, but with a decision tree with no max depth you could surely overfit.
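A minimal sketch of the point in the comment above, assuming scikit-learn and a synthetic dataset (not code from the thread): an unpruned tree (max_depth=None) typically fits the training set perfectly while scoring noticeably worse on held-out data.

```python
# Illustrative only: an unpruned decision tree overfitting a noisy synthetic dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# flip_y adds label noise, which makes the gap between train and test more visible.
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           flip_y=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

tree = DecisionTreeClassifier(max_depth=None, random_state=0)  # no depth limit, no pruning
tree.fit(X_tr, y_tr)

print("train accuracy:", tree.score(X_tr, y_tr))  # typically 1.0
print("test accuracy: ", tree.score(X_te, y_te))  # noticeably lower
```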
Jul 1, 2020 at 2:02 comment added SpiderRico Regarding batch size, I think using larger batches helps over-fitting. There are some experimental results showing that the variance in stochastic gradients has some kind of regularization effect, so with large batch sizes you reduce that variance. Anyway, obtaining good training but bad validation accuracy is trivial: just memorize the training dataset?
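A quick sketch of the "just memorize the training dataset" point, assuming scikit-learn; a 1-nearest-neighbour classifier (chosen here only for brevity, not a model from this thread) stores the training points verbatim, so training accuracy is perfect by construction while held-out accuracy reflects actual generalization.

```python
# Illustrative only: literal memorization of the training set via 1-NN.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=400, n_features=20, flip_y=0.3, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=1)

memorizer = KNeighborsClassifier(n_neighbors=1).fit(X_tr, y_tr)
print("train accuracy:", memorizer.score(X_tr, y_tr))  # 1.0: each training point is its own nearest neighbour
print("test accuracy: ", memorizer.score(X_te, y_te))  # lower, especially with label noise
```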
Jun 30, 2020 at 22:37 answer added Peteris timeline score: 8
Jun 30, 2020 at 20:39 comment added Aleksandr Dubinsky @StephanKolassa Can you cite evidence that this would actually ruin the performance of a neural network? You link to yourself making the same assertion.
Jun 30, 2020 at 18:37 answer added csiz timeline score: 2
Jun 30, 2020 at 18:35 answer added Aleksandr Dubinsky timeline score: 2
Jun 30, 2020 at 16:55 history became hot network question
Jun 30, 2020 at 12:00 history tweeted twitter.com/StackStats/status/1277934915187859457
Jun 30, 2020 at 10:40 history edited Rahn CC BY-SA 4.0 (added 15 characters in body)
Jun 30, 2020 at 10:33 history edited Rahn CC BY-SA 4.0 (added 39 characters in body)
Jun 30, 2020 at 10:20 answer added HXD timeline score: 3
Jun 30, 2020 at 9:46 comment added Stephan Kolassa @DikranMarsupial: yes, as I wrote, "completely random features".
Jun 30, 2020 at 9:35 comment added Dikran Marsupial @StephanKolassa I assume you mean adding additional input features/attributes, rather than additional training samples (random input data could mean either)?
Jun 30, 2020 at 9:32 comment added Stephan Kolassa I am not talking about adding layers. I am talking about adding random input data.
Jun 30, 2020 at 9:27 answer added Dikran Marsupial timeline score: 16
Jun 30, 2020 at 9:25 comment added Rahn @StephanKolassa my experience says that simply adding more layers/channels doesn't usually improve training performance.
Jun 30, 2020 at 9:22 comment added Tim @StephanKolassa I guess that brings up the question: what is "allowed" for making it overfit? For example, using random labels at the train stage, but correct ones at the test stage, would work.
Jun 30, 2020 at 9:21 comment added Stephan Kolassa Add a lot of completely random features to your net. Unless you prune/regularize, your net will latch onto the spurious correlations and do better and better in training, and worse in testing/validation. You can even overfit on the test set; it's just a question of sifting through enough random data. See here.
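A minimal sketch of this suggestion, assuming scikit-learn; MLPClassifier with the L2 penalty switched off stands in for the OP's network, and purely random feature columns are appended to a synthetic dataset so training and test accuracy can be compared as the noise grows.

```python
# Illustrative only: appending random, uninformative features and watching the
# train/test gap widen when the network is not regularized.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=5, n_informative=5,
                           n_redundant=0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

for n_noise in (0, 50, 200, 500):
    # Purely random columns: any correlation with y in the training set is spurious.
    noise_tr = rng.normal(size=(X_tr.shape[0], n_noise))
    noise_te = rng.normal(size=(X_te.shape[0], n_noise))
    net = MLPClassifier(hidden_layer_sizes=(100,), alpha=0.0,  # alpha=0: no L2 regularization
                        max_iter=2000, random_state=0)
    net.fit(np.hstack([X_tr, noise_tr]), y_tr)
    print(f"{n_noise:3d} random features -> "
          f"train {net.score(np.hstack([X_tr, noise_tr]), y_tr):.2f}, "
          f"test {net.score(np.hstack([X_te, noise_te]), y_te):.2f}")
```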
Jun 30, 2020 at 9:18 comment added Rahn @StephanKolassa could you elaborate?
Jun 30, 2020 at 9:17 comment added Stephan Kolassa @Tim: wouldn't just adding massive amounts of totally random data do the trick?
Jun 30, 2020 at 9:03 history edited Rahn CC BY-SA 4.0 (deleted 1 character in body)
Jun 30, 2020 at 9:00 comment added Tim Nothing "guarantees" overfitting. If there were something like this, we would simply not be using it when building neural networks.
Jun 30, 2020 at 8:56 history edited Rahn CC BY-SA 4.0 (added 30 characters in body)
Jun 30, 2020 at 8:51 history asked Rahn CC BY-SA 4.0