Timeline for Denoising Autoencoder not training properly
Current License: CC BY-SA 3.0
4 events
| when | what | by | license | comment |
|---|---|---|---|---|
| Feb 10, 2016 at 15:16 | comment added | Blackecho | | I initialize the weights using a uniform random distribution between -k * sqrt(6 / (fan_in + fan_out)) and +k * sqrt(6 / (fan_in + fan_out)), where I use k = 1 with the tanh activation function and k = 4 for the sigmoid. Thanks for making me notice that I forgot to include the weight initialization function in the code :) |
| Feb 9, 2016 at 13:16 | comment added | johnblund | | Yeah, it should definitely be possible, but naturally it seems that a lower number of hidden units would more easily find a lower-dimensional representation of the data (abstract features). I am not sure how you initialize the weights, but as I understand it, this is very important for convergence. You want them to be in the linear region of the activation function at the start to get the gradient descent going. |
| Feb 9, 2016 at 11:44 | comment added | Blackecho | | I am just getting started too. The idea is stacking DAs, but I would expect a single DA to extract meaningful features even without stacking and fine-tuning. Am I wrong? In any case, I'll try with a lower number of hidden nodes, thanks! |
| Feb 9, 2016 at 9:51 | answered | johnblund | CC BY-SA 3.0 | |
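
For reference, the initialization scheme described in the Feb 10 comment is a Glorot-style uniform initialization. Below is a minimal NumPy sketch of that scheme; the function name, signature, and layer sizes are illustrative, not taken from the original code.

```python
import numpy as np

def init_weights(fan_in, fan_out, activation="tanh", rng=None):
    """Glorot-style uniform initialization as described in the comment:
    W ~ U(-k * sqrt(6 / (fan_in + fan_out)), +k * sqrt(6 / (fan_in + fan_out))),
    with k = 1 for tanh and k = 4 for sigmoid."""
    rng = rng or np.random.default_rng()
    k = 4.0 if activation == "sigmoid" else 1.0
    limit = k * np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

# Example: weights for a DA with 784 visible and 256 hidden units (hypothetical sizes).
W = init_weights(784, 256, activation="sigmoid")
```

The small uniform range keeps the initial pre-activations near zero, i.e. in the roughly linear region of tanh/sigmoid that the Feb 9 comment points to, so gradients neither vanish nor saturate at the start of training.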