Timeline for Denoising Autoencoder not training properly

4 events

when	what		by	license	comment
Feb 10, 2016 at 15:16	comment	added	Blackecho		I initialize the weights using a uniform random distribution between `- k * sqrt(6 / (fan in + fan out))` and `k * sqrt(6 / (fan in + fan out))` where I use k = 1 with the tanh activation function and k = 4 for the sigmoid. Thanks for making me notice that I forgot to include the weights initialization function in the code :)
Feb 9, 2016 at 13:16	comment	added	johnblund		Yeah, it should definitely be possible, but it seems natural that a lower number of hidden units would more easily find a lower-dimensional representation of the data (abstract features). I am not sure how you initialize the weights, but as I understand it, this is very important for convergence. You want the weights small enough that the units start in the linear region of the activation function, so that gradient descent gets going.
Feb 9, 2016 at 11:44	comment	added	Blackecho		I am just getting started too. The idea is stacking DAs, but I would expect a single DA to extract meaningful features even without stacking and fine tuning. Am I wrong? However, I'll try with a lower number of hidden nodes, thanks!
Feb 9, 2016 at 9:51	history	answered	johnblund	CC BY-SA 3.0
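
The initialization described in the Feb 10 comment (uniform between `- k * sqrt(6 / (fan in + fan out))` and `k * sqrt(6 / (fan in + fan out))`, with k = 1 for tanh and k = 4 for sigmoid) might look roughly like the sketch below. This is not the original poster's code: the function name `init_weights`, its parameters, and the use of NumPy are assumptions made for illustration.

```python
import numpy as np

# Scaling factor k from the comment: 1 for tanh, 4 for sigmoid.
SCALE = {"tanh": 1.0, "sigmoid": 4.0}


def init_weights(fan_in, fan_out, activation="tanh", rng=None):
    """Hypothetical helper: sample a (fan_in, fan_out) weight matrix
    uniformly from [-limit, limit], where
    limit = k * sqrt(6 / (fan_in + fan_out))."""
    if activation not in SCALE:
        raise ValueError(f"unsupported activation: {activation!r}")
    if rng is None:
        rng = np.random.default_rng()
    limit = SCALE[activation] * np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))


# Example: 784 visible units and 256 hidden units (sizes chosen arbitrarily).
W = init_weights(784, 256, activation="sigmoid", rng=np.random.default_rng(0))
```

Because the limit shrinks as the layer gets wider, the initial pre-activations tend to stay small, which is consistent with the advice in the Feb 9 comment about starting in the linear region of the activation function.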