4 events
when what by license comment
Feb 10, 2016 at 15:16 comment added Blackecho I initialize the weights from a uniform random distribution between -k * sqrt(6 / (fan_in + fan_out)) and k * sqrt(6 / (fan_in + fan_out)), where k = 1 for the tanh activation function and k = 4 for the sigmoid. Thanks for pointing out that I forgot to include the weight initialization function in the code :)
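The initialization described in this comment can be sketched roughly as follows. This is only an illustration using the Python standard library, not the commenter's actual code; the helper name `init_weights`, its parameters, and the layer sizes in the example are assumptions.

```python
import math
import random


def init_weights(fan_in, fan_out, activation="tanh", seed=None):
    """Return a fan_in x fan_out weight matrix as a list of lists.

    Hypothetical helper: each weight is drawn uniformly from
    [-limit, limit], where limit = k * sqrt(6 / (fan_in + fan_out)),
    with k = 1 for tanh and k = 4 for sigmoid, as in the comment.
    """
    scale = {"tanh": 1.0, "sigmoid": 4.0}
    if activation not in scale:
        raise ValueError(f"unsupported activation: {activation!r}")
    limit = scale[activation] * math.sqrt(6.0 / (fan_in + fan_out))
    rng = random.Random(seed)  # local generator, so results are reproducible
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]


# Example: 784 inputs and 256 hidden units (sizes chosen only for illustration).
W = init_weights(784, 256, activation="sigmoid", seed=0)
```

Note that the bound shrinks as the layer gets wider, so larger layers start with smaller individual weights.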
Feb 9, 2016 at 13:16 comment added johnblund Yeah, it should definitely be possible, but it seems natural that a smaller number of hidden units would more easily find a lower-dimensional representation of the data (abstract features). I am not sure how you initialize the weights, but as I understand it, this is very important for convergence. You want the weights to start out small enough that the units operate in the linear region of the activation function, so that gradient descent can get going.
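To illustrate the point about the linear region, here is a small sketch that evaluates the derivatives of tanh and the sigmoid at a few pre-activation values. The function names and the sample points are chosen only for this example.

```python
import math


def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2


def sigmoid_grad(x):
    """Derivative of the logistic sigmoid: s(x) * (1 - s(x))."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


# Near zero the units are roughly linear and the gradients are at their
# largest; far from zero the units saturate and the gradients shrink.
for x in (0.0, 0.5, 2.0, 5.0):
    print(f"x={x:4.1f}  tanh'={tanh_grad(x):.4f}  sigmoid'={sigmoid_grad(x):.4f}")
```

At x = 0 the tanh derivative is exactly 1 and the sigmoid derivative is 0.25, while at x = 5 the tanh derivative is already below 0.001. Large initial weights push many units into that saturated range, which is why the starting scale matters for getting gradient descent moving.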
Feb 9, 2016 at 11:44 comment added Blackecho I am just getting started too. The idea is to stack DAs, but I would expect a single DA to extract meaningful features even without stacking and fine-tuning. Am I wrong? Either way, I'll try a lower number of hidden nodes, thanks!
Feb 9, 2016 at 9:51 history answered johnblund CC BY-SA 3.0