I'm trying to implement gradient descent (GD) for the standard task of neural network training :) The best papers for practitioners I've found so far are:
1) "Efficient BackProp" by Yann LeCun et al.
2) "Stochastic Gradient Descent Tricks" by Leon Bottou
Are there any other must-read papers on this topic?
Thank you!