Linked Questions

1 vote
0 answers
303 views

I am following deeplearning.ai's videos on Coursera. I have a couple of questions about feature scaling using the formula: $$ (x - \mu)/ \sigma $$ Edit: There are similar questions which deal with ...
asked by Nitin
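The standardization the formula describes can be sketched in a few lines of NumPy (a minimal illustration; the per-column mean and standard deviation are the usual convention, not something stated in the question):

```python
import numpy as np

# Toy feature matrix: rows are examples, columns are features (assumed layout).
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

mu = X.mean(axis=0)          # per-feature mean
sigma = X.std(axis=0)        # per-feature standard deviation
X_scaled = (X - mu) / sigma  # each feature now has zero mean and unit variance

print(X_scaled)
```
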
0 votes
0 answers
105 views

I have created a neural network that feeds an image into a convolutional neural net, then feeds the flattened output of this network into an artificial neural network. I have a feeling that my ...
asked by Nick
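A minimal sketch of that kind of architecture in Keras, assuming a 28×28 grayscale input and a 10-class output (both are placeholders, not details from the question):

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(28, 28, 1)),           # assumed image shape
    layers.Conv2D(16, 3, activation="relu"),   # convolutional feature extractor
    layers.MaxPooling2D(),
    layers.Flatten(),                          # flatten the conv output...
    layers.Dense(64, activation="relu"),       # ...and feed it into a fully connected net
    layers.Dense(10, activation="softmax"),    # assumed number of classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```
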
26 votes
4 answers
10k views

I guess this is a basic question and it has to do with the direction of the gradient itself, but I'm looking for examples where 2nd order methods (e.g. BFGS) are more effective than simple gradient ...
asked by Bar
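One standard illustration (my own toy setup, not taken from the question) is an ill-conditioned quadratic, where BFGS's curvature estimate lets it take far fewer steps than fixed-step gradient descent:

```python
import numpy as np
from scipy.optimize import minimize

# f(x) = 0.5 * x^T A x with curvatures 1 and 100 (condition number 100).
A = np.diag([1.0, 100.0])
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
x0 = np.array([1.0, 1.0])

# Quasi-Newton: BFGS builds an approximation to the (inverse) Hessian as it goes.
res = minimize(f, x0, jac=grad, method="BFGS")
print("BFGS iterations:", res.nit)

# Plain gradient descent with a fixed step size chosen below the stability limit 2/100.
x, lr = x0.copy(), 0.015
for i in range(10_000):
    x = x - lr * grad(x)
    if np.linalg.norm(grad(x)) < 1e-8:
        break
print("gradient descent iterations:", i + 1)
```
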
7 votes
1 answer
4k views

In chapter 1 of Nielsen's Neural Networks and Deep Learning it says: "To make gradient descent work correctly, we need to choose the learning rate η to be small enough that Equation (9) is a good ..."
asked by fabiomaia
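For reference, assuming "Equation (9)" denotes the first-order approximation of the cost used in that chapter, the condition is that the linearization stays accurate over the step, which is what guarantees each update decreases C:

$$ \Delta C \approx \nabla C \cdot \Delta v, \qquad \Delta v = -\eta\,\nabla C \;\Longrightarrow\; \Delta C \approx -\eta\,\lVert \nabla C \rVert^{2} \le 0, $$

which only holds as long as η is small enough for the approximation to remain valid.
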
6 votes
3 answers
2k views

I'm exploring preconditioned gradient descent using a similar toy problem described in the first part of Lecture 8: Accelerating SGD with preconditioning and adaptive learning rates. I have the ...
asked by Quantoisseur
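A minimal sketch of the idea on my own ill-conditioned quadratic (the lecture's exact toy problem isn't reproduced in the excerpt): rescale the gradient by a per-coordinate preconditioner so that all directions make progress at a similar rate.

```python
import numpy as np

# f(x) = 0.5 * x^T A x, with very different curvature per coordinate.
A = np.diag([1.0, 100.0])
grad = lambda x: A @ x

x = np.array([1.0, 1.0])
P = np.diag(1.0 / np.diag(A))   # diagonal preconditioner: inverse curvature per coordinate

for _ in range(10):
    x = x - P @ grad(x)          # preconditioned step; plain GD would use a scalar step size

print(x)  # with P = A^{-1} this lands on the minimum (the origin) in a single step
```
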
1 vote
3 answers
2k views

My understanding is that we use the logit function to convert the sigmoidal curve of a logistic regression into a linear one. As a result, we go from a curve modeled as $$ P = e^{a+bX} / (1 + e^{a+bX}) $$ to one that ...
asked by theforestecologist
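Cleaning up the algebra the question refers to: solving the logistic model for the log-odds gives a linear function of X,

$$ P = \frac{e^{a+bX}}{1+e^{a+bX}} \quad\Longleftrightarrow\quad \operatorname{logit}(P) = \ln\frac{P}{1-P} = a + bX. $$
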
7 votes
1 answer
1k views

Is there any paper which summarizes the mathematical foundations of deep learning? I am currently studying the mathematical background of deep learning, but unfortunately I do not know to what ...
asked by almnagako
9 votes
1 answer
1k views

First of all, I understand this isn't a strictly statistical question, but I've seen other questions involving optim() here. Please feel free to suggest a better SE ...
asked by overdisperse
3 votes
2 answers
716 views

I am doing some research on problems for which stochastic gradient descent doesn't perform well. SGD is often mentioned as the best method for training neural networks. However, I've also ...
asked by Lisa
1 vote
1 answer
209 views

Gradient descent uses first-order derivative information about the objective function as a function of the parameters. Gradient descent therefore uses only “local” information about the objective ...
asked by user56834
1 vote
1 answer
335 views

I have a question about the gradient descent step in neural networks. I fully understand the derivative step and taking the steps required to move in the direction that reduces the loss (finding the ...
asked by user9317212
1 vote
1 answer
447 views

I'm working with the following code for gradient descent for simple linear regression: ...
asked by boomselector
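The question's own code is elided above; as a point of reference, a minimal gradient-descent loop for simple linear regression (y ≈ a + b·x with a squared-error loss) might look like this, with the data, learning rate, and iteration count purely illustrative:

```python
import numpy as np

# Illustrative data generated from y = 2 + 3x plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 3.0 * x + rng.normal(0.0, 1.0, size=100)

a, b = 0.0, 0.0   # intercept and slope
lr = 0.01         # learning rate
n = len(x)

for _ in range(5_000):
    residual = (a + b * x) - y
    # Gradients of the mean squared error with respect to a and b.
    grad_a = (2.0 / n) * residual.sum()
    grad_b = (2.0 / n) * (residual * x).sum()
    a -= lr * grad_a
    b -= lr * grad_b

print(a, b)  # should end up close to the true values 2 and 3
```
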
0 votes
1 answer
324 views

I tried to build a neural net for learning XOR. The design is as follows: 1st layer: compute a linear function of the 4:2 input with 2:2 weights and add a 1:2 bias. 2nd layer: apply a sigmoid to all ...
asked by kirgol
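For comparison, here is a self-contained sketch of a 2-2-1 sigmoid network trained on XOR with plain gradient descent (the shapes follow the question's 4:2 input / 2:2 weights / 1:2 bias description; the output layer, loss, and learning rate are my assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR data: 4 examples x 2 inputs, targets 4 x 1.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 2)), np.zeros((1, 2))   # 2:2 weights, 1:2 bias
W2, b2 = rng.normal(size=(2, 1)), np.zeros((1, 1))

lr = 1.0
for _ in range(10_000):
    # Forward pass: linear -> sigmoid -> linear -> sigmoid.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass for a squared-error loss.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(out.round(2))  # usually close to [0, 1, 1, 0]; a 2-unit net can get stuck for some seeds
```
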
3 votes
1 answer
163 views

Assume we have a neural network with stochastic gradient descent used for backpropagation, and therefore each element in the training set is used once to calculate the error, and then to adjust the ...
asked by user56834
2 votes
1 answer
156 views

I am having a hard time understanding the Gradient Descent Rule for learning in a feedforward ANN. In particular, how do we determine the initial weight vector, and how is this weight vector adjusted ...
asked by David
0 votes
1 answer
111 views

If a gradient points towards a max or a min, what stops gradient descent from maximizing error instead of minimizing it? Is it the nature of the update step that makes this process one way?
asked by Jatearoon Keene Boondicharern
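For reference, what makes the process one-way is the minus sign in the update: to first order, a step against the gradient can only decrease the error, while a step along it (gradient ascent) would increase it:

$$ w_{t+1} = w_t - \eta\,\nabla E(w_t) \quad\Longrightarrow\quad E(w_{t+1}) \approx E(w_t) - \eta\,\lVert\nabla E(w_t)\rVert^{2} \le E(w_t). $$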