Linked Questions
1 vote
0 answers
303 views
How does Feature Scaling help Gradient Descent? [duplicate]
I am following deeplearning.ai's videos on Coursera. I have a couple of questions about feature scaling using the formula: $$(x - \mu)/\sigma$$ Edit: There are similar questions which deal with ...
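As a quick illustration (not part of the question; the NumPy code and example feature values below are made up), standardizing each feature with $(x - \mu)/\sigma$ puts all columns on a comparable scale, which keeps gradient descent updates from being dominated by the largest-valued feature:

```python
import numpy as np

def standardize(X):
    """Scale each column to zero mean and unit variance: (x - mu) / sigma."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma

# Two features on very different scales, e.g. house size vs. number of rooms.
X = np.array([[2100.0, 3.0],
              [1600.0, 2.0],
              [2400.0, 4.0]])
X_scaled = standardize(X)
print(X_scaled.mean(axis=0))  # approximately [0, 0]
print(X_scaled.std(axis=0))   # approximately [1, 1]
```

After scaling, the loss surface is far less elongated, so a single learning rate works reasonably well in every direction.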
0 votes
0 answers
105 views
The result of back propagation for a neural network [duplicate]
I have created a neural network that feeds an image into a convolutional neural net, then feeds the flattened output of this network into an artificial neural network. I have a feeling that my ...
26 votes
4 answers
10k views
Why are second-order derivatives useful in convex optimization?
I guess this is a basic question and it has to do with the direction of the gradient itself, but I'm looking for examples where 2nd order methods (e.g. BFGS) are more effective than simple gradient ...
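One toy example of the kind being asked for (a sketch with a made-up ill-conditioned quadratic, not taken from any answer): on $f(x) = \tfrac{1}{2}x^\top A x$, plain gradient descent must use a step size small enough for the stiffest direction and therefore crawls along the shallow one, while a Newton step, which uses the Hessian, lands on the minimum immediately.

```python
import numpy as np

# f(x) = 0.5 * x^T A x with an ill-conditioned A (condition number 100).
A = np.diag([1.0, 100.0])
x0 = np.array([1.0, 1.0])

def grad(x):
    return A @ x

# Plain gradient descent: the step size is capped by the largest eigenvalue,
# so progress along the shallow first coordinate is very slow.
x = x0.copy()
eta = 1.0 / 100.0
for _ in range(500):
    x = x - eta * grad(x)
print("gradient descent:", x)   # roughly [0.0066, 0]: the first coordinate shrinks by only 1% per step

# Newton step: x0 - H^{-1} grad(x0) = x0 - A^{-1} A x0 = 0, the exact minimum.
x_newton = x0 - np.linalg.solve(A, grad(x0))
print("newton step:", x_newton)  # [0, 0]
```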
7 votes
1 answer
4k views
How can change in cost function be positive?
In chapter 1 of Nielsen's Neural Networks and Deep Learning it says: To make gradient descent work correctly, we need to choose the learning rate η to be small enough that Equation (9) is a good ...
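For context, the first-order approximation the excerpt refers to, and why a sufficiently small η makes the change in cost non-positive (a standard derivation, written out here rather than quoted from the book):

$$ \Delta C \approx \nabla C \cdot \Delta v, \qquad \Delta v = -\eta\,\nabla C \;\Rightarrow\; \Delta C \approx -\eta\,\|\nabla C\|^2 \le 0. $$

The guarantee $\Delta C \le 0$ only holds while η is small enough that the ignored higher-order terms are negligible.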
6 votes
3 answers
2k views
Basic preconditioned gradient descent example
I'm exploring preconditioned gradient descent using a similar toy problem described in the first part of Lecture 8: Accelerating SGD with preconditioning and adaptive learning rates. I have the ...
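A minimal sketch of the idea, using a made-up diagonal quadratic rather than the problem from the lecture: the preconditioner $P$ rescales the gradient so that every coordinate converges at the same rate; here $P$ is simply the inverse of the known diagonal Hessian.

```python
import numpy as np

# Toy objective f(x) = 0.5 * x^T H x with a badly scaled diagonal Hessian.
H = np.diag([1.0, 50.0])
P = np.diag(1.0 / np.diag(H))   # preconditioner: inverse of the diagonal Hessian
x = np.array([1.0, 1.0])
eta = 1.0

for _ in range(5):
    g = H @ x                   # gradient of the quadratic
    x = x - eta * (P @ g)       # preconditioned step rescales each coordinate
print(x)  # [0, 0] after the first step; plain GD would need eta < 2/50 to stay stable
```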
1 vote
3 answers
2k views
Is there a reason we need to make a logistic regression linear using the logit?
My understanding is that we use the logit function to convert the sigmoidal curve of a logistic regression to be linear. As a result, we go from a curve modeled as $P = e^{a+bX}/(1 + e^{a+bX})$ to one that ...
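Written out for clarity (a standard identity, not quoted from the question), the logit transform turns the sigmoidal model into one that is linear in the parameters:

$$ P = \frac{e^{a+bX}}{1+e^{a+bX}} \quad\Longleftrightarrow\quad \log\frac{P}{1-P} = a + bX. $$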
7 votes
1 answer
1k views
Is there any paper which summarizes the mathematical foundation of deep learning? [closed]
Is there any paper which summarizes the mathematical foundation of deep learning? Now, I am studying about the mathematical background of deep learning. However, unfortunately I cannot know to what ...
9 votes
1 answer
1k views
How to deal with unstable estimates during curve fitting?
First of all, I understand this isn't a strictly statistical question, but I've seen other questions involving optim() here. Please feel free to suggest a better SE ...
3 votes
2 answers
716 views
Problems that are difficult for SGD
I am doing some research on problems for which stochastic gradient descent doesn't perform well. SGD is often mentioned as the best method for training neural networks. However, I've also ...
1 vote
1 answer
209 views
Can we apply analyticity of a neural network to improve upon gradient descent? [duplicate]
Gradient descent uses the first-order derivative information of the objective function as a function of the parameters. Gradient descent therefore uses only “local” information about the objective ...
1 vote
1 answer
335 views
Why does gradient descent HAVE to find the minimum, as opposed to moving in the opposite direction?
I have a question about the gradient descent step in neural networks. I fully understand the derivative step and taking the steps required to move in the direction that reduces the loss (finding the ...
1 vote
1 answer
447 views
Alternating negative and positive values of slope and y-intercept in gradient descent
I'm working with the following code for gradient descent for simple linear regression: ...
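The question's own code is elided above. As a stand-in, here is a minimal gradient-descent loop for simple linear regression on made-up, noise-free data; it also shows the usual cause of sign-alternating estimates, namely a learning rate large enough that the iterates overshoot the minimum on every step:

```python
import numpy as np

# Made-up, noise-free data: y = 2x + 1.
x = np.arange(1.0, 11.0)
y = 2.0 * x + 1.0

def gd(learning_rate, steps):
    """Plain gradient descent on the mean squared error of y ~ m*x + b."""
    m, b = 0.0, 0.0
    for _ in range(steps):
        resid = m * x + b - y
        grad_m = 2.0 * np.mean(resid * x)
        grad_b = 2.0 * np.mean(resid)
        m -= learning_rate * grad_m
        b -= learning_rate * grad_b
    return m, b

print(gd(0.01, steps=2000))  # approaches m = 2, b = 1
print(gd(0.05, steps=20))    # too large: m and b flip sign on each step and blow up
```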
0 votes
1 answer
324 views
Interpreting cost change plot in a neural network for learning XOR
I tried to build a neural net for learning XOR. The design is as follows: 1st layer: compute a linear function of the 4:2 input with 2:2 weights and add a 1:2 bias. 2nd layer: apply sigmoid to all ...
3 votes
1 answer
163 views
How does a neural network with stochastic backpropagation make sure it doesn't "undo" previous learning?
Assume we have a neural network with stochastic gradient descent used for backpropagation, and therefore each element in the training set is used once to calculate the error, and then to adjust the ...
2 votes
1 answer
156 views
Gradient Descent Rule in feedforward ANN
I am having a hard time understanding the Gradient Descent Rule for learning in a feedforward ANN. In particular, how do we determine the initial weight vector, and how is this weight vector adjusted ...
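A hedged sketch of the two pieces the question asks about, using a hypothetical single sigmoid neuron: the initial weights are typically just small random numbers, and the gradient descent rule then nudges each weight against its error gradient.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical single sigmoid neuron: 3 inputs, squared-error loss.
W = rng.normal(scale=0.1, size=3)   # initial weights: small random values
b = 0.0
eta = 0.5

x = np.array([0.2, -0.4, 0.7])      # one made-up training example
t = 1.0                             # its target output

y = 1.0 / (1.0 + np.exp(-(W @ x + b)))   # forward pass
delta = (y - t) * y * (1.0 - y)          # dL/dz for L = 0.5*(y - t)^2 through the sigmoid
W -= eta * delta * x                     # gradient descent rule: w <- w - eta * dL/dw
b -= eta * delta                         # same rule for the bias
```

A multi-layer network applies the same rule layer by layer, with the gradients obtained by backpropagation.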
0 votes
1 answer
111 views
What stops gradient descent from finding the largest error? [duplicate]
If a gradient points towards a max or a min, what stops gradient descent from maximizing the error instead of minimizing it? Is it the nature of the update step that makes this process one-way?
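A small numeric check of the point at issue (toy objective and step size chosen arbitrarily): the gradient points in the direction of steepest increase, so it is precisely the minus sign in the update $x \leftarrow x - \eta\,\nabla f(x)$ that makes the process one-way towards lower error.

```python
import numpy as np

# Toy objective f(x) = ||x||^2 and its gradient; the starting point is arbitrary.
def f(x):
    return float((x ** 2).sum())

def grad(x):
    return 2 * x

x = np.array([3.0, -2.0])
eta = 0.1
print(f(x))                    # 13.0  at the current point
print(f(x - eta * grad(x)))    # 8.32  stepping against the gradient decreases f
print(f(x + eta * grad(x)))    # 18.72 stepping along the gradient increases f
```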