Linked Questions

46 votes · 3 answers · 125k views

I am training a model (Recurrent Neural Network) to classify 4 types of sequences. As I run my training I see the training loss going down until the point where I correctly classify over 90% of the ...

1 vote · 1 answer · 2k views

Is the following statement true: "Gradient descent is guaranteed to always decrease a loss function"? I know that if the loss function is convex, then each iteration of gradient descent will result in ...
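
As a quick illustration of why the general answer is no (an added example, not part of the question): even for the convex function $f(x) = x^2$, gradient descent with step size $\eta$ gives
$$x_{k+1} = x_k - \eta f'(x_k) = (1 - 2\eta)\,x_k,$$
so for $\eta > 1$ we get $|x_{k+1}| > |x_k|$ and the loss increases at every step. Convexity alone is not enough; the guarantee also depends on the step size.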

0 votes · 1 answer · 1k views

When performing stochastic gradient descent, is it necessary for the training loss to decrease a) between iterations in an epoch? (I think the answer is no) b) between epochs? (I think the answer is ...
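
A minimal sketch of the distinction, on synthetic data rather than the asker's setup: the per-minibatch loss under SGD is noisy and can rise from one iteration to the next, while the average loss over an epoch typically trends down.

```python
# Minimal sketch on synthetic data (assumed, not the asker's setup).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
true_w = rng.normal(size=5)
y = X @ true_w + 0.1 * rng.normal(size=1000)

w = np.zeros(5)
lr, batch_size = 0.05, 32

for epoch in range(5):
    perm = rng.permutation(len(X))
    batch_losses = []
    for start in range(0, len(X), batch_size):
        idx = perm[start:start + batch_size]
        err = X[idx] @ w - y[idx]
        batch_losses.append(np.mean(err ** 2))   # noisy: may rise between iterations
        w -= lr * 2 * X[idx].T @ err / len(idx)  # SGD step on the minibatch MSE
    print(f"epoch {epoch}: mean minibatch loss {np.mean(batch_losses):.4f}")
```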

0 votes · 1 answer · 551 views

I'm currently trying to get the basics of PyTorch, playing around with simple network topologies for the Fashion-MNIST dataset. However, when I record the loss of those models after each epoch, it ...
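
For context, a minimal sketch of how an averaged training loss is usually recorded once per epoch in PyTorch; the model, data, and hyperparameters below are placeholders (random tensors standing in for Fashion-MNIST), not the asker's code.

```python
# Minimal sketch (assumed setup, not the asker's code): record one averaged
# training-loss value per epoch.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

X = torch.randn(512, 1, 28, 28)          # stand-in for Fashion-MNIST images
y = torch.randint(0, 10, (512,))          # stand-in labels
loader = DataLoader(TensorDataset(X, y), batch_size=64, shuffle=True)

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

epoch_losses = []
for epoch in range(3):
    running, n = 0.0, 0
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
        running += loss.item() * len(xb)
        n += len(xb)
    epoch_losses.append(running / n)      # one averaged value per epoch
    print(f"epoch {epoch}: loss {epoch_losses[-1]:.4f}")
```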

2 votes · 0 answers · 336 views

When learning about Neural Networks and Gradient Descent, we are often shown the following picture that illustrates the obstacles that can be encountered when trying to optimize the Loss Functions ...

1 vote · 0 answers · 263 views

I am trying to use Hugging Face Datasets for speech recognition with transformers, following this tutorial, with epochs=30, steps=400, train_batch_size=16. Training loss, validation loss and WER decrease, and ...

0 votes · 1 answer · 111 views

If a gradient points towards a max or a min, what stops gradient descent from maximizing error instead of minimizing it? Is it the nature of the update step that makes this process one way?
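
What makes it one-way is the sign in the update rule (a standard fact, added here for illustration): the gradient $\nabla L(\theta)$ points in the direction of steepest ascent, and gradient descent moves against it,
$$\theta_{k+1} = \theta_k - \eta\,\nabla L(\theta_k),$$
so for a sufficiently small step size $\eta$ the loss does not increase (for smooth losses). Using $+\eta\,\nabla L(\theta_k)$ instead would be gradient ascent and would drive the error up.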

0 votes · 0 answers · 105 views

I have created a neural network that feeds an image into a convolutional neural net, then feeds the flattened output of this network into an artificial neural network. I have a feeling that my ...

375 votes · 9 answers · 377k views

I'm training a neural network, but the training loss doesn't decrease. How can I fix this? I'm not asking about overfitting or regularization. I'm asking about how to solve the problem where my network'...

30 votes · 6 answers · 11k views

Given a convex cost function, using SGD for optimization, we will have a gradient (vector) at a certain point during the optimization process. My question is, given the point on the convex, does the ...

26 votes · 4 answers · 10k views

I guess this is a basic question and it has to do with the direction of the gradient itself, but I'm looking for examples where 2nd order methods (e.g. BFGS) are more effective than simple gradient ...
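
A standard illustration of the gap (added here, not taken from the question): on a strictly convex quadratic $f(x) = \tfrac{1}{2}x^\top H x$, gradient descent iterates $x_{k+1} = x_k - \eta H x_k$ and slows down as the condition number of $H$ grows, whereas the Newton step
$$x_{k+1} = x_k - H^{-1}\nabla f(x_k) = x_k - H^{-1}H x_k = 0$$
lands on the minimizer in one iteration. Quasi-Newton methods such as BFGS build an approximation to $H^{-1}$ from successive gradients, which is why they shine on ill-conditioned problems.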

13 votes · 4 answers · 3k views

I am trying to understand gradient descent optimization in ML (machine learning) algorithms. I understand that there's a cost function, where the aim is to minimize the error $\hat y-y$. In a ...
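
For concreteness, one common instantiation (an assumed example, not necessarily the asker's exact setup): with linear predictions $\hat y_i = \theta^\top x_i$, the squared-error cost is
$$J(\theta) = \frac{1}{2n}\sum_{i=1}^{n}(\hat y_i - y_i)^2, \qquad \nabla_\theta J(\theta) = \frac{1}{n}\sum_{i=1}^{n}(\hat y_i - y_i)\,x_i,$$
and gradient descent repeatedly applies $\theta \leftarrow \theta - \eta\,\nabla_\theta J(\theta)$.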

6 votes · 2 answers · 15k views

I've implemented my own gradient descent algorithm for an OLS, code below. It works; however, when the learning rate is too large (i.e. learn_rate >= .3), my approach is unstable. The coefficients ...
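
Since the asker's code isn't reproduced here, a minimal sketch of gradient descent for OLS on synthetic data (assumed names and setup): with the cost $\frac{1}{2n}\lVert X\beta - y\rVert^2$, the iteration is stable only when the learning rate is below $2/\lambda_{\max}(X^\top X/n)$, which is why a rate that is fine on one dataset can blow up on another.

```python
# Minimal sketch of gradient descent for OLS (assumed setup, not the asker's code).
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=200)

def gd_ols(X, y, learn_rate, n_iter=500):
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / len(y)   # gradient of (1/2n)||X beta - y||^2
        beta -= learn_rate * grad
    return beta

print(gd_ols(X, y, learn_rate=0.1))                      # converges near beta_true
lam_max = np.linalg.eigvalsh(X.T @ X / len(y)).max()
print("diverges once learn_rate exceeds", 2 / lam_max)   # data-dependent threshold
```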

10 votes · 2 answers · 8k views

I've noticed in different papers that, after a certain number of epochs, there is sometimes a sudden drop in error rate when training a CNN. This example is taken from the "Densely Connected ...

1 vote · 0 answers · 3k views

I just wondered: are there cases where small or very small learning rates in gradient-descent-based optimization are useful? A large learning rate allows the model to explore a much larger portion ...
