Question
- In normal GD the weights are updated for every row in the training dataset, while in SGD the weights are updated only once per mini-batch, based on the cumulative dLoss/dw1 and dLoss/dw2. Is my understanding correct?
- Does updating the weights only once for m examples, based on the cumulative dLoss/dw1 and dLoss/dw2, give the same result as updating the weights for every row in the training data, with SGD just being faster?
My question is with reference to the backpropagation algorithm below, from the Coursera Deep Learning course by Andrew Ng.
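
To make the two update schemes in the question concrete, here is a minimal sketch (not the course's code) contrasting one update per pass using the gradient accumulated over all m rows against one update per row. The toy least-squares problem, learning rate, and epoch count are all illustrative assumptions, not taken from the course material.

```python
import numpy as np

# Toy data for a linear model y ≈ w1*x1 + w2*x2 (illustrative, not from the course)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                    # 100 rows, features x1 and x2
y = X @ np.array([3.0, -2.0]) + 0.1 * rng.normal(size=100)

lr = 0.1  # learning rate (hypothetical value)

def grad(w, Xb, yb):
    """Gradient of mean squared error w.r.t. w = [w1, w2] on a batch of rows."""
    err = Xb @ w - yb
    return 2 * Xb.T @ err / len(yb)              # cumulative dLoss/dw, averaged over the batch

# Scheme 1: ONE weight update per pass, using the gradient
# accumulated over all m rows (batch gradient descent).
w_batch = np.zeros(2)
for epoch in range(50):
    w_batch -= lr * grad(w_batch, X, y)

# Scheme 2: a weight update for EVERY row (stochastic gradient descent).
w_sgd = np.zeros(2)
for epoch in range(50):
    for i in range(len(y)):
        w_sgd -= lr * grad(w_sgd, X[i:i+1], y[i:i+1])

print(w_batch, w_sgd)  # both approach [3, -2], but along different update paths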
