Question
- In normal GD the weights are updated for every row in the training dataset, while in SGD the weights are updated only once per mini-batch, based on the cumulative dLoss/dw1 and dLoss/dw2. Is my understanding correct?
- Does updating the weights only once for m examples, based on the cumulative dLoss/dw1 and dLoss/dw2, give the same result as updating the weights for every row in the training data, with SGD just being faster?
My question is with reference to the backpropagation algorithm below, from the Coursera Deep Learning course by Andrew Ng.
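
To make the two update schemes in the question concrete, here is a minimal sketch (not the course's code) contrasting one update per pass using the gradient accumulated over all m rows against one update per row. The toy least-squares problem, learning rate, and epoch count are all illustrative assumptions, not taken from the course material.

```python
import numpy as np

# Toy data for a linear model y ≈ w1*x1 + w2*x2 (illustrative, not from the course)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                    # 100 rows, features x1 and x2
y = X @ np.array([3.0, -2.0]) + 0.1 * rng.normal(size=100)

lr = 0.1  # learning rate (hypothetical value)

def grad(w, Xb, yb):
    """Gradient of mean squared error w.r.t. w = [w1, w2] on a batch of rows."""
    err = Xb @ w - yb
    return 2 * Xb.T @ err / len(yb)              # cumulative dLoss/dw, averaged over the batch

# Scheme 1: ONE weight update per pass, using the gradient
# accumulated over all m rows (batch gradient descent).
w_batch = np.zeros(2)
for epoch in range(50):
    w_batch -= lr * grad(w_batch, X, y)

# Scheme 2: a weight update for EVERY row (stochastic gradient descent).
w_sgd = np.zeros(2)
for epoch in range(50):
    for i in range(len(y)):
        w_sgd -= lr * grad(w_sgd, X[i:i+1], y[i:i+1])

print(w_batch, w_sgd)  # both approach [3, -2], but along different update paths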
