
If I were to do loss = loss/10 before calculating the gradient, would that change the amount of change applied to the model parameters during backpropagation?

Or does the amount of change depend only on the direction of the gradient and the learning rate?

I'm especially interested in how this works in PyTorch.


1 Answer


By the chain rule, scaling the loss by a scalar value c, i.e. loss = c*loss, causes all gradients computed via backprop to also be scaled by c: if loss -> c*loss, then grad -> c*grad.

Scaling the gradients by c changes the magnitude of the gradient vectors but not the direction.
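As a minimal PyTorch sketch you can check this directly (the linear model, random data, and c = 0.1 here are just for illustration):

    import torch

    torch.manual_seed(0)
    model = torch.nn.Linear(4, 1)
    x, y = torch.randn(8, 4), torch.randn(8, 1)
    c = 0.1  # e.g. loss = loss / 10

    # gradients of the unscaled loss
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    grad_unscaled = model.weight.grad.clone()

    # gradients of the scaled loss
    model.zero_grad()
    loss = c * torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    grad_scaled = model.weight.grad.clone()

    print(torch.allclose(grad_scaled, c * grad_unscaled))  # True: grads are scaled by c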

In a gradient descent context, scaling the gradients by c is equivalent to scaling the learning rate by c:

    loss = ...
    w_new = w_old - lr * grad

becomes

    loss = c*loss
    w_new = w_old - lr * c * grad     # scaling loss by c -> scaling grad by c
    w_new = w_old - lr_scaled * grad  # lr_scaled = lr * c
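
A short PyTorch sketch of that equivalence, assuming plain SGD (no momentum or weight decay) and an illustrative model and data:

    import copy
    import torch

    torch.manual_seed(0)
    model_a = torch.nn.Linear(4, 1)
    model_b = copy.deepcopy(model_a)  # identical starting weights
    x, y = torch.randn(8, 4), torch.randn(8, 1)
    lr, c = 0.1, 0.1

    # A: loss scaled by c, learning rate lr
    opt_a = torch.optim.SGD(model_a.parameters(), lr=lr)
    (c * torch.nn.functional.mse_loss(model_a(x), y)).backward()
    opt_a.step()

    # B: unscaled loss, learning rate lr * c
    opt_b = torch.optim.SGD(model_b.parameters(), lr=lr * c)
    torch.nn.functional.mse_loss(model_b(x), y).backward()
    opt_b.step()

    print(torch.allclose(model_a.weight, model_b.weight))  # True: same update

Note that this one-to-one correspondence holds for vanilla SGD; optimizers that normalize the gradient (e.g. Adam) do not preserve it exactly.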