If I did loss = loss / 10 before computing the gradient, would that change the size of the updates applied to the model parameters during backpropagation?
Or does the size of each update depend only on the direction of the gradient and the learning rate?
I'm especially interested in how this works in PyTorch.
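To make the question concrete, here is a minimal sketch of the experiment I have in mind (a hypothetical toy parameter vector and loss, not my real model), comparing gradients with and without the scaling:

```python
import torch

torch.manual_seed(0)
w = torch.randn(3, requires_grad=True)  # toy "model parameters"
x = torch.randn(3)                      # toy input

# Gradient of the unscaled loss
loss = (w * x).sum() ** 2
loss.backward()
grad_unscaled = w.grad.clone()

# Gradient of the same loss divided by 10
w.grad = None  # reset accumulated gradient
loss = ((w * x).sum() ** 2) / 10
loss.backward()
grad_scaled = w.grad.clone()

print(grad_scaled / grad_unscaled)  # is this ~0.1 everywhere?
```

If the printed ratio comes out as roughly 0.1, the gradients clearly scale with the loss; what I'm unsure about is whether the actual parameter update applied by the optimizer scales the same way.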