
I am training a deep learning model, the loss function of which is of the form

$$ \mathcal{L} = \mathcal{L}_1 + \mathcal{L}_2 $$

where $\mathcal{L}_1$ and $\mathcal{L}_2$ are of very different orders of magnitude. Without loss of generality, assume $\mathcal{L}_1$ is much larger than $\mathcal{L}_2$.
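For concreteness, here is a minimal sketch of such a loss in PyTorch. This is my own toy construction, not the actual model in question: I assume a large-scale data term (mean-squared error on large-valued targets) plus a much smaller weight-penalty term.

```python
import torch

# Toy illustration only (assumed loss terms, not the asker's actual losses):
# L1 is a mean-squared error on raw, large-valued targets,
# L2 is a small weight penalty; with typical inputs L1 dwarfs L2.
def combined_loss(pred, target, params):
    L1 = torch.mean((pred - target) ** 2)               # large-scale term
    L2 = 1e-4 * sum(p.pow(2).sum() for p in params)     # small-scale term
    return L1 + L2, L1.detach(), L2.detach()

# Example usage with dummy data:
w = torch.randn(5, requires_grad=True)
x, y = torch.randn(16, 5), 100.0 * torch.randn(16)      # large-valued targets
total, l1, l2 = combined_loss(x @ w, y, [w])
print(l1, l2)   # l1 is typically several orders of magnitude above l2
```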

During the first several epochs of training, the model will mostly be minimizing $\mathcal{L}_1$, since it dominates the total loss. However, after a certain number of epochs, the value of $\mathcal{L}_1$ will converge.

My question is, what will happen now? Specifically, I have three questions:

  • Does the convergence of $\mathcal{L}_1$ imply the convergence of $\mathcal{L}$, meaning that training is effectively over and the loss behaved as if it were essentially $\mathcal{L} = \mathcal{L}_1$?

  • Since $\mathcal{L}_1$ has now converged, does that imply $\frac{\partial \mathcal{L}_1}{\partial \theta} \approx 0$, where $\theta$ denotes the model parameters?

  • If the above is true, then, since the model parameters are updated based on $\frac{\partial \mathcal{L}}{\partial \theta}$, does that imply the model will now start minimizing $\mathcal{L}_2$ (since $\frac{\partial \mathcal{L}}{\partial \theta} \approx \frac{\partial \mathcal{L}_2}{\partial \theta}$)?


1 Answer

  1. Not in general. Consider, for example, $\mathcal{L}_2 = 1/\mathcal{L}_1$: if $\mathcal{L}_1$ converges to $0$, then $\mathcal{L}_2$ (and hence $\mathcal{L}$) diverges.
  2. Yes; that is essentially what convergence means in this gradient-based setting, although the "$\approx 0$" is usually not defined precisely.
  3. Exactly. However, after one update driven mainly by $\frac{\partial \mathcal{L}_2}{\partial \theta}$, you will most likely find that $\frac{\partial \mathcal{L}_1}{\partial \theta}$ is no longer (near) zero, so the two terms keep interacting; the toy sketch below illustrates this.
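To illustrate points 2 and 3, here is a one-parameter toy sketch of my own construction (assumed shapes $\mathcal{L}_1 = 1000(\theta-1)^2$ and $\mathcal{L}_2 = 0.01\,\theta^2$, plain gradient descent): starting exactly at $\mathcal{L}_1$'s minimum, the first update is driven entirely by $\frac{\partial \mathcal{L}_2}{\partial \theta}$, and immediately afterwards $\frac{\partial \mathcal{L}_1}{\partial \theta}$ is nonzero again.

```python
import torch

# Toy construction (assumed shapes, not the asker's actual losses):
# L1(theta) = 1000 * (theta - 1)^2  -> large-scale term, minimum at theta = 1
# L2(theta) = 0.01 * theta^2        -> small-scale term, minimum at theta = 0
theta = torch.tensor(1.0, requires_grad=True)   # start exactly at L1's minimum
lr = 1e-3

for step in range(3):
    L1 = 1000.0 * (theta - 1.0) ** 2
    L2 = 0.01 * theta ** 2
    (L1 + L2).backward()
    with torch.no_grad():
        dL1 = 2000.0 * (theta - 1.0)    # analytic dL1/dtheta
        dL2 = 0.02 * theta              # analytic dL2/dtheta
        print(f"step {step}: dL/dtheta={theta.grad.item():+.4f}  "
              f"dL1/dtheta={dL1.item():+.4f}  dL2/dtheta={dL2.item():+.4f}")
        theta -= lr * theta.grad        # plain gradient-descent update
        theta.grad.zero_()
# Step 0: dL1/dtheta is exactly 0, so the update follows dL2/dtheta alone.
# Step 1 onwards: theta has moved off L1's minimum, so dL1/dtheta != 0 again.
```

In this toy setting, after a single step the re-emerging $\frac{\partial \mathcal{L}_1}{\partial \theta}$ is already larger in magnitude than $\frac{\partial \mathcal{L}_2}{\partial \theta}$, which is why the two terms keep trading off rather than training cleanly switching over to minimizing $\mathcal{L}_2$.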
