The problem is `return -MSE_loss + KLDiv_Loss`. You don't want to minimize `-MSE_loss`, because you can always make $-(x-c)^2$ smaller by choosing $x$ farther from $c$. If $c$ is your target, this means the optimizer is pushing your model's predictions away from the target.
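
To see this numerically, here is a minimal sketch (assuming PyTorch, since the snippet looks like a PyTorch loss; the tensors are made-up values for illustration):

```python
import torch
import torch.nn.functional as F

tgt  = torch.tensor([1.0, 2.0, 3.0])     # hypothetical target c
near = torch.tensor([1.1, 2.1, 3.1])     # prediction close to the target
far  = torch.tensor([10.0, 20.0, 30.0])  # prediction far from the target

# Negating the MSE rewards predictions that move away from the target:
print(-F.mse_loss(near, tgt))  # ≈ -0.01
print(-F.mse_loss(far, tgt))   # ≈ -378, "smaller", so the optimizer prefers it
```

An optimizer minimizing `-MSE_loss` will therefore keep pushing the predictions farther from the target.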

Use `return MSE_loss + KLDiv_Loss` instead. You can show that this is correct by starting from a Gaussian likelihood for your target `tgts` and working through the algebra to obtain the negative log-likelihood, of which the MSE is a rescaling (plus a constant).
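
A sketch of that algebra, assuming a Gaussian likelihood with fixed variance $\sigma^2$ for each target $t_i$ given the reconstruction $\hat t_i$:

$$
-\log \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(t_i-\hat t_i)^2}{2\sigma^2}\right)
= \frac{1}{2\sigma^2}\sum_{i=1}^n (t_i-\hat t_i)^2 + \frac{n}{2}\log\!\left(2\pi\sigma^2\right).
$$

The second term does not depend on the model's outputs, so minimizing the negative log-likelihood is equivalent to minimizing $\sum_i (t_i-\hat t_i)^2$, i.e. the MSE up to a positive rescaling. Flipping the sign, as in `-MSE_loss`, corresponds to maximizing that quantity instead.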
