The problem is `return -MSE_loss + KLDiv_Loss`. You don't want to minimize `-MSE_loss`, because you can always make $-(x-c)^2$ smaller by choosing $x$ farther from $c$. If $c$ is your target, this means the optimizer is pushing your model's predictions away from the target.
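
To see this numerically, here is a minimal sketch (assuming PyTorch, since the snippet looks like a PyTorch loss; the tensors are made-up values for illustration):

```python
import torch
import torch.nn.functional as F

tgt  = torch.tensor([1.0, 2.0, 3.0])     # hypothetical target c
near = torch.tensor([1.1, 2.1, 3.1])     # prediction close to the target
far  = torch.tensor([10.0, 20.0, 30.0])  # prediction far from the target

# Negating the MSE rewards predictions that move away from the target:
print(-F.mse_loss(near, tgt))  # ≈ -0.01
print(-F.mse_loss(far, tgt))   # ≈ -378, "smaller", so the optimizer prefers it
```

An optimizer minimizing `-MSE_loss` will therefore keep pushing the predictions farther from the target.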

Use `return MSE_loss + KLDiv_Loss` instead. You can show that this is correct by starting from a Gaussian likelihood for your target `tgts` and working through the algebra to obtain the negative log-likelihood, of which the MSE is a rescaling (plus a constant).
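
A sketch of that algebra, assuming a Gaussian likelihood with fixed variance $\sigma^2$ for each target $t_i$ given the reconstruction $\hat t_i$:

$$
-\log \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(t_i-\hat t_i)^2}{2\sigma^2}\right)
= \frac{1}{2\sigma^2}\sum_{i=1}^n (t_i-\hat t_i)^2 + \frac{n}{2}\log\!\left(2\pi\sigma^2\right).
$$

The second term does not depend on the model's outputs, so minimizing the negative log-likelihood is equivalent to minimizing $\sum_i (t_i-\hat t_i)^2$, i.e. the MSE up to a positive rescaling. Flipping the sign, as in `-MSE_loss`, corresponds to maximizing that quantity instead.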
