The problem is `return -MSE_loss + KLDiv_Loss`. You don't want to minimize `-MSE_loss`, because you can always make $-(x-c)^2$ smaller by moving $x$ farther from $c$. If $c$ is your target, minimizing this term actively pushes your model's predictions away from it.
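You can see the sign problem in a tiny sketch (the values here are made up for illustration): a single gradient-descent step on $-(x-c)^2$ moves $x$ away from the target $c$, not toward it.

```python
import torch

x = torch.tensor([0.5], requires_grad=True)  # current prediction
c = torch.tensor([1.0])                      # target

# Loss with the wrong sign: -(x - c)^2
neg_mse = -((x - c) ** 2).mean()
neg_mse.backward()

with torch.no_grad():
    x -= 0.1 * x.grad  # one gradient-descent step

# x started 0.5 away from the target; after the step it is
# 0.6 away -- descending on -MSE increases the error.
```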
Use `return MSE_loss + KLDiv_Loss` instead. You can show that this is correct by starting from a Gaussian likelihood for your targets `tgts` and working out the negative log-likelihood: up to an additive constant and a positive scale factor, it is exactly the MSE.
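Sketching that algebra (with a fixed observation noise $\sigma^2$, which is the usual assumption behind an MSE reconstruction term): if the likelihood of a target $t$ given the model's output $\hat t$ is Gaussian,

$$p(t \mid \hat t) = \mathcal{N}(t;\, \hat t,\, \sigma^2),$$

then the negative log-likelihood is

$$-\log p(t \mid \hat t) = \frac{1}{2\sigma^2}\,(t - \hat t)^2 + \frac{1}{2}\log\!\left(2\pi\sigma^2\right).$$

The second term is constant in $\hat t$, and $\tfrac{1}{2\sigma^2} > 0$, so minimizing the negative log-likelihood is equivalent to minimizing $(t - \hat t)^2$, i.e. the MSE up to a positive rescaling. Summed with the KL term, this gives the standard (negative) evidence lower bound, which is why the two losses are added, not subtracted.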