Questions tagged [variational-inference]
Questions about variational inference: approximating an intractable posterior distribution with a simpler one chosen by optimization, typically by maximizing the evidence lower bound (ELBO).
103 questions
1 vote
1 answer
56 views
Bayes-by-backprop - meaning of partial derivative
The Google DeepMind paper "Weight Uncertainty in Neural Networks" features the following algorithm: Note that the $\frac{\partial f(w,\theta)}{\partial w}$ term of the gradients for the mean and standard ...
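As context for the question, a sketch of where that term comes from, assuming the paper's reparameterization $w = \mu + \log(1+e^{\rho})\,\varepsilon$ with $\varepsilon \sim \mathcal{N}(0,1)$: $f(w,\theta)$ depends on $\theta=(\mu,\rho)$ both directly and through $w$, so the chain rule gives
$$
\Delta_\mu = \frac{\partial f(w,\theta)}{\partial w} + \frac{\partial f(w,\theta)}{\partial \mu},
\qquad
\Delta_\rho = \frac{\partial f(w,\theta)}{\partial w}\,\frac{\varepsilon}{1+e^{-\rho}} + \frac{\partial f(w,\theta)}{\partial \rho},
$$
since $\partial w/\partial\mu = 1$ and $\partial w/\partial\rho = \varepsilon/(1+e^{-\rho})$.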
3 votes
0 answers
60 views
Normalizing observations in a nonlinear state space model
I am modelling the sequence $\{(a_t,y_t)\}_t$ as follows: $$ \begin{cases} Y_{t+1} &= g_\nu(X_{t+1}) + \alpha V_{t+1}\\ X_{t+1} &= X_t + \mu_\xi(a_t) + \sigma_\psi(a_t)Z_{t+1}\\ X_0 &= ...
3 votes
1 answer
129 views
Bayesian Clustering with a Finite Gaussian Mixture Model with Missing Data
I would like to perform clustering with a finite Gaussian Mixture model; however, I have missing data (some features are missing at random). I am using Variational Inference to fit my Bayesian GMM. Is ...
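Not an answer, but a minimal sketch of the kind of workflow the question is about, assuming scikit-learn and naive mean imputation (the variable names and the imputation step are illustrative; a principled treatment would handle the missing entries inside the variational updates rather than imputing first):

```python
# Minimal sketch, NOT the full missing-data treatment: impute first, then fit a
# variational Bayesian GMM with scikit-learn.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
X[rng.random(X.shape) < 0.1] = np.nan          # features missing at random

X_filled = SimpleImputer(strategy="mean").fit_transform(X)
gmm = BayesianGaussianMixture(
    n_components=5, covariance_type="full",
    weight_concentration_prior_type="dirichlet_process",
    max_iter=500, random_state=0,
).fit(X_filled)                                # variational inference under the hood
labels = gmm.predict(X_filled)                 # cluster assignments
```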
0 votes
0 answers
28 views
Why Does the Posterior Estimation of Latent Variables in Binary PPCA Differ from the Ground Truth?
I’ve been working on implementing a binary variant of probabilistic PCA (PPCA) in Python (based on this paper), which uses variational EM for parameter estimation due to the non-conjugacy between the ...
0 votes
0 answers
128 views
Normalizing Flow with Highly Negative NLL Loss
I am following the Zuko "Train From Data" tutorial to train a Neural Spline Flow. My goal is to approximate a distribution over functions. Therefore, each of my function samples is actually ...
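One general point that may matter when interpreting a very negative NLL (a sketch with NumPy/SciPy, not specific to Zuko): for continuous distributions the density can exceed 1, so log-likelihoods can be large and positive and the NLL strongly negative, especially when the learned density is sharply peaked.

```python
# Why a continuous NLL can be very negative: densities can exceed 1.
import numpy as np
from scipy.stats import norm

sigma = 1e-3                       # a sharply peaked 1-D Gaussian
x = np.zeros(100)                  # samples near the mode
nll = -norm.logpdf(x, loc=0.0, scale=sigma).mean()
print(nll)                         # ~ -6.0, and it grows more negative as sigma shrinks
```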
1 vote
1 answer
85 views
Why is the target 𝑦 used as an input to the encoder in a semi-supervised VAE model?
As mentioned in the title, I understand the mathematical derivation of equations (6-7) in Kingma's original paper. \begin{equation} \log p_\theta(\mathbf{x}, y) \geq \mathbb{E}_{q_\phi(\mathbf{z} \mid ...
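For readers without the paper at hand, the labeled-data bound in question has (up to notation) the form below; note that $y$ enters the inference model $q_\phi(\mathbf{z}\mid\mathbf{x},y)$ because the bound is derived with an approximate posterior over $\mathbf{z}$ conditioned on everything observed, which for labeled data includes $y$:
$$
\log p_\theta(\mathbf{x}, y) \geq \mathbb{E}_{q_\phi(\mathbf{z} \mid \mathbf{x}, y)}\big[\log p_\theta(\mathbf{x} \mid y, \mathbf{z}) + \log p_\theta(y) + \log p(\mathbf{z}) - \log q_\phi(\mathbf{z} \mid \mathbf{x}, y)\big] = -\mathcal{L}(\mathbf{x}, y).
$$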
2 votes
1 answer
115 views
Posterior estimation using VAE
Using normalizing flows, we can model a model's posterior $p(\theta|D)$ by feeding Gaussian noise $z$ to the NF (parametrized by $\phi$), using the NF's output $\theta$ as model parameters, and ...
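A sketch of the objective usually paired with that construction (here $T_\phi$ is an assumed name for the flow map): the flow's pushforward defines a variational posterior $q_\phi(\theta)$ via the change of variables, and training maximizes the ELBO
$$
\mathrm{ELBO}(\phi) = \mathbb{E}_{q_\phi(\theta)}\big[\log p(D\mid\theta) + \log p(\theta) - \log q_\phi(\theta)\big],
\qquad
\log q_\phi(\theta) = \log p_Z(z) - \log\left|\det \frac{\partial T_\phi(z)}{\partial z}\right|,\quad \theta = T_\phi(z).
$$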
2 votes
2 answers
169 views
VAEs - Two questions regarding the posterior and prior distribution derivations
I'm struggling to understand the first step in the ELBO derivation in VAEs. When asking my questions, I'll also try to clearly state my assumptions, since perhaps some of them are wrong to begin with: ...
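For orientation, the identity usually invoked in that first step, written in the standard VAE notation (a sketch, not the poster's exact assumptions):
$$
\log p_\theta(x)
= \underbrace{\mathbb{E}_{q_\phi(z\mid x)}\!\left[\log \frac{p_\theta(x,z)}{q_\phi(z\mid x)}\right]}_{\text{ELBO}}
+ D_{\mathrm{KL}}\!\big(q_\phi(z\mid x)\,\|\,p_\theta(z\mid x)\big)
\;\geq\; \mathrm{ELBO},
$$
because the KL term is non-negative.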
1 vote
0 answers
92 views
How to speed up the following ELBO evaluation?
I have an estimation problem where I need to maximize the evidence lower bound: $$ \mathrm{ELBO} = -\frac{1}{2} \Bigg( \mathbb{E}_{q(\theta)} \left[ \mathrm{vec}(\mathbf{Z})^{\mathrm{H}} \mathbf{C}^{-...
1 vote
0 answers
71 views
Why do we need to marginalize when finding p(data) when latent variables are involved? (part of the ELBO derivation)
I'm so confused with the derivation of the ELBO. In part of the derivation, p(data) is intractable as it involves an integral over a high-dimensional latent variable. I can't understand why the latent ...
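For reference, the marginalization in question is just how a latent-variable model defines the likelihood of the observed data: the model specifies a joint $p_\theta(x,z) = p_\theta(x\mid z)\,p(z)$, so the likelihood of the data alone requires integrating the latent variable out,
$$
p_\theta(x) = \int p_\theta(x\mid z)\,p(z)\,dz,
$$
which is the high-dimensional integral that makes $p(\text{data})$ intractable.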
3 votes
1 answer
111 views
When deriving the ELBO to solve variational inference problems, why do we know p(z) and p(x,z) but not p(x) and p(z|x)?
I am a bit lost with the derivation of the ELBO because I don't understand why some distributions are known and some are unknown. I guess we know p(z) (the prior) because it was the last value of q(z) ...
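A short sketch of the asymmetry being asked about: $p(z)$ and $p(x\mid z)$ are chosen when the model is written down, so their product can be evaluated pointwise, while the other two quantities require an integral,
$$
p(x,z) = p(x\mid z)\,p(z),
\qquad
p(z\mid x) = \frac{p(x,z)}{p(x)},
\qquad
p(x) = \int p(x,z)\,dz,
$$
so $p(z\mid x)$ is unknown precisely because it needs the intractable $p(x)$ in its denominator (and the prior $p(z)$ is fixed by the modeller, not obtained from $q(z)$).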
1 vote
0 answers
161 views
ELBO & "backwards" KL divergence argument order
On Wikipedia it says: "A simple interpretation of the KL divergence of P from Q [i.e. D_KL(P||Q)] is the expected excess surprise from using Q as a model instead of P when the actual distribution ...
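For concreteness, the way the ELBO is usually related to the KL divergence, which fixes the argument order (a sketch in generic notation):
$$
\log p(x) - \mathrm{ELBO}(q) = D_{\mathrm{KL}}\big(q(z)\,\|\,p(z\mid x)\big) = \mathbb{E}_{q(z)}\!\left[\log\frac{q(z)}{p(z\mid x)}\right],
$$
so maximizing the ELBO minimizes the "reverse" KL, whose expectation is taken under $q$, the distribution we can actually sample from and evaluate.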
0 votes
1 answer
842 views
Exploring VAE latent space
I recently trained an AE and a VAE and used the latent variables of each for a clustering task. It seemed to work well, producing sensible clusters. The main reason for training the VAE was to gain more ...
2 votes
1 answer
224 views
Why is sampling from the posterior a good estimate for the likelihood, but sampling from the prior bad?
In Variational Autoencoders (VAE), we have: $$ \log p_\theta(x) = \log \left[ \int p_\theta(x \mid z)p(z) \, dz \right] $$ where $ p_\theta(x \mid z) = \mathcal{N}(x; \mu_\theta(z), I) $ and $ p(z) = \...
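A sketch of the importance-sampling identity that this question usually reduces to (in the notation of the excerpt, with $q_\phi(z\mid x)$ the encoder):
$$
p_\theta(x) = \mathbb{E}_{p(z)}\big[p_\theta(x\mid z)\big]
= \mathbb{E}_{q_\phi(z\mid x)}\!\left[\frac{p_\theta(x\mid z)\,p(z)}{q_\phi(z\mid x)}\right];
$$
both Monte Carlo estimators are unbiased, but samples from the prior rarely fall where $p_\theta(x\mid z)$ is non-negligible, so the prior-based estimator has enormous variance, whereas a $q$ close to the posterior concentrates samples where the integrand is large.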
2 votes
1 answer
150 views
Why is the forward process referred to as the "ground truth" in diffusion models?
I've seen many tutorials on diffusion models refer to the distribution of the latent variables induced by the forward process as "ground truth". I wonder why. What we can actually see is ...