Timeline for Semi-supervised classification objective from Kingma et al.
Current License: CC BY-SA 4.0
5 events
| when | what | by | license | comment |
|---|---|---|---|---|
| Oct 27, 2021 at 15:12 | comment added | shimao | | Well, if you're lucky and there are only a small number of classes $y$, this should only be a small amount of extra computation. Alternatively, it can be lower bounded by $E_y[\log p(x \mid y,z)]$, and you could just use a one-sample estimate for this. |
| Oct 27, 2021 at 15:00 | comment added | Ruben van Bergen | | Actually one thing is still not clear to me: how do you adapt this to the case where $y$ is sometimes unobserved, e.g. in semi-supervised training (which is what they do in the paper)? Because you now get $q(\boldsymbol{\pi})$ out of your inference network - not $q(y)$. So you now need to evaluate $\log p(\mathbf{x} \mid \boldsymbol{\pi},\mathbf{z}) = \log\int p(\mathbf{x} \mid y,\mathbf{z})\, p(y \mid \boldsymbol{\pi})\, dy$, which seems intractable. |
| Oct 26, 2021 at 12:40 | bounty awarded | Ruben van Bergen | | |
| Oct 26, 2021 at 9:04 | comment added | Ruben van Bergen | | Thanks! This was essentially my thinking too, but I was thrown off by the wording, which suggests that the precise objective of their eq. 9 may be derived this way. From that perspective, I also couldn't explain the expectation under $\tilde{p}_l$, rather than under a belief $q$, but I think that falls out of having $q(\boldsymbol{\pi})$ be a point mass as you suggested. The expectation under that $q$ then disappears, and the other expectation is just because we sum over items in the dataset (and not directly related to the ELBO). |
| Oct 25, 2021 at 2:13 | answered | shimao | CC BY-SA 4.0 | |
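As a rough illustration of the two options shimao's comment mentions, here is a minimal sketch (assumed shapes and stand-in values only, not code from the paper or from this thread): enumerating the small number of classes $y$ to evaluate $\log p(\mathbf{x}\mid\boldsymbol{\pi},\mathbf{z})$ exactly, versus a one-sample estimate of the Jensen lower bound $E_y[\log p(x\mid y,z)]$.

```python
import torch

# Hypothetical sketch: per-class reconstruction log-likelihoods log p(x|y,z)
# and class probabilities p(y|pi) are stand-in random values here.
K = 10                                    # assumed small number of classes y
log_px_given_yz = torch.randn(K)          # stand-in for log p(x|y,z), one value per class
log_py_given_pi = torch.log_softmax(torch.randn(K), dim=0)  # stand-in for log p(y|pi)

# (a) exact marginal by enumerating classes:
#     log p(x|pi,z) = log sum_y p(x|y,z) p(y|pi)
log_px_exact = torch.logsumexp(log_px_given_yz + log_py_given_pi, dim=0)

# (b) one-sample estimate of the lower bound E_y[log p(x|y,z)] <= log p(x|pi,z):
#     sample y ~ p(y|pi) and use log p(x|y,z) for that single class.
y = torch.distributions.Categorical(logits=log_py_given_pi).sample()
log_px_lower_bound = log_px_given_yz[y]

print(log_px_exact.item(), log_px_lower_bound.item())
```

Option (a) costs one decoder evaluation per class, which is cheap when $K$ is small; option (b) needs only one evaluation but is an estimate of a quantity that lower-bounds the exact marginal by Jensen's inequality.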