5 events
Oct 27, 2021 at 15:12 comment added shimao Well, if you're lucky and there are only a small number of classes $y$, this should only be a small amount of extra computation. Alternatively, it can be lower-bounded by $\mathbb{E}_{y\sim p(y\mid\boldsymbol{\pi})}[\log p(\mathbf{x}\mid y,\mathbf{z})]$, and you could just use a one-sample estimate of that expectation.
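A minimal sketch of the two options shimao describes: exact marginalization over a small label set versus a one-sample estimate of the Jensen lower bound. Here `log_p_x_given_yz` is a hypothetical stand-in for the decoder term $\log p(\mathbf{x}\mid y,\mathbf{z})$, not code from the thread:

```python
import numpy as np

rng = np.random.default_rng(0)

n_classes = 5
log_pi = np.log(np.full(n_classes, 1.0 / n_classes))  # log p(y | pi), uniform toy prior

def log_p_x_given_yz(y):
    # Hypothetical decoder log-likelihood log p(x | y, z); toy values for illustration.
    return -((y - 2.0) ** 2)

# Exact: log p(x | pi, z) = log sum_y p(y | pi) p(x | y, z),
# i.e. one decoder evaluation per class -- cheap when n_classes is small.
log_terms = np.array([log_pi[y] + log_p_x_given_yz(y) for y in range(n_classes)])
exact = np.logaddexp.reduce(log_terms)

# Cheaper: draw y ~ p(y | pi) once; this is an unbiased one-sample estimate
# of the Jensen lower bound E_y[log p(x | y, z)].
y_sample = rng.choice(n_classes, p=np.exp(log_pi))
one_sample = log_p_x_given_yz(y_sample)

print(f"exact log-likelihood: {exact:.4f}")
print(f"one-sample bound estimate: {one_sample:.4f}")
```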
Oct 27, 2021 at 15:00 comment added Ruben van Bergen Actually one thing is still not clear to me: how do you adapt this to the case where $y$ is sometimes unobserved, e.g. in semi-supervised training (which is what they do in the paper)? Because you now get $q(\boldsymbol{\pi})$ out of your inference network, not $q(y)$. So you now need to evaluate $\log p(\mathbf{x}\mid\boldsymbol{\pi},\mathbf{z}) = \log \sum_y p(\mathbf{x}\mid y,\mathbf{z})\, p(y\mid\boldsymbol{\pi})$, which seems intractable.
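For reference, the Jensen step behind the lower bound in the comment above, written in the same notation (a standard inequality, not quoted from the thread):

$$\log p(\mathbf{x}\mid\boldsymbol{\pi},\mathbf{z}) = \log \sum_{y} p(y\mid\boldsymbol{\pi})\, p(\mathbf{x}\mid y,\mathbf{z}) \;\ge\; \sum_{y} p(y\mid\boldsymbol{\pi}) \log p(\mathbf{x}\mid y,\mathbf{z}) = \mathbb{E}_{y\sim p(y\mid\boldsymbol{\pi})}\!\left[\log p(\mathbf{x}\mid y,\mathbf{z})\right],$$

by concavity of $\log$; and when the number of classes is small, the sum on the left is also cheap to evaluate exactly.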
Oct 26, 2021 at 12:40 history bounty awarded Ruben van Bergen
Oct 26, 2021 at 9:04 comment added Ruben van Bergen Thanks! This was essentially my thinking too, but I was thrown off by the wording, which suggests that the precise objective of their eq. 9 may be derived this way. From that perspective, I also couldn't explain the expectation being taken under $\tilde{p}_l$ rather than under a belief $q$, but I think that falls out of taking $q(\pi)$ to be a point mass, as you suggested. The expectation under that $q$ then disappears, and the other expectation arises simply because we sum over items in the dataset (it is not directly related to the ELBO).
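The point-mass collapse alluded to here, spelled out as a generic identity (my notation, not the paper's): if $q(\boldsymbol{\pi}) = \delta(\boldsymbol{\pi} - \hat{\boldsymbol{\pi}})$, then for any integrand $f$,

$$\mathbb{E}_{q(\boldsymbol{\pi})}\big[f(\boldsymbol{\pi})\big] = \int \delta(\boldsymbol{\pi} - \hat{\boldsymbol{\pi}})\, f(\boldsymbol{\pi})\, d\boldsymbol{\pi} = f(\hat{\boldsymbol{\pi}}),$$

so the expectation over $\boldsymbol{\pi}$ drops out and only the fixed value $\hat{\boldsymbol{\pi}}$ remains in the objective.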
Oct 25, 2021 at 2:13 history answered shimao CC BY-SA 4.0