Timeline for Semi-supervised classification objective from Kingma et al.
Current License: CC BY-SA 4.0
5 events
| when | what | by | license | comment |
|---|---|---|---|---|
| Oct 27, 2021 at 15:12 | comment added | shimao | | Well, if you're lucky and there are only a small number of classes $y$, this should only be a small amount of extra computation. Alternatively, it can be lower bounded by $E_y[\log p(x \mid y,z)]$, and you could just use a one-sample estimate for this. |
| Oct 27, 2021 at 15:00 | comment added | Ruben van Bergen | | Actually one thing is still not clear to me: how do you adapt this to the case where $y$ is sometimes unobserved, e.g. in semi-supervised training (which is what they do in the paper)? Because you now get $q(\boldsymbol{\pi})$ out of your inference network - not $q(y)$. So you now need to evaluate $\log p(\mathbf{x} \mid \boldsymbol{\pi},\mathbf{z}) = \log\int p(\mathbf{x} \mid y,\mathbf{z})\, p(y \mid \boldsymbol{\pi})\, dy$, which seems intractable. |
| Oct 26, 2021 at 12:40 | bounty awarded | Ruben van Bergen | | |
| Oct 26, 2021 at 9:04 | comment added | Ruben van Bergen | | Thanks! This was essentially my thinking too, but I was thrown off by the wording, which suggests that the precise objective of their eq. 9 may be derived this way. From that perspective, I also couldn't explain the expectation under $\tilde{p}_l$, rather than under a belief $q$, but I think that falls out of having $q(\boldsymbol{\pi})$ be a point mass as you suggested. The expectation under that $q$ then disappears, and the other expectation is just because we sum over items in the dataset (and not directly related to the ELBO). |
| Oct 25, 2021 at 2:13 | answered | shimao | CC BY-SA 4.0 | |
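As a rough illustration of the two options shimao's comment mentions, here is a minimal sketch (assumed shapes and stand-in values only, not code from the paper or from this thread): enumerating the small number of classes $y$ to evaluate $\log p(\mathbf{x}\mid\boldsymbol{\pi},\mathbf{z})$ exactly, versus a one-sample estimate of the Jensen lower bound $E_y[\log p(x\mid y,z)]$.

```python
import torch

# Hypothetical sketch: per-class reconstruction log-likelihoods log p(x|y,z)
# and class probabilities p(y|pi) are stand-in random values here.
K = 10                                    # assumed small number of classes y
log_px_given_yz = torch.randn(K)          # stand-in for log p(x|y,z), one value per class
log_py_given_pi = torch.log_softmax(torch.randn(K), dim=0)  # stand-in for log p(y|pi)

# (a) exact marginal by enumerating classes:
#     log p(x|pi,z) = log sum_y p(x|y,z) p(y|pi)
log_px_exact = torch.logsumexp(log_px_given_yz + log_py_given_pi, dim=0)

# (b) one-sample estimate of the lower bound E_y[log p(x|y,z)] <= log p(x|pi,z):
#     sample y ~ p(y|pi) and use log p(x|y,z) for that single class.
y = torch.distributions.Categorical(logits=log_py_given_pi).sample()
log_px_lower_bound = log_px_given_yz[y]

print(log_px_exact.item(), log_px_lower_bound.item())
```

Option (a) costs one decoder evaluation per class, which is cheap when $K$ is small; option (b) needs only one evaluation but is an estimate of a quantity that lower-bounds the exact marginal by Jensen's inequality.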