$\begingroup$

I wanted to derive the Gumbel softmax trick from scratch, but I am stuck on the last step of this webpage, which says the integral has a closed form that yields the solution.

To recap what is on that page...

The Gumbel distribution with location $\mu$ has the PDF $f(z) = e^{-(z - \mu + e^{-(z - \mu)})}$. It is easy to show that this is a proper density, i.e. that it integrates to one. Taking $\mu = 0$, the antiderivative in the third line is also the CDF, $F(z) = e^{-e^{-z}}$:

$$ \begin{aligned} \int_{-\infty}^\infty f(z)\,dz &= \int_{-\infty}^\infty e^{-(z + e^{-z})}\,dz \\ &= \int_{-\infty}^\infty e^{-z} e^{-e^{-z}}\,dz \\ &= \left[ e^{-e^{-z}} \right]^\infty_{-\infty} \\ &= e^{-e^{-\infty}} - e^{-e^{\infty}} \\ &= 1 - 0 = 1 \end{aligned} $$

Therefore, if we sample Gumbel noise for each logit, logit $k$ with location $x_k$ produces some outcome $z_k$. The probability that $z_k$ is the largest, given $z_k$ and all the locations $x_{k^\prime}$, is the product below. Each factor is the Gumbel CDF $e^{-e^{-(z - \mu)}}$ with $\mu = x_{k^\prime}$ evaluated at $z = z_k$, i.e. the probability that the outcome for $k^\prime$ is less than $z_k$.

$$ p(z_k \text{ is the largest } | z_k, \{x_{k^\prime}\}_{k^\prime = 1}^K) = \prod_{k^\prime \neq k} e^{-e^{-(z_k - x_{k^\prime})}} $$

We now have to do some integrating to get the final probability that k is the largest given the logits,

$$ \begin{aligned} p(z_k \text{ is the largest } | \{x_{k^\prime}\}_{k^\prime = 1}^K) &= \int p(z_k \text{ is the largest } | z_k, \{x_{k^\prime}\}_{k^\prime = 1}^K) \;\; p(z_k | x_k)\;\; dz_k \\ &= \int e^{-(z_k - x_k + e^{-(z_k - x_k)})} \prod_{k^\prime \neq k} e^{-e^{-(z_k - x_{k^\prime})}} dz_k \\ &= \int e^{-(z_k - x_k + e^{-(z_k - x_k)})} e^{-\sum_{k^\prime \neq k} e^{-(z_k - x_{k^\prime})}} dz_k\\ &= \int e^{(-z_k + x_k - e^{-z_k + x_k})-\sum_{k^\prime \neq k} e^{(-z_k + x_{k^\prime})}} dz_k \\ &= \int e^{(-z_k + x_k) -e^{-z_k} \sum_{k^\prime} e^{x_{k^\prime}}} dz_k \\ &= \dots \\ &= \frac{e^{x_k}}{\sum_{k^\prime} e^{x_{k^\prime}}} \end{aligned} $$
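
As a sanity check on the claimed result (not a derivation), here is a minimal simulation sketch. It assumes standard Gumbel noise drawn by inverting the CDF $e^{-e^{-z}}$, and the logit values and function names are purely illustrative. If the final line of the derivation is right, the empirical frequency with which each index attains the maximum should be close to its softmax probability.

```python
import math
import random


def softmax(logits):
    # Subtract the max for numerical stability; the result is unchanged.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_gumbel(rng):
    # Inverse-CDF sampling: F(z) = exp(-exp(-z))  =>  z = -log(-log(u)).
    u = rng.random()
    while u == 0.0:  # random() can return 0.0, which would break log(u)
        u = rng.random()
    return -math.log(-math.log(u))


def gumbel_argmax_frequencies(logits, n_samples, seed=0):
    # Add independent Gumbel noise to each logit and record which index wins.
    rng = random.Random(seed)
    counts = [0] * len(logits)
    for _ in range(n_samples):
        noisy = [x + sample_gumbel(rng) for x in logits]
        counts[noisy.index(max(noisy))] += 1
    return [c / n_samples for c in counts]


logits = [1.0, 0.0, -0.5]  # illustrative values, not from the original page
freqs = gumbel_argmax_frequencies(logits, 100_000)
probs = softmax(logits)
for k, (f, p) in enumerate(zip(freqs, probs)):
    print(f"k={k}: empirical {f:.4f}, softmax {p:.4f}")
```

The two columns should agree up to Monte Carlo noise, which supports the closed form even before the integral is worked out.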

The problem is that every source I can find online skips to the end and says the integral has a closed form without showing how to get there. I have tried to perform the integral above and failed. Is there any resource or integration trick which can show why this is true?

$\endgroup$

1 Answer

$\begingroup$

The integral you are trying to evaluate is with respect to $z_k$ and the limits run from $z_k=-\infty$ to $z_k=\infty$. Factoring out the constant $e^{x_k}$ and abbreviating $c:=\sum_{k'}e^{x_{k'}}$, the integral can be written $$ e^{x_k}\int_{z_k=-\infty}^\infty e^{-z_k} \exp\left(-ce^{-z_k}\right)dz_k. $$ Make a change of variables: $u:=e^{-z_k}$, $du=-e^{-z_k}dz_k$. Then the integral becomes $$ e^{x_k}\int_{u=\infty}^0 e^{-cu}(-du)=e^{x_k}\int_{u=0}^{\infty} e^{-cu}du= e^{x_k}\frac1c=\frac{e^{x_k}}{\sum_{k'}e^{x_{k'}}}. $$
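
The substitution above can also be checked numerically. The following is a rough sketch, assuming a plain trapezoidal rule on a truncated interval (the bounds, step count, logit values, and function name are illustrative choices, not part of the derivation). It evaluates $e^{x_k}\int e^{-z_k}\exp(-ce^{-z_k})\,dz_k$ directly and compares it with $e^{x_k}/c$.

```python
import math


def gumbel_max_integral(logits, k, lo=-20.0, hi=40.0, n=60_000):
    # c is the sum of exp(x_k') over all logits, as in the answer above.
    c = sum(math.exp(x) for x in logits)
    h = (hi - lo) / n

    def integrand(z):
        # exp(x_k) * exp(-z) * exp(-c * exp(-z)), written as one exponent.
        return math.exp(logits[k] - z - c * math.exp(-z))

    # Trapezoidal rule; the integrand is negligible outside [lo, hi].
    total = 0.5 * (integrand(lo) + integrand(hi))
    for i in range(1, n):
        total += integrand(lo + i * h)
    return total * h


logits = [1.0, 0.0, -0.5]  # illustrative values
c = sum(math.exp(x) for x in logits)
numeric = [gumbel_max_integral(logits, k) for k in range(len(logits))]
closed_form = [math.exp(x) / c for x in logits]
for k, (a, b) in enumerate(zip(numeric, closed_form)):
    print(f"k={k}: numeric {a:.8f}, closed form {b:.8f}")
```

The numeric and closed-form values should match to many decimal places, since the integrand decays very quickly on both sides of its peak near $z_k = \log c$.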

$\endgroup$
  • $\begingroup$ Thanks. It always looks too easy when someone else does it, but it's hard to do alone. $\endgroup$ Commented Jan 31, 2023 at 7:12
