Mahmoud

I'm quoting the relevant part in the left column of page 4 (above equation (8)) in the paper you mentioned to give context:

If we let $w_{l−1}$ have a symmetric distribution around zero and $b_{l−1} = 0$, then $y_{l−1}$ has zero mean and has a symmetric distribution around zero. This leads to $E[x_l^2] = \frac{1}{2}\text{Var}[y_{l-1}]$ when $f$ is ReLU.

We are also given that $x_l = \max(0,y_{l-1})$. By the law of total expectation, \begin{align} E[x_l^2] &= p(y_{l-1} \geq 0)\,E[x_l^2 \mid y_{l-1} \geq 0] + p(y_{l-1} < 0)\, E[x_l^2 \mid y_{l-1} < 0]. \end{align} However, when $y_{l-1} < 0$, $x_l = \max(0,y_{l-1}) = 0$, and so $E[x_l^2 \mid y_{l-1} < 0] = 0$. Therefore, \begin{align} E[x_l^2] &= p(y_{l-1} \geq 0)\,E[x_l^2 \mid y_{l-1} \geq 0]. \end{align} Because $y_{l-1}$ has a (continuous) distribution that is symmetric around zero, $p(y_{l-1} \geq 0) = \frac{1}{2}$. So, \begin{align} E[x_l^2] &= \frac{1}{2}E[x_l^2 \mid y_{l-1} \geq 0]. \end{align} Moreover, we have that \begin{align} E[x_l^2 \mid y_{l-1} \geq 0] &= E[(\max(0,y_{l-1}))^2 \mid y_{l-1} \geq 0] \\ &= E[y_{l-1}^2 \mid y_{l-1} \geq 0]. \end{align} The symmetry of $y_{l-1}$ around zero also implies that $y_{l-1}^2$ is independent of the sign of $y_{l-1}$: the conditional distribution of $y_{l-1}^2$ given $y_{l-1} \geq 0$ is the same as its conditional distribution given $y_{l-1} < 0$, so both must equal the unconditional distribution of $y_{l-1}^2$. Therefore, \begin{align} E[x_l^2 \mid y_{l-1} \geq 0] &= E[y_{l-1}^2 \mid y_{l-1} \geq 0] \\ &= E[y_{l-1}^2]. \end{align} Finally, because $y_{l-1}$ has zero mean, $E[y_{l-1}^2] = \text{Var}(y_{l-1})$, and so $E[x_l^2] = \frac{1}{2}\text{Var}[y_{l-1}]$.
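As a quick sanity check of the identity $E[x_l^2] = \frac{1}{2}\text{Var}[y_{l-1}]$, here is a small Monte Carlo sketch (not from the paper; it assumes $y_{l-1}$ is standard normal, which is one distribution that is zero-mean and symmetric around zero):

```python
import numpy as np

rng = np.random.default_rng(0)

# y plays the role of y_{l-1}: zero-mean and symmetric around zero.
y = rng.standard_normal(1_000_000)

# x plays the role of x_l = max(0, y_{l-1}), i.e. ReLU applied to y.
x = np.maximum(0.0, y)

lhs = np.mean(x**2)    # estimate of E[x_l^2]
rhs = 0.5 * np.var(y)  # estimate of (1/2) Var[y_{l-1}]

print(lhs, rhs)  # the two estimates should nearly agree
```

Any other symmetric zero-mean choice for `y` (e.g. uniform on $[-1, 1]$) gives the same agreement, since the derivation only uses symmetry and the zero mean, not normality.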
