There are two equations here for computing the statistical significance of the correlation coefficient.  The first is the large-sample variance of the sample correlation coefficient $r$ of two bivariate normal random variables with true correlation $\rho$:\begin{equation}\text{var}\left(r\right)=\frac{\left(1-\rho^2\right)^2}{n},\end{equation} and the second is a t-statistic associated with the hypothesis that in the linear regression of $Y$ on $X$, the main effect of $X$ is zero:\begin{equation}t=r\sqrt{\frac{n-2}{1-r^2}}.\end{equation}Whence the standard error of $r$ mentioned by the OP: $\text{se}\left(r\right)=\sqrt{\frac{1-r^2}{n-2}}$.
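As a numerical aside (my addition, not part of the original answer), here is a minimal Python sketch, using an arbitrary simulated sample and seed, that computes $r$, $\text{se}\left(r\right)$, and the t-statistic above, and checks the resulting two-sided p-value against `scipy.stats.pearsonr`:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)          # arbitrary correlated sample

r = np.corrcoef(x, y)[0, 1]
se_r = np.sqrt((1 - r**2) / (n - 2))      # se(r) = sqrt((1 - r^2)/(n - 2))
t = r / se_r                              # t = r * sqrt((n - 2)/(1 - r^2))
p = 2 * stats.t.sf(abs(t), df=n - 2)      # two-sided p-value on n - 2 df

print(f"r = {r:.4f}, se(r) = {se_r:.4f}, t = {t:.3f}, p = {p:.4g}")
print(f"scipy.stats.pearsonr p-value: {stats.pearsonr(x, y)[1]:.4g}")
```

The two p-values agree because `pearsonr` uses exactly this t-statistic on $n-2$ degrees of freedom.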

Both of these expressions can be derived from the principle of maximum likelihood.  That is, if we assume a parameter $\theta$ should be normally distributed,\begin{equation}\mathcal{L}\left(\theta\right)\sim\exp{\left(-\frac{\theta^2}{2\sigma^2_{\theta}}\right)},\end{equation}then the standard error of the parameter can be estimated from the curvature of the log-likelihood function $\ell=\log{\mathcal{L}}$ via\begin{equation}\sigma^2_{\theta}=\frac{-1}{\frac{\partial^2\ell}{\partial\theta^2}\bigr|_{\theta=\hat{\theta}}},\end{equation}where $\hat{\theta}$ is the maximum-likelihood estimate of $\theta$, obtained from the condition\begin{equation}\frac{\partial\ell}{\partial\theta}\bigr|_{\theta=\hat{\theta}}=0.\end{equation}
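As a sanity check on this curvature recipe (a sketch of my own, not from the answer), one can estimate the standard error of the mean of a normal sample from the numerical second derivative of its log-likelihood at the MLE and compare with the exact $\sigma/\sqrt{n}$; the sample, known $\sigma$, and step size are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 2.0
data = rng.normal(loc=3.0, scale=sigma, size=200)

def loglik(mu):
    # log-likelihood of a normal sample with known sigma, up to a constant
    return -np.sum((data - mu) ** 2) / (2 * sigma**2)

mu_hat = data.mean()                  # MLE of the mean
h = 1e-4                              # finite-difference step
curv = (loglik(mu_hat + h) - 2 * loglik(mu_hat) + loglik(mu_hat - h)) / h**2
se_curv = np.sqrt(-1.0 / curv)        # sigma_theta from the curvature formula

print(f"curvature-based se = {se_curv:.4f}")
print(f"sigma / sqrt(n)    = {sigma / np.sqrt(len(data)):.4f}")
```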

Now, Pearson derived the first expression \begin{equation}\text{var}\left(r\right)=\frac{\left(1-\rho^2\right)^2}{n\left(1+\rho^2\right)}\end{equation} in VII. Mathematical contributions to the theory of evolution.-III. Regression, heredity, and panmixia and in https://royalsocietypublishing.org/doi/10.1098/rspl.1897.0091 by expanding the joint distribution of $n$ pairs of bivariate normal variables about the true value of $\rho$.  We can summarize his method here.  If we let $f$ be the bivariate normal density of two zero-mean random variables, i.e.,\begin{equation}f\left(X,Y\right)=\frac{1}{2\pi\sqrt{1-\rho^2}\sigma_X\sigma_Y}\exp{\left(-\frac{X^2}{2\sigma_X^2\left(1-\rho^2\right)}+\frac{\rho XY}{\sigma_X\sigma_Y\left(1-\rho^2\right)}-\frac{Y^2}{2\sigma_Y^2\left(1-\rho^2\right)}\right)},\end{equation}then we can get the variance $\sigma^2_{\rho}$ of the correlation coefficient by evaluating $\frac{-\partial^2\log{f}}{\partial \rho^2}\bigr|_{\rho=\hat{\rho}}$.  The first derivative of $\log{f}$ is\begin{align}\frac{\partial\log{f}}{\partial \rho}&=\frac{\rho}{1-\rho^2}+\frac{2\rho}{1-\rho^2}\cdot\frac{-X^2}{2\left(1-\rho^2\right)\sigma_X^2}+\left(\frac{1}{1-\rho^2}+\frac{2\rho^2}{\left(1-\rho^2\right)^2}\right)\frac{XY}{\sigma_X\sigma_Y}+\frac{2\rho}{1-\rho^2}\cdot\frac{-Y^2}{2\left(1-\rho^2\right)\sigma_Y^2}\nonumber\\&=\frac{\rho}{1-\rho^2}+\left(\frac{2\rho}{1-\rho^2}\right)\left(\log{f}-\log{\frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}}\right)+\frac{1}{1-\rho^2}\frac{XY}{\sigma_X\sigma_Y},\end{align}where at the maximum-likelihood solution $\hat{\rho}=\frac{\mathbb{E}\left(XY\right)}{\sigma_X\sigma_Y}$ the middle term becomes\begin{equation}\left[\log{f}-\log{\frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}}\right]\bigr|_{\rho=\hat{\rho}}=-\frac{\mathbb{E}\left(X^2\right)}{2\sigma_X^2\left(1-\rho^2\right)}+\frac{\rho\mathbb{E}\left(XY\right)}{\sigma_X\sigma_Y\left(1-\rho^2\right)}-\frac{\mathbb{E}\left(Y^2\right)}{2\sigma_Y^2\left(1-\rho^2\right)}=-1.\end{equation}Whence upon taking the second derivative and evaluating at $\rho=\hat{\rho}$, we get\begin{align}\frac{\partial^2\log{f}}{\partial\rho^2}\bigr|_{\rho=\hat{\rho}}&=\left(\frac{1}{1-\rho^2}+\frac{2\rho^2}{\left(1-\rho^2\right)^2}\right)+\frac{2\rho}{\left(1-\rho^2\right)^2}\frac{\mathbb{E}\left(XY\right)}{\sigma_X\sigma_Y}+\left(\frac{2}{1-\rho^2}+\frac{4\rho^2}{\left(1-\rho^2\right)^2}\right)\left[\log{f}-\log{\frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}}\right]\bigr|_{\rho=\hat{\rho}}+\frac{2\rho}{1-\rho^2}\frac{\partial}{\partial\rho}\left[\log{f}-\log{\frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}}\right]\bigr|_{\rho=\hat{\rho}}\nonumber\\&=\frac{1+\rho^2}{\left(1-\rho^2\right)^2}+\frac{2\rho^2}{\left(1-\rho^2\right)^2}-2\left(\frac{1+\rho^2}{\left(1-\rho^2\right)^2}\right)-\frac{2\rho^2}{\left(1-\rho^2\right)^2},\end{align}so that\begin{equation}\sigma^2_{\rho}=\frac{-1}{\frac{\partial^2\log{f}}{\partial\rho^2}\bigr|_{\rho=\hat{\rho}}}=\frac{\left(1-\rho^2\right)^2}{1+\rho^2}.\end{equation}Then, because the log-likelihood of $n$ independent pairs is $n$ times that of a single pair, the sampling variance of $\hat{\rho}$ is $\frac{\left(1-\rho^2\right)^2}{n\left(1+\rho^2\right)}$.
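A quick Monte Carlo check is easy to add here (my own illustration; $\rho$, $n$, the number of replicates, and the seed are arbitrary): it draws repeated bivariate normal samples and lets the reader compare the empirical sampling variance of $r$ with the two expressions quoted above.

```python
import numpy as np

rng = np.random.default_rng(2)
rho, n, reps = 0.6, 200, 10000
cov = [[1.0, rho], [rho, 1.0]]

# draw `reps` bivariate normal samples of size n and compute r for each
samples = rng.multivariate_normal([0.0, 0.0], cov, size=(reps, n))
x, y = samples[..., 0], samples[..., 1]
xm = x - x.mean(axis=1, keepdims=True)
ym = y - y.mean(axis=1, keepdims=True)
rs = (xm * ym).sum(axis=1) / np.sqrt((xm**2).sum(axis=1) * (ym**2).sum(axis=1))

print(f"empirical var(r):            {rs.var():.6f}")
print(f"(1-rho^2)^2 / n:             {(1 - rho**2)**2 / n:.6f}")
print(f"(1-rho^2)^2 / (n(1+rho^2)):  {(1 - rho**2)**2 / (n * (1 + rho**2)):.6f}")
```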

For the second form of the statistic, let's drop the assumption of bivariate normality and consider the regression of $Y$ on $X$ with normally distributed error $\varepsilon$: if\begin{equation}Y=\alpha+\beta X+\varepsilon\end{equation}and\begin{equation}\sigma^2_Y=\beta^2\sigma^2_X+\sigma^2,\end{equation} then according to the relationship $\beta=\rho\frac{\sigma_Y}{\sigma_X}$, it must be the case that the error variance is $\sigma^2=\left(1-\rho^2\right)\sigma^2_Y$.  Then the likelihood of the sample is\begin{equation}\mathcal{L}\left(\varepsilon\right)\sim\prod_i\exp{\left(-\frac{\left(Y_i-\alpha-\beta X_i\right)^2}{2\sigma^2}\right)},\end{equation}so that the log-likelihood is\begin{equation}\ell=-\sum_i\frac{\left(Y_i-\alpha-\beta X_i\right)^2}{2\left(1-\rho^2\right)\sigma_Y^2}.\end{equation}The first two derivatives are\begin{equation}\frac{\partial\ell}{\partial \beta}=\sum_i\frac{\left(Y_i-\overline{Y}-\beta\left(X_i-\overline{X}\right)\right)\left(X_i-\overline{X}\right)}{\left(1-\rho^2\right)\sigma_Y^2}\end{equation}and\begin{equation}\frac{\partial^2\ell}{\partial\beta^2}=-\sum_i\frac{\left(X_i-\overline{X}\right)^2}{\left(1-\rho^2\right)\sigma_Y^2}.\end{equation}Now, making the substitution $\beta=\rho\frac{\sigma_Y}{\sigma_X}$ and evaluating at $\rho=\hat{\rho}=r$ gives\begin{equation}\frac{-1}{\frac{\partial^2\ell}{\partial\beta^2}\bigr|_{\beta=\hat{\beta}}}=\frac{-1}{\frac{\partial^2\ell}{\partial\rho^2}\bigr|_{\rho=\hat{\rho}}}\frac{\sigma^2_Y}{\sigma^2_X}=\frac{\sigma^2_Y\left(1-r^2\right)}{\sigma^2_X\left(n-2\right)},\end{equation}whence the sampling variance of the measured correlation coefficient $\hat{\rho}=r$ is\begin{equation}\sigma^2_r=\frac{1-r^2}{n-2},\end{equation}in which we lose two degrees of freedom from the estimation of the two parameters $\alpha$ and $\beta$.  Finally, we can form a t-statistic to test the hypothesis that $r=0$ using\begin{equation}t=\frac{r}{\sigma_r}=r\sqrt{\frac{n-2}{1-r^2}}.\end{equation}For an alternate derivation, see The Analysis of Physical Measurements, pp. 193-199, by Pugh and Winslow, cited in A brief note on the standard error of the Pearson correlation.
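The equivalence between this t-statistic and the usual test of the regression slope can be verified directly; the following sketch (my own, with an arbitrary simulated sample) fits $Y=\alpha+\beta X$ by least squares and compares $\hat{\beta}/\text{se}\left(\hat{\beta}\right)$ with $r\sqrt{(n-2)/(1-r^2)}$:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 80
x = rng.normal(size=n)
y = 1.0 + 0.4 * x + rng.normal(size=n)   # arbitrary alpha, beta, noise

# least-squares fit of y = alpha + beta * x
xc, yc = x - x.mean(), y - y.mean()
beta_hat = np.sum(xc * yc) / np.sum(xc**2)
alpha_hat = y.mean() - beta_hat * x.mean()

# standard error of the slope on n - 2 degrees of freedom
resid = y - alpha_hat - beta_hat * x
se_beta = np.sqrt(np.sum(resid**2) / (n - 2) / np.sum(xc**2))
t_slope = beta_hat / se_beta             # t-test for beta = 0

r = np.corrcoef(x, y)[0, 1]
t_corr = r * np.sqrt((n - 2) / (1 - r**2))

print(f"t from slope:       {t_slope:.6f}")
print(f"t from correlation: {t_corr:.6f}")   # identical up to rounding
```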

A comparison of the two formulas shows\begin{equation}\frac{\sigma^2_{\rho}}{n}=\frac{\left(1-\rho^2\right)^2}{n\left(1+\rho^2\right)}<\frac{1-r^2}{n-2}=\sigma^2_r.\end{equation}In other words, the estimate based on the full bivariate normal model has smaller variance than the one obtained from linear regression.  It should also be pointed out that Pearson's formula holds only for bivariate normal variables, while the standard error of $r$ is valid for any linear regression.  Note, however, that the test of whether $r=0$ is equivalent to the test of whether $\beta=0$, so it does not really tell us anything new.
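To see the size of the gap (a numeric check of my own, taking $r=\rho$ and an arbitrary $n$), the two variances can be tabulated side by side:

```python
import numpy as np

n = 30
for rho in (0.0, 0.3, 0.5, 0.8):
    v_pearson = (1 - rho**2) ** 2 / (n * (1 + rho**2))   # bivariate normal result
    v_regress = (1 - rho**2) / (n - 2)                   # regression result, r = rho
    print(f"rho={rho:.1f}: Pearson {v_pearson:.5f} < regression {v_regress:.5f}")
```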

deleted 681 characters in body
Source Link

There are two equations here for computing the statistical significance of the correlation coefficient here. The  The first is the sampling variance of the measuredtrue correlation coefficient $r$$\rho$ of two bivariate normal random variables $X$ and $Y$ with true correlation coefficient $\rho$:\begin{equation}\text{var}\left(r\right)=\frac{\left(1-\rho^2\right)^2}{n},\end{equation} and the second is a t-statistic associated with the hypothesis that in the linear regression of $Y$ on $X$, the main effect of $X$ is zero:\begin{equation}t=r\sqrt{\frac{n-2}{1-r^2}}.\end{equation}Whence the standard error of $r$ mentioned by the OP: $\text{se}\left(r\right)=\sqrt{\frac{1-r^2}{n-2}}$. These

Both of these expression can be derived infrom the following waysprinciple of maximum-likelihood.  That is, if we assume a parameter $\theta$ should be distributed normally,\begin{equation}\mathcal{L}\left(\theta\right)\sim\exp{\left(-\frac{\theta^2}{2\sigma^2_{\theta}}\right)},\end{equation}then the standard error of the parameter can be estimated from the curvature of the log-likelihood $\ell=\log{\mathcal{L}}$, function via\begin{equation}\sigma^2_{\theta}=\frac{-1}{\frac{\partial^2\ell}{\partial\theta^2}\bigr|_{\theta=\hat{\theta}}},\end{equation}where $\hat{\theta}$ is the maximum-likelihood estimate of $\theta$, got from the condition\begin{equation}\frac{\partial\ell}{\partial\theta}\bigr|_{\theta=\hat{\theta}}=0.\end{equation}

Now, Pearson derived the first expression \begin{equation}\text{var}\left(r\right)=\frac{\left(1-\rho^2\right)^2}{n\left(1+\rho^2\right)}\approx\frac{1-3\rho^2}{n}\end{equation}\begin{equation}\text{var}\left(r\right)=\frac{\left(1-\rho^2\right)^2}{n\left(1+\rho^2\right)}\end{equation} in VII. Mathematical contributions to the theory of evolution.-III. Regression, heredity, and panmixia and https://royalsocietypublishing.org/doi/10.1098/rspl.1897.0091 by expanding the joint distribution of $n$ pairs of bivariate normal variables about the true value of $\rho$. We can derive it using expectations as the OP asks. First, assume $X$ and $Y$ are mean-subtracted standard normal variables with standard deviations $\sigma_X$ and $\sigma_Y$. Then the correlation coefficient is defined by \begin{equation}\mathbb{E}\left(XY\right)=\rho\sigma_X\sigma_Y.\end{equation}Expressed in terms of the density $f\left(x,y\right)$ of the bivariate normal, this becomes \begin{align}\mathbb{E}\left(XY\right)=&\frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}\int_{-\infty}^{\infty}xy\exp{\left(\frac{-x^2}{2\sigma_X^2\left(1-\rho^2\right)}\right)}\exp{\left(\frac{-y^2}{2\sigma_Y^2\left(1-\rho^2\right)}\right)}\exp{\left(\frac{xy\rho}{\sigma_X\sigma_Y\left(1-\rho^2\right)}\right)}dxdy=\rho\sigma_X\sigma_Y.\tag{1}\end{align}Differentiate each side once with respect to $\rho$ to get\begin{align}\frac{\rho}{1-\rho^2}\mathbb{E}\left(XY\right)-\frac{\rho}{\sigma_X^2\left(1-\rho^2\right)^2}\mathbb{E}\left(X^3Y\right)-\frac{\rho}{\sigma_Y^2\left(1-\rho^2\right)^2}\mathbb{E}\left(XY^3\right)+\frac{1}{\sigma_X\sigma_Y\left(1-\rho^2\right)}\left(1+\frac{2\rho^2}{1-\rho^2}\right)\mathbb{E}\left(X^2Y^2\right)=\sigma_X\sigma_Y\end{align}or\begin{align}\frac{1}{\sigma_X\sigma_Y}\left(1+\frac{2\rho^2}{1-\rho^2}\right)\mathbb{E}\left(X^2Y^2\right)-\frac{\rho}{\sigma_X^2\left(1-\rho^2\right)}\mathbb{E}\left(XY^3\right)-\frac{\rho}{\sigma_Y^2\left(1-\rho^2\right)}\mathbb{E}\left(X^3Y\right)=\sigma_X\sigma_Y\left(1-2\rho^2\right)\tag{2}\end{align}To eliminate $\mathbb{E}\left(X^3Y\right)$ and $\mathbb{E}\left(XY^3\right)$ from this expression, differentiate the identity $\int_{-\infty}^{\infty}f\left(x,y\right)dxdy=1$ with respect to $\rho$ and take expectations to find\begin{align}\frac{\rho}{1-\rho^2}\mathbb{E}\left(1\right)-\frac{\rho}{\sigma_X^2\left(1-\rho^2\right)^2}\mathbb{E}\left(X^2\right)+\frac{1}{\sigma_X\sigma_Y\left(1-\rho^2\right)}\left(1+\frac{2\rho^2}{1-\rho^2}\right)\mathbb{E}\left(XY\right)-\frac{\rho}{\sigma_Y^2\left(1-\rho^2\right)^2}\mathbb{E}\left(Y^2\right)=0\end{align}or\begin{align}\rho=\frac{1}{\sigma_X\sigma_Y}\left(1+\frac{2\rho^2}{1-\rho^2}\right)\mathbb{E}\left(XY\right)-\frac{\rho}{\sigma_X^2\left(1-\rho^2\right)^2}\mathbb{E}\left(X^2\right)-\frac{\rho}{\sigma_Y^2\left(1-\rho^2\right)^2}\mathbb{E}\left(Y^2\right).\end{align}Since $\rho=\mathbb{E}\left(XY\right)/\sigma_X\sigma_Y$, the only way this expression can be true for all $\rho$ is if $\mathbb{E}\left(X^2\right)=\frac{\rho\sigma_X}{\sigma_Y}\mathbb{E}\left(XY\right)$ and $\mathbb{E}\left(Y^2\right)=\frac{\rho\sigma_Y}{\sigma_X}\mathbb{E}\left(XY\right)$, whence it follows that\begin{align}\mathbb{E}\left(XY\cdot X^2\right)&=\frac{\rho\sigma_X}{\sigma_Y}\mathbb{E}\left(XY\cdot XY\right)\\\mathbb{E}\left(X^3Y\right)&=\frac{\rho\sigma_X}{\sigma_Y}\mathbb{E}\left(X^2Y^2\right)\end{align}and\begin{align}\mathbb{E}\left(XY^3\right)&=\frac{\rho\sigma_Y}{\sigma_X}\mathbb{E}\left(X^2Y^2\right),\end{align}because the left- and right-hand sides refer the 
expectation to the same distribution $f$. Hence we can substitute into Eq. 2 above to get \begin{align}\frac{1}{\sigma_X\sigma_Y}\left(1+\frac{2\rho^2}{1-\rho^2}-\frac{2\rho^2}{1-\rho^2}\right)\mathbb{E}\left(X^2Y^2\right)=\frac{\mathbb{E}\left(X^2Y^2\right)}{\sigma_X\sigma_Y}=\sigma_X\sigma_Y\left(1-2\rho^2\right),\end{align} giving the variance of $XY$ as \begin{equation}\mathbb{E}\left(X^2Y^2\right)-\mathbb{E}\left(XY\right)^2=1-3\rho^2.\end{equation}

Now the sample correlation coefficient of two mean-subtracted random variables is \begin{equation}r=\frac{\sum_i\left(X_iY_i-\overline{X}\overline{Y}\right)}{\sqrt{\sum_i\left(X_i-\overline{X}\right)^2\sum_i\left(Y_i-\overline{Y}\right)^2}}=\frac{\sum_iX_iY_i}{n\sigma_X\sigma_Y}\end{equation}because $\overline{X}=\overline{Y}=0$. Thus \begin{align}\sigma_X\sigma_Yr&=\frac{1}{n}\sum_iX_iY_i=\frac{1}{n}nX_1Y_1,\end{align}which has expected value\begin{equation}\sigma_X\sigma_Y\mathbb{E}\left(r\right)=\mathbb{E}\left(XY\right)=\rho\sigma_X\sigma_Y.\end{equation}Similarly,\begin{align}\sigma_X^2\sigma_Y^2n^2r^2&=\left(\sum_iX_iY_i\right)^2=nX_1Y_1+n\left(n-1\right)X_1X_2Y_1Y_2\end{align} which has expectation\begin{align}\sigma_X^2\sigma_Y^2n^2\mathbb{E}\left(r^2\right)&=n\mathbb{E}\left(X^2Y^2\right)+n\left(n-1\right)\mathbb{E}\left(XY\right)^2.\end{align}Hence the sampling variance of $r$ is approximately\begin{align}\mathbb{E}\left(r^2\right)-\mathbb{E}\left(r\right)^2=\frac{1-2\rho^2+\left(n-1\right)\rho^2}{n}-\frac{n\rho^2}{n}=\frac{1-3\rho^2}{n},\end{align} which is in agreement with Pearson's expression.

  We can summarize his method here.  If we let $f$ be the bivariate normal density of two zero-mean random variables, i.e.,\begin{equation}f\left(X,Y\right)=\frac{1}{2\pi\sqrt{1-\rho^2}\sigma_X\sigma_Y}\int\exp{\left(-\frac{X^2}{2\sigma_X^2\left(1-\rho^2\right)}+-\frac{\rho XY}{2\sigma_X\sigma_Y\left(1-\rho^2\right)}-\frac{Y^2}{2\sigma_Y^2\left(1-\rho^2\right)}\right)}dX dY,\end{equation}then we can get the variance $\sigma^2_{\rho}$ of the correlation coefficient by evaluating $\frac{-\partial^2\log{f}}{\partial \rho^2}\bigr|_{\rho=\hat{\rho}}$.  The first derivative of $\log{f}$ is\begin{align}\frac{\partial\log{f}}{\partial \rho}&=\frac{\rho}{1-\rho^2}+\frac{2\rho}{1-\rho^2}\cdot\frac{-X^2}{2\left(1-\rho^2\right)\sigma_X^2}+\left(\frac{1}{1-\rho^2}+\frac{2\rho^2}{\left(1-\rho^2\right)^2}\right)\frac{XY}{\sigma_X\sigma_Y}+\frac{2\rho}{1-\rho^2}\cdot\frac{-X^2}{2\left(1-\rho^2\right)\sigma_Y^2}\nonumber\\&=\frac{\rho}{1-\rho^2}+\left(\frac{2\rho}{1-\rho^2}\right)\left(\log{f}-\log{\frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}}\right)+\frac{1}{1-\rho^2}\frac{XY}{\sigma_X\sigma_Y},\end{align}where at the maximum-likelihood solution $\hat{\rho}=\frac{\mathbb{E}\left(XY\right)}{\sigma_X\sigma_Y}$ the middle term becomes\begin{equation}\left[\log{f}-\log{\frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}}\right]\bigr|_{\rho=\hat{\rho}}=-\frac{\mathbb{E}\left(X^2\right)}{\sigma_X^2\left(1-\rho^2\right)}+\frac{\rho\mathbb{E}\left(XY\right)}{\sigma_X\sigma_Y\left(1-\rho^2\right)}-\frac{\mathbb{E}\left(Y^2\right)}{\sigma_X^2\left(1-\rho^2\right)}=-1.\end{equation}Whence upon taking the second derivative and evaluating at $\rho=\hat{\rho}$, we get\begin{align}\frac{\partial^2\log{f}}{\partial\rho^2}\bigr|_{\rho=\hat{\rho^2}}&=\left(\frac{1}{1-\rho^2}+\frac{2\rho^2}{\left(1-\rho^2\right)^2}\right)+\frac{2\rho}{\left(1-\rho^2\right)^2}\frac{\mathbb{E}\left(XY\right)}{\sigma_X\sigma_Y}+\left(\frac{2}{1-\rho^2}+{4\rho^2}{\left(1-\rho^2\right)}\right)\left[\log{f}-\log{\frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}}\right]\bigr|_{\rho=\hat{\rho}}+\frac{2\rho}{1-\rho^2}\frac{\partial}{\partial\rho}\left[\log{f}-\log{\frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}}\right]\bigr|_{\rho=\hat{\rho}}\nonumber\\&=\frac{1+\rho^2}{\left(1-\rho^2\right)^2}+\frac{2\rho^2}{\left(1-\rho^2\right)^2}-2\left(\frac{1+\rho^2}{\left(1-\rho^2\right)^2}\right)-\frac{2\rho^2}{\left(1-\rho\right)^2},\end{align}so that\begin{equation}\sigma^2_{\rho}=\frac{-1}{\frac{\partial^2\log{f}}{\partial\rho^2}\bigr|_{\rho=\hat{\rho}}}=\frac{\left(1-\rho^2\right)^2}{1+\rho^2}.\end{equation}Then by the central limit theorem, the sampling variance of $\rho$ is $\frac{\left(1-\rho^2\right)^2}{n\left(1+\rho^2\right)}$.

For the second form of the statistic, let's drop the assumption of bivariate normality and consider the regression of $Y$ on $X$. The equation of with normally-distributed error $\varepsilon$: if\begin{equation}Y=\alpha+\beta X+\varepsilon\end{equation}and\begin{equation}\sigma^2_Y=\beta^2\sigma^2_X+\sigma^2,\end{equation} then according to the line which isrelationship $\beta=\rho\frac{\sigma_Y}{\sigma_X}$, it must be the best fit in respect of least squarescase that error variance is $Y=\alpha+\beta X$$\sigma^2=\left(1-\rho^2\right)\sigma^2_Y$. Under  Then the null hypothesis thatdistribution of $\beta=0$$\varepsilon$ is:\begin{equation}\mathcal{L}\left(\varepsilon\right)\sim\Pi_i\exp{\left(-\frac{\left(Y_i-\alpha-\beta X_i\right)^2}{2\sigma^2}\right)},\end{equation}so that the tlog-statisticlikelihood is\begin{equation}\ell=-\sum_i\frac{\left(Y_i-\alpha-\beta X_i\right)^2}{2\left(1-\rho^2\right)\sigma_Y^2}.\end{equation}The first two derivatives are\begin{equation}\frac{\partial\ell}{\partial \beta}=\sum_i\frac{\left(Y_i-\overline{Y}-\beta\left(X_i-\overline{X}\right)\right)\left(X_i-\overline{X}\right)}{\left(1-\rho^2\right)\sigma_Y^2}\end{equation}and\begin{equation}\frac{\partial^2\ell}{\partial\beta^2}=-\sum_i\frac{\left(X_i-\overline{X}\right)^2}{\left(1-\rho^2\right)\sigma_Y^2}.\end{equation}Now, making the substitution $\beta=\rho\frac{\sigma_Y}{\sigma_X}$ and evaluating at \begin{equation}t_{n-2}=\frac{\hat{\left(\beta\right)}}{\text{se}\left(\hat{\beta}\right)},\end{equation}in$\rho=\hat{\rho}=r$ gives\begin{equation}\frac{-1}{\frac{\partial^2\ell}{\partial\beta^2}\bigr|_{\beta=\hat{\beta}}}=\frac{-1}{\frac{\partial^2\ell}{\partial\rho^2}\bigr|_{\rho=\hat{\rho}}}\frac{\sigma^2_Y}{\sigma^2_X}=\frac{\sigma^2_Y\left(1-r^2\right)}{\sigma^2_X\left(n-2\right)},\end{equation}whence the sampling variance of the measured correlation coefficient $\hat{\rho}=r$ is\begin{equation}\sigma^2_r=\frac{1-r^2}{n-2},\end{equation} in which we lose two degrees of freedom infrom the estimation of two parameters $\hat{\alpha}$$\alpha$ and $\hat{\beta}$$\beta$. Now the total sum of squares is\begin{equation}\sum_i\left(Y_i-\hat{\alpha}-\hat{\beta}X_i\right)^2=s_Y^2-\hat{\beta}^2s_X^2=s_Y^2-r^2s_Y^2,\tag{3}\end{equation}where $s$ is the sample standard error  Finally, and we have used the relationship\begin{equation}\hat{\beta}=\frac{\sum_i\left(X_i-\overline{X}\right)\left(Y_i-\overline{Y}\right)}{\sum_i\left(X_i-\overline{X}\right)^2}=\frac{\sum_i\left(X_i-\overline{X}\right)\left(Y_i-\overline{Y}\right)}{\sqrt{\sum_i\left(X_i-\overline{X}\right)^2\sum_i\left(Y_i-\overline{Y}\right)^2}}\frac{s_Y}{s_X}=r\frac{s_Y}{s_X}\tag{4}\end{equation} between the estimate of the slope and the sample correlation coefficient. Since the variance of $\hat{\beta}$ is got by \begin{equation}\text{var}\left(\hat{\beta}\right)=\frac{\sum_i\left(Y_i-\hat{\alpha}-\hat{\beta}X_i\right)^2/\left(n-2\right)}{\sum_i\left(X_i-\overline{X}\right)^2},\end{equation} we can substitute from Eqs. 
(3) and (4) to get\begin{equation}\text{var}\left(\hat{\beta}\right)=\frac{s_Y^2\left(1-r^2\right)}{s_X^2\left(n-2\right)}=\frac{1-r^2}{\left(n-2\right)r^2/\hat{\beta}^2},\end{equation}whence the standard error of $\hat{\beta}$ is $\frac{\hat{\beta}}{r}\sqrt{\frac{1-r^2}{n-2}}$ and ourform a t-statistic becomesto test the hypothesis that \begin{equation}t=\frac{\hat{\beta}}{\text{se}\left(\hat{\beta}\right)}=r\sqrt{\frac{n-2}{1-r^2}}.\end{equation}The foregoing is a summary of$r=0$ using \begin{equation}t=\frac{r}{\sigma_r}=r\sqrt{\frac{n-2}{1-r^2}}.\end{equation}For an alternate derivation, see The Analysis of Physical Measurements, pp. 193-199, by Pugh and Winslow cited in A brief note on the standard error of the Pearson correlation. The idea


Returning to the first expression, we can also derive it using expectations, as the OP asks; to second order in $\rho$, Pearson's formula gives\begin{equation}\text{var}\left(r\right)=\frac{\left(1-\rho^2\right)^2}{n\left(1+\rho^2\right)}\approx\frac{1-3\rho^2}{n}.\end{equation}First, assume $X$ and $Y$ are zero-mean normal variables with standard deviations $\sigma_X$ and $\sigma_Y$. Then the correlation coefficient is defined by \begin{equation}\mathbb{E}\left(XY\right)=\rho\sigma_X\sigma_Y.\end{equation}Expressed in terms of the density $f\left(x,y\right)$ of the bivariate normal, this becomes \begin{align}\mathbb{E}\left(XY\right)=&\frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}xy\exp{\left(\frac{-x^2}{2\sigma_X^2\left(1-\rho^2\right)}\right)}\exp{\left(\frac{-y^2}{2\sigma_Y^2\left(1-\rho^2\right)}\right)}\exp{\left(\frac{xy\rho}{\sigma_X\sigma_Y\left(1-\rho^2\right)}\right)}dx\,dy=\rho\sigma_X\sigma_Y.\tag{1}\end{align}Differentiate each side once with respect to $\rho$ to get\begin{align}\frac{\rho}{1-\rho^2}\mathbb{E}\left(XY\right)-\frac{\rho}{\sigma_X^2\left(1-\rho^2\right)^2}\mathbb{E}\left(X^3Y\right)-\frac{\rho}{\sigma_Y^2\left(1-\rho^2\right)^2}\mathbb{E}\left(XY^3\right)+\frac{1}{\sigma_X\sigma_Y\left(1-\rho^2\right)}\left(1+\frac{2\rho^2}{1-\rho^2}\right)\mathbb{E}\left(X^2Y^2\right)=\sigma_X\sigma_Y\end{align}or\begin{align}\frac{1}{\sigma_X\sigma_Y}\left(1+\frac{2\rho^2}{1-\rho^2}\right)\mathbb{E}\left(X^2Y^2\right)-\frac{\rho}{\sigma_X^2\left(1-\rho^2\right)}\mathbb{E}\left(X^3Y\right)-\frac{\rho}{\sigma_Y^2\left(1-\rho^2\right)}\mathbb{E}\left(XY^3\right)=\sigma_X\sigma_Y\left(1-2\rho^2\right).\tag{2}\end{align}To eliminate $\mathbb{E}\left(X^3Y\right)$ and $\mathbb{E}\left(XY^3\right)$ from this expression, differentiate the identity $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}f\left(x,y\right)dx\,dy=1$ with respect to $\rho$ to find\begin{align}\frac{\rho}{1-\rho^2}\mathbb{E}\left(1\right)-\frac{\rho}{\sigma_X^2\left(1-\rho^2\right)^2}\mathbb{E}\left(X^2\right)+\frac{1}{\sigma_X\sigma_Y\left(1-\rho^2\right)}\left(1+\frac{2\rho^2}{1-\rho^2}\right)\mathbb{E}\left(XY\right)-\frac{\rho}{\sigma_Y^2\left(1-\rho^2\right)^2}\mathbb{E}\left(Y^2\right)=0\end{align}or\begin{align}\rho=\frac{1}{\sigma_X\sigma_Y}\left(1+\frac{2\rho^2}{1-\rho^2}\right)\mathbb{E}\left(XY\right)-\frac{\rho}{\sigma_X^2\left(1-\rho^2\right)^2}\mathbb{E}\left(X^2\right)-\frac{\rho}{\sigma_Y^2\left(1-\rho^2\right)^2}\mathbb{E}\left(Y^2\right).\end{align}Since $\rho=\mathbb{E}\left(XY\right)/\sigma_X\sigma_Y$, the only way this expression can hold for all $\rho$ is if $\mathbb{E}\left(X^2\right)=\frac{\rho\sigma_X}{\sigma_Y}\mathbb{E}\left(XY\right)$ and $\mathbb{E}\left(Y^2\right)=\frac{\rho\sigma_Y}{\sigma_X}\mathbb{E}\left(XY\right)$, whence it follows that\begin{align}\mathbb{E}\left(XY\cdot X^2\right)&=\frac{\rho\sigma_X}{\sigma_Y}\mathbb{E}\left(XY\cdot XY\right),\\\mathbb{E}\left(X^3Y\right)&=\frac{\rho\sigma_X}{\sigma_Y}\mathbb{E}\left(X^2Y^2\right),\end{align}and likewise\begin{align}\mathbb{E}\left(XY^3\right)&=\frac{\rho\sigma_Y}{\sigma_X}\mathbb{E}\left(X^2Y^2\right),\end{align}because both sides are expectations with respect to the same density $f$. Hence we can substitute into Eq. (2) above to get \begin{align}\frac{1}{\sigma_X\sigma_Y}\left(1+\frac{2\rho^2}{1-\rho^2}-\frac{2\rho^2}{1-\rho^2}\right)\mathbb{E}\left(X^2Y^2\right)=\frac{\mathbb{E}\left(X^2Y^2\right)}{\sigma_X\sigma_Y}=\sigma_X\sigma_Y\left(1-2\rho^2\right),\end{align} giving the variance of $XY$ as \begin{equation}\mathbb{E}\left(X^2Y^2\right)-\mathbb{E}\left(XY\right)^2=\sigma_X^2\sigma_Y^2\left(1-3\rho^2\right).\end{equation}
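The defining moment identity in Eq. (1) is easy to spot-check by simulation. The following is a minimal sketch (assuming numpy is available; the parameter values are arbitrary choices) comparing a Monte Carlo estimate of $\mathbb{E}\left(XY\right)$ with $\rho\sigma_X\sigma_Y$:

```python
import numpy as np

rng = np.random.default_rng(1)
rho, sx, sy = 0.6, 2.0, 0.5

# a large sample from the bivariate normal density f(x, y)
cov = [[sx**2, rho * sx * sy], [rho * sx * sy, sy**2]]
x, y = rng.multivariate_normal([0.0, 0.0], cov, size=1_000_000).T

print(np.mean(x * y), rho * sx * sy)  # Monte Carlo estimate vs. Eq. (1)
```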

Now the sample correlation coefficient of two mean-subtracted random variables is \begin{equation}r=\frac{\sum_i\left(X_i-\overline{X}\right)\left(Y_i-\overline{Y}\right)}{\sqrt{\sum_i\left(X_i-\overline{X}\right)^2\sum_i\left(Y_i-\overline{Y}\right)^2}}=\frac{\sum_iX_iY_i}{n\sigma_X\sigma_Y}\end{equation}because $\overline{X}=\overline{Y}=0$. Thus \begin{align}\sigma_X\sigma_Yr&=\frac{1}{n}\sum_iX_iY_i,\end{align}and since the $n$ pairs are identically distributed, this has expected value\begin{equation}\sigma_X\sigma_Y\mathbb{E}\left(r\right)=\mathbb{E}\left(XY\right)=\rho\sigma_X\sigma_Y.\end{equation}Similarly,\begin{align}\sigma_X^2\sigma_Y^2n^2r^2&=\left(\sum_iX_iY_i\right)^2,\end{align}which expands into $n$ square terms and $n\left(n-1\right)$ cross terms and therefore has expectation\begin{align}\sigma_X^2\sigma_Y^2n^2\mathbb{E}\left(r^2\right)&=n\mathbb{E}\left(X^2Y^2\right)+n\left(n-1\right)\mathbb{E}\left(XY\right)^2.\end{align}Hence the sampling variance of $r$ is approximately\begin{align}\mathbb{E}\left(r^2\right)-\mathbb{E}\left(r\right)^2=\frac{1-2\rho^2+\left(n-1\right)\rho^2}{n}-\frac{n\rho^2}{n}=\frac{1-3\rho^2}{n},\end{align} which agrees with Pearson's expression to second order in $\rho$.
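For small $\rho$ this approximation can be compared against simulation. The sketch below (assuming numpy; the trial count, $n$, and $\rho$ are arbitrary choices) estimates the sampling variance of $r$ and prints it next to $\left(1-3\rho^2\right)/n$ and Pearson's full expression; in this regime the three agree to within about a percent:

```python
import numpy as np

rng = np.random.default_rng(2)
n, rho, trials = 50, 0.1, 50_000
cov = [[1.0, rho], [rho, 1.0]]

# draw `trials` independent samples of size n and compute r for each
xy = rng.multivariate_normal([0.0, 0.0], cov, size=(trials, n))
x, y = xy[..., 0], xy[..., 1]
xc = x - x.mean(axis=1, keepdims=True)
yc = y - y.mean(axis=1, keepdims=True)
rs = (xc * yc).sum(axis=1) / np.sqrt((xc**2).sum(axis=1) * (yc**2).sum(axis=1))

print(rs.var())                              # empirical sampling variance of r
print((1 - 3 * rho**2) / n)                  # the approximation derived above
print((1 - rho**2)**2 / (n * (1 + rho**2)))  # Pearson's full expression
```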

The Pugh and Winslow derivation mentioned above reaches the same t-statistic through least squares alone. The equation of the line which is the best fit in the sense of least squares is $Y=\alpha+\beta X$. Under the null hypothesis that $\beta=0$, the t-statistic is \begin{equation}t_{n-2}=\frac{\hat{\beta}}{\text{se}\left(\hat{\beta}\right)},\end{equation}in which we lose two degrees of freedom to the estimation of $\hat{\alpha}$ and $\hat{\beta}$. Now the residual sum of squares is\begin{equation}\sum_i\left(Y_i-\hat{\alpha}-\hat{\beta}X_i\right)^2=s_Y^2-\hat{\beta}^2s_X^2=s_Y^2-r^2s_Y^2,\tag{3}\end{equation}where $s_X^2=\sum_i\left(X_i-\overline{X}\right)^2$ and $s_Y^2=\sum_i\left(Y_i-\overline{Y}\right)^2$ are the corrected sums of squares, and we have used the relationship\begin{equation}\hat{\beta}=\frac{\sum_i\left(X_i-\overline{X}\right)\left(Y_i-\overline{Y}\right)}{\sum_i\left(X_i-\overline{X}\right)^2}=\frac{\sum_i\left(X_i-\overline{X}\right)\left(Y_i-\overline{Y}\right)}{\sqrt{\sum_i\left(X_i-\overline{X}\right)^2\sum_i\left(Y_i-\overline{Y}\right)^2}}\frac{s_Y}{s_X}=r\frac{s_Y}{s_X}\tag{4}\end{equation} between the estimate of the slope and the sample correlation coefficient. Since the variance of $\hat{\beta}$ is given by \begin{equation}\text{var}\left(\hat{\beta}\right)=\frac{\sum_i\left(Y_i-\hat{\alpha}-\hat{\beta}X_i\right)^2/\left(n-2\right)}{\sum_i\left(X_i-\overline{X}\right)^2},\end{equation} we can substitute from Eqs. (3) and (4) to get\begin{equation}\text{var}\left(\hat{\beta}\right)=\frac{s_Y^2\left(1-r^2\right)}{s_X^2\left(n-2\right)}=\frac{1-r^2}{\left(n-2\right)r^2/\hat{\beta}^2},\end{equation}whence the standard error of $\hat{\beta}$ is $\frac{\hat{\beta}}{r}\sqrt{\frac{1-r^2}{n-2}}$ and the t-statistic again becomes \begin{equation}t=\frac{\hat{\beta}}{\text{se}\left(\hat{\beta}\right)}=r\sqrt{\frac{n-2}{1-r^2}}.\end{equation}The foregoing is a summary of The Analysis of Physical Measurements, pp. 193-199, by Pugh and Winslow, cited in A brief note on the standard error of the Pearson correlation. The idea is that in testing the hypothesis that $\beta$ is zero we are equivalently testing the hypothesis that $r$ is zero; hence this statistic is only really valid for small values of the correlation coefficient.
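Equations (3) and (4) are exact algebraic identities for any data set, which a short sketch can confirm (assuming numpy; the simulated line and noise are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 40
x = rng.normal(size=n)
y = 1.5 + 0.8 * x + rng.normal(size=n)

xc, yc = x - x.mean(), y - y.mean()
sxx, syy = (xc**2).sum(), (yc**2).sum()  # corrected sums of squares
r = (xc * yc).sum() / np.sqrt(sxx * syy)

beta = (xc * yc).sum() / sxx             # least-squares slope
alpha = y.mean() - beta * x.mean()
rss = ((y - alpha - beta * x)**2).sum()  # residual sum of squares

print(beta, r * np.sqrt(syy / sxx))      # Eq. (4): beta-hat = r * s_Y / s_X
print(rss, syy * (1 - r**2))             # Eq. (3): RSS = s_Y^2 (1 - r^2)
```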


A comparison of the two formulas shows\begin{equation}\frac{\sigma^2_{\rho}}{n}=\frac{\left(1-\rho^2\right)^2}{n\left(1+\rho^2\right)}<\frac{1-r^2}{n-2}=\sigma^2_r.\end{equation}In other words, the variance of $r$ under the bivariate normal model is smaller than the sampling variance obtained from the linear regression.  It should also be pointed out that Pearson's formula holds only for bivariate normal variables, while the standard error of $r$ is valid for any linear regression.  However, the test of whether $r=0$ is equivalent to the test of whether $\beta=0$, so it does not really tell us anything new.
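To make the comparison concrete, this sketch (taking $r\approx\rho$ and $n=50$ as arbitrary choices) tabulates both variance formulas over a range of $\rho$:

```python
n = 50
for rho in (0.0, 0.2, 0.4, 0.6, 0.8):
    pearson = (1 - rho**2)**2 / (n * (1 + rho**2))  # bivariate normal formula
    regression = (1 - rho**2) / (n - 2)             # regression formula, with r ~ rho
    print(f"rho={rho:.1f}  Pearson={pearson:.5f}  regression={regression:.5f}")
```

The Pearson value is smaller at every $\rho$, consistent with the inequality above.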
