There are two equations here for assessing the statistical significance of the correlation coefficient. The first is the asymptotic sampling variance of the estimated correlation coefficient $r$ of two bivariate normal random variables with true correlation $\rho$:\begin{equation}\text{var}\left(r\right)=\frac{\left(1-\rho^2\right)^2}{n},\end{equation} and the second is a t-statistic associated with the hypothesis that, in the linear regression of $Y$ on $X$, the main effect of $X$ is zero:\begin{equation}t=r\sqrt{\frac{n-2}{1-r^2}}.\end{equation}Whence the standard error of $r$ mentioned by the OP: $\text{se}\left(r\right)=\sqrt{\frac{1-r^2}{n-2}}$.
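As a quick sanity check on the second formula, here is a small Python sketch (the simulated data, seed, and variable names are mine) showing that $t$ referred to a t-distribution with $n-2$ degrees of freedom reproduces the p-value reported by `scipy.stats.pearsonr`:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
y = 0.3 * x + rng.normal(size=n)           # a correlated pair

r, p_scipy = stats.pearsonr(x, y)

# t-statistic for the hypothesis that the effect of X on Y is zero
t = r * np.sqrt((n - 2) / (1 - r**2))
p_manual = 2 * stats.t.sf(abs(t), df=n - 2)

print(p_scipy, p_manual)                   # agree to floating-point precision
```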
Both of these expressions can be derived from the principle of maximum likelihood. That is, if we assume the likelihood of a parameter $\theta$ is approximately normal about its maximum,\begin{equation}\mathcal{L}\left(\theta\right)\sim\exp{\left(-\frac{\left(\theta-\hat{\theta}\right)^2}{2\sigma^2_{\theta}}\right)},\end{equation}then the standard error of the parameter can be estimated from the curvature of the log-likelihood function $\ell=\log{\mathcal{L}}$ via\begin{equation}\sigma^2_{\theta}=\frac{-1}{\frac{\partial^2\ell}{\partial\theta^2}\bigr|_{\theta=\hat{\theta}}},\end{equation}where $\hat{\theta}$ is the maximum-likelihood estimate of $\theta$, obtained from the condition\begin{equation}\frac{\partial\ell}{\partial\theta}\bigr|_{\theta=\hat{\theta}}=0.\end{equation}
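To make the curvature recipe concrete, here is a toy Python sketch (assuming only `numpy`; the normal-mean example is my choice) that recovers the familiar standard error $\sigma/\sqrt{n}$ of a sample mean from a finite-difference estimate of the log-likelihood's curvature:

```python
import numpy as np

# Toy example: mean of a normal sample with known sigma, where the
# exact answer is se = sigma / sqrt(n).
rng = np.random.default_rng(1)
sigma, n = 2.0, 100
x = rng.normal(loc=5.0, scale=sigma, size=n)

def loglik(mu):
    return -np.sum((x - mu) ** 2) / (2 * sigma**2)

mu_hat = x.mean()                # maximum-likelihood estimate of the mean
h = 1e-4                         # finite-difference step

# curvature of the log-likelihood at the maximum
curv = (loglik(mu_hat + h) - 2 * loglik(mu_hat) + loglik(mu_hat - h)) / h**2

print(np.sqrt(-1 / curv), sigma / np.sqrt(n))   # both give ~0.2
```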
Now, Pearson derived the first expression in the sharper form\begin{equation}\text{var}\left(r\right)=\frac{\left(1-\rho^2\right)^2}{n\left(1+\rho^2\right)}\end{equation}(the extra factor of $1+\rho^2$ in the denominator arises because the variances $\sigma_X$ and $\sigma_Y$ are treated as known) in "VII. Mathematical contributions to the theory of evolution.—III. Regression, heredity, and panmixia" and in https://royalsocietypublishing.org/doi/10.1098/rspl.1897.0091, by expanding the joint distribution of $n$ pairs of bivariate normal variables about the true value of $\rho$. We can summarize his method here. If we let $f$ be the bivariate normal density of two zero-mean random variables, i.e.,\begin{equation}f\left(X,Y\right)=\frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}\exp{\left(-\frac{X^2}{2\sigma_X^2\left(1-\rho^2\right)}+\frac{\rho XY}{\sigma_X\sigma_Y\left(1-\rho^2\right)}-\frac{Y^2}{2\sigma_Y^2\left(1-\rho^2\right)}\right)},\end{equation}then we can get the per-observation variance $\sigma^2_{\rho}$ of the correlation coefficient from the expected curvature $-\mathbb{E}\left(\frac{\partial^2\log{f}}{\partial \rho^2}\right)\bigr|_{\rho=\hat{\rho}}$, per the recipe above. The first derivative of $\log{f}$ is\begin{align}\frac{\partial\log{f}}{\partial \rho}&=\frac{\rho}{1-\rho^2}+\frac{2\rho}{1-\rho^2}\cdot\frac{-X^2}{2\left(1-\rho^2\right)\sigma_X^2}+\left(\frac{1}{1-\rho^2}+\frac{2\rho^2}{\left(1-\rho^2\right)^2}\right)\frac{XY}{\sigma_X\sigma_Y}+\frac{2\rho}{1-\rho^2}\cdot\frac{-Y^2}{2\left(1-\rho^2\right)\sigma_Y^2}\nonumber\\&=\frac{\rho}{1-\rho^2}+\left(\frac{2\rho}{1-\rho^2}\right)\left(\log{f}-\log{\frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}}\right)+\frac{1}{1-\rho^2}\frac{XY}{\sigma_X\sigma_Y},\end{align}where at the maximum-likelihood solution $\hat{\rho}=\frac{\mathbb{E}\left(XY\right)}{\sigma_X\sigma_Y}$ the middle bracket has expectation\begin{equation}\mathbb{E}\left[\log{f}-\log{\frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}}\right]\bigr|_{\rho=\hat{\rho}}=-\frac{\mathbb{E}\left(X^2\right)}{2\sigma_X^2\left(1-\rho^2\right)}+\frac{\rho\,\mathbb{E}\left(XY\right)}{\sigma_X\sigma_Y\left(1-\rho^2\right)}-\frac{\mathbb{E}\left(Y^2\right)}{2\sigma_Y^2\left(1-\rho^2\right)}=-1.\end{equation}Whence upon taking the second derivative and evaluating at $\rho=\hat{\rho}$, we
get\begin{align}\mathbb{E}\left(\frac{\partial^2\log{f}}{\partial\rho^2}\right)\bigr|_{\rho=\hat{\rho}}&=\left(\frac{1}{1-\rho^2}+\frac{2\rho^2}{\left(1-\rho^2\right)^2}\right)+\frac{2\rho}{\left(1-\rho^2\right)^2}\frac{\mathbb{E}\left(XY\right)}{\sigma_X\sigma_Y}+\left(\frac{2}{1-\rho^2}+\frac{4\rho^2}{\left(1-\rho^2\right)^2}\right)\mathbb{E}\left[\log{f}-\log{\frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}}\right]\bigr|_{\rho=\hat{\rho}}+\frac{2\rho}{1-\rho^2}\,\mathbb{E}\left(\frac{\partial}{\partial\rho}\left[\log{f}-\log{\frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}}\right]\right)\bigr|_{\rho=\hat{\rho}}\nonumber\\&=\frac{1+\rho^2}{\left(1-\rho^2\right)^2}+\frac{2\rho^2}{\left(1-\rho^2\right)^2}-2\left(\frac{1+\rho^2}{\left(1-\rho^2\right)^2}\right)-\frac{2\rho^2}{\left(1-\rho^2\right)^2}=-\frac{1+\rho^2}{\left(1-\rho^2\right)^2},\end{align}so that\begin{equation}\sigma^2_{\rho}=\frac{-1}{\mathbb{E}\left(\frac{\partial^2\log{f}}{\partial\rho^2}\right)\bigr|_{\rho=\hat{\rho}}}=\frac{\left(1-\rho^2\right)^2}{1+\rho^2}.\end{equation}Since the log-likelihood of a sample of $n$ independent pairs is the sum of $n$ such terms, the curvature scales with $n$, and the sampling variance of $\hat{\rho}$ is $\frac{\left(1-\rho^2\right)^2}{n\left(1+\rho^2\right)}$.
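The per-observation value $\sigma^2_{\rho}=\frac{\left(1-\rho^2\right)^2}{1+\rho^2}$ is easy to verify by simulation. The following Python sketch (assuming `numpy`, unit variances, and an arbitrary choice of $\rho$) estimates the expected curvature of $\log f$ at the true $\rho$ by finite differences and compares its negative inverse with the formula:

```python
import numpy as np

# Monte Carlo check of sigma_rho^2 = (1 - rho^2)^2 / (1 + rho^2),
# using unit variances so that log f simplifies.
rng = np.random.default_rng(2)
rho = 0.6
xy = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=200_000)
x, y = xy[:, 0], xy[:, 1]

def logf(r):
    # log density up to an additive constant (constants drop out of derivatives)
    return -0.5 * np.log(1 - r**2) - (x**2 - 2 * r * x * y + y**2) / (2 * (1 - r**2))

# expected curvature at the true rho, by central finite differences
h = 1e-4
curv = (logf(rho + h) - 2 * logf(rho) + logf(rho - h)).mean() / h**2

print(-1 / curv)                           # Monte Carlo estimate
print((1 - rho**2) ** 2 / (1 + rho**2))    # Pearson's per-observation variance
```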
For the second form of the statistic, let's drop the assumption of bivariate normality and consider the regression of $Y$ on $X$ with normally distributed error $\varepsilon$: if\begin{equation}Y=\alpha+\beta X+\varepsilon\end{equation}and\begin{equation}\sigma^2_Y=\beta^2\sigma^2_X+\sigma^2,\end{equation} then according to the relationship $\beta=\rho\frac{\sigma_Y}{\sigma_X}$, it must be the case that the error variance is $\sigma^2=\left(1-\rho^2\right)\sigma^2_Y$. The likelihood of the errors is then\begin{equation}\mathcal{L}\left(\varepsilon\right)\sim\prod_i\exp{\left(-\frac{\left(Y_i-\alpha-\beta X_i\right)^2}{2\sigma^2}\right)},\end{equation}so that the log-likelihood is\begin{equation}\ell=-\sum_i\frac{\left(Y_i-\alpha-\beta X_i\right)^2}{2\left(1-\rho^2\right)\sigma_Y^2}.\end{equation}After substituting the maximum-likelihood estimate $\hat{\alpha}=\overline{Y}-\beta\overline{X}$, the first two derivatives are\begin{equation}\frac{\partial\ell}{\partial \beta}=\sum_i\frac{\left(Y_i-\overline{Y}-\beta\left(X_i-\overline{X}\right)\right)\left(X_i-\overline{X}\right)}{\left(1-\rho^2\right)\sigma_Y^2}\end{equation}and\begin{equation}\frac{\partial^2\ell}{\partial\beta^2}=-\sum_i\frac{\left(X_i-\overline{X}\right)^2}{\left(1-\rho^2\right)\sigma_Y^2}.\end{equation}Now, making the substitution $\beta=\rho\frac{\sigma_Y}{\sigma_X}$ and evaluating at $\rho=\hat{\rho}=r$ gives\begin{equation}\frac{-1}{\frac{\partial^2\ell}{\partial\beta^2}\bigr|_{\beta=\hat{\beta}}}=\frac{-1}{\frac{\partial^2\ell}{\partial\rho^2}\bigr|_{\rho=\hat{\rho}}}\frac{\sigma^2_Y}{\sigma^2_X}=\frac{\sigma^2_Y\left(1-r^2\right)}{\sigma^2_X\left(n-2\right)},\end{equation}whence the sampling variance of the measured correlation coefficient $\hat{\rho}=r$ is\begin{equation}\sigma^2_r=\frac{1-r^2}{n-2},\end{equation}in which we have taken $\sum_i\left(X_i-\overline{X}\right)^2=\left(n-2\right)\sigma^2_X$, losing two degrees of freedom to the estimation of the two parameters $\alpha$ and $\beta$. Finally, we can form a t-statistic to test the hypothesis that $r=0$ using\begin{equation}t=\frac{r}{\sigma_r}=r\sqrt{\frac{n-2}{1-r^2}}.\end{equation}For an alternate derivation, see *The Analysis of Physical Measurements* by Pugh and Winslow, pp. 193-199, cited in "A brief note on the standard error of the Pearson correlation."
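The equality between this t-statistic and the usual t-test of the regression slope is exact in ordinary least squares, and is easy to confirm numerically. A short Python sketch (using `scipy.stats.linregress` on simulated data of my choosing):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 40
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)

res = stats.linregress(x, y)
t_slope = res.slope / res.stderr                               # t-test of beta = 0
t_corr = res.rvalue * np.sqrt((n - 2) / (1 - res.rvalue**2))   # t built from r

print(t_slope, t_corr)                     # identical up to rounding
```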
A comparison of the two formulas (taking $\rho=r$) shows\begin{equation}\frac{\sigma^2_{\rho}}{n}=\frac{\left(1-\rho^2\right)^2}{n\left(1+\rho^2\right)}<\frac{1-r^2}{n-2}=\sigma^2_r.\end{equation}In other words, the maximum-likelihood estimate under the full bivariate normal model has a smaller sampling variance than the estimate obtained via the regression slope. It should also be pointed out that Pearson's formula holds only for bivariate normal variables, while the standard error of $r$ is valid for any linear regression with normally distributed errors. Note, however, that the test of whether $r=0$ is equivalent to the test of whether $\beta=0$, and so does not really tell us anything new.
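To get a feel for the sizes involved, a quick Python comparison (the values of $n$ and $\rho$ are arbitrary); the Pearson variance is the smaller of the two for every $\rho$:

```python
# Both variances at rho = r, for a sample of size n = 30
n = 30
for rho in (0.0, 0.3, 0.6, 0.9):
    pearson = (1 - rho**2) ** 2 / (n * (1 + rho**2))
    regression = (1 - rho**2) / (n - 2)
    print(f"rho={rho:.1f}  Pearson={pearson:.5f}  regression={regression:.5f}")
```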