I understand the sampling distribution of unstandardized linear regression coefficients is normal, and therefore a t distribution can be used to determine p values for given coefficient and standard error values. I don’t understand how t distributions can be used for standardized regression, though. It’s my understanding the sampling distribution of Pearson’s r is skewed and not a t distribution. In standardized regression the regression coefficient and Pearson’s r will be equal, so how can a t distribution be used to compute p values for standardized regression coefficients? Wouldn’t a t distribution be the wrong distribution to use?
- I suspect you might be confusing two different forms of standardization. If you explain what you mean by "standardized regression" and why you think the regression coefficient is the same as Pearson's correlation, that might help people to unravel your question. BTW, the t distribution is used to test hypotheses about the Pearson correlation, but the correlation coefficient has to be transformed first: $r \sqrt{(n-2)/(1-r^2)} \sim t_{n-2}$. – Gordon Smyth, Jun 1, 2021 at 7:59
- Hi Gordon. It's my understanding that in "standardized regression" the correlation coefficient is equal to the regression coefficient. I mean standardization in the sense of converting the raw values of a dataset to z-scores. I'm unsure why you say this isn't true for standardized regression. Empirically and mathematically it is the case: since the standard deviations of x and y are both one, they cancel in the slope equation, leaving the slope equal to the correlation. – user3138766, Jun 2, 2021 at 15:24
- I didn't say anything was untrue; I just asked you to clarify what you meant by "standardized regression", since that term does not have an agreed meaning in statistics. I agree that if you are doing simple linear regression and you standardize x and y to have mean 0 and sd 1, then the regression coefficient will equal the correlation. Skewness is no problem here: Pearson's correlation and the t statistic both have symmetric distributions if the true correlation is zero, and both have skewed distributions if the correlation is nonzero. So there is no conflict. – Gordon Smyth, Jun 3, 2021 at 5:04
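As a numerical aside (not part of the original thread): both claims in the comments — that the slope of a standardized simple regression equals Pearson's r, and that $r \sqrt{(n-2)/(1-r^2)}$ is the t statistic used to test the correlation — can be checked with a short Python sketch. The data-generating values and variable names here are my own illustration, not from the thread.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 2 + 0.5 * x + rng.normal(size=n)

# Sample Pearson correlation.
r = np.corrcoef(x, y)[0, 1]

# Standardize both variables to z-scores (mean 0, sd 1).
zx = (x - x.mean()) / x.std(ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)

# OLS slope of zy on zx; for standardized variables this equals r.
slope = np.polyfit(zx, zy, 1)[0]

# The transformed correlation, which follows a t distribution
# with n - 2 degrees of freedom under the null of zero correlation.
t_from_r = r * np.sqrt((n - 2) / (1 - r**2))

print(slope, r)      # the two agree to floating-point precision
print(t_from_r)
```

The agreement between `slope` and `r` is exact (up to rounding), since standardizing removes the ratio of standard deviations from the slope formula.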
1 Answer
The t statistic divides the estimate by its standard error, which dilates the result from the bounded $[-1, 1]$ range to the unbounded $(-\infty, \infty)$ range. You can prove the t distribution result mathematically, but I suspect a simple simulation will be as convincing and perhaps more illustrative:
    set.seed(12345)
    n = 100
    b0 = 2
    b1 = 0.5
    x = rnorm(n)
    y = b0 + b1*x + rnorm(n)
    
    # Correlation = 0.5502:
    cor(x, y)
    
    x.s = scale(x)
    y.s = scale(y)
    
    # t stat for original b1 is 6.523:
    summary(lm(y ~ x))
    
    # standardized b1 = corr = 0.5502,
    # t stat for standardized b1 = 6.523:
    summary(lm(y.s ~ x.s))
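For readers without R, here is a rough Python equivalent of the simulation above, computing the slope and its t statistic by hand rather than via `lm`; the helper `slope_t` and the specific seed are my own, so the printed numbers will differ from the R output, but the key identities (standardized slope = r, and the t statistic unchanged by standardization) hold the same way.

```python
import numpy as np

def slope_t(x, y):
    """OLS slope of y on x and its t statistic (simple regression)."""
    n = len(x)
    xc = x - x.mean()
    b1 = (xc @ (y - y.mean())) / (xc @ xc)
    resid = y - y.mean() - b1 * xc          # residuals after fitting intercept and slope
    s2 = (resid @ resid) / (n - 2)          # residual variance estimate
    se = np.sqrt(s2 / (xc @ xc))            # standard error of the slope
    return b1, b1 / se

rng = np.random.default_rng(12345)
n = 100
x = rng.normal(size=n)
y = 2 + 0.5 * x + rng.normal(size=n)

# Standardize both variables to z-scores.
zx = (x - x.mean()) / x.std(ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)

b_raw, t_raw = slope_t(x, y)
b_std, t_std = slope_t(zx, zy)
r = np.corrcoef(x, y)[0, 1]

print(b_std, r)      # standardized slope equals Pearson's r
print(t_raw, t_std)  # identical t statistics before and after standardizing
```

The t statistic is invariant to linearly rescaling x and y because the slope and its standard error scale by the same factor, which is why the same t distribution applies to raw and standardized coefficients alike.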