23
$\begingroup$

A standard error is the estimated standard deviation $\hat \sigma(\hat\theta)$ of an estimator $\hat\theta$ for a parameter $\theta$.

Why is the estimated standard deviation of the residuals called "residual standard error" (e.g., in the output of R's summary.lm function) and not "residual standard deviation"? What parameter estimate do we equip with a standard error here?

Do we consider each residual as an estimator for "its" error term and estimate the "pooled" standard error of all these estimators?

$\endgroup$
7
  • 6
    $\begingroup$ I think that's an R thing. I don't think other software necessarily uses that phrasing, & 'residual standard deviation' is common in textbooks, e.g. I don't have an answer, but I always thought it was weird that R uses that phrase. $\endgroup$ Commented Apr 1, 2015 at 20:00
  • $\begingroup$ @gung: that could be the explanation! When googling "residual standard error" in quotes I get only 0.1% of the hits than without quotes... $\endgroup$ Commented Apr 1, 2015 at 20:03
  • $\begingroup$ I could put that as a (non-)answer, if you'd prefer. $\endgroup$ Commented Apr 1, 2015 at 20:05
  • 1
    $\begingroup$ @gung it's funny how using specific software shapes your thinking: I'd never call it "residual sd" - residuals are not data but errors, so "residual error" seems to be the proper name. But if you think about it, it really seems to be an R thing. $\endgroup$ Commented Apr 1, 2015 at 20:09
  • 2
    $\begingroup$ @Tim, it might correctly be considered an estimate of the standard deviation of the errors, but the residuals are not technically the errors themselves. Nor is it the standard error of the error SD, for what that's worth. $\endgroup$ Commented Apr 1, 2015 at 20:17

7 Answers

3
$\begingroup$

As mentioned in a comment by NRH on one of the other answers, the documentation for stats::sigma says:

The misnomer “Residual standard error” has been part of too many R (and S) outputs to be easily changed there.

This tells me that the developers know this terminology to be bogus. However, since it has crept into the software, changing to the correct terminology is difficult and not worth the trouble when experienced statisticians know what is meant.

$\endgroup$
0
15
$\begingroup$

I think that phrasing is specific to R's summary.lm() output. Notice that the underlying value is actually called "sigma" (summary.lm()$sigma). I don't think other software necessarily uses that name for the standard deviation of the residuals. In addition, the phrasing 'residual standard deviation' is common in textbooks, for instance. I don't know how that came to be the phrasing used in R's summary.lm() output, but I always thought it was weird.
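A hedged sketch of the distinction (in Python rather than R, purely as an illustration): the value R reports as "Residual standard error" divides the residual sum of squares by n − p, whereas sd() of the residuals divides by n − 1, so the two differ slightly. The data below are made up.

```python
import math

# Tiny made-up data set for a straight-line fit
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n, p = len(x), 2  # p = number of estimated coefficients (intercept + slope)

# Ordinary least squares by hand
xbar, ybar = sum(x) / n, sum(y) / n
slope = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
         / sum((xi - xbar) ** 2 for xi in x))
intercept = ybar - slope * xbar

residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
rss = sum(r ** 2 for r in residuals)

sigma_hat = math.sqrt(rss / (n - p))  # what summary.lm() reports as "Residual standard error"
sd_resid = math.sqrt(rss / (n - 1))   # what sd(reg$residuals) would give (mean residual is 0)
print(sigma_hat, sd_resid)
```

For a fit with an intercept the residuals sum to zero, so the only difference between the two numbers is the n − p versus n − 1 divisor.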

$\endgroup$
3
  • 1
    $\begingroup$ How is summary.lm(reg)$sigma different from sd(reg$residuals)? $\endgroup$ Commented Feb 18, 2016 at 6:23
  • 3
    $\begingroup$ @AndréTerra, the correct degrees of freedom is n - p, which is what summary uses. sd uses var which uses n - 1 degrees of freedom. If you manually compute the standard deviation of the residuals dividing by n - p then you will get the same answer as what summary provides. $\endgroup$ Commented Sep 15, 2016 at 17:04
  • 4
    $\begingroup$ To corroborate gung, I cite from the R documentation of stats::sigma: The misnomer “Residual standard error” has been part of too many R (and S) outputs to be easily changed there. $\endgroup$ Commented Oct 5, 2016 at 20:25
3
$\begingroup$

From my econometrics training, it is called "residual standard error" because it is an estimate of the actual "residual standard deviation". See this related question that corroborates this terminology.

A Google search for the term residual standard error also turns up a lot of hits, so it is by no means an R oddity. I tried both terms in quotes, and each shows up roughly 60,000 times.

$\endgroup$
3
  • 1
    $\begingroup$ Interesting. But why would you call an estimate of a standard deviation of any random variable (like an error term; and not a specific estimator) a "standard error"? $\endgroup$ Commented Apr 2, 2015 at 6:41
  • $\begingroup$ My thinking is we need to have a name for the estimate (to distinguish it from the actual value), and any name is as good as another. But surely someone more knowledgeable about the etymology can offer a better reason. Note that there is definitely a parallel with the coefficient standard error, which is the estimate of the coefficient estimate's standard deviation. $\endgroup$ Commented Apr 2, 2015 at 15:11
  • $\begingroup$ "A Google search for the term residual standard error also shows up a lot of hits, so it is by no means an R oddity." This argument ignores the fact that R is (one of) the most popular programming language for statistics. $\endgroup$ Commented Jun 1, 2023 at 10:10
3
$\begingroup$

This is a really, really confusing use of the term "standard error". I teach Introductory Statistics at a college, and this is one of the most confusing details in R for students (along with R using standard deviation and not variance in its various pnorm, qnorm, etc. commands).

A standard error, in the statistical sense, is defined as "the standard deviation of an estimator/statistic". It is a resampling concept: the standard error of the slope estimate, for example. If you were to resample your data, the estimate of the slope would vary, and this type of standard deviation is what we call a standard error.

But the standard deviation of the residuals is not a resampling concept – it is directly observable in the data. So what R reports as "residual standard error" really is an "estimated standard deviation of the residuals". It is like the difference between $s$ (an estimated standard deviation) and $\sigma$ (a true/theoretical standard deviation), not the difference between $\sigma$ (a true/theoretical standard deviation) and $\sigma/\sqrt{n}$ (a true/theoretical standard error).
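A toy numeric illustration of this distinction (a Python sketch with simulated data, not part of any R output): the sample standard deviation $s$ estimates $\sigma$ and does not shrink with $n$, while the standard error of the mean, $s/\sqrt{n}$, measures resampling variability of an estimator and does shrink with $n$.

```python
import math
import random

random.seed(0)
n = 10_000
# Simulated sample from N(mu = 5, sigma = 2)
sample = [random.gauss(5.0, 2.0) for _ in range(n)]

mean = sum(sample) / n
# s estimates sigma = 2; it stays near 2 no matter how large n gets
s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
# The standard error of the sample mean shrinks like 1/sqrt(n)
se_mean = s / math.sqrt(n)
print(s, se_mean)
```

Here $s \approx 2$ while the standard error of the mean is about 0.02; calling the first quantity a "standard error" conflates these two roles.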

$\endgroup$
0
$\begingroup$

Put simply, the standard error of the sample is an estimate of how far the sample mean is likely to be from the population mean, whereas the standard deviation of the sample is the degree to which individuals within the sample differ from the sample mean.

Standard error - Wikipedia, the free encyclopedia

$\endgroup$
1
  • 7
    $\begingroup$ This is true, but does not actually answer the question. What R calls the "residual standard error" is not "an estimate of how far the sample mean is likely to be from the population mean". $\endgroup$ Commented Apr 1, 2015 at 20:03
0
$\begingroup$

A fitted regression model uses the parameters to generate point predictions, which (when the linear model is true) are the means of the responses you would observe if you replicated the study with the same $X$ values an infinite number of times.

The differences between these predicted values and the observed responses used to fit the model are called "residuals"; across replications of the data-collection process they behave as random variables with mean 0. The observed residuals are then used to estimate the variability of these values and the sampling distribution of the parameter estimates.

Note:

When the residual standard error is exactly 0 then the model fits the data perfectly (likely due to overfitting).

If the residual standard error cannot be shown to be significantly different from the variability in the unconditional response, then there is little evidence to suggest the linear model has any predictive ability.

$\endgroup$
1
  • 1
    $\begingroup$ Good attempt at explaining it but this story is not compatible with the definition of the standard error. The standard error is the standard deviation of the sampling distribution of the mean, but the distribution of the residuals is not a sampling distribution. Specifically, the standard error depends on N but the standard deviation of the residuals does not. $\endgroup$ Commented Oct 22, 2020 at 17:17
0
$\begingroup$

For the nls (nonlinear least squares fit) R function, the "Residual standard error" seems to be:

$$\sqrt{\frac{\mathrm{RSS}}{n-p}}$$

where RSS is the "residual sum-of-squares", n is the number of observations, and p is the number of estimated parameters. There is no description of this in the documentation; this assumption is based on a "numerical experiment".
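This conjectured formula can be sketched as a small helper (a Python stand-in, not R code; the function name is mine): given the residual vector from some fit with p estimated parameters, compute $\sqrt{\mathrm{RSS}/(n-p)}$.

```python
import math

def residual_standard_error(residuals, p):
    """sqrt(RSS / (n - p)): the conjectured formula behind the reported value."""
    n = len(residuals)
    rss = sum(r ** 2 for r in residuals)
    return math.sqrt(rss / (n - p))

# Hypothetical residuals from a 2-parameter fit
res = [0.3, -0.1, 0.2, -0.4, 0.1, -0.1]
rse = residual_standard_error(res, p=2)
print(rse)
```

With these six residuals RSS is 0.32 and n − p is 4, so the value is $\sqrt{0.08} \approx 0.283$.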

$\endgroup$
