6

Why is standard error of an estimator $\hat \theta$ defined as $$se = \sqrt{Var(\hat \theta)}$$ and not $$se = \sqrt {MSE(\hat \theta)} = \sqrt{Bias^2(\hat \theta) + Var(\hat \theta)}.$$

That is, the standard error should be the square root of the mean squared error. Of course, if the estimator is unbiased, there is no difference. But in every case I can think of where we use the standard error, if the estimator is biased, that bias needs to be part of the error.

For example, consider performing the Wald test. We can always come up with an estimator of $\sigma^2$ with arbitrarily low variance if we are willing to increase the bias. For example, given $\hat \sigma^2$, defining $$\hat \sigma_1^2 = (1-t)\hat \sigma^2 + tk$$ for constants $t \in (0,1)$ and $k$ gives such an estimator, since $Var(\hat \sigma_1^2) = (1-t)^2 Var(\hat \sigma^2)$ can be made as small as we like by taking $t$ close to $1$. If we use this to perform the Wald test, we can get whatever $\alpha$ we desire simply by lowering the se, without actually improving the test.
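
To see the numbers, here is a minimal simulation sketch (Python; the choices $t = 0.9$, $k = 0.01$, $n = 30$ and the $N(0,1)$ data are mine, purely for illustration): plugging $\hat \sigma_1^2$ into the usual Wald statistic for $H_0: \mu = 0$ shrinks the reported se and pushes the type I error far past the nominal 5%.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 30, 20_000
t, k = 0.9, 0.01          # heavy shrinkage toward a constant far below the true sigma^2 = 1
z = 1.96                  # nominal 5% two-sided critical value

rej_plain = rej_shrunk = 0
for _ in range(reps):
    x = rng.normal(0.0, 1.0, n)          # H0: mu = 0 is true
    s2 = x.var(ddof=1)                   # usual (unbiased) variance estimate
    s2_shrunk = (1 - t) * s2 + t * k     # the biased, low-variance estimator above
    xbar = x.mean()
    rej_plain += abs(xbar / np.sqrt(s2 / n)) > z
    rej_shrunk += abs(xbar / np.sqrt(s2_shrunk / n)) > z

print(f"type I error with honest se: {rej_plain / reps:.3f}")   # close to 0.05
print(f"type I error with shrunk se: {rej_shrunk / reps:.3f}")  # far above 0.05
```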

This problem would be solved if the definition of se included bias, and this would be more consistent with the words "standard error". Why don't we do that?


Update - Relevance for Hypothesis Testing

Terminology aside, there is an impactful question here: in cases where our estimator is indeed biased, should we use the standard error or the RMSE-based definition above in hypothesis testing? There are cases where this will make a difference in the test result.


2 Answers

10

We want named concepts for both things (and they go back a long way)

The reason that the quantity represented by "standard error" has a name is that we want named concepts for both quantities rather than just one of them. One is called the "standard error" of the estimator and the other the "root-mean-squared-error" of the estimator. It is okay to have named concepts for both of these things and to keep in mind what each concept does and doesn't imply.

As to why that particular name is used, the answer is largely historical. The term "standard error" seems to have been introduced in Yule (1897) and then used in his later introductory statistics text Yule (1911). The latter was a popular textbook for statistics in the early twentieth century and so the name stuck. As to the "root-mean-squared-error", this goes back even further. The term "mean error" appears in Gauss (1821) to describe what we would now call the root-mean-squared-error.$^\dagger$

Contrary to what you seem to assume in your question, the mere fact that we have names for both these concepts, and that they sound a bit similar to one another, does not lead to any serious confusion in the profession. It is well-known that lower "standard error" does not necessarily imply a better estimator. (Indeed, it is well-known that a constant is an estimator with zero standard error and is a really crappy estimator!) As with any long-standing discipline with nomenclature derived from historical processes, there are some areas in mathematics and statistics where it might be nice to rename and redefine some things, to have nomenclature that fits more smoothly. This is not really one of them --- statisticians and other experienced statistical users do not make the mistake you are suggesting, so they do not really see this as a "problem" that needs to be "solved".


$^\dagger$ Link is to the 1995 English translation of Gauss (1821). Interestingly, Gauss actually called this quantity "the mean error to be feared" with "mean error" being the shortened version. What a wonderful name! Perhaps we should make a push to reintroduce that nomenclature!

3
  • I can accept that names can be due to historical circumstances: not ideal, but not a problem in practice. But in this case it actually makes a difference to the results of hypothesis testing. Can you please take a look at my update to the question? Commented Jan 5 at 23:04
  • You assert relevance for hypothesis testing, but that sounds wrong to me. A properly constructed hypothesis test uses the p-value for its outcome, which should take into account any relevant bias, etc. (assuming that the test statistic is even based on an estimator). It is unclear to me why you think that hypothesis testing would be adversely affected by the existing concept of standard error. Commented Jan 5 at 23:27
  • I agree that a properly constructed hypothesis test should use the p-value and take into account bias, but the standard formulas do not. They just use the standard error of the estimator: see, for example, Wikipedia's formula for the Wald test. Commented Jan 6 at 1:21
8

It can't be wrong (it's a definition), and it can't really be changed (it is too standard), so the question is whether it is helpful in some way or a regrettable historical misstep (like the terms 'error' and 'regression').

I think it is actually a helpful definition, because it matters for interval estimation, where you do need to consider bias and variability separately. That's where we typically report and use the standard error (in contrast to the MSE or RMSE as summaries of point prediction).
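
To spell this out (my gloss, not part of the answer): in an interval estimate, an estimated bias recenters the interval while the se alone governs its width, $$\hat \theta - \widehat{Bias}(\hat \theta) \;\pm\; z_{1-\alpha/2}\, se(\hat \theta),$$ whereas folding the bias into the se would widen the interval symmetrically, adding slack on the side where the estimator does not err.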

A second reason is that when we have good reason to expect the bias is small relative to the uncertainty (a very common situation for estimators of smooth finite-dimensional parameters), we can estimate the standard error without needing to estimate the bias. Estimating the bias (at least in data analysis, rather than in simulations) is much harder.
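
As a concrete instance (a worked example I am adding, not from the answer): for the Gaussian variance MLE $\hat \sigma^2_{ML} = \frac{1}{n}\sum_i (X_i - \bar X)^2$, $$Bias(\hat \sigma^2_{ML}) = -\frac{\sigma^2}{n}, \qquad se(\hat \sigma^2_{ML}) = \frac{\sigma^2 \sqrt{2(n-1)}}{n} \approx \sigma^2 \sqrt{\frac{2}{n}},$$ so $|Bias|/se \approx 1/\sqrt{2n} \to 0$: the bias is swamped by the uncertainty, and a plug-in estimate of the se suffices without estimating the bias at all.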

2
  • I am not sure it is helpful for a non-symmetrically distributed estimator, as confidence intervals are also asymmetric. Take, for example, estimating the variance $\sigma^2$ of a $N(\mu, \sigma^2)$ sample of size $n$. While $\frac{1}{n-1}\sum(X_i-\bar X)^2$ may be an unbiased estimator, the MLE is $\frac{1}{n}\sum(X_i-\bar X)^2$ and the MSE-minimising estimator is $\frac{1}{n+1}\sum(X_i-\bar X)^2$. None of these is in the middle of a good confidence interval, so implicitly suggesting that the interval should be an estimator $\pm$ some SEs is not really a good idea. (A simulation comparing these three estimators appears after these comments.) Commented Jan 5 at 23:01
  • Regardless of the shape of the interval, it's still going to scale with the standard error (at least approximately), just on equivariance grounds. Otherwise you'd get a different interval if you used different units. Commented Jan 5 at 23:51
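
A quick numerical check of the three divisors in the first comment above (a simulation sketch I am adding; with $\sigma^2 = 1$ and $n = 10$ the theoretical MSEs are $2/(n-1) \approx 0.222$, $(2n-1)/n^2 = 0.19$, and $2/(n+1) \approx 0.182$):

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 10, 200_000
x = rng.normal(5.0, 1.0, size=(reps, n))                     # N(mu=5, sigma^2=1) samples
ss = ((x - x.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)  # sums of squared deviations

for d in (n - 1, n, n + 1):   # unbiased, ML, and minimum-MSE divisors
    est = ss / d
    print(f"divisor {d:2d}: bias {est.mean() - 1.0:+.4f}, MSE {((est - 1.0) ** 2).mean():.4f}")
```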
