Is the variance of the mean of a set of possibly dependent random variables less than the average of their respective variances?

Question

Is the variance of the mean of a set of possibly dependent random variables less than or equal to the average of their respective variances?

Mathematically, given random variables $X_1, X_2, ..., X_n$ that may be dependent:

Let $\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$ be the mean of these random variables.

Is it true that:

$$\text{Var}(\bar{X}) \leq \frac{1}{n}\sum_{i=1}^n \text{Var}(X_i)$$

I know that for independent random variables, we have the following equality:

$$\text{Var}(\bar{X}) = \frac{1}{n^2}\sum_{i=1}^n \text{Var}(X_i)$$

This clearly satisfies the inequality, since $\frac{1}{n^2} \le \frac{1}{n}$ and variances are nonnegative. However, I'm unsure whether it holds for dependent variables.
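The independent-case formula can be checked exactly with a small sketch. The three discrete distributions below are hypothetical. Their joint distribution is built as the product of the marginals, which is what independence means here, so every quantity is computed exactly with `fractions.Fraction` rather than estimated by simulation.

```python
from fractions import Fraction
from itertools import product


def variance(dist):
    """Variance of a discrete distribution given as {value: probability}."""
    mean = sum(p * x for x, p in dist.items())
    return sum(p * (x - mean) ** 2 for x, p in dist.items())


def var_of_mean_independent(dists):
    """Exact Var of the average of independent discrete variables,
    obtained by enumerating their product (joint) distribution."""
    n = len(dists)
    outcomes = []  # (value of the average, probability) pairs
    for combo in product(*(d.items() for d in dists)):
        prob, total = Fraction(1), Fraction(0)
        for x, p in combo:
            prob *= p
            total += x
        outcomes.append((total / n, prob))
    mean = sum(p * m for m, p in outcomes)
    return sum(p * (m - mean) ** 2 for m, p in outcomes)


# Hypothetical example distributions (value -> probability).
dists = [
    {0: Fraction(1, 2), 1: Fraction(1, 2)},
    {-1: Fraction(1, 4), 2: Fraction(3, 4)},
    {0: Fraction(1, 3), 3: Fraction(1, 3), 6: Fraction(1, 3)},
]
n = len(dists)
lhs = var_of_mean_independent(dists)
rhs = sum(variance(d) for d in dists) / n**2
print(lhs, rhs, lhs == rhs)  # → 127/144 127/144 True
```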

If this inequality is true, is there a proof or intuitive explanation?

If it's not always true, are there conditions under which it holds? What about the following inequality? $$\text{Var}(\bar{X}) \leq \max_{1 \le i \le n} \text{Var}(X_i)$$

Any insights, proofs, or counterexamples would be greatly appreciated. Thank you!

Amir · Accepted Answer · 2024-07-08 18:31:56Z

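Yes, it is true. Here is a proof. The first inequality is the Cauchy–Schwarz bound $\operatorname{Cov}(X_i,X_j)\le\sqrt{\operatorname{Var}(X_i)\operatorname{Var}(X_j)}$, and the second is the AM–GM inequality $\sqrt{ab}\le\frac{a+b}{2}$: $$ \begin{align} \operatorname{Var}(\overline{X}) &= \frac1{n^2}\operatorname{Var}\left(\sum_{i=1}^n X_i\right) \\ &=\frac1{n^2}\sum_{i=1}^n\sum_{j=1}^n\operatorname{Cov}(X_i,X_j) \\ &\le\frac1{n^2}\sum_{i=1}^n\sum_{j=1}^n\sqrt{\operatorname{Var}(X_i)\operatorname{Var}(X_j)} \\ &\le\frac1{n^2}\sum_{i=1}^n\sum_{j=1}^n\frac{\operatorname{Var}(X_i)+\operatorname{Var}(X_j)}{2} \\ &= \frac1n\sum_{i=1}^n \operatorname{Var}(X_i). \end{align} $$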
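Here is a sketch that evaluates all four quantities in the chain above on a concrete case. It assumes the hypothetical model $X = AZ$, where $Z$ has uncorrelated unit-variance components, so that $\operatorname{Cov}(X_i,X_j)=\Sigma_{ij}$ with $\Sigma = AA^\top$. The matrix `A` is arbitrary, and the function names are only illustrative.

```python
import math


def cov_matrix(A):
    """Sigma = A A^T, the covariance matrix of X = A Z when Z has
    uncorrelated, unit-variance components (a hypothetical model)."""
    n, k = len(A), len(A[0])
    return [[sum(A[i][m] * A[j][m] for m in range(k)) for j in range(n)]
            for i in range(n)]


def proof_chain(S):
    """The four quantities in the chain: Var(mean), the Cauchy-Schwarz
    bound, the AM-GM bound, and the average variance."""
    n = len(S)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    var_mean = sum(S[i][j] for i, j in pairs) / n**2
    cs_bound = sum(math.sqrt(S[i][i] * S[j][j]) for i, j in pairs) / n**2
    amgm_bound = sum((S[i][i] + S[j][j]) / 2 for i, j in pairs) / n**2
    avg_var = sum(S[i][i] for i in range(n)) / n
    return var_mean, cs_bound, amgm_bound, avg_var


A = [[1, 2, 0], [3, -1, 1], [0, 1, 2], [2, 2, -1]]  # arbitrary example
S = cov_matrix(A)
var_mean, cs_bound, amgm_bound, avg_var = proof_chain(S)
print(var_mean, cs_bound, amgm_bound, avg_var)
```

For this `A` the chain reads roughly $3.5 \le 7.27 \le 7.5 = 7.5$. An average never exceeds a maximum, so the result also lies below $\max_i \operatorname{Var}(X_i) = 11$, which is the second inequality asked about.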

Abezhiko · Accepted Answer · 2024-07-07 18:31:50Z

In general, one has
$$ \operatorname{Var}\left(\sum_{k=1}^n X_k\right) = \sum_{i,j=1}^n \operatorname{Cov}(X_i,X_j). $$
The well-known inequality $ab \le \frac{1}{2}(a^2+b^2)$, which is just $(a-b)^2 \ge 0$, can be applied pointwise to $a = X-\Bbb{E}[X]$ and $b = Y-\Bbb{E}[Y]$ before taking expectations. This allows us to write
$$ \begin{align} \operatorname{Cov}(X,Y) &= \Bbb{E}\left[(X-\Bbb{E}[X])(Y-\Bbb{E}[Y])\right] \\ &\le \frac{1}{2}\Bbb{E}\left[(X-\Bbb{E}[X])^2 + (Y-\Bbb{E}[Y])^2\right] \\ &= \frac{1}{2} \left(\operatorname{Var}(X) + \operatorname{Var}(Y)\right). \end{align} $$
Hence
$$ \operatorname{Var}\left(\sum_{k=1}^n X_k\right) \le \frac{1}{2} \sum_{i,j=1}^n \left(\operatorname{Var}(X_i) + \operatorname{Var}(X_j)\right) = n \sum_{k=1}^n \operatorname{Var}(X_k) $$
and finally
$$ \operatorname{Var}\left(\bar{X}\right) = \frac{1}{n^2} \operatorname{Var}\left(\sum_{k=1}^n X_k\right) \le \frac{1}{n} \sum_{k=1}^n \operatorname{Var}(X_k). $$
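The same argument can be checked exactly on a finite sample space. This sketch treats five hypothetical joint outcomes of $(X_1,X_2,X_3)$ as equally likely. It first verifies the pointwise inequality $ab\le\frac12(a^2+b^2)$ on every outcome, then the covariance bound that results from averaging, and finally the bound on $\operatorname{Var}\left(\sum X_k\right)$.

```python
from fractions import Fraction

# Hypothetical joint outcomes of (X_1, X_2, X_3); each row is equally likely.
rows = [(1, 2, 0), (3, 5, 1), (2, 2, 4), (0, -1, 3), (4, 6, 2)]
n, N = len(rows[0]), len(rows)
means = [Fraction(sum(r[k] for r in rows), N) for k in range(n)]


def cov(i, j):
    """Covariance of X_i and X_j under the uniform distribution on rows."""
    return sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / N


for i in range(n):
    for j in range(n):
        # Pointwise: a*b <= (a^2 + b^2)/2 holds on every outcome ...
        for r in rows:
            a, b = r[i] - means[i], r[j] - means[j]
            assert a * b <= (a * a + b * b) / 2
        # ... so averaging over outcomes gives Cov <= (Var X_i + Var X_j)/2.
        assert cov(i, j) <= (cov(i, i) + cov(j, j)) / 2

var_of_sum = sum(cov(i, j) for i in range(n) for j in range(n))
bound = n * sum(cov(k, k) for k in range(n))
print(var_of_sum, bound)  # → 354/25 762/25
```

For these data the bound is far from tight: $\operatorname{Var}(X_1+X_2+X_3) = 354/25$, while the bound is $762/25$.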

Zoe Allen · Accepted Answer · 2024-07-07 17:55:04Z

A way to see this at a glance is that square-integrable real random variables form an inner product space, with $\langle X,Y \rangle = \mathbb{E}[XY]$ (identifying variables that are equal almost surely). The norm induced by this inner product is $\|X\|^2=\mathbb{E}[X^2]$, and both $X \mapsto \|X\|$ and $X \mapsto \|X\|^2$ are convex functions on any inner product space.

Furthermore, expectation is linear, so $f(X)=X-\mathbb{E}X$ is linear, and a convex function composed with a linear map is still convex.

This shows that $$\operatorname{Var}(X)=\langle X-\mathbb{E}X , X-\mathbb{E}X \rangle$$ is convex. Jensen's inequality for a finite average then gives $\operatorname{Var}\left(\frac1n\sum_{i=1}^n X_i\right) \le \frac1n\sum_{i=1}^n \operatorname{Var}(X_i)$, which is your conjecture.
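The convexity argument gives more than the question asks. For any weights $w_i\ge 0$ with $\sum_i w_i=1$, it gives $\operatorname{Var}\left(\sum_i w_iX_i\right)\le\sum_i w_i\operatorname{Var}(X_i)$, and the uniform weights $w_i=1/n$ recover your inequality. Below is a sketch that checks this exactly on a hypothetical three-outcome probability space with unequal probabilities. The values and probabilities are made up for illustration.

```python
from fractions import Fraction
from itertools import product

# Hypothetical finite probability space: three outcomes with these
# probabilities, and the values of three dependent variables on each outcome.
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
X = [
    [4, -2, 1],   # X_1
    [0, 3, 3],    # X_2
    [5, 5, -7],   # X_3
]


def var(values):
    """Variance of a random variable given by its value on each outcome."""
    m = sum(p * v for p, v in zip(probs, values))
    return sum(p * (v - m) ** 2 for p, v in zip(probs, values))


def mix(weights):
    """The random variable sum_i w_i X_i, evaluated outcome by outcome."""
    return [sum(w * xi[k] for w, xi in zip(weights, X)) for k in range(len(probs))]


def jensen_gap(weights):
    """sum_i w_i Var(X_i) - Var(sum_i w_i X_i); convexity says this is >= 0."""
    return sum(w * var(xi) for w, xi in zip(weights, X)) - var(mix(weights))


n = len(X)
uniform = [Fraction(1, n)] * n  # the plain average from the question
grid = [Fraction(0), Fraction(1, 2), Fraction(1)]
weight_vectors = [list(w) for w in product(grid, repeat=n) if sum(w) == 1]
for w in weight_vectors + [uniform]:
    assert jensen_gap(w) >= 0

print(var(mix(uniform)), jensen_gap(uniform))  # → 2 47/6
```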

I.e., Jensen's inequality. – Mark L. Stone, Commented Aug 10, 2024 at 14:58

Amir · Accepted Answer · 2024-07-08 18:43:27Z

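$$\text{Var}\left(\sum X_i\right) = \sum\limits_i \text{Var}\left( X_i\right) +\sum\limits_i \sum\limits_{j\not=i} \text{Cov}\left( X_i,X_j\right)$$ For fixed variances, this is maximised when the covariances take their largest possible values. Since $\text{Cov}(X_i,X_j)=\rho_{ij}\,\text{SD}(X_i)\,\text{SD}(X_j)$ with correlation $\rho_{ij}\le 1$, this happens when all the correlations are $+1$. That case is attainable, for example by taking $X_i=a_iZ+b_i$ with a common $Z$ and constants $a_i>0$.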

So the highest variance case for $\sum X_i$ and thus $\bar X$ will be when there is perfect positive correlation between the $X_i$, in which case $$\text{SD}(\sum X_i) = \sum \text{SD}(X_i)$$ giving $$\text{Var}(\bar X) = \frac1{n^2} \text{Var}\left(\sum X_i\right)=\left(\frac1n \sum \text{SD}(X_i)\right)^2 .$$
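Here is a small sketch of the claim that perfect positive correlation gives the largest $\text{Var}(\bar X)$. It only searches a restricted, hypothetical family, so it illustrates the claim rather than proving it. The family is a one-factor model in which every pairwise correlation is $+1$ or $-1$, with the standard deviations held fixed.

```python
from fractions import Fraction
from itertools import product

# Hypothetical one-factor model: Z = +1 or -1 with probability 1/2 each
# (so Var Z = 1), and X_i = s_i * a_i * Z + b_i with s_i = +1 or -1.
# Then SD(X_i) = a_i, and corr(X_i, X_j) = s_i * s_j is +1 or -1.
a = [1, 2, 4]
b = [0, 5, -3]  # shifts move the means but not any variance
n = len(a)


def var_of_mean(signs):
    """Var of the average of X_1..X_n, computed over the two outcomes of Z."""
    outcomes = [Fraction(sum(s * ai * z + bi for s, ai, bi in zip(signs, a, b)), n)
                for z in (1, -1)]
    m = sum(outcomes) / 2
    return sum((o - m) ** 2 for o in outcomes) / 2


results = {signs: var_of_mean(signs) for signs in product((1, -1), repeat=n)}
best = max(results.values())
print(best, Fraction(sum(a), n) ** 2)  # → 49/9 49/9
print(sorted(s for s, v in results.items() if v == best))  # → [(-1, -1, -1), (1, 1, 1)]
```

Flipping every sign leaves each product $s_is_j$ unchanged, so both maximisers have all pairwise correlations equal to $+1$. The maximum matches $\left(\frac1n\sum \text{SD}(X_i)\right)^2$.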

Then, using the Cauchy–Schwarz inequality:

$$\left(\frac1n \sum \text{SD}(X_i)\right)^2 \le \frac1n \sum \left(\text{SD}(X_i)^2\right) = \frac{1}{n}\sum \text{Var}(X_i)$$

with equality only when all the $\text{SD}(X_i)$ are equal.
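The Cauchy–Schwarz step and its equality case can also be read off from an elementary identity. Write $s_i=\text{SD}(X_i)$ and $\bar s=\frac1n\sum_{i=1}^n s_i$. Then

$$\frac1n\sum_{i=1}^n s_i^2-\left(\frac1n\sum_{i=1}^n s_i\right)^2=\frac1n\sum_{i=1}^n\left(s_i-\bar s\right)^2\ge 0,$$

which follows by expanding $(s_i-\bar s)^2 = s_i^2 - 2\bar s\,s_i + \bar s^2$ and averaging. The gap is therefore zero exactly when every $s_i$ equals $\bar s$.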

So your $\text{Var}(\bar{X}) \leq \frac{1}{n}\sum \text{Var}(X_i)$ is correct. Equality holds only when $X_i-E[X_i]=X_j-E[X_j]$ (almost surely) for all $i,j$, that is, when the variables have identical variances and perfect positive correlation (though possibly different expectations).
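Here is a quick exact check of the equality condition, sketched on a hypothetical three-point probability space. Shifting a common $Z$ by different constants gives equality. Keeping perfect positive correlation but making the standard deviations unequal leaves a strict gap.

```python
from fractions import Fraction

# Hypothetical probability space: three equally likely outcomes, and a
# common variable Z taking the values -1, 0, 1 on them.
Z = [Fraction(-1), Fraction(0), Fraction(1)]


def var(values):
    """Variance under the uniform distribution on the outcomes."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def gap(columns):
    """Average of Var(X_i) minus Var of the mean; zero means equality."""
    n = len(columns)
    xbar = [sum(col[k] for col in columns) / n for k in range(len(Z))]
    return sum(var(col) for col in columns) / n - var(xbar)


shifted = [[z + c for z in Z] for c in (0, 10, -4)]  # X_i = Z + c_i
scaled = [[s * z for z in Z] for s in (1, 1, 2)]     # corr +1, unequal SDs
print(gap(shifted), gap(scaled))  # → 0 4/27
```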

While the situation described in the first sentence seems plausible, the description does not mathematically justify its correctness. – Greg Martin, Commented Jul 7, 2024 at 17:10
@GregMartin it is well known. I have added an additional introductory line – Henry, Commented Jul 7, 2024 at 17:20
