
For the median ($m$) of a continuous distribution, there is a simple formula for the probability that it falls within the range of a random sample of size $n$: $$ \mathbb{P}(X_{(1)} \leqslant m \leqslant X_{(n)}) = 2^{1-n} \left(2^{n-1}-1\right) $$

Question: Can anything similar be said for the population (arithmetic) mean of a general continuous distribution with a finite mean?
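
(As a quick sanity check of the median formula, here is a small R simulation; the standard normal and the particular $n$ are arbitrary choices, not part of the question:)

    # simulate P(X_(1) <= m <= X_(n)) for N(0,1), whose median is m = 0
    n <- 10; nsim <- 100000; m <- 0
    mean(replicate(nsim, { x <- rnorm(n); min(x) <= m && m <= max(x) }))
    2^(1 - n) * (2^(n - 1) - 1)  # theoretical value: 511/512 = 0.998046875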

Comments:

  • I would be surprised if something equally simple existed, because of the presence of arbitrarily long and fat tails, which will lead to arbitrarily improbable extremal order statistics and arbitrarily large or small means... but have much less influence on the median (which is why this formula works for the median). Commented Sep 18 at 8:53
  • @StephanKolassa Yeah, I suspected the same but can't really formalize it. Thank you. Commented Sep 18 at 9:34
  • To take an extreme example, with a Pareto distribution with shape parameter $\alpha=1$, the expectation (population mean) is $+\infty$, so it is never covered by the range of a sample. Commented Sep 18 at 9:49
  • My immediate idea when presented with a "general distribution" would be to try and apply a Markov or Chernoff bound, and do so in a way that somehow yields the expectation by integrating over the bound for all values of the support. Maybe that could work if you put some mild restriction on the tails…? Commented Sep 18 at 9:49
  • @COOLSerdash If you have a Pareto distribution with shape parameter $\alpha > 1$, so a finite mean, then for any given sample size $n$ you can make the probability that the range of the sample covers the expectation (population mean) as small as you want by reducing $\alpha$ towards $1$. Commented Sep 18 at 13:48
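
(A quick R simulation of the last comment's claim; the particular shape value and sample size are my own choices, not from the comment:)

    # Pareto(shape a, scale 1) via inverse CDF: X = U^(-1/a); mean = a/(a - 1)
    a <- 1.0001; n <- 100; nsim <- 10000
    mu <- a / (a - 1)  # population mean; explodes as a -> 1
    covers <- replicate(nsim, { x <- runif(n)^(-1/a); min(x) <= mu && mu <= max(x) })
    mean(covers)  # roughly 0.01 here: the range almost never covers the mean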

2 Answers

Answer 1 (score 5)

The probability can be arbitrarily low even if we restrict ourselves to reasonably well-behaved distributions (all moments finite, Gaussian tails).


Formally, take any $n > 0$ and $0 < p < 1$; we will find a distribution for which a sample of size $n$ has probability at most $p$ of covering the expected value. Take the random variables:

$$ A \sim N(0, 1), \qquad B \sim N(m, 1) $$

and let the random variable $X$ be a mixture of $A$ with some probability $s$ and of $B$ with probability $1 - s$, where $m > 0$. Then $E(X) = (1 - s)m$ and

$$ \begin{aligned} P(X > E(X)) &= s\,P(A > (1 - s)m) + (1 - s)\,P(B > (1 - s)m) \\ &\leqslant s\,P(A > (1 - s)m) + (1 - s) \end{aligned} $$

where we simply bounded $P(B > (1 - s)m) \leqslant 1$. Moreover, since the sample range fails to cover $E(X)$ whenever all $n$ observations fall on the same side of it,

$$ P(\text{sample of size }n\text{ covers } E(X)) < 1 - P(X < E(X))^n = 1 - (1 - P(X > E(X)))^n $$

By moving $s \to 1$ we can make $1 - s$ arbitrarily small, and given this $s$, we can increase $m$ to make $s\,P(A > (1 - s)m)$ arbitrarily small. So we can make $P(X > E(X))$ arbitrarily small, and hence also make $1 - (1 - P(X > E(X)))^n < p$.
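
(A minimal R sketch of this construction; the particular values of $s$, $m$ and $n$ are my own illustrative choices, not from the argument above:)

    # mixture: X = A ~ N(0,1) with prob. s, X = B ~ N(m,1) with prob. 1 - s
    s <- 0.9999; m <- 1e5; n <- 100; nsim <- 10000
    mu <- (1 - s) * m  # E(X) = 10
    covers <- replicate(nsim, {
      x <- ifelse(runif(n) < s, rnorm(n, 0, 1), rnorm(n, m, 1))
      min(x) <= mu && mu <= max(x)
    })
    mean(covers)  # roughly 0.01: the sample range rarely covers E(X)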

Answer 2 (score 3)

A useful example to ponder is the lognormal, all of whose moments are finite.

We can (i) focus on the right tail, (ii) take $\mu=0$ without loss of generality, and (iii) work with the logs.

Consider, for $n$ i.i.d. variables $\sim N(0,\sigma^2)$, the probability $P(X_{(n)} < \tfrac12\sigma^2)$.

Since the mean of the corresponding lognormal is $e^{\sigma^2/2}$, whose log is $\tfrac12\sigma^2$, this is the same event as the largest order statistic of that lognormal being below its population mean, viewed on the log scale. The probability that the sample range doesn't include the population mean will be strictly larger than this.

The location of the distribution of $X_{(n)}$ grows slowly with $n$ (roughly like the log), and its scale also involves $n$ (slowly getting smaller rather than larger), but here we fix $n$, so we won't need to worry about its behaviour in $n$. The location and scale of the distribution of $X_{(n)}$ are proportional to $\sigma$. Consequently, for any $\varepsilon > 0$, at any given $n$, you can make the probability above larger than $1 - \varepsilon$ by taking $\sigma$ sufficiently large.

A quick illustrative example in R:

    nsim <- 100000; sig <- 9; n <- 100; logmean <- sig^2/2  # log of the lognormal mean
    mean(replicate(nsim, max(rnorm(n, 0, sig)) < logmean))  # estimates P(X_(n) < sigma^2/2)
    [1] 0.99969

For that $\sigma$ and $n$, almost 100% of lognormal sample ranges don't include the lognormal's mean. The corresponding lognormal distribution is extremely skewed and heavy-tailed*, but we're working on the log scale here, so conveniently we don't have to worry much about generating numbers that are too large.

* e.g. for large $\sigma$ the third-moment-based skewness goes like $\exp(\frac{3}{2}\sigma^2)$ (in the sense that the ratio of the skewness to this approximation approaches 1), so this skewness would be very roughly on the order of $10^{52}$ (similarly, the kurtosis is on the order of about $10^{140}$), where 'roughly' and 'on the order of' mean that the exponent is about the right size.
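
(To check those exponents in R: the conversion to base 10 is my own arithmetic, and the $\exp(4\sigma^2)$ rate for the kurtosis is the analogous large-$\sigma$ leading term, stated here as my own addition:)

    # base-10 exponents of the large-sigma approximations, for sig = 9
    sig <- 9
    (3/2) * sig^2 * log10(exp(1))  # ~ 52.8, so skewness ~ 10^52
    4 * sig^2 * log10(exp(1))      # ~ 140.7, so kurtosis ~ 10^140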

Comments:

  • +1. Insightful, thank you. Commented Sep 19 at 6:21
  • I managed to mangle the code in an edit. I think it is okay now. Commented Sep 19 at 8:11
