Revisions to Why does a 95% Confidence Interval (CI) not imply a 95% chance of containing the mean?

added 2 characters in body

edited Feb 23, 2014 at 15:24

616
6
10

On the one hand, before observing the data, $C_\alpha(X)$ is a random set (or random interval) and the probability that "$C_\alpha(X)$ contains the mean $\mu_\theta$" is, at least, (1-\alpha)$(1-\alpha)$ for all $\theta \in \Theta$. This is a very desirable feature for the frequentist paradigm.

deleted 14 characters in body

Source Link

edited Jan 2, 2014 at 15:29

Alexandre Patriota

616
6
10

The concept of population mean is model-dependent. As all models are wrong, but some are useful, this population mean is a fiction that is defined just to provide useful interpretations. The fiction begins with a probability model.

The probability model is defined by the triplet $$(\mathcal{X}, \mathcal{F}, P),$$ where $\mathcal{X}$ is the sample space (a non-empty set), $\mathcal{F}$ is a family of subsets of $\mathcal{X}$ and $P$ is a well-defined probability measure defined over $\mathcal{F}$ (it governs the data behavior). Without loss of generality, consider only the discrete case. The population mean is defined by $$ \mu = \sum_{x \in \mathcal{X}} xP(X=x), $$ that is, it represents the central tendency under $P$ and it can also be interpreted as the center of mass of all points in $\mathcal{X}$, where the weight of each $x \in \mathcal{X}$ is given by $P(X=x)$.

The concept of population mean is model-dependent. As all models are wrong, but some are useful, this population mean is a fiction that is defined just to provide useful interpretations. The fiction begins with a probability model.

In the probability theory, the measure $P$ is considered known, therefore the population mean is accessible through the above simple operation. However, in practice, the probability $P$ is hardly known. Without a probability $P$, one cannot describe the probabilistic behavior of the data. As we cannot set a precise probability $P$ to explain the data behavior, we set a family $\mathcal{M}$ containing probability measures that possibly govern (or explain) the data behavior. Then, the classical statistical model emerges $$(\mathcal{X}, \mathcal{F}, \mathcal{M}).$$ The above model is said to be a parametric model if there exists $\Theta \subseteq \mathbb{R}^p$ with $p< \infty$ such that $\mathcal{M} \equiv \{P_\theta: \ \theta \in \Theta\}$. Let us consider just the parametric model in this post.

The probability model is defined by the triplet $$(\mathcal{X}, \mathcal{F}, P),$$ where $\mathcal{X}$ is the sample space (a non-empty set), $\mathcal{F}$ is a family of subsets of $\mathcal{X}$ and $P$ is a well-defined probability measure defined over $\mathcal{F}$ (it governs the data behavior). Without loss of generality, consider only the discrete case. The population mean is defined by $$ \mu = \sum_{x \in \mathcal{X}} xP(X=x), $$ that is, it represents the central tendency under $P$ and it can also be interpreted as the center of mass of all points in $\mathcal{X}$, where the weight of each $x \in \mathcal{X}$ is given by $P(X=x)$.

Notice that, for each probability measure $P_\theta \in \mathcal{M}$, there is a respective mean definition $$\mu_\theta = \sum_{x \in \mathcal{X}} x P_\theta(X=x).$$ That is, there is a family of population means $\{\mu_\theta: \ \theta \in \Theta\}$ that depends tightly on the definition of $\mathcal{M}$. The family $\mathcal{M}$ is defined by limited humans and therefore it may not contain the true probability measure that governs the data behavior. Actually, the chosen family will hardly contain the true measure, moreover this true measure may not even exist. As the concept of a population mean depends on the probability measures in $\mathcal{M}$, the population mean is model-dependent.

In the probability theory, the measure $P$ is considered known, therefore the population mean is accessible through the above simple operation. However, in practice, the probability $P$ is hardly known. Without a probability $P$, one cannot describe the probabilistic behavior of the data. As we cannot set a precise probability $P$ to explain the data behavior, we set a family $\mathcal{M}$ containing probability measures that possibly govern (or explain) the data behavior. Then, the classical statistical model emerges $$(\mathcal{X}, \mathcal{F}, \mathcal{M}).$$ The above model is said to be a parametric model if there exists $\Theta \subseteq \mathbb{R}^p$ with $p< \infty$ such that $\mathcal{M} \equiv \{P_\theta: \ \theta \in \Theta\}$. Let us consider just the parametric model in this post.

The Bayesian approach considers a prior probability over the subsets of $\mathcal{M}$ (or, equivalently, $\Theta$), but in this post I will concentrated only on the classical version.

Notice that, for each probability measure $P_\theta \in \mathcal{M}$, there is a respective mean definition $$\mu_\theta = \sum_{x \in \mathcal{X}} x P_\theta(X=x).$$ That is, there is a family of population means $\{\mu_\theta: \ \theta \in \Theta\}$ that depends tightly on the definition of $\mathcal{M}$. The family $\mathcal{M}$ is defined by limited humans and therefore it may not contain the true probability measure that governs the data behavior. Actually, the chosen family will hardly contain the true measure, moreover this true measure may not even exist. As the concept of a population mean depends on the probability measures in $\mathcal{M}$, the population mean is model-dependent.

The Bayesian approach considers a prior probability over the subsets of $\mathcal{M}$ (or, equivalently, $\Theta$), but in this post I will concentrated only on the classical version.

b. What is the definition and the purpose of a confidence interval?

As aforementioned, the population mean is model-dependent and provides useful interpretations. However, we have a family of population means, because the statistical model is defined by a family of probability measures (each probability measure generates a population mean). Therefore, based on an experiment, inferential procedures should be employed in order to estimate a small set (interval) containing good candidates of population means. One well-known procedure is the ($1-\alpha$) confidence region, which is defined by a set $C_\alpha$ such that, for all $\theta \in \Theta$, $$ P_\theta(C_\alpha(X) \ni \mu_\theta) \geq 1-\alpha \ \ \ \mbox{and} \ \ \ \inf_{\theta\in \Theta} P_\theta(C_\alpha(X) \ni \mu_\theta) = 1-\alpha, $$ where $P_\theta(C_\alpha(X) = \varnothing) = 0$ (see Schervish, 1995). This is a very general definition and encompasses virtually any type of confidence intervals. Here, $P_\theta(C_\alpha(X) \ni \mu_\theta)$ is the probability that $C_\alpha(X)$ contains $\mu_\theta$ under the measure $P_\theta$. This probability should be always greater than (or equal to) $1-\alpha$, the equality occurs at the worst case.

As aforementioned, the population mean is model-dependent and provides useful interpretations. However, we have a family of population means, because the statistical model is defined by a family of probability measures (each probability measure generates a population mean). Therefore, based on an experiment, inferential procedures should be employed in order to estimate a small set (interval) containing good candidates of population means. One well-known procedure is the ($1-\alpha$) confidence region, which is defined by a set $C_\alpha$ such that, for all $\theta \in \Theta$, $$ P_\theta(C_\alpha(X) \ni \mu_\theta) \geq 1-\alpha \ \ \ \mbox{and} \ \ \ \inf_{\theta\in \Theta} P_\theta(C_\alpha(X) \ni \mu_\theta) = 1-\alpha, $$ where $P_\theta(C_\alpha(X) = \varnothing) = 0$ (see Schervish, 1995). This is a very general definition and encompasses virtually any type of confidence intervals. Here, $P_\theta(C_\alpha(X) \ni \mu_\theta)$ is the probability that $C_\alpha(X)$ contains $\mu_\theta$ under the measure $P_\theta$. This probability should be always greater than (or equal to) $1-\alpha$, the equality occurs at the worst case.

Remark: The readers should notice that it is not necessary to make assumptions on the state of reality, the confidence region is defined for a well-defined statistical model without making reference to any "true" mean. Even if the "true" probability measure does not exist or it is not in $\mathcal{M}$, the confidence region definition will work, since the assumptions are about statistical modelling rather than the states of reality.

Remark: The readers should notice that it is not necessary to make assumptions on the state of reality, the confidence region is defined for a well-defined statistical model without making reference to any "true" mean. Even if the "true" probability measure does not exist or it is not in $\mathcal{M}$, the confidence region definition will work, since the assumptions are about statistical modelling rather than the states of reality.

On the one hand, before observing the data, $C_\alpha(X)$ is a random set (or random interval) and the probability that "$C_\alpha(X)$ contains the mean $\mu_\theta$" is, at least, (1-\alpha) for all $\theta \in \Theta$. This is a very desirable feature for the frequentist paradigm.

On the one hand, before observing the data, $C_\alpha(X)$ is a random set (or random interval) and the probability that "$C_\alpha(X)$ contains the mean $\mu_\theta$" is, at least, (1-\alpha) for all $\theta \in \Theta$. This is a very desirable feature for the frequentist paradigm.

On the other hand, after observing the data $x$, $C_\alpha(x)$ is just a fixed set and the probability that "$C_\alpha(x)$ contains the mean $\mu_\theta$" should be in {0,1} for all $\theta \in \Theta$.