$\begingroup$

I'm reading through Larry Wasserman's book, All of Statistics, and am currently reading about p-values (page 187). Let me first introduce some definitions (I quote):

Definition 1 The power function of a test with rejection region $R$ is defined by $$\beta(\theta)=P_{\theta}(X\in R)$$ The size of a test is defined to be $$\alpha = \sup_{\theta\in\Theta_0}\beta(\theta)$$ A test is said to have level $\alpha$ if its size is less than or equal to $\alpha$.

This basically says that the size $\alpha$ is the "biggest" probability of a type I error. The $p$-value is then defined via (I quote)

Definition 2 Suppose that for every $\alpha\in(0,1)$ we have a size $\alpha$ test with rejection region $R_\alpha$. Then, $$p\text{-value}=\inf\{\alpha:T(X^n)\in R_\alpha\}$$ where $X^n=(X_1,\dots,X_n)$.

For me this means: given a specific $\alpha$, there is a test and rejection region $R_\alpha$ so that $\alpha=\sup_{\theta\in\Theta_{0}}P_\theta(T(X^n)\in R_\alpha)$. For the $p$-value I then simply take the smallest of all these $\alpha$.

Question 1 If this were the case, then I could clearly choose $\alpha = \epsilon$ for arbitrarily small $\epsilon$. Where does my interpretation of Definition 2 go wrong, i.e. what exactly does it mean?

Now Wasserman continues and states a theorem to give an "equivalent" definition of the $p$-value with which I'm familiar (I quote):

Theorem Suppose that the size $\alpha$ test is of the form $$\text{reject } H_0 \iff T(X^n)\ge c_\alpha$$ Then, $$p\text{-value} = \sup_{\theta\in\Theta_0}P_{\theta}(T(X^n)\ge T(x^n))$$ where $x^n$ is the observed value of $X^n$.

So here is my second question:

Question 2 How can I actually prove this theorem? Maybe it's due to my misunderstanding of the definition of the $p$-value, but I can't figure it out.

$\endgroup$
  • $\begingroup$ It's positively weird that Wasserman would define power as "$\beta$", since the symbol $\beta$ is almost universally used for the type II error rate (i.e. power = 1-$\beta$ for almost any other author discussing power). I'm finding it hard to imagine a choice of notation able to engender worse confusion except by deliberately setting out to cause it. $\endgroup$ Commented Oct 21, 2015 at 20:58
  • $\begingroup$ I agree that that is weird, Glen - however, Casella and Berger do the same thing and their text is, in my opinion, the gold standard for statistical theory. $\endgroup$ Commented Oct 26, 2015 at 21:22

2 Answers

$\begingroup$

We have some multivariate data $x$, drawn from a distribution $\mathcal{D}$ with an unknown parameter $\theta$. Note that $x$ is a sample outcome.

We want to test a hypothesis about the unknown parameter $\theta$; the values of $\theta$ under the null hypothesis form the set $\Theta_0$.

In the sample space of $X$, we can define a rejection region $R$, and the power of this region $R$ is then defined as $\mathcal{P}_\bar{\theta}^R=P_\bar{\theta}(x \in R)$. So the power is computed for a particular value $\bar{\theta}$ of $\theta$ as the probability that the sample outcome $x$ falls in the rejection region $R$ when the true value of $\theta$ is $\bar{\theta}$. Obviously the power depends on the region $R$ and on the chosen $\bar{\theta}$.

Definition 1 defines the size of the region $R$ as the supremum of all the values of $\mathcal{P}_\bar{\theta}^R$ for $\bar{\theta}$ in $\Theta_0$, so only for values of $\bar{\theta}$ under $H_0$. Obviously this depends on the region, so $\alpha^R=\sup_{\bar{\theta} \in \Theta_0} \mathcal{P}_\bar{\theta}^R$.

As $\alpha^R$ depends on $R$, we get a different value when the region changes, and this is the basis for defining the p-value: change the region, but in such a way that the observed sample value still belongs to the region; for each such region, compute $\alpha^R$ as defined above and take the infimum: $pv(x)=\inf_{R \,:\, x \in R} \alpha^R$. So the p-value is the smallest size over all regions that contain $x$.

The theorem is then just a 'translation' of this, namely the case where the regions $R$ are defined using a statistic $T$: for a value $c$ you define a region $R$ as $R=\{ x \mid T(x) \ge c \}$. If you use this type of region $R$ in the above reasoning, then the theorem follows.
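For a concrete numerical check of this construction (my own sketch, not from Wasserman: a one-sided $z$-test with a made-up observed statistic, where the supremum over $\Theta_0 = (-\infty, 0]$ is attained at $\theta = 0$), the infimum of sizes from Definition 2 and the tail probability from the theorem agree:

```python
from statistics import NormalDist  # stdlib normal CDF and quantile function

# Sketch under an assumed setup: X_i ~ N(theta, 1), H0: theta <= 0,
# T(x^n) = sqrt(n) * xbar, rejection region R_alpha = {T >= c_alpha}
# with c_alpha = Phi^{-1}(1 - alpha) (sup over Theta_0 attained at theta = 0).
Z = NormalDist()

t_obs = 1.7  # hypothetical observed value of T(x^n)

# Definition 2: p-value = inf{alpha : t_obs in R_alpha} = inf{alpha : t_obs >= c_alpha},
# approximated on a grid of alpha values.
alphas = [k / 10000 for k in range(1, 10000)]
pv_inf = min(a for a in alphas if t_obs >= Z.inv_cdf(1 - a))

# Theorem: p-value = sup_{theta in Theta_0} P_theta(T >= t_obs) = 1 - Phi(t_obs).
pv_tail = 1 - Z.cdf(t_obs)

print(pv_inf, pv_tail)  # the two agree up to the grid resolution of 1e-4
```

Shrinking $\alpha$ raises the cutoff $c_\alpha$; the infimum is reached exactly when $c_\alpha$ hits the observed $T(x^n)$, which is why the two computations coincide.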

EDIT because of comments:

@user8: for the theorem: if you define rejection regions as in the theorem, then a rejection region of size $\alpha$ is a set of the form $R^\alpha= \{X \mid T(X) \ge c_\alpha \}$ for some $c_\alpha$.

To find the p-value of an observed value $x$, i.e. $pv(x)$, you have to find the smallest such region, i.e. the largest value of $c$, such that $\{X \mid T(X) \ge c \}$ still contains $x$. The latter (the region contains $x$) is equivalent (because of the way the regions are defined) to saying that $T(x) \ge c$, so you have to find the largest $c$ such that $c \le T(x)$.

Obviously, the largest $c$ such that $c \le T(x)$ is $c = T(x)$, and the region above then becomes $\{ X \mid T(X) \ge c = T(x)\}=\{ X \mid T(X) \ge T(x)\}$, whose size is $\sup_{\theta\in\Theta_0}P_{\theta}(T(X^n)\ge T(x^n))$, exactly as in the theorem.

$\endgroup$
  • $\begingroup$ Many thanks for your answer. For the question about the validation of the theorem: Is there not somehow an $\inf$ over $\alpha$ missing? $\endgroup$ Commented Oct 25, 2015 at 16:06
  • $\begingroup$ @user8: I added a paragraph at the end of my answer, you see the point with the infimum now? $\endgroup$ Commented Oct 26, 2015 at 16:22
$\begingroup$

In Definition 2, the $p$-value of a test statistic is the greatest lower bound of all $\alpha$ such that the hypothesis is rejected for a test of size $\alpha$. Recall that the smaller we make $\alpha$, the less tolerance for Type I error we are allowing, thus the rejection region $R_\alpha$ will also decrease. So (very) informally speaking, the $p$-value is the smallest $\alpha$ we can choose that still lets us reject $H_0$ for the data that we observed. We cannot arbitrarily choose a smaller $\alpha$ because at some point, $R_\alpha$ will be so small that it will exclude (i.e., fail to contain) the event we observed.
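To make the "at some point $R_\alpha$ excludes the observation" mechanism concrete, here is a small sketch of my own (a one-sided $z$-test with an invented observed statistic; not part of the original answer): as $\alpha$ shrinks, the cutoff rises, and rejection flips to non-rejection exactly at the p-value.

```python
from statistics import NormalDist  # stdlib normal distribution

# Assumed setup (hypothetical): reject H0 iff T(x^n) >= c_alpha with
# c_alpha = Phi^{-1}(1 - alpha), as for a one-sided z-test.
Z = NormalDist()
t_obs = 1.7  # invented observed statistic

for alpha in [0.10, 0.05, 0.045, 0.04, 0.01]:
    c_alpha = Z.inv_cdf(1 - alpha)  # cutoff defining R_alpha; rises as alpha shrinks
    print(f"alpha={alpha:.3f}  c_alpha={c_alpha:.3f}  t_obs in R_alpha: {t_obs >= c_alpha}")

# The flip happens exactly at alpha = 1 - Phi(t_obs), i.e. the p-value.
print("p-value:", 1 - Z.cdf(t_obs))
```

For any $\alpha$ above the printed p-value the observation sits inside $R_\alpha$ and we reject; for any $\alpha$ below it, $R_\alpha$ has shrunk past the observation and we cannot.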

Now, in light of the above, I invite you to reconsider the theorem.

$\endgroup$
  • $\begingroup$ I'm still a little bit confused. So first, in Definition $2$, is the statistic $T$ fixed for all $\alpha$? I disagree with your statement: "...at some point, $R_\alpha$ will be so small that it will exclude (i.e., fail to contain) the event we observed." Perfectly fine; if $R_\alpha$ is so small that it doesn't contain the observed sample, we don't reject $H_0$. What is the problem with this? Thanks for your help / patience $\endgroup$ Commented Oct 16, 2015 at 17:37
  • $\begingroup$ Yes. The test statistic $T$ is a predetermined fixed function of the sample, where "fixed" in this sense means that the form of the function does not change for any $\alpha$. The value it takes on may (and should) depend on the sample. Your statement "we don't reject $H_0$" reveals why your disagreement is incorrect: by definition, $R_\alpha$ comprises the set of all values for which the test statistic leads to rejection of the null. That's why it's labeled $R$--for "R"ejection. I will post an update to my answer to explain in more detail. $\endgroup$ Commented Oct 16, 2015 at 17:45
  • $\begingroup$ Many thanks for your quick answer and in advance for your updated version. What I meant was the following: We reject $H_0$ if $T(x_n)\in R_\alpha$, where $x_n$ is the observed sample. Say I'm very extreme and choose $R_\alpha$ very small, so that for the given sample $T(x_n)\notin R_\alpha$, which just means we DON'T reject $H_0$. So a small $R_\alpha$ isn't a priori a bad thing. Clearly, at some point it is so small that it's very very very unlikely to observe a sample belonging to $R_\alpha$. Again, thanks for your patience / help. Really appreciated! $\endgroup$ Commented Oct 16, 2015 at 17:50
  • $\begingroup$ The given definition of p-value explicitly requires the test statistic for the sample to be in the rejection region. You are not free to change that part of the definition of p-value. $\endgroup$ Commented Oct 21, 2015 at 21:03
  • $\begingroup$ @Glen_b Thanks for the comment. Indeed, my previous comment does violate the definition. Thanks for pointing it out. $\endgroup$ Commented Oct 25, 2015 at 15:59
