Our data have n=4 observations, and we want to test whether they are > 0. We use a one-sided Wilcoxon signed-rank test because it is non-parametric and supposedly handles small sample sizes better.

Using MATLAB:

>> data

ans =

    0.0709    0.0366    0.1228    0.0775

>> p = signrank(data,0,'tail','right')

p =

    0.0625

A one-sided t-test gives p = 0.0113 for the same data. Investigating further, I noticed that signrank does not seem to care about the data at all for n=4:

>> p = signrank(rand(4,1),0,'tail','right')

p =

    0.0625

>> p = signrank(rand(4,1)+100,0,'tail','right')

p =

    0.0625

Is this expected behavior, and does it mean we can't use this test for n=4?

On the other hand, the t-test responds to the data as expected at this sample size:

>> [h,p] = ttest(rand(4,1)+100,0,'tail','right')

h =

     1

p =

   1.5816e-09

>> [h,p] = ttest(rand(4,1),0,'tail','right')

h =

     1

p =

    0.0380

(Note: ttest returns the hypothesis-test decision h first and the p-value second.)


1 Answer


The Wilcoxon sign-rank test is a nonparametric test for a difference between paired observations, based on the signed ranks of their differences.

Let's take that apart:

  • Wilcoxon: He gets the credit for developing this test.
  • sign-rank: The test statistic is based on the sum of the signed ranks of the differences between paired or matched observations. What does that mean? (1) Take the difference between the first and second observations for each individual. (2) Rank those differences by their absolute values. (3) Give each rank the same sign as the corresponding difference. The sign-rank test statistic is the smaller of the absolute values of the summed positive ranks and the summed negative ranks. Check it out:
 ID   Obs 1   Obs 2   Obs 1 – Obs 2   Rank   + Rank   – Rank
  1     3       7          –4           4                –4
  2     3       4          –1           1                –1
  3     5       3           2           2        2
  4     6       9          –3           3                –3
  • nonparametric: The test statistic does not care about the shape of the distribution of your data, as long as they are distributed the same way in observation 1, and the same way in observation 2. (This is often termed the independently and identically distributed assumption.)
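The three steps above can be sketched in code. This is a minimal illustration in Python (the thread itself uses MATLAB), and it assumes no zero differences and no tied absolute differences, as in the worked table:

```python
# Sketch of the sign-rank steps: difference, rank by |difference|,
# then sign each rank. Assumes no zeros or ties.

def signed_rank_statistic(obs1, obs2):
    """Return (signed ranks, smaller summed-rank statistic) for paired data."""
    diffs = [a - b for a, b in zip(obs1, obs2)]
    # Rank differences by absolute value (rank 1 = smallest |diff|).
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0] * len(diffs)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    # Sign each rank the same as its difference.
    signed = [r if d > 0 else -r for r, d in zip(ranks, diffs)]
    w_plus = sum(r for r in signed if r > 0)
    w_minus = -sum(r for r in signed if r < 0)
    return signed, min(w_plus, w_minus)

# The table's data: Obs 1 = (3, 3, 5, 6), Obs 2 = (7, 4, 3, 9).
signed, w = signed_rank_statistic([3, 3, 5, 6], [7, 4, 3, 9])
print(signed)  # [-4, -1, 2, -3]
print(w)       # 2  (sum of + ranks is 2, sum of - ranks is 8)
```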

So why can't you reject $H_{0}$ with a sample size of 4? (Well, you can, but not at a small $\alpha$-level.) The reason is that the sign-rank distribution underlies your test statistic: we reject when the test statistic is too extreme (according to our choice of $\alpha$), meaning either too large or too small. But with $N=4$, we just cannot get very extreme. If there is no difference between observation 1 and observation 2 in our data—that is, each difference has a 50/50 chance of being positive vs. negative—then the probability of getting all positive signs, or all negative signs (the most extreme possible observation), is only 1 in 16, or 0.0625. We will never get a p-value smaller than 0.0625 with $n=4$ in a sign-rank test. Hence, your conundrum.
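You can verify that floor by brute force. Under $H_{0}$ each of the $n=4$ signs is an independent coin flip, so there are $2^4 = 16$ equally likely sign patterns; this Python sketch enumerates all of them and computes the exact one-sided p-value for the most extreme possible outcome (every rank positive):

```python
# Enumerate all 2^4 sign patterns for ranks 1..4 and compute the exact
# one-sided p-value of the most extreme outcome (all signs positive).
from itertools import product

n = 4
ranks = range(1, n + 1)  # ranks 1..4, assuming no ties

# W+ (sum of positively signed ranks) for every equally likely pattern.
w_values = [sum(r for r, s in zip(ranks, signs) if s == 1)
            for signs in product([1, -1], repeat=n)]

w_max = sum(ranks)  # 10: every rank positive
p_min = sum(1 for w in w_values if w >= w_max) / len(w_values)
print(p_min)  # 0.0625, i.e. 1/16
```

Only one of the 16 patterns reaches $W^{+} = 10$, so no dataset of four nonzero values can produce a one-sided p-value below 1/16.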

By contrast, the t test for mean difference is parametric—meaning it assumes your data are distributed normally—and, assuming your data are indeed approximately normally distributed (a tough thing to judge given $n=4$), can provide more statistical power to detect a difference.
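The contrast is easy to reproduce outside MATLAB. As an illustration (using SciPy rather than the thread's signrank/ttest calls), here are both one-sided tests on the question's data:

```python
# Compare the exact signed-rank test with the one-sample t test on the
# question's n=4 data. SciPy used for illustration; MATLAB gives the
# same p-values.
from scipy import stats

data = [0.0709, 0.0366, 0.1228, 0.0775]

# Exact one-sided signed-rank test: pinned at the 1/16 floor for n=4.
w_res = stats.wilcoxon(data, alternative='greater')
print(w_res.pvalue)  # 0.0625

# One-sided one-sample t test: assumes normality, but its p-value
# actually depends on the magnitudes of the data.
t_res = stats.ttest_1samp(data, 0, alternative='greater')
print(t_res.pvalue)  # ~0.0113, matching the ttest result above
```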

