Many introductory textbooks and lecture notes give a "flowchart" approach in which normality is checked (either – inadvisedly – by a normality test, or more loosely by a QQ plot or similar) to decide between a t-test and a non-parametric test. For the unpaired two-sample t-test there may be a further check for homogeneity of variance to decide whether to apply Welch's correction. One issue with this approach is that the decision on which test to apply depends on the observed data, which affects the performance (power, Type I error rate) of the overall selected procedure.
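As a quick illustration (my own simulation sketch, not part of the original question), one can estimate the Type I error rate of the whole two-stage "flowchart" procedure under the null, rather than of either test in isolation. Here a Shapiro-Wilk check decides between a t-test and a Mann-Whitney U test:

```python
# Simulate the data-dependent "flowchart": check normality of each sample
# with Shapiro-Wilk, then run a t-test if both pass, otherwise Mann-Whitney U.
# Under H0 (both samples from the same normal distribution) we can estimate
# the Type I error rate of the combined procedure.
import numpy as np
from scipy import stats

def flowchart_pvalue(x, y, alpha_norm=0.05):
    """P-value from the data-dependent test choice."""
    if stats.shapiro(x).pvalue > alpha_norm and stats.shapiro(y).pvalue > alpha_norm:
        return stats.ttest_ind(x, y).pvalue
    return stats.mannwhitneyu(x, y, alternative="two-sided").pvalue

rng = np.random.default_rng(1)
n_sim, n = 2000, 15
rejections = 0
for _ in range(n_sim):
    x, y = rng.normal(size=n), rng.normal(size=n)
    if flowchart_pvalue(x, y) < 0.05:
        rejections += 1
print(f"empirical Type I error of the two-stage procedure: {rejections / n_sim:.3f}")
```

The same harness can be pointed at skewed or heavy-tailed null distributions to see how the conditional choice distorts error rates there.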
If performing an "unrelated samples" or "unpaired" t-test, should one use the Welch correction? Some people use a hypothesis test for equality of variances, but in small samples it would have low power; others check whether the SDs are "reasonably" close (by various rules of thumb). Is it safer simply to always use the Welch correction for small samples, unless there is some good reason to believe the population variances are equal?
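The case where the pooled test misbehaves is unbalanced group sizes with the smaller group having the larger variance. A small simulation (my own sketch, not from the question) comparing the Type I error of the pooled and Welch versions in that situation:

```python
# Under H0 (equal means) with unequal variances and unbalanced n,
# compare Type I error of Student's pooled t-test vs Welch's t-test.
# The smaller group deliberately has the larger SD -- the setting in
# which the pooled test is anti-conservative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_sim = 2000
rej_pooled = rej_welch = 0
for _ in range(n_sim):
    x = rng.normal(0.0, 4.0, size=8)   # small group, large SD
    y = rng.normal(0.0, 1.0, size=25)  # large group, small SD
    if stats.ttest_ind(x, y, equal_var=True).pvalue < 0.05:
        rej_pooled += 1
    if stats.ttest_ind(x, y, equal_var=False).pvalue < 0.05:  # Welch
        rej_welch += 1
print(f"pooled Type I error: {rej_pooled / n_sim:.3f}")
print(f"Welch  Type I error: {rej_welch / n_sim:.3f}")
```

The pooled test's rejection rate runs well above the nominal 5% here, while Welch's stays close to it, which is the usual argument for defaulting to `equal_var=False`.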
If you see the choice of methods as a trade-off between power and robustness, claims about the asymptotic efficiency of the non-parametric methods are unhelpful. The rule of thumb that "Wilcoxon tests have about 95% of the power of a t-test if the data really are normal, and are often far more powerful if they are not, so just use a Wilcoxon" is sometimes heard, but if the 95% figure only applies to large $n$, this reasoning is flawed for smaller samples.
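Rather than rely on the asymptotic 95% figure, one can check the small-sample power directly. A rough simulation sketch (mine, with arbitrary choices of $n = 10$ per group and a one-SD shift) under a genuinely normal alternative:

```python
# Compare small-sample power of the two-sample t-test and the
# Mann-Whitney/Wilcoxon rank-sum test under a normal shift alternative,
# where the t-test is the "right" test and the 95% ARE figure originates.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_sim, n, shift = 2000, 10, 1.0
hits_t = hits_w = 0
for _ in range(n_sim):
    x = rng.normal(0.0, 1.0, size=n)
    y = rng.normal(shift, 1.0, size=n)
    hits_t += stats.ttest_ind(x, y).pvalue < 0.05
    hits_w += stats.mannwhitneyu(x, y, alternative="two-sided").pvalue < 0.05
print(f"t-test power:   {hits_t / n_sim:.3f}")
print(f"Wilcoxon power: {hits_w / n_sim:.3f}")
```

Re-running with different $n$ shows how far the small-sample power ratio can drift from the asymptotic value, which is the point at issue.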
Small samples may make it very difficult, or impossible, to assess whether a transformation is appropriate, since it is hard to tell whether the transformed data belong to a (sufficiently) normal distribution. So if a QQ plot reveals very positively skewed data that look more reasonable after taking logs, is it safe to use a t-test on the logged data? With larger samples this would be very tempting, but with small $n$ I'd probably hold off unless there were grounds to expect a log-normal distribution in the first place.
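For concreteness, here is what that workflow looks like mechanically (a sketch under the assumption that the data really are log-normal; remember the log-scale test concerns means of logs, i.e. ratios of geometric means on the original scale):

```python
# t-test on raw vs log-transformed data when the data are genuinely
# log-normal: the log transform restores normality, so the t-test's
# assumptions hold on the log scale.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = np.exp(rng.normal(0.0, 0.5, size=12))  # log-normal samples
y = np.exp(rng.normal(0.4, 0.5, size=12))  # shifted on the log scale

raw = stats.ttest_ind(x, y, equal_var=False)
logged = stats.ttest_ind(np.log(x), np.log(y), equal_var=False)
print(f"raw-scale p = {raw.pvalue:.3f}, log-scale p = {logged.pvalue:.3f}")
```

With $n = 12$ per group, a QQ plot of `np.log(x)` would look only mildly better than one of `x`, which is exactly the diagnostic difficulty described above.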
What about checking assumptions for the non-parametric tests? Some sources recommend verifying a symmetric distribution before applying a Wilcoxon test (treating it as a test for location rather than stochastic dominance), which raises similar problems to checking normality. If the reason we are applying a non-parametric test in the first place is blind obedience to the mantra of "safety first", then the difficulty of assessing skewness from a small sample would apparently push us further down, to the even lower power of a sign test on the paired differences.
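The end of that fallback chain for paired data looks like this (my own illustration: the signed-rank test assumes symmetric differences, while the sign test only needs the median of the differences to be zero under $H_0$, at a cost in power):

```python
# Paired differences: Wilcoxon signed-rank test vs the sign test
# (a binomial test on the count of positive differences).
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
d = rng.normal(0.5, 1.0, size=15)  # paired differences, true median > 0

w = stats.wilcoxon(d)  # signed-rank: uses ranks, assumes symmetry
n_pos = int((d > 0).sum())
s = stats.binomtest(n_pos, n=len(d), p=0.5)  # sign test: signs only
print(f"signed-rank p = {w.pvalue:.3f}, sign test p = {s.pvalue:.3f}")
```

Discarding the magnitudes of the differences is what buys the sign test its freedom from the symmetry assumption, and also what costs it power.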