12
$\begingroup$

Tests like Z, t, and several others assume that the data is based on a random sampling. Why?

Suppose I'm doing experimental research, where I care much more about internal validity than external validity. So if my sample is a little biased, that's acceptable, since I've accepted that I can't generalize the hypothesis to the whole population. The grouping will still be random: I'll choose the participants by convenience, but I will randomly assign them to the different groups.

Why can't I just ignore this assumption?

$\endgroup$
8
  • $\begingroup$ If the sampling technique introduces a bias, then it's not 'random'. If it does not introduce any bias then it is 'random' (for some definition of random ;-). I've had sampling schemes that simply took every 7th sample to create a matched sample size to the counter sample. However, I knew that there was no special aspect to that selection, so what may be thought of as a non-random sampling process was still effectively random. It's the same as selecting balls 1,2,3,4,5,6 in the lottery: it's just as random as any other sequence. $\endgroup$ Commented Apr 11, 2018 at 12:42
  • 1
    $\begingroup$ @PhilipOakley: selecting balls 1,2,3,4,5,6 on the lottery gives you the same chance of winning as any other selection, but reduces your expected winnings as you are more likely to have to share the prize with others who had the same idea $\endgroup$ Commented Apr 11, 2018 at 14:25
  • 1
    $\begingroup$ Systematic sampling, such as described by @Philip, often is analyzed as if it produced simple random samples, but it has pitfalls. For instance, if you were to measure a manufacturing process every day and sample every seventh measurement, you would be subject to confounding your results with a day-of-the-week effect, since (obviously) you would be sampling on the same day each week. You need to work harder to think of and address such subtleties when dealing with non-random samples. (A small simulation of this pitfall appears just after these comments.) $\endgroup$ Commented Apr 11, 2018 at 17:48
  • 1
    $\begingroup$ @whuber, Absolutely. One must think hard (and widely) about these things!! In my case I had hours of video, with hundreds of events and long gaps between them, so I needed to reduce the data size of the non-event set for a simple logistic regression (each frame considered independently, little change between frames), so dropping lots of non-event frames was reasonable. The time sequence aspect was considered separately. $\endgroup$ Commented Apr 12, 2018 at 9:41
  • 1
    $\begingroup$ @Philip Interestingly, at almost the same time you were writing that comment about randomness not existing, the NIST issued a press release claiming it does. An account appears in today's (12 April 2018) issue of Nature. $\endgroup$ Commented Apr 12, 2018 at 13:30
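To make the every-seventh-measurement pitfall above concrete, here is a minimal simulation (all numbers invented for illustration). The systematic sample always lands on the same weekday, so a weekday effect biases its mean, while a simple random sample of the same size does not:

```python
import numpy as np

rng = np.random.default_rng(0)
n_days = 7 * 200                                        # 200 weeks of daily measurements
day_of_week = np.arange(n_days) % 7
weekday_effect = np.where(day_of_week == 0, 2.0, 0.0)   # hypothetical "Monday runs high" shift
process = 10 + weekday_effect + rng.normal(0, 1, n_days)

every_7th = process[::7]                                # systematic sample: always the same weekday
srs = rng.choice(process, size=every_7th.size, replace=False)   # simple random sample, same size

print(every_7th.mean())   # ~12: confounded with the weekday effect
print(srs.mean())         # ~10.3: close to the true overall mean (10 + 2/7)
```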

4 Answers

18
$\begingroup$

If you are not making any inference to a wider group than your actual sample, then there is no application of statistical tests in the first place, and the question of "bias" does not arise. In this case you would just calculate descriptive statistics of your sample, which are known. Similarly, there is no question of model "validity" in this case: you are just observing variables, recording their values, and describing aspects of those values.

Once you decide to go beyond your sample and make inferences about some larger group, you will need statistics, and you will need to consider issues like sampling bias. In this application, random sampling becomes a useful property that helps you get reliable inferences about the wider group of interest. If you don't have random sampling (and don't know the selection probabilities of your sampled units based on the population), then it becomes hard or impossible to make reliable inferences about the population.
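As a rough illustration of that last point, here is a minimal sketch (a simulated population and a deliberately biased selection rule, all numbers hypothetical). A t-based interval behaves as advertised under simple random sampling, but not when the selection mechanism favours certain units:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
population = rng.normal(100, 15, 100_000)      # hypothetical finite population, mean ~100

def t_interval(sample):
    """95% t-based confidence interval for the mean."""
    return stats.t.interval(0.95, len(sample) - 1,
                            loc=sample.mean(), scale=stats.sem(sample))

random_sample = rng.choice(population, 50, replace=False)        # simple random sample
easy_to_reach = np.sort(population)[-5_000:]                     # selection favours large values
convenience_sample = rng.choice(easy_to_reach, 50, replace=False)

print(t_interval(random_sample))        # usually covers the population mean (~100)
print(t_interval(convenience_sample))   # centred far above it; the interval's logic no longer applies
```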

$\endgroup$
6
$\begingroup$

In real scientific research, it is quite rare to have data that came from true random sampling. The data are almost always convenience samples. This primarily affects what population you can generalize to. That said, even a convenience sample did come from somewhere; you just need to be clear about where, and about the limitations that implies. If you really believe your data aren't representative of anything, then your study is not going to be worthwhile on any level, but that probably isn't true.[1] Thus, it is often reasonable to consider your samples as drawn from somewhere and to use these standard tests, at least in a hedged or qualified sense.

There is a different philosophy of testing, however, which argues that we should move away from those assumptions and the tests that rely on them. Tukey was an advocate of this. Instead, most experimental research is considered (internally) valid because the study units (e.g., patients) were randomly assigned to the arms. Given this, you can use permutation tests, which mostly assume only that the randomization was done correctly. The counterargument to worrying too much about this is that permutation tests will typically show the same thing as the corresponding classical tests, and are more work to perform. So again, standard tests may be acceptable.
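A minimal sketch of that idea (simulated data with a made-up effect size; the classical comparison uses `scipy.stats.ttest_ind`). Because the group labels were assigned at random, reshuffling them generates the null distribution directly, and the classical t-test usually lands close to the permutation p-value:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
treatment = rng.normal(0.5, 1, 30)   # convenience sample, randomly assigned to arms
control = rng.normal(0.0, 1, 30)     # hypothetical effect size of 0.5
observed = treatment.mean() - control.mean()

# Permutation test: reshuffle the random assignment many times under H0.
pooled = np.concatenate([treatment, control])
perm_diffs = []
for _ in range(10_000):
    shuffled = rng.permutation(pooled)
    perm_diffs.append(shuffled[:30].mean() - shuffled[30:].mean())
perm_p = np.mean(np.abs(perm_diffs) >= abs(observed))

t_p = stats.ttest_ind(treatment, control).pvalue   # classical two-sample t-test
print(perm_p, t_p)                                 # typically very close to each other
```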

[1] For more along these lines, it may help to read my answer here: Identifying the population and samples in a study.

$\endgroup$
4
$\begingroup$

Tests like Z, t, and several others are based on known sampling distributions of the relevant statistics. Those sampling distributions, as generally used, are defined for the statistic calculated from a random sample.

It may sometimes be possible to devise a relevant sampling distribution for non-random sampling, but in general it is probably not possible.

$\endgroup$
3
$\begingroup$

These tests don't just incidentally assume random sampling. Instead, random sampling is what these tests are designed for.

Even if you've used careful study design to prevent all the other possible problems with a study (biased sampling, measurement error, etc.), you still have to check whether sampling variability could be a problem. If you ran the study again, you'd presumably get different participants, and thus different data, with different summary statistics. So, if the sampling variability is too large, you might get a positive treatment effect $\bar{x}_{\text{meds}}-\bar{x}_{\text{placebo}}$ this time but a negative effect next time, and thus you can't really trust the direction of the effect from any one study of this size.

Now, if you have a very large sample size, you'd probably get very similar summary statistics if you ran the study again. Is the sampling variability small enough that you can trust your statistics from this one study you actually ran? This question is essentially the only thing the classic hypothesis tests are checking for.
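A toy simulation of that point (effect size and sample sizes made up for illustration): with a small study, the estimated effect comes out with the wrong sign in a sizeable fraction of replications; with a large one, it almost never does.

```python
import numpy as np

rng = np.random.default_rng(7)
true_effect = 0.2                       # hypothetical true treatment effect

def one_study(n):
    """Run one simulated study with n participants per arm."""
    meds = rng.normal(true_effect, 1, n)
    placebo = rng.normal(0.0, 1, n)
    return meds.mean() - placebo.mean()

for n in (10, 100, 1000):
    effects = np.array([one_study(n) for _ in range(2_000)])
    print(n, round((effects < 0).mean(), 3))   # share of replications with the wrong sign
```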

To do this check, hypothesis tests compare your actual observed statistic against a null distribution of "other statistics you could have seen if the null hypothesis were true." If you sampled your data randomly (simple random sampling / SRS, or independent & identically distributed / iid), this null distribution is straightforward to derive mathematically (or simulate computationally). But if your data are a convenience sample, then there's no clear way to derive/simulate an appropriate null distribution. If your actual statistic came from a convenience sample, it may or may not look "extreme" compared to a null distribution that assumes iid data... but that tells us nothing about how "extreme" it would look compared to other convenience samples.
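A small sketch of the "simulate it computationally" route for iid data (all numbers hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 40                        # per-group sample size
observed_stat = 0.45          # hypothetical observed mean difference

# Under H0 with iid sampling, "other statistics you could have seen" are easy to generate.
null_stats = np.array([
    rng.normal(0, 1, n).mean() - rng.normal(0, 1, n).mean()
    for _ in range(10_000)
])
p_value = np.mean(np.abs(null_stats) >= abs(observed_stat))
print(p_value)   # roughly 0.04 for these made-up numbers

# With a convenience sample there is no comparable recipe for generating the
# "other samples you could have seen", so this null distribution need not
# describe how the statistic actually behaves.
```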


Asides:

  1. This is also why classic hypothesis tests get more complicated when you allow for "blocking" and other study-design features that minimize variance. If you know the data weren't sampled iid, there's no point in comparing your statistics against a baseline null distribution that was constructed under the iid assumption. Instead, you need to derive a null distribution that matches how your data were actually collected.

  2. As gung's answer points out, permutation tests can be appropriate if you carried out random assignment instead of random sampling. If that's true, and if it's a situation where the classic Z/t/etc tests are a good approximation for permutation tests, then you can use them. But you're using them because they approximate the permutation test, not because there was random sampling.

$\endgroup$
