Which statistical test is correct to compare two groups of very different sizes?

Question

I have two groups, each with measurements of tree fraction and elevation. The group sizes are very different; the means and standard deviations of tree fraction and elevation are given in the table below. Both variables are normally distributed.

| | Group 1 | Group 2 |
|---|---|---|
| n | 287 | 620240 |
| Avg. tree fraction ± SD | 31 ± 16 % | 24 ± 17 % |
| Avg. elevation ± SD | 301 ± 222 m | 524 ± 415 m |

The hypothesis was that Group 1 has a higher mean tree fraction and a lower mean elevation than Group 2, and the confidence level was 95 % (that is, a significance level of 5 %).

For now, I have computed a z-test, which showed that the two groups were significantly different; however, I suspect this is due to the large size of Group 2. Which statistical test is appropriate in cases like this? Is MANOVA a suitable test?

Dave · Accepted Answer · 2021-01-07 12:19:42Z

Your test is functioning properly. A large sample size provides more compelling evidence of a difference, as it should. The test will be able to pick up on small differences, perhaps smaller differences than you deem to be of practical importance, given your domain knowledge, but the test is functioning properly.
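To judge whether a statistically significant difference is also of practical importance, one option is to report an effect size and a confidence interval next to the p-value. The following is a minimal sketch using only the summary statistics from the question; the helper names `cohens_d` and `diff_ci95` are illustrative, and the choice of Cohen's d (pooled standard deviation) and a normal-quantile interval are assumptions, not something prescribed in the answer.

```python
import math

def cohens_d(m1, s1, n1, m2, s2, n2):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var)

def diff_ci95(m1, s1, n1, m2, s2, n2):
    """Approximate 95% CI for the difference in means (normal quantile 1.96)."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    diff = m1 - m2
    return diff - 1.96 * se, diff + 1.96 * se

# Summary statistics copied from the table in the question.
n1, n2 = 287, 620240
d_tree = cohens_d(31, 16, n1, 24, 17, n2)      # tree fraction, in %
d_elev = cohens_d(301, 222, n1, 524, 415, n2)  # elevation, in m
ci_tree = diff_ci95(31, 16, n1, 24, 17, n2)
ci_elev = diff_ci95(301, 222, n1, 524, 415, n2)

print(f"tree fraction: d = {d_tree:.2f}, 95% CI for difference = "
      f"({ci_tree[0]:.1f}, {ci_tree[1]:.1f}) percentage points")
print(f"elevation:     d = {d_elev:.2f}, 95% CI for difference = "
      f"({ci_elev[0]:.0f}, {ci_elev[1]:.0f}) m")
```

Whether differences of this size matter is a domain question; the numbers only describe how large the differences are relative to the spread of the data.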

The more standard test for this would be a t-test, since you don’t know the population variances. With what appears to be over 600,000 observations, the z-test and t-test results will hardly differ, but your reviewer probably expects a t-test.
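As a rough illustration of how close the two tests are here, the sketch below computes the test statistic and approximate degrees of freedom from the summary statistics in the question. It assumes the unequal-variance (Welch) form of the t-test, which is a choice made for this example and not something stated in the answer; `welch_from_summary` and `upper_tail_normal` are illustrative helper names. With sample standard deviations plugged in, the z statistic and the Welch t statistic are the same number, and only the reference distribution differs.

```python
import math

def welch_from_summary(m1, s1, n1, m2, s2, n2):
    """Return the Welch statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    stat = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return stat, df

def upper_tail_normal(z):
    """P(Z > z) for a standard normal, via erfc for accuracy in the far tail."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Summary statistics copied from the table in the question.
stat_tree, df_tree = welch_from_summary(31, 16, 287, 24, 17, 620240)
stat_elev, df_elev = welch_from_summary(301, 222, 287, 524, 415, 620240)

# One-sided z-test p-values matching the stated hypotheses:
# Group 1 has higher tree fraction (upper tail) and lower elevation (lower tail).
p_tree = upper_tail_normal(stat_tree)
p_elev = upper_tail_normal(-stat_elev)

print(f"tree fraction: statistic = {stat_tree:.2f}, df = {df_tree:.0f}, z-test p = {p_tree:.1e}")
print(f"elevation:     statistic = {stat_elev:.2f}, df = {df_elev:.0f}, z-test p = {p_elev:.1e}")
```

Under this Welch form, the degrees of freedom are governed by the smaller group (about 286 here), where the two-sided 5 % critical value of the t distribution is roughly 1.97 against 1.96 for the normal. With statistics this far from zero, the conclusion is the same under either reference distribution.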

MANOVA, which reduces to Hotelling’s $T^2$ test in the two-sample case, cannot pin down which variable contributes to the significant difference: you could say that the mean vectors are different, but not whether each variable differs individually. This may make MANOVA inappropriate for your work, particularly if you assume independence of your two variables. MANOVA, like any other frequentist hypothesis test, will become very sensitive to small differences when you have large sample sizes.
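For reference, the two-sample Hotelling statistic can be written as follows. The notation is an assumption introduced here: $\bar{x}_1, \bar{x}_2$ are the sample mean vectors (tree fraction, elevation), $S_1, S_2$ the sample covariance matrices, and $p = 2$ the number of variables. The formula assumes multivariate normality and equal population covariance matrices.

$$
T^2 = \frac{n_1 n_2}{n_1 + n_2}\,(\bar{x}_1 - \bar{x}_2)^\top S_p^{-1} (\bar{x}_1 - \bar{x}_2),
\qquad
S_p = \frac{(n_1 - 1) S_1 + (n_2 - 1) S_2}{n_1 + n_2 - 2}
$$

$$
\frac{n_1 + n_2 - p - 1}{p\,(n_1 + n_2 - 2)}\; T^2 \sim F_{p,\; n_1 + n_2 - p - 1} \quad \text{under } H_0
$$

The statistic collapses the whole mean-vector difference into one number, which is why a significant result does not say which component differs. Computing it also requires the covariance between tree fraction and elevation, which the summary table in the question does not provide.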

Thanks for the detailed answer, @Dave. I do not need to detect the contribution of individual variables, but rather check whether tree fraction is different between Group 1 and Group 2. Is it therefore still valid to 'just' use the z-test?
– Thomas, Commented Jan 7, 2021 at 12:37
Also, I thought that, as a rule of thumb, the t-test was for sample sizes below 30. Am I wrong here?
– Thomas, Commented Jan 7, 2021 at 12:41
If you just need to check tree fraction, why are you checking both variables? // $30 = \infty$ is a common statistics joke about the central limit theorem. Since you’re working with samples you believe to come from normal populations, you do not have to appeal to convergence of the $z$ or $t$ test statistics.
– Dave, Commented Jan 7, 2021 at 12:46
Okay, that is good to know. I will go with a t-test, thanks. To clarify, I also need to perform a similar test for elevation; however, the relative importance/contribution is not needed. What test would you suggest if the data are not normally distributed?
– Thomas, Commented Jan 7, 2021 at 13:05
What do you mean by “the relative importance is not needed”? // The question about non-normal data would make for a good separate question (remember that Cross Validated is Q&A, not a discussion forum), though you’ll find answers if you search a bit on here.
– Dave, Commented Jan 7, 2021 at 13:19
