Group comparison (and pairwise tests) with non-independent data

Question

I'm trying to determine if there are significant differences between groups and identify which groups differ from each other.

First, let me explain the experimental design: I have 6 Locations. Inside each location, there are 3 Plots, where each Plot corresponds to a different Treatment (so each location has only one of each treatment). To increase sample size, I divided each Plot into 4 quadrants, and collected the information I needed from each quadrant. The measured variable is the abundance of a specific fungus, which we can call Abundance. It is the relative abundance of a species X: the abundance of species X over the abundance of all species in each sample, as a percentage (it must be a percentage, since this is compositional data and the total number of each species varies between samples).

The problem now is that all four samples within each plot are not independent, and even the plots within each location are correlated.

How can I compare the Abundance between each Treatment while accounting for the structure of my data? (Kruskal-Wallis and Dunn's test? Wilcoxon? Mixed models? Permutation test? Bootstrapping?)

I don't want to average values for each Plot, since this will lose a lot of the variability that is inside the plot. Also, it would be best if I could use a non-parametric test, since the distribution is not normal (inverted J: many small values and zeros).

Please edit the question to explain how you measure the Abundance: that is, what constitutes the numerator and the denominator in the percentage calculation. Also, please edit to clarify what aspect of the distribution is "not normal." It's not a problem if there isn't normality among all the outcome observations, as the treatment presumably affects the outcomes. Even within-treatment non-normality of outcomes isn't necessarily a problem, however. Depending on how you calculate the percentages, there might be a good way to deal with the "non-normality" with a different type of model. — EdM
– EdM, Commented Oct 1 at 17:58
Thank you @EdM, I edited it: the abundance refers to the relative abundance of a species X over the total abundance of all species. And it's not normal because it has many small values compared to larger values (inverted J). — Barcik
– Barcik, Commented Oct 1 at 19:30
@Barcik I do not understand why you made 4 quadrants per plot. If you count the abundance (absolute frequency) in each quadrant, you can also sum these across the four quadrants to obtain the total abundance per plot and, of course, also the relative abundance. And this, say "total", abundance is a better measure for the given plot than each of the four quadrants' abundances. No need to take averages, just sum and divide by N in the plot. Or: forget quadrants. — BenP
– BenP, Commented Oct 2 at 17:51
@BenP I didn't explain exactly what the variable Abundance is because I deemed it irrelevant for Cross Validated. But here it is: in each quadrant I collect soil samples; each soil sample has many fungi, so the technique I used detects the number of DNA snippets (counts) for each species. The abundance is the number of DNA counts of a species divided by the total DNA count. So, each soil sample is a representation of the fungi that exist in the soil. I chose quadrants to get a good representation of the area of the plot (which is big: a 30 m radius circle in a forest). — Barcik
– Barcik, Commented Oct 3 at 13:28
@Barcik Ah, now I understand your data better! So, from each quadrant you took a number of samples, probably more than one. Each sample leads to a percentage. That makes summing frequencies, as I suggested first, less meaningful. Means are better then. I find this not irrelevant for your question; maybe add it for future readers. Thanks for explaining. — BenP
– BenP, Commented Oct 4 at 6:22

jginestet · Accepted Answer · 2025-10-04 16:30:56Z

The simplest way to approach your situation is to
a) average the values obtained from the 4 quadrants. You say "I don't want to average values for each Plot, since this will lose a lot of the variability that is inside the plot". But that is exactly why you want to average them; that average is a much better estimate of the true Abundance in that plot. Then
b) for each location, you compute the paired differences between the 3 treatments (A-B, A-C, B-C). You will end up with 3 sets of paired differences, each with 6 observations.

You also say that "it would be best if I could use a non-parametric test, since the distribution is not normal". The distribution of the percentages is definitely not normal (it is bounded in [0,1]!). But it is not the marginal distributions which need to be normal; it is the 3 sets of paired differences (which often tend to be normal, even when the marginal ones are not). In any case, with only 6 observations, any eyeballing, or Q-Q plot, or even formal test is a bit of an exercise in futility. You truly have no way to determine whether these 3 paired samples come, or not, from a normal distribution. And I am willing to bet that a formal test, like Shapiro-Wilk, will fail to reject (due to the small sample size).

Then c), you just run 3 1-sample t-tests, possibly with a Bonferroni correction. You may even think of running them single-sided, since I assume you are looking for "the best" treatment (i.e. the one with larger -or smaller- mean).
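Steps a) to c) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are simulated (the location names, treatment labels A/B/C, and the exponential scales are all made up), the helper names `t_cdf`, `one_sample_t`, and `paired_tests` are hypothetical, and the t distribution's CDF is approximated by numerical integration so that only the standard library is needed. In practice a statistics package would do the same job.

```python
import math
import random
from itertools import combinations
from statistics import mean, stdev


def t_cdf(t, df, steps=2000):
    """Student's t CDF, via Simpson's rule on the density from 0 to |t|."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    pdf = lambda x: const * (1 + x * x / df) ** (-(df + 1) / 2)
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    area = pdf(0) + pdf(upper)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3
    return 0.5 + math.copysign(area, t)


def one_sample_t(diffs):
    """Two-sided one-sample t-test of H0: mean(diffs) = 0. Returns (t, p)."""
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    p = 2 * (1 - t_cdf(abs(t), n - 1))
    return t, max(0.0, min(1.0, p))


def paired_tests(quadrants):
    """quadrants[location][treatment] is the list of quadrant values."""
    # (a) one mean per plot, i.e. per (location, treatment)
    plot_mean = {loc: {trt: mean(vals) for trt, vals in plots.items()}
                 for loc, plots in quadrants.items()}
    treatments = sorted(next(iter(plot_mean.values())))
    raw = {}
    for a, b in combinations(treatments, 2):
        # (b) signed differences within each location
        diffs = [plot_mean[loc][a] - plot_mean[loc][b] for loc in plot_mean]
        # (c) one-sample t-test of the differences against 0
        raw[(a, b)] = one_sample_t(diffs)
    m = len(raw)  # number of comparisons, used for the Bonferroni correction
    return {pair: (t, p, min(1.0, p * m)) for pair, (t, p) in raw.items()}


# Simulated percentages: 6 locations x 3 treatments x 4 quadrants (made-up data).
rng = random.Random(1)
quadrants = {
    f"loc{i}": {trt: [rng.expovariate(1 / scale) for _ in range(4)]
                for trt, scale in (("A", 2.0), ("B", 5.0), ("C", 3.0))}
    for i in range(1, 7)
}

for pair, (t, p, p_adj) in paired_tests(quadrants).items():
    print(pair, f"t = {t:.2f}, p = {p:.3f}, Bonferroni p = {p_adj:.3f}")
```

Each test has only 5 degrees of freedom (6 locations minus 1), which is why the normality of the differences cannot really be checked and why power will be limited.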

If you absolutely have to have a non-parametric test, first do not even think of any permutation/bootstrap approach; your sample size is much too small.
You could use Kruskal-Wallis with post-hoc via Dunn's test, but 1) do this on the paired differences 2) K-W does not test the same null (it tests for stochastic superiority. And it certainly does not test for the medians!) 3) K-W suffers from a non-transitivity issue.
You could also use 3 Brunner-Munzel tests (again, on the 3 sets of paired differences). Brunner-Munzel is preferred to the Mann-Whitney U test (MWUt), because the MWUt suffers from the Behrens-Fisher problem (unequal variances). And use a multiple-comparison correction (MCC; Bonferroni?) to account for the multiple testing.
Last you could use Mood's median test, which works to compare 2 or more medians. But at 6 observations per group, I am afraid that this test would have very low power.

Honestly, in your situation, I would go with 3 Welch t-tests.

Last, a word of caution about using relative abundance (as a % of total mushrooms). Say that at a location, treatment A produces 10 mushrooms of species 1, and 10 of all the other species. The %abundance would be .5 (50%). Treatment B, in the next plot, produces 20 mushrooms of species 1, and 60 of all the other species. The %abundance would be .25 (25%). If you compare the %abundance, you conclude that treatment A is "better". But if you use absolute counts, you conclude that treatment B is "better". Which to use is very context dependent (it depends on what you will do based on the results, and you have not provided enough details for me to have a point of view on this). But you should make sure you are using the right variable (relative counts, or absolute).
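The arithmetic of that example, as a small Python sketch (the function name `relative_abundance` and the dictionary layout are just for illustration):

```python
def relative_abundance(target, others):
    """Share of the target species among all individuals counted."""
    return target / (target + others)


# (count of species 1, count of all other species) per treatment, from the example
plots = {"A": (10, 10), "B": (20, 60)}

for trt, (target, others) in plots.items():
    share = relative_abundance(target, others)
    print(f"Treatment {trt}: absolute = {target}, relative = {share:.2f}")
```

Treatment A wins on the relative measure (0.50 vs 0.25), treatment B on the absolute one (20 vs 10), so the two variables rank the treatments in opposite order.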

Thank you @jginestet I have some questions: first, I assume the paired differences are absolute, right? Also, I don't understand why we are testing the paired differences; shouldn't we make the Welch t-test with the pairwise average abundances? — Barcik
– Barcik, Commented Oct 2 at 14:54
@Barcik, differences are not absolute (as in absolute value), but signed (you want to see if treatment A has larger/smaller abundance). Now, you could use the %Abundance (a relative abundance), but it may be better if you used the "absolute" abundance (e.g. in count per unit surface, number of mushrooms per m^2). %abundance can be misleading (you could have a low count, but high relative abundance). And we test the paired differences because there is dependency across a given location; so you compute paired differences, and run a test across locations. — jginestet
– jginestet, Commented Oct 2 at 15:36
thank you. So, in this case it is a One Sample Welch's t-test using the paired differences (and mu=0), correct? — Barcik
– Barcik, Commented Oct 2 at 15:47
@Barcik, why adjust p-value? Because you are making 3 comparisons, and thus increase your chance of false positives. But you can of course decide not to do so (if, e.g. this is just a preliminary study). But in any case, you will need to address the issue of multiple comparisons (and justify your choice to do it, or not). Why Bonferroni? Because it is the simplest, and with only 3 comparisons, it is close enough (the other methods will not give very different results). But feel free to use any other method (BH is fine), or none at all... — jginestet
– jginestet, Commented Oct 3 at 16:07
@Barcik, the reason ANOVA gave "strange results" when used on the paired differences is that it tests which paired differences are different, not which treatments are different. The t-tests test which paired differences have means different from 0. But if you use the absolute values, then ANOVA tells you which treatment is "best". — jginestet
– jginestet, Commented Oct 3 at 16:12

EdM · Accepted Answer · 2025-10-04 16:42:21Z

The answer from @jginestet (+1) covers critical general issues. As noted there, the distinction between relative and total abundance can be very important in terms of biological interpretation, even if the statistical tests on relative abundance are correct.

The nature of the technical measurement and the resulting outcome measure can matter a lot in statistical analysis. Particularly for others who might come upon this question, I provide some further details.

In this case (from a comment on the question):

the technique I used detects the number of DNA snippets (counts) for each species. The abundance is the number of DNA counts of a species divided by the total DNA count.

Although you might get away with t-tests for treatment-associated relative abundance differences in this situation, with your focus on a single fungal species and only 3 treatments, there are better ways to work with compositional data from DNA sequencing. Those methods take into account the count-based nature of the sequencing method used here, the type of DNA analysis done, and technical sources of bias.

Count-based DNA sequencing

You have "many small values compared to larger values" (from another comment). The precision of estimating a "small value" can be quite different from that of measuring a "larger value." That can pose a problem with t-tests. Count-based DNA sequencing data are typically analyzed based on an underlying negative-binomial distribution or other methods that take into account how estimation errors depend on the magnitudes of the values. The Bioconductor sequencing workflow provides many resources.
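To illustrate how precision depends on magnitude, here is a minimal Python sketch. It assumes the common negative-binomial parametrization in which the variance is mean + dispersion × mean²; the dispersion value 0.1 and the function name `nb_cv` are made up for the example.

```python
import math


def nb_cv(mean_count, dispersion):
    """Coefficient of variation of a negative-binomial count
    with variance = mean + dispersion * mean**2."""
    variance = mean_count + dispersion * mean_count ** 2
    return math.sqrt(variance) / mean_count


# Relative noise for small versus large expected counts (dispersion is hypothetical)
for mu in (1, 10, 100, 1000):
    print(f"mean count {mu:>4}: CV = {nb_cv(mu, 0.1):.3f}")
```

Under these assumptions the relative error shrinks as the expected count grows, so small counts are measured much less precisely than large ones, whereas an ordinary t-test treats every observation as equally precise.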

Type of DNA analysis

For DNA analysis, you presumably amplified particular DNA regions specific to fungi, then sequenced those regions to distinguish variants associated with particular species (amplicon sequencing). That's only one way to use DNA to estimate microbial populations, however. See this web site on "Microbiome Sequencing for Understanding Microbial Diversity." Statistical tests appropriate for amplicon sequencing might not be the best choice for other DNA preparation/analysis methods.

Technical sources of bias

Even given the within-species focus of your study, there can be bias introduced by the sample preparation and analysis pipeline that limits the interpretation of your simple DNA ratios in terms of species abundance. As this review by Luo et al., "Extracting abundance information from DNA-based data," Molecular Ecology Resources 23: 174-189, (2023) says (emphasis in original):

the combination of species pipeline biases and pipeline noise still causes the number of DNA sequences assigned to a species in a sample to be an error-prone measure of the abundance of that species in that sample.

That review discusses such sources of bias and ways to try to correct for them. It's possible that this type of bias might differ even among your 4 technical replicates within each Plot, in which case you might be better off correcting for such replicate-specific bias before combining their results.
