$\begingroup$

I am creating and comparing ECDFs of percent cover values (0-100%) for three groups of biological observations. The dataset as a whole is large, with over a million observations. The three groups are highly unequal in size: group 1 is 90% of observations, group 2 is 7%, and group 3 is 3%. The data are strongly right-skewed, with most observations in all groups having very low percent cover.

Within each of the three groups, multiple species are represented, and each species has a different number of observations. No species appears in more than one group. So far I have taken a species-agnostic approach, essentially ignoring species identity.

My intent is to see whether the distribution of percent cover values differs between the three groups, and to see the order of stochastic dominance. I've created ECDFs and then run pairwise KS tests with a Bonferroni correction for multiple comparisons. Because the dataset is so large, the p-values are all very low, so I'm also paying attention to the visual ordering of the ECDFs and to the D statistics to get a sense of what's meaningful.
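To make the workflow concrete, here is a minimal sketch of it in Python with NumPy and SciPy. It runs on synthetic data, not my real data: the group names, sample sizes, and beta-distribution parameters are made up purely for illustration, and the Bonferroni step is the plain "multiply by the number of comparisons" version.

```python
from itertools import combinations

import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Synthetic, right-skewed stand-in data (assumption, NOT the real dataset).
# Values are percent cover in [0, 100]; sizes mimic the 90% / 7% / 3% split.
groups = {
    "group1": 100 * rng.beta(0.5, 8.0, size=9000),
    "group2": 100 * rng.beta(0.6, 6.0, size=700),
    "group3": 100 * rng.beta(0.8, 5.0, size=300),
}


def ecdf(values):
    """Return sorted values and the ECDF heights at those values."""
    x = np.sort(values)
    y = np.arange(1, x.size + 1) / x.size
    return x, y


pairs = list(combinations(groups, 2))  # 3 pairwise comparisons
results = {}
for a, b in pairs:
    res = ks_2samp(groups[a], groups[b])
    # Bonferroni: multiply each p-value by the number of comparisons, cap at 1.
    p_adj = min(1.0, res.pvalue * len(pairs))
    results[(a, b)] = (res.statistic, p_adj)
    print(f"{a} vs {b}: D = {res.statistic:.3f}, adjusted p = {p_adj:.3g}")
```

The D statistic is the largest vertical gap between two ECDFs, which is why I look at it alongside the plots when every p-value is tiny.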

My advisor is concerned that a few dominant species with many observations will skew the overall cross-species trends. She has suggested that I take 4 random observations from each species (4 being the median number of observations per species) and build the ECDFs from those data, which also makes the dataset a good bit smaller. I'm struggling to wrap my head around whether that's appropriate. It seems like manipulating the number and frequency of observations is a bad idea for ECDFs. We do get different results this way compared to using all observations, because some species are very dominant.
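Here is a minimal sketch of the subsampling my advisor proposes, again in Python on synthetic data. The species names, counts, and column names (`species`, `group`, `cover`) are hypothetical. One assumption I've had to make: a species with fewer than 4 observations keeps all of them.

```python
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)

# Hypothetical species with very unequal observation counts (assumption).
species_info = {
    "sp_a": ("group1", 5000),
    "sp_b": ("group1", 40),
    "sp_c": ("group2", 4),
    "sp_d": ("group2", 2),
    "sp_e": ("group3", 3000),
    "sp_f": ("group3", 10),
}

frames = []
for sp, (grp, n) in species_info.items():
    # Synthetic right-skewed percent cover values in [0, 100].
    cover = 100 * rng.beta(0.5, 6.0, size=n)
    frames.append(pd.DataFrame({"species": sp, "group": grp, "cover": cover}))
df = pd.concat(frames, ignore_index=True)

# Shuffle all rows, then keep at most 4 rows per species.
# Species with fewer than 4 observations keep everything they have.
N_PER_SPECIES = 4
sub = df.sample(frac=1, random_state=0).groupby("species").head(N_PER_SPECIES)


def ks_d(data, g1, g2):
    """KS D statistic between the cover values of two groups."""
    a = data.loc[data["group"] == g1, "cover"]
    b = data.loc[data["group"] == g2, "cover"]
    return ks_2samp(a, b).statistic


d_full = ks_d(df, "group1", "group3")
d_sub = ks_d(sub, "group1", "group3")
print(f"group1 vs group3: D (all obs) = {d_full:.3f}, D (subsample) = {d_sub:.3f}")
```

In the full data each observation counts equally, so the dominant species drive the ECDF; in the subsample each species contributes at most 4 points, so the ECDF describes something closer to a typical species than a typical observation.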

I'd appreciate some advice. I'm a student and an ecologist, not a statistician, so please take that into account. I am also trying some regression modeling, since that may be a better approach than ECDFs and KS tests, but here I'm looking for advice about the ECDF and KS test approach.

$\endgroup$
