I have two groups (A and B) and data on 3 different biomarkers (a, b and c). I need to investigate whether there is any difference between the groups in each biomarker, and whether the difference between groups A and B is most pronounced in biomarker a, b or c. The data are continuous but not normally distributed. They are not positively or negatively skewed, and the sample is relatively small, so I am thinking that a non-parametric test (e.g. Mann-Whitney?) would be appropriate. I know that you can compare median scores, but how can I then estimate/quantify the difference? When I have a significant difference, can I just divide the median of, e.g., biomarker a in group A by the median of biomarker a in group B (aA/aB) to get a ratio, and compare it with the other ratios that I calculate? I would be very grateful for some advice!
1 Answer
It helps to start by putting this in the context of using each biomarker individually to distinguish groups A and B. That could be done by using the continuous values of each biomarker to construct a receiver operating characteristic (ROC) curve for distinguishing the 2 groups. The area under that curve (AUC) is a measure of the ability of that biomarker to distinguish the groups. It's also the concordance, the fraction of comparable pairs with one member from each of group A and B for which the direction of the difference in the biomarker value agrees with group membership.
The Mann-Whitney test isn't a test on medians unless you know that the distributions of values between the two groups only differ in position, not in shape. But the Mann-Whitney U statistic is directly related to the AUC and thus to the general measure of concordance.
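To make the link between the Mann-Whitney U statistic and the AUC/concordance concrete, here is a minimal sketch in Python. The data are simulated, and the function names (`auc_from_ranks`, `auc_by_pairs`) and variable names are illustrative assumptions, not part of the question. It computes the AUC once from the rank-sum form of U and once by brute-force counting of concordant pairs, so you can check that the two agree.

```python
import numpy as np
from scipy.stats import mannwhitneyu, rankdata


def auc_from_ranks(x, y):
    """AUC = U_x / (n_x * n_y), with U_x taken from the rank sum of x."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n_x, n_y = len(x), len(y)
    ranks = rankdata(np.concatenate([x, y]))  # ties receive midranks
    u_x = ranks[:n_x].sum() - n_x * (n_x + 1) / 2
    return u_x / (n_x * n_y)


def auc_by_pairs(x, y):
    """Concordance: P(X > Y) + 0.5 * P(X == Y) over all cross-group pairs."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    diff = x[:, None] - y[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size


# Simulated, skewed values for one biomarker (assumption: no real data here).
rng = np.random.default_rng(0)
group_a = rng.lognormal(mean=0.5, sigma=1.0, size=30)
group_b = rng.lognormal(mean=0.0, sigma=1.0, size=25)

auc = auc_from_ranks(group_a, group_b)
p_value = mannwhitneyu(group_a, group_b, alternative="two-sided").pvalue
print(f"AUC (concordance) = {auc:.3f}, Mann-Whitney p = {p_value:.3f}")
```

An AUC of 0.5 means the biomarker does not separate the groups at all; values near 0 or 1 mean nearly complete separation, so the AUC doubles as an effect size that is comparable across biomarkers measured on different scales.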
So one approach would be to gauge whether your individual biomarkers differ significantly in concordance with the A versus B groups. That comparison would probably best be done by bootstrapping your data rather than relying on asymptotic normality of the Mann-Whitney statistic.
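One way such a bootstrap comparison might look is sketched below. It assumes that both biomarkers were measured on the same subjects, so subjects are resampled within each group and each subject's biomarker values are kept together. The data are simulated, the function name `bootstrap_auc_difference` is hypothetical, and the percentile interval is just one of several possible bootstrap intervals. The sketch also assumes both biomarkers tend to be higher in group A; otherwise you would first need to decide how to treat AUCs below 0.5.

```python
import numpy as np
from scipy.stats import rankdata


def auc(x, y):
    """Concordance of x over y via the Mann-Whitney rank-sum identity."""
    n_x, n_y = len(x), len(y)
    ranks = rankdata(np.concatenate([x, y]))
    return (ranks[:n_x].sum() - n_x * (n_x + 1) / 2) / (n_x * n_y)


def bootstrap_auc_difference(data_a, data_b, n_boot=2000, seed=0):
    """Percentile bootstrap CI for AUC(column 0) - AUC(column 1).

    data_a, data_b: arrays of shape (n_subjects, 2); one row per subject,
    one column per biomarker. Rows are resampled within each group.
    """
    rng = np.random.default_rng(seed)
    data_a, data_b = np.asarray(data_a, float), np.asarray(data_b, float)
    observed = auc(data_a[:, 0], data_b[:, 0]) - auc(data_a[:, 1], data_b[:, 1])
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        res_a = data_a[rng.integers(0, len(data_a), len(data_a))]
        res_b = data_b[rng.integers(0, len(data_b), len(data_b))]
        diffs[i] = auc(res_a[:, 0], res_b[:, 0]) - auc(res_a[:, 1], res_b[:, 1])
    low, high = np.percentile(diffs, [2.5, 97.5])
    return observed, low, high


# Simulated example: the first biomarker is shifted in group A, the second is not.
rng = np.random.default_rng(42)
n_a, n_b = 40, 40
data_a = np.exp(np.column_stack([rng.normal(2.0, 1.0, n_a),
                                 rng.normal(0.0, 1.0, n_a)]))
data_b = np.exp(rng.normal(0.0, 1.0, (n_b, 2)))

observed, low, high = bootstrap_auc_difference(data_a, data_b)
print(f"AUC difference = {observed:.3f}, 95% CI [{low:.3f}, {high:.3f}]")
```

If the interval excludes 0, the two biomarkers differ in how well they distinguish the groups. With small groups the interval will be wide, which is itself useful information.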
A potentially more informative approach, if the numbers in each group aren't too small, would be to model the A/B binary distinction as a logistic regression, starting with all 3 continuous biomarker values in a single model as predictors modeled as flexibly as possible, e.g. with splines. That would tell you whether you might be better off using all 3 biomarkers rather than restricting yourself to 1. You could test the contributions of the individual biomarkers by comparing that full model against models that include subsets of the biomarkers.
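A rough sketch of that modeling approach, assuming `statsmodels` (with its formula interface) and `pandas` are available. The data are simulated, the column names (`in_group_a`, `a`, `b`, `c`) and helper names are illustrative, and the B-spline basis with 4 degrees of freedom per biomarker is an arbitrary choice. Each biomarker's contribution is tested by a likelihood-ratio test of the full model against the model that omits it.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import chi2

# Simulated data (assumption): a has a linear effect, b a curved one, c none.
rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({"a": rng.normal(size=n),
                   "b": rng.normal(size=n),
                   "c": rng.normal(size=n)})
log_odds = 1.2 * df["a"] + 0.6 * df["b"] ** 2 - 0.5
df["in_group_a"] = rng.binomial(1, 1 / (1 + np.exp(-log_odds)))


def fit_logit(biomarkers):
    """Logistic regression with a cubic B-spline (df=4) for each biomarker."""
    terms = " + ".join(f"bs({name}, df=4)" for name in biomarkers)
    return smf.logit(f"in_group_a ~ {terms}", data=df).fit(disp=0)


def lr_test(full, reduced):
    """Likelihood-ratio test of a nested (reduced) model against the full one."""
    stat = 2 * (full.llf - reduced.llf)
    dof = int(full.df_model - reduced.df_model)
    return stat, dof, chi2.sf(stat, dof)


full = fit_logit(["a", "b", "c"])
results = {}
for dropped in ["a", "b", "c"]:
    kept = [name for name in ["a", "b", "c"] if name != dropped]
    results[dropped] = lr_test(full, fit_logit(kept))
    stat, dof, p = results[dropped]
    print(f"dropping {dropped}: LR chi2 = {stat:.2f} on {dof} df, p = {p:.4f}")
```

With small groups, a model this flexible can easily overfit or fail to converge, so fewer spline degrees of freedom (or plain linear terms) may be all that the data can support.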
- I really appreciate you taking the time to answer my questions, thank you. Maybe I need to clarify what I am looking for. I have group A, for which I have data on 3 different biomarkers (blood samples, actually), and the same goes for group B. What I aim to do is to see if there is any difference/increase in each separate biomarker between groups A and B, and to somehow quantify the increase, to be able to tell in which biomarker the difference is the most pronounced. – Med stud Karl, commented Sep 29, 2021 at 15:34
- I should also say, if that was not obvious already, that I am quite new to statistics. I looked at the concepts of bootstrapping and ROC, but did not really see how I should use them to answer my question. – Med stud Karl, commented Sep 29, 2021 at 15:38
- @Olof some problems can be approached from two different directions. You could approach this from the direction you seem to be thinking about: how much the values of a biomarker differ between groups. The other approach is how well the values of a biomarker can distinguish the two groups. The relationship between Mann-Whitney and AUC/concordance puts those approaches together. Whichever biomarker shows the highest AUC/concordance has "the most pronounced" difference in an important sense. If you are working with biomarkers, you need to become familiar with ROC, AUC, and related concepts. – EdM, commented Sep 29, 2021 at 16:06
- Thank you for your advice! I will dig into these concepts for sure. I am thinking that maybe I am too fixated on the median value? Maybe there is some other way to see in which biomarker the difference is the most and the least pronounced? – Med stud Karl, commented Sep 30, 2021 at 10:26
- @MedstudKarl don't get fixated on a single characteristic of a probability distribution. In your case, I'd recommend looking at density plots (essentially smoothed histograms) of the values of each biomarker, broken down into your two groups for comparison. That will show much more information about the biomarkers and their associations with the groups than any simple set of statistics (mean or median, standard deviation or median absolute deviation, skewness, etc.) can. And think hard about whether you really need to choose among the biomarkers; a combination of markers can be better. – EdM, commented Sep 30, 2021 at 14:58