How to compare three groups (significantly higher and significantly lower)

Question

I have a control group with two replicates and two treated groups with two replicates each. I want to know how I can identify the samples that are significantly different between control and treated 1 (higher expression) and also significantly different between control and treated 2 (lower expression).

Here is some example data:

df<-structure(list(C1 = c(0.003926348, 0.001642442, 6.72e-05, 0.000314789, 0.00031372, 0.000196342, 0.01318432, 8.86e-05, 0.005671017, 0.003616196, 0.026635645, 0.001136402, 0.000161111, 0.005777738, 0.000145104, 0.000996546, 4.27e-05, 0.000114159, 0.001152384, 0.002860251, 0.000284873), C2 = c(0.003901373, 0.001526195, 6.3e-05, 0.000387266, 0.000312458, 0.000256647, 0.012489205, 0.00013071, 0.005196136, 0.003059834, 0.024624562, 0.001025486, 0.000144964, 0.005659078, 0.000105755, 0.000844871, 5.88e-05, 0.000118831, 0.000999354, 0.002153167, 0.000214486), T1 = c(0.003646894, 0.001484503, 4.93e-05, 0.00036715, 0.000333378, 0.000244199, 0.010286787, 6.48e-05, 0.006180042, 0.00387491, 0.025428464, 0.001075376, 0.000122088, 0.005448152, 0.000103301, 0.000974826, 4.32e-05, 0.000109876, 0.001030364, 0.002777244, 0.000221169), T2 = c(0.00050388, 0.001135969, 0.000113829, 2.14e-06, 0.00010293, 0.000315704, 0.01160593, 8.46e-05, 0.004495437, 0.003062559, 0.018662663, 0.002096675, 0.000214814, 0.002177849, 8.61e-05, 0.001057254, 3.27e-05, 0.000115822, 0.008133257, 0.021657018, 0.000261339), G1 = c(0.001496712, 0.001640965, 0.000129124, 3.02e-06, 0.000122839, 0.000305686, 0.01378774, 0.000199637, 0.00534668, 0.00300097, 0.023290941, 0.002595433, 0.000262479, 0.002926346, 0.000125655, 0.001302624, 4.89e-05, 0.000122862, 0.009851791, 0.017621282, 0.000197561), G2 = c(0.00114337, 0.001285636, 0.000122848, 2.46e-06, 9.1e-05, 0.000288897, 0.012288087, 0.000122286, 0.002575368, 0.002158011, 0.022008677, 0.002017026, 0.000241754, 0.003340175, 0.00013424, 0.001517655, 4.78e-05, 0.000110353, 0.008293286, 0.018999466, 0.000191129)), .Names = c("C1", "C2", "T1", "T2", "G1", "G2"), row.names = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "PP", "TT", "EE", "FF", "AS"), class = "data.frame")

The first two columns are the control, the next two columns are treated 1, and the last two columns are treated 2.

Please keep in mind what "sample" means in statistics. When you say "sample", do you mean "observation"? What are the rows in your data?
– Roland, Commented May 17, 2017 at 12:02
From your explanation I can infer that C1+T1 are the control and treatment for group 1, and C2+T2 are the control and treatment for group 2. What are G1 and G2?
– Guilherme Marthe, Commented May 23, 2017 at 13:49
@Guilherme Marthe C1 and C2 are control (one replicate), T1 and T2 are the first treatment (one replicate) and G1 and G2 are the second treatment (one replicate)
– nik, Commented May 23, 2017 at 19:08

Ashe · Accepted Answer · 2017-05-26 13:59:04Z

There are two kinds of questions you could be asking. One is whether there is a treatment difference as a whole, and the other is which individuals responded to treatment. TLDR at the bottom.

Overall Treatment Difference

Given you have repeated measures of each subject, I would recommend using a mixed model design instead of a classical ANOVA or any $t$-tests. This allows for different intercept values for each observation. This will also control for any multiplicity by modeling the differences between control and treatments simultaneously.

In R and using your data, I would do this:

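    library(nlme)
    library(reshape2)

    # Keep the row names as an observation identifier
    df$obs <- rownames(df)

    # Reshape to long format: one row per observation and column
    data.lme <- melt(data=df, id.vars='obs', value.name='score', variable.name='trt')

    # The treatment level is the first letter of the column name (C, T or G)
    data.lme$trt.lvl <- substr(data.lme$trt, 1, 1)

    # Fixed effect for treatment, random intercept for each observation
    fit.lme <- lme(score ~ trt.lvl, random=~1|obs, data=data.lme)
    summary(fit.lme)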

This gives an output of:

    Linear mixed-effects model fit by REML
     Data: data.lme
            AIC       BIC   logLik
      -1041.937 -1027.876 525.9684

    Random effects:
     Formula: ~1 | obs
            (Intercept)    Residual
    StdDev: 0.005667867 0.002410868

    Fixed effects: score ~ trt.lvl
                       Value    Std.Error  DF   t-value p-value
    (Intercept) 0.0031333349 0.0012915635 103 2.4260015  0.0170
    trt.lvlG    0.0007085406 0.0005260946 103 1.3467933  0.1810
    trt.lvlT    0.0001948673 0.0005260946 103 0.3704036  0.7118
     Correlation:
             (Intr) trt.lG
    trt.lvlG -0.204
    trt.lvlT -0.204  0.500

    Standardized Within-Group Residuals:
             Min           Q1          Med           Q3          Max
    -3.457389008 -0.204662471 -0.000118683  0.106147927  4.551752397

    Number of Observations: 126
    Number of Groups: 21

So for the data you posted, neither treatment group differs significantly from the control (p = 0.181 for G and p = 0.712 for T), when accounting for repeated measures of individuals and using the usual $\alpha=0.05$ level of significance.
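Each treatment row in the fixed-effects table above is a difference from the control level C, so neither row compares treatment T with treatment G directly. One way to get that comparison, sketched below, is to refit the same model with G as the reference level. This assumes a long-format data frame shaped like data.lme. The data frame `toy` and its simulated scores are hypothetical and only stand in for the real data.

```r
library(nlme)

# Hypothetical toy data: 10 observations, 3 treatment levels, 2 replicates each.
set.seed(1)
obs_ids <- paste0("obs", 1:10)
toy <- expand.grid(obs = obs_ids, trt.lvl = c("C", "G", "T"), rep = 1:2,
                   stringsAsFactors = FALSE)
obs_effect <- setNames(rnorm(10, sd = 1), obs_ids)  # per-observation intercepts
toy$score <- 5 + obs_effect[toy$obs] + rnorm(nrow(toy), sd = 0.5)

# Reference level C: the coefficient rows are G - C and T - C.
toy$trt.lvl <- factor(toy$trt.lvl, levels = c("C", "G", "T"))
fit.c <- lme(score ~ trt.lvl, random = ~ 1 | obs, data = toy)

# Reference level G: the trt.ref.gT row is now T - G.
toy$trt.ref.g <- relevel(toy$trt.lvl, ref = "G")
fit.g <- lme(score ~ trt.ref.g, random = ~ 1 | obs, data = toy)

summary(fit.g)$tTable
```

The T - G estimate from the refit equals the difference of the two original coefficients; what the refit adds is a standard error and p-value for that specific contrast.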

Which individuals respond to treatment - Edited responding to comments

After rereading some of your comments and those of others (thank you @guilhermemarthe for encouraging me to think more about this), it seems like you want to know which observations had a positive mean change for Treatment G and a negative mean change for Treatment T. Further, you want to know whether these changes were statistically significant on a per-observation basis. Statistics isn't well designed to identify "significant differences" in individual observable units. Rather, it focuses on changes in (statistical) samples and extrapolates those changes to populations of interest. I think by trying to find "significant" changes on the per-individual level, you're asking the wrong question.

But pressing forward with that as best as I can discern, if you want to identify which samples went up and which went down, that is simply an assessment of the change in means for each sample from control to the respective treatment. In R, that looks like this:

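    # Mean score per observation and treatment level (columns C, G, T)
    data.mean <- dcast(data.lme, obs~trt.lvl, value.var='score', fun.aggregate=mean)

    # Change in mean from control for each treatment
    data.mdiff <- data.frame(obs=data.mean$obs,
                             trt.g=data.mean$G - data.mean$C,
                             trt.t=data.mean$T - data.mean$C)

    # Flag observations that went up under G and down under T
    data.mdiff$resp.g <- data.mdiff$trt.g > 0
    data.mdiff$resp.t <- data.mdiff$trt.t < 0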

The data frame then looks like:

       obs         trt.g         trt.t resp.g resp.t
    1    A -0.0025938195 -0.0018384735  FALSE   TRUE
    2   AS -0.0000553345 -0.0000084255  FALSE   TRUE
    3    B -0.0001210180 -0.0002740825  FALSE   TRUE
    4    C  0.0000608860  0.0000164645   TRUE  FALSE
    5    D -0.0003482875 -0.0001663825  FALSE   TRUE
    6    E -0.0002061695 -0.0000949350  FALSE   TRUE
    7   EE  0.0079966695  0.0035059415   TRUE  FALSE
    8    F  0.0000707970  0.0000534570   TRUE  FALSE
    9   FF  0.0158036650  0.0097104220   TRUE  FALSE
    10   G  0.0002011510 -0.0018904040   TRUE   TRUE
    11   H  0.0000513065 -0.0000349550   TRUE   TRUE
    12   I -0.0014725525 -0.0000958370  FALSE   TRUE
    13   J -0.0007585245  0.0001307195  FALSE  FALSE
    14   K -0.0029802945 -0.0035845400  FALSE   TRUE
    15   L  0.0012252855  0.0005050815   TRUE  FALSE
    16   M  0.0000990790  0.0000154135   TRUE  FALSE
    17   N -0.0025851475 -0.0019054075  FALSE   TRUE
    18   O  0.0000045180 -0.0000307290   TRUE   TRUE
    19   P  0.0004894310  0.0000953315   TRUE  FALSE
    20  PP -0.0000024000 -0.0000128000  FALSE   TRUE
    21  TT  0.0000001125 -0.0000036460   TRUE   TRUE

To sort so that observations meeting both criteria come first:

    data.mdiff$ord <- data.mdiff$resp.g & data.mdiff$resp.t
    data.mdiff <- data.mdiff[order(data.mdiff$ord, decreasing=TRUE),]

The flags in the resp.g and resp.t variables indicate which samples went up and which went down under the respective treatment. If you refine your definition of what "response" is, then you can adjust the thresholds for change in means accordingly. Any attempt to find "significance" of individual observations would only result in finding samples that deviate from some pre-defined distributional assumption (which is precisely what outlier detection is). I don't think you're looking for that.
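If "response" is refined to mean a change of at least some minimum size, the flags can be computed against a threshold instead of zero. The following is a sketch under that assumption. The data frame `toy.mean`, its values, and the 20% relative-change cutoff `min.rel.change` are hypothetical illustrations, not values taken from the question.

```r
# Hypothetical per-observation group means (columns C, G, T as in data.mean).
toy.mean <- data.frame(
  obs = c("A", "B", "C", "D"),
  C   = c(1.00, 2.00, 0.50, 4.00),
  G   = c(1.50, 2.10, 0.80, 3.00),
  T   = c(0.70, 1.95, 0.30, 5.00)
)

# Assumed cutoff: a change counts as a response only if it is at least
# 20% of the control mean. The number is arbitrary and for illustration.
min.rel.change <- 0.20

# Relative change from control for each treatment.
rel.g <- (toy.mean$G - toy.mean$C) / toy.mean$C
rel.t <- (toy.mean$T - toy.mean$C) / toy.mean$C

toy.mean$resp.g <- rel.g >=  min.rel.change  # went up under G
toy.mean$resp.t <- rel.t <= -min.rel.change  # went down under T
toy.mean$both   <- toy.mean$resp.g & toy.mean$resp.t

# Observations meeting both criteria.
as.character(toy.mean$obs[toy.mean$both])
```

A relative cutoff is used here because the scores in the question span several orders of magnitude; an absolute cutoff would work the same way with the division removed.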

Conclusion - TLDR

I think you're asking the wrong question to begin with. It seems like you need to first address the question of whether there actually is an overall treatment difference instead of trying to identify what samples "statistically responded." To look for responders necessarily assumes that response to treatment is real, and there doesn't seem to be evidence for that. That overall analysis (the first I presented) would tell you whether your intervention was effective in the (statistical) sample you observed. Once you establish what "response" looks like, then you can start determining whether individual samples meet that criterion without concerning yourself with significance.

Thanks for your message, but I want to identify those observations that are significantly different in the first treatment (higher values) and significantly different in the second treatment (lower values). Does that make sense? If you help me I will accept your answer.
– nik, Commented May 23, 2017 at 0:56
I'm actually concerned that your treatments don't have a significant effect overall, before I would consider any individual sample as being "higher" or "lower." Since there is no overall effect for either treatment, any "higher" or "lower" sample is likely due to random variation, and not any true response to that treatment. — Ashe
– Ashe, Commented May 23, 2017 at 13:15
I'm sorry, but as far as I know, you are not able to use the approach you suggest to compare between treatments, since the comparison is against the control group (or treatment level C in your data frame). Am I correct? — Guilherme Marthe
– Guilherme Marthe, Commented May 23, 2017 at 14:14
The fixed effects (trt.lvlG and trt.lvlT) and the associated significance are mean changes relative to the baseline group (trt.lvlC). So those mean differences are the comparison between the control group and the respective treatment groups. — Ashe
– Ashe, Commented May 23, 2017 at 14:27
But the significance is calculated with the null hypothesis relative to the control group. Unless something different is going on in the significance calculation due to the mixed-effects model that I am unaware of, you shouldn't use the significance from testing for differences between control X treatment to assess differences between treatment A X treatment B
– Guilherme Marthe, Commented May 23, 2017 at 15:02

MadDataScientist · Accepted Answer · 2017-05-18 13:51:10Z

The way you phrase the question makes it seem like you want to compare observations across treatments. However, if the 3 groups are separate, it doesn't make sense to compare single observations. Instead, you probably want to look at whether the group means are significantly different.

One option is to use two two-sample t-tests to compare the control group to each of the treatment groups. Building on your R code:

    # Combine replicates
    Control <- c(df$C1,df$C2)
    Treatment1 <- c(df$T1,df$T2)
    Treatment2 <- c(df$G1,df$G2)

    # Two-sample t-test for control group and treatment 1 group
    t.test(Control,Treatment1)

    # Two-sample t-test for control group and treatment 2 group
    t.test(Control,Treatment2)
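Because this runs two tests against the same control group, one may want to adjust the two p-values for multiple comparisons. Below is a minimal sketch using `p.adjust`. The simulated vectors `ctrl`, `trt1` and `trt2` are hypothetical stand-ins for the combined replicates, and the choice of the Holm method is an assumption.

```r
# Hypothetical stand-ins for the combined control and treatment values.
set.seed(7)
ctrl <- rnorm(42, mean = 0)
trt1 <- rnorm(42, mean = 0.2)
trt2 <- rnorm(42, mean = -0.2)

# Raw p-values from the two Welch two-sample t-tests.
p.raw <- c(
  T1 = t.test(ctrl, trt1)$p.value,
  T2 = t.test(ctrl, trt2)$p.value
)

# Holm adjustment for the two comparisons against control.
p.holm <- p.adjust(p.raw, method = "holm")

rbind(raw = p.raw, holm = p.holm)
```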

You could also perform ANOVA:

    # ANOVA
    groups = factor(rep(c('C','T1','T2'), each = 42))
    data <- c(Control,Treatment1,Treatment2)
    mod <- lm(data~groups)
    anova(mod)

For this example, the immediate results indicate that there is not a significant difference in the group means. However, you need to keep in mind that the normality assumption is severely violated (as can be seen by plotting the 3 groups). An alternative test may be more appropriate.
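One commonly used alternative when normality is doubtful is a rank-based test. The sketch below assumes that the Kruskal-Wallis test (all three groups at once) and the Wilcoxon rank-sum test (each treatment against control) are acceptable substitutes here. That choice is an assumption rather than something established above, and the skewed vectors `ctrl`, `trt1` and `trt2` are simulated purely for illustration. Like the t-tests, these tests treat all values as independent and ignore which row each value came from.

```r
# Hypothetical right-skewed data standing in for the three groups of 42 values.
set.seed(42)
ctrl <- rlnorm(42, meanlog = -6, sdlog = 1.5)
trt1 <- rlnorm(42, meanlog = -6, sdlog = 1.5)
trt2 <- rlnorm(42, meanlog = -6, sdlog = 1.5)

values <- c(ctrl, trt1, trt2)
grp    <- factor(rep(c("C", "T1", "T2"), each = 42))

# Formal check of normality for one group (small p-value suggests non-normal).
sw <- shapiro.test(ctrl)

# Rank-based analogue of the one-way ANOVA.
kw <- kruskal.test(values ~ grp)

# Rank-based analogue of the two-sample t-tests, control vs each treatment.
w1 <- wilcox.test(ctrl, trt1)
w2 <- wilcox.test(ctrl, trt2)

c(shapiro = sw$p.value, kruskal = kw$p.value, T1 = w1$p.value, T2 = w2$p.value)
```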

Can you tell me your alternative method? I want those observations that are significantly changed in the treatment versus control (higher values) while they are significantly different in the second treatment (lower values).
– nik, Commented May 23, 2017 at 0:58
