I ran a simple linear mixed model using lmer on a dataset of participants who had been randomly assigned to one of two conditions and measured on a continuous outcome at baseline, post-intervention, and follow-up.

The model took the form lmer(outcome ~ timepoint * condition + (1|cluster/ID), data = data)

The results indicated no significant timepoint × condition interaction. However, the planned comparisons I ran with emmeans, contrasting the marginal mean group scores at post-intervention and again at follow-up, were both significant.

Since the planned comparisons drew on the model generated in the first step, how is it that the interaction is non-significant but those two comparisons are significant?

The results of the model and then the comparisons are below:

> ## Model ##
> lmer(outcome ~ timepoint * condition + (1 | cluster/ID), data = data)
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: outcome ~ timepoint * condition + (1 | cluster/ID)
   Data: data
REML criterion at convergence: 1294.2

Scaled residuals:
     Min       1Q   Median       3Q      Max
-3.03115 -0.49191 -0.01708  0.51935  2.43162

Random effects:
 Groups     Name        Variance Std.Dev.
 ID:cluster (Intercept) 1.6674   1.291
 cluster    (Intercept) 0.1176   0.343
 Residual               3.0743   1.753
Number of obs: 302, groups:  ID:cluster, 106; cluster, 6

Fixed effects:
                               Estimate Std. Error        df t value Pr(>|t|)
(Intercept)                   5.161e+00  3.878e-01 4.015e+00  13.309  0.00018 ***
timepoint1                    2.846e-01  3.589e-01 2.044e+02   0.793  0.42868
timepoint2                    3.688e-03  3.615e-01 2.052e+02   0.010  0.99187
conditionTreatment            4.325e-01  5.197e-01 5.582e+00   0.832  0.43945
timepoint1:conditionTreatment 6.619e-01  4.941e-01 2.009e+02   1.340  0.18189
timepoint2:conditionTreatment 6.067e-01  4.974e-01 2.017e+02   1.220  0.22393
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) tmpnt1 tmpnt2 trtmnT tmp1:T
timepoint1  -0.427
timepoint2  -0.423  0.478
conditionTr -0.746  0.318  0.316
tmpnt1:conT  0.310 -0.726 -0.347 -0.453
tmpnt2:conT  0.308 -0.347 -0.727 -0.450  0.485

> ## Comparisons ##
> mod_outcome <- lmer(outcome ~ timepoint * condition + (1 | cluster/ID), data = data)
> emm_outcome <- emmeans(mod_outcome, specs = pairwise ~ condition:timepoint, adjust = "none")
> emm_outcome$contrasts
 contrast                                    estimate    SE  df t.ratio p.value
 Control timepoint0 - Treatment timepoint0    -0.1755 0.352 292  -0.498  0.6190
 Control timepoint0 - Control timepoint1      -0.2939 0.354 203  -0.829  0.4079
 Control timepoint0 - Treatment timepoint1    -1.1232 0.354 292  -3.172  0.0017
 Control timepoint0 - Control timepoint2      -0.0162 0.357 205  -0.045  0.9638
 Control timepoint0 - Treatment timepoint2    -0.7938 0.356 292  -2.230  0.0265
 Treatment timepoint0 - Control timepoint1    -0.1184 0.364 292  -0.325  0.7452
 Treatment timepoint0 - Treatment timepoint1  -0.9478 0.338 196  -2.806  0.0055
 Treatment timepoint0 - Control timepoint2     0.1593 0.366 292   0.435  0.6641
 Treatment timepoint0 - Treatment timepoint2  -0.6183 0.340 197  -1.821  0.0701
 Control timepoint1 - Treatment timepoint1    -0.8293 0.366 292  -2.268  0.0241
 Control timepoint1 - Control timepoint2       0.2777 0.366 196   0.758  0.4494
 Control timepoint1 - Treatment timepoint2    -0.4999 0.367 292  -1.361  0.1747
 Treatment timepoint1 - Control timepoint2     1.1070 0.368 292   3.009  0.0029
 Treatment timepoint1 - Treatment timepoint2   0.3294 0.341 196   0.966  0.3352
 Control timepoint2 - Treatment timepoint2    -0.7776 0.370 292  -2.104  0.0363

Degrees-of-freedom method: kenward-roger
  • Please provide more details about the model and a summary of the results that you cite. I suspect that this has to do with the way that the time variable was coded into your model, but without more information it's hard to know. Please do that by editing the question, as comments are easy to overlook and can be deleted. Commented Jul 10, 2023 at 12:17

1 Answer


There are a few possibilities here.

First, there is no assurance that an omnibus significance test on an interaction will agree with pairwise comparisons among the levels of the factors involved in that interaction.

Second, you show only the "significance" of the 2 individual coefficients associated with the interaction, not of the interaction term as a whole. To tell whether the interaction as a whole is "significant," you need a combined Wald test on those coefficients, or a likelihood-ratio test between models with and without the interaction (fit with ML rather than REML).
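Either test can be run in a couple of lines. This is a sketch assuming the object and variable names from the question (mod_outcome, outcome, timepoint, condition, cluster, ID):

```r
library(lmerTest)   # lmer with Satterthwaite/Kenward-Roger tests
library(emmeans)

# Option 1: likelihood-ratio test. Refit both models with ML (REML = FALSE),
# since REML fits with different fixed effects are not comparable.
mod_full    <- lmer(outcome ~ timepoint * condition + (1 | cluster/ID),
                    data = data, REML = FALSE)
mod_reduced <- lmer(outcome ~ timepoint + condition + (1 | cluster/ID),
                    data = data, REML = FALSE)
anova(mod_reduced, mod_full)   # chi-squared test on 2 df for the interaction

# Option 2: joint F-tests for each term in the original (REML) model
joint_tests(mod_outcome)
```

Either way you get a single 2-df test of the interaction, rather than two separate 1-df coefficient tests.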

Third, you specified no multiple-comparison correction for the pairwise comparisons that you performed (adjust = "none"). It looks like some but not all of those would still be "significant" after such a correction. You need to apply one; offsetting that somewhat, you don't seem to need all 15 pairwise comparisons to test what seems to be your hypothesis. If you select a few comparisons (based on the experimental setup, not on the results of the model), the multiple-comparison correction will be less restrictive than when you do all 15.
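For example (again assuming the names from the question), you could either apply an adjustment across all 15 contrasts or, better, restrict attention to the between-condition comparisons at each timepoint:

```r
library(emmeans)

# Adjusted version of what you ran: the multivariate-t ("mvt") adjustment
# accounts for the correlation among all 15 contrasts
emm_all <- emmeans(mod_outcome, specs = pairwise ~ condition:timepoint,
                   adjust = "mvt")
emm_all$contrasts

# Pre-specified alternative: only Control vs Treatment within each timepoint
# (3 contrasts instead of 15), with a simple Bonferroni correction
emm_by_time <- emmeans(mod_outcome, ~ condition | timepoint)
pairs(emm_by_time, adjust = "bonferroni")
```

Fewer pre-specified contrasts means a smaller multiplicity penalty on each one.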

Fourth, although it's not "statistically significant," the timepoint0 values of the conditionTreatment group are somewhat higher (by the value of the associated coefficient, 0.43) than those of the control group (given by the intercept, 5.2). That's of similar magnitude to the estimated post-intervention treatment effect of ~0.6 units. So even if the conditionTreatment group's post-intervention outcome differs from the control group's, you have to decide how much of that is due to the difference in pre-intervention values.

  • Thanks for the insights. I am familiar with the idea of omnibus tests differing from pairwise comparisons, but just wondered if there was some other explanation. Commented Jul 11, 2023 at 12:03
  • @Biscuity think more carefully about the contrasts that you chose. I think what you want for a null hypothesis is that the treatment_time1-treatment_time0 difference is the same as the control_time1-control_time0 difference. Similarly for time2-time0 within-group differences. You don't need all those pairwise comparisons, which hurt power (with proper correction for multiple comparisons) if they aren't testing pre-specified hypotheses of interest. Commented Jul 11, 2023 at 12:33
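The difference-in-differences null hypotheses described in that comment can be tested directly with emmeans interaction contrasts (a sketch, assuming the mod_outcome object from the question):

```r
library(emmeans)

# Does the change from timepoint0 differ between Treatment and Control?
# "trt.vs.ctrl" compares each later level to the first level of each factor,
# so this yields two contrasts: (T1-T0, Trt-Ctrl) and (T2-T0, Trt-Ctrl) --
# exactly the interaction coefficients, tested as pre-specified hypotheses.
contrast(emmeans(mod_outcome, ~ condition * timepoint),
         interaction = c("trt.vs.ctrl", "trt.vs.ctrl"))
```

These two contrasts, rather than all 15 pairwise comparisons, match the randomized design and carry the smallest multiplicity burden.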
