Modeling repeated measures data in R - Interpretation and Validation

Question

I am currently working on a case-control study in which patients and controls are assessed at five different time intervals. The aim of the study is to assess possible differences in the response variable between the Healthy and Patient groups at each phase of the experiment. I have fitted a linear mixed model with the lme4 package in R and have a few questions about interpreting the results.

Below is the code to reproduce the results:

```r
library(lme4)
library(lmerTest)  # lmerTest: adds the df and Pr(>|t|) columns to summary()
library(emmeans)

set.seed(534)
n_subjects   <- 50  # number of subjects
n_timepoints <- 5   # number of repeated measurements per subject

subject_ids <- rep(1:n_subjects, each = n_timepoints)
timepoints  <- rep(1:n_timepoints, times = n_subjects)
group <- rep(sample(c("Healthy", "Patient"), size = n_subjects, replace = TRUE),
             each = n_timepoints)
random_intercepts <- rnorm(n_subjects, mean = 0, sd = 2)
response <- rnorm(n_subjects * n_timepoints) + random_intercepts

# Create a data frame
simulated_data <- data.frame(
  SubjectID = factor(subject_ids),
  Timepoint = factor(timepoints),
  Group     = factor(group),
  Response  = response
)

# Fit a linear mixed model with a random intercept per subject
lmm_model <- lmer(Response ~ Group * Timepoint + (1 | SubjectID), data = simulated_data)
summary(lmm_model)

# Interaction plot of the estimated marginal means with confidence intervals
emmip(lmm_model, Group ~ Timepoint, data = simulated_data, CIs = TRUE, xlab = "PHASE")

# Simple contrasts: Group within each Timepoint, and Timepoint within each Group
emm <- emmeans(lmm_model, ~ Group * Timepoint)
contrasts_phases <- pairs(emm, simple = "each", adjust = "Bonferroni")
print(contrasts_phases)
```
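One detail in the simulation is worth checking before interpreting the fit. `random_intercepts` has one value per subject (length 50), while the response has 250 rows, so `rnorm(n_subjects * n_timepoints) + random_intercepts` recycles the 50 values in row order instead of matching them to `SubjectID`. Each subject therefore receives five different "intercepts", which behaves more like extra residual noise than a subject-level effect; this may also explain why every df in the output is 240 (250 rows minus 10 fixed effects). The base-R sketch below shows the recycling and one common way to align the values, `random_intercepts[subject_ids]`. Its random draws will not match the question's, and the helper names (`intercept_recycled`, `intercept_aligned`, `n_distinct`) are hypothetical.

```r
set.seed(534)  # draws differ from the question's, which calls sample() first
n_subjects   <- 50
n_timepoints <- 5
n_rows       <- n_subjects * n_timepoints
subject_ids  <- rep(1:n_subjects, each = n_timepoints)
random_intercepts <- rnorm(n_subjects, mean = 0, sd = 2)
noise <- rnorm(n_rows)

# What `noise + random_intercepts` does: R recycles the 50 values in order
# along the 250 rows (250 is a multiple of 50, so R gives no warning)
intercept_recycled <- rep_len(random_intercepts, n_rows)
stopifnot(identical(noise + random_intercepts, noise + intercept_recycled))

# Aligned alternative: look up each row's intercept by its subject id
intercept_aligned <- random_intercepts[subject_ids]

# How many distinct intercept values does each subject receive?
n_distinct <- function(x) length(unique(x))
recycled_counts <- tapply(intercept_recycled, subject_ids, n_distinct)
aligned_counts  <- tapply(intercept_aligned,  subject_ids, n_distinct)

print(table(recycled_counts))  # five different values per subject
print(table(aligned_counts))   # a single value per subject
```

With the aligned version, the response line would read `response <- rnorm(n_subjects * n_timepoints) + random_intercepts[subject_ids]`, and the fitted numbers would differ from those shown in this thread.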

I'm struggling to interpret the results, as I haven't found similar examples online.

Looking at the results, the only significant term is the interaction GroupPatient:Timepoint4, which tells me how much the difference between Healthy and Patients in phase 4 differs from the difference in phase 1 (see: interpretation of interaction term in linear regression, with and without main effect).

```
> summary(lmm_model)
...
Fixed effects:
                        Estimate Std. Error       df t value Pr(>|t|)  
(Intercept)               0.1503     0.4807 240.0000   0.313   0.7548  
GroupPatient              0.7639     0.6798 240.0000   1.124   0.2622  
Timepoint2               -0.9397     0.6798 240.0000  -1.382   0.1681  
Timepoint3               -0.7947     0.6613 240.0000  -1.202   0.2307  
Timepoint4                0.6256     0.6732 240.0000   0.929   0.3537  
Timepoint5                0.7200     0.6868 240.0000   1.048   0.2955  
GroupPatient:Timepoint2  -0.8411     0.9613 240.0000  -0.875   0.3825  
GroupPatient:Timepoint3  -0.2299     0.9648 240.0000  -0.238   0.8119  
GroupPatient:Timepoint4  -2.3312     0.9617 240.0000  -2.424   0.0161 *
GroupPatient:Timepoint5  -0.9183     0.9617 240.0000  -0.955   0.3406  
```
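The link between these coefficients and the phase-wise group differences can be checked with plain arithmetic. Under R's default treatment coding, with Healthy and Timepoint 1 as reference levels, `GroupPatient` is the Patient minus Healthy difference at Timepoint 1, and `GroupPatient:TimepointK` is how much that difference changes between Timepoint 1 and Timepoint K. The sketch below only re-uses the rounded estimates printed above (the variable names are hypothetical); summing them reproduces the Healthy - Patient estimates from the emmeans contrasts up to rounding. So the -2.3312 for Timepoint 4 says the Patient minus Healthy gap is about 2.33 units lower in phase 4 than in phase 1, not how large the gap is in phase 4.

```r
# Fixed-effect estimates copied from the summary(lmm_model) output above
b_group <- 0.7639  # GroupPatient: Patient - Healthy at Timepoint 1
b_int   <- c(T2 = -0.8411, T3 = -0.2299, T4 = -2.3312, T5 = -0.9183)  # GroupPatient:TimepointK

# Patient - Healthy gap at each phase: main effect plus that phase's interaction term
gap_patient_minus_healthy <- b_group + c(T1 = 0, b_int)

# emmeans reports the contrast as Healthy - Patient, i.e. the negative
healthy_minus_patient <- -gap_patient_minus_healthy
print(round(healthy_minus_patient, 4))

# Each interaction coefficient is the change in the gap relative to Timepoint 1
print(round(gap_patient_minus_healthy[-1] - gap_patient_minus_healthy[["T1"]], 4))
```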

Examining the marginal means reveals a significant difference between the Healthy and Patient groups specifically in phase 4 of the experiment. Can I confidently infer that the data show a difference between the two groups only in this phase, or should additional checks be conducted to validate this conclusion? I am also unsure whether my contrast method is correct, because I often see pairwise comparisons that display all possible contrasts.

```
> contrasts_phases <- pairs(emm, simple="each", adjust="Bonferroni")
> print(contrasts_phases)
$`simple contrasts for Group`
Timepoint = 1:
 contrast          estimate    SE  df t.ratio p.value
 Healthy - Patient  -0.7639 0.685 240  -1.114  0.2662

Timepoint = 2:
 contrast          estimate    SE  df t.ratio p.value
 Healthy - Patient   0.0771 0.685 240   0.113  0.9105

Timepoint = 3:
 contrast          estimate    SE  df t.ratio p.value
 Healthy - Patient  -0.5340 0.690 240  -0.774  0.4400

Timepoint = 4:
 contrast          estimate    SE  df t.ratio p.value
 Healthy - Patient   1.5673 0.686 240   2.285  0.0232

Timepoint = 5:
 contrast          estimate    SE  df t.ratio p.value
 Healthy - Patient   0.1543 0.686 240   0.225  0.8222
```
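On the adjustment question: with `simple = "each"`, emmeans applies the `adjust` method within each `by` family, and for the Group contrasts each Timepoint family contains a single comparison, so as far as I can tell the Group p-values above are effectively unadjusted. If the five phase-wise Healthy vs Patient comparisons are treated as one family, the adjustment can be sketched with base R's `p.adjust()` on the printed p-values. emmeans can also pool families by overriding the `by` grouping when summarizing; check its documentation for the exact call. In this sketch, both Bonferroni and Holm raise the Timepoint 4 p-value from 0.0232 to 0.116.

```r
# Unadjusted Healthy - Patient p-values from the contrast table above, one per phase
p_phase <- c(T1 = 0.2662, T2 = 0.9105, T3 = 0.4400, T4 = 0.0232, T5 = 0.8222)

# Treat the five phase-wise group comparisons as a single family of tests
p_bonf <- p.adjust(p_phase, method = "bonferroni")  # min(1, 5 * p)
p_holm <- p.adjust(p_phase, method = "holm")        # step-down, never larger than Bonferroni

print(round(cbind(unadjusted = p_phase, bonferroni = p_bonf, holm = p_holm), 4))
```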

Why do you have time as categorical? I'm not saying it's wrong to do so, but why do it?
– Peter Flom, Commented Feb 13, 2024 at 22:13
@PeterFlom The time interval between one timepoint and the next is not always the same, e.g. timepoint 1 lasts 120 s, as does timepoint 2, but timepoint 3 lasts 520 s.
– Ed9012, Commented Feb 13, 2024 at 23:29
Then you shouldn't make them categorical; use the number of seconds instead.
– Peter Flom, Commented Feb 14, 2024 at 0:34
@PeterFlom That makes sense. However, in my case the time points reflect specific phases of the experiment, namely baseline (no stimulus), stimulus presentation, and post-stimulus periods, so I aim to compare the two groups across the different stimuli. While examining the marginal change in the response variable over time is interesting, my primary focus is on comparing the two groups at each stimulus-related time point.
– Ed9012, Commented Feb 14, 2024 at 1:13

Erik Ruzek · Accepted Answer · 2024-02-14 14:40:37Z

To help make your original regression results more interpretable, I suggest that you code timepoint so that the first occasion is given a value of 0. This is because in regression models the intercept is the expected outcome when all predictors equal 0. In the simulated data, timepoint == 0 lies outside your data, so the intercept is an extrapolation. To make this change in your simulation setup, I changed only the line that defines `timepoints`:

```r
timepoints <- rep(0:4, times = n_subjects)  # first occasion coded as 0
```
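Whether this recoding changes the estimates depends on how Timepoint enters the model. In the question's code `Timepoint` is wrapped in `factor()`, so labelling the levels 0 to 4 only renames the dummy variables: the first occasion is still the reference level and the estimates are identical. Moving the origin does matter when time is numeric, because the intercept is then the prediction at time 0. The sketch below illustrates both cases with `lm()` on illustrative random data (the seed, data, and variable names are hypothetical); `lm()` stands in for `lmer()` because the fixed-effect coding is the same.

```r
set.seed(1)  # hypothetical seed; illustrative data only
n_subjects   <- 50
n_timepoints <- 5
group    <- factor(rep(sample(c("Healthy", "Patient"), n_subjects, replace = TRUE),
                       each = n_timepoints))
time_num <- rep(1:n_timepoints, times = n_subjects)  # occasions numbered 1..5
response <- rnorm(n_subjects * n_timepoints)

# Categorical time: levels labelled 1..5 (question) versus 0..4 (answer)
d1 <- data.frame(Response = response, Group = group, Timepoint = factor(time_num))
d0 <- data.frame(Response = response, Group = group, Timepoint = factor(time_num - 1))
fit_cat_1 <- lm(Response ~ Group * Timepoint, data = d1)
fit_cat_0 <- lm(Response ~ Group * Timepoint, data = d0)

# Identical estimates; only the labels shift (e.g. Timepoint4 becomes Timepoint3)
same_estimates <- isTRUE(all.equal(unname(coef(fit_cat_1)), unname(coef(fit_cat_0))))
print(same_estimates)
print(data.frame(labels_1to5 = names(coef(fit_cat_1)), labels_0to4 = names(coef(fit_cat_0))))

# Numeric time: moving the origin from 1 to 0 changes what the intercept means
fit_num_1 <- lm(response ~ group * time_num)
fit_num_0 <- lm(response ~ group * I(time_num - 1))
intercept_shift <- coef(fit_num_1)[["(Intercept)"]] - coef(fit_num_0)[["(Intercept)"]]
# The shift equals minus the Healthy-group slope (one time unit of extrapolation)
print(c(intercept_shift = intercept_shift, minus_slope = -coef(fit_num_1)[["time_num"]]))
```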

With this change, the interpretation you put forward in your original question is nearly consistent with the model results. Specifically, the interaction between GroupPatient and Timepoint3 gives you a parameter estimate and standard error for the test of the null hypothesis that there is no difference between healthy people (the reference group) and patients at timepoint == 3 (the fourth measurement occasion).
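A caution when reading the interaction term: under treatment coding, `GroupPatient:Timepoint3` (in the 0-based labelling) is the change in the Patient minus Healthy gap between the first and fourth occasions, not the gap at the fourth occasion itself. The gap at a given phase is the sum of `GroupPatient` and that phase's interaction, or equivalently the `GroupPatient` coefficient after making that phase the reference level with `relevel()`. The sketch below checks this equivalence with `lm()` on illustrative random data (seed, data, and names are hypothetical); the fixed-effect coding works the same way in `lmer()`, and the emmeans simple contrasts in the question report these per-phase gaps directly.

```r
set.seed(2)  # hypothetical seed; illustrative data only
n_subjects   <- 50
n_timepoints <- 5
dat <- data.frame(
  Group     = factor(rep(sample(c("Healthy", "Patient"), n_subjects, replace = TRUE),
                         each = n_timepoints)),
  Timepoint = factor(rep(1:n_timepoints, times = n_subjects))
)
dat$Response <- rnorm(nrow(dat))

# Timepoint 1 as reference (R's default): gap at Timepoint 4 = main effect + interaction
fit_ref1 <- lm(Response ~ Group * Timepoint, data = dat)
b <- coef(fit_ref1)
gap_t4_from_sum <- b[["GroupPatient"]] + b[["GroupPatient:Timepoint4"]]

# Timepoint 4 as reference: GroupPatient is now the Patient - Healthy gap at Timepoint 4
dat$Timepoint_ref4 <- relevel(dat$Timepoint, ref = "4")
fit_ref4 <- lm(Response ~ Group * Timepoint_ref4, data = dat)
gap_t4_direct <- coef(fit_ref4)[["GroupPatient"]]

print(c(from_sum = gap_t4_from_sum, direct = gap_t4_direct,
        interaction_only = b[["GroupPatient:Timepoint4"]]))
```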

It is also worth testing whether the interaction coefficients are jointly different from 0. There are many ways to do that. With mixed models, it is common to use likelihood-ratio tests. In this case, we compare the model with all the interactions to a model with no interactions, just the main effects of Group and Timepoint.

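```r
# Main-effects-only model (no Group x Timepoint interaction)
lmm_model_1 <- lmer(Response ~ Group + Timepoint + (1 | SubjectID), data = simulated_data)
summary(lmm_model_1)

# Likelihood-ratio test of the four interaction terms; anova() refits both models with ML
anova(lmm_model_1, lmm_model)
```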

The likelihood-ratio test is significant, suggesting that there is evidence for an interaction between Group and Timepoint:

```
refitting model(s) with ML (instead of REML)
Data: simulated_data
Models:
lmm_model_1: Response ~ Group + Timepoint + (1 | SubjectID)
lmm_model: Response ~ Group * Timepoint + (1 | SubjectID)
            npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)  
lmm_model_1    8 1102.7 1130.9 -543.36   1086.7                       
lmm_model     12 1100.0 1142.3 -538.03   1076.0 10.671  4    0.03053 *
```
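The `Chisq` column in this table is the likelihood-ratio statistic: twice the difference in log-likelihood between the two ML fits, referred to a chi-square distribution whose degrees of freedom equal the number of extra parameters (12 - 8 = 4 interaction terms). The sketch below reproduces the arithmetic from the printed, rounded values (the variable names are hypothetical); the small discrepancies from 10.671 and 0.03053 come from that rounding.

```r
# Log-likelihoods and parameter counts copied from the anova() output above (ML fits)
loglik_main <- -543.36  # lmm_model_1: Group + Timepoint
loglik_int  <- -538.03  # lmm_model:   Group * Timepoint
npar_main   <- 8
npar_int    <- 12

# Likelihood-ratio statistic: twice the gain in log-likelihood from adding the interactions
lr_stat <- 2 * (loglik_int - loglik_main)
df_lr   <- npar_int - npar_main  # the 4 GroupPatient:Timepoint coefficients

# Upper-tail probability of a chi-square distribution with df_lr degrees of freedom
p_lr <- pchisq(lr_stat, df = df_lr, lower.tail = FALSE)
print(c(statistic = lr_stat, df = df_lr, p = p_lr))
```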

From here, you can use emmeans to do the contrasts and pairwise comparisons. Personally, I would show the same graph you did and report the simple contrasts.

My doubts are exactly about the simple contrasts I showed here. Are these meaningful? Do I need to perform some sort of adjustment?
– Ed9012, Commented Feb 15, 2024 at 11:47
If the phase represents something meaningful, then it makes sense that you would want to compare patient/non-patient outcomes within a given phase (the simple contrast). Is it meaningful to compare patients in phase 1 to non-patients in phase 3? If so, report that.
– Erik Ruzek, Commented Feb 15, 2024 at 15:53
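The choice of contrasts also determines the size of the family of tests, which matters for any multiplicity adjustment. For the 2 x 5 design here, the sketch below counts the comparisons implied by each strategy mentioned in the thread: Healthy vs Patient within each phase, phase vs phase within each group (the other half of `simple = "each"`), and all pairwise comparisons among the ten Group x Timepoint means, which as far as I know is what `pairs(emm)` without `simple` produces. The variable names are hypothetical, and the Bonferroni column is only a rough illustration of how the per-test threshold shrinks.

```r
n_groups <- 2
n_phases <- 5
n_cells  <- n_groups * n_phases  # Group x Timepoint means from emmeans(~ Group * Timepoint)

# Healthy vs Patient within each phase (the simple Group contrasts)
n_group_within_phase <- n_phases * choose(n_groups, 2)
# Phase vs phase within each group
n_phase_within_group <- n_groups * choose(n_phases, 2)
# Every cell against every other, e.g. Patient at phase 1 vs Healthy at phase 3
n_all_pairs <- choose(n_cells, 2)

counts <- c(group_within_phase = n_group_within_phase,
            phase_within_group = n_phase_within_group,
            all_pairs          = n_all_pairs)

# Per-comparison Bonferroni threshold for a familywise alpha of 0.05
print(cbind(comparisons = counts, bonferroni_alpha = 0.05 / counts))
```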
