
I'm fitting a linear mixed model using the nlme::lme() function in R to analyze repeated measures of a continuous outcome variable over time. My base model is structured as follows:

lme(variable ~ bs(seg, df = 4) * group + fumador + diabetes + hipercol + sexo + hta + educ, random = ~ 1 | id2, correlation = corCAR1(form = ~seg | id2), control = lmeControl(opt = "optim"), data = xxx, method = "REML", na.action = na.exclude) 

Here:

  • seg is time (in years),
  • group is an exposure group,
  • id2 is the individual identifier,
  • the within-individual correlation structure is CAR(1) (continuous-time AR(1)) over seg, since each individual has repeated measures.

However, I’m working with data pooled from 2 cohorts, and I’m considering whether to model the cohort membership as a random effect. My modified model looks like:

lme(variable ~ bs(seg, df = 4) * group + fumador + diabetes + hipercol + sexo + hta + educ, random = ~ cohort | id2, correlation = corCAR1(form = ~seg | id2), control = lmeControl(opt = "optim"), data = xxx, method = "REML", na.action = na.exclude) 

My question is:

  1. Is this formulation valid, or should cohort be specified at a higher level (e.g., random = ~1 | cohort/id2)? I am not fitting the model with lme4::lmer() because it does not support residual correlation structures such as CAR(1). Any clarification on the implications of modeling cohort as a random effect would be appreciated. Thanks!
  • You might find the posts on this page to be helpful. My quick sense is that with only 2 levels, cohort would be best used as a fixed factor. Commented Jun 12 at 12:33

1 Answer


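Your first model looks good, and I agree that you should probably account for cohort. With longitudinal data, a random (or varying) slope across individuals can only be estimated for a variable that changes within individuals, i.e., a time-varying variable. Because cohort has the same value at every measurement occasion for a given individual in your data, it is time-invariant, so there is no within-person variation from which a person-specific cohort slope (random = ~ cohort | id2) could be estimated. On the other hand, seg varies within individuals, and it is quite common in longitudinal models to allow the change in the outcome to vary over individuals (random = ~ seg | id2).*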
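A quick way to see this distinction in your own data is to count how many distinct values each variable takes within each individual. The sketch below uses a small, hypothetical long-format data frame `dat` in place of `xxx`; the helper names `n_cohort` and `n_seg` are illustrative only.

```r
## Hypothetical long-format data: 4 individuals, 3 visits each
dat <- data.frame(
  id2    = factor(rep(1:4, each = 3)),
  cohort = factor(rep(c("A", "A", "B", "B"), each = 3)),
  seg    = c(0, 1.5, 3.1, 0, 2.0, 4.2, 0, 1.1, 2.9, 0, 2.5, 5.0)
)

## Number of distinct values of each variable within each individual
n_cohort <- tapply(dat$cohort, dat$id2, function(x) length(unique(x)))
n_seg    <- tapply(dat$seg, dat$id2, function(x) length(unique(x)))

all(n_cohort == 1)  # TRUE: cohort never changes within id2 (time-invariant)
all(n_seg > 1)      # TRUE: seg varies within id2 (time-varying)
```

If `n_cohort` is 1 for every individual, a term such as `random = ~ cohort | id2` asks for a person-specific cohort effect that no single person's data can inform.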

Instead of treating cohort as a random slope, treat it as a fixed effect: include it as a covariate in your model, so that each cohort gets its own (non-varying) intercept. This is preferred over including it as a random intercept (e.g., random = ~ 1 | cohort/id2) because cohort has only two values. A random intercept can be estimated for a grouping variable with as few as 5 levels, but even that is dicey, and additional corrections are required to ensure that the resulting fixed-effect standard errors are unbiased.$^+$ Note that if you expected the effects of any of the other predictors to differ between cohorts, you would need to interact cohort with those variables.
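A minimal sketch of this specification follows, assuming simulated data in place of `xxx`. The data frame `dat`, the effect sizes, and the `car1_errors()` helper are hypothetical, and the other covariates (fumador, diabetes, etc.) are omitted for brevity. Cohort enters the fixed part as a two-level factor, while the random intercept and the CAR(1) structure stay at the id2 level, as in your first model.

```r
library(nlme)
library(splines)

## --- Hypothetical simulated data standing in for `xxx` ---
set.seed(1)
n_id  <- 150                                   # individuals
n_obs <- 5                                     # visits per individual
id2    <- factor(rep(seq_len(n_id), each = n_obs))
cohort <- factor(rep(sample(c("A", "B"), n_id, replace = TRUE), each = n_obs))
group  <- factor(rep(sample(c("ctrl", "exp"), n_id, replace = TRUE), each = n_obs))
seg    <- as.vector(replicate(n_id, sort(runif(n_obs, 0, 10))))  # years

## Hypothetical helper: CAR(1) errors, corr = phi^(time gap) within a person
car1_errors <- function(t, phi = 0.5, sd = 1) {
  e <- numeric(length(t))
  e[1] <- rnorm(1, sd = sd)
  for (j in seq_along(t)[-1]) {
    rho  <- phi^(t[j] - t[j - 1])
    e[j] <- rho * e[j - 1] + sqrt(1 - rho^2) * rnorm(1, sd = sd)
  }
  e
}
eps <- unlist(lapply(split(seg, id2), car1_errors))
u   <- rep(rnorm(n_id, sd = 2), each = n_obs)  # person-level intercepts
variable <- 10 + 0.3 * seg + 1.5 * (group == "exp") +
  2 * (cohort == "B") + u + eps
dat <- data.frame(variable, seg, group, cohort, id2)

## --- Cohort as a fixed covariate; random intercept per individual ---
fit <- lme(variable ~ bs(seg, df = 4) * group + cohort,
           random = ~ 1 | id2,
           correlation = corCAR1(form = ~ seg | id2),
           control = lmeControl(opt = "optim"),
           data = dat, method = "REML", na.action = na.exclude)

summary(fit)$tTable["cohortB", ]  # fixed-effect estimate for cohort B vs. A
## If effects might differ by cohort, add interactions such as cohort:group
## to the fixed formula.
```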

*If you are unsure whether such a slope is warranted, you can compare models with and without the random slope using a likelihood-ratio ($\chi^2$) test via the anova() function; when fitting with REML, both models must have the same fixed effects.
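A sketch of that comparison on hypothetical simulated data (`dat`, the effect sizes, and `car1_errors()` are illustrative, and a linear time trend replaces the spline to keep it short). Both models are fitted with REML and share identical fixed effects, which is what makes their REML likelihoods comparable. Because the null value of a variance lies on the boundary of its parameter space, the naive $\chi^2$ p-value from this test tends to be conservative.

```r
library(nlme)

## --- Hypothetical simulated data with person-specific time trends ---
set.seed(2)
n_id  <- 150
n_obs <- 5
id2    <- factor(rep(seq_len(n_id), each = n_obs))
cohort <- factor(rep(sample(c("A", "B"), n_id, replace = TRUE), each = n_obs))
group  <- factor(rep(sample(c("ctrl", "exp"), n_id, replace = TRUE), each = n_obs))
seg    <- as.vector(replicate(n_id, sort(runif(n_obs, 0, 10))))

## Hypothetical helper: CAR(1) errors, corr = phi^(time gap) within a person
car1_errors <- function(t, phi = 0.5, sd = 1) {
  e <- numeric(length(t))
  e[1] <- rnorm(1, sd = sd)
  for (j in seq_along(t)[-1]) {
    rho  <- phi^(t[j] - t[j - 1])
    e[j] <- rho * e[j - 1] + sqrt(1 - rho^2) * rnorm(1, sd = sd)
  }
  e
}
eps <- unlist(lapply(split(seg, id2), car1_errors))
b0  <- rep(rnorm(n_id, sd = 2),   each = n_obs)  # random intercepts
b1  <- rep(rnorm(n_id, sd = 0.3), each = n_obs)  # random slopes on seg
variable <- 10 + (0.3 + b1) * seg + 1.5 * (group == "exp") +
  2 * (cohort == "B") + b0 + eps
dat <- data.frame(variable, seg, group, cohort, id2)

ctrl <- lmeControl(opt = "optim", msMaxIter = 200)

## Random intercept only
m0 <- lme(variable ~ seg * group + cohort,
          random = ~ 1 | id2,
          correlation = corCAR1(form = ~ seg | id2),
          control = ctrl, data = dat, method = "REML")

## Random intercept and random slope on seg
m1 <- lme(variable ~ seg * group + cohort,
          random = ~ seg | id2,
          correlation = corCAR1(form = ~ seg | id2),
          control = ctrl, data = dat, method = "REML")

cmp <- anova(m0, m1)  # likelihood-ratio test: 2 extra parameters in m1
cmp
```

The two extra parameters in `m1` are the random-slope variance and the intercept-slope correlation.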

$^+$See McNeish (2017) for an excellent discussion of the problem, and of its solutions, for small level-2 (or level-3) sample sizes in mixed-effects models.

  • I have one doubt about the reasoning that only time-varying variables can be included as random effects. In repeated-measures experimental settings, for example when you have multiple blood tests from the same individual over time, you can adjust for the batch (experimental cell plate) to account for its variability with random = ~ 1 | batch/id. The batch varies because of other factors, not because of time. Commented Jun 12 at 14:29
  • @JavierHernando, I was speaking of random slopes, which can only be estimated for variables that vary within a person. Unfortunately, I'm not familiar with the design you mentioned. In your example, if persons are nested within batches, then I would treat batch as a random intercept. The code you mentioned, batch/id, seems to do exactly that. Commented Jun 12 at 15:31
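For the batch design discussed in these comments, a minimal nlme sketch follows. It assumes, hypothetically, that every individual belongs to exactly one batch (10 simulated batches); if individuals were measured across several batches, the factors would be crossed rather than nested and this specification would not apply. The CAR(1) grouping is written with the same nested structure (`form = ~ seg | batch/id`), since nlme checks that the correlation grouping is compatible with the random-effects grouping. All data and the `car1_errors()` helper are illustrative.

```r
library(nlme)

## --- Hypothetical design: individuals nested in batches (plates) ---
set.seed(3)
n_batch <- 10   # batches
n_per   <- 12   # individuals per batch
n_obs   <- 4    # repeated measures per individual
n_id    <- n_batch * n_per
batch <- factor(rep(seq_len(n_batch), each = n_per * n_obs))
id    <- factor(rep(seq_len(n_id), each = n_obs))
seg   <- as.vector(replicate(n_id, sort(runif(n_obs, 0, 5))))

## Hypothetical helper: CAR(1) errors, corr = phi^(time gap) within a person
car1_errors <- function(t, phi = 0.5, sd = 1) {
  e <- numeric(length(t))
  e[1] <- rnorm(1, sd = sd)
  for (j in seq_along(t)[-1]) {
    rho  <- phi^(t[j] - t[j - 1])
    e[j] <- rho * e[j - 1] + sqrt(1 - rho^2) * rnorm(1, sd = sd)
  }
  e
}
eps <- unlist(lapply(split(seg, id), car1_errors))
y <- 5 + 0.4 * seg +
  rep(rnorm(n_batch, sd = 1), each = n_per * n_obs) +  # batch intercepts
  rep(rnorm(n_id, sd = 1.5), each = n_obs) +           # individual intercepts
  eps                                                  # CAR(1) residuals
dat <- data.frame(y, seg, batch, id)

## Random intercepts for batch and for individual within batch
fit <- lme(y ~ seg,
           random = ~ 1 | batch/id,
           correlation = corCAR1(form = ~ seg | batch/id),
           control = lmeControl(opt = "optim", msMaxIter = 200),
           data = dat, method = "REML")

VarCorr(fit)  # variance components: batch, id within batch, residual
```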
