
I have a question about model building for a large dataset of about 5,000 subjects. I want to fit a linear mixed-effects model (LMEM) with multiple variables, and I have repeated measurements over time. However, for some of the subjects (around 1,200, i.e. less than 25%) I only have one measurement. This was no problem when fitting a simple LMEM with only a random intercept, since the dataset is large enough. But I ran into identifiability problems and non-convergence when adding a random slope to the model. So I'm wondering which approach is more common: removing the subjects that provide only one measurement and estimating a model with a random intercept and slope, or keeping the full dataset as it is and using only a random intercept.

The fixed-effect estimates are actually quite similar either way, but I want to follow the correct, more standard approach. More generally, I am wondering how to decide between a model with only a random intercept and one with both a random intercept and a random slope.

Thanks a lot!


1 Answer


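First, I would almost always advise against deleting observations for any reason, and in your case I definitely advise against it. By deleting observations you lose statistical power but, more importantly, you can introduce bias: if the subjects with only one measurement differ systematically from those with repeated measurements, the remaining sample no longer represents the population you started with.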
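
To make the bias point concrete, here is a small simulation sketch. It assumes a hypothetical dropout mechanism in which subjects with a higher true level are more likely to contribute only one measurement; the logistic dropout curve and the name `simulate` are illustrative choices, not taken from the question. Under that assumption, removing the singletons shifts the estimated mean level downward. If singletons were instead missing completely at random, dropping them would cost power but would not, on its own, bias this mean.

```python
import math
import random
from statistics import mean


def simulate(n_subjects=5000, seed=1):
    """Simulate each subject's true level plus a level-dependent chance of
    having only one measurement (a hypothetical, outcome-related dropout)."""
    rng = random.Random(seed)
    levels, singleton = [], []
    for _ in range(n_subjects):
        b = rng.gauss(0.0, 1.0)  # subject's true (random-intercept) level
        # Assumed mechanism: higher b -> higher probability of a single measurement.
        p_single = 1.0 / (1.0 + math.exp(-(2.0 * b - 2.0)))
        levels.append(b)
        singleton.append(rng.random() < p_single)
    return levels, singleton


levels, singleton = simulate()
share_single = sum(singleton) / len(singleton)
full_mean = mean(levels)
kept_mean = mean(b for b, s in zip(levels, singleton) if not s)

print(f"share of single-measurement subjects: {share_single:.2f}")
print(f"mean level, all subjects:             {full_mean:.3f}")
print(f"mean level, singletons removed:       {kept_mean:.3f}")
```

With these settings roughly a fifth of subjects end up as singletons, similar in size to the share described in the question, and the mean among the retained subjects is clearly below the full-sample mean.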

Think for a moment about what it means to fit random slopes. It means that you allow the slope for a fixed effect to vary by subject; in other words, each subject gets its own slope for that variable. So when a subject has only one observation, what slope could it have? To make sense of fitting a slope you need at least two observations per subject, taken at different values of that variable (here, different time points). Mixed models are robust to small cluster sizes, but when a large proportion of your clusters are singletons it doesn't make sense to fit random slopes.
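
The identifiability argument can be seen directly by computing a separate least-squares slope of the outcome on time for each subject. This is only an illustration: a mixed model does not estimate slopes one subject at a time like this (it shrinks them toward the fixed effect), but a subject with a single time point carries no within-subject information about its own slope either way. The long-format `(subject_id, time, y)` layout and the function name `per_subject_slopes` are hypothetical.

```python
from collections import defaultdict


def per_subject_slopes(rows):
    """Least-squares slope of y on time, computed separately for each subject.

    rows: iterable of (subject_id, time, y) tuples (hypothetical long format).
    Returns {subject_id: slope or None}; None marks a subject whose slope is
    not identifiable because it has fewer than two distinct time points.
    """
    by_subject = defaultdict(list)
    for subject, t, y in rows:
        by_subject[subject].append((t, y))

    slopes = {}
    for subject, obs in by_subject.items():
        times = [t for t, _ in obs]
        t_bar = sum(times) / len(times)
        sxx = sum((t - t_bar) ** 2 for t in times)
        if sxx == 0:  # one observation (or all at the same time): slope undefined
            slopes[subject] = None
            continue
        y_bar = sum(y for _, y in obs) / len(obs)
        sxy = sum((t - t_bar) * (y - y_bar) for t, y in obs)
        slopes[subject] = sxy / sxx
    return slopes


rows = [
    ("A", 0, 10.0), ("A", 1, 12.0), ("A", 2, 14.0),  # three visits
    ("B", 0, 9.0), ("B", 2, 8.0),                    # two visits
    ("C", 1, 11.0),                                  # a single visit
]
print(per_subject_slopes(rows))  # {'A': 2.0, 'B': -0.5, 'C': None}
```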

  • Thanks, Robert. Yes, that's exactly what I thought: fitting a slope with just one observation doesn't make sense, of course. That's why I thought I should remove these subjects from my dataset, but your answer convinced me that I should stick to the full dataset and fit only random intercepts. Commented Jul 2, 2020 at 19:19
  • @Kathrin Glad to hear it. It's much more important to retain your data than to delete observations in order to fit a more complex model. Commented Jul 2, 2020 at 19:38
  • One more question: I just realised that I have more measurements. I had removed the values at time zero because I added a covariate for the baseline value. Is it legitimate to include these time-zero measurements even though I use baseline as a covariate? In that case I wouldn't run into identifiability problems even when fitting a random slope. Commented Jul 3, 2020 at 18:48
  • @Kathrin That sounds like a good idea to me. In addition to helping identify random slopes, note that regressing follow-up on baseline is often a very dubious thing to do when analysing change. Commented Jul 4, 2020 at 19:08
  • Thank you once again! Sorry for all these questions, but I'm really not sure... Basically, we are interested in the change from baseline to a later measurement, so we also considered modelling the change instead of the value itself. But I think I have to look for good sources that explain this in more detail. Commented Jul 4, 2020 at 20:02
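
The suggestion in the comments, keeping the time-zero values as outcome rows rather than discarding them, can be sketched as a reshaping step. This assumes a hypothetical wide layout (`{subject_id: {"baseline": y0, "followups": [(time, y), ...]}}`) and hypothetical helper names; it only shows how many subjects gain a second distinct time point. In this sketch the time-zero value is used only as an outcome row, not also as a covariate, and it does not settle which model to fit afterwards.

```python
from collections import defaultdict


def to_long(subjects, include_baseline):
    """Build long-format rows (subject_id, time, y) from a hypothetical wide layout.

    If include_baseline is True, the time-zero value becomes an outcome row at
    time 0; otherwise only the follow-up measurements are kept.
    """
    rows = []
    for sid, rec in subjects.items():
        if include_baseline:
            rows.append((sid, 0, rec["baseline"]))
        rows.extend((sid, t, y) for t, y in rec["followups"])
    return rows


def n_with_two_time_points(rows):
    """Count subjects with at least two distinct time points (slope identifiable)."""
    times = defaultdict(set)
    for sid, t, _ in rows:
        times[sid].add(t)
    return sum(1 for ts in times.values() if len(ts) >= 2)


subjects = {
    "A": {"baseline": 10.0, "followups": [(1, 12.0), (2, 14.0)]},
    "B": {"baseline": 9.0, "followups": [(2, 8.0)]},
    "C": {"baseline": 11.0, "followups": [(1, 11.5)]},
}
print(n_with_two_time_points(to_long(subjects, include_baseline=False)))  # 1
print(n_with_two_time_points(to_long(subjects, include_baseline=True)))   # 3
```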
