7
$\begingroup$

I'm analyzing data from a repeated measures study using a mixed-effects linear model. My dependent variable is Y (an eye-tracking parameter: total fixation duration), and I'm investigating the effect of a continuous predictor X (Likert scale score), as well as a potential moderation by a categorical variable Z (3 groups).

I've specified two models:

  • A model with X predicting Y
  • A second model including the interaction term X * Z

After fitting the models, residual plots show a clear funnel shape, suggesting heteroscedasticity.

My questions:

  • Would a log transformation of the dependent variable (Y) be appropriate to address the heteroscedasticity, or could this issue be handled using bootstrapped standard errors instead?
  • Are there best practices for addressing this issue in mixed-effects models with repeated measures?

I'm using lme4 in R. Any suggestions or clarifications would be greatly appreciated!

Thank you all so much for the help!

$\endgroup$
4
  • 2
    $\begingroup$ Can you clarify the design? How many times is the outcome assessed per participant, and are the left eye and right eye tested separately? Please include pictures of the residual plot. $\endgroup$ Commented May 23 at 12:59
  • $\begingroup$ @AdamO, Thank you very much! Your answer was very helpful. I have edited my question to add more context about my study design. $\endgroup$ Commented May 23 at 14:12
  • $\begingroup$ Thanks for adding the residuals vs fitted plots. Could we also see the scale-location plot corresponding to the second model? (plot(x, which = 3)) $\endgroup$ Commented May 23 at 14:56
  • $\begingroup$ @BenBolker Thank you! I’ve just added the plot—hopefully it aligns with what you were referring to. $\endgroup$ Commented May 23 at 16:33

1 Answer

10
$\begingroup$

Since the response is a duration, which is strictly positive and usually right-skewed, I think we are all expecting that it should be log-transformed to obtain valid inference. We can't be sure that the log transform will "fix" the heteroscedasticity, but even if it doesn't, there is good reason to stick to modeling Y on the log scale.
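
As a rough sketch of how one might check this, the code below fits the same random-intercept model on the raw and on the log scale and compares how strongly the absolute residuals track the fitted values. The data are simulated and purely hypothetical (30 subjects, 10 trials each, multiplicative errors); the names `d`, `subject`, `m_raw`, and `m_log` are assumptions for illustration, not the poster's data or models.

```r
library(lme4)

set.seed(1)
# Hypothetical simulated data, not the data from the question
n_subj <- 30
n_trial <- 10
d <- data.frame(
  subject = factor(rep(seq_len(n_subj), each = n_trial)),
  X = sample(1:7, n_subj * n_trial, replace = TRUE),
  # Z is a between-subject grouping factor with 3 levels
  Z = factor(rep(rep(c("a", "b", "c"), length.out = n_subj), each = n_trial))
)
u <- rnorm(n_subj, sd = 0.3)  # subject-level random intercepts
# Multiplicative errors: the spread of Y grows with its mean (funnel shape)
d$Y <- exp(6 + 0.2 * d$X + u[as.integer(d$subject)] + rnorm(nrow(d), sd = 0.4))

m_raw <- lmer(Y ~ X * Z + (1 | subject), data = d)
m_log <- lmer(log(Y) ~ X * Z + (1 | subject), data = d)

# Crude numeric check: rank correlation between |residual| and fitted value
het_raw <- cor(abs(resid(m_raw)), fitted(m_raw), method = "spearman")
het_log <- cor(abs(resid(m_log)), fitted(m_log), method = "spearman")
print(c(raw = het_raw, log = het_log))

# Scale-location style plot for the log-scale model
plot(m_log, sqrt(abs(resid(.))) ~ fitted(.), type = c("p", "smooth"))
```

A correlation near zero on the log scale, together with a flat smoother in the plot, would suggest the transformation has removed most of the mean-variance dependence; it is a visual and informal check rather than a test.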

The residual plot is a subjective kind of diagnostic tool, all the more so if you are fitting a mixed model with imbalanced cluster sizes. I am not saying that formal testing is needed here. A slight funnel shape may indicate only mild heteroscedasticity, which may be just fine for all intents and purposes.

If there is substantial heteroscedasticity, you may need to think outside the box a little. If the design comprises separate assessments in the left and right eye, the balance permits a conditional likelihood approach: take within-participant differences in Y and the X covariates and model the independent differences, like an extension of the paired t-test. This is somewhat more precise than random intercepts, especially when their distribution is not normal. If the design involves multiple trials within participants and you have fit only a random intercept for each subject, you may not have adequately considered the "growth effects", that is, the effect of time as a fixed or random effect in the model. Testing fatigue, the common tendency of performance to taper with time, can be modeled using a fixed effect for time.
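
A minimal sketch of the time-effect idea, assuming a design with multiple ordered trials per participant (which the question does not confirm). The data are simulated with a hypothetical fatigue slope, and the variable names `trial`, `logY`, `m0`, and `m1` are made up for illustration.

```r
library(lme4)

set.seed(2)
# Hypothetical simulated data: 30 subjects, 10 ordered trials each
n_subj <- 30
n_trial <- 10
d <- data.frame(
  subject = factor(rep(seq_len(n_subj), each = n_trial)),
  trial   = rep(seq_len(n_trial), times = n_subj),
  X       = sample(1:7, n_subj * n_trial, replace = TRUE)
)
u <- rnorm(n_subj, sd = 0.3)  # subject-level random intercepts
# Assumed fatigue effect: log-duration drops by 0.05 per trial
d$logY <- 6 + 0.1 * d$X - 0.05 * d$trial +
  u[as.integer(d$subject)] + rnorm(nrow(d), sd = 0.4)

# Fit with ML (REML = FALSE) so models with different fixed effects are comparable
m0 <- lmer(logY ~ X + (1 | subject), data = d, REML = FALSE)
m1 <- lmer(logY ~ X + trial + (1 | subject), data = d, REML = FALSE)

# Likelihood ratio test for the fixed effect of time
lrt <- anova(m0, m1)
print(lrt)

trial_slope <- fixef(m1)[["trial"]]
print(trial_slope)
```

If the time trend appears to vary between participants, a random slope such as `(1 + trial | subject)` is the natural next step, at the cost of a more heavily parameterized model.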

$\endgroup$
1
  • 8
    $\begingroup$ I agree with all this. You could also try a Gamma GLMM with a log link (which should generally give fairly similar answers; the implied mean-variance relationship is the same as for a log-transformed LMM). One thing to be careful about with a log transformation (or link) is that it changes the meaning of the interaction; that is, an interaction on the log scale measures deviations from proportional (rather than additive) effects of the main predictor. The marginaleffects package may also be useful here. $\endgroup$ Commented May 23 at 15:36
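
To illustrate the comparison suggested in the comment above, here is a hedged sketch that fits a Gamma GLMM with a log link and a log-transformed LMM to the same simulated data and puts the fixed-effect estimates side by side. The data, the shape parameter, and all object names are assumptions for illustration only.

```r
library(lme4)

set.seed(3)
# Hypothetical simulated data: 30 subjects, 10 trials each, durations in seconds
n_subj <- 30
n_trial <- 10
d <- data.frame(
  subject = factor(rep(seq_len(n_subj), each = n_trial)),
  X       = sample(1:7, n_subj * n_trial, replace = TRUE)
)
u <- rnorm(n_subj, sd = 0.3)                     # subject-level random intercepts
mu <- exp(1 + 0.1 * d$X + u[as.integer(d$subject)])
shape <- 5                                       # assumed Gamma shape
d$Y <- rgamma(nrow(d), shape = shape, rate = shape / mu)

# Gamma GLMM with log link: models log(E[Y])
m_gamma <- glmer(Y ~ X + (1 | subject), data = d, family = Gamma(link = "log"))

# Log-transformed LMM: models E[log(Y)]
m_loglmm <- lmer(log(Y) ~ X + (1 | subject), data = d)

est <- cbind(gamma_glmm = fixef(m_gamma), log_lmm = fixef(m_loglmm))
print(est)
```

The slopes for X would be expected to be close, while the intercepts can differ because one model targets the log of the mean and the other the mean of the log. In both cases an `X * Z` interaction would describe departures from proportional, not additive, effects.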
