
I am using R to build the random-effects structure of my model, but I am ending up with a very complex model. It currently looks like this:

Model <- lmer(x ~ y * z * d * k + (1 + y * z + d | subject), data = Data, REML = FALSE, control = lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 100000))) 

I would like to know if I am simply overfitting. How can I get a predictive R-squared for linear mixed-effects models? Is there a way to calculate these values?

I am aware of the MuMIn package for getting R-squared values, but since I am concerned about overfitting, I wanted to see whether the degrees of freedom are biasing the AIC and p-values too much when comparing the models with anova().
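As a starting point for the anova() comparison mentioned above, here is a minimal sketch (assuming your own Data frame and the variable names from the model formula; the reduced random-effects structure shown is just one plausible simplification, dropping the y:z random slope):

```r
# Sketch: compare the full random-effects structure against a simpler
# one with a likelihood-ratio test, AIC, and BIC (models fit with ML,
# i.e. REML = FALSE, so the likelihoods are comparable).
library(lme4)

ctrl <- lmerControl(optimizer = "bobyqa",
                    optCtrl = list(maxfun = 100000))

full <- lmer(x ~ y * z * d * k + (1 + y * z + d | subject),
             data = Data, REML = FALSE, control = ctrl)

# Hypothetical reduced model: random slopes for y, z, d but no y:z slope
reduced <- lmer(x ~ y * z * d * k + (1 + y + z + d | subject),
                data = Data, REML = FALSE, control = ctrl)

anova(reduced, full)   # LRT plus AIC/BIC side by side
AIC(reduced, full)
```

A non-significant LRT (and a lower AIC for the reduced model) would suggest the extra random-effects terms are not earning their degrees of freedom.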

  • R-squared won't help you. You need either CV or a test set. – Commented Sep 16, 2019 at 6:55
  • I want the predictive R-squared, not the ordinary R-squared, since I know R-squared increases with the number of predictors. Do you think AIC would be enough? – Commented Sep 16, 2019 at 14:24
  • You just need to decide what criterion you want to use. AIC, BIC, or AICc are fine as model selection criteria. There is also adjusted R-squared, and there are techniques like LASSO and ridge regression. They don't all agree, so you have to decide which criterion matters for your purposes. – Commented Sep 16, 2019 at 22:21
  • @CatM I don't know how the predictive R-squared is estimated, but given that mixed models already have trouble estimating the ordinary/adjusted R-squared, this may be a hard task. I would use a more common approach, such as AIC or test-set predictions. – Commented Sep 17, 2019 at 11:34

1 Answer


This is an interesting question. An approach for calculating predictive R-squared for linear models is given at this RPubs page, but it won't work directly for mixed models. The car package has influence functions for mixed models, but I couldn't figure out how to adapt them to this purpose. If I understand predictive R-squared correctly, probably the most fruitful approach would be to write a function that removes the data observation by observation, refits the model, and checks how well each refit predicts the dropped observation. I didn't see anything that addresses how to do this specifically for mixed models.
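The leave-one-observation-out idea above could be sketched roughly as follows (a sketch, not a tested implementation: `predictive_r2`, `response`, and the use of your Model/Data are assumptions; allow.new.levels = TRUE covers the case where dropping a row removes a subject's only observation):

```r
# Sketch of a leave-one-out predictive R-squared for an lmer fit:
# drop each row, refit, predict the held-out row, then compute
# 1 - PRESS/TSS, by analogy with the linear-model version.
library(lme4)

predictive_r2 <- function(model, data, response) {
  preds <- numeric(nrow(data))
  for (i in seq_len(nrow(data))) {
    refit <- update(model, data = data[-i, ])           # refit without row i
    preds[i] <- predict(refit, newdata = data[i, , drop = FALSE],
                        allow.new.levels = TRUE)        # predict row i
  }
  press <- sum((data[[response]] - preds)^2)            # prediction error SS
  tss   <- sum((data[[response]] - mean(data[[response]]))^2)
  1 - press / tss
}

# predictive_r2(Model, Data, "x")
```

Note this refits the model n times, which can be slow for a model this complex; a lower maxfun or a simpler random-effects structure may be needed in practice.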

  • Yes, but the removal should probably be group-by-group rather than observation-by-observation (and good luck with crossed random effects). – Commented May 30, 2021 at 19:48
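The group-by-group variant suggested in the comment might look like this (again a hedged sketch: `logo_r2` and its arguments are hypothetical names; re.form = NA makes each held-out subject's prediction use fixed effects only, since the refit has no random effects for an unseen subject):

```r
# Sketch of leave-one-subject-out cross-validation: drop all rows for
# one subject, refit, and predict that subject at the population level.
library(lme4)

logo_r2 <- function(model, data, response, group) {
  preds <- numeric(nrow(data))
  for (g in unique(data[[group]])) {
    idx <- data[[group]] == g
    refit <- update(model, data = data[!idx, ])          # drop the group
    preds[idx] <- predict(refit, newdata = data[idx, , drop = FALSE],
                          re.form = NA)                  # fixed effects only
  }
  press <- sum((data[[response]] - preds)^2)
  tss   <- sum((data[[response]] - mean(data[[response]]))^2)
  1 - press / tss
}

# logo_r2(Model, Data, "x", "subject")
```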
