```r
library(glmnet)

# Build design matrix (drop the intercept column) and response
x <- model.matrix(Use ~ Habitat + Season + A + B + C + D + E + F + G,
                  data = df)[, -1]
y <- df$Use  # must be numeric (0/1)

# Fit LASSO
Lasso_fit <- glmnet(x, y, alpha = 1, family = "binomial", standardize = TRUE)

# Plot coefficient paths
plot(Lasso_fit, xvar = "lambda", label = TRUE)
plot(Lasso_fit, xvar = "dev", label = TRUE)

# Cross-validation to find optimal lambda
set.seed(123)
cv_lasso <- cv.glmnet(x, y, alpha = 1, family = "binomial",
                      type.measure = "class")
plot(cv_lasso)
best_lambda <- cv_lasso$lambda.min

# Extract coefficients at best lambda
coef(cv_lasso, s = "lambda.min")
```

```
> coef(cv_lasso, s = "lambda.min")
15 x 1 sparse Matrix of class "dgCMatrix"
                        s0
(Intercept)  -4.3742763020
habB          0.3448104589
habC         -0.0012576147
habD          0.5056425623
habE         -0.1176552784
SeasonSpring -0.0660149736
SeasonSummer  .
SeasonWinter  .
A             1.3288390486
B             0.4343750355
C            -0.0302584569
D             0.3157774736
E             .
F            -0.0003280062
G             0.0563677638
```

I am aware that `glmnet` ignores the influence of my random effects, and that other packages such as `glmmLasso` may be more appropriate; however, with my large dataset they are too computationally intensive. I have therefore decided to compare a null model, a full model (with all variables), and a LASSO-shrunk model (removing variables E and F) using AIC, and to choose the best-fitting model on that basis.
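For reference, the comparison I am running looks roughly like this (a minimal sketch: the random-effect structure `(1 | ID)` and the use of `lme4::glmer` are assumptions for illustration, since my actual grouping variable isn't shown above):

```r
library(lme4)

# Null, full, and LASSO-shrunk (E and F dropped) candidate models;
# ID is a placeholder for the actual random-effect grouping column
null_mod   <- glmer(Use ~ 1 + (1 | ID), data = df, family = binomial)
full_mod   <- glmer(Use ~ Habitat + Season + A + B + C + D + E + F + G +
                      (1 | ID), data = df, family = binomial)
shrunk_mod <- glmer(Use ~ Habitat + Season + A + B + C + D + G +
                      (1 | ID), data = df, family = binomial)

# Compare by AIC
AIC(null_mod, full_mod, shrunk_mod)
```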
```
        df      AIC
Null     2 57723.75
Full    17 34059.04
Shrunk  15 36452.12
```

However, I have noticed that the full model is nearly always the best-fitting model by AIC, and I don't fully understand why this may be. After variable/model selection I then use `ggpredict` on the final GLMM to summarise the predicted probability of use.
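For context, the prediction step I mean is along these lines (a sketch: `final_mod` stands for whichever fitted GLMM is selected, and `"Habitat"` as the focal term is an assumption for illustration):

```r
library(ggeffects)

# Marginal predicted probability of use across habitat types,
# holding the other predictors at representative values
pred <- ggpredict(final_mod, terms = "Habitat")
pred
plot(pred)
```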