In a model aimed to assess the influence of land use measures on ecosystem functioning, I have one log-transformed dependent variable (the ecosystem function), and 5 fixed-effects independent variables (1 continuous, 3 binary, 1 categorical).
I've fitted a mixed effects model (with two random effects, spatial and temporal, that remain the same) and now I am trying to find the best model. Initially I wanted to do this by manual stepwise elimination of insignificant variables using the anova(model) command and then comparing likelihood-ratio tests between models. The first model looks like this:
> model0 <- lme(dep_var ~ bin_1 + bin_2 + bin_3 + log(cont_1) + cat_1, random=~1|season/site, method="ML", data=abvpp2) > anova(model0) numDF denDF F-value p-value (Intercept) 1 319 9384.037 <.0001 bin_1 1 72 2.972 0.0890 bin_2 1 72 0.007 0.9338 bin_3 1 72 12.423 0.0007 log(cont_1 + 1) 1 72 1.655 0.2023 cat_1 2 72 29.382 <.0001 After dropping the least significant variable bin_2:
> model1 <- lme(dep_var ~ bin_1 + bin_3 + log(cont_1) + cat_1, random=~1|season/site, method="ML", data=abvpp2) > anova(model1) numDF denDF F-value p-value (Intercept) 1 319 9257.012 <.0001 bin_1 1 73 2.931 0.0911 bin_3 1 73 12.260 0.0008 log(cont_1 + 1) 1 73 1.616 0.2077 cat_1 2 73 28.361 <.0001 > anova(model0,model1) Model df AIC BIC logLik Test L.Ratio p-value model0 1 10 517.3610 557.2506 -248.6805 model1 2 9 516.6558 552.5564 -249.3279 1 vs 2 1.29476 0.2552 Pretty straightforward up to here. But after dropping cont_1 next, I get this:
> model2 <- lme(dep_var ~ bin_1 + bin_3 + cat_1, random=~1|season/site, method="ML", data=abvpp2) > anova(model1,model2) Model df AIC BIC logLik Test L.Ratio p-value model1 1 9 516.6558 552.5564 -249.3279 model2 2 8 522.7970 554.7087 -253.3985 1 vs 2 8.14122 0.0043 cont_1 does seem to have some explanatory power, in spite of it's insignificance in the anova. Apparently, this is due to multicollinearity between the variables, since the order of fitting them in the model changes significances in the anova output drastically (sometimes insignificant variables even become significant when fittet AFTER another variable with usually more explanatory power). I am now a little unsure about how to select my model. Even though some variables are correlated (r=-0.33 and r=0.62), they explain different measures and dropping one of them beforehand would be hard to justify. I would be very grateful for some ideas on model selection while dealing with collinearity in LME models and avoiding stepwise commands.