How to decide whether to include an interaction term in a model

Question

If an interaction term in a regression model leads to a lower AIC (indicating a better model fit), but the p-value for the interaction term itself is not statistically significant, should the interaction still be retained in the model?

From one perspective, a lower AIC suggests the model with the interaction better explains the data. However, a non-significant interaction implies that there is no strong evidence that the effect of one variable on the outcome differs across levels of the other variable.

In this case, how should one balance the improvement in AIC against the lack of statistical significance? Does a non-significant interaction still have any practical or interpretive value, or is it better to exclude it to maintain a more parsimonious model?

Any kind of data-driven model choice, whether driven by p values or by AIC, will invalidate your subsequent p values unless corrected for (which is highly non-trivial). So the answer really is: whatever you do, do it on pilot data, decide on a model, then collect new data and apply your model to it without change.
– Stephan Kolassa, Commented Oct 11 at 11:48
I would plot the interaction, which can help determine its practical significance (as opposed to statistical significance, which, as @peter flom pointed out, is partly driven by sample size).
– Christian Geiser, Commented Oct 13 at 17:24

Peter Flom · Accepted Answer · 2025-10-11 11:30:06Z

You wrote

However, a non-significant interaction implies that there is no strong evidence that the effect of one variable on the outcome differs across levels of the other variable.

This is not exactly correct. A non-significant result says that if the interaction effect in the population were really 0, you would get an interaction in the sample at least as far from 0 as the one you observed more than 5% of the time.

This is partly a function of sample size: an effect of the same size can be significant in a large sample and not significant in a small one.
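To make the sample-size point concrete, here is a minimal sketch, not part of the original answer. It assumes a balanced 2x2 design with a known residual standard deviation, so the interaction contrast can be tested with a simple z-test. The function name `interaction_p_value` and the values 0.5 and 1.0 are illustrative assumptions only.

```python
import math

def interaction_p_value(effect, sigma, n_total):
    """Two-sided z-test p-value for the interaction contrast in a balanced 2x2 design.

    Assumes a known residual SD `sigma` and n_total / 4 observations per cell.
    """
    # The contrast (y11 - y10) - (y01 - y00) combines four cell means, each with
    # variance sigma^2 / (n_total / 4), so its standard error is 4 * sigma / sqrt(n_total).
    se = 4.0 * sigma / math.sqrt(n_total)
    z = effect / se
    # P(|Z| >= |z|) for a standard normal Z under the null of no interaction.
    return math.erfc(abs(z) / math.sqrt(2.0))

# Same estimated interaction (0.5) and same noise level, different sample sizes.
for n in (40, 400, 4000):
    print(n, interaction_p_value(effect=0.5, sigma=1.0, n_total=n))
```

Under these assumptions the same estimated interaction is not significant at the 5% level with 40 observations but is with 400, even though its size has not changed.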

If you must use a purely statistical criterion, AIC is better. But I would compare the predicted values of the two models to the actual results and see where they differ. I'd do this with plots of the errors, maybe a QQ plot, maybe a Tukey mean-difference plot.
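As a rough illustration of comparing the two models by both AIC and their predictions, here is a sketch on simulated data. It is not from the original answer. The balanced 2x2 design, the coefficients, the seed, and the helper names (`simulate`, `fitted_values`, `aic`, `mean_miss_by_cell`) are all assumptions chosen for illustration, and it prints per-cell prediction errors instead of drawing the plots mentioned above.

```python
import math
import random
from statistics import mean

CELLS = [(a, b) for a in (0, 1) for b in (0, 1)]

def simulate(n_per_cell=25, b_int=0.3, sigma=1.0, seed=1):
    """Hypothetical data: y = 1 + 0.5*a + 0.5*b + b_int*a*b + Gaussian noise."""
    rng = random.Random(seed)
    return [(a, b, 1.0 + 0.5 * a + 0.5 * b + b_int * a * b + rng.gauss(0.0, sigma))
            for a, b in CELLS for _ in range(n_per_cell)]

def fitted_values(data):
    """Least-squares fits for the additive and the interaction model (balanced design)."""
    grand = mean(y for _, _, y in data)
    a_mean = {a: mean(y for aa, _, y in data if aa == a) for a in (0, 1)}
    b_mean = {b: mean(y for _, bb, y in data if bb == b) for b in (0, 1)}
    cell_mean = {c: mean(y for aa, bb, y in data if (aa, bb) == c) for c in CELLS}
    # With equal cell sizes the additive fit is row mean + column mean - grand mean;
    # the interaction model simply fits each cell mean.
    additive = [a_mean[a] + b_mean[b] - grand for a, b, _ in data]
    interaction = [cell_mean[(a, b)] for a, b, _ in data]
    return additive, interaction

def aic(observed, fitted, n_coef):
    """Gaussian AIC up to an additive constant: n*log(RSS/n) + 2*(n_coef + 1)."""
    n = len(observed)
    rss = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    return n * math.log(rss / n) + 2 * (n_coef + 1)

def mean_miss_by_cell(data, fitted):
    """Average of (actual - predicted) within each cell of the design."""
    return {c: mean(y - f for (a, b, y), f in zip(data, fitted) if (a, b) == c)
            for c in CELLS}

data = simulate()
y = [obs[2] for obs in data]
fit_add, fit_int = fitted_values(data)
aic_add = aic(y, fit_add, n_coef=3)   # intercept + two main effects
aic_int = aic(y, fit_int, n_coef=4)   # plus the interaction
miss = mean_miss_by_cell(data, fit_add)

print("AIC additive:", round(aic_add, 2), "AIC interaction:", round(aic_int, 2))
for c in CELLS:
    print("cell", c, "additive model is off by", round(miss[c], 3), "on average")
```

The per-cell errors show where and by how much the additive model misses, which is the kind of practical information the AIC difference alone does not give.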

And for those who are open to Bayes, the best solution is to elicit prior distributions on the importance of interaction effects and use those in the final analysis. This allows interactions to be “half in” and “half out” of the model.
– Frank Harrell, Commented Oct 11 at 11:58
