I have an experiment with two independent variables and one dependent variable:
- Independent variable 1: Discrete, range 2-150
- Independent variable 2: Continuous, range 0-1
- Dependent variable: Discrete, range 1-5 (rating scale)
- Sample size: ~1400 observations
Rating distribution: (histogram of the 1-5 ratings omitted)
To estimate main effects and the interaction effect, I ran both a linear regression and an ordinal (logistic) regression.
I tested both with and without standardizing the independent variables.
The two models give dramatically different interaction significance: p = 0.991 (linear) vs. p = 0.001 (ordinal).
Detailed results: (full regression output tables omitted)
Model fit comparison:
- Linear Model MAE: 0.6124
- Ordinal Model MAE: 0.5866
Code:
```python
import pandas as pd
import numpy as np
from statsmodels.formula.api import ols
from statsmodels.miscmodels.ordinal_model import OrderedModel
from sklearn.preprocessing import StandardScaler


def interaction_tests(df):
    """IV1: discrete 2-150, IV2: continuous 0-1, DV: ordinal 1-5."""
    # Standardize the independent variables
    scaler = StandardScaler()
    df_temp = df.copy()
    df_temp['IV1_scaled'] = scaler.fit_transform(df_temp[['IV1']])
    df_temp['IV2_scaled'] = scaler.fit_transform(df_temp[['IV2']])

    # Linear regression; the formula's * builds the interaction term
    linear_model = ols('DV ~ IV1_scaled * IV2_scaled', data=df_temp).fit()

    # Ordinal (proportional-odds logit) regression with the same interaction,
    # built manually as a product column
    df_temp['interaction'] = df_temp['IV1_scaled'] * df_temp['IV2_scaled']
    predictors = ['IV1_scaled', 'IV2_scaled', 'interaction']
    X = df_temp[predictors]
    y = df_temp['DV']
    ordinal_model = OrderedModel(y, X, distr='logit')
    ordinal_result = ordinal_model.fit(method='bfgs', disp=False)

    return linear_model, ordinal_result
```

(Note: my original draft referred to the scaled columns as `IV1_std`/`IV2_std` in the ordinal block; corrected here to the `_scaled` names actually created above.)

Scatter Plot Matrix:
The r values shown are Pearson correlation coefficients between each pair of variables.
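For reproducibility, the annotated r values can be recomputed directly from the data; a minimal sketch with simulated stand-in data (column names as in the code above):

```python
import numpy as np
import pandas as pd

# Simulated stand-in data with the same ranges as my real variables
rng = np.random.default_rng(1)
df = pd.DataFrame({
    'IV1': rng.integers(2, 151, 200),
    'IV2': rng.random(200),
    'DV': rng.integers(1, 6, 200),
})

# Pairwise Pearson r, the same statistic shown on the scatter plot matrix
r = df[['IV1', 'IV2', 'DV']].corr(method='pearson')
print(r.round(3))
```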
Questions:
1. Which interaction should I report: both models, or just the ordinal one? I understand that the ordinal model fits better, but what does that say about the linear model? The dramatic difference is concerning and I don't fully understand it. Are there diagnostics that would show why the linear model misses what the ordinal model detects?
2. How should I interpret the ordinal interaction, and is it meaningful? The coefficient is on the logit (log-odds) scale, since I fit the model with `distr='logit'`; how can I present it to readers?
3. Is there a better method for detecting interactions with this type of data?
4. Could this be a coding error or misuse of the models? The p-value difference seems too extreme; are there any obvious mistakes in my approach?
Thanks a lot!!
