I have an experiment with two independent variables and one dependent variable:
- Independent variable 1: Discrete, range 2-150
- Independent variable 2: Continuous, range 0-1
- Dependent variable: Discrete, range 1-5 (rating scale)
- Sample size: ~1400 observations
Rating distribution: (histogram of the 1-5 ratings omitted)
To estimate main effects and the interaction effect, I ran both a linear regression and an ordinal (logistic) regression.
I tested both with and without standardizing the independent variables.
The two models give dramatically different interaction significance: p = 0.991 (linear) vs. p = 0.001 (ordinal).
Detailed results: (full regression output tables omitted)
Model fit comparison:
- Linear Model MAE: 0.6124
- Ordinal Model MAE: 0.5866
Code:
```python
import pandas as pd
import numpy as np
from statsmodels.formula.api import ols
from statsmodels.miscmodels.ordinal_model import OrderedModel
from sklearn.preprocessing import StandardScaler


def interaction_tests(df):
    """IV1: discrete 2-150, IV2: continuous 0-1, DV: ordinal 1-5."""
    # Standardize the independent variables
    scaler = StandardScaler()
    df_temp = df.copy()
    df_temp['IV1_scaled'] = scaler.fit_transform(df_temp[['IV1']])
    df_temp['IV2_scaled'] = scaler.fit_transform(df_temp[['IV2']])

    # Linear regression; the formula's * builds the interaction term
    linear_model = ols('DV ~ IV1_scaled * IV2_scaled', data=df_temp).fit()

    # Ordinal (proportional-odds logit) regression with the same interaction,
    # built manually as a product column
    df_temp['interaction'] = df_temp['IV1_scaled'] * df_temp['IV2_scaled']
    predictors = ['IV1_scaled', 'IV2_scaled', 'interaction']
    X = df_temp[predictors]
    y = df_temp['DV']
    ordinal_model = OrderedModel(y, X, distr='logit')
    ordinal_result = ordinal_model.fit(method='bfgs', disp=False)

    return linear_model, ordinal_result
```

(Note: my original draft referred to the scaled columns as `IV1_std`/`IV2_std` in the ordinal block; corrected here to the `_scaled` names actually created above.)

Scatter Plot Matrix:
The r values shown are Pearson correlation coefficients between each pair of variables.
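For reproducibility, the annotated r values can be recomputed directly from the data; a minimal sketch with simulated stand-in data (column names as in the code above):

```python
import numpy as np
import pandas as pd

# Simulated stand-in data with the same ranges as my real variables
rng = np.random.default_rng(1)
df = pd.DataFrame({
    'IV1': rng.integers(2, 151, 200),
    'IV2': rng.random(200),
    'DV': rng.integers(1, 6, 200),
})

# Pairwise Pearson r, the same statistic shown on the scatter plot matrix
r = df[['IV1', 'IV2', 'DV']].corr(method='pearson')
print(r.round(3))
```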
Questions:
1. Which interaction should I report: both models, or just the ordinal one? I understand that the ordinal model fits better, but what does that say about the linear model? The dramatic difference is concerning and I don't fully understand it. Are there diagnostics that would show why the linear model misses what the ordinal model detects?
2. How should I interpret the ordinal interaction, and is it meaningful? The coefficient is on the logit (log-odds) scale, since I fit the model with `distr='logit'`; how can I present it to readers?
3. Is there a better method for detecting interactions with this type of data?
4. Could this be a coding error or misuse of the models? The p-value difference seems too extreme; are there any obvious mistakes in my approach?
Thanks a lot!!
