
Background

I need to estimate the average treatment effect of being above vs. below the median of a continuous predictor that exhibits non-linearities. This quantity answers the research question:
“What is the causal effect of crossing a meaningful threshold while preserving the non-linear functional form?”

Problem Statement

The marginaleffects package handles point-to-point comparisons and categorical comparisons well, but lacks native support for range-to-range comparisons on continuous predictors.

Current Status

Below is reproducible code showing how the quantity I want differs from the quantity I can get using hypothesis = difference ~ pairwise:

# Minimal Reproducible Example: Range-based comparisons for continuous predictors
library(marginaleffects)

# Load mtcars and create weight grouping variable
data(mtcars)
mtcars$weight_group <- ifelse(mtcars$wt > median(mtcars$wt), "heavy", "light")

# Fit model with continuous wt (preserving non-linearities)
mod <- lm(mpg ~ am * wt, data = mtcars)

# Create counterfactual grid
nd <- datagrid(newdata = mtcars, wt = unique, disp = unique,
               grid_type = "counterfactual")

# Method 1: Vincent's suggested approach using a categorical grouping variable
# This works when you have a categorical grouping variable
avg_predictions(mod, by = "weight_group", newdata = nd,
                hypothesis = difference ~ pairwise)

# Method 2: What I actually need - comparing ranges of the continuous predictor
# Below-median wt values (roughly wt <= 3.325)
scene_1 <- avg_predictions(mod,
  newdata = datagrid(wt = mtcars$wt[mtcars$wt <= median(mtcars$wt)],
                     am = unique, disp = unique,
                     grid_type = "counterfactual"))

# Above-median wt values (roughly wt > 3.325)
scene_2 <- avg_predictions(mod,
  newdata = datagrid(wt = mtcars$wt[mtcars$wt > median(mtcars$wt)],
                     am = unique, disp = unique,
                     grid_type = "counterfactual"))

# Manual calculation of the difference
manual_diff <- scene_1$estimate - scene_2$estimate

# Print results
cat("Method 1 (categorical grouping):",
    avg_predictions(mod, by = "weight_group", newdata = nd,
                    hypothesis = difference ~ pairwise)$estimate, "\n")
cat("Method 2 (range-based manual):", manual_diff, "\n")

# The problem: these should be equivalent but they're not!
# Method 1 treats weight_group as categorical in the prediction grid
# Method 2 preserves the continuous nature of wt while comparing ranges
Method 1 (categorical grouping): -1.247234
Method 2 (range-based manual): 6.870543

Key Issues

Mathematical Contradiction: These two methods produce different results despite conceptually estimating the same threshold effect. Method 1 relies on a categorical grouping variable in the prediction grid, while Method 2 directly compares ranges of the continuous predictor while preserving its functional form.

Missing Uncertainty Quantification: The manual approach (Method 2) loses standard errors, confidence intervals, and p-values that are essential for statistical inference.

Questions

  • Why do these methods produce different results when they should estimate the effect of the same weight threshold?

  • Is there native support for range-to-range comparisons that preserves the continuous specification?

The manual approach appears methodologically correct for my research question, but I lose the statistical inference capabilities that make marginaleffects so valuable. Any guidance would be greatly appreciated!
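In case it helps, here is the kind of workaround I have been sketching. It assumes that by accepts a grouping column that exists only in newdata (wt_range below is a column I create purely for illustration); if that assumption holds, marginaleffects should return delta-method standard errors for the range-to-range difference:

```r
# Sketch of a possible workaround: tag each row of ONE counterfactual grid
# by its counterfactual wt range, then let `by` + `hypothesis` compute the
# range-to-range difference with delta-method uncertainty.
# `wt_range` is a column I add here purely for illustration.
library(marginaleffects)

data(mtcars)
mod <- lm(mpg ~ am * wt, data = mtcars)

nd2 <- datagrid(newdata = mtcars, wt = unique, grid_type = "counterfactual")
nd2$wt_range <- ifelse(nd2$wt > median(mtcars$wt), "above", "below")

# Average predictions per range, plus their difference with SE / CI / p-value
avg_predictions(mod, newdata = nd2, by = "wt_range",
                hypothesis = difference ~ pairwise)
```

If this works as I expect, it keeps the continuous specification (every row is predicted at its actual wt value) and only groups at the averaging stage.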

NOTE: This is a toy example; my real model has more variables and meaningful non-linearities.

  • There will be some reluctance to provide guidance here on working with binned predictor variables, given the well-known problems with using them in models. I appreciate that your original model uses continuous values, but binning for predictions from the model will have many of the same problems and could depend heavily on the distribution of those values among the population. Please edit the question to say why you need to make this particular type of comparison; there could be better ways to accomplish your goal. – Commented Aug 24 at 17:14
  • Thanks for the reply. To be honest, I really don't know how to rephrase this question differently. I think my point is exactly what you're saying—I don't want to "bin" a continuous variable with non-linearities. I want to fit my model using the continuous specification and get the average estimates for "interesting" ranges above and below the median. This means that each value is predicted using its true functional value (1, 2, 3, etc.), and only after expected values are estimated do I calculate the difference between them. – Commented Aug 24 at 17:54
  • The issue is that marginaleffects does not have a native way to do these comparisons. I can only predict them using avg_predictions() and then manually estimate the differences, but this means I have to handle all the uncertainty measures on my own. Which is fine, I can bootstrap my deltas and use these delta vectors to get the CIs, SEs, and p-values. However, this is computationally very slow, and it seems like this could be nicely integrated into the package. More importantly, I think the suggestion to use hypothesis = difference ~ pairwise is doing something else altogether. – Commented Aug 24 at 17:59
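For reference, the bootstrap mentioned in the last comment above can be sketched minimally as follows (slow, as noted; this uses only base R resampling around avg_predictions(), and the number of replicates is an arbitrary choice):

```r
# Sketch of the percentile bootstrap described in the comments: refit the
# model on resampled data and recompute the manual range-to-range delta
# each time. Base R handles the resampling itself.
library(marginaleffects)

data(mtcars)

range_delta <- function(d) {
  m <- lm(mpg ~ am * wt, data = d)
  lo <- avg_predictions(m, newdata = datagrid(
    newdata = d, wt = d$wt[d$wt <= median(d$wt)],
    grid_type = "counterfactual"))
  hi <- avg_predictions(m, newdata = datagrid(
    newdata = d, wt = d$wt[d$wt > median(d$wt)],
    grid_type = "counterfactual"))
  lo$estimate - hi$estimate
}

set.seed(2024)  # arbitrary seed, for reproducibility only
boots <- replicate(100, range_delta(mtcars[sample(nrow(mtcars), replace = TRUE), ]))
quantile(boots, c(0.025, 0.975))  # percentile CI; sd(boots) gives a bootstrap SE
```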

1 Answer


The problem is probably due to the difference in the joint distributions of wt and am between the original mtcars data and the nd data that you constructed for avg_predictions().

with(mtcars, ftable(highWeight = wt > mean(wt), am))
#                am  0  1
# highWeight
# FALSE              4 12
# TRUE              15  1

with(nd, ftable(highWeight = wt > mean(wt), am))
#                am     0     1
# highWeight
# FALSE              7182  4914
# TRUE               7695  5265

Both wt and am are significantly associated with the mpg outcome (not shown), so the major difference between the two data sets in their joint distributions means that the marginal predictions based on those data sets necessarily differ.

I'm a bit skeptical of the attempt to get counterfactual estimates based on continuous ranges above and below a threshold, anyway. I suspect that illustrative point estimates based on typical data values would be more useful. But if you are going to go down this road, be very careful in how you construct the data frames that you use for marginal predictions.
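To make "be very careful" concrete, here is a minimal sketch of the kind of balance check I have in mind; check_balance is a throwaway helper invented here for illustration, not a marginaleffects function:

```r
# Throwaway helper (illustration only): compare the marginal distribution of
# a covariate you are NOT intervening on across two hand-built grids.
check_balance <- function(grid_a, grid_b, var) {
  pa <- prop.table(table(grid_a[[var]]))
  pb <- prop.table(table(grid_b[[var]]))
  max(abs(pa - pb))  # largest discrepancy between the two distributions
}

library(marginaleffects)
data(mtcars)

below <- datagrid(newdata = mtcars,
                  wt = mtcars$wt[mtcars$wt <= median(mtcars$wt)],
                  am = unique, grid_type = "counterfactual")
above <- datagrid(newdata = mtcars,
                  wt = mtcars$wt[mtcars$wt > median(mtcars$wt)],
                  am = unique, grid_type = "counterfactual")

# Both grids cross am over the same values, so the discrepancy is 0 here;
# a value far from 0 means the two averages mix covariates differently.
check_balance(below, above, "am")
```

A discrepancy far from zero means the difference of the two marginal predictions confounds the threshold contrast with covariate imbalance between the grids.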

  • Thank you! I think this solves the mystery. And for sure, data counterfactuals have to be carefully constructed. Though in my case it makes sense to use the full range above and below, because that will be a "balanced grid": counterfactually it weights what the effect would have been had each row taken a wt of 1, then 2, and so on. This of course has its own assumptions (my DAG should be correct) but is better than making predictions tied to a specific sample that might be unbalanced. – Commented Aug 24 at 21:15
