
I was recently asked what the difference is between dose-response function (DRF) estimation (such as the approach proposed in this link and this paper) and a statistical regression method. I therefore tried to create some toy synthetic examples showing cases in which regression methods fail to estimate the impact of the treatment while DRF estimation does not, but I wasn't able to find any.

In particular, I created this notebook to test the two cited methods in comparison to a linear regression (LR) and found the following:

  1. In an unconfounded scenario without colliders, nothing is as accurate as the regression
  2. The DRF estimation methods are biased / not as accurate as I expected
  3. If I include a collider in the analysis, mistaking it for a confounder, then the DRF estimation methods are (still biased but) more robust than the LR

Given those findings, I have a few questions.

  1. What's the point of using those methods if, in a scenario without colliders, traditional regression methods work better? (Of course, if users do not believe they are in such a scenario, they can perform an analysis to remove colliders.)
  2. Is there any case in which DRF estimation methods are less biased than regression methods?

Feel free to download and play with my notebook; other notebook examples would be greatly appreciated. Thanks!

  • I'm not sure whether your question is focusing on what you are actually interested in (propensity scores / inverse probability weights?). A dose-response function is simply... a function. You can estimate its parameter values using regression techniques. In fact, logistic regression is often used for this in toxicology: en.wikipedia.org/wiki/Dose%E2%80%93response_relationship Commented Mar 8, 2024 at 9:20
  • Hi mkt, thanks for your comment! I am referring to a causal dose-response function, i.e., the function that maps continuous treatments T into continuous potential outcomes Y(T). The Hirano and Imbens paper calls this a dose-response function, so I kept the name. Commented Mar 8, 2024 at 9:58

1 Answer


First, don't think about the colliders situation. Both methods are equally ill-equipped to handle colliders; neither is more robust to conditioning on a collider than the other. That is not the sense in which propensity score methods are designed to be more robust. You might have heard that IPW can handle post-treatment colliders while regression can't, but that is only true for sequential treatment effect estimation with time-varying confounders that are also colliders. In a single-time-point study like this, colliders are equally toxic for both methods, and the difference you saw is likely due to the specific data-generating model, not to an inherent difference between the two methods that one would expect to hold more broadly.
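To see why a collider is toxic regardless of the adjustment method, here is a minimal sketch (a hypothetical data-generating process, not taken from your notebook): the true effect of T on Y is zero, but adjusting for a common effect C of T and Y opens a non-causal path and induces a spurious association.

```python
# Minimal collider-bias demonstration (hypothetical DGP for illustration).
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
T = rng.normal(size=n)
Y = rng.normal(size=n)          # T has no effect on Y
C = T + Y + rng.normal(size=n)  # collider: a common effect of T and Y

# Slope of Y on T without adjustment: approximately 0, as it should be.
slope_unadjusted = np.polyfit(T, Y, 1)[0]

# Slope of Y on T after "adjusting" for the collider C: biased away from 0
# (analytically -0.5 for this DGP).
design = np.column_stack([T, C, np.ones(n)])
slope_adjusted = np.linalg.lstsq(design, Y, rcond=None)[0][0]

print(slope_unadjusted, slope_adjusted)  # ~0.0 vs. ~-0.5
```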

I'm not exactly sure what you mean by "a statistical regression method". When the estimand is the average dose-response function (ADRF), you can use regression in the form of g-computation to estimate it. You can also use propensity score methods to do so instead, or combine the methods.

G-computation involves fitting a regression model for the outcome given the treatment and covariates. From that model, you generate predictions of the outcome under every possible value of the treatment, and for each value, you compute the mean prediction across all units in your sample. That outcome model needs to be correct for the data-generating process in order for the ADRF to be consistently estimated. A problem with this approach is that not only do you need to allow flexibility in the relationship between the treatment and the outcome, but you also need to flexibly model the relationships between the covariates and the outcome and their interactions with the treatment. That can lead to huge models that are impossible to fit, even with just a few covariates, and the method is not robust to misspecification of this model. For example, if the relationship between the outcome and the treatment is curvy, the relationship between the outcome and a covariate is curvy, and the relationship between the outcome and the treatment changes with the covariate in a curvy way, then modeling any of these relationships as linear will yield a biased ADRF, even if you have correctly included all the variables required to remove confounding.
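In symbols, g-computation targets the ADRF $\mu(t) = E[Y(t)]$ via $\hat{\mu}(t) = \frac{1}{n}\sum_{i=1}^{n} \hat{E}[Y \mid T = t, X_i]$. Here is a minimal sketch of the procedure (simulated data; the flexible learner and variable names are illustrative choices, not prescriptions):

```python
# G-computation sketch for the ADRF of a continuous treatment.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 2_000
X = rng.normal(size=(n, 3))             # covariates
T = 0.5 * X[:, 0] + rng.normal(size=n)  # continuous treatment, confounded by X1
Y = np.sin(T) + X[:, 0] * T + X[:, 1] ** 2 + rng.normal(size=n)

# 1. Fit a (hopefully flexible enough) outcome model for E[Y | T, X].
outcome_model = GradientBoostingRegressor().fit(np.column_stack([T, X]), Y)

# 2. For each value t on a grid, predict the outcome for every unit with its
#    own covariates but treatment set to t, then average over units.
t_grid = np.linspace(np.quantile(T, 0.05), np.quantile(T, 0.95), 50)
adrf = np.array([
    outcome_model.predict(np.column_stack([np.full(n, t), X])).mean()
    for t in t_grid
])
```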

Weighting allows you to avoid some of these issues; importantly, you don't need to model the relationship between the covariates and the outcome or the interactions between the covariates and the treatment in the outcome model. You do still need to get the relationship between the treatment and the outcome correct, but that is much easier than getting the entire data-generating model correct. The tradeoff is that you still need to correctly estimate the weights, and you may lose some precision by using weights. To compute the weights, you need to correctly model the conditional density of the treatment given the covariates. For propensity score methods with a binary treatment, the density is determined by a single parameter, the mean (i.e., the probability of being treated), so that is much easier to do. Here, you need to correctly model the whole density. Usually we just model the mean of the density and make a strong assumption about its shape (including that its shape is constant), but this can yield bias. An alternative is to use a weighting method that avoids modeling the treatment, like distance covariance optimal weights (DCOWs), which directly estimate weights to minimize imbalance without a treatment model.
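For concreteness, here is a sketch of the usual parametric version: stabilized inverse-probability weights that assume (strongly) that T given X is normal with constant variance. The linear treatment model and the cubic outcome spline are illustrative assumptions, not recommendations.

```python
# Stabilized inverse-probability weights for a continuous treatment,
# assuming T | X is normal with homoscedastic variance (a strong assumption).
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm
from sklearn.linear_model import LinearRegression

# Same simulated data as in the g-computation sketch above.
rng = np.random.default_rng(0)
n = 2_000
X = rng.normal(size=(n, 3))
T = 0.5 * X[:, 0] + rng.normal(size=n)
Y = np.sin(T) + X[:, 0] * T + X[:, 1] ** 2 + rng.normal(size=n)

# Denominator: conditional density f(T | X), with a linear model for the mean.
treat_model = LinearRegression().fit(X, T)
resid = T - treat_model.predict(X)
dens_cond = norm.pdf(T, loc=treat_model.predict(X), scale=resid.std())

# Numerator: marginal density f(T), for stabilization.
dens_marg = norm.pdf(T, loc=T.mean(), scale=T.std())
w = dens_marg / dens_cond

# The ADRF is then estimated by a weighted regression of Y on a flexible
# function of T alone, with no covariates in the model (a cubic here).
design = sm.add_constant(np.column_stack([T, T**2, T**3]))
adrf_fit = sm.WLS(Y, design, weights=w).fit()
```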

Your toy examples could not hope to capture the realistic situations in which these two approaches would differ. Below is a list of scenarios in which you would expect weighting to perform better than regression-based g-computation (a data-generating sketch you can adapt follows the list):

  • The outcome model is highly complex and curvy in the covariates (it can be hard to specify a well-fitting outcome model that accounts for this)
  • The individual dose-response functions vary a lot (weighting averages over the individual dose-response functions without requiring you to model this heterogeneity)
  • The outcome model involves many covariates (weighting is less sensitive to overfitting the treatment model and it is easier to prevent overfitting without impacting inference than it is to do so in the outcome model)
  • The treatment is straightforward to model, but not linear (if it were linear, adjusting for linear terms in the outcome model would be enough even if the rest of the model were misspecified)
  • You have a large sample (g-computation for a large sample can be computationally intensive, and the precision benefits one gets from g-computation are diminished with larger samples)
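As one way to probe the first two bullets, here is a hypothetical data-generating process (my own construction, not from your notebook) in which the treatment model is a simple linear-Gaussian one but the outcome is curvy in the covariates and the unit-level dose-response slopes vary. Whether weighting actually beats g-computation here is something to verify by simulation rather than take on faith from this sketch.

```python
# Hypothetical DGP: easy treatment model, hard outcome model.
import numpy as np

def simulate(n, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 2))
    # Treatment: linear in X with Gaussian noise -> easy to model correctly.
    T = X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=n)
    # Outcome: curvy in the covariates, with unit-level slope heterogeneity.
    Y = (np.sin(3 * X[:, 0]) + np.abs(X[:, 1])   # nonlinear covariate effects
         + (1 + X[:, 1] ** 2) * T                # heterogeneous dose-response
         + rng.normal(size=n))
    return X, T, Y

# The true ADRF is linear: E[Y(t)] = E[sin(3*X1) + |X2|] + E[1 + X2**2] * t,
# i.e., a slope of 2 here, so estimators are easy to benchmark against it.
```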

I think it is noble to want to explore the differences in performance in toy examples that are easy to understand and control, but these methods were designed for more complicated cases. Simulations with simple data-generating models tend to favor simple methods that make strong (but incidentally correct) assumptions, and they aren't very revealing about how these methods are meant to be used in practice, which often involves complex data-generating processes. I recommend the simulations in Huling et al. (2023), which introduces the DCOWs and compares them to other weighting methods (and of which I am an author). While we don't compare regression and weighting, we do provide a realistically complex data-generating model that is able to distinguish among many flexible weighting methods.

References

Huling, J. D., Greifer, N., & Chen, G. (2023). Independence Weights for Causal Inference with Continuous Treatments. Journal of the American Statistical Association. https://doi.org/10.1080/01621459.2023.2213485

  • Thank you Noah for the great answer! I still feel that point 2 has not been totally addressed, but that's probably because my question was a bit fuzzy. Can you provide any example in which estimating the DRF using the methods you described leads to correct results while a naive estimation (such as one that estimates $E[Y|T,E[X]]$) does not? I also asked this in stats.stackexchange.com/questions/642620/… Commented Mar 14, 2024 at 17:41
  • Lastly, I take your point that these methods are designed for more complicated scenarios, but I also feel that toy examples let us bridge the gap between the scientific and industrial worlds: we could use them to explain causality concepts to people with just an ML background, which would be great! Otherwise, it will always be unclear to them why to use causal methods instead of just fitting a giant neural network. Commented Mar 14, 2024 at 17:43
  • A "naive" estimation works only when the outcome model is linear. Most models are nonlinear (e.g., logistic regression, Poisson regression, general machine learning models), so you would not expect them to be equal in most cases. A linear model is special in that $E[E[Y|T, X]] = E[Y|T, X = \bar{X}]$. That is not something that should generally be assumed to be true. Commented Mar 14, 2024 at 19:48
  • Great, thanks a lot! Is there any material I can use to read more about g-computation? Commented Mar 19, 2024 at 12:50
  • Snowden et al. (2011) provides an intro for binary treatments. Austin (2019) explains how to use it for estimating the ADRF of a continuous treatment. Commented Mar 19, 2024 at 16:23
