
In numerous articles I have read (Abadie 2021, Samii 2016, basically any Abadie piece that discusses the synthetic control method), the authors cite regression's reliance on extrapolation when constructing weights as a negative feature for estimating causal effects.

For example, Abadie (2021), in the section discussing the benefits of synthetic controls compared to regression, writes:

Synthetic control estimators preclude extrapolation, because synthetic control weights are nonnegative and sum to one. It is easy to check that, like their synthetic control counterparts, the regression weights in $W^{reg}$ sum to one. Unlike the synthetic control weights, however, regression weights may be outside the [0, 1] interval, allowing extrapolation outside of the support of the data

In Samii 2016:

Where there is no overlap, one can only make comparisons with interpolated or extrapolated counterfactual potential outcomes values. King and Zeng (2006) brought the issue to political scientists’ attention, characterizing interpolations and, especially, extrapolations as “model dependent,” by which they meant that they were nonrobust to modeling choices that are often indefensible. By pointing out how common such model dependent estimates are in political science research, King and Zeng raised troubling questions about the validity of many generality claims in quantitative causal research in political science.

In sum, I think the critique is less about the method itself and more about the lack of transparency in how regression weights are generated (relying on extrapolation and not weighting observations in the data set equally, so that a small subset of weights can dominate the weighting of an entire sample).

However, in these pieces, what this entails is somewhat vague. I can see that regression can produce negative weights (as opposed to the synthetic control method, where the lower bound on weights is 0), but I do not understand how this is a negative feature. Abadie (2021) notes that weights outside [0, 1] lead to extrapolation outside the support of the data. I am confused about what this means. How do negative weights, or weights greater than 1, extrapolate beyond the support of the data? I am sure the answer is fairly simple; I am just having a hard time explaining it back to myself in a way that I can understand.

  • Can you point to a section of the works where they make their claim about regression's reliance on extrapolation being a negative feature? Commented May 26, 2023 at 9:15
  • @Scriddie sure! I'll edit the initial post to support that claim. Commented May 26, 2023 at 13:29
  • It's about using a convex combination to remain inside the support of the observations. I think this book explains why we should avoid extrapolation and why using a convex combination is important in synthetic control to remain inside the support of the data: matheusfacure.github.io/python-causality-handbook/… Commented Dec 11, 2023 at 14:55

1 Answer


It seems to be about how far one can move away from the support

In the synthetic control setting, the hypothetical value that would have been observed without the treatment is approximated by a linear combination of untreated units thought to be similar to the treated unit. Abadie stresses that the weights in this linear combination lie in [0, 1] (Section 3.2 in Abadie 2021). In Section 4 of Abadie 2021, he explains that a regression-based estimator of the hypothetical no-treatment value can be thought of as a different way of computing a synthetic control, one that is not restricted to weights in [0, 1]. There are other differences, but that is the one stressed, since the main worry seems to be obtaining extreme values for the synthetic control.
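To make the regression-as-weights idea concrete, here is a minimal numerical sketch in Python. The data and coefficients are made up for illustration and are not taken from Abadie 2021; the only structural assumption is an OLS fit with an intercept on the donor (control) units, whose prediction for the treated unit can be rewritten as a weighted sum of donor outcomes:

```python
import numpy as np

rng = np.random.default_rng(0)

J, k = 8, 2                                   # 8 donor (control) units, 2 predictors
X0 = rng.normal(size=(J, k))                  # donors' pre-treatment predictors
y0 = X0 @ np.array([1.0, -2.0]) + rng.normal(scale=0.1, size=J)  # donors' outcomes
x1 = np.array([4.0, -4.0])                    # treated unit's predictors, far outside the donor cloud

# Add an intercept; with it, the implied regression weights sum to one.
X0c = np.column_stack([np.ones(J), X0])
x1c = np.concatenate([[1.0], x1])

# OLS counterfactual: y1_hat = x1c' (X0c' X0c)^{-1} X0c' y0 = w' y0,
# so w are the per-donor weights implied by the regression.
w_reg = X0c @ np.linalg.solve(X0c.T @ X0c, x1c)

print("weights:             ", np.round(w_reg, 2))
print("sum of weights:      ", w_reg.sum())            # = 1 because of the intercept
print("all weights in [0,1]?", bool(np.all((w_reg >= 0) & (w_reg <= 1))))
print("prediction:          ", w_reg @ y0)             # typically well outside the donors' outcome range
print("donor outcome range: ", (y0.min(), y0.max()))

# The weights reproduce the treated unit's predictors exactly (w' X0 = x1).
# Because x1 lies outside the donors' convex hull, that is only possible
# with some weights outside [0, 1]; that is the extrapolation.
print("w' X0 =", w_reg @ X0, "vs x1 =", x1)
```

Forcing the weights into [0, 1], as the synthetic control does, would make it impossible to match this x1 exactly, which is precisely the sense in which convex weights refuse to extrapolate.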

Looking at King and Zeng (2006) (cited in Section 4 of Abadie 2021), extreme values for the synthetic control indeed seem to be the concern behind wanting to restrict the coefficients. While I can see some intuition for this, I also feel that the technical reasons are not outlined clearly. Even in the [0, 1] weight scenario, one could clearly generate a linear combination that is not in the support of the original data. However, there is a limit to how far away from the support of the data one could go, and that seems to be the point. Without the [0, 1] restriction, one could move arbitrarily far from the support (see also Section 2.3.1 in Abadie 2021). King and Zeng (2006) make the point that extrapolating further from the support amounts to relying more strongly on the assumptions underlying the model class, which one may want to avoid.
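As a small numerical illustration of that last point (again a toy setup with arbitrary numbers, not from any of the cited papers): with nonnegative weights that sum to one, the combined outcome can never leave the interval spanned by the donors' outcomes, while weights that merely sum to one can be pushed arbitrarily far by adding a scaled zero-sum direction.

```python
import numpy as np

rng = np.random.default_rng(1)
y0 = rng.normal(size=6)                      # donor outcomes

# Convex weights (nonnegative, sum to one): the combination is always
# trapped inside [min(y0), max(y0)], however the weights are chosen.
w_convex = rng.dirichlet(np.ones(6))
print(w_convex @ y0, "lies inside", (y0.min(), y0.max()))

# Weights that sum to one but are otherwise unrestricted: adding a scaled
# zero-sum direction keeps the sum at one while moving the combination
# arbitrarily far from the observed outcomes.
d = np.zeros(6)
d[0], d[1] = 1.0, -1.0                       # sums to zero
for c in (0, 10, 100, 1000):
    w = w_convex + c * d                     # still sums to one
    print(f"c = {c:5d}: combination = {w @ y0:10.2f}")
```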

Overall, my feeling is that it depends on the scenario: the linear regression method could lead to some extreme weights, but it does not have to. Similarly, the synthetic control approach depends on assumptions about the similarity of the reweighted units to the actual treated unit. The authors seem to think that the former poses the greater danger in many cases. The examples and their subject-matter expertise make this a credible point, although I don't think it should be treated as a fact.

  • I'd say that "it can happen and sometimes does" is a fact. I don't know how often it makes much difference in practice, though, which I think is the gist of what you're saying. This research fits with a lot of research that basically says, "OLS can do some weird things." For one recent example, see arxiv.org/abs/2106.05024 Commented Jun 1, 2023 at 10:43
