R squared tells you nothing important when it comes to causal inference and the validity of your estimate.
In brief:
- R squared tells us how much of the variation in the outcome is explained by variation in the predictors.
- Covariates with small causal effects in systems with high noise can result in models with low R squared.
- Despite this, we can still detect causal effects of variables if our identification strategy is valid and we are powered to do so.
- Adding more variables can reduce the residual variability in the outcome, which leads to higher R squared and better precision, but there is a legitimate risk that, if we are not careful, we add a variable which breaks our identification strategy (e.g. a cause of the treatment, a collider, etc.).
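To make the last bullet concrete, here is a small simulation (a sketch using NumPy; the variable names and effect sizes are invented for illustration). We adjust for a prognostic covariate that affects the outcome but not the treatment, and compare the standard error of the treatment estimate with and without the adjustment:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
treatment = rng.integers(0, 2, size=n)   # randomized binary treatment
x = rng.normal(size=n)                   # prognostic covariate (affects outcome only)
outcome = 0.5 * treatment + 3.0 * x + rng.normal(size=n)

def ols(X, y):
    """Least-squares fit returning coefficients and their standard errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se

X_unadj = np.column_stack([np.ones(n), treatment])
X_adj = np.column_stack([np.ones(n), treatment, x])
b_unadj, se_unadj = ols(X_unadj, outcome)
b_adj, se_adj = ols(X_adj, outcome)

print(f"unadjusted: effect={b_unadj[1]:.3f}, se={se_unadj[1]:.4f}")
print(f"adjusted:   effect={b_adj[1]:.3f}, se={se_adj[1]:.4f}")  # smaller se
```

Both models target the same effect (0.5), but adjusting for `x` soaks up residual variation and shrinks the standard error considerably. The same adjustment applied to a collider or a cause of treatment would instead bias the estimate.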
Does a low R squared model pose a problem in assessing the coefficients? No, not necessarily. What a low R squared model tells me is that the system is quite noisy compared with the signal from the covariate. This may pose a problem for statistical power, but that could (in principle) be combated by getting more data or by adjusting for appropriate covariates to reduce residual variation (see my last point above).
Generally, I would not worry about low R squared models simply because they have low R squared. As an example, a model for a binary exposure in an RCT with a binary outcome will almost surely have low R squared, and yet if our identification strategy is correct, we can use a linear model to estimate treatment effects in such scenarios (and do so quite reliably).
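The RCT point is easy to demonstrate with a simulation (a hypothetical example using NumPy; the sample size, effect size, and noise level are invented). A binary treatment with a real effect buried in heavy noise yields a tiny R squared, yet OLS still recovers the true effect:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
treatment = rng.integers(0, 2, size=n)   # randomized binary exposure
true_effect = 0.5
outcome = true_effect * treatment + rng.normal(scale=5.0, size=n)  # very noisy

# Fit outcome ~ 1 + treatment by least squares
X = np.column_stack([np.ones(n), treatment])
beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)

resid = outcome - X @ beta
r_squared = 1 - resid.var() / outcome.var()

print(f"estimated effect: {beta[1]:.3f}")  # close to the true 0.5
print(f"R squared: {r_squared:.4f}")       # very small
```

With these numbers the explained variance is roughly 0.06 against a noise variance of 25, so R squared sits well below 0.01 even though randomization makes the coefficient an unbiased and quite precise estimate of the treatment effect.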