Demetri Pananos

R squared tells you nothing important when it comes to causal inference and the validity of your estimate.

In brief:

  • R squared tells us how much of the variation in the outcome is explained by variation in the predictors.
  • Covariates with small causal effects in systems with high noise can result in models with low R squared.
  • Despite this, we can still detect causal effects of variables if our identification strategy is valid and we are powered to do so.
  • Adding more variables can reduce the variability in the outcome, which leads to higher R squared and better precision, but there is a legitimate risk that, if we are not careful, we add a variable which breaks our identification strategy (e.g. a cause of the treatment, a collider, etc).
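
Those last two bullets can be sketched with a quick simulation (all variables and effect sizes here are hypothetical, chosen only for illustration): conditioning on a collider *raises* R squared while destroying the causal estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
t = rng.normal(size=n)                 # treatment
y = 1.0 * t + rng.normal(size=n)       # outcome; true causal effect = 1
c = t + y + rng.normal(size=n)         # collider: caused by both t and y

def fit(cols):
    """OLS of y on an intercept plus the given columns; returns (betas, R squared)."""
    X = np.column_stack([np.ones(n)] + cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, 1 - resid.var() / y.var()

beta_good, r2_good = fit([t])          # slope on t near 1.0, R squared near 0.50
beta_bad, r2_bad = fit([t, c])         # slope on t near 0.0, R squared near 0.75
```

The model that adjusts for the collider "fits better" by R squared, yet its coefficient on the treatment is driven toward zero.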

Does a low R squared model pose a problem in assessing the coefficients? No, not necessarily. What a low R squared model tells me is that the system is quite noisy as compared to the signal from the covariate. This may pose a problem for statistical power, but that could (in principle) be combated through getting more data or adjusting for appropriate covariates to reduce residual variation (see my last point above).
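
As a sketch of that second remedy (hypothetical numbers): adjusting for a strong prognostic covariate leaves the estimand alone but shrinks the standard error of the treatment effect.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
t = rng.integers(0, 2, size=n).astype(float)   # randomized binary treatment
x = rng.normal(size=n)                          # strong prognostic covariate
y = 0.2 * t + 2.0 * x + rng.normal(size=n)      # true treatment effect = 0.2

def ols_se(cols):
    """OLS of y on intercept + cols; returns (treatment beta, its standard error)."""
    X = np.column_stack([np.ones(n)] + cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1], np.sqrt(cov[1, 1])

b_raw, se_raw = ols_se([t])        # residual variance near 5, wide standard error
b_adj, se_adj = ols_se([t, x])     # same estimand, residual variance near 1
```

Because x is independent of the randomized treatment, adjusting for it changes nothing about what the coefficient estimates; it only soaks up residual variation.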

Generally, I would not worry about low R squared models simply because they have low R squared. As an example, a model for a binary exposure in an RCT with a binary outcome will almost surely have low R squared, and yet if our identification strategy is correct, we can use a linear model to estimate treatment effects in such scenarios (and do so quite reliably).
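
A minimal sketch of that RCT scenario (hypothetical numbers): a linear probability model with a binary treatment and binary outcome has tiny R squared, yet it recovers the risk difference accurately.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
t = rng.integers(0, 2, size=n)        # randomized binary treatment
p = 0.30 + 0.05 * t                   # true risk difference = 0.05
y = rng.binomial(1, p)                # binary outcome

# Linear probability model: regress y on an intercept and treatment
X = np.column_stack([np.ones(n), t])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
r2 = 1 - resid.var() / y.var()        # well under 0.01 despite a real effect
```

Here `beta[1]` sits close to the true 0.05 risk difference even though the model "explains" almost none of the outcome's variance; the Bernoulli noise dominates, and that is fine.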
