$\begingroup$

I have a multiple linear regression model with several independent variables. I am mainly interested in one of these, say X; the others are what I consider covariates (age, education, etc.). The model yields a significant association between the outcome variable (say Y) and X, but the covariates are also significantly associated with Y. I would like to present a plot that shows the significant association between Y and X, but the scatter plot of Y vs. X doesn't illustrate the relationship accurately, since it doesn't take into account the effects of the covariates. What would be a good way to show the bivariate relationship between Y and X that accounts for the effect of the covariates? I use SAS for estimating models but can use other software for plots if necessary.

Thanks!

$\endgroup$

1 Answer

$\begingroup$

In a multiple linear regression without interaction terms, the "effect" (i.e., the partial derivative) of any predictor is the same regardless of the values taken by the other covariates in the model. This is the beauty of the linear model.

I think your problem is not the plot per se but the way in which you plot it. To fix ideas, say you have run a regression and obtained predictions of the form

$$\hat y_i = \hat \beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2, $$

where $x_1$ is the main variable of interest. Note that $\hat y_i$ will vary with both $x_1$ and $x_2$. So, when you plot $\hat y_i$ against $x_1$, the points will be scattered vertically at each value of $x_1$. This is because different observations with the same value of $x_1$ will have different values of $x_2$, and so their predicted $\hat y_i$ will be different.
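To make the vertical scatter concrete, here is a minimal sketch in Python. The coefficients `b0`, `b1`, `b2` and the simulated sample are made-up values chosen only for illustration; they do not come from any real fitted model.

```python
import random
from collections import defaultdict

# Hypothetical fitted coefficients (illustrative values only).
b0, b1, b2 = 1.0, 0.5, 2.0

random.seed(0)
# Made-up sample: x1 takes integer levels 1..5, x2 varies freely.
x1 = [random.randint(1, 5) for _ in range(200)]
x2 = [random.gauss(0.0, 1.0) for _ in range(200)]

# Predicted values depend on both predictors.
y_hat = [b0 + b1 * a + b2 * b for a, b in zip(x1, x2)]

# Collect the predictions observed at each level of x1.
by_level = defaultdict(list)
for a, p in zip(x1, y_hat):
    by_level[a].append(p)

# Vertical spread of the predictions at each level of x1.
spread = {level: max(v) - min(v) for level, v in sorted(by_level.items())}
for level, s in spread.items():
    print(f"x1={level}: predictions span a range of {s:.2f}")
```

Every level of `x1` shows a nonzero range of predicted values, which is exactly the scatter you see in the raw plot of $\hat y_i$ against $x_1$.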

There are two ways to generate a "clean" plot in this situation.

1) For each level of $x_1$, you can calculate the average predicted value and plot these averages against $x_1$ (you might also use bins if $x_1$ has too many levels). For example, if $x_1$ takes on the values 1, 2, 3, ..., 10, you generate an averaged predicted value by calculating the mean of $\hat y_i$ over all observations for which $x_1=1$, then over those for which $x_1=2$, and so on. Then, you plot these averaged predicted values against $x_1$. This gives you one point per level of $x_1$, which you can connect into a single line.

2) Another way is simply to fix the value of $x_2$ at any reasonable value (for example, the sample mean of $x_2$) and calculate

$$ \tilde{y_i} = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2\bar x_2 = (\hat\beta_0 + \hat \beta_2 \bar x_2) + \hat \beta_1x_1.$$

$\hat\beta_2\bar x_2$ is simply a constant, so the values of $\tilde y_i$ will vary only with $x_1$. So, if you plot $\tilde y_i$ against $x_1$, you'll get a "clean" plot. If you have many "control" variables, you'll have to fix all of them, so that you get a single line that represents how the expected outcome varies with $x_1$ when all other variables are held at those values.
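Both approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the coefficients `b0`, `b1`, `b2` and the simulated data are hypothetical, and in practice you would plug in the estimates and the sample from your own model (for example, exported from SAS).

```python
import random
from collections import defaultdict
from statistics import mean

# Hypothetical fitted coefficients (illustrative values only).
b0, b1, b2 = 1.0, 0.5, 2.0

random.seed(1)
# Made-up sample: x1 takes integer levels 1..10, x2 is drawn independently.
x1 = [random.randint(1, 10) for _ in range(500)]
x2 = [random.gauss(0.0, 1.0) for _ in range(500)]
y_hat = [b0 + b1 * a + b2 * b for a, b in zip(x1, x2)]

# Approach 1: average the predicted values within each level of x1.
groups = defaultdict(list)
for a, p in zip(x1, y_hat):
    groups[a].append(p)
avg_pred = {level: mean(v) for level, v in sorted(groups.items())}

# Approach 2: hold x2 fixed at its sample mean and vary only x1.
x2_bar = mean(x2)
y_tilde = {level: b0 + b1 * level + b2 * x2_bar for level in sorted(groups)}

for level in avg_pred:
    print(f"x1={level}: averaged={avg_pred[level]:.3f}, fixed-x2={y_tilde[level]:.3f}")
```

Either dictionary gives one value per level of `x1`, so plotting its values against its keys in any plotting tool yields a single line. In the second approach the line has slope exactly `b1`. In the first approach the averages also carry the mean contribution of `x2` within each level, so the two lines need not coincide if `x2` is correlated with `x1`.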

$\endgroup$
