$\begingroup$

Suppose we have the following model:

$\log(\mathbb{E}(y|x_1, x_2))=\beta_0+\beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$.

It could be a logistic regression, a Poisson regression, or a negative binomial model.

Suppose $y$ is a positive integer and the model is a negative binomial model.

Suppose $x_1$ and $x_2$ are dummy variables.

Suppose that all estimated coefficients are statistically significant.

Then, when $x_1=1$, switching $x_2$ from $0$ to $1$ multiplies $\mathbb{E}(y|x_1, x_2)$ by $e^{\beta_2} e^{\beta_3}$.

Is the fact that both $\beta_2$ and $\beta_3$ are statistically significant sufficient to establish the above fact?

Or do I need to test whether the combined effect of the two on $y$ is statistically significant as well? In that case, would a one-sided test of $\beta_2+\beta_3>0$ be sufficient?

$\endgroup$
  • $\begingroup$ Watch out for where you take the expectation! If you $\log$ the outcome with a piece of code like y_regression = np.log(df["y"]) in Python/Pandas or y_regression <- log(d$y) in R and then fit the regression to y_regression, you are not doing a GLM with a $\log$ link, as negative binomial regression (usually) uses. You are estimating $\mathbb E\left[\log(y)\mid X\right]$ instead of the $\log\left(\mathbb E\left[y\mid X\right]\right)$ that the $\log$-link GLM estimates. Depending on your work, either could be appropriate, but they are not the same. $\endgroup$ Commented Jan 4 at 22:57
  • $\begingroup$ It can't be a logistic regression. It could be a binomial regression with a log link, but "logistic" is specifying the link to be logit. $\endgroup$ Commented Jan 6 at 0:24
  • $\begingroup$ @Dave I posted a comment where I thanked you and said I fixed the mistake. It disappeared. $\endgroup$ Commented Jan 6 at 5:39
  • $\begingroup$ @Glen_b it can be a logistic regression if we put on the LHS $\log(o(\mathbb{E}(y|x)))$, where $\mathbb{E}(y|x)$ is the probability of the event and $o$ is the odds function $o(x)=\frac{x}{1-x}$. It can be a Poisson or negative binomial regression if $y$ is the number of events and $o(x)$ is the identity function $o(x)=x$ $\endgroup$ Commented Jan 6 at 5:40
  • $\begingroup$ In this case, the effect of the interaction becomes "when $x_1=1$, switching $x_2$ from $0$ to $1$ multiplies o($\mathbb{E}(y|x_1, x_2)$) by $e^{\beta_2} e^{\beta_3}$." $\endgroup$ Commented Jan 6 at 5:41

1 Answer

$\begingroup$

The answer depends on the covariance between the estimates $\hat\beta_2$ and $\hat\beta_3$. You want to test the null hypothesis that $\beta_2 + \beta_3 = 0$.

To do this, we write a vector $\delta$ that yields $\beta_2 + \beta_3$ when we take its dot product with the model's coefficient vector $(\beta_0, \beta_1, \beta_2, \beta_3)$; here, $\delta = [0, 0, 1, 1]$. We then need the standard error of $\beta_2 + \beta_3$, which is

$$ \sqrt{\delta^T \Sigma \delta} $$

where $\Sigma$ is the covariance matrix of the coefficient estimates from the model. Expanding the quadratic form shows why the covariance matters: $\delta^T \Sigma \delta = \operatorname{Var}(\hat\beta_2) + \operatorname{Var}(\hat\beta_3) + 2\operatorname{Cov}(\hat\beta_2, \hat\beta_3)$, so individual significance of $\beta_2$ and $\beta_3$ does not by itself determine the significance of their sum. This can be done as follows in R:

# Simulate some data to fit a model
set.seed(2)
n <- 1000
x1 <- rbinom(n, 1, 0.5)
x2 <- rbinom(n, 1, 0.5)
eta <- -1 + 0.5*x1 + 0.25*x2 + 0.15*x1*x2
mu <- exp(eta)
theta <- 1/2
y <- MASS::rnegbin(n, mu, theta)
d <- data.frame(x1, x2)
fit <- MASS::glm.nb(y ~ x1*x2, data = d)

Sigma <- vcov(fit)
delta <- c(0, 0, 1, 1)
beta <- coef(fit)

# Get the variance of b2 + b3
vr <- delta %*% Sigma %*% delta
se <- sqrt(vr)
z <- (beta %*% delta) / se
p <- 2*(1 - pnorm(abs(z)))  # 0.01221841
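As a cross-check of the algebra (and for readers following along in Python, which the first comment above also uses), here is a small sketch with a made-up covariance matrix — the values are illustrative, not the ones estimated by the model above — confirming that the quadratic form $\delta^T \Sigma \delta$ equals $\operatorname{Var}(\hat\beta_2) + \operatorname{Var}(\hat\beta_3) + 2\operatorname{Cov}(\hat\beta_2, \hat\beta_3)$:

```python
# Hypothetical 4x4 coefficient covariance matrix (illustrative values only,
# not taken from the fitted model above)
Sigma = [
    [0.020, 0.001, 0.002, 0.001],
    [0.001, 0.030, 0.001, 0.002],
    [0.002, 0.001, 0.025, -0.010],
    [0.001, 0.002, -0.010, 0.040],
]
delta = [0.0, 0.0, 1.0, 1.0]

# Quadratic form delta' Sigma delta: the variance of beta2 + beta3
var_sum = sum(
    delta[i] * Sigma[i][j] * delta[j]
    for i in range(4)
    for j in range(4)
)

# The same quantity written out term by term
var_manual = Sigma[2][2] + Sigma[3][3] + 2 * Sigma[2][3]

print(round(var_sum, 6))     # 0.045
print(round(var_manual, 6))  # 0.045
```

Note that with a negative covariance (as here), the variance of the sum is smaller than the sum of the variances, so the combined effect can be significant even when the individual coefficients are borderline, and vice versa.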

You should also be able to do this with the {marginaleffects} package, as follows:

library(marginaleffects)

pred <- predictions(
  model = fit,
  newdata = datagrid(x1 = 1, x2 = c(1, 0)),
  type = 'link'
)
pred
#  x1 x2 Estimate Std. Error     z Pr(>|z|)    S  2.5 %  97.5 %
#   1  1   -0.158      0.116 -1.36    0.175  2.5 -0.385  0.0703
#   1  0   -0.587      0.126 -4.66   <0.001 18.3 -0.834 -0.3401
# Columns: rowid, estimate, std.error, statistic, p.value, s.value,
# conf.low, conf.high, x1, x2
# Type: link

hypotheses(pred, hypothesis = 'pairwise')
#           Term Estimate Std. Error    z Pr(>|z|)   S  2.5 % 97.5 %
#  Row 1 - Row 2    0.429      0.171 2.51   0.0122 6.4 0.0935  0.765
# Columns: term, estimate, std.error, statistic, p.value, s.value,
# conf.low, conf.high
# Type: link

Explanation of {marginaleffects}

The predictions function first gets the predicted values for the cases $x_1=1, x_2=1$ and $x_1=1, x_2=0$. The values are returned on the scale of the link function thanks to the type = 'link' argument.

The first row is equivalent to

$$ \beta_0 + \beta_1 + \beta_2 + \beta_3 $$

The second row is equivalent to

$$ \beta_0 + \beta_1 $$

Using the hypotheses function, I can test the difference between the first and second row, which is equivalent to

$$ \beta_2 + \beta_3 $$

which is what we want. The package will handle the standard error computation for us, but it is more or less what I have shown above.
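To put the result back on the multiplicative scale the question asks about, exponentiate the estimate and its confidence limits. Using the rounded output above,

$$ e^{\hat\beta_2 + \hat\beta_3} \approx e^{0.429} \approx 1.54, \qquad 95\%\ \text{CI: } (e^{0.0935}, e^{0.765}) \approx (1.10, 2.15), $$

so when $x_1 = 1$, switching $x_2$ from $0$ to $1$ multiplies the expected count by roughly $1.5$.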

Edit:

The easiest way to get the answer with {marginaleffects} is by referencing the coefficients in the model as follows (h/t robertspierre)

hypotheses(
  model = fit,
  hypothesis = 'x2 + `x1:x2`=0'
)
#>
#>            Term Estimate Std. Error    z Pr(>|z|)   S  2.5 % 97.5 %
#>  x2 + `x1:x2`=0    0.429      0.171 2.51   0.0122 6.4 0.0935  0.765
#>
#> Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
$\endgroup$
  • $\begingroup$ Notice that hypotheses(fit, hypothesis = 'x2+`x1:x2`=0') would work as well. So if you have a lot of variables, you don't have to specify a value for the variables you are not interested in. $\endgroup$ Commented Jan 5 at 5:20
  • $\begingroup$ @robertspierre ah nice, I tried something similar but I guess I forgot the backticks! Thanks for the info $\endgroup$ Commented Jan 5 at 14:16
  • $\begingroup$ Or hypotheses(fit, hypothesis = "b3 + b4 = 0") if that's easier. $\endgroup$ Commented Jan 5 at 16:02
