
First, to be clear, the issue mentioned in the title does not happen often, and in every parametric simulation I've tried, both NB and Poisson produce similar estimates regardless of the type of covariate. So I cannot reproduce this phenomenon using parametric simulations (not that it's impossible; I'm guessing it's not). Still, there are a few commonalities across the cases where it does happen:

  1. there is a numeric (non-categorical) predictor, with a strong effect, included in the model;

  2. there is noteworthy departure from the mean/variance equivalence in the Poisson model; and

  3. the Poisson point estimate comports with basic descriptive statistics (and with the corresponding linear regression model, for that matter), but the NB point estimate does not.

I understand that when the covariates are not perfectly orthogonal, which is never the case with numeric covariates in a count regression model (I've read @cardinal's answer here: When do Poisson and negative binomial regressions fit the same coefficients?), the two models will not produce identical estimates. But I'm trying to figure out what's going on when there is a qualitative difference in the point estimates, and what to do when I encounter this situation.

This took a while to cook up, but below is some fake data that exhibits this phenomenon. I tried to create a situation where the estimates were statistically significant in opposite directions, but I was working mostly by trial and error (trying to mimic the shape, modality, etc., of real data where I've seen this) and eventually gave up.

```r
y  <- c(7,9,3,7,10,4,0,0,0,10,11,9,7,0,0,0,0,0,0,19,0,21,10,5,5,3,0,0,8,2,1,0,7,6,0,0,0,17,1,10,3,13,4,0,4,0,4,0,6,8,25,9,2,14,1,3,0,2,0,0,0,4,1,1,30,0,6,8,0,22,0)
x1 <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)
x2 <- c(3,0,4,28,9,5,7,0,0,12,6,1,7,6,11,35,0,0,1,0,5,7,0,0,0,10,10,0,0,2,4,0,0,0,6,3,17,9,0,10,1,17,1,3,0,3,5,1,7,2,12,0,7,20,17,0,1,6,0,0,5,0,0,4,27,4,0,9,5,8,7)
```

Let's say the outcome is y, and x2 is the strong numeric predictor. The effect of x1 (a binary predictor unassociated with x2) in the NB model gets weird once you include x2. First, the unadjusted effects:

```r
library(MASS)

coef( glm( y ~ x1, family=poisson ) )[2]
       x1 
0.1602297 

coef( glm.nb( y ~ x1 ) )[2]
       x1 
0.1602297 

## which of course equals the difference of the logs of the means
log( mean(y[which(x1==1)]) ) - log( mean(y[which(x1==0)]) )
[1] 0.1602297
```

All good so far. Now include x2 and things get weird:

```r
coef( glm( y ~ x1 + x2, family=poisson ) )[2]
       x1 
0.1515809 

coef( glm.nb( y ~ x1 + x2 ) )[2]
        x1 
-0.1308116 
```
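For what it's worth, the mean/variance inequality mentioned in point 2 above is easy to check directly on these data. A quick sketch (the y vector from above is repeated here so the snippet stands alone; under a Poisson model this ratio would be about 1):

```r
# y as defined above
y <- c(7,9,3,7,10,4,0,0,0,10,11,9,7,0,0,0,0,0,0,19,0,21,10,5,5,3,0,0,8,2,1,0,
       7,6,0,0,0,17,1,10,3,13,4,0,4,0,4,0,6,8,25,9,2,14,1,3,0,2,0,0,0,4,1,1,
       30,0,6,8,0,22,0)

# Poisson assumes variance == mean; a ratio well above 1 flags overdispersion
var(y) / mean(y)
```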

What's going on here, and why is it always (evidence not shown) the NB model's point estimate that goes haywire in this situation? It feels like evidence that the NB model is "biased" in a certain way here. (I put "biased" in quotation marks because I cannot reproduce this with a parametric simulation where I know the true coefficient value; I really just mean it is apparently off base relative to basic descriptives.)

As a more general question: How would you go about modeling data distributed like this? Count/numeric data distributed this way is not going to be amenable to any classical family of GLMs.

Any insights into this are appreciated!


2 Answers


The NB model point estimate has not actually gone haywire in this situation. We can see this by looking at the full model output, which includes the standard errors of the estimates:

```r
> summary(glm.nb( y ~ x1 + x2 ))

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  1.33002    0.28587   4.652 3.28e-06 ***
x1          -0.13081    0.35781  -0.366   0.7147    
x2           0.05417    0.02473   2.191   0.0285 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for Negative Binomial(0.4889) family taken to be 1)

    Null deviance: 81.014  on 70  degrees of freedom
Residual deviance: 76.689  on 68  degrees of freedom
AIC: 375.29
```

The standard error for the coefficient of x1 is 0.358, greater than the difference between the estimates in the Poisson and this model, which is 0.282. A change in the coefficient estimate of (slightly less than) 0.8 standard errors is nothing to worry about.
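To make that concrete, here is the same arithmetic as a snippet (the numbers are simply the estimates and standard error reported above):

```r
# Put the Poisson/NB discrepancy for the x1 coefficient in standard-error units
b_pois <- 0.1515809    # Poisson estimate for x1
b_nb   <- -0.13081     # NB estimate for x1
se_nb  <- 0.35781      # NB standard error for x1

(b_pois - b_nb) / se_nb  # about 0.79 standard errors
```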

We can compare the NB model with the freely estimated coefficient to a model in which that coefficient is forced to equal the estimate from the Poisson model:

```r
m1 <- glm.nb( y ~ x1 + x2 )
m2 <- glm.nb( y ~ offset(0.1515809*x1) + x2 )

> AIC(m1,m2)
   df      AIC
m1  4 375.2859
m2  3 373.8324
```

The AIC is lower for the model using the Poisson estimate than for the full model, indicating that the Poisson estimate is not a poor estimate when plugged into the NB-based model. This supports the conclusion that the difference between the two estimates is immaterial from a statistical point of view.


@jbowman gives a good answer for the provided example, but when the difference between the models is bigger (which seems to be what you are asking about), it is probably the Poisson model that is wrong, and potentially very wrong.

The negative binomial model does not only assume that there is overdispersion; it assumes that the overdispersion grows as $\mu$ grows. The most common parameterization is $\sigma^2 = \mu + \frac{1}{\theta} \mu^2$, where $\theta$ is called the shape or overdispersion parameter. If this model is true and x2 is a strong confounder, then not adjusting for it leaves the estimate confounded; when adjusting for it, the Poisson model will massively underestimate the real variance of high-$\mu$ observations and be highly sensitive to the randomness in those values. This is more or less equivalent to a linear model dealing poorly with heteroscedasticity.
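To see what that variance assumption implies numerically, a small sketch (the value theta = 0.5 here is just an illustrative choice, close to the 0.4889 estimated in the other answer):

```r
# NB variance function: sigma^2 = mu + mu^2 / theta
theta  <- 0.5  # illustrative, not estimated from data
nb_var <- function(mu) mu + mu^2 / theta

nb_var(2)   # 10: already five times the Poisson variance of 2
nb_var(20)  # 820: forty-one times the Poisson variance of 20
```

So if the NB model is the true one, the variance of high-mean observations dwarfs what a Poisson model assumes, which is exactly why the adjusted Poisson fit is so fragile.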

Below is a very extreme simulation where the true coefficient is $-0.5$. The sign of the estimate in the simple (unadjusted) models is guaranteed to be positive, and for the adjusted Poisson model the sign is almost a coin flip, yet it will almost certainly come with a p-value of <2e-16 ***. The adjusted NB model at the end gives accurate estimates. Check out the plot of residuals vs. predicted values, with x1==1 shown as red points; yes, it is very extreme:

(In the plot, all residuals on the left three-quarters are so close to 0 that they look like a single line, and then the spread grows rapidly; the red and black points are just neatly divided left and right.)

```r
library(MASS)
library(tidyverse)

n <- 500
true_theta <- 2
set.seed(5)
x2 <- runif(n, 0, 2)
x1 <- rbinom(n, size = 1, prob = x2 > 1)
y  <- rnbinom(n, mu = 2*exp(-0.5 * x1 + 20*x2), size = true_theta)

summary(glm.nb(y ~ x1))
summary(glm(y ~ x1, family = poisson()))

m_poiss <- glm(y ~ x1 + x2, family = poisson())
plot(predict(m_poiss), residuals(m_poiss), col = x1 + 1)
summary(m_poiss)
summary(glm.nb(y ~ x1 + x2))
```
