First, to be clear, the issue mentioned in the title does not happen often, and in every parametric simulation I've tried, NB and Poisson produce similar estimates regardless of the type of covariate. So I cannot reproduce this phenomenon with parametric simulations (not that it's impossible; I'm guessing it's not). Yet there are a few commonalities I've seen in the cases where it does happen:
there is a numeric (non-categorical) predictor, with a strong effect, included in the model;
there is noteworthy departure from the mean/variance equivalence in the Poisson model; and
the Poisson point estimate comports with basic descriptive statistics (and with the corresponding linear regression model, for that matter), but the NB point estimate does not.
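For reference, a typical one of the parametric simulations I mentioned looks like the sketch below (the coefficients, dispersion, and covariate distributions are made up for illustration; `rnegbin` is from MASS):

```r
library(MASS)

set.seed(1)
n  <- 500
x1 <- rbinom(n, 1, 0.5)                  # binary covariate
x2 <- rpois(n, 5)                        # numeric covariate
mu <- exp(0.5 + 0.3*x1 + 0.15*x2)        # hypothetical true coefficients
y  <- rnegbin(n, mu = mu, theta = 1.5)   # overdispersed counts

coef(glm(y ~ x1 + x2, family = poisson))[2]
coef(glm.nb(y ~ x1 + x2))[2]
## in runs like this the two estimates track each other closely
```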
I understand that the two models will not produce identical estimates when the covariates are not perfectly orthogonal, which is never the case with numeric covariates in a count regression model (I've read @cardinal's answer here: When do Poisson and negative binomial regressions fit the same coefficients?). But I'm trying to figure out what's going on when there is a qualitative difference in the point estimates, and what to do when I encounter this situation.
This took a while to cook up, but below is some fake data that exhibits the phenomenon. I tried to create a situation where the estimates were statistically significant in opposite directions, but I was working mostly by trial and error (trying to mimic the shape, modality, etc., of real data where I've seen this) and eventually gave up.
```r
y  <- c(7,9,3,7,10,4,0,0,0,10,11,9,7,0,0,0,0,0,0,19,0,21,10,5,5,3,0,0,8,2,1,0,7,6,0,
        0,0,17,1,10,3,13,4,0,4,0,4,0,6,8,25,9,2,14,1,3,0,2,0,0,0,4,1,1,30,0,6,8,0,22,0)
x1 <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
        1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)
x2 <- c(3,0,4,28,9,5,7,0,0,12,6,1,7,6,11,35,0,0,1,0,5,7,0,0,0,10,10,0,0,2,4,0,0,0,6,
        3,17,9,0,10,1,17,1,3,0,3,5,1,7,2,12,0,7,20,17,0,1,6,0,0,5,0,0,4,27,4,0,9,5,8,7)
```

Let's say the outcome is y and x2 is the strong numeric predictor. The effect of x1, a binary predictor that is unassociated with x2, gets weird in the NB model once you include x2. First, the unadjusted effects:
```r
library(MASS)

coef( glm( y ~ x1, family=poisson ) )[2]
#        x1 
# 0.1602297 

coef( glm.nb( y ~ x1 ) )[2]
#        x1 
# 0.1602297 

## which of course equals the difference of the logs of the means
log( mean(y[which(x1==1)]) ) - log( mean(y[which(x1==0)]) )
# [1] 0.1602297
```

All good so far. Now include x2 and it gets weird:
```r
coef( glm( y ~ x1 + x2, family=poisson ) )[2]
#        x1 
# 0.1515809 

coef( glm.nb( y ~ x1 + x2 ) )[2]
#         x1 
# -0.1308116 
```

What's going on here, and why is it always (evidence not shown) the NB point estimate that goes haywire in this situation? It feels like evidence that the NB model is "biased" in a certain way here. (I put "biased" in quotation marks because I cannot reproduce this with a parametric simulation where I know the true coefficient value; I really just mean that it is apparently at odds with the basic descriptives.)
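One pattern I've been staring at while trying to understand this: under a log link, and treating the dispersion θ as fixed, the IRLS working weights are w = μ for the Poisson model but w = μ/(1 + μ/θ) for NB2, so the NB fit caps the influence of observations with large fitted means. I suspect this differential downweighting is related to the sign flip. A small sketch of the relative weight (the θ here is made up, not estimated from the data above):

```r
## Working weights under a log link:
##   Poisson: w = mu        NB2 (theta fixed): w = mu / (1 + mu/theta)
## The NB weight relative to Poisson is 1/(1 + mu/theta),
## which shrinks as the fitted mean grows.
theta <- 1.5                  # made-up dispersion parameter
mu    <- c(1, 5, 10, 20, 30)  # hypothetical fitted means
round((mu / (1 + mu/theta)) / mu, 3)
# [1] 0.600 0.231 0.130 0.070 0.048
```

So an observation with a fitted mean of 30 carries less than a tenth of the relative weight of one with a fitted mean of 1, which seems relevant given how heavy-tailed y is here.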
As a more general question: How would you go about modeling data distributed like this? Count/numeric data distributed this way is not going to be amenable to any classical family of GLMs.
Any insights into this are appreciated!
