I have been dabbling in NB regression for less than a year now. I have applied the well-known goodness-of-fit (g.o.f.) tests. Lately I started using the Conditional Moment (CM) test described in Cameron and Trivedi's book Regression Analysis of Count Data (Ch. 5, p. 194). It basically compares the expected and observed counts and tests the g.o.f. with a chi-square statistic.
Here is my problem: I generate synthetic NB2 data where the linear predictor is a function of two covariates. Below is the R code, borrowed from Hilbe's [NB Book]:
```r
nb2 <- function(nobs = 5000, off = 0, xv = c(-16.50, 1.65, 0.75)) {
  x2    <- rnorm(nobs, mean = 10000, sd = 2100)
  x3    <- 0.1 + runif(nobs)
  X     <- cbind(1, log(x2), x3)
  xb    <- X %*% xv
  alpha <- 6.50
  exb   <- exp(xb + off)                                # Poisson predicted value
  xg    <- rgamma(n = nobs, shape = alpha, rate = alpha) # gamma variates given alpha
  xbg   <- exb * xg                                     # mix Poisson and gamma variates
  nby   <- rpois(nobs, xbg)                             # generate NB2 variates
  out   <- data.frame(y = nby, x2 = x2, x3 = x3)
  return(out)
}
```

I generate 5000 random NB2 observations:
```r
data <- nb2(nobs = 5000)
```

The summary table of the generated data is:
Then I fit an NB2 model to the data with the glm.nb function (package MASS), using the two covariates as follows:
```r
m1 <- glm.nb(y ~ log(x2) + x3, data = data)
```

The results are:
```
Deviance Residuals:
    Min      1Q  Median      3Q     Max
-1.4137 -0.9121 -0.7612  0.5488  3.3634

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -15.20781    1.03408 -14.707   <2e-16 ***
log(x2)       1.50963    0.11167  13.519   <2e-16 ***
x3            0.71873    0.07901   9.097   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for Negative Binomial(4.7311) family taken to be 1)

    Null deviance: 4714.7 on 4999 degrees of freedom
Residual deviance: 4433.2 on 4997 degrees of freedom
AIC: 8437.5

Number of Fisher Scoring iterations: 1

              Theta: 4.73
          Std. Err.: 1.16
 2 x log-likelihood: -8429.509
```

When I calculate the Pearson statistic, I get 1.0063, which is as expected. I also apply the CM test with 11 bins (the last bin collecting counts of 10 and more) to check how well the expected and observed counts match. The resulting chi-square statistic is 5.77, which is not significant for df = 10, i.e. the fit is good. This can also be verified visually using rootograms of the model, where the expected and observed counts overlap almost perfectly.
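For concreteness, here is a minimal sketch of the expected-versus-observed comparison underlying the CM test, assuming a fitted glm.nb object. The function name `cm_counts` and the plain chi-square at the end are mine; Cameron and Trivedi's full CM statistic additionally corrects for the fact that the parameters are estimated, so this is only an approximation of their procedure:

```r
## Sketch: binned expected vs. observed counts for a glm.nb fit.
## NOTE: the plain chi-square computed here omits the correction for
## estimated parameters that the full CM test applies.
library(MASS)

cm_counts <- function(fit, y, max_count = 10) {
  mu    <- fitted(fit)   # per-observation NB2 means
  theta <- fit$theta     # estimated dispersion (size) parameter
  bins  <- 0:(max_count - 1)
  ## expected cell count for k: sum over observations of P(Y_i = k)
  expd  <- sapply(bins, function(k) sum(dnbinom(k, mu = mu, size = theta)))
  expd  <- c(expd, length(y) - sum(expd))  # last bin: counts >= max_count
  obs   <- c(sapply(bins, function(k) sum(y == k)), sum(y >= max_count))
  list(observed = obs, expected = expd,
       chisq = sum((obs - expd)^2 / expd))
}

## usage with the simulated data above:
## cc <- cm_counts(m1, data$y); cc$chisq
```

The Pearson statistic quoted above is simply `sum(residuals(m1, type = "pearson")^2) / df.residual(m1)`.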
Then I try an intercept-only model; the results are:
```
Deviance Residuals:
    Min      1Q  Median      3Q     Max
-0.8938 -0.8938 -0.8938  0.6647  3.2406

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.83933    0.02325  -36.11   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for Negative Binomial(2.5845) family taken to be 1)

    Null deviance: 4419.8 on 4999 degrees of freedom
Residual deviance: 4419.8 on 4999 degrees of freedom
AIC: 8705.4

Number of Fisher Scoring iterations: 1

              Theta: 2.584
          Std. Err.: 0.410
 2 x log-likelihood: -8701.442
```

Here the Pearson statistic is 1.0060, and the chi-square statistic for the CM test is 3.897 (which is again not significant). The rootogram is:
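For reference, the rootograms above can be drawn without extra packages. The following hanging rootogram is my own hand-rolled sketch (the countreg package on R-Forge provides a polished rootogram() function); it takes the observed and expected counts per bin:

```r
## Hanging rootogram sketch: bars of height sqrt(observed) hang from the
## sqrt(expected) curve; bars dipping below the zero line mark counts that
## the model under-predicts.
hanging_rootogram <- function(obs, expd) {
  counts <- seq_along(obs) - 1  # bins 0, 1, 2, ...
  se <- sqrt(expd)
  so <- sqrt(obs)
  plot(counts, se, type = "b", col = "red", xlab = "count",
       ylab = "sqrt(frequency)", ylim = range(0, se, se - so))
  rect(counts - 0.4, se - so, counts + 0.4, se,
       col = "grey", border = "grey40")
  abline(h = 0, lty = 2)
  invisible(data.frame(count = counts, bottom = se - so))
}
```

Called with the observed and expected vectors from the binned comparison, a well-fitting model shows all bars ending close to the zero line.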
1- I did the same with one covariate only; the results are similar, and I do not see the effect of removing a significant covariate.
2- I tried using more than two covariates, up to six; still the same result.
3- I repeated this with many different random sets and varying sample sizes and get similar results: I do not see the effect of removing the covariates, and it looks like an intercept-only model, or one with a single covariate, does the job (of course, the log-likelihood values improve considerably when covariates are added).
4- On the other hand, when I repeat this experiment with synthetic Poisson data, fit the model using Poisson regression, and carry out the same approach, the impact of removing a covariate is clearly visible.
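For point 4, the Poisson counterpart of the generator is just the code above without the gamma mixing; this is a sketch along those lines (the function name `pois2` is mine):

```r
## Poisson analogue of nb2(): same linear predictor, no gamma heterogeneity,
## so the conditional variance equals the conditional mean.
pois2 <- function(nobs = 5000, off = 0, xv = c(-16.50, 1.65, 0.75)) {
  x2 <- rnorm(nobs, mean = 10000, sd = 2100)
  x3 <- 0.1 + runif(nobs)
  xb <- cbind(1, log(x2), x3) %*% xv
  data.frame(y = rpois(nobs, exp(xb + off)), x2 = x2, x3 = x3)
}

## fit with, e.g.:
## m0 <- glm(y ~ log(x2) + x3, family = poisson, data = pois2())
```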
Does anyone have an explanation for this? I would appreciate any input, comments, or suggestions.


