To select the most appropriate type of data distribution, it is necessary to calculate the dispersion value. This can be overdispersion and underdispersion. To determine the dispersion, I used the DHARMa package.
First sample. When fitting the GLM model, I used offset, given that "density" = "nest" (number of nests) / "area". Dependent variable is the number of nests of species2.
An example of my data. I have several independent predictors, here is one of them.
structure(list(area = c(11.7, 14.2, 9.7, 12.5, 12.7, 7.4, 7.5, 10, 11.3, 7.1, 7.7, 14.1, 5.7, 11.1, 8.8, 9, 7.8, 8.5), nest_species1 = c(1L, 6L, 2L, 1L, 1L, 1L, 1L, 4L, 1L, 1L, 0L, 4L, 1L, 5L, 4L, 1L, 1L, 1L), nest_species2 = c(2L, 1L, 2L, 0L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 0L, 0L, 1L, 0L, 0L, 1L), green = c(5.7, 6.02, 6.14, 5.6, 5.64, 6.43, 6.2, 6.05, 4.91, 5.85, 4.86, 7.14, 5.22, 6.81, 6.63, 6.76, 6.42, 6.5)), class = "data.frame", row.names = c(NA, -18L )) I started by fitting a GLM using Poisson:
model.1=glm (nest_species2 ~ green + offset(log(area)), data= Example, family = poisson (link="log")) summary(model.1) Call: glm(formula = nest_species2 ~ green + offset(log(area)), family = poisson(link = "log"), data = Example) Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -1.4435 2.1259 -0.679 0.497 green -0.1217 0.3512 -0.347 0.729 (Dispersion parameter for poisson family taken to be 1) Null deviance: 16.523 on 17 degrees of freedom Residual deviance: 16.404 on 16 degrees of freedom AIC: 50.699 Number of Fisher Scoring iterations: 5 We see that the residual deviation is equal to the number of degrees of freedom. That is, there is no under- or overdispersion. Next, I determine the dispersion using the DHARMa package.
library(DHARMa) testDispersion(model.1) DHARMa nonparametric dispersion test via sd of residuals fitted vs. simulated data: simulationOutput dispersion = 0.6911, p-value = 0.472 alternative hypothesis: two.sided Judging by the value of 0.6911, there is underdispersion here. Given the underdispersion, I fit the GLM using double Poisson regression:
gamlss1 <- gamlss(nest_species2 ~ green + offset(log(area)), data= Example, family = DPO) summary(gamlss1) Family: c("DPO", "Double Poisson") Call: gamlss(formula = nest_species2 ~ green + offset(log(area)), family = DPO, data = Example) Fitting method: RS() ------------------------------------------------------------------ Mu link function: log Mu Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -1.4574 1.7915 -0.813 0.429 green -0.1200 0.2958 -0.406 0.691 ------------------------------------------------------------------ Sigma link function: log Sigma Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -0.3143 0.3324 -0.946 0.359 ------------------------------------------------------------------ No. of observations in the fit: 18 Degrees of Freedom for the fit: 3 Residual Deg. of Freedom: 15 at cycle: 3 Global Deviance: 45.97346 AIC: 51.97346 SBC: 54.64457 Questions.
- Why is the variance estimate so different when using the ratio of residual deviation to residual degrees of freedom and when using DHARMa? Should I trust the DHARMa results more?
- Am I correct in interpreting the results using DHARMa to indicate that underdispersion was detected?
- Is it correct to use double Poisson regression in my case?
Second sample. Same input data. Similar algorithm for another bird species (species1). GLM fit using Poisson:
model.2=glm (nest_species1 ~ green + offset(log(area)), data= Example, family = poisson (link="log")) summary(model.2) Call: glm(formula = nest_species1 ~ green + offset(log(area)), family = poisson(link = "log"), data = Example) Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -5.4714 1.7872 -3.061 0.0022 ** green 0.6269 0.2822 2.221 0.0263 * --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for poisson family taken to be 1) Null deviance: 18.122 on 17 degrees of freedom Residual deviance: 12.918 on 16 degrees of freedom AIC: 58.467 Number of Fisher Scoring iterations: 5 We see that the residual deviation is less than the number of degrees of freedom. These are signs of underdispersion. Next, I determine the dispersion using the DHARMa package.
library(DHARMa) testDispersion(model.2) DHARMa nonparametric dispersion test via sd of residuals fitted vs. simulated data: simulationOutput dispersion = 0.90849, p-value = 0.976 alternative hypothesis: two.sided Judging by the value of 0.90849 there is underdispersion here. Given the insufficient dispersion, I fit the GLM using double Poisson regression:
gamlss2 <- gamlss(nest_species1 ~ green + offset(log(area)), data= Example, family = DPO) summary(gamlss2) Family: c("DPO", "Double Poisson") Call: gamlss(formula = nest_species1 ~ green + offset(log(area)), family = DPO, data = Example) Fitting method: RS() ------------------------------------------------------------------ Mu link function: log Mu Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -5.4952 1.4053 -3.910 0.00139 ** green 0.6293 0.2222 2.833 0.01260 * --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 ------------------------------------------------------------------ Sigma link function: log Sigma Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -0.4612 0.3041 -1.516 0.15 ------------------------------------------------------------------ No. of observations in the fit: 18 Degrees of Freedom for the fit: 3 Residual Deg. of Freedom: 15 at cycle: 3 Global Deviance: 52.57692 AIC: 58.57692 SBC: 61.24803 Questions.
- Am I correct in interpreting the results using DHARMa to indicate that underdispersion was detected?
- How should I correctly interpret the DHARMa results (> 1 indicates overdispersion, < 1 indicates underdispersion)?
- Is it correct to use double Poisson regression in my case?