I am working with binned data indexed by $i = 1, \ldots, n$, where for each bin $i$, I have:
- $X_i$: the number of successes
- $N_i$: the total number of trials
I want to model the proportion of successes within each bin, and I am trying to decide between using a binomial or a Poisson generalized linear model (GLM).
Current Approach:
My understanding is that, for binomial data, the maximum likelihood estimate of the success probability in bin $i$ is the observed proportion of successes: \begin{align} p_i = \frac{X_i}{N_i} \end{align}
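As a quick numerical check of this claim, the sketch below maximizes the binomial log-likelihood $X_i \log p + (N_i - X_i)\log(1 - p)$ over a fine grid of $p$ values for a single bin and compares the maximizer with $X_i / N_i$. The bin counts (7 successes out of 20 trials) and the function names are hypothetical, chosen only for illustration.

```python
import math

def binomial_loglik(p, x, n):
    """Binomial log-likelihood of success probability p for x successes in n trials.
    The constant log C(n, x) is dropped because it does not depend on p."""
    return x * math.log(p) + (n - x) * math.log(1 - p)

def grid_mle(x, n, steps=10_000):
    """Maximize the log-likelihood over the grid p = k / steps, k = 1, ..., steps - 1."""
    grid = [k / steps for k in range(1, steps)]
    return max(grid, key=lambda p: binomial_loglik(p, x, n))

x, n = 7, 20  # hypothetical bin: 7 successes out of 20 trials
print(grid_mle(x, n), x / n)
```

The grid maximizer should coincide with the observed proportion up to the grid spacing.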
A common approach is to model $ p_i $ using a binomial GLM: \begin{align} \text{logit}\left( E(p_i) \right) = \beta_0 + \beta_1 x_{1,i} + \dots + \beta_{p-1} x_{p-1,i} \end{align}
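For a single covariate, the binomial GLM above can be fitted to the grouped counts $(X_i, N_i)$ by iteratively reweighted least squares (IRLS). The sketch below is a minimal, standard-library-only version under that assumption (intercept plus one covariate, logit link); the function name `fit_binomial_glm` and the data are hypothetical. In practice one would use a library routine such as R's `glm(cbind(X, N - X) ~ x, family = binomial)`, which should give the same estimates up to convergence tolerance.

```python
import math

def fit_binomial_glm(x, successes, trials, iters=50, tol=1e-10):
    """Fit logit(pi_i) = b0 + b1 * x_i to grouped binomial data by IRLS.

    x: covariate value per bin; successes: X_i; trials: N_i.
    Returns the estimates (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        # Accumulate the weighted normal equations (A^T W A) beta = A^T W z.
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for xi, Xi, Ni in zip(x, successes, trials):
            eta = b0 + b1 * xi
            pi = 1.0 / (1.0 + math.exp(-eta))
            w = Ni * pi * (1.0 - pi)       # binomial variance: the IRLS weight for the logit link
            z = eta + (Xi - Ni * pi) / w   # working response
            s_w += w
            s_wx += w * xi
            s_wxx += w * xi * xi
            s_wz += w * z
            s_wxz += w * xi * z
        # Solve the 2x2 system explicitly.
        det = s_w * s_wxx - s_wx * s_wx
        new_b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        new_b1 = (s_w * s_wxz - s_wx * s_wz) / det
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1

x = [0.0, 1.0, 2.0, 3.0, 4.0]   # hypothetical covariate values
X = [2, 5, 9, 14, 17]           # hypothetical successes per bin
N = [20, 22, 21, 25, 24]        # hypothetical trials per bin
b0, b1 = fit_binomial_glm(x, X, N)
fitted_p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
print(b0, b1, fitted_p)
```

At convergence the estimates satisfy the binomial score equations $\sum_i (X_i - N_i \hat{\pi}_i) = 0$ and $\sum_i x_i (X_i - N_i \hat{\pi}_i) = 0$, where $\hat{\pi}_i$ is the fitted success probability for bin $i$; this is a convenient check on any fit.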
Alternatively, I could use a Poisson model with an offset to model the counts $ X_i $ directly: \begin{align} \log\left(E(X_i)\right) = \beta_0 + \beta_1 x_{1,i} + \dots + \beta_{p-1} x_{p-1,i} + \log(N_i) \end{align}
Moving the offset to the left-hand side gives \begin{align} \log\left(\frac{E(X_i)}{N_i}\right) = \log\left(E(X_i)\right) - \log(N_i) = \beta_0 + \beta_1 x_{1,i} + \dots + \beta_{p-1} x_{p-1,i}, \end{align} so this effectively models the rate (or proportion) of successes relative to the total number of trials.
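The Poisson model with an offset can be fitted by IRLS in the same way; what changes is the log link, the use of the Poisson variance $\mu_i = E(X_i)$ as the IRLS weight, and the fixed term $\log(N_i)$ in the linear predictor, whose coefficient is held at 1. This is again a minimal standard-library sketch with one covariate; the function name `fit_poisson_offset_glm` and the data are hypothetical. The corresponding library call in R would be `glm(X ~ x + offset(log(N)), family = poisson)`.

```python
import math

def fit_poisson_offset_glm(x, counts, exposure, iters=50, tol=1e-10):
    """Fit log(E[X_i]) = b0 + b1 * x_i + log(N_i) by IRLS.

    x: covariate value per bin; counts: X_i; exposure: N_i (used only through the offset).
    Returns the estimates (b0, b1)."""
    offsets = [math.log(n) for n in exposure]
    # Start from the intercept-only MLE: the overall rate sum(X) / sum(N).
    b0, b1 = math.log(sum(counts) / sum(exposure)), 0.0
    for _ in range(iters):
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for xi, Xi, oi in zip(x, counts, offsets):
            mu = math.exp(oi + b0 + b1 * xi)      # fitted mean count N_i * exp(b0 + b1 * x_i)
            w = mu                                # Poisson variance: the IRLS weight for the log link
            z = b0 + b1 * xi + (Xi - mu) / mu     # working response with the offset subtracted
            s_w += w
            s_wx += w * xi
            s_wxx += w * xi * xi
            s_wz += w * z
            s_wxz += w * xi * z
        det = s_w * s_wxx - s_wx * s_wx
        new_b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        new_b1 = (s_w * s_wxz - s_wx * s_wz) / det
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1

x = [0.0, 1.0, 2.0, 3.0, 4.0]   # hypothetical covariate values
X = [2, 5, 9, 14, 17]           # hypothetical successes per bin
N = [20, 22, 21, 25, 24]        # hypothetical trials per bin
b0, b1 = fit_poisson_offset_glm(x, X, N)
# exp(b0 + b1 * x_i) is the fitted rate E(X_i) / N_i.
rates = [math.exp(b0 + b1 * xi) for xi in x]
print(b0, b1, rates)
```

Because $\exp(\hat\beta_0 + \hat\beta_1 x_{1,i})$ is an estimated rate $E(X_i)/N_i$ rather than a probability, nothing in this model constrains it to stay below 1.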
Question:
Given that my primary interest is in modeling the occurrence of successes relative to the total number of trials within each bin (i.e., the proportion of successes $ \frac{X_i}{N_i} $ in each space/time bin $ i $), which model should I favor: the binomial GLM or the Poisson GLM with an offset?
Specifically:
- Are there situations where one model would be more appropriate than the other?
- Does the choice depend on properties of my data (e.g., the distribution of $ N_i $ across bins or the variability in proportions)?
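One concrete way the two models differ, relevant to the second point, is the variance each assumes for $X_i$ at the same mean $N_i p_i$: $N_i p_i (1 - p_i)$ under the binomial model and $N_i p_i$ under the Poisson model, a ratio of $1/(1 - p_i)$. The short sketch below tabulates this for a hypothetical bin with $N_i = 50$; the function names are illustrative only. The two variances are close when $p_i$ is small and diverge as $p_i$ approaches 1.

```python
def variance_binomial(n, p):
    """Var(X) under X ~ Binomial(n, p)."""
    return n * p * (1 - p)

def variance_poisson(n, p):
    """Var(X) under X ~ Poisson(n * p): the variance equals the mean."""
    return n * p

n = 50  # hypothetical number of trials in one bin
for p in (0.01, 0.1, 0.5, 0.9):
    vb = variance_binomial(n, p)
    vp = variance_poisson(n, p)
    print(f"p={p:<4}  binomial var={vb:7.3f}  Poisson var={vp:7.3f}  ratio={vp / vb:6.3f}")
```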