$\begingroup$

I am working with binned data indexed by $i = 1, \ldots, n$, where for each bin $i$ I have:

  • $X_i$: the number of successes
  • $N_i$: the total number of trials

I want to model the proportion of successes within each bin, and I am trying to decide between using a binomial or a Poisson generalized linear model (GLM).

Current Approach:

My understanding is that the observed proportion of successes, which is also the maximum likelihood estimate of the success probability for binomial data, is: \begin{align} p_i = \frac{X_i}{N_i} \end{align}

A common approach is to model $p_i$ using a binomial GLM: \begin{align} \text{logit}\left( E(p_i) \right) = \beta_0 + \beta_1 x_{1,i} + \dots + \beta_{p-1} x_{p-1,i} \end{align}

Alternatively, I could use a Poisson model with an offset to model the counts $X_i$ directly: \begin{align} \log\left(E(X_i)\right) = \beta_0 + \beta_1 x_{1,i} + \dots + \beta_{p-1} x_{p-1,i} + \log(N_i) \end{align}

Since we can rewrite this as: \begin{align} \log\left(E(X_i)\right) - \log(N_i) = \log\left(\frac{E(X_i)}{N_i}\right), \end{align} this effectively models the rate (or proportion) of successes relative to the total number of trials.
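To make the comparison concrete, here is a minimal sketch of what the two specifications imply for the modeled proportion. The coefficient values `beta0` and `beta1`, the covariate values, and `n_trials` are hypothetical numbers chosen for illustration; nothing here is estimated from data.

```python
import math

def binomial_glm_proportion(eta):
    """Inverse logit: maps the linear predictor to a proportion in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def poisson_offset_expected_count(eta, n_trials):
    """Log link with offset log(N_i): E(X_i) = N_i * exp(eta)."""
    return math.exp(eta + math.log(n_trials))

# Hypothetical coefficients and bin size, for illustration only.
beta0, beta1 = -1.0, 0.8
n_trials = 50

rows = []
for x in (0.0, 1.0, 2.0):
    eta = beta0 + beta1 * x
    p_binomial = binomial_glm_proportion(eta)
    # Dividing the expected count by N_i recovers the modeled rate exp(eta).
    rate_poisson = poisson_offset_expected_count(eta, n_trials) / n_trials
    rows.append((x, eta, p_binomial, rate_poisson))
    print(f"x={x:.1f}  binomial p={p_binomial:.3f}  Poisson rate={rate_poisson:.3f}")
```

Under the logit link the implied proportion always stays inside $(0, 1)$, whereas under the log link the implied rate $\exp(\eta)$ has no upper bound and can exceed 1 for a large enough linear predictor. The two agree closely only when the linear predictor is strongly negative, that is, when proportions are small.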

Question:

Given that my primary interest is in modeling the occurrence of successes relative to the total number of trials within each bin (i.e., the proportion of successes $\frac{X_i}{N_i}$ in each space/time bin $i$), which model should I favor: the binomial GLM or the Poisson GLM with an offset?

Specifically:

  • Are there situations where one model would be more appropriate than the other?
  • Does the choice depend on properties of my data (e.g., the distribution of $N_i$ across bins or the variability in proportions)?
$\endgroup$
  • $\begingroup$ Was there some reason why you binned the data (presumably in bins of time)? It's possible that modeling both the rates of trials and their corresponding probabilities of success over continuous time would provide more information. $\endgroup$ Commented Nov 14, 2024 at 20:38
  • $\begingroup$ I binned the data in spatial bins. I have some gliders in the ocean recording data, and I'm interested in the proportion of anomalous gliders out of the total number of gliders in a given spatial area over a specific season. $\endgroup$ Commented Nov 14, 2024 at 21:48
  • $\begingroup$ I edited to change some $\LaTeX$ constructs not recognized by MathJax. $\endgroup$ Commented Nov 14, 2024 at 23:23

1 Answer

$\begingroup$

You are modeling the ratio of successes to total trials, presumably with something more than a minimal success rate. That argues for a binomial model, with logistic regression a common choice. If the probability of success is very low and you have a large number of trials, the binomial distribution has a Poisson limit, but why go to that limit if the binomial model is more appropriate, simple enough, and easy to interpret?
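The Poisson limit mentioned above can be checked numerically. The following is a rough sketch using only the standard library; the two settings (`n=1000, p=0.002` and `n=20, p=0.4`) are hypothetical values picked to contrast a rare-event case with a non-negligible success rate, and the helper `max_pmf_gap` is introduced here purely for illustration.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def max_pmf_gap(n, p, k_max=20):
    """Largest absolute pmf difference over k = 0..min(k_max, n),
    comparing Binomial(n, p) with Poisson(n * p)."""
    lam = n * p
    return max(
        abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam))
        for k in range(min(k_max, n) + 1)
    )

# Hypothetical settings, for illustration only.
gap_rare = max_pmf_gap(n=1000, p=0.002)   # rare successes, many trials
gap_common = max_pmf_gap(n=20, p=0.4)     # sizeable success rate, few trials

print(f"rare-event gap:   {gap_rare:.5f}")
print(f"common-event gap: {gap_common:.5f}")
```

In the rare-event setting the two probability mass functions are nearly indistinguishable, while with a sizeable success rate the Poisson approximation is visibly worse, partly because it ignores the upper bound $N_i$ on the count and overstates the variance.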

Typically, a Poisson model with an offset is used when you are trying to evaluate a rate over continuous time or space. For example, if you were modeling something like the distribution of gliders over your binned areas, a Poisson model with a log(area) offset would make sense. See Gravity's Rainbow, for example.

$\endgroup$
