
I have a process which succeeds with known probability $p$, and fails with probability $q = 1-p$. The process is repeated a finite, but unknown, number of times $n$.

Given a number of successes $r$ (the number of failures, $n-r$, is unknown), how should we determine a confidence interval for the number of trials that led to the $r$ successes?

To give a simple example: a fair coin was flipped a total of $n$ times and landed heads up 150 times. How is $n$ distributed, and what is a 95% CI around its mean that covers the most likely values of $n$? The most likely $n$ is 300, the smallest possible $n$ is 150, and $n$ has no upper bound, but how should we think about everything in between?
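
As a quick check of the claim about the most likely $n$ (a worked sketch, treating the probability of observing $r$ successes as a likelihood in $n$ with $p$ fixed):

$$L(n) = \binom{n}{r} p^{r} q^{\,n-r}, \qquad \frac{L(n+1)}{L(n)} = \frac{(n+1)\,q}{n+1-r},$$

so $L$ increases from $n$ to $n+1$ while $(n+1)p < r$ and decreases once $(n+1)p > r$. With $r = 150$ and $p = 1/2$ the ratio at $n = 299$ is $300 \cdot \tfrac12 / 150 = 1$, so $L(299) = L(300)$: 299 and 300 are tied as the most likely values.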

Note: the interval doesn't have to be centered on the mean if the distribution is not symmetric.


I've been looking at the Wilson score interval and other binomial proportion confidence intervals. They are relevant, but they estimate $p$, not $n$. I'm wondering whether similar approaches exist for exactly what I'm looking for.

Lacking a closed-form estimator, I was otherwise going to find the ends of my confidence interval $(n_{low}, n_{high})$ using tables: fit the two Gaussians with means $n_{low}p$ and $n_{high}p$ for which $\hat{n}p$ (where $\hat{n} = r/p$ is my estimate of $n$) falls exactly on their 2.5% tail boundary.
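
The Gaussian idea above can also be solved directly instead of with tables. A minimal sketch, assuming the success count is approximated by $N(np, npq)$ with no continuity correction; the function name `normal_approx_n_interval` is hypothetical:

```python
import math
from statistics import NormalDist


def normal_approx_n_interval(r: int, p: float, level: float = 0.95) -> tuple[float, float]:
    """Approximate CI for the unknown number of trials n, given r successes.

    Treats the success count as N(n*p, n*p*q) and finds the two values of n
    for which the observed r sits exactly on the (1 - level)/2 tail boundary.
    No continuity correction is applied.
    """
    q = 1.0 - p
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)  # about 1.96 for 95%
    # With s = sqrt(n), the condition r = n*p -/+ z*sqrt(n*p*q) becomes the
    # quadratic p*s^2 -/+ b*s - r = 0 with b = z*sqrt(p*q); keep the positive root.
    b = z * math.sqrt(p * q)
    disc = math.sqrt(b * b + 4.0 * p * r)
    s_low = (-b + disc) / (2.0 * p)   # r in the upper tail of N(n_low*p, n_low*p*q)
    s_high = (b + disc) / (2.0 * p)   # r in the lower tail of N(n_high*p, n_high*p*q)
    return s_low ** 2, s_high ** 2


if __name__ == "__main__":
    lo, hi = normal_approx_n_interval(150, 0.5)
    print(f"approximate 95% interval for n: ({lo:.1f}, {hi:.1f})")
```

For $r = 150$ and $p = 0.5$ this gives roughly $(267.9, 335.9)$. Writing $s = \sqrt{n}$ turns each tail condition into a quadratic, which is why no iteration is needed.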

  • It could be worth taking a look at en.wikipedia.org/wiki/Negative_binomial_distribution. Commented Dec 12, 2024 at 18:18
  • Oh! This looks very promising. I will read on. Commented Dec 12, 2024 at 19:43
  • This question made me think of a derived question, stats.stackexchange.com/questions/658678/…, which asks for the equivalent of a Jeffreys interval for a binomial proportion. Commented Dec 13, 2024 at 16:10
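
One way to see the link suggested in the first comment, sketched under the assumption that we simply normalise the binomial likelihood $L(n) = \binom{n}{r} p^r q^{n-r}$ over $n$ (equivalent to a flat prior on $n$). Using the series $\sum_{k \ge 0} \binom{k+r}{k} q^k = (1-q)^{-(r+1)}$:

$$\sum_{n=r}^{\infty} \binom{n}{r} p^{r} q^{\,n-r} = \frac{p^{r}}{(1-q)^{r+1}} = \frac{1}{p}, \qquad \frac{L(n)}{\sum_{m \ge r} L(m)} = \binom{n}{r} p^{\,r+1} q^{\,n-r}, \quad n = r, r+1, \dots$$

This is the negative binomial pmf for the number of failures $n - r$ before the $(r+1)$-th success. A different prior on $n$ would give a different distribution.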

1 Answer


You can use the same approach as behind Clopper-Pearson confidence intervals, but with the number of trials $n$ as the free parameter instead of the probability of success $p$.

Below is a plot of $P(X \leq 150|p=0.5,n)$ as a function of $n$.

[Figure: illustration of the fiducial distribution]

A 95% confidence interval can be found by excluding the tails where $P(X \leq 150|p=0.5,n) > 0.975$ and where $P(X \leq 150|p=0.5,n) < 0.025$. For this case that gives the interval $(269,336)$.
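
The tail-exclusion rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code behind the plot: `binom_cdf` and `interval_for_n` are hypothetical helpers, $0 < p < 1$ is assumed, and the binomial CDF is summed exactly (in log space) rather than approximated:

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), assuming 0 < p < 1.

    Terms are built in log space with lgamma so that large n does not overflow.
    """
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for j in range(k + 1):
        log_pmf = (log_n_fact - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                   + j * log_p + (n - j) * log_q)
        total += math.exp(log_pmf)
    return total


def interval_for_n(r: int, p: float, level: float = 0.95) -> tuple[int, int]:
    """Keep every n with tail <= P(X <= r | n) <= 1 - tail, tail = (1 - level)/2.

    P(X <= r | n) can only shrink as n grows, so the scan starts at n = r and
    stops at the first n whose value drops below the lower cut-off.
    """
    tail = (1.0 - level) / 2.0
    kept = []
    n = r
    while True:
        cdf = binom_cdf(r, n, p)
        if cdf < tail:           # n is too large: right-hand tail reached
            break
        if cdf <= 1.0 - tail:    # cdf > 1 - tail would mean n is still too small
            kept.append(n)
        n += 1
    if not kept:
        raise ValueError("no value of n survives at this level")
    return kept[0], kept[-1]


if __name__ == "__main__":
    print(interval_for_n(150, 0.5))
```

The answer reports $(269, 336)$ for $r = 150$, $p = 0.5$; the sketch scans $n$ upward from $r$ and relies on $P(X \leq r|n)$ being non-increasing in $n$.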

More about this logic:

Note/edit:

In the above reasoning I am using the fiducial distribution, but it is tricky to use for discrete distributions.

I applied it a bit too loosely. More precisely, you want to exclude the values of $n$ where $P(X \geq 150|n) \leq 0.025$ and those where $P(X \leq 150|n) \leq 0.025$.

For continuous distributions this makes no difference, but for discrete distributions it does, because $P(X \leq 150|n) \neq 1-P(X \geq 150|n)$: both tails include the term $P(X = 150|n)$.

With this correction the interval is, more precisely, $(268,336)$.

  • There are of course many other approaches, just as there are many approaches for finding an interval for a binomial proportion. The simplest is to compute the maximum likelihood estimate and its standard error based on the estimated parameter value (Wald interval). The equivalent of a Wilson score interval would be trickier, since you'd need to express the standard deviation at each hypothetical $n$, but it wouldn't be impossible. Commented Dec 13, 2024 at 15:48
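
The Wald interval mentioned in the comment can be sketched as follows, assuming the point estimate $\hat{n} = r/p$ from the question and the plug-in variance $\operatorname{Var}(\hat{n}) = nq/p$ (from $\operatorname{Var}(r) = npq$) evaluated at $\hat{n}$; `wald_interval_n` is a hypothetical name:

```python
import math
from statistics import NormalDist


def wald_interval_n(r: int, p: float, level: float = 0.95) -> tuple[float, float]:
    """Wald-type interval for the number of trials n given r successes.

    n_hat = r / p; since Var(r) = n*p*q, Var(n_hat) = n*q/p, and the
    standard error plugs n_hat in for the unknown n.
    """
    q = 1.0 - p
    n_hat = r / p
    se = math.sqrt(n_hat * q / p)
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)  # about 1.96 for 95%
    return n_hat - z * se, n_hat + z * se


if __name__ == "__main__":
    lo, hi = wald_interval_n(150, 0.5)
    print(f"Wald 95% interval for n: ({lo:.1f}, {hi:.1f})")
```

For $r = 150$ and $p = 0.5$ this gives about $(266.1, 333.9)$, symmetric around 300 by construction, unlike the exact intervals in the answer.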