$\begingroup$

I am a math student working with a group of field biologists. In multiple mark-recapture experiments on the same population, they claim that if the number of observations (recaptures) is large enough, it is possible to infer both the population size ($N$) and the marked proportion ($p=K/N$) from the sample alone, without prior knowledge of the number of individuals that were originally marked ($K$).

A sample of size $n$ containing $k$ marked individuals is drawn from the population, and a histogram of the observed proportions is built from jackknife subsamples of it. A hypergeometric distribution is then fitted to the histogram by least squares, which yields a pair of estimates $\hat{p}$ and $\hat{N}$.

I am convinced that, with $K$ unknown, this estimator converges to the sample size $n$, i.e., $\hat{N} \rightarrow n$ (or at least to a value different from $N$). However, I have not proved it yet.

They tested this estimator on simulated data. My conclusion is that in the simulations the standard deviation of $\hat{N}$ is large enough that $\hat{N}$ sometimes coincides with $N$.

I am creating this question to verify whether I am correct. Also, I am not very good at explaining myself, and I need an argument that will convince non-specialists that $\hat{N} \rightarrow n$. Thank you all for your help.

$\endgroup$
  • $\begingroup$ +1 Nice clear explanation. Do you know what estimators they're using to estimate N and p? $\endgroup$ Commented Jun 14, 2024 at 5:23
  • $\begingroup$ They create a histogram of the observed proportions (they make many observations) and fit a hypergeometric distribution to it by least squares. Thanks Glen_b. $\endgroup$ Commented Jun 14, 2024 at 5:27
  • $\begingroup$ Better said, they made a single (large) observation of size $n$ and they subsample this vector (bootstrap or jackknife, I am not entirely sure which) to create the histogram. $\endgroup$ Commented Jun 14, 2024 at 5:32
  • $\begingroup$ Do you at least know if you've recaptured the same specimen, i.e. is the mark unique? If not you might be able to estimate $p$ but I don't see how that tells you anything about $N$. $\endgroup$ Commented Jun 14, 2024 at 6:09
  • $\begingroup$ @Glen_b The population sizes are $N$ (total, unknown) and $K$ (marked, unknown), and the sample sizes are $n$ (total sampled, known) and $k$ (marked, known). $\endgroup$ Commented Jun 14, 2024 at 7:33

1 Answer

$\begingroup$

From first principles it is clear that a single observed hypergeometric sample of size $n$, of which $k$ are marked, is informative about little more than $p=K/N$. Both $N$ and $K$ are nearly unidentifiable, which can be seen from the fact that the likelihood tends towards a flat ridge along the line $K/N=k/n$ as $N$ and $K$ become large (this follows from the binomial distribution being the limit of the hypergeometric distribution). The likelihood ridge is illustrated below for the sample $k=5$ and $n=10$.
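For reference, the binomial limit invoked above can be written out explicitly. With $n$ and $k$ fixed, and $N, K \to \infty$ such that $K/N \to p$,

$$\frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}} \;\longrightarrow\; \binom{n}{k}\, p^{k} (1-p)^{n-k}.$$

The right-hand side depends on $N$ and $K$ only through the ratio $p$, so for large $N$ the likelihood is essentially constant along each line $K/N = p$, and it is largest on the line $p = k/n$. This is the flat ridge: the data pin down the ratio but not $N$ or $K$ separately.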

The subsampling method cannot produce any information about $N$ beyond what the observed likelihood above already provides. Conditional on $n$ and $k$ (that is, conditional on the original sample), subsamples of size $m$ (assuming that these are also drawn without replacement from the original sample) are indeed independently hypergeometrically distributed, but with parameters $n,k,m$ rather than $N,K,m$. So when the field biologists estimate $N$ by fitting a hypergeometric distribution to these subsamples, they are in reality estimating $n$, not $N$. It is also true that the subsamples are hypergeometrically distributed with parameters $N,K,m$, but only marginally (that is, when not conditioning on $n$ and $k$). Marginally, however, the subsamples are not independent, which makes fitting a hypergeometric distribution to them invalid when the aim is to estimate $N$.
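The conditional argument above can be checked with a small simulation. This is only a sketch under stated assumptions: the values of `N`, `K`, `n`, `m` and `B` are hypothetical and chosen for illustration, the helper `hyper_var` is introduced here, and the subsamples are assumed to be drawn without replacement from the single observed sample. It compares the variance of the subsample counts with the hypergeometric variance implied by the parameters $n,k,m$ and by $N,K,m$.

```r
set.seed(1)

# Hypothetical values, chosen only for illustration
N <- 1000   # population size (unknown in practice)
K <- 300    # marked individuals in the population (unknown in practice)
n <- 50     # size of the single observed sample
m <- 25     # size of each subsample
B <- 20000  # number of subsamples

# One observed sample: k marked individuals among the n sampled
k <- rhyper(1, K, N - K, n)
sample_marks <- c(rep(1, k), rep(0, n - k))

# Subsample without replacement from the observed sample only,
# counting the marked individuals in each subsample
sub_counts <- replicate(B, sum(sample(sample_marks, m)))

# Variance of a hypergeometric count: m draws from a pool of `size`
# individuals, `marked` of which are marked
hyper_var <- function(marked, size, m) {
  p <- marked / size
  m * p * (1 - p) * (size - m) / (size - 1)
}

emp_var  <- var(sub_counts)
var_cond <- hyper_var(k, n, m)  # parameters n, k, m (conditional on the sample)
var_pop  <- hyper_var(K, N, m)  # parameters N, K, m (what the fit hopes to recover)

print(c(empirical = emp_var, from_sample = var_cond, from_population = var_pop))
```

The spread of the subsample counts is governed by the finite-population correction $(n-m)/(n-1)$ of the observed sample, not by $(N-m)/(N-1)$, so a hypergeometric fit to these counts has nothing to pull it towards $N$.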

    N <- 1:100
    K <- 1:100
    n <- 10
    k <- 5
    contour(N, K,
            outer(N, K, function(N, K) dhyper(k, K, N - K, n)),
            xlab = "N", ylab = "K", nlevels = 50)
    #> Warning in dhyper(k, K, N - K, n): NaNs produced

Created on 2024-06-14 with reprex v2.1.0

$\endgroup$
  • $\begingroup$ Sorry for my late response, I was waiting for further comments. Your answer gave me valuable insights. Thank you Jarle! $\endgroup$ Commented Jul 9, 2024 at 6:17
