A-level Capture and Recapture Problem

Author

Dr Andrew Dalby

Question

There are \(n_{0}\) fish in a lake. A random sample of m of these fish is taken. The fish in this sample are tagged and released unharmed back into the lake. After a suitable interval, a second random sample of size n is taken. The random variable R is the number of fish in this second sample that are found to have been tagged. Assuming that the probability that a fish is captured is independent of whether it has been tagged or not, and that \(n_{0}\) is sufficiently large for a binomial approximation to be used, obtain the expectation of R in terms of m, n and \(n_{0}\). Suppose that m=100, n=4000 and the observed value of R is 20. Obtain an approximate symmetric 98% confidence interval for the proportion of fish in the lake which are tagged. Deduce an approximate 98% confidence interval for \(n_{0}\).

Solution

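Firstly you need to write the formula connecting R to m, n and \(n_{0}\). Under the binomial approximation, each fish in the second sample is tagged with probability \(\dfrac{m}{n_{0}}\), so \(R \sim B\left(n, \dfrac{m}{n_{0}}\right)\) and

\(E(R)=\dfrac{m \times n}{n_{0}}\)

Replacing \(E(R)\) by the observed value of R shows that the sample proportion \(\dfrac{R}{n}\) estimates the proportion of tagged fish in the lake:

\(\dfrac{R}{n} \approx \dfrac{m}{n_{0}}\)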

The sample proportion is

\(\dfrac{R}{n} = \dfrac{20}{4000} = 0.005\)

The critical value for the 98% confidence interval is 2.326, which means that the formula for the confidence interval for the population proportion p is

\(p_{s}-2.326 \sqrt{\dfrac{p_{s}(1-p_{s})}{n}}< p <p_{s}+2.326 \sqrt{\dfrac{p_{s}(1-p_{s})}{n}}\)

where \(p_{s}=0.005\) and \(n=4000\)

# standard error of the sample proportion
p_s <- 0.005
se <- sqrt(p_s * (1 - p_s) / 4000)
# margin of error at the 98% level
margin <- 2.326 * se
lower <- p_s - margin
upper <- p_s + margin
lower
[1] 0.002405962
upper
[1] 0.007594038

So that

\(0.00241<p< 0.00759\)
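
As a check, the critical value and the interval can be reproduced in one step using R's built-in normal quantile function `qnorm`. This is only a sketch: the helper name `wald_ci` and its arguments are hypothetical and not part of the original solution, and it uses the unrounded critical value rather than 2.326, so the bounds agree with those above only to about three significant figures.

```r
# Hypothetical helper: approximate symmetric confidence interval
# for a proportion, using the normal approximation to the binomial.
wald_ci <- function(r, n, level = 0.98) {
  p_s <- r / n                         # sample proportion
  z <- qnorm(1 - (1 - level) / 2)      # two-sided critical value
  se <- sqrt(p_s * (1 - p_s) / n)      # standard error of p_s
  c(lower = p_s - z * se, upper = p_s + z * se)
}

wald_ci(20, 4000)
```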

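The population proportion p multiplied by \(n_{0}\) gives m, so \(n_{0}=\dfrac{m}{p}\). Because \(n_{0}\) decreases as p increases, the lower bound for p gives the upper bound for \(n_{0}\) and vice versa.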

# n0 = m / p, so the lower bound for p gives the upper bound for n0
n0_upper <- 100/lower
n0_lower <- 100/upper
n0_upper
[1] 41563.41
n0_lower
[1] 13168.23

This gives confidence bounds for \(n_{0}\) of approximately 13168 to 41563.

This is important because inverting the confidence interval for the proportion gives a confidence interval for \(n_{0}\) which is not centred on 20,000, the point estimate that you get by using the sample proportion as the estimate of the population proportion (\(100/0.005 = 20000\)).

Capture and recapture experiments are a lot more complex than the versions given at GCSE. Even in a simulation with a known value for \(n_{0}\) and a specified probability of capture, a single sample can give an estimate that is very different from the true value.
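
A minimal simulation sketch of this point is given below. It is not the author's simulation. It assumes a true population of 20,000 (an illustrative value), keeps the second sample size fixed at n = 4000 and draws R from the binomial approximation used in the question, rather than modelling a per-fish capture probability. The seed and the number of repetitions are arbitrary.

```r
# Illustrative assumptions: true n0, seed and number of repetitions
set.seed(1)
n0_true <- 20000   # assumed true number of fish in the lake
m <- 100           # tagged fish
n <- 4000          # size of the second sample
reps <- 1000       # number of simulated second samples

# Binomial approximation: R ~ B(n, m / n0)
R_sim <- rbinom(reps, size = n, prob = m / n0_true)

# Point estimate of n0 from each simulated sample
# (a sample with R = 0 would give Inf)
n0_hat <- m * n / R_sim

# Spread of the single-sample estimates around the true value
quantile(n0_hat, probs = c(0.01, 0.5, 0.99))
```

The quantiles show how far an estimate from one second sample can fall from the assumed true value, even though every sample was generated from the same population.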