Background

In this example, we will examine data from a “Mark-Recapture” study. The goal of this analysis is to estimate the size of a population of animals by first capturing a sample of the population, marking and releasing those animals, and then capturing a second sample at a later date. In that second sample, the researcher hopes to recover marked individuals. The numbers of marked individuals, recaptured individuals, and marked individuals among the recaptured are then used to estimate the total population size.

Scenario

You live in rural Uruguay and have recently inherited a large area of old agricultural land with a population of cattle. You are interested in estimating the size of the cattle population but realize that capturing all of them across your land will be difficult. Because of this, you decide to determine the maximum likelihood estimate (MLE) of the cattle population size using a mark-recapture study.

You and a friend set out one morning and capture 10 cattle. Each individual is marked with a small patch of harmless, biodegradable paint. The cattle are released after being marked.

A few days later, you and your friend again capture a sample of cattle, but this time you gather 25 individuals and notice that 3 of them have the paint that you used earlier. What can this tell you about the cattle population size?

Procedure

To estimate the total number of animals in the population, N, we need to know the number of animals that were captured and marked/tagged (10), the number of animals that were recaptured (25), and the number of recaptured animals that were previously marked (3).

We assume that the population of animals did not change during the study and that each time an animal was captured, it was equally likely to be any of the uncaptured animals in the total population.

The distribution representing the number of marked animals (X) that were recaptured is given as:

\(P(X=i) = \frac{{N-m \choose n-i}{m \choose i}}{{N \choose n}}\)

where \(N\) represents the total population size, \(m\) represents the number of marked animals, \(n\) represents the number of recaptured animals, and \(i\) represents the number of marked animals in the sample of recaptured animals.

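Substituting our values (\(m = 10\), \(n = 25\), \(i = 3\)), the probability of our observed result, viewed as a function of the unknown \(N\), is therefore:

\(P(X=3) = \frac{{N-10 \choose 25-3}{10 \choose 3}}{{N \choose 25}}\)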

#NOTES
#=====
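#n (the function argument below) = candidate size of the entire population
#(this is N in the formula above, not the number of recaptured animals)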

#VARIABLES
#First day (marking)
tagged = 10 #number of animals that were captured, tagged, and released

#Second day (recapture)
recaptured = 25 #number of recaptured animals
recapturedTagged = 3
recapturedNotTagged = recaptured-recapturedTagged #(22)

#ANALYSIS
#define a sequence of candidate population sizes from 0 to 100 by 1 (0,1,2,...,100)
input = seq(0,100,1)

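#create a function to calculate the likelihood of the observed recapture results for a candidate population size n (the MLE is the n that maximizes this function)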
mleFunction<-function(n)
{
  (choose(n-tagged,recapturedNotTagged) * choose(tagged,recapturedTagged)) / choose(n,recaptured)  
}

#apply the function to input and store output 
output <- mleFunction(input)

#store input and output in a data frame
df<-data.frame(input,output)

#remove rows with NA, NaN, or Inf values (these arise when the candidate population is smaller than the recaptured sample)
df <- df[is.finite(rowSums(df)),]

#print the maximum value of the output (this also shows the value of input at that point). Note: the input that maximizes the output is the MLE and our estimate of population size.
df[which.max(df$output), ]
#create a line plot to show the function 
plot(input,output, type="l", ylim=c(0,0.3), ylab="P (Observed Recapture Results)", xlab="Population Size Estimate")

#add a horizontal line at the maximum output value and a vertical line at the input value that produces it
abline(h=df[which.max(df$output), ]$output, col="blue")
abline(v=df[which.max(df$output), ]$input, col="red")
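
The likelihood used above has the form of a hypergeometric probability, so it can also be evaluated with R's built-in dhyper function. The following is a minimal sketch offered as a cross-check, not as part of the original analysis. The variable names (m, k, x, candidates, likelihood, best) and the upper search limit of 200 are assumptions chosen for illustration.

```r
# Sketch: evaluate the same likelihood with R's built-in hypergeometric pmf.
# All names here are illustrative and independent of the code above.
m <- 10   # animals marked on the first day
k <- 25   # animals caught on the second day
x <- 3    # marked animals among the second-day catch

# The population must hold at least the m marked animals plus the
# (k - x) unmarked animals seen on the second day.
# The upper limit of 200 is an arbitrary assumption for the search range.
candidates <- seq(m + k - x, 200)

# dhyper(x, m, n, k): x marked animals drawn, m marked in the population,
# n unmarked in the population, k animals drawn in total
likelihood <- dhyper(x, m, candidates - m, k)

# candidate population size with the highest likelihood
best <- candidates[which.max(likelihood)]
best
```

Restricting the candidates to feasible population sizes avoids the non-finite values that otherwise have to be filtered out. With the study's numbers, this search should land on 83, the same peak found by the hand-written function.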

Let’s now look at some summary information. We can first examine the MLE, which is our estimate of the unknown total size of the cattle population. This is the input value for our function that maximizes the output probability. In other words, the MLE is the population size under which our observed recapture results are most probable. We can find it in two ways: by calculating it directly from our data as (tagged × recaptured) / (recaptured and tagged), or by reading off the whole number at the peak of our function (note: animals occur as countable integers, so the whole-number value is the one we report).
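
A short derivation suggests why the direct calculation and the peak of the function agree. Here \(L(N)\) is shorthand, introduced only for this sketch, for the probability of the observed result when the population size is \(N\); the other symbols are those defined in the Procedure section. Comparing consecutive population sizes gives:

\[
\frac{L(N)}{L(N-1)} = \frac{{N-m \choose n-i}}{{N-1-m \choose n-i}} \cdot \frac{{N-1 \choose n}}{{N \choose n}} = \frac{(N-m)(N-n)}{N(N-m-n+i)}
\]

This ratio is at least 1 exactly when \((N-m)(N-n) \ge N(N-m-n+i)\), which simplifies to \(N \le \frac{mn}{i}\). The likelihood therefore rises while \(N \le \frac{mn}{i}\) and falls afterwards, so the whole-number maximum is the largest integer not exceeding \(\frac{mn}{i}\). With \(m = 10\), \(n = 25\), and \(i = 3\), \(\frac{mn}{i} = \frac{250}{3} \approx 83.33\), giving a whole-number estimate of 83.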

We can also examine the proportion of tagged individuals out of the estimated population size. How does this compare to the proportion of tagged individuals in the recaptured sample? What does this tell us? The two proportions are nearly equal (about 0.1205 versus 0.12), which tells us that the ratio of tagged animals to the whole population is roughly the same as the ratio of recovered tags to the recaptured sample size. That is, \(\frac{10_{tagged}}{83_{estimated}}\) is roughly equal to \(\frac{3_{recaptured \& tagged}}{25_{recaptured}}\).

#SUMMARY
#calculate MLE directly from data 
mleEstimate<-(tagged*recaptured)/recapturedTagged

#find the input value (population size) that maximizes the output (probability)
mleInteger <-df[which.max(df$output), ]$input 

#calculate proportion of tagged out of the whole-number estimate of population size
proportion <- tagged/mleInteger

#calculate the proportion of tagged recaptured individuals out of the whole sample of recaptured individuals
proportion2<-recapturedTagged/recaptured

cat("MLE (calculated) = ", mleEstimate, "\n")
## MLE (calculated) =  83.33333
cat("MLE (integer) = ", mleInteger, "\n")
## MLE (integer) =  83
cat("Tagged / MLE (integer) = ", proportion, "\n")
## Tagged / MLE (integer) =  0.1204819
cat("Proportion of tagged animals in the recaptured sample = ", proportion2, "\n")
## Proportion of tagged animals in the recaptured sample =  0.12
#NOTE: the proportion of tagged out of the estimated whole population approximately matches the proportion of tagged in the recaptured sample.