A worker for the Department of Fish and Game is assigned the job of estimating the number of trout in a certain lake of modest size. She proceeds as follows: She catches 100 trout, tags each of them, and puts them back in the lake. One month later, she catches 100 more trout, and notes that 10 of them have tags.

(a) Without doing any fancy calculations, give a rough estimate of the number of trout in the lake.

Without doing any fancy calculations, we can make a rough estimate from the proportion of tagged trout in the second catch. Ten of the 100 trout in the second catch have tags. If this proportion is representative of the entire lake, then about 10% of the lake's trout are tagged. Since exactly 100 trout were tagged, a rough estimate of the total number of trout in the lake is \(\frac{100}{0.10} = 1000\).
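This back-of-the-envelope reasoning is the classical Lincoln-Petersen capture-recapture estimate, \(N \approx K n / k\), where \(K\) is the number of tagged trout, \(n\) the size of the second catch and \(k\) the number of tagged trout in it. A minimal R sketch of the arithmetic (the variable names are my own choice):

```r
K <- 100  # trout tagged and released after the first catch
n <- 100  # trout in the second catch
k <- 10   # tagged trout found in the second catch

# The tagged fraction of the second catch (k / n) is taken as an estimate of
# the tagged fraction of the lake (K / N); solving for N gives K * n / k.
N_rough <- K * n / k
print(N_rough)
## [1] 1000
```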

(b) Let N be the number of trout in the lake. Find an expression, in terms of N, for the probability that the worker would catch 10 tagged trout out of the 100 trout that she caught the second time.

The probability of catching exactly 10 tagged trout out of the 100 trout caught the second time can be modeled using the hypergeometric distribution. The probability mass function (PMF) of the hypergeometric distribution is given by:

\[P(X = k) = \frac{\binom{K}{k} \binom{N - K}{n - k}}{\binom{N}{n}}\]

where \(N\) is the total number of trout in the lake. In this case, \(K = 100\) (trout tagged after the first catch), \(n = 100\) (trout caught in the second sample), and \(k = 10\) (tagged trout in the second sample). Therefore, the expression for the probability is:

\[P(X = 10) = \frac{\binom{100}{10} \binom{N - 100}{90}}{\binom{N}{100}}\]
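As a sanity check, the expression above can be coded directly and compared with R's built-in `dhyper()`. This sketch works on the log scale with `lchoose()`, because \(\binom{N}{100}\) becomes astronomically large for \(N\) in the thousands. The helper name `trout_likelihood` is hypothetical, chosen here for illustration:

```r
# Hypothetical helper: the likelihood from part (b), computed on the log scale
# with lchoose() to avoid overflow, then exponentiated. Vectorized over N.
trout_likelihood <- function(N, K = 100, n = 100, k = 10) {
  exp(lchoose(K, k) + lchoose(N - K, n - k) - lchoose(N, n))
}

# dhyper(x, m, n, k) arguments: x = tagged trout drawn, m = tagged trout in
# the lake, n = untagged trout in the lake, k = number of trout drawn
for (N in c(500, 1000, 2000)) {
  stopifnot(isTRUE(all.equal(trout_likelihood(N), dhyper(10, 100, N - 100, 100))))
}
```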

(c) Find the value of N which maximizes the expression in part (b). This value is called the maximum likelihood estimate for the unknown quantity N.

Maximizing the above hypergeometric likelihood numerically over a range of N with R's dhyper() function gives 999. This is very close to the rough estimate of 1000, but I do not understand why it did not return exactly 1000.

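```r
K <- 100  # initially tagged trout
n <- 100  # trout caught in the second sample
k <- 10   # trout with tags in the second sample

# N must be at least 190: the second sample contained 90 untagged trout,
# all distinct from the 100 tagged ones
N_values <- 190:5000

# dhyper(x, m, n, k): x = tagged drawn, m = tagged in lake,
# n = untagged in lake (N - K), k = number drawn; dhyper() is vectorized
likelihoods <- dhyper(k, K, N_values - K, n)

# which.max() returns the index of the first maximum
N_max_likelihood <- N_values[which.max(likelihoods)]

print(N_max_likelihood)
## [1] 999
```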
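The reason 999 is reported instead of 1000 is that the likelihood has two maximizers. Writing \(L(N)\) for the expression in part (b), consecutive values satisfy \(\frac{L(N)}{L(N-1)} = \frac{(N-K)(N-n)}{N(N-K-n+k)}\), which here is \(\frac{(N-100)^2}{N(N-190)}\). This ratio is at least 1 exactly when \(N \le Kn/k = 1000\), and it equals 1 at \(N = 1000\), so \(L(999) = L(1000)\) and both are maximum likelihood estimates. Which of the two `which.max()` reports depends on its tie rule (it returns the first maximum) and on floating-point rounding in `dhyper()`. A short R sketch checking this (`lik_ratio` is a hypothetical helper name):

```r
K <- 100; n <- 100; k <- 10

# Ratio of consecutive likelihoods, L(N) / L(N - 1), derived from the
# hypergeometric pmf
lik_ratio <- function(N) ((N - K) * (N - n)) / (N * (N - K - n + k))

lik_ratio(c(999, 1000, 1001))  # above 1, exactly 1, below 1

# L(999) and L(1000) agree up to floating-point rounding
all.equal(dhyper(k, K, 999 - K, n), dhyper(k, K, 1000 - K, n))

# Closed form: the likelihood is maximized at floor(K * n / k); when K * n / k
# is an integer, K * n / k - 1 ties with it
floor(K * n / k)
```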