Without doing any fancy calculations, we can make a rough estimate from the proportion of tagged trout in the second catch. If 10 out of 100 trout in the second catch have tags, and we assume this proportion is representative of the entire lake, then about 10% of the lake’s trout have tags. Since 100 trout were tagged in total, a rough estimate of the total number of trout in the lake is \(\frac{100}{0.10} = 1000\).
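The same back-of-the-envelope arithmetic can be written as a short R sketch. The variable names `K`, `n` and `k` follow the notation used in the rest of this answer, and `N_rough` is just an illustrative name. This proportion-based estimate is commonly known as the Lincoln-Petersen estimator.

```r
K <- 100  # trout tagged in the first catch
n <- 100  # trout caught in the second sample
k <- 10   # tagged trout in the second sample

# If k / n of the lake is tagged and K trout carry tags, then N is about K / (k / n)
N_rough <- K * n / k
print(N_rough)
```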
The probability of catching exactly 10 tagged trout out of the 100 trout caught the second time can be modeled using the hypergeometric distribution. The probability mass function (PMF) of the hypergeometric distribution is given by:
\[P(X = k) = \frac{\binom{K}{k} \binom{N - K}{n - k}}{\binom{N}{n}}\]
Here \(N\) is the unknown total number of trout in the lake, \(K = 100\) is the number of trout tagged initially, \(n = 100\) is the number of trout caught in the second sample, and \(k = 10\) is the number of tagged trout in the second sample. The expression for the probability is therefore:
\[P(X = 10) = \frac{\binom{100}{10} \binom{N - 100}{90}}{\binom{N}{100}}\]
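As a sanity check on the argument order passed to dhyper(), the formula above can be evaluated by hand and compared with R's built-in density. This is only a sketch: `N <- 1000` is an arbitrary trial value chosen for illustration, and `p_formula` and `p_dhyper` are illustrative names.

```r
K <- 100   # initially tagged trout
n <- 100   # trout caught in the second sample
k <- 10    # trout with tags in the second sample
N <- 1000  # trial population size (an assumption, for illustration only)

# PMF written out with log-binomial coefficients to avoid very large intermediates
p_formula <- exp(lchoose(K, k) + lchoose(N - K, n - k) - lchoose(N, n))

# dhyper(x, m, n, k): x tagged drawn, m tagged in lake, n untagged in lake, k drawn
p_dhyper <- dhyper(k, K, N - K, n)

# The two values are expected to agree within numerical tolerance
print(all.equal(p_formula, p_dhyper))
```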
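Maximizing this hypergeometric likelihood over \(N\) with R’s dhyper() function gives 999, which is very close to the expected 1000. It is not exactly 1000 because the likelihood takes the same value at \(N = 999\) and \(N = 1000\): the ratio \(P(X = 10 \mid N = 1000) / P(X = 10 \mid N = 999)\) simplifies to \(\frac{900 \cdot 900}{1000 \cdot 810} = 1\). Both values maximize the likelihood, and which.max() reports the first maximum it encounters, which here is 999.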
K <- 100 # initially tagged trout
n <- 100 # trout caught in the second sample
k <- 10 # trout with tags in the second sample
N_values <- 190:5000 # N must be at least 190: the 100 tagged trout plus the 90 untagged trout in the second sample
likelihoods <- sapply(N_values, function(N) dhyper(k, K, N-K, n))
N_max_likelihood <- N_values[which.max(likelihoods)]
print(N_max_likelihood)
## [1] 999
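The gap between 999 and 1000 can be examined through the ratio of consecutive likelihoods. Cancelling the binomial coefficients in \(P(X = k \mid N) / P(X = k \mid N - 1)\) gives \(\frac{(N - K)(N - n)}{N (N - K - n + k)}\), which is at least 1 exactly when \(N \le Kn/k\). With \(Kn/k = 1000\) an integer, the ratio at \(N = 1000\) equals 1, so 999 and 1000 share the maximum. The sketch below checks this; `lik_ratio` is a hypothetical helper name introduced here, not part of the original analysis.

```r
K <- 100  # initially tagged trout
n <- 100  # trout caught in the second sample
k <- 10   # trout with tags in the second sample

# Exact ratio L(N) / L(N - 1) after cancelling the binomial coefficients
lik_ratio <- function(N) ((N - K) * (N - n)) / (N * (N - K - n + k))

print(lik_ratio(999))   # above 1: the likelihood is still rising
print(lik_ratio(1000))  # equal to 1: L(1000) matches L(999)
print(lik_ratio(1001))  # below 1: the likelihood is now falling

# The dhyper() values at 999 and 1000 should agree within numerical tolerance
print(all.equal(dhyper(k, K, 999 - K, n), dhyper(k, K, 1000 - K, n)))
```

Because which.max() returns the index of the first maximum, a grid search over increasing \(N\) reports 999 when the two likelihoods tie.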