3.8 key

Walleye question

In a small pond, there are 70 bass and 40 walleye. A fisherman catches 20 fish from the pond to take home for dinner. Let \(Y\) represent the number of walleye in his catch.

Write a simulation study that generates 10,000 realizations of \(Y\) without using rhyper(). Verify that the simulated values have the correct hypergeometric distribution in two ways: by overlaying the empirical CDF on the appropriate hypergeometric CDF, both evaluated at the simulated data values, and by drawing a p-p plot of the analytic versus the simulated cumulative probabilities. Also use your simulation study to verify \(E(Y)\) and \(Var(Y)\).

Code for producing the simulation study:

library(tidyverse)
library(purrrfect)

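# Pond population: 70 bass followed by 40 walleye
pond <- rep(c('bass','walleye'), c(70,40))

# Each trial's 20-fish catch (drawn without replacement) is stored in the list column `catch`.
# Y counts the walleye in each catch.
# phyper(q, m, n, k): m = 40 walleye, n = 70 bass, k = 20 fish drawn.
(many_catches <- replicate(10000, sample(pond, 20, replace = FALSE), .as = catch)
                %>% mutate(Y = map_dbl(catch,\(x) sum(x=='walleye'))) 
                %>% mutate(empirical_cdf = cume_dist(Y),
                           analytic_cdf = phyper(Y, 40, 70, 20))
) %>% head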
# A tibble: 6 × 5
  .trial catch          Y empirical_cdf analytic_cdf
   <dbl> <list>     <dbl>         <dbl>        <dbl>
1      1 <chr [20]>     9         0.877        0.873
2      2 <chr [20]>     9         0.877        0.873
3      3 <chr [20]>     5         0.183        0.182
4      4 <chr [20]>     8         0.740        0.738
5      5 <chr [20]>     8         0.740        0.738
6      6 <chr [20]>     8         0.740        0.738
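
For readers who do not have purrrfect installed, the same simulation can be sketched in base R. This is only a sketch under stated assumptions, not the key's method: the seed value and the names `n_sims` and `Y_sim` are introduced here for illustration, and `base::` is spelled out in case a loaded package masks `replicate()`.

```r
# Base R sketch of the same simulation (no purrrfect, no rhyper()).
# The seed and the names n_sims and Y_sim are illustrative assumptions.
set.seed(380)

pond   <- rep(c("bass", "walleye"), times = c(70, 40))
n_sims <- 10000

# Each replicate draws 20 fish without replacement and counts the walleye
Y_sim <- base::replicate(n_sims, sum(sample(pond, 20, replace = FALSE) == "walleye"))

# Empirical CDF at each simulated value, next to the hypergeometric CDF
empirical_cdf <- ecdf(Y_sim)(Y_sim)
analytic_cdf  <- phyper(Y_sim, m = 40, n = 70, k = 20)

head(data.frame(Y = Y_sim, empirical_cdf, analytic_cdf))
```

Here `ecdf(Y_sim)(Y_sim)` plays the role of `cume_dist(Y)`: both return the proportion of simulated values less than or equal to each observation.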

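Let \(N = 110\) be the number of fish in the pond, \(r = 40\) the number of walleye, and \(n = 20\) the number of fish caught. Analytically, \(E(Y) = n \times \frac{r}{N} = 20 \times \frac{40}{110} \approx 7.27\) and \(Var(Y) = n \times \frac{r}{N}\times \left(1-\frac{r}{N}\right)\times \frac{N-n}{N-1} = 20 \times \frac{40}{110} \times \frac{70}{110} \times \frac{90}{109} \approx 3.82\).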
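
The arithmetic can be checked directly in R. The sketch below evaluates the closed-form expressions and then cross-checks them by summing over the hypergeometric pmf with `dhyper()`. The names `pmf_mean` and `pmf_var` are illustrative and not part of the original key.

```r
# Analytic mean and variance of Y with N = 110 fish, r = 40 walleye, n = 20 caught.
# pmf_mean and pmf_var are illustrative names for the brute-force check.
N <- 110  # fish in the pond
r <- 40   # walleye in the pond
n <- 20   # fish caught

EY   <- n * r / N
VarY <- n * (r / N) * (1 - r / N) * (N - n) / (N - 1)

# Cross-check by summing over the pmf on the support 0, 1, ..., n
y        <- 0:n
pmf      <- dhyper(y, m = r, n = N - r, k = n)
pmf_mean <- sum(y * pmf)
pmf_var  <- sum((y - pmf_mean)^2 * pmf)

round(c(EY = EY, VarY = VarY, pmf_mean = pmf_mean, pmf_var = pmf_var), 2)
```

Both routes should agree with each other and with the 7.27 and 3.82 quoted in the text.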

Approximating these from the simulation:

#Verify E(Y) and Var(Y)
(many_catches 
  %>% summarize(EY = mean(Y), VarY = var(Y))
)
# A tibble: 1 × 2
     EY  VarY
  <dbl> <dbl>
1  7.26  3.86
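
One way to judge whether a simulated mean of 7.26 is "close enough" to the analytic value is to compare the gap with the Monte Carlo standard error. The sketch below assumes the 10,000 catches are independent and uses the analytic variance. The names `n_sims`, `mc_se`, and `z` are illustrative, and 7.26 is the rounded estimate printed above.

```r
# Monte Carlo standard error of the simulated mean, using the analytic variance.
# n_sims, mc_se, and z are illustrative names; 7.26 is the rounded printed estimate.
N <- 110; r <- 40; n <- 20
n_sims <- 10000

EY    <- n * r / N
VarY  <- n * (r / N) * (1 - r / N) * (N - n) / (N - 1)
mc_se <- sqrt(VarY / n_sims)

# Number of standard errors separating the simulated mean from E(Y)
z <- (7.26 - EY) / mc_se
c(mc_se = mc_se, z = z)
```

A value of `z` within roughly two standard errors of zero is consistent with ordinary simulation noise.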

Now verify that the simulated data follow a hypergeometric distribution by plotting the overlaid CDFs and the p-p plot:

#Verify simulated data follow hypergeometric:

(ggplot(aes(x=Y),data=many_catches)
  + geom_step(aes(y =empirical_cdf, color = 'Simulated CDF'))
  + geom_step(aes(y = analytic_cdf, color = 'Analytic CDF'))
  + labs(y='P(Y <= y)', x= 'y', color='')
  + theme_classic(base_size = 14)
)

(ggplot(aes(x=empirical_cdf, y = analytic_cdf),data=many_catches)
  + geom_point()
  + geom_abline(intercept = 0, slope = 1)  # y = x reference line, drawn once
  + labs(y='Analytic cumulative probabilities',
         x='Empirical cumulative probabilities')
  + theme_classic(base_size = 14)
)
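
As a numeric companion to the two plots, the largest vertical gap between the empirical and analytic CDFs can be computed over the support of \(Y\). This is a sketch rather than part of the original key: it re-simulates the data in base R so that it runs on its own, and the seed and the names `Y_sim`, `support`, and `max_gap` are assumptions.

```r
# Largest gap between the empirical and hypergeometric CDFs over the support.
# The seed and the names Y_sim, support, and max_gap are illustrative assumptions.
set.seed(380)

pond  <- rep(c("bass", "walleye"), times = c(70, 40))
Y_sim <- base::replicate(10000, sum(sample(pond, 20, replace = FALSE) == "walleye"))

support <- 0:20                                     # possible walleye counts
emp_cdf <- ecdf(Y_sim)(support)                     # simulated P(Y <= y)
ana_cdf <- phyper(support, m = 40, n = 70, k = 20)  # analytic P(Y <= y)

max_gap <- max(abs(emp_cdf - ana_cdf))
max_gap
```

A small `max_gap` says the same thing as points hugging the 45-degree line in the p-p plot.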