discussion7

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

set.seed(905)

#preallocate a logical vector with one entry per simulated city: entry i records
#whether city i has at least one person with the rare blood type
any_with_rare_bloodtype <- logical(100000)

#run 100,000 trials, each simulating a city of 10,000 people. For each trial, check whether
#any of the 10,000 "people" turned out to have the rare blood type (coded as 1).
for (i in 1:100000) {
  pop <- sample(x = 0:1, size = 10000, prob = c(999/1000, 1/1000), replace = TRUE)
  #store the result in place rather than growing the vector on every iteration
  any_with_rare_bloodtype[i] <- any(pop == 1)
}

Now take the number of “cities” with at least one person with the rare blood type and divide it by the number of trials. That is the empirical probability that a city has at least one person with the blood type, so one minus that value estimates the probability we care about: that not a single person in the city has that blood type.

1 - sum(any_with_rare_bloodtype) / 100000

## [1] 7e-05
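
As a sanity check, the simulated estimate can be compared with the exact value. The sketch below assumes, as the simulation does, that each of the 10,000 people independently has the rare blood type with probability 1/1000. The names `p`, `n`, `p_none_exact`, and `p_none_binom` are illustrative and not part of the original code.

```r
# Assumed values, matching the simulation: per-person probability and city size.
p <- 1 / 1000
n <- 10000

# Probability that none of n independent people has the rare blood type.
p_none_exact <- (1 - p)^n
p_none_exact

# The same quantity as P(X = 0) for X ~ Binomial(n, p).
p_none_binom <- dbinom(0, size = n, prob = p)
p_none_binom
```

With these values the exact probability is roughly 4.5e-05, so about 4.5 cities out of 100,000 are expected to have nobody with the blood type. Observing 7 (the 7e-05 above) is within ordinary sampling noise for an event this rare.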

discussion7_reply

2024-03-10