library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
set.seed(905)
# Preallocate a logical vector with one entry per trial: TRUE if that simulated
# city turned up at least one person with the rare blood type.
any_with_rare_bloodtype <- logical(100000)
# Run 100,000 trials of a city with 10,000 people. For each trial, check whether
# any of the 10,000 "people" has the rare blood type.
for (i in seq_along(any_with_rare_bloodtype)) {
  pop <- sample(x = 0:1, size = 10000, prob = c(999/1000, 1/1000), replace = TRUE)
  any_with_rare_bloodtype[i] <- any(pop == 1)
}
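As an aside, the same experiment can be sketched without an explicit loop. This is only an illustrative alternative, not the approach used here: it assumes that counting the rare-type people in each city with `rbinom()` is an acceptable stand-in for sampling every individual, and the names `n_trials`, `n_people`, `p_rare`, `rare_counts`, and `any_rare` are made up for this example. Because it draws random numbers differently, it will not reproduce the loop's result exactly, even with the same seed.

```r
set.seed(905)  # same seed for reproducibility; the draws still differ from the loop

n_trials <- 100000   # number of simulated cities (assumed to match the loop)
n_people <- 10000    # people per city
p_rare   <- 1 / 1000 # chance that one person has the rare blood type

# One binomial draw per city: how many of its residents have the rare type.
rare_counts <- rbinom(n_trials, size = n_people, prob = p_rare)

# TRUE if the city has at least one person with the rare blood type.
any_rare <- rare_counts > 0

# Estimated probability that a city has nobody with the rare blood type.
mean(!any_rare)
```

The result is kept in separate variables so that `any_with_rare_bloodtype` from the loop is left untouched.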
Now take the number of “cities” with at least one person with the rare blood type and divide it by the number of trials. That is the empirical probability that a city has at least one person with the blood type, so one minus that value estimates the probability we care about: that not a single person in the city has that blood type.
1 - sum(any_with_rare_bloodtype) / 100000
## [1] 7e-05
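As a sanity check, the simulated value can be compared with the exact answer. Assuming the 10,000 people are independent and each has a 1/1000 chance of the rare blood type (the same assumptions the simulation makes), the probability that nobody has it is (999/1000)^10000, roughly 4.5e-05. The sketch below computes that two ways and also gives the Monte Carlo standard error, which suggests how far an estimate from 100,000 trials can plausibly stray. The variable names are illustrative and not part of the original code.

```r
n      <- 10000   # people per simulated city
p      <- 1 / 1000 # probability that one person has the rare blood type
trials <- 100000  # number of simulated cities

# Exact probability that none of the n people has the rare blood type.
p_none_power <- (1 - p)^n
# The same quantity via the binomial probability mass function at zero.
p_none_binom <- dbinom(0, size = n, prob = p)

# Expected number of cities with nobody of the rare type among all trials,
# and the standard error of the simulated proportion.
expected_count <- trials * p_none_power
std_error <- sqrt(p_none_power * (1 - p_none_power) / trials)

print(p_none_power)
print(p_none_binom)
print(expected_count)
print(std_error)
```

With only about four or five such cities expected per 100,000 trials, a simulated estimate of 7e-05 (seven cities) sits a little over one standard error from the exact value, so the two are consistent.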