set.seed(485)
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(cache.extra = knitr::rand_seed)
The psychologist Tversky and his colleagues say that about four out of five people will answer (a) to the following question:
A certain town is served by two hospitals. In the larger hospital about 45 babies are born each day, and in the smaller hospital 15 babies are born each day. Although the overall proportion of boys is about 50 percent, the actual proportion at either hospital may be more or less than 50 percent on any day. At the end of a year, which hospital will have the greater number of days on which more than 60 percent of the babies born were boys?
Assume that the probability that a baby is a boy is .5 (actual estimates make this more like .513). Decide, by simulation, what the right answer is to the question. Can you suggest why so many people go wrong?
To answer via simulation, I will create a function that first draws 365 Poisson random variables for the number of children born each day of the year. The mean value for the Poisson will be 15 for small hospitals and 45 for big hospitals. Using these random draws, I will then draw 365 binomially distributed variables, each with probability 0.5, but each with the number of trials set by the corresponding Poisson. Then, along all 365 observations, I will add up the number of times the boys are greater than 60% of the total population. This will be one sample of the problem described by Tversky. I will then replicate this process 100,000 times. The mean results of the 100,000 observations will be a good indication for the difference between two hospitals.
SimHospYear <- function(hospital){
# First draw number of babies
NumBabies <- rpois(365, lambda = hospital)
# Next draw number of boys
NumBoys <- rbinom(365, size = NumBabies, prob = 0.5)
# How many days (out of 365) were more than 60% boys
sum(NumBoys / NumBabies > 0.6)
}
# Next two lines will draw 100,000 observations with mean 45 & 15
Big <- replicate(1e5, SimHospYear(45), simplify = TRUE)
Small <- replicate(1e5, SimHospYear(15), simplify = TRUE)
# Average days with 60% boys. Need na.rm = TRUE because in small hospital, 0
# may occur. Technically may occur in big hospital, but not with only 365,000
# observations!!
mean(Big, na.rm = TRUE)
## [1] 31.79202
mean(Small, na.rm = TRUE)
## [1] 76.65756
The answer to Tversky’s question based on the simulation is that the small hospital will have more days with more than 60% boys. I believe the explanation is one of proportion. With a smaller base, it takes fewer boys to be over 60%. When the mean number of children is 15 it takes, on average 10 boys to be over 60%. When the mean number of boys is 7.5, getting to 10 is a variance of 2.5 boys. With a mean of 45 children, however, it takes 28 boys to be over 60%. This is a variance of 5.5 boys. With the probability of a boy being 50% independent of any other observation, getting those extra three boys is much more difficult.
library(ggplot2)
library(data.table)
DT <- data.table(Replication = seq_along(Big), Big = Big, Small = Small)
DT <- melt(DT, id.vars = "Replication", variable.name = "Hospital",
value.name = "Days60%+")
ggplot(DT, aes(x = Hospital, y = `Days60%+`)) + geom_boxplot() + coord_flip()
sessionInfo()
## R version 3.6.1 Patched (2019-07-14 r76832)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 14393)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] data.table_1.12.2 ggplot2_3.2.1
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.2 knitr_1.24 magrittr_1.5 tidyselect_0.2.5
## [5] munsell_0.5.0 colorspace_1.4-1 R6_2.4.0 rlang_0.4.0
## [9] stringr_1.4.0 dplyr_0.8.3 tools_3.6.1 grid_3.6.1
## [13] gtable_0.3.0 xfun_0.9 withr_2.1.2 htmltools_0.3.6
## [17] assertthat_0.2.1 yaml_2.2.0 lazyeval_0.2.2 digest_0.6.20
## [21] tibble_2.1.3 crayon_1.3.4 purrr_0.3.2 codetools_0.2-16
## [25] glue_1.3.1 evaluate_0.14 rmarkdown_1.15 labeling_0.3
## [29] stringi_1.4.3 compiler_3.6.1 pillar_1.4.2 scales_1.0.0
## [33] pkgconfig_2.0.2