set.seed(485)
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(cache.extra = knitr::rand_seed)

Question

The psychologist Tversky and his colleagues say that about four out of five people will answer (a) to the following question:

A certain town is served by two hospitals. In the larger hospital about 45 babies are born each day, and in the smaller hospital 15 babies are born each day. Although the overall proportion of boys is about 50 percent, the actual proportion at either hospital may be more or less than 50 percent on any day. At the end of a year, which hospital will have the greater number of days on which more than 60 percent of the babies born were boys?

the large hospital
the small hospital
neither—the number of days will be about the same.

Assume that the probability that a baby is a boy is .5 (actual estimates make this more like .513). Decide, by simulation, what the right answer is to the question. Can you suggest why so many people go wrong?

Answer

Process

To answer via simulation, I will create a function that first draws 365 Poisson random variables for the number of children born each day of the year. The mean value for the Poisson will be 15 for small hospitals and 45 for big hospitals. Using these random draws, I will then draw 365 binomially distributed variables, each with probability 0.5, but each with the number of trials set by the corresponding Poisson. Then, along all 365 observations, I will add up the number of times the boys are greater than 60% of the total population. This will be one sample of the problem described by Tversky. I will then replicate this process 100,000 times. The mean results of the 100,000 observations will be a good indication for the difference between two hospitals.

Code

SimHospYear <- function(hospital){
  # First draw number of babies
  NumBabies <- rpois(365, lambda = hospital)
  # Next draw number of boys
  NumBoys <- rbinom(365, size = NumBabies, prob = 0.5)
  # How many days (out of 365) were more than 60% boys
  sum(NumBoys / NumBabies > 0.6)
}
# Next two lines will draw 100,000 observations with mean 45 & 15
Big <- replicate(1e5, SimHospYear(45), simplify = TRUE)
Small <- replicate(1e5, SimHospYear(15), simplify = TRUE)

# Average days with 60% boys. Need na.rm = TRUE because in small hospital, 0 
# may occur. Technically may occur in big hospital, but not with only 365,000
# observations!!
mean(Big, na.rm = TRUE)

## [1] 31.79202

mean(Small, na.rm = TRUE)

## [1] 76.65756

Specific Answers

The answer to Tversky’s question based on the simulation is that the small hospital will have more days with more than 60% boys. I believe the explanation is one of proportion. With a smaller base, it takes fewer boys to be over 60%. When the mean number of children is 15 it takes, on average 10 boys to be over 60%. When the mean number of boys is 7.5, getting to 10 is a variance of 2.5 boys. With a mean of 45 children, however, it takes 28 boys to be over 60%. This is a variance of 5.5 boys. With the probability of a boy being 50% independent of any other observation, getting those extra three boys is much more difficult.

Boxplots

library(ggplot2)
library(data.table)
DT <- data.table(Replication = seq_along(Big), Big = Big, Small = Small)
DT <- melt(DT, id.vars = "Replication", variable.name = "Hospital",
           value.name = "Days60%+")
ggplot(DT, aes(x = Hospital, y = `Days60%+`)) + geom_boxplot() + coord_flip()

Session Info

sessionInfo()

## R version 3.6.1 Patched (2019-07-14 r76832)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 14393)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] data.table_1.12.2 ggplot2_3.2.1    
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.2       knitr_1.24       magrittr_1.5     tidyselect_0.2.5
##  [5] munsell_0.5.0    colorspace_1.4-1 R6_2.4.0         rlang_0.4.0     
##  [9] stringr_1.4.0    dplyr_0.8.3      tools_3.6.1      grid_3.6.1      
## [13] gtable_0.3.0     xfun_0.9         withr_2.1.2      htmltools_0.3.6 
## [17] assertthat_0.2.1 yaml_2.2.0       lazyeval_0.2.2   digest_0.6.20   
## [21] tibble_2.1.3     crayon_1.3.4     purrr_0.3.2      codetools_0.2-16
## [25] glue_1.3.1       evaluate_0.14    rmarkdown_1.15   labeling_0.3    
## [29] stringi_1.4.3    compiler_3.6.1   pillar_1.4.2     scales_1.0.0    
## [33] pkgconfig_2.0.2

DT 605—Fall 2019—Week 5

Probability—Chapter 1.1—Q13 Pages 14–15

Avraham Adler

9/22/2019