Riddler Classic

"From Joel Lewis, this week’s Riddler Classic is a birthday puzzle for the ages:

The classic birthday problem asks about how many people need to be in a room together before you have better-than-even odds that at least two of them have the same birthday. Ignoring leap years, the answer is, paradoxically, only 23 people — fewer than you might intuitively think.

But Joel noticed something interesting about a well-known group of 100 people: In the U.S. Senate, three senators happen to share the same birthday of October 20: Kamala Harris, Brian Schatz and Sheldon Whitehouse.

And so Joel has thrown a new wrinkle into the classic birthday problem. How many people do you need to have better-than-even odds that at least three of them have the same birthday? (Again, ignore leap years.)"

The Simulation

For inputs \(N\), the number of times we will simulate, \(n\), the minimum number of people that need to share a birthday, and \(days\), with a default value of 365 (but allowed to use other values), we will output a histogram of the number of people it takes to obtain those \(n\) people and a summary of the simulation which will include the minimum, first, second, and third quartiles, the mean, and the maximum.

We will be interested in the median, as this value represents the middle point in which 50% of the simulations fell below and 50% fell above.

simB_Day <- function(N,n,days=365){
  people <- c()
  for(i in 1:N){
    bdays <- rep(0,days)
    p <- 0
    while(max(bdays)<n){
      day <- sample(1:365,1)
      bdays[day] <- bdays[day]+1
      p <- p+1
    }
    people <- c(people,p)
  }
  hist(people)
  return(summary(people))
}

## To check the classic birthday problem.
set.seed(192)
par(mfrow = c(3,2))
simB_Day(10000,2)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2.00   15.00   23.00   24.71   33.00   89.00
## Now, to explore the wrinkle.
simB_Day(10000,3)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    7.00   65.00   88.00   89.16  111.00  217.00
simB_Day(10000,3)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    7.00   65.00   87.00   88.29  110.00  235.00
simB_Day(10000,3)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    4.00   65.00   88.00   88.93  111.00  209.00
simB_Day(10000,3)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    6.00   65.00   87.00   88.44  111.00  229.00
simB_Day(10000,3)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    5.00   64.00   88.00   88.57  112.00  205.00

The Hypothesis Tests

So, is it 87 or 88? Assuming a null hypothesis that the proportion of times you will get at least 3 people having the same birthday when drawing \(n\) people (for 87 and 88) is less than or equal to 0.5, we will test a research hypothesis that there is evidence to indicate that the proportion is more than 0.5.

To do this, we’ll use another kind of simulation that provides the number of successful simulations that result in at least three people having the same birthday. It will take the same inputs as before and return a success rate along with a p-value for this particular hypothesis test.

simB_Day2 <- function(N,n,k,days=365){
  people <- c()
  successes <- 0
  for(i in 1:N){
    bdays <- rep(0,days)
    for(i in 1:n){
      day <- sample(1:365, 1)
      bdays[day] <- bdays[day]+1
    }
    if(max(bdays)>=k){
      successes <- successes +1
    }
  }
  p.hat <- successes/N
  p.value <- 1-pnorm((p.hat-0.5)/sqrt(0.25/N))
  return(c(p.hat=p.hat, p.value=p.value))
}

set.seed(8788)
simB_Day2(10000,87,3)
##     p.hat   p.value 
## 0.4959000 0.7938919
simB_Day2(10000,87,3)
##     p.hat   p.value 
## 0.5026000 0.3015318
simB_Day2(10000,88,3)
##      p.hat    p.value 
## 0.50840000 0.04647866
simB_Day2(10000,88,3)
##        p.hat      p.value 
## 0.5162000000 0.0005976485

Inconclusive

The p-values for the first two, when set to 87 people are not that convincing. The ones for 88 are somewhat convincing with one p-value less than 0.05 and the other less than .001. Let’s do a few more using 50000 trials.

## Result for 87 people
simB_Day2(50000,87,3)
##     p.hat   p.value 
## 0.4966800 0.9311949
## Result for 88 people
simB_Day2(50000,88,3)
##        p.hat      p.value 
## 5.087800e-01 4.308911e-05

These results, along with the previous, provide me statistical enough evidence to give an answer of 88 people.