MATH266: The Birthday Problem

The Problem

Th “birthday problem” is a famous example used in statistics classes that asks for the probability that, in a set of \(n\) randomly chosen people, at least two will share a birthday.

Simulation:

Try it out!

STEP 1) Can you simulate \(n=25\) random birthdays?

Hint: Use the sample() function

class<-sample(1:365, size=25, replace=TRUE)
class

##  [1] 282 136 195 124 258 287 180  12 309  98  87 195 327  26 297  10 343 339 221
## [20] 317  92  16 259 278 355

STEP 2) Are there any repeated birthdays?

Hint: Use the table() function

table(class)

## class
##  10  12  16  26  87  92  98 124 136 180 195 221 258 259 278 282 287 297 309 317 
##   1   1   1   1   1   1   1   1   1   1   2   1   1   1   1   1   1   1   1   1 
## 327 339 343 355 
##   1   1   1   1

Check to see if the maximum is greater than 1.

max(as.numeric(table(class)))

## [1] 2

STEP 3) Repeat this many times to calculate the empirical probability

nsim<-1000
sim<-c()
for(i in 1:nsim){
  class<-sample(1:365, size=25, replace=TRUE)
  this_sim<-max(as.numeric(table(class)))>1
  sim<-c(sim, this_sim)
}

mean(sim)

## [1] 0.567

STEP 4) How does this probability change for different class sizes?

nsim<-1000
cSize<-60 # max class size

birthProb<-matrix(nrow=60, ncol=2)
birthProb[,1]<-1:60

for(i in 1:cSize){
sim<-c()
for(j in 1:nsim){
  class<-sample(1:365, size=i, replace=TRUE)
  this_sim<-max(as.numeric(table(class)))>1
  sim<-c(sim, this_sim)
}
birthProb[i,2]<-mean(sim)
}

Theory:

STEP 6) How does this compare to theory?

Using principles of counting and the complement rule:

\(Pr(\text{at least 2 people have the same birthday})\)

\(=1-Pr(\text{no 2 people have the same birthday})\)

\(=1-\frac{365\times 364 \times \cdot \cdot \cdot \times (365-n+1)}{365^n}\)

Observe that \[365\times 364 \times \cdot \cdot \cdot \times (365-n+1)=\frac{365!}{(365-n)!}\]

This can also be thought of as the permutation of \(n\) distinct objects chosen from a set of 365 objects.

perm<-function(n, k){
  prod(n:(n-k+1))
}

# Permutation 
perm(365, 25)

## [1] 4.921544e+63

# True probability for 25 people
trueBD<-function(k){
  1-(perm(n=365, k)/(365^k))
}

trueBD(k=25)

## [1] 0.5686997

How does this change across different class sizes?

birthProb<-as.data.frame(birthProb)

for(i in 1:60){
  birthProb$true[i]<-trueBD(k=i)
}

colnames(birthProb)<-c("ClassSize", "Simulation", "True")

Comparison:

STEP 7) Make a pretty picture!

library(tidyverse)

colnames(birthProb)<-c("ClassSize", "Simulation", "True")

birthSim<-birthProb%>%
  gather("Type", "Prob", -ClassSize)


ggplot(birthSim, aes(x=ClassSize, y=Prob, color=Type))+
  geom_line(lwd=1, alpha=.75)+
  geom_hline(aes(yintercept=.5), lty=2)+
  theme_bw()

The birthday paradox is that, counterintuitively, the probability of a shared birthday exceeds 50% in a group of only 23 people.