Th “birthday problem” is a famous example used in statistics classes that asks for the probability that, in a set of \(n\) randomly chosen people, at least two will share a birthday.
Hint: Use the sample()
function
class<-sample(1:365, size=25, replace=TRUE)
class
## [1] 282 136 195 124 258 287 180 12 309 98 87 195 327 26 297 10 343 339 221
## [20] 317 92 16 259 278 355
Hint: Use the table()
function
table(class)
## class
## 10 12 16 26 87 92 98 124 136 180 195 221 258 259 278 282 287 297 309 317
## 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1
## 327 339 343 355
## 1 1 1 1
Check to see if the maximum is greater than 1.
max(as.numeric(table(class)))
## [1] 2
nsim<-1000
sim<-c()
for(i in 1:nsim){
class<-sample(1:365, size=25, replace=TRUE)
this_sim<-max(as.numeric(table(class)))>1
sim<-c(sim, this_sim)
}
mean(sim)
## [1] 0.567
nsim<-1000
cSize<-60 # max class size
birthProb<-matrix(nrow=60, ncol=2)
birthProb[,1]<-1:60
for(i in 1:cSize){
sim<-c()
for(j in 1:nsim){
class<-sample(1:365, size=i, replace=TRUE)
this_sim<-max(as.numeric(table(class)))>1
sim<-c(sim, this_sim)
}
birthProb[i,2]<-mean(sim)
}
Using principles of counting and the complement rule:
\(Pr(\text{at least 2 people have the same birthday})\)
\(=1-Pr(\text{no 2 people have the same birthday})\)
\(=1-\frac{365\times 364 \times \cdot \cdot \cdot \times (365-n+1)}{365^n}\)
Observe that \[365\times 364 \times \cdot \cdot \cdot \times (365-n+1)=\frac{365!}{(365-n)!}\]
This can also be thought of as the permutation of \(n\) distinct objects chosen from a set of 365 objects.
perm<-function(n, k){
prod(n:(n-k+1))
}
# Permutation
perm(365, 25)
## [1] 4.921544e+63
# True probability for 25 people
trueBD<-function(k){
1-(perm(n=365, k)/(365^k))
}
trueBD(k=25)
## [1] 0.5686997
How does this change across different class sizes?
birthProb<-as.data.frame(birthProb)
for(i in 1:60){
birthProb$true[i]<-trueBD(k=i)
}
colnames(birthProb)<-c("ClassSize", "Simulation", "True")
library(tidyverse)
colnames(birthProb)<-c("ClassSize", "Simulation", "True")
birthSim<-birthProb%>%
gather("Type", "Prob", -ClassSize)
ggplot(birthSim, aes(x=ClassSize, y=Prob, color=Type))+
geom_line(lwd=1, alpha=.75)+
geom_hline(aes(yintercept=.5), lty=2)+
theme_bw()
The birthday paradox is that, counterintuitively, the probability of a shared birthday exceeds 50% in a group of only 23 people.