There exists a mathematical concept that states that in a group of just 23 people, there is a greater then 50% chance that at least 2 people will share the same birthday. Most people assume that the probability is significantly lower then 50% but in reality, when comparing birthdays, each person is compared to everyother person in the group creating many more possible pairings then just comparing eachother to one another. Simply put, more people = more comparisons = higher probability. In this analysis, we will display this mathematical concept.
\[ P(\text{no shared birthday}) = \frac{365}{365} \times \frac{364}{365} \times \frac{363}{365} \times \dots \times \frac{(365-n+1)}{365} \]
Then:
\[ P(\text{at least one shared}) = 1 - P(\text{no shared birthday}) \]
We start by importing the following libraries that will be used to visualize our findings. plotly is specifically used to create interactive 3D visualizations which allow zooming, rotating, and hovering over points for detailed analysis. ggplot2 is used to create plots in 2D visualization and they can help in drawing line charts and scatter plots for better understanding.
library(plotly)
library(ggplot2)
To simulate a real word scenario where people in groups have random birthdays, with a given group size n, we randomly n birthdays from integers 1 - 365. If two people have the same birthday, we count this as a success and otherwise would be a failure. We repeat this 5000 times to get an accurate estimate.
set.seed(123)
birthday_conc_prob <- function(n, trials = 5000) {
shared_count <- sum(replicate(trials, {
length(unique(sample(1:365, n, replace = TRUE))) < n
}))
return(shared_count / trials)
}
group_sizes <- 1:50
estimated_probs <- sapply(group_sizes, birthday_conc_prob)
In this plot, we display a line graph showing thr raw data as well (red dots)
ggplot(data.frame(group_sizes, estimated_probs), aes(x = group_sizes, y = estimated_probs)) +
geom_point(color = "red") +
geom_smooth(method = "loess", se = FALSE, color = "black") +
labs(title = "Probability of Shared Birthday as Group Size Increases",
x = "Group Size", y = "Probability") +
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
As you can see in the graph, we notice that the probability increases at a rapid rate as the number of people grows. We see that at 23 people, the probability crosses 50% thus proving the math concept claim.
In this plot, we display a bar graph which gives even more of a detailed look at our analysis.
ggplot(data.frame(group_sizes, estimated_probs), aes(x = factor(group_sizes), y = estimated_probs)) +
geom_bar(stat = "identity", fill = "red", color = "black", alpha = 0.8) +
labs(title = "Probability of Shared Birthday by Group Size",
x = "Group Size",
y = "Probability of a Shared Birthday") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
We can still see that the probability increases as the group size gets bigger. We can also see all group size numbers and there respected probabilities
In this diagram, we can see a detailed and easier to see 3D diagram of the probability changing across different group sizes.
prob_data <- data.frame(group_sizes, estimated_probs, z = estimated_probs)
plot_ly(prob_data, x = ~group_sizes, y = ~estimated_probs, z = ~z, type = 'scatter3d', mode = 'lines+markers',
marker = list(size = 3)) %>%
layout(title = "3D Visualization of Birthday Problem",
scene = list(xaxis = list(title = "Number of People"),
yaxis = list(title = "Estimated Probability"),
zaxis = list(title = "Probability"))
)