Synopsis

The project calculates the probabilities of outcomes when n number of fair 6-sided dice are rolled. It then attempts to estimate the probabilities using Normal Distribution and demonstrates graphically the exact and estimated values.

Importing Libraries

library(polynom)
library(ggplot2)
library(gridExtra)

Mathematical solution

We consider sum of outcomes of n number of a 6-sided fair dice rolls. The probability of obtaing p can be expressed as the coeffecient of xp in the expression (1/6n)(1 + x + x2 + x3 + x4 + x5 + x6)n. This is the mathematical aspect of the project. Check more details here

A function is created that takes number of dice, n as the input and returns histogram of probability of outcomes. It calculates probabilities by the above mentioned method.

pd <- function(n){
    s = 6
    p<-polynomial(c(0,rep(1/s,s)))
    k <- data.frame(as.vector(p^n)[-(1:n)])
    k$no <- n:(s*n)
    names(k)[1] <- "p"
    ggplot(data.frame(k)) +
        geom_bar(stat = "identity",aes(no, p), size = 1, fill = "orange", col = "black") +
        labs(title = paste(n, ifelse(n == 1,"die roll", "dice rolls"), sep = " "),
             x = "Outcome", y = "Probability")
}

Probability Distribution of a single die roll, 6 dice rolls, 9 dice rolls and 25 dice rolls are shown.

grid.arrange(pd(1), pd(6),pd(9),pd(25), ncol = 2)

Statistical Estimation

These probabilities are estimated via normal distribution with the same mean and variance as that of the data.

For a single 6-sided fair dice roll, mean and variance can be calculated as follows:

E(X) = (1+2+3+4+5+6)/6 = 3.5

Var(X) = E(X2)-(E(X))2 = (12 + 22 + 32 + 42 + 52 + 62)/6 - 3.52 = 35/12

In case of n number of 6-sided fair dice rolls, since we assume these dice rolls to be independent. So, we can calculate mean and variance of sum by simply adding n times the mean and variance of single die roll case. So we get:

E(S) = 3.5n

Var(S) = (35/12)n

The previous function is modified to overlap the output histogram with normal distribution with parameters calculated above

pd <- function(n){
    s = 6
    p<-polynomial(c(0,rep(1/s,s)))
    k <- data.frame(as.vector(p^n)[-(1:n)])
    k$no <- n:(s*n)
    names(k)[1] <- "p"
    ggplot(data.frame(k)) +
        geom_bar(stat = "identity",aes(no, p), size = 1, fill = "orange", col = "black") +
        stat_function(fun = dnorm, size = 1.3, col = "blue",
                      args = list(mean = (s+1)/2*n, sd = sqrt(n*(s^2-1)/12))) +
        labs(title = paste(n, ifelse(n == 1,"die roll", "dice rolls"), sep = " "),
             x = "Outcome", y = "Probability")
}

Normal Estimations of the above probability distributions are shown

grid.arrange(pd(1), pd(6),pd(9),pd(25), ncol = 2)

Estimation is observed to get better as n increases.

Conclusion

  • As n increases, probabilities of sum of n dice rolls approaches normal distribution with mean of 3.5n and variance 35n/12
  • As n increases, finding the exact solution becomes more and more mathematically rigorous. So, for large n, we can estimate probabilities via normal approximation