Overview

I’m taking a statistical inference course and was looking for good R code that simulates the throwing of two dice. I found this piece on RPubs and it seemed to do what I wanted. http://rpubs.com/Lionel/11497 . Everytime I ran it I got an error on the plot call so I dove in a little deeper. The end result is that I changed the names of the variables to make more sense to the non-R-gurus and fixed the ggplot issue. Turns out he was using a histogram instead of a bar plot. I’m not even sure how he got it to run for the knitr doc.

Original

This is the original code from Lionel:

library(ggplot2)
# function to generate dice throws
des <- function(time) {
    # this is the theoretical distributions of the resulting values, there is
    # one chance out of 36 to get 2, 2 chance out of 36 to get 3...
    true <- c(1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1)/36
    # these are the sums of our two throws, and since we are throwing the same
    # number over and over again we have to set replace =TRUE
    res <- sample(1:6, size = time, replace = TRUE) + sample(1:6, size = time, 
        replace = TRUE)
    # a resulting data frame for later use, when time is low there is some
    # chance that we will not get all numbers between 2 and 12, therefore the
    # specification of the Value and Exp column is not that straightforward
    out <- data.frame(Value = c(unique(res)[order(unique(res))], 2:12), Res = c(table(res)/time, 
        true), Exp = c(rep("Observed", length(unique(res))), rep("Theoretical", 
        11)))
    # plotting
    #print(ggplot(out, aes(x = Value, y = Res, fill = Exp)) + geom_histogram(stat = "identity", 
    #    position = "dodge") + scale_x_discrete(breaks = c(2:12)))
}
des(100)

On my R version 3.3.0 the ggplot call produces this error

# Error: Unknown parameters: binwidth, bins, pad

I figured Lionel was using the actual results to create a histogram but he actually broke it down into probabilities which is better illustrated by a bar plot.

Revised

This is my revised code. Instead of dodging the bar chart I thought it was easier to see side-by-side.

dice_throws <- function(n_throws) {
    # This is the theoretical distributions of the resulting values, there is
    # one chance out of 36 to get 2, 2 chance out of 36 to get 3...
    # I'll try to automate this next part instead of relying on manual calculations..
    theoretical <- c(1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1)/36
    # these are the sums of our two throws, and since we are throwing the same
    # number over and over again we have to set replace =TRUE
    throw_results <- sample(1:6, size = n_throws, replace = TRUE) + sample(1:6, size = n_throws, 
                                                             replace = TRUE)
    # a resulting data frame for later use, when time is low there is some
    # chance that we will not get all numbers between 2 and 12, therefore the
    # specification of the Value and Exp column is not that straightforward
    observed__vs_expected <- data.frame(Value = c(unique(throw_results)[order(unique(throw_results))], 2:12), 
                      Observed = c(table(throw_results)/n_throws, theoretical), 
                      Expected = c(rep("Observed", length(unique(throw_results))), rep("Theoretical", 11)))
   
    print(ggplot(observed__vs_expected, aes(x = Value, y = Observed, fill = Expected)) + 
              geom_bar(stat = "identity") + 
              facet_wrap(~ Expected, ncol = 2))
}
dice_throws(100)

With only 100 dice throws the distribution is probably uneven because we’re dealing with few samples. Now lets run it for 1,000,000 throws. You can now see that it is very close to the Theoretical distribution.

dice_throws(1000000)

The more times you sample, the closer the sample mean is to the population mean.

Therefore, any sample mean is an estimator of the population mean.

Thanks for the inspiration Lionel :-)

Simulating Dice Throws - Proving Regressing Towards the Mean

Kier O’Neil

September 24, 2016

Overview

Original

Revised