Grando 8 Discussion

Chapter 8.1 Exercise 3

Write a program to toss a coin 10,000 times. Let \({S}_{n}\) be the number of heads in the first \(n\) tosses. Have your program print out, after every 1000 tosses, \({S}_{n} - \frac{n}{2}\). On the basis of this simulation, is it correct to say that you can expect heads about half of the time when you toss a coin a large number of times?

Answer:

suppressWarnings(suppressMessages(library(dplyr)))
suppressWarnings(suppressMessages(library(tidyr)))
library(ggplot2)
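# In this exercise, heads = 1.
# Simulate bin_size batches of `flips` tosses each and track the running
# difference between the number of heads and half the number of tosses.
flip_coin <- function(flips, bin_size) {
    # Heads per batch: a uniform draw above 0.5 counts as heads
    heads <- vapply(seq_len(bin_size), function(i) sum(runif(flips) > 0.5), 
        integer(1))
    df <- data.frame(Heads = heads, Flips = rep(flips, bin_size)) %>% 
        mutate(TotalHeads = cumsum(Heads), TotalFlips = cumsum(Flips)) %>% 
        select(TotalHeads, TotalFlips) %>% 
        mutate(Difference = TotalHeads - TotalFlips/2)
    df  # return the data frame explicitly
}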

flip_df <- flip_coin(1000, 10)
ggplot(flip_df, aes(x = TotalFlips, y = Difference)) + geom_line()

Here we see that \({S}_{n} - \frac{n}{2}\) does not trend toward zero; it simply oscillates around that value. In fact, the typical size of the deviation from zero tends to grow as more tosses are made.

flip_df <- flip_coin(10000, 10000)
ggplot(flip_df, aes(x = TotalFlips, y = Difference)) + geom_line()

Having said that, when you compare the difference to the number of tosses, the discrepancy shrinks as a proportion of the tosses as the number of tosses grows. Therefore, we can expect heads about half of the time when a coin is tossed a large number of times.
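
Both observations are consistent with a short variance calculation, sketched here under the assumption (implicit in the exercise) that the tosses are independent and fair. Each toss contributes a variance of \(\frac{1}{4}\) to the count of heads, so

\[
\mathrm{Var}({S}_{n}) = \frac{n}{4}, \qquad \mathrm{SD}\left({S}_{n} - \frac{n}{2}\right) = \frac{\sqrt{n}}{2}, \qquad \mathrm{SD}\left(\frac{{S}_{n}}{n} - \frac{1}{2}\right) = \frac{1}{2\sqrt{n}}.
\]

For \(n = 10{,}000\) the typical deviation is about 50 heads, or 0.005 as a proportion of the tosses. For \(n = 10^{8}\) it is about 5,000 heads, but only 0.00005 as a proportion. The raw difference spreads out while the proportion tightens around \(\frac{1}{2}\).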

flip_df <- flip_coin(10000, 10000)
ggplot(flip_df, aes(x = TotalFlips, y = Difference/TotalFlips)) + 
    geom_line()
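
The exercise also asks for the values to be printed after every 1000 tosses. The following is a minimal base-R sketch of that step, separate from the flip_coin function above. It draws the tosses with rbinom rather than runif. The seed, the variable names (n_tosses, checkpoint, report) and the extra proportion and one_sd columns are assumptions added for illustration.

```r
# Sketch only: print S_n - n/2 after every 1000 tosses of a fair coin.
set.seed(8)            # hypothetical seed, used only to make the run repeatable
n_tosses   <- 10000    # total number of tosses
checkpoint <- 1000     # report after every `checkpoint` tosses

tosses <- rbinom(n_tosses, size = 1, prob = 0.5)   # 1 = heads, 0 = tails
s_n    <- cumsum(tosses)                           # running number of heads
n      <- seq(checkpoint, n_tosses, by = checkpoint)

report <- data.frame(
    n          = n,
    difference = s_n[n] - n / 2,   # S_n - n/2, the quantity the exercise asks for
    proportion = s_n[n] / n,       # S_n / n, the observed share of heads
    one_sd     = sqrt(n) / 2       # theoretical standard deviation of S_n - n/2
)
print(report)
```

Comparing the difference column with one_sd gives a sense of scale for the raw deviation, while the proportion column shows how close the share of heads stays to one half.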