1 Approximating the Binomial Distribution

We flip a coin 10 times and we want to know the probability of getting more than 3 heads. This is a trivial problem using the Binomial distribution but suppose we have forgotten about this or never learned it in the first place.

Let’s solve this problem with a Monte Carlo simulation. We will use the common trick of representing tails with 0 and heads with 1, then simulate 10 coin tosses 100 times and see how often that happens.

runs = 100 # number of simulations to run

greater_than_three = rep(0, runs) # vector to hold outcomes

# run 100 simulations
for ( i in 1:runs ) {

  # flip a coin ten times (0 - tail, 1 - head)
  coin_flips = sample(c(0,1), 10, replace=T)

  # count how many heads and check if greater than 3
  greater_than_three[i] = ( sum(coin_flips) > 3 )
}

# compute average over simulations
pr_greater_than_three <- sum(greater_than_three)/runs

For our MC estimate of the probability \(P(X>3)\) we get

print(pr_greater_than_three)
## [1] 0.83

which we can compare to R’s built-in Binomial distribution function:

print( pbinom(3, 10, 0.5, lower.tail=FALSE) )
## [1] 0.828125

Not bad! The Monte Carlo estimate is close to the true value.

1.1 Problem

Q1. Try increasing the number of simulations and see how the accuracy improves?

Q2. Can you plot how the accuracy varies as a function of the number of simulations? (hint: see the previous section)

1.1.1 Solution

Only expand once you have made an attempt!
runs <- c(10, 50, 100, 250, 500, 750, 1000) # try different number of simulations
n_runs <- length(runs) # number of different simulation sizes to try
rpts <- 100
accuracy <- rep(0, n_runs) # vector to record accuracy values
accuracy_sd <- rep(0, n_runs) # vector to record accuracy sd values
# for each simulation size
for ( i in 1:n_runs){

  run_sz <- runs[i] # select a simulation size to use

  # vector to store results from each repeat
  pr_greater_than_three=rep(0, rpts)
  for (k in 1:rpts){
    greater_than_three <- rep(0, run_sz)
  for ( j in 1:run_sz ) {

    # flip a coin ten times (0 - tail, 1 - head)
    coin_flips = sample(c(0,1), 10, replace=T)

    # count how many heads and check if greater than 3
    greater_than_three[j] <- (sum(coin_flips) > 3)
  }
  pr_greater_than_three[k] <- sum(greater_than_three)/run_sz
  }
  # compute difference between MC estimate and real value
  accuracy[i] <- mean(pr_greater_than_three -
        pbinom(3, 10, 0.5, lower.tail=FALSE))

  # compute sd difference between MC estimate and real value
  accuracy_sd[i] <- sd(pr_greater_than_three -
      pbinom(3, 10, 0.5, lower.tail=FALSE))

}

print(accuracy)
## [1]  0.0078750000 -0.0099250000  0.0007750000  0.0027550000  0.0012550000
## [6]  0.0006216667  0.0005550000
print(accuracy_sd)
## [1] 0.11679214 0.05585660 0.03961672 0.02197578 0.01652838 0.01223242 0.01168948
library(ggplot2)

df <- data.frame(runs, accuracy)

# use ggplot to plot lines for accuracy
ggplot(df, aes(x=runs, y=accuracy)) +
  geom_line() +
  geom_point() +
  geom_errorbar(aes(ymin=accuracy-accuracy_sd, ymax=accuracy+accuracy_sd), width=.2,
                position=position_dodge(0.05)) +
  ylab("Estimate-Exact") +
  xlab("Simulations")

2 Monte Carlo Expectations

Consider the following spinner. If the spinner is spun randomly then it has a probability 0.5 of landing on yellow and 0.25 of landing on red or blue respectively.

If the rules of the game are such that landing on ‘yellow’ you gain 1 point, ‘red’ you lose 1 point and ‘blue’ you gain 2 points. We can easily calculate the expected score.

Let X denote the random variable associated with the score of the spin then:

\[E[X]=(\frac{1}{2} \times 1)+(\frac{1}{4}\times-1)+(\frac{1}{4}\times2)=0.75\]

If we ask a more challenging question such as:

After 20 spins what is the probability that you will have less then 0 points?”

How might we solve this?

Of course, there are methods to analytically solve this type of problem but by the time they are even explained we could have already written our simulation!

To solve this with a Monte Carlo simulation you need to sample from the Spinner 20 times, and return 1 if we are below 0 other wise we’ll return 0. We will repeat this 10,000 times to see how often it happens!

2.1 Functions

First, we are going to introduce the concept of a function. This is a piece of code which is encapsulated so then we can refer to it repeated via the name of the function rather than repeatedly writing those lines of code. The function we will write will simulate one game as indicated above and return whether the number of points is less than zero.

# simulates a game of 20 spins
play_game <- function(){
  # picks a number from the list (1, -1, 2)
  # with probability 50%, 25% and 25% twenty times
  results <- sample( c(1, -1, 2), 20, replace=TRUE, prob = c(0.5, 0.25, 0.25))
  # function returns whether the sum of all the spins is < 1
  return(sum(results) < 0)
}

2.2 Simulation

Now we can use this function in a loop to play the game 100 times:

runs <- 100 # play the game 100 times

less_than_zero = rep(0, runs) # vector to store outcome of each game
for ( it in 1:runs ) {
  # play the game by calling the function and store the outcome
  less_than_zero[it] <- play_game()
}

We can then compute the probability that, after twenty spins, we will have less than zero points:

prob_less_than_zero <- sum(less_than_zero)/runs
print(prob_less_than_zero)
## [1] 0

The probability is very low. This is not surprising since there is only a 25% chance of getting a point deduction on any spin and a 75% chance of gaining points. Try to increase the number of simulation runs to see if you can detect any games where you do find a negative score.

2.3 Problems

Q1. Modify your code to allow you to calculate the expected number of points after 20 spins.

2.3.1 Solution

Only expand once you have made an attempt!
# simulates a game of 20 spins
play_game <- function(){
  # picks a number from the list (1, -1, 2)
  # with probability 50%, 25% and 25% twenty times
  results <- sample( c(1, -1, 2), 20, replace=TRUE, prob = c(0.5, 0.25, 0.25))
  # function returns the sum of all the spins
  return(sum(results))
}

score_per_game = rep(0, runs) # vector to store outcome of each game
for ( it in 1:runs ) {
  score_per_game[it] <- play_game() # play the game by calling the function
}
expected_score = mean(score_per_game) # average over all simulations

print(expected_score)
## [1] 14.82


Q2. Simulate a game in which you have a maximum of 20 spins but you go “bust” once you hit a negative score and take this into account when you compute the expected end of game score.

2.3.2 Solution

Only expand once you have made an attempt!
# simulates a game of up to 20 spins
play_game <- function(){
  # picks a number from the list (1, -1, 2)
  # with probability 50%, 25% and 25% twenty times
  results <- sample( c(1, -1, 2), 20, replace=TRUE, prob = c(0.5, 0.25, 0.25))
  # compute a running sum of points
  results_sum = cumsum(results)
  # check if the game goes to zero at any point
    if ( sum(results_sum < 0) >0  ){
      return(0) # return zero
    } else {
  return(results_sum[20]) # returns the final score
    }
}



runs <- 100 # play the game 100 times
game_score = rep(0, runs) # vector to store scores in each game played

for ( i in 1:runs){
  game_score[i] = play_game()
}
print(mean(game_score))
## [1] 10.67
plot(game_score)
The games with score zero now corresponds to the number of games where we went bust (or genuinely ended the game with zero).