We flip a coin 10 times and we want to know the probability of getting more than 3 heads. This is a trivial problem using the Binomial distribution but suppose we have forgotten about this or never learned it in the first place.
Let’s solve this problem with a Monte Carlo simulation. We will use the common trick of representing tails with 0 and heads with 1, then simulate 10 coin tosses 100 times and see how often that happens.
runs = 100 # number of simulations to run
greater_than_three = rep(0, runs) # vector to hold outcomes
# run 100 simulations
for ( i in 1:runs ) {
# flip a coin ten times (0 - tail, 1 - head)
coin_flips = sample(c(0,1), 10, replace=T)
# count how many heads and check if greater than 3
greater_than_three[i] = ( sum(coin_flips) > 3 )
}
# compute average over simulations
pr_greater_than_three <- sum(greater_than_three)/runs
For our MC estimate of the probability \(P(X>3)\) we get
print(pr_greater_than_three)
## [1] 0.83
which we can compare to R’s built-in Binomial distribution function:
print( pbinom(3, 10, 0.5, lower.tail=FALSE) )
## [1] 0.828125
Not bad! The Monte Carlo estimate is close to the true value.
Q1. Try increasing the number of simulations and see how the accuracy improves?
Q2. Can you plot how the accuracy varies as a function of the number of simulations? (hint: see the previous section)
runs <- c(10, 50, 100, 250, 500, 750, 1000) # try different number of simulations
n_runs <- length(runs) # number of different simulation sizes to try
rpts <- 100
accuracy <- rep(0, n_runs) # vector to record accuracy values
accuracy_sd <- rep(0, n_runs) # vector to record accuracy sd values
# for each simulation size
for ( i in 1:n_runs){
run_sz <- runs[i] # select a simulation size to use
# vector to store results from each repeat
pr_greater_than_three=rep(0, rpts)
for (k in 1:rpts){
greater_than_three <- rep(0, run_sz)
for ( j in 1:run_sz ) {
# flip a coin ten times (0 - tail, 1 - head)
coin_flips = sample(c(0,1), 10, replace=T)
# count how many heads and check if greater than 3
greater_than_three[j] <- (sum(coin_flips) > 3)
}
pr_greater_than_three[k] <- sum(greater_than_three)/run_sz
}
# compute difference between MC estimate and real value
accuracy[i] <- mean(pr_greater_than_three -
pbinom(3, 10, 0.5, lower.tail=FALSE))
# compute sd difference between MC estimate and real value
accuracy_sd[i] <- sd(pr_greater_than_three -
pbinom(3, 10, 0.5, lower.tail=FALSE))
}
print(accuracy)
## [1] 0.0078750000 -0.0099250000 0.0007750000 0.0027550000 0.0012550000
## [6] 0.0006216667 0.0005550000
print(accuracy_sd)
## [1] 0.11679214 0.05585660 0.03961672 0.02197578 0.01652838 0.01223242 0.01168948
library(ggplot2)
df <- data.frame(runs, accuracy)
# use ggplot to plot lines for accuracy
ggplot(df, aes(x=runs, y=accuracy)) +
geom_line() +
geom_point() +
geom_errorbar(aes(ymin=accuracy-accuracy_sd, ymax=accuracy+accuracy_sd), width=.2,
position=position_dodge(0.05)) +
ylab("Estimate-Exact") +
xlab("Simulations")
Consider the following spinner. If the spinner is spun randomly then it has a probability 0.5 of landing on yellow and 0.25 of landing on red or blue respectively.
If the rules of the game are such that landing on ‘yellow’ you gain 1 point, ‘red’ you lose 1 point and ‘blue’ you gain 2 points. We can easily calculate the expected score.
Let X denote the random variable associated with the score of the spin then:
\[E[X]=(\frac{1}{2} \times 1)+(\frac{1}{4}\times-1)+(\frac{1}{4}\times2)=0.75\]
If we ask a more challenging question such as:
After 20 spins what is the probability that you will have less then 0 points?”
How might we solve this?
Of course, there are methods to analytically solve this type of problem but by the time they are even explained we could have already written our simulation!
To solve this with a Monte Carlo simulation you need to sample from the Spinner 20 times, and return 1 if we are below 0 other wise we’ll return 0. We will repeat this 10,000 times to see how often it happens!
First, we are going to introduce the concept of a function. This is a piece of code which is encapsulated so then we can refer to it repeated via the name of the function rather than repeatedly writing those lines of code. The function we will write will simulate one game as indicated above and return whether the number of points is less than zero.
# simulates a game of 20 spins
play_game <- function(){
# picks a number from the list (1, -1, 2)
# with probability 50%, 25% and 25% twenty times
results <- sample( c(1, -1, 2), 20, replace=TRUE, prob = c(0.5, 0.25, 0.25))
# function returns whether the sum of all the spins is < 1
return(sum(results) < 0)
}
Now we can use this function in a loop to play the game 100 times:
runs <- 100 # play the game 100 times
less_than_zero = rep(0, runs) # vector to store outcome of each game
for ( it in 1:runs ) {
# play the game by calling the function and store the outcome
less_than_zero[it] <- play_game()
}
We can then compute the probability that, after twenty spins, we will have less than zero points:
prob_less_than_zero <- sum(less_than_zero)/runs
print(prob_less_than_zero)
## [1] 0
The probability is very low. This is not surprising since there is only a 25% chance of getting a point deduction on any spin and a 75% chance of gaining points. Try to increase the number of simulation runs to see if you can detect any games where you do find a negative score.
Q1. Modify your code to allow you to calculate the expected number of points after 20 spins.
# simulates a game of 20 spins
play_game <- function(){
# picks a number from the list (1, -1, 2)
# with probability 50%, 25% and 25% twenty times
results <- sample( c(1, -1, 2), 20, replace=TRUE, prob = c(0.5, 0.25, 0.25))
# function returns the sum of all the spins
return(sum(results))
}
score_per_game = rep(0, runs) # vector to store outcome of each game
for ( it in 1:runs ) {
score_per_game[it] <- play_game() # play the game by calling the function
}
expected_score = mean(score_per_game) # average over all simulations
print(expected_score)
## [1] 14.82
Q2. Simulate a game in which you have a maximum of 20 spins but you go “bust” once you hit a negative score and take this into account when you compute the expected end of game score.
# simulates a game of up to 20 spins
play_game <- function(){
# picks a number from the list (1, -1, 2)
# with probability 50%, 25% and 25% twenty times
results <- sample( c(1, -1, 2), 20, replace=TRUE, prob = c(0.5, 0.25, 0.25))
# compute a running sum of points
results_sum = cumsum(results)
# check if the game goes to zero at any point
if ( sum(results_sum < 0) >0 ){
return(0) # return zero
} else {
return(results_sum[20]) # returns the final score
}
}
runs <- 100 # play the game 100 times
game_score = rep(0, runs) # vector to store scores in each game played
for ( i in 1:runs){
game_score[i] = play_game()
}
print(mean(game_score))
## [1] 10.67
plot(game_score)