This is a quick R Markdown vignette written using the ‘knitr’ package and the MiKTeX LaTeX distribution.
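The purpose of this vignette is to document a way of modelling the Monty Hall problem in R. The Monty Hall problem is a classic probability puzzle based on the TV game show “Let’s Make a Deal” and its host, Monty Hall. The premise of the problem is:
A prize is hidden behind one of three closed doors, and the other two doors hide nothing. The contestant picks a door. The host, who knows where the prize is, then opens one of the two other doors to show that it is empty, and offers the contestant a choice: stay with the original door, or switch to the remaining unopened door. Should the contestant switch?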
Intuitively, staying with the current door and switching to the other unopened door seem to have the same chance of winning. However, the initial choice has only a 1/3 (33.3%) chance of being the prize door, so there is a 2/3 (66.7%) chance that the prize is behind one of the two doors the contestant did not pick. Because the host always opens the empty one of those two, switching to the remaining unopened door wins with that full 2/3 probability.
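Before simulating anything, the 1/3 versus 2/3 split can be checked exactly by listing every combination of prize door and first pick. This is a minimal sketch, assuming (as the problem does) that the prize location and the contestant's pick are independent and uniform over the three doors; it uses base R's ‘expand.grid’ and is not part of the original simulation:

```r
# Enumerate every equally likely (prize door, selected door) pair:
# 3 x 3 = 9 combinations, each with probability 1/9
games <- expand.grid(prize = 1:3, select = 1:3)

# Staying wins only when the first pick was already the prize door
p_stay <- mean(games$prize == games$select)

# Switching wins in every other case, because the host always opens
# the remaining empty door and leaves the prize behind the other one
p_switch <- mean(games$prize != games$select)

p_stay
#> [1] 0.3333333
p_switch
#> [1] 0.6666667
```

Three of the nine pairs have the prize behind the selected door, and the other six are wins for switching.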
In the following sections, I will model the problem and demonstrate that switching doors is indeed the statistically better choice.
First, let’s start by randomly placing the prize behind a door.
Uniformly distributed random numbers can be generated using the ‘runif’ function, whose arguments are the number of values, the minimum and the maximum:
runif(10,0,4)
#> [1] 1.054523 2.207024 2.976834 1.778216 3.078014 1.976064 3.650708
#> [8] 1.298162 3.213687 1.280704
We have 10 random results, each between 0 and 4.
But hang on, we want whole numbers for the door numbers! Let’s add the ‘floor’ function, which rounds each random number down to the nearest integer:
floor(runif(10,1,4))
#> [1] 2 3 3 2 3 1 2 3 3 2
Note: we have now raised the lower bound to 1. Since ‘runif(10,1,4)’ returns values between 1 and 4, rounding down gives the integers 1, 2 or 3, each with equal probability.
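An equivalent and arguably more direct way to draw whole door numbers is base R’s ‘sample’ function, which picks elements from a vector. This is a sketch of an alternative only; the rest of this vignette keeps the ‘floor(runif())’ approach, so the random numbers below will not match the later simulations:

```r
set.seed(1)
# Draw 10 door numbers directly from the set {1, 2, 3}, with replacement
doors_drawn <- sample(1:3, 10, replace = TRUE)
doors_drawn
```

Both approaches give each door a 1/3 chance; ‘sample’ simply avoids the round-down step.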
Now, let’s put this into practice! We will store the contestant’s selected door in ‘doorSelect’ and the prize door in ‘doorPrize’, then combine them into a 1000 x 2 matrix, with one row per game:
#Set random seed so results are reproducible
set.seed(1)
#Iteration indices for 1,000 games (runif(N, ...) draws length(N) values)
N <- 1:1000
#Counter initialisation
i <- 0
stay <- 0
switch <- 0
#Random number generator used to select door and prize
doorSelect <- floor(runif(N,1,4))
doorPrize <- floor(runif(N,1,4))
#Combine the selections into one matrix
doors <- cbind(doorPrize,doorSelect)
Now that we have our prize doors and the contestant’s selected doors, let’s do the analysis!
#For N iterations, count wins for staying if door and prize are the same and count wins for switching if they are different
for(i in N) {
  if (doors[i,1] == doors[i,2]) {
    stay <- stay + 1
  } else {
    switch <- switch + 1
  }
}
In this instance, we make two assumptions:
If the selected door and the prize door are the same, then staying with the contestant’s selection wins.
If the selected door is different from the prize door, then switching wins: the host must open the only remaining empty door, so the other unopened door hides the prize.
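Because R compares vectors element by element, the same two counts can be obtained without a loop. This is a minimal vectorised sketch, assuming the same seed and draws as the simulation above (‘n_games’, ‘stay_wins’ and ‘switch_wins’ are new names introduced here); with the same random numbers it should give the same counts as the loop:

```r
# Reproduce the inputs from the simulation above
set.seed(1)
n_games <- 1000
doorSelect <- floor(runif(n_games, 1, 4))
doorPrize  <- floor(runif(n_games, 1, 4))

# Element-wise comparison gives a logical vector;
# sum() counts the TRUE values, so no explicit loop is needed
stay_wins   <- sum(doorPrize == doorSelect)
switch_wins <- sum(doorPrize != doorSelect)
c(stay = stay_wins, switch = switch_wins)
```

Every game is counted exactly once, so the two totals always add up to ‘n_games’.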
So is the end result 50:50 between switching and staying? Let’s find out!
print(stay)
#> [1] 325
print(switch)
#> [1] 675
As you can see, switching doors is actually more successful! If both strategies had an equal chance, we would have seen roughly 500 wins each. Instead, switching won 675 of the 1,000 games (67.5%), close to the 2/3 predicted above.
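To put a number on “roughly even”, one option is an exact binomial test with base R’s ‘binom.test’, asking how plausible 675 wins out of 1,000 would be if switching really won only half the time. A minimal sketch, hard-coding the counts printed above so the block runs on its own:

```r
# Counts reported by the simulation above (hard-coded for illustration)
switch_wins <- 675
n_games <- 1000

# Exact binomial test of H0: switching wins with probability 0.5
result <- binom.test(switch_wins, n_games, p = 0.5)

result$p.value   # probability of a result this extreme if switching were 50:50
result$conf.int  # 95% confidence interval for the switching win rate
```

If the interval excludes 1/2 but contains 2/3, the simulation agrees with the theoretical argument rather than with the 50:50 intuition.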
Alternatively, for a fuller simulation of the Monty Hall problem, we can use a 1000 x 4 matrix that records what is behind each of the three doors, plus the winning strategy, for every game. For transparency, and in case you didn’t believe the first simulation, we will also simulate the host’s door reveal:
#Counter initialisation
i <- 0
stay <- 0
switch <- 0
#Door selection array
door <- 1:3
#Door result matrix - creates empty matrix to store all results
doors <- matrix(c(0),nrow = length(N), ncol = 4)
colnames(doors) <- c("Door 1", "Door 2","Door 3","Winning strategy")
#Add prize and selection
for(i in N) {
  #Contestant randomly selects door
  doorSelect <- floor(runif(1,1,4))
  #Prize is randomly hidden behind a door
  doorPrize <- floor(runif(1,1,4))
  if (doorPrize != doorSelect) {
    #If the selected door and the prize door differ, removing both
    #from the list of doors leaves exactly one door for the host to reveal.
    #In this instance, switching would be the winning strategy
    doorOpen <- door[-c(doorPrize,doorSelect)]
    switch <- switch + 1
    #Store result
    doors[i,doorOpen] <- "Revealed"
    doors[i,doorPrize] <- "Prize"
    doors[i,doorSelect] <- "Selected"
    doors[i,4] <- "Switch"
  } else {
    #If the selected door is the prize door, the host can open either
    #of the two remaining doors, so one is chosen at random using the
    #'sample' function, which here draws one element from the vector
    #of remaining doors.
    #In this instance, staying would be the winning strategy
    doorOpen <- sample(door[-c(doorPrize,doorSelect)],1)
    stay <- stay + 1
    #Store result (the prize door is also the selected door)
    doors[i,doorOpen] <- "Revealed"
    doors[i,doorPrize] <- "Prize"
    doors[i,4] <- "Stay"
  }
  #Uncomment the following lines to print results for each run
  # cat("Door opened: ",doorOpen,"\n")
  # cat("Prize door: ",doorPrize,"\n")
  # cat("Current wins (stay): ",stay,"\n")
  # cat("Current wins (switch): ",switch,"\n\n")
}
#Print first ten results:
head(doors,10)
#>       Door 1     Door 2     Door 3     Winning strategy
#>  [1,] "0"        "Revealed" "Prize"    "Stay"
#>  [2,] "Prize"    "Selected" "Revealed" "Switch"
#>  [3,] "Selected" "Prize"    "Revealed" "Switch"
#>  [4,] "Selected" "Prize"    "Revealed" "Switch"
#>  [5,] "Prize"    "Selected" "Revealed" "Switch"
#>  [6,] "Revealed" "Selected" "Prize"    "Switch"
#>  [7,] "Prize"    "Selected" "Revealed" "Switch"
#>  [8,] "Prize"    "Selected" "Revealed" "Switch"
#>  [9,] "Selected" "Revealed" "Prize"    "Switch"
#> [10,] "Prize"    "0"        "Revealed" "Stay"
In this table, “0” marks the door that was neither selected nor opened, and in the “Stay” rows the “Prize” door is also the contestant’s selected door.
The following results were obtained for 1,000 iterations:
print(stay)
#> [1] 313
print(switch)
#> [1] 687
Once again, switching won about two-thirds of the games (687 of 1,000, or 68.7%). This problem is counter-intuitive because additional information is revealed after the initial decision. If no door were opened, the chance of having picked the prize door would be 1/3 (33.3%). However, the host knows where the prize is and never opens the prize door, so his choice is not random: whenever the first pick is wrong (2/3 of the time), he is forced to leave the prize behind the other unopened door. Switching therefore gives a 2/3 (66.7%) chance of winning.
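Both simulations above infer the winning strategy from whether the first pick matched the prize, rather than actually making the switch. The sketch below plays each game end to end, including the host’s reveal and the contestant’s switch. The function ‘play_monty_hall’ and its ‘switch_door’ argument are hypothetical names introduced here, and the code uses ‘sample’ rather than ‘floor(runif())’, so its random draws will not match the earlier runs:

```r
# Hypothetical helper: plays one game and returns TRUE if the contestant wins.
# switch_door = TRUE means the contestant takes the other unopened door.
play_monty_hall <- function(switch_door) {
  doors <- 1:3
  prize <- sample(doors, 1)
  pick  <- sample(doors, 1)

  # The host opens a door that is neither the pick nor the prize.
  # Index with sample.int() so a single remaining door is handled safely
  # (sample(x, 1) on a length-one numeric x would sample from 1:x instead).
  closed_empty <- doors[-c(prize, pick)]
  opened <- closed_empty[sample.int(length(closed_empty), 1)]

  if (switch_door) {
    # Switch to the one door that is neither picked nor opened
    pick <- doors[-c(pick, opened)]
  }
  pick == prize
}

set.seed(42)
n_games <- 10000
win_rate_stay   <- mean(replicate(n_games, play_monty_hall(FALSE)))
win_rate_switch <- mean(replicate(n_games, play_monty_hall(TRUE)))
c(stay = win_rate_stay, switch = win_rate_switch)
```

With 10,000 games per strategy, the two win rates should land close to 1/3 and 2/3 respectively.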
I hope you found this modelling vignette helpful. Please feel free to leave comments and suggestions for improvement. Thanks!