Some considered the Monty Hall Problem as one of the “statistical illusions”.
Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what’s behind the doors, opens another door, say No. 3, which has a goat. He then says to you, “Do you want to pick door No. 2?” Is it to your advantage to switch your choice? ~ (From Parade magazine’s Ask Marilyn column)
It’s hard to believe that your odd of winning the game will increase if you switch your choice. Just like what the article from www.statisticshowto.com said, Marilyn challenges the reader to simulate the problem if we don’t believe it.
library(ggplot2)
# number of times to repeat the experiment
iter<- 10000
# defining the doors
doors <- c("goat","goat","car")
# initialize dataframe to store the result per iteration
monte_hall <- function(iteration){
contestant_door <- sample(doors, size = iteration, replace = TRUE)
i=1:iteration
#stick_win which is equal to 1 if the contestant_door in current i is car, 0 otherwise.
#switch_win which is equal to 0 if the contestant_door is equal to car, 1 otherwise.
stick_win <- ifelse(contestant_door == 'car',1,0)
switch_win <- ifelse(contestant_door == 'car',0,1)
stick_prob <- cumsum(stick_win)/i
switch_prob <- cumsum(switch_win)/i
#store result in a dataframe
results <- data.frame(i=i, contestant_door=contestant_door,
stick_win=stick_win, switch_win=switch_win,
stick_prob=stick_prob, switch_prob=switch_prob)
return(results)
}
monte_hall_results <- monte_hall(iter)
Let’s check our simulated data. First let’s check the contestant_door column or the result of the door that the contestant pick, of 10k repetitions, 1/3 should be results to car, and 2/3 to goat. Then let’s print some of the data.
summary <- table(monte_hall_results$contestant_door)
df_summary <- data.frame(label = names(summary), count = matrix(summary))
print(df_summary)
## label count
## 1 car 3333
## 2 goat 6667
head(monte_hall_results)
## i contestant_door stick_win switch_win stick_prob switch_prob
## 1 1 goat 0 1 0.00 1.00
## 2 2 goat 0 1 0.00 1.00
## 3 3 goat 0 1 0.00 1.00
## 4 4 car 1 0 0.25 0.75
## 5 5 car 1 0 0.40 0.60
## 6 6 car 1 0 0.50 0.50
You can see that if the contestant play the show 10k times, 0.3333, 0.6667 of the time we got the car which is very close to the theoritical probability of winning (\(1/3\) or 33.33%).
Let’s summarize the chance of winning if we stick vs we switch.
print(paste("Chance of winning if we always stick from our first choice: ", sum(monte_hall_results$stick_win)/iter))
## [1] "Chance of winning if we always stick from our first choice: 0.3333"
print(paste("Chance of winning if we switch from our first choice: ", sum(monte_hall_results$switch_win)/iter))
## [1] "Chance of winning if we switch from our first choice: 0.6667"
Let’s plot the estimated probability of winning by sticking to the first choise vs switching to the other door by the number of iterations:
g <- ggplot(monte_hall_results, mapping = aes(x=i, y=stick_prob))
g + geom_line(color="#3333ff") +
geom_hline(yintercept = 1/3, color = '#8080ff') +
geom_line(aes(y=switch_prob), color= "#ff751a") +
geom_hline(yintercept = 2/3, color = '#ff944d') +
ylab('Est.Probability')+xlab('Iteration') +
geom_label(data = data.frame(label = c('switch', 'stick'), i = c(10000,10000), stick_prob = c(0.75,0.25)),
aes(label = label, color=label),
show.legend = FALSE
)+
scale_color_manual(values = c("#3333ff", "#ff751a"))+
ggtitle("Estimated Probability of Winning")