Goal:
The “Monty Hall Problem” is a famous probability puzzle named after the host of the American game show “Let’s Make a Deal” (original run 1963–1976). The main premise is this:
“Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what’s behind the doors, opens another door, say No. 3, which has a goat. He then says to you, “Do you want to pick door No. 2?” Is it to your advantage to switch your choice or to stick with your original choice?”
I have to admit that my initial thought was “it doesn’t matter”: there are two remaining doors, so it’s 50/50 as to which door has the car.
It turns out that my initial thought was wrong. However, I guess I’m in good company with PhD mathematicians (see the Wikipedia article: https://en.wikipedia.org/wiki/Monty_Hall_problem).
The two key analytical principles that I missed in my original assessment are:

- When you pick 1 of 3 doors, you have a 1/3 chance of picking the door with the car behind it and, inversely, a 2/3 chance of NOT picking the door with the car.
- Because the host’s choice depends on your choice (versus just having two doors to choose from to begin with), the original odds of the car being behind one of your unchosen doors are still 2/3; the host has just politely eliminated an incorrect choice.
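To put the first principle in numbers, here’s a quick sketch I ran as a sanity check (separate from the full simulation below; the seed and trial count are arbitrary). It just counts how often a random guess lands on a randomly placed car, and since switching wins exactly when the initial guess was wrong, the two proportions should come out near 1/3 and 2/3:
set.seed(42)                                  # arbitrary seed, just for reproducibility
car   = sample(1:3, 100000, replace = TRUE)   # random car placement
guess = sample(1:3, 100000, replace = TRUE)   # random initial guess
mean(guess == car)   # ~0.333: staying wins only when the first guess was right
mean(guess != car)   # ~0.667: switching wins whenever the first guess was wrong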
Given that there are 3 doors, you pick one, and the host then reveals all but one of the remaining doors. Let’s run 1000 simulations that assume random positions for both the car and your initial guess, and put the results in a data frame:
# Packages used throughout this post: tibble/dplyr for data wrangling, DT for the
# interactive table, scales for percent formatting, ggplot2/gridExtra for the plots
library(tibble)
library(dplyr)
library(DT)
library(scales)
library(ggplot2)
library(gridExtra)

# A function to simulate the Monty Hall problem
GetMonty = function(Doors, N) {
  Doors = 1:Doors
  for (i in 1:N) {
    TempDF = tibble(
      DoorCount = max(Doors),
      Car = sample(Doors, 1),    # randomize which door has the car
      Guess = sample(Doors, 1))  # guess a door at random
    TempDF = TempDF %>%
      mutate(
        # The one unpicked door the host leaves closed (your "switch" option):
        # if you guessed the car, a random goat door; otherwise it must be the car
        NotRevealed = if_else(Car == Guess, sample(Doors[!(Doors %in% c(Car, Guess))], 1), Car),
        # Which strategy wins this game, plus 0/1 result columns for each strategy
        Strategy = if_else(Guess == Car, "Stay", "Switch"),
        ResultStay = if_else(Guess == Car, 1, 0),
        ResultSwitch = if_else(ResultStay == 0, 1, 0)
      )
    if (i == 1) {
      ResultDF = TempDF
    } else {
      ResultDF = bind_rows(ResultDF, TempDF)
    }
  }
  return(ResultDF)
}
SimulationDF = GetMonty(3, 1000)
datatable(SimulationDF)
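As a quick sanity check on the simulated games (my own addition, not part of the original workflow): the door left unrevealed should never be your guess, and whenever your guess was wrong it must be the car.
# Both checks should return TRUE across all 1000 simulated games
all(SimulationDF$NotRevealed != SimulationDF$Guess)
all(SimulationDF$NotRevealed[SimulationDF$Guess != SimulationDF$Car] ==
      SimulationDF$Car[SimulationDF$Guess != SimulationDF$Car])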
Now let’s summarize and visualize the results of switching versus staying:
SimulationDF %>%
  mutate(GrandTotal = n()) %>%
  group_by(Strategy, GrandTotal) %>%
  summarize(StrategyCount = n(),
            StrategyPercent = percent(n() / min(GrandTotal), accuracy = 0.1),
            .groups = "drop") %>%
  select(Strategy, StrategyPercent) %>%
  PrettyTable(paste0("Simulation Result - Doors = ", max(SimulationDF$Car), ", n = ", nrow(SimulationDF)))
| Strategy | StrategyPercent |
|---|---|
| Stay | 34.9% |
| Switch | 65.1% |
CarPlot = SimulationDF %>%
  group_by(Car) %>%
  summarize(n = n(), .groups = "drop") %>%
  ggplot(aes(Car, n)) +
  geom_col(fill = "blue") +
  geom_label(aes(Car, n * 0.95, label = n), fill = "white") +
  theme_classic() +
  labs(title = "Distribution of Cars", y = "", x = "Door")

GuessPlot = SimulationDF %>%
  group_by(Guess) %>%
  summarize(n = n(), .groups = "drop") %>%
  ggplot(aes(Guess, n)) +
  geom_col(fill = "lightblue") +
  geom_label(aes(Guess, n * 0.95, label = n), fill = "white") +
  theme_classic() +
  labs(title = "Distribution of Guesses", y = "", x = "Door")

grid.arrange(CarPlot, GuessPlot, ncol = 1)
Because the data is random, the result isn’t a perfect 2/3 winning probability for switching, but switching is clearly the better strategy.
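If you want reassurance that the gap from exactly 2/3 is just sampling noise, one optional check (not part of the original analysis) is a binomial test on the switch wins; the 651 below is simply 65.1% of the 1000 simulations in the table above:
# Is 651 switch wins out of 1000 consistent with a true win rate of 2/3?
binom.test(651, 1000, p = 2/3)
# With these counts the confidence interval comfortably covers 2/3.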
So that’s a fairly convincing proof, but let’s make this a bit more intuitive by looking at an example where there are 10 doors.
In this example, just like the first, the host will reveal all but one of the remaining doors. That’s right: he will reveal 8 doors that the car is NOT behind.
So your odds of NOT picking the door with the car were 9/10, and the host has now eliminated 8 incorrect choices. This was the real aha moment for me as to just how much information the host is giving you in this puzzle.
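That reasoning generalizes: assuming the host always opens every goat door except one, switching wins whenever your first guess was wrong, so the switch win probability with n doors is (n - 1) / n. A tiny sketch (the function name p_switch is just for illustration):
# Probability that switching wins with n doors
p_switch = function(n_doors) (n_doors - 1) / n_doors
p_switch(3)    # 2/3, the 3-door case above
p_switch(10)   # 9/10, the 10-door case below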
Here’s what happens if we run 1000 simulations on the 10-door scenario:
SimulationDF = GetMonty(10, 1000)

CarPlot = SimulationDF %>%
  group_by(Car) %>%
  summarize(n = n(), .groups = "drop") %>%
  ggplot(aes(Car, n)) +
  geom_col(fill = "blue") +
  geom_label(aes(Car, n * 0.95, label = n), fill = "white") +
  scale_x_continuous(breaks = 1:10) +
  theme_classic() +
  labs(title = "Distribution of Cars", y = "", x = "Door")

GuessPlot = SimulationDF %>%
  group_by(Guess) %>%
  summarize(n = n(), .groups = "drop") %>%
  ggplot(aes(Guess, n)) +
  geom_col(fill = "lightblue") +
  geom_label(aes(Guess, n * 0.95, label = n), fill = "white") +
  scale_x_continuous(breaks = 1:10) +
  theme_classic() +
  labs(title = "Distribution of Guesses", y = "", x = "Door")

grid.arrange(CarPlot, GuessPlot, ncol = 1)
SimulationDF %>%
  mutate(GrandTotal = n()) %>%
  group_by(Strategy, GrandTotal) %>%
  summarize(StrategyCount = n(),
            StrategyPercent = percent(n() / min(GrandTotal), accuracy = 0.1),
            .groups = "drop") %>%
  select(Strategy, StrategyPercent) %>%
  PrettyTable(paste0("Simulation Result - Doors = ", max(SimulationDF$Car), ", n = ", nrow(SimulationDF)))
| Strategy | StrategyPercent |
|---|---|
| Stay | 10.0% |
| Switch | 90.0% |
Now it’s VERY obvious that switching is the better strategy, and the advantage only grows as more doors are involved.
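To make that last point concrete, here’s a small sketch that reuses GetMonty across a few door counts (the specific counts and the purrr dependency are arbitrary choices here) and reports the switch win rate for each; with 20 doors it should hover around 19/20 = 95%:
library(purrr)

# Switch win rate by number of doors; each row is 1000 fresh simulations
map_dfr(c(3, 5, 10, 20), function(d) {
  sims = GetMonty(d, 1000)
  tibble(Doors = d, SwitchWinRate = mean(sims$ResultSwitch))
})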
This concludes our investigation of this paradox, but regardless of your selection, make it with Savvy.