Course: HarvardX PH125.3x | Data Science: Probability

Exercise 1. The Cavs and the Warriors

Two teams, say the Cavs and the Warriors, are playing a seven-game championship series. The first team to win four games wins the series. The teams are equally good, so each has a 50-50 chance of winning each game.

If the Cavs lose the first game, what is the probability that they win the series?

# Assign a variable 'n' as the number of remaining games.
n <- 6

# Assign a variable `outcomes` as a vector of possible game outcomes, where 0 indicates a loss and 1 indicates a win for the Cavs.
outcomes <- c(0,1)

# Assign a variable `l` to a list of all possible outcomes in all remaining games. Use the `rep` function on `list(outcomes)` to create a list of length `n`.
l <- rep(list(outcomes), n)

# Create a data frame named 'possibilities' that contains all combinations of possible outcomes for the remaining games.
possibilities <- expand.grid(l)
possibilities
##    Var1 Var2 Var3 Var4 Var5 Var6
## 1     0    0    0    0    0    0
## 2     1    0    0    0    0    0
## 3     0    1    0    0    0    0
## 4     1    1    0    0    0    0
## 5     0    0    1    0    0    0
## 6     1    0    1    0    0    0
## 7     0    1    1    0    0    0
## 8     1    1    1    0    0    0
## 9     0    0    0    1    0    0
## 10    1    0    0    1    0    0
## 11    0    1    0    1    0    0
## 12    1    1    0    1    0    0
## 13    0    0    1    1    0    0
## 14    1    0    1    1    0    0
## 15    0    1    1    1    0    0
## 16    1    1    1    1    0    0
## 17    0    0    0    0    1    0
## 18    1    0    0    0    1    0
## 19    0    1    0    0    1    0
## 20    1    1    0    0    1    0
## 21    0    0    1    0    1    0
## 22    1    0    1    0    1    0
## 23    0    1    1    0    1    0
## 24    1    1    1    0    1    0
## 25    0    0    0    1    1    0
## 26    1    0    0    1    1    0
## 27    0    1    0    1    1    0
## 28    1    1    0    1    1    0
## 29    0    0    1    1    1    0
## 30    1    0    1    1    1    0
## 31    0    1    1    1    1    0
## 32    1    1    1    1    1    0
## 33    0    0    0    0    0    1
## 34    1    0    0    0    0    1
## 35    0    1    0    0    0    1
## 36    1    1    0    0    0    1
## 37    0    0    1    0    0    1
## 38    1    0    1    0    0    1
## 39    0    1    1    0    0    1
## 40    1    1    1    0    0    1
## 41    0    0    0    1    0    1
## 42    1    0    0    1    0    1
## 43    0    1    0    1    0    1
## 44    1    1    0    1    0    1
## 45    0    0    1    1    0    1
## 46    1    0    1    1    0    1
## 47    0    1    1    1    0    1
## 48    1    1    1    1    0    1
## 49    0    0    0    0    1    1
## 50    1    0    0    0    1    1
## 51    0    1    0    0    1    1
## 52    1    1    0    0    1    1
## 53    0    0    1    0    1    1
## 54    1    0    1    0    1    1
## 55    0    1    1    0    1    1
## 56    1    1    1    0    1    1
## 57    0    0    0    1    1    1
## 58    1    0    0    1    1    1
## 59    0    1    0    1    1    1
## 60    1    1    0    1    1    1
## 61    0    0    1    1    1    1
## 62    1    0    1    1    1    1
## 63    0    1    1    1    1    1
## 64    1    1    1    1    1    1
# Create a vector named 'results' that indicates whether each row in the data frame 'possibilities' contains enough wins for the Cavs to win the series.
results <- rowSums(possibilities)>=4
results
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
## [25] FALSE FALSE FALSE  TRUE FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE
## [37] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE  TRUE  TRUE  TRUE
## [49] FALSE FALSE FALSE  TRUE FALSE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE
## [61]  TRUE  TRUE  TRUE  TRUE
# Calculate the proportion of 'results' in which the Cavs win the series. Print the outcome to the console.
mean(results)
## [1] 0.34375
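
The enumeration above can be cross-checked with a closed-form calculation. The sketch below is an addition to the exercise, not part of the course solution. It assumes the number of Cavs wins in the 6 remaining games follows a Binomial(6, 0.5) distribution, which holds because the games are independent 50-50 trials. The names `n_remaining`, `wins_needed` and `exact_prob` are illustrative.

```r
# Exact cross-check (an addition, not part of the course solution).
# The Cavs need at least 4 wins in the 6 remaining games; each game is an
# independent 50-50 trial, so the win count is Binomial(6, 0.5).
n_remaining <- 6   # games left after the opening loss
wins_needed <- 4   # wins the Cavs still need

# P(X >= 4) = 1 - P(X <= 3)
exact_prob <- 1 - pbinom(wins_needed - 1, size = n_remaining, prob = 0.5)
exact_prob

# Same quantity by counting favorable win counts out of 2^6 sequences.
count_prob <- sum(choose(n_remaining, wins_needed:n_remaining)) / 2^n_remaining
count_prob
```

Both expressions evaluate to 22/64 = 0.34375, which agrees with `mean(results)`.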

Exercise 2. The Cavs and the Warriors - Monte Carlo

Confirm the results of the previous question with a Monte Carlo simulation to estimate the probability of the Cavs winning the series after losing the first game.

Instructions

  • Use the replicate function to replicate the sample code for B <- 10000 simulations.
  • Use the sample function to simulate a series of 6 games with random, independent outcomes of either a loss for the Cavs (0) or a win for the Cavs (1) in that order. Use the default probabilities to sample.
  • Use the sum function to determine whether a simulated series contained at least 4 wins for the Cavs.
  • Use the mean function to find the proportion of simulations in which the Cavs win at least 4 of the remaining games. Print your answer to the console.
# The variable `B` specifies the number of times we want the simulation to run. Let's run the Monte Carlo simulation 10,000 times.
B <- 10000

# Use the `set.seed` function to make sure your answer matches the expected result after random sampling.
set.seed(1)

# Create an object called `results` that replicates for `B` iterations a simulated series and determines whether that series contains at least four wins for the Cavs.
results <- replicate(B, {
  cavs_wins <- sample(c(0,1), 6, replace = TRUE)
  sum(cavs_wins)>=4 
})

# Calculate the frequency out of `B` iterations that the Cavs won at least four games in the remainder of the series. Print your answer to the console. 
mean(results)
## [1] 0.3371
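
The simulated value differs slightly from the exact 0.34375 found in Exercise 1. A rough way to judge whether the gap is plausible is the standard error of a sample proportion, sqrt(p(1 - p) / B). The sketch below is an addition to the exercise; `p_exact`, `n_sims`, `se`, `estimate` and `z` are illustrative names, and `estimate` simply copies the value printed above.

```r
# Illustrative check (an addition): how far can a Monte Carlo estimate
# from 10,000 runs be expected to stray from the exact value?
p_exact <- 22 / 64   # exact probability from Exercise 1
n_sims  <- 10000     # number of simulated series (same as B above)

# Standard error of a proportion estimated from n_sims independent runs.
se <- sqrt(p_exact * (1 - p_exact) / n_sims)
se

# Distance of the reported estimate from the exact value, in standard errors.
estimate <- 0.3371   # value printed above
z <- (estimate - p_exact) / se
z
```

The standard error is a little under 0.005, so an estimate of 0.3371 sits roughly 1.4 standard errors below the exact value, which is ordinary sampling variation.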

Exercise 3. A and B play a series - part 1

Two teams, A and B, are playing a seven-game series. Team A is better than team B and has a p > 0.5 chance of winning each game.

Instructions

  • Use the function sapply to compute the probability that team B wins the series, call it Pr, for p <- seq(0.5, 0.95, 0.025).
  • Then plot the result with plot(p, Pr).
# Let's assign the variable 'p' as the vector of probabilities that team A will win.
p <- seq(0.5, 0.95, 0.025)

# Given a value 'p', the probability of winning the series for the underdog team B can be computed with the following function based on a Monte Carlo simulation:
prob_win <- function(p){
  B <- 10000
  result <- replicate(B, {
    b_win <- sample(c(1,0), 7, replace = TRUE, prob = c(1-p, p))
    sum(b_win)>=4
  })
  mean(result)
}

# Apply the 'prob_win' function across the vector of probabilities that team A will win to determine the probability that team B will win. Call this object 'Pr'.
Pr <- sapply(p, prob_win)

# Plot the probability 'p' on the x-axis and 'Pr' on the y-axis.
plot(p, Pr)
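
The simulated curve can be compared with exact values. The sketch below is an addition to the exercise and assumes, as the simulation does, that all 7 games are played and that team B wins each game independently with probability 1 - p. Counting at least 4 wins out of 7 gives the same series winner as stopping at the first team to reach 4. The name `Pr_exact` is illustrative.

```r
# Exact cross-check (an addition). Team B wins a single game with
# probability 1 - p, so its win count over 7 games is Binomial(7, 1 - p).
p <- seq(0.5, 0.95, 0.025)

# P(B wins at least 4 of 7) = 1 - P(B wins at most 3)
Pr_exact <- 1 - pbinom(3, size = 7, prob = 1 - p)

# Plot the exact curve; compare it visually with the simulated 'Pr'.
plot(p, Pr_exact, type = "b", ylab = "Pr(team B wins the series)")
```

At p = 0.5 the exact value is 0.5, and it falls steadily as team A's per-game advantage grows. The Monte Carlo points should scatter closely around this curve.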

Exercise 4. A and B play a series - part 2

Repeat the previous exercise, but now keep the probability that team A wins fixed at p <- 0.75 and compute the probability that team B wins for different series lengths: a best-of-1 series, best-of-3, best-of-5, and so on up to a series of 25 games.

Instructions

  • Use the seq function to generate a vector of odd numbers ranging from 1 to 25.
  • Use the function sapply to compute the probability, call it Pr, of winning during series of different lengths.
  • Then plot the result with plot(N, Pr).

# Given a series length 'N' and a probability 'p' that team A wins each game (default 0.75), the probability of winning the series for the underdog team B can be computed with the following function based on a Monte Carlo simulation:
prob_win <- function(N, p=0.75){
  B <- 10000
  result <- replicate(B, {
    b_win <- sample(c(1,0), N, replace = TRUE, prob = c(1-p, p))
    sum(b_win)>=(N+1)/2
  })
  mean(result)
}

# Assign the variable 'N' as the vector of series lengths. Use only odd numbers ranging from 1 to 25 games.
N <- seq(1, 25, by=2)

# Apply the 'prob_win' function across the vector of series lengths to determine the probability that team B will win. Call this object `Pr`.
Pr <- sapply(N, prob_win)

# Plot the number of games in the series 'N' on the x-axis and 'Pr' on the y-axis.
plot(N, Pr)
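
As with the seven-game case, the simulated values can be checked against an exact binomial calculation. The sketch below is an addition to the exercise. It assumes all N games are played and that team B wins each one independently with probability 0.25. The names `p_A` and `Pr_exact` are illustrative.

```r
# Exact cross-check (an addition). In a best-of-N series (N odd) team B
# needs at least (N + 1) / 2 wins; its win count is Binomial(N, 1 - p_A).
N   <- seq(1, 25, by = 2)
p_A <- 0.75   # probability that team A wins any single game

# P(B wins >= (N + 1) / 2 games) = 1 - P(B wins <= (N - 1) / 2 games)
Pr_exact <- 1 - pbinom((N - 1) / 2, size = N, prob = 1 - p_A)

# Plot the exact curve; compare it visually with the simulated 'Pr'.
plot(N, Pr_exact, type = "b", ylab = "Pr(team B wins the series)")
```

For a single game the underdog's chance is 0.25, for a best-of-3 it is 0.15625, and it keeps shrinking as the series gets longer: longer series favor the stronger team.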