1 Introduction

In team sports, there has long been discussion of how playing at home can give the home team an unfair advantage. Sports like soccer have mitigated this by giving more weight to the points scored by the away team. This deliverable examines the impact of home field advantage on the Braves' probability of winning the World Series.

2 Method

The method used for estimating the impact of home advantage is shown in the code chunk below. It uses three probability values. pb is the probability that the Braves win a single game when there is no home field advantage. It is set to 0.55: \[P(B)=0.55\]

pbh is the probability that the Braves win a game played at home in ATL. It is calculated as \[P(B)_h = P(B) \times \text{advantage}\]

pba is the probability that the Braves win a game played away in NYC. It is calculated as \[P(B)_a = 1 - (1 - P(B)) \times \text{advantage}\]
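
As a worked example, taking P(B) = 0.55 together with the multiplier value of 1.1 used in this report, the two formulas evaluate as follows. This is only arithmetic on the stated definitions:

\[P(B)_h = 0.55 \times 1.1 = 0.605\]

\[P(B)_a = 1 - (1 - 0.55) \times 1.1 = 1 - 0.495 = 0.505\]

So the multiplier raises the Braves' single-game win probability at home from 0.55 to 0.605 and lowers it on the road to 0.505, because the opponent's win probability of 0.45 is scaled up by the same factor.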

advantage is the advantage multiplier. It is set to 1.1 when home advantage exists and to 1 when there is none. The function also loads a data set of all possible World Series outcomes, one per row. For each row, the per-game probabilities implied by that outcome are multiplied together to give the probability of the outcome. As a sanity check, the function confirms that these probabilities sum to 1 across all rows; if they do, it adds up the probabilities of the outcomes in which the Braves win the series and returns the total.

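library(dplyr)
#install.packages("data.table")
library(data.table)
library(ggplot2)   # used for the plots in Section 3
library(patchwork) # provides inset_element(), used in the plots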
# Get all possible outcomes
calc <- function(vec, adv, p) {
  apo <- fread("https://raw.githubusercontent.com/thomasgstewart/data-science-5620-fall-2021/master/deliverables/assets/all-possible-world-series-outcomes.csv")

  # Home field indicator: 1 = game played in ATL, 0 = game played in NYC
  hfi <- vec # e.g. c(1,1,0,0,0,1,1) for {ATL, ATL, NYC, NYC, NYC, ATL, ATL}

  # P_B
  pb <- p
  advantage_multiplier <- adv # Set = 1 for no advantage
  pbh <- p*advantage_multiplier
  pba <- 1 - (1 - p)*advantage_multiplier

  # Calculate the probability of each possible outcome
  apo[, p := NA_real_] # Initialize new column in apo to store prob
  for(i in 1:nrow(apo)) {
    prob_game <- rep(1, 7)
    for(j in 1:7) {
      p_win <- ifelse(hfi[j], pbh, pba)
      prob_game[j] <- case_when(
          apo[i,j,with=FALSE] == "W" ~ p_win
        , apo[i,j,with=FALSE] == "L" ~ 1 - p_win
        , TRUE ~ 1
        )
      }
    apo[i, p := prod(prob_game)] # Data.table syntax
    }
  # Sanity check: does sum(p) == 1? (apo[, sum(p)] is data.table notation)
  # all.equal() returns a character string on mismatch, so wrap it in isTRUE()
  if (isTRUE(all.equal(apo[, sum(p)], 1))) {
     return(as.double(apo[, sum(p), overall_outcome][1,2])) 
    # Probability of overall World Series outcomes
  }
}
calc(c(1,1,0,0,0,1,1), 1.1, 0.55)
## [1] 0.6345261
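
As a cross-check that does not depend on the downloaded table of outcomes, the same probability can be sketched by enumerating all 2^7 win/loss patterns for seven games. This relies on the assumption that playing out all seven games and requiring at least four wins picks the same series winner as stopping once a team reaches four wins. The function name series_win_prob and its arguments are hypothetical and not part of the original code.

```r
# Hypothetical helper (not from the original report): exact series win
# probability by enumerating every win/loss pattern over seven games.
series_win_prob <- function(home, adv, p) {
  pbh <- p * adv                 # Braves win probability in a home game
  pba <- 1 - (1 - p) * adv       # Braves win probability in an away game
  p_game <- ifelse(home == 1, pbh, pba)

  # All 2^7 = 128 patterns; 1 = Braves win, 0 = Braves loss
  outcomes <- as.matrix(expand.grid(rep(list(0:1), 7)))

  # Probability of each pattern is the product of per-game probabilities
  pattern_prob <- apply(outcomes, 1, function(w) {
    prod(ifelse(w == 1, p_game, 1 - p_game))
  })

  # Braves take the series when they win at least four of the seven games
  sum(pattern_prob[rowSums(outcomes) >= 4])
}

# Same inputs as the call above: {ATL, ATL, NYC, NYC, NYC, ATL, ATL}
series_win_prob(c(1, 1, 0, 0, 0, 1, 1), 1.1, 0.55)
```

Under that assumption the result should reproduce the 0.6345261 returned by calc, up to floating-point error.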

3 Questions & Answers

  1. Compute analytically the probability that the Braves win the World Series when the sequence of game locations is {NYC, NYC, ATL, ATL, ATL, NYC, NYC}. (The code below computes the probability for the alternative sequence of game locations. Note: The code uses data.table syntax, which may be new to you. This is intentional, as a gentle way to introduce data.table.) Calculate the probability with and without home field advantage when P(B)=0.55. What is the difference in probabilities?

Home advantage effect

ha <- calc(c(0,0,1,1,1,0,0), 1.1, 0.55) # chances of winning with home advantage

The probability that the Braves will win with home advantage is 0.604221.

No home advantage

n_ha <- calc(c(0,0,1,1,1,0,0), 1, 0.55) # chances of winning without home advantage

The probability that the Braves will win without home advantage is 0.6082878.
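
Without home advantage every game has the same win probability, so this value can also be checked with a closed form. A minimal sketch, assuming the series is a best-of-seven in which the Braves must win four games before losing four: the number of losses before the fourth win then follows a negative binomial distribution, available in base R as pnbinom.

```r
p <- 0.55

# P(at most 3 losses before the 4th win), i.e. the Braves win the series
closed_form <- pnbinom(3, size = 4, prob = p)

# The same quantity written out term by term:
# sum over k = 0..3 losses of choose(3 + k, k) * p^4 * (1 - p)^k
by_hand <- sum(choose(3 + 0:3, 0:3) * p^4 * (1 - p)^(0:3))

c(closed_form = closed_form, by_hand = by_hand)
```

Both expressions evaluate to approximately 0.6082878, which matches the value reported above.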

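The difference in probabilities (with minus without home advantage) is -0.0040668, so home field advantage lowers the Braves' chances by about 0.004 for this sequence, in which four of the seven games are played in NYC.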

  2. Calculate the same probabilities as the previous question by simulation.
# Function for running the simulation; its parameter is the advantage multiplier
calc2 <- function(adv) {
  pb <- 0.55
  advantage_multiplier <- adv
  pbh <- pb*advantage_multiplier
  pba <- 1 - (1 - pb)*advantage_multiplier
  d <- data.frame()
  set.seed(0)
  for (i in 1:5000) {
      n <- rbinom(7,1, c(pba,pba,pbh,pbh,pbh,pba,pba))
      # Our probability vector is c(pba,pba,pbh,pbh,pbh,pba,pba)
      # because the order is {NYC, NYC, ATL, ATL, ATL, NYC, NYC}
      for (j in 1:7) {
        d[i,j] <- n[j] # appending the cells with outcome of the game. 1 stands for W
      }                # and 0 stands for L
  }
  d <- d %>% 
    mutate(count = V1+V2+V3+V4+V5+V6+V7) %>% 
    mutate(win = if_else(count >= 4, "W", "L")) # selecting cases in which the Braves win
  d_win <- d %>% 
    filter(win == "W")
  return(nrow(d_win)/nrow(d))
}
prob_b <- calc2(1) # Probability without home advantage
prob_ba <- calc2(1.1) # Probability with home advantage

The simulated probability that the Braves will win without home advantage is 0.6102. The simulated probability that the Braves will win with home advantage is 0.61.
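
The simulation above fills a data frame one cell at a time. As an alternative sketch, the same experiment can be drawn in a single call and stored in a matrix. The function name simulate_series, the argument n_sims, and the default seed are illustrative choices rather than part of the original code, and the estimates will not necessarily match the figures reported above digit for digit.

```r
# Hypothetical vectorized variant of the simulation (not the original code)
simulate_series <- function(adv, p = 0.55, n_sims = 5000, seed = 0) {
  pbh <- p * adv
  pba <- 1 - (1 - p) * adv
  # Game locations {NYC, NYC, ATL, ATL, ATL, NYC, NYC}
  p_game <- c(pba, pba, pbh, pbh, pbh, pba, pba)

  set.seed(seed)
  # The prob vector repeats p_game, so consecutive draws cycle through
  # games 1..7; byrow = TRUE then puts each simulated series in its own row
  games <- matrix(rbinom(7 * n_sims, 1, rep(p_game, times = n_sims)),
                  ncol = 7, byrow = TRUE)

  # Share of simulated series in which the Braves win at least four games
  mean(rowSums(games) >= 4)
}

simulate_series(1)    # without home advantage
simulate_series(1.1)  # with home advantage
```

Avoiding the cell-by-cell assignment makes it cheap to raise n_sims, which is the main lever for reducing simulation error.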

  3. What is the absolute and relative error for your simulation in the previous question?
abs_p_br <- abs(prob_b - n_ha)
rel_p_br <- abs(prob_b - n_ha)/n_ha
abs_p_ba <- abs(prob_ba - ha)
rel_p_ba <- abs(prob_ba - ha)/ha

Absolute error for the case without home advantage is 0.0019122. Relative error is 0.0031436.

Absolute error for the case with home advantage is 0.005779. Relative error is 0.0095644.
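
To put these errors in context, one can compare them with the Monte Carlo standard error of a simulated proportion, sqrt(p(1 - p)/n). A rough sketch, using the analytic probabilities and the 5000 replicates from above; the variable names are illustrative.

```r
n_sims <- 5000

# Analytic probabilities reported earlier in this section
p_no_adv   <- 0.6082878
p_with_adv <- 0.604221

# Standard error of a proportion estimated from n_sims independent series
se_no_adv   <- sqrt(p_no_adv * (1 - p_no_adv) / n_sims)
se_with_adv <- sqrt(p_with_adv * (1 - p_with_adv) / n_sims)

round(c(se_no_adv = se_no_adv, se_with_adv = se_with_adv), 4)
```

Both standard errors are roughly 0.007, so the absolute errors of 0.0019 and 0.0058 each fall within about one standard error of the analytic value.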

  4. Does the difference in probabilities (with vs without home field advantage) depend on P(B)? (Generate a plot to answer this question.)
vec <- seq(0.51,1,0.01)
pha <- c() # probability w/ home advantage
pnha <- c() # probability w/o home advantage
count <- 1
for (i in vec) {
  pha[count] <- calc(c(0,0,1,1,1,0,0), 1.1, i)
  pnha[count] <- calc(c(0,0,1,1,1,0,0), 1, i)
  count <- count + 1
}
new_df <- data.frame(prob = vec, yes = pha, no = pnha) # creating a data frame with
new_df <- new_df %>%                                   # the probabilities and the
  mutate(diff = yes - no)                              # outcomes of the impacts of
head(new_df)                                           # home advantage
##   prob       yes        no         diff
## 1 0.51 0.5084578 0.5218663 -0.013408500
## 2 0.52 0.5326106 0.5436801 -0.011069469
## 3 0.53 0.5566685 0.5653893 -0.008720791
## 4 0.54 0.5805615 0.5869421 -0.006380600
## 5 0.55 0.6042210 0.6082878 -0.004066825
## 6 0.56 0.6275792 0.6293763 -0.001797034

Plot

ggplot(new_df) +
  geom_point(aes(prob, diff), color = "blue") +
  geom_line(aes(prob, diff), color = "red") +
  labs(title="The impact of P(B) on the home advantage results",
       x = "Probability",
       y = "Difference in the probability of winning") +
  theme_classic() +
  theme(plot.title = element_text(hjust=0.5)) +
  inset_element(p = img_r,
                left = 0.6,
                bottom = 0.5,
                right = 0.4,
                top = 0.65)

According to the plot, the difference in probabilities does change with P(B), but its range is small, so it cannot be clearly inferred that P(B) has a large impact. The difference is negative for the smallest values of P(B) (for example, -0.0134 at P(B) = 0.51), while most of the differences for P(B) > 0.57 are positive.
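
One caveat when reading this plot: the formulas for the home and away probabilities only yield valid probabilities over part of the range of P(B) swept here. With an advantage multiplier of 1.1, the home probability stays at or below 1 only when

\[P(B)_h = 1.1 \times P(B) \le 1 \iff P(B) \le \frac{1}{1.1} \approx 0.909\]

so the points for P(B) above roughly 0.909 in the sweep from 0.51 to 1 correspond to a home win probability greater than 1 and should be interpreted with caution.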

  5. Does the difference in probabilities (with vs without home field advantage) depend on the advantage factor? (The advantage factor in PBH and PBA is the 1.1 multiplier that results in a 10% increase for the home team. Generate a plot to answer this question.)
vec <- seq(0.1,3,0.05)
adv <- c()
count <- 1
for (i in vec) {
  adv[count] <- calc(c(0,0,1,1,1,0,0), i, 0.55) # probability w/ different 
  count <- count + 1                             # advantage multipliers
}   
new_df2 <- data.frame(adv_mult = vec, yes = adv, no = calc(c(0,0,1,1,1,0,0), 1, 0.55))
new_df2 <- new_df2 %>% 
  mutate(diff = yes - no)
head(new_df2)
##   adv_mult       yes        no       diff
## 1     0.10 0.8563581 0.6082878 0.24807029
## 2     0.15 0.8064258 0.6082878 0.19813801
## 3     0.20 0.7671446 0.6082878 0.15885678
## 4     0.25 0.7362330 0.6082878 0.12794517
## 5     0.30 0.7118517 0.6082878 0.10356391
## 6     0.35 0.6925327 0.6082878 0.08424489

Plot

ggplot(new_df2) +
  geom_point(aes(adv_mult, diff), color = "blue", size=2) +
  geom_line(aes(adv_mult, diff), color = "red") +
  labs(title="The impact of advantage multiplier on the home advantage results",
       x = "Advantage Multiplier",
       y = "Difference in the probability of winning") +
  theme_classic() +
  theme(plot.title = element_text(hjust=0.5)) +
  inset_element(p = img_r,
                left = 0.5,
                bottom = 0.6,
                right = 0.75,
                top = 0.75)

According to the plot, the advantage multiplier has a large impact on the difference in probabilities, since the size of the multiplier determines the magnitude of the difference. One can observe that increasing the multiplier from 1 to 3 increases the magnitude of the difference more than threefold. The relationship is also non-linear: in the table above, equal steps of 0.05 in the multiplier change the difference by unequal amounts (about 0.050 between 0.10 and 0.15, but about 0.019 between 0.30 and 0.35).
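
When interpreting the ends of this plot, it may help to check which multipliers keep both per-game probabilities between 0 and 1. A small sketch under the definitions used in this report; the helper name valid_probs is hypothetical.

```r
# Hypothetical helper: TRUE when both per-game probabilities are valid
valid_probs <- function(adv, p = 0.55) {
  pbh <- p * adv
  pba <- 1 - (1 - p) * adv
  pbh >= 0 & pbh <= 1 & pba >= 0 & pba <= 1
}

adv_grid <- seq(0.1, 3, 0.05)            # same grid as the sweep above
valid_range <- range(adv_grid[valid_probs(adv_grid)])
valid_range                              # multipliers giving valid probabilities
```

For P(B) = 0.55 the home probability exceeds 1 once the multiplier passes 1/0.55, which is about 1.82, so points beyond that on the plot do not correspond to valid probabilities.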

4 Conclusion

This deliverable showcases one way in which statistical concepts can be used to estimate how home advantage affects a sports team's performance. In this case, we measured the probability of the Braves winning the World Series using a parameter called the advantage multiplier, which served as a means to gauge the effect of home field on their chances of winning.