In team sports, there has long been a discussion of how playing at home gives the home team an unfair advantage. Some sports, such as soccer, have mitigated this advantage by attributing more weight to the goals scored by the away team. This deliverable examines the impact of home field advantage on the Braves' chances of winning the World Series.
The method used to estimate the impact of home advantage is implemented in the code below. It uses three probabilities. pb is the probability that the Braves win a single game when there is no home field advantage. It is set to 0.55: \[P(B)=0.55\]
pbh is the probability that the Braves win a game played at home in ATL. It is calculated as \[P(B)_h = P(B) \times \text{advantage}\]
pba is the probability that the Braves win a game played away in NYC. Here the multiplier scales the opponent's probability of winning, \(1 - P(B)\), so it is calculated as \[P(B)_a = 1 - (1 - P(B)) \times \text{advantage}\]
advantage is the advantage multiplier. It is set to 1.1 when home advantage exists and to 1 when it does not. A data set of all possible series outcomes is loaded inside the function, with each row representing one sequence of game results. For each row, the per-game probabilities implied by that sequence are multiplied together to give the probability of the sequence. As a sanity check, the probabilities of all sequences are then summed. If the sum is 1, the probabilities of the sequences in which the Braves win the series are added together and returned.
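library(dplyr)
#install.packages("data.table")
library(data.table)
library(ggplot2)    # used for the plots further down
library(patchwork)  # provides inset_element()

# Probability that the Braves win the series.
#   vec: home field indicator for each game (1 = ATL, 0 = NYC)
#   adv: advantage multiplier (1 = no home advantage)
#   p:   P(B), the probability that the Braves win a single game
calc <- function(vec, adv, p) {
  # Get all possible outcomes
  apo <- fread("https://raw.githubusercontent.com/thomasgstewart/data-science-5620-fall-2021/master/deliverables/assets/all-possible-world-series-outcomes.csv")
  # Home field indicator, e.g. c(1,1,0,0,0,1,1) for {ATL, ATL, NYC, NYC, NYC, ATL, ATL}
  hfi <- vec
  # P_B
  pb <- p
  advantage_multiplier <- adv # Set = 1 for no advantage
  pbh <- p*advantage_multiplier
  pba <- 1 - (1 - p)*advantage_multiplier
  # Calculate the probability of each possible outcome
  apo[, p := NA_real_] # Initialize new column in apo to store prob
  for(i in 1:nrow(apo)) {
    prob_game <- rep(1, 7)
    for(j in 1:7) {
      p_win <- ifelse(hfi[j], pbh, pba)
      prob_game[j] <- case_when(
        apo[i,j,with=FALSE] == "W" ~ p_win
        , apo[i,j,with=FALSE] == "L" ~ 1 - p_win
        , TRUE ~ 1 # game not played
      )
    }
    apo[i, p := prod(prob_game)] # Data.table syntax
  }
  # Sanity check: does sum(p) == 1?
  # all.equal() returns a character vector on mismatch, so wrap it in isTRUE()
  if (!isTRUE(all.equal(apo[, sum(p)], 1))) { # This is data.table notation
    stop("outcome probabilities do not sum to 1")
  }
  # Sum p by overall World Series outcome and return the first group's total
  # (the sequences in which the Braves win)
  return(as.double(apo[, sum(p), overall_outcome][1,2]))
}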
calc(c(1,1,0,0,0,1,1), 1.1, 0.55)
## [1] 0.6345261
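To make the three probabilities concrete, the following is a minimal sketch of the per-game values for P(B) = 0.55 and a multiplier of 1.1, together with the probability of one series sequence. The sequence itself and the use of NA for unplayed games are assumptions made for illustration; they are not taken from the outcomes file.

```r
pb <- 0.55
advantage <- 1.1
pbh <- pb * advantage            # Braves win a game in ATL: 0.605
pba <- 1 - (1 - pb) * advantage  # Braves win a game in NYC: 0.505

# Hypothetical sequence: the Braves win games 1, 2, 4 and 5 and lose game 3,
# so games 6 and 7 are not played (encoded here as NA, an assumption)
outcome <- c("W", "W", "L", "W", "W", NA, NA)
hfi <- c(0, 0, 1, 1, 1, 0, 0)    # 1 = game played in ATL

# Per-game probability that the Braves win, given the venue
p_win <- ifelse(hfi == 1, pbh, pba)

# Probability of the observed result in each game; unplayed games contribute 1
p_game <- ifelse(is.na(outcome), 1,
                 ifelse(outcome == "W", p_win, 1 - p_win))

# Probability of the whole sequence
seq_prob <- prod(p_game)
seq_prob
```

The product is 0.505 × 0.505 × 0.395 × 0.605 × 0.605, which is the same multiplication that calc performs for each row.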
Home advantage effect
ha <- calc(c(0,0,1,1,1,0,0), 1.1, 0.55) # chances of winning with home advantage
The probability that the Braves will win with home advantage is 0.604221.
No home advantage
n_ha <- calc(c(0,0,1,1,1,0,0), 1, 0.55) # chances of winning without home advantage
The probability that the Braves will win without home advantage is 0.6082878.
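The difference in probabilities is 0.0040668. With this schedule the Braves play only three of the seven possible games in ATL, so home advantage lowers their probability of winning the series.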
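The two figures above can be cross-checked without the outcomes file. The sketch below assumes that winning a best-of-seven series is equivalent to winning at least four games if all seven were played, and enumerates all 2^7 win/loss patterns. The function name series_win_prob is hypothetical and is not part of the original analysis.

```r
# Hypothetical helper: series win probability by brute-force enumeration
series_win_prob <- function(hfi, adv, p) {
  pbh <- p * adv
  pba <- 1 - (1 - p) * adv
  # Per-game probability that the Braves win, given the venue
  p_game <- ifelse(hfi == 1, pbh, pba)
  # All win/loss patterns over the games: 1 = Braves win, 0 = Braves lose
  outcomes <- as.matrix(expand.grid(rep(list(0:1), length(hfi))))
  # Probability of each pattern
  probs <- apply(outcomes, 1, function(w) {
    prod(ifelse(w == 1, p_game, 1 - p_game))
  })
  # Patterns with at least four wins are series wins
  sum(probs[rowSums(outcomes) >= 4])
}

hfi <- c(0, 0, 1, 1, 1, 0, 0)
with_ha    <- series_win_prob(hfi, 1.1, 0.55)  # approximately 0.604221
without_ha <- series_win_prob(hfi, 1,   0.55)  # approximately 0.6082878

# With no home advantage every game has the same probability, so the
# binomial tail gives the same answer
binom_check <- pbinom(3, 7, 0.55, lower.tail = FALSE)
```

Agreement between this enumeration and calc suggests that the sequence-based calculation is behaving as intended.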
# Function for running the simulation; its parameter is the advantage multiplier
calc2 <- function(adv) {
  pb <- 0.55
  advantage_multiplier <- adv
  pbh <- pb*advantage_multiplier
  pba <- 1 - (1 - pb)*advantage_multiplier
  d <- data.frame()
  set.seed(0)
  for (i in 1:5000) {
    # Our probability vector is c(pba,pba,pbh,pbh,pbh,pba,pba)
    # because the order is {NYC, NYC, ATL, ATL, ATL, NYC, NYC},
    # matching the indicator c(0,0,1,1,1,0,0) passed to calc()
    n <- rbinom(7,1, c(pba,pba,pbh,pbh,pbh,pba,pba))
    for (j in 1:7) {
      d[i,j] <- n[j] # appending the cells with the outcome of the game:
    }                # 1 stands for W and 0 stands for L
  }
  d <- d %>%
    mutate(count = V1+V2+V3+V4+V5+V6+V7) %>%
    mutate(win = if_else(count >= 4, "W", "L")) # selecting cases in which the Braves win
  d_win <- d %>%
    filter(win == "W")
  return(nrow(d_win)/nrow(d))
}
prob_b <- calc2(1) # Probability without home advantage
prob_ba <- calc2(1.1) # Probability with home advantage
The simulated probability that the Braves will win without home advantage is 0.6102. The simulated probability that the Braves will win with home advantage is 0.61.
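The same simulation can be written without growing a data frame cell by cell. The following is a sketch of a vectorised variant. The function name simulate_series and its default arguments are hypothetical, and the estimate it returns is not guaranteed to match the figures above digit for digit.

```r
# Hypothetical vectorised variant of the simulation
simulate_series <- function(adv, pb = 0.55, hfi = c(0, 0, 1, 1, 1, 0, 0),
                            n_sims = 5000, seed = 0) {
  set.seed(seed)
  # Per-game probability that the Braves win, given the venue
  p_game <- ifelse(hfi == 1, pb * adv, 1 - (1 - pb) * adv)
  # rbinom() recycles p_game, and matrix() fills column by column,
  # so each column holds the seven games of one simulated series
  games <- matrix(rbinom(length(hfi) * n_sims, 1, p_game),
                  nrow = length(hfi))
  # Share of simulated series with at least four Braves wins
  mean(colSums(games) >= 4)
}

sim_no_ha <- simulate_series(1)    # without home advantage
sim_ha    <- simulate_series(1.1)  # with home advantage
```

Storing the draws in a matrix avoids repeated data frame assignment inside the loop, which tends to dominate the run time of the original version.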
abs_p_br <- abs(prob_b - n_ha)
rel_p_br <- abs(prob_b - n_ha)/n_ha
abs_p_ba <- abs(prob_ba - ha)
rel_p_ba <- abs(prob_ba - ha)/ha
Absolute error for the case without home advantage is 0.0019122. Relative error is 0.0031436.
Absolute error for the case with home advantage is 0.005779. Relative error is 0.0095644.
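One way to judge whether these errors are large is to compare them with the sampling variability of a proportion estimated from 5000 simulated series. The sketch below assumes the usual binomial standard error formula and takes the analytic probabilities reported earlier as the true values.

```r
n_sims <- 5000
p_no_ha <- 0.6082878  # analytic probability without home advantage
p_ha    <- 0.604221   # analytic probability with home advantage

# Standard error of a simulated proportion based on n_sims series
se_no_ha <- sqrt(p_no_ha * (1 - p_no_ha) / n_sims)  # roughly 0.0069
se_ha    <- sqrt(p_ha * (1 - p_ha) / n_sims)        # roughly 0.0069

# Absolute errors reported above, expressed in standard errors
z_no_ha <- 0.0019122 / se_no_ha
z_ha    <- 0.005779 / se_ha
```

Both absolute errors are smaller than one standard error, so under these assumptions they are consistent with ordinary simulation noise.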
vec <- seq(0.51,1,0.01)
pha <- c() # probability w/ home advantage
pnha <- c() # probability w/o home advantage
count <- 1
for (i in vec) {
  pha[count] <- calc(c(0,0,1,1,1,0,0), 1.1, i)
  pnha[count] <- calc(c(0,0,1,1,1,0,0), 1, i)
  count <- count + 1
}
# creating a data frame with the probabilities and the
# outcomes of the impacts of home advantage
new_df <- data.frame(prob = vec, yes = pha, no = pnha)
new_df <- new_df %>%
  mutate(diff = yes - no)
head(new_df)
## prob yes no diff
## 1 0.51 0.5084578 0.5218663 -0.013408500
## 2 0.52 0.5326106 0.5436801 -0.011069469
## 3 0.53 0.5566685 0.5653893 -0.008720791
## 4 0.54 0.5805615 0.5869421 -0.006380600
## 5 0.55 0.6042210 0.6082878 -0.004066825
## 6 0.56 0.6275792 0.6293763 -0.001797034
Plot
ggplot(new_df) +
  geom_point(aes(prob, diff), color = "blue") +
  geom_line(aes(prob, diff), color = "red") +
  labs(title="The impact of P(B) on the home advantage results",
       x = "Probability",
       y = "Difference in the probability of winning") +
  theme_classic() +
  theme(plot.title = element_text(hjust=0.5)) +
  # img_r is not defined in the code shown here
  inset_element(p = img_r,
                left = 0.4,   # left edge placed before the right edge
                bottom = 0.5,
                right = 0.6,
                top = 0.65)
According to the plot, it cannot be clearly inferred that P(B) has a large impact on the difference in probabilities, because the differences span a narrow range. However, most of the differences for P(B) > 0.57 are positive, whereas the differences for the lower values of P(B) shown in the table above are negative.
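The point at which the difference changes sign can be located numerically. The sketch below assumes the schedule used above (three games in ATL and four in NYC) and treats the series as won when the Braves take at least four of seven games. The helper win_prob is hypothetical; it combines two binomial distributions rather than reading the outcomes file.

```r
# Hypothetical closed-form helper: home wins ~ Binomial(3, pbh) and
# away wins ~ Binomial(4, pba); the series is won with four or more wins
win_prob <- function(p, adv, n_home = 3, n_away = 4) {
  pbh <- p * adv
  pba <- 1 - (1 - p) * adv
  x <- 0:n_home  # possible numbers of home wins
  # For x home wins the Braves need more than 3 - x away wins
  sum(dbinom(x, n_home, pbh) *
        pbinom(3 - x, n_away, pba, lower.tail = FALSE))
}

# Difference between the cases with and without home advantage
diff_fun <- function(p) win_prob(p, 1.1) - win_prob(p, 1)

# The difference is negative at 0.55 and positive at 0.60,
# so a sign change lies in between
crossing <- uniroot(diff_fun, interval = c(0.55, 0.60))$root
crossing
```

Under these assumptions the sign change falls just above P(B) = 0.56, in line with the negative values in the table and the mostly positive values reported for P(B) > 0.57.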
vec <- seq(0.1,3,0.05)
adv <- c()
count <- 1
for (i in vec) {
  # probability w/ different advantage multipliers
  adv[count] <- calc(c(0,0,1,1,1,0,0), i, 0.55)
  count <- count + 1
}
new_df2 <- data.frame(adv_mult = vec, yes = adv, no = calc(c(0,0,1,1,1,0,0), 1, 0.55))
new_df2 <- new_df2 %>%
  mutate(diff = yes - no)
head(new_df2)
## adv_mult yes no diff
## 1 0.10 0.8563581 0.6082878 0.24807029
## 2 0.15 0.8064258 0.6082878 0.19813801
## 3 0.20 0.7671446 0.6082878 0.15885678
## 4 0.25 0.7362330 0.6082878 0.12794517
## 5 0.30 0.7118517 0.6082878 0.10356391
## 6 0.35 0.6925327 0.6082878 0.08424489
Plot
ggplot(new_df2) +
  geom_point(aes(adv_mult, diff), color = "blue", size=2) +
  geom_line(aes(adv_mult, diff), color = "red") +
  labs(title="The impact of advantage multiplier on the home advantage results",
       x = "Advantage Multiplier",
       y = "Difference in the probability of winning") +
  theme_classic() +
  theme(plot.title = element_text(hjust=0.5)) +
  inset_element(p = img_r,
                left = 0.5,
                bottom = 0.6,
                right = 0.75,
                top = 0.75)
According to the plot, the advantage multiplier has a large impact on the difference in probabilities, given that the size of the multiplier determines the magnitude of the difference. Increasing the multiplier from 1 to 3 increases the magnitude of the difference by more than 3 times. Therefore, there is a non-linear relationship between the difference in probabilities and the advantage multiplier.
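When reading this plot, it may help to check which multipliers keep the per-game probabilities between 0 and 1. The sketch below applies the formulas for pbh and pba to the same grid of multipliers with P(B) = 0.55. The sanity check inside calc would not flag such cases, because each game's win and loss terms still sum to 1 even when one of them is negative.

```r
pb <- 0.55
adv_grid <- seq(0.1, 3, 0.05)

pbh <- pb * adv_grid            # exceeds 1 once the multiplier passes 1 / 0.55
pba <- 1 - (1 - pb) * adv_grid  # drops below 0 once it passes 1 / 0.45

# Multipliers for which both values are proper probabilities
valid <- pbh >= 0 & pbh <= 1 & pba >= 0 & pba <= 1
valid_range <- range(adv_grid[valid])
valid_range
```

On this grid only multipliers up to 1.8 give valid probabilities, so the part of the curve beyond that point is best read with caution. The same check applies to the P(B) sweep, where 1.1 × P(B) exceeds 1 once P(B) is above roughly 0.909.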
This deliverable showcases another way in which statistical concepts can be used to estimate how home advantage affects a sports team's performance. In this case, we are interested in the probability that the Braves win the World Series, which we model with a parameter called the advantage multiplier. The advantage multiplier served as a means to gauge their chances of winning.