This blog post answers a series of questions about the World Series using the rules of probability and discrete probability functions. The intended reader is an HR manager tasked with hiring a data scientist.
The World Series is the annual championship series of Major League Baseball (MLB) and concludes the MLB postseason. Learn more about the World Series at https://en.wikipedia.org/wiki/List_of_World_Series_champions
In this post, we suppose that the Braves and the Yankees are the two teams competing in the World Series.
We suppose that in any given game, the probability that the Braves win is PB and the probability that the Yankees win is PY = 1 − PB.
## Questions to answer
library(ggplot2)
library(tidyverse)
Question 1: With PB = 0.55, the probability that the Braves win the World Series (a best-of-7 series) is the sum over the possible series lengths: P = P(Win in 4 games) + P(Win in 5 games) + P(Win in 6 games) + P(Win in 7 games)
# Terms, left to right: win in 7, win in 6, win in 5, and win in 4 games
Prob_BravesWinWS=choose(6,3)*0.55^3*0.45^3*0.55+choose(5,3)*0.55^3*0.45^2*0.55+choose(4,3)*0.55^3*0.45*0.55+0.55^4
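As a quick cross-check of the calculation above, the same series probability can be obtained from the negative binomial distribution: the Braves win a best-of-7 series exactly when they lose at most 3 games before their 4th win. This is a minimal sketch assuming PB = 0.55 as above; the names `p`, `manual`, and `via_nbinom` are illustrative and not part of the original analysis.

```r
# Probability that the Braves win a single game (value used above)
p <- 0.55

# Manual sum: win in 4, 5, 6, or 7 games
manual <- p^4 +
  choose(4, 3) * p^3 * (1 - p)   * p +
  choose(5, 3) * p^3 * (1 - p)^2 * p +
  choose(6, 3) * p^3 * (1 - p)^3 * p

# Negative binomial CDF: at most 3 losses before the 4th win
via_nbinom <- pnbinom(3, size = 4, prob = p)

round(manual, 4)
round(via_nbinom, 4)
isTRUE(all.equal(manual, via_nbinom))
```

Both approaches should agree, giving a series win probability of roughly 0.608 for the Braves.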
Question 2: When PB varies, the probability that the Braves win the World Series is the same sum, P = P(Win in 4 games) + P(Win in 5 games) + P(Win in 6 games) + P(Win in 7 games), evaluated at each value of PB.
PB <- seq(0.1, 1, by = 0.05)
# dnbinom(k, 4, p) is the probability of exactly k losses before the 4th win;
# summing over k = 0..3 covers wins in 4, 5, 6, and 7 games
Prob <- sapply(PB, function(p) sum(dnbinom(0:3, size = 4, prob = p)))
tibble_1 <- tibble(PB, Prob)
ggplot(tibble_1, aes(x = PB, y = Prob)) + geom_point() + geom_line() + ylab("Probability that the Braves win the World Series") + xlab("Probability that the Braves win each game")
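For reference, the quantity plotted above can be written out explicitly. This is a sketch of the standard negative binomial form that `dnbinom` evaluates, with p standing for PB and k for the number of games the Braves lose before their 4th win (both symbols are introduced here only for illustration).

```latex
P(\text{Braves win the series})
  = \sum_{k=0}^{3} \binom{k+3}{3}\, p^{4}\, (1-p)^{k}
```

The k = 0 term is a 4-game sweep, and the k = 3 term, with coefficient \binom{6}{3}, is a win in 7 games, matching the terms used in Question 1.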
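Question 3: What is the shortest series length for which the Braves have at least an 80% chance of winning the World Series when PB = 0.55? Only odd series lengths are considered.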
num_games = seq(7,151, by=2)
Pb= 0.55
num_wins=ceiling(num_games/2)
num_loss=num_games-num_wins
P_B_WS=pnbinom(num_loss,num_wins,Pb)
plot(num_games,P_B_WS)
abline(h=.8,col="red")
num_games[which(P_B_WS>=.8)][1]
## [1] 71
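The series win probability used above can also be checked against the binomial distribution: winning a best-of-n series is equivalent to winning at least ceiling(n/2) games if all n games were played. The sketch below is an illustrative cross-check under that assumption; the names `p_nbinom` and `p_binom` are not from the original code.

```r
p <- 0.55
num_games <- seq(7, 151, by = 2)
num_wins <- ceiling(num_games / 2)

# Negative binomial view: at most (num_wins - 1) losses before the num_wins-th win
p_nbinom <- pnbinom(num_wins - 1, size = num_wins, prob = p)

# Binomial view: more than (num_wins - 1) wins out of num_games games played
p_binom <- pbinom(num_wins - 1, size = num_games, prob = p, lower.tail = FALSE)

isTRUE(all.equal(p_nbinom, p_binom))
num_games[which(p_binom >= 0.8)][1]
```

The two views should agree for every series length, so either one can be used to find the shortest series.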
Question 4: The same search with PB = 0.6, followed by the shortest series length as a function of PB.
Pb= 0.6
num_games = seq(7,151, by=2)
num_wins=ceiling(num_games/2)
num_loss=num_games-num_wins
P_B_WS=pnbinom(num_loss,num_wins,Pb)
plot(num_games,P_B_WS)
abline(h=.8,col="red")
num_games[which(P_B_WS>=.8)]
## [1] 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53
## [20] 55 57 59 61 63 65 67 69 71 73 75 77 79 81 83 85 87 89 91
## [39] 93 95 97 99 101 103 105 107 109 111 113 115 117 119 121 123 125 127 129
## [58] 131 133 135 137 139 141 143 145 147 149 151
P_B_WS_SEQ=c()
seq_Pb=seq(.5,1,by=.01)
for(prob in seq_Pb){
P_B_WS=pnbinom(num_loss,num_wins,prob)
P_B_WS_SEQ<-append(P_B_WS_SEQ,num_games[which(P_B_WS>=.8)][1])
}
plot(x=seq_Pb,y=P_B_WS_SEQ)
abline(h=7,col="red")
# The points for probabilities above .65 are hard to read, so limit the y axis to see them more clearly
plot(x=seq_Pb,y=P_B_WS_SEQ,ylim=c(7,10))
abline(h=7,col="red")
We can see that once the per-game probability reaches about .65, a 7-game series (the shortest length in the search range) already gives the Braves at least an 80% chance of winning.
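The search above can be wrapped in a small function, which makes the role of the search range explicit. This is a minimal sketch: `shortest_series` and its arguments are hypothetical names introduced here, and it assumes `min_games` is odd so that only odd series lengths are tried.

```r
# Hypothetical helper: shortest odd series length at which the series win
# probability reaches `target`, searching from `min_games` to `max_games`.
# Assumes min_games is odd.
shortest_series <- function(p, target = 0.8, min_games = 7, max_games = 151) {
  n <- seq(min_games, max_games, by = 2)   # candidate series lengths
  wins_needed <- ceiling(n / 2)            # wins required to take the series
  # At most (wins_needed - 1) losses before the wins_needed-th win
  p_series <- pnbinom(wins_needed - 1, size = wins_needed, prob = p)
  n[which(p_series >= target)][1]          # NA if the target is never reached
}

shortest_series(0.55)
shortest_series(0.60)
shortest_series(0.65)
shortest_series(0.65, min_games = 1)
```

Because the original search starts at 7 games, any sufficiently high per-game probability returns 7. Lowering `min_games` shows whether a shorter series would already meet the target. A per-game probability of exactly 0.5 never reaches the target, so the function returns `NA` in that case.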
Question 5:
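The question can be read as a posterior probability. By Bayes' rule, P(Pb=0.55 | Braves win WS in 7 games) = P(Braves win WS in 7 games | Pb=0.55) × P(Pb=0.55) / P(Braves win WS in 7 games), and likewise for Pb=0.45. The calculation below treats the two values as equally likely beforehand, so the prior cancels and the denominator reduces to P(Braves win WS in 7 games | Pb=0.55) + P(Braves win WS in 7 games | Pb=0.45).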
# Pb=0.55
(choose(6,3)*.55^3*.45^3*.55/(choose(6,3)*.55^3*.45^3*.55+choose(6,3)*.45^3*.55^3*.45))
## [1] 0.55
# Pb=0.45
(choose(6,3)*.45^3*.55^3*.45/(choose(6,3)*.55^3*.45^3*.55+choose(6,3)*.45^3*.55^3*.45))
## [1] 0.45
The posterior probabilities for Pb = .55 and Pb = .45 turn out to equal the values themselves. This is not because winning the WS in 7 games is independent of Pb; if it were, each posterior would equal its prior (0.5 under the equal weighting implied by the calculation above). Instead, both likelihoods share the factor choose(6,3)*.55^3*.45^3, which cancels in the ratio and leaves .55/(.55+.45) = .55 and .45/(.55+.45) = .45.
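The Bayes calculation above can be sketched with the prior written out explicitly, which makes it easy to see what happens when the two candidate values are not weighted equally. The names `lik_win_in_7`, `candidates`, and `prior` are illustrative, and the equal prior is an assumption that mirrors the calculation above.

```r
# Likelihood that the Braves win the series in exactly 7 games:
# 3 wins in the first 6 games, then a win in game 7
lik_win_in_7 <- function(p) choose(6, 3) * p^3 * (1 - p)^3 * p

candidates <- c(0.55, 0.45)  # candidate per-game win probabilities
prior <- c(0.5, 0.5)         # assumption: both candidates equally likely beforehand

# Bayes' rule: posterior is proportional to prior times likelihood
posterior <- prior * lik_win_in_7(candidates) / sum(prior * lik_win_in_7(candidates))
posterior
```

With the equal prior, the posterior is 0.55 and 0.45. Changing `prior` shifts the posterior accordingly, which shows that the result depends on the prior rather than on any independence between the outcome and Pb.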