Each week we can find two problems posed on fivethirtyeight.com/features. Here they are. Let’s get solving!
“If a baseball team is truly .500, meaning it has a 50 percent chance of winning each game, what’s the probability that it has won two of its last four games and four of its last eight games?”
We can answer this question in several different ways. I will cover a few. We are trying to find a probability of a compound event (that word “and” lets you know). Breaking this up, let’s let A be the event that the team won two of its last four games and B be the event that the team won four of its last eight games. Then, we’re after P(A∩B).
This can be solved using conditional probability, conditioning on one of the two events. I will solve the problem both ways: first by conditioning on A, and second by conditioning on B.
Here is what I mean by conditioning on A. I find the probability of A occurring without considering B at all, and then find the probability that B will occur given that A has already occurred. What this looks like notationally is P(A∩B)=P(A)×P(B|A).
To find P(A), the probability that the team won two of its last four games, we consider all of the possibilities. Here are the 16 equally likely outcomes of the last four games: {LLLL,LLLW,LLWL,LWLL,WLLL,LLWW,LWLW,WLLW,LWWL,WLWL,WWLL,LWWW,WLWW,WWLW,WWWL,WWWW}
We can see that 6 of these 16 have two wins and two losses. That means that P(A)=6/16=3/8=0.375. For those who recognize this as a binomial probability with n=4 and p=0.5, we can calculate it with
dbinom(2,4,0.5)
## [1] 0.375
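The same count can be reproduced by listing the outcomes in R instead of by hand. This is only a small sketch; the names `last4`, `wins`, `n_two`, and `p_A` are illustrative choices, not part of the original solution.

```r
# Enumerate all 2^4 = 16 win/loss sequences for the last four games
last4 <- expand.grid(rep(list(c("L", "W")), 4), stringsAsFactors = FALSE)

# Count the wins in each sequence
wins <- rowSums(last4 == "W")

# Number of sequences with exactly two wins, and the resulting probability
n_two <- sum(wins == 2)
p_A <- n_two / nrow(last4)
```

Here `n_two` is 6 and `p_A` is 0.375, matching the binomial calculation.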
Now, we pretend this has already happened. What is the probability that B occurs? Given that the team won two of its last four games, B occurs exactly when the team also won two of the four games before those. Those earlier games are independent of the last four, so this happens with the same probability of 0.375. That is, P(B|A)=0.375. Now, using the formula we have
P(A∩B)=P(A)×P(B|A)=(0.375)×(0.375)=0.140625.
If we tackled this from the other direction, we would first find P(B). This would be a bit more involved, as we would have to list out 256 outcomes rather than the 16. Of those 256 outcomes, we would find that 70 of them had 4 wins and 4 losses.
This means P(B)=70/256=0.2734375. Now, under the condition that B has occurred (that is, one of those 70 outcomes with 4 wins and 4 losses occurred), what is the probability that A occurs?
This could take quite a bit of work, sifting through all 70 outcomes and counting up those that have 2 wins and 2 losses in the last four games (which ends up being 36, so P(A|B)=36/70). Using counting techniques or a hypergeometric distribution could get you there faster.
P(A∩B)=P(B)×P(A|B)=(70/256)×(36/70)=36/256=0.140625
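The counts of 70 and 36, along with the hypergeometric shortcut, can be checked by brute force. The sketch below assumes that columns 5 through 8 of the enumeration stand for the last four games; all variable names are illustrative.

```r
# All 2^8 = 256 equally likely sequences of eight games (1 = win, 0 = loss)
games <- as.matrix(expand.grid(rep(list(0:1), 8)))

# B: four wins in all eight games; A: two wins in the last four (columns 5 to 8)
B <- rowSums(games) == 4
A <- rowSums(games[, 5:8]) == 2

n_B  <- sum(B)        # sequences with 4 wins and 4 losses
n_AB <- sum(A & B)    # of those, sequences with 2 wins in the last four
p_AB <- n_AB / nrow(games)

# Hypergeometric shortcut for P(A|B): 4 wins and 4 losses are spread over
# eight games, and we ask that 2 of the wins land in the 4 most recent games
p_A_given_B <- dhyper(2, 4, 4, 4)
```

Here `n_B` and `n_AB` come out to 70 and 36, `p_A_given_B` equals 36/70, and `p_AB` equals 36/256=0.140625.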
Since this matches the answer we found by conditioning on A, it provides a good check.
“Riddler League Baseball, also known as the RLB, consists of three teams: the Mississippi Moonwalkers, the Delaware Doubloons and the Tennessee Taters.
“Each time a batter for the Moonwalkers comes to the plate, they have a 40 percent chance of getting a walk and a 60 percent chance of striking out. Each batter for the Doubloons, meanwhile, hits a double 20 percent of the time, driving in any teammates who are on base, and strikes out the remaining 80 percent of the time. Finally, each batter for the Taters has a 10 percent chance of hitting a home run and a 90 percent chance of striking out.
“During the RLB season, each team plays an equal number of games against each opponent. Games are nine innings long and can go into extra innings just like in other baseball leagues. Which of the three teams is most likely to have the best record at the end of the season?”
I will answer this problem using simulation. To simulate a season of Riddler League Baseball, I first need to break the simulation down into simpler steps. First, I’ll simulate an at-bat for a Moonwalkers batter.
We will input a situation, which will be a 3-tuple of 0’s or 1’s, representing whether a runner is on 1st, 2nd, or 3rd base. Next, we’ll input the number of runs. Last, we’ll input the number of outs. Then, with the specified probabilities given in the problem, the output will be the resulting situation, runs, and outs.
MW_atBat <- function(situation, runs, outs){
  # p = 1 is a walk (probability 0.4); p = 0 is a strikeout (probability 0.6)
  p <- sample(0:1, 1, prob = c(0.6,0.4))
  if(p==0){
    return(list(situation = situation, runs = runs, outs = outs+1))
  }
  if(p==1){
    # A runner on 3rd scores, then every runner moves up one base
    if(situation[3]==1){runs <- runs+1}
    situation[2:3] <- situation[1:2]
    situation[1] <- 1
    return(list(situation = situation, runs = runs, outs = outs))
  }
}
set.seed(9876)
MW_atBat(c(1,0,0), 2, 1)
## $situation
## [1] 1 1 0
##
## $runs
## [1] 2
##
## $outs
## [1] 1
MW_atBat(c(0,0,0), 0, 2)
## $situation
## [1] 0 0 0
##
## $runs
## [1] 0
##
## $outs
## [1] 3
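One detail worth spelling out: MW_atBat moves every runner up a base on a walk, whereas in baseball only forced runners advance. Because the Moonwalkers reach base only by walking, the bases always fill from first, so the two rules agree for every state an inning can actually reach. For states that cannot arise here, such as a lone runner on second, a force-only version might look like the sketch below. The helper `advanceOnWalk` is hypothetical and is not used anywhere in the simulation.

```r
# Hypothetical helper: apply a walk to a base state, advancing only forced runners.
# situation is c(1st, 2nd, 3rd) with 1 = occupied, 0 = empty.
advanceOnWalk <- function(situation){
  runs <- 0
  if(situation[1] == 1){               # runner on 1st is forced to 2nd
    if(situation[2] == 1){             # runner on 2nd is forced to 3rd
      if(situation[3] == 1){runs <- 1} # bases loaded: runner on 3rd scores
      situation[3] <- 1
    }
    situation[2] <- 1
  }
  situation[1] <- 1                    # batter takes 1st
  list(situation = situation, runs = runs)
}

advanceOnWalk(c(0,1,0))   # lone runner on 2nd stays put
advanceOnWalk(c(1,1,1))   # bases loaded: one run scores
```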
For the Doubloons’ at-bat function, the only thing that matters on the bases is whether there is a guy on second or not, so we just need a variable that takes on 0 or 1. For the Taters, there is never anyone on base, since each batter either hits a home run or strikes out.
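Dbl_atBat <- function(On2nd, runs, outs){
  # p = 1 is a double (probability 0.2); p = 0 is a strikeout (probability 0.8)
  p <- sample(0:1, 1, prob = c(0.8,0.2))
  if(p==0){
    return(list(On2nd = On2nd, runs = runs, outs = outs+1))
  }
  if(p==1){
    # A double scores the runner on 2nd (if any) and leaves the batter on 2nd
    if(On2nd == 1){runs <- runs+1}
    return(list(On2nd = 1, runs = runs, outs = outs))
  }
}
Tat_atBat <- function(runs, outs){
  # p = 1 is a home run (probability 0.1); p = 0 is a strikeout (probability 0.9)
  p <- sample(0:1, 1, prob = c(0.9,0.1))
  if(p==0){
    return(list(runs = runs, outs = outs+1))
  }
  if(p==1){
    return(list(runs = runs + 1, outs = outs))
  }
}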
Dbl_atBat(0, 1, 2)
## $On2nd
## [1] 0
##
## $runs
## [1] 1
##
## $outs
## [1] 3
Dbl_atBat(1, 2, 0)
## $On2nd
## [1] 1
##
## $runs
## [1] 2
##
## $outs
## [1] 1
Tat_atBat(0,0)
## $runs
## [1] 0
##
## $outs
## [1] 1
Tat_atBat(1,2)
## $runs
## [1] 1
##
## $outs
## [1] 3
Here, we’ll take the team as input and play an inning. The function keeps track of the situation, runs, and outs until there are 3 outs and then reports the number of runs scored in that inning.
playInning <- function(team){
  if(! team %in% c("Moonwalkers", "Doubloons", "Taters")){
    # Fail loudly rather than returning a string where a run total is expected
    stop("That is not a team in Riddler League Baseball.")
  }
  s <- c(0,0,0)   # bases (1st, 2nd, 3rd) for the Moonwalkers
  O2 <- 0         # runner on 2nd (0 or 1) for the Doubloons
  runs <- 0
  outs <- 0
  # Keep sending batters to the plate until the third out
  if(team =="Moonwalkers"){
    while(outs<3){
      AB <- MW_atBat(s,runs,outs)
      s <- AB$situation
      runs <- AB$runs
      outs <- AB$outs
    }
  }
  if(team =="Doubloons"){
    while(outs<3){
      AB <- Dbl_atBat(O2,runs,outs)
      O2 <- AB$On2nd
      runs <- AB$runs
      outs <- AB$outs
    }
  }
  if(team =="Taters"){
    while(outs<3){
      AB <- Tat_atBat(runs,outs)
      runs <- AB$runs
      outs <- AB$outs
    }
  }
  return(runs)
}
playInning("Moonwalkers")
## [1] 0
playInning("Doubloons")
## [1] 0
playInning("Taters")
## [1] 0
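As a cross-check that is not part of the original simulation, the expected runs per inning can be computed exactly. Each inning is a string of independent plate appearances that stops at the third out, so the number of batters who reach base follows a negative binomial distribution. The sketch below assumes R's parameterization of `dnbinom`, where `prob` is the chance of an out; the truncation at 200 and the name `expRuns` are illustrative choices.

```r
# dnbinom(k, size = 3, prob = p_out) is the probability that exactly k batters
# reach base before the 3rd out
k <- 0:200   # truncation point; the probability mass beyond 200 is negligible

expRuns <- c(
  # Moonwalkers: the first three walks load the bases, each later walk scores a run
  Moonwalkers = sum(pmax(k - 3, 0) * dnbinom(k, size = 3, prob = 0.6)),
  # Doubloons: the first double puts a runner on 2nd, each later double scores a run
  Doubloons   = sum(pmax(k - 1, 0) * dnbinom(k, size = 3, prob = 0.8)),
  # Taters: every home run scores exactly one run
  Taters      = sum(k * dnbinom(k, size = 3, prob = 0.9))
)
round(expRuns, 3)
```

This works out to roughly 0.374 runs per inning for the Moonwalkers, 0.262 for the Doubloons, and 0.333 for the Taters. Averaging many calls to playInning for each team should land close to these values. Note that average runs per inning is not the same thing as chance of winning, which also depends on how the runs are spread across games.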
In this function, we’ll take as input the home and away teams, in that order. A scoreboard will be the output, where the away team is on top, the home team is on the bottom, and the number of runs scored in each inning is displayed along with the total number of runs at the end.
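playGame <- function(home, away){
  # Scoreboard: row 1 is the away team, row 2 is the home team, one column per inning
  game <- matrix(rep(0,18), nrow=2)
  colnames(game) <- 1:9
  rownames(game) <- c(paste(away), paste(home))
  runsH <- 0
  runsA <- 0
  inning <- 1
  # Play at least 9 innings, then keep going while the score is tied
  while(inning < 10 || runsA==runsH){
    if(ncol(game)<inning){
      # Add a column for each extra inning
      game <- cbind(game,c(0,0))
      colnames(game)[inning] <- inning
    }
    game[1,inning] <- playInning(away)
    runsA <- sum(game[1,])
    # From the 9th inning on, the home team bats only if it is not already ahead
    if(inning < 9 || (inning>=9 && runsH<=runsA)){
      game[2,inning] <- playInning(home)
      runsH <- sum(game[2,])
    }
    inning <- inning + 1
  }
  return(cbind(game, runs=c(runsA, runsH)))
}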
playGame("Taters", "Doubloons")
## 1 2 3 4 5 6 7 8 9 runs
## Doubloons 0 0 0 0 0 0 0 0 0 0
## Taters 1 0 3 0 0 3 0 0 0 7
playGame("Moonwalkers", "Taters")
## 1 2 3 4 5 6 7 8 9 runs
## Taters 1 0 0 0 1 0 0 3 0 5
## Moonwalkers 0 0 0 0 0 0 1 0 0 1
playGame("Doubloons", "Moonwalkers")
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 runs
## Moonwalkers 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 0 0 3
## Doubloons 2 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 4 7
playGame("Taters", "Moonwalkers")
## 1 2 3 4 5 6 7 8 9 runs
## Moonwalkers 0 0 2 5 1 0 0 0 0 8
## Taters 0 0 0 0 0 1 0 1 0 2
playGame("Moonwalkers", "Doubloons")
## 1 2 3 4 5 6 7 8 9 runs
## Doubloons 0 0 1 0 0 0 0 0 0 1
## Moonwalkers 0 0 0 0 0 0 0 0 0 0
playGame("Doubloons", "Taters")
## 1 2 3 4 5 6 7 8 9 runs
## Taters 3 0 1 0 1 0 0 0 0 5
## Doubloons 0 0 0 0 0 0 0 0 0 0
Note that the 18-inning game between the Moonwalkers and Doubloons ends with an unrealistic 4 runs in the bottom of the 18th. Although a grand slam can end a game, that is not possible for the Doubloons, who only hit doubles. With the score tied, the game would have ended as soon as the Doubloons scored their first run of that inning. Since this detail does not affect how we answer the problem (we only care about wins and losses in the end), I did not fix the function to stop once the home team takes the lead in the bottom of the ninth or an extra inning.
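For anyone who does want a realistic scoreboard, here is one way the fix could look. It relies on an observation about this particular league: every scoring play (a bases-loaded walk, a double with a runner on second, a solo home run) brings in exactly one run, so the home team can never finish more than one run ahead in a walk-off. The helper `capWalkOff` and its argument names are hypothetical; it would be applied to the home half of the ninth inning or later.

```r
# Hypothetical helper: runs credited to the home team in the bottom of the 9th or later.
# inningRuns: runs the fully simulated half-inning produced
# runsH, runsA: home and away totals before this half-inning
#               (the home team only bats here when runsH <= runsA)
capWalkOff <- function(inningRuns, runsH, runsA){
  needed <- runsA - runsH + 1   # runs required to take the lead
  min(inningRuns, needed)
}

capWalkOff(4, 3, 3)   # tied game: only the first run counts
capWalkOff(1, 2, 4)   # not enough to take the lead: every run counts
```

Capping the total never changes who wins, which is why leaving it out is harmless for the question at hand.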
I’ll define a season as one in which each team plays each of the other teams a total of 50 times, 25 at home and 25 away. This results in a total of 150 games, with each team playing 100. We’ll input the teams and let them play! The output will be the record of wins and losses for each of the three teams.
PlaySeason <- function(teams){
  season <- matrix(rep(0,6), nrow=3)
  colnames(season) <- c("Wins", "Losses")
  rownames(season) <- teams
  for(hometeam in teams){
    awayteams <- teams[teams != hometeam]
    for(awayteam in awayteams){
      # 25 home games against each opponent
      for(i in 1:25){
        pg <- playGame(hometeam, awayteam)
        # Row 1 of the scoreboard is the away team, row 2 is the home team
        if(pg[1,"runs"]>pg[2,"runs"]){
          season[hometeam,2] <- season[hometeam,2]+1
          season[awayteam,1] <- season[awayteam,1]+1
        }
        else{
          season[hometeam,1] <- season[hometeam,1]+1
          season[awayteam,2] <- season[awayteam,2]+1
        }
      }
    }
  }
  return(season)
}
RiddlerTeams <- c("Moonwalkers", "Doubloons", "Taters")
PlaySeason(RiddlerTeams)
## Wins Losses
## Moonwalkers 51 49
## Doubloons 43 57
## Taters 56 44
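The schedule arithmetic behind a season (150 games in total, 100 per team) can be confirmed with a few lines. This is just a sanity-check sketch; `schedule`, `gamesPerPairing`, `totalGames`, and `gamesPerTeam` are illustrative names.

```r
teams <- c("Moonwalkers", "Doubloons", "Taters")

# Every ordered (home, away) pairing of two different teams
schedule <- expand.grid(home = teams, away = teams, stringsAsFactors = FALSE)
schedule <- schedule[schedule$home != schedule$away, ]

gamesPerPairing <- 25
totalGames <- nrow(schedule) * gamesPerPairing

# Games per team: home games plus away games
gamesPerTeam <- (table(schedule$home) + table(schedule$away)) * gamesPerPairing
```

There are 6 ordered pairings, so `totalGames` is 150 and each entry of `gamesPerTeam` is 100. This agrees with the season above, where the win totals add up to 150 and each team's wins and losses add up to 100.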
Just one season is not good enough for an answer. Let’s simulate 100 seasons and report the average number of wins and losses for each of the teams. Averaging over many seasons smooths out the luck of any single season, so the ordering of the teams becomes much more trustworthy.
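SimBN <- function(N, teams){
  aveseason <- PlaySeason(teams)
  # Running mean: after season i, aveseason is the simple average of the first i seasons.
  # (Weighting by N instead of i, as in the run printed below, gives an exponentially
  # weighted average in which the first season counts far more than the others.)
  for(i in seq_len(N)[-1]){
    aveseason <- ((i-1)*aveseason + PlaySeason(teams))/i
  }
  return(aveseason)
}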
SimBN(100, RiddlerTeams)
## Wins Losses
## Moonwalkers 55.01694 44.98306
## Doubloons 37.85082 62.14918
## Taters 57.13223 42.86777
With a 57.1% winning percentage for the Taters in this simulation versus 55.0% for the Moonwalkers, I’m guessing the Taters. Let’s simulate 500 seasons just to be on the safe side.
SimBN(500, RiddlerTeams)
## Wins Losses
## Moonwalkers 53.94780 46.05220
## Doubloons 39.38016 60.61984
## Taters 56.67204 43.32796
With a little more confidence (56.7% versus 53.9%), I can now say the Taters will, on average, have the best record at the end of a season.
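Strictly speaking, the question asks which team is most likely to finish with the best record, which is slightly different from which team wins the most games on average. One way to estimate that directly is to count how often each team finishes on top. The sketch below uses a hypothetical helper, `bestRecordShare`, that takes a list of season matrices shaped like the output of PlaySeason; the records fed to it here are made up purely to show the mechanics.

```r
# Hypothetical helper: share of seasons in which each team has the most wins
bestRecordShare <- function(seasons){
  teams <- rownames(seasons[[1]])
  share <- setNames(numeric(length(teams)), teams)
  for(s in seasons){
    wins <- s[, "Wins"]
    best <- names(wins)[wins == max(wins)]
    # Split the credit equally when teams tie for the most wins
    share[best] <- share[best] + 1/length(best)
  }
  share / length(seasons)
}

# Made-up records for illustration only (not simulation output)
mk <- function(w) matrix(c(w, 100 - w), nrow = 3,
                         dimnames = list(c("Moonwalkers", "Doubloons", "Taters"),
                                         c("Wins", "Losses")))
toy <- list(mk(c(60, 40, 50)), mk(c(55, 40, 55)))
bestRecordShare(toy)
```

With the functions from this post loaded, something like `bestRecordShare(replicate(500, PlaySeason(RiddlerTeams), simplify = FALSE))` would give the estimated share of seasons each team finishes first.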