New York, New York: A Tale of Two Teams





On July 24th, the talk of the baseball world was how the New York Mets were a big success while the Yankees were closer to last place than first.



Team      Division Race  Record  Playoff Chances
Mets      1st Place      51-43   63%
Yankees   3rd Place      50-44   35%




Forty days later, the Mets' odds of a postseason appearance had plunged to 1%, while the Yankees' chances had jumped to 98%.



Team      Division Race  Record  Playoff Chances
Mets      3rd Place      61-67   1%
Yankees   2nd Place      76-52   98%





Let's examine what happened by looking at how each team's Elo rating changed with every game.

In this exercise, we will stick with base R.
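For readers new to Elo, here is a minimal sketch of the generic Elo update in base R. The function names `elo_expected()` and `elo_update()` and the K-factor of 4 are assumptions chosen for illustration; FiveThirtyEight's published MLB model adds its own adjustments, so this will not reproduce the ratings in the tables below.

```r
# Probability that a team rated r_team beats an opponent rated r_opp
elo_expected <- function(r_team, r_opp) {
  1 / (1 + 10^((r_opp - r_team) / 400))
}

# Generic Elo update; won is 1 for a win and 0 for a loss.
# k = 4 is an illustrative assumption, not FiveThirtyEight's documented value.
elo_update <- function(r_team, r_opp, won, k = 4) {
  r_team + k * (won - elo_expected(r_team, r_opp))
}

# A 1500-rated team beats a 1550-rated opponent
new_rating <- elo_update(1500, 1550, won = 1)
print(round(new_rating, 3))
```

With a small K, a single game moves a rating by only a few points, which is the scale of the game-to-game changes in the tables below.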





1. Retrieve the data from fivethirtyeight.com
mlb_elo <- read.csv("https://projects.fivethirtyeight.com/mlb-api/mlb_elo.csv")
mlb_elo$date <- as.Date(mlb_elo$date)   # Date objects compare and sort chronologically



2. Isolate the Mets and Yankees games from the past month

# keep games involving either team, played from July 29 through August 27, 2021
mlb_elo_2 <- subset(mlb_elo,
                    (team1 %in% c("NYM", "NYY") | team2 %in% c("NYM", "NYY")) &
                      date > "2021-07-28" & date < "2021-08-28")[c("date", "team1", "team2", "elo1_post", "elo2_post", "score1", "score2")]
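The date window is easiest to reason about with Date objects, which compare chronologically whether read.csv() returned the column as character or, in R versions before 4.0, as a factor. A minimal sketch with a few made-up dates (not read from the FiveThirtyEight file), using the same strict bounds as the subset() call above:

```r
# A few illustrative dates, invented for this example
dates <- as.Date(c("2021-07-28", "2021-07-29", "2021-08-15",
                   "2021-08-27", "2021-08-28"))

# Strict bounds: after July 28 and before August 28, i.e. July 29 to August 27
in_window <- dates > as.Date("2021-07-28") & dates < as.Date("2021-08-28")
print(dates[in_window])
```

Both boundary dates are excluded, so the window runs from July 29 through August 27 inclusive.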



3. Rearrange the data into two tables to make it more readable. For example, the Mets table would look like this:

Date        Opponent  Outcome  Elo Rating
2021-08-27  WSN       L        1490.045
2021-08-26  SFG       L        1491.586



# in the FiveThirtyEight file, team1 is the home team and team2 the visitor
mets_home <- subset(mlb_elo_2, team1 == "NYM")[c("date", "team2", "elo1_post", "score1", "score2")]
names(mets_home) <- c("date", "opponent", "elo", "score_Mets", "score_Opponent")
mets_home$outcome <- ifelse(mets_home$score_Mets > mets_home$score_Opponent, "W", "L")
mets_home$loc <- "H"


mets_away <- subset(mlb_elo_2, team2 == "NYM")[c("date", "team1", "elo2_post", "score2", "score1")]
names(mets_away) <- c("date", "opponent", "elo", "score_Mets", "score_Opponent")
mets_away$outcome <- ifelse(mets_away$score_Mets > mets_away$score_Opponent, "W", "L")
mets_away$loc <- "A"


mets <- rbind(mets_home, mets_away)

# sort by date and renumber the rows 1..n
mets <- mets[order(mets$date), ]
rownames(mets) <- NULL


knitr::kable(mets, caption = 'Mets: What Happened in the Past Month?')
Mets: What Happened in the Past Month?
date        opponent  elo       score_Mets  score_Opponent  outcome  loc
2021-07-29  ATL       1509.421  3           6               L        H
2021-07-30  CIN       1506.525  2           6               L        H
2021-07-31  CIN       1507.809  5           4               W        H
2021-08-01  CIN       1504.194  1           7               L        H
2021-08-02  FLA       1501.963  3           6               L        A
2021-08-03  FLA       1500.574  4           5               L        A
2021-08-04  FLA       1502.547  5           3               W        A
2021-08-05  FLA       1500.728  2           4               L        A
2021-08-06  PHI       1499.111  2           4               L        A
2021-08-07  PHI       1497.515  3           5               L        A
2021-08-08  PHI       1495.595  0           3               L        A
2021-08-10  WSN       1496.869  8           7               W        H
2021-08-12  WSN       1500.131  5           4               W        H
2021-08-12  WSN       1498.891  4           1               W        H
2021-08-13  LAD       1499.040  5           6               L        H
2021-08-14  LAD       1497.959  1           2               L        H
2021-08-15  LAD       1494.506  4           14              L        H
2021-08-16  SFG       1493.229  5           7               L        A
2021-08-17  SFG       1492.264  2           3               L        A
2021-08-18  SFG       1495.823  6           2               W        A
2021-08-19  LAD       1494.496  1           4               L        A
2021-08-20  LAD       1493.655  2           3               L        A
2021-08-21  LAD       1492.822  3           4               L        A
2021-08-22  LAD       1497.312  7           2               W        A
2021-08-24  SFG       1493.922  0           8               L        H
2021-08-25  SFG       1492.748  2           3               L        H
2021-08-26  SFG       1491.586  2           3               L        H
2021-08-27  WSN       1490.045  1           2               L        H
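The game-to-game change in rating can be read off with diff(), and table() gives the win-loss tally. A minimal sketch on the first six rows of the Mets table above, with the values copied from that table; on the full data the same calls would be diff(mets$elo) and table(mets$outcome). Because the table holds post-game ratings, diff() returns one value fewer than the number of games; the first game's change would need its pre-game rating.

```r
# First six post-game Elo ratings and results, copied from the Mets table above
elo     <- c(1509.421, 1506.525, 1507.809, 1504.194, 1501.963, 1500.574)
outcome <- c("L", "L", "W", "L", "L", "L")

# Rating change attributed to each game after the first
elo_change <- diff(elo)
print(round(elo_change, 3))

# Win-loss tally for the same six games
record <- table(outcome)
print(record)
```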



4. Do the same for the Yankees

# team1 is the home team and team2 the visitor
yankees_home <- subset(mlb_elo_2, team1 == "NYY")[c("date", "team2", "elo1_post", "score1", "score2")]
names(yankees_home) <- c("date", "opponent", "elo", "score_Yankees", "score_Opponent")
yankees_home$outcome <- ifelse(yankees_home$score_Yankees > yankees_home$score_Opponent, "W", "L")
yankees_home$loc <- "H"


yankees_away <- subset(mlb_elo_2, team2 == "NYY")[c("date", "team1", "elo2_post", "score2", "score1")]
names(yankees_away) <- c("date", "opponent", "elo", "score_Yankees", "score_Opponent")
yankees_away$outcome <- ifelse(yankees_away$score_Yankees > yankees_away$score_Opponent, "W", "L")
yankees_away$loc <- "A"


yankees <- rbind(yankees_home, yankees_away)

# sort by date and renumber the rows 1..n
yankees <- yankees[order(yankees$date), ]
rownames(yankees) <- NULL


knitr::kable(yankees, caption = 'Yankees: What Happened in the Past Month?')
Yankees: What Happened in the Past Month?
date        opponent  elo       score_Yankees  score_Opponent  outcome  loc
2021-07-29  TBD       1525.869  0              14              L        A
2021-07-30  FLA       1527.681  3              1               W        A
2021-07-31  FLA       1529.469  4              2               W        A
2021-08-01  FLA       1531.233  3              1               W        A
2021-08-02  BAL       1526.726  1              7               L        H
2021-08-03  BAL       1530.044  13             1               W        H
2021-08-04  BAL       1532.319  10             3               W        H
2021-08-05  SEA       1533.717  5              3               W        H
2021-08-06  SEA       1534.776  3              2               W        H
2021-08-07  SEA       1535.824  5              4               W        H
2021-08-08  SEA       1533.616  0              2               L        H
2021-08-09  KCR       1535.031  8              6               W        A
2021-08-10  KCR       1532.001  4              8               L        A
2021-08-11  KCR       1533.775  5              2               W        A
2021-08-12  CHW       1532.420  8              9               L        A
2021-08-14  CHW       1534.394  7              5               W        A
2021-08-15  CHW       1536.342  5              3               W        A
2021-08-16  ANA       1537.357  2              1               W        H
2021-08-17  BOS       1540.442  2              0               W        H
2021-08-17  BOS       1538.910  5              3               W        H
2021-08-18  BOS       1542.302  5              2               W        H
2021-08-19  MIN       1543.577  7              5               W        H
2021-08-20  MIN       1546.327  10             2               W        H
2021-08-21  MIN       1548.568  7              1               W        H
2021-08-23  ATL       1551.461  5              1               W        A
2021-08-24  ATL       1552.961  5              4               W        A
2021-08-26  OAK       1554.381  7              6               W        A
2021-08-27  OAK       1557.711  8              2               W        A
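Steps 3 and 4 repeat the same code with only the team abbreviation changed. Below is a sketch of a hypothetical helper, `team_log()`, that builds either table from a single call. It assumes team1 is the home team and team2 the visitor (our reading of FiveThirtyEight's column notes), and the three toy rows are invented purely to exercise the function.

```r
# Hypothetical helper, not part of the original analysis.
# Assumes team1 = home team and team2 = visiting team.
team_log <- function(games, team) {
  home <- games[games$team1 == team, ]
  away <- games[games$team2 == team, ]

  out <- rbind(
    data.frame(date = home$date, opponent = home$team2, elo = home$elo1_post,
               score_team = home$score1, score_opponent = home$score2,
               loc = rep("H", nrow(home)), stringsAsFactors = FALSE),
    data.frame(date = away$date, opponent = away$team1, elo = away$elo2_post,
               score_team = away$score2, score_opponent = away$score1,
               loc = rep("A", nrow(away)), stringsAsFactors = FALSE)
  )
  out$outcome <- ifelse(out$score_team > out$score_opponent, "W", "L")

  # chronological order, rows renumbered 1..n
  out <- out[order(out$date), ]
  rownames(out) <- NULL
  out
}

# Three invented games, only to exercise the function
toy <- data.frame(
  date      = as.Date(c("2021-08-01", "2021-08-02", "2021-08-03")),
  team1     = c("NYM", "FLA", "NYM"),
  team2     = c("CIN", "NYM", "FLA"),
  elo1_post = c(1504.0, 1520.0, 1503.2),
  elo2_post = c(1490.0, 1501.9, 1480.0),
  score1    = c(1, 6, 5),
  score2    = c(7, 3, 3),
  stringsAsFactors = FALSE
)

print(team_log(toy, "NYM"))
```

On the real data, team_log(mlb_elo_2, "NYM") and team_log(mlb_elo_2, "NYY") would stand in for the two blocks above, with generic score_team and score_opponent columns in place of the team-specific names.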



5. Now plot the progression of the Elo ratings.



# set the y-axis limits to the lowest and highest Elo rating of either team
elo_range <- range(c(mets$elo, yankees$elo))

# the x-axis is the game number within the window (both teams played 28 games)
plot(yankees$elo, type = "l", ylab = "Elo rating", xlab = "Game", col = "blue", ylim = elo_range, main = "Elo: One Month in New York Baseball")
lines(mets$elo, col = "red", lty = 1)
legend("topleft", legend = c("Mets", "Yankees"), col = c("red", "blue"), lty = 1, cex = 0.8, text.font = 4, bg = "lightblue")
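To put numbers on what the plot shows, here is a minimal sketch of the net change in rating across the window, using the first and last Elo values from the two tables above. Because these are post-game ratings, the effect of the very first game is not included. On the full data this would be tail(mets$elo, 1) - mets$elo[1], and likewise for the Yankees.

```r
# First and last post-game Elo ratings, copied from the two tables above
mets_first    <- 1509.421
mets_last     <- 1490.045
yankees_first <- 1525.869
yankees_last  <- 1557.711

# Net movement over the window for each team
net_change <- c(Mets = mets_last - mets_first,
                Yankees = yankees_last - yankees_first)
print(round(net_change, 3))

# Gap between the two teams at the end of the window
gap <- yankees_last - mets_last
print(round(gap, 3))
```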



Conclusion



Both teams now have records that conform to preseason expectations, so the law of large numbers helps explain the change: over a longer run of games, results tend to drift back toward each team's underlying strength.



The Mets' recent record can be explained in part by the loss of their best pitcher plus a brutal road trip against some very good teams; over the month in the table above, they went 2-11 against the Dodgers and Giants.



The Yankees, on the other hand, simply got hot, as the sharp rise in their Elo rating shows: they won their last 13 games in the table above and are currently playing at a high level.
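One way to make "got hot" concrete is to measure winning streaks with rle(), which collapses a vector into runs of identical values. A minimal sketch using the outcome column copied from the Yankees table above; on the full data, rle(yankees$outcome) would do the same.

```r
# Yankees outcomes in date order, copied from the table above
outcomes <- c("L", "W", "W", "W", "L", "W", "W", "W", "W", "W",
              "L", "W", "L", "W", "L", "W", "W", "W", "W", "W",
              "W", "W", "W", "W", "W", "W", "W", "W")

# Lengths and values of consecutive identical results
runs <- rle(outcomes)

longest_win_streak <- max(runs$lengths[runs$values == "W"])

# Streak in progress at the end of the window, e.g. "W5" or "L2"
last_run <- length(runs$lengths)
current_streak <- paste0(runs$values[last_run], runs$lengths[last_run])

print(longest_win_streak)
print(current_streak)
```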