New York, New York: A Tale of Two Teams
On July 24th, the talk of the baseball world was how the New York Mets were a big success while the Yankees were closer to last place than to first.
| Team | Standing | Record | Playoff odds |
|---|---|---|---|
| Mets | 1st Place | 51-43 | 63% |
| Yankees | 3rd Place | 50-44 | 35% |
About five weeks later, through the games of August 27th, the Mets' odds of a postseason appearance had plunged to 1%. The Yankees' chances had jumped to 98%.
| Team | Standing | Record | Playoff odds |
|---|---|---|---|
| Mets | 3rd Place | 61-67 | 1% |
| Yankees | 2nd Place | 76-52 | 98% |
Let's examine what happened by looking at how each team's ELO score changed from game to game.
In this exercise, we will stick with base R (plus knitr::kable for printing the tables).
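For readers new to the rating system, here is a minimal sketch of the textbook Elo update, to show why a single win or loss moves a rating by only a few points. It is not FiveThirtyEight's full model, which layers further adjustments on top; the function names `elo_expected` and `elo_update` and the value `k = 4` are assumptions chosen for illustration.

```r
# Textbook Elo update (illustrative only, not FiveThirtyEight's full model).
# elo_expected(): probability that a team rated `rating` beats `opponent`.
elo_expected <- function(rating, opponent) {
  1 / (1 + 10^((opponent - rating) / 400))
}

# elo_update(): new rating after one game.
# `k` controls how far one result moves the rating; 4 is an assumed value.
elo_update <- function(rating, opponent, won, k = 4) {
  rating + k * (as.numeric(won) - elo_expected(rating, opponent))
}

# Two evenly matched teams: the expected score is 0.5,
# so a win is worth +2 points and a loss costs 2 points.
elo_update(1500, 1500, won = TRUE)   # 1502
elo_update(1500, 1500, won = FALSE)  # 1498
```

Beating a stronger opponent earns more than beating a weaker one, because the expected score is lower to begin with.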
1. Retrieve data from fivethirtyeight.com
mlb_elo <-read.csv("https://projects.fivethirtyeight.com/mlb-api/mlb_elo.csv")
2. Isolate the Mets and Yankees games from the past month
# keep games involving either team, dated 2021-07-29 through 2021-08-27
mlb_elo_2 <- subset(
  mlb_elo,
  (team1 == "NYM" | team2 == "NYM" | team1 == "NYY" | team2 == "NYY") &
    date > "2021-07-28" & date < "2021-08-28"
)[c("date", "team1", "team2", "elo1_post", "elo2_post", "score1", "score2")]
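The date filter above compares strings. That works because the dates are in ISO `YYYY-MM-DD` form, which sorts in calendar order, and because `read.csv` in R 4.0 or later reads them as character rather than factor. A slightly more defensive variant, sketched here on a toy data frame, converts the column to `Date` first. The `games` data frame and its `game` labels are made up for illustration.

```r
# Toy rows for illustration only (not taken from the FiveThirtyEight file)
games <- data.frame(
  date = c("2021-07-28", "2021-07-29", "2021-08-27", "2021-08-28"),
  game = c("before window", "first day", "last day", "after window"),
  stringsAsFactors = FALSE
)

# Convert once, then compare Date to Date instead of string to string
games$date <- as.Date(games$date)
in_window <- subset(
  games,
  date > as.Date("2021-07-28") & date < as.Date("2021-08-28")
)

in_window$game  # "first day" "last day"
```

The same two lines (`as.Date` on the column, `as.Date` on the bounds) could be applied to `mlb_elo` before subsetting.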
3. Rearrange the data into 2 tables to make it more readable, i.e., the Mets table would look like:

| date | opponent | outcome | elo |
|---|---|---|---|
| 2021-08-27 | WSN | L | 1490.045 |
| 2021-08-26 | SFG | L | 1491.586 |
# In the FiveThirtyEight file, team1 is the home team and team2 is the visitor
mets_home <- subset(mlb_elo_2, team1 == "NYM")[c("date", "team2", "elo1_post", "score1", "score2")]
names(mets_home) <- c("date", "opponent", "elo", "score_Mets", "score_Opponent")
mets_home$outcome <- ifelse(mets_home$score_Mets > mets_home$score_Opponent, "W", "L")
mets_home$loc <- "H"
mets_away <- subset(mlb_elo_2, team2 == "NYM")[c("date", "team1", "elo2_post", "score2", "score1")]
names(mets_away) <- c("date", "opponent", "elo", "score_Mets", "score_Opponent")
mets_away$outcome <- ifelse(mets_away$score_Mets > mets_away$score_Opponent, "W", "L")
mets_away$loc <- "A"
mets <- rbind(mets_home, mets_away)
# sort by date (oldest first) and number the games from 1
mets <- mets[order(mets$date, decreasing = FALSE), ]
rownames(mets) <- seq_len(nrow(mets))
knitr::kable(mets, caption = 'Mets: What Happened in the Past Month?')
Mets: What Happened in the Past Month?

| date | opponent | elo | score_Mets | score_Opponent | outcome | loc |
|---|---|---|---|---|---|---|
| 2021-07-29 | ATL | 1509.421 | 3 | 6 | L | H |
| 2021-07-30 | CIN | 1506.525 | 2 | 6 | L | H |
| 2021-07-31 | CIN | 1507.809 | 5 | 4 | W | H |
| 2021-08-01 | CIN | 1504.194 | 1 | 7 | L | H |
| 2021-08-02 | FLA | 1501.963 | 3 | 6 | L | A |
| 2021-08-03 | FLA | 1500.574 | 4 | 5 | L | A |
| 2021-08-04 | FLA | 1502.547 | 5 | 3 | W | A |
| 2021-08-05 | FLA | 1500.728 | 2 | 4 | L | A |
| 2021-08-06 | PHI | 1499.111 | 2 | 4 | L | A |
| 2021-08-07 | PHI | 1497.515 | 3 | 5 | L | A |
| 2021-08-08 | PHI | 1495.595 | 0 | 3 | L | A |
| 2021-08-10 | WSN | 1496.869 | 8 | 7 | W | H |
| 2021-08-12 | WSN | 1500.131 | 5 | 4 | W | H |
| 2021-08-12 | WSN | 1498.891 | 4 | 1 | W | H |
| 2021-08-13 | LAD | 1499.040 | 5 | 6 | L | H |
| 2021-08-14 | LAD | 1497.959 | 1 | 2 | L | H |
| 2021-08-15 | LAD | 1494.506 | 4 | 14 | L | H |
| 2021-08-16 | SFG | 1493.229 | 5 | 7 | L | A |
| 2021-08-17 | SFG | 1492.264 | 2 | 3 | L | A |
| 2021-08-18 | SFG | 1495.823 | 6 | 2 | W | A |
| 2021-08-19 | LAD | 1494.496 | 1 | 4 | L | A |
| 2021-08-20 | LAD | 1493.655 | 2 | 3 | L | A |
| 2021-08-21 | LAD | 1492.822 | 3 | 4 | L | A |
| 2021-08-22 | LAD | 1497.312 | 7 | 2 | W | A |
| 2021-08-24 | SFG | 1493.922 | 0 | 8 | L | H |
| 2021-08-25 | SFG | 1492.748 | 2 | 3 | L | H |
| 2021-08-26 | SFG | 1491.586 | 2 | 3 | L | H |
| 2021-08-27 | WSN | 1490.045 | 1 | 2 | L | H |
4. Do the same for the Yankees
# In the FiveThirtyEight file, team1 is the home team and team2 is the visitor
yankees_home <- subset(mlb_elo_2, team1 == "NYY")[c("date", "team2", "elo1_post", "score1", "score2")]
names(yankees_home) <- c("date", "opponent", "elo", "score_Yankees", "score_Opponent")
yankees_home$outcome <- ifelse(yankees_home$score_Yankees > yankees_home$score_Opponent, "W", "L")
yankees_home$loc <- "H"
yankees_away <- subset(mlb_elo_2, team2 == "NYY")[c("date", "team1", "elo2_post", "score2", "score1")]
names(yankees_away) <- c("date", "opponent", "elo", "score_Yankees", "score_Opponent")
yankees_away$outcome <- ifelse(yankees_away$score_Yankees > yankees_away$score_Opponent, "W", "L")
yankees_away$loc <- "A"
yankees <- rbind(yankees_home, yankees_away)
# sort by date (oldest first) and number the games from 1
yankees <- yankees[order(yankees$date, decreasing = FALSE), ]
rownames(yankees) <- seq_len(nrow(yankees))
knitr::kable(yankees, caption = 'Yankees: What Happened in the Past Month?')
Yankees: What Happened in the Past Month?

| date | opponent | elo | score_Yankees | score_Opponent | outcome | loc |
|---|---|---|---|---|---|---|
| 2021-07-29 | TBD | 1525.869 | 0 | 14 | L | A |
| 2021-07-30 | FLA | 1527.681 | 3 | 1 | W | A |
| 2021-07-31 | FLA | 1529.469 | 4 | 2 | W | A |
| 2021-08-01 | FLA | 1531.233 | 3 | 1 | W | A |
| 2021-08-02 | BAL | 1526.726 | 1 | 7 | L | H |
| 2021-08-03 | BAL | 1530.044 | 13 | 1 | W | H |
| 2021-08-04 | BAL | 1532.319 | 10 | 3 | W | H |
| 2021-08-05 | SEA | 1533.717 | 5 | 3 | W | H |
| 2021-08-06 | SEA | 1534.776 | 3 | 2 | W | H |
| 2021-08-07 | SEA | 1535.824 | 5 | 4 | W | H |
| 2021-08-08 | SEA | 1533.616 | 0 | 2 | L | H |
| 2021-08-09 | KCR | 1535.031 | 8 | 6 | W | A |
| 2021-08-10 | KCR | 1532.001 | 4 | 8 | L | A |
| 2021-08-11 | KCR | 1533.775 | 5 | 2 | W | A |
| 2021-08-12 | CHW | 1532.420 | 8 | 9 | L | A |
| 2021-08-14 | CHW | 1534.394 | 7 | 5 | W | A |
| 2021-08-15 | CHW | 1536.342 | 5 | 3 | W | A |
| 2021-08-16 | ANA | 1537.357 | 2 | 1 | W | H |
| 2021-08-17 | BOS | 1540.442 | 2 | 0 | W | H |
| 2021-08-17 | BOS | 1538.910 | 5 | 3 | W | H |
| 2021-08-18 | BOS | 1542.302 | 5 | 2 | W | H |
| 2021-08-19 | MIN | 1543.577 | 7 | 5 | W | H |
| 2021-08-20 | MIN | 1546.327 | 10 | 2 | W | H |
| 2021-08-21 | MIN | 1548.568 | 7 | 1 | W | H |
| 2021-08-23 | ATL | 1551.461 | 5 | 1 | W | A |
| 2021-08-24 | ATL | 1552.961 | 5 | 4 | W | A |
| 2021-08-26 | OAK | 1554.381 | 7 | 6 | W | A |
| 2021-08-27 | OAK | 1557.711 | 8 | 2 | W | A |
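The tables list each team's rating after every game, not the change itself. One way to get the game-to-game change is `diff()`, sketched here on the first four Mets rows from the table above. The `mets_sample` data frame and the `elo_change` column are names introduced for this illustration.

```r
# First four Mets games, copied from the table above
mets_sample <- data.frame(
  date    = c("2021-07-29", "2021-07-30", "2021-07-31", "2021-08-01"),
  elo     = c(1509.421, 1506.525, 1507.809, 1504.194),
  outcome = c("L", "L", "W", "L"),
  stringsAsFactors = FALSE
)

# diff() returns the change from the previous game; the first row has no
# earlier game inside this sample, so its change is left as NA
mets_sample$elo_change <- c(NA, diff(mets_sample$elo))
mets_sample$elo_change  # NA -2.896 1.284 -3.615

# Tally wins and losses
table(mets_sample$outcome)
```

Applied to the full `mets` and `yankees` data frames, the same two steps would give the per-game swing and the win-loss split for the month.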
5. Now plot the progression of the ELO scores.
# set the y limits just below the lowest and just above the highest ELO score
# the x axis is the game number (1 to 28), not the calendar day
plot(yankees$elo, type="l", ylab="ELO", xlab="Game", col="blue", ylim=c(1488,1560), main="ELO : One Month in New York Baseball")
lines(mets$elo, col="red", lty=1)
legend("topleft", legend=c("Mets", "Yankees"), col=c("red", "blue"), lty=1, cex=0.8, text.font=4, bg='lightblue')
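The y limits in the plot call are typed in by hand. If the date window changes, they can instead be derived from the data. Below is a minimal sketch, where `elo_limits` and its `pad` argument are hypothetical names introduced here.

```r
# elo_limits(): y-axis limits covering every series passed in,
# padded by `pad` rating points and rounded outward to whole numbers
elo_limits <- function(..., pad = 2) {
  r <- range(c(...), na.rm = TRUE)
  c(floor(r[1] - pad), ceiling(r[2] + pad))
}

# Lowest and highest values for each team, taken from the tables above
mets_extremes    <- c(1509.421, 1490.045)
yankees_extremes <- c(1525.869, 1557.711)

elo_limits(mets_extremes, yankees_extremes)  # 1488 1560
```

With the full data frames, `ylim = elo_limits(mets$elo, yankees$elo)` would reproduce the same limits used in the plot.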

Conclusion
Both teams now have records that conform to preseason expectations, so the law of large numbers goes a long way toward explaining the change: over a longer run of games, results drift back toward a team's underlying quality.
The Mets' recent record can be explained in part by the loss of their best pitcher, plus a brutal stretch against some very good teams (the Dodgers and Giants account for 13 of the 28 games).
The Yankees, on the other hand, simply got hot, as shown by the climb in their ELO score from 1525.869 to 1557.711, and are currently playing at a high level.