Overview
https://fivethirtyeight.com/features/how-our-mlb-predictions-work/
This article explains how FiveThirtyEight’s writers arrived at their
current rating system for predicting a baseball team’s likelihood of
winning a game. It covers the factors the model considers and the
rationale behind their weightings. The ultimate goal of the system is to
predict, each season, which teams will make the postseason and which
will go on to win the World Series.
Data Frame and Modifications
The selected dataset is the full set of games from the 2023 MLB season.
The rating_prob1 and rating_prob2 columns are notable because they
already account for a team’s rating, home-field advantage, travel, rest,
and starting pitcher. In each game, the team whose rating probability is
above 0.5 is favored to win. The rating_prob columns were renamed to
rating_win_probability to clarify what they are meant to predict.
The post-game ratings are omitted because they are only known after a
game is played and so cannot be used to predict that game. The resulting
subset also excludes the Elo ratings because those are more useful for
tracking a franchise’s performance over the years. Within the frame of a
single season, they add little value beyond that of rating_prob.
library(tidyverse)  # attaches dplyr, so a separate library(dplyr) call is not needed
library(openintro)
library(RCurl)
# Download the raw CSV of the latest MLB Elo data
mlb_elo_latest_url <- "https://raw.githubusercontent.com/Megabuster/Data607/refs/heads/main/data/Assignment1/mlb_elo_latest.csv"
raw_text <- getURL(mlb_elo_latest_url)
mlb_elo_latest_dataset <- read.csv(text = raw_text)
# Keep the two teams and their pre-game rating probabilities,
# renaming the probability columns in the same select() call
mlb_elo_parsed_dataframe <- mlb_elo_latest_dataset %>%
  select(team1, team2,
         rating_win_probability1 = rating_prob1,
         rating_win_probability2 = rating_prob2)
# Show the first 10 games
mlb_elo_parsed_dataframe[1:10, ]
## team1 team2 rating_win_probability1 rating_win_probability2
## 1 STL CIN 0.5758205 0.4241795
## 2 SEA TEX 0.5046097 0.4953903
## 3 NYM PHI 0.5386678 0.4613322
## 4 MIL CHC 0.5574756 0.4425244
## 5 KCR NYY 0.3475025 0.6524975
## 6 DET CLE 0.4300340 0.5699660
## 7 COL MIN 0.4232595 0.5767405
## 8 CHW SDP 0.4533101 0.5466899
## 9 ARI HOU 0.4742784 0.5257216
## 10 TOR TBD 0.4952234 0.5047766
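Since the two probabilities for a game sum to 1, the favored team can be read directly from the renamed columns. The following is a minimal sketch of that step, assuming the column names shown above. The small `games` data frame copies three rows from the printed output, and the `favorite` column is an illustrative addition rather than part of the original analysis.

```r
# Illustrative subset: rows 1, 5, and 10 of the printed output above
games <- data.frame(
  team1 = c("STL", "KCR", "TOR"),
  team2 = c("CIN", "NYY", "TBD"),
  rating_win_probability1 = c(0.5758205, 0.3475025, 0.4952234),
  rating_win_probability2 = c(0.4241795, 0.6524975, 0.5047766),
  stringsAsFactors = FALSE
)

# Hypothetical helper column: the team whose pre-game probability exceeds 0.5
games$favorite <- ifelse(games$rating_win_probability1 > 0.5,
                         games$team1, games$team2)

# Sanity check: each game's two probabilities should sum to 1
stopifnot(all(abs(games$rating_win_probability1 +
                  games$rating_win_probability2 - 1) < 1e-6))

games[, c("team1", "team2", "favorite")]
```

The same `ifelse()` call could be applied to the full `mlb_elo_parsed_dataframe`, since it only relies on the four selected columns.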
Findings
The article compares the historical Elo system with the newer team
ratings to show why FiveThirtyEight uses only the latter for its
seasonal predictions. The current team rating system is adaptive: it
starts from team strength and then adjusts for factors such as rest that
can affect how a team performs. This system, while better than the
historical Elo system, can still be improved.
Recommendations
Curiously, home-field advantage is given a flat 24 rating points, with
an adjustment when there are no fans. Home-field advantage is more
nuanced than the current iteration of the model gives it credit for. The
FiveThirtyEight article references a FanGraphs article on park factors, https://library.fangraphs.com/the-beginners-guide-to-understanding-park-factors/,
but does not account for the differences between parks that the
FanGraphs article describes. Baseball fields do not follow uniform
dimension standards, so it is unlikely that all stadiums confer a
comparable home-field advantage. The field on which a game is played is
a known and observable factor, which makes a park-specific adjustment a
practical way to improve the model.
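To make the effect of a flat bonus concrete, the sketch below uses the standard Elo expected-score formula, which is an assumption here, as the source article's exact computation may differ. The function name `elo_win_prob`, the 1500 ratings, and the park-specific bonuses in `park_hfa` are all hypothetical placeholders, not values from FiveThirtyEight or FanGraphs. Under this formula, a flat 24-point bonus gives the home team roughly a 53.4% chance against an equally rated opponent.

```r
# Standard Elo expected-score formula with a home-field bonus in rating points.
# hfa = 24 is the flat value cited above; the team ratings are made up.
elo_win_prob <- function(home_rating, away_rating, hfa = 24) {
  1 / (1 + 10^(-((home_rating + hfa) - away_rating) / 400))
}

# Two evenly matched teams: the flat bonus alone sets the home win probability
flat <- elo_win_prob(1500, 1500)

# Hypothetical park-specific bonuses (placeholder values, not estimates)
park_hfa <- c(park_a = 18, park_b = 24, park_c = 30)
by_park <- elo_win_prob(1500, 1500, hfa = park_hfa)

round(flat, 3)
round(by_park, 3)
```

Real park-specific bonuses would have to be estimated from historical home and away results. The point of the sketch is only that swapping the constant for a per-park value is a small change to the formula.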