Overview

This FiveThirtyEight article, https://fivethirtyeight.com/features/how-our-mlb-predictions-work/, explains how the writers settled on their current rating system for predicting a baseball team’s likelihood of winning a game. It describes the factors the system considers and the rationale behind their weightings. The ultimate goal of the system is to predict, each season, which teams will make the postseason and which one will go on to win the World Series.

Data Frame and Modifications

The selected dataset is the full set of games from the 2023 MLB season. The rating_prob1 and rating_prob2 columns stand out because they already account for a team’s rating, home-field advantage, travel, rest, and starting pitcher. In each game, the team whose rating probability is above 0.5 is the one more likely to win. The rating_prob columns were renamed to rating_win_probability to clarify what they are meant to predict.

The post-game ratings are omitted because they are only known once a game has been played, so they are not useful for predicting that game. The resulting subset also leaves out the Elo ratings because those are more useful for tracking a franchise’s performance over the years. Within the frame of a single season, they add little value beyond that of rating_prob.

# tidyverse attaches dplyr (for select); RCurl provides getURL
library(tidyverse)
library(openintro)
library(RCurl)
# Download the raw CSV text from GitHub and parse it into a data frame
mlb_elo_latest_url <- "https://raw.githubusercontent.com/Megabuster/Data607/refs/heads/main/data/Assignment1/mlb_elo_latest.csv"
raw_text <- getURL(mlb_elo_latest_url)
mlb_elo_latest_dataset <- read.csv(text = raw_text)
# Keep the two teams and their pre-game win probabilities,
# renaming the probability columns in the same select() call
mlb_elo_parsed_dataframe <- select(
  mlb_elo_latest_dataset,
  team1, team2,
  rating_win_probability1 = rating_prob1,
  rating_win_probability2 = rating_prob2
)
# Show the first 10 games
mlb_elo_parsed_dataframe[1:10,]
##    team1 team2 rating_win_probability1 rating_win_probability2
## 1    STL   CIN               0.5758205               0.4241795
## 2    SEA   TEX               0.5046097               0.4953903
## 3    NYM   PHI               0.5386678               0.4613322
## 4    MIL   CHC               0.5574756               0.4425244
## 5    KCR   NYY               0.3475025               0.6524975
## 6    DET   CLE               0.4300340               0.5699660
## 7    COL   MIN               0.4232595               0.5767405
## 8    CHW   SDP               0.4533101               0.5466899
## 9    ARI   HOU               0.4742784               0.5257216
## 10   TOR   TBD               0.4952234               0.5047766
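
As a quick sanity check on the reading of these columns given above, the sketch below rebuilds three of the printed rows by hand, confirms that each game’s two probabilities sum to 1, and labels the team above 0.5. The favorite column and the prob_sums variable are hypothetical names introduced for illustration; they are not part of the original dataset.

```r
# Three games copied by hand from the output above (rows 1, 2, and 5)
games <- data.frame(
  team1 = c("STL", "SEA", "KCR"),
  team2 = c("CIN", "TEX", "NYY"),
  rating_win_probability1 = c(0.5758205, 0.5046097, 0.3475025),
  rating_win_probability2 = c(0.4241795, 0.4953903, 0.6524975),
  stringsAsFactors = FALSE
)

# The two probabilities for a game should sum to 1
prob_sums <- games$rating_win_probability1 + games$rating_win_probability2

# Hypothetical helper column: the team whose probability is above 0.5
games$favorite <- ifelse(games$rating_win_probability1 > 0.5,
                         games$team1, games$team2)
games
```

Only the sign of the comparison with 0.5 matters for the label, so a near coin flip such as SEA vs. TEX is tagged the same way as a lopsided game such as KCR vs. NYY.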

Findings

The article compares the historic Elo system with the team ratings to show why FiveThirtyEight uses only the latter for its seasonal predictions. The current team ratings system is adaptive: it starts from team strength and then accounts for factors, such as team rest, that can affect how a team performs. Although this system is better than the historic Elo system, it can still be improved.

Recommendations

Curiously, home-field advantage is given a flat 24 rating points, with an adjustment when there are no fans. Home-field advantage is far more nuanced than the current iteration of the model gives it credit for. The FiveThirtyEight article references a FanGraphs guide to park factors, https://library.fangraphs.com/the-beginners-guide-to-understanding-park-factors/, but does not account for the differences between parks that the guide describes. Baseball fields do not follow a single standard for dimensions, so it is unlikely that all stadiums confer a comparable home-field advantage. The field on which a game is played is a known, observable factor, which makes it a good candidate for improving the model.
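
To give a sense of scale, the sketch below converts a rating-point bonus into a win probability. It assumes the standard Elo logistic formula (a 400-point scale) applies to these rating points. The function name elo_win_probability and the park-specific bonuses are hypothetical: the bonus values are made up for illustration and are not estimates for any real stadium.

```r
# Standard Elo logistic curve: maps a rating difference to a win probability.
# Assumption: the model's rating points follow this 400-point scale.
elo_win_probability <- function(rating_diff) {
  1 / (1 + 10^(-rating_diff / 400))
}

# Two evenly matched teams; the home team gets the flat 24-point bonus
flat_bonus <- 24
p_flat <- elo_win_probability(flat_bonus)

# HYPOTHETICAL park-specific bonuses (illustrative values only)
park_bonus <- c(park_a = 14, park_b = 24, park_c = 34)
p_park <- elo_win_probability(park_bonus)

round(c(flat = p_flat, p_park), 3)
```

Under that assumption, the flat bonus gives an otherwise even home team a win probability of roughly 0.534. Letting the bonus vary by park would nudge that figure up or down for each stadium, which is the kind of adjustment recommended here.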
