The following data was drawn from FiveThirtyEight, which attempts to predict NBA game outcome by ranking team. The latest ranking methodology is a result of mutliple iterations over a period of several years. Initially, teams were ranked using the Elo system, which tracks the teams winning history and margin of victory but it fails to account for the signing on of new members or loss of members (whether due to injuries or rest) and it’s impact on overall team ranking. All Elo results are based on on-court results and the system would take time to update and reflect the talent lost or gained via team ranking. The methodology was further developed using the Elo system but again, the issue was not resolved in capturing other major factors such as a mega-talented teams that are known to pull back in effort towards the end of a regular season. The methodology moved away from Elo system altogether and adopted a new projection algorithm, “CARMELO”, which ranks teams based on the current level of talent on the roster.
FiveThirtyEight’s RAPTOR metric, which stands for Robust Algorithm (using) Player Tracking (and) On/Off Ratings. The player’s performance history is used as a template to predict how the player may perform in the future. Then the player is ranked for offensive and defensive ratings for the next several seasons, which will effect the player’s influence on the overall team rating. Using this model, each player is rated per 100 possessions. See the link below for more information on the methodology.
https://fivethirtyeight.com/methodology/how-our-nba-predictions-work/
The current methodology uses both the Elo system as well as the Raptor method. Through FiveThirtyEight’s extensive testing, giving Elo and Raptor a 35% weight to the overall team ranking has yielded the best predictive results. Although this perecentage varies depending on the current roster and how much the current roster contributes to the Elo rating.
This report will look at both the original Elo rating as well as the current Raptor rating system and compare against each team’s outcome per game in the 2020-21 season.
library(readr)
urlfile<-"https://projects.fivethirtyeight.com/nba-model/nba_elo_latest.csv"
forecast<-read_csv(url(urlfile))
## Warning: One or more parsing issues, see `problems()` for details
## Rows: 1171 Columns: 24
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): team1, team2
## dbl (14): season, neutral, elo1_pre, elo2_pre, elo_prob1, elo_prob2, elo1_p...
## lgl (7): playoff, carm-elo1_pre, carm-elo2_pre, carm-elo_prob1, carm-elo_p...
## date (1): date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
comparison<-subset(forecast,select=c(team1,team2,elo_prob1,elo_prob2,raptor_prob1,raptor_prob2,score1,score2))
summary(comparison)
## team1 team2 elo_prob1 elo_prob2
## Length:1171 Length:1171 Min. :0.1298 Min. :0.05332
## Class :character Class :character 1st Qu.:0.5131 1st Qu.:0.25201
## Mode :character Mode :character Median :0.6406 Median :0.35943
## Mean :0.6256 Mean :0.37439
## 3rd Qu.:0.7480 3rd Qu.:0.48688
## Max. :0.9467 Max. :0.87023
## raptor_prob1 raptor_prob2 score1 score2
## Min. :0.05714 Min. :0.005665 Min. : 73.0 Min. : 75.0
## 1st Qu.:0.46411 1st Qu.:0.260065 1st Qu.:104.0 1st Qu.:103.0
## Median :0.61677 Median :0.383225 Median :112.0 Median :111.0
## Mean :0.59679 Mean :0.403215 Mean :112.6 Mean :111.4
## 3rd Qu.:0.73994 3rd Qu.:0.535889 3rd Qu.:121.0 3rd Qu.:120.0
## Max. :0.99434 Max. :0.942861 Max. :154.0 Max. :154.0
library("dplyr")
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
#subset of data when team1 wins
team1wins<-filter(comparison,score1>score2)
#subset of data when team2 wins
team2wins<-filter(comparison,score2>score1)
#percentage of when Elo gets the prediction correct
elo_team1_correct<-filter(team1wins,elo_prob1>elo_prob2)
elo_team2_correct<-filter(team2wins,elo_prob1>elo_prob2)
elo_correct<-nrow(elo_team1_correct)+nrow(elo_team2_correct)
elo_percentage_correct<-(elo_correct/nrow(comparison))*100
cat("The percentage of when the Elo system predicts correctly is",elo_percentage_correct,"%")
## The percentage of when the Elo system predicts correctly is 76.94278 %
#percentage of when raptor gets the prediction correct
raptor_team1_correct<-filter(team1wins,raptor_prob1>raptor_prob2)
raptor_team2_correct<-filter(team2wins,raptor_prob1>raptor_prob2)
raptor_correct<-nrow(raptor_team1_correct)+nrow(raptor_team2_correct)
raptor_percentage_correct<-(raptor_correct/nrow(comparison))*100
cat("The percentage of when the Raptor rating system predicts correctly is",raptor_percentage_correct,"%")
## The percentage of when the Raptor rating system predicts correctly is 69.51324 %
There are many factors to consider when designing a predictive model of NBA game outcomes. The Roster and individual player history will make significant impact on the Elo system versus the Raptor system and due to its volatile nature, the data scientist will have to make case by case decision on the appropriate model for the dataset collected. In this case, when analyzing the Raptor and Elo separately, the Elo has a higher percentage of correct predictions than Raptor. This may be a result of analyzing a dataset for 2020-2021 only and may have to do with the specific players in the roster for those years.