Assignment 1

by Catherine Cho

Introduction

The following data was drawn from FiveThirtyEight, which attempts to predict NBA game outcomes by rating teams. The latest rating methodology is the result of multiple iterations over a period of several years. Initially, teams were rated using the Elo system, which tracks each team’s winning history and margin of victory. However, Elo fails to account for the signing of new players or the loss of players (whether due to injuries or rest) and its impact on overall team strength. Because all Elo ratings are based on on-court results, the system takes time to reflect talent lost or gained. Later iterations still built on the Elo system, but they did not capture other major factors, such as mega-talented teams that are known to pull back in effort towards the end of a regular season. The methodology then moved away from the Elo system altogether and adopted a new projection algorithm, “CARMELO”, which rates teams based on the current level of talent on the roster.
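To make the Elo idea concrete, here is a minimal sketch of a generic Elo rating in R. It is not FiveThirtyEight's implementation: the function names are hypothetical, the 400-point scale and `k = 20` are illustrative assumptions, and the margin-of-victory adjustment mentioned above is left out.

```r
# Generic Elo sketch (illustrative assumptions: 400-point scale, k = 20,
# no margin-of-victory or home-court adjustment).

# Expected probability that team A beats team B, given their ratings.
elo_expected <- function(rating_a, rating_b) {
  1 / (1 + 10^((rating_b - rating_a) / 400))
}

# Ratings after one game: the winner gains what the loser gives up.
elo_update <- function(rating_a, rating_b, a_won, k = 20) {
  expected_a <- elo_expected(rating_a, rating_b)
  shift <- k * (as.numeric(a_won) - expected_a)
  c(a = rating_a + shift, b = rating_b - shift)
}

elo_expected(1600, 1500)               # the higher-rated team is favored
elo_update(1600, 1500, a_won = FALSE)  # an upset moves both ratings
```

Because the update depends only on the result of the game, a rating built this way cannot react to a roster change until games have been played, which is the limitation described above.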

The current player projections are built on FiveThirtyEight’s RAPTOR metric, which stands for Robust Algorithm (using) Player Tracking (and) On/Off Ratings. A player’s performance history is used as a template to predict how the player may perform in the future. Each player is then given projected offensive and defensive ratings for the next several seasons, which affect the player’s influence on the overall team rating. Under this model, each player is rated per 100 possessions. See the link below for more information on the methodology.

https://fivethirtyeight.com/methodology/how-our-nba-predictions-work/

Current Projection Methodology

The current methodology uses both the Elo system and the Raptor method. According to FiveThirtyEight’s extensive testing, giving Elo a weight of about 35% in the overall team rating, with the remainder going to the Raptor-based talent rating, has yielded the best predictive results. This percentage varies, however, depending on how much the current roster contributed to the Elo rating.
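As a rough illustration of the weighting, the sketch below blends two ratings with a fixed Elo weight. The linear form, the function name `blend_rating`, and the example ratings are assumptions made for illustration. In the actual model the weight varies with the roster.

```r
# Hypothetical linear blend of an Elo rating and a talent-based rating.
# elo_weight = 0.35 is a fixed stand-in for a weight that actually varies.
blend_rating <- function(elo_rating, talent_rating, elo_weight = 0.35) {
  stopifnot(elo_weight >= 0, elo_weight <= 1)
  elo_weight * elo_rating + (1 - elo_weight) * talent_rating
}

# Example with made-up ratings: the result sits closer to the talent rating.
blend_rating(elo_rating = 1600, talent_rating = 1500)
```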

Purpose

This report looks at both the original Elo rating and the current Raptor rating system and compares their predictions against the actual outcome of each game in the 2020-21 season.

library(readr)
urlfile<-"https://projects.fivethirtyeight.com/nba-model/nba_elo_latest.csv"
forecast<-read_csv(url(urlfile))

## Warning: One or more parsing issues, see `problems()` for details

## Rows: 1171 Columns: 24

## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr   (2): team1, team2
## dbl  (14): season, neutral, elo1_pre, elo2_pre, elo_prob1, elo_prob2, elo1_p...
## lgl   (7): playoff, carm-elo1_pre, carm-elo2_pre, carm-elo_prob1, carm-elo_p...
## date  (1): date

## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
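One way to investigate the parsing warning is to declare the column types up front and then inspect `problems()`. The sketch below uses a small inline CSV with made-up team codes and values, borrowing a few column names from the output above. Treating every column other than `date`, `team1` and `team2` as numeric is an assumption, not a confirmed description of the file.

```r
library(readr)

# Hypothetical two-row extract; values are made up for illustration.
sample_csv <- I(paste0(
  "date,team1,team2,elo_prob1,carm-elo_prob1,score1\n",
  "2021-01-01,AAA,BBB,0.61,,101\n",
  "2021-01-02,CCC,DDD,0.68,,99\n"
))

games <- read_csv(
  sample_csv,
  col_types = cols(
    date = col_date(),
    team1 = col_character(),
    team2 = col_character(),
    .default = col_double()  # assumption: all remaining columns are numeric
  )
)

problems(games)  # lists any values readr could not parse
spec(games)      # shows the column specification actually used
```

Declaring the types also keeps empty columns such as `carm-elo_prob1` from being guessed as logical.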

Subsetting the data to obtain a data frame of Elo vs. Raptor win probabilities per game

comparison<-subset(forecast,select=c(team1,team2,elo_prob1,elo_prob2,raptor_prob1,raptor_prob2,score1,score2))
summary(comparison)

##     team1              team2             elo_prob1        elo_prob2      
##  Length:1171        Length:1171        Min.   :0.1298   Min.   :0.05332  
##  Class :character   Class :character   1st Qu.:0.5131   1st Qu.:0.25201  
##  Mode  :character   Mode  :character   Median :0.6406   Median :0.35943  
##                                        Mean   :0.6256   Mean   :0.37439  
##                                        3rd Qu.:0.7480   3rd Qu.:0.48688  
##                                        Max.   :0.9467   Max.   :0.87023  
##   raptor_prob1      raptor_prob2          score1          score2     
##  Min.   :0.05714   Min.   :0.005665   Min.   : 73.0   Min.   : 75.0  
##  1st Qu.:0.46411   1st Qu.:0.260065   1st Qu.:104.0   1st Qu.:103.0  
##  Median :0.61677   Median :0.383225   Median :112.0   Median :111.0  
##  Mean   :0.59679   Mean   :0.403215   Mean   :112.6   Mean   :111.4  
##  3rd Qu.:0.73994   3rd Qu.:0.535889   3rd Qu.:121.0   3rd Qu.:120.0  
##  Max.   :0.99434   Max.   :0.942861   Max.   :154.0   Max.   :154.0
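The means in the summary suggest that each pair of probabilities sums to 1 (for example, 0.6256 + 0.37439 for Elo). A quick check along the following lines can confirm that, shown here on a hypothetical two-row data frame named `toy` rather than on `comparison` itself.

```r
# Hypothetical rows standing in for `comparison`.
toy <- data.frame(
  elo_prob1 = c(0.64, 0.30),
  elo_prob2 = c(0.36, 0.70)
)

# TRUE when the two probabilities are complementary in every row.
complementary <- isTRUE(all.equal(toy$elo_prob1 + toy$elo_prob2,
                                  rep(1, nrow(toy))))

# If they are complementary, the favorite is the side with probability > 0.5.
favored <- ifelse(toy$elo_prob1 > 0.5, "team1", "team2")

complementary
favored
```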

Filtering the data frame “comparison” to separate the games team 1 won from the games team 2 won

library("dplyr")

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

#subset of data when team1 wins
team1wins<-filter(comparison,score1>score2)
#subset of data when team2 wins
team2wins<-filter(comparison,score2>score1)

Calculation of how often the Elo system and the Raptor system predict the winner correctly.

#percentage of games in which Elo picks the winner correctly
#a team 1 win is a correct pick when Elo favored team 1
elo_team1_correct<-filter(team1wins,elo_prob1>elo_prob2)
#a team 2 win is a correct pick only when Elo favored team 2
elo_team2_correct<-filter(team2wins,elo_prob2>elo_prob1)
elo_correct<-nrow(elo_team1_correct)+nrow(elo_team2_correct)
elo_percentage_correct<-(elo_correct/nrow(comparison))*100
cat("The percentage of when the Elo system predicts correctly is",elo_percentage_correct,"%")

## The percentage of when the Elo system predicts correctly is 76.94278 %

#percentage of games in which Raptor picks the winner correctly
#a team 1 win is a correct pick when Raptor favored team 1
raptor_team1_correct<-filter(team1wins,raptor_prob1>raptor_prob2)
#a team 2 win is a correct pick only when Raptor favored team 2
raptor_team2_correct<-filter(team2wins,raptor_prob2>raptor_prob1)
raptor_correct<-nrow(raptor_team1_correct)+nrow(raptor_team2_correct)
raptor_percentage_correct<-(raptor_correct/nrow(comparison))*100
cat("The percentage of when the Raptor rating system predicts correctly is",raptor_percentage_correct,"%")

## The percentage of when the Raptor rating system predicts correctly is 69.51324 %
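As a cross-check on percentages like these, accuracy can be computed in a single vectorized step and compared with a simple baseline: the share of games in which team 1 was favored. The sketch below uses a hypothetical four-game data frame, and the helper names `accuracy_pct` and `favored_team1_pct` are illustrative. An accuracy figure that exactly matches the baseline would suggest that the team 2 filter is reusing the team 1 condition.

```r
# Hypothetical games standing in for `comparison`.
toy <- data.frame(
  elo_prob1 = c(0.70, 0.60, 0.40, 0.80),
  elo_prob2 = c(0.30, 0.40, 0.60, 0.20),
  score1    = c(110, 98, 101, 95),
  score2    = c(100, 105, 108, 99)
)

# Percentage of games in which the favored team actually won.
# Equal probabilities are treated as favoring team 2 in this sketch.
accuracy_pct <- function(prob1, prob2, score1, score2) {
  favored_team1 <- prob1 > prob2
  team1_won <- score1 > score2
  mean(favored_team1 == team1_won) * 100
}

# Baseline: percentage of games in which team 1 was favored, win or lose.
favored_team1_pct <- function(prob1, prob2) {
  mean(prob1 > prob2) * 100
}

accuracy_pct(toy$elo_prob1, toy$elo_prob2, toy$score1, toy$score2)
favored_team1_pct(toy$elo_prob1, toy$elo_prob2)
```

In this toy data the favorite wins two of four games while team 1 is favored in three of four, so the two figures differ.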

Concluding Remarks

There are many factors to consider when designing a predictive model of NBA game outcomes. Roster changes and individual player history affect the Elo system and the Raptor system differently, and because rosters are volatile, the data scientist has to decide case by case which model is appropriate for the dataset collected. In this case, when Raptor and Elo are analyzed separately, Elo has a higher reported percentage of correct predictions than Raptor. This may be a result of analyzing only the 2020-21 season and may reflect the specific players on the rosters that season.