DATA 607 Final Project Presentation

May 13, 2020

Introduction

Goal: Analyze individual players’ advanced statistics as predictors of game success.

Step 1: Evaluate the impact of player contributions on a team’s single-game success

Step 2: Identify which advanced player statistic to use as a predictor of a team’s game outcome

Step 3: Attempt a machine learning model based on player statistics and game features

Data Sources

Basketball-reference.com:
- https://www.basketball-reference.com/ (screen scraping)
- Player advanced statistics by team roster
Ball Don’t Lie API:
- https://www.balldontlie.io/#get-all-stats (API)
- Full team schedule of game results
- Per-game stats for each player, 2012-13 season
NBA Elo data:
- https://github.com/fivethirtyeight/data/tree/master/nba-elo (CSV)
- Elo rating for each team for every game

Challenge

Goal: A fully automated data flow

Result: A piecemeal approach

Data acquisition:

API throttling
Data across sources out of sync or incomplete
String matching of player names is difficult (accent marks, name splitting)
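
The acquisition issues above can be sketched with two small R helpers. Both are hypothetical and not taken from the project's code: `fetch_pages_throttled()` assumes some function `fetch_page(page)` that returns one page of API results as a data frame, and `name_key()` assumes that replacing a few common accented letters and dropping punctuation is enough to align player names across sources.

```r
# Hypothetical helpers, not the project's original code.

# Call `fetch_page` once per page number, pausing between requests so a
# rate-limited API is not hit too quickly, then stack the pages into one data
# frame. `fetch_page` is an assumed function: page number in, data frame out.
fetch_pages_throttled <- function(fetch_page, pages, delay_sec = 1) {
  results <- vector("list", length(pages))
  for (i in seq_along(pages)) {
    results[[i]] <- fetch_page(pages[i])
    if (i < length(pages)) Sys.sleep(delay_sec)  # throttle before the next call
  }
  do.call(rbind, results)
}

# Build a join key from a player name: replace a few common accented letters,
# drop punctuation, collapse whitespace, and lower-case the result.
name_key <- function(x) {
  x <- chartr("\u00e1\u00e9\u00ed\u00f3\u00fa\u00f1\u00e7", "aeiounc", x)
  x <- gsub("[^A-Za-z[:space:]]", "", x)
  x <- trimws(gsub("[[:space:]]+", " ", x))
  tolower(x)
}

name_key("Manu Gin\u00f3bili") == name_key("Manu Ginobili")  # TRUE
```

Joining tables on `name_key(player_name)` rather than the raw name avoids mismatches caused only by accents or punctuation; names split or ordered differently between sources would still need extra handling.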

Data transformation:

Calculations
Data type coercion
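
As one example of the type coercion involved, a minutes-played field stored as text has to become a number before it can weight any statistic. The sketch below assumes the field arrives as "MM:SS" or plain "MM" strings and that a blank or missing value means the player did not play; the function name `minutes_to_numeric()` is hypothetical.

```r
# Hypothetical coercion helper: convert minutes-played strings such as
# "34:12" (minutes:seconds) or "34" to numeric minutes.
# Assumption: blank or NA entries mean the player did not play, so they become 0.
minutes_to_numeric <- function(x) {
  x <- as.character(x)
  x[is.na(x) | x == ""] <- "0"
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) {
    mins <- as.numeric(p[1])
    secs <- if (length(p) > 1) as.numeric(p[2]) else 0
    mins + secs / 60
  }, numeric(1))
}

minutes_to_numeric(c("34:12", "20", "", NA))
```

For those four inputs the function returns 34.2, 20, 0 and 0.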

Data: NBA season 2012-2013

Player Participation

Load management shows that the Spurs fare better when their star players participate in a game.

Advanced Statistics

Statistics calculated for each Spurs game of the 2012-13 NBA season

Chart legend: red = PER; green = VORP; blue = WS48; purple = BPM

PER: player efficiency rating
VORP: value over replacement player
WS48: win shares per 48 minutes
BPM: box plus/minus

Dataset for ML

Population: Every regular season game of the 2012-13 NBA season

Target variable: Point differential from the perspective of the better-rated team (the team with the higher pre-game Elo)
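
A minimal sketch of how this target could be computed, assuming each game row holds both teams' pre-game Elo and final score. The function and argument names are hypothetical, and ties in Elo are arbitrarily assigned to team A.

```r
# Point differential from the better-rated (higher pre-game Elo) team's side.
# Hypothetical helper; ties in Elo go to team A by assumption.
point_diff_high <- function(elo_a, pts_a, elo_b, pts_b) {
  a_is_high <- elo_a >= elo_b
  ifelse(a_is_high, pts_a - pts_b, pts_b - pts_a)
}

# Two made-up games: team A favored in the first, team B in the second
point_diff_high(elo_a = c(1600, 1450), pts_a = c(101, 95),
                elo_b = c(1500, 1550), pts_b = c(99, 110))
# [1]  2 15
```

A negative value would mean the better-rated team lost.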

Features:

high_elo: Elo rating of the better-rated team before the game
high_per: Cumulative PER of the better-rated team’s players, weighted by minutes played in the game
high_vorp: Cumulative VORP of the better-rated team’s players, weighted by minutes played in the game
high_bpm: Cumulative BPM of the better-rated team’s players, weighted by minutes played in the game
high_ws48: Cumulative win shares per 48 minutes of the better-rated team’s players, weighted by minutes played in the game
high_days: Days since the better-rated team’s previous game
high_home: Game location of the better-rated team (1: home; 0: away)
low_elo: Elo rating of the lesser-rated team before the game
low_per: Cumulative PER of the lesser-rated team’s players, weighted by minutes played in the game
low_vorp: Cumulative VORP of the lesser-rated team’s players, weighted by minutes played in the game
low_bpm: Cumulative BPM of the lesser-rated team’s players, weighted by minutes played in the game
low_ws48: Cumulative win shares per 48 minutes of the lesser-rated team’s players, weighted by minutes played in the game
low_days: Days since the lesser-rated team’s previous game
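
The slides do not spell out the minutes weighting, so the following is only one plausible reading: each player's advanced statistic is scaled by the fraction of a 48-minute game he played, and the scaled values are summed per team per game. `cumulative_stat()` and the `box` data frame are hypothetical, with made-up numbers.

```r
# One plausible minutes-weighted team stat (an assumption, not the documented
# method): sum of stat * (minutes played / 48) over the players in a game.
cumulative_stat <- function(stat, minutes) {
  ok <- !is.na(stat) & !is.na(minutes)  # skip players with missing values
  sum(stat[ok] * minutes[ok] / 48)
}

# Made-up box scores: one team's players in two games
box <- data.frame(
  game_id = c(1, 1, 1, 2, 2),
  per     = c(24, 18, 15, 24, 18),
  minutes = c(36, 30, 24, 0, 40)
)

# Apply per game, e.g. to build a high_per-style column
res <- sapply(split(box, box$game_id),
              function(g) cumulative_stat(g$per, g$minutes))
res
```

With these numbers the result is 36.75 for game 1 and 15 for game 2; a player who logged 0 minutes contributes nothing.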

Keras Code Snippet

The code chunk highlights the Keras API for R

library(keras)

# Build model: 13 input features -> two hidden ReLU layers -> 1 linear output
model <- keras_model_sequential()
model %>%
  layer_dense(units = 14, activation = 'relu', kernel_initializer = 'RandomNormal',
              input_shape = c(13)) %>%
  # Hidden units rule of thumb: training samples / (factor * (input neurons + output neurons))
  layer_dense(units = 65, activation = 'relu') %>%
  layer_dense(units = 1, activation = 'linear')

# Compile: regress point differential with MSE loss, track mean absolute error
model %>% compile(loss = 'mean_squared_error', optimizer = 'adam',
                  metrics = c('mae'))

# Train model (Keras expects matrices, not data frames); 20% held out for validation
history <- model %>% fit(as.matrix(X_train), y_train, epochs = 30,
                         batch_size = 50, validation_split = 0.2)

# Evaluate on the held-out set
model %>% evaluate(as.matrix(X_val), y_val)

# Predictions
pred <- data.frame(y = predict(model, as.matrix(X_val)))

Source: towardsdatascience.com/keras-and-r-predicting-blood-glucose-levels-with-the-sequential-model-596efe89a6b8
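
The comment in the model code refers to a commonly quoted rule of thumb for sizing a hidden layer, N_h = N_s / (alpha * (N_i + N_o)), where N_s is the number of training samples, N_i and N_o are the input and output neuron counts, and alpha is a scaling factor (often suggested between 2 and 10). A small R sketch of that arithmetic follows; the function name and the sample count in the example are hypothetical.

```r
# Rule-of-thumb hidden layer size: N_s / (alpha * (N_i + N_o)), rounded down.
hidden_units <- function(n_samples, n_in, n_out, alpha) {
  floor(n_samples / (alpha * (n_in + n_out)))
}

# 13 input features and 1 output, as in the model code; 980 samples is made up
hidden_units(n_samples = 980, n_in = 13, n_out = 1, alpha = 2)
# [1] 35
```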

ML Predictions

Measuring ML Predictions

Predictions are measured with the mean absolute percentage error (MAPE) regression loss. (Hint: Lower is better.)

All features: 1.179060
Suggested features: 1.110837
Personal features: 1.054806
All features but Elo: 1.158007
Linear Formula: 1.002978
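
For reference, MAPE averages |actual - predicted| / |actual| over all games. The base-R sketch below (the name `mape()` is hypothetical) returns a fraction, as scikit-learn's `mean_absolute_percentage_error` does. Because the target is a point differential, close games with a margin of only a few points make the ratio large, which may help explain scores above 1.

```r
# Mean absolute percentage error as a fraction (1.0 means 100% error).
# Undefined when an actual value is 0; NBA games cannot end in a tie,
# so a point differential is never 0.
mape <- function(actual, predicted) {
  mean(abs((actual - predicted) / actual))
}

# Made-up margins: the 2-point game alone contributes an error of 2.0
mape(actual = c(10, -4, 2), predicted = c(8, 2, 6))
# [1] 1.233333
```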

Linear formula:
Points_Differential = -2.124987 + (0.031464 * Elo_Difference) + (5.696513 * Home)
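
The linear formula can be written as an R function. The slides do not define its inputs, so this sketch assumes Elo_Difference = high_elo - low_elo and Home = high_home (1 if the better-rated team is at home, 0 if away); the function name is hypothetical.

```r
# Linear formula from the slide; the input definitions are assumptions.
predict_point_diff <- function(elo_difference, home) {
  -2.124987 + 0.031464 * elo_difference + 5.696513 * home
}

# Better-rated team at home with a 100-point Elo edge
predict_point_diff(elo_difference = 100, home = 1)
# [1] 6.717926
```

Under the same assumptions, that matchup played on the road predicts about 1.02 points, so the home term accounts for most of the predicted margin.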

Conclusion

Goal: Advanced statistics as a predictor of game success? No

Step 1: Star player contribution does impact team success

Step 2: No single advanced statistic showed a clear advantage as a predictor

Note: BPM met the significance threshold for both the higher- and lower-rated teams

Step 3: NBA game predictions did not improve with the ML model; the linear formula had the lowest MAPE

Next steps:

Larger training set (more seasons)
Bin game outcomes into categories instead of predicting a continuous point differential
Simulate an average (or replacement) team
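
The binning idea could start from R's `cut()`, turning the continuous point differential into outcome classes that a classifier then predicts. The function name, cut points, and labels below are hypothetical and chosen only for illustration.

```r
# Hypothetical outcome bins; intervals are right-closed:
# (-Inf, -10], (-10, -1], (-1, 9], (9, Inf)
bin_outcome <- function(point_diff) {
  cut(point_diff,
      breaks = c(-Inf, -10, -1, 9, Inf),
      labels = c("loss by 10+", "loss by 1-9", "win by 1-9", "win by 10+"))
}

bin_outcome(c(-12, -3, 4, 15))
```

Because NBA games cannot end tied, a differential of 0 never occurs, so the (-1, 9] bin only ever holds wins of 1 to 9 points.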