May 13, 2020

Introduction

Goal: Analyze individual player advanced statistics as a predictor of game success.

Step 1: Evaluate the impact of player contribution on a team’s single-game success

Step 2: Identify a player advanced statistic to use as a predictor of a team’s game outcome

Step 3: Attempt a machine learning model based on player statistics and game features

Data Sources

Challenge

Goal: Complete automated flow

Result: Piecemeal approach

Data acquisition:

  • API throttling
  • Data across sources not synced or incomplete
  • Matching on strings difficult (accent marks, splitting)
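
One way to ease the string-matching problem is to normalize player names before joining sources. The sketch below is an illustration, not the approach used in this project: `normalize_name` is a hypothetical helper, and it assumes the `stringi` package is installed.

```r
library(stringi)

# Hypothetical helper: reduce a player name to a plain lowercase ASCII key
# so that the same player matches across data sources.
normalize_name <- function(x) {
  x <- stri_trans_general(x, "Latin-ASCII")  # strip accent marks (ó -> o)
  x <- tolower(x)
  x <- gsub("[^a-z ]", "", x)                # drop periods, apostrophes, hyphens
  gsub("\\s+", " ", trimws(x))               # collapse repeated spaces
}

normalize_name(c("Manu Ginóbili", "Nando De  Colo"))
```

Joining on the normalized key rather than the raw name avoids mismatches caused by accents, punctuation, and spacing, although players who share a name would still need a second key such as team.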

Data transformation:

  • Calculations
  • Data type coercion
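
As an illustration of the kind of type coercion involved, the sketch below converts minutes played from a "MM:SS" string to a numeric value. The input format and the helper name `mp_to_minutes` are assumptions made for the example, not details taken from the data sources.

```r
# Hypothetical helper: convert "MM:SS" strings to decimal minutes.
mp_to_minutes <- function(mp) {
  parts <- strsplit(mp, ":", fixed = TRUE)
  vapply(parts,
         function(p) as.numeric(p[1]) + as.numeric(p[2]) / 60,
         numeric(1))
}

mp_to_minutes(c("34:30", "12:00"))
```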

Data: NBA season 2012-2013

Player Participation

Games affected by load management show that the Spurs fare better when their star players participate.
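
A comparison of this kind can be sketched by grouping games on whether a star player appeared. The data frame below is made up purely for illustration (it is not the Spurs' actual game log), and the column names are hypothetical.

```r
# Toy game log, one row per game. Values are invented for illustration only.
games <- data.frame(
  star_played = c(TRUE, TRUE, TRUE, FALSE, FALSE, TRUE),
  point_diff  = c(8, 12, -3, -6, 2, 5)
)
games$win <- games$point_diff > 0

# Win rate and mean point differential with and without the star player
summary_by_star <- aggregate(cbind(win, point_diff) ~ star_played,
                             data = games, FUN = mean)
summary_by_star
```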

Advanced Statistics

Stats calculated for each Spurs game of 2012-13 NBA season

Red - PER; Green - VORP; Blue - WS48; Purple - BPM

  • PER: player efficiency rating
  • VORP: value over replacement player
  • WS48: win shares per 48 minutes
  • BPM: box plus/minus

Dataset for ML

Population: Every regular season game of the 2012-13 NBA season

Target variable: Point differential from perspective of better-rated team

Features:

  • high_elo: Elo rating of better rated team before game

  • high_per: Cumulative PER based on players’ minutes from game

  • high_vorp: Cumulative VORP based on players’ minutes from game

  • high_bpm: Cumulative BPM based on players’ minutes from game

  • high_ws48: Cumulative Win Shares per 48 based on players’ minutes from game

  • high_days: Days since previous game of better rated team

  • high_home: Game location of better rated team (1: home; 0: away)

  • low_elo: Elo rating of lesser rated team before game

  • low_per: Cumulative PER based on players’ minutes from game

  • low_vorp: Cumulative VORP based on players’ minutes from game

  • low_bpm: Cumulative BPM based on players’ minutes from game

  • low_ws48: Cumulative Win Shares per 48 based on players’ minutes from game

  • low_days: Days since previous game of lesser rated team
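
The sketch below shows one way such a row could be built for a single game. It is an assumption-laden illustration: the slides say only that the cumulative statistics are "based on players’ minutes", so the minutes-weighted average in `team_stat` is a guess at the aggregation, and both helper names are hypothetical.

```r
# Assumed aggregation: minutes-weighted average of a per-player statistic.
team_stat <- function(stat, minutes) {
  sum(stat * minutes) / sum(minutes)
}

# Orient one game around the team with the higher pre-game Elo ("high"),
# so the target is the point differential from that team's perspective.
orient_game <- function(elo_a, elo_b, pts_a, pts_b) {
  a_is_high <- elo_a >= elo_b
  data.frame(
    high_elo   = if (a_is_high) elo_a else elo_b,
    low_elo    = if (a_is_high) elo_b else elo_a,
    point_diff = if (a_is_high) pts_a - pts_b else pts_b - pts_a
  )
}

team_stat(stat = c(20, 10), minutes = c(30, 10))
orient_game(elo_a = 1500, elo_b = 1600, pts_a = 100, pts_b = 95)
```

In the second call team B is the better rated team and lost by 5, so the target for that row is -5.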

Keras Code Snippet

Code chunk highlights the Keras API

library(keras)

# Build model: 13 input features, two hidden layers, one linear output
model <- keras_model_sequential() 
model %>% 
  layer_dense(units = 14, activation = 'relu', kernel_initializer = 'RandomNormal', 
              input_shape = c(13)) %>% 
  # Hidden-layer size rule of thumb:
  # training samples / (factor * (input neurons + output neurons))
  layer_dense(units = 65, activation = 'relu') %>%
  layer_dense(units = 1, activation = 'linear')

# Compile model: mean squared error loss, mean absolute error as a metric
model %>% compile( loss = 'mean_squared_error', optimizer = 'adam', 
                   metrics = c('mae') )

# Train model, holding out 20% of the training rows for validation
history <- model %>% fit( as.matrix(X_train), y_train, epochs = 30, 
                          batch_size = 50, validation_split = 0.2 )

# Evaluate on the held-out set (coerced to a matrix, as for predict below)
model %>% evaluate(as.matrix(X_val), y_val)

# Predictions
pred <- data.frame(y = predict(model, as.matrix(X_val)))

Source: towardsdatascience.com/keras-and-r-predicting-blood-glucose-levels-with-the-sequential-model-596efe89a6b8

ML Predictions

Measuring ML Predictions

Predictions are scored with mean absolute percentage error (MAPE), a regression loss. (Hint: Lower is better.)

  • All features: 1.179060

  • Suggested features: 1.110837

  • Personal features: 1.054806

  • All features but Elo: 1.158007

  • Linear Formula: 1.002978
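
For reference, a minimal sketch of the metric, assuming the usual definition of mean absolute percentage error reported as a fraction rather than a percentage. The function name `mape` is illustrative, and the exact implementation behind the scores above is not shown in the slides.

```r
# Mean absolute percentage error, as a fraction (1.0 means 100%).
# The actual values must be nonzero. NBA games cannot end in a tie,
# so an observed point differential is never 0.
mape <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted), all(actual != 0))
  mean(abs((actual - predicted) / actual))
}

mape(actual = c(10, -5), predicted = c(8, -10))  # 0.6
```

Because each error is divided by the actual differential, close games weigh heavily on this metric: missing a 1-point game by 5 points contributes 5.0, while missing a 20-point game by 5 points contributes 0.25.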

Formula:
Points_Differential = -2.124987 +
(0.031464 * Elo_Difference) +
(5.696513 * Home)
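
The formula can be applied directly as a function. In this sketch the argument names are illustrative, and it assumes that Elo_Difference is the better rated team's Elo minus the lesser rated team's, and that Home is 1 when the better rated team plays at home (as in high_home).

```r
# Linear formula from the slide, with its fitted coefficients.
# Assumes elo_diff = high_elo - low_elo and home = high_home (1 or 0).
predict_point_diff <- function(elo_diff, home) {
  -2.124987 + 0.031464 * elo_diff + 5.696513 * home
}

predict_point_diff(elo_diff = 100, home = 1)  # better rated team at home
predict_point_diff(elo_diff = 100, home = 0)  # better rated team away
```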

Conclusion

Goal: Advanced statistics as predictor of game success - No

Step 1: Star player contribution does impact team success

Step 2: No single advanced statistic showed a clear advantage as a predictor

  • Note: BPM met the significance threshold for both the high and low teams

Step 3: NBA game predictions did not improve with ML model

Next steps:

  • Larger training size (more seasons)
  • Binned game outcome as the target instead of a continuous point differential
  • Simulate an average (or replacement) team
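
As a sketch of the binning idea, base R's `cut()` can turn the continuous point differential into an ordered category. The bin edges and labels below are arbitrary placeholders chosen for illustration, not values proposed in this analysis.

```r
# Hypothetical bins for point differential (better rated team's perspective).
# Intervals are right-closed: (-Inf,-10], (-10,0], (0,10], (10,Inf).
point_diff <- c(-12, -3, 4, 15)
outcome <- cut(point_diff,
               breaks = c(-Inf, -10, 0, 10, Inf),
               labels = c("big loss", "close loss", "close win", "big win"))
outcome
```

A binned target would turn the problem into classification, so the output layer and loss of the Keras model would need to change accordingly.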