Introduction

In this project we will construct a neural network to analyse which factors go into the player efficiency rating (PER). PER sums all of a player’s positive accomplishments, subtracts the negative ones, and returns a per-minute rating of the player’s performance. A neural network is a set of connected input and output units in which each connection has a weight associated with it.

Using advanced player metrics, the inputs to be used are as follows:

• Minutes Played

• True Shooting Percentage

• 3-Point Attempt Rate

• Free Throw Rate

• Offensive, Defensive, and Total Rebounding Percentage

• Assist Percentage

• Steal Percentage

• Block Percentage

• Turnover Percentage

• Usage Percentage

• Offensive, Defensive, and Total Win Shares

• Win Shares Per 48

• Offensive/Defensive Box Plus/Minus

• Box Plus/Minus

• Value Over Replacement Player

Our only output is the following

• Player Efficiency Rating

During the learning phase, a neural network adjusts the weights on its connections so that it predicts the correct target value for the given inputs (here a numeric rating, PER, rather than a class label). The basic structure of a neural network consists of an input layer, any number of hidden layers, and an output layer.

The information processing units do not work in a linear manner. In fact, a neural network draws its strength from parallel processing of information combined with non-linear units, which allows it to deal with non-linearity.
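To make the layer structure concrete, here is a minimal sketch of a single forward pass in base R, assuming one hidden layer with a logistic activation and a linear output unit. The function names and all weight values are illustrative assumptions, not taken from the fitted model.

```r
# Logistic (sigmoid) activation: the non-linear step applied in each hidden unit
sigmoid <- function(z) 1 / (1 + exp(-z))

# Forward pass: inputs -> one hidden layer -> single linear output
forward_pass <- function(x, W_hidden, b_hidden, w_out, b_out) {
  hidden <- sigmoid(as.vector(W_hidden %*% x) + b_hidden)  # hidden activations
  sum(w_out * hidden) + b_out                              # weighted sum plus bias
}

# Illustrative values only: 3 inputs feeding 2 hidden units
x        <- c(0.2, 0.5, 0.1)
W_hidden <- matrix(c(0.4, -0.3, 0.8,
                     -0.6, 0.1, 0.5), nrow = 2, byrow = TRUE)
b_hidden <- c(0.1, -0.2)
w_out    <- c(0.7, -0.4)
b_out    <- 0.05

forward_pass(x, W_hidden, b_hidden, w_out, b_out)
```

Training amounts to adjusting `W_hidden`, `b_hidden`, `w_out`, and `b_out` until the output is close to the observed value.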

For comparison, a linear-weights formula for PER based on more traditional statistics is as follows:

[ (FGM x 85.910) + (Steals x 53.897) + (3PTM x 51.757) + (FTM x 46.845) + (Blocks x 39.190) + (Offensive_Reb x 39.190) + (Assists x 34.677) + (Defensive_Reb x 14.707) - (Foul x 17.174) - (FT_Miss x 20.091) - (FG_Miss x 39.190) - (TO x 53.897) ] x (1 / Minutes).
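The linear-weights formula above can be written as a small R function. This is only a sketch: the function name and argument names are our own, and the stat line in the usage example is made up.

```r
# Linear-weights PER, using the coefficients from the formula above.
# All inputs are season (or game) totals; minutes must be greater than zero.
per_linear <- function(fgm, stl, tpm, ftm, blk, orb, ast, drb,
                       pf, ft_miss, fg_miss, tov, minutes) {
  total <- fgm * 85.910 + stl * 53.897 + tpm * 51.757 + ftm * 46.845 +
    blk * 39.190 + orb * 39.190 + ast * 34.677 + drb * 14.707 -
    pf * 17.174 - ft_miss * 20.091 - fg_miss * 39.190 - tov * 53.897
  total / minutes
}

# Hypothetical stat line, for illustration only
per_linear(fgm = 8, stl = 2, tpm = 3, ftm = 5, blk = 1, orb = 2, ast = 6,
           drb = 5, pf = 3, ft_miss = 1, fg_miss = 7, tov = 2, minutes = 34)
```

Note that a steal and a turnover carry the same weight with opposite signs, so they cancel each other out.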

The goal of this project is to predict the PER using the advanced player metrics of all the players during the 2018-19 NBA season.

NBA Statistics in R

Using the ballr package in R, we can pull advanced player metrics from basketball-reference.com directly into an R data frame.
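A minimal sketch of the download step might look like the following. It assumes the ballr function `NBAPerGameAdvStatistics()` and that `season = 2019` refers to the 2018-19 season (the year in which the season ends); the wrapper name `load_advanced_stats` is our own. The call needs an internet connection, so it is left commented out.

```r
# Wrapper so that the download only happens when explicitly requested
load_advanced_stats <- function(season = 2019) {
  if (!requireNamespace("ballr", quietly = TRUE)) {
    stop("Package 'ballr' is required: install.packages('ballr')")
  }
  # Assumption: the season is identified by its ending year
  ballr::NBAPerGameAdvStatistics(season = season)
}

# adv <- load_advanced_stats(2019)
# head(adv)
```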

Let’s view the advanced player metrics from the 2018-2019 NBA regular season

rk player pos age tm g mp per tspercent x3par ftr orbpercent drbpercent trbpercent astpercent stlpercent blkpercent tovpercent usgpercent x ows dws ws ws_48 x_2 obpm dbpm bpm vorp link
1 Álex Abrines SG 25 OKC 31 588 6.3 0.507 0.809 0.083 0.9 7.8 4.2 4.3 1.3 0.9 7.9 12.2 NA 0.1 0.6 0.6 0.053 NA -3.7 0.4 -3.3 -0.2 /players/a/abrinal01.html
2 Quincy Acy PF 28 PHO 10 123 2.9 0.379 0.833 0.556 2.7 20.1 11.3 8.2 0.4 2.7 15.2 9.2 NA -0.1 0.0 -0.1 -0.022 NA -7.6 -0.5 -8.1 -0.2 /players/a/acyqu01.html
3 Jaylen Adams PG 22 ATL 34 428 7.6 0.474 0.673 0.082 2.6 12.3 7.4 19.8 1.5 1.0 19.7 13.5 NA -0.1 0.2 0.1 0.011 NA -3.8 -0.5 -4.3 -0.2 /players/a/adamsja01.html
4 Steven Adams C 25 OKC 80 2669 18.5 0.591 0.002 0.361 14.7 14.8 14.7 6.6 2.0 2.4 12.6 16.4 NA 5.1 4.0 9.1 0.163 NA 0.7 0.4 1.1 2.1 /players/a/adamsst01.html
5 Bam Adebayo C 21 MIA 82 1913 17.9 0.623 0.031 0.465 9.2 24.0 16.6 14.2 1.8 3.0 17.1 15.8 NA 3.4 3.4 6.8 0.171 NA -0.4 2.2 1.8 1.8 /players/a/adebaba01.html
6 Deng Adel SF 21 CLE 19 194 2.7 0.424 0.639 0.111 1.6 9.6 5.4 3.4 0.3 1.8 13.7 9.9 NA -0.2 0.0 -0.2 -0.054 NA -6.0 -1.6 -7.5 -0.3 /players/a/adelde01.html

Now let’s examine the structure of the data frame.

## 'data.frame':    708 obs. of  30 variables:
##  $ rk        : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ player    : chr  "Álex Abrines" "Quincy Acy" "Jaylen Adams" "Steven Adams" ...
##  $ pos       : chr  "SG" "PF" "PG" "C" ...
##  $ age       : num  25 28 22 25 21 21 25 33 21 23 ...
##  $ tm        : chr  "OKC" "PHO" "ATL" "OKC" ...
##  $ g         : num  31 10 34 80 82 19 7 81 10 38 ...
##  $ mp        : num  588 123 428 2669 1913 ...
##  $ per       : num  6.3 2.9 7.6 18.5 17.9 2.7 8.2 22.9 8.1 7.5 ...
##  $ tspercent : num  0.507 0.379 0.474 0.591 0.623 0.424 0.322 0.576 0.418 0.516 ...
##  $ x3par     : num  0.809 0.833 0.673 0.002 0.031 0.639 0.4 0.032 0.308 0.556 ...
##  $ ftr       : num  0.083 0.556 0.082 0.361 0.465 0.111 0.2 0.312 0.308 0.337 ...
##  $ orbpercent: num  0.9 2.7 2.6 14.7 9.2 1.6 4.9 10.3 9.9 0.8 ...
##  $ drbpercent: num  7.8 20.1 12.3 14.8 24 9.6 14.8 19.8 13.7 5.1 ...
##  $ trbpercent: num  4.2 11.3 7.4 14.7 16.6 5.4 9.8 15.1 11.8 3 ...
##  $ astpercent: num  4.3 8.2 19.8 6.6 14.2 3.4 37.1 11.6 15.2 8.9 ...
##  $ stlpercent: num  1.3 0.4 1.5 2 1.8 0.3 4.5 0.8 0.4 0.7 ...
##  $ blkpercent: num  0.9 2.7 1 2.4 3 1.8 0 3.4 0 1.1 ...
##  $ tovpercent: num  7.9 15.2 19.7 12.6 17.1 13.7 15.5 8.8 15.3 13.9 ...
##  $ usgpercent: num  12.2 9.2 13.5 16.4 15.8 9.9 25 26.9 19 24.4 ...
##  $ x         : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ ows       : num  0.1 -0.1 -0.1 5.1 3.4 -0.2 -0.1 6.4 -0.1 -0.4 ...
##  $ dws       : num  0.6 0 0.2 4 3.4 0 0 2.9 0 0.4 ...
##  $ ws        : num  0.6 -0.1 0.1 9.1 6.8 -0.2 0 9.3 -0.1 0 ...
##  $ ws_48     : num  0.053 -0.022 0.011 0.163 0.171 -0.054 -0.051 0.167 -0.042 0.002 ...
##  $ x_2       : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ obpm      : num  -3.7 -7.6 -3.8 0.7 -0.4 -6 -7.9 2.4 -3.8 -4.2 ...
##  $ dbpm      : num  0.4 -0.5 -0.5 0.4 2.2 -1.6 2.1 -0.6 -3.5 -2.1 ...
##  $ bpm       : num  -3.3 -8.1 -4.3 1.1 1.8 -7.5 -5.8 1.8 -7.3 -6.3 ...
##  $ vorp      : num  -0.2 -0.2 -0.2 2.1 1.8 -0.3 0 2.6 -0.2 -0.5 ...
##  $ link      : chr  "/players/a/abrinal01.html" "/players/a/acyqu01.html" "/players/a/adamsja01.html" "/players/a/adamsst01.html" ...

We will omit the following columns from our analysis because they do not provide any extra value:

• rk

• player

• pos

• age

• tm

• g

• x

• x_2

• link

Here is the summary of the data set

variable      Min.       1st Qu.    Median     Mean       3rd Qu.    Max.       NA's
rk            1.0        137.8      269.5      268.4      398.2      530.0      0
player        (character, length 708)
pos           (character, length 708)
age           19.00      23.00      26.00      26.14      29.00      42.00      0
tm            (character, length 708)
g             1.00       19.00      44.00      42.88      68.00      82.00      0
mp            1.0        245.2      788.0      972.3      1579.5     3028.0     0
per           -38.10     9.30       12.40      12.75      16.20      80.40      0
tspercent     0.0000     0.5000     0.5440     0.5315     0.5810     1.5000     6
x3par         0.0000     0.2527     0.3890     0.3801     0.5300     1.0000     6
ftr           0.0000     0.1540     0.2250     0.2493     0.3115     2.0000     6
orbpercent    0.00       1.90       3.30       4.98       7.00       100.00     0
drbpercent    0.00       9.90       13.40      15.17      19.20      90.30      0
trbpercent    0.00       6.20       8.70       10.07      12.85      51.60      0
astpercent    0.00       7.10       10.60      13.03      17.32      73.40      0
stlpercent    0.000      1.000      1.400      1.484      1.825      12.300     0
blkpercent    0.000      0.500      1.200      1.591      2.125      14.800     0
tovpercent    0.00       9.00       11.50      12.02      14.40      50.00      6
usgpercent    0.00       15.00      17.80      18.49      21.80      47.20      0
x             NA         NA         NA         NaN        NA         NA         708
ows           -2.800     0.000      0.400      1.003      1.500      11.400     0
dws           -0.500     0.200      0.650      0.964      1.400      5.900      0
ws            -1.700     0.200      1.100      1.968      2.800      15.200     0
ws_48         -0.94600   0.03200    0.07700    0.07156    0.11900    1.26100    0
x_2           NA         NA         NA         NaN        NA         NA         708
obpm          -52.400    -3.200     -1.300     -1.488     0.300      40.100     0
dbpm          -31.1000   -1.2000    -0.3000    -0.3542    0.6000     11.9000    0
bpm           -81.400    -3.700     -1.500     -1.845     0.500      52.000     0
vorp          -2.0000    -0.1000    0.0000     0.4458     0.5000     9.3000     0
link          (character, length 708)

We will now create a data frame that contains only the numeric columns used in the model.

mp per tspercent x3par ftr orbpercent drbpercent trbpercent astpercent stlpercent blkpercent tovpercent usgpercent ows dws ws ws_48 obpm dbpm bpm vorp
588 6.3 0.507 0.809 0.083 0.9 7.8 4.2 4.3 1.3 0.9 7.9 12.2 0.1 0.6 0.6 0.053 -3.7 0.4 -3.3 -0.2
123 2.9 0.379 0.833 0.556 2.7 20.1 11.3 8.2 0.4 2.7 15.2 9.2 -0.1 0.0 -0.1 -0.022 -7.6 -0.5 -8.1 -0.2
428 7.6 0.474 0.673 0.082 2.6 12.3 7.4 19.8 1.5 1.0 19.7 13.5 -0.1 0.2 0.1 0.011 -3.8 -0.5 -4.3 -0.2
2669 18.5 0.591 0.002 0.361 14.7 14.8 14.7 6.6 2.0 2.4 12.6 16.4 5.1 4.0 9.1 0.163 0.7 0.4 1.1 2.1
1913 17.9 0.623 0.031 0.465 9.2 24.0 16.6 14.2 1.8 3.0 17.1 15.8 3.4 3.4 6.8 0.171 -0.4 2.2 1.8 1.8
194 2.7 0.424 0.639 0.111 1.6 9.6 5.4 3.4 0.3 1.8 13.7 9.9 -0.2 0.0 -0.2 -0.054 -6.0 -1.6 -7.5 -0.3
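The column selection can be sketched in base R as follows, using a tiny made-up data frame in place of the downloaded one. Dropping incomplete rows with `na.omit()` is our assumption: the summary reports a few missing values, but the original write-up does not say how they were handled.

```r
# Toy stand-in for the downloaded data (values are illustrative)
adv <- data.frame(
  rk = 1:3, player = c("A", "B", "C"), pos = c("SG", "PF", "PG"),
  age = c(25, 28, 22), tm = c("OKC", "PHO", "ATL"), g = c(31, 10, 34),
  mp = c(588, 123, 428), per = c(6.3, 2.9, 7.6),
  tspercent = c(0.507, NA, 0.474),
  x = NA_real_, x_2 = NA_real_, link = c("/a", "/b", "/c"),
  stringsAsFactors = FALSE
)

# Columns listed above as adding no value to the analysis
drop_cols <- c("rk", "player", "pos", "age", "tm", "g", "x", "x_2", "link")

# Keep everything else, then drop rows with missing values (assumption)
adv_num <- adv[, !(names(adv) %in% drop_cols), drop = FALSE]
adv_num <- na.omit(adv_num)

str(adv_num)
```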

Fitting the Neural Network

The objective of our neural network is to predict PER, our dependent variable, using the advanced metrics as independent variables. We will divide the data into training and test sets. The training set is used to find the relationship between the independent variables and PER, while the test set assesses the performance of the model. We will use 60% of the data set as the training set. Rows are assigned to the training and test sets by random sampling: the sampled row numbers are stored in an index variable, which is then used to create both data sets.
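A minimal sketch of the 60/40 split in base R is shown below. The seed and the toy data frame are assumptions made only so the example is reproducible and self-contained.

```r
set.seed(123)  # arbitrary seed, used only to make the sketch reproducible

# Toy stand-in for the numeric data frame
adv_num <- data.frame(mp = runif(50, 1, 3000), per = rnorm(50, 12, 5))

# Randomly pick 60% of the row numbers for training
n     <- nrow(adv_num)
index <- sample(seq_len(n), size = round(0.60 * n))

train <- adv_num[index, ]   # rows in the index
test  <- adv_num[-index, ]  # all remaining rows

c(train = nrow(train), test = nrow(test))
```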

We will fit a neural network on our data using the neuralnet library. The first step is to scale our data set. This is essential because otherwise a variable measured on a large scale may dominate the prediction, leading to meaningless results. We will use min-max normalization to scale our data.

Below are the maximum values of our data set

##         mp        per  tspercent      x3par        ftr orbpercent drbpercent 
##   3028.000     80.400      1.500      1.000      2.000    100.000     90.300 
## trbpercent astpercent stlpercent blkpercent tovpercent usgpercent        ows 
##     51.600     73.400     12.300     14.800     50.000     47.200     11.400 
##        dws         ws      ws_48       obpm       dbpm        bpm       vorp 
##      5.900     15.200      1.261     40.100     11.900     52.000      9.300

Below are the minimum values of our data set

##         mp        per  tspercent      x3par        ftr orbpercent drbpercent 
##      1.000    -38.100      0.000      0.000      0.000      0.000      0.000 
## trbpercent astpercent stlpercent blkpercent tovpercent usgpercent        ows 
##      0.000      0.000      0.000      0.000      0.000      5.700     -2.800 
##        dws         ws      ws_48       obpm       dbpm        bpm       vorp 
##     -0.500     -1.700     -0.946    -52.400    -31.100    -81.400     -2.000

Here are the first 6 entries of our new scaled data frame

mp per tspercent x3par ftr orbpercent drbpercent trbpercent astpercent stlpercent blkpercent tovpercent usgpercent ows dws ws ws_48 obpm dbpm bpm vorp
0.1939214 0.3746835 0.3380000 0.809 0.0415 0.009 0.0863787 0.0813953 0.0585831 0.1056911 0.0608108 0.158 0.1566265 0.2042254 0.171875 0.1360947 0.4526507 0.5264865 0.7325581 0.5854573 0.1592920
0.0403039 0.3459916 0.2526667 0.833 0.2780 0.027 0.2225914 0.2189922 0.1117166 0.0325203 0.1824324 0.304 0.0843373 0.1901408 0.078125 0.0946746 0.4186679 0.4843243 0.7116279 0.5494753 0.1592920
0.1410638 0.3856540 0.3160000 0.673 0.0410 0.026 0.1362126 0.1434109 0.2697548 0.1219512 0.0675676 0.394 0.1879518 0.1901408 0.109375 0.1065089 0.4336203 0.5254054 0.7116279 0.5779610 0.1592920
0.8814007 0.4776371 0.3940000 0.002 0.1805 0.147 0.1638981 0.2848837 0.0899183 0.1626016 0.1621622 0.252 0.2578313 0.5563380 0.703125 0.6390533 0.5024921 0.5740541 0.7325581 0.6184408 0.3628319
0.6316485 0.4725738 0.4153333 0.031 0.2325 0.092 0.2657807 0.3217054 0.1934605 0.1463415 0.2027027 0.342 0.2433735 0.4366197 0.609375 0.5029586 0.5061169 0.5621622 0.7744186 0.6236882 0.3362832
0.0637595 0.3443038 0.2826667 0.639 0.0555 0.016 0.1063123 0.1046512 0.0463215 0.0243902 0.1216216 0.274 0.1012048 0.1830986 0.078125 0.0887574 0.4041686 0.5016216 0.6860465 0.5539730 0.1504425
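Min-max normalization maps each column to the range 0 to 1 via (x - min) / (max - min). The sketch below applies it with base R's `scale()` to a three-row toy data frame whose minimum and maximum match the values reported above for `mp` and `per`; the middle row is the first player in the data.

```r
# Toy data: the first and last rows carry the reported min and max of each column
adv_num <- data.frame(mp = c(1, 588, 3028), per = c(-38.1, 6.3, 80.4))

maxs <- apply(adv_num, 2, max)
mins <- apply(adv_num, 2, min)

# scale() subtracts `center` and divides by `scale`, column by column
scaled <- as.data.frame(scale(adv_num, center = mins, scale = maxs - mins))

scaled
```

The middle row reproduces the first row of the scaled table above (about 0.1939 for `mp` and 0.3747 for `per`).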

Neural Net Visualization

The scaled data is used to fit the neural network. We visualize the neural network with the weights for each of the variables.

The formula used for the construction of the neural network is as follows

## per ~ mp + tspercent + x3par + ftr + orbpercent + drbpercent + 
##     trbpercent + astpercent + stlpercent + blkpercent + tovpercent + 
##     usgpercent + ows + dws + ws + ws_48 + obpm + dbpm + bpm + 
##     vorp
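The formula above can be built programmatically from the predictor names, as in the sketch below. The fitting call is only illustrative: it runs on synthetic data, is skipped if the neuralnet package is not installed, and uses `hidden = 3` as an assumption inferred from the prediction output later in this write-up, where the hidden-layer matrix has four columns (a bias plus three units).

```r
predictors <- c("mp", "tspercent", "x3par", "ftr", "orbpercent", "drbpercent",
                "trbpercent", "astpercent", "stlpercent", "blkpercent",
                "tovpercent", "usgpercent", "ows", "dws", "ws", "ws_48",
                "obpm", "dbpm", "bpm", "vorp")

# per ~ mp + tspercent + ... + vorp
nn_formula <- as.formula(paste("per ~", paste(predictors, collapse = " + ")))

if (requireNamespace("neuralnet", quietly = TRUE)) {
  set.seed(1)  # arbitrary seed for the synthetic example
  # Synthetic, already-scaled data standing in for the training set
  toy <- as.data.frame(matrix(runif(60 * length(predictors)),
                              ncol = length(predictors)))
  names(toy) <- predictors
  toy$per <- rowMeans(toy[, predictors])  # made-up target

  # linear.output = TRUE requests regression rather than classification
  nn <- neuralnet::neuralnet(nn_formula, data = toy, hidden = 3,
                             linear.output = TRUE)
  # plot(nn) would draw the network with its weights
}
```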

Now we can plot out the neural network as follows

The black lines show the connections between the layers and the weight on each connection, while the blue lines show the bias term added at each step. Neural networks are essentially black boxes, so there is not much to infer about the fit, the weights, or the model values. We can, however, see which statistical categories have a negative or positive effect on PER.

Here is another plot of the same neural network

In this plot, the inputs are labelled as \(I_{n}\), the bias terms as \(B_{n}\), the hidden-layer units as \(H_{n}\), and the output as \(O_{1}\). Black lines indicate a connection with a positive weight, and grey lines indicate a negative weight.

Here is a list of the positive inputs of our neural network

• Minutes Played

• True Shooting Percentage

• 3-Point Attempt Rate

• Free Throw Rate

• Offensive and Defensive Rebounding Percentage

• Assist Percentage

• Turnover Percentage

• Offensive Win Shares

• Win Shares Per 48

• Offensive Box Plus/Minus

• Value Over Replacement Player

Here is a list of the negative inputs of our neural network

• Total Rebounding Percentage

• Steal Percentage

• Block Percentage

• Usage Percentage

• Defensive and Total Win Shares

• Defensive Box Plus/Minus

• Box Plus/Minus

These signs are not necessarily the best indicators of what to value, but the model clearly places more importance on offensive categories than on defensive categories when it comes to PER.

The training algorithm has converged and therefore this model can be used to predict Player Efficiency Rating.

Predictions using the Model

Now we can try to predict the values for the test set and calculate the mean squared error (MSE). We need to scale the values back to the original units in order to make a meaningful comparison.

Here is the structure of the prediction output

## List of 2
##  $ neurons   :List of 2
##   ..$ : num [1:281, 1:21] 1 1 1 1 1 1 1 1 1 1 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:281] "1" "3" "5" "7" ...
##   .. .. ..$ : chr [1:21] "" "mp" "per" "tspercent" ...
##   ..$ : num [1:281, 1:4] 1 1 1 1 1 1 1 1 1 1 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:281] "1" "3" "5" "7" ...
##   .. .. ..$ : NULL
##  $ net.result: num [1:281, 1] 0.41 0.463 0.572 0.532 0.527 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:281] "1" "3" "5" "7" ...
##   .. ..$ : NULL

We can compare the predicted rating with the real rating using the following visualization

Now let’s look at the actual and predicted values side by side in this data frame (both are still on the scaled 0 to 1 range)

##    advtest.r true.predictions
## 1  0.3746835        0.4097333
## 3  0.3856540        0.4631767
## 5  0.4725738        0.5723588
## 7  0.3907173        0.5321029
## 12 0.4582278        0.5266487
## 16 0.3603376        0.4312061
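To compare ratings in their original units, the scaled values have to be mapped back with x * (max - min) + min. The sketch below does this for the first three rows of the table above, using the minimum and maximum of `per` reported earlier, and then computes the MSE on those three rows only. The helper name `unscale` is our own.

```r
# Invert min-max normalization
unscale <- function(x, min_val, max_val) x * (max_val - min_val) + min_val

per_min <- -38.1  # minimum of per reported above
per_max <- 80.4   # maximum of per reported above

# First three rows of the comparison table
actual_scaled <- c(0.3746835, 0.3856540, 0.4725738)
pred_scaled   <- c(0.4097333, 0.4631767, 0.5723588)

actual    <- unscale(actual_scaled, per_min, per_max)
predicted <- unscale(pred_scaled, per_min, per_max)

# Mean squared error on the original PER scale (three rows only)
mse <- mean((actual - predicted)^2)

data.frame(actual = round(actual, 1), predicted = round(predicted, 1))
```

The unscaled actual values recover the ratings of 6.3, 7.6, and 17.9 shown in the first table of this write-up.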

Now we can use data from any season, or from a specific team, to predict PER.