In this project we will construct a neural network to analyse which factors go into Player Efficiency Rating (PER). PER sums all of a player’s positive accomplishments, subtracts the negative ones, and returns a per-minute rating of the player’s performance. A neural network is a set of connected input and output units in which each connection has a weight associated with it.
Using advanced player metrics, the inputs are as follows:
• Minutes Played
• True Shooting Percentage
• 3-Point Attempt Rate
• Free Throw Rate
• Offensive/Defensive and Total Rebounding Percentage
• Assist Percentage
• Steal Percentage
• Block Percentage
• Turnover Percentage
• Usage Percentage
• Offensive/Defensive and Total Win Shares
• Win Shares Per 48
• Offensive/Defensive Box Plus/Minus
• Box Plus/Minus
• Value Over Replacement Player
Our only output is the following:
• Player Efficiency Rating
During the learning phase, a neural network adjusts the weights on its connections so that it predicts the correct output for the given inputs (here a numeric value, PER, rather than a class label). The basic structure of a neural network consists of an input layer, any number of hidden layers, and an output layer.
The information processing units do not work in a linear manner. In fact, a neural network draws its strength from parallel processing of information combined with non-linear activation functions, which allows it to deal with non-linearity.
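To make the layer structure concrete, here is a minimal sketch of a forward pass through a single hidden layer. The logistic activation, the linear output unit, and the toy weights are assumptions chosen for illustration; they are not the weights of the network fitted later.

```r
# Logistic (sigmoid) activation: squashes any real number into (0, 1).
logistic <- function(z) 1 / (1 + exp(-z))

# Forward pass through one hidden layer.
# x  : numeric vector of inputs
# W1 : (length(x) + 1) x n_hidden matrix; the first row holds the bias weights
# W2 : (n_hidden + 1) x 1 matrix; the first row holds the output bias weight
forward_pass <- function(x, W1, W2) {
  hidden <- logistic(c(1, x) %*% W1)  # weighted sum, then the non-linearity
  drop(c(1, hidden) %*% W2)           # linear output unit, suited to regression
}

# Hypothetical toy network: 2 inputs, 3 hidden units, 1 output.
W1 <- matrix(0, nrow = 3, ncol = 3)   # all-zero weights: every hidden unit outputs 0.5
W2 <- matrix(c(0.1, 1, 1, 1), ncol = 1)
toy_output <- forward_pass(c(0.2, 0.7), W1, W2)
```

Training amounts to adjusting the entries of `W1` and `W2` until the outputs match the known targets as closely as possible.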
The linear-weights formula for PER, which uses more traditional statistics, is as follows:
[ (FGM x 85.910) + (Steals x 53.897) + (3PTM x 51.757) + (FTM x 46.845) + (Blocks x 39.190) + (Offensive_Reb x 39.190) + (Assists x 34.677) + (Defensive_Reb x 14.707) - (Foul x 17.174) - (FT_Miss x 20.091) - (FG_Miss x 39.190) - (TO x 53.897) ] x (1 / Minutes).
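The formula above translates directly into a small function. This is only a sketch: the argument names and the example stat line are made up for illustration, while the coefficients are copied from the formula.

```r
# Linear-weights PER, term by term from the formula above.
# All counting stats are totals over the same span as `minutes`.
linear_weights_per <- function(fgm, stl, x3ptm, ftm, blk, orb, ast, drb,
                               pf, ft_miss, fg_miss, tov, minutes) {
  positive <- fgm * 85.910 + stl * 53.897 + x3ptm * 51.757 + ftm * 46.845 +
    blk * 39.190 + orb * 39.190 + ast * 34.677 + drb * 14.707
  negative <- pf * 17.174 + ft_miss * 20.091 + fg_miss * 39.190 + tov * 53.897
  (positive - negative) / minutes
}

# Hypothetical single-game stat line (not taken from the data set).
example_per <- linear_weights_per(
  fgm = 8, stl = 1, x3ptm = 2, ftm = 4, blk = 1, orb = 2, ast = 5, drb = 6,
  pf = 2, ft_miss = 1, fg_miss = 7, tov = 2, minutes = 34
)
```

Because every term is a fixed weight times a count, this version of PER is linear in its inputs, which is the contrast with the neural network approach taken here.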
The goal of this project is to predict the PER using the advanced player metrics of all the players during the 2018-19 NBA season.
Using the ballr package in R, we can pull advanced player metric data from Basketball Reference (basketball-reference.com) directly into the console.
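A minimal sketch of that download step might look like the following. It assumes the ballr function `NBAPerGameAdvStatistics()` and assumes that the season is identified by the year in which it ends; the wrapper name `fetch_advanced_stats` is hypothetical.

```r
# Sketch only: needs the ballr package and an internet connection.
fetch_advanced_stats <- function(season = 2019) {
  if (!requireNamespace("ballr", quietly = TRUE)) {
    stop("Package 'ballr' is not installed; try install.packages('ballr').")
  }
  # Assumption: season = 2019 corresponds to the 2018-19 regular season.
  ballr::NBAPerGameAdvStatistics(season = season)
}

# Not run here because it needs network access:
# adv <- fetch_advanced_stats(2019)
# head(adv)
```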
Let’s view the advanced player metrics from the 2018-19 NBA regular season:
| rk | player | pos | age | tm | g | mp | per | tspercent | x3par | ftr | orbpercent | drbpercent | trbpercent | astpercent | stlpercent | blkpercent | tovpercent | usgpercent | x | ows | dws | ws | ws_48 | x_2 | obpm | dbpm | bpm | vorp | link |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Álex Abrines | SG | 25 | OKC | 31 | 588 | 6.3 | 0.507 | 0.809 | 0.083 | 0.9 | 7.8 | 4.2 | 4.3 | 1.3 | 0.9 | 7.9 | 12.2 | NA | 0.1 | 0.6 | 0.6 | 0.053 | NA | -3.7 | 0.4 | -3.3 | -0.2 | /players/a/abrinal01.html |
| 2 | Quincy Acy | PF | 28 | PHO | 10 | 123 | 2.9 | 0.379 | 0.833 | 0.556 | 2.7 | 20.1 | 11.3 | 8.2 | 0.4 | 2.7 | 15.2 | 9.2 | NA | -0.1 | 0.0 | -0.1 | -0.022 | NA | -7.6 | -0.5 | -8.1 | -0.2 | /players/a/acyqu01.html |
| 3 | Jaylen Adams | PG | 22 | ATL | 34 | 428 | 7.6 | 0.474 | 0.673 | 0.082 | 2.6 | 12.3 | 7.4 | 19.8 | 1.5 | 1.0 | 19.7 | 13.5 | NA | -0.1 | 0.2 | 0.1 | 0.011 | NA | -3.8 | -0.5 | -4.3 | -0.2 | /players/a/adamsja01.html |
| 4 | Steven Adams | C | 25 | OKC | 80 | 2669 | 18.5 | 0.591 | 0.002 | 0.361 | 14.7 | 14.8 | 14.7 | 6.6 | 2.0 | 2.4 | 12.6 | 16.4 | NA | 5.1 | 4.0 | 9.1 | 0.163 | NA | 0.7 | 0.4 | 1.1 | 2.1 | /players/a/adamsst01.html |
| 5 | Bam Adebayo | C | 21 | MIA | 82 | 1913 | 17.9 | 0.623 | 0.031 | 0.465 | 9.2 | 24.0 | 16.6 | 14.2 | 1.8 | 3.0 | 17.1 | 15.8 | NA | 3.4 | 3.4 | 6.8 | 0.171 | NA | -0.4 | 2.2 | 1.8 | 1.8 | /players/a/adebaba01.html |
| 6 | Deng Adel | SF | 21 | CLE | 19 | 194 | 2.7 | 0.424 | 0.639 | 0.111 | 1.6 | 9.6 | 5.4 | 3.4 | 0.3 | 1.8 | 13.7 | 9.9 | NA | -0.2 | 0.0 | -0.2 | -0.054 | NA | -6.0 | -1.6 | -7.5 | -0.3 | /players/a/adelde01.html |
Now let’s examine the structure of the data frame.
## 'data.frame': 708 obs. of 30 variables:
## $ rk : num 1 2 3 4 5 6 7 8 9 10 ...
## $ player : chr "Álex Abrines" "Quincy Acy" "Jaylen Adams" "Steven Adams" ...
## $ pos : chr "SG" "PF" "PG" "C" ...
## $ age : num 25 28 22 25 21 21 25 33 21 23 ...
## $ tm : chr "OKC" "PHO" "ATL" "OKC" ...
## $ g : num 31 10 34 80 82 19 7 81 10 38 ...
## $ mp : num 588 123 428 2669 1913 ...
## $ per : num 6.3 2.9 7.6 18.5 17.9 2.7 8.2 22.9 8.1 7.5 ...
## $ tspercent : num 0.507 0.379 0.474 0.591 0.623 0.424 0.322 0.576 0.418 0.516 ...
## $ x3par : num 0.809 0.833 0.673 0.002 0.031 0.639 0.4 0.032 0.308 0.556 ...
## $ ftr : num 0.083 0.556 0.082 0.361 0.465 0.111 0.2 0.312 0.308 0.337 ...
## $ orbpercent: num 0.9 2.7 2.6 14.7 9.2 1.6 4.9 10.3 9.9 0.8 ...
## $ drbpercent: num 7.8 20.1 12.3 14.8 24 9.6 14.8 19.8 13.7 5.1 ...
## $ trbpercent: num 4.2 11.3 7.4 14.7 16.6 5.4 9.8 15.1 11.8 3 ...
## $ astpercent: num 4.3 8.2 19.8 6.6 14.2 3.4 37.1 11.6 15.2 8.9 ...
## $ stlpercent: num 1.3 0.4 1.5 2 1.8 0.3 4.5 0.8 0.4 0.7 ...
## $ blkpercent: num 0.9 2.7 1 2.4 3 1.8 0 3.4 0 1.1 ...
## $ tovpercent: num 7.9 15.2 19.7 12.6 17.1 13.7 15.5 8.8 15.3 13.9 ...
## $ usgpercent: num 12.2 9.2 13.5 16.4 15.8 9.9 25 26.9 19 24.4 ...
## $ x : num NA NA NA NA NA NA NA NA NA NA ...
## $ ows : num 0.1 -0.1 -0.1 5.1 3.4 -0.2 -0.1 6.4 -0.1 -0.4 ...
## $ dws : num 0.6 0 0.2 4 3.4 0 0 2.9 0 0.4 ...
## $ ws : num 0.6 -0.1 0.1 9.1 6.8 -0.2 0 9.3 -0.1 0 ...
## $ ws_48 : num 0.053 -0.022 0.011 0.163 0.171 -0.054 -0.051 0.167 -0.042 0.002 ...
## $ x_2 : num NA NA NA NA NA NA NA NA NA NA ...
## $ obpm : num -3.7 -7.6 -3.8 0.7 -0.4 -6 -7.9 2.4 -3.8 -4.2 ...
## $ dbpm : num 0.4 -0.5 -0.5 0.4 2.2 -1.6 2.1 -0.6 -3.5 -2.1 ...
## $ bpm : num -3.3 -8.1 -4.3 1.1 1.8 -7.5 -5.8 1.8 -7.3 -6.3 ...
## $ vorp : num -0.2 -0.2 -0.2 2.1 1.8 -0.3 0 2.6 -0.2 -0.5 ...
## $ link : chr "/players/a/abrinal01.html" "/players/a/acyqu01.html" "/players/a/adamsja01.html" "/players/a/adamsst01.html" ...
We will omit the following columns from our analysis because they add no value to the prediction:
• rk
• player
• pos
• age
• tm
• g
• x
• x_2
• link
Here is the summary of the data set:
| rk | player | pos | age | tm | g | mp | per | tspercent | x3par | ftr | orbpercent | drbpercent | trbpercent | astpercent | stlpercent | blkpercent | tovpercent | usgpercent | x | ows | dws | ws | ws_48 | x_2 | obpm | dbpm | bpm | vorp | link | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Min. : 1.0 | Length:708 | Length:708 | Min. :19.00 | Length:708 | Min. : 1.00 | Min. : 1.0 | Min. :-38.10 | Min. :0.0000 | Min. :0.0000 | Min. :0.0000 | Min. : 0.00 | Min. : 0.00 | Min. : 0.00 | Min. : 0.00 | Min. : 0.000 | Min. : 0.000 | Min. : 0.00 | Min. : 0.00 | Min. : NA | Min. :-2.800 | Min. :-0.500 | Min. :-1.700 | Min. :-0.94600 | Min. : NA | Min. :-52.400 | Min. :-31.1000 | Min. :-81.400 | Min. :-2.0000 | Length:708 | |
| 1st Qu.:137.8 | Class :character | Class :character | 1st Qu.:23.00 | Class :character | 1st Qu.:19.00 | 1st Qu.: 245.2 | 1st Qu.: 9.30 | 1st Qu.:0.5000 | 1st Qu.:0.2527 | 1st Qu.:0.1540 | 1st Qu.: 1.90 | 1st Qu.: 9.90 | 1st Qu.: 6.20 | 1st Qu.: 7.10 | 1st Qu.: 1.000 | 1st Qu.: 0.500 | 1st Qu.: 9.00 | 1st Qu.:15.00 | 1st Qu.: NA | 1st Qu.: 0.000 | 1st Qu.: 0.200 | 1st Qu.: 0.200 | 1st Qu.: 0.03200 | 1st Qu.: NA | 1st Qu.: -3.200 | 1st Qu.: -1.2000 | 1st Qu.: -3.700 | 1st Qu.:-0.1000 | Class :character | |
| Median :269.5 | Mode :character | Mode :character | Median :26.00 | Mode :character | Median :44.00 | Median : 788.0 | Median : 12.40 | Median :0.5440 | Median :0.3890 | Median :0.2250 | Median : 3.30 | Median :13.40 | Median : 8.70 | Median :10.60 | Median : 1.400 | Median : 1.200 | Median :11.50 | Median :17.80 | Median : NA | Median : 0.400 | Median : 0.650 | Median : 1.100 | Median : 0.07700 | Median : NA | Median : -1.300 | Median : -0.3000 | Median : -1.500 | Median : 0.0000 | Mode :character | |
| Mean :268.4 | NA | NA | Mean :26.14 | NA | Mean :42.88 | Mean : 972.3 | Mean : 12.75 | Mean :0.5315 | Mean :0.3801 | Mean :0.2493 | Mean : 4.98 | Mean :15.17 | Mean :10.07 | Mean :13.03 | Mean : 1.484 | Mean : 1.591 | Mean :12.02 | Mean :18.49 | Mean :NaN | Mean : 1.003 | Mean : 0.964 | Mean : 1.968 | Mean : 0.07156 | Mean :NaN | Mean : -1.488 | Mean : -0.3542 | Mean : -1.845 | Mean : 0.4458 | NA | |
| 3rd Qu.:398.2 | NA | NA | 3rd Qu.:29.00 | NA | 3rd Qu.:68.00 | 3rd Qu.:1579.5 | 3rd Qu.: 16.20 | 3rd Qu.:0.5810 | 3rd Qu.:0.5300 | 3rd Qu.:0.3115 | 3rd Qu.: 7.00 | 3rd Qu.:19.20 | 3rd Qu.:12.85 | 3rd Qu.:17.32 | 3rd Qu.: 1.825 | 3rd Qu.: 2.125 | 3rd Qu.:14.40 | 3rd Qu.:21.80 | 3rd Qu.: NA | 3rd Qu.: 1.500 | 3rd Qu.: 1.400 | 3rd Qu.: 2.800 | 3rd Qu.: 0.11900 | 3rd Qu.: NA | 3rd Qu.: 0.300 | 3rd Qu.: 0.6000 | 3rd Qu.: 0.500 | 3rd Qu.: 0.5000 | NA | |
| Max. :530.0 | NA | NA | Max. :42.00 | NA | Max. :82.00 | Max. :3028.0 | Max. : 80.40 | Max. :1.5000 | Max. :1.0000 | Max. :2.0000 | Max. :100.00 | Max. :90.30 | Max. :51.60 | Max. :73.40 | Max. :12.300 | Max. :14.800 | Max. :50.00 | Max. :47.20 | Max. : NA | Max. :11.400 | Max. : 5.900 | Max. :15.200 | Max. : 1.26100 | Max. : NA | Max. : 40.100 | Max. : 11.9000 | Max. : 52.000 | Max. : 9.3000 | NA | |
| NA | NA | NA | NA | NA | NA | NA | NA | NA’s :6 | NA’s :6 | NA’s :6 | NA | NA | NA | NA | NA | NA | NA’s :6 | NA | NA’s :708 | NA | NA | NA | NA | NA’s :708 | NA | NA | NA | NA | NA |
We will now create a data frame that keeps only the numeric columns we need.
| mp | per | tspercent | x3par | ftr | orbpercent | drbpercent | trbpercent | astpercent | stlpercent | blkpercent | tovpercent | usgpercent | ows | dws | ws | ws_48 | obpm | dbpm | bpm | vorp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 588 | 6.3 | 0.507 | 0.809 | 0.083 | 0.9 | 7.8 | 4.2 | 4.3 | 1.3 | 0.9 | 7.9 | 12.2 | 0.1 | 0.6 | 0.6 | 0.053 | -3.7 | 0.4 | -3.3 | -0.2 |
| 123 | 2.9 | 0.379 | 0.833 | 0.556 | 2.7 | 20.1 | 11.3 | 8.2 | 0.4 | 2.7 | 15.2 | 9.2 | -0.1 | 0.0 | -0.1 | -0.022 | -7.6 | -0.5 | -8.1 | -0.2 |
| 428 | 7.6 | 0.474 | 0.673 | 0.082 | 2.6 | 12.3 | 7.4 | 19.8 | 1.5 | 1.0 | 19.7 | 13.5 | -0.1 | 0.2 | 0.1 | 0.011 | -3.8 | -0.5 | -4.3 | -0.2 |
| 2669 | 18.5 | 0.591 | 0.002 | 0.361 | 14.7 | 14.8 | 14.7 | 6.6 | 2.0 | 2.4 | 12.6 | 16.4 | 5.1 | 4.0 | 9.1 | 0.163 | 0.7 | 0.4 | 1.1 | 2.1 |
| 1913 | 17.9 | 0.623 | 0.031 | 0.465 | 9.2 | 24.0 | 16.6 | 14.2 | 1.8 | 3.0 | 17.1 | 15.8 | 3.4 | 3.4 | 6.8 | 0.171 | -0.4 | 2.2 | 1.8 | 1.8 |
| 194 | 2.7 | 0.424 | 0.639 | 0.111 | 1.6 | 9.6 | 5.4 | 3.4 | 0.3 | 1.8 | 13.7 | 9.9 | -0.2 | 0.0 | -0.2 | -0.054 | -6.0 | -1.6 | -7.5 | -0.3 |
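One way to arrive at a table like the one above is to drop the unwanted columns by name. The sketch below uses a small stand-in data frame with a few columns copied from the rows shown earlier; removing rows with missing values via `na.omit()` is an assumption, suggested by the `NA` counts in the summary.

```r
# Stand-in for the scraped data: three players, a handful of columns.
adv <- data.frame(
  rk        = c(2, 3, 4),
  player    = c("Quincy Acy", "Jaylen Adams", "Steven Adams"),
  pos       = c("PF", "PG", "C"),
  mp        = c(123, 428, 2669),
  per       = c(2.9, 7.6, 18.5),
  tspercent = c(0.379, 0.474, 0.591),
  x         = NA_real_,  # empty spacer column, as in the scraped table
  link      = c("/players/a/acyqu01.html", "/players/a/adamsja01.html",
                "/players/a/adamsst01.html"),
  stringsAsFactors = FALSE
)

# Columns listed above as adding no value to the analysis.
drop_cols <- c("rk", "player", "pos", "age", "tm", "g", "x", "x_2", "link")
adv_num <- adv[, !(names(adv) %in% drop_cols), drop = FALSE]

# Assumption: rows with missing values are removed before modelling.
adv_num <- na.omit(adv_num)
```

Dropping by name rather than by position keeps the step robust if the column order of the source table changes.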
The objective of our neural network is to predict PER, our dependent variable, from the advanced metrics, our independent variables. We will divide the data into training and test sets. The training set is used to find the relationship between the independent variables and PER, while the test set assesses the performance of the model. We will use 60% of the data set as the training set. Rows are assigned by random sampling, and the sampled row index is then used to create both the training and the test data sets.
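The split described above can be sketched as follows. The seed and the 100-row stand-in data frame are hypothetical and only make the example reproducible and self-contained.

```r
set.seed(123)  # hypothetical seed, used only so this sketch is reproducible

# Stand-in data frame; any data frame is split the same way.
adv_num <- data.frame(mp = runif(100), per = runif(100))

n <- nrow(adv_num)
index <- sample(seq_len(n), size = round(0.60 * n))  # row numbers for training

train <- adv_num[index, ]   # 60% of the rows
test  <- adv_num[-index, ]  # the remaining 40%
```

Using the negated index for the test set guarantees that no row appears in both sets.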
We will fit a neural network on our data using the neuralnet library. The first step is to scale our data set. This is essential because otherwise a variable with a large range may dominate the prediction, leading to meaningless results. We will use min-max normalization to scale our data.
Below are the maximum values of our data set:
## mp per tspercent x3par ftr orbpercent drbpercent
## 3028.000 80.400 1.500 1.000 2.000 100.000 90.300
## trbpercent astpercent stlpercent blkpercent tovpercent usgpercent ows
## 51.600 73.400 12.300 14.800 50.000 47.200 11.400
## dws ws ws_48 obpm dbpm bpm vorp
## 5.900 15.200 1.261 40.100 11.900 52.000 9.300
Below are the minimum values of our data set:
## mp per tspercent x3par ftr orbpercent drbpercent
## 1.000 -38.100 0.000 0.000 0.000 0.000 0.000
## trbpercent astpercent stlpercent blkpercent tovpercent usgpercent ows
## 0.000 0.000 0.000 0.000 0.000 5.700 -2.800
## dws ws ws_48 obpm dbpm bpm vorp
## -0.500 -1.700 -0.946 -52.400 -31.100 -81.400 -2.000
Here are the first 6 entries of our new scaled data frame:
| mp | per | tspercent | x3par | ftr | orbpercent | drbpercent | trbpercent | astpercent | stlpercent | blkpercent | tovpercent | usgpercent | ows | dws | ws | ws_48 | obpm | dbpm | bpm | vorp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.1939214 | 0.3746835 | 0.3380000 | 0.809 | 0.0415 | 0.009 | 0.0863787 | 0.0813953 | 0.0585831 | 0.1056911 | 0.0608108 | 0.158 | 0.1566265 | 0.2042254 | 0.171875 | 0.1360947 | 0.4526507 | 0.5264865 | 0.7325581 | 0.5854573 | 0.1592920 |
| 0.0403039 | 0.3459916 | 0.2526667 | 0.833 | 0.2780 | 0.027 | 0.2225914 | 0.2189922 | 0.1117166 | 0.0325203 | 0.1824324 | 0.304 | 0.0843373 | 0.1901408 | 0.078125 | 0.0946746 | 0.4186679 | 0.4843243 | 0.7116279 | 0.5494753 | 0.1592920 |
| 0.1410638 | 0.3856540 | 0.3160000 | 0.673 | 0.0410 | 0.026 | 0.1362126 | 0.1434109 | 0.2697548 | 0.1219512 | 0.0675676 | 0.394 | 0.1879518 | 0.1901408 | 0.109375 | 0.1065089 | 0.4336203 | 0.5254054 | 0.7116279 | 0.5779610 | 0.1592920 |
| 0.8814007 | 0.4776371 | 0.3940000 | 0.002 | 0.1805 | 0.147 | 0.1638981 | 0.2848837 | 0.0899183 | 0.1626016 | 0.1621622 | 0.252 | 0.2578313 | 0.5563380 | 0.703125 | 0.6390533 | 0.5024921 | 0.5740541 | 0.7325581 | 0.6184408 | 0.3628319 |
| 0.6316485 | 0.4725738 | 0.4153333 | 0.031 | 0.2325 | 0.092 | 0.2657807 | 0.3217054 | 0.1934605 | 0.1463415 | 0.2027027 | 0.342 | 0.2433735 | 0.4366197 | 0.609375 | 0.5029586 | 0.5061169 | 0.5621622 | 0.7744186 | 0.6236882 | 0.3362832 |
| 0.0637595 | 0.3443038 | 0.2826667 | 0.639 | 0.0555 | 0.016 | 0.1063123 | 0.1046512 | 0.0463215 | 0.0243902 | 0.1216216 | 0.274 | 0.1012048 | 0.1830986 | 0.078125 | 0.0887574 | 0.4041686 | 0.5016216 | 0.6860465 | 0.5539730 | 0.1504425 |
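The scaled values above are consistent with min-max normalization, \(x' = (x - \min) / (\max - \min)\), applied column by column. A minimal sketch follows; the helper name `min_max_scale` is hypothetical, and the demo rows are built from the reported minima and maxima together with the first player's `mp` and `per`.

```r
# Min-max normalization: rescale every column to the range [0, 1].
min_max_scale <- function(df) {
  maxs <- vapply(df, max, numeric(1), na.rm = TRUE)
  mins <- vapply(df, min, numeric(1), na.rm = TRUE)
  # scale() subtracts `center` and divides by `scale`, column by column.
  scaled <- scale(df, center = mins, scale = maxs - mins)
  as.data.frame(scaled)
}

# Rows: reported minimum, first player in the table, reported maximum.
demo <- data.frame(mp = c(1, 588, 3028), per = c(-38.1, 6.3, 80.4))
scaled_demo <- min_max_scale(demo)
# The middle row matches the first scaled row shown above
# (mp about 0.1939, per about 0.3747).
```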
The scaled data is used to fit the neural network. We then visualize the neural network with the weights for each of the variables.
The formula used for the construction of the neural network is as follows:
## per ~ mp + tspercent + x3par + ftr + orbpercent + drbpercent +
## trbpercent + astpercent + stlpercent + blkpercent + tovpercent +
## usgpercent + ows + dws + ws + ws_48 + obpm + dbpm + bpm +
## vorp
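A formula like the one above can be built from the predictor names and passed to `neuralnet()`. This is a sketch under assumptions: the wrapper `fit_per_network` is hypothetical, and the choice of three hidden units is only inferred from the shape of the prediction output shown later, not stated by the analysis.

```r
# The 20 predictors named in the formula above.
predictors <- c("mp", "tspercent", "x3par", "ftr", "orbpercent", "drbpercent",
                "trbpercent", "astpercent", "stlpercent", "blkpercent",
                "tovpercent", "usgpercent", "ows", "dws", "ws", "ws_48",
                "obpm", "dbpm", "bpm", "vorp")

# Build `per ~ mp + tspercent + ... + vorp` without typing it by hand.
nn_formula <- reformulate(predictors, response = "per")

# Hypothetical wrapper; `train` is the scaled training data frame.
fit_per_network <- function(train, hidden = 3) {
  if (!requireNamespace("neuralnet", quietly = TRUE)) {
    stop("Package 'neuralnet' is not installed.")
  }
  # linear.output = TRUE requests a regression output, not a classification.
  neuralnet::neuralnet(nn_formula, data = train, hidden = hidden,
                       linear.output = TRUE)
}

# Not run here: nn <- fit_per_network(train); plot(nn)
```

Selecting predictors by name also guards against accidentally passing the target column `per` to the network as an input.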
Now we can plot the neural network as follows:
The black lines indicate the connections between each layer and the weights on each connection, while the blue lines show the bias term added in each step. Neural nets are essentially a black box, so there isn’t much to infer about the fitting, the weights and the model values. Even so, the sign of each weight hints at which statistical categories have a negative or positive effect on PER.
Here is another plot of the same neural network:
In this plot, the inputs are labelled as \(I_{n}\), the bias terms as \(B_{n}\), the hidden layers as \(H_{n}\) and the output as \(O_{1}\). The black lines indicate a positive weight on the connection, and the grey lines indicate a negative weight.
Here is a list of the positive inputs of our neural network:
• Minutes Played
• True Shooting Percentage
• 3-Point Attempt Rate
• Free Throw Rate
• Offensive/Defensive Rebounding Percentage
• Assist Percentage
• Turnover Percentage
• Offensive Win Shares
• Win Shares Per 48
• Offensive Box Plus/Minus
• Value Over Replacement Player
Here is a list of the negative inputs of our neural network:
• Total Rebounding Percentage
• Steal Percentage
• Block Percentage
• Usage Percentage
• Defensive and Total Win Shares
• Defensive Box Plus/Minus
• Box Plus/Minus
These signs are not necessarily the best indicators of what to value, but they suggest that PER places more importance on offensive categories than on defensive ones.
The training algorithm has converged and therefore this model can be used to predict Player Efficiency Rating.
Now we can try to predict the values for the test set and calculate the mean squared error (MSE). We need to scale the values back to their original units in order to make a meaningful comparison.
Here is the structure of the object returned by the prediction step:
## List of 2
## $ neurons :List of 2
## ..$ : num [1:281, 1:21] 1 1 1 1 1 1 1 1 1 1 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:281] "1" "3" "5" "7" ...
## .. .. ..$ : chr [1:21] "" "mp" "per" "tspercent" ...
## ..$ : num [1:281, 1:4] 1 1 1 1 1 1 1 1 1 1 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:281] "1" "3" "5" "7" ...
## .. .. ..$ : NULL
## $ net.result: num [1:281, 1] 0.41 0.463 0.572 0.532 0.527 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:281] "1" "3" "5" "7" ...
## .. ..$ : NULL
We can compare the predicted rating with the real rating using the following visualization:
Now let’s inspect the error by placing the actual and predicted values side by side in this data frame:
## advtest.r true.predictions
## 1 0.3746835 0.4097333
## 3 0.3856540 0.4631767
## 5 0.4725738 0.5723588
## 7 0.3907173 0.5321029
## 12 0.4582278 0.5266487
## 16 0.3603376 0.4312061
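The values above are still on the scaled 0 to 1 range. A minimal sketch of scaling them back to PER units and computing the MSE follows; it uses only the six rows printed above together with the reported minimum and maximum of `per`, so the resulting figure describes these six rows and not the full test set. The helper name `unscale` is hypothetical.

```r
# Invert min-max normalization: x = x_scaled * (max - min) + min.
unscale <- function(x_scaled, x_min, x_max) x_scaled * (x_max - x_min) + x_min

per_min <- -38.1  # minimum of `per` reported earlier
per_max <- 80.4   # maximum of `per` reported earlier

# The six rows printed above.
actual_scaled    <- c(0.3746835, 0.3856540, 0.4725738,
                      0.3907173, 0.4582278, 0.3603376)
predicted_scaled <- c(0.4097333, 0.4631767, 0.5723588,
                      0.5321029, 0.5266487, 0.4312061)

actual_per    <- unscale(actual_scaled, per_min, per_max)
predicted_per <- unscale(predicted_scaled, per_min, per_max)

# Mean squared error in PER units, over these six rows only.
mse <- mean((actual_per - predicted_per)^2)
```

Rescaling first means the error is expressed in PER points, which is easier to interpret than an error on the normalized scale.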
Now we can use data from any season, or for a specific team, to predict PER.