Neural Network on PER

Introduction

In this project we will construct a neural network to analyse which factors go into the player efficiency rating (PER). PER sums a player’s positive accomplishments, subtracts the negative ones, and returns a per-minute rating of the player’s performance. A neural network is a set of connected input and output units in which each connection has a weight associated with it.

The inputs are the following advanced player metrics:

• Minutes Played

• True Shooting Percentage

• 3 pointer Rate

• Free throw Rate

• Offensive/ Defensive and Total Rebounding Percentage

• Assist Percentage

• Steal Percentage

• Block Percentage

• Turnover Percentage

• Usage Percentage

• Offensive/Defensive and Total Win Shares

• Win Shares Per 48

• Offensive/Defensive Box Plus/Minus

• Box Plus/Minus

• Value Over Replacement

Our only output is the following

• Player Efficiency Rating

During the learning phase, a neural network adjusts the weights on its connections so that it predicts the correct output for the given inputs. Here the output is a numeric value, PER, rather than a class label. The basic structure of a neural network consists of an input layer, any number of hidden layers, and an output layer.

The information processing units do not work in a linear manner. In fact, a neural network draws its strength from processing information in parallel, which allows it to deal with non-linearity.

A linear-weights formula for PER, based on more traditional box-score statistics, is as follows:

[ (FGM x 85.910) + (Steals x 53.897) + (3PTM x 51.757) + (FTM x 46.845) + (Blocks x 39.190) + (Offensive_Reb x 39.190) + (Assists x 34.677) + (Defensive_Reb x 14.707) - (Foul x 17.174) - (FT_Miss x 20.091) - (FG_Miss x 39.190) - (TO x 53.897) ] x (1 / Minutes).
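The linear-weights formula above can be sketched as a small R function. The function name and argument names are hypothetical, each argument is assumed to be a season total for one player, and the stat line in the example is made up purely for illustration.

```r
# Linear-weights approximation of PER, using the coefficients quoted above.
# All argument names are hypothetical; each is a season total for one player.
linear_per <- function(fgm, stl, fg3m, ftm, blk, orb, ast, drb,
                       pf, ft_miss, fg_miss, tov, minutes) {
  positive <- fgm * 85.910 + stl * 53.897 + fg3m * 51.757 + ftm * 46.845 +
    blk * 39.190 + orb * 39.190 + ast * 34.677 + drb * 14.707
  negative <- pf * 17.174 + ft_miss * 20.091 + fg_miss * 39.190 + tov * 53.897
  (positive - negative) / minutes
}

# Made-up stat line, for illustration only
example_per <- linear_per(fgm = 10, stl = 2, fg3m = 3, ftm = 5, blk = 1, orb = 2,
                          ast = 4, drb = 6, pf = 3, ft_miss = 1, fg_miss = 8,
                          tov = 2, minutes = 100)
```

Because every term is divided by minutes played, the result is a per-minute rate, which is why low-minute players can end up with extreme values.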

The goal of this project is to predict the PER using the advanced player metrics of all the players during the 2018-19 NBA season.

NBA Statistics in R

Using the ballr package in R, we can pull advanced player metric data from basketball-reference.com directly into an R session.

Let’s view the advanced player metrics from the 2018-2019 NBA regular season
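A minimal sketch of how such a table could be fetched, assuming the ballr package is installed and the site is reachable. The wrapper name load_adv_stats is hypothetical, and the season argument is assumed to be the year in which the season ends (2019 for 2018-19).

```r
# Assumes the ballr package is installed and basketball-reference.com is reachable.
# load_adv_stats is a hypothetical wrapper; season is the year the season ends.
load_adv_stats <- function(season = 2019) {
  ballr::NBAPerGameAdvStatistics(season = season)
}

# Usage (needs network access):
# adv <- load_adv_stats(2019)
# head(adv)
```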

rk	player	pos	age	tm	g	mp	per	tspercent	x3par	ftr	orbpercent	drbpercent	trbpercent	astpercent	stlpercent	blkpercent	tovpercent	usgpercent	x	ows	dws	ws	ws_48	x_2	obpm	dbpm	bpm	vorp	link
1	Álex Abrines	SG	25	OKC	31	588	6.3	0.507	0.809	0.083	0.9	7.8	4.2	4.3	1.3	0.9	7.9	12.2	NA	0.1	0.6	0.6	0.053	NA	-3.7	0.4	-3.3	-0.2	/players/a/abrinal01.html
2	Quincy Acy	PF	28	PHO	10	123	2.9	0.379	0.833	0.556	2.7	20.1	11.3	8.2	0.4	2.7	15.2	9.2	NA	-0.1	0.0	-0.1	-0.022	NA	-7.6	-0.5	-8.1	-0.2	/players/a/acyqu01.html
3	Jaylen Adams	PG	22	ATL	34	428	7.6	0.474	0.673	0.082	2.6	12.3	7.4	19.8	1.5	1.0	19.7	13.5	NA	-0.1	0.2	0.1	0.011	NA	-3.8	-0.5	-4.3	-0.2	/players/a/adamsja01.html
4	Steven Adams	C	25	OKC	80	2669	18.5	0.591	0.002	0.361	14.7	14.8	14.7	6.6	2.0	2.4	12.6	16.4	NA	5.1	4.0	9.1	0.163	NA	0.7	0.4	1.1	2.1	/players/a/adamsst01.html
5	Bam Adebayo	C	21	MIA	82	1913	17.9	0.623	0.031	0.465	9.2	24.0	16.6	14.2	1.8	3.0	17.1	15.8	NA	3.4	3.4	6.8	0.171	NA	-0.4	2.2	1.8	1.8	/players/a/adebaba01.html
6	Deng Adel	SF	21	CLE	19	194	2.7	0.424	0.639	0.111	1.6	9.6	5.4	3.4	0.3	1.8	13.7	9.9	NA	-0.2	0.0	-0.2	-0.054	NA	-6.0	-1.6	-7.5	-0.3	/players/a/adelde01.html

Now let’s examine the structure of the data frame.

## 'data.frame':    708 obs. of  30 variables:
##  $ rk        : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ player    : chr  "Álex Abrines" "Quincy Acy" "Jaylen Adams" "Steven Adams" ...
##  $ pos       : chr  "SG" "PF" "PG" "C" ...
##  $ age       : num  25 28 22 25 21 21 25 33 21 23 ...
##  $ tm        : chr  "OKC" "PHO" "ATL" "OKC" ...
##  $ g         : num  31 10 34 80 82 19 7 81 10 38 ...
##  $ mp        : num  588 123 428 2669 1913 ...
##  $ per       : num  6.3 2.9 7.6 18.5 17.9 2.7 8.2 22.9 8.1 7.5 ...
##  $ tspercent : num  0.507 0.379 0.474 0.591 0.623 0.424 0.322 0.576 0.418 0.516 ...
##  $ x3par     : num  0.809 0.833 0.673 0.002 0.031 0.639 0.4 0.032 0.308 0.556 ...
##  $ ftr       : num  0.083 0.556 0.082 0.361 0.465 0.111 0.2 0.312 0.308 0.337 ...
##  $ orbpercent: num  0.9 2.7 2.6 14.7 9.2 1.6 4.9 10.3 9.9 0.8 ...
##  $ drbpercent: num  7.8 20.1 12.3 14.8 24 9.6 14.8 19.8 13.7 5.1 ...
##  $ trbpercent: num  4.2 11.3 7.4 14.7 16.6 5.4 9.8 15.1 11.8 3 ...
##  $ astpercent: num  4.3 8.2 19.8 6.6 14.2 3.4 37.1 11.6 15.2 8.9 ...
##  $ stlpercent: num  1.3 0.4 1.5 2 1.8 0.3 4.5 0.8 0.4 0.7 ...
##  $ blkpercent: num  0.9 2.7 1 2.4 3 1.8 0 3.4 0 1.1 ...
##  $ tovpercent: num  7.9 15.2 19.7 12.6 17.1 13.7 15.5 8.8 15.3 13.9 ...
##  $ usgpercent: num  12.2 9.2 13.5 16.4 15.8 9.9 25 26.9 19 24.4 ...
##  $ x         : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ ows       : num  0.1 -0.1 -0.1 5.1 3.4 -0.2 -0.1 6.4 -0.1 -0.4 ...
##  $ dws       : num  0.6 0 0.2 4 3.4 0 0 2.9 0 0.4 ...
##  $ ws        : num  0.6 -0.1 0.1 9.1 6.8 -0.2 0 9.3 -0.1 0 ...
##  $ ws_48     : num  0.053 -0.022 0.011 0.163 0.171 -0.054 -0.051 0.167 -0.042 0.002 ...
##  $ x_2       : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ obpm      : num  -3.7 -7.6 -3.8 0.7 -0.4 -6 -7.9 2.4 -3.8 -4.2 ...
##  $ dbpm      : num  0.4 -0.5 -0.5 0.4 2.2 -1.6 2.1 -0.6 -3.5 -2.1 ...
##  $ bpm       : num  -3.3 -8.1 -4.3 1.1 1.8 -7.5 -5.8 1.8 -7.3 -6.3 ...
##  $ vorp      : num  -0.2 -0.2 -0.2 2.1 1.8 -0.3 0 2.6 -0.2 -0.5 ...
##  $ link      : chr  "/players/a/abrinal01.html" "/players/a/acyqu01.html" "/players/a/adamsja01.html" "/players/a/adamsst01.html" ...

We will omit the following columns from our analysis because they do not provide any extra value:

• rk

• player

• pos

• age

• tm

• g

• x

• x_2

• link
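Dropping these columns might look like the sketch below. The toy data frame stands in for the scraped table and only carries a few of its columns. Removing rows with missing values is an assumption on our part: the summary reports six NA values in some columns, and a neural network cannot be trained on rows that contain them.

```r
# Toy stand-in for the scraped data frame (a few columns only)
adv <- data.frame(
  rk = c(2, 3, 4),
  player = c("Quincy Acy", "Jaylen Adams", "Steven Adams"),
  pos = c("PF", "PG", "C"),
  age = c(28, 22, 25),
  tm = c("PHO", "ATL", "OKC"),
  g = c(10, 34, 80),
  mp = c(123, 428, 2669),
  per = c(2.9, 7.6, 18.5),
  tspercent = c(0.379, NA, 0.591),  # NA injected to show the filtering step
  x = NA,
  x_2 = NA,
  link = c("/players/a/acyqu01.html", "/players/a/adamsja01.html",
           "/players/a/adamsst01.html"),
  stringsAsFactors = FALSE
)

# Columns listed above as providing no extra value
drop_cols <- c("rk", "player", "pos", "age", "tm", "g", "x", "x_2", "link")
adv_num <- adv[, !(names(adv) %in% drop_cols), drop = FALSE]

# Assumption: rows with any missing metric are removed before modelling
adv_num <- na.omit(adv_num)
```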

Here is the summary of the data set

rk	player	pos	age	tm	g	mp	per	tspercent	x3par	ftr	orbpercent	drbpercent	trbpercent	astpercent	stlpercent	blkpercent	tovpercent	usgpercent	x	ows	dws	ws	ws_48	x_2	obpm	dbpm	bpm	vorp	link
Min. : 1.0	Length:708	Length:708	Min. :19.00	Length:708	Min. : 1.00	Min. : 1.0	Min. :-38.10	Min. :0.0000	Min. :0.0000	Min. :0.0000	Min. : 0.00	Min. : 0.00	Min. : 0.00	Min. : 0.00	Min. : 0.000	Min. : 0.000	Min. : 0.00	Min. : 0.00	Min. : NA	Min. :-2.800	Min. :-0.500	Min. :-1.700	Min. :-0.94600	Min. : NA	Min. :-52.400	Min. :-31.1000	Min. :-81.400	Min. :-2.0000	Length:708
1st Qu.:137.8	Class :character	Class :character	1st Qu.:23.00	Class :character	1st Qu.:19.00	1st Qu.: 245.2	1st Qu.: 9.30	1st Qu.:0.5000	1st Qu.:0.2527	1st Qu.:0.1540	1st Qu.: 1.90	1st Qu.: 9.90	1st Qu.: 6.20	1st Qu.: 7.10	1st Qu.: 1.000	1st Qu.: 0.500	1st Qu.: 9.00	1st Qu.:15.00	1st Qu.: NA	1st Qu.: 0.000	1st Qu.: 0.200	1st Qu.: 0.200	1st Qu.: 0.03200	1st Qu.: NA	1st Qu.: -3.200	1st Qu.: -1.2000	1st Qu.: -3.700	1st Qu.:-0.1000	Class :character
Median :269.5	Mode :character	Mode :character	Median :26.00	Mode :character	Median :44.00	Median : 788.0	Median : 12.40	Median :0.5440	Median :0.3890	Median :0.2250	Median : 3.30	Median :13.40	Median : 8.70	Median :10.60	Median : 1.400	Median : 1.200	Median :11.50	Median :17.80	Median : NA	Median : 0.400	Median : 0.650	Median : 1.100	Median : 0.07700	Median : NA	Median : -1.300	Median : -0.3000	Median : -1.500	Median : 0.0000	Mode :character
Mean :268.4	NA	NA	Mean :26.14	NA	Mean :42.88	Mean : 972.3	Mean : 12.75	Mean :0.5315	Mean :0.3801	Mean :0.2493	Mean : 4.98	Mean :15.17	Mean :10.07	Mean :13.03	Mean : 1.484	Mean : 1.591	Mean :12.02	Mean :18.49	Mean :NaN	Mean : 1.003	Mean : 0.964	Mean : 1.968	Mean : 0.07156	Mean :NaN	Mean : -1.488	Mean : -0.3542	Mean : -1.845	Mean : 0.4458	NA
3rd Qu.:398.2	NA	NA	3rd Qu.:29.00	NA	3rd Qu.:68.00	3rd Qu.:1579.5	3rd Qu.: 16.20	3rd Qu.:0.5810	3rd Qu.:0.5300	3rd Qu.:0.3115	3rd Qu.: 7.00	3rd Qu.:19.20	3rd Qu.:12.85	3rd Qu.:17.32	3rd Qu.: 1.825	3rd Qu.: 2.125	3rd Qu.:14.40	3rd Qu.:21.80	3rd Qu.: NA	3rd Qu.: 1.500	3rd Qu.: 1.400	3rd Qu.: 2.800	3rd Qu.: 0.11900	3rd Qu.: NA	3rd Qu.: 0.300	3rd Qu.: 0.6000	3rd Qu.: 0.500	3rd Qu.: 0.5000	NA
Max. :530.0	NA	NA	Max. :42.00	NA	Max. :82.00	Max. :3028.0	Max. : 80.40	Max. :1.5000	Max. :1.0000	Max. :2.0000	Max. :100.00	Max. :90.30	Max. :51.60	Max. :73.40	Max. :12.300	Max. :14.800	Max. :50.00	Max. :47.20	Max. : NA	Max. :11.400	Max. : 5.900	Max. :15.200	Max. : 1.26100	Max. : NA	Max. : 40.100	Max. : 11.9000	Max. : 52.000	Max. : 9.3000	NA
NA	NA	NA	NA	NA	NA	NA	NA	NA’s :6	NA’s :6	NA’s :6	NA	NA	NA	NA	NA	NA	NA’s :6	NA	NA’s :708	NA	NA	NA	NA	NA’s :708	NA	NA	NA	NA	NA

We will now create a data frame that contains only the remaining numeric columns.

mp	per	tspercent	x3par	ftr	orbpercent	drbpercent	trbpercent	astpercent	stlpercent	blkpercent	tovpercent	usgpercent	ows	dws	ws	ws_48	obpm	dbpm	bpm	vorp
588	6.3	0.507	0.809	0.083	0.9	7.8	4.2	4.3	1.3	0.9	7.9	12.2	0.1	0.6	0.6	0.053	-3.7	0.4	-3.3	-0.2
123	2.9	0.379	0.833	0.556	2.7	20.1	11.3	8.2	0.4	2.7	15.2	9.2	-0.1	0.0	-0.1	-0.022	-7.6	-0.5	-8.1	-0.2
428	7.6	0.474	0.673	0.082	2.6	12.3	7.4	19.8	1.5	1.0	19.7	13.5	-0.1	0.2	0.1	0.011	-3.8	-0.5	-4.3	-0.2
2669	18.5	0.591	0.002	0.361	14.7	14.8	14.7	6.6	2.0	2.4	12.6	16.4	5.1	4.0	9.1	0.163	0.7	0.4	1.1	2.1
1913	17.9	0.623	0.031	0.465	9.2	24.0	16.6	14.2	1.8	3.0	17.1	15.8	3.4	3.4	6.8	0.171	-0.4	2.2	1.8	1.8
194	2.7	0.424	0.639	0.111	1.6	9.6	5.4	3.4	0.3	1.8	13.7	9.9	-0.2	0.0	-0.2	-0.054	-6.0	-1.6	-7.5	-0.3

Fitting the Neural Network

The objective of our neural network is to predict PER, our dependent variable, from the advanced metrics, our independent variables. We will divide the data into training and test sets. The training set is used to find the relationship between the independent variables and PER, while the test set assesses the performance of the model. We will use 60% of the data set as the training set. Rows are assigned by random sampling: a randomly drawn index vector selects the training rows, and the remaining rows form the test set.
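A 60/40 random split with an index vector could be sketched as follows. The toy data frame and the seed are assumptions made only so the example is self-contained and reproducible.

```r
set.seed(42)  # arbitrary seed, only so the split is reproducible

# Toy stand-in for the cleaned data frame
adv_num <- data.frame(mp = seq(100, 1000, by = 100), per = seq(5, 23, by = 2))

n <- nrow(adv_num)
index <- sample(seq_len(n), size = round(0.60 * n))  # row numbers for training

train <- adv_num[index, ]   # 60% of the rows
test  <- adv_num[-index, ]  # the remaining 40%
```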

We will fit a neural network on our data using the neuralnet library. The first step is to scale the data set. This is essential because otherwise a variable with a large range, such as minutes played, can dominate the prediction and lead to meaningless results. We will use min-max normalization, which maps each column onto the interval from 0 to 1.
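Min-max normalization computes (x - min) / (max - min) for each column. A minimal sketch with base R's scale() follows. The three-row toy data frame is an assumption; it reuses the minimum, maximum and first-player values of mp and per from this data set.

```r
# Toy data: column minimum, one real row, and column maximum for mp and per
adv_num <- data.frame(mp = c(1, 588, 3028), per = c(-38.1, 6.3, 80.4))

maxs <- apply(adv_num, 2, max)  # per-column maxima
mins <- apply(adv_num, 2, min)  # per-column minima

# scale() subtracts `center` and divides by `scale`, column by column
scaled <- as.data.frame(scale(adv_num, center = mins, scale = maxs - mins))
```

In practice the same mins and maxs need to be kept around, since they are required later to convert predictions back to PER units.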

Below are the maximum values of our data set

##         mp        per  tspercent      x3par        ftr orbpercent drbpercent 
##   3028.000     80.400      1.500      1.000      2.000    100.000     90.300 
## trbpercent astpercent stlpercent blkpercent tovpercent usgpercent        ows 
##     51.600     73.400     12.300     14.800     50.000     47.200     11.400 
##        dws         ws      ws_48       obpm       dbpm        bpm       vorp 
##      5.900     15.200      1.261     40.100     11.900     52.000      9.300

Below are the minimum values of our data set

##         mp        per  tspercent      x3par        ftr orbpercent drbpercent 
##      1.000    -38.100      0.000      0.000      0.000      0.000      0.000 
## trbpercent astpercent stlpercent blkpercent tovpercent usgpercent        ows 
##      0.000      0.000      0.000      0.000      0.000      5.700     -2.800 
##        dws         ws      ws_48       obpm       dbpm        bpm       vorp 
##     -0.500     -1.700     -0.946    -52.400    -31.100    -81.400     -2.000

Here are the first 6 entries of our new scaled data frame

mp	per	tspercent	x3par	ftr	orbpercent	drbpercent	trbpercent	astpercent	stlpercent	blkpercent	tovpercent	usgpercent	ows	dws	ws	ws_48	obpm	dbpm	bpm	vorp
0.1939214	0.3746835	0.3380000	0.809	0.0415	0.009	0.0863787	0.0813953	0.0585831	0.1056911	0.0608108	0.158	0.1566265	0.2042254	0.171875	0.1360947	0.4526507	0.5264865	0.7325581	0.5854573	0.1592920
0.0403039	0.3459916	0.2526667	0.833	0.2780	0.027	0.2225914	0.2189922	0.1117166	0.0325203	0.1824324	0.304	0.0843373	0.1901408	0.078125	0.0946746	0.4186679	0.4843243	0.7116279	0.5494753	0.1592920
0.1410638	0.3856540	0.3160000	0.673	0.0410	0.026	0.1362126	0.1434109	0.2697548	0.1219512	0.0675676	0.394	0.1879518	0.1901408	0.109375	0.1065089	0.4336203	0.5254054	0.7116279	0.5779610	0.1592920
0.8814007	0.4776371	0.3940000	0.002	0.1805	0.147	0.1638981	0.2848837	0.0899183	0.1626016	0.1621622	0.252	0.2578313	0.5563380	0.703125	0.6390533	0.5024921	0.5740541	0.7325581	0.6184408	0.3628319
0.6316485	0.4725738	0.4153333	0.031	0.2325	0.092	0.2657807	0.3217054	0.1934605	0.1463415	0.2027027	0.342	0.2433735	0.4366197	0.609375	0.5029586	0.5061169	0.5621622	0.7744186	0.6236882	0.3362832
0.0637595	0.3443038	0.2826667	0.639	0.0555	0.016	0.1063123	0.1046512	0.0463215	0.0243902	0.1216216	0.274	0.1012048	0.1830986	0.078125	0.0887574	0.4041686	0.5016216	0.6860465	0.5539730	0.1504425

Neural Net Visualization

The scaled data is used to fit the neural network. We then visualize the neural network with the weights for each of the variables.

The formula used for the construction of the neural network is as follows

## per ~ mp + tspercent + x3par + ftr + orbpercent + drbpercent + 
##     trbpercent + astpercent + stlpercent + blkpercent + tovpercent + 
##     usgpercent + ows + dws + ws + ws_48 + obpm + dbpm + bpm + 
##     vorp
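A formula of this shape can be built from a vector of column names and passed to neuralnet(). The sketch below uses only three predictors and a synthetic target so that it stays small. The choice of three hidden units and the toy data are assumptions, not the settings used for the fit reported here.

```r
# Build "per ~ mp + tspercent + x3par" from a character vector
predictors <- c("mp", "tspercent", "x3par")  # shortened list for the sketch
f <- reformulate(predictors, response = "per")

# Synthetic, already-scaled training data (assumption, for illustration only)
set.seed(1)
train <- data.frame(mp = runif(50), tspercent = runif(50), x3par = runif(50))
train$per <- with(train, 0.5 * mp + 0.3 * tspercent - 0.1 * x3par + 0.1)

# Fit only if the neuralnet package is available
if (requireNamespace("neuralnet", quietly = TRUE)) {
  # hidden = 3 is an assumed size; linear.output = TRUE selects regression
  nn <- neuralnet::neuralnet(f, data = train, hidden = 3, linear.output = TRUE)
  # plot(nn) draws the network with its weights
}
```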

Now we can plot out the neural network as follows

The black lines show the connections between layers, with the weight on each connection, while the blue lines show the bias term added at each step. A neural net is essentially a black box, so there is not much to infer directly from the fit, the weights and the model values. We can, however, see which statistical categories carry a positive or a negative weight on PER.

Here is another plot of the same neural network

In this plot, the inputs are labelled as \(I_{n}\), the bias terms as \(B_{n}\), the hidden units as \(H_{n}\) and the output as \(O_{1}\). Black lines indicate a positive weight on the connection and grey lines indicate a negative weight.

Here is a list of the positive inputs of our neural network

• Minutes Played

• True Shooting Percentage

• 3 pointer Rate

• Free throw Rate

• Offensive/Defensive Rebounding Percentage

• Assist Percentage

• Turnover Percentage

• Offensive Win Shares

• Win Shares Per 48

• Offensive Box Plus/Minus

• Value Over Replacement

Here is a list of the negative inputs of our neural network

• Total Rebounding Percentage

• Steal Percentage

• Block Percentage

• Usage Percentage

• Defensive and Total Win Shares

• Defensive Box Plus/Minus

• Box Plus/Minus

These signs are not necessarily the best guide to what to value, but they suggest that PER puts more weight on offensive categories than on defensive ones.

The training algorithm has converged and therefore this model can be used to predict Player Efficiency Rating.

Predictions using the Model

Now we can predict the values for the test set and calculate the mean squared error (MSE). The predictions come out on the scaled range, so we need to scale them back to PER units in order to make a meaningful comparison.
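Scaling back inverts the min-max formula: x = scaled * (max - min) + min. The sketch below applies this to a few scaled actual and predicted values taken from the comparison table in this section, using the PER minimum and maximum reported earlier. The helper name unscale is hypothetical. When predicting, the per column should not be among the inputs passed to the network, otherwise the target leaks into the prediction.

```r
# PER range from the minimum and maximum values reported for this data set
per_min <- -38.1
per_max <- 80.4

# Hypothetical helper: invert min-max scaling
unscale <- function(x, lo, hi) x * (hi - lo) + lo

# A few scaled actual and predicted values from the comparison table
actual_scaled    <- c(0.3746835, 0.3856540, 0.4725738)
predicted_scaled <- c(0.4097333, 0.4631767, 0.5723588)

actual_per    <- unscale(actual_scaled, per_min, per_max)
predicted_per <- unscale(predicted_scaled, per_min, per_max)

# Mean squared error in PER units
mse <- mean((actual_per - predicted_per)^2)
```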

Here is the structure of the list returned by the prediction step

## List of 2
##  $ neurons   :List of 2
##   ..$ : num [1:281, 1:21] 1 1 1 1 1 1 1 1 1 1 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:281] "1" "3" "5" "7" ...
##   .. .. ..$ : chr [1:21] "" "mp" "per" "tspercent" ...
##   ..$ : num [1:281, 1:4] 1 1 1 1 1 1 1 1 1 1 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:281] "1" "3" "5" "7" ...
##   .. .. ..$ : NULL
##  $ net.result: num [1:281, 1] 0.41 0.463 0.572 0.532 0.527 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:281] "1" "3" "5" "7" ...
##   .. ..$ : NULL

We can compare the predicted rating with real rating using the following visualization

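Now let’s look at the error row by row. The data frame below lists the actual rating (advtest.r) next to the predicted rating (true.predictions) for the first test rows; the values shown are on the scaled 0 to 1 range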

##    advtest.r true.predictions
## 1  0.3746835        0.4097333
## 3  0.3856540        0.4631767
## 5  0.4725738        0.5723588
## 7  0.3907173        0.5321029
## 12 0.4582278        0.5266487
## 16 0.3603376        0.4312061

Now we can use data from any season, or from a specific team, to predict PER.