DATA 621 Blog 1: Predicting NBA Player Efficiency by Total Points
David Quarshie
Intro
For my blogs I plan on looking at player stats from the NBA to see if there are statistical methods we can use to derive some interesting findings. The data I’ll be using can be found here: https://www.kaggle.com/drgilermo/nba-players-stats#Seasons_Stats.csv. It contains stats such as points, rebounds, assists, and fouls for players, starting from the 1950 season.
In this first blog I’m going to focus on a player’s efficiency. The data we’re dealing with already has a stat called Player Efficiency Rating, or PER. This stat was developed by ESPN’s John Hollinger and is defined by John as, “The PER sums up all a player’s positive accomplishments, subtracts the negative accomplishments, and returns a per-minute rating of a player’s performance.” For the full breakdown of how PER is calculated, take a look here: https://www.basketball-reference.com/about/per.html
After reading about what goes into calculating PER, we see that scoring points, arguably the main point of basketball, is not a major part of the formula. So let’s use regression to see how total points relate to a player’s efficiency.
Dimensions
We’ve drilled down our data to only include stats from the 2017 NBA season. The dimensions of the dataset are below. We’re working with 595 rows of data with 53 columns.
## [1] 595 53
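Here is a minimal sketch of how that drill-down could look in R. The small `stats` data frame and the `filter_season` helper are made up for illustration; with the real file you would start from `read.csv("Seasons_Stats.csv")` instead.

```r
# Toy stand-in for Seasons_Stats.csv (hypothetical values).
# With the real data: stats <- read.csv("Seasons_Stats.csv")
stats <- data.frame(
  Year   = c(2016, 2017, 2017, 2017),
  Player = c("A", "B", "C", "D"),
  PTS    = c(500, 357, 103, 685),
  PER    = c(14.1, 12.7, 9.6, 15.5)
)

# Keep only the rows for one season (hypothetical helper).
filter_season <- function(df, season) {
  df[!is.na(df$Year) & df$Year == season, , drop = FALSE]
}

stats_2017 <- filter_season(stats, 2017)
dim(stats_2017)  # number of rows, then number of columns
```

On the toy data this leaves 3 rows and 4 columns; on the real file the same call is what would report the 595 rows and 53 columns above.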
Data Summary
Let’s also take a look at the data’s summary, which shows basic information such as the mean, minimum, and maximum of each column. For example, the average number of points for a player is around 475.
## X Year Player Pos
## Min. :24096 Min. :2017 Ersan Ilyasova : 4 SG :125
## 1st Qu.:24244 1st Qu.:2017 Lance Stephenson: 4 SF :121
## Median :24393 Median :2017 Omri Casspi : 4 PF :119
## Mean :24393 Mean :2017 Andrew Bogut : 3 PG :116
## 3rd Qu.:24542 3rd Qu.:2017 Andrew Nicholson: 3 C :113
## Max. :24690 Max. :2017 Anthony Brown : 3 PF-C : 1
## (Other) :574 (Other): 0
## Age Tm G GS
## Min. :19.00 TOT : 53 Min. : 1.00 Min. : 0.00
## 1st Qu.:23.00 NOP : 26 1st Qu.:24.00 1st Qu.: 0.00
## Median :26.00 DAL : 24 Median :55.00 Median : 8.00
## Mean :26.41 BRK : 21 Mean :48.43 Mean :22.14
## 3rd Qu.:29.00 CLE : 21 3rd Qu.:73.00 3rd Qu.:39.00
## Max. :40.00 PHI : 21 Max. :82.00 Max. :82.00
## (Other):429
## MP PER TS. X3PAr
## Min. : 1 Min. :-35.30 Min. :0.0000 Min. :0.0000
## 1st Qu.: 321 1st Qu.: 9.60 1st Qu.:0.5000 1st Qu.:0.1670
## Median :1013 Median : 12.70 Median :0.5370 Median :0.3330
## Mean :1091 Mean : 12.73 Mean :0.5268 Mean :0.3216
## 3rd Qu.:1758 3rd Qu.: 15.55 3rd Qu.:0.5750 3rd Qu.:0.4600
## Max. :3048 Max. : 31.50 Max. :0.8200 Max. :1.0000
## NA's :2 NA's :2
## FTr ORB. DRB. TRB.
## Min. :0.0000 Min. : 0.000 Min. : 0.00 Min. : 0.00
## 1st Qu.:0.1560 1st Qu.: 1.800 1st Qu.: 10.30 1st Qu.: 6.20
## Median :0.2310 Median : 3.300 Median : 13.80 Median : 8.90
## Mean :0.2705 Mean : 4.912 Mean : 15.14 Mean :10.03
## 3rd Qu.:0.3400 3rd Qu.: 7.550 3rd Qu.: 19.00 3rd Qu.:13.00
## Max. :2.0000 Max. :26.300 Max. :100.00 Max. :56.40
## NA's :2
## AST. STL. BLK. TOV.
## Min. : 0.00 Min. : 0.000 Min. : 0.000 Min. : 0.00
## 1st Qu.: 6.20 1st Qu.: 1.000 1st Qu.: 0.500 1st Qu.: 9.70
## Median :10.10 Median : 1.400 Median : 1.200 Median :12.50
## Mean :12.77 Mean : 1.535 Mean : 1.685 Mean :12.89
## 3rd Qu.:17.50 3rd Qu.: 1.900 3rd Qu.: 2.300 3rd Qu.:15.60
## Max. :57.30 Max. :11.100 Max. :20.200 Max. :43.60
## NA's :2
## USG. blanl OWS DWS
## Min. : 0.00 Mode:logical Min. :-1.700 Min. :0.000
## 1st Qu.:14.60 NA's:595 1st Qu.: 0.000 1st Qu.:0.200
## Median :18.10 Median : 0.500 Median :0.800
## Mean :18.50 Mean : 1.155 Mean :1.103
## 3rd Qu.:21.25 3rd Qu.: 1.600 3rd Qu.:1.700
## Max. :41.70 Max. :11.500 Max. :6.000
##
## WS WS.48 blank2 OBPM
## Min. :-0.80 Min. :-0.47300 Mode:logical Min. :-26.700
## 1st Qu.: 0.30 1st Qu.: 0.03700 NA's:595 1st Qu.: -3.000
## Median : 1.30 Median : 0.08100 Median : -1.400
## Mean : 2.26 Mean : 0.07366 Mean : -1.573
## 3rd Qu.: 3.30 3rd Qu.: 0.11400 3rd Qu.: 0.000
## Max. :15.00 Max. : 0.48000 Max. : 11.800
##
## DBPM BPM VORP FG
## Min. :-7.1000 Min. :-26.900 Min. :-1.4000 Min. : 0.0
## 1st Qu.:-1.8000 1st Qu.: -3.850 1st Qu.:-0.1000 1st Qu.: 38.0
## Median :-0.4000 Median : -1.800 Median : 0.0000 Median :134.0
## Mean :-0.3773 Mean : -1.951 Mean : 0.5227 Mean :175.5
## 3rd Qu.: 0.9000 3rd Qu.: 0.200 3rd Qu.: 0.7500 3rd Qu.:261.5
## Max. :12.0000 Max. : 15.600 Max. :12.4000 Max. :824.0
##
## FGA FG. X3P X3PA
## Min. : 0.0 Min. :0.0000 Min. : 0.00 Min. : 0.0
## 1st Qu.: 91.5 1st Qu.:0.4000 1st Qu.: 2.00 1st Qu.: 9.0
## Median : 300.0 Median :0.4420 Median : 23.00 Median : 73.0
## Mean : 384.7 Mean :0.4412 Mean : 43.93 Mean :122.9
## 3rd Qu.: 562.0 3rd Qu.:0.4850 3rd Qu.: 69.00 3rd Qu.:195.0
## Max. :1941.0 Max. :1.0000 Max. :324.00 Max. :789.0
## NA's :2
## X3P. X2P X2PA X2P.
## Min. :0.0000 Min. : 0.0 Min. : 0.0 Min. :0.0000
## 1st Qu.:0.2660 1st Qu.: 26.5 1st Qu.: 57.5 1st Qu.:0.4460
## Median :0.3340 Median : 87.0 Median : 183.0 Median :0.4910
## Mean :0.3011 Mean :131.5 Mean : 261.7 Mean :0.4865
## 3rd Qu.:0.3760 3rd Qu.:195.5 3rd Qu.: 386.0 3rd Qu.:0.5370
## Max. :1.0000 Max. :730.0 Max. :1421.0 Max. :1.0000
## NA's :46 NA's :5
## eFG. FT FTA FT.
## Min. :0.0000 Min. : 0.00 Min. : 0.0 Min. :0.0000
## 1st Qu.:0.4650 1st Qu.: 12.00 1st Qu.: 18.0 1st Qu.:0.6670
## Median :0.5000 Median : 45.00 Median : 63.0 Median :0.7650
## Mean :0.4945 Mean : 79.87 Mean :103.7 Mean :0.7376
## 3rd Qu.:0.5360 3rd Qu.:103.00 3rd Qu.:134.5 3rd Qu.:0.8330
## Max. :1.0000 Max. :746.00 Max. :881.0 Max. :1.0000
## NA's :2 NA's :24
## ORB DRB TRB AST
## Min. : 0.00 Min. : 0.0 Min. : 0.0 Min. : 0.0
## 1st Qu.: 8.00 1st Qu.: 36.0 1st Qu.: 45.0 1st Qu.: 18.5
## Median : 25.00 Median :119.0 Median : 151.0 Median : 58.0
## Mean : 45.51 Mean :150.9 Mean : 196.4 Mean :101.1
## 3rd Qu.: 60.50 3rd Qu.:214.0 3rd Qu.: 280.5 3rd Qu.:132.5
## Max. :345.00 Max. :817.0 Max. :1116.0 Max. :906.0
##
## STL BLK TOV PF
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 8.00 1st Qu.: 3.00 1st Qu.: 14.00 1st Qu.: 29.00
## Median : 27.00 Median : 11.00 Median : 43.00 Median : 84.00
## Mean : 34.66 Mean : 21.51 Mean : 60.33 Mean : 90.23
## 3rd Qu.: 52.00 3rd Qu.: 29.00 3rd Qu.: 88.50 3rd Qu.:139.00
## Max. :157.00 Max. :214.00 Max. :464.00 Max. :278.00
##
## PTS
## Min. : 0.0
## 1st Qu.: 103.0
## Median : 357.0
## Mean : 474.7
## 3rd Qu.: 685.0
## Max. :2558.0
##
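A summary like the one above comes from R’s `summary` function. The sketch below runs it on a tiny made-up data frame (`toy` and its values are hypothetical) and also pulls out two pieces the summary reports: the mean of a column and the number of missing values per column.

```r
# Hypothetical mini data set with one missing 3-point percentage.
toy <- data.frame(
  PTS  = c(0, 103, 357, 685, 2558),
  X3P. = c(NA, 0.266, 0.334, 0.376, 1.000)
)

# Per-column min, quartiles, mean, max, and NA count.
summary(toy)

# The same pieces computed directly.
pts_mean  <- mean(toy$PTS)
na_counts <- colSums(is.na(toy))
```

Columns such as `X3P.` show `NA's` in the summary because a shooting percentage is undefined for a player with no attempts of that shot.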
Fields
Now that we have a summary of our data, let’s shave it down some more to only PER and the stats that have to do with scoring. We’ll pull true shooting %, field goal %, 2- and 3-point shooting %, effective field goal %, free throw %, and total points. This should give our regression model enough info to see how scoring affects PER.
## PER TS. FG. X3P.
## Min. :-35.30 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.: 9.60 1st Qu.:0.5000 1st Qu.:0.4000 1st Qu.:0.2660
## Median : 12.70 Median :0.5370 Median :0.4420 Median :0.3340
## Mean : 12.73 Mean :0.5268 Mean :0.4412 Mean :0.3011
## 3rd Qu.: 15.55 3rd Qu.:0.5750 3rd Qu.:0.4850 3rd Qu.:0.3760
## Max. : 31.50 Max. :0.8200 Max. :1.0000 Max. :1.0000
## NA's :2 NA's :2 NA's :46
## X2P. eFG. FT. PTS
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. : 0.0
## 1st Qu.:0.4460 1st Qu.:0.4650 1st Qu.:0.6670 1st Qu.: 103.0
## Median :0.4910 Median :0.5000 Median :0.7650 Median : 357.0
## Mean :0.4865 Mean :0.4945 Mean :0.7376 Mean : 474.7
## 3rd Qu.:0.5370 3rd Qu.:0.5360 3rd Qu.:0.8330 3rd Qu.: 685.0
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :2558.0
## NA's :5 NA's :2 NA's :24
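Picking those fields can be sketched as a simple column selection. The column names match the summary above and `stats_final` is the name used in the model call later on; the `stats_2017` rows here are invented purely so the snippet runs on its own.

```r
# Hypothetical rows standing in for the 2017 data.
stats_2017 <- data.frame(
  Player = c("A", "B", "C"),
  Age    = c(23, 26, 29),
  PER    = c(9.6, 12.7, 15.5),
  TS.    = c(0.500, 0.537, 0.575),
  FG.    = c(0.400, 0.442, 0.485),
  X3P.   = c(0.266, 0.334, 0.376),
  X2P.   = c(0.446, 0.491, 0.537),
  eFG.   = c(0.465, 0.500, 0.536),
  FT.    = c(0.667, 0.765, 0.833),
  PTS    = c(103, 357, 685)
)

# PER plus the scoring-related columns, in the order shown in the summary.
scoring_cols <- c("PER", "TS.", "FG.", "X3P.", "X2P.", "eFG.", "FT.", "PTS")
stats_final  <- stats_2017[, scoring_cols]

names(stats_final)
```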
Model
With just PER and some scoring stats in our dataset we can use R’s lm function to create a linear model to see how those stats play into PER.
##
## Call:
## lm(formula = PER ~ ., data = stats_final)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.1462 -1.7146 -0.2006 1.4715 11.7290
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -8.444752 1.016989 -8.304 8.68e-16 ***
## TS. 45.844088 5.275127 8.691 < 2e-16 ***
## FG. 51.700570 4.268011 12.114 < 2e-16 ***
## X3P. 2.605305 1.213459 2.147 0.0323 *
## X2P. -11.067119 2.433360 -4.548 6.73e-06 ***
## eFG. -44.764416 5.812190 -7.702 6.76e-14 ***
## FT. -1.735443 1.090412 -1.592 0.1121
## PTS 0.004940 0.000263 18.785 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.59 on 523 degrees of freedom
## (64 observations deleted due to missingness)
## Multiple R-squared: 0.7588, Adjusted R-squared: 0.7556
## F-statistic: 235.1 on 7 and 523 DF, p-value: < 2.2e-16
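The call in the output above is `lm(PER ~ ., data = stats_final)`, which regresses PER on every other column. The sketch below repeats that call on simulated data, so the numbers it prints are not the ones in this blog; the data-generating formula and the three predictors are assumptions made only to keep the example short and runnable.

```r
set.seed(621)
n <- 60

# Simulated stand-in for stats_final (hypothetical values).
stats_final <- data.frame(
  TS. = runif(n, 0.40, 0.65),
  FG. = runif(n, 0.35, 0.55),
  PTS = round(runif(n, 0, 2000))
)
stats_final$PER <- 5 + 10 * stats_final$TS. +
  0.005 * stats_final$PTS + rnorm(n)

# Introduce a few missing values, as in the real data.
stats_final$FG.[1:3] <- NA

# "PER ~ ." means: PER explained by all remaining columns.
fit <- lm(PER ~ ., data = stats_final)
summary(fit)

# Under R's default na.action, lm drops rows with missing values.
nobs(fit)
```

That row dropping is what the "64 observations deleted due to missingness" line in the real output refers to: 595 rows minus 64 leaves 531, and 531 minus 8 estimated coefficients gives the 523 residual degrees of freedom.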
Our results show some interesting things. Looking at the p-values, free throw % (p ≈ 0.11) is not statistically significant and 3-point % (p ≈ 0.03) is only weakly so, so neither looks that critical for PER. True shooting %, field goal %, and total points, on the other hand, are all highly significant scoring metrics for PER. Their coefficients tell us that raising true shooting % or field goal % by one percentage point is associated with an increase of roughly half a point of PER (about 0.46 and 0.52), holding the other stats fixed. Each additional point scored adds only about 0.005 to PER, so total points do increase PER, but not by much per point.
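To put the coefficients on a comparable footing, here is a small worked calculation using the estimates from the model output. The step sizes (one percentage point of shooting, 100 points scored) are my own choice for illustration, and each effect assumes the other predictors stay fixed, which is a simplification since the shooting percentages move together in practice.

```r
# Estimates copied from the model output above.
coefs <- c(TS. = 45.844088, FG. = 51.700570, PTS = 0.004940)

# Shooting percentages are stored on a 0-1 scale,
# so one percentage point is a change of 0.01.
delta_ts  <- coefs[["TS."]] * 0.01
delta_fg  <- coefs[["FG."]] * 0.01

# Effect of scoring 100 more points over the season.
delta_pts <- coefs[["PTS"]] * 100

round(c(TS = delta_ts, FG = delta_fg, PTS100 = delta_pts), 3)
```

This works out to about 0.458 PER for a point of true shooting %, 0.517 for a point of field goal %, and 0.494 for 100 extra points.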