NBA_season1718_salary (1).csv: https://www.kaggle.com/koki25ando/nba-salary-prediction-using-multiple-regression/data
nba_extra.csv: https://www.kaggle.com/mcamli/nba17-18/downloads/nba17-18.zip/4
The first dataset (NBA_season1718_salary (1).csv, named “salary” in R) is a simple four-column dataset that shows the player name, team, and 2017-2018 season salary for every player in the NBA during that season. The remaining column, X, simply numbers the rows 1-573.
salary<-read.csv("NBA_season1718_salary (1).csv", header=TRUE, na.strings = "?")
head(salary)
##   X         Player  Tm season17_18
## 1 1  Stephen Curry GSW    34682550
## 2 2   LeBron James CLE    33285709
## 3 3   Paul Millsap DEN    31269231
## 4 4 Gordon Hayward BOS    29727900
## 5 5  Blake Griffin DET    29512900
## 6 6     Kyle Lowry TOR    28703704
summary(salary)
##        X                     Player         Tm       season17_18
##  Min.   :  1   Briante Weber    :  3   ATL    : 27   Min.   :   17224
##  1st Qu.:144   DeAndre Liggins  :  3   CHI    : 24   1st Qu.: 1312611
##  Median :287   Demetrius Jackson:  3   DAL    : 22   Median : 2386864
##  Mean   :287   Isaiah Canaan    :  3   HOU    : 22   Mean   : 5858946
##  3rd Qu.:430   Jarell Eddie     :  3   LAL    : 22   3rd Qu.: 7936509
##  Max.   :573   Nigel Hayes      :  3   MIL    : 22   Max.   :34682550
##                (Other)          :555   (Other):434
The second dataset (nba_extra.csv, named “player_stats” in R) is a more complex dataset with 30 columns, most of them in-game statistics, for every player in the NBA during that same 2017-2018 season. These statistics include 2pt FG%, 3pt FG%, effective FG%, rebounds (offensive and defensive), turnovers, etc.
player_stats<-read.csv("nba_extra.csv", header=TRUE, na.strings = "?")
head(player_stats)
##   Rk                   Player Pos Age  Tm  G GS   MP  FG FGA   FG. X3P
## 1  1  Alex Abrines\\abrinal01  SG  24 OKC 75  8 1134 115 291 0.395  84
## 2  2      Quincy Acy\\acyqu01  PF  27 BRK 70  8 1359 130 365 0.356 102
## 3  3  Steven Adams\\adamsst01   C  24 OKC 76 76 2487 448 712 0.629   0
## 4  4   Bam Adebayo\\adebaba01   C  20 MIA 69 19 1368 174 340 0.512   0
## 5  5 Arron Afflalo\\afflaar01  SG  32 ORL 53  3  682  65 162 0.401  27
## 6  6  Cole Aldrich\\aldrico01   C  29 MIN 21  0   49   5  15 0.333   0
##   X3PA  X3P. X2P X2PA  X2P.  eFG.  FT FTA   FT. ORB DRB TRB AST STL BLK
## 1  221 0.380  31   70 0.443 0.540  39  46 0.848  26  88 114  28  38   8
## 2  292 0.349  28   73 0.384 0.496  49  60 0.817  40 217 257  57  33  29
## 3    2 0.000 448  710 0.631 0.629 160 286 0.559 384 301 685  88  92  78
## 4    7 0.000 174  333 0.523 0.512 129 179 0.721 118 263 381 101  32  41
## 5   70 0.386  38   92 0.413 0.485  22  26 0.846   4  62  66  30   4   9
## 6    0    NA   5   15 0.333 0.333   2   6 0.333   3  12  15   3   2   1
##   TOV  PF  PTS
## 1  25 124  353
## 2  60 149  411
## 3 128 215 1056
## 4  66 138  477
## 5  21  56  179
## 6   1  11   12
summary(player_stats)
##        Rk                         Player                Pos
##  Min.   :  1.0   Sean Kilpatrick\\kilpase01   :  5   C    :126
##  1st Qu.:139.0   Greg Monroe\\monrogr01       :  4   PF   :122
##  Median :266.5   Nigel Hayes\\hayesni01       :  4   PG   :142
##  Mean   :270.8   Rashad Vaughn\\vaughra01     :  4   PG-SG:  1
##  3rd Qu.:401.2   Trevor Booker\\booketr01     :  4   SF   :115
##  Max.   :540.0   Antonius Cleveland\\clevean01:  3   SF-SG:  2
##                  (Other)                      :640   SG   :156
##       Age              Tm            G               GS
##  Min.   :19.00   TOT    : 59   Min.   : 1.00   Min.   : 0.00
##  1st Qu.:23.00   HOU    : 24   1st Qu.:17.00   1st Qu.: 0.00
##  Median :26.00   LAL    : 24   Median :46.00   Median : 4.00
##  Mean   :26.19   MEM    : 24   Mean   :43.28   Mean   :19.71
##  3rd Qu.:29.00   MIL    : 24   3rd Qu.:71.00   3rd Qu.:35.00
##  Max.   :41.00   DAL    : 23   Max.   :82.00   Max.   :82.00
##                  (Other):486
##        MP               FG             FGA              FG.
##  Min.   :   1.0   Min.   :  0.0   Min.   :   0.0   Min.   :0.0000
##  1st Qu.: 186.0   1st Qu.: 22.0   1st Qu.:  58.0   1st Qu.:0.3950
##  Median : 755.0   Median :102.0   Median : 224.5   Median :0.4400
##  Mean   : 972.9   Mean   :159.5   Mean   : 347.2   Mean   :0.4414
##  3rd Qu.:1651.5   3rd Qu.:253.0   3rd Qu.: 554.0   3rd Qu.:0.4930
##  Max.   :3026.0   Max.   :857.0   Max.   :1687.0   Max.   :1.0000
##                                                    NA's   :4
##       X3P              X3PA            X3P.             X2P
##  Min.   :  0.00   Min.   :  0.0   Min.   :0.0000   Min.   :  0.0
##  1st Qu.:  1.75   1st Qu.:  7.0   1st Qu.:0.2500   1st Qu.: 15.0
##  Median : 18.00   Median : 56.5   Median :0.3370   Median : 71.0
##  Mean   : 42.27   Mean   :117.2   Mean   :0.3100   Mean   :117.2
##  3rd Qu.: 64.25   3rd Qu.:189.2   3rd Qu.:0.3795   3rd Qu.:181.2
##  Max.   :265.00   Max.   :722.0   Max.   :1.0000   Max.   :725.0
##                                   NA's   :65
##       X2PA             X2P.             eFG.             FT
##  Min.   :   0.0   Min.   :0.0000   Min.   :0.000   Min.   :  0.00
##  1st Qu.:  33.0   1st Qu.:0.4422   1st Qu.:0.458   1st Qu.:  8.00
##  Median : 143.0   Median :0.4980   Median :0.506   Median : 37.50
##  Mean   : 230.0   Mean   :0.4931   Mean   :0.498   Mean   : 66.93
##  3rd Qu.: 361.2   3rd Qu.:0.5467   3rd Qu.:0.551   3rd Qu.: 97.00
##  Max.   :1361.0   Max.   :1.0000   Max.   :1.500   Max.   :624.00
##                   NA's   :18       NA's   :4
##       FTA              FT.              ORB              DRB
##  Min.   :  0.00   Min.   :0.0000   Min.   :  0.00   Min.   :  0.0
##  1st Qu.: 12.00   1st Qu.:0.6670   1st Qu.:  5.00   1st Qu.: 22.0
##  Median : 51.00   Median :0.7680   Median : 21.50   Median : 95.5
##  Mean   : 87.19   Mean   :0.7411   Mean   : 39.01   Mean   :135.3
##  3rd Qu.:120.75   3rd Qu.:0.8330   3rd Qu.: 53.00   3rd Qu.:208.0
##  Max.   :727.00   Max.   :1.0000   Max.   :399.00   Max.   :848.0
##                   NA's   :58
##       TRB              AST              STL              BLK
##  Min.   :   0.0   Min.   :  0.00   Min.   :  0.00   Min.   :  0.00
##  1st Qu.:  29.0   1st Qu.: 11.75   1st Qu.:  5.00   1st Qu.:  2.00
##  Median : 121.5   Median : 51.00   Median : 23.00   Median : 10.00
##  Mean   : 174.3   Mean   : 93.18   Mean   : 31.15   Mean   : 19.01
##  3rd Qu.: 259.2   3rd Qu.:126.25   3rd Qu.: 47.00   3rd Qu.: 25.00
##  Max.   :1247.0   Max.   :820.00   Max.   :177.00   Max.   :193.00
##
##       TOV              PF              PTS
##  Min.   :  0.00   Min.   :  0.00   Min.   :   0.0
##  1st Qu.:  8.00   1st Qu.: 16.75   1st Qu.:  59.0
##  Median : 36.00   Median : 66.50   Median : 264.0
##  Mean   : 55.01   Mean   : 79.91   Mean   : 428.1
##  3rd Qu.: 86.00   3rd Qu.:132.00   3rd Qu.: 667.2
##  Max.   :381.00   Max.   :285.00   Max.   :2251.0
##
As a basketball player myself, I closely follow the NBA regular season and playoffs with my teammates, and we often debate whether certain players are overrated and overpaid or underrated and underpaid. Using these two datasets, I am interested in seeing whether salary is a good predictor of certain player stats for that season, and whether there is a consistent, linear association that would suggest general managers across the league are doing a good job of paying players what they are worth. Judging from the data, regression seems like an effective way to examine the prediction aspect and to spot particular players who may be overpaid or underpaid.
This regression process could compare individual salaries against several different statistics, so that the different strengths of each position on the court are taken into consideration. A statistical analysis comparing a player’s salary to his production on the floor would be beneficial for teams across the league when deciding whether to keep, cut, or trade players.
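As a rough sketch of the regression idea described above, the code below fits a simple linear model of one statistic (PTS) on salary and then ranks players by their residuals. The data frame `toy` and all of its values are made up for illustration; in practice it would be replaced by a merged salary and stats table, and PTS is only one of several statistics that could be used.

```r
# Toy data standing in for a merged salary + stats table.
# All names and numbers here are made up for illustration.
toy <- data.frame(
  Player      = c("A", "B", "C", "D", "E", "F"),
  season17_18 = c(30e6, 20e6, 12e6, 6e6, 2e6, 1e6),
  PTS         = c(1900, 1500, 600, 700, 300, 150),
  stringsAsFactors = FALSE
)

# Work in millions of dollars so the slope is easier to read.
toy$salary_m <- toy$season17_18 / 1e6

# Simple linear regression: season points as a function of salary.
fit <- lm(PTS ~ salary_m, data = toy)
summary(fit)

# Residual = actual PTS minus PTS predicted from salary.
# Large negative residuals flag players who produced less than their
# salary predicts; large positive residuals flag the opposite.
toy$resid <- resid(fit)
ranked <- toy[order(toy$resid), c("Player", "salary_m", "PTS", "resid")]
print(ranked)
```

A residual on its own does not prove a player is overpaid or underpaid; it only shows where a player sits relative to the fitted line for this one statistic.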
The “salary” dataset is sorted from highest salary to lowest, while the “player_stats” dataset is sorted alphabetically by last name. How will I get each individual player’s season salary to match up with their statistics?
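One possible answer is to join the two tables on the player name rather than relying on row order. The sketch below assumes the only obstacle is the `\\id` suffix that appears after each name in the Player column of “player_stats”; it strips that suffix and then uses base R's `merge()`. The data frames `salary_toy` and `stats_toy` and their values are hypothetical stand-ins, not the real files.

```r
# Hypothetical stand-ins for the two datasets (values are made up).
salary_toy <- data.frame(
  Player      = c("Player Three", "Player One", "Player Two"),
  season17_18 = c(9e6, 5e6, 2e6),
  stringsAsFactors = FALSE
)
stats_toy <- data.frame(
  Player = c("Player One\\playeon01", "Player Two\\playetw01",
             "Player Three\\playeth01", "Player Four\\playefo01"),
  PTS    = c(100, 200, 300, 400),
  stringsAsFactors = FALSE
)

# Drop everything from the backslash onward so the names match "salary".
stats_toy$Player <- sub("\\\\.*$", "", stats_toy$Player)

# Join on Player; the row order of the two inputs does not matter.
# By default only players found in both tables are kept.
merged <- merge(salary_toy, stats_toy, by = "Player")
print(merged)
```

Adding `all.x = TRUE` to `merge()` would instead keep every salary row and fill missing statistics with NA, which makes unmatched names easier to spot.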
The “player_stats” dataset has multiple entries for some players due to trades during the season. I am assuming these can be combined into one row using R code?
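A minimal sketch of one way to do this is shown below. It assumes that a Tm value of TOT (which appears 59 times in the summary above) marks a row that already holds a traded player's season totals, so keeping that row and dropping the per-team rows leaves one row per player. The toy data frame and its values are made up for illustration.

```r
# Hypothetical stand-in for player_stats: "Player Two" was traded,
# so he has one row per team plus a TOT (season total) row.
stats_toy <- data.frame(
  Player = c("Player One", "Player Two", "Player Two", "Player Two"),
  Tm     = c("AAA", "TOT", "BBB", "CCC"),
  G      = c(70, 60, 25, 35),
  PTS    = c(500, 400, 150, 250),
  stringsAsFactors = FALSE
)

# Players who have a TOT row appeared for more than one team.
traded <- stats_toy$Player %in% stats_toy$Player[stats_toy$Tm == "TOT"]

# Keep every row for untraded players, and only the TOT row otherwise.
one_row <- stats_toy[!traded | stats_toy$Tm == "TOT", ]
print(one_row)

# Alternative: rebuild the totals by summing the per-team rows.
# This only makes sense for counting stats such as G or PTS.
totals <- aggregate(cbind(G, PTS) ~ Player,
                    data = stats_toy[stats_toy$Tm != "TOT", ],
                    FUN = sum)
print(totals)
```

Percentage columns such as FG. or X3P. cannot be summed this way; they would need to be recomputed from the summed makes and attempts, which is one reason to prefer the existing TOT rows if the assumption about them holds.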