Data Sets

NBA_season1718_salary (1).csv: https://www.kaggle.com/koki25ando/nba-salary-prediction-using-multiple-regression/data

nba_extra.csv: https://www.kaggle.com/mcamli/nba17-18/downloads/nba17-18.zip/4

The first dataset (NBA_season1718_salary (1).csv, named “salary” in R) is a simple four-column dataset that shows the player name, team, and 2017-2018 season salary for every player in the NBA during that season. The remaining column, X (the first column in the output below), simply numbers the rows from 1 to 573.

salary<-read.csv("NBA_season1718_salary (1).csv", header=TRUE, na.strings = "?")
head(salary)
##   X         Player  Tm season17_18
## 1 1  Stephen Curry GSW    34682550
## 2 2   LeBron James CLE    33285709
## 3 3   Paul Millsap DEN    31269231
## 4 4 Gordon Hayward BOS    29727900
## 5 5  Blake Griffin DET    29512900
## 6 6     Kyle Lowry TOR    28703704
summary(salary)
##        X                     Player          Tm       season17_18      
##  Min.   :  1   Briante Weber    :  3   ATL    : 27   Min.   :   17224  
##  1st Qu.:144   DeAndre Liggins  :  3   CHI    : 24   1st Qu.: 1312611  
##  Median :287   Demetrius Jackson:  3   DAL    : 22   Median : 2386864  
##  Mean   :287   Isaiah Canaan    :  3   HOU    : 22   Mean   : 5858946  
##  3rd Qu.:430   Jarell Eddie     :  3   LAL    : 22   3rd Qu.: 7936509  
##  Max.   :573   Nigel Hayes      :  3   MIL    : 22   Max.   :34682550  
##                (Other)          :555   (Other):434

The second dataset (nba_extra.csv, named “player_stats” in R) is a more complex dataset with 30 columns, most of them in-game statistics for every player in the NBA during that same 2017-2018 season. These statistics include 2pt FG%, 3pt FG%, effective FG%, rebounds (offensive and defensive), and turnovers.

player_stats<-read.csv("nba_extra.csv", header=TRUE, na.strings = "?")
head(player_stats)
##   Rk                   Player Pos Age  Tm  G GS   MP  FG FGA   FG. X3P
## 1  1  Alex Abrines\\abrinal01  SG  24 OKC 75  8 1134 115 291 0.395  84
## 2  2      Quincy Acy\\acyqu01  PF  27 BRK 70  8 1359 130 365 0.356 102
## 3  3  Steven Adams\\adamsst01   C  24 OKC 76 76 2487 448 712 0.629   0
## 4  4   Bam Adebayo\\adebaba01   C  20 MIA 69 19 1368 174 340 0.512   0
## 5  5 Arron Afflalo\\afflaar01  SG  32 ORL 53  3  682  65 162 0.401  27
## 6  6  Cole Aldrich\\aldrico01   C  29 MIN 21  0   49   5  15 0.333   0
##   X3PA  X3P. X2P X2PA  X2P.  eFG.  FT FTA   FT. ORB DRB TRB AST STL BLK
## 1  221 0.380  31   70 0.443 0.540  39  46 0.848  26  88 114  28  38   8
## 2  292 0.349  28   73 0.384 0.496  49  60 0.817  40 217 257  57  33  29
## 3    2 0.000 448  710 0.631 0.629 160 286 0.559 384 301 685  88  92  78
## 4    7 0.000 174  333 0.523 0.512 129 179 0.721 118 263 381 101  32  41
## 5   70 0.386  38   92 0.413 0.485  22  26 0.846   4  62  66  30   4   9
## 6    0    NA   5   15 0.333 0.333   2   6 0.333   3  12  15   3   2   1
##   TOV  PF  PTS
## 1  25 124  353
## 2  60 149  411
## 3 128 215 1056
## 4  66 138  477
## 5  21  56  179
## 6   1  11   12
summary(player_stats)
##        Rk                                 Player        Pos     
##  Min.   :  1.0   Sean Kilpatrick\\kilpase01   :  5   C    :126  
##  1st Qu.:139.0   Greg Monroe\\monrogr01       :  4   PF   :122  
##  Median :266.5   Nigel Hayes\\hayesni01       :  4   PG   :142  
##  Mean   :270.8   Rashad Vaughn\\vaughra01     :  4   PG-SG:  1  
##  3rd Qu.:401.2   Trevor Booker\\booketr01     :  4   SF   :115  
##  Max.   :540.0   Antonius Cleveland\\clevean01:  3   SF-SG:  2  
##                  (Other)                      :640   SG   :156  
##       Age              Tm            G               GS       
##  Min.   :19.00   TOT    : 59   Min.   : 1.00   Min.   : 0.00  
##  1st Qu.:23.00   HOU    : 24   1st Qu.:17.00   1st Qu.: 0.00  
##  Median :26.00   LAL    : 24   Median :46.00   Median : 4.00  
##  Mean   :26.19   MEM    : 24   Mean   :43.28   Mean   :19.71  
##  3rd Qu.:29.00   MIL    : 24   3rd Qu.:71.00   3rd Qu.:35.00  
##  Max.   :41.00   DAL    : 23   Max.   :82.00   Max.   :82.00  
##                  (Other):486                                  
##        MP               FG             FGA              FG.        
##  Min.   :   1.0   Min.   :  0.0   Min.   :   0.0   Min.   :0.0000  
##  1st Qu.: 186.0   1st Qu.: 22.0   1st Qu.:  58.0   1st Qu.:0.3950  
##  Median : 755.0   Median :102.0   Median : 224.5   Median :0.4400  
##  Mean   : 972.9   Mean   :159.5   Mean   : 347.2   Mean   :0.4414  
##  3rd Qu.:1651.5   3rd Qu.:253.0   3rd Qu.: 554.0   3rd Qu.:0.4930  
##  Max.   :3026.0   Max.   :857.0   Max.   :1687.0   Max.   :1.0000  
##                                                    NA's   :4       
##       X3P              X3PA            X3P.             X2P       
##  Min.   :  0.00   Min.   :  0.0   Min.   :0.0000   Min.   :  0.0  
##  1st Qu.:  1.75   1st Qu.:  7.0   1st Qu.:0.2500   1st Qu.: 15.0  
##  Median : 18.00   Median : 56.5   Median :0.3370   Median : 71.0  
##  Mean   : 42.27   Mean   :117.2   Mean   :0.3100   Mean   :117.2  
##  3rd Qu.: 64.25   3rd Qu.:189.2   3rd Qu.:0.3795   3rd Qu.:181.2  
##  Max.   :265.00   Max.   :722.0   Max.   :1.0000   Max.   :725.0  
##                                   NA's   :65                      
##       X2PA             X2P.             eFG.             FT        
##  Min.   :   0.0   Min.   :0.0000   Min.   :0.000   Min.   :  0.00  
##  1st Qu.:  33.0   1st Qu.:0.4422   1st Qu.:0.458   1st Qu.:  8.00  
##  Median : 143.0   Median :0.4980   Median :0.506   Median : 37.50  
##  Mean   : 230.0   Mean   :0.4931   Mean   :0.498   Mean   : 66.93  
##  3rd Qu.: 361.2   3rd Qu.:0.5467   3rd Qu.:0.551   3rd Qu.: 97.00  
##  Max.   :1361.0   Max.   :1.0000   Max.   :1.500   Max.   :624.00  
##                   NA's   :18       NA's   :4                       
##       FTA              FT.              ORB              DRB       
##  Min.   :  0.00   Min.   :0.0000   Min.   :  0.00   Min.   :  0.0  
##  1st Qu.: 12.00   1st Qu.:0.6670   1st Qu.:  5.00   1st Qu.: 22.0  
##  Median : 51.00   Median :0.7680   Median : 21.50   Median : 95.5  
##  Mean   : 87.19   Mean   :0.7411   Mean   : 39.01   Mean   :135.3  
##  3rd Qu.:120.75   3rd Qu.:0.8330   3rd Qu.: 53.00   3rd Qu.:208.0  
##  Max.   :727.00   Max.   :1.0000   Max.   :399.00   Max.   :848.0  
##                   NA's   :58                                       
##       TRB              AST              STL              BLK        
##  Min.   :   0.0   Min.   :  0.00   Min.   :  0.00   Min.   :  0.00  
##  1st Qu.:  29.0   1st Qu.: 11.75   1st Qu.:  5.00   1st Qu.:  2.00  
##  Median : 121.5   Median : 51.00   Median : 23.00   Median : 10.00  
##  Mean   : 174.3   Mean   : 93.18   Mean   : 31.15   Mean   : 19.01  
##  3rd Qu.: 259.2   3rd Qu.:126.25   3rd Qu.: 47.00   3rd Qu.: 25.00  
##  Max.   :1247.0   Max.   :820.00   Max.   :177.00   Max.   :193.00  
##                                                                     
##       TOV               PF              PTS        
##  Min.   :  0.00   Min.   :  0.00   Min.   :   0.0  
##  1st Qu.:  8.00   1st Qu.: 16.75   1st Qu.:  59.0  
##  Median : 36.00   Median : 66.50   Median : 264.0  
##  Mean   : 55.01   Mean   : 79.91   Mean   : 428.1  
##  3rd Qu.: 86.00   3rd Qu.:132.00   3rd Qu.: 667.2  
##  Max.   :381.00   Max.   :285.00   Max.   :2251.0  
## 

Investigation

As a basketball player myself, I closely follow the NBA regular season and playoffs with my teammates, and we often debate whether certain players are overrated and overpaid or underrated and underpaid. Using these two datasets, I am interested in seeing whether salary is a good predictor of certain player stats for that season, and whether there is a consistent, linear association that would suggest general managers across the league are doing a good job of paying players what they are worth. From a first look at the data, regression seems like an effective way to examine the prediction aspect and to spot particular players who may be overpaid or underpaid.

This regression process could compare individual salaries to several different statistics so that the different strengths of each position on the court are taken into consideration. Comparing a player’s salary to his production on the floor in this way would help teams across the league decide whether to keep, cut, or trade players.
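As a rough illustration of the idea, the sketch below fits a simple linear regression and ranks players by residual. It assumes a data frame with one row per player that holds both salary and stats; the name `combined`, the placeholder player labels, and all of the numbers are made up for illustration and are not taken from either dataset.

```r
# Hypothetical toy data: one row per player, salary plus one stat (PTS).
# All values here are invented placeholders, not real NBA figures.
combined <- data.frame(
  Player      = paste("Player", LETTERS[1:6]),
  season17_18 = c(1e6, 2e6, 5e6, 10e6, 20e6, 30e6),
  PTS         = c(150, 400, 500, 900, 1200, 2000),
  stringsAsFactors = FALSE
)

# Regress the stat on salary, matching the question of whether
# salary predicts production.
fit <- lm(PTS ~ season17_18, data = combined)

# Residual = actual PTS minus the PTS predicted from salary.
# A large negative residual flags a player producing less than his
# salary predicts (possibly overpaid); a large positive residual
# flags the opposite (possibly underpaid).
combined$resid <- residuals(fit)

# Sort from most "overpaid" to most "underpaid" under this one-stat model.
combined[order(combined$resid), c("Player", "season17_18", "PTS", "resid")]
```

Swapping PTS for another column (for example TRB or AST) would give the position-specific comparisons described above; a single statistic is unlikely to capture a player's full value.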

Potential Areas of Question/Concern

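The “salary” dataset is sorted from highest salary to lowest, while the “player_stats” dataset is sorted alphabetically by last name. How will I get each individual player’s season salary to match up with his statistics? The player names also differ in format: “player_stats” appends an ID to each name (e.g. “Alex Abrines\\abrinal01” in the output above), so the names will not match exactly as they stand.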
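One possible approach is to join the two data frames on the player name rather than rely on row order. The sketch below uses small stand-in data frames: the PTS values and the Stephen Curry salary are copied from the output above, while the other two salaries are placeholders invented for the example. The name `combined` is also just an illustrative choice.

```r
# Stand-in for player_stats: names carry a "\id" suffix, as in the real file.
player_stats <- data.frame(
  Player = c("Alex Abrines\\abrinal01", "Steven Adams\\adamsst01"),
  PTS    = c(353, 1056),
  stringsAsFactors = FALSE
)

# Stand-in for salary. The last two salaries are made-up placeholders.
salary <- data.frame(
  Player      = c("Stephen Curry", "Steven Adams", "Alex Abrines"),
  season17_18 = c(34682550, 2e6, 1e6),
  stringsAsFactors = FALSE
)

# Drop everything from the backslash onward so the names line up.
# as.character() guards against read.csv() having produced factors.
player_stats$Player <- sub("\\\\.*$", "", as.character(player_stats$Player))
salary$Player       <- as.character(salary$Player)

# Inner join on Player: the sort order of either input does not matter.
combined <- merge(salary, player_stats, by = "Player")
combined
```

Players who appear in only one data frame (Stephen Curry in this toy example) are dropped by the default inner join; `all.x = TRUE` would keep them with NA stats. In the real data both files have a Tm column, so `merge()` would return it as Tm.x and Tm.y unless one is removed first.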

“player_stats” has multiple entries for some players due to trades during the season. Can these be combined into one row per player using R code?
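The summary above lists 59 rows with Tm equal to “TOT”, which appears to be a season-total row for players who changed teams. If that reading is right, one option is to keep the TOT row for traded players and the single row for everyone else. The sketch below uses invented players, teams, and numbers, and `one_row` is a hypothetical name.

```r
# Hypothetical toy data: Player A was traded (TOT row plus one row per team),
# Player B stayed with one team all season. All values are made up.
player_stats <- data.frame(
  Player = c("Player A", "Player A", "Player A", "Player B"),
  Tm     = c("TOT", "AAA", "BBB", "CCC"),
  PTS    = c(300, 100, 200, 150),
  stringsAsFactors = FALSE
)

# Flag every row belonging to a player who has a TOT row.
traded  <- player_stats$Player[player_stats$Tm == "TOT"]
has_tot <- player_stats$Player %in% traded

# Keep the TOT row for traded players and the only row for the rest.
one_row <- player_stats[player_stats$Tm == "TOT" | !has_tot, ]
one_row
```

Using the existing total row avoids re-adding the per-team rows by hand, which would work for counting stats such as PTS but not for percentages such as FG., since those cannot simply be summed.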