Part II - NBA Player Salary Analysis with Multiple Regression

Christian Thieme

11/7/2020

Can We Predict an NBA Player’s Salary Using His Statistics from the Prior Year?

In the previous week’s discussion we used a simple linear regression model with a single independent variable, total points from the previous year, to try to predict NBA players’ salaries. That analysis and discussion can be reviewed here. As a refresher, our overall goal is to build a regression model to see if we can predict a player’s salary for the coming year. For this analysis, we will use multiple factors in our model and perform multi-factor (multiple) linear regression.

We’ll use the same two datasets that we used previously. One dataset includes player statistics and the other includes player salaries for the 2017-2018 season. We’ll join these two datasets together to perform our analysis. The player statistics dataset can be downloaded here and the NBA salary dataset can be downloaded here.

The nba_stats dataset has data from 1955 through 2017. We’ll limit this data to 2016 for our analysis (since we’re trying to predict salary in 2017). Additionally, some players have multiple rows per year because they were traded to a different team during the season. To handle this, we’ll first set aside every player who appears more than once, then go back to those players, pull only the row holding their total for the season, and append it to our dataset.
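The code for this step isn’t shown in this write-up, so here is a minimal base R sketch of one way to do it. The toy rows and player names are made up, and it assumes the season-total row for a traded player is the one marked `TOT` in the `Tm` column.

```r
# Toy stand-in for nba_stats; all values are made up for illustration.
nba_stats <- data.frame(
  Year   = c(2016, 2016, 2016, 2016, 2015),
  Player = c("Player A", "Player B", "Player B", "Player B", "Player A"),
  Tm     = c("SAC", "TOT", "NYK", "MEM", "SAC"),
  PTS    = c(500, 300, 180, 120, 450),
  stringsAsFactors = FALSE
)

# Keep only the 2016 season.
stats_2016 <- nba_stats[nba_stats$Year == 2016, ]

# Players with more than one row in 2016 changed teams mid-season.
rows_per_player <- table(stats_2016$Player)
traded <- names(rows_per_player)[rows_per_player > 1]

# Single-team players keep their only row; traded players keep the TOT row.
single_team   <- stats_2016[!stats_2016$Player %in% traded, ]
season_totals <- stats_2016[stats_2016$Player %in% traded &
                              stats_2016$Tm == "TOT", ]

# Append the season totals back on to the single-team rows.
nba_stats_2016 <- rbind(single_team, season_totals)
nba_stats_2016
```

After this step each player should appear exactly once, which is worth checking with `anyDuplicated(nba_stats_2016$Player)` before joining.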

## # A tibble: 6 x 41
##    Year Player Pos     Age Tm        G GS       MP   PER `TS%` `3PAr`   FTr
##   <dbl> <chr>  <chr> <dbl> <chr> <dbl> <lgl> <dbl> <dbl> <dbl> <lgl>  <dbl>
## 1  2016 Quinc~ PF       25 SAC      59 NA      876  14.7 0.629 NA     0.318
## 2  2016 Jorda~ SG       21 MEM       2 FALSE    15  17.3 0.427 NA     0.833
## 3  2016 Steve~ C        22 OKC      80 NA     2014  15.5 0.621 FALSE  0.46 
## 4  2016 Arron~ SG       30 NYK      71 NA     2371  10.9 0.531 NA     0.164
## 5  2016 Alexi~ C        27 NOP      59 NA      861  13.8 0.514 NA     0.197
## 6  2016 Cole ~ C        27 LAC      60 NA      800  21.3 0.626 FALSE  0.373
## # ... with 29 more variables: `ORB%` <lgl>, `DRB%` <lgl>, `TRB%` <lgl>,
## #   `AST%` <lgl>, `STL%` <lgl>, `BLK%` <lgl>, `TOV%` <lgl>, BPM <lgl>,
## #   FG <dbl>, FGA <dbl>, `FG%` <dbl>, `3P` <lgl>, `3PA` <lgl>, `3P%` <lgl>,
## #   `2P` <dbl>, `2PA` <dbl>, `2P%` <dbl>, `eFG%` <dbl>, FT <dbl>, FTA <dbl>,
## #   `FT%` <dbl>, ORB <lgl>, DRB <lgl>, TRB <dbl>, AST <dbl>, STL <lgl>,
## #   BLK <lgl>, TOV <lgl>, PTS <dbl>

Next, we’ll select the columns we need from our nba_salary dataset and aggregate the salary values for players who changed teams mid-year (meaning they have multiple rows of data). Additionally, we’ll divide the salary column by one million to make the numbers easier to read when plotting.
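A rough sketch of this aggregation in base R might look like the following. The raw salary column name `season17_18` and the toy values are assumptions; only `Player` and `salary_2017_in_millions` appear in the output.

```r
# Toy stand-in for nba_salary; season17_18 is a hypothetical column name
# holding the raw salary in dollars.
nba_salary <- data.frame(
  Player      = c("Player A", "Player B", "Player B"),
  Tm          = c("SAC", "NYK", "MEM"),
  season17_18 = c(1312611, 1500000, 620000),
  stringsAsFactors = FALSE
)

# Sum salary across teams so each player has a single row.
salary_by_player <- aggregate(season17_18 ~ Player, data = nba_salary, FUN = sum)

# Express salary in millions and keep only the columns we need.
salary_by_player$salary_2017_in_millions <- salary_by_player$season17_18 / 1e6
salary_by_player <- salary_by_player[, c("Player", "salary_2017_in_millions")]
salary_by_player
```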

## # A tibble: 6 x 2
##   Player          salary_2017_in_millions
##   <chr>                             <dbl>
## 1 A.J. Hammons                      1.31 
## 2 Aaron Brooks                      2.12 
## 3 Aaron Gordon                      5.50 
## 4 Aaron Gray                        0.452
## 5 Abdel Nader                       1.17 
## 6 Al-Farouq Aminu                   7.32

Next, we’ll join these two datasets together so we have both player statistics and salary in the same dataset.
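As a sketch, the join can be done with base R’s `merge()`. A left join on `Player` is assumed here because the joined data still contains players with no salary; the toy rows are made up.

```r
# Toy stand-ins for the two prepared datasets.
stats <- data.frame(
  Player = c("Player A", "Player B", "Player C"),
  PTS    = c(500, 300, 40),
  stringsAsFactors = FALSE
)
salary <- data.frame(
  Player = c("Player A", "Player B"),
  salary_2017_in_millions = c(1.31, 2.12),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps every stats row (a left join); players with no
# matching salary row get NA in the salary column.
nba_data <- merge(stats, salary, by = "Player", all.x = TRUE)
nba_data
```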

## # A tibble: 6 x 42
##    Year Player Pos     Age Tm        G GS       MP   PER  `TS%` `3PAr`   FTr
##   <dbl> <chr>  <chr> <dbl> <chr> <dbl> <lgl> <dbl> <dbl>  <dbl> <lgl>  <dbl>
## 1  2016 Sam D~ SF       21 HOU       3 FALSE     6  10.8 NA     NA        NA
## 2  2016 J.J. ~ SF       23 UTA       2 FALSE     6   1.3  0     FALSE      0
## 3  2016 Nate ~ PG       31 NOP       2 TRUE     23   2.6  0     TRUE       0
## 4  2016 Bruno~ SF       20 TOR       6 TRUE     43  -7.7  0.125 NA         0
## 5  2016 Joe H~ SG       24 CLE       5 FALSE    15   3.4  0.375 TRUE       0
## 6  2016 Rakee~ PF       24 IND       1 FALSE     6  32    1     FALSE      0
## # ... with 30 more variables: `ORB%` <lgl>, `DRB%` <lgl>, `TRB%` <lgl>,
## #   `AST%` <lgl>, `STL%` <lgl>, `BLK%` <lgl>, `TOV%` <lgl>, BPM <lgl>,
## #   FG <dbl>, FGA <dbl>, `FG%` <dbl>, `3P` <lgl>, `3PA` <lgl>, `3P%` <lgl>,
## #   `2P` <dbl>, `2PA` <dbl>, `2P%` <dbl>, `eFG%` <dbl>, FT <dbl>, FTA <dbl>,
## #   `FT%` <dbl>, ORB <lgl>, DRB <lgl>, TRB <dbl>, AST <dbl>, STL <lgl>,
## #   BLK <lgl>, TOV <lgl>, PTS <dbl>, salary_2017_in_millions <dbl>

Before moving on, let’s quickly look and see how many null values we have in each of our columns:
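One common way to get these counts is `colSums(is.na(...))`, sketched here on a made-up toy data frame:

```r
# Toy data frame with a few missing values.
nba_data <- data.frame(
  Player = c("Player A", "Player B", "Player C"),
  GS     = c(NA, FALSE, NA),
  PTS    = c(500, 300, 40),
  salary_2017_in_millions = c(1.31, 2.12, NA),
  stringsAsFactors = FALSE
)

# is.na() gives a TRUE/FALSE matrix; summing each column counts the NAs.
null_counts <- colSums(is.na(nba_data))
null_counts
```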

##                    Year                  Player                     Pos 
##                       0                       0                       0 
##                     Age                      Tm                       G 
##                       0                       0                       0 
##                      GS                      MP                     PER 
##                     383                       0                       0 
##                     TS%                    3PAr                     FTr 
##                       1                     479                       1 
##                    ORB%                    DRB%                    TRB% 
##                     502                     520                     521 
##                    AST%                    STL%                    BLK% 
##                     511                     478                     470 
##                    TOV%                     BPM                      FG 
##                     512                     515                       0 
##                     FGA                     FG%                      3P 
##                       0                       1                     394 
##                     3PA                     3P%                      2P 
##                     449                     466                       0 
##                     2PA                     2P%                    eFG% 
##                       0                       3                       1 
##                      FT                     FTA                     FT% 
##                       0                       0                      15 
##                     ORB                     DRB                     TRB 
##                     494                     512                       0 
##                     AST                     STL                     BLK 
##                       0                     486                     458 
##                     TOV                     PTS salary_2017_in_millions 
##                     501                       0                     157

Wow! That’s a lot of nulls. Some of these columns are almost completely empty. Let’s remove these, as they will not be helpful to our analysis:
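A minimal sketch of dropping the mostly empty columns follows. The 50% cutoff (`max_na_share`) is an assumption chosen for illustration, not a threshold stated in this analysis, and the toy data is made up.

```r
# Toy data frame; check.names = FALSE keeps names such as 3PAr intact.
nba_data <- data.frame(
  Player = c("Player A", "Player B", "Player C", "Player D"),
  GS     = c(NA, FALSE, NA, NA),
  `3PAr` = c(NA, NA, FALSE, NA),
  PTS    = c(500, 300, 40, 220),
  salary_2017_in_millions = c(1.31, 2.12, NA, 5.50),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Hypothetical cutoff: drop any column that is more than half missing.
max_na_share <- 0.5

# Share of missing values per column, then keep the columns under the cutoff.
na_share <- colMeans(is.na(nba_data))
nba_data <- nba_data[, na_share <= max_na_share, drop = FALSE]
names(nba_data)
```

The salary column survives because only a minority of its values are missing; those rows are handled separately.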

##                    Year                  Player                     Pos 
##                       0                       0                       0 
##                     Age                      Tm                       G 
##                       0                       0                       0 
##                      MP                     PER                     TS% 
##                       0                       0                       1 
##                     FTr                      FG                     FGA 
##                       1                       0                       0 
##                     FG%                      2P                     2PA 
##                       1                       0                       0 
##                     2P%                    eFG%                      FT 
##                       3                       1                       0 
##                     FTA                     FT%                     TRB 
##                       0                      15                       0 
##                     AST                     PTS salary_2017_in_millions 
##                       0                       0                     157

Looking above, the null values in our columns look much better, except for the salary column. Let’s clean that up in our next step.

Since our analysis is specifically looking at predicting the salary of those players with stats in 2016, we’ll go ahead and remove any rows of the dataset where the salary is null. Additionally, there are cases where a player has very few points and is still paid a salary (they were injured, etc.). For our purposes, we’ll consider these as outliers and remove anyone with fewer than 25 points from our analysis (there are 82 games in an NBA season, so scoring 25 points should be realistic for most players, even bench sitters).
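Both filters can be applied in one step. This is a small sketch on made-up rows; `min_points` is just a name introduced here for the 25-point cutoff.

```r
# Toy data: one player with no salary, one with very few points.
nba_data <- data.frame(
  Player = c("Player A", "Player B", "Player C", "Player D"),
  PTS    = c(500, 300, 12, 40),
  salary_2017_in_millions = c(1.31, NA, 0.45, 2.12),
  stringsAsFactors = FALSE
)

min_points <- 25

# Keep rows that have a salary and at least min_points points.
keep <- !is.na(nba_data$salary_2017_in_millions) & nba_data$PTS >= min_points
nba_data <- nba_data[keep, ]
nba_data
```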

## # A tibble: 358 x 24
##     Year Player Pos     Age Tm        G    MP   PER `TS%`   FTr    FG   FGA
##    <dbl> <chr>  <chr> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1  2016 Alan ~ PF       23 PHO      10    68  21.1 0.481 0.583    10    24
##  2  2016 Brian~ PG       23 TOT       7   169   8.9 0.38  0.103    14    39
##  3  2016 Brian~ PG       23 TOT       7   169   8.9 0.38  0.103    14    39
##  4  2016 Pat C~ SG       23 POR      34   143   4.6 0.352 0.102    13    49
##  5  2016 Nikol~ C        30 MIN      12   156   6.5 0.459 0.4      19    50
##  6  2016 Spenc~ PG       22 DET      12   159   8.9 0.423 0.611    19    54
##  7  2016 Udoni~ PF       35 MIA      37   260   9.1 0.434 0.262    23    61
##  8  2016 Caron~ SF       35 SAC      17   176  11.3 0.49  0.203    25    59
##  9  2016 Jarel~ SF       24 WAS      26   147  11   0.46  0.123    20    65
## 10  2016 Lucas~ C        23 TOR      29   225  15.6 0.642 0.341    28    44
## # ... with 348 more rows, and 12 more variables: `FG%` <dbl>, `2P` <dbl>,
## #   `2PA` <dbl>, `2P%` <dbl>, `eFG%` <dbl>, FT <dbl>, FTA <dbl>, `FT%` <dbl>,
## #   TRB <dbl>, AST <dbl>, PTS <dbl>, salary_2017_in_millions <dbl>

Now that we’ve got our data straightened out, let’s see if we can identify some relationship between some of the factors in our dataset and a player’s salary.

First, let’s take a look at the distribution of salary between different positions in the NBA. Before plotting, let’s see how many players of each position we have in the dataset:

## # A tibble: 5 x 2
##   Pos       n
##   <chr> <int>
## 1 C        68
## 2 PF       70
## 3 PG       75
## 4 SF       72
## 5 SG       73

Looking at the above counts, it looks like there is a pretty even breakout between each position. Equipped with this information, let’s now plot this with the salary information.

One of the most interesting things we can see in the above plot is that point guards (PG) have the lowest median salary of any position in the NBA but also the widest range, the highest salary value in the dataset, and several other outliers. Centers (C) have the highest median salary as well as a wider IQR than any of the other positions.

Now let’s turn our attention to age to see if it plays a factor in salary:

Looking at the boxplots above, we can see that age is definitely a factor in salary. In general, salary increases at age 22 and then decreases after age 31. This makes sense, because players can begin to enter the league at age 19 and teams generally aren’t willing to risk a large salary on an untested rookie. After a few years of solid performance, we’d expect salaries to increase. Additionally, players generally start seeing some decline in their athletic abilities in their early 30s with the demands of the game.

Next, we’ll take a look at the relationship between team (Tm) and salary:

These results are pretty surprising. You can see there is a huge discrepancy between teams’ salaries. For example, it looks like every player on the Cleveland Cavaliers makes more than every player on the Philadelphia 76ers. It does look like team will play a factor in the salary puzzle.

Now that we’ve reviewed all of the categorical data, let’s use a pair plot to investigate relationships between the remaining variables:

Looking at the above pair plot we can see a couple of things. First, salary has a moderate to moderately weak relationship with Minutes Played (MP), Player Efficiency Rating (PER), True Shooting Percentage (TS%), Field Goals (FG), Field Goal Attempts (FGA), Field Goal % (FG%), 2 Point Field Goals (2P), 2 Point Field Goal Attempts (2PA), Free Throws (FT), Free Throw Attempts (FTA), Total Rebounds (TRB) and Points (PTS). Second, some of our columns are highly correlated with one another, such as Free Throws and Free Throw Attempts. We’ll have to watch out for this collinearity when we build our model.
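The collinearity can also be checked numerically rather than by eye. Here is a small sketch that lists every pair of columns whose absolute correlation exceeds a cutoff; the 0.9 cutoff and the toy values are assumptions for illustration.

```r
# Toy numeric columns; FT and FTA are built to move together.
num_cols <- data.frame(
  FT  = c(10, 50, 120, 200, 310),
  FTA = c(14, 66, 150, 260, 380),
  AST = c(90, 20, 150, 40, 60)
)

# Pairwise Pearson correlations between all numeric columns.
cors <- cor(num_cols)

# Look only above the diagonal so each pair is reported once.
high <- which(abs(cors) > 0.9 & upper.tri(cors), arr.ind = TRUE)

high_pairs <- data.frame(
  var1 = rownames(cors)[high[, "row"]],
  var2 = colnames(cors)[high[, "col"]],
  r    = cors[high],
  stringsAsFactors = FALSE
)
high_pairs
```

Any pair that shows up here is a candidate for keeping only one of the two variables in the model.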

Let’s now turn our attention to model building. We’ll start by building a multi-factor regression model and use backward elimination to remove insignificant factors. Based on our analysis, we’ve got a good idea of what factors will be relevant to the model. Before we build it, recall that in the last discussion we built a model using a single factor. The output from that model is below:

In reviewing the results you can see we had a residual standard error of $6.283M and an adjusted R-squared value of 0.3718. We’ll be looking to improve those metrics with this multi-factor regression model. Since points (PTS) is strongly correlated with 2PA and 2P%, we’ll remove it in this model run.
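For readers who want to reproduce the fit, here is a sketch of the full-model call on simulated data. The formula `salary_2017_in_millions ~ .` and the name `nba_data_for_model` match the output below; the data itself is simulated and has far fewer columns than the real dataset.

```r
set.seed(1)
n <- 60

# Simulated stand-in for nba_data_for_model (not real NBA data).
nba_data_for_model <- data.frame(
  Pos = factor(rep(c("C", "PF", "PG", "SF", "SG"), length.out = n)),
  MP  = round(runif(n, 100, 3000)),
  AST = round(runif(n, 5, 600)),
  TRB = round(runif(n, 10, 900))
)
nba_data_for_model$salary_2017_in_millions <-
  1 + 0.004 * nba_data_for_model$MP + rnorm(n, sd = 2)

# The dot means "every other column in the data frame", so identifier
# columns and PTS need to be dropped from the data before this call.
full_model <- lm(salary_2017_in_millions ~ ., data = nba_data_for_model)
summary(full_model)
```

Categorical columns such as `Pos` are expanded into one coefficient per level other than the baseline, which is why the output lists `PosPF`, `PosPG`, and so on, with centers (C) as the reference level.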

## 
## Call:
## lm(formula = salary_2017_in_millions ~ ., data = nba_data_for_model)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.7411  -2.8801  -0.2109   2.9241  15.2311 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.897e+01  6.648e+00   2.854 0.004619 ** 
## PosPF       -1.087e+00  9.983e-01  -1.089 0.277000    
## PosPG       -4.221e+00  1.493e+00  -2.828 0.004998 ** 
## PosSF       -2.795e+00  1.293e+00  -2.161 0.031444 *  
## PosSG       -3.401e+00  1.419e+00  -2.397 0.017128 *  
## Age         -2.667e-02  7.653e-02  -0.348 0.727746    
## TmBOS       -6.258e+00  2.234e+00  -2.802 0.005411 ** 
## TmBRK       -6.391e+00  2.336e+00  -2.735 0.006600 ** 
## TmCHI       -3.697e+00  2.148e+00  -1.721 0.086222 .  
## TmCHO       -4.059e+00  2.194e+00  -1.850 0.065312 .  
## TmCLE        1.626e-01  2.285e+00   0.071 0.943323    
## TmDAL       -6.356e+00  2.124e+00  -2.992 0.002997 ** 
## TmDEN       -8.187e+00  2.196e+00  -3.729 0.000229 ***
## TmDET       -5.588e+00  2.331e+00  -2.397 0.017131 *  
## TmGSW       -4.907e+00  2.172e+00  -2.259 0.024577 *  
## TmHOU       -7.030e+00  2.271e+00  -3.096 0.002146 ** 
## TmIND       -4.258e+00  2.304e+00  -1.849 0.065497 .  
## TmLAC        1.644e-02  2.340e+00   0.007 0.994397    
## TmLAL       -7.255e+00  2.517e+00  -2.882 0.004227 ** 
## TmMEM       -2.996e+00  2.333e+00  -1.284 0.200043    
## TmMIA       -3.531e+00  2.208e+00  -1.599 0.110806    
## TmMIL       -3.935e+00  2.252e+00  -1.747 0.081576 .  
## TmMIN       -9.291e+00  2.343e+00  -3.965 9.16e-05 ***
## TmNOP       -2.447e+00  2.300e+00  -1.064 0.288290    
## TmNYK       -5.780e+00  2.255e+00  -2.564 0.010841 *  
## TmOKC       -7.945e-01  2.249e+00  -0.353 0.724105    
## TmORL       -5.344e+00  2.177e+00  -2.454 0.014670 *  
## TmPHI       -9.273e+00  2.400e+00  -3.863 0.000137 ***
## TmPHO       -5.368e+00  2.202e+00  -2.438 0.015334 *  
## TmPOR       -2.377e+00  2.199e+00  -1.081 0.280632    
## TmSAC       -8.272e+00  2.141e+00  -3.864 0.000136 ***
## TmSAS       -3.960e+00  2.253e+00  -1.758 0.079788 .  
## TmTOR       -1.458e+00  2.186e+00  -0.667 0.505337    
## TmTOT       -6.363e+00  1.766e+00  -3.602 0.000368 ***
## TmUTA       -3.625e+00  2.151e+00  -1.685 0.092959 .  
## TmWAS       -3.818e+00  2.237e+00  -1.707 0.088891 .  
## G           -1.599e-01  3.012e-02  -5.309 2.12e-07 ***
## MP           4.589e-03  1.705e-03   2.691 0.007512 ** 
## PER          6.036e-02  1.919e-01   0.315 0.753338    
## `TS%`        3.209e+01  4.079e+01   0.787 0.432111    
## FTr         -2.643e+00  5.734e+00  -0.461 0.645228    
## FG           9.269e-02  4.489e-02   2.065 0.039811 *  
## FGA         -2.884e-02  1.896e-02  -1.521 0.129174    
## `FG%`       -2.411e+01  2.321e+01  -1.039 0.299624    
## `2P`        -6.144e-03  5.333e-02  -0.115 0.908357    
## `2PA`       -1.066e-02  2.344e-02  -0.455 0.649473    
## `2P%`        2.430e+00  1.492e+01   0.163 0.870752    
## `eFG%`      -1.901e+01  4.110e+01  -0.462 0.644100    
## FT          -1.657e-03  2.582e-02  -0.064 0.948891    
## FTA          1.591e-02  2.106e-02   0.755 0.450601    
## `FT%`       -3.709e+00  4.968e+00  -0.746 0.455942    
## TRB          1.213e-05  4.116e-03   0.003 0.997651    
## AST          8.234e-03  4.044e-03   2.036 0.042594 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.884 on 305 degrees of freedom
## Multiple R-squared:  0.6673, Adjusted R-squared:  0.6106 
## F-statistic: 11.77 on 52 and 305 DF,  p-value: < 2.2e-16

Looking at the above model, we’ve already improved on our simple linear regression model significantly. Using backward elimination, we remove the factor with the highest p-value. Here that is total rebounds (TRB), so we’ll remove it and rerun the model:
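One backward-elimination step can be sketched with `update()`, which refits a model after dropping a term. The data here is simulated and uses only numeric predictors, so that each coefficient name is also a term name.

```r
set.seed(1)
n <- 60

# Simulated data (not real NBA data); only MP truly drives salary.
toy <- data.frame(
  MP  = runif(n, 100, 3000),
  AST = runif(n, 5, 600),
  TRB = runif(n, 10, 900)
)
toy$salary <- 1 + 0.004 * toy$MP + rnorm(n, sd = 2)

full <- lm(salary ~ ., data = toy)

# p-values for every coefficient except the intercept.
p_values <- summary(full)$coefficients[-1, "Pr(>|t|)"]

# The term with the largest p-value is the least significant.
worst <- names(which.max(p_values))

# Refit without that term: ". ~ . - X" means "same model, minus X".
reduced <- update(full, as.formula(paste(". ~ . -", worst)))

# Compare adjusted R-squared before and after the removal.
c(full = summary(full)$adj.r.squared, reduced = summary(reduced)$adj.r.squared)
```

With factor predictors such as `Pos` or `Tm`, the coefficient rows are individual levels rather than whole terms, so the term to drop has to be chosen by hand, as is done in this analysis.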

## 
## Call:
## lm(formula = salary_2017_in_millions ~ Pos + Age + Tm + G + MP + 
##     PER + `TS%` + FTr + FG + FGA + `FG%` + `2P` + `2PA` + `2P%` + 
##     `eFG%` + FT + FTA + `FT%` + AST, data = nba_data_for_model)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.7419  -2.8801  -0.2106   2.9238  15.2299 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  18.975637   6.484227   2.926 0.003686 ** 
## PosPF        -1.087502   0.990855  -1.098 0.273269    
## PosPG        -4.222675   1.330548  -3.174 0.001659 ** 
## PosSF        -2.796642   1.217967  -2.296 0.022343 *  
## PosSG        -3.403225   1.257968  -2.705 0.007206 ** 
## Age          -0.026675   0.076332  -0.349 0.726989    
## TmBOS        -6.258631   2.229770  -2.807 0.005324 ** 
## TmBRK        -6.390893   2.330441  -2.742 0.006459 ** 
## TmCHI        -3.696411   2.143591  -1.724 0.085645 .  
## TmCHO        -4.058861   2.190527  -1.853 0.064857 .  
## TmCLE         0.162783   2.280516   0.071 0.943142    
## TmDAL        -6.355670   2.120635  -2.997 0.002949 ** 
## TmDEN        -8.186940   2.192114  -3.735 0.000224 ***
## TmDET        -5.588048   2.326892  -2.402 0.016924 *  
## TmGSW        -4.906909   2.163765  -2.268 0.024041 *  
## TmHOU        -7.030738   2.263621  -3.106 0.002074 ** 
## TmIND        -4.258209   2.299730  -1.852 0.065045 .  
## TmLAC         0.015882   2.327986   0.007 0.994561    
## TmLAL        -7.254382   2.509703  -2.891 0.004121 ** 
## TmMEM        -2.996187   2.329266  -1.286 0.199303    
## TmMIA        -3.531344   2.202067  -1.604 0.109823    
## TmMIL        -3.935047   2.244650  -1.753 0.080590 .  
## TmMIN        -9.291086   2.335093  -3.979 8.65e-05 ***
## TmNOP        -2.446562   2.296170  -1.065 0.287491    
## TmNYK        -5.779795   2.250935  -2.568 0.010711 *  
## TmOKC        -0.794081   2.241156  -0.354 0.723345    
## TmORL        -5.344031   2.170376  -2.462 0.014357 *  
## TmPHI        -9.273574   2.395564  -3.871 0.000132 ***
## TmPHO        -5.368108   2.197721  -2.443 0.015148 *  
## TmPOR        -2.376698   2.193558  -1.083 0.279444    
## TmSAC        -8.271569   2.136835  -3.871 0.000133 ***
## TmSAS        -3.960339   2.248331  -1.761 0.079160 .  
## TmTOR        -1.457843   2.179375  -0.669 0.504046    
## TmTOT        -6.363345   1.762842  -3.610 0.000358 ***
## TmUTA        -3.624868   2.144496  -1.690 0.091987 .  
## TmWAS        -3.818186   2.230605  -1.712 0.087960 .  
## G            -0.159912   0.029976  -5.335 1.86e-07 ***
## MP            0.004592   0.001418   3.239 0.001331 ** 
## PER           0.060559   0.179309   0.338 0.735792    
## `TS%`        32.073489  40.407896   0.794 0.427960    
## FTr          -2.642786   5.724125  -0.462 0.644630    
## FG            0.092689   0.044818   2.068 0.039468 *  
## FGA          -0.028850   0.018792  -1.535 0.125771    
## `FG%`       -24.122789  22.889563  -1.054 0.292771    
## `2P`         -0.006132   0.053089  -0.116 0.908121    
## `2PA`        -0.010667   0.023392  -0.456 0.648716    
## `2P%`         2.430311  14.898810   0.163 0.870531    
## `eFG%`      -18.995560  40.820113  -0.465 0.642013    
## FT           -0.001672   0.025267  -0.066 0.947293    
## FTA           0.015925   0.020347   0.783 0.434414    
## `FT%`        -3.707572   4.942419  -0.750 0.453739    
## AST           0.008233   0.004002   2.057 0.040539 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.876 on 306 degrees of freedom
## Multiple R-squared:  0.6673, Adjusted R-squared:  0.6119 
## F-statistic: 12.04 on 51 and 306 DF,  p-value: < 2.2e-16

With the removal of total rebounds we’ve seen a small decrease in residual standard error and a slight increase in the adjusted R-squared value. Next, we’ll rerun the model after removing free throws (FT):
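It may look odd that the adjusted R-squared rises while the multiple R-squared stays at 0.6673. The adjustment penalizes each extra predictor, so dropping a term that explains almost nothing removes the penalty without losing fit. As a quick check, the standard formula 1 - (1 - R²)(n - 1)/(n - p - 1) can be applied to the two outputs above, with n = 358 rows and p = 52 and then 51 predictors (read from the F-statistic lines). Because R-squared is rounded in the printed output, the results are approximate.

```r
# Adjusted R-squared from R-squared, sample size n, and predictor count p.
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# Full model: 52 predictors; the summary reports 0.6106.
adj_full <- adj_r2(0.6673, n = 358, p = 52)

# After dropping TRB: 51 predictors; the summary reports 0.6119.
adj_reduced <- adj_r2(0.6673, n = 358, p = 51)

round(c(full = adj_full, reduced = adj_reduced), 3)
```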

## 
## Call:
## lm(formula = salary_2017_in_millions ~ Pos + Age + Tm + G + MP + 
##     PER + `TS%` + FTr + FG + FGA + `FG%` + `2P` + `2PA` + `2P%` + 
##     `eFG%` + FTA + `FT%` + AST, data = nba_data_for_model)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.7700  -2.8748  -0.2025   2.9617  15.2331 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  19.128738   6.047428   3.163 0.001717 ** 
## PosPF        -1.089418   0.988824  -1.102 0.271442    
## PosPG        -4.216763   1.325390  -3.182 0.001615 ** 
## PosSF        -2.796102   1.215963  -2.299 0.022148 *  
## PosSG        -3.400814   1.255399  -2.709 0.007129 ** 
## Age          -0.026640   0.076206  -0.350 0.726898    
## TmBOS        -6.265059   2.224036  -2.817 0.005162 ** 
## TmBRK        -6.402272   2.320314  -2.759 0.006142 ** 
## TmCHI        -3.706725   2.134445  -1.737 0.083457 .  
## TmCHO        -4.062399   2.186321  -1.858 0.064112 .  
## TmCLE         0.161126   2.276677   0.071 0.943625    
## TmDAL        -6.361807   2.115168  -3.008 0.002850 ** 
## TmDEN        -8.189813   2.188127  -3.743 0.000217 ***
## TmDET        -5.574481   2.314076  -2.409 0.016587 *  
## TmGSW        -4.909443   2.159915  -2.273 0.023718 *  
## TmHOU        -7.035439   2.258834  -3.115 0.002016 ** 
## TmIND        -4.262703   2.294996  -1.857 0.064213 .  
## TmLAC         0.009719   2.322346   0.004 0.996664    
## TmLAL        -7.263961   2.501458  -2.904 0.003953 ** 
## TmMEM        -3.002419   2.323583  -1.292 0.197277    
## TmMIA        -3.535789   2.197470  -1.609 0.108638    
## TmMIL        -3.946765   2.234021  -1.767 0.078278 .  
## TmMIN        -9.294075   2.330867  -3.987 8.36e-05 ***
## TmNOP        -2.452416   2.290741  -1.071 0.285200    
## TmNYK        -5.786573   2.244953  -2.578 0.010415 *  
## TmOKC        -0.809383   2.225573  -0.364 0.716352    
## TmORL        -5.344645   2.166835  -2.467 0.014187 *  
## TmPHI        -9.266850   2.389523  -3.878 0.000129 ***
## TmPHO        -5.371581   2.193529  -2.449 0.014891 *  
## TmPOR        -2.384480   2.186848  -1.090 0.276404    
## TmSAC        -8.266228   2.131844  -3.878 0.000129 ***
## TmSAS        -3.969662   2.240270  -1.772 0.077394 .  
## TmTOR        -1.473302   2.163297  -0.681 0.496357    
## TmTOT        -6.366577   1.759305  -3.619 0.000346 ***
## TmUTA        -3.632750   2.137709  -1.699 0.090263 .  
## TmWAS        -3.819446   2.226904  -1.715 0.087329 .  
## G            -0.159763   0.029842  -5.354 1.69e-07 ***
## MP            0.004614   0.001376   3.352 0.000903 ***
## PER           0.062145   0.177411   0.350 0.726363    
## `TS%`        30.274543  29.842731   1.014 0.311158    
## FTr          -2.464744   5.043929  -0.489 0.625434    
## FG            0.092880   0.044653   2.080 0.038350 *  
## FGA          -0.029092   0.018403  -1.581 0.114956    
## `FG%`       -24.601710  21.679586  -1.135 0.257350    
## `2P`         -0.006279   0.052957  -0.119 0.905697    
## `2PA`        -0.010457   0.023138  -0.452 0.651642    
## `2P%`         2.501004  14.836330   0.169 0.866244    
## `eFG%`      -17.151034  29.768226  -0.576 0.564934    
## FTA           0.014668   0.007261   2.020 0.044228 *  
## `FT%`        -3.717076   4.932313  -0.754 0.451657    
## AST           0.008183   0.003925   2.085 0.037909 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.868 on 307 degrees of freedom
## Multiple R-squared:  0.6673, Adjusted R-squared:  0.6131 
## F-statistic: 12.32 on 50 and 307 DF,  p-value: < 2.2e-16

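We are still seeing improvements to the model, so we’ll continue with eliminations. The next three runs remove 2 Point Field Goals (2P), then 2 Point Field Goal % (2P%), then Player Efficiency Rating (PER), one at a time: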

## 
## Call:
## lm(formula = salary_2017_in_millions ~ Pos + Age + Tm + G + MP + 
##     PER + `TS%` + FTr + FG + FGA + `FG%` + `2PA` + `2P%` + `eFG%` + 
##     FTA + `FT%` + AST, data = nba_data_for_model)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.7633  -2.8927  -0.1809   2.9474  15.2428 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  19.149238   6.035273   3.173 0.001662 ** 
## PosPF        -1.088422   0.987205  -1.103 0.271093    
## PosPG        -4.219524   1.323063  -3.189 0.001573 ** 
## PosSF        -2.808121   1.209789  -2.321 0.020930 *  
## PosSG        -3.412391   1.249591  -2.731 0.006682 ** 
## Age          -0.026843   0.076065  -0.353 0.724409    
## TmBOS        -6.282007   2.215883  -2.835 0.004886 ** 
## TmBRK        -6.404695   2.316508  -2.765 0.006039 ** 
## TmCHI        -3.694913   2.128704  -1.736 0.083607 .  
## TmCHO        -4.063115   2.182810  -1.861 0.063638 .  
## TmCLE         0.140416   2.266331   0.062 0.950637    
## TmDAL        -6.352523   2.110332  -3.010 0.002827 ** 
## TmDEN        -8.191665   2.184566  -3.750 0.000211 ***
## TmDET        -5.572941   2.310333  -2.412 0.016442 *  
## TmGSW        -4.882205   2.144223  -2.277 0.023477 *  
## TmHOU        -7.042464   2.254440  -3.124 0.001955 ** 
## TmIND        -4.258284   2.291017  -1.859 0.064025 .  
## TmLAC         0.017230   2.317763   0.007 0.994073    
## TmLAL        -7.270515   2.496841  -2.912 0.003855 ** 
## TmMEM        -2.991559   2.318058  -1.291 0.197829    
## TmMIA        -3.541115   2.193491  -1.614 0.107470    
## TmMIL        -3.948531   2.230393  -1.770 0.077661 .  
## TmMIN        -9.307269   2.324480  -4.004 7.81e-05 ***
## TmNOP        -2.450573   2.287019  -1.072 0.284777    
## TmNYK        -5.779309   2.240523  -2.579 0.010359 *  
## TmOKC        -0.828563   2.216131  -0.374 0.708752    
## TmORL        -5.346376   2.163315  -2.471 0.014000 *  
## TmPHI        -9.288735   2.378567  -3.905 0.000116 ***
## TmPHO        -5.367927   2.189799  -2.451 0.014788 *  
## TmPOR        -2.368885   2.179393  -1.087 0.277910    
## TmSAC        -8.262130   2.128150  -3.882 0.000127 ***
## TmSAS        -3.958700   2.234776  -1.771 0.077482 .  
## TmTOR        -1.460788   2.157260  -0.677 0.498819    
## TmTOT        -6.363357   1.756278  -3.623 0.000340 ***
## TmUTA        -3.624112   2.133045  -1.699 0.090323 .  
## TmWAS        -3.828000   2.222169  -1.723 0.085957 .  
## G            -0.160006   0.029724  -5.383 1.45e-07 ***
## MP            0.004582   0.001348   3.398 0.000768 ***
## PER           0.057857   0.173407   0.334 0.738873    
## `TS%`        31.124277  28.923001   1.076 0.282721    
## FTr          -2.481232   5.033935  -0.493 0.622434    
## FG            0.088167   0.020312   4.341 1.93e-05 ***
## FGA          -0.027126   0.007968  -3.404 0.000751 ***
## `FG%`       -24.039099  21.120033  -1.138 0.255916    
## `2PA`        -0.013116   0.005672  -2.313 0.021406 *  
## `2P%`         1.499948  12.180136   0.123 0.902071    
## `eFG%`      -17.372429  29.662015  -0.586 0.558520    
## FTA           0.014561   0.007193   2.024 0.043789 *  
## `FT%`        -3.794066   4.881552  -0.777 0.437622    
## AST           0.008225   0.003902   2.108 0.035860 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.86 on 308 degrees of freedom
## Multiple R-squared:  0.6673, Adjusted R-squared:  0.6144 
## F-statistic: 12.61 on 49 and 308 DF,  p-value: < 2.2e-16
## 
## Call:
## lm(formula = salary_2017_in_millions ~ Pos + Age + Tm + G + MP + 
##     PER + `TS%` + FTr + FG + FGA + `FG%` + `2PA` + `eFG%` + FTA + 
##     `FT%` + AST, data = nba_data_for_model)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.7724  -2.9077  -0.1963   3.0091  15.2468 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  19.237843   5.982673   3.216 0.001440 ** 
## PosPF        -1.065036   0.967222  -1.101 0.271698    
## PosPG        -4.199051   1.310483  -3.204 0.001496 ** 
## PosSF        -2.769554   1.166687  -2.374 0.018214 *  
## PosSG        -3.389594   1.233832  -2.747 0.006363 ** 
## Age          -0.026739   0.075939  -0.352 0.724995    
## TmBOS        -6.308937   2.201549  -2.866 0.004447 ** 
## TmBRK        -6.448532   2.285343  -2.822 0.005087 ** 
## TmCHI        -3.744525   2.086900  -1.794 0.073743 .  
## TmCHO        -4.098013   2.160884  -1.896 0.058833 .  
## TmCLE         0.109122   2.248447   0.049 0.961324    
## TmDAL        -6.357322   2.106607  -3.018 0.002758 ** 
## TmDEN        -8.216854   2.171500  -3.784 0.000185 ***
## TmDET        -5.589226   2.302867  -2.427 0.015793 *  
## TmGSW        -4.920410   2.118277  -2.323 0.020837 *  
## TmHOU        -7.065979   2.242756  -3.151 0.001789 ** 
## TmIND        -4.292849   2.270132  -1.891 0.059558 .  
## TmLAC        -0.008961   2.304303  -0.004 0.996900    
## TmLAL        -7.308872   2.473386  -2.955 0.003367 ** 
## TmMEM        -3.027874   2.295557  -1.319 0.188141    
## TmMIA        -3.563001   2.182793  -1.632 0.103632    
## TmMIL        -4.002225   2.183872  -1.833 0.067820 .  
## TmMIN        -9.332995   2.311381  -4.038 6.81e-05 ***
## TmNOP        -2.480412   2.270520  -1.092 0.275490    
## TmNYK        -5.813322   2.219887  -2.619 0.009260 ** 
## TmOKC        -0.849232   2.206241  -0.385 0.700560    
## TmORL        -5.375653   2.146781  -2.504 0.012793 *  
## TmPHI        -9.312596   2.366881  -3.935 0.000103 ***
## TmPHO        -5.401032   2.169769  -2.489 0.013328 *  
## TmPOR        -2.400754   2.160522  -1.111 0.267350    
## TmSAC        -8.284555   2.116963  -3.913 0.000112 ***
## TmSAS        -4.001309   2.204308  -1.815 0.070459 .  
## TmTOR        -1.495658   2.135185  -0.700 0.484154    
## TmTOT        -6.388921   1.741185  -3.669 0.000286 ***
## TmUTA        -3.653216   2.116531  -1.726 0.085340 .  
## TmWAS        -3.869519   2.192941  -1.765 0.078630 .  
## G            -0.159696   0.029570  -5.401 1.33e-07 ***
## MP            0.004577   0.001346   3.401 0.000759 ***
## PER           0.056932   0.172968   0.329 0.742269    
## `TS%`        31.296961  28.842914   1.085 0.278731    
## FTr          -2.559017   4.986184  -0.513 0.608163    
## FG            0.088416   0.020178   4.382 1.61e-05 ***
## FGA          -0.027145   0.007954  -3.413 0.000728 ***
## `FG%`       -22.506673  17.037540  -1.321 0.187477    
## `2PA`        -0.013283   0.005497  -2.416 0.016261 *  
## `eFG%`      -17.582608  29.565643  -0.595 0.552481    
## FTA           0.014648   0.007146   2.050 0.041235 *  
## `FT%`        -3.808184   4.872422  -0.782 0.435060    
## AST           0.008234   0.003896   2.114 0.035356 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.853 on 309 degrees of freedom
## Multiple R-squared:  0.6673, Adjusted R-squared:  0.6156 
## F-statistic: 12.91 on 48 and 309 DF,  p-value: < 2.2e-16
## 
## Call:
## lm(formula = salary_2017_in_millions ~ Pos + Age + Tm + G + MP + 
##     `TS%` + FTr + FG + FGA + `FG%` + `2PA` + `eFG%` + FTA + `FT%` + 
##     AST, data = nba_data_for_model)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.8643  -2.8448  -0.1932   3.0061  15.3444 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  19.373377   5.959896   3.251 0.001278 ** 
## PosPF        -1.103418   0.958785  -1.151 0.250681    
## PosPG        -4.291503   1.278185  -3.357 0.000885 ***
## PosSF        -2.887009   1.109175  -2.603 0.009690 ** 
## PosSG        -3.522264   1.164458  -3.025 0.002696 ** 
## Age          -0.029649   0.075314  -0.394 0.694090    
## TmBOS        -6.314430   2.198318  -2.872 0.004355 ** 
## TmBRK        -6.416837   2.280028  -2.814 0.005200 ** 
## TmCHI        -3.714137   2.081856  -1.784 0.075393 .  
## TmCHO        -4.045062   2.151786  -1.880 0.061064 .  
## TmCLE         0.133036   2.244038   0.059 0.952764    
## TmDAL        -6.323684   2.101098  -3.010 0.002830 ** 
## TmDEN        -8.197184   2.167554  -3.782 0.000187 ***
## TmDET        -5.587450   2.299546  -2.430 0.015674 *  
## TmGSW        -4.925851   2.115164  -2.329 0.020511 *  
## TmHOU        -7.040701   2.238214  -3.146 0.001818 ** 
## TmIND        -4.279371   2.266496  -1.888 0.059946 .  
## TmLAC         0.069401   2.288672   0.030 0.975828    
## TmLAL        -7.307460   2.469822  -2.959 0.003327 ** 
## TmMEM        -2.947358   2.279201  -1.293 0.196921    
## TmMIA        -3.537102   2.178235  -1.624 0.105427    
## TmMIL        -4.022309   2.179878  -1.845 0.065962 .  
## TmMIN        -9.418114   2.293563  -4.106 5.15e-05 ***
## TmNOP        -2.449118   2.265263  -1.081 0.280465    
## TmNYK        -5.751309   2.208694  -2.604 0.009660 ** 
## TmOKC        -0.863564   2.202637  -0.392 0.695284    
## TmORL        -5.373432   2.143681  -2.507 0.012700 *  
## TmPHI        -9.307974   2.363433  -3.938 0.000101 ***
## TmPHO        -5.350176   2.161145  -2.476 0.013835 *  
## TmPOR        -2.395692   2.157358  -1.110 0.267655    
## TmSAC        -8.332619   2.108881  -3.951 9.63e-05 ***
## TmSAS        -3.861378   2.159809  -1.788 0.074780 .  
## TmTOR        -1.446744   2.126941  -0.680 0.496886    
## TmTOT        -6.358422   1.736215  -3.662 0.000294 ***
## TmUTA        -3.606409   2.108709  -1.710 0.088221 .  
## TmWAS        -3.891765   2.188744  -1.778 0.076371 .  
## G            -0.161542   0.028991  -5.572 5.47e-08 ***
## MP            0.004383   0.001208   3.627 0.000335 ***
## `TS%`        34.769289  26.805746   1.297 0.195567    
## FTr          -2.593037   4.977938  -0.521 0.602804    
## FG            0.090826   0.018776   4.837 2.08e-06 ***
## FGA          -0.027538   0.007852  -3.507 0.000520 ***
## `FG%`       -22.054223  16.957555  -1.301 0.194377    
## `2PA`        -0.013457   0.005464  -2.463 0.014329 *  
## `eFG%`      -19.738184  28.789698  -0.686 0.493478    
## FTA           0.015008   0.007052   2.128 0.034100 *  
## `FT%`        -4.031305   4.818092  -0.837 0.403405    
## AST           0.008654   0.003675   2.355 0.019158 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.846 on 310 degrees of freedom
## Multiple R-squared:  0.6672, Adjusted R-squared:  0.6167 
## F-statistic: 13.22 on 47 and 310 DF,  p-value: < 2.2e-16
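
Each elimination step here is a refit with one term removed. The following is a minimal sketch of one such step, not the code used for this analysis: it uses the built-in `mtcars` data as a stand-in for `nba_data_for_model`, and the predictors chosen are arbitrary.

```r
# Stand-in data: mtcars ships with R; it is NOT the NBA dataset.
full <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)

# p-values for every coefficient except the intercept
p_values <- summary(full)$coefficients[-1, "Pr(>|t|)"]

# Numeric predictor with the largest p-value
worst <- names(which.max(p_values))

# Refit with that term removed; ". ~ . -" keeps everything else unchanged
reduced <- update(full, as.formula(paste(". ~ . -", worst)))

print(worst)
print(formula(reduced))
```

This shortcut only works when every predictor is a single numeric column. For factors such as Pos or Tm the coefficient table has one row per level, so `drop1(full, test = "F")` is a safer way to judge the whole term.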

Next we remove Age (p = 0.69), which I had originally expected to be a useful predictor:

## 
## Call:
## lm(formula = salary_2017_in_millions ~ Pos + Tm + G + MP + `TS%` + 
##     FTr + FG + FGA + `FG%` + `2PA` + `eFG%` + FTA + `FT%` + AST, 
##     data = nba_data_for_model)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.9012  -2.8248  -0.1794   3.0236  15.4394 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  18.566720   5.588983   3.322 0.001000 ** 
## PosPF        -1.061774   0.951636  -1.116 0.265397    
## PosPG        -4.190041   1.250230  -3.351 0.000903 ***
## PosSF        -2.810366   1.090472  -2.577 0.010421 *  
## PosSG        -3.454598   1.150137  -3.004 0.002884 ** 
## TmBOS        -6.268907   2.192290  -2.860 0.004530 ** 
## TmBRK        -6.399142   2.276485  -2.811 0.005253 ** 
## TmCHI        -3.738799   2.078085  -1.799 0.072963 .  
## TmCHO        -3.998885   2.145666  -1.864 0.063306 .  
## TmCLE         0.106429   2.239971   0.048 0.962134    
## TmDAL        -6.415502   2.085275  -3.077 0.002280 ** 
## TmDEN        -8.125158   2.156882  -3.767 0.000198 ***
## TmDET        -5.535999   2.292708  -2.415 0.016329 *  
## TmGSW        -4.952196   2.111231  -2.346 0.019622 *  
## TmHOU        -7.072912   2.233678  -3.166 0.001696 ** 
## TmIND        -4.256594   2.262677  -1.881 0.060876 .  
## TmLAC        -0.024564   2.273098  -0.011 0.991385    
## TmLAL        -7.255242   2.462905  -2.946 0.003464 ** 
## TmMEM        -3.079225   2.251388  -1.368 0.172393    
## TmMIA        -3.579052   2.172670  -1.647 0.100505    
## TmMIL        -3.929935   2.164266  -1.816 0.070360 .  
## TmMIN        -9.308636   2.273547  -4.094 5.40e-05 ***
## TmNOP        -2.445056   2.262160  -1.081 0.280601    
## TmNYK        -5.753550   2.205685  -2.609 0.009533 ** 
## TmOKC        -0.839353   2.198785  -0.382 0.702919    
## TmORL        -5.308780   2.134476  -2.487 0.013401 *  
## TmPHI        -9.181521   2.338321  -3.927 0.000106 ***
## TmPHO        -5.342968   2.158130  -2.476 0.013828 *  
## TmPOR        -2.324455   2.146833  -1.083 0.279764    
## TmSAC        -8.344454   2.105800  -3.963 9.20e-05 ***
## TmSAS        -3.966484   2.140330  -1.853 0.064799 .  
## TmTOR        -1.405214   2.121436  -0.662 0.508213    
## TmTOT        -6.407384   1.729400  -3.705 0.000250 ***
## TmUTA        -3.544161   2.099914  -1.688 0.092459 .  
## TmWAS        -3.887985   2.185748  -1.779 0.076251 .  
## G            -0.161268   0.028944  -5.572 5.47e-08 ***
## MP            0.004331   0.001199   3.611 0.000355 ***
## `TS%`        34.096828  26.714898   1.276 0.202794    
## FTr          -2.464821   4.960519  -0.497 0.619619    
## FG            0.090574   0.018740   4.833 2.11e-06 ***
## FGA          -0.027274   0.007813  -3.491 0.000551 ***
## `FG%`       -21.423964  16.858863  -1.271 0.204755    
## `2PA`        -0.013434   0.005456  -2.462 0.014358 *  
## `eFG%`      -19.632916  28.749319  -0.683 0.495178    
## FTA           0.014939   0.007040   2.122 0.034623 *  
## `FT%`        -4.075498   4.810236  -0.847 0.397504    
## AST           0.008435   0.003628   2.325 0.020711 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.839 on 311 degrees of freedom
## Multiple R-squared:  0.667,  Adjusted R-squared:  0.6178 
## F-statistic: 13.54 on 46 and 311 DF,  p-value: < 2.2e-16
## 
## Call:
## lm(formula = salary_2017_in_millions ~ Pos + Tm + G + MP + `TS%` + 
##     FG + FGA + `FG%` + `2PA` + `eFG%` + FTA + `FT%` + AST, data = nba_data_for_model)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.7725  -2.7710  -0.2483   2.9208  15.4736 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  17.918309   5.427942   3.301 0.001075 ** 
## PosPF        -0.999265   0.942145  -1.061 0.289680    
## PosPG        -4.179786   1.248550  -3.348 0.000915 ***
## PosSF        -2.745907   1.081420  -2.539 0.011597 *  
## PosSG        -3.416748   1.146226  -2.981 0.003100 ** 
## TmBOS        -6.244440   2.189090  -2.853 0.004627 ** 
## TmBRK        -6.395766   2.273726  -2.813 0.005221 ** 
## TmCHI        -3.732267   2.075534  -1.798 0.073109 .  
## TmCHO        -3.960610   2.141693  -1.849 0.065362 .  
## TmCLE         0.118214   2.237140   0.053 0.957892    
## TmDAL        -6.457485   2.081046  -3.103 0.002091 ** 
## TmDEN        -8.101066   2.153733  -3.761 0.000202 ***
## TmDET        -5.587081   2.287636  -2.442 0.015149 *  
## TmGSW        -4.914952   2.107352  -2.332 0.020321 *  
## TmHOU        -7.068386   2.230962  -3.168 0.001685 ** 
## TmIND        -4.219731   2.258729  -1.868 0.062673 .  
## TmLAC        -0.128312   2.260754  -0.057 0.954776    
## TmLAL        -7.254854   2.459931  -2.949 0.003427 ** 
## TmMEM        -3.102576   2.248179  -1.380 0.168563    
## TmMIA        -3.543214   2.168850  -1.634 0.103334    
## TmMIL        -3.854960   2.156392  -1.788 0.074797 .  
## TmMIN        -9.293542   2.270599  -4.093 5.43e-05 ***
## TmNOP        -2.450586   2.259401  -1.085 0.278928    
## TmNYK        -5.757565   2.203006  -2.614 0.009396 ** 
## TmOKC        -0.762667   2.190713  -0.348 0.727973    
## TmORL        -5.335097   2.131242  -2.503 0.012816 *  
## TmPHI        -9.143565   2.334250  -3.917 0.000110 ***
## TmPHO        -5.371814   2.154744  -2.493 0.013184 *  
## TmPOR        -2.381222   2.141202  -1.112 0.266953    
## TmSAC        -8.273354   2.098396  -3.943 9.95e-05 ***
## TmSAS        -3.993014   2.137080  -1.868 0.062637 .  
## TmTOR        -1.371237   2.117773  -0.647 0.517791    
## TmTOT        -6.428177   1.726806  -3.723 0.000234 ***
## TmUTA        -3.556630   2.097228  -1.696 0.090908 .  
## TmWAS        -3.919168   2.182208  -1.796 0.073468 .  
## G            -0.158550   0.028388  -5.585 5.09e-08 ***
## MP            0.004177   0.001157   3.609 0.000358 ***
## `TS%`        27.726810  23.409647   1.184 0.237149    
## FG            0.090788   0.018712   4.852 1.94e-06 ***
## FGA          -0.026974   0.007780  -3.467 0.000600 ***
## `FG%`       -24.701646  15.496045  -1.594 0.111935    
## `2PA`        -0.012669   0.005229  -2.423 0.015960 *  
## `eFG%`      -11.624879  23.778501  -0.489 0.625269    
## FTA           0.012347   0.004722   2.615 0.009360 ** 
## `FT%`        -3.111552   4.396377  -0.708 0.479627    
## AST           0.008631   0.003602   2.396 0.017159 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.833 on 312 degrees of freedom
## Multiple R-squared:  0.6667, Adjusted R-squared:  0.6187 
## F-statistic: 13.87 on 45 and 312 DF,  p-value: < 2.2e-16
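
The two statistics tracked after each run, adjusted R-squared and residual standard error, can be pulled out of `summary()` instead of being read off the printout. A small sketch, again assuming `mtcars` as stand-in data; `fit_stats` is a hypothetical helper name, not something from the original analysis.

```r
# Stand-in data: mtcars ships with R; it is NOT the NBA dataset.
fit_a <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
fit_b <- update(fit_a, . ~ . - drat)   # same model with one term removed

# Collect the statistics compared in the text for any fitted lm
fit_stats <- function(fit) {
  s <- summary(fit)
  c(adj_r_squared   = s$adj.r.squared,
    resid_std_error = s$sigma,
    resid_df        = fit$df.residual)
}

comparison <- rbind(with_drat    = fit_stats(fit_a),
                    without_drat = fit_stats(fit_b))
print(round(comparison, 4))
```

Putting the rows side by side makes small changes, such as a shift in the third decimal place, easier to spot than scanning two long summaries.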

In this last run, our adjusted R-squared rose slightly (0.6178 to 0.6187) and our residual standard error fell (4.839 to 4.833). Let’s keep going with our eliminations:

## 
## Call:
## lm(formula = salary_2017_in_millions ~ Pos + Tm + G + MP + `TS%` + 
##     FG + FGA + `FG%` + `2PA` + FTA + `FT%` + AST, data = nba_data_for_model)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.8446  -2.7867  -0.2265   2.9412  15.5120 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  17.367171   5.303123   3.275 0.001175 ** 
## PosPF        -1.049536   0.935378  -1.122 0.262703    
## PosPG        -4.290358   1.226400  -3.498 0.000536 ***
## PosSF        -2.833499   1.065178  -2.660 0.008214 ** 
## PosSG        -3.529062   1.121601  -3.146 0.001812 ** 
## TmBOS        -6.186426   2.183212  -2.834 0.004901 ** 
## TmBRK        -6.361102   2.269856  -2.802 0.005388 ** 
## TmCHI        -3.750866   2.072661  -1.810 0.071303 .  
## TmCHO        -3.940468   2.138692  -1.842 0.066353 .  
## TmCLE         0.149522   2.233504   0.067 0.946668    
## TmDAL        -6.383992   2.073084  -3.079 0.002258 ** 
## TmDEN        -8.081361   2.150737  -3.757 0.000205 ***
## TmDET        -5.598814   2.284728  -2.451 0.014811 *  
## TmGSW        -4.859742   2.101765  -2.312 0.021414 *  
## TmHOU        -7.098690   2.227388  -3.187 0.001583 ** 
## TmIND        -4.211452   2.255918  -1.867 0.062858 .  
## TmLAC        -0.194220   2.253986  -0.086 0.931388    
## TmLAL        -7.216472   2.455687  -2.939 0.003541 ** 
## TmMEM        -3.093364   2.245365  -1.378 0.169291    
## TmMIA        -3.494986   2.163970  -1.615 0.107301    
## TmMIL        -3.843599   2.153644  -1.785 0.075279 .  
## TmMIN        -9.189014   2.257760  -4.070 5.96e-05 ***
## TmNOP        -2.454769   2.256637  -1.088 0.277520    
## TmNYK        -5.714177   2.198540  -2.599 0.009790 ** 
## TmOKC        -0.707466   2.185140  -0.324 0.746334    
## TmORL        -5.310440   2.128053  -2.495 0.013095 *  
## TmPHI        -9.092178   2.329046  -3.904 0.000116 ***
## TmPHO        -5.316163   2.149118  -2.474 0.013904 *  
## TmPOR        -2.362740   2.138264  -1.105 0.270017    
## TmSAC        -8.268715   2.095822  -3.945 9.84e-05 ***
## TmSAS        -3.925513   2.130021  -1.843 0.066282 .  
## TmTOR        -1.361017   2.115094  -0.643 0.520385    
## TmTOT        -6.406202   1.724121  -3.716 0.000240 ***
## TmUTA        -3.533948   2.094164  -1.688 0.092499 .  
## TmWAS        -3.919957   2.179553  -1.799 0.073059 .  
## G            -0.159850   0.028229  -5.663 3.37e-08 ***
## MP            0.004149   0.001154   3.594 0.000378 ***
## `TS%`        18.862452  14.788941   1.275 0.203098    
## FG            0.087816   0.017676   4.968 1.11e-06 ***
## FGA          -0.026641   0.007741  -3.442 0.000657 ***
## `FG%`       -27.293938  14.542887  -1.877 0.061477 .  
## `2PA`        -0.011624   0.004766  -2.439 0.015284 *  
## FTA           0.013505   0.004080   3.310 0.001043 ** 
## `FT%`        -2.082230   3.854731  -0.540 0.589460    
## AST           0.008802   0.003580   2.458 0.014497 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.827 on 313 degrees of freedom
## Multiple R-squared:  0.6665, Adjusted R-squared:  0.6196 
## F-statistic: 14.22 on 44 and 313 DF,  p-value: < 2.2e-16
## 
## Call:
## lm(formula = salary_2017_in_millions ~ Pos + Tm + G + MP + `TS%` + 
##     FG + FGA + `FG%` + `2PA` + FTA + AST, data = nba_data_for_model)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.8548  -2.7673  -0.2873   3.0274  15.5516 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  16.020890   4.675694   3.426 0.000693 ***
## PosPF        -1.026501   0.933351  -1.100 0.272261    
## PosPG        -4.326583   1.223183  -3.537 0.000465 ***
## PosSF        -2.817146   1.063546  -2.649 0.008486 ** 
## PosSG        -3.544884   1.119954  -3.165 0.001702 ** 
## TmBOS        -6.196685   2.180666  -2.842 0.004782 ** 
## TmBRK        -6.288926   2.263363  -2.779 0.005789 ** 
## TmCHI        -3.718131   2.069437  -1.797 0.073346 .  
## TmCHO        -3.871668   2.132487  -1.816 0.070391 .  
## TmCLE         0.207959   2.228365   0.093 0.925706    
## TmDAL        -6.372841   2.070643  -3.078 0.002270 ** 
## TmDEN        -8.074077   2.148268  -3.758 0.000204 ***
## TmDET        -5.471832   2.270038  -2.410 0.016507 *  
## TmGSW        -4.810971   2.097455  -2.294 0.022467 *  
## TmHOU        -7.037423   2.221988  -3.167 0.001691 ** 
## TmIND        -4.200427   2.253281  -1.864 0.063235 .  
## TmLAC        -0.199023   2.251425  -0.088 0.929616    
## TmLAL        -7.129292   2.447613  -2.913 0.003839 ** 
## TmMEM        -3.135079   2.241505  -1.399 0.162905    
## TmMIA        -3.437127   2.158878  -1.592 0.112371    
## TmMIL        -3.796548   2.149454  -1.766 0.078320 .  
## TmMIN        -9.160278   2.254586  -4.063 6.13e-05 ***
## TmNOP        -2.455896   2.254089  -1.090 0.276756    
## TmNYK        -5.770647   2.193575  -2.631 0.008941 ** 
## TmOKC        -0.664463   2.181226  -0.305 0.760851    
## TmORL        -5.307012   2.125643  -2.497 0.013049 *  
## TmPHI        -8.986538   2.318202  -3.877 0.000129 ***
## TmPHO        -5.269972   2.144993  -2.457 0.014556 *  
## TmPOR        -2.373148   2.135765  -1.111 0.267355    
## TmSAC        -8.191351   2.088563  -3.922 0.000108 ***
## TmSAS        -3.958534   2.126741  -1.861 0.063634 .  
## TmTOR        -1.296021   2.109286  -0.614 0.539372    
## TmTOT        -6.371101   1.720952  -3.702 0.000253 ***
## TmUTA        -3.491915   2.090357  -1.670 0.095819 .  
## TmWAS        -3.848108   2.173036  -1.771 0.077557 .  
## G            -0.159647   0.028194  -5.662 3.37e-08 ***
## MP            0.004207   0.001148   3.664 0.000291 ***
## `TS%`        14.717145  12.627825   1.165 0.244720    
## FG            0.087203   0.017619   4.949 1.22e-06 ***
## FGA          -0.026295   0.007706  -3.412 0.000729 ***
## `FG%`       -22.897220  12.038176  -1.902 0.058079 .  
## `2PA`        -0.012263   0.004611  -2.659 0.008232 ** 
## FTA           0.014020   0.003963   3.538 0.000464 ***
## AST           0.008910   0.003571   2.495 0.013101 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.822 on 314 degrees of freedom
## Multiple R-squared:  0.6662, Adjusted R-squared:  0.6205 
## F-statistic: 14.57 on 43 and 314 DF,  p-value: < 2.2e-16
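
As a quick check on how adjusted R-squared relates to the other numbers in the summary just above, the reported value can be approximately reproduced from the rounded figures. The only assumption is the standard formula, 1 - (1 - R²)(n - 1)/(residual df).

```r
# Values read from the summary printed above
r_squared <- 0.6662   # Multiple R-squared
resid_df  <- 314      # residual degrees of freedom
n_coef    <- 43       # model degrees of freedom from the F-statistic line

n <- resid_df + n_coef + 1   # observations: 314 + 43 + 1 (intercept) = 358

# Adjusted R-squared penalises R-squared for the number of fitted terms
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / resid_df
print(round(adj_r_squared, 4))
```

The result agrees with the reported 0.6205 up to rounding of the inputs. The penalty factor (n - 1)/(residual df) shrinks whenever a term is dropped, which is why adjusted R-squared can rise even while multiple R-squared falls slightly.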

The model above is the best we’ll be able to eke out without transforming the data. If we remove the factor with the next-highest p-value, TS%, the model returns a higher residual standard error (4.825 vs. 4.822) and a lower adjusted R-squared (0.6200 vs. 0.6205):

## 
## Call:
## lm(formula = salary_2017_in_millions ~ Pos + Tm + G + MP + FG + 
##     FGA + `FG%` + `2PA` + FTA + AST, data = nba_data_for_model)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.8021  -2.9458  -0.3667   2.9623  15.7687 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  18.659795   4.093191   4.559 7.37e-06 ***
## PosPF        -0.883689   0.925798  -0.955 0.340554    
## PosPG        -4.181362   1.217512  -3.434 0.000673 ***
## PosSF        -2.533525   1.035919  -2.446 0.015005 *  
## PosSG        -3.220868   1.085514  -2.967 0.003236 ** 
## TmBOS        -6.404589   2.174593  -2.945 0.003468 ** 
## TmBRK        -6.402664   2.262544  -2.830 0.004956 ** 
## TmCHI        -3.804716   2.069279  -1.839 0.066905 .  
## TmCHO        -3.850031   2.133619  -1.804 0.072114 .  
## TmCLE         0.057248   2.225874   0.026 0.979498    
## TmDAL        -6.424537   2.071344  -3.102 0.002099 ** 
## TmDEN        -8.148359   2.148543  -3.793 0.000179 ***
## TmDET        -5.616280   2.267940  -2.476 0.013797 *  
## TmGSW        -5.030329   2.090181  -2.407 0.016675 *  
## TmHOU        -7.425419   2.198156  -3.378 0.000822 ***
## TmIND        -4.432995   2.245703  -1.974 0.049256 *  
## TmLAC        -0.505987   2.237237  -0.226 0.821219    
## TmLAL        -7.370429   2.440239  -3.020 0.002731 ** 
## TmMEM        -3.170121   2.242578  -1.414 0.158465    
## TmMIA        -3.572091   2.156996  -1.656 0.098708 .  
## TmMIL        -3.985195   2.144569  -1.858 0.064063 .  
## TmMIN        -9.062082   2.254292  -4.020 7.29e-05 ***
## TmNOP        -2.501026   2.255038  -1.109 0.268240    
## TmNYK        -5.799560   2.194682  -2.643 0.008640 ** 
## TmOKC        -0.906151   2.172580  -0.417 0.676900    
## TmORL        -5.363804   2.126292  -2.523 0.012141 *  
## TmPHI        -9.211694   2.311452  -3.985 8.38e-05 ***
## TmPHO        -5.362271   2.144749  -2.500 0.012921 *  
## TmPOR        -2.686588   2.119968  -1.267 0.205992    
## TmSAC        -8.264730   2.088801  -3.957 9.39e-05 ***
## TmSAS        -4.000362   2.127648  -1.880 0.061006 .  
## TmTOR        -1.346284   2.110044  -0.638 0.523914    
## TmTOT        -6.427731   1.721244  -3.734 0.000223 ***
## TmUTA        -3.606637   2.089225  -1.726 0.085273 .  
## TmWAS        -3.919862   2.173399  -1.804 0.072255 .  
## G            -0.156140   0.028049  -5.567 5.56e-08 ***
## MP            0.004016   0.001137   3.532 0.000474 ***
## FG            0.089992   0.017466   5.152 4.55e-07 ***
## FGA          -0.024965   0.007625  -3.274 0.001178 ** 
## `FG%`       -11.915701   7.496500  -1.590 0.112950    
## `2PA`        -0.015743   0.003517  -4.477 1.06e-05 ***
## FTA           0.015450   0.003770   4.098 5.30e-05 ***
## AST           0.009082   0.003570   2.544 0.011433 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.825 on 315 degrees of freedom
## Multiple R-squared:  0.6647, Adjusted R-squared:   0.62 
## F-statistic: 14.87 on 42 and 315 DF,  p-value: < 2.2e-16

While our final model still isn’t amazing, it is about twice as good as our initial model, and we still haven’t performed any transformations on the dataset. I think there is still some room for improvement here, which we’ll explore next week.

You can run this elimination automatically using the stepAIC function from the MASS package, as shown below. It arrives at nearly the same model; the only difference is that it also drops TS%, because doing so lowers the AIC slightly (1167.45 to 1167.00). Having done it once by hand, I’ll probably save some time next time by using this function.
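
A minimal sketch of such a call, assuming `mtcars` as stand-in data rather than the NBA dataset. The `direction = "both"` argument is a guess based on the `+` rows in the trace; the exact call used for this analysis is not shown.

```r
# Stand-in data: mtcars ships with R; it is NOT the NBA dataset.
full <- lm(mpg ~ ., data = mtcars)

# Calling MASS::stepAIC with the namespace prefix, instead of library(MASS),
# avoids masking dplyr::select as reported in the package attach message.
step_fit <- MASS::stepAIC(full, direction = "both", trace = FALSE)

# Terms dropped (or re-added) and the AIC after each step
print(step_fit$anova)

# The returned object is an ordinary lm fit of the selected model
print(formula(step_fit))
```

Setting `trace = FALSE` suppresses the step-by-step tables; leave it at the default to get output like the listing in this section.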

## 
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
## 
##     select
## Start:  AIC=1184.21
## salary_2017_in_millions ~ Pos + Age + Tm + G + MP + PER + `TS%` + 
##     FTr + FG + FGA + `FG%` + `2P` + `2PA` + `2P%` + `eFG%` + 
##     FT + FTA + `FT%` + TRB + AST
## 
##          Df Sum of Sq    RSS    AIC
## - TRB     1      0.00 7275.8 1182.2
## - FT      1      0.10 7275.9 1182.2
## - `2P`    1      0.32 7276.1 1182.2
## - `2P%`   1      0.63 7276.4 1182.2
## - PER     1      2.36 7278.1 1182.3
## - Age     1      2.90 7278.7 1182.4
## - `2PA`   1      4.94 7280.7 1182.5
## - FTr     1      5.07 7280.8 1182.5
## - `eFG%`  1      5.10 7280.9 1182.5
## - `FT%`   1     13.29 7289.1 1182.9
## - FTA     1     13.61 7289.4 1182.9
## - `TS%`   1     14.76 7290.5 1182.9
## - `FG%`   1     25.75 7301.5 1183.5
## <none>                7275.8 1184.2
## - FGA     1     55.22 7331.0 1184.9
## - Pos     4    211.69 7487.5 1186.5
## - AST     1     98.91 7374.7 1187.0
## - FG      1    101.68 7377.5 1187.2
## - MP      1    172.78 7448.5 1190.6
## - Tm     30   1900.08 9175.9 1207.3
## - G       1    672.45 7948.2 1213.9
## 
## Step:  AIC=1182.21
## salary_2017_in_millions ~ Pos + Age + Tm + G + MP + PER + `TS%` + 
##     FTr + FG + FGA + `FG%` + `2P` + `2PA` + `2P%` + `eFG%` + 
##     FT + FTA + `FT%` + AST
## 
##          Df Sum of Sq    RSS    AIC
## - FT      1      0.10 7275.9 1180.2
## - `2P`    1      0.32 7276.1 1180.2
## - `2P%`   1      0.63 7276.4 1180.2
## - PER     1      2.71 7278.5 1180.3
## - Age     1      2.90 7278.7 1180.4
## - `2PA`   1      4.94 7280.7 1180.5
## - FTr     1      5.07 7280.8 1180.5
## - `eFG%`  1      5.15 7280.9 1180.5
## - `FT%`   1     13.38 7289.2 1180.9
## - FTA     1     14.57 7290.3 1180.9
## - `TS%`   1     14.98 7290.8 1181.0
## - `FG%`   1     26.41 7302.2 1181.5
## <none>                7275.8 1182.2
## - FGA     1     56.04 7331.8 1183.0
## + TRB     1      0.00 7275.8 1184.2
## - AST     1    100.60 7376.4 1185.1
## - FG      1    101.70 7377.5 1185.2
## - Pos     4    294.64 7570.4 1188.4
## - MP      1    249.46 7525.2 1192.3
## - Tm     30   1901.56 9177.3 1205.3
## - G       1    676.68 7952.4 1212.0
## 
## Step:  AIC=1180.22
## salary_2017_in_millions ~ Pos + Age + Tm + G + MP + PER + `TS%` + 
##     FTr + FG + FGA + `FG%` + `2P` + `2PA` + `2P%` + `eFG%` + 
##     FTA + `FT%` + AST
## 
##          Df Sum of Sq    RSS    AIC
## - `2P`    1      0.33 7276.2 1178.2
## - `2P%`   1      0.67 7276.5 1178.2
## - Age     1      2.90 7278.8 1178.4
## - PER     1      2.91 7278.8 1178.4
## - `2PA`   1      4.84 7280.7 1178.5
## - FTr     1      5.66 7281.5 1178.5
## - `eFG%`  1      7.87 7283.7 1178.6
## - `FT%`   1     13.46 7289.3 1178.9
## - `TS%`   1     24.39 7300.3 1179.4
## - `FG%`   1     30.52 7306.4 1179.7
## <none>                7275.9 1180.2
## - FGA     1     59.22 7335.1 1181.1
## + FT      1      0.10 7275.8 1182.2
## + TRB     1      0.01 7275.9 1182.2
## - FTA     1     96.73 7372.6 1183.0
## - FG      1    102.54 7378.4 1183.2
## - AST     1    103.01 7378.9 1183.2
## - Pos     4    295.84 7571.7 1186.5
## - MP      1    266.28 7542.2 1191.1
## - Tm     30   1923.34 9199.2 1204.2
## - G       1    679.27 7955.2 1210.2
## 
## Step:  AIC=1178.24
## salary_2017_in_millions ~ Pos + Age + Tm + G + MP + PER + `TS%` + 
##     FTr + FG + FGA + `FG%` + `2PA` + `2P%` + `eFG%` + FTA + `FT%` + 
##     AST
## 
##          Df Sum of Sq    RSS    AIC
## - `2P%`   1      0.36 7276.6 1176.2
## - PER     1      2.63 7278.8 1176.4
## - Age     1      2.94 7279.2 1176.4
## - FTr     1      5.74 7281.9 1176.5
## - `eFG%`  1      8.10 7284.3 1176.6
## - `FT%`   1     14.27 7290.5 1176.9
## - `TS%`   1     27.36 7303.6 1177.6
## - `FG%`   1     30.61 7306.8 1177.7
## <none>                7276.2 1178.2
## + `2P`    1      0.33 7275.9 1180.2
## + FT      1      0.12 7276.1 1180.2
## + TRB     1      0.00 7276.2 1180.2
## - FTA     1     96.82 7373.0 1181.0
## - AST     1    104.95 7381.2 1181.4
## - `2PA`   1    126.34 7402.5 1182.4
## - Pos     4    298.37 7574.6 1184.6
## - MP      1    272.80 7549.0 1189.4
## - FGA     1    273.78 7550.0 1189.5
## - FG      1    445.12 7721.3 1197.5
## - Tm     30   1926.34 9202.6 1202.3
## - G       1    684.57 7960.8 1208.4
## 
## Step:  AIC=1176.25
## salary_2017_in_millions ~ Pos + Age + Tm + G + MP + PER + `TS%` + 
##     FTr + FG + FGA + `FG%` + `2PA` + `eFG%` + FTA + `FT%` + AST
## 
##          Df Sum of Sq    RSS    AIC
## - PER     1      2.55 7279.1 1174.4
## - Age     1      2.92 7279.5 1174.4
## - FTr     1      6.20 7282.8 1174.6
## - `eFG%`  1      8.33 7284.9 1174.7
## - `FT%`   1     14.39 7291.0 1175.0
## - `TS%`   1     27.73 7304.3 1175.6
## <none>                7276.6 1176.2
## - `FG%`   1     41.09 7317.7 1176.3
## + `2P%`   1      0.36 7276.2 1178.2
## + FT      1      0.14 7276.4 1178.2
## + `2P`    1      0.02 7276.5 1178.2
## + TRB     1      0.01 7276.6 1178.2
## - FTA     1     98.94 7375.5 1179.1
## - AST     1    105.19 7381.8 1179.4
## - `2PA`   1    137.48 7414.1 1181.0
## - Pos     4    299.19 7575.8 1182.7
## - MP      1    272.45 7549.0 1187.4
## - FGA     1    274.30 7550.9 1187.5
## - FG      1    452.13 7728.7 1195.8
## - Tm     30   1926.17 9202.7 1200.3
## - G       1    686.83 7963.4 1206.5
## 
## Step:  AIC=1174.38
## salary_2017_in_millions ~ Pos + Age + Tm + G + MP + `TS%` + FTr + 
##     FG + FGA + `FG%` + `2PA` + `eFG%` + FTA + `FT%` + AST
## 
##          Df Sum of Sq    RSS    AIC
## - Age     1      3.64 7282.8 1172.6
## - FTr     1      6.37 7285.5 1172.7
## - `eFG%`  1     11.04 7290.2 1172.9
## - `FT%`   1     16.44 7295.6 1173.2
## - `TS%`   1     39.51 7318.6 1174.3
## - `FG%`   1     39.72 7318.8 1174.3
## <none>                7279.1 1174.4
## + PER     1      2.55 7276.6 1176.2
## + TRB     1      0.47 7278.6 1176.4
## + FT      1      0.33 7278.8 1176.4
## + `2P%`   1      0.28 7278.8 1176.4
## + `2P`    1      0.01 7279.1 1176.4
## - FTA     1    106.36 7385.5 1177.6
## - AST     1    130.20 7409.3 1178.7
## - `2PA`   1    142.42 7421.5 1179.3
## - Pos     4    349.76 7628.9 1183.2
## - FGA     1    288.78 7567.9 1186.3
## - MP      1    308.98 7588.1 1187.3
## - FG      1    549.45 7828.6 1198.4
## - Tm     30   1972.90 9252.0 1200.2
## - G       1    729.05 8008.2 1206.5
## 
## Step:  AIC=1172.56
## salary_2017_in_millions ~ Pos + Tm + G + MP + `TS%` + FTr + FG + 
##     FGA + `FG%` + `2PA` + `eFG%` + FTA + `FT%` + AST
## 
##          Df Sum of Sq    RSS    AIC
## - FTr     1      5.78 7288.5 1170.8
## - `eFG%`  1     10.92 7293.7 1171.1
## - `FT%`   1     16.81 7299.6 1171.4
## - `FG%`   1     37.82 7320.6 1172.4
## - `TS%`   1     38.15 7320.9 1172.4
## <none>                7282.8 1172.6
## + Age     1      3.64 7279.1 1174.4
## + PER     1      3.27 7279.5 1174.4
## + TRB     1      0.70 7282.1 1174.5
## + FT      1      0.35 7282.4 1174.5
## + `2P%`   1      0.25 7282.5 1174.5
## + `2P`    1      0.01 7282.8 1174.6
## - FTA     1    105.45 7388.2 1175.7
## - AST     1    126.59 7409.4 1176.7
## - `2PA`   1    141.94 7424.7 1177.5
## - Pos     4    349.19 7631.9 1181.3
## - FGA     1    285.35 7568.1 1184.3
## - MP      1    305.34 7588.1 1185.3
## - FG      1    547.04 7829.8 1196.5
## - Tm     30   1973.81 9256.6 1198.4
## - G       1    726.99 8009.7 1204.6
## 
## Step:  AIC=1170.84
## salary_2017_in_millions ~ Pos + Tm + G + MP + `TS%` + FG + FGA + 
##     `FG%` + `2PA` + `eFG%` + FTA + `FT%` + AST
## 
##          Df Sum of Sq    RSS    AIC
## - `eFG%`  1      5.58 7294.1 1169.1
## - `FT%`   1     11.70 7300.2 1169.4
## - `TS%`   1     32.77 7321.3 1170.5
## <none>                7288.5 1170.8
## - `FG%`   1     59.36 7347.9 1171.8
## + FTr     1      5.78 7282.8 1172.6
## + PER     1      3.38 7285.2 1172.7
## + Age     1      3.05 7285.5 1172.7
## + `2P%`   1      0.63 7287.9 1172.8
## + TRB     1      0.43 7288.1 1172.8
## + FT      1      0.34 7288.2 1172.8
## + `2P`    1      0.04 7288.5 1172.8
## - AST     1    134.12 7422.7 1175.4
## - `2PA`   1    137.15 7425.7 1175.5
## - FTA     1    159.73 7448.3 1176.6
## - Pos     4    349.85 7638.4 1179.6
## - FGA     1    280.78 7569.3 1182.4
## - MP      1    304.30 7592.8 1183.5
## - FG      1    549.91 7838.5 1194.9
## - Tm     30   1968.97 9257.5 1196.5
## - G       1    728.71 8017.2 1203.0
## 
## Step:  AIC=1169.12
## salary_2017_in_millions ~ Pos + Tm + G + MP + `TS%` + FG + FGA + 
##     `FG%` + `2PA` + FTA + `FT%` + AST
## 
##          Df Sum of Sq    RSS    AIC
## - `FT%`   1      6.80 7300.9 1167.5
## - `TS%`   1     37.91 7332.0 1169.0
## <none>                7294.1 1169.1
## + PER     1      5.66 7288.5 1170.8
## + `eFG%`  1      5.58 7288.5 1170.8
## + Age     1      3.32 7290.8 1171.0
## + FT      1      3.25 7290.9 1171.0
## + `2P%`   1      0.54 7293.6 1171.1
## + FTr     1      0.44 7293.7 1171.1
## + `2P`    1      0.03 7294.1 1171.1
## + TRB     1      0.02 7294.1 1171.1
## - `FG%`   1     82.08 7376.2 1171.1
## - `2PA`   1    138.63 7432.8 1173.9
## - AST     1    140.84 7435.0 1174.0
## - FTA     1    255.29 7549.4 1179.4
## - Pos     4    384.57 7678.7 1179.5
## - FGA     1    276.01 7570.1 1180.4
## - MP      1    300.99 7595.1 1181.6
## - FG      1    575.21 7869.3 1194.3
## - Tm     30   1965.34 9259.5 1194.5
## - G       1    747.27 8041.4 1202.0
## 
## Step:  AIC=1167.45
## salary_2017_in_millions ~ Pos + Tm + G + MP + `TS%` + FG + FGA + 
##     `FG%` + `2PA` + FTA + AST
## 
##          Df Sum of Sq    RSS    AIC
## - `TS%`   1     31.58 7332.5 1167.0
## <none>                7300.9 1167.5
## + `FT%`   1      6.80 7294.1 1169.1
## + PER     1      5.99 7294.9 1169.2
## + Age     1      3.75 7297.2 1169.3
## + `eFG%`  1      0.68 7300.2 1169.4
## + `2P%`   1      0.44 7300.5 1169.4
## + TRB     1      0.22 7300.7 1169.4
## + FTr     1      0.14 7300.8 1169.4
## + FT      1      0.11 7300.8 1169.4
## + `2P`    1      0.01 7300.9 1169.5
## - `FG%`   1     84.12 7385.0 1169.5
## - AST     1    144.77 7445.7 1172.5
## - `2PA`   1    164.43 7465.4 1173.4
## - Pos     4    394.61 7695.5 1178.3
## - FGA     1    270.74 7571.7 1178.5
## - FTA     1    291.07 7592.0 1179.5
## - MP      1    312.22 7613.1 1180.4
## - FG      1    569.55 7870.5 1192.3
## - Tm     30   1958.56 9259.5 1192.5
## - G       1    745.50 8046.4 1200.3
## 
## Step:  AIC=1167
## salary_2017_in_millions ~ Pos + Tm + G + MP + FG + FGA + `FG%` + 
##     `2PA` + FTA + AST
## 
##          Df Sum of Sq    RSS    AIC
## <none>                7332.5 1167.0
## + `TS%`   1     31.58 7300.9 1167.5
## - `FG%`   1     58.81 7391.3 1167.9
## + PER     1     18.67 7313.8 1168.1
## + `eFG%`  1     10.97 7321.5 1168.5
## + FT      1      8.48 7324.0 1168.6
## + `2P`    1      1.91 7330.6 1168.9
## + Age     1      1.27 7331.2 1168.9
## + `FT%`   1      0.47 7332.0 1169.0
## + `2P%`   1      0.46 7332.0 1169.0
## + FTr     1      0.31 7332.2 1169.0
## + TRB     1      0.15 7332.4 1169.0
## - AST     1    150.67 7483.2 1172.3
## - Pos     4    368.31 7700.8 1176.5
## - FGA     1    249.52 7582.0 1177.0
## - MP      1    290.44 7622.9 1178.9
## - FTA     1    390.96 7723.5 1183.6
## - `2PA`   1    466.52 7799.0 1187.1
## - Tm     30   1946.23 9278.7 1191.3
## - FG      1    617.95 7950.5 1194.0
## - G       1    721.32 8053.8 1198.6
## Stepwise Model Path 
## Analysis of Deviance Table
## 
## Initial Model:
## salary_2017_in_millions ~ Pos + Age + Tm + G + MP + PER + `TS%` + 
##     FTr + FG + FGA + `FG%` + `2P` + `2PA` + `2P%` + `eFG%` + 
##     FT + FTA + `FT%` + TRB + AST
## 
## Final Model:
## salary_2017_in_millions ~ Pos + Tm + G + MP + FG + FGA + `FG%` + 
##     `2PA` + FTA + AST
## 
## 
##        Step Df     Deviance Resid. Df Resid. Dev      AIC
## 1                                 305   7275.772 1184.214
## 2     - TRB  1  0.000207136       306   7275.772 1182.214
## 3      - FT  1  0.104079693       307   7275.876 1180.220
## 4    - `2P`  1  0.333172397       308   7276.210 1178.236
## 5   - `2P%`  1  0.358263313       309   7276.568 1176.254
## 6     - PER  1  2.551185046       310   7279.119 1174.379
## 7     - Age  1  3.639134778       311   7282.758 1172.558
## 8     - FTr  1  5.781650500       312   7288.540 1170.842
## 9  - `eFG%`  1  5.583338028       313   7294.123 1169.116
## 10  - `FT%`  1  6.799831890       314   7300.923 1167.450
## 11  - `TS%`  1 31.581893335       315   7332.505 1166.995
## 
## Call:
## lm(formula = salary_2017_in_millions ~ Pos + Tm + G + MP + FG + 
##     FGA + `FG%` + `2PA` + FTA + AST, data = nba_data_for_model)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.8021  -2.9458  -0.3667   2.9623  15.7687 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  18.659795   4.093191   4.559 7.37e-06 ***
## PosPF        -0.883689   0.925798  -0.955 0.340554    
## PosPG        -4.181362   1.217512  -3.434 0.000673 ***
## PosSF        -2.533525   1.035919  -2.446 0.015005 *  
## PosSG        -3.220868   1.085514  -2.967 0.003236 ** 
## TmBOS        -6.404589   2.174593  -2.945 0.003468 ** 
## TmBRK        -6.402664   2.262544  -2.830 0.004956 ** 
## TmCHI        -3.804716   2.069279  -1.839 0.066905 .  
## TmCHO        -3.850031   2.133619  -1.804 0.072114 .  
## TmCLE         0.057248   2.225874   0.026 0.979498    
## TmDAL        -6.424537   2.071344  -3.102 0.002099 ** 
## TmDEN        -8.148359   2.148543  -3.793 0.000179 ***
## TmDET        -5.616280   2.267940  -2.476 0.013797 *  
## TmGSW        -5.030329   2.090181  -2.407 0.016675 *  
## TmHOU        -7.425419   2.198156  -3.378 0.000822 ***
## TmIND        -4.432995   2.245703  -1.974 0.049256 *  
## TmLAC        -0.505987   2.237237  -0.226 0.821219    
## TmLAL        -7.370429   2.440239  -3.020 0.002731 ** 
## TmMEM        -3.170121   2.242578  -1.414 0.158465    
## TmMIA        -3.572091   2.156996  -1.656 0.098708 .  
## TmMIL        -3.985195   2.144569  -1.858 0.064063 .  
## TmMIN        -9.062082   2.254292  -4.020 7.29e-05 ***
## TmNOP        -2.501026   2.255038  -1.109 0.268240    
## TmNYK        -5.799560   2.194682  -2.643 0.008640 ** 
## TmOKC        -0.906151   2.172580  -0.417 0.676900    
## TmORL        -5.363804   2.126292  -2.523 0.012141 *  
## TmPHI        -9.211694   2.311452  -3.985 8.38e-05 ***
## TmPHO        -5.362271   2.144749  -2.500 0.012921 *  
## TmPOR        -2.686588   2.119968  -1.267 0.205992    
## TmSAC        -8.264730   2.088801  -3.957 9.39e-05 ***
## TmSAS        -4.000362   2.127648  -1.880 0.061006 .  
## TmTOR        -1.346284   2.110044  -0.638 0.523914    
## TmTOT        -6.427731   1.721244  -3.734 0.000223 ***
## TmUTA        -3.606637   2.089225  -1.726 0.085273 .  
## TmWAS        -3.919862   2.173399  -1.804 0.072255 .  
## G            -0.156140   0.028049  -5.567 5.56e-08 ***
## MP            0.004016   0.001137   3.532 0.000474 ***
## FG            0.089992   0.017466   5.152 4.55e-07 ***
## FGA          -0.024965   0.007625  -3.274 0.001178 ** 
## `FG%`       -11.915701   7.496500  -1.590 0.112950    
## `2PA`        -0.015743   0.003517  -4.477 1.06e-05 ***
## FTA           0.015450   0.003770   4.098 5.30e-05 ***
## AST           0.009082   0.003570   2.544 0.011433 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.825 on 315 degrees of freedom
## Multiple R-squared:  0.6647, Adjusted R-squared:   0.62 
## F-statistic: 14.87 on 42 and 315 DF,  p-value: < 2.2e-16
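
As a quick sanity check on the stepwise output, the AIC values in the path table can be reproduced by hand. For a linear model, `step()` scores each candidate with `extractAIC()`, which computes `n * log(RSS / n) + 2 * edf`. The sketch below plugs in numbers read from the output above. The observation count of 358 is inferred from the residual degrees of freedom plus the number of estimated coefficients, not taken from the data itself.

```r
# Values read from the stepwise output above (n is inferred, see lead-in)
n <- 315 + 43               # residual df + estimated coefficients

# Final model: 43 coefficients (intercept + 42 rows), residual deviance 7332.505
aic_final <- n * log(7332.505 / n) + 2 * 43

# Initial model: 305 residual df, so 53 coefficients, residual deviance 7275.772
aic_initial <- n * log(7275.772 / n) + 2 * 53

round(c(initial = aic_initial, final = aic_final), 3)
```

The two results should land very close to the 1184.214 and 1166.995 shown in the path table. The final model has a slightly higher residual sum of squares but a lower AIC, because dropping ten terms saves more in the `2 * edf` penalty than it costs in fit.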

Now that we have a better model, let’s do some analysis of the residuals:
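
For readers who want to reproduce this kind of four-panel view, here is a minimal sketch using base R. It fits a stand-in model on the built-in `mtcars` data, since the salary data is not bundled with this write-up; `fit` and its formula are placeholders for the stepwise model above. The column-wise layout (`mfcol`) is an assumption chosen to match the panel positions described in the discussion that follows.

```r
# Stand-in model on a built-in dataset; swap in the fitted salary model
fit <- lm(mpg ~ wt + hp + factor(cyl), data = mtcars)

# mfcol fills the 2 x 2 grid column by column:
#   top-left: Residuals vs Fitted    top-right: Scale-Location
#   bottom-left: Normal Q-Q          bottom-right: Residuals vs Leverage
old_par <- par(mfcol = c(2, 2))
plot(fit)        # draws the four default diagnostic plots for an lm object
par(old_par)     # restore the previous graphics settings
```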

Let’s start with the Residuals vs Fitted plot on the top-left. This plot shows whether our residuals have any non-linear patterns. The red line is not perfectly straight; it appears to follow a very gentle sigmoid curve. As the curvature is faint, I think it’s safe to say that it is appropriate to treat the relationships between our predictor variables and our outcome variable as linear.

Moving to the Normal Q-Q plot on the bottom-left, this plot shows whether our residuals are normally distributed. For the most part, our residuals follow the diagonal line. We do see some deviation in the top right of the chart, although it is not severe.
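
The visual read of the Q-Q plot can be paired with a formal test. The sketch below runs a Shapiro-Wilk test on the residuals of a stand-in model fitted to the built-in `mtcars` data; the model is a placeholder, and this test was not part of the original analysis.

```r
# Stand-in model; replace with the fitted salary model
fit <- lm(mpg ~ wt + hp + factor(cyl), data = mtcars)

# Shapiro-Wilk test: the null hypothesis is that the residuals are normal
sw <- shapiro.test(residuals(fit))
sw$statistic   # W, values near 1 are consistent with normality
sw$p.value     # a small p-value is evidence against normality
```

With several hundred observations, even mild tail deviations can produce a small p-value, so the result is best read alongside the plot rather than in place of it.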

Turning our attention now to the Scale-Location plot on the top-right, this plot helps us check the assumption of equal variance (homoscedasticity). If the residuals had equal variance, we would see a horizontal line with equally spread points. However, in our plot the points with fitted values from 0-5 have a smaller variance than the rest of the plot. This unequal spread is referred to as heteroscedasticity. The difference in variation is not large, but it is definitely there. This plot is probably right on the edge of what we would deem acceptable, but as the heteroscedasticity is fairly faint, I’ll consider this assumption as met.
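
A borderline call like this one can be checked numerically with a Breusch-Pagan style test. The sketch below implements the studentized (Koenker) version by hand in base R: regress the squared residuals on the predictors and compare `n * R^2` to a chi-squared distribution. It uses a stand-in model on `mtcars`; the variable names are placeholders and this test was not part of the original analysis.

```r
# Stand-in model; replace with the fitted salary model and its predictors
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Auxiliary regression: squared residuals on the same predictors
aux_data <- transform(mtcars, sq_resid = residuals(fit)^2)
aux <- lm(sq_resid ~ wt + hp, data = aux_data)

# Test statistic n * R^2, chi-squared with one df per predictor
n <- nobs(fit)
lm_stat <- n * summary(aux)$r.squared
df <- length(coef(aux)) - 1
p_value <- pchisq(lm_stat, df = df, lower.tail = FALSE)

c(statistic = lm_stat, df = df, p_value = p_value)
```

A small p-value would suggest that the residual variance changes with the predictors, which would argue for a transformation of the response.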

Lastly, let’s look at the Residuals vs Leverage plot on the bottom-right. This plot helps us determine whether we have influential outliers that are pulling the regression line in one direction or another. Here we aren’t looking for patterns; we are looking for outlying values in the upper-right or lower-right corner, where cases can be heavily influential on the least-squares line. Specifically, we are looking for points that fall outside the red dashed lines, which mark Cook’s distance. Points outside those lines would be influential to the regression results. In our plot, we can’t even see the Cook’s distance lines because all the cases are well inside them, so we can have some comfort that our outliers, if any, are not having a large effect on the model.

All in all, we’ve built a semi-proficient multi-factor linear regression model that could definitely be directional in its predictions. Our next step in the coming week will be to see if we can improve our model with some data transformations (log transformation, including a quadratic term, etc.).
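
The Cook’s distance reading from the Residuals vs Leverage plot can also be confirmed numerically. The sketch below computes Cook’s distance for each observation of a stand-in model on `mtcars` and compares it to 0.5 and 1, the contour levels that `plot()` draws by default for an `lm` object. The model is a placeholder for the salary model.

```r
# Stand-in model; replace with the fitted salary model
fit <- lm(mpg ~ wt + hp, data = mtcars)

# One Cook's distance value per observation
cd <- cooks.distance(fit)

# The three most influential cases
head(sort(cd, decreasing = TRUE), 3)

# Does any case cross the default contour levels?
c(above_0.5 = any(cd > 0.5), above_1 = any(cd > 1))
```

If both checks return `FALSE` for the salary model, that agrees with the contour lines falling outside the plotted region.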