Introduction

Statistics have changed how sports are played. Baseball is by far the most established sport in terms of advanced metrics, with more and more measures used to calculate how efficient a player and a team can be. Basketball, on the other hand, is only now emerging with metrics that alter how the game is played. This matters because basketball is constantly changing; the game played today is already much different from how it was played in the 1990s. In today’s era, it is all about how well a team can shoot the ball, create open shots, and rely on the three-pointer.

I am confident that in today’s era of basketball, advanced metrics are driving teams’ successes, and that one statistic, True Shooting Percentage (TS%), should now be used to value a player and their team over regular statistics. TS% is a measure of shooting efficiency that takes into account field goals, 3-point field goals, and free throws. Basketball-Reference.com, where I collected the TS% data, defines True Shooting Attempts (TSA) as FGA + 0.44*FTA, where FGA is field goal attempts and FTA is free throw attempts. TS% is then PTS / (2*TSA), where PTS is points scored.
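As a quick illustration of the definition, here is a minimal sketch of the TS% calculation in R. The function name `true_shooting` and the stat line are made up for illustration; they are not taken from the project data.

```r
# True Shooting Percentage: points scored per two "true shooting attempts".
# TSA = FGA + 0.44 * FTA, and TS% = PTS / (2 * TSA).
true_shooting <- function(pts, fga, fta) {
  tsa <- fga + 0.44 * fta
  pts / (2 * tsa)
}

# Hypothetical stat line (not a real player):
# 1,500 points on 1,100 field goal attempts and 400 free throw attempts.
ts_example <- true_shooting(pts = 1500, fga = 1100, fta = 400)
print(round(ts_example, 3))  # 0.588
```

A player who scores only on made two-pointers and never shoots a free throw would have a TS% equal to his field goal percentage, which is why TS% is read on the same scale.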

Advanced statistics have been quite controversial in sports. Should we rely on numbers to measure talent, or should we look more at a player’s athleticism and passion to win? Such arguments have gotten scouts and sports analysts fired when a general manager wanted to take a more analytical route. One team that has taken a more analytical approach to basketball is the Golden State Warriors. They were the 2014-15 NBA champions, and in 2015-16 they set the record for the most wins in a single season with a 73-9 record.

What’s notable about the Warriors is that they rely on shooting and creating open shots, something TS% measures better than other statistics because it accounts for two-pointers, three-pointers, and free throws together. Golden State led the league with a .593 TS%. The second highest? The Oklahoma City Thunder, at .565, a large gap. This year, among players who have taken more than 500 field goal attempts, the third best player in terms of TS% is Stephen Curry, the leader of the Warriors. Not far behind him is Klay Thompson, who averages 22 points per game for Golden State.

If I can show that TS% is highly significant in estimating the number of wins a team gets and how good a player can be, then it should start replacing regular statistics in basketball and become the main tool for NBA scouts and analysts looking for the best types of players. I will first test whether it is significant in estimating team wins, given other variables, and then examine TS% on specific teams. If both hold up, TS% has the potential to become the way to monitor the future of basketball.

Methods

This report analyzes significant advanced and basic statistics in basketball and uses them to estimate the number of wins per team in a season. This is done two ways: first with a linear model, and second with a tree-based model. I used a tree-based model to simplify the data and evaluate wins using only a few variables. Furthermore, I evaluate TS% on its own to see how well it values a player’s contribution to his team.

I used Basketball-Reference.com (advanced statistics) and nba.com (basic statistics) to collect my statistics. The number of wins was entered as a separate vector; these totals are widely available online. Please see the appendix at the end of this report for more information about each statistic used in the models.

To analyze the data, I first fit a linear model to the team data, then used it to predict the number of wins each team would get. I then fit a tree-based model to the same team statistics to see which variables it selects. Lastly, I took statistics from individual players (n>300) and used each player’s TS% to estimate the number of win shares he contributed to his team. Win shares estimate how many wins a player contributes to his team. I summed the players’ predictions by team and used the totals to estimate each team’s wins.
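The player-level step can be sketched as follows. This is only an illustration under stated assumptions: it uses eight rows copied from the TS_FRAME listing later in this report rather than the full player set, and the object names (`players`, `ws_fit`, `team_wins_hat`) are hypothetical, so the fitted numbers are not the project’s results.

```r
# Toy data: eight rows copied from the TS_FRAME listing in this report.
players <- data.frame(
  Player = c("Kyle Korver", "Paul Millsap", "Al Horford", "Jeff Teague",
             "Avery Bradley", "Evan Turner", "Brandon Bass", "Tyler Zeller"),
  Team = c("ATL", "ATL", "ATL", "ATL", "BOS", "BOS", "BOS", "BOS"),
  TS = c(0.699, 0.565, 0.563, 0.566, 0.507, 0.482, 0.557, 0.594),
  Win_Shares = c(7.5, 8.3, 8.7, 7.7, 2.5, 2.5, 5.3, 6.5)
)

# Step 1: regress win shares on TS%.
ws_fit <- lm(Win_Shares ~ TS, data = players)

# Step 2: predicted win shares for each player.
players$WS_hat <- predict(ws_fit, newdata = players)

# Step 3: sum the predictions within each team to estimate team wins.
team_wins_hat <- aggregate(WS_hat ~ Team, data = players, FUN = sum)
print(team_wins_hat)
```

With only a handful of players per team the totals fall far short of real win counts; the full analysis sums over every qualifying player on each roster.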

For graphics, I used the plot_ly command from the plotly package to create interactive graphs that are a little more comfortable to read. I also used the tree package, as well as the ISLR package.

setwd("C:/Users/Evan Boyd/Desktop/College/Spring 2016/Stat 479/Individual Project/Files/CSV")
basic = read.csv("basic.csv", header = T)
advanced = read.csv("advanced.csv", header = T)
advanced = advanced[,-22]

wins = c(60,40,38,33,50,53,50,30,32,67,56,38,56,21,55,37,41,16,45,17,45,25,18,39,51,29,55,49,38,46)

basic$wins = wins
advanced$wins = wins

Below are the first few lines of the “basic” NBA statistics table. Please see the references for a description of each statistic.

head(basic)
##        TEAM   OWN   OPP DIFF FG.OWN FG.OPP X3P.OWN X3P.OPP   FT. OFF.REB.
## 1   Atlanta 102.5  97.1  5.4  0.466  0.439   0.380   0.341 0.778    0.214
## 2    Boston 101.4 101.2  0.2  0.443  0.450   0.327   0.336 0.754    0.247
## 3  Brooklyn  98.0 100.9 -2.9  0.451  0.456   0.331   0.358 0.748    0.239
## 4 Charlotte  94.2  97.3 -3.1  0.420  0.440   0.318   0.357 0.748    0.221
## 5   Chicago 100.8  97.8  3.0  0.442  0.435   0.353   0.335 0.783    0.270
## 6 Cleveland 103.1  98.7  4.4  0.458  0.456   0.367   0.343 0.751    0.268
##   DEF.REB. TOT.REB.. OWN.TO OPP.TO wins
## 1    0.734     0.482   13.5   15.4   60
## 2    0.750     0.495   13.3   14.6   40
## 3    0.737     0.489   13.4   13.0   38
## 4    0.793     0.500   11.2   12.0   33
## 5    0.744     0.513   13.2   11.7   50
## 6    0.747     0.511   13.6   12.7   53

Next are the first few lines of the “advanced” NBA statistics table. Please see the references for a description of each statistic.

head(advanced)
##                   Team  Age PW PL   MOV   SOS   SRS  ORtg  DRtg Pace   FTr
## 1       Atlanta Hawks* 27.8 56 26  5.43 -0.68  4.75 108.9 103.1 93.9 0.259
## 2      Boston Celtics* 25.0 41 41  0.16 -0.56 -0.40 104.7 104.5 95.8 0.233
## 3       Brooklyn Nets* 28.6 33 49 -2.88 -0.25 -3.13 104.4 107.4 92.7 0.267
## 4    Charlotte Hornets 26.0 32 50 -3.17 -0.27 -3.44 100.1 103.5 93.0 0.269
## 5       Chicago Bulls* 28.8 50 32  3.00 -0.46  2.54 107.5 104.3 92.8 0.304
## 6 Cleveland Cavaliers* 26.9 53 29  4.48 -0.40  4.08 111.1 106.3 92.3 0.287
##   X3PAr   TS.  eFG. TOV. ORB. FT.FGA eFG..1 TOV..1 DRB. FT.FGA.1
## 1 0.321 0.563 0.527 13.5 21.4  0.201  0.492   14.9 73.4    0.185
## 2 0.280 0.523 0.489 12.5 24.7  0.176  0.494   13.7 75.0    0.208
## 3 0.240 0.529 0.491 13.0 23.9  0.200  0.506   12.9 73.7    0.185
## 4 0.226 0.498 0.456 11.2 22.1  0.202  0.487   12.0 79.3    0.188
## 5 0.269 0.536 0.489 12.9 27.0  0.238  0.473   11.3 74.4    0.182
## 6 0.334 0.557 0.520 13.4 26.8  0.216  0.502   12.6 74.7    0.177
##   Attendance wins
## 1     713909   60
## 2     721350   40
## 3     698529   38
## 4     704886   33
## 5     886612   50
## 6     843042   53
# Row 10 of `basic` is Golden State; flag it by position rather than by matching OPP values
is_gsw = seq_len(nrow(basic)) == 10
plot(basic$OPP, wins, main = "Wins by Opponent Points Per Game", xlab = "Opponent PPG", ylab = "Wins",
     col = ifelse(is_gsw, "red", "blue"), pch = ifelse(is_gsw, 19, 1))
# The highlighted team is drawn as a filled point, so the legend uses pch rather than lty
legend(101.0909, 65.20229, legend = "Golden State", pch = 19, col = "red")

Notice that, though they had the most wins in the 2014-15 season, Golden State did not hold their opponents to the fewest points on average. The correlation between wins and opponent points per game is -.51, a moderate negative correlation between the two variables.
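For reference, a correlation like this one is computed with `cor()`. The sketch below is self-contained, so it uses only the six teams shown in the head(basic) output above; the value it prints therefore differs from the -.51 obtained on all 30 teams. The name `sample_basic` is hypothetical.

```r
# Six rows copied from the head(basic) output (opponent PPG and wins).
sample_basic <- data.frame(
  TEAM = c("Atlanta", "Boston", "Brooklyn", "Charlotte", "Chicago", "Cleveland"),
  OPP = c(97.1, 101.2, 100.9, 97.3, 97.8, 98.7),
  wins = c(60, 40, 38, 33, 50, 53)
)

# Pearson correlation between opponent points per game and wins.
r <- cor(sample_basic$OPP, sample_basic$wins)

# cor.test() reports the same coefficient with a p-value and confidence interval.
ct <- cor.test(sample_basic$OPP, sample_basic$wins)
print(r)
print(ct$p.value)
```

On the full data the equivalent call would be `cor(basic$OPP, basic$wins)`.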

Linear-based model

out.basic = lm(wins~.-TEAM-DIFF+X3P.OWN:FG.OWN+FG.OPP:X3P.OPP+TOT.REB..:OWN.TO,data=basic)
out.basic.AIC = step(out.basic, trace = FALSE)

out.basic2 = update(out.basic.AIC, .~. -OFF.REB.)

out.basic3 = update(out.basic2, .~. -FG.OPP)

out.basic4 = update(out.basic3, .~. -OPP.TO)

out.basic5 = update(out.basic4, .~. -X3P.OWN)
summary(out.basic5)
## 
## Call:
## lm(formula = wins ~ OWN + OPP + FG.OWN + X3P.OPP + TOT.REB.., 
##     data = basic)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.4541 -1.0268 -0.1471  1.4232  4.1071 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  144.6642    37.3273   3.876 0.000721 ***
## OWN            2.2902     0.2089  10.964 7.91e-11 ***
## OPP           -2.7084     0.2074 -13.060 2.13e-12 ***
## FG.OWN        87.5261    49.1573   1.781 0.087654 .  
## X3P.OPP     -132.0821    51.2540  -2.577 0.016541 *  
## TOT.REB..   -109.8973    37.5014  -2.930 0.007316 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.434 on 24 degrees of freedom
## Multiple R-squared:  0.9729, Adjusted R-squared:  0.9673 
## F-statistic: 172.5 on 5 and 24 DF,  p-value: < 2.2e-16
plot(out.basic5)

Our summary finds that, of the basic stats, five variables are significant in predicting wins at the .1 significance level. First, points per game scored and opponent points per game are significant in the model. This makes perfect sense: the more points you score and the more you limit your opponents, the more likely you are to win. The other significant variables are field goal percentage, opponent 3-point field goal percentage, and total rebound percentage. I also tried logical interaction terms, but none of them were significant in the model. The diagnostic plots look reasonable, so the model is appropriate to use.

Now, I’ll look at the significant variables in the advanced data, using the standard p-value approach: starting from a linear model with wins as the dependent variable, I remove the variables that are not significant.

out.adv = lm(wins~.-Team-PL-PW+Pace:ORtg+TS.:X3PAr+eFG.:TS.+eFG.:X3PAr+Pace:FTr+ORtg:DRtg,data = advanced)
out.adv.AIC = step(out.adv, trace = FALSE)
out.adv1= update(out.adv.AIC, .~. -X3PAr:eFG.)
out.adv2= update(out.adv1, .~. -ORtg:DRtg)
out.adv3= update(out.adv2, .~. -ORtg:Pace)
out.adv4= update(out.adv3, .~. -ORtg)
out.adv5= update(out.adv4, .~. -Attendance)
out.adv6= update(out.adv5, .~. -SOS)
out.adv7= update(out.adv6, .~. -SRS)
summary(out.adv7)
## 
## Call:
## lm(formula = wins ~ Age + MOV + DRtg + Pace + FTr + X3PAr + TS. + 
##     eFG. + TOV. + ORB. + FT.FGA + eFG..1 + TOV..1 + DRB. + FT.FGA.1 + 
##     TS.:eFG. + Pace:FTr, data = advanced)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.9961 -1.2007  0.1919  0.7763  2.4821 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  2534.7292   668.0572   3.794  0.00256 **
## Age             0.7875     0.3495   2.253  0.04373 * 
## MOV            -9.7722     3.1751  -3.078  0.00958 **
## DRtg          -17.7768     4.6622  -3.813  0.00247 **
## Pace           -5.7064     2.4711  -2.309  0.03953 * 
## FTr         -3087.2407   995.1980  -3.102  0.00915 **
## X3PAr          48.6919    20.2582   2.404  0.03330 * 
## TS.         -4441.8665  1395.8790  -3.182  0.00789 **
## eFG.         2332.7848  1088.0784   2.144  0.05321 . 
## TOV.          -15.4753     4.0611  -3.811  0.00248 **
## ORB.            6.4102     1.7898   3.581  0.00377 **
## FT.FGA       2262.8943   634.1885   3.568  0.00387 **
## eFG..1        898.4616   364.7875   2.463  0.02988 * 
## TOV..1         -7.9169     3.3131  -2.390  0.03416 * 
## DRB.           -4.1530     1.3748  -3.021  0.01065 * 
## FT.FGA.1      154.9870    77.2950   2.005  0.06804 . 
## TS.:eFG.     3510.7995  1420.3594   2.472  0.02940 * 
## Pace:FTr       20.7749     8.8546   2.346  0.03697 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.985 on 12 degrees of freedom
## Multiple R-squared:  0.991,  Adjusted R-squared:  0.9782 
## F-statistic: 77.66 on 17 and 12 DF,  p-value: 1.152e-09
plot(out.adv7)

In the advanced set, there are many more variables that are significant in fitting the model. We even have two significant interaction terms: True Shooting Percentage with Effective Field Goal Percentage, as well as Pace of Play with Free Throw Attempt Rate. The Adjusted R-Squared is still high at 0.9782.
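The drop-one-term-at-a-time procedure used for these models can also be written as a loop. The following is a minimal sketch, not the exact sequence of `update()` calls used in this report: the helper name `backward_p` is hypothetical, it runs on R’s built-in `mtcars` data so that it is self-contained, and it assumes numeric predictors so that coefficient names match term names.

```r
# Backward elimination by p-value: refit, drop the least significant
# term, and repeat until every remaining term has p < alpha.
backward_p <- function(fit, alpha = 0.05) {
  repeat {
    coefs <- summary(fit)$coefficients
    pvals <- coefs[rownames(coefs) != "(Intercept)", "Pr(>|t|)", drop = FALSE]
    if (nrow(pvals) == 0 || max(pvals) < alpha) break
    worst <- rownames(pvals)[which.max(pvals)]
    fit <- update(fit, as.formula(paste(". ~ . -", worst)))
  }
  fit
}

# Stand-in example on built-in data (assumption: not the NBA data).
start <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
final <- backward_p(start, alpha = 0.05)
print(formula(final))
```

Passing a stricter `alpha`, such as .025, keeps fewer terms. A loop like this does not protect main effects that are part of a remaining interaction, so the result still deserves a manual check.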

Now, I will combine the significant variables from the two data sets and fit a stronger model.

out = lm(wins~advanced$Age+advanced$MOV+advanced$DRtg+advanced$Pace+advanced$FTr+advanced$X3PAr+advanced$TS.
         +advanced$eFG.+advanced$TOV.+advanced$ORB.+advanced$FT.FGA+advanced$eFG..1+advanced$TOV..1+advanced$DRB.
         +basic$OWN+basic$OPP+basic$FG.OWN+basic$X3P.OPP+basic$TOT.REB..+advanced$FT.FGA.1
         +advanced$TS.:advanced$eFG. + advanced$Pace:advanced$FTr
         +basic$OWN:advanced$TS.+basic$FG.OWN:advanced$TS.
         +basic$FG.OWN:advanced$X3PAr+basic$OWN:advanced$FTr+basic$X3P.OPP:advanced$X3PAr
         +basic$TOT.REB..:advanced$ORB.)
out.AIC = step(out, trace = FALSE)

Using backwards elimination at the .025 level, I ended up with this model:

out2 = glm(wins~advanced$Pace+advanced$FTr+advanced$X3PAr+advanced$TS.
         +advanced$eFG.+advanced$ORB.+advanced$FT.FGA+advanced$eFG..1+advanced$TOV..1
         +basic$OWN+basic$FG.OWN+basic$TOT.REB..
         +advanced$TS.:advanced$eFG.
         +basic$FG.OWN:advanced$TS.
         +basic$FG.OWN:advanced$X3PAr
         +basic$TOT.REB..:advanced$ORB.)
summary(out2)
## 
## Call:
## glm(formula = wins ~ advanced$Pace + advanced$FTr + advanced$X3PAr + 
##     advanced$TS. + advanced$eFG. + advanced$ORB. + advanced$FT.FGA + 
##     advanced$eFG..1 + advanced$TOV..1 + basic$OWN + basic$FG.OWN + 
##     basic$TOT.REB.. + advanced$TS.:advanced$eFG. + basic$FG.OWN:advanced$TS. + 
##     basic$FG.OWN:advanced$X3PAr + basic$TOT.REB..:advanced$ORB.)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.74936  -0.50681   0.03013   0.47220   1.52704  
## 
## Coefficients:
##                                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                    6.603e+02  2.559e+02   2.580  0.02287 *  
## advanced$Pace                 -3.929e+00  2.634e-01 -14.915 1.48e-09 ***
## advanced$FTr                  -4.700e+02  1.743e+02  -2.697  0.01831 *  
## advanced$X3PAr                 3.727e+03  3.693e+02  10.092 1.62e-07 ***
## advanced$TS.                  -3.143e+03  9.489e+02  -3.312  0.00561 ** 
## advanced$eFG.                 -1.591e+04  2.420e+03  -6.573 1.79e-05 ***
## advanced$ORB.                 -2.199e+01  5.270e+00  -4.173  0.00109 ** 
## advanced$FT.FGA                9.350e+02  3.593e+02   2.602  0.02191 *  
## advanced$eFG..1               -4.176e+02  3.366e+01 -12.406 1.40e-08 ***
## advanced$TOV..1                1.726e+00  4.994e-01   3.457  0.00425 ** 
## basic$OWN                      2.988e+00  2.310e-01  12.937 8.44e-09 ***
## basic$FG.OWN                   2.001e+04  2.290e+03   8.738 8.40e-07 ***
## basic$TOT.REB..               -1.071e+03  2.784e+02  -3.848  0.00201 ** 
## advanced$TS.:advanced$eFG.     3.166e+04  3.618e+03   8.751 8.26e-07 ***
## advanced$TS.:basic$FG.OWN     -3.211e+04  3.819e+03  -8.409 1.29e-06 ***
## advanced$X3PAr:basic$FG.OWN   -8.081e+03  8.131e+02  -9.939 1.93e-07 ***
## advanced$ORB.:basic$TOT.REB..  4.193e+01  1.040e+01   4.032  0.00142 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 1.476056)
## 
##     Null deviance: 5250.000  on 29  degrees of freedom
## Residual deviance:   19.189  on 13  degrees of freedom
## AIC: 107.73
## 
## Number of Fisher Scoring iterations: 2
plot(out2)

lm.predict = predict(out2)

Advanced statistics appear to be more valuable than basic statistics in this model, but there are some important interaction terms between the two datasets. The variables with strong significance include points per game, pace of play, and TS%, the last particularly through its interactions with eFG% and field goal percentage. We will come back to this model later when looking at specific players and the 2015-16 season numbers. Once again, the diagnostic plots look appropriate.

Tree Based Model

Tree-based modeling is a simple way to see which variables matter most in explaining an observed variable; I use it here for interpretation rather than prediction. A regression tree estimates the number of wins based on where a team’s statistics fall relative to a series of cut points. Here I fit a basic tree to help examine wins:

library(tree)
library(ISLR)
tree.basic=tree(wins~Pace+FTr+X3PAr+TS.
                +eFG.+ORB.+FT.FGA+eFG..1+TOV..1
                +basic$OWN+basic$FG.OWN+basic$TOT.REB..
                ,advanced
                )
summary(tree.basic)
## 
## Regression tree:
## tree(formula = wins ~ Pace + FTr + X3PAr + TS. + eFG. + ORB. + 
##     FT.FGA + eFG..1 + TOV..1 + basic$OWN + basic$FG.OWN + basic$TOT.REB.., 
##     data = advanced)
## Variables actually used in tree construction:
## [1] "TS."    "eFG."   "eFG..1"
## Number of terminal nodes:  5 
## Residual mean deviance:  36.46 = 911.6 / 25 
## Distribution of residuals:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -8.6000 -4.1670 -0.4111  0.0000  3.4750 12.0000
plot(tree.basic, main = "Tree-Based Model: After Significant Model")
text(tree.basic,pretty=0)

cv.advanced=cv.tree(tree.basic)
plot(cv.advanced$size,cv.advanced$dev,type='b',xlab="Size",ylab="Deviance",main="Tree-Based Plot")

yhat=predict(tree.basic)
out = lm(wins~yhat)
cor(yhat,wins)
## [1] 0.9090457

Our tree indicates that TS%, effective field goal percentage (eFG%), and opponent effective field goal percentage produce a tree whose fitted values have a correlation of .91 with wins. Effective field goal percentage adjusts for the fact that a 3-pointer is worth more than a two-pointer. Hence, it would make sense for a team like the Golden State Warriors to have such a high percentage, and thus it makes sense for these variables to be correlated with wins.
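To make the adjustment concrete, eFG% is computed as (FGM + 0.5 * 3PM) / FGA, where FGM is field goals made and 3PM is three-pointers made. The sketch below uses a hypothetical helper `effective_fg` and made-up shot totals.

```r
# Effective field goal percentage: each made three-pointer counts as
# 1.5 made field goals, because it is worth 1.5 times as many points.
effective_fg <- function(fgm, fg3m, fga) {
  (fgm + 0.5 * fg3m) / fga
}

# Hypothetical shooters who each make 40 of 100 attempts.
inside  <- effective_fg(fgm = 40, fg3m = 0,  fga = 100)  # all two-pointers
outside <- effective_fg(fgm = 40, fg3m = 40, fga = 100)  # all three-pointers
print(c(inside = inside, outside = outside))  # 0.4 and 0.6
```

Both shooters have the same raw field goal percentage, but the second produces 50% more points per shot, which eFG% reflects.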

When TS% is less than .5265 (below the 2015 average of .5341), a team gets about 21 wins with an eFG% less than .476 and about 33 wins with an eFG% above .476. When TS% is greater than .5265 but less than .543, a team gets about 38 wins (three short of a .500 record) with an eFG% greater than .494, but about 46 wins if it is less than .494. It appears that too low an eFG% hurts, but when it is too high, teams are not taking as many two-point shots or free throws, and thus lose a few extra games.
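Cut points like these come from searching for the split that minimizes the residual sum of squares at each node. The following base-R sketch shows that search for a single split. It is an illustration of the criterion, not the tree package’s implementation (which also enforces minimum node sizes), and the helper name `best_split` is hypothetical. It uses the six teams shown in the head(advanced) output.

```r
# Find the single cut on x that minimises the residual sum of squares,
# which is the criterion a regression tree applies at each node.
best_split <- function(x, y) {
  xs <- sort(unique(x))
  cuts <- (head(xs, -1) + tail(xs, -1)) / 2  # midpoints between neighbours
  rss <- sapply(cuts, function(cut) {
    left <- y[x < cut]
    right <- y[x >= cut]
    sum((left - mean(left))^2) + sum((right - mean(right))^2)
  })
  best <- which.min(rss)
  list(cut = cuts[best], rss = rss[best],
       left_mean = mean(y[x < cuts[best]]),
       right_mean = mean(y[x >= cuts[best]]))
}

# TS% and wins for the six teams in the head(advanced) output.
ts_pct <- c(0.563, 0.523, 0.529, 0.498, 0.536, 0.557)
team_wins <- c(60, 40, 38, 33, 50, 53)
sp <- best_split(ts_pct, team_wins)
print(sp)
```

On these six teams the best cut falls at a TS% of .5325, with an average of 37 wins below it and about 54 above; the tree fitted on all 30 teams chooses its own cut points.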

Note: Tree formulas do not take explicit interaction terms; a tree instead captures interactions through its successive splits. Also, none of the “basic” statistics that were significant in the linear model were used in building this tree.

True Shooting Percentage Analysis

Next, I will look at TS% on its own, using each individual player’s percentage from the 2014-15 season. To keep small samples from skewing the results, I removed any player who played 500 minutes or fewer during the season. Instead of wins, my y-variable will be win shares.

TS_FRAME = data.frame(shooting,teams,wins_frame,WS,Player)
names(TS_FRAME) = c("TS","Team","# of Wins", "Win_Shares","Player")
head(TS_FRAME, n =20)
##       TS Team # of Wins Win_Shares          Player
## 1  0.699  ATL        60        7.5     Kyle Korver
## 2  0.565  ATL        60        8.3    Paul Millsap
## 3  0.563  ATL        60        8.7      Al Horford
## 4  0.566  ATL        60        7.7     Jeff Teague
## 5  0.603  ATL        60        7.0 DeMarre Carroll
## 6  0.516  ATL        60        2.5 Dennis Schroder
## 7  0.520  ATL        60        1.6   Kent Bazemore
## 8  0.543  ATL        60        2.7      Mike Scott
## 9  0.508  ATL        60        1.5      Pero Antic
## 10 0.506  ATL        60        2.5 Thabo Sefolosha
## 11 0.489  ATL        60        1.3    Shelvin Mack
## 12 0.608  ATL        60        1.9    Mike Muscala
## 13 0.507  BOS        40        2.5   Avery Bradley
## 14 0.482  BOS        40        2.5     Evan Turner
## 15 0.557  BOS        40        5.3    Brandon Bass
## 16 0.491  BOS        40        2.9    Marcus Smart
## 17 0.594  BOS        40        6.5    Tyler Zeller
## 18 0.503  BOS        40        4.0 Jared Sullinger
## 19 0.558  BOS        40        3.6    Kelly Olynyk
## 20 0.512  BOS        40        3.3     Jae Crowder
plotbox <- plot_ly(TS_FRAME, x = shooting, color = teams, 
             type = "box",xlab = "True Shooting Percentage", 
             main = "True Shooting Percentages by Team")
plotbox
## Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
## Returning the palette you asked for with that many colors
## Warning: 'box' objects don't have these attributes: 'xlab', 'main'
## Valid attributes include:
## 'type', 'visible', 'showlegend', 'legendgroup', 'opacity', 'name', 'uid', 'ids', 'customdata', 'hoverinfo', 'hoverlabel', 'stream', 'y', 'x', 'x0', 'y0', 'whiskerwidth', 'boxpoints', 'boxmean', 'jitter', 'pointpos', 'orientation', 'marker', 'line', 'fillcolor', 'xaxis', 'yaxis', 'idssrc', 'customdatasrc', 'hoverinfosrc', 'ysrc', 'xsrc', 'key', 'set', 'frame', 'transforms', '_isNestedKey', '_isSimpleKey', '_isGraticule'

## Warning: 'box' objects don't have these attributes: 'xlab', 'main'
## Valid attributes include:
## 'type', 'visible', 'showlegend', 'legendgroup', 'opacity', 'name', 'uid', 'ids', 'customdata', 'hoverinfo', 'hoverlabel', 'stream', 'y', 'x', 'x0', 'y0', 'whiskerwidth', 'boxpoints', 'boxmean', 'jitter', 'pointpos', 'orientation', 'marker', 'line', 'fillcolor', 'xaxis', 'yaxis', 'idssrc', 'customdatasrc', 'hoverinfosrc', 'ysrc', 'xsrc', 'key', 'set', 'frame', 'transforms', '_isNestedKey', '_isSimpleKey', '_isGraticule'

## Warning: 'box' objects don't have these attributes: 'xlab', 'main'
## Valid attributes include:
## 'type', 'visible', 'showlegend', 'legendgroup', 'opacity', 'name', 'uid', 'ids', 'customdata', 'hoverinfo', 'hoverlabel', 'stream', 'y', 'x', 'x0', 'y0', 'whiskerwidth', 'boxpoints', 'boxmean', 'jitter', 'pointpos', 'orientation', 'marker', 'line', 'fillcolor', 'xaxis', 'yaxis', 'idssrc', 'customdatasrc', 'hoverinfosrc', 'ysrc', 'xsrc', 'key', 'set', 'frame', 'transforms', '_isNestedKey', '_isSimpleKey', '_isGraticule'

## Warning: 'box' objects don't have these attributes: 'xlab', 'main'
## Valid attributes include:
## 'type', 'visible', 'showlegend', 'legendgroup', 'opacity', 'name', 'uid', 'ids', 'customdata', 'hoverinfo', 'hoverlabel', 'stream', 'y', 'x', 'x0', 'y0', 'whiskerwidth', 'boxpoints', 'boxmean', 'jitter', 'pointpos', 'orientation', 'marker', 'line', 'fillcolor', 'xaxis', 'yaxis', 'idssrc', 'customdatasrc', 'hoverinfosrc', 'ysrc', 'xsrc', 'key', 'set', 'frame', 'transforms', '_isNestedKey', '_isSimpleKey', '_isGraticule'

## Warning: 'box' objects don't have these attributes: 'xlab', 'main'
## Valid attributes include:
## 'type', 'visible', 'showlegend', 'legendgroup', 'opacity', 'name', 'uid', 'ids', 'customdata', 'hoverinfo', 'hoverlabel', 'stream', 'y', 'x', 'x0', 'y0', 'whiskerwidth', 'boxpoints', 'boxmean', 'jitter', 'pointpos', 'orientation', 'marker', 'line', 'fillcolor', 'xaxis', 'yaxis', 'idssrc', 'customdatasrc', 'hoverinfosrc', 'ysrc', 'xsrc', 'key', 'set', 'frame', 'transforms', '_isNestedKey', '_isSimpleKey', '_isGraticule'

## Warning: 'box' objects don't have these attributes: 'xlab', 'main'
## Valid attributes include:
## 'type', 'visible', 'showlegend', 'legendgroup', 'opacity', 'name', 'uid', 'ids', 'customdata', 'hoverinfo', 'hoverlabel', 'stream', 'y', 'x', 'x0', 'y0', 'whiskerwidth', 'boxpoints', 'boxmean', 'jitter', 'pointpos', 'orientation', 'marker', 'line', 'fillcolor', 'xaxis', 'yaxis', 'idssrc', 'customdatasrc', 'hoverinfosrc', 'ysrc', 'xsrc', 'key', 'set', 'frame', 'transforms', '_isNestedKey', '_isSimpleKey', '_isGraticule'

## Warning: 'box' objects don't have these attributes: 'xlab', 'main'
## Valid attributes include:
## 'type', 'visible', 'showlegend', 'legendgroup', 'opacity', 'name', 'uid', 'ids', 'customdata', 'hoverinfo', 'hoverlabel', 'stream', 'y', 'x', 'x0', 'y0', 'whiskerwidth', 'boxpoints', 'boxmean', 'jitter', 'pointpos', 'orientation', 'marker', 'line', 'fillcolor', 'xaxis', 'yaxis', 'idssrc', 'customdatasrc', 'hoverinfosrc', 'ysrc', 'xsrc', 'key', 'set', 'frame', 'transforms', '_isNestedKey', '_isSimpleKey', '_isGraticule'

## Warning: 'box' objects don't have these attributes: 'xlab', 'main'
## Valid attributes include:
## 'type', 'visible', 'showlegend', 'legendgroup', 'opacity', 'name', 'uid', 'ids', 'customdata', 'hoverinfo', 'hoverlabel', 'stream', 'y', 'x', 'x0', 'y0', 'whiskerwidth', 'boxpoints', 'boxmean', 'jitter', 'pointpos', 'orientation', 'marker', 'line', 'fillcolor', 'xaxis', 'yaxis', 'idssrc', 'customdatasrc', 'hoverinfosrc', 'ysrc', 'xsrc', 'key', 'set', 'frame', 'transforms', '_isNestedKey', '_isSimpleKey', '_isGraticule'

## Warning: 'box' objects don't have these attributes: 'xlab', 'main'
## Valid attributes include:
## 'type', 'visible', 'showlegend', 'legendgroup', 'opacity', 'name', 'uid', 'ids', 'customdata', 'hoverinfo', 'hoverlabel', 'stream', 'y', 'x', 'x0', 'y0', 'whiskerwidth', 'boxpoints', 'boxmean', 'jitter', 'pointpos', 'orientation', 'marker', 'line', 'fillcolor', 'xaxis', 'yaxis', 'idssrc', 'customdatasrc', 'hoverinfosrc', 'ysrc', 'xsrc', 'key', 'set', 'frame', 'transforms', '_isNestedKey', '_isSimpleKey', '_isGraticule'

## Warning: 'box' objects don't have these attributes: 'xlab', 'main'
## Valid attributes include:
## 'type', 'visible', 'showlegend', 'legendgroup', 'opacity', 'name', 'uid', 'ids', 'customdata', 'hoverinfo', 'hoverlabel', 'stream', 'y', 'x', 'x0', 'y0', 'whiskerwidth', 'boxpoints', 'boxmean', 'jitter', 'pointpos', 'orientation', 'marker', 'line', 'fillcolor', 'xaxis', 'yaxis', 'idssrc', 'customdatasrc', 'hoverinfosrc', 'ysrc', 'xsrc', 'key', 'set', 'frame', 'transforms', '_isNestedKey', '_isSimpleKey', '_isGraticule'

## Warning: 'box' objects don't have these attributes: 'xlab', 'main'
## Valid attributes include:
## 'type', 'visible', 'showlegend', 'legendgroup', 'opacity', 'name', 'uid', 'ids', 'customdata', 'hoverinfo', 'hoverlabel', 'stream', 'y', 'x', 'x0', 'y0', 'whiskerwidth', 'boxpoints', 'boxmean', 'jitter', 'pointpos', 'orientation', 'marker', 'line', 'fillcolor', 'xaxis', 'yaxis', 'idssrc', 'customdatasrc', 'hoverinfosrc', 'ysrc', 'xsrc', 'key', 'set', 'frame', 'transforms', '_isNestedKey', '_isSimpleKey', '_isGraticule'

## Warning: 'box' objects don't have these attributes: 'xlab', 'main'
## Valid attributes include:
## 'type', 'visible', 'showlegend', 'legendgroup', 'opacity', 'name', 'uid', 'ids', 'customdata', 'hoverinfo', 'hoverlabel', 'stream', 'y', 'x', 'x0', 'y0', 'whiskerwidth', 'boxpoints', 'boxmean', 'jitter', 'pointpos', 'orientation', 'marker', 'line', 'fillcolor', 'xaxis', 'yaxis', 'idssrc', 'customdatasrc', 'hoverinfosrc', 'ysrc', 'xsrc', 'key', 'set', 'frame', 'transforms', '_isNestedKey', '_isSimpleKey', '_isGraticule'

## Warning: 'box' objects don't have these attributes: 'xlab', 'main'
## Valid attributes include:
## 'type', 'visible', 'showlegend', 'legendgroup', 'opacity', 'name', 'uid', 'ids', 'customdata', 'hoverinfo', 'hoverlabel', 'stream', 'y', 'x', 'x0', 'y0', 'whiskerwidth', 'boxpoints', 'boxmean', 'jitter', 'pointpos', 'orientation', 'marker', 'line', 'fillcolor', 'xaxis', 'yaxis', 'idssrc', 'customdatasrc', 'hoverinfosrc', 'ysrc', 'xsrc', 'key', 'set', 'frame', 'transforms', '_isNestedKey', '_isSimpleKey', '_isGraticule'

## Warning: 'box' objects don't have these attributes: 'xlab', 'main'
## Valid attributes include:
## 'type', 'visible', 'showlegend', 'legendgroup', 'opacity', 'name', 'uid', 'ids', 'customdata', 'hoverinfo', 'hoverlabel', 'stream', 'y', 'x', 'x0', 'y0', 'whiskerwidth', 'boxpoints', 'boxmean', 'jitter', 'pointpos', 'orientation', 'marker', 'line', 'fillcolor', 'xaxis', 'yaxis', 'idssrc', 'customdatasrc', 'hoverinfosrc', 'ysrc', 'xsrc', 'key', 'set', 'frame', 'transforms', '_isNestedKey', '_isSimpleKey', '_isGraticule'

## Warning: 'box' objects don't have these attributes: 'xlab', 'main'
## Valid attributes include:
## 'type', 'visible', 'showlegend', 'legendgroup', 'opacity', 'name', 'uid', 'ids', 'customdata', 'hoverinfo', 'hoverlabel', 'stream', 'y', 'x', 'x0', 'y0', 'whiskerwidth', 'boxpoints', 'boxmean', 'jitter', 'pointpos', 'orientation', 'marker', 'line', 'fillcolor', 'xaxis', 'yaxis', 'idssrc', 'customdatasrc', 'hoverinfosrc', 'ysrc', 'xsrc', 'key', 'set', 'frame', 'transforms', '_isNestedKey', '_isSimpleKey', '_isGraticule'

The graph above gives boxplots of player TS% for each team. Golden State, Toronto, the Los Angeles Clippers, Cleveland, and Atlanta all appear to have a higher typical (median) TS% than the rest of the NBA teams. All of these teams made the playoffs in 2015.

plot_ly(TS_FRAME, x = TS., y = Win_Shares, text = paste("Player: ", TS_FRAME$Player),
        mode = "markers", color = TS., size = TS., opacity = TS.)
## No trace type specified:
##   Based on info supplied, a 'scatter' trace seems appropriate.
##   Read more about this trace type -> https://plot.ly/r/reference/#scatter
#plot(TS_FRAME$TS, TS_FRAME$Win_Shares, xlab = "True Shooting Percentage",ylab="Win Shares",main = #"True Shooting Percentage versus Win Shares")
cor(TS_FRAME$TS, TS_FRAME$Win_Shares)
## [1] 0.580575
out_shares = lm(Win_Shares ~ TS, data = TS_FRAME)
summary(out_shares)
## 
## Call:
## lm(formula = Win_Shares ~ TS, data = TS_FRAME)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.0250 -1.4046 -0.3676  0.8578 10.7471 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -14.653      1.268  -11.56   <2e-16 ***
## TS            33.567      2.384   14.08   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.275 on 390 degrees of freedom
## Multiple R-squared:  0.3371, Adjusted R-squared:  0.3354 
## F-statistic: 198.3 on 1 and 390 DF,  p-value: < 2.2e-16
predictions = predict(out_shares)
TS_FRAME$predictions = predictions
TS_FRAME$diff = TS_FRAME$Win_Shares - predictions
head(TS_FRAME, n = 15)
##       TS Team # of Wins Win_Shares          Player predictions       diff
## 1  0.699  ATL        60        7.5     Kyle Korver    8.810322 -1.3103217
## 2  0.565  ATL        60        8.3    Paul Millsap    4.312345  3.9876551
## 3  0.563  ATL        60        8.7      Al Horford    4.245211  4.4547891
## 4  0.566  ATL        60        7.7     Jeff Teague    4.345912  3.3540881
## 5  0.603  ATL        60        7.0 DeMarre Carroll    5.587891  1.4121095
## 6  0.516  ATL        60        2.5 Dennis Schroder    2.667562 -0.1675623
## 7  0.520  ATL        60        1.6   Kent Bazemore    2.801830 -1.2018303
## 8  0.543  ATL        60        2.7      Mike Scott    3.573871 -0.8738711
## 9  0.508  ATL        60        1.5      Pero Antic    2.399026 -0.8990264
## 10 0.506  ATL        60        2.5 Thabo Sefolosha    2.331892  0.1681076
## 11 0.489  ATL        60        1.3    Shelvin Mack    1.761254 -0.4612536
## 12 0.608  ATL        60        1.9    Mike Muscala    5.755726 -3.8557255
## 13 0.507  BOS        40        2.5   Avery Bradley    2.365459  0.1345406
## 14 0.482  BOS        40        2.5     Evan Turner    1.526285  0.9737154
## 15 0.557  BOS        40        5.3    Brandon Bass    4.043809  1.2561910
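
To see where the predictions and diff columns come from, the fitted line can be applied by hand to a single row. This is only a sketch: it hard-codes the rounded coefficients printed in the summary above, so it agrees with predict() to about two decimal places rather than exactly. The variable names below are illustrative and do not appear in the original analysis.

```r
# Rounded coefficients copied from the lm summary above (Intercept and TS)
b0 <- -14.653
b1 <- 33.567

# Kyle Korver's TS% and win shares, taken from row 1 of the table
ts.korver <- 0.699
ws.korver <- 7.5

# Predicted win shares from the fitted line: intercept + slope * TS
pred.korver <- b0 + b1 * ts.korver

# Residual: actual win shares minus predicted win shares
diff.korver <- ws.korver - pred.korver

round(c(prediction = pred.korver, diff = diff.korver), 2)
```

A negative diff means the player earned fewer win shares than his shooting efficiency alone would suggest.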

TS% alone gives a correlation of only about 0.58 with win shares, which corresponds to an R-squared of about 0.34. Some players that stand out in this graph are Stephen Curry, James Harden, and Chris Paul. Curry won the 2015 MVP race, while Harden finished in 2nd place and Paul in 6th place. All of them are excellent shooters, and Curry and Harden especially are notable 3-point shooters. Anthony Davis, Kyle Korver, Jimmy Butler, DeAndre Jordan, and LeBron James also stand out. All of these players were All-Star or All-NBA selections that year.

Looking at the first 15 rows of the data frame, some of the differences between actual and predicted win shares are small, while others are large, which can lead to a lot of problems. I summed the predictions column by team to get a total number of predicted “wins” for each team, then added those totals back to the advanced dataset.

advanced$Predictions = team.predictions
write.csv(advanced,'advanced_with_predictions.csv',row.names=F)
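
The step that builds team.predictions is not shown, so the following is only a sketch of one way the per-team sum could be computed. It uses a toy data frame (named toy here, a hypothetical stand-in for TS_FRAME) containing five player rows copied from the table above.

```r
# Toy stand-in for TS_FRAME: five player rows copied from the printed table
toy <- data.frame(
  Team = c("ATL", "ATL", "ATL", "BOS", "BOS"),
  predictions = c(8.810322, 4.312345, 4.245211, 2.365459, 1.526285)
)

# Sum predicted win shares within each team
team.sums <- aggregate(predictions ~ Team, data = toy, FUN = sum)
team.sums
```

aggregate() returns the teams sorted by abbreviation, so the result would need to be matched to the rows of the advanced dataset by team rather than assigned purely by position.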

Based on our results, we find that true shooting percentage can be very accurate in estimating some teams' wins, but at other times it comes nowhere close. This may be due to too much noise in the data, or to the fact that more variables need to be included in the model. After all, basketball is not just about shooting (though that is a big factor).

It is interesting to see how accurate the TS% estimate is for some teams (e.g., Dallas Mavericks, Indiana Pacers, Milwaukee Bucks), but how poor it is for others (Chicago Bulls, Memphis Grizzlies, Miami Heat). Chicago and Memphis are known to be defensive-minded teams, while Dallas is a more offensive team. What is also interesting is that of the 16 teams that made the playoffs in 2015, 13 would still have made it based on the TS% predictions.

overall.pred = data.frame(advanced$Team,lm.predict,yhat,team.predictions,wins)
names(overall.pred) = c("Team","LM Pred","Tree","TS Pred","Wins")
overall.pred
##                       Team  LM Pred     Tree  TS Pred Wins
## 1           Atlanta Hawks* 59.19007 55.22222 48.59284   60
## 2          Boston Celtics* 39.82644 33.00000 38.58785   40
## 3           Brooklyn Nets* 36.47296 37.60000 36.89373   38
## 4        Charlotte Hornets 32.59971 21.00000 30.97220   33
## 5           Chicago Bulls* 50.50716 45.83333 26.38726   50
## 6     Cleveland Cavaliers* 52.40175 55.22222 46.56103   53
## 7        Dallas Mavericks* 49.98080 55.22222 50.13692   50
## 8           Denver Nuggets 30.63676 33.00000 37.95008   30
## 9          Detroit Pistons 31.50735 33.00000 37.36367   32
## 10  Golden State Warriors* 67.17686 55.22222 49.46558   67
## 11        Houston Rockets* 55.58915 55.22222 37.36570   56
## 12          Indiana Pacers 37.60497 33.00000 38.26998   38
## 13   Los Angeles Clippers* 57.18031 55.22222 43.20635   56
## 14      Los Angeles Lakers 21.13945 21.00000 36.67453   21
## 15      Memphis Grizzlies* 55.69056 45.83333 33.10066   55
## 16              Miami Heat 36.80245 37.60000 53.12033   37
## 17        Milwaukee Bucks* 41.28591 45.83333 40.97313   41
## 18  Minnesota Timberwolves 15.19497 21.00000 35.21336   16
## 19   New Orleans Pelicans* 43.56867 37.60000 44.61414   45
## 20         New York Knicks 18.12086 21.00000 39.99361   17
## 21   Oklahoma City Thunder 44.95894 45.83333 43.48863   45
## 22           Orlando Magic 26.32363 33.00000 37.09513   25
## 23      Philadelphia 76ers 17.92769 21.00000 26.32013   18
## 24            Phoenix Suns 38.30892 37.60000 44.09486   39
## 25 Portland Trail Blazers* 51.70299 55.22222 47.09810   51
## 26        Sacramento Kings 29.18525 37.60000 40.04903   29
## 27      San Antonio Spurs* 53.50904 55.22222 58.80892   55
## 28        Toronto Raptors* 50.74936 55.22222 42.85491   49
## 29               Utah Jazz 38.35128 45.83333 32.53205   38
## 30     Washington Wizards* 46.50574 45.83333 43.50643   46
mse.lm.2015 = (1/30)*sum((lm.predict-wins)^2)
mse.yhat.2015 = (1/30)*sum((yhat-wins)^2)
mse.TS.2015 = (1/30)*sum((team.predictions-wins)^2)
c(mse.lm.2015,mse.yhat.2015,mse.TS.2015)
## [1]   0.6396242  30.3862963 133.3168849

Based on our mean-squared error analysis, the linear model predicts wins far better than either the tree-based method or TS% alone. Despite this, for some teams the other models came closer to the actual number of wins than the full linear model. For example, the tree-based model did better at estimating the number of wins for the Washington Wizards, and the TS% approach estimated the number of wins for the Dallas Mavericks very accurately. Although TS% alone is not a good predictor of wins, it may be better at predicting offensive performance than basic statistics such as 2-point field goal percentage. TS% may account for the wins a team earns through its offense, while the remaining wins could be explained by defensive factors.
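
The mean-squared error used in this comparison can be wrapped in a small helper. The sketch below is not part of the original analysis: the function name mse and the three-team subset (Atlanta, Boston, Brooklyn, copied from the table above) are only for illustration.

```r
# Mean-squared error: the average of the squared prediction errors
mse <- function(pred, actual) {
  stopifnot(length(pred) == length(actual))
  mean((pred - actual)^2)
}

# Three rows copied from the 2015 table: Atlanta, Boston, Brooklyn
wins.sub <- c(60, 40, 38)
lm.sub   <- c(59.19007, 39.82644, 36.47296)  # "LM Pred" column
ts.sub   <- c(48.59284, 38.58785, 36.89373)  # "TS Pred" column

# A smaller value means the predictions are closer to the actual wins
c(LM = mse(lm.sub, wins.sub), TS = mse(ts.sub, wins.sub))
```

Because mean() divides by the number of teams automatically, the 1/30 factor never has to be typed by hand.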

Results: Predicting 2016 Data

The 2015-16 NBA regular season has ended, meaning that new data is available to test on. Using the data from the 2014-15 season, I will predict the number of wins and win shares in three ways: our linear model combining basic and advanced statistics, our TS% model, and our tree-based model.

setwd("C:/Users/Evan Boyd/Desktop/College/Spring 2016/Stat 479/Individual Project/Files/CSV/2016")

basic.2016 = read.csv("leagues_NBA_2016_team.csv", header = T)
TOT.REB.. = c(.475,.494,.494,.496,.502,.52,.485,.511,.521,.513,.491,.498,.474,.48,
              .491,.517,.492,.498,.49,.501,.547,.494,.464,.506,.51,.496,.52,.516,.519,.486)
basic.2016$TOT.REB.. = TOT.REB..
advanced.2016 = read.csv("leagues_NBA_2016_misc.csv", header = T)
advanced.2016 = advanced.2016[,-23]
advanced.2016 = advanced.2016[,-24]
wins2 = c(48,48,21,48,42,57,42,33,44,73,41,45,53,17,42,48,33,29,30,
          32,55,35,10,23,44,33,67,56,40,41)

I will take the same approach as with the 2015 data. The linear model uses the variables that were selected on the 2015 data, while the tree is refit to see which variables produce a reasonable tree diagram for 2016.

TS_FRAME_2016 = data.frame(shooting.2016,teams.2016,WS.2016,Player.2016)
names(TS_FRAME_2016) = c("TS","Team","Win_Shares","Player")
head(TS_FRAME_2016, n = 15)
##       TS Team Win_Shares          Player
## 1  0.556  ATL       10.1    Paul Millsap
## 2  0.565  ATL        9.4      Al Horford
## 3  0.578  ATL        4.1     Kyle Korver
## 4  0.551  ATL        5.9     Jeff Teague
## 5  0.551  ATL        4.1   Kent Bazemore
## 6  0.578  ATL        4.5 Thabo Sefolosha
## 7  0.510  ATL        2.2 Dennis Schroder
## 8  0.575  ATL        3.2      Mike Scott
## 9  0.563  ATL        1.9    Tim Hardaway
## 10 0.571  ATL        1.7  Tiago Splitter
## 11 0.576  ATL        1.6    Mike Muscala
## 12 0.562  BOS        9.7   Isaiah Thomas
## 13 0.538  BOS        4.8   Avery Bradley
## 14 0.565  BOS        7.3     Jae Crowder
## 15 0.513  BOS        4.0     Evan Turner
plot_ly(TS_FRAME_2016, x = TS., y = Win.Shares, 
        text = paste("Name: ", TS_FRAME_2016$Player),
        mode = "markers", color = TS., size = TS., 
        opacity = TS.)
## No trace type specified:
##   Based on info supplied, a 'scatter' trace seems appropriate.
##   Read more about this trace type -> https://plot.ly/r/reference/#scatter
#plot(TS_FRAME_2016$TS, TS_FRAME_2016$Win_Shares,xlab="True Shooting Percentage",ylab="Win #Shares",main="True Shooting Percentage versus Win Shares: 2016")
cor(TS_FRAME_2016$TS, TS_FRAME_2016$Win_Shares)
## [1] 0.5977635
out.shares.2016 = lm(Win_Shares ~ TS, data = TS_FRAME_2016)
summary(out.shares.2016)
## 
## Call:
## lm(formula = Win_Shares ~ TS, data = TS_FRAME_2016)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.0252 -1.5121 -0.3207  1.0127  9.9134 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -18.169      1.548  -11.73   <2e-16 ***
## TS            40.172      2.872   13.99   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.339 on 352 degrees of freedom
## Multiple R-squared:  0.3573, Adjusted R-squared:  0.3555 
## F-statistic: 195.7 on 1 and 352 DF,  p-value: < 2.2e-16
predictions.2016 = predict(out.shares.2016)
TS_FRAME_2016$predictions = predictions.2016
TS_FRAME_2016$diff = WS.2016 - predictions.2016
head(TS_FRAME_2016, n = 20)
##       TS Team Win_Shares          Player predictions       diff
## 1  0.556  ATL       10.1    Paul Millsap   4.1669193  5.9330807
## 2  0.565  ATL        9.4      Al Horford   4.5284714  4.8715286
## 3  0.578  ATL        4.1     Kyle Korver   5.0507133 -0.9507133
## 4  0.551  ATL        5.9     Jeff Teague   3.9660570  1.9339430
## 5  0.551  ATL        4.1   Kent Bazemore   3.9660570  0.1339430
## 6  0.578  ATL        4.5 Thabo Sefolosha   5.0507133 -0.5507133
## 7  0.510  ATL        2.2 Dennis Schroder   2.3189863 -0.1189863
## 8  0.575  ATL        3.2      Mike Scott   4.9301959 -1.7301959
## 9  0.563  ATL        1.9    Tim Hardaway   4.4481265 -2.5481265
## 10 0.571  ATL        1.7  Tiago Splitter   4.7695061 -3.0695061
## 11 0.576  ATL        1.6    Mike Muscala   4.9703684 -3.3703684
## 12 0.562  BOS        9.7   Isaiah Thomas   4.4079540  5.2920460
## 13 0.538  BOS        4.8   Avery Bradley   3.4438150  1.3561850
## 14 0.565  BOS        7.3     Jae Crowder   4.5284714  2.7715286
## 15 0.513  BOS        4.0     Evan Turner   2.4395036  1.5604964
## 16 0.476  BOS        4.8 Jared Sullinger   0.9531227  3.8468773
## 17 0.602  BOS        5.9    Amir Johnson   6.0148523 -0.1148523
## 18 0.463  BOS        2.9    Marcus Smart   0.4308808  2.4691192
## 19 0.561  BOS        4.1    Kelly Olynyk   4.3677815 -0.2677815
## 20 0.531  BOS        2.4   Jonas Jerebko   3.1626078 -0.7626078

Notice that in our TS%/Win Shares graph, Steph Curry has the highest TS% and the highest number of win shares. Next to him are Kevin Durant and Kawhi Leonard. Curry was named the unanimous Most Valuable Player for the 2016 season, with Leonard coming in second place and Durant in 5th place. Once again, TS% is a significant predictor when it is the only variable used to estimate win shares, though the adjusted R^2 of about 0.36 is low, meaning that other variables would be needed to make a more accurate prediction.
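
In a regression with a single predictor, the multiple R-squared is simply the square of the correlation, which ties together the cor() and summary() outputs shown for both seasons. A quick check using the two correlations printed above (the variable names are illustrative):

```r
# Correlations between TS% and win shares, copied from the cor() output
r.2015 <- 0.580575
r.2016 <- 0.5977635

# Squaring them reproduces the Multiple R-squared values from summary()
round(c(r.2015^2, r.2016^2), 4)
# 0.3371 0.3573
```

In both seasons TS% therefore explains roughly a third of the variation in player win shares.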

tree.2016=tree(wins2~Pace+FTr+X3PAr+TS.
                +eFG.+ORB.+FT.FGA+eFG..1+TOV..1
                +basic$OWN+basic$FG.OWN+basic$TOT.REB..
                ,advanced.2016
)

summary(tree.2016)
## 
## Regression tree:
## tree(formula = wins2 ~ Pace + FTr + X3PAr + TS. + eFG. + ORB. + 
##     FT.FGA + eFG..1 + TOV..1 + basic$OWN + basic$FG.OWN + basic$TOT.REB.., 
##     data = advanced.2016)
## Variables actually used in tree construction:
## [1] "eFG..1" "TS."    "TOV..1"
## Number of terminal nodes:  4 
## Residual mean deviance:  43.87 = 1141 / 26 
## Distribution of residuals:
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -13.2900  -3.3960  -0.7929   0.0000   4.6500  14.5700
plot(tree.2016,main = "Tree-Based Plot: 2016")
text(tree.2016,pretty=0)

tree.basic
## node), split, n, deviance, yval
##       * denotes terminal node
## 
##  1) root 30 5250.0 41.00  
##    2) TS. < 0.5265 10  702.0 27.00  
##      4) eFG. < 0.476 5  194.0 21.00 *
##      5) eFG. > 0.476 5  148.0 33.00 *
##    3) TS. > 0.5265 20 1608.0 48.00  
##      6) TS. < 0.543 11  502.9 42.09  
##       12) eFG..1 < 0.494 6  186.8 45.83 *
##       13) eFG..1 > 0.494 5  131.2 37.60 *
##      7) TS. > 0.543 9  251.6 55.22 *
cv.2016=cv.tree(tree.2016)
plot(cv.2016$size,cv.2016$dev,type='b',xlab="Size",ylab="Deviance",main ="Cross-Validated Deviance by Tree Size: 2016")

yhat.2016=predict(tree.2016)
out.2016.tree = lm(wins2~yhat.2016)
cor(yhat.2016,wins2)
## [1] 0.8921499

There is still a strong correlation (about 0.89) between the number of wins and the tree model's predictions. What is interesting, however, is that the tree is built from different variables than the one for the 2014-15 data, and TS% is no longer the top split in the tree. We will continue to evaluate the model by estimating the number of wins.

setwd("C:/Users/Evan Boyd/Desktop/College/Spring 2016/Stat 479/Individual Project/Files/CSV/2016")


overall.2016 = data.frame(advanced.2016$Team,predict.2016,yhat.2016,
                          team.predictions.2016,wins2)
names(overall.2016) = c("Team","LM Pred","Tree","TS Pred","Wins")
overall.2016
##                       Team  LM Pred     Tree  TS Pred Wins
## 1           Atlanta Hawks* 45.32603 58.42857 48.16611   48
## 2          Boston Celtics* 50.26584 43.30000 33.23298   48
## 3            Brooklyn Nets 19.17782 23.28571 35.40941   21
## 4       Charlotte Hornets* 48.77669 43.30000 36.64374   48
## 5            Chicago Bulls 39.17992 43.30000 36.91783   42
## 6     Cleveland Cavaliers* 54.87495 58.42857 40.90592   57
## 7        Dallas Mavericks* 42.31609 43.30000 37.35972   42
## 8           Denver Nuggets 34.04783 23.28571 36.74612   33
## 9         Detroit Pistons* 44.52795 43.30000 29.97901   44
## 10  Golden State Warriors* 75.00185 58.42857 58.96149   73
## 11        Houston Rockets* 39.61921 37.50000 36.95800   41
## 12         Indiana Pacers* 45.93655 43.30000 39.91974   45
## 13   Los Angeles Clippers* 52.49417 58.42857 49.18857   53
## 14      Los Angeles Lakers 14.56487 23.28571 25.45766   17
## 15      Memphis Grizzlies* 39.17754 37.50000 33.66386   42
## 16             Miami Heat* 45.95549 43.30000 48.30477   48
## 17         Milwaukee Bucks 32.39930 37.50000 34.93836   33
## 18  Minnesota Timberwolves 33.06054 23.28571 36.47593   29
## 19    New Orleans Pelicans 32.03207 23.28571 45.20048   30
## 20         New York Knicks 35.02824 43.30000 33.37163   32
## 21  Oklahoma City Thunder* 54.66215 58.42857 46.17563   55
## 22           Orlando Magic 38.20079 37.50000 43.20287   35
## 23      Philadelphia 76ers 12.91266 23.28571 36.22388   10
## 24            Phoenix Suns 20.62755 23.28571 37.26836   23
## 25 Portland Trail Blazers* 42.49434 43.30000 38.77678   44
## 26        Sacramento Kings 34.26589 37.50000 47.91406   33
## 27      San Antonio Spurs* 65.63326 58.42857 54.62286   67
## 28        Toronto Raptors* 53.35108 58.42857 41.45731   56
## 29               Utah Jazz 45.69640 43.30000 44.68925   40
## 30      Washington Wizards 38.39292 37.50000 43.86766   41
mse.lm.2016 = (1/30)*sum((predict.2016-wins2)^2)
mse.yhat.2016 = (1/30)*sum((yhat.2016-wins2)^2)
mse.TS.2016 = (1/30)*sum((team.predictions.2016-wins2)^2)
c(mse.lm.2016,mse.yhat.2016,mse.TS.2016)
## [1]   5.396527  38.024762 114.672807

Based on the MSE values, the linear model was again the most accurate (MSE of about 5.4), followed by the tree-based model (about 38.0), with the TS% model far behind. Similar to the 2015 data, there were still some teams whose wins were estimated better by TS% or by the tree-based model than by the linear model.
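
One way to see which model came closest for a given team is to compare absolute errors row by row. The sketch below uses four rows copied from the 2016 table; the data frame name and the Closest column are illustrative and were not part of the original analysis.

```r
# Four rows copied from the 2016 table above
sub.2016 <- data.frame(
  Team = c("Atlanta Hawks", "Boston Celtics", "Miami Heat", "Dallas Mavericks"),
  LM   = c(45.32603, 50.26584, 45.95549, 42.31609),
  Tree = c(58.42857, 43.30000, 43.30000, 43.30000),
  TS   = c(48.16611, 33.23298, 48.30477, 37.35972),
  Wins = c(48, 48, 48, 42)
)

# Absolute error of each model for each team
# (Wins is recycled down every column of the matrix)
abs.err <- abs(as.matrix(sub.2016[, c("LM", "Tree", "TS")]) - sub.2016$Wins)

# Name of the model with the smallest absolute error in each row
sub.2016$Closest <- colnames(abs.err)[apply(abs.err, 1, which.min)]
sub.2016[, c("Team", "Wins", "Closest")]
```

For this subset, TS% comes closest for Atlanta and Miami, while the linear model comes closest for Boston and Dallas.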

Conclusion

Used on its own, True-Shooting Percentage failed to predict wins as accurately as a linear model or a tree-based model, both of which incorporated potential confounding and/or lurking variables. Despite this, I was able to predict wins accurately from a small number of basketball statistics. In a field where new statistics are being created every week, it is interesting to see which statistics are actually associated with team wins.

Obviously, TS% does not account for defensive performance, which I believe resulted in the over- or underestimation of many teams. Teams with high shooting percentages but weak defensive statistics will be credited with more wins on paper than they actually earn.

What is important is that advanced statistics seem to have a much larger impact on the models than basic statistics. Based on the number of advanced variables in the linear model, the tree diagrams, and the correlation between TS% and win shares, advanced statistics appear to have a profound impact. NBA teams should consider investing more in these statistics in order to create shots, run the shot clock well, and rebound the ball successfully. Also, media outlets such as ESPN and 538 should consider analyzing teams using advanced statistics to try to create better models, as well as to draw more interest.

Many of the statistics I used were offense-oriented, meaning that they focus on a team’s offensive performance instead of its defensive performance. One limitation is that there are not as many advanced statistics available for defense as for offense. With more resources, such as defensive numbers and data from the past 10 years, I would like to look further into how important defense, and what type of defense, is for creating wins.

My results also support the idea that creating the right shot is the new emphasis in the NBA. This means more three-pointers, more ball sharing, and more on-ball and off-ball screens. True Shooting Percentage reflects that emphasis, which is why it rated teams like the Warriors, Spurs, and Mavericks so highly, and why it valued the Bulls and Jazz less.

The NBA playoffs are beginning, with the Golden State Warriors as the frontrunner to win it all. If the advanced statistics do not lie, then Golden State, Oklahoma City, and the Cleveland Cavaliers will make deep runs in the playoffs. May the best numbers win!

Appendix: List of Variables and Descriptions

Basic Statistics:

OWN = The number of points scored per game by that team.

OPP = The number of points allowed per game by that team. In other words, the number of points the opposing team scores on average.

DIFF = OWN - OPP

FG.OWN = The team’s field goal percentage. This calculates two-pointers and three-pointers, not free throws.

FG.OPP = The opponent’s field goal percentage, on average.

X3P.OWN = The team’s three-point field goal percentage.

X3P.OPP = The opponent’s three-point field goal percentage.

FT. = The team’s free throw percentage. A player can attempt a free throw by drawing any sort of foul from the opposing team.

OFF.REB. = The percentage of offensive rebounds a player obtains while on the court. This is averaged for each team.

DEF.REB. = The percentage of defensive rebounds a player obtains while on the court.

TOT.REB.. = The percentage of total rebounds a player obtains while on the court.

OWN.TO = The number of turnovers a team makes per game.

OPP.TO = The number of turnovers a team creates per game. In other words, the number of turnovers the opponent team makes per game.

Advanced statistics:

PW = Pythagorean wins, i.e., expected wins based on points scored and allowed

PL = Pythagorean losses, i.e., expected losses based on points scored and allowed

MOV = Margin of Victory

SOS = Strength of Schedule - A rating of strength of schedule. The rating is denominated in points above/below average, where zero is the average.

SRS = Simple Rating System; A team rating that takes into account average point differential and strength of schedule. The rating is denominated in points above/below average, where zero is average.

ORtg = Offensive Rating: An estimate of points produced (players) or scored (teams) per 100 possessions

DRtg = Defensive Rating: An estimate of points allowed per 100 possessions

Pace = Pace Factor: An estimate of possessions per 48 minutes

FTr = Free Throw Attempt Rate: Number of free throw attempts per field goal attempt

X3PAr = 3-Point Attempt Rate: Percentage of FG Attempts from 3-point range.

TS. = True Shooting Percentage: A measure of shooting efficiency that takes into account 2-point field goals, 3-point field goals, and free throws

eFG. = Effective Field Goal Percentage: Adjusts for the fact that a 3-point field goal is worth one more point than a 2-point field goal

TOV. = Turnover percentage: An estimate of turnovers committed per 100 plays

ORB. = Offensive Rebound Percentage: An estimate of the percentage of available offensive rebounds a player grabbed while he was on the floor

FT.FGA = Free Throws per Field Goal Attempt

Opp Efg = Opponent Effective Field Goal Percentage

OTOV = Opponent Turnover Percentage

DRB = Defensive Rebound Percentage: An estimate of the percentage of available defensive rebounds a player grabbed while he was on the floor

OFT/FGA = Opponent Free Throws Per Field Goal Attempt

Attendance = Total Attendance during home games

Wins = # of wins for each team in the 2014-15 (2015-16) season
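
For reference, the two shooting-efficiency measures above can be computed from box-score totals. The sketch below assumes the standard Basketball-Reference formulas, TS% = PTS / (2 * (FGA + 0.44 * FTA)) and eFG% = (FG + 0.5 * 3P) / FGA. The function names and the season totals in the usage lines are hypothetical and are not taken from the dataset.

```r
# True Shooting Percentage: points per two "true shooting attempts",
# where true shooting attempts = FGA + 0.44 * FTA
ts_pct <- function(pts, fga, fta) {
  pts / (2 * (fga + 0.44 * fta))
}

# Effective Field Goal Percentage: a made three counts as 1.5 made field goals
efg_pct <- function(fg, fg3, fga) {
  (fg + 0.5 * fg3) / fga
}

# Hypothetical season totals for one player (illustration only)
ts_pct(pts = 1000, fga = 700, fta = 300)
efg_pct(fg = 350, fg3 = 100, fga = 700)
```

Unlike eFG%, TS% also credits points scored at the free throw line, which is why the two measures can rank players differently.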