2018 NBA Draft ANOVA Test

By Carlos Jones


Analysis of Variance (ANOVA) is a statistical technique commonly used to study differences between the means of two or more groups. A one-way ANOVA tests the null hypothesis that all group means are equal; a small p-value is evidence that at least one group mean differs from the others. The method extends the two-sample t-test to situations where the factor variable has more than two groups.
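
To make the idea concrete, here is a minimal sketch of what a one-way ANOVA computes, using a small made-up dataset (the `toy` values and group labels are hypothetical and have nothing to do with the draft data). It builds the F statistic by hand as the ratio of between-group to within-group variance and checks it against R’s `aov()`.

```r
# Toy data: three groups of four made-up values (illustration only)
toy <- data.frame(
  group = rep(c("A", "B", "C"), each = 4),
  value = c(10, 12, 11, 13,  14, 15, 13, 16,  20, 19, 21, 22)
)

grand_mean  <- mean(toy$value)
group_means <- tapply(toy$value, toy$group, mean)    # one mean per group
group_sizes <- tapply(toy$value, toy$group, length)  # one count per group

# Between-group sum of squares: how far each group mean sits from the grand mean
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)

# Within-group sum of squares: spread of each value around its own group mean
ss_within <- sum((toy$value - ave(toy$value, toy$group))^2)

df_between <- length(group_means) - 1
df_within  <- nrow(toy) - length(group_means)

# F is the ratio of the two mean squares
f_by_hand <- (ss_between / df_between) / (ss_within / df_within)

# The same number as reported by aov()
fit        <- aov(value ~ group, data = toy)
f_from_aov <- summary(fit)[[1]][["F value"]][1]

print(c(by_hand = f_by_hand, from_aov = f_from_aov))  # both 50.4
```

An F value far above 1 means the group means are spread out more than the within-group noise alone would suggest.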

Our goal is to look at the 2018 NBA Draft to understand whether minutes played per game differ by where a player was drafted. To do this, I will run an ANOVA test to see whether there is a statistically significant difference in mean minutes per game between the following groups: Lottery Draft Picks (defined here as picks 1-15), Remaining Round 1 Draft Picks (16-30), and 2nd Round Draft Picks (31-60).

Before any analysis begins, I like to inspect the data to understand the column names, missing values, and the overall structure of the data frame. First, let’s check out a quick summary of the data:

setwd("~/R Projects/Hawks Season Project")
Draft2018<-read.csv("Draft2018.csv")
summary(Draft2018)
##        Rk              Pk              Tm    
##  Min.   : 1.00   Min.   : 1.00   PHI    : 6  
##  1st Qu.:15.75   1st Qu.:15.75   ATL    : 4  
##  Median :30.50   Median :30.50   PHO    : 4  
##  Mean   :30.50   Mean   :30.50   BRK    : 3  
##  3rd Qu.:45.25   3rd Qu.:45.25   DAL    : 3  
##  Max.   :60.00   Max.   :60.00   DEN    : 3  
##                                  (Other):37  
##                           Player         College        Yrs       
##  Aaron Holiday\\holidaa01     : 1            : 9   Min.   :1.000  
##  Alize Johnson\\johnsal02     : 1   Duke     : 4   1st Qu.:2.000  
##  Anfernee Simons\\simonan01   : 1   Kentucky : 4   Median :2.000  
##  Arnoldas Kulboka\\kulboar01  : 1   Villanova: 4   Mean   :1.875  
##  Bruce Brown\\brownbr01       : 1   Kansas   : 2   3rd Qu.:2.000  
##  Chandler Hutchison\\hutchch01: 1   Maryland : 2   Max.   :2.000  
##  (Other)                      :54   (Other)  :35   NA's   :4      
##        G                MP              PTS              TRB        
##  Min.   :  1.00   Min.   :   6.0   Min.   :   0.0   Min.   :   1.0  
##  1st Qu.: 47.25   1st Qu.: 565.2   1st Qu.: 167.8   1st Qu.:  73.5  
##  Median : 86.50   Median :1481.5   Median : 539.0   Median : 203.5  
##  Mean   : 80.50   Mean   :1697.9   Mean   : 739.7   Mean   : 289.0  
##  3rd Qu.:112.25   3rd Qu.:2800.8   3rd Qu.:1104.2   3rd Qu.: 470.8  
##  Max.   :147.00   Max.   :4748.0   Max.   :3327.0   Max.   :1089.0  
##  NA's   :4        NA's   :4        NA's   :4        NA's   :4       
##       AST              FG.              X3P.             FT.        
##  Min.   :   0.0   Min.   :0.0000   Min.   :0.0000   Min.   :0.3330  
##  1st Qu.:  32.0   1st Qu.:0.3835   1st Qu.:0.2995   1st Qu.:0.6840  
##  Median :  82.5   Median :0.4250   Median :0.3305   Median :0.7550  
##  Mean   : 163.3   Mean   :0.4211   Mean   :0.2979   Mean   :0.7173  
##  3rd Qu.: 205.2   3rd Qu.:0.4670   3rd Qu.:0.3653   3rd Qu.:0.7870  
##  Max.   :1213.0   Max.   :0.7200   Max.   :0.5000   Max.   :0.8470  
##  NA's   :4        NA's   :5        NA's   :8        NA's   :7       
##       MP.1           PTS.1            TRB.1            AST.1      
##  Min.   : 2.70   Min.   : 0.000   Min.   : 0.200   Min.   :0.000  
##  1st Qu.:10.90   1st Qu.: 3.600   1st Qu.: 1.400   1st Qu.:0.600  
##  Median :17.05   Median : 6.550   Median : 2.550   Median :1.000  
##  Mean   :17.01   Mean   : 7.123   Mean   : 2.977   Mean   :1.507  
##  3rd Qu.:23.85   3rd Qu.: 8.575   3rd Qu.: 3.950   3rd Qu.:1.825  
##  Max.   :32.80   Max.   :24.400   Max.   :10.800   Max.   :8.600  
##  NA's   :4       NA's   :4        NA's   :4        NA's   :4      
##        WS             WS.48               BPM               VORP        
##  Min.   :-1.200   Min.   :-0.47100   Min.   :-19.900   Min.   :-2.7000  
##  1st Qu.: 0.100   1st Qu.: 0.01100   1st Qu.: -4.825   1st Qu.:-0.2250  
##  Median : 1.700   Median : 0.05350   Median : -2.400   Median : 0.0000  
##  Mean   : 2.595   Mean   : 0.04516   Mean   : -2.971   Mean   : 0.3036  
##  3rd Qu.: 3.925   3rd Qu.: 0.09550   3rd Qu.: -1.200   3rd Qu.: 0.3250  
##  Max.   :13.000   Max.   : 0.22300   Max.   :  5.900   Max.   : 8.2000  
##  NA's   :4        NA's   :4          NA's   :4         NA's   :4
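
When the summary gets this long, a more direct way to locate missing values is to count the NA’s per column and list the affected rows. This is only a sketch on a hypothetical five-row stand-in for the draft table (the `toy` values are made up); the same two calls should work unchanged on Draft2018.

```r
# Hypothetical miniature of the draft table (made-up values, illustration only)
toy <- data.frame(
  Pk = 1:5,
  G  = c(101, 75, NA, 112, NA),
  MP = c(3179, 1901, NA, 3027, NA)
)

# Number of missing values in each column
na_per_column <- colSums(is.na(toy))

# Row numbers that contain at least one missing value
rows_with_na <- which(!complete.cases(toy))

print(na_per_column)
print(rows_with_na)
```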

Now let’s take a look at the structure:

str(Draft2018)
## 'data.frame':    60 obs. of  22 variables:
##  $ Rk     : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Pk     : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Tm     : Factor w/ 28 levels "ATL","BOS","BRK",..: 23 25 1 15 7 21 4 6 19 22 ...
##  $ Player : Factor w/ 60 levels "Aaron Holiday\\holidaa01",..: 10 39 38 24 56 45 59 8 33 42 ...
##  $ College: Factor w/ 37 levels "","Alabama","Arizona",..: 3 10 1 18 22 27 10 2 13 34 ...
##  $ Yrs    : int  2 2 2 2 2 2 2 2 2 2 ...
##  $ G      : int  101 75 126 112 141 107 87 147 140 147 ...
##  $ MP     : int  3179 1901 4117 3027 4623 1634 2366 4748 3324 4189 ...
##  $ PTS    : int  1729 1108 3075 1712 3327 624 939 2720 1382 1249 ...
##  $ TRB    : int  1089 568 1065 526 556 532 712 440 519 523 ...
##  $ AST    : int  182 72 899 140 1213 81 129 435 143 288 ...
##  $ FG.    : num  0.572 0.497 0.443 0.485 0.428 0.474 0.508 0.45 0.367 0.466 ...
##  $ X3P.   : num  0 0.288 0.322 0.386 0.344 0.333 0.197 0.392 0.337 0.341 ...
##  $ FT.    : num  0.753 0.703 0.733 0.755 0.847 0.624 0.761 0.843 0.697 0.825 ...
##  $ MP.1   : num  31.5 25.3 32.7 27 32.8 15.3 27.2 32.3 23.7 28.5 ...
##  $ PTS.1  : num  17.1 14.8 24.4 15.3 23.6 5.8 10.8 18.5 9.9 8.5 ...
##  $ TRB.1  : num  10.8 7.6 8.5 4.7 3.9 5 8.2 3 3.7 3.6 ...
##  $ AST.1  : num  1.8 1 7.1 1.3 8.6 0.8 1.5 3 1 2 ...
##  $ WS     : num  8.3 4 13 6.5 9.2 4.2 5.2 1.9 -1.2 7 ...
##  $ WS.48  : num  0.125 0.1 0.151 0.102 0.095 0.124 0.105 0.02 -0.017 0.08 ...
##  $ BPM    : num  0.3 -1.4 5.9 0 1.5 -0.4 -2 -3.4 -5.2 -0.6 ...
##  $ VORP   : num  1.9 0.3 8.2 1.5 4 0.7 0 -1.7 -2.7 1.5 ...

Our main objective is to understand whether there is a statistically significant difference between the average minutes per game played by Lottery Draft Picks, Remaining 1st Round Draft Picks, and 2nd Round Draft Picks. Our data includes Total Minutes (MP), so we can compute Minutes Per Game (mpg) by dividing Total Minutes by Games Played (G).

Draft2018$mpg <- Draft2018$MP/Draft2018$G
Draft2018$mpg<-round(Draft2018$mpg,2)
head (Draft2018$mpg)
## [1] 31.48 25.35 32.67 27.03 32.79 15.27
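
As a sanity check, the computed values can be compared with the per-game column MP.1 that already ships in the file. The sketch below uses the first ten G, MP, and MP.1 values from the str() output above; the `first10` data frame is just a hand-typed illustration, not part of the original analysis.

```r
# First ten rows of G, MP and MP.1, copied from the str() output
first10 <- data.frame(
  G    = c(101, 75, 126, 112, 141, 107, 87, 147, 140, 147),
  MP   = c(3179, 1901, 4117, 3027, 4623, 1634, 2366, 4748, 3324, 4189),
  MP.1 = c(31.5, 25.3, 32.7, 27.0, 32.8, 15.3, 27.2, 32.3, 23.7, 28.5)
)

# Minutes per game to two decimals, as in the analysis
first10$mpg <- round(first10$MP / first10$G, 2)

# Rounded to one decimal, the ratio should reproduce the MP.1 column
matches_source <- isTRUE(all.equal(round(first10$MP / first10$G, 1), first10$MP.1))

print(head(first10$mpg))
print(matches_source)  # TRUE
```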

Now that we have our “Minutes Per Game” column, we need to define the three groups we plan to compare. Let’s add a column that assigns each player to the Lottery, Round 1, or Round 2 group based on Pk number.

Draft2018$Round<-ifelse(Draft2018$Pk<=15,"Lottery","Round 1")
Draft2018$Round<-ifelse(Draft2018$Pk> 30,"Round 2",Draft2018$Round)
View (Draft2018)
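
The same grouping can also be written in a single step with `cut()`, which bins a numeric vector by break points. This is an alternative sketch, not what the analysis above uses; `picks` is a stand-in vector for the Pk column, and the break points simply mirror the 15 and 30 cutoffs.

```r
# Stand-in for Draft2018$Pk: the 60 pick numbers
picks <- 1:60

# Intervals are (0,15], (15,30], (30,60] because cut() closes on the right by default
round_group <- cut(
  picks,
  breaks = c(0, 15, 30, 60),
  labels = c("Lottery", "Round 1", "Round 2")
)

# The nested ifelse() equivalent of the two-step version above
nested <- ifelse(picks <= 15, "Lottery", ifelse(picks > 30, "Round 2", "Round 1"))

print(table(round_group))
print(all(as.character(round_group) == nested))  # TRUE
```

One practical difference is that `cut()` returns a factor with the levels in the order given, which also controls the order of the boxes in a boxplot.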

There are several columns still in my dataset that I don’t need for the ANOVA test. Let’s remove them for simplicity. I also drop rows 43, 44, 51, and 55: these are the four players with missing values (the NA’s in the summary above), so mpg cannot be computed for them.

DontNeed<-c(1,5,6,9,10,11,12,13,14,15,16,17,18,19,20,21,22)
Draft2018<-Draft2018[,-DontNeed]
DontNeedRows<-c(43,44,51,55)
Draft2018<-Draft2018[-DontNeedRows,]
head(Draft2018)
##   Pk  Tm                   Player   G   MP   mpg   Round
## 1  1 PHO Deandre Ayton\\aytonde01 101 3179 31.48 Lottery
## 2  2 SAC Marvin Bagley\\baglema01  75 1901 25.35 Lottery
## 3  3 ATL   Luka Dončić\\doncilu01 126 4117 32.67 Lottery
## 4  4 MEM Jaren Jackson\\jacksja02 112 3027 27.03 Lottery
## 5  5 DAL    Trae Young\\youngtr01 141 4623 32.79 Lottery
## 6  6 ORL Mohamed Bamba\\bambamo01 107 1634 15.27 Lottery
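
Dropping by position works, but it is easy to miscount indices. A less fragile variant, sketched here on a hypothetical four-row table (the `toy` values and team codes are made up), keeps columns by name and filters out rows where mpg is missing instead of hard-coding row numbers.

```r
# Hypothetical stand-in for the draft table (made-up values, illustration only)
toy <- data.frame(
  Rk  = 1:4,
  Pk  = 1:4,
  Tm  = c("AAA", "BBB", "CCC", "DDD"),
  G   = c(80, NA, 60, NA),
  MP  = c(2400, NA, 900, NA),
  PTS = c(1000, NA, 300, NA)
)
toy$mpg <- round(toy$MP / toy$G, 2)

# Keep only the columns needed for the analysis, selected by name
keep_cols <- c("Pk", "Tm", "G", "MP", "mpg")

# Keep only the rows where minutes per game could be computed
cleaned <- toy[!is.na(toy$mpg), keep_cols]

print(cleaned)
```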

Just for simplicity and ease of reading, I would like to reorder the columns.

Draft2018<-Draft2018[,c(7,1,2,3,5,4,6)]
head(Draft2018)
##     Round Pk  Tm                   Player   MP   G   mpg
## 1 Lottery  1 PHO Deandre Ayton\\aytonde01 3179 101 31.48
## 2 Lottery  2 SAC Marvin Bagley\\baglema01 1901  75 25.35
## 3 Lottery  3 ATL   Luka Dončić\\doncilu01 4117 126 32.67
## 4 Lottery  4 MEM Jaren Jackson\\jacksja02 3027 112 27.03
## 5 Lottery  5 DAL    Trae Young\\youngtr01 4623 141 32.79
## 6 Lottery  6 ORL Mohamed Bamba\\bambamo01 1634 107 15.27

I would also like to rename a few of the columns.

colnames(Draft2018)[colnames(Draft2018)=="Pk"]<-"Pick"
colnames(Draft2018)[colnames(Draft2018)=="MP"]<-"Total_MP"
head(Draft2018)
##     Round Pick  Tm                   Player Total_MP   G   mpg
## 1 Lottery    1 PHO Deandre Ayton\\aytonde01     3179 101 31.48
## 2 Lottery    2 SAC Marvin Bagley\\baglema01     1901  75 25.35
## 3 Lottery    3 ATL   Luka Dončić\\doncilu01     3179 101 32.67
## 4 Lottery    4 MEM Jaren Jackson\\jacksja02     3027 112 27.03
## 5 Lottery    5 DAL    Trae Young\\youngtr01     4623 141 32.79
## 6 Lottery    6 ORL Mohamed Bamba\\bambamo01     1634 107 15.27

We are now ready to perform our ANOVA test to see whether there is a significant difference between the group means.

anova<-aov(mpg~Round, data=Draft2018)
summary(anova)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## Round        2   1597   798.7   16.97 2.01e-06 ***
## Residuals   53   2494    47.1                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
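
To see where the F value and p-value in this table come from, they can be rebuilt from the sums of squares and degrees of freedom. The sketch below plugs in the rounded numbers printed above, so the results agree with the table only up to rounding; the variable names are my own.

```r
# Values read off the ANOVA table (rounded as printed)
ss_round <- 1597   # Sum Sq for Round
ss_resid <- 2494   # Sum Sq for Residuals
df_round <- 2
df_resid <- 53

# Mean squares are sums of squares divided by their degrees of freedom
ms_round <- ss_round / df_round
ms_resid <- ss_resid / df_resid

# F statistic and its upper-tail probability under the F(2, 53) distribution
f_value <- ms_round / ms_resid
p_value <- pf(f_value, df_round, df_resid, lower.tail = FALSE)

print(round(f_value, 2))       # about 16.97
print(signif(p_value, 3))      # about 2.01e-06
```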

Based on the ANOVA test, we can see that there is a significant difference between the groups (F = 16.97, p = 2.01e-06); however, we still do not know which groups differ significantly from each other.

Let’s first plot the groups so that we can see them visually, and then we will perform a post hoc analysis to dig deeper.

boxplot(mpg ~ Round, data=Draft2018)

The boxplot shows us that the Lottery group has separated itself from both Round 1 and Round 2 draft picks. The separation does not look as clear between the Round 1 and Round 2 groups.

Let’s perform our post hoc analysis, Tukey’s Honest Significant Difference test (TukeyHSD in R), to dig deeper.

Post Hoc Analysis

TukeyHSD(anova)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = mpg ~ Round, data = Draft2018)
## 
## $Round
##                       diff       lwr        upr     p adj
## Round 1-Lottery  -8.136667 -14.17631 -2.0970263 0.0056303
## Round 2-Lottery -12.958333 -18.32125 -7.5954131 0.0000010
## Round 2-Round 1  -4.821667 -10.18459  0.5412536 0.0861837
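
The interval widths and adjusted p-values above can be approximately reproduced from the ANOVA table with the studentized range distribution (`qtukey()` and `ptukey()`). This is a sketch under two assumptions: the residual mean square is taken from the rounded table values, and the group sizes are 15, 15, and 26 (30 second-round picks minus the four dropped rows), which I inferred rather than read from the output.

```r
# Residual mean square and degrees of freedom from the ANOVA table (rounded)
mse      <- 2494 / 53
df_resid <- 53

# Assumed group sizes after dropping the four rows with missing values
n <- c("Lottery" = 15, "Round 1" = 15, "Round 2" = 26)

# Critical value of the studentized range for 3 group means at 95%
q_crit <- qtukey(0.95, nmeans = 3, df = df_resid)

# Standard error term used for a pair of groups with sizes n1 and n2
pair_se <- function(n1, n2) sqrt((mse / 2) * (1 / n1 + 1 / n2))

# Half-widths of the confidence intervals (upr - diff)
hw_r1_lottery <- q_crit * pair_se(n[["Round 1"]], n[["Lottery"]])
hw_r2_r1      <- q_crit * pair_se(n[["Round 2"]], n[["Round 1"]])

# Adjusted p-value for the Round 2 vs Round 1 difference reported above
diff_r2_r1 <- -4.821667
p_r2_r1 <- ptukey(abs(diff_r2_r1) / pair_se(n[["Round 2"]], n[["Round 1"]]),
                  nmeans = 3, df = df_resid, lower.tail = FALSE)

print(round(c(hw_r1_lottery, hw_r2_r1), 2))  # close to 6.04 and 5.36
print(round(p_r2_r1, 3))                     # close to 0.086
```

The Round 1 vs Lottery interval is wider than the other two because both of those groups are small, while any comparison involving Round 2 benefits from its larger sample.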

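Based on our TukeyHSD analysis, we can verify what our boxplot visually showed us. There is a significant difference in mean minutes per game between Lottery draft picks and both Round 1 and Round 2 draft picks: Lottery picks average about 8.1 more minutes per game than Round 1 picks and about 13.0 more than Round 2 picks. Both adjusted p-values are below .05, and neither 95% confidence interval includes zero. The difference between Round 1 and Round 2 picks (about 4.8 minutes per game) is not significant at the .05 level (adjusted p = 0.086, and the confidence interval includes zero). Taken together with the ANOVA result, we have enough evidence to reject the null hypothesis that the means of all groups are equal.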