For this analysis, my goal is to understand if there is a statistically significant difference, in the SRS Mean, between four groups in the NBA. First let’s discuss what SRS is, when it comes to the NBA. SRS, also known as Simple Rating System, is a system introduced by Sports Reference and used by Basketball Reference, Pro Football Focus, and others as well. It considers Average Point Differential and Strength of Schedule. For instance, the 2006-07 Spurs won games by an average of 8.43 points per game and played a schedule with opponents that were 0.08 points worse than average, giving them an SRS of 8.35. This means they were 8.35 points better than an average team. An average team would have an SRS of 0.0.
Let’s do a quick reminder of how the Anova is defined. Analysis of Variance (ANOVA) is a statistical technique, commonly used to studying differences between two or more group means.ANOVA in R primarily provides evidence of the existence of the mean equality between the groups. This statistical method is an extension of the t-test and is used in a situation where the factor variable has more than one group.
I have broken this analysis down into the following 4 Groups:
Lets begin our analysis to see if there is a statistical significant SRS Mean difference between the 4 Groups by defining our Hypothesis.
nbaSRS <- read.csv("~/R Projects/nbaSRS/nbaSRS.csv")
Before we move further in our analysis, let’s take a quick look at our data to look for any missing rows, incorrect columns, duplicate rows, etc.
str(nbaSRS)
## 'data.frame': 31 obs. of 8 variables:
## $ Eastern.Conference: Factor w/ 31 levels "Atlanta Hawks ",..: 17 28 23 2 12 3 22 9 4 16 ...
## $ W : Factor w/ 21 levels "17","19","22",..: 20 19 15 13 12 11 11 10 9 9 ...
## $ L : Factor w/ 21 levels "22","24","25",..: 1 2 6 8 9 10 10 11 12 12 ...
## $ W.L. : Factor w/ 21 levels "0.207","0.232",..: 20 19 15 13 12 11 11 10 9 9 ...
## $ GB : Factor w/ 20 levels "—","11","12",..: 1 6 19 2 3 4 4 5 8 8 ...
## $ PS.G : Factor w/ 29 levels "103.5","104.5",..: 28 21 25 15 9 14 7 6 11 5 ...
## $ PA.G : Factor w/ 30 levels "104.7","105.9",..: 11 9 19 8 1 18 5 7 17 2 ...
## $ SRS : Factor w/ 31 levels "-0.4","-0.45",..: 30 28 20 23 21 1 17 3 8 2 ...
head(nbaSRS,31)
## Eastern.Conference W L W.L. GB PS.G PA.G SRS
## 1 Milwaukee Bucks 60 22 0.732 — 118.1 109.3 8.04
## 2 Toronto Raptors 58 24 0.707 2 114.4 108.4 5.49
## 3 Philadelphia 76ers 51 31 0.622 9 115.2 112.5 2.25
## 4 Boston Celtics 49 33 0.598 11 112.4 108 3.9
## 5 Indiana Pacers 48 34 0.585 12 108 104.7 2.76
## 6 Brooklyn Nets 42 40 0.512 18 112.2 112.3 -0.4
## 7 Orlando Magic 42 40 0.512 18 107.3 106.6 0.28
## 8 Detroit Pistons 41 41 0.5 19 107 107.3 -0.56
## 9 Charlotte Hornets 39 43 0.476 21 110.7 111.8 -1.32
## 10 Miami Heat 39 43 0.476 21 105.7 105.9 -0.45
## 11 Washington Wizards 32 50 0.39 28 114 116.9 -3.3
## 12 Atlanta Hawks 29 53 0.354 31 113.3 119.4 -6.06
## 13 Chicago Bulls 22 60 0.268 38 104.9 113.4 -8.32
## 14 Cleveland Cavaliers 19 63 0.232 41 104.5 114.1 -9.39
## 15 New York Knicks 17 65 0.207 43 104.6 113.8 -8.93
## 16 Western Conference W L W/L% GB PS/G PA/G SRS
## 17 Golden State Warriors 57 25 0.695 — 117.7 111.2 6.42
## 18 Denver Nuggets 54 28 0.659 3 110.7 106.7 4.19
## 19 Portland Trail Blazers 53 29 0.646 4 114.7 110.5 4.43
## 20 Houston Rockets 53 29 0.646 4 113.9 109.1 4.96
## 21 Utah Jazz 50 32 0.61 7 111.7 106.5 5.28
## 22 Oklahoma City Thunder 49 33 0.598 8 114.5 111.1 3.56
## 23 San Antonio Spurs 48 34 0.585 9 111.7 110 1.8
## 24 Los Angeles Clippers 48 34 0.585 9 115.1 114.3 1.09
## 25 Sacramento Kings 39 43 0.476 18 114.2 115.3 -0.81
## 26 Los Angeles Lakers 37 45 0.451 20 111.8 113.5 -1.33
## 27 Minnesota Timberwolves 36 46 0.439 21 112.5 114 -1.02
## 28 Memphis Grizzlies 33 49 0.402 24 103.5 106.1 -2.08
## 29 New Orleans Pelicans 33 49 0.402 24 115.4 116.8 -1.1
## 30 Dallas Mavericks 33 49 0.402 24 108.9 110.1 -0.87
## 31 Phoenix Suns 19 63 0.232 38 107.5 116.8 -8.61
I did notice that my SRS column is in a “Factor” format; however, I need to change it to “Numeric” in order to run my Anova Test. Let’s do that now.
nbaSRS$SRS<-as.numeric(nbaSRS$SRS)
str(nbaSRS)
## 'data.frame': 31 obs. of 8 variables:
## $ Eastern.Conference: Factor w/ 31 levels "Atlanta Hawks ",..: 17 28 23 2 12 3 22 9 4 16 ...
## $ W : Factor w/ 21 levels "17","19","22",..: 20 19 15 13 12 11 11 10 9 9 ...
## $ L : Factor w/ 21 levels "22","24","25",..: 1 2 6 8 9 10 10 11 12 12 ...
## $ W.L. : Factor w/ 21 levels "0.207","0.232",..: 20 19 15 13 12 11 11 10 9 9 ...
## $ GB : Factor w/ 20 levels "—","11","12",..: 1 6 19 2 3 4 4 5 8 8 ...
## $ PS.G : Factor w/ 29 levels "103.5","104.5",..: 28 21 25 15 9 14 7 6 11 5 ...
## $ PA.G : Factor w/ 30 levels "104.7","105.9",..: 11 9 19 8 1 18 5 7 17 2 ...
## $ SRS : num 30 28 20 23 21 1 17 3 8 2 ...
Since my goal for the Anova Test is to look for significant differences in the means between four groups, I need to create a column for the groups. My groups will be the following:
nbaSRS$Groups<-c("East_P","East_P","East_P","East_P","East_P","East_P","East_P","East_P","East_NP","East_NP","East_NP","East_NP","East_NP","East_NP","East_NP","West_P","West_P","West_P","West_P","West_P","West_P","West_P","West_P","West_NP","West_NP","West_NP","West_NP","West_NP","West_NP","West_NP","West_NP")
View (nbaSRS)
One last data clean up before I perform the test. About half way down the dataset, the Western Conference Header Row is located on the 16th row. I do not need that for my analysis, so will remove that completely.
WestHeader<-16
nbaSRS<-nbaSRS[-WestHeader,]
head(nbaSRS,16)
## Eastern.Conference W L W.L. GB PS.G PA.G SRS Groups
## 1 Milwaukee Bucks 60 22 0.732 — 118.1 109.3 30 East_P
## 2 Toronto Raptors 58 24 0.707 2 114.4 108.4 28 East_P
## 3 Philadelphia 76ers 51 31 0.622 9 115.2 112.5 20 East_P
## 4 Boston Celtics 49 33 0.598 11 112.4 108 23 East_P
## 5 Indiana Pacers 48 34 0.585 12 108 104.7 21 East_P
## 6 Brooklyn Nets 42 40 0.512 18 112.2 112.3 1 East_P
## 7 Orlando Magic 42 40 0.512 18 107.3 106.6 17 East_P
## 8 Detroit Pistons 41 41 0.5 19 107 107.3 3 East_P
## 9 Charlotte Hornets 39 43 0.476 21 110.7 111.8 8 East_NP
## 10 Miami Heat 39 43 0.476 21 105.7 105.9 2 East_NP
## 11 Washington Wizards 32 50 0.39 28 114 116.9 11 East_NP
## 12 Atlanta Hawks 29 53 0.354 31 113.3 119.4 12 East_NP
## 13 Chicago Bulls 22 60 0.268 38 104.9 113.4 13 East_NP
## 14 Cleveland Cavaliers 19 63 0.232 41 104.5 114.1 16 East_NP
## 15 New York Knicks 17 65 0.207 43 104.6 113.8 15 East_NP
## 17 Golden State Warriors 57 25 0.695 — 117.7 111.2 29 West_P
At this time I am able to perform the Anova test to see if there is a statistically significant difference in the SRS Means between the four groups.
anova<-aov(SRS~ Groups, data=nbaSRS)
summary(anova)
## Df Sum Sq Mean Sq F value Pr(>F)
## Groups 3 1088 362.7 8.133 0.000553 ***
## Residuals 26 1160 44.6
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
After performing the Anova test, I can clearly see that there is a statistically significant difference in the SRS Means between the groups, based on the p-value that was returned. Since the p-value is below the .05, we have enough evidence to reject the Null Hypothesis.
I am not completely finish with the analysis at this time. The Anova test explains that there is a statistically significant difference in the SRS Means between the groups; however, it doesn’t necessarily tell me which group. To understand which group, I will perform a Post Hoc Analysis test called TukeyHSD.
TukeyHSD(anova)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = SRS ~ Groups, data = nbaSRS)
##
## $Groups
## diff lwr upr p adj
## East_P-East_NP 6.875000 -2.606348 16.3563478 0.2178261
## West_NP-East_NP -1.875000 -11.356348 7.6063478 0.9477304
## West_P-East_NP 13.571429 3.779135 23.3637225 0.0040891
## West_NP-East_P -8.750000 -17.909852 0.4098522 0.0650040
## West_P-East_P 6.696429 -2.784919 16.1777764 0.2375479
## West_P-West_NP 15.446429 5.965081 24.9277764 0.0007429
After running the TukeyHSD test, I noticed that I can’t reject the Null Hypothesis on every scenario; however, I can for 3 of the scenarios:
This informs me that there is a 12.75, -10.01, and 15.89 statistical difference in the Means between the groups, respectively.
Thank you you for reviewing this analysis! I invite you to check out my other R, SQL, Excel, and Tableau projects on the following websites:
Thanks!!!