Set Working Directory and import CSV data
setwd("~/R Projects/Hawks Season Project")
HawksSeasons<-read.csv("HawksSeasons.csv")
head(HawksSeasons)
## Season Lg Tm W L W.L. Finish Age Ht. Wt. G MP FG FGA FG.
## 1 2019-20 NBA ATL 20 47 0.299 5 24.1 6-Jun 214 67 16280 2723 6067 0.449
## 2 2018-19 NBA ATL 29 53 0.354 5 25.1 7-Jun 215 82 19855 3392 7524 0.451
## 3 2017-18 NBA ATL 24 58 0.293 5 25.4 6-Jun 212 82 19705 3130 7015 0.446
## 4 2016-17 NBA ATL 43 39 0.524 2 27.9 6-Jun 219 82 19880 3123 6918 0.451
## 5 2015-16 NBA ATL 48 34 0.585 2 28.2 6-Jun 217 82 19830 3168 6923 0.458
## 6 2014-15 NBA ATL 60 22 0.732 1 27.8 6-Jun 218 82 19730 3121 6699 0.466
## X3P X3PA X3P. X2P X2PA X2P. FT FTA FT. ORB DRB TRB AST STL BLK
## 1 805 2416 0.333 1918 3651 0.525 1237 1566 0.790 661 2237 2898 1605 523 341
## 2 1067 3034 0.352 2325 4490 0.518 1443 1918 0.752 955 2825 3780 2118 675 419
## 3 917 2544 0.360 2213 4471 0.495 1298 1654 0.785 743 2693 3436 1946 638 348
## 4 729 2137 0.341 2394 4781 0.501 1484 2039 0.728 842 2793 3635 1938 672 397
## 5 815 2326 0.350 2353 4597 0.512 1282 1638 0.783 679 2772 3451 2100 747 486
## 6 818 2152 0.380 2303 4547 0.506 1349 1735 0.778 715 2611 3326 2111 744 380
## TOV PF PTS
## 1 1086 1548 7488
## 2 1397 1932 9294
## 3 1276 1606 8475
## 4 1294 1491 8459
## 5 1226 1570 8433
## 6 1167 1457 8409
Before we begin, let’s define our columns
Now that we have our columns defined, let’s delete the rows and columns that will not be used in this analysis. Delete row 1 (the 2019-20 season) because we only want the seasons that began in 2000 through 2018. Delete the columns Lg, W, L, Finish, Age, Ht., Wt., and MP because we will not use them in this analysis.
leave<-c(2,4,5,7,8,9,10,12)
HawksSeasons<-HawksSeasons[,-leave]
HawksSeasons<-HawksSeasons[-1,]
head(HawksSeasons)
## Season Tm W.L. G FG FGA FG. X3P X3PA X3P. X2P X2PA X2P. FT
## 2 2018-19 ATL 0.354 82 3392 7524 0.451 1067 3034 0.352 2325 4490 0.518 1443
## 3 2017-18 ATL 0.293 82 3130 7015 0.446 917 2544 0.360 2213 4471 0.495 1298
## 4 2016-17 ATL 0.524 82 3123 6918 0.451 729 2137 0.341 2394 4781 0.501 1484
## 5 2015-16 ATL 0.585 82 3168 6923 0.458 815 2326 0.350 2353 4597 0.512 1282
## 6 2014-15 ATL 0.732 82 3121 6699 0.466 818 2152 0.380 2303 4547 0.506 1349
## 7 2013-14 ATL 0.463 82 3061 6688 0.458 768 2116 0.363 2293 4572 0.502 1392
## FTA FT. ORB DRB TRB AST STL BLK TOV PF PTS
## 2 1918 0.752 955 2825 3780 2118 675 419 1397 1932 9294
## 3 1654 0.785 743 2693 3436 1946 638 348 1276 1606 8475
## 4 2039 0.728 842 2793 3635 1938 672 397 1294 1491 8459
## 5 1638 0.783 679 2772 3451 2100 747 486 1226 1570 8433
## 6 1735 0.778 715 2611 3326 2111 744 380 1167 1457 8409
## 7 1782 0.781 713 2565 3278 2041 680 326 1251 1577 8282
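Dropping columns by position is fragile: if the column order in the CSV ever changes, the same indices silently point at different columns. A minimal sketch of the same step done by name, using a small stand-in data frame (`toy` and `drop_cols` are hypothetical names, and only a few of the columns shown above are included):

```r
# Stand-in for the imported data: a few columns from the first three rows shown above
toy <- data.frame(
  Season = c("2019-20", "2018-19", "2017-18"),
  Lg     = c("NBA", "NBA", "NBA"),
  Tm     = c("ATL", "ATL", "ATL"),
  W      = c(20, 29, 24),
  L      = c(47, 53, 58),
  W.L.   = c(0.299, 0.354, 0.293),
  G      = c(67, 82, 82),
  PTS    = c(7488, 9294, 8475)
)

# Columns to drop, referenced by name rather than by position
drop_cols <- c("Lg", "W", "L")
toy <- toy[, !(names(toy) %in% drop_cols)]

# Drop the 2019-20 season by its value rather than by its row number
toy <- toy[toy$Season != "2019-20", ]

print(toy)
```

Selecting by name also makes the code document itself, since the reader does not have to count columns to see what was removed.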
Let’s replace the values in the Season column with the year in which each season begins, which runs from 2018 down to 2000
HawksSeasons$Season<-2018:2000
head(HawksSeasons)
## Season Tm W.L. G FG FGA FG. X3P X3PA X3P. X2P X2PA X2P. FT FTA
## 2 2018 ATL 0.354 82 3392 7524 0.451 1067 3034 0.352 2325 4490 0.518 1443 1918
## 3 2017 ATL 0.293 82 3130 7015 0.446 917 2544 0.360 2213 4471 0.495 1298 1654
## 4 2016 ATL 0.524 82 3123 6918 0.451 729 2137 0.341 2394 4781 0.501 1484 2039
## 5 2015 ATL 0.585 82 3168 6923 0.458 815 2326 0.350 2353 4597 0.512 1282 1638
## 6 2014 ATL 0.732 82 3121 6699 0.466 818 2152 0.380 2303 4547 0.506 1349 1735
## 7 2013 ATL 0.463 82 3061 6688 0.458 768 2116 0.363 2293 4572 0.502 1392 1782
## FT. ORB DRB TRB AST STL BLK TOV PF PTS
## 2 0.752 955 2825 3780 2118 675 419 1397 1932 9294
## 3 0.785 743 2693 3436 1946 638 348 1276 1606 8475
## 4 0.728 842 2793 3635 1938 672 397 1294 1491 8459
## 5 0.783 679 2772 3451 2100 747 486 1226 1570 8433
## 6 0.778 715 2611 3326 2111 744 380 1167 1457 8409
## 7 0.781 713 2565 3278 2041 680 326 1251 1577 8282
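Assigning a hard-coded sequence works only while the rows stay in exactly this order. As an alternative sketch, the start year can be read out of the original season label itself, assuming the labels keep the "YYYY-YY" form shown earlier (`season_labels` and `start_year` are hypothetical names):

```r
# Season labels in the form they have in the imported data
season_labels <- c("2018-19", "2017-18", "2016-17")

# Take the first four characters and convert them to an integer start year
start_year <- as.integer(substr(season_labels, 1, 4))

print(start_year)
```

Because each year is derived from its own row, the result stays correct even if rows are later reordered or filtered.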
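Now let’s divide the counting-stat columns by 82 games to get per-game averages instead of season totals. Note that this assumes every season had 82 games; the lockout-shortened 2011-12 season had only 66, so its averages are understated here, and dividing by the G column would be more robust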
divisiblestats<-c(5,6,8,9,11,12,14,15,17,18,19,20,21,22,23,24,25)
nondivisiblestats<-c(1,2,3,4,7,10,13,16)
HawksAverage<-HawksSeasons[,divisiblestats]/82
HawksAverage<-round(HawksAverage,2)
HawksSame<-HawksSeasons[,nondivisiblestats]
HawksSeasons<-cbind(HawksSame,HawksAverage)
head(HawksSeasons)
## Season Tm W.L. G FG. X3P. X2P. FT. FG FGA X3P X3PA X2P
## 2 2018 ATL 0.354 82 0.451 0.352 0.518 0.752 41.37 91.76 13.01 37.00 28.35
## 3 2017 ATL 0.293 82 0.446 0.360 0.495 0.785 38.17 85.55 11.18 31.02 26.99
## 4 2016 ATL 0.524 82 0.451 0.341 0.501 0.728 38.09 84.37 8.89 26.06 29.20
## 5 2015 ATL 0.585 82 0.458 0.350 0.512 0.783 38.63 84.43 9.94 28.37 28.70
## 6 2014 ATL 0.732 82 0.466 0.380 0.506 0.778 38.06 81.70 9.98 26.24 28.09
## 7 2013 ATL 0.463 82 0.458 0.363 0.502 0.781 37.33 81.56 9.37 25.80 27.96
## X2PA FT FTA ORB DRB TRB AST STL BLK TOV PF PTS
## 2 54.76 17.60 23.39 11.65 34.45 46.10 25.83 8.23 5.11 17.04 23.56 113.34
## 3 54.52 15.83 20.17 9.06 32.84 41.90 23.73 7.78 4.24 15.56 19.59 103.35
## 4 58.30 18.10 24.87 10.27 34.06 44.33 23.63 8.20 4.84 15.78 18.18 103.16
## 5 56.06 15.63 19.98 8.28 33.80 42.09 25.61 9.11 5.93 14.95 19.15 102.84
## 6 55.45 16.45 21.16 8.72 31.84 40.56 25.74 9.07 4.63 14.23 17.77 102.55
## 7 55.76 16.98 21.73 8.70 31.28 39.98 24.89 8.29 3.98 15.26 19.23 101.00
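Not every season in the source data has 82 games (the first row of the raw import shows G = 67 for 2019-20). A hedged sketch of computing per-game averages from the G column instead of a fixed 82, using a small stand-in data frame built from values shown above (`toy`, `count_cols`, and `per_game` are hypothetical names):

```r
# Stand-in data: season totals and games played, taken from the raw import above
toy <- data.frame(
  Season = c("2019-20", "2018-19", "2017-18"),
  G      = c(67, 82, 82),
  PTS    = c(7488, 9294, 8475),
  AST    = c(1605, 2118, 1946)
)

# Counting stats to convert from season totals to per-game averages
count_cols <- c("PTS", "AST")

# Dividing a data frame by a vector with one value per row divides each row by its own G
per_game <- round(toy[, count_cols] / toy$G, 2)

print(cbind(toy["Season"], per_game))
```

For the 82-game seasons this gives the same numbers as dividing by 82, so only shortened seasons change.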
It turns out I do not need the “Tm” column (the team name) or the “G” column (the number of games played), since every row is the same team and the per-game averages have already been computed
leave2<-c(2,4)
HawksSeasons<-HawksSeasons[,-leave2]
head(HawksSeasons)
## Season W.L. FG. X3P. X2P. FT. FG FGA X3P X3PA X2P X2PA
## 2 2018 0.354 0.451 0.352 0.518 0.752 41.37 91.76 13.01 37.00 28.35 54.76
## 3 2017 0.293 0.446 0.360 0.495 0.785 38.17 85.55 11.18 31.02 26.99 54.52
## 4 2016 0.524 0.451 0.341 0.501 0.728 38.09 84.37 8.89 26.06 29.20 58.30
## 5 2015 0.585 0.458 0.350 0.512 0.783 38.63 84.43 9.94 28.37 28.70 56.06
## 6 2014 0.732 0.466 0.380 0.506 0.778 38.06 81.70 9.98 26.24 28.09 55.45
## 7 2013 0.463 0.458 0.363 0.502 0.781 37.33 81.56 9.37 25.80 27.96 55.76
## FT FTA ORB DRB TRB AST STL BLK TOV PF PTS
## 2 17.60 23.39 11.65 34.45 46.10 25.83 8.23 5.11 17.04 23.56 113.34
## 3 15.83 20.17 9.06 32.84 41.90 23.73 7.78 4.24 15.56 19.59 103.35
## 4 18.10 24.87 10.27 34.06 44.33 23.63 8.20 4.84 15.78 18.18 103.16
## 5 15.63 19.98 8.28 33.80 42.09 25.61 9.11 5.93 14.95 19.15 102.84
## 6 16.45 21.16 8.72 31.84 40.56 25.74 9.07 4.63 14.23 17.77 102.55
## 7 16.98 21.73 8.70 31.28 39.98 24.89 8.29 3.98 15.26 19.23 101.00
Let’s make the “W.L.” column name more descriptive by renaming it “Win Percent”
names(HawksSeasons)[names(HawksSeasons)=="W.L."]<-"Win Percent"
head(HawksSeasons)
## Season Win Percent FG. X3P. X2P. FT. FG FGA X3P X3PA X2P
## 2 2018 0.354 0.451 0.352 0.518 0.752 41.37 91.76 13.01 37.00 28.35
## 3 2017 0.293 0.446 0.360 0.495 0.785 38.17 85.55 11.18 31.02 26.99
## 4 2016 0.524 0.451 0.341 0.501 0.728 38.09 84.37 8.89 26.06 29.20
## 5 2015 0.585 0.458 0.350 0.512 0.783 38.63 84.43 9.94 28.37 28.70
## 6 2014 0.732 0.466 0.380 0.506 0.778 38.06 81.70 9.98 26.24 28.09
## 7 2013 0.463 0.458 0.363 0.502 0.781 37.33 81.56 9.37 25.80 27.96
## X2PA FT FTA ORB DRB TRB AST STL BLK TOV PF PTS
## 2 54.76 17.60 23.39 11.65 34.45 46.10 25.83 8.23 5.11 17.04 23.56 113.34
## 3 54.52 15.83 20.17 9.06 32.84 41.90 23.73 7.78 4.24 15.56 19.59 103.35
## 4 58.30 18.10 24.87 10.27 34.06 44.33 23.63 8.20 4.84 15.78 18.18 103.16
## 5 56.06 15.63 19.98 8.28 33.80 42.09 25.61 9.11 5.93 14.95 19.15 102.84
## 6 55.45 16.45 21.16 8.72 31.84 40.56 25.74 9.07 4.63 14.23 17.77 102.55
## 7 55.76 16.98 21.73 8.70 31.28 39.98 24.89 8.29 3.98 15.26 19.23 101.00
Let’s correlate all of our statistical columns with the Win Percent column to get a better idea of which stats have a positive or negative relationship with win percentage
win_correlations<-sort(round(cor(HawksSeasons[,c(2:23)]),digits = 2)[,1],decreasing = TRUE)
win_correlations
## Win Percent FG. X3P. X2P. AST X3P
## 1.00 0.77 0.62 0.57 0.36 0.31
## X3PA FT. STL DRB FG PTS
## 0.26 0.12 0.11 0.09 0.07 0.07
## BLK FGA X2P FT TRB FTA
## 0.03 -0.22 -0.33 -0.34 -0.36 -0.38
## X2PA ORB TOV PF
## -0.51 -0.65 -0.71 -0.77
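FG., which combines both 2-point and 3-point field goals, has the strongest positive correlation with win percentage (0.77), while PF has the strongest negative correlation (-0.77). The weakest relationship is with BLK (0.03). It is interesting that total points (PTS), at 0.07, is also almost uncorrelated with winning.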
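Computing the full correlation matrix and then keeping one column does more work than needed. A minimal sketch that correlates each stat with win percentage directly, using `cor(x, y)` on a six-row stand-in built from the rows printed above (`toy`, `predictors`, and `win_cor` are hypothetical names; with only six seasons and three stats, the values will not match the full 19-season results):

```r
# Stand-in data: six seasons (2018 down to 2013) from the output shown above
toy <- data.frame(
  WinPct = c(0.354, 0.293, 0.524, 0.585, 0.732, 0.463),
  FGpct  = c(0.451, 0.446, 0.451, 0.458, 0.466, 0.458),
  TOV    = c(17.04, 15.56, 15.78, 14.95, 14.23, 15.26),
  PF     = c(23.56, 19.59, 18.18, 19.15, 17.77, 19.23)
)

# Every column except the response
predictors <- setdiff(names(toy), "WinPct")

# cor(x, y) returns a one-column matrix here; [, 1] turns it into a named vector
win_cor <- cor(toy[, predictors], toy$WinPct)[, 1]
win_cor <- sort(round(win_cor, 2), decreasing = TRUE)

print(win_cor)
```

When reading the result, the sign gives the direction of the relationship and the absolute value gives its strength, so a value near -1 is as informative as one near 1.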
Now let’s perform a regression analysis to see which stats could help predict Win% for the upcoming season
x<-HawksSeasons$FG.
y<-HawksSeasons$`Win Percent`
mod<-lm(y~x)
summary(mod)
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.18682 -0.05368 0.01957 0.06868 0.12636
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.2374 0.9392 -4.512 0.000308 ***
## x 10.3928 2.0802 4.996 0.000110 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.09456 on 17 degrees of freedom
## Multiple R-squared: 0.5949, Adjusted R-squared: 0.571
## F-statistic: 24.96 on 1 and 17 DF, p-value: 0.0001105
After a series of tests with different variables, we found that FG. fits the model best: it yields an R-squared of nearly 60% (0.5949), meaning field goal percentage alone explains about 59% of the variation in win percentage
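The comparison between candidate variables is not shown here, so the following is only a sketch of one way such a comparison could be run: fit a one-variable model per candidate and collect each R-squared. It uses a six-row stand-in built from the rows printed above (`toy`, `candidates`, and `r_squared` are hypothetical names), so the numbers will differ from the full 19-season fit.

```r
# Stand-in data: six seasons (2018 down to 2013) from the output shown above
toy <- data.frame(
  WinPct = c(0.354, 0.293, 0.524, 0.585, 0.732, 0.463),
  FGpct  = c(0.451, 0.446, 0.451, 0.458, 0.466, 0.458),
  TOV    = c(17.04, 15.56, 15.78, 14.95, 14.23, 15.26),
  PF     = c(23.56, 19.59, 18.18, 19.15, 17.77, 19.23)
)

# Candidate single predictors to compare
candidates <- c("FGpct", "TOV", "PF")

# Fit WinPct ~ <candidate> for each one and keep the R-squared of the fit
r_squared <- sapply(candidates, function(v) {
  fit <- lm(reformulate(v, response = "WinPct"), data = toy)
  summary(fit)$r.squared
})

print(round(sort(r_squared, decreasing = TRUE), 3))
```

For a single predictor, R-squared is simply the squared correlation, so this ranking follows the absolute correlations with win percentage.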
Now it’s time to plot our Win% and FG. points, along with the line of best fit
plot(x,y, main="Atlanta Hawks Season's Win Percentage \n Regression Analysis", xlab="Field Goal Percentage",ylab="Winning Percentage")
text(.46,.2,"y=-4.237+10.393(X)")
abline(mod, col="red")
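To make the fitted line concrete, here is a short worked example that plugs a field goal percentage into the equation, using the coefficients reported in the regression summary above. The input of 0.46 and the names `predict_win_pct`, `fg`, `win_pct`, and `wins` are illustrative assumptions, not part of the original analysis.

```r
# Coefficients copied from the regression summary above
intercept <- -4.2374
slope     <- 10.3928

# Predicted win percentage for a given field goal percentage
predict_win_pct <- function(fg_pct) intercept + slope * fg_pct

fg      <- 0.46                   # hypothetical season FG%
win_pct <- predict_win_pct(fg)    # about 0.543
wins    <- win_pct * 82           # expected wins over an 82-game season

# Effect of one extra percentage point of FG% (0.01) on win percentage
gain_per_point <- slope * 0.01    # about 0.104, roughly 8.5 wins over 82 games

print(round(c(win_pct = win_pct, wins = wins, gain_per_point = gain_per_point), 3))
```

With the fitted model object available, `predict(mod, newdata = data.frame(x = 0.46))` should return the same prediction. The line is only meaningful within the range of FG. values observed in the data; far outside that range it produces win percentages below 0 or above 1.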