Download R: https://cloud.r-project.org/
Download RStudio (please choose RStudio Desktop): https://www.rstudio.com/products/rstudio/download/#download
Download the datasets for these lessons: https://drive.google.com/drive/folders/13AJY5Z1j1Uq1pCsgjXDxghBy8-z9pMWI?usp=sharing
I like to use sports data, not only because I enjoy it, but also because it is some of the most readily available numerical data and is generally easy to understand. Just ask me if you have any questions about anything covered here.
The examples below use a small dataset of ten players with these statistics: field goal makes (FGM), field goal attempts (FGA), three-point makes (TPM), three-point attempts (TPA), free throw makes (FTM), and free throw attempts (FTA).
## # A tibble: 10 x 7
##    PLAYER            FGM   FGA   TPM   TPA   FTM   FTA
##    <chr>           <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1 Ray Allen         804  1596   401   877   362   390
##  2 Gary Payton       571  1341   118   328   278   348
##  3 Shawn Kemp        470  1033    63   270   397   477
##  4 Mitch Richmond    710  1617   236   657   720   837
##  5 Rick Barry        698  1381   186   480   447   498
##  6 Oscar Robertson   737  1416    87   282   359   491
##  7 Hakeem Olajuwon   498  1112   126   342   250   280
##  8 Vince Carter      373   886    81   243   201   240
##  9 Moses Malone      215   442     0     2    92   131
## 10 Patrick Ewing     552  1061     2     6   208   586
How can we load this data into R? With vectors! The fundamental unit of data storage in R is the vector, an ordered collection of values. We can enter each column of the table above as its own vector. Because R applies arithmetic to every element of a vector at once, vectors let us run the same calculation on a whole group of data in one step.
fgm <- c(804, 571, 470, 710, 698, 737, 498, 373, 215, 552)
fgm
## [1] 804 571 470 710 698 737 498 373 215 552
Practice: create vectors for all the other categories in the same way.
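For checking your work, here is one way to enter the remaining columns, copied from the table above in the same player order as fgm. The lowercase names follow the style used for fgm; fga is the name used later in this lesson, while tpm, tpa, ftm and fta are an assumed naming choice.

```r
# Remaining columns of the table, one vector per statistic,
# listed in the same player order as fgm
fga <- c(1596, 1341, 1033, 1617, 1381, 1416, 1112, 886, 442, 1061)
tpm <- c(401, 118, 63, 236, 186, 87, 126, 81, 0, 2)
tpa <- c(877, 328, 270, 657, 480, 282, 342, 243, 2, 6)
ftm <- c(362, 278, 397, 720, 447, 359, 250, 201, 92, 208)
fta <- c(390, 348, 477, 837, 498, 491, 280, 240, 131, 586)

# Each vector should hold exactly one value per player
lengths(list(fga = fga, tpm = tpm, tpa = tpa, ftm = ftm, fta = fta))
```

Checking the lengths is a cheap way to catch a skipped or duplicated number before doing any math with the vectors.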
Vectors in R don’t always have to hold numbers. They can also hold text, which R calls character strings. Here is an example of creating a character vector of player names. Note that every element of a vector must have the same type: a single vector holds only numbers or only character strings, not a mix.
players <- c("Ray Allen", "Gary Payton", "Shawn Kemp", "Mitch Richmond", "Rick Barry", "Oscar Robertson", "Hakeem Olajuwon", "Vince Carter", "Moses Malone", "Patrick Ewing")
players
## [1] "Ray Allen" "Gary Payton" "Shawn Kemp"
## [4] "Mitch Richmond" "Rick Barry" "Oscar Robertson"
## [7] "Hakeem Olajuwon" "Vince Carter" "Moses Malone"
## [10] "Patrick Ewing"
We know that the first element of fgm is Ray Allen’s FGM, but when we print fgm, R does not tell us which value belongs to which player.
fgm
## [1] 804 571 470 710 698 737 498 373 215 552
As the analysis proceeds, it will be useful to name the elements of the vector. We can do that with the names() function: put the vector whose elements we want to name inside the parentheses, and on the right-hand side of <- put a character vector of the same length containing the element names. In the example below, we set the names of fgm to the elements of the vector players.
names(fgm) <- players
fgm
## Ray Allen Gary Payton Shawn Kemp Mitch Richmond
## 804 571 470 710
## Rick Barry Oscar Robertson Hakeem Olajuwon Vince Carter
## 698 737 498 373
## Moses Malone Patrick Ewing
## 215 552
Practice: name the elements of the other category vectors so that they correspond to the players.
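A sketch of one possible answer for two of the vectors, assuming the vector names fga and tpa; tpm, ftm and fta follow the same pattern. It also uses base R’s setNames(), which returns a named copy in a single call, as an alternative to names() <-. The block repeats the players vector and the data so that it runs on its own.

```r
players <- c("Ray Allen", "Gary Payton", "Shawn Kemp", "Mitch Richmond",
             "Rick Barry", "Oscar Robertson", "Hakeem Olajuwon",
             "Vince Carter", "Moses Malone", "Patrick Ewing")
fga <- c(1596, 1341, 1033, 1617, 1381, 1416, 1112, 886, 442, 1061)

# Same pattern used for fgm: assign the player names to the elements
names(fga) <- players

# setNames() builds the vector and names it in one step
tpa <- setNames(c(877, 328, 270, 657, 480, 282, 342, 243, 2, 6), players)

# Named elements can now be looked up by player
fga["Ray Allen"]
tpa["Ray Allen"]
```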
Vector math in R is simple: when you combine a vector with a single number, the operation is applied to every element and the names are kept, as in the examples below.
fgm+2
## Ray Allen Gary Payton Shawn Kemp Mitch Richmond
## 806 573 472 712
## Rick Barry Oscar Robertson Hakeem Olajuwon Vince Carter
## 700 739 500 375
## Moses Malone Patrick Ewing
## 217 554
fgm-11
## Ray Allen Gary Payton Shawn Kemp Mitch Richmond
## 793 560 459 699
## Rick Barry Oscar Robertson Hakeem Olajuwon Vince Carter
## 687 726 487 362
## Moses Malone Patrick Ewing
## 204 541
fgm/3
## Ray Allen Gary Payton Shawn Kemp Mitch Richmond
## 268.00000 190.33333 156.66667 236.66667
## Rick Barry Oscar Robertson Hakeem Olajuwon Vince Carter
## 232.66667 245.66667 166.00000 124.33333
## Moses Malone Patrick Ewing
## 71.66667 184.00000
fgm*2
## Ray Allen Gary Payton Shawn Kemp Mitch Richmond
## 1608 1142 940 1420
## Rick Barry Oscar Robertson Hakeem Olajuwon Vince Carter
## 1396 1474 996 746
## Moses Malone Patrick Ewing
## 430 1104
fgm^2
## Ray Allen Gary Payton Shawn Kemp Mitch Richmond
## 646416 326041 220900 504100
## Rick Barry Oscar Robertson Hakeem Olajuwon Vince Carter
## 487204 543169 248004 139129
## Moses Malone Patrick Ewing
## 46225 304704
To make a Field Goal Percentage (shooting percentage) category, we can do vector math with two vectors we already have, fgm and fga. Dividing one vector by another works element by element, so each player’s makes are divided by his own attempts.
fgp <- fgm / fga
fgp
## Ray Allen Gary Payton Shawn Kemp Mitch Richmond
## 0.5037594 0.4258016 0.4549855 0.4390847
## Rick Barry Oscar Robertson Hakeem Olajuwon Vince Carter
## 0.5054308 0.5204802 0.4478417 0.4209932
## Moses Malone Patrick Ewing
## 0.4864253 0.5202639
Now try this with Three Point Percentage (TPP) and Free Throw Percentage (FTP).
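One possible solution, assuming the lowercase names tpp and ftp (mirroring fgp) and named vectors built from the table. The block is self-contained, so it re-creates the vectors it needs.

```r
players <- c("Ray Allen", "Gary Payton", "Shawn Kemp", "Mitch Richmond",
             "Rick Barry", "Oscar Robertson", "Hakeem Olajuwon",
             "Vince Carter", "Moses Malone", "Patrick Ewing")
tpm <- setNames(c(401, 118, 63, 236, 186, 87, 126, 81, 0, 2), players)
tpa <- setNames(c(877, 328, 270, 657, 480, 282, 342, 243, 2, 6), players)
ftm <- setNames(c(362, 278, 397, 720, 447, 359, 250, 201, 92, 208), players)
fta <- setNames(c(390, 348, 477, 837, 498, 491, 280, 240, 131, 586), players)

# Element-wise division: each player's makes over his own attempts
tpp <- tpm / tpa
ftp <- ftm / fta

# Rounded to three decimals for easier reading
round(tpp, 3)
round(ftp, 3)
```

Keep the number of attempts in mind when reading these: Moses Malone’s 0 makes on 2 attempts gives a TPP of 0, and a player with zero attempts would get NaN, because 0/0 is NaN in R.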
R also has built-in functions for basic descriptive statistics.
sum(fgm)
## [1] 5628
mean(fgm)
## [1] 562.8
median(fgm)
## [1] 561.5
min(fgm)
## [1] 215
max(fgm)
## [1] 804
sd(fgm)
## [1] 182.103
range(fgm)
## [1] 215 804
sort(fgm)
## Moses Malone Vince Carter Shawn Kemp Hakeem Olajuwon
## 215 373 470 498
## Patrick Ewing Gary Payton Rick Barry Mitch Richmond
## 552 571 698 710
## Oscar Robertson Ray Allen
## 737 804
quantile(fgm)
## 0% 25% 50% 75% 100%
## 215.0 477.0 561.5 707.0 804.0
summary(fgm)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 215.0 477.0 561.5 562.8 707.0 804.0
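min() and max() return the values but drop the player names. To find out who has the lowest or highest value, which.min() and which.max() return the position of that element, and on a named vector the result keeps the name. A small sketch, re-creating the named fgm vector so it runs on its own:

```r
players <- c("Ray Allen", "Gary Payton", "Shawn Kemp", "Mitch Richmond",
             "Rick Barry", "Oscar Robertson", "Hakeem Olajuwon",
             "Vince Carter", "Moses Malone", "Patrick Ewing")
fgm <- setNames(c(804, 571, 470, 710, 698, 737, 498, 373, 215, 552), players)

# Position of the largest and smallest value, labelled with the player
which.max(fgm)
which.min(fgm)

# names() keeps only the player's name
names(which.max(fgm))
```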
To look at the data for specific players, we can use subsetting, which extracts parts of a vector by position using square brackets.
fgp[4]
## Mitch Richmond
## 0.4390847
fgp[8]
## Vince Carter
## 0.4209932
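Subsetting is not limited to one position at a time. A short sketch (re-creating fgm, fga and fgp so it is self-contained) of a few other forms of bracket indexing in base R:

```r
players <- c("Ray Allen", "Gary Payton", "Shawn Kemp", "Mitch Richmond",
             "Rick Barry", "Oscar Robertson", "Hakeem Olajuwon",
             "Vince Carter", "Moses Malone", "Patrick Ewing")
fgm <- setNames(c(804, 571, 470, 710, 698, 737, 498, 373, 215, 552), players)
fga <- c(1596, 1341, 1033, 1617, 1381, 1416, 1112, 886, 442, 1061)
fgp <- fgm / fga   # keeps the names from fgm

# Several positions at once: Mitch Richmond (4th) and Vince Carter (8th)
fgp[c(4, 8)]

# A range of positions: the first three players
fgp[1:3]

# A negative position drops that element: everyone except Patrick Ewing
fgp[-10]

# Because fgp is named, the same players can be selected by name
fgp[c("Mitch Richmond", "Vince Carter")]
```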
With subsetting, try changing the value of Hakeem Olajuwon’s fgm to 503.
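One way to do the exercise, assuming fgm is the named vector created earlier (re-created here so the block runs on its own). Hakeem Olajuwon is the 7th player, so either his position or his name works as the index on the left-hand side of <-:

```r
players <- c("Ray Allen", "Gary Payton", "Shawn Kemp", "Mitch Richmond",
             "Rick Barry", "Oscar Robertson", "Hakeem Olajuwon",
             "Vince Carter", "Moses Malone", "Patrick Ewing")
fgm <- setNames(c(804, 571, 470, 710, 698, 737, 498, 373, 215, 552), players)

# Assign by position: Hakeem Olajuwon is the 7th element
fgm[7] <- 503

# Or assign by name, which still works if the order of players changes
fgm["Hakeem Olajuwon"] <- 503
fgm["Hakeem Olajuwon"]
```

Values computed from fgm before the change, such as fgp, are not updated automatically; run fgp <- fgm / fga again to refresh them.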
Field Goal Percentage is a useful statistic, but it has a flaw: it counts a three-pointer the same as a two-pointer, even though a three-pointer is worth one extra point. Effective Field Goal Percentage (efgp) adjusts for this and is calculated as efgp = (fgm + tpm * 0.5) / fga. Try doing this yourself.
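One possible solution, for checking your answer. It is a sketch of the formula above using the lowercase vector names from the rest of the lesson; the vectors are re-created so the block runs on its own.

```r
players <- c("Ray Allen", "Gary Payton", "Shawn Kemp", "Mitch Richmond",
             "Rick Barry", "Oscar Robertson", "Hakeem Olajuwon",
             "Vince Carter", "Moses Malone", "Patrick Ewing")
fgm <- setNames(c(804, 571, 470, 710, 698, 737, 498, 373, 215, 552), players)
fga <- c(1596, 1341, 1033, 1617, 1381, 1416, 1112, 886, 442, 1061)
tpm <- c(401, 118, 63, 236, 186, 87, 126, 81, 0, 2)

# fgm already includes threes, so adding 0.5 * tpm makes each
# made three count as 1.5 made field goals
efgp <- (fgm + tpm * 0.5) / fga
round(efgp, 3)
```

Because tpm is never negative, efgp is always at least as large as fgp; the gap is widest for the player with the most made threes relative to his attempts, here Ray Allen.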
Once you have Effective Field Goal Percentage, try True Shooting Percentage (TSP), which also accounts for free throws: TSP = PTS / (2 × (FGA + 0.44 × FTA)). You will also need to create a points category yourself: PTS = FTM + 2 × FGM + TPM (every made field goal is worth 2 points, and each made three adds 1 more).
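A sketch of the two formulas above in R, with pts and tsp as assumed lowercase names. As before, the vectors are re-created so the block runs on its own.

```r
players <- c("Ray Allen", "Gary Payton", "Shawn Kemp", "Mitch Richmond",
             "Rick Barry", "Oscar Robertson", "Hakeem Olajuwon",
             "Vince Carter", "Moses Malone", "Patrick Ewing")
fgm <- setNames(c(804, 571, 470, 710, 698, 737, 498, 373, 215, 552), players)
fga <- c(1596, 1341, 1033, 1617, 1381, 1416, 1112, 886, 442, 1061)
tpm <- c(401, 118, 63, 236, 186, 87, 126, 81, 0, 2)
ftm <- c(362, 278, 397, 720, 447, 359, 250, 201, 92, 208)
fta <- c(390, 348, 477, 837, 498, 491, 280, 240, 131, 586)

# Points: 2 per made field goal, 1 extra per made three, 1 per free throw
pts <- ftm + 2 * fgm + tpm

# True Shooting Percentage from the formula above
tsp <- pts / (2 * (fga + 0.44 * fta))
round(tsp, 3)
```

The names carry over from fgm, so the result prints with player labels even though ftm, fga and the others were left unnamed here.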