R Download: https://cloud.r-project.org/
RStudio Download: https://www.rstudio.com/products/rstudio/download/#download

Download the datasets for these lessons (open the link in a new tab): https://drive.google.com/drive/folders/13AJY5Z1j1Uq1pCsgjXDxghBy8-z9pMWI?usp=sharing

Please download RStudio Desktop.

I like to use sports data, not only because I enjoy it, but also because it's some of the most readily available numerical data and is generally easy to understand. Just ask me if you have any questions about anything covered here.

The table below lists, for ten players, field goal makes (FGM), field goal attempts (FGA), three-point makes (TPM), three-point attempts (TPA), free throw makes (FTM), and free throw attempts (FTA).

## # A tibble: 10 x 7
##    PLAYER            FGM   FGA   TPM   TPA   FTM   FTA
##    <chr>           <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1 Ray Allen         804  1596   401   877   362   390
##  2 Gary Payton       571  1341   118   328   278   348
##  3 Shawn Kemp        470  1033    63   270   397   477
##  4 Mitch Richmond    710  1617   236   657   720   837
##  5 Rick Barry        698  1381   186   480   447   498
##  6 Oscar Robertson   737  1416    87   282   359   491
##  7 Hakeem Olajuwon   498  1112   126   342   250   280
##  8 Vince Carter      373   886    81   243   201   240
##  9 Moses Malone      215   442     0     2    92   131
## 10 Patrick Ewing     552  1061     2     6   208   586

Vectors

How could we load this data into R? Vectors! The fundamental unit of data storage in R is the vector, an ordered collection of values. We can enter each column of the table above as its own vector with the c() function (short for "combine"). Once a column is stored as a vector, we can run a calculation on every value in it at once.

fgm <- c(804, 571, 470, 710, 698, 737, 498, 373, 215, 552)
fgm
##  [1] 804 571 470 710 698 737 498 373 215 552
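The other columns are entered the same way. As an example, here is FGA typed in from the table, in the same player order as fgm. Keeping the order identical matters, because R lines vectors up by position when you combine them later. The remaining columns (TPM, TPA, FTM, FTA) are left for the practice exercise.

```r
# FGA column from the table, in the same player order as fgm
fga <- c(1596, 1341, 1033, 1617, 1381, 1416, 1112, 886, 442, 1061)
fga
length(fga)  # [1] 10  (one value per player)
```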

Practice: Create vectors for all the other categories in the same way.

In R, vectors don’t always have to hold numbers. They can also hold text, known as character strings (or simply strings). Here is an example of creating a character vector of player names. A vector holds only one type of data, so keep numbers and strings in separate vectors; if you mix them, R converts everything to strings.

players <- c("Ray Allen", "Gary Payton", "Shawn Kemp", "Mitch Richmond", "Rick Barry", "Oscar Robertson", "Hakeem Olajuwon", "Vince Carter", "Moses Malone", "Patrick Ewing")
players
##  [1] "Ray Allen"       "Gary Payton"     "Shawn Kemp"     
##  [4] "Mitch Richmond"  "Rick Barry"      "Oscar Robertson"
##  [7] "Hakeem Olajuwon" "Vince Carter"    "Moses Malone"   
## [10] "Patrick Ewing"
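To see the one-type rule in action, this small sketch combines a player name with a number (the vector name mixed is just for illustration). R does not raise an error; it quietly converts the number to the string "804", so the result is a character vector.

```r
# Combining a string and a number: R converts the number to a string
mixed <- c("Ray Allen", 804)
mixed         # [1] "Ray Allen" "804"
class(mixed)  # [1] "character"
```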

We know that the first element of fgm is Ray Allen’s FGM, but when we print fgm, R does not tell us which value belongs to which player.

fgm
##  [1] 804 571 470 710 698 737 498 373 215 552

As our analysis continues, it will be useful to name the elements of the vector. We can do that with the names() function: put the vector whose elements we want to name inside the parentheses, and on the right-hand side of <- put a character vector of the same length containing the names. Below, we set the names of fgm to the elements of the vector players.

names(fgm) <- players
fgm
##       Ray Allen     Gary Payton      Shawn Kemp  Mitch Richmond 
##             804             571             470             710 
##      Rick Barry Oscar Robertson Hakeem Olajuwon    Vince Carter 
##             698             737             498             373 
##    Moses Malone   Patrick Ewing 
##             215             552
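Once a vector has names, names() also works in the other direction: called on its own, it returns the current names. You can also pull out a single value by name inside square brackets. The sketch repeats the fgm and players definitions so it runs on its own.

```r
fgm <- c(804, 571, 470, 710, 698, 737, 498, 373, 215, 552)
players <- c("Ray Allen", "Gary Payton", "Shawn Kemp", "Mitch Richmond", "Rick Barry",
             "Oscar Robertson", "Hakeem Olajuwon", "Vince Carter", "Moses Malone", "Patrick Ewing")
names(fgm) <- players

# Calling names() without an assignment returns the current names
names(fgm)

# Look up one player's value by name instead of by position
fgm["Patrick Ewing"]
```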

Practice: Name the elements of the other category vectors the same way, so that each value is labelled with its player.

Vector Math

Vector math in R is simple: an arithmetic operation on a vector is applied to every element. For example, fgm + 2 adds 2 to each player's FGM. The examples below add, subtract, divide, multiply, and raise to a power.

fgm+2
##       Ray Allen     Gary Payton      Shawn Kemp  Mitch Richmond 
##             806             573             472             712 
##      Rick Barry Oscar Robertson Hakeem Olajuwon    Vince Carter 
##             700             739             500             375 
##    Moses Malone   Patrick Ewing 
##             217             554
fgm-11
##       Ray Allen     Gary Payton      Shawn Kemp  Mitch Richmond 
##             793             560             459             699 
##      Rick Barry Oscar Robertson Hakeem Olajuwon    Vince Carter 
##             687             726             487             362 
##    Moses Malone   Patrick Ewing 
##             204             541
fgm/3
##       Ray Allen     Gary Payton      Shawn Kemp  Mitch Richmond 
##       268.00000       190.33333       156.66667       236.66667 
##      Rick Barry Oscar Robertson Hakeem Olajuwon    Vince Carter 
##       232.66667       245.66667       166.00000       124.33333 
##    Moses Malone   Patrick Ewing 
##        71.66667       184.00000
fgm*2
##       Ray Allen     Gary Payton      Shawn Kemp  Mitch Richmond 
##            1608            1142             940            1420 
##      Rick Barry Oscar Robertson Hakeem Olajuwon    Vince Carter 
##            1396            1474             996             746 
##    Moses Malone   Patrick Ewing 
##             430            1104
fgm^2
##       Ray Allen     Gary Payton      Shawn Kemp  Mitch Richmond 
##          646416          326041          220900          504100 
##      Rick Barry Oscar Robertson Hakeem Olajuwon    Vince Carter 
##          487204          543169          248004          139129 
##    Moses Malone   Patrick Ewing 
##           46225          304704
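Arithmetic also works between two vectors of the same length, pairing elements by position. As a sketch, subtracting fgm from fga gives each player's missed field goals (misses is a name chosen here for illustration). The block redefines the vectors so it runs by itself.

```r
players <- c("Ray Allen", "Gary Payton", "Shawn Kemp", "Mitch Richmond", "Rick Barry",
             "Oscar Robertson", "Hakeem Olajuwon", "Vince Carter", "Moses Malone", "Patrick Ewing")
fgm <- c(804, 571, 470, 710, 698, 737, 498, 373, 215, 552)
fga <- c(1596, 1341, 1033, 1617, 1381, 1416, 1112, 886, 442, 1061)
names(fgm) <- players
names(fga) <- players

# Element-by-element subtraction: attempts minus makes for each player
misses <- fga - fgm
misses
sum(misses)  # total missed field goals across all ten players
```

If the two vectors had different lengths, R would recycle the shorter one (with a warning when the longer length is not a multiple of the shorter), so keep every column the same length and in the same player order.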

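To create a Field Goal Percentage category (the share of shots made), we can do vector math with two vectors we already have, fgm and fga. Dividing one vector by another of the same length works element by element, so each player's FGM is divided by that same player's FGA.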

fgp <- fgm / fga
fgp
##       Ray Allen     Gary Payton      Shawn Kemp  Mitch Richmond 
##       0.5037594       0.4258016       0.4549855       0.4390847 
##      Rick Barry Oscar Robertson Hakeem Olajuwon    Vince Carter 
##       0.5054308       0.5204802       0.4478417       0.4209932 
##    Moses Malone   Patrick Ewing 
##       0.4864253       0.5202639
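Proportions like 0.5037594 are easier to read as percentages. One option, sketched below, is to multiply by 100 and round to one decimal place with round(); the result goes into a new vector (fgp_pct, a name chosen for this example), so fgp itself is unchanged.

```r
players <- c("Ray Allen", "Gary Payton", "Shawn Kemp", "Mitch Richmond", "Rick Barry",
             "Oscar Robertson", "Hakeem Olajuwon", "Vince Carter", "Moses Malone", "Patrick Ewing")
fgm <- c(804, 571, 470, 710, 698, 737, 498, 373, 215, 552)
fga <- c(1596, 1341, 1033, 1617, 1381, 1416, 1112, 886, 442, 1061)
names(fgm) <- players

fgp <- fgm / fga
# Convert proportions to percentages, rounded to one decimal place
fgp_pct <- round(fgp * 100, digits = 1)
fgp_pct
```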

Now try this with Three Point Percentage (TPP) and Free Throw Percentage (FTP).

R also has built-in functions for basic summary statistics:

sum(fgm)
## [1] 5628
mean(fgm)
## [1] 562.8
median(fgm)
## [1] 561.5
min(fgm)
## [1] 215
max(fgm)
## [1] 804
sd(fgm)
## [1] 182.103
range(fgm)
## [1] 215 804
sort(fgm)
##    Moses Malone    Vince Carter      Shawn Kemp Hakeem Olajuwon 
##             215             373             470             498 
##   Patrick Ewing     Gary Payton      Rick Barry  Mitch Richmond 
##             552             571             698             710 
## Oscar Robertson       Ray Allen 
##             737             804
quantile(fgm)
##    0%   25%   50%   75%  100% 
## 215.0 477.0 561.5 707.0 804.0
summary(fgm)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   215.0   477.0   561.5   562.8   707.0   804.0
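The summary functions report values, but not who holds them. For "who?" questions, a short sketch: sort() with decreasing = TRUE lists the largest first, and which.max() and which.min() return the position of the largest or smallest value, carrying the player's name along with it.

```r
players <- c("Ray Allen", "Gary Payton", "Shawn Kemp", "Mitch Richmond", "Rick Barry",
             "Oscar Robertson", "Hakeem Olajuwon", "Vince Carter", "Moses Malone", "Patrick Ewing")
fgm <- c(804, 571, 470, 710, 698, 737, 498, 373, 215, 552)
names(fgm) <- players

sort(fgm, decreasing = TRUE)  # largest FGM first
which.max(fgm)                # position of the largest value, labelled with its name
names(which.max(fgm))         # "Ray Allen"
names(which.min(fgm))         # "Moses Malone"
```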

If we want to look at the data for specific players, we can use subsetting, which pulls out specific elements of a vector by putting their positions in square brackets.

fgp[4]
## Mitch Richmond 
##      0.4390847
fgp[8]
## Vince Carter 
##    0.4209932
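Square brackets also accept several positions at once. The sketch below uses fgm (redefined so the block runs by itself) rather than fgp, but the same indexing works on any vector: c() picks specific positions, a colon gives a range, and a negative number drops that position.

```r
players <- c("Ray Allen", "Gary Payton", "Shawn Kemp", "Mitch Richmond", "Rick Barry",
             "Oscar Robertson", "Hakeem Olajuwon", "Vince Carter", "Moses Malone", "Patrick Ewing")
fgm <- c(804, 571, 470, 710, 698, 737, 498, 373, 215, 552)
names(fgm) <- players

fgm[c(1, 10)]  # first and last players
fgm[1:3]       # positions 1 through 3
fgm[-9]        # every player except position 9 (Moses Malone)
```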

Using subsetting, try changing Hakeem Olajuwon’s fgm to 503.

More Practice

Field Goal Percentage is a useful statistic, but it has a flaw: it treats a made three-pointer the same as a made two-pointer, even though the three is worth more. We can create a statistic known as Effective Field Goal Percentage (efgp), calculated as efgp = (fgm + 0.5 * tpm) / fga. Try doing this yourself.
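If you want a value to check your vector version against, here is the formula worked for a single player, Ray Allen, using his row from the table (the _ray variable names are just for this example). His efgp comes out around 0.629, noticeably higher than his plain field goal percentage of about 0.504, because 401 of his makes were three-pointers.

```r
# Ray Allen's numbers from the table
fgm_ray <- 804
tpm_ray <- 401
fga_ray <- 1596

# Effective FG%: each made three counts as 1.5 made field goals
efgp_ray <- (fgm_ray + 0.5 * tpm_ray) / fga_ray
efgp_ray
round(efgp_ray, 4)  # 0.6294
```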

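Once you have Effective Field Goal Percentage, try True Shooting Percentage (TSP), which also accounts for free throws: TSP = pts / (2 * (fga + 0.44 * fta)). You will also need to create a points category yourself: pts = ftm + 2 * fgm + tpm. (Each made three-pointer is counted in fgm, worth 2 points, and again in tpm, which adds the third point.)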
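As with efgp, here is a single-player check for points and TSP, again using Ray Allen's row from the table (variable names chosen for this example). His points come to 2371, and his TSP is about 0.671.

```r
# Ray Allen's numbers from the table
fgm_ray <- 804
fga_ray <- 1596
tpm_ray <- 401
ftm_ray <- 362
fta_ray <- 390

# Points: free throws + 2 per made field goal + 1 extra per made three
pts_ray <- ftm_ray + 2 * fgm_ray + tpm_ray
pts_ray  # [1] 2371

# True Shooting Percentage
tsp_ray <- pts_ray / (2 * (fga_ray + 0.44 * fta_ray))
round(tsp_ray, 4)  # 0.6707
```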