Problem Set 1 - Part 2

We’re now going to dive into a much larger analysis of NBA shooting statistics for all players between the 1996-97 season and the 2021-22 season.

Step 2: Load the tidyverse library:

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ ggplot2 3.3.5     ✔ purrr   0.3.4
## ✔ tibble  3.1.7     ✔ dplyr   1.0.7
## ✔ tidyr   1.1.3     ✔ stringr 1.4.0
## ✔ readr   1.4.0     ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

Step 3: Read in the larger CSV file:

nba_shooting <- read_csv('data/nba_shooting_1997_2022.csv')
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   PLAYER = col_character(),
##   SEASON = col_double(),
##   FGM = col_double(),
##   FGA = col_double(),
##   TPM = col_double(),
##   TPA = col_double(),
##   FTM = col_double(),
##   FTA = col_double()
## )

Step 4: Add columns to the tbl containing field goal percentage (FGP), three-point percentage (TPP), and free throw percentage (FTP). You should write the code to do this in your R script and then go ahead and execute the code.

nba_shooting <- mutate(nba_shooting, 
                       FGP = FGM / FGA, 
                       TPP = TPM / TPA, 
                       FTP = FTM / FTA)

nba_shooting
## # A tibble: 11,841 × 11
##    PLAYER           SEASON   FGM   FGA   TPM   TPA   FTM   FTA   FGP   TPP   FTP
##    <chr>             <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1 Jayson Tatum       2022   708  1564   230   651   400   469 0.453 0.353 0.853
##  2 Trae Young         2022   711  1544   233   610   500   553 0.460 0.382 0.904
##  3 DeMar DeRozan      2022   774  1535    50   142   520   593 0.504 0.352 0.877
##  4 Devin Booker       2022   662  1421   183   478   315   363 0.466 0.383 0.868
##  5 Luka Dončić        2022   641  1403   201   569   364   489 0.457 0.353 0.744
##  6 Donovan Mitchell   2022   617  1376   232   654   267   313 0.448 0.355 0.853
##  7 Joel Embiid        2022   666  1334    93   251   654   803 0.499 0.371 0.814
##  8 Nikola Jokić       2022   764  1311    97   288   379   468 0.583 0.337 0.810
##  9 LaMelo Ball        2022   538  1254   220   565   212   243 0.429 0.389 0.872
## 10 Julius Randle      2022   512  1246   120   390   303   401 0.411 0.308 0.756
## # … with 11,831 more rows

Step 5: One criticism of FGP is that it treats 2-point shots the same as 3-point shots. As a result, the league leader in FGP is usually a center whose shots mostly come from near the rim. Effective field goal percentage (eFGP) is a statistic that adjusts FGP to account for the fact that a made 3-point shot is worth 50% more than a made 2-point shot. The formula for eFGP is eFGP = (FGM + 0.5 × TPM) / FGA. Add another line of code to your script that adds a column for eFGP to the data.

nba_shooting <- mutate(nba_shooting, 
                       eFGP = (FGM + (0.5 * TPM)) / FGA )

nba_shooting
## # A tibble: 11,841 × 12
##    PLAYER     SEASON   FGM   FGA   TPM   TPA   FTM   FTA   FGP   TPP   FTP  eFGP
##    <chr>       <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1 Jayson Ta…   2022   708  1564   230   651   400   469 0.453 0.353 0.853 0.526
##  2 Trae Young   2022   711  1544   233   610   500   553 0.460 0.382 0.904 0.536
##  3 DeMar DeR…   2022   774  1535    50   142   520   593 0.504 0.352 0.877 0.521
##  4 Devin Boo…   2022   662  1421   183   478   315   363 0.466 0.383 0.868 0.530
##  5 Luka Donč…   2022   641  1403   201   569   364   489 0.457 0.353 0.744 0.529
##  6 Donovan M…   2022   617  1376   232   654   267   313 0.448 0.355 0.853 0.533
##  7 Joel Embi…   2022   666  1334    93   251   654   803 0.499 0.371 0.814 0.534
##  8 Nikola Jo…   2022   764  1311    97   288   379   468 0.583 0.337 0.810 0.620
##  9 LaMelo Ba…   2022   538  1254   220   565   212   243 0.429 0.389 0.872 0.517
## 10 Julius Ra…   2022   512  1246   120   390   303   401 0.411 0.308 0.756 0.459
## # … with 11,831 more rows
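
As a quick sanity check on the formula, the first row of the output can be reproduced by hand. The sketch below plugs Jayson Tatum's 2022 totals from the table above into the FGP and eFGP formulas; the variable names are made up for illustration and are not part of the problem set.

```r
# Jayson Tatum, 2022 season (values copied from the first row of the tbl above)
fgm <- 708   # field goals made (2-pointers and 3-pointers combined)
fga <- 1564  # field goals attempted
tpm <- 230   # three-pointers made

fgp  <- fgm / fga                # plain field goal percentage
efgp <- (fgm + 0.5 * tpm) / fga  # each made three counts as 1.5 makes

# Rounded to three decimals for comparison with the printed tbl
round(c(FGP = fgp, eFGP = efgp), 3)
```

The rounded values should agree with the FGP and eFGP columns in that row, which is a useful habit whenever a new column is added with mutate().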

Step 6: Add a new column to the tbl that records the number of points scored (PTS).

nba_shooting <- mutate(nba_shooting, 
                       PTS =  FTM + 2*FGM + TPM)
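
The expression above adds TPM only once because FGM already counts made three-pointers: each made three has contributed 2 points through the 2*FGM term and needs just 1 more. A minimal check of that reasoning, using Jayson Tatum's 2022 totals from the earlier output (the variable names here are hypothetical):

```r
# Jayson Tatum, 2022 season (values copied from the tbl printed earlier)
fgm <- 708  # field goals made, threes included
tpm <- 230  # three-pointers made
ftm <- 400  # free throws made

# Short form used in the mutate() call
pts_short <- ftm + 2 * fgm + tpm

# Long form: separate two-pointers from three-pointers explicitly
two_pm   <- fgm - tpm
pts_long <- ftm + 2 * two_pm + 3 * tpm

c(short = pts_short, long = pts_long)
```

Both forms give the same total, so the shorter one is safe to use in the script.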

Step 7: Both field goal percentage and effective field goal percentage totally ignore free throws. One metric that accounts for all field goals, three-pointers, and free throws is true shooting percentage (TPS), whose formula is given by TPS = PTS / (2 × (FGA + 0.44 × FTA)).

nba_shooting <- mutate(nba_shooting, 
                       TPS =  PTS/(2*(FGA + 0.44*FTA)))
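
To see what this computation does with real numbers, the sketch below evaluates it for a single row, again using Jayson Tatum's 2022 totals from the earlier output. The variable names are illustrative only. The 0.44 weight on FTA is taken as given from the formula in the code above.

```r
# Jayson Tatum, 2022 season (values copied from the tbl printed earlier)
fgm <- 708
fga <- 1564
tpm <- 230
ftm <- 400
fta <- 469

pts <- ftm + 2 * fgm + tpm  # total points scored

# Denominator: twice the number of "shooting attempts", where each
# free throw attempt is weighted by 0.44
attempts <- fga + 0.44 * fta
tps <- pts / (2 * attempts)

round(tps, 3)
```

Because free throw attempts enter the denominator, a player who scores a lot at the line can have a TPS well above their eFGP.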

Step 8: Arrange your tbl so that the players are sorted in decreasing order of true shooting percentage. Which player has the best true shooting percentage?

nba_shooting <- arrange(nba_shooting, desc(TPS))

nba_shooting
## # A tibble: 11,841 × 14
##    PLAYER  SEASON   FGM   FGA   TPM   TPA   FTM   FTA   FGP     TPP    FTP  eFGP
##    <chr>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>  <dbl> <dbl>
##  1 Donnel…   2005     2     2     0     0     4     4 1     NaN       1      1  
##  2 Tyson …   1999     1     1     1     1     1     2 1       1       0.5    1.5
##  3 David …   2018     2     3     2     3     4     4 0.667   0.667   1      1  
##  4 Jackie…   2005     4     4     0     0     2     2 1     NaN       1      1  
##  5 Udonis…   2021     2     2     0     0     0     0 1     NaN     NaN      1  
##  6 Rakeem…   2016     2     2     0     0     0     0 1     NaN     NaN      1  
##  7 Marcus…   2009     2     2     0     0     0     0 1     NaN     NaN      1  
##  8 David …   2001     3     3     0     0     0     0 1     NaN     NaN      1  
##  9 Julyan…   2013     2     2     0     0     3     4 1     NaN       0.75   1  
## 10 Kostas…   2020     3     3     0     0     1     2 1     NaN       0.5    1  
## # … with 11,831 more rows, and 2 more variables: PTS <dbl>, TPS <dbl>
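
The top of this ranking is dominated by player-seasons with only a handful of attempts. One possible way to get a more meaningful answer, sketched below, is to filter on a minimum number of field goal attempts before sorting. The cutoff of 500 is an arbitrary assumption, not something the problem set specifies. The toy tbl uses two full rows from the earlier output plus one anonymized small-sample row.

```r
library(dplyr)

# Toy data: two 2022 rows from the earlier output and one tiny-sample row
shots <- tibble(
  PLAYER = c("Small-sample player", "Nikola Jokić", "Jayson Tatum"),
  FGM = c(1, 764, 708),
  FGA = c(1, 1311, 1564),
  TPM = c(1, 97, 230),
  FTM = c(1, 379, 400),
  FTA = c(2, 468, 469)
)

min_fga <- 500  # hypothetical cutoff; choose whatever threshold suits the question

ranked <- shots %>%
  mutate(PTS = FTM + 2 * FGM + TPM,
         TPS = PTS / (2 * (FGA + 0.44 * FTA))) %>%
  filter(FGA >= min_fga) %>%   # drop player-seasons with too few attempts
  arrange(desc(TPS))

ranked
```

Without the filter() step the small-sample row would sort first, even though it reflects a single made shot.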

Step 9: Repeat the above steps with and without using the pipe operator %>%, and try to become familiar with using it.

nba_shooting_raw <- read_csv('nba_shooting.csv')
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   PLAYER = col_character(),
##   SEASON = col_double(),
##   FGM = col_double(),
##   FGA = col_double(),
##   TPM = col_double(),
##   TPA = col_double(),
##   FTM = col_double(),
##   FTA = col_double()
## )
nba_shooting <- nba_shooting_raw  %>%
  mutate(FGP = FGM / FGA, 
         TPP = TPM / TPA, 
         FTP = FTM / FTA) %>%
  mutate(PTS =  FTM + 2*FGM + TPM) %>%
  mutate(TPS =  PTS/(2*(FGA + 0.44*FTA))) %>%
  arrange(desc(TPS))

nba_shooting
## # A tibble: 7,447 × 13
##    PLAYER     SEASON   FGM   FGA   TPM   TPA   FTM   FTA   FGP   TPP   FTP   PTS
##    <chr>       <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1 Dajuan Wa…   2007     1     1     1     1     1     2 1     1     0.5       4
##  2 Tyson Whe…   1999     1     1     1     1     1     2 1     1     0.5       4
##  3 Amir John…   2006     7    10     2     3     4     4 0.7   0.667 1        20
##  4 Keith Bog…   2014     3     6     3     6     3     3 0.5   0.5   1        12
##  5 Steve Nov…   2011    35    67    26    46     8     8 0.522 0.565 1       104
##  6 Chris Cra…   2003     8    13     1     3     7     8 0.615 0.333 0.875    24
##  7 Chris Wil…   2013   110   153     0     1    39    58 0.719 0     0.672   259
##  8 Maceo Bas…   2007    49    76     3     7    37    47 0.645 0.429 0.787   138
##  9 Tyson Cha…   2012   241   355     0     2   217   315 0.679 0     0.689   699
## 10 Kyle Korv…   2015   292   600   221   449   106   118 0.487 0.492 0.898   911
## # … with 7,437 more rows, and 1 more variable: TPS <dbl>
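
For comparison, the same kind of computation can be written without the pipe by nesting the function calls. The sketch below uses a small made-up tbl (the players "A", "B", and "C" are hypothetical) to show that the nested and piped versions produce the same result.

```r
library(dplyr)

# Made-up data, small enough to check by eye
toy <- tibble(
  PLAYER = c("A", "B", "C"),
  FGM = c(5, 8, 2),
  FGA = c(10, 20, 3)
)

# Without the pipe: calls are nested and read from the inside out
nested <- arrange(mutate(toy, FGP = FGM / FGA), desc(FGP))

# With the pipe: each step reads top to bottom, in the order it runs
piped <- toy %>%
  mutate(FGP = FGM / FGA) %>%
  arrange(desc(FGP))

# Compare the two results
all.equal(nested, piped)
```

The pipe passes the result on its left as the first argument of the function on its right, so the two versions differ only in how they read.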

Step 10: Sort the tbl in decreasing order of field goal attempts (FGA) to see which player-seasons had the most shot attempts.

arrange(nba_shooting, desc(FGA))
## # A tibble: 7,447 × 13
##    PLAYER     SEASON   FGM   FGA   TPM   TPA   FTM   FTA   FGP   TPP   FTP   PTS
##    <chr>       <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1 Kobe Brya…   2006   978  2173   180   518   696   819 0.450 0.347 0.850  2832
##  2 Allen Ive…   2003   804  1940    84   303   570   736 0.414 0.277 0.774  2262
##  3 Jerry Sta…   2001   774  1927   166   473   666   810 0.402 0.351 0.822  2380
##  4 Kobe Brya…   2003   868  1924   124   324   601   713 0.451 0.383 0.843  2461
##  5 Michael J…   1998   881  1893    30   126   565   721 0.465 0.238 0.784  2357
##  6 Michael J…   1997   920  1892   111   297   480   576 0.486 0.374 0.833  2431
##  7 LeBron Ja…   2006   875  1823   127   379   601   814 0.480 0.335 0.738  2478
##  8 Allen Ive…   2006   815  1822    72   223   675   829 0.447 0.323 0.814  2377
##  9 Allen Ive…   2005   771  1818   104   338   656   786 0.424 0.308 0.835  2302
## 10 Tracy McG…   2003   829  1813   173   448   576   726 0.457 0.386 0.793  2407
## # … with 7,437 more rows, and 1 more variable: TPS <dbl>