We’re now going to analyze a much larger dataset of NBA shooting statistics, covering every player from the 1996-97 season through the 2021-22 season.
Step 2: Load the tidyverse package:
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ ggplot2 3.3.5 ✔ purrr 0.3.4
## ✔ tibble 3.1.7 ✔ dplyr 1.0.7
## ✔ tidyr 1.1.3 ✔ stringr 1.4.0
## ✔ readr 1.4.0 ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
Step 3: Read in the larger CSV file:
nba_shooting <- read_csv('data/nba_shooting_1997_2022.csv')
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## PLAYER = col_character(),
## SEASON = col_double(),
## FGM = col_double(),
## FGA = col_double(),
## TPM = col_double(),
## TPA = col_double(),
## FTM = col_double(),
## FTA = col_double()
## )
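The column specification above is readr reporting the type it guessed for each column. You can declare the types yourself with the `col_types` argument of `read_csv()`, which also suppresses that message. This sketch is self-contained. Instead of reading `data/nba_shooting_1997_2022.csv`, it writes a two-row sample, copied from the Step 4 output, to a temporary file. The name `shooting_sample` is only for illustration.

```r
library(readr)

# Write a tiny CSV to a temporary file so the example runs on its own;
# the two rows are copied from the Step 4 output.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "PLAYER,SEASON,FGM,FGA,TPM,TPA,FTM,FTA",
  "Jayson Tatum,2022,708,1564,230,651,400,469",
  "Trae Young,2022,711,1544,233,610,500,553"
), tmp)

# PLAYER is text; every other column defaults to a double.
# Supplying col_types means read_csv() no longer prints a column specification.
shooting_sample <- read_csv(
  tmp,
  col_types = cols(PLAYER = col_character(), .default = col_double())
)
shooting_sample
```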
Step 4: Add columns to the tbl containing field goal percentage (FGP), three-point percentage (TPP), and free throw percentage (FTP). Write the code to do this in your R script, then execute it.
nba_shooting <- mutate(nba_shooting,
                       FGP = FGM / FGA,
                       TPP = TPM / TPA,
                       FTP = FTM / FTA)
nba_shooting
## # A tibble: 11,841 × 11
## PLAYER SEASON FGM FGA TPM TPA FTM FTA FGP TPP FTP
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Jayson Tatum 2022 708 1564 230 651 400 469 0.453 0.353 0.853
## 2 Trae Young 2022 711 1544 233 610 500 553 0.460 0.382 0.904
## 3 DeMar DeRozan 2022 774 1535 50 142 520 593 0.504 0.352 0.877
## 4 Devin Booker 2022 662 1421 183 478 315 363 0.466 0.383 0.868
## 5 Luka Dončić 2022 641 1403 201 569 364 489 0.457 0.353 0.744
## 6 Donovan Mitchell 2022 617 1376 232 654 267 313 0.448 0.355 0.853
## 7 Joel Embiid 2022 666 1334 93 251 654 803 0.499 0.371 0.814
## 8 Nikola Jokić 2022 764 1311 97 288 379 468 0.583 0.337 0.810
## 9 LaMelo Ball 2022 538 1254 220 565 212 243 0.429 0.389 0.872
## 10 Julius Randle 2022 512 1246 120 390 303 401 0.411 0.308 0.756
## # … with 11,831 more rows
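Each percentage divides by an attempt count. Any player-season with zero attempts in a category therefore gets `NaN`, which is R's result for 0 / 0, instead of a number. This sketch uses two hypothetical players to show the effect. It also shows that `is.na()` catches `NaN` when you want to drop those rows.

```r
library(dplyr)

# Two hypothetical player-seasons: Player A never attempted a three or a free throw
toy <- tibble(
  PLAYER = c("Player A (hypothetical)", "Player B (hypothetical)"),
  TPM = c(0, 5), TPA = c(0, 12),
  FTM = c(0, 3), FTA = c(0, 4)
)

toy <- mutate(toy,
              TPP = TPM / TPA,   # 0 / 0 is NaN for Player A
              FTP = FTM / FTA)
toy$TPP   # NaN for Player A, about 0.417 for Player B

# is.na() is TRUE for NaN as well as NA, so this keeps only rows with a defined TPP
defined_tpp <- filter(toy, !is.na(TPP))
defined_tpp
```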
Step 5: One criticism of FGP is that it treats 2-point shots the same as 3-point shots. As a result, the league leader in FGP is usually a center whose shots mostly come from near the rim. Effective field goal percentage (eFGP) adjusts FGP to account for the fact that a made 3-point shot is worth 50% more than a made 2-point shot. The formula is eFGP = (FGM + 0.5 × TPM) / FGA. Add another line of code to your script that adds an eFGP column to the data.
nba_shooting <- mutate(nba_shooting,
                       eFGP = (FGM + 0.5 * TPM) / FGA)
nba_shooting
## # A tibble: 11,841 × 12
## PLAYER SEASON FGM FGA TPM TPA FTM FTA FGP TPP FTP eFGP
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Jayson Ta… 2022 708 1564 230 651 400 469 0.453 0.353 0.853 0.526
## 2 Trae Young 2022 711 1544 233 610 500 553 0.460 0.382 0.904 0.536
## 3 DeMar DeR… 2022 774 1535 50 142 520 593 0.504 0.352 0.877 0.521
## 4 Devin Boo… 2022 662 1421 183 478 315 363 0.466 0.383 0.868 0.530
## 5 Luka Donč… 2022 641 1403 201 569 364 489 0.457 0.353 0.744 0.529
## 6 Donovan M… 2022 617 1376 232 654 267 313 0.448 0.355 0.853 0.533
## 7 Joel Embi… 2022 666 1334 93 251 654 803 0.499 0.371 0.814 0.534
## 8 Nikola Jo… 2022 764 1311 97 288 379 468 0.583 0.337 0.810 0.620
## 9 LaMelo Ba… 2022 538 1254 220 565 212 243 0.429 0.389 0.872 0.517
## 10 Julius Ra… 2022 512 1246 120 390 303 401 0.411 0.308 0.756 0.459
## # … with 11,831 more rows
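You can check the adjustment by hand on two lines from the Step 4 output. Nikola Jokić takes few threes, while Trae Young takes many. The `boost` column is a hypothetical name for the extra credit eFGP adds, 0.5 × TPM / FGA. It is noticeably larger for the high-volume three-point shooter.

```r
library(dplyr)

# Two 2021-22 lines copied from the Step 4 output
players <- tibble(
  PLAYER = c("Nikola Jokić", "Trae Young"),
  FGM = c(764, 711), FGA = c(1311, 1544), TPM = c(97, 233)
)

players <- mutate(players,
                  FGP   = FGM / FGA,
                  eFGP  = (FGM + 0.5 * TPM) / FGA,
                  # extra credit for threes, equal to 0.5 * TPM / FGA
                  boost = eFGP - FGP)

players
```

The computed eFGP values, about 0.620 and 0.536, match the eFGP column printed above.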
Step 6: Add a new column, PTS, to the tbl that records the number of points scored.
nba_shooting <- mutate(nba_shooting,
                       PTS = FTM + 2*FGM + TPM)
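Why does FTM + 2*FGM + TPM give total points? FGM already includes made threes. Counting every made field goal as 2 points undercounts each three by exactly 1, and adding TPM puts that point back. Here is a quick check on a made-up stat line:

```r
# Hypothetical stat line: 10 made field goals (3 of them threes) and 4 made free throws
FGM <- 10
TPM <- 3
FTM <- 4

# Count each basket at its face value: twos are worth 2, threes 3, free throws 1
pts_by_shot_type <- 2 * (FGM - TPM) + 3 * TPM + FTM

# Shortcut used in Step 6: every made field goal is worth at least 2,
# and each three adds 1 more point on top of that
pts_shortcut <- FTM + 2 * FGM + TPM

pts_by_shot_type  # 27
pts_shortcut      # 27
```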
Step 7: Both field goal percentage and effective field goal percentage ignore free throws entirely. One metric that accounts for two-point field goals, three-pointers, and free throws is true shooting percentage. Its formula is TPS = PTS / (2 × (FGA + 0.44 × FTA)). Add a TPS column to the tbl:
nba_shooting <- mutate(nba_shooting,
                       TPS = PTS / (2 * (FGA + 0.44 * FTA)))
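To see the formula in action, this runs Jayson Tatum's 2021-22 line from the Step 4 output through Steps 6 and 7. The denominator 2 × (FGA + 0.44 × FTA) is commonly read as an estimate of scoring attempts. The 0.44 factor approximates the share of free-throw attempts that end a possession, because two-shot fouls, and-ones, and technicals do not map one-to-one onto possessions.

```r
library(dplyr)

# Jayson Tatum's 2021-22 line, copied from the Step 4 output
tatum <- tibble(FGM = 708, FGA = 1564, TPM = 230, FTM = 400, FTA = 469)

tatum <- mutate(tatum,
                PTS = FTM + 2 * FGM + TPM,               # 400 + 1416 + 230 = 2046
                TPS = PTS / (2 * (FGA + 0.44 * FTA)))    # true shooting percentage

round(tatum$TPS, 3)  # 0.578
```

His TPS of about 0.578 is well above his FGP of 0.453. Both his made threes and his 400 made free throws count in his favor.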
Step 8: Arrange your tbl so that the players are sorted in decreasing order of true shooting percentage. Which player has the best true shooting percentage?
nba_shooting <- arrange(nba_shooting, desc(TPS))
nba_shooting
## # A tibble: 11,841 × 14
## PLAYER SEASON FGM FGA TPM TPA FTM FTA FGP TPP FTP eFGP
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Donnel… 2005 2 2 0 0 4 4 1 NaN 1 1
## 2 Tyson … 1999 1 1 1 1 1 2 1 1 0.5 1.5
## 3 David … 2018 2 3 2 3 4 4 0.667 0.667 1 1
## 4 Jackie… 2005 4 4 0 0 2 2 1 NaN 1 1
## 5 Udonis… 2021 2 2 0 0 0 0 1 NaN NaN 1
## 6 Rakeem… 2016 2 2 0 0 0 0 1 NaN NaN 1
## 7 Marcus… 2009 2 2 0 0 0 0 1 NaN NaN 1
## 8 David … 2001 3 3 0 0 0 0 1 NaN NaN 1
## 9 Julyan… 2013 2 2 0 0 3 4 1 NaN 0.75 1
## 10 Kostas… 2020 3 3 0 0 1 2 1 NaN 0.5 1
## # … with 11,831 more rows, and 2 more variables: PTS <dbl>, TPS <dbl>
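The top of this list is made up of player-seasons with only a handful of shots. With so few attempts, TPS is mostly noise. Because 0.44 is only an approximation, TPS can even exceed 1. For example, 2-for-2 from the field and 4-for-4 at the line gives 8 / (2 × (2 + 1.76)) ≈ 1.06. One common remedy is to require a minimum number of attempts before ranking. This sketch uses a toy tbl. The 500-FGA cutoff `MIN_FGA` and "Player A" are assumptions for illustration. The other two rows are copied from the Step 4 output.

```r
library(dplyr)

toy <- tibble(
  PLAYER = c("Player A (hypothetical)", "Nikola Jokić", "Trae Young"),
  FGM = c(2, 764, 711), FGA = c(2, 1311, 1544), TPM = c(0, 97, 233),
  FTM = c(4, 379, 500), FTA = c(4, 468, 553)
)

toy <- mutate(toy,
              PTS = FTM + 2 * FGM + TPM,
              TPS = PTS / (2 * (FGA + 0.44 * FTA)))

# MIN_FGA is an illustrative cutoff, not an official qualification rule
MIN_FGA <- 500

# Drop tiny samples first, then rank by true shooting percentage
qualified <- arrange(filter(toy, FGA >= MIN_FGA), desc(TPS))
qualified$PLAYER
```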
Step 9: Repeat the above steps both with and without the pipe operator %>% to get comfortable using it.
nba_shooting_raw <- read_csv('nba_shooting.csv')
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## PLAYER = col_character(),
## SEASON = col_double(),
## FGM = col_double(),
## FGA = col_double(),
## TPM = col_double(),
## TPA = col_double(),
## FTM = col_double(),
## FTA = col_double()
## )
nba_shooting <- nba_shooting_raw %>%
  mutate(FGP = FGM / FGA,
         TPP = TPM / TPA,
         FTP = FTM / FTA) %>%
  mutate(PTS = FTM + 2*FGM + TPM) %>%
  mutate(TPS = PTS/(2*(FGA + 0.44*FTA))) %>%
  arrange(desc(TPS))
nba_shooting
## # A tibble: 7,447 × 13
## PLAYER SEASON FGM FGA TPM TPA FTM FTA FGP TPP FTP PTS
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Dajuan Wa… 2007 1 1 1 1 1 2 1 1 0.5 4
## 2 Tyson Whe… 1999 1 1 1 1 1 2 1 1 0.5 4
## 3 Amir John… 2006 7 10 2 3 4 4 0.7 0.667 1 20
## 4 Keith Bog… 2014 3 6 3 6 3 3 0.5 0.5 1 12
## 5 Steve Nov… 2011 35 67 26 46 8 8 0.522 0.565 1 104
## 6 Chris Cra… 2003 8 13 1 3 7 8 0.615 0.333 0.875 24
## 7 Chris Wil… 2013 110 153 0 1 39 58 0.719 0 0.672 259
## 8 Maceo Bas… 2007 49 76 3 7 37 47 0.645 0.429 0.787 138
## 9 Tyson Cha… 2012 241 355 0 2 217 315 0.679 0 0.689 699
## 10 Kyle Korv… 2015 292 600 221 449 106 118 0.487 0.492 0.898 911
## # … with 7,437 more rows, and 1 more variable: TPS <dbl>
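For Step 9, it helps to see the two styles side by side. `x %>% f(y)` is equivalent to `f(x, y)`, so any chain of pipes can be rewritten as nested calls. This sketch builds the same result both ways on a small hypothetical tbl and checks that the results match.

```r
library(dplyr)

# Hypothetical raw data using the same column names as the CSV
raw <- tibble(
  PLAYER = c("Player A", "Player B", "Player C"),
  FGM = c(5, 8, 3), FGA = c(10, 20, 4),
  TPM = c(1, 4, 0), FTM = c(2, 6, 1), FTA = c(2, 8, 2)
)

# Without the pipe: each call is nested inside the next, so it reads inside-out
nested <- arrange(
  mutate(mutate(raw, PTS = FTM + 2 * FGM + TPM),
         TPS = PTS / (2 * (FGA + 0.44 * FTA))),
  desc(TPS)
)

# With the pipe: the same steps, read top to bottom
piped <- raw %>%
  mutate(PTS = FTM + 2 * FGM + TPM) %>%
  mutate(TPS = PTS / (2 * (FGA + 0.44 * FTA))) %>%
  arrange(desc(TPS))

identical(nested, piped)  # TRUE
```

The nested version has to be read from the inside out. The piped version reads in the order the steps happen, which is why the pipe is usually preferred for multi-step transformations.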
Sorting the same tbl by field goal attempts in descending order shows which player-seasons had the highest shot volume:
arrange(nba_shooting, desc(FGA))
## # A tibble: 7,447 × 13
## PLAYER SEASON FGM FGA TPM TPA FTM FTA FGP TPP FTP PTS
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Kobe Brya… 2006 978 2173 180 518 696 819 0.450 0.347 0.850 2832
## 2 Allen Ive… 2003 804 1940 84 303 570 736 0.414 0.277 0.774 2262
## 3 Jerry Sta… 2001 774 1927 166 473 666 810 0.402 0.351 0.822 2380
## 4 Kobe Brya… 2003 868 1924 124 324 601 713 0.451 0.383 0.843 2461
## 5 Michael J… 1998 881 1893 30 126 565 721 0.465 0.238 0.784 2357
## 6 Michael J… 1997 920 1892 111 297 480 576 0.486 0.374 0.833 2431
## 7 LeBron Ja… 2006 875 1823 127 379 601 814 0.480 0.335 0.738 2478
## 8 Allen Ive… 2006 815 1822 72 223 675 829 0.447 0.323 0.814 2377
## 9 Allen Ive… 2005 771 1818 104 338 656 786 0.424 0.308 0.835 2302
## 10 Tracy McG… 2003 829 1813 173 448 576 726 0.457 0.386 0.793 2407
## # … with 7,437 more rows, and 1 more variable: TPS <dbl>
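If you only need the top few rows, dplyr 1.0.0 and later also provide `slice_max()`, which keeps the rows with the largest values of a column. The session above reports dplyr 1.0.7. Pairing `slice_max()` with `select()` leaves more room on screen for the PLAYER column, which is truncated in the printouts above. The data here is a hypothetical toy tbl.

```r
library(dplyr)

# Hypothetical toy data; only the column names mirror the real tbl
toy <- tibble(
  PLAYER = c("Player A", "Player B", "Player C", "Player D"),
  SEASON = c(2001, 2003, 2006, 2010),
  FGA    = c(1500, 1800, 2000, 900),
  TPS    = c(0.52, 0.49, 0.56, 0.60)
)

# Keep the two largest FGA values (returned in descending order of FGA),
# then show only the columns of interest
top_volume <- toy %>%
  slice_max(FGA, n = 2) %>%
  select(PLAYER, SEASON, FGA, TPS)

top_volume
```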