To access R and RStudio, which are installed on the Saint Ann’s server, go to http://rstudio.saintannsny.org:8787/ and log in with your Saint Ann’s email address.

Once again we’ll be using the filter, arrange, select and mutate functions from the dplyr package in R. This time we’ll be using NBA data and focusing on the piping operator, %>%, to combine operations.

Loading the packages and fetching the data

The SportsAnalytics package can be used to fetch NBA data. Below, we use it to fetch data from the 2015-2016 NBA season and assign it to the dataframe nba. We also load the dplyr package once again to help us explore the data. Finally, we look at the first six rows of the nba dataframe to learn the column names and see what our data looks like.

library(SportsAnalytics)
nba <- fetch_NBAPlayerStatistics("15-16")
library(dplyr)
head(nba)

For the most part, I’ll be expecting you to look back at our first data transformation lab for help on how to use the filter, arrange and select functions. However, for the sake of a little review, here are three lines of code that will: find the top 10 players in the NBA by minutes played, find the top 10 Knicks by minutes played, and show the Knicks roster (the name and position of every Knick who played at least 100 minutes).

nba %>% top_n(10, TotalMinutesPlayed) %>% arrange(desc(TotalMinutesPlayed))
nba %>% filter(Team=="NYK") %>% top_n(10, TotalMinutesPlayed) %>% arrange(desc(TotalMinutesPlayed))
nba %>% filter(Team=="NYK" & TotalMinutesPlayed>=100) %>% select(Name, Position)
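As a reminder of what the pipe is doing: %>% passes the result on its left in as the first argument of the function on its right, so a piped chain does the same work as a set of nested calls. Here is a minimal sketch using a small made-up data frame (toy and its values are invented for illustration; they are not NBA data):

```r
library(dplyr)

# Hypothetical toy data, invented for illustration
toy <- data.frame(
  Name = c("A", "B", "C", "D"),
  Team = c("NYK", "NYK", "BOS", "NYK"),
  TotalMinutesPlayed = c(1200, 90, 2000, 450)
)

# Piped version: read left to right
piped <- toy %>%
  filter(Team == "NYK") %>%
  arrange(desc(TotalMinutesPlayed)) %>%
  select(Name, TotalMinutesPlayed)

# Nested version: the same operations, read from the inside out
nested <- select(
  arrange(filter(toy, Team == "NYK"), desc(TotalMinutesPlayed)),
  Name, TotalMinutesPlayed
)

# Check that both versions give the same data frame
identical(piped, nested)
```

Note that top_n() picks out rows but does not sort them, which is why the review lines above follow it with arrange(). In dplyr 1.0.0 and later, top_n() is marked as superseded by slice_max(), though top_n() still works.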

We might also want to make use of the mutate() function to add columns based on our own calculations. Our data set does not include each player’s shooting percentage but we can calculate it:

nba %>% mutate(FieldGoalPercentage = FieldGoalsMade/FieldGoalsAttempted)

The line above calculates every player’s shooting percentage and displays it, but it doesn’t permanently add this calculation to our dataframe. To do so, we need to overwrite the dataframe as follows:

nba <- nba %>% mutate(FieldGoalPercentage = FieldGoalsMade/FieldGoalsAttempted)
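The difference between those two lines can be checked on a small example. The sketch below uses a made-up data frame (shots and its numbers are hypothetical) to show that mutate() returns a new data frame and leaves the original alone until the result is assigned back. It also shows something to watch for in the real data: a player with zero attempts gets NaN, because 0/0 is undefined in R.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration
shots <- data.frame(
  Name = c("A", "B", "C"),
  FieldGoalsMade = c(50, 0, 30),
  FieldGoalsAttempted = c(100, 0, 80)
)

# Without assignment: the result is printed, but shots is unchanged
shots %>% mutate(FieldGoalPercentage = FieldGoalsMade / FieldGoalsAttempted)
"FieldGoalPercentage" %in% names(shots)   # still FALSE

# With assignment: the new column is kept
shots <- shots %>%
  mutate(FieldGoalPercentage = FieldGoalsMade / FieldGoalsAttempted)
"FieldGoalPercentage" %in% names(shots)   # now TRUE

# Player B attempted no shots, so 0/0 gives NaN
shots$FieldGoalPercentage
```

If NaN values get in the way of a ranking, one option is to filter on a minimum number of attempts before sorting.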

Problems:

  1. Add three-point shooting percentage and free throw shooting percentage to the data set.
  2. Add points per game, rebounds per game, assists per game and steals per game to the data set.
  3. Produce a top 10 list of steals per game using some minimum number of games played.
  4. Produce a top 10 list of rebounds per game that includes only shooting guards and point guards (SG and PG).

Questions:

  1. Which player had the most rebounds per 48 minutes (minimum 100 minutes played)?
  2. Which player had the highest assists to turnovers ratio (minimum 100 assists)?
  3. Which center was the best free throw shooter?

Going Further:

Google “true shooting percentage” and then calculate it for every player. Decide on a reasonable minimum amount of either playing time or shots taken and then produce a top 5 list by true shooting percentage for every position.