1 Introduction

Hello! and welcome to another visual presentation from the “For Fun” series. Today we will be exploring graphs and the coding it took to get there. I will not go to much into detail on each graph, as this project is more to showcase my abilities to produce graphs and clean data. This is also for those that want to learn more about basketball, and the fundamentals of R programming. This data frame was created by me, but numbers were provided by (https://www.basketball-reference.com/). Enjoy!

rm(list = ls())
graphics.off()
library("foreign")
library('scales')
library('tidyverse')
## ── Attaching packages ──────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.2     ✓ purrr   0.3.4
## ✓ tibble  3.0.3     ✓ dplyr   1.0.0
## ✓ tidyr   1.1.0     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.5.0
## ── Conflicts ─────────────────────────────────────────────────── tidyverse_conflicts() ──
## x readr::col_factor() masks scales::col_factor()
## x purrr::discard()    masks scales::discard()
## x dplyr::filter()     masks stats::filter()
## x dplyr::lag()        masks stats::lag()
library('grid')
library('egg')
## Loading required package: gridExtra
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine
library('patchwork')
library('gridExtra')
setwd("~/Desktop") #set work directory 
nba <- read.csv("NBA STATsR.csv") #upload data set 

2 Exploring the data frame

2.1 Dimensions

dim(nba)
## [1] 6280   40

Over 6000 rows, with 40 columns.

2.2 Structure

Lets get a more in depth view.

str(nba) #Structure of date
## 'data.frame':    6280 obs. of  40 variables:
##  $ Year              : Factor w/ 10 levels "2010-2011","2011-2012",..: 6 5 5 2 1 4 5 5 3 5 ...
##  $ Rk                : int  170 373 373 357 339 360 373 373 344 63 ...
##  $ Player            : Factor w/ 1287 levels "A.J. Hammons",..: 1 2 2 2 2 2 2 2 2 3 ...
##  $ ID                : Factor w/ 1291 levels "abrinal01","acyqu01",..: 470 962 962 962 962 962 962 962 962 156 ...
##  $ Pos               : Factor w/ 15 levels "C","C-PF","PF",..: 1 6 6 6 6 12 6 6 6 6 ...
##  $ Age               : int  24 28 28 25 24 27 28 28 26 30 ...
##  $ Tm                : Factor w/ 34 levels "ATL","BOS","BRK",..: 8 7 13 13 13 19 27 32 34 5 ...
##  $ Western.Conference: int  1 0 0 0 0 1 1 NA 0 0 ...
##  $ Eastern.Conference: int  0 1 1 1 1 0 0 NA 1 1 ...
##  $ Atlantic          : int  0 0 0 0 0 0 0 NA 0 0 ...
##  $ Central           : int  0 1 1 1 1 0 0 NA 0 1 ...
##  $ Northwest         : int  0 0 0 0 0 1 0 NA 0 0 ...
##  $ Pacific           : int  0 0 0 0 0 0 1 NA 0 0 ...
##  $ Southeast         : int  0 0 0 0 0 0 0 NA 1 0 ...
##  $ Southwest         : int  1 0 0 0 0 0 0 NA 0 0 ...
##  $ G                 : int  22 11 10 44 50 28 5 26 57 82 ...
##  $ GS                : int  0 0 0 1 0 0 0 0 22 21 ...
##  $ MP                : num  7.4 7.9 19.3 12.9 15.9 3.5 8.8 12.5 22.4 23 ...
##  $ FG                : num  0.8 0.8 3.9 1.3 2.3 0.7 0.6 2 2.8 4.2 ...
##  $ FGA               : num  1.9 3.1 8.9 4 6.4 1.6 2.8 5.3 7.2 10 ...
##  $ FG.               : num  0.405 0.265 0.438 0.339 0.356 0.413 0.214 0.372 0.39 0.421 ...
##  $ X3P               : num  0.2 0 1.5 0.6 0.8 0.2 0 0.6 1.2 1.5 ...
##  $ X3PA              : num  0.5 1 3.9 2 3 0.8 1.4 2.2 3.5 3.8 ...
##  $ X3P.              : num  0.5 0 0.385 0.295 0.275 0.273 0 0.263 0.35 0.387 ...
##  $ X2P               : num  0.5 0.8 2.4 0.8 1.5 0.5 0.6 1.4 1.6 2.7 ...
##  $ X2PA              : num  1.5 2.1 5 2 3.4 0.9 1.4 3.1 3.7 6.1 ...
##  $ X2P.              : num  0.375 0.391 0.48 0.384 0.427 0.542 0.429 0.45 0.427 0.442 ...
##  $ eFG.              : num  0.464 0.265 0.522 0.414 0.42 0.478 0.214 0.427 0.475 0.495 ...
##  $ FT                : num  0.4 0.4 1.2 0.6 1.1 0 0 0.6 0.9 1.8 ...
##  $ FTA               : num  0.9 0.5 1.8 0.8 1.6 0.1 0 0.9 1.1 2.1 ...
##  $ FT.               : num  0.45 0.667 0.667 0.8 0.667 0 NA 0.667 0.79 0.833 ...
##  $ ORB               : num  0.4 0.4 0.2 0.3 0.3 0 0 0.2 0.4 0.4 ...
##  $ DRB               : num  1.3 1 1.2 1.1 1.1 0.3 0.6 1 1.6 1.6 ...
##  $ TRB               : num  1.6 1.4 1.4 1.4 1.4 0.4 0.6 1.2 2 2 ...
##  $ AST               : num  0.2 1.2 2.7 2 2.2 0.5 1.2 1.8 3.6 3.2 ...
##  $ STL               : num  0 0.3 0.4 0.5 0.6 0 0 0.3 0.6 0.7 ...
##  $ BLK               : num  0.6 0 0 0 0 0 0 0 0.1 0.2 ...
##  $ TOV               : num  0.5 0.3 1 0.7 1.1 0.3 0.2 0.5 1.1 1.9 ...
##  $ PF                : num  1 0.1 0.9 0.7 1.2 0.2 1 0.6 1.3 2.3 ...
##  $ PTS               : num  2.2 2 10.5 3.9 6.5 1.6 1.2 5.1 7.7 11.6 ...

We see the season is listed as a factor variable, which means its discrete. This may be useful in later studies, but it may be best to create a continuous variable.

2.3 Viewing data frame

Lets see what kind of data we are working with.

#First 7 columns
head(nba [1:7])
##        Year  Rk       Player        ID Pos Age  Tm
## 1 2015-2016 170 A.J. Hammons hammoaj01   C  24 DAL
## 2 2014-2015 373   A.J. Price priceaj01  PG  28 CLE
## 3 2014-2015 373   A.J. Price priceaj01  PG  28 IND
## 4 2011-2012 357   A.J. Price priceaj01  PG  25 IND
## 5 2010-2011 339   A.J. Price priceaj01  PG  24 IND
## 6 2013-2014 360   A.J. Price priceaj01  SG  27 MIN
tail(nba [1:7])
##           Year  Rk             Player        ID Pos Age  Tm
## 6275 2014-2015 133       Zoran Dragić dragizo01  SG  25 MIA
## 6276 2014-2015 133       Zoran Dragić dragizo01  SG  25 PHO
## 6277 2014-2015 133       Zoran Dragić dragizo01  SG  25 TOT
## 6278 2010-2011 218 Zydrunas Ilgauskas ilgauzy01   C  35 MIA
## 6279 2018-2019  98     Zylan Cheatham cheatzy01  PF  24 NOP
## 6280 2019-2020  98     Zylan Cheatham cheatzy01  PF  24 NOP

3 Cleaning Data

nba <-  filter(nba,Tm != 'TOT') #TOT mean player was traded in that year
nba <- nba %>% 
  mutate(Player = as.character(Player)) %>% 
  rename(
    '3P' = 'X3P',
    '3P%' = 'X3P.',
    '2P' = 'X2P',
    '2P%' = 'X2P.',
    '3PA' = 'X3PA',
    '2PA' = 'X2PA'
    ) %>% #Change the type of varible and rename columns
  mutate(Western.Conference = as.factor(Western.Conference)) %>% 
  mutate(Eastern.Conference = as.factor(Eastern.Conference)) %>% 
  mutate(Atlantic = as.factor(Atlantic)) %>% 
  mutate(Central = as.factor(Central)) %>% 
  mutate(Northwest = as.factor(Northwest)) %>% 
  mutate(Pacific = as.factor(Pacific)) %>% 
  mutate(Southeast = as.factor(Southeast)) %>% 
  mutate(Southwest = as.factor(Southwest)) %>% 
    arrange( Player,desc(Year))#Arrange by name first, Year second
nba$Conference <- ifelse(nba$Eastern.Conference==1,"East","West") 
nba$Conference <- as.factor(nba$Conference)

Some players are traded during the middle of a season, so their stats for the following year would be aggregated under ‘TOT’. Under ‘TOT’ we cannot get proper visuals for our data, it would be best to eliminate them.

Next we want to clear out any missing or NA values we see.

sum(is.na(nba)) #number of missing values
## [1] 1098
nba <- nba %>% 
  replace(is.na(nba),0) #Replacing values with 0
sum(is.na(nba))
## [1] 0

If the player had zero three-point attempts, then it’s safe to assume that they had zero three pointers through the season. There are a lot of ‘NA’ or ‘-’ values plaguing this column, so we will eliminate those and replace them with ‘0’.

nba <- nba %>% 
  mutate(Year.cv = as.numeric(Year)) %>% 
  mutate(Year.cv = (2010 + Year.cv))
#Making year a separate and continuous variable

With dplyr package we can easily create this. I will label it “Year.cv”.

4 Creating Data Frames per Conference, and Division

#Creating EC, and divisions
East.Conference <- nba %>% 
  filter(Eastern.Conference == 1)
Atlantic <- nba %>% 
  filter(Atlantic == 1)
Central <- nba %>% 
  filter(Central == 1)
Southeast <- nba %>% 
  filter(Southeast == 1)
#Creating WC, and divisions
West.Conference <- nba %>% 
  filter(Western.Conference == 1)
Northwest <- nba %>% 
  filter(Northwest == 1)
Pacific <- nba %>% 
  filter(Pacific == 1)
Southwest <- nba %>% 
  filter(Southwest == 1)

I wanted to create separate data frames to explore visuals of stats per division with respect to their conference. It was not necessary to create data frames for every division in this markdown, but maybe we will explore them further in future research.

5 NBA Players Age

A histogram can perfectly illustrate the average age of an NBA player between 2010-2020.

#Avg Age of the past decade
plot.age <- ggplot(nba, aes(Age, fill = Conference))
(plot.age + geom_histogram(aes(y= ..density..),
                           color = 'white',
                           binwidth = 1,
                           boundary = 19,
                           position="dodge")+
    ylab("Variance") + 
    ggtitle("Age Distribution the Past Decade") + 
    theme_bw()+
    scale_fill_manual(name = "Conference", labels = c("East","West"),
                       values = c("blue", "red")))

The age for the average NBA player is around the 22-25 years old. With this histogram we see players typically start considering retiring after age 30.

6 Scoring East vs West

6.1 Statisitics

##West
West.Conference %>% 
  select(32:40, !everything()) %>% 
  summary()
##       ORB              DRB              TRB             AST       
##  Min.   :0.0000   Min.   : 0.000   Min.   : 0.00   Min.   : 0.00  
##  1st Qu.:0.3000   1st Qu.: 1.300   1st Qu.: 1.70   1st Qu.: 0.60  
##  Median :0.6000   Median : 2.200   Median : 2.90   Median : 1.20  
##  Mean   :0.8554   Mean   : 2.598   Mean   : 3.45   Mean   : 1.78  
##  3rd Qu.:1.2000   3rd Qu.: 3.500   3rd Qu.: 4.60   3rd Qu.: 2.30  
##  Max.   :5.1000   Max.   :10.900   Max.   :15.20   Max.   :11.40  
##       STL              BLK              TOV              PF       
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.000   Min.   :0.000  
##  1st Qu.:0.3000   1st Qu.:0.1000   1st Qu.:0.500   1st Qu.:1.100  
##  Median :0.5000   Median :0.3000   Median :0.900   Median :1.700  
##  Mean   :0.6199   Mean   :0.3941   Mean   :1.108   Mean   :1.707  
##  3rd Qu.:0.9000   3rd Qu.:0.5000   3rd Qu.:1.500   3rd Qu.:2.300  
##  Max.   :2.5000   Max.   :3.7000   Max.   :5.700   Max.   :5.000  
##       PTS        
##  Min.   : 0.000  
##  1st Qu.: 3.500  
##  Median : 6.600  
##  Mean   : 8.125  
##  3rd Qu.:11.300  
##  Max.   :36.100

The offensive capabilities of the western conference is interesting. There is a player who averaged at least 36 points per game. Although the all around capabilities of the eastern conference, reigned supreme the last decade.

##East
East.Conference %>% 
  select(32:40, !everything()) %>% 
  summary()
##       ORB              DRB              TRB             AST        
##  Min.   :0.0000   Min.   : 0.000   Min.   : 0.00   Min.   : 0.000  
##  1st Qu.:0.3000   1st Qu.: 1.400   1st Qu.: 1.80   1st Qu.: 0.600  
##  Median :0.6000   Median : 2.300   Median : 3.00   Median : 1.200  
##  Mean   :0.8541   Mean   : 2.637   Mean   : 3.49   Mean   : 1.808  
##  3rd Qu.:1.1500   3rd Qu.: 3.500   3rd Qu.: 4.60   3rd Qu.: 2.400  
##  Max.   :6.0000   Max.   :12.000   Max.   :18.00   Max.   :12.800  
##       STL              BLK             TOV              PF       
##  Min.   :0.0000   Min.   :0.000   Min.   :0.000   Min.   :0.000  
##  1st Qu.:0.3000   1st Qu.:0.100   1st Qu.:0.600   1st Qu.:1.200  
##  Median :0.6000   Median :0.300   Median :1.000   Median :1.700  
##  Mean   :0.6171   Mean   :0.383   Mean   :1.116   Mean   :1.708  
##  3rd Qu.:0.9000   3rd Qu.:0.500   3rd Qu.:1.500   3rd Qu.:2.300  
##  Max.   :2.4000   Max.   :6.000   Max.   :4.800   Max.   :4.000  
##       PTS        
##  Min.   : 0.000  
##  1st Qu.: 3.700  
##  Median : 6.700  
##  Mean   : 8.064  
##  3rd Qu.:11.600  
##  Max.   :30.500

When it comes to rebounds, assists and blocks, the eastern conference reigns supreme when compared to its adversary.

6.2 Average Points by Conference

#Avg points by conf
plot.con.points <- ggplot(nba, aes(Conference, PTS, fill = Conference))
  (plot.con.points + geom_boxplot() +
      coord_flip()+ 
      theme_bw() + xlab("Conference")+
      ggtitle("Points by Conference")+
    scale_fill_manual(name = "Conference", labels = c("East","West"),
                       values = c("blue", "red")))

There are a lot of outliers within this graph. In this particular case, we are observing players that averaged over 20 points the past decade, clearly setting themselves apart from the average NBA player.

7 Creating Data Frame from Scratch

Suppose we want to observe the change in points within this given data set. To do that I must create a data frame that contains the average points for each conference through 2010-2020. We can do this by doing the following:

#Creating trend line 
e.2010_2011 <- East.Conference %>% 
  filter(Year == "2010-2011") 
e.2011_2012 <- East.Conference %>% 
  filter(Year == "2011-2012") 
e.2012_2013 <- East.Conference %>% 
  filter(Year == "2012-2013") 
e.2013_2014 <- East.Conference %>% 
  filter(Year == "2013-2014") 
e.2014_2015 <- East.Conference %>% 
  filter(Year == "2014-2015") 
e.2015_2016 <- East.Conference %>% 
  filter(Year == "2015-2016") 
e.2016_2017 <- East.Conference %>% 
  filter(Year == "2017-2018") 
e.2017_2018 <- East.Conference %>% 
  filter(Year == "2017-2018") 
e.2018_2019 <- East.Conference %>% 
  filter(Year == "2018-2019") 
e.2019_2020 <- East.Conference %>% 
  filter(Year == "2019-2020")
##getting avg points per season
mean(e.2010_2011$PTS)
## [1] 7.967647

With the mean calculated, I calculate the rest of the conference’s with their respective season of performance.

7.1 Creating the Data Frame

With the calculated mean and some simple vector basics, creating a data frame is easy.

#Creating data frame for graph trend
name.value <- c("East","East","East","East","East","East","East","East","East","East",
                "West","West","West","West","West","West","West","West","West","West")
avg.points <- c(7.967647,7.751172,7.636466,7.580645,8.090288,8.142222,8.429114,8.429114,8.512111,8.512111,
                7.875556,7.827799,7.636187,7.911524,7.714478,8.400735,8.03887,8.460131,8.615563,8.615563)
time <- c(2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,
          2011,2012,2013,2014,2015,2016,2017,2018,2019,2020) 
avg.points.df <- rbind(name.value,avg.points,time)
avg.points.df <- as.data.frame(t(avg.points.df)) #transpose matrix
avg.points.df <- avg.points.df %>% 
  rename(
    'Conference' = 'name.value',
    'Points' = 'avg.points',
    'Year.Ending' = 'time'
  ) %>% 
      mutate(Points = as.numeric(Points)) %>% 
      mutate(Year.Ending = as.integer(Year.Ending)) %>% 
      mutate(Year.Ending = (2010 + Year.Ending)) #creating continuous variable

Using vectors, I was able to create this data frame. Now we can see change in average points between 2010-2020.

8 Avgerage Points Across the Decade

##graphing trend
plot.avg.points <- ggplot(avg.points.df, aes(Year.Ending, Points, color = Conference))
  (plot.avg.points + 
    geom_line()+ theme_bw()+ 
      labs(title = "Eastern vs Western",
           x = "Years",
           y = "Avg. Points")+
    scale_color_manual(name = "Conference", labels = c("East","West"),
                                               values = c("blue", "red")))

The 2011 NBA Lockout was effective, as we can see a drop in average scoring. Overall we can see scoring increase with the west having an edge in 2020.

9 Box Plot Points by Division

##########################
#Per division 
East.Conference$Division <- ifelse(East.Conference$Atlantic==1,"Atlantic",0)
East.Conference$Division <- ifelse(East.Conference$Central==1,"Central",East.Conference$Division)
East.Conference$Division <- ifelse(East.Conference$Southeast==1,"Southeast",East.Conference$Division)

West.Conference$Division <- ifelse(West.Conference$Pacific==1,"Pacific",0)
West.Conference$Division <- ifelse(West.Conference$Northwest==1,"Northwest",West.Conference$Division)
West.Conference$Division <- ifelse(West.Conference$Southwest==1,"Southwest",West.Conference$Division)

Here I created some indicator variables to help create a good descriptive legend for our graph.

9.1 Box Graph

#grid box plots
nba.division <- rbind(East.Conference,West.Conference)
plot.div.box <- ggplot(nba.division, aes(Conference, PTS, fill = Division))
(plot.div.box + geom_boxplot() +
    coord_flip()+ 
    theme_bw() + xlab("Avg. Points")+
    ggtitle("Points by Division")+
    labs( title = "Points by Division",
          x = "Conference",
          y = "Avg PTS"))

For the western conference, the northwest seem to take average scoring implying they have a good offensive system. Yet, they don’t have many outliers when compared to northwest. Meaning this division might have more star power compared to it’s adversary.

Eastern conference scoring, the central division has a slight edge. It also noticed how many capable players that can score over 20 points in the west when compared to the east.

10 Stat Visuals per Division

#grid jitters 
nba.avg <- nba.division %>% 
  group_by(Year, Division, Conference) %>% 
  summarise_each(funs(mean)) %>%   #As you study this language further you find shortcuts
  arrange(Division, Year) %>% 
  select(-c(Rk, Age, Player:Pos, Tm:Southwest))

I calculated the average stat for every division within each season, creating a continuous variable. These averages will help construct a more vivid line graph.

I will create various plots, and arrange them accordingly.

plot.avgp <- ggplot(nba.avg, aes(Year.cv, PTS,color = Division))
q1 <- (plot.avgp + geom_point() + scale_color_manual(name = "Division",  
                                                         labels = c("Atlantic","Central","Northwest","Pacific","Southeast","Southwest"),
                                                         values = c("#66FF33", "#336600","#0066CC","#FFFF33","#FF0033","#9966FF"))+
         theme_bw() + 
         theme(legend.position = "top") +
         labs( x = "Year",
               y = "Avg. Points"))
plot.avga <- ggplot(nba.avg, aes(Year.cv, AST, color = Division))
q2 <- (plot.avga + geom_point() + scale_color_manual(name = "Division",  
                                                        labels = c("Atlantic","Central","Northwest","Pacific","Southeast","Southwest"),
                                                        values = c("#66FF33", "#336600","#0066CC","#FFFF33","#FF0033","#9966FF")) +
       theme_bw() +
       theme(legend.position = "none") +
        labs(x = "Year",
             y = "Avg. Assists"))
plot.avgr <- ggplot(nba.avg, aes(Year.cv, TRB, color = Division))
q3 <- (plot.avgr + geom_point() + scale_color_manual(name = "Division",  
                                                         labels = c("Atlantic","Central","Northwest","Pacific","Southeast","Southwest"),
                                                         values = c("#66FF33", "#336600","#0066CC","#FFFF33","#FF0033","#9966FF")) +
        theme_bw() +
         theme(legend.position = "none") +
        labs( x = "Year",
              y = "Avg. Rebounds"))
plot.avgt <- ggplot(nba.avg, aes(Year.cv, TOV, color = Division))
q4 <- (plot.avgt + geom_point() + scale_color_manual(name = "Division",  
                                                         labels = c("Atlantic","Central","Northwest","Pacific","Southeast","Southwest"),
                                                         values = c("#66FF33", "#336600","#0066CC","#FFFF33","#FF0033","#9966FF")) +
         theme_bw() +
         theme(legend.position = "none") +
         labs( x = "Year",
               y = "Avg. Turnovers"))

With the plots created, I will now arrange them into a dashboard.

10.1 Stat Plot

ggarrange(q1, q2, q3, q4, 
          ncol = 2,
          top = "Through the Decade",
          bottom = textGrob(
            "This footnote is right-justified",
            gp = gpar(fontface = 3, fontsize = 10),
            hjust = 1, x = 1))

A dot plot is perfect for showing comparisons given the same circumstances.

10.2 2’s and 3’s

Lets observe shooting statistics between divisions. First I will create separate line graphs for each category we observing. -Attempted 2/3 point shots -Shooting percentages for both 2/3 point shots

#Conference Scoring
##West
### This whole section focuses on creating plots 
wcon <- nba.avg %>%
  filter(Conference == "West")

plot.wcon.3attempts <- ggplot(wcon, aes(Year.cv, `3PA`, color = Division))
q5 <- (plot.wcon.3attempts + geom_line() +
         scale_color_manual(name = "Division",  
                            labels = c("Northwest","Pacific","Southwest"),
                            values = c("#0066CC","#FFFF33","#9966FF")) +
         theme_bw() + 
         theme(legend.position = "top") +
         labs( x = "Year",
               y = "Attempted 3's"))
plot.wcon.3made <- ggplot(wcon, aes(Year.cv, `3P%`, color = Division))
q6 <- (plot.wcon.3made + geom_line() +
         scale_color_manual(name = "Division",  
                            labels = c("Northwest","Pacific","Southwest"),
                            values = c("#0066CC","#FFFF33","#9966FF")) +
         theme_bw() +
         theme(legend.position = "none") +
         labs( x = "Year"))

plot.wcon.2attempts <- ggplot(wcon, aes(Year.cv, `2PA`, color = Division))
q7 <- (plot.wcon.2attempts + geom_line() +
         scale_color_manual(name = "Division",  
                            labels = c("Northwest","Pacific","Southwest"),
                            values = c("#0066CC","#FFFF33","#9966FF")) +
         theme_bw() +
         theme(legend.position = "none") +
         labs( x = "Year",
               y = "Attempted 2's"))
plot.wcon.2made <- ggplot(wcon, aes(Year.cv, `2P%`, color = Division))
q8 <- (plot.wcon.2made + geom_line() +
         scale_color_manual(name = "Division",  
                            labels = c("Northwest","Pacific","Southwest"),
                            values = c("#0066CC","#FFFF33","#9966FF")) +
         theme_bw() +
         theme(legend.position = "none") +
         labs( x = "Year"))

I do the same for the Eastern Conference, but wont include it in this markdown.

10.3 Line Plot per Division

ggarrange(q5, q6, q7, q8, 
          ncol = 2,
          top = "Western Conf. Scoring",
          bottom = textGrob(
            "This footnote is right-justified",
            gp = gpar(fontface = 3, fontsize = 10),
            hjust = 1, x = 1))

ggarrange(q9, q10, q11, q12, 
          ncol = 2,
          top = "Eastern Conf. Scoring",
          bottom = textGrob(
            "This footnote is right-justified",
            gp = gpar(fontface = 3, fontsize = 10),
            hjust = 1, x = 1)) #arrange graps

Both graphs we can see the three point shot becoming more favorable over time, in return turning down more 2 point shot attempts.

10.4 3’s Attempted

plotb <- ggplot(nba, aes(Year,`3PA`))
(plotb + geom_violin(aes(fill = Year), size = 1) +
    theme_bw()+ theme(legend.position="none")+
    labs( title = "3 Pointers Attempted 2010-2020",
          x = "Years",
          y = "3's Attempted")) #Plot violin graph

The 3 point shot attempt increased drastically over the past decade. Also observe the 0 attempts decrease in volume as well, as more players are encouraged to shoot more. Shows how crucial shooting behind the arc has become in a modern NBA game.

11 Best 3 point shooters

nba %>% 
  filter(`3P` > 3)%>% 
  arrange(Player, desc(Year)) %>% 
  distinct(Player)
##                Player
## 1         Buddy Hield
## 2    D'Angelo Russell
## 3      Damian Lillard
## 4       Dāvis Bertāns
## 5     Devonte' Graham
## 6     Duncan Robinson
## 7         Eric Gordon
## 8       Isaiah Thomas
## 9         J.J. Redick
## 10       James Harden
## 11 Karl-Anthony Towns
## 12       Kemba Walker
## 13      Klay Thompson
## 14         Kyle Lowry
## 15      Malik Beasley
## 16        Paul George
## 17        R.J. Hunter
## 18      Stephen Curry
## 19         Trae Young
## 20        Zach LaVine

The only players that have averaged at least three 3 pointers made.

Although its impressive, we should be obsessed with shooting rate. That is, what are the chances of the shot going in given that you’re behind the arc.

nba %>% 
  filter(`3P` > 3 & `3P%` > 0.40) %>% 
  arrange(Player, desc(Year)) %>% 
  distinct(Player)
##                Player
## 1         Buddy Hield
## 2      Damian Lillard
## 3       Dāvis Bertāns
## 4     Duncan Robinson
## 5  Karl-Anthony Towns
## 6       Klay Thompson
## 7          Kyle Lowry
## 8       Malik Beasley
## 9         Paul George
## 10      Stephen Curry

These are more efficient shooters, as at one point in their careers they shot over 40% from the three point line.

best.shooters <- nba %>% 
  filter(Player == "Buddy Hield"|Player == "Damian Lillard"|
           Player == "Dāvis Bertāns"|Player == "Duncan Robinson"|
           Player == "Karl-Anthony Towns"|Player == "Klay Thompson"|
           Player == "Kyle Lowry"|Player == "Malik Beasley"|
           Player == "Paul George"|Player == "Stephen Curry") #Filtering out best shooters 

I want to create a dot plot that represents the best 3 point shooters this NBA has seen this past decade.

table1 <- best.shooters %>% 
  group_by(Player) %>% 
  summarise_each(funs(mean)) %>% 
  select(c(Player,`3P`,`3PA`,`3P%`)) %>% 
  mutate_if(is.numeric, round, 3) #Table over aggregate stats in respect to 3 point shooting 

Created this separate data frame as I will combine this with my graph.

11.1 Best Shooter Plot

plot.best.shooters <- ggplot(best.shooters, aes(Year.cv, `3P%`, color = Player))
p1 <- (plot.best.shooters + geom_jitter() + theme_bw()+ 
         theme(legend.position = "top") +
         labs( title = "Shooting Percentages", 
               x = "Year",
               y = "3P%"))
grid.arrange( p1,
  tableGrob(table1),
  ncol = 2,
  widths = c(2, 1.5),
  clip = FALSE)

12 Conclusion

I’ve got some pretty interesting graphs computed from the numbers provided by (https://www.basketball-reference.com/). I’ve been a big fan of the NBA for years, and thought it would be fun to combine data analytics and NBA stats together. Hopefully my other NBA fans found this interesting as much as I did. Thank you for taking the time to view this!