NBA Project

Author

Dany Drammeh

Brief Introduction

The data set I have selected is a collection of NBA statistics from the 2021-2022 regular season. It contains 812 observations and 30 variables (3 categorical, 27 quantitative). The variables cover shooting, offensive, and defensive statistics for players, along with the team each player appeared for. The data set was compiled by Ron Yurko from Basketball Reference, a site that collects extensive NBA statistics and information. As the "per_poss" in the source URL indicates, the counting statistics are reported per 100 possessions rather than per game. The source is https://www.basketball-reference.com//leagues/NBA_2022_per_poss.html

The variables I plan to use in my data visualization are player and x3ppercent (3-point percentage). I will select 10 of my favorite players in the league and see which of them has the best 3-point percentage (in other words, who is the best 3-point shooter). Once I have selected my 10 players, I will look at the number of 3-point shots each one attempted per 100 possessions (x3pa) to form a hypothesis about who will be the best shooter before creating my data visualization. My reasoning is that a player who attempts more 3-point shots is likely more confident in their shooting, so I expect the player with the most attempts to also make the highest share of them.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.2     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
setwd("C:/Users/danyd/OneDrive/Desktop/data 110/week7hw")
nbapstats <- read.csv("nbapstats.csv")

First, we will look at the top and bottom of the data and scan for any NAs in the data set.

head(nbapstats)
                    player pos age  tm  g gs   mp   fg  fga fgpercent x3p x3pa
1         Precious Achiuwa   C  22 TOR 73 28 1725  7.7 17.5     0.439 1.6  4.5
2             Steven Adams   C  28 MEM 76 75 1999  5.0  9.2     0.547 0.0  0.0
3              Bam Adebayo   C  24 MIA 56 56 1825 11.1 20.0     0.557 0.0  0.2
4             Santi Aldama  PF  21 MEM 32  0  360  7.0 17.5     0.402 0.8  6.4
5        LaMarcus Aldridge   C  36 BRK 47 12 1050 11.6 21.1     0.550 0.6  2.1
6 Nickeil Alexander-Walker  SG  23 TOT 65 21 1466  8.5 22.9     0.372 3.5 11.4
  x3ppercent  x2p x2pa x2ppercent  ft fta ftpercent orb  drb  trb ast stl blk
1      0.359  6.1 13.0      0.468 2.3 3.8     0.595 4.2  9.5 13.7 2.4 1.1 1.2
2      0.000  5.0  9.2      0.548 2.6 4.8     0.543 8.4  9.8 18.2 6.1 1.6 1.4
3      0.000 11.1 19.8      0.562 7.0 9.3     0.753 3.8 11.7 15.5 5.2 2.2 1.2
4      0.125  6.2 11.2      0.560 2.7 4.3     0.625 4.4  7.2 11.6 2.8 0.8 1.3
5      0.304 11.0 19.0      0.578 4.1 4.7     0.873 3.4  8.5 11.9 1.9 0.6 2.2
6      0.311  5.0 11.5      0.433 2.7 3.7     0.743 1.2  5.1  6.3 5.3 1.5 0.8
  tov  pf  pts ortg drtg
1 2.4 4.4 19.2  105  110
2 2.8 3.7 12.6  125  108
3 4.1 4.7 29.3  117  104
4 2.1 4.8 17.5  101  111
5 2.0 3.6 28.0  119  112
6 3.1 3.5 23.3   98  114
tail(nbapstats)
            player pos age  tm  g gs   mp   fg  fga fgpercent x3p x3pa
807 Thaddeus Young  PF  33 SAS 26  1  370  9.6 16.6     0.578 0.0  0.6
808 Thaddeus Young  PF  33 TOR 26  0  475  7.1 15.2     0.465 1.8  4.5
809     Trae Young  PG  23 ATL 76 76 2652 13.2 28.6     0.460 4.3 11.3
810 Omer Yurtseven   C  23 MIA 56 12  706  9.2 17.5     0.526 0.1  0.8
811    Cody Zeller   C  29 POR 27  0  355  7.0 12.4     0.567 0.0  0.6
812    Ivica Zubac   C  24 LAC 76 76 1852  8.2 13.1     0.626 0.0  0.0
    x3ppercent x2p x2pa x2ppercent  ft  fta ftpercent orb  drb  trb  ast stl
807      0.000 9.6 16.0      0.602 1.3  2.9     0.455 5.2  6.9 12.1  7.7 3.0
808      0.395 5.3 10.6      0.495 1.4  2.8     0.481 4.2  7.9 12.1  4.7 3.3
809      0.382 8.9 17.3      0.512 9.3 10.2     0.904 0.9  4.3  5.3 13.7 1.3
810      0.091 9.1 16.7      0.547 2.7  4.3     0.623 6.0 14.8 20.8  3.5 1.2
811      0.000 7.0 11.8      0.593 5.2  6.7     0.776 6.9 10.3 17.2  3.0 1.1
812         NA 8.2 13.1      0.626 4.4  6.0     0.727 5.7 11.3 17.0  3.2 1.0
    blk tov  pf  pts ortg drtg
807 0.9 3.9 4.9 20.5  113  109
808 1.2 2.3 4.5 17.3  113  106
809 0.1 5.6 2.4 39.9  119  118
810 1.4 2.9 6.0 21.2  112  104
811 0.8 2.6 7.7 19.3  127  116
812 2.0 3.0 5.4 20.8  126  107
summary(nbapstats)
    player              pos                 age             tm           
 Length:812         Length:812         Min.   :19.00   Length:812        
 Class :character   Class :character   1st Qu.:23.00   Class :character  
 Mode  :character   Mode  :character   Median :25.00   Mode  :character  
                                       Mean   :26.05                     
                                       3rd Qu.:29.00                     
                                       Max.   :41.00                     
                                                                         
       g               gs              mp               fg        
 Min.   : 1.00   Min.   : 0.00   Min.   :   1.0   Min.   : 0.000  
 1st Qu.:12.00   1st Qu.: 0.00   1st Qu.: 121.0   1st Qu.: 5.100  
 Median :36.50   Median : 4.00   Median : 577.5   Median : 6.850  
 Mean   :36.71   Mean   :16.67   Mean   : 825.2   Mean   : 6.935  
 3rd Qu.:61.00   3rd Qu.:25.00   3rd Qu.:1414.5   3rd Qu.: 8.600  
 Max.   :82.00   Max.   :82.00   Max.   :2854.0   Max.   :49.000  
                                                                  
      fga          fgpercent           x3p             x3pa       
 Min.   : 0.00   Min.   :0.0000   Min.   :0.000   Min.   : 0.000  
 1st Qu.:12.60   1st Qu.:0.3850   1st Qu.:0.600   1st Qu.: 3.400  
 Median :15.75   Median :0.4410   Median :2.000   Median : 6.550  
 Mean   :16.07   Mean   :0.4343   Mean   :2.029   Mean   : 6.468  
 3rd Qu.:19.20   3rd Qu.:0.5000   3rd Qu.:3.100   3rd Qu.: 9.100  
 Max.   :49.70   Max.   :1.0000   Max.   :9.900   Max.   :49.700  
                 NA's   :15                                       
   x3ppercent          x2p              x2pa          x2ppercent    
 Min.   :0.0000   Min.   : 0.000   Min.   : 0.000   Min.   :0.0000  
 1st Qu.:0.2587   1st Qu.: 2.600   1st Qu.: 5.900   1st Qu.:0.4507  
 Median :0.3310   Median : 4.700   Median : 9.250   Median :0.5160  
 Mean   :0.3034   Mean   : 4.908   Mean   : 9.601   Mean   :0.5055  
 3rd Qu.:0.3762   3rd Qu.: 6.700   3rd Qu.:12.500   3rd Qu.:0.5793  
 Max.   :1.0000   Max.   :49.000   Max.   :49.000   Max.   :1.0000  
 NA's   :72                                         NA's   :28      
       ft              fta           ftpercent           orb        
 Min.   : 0.000   Min.   : 0.000   Min.   :0.0000   Min.   : 0.000  
 1st Qu.: 1.200   1st Qu.: 1.800   1st Qu.:0.6735   1st Qu.: 0.800  
 Median : 2.500   Median : 3.400   Median :0.7650   Median : 1.600  
 Mean   : 2.895   Mean   : 3.906   Mean   :0.7476   Mean   : 2.424  
 3rd Qu.: 3.900   3rd Qu.: 5.200   3rd Qu.:0.8460   3rd Qu.: 3.400  
 Max.   :49.700   Max.   :49.700   Max.   :1.0000   Max.   :24.400  
                                   NA's   :97                       
      drb              trb              ast              stl        
 Min.   : 0.000   Min.   : 0.000   Min.   : 0.000   Min.   : 0.000  
 1st Qu.: 4.700   1st Qu.: 5.900   1st Qu.: 2.300   1st Qu.: 0.900  
 Median : 6.000   Median : 7.800   Median : 3.450   Median : 1.400  
 Mean   : 6.644   Mean   : 9.069   Mean   : 4.338   Mean   : 1.592  
 3rd Qu.: 8.300   3rd Qu.:11.900   3rd Qu.: 6.000   3rd Qu.: 1.900  
 Max.   :48.500   Max.   :48.500   Max.   :49.000   Max.   :25.000  
                                                                    
      blk               tov               pf              pts       
 Min.   : 0.0000   Min.   : 0.000   Min.   : 0.000   Min.   : 0.00  
 1st Qu.: 0.3000   1st Qu.: 1.600   1st Qu.: 3.175   1st Qu.:14.07  
 Median : 0.7000   Median : 2.300   Median : 4.100   Median :18.70  
 Mean   : 0.9431   Mean   : 2.535   Mean   : 4.499   Mean   :18.79  
 3rd Qu.: 1.2250   3rd Qu.: 3.100   3rd Qu.: 5.425   3rd Qu.:23.30  
 Max.   :16.6000   Max.   :25.000   Max.   :48.800   Max.   :98.00  
                                                                    
      ortg            drtg      
 Min.   :  0.0   Min.   : 63.0  
 1st Qu.:101.0   1st Qu.:110.0  
 Median :110.0   Median :113.0  
 Mean   :107.7   Mean   :112.3  
 3rd Qu.:118.0   3rd Qu.:116.0  
 Max.   :232.0   Max.   :125.0  
 NA's   :10                     
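The summary() output above reports the NA counts column by column, but they are spread across several blocks. A minimal sketch of a more compact check follows; it uses a small made-up data frame (toy, with invented values) in place of nbapstats, so the numbers it prints are illustrative only.

```r
# Toy data frame standing in for nbapstats (values are made up for illustration)
toy <- data.frame(
  player     = c("A", "B", "C", "D"),
  x3ppercent = c(0.359, NA, 0.311, NA),
  ftpercent  = c(0.595, 0.543, NA, 0.727)
)

# Count the missing values in each column
na_counts <- colSums(is.na(toy))
na_counts

# Show only the columns that actually contain NAs
na_counts[na_counts > 0]
```

Running colSums(is.na(nbapstats)) on the real data should list the same columns that summary() flags with an NA's row.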

Since we have discovered that there are NAs in the data set, we will remove every row that contains an NA so that we are working with a cleaner data set. We also see that the 3-point percentage is stored in decimal form, so we create a separate variable called threepoint_percentage that multiplies x3ppercent by 100 to convert the decimal to a percentage. Once that is done, we can filter the data by player name to place the 10 selected players in a new data frame and find the highest 3-point percentage. Based on the number of 3-point attempts for each player, we can hypothesize that Stephen Curry will have the highest 3-point percentage, since he has the most 3-point attempts of the ten (16.5 per 100 possessions, the unit used on the source page).

threepoint_in_percentage <- nbapstats %>%
  mutate(threepoint_percentage = x3ppercent * 100)
nbastats <- na.omit(threepoint_in_percentage)
playersten <- nbastats %>%
  filter(player %in% c("Kyrie Irving", "Damian Lillard", "Kevin Durant", "Jayson Tatum", "Paul George", "Stephen Curry", "Anthony Edwards", "Donovan Mitchell", "Ja Morant", "Karl-Anthony Towns"))
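One thing to keep in mind is that na.omit() drops a row when any column is NA, including columns the plot never uses (ftpercent or ortg, for example). The sketch below shows an alternative that only requires x3ppercent to be present, using tidyr's drop_na(), and then sorts by x3pa to support the hypothesis step. The toy tibble and its player names and values are invented for illustration; this is an optional variation, not the approach used in this project.

```r
library(dplyr)
library(tidyr)

# Toy rows standing in for nbapstats (made-up values, hypothetical players)
toy <- tibble(
  player     = c("Player A", "Player B", "Player C", "Player D"),
  x3pa       = c(16.5, 11.3, 0.0, 4.5),
  x3ppercent = c(0.380, 0.382, NA, 0.359),
  ortg       = c(115, NA, 126, 105)
)

# na.omit() drops a row if ANY column is NA, so Player B is lost
# even though the 3-point percentage is present
nrow(na.omit(toy))

# drop_na(x3ppercent) removes only the rows missing the column we plot
cleaned <- toy %>%
  drop_na(x3ppercent) %>%
  mutate(threepoint_percentage = x3ppercent * 100) %>%
  arrange(desc(x3pa))  # most attempts first, to help form the hypothesis

cleaned
```

In this project the two approaches give the same ten players, because none of the selected players had an NA in any column.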

Data Visualization

The horizontal bar graph below shows the 3-point percentage of each selected NBA player in the data frame. Each bar is color coded to match a jersey color of the team the player plays for. Looking at this graph, we see that Kyrie Irving had the best 3-point percentage of the 10 players at 41.8%, while Damian Lillard had the lowest at 32.4%.

ggplot(playersten, aes(x = player, y = threepoint_percentage, fill = player)) +
  geom_bar(stat = "identity") +
  labs(
    title = "NBA Players' 3 Point Percentages",
    x = "NBA Player",
    y = "3 Point Percentage",
    caption = "Data Source: https://www.basketball-reference.com//leagues/NBA_2022_per_poss.html"
  ) +
  scale_fill_manual(
    values = c("purple", "red", "brown", "cyan", "green", "purple", "black", "black", "blue", "yellow"),  
    name = "NBA Players"  
  ) +
  coord_flip() +
  theme_minimal() 
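In the plot code above, scale_fill_manual() receives an unnamed vector, so the colors are matched to players by position (alphabetical order of player for a character column). A hedged alternative is a named vector, which ties each color to a player explicitly and keeps working if the player list changes. The sketch below uses three of the selected players with the same colors they get in the plot above; the percentages in the toy data frame are made up.

```r
library(ggplot2)

# Toy data: three of the selected players with made-up percentages
toy <- data.frame(
  player = c("Stephen Curry", "Jayson Tatum", "Damian Lillard"),
  threepoint_percentage = c(38, 35, 32)
)

# Named vector: each color is tied to a player by name, not by position
player_colors <- c(
  "Damian Lillard" = "red",
  "Jayson Tatum"   = "green",
  "Stephen Curry"  = "yellow"
)

p <- ggplot(toy, aes(x = player, y = threepoint_percentage, fill = player)) +
  geom_bar(stat = "identity") +
  scale_fill_manual(values = player_colors, name = "NBA Players") +
  coord_flip() +
  theme_minimal()

p
```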

Conclusion

Although this data set contained NAs, luckily none of the players I selected had any, so removing all rows with NAs left my player selection unaffected. The visualization shows the 3-point percentage of each player, but in my opinion it is a bit misleading. Although Kyrie Irving had the highest 3-point percentage that season, I still consider Stephen Curry the greatest shooter of all time, regardless of his lower percentage for that season. For someone who lacks knowledge about NBA basketball, this graph would also suggest that Damian Lillard is the worst shooter, which is far from the truth. Stephen Curry and Damian Lillard are two of the top three players in the data set by 3-point attempts, which means they also have more opportunities to miss.

I enjoyed color coding the players by team color so that there would be variety in the graph, and I made it horizontal so that it was easier to read. One adjustment I wanted to make was a player substitution. CJ McCollum is one of my favorite players and I wanted to add him to the graph, but I was unable to because he showed up in it more than once. He was duplicated because he was traded in the middle of the season to the New Orleans Pelicans, so the data set lists his stats separately for his old and new teams. I tried to use his combined season numbers in my plot instead, but I had some trouble doing so. Nonetheless, I enjoyed creating this project.
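For the traded-player problem, the head() output shows the team code "TOT" for Nickeil Alexander-Walker, which Basketball Reference uses for a traded player's combined season line. Assuming CJ McCollum has a "TOT" row as well (not checked here), one possible approach is to keep that row for any player who appears more than once. The sketch below uses invented players, team codes, and values.

```r
library(dplyr)

# Toy rows (made-up values) mimicking a traded player with one row per
# team plus a combined "TOT" row, and a player who stayed on one team
toy <- tibble(
  player     = c("Traded Player", "Traded Player", "Traded Player", "Other Player"),
  tm         = c("TOT", "AAA", "BBB", "CCC"),
  x3ppercent = c(0.388, 0.384, 0.394, 0.380)
)

# Keep the "TOT" row when a player has several rows, otherwise the only row
one_row_per_player <- toy %>%
  group_by(player) %>%
  filter(n() == 1 | tm == "TOT") %>%
  ungroup()

one_row_per_player
```

Applying the same group_by() and filter() step to the cleaned data before selecting players would leave one row per player, so a traded player would appear only once in the bar graph.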