NBA Information

Author

Declan Le Warn

NBA Player Data Analysis

I have decided to use data scraped from Basketball Reference (https://www.basketball-reference.com/) to take a closer look at players' impact during their stints with a team and to see how they perform at each stop in their career.

library(tidyverse)  # The tidyverse collection of packages
library(httr)       # Useful for web authentication
library(rvest)      # Useful tools for working with HTML and XML
library(magrittr)   # Piping output easily with loops
stints <- 
  read_csv("https://myxavier-my.sharepoint.com/:x:/g/personal/lewarnd_xavier_edu/ERMrFqGgYVdPsIrQHjV3Oe0BbtGDLrjAH2rYZynDhidOIw?download=1")
New names:
• `` -> `...1`
Rows: 12857 Columns: 21
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (1): Player
dbl (20): ...1, From, To, Yrs, G, MP, FG, FGA, 3P, 3PA, FT, FTA, ORB, TRB, A...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
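The unnamed first column (`...1`) appears to be a row index saved alongside the data. As a minimal sketch, assuming a small hypothetical CSV written to a temporary file in place of the real export, the column-spec message can be silenced with `show_col_types = FALSE` (the option the message itself suggests) and the index column given a descriptive name. The name `row_id` is hypothetical and not part of the original data.

```r
library(readr)
library(dplyr)

# Hypothetical two-row stand-in for the Basketball Reference export.
# write.csv() includes row names by default, so the header starts with an
# empty column name, which readr repairs to `...1` just as in the real file.
toy <- data.frame(Player = c("Player A", "Player B"), Yrs = c(3, 1), G = c(210, 5))
path <- tempfile(fileext = ".csv")
write.csv(toy, path)

stints_toy <- read_csv(path, show_col_types = FALSE) %>%
  rename(row_id = `...1`)   # give the index column a descriptive (hypothetical) name

stints_toy
```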

Analysis

I originally intended to look at trends on a team-by-team level, but that analysis is still to come. Instead, I will look at how player stats change with the length of time spent with a team.

Question 1: What players have scored over 10,000 points and recorded over 10,000 total rebounds with a single team?

I will answer this by filtering down to stints with more than 10,000 points and more than 10,000 rebounds. These are very high totals for one player to reach with one team, and I want to see whether anyone on the list is not in the Hall of Fame (I will check this manually).

stints %>%
  filter(PTS > 10000, TRB > 10000)  # keep stints above 10,000 in both points and rebounds
# A tibble: 17 × 21
    ...1 Player       From    To   Yrs     G    MP    FG   FGA  `3P` `3PA`    FT
   <dbl> <chr>       <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1   711 Bob Pettit   1955  1965    11   792 30690  7349 16872    NA    NA  6182
 2   981 Dave Cowens  1971  1980    10   726 28551  5608 12193     1    12  1975
 3  1223 Robert Par…  1981  1994    14  1106 34977  7483 13558     0     5  3279
 4  1273 Bill Russe…  1957  1969    13   963 40726  5687 12930    NA    NA  3148
 5  3378 Dirk Nowit…  1999  2019    21  1522 51368 11169 23734  1982  5210  7240
 6  4560 Wilt Chamb…  1960  1965     6   429 20231  7216 14270    NA    NA  3351
 7  4973 Nate Thurm…  1964  1974    11   757 30735  5029 11836    NA    NA  3133
 8  5334 Hakeem Ola…  1985  2001    17  1177 42844 10555 20573    25   122  5376
 9  6338 Kareem Abd…  1976  1989    14  1093 37492  9935 17520     1    18  4305
10  6356 Elgin Bayl…  1959  1972    14   846 33863  8693 20171    NA    NA  5763
11  7960 Kevin Garn…  1996  2016    14   970 36189  7647 15560   164   569  3743
12  8550 Patrick Ew…  1986  2000    15  1039 37586  9260 18224    19   122  5126
13 10065 Dolph Scha…  1950  1964    15   996 29800  5863 15427    NA    NA  6712
14 11667 Tim Duncan   1998  2016    19  1392 47368 10285 20334    30   168  5896
15 11897 David Robi…  1990  2003    14   987 34271  7365 14221    25   100  6035
16 12221 Karl Malone  1986  2003    18  1434 53479 13335 25810    85   309  9619
17 12804 Wes Unseld   1969  1981    13   984 35832  4369  8586     3     6  1883
# ℹ 9 more variables: FTA <dbl>, ORB <dbl>, TRB <dbl>, AST <dbl>, STL <dbl>,
#   BLK <dbl>, TOV <dbl>, PF <dbl>, PTS <dbl>

This returns the list of players that have reached the 10k mark in both points and rebounds for one team. Only 17 players have done this, all of whom are in the Hall of Fame.
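Inside `filter()`, separate conditions are combined with AND, so a stint is kept only if it clears both thresholds. A minimal sketch on three hypothetical stints (invented numbers, not real player data) shows a high-rebound, low-scoring stint being dropped, with `select()` keeping the player names readable instead of truncated in the tibble print:

```r
library(dplyr)
library(tibble)

# Three hypothetical stints with invented totals
toy_stints <- tibble(
  Player = c("Scorer and Rebounder", "Pure Rebounder", "Pure Scorer"),
  PTS    = c(15000, 6000, 20000),
  TRB    = c(11000, 12000, 3000)
)

both_10k <- toy_stints %>%
  filter(PTS > 10000, TRB > 10000) %>%   # comma-separated conditions are ANDed
  select(Player, PTS, TRB)               # keep only the columns of interest

both_10k
```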

Question 2: What is the distribution of time spent with a team?

Here I want to find the most common tenure length, measured in years spent with a team, and see how much more often a stint lasts a single season than any longer duration.

I will do this with a bar chart of Yrs, where each bar counts the stints of that length.

stints %>%
  ggplot(aes(x = Yrs)) +
  geom_bar(fill = "purple") +   # geom_bar() counts the rows (stints) at each Yrs value
  labs(x = "Years with team", y = "Number of stints")

This graph shows that a single season is by far the most common stint length, and the longer the duration, the rarer it is.
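`geom_bar()` draws one bar per value of `Yrs`, with height equal to the number of rows (stints) at that value. The same counts can be read off as a table with `count()`. A minimal sketch on hypothetical tenure lengths, also adding each length's share of all stints:

```r
library(dplyr)
library(tibble)

# Hypothetical tenure lengths, in years, for eight stints
toy_stints <- tibble(Yrs = c(1, 1, 1, 1, 2, 2, 3, 5))

tenure_counts <- toy_stints %>%
  count(Yrs, name = "stints") %>%          # rows per tenure length, as geom_bar() tallies them
  mutate(share = stints / sum(stints))     # fraction of all stints

tenure_counts
```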

Question 3: Which tenure length accounts for the most fouls?

Short stints are far more common, but a player on a long stint has more time to accumulate fouls. Which stint duration accounts for the most personal fouls in total?

I will again group by Yrs, sum the personal fouls (PF) in each group, and draw a column chart with Yrs on the x-axis and total fouls on the y-axis.

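stints %>%
  group_by(Yrs) %>%
  summarise(total_PF = sum(PF, na.rm = TRUE)) %>%   # total personal fouls per tenure length
  ggplot(aes(x = Yrs, y = total_PF)) +
  geom_col() +
  labs(x = "Years with team", y = "Total personal fouls")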

This shows that although single-season stints are the most common (Question 2), the most fouls have been committed by players in two-year stints. This raises further questions, and it also leads me to believe that many players with one-year stints played sparingly and did not record many stats.
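A column's total is the number of stints at that tenure length multiplied by the average fouls per stint, so a less common duration can still post the largest total if its stints average enough fouls. A minimal sketch on invented numbers (not the real stints data) that reports both pieces alongside the total:

```r
library(dplyr)
library(tibble)

# Hypothetical stints: several one-year stints with few fouls,
# fewer two-year stints with many more
toy_stints <- tibble(
  Yrs = c(1, 1, 1, 1, 2, 2),
  PF  = c(10, 20, 15, 5, 120, 150)
)

fouls_by_tenure <- toy_stints %>%
  group_by(Yrs) %>%
  summarise(
    stints   = n(),                       # how many stints have this tenure length
    mean_PF  = mean(PF, na.rm = TRUE),    # average fouls per stint
    total_PF = sum(PF, na.rm = TRUE)      # the quantity plotted in the column chart
  )

fouls_by_tenure
```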

Question 4: How many players played only one game with a team?

Because I suspect many stints involved very little playing time, I will filter to stints of exactly one game and count them.

stints %>%
  filter(G == 1) %>%
  summarize(sum_G = sum(G))   # every remaining row has G == 1, so this sum is the row count
# A tibble: 1 × 1
  sum_G
  <dbl>
1   273

This shows 273 instances of a player appearing in only a single game for a team (about 2% of the 12,857 stints), which lends some support to my idea that some of the “one year” stints did not contribute much.
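Because a logical `TRUE` counts as 1, the one-game stints can also be counted without a separate filter step via `sum(G == 1)`, and `mean(G == 1)` gives their share of all stints. A minimal sketch on hypothetical game counts:

```r
library(dplyr)
library(tibble)

# Hypothetical games played in five stints
toy_stints <- tibble(G = c(1, 82, 1, 40, 3))

one_game <- toy_stints %>%
  summarise(
    one_game_stints = sum(G == 1),   # TRUE counts as 1, so this counts one-game stints
    share           = mean(G == 1)   # fraction of all stints that lasted one game
  )

one_game
```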

Question 5: What is the relationship between FG and FGA (field goals made and field goals attempted)?

This is more of a general basketball question, but I want to see whether it is really true that the more shots you take, the more you make.

I will check this with a scatter plot of FG against FGA.

stints %>%
  ggplot(aes(x = FGA, y = FG)) +
  geom_point()

It really is true: stints with more shot attempts also had more made shots, though part of this is automatic, since every made shot is also an attempt. I will no longer be scared to let it rip when I am out on the court.
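The scatter plot speaks to volume. Whether shooting more also means making a larger share of shots depends on the field-goal percentage, FG / FGA. A minimal sketch on invented totals (not the real stints data) that computes it and checks that makes never exceed attempts:

```r
library(dplyr)
library(tibble)

# Hypothetical field-goal totals for three stints
toy_stints <- tibble(
  FG  = c(50, 400, 3000),
  FGA = c(100, 900, 6500)
)

shooting <- toy_stints %>%
  filter(FGA > 0) %>%          # avoid dividing by zero attempts
  mutate(FG_pct = FG / FGA)    # share of attempts that went in

stopifnot(all(shooting$FG <= shooting$FGA))   # makes can never exceed attempts
shooting
```

On the real data, the same `mutate()` applied to `stints` would let FG_pct be plotted against FGA instead of raw makes.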