I have decided to use data scraped from Basketball Reference (https://www.basketball-reference.com/) to examine more closely each player's impact during a stint with a team and to see how players perform at each stop in their career.
library(tidyverse) # The tidyverse collection of packages
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.4.4 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.0
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(httr) # Useful for web authentication
library(rvest) # Useful tools for working with HTML and XML
Attaching package: 'rvest'
The following object is masked from 'package:readr':
guess_encoding
library(magrittr) # Piping output easily with loops
Attaching package: 'magrittr'
The following object is masked from 'package:purrr':
set_names
The following object is masked from 'package:tidyr':
extract
New names:
• `` -> `...1`
Rows: 12857 Columns: 21
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (1): Player
dbl (20): ...1, From, To, Yrs, G, MP, FG, FGA, 3P, 3PA, FT, FTA, ORB, TRB, A...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Analysis
I originally intended to look at trends on a team-by-team level, but that is unfortunately still to come. Instead, I will look at how player stats change with the length of time spent with a team.
Question 1: What players have scored over 10,000 points and recorded over 10,000 total rebounds with a single team?
I will answer this by filtering down to players who have over 10,000 of both points and rebounds. These are high totals for a player to reach with one team, so I also want to see whether anyone on this list is not in the Hall of Fame (I will check manually).
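A minimal sketch of this kind of filter, run on a small made-up table rather than the real stints data. The points column name `PTS`, the table name `stints_demo`, and every row in it are assumptions for illustration; only `Player` and `TRB` appear in the column listing above.

```r
library(dplyr)

# Hypothetical toy rows standing in for the real stints table.
# `PTS` is an assumed column name for total points.
stints_demo <- data.frame(
  Player = c("Player A", "Player B", "Player C", "Player D"),
  PTS    = c(25000, 12000, 9000, 15000),
  TRB    = c(14000, 8000, 11000, 10500)
)

# Keep only stints with more than 10,000 points AND more than 10,000 rebounds
double_10k <- stints_demo %>%
  filter(PTS > 10000, TRB > 10000) %>%
  arrange(desc(PTS)) %>%
  select(Player, PTS, TRB)

double_10k
```

Passing both conditions to a single `filter()` call combines them with a logical AND, so a stint must clear both thresholds to stay in the result.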
This returns the list of players that have reached the 10k mark in both points and rebounds for one team. Only 17 players have done this, all of whom are in the Hall of Fame.
Question 2: What is the distribution of length of time spent with team?
Here I want to find the most common tenure length, in years spent with a team, and to see how many more players spend a year or less with a team than any other duration.
I will reach this goal by grouping by Yrs and making a bar chart.
stints %>%
  group_by(Yrs) %>%
  ggplot(aes(x = Yrs)) +
  geom_bar(fill = "purple") +
  labs(y = "number of players")
This graph shows that it is far more common for a player to spend a single year with a team than any other amount, and the longer the duration, the rarer it is.
Question 3: Which tenure length accounts for the most fouls committed for a team?
There are more players with short stints, but players with long tenures each commit more fouls. So which stint duration accounts for the most fouls in total?
Again I will group by Yrs, then make a column chart with Yrs on the x-axis and the total number of fouls on the y-axis.
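One way the grouping and column chart could look, sketched on made-up numbers. The fouls column name `PF`, the table name `stints_demo`, and all of its values are assumptions for illustration, not the real data.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy rows; `PF` is an assumed column name for personal fouls.
stints_demo <- data.frame(
  Yrs = c(1, 1, 1, 2, 2, 3),
  PF  = c(10, 25, 5, 150, 210, 300)
)

# Sum the fouls across all stints of the same length
fouls_by_tenure <- stints_demo %>%
  group_by(Yrs) %>%
  summarise(total_fouls = sum(PF, na.rm = TRUE), .groups = "drop")

# geom_col() draws bar heights from the y values already computed above,
# whereas geom_bar() would count rows instead
foul_plot <- ggplot(fouls_by_tenure, aes(x = Yrs, y = total_fouls)) +
  geom_col(fill = "purple") +
  labs(y = "total fouls")

foul_plot
```

Summarising before plotting keeps the totals available as a table, which makes it easy to read off the exact value for each tenure length.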
This shows that while more players have played a single year with a team (seen in question 2), the most fouls have been committed by players who spent two years with a team. This is interesting and raises more questions, but it also leads me to believe that many players with 1-year stints played sparingly and did not record many stats.
Question 4: How many players played only one game with a team?
Since I suspect that many players had a stint in which they barely played, I will filter to stints with a single game played and count them.
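A minimal sketch of that filter and count on made-up rows. The table name `stints_demo` and its values are assumptions for illustration; `G` (games played) is taken from the column listing above.

```r
library(dplyr)

# Hypothetical toy rows standing in for the real stints table
stints_demo <- data.frame(
  Player = c("Player A", "Player B", "Player C", "Player D"),
  G      = c(1, 82, 1, 15)
)

# Keep stints with exactly one game played, then count the remaining rows
one_game_stints <- stints_demo %>%
  filter(G == 1) %>%
  summarise(n_stints = n())

one_game_stints
```

Each row is one player-team stint, so a player who appeared in a single game for two different teams would be counted twice.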
This shows that there are 273 occurrences of a player playing only a single game for a team, lending support to my idea that some of the "one year" stints did not contribute much.
Question 5: What is the relationship between FG and FGA?
This is more of a general basketball question, but I want to see if it's really true that the more shots you take, the more you make.
I will see this by looking at how FG and FGA are related on a scatter plot.
stints %>%
  ggplot(aes(x = FGA, y = FG)) +
  geom_point()
It really is true: more shots attempted corresponds with more shots made. I will no longer be scared to let it rip when I am out on the court.