── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(magrittr)
Warning: package 'magrittr' was built under R version 4.5.2
Attaching package: 'magrittr'
The following object is masked from 'package:purrr':
set_names
The following object is masked from 'package:tidyr':
extract
Rows: 1205 Columns: 11
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): Team, Opp
dbl (8): Season_End_Year, PTS, TRB, AST, TOV, FG_percent, ThreeP_percent, H...
date (1): Date
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 199 Columns: 10
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): Team, Opp
dbl (7): PTS, TRB, AST, TOV, FG_percent, ThreeP_percent, Harden_Score
date (1): Date
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Introduction
# For this project, I wanted to figure out what James Harden's best statistical season has
# been so far in his NBA career and compare his playoff stats to his regular season stats.
# Instead of just looking at points, rebounds, and assists, I created a new variable called a
# Harden Score, which adds up those three categories and then subtracts turnovers. This
# allows me to see which season was truly his best. I wanted to do this because I am a fan
# of the Cleveland Cavaliers, and Harden is currently playing for them (and causing them
# to lose playoff games). I am interested in seeing if he is actually worse in the playoffs,
# or if I am just scapegoating him due to his reputation as a bad playoff player.
# This means my research question is: What is James Harden's best statistical season based
# on his Harden Score, and how does it compare to his playoff performances?
# The data comes from Harden's Basketball Reference game log page and his playoff game log
# page. It includes his points, assists, rebounds, turnovers, team, opposing team,
# field goal percentage, three-point percentage, year of the season, and the date each game
# was played.
# Here are links to my datasets:
# regular season: https://www.basketball-reference.com/players/h/hardeja01.html
# playoffs: https://www.basketball-reference.com/players/h/hardeja01/gamelog-playoffs/
# Data Dictionary:
# PTS - Points
# TRB - Rebounds
# AST - Assists
# TOV - Turnovers
# FG_percent - Field Goal Percentage
# ThreeP_percent - Three Point Percentage
# Season_End_Year - Year the NBA Season Ended
# Date - Date of the Game
# Team - Harden's Team
# Opp - Opposing Team
# Harden_Score - Harden Score (PTS + AST + TRB - TOV)
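To make the Harden Score formula from the data dictionary concrete, here is a minimal sketch of how the column could be computed. The two box-score lines in `toy_games` are made up for illustration and are not rows from the dataset; the report's own code for creating `Harden_Score` is not shown, so this is only one plausible way to do it.

```r
library(dplyr)

# Hypothetical box-score lines, made up for illustration (not from the dataset)
toy_games <- data.frame(
  PTS = c(30, 12),
  TRB = c(5, 3),
  AST = c(10, 4),
  TOV = c(4, 6)
)

# Harden Score as defined in the data dictionary: PTS + AST + TRB - TOV
toy_games <- toy_games %>%
  mutate(Harden_Score = PTS + AST + TRB - TOV)

toy_games
```

Because turnovers are subtracted, a high-turnover game can end up with a Harden Score well below its raw points, rebounds, and assists total.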
# Data Cleaning: This takes the game log and changes it to show average stats for each
# season
harden_seasons <- harden_games %>%
  filter(str_detect(Date, "[0-9]{4}-[0-9]{2}-[0-9]{2}")) %>%
  group_by(Season_End_Year) %>%
  summarise(
    Games = n(),
    Avg_PTS = mean(PTS, na.rm = TRUE),
    Avg_TRB = mean(TRB, na.rm = TRUE),
    Avg_AST = mean(AST, na.rm = TRUE),
    Avg_TOV = mean(TOV, na.rm = TRUE),
    Avg_FG_percent = mean(FG_percent, na.rm = TRUE),
    Avg_ThreeP_percent = mean(ThreeP_percent, na.rm = TRUE),
    Avg_Harden_Score = mean(Harden_Score, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(Avg_Harden_Score))

harden_seasons

# These are some summary statistics for James Harden's averages by season across his regular
# season and playoff games.

# Here are Harden's top 5 regular seasons by Harden score:
harden_seasons %>%
  slice_max(Avg_Harden_Score, n = 5) %>%
  kable()
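| Season_End_Year | Games | Avg_PTS | Avg_TRB | Avg_AST | Avg_TOV | Avg_FG_percent | Avg_ThreeP_percent | Avg_Harden_Score |
|----------------:|------:|---------:|---------:|----------:|---------:|---------------:|-------------------:|-----------------:|
| 2019 | 82 | 36.12821 | 6.641026 | 7.512821 | 4.961538 | 0.4471667 | 0.3727821 | 45.32051 |
| 2020 | 72 | 34.33824 | 6.558823 | 7.529412 | 4.529412 | 0.4442059 | 0.3558235 | 43.89706 |
| 2017 | 82 | 29.08642 | 8.135803 | 11.197531 | 5.728395 | 0.4432716 | 0.3390988 | 42.69136 |
| 2018 | 82 | 30.43056 | 5.402778 | 8.750000 | 4.375000 | 0.4476389 | 0.3653750 | 40.20833 |
| 2021 | 68 | 24.61364 | 7.909091 | 10.795454 | 4.022727 | 0.4599773 | 0.3383409 | 39.29545 |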
# Visualization 1: Average Harden score across all of his regular seasons:
ggplot(harden_seasons, aes(x = Season_End_Year, y = Avg_Harden_Score)) +
  geom_col() +
  labs(
    title = "James Harden's Average Harden Score by Regular Season",
    x = "Season Ending Year",
    y = "Average Harden Score"
  )
# Harden's Harden Score peaked from 2017 to 2020, with all four of these seasons having a
# score greater than 40 (2018 was the lowest of the four at about 40.2).
# Visualization 2 - This shows Harden's average points per game by season of his career
ggplot(harden_seasons, aes(x = Season_End_Year, y = Avg_PTS)) +
  geom_line() +
  geom_point() +
  labs(
    title = "James Harden's Average Points by Season",
    x = "Season Ending Year",
    y = "Average Points"
  )

# Harden's points per game were at their highest in 2019 and 2020, meaning that he had to
# have averaged more assists and rebounds than usual in 2017 to have it be one of his
# higher Harden Scores in his career.

# Visualization 3 - This shows Harden's average assists per game by season of his career
ggplot(harden_seasons, aes(x = Season_End_Year, y = Avg_AST)) +
  geom_line() +
  geom_point() +
  labs(
    title = "James Harden's Average Assists by Season",
    x = "Season Ending Year",
    y = "Average Assists"
  )

# Harden's assists peaked in 2017, which is why his Harden Score for that year is so
# high despite averaging fewer points.
# Visualization 4 - This shows Harden's average assists and turnovers per game, side by
# side, for each regular season.
assist_turnover_summary <- harden_seasons %>%
  select(Season_End_Year, Avg_AST, Avg_TOV) %>%
  pivot_longer(
    cols = c(Avg_AST, Avg_TOV),
    names_to = "Stat",
    values_to = "Average"
  )

ggplot(
  assist_turnover_summary,
  aes(x = factor(Season_End_Year), y = Average, fill = Stat)
) +
  geom_col(position = "dodge") +
  labs(
    title = "James Harden Average Assists vs. Turnovers by Season",
    x = "Season Ending Year",
    y = "Average Per Game",
    fill = "Statistic"
  ) +
  scale_fill_discrete(labels = c("Assists", "Turnovers")) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# Harden's turnover numbers are correlated with his assist numbers, which makes sense. His
# best passing season is also his worst turnover season. However, it appears that as he
# ages, he is turning the ball over less.
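Since the chart shows assists and turnovers as separate bars, an explicit assist-to-turnover ratio can make the comparison easier to read. The sketch below is one way to compute it, using only the five seasons that appear in the top-5 table earlier in this report; the names `top_seasons`, `ratio_table`, and `AST_TOV_Ratio` are introduced here for illustration. With only five seasons, it says nothing about the rest of his career.

```r
library(dplyr)

# Per-game averages copied from the top-5 table earlier in this report
top_seasons <- data.frame(
  Season_End_Year = c(2019, 2020, 2017, 2018, 2021),
  Avg_AST = c(7.512821, 7.529412, 11.197531, 8.750000, 10.795454),
  Avg_TOV = c(4.961538, 4.529412, 5.728395, 4.375000, 4.022727)
)

# Assists per turnover for each season, sorted from best to worst ratio
ratio_table <- top_seasons %>%
  mutate(AST_TOV_Ratio = Avg_AST / Avg_TOV) %>%
  arrange(desc(AST_TOV_Ratio))

ratio_table
```

The same `mutate()` call could be applied to the full `harden_seasons` table to cover every season.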
# Visualization 5 - This shows the relationship between points scored and Harden Score
# for each regular season game.
ggplot(harden_games, aes(x = PTS, y = Harden_Score)) +
  geom_point(alpha = 0.4) +
  labs(
    title = "Relationship Between Points and Harden Score",
    x = "Points",
    y = "Harden Score"
  )
Warning: Removed 118 rows containing missing values or values outside the scale range
(`geom_point()`).
# This visualization helps identify games in which Harden had more turnovers than usual
# or fewer assists/rebounds than usual for the number of points he scored.
# Combining regular season and playoff stats for further comparison.
regular_clean <- harden_games %>%
  filter(!is.na(PTS), !is.na(TRB), !is.na(AST), !is.na(TOV)) %>%
  mutate(Game_Type = "Regular Season")

playoff_clean <- harden_playoff_games %>%
  filter(!is.na(PTS), !is.na(TRB), !is.na(AST), !is.na(TOV)) %>%
  mutate(Game_Type = "Playoffs")

combined_games <- bind_rows(regular_clean, playoff_clean)
# Visualization 6 - Average Harden Score by season (playoffs vs. regular season)
regular_summary <- harden_seasons %>%
  mutate(Game_Type = "Regular Season") %>%
  select(Season_End_Year, Game_Type, Avg_Harden_Score)

playoff_summary <- harden_playoff_seasons %>%
  mutate(Game_Type = "Playoffs") %>%
  select(Season_End_Year, Game_Type, Avg_Harden_Score)

combined_season_summary <- bind_rows(regular_summary, playoff_summary)

ggplot(
  combined_season_summary,
  aes(
    x = interaction(Season_End_Year, Game_Type),
    y = Avg_Harden_Score,
    fill = Game_Type
  )
) +
  geom_col() +
  labs(
    title = "James Harden Average Harden Score by Season and Game Type",
    x = "Season and Game Type",
    y = "Average Harden Score",
    fill = "Game Type"
  ) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
# This shows that Harden statistically plays worse in the playoffs than in the regular season.
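The code that builds `harden_playoff_seasons` does not appear in this report, so the sketch below only assumes it mirrors the regular-season summary: group the playoff game log by season and average the Harden Score. It then joins the two summaries to get a per-season playoff minus regular-season difference. All values and the placeholder years 2001 and 2002 are made up for illustration, and every `toy_` name and `season_gap` are hypothetical.

```r
library(dplyr)

# Hypothetical playoff game log: made-up values and placeholder years
toy_playoff_games <- data.frame(
  Season_End_Year = c(2001, 2001, 2002, 2002),
  Harden_Score = c(38, 34, 40, 36)
)

# Assumed construction: same grouping and averaging as the regular-season summary
toy_playoff_seasons <- toy_playoff_games %>%
  group_by(Season_End_Year) %>%
  summarise(
    Avg_Harden_Score = mean(Harden_Score, na.rm = TRUE),
    .groups = "drop"
  )

# Hypothetical regular-season averages for the same placeholder years
toy_regular_seasons <- data.frame(
  Season_End_Year = c(2001, 2002),
  Avg_Harden_Score = c(40, 45)
)

# Join by season; a negative Difference means the playoff average was lower
season_gap <- toy_regular_seasons %>%
  inner_join(
    toy_playoff_seasons,
    by = "Season_End_Year",
    suffix = c("_Regular", "_Playoffs")
  ) %>%
  mutate(Difference = Avg_Harden_Score_Playoffs - Avg_Harden_Score_Regular)

season_gap
```

A table like `season_gap` states the size of the drop for each season directly, which is harder to read off the stacked bar labels in the chart.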
# Conclusion: By looking at these visualizations, it can be concluded that Harden's
# best statistical season was in 2019. This makes sense, as this was also the year he
# averaged the most points in his career. His worst seasons were early in his career in
# 2010 and 2011. This also makes sense because these were the seasons where he was
# establishing himself in the NBA, so he wasn't playing as much. When it comes to the
# playoffs, Harden's stats do in fact drop in every postseason compared to its
# corresponding regular season. So all in all, Harden does statistically play worse in the
# playoffs than he does in the regular season.