Warning: package 'readr' was built under R version 4.5.2
library(ggplot2)
library(tidyverse)
Warning: package 'lubridate' was built under R version 4.5.2
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ stringr 1.5.1
✔ forcats 1.0.0 ✔ tibble 3.3.0
✔ lubridate 1.9.5 ✔ tidyr 1.3.1
✔ purrr 1.1.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(rvest)
Attaching package: 'rvest'
The following object is masked from 'package:readr':
guess_encoding
Rows: 1205 Columns: 11
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): Team, Opp
dbl (8): Season_End_Year, PTS, TRB, AST, TOV, FG_percent, ThreeP_percent, H...
date (1): Date
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# For this assignment, I wanted to figure out what James Harden's best statistical season has
# been so far in his NBA career. Instead of just looking at points, rebounds, and assists,
# I created a new variable called a Harden score which adds up those three categories and
# then subtracts turnovers. This allows me to see which season was truly his best. I wanted
# to do this because I am a fan of the Cleveland Cavaliers, and Harden is currently playing
# for them. (and causing them to lose playoff games) :(
# This means my research question is: What is James Harden's best statistical season based
# on his Harden score?
# This data comes from Harden's basketball reference game log page.
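The Harden score described above can be sketched as a single mutate() step. The code that actually created Harden_Score is not shown in this document, so this is an assumed reconstruction of the formula, and toy_games holds made-up numbers purely for illustration.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data: three made-up games, not real box scores
toy_games <- tibble(
  PTS = c(30, 25, 40),
  TRB = c(5, 7, 6),
  AST = c(10, 8, 12),
  TOV = c(4, 6, 3)
)

# Assumed formula: points + rebounds + assists - turnovers
toy_scored <- toy_games %>%
  mutate(Harden_Score = PTS + TRB + AST - TOV)

toy_scored
```

Subtracting turnovers means a sloppy high-scoring game can rate below a cleaner, lower-scoring one.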
# Data wrangling
harden_seasons <- harden_games %>%
  filter(str_detect(Date, "[0-9]{4}-[0-9]{2}-[0-9]{2}")) %>%
  # this filter keeps only rows with a real game date, so the rows that show his season
  # totals do not get mixed into the per game averages
  group_by(Season_End_Year) %>%
  summarise(
    Games = n(),
    Avg_PTS = mean(PTS, na.rm = TRUE),
    Avg_TRB = mean(TRB, na.rm = TRUE),
    Avg_AST = mean(AST, na.rm = TRUE),
    Avg_TOV = mean(TOV, na.rm = TRUE),
    Avg_FG_percent = mean(FG_percent, na.rm = TRUE),
    Avg_ThreeP_percent = mean(ThreeP_percent, na.rm = TRUE),
    Avg_Harden_Score = mean(Harden_Score, na.rm = TRUE)
  ) %>%
  arrange(desc(Avg_Harden_Score))
# This sorts the seasons in descending order based on the average Harden score
# from each season
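To see why the date filter removes non-game rows, here is a minimal sketch on made-up data. The toy_log tibble and its "Season Total" row are hypothetical, and Date is stored as text here for illustration, whereas the real file was read with Date as a date column.

```r
library(dplyr)
library(stringr)
library(tibble)

# Hypothetical toy log: two game rows, one summary row, one row with no date
toy_log <- tibble(
  Date = c("2019-01-03", "2019-01-05", "Season Total", NA),
  PTS  = c(44, 38, 82, NA)
)

# str_detect() returns FALSE for the summary row and NA for the missing date;
# filter() drops both, leaving only rows that look like YYYY-MM-DD
game_rows <- toy_log %>%
  filter(str_detect(Date, "[0-9]{4}-[0-9]{2}-[0-9]{2}"))

# Average over real games only
avg_pts <- mean(game_rows$PTS, na.rm = TRUE)
```

Without the filter, the summary row would be averaged in as if it were a game.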
# visualization 1 - this shows Harden's average Harden score per game by season of his career
ggplot(harden_seasons, aes(x = Season_End_Year, y = Avg_Harden_Score)) +
  geom_col() +
  labs(
    title = "James Harden's Average Statistical Score by Season",
    x = "Season Ending Year",
    y = "Average Harden Score"
  )
# Harden's Harden score peaked around 2017 - 2020, with 3 out of these 4 seasons having a
# score greater than 40.
# visualization 2 - This shows Harden's average points per game by season of his career
ggplot(harden_seasons, aes(x = Season_End_Year, y = Avg_PTS)) +
  geom_line() +
  geom_point() +
  labs(
    title = "James Harden's Average Points by Season",
    x = "Season Ending Year",
    y = "Average Points"
  )
# Harden's points per game were at their highest in 2019 and 2020, so he must have
# averaged more assists and rebounds than usual in 2017 for that season to be one of
# the higher Harden scores of his career.
# visualization 3 - This shows Harden's average assists per game by season of his career
ggplot(harden_seasons, aes(x = Season_End_Year, y = Avg_AST)) +
  geom_line() +
  geom_point() +
  labs(
    title = "James Harden's Average Assists by Season",
    x = "Season Ending Year",
    y = "Average Assists"
  )
# Harden's assists peaked in 2017, which is why his Harden score for that year is so
# high despite averaging fewer points.
# visualization 4 - assists vs turnovers
ggplot(harden_games, aes(x = TOV, y = AST)) +
  geom_point(alpha = 0.4) +
  labs(
    title = "James Harden's Assists Compared to Turnovers",
    x = "Turnovers",
    y = "Assists"
  )
Warning: Removed 118 rows containing missing values or values outside the scale range
(`geom_point()`).
# This graph shows Harden's assists vs turnovers, with one point for each game in
# harden_games (not one point per season).
# visualization 5 - distribution of Harden's single-game Harden scores within each season
ggplot(harden_games, aes(x = factor(Season_End_Year), y = Harden_Score)) +
  geom_boxplot() +
  labs(
    title = "Distribution of James Harden's Game Scores by Season",
    x = "Season Ending Year",
    y = "Harden Score"
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Warning: Removed 118 rows containing non-finite outside the scale range
(`stat_boxplot()`).
# Conclusion: By looking at these visualizations, it can be concluded that Harden's
# best statistical season was in 2019. This makes sense, as this was also the year he
# averaged the most points in his career. His worst seasons were early in his career in
# 2010 and 2011. This also makes sense because these were the seasons where he was
# establishing himself in the NBA, so he wasn't playing as much.
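The best season can also be read straight from the summary table instead of from the plots. The sketch below uses dplyr's slice_max() on a hypothetical toy_seasons tibble with made-up years and scores; running the same call on harden_seasons should return the top season by average Harden score.

```r
library(dplyr)
library(tibble)

# Hypothetical toy summary: made-up seasons and scores, not Harden's real numbers
toy_seasons <- tibble(
  Season_End_Year  = c(2001, 2002, 2003),
  Avg_Harden_Score = c(10, 30, 20)
)

# Keep the single row with the highest average score
best_season <- toy_seasons %>%
  slice_max(Avg_Harden_Score, n = 1)

best_season
```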