For our project, the topic we will be exploring is shooting in the National Basketball Association (NBA). The NBA has undergone significant changes over the past few decades, with shooting trends being among the most notable and drastic of changes. The goal of our project is to examine and answer the question of how shooting in the NBA has evolved over the years. We will examine how three point shooting has evolved, how its evolution has affected the mid range shot, who has contributed most to the evolution (Stephen Curry), and how it has affected other players. (LeBron James)
Prior studies have found that shooting has in fact evolved in the modern NBA. The NBA has transitioned from a mid-range and post-heavy style of play to an era in which three-point shooting has become much the go-to for offenses. Research has also discovered that shooting accuracy from all distances have seen slight increase as well with the exception of three point range. The increased number of three-point shots as well as the increased accuracy of shooting in the NBA can be attributed to many different factors. Increased skill, rule changes, the “Curry Effect”, and much more have played key roles in shaping and transforming the modern NBA.
The research that we conduct for our project is important for numerous reasons. For one, the research will be very telling about how the NBA has changed over the years. Fans like us, or anyone interested in the NBA, can gain a deeper understanding of the NBA and how the game has transformed as well as where the game may be headed in the future.
For our analysis, the data we used is a broad NBA dataset that can be found from github.com with this link: https://github.com/DomSamangy/NBA_Shots_04_24. This dataset contains tons of relevant and valuable data from all NBA games from the 2004 to 2024 seasons. The dataset we used for our analysis contains information on players, games, shooting types and results, shot locations/zones, etc. For our analysis, we used the following packages for data wrangling and visualizing: tidyverse, ggplot2, dplyr, ggpubr,
To conduct our analysis, we used many different variables from the dataset. The variables that we utilized were: season (SEASON_1), player names(PLAYER_NAME), shot types (SHOT_TYPE), shot results (SHOT_MADE). All of these variables played key roles in helping us conduct our analysis.
To address our research questions with the data we acquired, we mainly aimed towards creating data visualizations as we felt as though they were the most informative. Bar graphs, line charts, and shot charts are the main visualizations that we utilized for our analysis. We also conducted a correlation analysis between 3 point and 2 point attempts and seasons to determine the correlation between the variables.
#Packages used
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(dplyr)
library(ggpubr)
#Data preparation
nbadata <- read.csv("E:/BDATA PROJECT/NBA_2004_2024_Shots.csv" , header=TRUE)
#Data visualizations and findings
#Shot attempts with seasons grouped into bins of 4/5.
nbadata <- nbadata %>%
mutate(Decade_Group = cut(SEASON_1,
breaks = seq(2004, 2024, by = 5),
labels = c("2004-2008", "2009-2013", "2014-2018", "2019-2024"),
include.lowest = TRUE))
# With this code, I am aggregating total shot attempts per period
shot_counts_decade_group <- nbadata %>%
group_by(Decade_Group, SHOT_TYPE) %>%
summarise(Total_Attempts = n(), .groups = "drop")
#Finished product, this visualization is a bar graph that shows 2 point and 3 point shot attempts of each decade grouping. As we can see, 2 point attmepts are falling and 3 point attempts are rising
ggplot(shot_counts_decade_group, aes(x = Decade_Group, y = Total_Attempts, fill = SHOT_TYPE)) +
geom_bar(stat = "identity", position = "dodge") +
labs(title = "3PT vs 2PT Attempts (2004-2024)", x = "Season Group", y = "Total Attempts", fill = "Shot Type") +
theme_minimal() +
theme(plot.title = element_text(size = 16, face = "bold"))
shot_counts_by_years <- nbadata %>%
group_by(SEASON_1, SHOT_TYPE) %>%
summarise(Shot_Count = n(), .groups = 'drop') %>%
group_by(SEASON_1) %>%
mutate(Percentage = Shot_Count / sum(Shot_Count) * 100)
#Next, we created a new visual. This visual is a trend line chart that compares 2 point field goal attempts to 3 point field goal attempts
#from 2004 to 2024. We can see that 2 point attempts are declining while 3 point attempts are rising
ggplot(shot_counts_by_years, aes(x = SEASON_1, y = Shot_Count, color = SHOT_TYPE, group = SHOT_TYPE)) +
geom_line(linewidth = 1) +
geom_point() +
labs(title = "2PT vs 3PT Field Goal Attempts From 2004 to 2024",
x = "Season",
y = "Attempts",
color = "Shot Type") +
theme_minimal() +
theme(legend.position = "top")
# To begin with our correlation analysis, we first created a 3 point trend summary
three_point_trend <- nbadata %>%
filter(SHOT_TYPE == "3PT Field Goal") %>%
group_by(SEASON_1) %>%
summarise(
total_3PA = n())
#Here, we conducted a correlation test: 3 point attempts over time, 95% confidence interval
cor.test(three_point_trend$SEASON_1, three_point_trend$total_3PA, method = "pearson")
##
## Pearson's product-moment correlation
##
## data: three_point_trend$SEASON_1 and three_point_trend$total_3PA
## t = 13.414, df = 19, p-value = 3.863e-11
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.8810994 0.9802768
## sample estimates:
## cor
## 0.9510469
#Linear correlation graph. r=0.95. We can see that there is a strong positive correlation between 3-point attempts and year progression
#2012 season dip is attributed to the lockdown where the season was shortened from 82 games to 66 games. 2020-2021 season dip is attributed to COVID which also shortened the season.
#Below is the code for the correlation graph
ggplot(three_point_trend, aes(x = SEASON_1, y = total_3PA)) +
geom_point(color = "red", size = 2) +
geom_smooth(method = "lm", color = "blue", se = FALSE) +
stat_cor(method = "pearson", label.x = 2002, label.y = max(three_point_trend$total_3PA) * 0.9) +
labs(title = "Positive Linear Correlation: 3-Point Attempts Over Seasons",
x = "Season",
y = "Total 3-Point Attempts") +
theme_minimal() +
theme(plot.title = element_text(hjust = 1, face = "bold"))
## `geom_smooth()` using formula = 'y ~ x'
#Here, the same is conducted for 2 point attempts over time.
# 2 point trend summary
two_point_trend <- nbadata %>%
filter(SHOT_TYPE == "2PT Field Goal") %>%
group_by(SEASON_1) %>%
summarise(
total_2PA = n())
#correlation test between 2 point attempts and seasons
cor.test(two_point_trend$SEASON_1, two_point_trend$total_2PA, method = "pearson")
##
## Pearson's product-moment correlation
##
## data: two_point_trend$SEASON_1 and two_point_trend$total_2PA
## t = -5.1005, df = 19, p-value = 6.355e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.8973966 -0.4890019
## sample estimates:
## cor
## -0.7602114
#Here, the code for the correlation graph. We can see there is a moderately negative linear correlation between 2 point attempts and seasons. Again there is slight skew in the data that can be attributed to the lockdown where the season was shortened from 82 games to 66 games. 2020-2021 season dip is attributed to COVID.
#Below is the code for the correlation graph
ggplot(two_point_trend, aes(x = SEASON_1, y = total_2PA)) +
geom_point(color = "red", size = 2) +
geom_smooth(method = "lm", color = "blue", se = FALSE) +
stat_cor(method = "pearson", label.x = 2002, label.y = max(two_point_trend$total_2PA) * 0.9) +
labs(title = "Negative Linear Correlation: 2-Point Attempts Over Seasons",
x = "Season",
y = "Total 2-Point Attempts") +
theme_minimal() +
theme(plot.title = element_text(hjust = 1, face = "bold"))
## `geom_smooth()` using formula = 'y ~ x'
#From the analysis conducted above, we can clearly see that three point attempts are rising while two point attempts are falling.
#First mutate the SHOT_MADE column and add a DECADE column
nbadata <- nbadata %>%
mutate(SHOT_MADE = case_when(
SHOT_MADE == TRUE ~ "Made",
SHOT_MADE == FALSE ~ "Missed"
))
nbadata <- nbadata %>%
mutate(YEAR = SEASON_1)
year_summary <- nbadata %>%
filter(SHOT_TYPE == "3PT Field Goal") %>%
group_by(YEAR) %>%
summarise(total_3PA = n())
# split year by around 5 year chinks for better data distribution
nbadata <- nbadata %>%
mutate(YEAR = SEASON_1,
YEAR_split = case_when(
YEAR >= 2004 & YEAR <= 2008 ~ "2004-2008",
YEAR >= 2009 & YEAR <= 2013 ~ "2009-2013",
YEAR >= 2014 & YEAR <= 2018 ~ "2014-2018",
YEAR >= 2019 & YEAR <= 2024 ~ "2019-2024",
TRUE ~ "Other"
))
# Assign era based on Curry's influence
trend_comparison <- year_summary %>%
mutate(YEAR_split = case_when(
YEAR >= 2004 & YEAR <= 2008 ~ "2004-2008",
YEAR >= 2009 & YEAR <= 2013 ~ "2009-2013",
YEAR >= 2014 & YEAR <= 2018 ~ "2014-2018",
YEAR >= 2019 & YEAR <= 2024 ~ "2019-2024",
TRUE ~ "Other"
),era = ifelse(YEAR < 2009, "Before Curry", "After Curry"))
# find the average total 3PT attempts per era
era_summary <- trend_comparison %>%
group_by(YEAR_split, era) %>%
summarise(avg_3PA = mean(total_3PA, na.rm = TRUE), .groups = "drop")
# plots the total NBA t3 point attempts filterd by the era before and after curry
ggplot(trend_comparison, aes(x = YEAR_split, y = total_3PA, fill = era)) +
geom_bar(stat = "identity") +
scale_fill_manual(values = c("Before Curry" = "darkred", "After Curry" = "steelblue")) +
scale_y_continuous(labels = scales::comma) +
labs(title = "Total NBA three-points Attempts Before and After Stephen Curry",
x = "Year",
y = "Total 3 point Attempts") +
theme_minimal()
## LeBron James
# filter data for LeBron James by year year and shot type.
lebron_shots <- nbadata %>%
filter(PLAYER_NAME == "LeBron James") %>%
group_by(YEAR, SHOT_TYPE) %>%
summarise(total_attempts = n())
## `summarise()` has grouped output by 'YEAR'. You can override using the
## `.groups` argument.
# Mutate LeBron's shot data with era and event type (Made or Missed)
lebron_shots_era <- nbadata %>%
filter(PLAYER_NAME == "LeBron James") %>%
mutate(YEAR_split,
era = ifelse(YEAR < 2009, "Before Curry", "After Curry"),
EVENT_TYPE = case_when(
SHOT_MADE == "Made" ~ "Made Shot",
SHOT_MADE == "Missed" ~ "Missed Shot")) %>%
group_by(era, EVENT_TYPE, SHOT_TYPE) %>%
summarise(total_attempts = n(), .groups = "drop")
# plot lebron's Shot attempts Before and After Curry
ggplot(lebron_shots_era, aes(x = interaction(SHOT_TYPE,factor(era, levels = c("Before Curry", "After Curry"))),
y = total_attempts, fill = EVENT_TYPE)) +
geom_bar(position = "dodge",stat = "identity") +
scale_fill_manual(values = c("Made Shot" = "steelblue", "Missed Shot" = "darkred")) +
labs(title = "lebron james Shot attempts Before and After Curry",
x = "Shot Type",
y = "Attempts per Season",
fill = "Era") +
theme_minimal()
# github tutorial :
# https://github.com/DomSamangy/R_Tutorials/blob/main/1_Shot_Chart_Tutorial.Rmd
# youtube tutorial :https://www.youtube.com/watch?v=U_RTbdvXonA&t=93s
# github page : https://github.com/DomSamangy
# dataset source : https://github.com/DomSamangy/NBA_Shots_04_24
# creator of clott plotting fucntion : https://x.com/owenlhjphillips
The process used to create the shot chart first loads the necessary libraries (tidyverse, dplyr, ggplot2, and ggpubr) for data manipulation and plotting. It then goes on to the Court Plotting Function, which creates a visualization of a basketball court that includes the hoop, backboard, baseline, 3 point line, the key, and free throw circle/line. Using the same previous dataset. It then filters based on the selected player, and selected date of game. Using ggplot2, the filtered shot data is plotted onto the court, where made shots are represented in green and missed shots in red, using variables such as LOC_X, LOC_Y, SHOT_MADE.
# github tutorial :
# https://github.com/DomSamangy/R_Tutorials/blob/main/1_Shot_Chart_Tutorial.Rmd
# youtube tutorial :https://www.youtube.com/watch?v=U_RTbdvXonA&t=93s
# github page : https://github.com/DomSamangy
# dataset source : https://github.com/DomSamangy/NBA_Shots_04_24
# creator of clott plotting fucntion : https://x.com/owenlhjphillips
# If you do not have the packages loaded,
# Remove the hashtag and run the line to install.
#install.packages("tidyverse")
library(tidyverse)
#install.packages("dplyr")
library(dplyr)
#install.packages("ggplot2")
library(ggplot2)
# Creating Court and Plotting
circle_points = function(center = c(0, 0), radius = 1, npoints = 360) {
angles = seq(0, 2 * pi, length.out = npoints)
return(data_frame(x = center[1] + radius * cos(angles),
y = center[2] + radius * sin(angles)))
}
width = 50
height = 94 / 2
key_height = 19
inner_key_width = 12
outer_key_width = 16
backboard_width = 6
backboard_offset = 4
neck_length = 0.5
hoop_radius = 0.75
hoop_center_y = backboard_offset + neck_length + hoop_radius
three_point_radius = 23.75
three_point_side_radius = 22
three_point_side_height = 14
# List of court themes to use!
court_themes = list(
BDATA200 = list(
court = '#f9c852',
lines = 'black',
text = '#222222',
made = '#00bfc4',
missed = '#f8766d',
hex_border_size = 1,
hex_border_color = "#000000"
),
white = list(
court = 'white',
lines = 'black',
text = 'black',
made = '#00bfc4',
missed = '#f8766d',
hex_border_size = 0,
hex_border_color = "gray15"
)
)
plot_court = function(court_theme = court_themes$BDATA200, use_short_three = FALSE) {
if (use_short_three) {
three_point_radius = 22
three_point_side_height = 0
}
court_points = data_frame(
x = c(width / 2, width / 2, -width / 2, -width / 2, width / 2),
y = c(height, 0, 0, height, height),
desc = "perimeter"
)
court_points = bind_rows(court_points , data_frame(
x = c(outer_key_width / 2, outer_key_width / 2, -outer_key_width / 2, -outer_key_width / 2),
y = c(0, key_height, key_height, 0),
desc = "outer_key"
))
court_points = bind_rows(court_points , data_frame(
x = c(-backboard_width / 2, backboard_width / 2),
y = c(backboard_offset, backboard_offset),
desc = "backboard"
))
court_points = bind_rows(court_points , data_frame(
x = c(0, 0), y = c(backboard_offset, backboard_offset + neck_length), desc = "neck"
))
foul_circle = circle_points(center = c(0, key_height), radius = inner_key_width / 2)
foul_circle_top = filter(foul_circle, y > key_height) %>%
mutate(desc = "foul_circle_top")
foul_circle_bottom = filter(foul_circle, y < key_height) %>%
mutate(
angle = atan((y - key_height) / x) * 180 / pi,
angle_group = floor((angle - 5.625) / 11.25),
desc = paste0("foul_circle_bottom_", angle_group)
) %>%
filter(angle_group %% 2 == 0) %>%
select(x, y, desc)
hoop = circle_points(center = c(0, hoop_center_y), radius = hoop_radius) %>%
mutate(desc = "hoop")
restricted = circle_points(center = c(0, hoop_center_y), radius = 4) %>%
filter(y >= hoop_center_y) %>%
mutate(desc = "restricted")
three_point_circle = circle_points(center = c(0, hoop_center_y), radius = three_point_radius) %>%
filter(y >= three_point_side_height, y >= hoop_center_y)
three_point_line = data_frame(
x = c(three_point_side_radius, three_point_side_radius, three_point_circle$x, -three_point_side_radius, -three_point_side_radius),
y = c(0, three_point_side_height, three_point_circle$y, three_point_side_height, 0),
desc = "three_point_line"
)
court_points = bind_rows(
court_points,
foul_circle_top,
foul_circle_bottom,
hoop,
restricted,
three_point_line
)
court_points <- court_points
ggplot() +
geom_path(
data = court_points,
aes(x = x, y = y, group = desc),
color = court_theme$lines, sizes = 2
) +
coord_fixed(ylim = c(0, 45), xlim = c(-25, 25)) +
theme_minimal(base_size = 22) +
theme(
text = element_text(color = court_theme$text),
plot.background = element_rect(fill = 'gray15', color = 'gray15'),
panel.background = element_rect(fill = court_theme$court, color = court_theme$court),
panel.grid = element_blank(),
panel.border = element_blank(),
axis.text = element_blank(),
axis.title = element_blank(),
axis.ticks = element_blank(),
legend.background = element_rect(fill = court_theme$court, color = court_theme$court),
legend.margin = margin(-1, 0, 0, 0, unit = "lines"),
legend.position = "bottom",
legend.key = element_blank(),
legend.text = element_text(size = rel(1.0))
)
}
# Load the Dataset
shots_data <- read_csv("E:/BDATA PROJECT/NBA_2004_2024_Shots.csv")
## Rows: 4231262 Columns: 26
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (15): SEASON_2, TEAM_NAME, PLAYER_NAME, POSITION_GROUP, POSITION, GAME_D...
## dbl (10): SEASON_1, TEAM_ID, PLAYER_ID, GAME_ID, LOC_X, LOC_Y, SHOT_DISTANCE...
## lgl (1): SHOT_MADE
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Filter for Single Game Shots by Date & Player Name
# Showcase lebron 2004 old progression to three pointer this year game
#MAKE SURE TO CHANGE MANUALLY FOR PLAYER AND GAME
shots <- shots_data %>%
filter(GAME_DATE == "02-03-2024") %>%
filter(PLAYER_NAME == "Stephen Curry" )
# Create Plot
plot_court(court_themes$white, use_short_three = F) +
# plot shot "points" with x & y locations
geom_point(data = shots, aes(x = LOC_X, y = LOC_Y, color = SHOT_MADE, fill = SHOT_MADE),
size =3, shape = 21, stroke = .5) +
#fill the points with color
scale_color_manual(values = c("green4","red3"), aesthetics = "color", breaks = c("TRUE", "FALSE"), labels=c("Made", "Missed")) +
scale_fill_manual(values = c("green2","red2"), aesthetics = "fill", breaks = c("TRUE", "FALSE"), labels=c("Made", "Missed")) +
scale_x_continuous(limits = c(-27.5, 27.5)) +
scale_y_continuous(limits = c(0, 45)) +
# Add title and subtitle (manual!)
labs(
title = "Stephen Curry",
subtitle = "2024-02-03 GSW vs. ATL"
) +
#
# Theme options for manipulating the look of the plot
theme(
legend.direction = "horizontal",
legend.title = element_blank(),
legend.text = element_text(hjust = .5, size = 7, face = "bold", colour = "black"),
legend.background = element_rect(fill = court_themes$BDATA200$court, color = court_themes$BDATA200$court),
legend.box.background = element_rect(fill = court_themes$BDATA200$court, color = court_themes$BDATA200$court),
legend.key = element_rect(fill = court_themes$BDATA200$court, color = court_themes$BDATA200$court),
legend.margin = margin(t = -.5, unit='cm'),
legend.box.margin=margin(-15,0,15,0),
plot.background = element_rect(fill = court_themes$BDATA200$court, color = court_themes$BDATA200$court),
panel.background = element_rect(fill = court_themes$BDATA200$court, color = court_themes$BDATA200$court),
plot.title = element_text(hjust = 0.5, size = 22, vjust = -9, face = "bold", colour = "black"),
plot.subtitle = element_text(hjust = 0.5, size = 12, vjust = -15, face = "bold", colour = "black"),
plot.margin = margin(0, 0, .5, 0, "cm"),
)
## Warning: `data_frame()` was deprecated in tibble 1.1.0.
## ℹ Please use `tibble()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning in geom_path(data = court_points, aes(x = x, y = y, group = desc), :
## Ignoring unknown parameters: `sizes`
## Warning: Removed 3 rows containing missing values or values outside the scale range
## (`geom_path()`).
#Save Plot
ggsave("Player_Shot_Chart.png", height = 6, width = 6, dpi = "retina")
## Warning: Removed 3 rows containing missing values or values outside the scale range
## (`geom_path()`).
The heatmap uses a different dataset from the dataset used in the shotchart, and uses some new packages (ncaahoopR,). First, it uses essentially the same Court Plotting Function in the previous shot chart, which was originally created by Owen Phillips. Then, it filters out the selected player, seasons and season_types. A heatmap visualization is then generated using geom_density_2d_filled(), Using variables such as locationX, locationY, then applying a color gradient from the RColorBrewer package to highlight shooting density.
# Github tutorial :
# https://github.com/DomSamangy/R_Tutorials/blob/main/3_Heatmap_Tutorial.Rmd
# Youtube tutorial : https://www.youtube.com/watch?v=rBBVSmFJqyE&t=590s
# Github page : https://github.com/DomSamangy
# Dataset source : utilzies the NBA API and ncaahoopR package
# creator of clott plotting function : https://x.com/owenlhjphillips
# If you do not have the packages loaded,
# Remove the hashtag (#) and run the line to install.
#install.packages("ggplot2")
library(ggplot2)
#install.packages("tidyverse")
library(tidyverse)
#install.packages("nbastatR")
library(nbastatR)
## Warning: replacing previous import 'curl::handle_reset' by 'httr::handle_reset'
## when loading 'nbastatR'
## Warning: replacing previous import 'httr::timeout' by 'memoise::timeout' when
## loading 'nbastatR'
## Warning: replacing previous import 'magrittr::set_names' by 'purrr::set_names'
## when loading 'nbastatR'
## Warning: replacing previous import 'jsonlite::flatten' by 'purrr::flatten' when
## loading 'nbastatR'
## Warning: replacing previous import 'curl::parse_date' by 'readr::parse_date'
## when loading 'nbastatR'
## Warning: replacing previous import 'purrr::%@%' by 'rlang::%@%' when loading
## 'nbastatR'
## Warning: replacing previous import 'purrr::flatten_lgl' by 'rlang::flatten_lgl'
## when loading 'nbastatR'
## Warning: replacing previous import 'purrr::splice' by 'rlang::splice' when
## loading 'nbastatR'
## Warning: replacing previous import 'purrr::flatten_chr' by 'rlang::flatten_chr'
## when loading 'nbastatR'
## Warning: replacing previous import 'purrr::flatten_raw' by 'rlang::flatten_raw'
## when loading 'nbastatR'
## Warning: replacing previous import 'purrr::flatten' by 'rlang::flatten' when
## loading 'nbastatR'
## Warning: replacing previous import 'jsonlite::unbox' by 'rlang::unbox' when
## loading 'nbastatR'
## Warning: replacing previous import 'purrr::flatten_dbl' by 'rlang::flatten_dbl'
## when loading 'nbastatR'
## Warning: replacing previous import 'purrr::invoke' by 'rlang::invoke' when
## loading 'nbastatR'
## Warning: replacing previous import 'purrr::flatten_int' by 'rlang::flatten_int'
## when loading 'nbastatR'
## Warning: replacing previous import 'readr::guess_encoding' by
## 'rvest::guess_encoding' when loading 'nbastatR'
## Warning: replacing previous import 'magrittr::extract' by 'tidyr::extract' when
## loading 'nbastatR'
## Warning: replacing previous import 'rlang::as_list' by 'xml2::as_list' when
## loading 'nbastatR'
#install.packages("devtools")
library(devtools)
## Loading required package: usethis
#devtools::install_github("lbenz730/ncaahoopR")
#library(ncaahoopR)
#had to add cowplot package because of draw_image() method
#https://github.com/DomSamangy/R_Tutorials/issues/1
#install.packages("cowplot")
library(cowplot)
##
## Attaching package: 'cowplot'
## The following object is masked from 'package:ggpubr':
##
## get_legend
## The following object is masked from 'package:lubridate':
##
## stamp
# Creating court and plotting
circle_points = function(center = c(0, 0), radius = 1, npoints = 360) {
angles = seq(0, 2 * pi, length.out = npoints)
return(data_frame(x = center[1] + radius * cos(angles),
y = center[2] + radius * sin(angles)))
}
# Court Dimensions & lines
width = 50
height = 94 / 2
key_height = 19
inner_key_width = 12
outer_key_width = 16
backboard_width = 6
backboard_offset = 4
neck_length = 0.5
hoop_radius = 0.75
hoop_center_y = backboard_offset + neck_length + hoop_radius
three_point_radius = 23.75
three_point_side_radius = 22
three_point_side_height = 14
# Court themes
court_themes = list(
light = list(
court = 'floralwhite',
lines = '#999999',
text = '#222222',
made = '#00bfc4',
missed = '#f8766d',
hex_border_size = 1,
hex_border_color = "#000000"
),
dark = list(
court = '#000004',
lines = '#999999',
text = '#f0f0f0',
made = '#00bfc4',
missed = '#f8766d',
hex_border_size = 0,
hex_border_color = "#000000"
),
ppt = list(
court = 'gray15',
lines = 'white',
text = '#f0f0f0',
made = '#00bfc4',
missed = '#f8766d',
hex_border_size = 0,
hex_border_color = "gray15"
)
)
# Function to create court based on given dimensions
plot_court = function(court_theme = court_themes$light, use_short_three = FALSE) {
if (use_short_three) {
three_point_radius = 22
three_point_side_height = 0
}
court_points = data_frame(
x = c(width / 2, width / 2, -width / 2, -width / 2, width / 2),
y = c(height, 0, 0, height, height),
desc = "perimeter"
)
court_points = bind_rows(court_points , data_frame(
x = c(outer_key_width / 2, outer_key_width / 2, -outer_key_width / 2, -outer_key_width / 2),
y = c(0, key_height, key_height, 0),
desc = "outer_key"
))
court_points = bind_rows(court_points , data_frame(
x = c(-backboard_width / 2, backboard_width / 2),
y = c(backboard_offset, backboard_offset),
desc = "backboard"
))
court_points = bind_rows(court_points , data_frame(
x = c(0, 0), y = c(backboard_offset, backboard_offset + neck_length), desc = "neck"
))
foul_circle = circle_points(center = c(0, key_height), radius = inner_key_width / 2)
foul_circle_top = filter(foul_circle, y > key_height) %>%
mutate(desc = "foul_circle_top")
foul_circle_bottom = filter(foul_circle, y < key_height) %>%
mutate(
angle = atan((y - key_height) / x) * 180 / pi,
angle_group = floor((angle - 5.625) / 11.25),
desc = paste0("foul_circle_bottom_", angle_group)
) %>%
filter(angle_group %% 2 == 0) %>%
select(x, y, desc)
hoop = circle_points(center = c(0, hoop_center_y), radius = hoop_radius) %>%
mutate(desc = "hoop")
restricted = circle_points(center = c(0, hoop_center_y), radius = 4) %>%
filter(y >= hoop_center_y) %>%
mutate(desc = "restricted")
three_point_circle = circle_points(center = c(0, hoop_center_y), radius = three_point_radius) %>%
filter(y >= three_point_side_height, y >= hoop_center_y)
three_point_line = data_frame(
x = c(three_point_side_radius, three_point_side_radius, three_point_circle$x, -three_point_side_radius, -three_point_side_radius),
y = c(0, three_point_side_height, three_point_circle$y, three_point_side_height, 0),
desc = "three_point_line"
)
court_points = bind_rows(
court_points,
foul_circle_top,
foul_circle_bottom,
hoop,
restricted,
three_point_line
)
court_points <- court_points
# Final plot creation
ggplot() +
geom_path(
data = court_points,
aes(x = x, y = y, group = desc),
color = court_theme$lines
) +
coord_fixed(ylim = c(0, 45), xlim = c(-25, 25)) +
theme_minimal(base_size = 22) +
theme(
text = element_text(color = court_theme$text),
plot.background = element_rect(fill = 'gray15', color = 'gray15'),
panel.background = element_rect(fill = court_theme$court, color = court_theme$court),
panel.grid = element_blank(),
panel.border = element_blank(),
axis.text = element_blank(),
axis.title = element_blank(),
axis.ticks = element_blank(),
legend.background = element_rect(fill = court_theme$court, color = court_theme$court),
legend.margin = margin(-1, 0, 0, 0, unit = "lines"),
legend.position = "bottom",
legend.key = element_blank(),
legend.text = element_text(size = rel(1.0))
)
}
help(teams_shots)
## starting httpd help server ... done
# Had to add this because of errors, default buffer size of 131072 bytes
# to fix this, increase the buffer size by running this line
Sys.setenv("VROOM_CONNECTION_SIZE" = 5000000)
# Grab team names to type into teams_shots function
nba_teams() %>% filter(isNonNBATeam == 0)
## Warning: `funs()` was deprecated in dplyr 0.8.0.
## ℹ Please use a list of either functions or lambdas:
##
## # Simple named list: list(mean = mean, median = median)
##
## # Auto named with `tibble::lst()`: tibble::lst(mean, median)
##
## # Using lambdas list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
## ℹ The deprecated feature was likely used in the nbastatR package.
## Please report the issue at <https://github.com/abresler/nbastatR/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## # A tibble: 30 × 13
## nameTeam idTeam slugTeam teamName cityTeam teamNameFull idConference
## <chr> <int> <chr> <chr> <chr> <chr> <int>
## 1 Cairns Taipans 1.50e4 CNS Cairns … Cairns Taipans 0
## 2 Washington Wizar… 1.61e9 WAS wizards Washing… Wizards 1
## 3 Utah Jazz 1.61e9 UTA jazz Utah Jazz 2
## 4 Toronto Raptors 1.61e9 TOR raptors Toronto Raptors 1
## 5 San Antonio Spurs 1.61e9 SAS spurs San Ant… Spurs 2
## 6 Sacramento Kings 1.61e9 SAC kings Sacrame… Kings 2
## 7 Portland Trail B… 1.61e9 POR blazers Portland Trail Blaze… 2
## 8 Phoenix Suns 1.61e9 PHX suns Phoenix Suns 2
## 9 Philadelphia 76e… 1.61e9 PHI sixers Philade… 76ers 1
## 10 Orlando Magic 1.61e9 ORL magic Orlando Magic 1
## # ℹ 20 more rows
## # ℹ 6 more variables: idDivision <int>, isNonNBATeam <int>,
## # yearPlayedLast <int>, idLeague <int>, colorsTeam <chr>,
## # urlThumbnailTeam <chr>
#MAKE SURE TO CHANGE MANUALLY FOR PLAYER AND TEAM
# Grab team shot data
team <- teams_shots(teams = "Golden State Warriors", seasons = 2015, season_types = "Regular Season")
## Golden State Warriors 2014-15 shot data
# Filter shot data for player & clean data to fit court dimensions
player <- team %>% filter(namePlayer=="Stephen Curry") %>%
mutate(x = as.numeric(as.character(locationX)) / 10, y = as.numeric(as.character(locationY)) / 10 + hoop_center_y)
# Horizontally flip the data
player$x <- player$x * -1
palette <- paletteer::paletteer_d("RColorBrewer::YlOrRd", direction = -1)
p1 <-
# run specific selected lines if you want to see the density values,
# example :
# 6:01, https://www.youtube.com/watch?v=rBBVSmFJqyE&t=590s
plot_court(court_themes$ppt) +
geom_density_2d_filled(player, mapping = aes(x=x,y=y,fill = ..level..,),
contour_var = "ndensity", breaks = seq(0.1, 1.0, length.out = 10), alpha = .5) +
scale_fill_manual(values = c(palette), aesthetics = c("fill", "color")) +
scale_x_continuous(limits = c(-27.5, 27.5)) +
scale_y_continuous(limits = c(0, 45)) +
theme(legend.position = "none",
plot.title = element_text(hjust = .5, size = 22, face = "bold", vjust = -4),
plot.subtitle = element_text(hjust = .5, size = 10, face = "bold", vjust = -8),
legend.title = element_blank(),
legend.text = element_text(hjust = .5, size = 10, face = "bold", colour = "white"),
plot.caption = element_text(hjust = .5, size = 6, face = "bold", colour = "lightgrey", vjust = 8)) +
labs(title = "Stephen Curry Shooting Heatmap",
subtitle = "2015-16 Regular Season",
caption = "")
ggdraw(p1) + theme(plot.background = element_rect(fill="gray15", color = NA))
## Warning: The dot-dot notation (`..level..`) was deprecated in ggplot2 3.4.0.
## ℹ Please use `after_stat(level)` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning: Removed 8 rows containing non-finite outside the scale range
## (`stat_density2d_filled()`).
## Warning: Removed 3 rows containing missing values or values outside the scale range
## (`geom_path()`).
#ggsave("CP3_Heatmap.png", height = 6, width = 6, dpi = 300)
Upon completion of our analysis, our findings ultimately reinforced what prior studies have found. Shooting has in fact seen an evolution in the NBA, where three point attempts have risen and as a result two point attempts have fallen. To further answer our research question, it appears that mid range/ two point shots have not been totally abandoned as players and teams are still shooting them at high rates, just not as high as previous years. As for if they will be abandoned? Only time will tell. For our analysis, we looked into Stephen Curry as the main catalyst for the evolution of the three point shot. Stephen Curry has also influenced other players in the NBA, like LeBron James, as many players have now adopted three point shooting to thrive in the modern NBA. In terms of implications for stakeholders, our findings have many different implications. For one, fans of the NBA can gain a deeper understanding of the game of basketball and see how it has changed over the years as well as why it has changed. Along with this, basketball players at all levels are now being influenced by the evolution of NBA shooting. More players are becoming three point specialists and revolving their game around three point shooting which ultimately reduces the dependence on genetics such as height and vertical.