Watching the 2017-18 NBA Semi-Finals between the Houston Rockets and Golden State Warriors, I was surprised by how many 3-pt shots were attempted. It reminded me of the strategy I used for NBA Jam on Super Nintendo: grab a team with two good 3-pt shooters and chuck the ball at the baseline. I figured I’d do some exploratory analysis on 3-point shots as a percentage of total Field Goal shots.
Packages I used for the exploratory analysis are dplyr, tidyr, ggplot2, tidyverse, scales, RColorBrewer, directlabels, and gridExtra.
I pulled all of the data from https://www.basketball-reference.com/, scraping the player totals for seasons beginning 1987 through 2017. Rows with a team name of “TOT” were removed: these hold the combined totals for players who were traded between teams during a season, so keeping them alongside the per-team rows would double count those players’ stats. The data was then combined into one data set, which is where this analysis begins.
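The clean-up step described above is not shown in this post. A minimal sketch of it, using two small made-up tables in place of the scraped season files, might look like the following. The `seasons` list and its values are hypothetical; `Tm` and `X3PA` are the team and 3-pt attempt columns as they appear in the raw data read in below.

```r
library(dplyr)

# Hypothetical stand-ins for two scraped season tables (values are made up).
seasons <- list(
  data.frame(Year = 2016, Player = c("A", "A", "A", "B"),
             Tm = c("TOT", "HOU", "BOS", "GSW"),
             X3PA = c(100, 60, 40, 250)),
  data.frame(Year = 2017, Player = c("A", "B"),
             Tm = c("BOS", "GSW"),
             X3PA = c(120, 300))
)

# Stack the seasons, then drop the "TOT" rows so a traded player's
# attempts are counted once (per team) instead of twice.
combined <- bind_rows(seasons) %>%
  filter(Tm != "TOT")

# 3-pt attempts per season after the clean-up
combined %>%
  group_by(Year) %>%
  summarise(X3PA = sum(X3PA), .groups = "drop")
```

In this toy example, player A's 100 "TOT" attempts are dropped and only the 60 + 40 recorded for the individual teams remain.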
I pulled the data set in, changed up the headers and then grouped it by year to get a total look before looking by team.
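library(tidyverse)     # attaches dplyr, tidyr, and ggplot2
library(scales)        # percent labels on the axes
library(directlabels)  # geom_dl() line labels
# Read the combined player totals and give the columns readable names
raw_data <- as_tibble(read.csv("Player Total 87-17.csv"))
tidy_data <- raw_data %>%
  select(Year, Player, Position = Pos, Team = Tm, Made_3P = X3P, Attempt_3P = X3PA,
         Made_2P = X2P, Attempt_2P = X2PA, Made_FT = FT, Attempt_FT = FTA, Off_Rebound = ORB,
         Def_Rebound = DRB, Assists = AST, Steals = STL, Blocks = BLK, Turnovers = TOV, Fouls = PF)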
Once the data was in a nicer format, I grouped it by year and created the columns needed for the analysis. I also created the same summary grouped by year and team to get a more granular look.
year_data <- tidy_data %>%
group_by(Year) %>%
summarize(Attempt_2P = sum(Attempt_2P, na.rm = TRUE),
Made_2P = sum(Made_2P, na.rm = TRUE),
Attempt_3P = sum(Attempt_3P, na.rm = TRUE),
Made_3P = sum(Made_3P, na.rm = TRUE),
Attempt_FT = sum(Attempt_FT, na.rm = TRUE),
Made_FT = sum(Made_FT, na.rm = TRUE)) %>%
ungroup() %>%
mutate(Percent_Attempt_3P = Attempt_3P / (Attempt_3P + Attempt_2P),
Percent_Made_3P = Made_3P / (Made_3P + Made_2P),
Accuracy_2P = Made_2P / Attempt_2P,
Accuracy_3P = Made_3P / Attempt_3P,
Accuracy_FG = (Made_2P+Made_3P)/(Attempt_2P+Attempt_3P),
Accuracy_FT = Made_FT / Attempt_FT)
team_data <- tidy_data %>%
group_by(Year, Team) %>%
summarize(Attempt_2P = sum(Attempt_2P, na.rm = TRUE),
Made_2P = sum(Made_2P, na.rm = TRUE),
Attempt_3P = sum(Attempt_3P, na.rm = TRUE),
Made_3P = sum(Made_3P, na.rm = TRUE),
Attempt_FT = sum(Attempt_FT, na.rm = TRUE),
Made_FT = sum(Made_FT, na.rm = TRUE)) %>%
ungroup() %>%
mutate(Percent_Attempt_3P = Attempt_3P / (Attempt_3P + Attempt_2P),
Percent_Made_3P = Made_3P / (Made_3P + Made_2P),
Accuracy_2P = Made_2P / Attempt_2P,
Accuracy_3P = Made_3P / Attempt_3P,
Accuracy_FG = (Made_2P+Made_3P)/(Attempt_2P+Attempt_3P),
Accuracy_FT = Made_FT / Attempt_FT)
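The two summaries above differ only in their grouping columns, so the shared logic could be pulled into one function. The sketch below is one way to do that, not the code used for this post: `summarise_shots` is a hypothetical helper name, `toy` is made-up data, and it assumes dplyr 1.0 or later for `across()` and `.groups`.

```r
library(dplyr)

# Hypothetical helper (not in the original code): totals the shot columns
# for whatever grouping columns are passed, then derives the same ratios.
summarise_shots <- function(data, ...) {
  data %>%
    group_by(...) %>%
    summarise(across(c(Attempt_2P, Made_2P, Attempt_3P, Made_3P,
                       Attempt_FT, Made_FT),
                     ~ sum(.x, na.rm = TRUE)),
              .groups = "drop") %>%
    mutate(Percent_Attempt_3P = Attempt_3P / (Attempt_3P + Attempt_2P),
           Percent_Made_3P = Made_3P / (Made_3P + Made_2P),
           Accuracy_2P = Made_2P / Attempt_2P,
           Accuracy_3P = Made_3P / Attempt_3P,
           Accuracy_FG = (Made_2P + Made_3P) / (Attempt_2P + Attempt_3P),
           Accuracy_FT = Made_FT / Attempt_FT)
}

# Made-up player rows standing in for tidy_data.
toy <- data.frame(
  Year = c(2016, 2016, 2017),
  Team = c("HOU", "HOU", "GSW"),
  Attempt_2P = c(10, 20, 30), Made_2P = c(5, 10, 15),
  Attempt_3P = c(10, NA, 10), Made_3P = c(4, NA, 5),
  Attempt_FT = c(4, 4, 10),   Made_FT = c(3, 3, 8)
)

by_year <- summarise_shots(toy, Year)        # same shape as year_data
by_team <- summarise_shots(toy, Year, Team)  # same shape as team_data
```

With the real data, `summarise_shots(tidy_data, Year)` and `summarise_shots(tidy_data, Year, Team)` would stand in for the two blocks above.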
My first instinct is to create a plot showing the total number of attempted 3-pt shots as a percentage of total attempted field goal shots. I also wanted to add total number of made 3-pt shots as a percentage of total made field goal shots. That way we can see if there are any obvious points where accuracy increased.
Graphing is an iterative process. What you start out with might not be what you end up with. The goal is to design a graph for your audience that is easy to understand. Even though there are flashy graphs out there, the best ones let your audience spend time thinking about the message you’re trying to send rather than working out how to read the graph. I focus on bringing attention to the areas where people naturally look and bringing only the important parts to the foreground. Below is the code used for the first chart.
threes_data <- year_data %>%
select(Year, Attempted = Percent_Attempt_3P,Made = Percent_Made_3P) %>%
gather(Type, Percent, -Year)
h <- ggplot(threes_data, aes(x = Year, y = Percent, group = Type, color = Type))
h <- h + geom_line(size = 1.2)
h <- h + xlab("Season Beginning") + ylab("Percent of Total FG Shots")
h <- h + ggtitle("History of 3-Point Shots as a Percentage of Total FG Shots",
"3-Point attempt percentage has increased seven-fold since 1987")
h <- h + scale_x_continuous(breaks = c(1987,1993,1997,2011,2017))
h <- h + expand_limits(x = 2019)
h <- h + scale_y_continuous(breaks = seq(0,.5, by = .05), labels = percent)
h <- h + scale_color_brewer(palette = "Accent")
h <- h + geom_dl(aes(label = Type), method = list(dl.trans(x = x + 0.2), "last.points", cex = 0.8))
h <- h + geom_vline(xintercept = 1993, linetype = "dashed", color = "gray") +
geom_vline(xintercept = 1997, linetype = "dashed", color = "gray")
h <- h + annotate("rect", xmin = 1993, xmax = 1997, ymin = -Inf, ymax = Inf, fill = "blue", alpha = .05, color = NA)
h <- h + geom_vline(xintercept = 2011, linetype = "dashed", color = "gray") +
geom_vline(xintercept = 2017, linetype = "dashed", color = "gray")
h <- h + annotate("rect", xmin = 2011, xmax = 2017, ymin = -Inf, ymax = Inf, fill = "red", alpha = .05, color = NA)
h <- h + theme_classic() + theme(legend.position = "none")
h <- h + theme(axis.title.x=element_text(hjust=0,vjust=0.2))
h <- h + theme(axis.title.y=element_text(hjust=1,vjust=1))
h <- h + labs(caption = "Based on data from: https://www.basketball-reference.com")
h
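The reshaping step at the top of the chart code uses `gather()`, which the tidyr documentation now lists as superseded by `pivot_longer()`. `gather()` still works, so nothing above needs to change. For anyone following along with a recent tidyr, a rough equivalent is sketched below; `year_toy` and its values are made up and only stand in for `year_data`.

```r
library(dplyr)
library(tidyr)

# Made-up yearly values standing in for year_data.
year_toy <- data.frame(
  Year = c(1987, 2017),
  Percent_Attempt_3P = c(0.05, 0.34),
  Percent_Made_3P = c(0.04, 0.27)
)

# One row per Year/Type pair, the same long layout that gather() produces.
threes_long <- year_toy %>%
  select(Year, Attempted = Percent_Attempt_3P, Made = Percent_Made_3P) %>%
  pivot_longer(-Year, names_to = "Type", values_to = "Percent")

threes_long
```

The rows come back in a different order than with `gather()` (by year rather than by type), which makes no difference to the line chart because ggplot2 groups the lines by `Type`.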
The graph highlights two major periods of interest. In the first, from the 1994-95 season through the 1996-97 season, the NBA shortened the 3-pt line from 23ft 9in to 22ft. The second, from 2011 to 2017, is the advent of predictive analytics in the NBA, which has shown that the statistically better shots are 3-pt shots at the baseline or drives that draw fouls. A separate analysis could be done to see if there has been more focus on obtaining 3-point shooters and if shot selection has changed over time. However, I do not have the data for that.
We can do a very similar graph by team to see if certain teams show specific trends. I filtered the data to the 8 teams that reached the second round of the 2017-18 playoffs just for a quick look.
team_threes <- team_data %>%
select(Year = Year, Team = Team, Attempted = Percent_Attempt_3P, Made = Percent_Made_3P) %>%
gather(Type, Percent, -Year, -Team) %>%
filter(Team %in% c("GSW","CLE","HOU","BOS","TOR","UTA","PHI","NOP"))
t <- ggplot(team_threes, aes(x = Year, y = Percent, group = Type, color = Type))
t <- t + geom_line()
t <- t + facet_wrap(~Team)
t <- t + xlab("Season Beginning") + ylab("Percent of Total FG Shots")
t <- t + ggtitle("History of 3-Point Shots as a Percentage of Total FG Shots (1987-2017)",
"Half of Houston's FG are 3-pt Attempts")
t <- t + scale_y_continuous(breaks = seq(0,.5, by = .1), labels = percent)
t <- t + scale_x_continuous(breaks = c(1990, 2005, 2017))
t <- t + scale_color_brewer(palette = "Accent")
t <- t + theme_bw() + theme(legend.position = "none")
t <- t + theme(axis.title.x=element_text(hjust=0,vjust=0.2))
t <- t + theme(axis.title.y=element_text(hjust=1,vjust=1))
t <- t + labs(caption = "Based on data from: https://www.basketball-reference.com")
t
Here, we can see Houston devotes the largest share of its field goal attempts to 3-pt shots, but most teams show the same increasing trend.
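The same comparison can be read off as a table instead of a chart. The sketch below ranks teams by their 3-pt attempt share in the most recent season; `team_toy` and its numbers are made up for illustration and only mimic the `Year`, `Team`, and `Percent_Attempt_3P` columns of `team_data`.

```r
library(dplyr)

# Made-up values standing in for team_data.
team_toy <- data.frame(
  Year = c(2016, 2016, 2017, 2017, 2017),
  Team = c("HOU", "GSW", "HOU", "GSW", "BOS"),
  Percent_Attempt_3P = c(0.45, 0.35, 0.50, 0.34, 0.36)
)

# Keep only the latest season and sort from highest to lowest share.
latest_rank <- team_toy %>%
  filter(Year == max(Year)) %>%
  arrange(desc(Percent_Attempt_3P)) %>%
  select(Team, Percent_Attempt_3P)

latest_rank
```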
Now I’m curious if accuracy has changed at all, so I made a graph showing the accuracy of 3-pt shots against 2-pt and Free Throw shots. If more 3-pt shots are taken, does the accuracy decrease? It would also be interesting to see if the number of attempts changes throughout the game. Are they heavier in the first half? Are there more or fewer attempts when down by a large margin?
accuracy_data <- year_data %>%
select(Year, `3-pt` = Accuracy_3P, `2-pt` = Accuracy_2P, FT = Accuracy_FT) %>%
gather(Type,Accuracy,-Year)
g <- ggplot(accuracy_data, aes(x = Year, y = Accuracy, group = Type))
g <- g + geom_line(size = 1.2)
g <- g + xlab("Season Beginning") + ylab("Percent Shots Made")
g <- g + ggtitle("Historical Accuracy by Shot Type", "Accuracy by shot type has been consistent over the years")
g <- g + labs(caption = "Based on data from: https://www.basketball-reference.com")
g <- g + scale_x_continuous(breaks = c(1987,2017))
g <- g + scale_y_continuous(breaks = c(0,.1,.2,.3,.4,.5,.6,.7,.8), labels = percent)
g <- g + expand_limits(y = 0, x = 2019)
g <- g + geom_dl(aes(label = Type), method = list(dl.trans(x = x + 0.2), "last.points", cex = 0.8))
g <- g + theme_classic()
g <- g + theme(axis.title.x=element_text(hjust=0,vjust=0.2))
g <- g + theme(axis.title.y=element_text(hjust=1,vjust=1))
g
I don’t see any evidence that accuracy has changed historically. There’s a slight increase around the 1994-95 rule change, and the remaining noise could be due to a few dominant players. An additional analysis would be needed to determine that. Lastly, let’s take a look at accuracy by team.
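The conclusion above comes from eyeballing the chart. One rough way to put a number on it, sketched here under the assumption that a straight-line trend is a fair summary, is the least-squares slope of accuracy against year for each shot type. This is not part of the original analysis, and `accuracy_toy` holds made-up values in the same long layout as `accuracy_data`.

```r
library(dplyr)

# Made-up accuracies in the same long layout as accuracy_data.
accuracy_toy <- data.frame(
  Year = rep(2013:2017, times = 2),
  Type = rep(c("3-pt", "FT"), each = 5),
  Accuracy = c(0.36, 0.35, 0.35, 0.36, 0.36,
               0.75, 0.76, 0.75, 0.77, 0.77)
)

# cov(x, y) / var(x) is the ordinary least-squares slope of y on x,
# i.e. the average change in accuracy per season.
trend <- accuracy_toy %>%
  group_by(Type) %>%
  summarise(slope_per_year = cov(Year, Accuracy) / var(Year),
            .groups = "drop")

trend
```

A slope close to zero for every shot type would be consistent with the flat lines in the chart, although it says nothing about what drives the year-to-year noise.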
team_accuracy <- team_data %>%
select(Year = Year, Team = Team, `3-pt` = Accuracy_3P, `2-pt` = Accuracy_2P, FT = Accuracy_FT) %>%
gather(Type,Accuracy,-Year,-Team) %>%
filter(Team %in% c("GSW","CLE","HOU","BOS","TOR","UTA","PHI","NOP"))
m <- ggplot(team_accuracy, aes(x = Year, y = Accuracy, group = Type))
m <- m + geom_line(size = 1.2)
m <- m + facet_wrap(~Team)
m <- m + xlab("Season Beginning") + ylab("Percent Shots Made")
m <- m + ggtitle("Historical Accuracy by Shot Type", "Accuracy by shot type has been consistent over the years")
m <- m + labs(caption = "Based on data from: https://www.basketball-reference.com")
m <- m + scale_x_continuous(breaks = c(1987,2017))
m <- m + scale_y_continuous(breaks = c(0,.1,.2,.3,.4,.5,.6,.7,.8), labels = percent)
m <- m + expand_limits(x = 2021)
m <- m + geom_dl(aes(label = Type), method = list(dl.trans(x = x + 0.2), "last.points", cex = 0.8))
m <- m + theme_bw()
m <- m + theme(axis.title.x=element_text(hjust=0,vjust=0.2))
m <- m + theme(axis.title.y=element_text(hjust=1,vjust=1))
m
There definitely is a trend of 3-point shots making up a larger percentage of total field goal shots. Without more granular data, it’s hard to make any statistical inferences as to why that is the case. There also aren’t any signs of 3-point accuracy increasing at a faster rate than that of other shot types. The accuracy increases shown at the team level could be due to having better players. If they were to release a new NBA Jam, it seems like the strategy would be the same as back in 1994: grab a team with good 3-point shooters and keep chucking at the baseline.