The data set, NBA Free Throws, provided by Kaggle.com, contains statistics on more than 600,000 free throw attempts from the 2006-2007 through 2015-2016 NBA seasons.
For each free throw, the data set records the final score of the game, the two teams playing, the game ID, the period in which the free throw was taken, the player who took it, whether the game was a regular-season or playoff game, the game score after the free throw, the season, whether the free throw was made or missed, and the game-clock time at which it was taken.
For this analysis and the accompanying visualizations, I used only five of these variables: the period, the player, the game type, the season, and whether the free throw was made or missed.
I first compared the number of free throws taken in regular-season games with the number taken in playoff games, to decide whether there would be any benefit to analyzing the two game types separately.
As the line plot shows, far more free throws are taken in regular-season games than in playoff games. Because playoff free throws make up only a small share of each season's total, and that share stays fairly consistent from season to season, I decided to keep both game types together and focus instead on other variables (season, made vs. missed, and player) to build a more complete picture of the data set.
# Adjust this path to the folder that contains free_throws.csv
setwd("/Users/taylorbosse/Desktop")
library(tidyverse)
library(readr)
library(dplyr)
library(lubridate)
library(scales)
library(ggplot2)
library(ggthemes)
library(RColorBrewer)
library(data.table)
filename <- "free_throws.csv"
df <- fread(filename)
# Count free throw attempts by game type and season
games <- df %>%
  select(playoffs, season) %>%
  mutate(gametype = ifelse(playoffs == "regular", "Regular",
                    ifelse(playoffs == "playoffs", "Playoffs", NA))) %>%
  group_by(gametype, season) %>%
  summarise(n = n(), .groups = "drop") %>%
  data.frame()
# One line per game type; `linewidth` sets line thickness (ggplot2 >= 3.4.0)
# and `size` sets point size
one <- ggplot(games, aes(x = season, y = n, group = gametype)) +
  geom_line(aes(color = gametype), linewidth = 3) +
  geom_point(shape = 21, size = 5, color = "black", fill = "white") +
  labs(title = "Multiple Line Plot: # of Free Throws per Season by Game Type",
       x = "Season", y = "# of Free Throws") +
  scale_color_brewer(palette = "Paired", name = "Game Type") +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5))
one
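To put a number on how small the playoff share is, the counts can be turned into a per-season proportion. Below is a minimal base-R sketch, assuming a data frame with the same `season` and `playoffs` columns as `free_throws.csv`; the rows are made-up toy values, not figures from the Kaggle data.

```r
# Toy stand-in for free_throws.csv (hypothetical values, for illustration only)
ft <- data.frame(
  season   = c(rep("2006 - 2007", 6), rep("2007 - 2008", 4)),
  playoffs = c("regular", "regular", "regular", "regular", "regular", "playoffs",
               "regular", "regular", "regular", "playoffs")
)

# Count attempts by season and game type
counts <- table(ft$season, ft$playoffs)

# Divide each row by its total so every season sums to 1
shares <- prop.table(counts, margin = 1)

# Playoff share of all free throws in each season
playoff_share <- shares[, "playoffs"]
print(round(playoff_share, 3))
```

Swapping `ft` for `df` would give the playoff share for each of the 10 real seasons, which is a more direct check of "small and consistent" than reading the gap between the two lines.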
To get a better picture of the data set as a whole, I created a histogram-style bar chart of all free throws, covering both regular-season and playoff games and both made and missed shots. I kept the bars in chronological order by season, but I was mainly interested in which season had the most free throws, so that I could take a closer look at that season's statistics.
The chart shows that the first season in the data set, 2006-2007, had the most free throw attempts.
# Season is categorical, so geom_bar() counts the rows (attempts) per season
two <- ggplot(df, aes(x = season)) +
  geom_bar(color = "darkblue", fill = "lightblue") +
  labs(title = "Histogram of Free Throws by Season", x = "Season", y = "# of Free Throws") +
  scale_y_continuous(labels = comma)
two
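The chart identifies the peak season visually; the same answer can also be pulled out directly, which helps when bars are close in height. A small base-R sketch on hypothetical toy rows (the `season` strings only mimic the `"YYYY - YYYY"` format used in the file):

```r
# Hypothetical toy data: one row per free throw attempt
ft <- data.frame(season = c("2006 - 2007", "2006 - 2007", "2006 - 2007",
                            "2007 - 2008", "2007 - 2008",
                            "2008 - 2009"))

# Attempts per season
per_season <- table(ft$season)

# Name of the season with the most attempts (the first one wins on ties)
busiest <- names(which.max(per_season))
print(busiest)
```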
Having found that 2006-2007 had the most free throw attempts, I wanted to know which players took the most free throws in that season.
Because this chart covers a single season and doesn't need to be in chronological order, I sorted the players from most to fewest attempts so they are easy to compare, and I show only the top 10. The 10 players who took the most free throws (made or missed) in the 2006-2007 season were, in descending order: LeBron James, Kobe Bryant, Gilbert Arenas, Tim Duncan, Dwight Howard, Amare Stoudemire, Vince Carter, Eddy Curry, Allen Iverson, and Corey Maggette.
# Attempts per player in the 2006-2007 season, sorted from most to fewest
season1 <- df[df$season == "2006 - 2007", ]
s1_playercount <- data.frame(count(season1, player))
s1_playercount <- s1_playercount[order(s1_playercount$n, decreasing = TRUE), ]
s1_playercount$n <- as.numeric(s1_playercount$n)
three <- ggplot(s1_playercount[1:10, ], aes(x = reorder(player, -n), y = n)) +
  geom_col(color = "black", fill = "gray76") +
  labs(title = "Number of Free Throws by Player (Top 10) 2006-2007",
       x = "Player", y = "# of Free Throws") +
  theme(plot.title = element_text(hjust = 0.5))
three
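The filter, count, and sort steps above can be wrapped into one reusable function. This is a minimal base-R sketch; `top_shooters()` is a hypothetical helper name, and the toy rows are made up.

```r
# Hypothetical helper: top-n free throw shooters (by attempts) in one season
top_shooters <- function(data, season_label, n = 10) {
  # Attempts per player within the chosen season
  attempts <- table(data$player[data$season == season_label])
  # Sort from most to fewest attempts and keep the first n
  head(sort(attempts, decreasing = TRUE), n)
}

# Toy data (made-up rows, not from the Kaggle file)
ft <- data.frame(
  season = c(rep("2006 - 2007", 6), rep("2007 - 2008", 2)),
  player = c("A", "A", "A", "B", "B", "C", "A", "C")
)

top2 <- top_shooters(ft, "2006 - 2007", n = 2)
print(top2)
```

Called as `top_shooters(df, "2006 - 2007")`, it should match the ranking in the chart. Like `[1:10, ]` in the original code, it does not flag ties at the cutoff, so players tied for 10th place are silently dropped.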
After looking at a single season, I wanted to see which players took the most free throws (made or missed) across the entire data set, which spans 10 seasons.
Even though this chart covers a longer period, I again sorted the players from most to fewest attempts and show only the top 10. From 2006-2007 through 2015-2016, the 10 players who took the most free throws were, in descending order: LeBron James, Dwight Howard, Kevin Durant, Dwyane Wade, Kobe Bryant, Carmelo Anthony, Dirk Nowitzki, James Harden, Russell Westbrook, and Chris Bosh.
# Attempts per player across all 10 seasons, sorted from most to fewest
playercount <- data.frame(count(df, player))
playercount <- playercount[order(playercount$n, decreasing = TRUE), ]
playercount$n <- as.numeric(playercount$n)
four <- ggplot(playercount[1:10, ], aes(x = reorder(player, -n), y = n)) +
  geom_col(color = "black", fill = "gray76") +
  labs(title = "Number of Free Throws by Player (Top 10) 2006-2016",
       x = "Player", y = "# of Free Throws") +
  theme(plot.title = element_text(hjust = 0.5))
four
After exploring the overall data and the player-level statistics, I turned to made and missed free throws to get a better sense of how many points come from free throws alone. The data set includes periods 1-8. An NBA game has four regulation periods (quarters), and periods 5 and above are overtime periods, which only some games reach. To keep the comparison consistent across games, I included only periods 1-4 in this analysis and in the following visualizations.
First, I looked at made vs. missed free throws across all 10 seasons. Rather than pooling the numbers, I broke them out by season to see which season had the highest share of free throws made.
The pie charts show that the made-to-missed split is fairly consistent across seasons: about ¾ of free throws are made and about ¼ are missed. The percentages on the charts show that the 2008-2009 season had the highest share of made free throws, with 76.9% made and 23.1% missed.
# Keep regulation periods only (1-4)
periods <- df[df$period <= 4, ]
# Made vs missed counts and percentages within each season
periods_df <- periods %>%
  select(season, shot_made) %>%
  mutate(ft_status = ifelse(shot_made == 1, "Made", ifelse(shot_made == 0, "Missed", NA))) %>%
  group_by(season, ft_status) %>%
  summarise(n = n(), .groups = "drop") %>%
  group_by(season) %>%
  mutate(percent_of_total = round(100 * n / sum(n), 1)) %>%
  ungroup() %>%
  data.frame()
# One pie per season: stacked bars in polar coordinates, labeled with percentages
five <- ggplot(data = periods_df, aes(x = "", y = n, fill = ft_status)) +
  geom_bar(stat = "identity", position = "fill") +
  coord_polar(theta = "y", start = 0) +
  labs(fill = "Shot", x = NULL, y = NULL,
       title = "Pie Chart: # of Made vs Missed Free Throws per Season",
       caption = "Only using Free Throw Statistics for Periods 1-4") +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5),
        axis.text = element_blank(),
        axis.ticks = element_blank(),
        panel.grid = element_blank()) +
  facet_wrap(~season, ncol = 5, nrow = 2) +
  scale_fill_brewer(palette = "Blues") +
  geom_text(aes(x = 1.7, label = paste0(percent_of_total, "%")),
            size = 4,
            position = position_fill(vjust = 0.5))
five
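Because `shot_made` is coded 0/1, the share of made free throws in a season is simply its mean, which makes it easy to confirm which season has the highest make rate without reading percentages off the pie charts. A base-R sketch on hypothetical toy rows, applying the same periods 1-4 filter used above:

```r
# Hypothetical toy data: shot_made is 1 for a make, 0 for a miss
ft <- data.frame(
  season    = c(rep("2007 - 2008", 4), rep("2008 - 2009", 5)),
  period    = c(1, 2, 4, 5, 1, 2, 3, 4, 6),
  shot_made = c(1, 0, 1, 1, 1, 1, 1, 0, 0)
)

# Keep regulation periods only (periods 5 and above are overtime)
reg <- ft[ft$period <= 4, ]

# The mean of a 0/1 column is the share of makes; scale it to a percentage
pct_made <- tapply(reg$shot_made, reg$season, mean) * 100
print(round(pct_made, 1))

# Season with the highest make percentage
best_season <- names(which.max(pct_made))
```

In this toy example the overtime rows change the answer: without the filter, both toy seasons would not rank the same way, which is why the filter is applied before computing percentages.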
Having looked at free throws by game type, season, and player, I wanted to check whether the trend in the previous pie charts also holds for the last variable I hadn't explored: period. As with the previous pie charts, I included only periods 1-4.
Since 2008-2009 had the highest share of made free throws, I plotted just that season by period to see whether the made-to-missed ratio also holds within each period.
The chart shows that it does: in every period, about ¾ of free throws were made and about ¼ were missed.
# 2008-2009 season only, regulation periods
periods_s3 <- periods[periods$season == "2008 - 2009", ]
# Made vs missed counts and percentages within each period
periods_s3_df <- periods_s3 %>%
  select(period, season, shot_made) %>%
  mutate(ft_status = ifelse(shot_made == 1, "Made", ifelse(shot_made == 0, "Missed", NA))) %>%
  group_by(period, season, ft_status) %>%
  summarise(n = n(), .groups = "drop") %>%
  group_by(period) %>%
  mutate(percent_of_total = round(100 * n / sum(n), 1)) %>%
  ungroup() %>%
  data.frame()
# Treat period as a label so facet titles read "1", "2", "3", "4"
periods_s3_df$period <- as.character(periods_s3_df$period)
# One pie per period for the 2008-2009 season
six <- ggplot(data = periods_s3_df, aes(x = "", y = n, fill = ft_status)) +
  geom_bar(stat = "identity", position = "fill") +
  coord_polar(theta = "y", start = 0) +
  labs(fill = "Shot", x = NULL, y = NULL,
       title = "Pie Chart: Made vs Missed Free Throws by Period for the 2008-2009 Season",
       caption = "Only using Free Throw Statistics for Periods 1-4") +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5),
        axis.text = element_blank(),
        axis.ticks = element_blank(),
        panel.grid = element_blank()) +
  facet_wrap(~period, ncol = 2, nrow = 2) +
  scale_fill_brewer(palette = "Blues") +
  geom_text(aes(x = 1.7, label = paste0(percent_of_total, "%")),
            size = 4,
            position = position_fill(vjust = 0.5))
six
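The pie charts compare the made/missed split by eye. One way to check whether the split really is the same across periods is a chi-squared test of independence on the period-by-outcome counts. This test is an illustrative addition, not part of the original analysis, and the counts below are hypothetical.

```r
# Hypothetical counts of made/missed free throws by period (not real data)
counts <- matrix(
  c(300, 100,
    330, 110,
    360, 120,
    450, 150),
  ncol = 2, byrow = TRUE,
  dimnames = list(period = c("1", "2", "3", "4"),
                  result = c("Made", "Missed"))
)

# Make rate per period: every toy row is exactly 75% made
make_rate <- counts[, "Made"] / rowSums(counts)

# Chi-squared test of independence between period and shot outcome
indep_test <- chisq.test(counts)
print(make_rate)
print(indep_test$p.value)
```

With identical make rates the statistic is 0 and the p-value is 1. On the real 2008-2009 counts, the sample is large enough that even small differences between periods may come out statistically significant, so the size of the gap in make rates is worth reading alongside the p-value.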
Lastly, I wanted an overall picture of the points scored from free throws alone (in both regular-season and playoff games) across all 10 seasons, and whether any part of the game contributed the most of those points. Since each made free throw is worth one point, the number of made free throws equals the number of free throw points.
Intuitively, I expected the 4th period to have more free throws, and therefore more points from free throws, than any other period. The 4th period tends to be more high-pressure and fast-paced, especially when the score is close, and trailing teams often foul on purpose to stop the clock or to prevent an easy basket.
The heatmap confirms this. In every one of the 10 seasons, substantially more free throws were made in the 4th period than in any of the other three periods, and each period had more made free throws than the period before it.
# Made and missed counts by period and season (regulation periods only)
heatmap_periods_df <- periods %>%
  select(period, season, shot_made) %>%
  mutate(ft_status = ifelse(shot_made == 1, "Made", ifelse(shot_made == 0, "Missed", NA))) %>%
  group_by(period, season, ft_status) %>%
  summarise(n = n(), .groups = "drop") %>%
  data.frame()
# Keep made free throws only; each one is worth one point
madeftperiods <- heatmap_periods_df[heatmap_periods_df$ft_status == "Made", ]
madeftperiods$period <- as.character(madeftperiods$period)
seven <- ggplot(madeftperiods, aes(x = season, y = period, fill = n)) +
  geom_tile(color = "black") +
  geom_text(aes(label = n)) +
  coord_equal(ratio = 1) +
  labs(title = "Heatmap: # of Made Free Throws by Period per Season",
       x = "Season", y = "Period", fill = "# of FT") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5)) +
  # List periods from 1 (top) to 4 (bottom)
  scale_y_discrete(limits = rev(sort(unique(madeftperiods$period)))) +
  scale_fill_gradient(low = "white", high = "red") +
  guides(fill = guide_legend(reverse = TRUE, override.aes = list(colour = "black")))
seven
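Because every made free throw is worth exactly one point, summing `shot_made` by season and period gives free throw points directly, and the claim that made free throws rise from one period to the next can be checked season by season rather than by eye. A base-R sketch on hypothetical toy rows; the second toy season deliberately breaks the pattern so the check has something to catch.

```r
# Hypothetical toy data: one row per free throw attempt in regulation
ft <- data.frame(
  season    = rep(c("2006 - 2007", "2007 - 2008"), each = 10),
  period    = c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4,
                1, 2, 2, 3, 3, 3, 4, 4, 4, 4),
  shot_made = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
                1, 1, 1, 1, 1, 0, 1, 1, 1, 0)
)

# Each make is one point, so summing shot_made gives points per season and period
ft_points <- xtabs(shot_made ~ season + period, data = ft)
print(ft_points)

# TRUE for a season if made free throws rise strictly from period 1 to period 4
increasing <- apply(ft_points, 1, function(row) all(diff(row) > 0))
print(increasing)
```

Run on `periods` from the analysis above, `all(increasing)` would confirm or refute the period-by-period claim for all 10 seasons at once.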
Ultimately, this analysis and its visualizations show that free throws are a strategic and important part of basketball. The trends identified here, such as the roughly 3-to-1 made-to-missed ratio and the rise in made free throws from the 1st to the 4th period, were consistent across all 10 seasons from 2006-2007 to 2015-2016, which suggests they are likely to continue.
Because these patterns are so stable, even a small change can matter: a single player who takes a large number of free throws improving his free throw percentage could influence the outcome of an individual game, or even shift the broader trends explored here.