The National Football League is one of the most popular sports leagues in modern-day America and, thanks to its dedicated fans, produces millions of dollars in revenue yearly. The league consists of 32 franchises, and each team's yearly goal of winning the Super Bowl relies heavily on the efficiency and consistency of its quarterback. Though NFL teams with top-ranked defenses have been known to win championships, quarterback performance is still important for a team to beat the best of the best, and quarterback is unsurprisingly one of the highest-paid positions on the team. This project will evaluate starting NFL quarterbacks from the last 4 years for efficiency. Through the following analysis, we will develop strategies to answer the following question: how can a franchise manager or coach evaluate the consistency of a quarterback in order to win a Super Bowl?
Like many other sports fans, I have found myself on both ends: happy with a team's performance, or flat-out frustrated that they did not do as well as expected. I am a big fan of football, specifically the NFL, and am interested in exploring how to quantify performance and ease the frustration sports fans feel when their team’s quarterback does not perform as expected.
The data used in this project was retrieved from Kaggle, but was derived from the NFL official website, and consists of 29 variables with 40247 observations. Because the data set has records for quarterbacks from 1970-2016, its size will be reduced after subsetting down to current starting quarterbacks that have played for at least 4 years. There are 8 character variables and 21 numeric variables in total.
This project hopes to explore how consistent the current starting quarterbacks in the NFL are based on the number of starts, touchdown passes, interceptions, completion percentage, and rushing yards.
library(tidyverse) #ggplot, dplyr, readr
library(DT) #Data presentation in R Markdown
library(gridExtra) #Plot layout within R Markdown
library(knitr) #Table presenting with Kable
I will be utilizing the tidyverse, DT, gridExtra, and knitr packages in R for my analysis and findings. Uses for each package can be seen in the code above.
I created two index vectors called “char” and “num” that list the column numbers of the character and numeric variables in my data. I then used the sapply function to convert those columns to character and numeric types.
setwd("~/BAN 6003 Intro to R/Project")
qb.orig<-read_csv("Game_Logs_Quarterback.csv")
char=c(1:3,5,7:11) #Character variable column numbers
num=c(4,6,12:29) #Numeric variable column numbers
qb.orig[,char]<-sapply(qb.orig[, char], as.character) #Converting to character
qb.orig[,num]<- sapply(qb.orig[, num], as.numeric) #Converting to numeric
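If a numeric column holds any non-numeric text, as.numeric turns those entries into NA and raises a warning, which is one way NA values can appear after a conversion like this. Here is a minimal sketch on made-up data, using lapply so each column is converted and assigned back one at a time. The table, the column positions and the "--" placeholder are assumptions for illustration, not values taken from the game logs.

```r
# Made-up rows; "--" stands in for a hypothetical missing-value placeholder.
raw <- data.frame(
  Name = c("QB A", "QB B"),
  Year = c("2015", "2016"),
  `TD Passes` = c("2", "--"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

char_cols <- 1    # character column positions in this toy table
num_cols <- 2:3   # numeric column positions in this toy table

# lapply returns a list of converted columns, assigned back column by column.
raw[char_cols] <- lapply(raw[char_cols], as.character)
# as.numeric warns when text becomes NA; the warning is silenced here on purpose.
raw[num_cols] <- lapply(raw[num_cols], function(x) suppressWarnings(as.numeric(x)))

str(raw)
```

Checking the result with str() or summary() right after the conversion makes it easier to see which columns picked up NA values.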
Unfortunately, there was no “Current Player” or “Starter” variable in my data set. Since the original data includes every quarterback that has played since 1970, I had to subset it, and decided to create a vector with the names of the 32 current starting quarterbacks in the NFL.
starting_qbs<-c('Brady, Tom','Roethlisberger, Ben',
'Wilson, Russell','Brees, Drew','Newton, Cam','Rodgers, Aaron',
'Carr, Derek','Ryan, Matt','Bradford, Sam','Wentz, Carson',
'Rivers, Philip','Luck, Andrew','Palmer, Carson',
'Stafford, Matthew','Prescott, Dak','Dalton, Andy',
'Winston, Jameis','Smith, Alex',
'Flacco, Joe','Taylor, Tyrod','Tannehill, Ryan','Manning, Eli',
'Bortles, Blake','Mariota, Marcus','Cousins, Kirk',
'Fitzpatrick, Ryan','Siemian, Trevor','Hoyer, Brian',
'Osweiler, Brock','Kessler, Cody','Keenum, Case',
'Gabbert, Blaine')
#32 Starting Quarterbacks in the NFL
I then kept only the observations for the 32 starting quarterbacks, from games played within the last 4 years, in either the regular season or the post season. I also replaced the NA values with 0 for Rushing Attempts, Rushing Yards, Yards Per Carry, Rushing TDs, Fumbles, and Fumbles Lost.
dat=qb.orig %>%
filter(`Name` %in% starting_qbs,
`Season` %in% c('Regular Season','Postseason'),
`Year`>=2013)
dat[is.na(dat)] <- 0 # Setting NA Values =0
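The assignment dat[is.na(dat)] <- 0 is written against the whole table, so it is not limited to the six rushing and fumble columns named above. The sketch below shows one way to restrict the replacement to chosen columns. The helper fill_na_zero and the toy data are hypothetical, and only two of the six columns are shown.

```r
# Toy game log; column names mirror the data set, values are made up.
games <- data.frame(
  Name = c("QB A", "QB A", "QB B"),
  Opponent = c("NYG", NA, "DAL"),
  `Rushing Yards` = c(12, NA, 3),
  Fumbles = c(NA, 1, 0),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace NA with 0 only in the named columns.
fill_na_zero <- function(df, cols) {
  for (col in cols) {
    df[[col]][is.na(df[[col]])] <- 0
  }
  df
}

rush_cols <- c("Rushing Yards", "Fumbles")
games_filled <- fill_na_zero(games, rush_cols)
print(games_filled)
```

In this sketch the missing Opponent stays NA, while the missing rushing and fumble values become 0.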
I will compare my final results across two subsets of my data: regular season games played and post season games played.
Regular Season data will be subset and reduced to starters that have played in 85% of NFL games in the last 4 seasons, or 55 games total. In order to be a consistent quarterback, the player must be available to start the game to begin with. Though I did leave room for the unexpected injuries to occur to these players, availability to start a game is a necessity.
dat.regseason=dat %>%
filter(`Season`=='Regular Season',
`Games Started`==1)%>%
group_by(`Player Id`) %>%
filter(sum(`Games Started`) >= 55)
Here are all Quarterbacks that meet the criteria of being a “consistent” regular season starter by playing for 85% of the games for the last 4 seasons.
## [1] "Brees, Drew" "Rivers, Philip" "Wilson, Russell"
## [4] "Smith, Alex" "Stafford, Matthew" "Brady, Tom"
## [7] "Ryan, Matt" "Roethlisberger, Ben" "Manning, Eli"
## [10] "Tannehill, Ryan" "Newton, Cam" "Dalton, Andy"
## [13] "Flacco, Joe" "Rodgers, Aaron"
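The 55-game cutoff follows from the 16-game regular season played in each of the four seasons: 85% of 64 games is 54.4, which rounds up to 55. The sketch below reproduces that arithmetic and applies the same "total starts per player" rule to made-up data in base R. The players "a" and "b" and their start counts are invented for illustration.

```r
games_per_season <- 16   # regular season games per team, 2013-2016
seasons <- 4
share_required <- 0.85

# 0.85 * 64 = 54.4, rounded up to whole games
min_starts <- ceiling(share_required * games_per_season * seasons)

# Made-up start logs: player "a" has 60 starts, player "b" has 40.
starts <- data.frame(
  player_id = c(rep("a", 60), rep("b", 40)),
  games_started = 1
)

# Total starts per player, then keep players at or above the cutoff.
starts_by_player <- tapply(starts$games_started, starts$player_id, sum)
qualifying <- names(starts_by_player[starts_by_player >= min_starts])

print(min_starts)
print(qualifying)
```

Only player "a" clears the cutoff here, which mirrors how the grouped filter keeps or drops all of a quarterback's rows at once.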
Post Season data derived from the original will be subset to only show quarterbacks that have started at least 5 post season games since 2013. Below are the names of the qualifying quarterbacks as well as the final post season game log data set.
dat.postseason=dat %>%
filter(`Season`=='Postseason')%>%
group_by(`Player Id`) %>%
filter(sum(`Games Started`) >= 5) #Played at least 5 post season games in the last 4 years
unique(dat.postseason$`Name`)
## [1] "Wilson, Russell" "Brady, Tom" "Roethlisberger, Ben"
## [4] "Newton, Cam" "Luck, Andrew" "Rodgers, Aaron"
datatable(dat.postseason,caption='Post Season Data (Cleaned)')
For reference, here are all qualifying regular season quarterbacks that have started 85% of games from 2013-2016:
## [1] "Brees, Drew" "Rivers, Philip" "Wilson, Russell"
## [4] "Smith, Alex" "Stafford, Matthew" "Brady, Tom"
## [7] "Ryan, Matt" "Roethlisberger, Ben" "Manning, Eli"
## [10] "Tannehill, Ryan" "Newton, Cam" "Dalton, Andy"
## [13] "Flacco, Joe" "Rodgers, Aaron"
I will begin my analysis of Quarterbacks by measuring the top five performers in touchdown and interception rate. In order to be a good pocket passing quarterback, a player must not only be able to throw touchdowns, but also minimize turning the ball over to the other team with costly interceptions.
#### REGULAR SEASON SUMMARIES
QB_REG_SUM=dat.regseason%>%
group_by(`Name`)%>%
summarise(AVG_TD=round(mean(`TD Passes`),digits=2),
AVG_PASS_COMPLETION_PCT=round(mean(`Completion Percentage`),digits=1),
AVG_INTERCEPTIONS=round(mean(`Ints`),digits=2),
AVG_PASSES_PER_GAME=round(mean(`Passes Attempted`),digits=0),
AVG_YARDS_PER_CARRY=round(mean(`Yards Per Carry`),digits=1),
AVG_SACKS=round(mean(`Sacks`),digits=1))
#Top 5 Touchdowns Thrown/Game
td<-arrange(QB_REG_SUM,desc(`AVG_TD`))
kable(td[1:5,1:2], caption='Touchdowns Per Game Leaders (2013-2016)')
Name | AVG_TD |
---|---|
Brees, Drew | 2.24 |
Rodgers, Aaron | 2.21 |
Brady, Tom | 2.03 |
Rivers, Philip | 1.95 |
Roethlisberger, Ben | 1.88 |
dat.regseason%>%
filter(`Name` %in% td$Name[1:5])%>%
ggplot(aes(`TD Passes`,fill=`Name`)) + facet_grid(~ `Name`,scales='fixed') + geom_histogram(binwidth = 1)+ggtitle("Touchdown Pass Distribution Since 2013")+theme(plot.title = element_text(hjust = 0.5),legend.position="none")
Over the last 4 seasons, Drew Brees leads starting quarterbacks at 2.24 touchdowns thrown per game, with Aaron Rodgers and Tom Brady following closely behind. We can also see from the touchdown pass distributions that Rodgers and Brady are the only two of these quarterbacks for whom 2 touchdown games are the most frequent outcome in the regular season.
#Top 5 Least Interceptions Thrown/Game
int<-arrange(QB_REG_SUM,`AVG_INTERCEPTIONS`)
kable(int[1:5,c(1,4)], caption='Least Interceptions Thrown Per Game (2013-2016)')
Name | AVG_INTERCEPTIONS |
---|---|
Rodgers, Aaron | 0.46 |
Smith, Alex | 0.46 |
Brady, Tom | 0.48 |
Wilson, Russell | 0.55 |
Newton, Cam | 0.82 |
dat.regseason%>%
filter(`Name` %in% int$Name[1:5])%>%
ggplot(aes(`Ints`,fill=`Name`)) + facet_grid(~ `Name`,scales='fixed') + geom_histogram(binwidth = 1)+ggtitle("Interceptions Thrown Distribution Since 2013")+theme(plot.title = element_text(hjust = 0.5),legend.position="none")+xlab("Number of Interceptions")
Aaron Rodgers, Alex Smith, and Tom Brady all appear to have minimized their interceptions and turnovers. Though some interceptions may be caused by Wide Receiver catching error, we will assume most interceptions thrown are due to quarterback decision error. Note that neither Brady nor Rodgers has had a 3 interception game over the past 4 seasons combined.
One of the best ways to measure a quarterback's throwing accuracy is by Pass Completion Percentage. Though reliant upon a catch from the wide receiver, pass completions give great insight into how quarterbacks interpret defensive coverage. Below are the top five most efficient pass completing quarterbacks as well as each player's corresponding average passes thrown per game.
pcent<-arrange(QB_REG_SUM,desc(`AVG_PASS_COMPLETION_PCT`))
kable(pcent[1:5,c(1,3)], caption='Pass Completion Percentage Leaders (2013-2016)')
Name | AVG_PASS_COMPLETION_PCT |
---|---|
Brees, Drew | 69.4 |
Ryan, Matt | 67.7 |
Roethlisberger, Ben | 66.4 |
Rivers, Philip | 65.7 |
Smith, Alex | 65.3 |
passes<-arrange(QB_REG_SUM,desc(`AVG_PASSES_PER_GAME`))%>%
filter(`Name` %in% pcent$Name[1:5])
ggplot(passes[1:5,c(1,5)], aes(x=reorder(`Name`,-`AVG_PASSES_PER_GAME`),y=`AVG_PASSES_PER_GAME`,fill=`Name`))+ geom_bar(stat="identity")+ggtitle("Top 5 Most Passes Thrown Per Game (2013-2016)")+theme(plot.title = element_text(hjust = 0.5))+xlab("Name")+ylab("Average Number of Throws/Game")
Surprisingly, Drew Brees has the highest completion percentage and also averaged the most passes thrown per game among these five quarterbacks. Notable mentions for efficient pass completion percentage include Matt Ryan, Ben Roethlisberger, and Philip Rivers.
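One caveat about the completion figures above: they average each game's Completion Percentage, which weights a 10-attempt game the same as a 40-attempt game. This is not the same number as total completions divided by total attempts. The sketch below illustrates the gap on two made-up games; the column names completed and attempted are hypothetical.

```r
# Two made-up games for one quarterback.
games <- data.frame(
  completed = c(5, 30),
  attempted = c(10, 40)
)

# Per-game percentages (50 and 75), then their simple mean.
per_game_pct <- 100 * games$completed / games$attempted
mean_of_games <- mean(per_game_pct)

# Pooled percentage: all completions over all attempts (35 of 50).
pooled_pct <- 100 * sum(games$completed) / sum(games$attempted)

print(mean_of_games)  # 62.5
print(pooled_pct)     # 70
```

Either definition can be defended, but rankings built on one may differ slightly from rankings built on the other.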
When passing options or time to throw are too limited due to strong defensive coverage, quarterbacks must use quick decision making in order to gain valuable yardage during a game and may decide to rush as the ball carrier themselves. Many quarterbacks in league history have had success at becoming a “dual threat”, meaning they can gain yards by both passing and rushing. Let's take a look at the top rushing quarterbacks by average yards gained per rushing attempt.
#Top 5 Rushing Yards Per Attempt
carry<-arrange(QB_REG_SUM,desc(`AVG_YARDS_PER_CARRY`))
kable(carry[1:5,c(1,6)], caption='Rushing Yards Per Attempt Leaders (2013-2016)')
Name | AVG_YARDS_PER_CARRY |
---|---|
Rodgers, Aaron | 5.4 |
Wilson, Russell | 5.4 |
Newton, Cam | 4.7 |
Smith, Alex | 4.7 |
Tannehill, Ryan | 4.7 |
Aaron Rodgers and Russell Wilson both appear to be great rushers when scrambling for yardage. Surprisingly enough, when we compare this list to the most sacked quarterbacks in the league over the last four years, we can see that Rodgers, Wilson, and Newton are quarterbacks who are frequently pressured to rush in order to avoid being sacked.
sacks<-arrange(QB_REG_SUM,desc(`AVG_SACKS`))
ggplot(sacks[1:5,c(1,7)], aes(x=reorder(`Name`,-AVG_SACKS),y=`AVG_SACKS`,fill=`Name`))+ geom_bar(stat="identity")+ggtitle("Top 5 Most Sacked Quarterbacks Per Game (2013-2016)")+theme(plot.title = element_text(hjust = 0.5))+xlab("Name")+ylab("Average Number of Sacks/Game")
This section of our analysis gives insight into how our qualifying regular season quarterbacks perform as a whole over the season. Below we can see the trends from our quarterbacks:
#(ALL) Passes Attempted vs Passing Yards
ggplot(data=dat.regseason, aes(x=`Passing Yards`, y=`Passes Attempted`)) + geom_point(color="blue", size=2, shape=17)+geom_smooth(method='lm',color='red')+facet_wrap(~`Year`, nrow=2)+ggtitle("Passes Attempted vs Passing Yards (2013-2016)")+theme(plot.title = element_text(hjust = 0.5))
Passes Attempted appear to be mostly over 20 per game from 2013-2016, suggesting that throwing offenses are heavily used throughout the NFL. There also appear to be fewer 100 yard games from 2013-2016.
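The red trend line in the scatterplot comes from a linear fit, and the strength of that relationship can also be read off numerically. The sketch below fits the same orientation as the plot (attempts on the y axis, yards on the x axis) to made-up values. The numbers are invented for illustration and say nothing about the real game logs.

```r
# Made-up games: attempts and passing yards that rise together.
attempts <- c(20, 25, 30, 35, 40)
yards <- 7 * attempts + c(-10, 5, 0, -5, 10)

# Same orientation as the plot: y = attempts, x = yards.
fit <- lm(attempts ~ yards)
slope <- coef(fit)[["yards"]]

# Pearson correlation between the two variables.
r <- cor(attempts, yards)

print(slope)
print(r)
```

On the real data, running the fit separately for each Year would match the facets in the plot.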
#(ALL) Regular Season Touchdown Passes by Year Since 2013
plot1<-ggplot(dat.regseason, aes(x=`Year`,y=`TD Passes`, fill=factor(`Year`)))+ ylim(0, 400)+ geom_bar(stat="identity")+ggtitle("Touchdown Passes (2013-2016)")+theme(plot.title = element_text(hjust = 0.5),legend.position="none")+ scale_fill_brewer(palette="Set1")
#(ALL) Regular Season Total Passes Attempted By Year Since 2013
plot2<-ggplot(dat.regseason, aes(x=`Year`,y=`Passes Attempted`, fill=factor(`Year`)))+ geom_bar(stat="identity")+ggtitle("Passes Attempted (2013-2016)")+theme(plot.title = element_text(hjust = 0.5),legend.position="none")+ scale_y_continuous(breaks=seq(0,8000,1000))+ scale_fill_brewer(palette="Set1")
grid.arrange(plot1, plot2, ncol=2)
We can also see that from 2013-2015, the total number of Touchdown Passes for our qualifying quarterbacks seemed to increase. Though there is a slight dip in TD Passes in 2016, I expect touchdown passes to rise over the next 4 years, since the number of passes attempted stayed virtually the same from 2013-2016. This suggests that coaches of qualifying quarterbacks remain willing to pass the football and that, on average, their quarterbacks' play has improved over the last 4 years.
#(ALL) TD Pass Distribution BY Year
ggplot(data=dat.regseason, aes(x = `TD Passes`,fill=factor(`Year`)))+ggtitle("Regular Season Touch Down Pass Distribution Since 2013") + geom_histogram(binwidth = 1)+facet_grid(~Year)+scale_fill_brewer(palette="Set1")+ labs(fill='Year')
From the histogram above, we can see that from 2013-2015, there appears to be a trend of our quarterbacks throwing more 2 Touchdown games as opposed to only throwing 1. Though 2016 has more 1 Touchdown games, note that there is a significant increase in 4 touchdown games when compared to 2013-2015.
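The comparison of 1, 2 and 4 touchdown games across years can also be checked with a simple cross-tabulation rather than by eye. The sketch below counts games by year and touchdown total on made-up data; the td_passes column and all values are hypothetical.

```r
# Made-up game logs: one row per game.
toy <- data.frame(
  Year = c(2013, 2013, 2013, 2016, 2016, 2016),
  td_passes = c(1, 2, 2, 1, 1, 4)
)

# Rows are years, columns are touchdown totals, cells are game counts.
counts <- table(toy$Year, toy$td_passes)

# Share of each year's games at each touchdown total (rows sum to 1).
shares <- prop.table(counts, margin = 1)

print(counts)
print(round(shares, 2))
```

Shares are useful when the number of games differs from year to year, since raw counts are then not directly comparable.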
We can see that the total Touchdown Passes by our qualifying quarterbacks appear to be quite uniform, with the exception of weeks 12-14, which show a small increase in the number of touchdown passes. We can also see that Passing Yards appear to be at their highest during Weeks 1 and 2, possibly because of the newness of the football season early on. Along with Touchdown Passes, Passing Yards also increase during Weeks 12-14. A possible explanation for this spike is that teams are more likely to throw during these games, as playoffs and seed positioning are on the line late in the season. However, the line graph below shows a decreasing trend of Passes Attempted in Weeks 12-14.
#(ALL) Regular Season Touchdown Passes By Week Since 2013
plot3<-ggplot(dat.regseason, aes(x=`Week`,y=`TD Passes`, fill=`Week`))+ ylim(0, 110)+scale_x_continuous(breaks=(1:17))+ geom_bar(stat="identity", fill="darkolivegreen")+ggtitle("Touchdown Passes (2013-2016)")+theme(plot.title = element_text(hjust = 0.5))
plot4<-ggplot(dat.regseason, aes(x=`Week`,y=`Passing Yards`, fill=`Week`))+scale_x_continuous(breaks=(1:17))+ geom_bar(stat="identity", fill="darkorange2")+ggtitle("Passing Yards (2013-2016)")+theme(plot.title = element_text(hjust = 0.5))
grid.arrange(plot3, plot4, ncol=2)
#Named weekly_passes rather than sum to avoid masking base R's sum()
weekly_passes<-dat.regseason %>%
group_by(`Week`)%>%
summarise(Avg_Passes=round(mean(`Passes Attempted`),digits=0))
ggplot(weekly_passes, aes(x=`Week`,y=`Avg_Passes`))+
geom_line(colour='darkorange2',size=2)+
geom_point(colour='darkolivegreen',size=3)+
scale_x_continuous(breaks=(1:17))+
ggtitle("Average Number of Passes Thrown (2013-2016)")+
theme(plot.title = element_text(hjust = 0.5))
For Post Season analysis, we will only be examining quarterbacks that have started 5 playoff games from 2013-2016. Below are the qualifying quarterbacks for examination:
dat.postseason=dat %>%
filter(`Season`=='Postseason')%>%
group_by(`Player Id`) %>%
filter(sum(`Games Started`) >= 5) #Played at least 5 post season games in the last 4 years
unique(dat.postseason$`Name`)
## [1] "Wilson, Russell" "Brady, Tom" "Roethlisberger, Ben"
## [4] "Newton, Cam" "Luck, Andrew" "Rodgers, Aaron"
The Post Season is the point at which quarterback performance is most vital. Touchdowns and Interceptions can decide whether a team will win the Super Bowl and are of utmost importance for this analysis.
QB_POST_SUM=dat.postseason%>%
group_by(`Name`)%>%
summarise(AVG_TD=round(mean(`TD Passes`),digits=2),
AVG_PASS_COMPLETION_PCT=round(mean(`Completion Percentage`),digits=1),
AVG_INTERCEPTIONS=round(mean(`Ints`),digits=2),
AVG_PASSES_PER_GAME=round(mean(`Passes Attempted`),digits=0),
AVG_YARDS_PER_CARRY=round(mean(`Yards Per Carry`),digits=1),
AVG_SACKS=round(mean(`Sacks`),digits=1))
#Top 5 Touchdowns Thrown/Game
td<-arrange(QB_POST_SUM,desc(`AVG_TD`))
kable(td[1:5,1:2], caption='Touchdowns Per Game Leaders (2013-2016)')
Name | AVG_TD |
---|---|
Rodgers, Aaron | 2.25 |
Brady, Tom | 2.10 |
Luck, Andrew | 1.80 |
Wilson, Russell | 1.70 |
Newton, Cam | 1.33 |
dat.postseason%>%
filter(`Name` %in% td$Name[1:5])%>%
ggplot(aes(`TD Passes`,fill=`Name`)) + facet_grid(~ `Name`,scales='fixed') + geom_histogram(binwidth = 1)+ggtitle("Touchdown Pass Distribution Since 2013")+theme(plot.title = element_text(hjust = 0.5),legend.position="none")+ scale_fill_brewer(palette="Set2")
Aaron Rodgers and Tom Brady both appear to be leaders in touchdowns thrown in the Post Season. As seen below, these two quarterbacks also throw the fewest interceptions. Russell Wilson also appears to be an efficient passer in the post season, ranking 4th in both touchdowns and fewest interceptions thrown.
#Top 5 Least Interceptions Thrown/Game
int<-arrange(QB_POST_SUM,`AVG_INTERCEPTIONS`)
kable(int[1:5,c(1,4)], caption='Least Interceptions Thrown Per Game (2013-2016)')
Name | AVG_INTERCEPTIONS |
---|---|
Rodgers, Aaron | 0.62 |
Brady, Tom | 0.90 |
Roethlisberger, Ben | 1.00 |
Wilson, Russell | 1.00 |
Newton, Cam | 1.17 |
dat.postseason%>%
filter(`Name` %in% int$Name[1:5])%>%
ggplot(aes(`Ints`,fill=`Name`)) + facet_grid(~ `Name`,scales='fixed') + geom_histogram(binwidth = 1)+ggtitle("Interceptions Thrown Distribution Since 2013")+theme(plot.title = element_text(hjust = 0.5),legend.position="none")+xlab("Number of Interceptions")+ scale_fill_brewer(palette="Set2")
Name | AVG_PASS_COMPLETION_PCT |
---|---|
Roethlisberger, Ben | 65.8 |
Brady, Tom | 62.9 |
Newton, Cam | 61.5 |
Rodgers, Aaron | 61.3 |
Wilson, Russell | 60.8 |
We can see that Ben Roethlisberger leads all qualifying quarterbacks in completion percentage with 65.8% of passes completed, followed by Tom Brady and Cam Newton.
Based on the below table, Andrew Luck, Aaron Rodgers, and Russell Wilson are all top rushing quarterbacks in the Post Season. Also note from the bar chart below that Wilson and Rodgers are among the most sacked quarterbacks within post season games.
#Top 5 Rushing Yards Per Attempt
carry<-arrange(QB_POST_SUM,desc(`AVG_YARDS_PER_CARRY`))
kable(carry[1:5,c(1,6)], caption='Rushing Yards Per Attempt Leaders (2013-2016)')
Name | AVG_YARDS_PER_CARRY |
---|---|
Luck, Andrew | 7.1 |
Rodgers, Aaron | 5.7 |
Wilson, Russell | 5.6 |
Newton, Cam | 4.4 |
Brady, Tom | 2.5 |
sacks<-arrange(QB_POST_SUM,desc(`AVG_SACKS`))
ggplot(sacks[1:5,c(1,7)], aes(x=reorder(`Name`,-AVG_SACKS),y=`AVG_SACKS`,fill=`Name`))+ geom_bar(stat="identity")+ggtitle("Top 5 Most Sacked Quarterbacks Per Game (2013-2016)")+theme(plot.title = element_text(hjust = 0.5))+xlab("Name")+ylab("Average Number of Sacks/Game")
How can a franchise manager or coach evaluate the consistency of a quarterback in order to win a Super Bowl?
Used tidyverse to clean original quarterback game log data.
Summarized qualified quarterbacks on average Touchdown Passes, Interceptions, Pass Completion Percentage, Yards Per Carry, and Sacks per game. Recorded the best value in each category to develop a “consistent” standard.
Graphed and displayed individual quarterback trends as well as Year and Week tendencies for all combined qualified quarterbacks to gain insight on changes from 2013-2016.
In order to evaluate quarterbacks for their “consistent” performance, we must keep the goal of winning a Super Bowl championship in mind. A quarterback’s decision making must be evaluated for consistency using both passing and rushing measures of the positive yardage gained on a play. Below are the findings of the analysis conducted:
While Rodgers is the 5th most sacked quarterback since 2013, he still manages to be a statistically consistent quarterback. His regular season averages, nearly identical to his post season averages, come closer to the best value in each category than those of any other quarterback (2.21 TD, 0.46 INT, 64.8% Pass Completion, 5.4 Yds/Carry).
For a coach or team manager, making sure a quarterback can average the above statistics may prove optimal for winning a Super Bowl. Aiming for at least 2 touchdowns, 60% completion, and 5 Yds/Carry in each game, while also minimizing interceptions, will optimize quarterback performance. Further analyses that would benefit this project include yards per attempt, fumble and injury minimization, performance in home vs away games as well as indoor vs outdoor stadiums, and offensive vs defensive scheme analysis. This analysis assumed that all game logs were complete games, and did not account for quarterbacks who started a game but left midway through with an injury.
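The benchmark targets named above (2 touchdowns, 60% completion, 5 Yds/Carry) can be turned into a simple screen over a summary table shaped like QB_REG_SUM. This is a minimal sketch under stated assumptions: the helper meets_benchmarks and the two quarterbacks are hypothetical, and interceptions are left out because no numeric target was given for them.

```r
# Made-up summary rows using the same column names as the summary tables.
qb_summary <- data.frame(
  Name = c("QB A", "QB B"),
  AVG_TD = c(2.2, 1.6),
  AVG_PASS_COMPLETION_PCT = c(64.0, 61.0),
  AVG_YARDS_PER_CARRY = c(5.4, 3.9),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE where a row meets all three targets.
meets_benchmarks <- function(df, min_td = 2, min_pct = 60, min_ypc = 5) {
  df$AVG_TD >= min_td &
    df$AVG_PASS_COMPLETION_PCT >= min_pct &
    df$AVG_YARDS_PER_CARRY >= min_ypc
}

qb_summary$MEETS_ALL <- meets_benchmarks(qb_summary)
print(qb_summary)
```

The thresholds are function arguments so that a coach or analyst could tighten or loosen them without changing the rest of the code.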