Problem Statement

The National Football League is one of the most popular sports leagues in America today and, thanks to its dedicated fans, produces millions of dollars in revenue each year. The league consists of 32 franchises, and each team's yearly goal of winning the Super Bowl relies heavily on the efficiency and consistency of its quarterback. Though NFL teams with top ranked defenses have been known to win championships, quarterback performance remains essential for beating the best of the best, and quarterback is unsurprisingly one of the highest paid positions on a team. This project evaluates starting NFL quarterbacks from the last 4 years for efficiency. Through the following analysis, we will develop strategies to answer one question: how can a franchise manager or coach evaluate the consistency of a quarterback in order to win a Super Bowl?


Synopsis/Data

Introduction

Like many other sports fans, I have found myself on both ends: happy with a team's performance, or flat out frustrated that they did not do as well as expected. I am a big fan of football, specifically the NFL, and am interested in exploring how to quantify performance and ease the frustration sports fans feel when their team's quarterback does not perform as expected.

Original Data

The data used in this project was retrieved from Kaggle, but was derived from the NFL official website, and consists of 29 variables with 40247 observations. With this data set having records for quarterbacks from 1970-2016, there will be a reduction in data size after subsetting down to current starting quarterbacks that have played for at least 4 years. There are 8 character variables and 21 numeric variables in total.

This project hopes to explore how consistent the current starting quarterbacks in the NFL are based on the number of starts, touchdown passes, interceptions, completion percentage, and rushing yards.

Data Dictionary


Packages For Analysis

Tidyverse
library(tidyverse) #ggplot2, dplyr, readr
library(DT) #Data presentation in R Markdown
library(gridExtra) #Plot layout within R Markdown
library(knitr) #Table presenting with Kable

I will be utilizing the tidyverse, DT, gridExtra, and knitr packages in R for my analysis and findings. Uses for each package can be seen in the code above.

Data Cleaning

Importing

I created two new variables called “char” and “num” that could define what character and numeric variables I have in my data based on column number. I then used the sapply function to apply character and numeric formats to my columns.

setwd("~/BAN 6003 Intro to R/Project")
qb.orig<-read_csv("Game_Logs_Quarterback.csv")

char=c(1:3,5,7:11) #Character variable column numbers
num=c(4,6,12:29)   #Numeric variable column numbers

qb.orig[,char]<-sapply(qb.orig[, char], as.character) #Converting to character
qb.orig[,num]<- sapply(qb.orig[, num], as.numeric) #Converting to numeric
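
The same column-type conversion can also be sketched with `dplyr::across()`, which avoids assigning the output of `sapply` back into the data frame. This is only an illustrative alternative: the toy tibble, its column positions, and the "--" placeholder are assumptions, not the real game log.

```r
library(dplyr)

# Toy stand-in for the game log (hypothetical values; the real file has 29 columns)
toy <- tibble(
  Name        = c("Brady, Tom", "Brees, Drew"),
  Year        = c("2016", "2016"),
  `TD Passes` = c("3", "--")   # "--" mimics a non-numeric placeholder (assumption)
)

char_cols <- c(1)      # character column positions in the toy data
num_cols  <- c(2, 3)   # numeric column positions in the toy data

toy_typed <- toy %>%
  mutate(across(all_of(char_cols), as.character),
         # as.numeric() turns unparseable strings into NA; the warning is silenced here
         across(all_of(num_cols), ~ suppressWarnings(as.numeric(.x))))

str(toy_typed)
```

Values that cannot be parsed as numbers become NA, which is why a later NA-handling step is needed.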

Cleaning and Filtering

Unfortunately, there was no “Current Player” or “Starter” variable in my data set. To subset the original data (which includes every quarterback who has played since 1970), I created a vector containing the names of the 32 current starting quarterbacks in the NFL.

starting_qbs<-c('Brady, Tom','Roethlisberger, Ben',
'Wilson, Russell','Brees, Drew','Newton, Cam','Rodgers, Aaron',
'Carr, Derek','Ryan, Matt','Bradford, Sam','Wentz, Carson',
'Rivers, Philip','Luck, Andrew','Palmer, Carson',
'Stafford, Matthew','Prescott, Dak','Dalton, Andy',
'Winston, Jameis','Smith, Alex',
'Flacco, Joe','Taylor, Tyrod','Tannehill, Ryan','Manning, Eli',
'Bortles, Blake','Mariota, Marcus','Cousins, Kirk',
'Fitzpatrick, Ryan','Siemian, Trevor','Hoyer, Brian',
'Osweiler, Brock','Kessler, Cody','Keenum, Case',
'Gabbert, Blaine')
#32 Starting Quarterbacks in the NFL
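
Because the filter relies on exact string matches, a misspelled name would silently drop a quarterback. A quick check along the following lines may help. It is a minimal sketch: both vectors are shortened, hypothetical stand-ins for `starting_qbs` and `unique(qb.orig$Name)`.

```r
# Hypothetical names present in the data (stand-in for unique(qb.orig$Name))
names_in_data <- c("Brady, Tom", "Brees, Drew", "Wilson, Russell")

# Shortened roster with one deliberate misspelling to illustrate the check
roster <- c("Brady, Tom", "Brees, Drew", "Willson, Russell")

# Roster entries that have no exact match in the data
missing_names <- setdiff(roster, names_in_data)
print(missing_names)
```

An empty result would indicate that every roster name appears in the data exactly as written.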

I then kept only observations for the 32 starting quarterbacks, from games played within the last 4 years, in either the regular season or the post season. I also replaced NA values with 0 for Rushing Attempts, Rushing Yards, Yards Per Carry, Rushing TDs, Fumbles, and Fumbles Lost (note that the code below sets every NA in the filtered data to 0, not only those columns).

dat=qb.orig %>%
  filter(`Name` %in% starting_qbs,
         `Season` %in% c('Regular Season','Postseason'),
         `Year`>=2013)
dat[is.na(dat)] <- 0 # Setting NA Values =0 
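
If the replacement should touch only the rushing and fumble columns, one possible approach is to name them explicitly. The sketch below uses toy data; the column names are taken from the description above and are assumed to match the file.

```r
library(dplyr)

# Column names taken from the text; assumed to match the real file
rush_cols <- c("Rushing Attempts", "Rushing Yards", "Yards Per Carry",
               "Rushing TDs", "Fumbles", "Fumbles Lost")

# Toy rows (hypothetical values): the first game has no rushing or passing stats recorded
toy <- tibble(
  Name               = c("Brady, Tom", "Newton, Cam"),
  `TD Passes`        = c(NA, 2),
  `Rushing Attempts` = c(NA, 8),
  `Rushing Yards`    = c(NA, 41),
  `Yards Per Carry`  = c(NA, 5.1),
  `Rushing TDs`      = c(NA, 1),
  Fumbles            = c(NA, 0),
  `Fumbles Lost`     = c(NA, 0)
)

# Replace NA with 0 only in the listed columns; other columns keep their NAs
toy_clean <- toy %>%
  mutate(across(all_of(rush_cols), ~ coalesce(.x, 0)))

print(toy_clean)
```

With this variant, a missing passing statistic stays NA instead of being counted as a zero in later averages.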

Regular Season and Post Season Data Sets

I will compare my final results across two subsets of the data: regular season games and post season games.

Regular Season

Regular Season data will be reduced to starters who have started at least 85% of regular season games over the last 4 seasons, or 55 games in total. To be a consistent quarterback, a player must first be available to start the game. Though I left room for unexpected injuries, availability to start a game is a necessity.
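
The 55-game cutoff follows from simple arithmetic, sketched here with illustrative variable names. It assumes a 16-game regular season per team, which was the schedule length in 2013-2016.

```r
seasons          <- 4      # 2013 through 2016
games_per_season <- 16     # regular season games per team in those years
share_required   <- 0.85   # minimum share of games started

total_games <- seasons * games_per_season               # 4 * 16 = 64
min_starts  <- ceiling(share_required * total_games)    # 54.4 rounded up

print(c(total_games = total_games, min_starts = min_starts))
```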

dat.regseason=dat %>%  
  filter(`Season`=='Regular Season',
         `Games Started`==1)%>%
  group_by(`Player Id`) %>%
  filter(sum(`Games Started`) >= 55)
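
For readers unfamiliar with grouped filters, this toy example sketches how `group_by()` followed by `filter(sum(...) >= n)` keeps or drops all of a player's rows at once. The data and the threshold of 3 are hypothetical.

```r
library(dplyr)

# Toy game log: player A starts 3 games, player B starts 1 (hypothetical data)
toy <- tibble(
  `Player Id`     = c("A", "A", "A", "B"),
  `Games Started` = c(1, 1, 1, 1)
)

kept <- toy %>%
  group_by(`Player Id`) %>%
  # sum() is evaluated once per group, so the test is all-or-nothing per player
  filter(sum(`Games Started`) >= 3) %>%
  ungroup()   # drop the grouping so later steps are not silently grouped

print(kept)
```

Only player A's three rows remain. The trailing `ungroup()` is optional and simply makes the result an ordinary ungrouped table.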

Here are all Quarterbacks that meet the criteria of being a “consistent” regular season starter by playing for 85% of the games for the last 4 seasons.

##  [1] "Brees, Drew"         "Rivers, Philip"      "Wilson, Russell"    
##  [4] "Smith, Alex"         "Stafford, Matthew"   "Brady, Tom"         
##  [7] "Ryan, Matt"          "Roethlisberger, Ben" "Manning, Eli"       
## [10] "Tannehill, Ryan"     "Newton, Cam"         "Dalton, Andy"       
## [13] "Flacco, Joe"         "Rodgers, Aaron"

Post Season

Post Season data is subset to quarterbacks who have started at least 5 post season games since 2013. Below are the names of the qualifying quarterbacks as well as the final post season game log data set.

dat.postseason=dat %>%
  filter(`Season`=='Postseason')%>%
  group_by(`Player Id`) %>%
  filter(sum(`Games Started`) >= 5)  #Played at least 5 post season games in the last 4 years
 
unique(dat.postseason$`Name`)
## [1] "Wilson, Russell"     "Brady, Tom"          "Roethlisberger, Ben"
## [4] "Newton, Cam"         "Luck, Andrew"        "Rodgers, Aaron"
datatable(dat.postseason,caption='Post Season Data (Cleaned)')

Exploratory Analysis

Regular Season

Individual Quarterback Analysis

Qualifying Quarterbacks

For reference, here are all qualifying regular season quarterbacks that have started 85% of games from 2013-2016:

##  [1] "Brees, Drew"         "Rivers, Philip"      "Wilson, Russell"    
##  [4] "Smith, Alex"         "Stafford, Matthew"   "Brady, Tom"         
##  [7] "Ryan, Matt"          "Roethlisberger, Ben" "Manning, Eli"       
## [10] "Tannehill, Ryan"     "Newton, Cam"         "Dalton, Andy"       
## [13] "Flacco, Joe"         "Rodgers, Aaron"

Touchdowns and Interceptions

I will begin my analysis of quarterbacks by identifying the top five performers in touchdowns and interceptions per game. To be a good pocket passer, a quarterback must not only throw touchdowns but also avoid turning the ball over to the other team with costly interceptions.

#### REGULAR SEASON SUMMARIES
QB_REG_SUM=dat.regseason%>%
  group_by(`Name`)%>%
  summarise(
    AVG_TD=round(mean(`TD Passes`),digits=2),
    AVG_PASS_COMPLETION_PCT=round(mean(`Completion Percentage`),digits=1),
    AVG_INTERCEPTIONS=round(mean(`Ints`),digits=2),
    AVG_PASSES_PER_GAME=round(mean(`Passes Attempted`),digits=0),
    AVG_YARDS_PER_CARRY=round(mean(`Yards Per Carry`),digits=1),
    AVG_SACKS=round(mean(`Sacks`),digits=1)
  )

#Top 5 Touchdowns Thrown/Game
td<-arrange(QB_REG_SUM,desc(`AVG_TD`))
kable(td[1:5,1:2], caption='Touchdowns Per Game Leaders (2013-2016)')

Touchdowns Per Game Leaders (2013-2016)

| Name                | AVG_TD |
|:--------------------|-------:|
| Brees, Drew         |   2.24 |
| Rodgers, Aaron      |   2.21 |
| Brady, Tom          |   2.03 |
| Rivers, Philip      |   1.95 |
| Roethlisberger, Ben |   1.88 |

dat.regseason%>%
  filter(`Name` %in% td$Name[1:5])%>%
  ggplot(aes(`TD Passes`,fill=`Name`)) +
  facet_grid(~ `Name`,scales='fixed') +
  geom_histogram(binwidth = 1) +
  ggtitle("Touchdown Pass Distribution Since 2013") +
  theme(plot.title = element_text(hjust = 0.5),legend.position="none")

Over the last 4 seasons, Drew Brees leads starting quarterbacks at 2.24 touchdowns thrown per game, with Aaron Rodgers and Tom Brady following closely behind. We can also see from the touchdown pass distributions that only Rodgers and Brady have more frequent 2 touchdown games in the regular season.

#Top 5 Least Interceptions Thrown/Game 
int<-arrange(QB_REG_SUM,`AVG_INTERCEPTIONS`)
kable(int[1:5,c(1,4)], caption='Least Interceptions Thrown Per Game (2013-2016)')

Least Interceptions Thrown Per Game (2013-2016)

| Name            | AVG_INTERCEPTIONS |
|:----------------|------------------:|
| Rodgers, Aaron  |              0.46 |
| Smith, Alex     |              0.46 |
| Brady, Tom      |              0.48 |
| Wilson, Russell |              0.55 |
| Newton, Cam     |              0.82 |

dat.regseason%>%
  filter(`Name` %in% int$Name[1:5])%>%
  ggplot(aes(`Ints`,fill=`Name`)) +
  facet_grid(~ `Name`,scales='fixed') +
  geom_histogram(binwidth = 1) +
  ggtitle("Interceptions Thrown Distribution Since 2013") +
  theme(plot.title = element_text(hjust = 0.5),legend.position="none") +
  xlab("Number of Interceptions")

Aaron Rodgers, Alex Smith, and Tom Brady have all kept interceptions and turnovers to a minimum. Though some interceptions result from a wide receiver's catching error, we will assume most interceptions thrown are due to quarterback decision error. Note that neither Brady nor Rodgers has had a 3 interception game in the past 4 seasons combined.


Pass Completion Percentage

One of the best ways to measure a quarterback's throwing accuracy is Pass Completion Percentage. Though reliant upon a catch from the receiver, pass completions give insight into how quarterbacks interpret defensive coverage. Below are the five quarterbacks with the highest completion percentage, along with each player's average passes thrown per game.

pcent<-arrange(QB_REG_SUM,desc(`AVG_PASS_COMPLETION_PCT`))
kable(pcent[1:5,c(1,3)], caption='Pass Completion Percentage Leaders (2013-2016)')

Pass Completion Percentage Leaders (2013-2016)

| Name                | AVG_PASS_COMPLETION_PCT |
|:--------------------|------------------------:|
| Brees, Drew         |                    69.4 |
| Ryan, Matt          |                    67.7 |
| Roethlisberger, Ben |                    66.4 |
| Rivers, Philip      |                    65.7 |
| Smith, Alex         |                    65.3 |

#Passes per game for the five completion percentage leaders
passes<-arrange(QB_REG_SUM,desc(`AVG_PASSES_PER_GAME`))%>%
  filter(`Name` %in% pcent$Name[1:5])
ggplot(passes[1:5,c(1,5)],
       aes(x=reorder(`Name`,-`AVG_PASSES_PER_GAME`),y=`AVG_PASSES_PER_GAME`,fill=`Name`)) +
  geom_bar(stat="identity") +
  ggtitle("Top 5 Most Passes Thrown Per Game (2013-2016)") +
  theme(plot.title = element_text(hjust = 0.5)) +
  xlab("Name")+ylab("Average Number of Throws/Game")

Notably, Drew Brees has the highest completion percentage and also averages the most passes thrown per game among these five leaders. Other quarterbacks with efficient completion percentages include Matt Ryan, Ben Roethlisberger, and Philip Rivers.
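
One caveat on the metric: averaging per-game completion percentages weights every game equally, regardless of how many passes were thrown. The toy example below sketches how that can differ from a pooled percentage (total completions over total attempts). The data are hypothetical, and `Passes Completed` is an assumed column name.

```r
library(dplyr)

# Hypothetical two-game log; "Passes Completed" is an assumed column name
toy <- tibble(
  Name               = c("QB", "QB"),
  `Passes Completed` = c(9, 30),
  `Passes Attempted` = c(10, 50)
) %>%
  mutate(`Completion Percentage` = 100 * `Passes Completed` / `Passes Attempted`)

toy_sum <- toy %>%
  group_by(Name) %>%
  summarise(
    # Unweighted mean of the two game percentages: (90 + 60) / 2
    MEAN_OF_GAME_PCT = mean(`Completion Percentage`),
    # Pooled percentage: 39 completions on 60 attempts
    POOLED_PCT = 100 * sum(`Passes Completed`) / sum(`Passes Attempted`)
  )

print(toy_sum)
```

Here the unweighted mean is 75 while the pooled figure is 65, so the two definitions can rank quarterbacks differently when attempts vary a lot from game to game.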


Rushing Attempt Efficiency and Sacks

When passing options or time to throw are limited by strong defensive coverage, quarterbacks must make quick decisions to gain valuable yardage and may decide to rush as the ball carrier themselves. Many quarterbacks in league history have had success as a “dual threat”, meaning they can gain yards by both passing and rushing. Let's take a look at the top rushing quarterbacks by average yards gained per rushing attempt.

#Top 5 Rushing Yards Per Attempt 
carry<-arrange(QB_REG_SUM,desc(`AVG_YARDS_PER_CARRY`))
kable(carry[1:5,c(1,6)], caption='Rushing Yards Per Attempt Leaders (2013-2016)')

Rushing Yards Per Attempt Leaders (2013-2016)

| Name            | AVG_YARDS_PER_CARRY |
|:----------------|--------------------:|
| Rodgers, Aaron  |                 5.4 |
| Wilson, Russell |                 5.4 |
| Newton, Cam     |                 4.7 |
| Smith, Alex     |                 4.7 |
| Tannehill, Ryan |                 4.7 |

Aaron Rodgers and Russell Wilson both appear to be great rushers when scrambling for yardage. Interestingly, when this list is compared with the most sacked quarterbacks in the league over the last four years, we can see that Rodgers, Wilson, and Newton are frequently pressured into rushing to avoid being sacked.

sacks<-arrange(QB_REG_SUM,desc(`AVG_SACKS`))

ggplot(sacks[1:5,c(1,7)],
       aes(x=reorder(`Name`,-AVG_SACKS),y=`AVG_SACKS`,fill=`Name`)) +
  geom_bar(stat="identity") +
  ggtitle("Top 5 Most Sacked Quarterbacks Per Game (2013-2016)") +
  theme(plot.title = element_text(hjust = 0.5)) +
  xlab("Name")+ylab("Average Number of Sacks/Game")

Post Season

Qualifying Quarterbacks

For Post Season analysis, we will only be examining quarterbacks that have started 5 playoff games from 2013-2016. Below are the qualifying quarterbacks for examination:

dat.postseason=dat %>%
  filter(`Season`=='Postseason')%>%
  group_by(`Player Id`) %>%
  filter(sum(`Games Started`) >= 5)  #Played at least 5 post season games in the last 4 years

unique(dat.postseason$`Name`)
## [1] "Wilson, Russell"     "Brady, Tom"          "Roethlisberger, Ben"
## [4] "Newton, Cam"         "Luck, Andrew"        "Rodgers, Aaron"

Touchdowns and Interceptions

The Post Season is the point at which quarterback performance is most vital. Touchdowns and interceptions can decide whether a team wins the Super Bowl and are of the utmost importance for this analysis.

QB_POST_SUM=dat.postseason%>%
  group_by(`Name`)%>%
  summarise(
    AVG_TD=round(mean(`TD Passes`),digits=2),
    AVG_PASS_COMPLETION_PCT=round(mean(`Completion Percentage`),digits=1),
    AVG_INTERCEPTIONS=round(mean(`Ints`),digits=2),
    AVG_PASSES_PER_GAME=round(mean(`Passes Attempted`),digits=0),
    AVG_YARDS_PER_CARRY=round(mean(`Yards Per Carry`),digits=1),
    AVG_SACKS=round(mean(`Sacks`),digits=1)
  )

#Top 5 Touchdowns Thrown/Game
td<-arrange(QB_POST_SUM,desc(`AVG_TD`))
kable(td[1:5,1:2], caption='Touchdowns Per Game Leaders (2013-2016)')

Touchdowns Per Game Leaders (2013-2016)

| Name            | AVG_TD |
|:----------------|-------:|
| Rodgers, Aaron  |   2.25 |
| Brady, Tom      |   2.10 |
| Luck, Andrew    |   1.80 |
| Wilson, Russell |   1.70 |
| Newton, Cam     |   1.33 |

dat.postseason%>%
  filter(`Name` %in% td$Name[1:5])%>%
  ggplot(aes(`TD Passes`,fill=`Name`)) +
  facet_grid(~ `Name`,scales='fixed') +
  geom_histogram(binwidth = 1) +
  ggtitle("Touchdown Pass Distribution Since 2013") +
  theme(plot.title = element_text(hjust = 0.5),legend.position="none") +
  scale_fill_brewer(palette="Set2")

Aaron Rodgers and Tom Brady both appear to be leaders in touchdowns thrown in the Post Season. As seen below, both also throw the fewest interceptions. Russell Wilson also appears to be an efficient passer in the post season, ranking 4th in both touchdowns and fewest interceptions thrown.

#Top 5 Least Interceptions Thrown/Game 
int<-arrange(QB_POST_SUM,`AVG_INTERCEPTIONS`)
kable(int[1:5,c(1,4)], caption='Least Interceptions Thrown Per Game (2013-2016)')

Least Interceptions Thrown Per Game (2013-2016)

| Name                | AVG_INTERCEPTIONS |
|:--------------------|------------------:|
| Rodgers, Aaron      |              0.62 |
| Brady, Tom          |              0.90 |
| Roethlisberger, Ben |              1.00 |
| Wilson, Russell     |              1.00 |
| Newton, Cam         |              1.17 |

#Post season game logs (the original call used dat.regseason here)
dat.postseason%>%
  filter(`Name` %in% int$Name[1:5])%>%
  ggplot(aes(`Ints`,fill=`Name`)) +
  facet_grid(~ `Name`,scales='fixed') +
  geom_histogram(binwidth = 1) +
  ggtitle("Interceptions Thrown Distribution Since 2013") +
  theme(plot.title = element_text(hjust = 0.5),legend.position="none") +
  xlab("Number of Interceptions") +
  scale_fill_brewer(palette="Set2")


Pass Completion Percentage

Pass Completion Percentage Leaders (2013-2016)

| Name                | AVG_PASS_COMPLETION_PCT |
|:--------------------|------------------------:|
| Roethlisberger, Ben |                    65.8 |
| Brady, Tom          |                    62.9 |
| Newton, Cam         |                    61.5 |
| Rodgers, Aaron      |                    61.3 |
| Wilson, Russell     |                    60.8 |

We can see that Ben Roethlisberger leads all qualifying quarterbacks in completion percentage with 65.8% of passes completed, followed by Tom Brady and Cam Newton.


Rushing Attempt Efficiency and Sacks

Based on the table below, Andrew Luck, Aaron Rodgers, and Russell Wilson are the top rushing quarterbacks in the Post Season. Also note from the bar chart below that Wilson and Rodgers are among the most sacked quarterbacks in post season games.

#Top 5 Rushing Yards Per Attempt 
carry<-arrange(QB_POST_SUM,desc(`AVG_YARDS_PER_CARRY`))
kable(carry[1:5,c(1,6)], caption='Rushing Yards Per Attempt Leaders (2013-2016)')

Rushing Yards Per Attempt Leaders (2013-2016)

| Name            | AVG_YARDS_PER_CARRY |
|:----------------|--------------------:|
| Luck, Andrew    |                 7.1 |
| Rodgers, Aaron  |                 5.7 |
| Wilson, Russell |                 5.6 |
| Newton, Cam     |                 4.4 |
| Brady, Tom      |                 2.5 |

sacks<-arrange(QB_POST_SUM,desc(`AVG_SACKS`))

ggplot(sacks[1:5,c(1,7)],
       aes(x=reorder(`Name`,-AVG_SACKS),y=`AVG_SACKS`,fill=`Name`)) +
  geom_bar(stat="identity") +
  ggtitle("Top 5 Most Sacked Quarterbacks Per Game (2013-2016)") +
  theme(plot.title = element_text(hjust = 0.5)) +
  xlab("Name")+ylab("Average Number of Sacks/Game")

Summary

Problem Statement

How can a franchise manager or coach evaluate the consistency of a quarterback in order to win a Super Bowl?


Methodology

  1. Used tidyverse to clean original quarterback game log data.

  2. Summarized qualified quarterbacks on average Touchdown Passes, Interceptions, Pass Completion Percentage, Yards Per Carry, and Sacks per game. Recorded the best value in each category to develop a “consistent” standard.

  3. Graphed and displayed individual quarterback trends as well as Year and Week tendencies for all combined qualified quarterbacks to gain insight on changes from 2013-2016.


Findings

In order to evaluate quarterbacks for their “consistent” performance, we must keep the goal of winning a Super Bowl championship in mind. Evaluating a quarterback’s decision making for consistency must be addressed using both passing and rushing measures on positive yardage acquired from the play. Below are the findings of the analysis conducted:

  • A “consistent” dual threat quarterback will have the following statistics per game in the Regular Season:
    (2.24 TD, 0.46 INT, 69.4% Pass Completion, 5.4 Yds/Carry)
  • A “consistent” dual threat quarterback will have the following statistics per game in the Post Season:
    (2.25 TD, 0.62 INT, 65.8% Pass Completion, 7.1 Yds/Carry)
  • In the last 4 Seasons, the total number of Passing Yards have increased, moving away from 100 yard to 200 yard passing games.
  • During the Regular Season, Touchdown Passes have increased from 1 to 2 per game based on distribution.
  • There were significantly more 4 touchdown games in 2016 when compared to 2013-2015.
  • Touchdown Passes and Passing Yards seem to usually be higher in weeks 12-15.
  • Most Consistent Quarterback (2013-2016): Aaron Rodgers

Although Rodgers is the 5th most sacked quarterback since 2013, he still manages to be a statistically consistent quarterback. His regular season averages, nearly identical to his post season averages, come closest to the best value in each category among all qualifying quarterbacks (2.21 TD, 0.46 INT, 64.8% Pass Completion, 5.4 Yds/Carry).
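
One way to make "closest to the best measures" concrete is a relative gap per metric against the regular season benchmark. The closeness measure below is an assumption for illustration, not necessarily how the comparison was originally made. The numbers are the regular season figures reported above.

```r
# Regular season benchmark and Rodgers' averages, as reported in the findings
benchmark <- c(td = 2.24, int = 0.46, pct = 69.4, ypc = 5.4)
rodgers   <- c(td = 2.21, int = 0.46, pct = 64.8, ypc = 5.4)

# Relative gap per metric (assumed closeness measure, not the author's method)
rel_gap <- abs(rodgers - benchmark) / benchmark

# Gaps expressed in percent, rounded to one decimal place
print(round(100 * rel_gap, 1))
```

Under this measure, Rodgers matches the benchmark on interceptions and yards per carry, trails by roughly 1.3% on touchdowns, and trails by roughly 6.6% on completion percentage.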


Conclusions/Suggestions

For a coach or team manager, making sure a quarterback can average the above statistics can prove optimal for winning a Super Bowl. Aiming for at least 2 touchdowns, 60% completion, and 5 Yds/Carry in each game, while also minimizing interceptions, will optimize quarterback performance. Further analysis that would benefit this project includes yards per attempt, fumble and injury minimization, performance in home vs away games as well as indoor vs outdoor stadiums, and offensive vs defensive scheme analysis. This analysis assumed that all game logs were complete games and did not account for quarterbacks who started a game but left midway through with an injury.