The National Football League (NFL) is one of the most popular sports leagues in modern-day America and, thanks to its dedicated fans, generates billions of dollars in revenue every year. For each of its 32 franchises, the yearly goal of winning the Super Bowl depends heavily on the efficiency and consistency of the quarterback. Although teams with top-ranked defenses have been known to win championships, quarterback performance still matters when a team has to beat the best of the best. Unsurprisingly, quarterback is one of the highest-paid positions on the team. So how can a franchise manager or coach evaluate a quarterback's consistency in order to win a Super Bowl?
Like many other sports fans, I have found myself on both ends: happy with a team's performance, or flat-out frustrated that it did not do as well as expected. I am a big fan of football, specifically the NFL. I am interested in exploring how to quantify quarterback performance and ease the frustration fans feel when their team's quarterback does not perform as expected.
The data used in this project was retrieved from Kaggle but originally comes from the official NFL website. It consists of 40,247 observations of 29 variables. Because the data set covers quarterbacks from 1970 to 2016, it shrinks considerably once it is subset to current starting quarterbacks who have played for at least 3 years. The type conversion code below treats 9 of the columns as character variables and the remaining 20 as numeric.
This project explores how consistent the current starting quarterbacks in the NFL are. It draws on the number of starts, touchdown passes, completion percentage, performance in home versus away games, and other factors.
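One way to make "consistency" measurable is the coefficient of variation (CV) of a per-game statistic such as passer rating. The CV is the standard deviation divided by the mean, so a lower value means a steadier player. The sketch below shows one way the idea could be applied. It is an assumption, not necessarily the method this project ends up using. The two quarterbacks and their ratings are made up, the `Passer Rating` column name is assumed to match the data set, and the code uses base R so it runs without extra packages.

```r
# Made-up per-game passer ratings for two hypothetical quarterbacks
games <- data.frame(
  Name = rep(c("QB A", "QB B"), each = 4),
  `Passer Rating` = c(95, 100, 90, 105, 60, 140, 75, 125),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Coefficient of variation: sample standard deviation divided by the mean
cv <- function(x) sd(x) / mean(x)

# One CV per quarterback; a lower value means more game-to-game consistency
cv_by_qb <- sapply(split(games[["Passer Rating"]], games$Name), cv)
sort(cv_by_qb)
```

The two players have similar averages (97.5 and 100), but QB A's ratings vary far less, so QB A has the smaller CV. On the project data, the same split-and-apply step could run on a game log grouped by `Name`.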
library(tidyverse) # ggplot2, dplyr, readr
library(DT) # Interactive data tables
I use the tidyverse and DT packages in R for the analysis and for presenting the results.
setwd("~/BAN 6003 Intro to R/Project")
qb.orig<-read_csv("Game_Logs_Quarterback.csv")
char=c(1:3,5,7:11) # Character variable column numbers
num=c(4,6,12:29) # Numeric variable column numbers
qb.orig[char] <- lapply(qb.orig[char], as.character) # Convert to character
qb.orig[num] <- lapply(qb.orig[num], as.numeric) # Convert to numeric (non-numeric entries become NA)
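A quick check can confirm that the two index vectors used in the type conversion cover all 29 columns exactly once. It also shows how many columns each type receives. The check needs nothing beyond the two vectors themselves.

```r
# Column positions copied from the type conversion step
char <- c(1:3, 5, 7:11)  # character columns
num  <- c(4, 6, 12:29)   # numeric columns

all_cols <- c(char, num)

# Every column 1..29 appears once: no gaps, no overlaps
stopifnot(
  length(all_cols) == 29,
  anyDuplicated(all_cols) == 0,
  setequal(all_cols, 1:29)
)

# Number of columns assigned to each type: 9 character, 20 numeric
type_counts <- c(character = length(char), numeric = length(num))
type_counts
```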
Unfortunately, the data set has no “Current Player” or “Starter” variable. The original data includes every quarterback who has played since 1970. To subset it, I created a vector containing the names of the 32 current starting quarterbacks in the NFL.
starting_qbs<-c('Brady, Tom','Roethlisberger, Ben',
'Wilson, Russell','Brees, Drew','Newton, Cam','Rodgers, Aaron',
'Carr, Derek','Ryan, Matt','Bradford, Sam','Wentz, Carson',
'Rivers, Philip','Luck, Andrew','Palmer, Carson',
'Stafford, Matthew','Prescott, Dak','Dalton, Andy',
'Winston, Jameis','Smith, Alex',
'Flacco, Joe','Taylor, Tyrod','Tannehill, Ryan','Manning, Eli',
'Bortles, Blake','Mariota, Marcus','Cousins, Kirk',
'Fitzpatrick, Ryan','Siemian, Trevor','Hoyer, Brian',
'Osweiler, Brock','Kessler, Cody','Keenum, Case',
'Gabbert, Blaine')
length(starting_qbs) #32 Starting Quarterbacks in the NFL
## [1] 32
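Because `%in%` silently drops any name that does not match the data exactly, a misspelled entry in the list of starters would remove that quarterback without raising an error. A minimal check with `setdiff()` catches this. The names below are a made-up stand-in. On the project data, the equivalent call would be `setdiff(starting_qbs, unique(qb.orig$Name))`, and an empty result would mean every name matched.

```r
# Made-up stand-ins: names present in the data, and a wanted list with a typo
names_in_data <- c("Brady, Tom", "Brees, Drew", "Wilson, Russell")
wanted        <- c("Brady, Tom", "Brees, Drew", "Wilson, Russel")

# Wanted names with no exact match; these rows would vanish under %in%
unmatched <- setdiff(wanted, names_in_data)
unmatched
```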
I then kept only observations for the 32 starting quarterbacks, games from the last 4 seasons (2013 onward), and games from both the regular season and the postseason.
dat=qb.orig %>%
filter(`Name` %in% starting_qbs)%>%
filter(`Season` %in% c('Regular Season','Postseason'))%>%
filter(`Year`>=2013)
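The three filtering conditions can also be combined into a single filter. The base R sketch below uses `subset()` on a made-up four-row table. The player "Doe, John" and the "Preseason" label are assumptions for illustration only. The sketch derives the 2013 cutoff from the last season in the data, 2016.

```r
# Four-season window ending with the last season in the data
last_season  <- 2016
n_seasons    <- 4
first_season <- last_season - n_seasons + 1  # 2013

# Made-up game log rows
games <- data.frame(
  Name   = c("Brady, Tom", "Brady, Tom", "Brady, Tom", "Doe, John"),
  Season = c("Regular Season", "Preseason", "Postseason", "Regular Season"),
  Year   = c(2014, 2015, 2012, 2016),
  stringsAsFactors = FALSE
)
starters <- c("Brady, Tom")

# All three conditions at once: starter, regular or postseason, 2013 or later
kept <- subset(games, Name %in% starters &
                 Season %in% c("Regular Season", "Postseason") &
                 Year >= first_season)
kept
```

Only the 2014 regular-season row survives. The preseason game, the 2012 playoff game, and the non-starter are all dropped.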
I will compare my final results across two subsets of the data: regular-season games and postseason games.
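To compare the two subsets side by side, the same summary can be computed on each subset and the results stacked. The sketch below is a hypothetical illustration in base R. It uses made-up touchdown counts for one quarterback and assumes `TD Passes` matches the data set's column name.

```r
# Made-up game logs for one hypothetical quarterback
regular <- data.frame(Name = "QB A", `TD Passes` = c(2, 3, 1, 2), check.names = FALSE)
post    <- data.frame(Name = "QB A", `TD Passes` = c(1, 0), check.names = FALSE)

# Stack both subsets with a label saying where each game came from
both <- rbind(
  cbind(regular, Subset = "Regular Season"),
  cbind(post,    Subset = "Postseason")
)

# Mean TD passes per game: one row per quarterback, one column per subset
td_means <- tapply(both[["TD Passes"]], list(both$Name, both$Subset), mean)
td_means
```

Keeping both subsets in one labeled table means any summary (mean, CV, completion percentage) is computed the same way for both. The comparison then stays like-for-like.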
The regular-season data is reduced to starters who have started at least 85% of NFL games in the last 4 seasons, or 55 games. To be a consistent quarterback, a player must first be available to start and play. The threshold leaves some room for unexpected injuries, but availability to start a game is a necessity. I also replaced NA values with 0 for Rushing Attempts, Rushing Yards, Yards Per Carry, Rushing TDs, Fumbles, and Fumbles Lost.
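dat.regseason=dat %>%
filter(`Season`=='Regular Season')%>%
filter(`Games Started`==1)%>%
group_by(`Player Id`) %>%
filter(sum(`Games Started`) >= 55) %>% # Started at least 55 regular-season games since 2013
ungroup()
rush.cols=c('Rushing Attempts','Rushing Yards','Yards Per Carry','Rushing TDs','Fumbles','Fumbles Lost')
dat.regseason=dat.regseason %>%
mutate(across(all_of(rush.cols), ~ replace_na(.x, 0))) # Replace NA with 0 only in these columns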
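The 55-game threshold follows from the 16-game regular season in effect from 2013 to 2016. Four seasons give 64 games, and 85% of 64 is 54.4, which rounds up to 55. The same arithmetic as a small R check:

```r
games_per_season <- 16   # regular-season games per team, 2013-2016
n_seasons        <- 4    # 2013, 2014, 2015, 2016
share            <- 0.85 # required share of games started

total_games <- games_per_season * n_seasons  # 64
min_starts  <- ceiling(share * total_games)  # 54.4 rounded up to 55
min_starts
```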
Here are all quarterbacks that meet the criterion of being a “consistent” regular-season starter.
unique(dat.regseason$`Name`)
## [1] "Brees, Drew" "Rivers, Philip" "Wilson, Russell"
## [4] "Smith, Alex" "Stafford, Matthew" "Brady, Tom"
## [7] "Ryan, Matt" "Roethlisberger, Ben" "Manning, Eli"
## [10] "Tannehill, Ryan" "Newton, Cam" "Dalton, Andy"
## [13] "Flacco, Joe" "Rodgers, Aaron"
The postseason subset keeps only quarterbacks who have started at least 5 postseason games since 2013. Below are the names of the qualifying quarterbacks and the final postseason game log data set.
dat.postseason=dat %>%
filter(`Season`=='Postseason')%>%
group_by(`Player Id`) %>%
filter(sum(`Games Started`) >= 5) %>% # Started at least 5 postseason games in the last 4 years
ungroup()
unique(dat.postseason$`Name`)
## [1] "Wilson, Russell" "Brady, Tom" "Roethlisberger, Ben"
## [4] "Newton, Cam" "Luck, Andrew" "Rodgers, Aaron"
datatable(dat.postseason,caption='Post Season Data')
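The grouped filter keeps every game of a player whose postseason starts add up to at least 5, not just some of that player's rows. The base R sketch below reproduces this behavior with `ave()` on a made-up log. Player "a" has 5 starts and player "b" has 2. The player IDs are hypothetical.

```r
# Made-up postseason game log
post <- data.frame(
  `Player Id`     = c("a", "a", "a", "a", "a", "b", "b"),
  `Games Started` = c(1, 1, 1, 1, 1, 1, 1),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Total starts per player, repeated on every row belonging to that player
starts <- ave(post[["Games Started"]], post[["Player Id"]], FUN = sum)

# Keep all rows of players with at least 5 starts (like group_by() + filter())
qualified <- post[starts >= 5, ]
unique(qualified[["Player Id"]])
```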
To be continued…