Problem Statement

The National Football League is one of the most popular sports leagues in the United States and, thanks to its dedicated fans, generates billions of dollars in revenue each year. The league consists of 32 franchises, and each team's yearly goal of winning the Super Bowl relies heavily on the efficiency and consistency of its quarterback. Although teams with top-ranked defenses have been known to win championships, quarterback performance is still essential to beating the best teams, and quarterback is unsurprisingly one of the highest-paid positions on a roster. So how can a franchise manager or coach evaluate the consistency of a quarterback when building a team to win a Super Bowl?

Credit: USA Today

Synopsis/Data

Introduction

Like many other sports fans, I have found myself on both ends: happy with a team’s performance, or flat-out frustrated that it did not do as well as expected. I am a big fan of football, specifically the NFL, and I am interested in exploring how to quantify performance and ease the frustration fans feel when their team’s quarterback does not perform as expected.

Original Data

The data used in this project were retrieved from Kaggle and derived from the official NFL website. The data set consists of 29 variables and 40,247 observations: 8 character variables and 21 numeric variables. Because it holds records for quarterbacks from 1970 to 2016, the data will shrink considerably after subsetting to current starting quarterbacks who have played for at least 3 years.

This project explores how consistent the current starting quarterbacks in the NFL are, based on the number of starts, touchdown passes, completion percentage, performance in away versus home games, and other factors.

Data Dictionary

Packages For Analysis

Tidyverse
library(tidyverse) #ggplot, dplyr, readr
library(DT) #Data Presenting 

I will be utilizing the tidyverse and DT packages in R for my analysis and findings.

Data Cleaning

Importing

setwd("~/BAN 6003 Intro to R/Project")
qb.orig<-read_csv("Game_Logs_Quarterback.csv")


char=c(1:3,5,7:11) #Character variable column numbers
num=c(4,6,12:29)   #Numeric variable column numbers


qb.orig[,char]<-sapply(qb.orig[, char], as.character) #Converting to character
qb.orig[,num]<- sapply(qb.orig[, num], as.numeric) #Converting to numeric
  • I created two index vectors, “char” and “num”, that hold the column numbers of the character and numeric variables in my data. I then used the sapply function to convert those columns to character and numeric types.
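
As a small illustration of this conversion step, the sketch below applies the same idea to a made-up data frame. It swaps in `lapply()` instead of `sapply()`, since `lapply()` returns one list element per column and so keeps each column's type separate. The data frame `toy` and its column positions are hypothetical and are not the quarterback data.

```r
# Hypothetical toy data; every column starts out as character.
toy <- data.frame(
  name  = c("A", "B"),
  year  = c("2015", "2016"),
  yards = c("250", "310"),
  stringsAsFactors = FALSE
)

char_cols <- 1      # positions of columns that should stay character
num_cols  <- 2:3    # positions of columns that should become numeric

# lapply() returns a list with one element per column, so each column
# keeps its own type when assigned back into the data frame.
toy[char_cols] <- lapply(toy[char_cols], as.character)
toy[num_cols]  <- lapply(toy[num_cols], as.numeric)

col_types <- sapply(toy, class)  # named character vector of column classes
print(col_types)
```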

Cleaning and Filtering

Unfortunately, there was no “Current Player” or “Starter” variable in my data set. Because the original data includes every quarterback who has played since 1970, I had to subset it, so I created a vector with the names of the 32 current starting quarterbacks in the NFL.

starting_qbs<-c('Brady, Tom','Roethlisberger, Ben',
'Wilson, Russell','Brees, Drew','Newton, Cam','Rodgers, Aaron',
'Carr, Derek','Ryan, Matt','Bradford, Sam','Wentz, Carson',
'Rivers, Philip','Luck, Andrew','Palmer, Carson',
'Stafford, Matthew','Prescott, Dak','Dalton, Andy',
'Winston, Jameis','Smith, Alex',
'Flacco, Joe','Taylor, Tyrod','Tannehill, Ryan','Manning, Eli',
'Bortles, Blake','Mariota, Marcus','Cousins, Kirk',
'Fitzpatrick, Ryan','Siemian, Trevor','Hoyer, Brian',
'Osweiler, Brock','Kessler, Cody','Keenum, Case',
'Gabbert, Blaine')
length(starting_qbs) #32 Starting Quarterbacks in the NFL
## [1] 32

I then kept only the observations for the 32 starting quarterbacks, from regular season and post season games, played within the last 4 years (2013 or later).

dat=qb.orig %>%
  filter(`Name` %in% starting_qbs)%>%
  filter(`Season` %in% c('Regular Season','Postseason'))%>%
  filter(`Year`>=2013)
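
Because this filter matches on exact strings, a misspelled name in the list would silently drop that player. One way to check for this, sketched here on made-up names, is to list any requested names that have no match. The vectors `wanted` and `available` are hypothetical stand-ins for `starting_qbs` and the `Name` column.

```r
# Hypothetical stand-ins for starting_qbs and the Name column of the data.
wanted    <- c("Brady, Tom", "Brees, Drew", "Rogers, Aaron")  # last one misspelled on purpose
available <- c("Brady, Tom", "Brees, Drew", "Rodgers, Aaron", "Manning, Eli")

# Names requested but absent from the data; an empty result means all matched.
unmatched <- setdiff(wanted, available)
print(unmatched)
```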

Regular Season and Post Season Data Sets

I will base my final comparisons on two subsets of the data: regular season games and post season games.

Regular Season

The regular season data will be reduced to starters who have started at least 85% of their team's regular season games over the last 4 seasons, or 55 games. To be a consistent quarterback, a player must first be available to start and play. Although I left room for unexpected injuries, availability to start a game is a necessity. I also replaced the NA values with 0 for Rushing Attempts, Rushing Yards, Yards Per Carry, Rushing TDs, Fumbles, and Fumbles Lost.
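
The 55-game cutoff can be reproduced with a little arithmetic, assuming a 16-game regular season in each of the four seasons (2013 through 2016). The variable names below are only illustrative.

```r
seasons          <- 4     # 2013 through 2016
games_per_season <- 16    # assumed regular season length in those years
share_required   <- 0.85  # share of games a quarterback must start

total_games <- seasons * games_per_season             # 4 * 16 = 64
min_starts  <- ceiling(share_required * total_games)  # 54.4 rounds up to 55
print(min_starts)
```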

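dat.regseason=dat %>%
  filter(`Season`=='Regular Season')%>%
  filter(`Games Started`==1)%>%        #Keep only games the quarterback started
  group_by(`Player Id`) %>%
  filter(sum(`Games Started`) >= 55)   #Started at least 55 regular season games in the last 4 years

dat.regseason[is.na(dat.regseason)] <- 0 #Replace NA values with 0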
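
The assignment above replaces every NA in the data frame. If only the rushing and fumble columns should be zero-filled, one option is to restrict the replacement to those columns. This is a minimal sketch on made-up data: the data frame `toy`, its values, and the `Passer Rating` column are hypothetical.

```r
# Hypothetical game log with missing values in two kinds of columns.
toy <- data.frame(
  check.names = FALSE,
  `Rushing Yards` = c(12, NA, 3),
  `Fumbles`       = c(NA, 1, NA),
  `Passer Rating` = c(98.5, NA, 101.2)
)

zero_fill <- c("Rushing Yards", "Fumbles")  # columns where NA is treated as zero

# Replace NA with 0 only in the chosen columns; other columns keep their NAs.
toy[zero_fill] <- lapply(toy[zero_fill], function(x) replace(x, is.na(x), 0))
print(toy)
```

Leaving the NA in the other column keeps a truly missing value from being mistaken for a real zero in later averages.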

Here are all the quarterbacks that meet the criteria for a “consistent” regular season starter.

##  [1] "Brees, Drew"         "Rivers, Philip"      "Wilson, Russell"    
##  [4] "Smith, Alex"         "Stafford, Matthew"   "Brady, Tom"         
##  [7] "Ryan, Matt"          "Roethlisberger, Ben" "Manning, Eli"       
## [10] "Tannehill, Ryan"     "Newton, Cam"         "Dalton, Andy"       
## [13] "Flacco, Joe"         "Rodgers, Aaron"

Post Season

The post season data is a subset of the filtered data that keeps only quarterbacks who have started at least 5 post season games since 2013. Below are the names of the qualifying quarterbacks, followed by the final post season game log data set.

dat.postseason=dat %>%
  filter(`Season`=='Postseason')%>%
  group_by(`Player Id`) %>%
  filter(sum(`Games Started`) >= 5)  #Started at least 5 post season games in the last 4 years
 
unique(dat.postseason$`Name`)
## [1] "Wilson, Russell"     "Brady, Tom"          "Roethlisberger, Ben"
## [4] "Newton, Cam"         "Luck, Andrew"        "Rodgers, Aaron"
datatable(dat.postseason,caption='Post Season Data')
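
To make the grouped filter concrete: calling `filter()` after `group_by()` evaluates the condition once per player and keeps or drops all of that player's rows together. The sketch below uses a made-up log and a cutoff of 2 starts. The data and the cutoff are hypothetical, and it assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical log: player 1 has 2 starts and 1 relief appearance,
# player 2 has a single start.
toy <- tibble(
  `Player Id`     = c(1, 1, 1, 2),
  `Games Started` = c(1, 1, 0, 1)
)

kept <- toy %>%
  group_by(`Player Id`) %>%
  filter(sum(`Games Started`) >= 2) %>%  # condition is computed per player
  ungroup()

print(kept)
```

The non-start row for player 1 is kept as well, because the condition is on the player's total number of starts and not on each individual game.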

Exploratory Analysis

Regular Season

Post Season

Summary

To be continued…