Assignment 6: 2018 Advanced Defensive NFL Data

Introduction

This document looks back at 2018 NFL defensive player advanced statistics, scraped from Pro Football Reference. I examine trends among individual players and compare how they performed during that season.

Packages

Two packages are critical to my analysis, tidyverse and scales:

## -- Attaching packages ---------------------------------------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.0     v purrr   0.3.3
## v tibble  3.0.0     v dplyr   0.8.5
## v tidyr   1.0.0     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.4.0
## -- Conflicts ------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
## 
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
## 
##     discard
## The following object is masked from 'package:readr':
## 
##     col_factor

Importing the Dataset

## Parsed with column specification:
## cols(
##   .default = col_double(),
##   Player = col_character(),
##   Tm = col_character(),
##   Pos = col_character(),
##   `Cmp%` = col_character(),
##   `MTkl%` = col_character()
## )
## See spec(...) for full column specifications.

My Analysis

Question 1

Question 1: Who were the best players in pass coverage during the 2018 NFL season? I will use statistics such as passer rating allowed, ADOT (average depth of target), Cmp% (completion percentage), and touchdowns allowed, all taken from Pro Football Reference’s Advanced Statistics page, to evaluate how well a player performed in pass coverage.

Data Process: I will use the data scraped from Pro Football Reference and generate graphs with ggplot2 to support my analysis. Additionally, I will create formulas that put the statistics on a per-game basis, which is a better indicator of a defender’s success than season totals because it accounts for differences in games played.

Formulas Created:

Yards Allowed Per Game: Yards / Games Played

Air Yards Allowed Per Game: Air Yards / Games Played
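The two formulas can be sketched with dplyr as follows. This is a minimal illustration, not the code used for the report: the column names `G`, `Yds`, and `Air` are assumed to match the Pro Football Reference export, and the rows are made-up toy values.

```r
library(dplyr)

# Toy rows with made-up numbers, for illustration only (not real 2018 data).
# Assumed column names: G = games played, Yds = yards allowed,
# Air = air yards allowed.
coverage <- data.frame(
  Player = c("Player A", "Player B", "Player C"),
  Tm     = c("AAA", "BBB", "CCC"),
  G      = c(16, 8, 12),
  Yds    = c(640, 200, 540),
  Air    = c(320, 120, 300),
  stringsAsFactors = FALSE
)

per_game <- coverage %>%
  filter(G > 0) %>%                 # guard against division by zero
  mutate(
    yds_per_game     = Yds / G,     # Yards Allowed Per Game
    air_yds_per_game = Air / G      # Air Yards Allowed Per Game
  )

print(per_game)
```

For a team-level view, the same columns could be summed within `group_by(Tm)` before dividing.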

Analysis:

The Results: Looking at each team on a per-game basis, I noticed that Air Yards Allowed Per Game and Yards Allowed Per Game are typically correlated, with a few exceptions such as the San Francisco 49ers and the Seattle Seahawks. Note that air yards measure how far the ball travels in the air before it is caught, so a high total could indicate breakdowns in pass coverage if deep throws are being completed at a high rate.

Statistical/Analytical Method: I would go further by scraping play-by-play data with the nflscrapR package to see whether the correlation holds on a per-play basis as it does on a per-game basis. I would also run a t-test and look at the p-values to assess how strongly Air Yards Allowed Per Game contributes to Yards Allowed Per Game.
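One way the t-test idea could be carried out is with base R's `cor.test()`, which tests whether a Pearson correlation differs from zero using a t statistic. The sketch below uses made-up team-level values and hypothetical column names; it shows the mechanics only and says nothing about the real 2018 data.

```r
# Toy team-level values, made up for illustration only.
team_per_game <- data.frame(
  air_yds_per_game = c(95, 110, 102, 130, 88, 121, 115, 99),
  yds_per_game     = c(210, 245, 225, 270, 205, 250, 262, 218)
)

# Pearson correlation test: the null hypothesis is that the true
# correlation between the two per-game measures is zero.
result <- cor.test(team_per_game$air_yds_per_game,
                   team_per_game$yds_per_game)

result$estimate   # sample correlation coefficient
result$statistic  # t statistic
result$p.value    # p-value for the test
```

A small p-value here would only say the two measures move together; it would not by itself show that air yards cause the yardage totals.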

The Results: Similar to the comparison of Air Yards Allowed Per Game and Passing Yards Allowed Per Game, there is a fairly good indication that when a defense allows completions with a higher average depth of target, those completions are also likely to go for more yards. Without play-by-play data it is harder to determine an expectancy for each team, but teams such as the Giants, Panthers, Broncos, and Vikings all appear similar in the depth of target they allow and in the passes completed downfield against them.

Analytical Method: With further analysis, I would use an ANOVA test to examine whether a ball that travels farther in the air results in a longer completed pass than a shorter throw does. I would also run a regression analysis and then dive into individual players to see how each team fares on these defensive passing metrics.
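The regression step could look something like the sketch below, which fits a simple linear model of yards per completion on average depth of target with base R's `lm()`. The column names `DADOT` and `yds_per_cmp` are assumptions, and the values are made up for illustration.

```r
# Toy values, made up for illustration only.
# DADOT       = average depth of target allowed (assumed column name)
# yds_per_cmp = yards allowed per completion (hypothetical column name)
toy <- data.frame(
  DADOT       = c(7.5, 8.2, 9.0, 9.8, 10.5, 11.1, 12.0, 12.6),
  yds_per_cmp = c(9.8, 10.9, 11.2, 12.4, 12.9, 13.1, 14.6, 14.9)
)

# Simple linear regression: does deeper targeting go with longer completions?
fit <- lm(yds_per_cmp ~ DADOT, data = toy)

coef(fit)               # intercept and slope
summary(fit)$r.squared  # share of variance explained by DADOT
```

A positive slope would match the pattern described in the results, though with real data the fit would need residual checks before drawing conclusions.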

Question 2

Question 2: I will look at tackling efficiency to see whether more tackle opportunities increase or decrease the chances of a defensive player missing a tackle on the ball carrier. I am curious to see how this looks on a per-16-game basis using this data.

Data Process: The data scraped from Pro Football Reference for the 2018 season will help identify some of the best and worst tacklers based on the number of combined tackles a defender made during the 2018 NFL season. I will use combined tackles (Comb), missed tackle percentage (MTkl%), and the age of the defender, and I will also break the results down by position.

Data Wrangling: I will use the dplyr and ggplot2 packages for this part of the analysis. In addition, I have to convert MTkl% from a character column to a numeric one in order to use it in my ggplot2 graph.
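A minimal sketch of that conversion, together with the per-16-game scaling mentioned in the question, is shown here. It assumes the `MTkl%` values are stored as text with a trailing percent sign (which would explain why the column was parsed as character); the rows and the new column names `mtkl_pct` and `comb_per_16` are made up for illustration.

```r
library(dplyr)

# Toy rows with made-up numbers, for illustration only.
tackles <- data.frame(
  Player  = c("Player A", "Player B", "Player C"),
  Pos     = c("LB", "CB", "DE"),
  G       = c(16, 12, 8),
  Comb    = c(120, 45, 20),
  `MTkl%` = c("8.4%", "15.1%", "0.0%"),   # assumed text format
  check.names = FALSE,
  stringsAsFactors = FALSE
)

tackles_clean <- tackles %>%
  mutate(
    # Strip the literal "%" and convert the remainder to a number.
    mtkl_pct    = as.numeric(sub("%", "", `MTkl%`, fixed = TRUE)),
    # Scale combined tackles to a 16-game season.
    comb_per_16 = Comb / G * 16
  )

print(tackles_clean)
```

The numeric `mtkl_pct` column can then be mapped to an axis in ggplot2 without being treated as a discrete label.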

Analysis:

Results: One thing I noticed about this graph is that it is very sporadic across the three positions I filtered. Tackling in the NFL does not seem consistent, as this 2018 data shows. Positions such as DE (defensive end) do not get many tackle opportunities, which does not surprise me since teams have been running the ball less and throwing it more. As shown with LB (linebacker), a linebacker avoids missing a tackle more often than players at positions such as CB (cornerback), which is not known as a tackling position.

Statistical/Analytical Method: I would also use the nflscrapR package to dive deeper into this data on a play-by-play basis and see whether the same correlation holds. I could also run an ANOVA test to compare the variance in missed-tackle frequency between different players. Finally, I could run a regression analysis to find the relationship between one or more independent variables and tackling efficiency, and use it to forecast how efficient a defensive player is at tackling.
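As a rough sketch of the ANOVA idea, the example below compares missed tackle percentage across positions with base R's `aov()`. Grouping by position rather than by individual player is an assumption made here so that each group has several observations, and all values are made up.

```r
# Toy missed-tackle percentages, made up for illustration only.
toy <- data.frame(
  Pos      = rep(c("LB", "CB", "DE"), each = 5),
  mtkl_pct = c( 6.1,  7.4,  8.0,  5.5,  9.2,   # LB
               12.3, 15.0, 10.8, 14.1, 13.5,   # CB
                9.0,  4.2, 11.5,  7.7,  6.3)   # DE
)

# One-way ANOVA: do mean missed-tackle rates differ by position?
fit <- aov(mtkl_pct ~ Pos, data = toy)

anova_table <- summary(fit)[[1]]
anova_table

# p-value for the position effect (first row of the table)
p_value <- anova_table[["Pr(>F)"]][1]
p_value
```

A significant result would only indicate that at least one position differs; a follow-up such as `TukeyHSD(fit)` would be needed to see which ones.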

Results: Looking at different positions this time, I compared at least one position from each level of the defense. For DT (defensive tackle), I noticed that because of the lack of opportunities to tackle ball carriers, the fit was not as good. Defensive tackles do not miss the ball carrier often; one takeaway from this data is that players in the front seven miss fewer tackles than players on the back end of the secondary, such as the FS (free safety) and the SS (strong safety).

Statistical/Analytical Method: As with the previous graph, I plan to use the nflscrapR package to dive deeper into this data on a play-by-play basis and see whether the same correlation holds. I could also run an ANOVA test to compare the variance in missed-tackle frequency between different players. Finally, I could run a regression analysis to find the relationship between one or more independent variables and tackling efficiency, and use it to forecast how efficient a defensive player is at tackling.

Question 3

Question 3: Which defensive linemen were the best at pressuring the QB in 2018?

Data Process: I will use the data scraped from Pro Football Reference to find out who had the most pressures and was most productive at rushing the passer during the 2018 NFL season. I will filter by position, specifically the defensive line, using dplyr and graph the results with the ggplot2 package.

Data Wrangling: This analysis does not need much data wrangling, but I will use both the dplyr and ggplot2 packages.
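A minimal sketch of the filter-and-rank step is given here. The five rows are copied from the pressure table in this section, but the exact filter used for the report is not shown, so keeping only the `"DE"` and `"DT"` position labels is an assumption.

```r
library(dplyr)

# Five rows taken from the 2018 pressure table in this section,
# deliberately listed out of order.
pressure <- data.frame(
  Player = c("Fletcher Cox*+", "Khalil Mack*+", "Aaron Donald*+",
             "Dee Ford*", "J.J. Watt*+"),
  Pos    = c("DT", "LB", "DT", "LOLB/rolb", "DE"),
  Tm     = c("PHI", "CHI", "LAR", "KAN", "HOU"),
  Prss   = c(45, 47, 70, 54, 60),
  Sk     = c(10.5, 12.5, 20.5, 13.0, 16.0),
  stringsAsFactors = FALSE
)

# Keep defensive linemen only (assumed position set), most pressures first.
d_line <- pressure %>%
  filter(Pos %in% c("DE", "DT")) %>%
  arrange(desc(Prss))

print(d_line)
```

Hybrid labels such as `"de/LB"` would be dropped by this exact-match filter; a pattern match on the position string would be one way to include them.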

Analysis:

Player              Prss  Pos        Tm   Bltz  Hrry    Sk
Aaron Donald*+        70  DT         LAR     0    30  20.5
J.J. Watt*+           60  DE         HOU     0    32  16.0
Dee Ford*             54  LOLB/rolb  KAN    50    23  13.0
Yannick Ngakoue       51  DE         JAX     8    21   9.5
Chris Jones           49  DE         KAN     0    19  15.5
Jadeveon Clowney*     48  de/LB      HOU    20    27   9.0
Myles Garrett*        48  DE         CLE    19    19  13.5
Frank Clark           48  DE/lolb    SEA     2    22  13.0
Khalil Mack*+         47  LB         CHI    19    26  12.5
Michael Bennett       46  DE         PHI     0    18   9.0
Fletcher Cox*+        45  DT         PHI     0    15  10.5
Cameron Jordan*       43  DE         NOR     2    24  12.0
DeMarcus Lawrence*    42  DE         DAL     0    20  10.5
Chandler Jones        39  DE         ARI     6    19  13.0
Jerry Hughes          39  DE         BUF     3    21   7.0
Danielle Hunter*      38  DE         MIN     0    20  14.5
T.J. Watt*            38  de/LOLB    PIT    46    18  13.0
Carlos Dunlap         38  DE         CIN     1    18   8.0
Terrell Suggs         37  dl/dt/LB   BAL    86    24   7.0
Bradley Chubb         36  LB         DEN    66    15  12.0

Results: Defensive linemen produced the majority of the top pressure totals during the 2018 season. Once again, consistent with his Pro Football Focus grades, Aaron Donald was a one-man wrecking crew for the Los Angeles Rams, piling up 70 pressures and 20.5 sacks during the 2018 campaign. I also notice that the Kansas City Chiefs, Philadelphia Eagles, and Houston Texans each had two players ranked in the top 15 in pressures for the 2018 season, which gives those teams an edge when trying to force the QB into a poor decision.
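The team observation can be checked by counting teams among the 15 highest pressure totals. The sketch below re-enters the top 15 rows of the table by hand and uses dplyr's `count()`; it is an illustration of the check, not the code behind the report.

```r
library(dplyr)

# Top 15 players by pressures, copied from the table in this section.
top15 <- data.frame(
  Player = c("Aaron Donald", "J.J. Watt", "Dee Ford", "Yannick Ngakoue",
             "Chris Jones", "Jadeveon Clowney", "Myles Garrett",
             "Frank Clark", "Khalil Mack", "Michael Bennett",
             "Fletcher Cox", "Cameron Jordan", "DeMarcus Lawrence",
             "Chandler Jones", "Jerry Hughes"),
  Tm     = c("LAR", "HOU", "KAN", "JAX", "KAN", "HOU", "CLE", "SEA",
             "CHI", "PHI", "PHI", "NOR", "DAL", "ARI", "BUF"),
  Prss   = c(70, 60, 54, 51, 49, 48, 48, 48, 47, 46, 45, 43, 42, 39, 39),
  stringsAsFactors = FALSE
)

# Count players per team and keep teams that appear more than once.
teams_with_two <- top15 %>%
  count(Tm, sort = TRUE) %>%
  filter(n >= 2)

print(teams_with_two)
```

With these rows, KAN, HOU, and PHI each appear twice and every other team appears once.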

Statistical Method/Analysis: I would dive deeper into the pressure data using the nflscrapR package and then perform a regression analysis. I would also fit a nonlinear model, possibly using a piecewise function, to describe the different ways in which a defensive player can be effective at rushing the passer.