R Bridge Week 4 Assignment

Logan Thomson

Loading ggplot2

Since we are building our plots with ggplot2 rather than base R graphics, we need to load the ggplot2 package so that its plotting functions (ggplot(), qplot(), and the geom_ layers) are available when the R Markdown file is knit.

library(ggplot2)

Loading the Data

Now we can load the data we want to plot. Here, I found a .csv file of every play from every game held during the 2014 National Football League (NFL) season. The data can be found at http://nflsavant.com/about.php

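# stringsAsFactors = TRUE keeps text columns as factors (the default before R 4.0.0), so summary() reports counts
nflplays <- read.csv('https://raw.githubusercontent.com/Logan213/MSDA_R_Bridge_Wk4/master/pbp-2014.csv', stringsAsFactors = TRUE)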

Calling head() shows the first six rows of the data, and the single square brackets in nflplays[1:7] restrict the output to the first seven columns of the nflplays data frame, which holds the contents of the .csv file.

head(nflplays[1:7])
##       GameId   GameDate Quarter Minute Second OffenseTeam DefenseTeam
## 1 2014090709 2014-09-07       2      1     57                     MIN
## 2 2014090711 2014-09-07       4      0      0                     CAR
## 3 2014090702 2014-09-07       4      4      2         BUF         CHI
## 4 2014090703 2014-09-07       2      0      4                     WAS
## 5 2014091100 2014-09-11       4      0     31         BAL         PIT
## 6 2014091100 2014-09-11       4      0      0                     PIT
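With a single index, square brackets treat a data frame as a list of columns, so nflplays[1:7] and nflplays[, 1:7] select the same seven columns, and head() returns six rows unless n is given. A minimal sketch on a hypothetical toy data frame (the values are made up for illustration, not taken from the NFL file):

```r
# Hypothetical toy data frame standing in for nflplays (values are illustrative only)
plays <- data.frame(
  GameId  = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L),
  Quarter = c(1L, 2L, 1L, 4L, 2L, 3L, 1L, 4L),
  Yards   = c(0L, 5L, -2L, 12L, 0L, 3L, 0L, 40L),
  Down    = c(1L, 2L, 1L, 3L, 1L, 2L, 1L, 1L)
)

# List-style indexing: a single index selects columns
first_two_a <- plays[1:2]
# Matrix-style indexing: empty row index, columns 1 and 2
first_two_b <- plays[, 1:2]

# head() returns the first 6 rows by default; n changes that
preview    <- head(plays[1:2])
preview_n3 <- head(plays[1:2], n = 3)
print(preview)
```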

The summary() function gives a summary of the data in each column. Because the file has so many columns, the following preview covers only columns 20 through 25, which contain information such as Yards (gained or lost on the play), Formation, and PlayType.

summary(nflplays[20:25])
##      Yards                     Formation         PlayType    
##  Min.   :-20.000                    :  692   PASS    :18881  
##  1st Qu.:  0.000   FIELD GOAL       : 1006   RUSH    :12765  
##  Median :  0.000   NO HUDDLE        :  873   KICK OFF: 2635  
##  Mean   :  4.212   NO HUDDLE SHOTGUN: 3518   PUNT    : 2511  
##  3rd Qu.:  6.000   PUNT             : 2397   TIMEOUT : 1715  
##  Max.   :102.000   SHOTGUN          :17111           : 1462  
##                    UNDER CENTER     :20098   (Other) : 5726  
##      IsRush           IsPass        IsIncomplete   
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.0000   Median :0.0000   Median :0.0000  
##  Mean   :0.2951   Mean   :0.4132   Mean   :0.1504  
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:0.0000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
## 
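summary() reports a count per category, as in the Formation and PlayType columns above, only when a column is a factor. Since R 4.0.0, read.csv() defaults to stringsAsFactors = FALSE, so text columns come back as character vectors and summary() then shows only their length, class, and mode. A minimal sketch using hypothetical formation labels:

```r
# Hypothetical formation labels (illustrative, not from the NFL file)
formation_chr <- c("SHOTGUN", "UNDER CENTER", "SHOTGUN", "PUNT", "UNDER CENTER", "SHOTGUN")

# Character vector: summary() gives only Length / Class / Mode
chr_summary <- summary(formation_chr)

# Factor: summary() gives a count for each level (levels sorted alphabetically)
formation_fct <- factor(formation_chr)
fct_summary <- summary(formation_fct)
print(chr_summary)
print(fct_summary)
```

On the real data, converting a single column with factor(), for example nflplays$Formation <- factor(nflplays$Formation), has the same effect for that column.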

Plotting the Data

Boxplots

First, I created a series of boxplots showing the yards gained or lost on plays run from each formation. The formations are general categories only, so specific groupings such as two-tight-end sets are not broken out. A large share of plays record zero yards (both the first quartile and the median of Yards are 0 in the summary above), and more plays are run from the quarterback “Under Center” formation than from any other, which is why the median (the thick line inside the box) is essentially zero for that boxplot.

# One box per formation; the stray size = 10 argument was dropped because ggplot() does not use it
ggplot(nflplays, aes(x = Formation, y = Yards)) + geom_boxplot()
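The line across each box is the median, not the mean, and the two can differ a lot when a few long gains pull the mean upward. A minimal sketch that computes both per formation with tapply(), using hypothetical values in place of the real Yards and Formation columns:

```r
# Hypothetical yards and formations (illustrative values only)
yards     <- c(0, 0, 3, 0, 7, 12, 0, -1, 0, 0)
formation <- c(rep("UNDER CENTER", 4), rep("SHOTGUN", 4), rep("PUNT", 2))

# The center line of each box is the median; the mean is shown for comparison
medians <- tapply(yards, formation, median)
means   <- tapply(yards, formation, mean)
print(rbind(median = medians, mean = means))
```

On the full data set the same call would be tapply(nflplays$Yards, nflplays$Formation, median).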

Histogram

Following is a histogram of the yards gained on every NFL play. The very large number of plays resulting in no gain produces a tall spike at zero that dominates the plot and compresses the rest of the distribution.

qplot(Yards, data=nflplays, binwidth = 1, main='Distribution of Yards Gained')
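The size of that spike can also be measured numerically. A minimal sketch, using a hypothetical vector of yardage values in place of nflplays$Yards: mean(yards == 0) gives the share of zero-yard plays, and table() gives the per-yard counts that a histogram with binwidth = 1 should draw as bars when yardage is recorded in whole yards.

```r
# Hypothetical yards-gained values (illustrative only)
yards <- c(0, 0, 0, 0, 0, 3, -2, 7, 0, 15, 4, 0)

# Share of plays recorded as zero yards (the spike at 0 in the histogram)
zero_share <- mean(yards == 0)

# Count of plays at each whole-yard value
counts <- table(yards)
print(zero_share)
print(counts)
```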

We can use the subset() function to create a new data frame that excludes every play with zero yards gained or lost:

nonzeroyds <- subset(nflplays, Yards !=0)
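subset() differs from plain logical indexing in how it handles missing values: it drops rows where the condition is NA, while nflplays[nflplays$Yards != 0, ] would keep an all-NA row for each one. The summary above reports no missing Yards, so this does not change the result here, but it matters on other data. A minimal sketch on a hypothetical data frame with one missing value:

```r
# Hypothetical plays with one missing Yards value (illustrative only)
plays <- data.frame(PlayType = c("RUSH", "PASS", "PASS", "PUNT"),
                    Yards    = c(4, 0, NA, 38))

# subset() treats an NA condition as FALSE and drops that row
via_subset <- subset(plays, Yards != 0)

# Plain logical indexing returns an all-NA row for the NA comparison
via_brackets <- plays[plays$Yards != 0, ]

# which() keeps only TRUE positions, matching subset()
via_which <- plays[which(plays$Yards != 0), ]
print(via_brackets)
```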

We then plot that subset with a bin width of 1 yard so that each yardage value gets its own bin. With the zero-yard plays removed, the shape of the rest of the distribution is much easier to see.

qplot(Yards, data=nonzeroyds, binwidth = 1)
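In ggplot2's histogram layer, binwidth sets the width of each bin, while bins sets how many bins there are, so bins = 1 would put every play into a single bar. As a base-R cross-check of what a 1-yard bin width does, hist() with plot = FALSE and break points at every half yard puts each whole-yard value in its own bin. A minimal sketch with hypothetical non-zero yardage values:

```r
# Hypothetical non-zero yardage values (illustrative only)
yards <- c(-3, 1, 1, 2, 5, 5, 5, 9, 22)

# Break points at half yards, so every whole-yard value falls in its own 1-yard bin
breaks <- seq(min(yards) - 0.5, max(yards) + 0.5, by = 1)
h <- hist(yards, breaks = breaks, plot = FALSE)

# Keep the non-empty bins, labelled by their whole-yard midpoint
nonempty <- setNames(h$counts[h$counts > 0], h$mids[h$counts > 0])
print(nonempty)
```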

Scatterplot

Lastly is a scatterplot of the “absolute” YardLine (0 to 100 yards on the field, rather than the fixed yard line of 0 to 50 designated as “Own” or “Opponent’s”) against the yards gained on each play. Because there are so many points, many of them overlap, so this may not be the best way to present the data. However, the plot is interesting to look at for a few reasons.

There is only a finite number of yards an offense can gain on a play: from a given yard line it can gain at most 100 - YardLine yards before reaching the end zone. Every offensive touchdown from that yard marker is therefore plotted at Yards = 100 - YardLine, so all scoring plays by the team in possession of the ball fall on a straight diagonal line. Any points above this line are turnovers resulting in a score (a fumble or interception returned for a touchdown).

qplot(YardLine, Yards, data=nflplays, color = YardLine, main='Field Location and Length of Play')
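The touchdown line described above can also be checked row by row. A minimal sketch, assuming (as the text does) that YardLine counts up toward the opponent's goal line so the maximum offensive gain is 100 - YardLine; the data frame and the MaxGain, OnLine, and AboveLine columns are hypothetical, added only for illustration:

```r
# Hypothetical plays: absolute yard line (0-100) and yards gained (illustrative only)
plays <- data.frame(YardLine = c(20, 75, 90, 40, 60),
                    Yards    = c(15, 25, 10, 80, 5))

# Hypothetical helper column: distance from the line of scrimmage to the goal line
plays$MaxGain <- 100 - plays$YardLine

# On the line: the gain equals the remaining distance (an offensive touchdown)
plays$OnLine <- plays$Yards == plays$MaxGain
# Above the line: more yards than an offensive play could gain from that spot
plays$AboveLine <- plays$Yards > plays$MaxGain
print(plays)
```

To draw the same boundary on the scatterplot, a reference line can be added to the plot with + geom_abline(intercept = 100, slope = -1).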