In this lab we’ll use decision trees to predict the outcomes of NFL plays. We’ll be using the 2015 NFL play-by-play data that we used earlier in the year when looking at NFL penalties.
Let’s load the data and take a look.
nfl <- read.csv('/home/rstudioshared/shared_files/data/NFLPlaybyPlay2015.csv')
View(nfl)
We’ll also need dplyr and the decision tree packages rpart and rpart.plot.
library(dplyr)       # filter(), mutate(), %>%
library(rpart)       # fits the decision trees
library(rpart.plot)  # prp() for plotting them
Let’s imagine that we want to predict whether the upcoming play is a run or a pass. First, we’ll keep only plays labeled as a run or a pass, which drops special teams plays and other play types, and create a column, run, that equals 1 if the team ran the ball and 0 if it passed.
nfl.run.or.pass <- nfl %>% filter(PlayType %in% c("Run", "Pass")) %>% mutate(run = ifelse(PlayType=="Run", 1, 0))
Now, let’s try to predict whether a team will run the ball using the down and distance.
fit <- rpart(run ~ down + ydstogo, data = nfl.run.or.pass, cp = 0.01)
prp(fit, type=1, fallen.leaves=TRUE, extra=1, cex=0.7)
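Because run is stored as a number (0 or 1), rpart() fits a regression tree by default (method = "anova"), so each leaf in the plot shows the share of plays in that leaf that were runs. To check whether the tree actually beats guessing, a minimal sketch like the one below compares its accuracy with the baseline of always predicting the more common play type. The helper name tree_accuracy and the 0.5 cutoff are our own choices for illustration, not part of rpart.

```r
library(rpart)

# Hypothetical helper (not part of rpart): compare a tree's accuracy on a
# 0/1 outcome with the "always guess the more common value" baseline.
# With a numeric 0/1 response, rpart() defaults to method = "anova",
# so predict() returns each leaf's share of 1s rather than a class label.
tree_accuracy <- function(fit, data, outcome) {
  p_hat <- predict(fit, newdata = data)   # fitted share of 1s for each play
  predicted <- as.numeric(p_hat >= 0.5)   # call it a 1 if that share is at least 0.5
  actual <- data[[outcome]]
  base <- mean(actual, na.rm = TRUE)      # overall share of 1s
  c(tree = mean(predicted == actual, na.rm = TRUE),
    baseline = max(base, 1 - base))
}
```

For example, tree_accuracy(fit, nfl.run.or.pass, "run"). Both numbers are computed on the same plays the tree was fit to, so the tree’s figure is somewhat optimistic.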
Q1: Explain your results.
Q2: Try adding the quarter and score differential (“qtr” and “ScoreDiff”) to your model. You may want to lower the complexity parameter (cp) so the tree can make more splits. What does your model show?
You could also try adding the offensive team (“posteam”) to your model. Are there any other variables that help you predict whether a team will run the ball?
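One way to screen candidate variables is to fit a tree with a low cp and look at its variable.importance component, which rpart stores whenever the tree makes at least one split. The scores also give credit to variables that act as surrogate splits, so a variable can rank highly without appearing in the plotted tree. The helper rank_predictors below is a hypothetical sketch, and its default cp = 0.001 is an arbitrary starting point.

```r
library(rpart)

# Hypothetical helper: fit a tree on candidate predictors and return
# rpart's variable importance scores, largest first.
rank_predictors <- function(data, outcome, predictors, cp = 0.001) {
  f <- reformulate(predictors, response = outcome)  # e.g. run ~ down + ydstogo
  fit <- rpart(f, data = data, cp = cp)
  imp <- fit$variable.importance
  if (is.null(imp)) return(numeric(0))              # the tree made no splits
  sort(imp, decreasing = TRUE)
}
```

For example, rank_predictors(nfl.run.or.pass, "run", c("down", "ydstogo", "qtr", "ScoreDiff")).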
Let’s create a data.frame with only those plays in which a team went for it on 4th down and create a column “got.it” that takes the value of 1 if the team succeeded and 0 otherwise.
go.on.4th <- nfl.run.or.pass %>% filter(down == 4 & ydstogo > 0) %>% mutate(got.it = as.numeric(Yards.Gained >= ydstogo))
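Before fitting a tree, it can help to look at the raw conversion rate by distance, since yards to go is an obvious candidate predictor. The sketch below is a hypothetical helper that pools every distance of cap yards or more into a single group (cap = 5 is an arbitrary choice) and returns the share of attempts converted in each group.

```r
# Hypothetical helper: share of 4th-down attempts converted at each distance,
# with all distances of `cap` yards or more pooled into one group.
success_by_distance <- function(data, cap = 5) {
  dist <- pmin(data$ydstogo, cap)   # e.g. 7 yards to go is counted as 5
  sapply(split(data$got.it, dist), mean, na.rm = TRUE)
}
```

For example, success_by_distance(go.on.4th); the names of the result are distances, with "5" meaning 5 or more yards to go.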
Build a model to predict “got.it”, whether the offensive team succeeded on 4th down. You should use only information that was known before the snap. In other words, you shouldn’t use the number of yards the team gained as part of your model. Adjust the complexity parameter until the tree is complex enough to capture real patterns but not so complex that it starts fitting noise.
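Rather than tuning cp purely by eye, you can let rpart’s built-in cross-validation suggest a value: grow a large tree with a small cp, find the row of its cptable with the lowest cross-validated error (the xerror column), and prune back to that row’s cp. The helper prune_by_xerror below is a hypothetical sketch of that recipe; cp_start = 0.001 is an arbitrary starting point, and any extra arguments (such as method = "class") are passed straight to rpart().

```r
library(rpart)

# Hypothetical helper: grow a deliberately large tree, then prune it back to
# the cp with the lowest cross-validated error (xerror) in its cptable.
prune_by_xerror <- function(formula, data, cp_start = 0.001, ...) {
  big <- rpart(formula, data = data, cp = cp_start, ...)
  tab <- big$cptable                     # columns: CP, nsplit, rel error, xerror, xstd
  best_cp <- tab[which.min(tab[, "xerror"]), "CP"]
  prune(big, cp = best_cp)
}
```

For example, run set.seed(1) first (the cross-validation folds are random), then prune_by_xerror(got.it ~ ydstogo + ScoreDiff, go.on.4th) returns a pruned tree you can plot with prp(). If you prefer the common one-standard-error rule instead (the smallest tree whose xerror is within one xstd of the minimum), printcp() on the large tree shows the full table.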
Q3: Paste your decision tree into the Google doc with your answers and describe your model to predict success on 4th downs.
This time, we’ll make a data.frame of only pass plays:
pass <- nfl %>% filter(PlayType == "Pass")
Build a model to predict “InterceptionThrown”. Once again you should use only information that was known prior to the snap.
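Interceptions are rare, so a rule that predicts “no interception” on every pass is already very accurate, and with default settings rpart may find few worthwhile splits. One option, not part of the original lab, is a classification tree (method = "class") with equal prior probabilities for the two outcomes, set through rpart’s parms argument. With equal priors a leaf is labeled 1 when its interception rate is above the overall rate, not when an interception is more likely than not, so read those labels as “riskier than average.” The helper name fit_rare_outcome is hypothetical.

```r
library(rpart)

# Hypothetical helper: classification tree for a rare 0/1 outcome.
# Equal priors reweight the classes so the rare class (1) is not swamped;
# a leaf is then labeled 1 when its rate of 1s exceeds the overall rate.
fit_rare_outcome <- function(formula, data, cp = 0.01) {
  rpart(formula, data = data, method = "class",
        parms = list(prior = c(0.5, 0.5)), cp = cp)
}
```

For example, fit.int <- fit_rare_outcome(InterceptionThrown ~ down + ydstogo + ScoreDiff, pass), then plot it with prp(fit.int, type = 1, extra = 1).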
Q4: Paste your decision tree into the Google doc with your answers and describe your model to predict interceptions.