How the NFL Rankings Are Produced

In this article, I’ll go over how the NFL rankings that ultimately get fed into the J.A.R.E.D.G.O.F.F. model for predicting spreads are created. First and foremost I need to thank the Yale University Sports Analytics Group for helping me learn how to do this kind of stuff. This model is based heavily off their NBA Rankings model, which can be found on their website.

In the near future, I will add some code for scraping the 2018 NFL game data from the web. Pro Football Reference contains everything we’ll need, but as the season moves on they change their table formatting from Home/Away columns to Winner/Loser columns. This makes it more challenging for our purposes, since the model is based off of Home and Away variables. For now, I have just been manually entering the information into a .CSV file, which looks like this.

CSV File

CSV File

Everything in the CSV File should be pretty self-explanatory. PTSA = Points for the Away team, YDSA = Yards for the Away team, TOA = Turnovers for the Away team, apply the same to the Home teams. The PICKSIX column captures all the pick sixes that occurred in a game. For each one, 6 points are removed from that teams total points, so if you noticed that some game scores don’t match what you remember from that game, that would be why. This is done to improve the model performance later, as pick sixes are not replicable.

Now that we have our CSV file completely filled out, we’ll read it into R.

data <- read.csv("Schedule18.csv")

frame <- data
frame$Away.Neutral <- as.character(frame$Away.Neutral)
frame$Home.Neutral <- as.character(frame$Home.Neutral)
frame$PTSA <- as.numeric(as.character(frame[[6]]))
frame$PTSH <- as.numeric(as.character(frame[[10]]))
frame <- frame[!is.na(frame$PTSA),]

The dataframe “frame” now holds the basic game data for the season so far. Let’s have a look at what we’re working with.

head(frame)
##   Week Day     Date   Time         Away.Neutral PTSA YDSA TOA
## 1    1 Thu 6-Sep-18 8:20PM      Atlanta Falcons   12  299   1
## 2    1 Sun 9-Sep-18 1:00PM  San Francisco 49ers   16  327   4
## 3    1 Sun 9-Sep-18 1:00PM   Cincinnati Bengals   34  330   2
## 4    1 Sun 9-Sep-18 1:00PM Tampa Bay Buccaneers   48  529   0
## 5    1 Sun 9-Sep-18 1:00PM Jacksonville Jaguars   14  305   1
## 6    1 Sun 9-Sep-18 1:00PM     Tennessee Titans   20  336   3
##          Home.Neutral PTSH YDSH TOH              PICKSIX
## 1 Philadelphia Eagles   18  232   2                     
## 2   Minnesota Vikings   18  343   1    Minnesota Vikings
## 3  Indianapolis Colts   24  380   2                     
## 4  New Orleans Saints   40  475   2                     
## 5     New York Giants   15  324   2 Jacksonville Jaguars
## 6      Miami Dolphins   27  342   2

Next we’re going to create a dataframe called games that just contains the score of each game along with the point differential.

games <- data.frame(frame$Away.Neutral,frame$PTSA,frame$Home.Neutral,frame$PTSH)
games$frame.Away.Neutral <- as.character(games$frame.Away.Neutral)
games$frame.Home.Neutral <- as.character(games$frame.Home.Neutral)
games$pt_dif <- games$frame.PTSH - games$frame.PTSA
head(games)
##     frame.Away.Neutral frame.PTSA  frame.Home.Neutral frame.PTSH pt_dif
## 1      Atlanta Falcons         12 Philadelphia Eagles         18      6
## 2  San Francisco 49ers         16   Minnesota Vikings         18      2
## 3   Cincinnati Bengals         34  Indianapolis Colts         24    -10
## 4 Tampa Bay Buccaneers         48  New Orleans Saints         40     -8
## 5 Jacksonville Jaguars         14     New York Giants         15      1
## 6     Tennessee Titans         20      Miami Dolphins         27      7

Now we can put this clean data into a final data frame that will contain 4 columns; team, opponent, ptdif and location.

team <- character(length = 2*length(games$pt_dif))
opponent <- character(length = 2*length(games$pt_dif))
location <- character(length = 2*length(games$pt_dif))
ptdif <- vector(mode='numeric',length = 2*length(games$pt_dif))

clean <- data.frame(team,opponent,location,ptdif)
clean$team <- as.character(clean$team)
clean$opponent <- as.character(clean$opponent)
clean$location <- as.character(clean$location)

for(i in 1:length(games$pt_dif)){
  
  clean$team[i] <- games$frame.Home.Neutral[i]
  clean$opponent[i] <- games$frame.Away.Neutral[i]
  clean$location[i] <- "H"
  clean$ptdif[i] <- games$pt_dif[i]
  
  
  clean$team[i + length(games$pt_dif)] <- games$frame.Away.Neutral[i]
  clean$opponent[i + length(games$pt_dif)] <- games$frame.Home.Neutral[i]
  clean$location[i + length(games$pt_dif)] <- "A"
  clean$ptdif[i + length(games$pt_dif)] <- (-1)*games$pt_dif[i]
  
  
}

#Correct for the 3 NFL games this season that were played at neutral locations
#Remember when the Chiefs & Rams game was supposed to be in Mexico?
clean$location[84] <- "N"
clean$location[95] <- "N"
clean$location[109] <- "N"
clean$location[84 + length(games$pt_dif)] <- "N"
clean$location[95 + length(games$pt_dif)] <- "N"
clean$location[109 + length(games$pt_dif)] <- "N"
head(clean)
##                  team             opponent location ptdif
## 1 Philadelphia Eagles      Atlanta Falcons        H     6
## 2   Minnesota Vikings  San Francisco 49ers        H     2
## 3  Indianapolis Colts   Cincinnati Bengals        H   -10
## 4  New Orleans Saints Tampa Bay Buccaneers        H    -8
## 5     New York Giants Jacksonville Jaguars        H     1
## 6      Miami Dolphins     Tennessee Titans        H     7

With this dataframe we can onto the creation of the rankings and model. Here’s a boxplot of the home point differentials for this season.

boxplot(clean$ptdif[1:(.5*length(clean$ptdif))], col = "blue", horizontal = TRUE, 
        main = "Home Point Differentials for 2018 NFL Season", 
        xlab = "Home Score - Away Score")
abline( v = 0, col = "red")

You can see on the boxplot that on average the home team will win the game by a few points. Interestingly enough, the Buffalo Bills are responsible for the 2 biggest outliers this season. The 44 point away loss to Baltimore and the 32 point home loss to Chicago.

Our NFL model (btw since you’re probably wondering, J.A.R.E.D.G.O.F.F. stands for Judgmental Algorithm Regressing Empirical Data for Gambling On Football Fixtures) is a linear model that uses team, opponent & location to predict point differential. We fit a coefficient for each team, as well as for home field advantage, such that the formula for a given game’s predicted point differential is R1 - R2 + H where R1 and R2 are the power ranking coefficients for home and away teams respectively. J.A.R.E.D. does this by finding the set of coefficients that minimize the residuals between predicted and actual score over the course of the entire season.

lm.NFLfootball <- lm(ptdif ~ team + opponent + location, data = clean) 
#Let's take a look at some of the values
lm.NFLfootball$coefficients[1:7]
##            (Intercept)    teamAtlanta Falcons   teamBaltimore Ravens 
##              -2.933127               7.529349              17.718959 
##      teamBuffalo Bills  teamCarolina Panthers      teamChicago Bears 
##               2.112729              10.973096              15.891987 
## teamCincinnati Bengals 
##               8.660101

Now we can create the rankings using these coefficients. Each value is linearly adjusted so that the mean of all the rankings is 0. This allows us to see how we predict a team would perform against an average NFL team.

rankings <- data.frame("team" = sort(unique(clean$team)),
                       "bs_coeff" = rep(NA, 32))
scale_factor <- mean(lm.NFLfootball$coefficients[2:32])
rankings$bs_coeff <- c(0, lm.NFLfootball$coefficients[2:32]) - scale_factor
rankings <- rankings[(order(rankings$bs_coeff, decreasing = T)),]
rankings
##                    team    bs_coeff
## 22   New Orleans Saints   9.2871188
## 18     Los Angeles Rams   9.1851058
## 16   Kansas City Chiefs   8.4993957
## 3      Baltimore Ravens   7.1796981
## 17 Los Angeles Chargers   6.3736295
## 29     Seattle Seahawks   6.1817765
## 6         Chicago Bears   5.3527263
## 27  Pittsburgh Steelers   4.4540319
## 21 New England Patriots   4.1135638
## 13       Houston Texans   2.7024486
## 14   Indianapolis Colts   1.4190302
## 10       Denver Broncos   1.3007420
## 9        Dallas Cowboys   1.2358664
## 5     Carolina Panthers   0.4338357
## 31     Tennessee Titans  -0.7311400
## 12    Green Bay Packers  -0.8130954
## 20    Minnesota Vikings  -1.3553974
## 8      Cleveland Browns  -1.3783654
## 7    Cincinnati Bengals  -1.8791591
## 23      New York Giants  -2.1822924
## 26  Philadelphia Eagles  -2.2143069
## 2       Atlanta Falcons  -3.0099120
## 30 Tampa Bay Buccaneers  -3.1657289
## 11        Detroit Lions  -3.9086921
## 15 Jacksonville Jaguars  -4.4642397
## 32  Washington Redskins  -5.4404500
## 19       Miami Dolphins  -6.0691152
## 28  San Francisco 49ers  -6.3655301
## 24        New York Jets  -6.5930722
## 4         Buffalo Bills  -8.4265314
## 25      Oakland Raiders  -9.7219409
## 1     Arizona Cardinals -10.5392606

Looks like Carolina is about as close to baseline as you’re going to get. So when reading these, we’d expect the Cardinals to lose to the average NFL team by 10.54 points and the Rams to beat the average NFL team by 9.17 points. Here is a stripchart of the rankings.

stripchart(rankings$bs_coeff, pch = 19 , col = "blue", 
xlab = "BS Coefficient", main = "BS Coefficients for 2018 Season")

Next we can create another model, based on the first model, that finds the probability that a team will win given their predicted point differential. We must add 2 new columns showing the predicted point differential of a game, as well as whether or not that team actually won. Then we fit a logistic function that takes in 1 parameter (predicted point differential) and outputs the win probability based on this season’s game data. This fucntion has limits such that as x goes to infinity or negative infinity, y will go to 1 or 0.

clean$predscore <- predict(lm.NFLfootball, newdata = clean)
clean$win <- ifelse(clean$ptdif > 0, 1, 0)
glm.pointspread <- glm(win ~ predscore, data = clean, family = "binomial"
                       ,control = list(maxit = 50))
clean$winprob <- predict(glm.pointspread, newdata = clean, type = "response")

#We can plot the win probability vs. predicted score 
plot(clean$predscore, clean$winprob, xlab = "Home Predicted Score Differential", 
     ylab = "Home Win Probability", 
     main = "Logistic Function for Predicting Games with the G.O.F.F. NFL Model", 
     pch = 4, col = c("red","green")[(clean$ptdif > 0) + 1])
     legend(5,.4,legend=c("Actual Home Win", "Actual Road Win"), fill=c("green","red"))