A variety of metrics are used to evaluate football players and teams. Fairly simple measures, usually related to total production, are often featured on broadcasts but are considered less powerful by analysts. These include total yardage, yards per attempt, and passer rating. This document presents an overview of the available metrics: how they’re calculated, how important and commonly used they are, and how I could incorporate them into my analyses.
In the second section, I also include some discussion of findings that generally seem to be accepted among analysts, possibly as a result of using these metrics.
Wins Above Replacement (WAR) applies to all players and seeks to provide a single number that quantifies the value of a player. To do this, it reports the projected number of wins (including fractions of a win) the player brought to a team compared to a replacement-level player. Multiple sources have developed WAR metrics, each based on different techniques and models.
One example of this is the metric created by Pro Football Focus (PFF). It does this in a multi-step process.
The exact details of how this is done are complicated, but they aren’t necessary for understanding the metric itself. I haven’t yet learned enough to tell how well regarded this measure, and this particular version of it, is compared to other similar measures. The PFF measure does display a pretty strong correlation for players between seasons, which is an indicator of strength.
There are clear relationships between positions and WAR values. The positions that tend to get the highest WAR (and are therefore the most important to a team’s win percentage) are quarterback, defensive back, wide receiver, and tight end. WAR has the highest variability among defensive linemen, likely due to a couple of standout players at that position. One reason linemen score lower is that offside and holding penalties really hurt them. Running backs also score much lower than a position like quarterback. This contributes to the general consensus that running backs are fairly unimportant.
WAR can also be used to predict team win totals. Overall, this particular method is only available through PFF and is not likely to be included in any analysis. If available, it could be a good way to evaluate the impact of individual players. It has apparently been a go-to metric in baseball for a while, but is early in its adoption for the NFL. Here’s an additional article that computes WAR for punters. This presentation also offers a more detailed exploration of how it’s generated.
Expected Points Added (EPA) seeks to measure the value of individual plays in terms of points (source). It compares the expected points (EP) at the beginning and end of a play to calculate the value of that play. This helps differentiate a three-yard run on first down from a three-yard run on 3rd and 2. It can be used to examine what should be done on individual plays, but can also be extended to value the players themselves. This avoids the heavy influence of outlier plays on metrics such as yards gained, and captures the contextual value of plays. However, there’s still the difficulty of dividing it between players.
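As a minimal sketch of the calculation described above, the value of a play is the expected points after the play minus the expected points before it. The `expected_points` lookup and every EP value in it are invented placeholders, not output from a fitted model; real EP models are estimated from historical play-by-play data, and this sketch ignores scoring plays and changes of possession.

```python
# Illustrative only: these EP values are made-up placeholders,
# not output from any real expected-points model.
HYPOTHETICAL_EP = {
    # (down, yards_to_go, yards_from_own_goal): expected points
    (1, 10, 25): 0.9,
    (2, 7, 28): 0.7,
    (3, 2, 40): 1.2,
    (1, 10, 43): 2.1,
}


def expected_points(down, yards_to_go, yardline):
    """Look up the (hypothetical) expected points for a game state."""
    return HYPOTHETICAL_EP[(down, yards_to_go, yardline)]


def epa(state_before, state_after):
    """EPA = EP at the end of the play minus EP at the start."""
    return expected_points(*state_after) - expected_points(*state_before)


# A three-yard run on 1st and 10 leaves the offense behind schedule...
first_down_run = epa((1, 10, 25), (2, 7, 28))
# ...while the same three yards on 3rd and 2 earns a new set of downs.
third_down_run = epa((3, 2, 40), (1, 10, 43))

print(round(first_down_run, 2), round(third_down_run, 2))
```

With these placeholder values, the two three-yard runs get opposite signs, which is the contextual distinction that raw yardage misses.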
The post cited above also discusses some of the drawbacks. One of these is that it has increased over time, reflecting increased offensive efficiency. The ways of dividing the measure between teammates and the inherent subjectivity of defining things like ‘garbage time’ are also issues.
Passer rating is a classic measure of quarterback skill, dating back to the 1970s. Overall, it’s considered a bit dated and is not necessarily a favored metric among analysts, even if it is still commonly used during broadcasts. It’s calculated differently in the NFL and the NCAA, but for the pros it incorporates four variables: completion percentage, yards per attempt, touchdowns per attempt, and interceptions per attempt. Wikipedia has a good specific explanation of how this works, but essentially the measures are weighted, scored, and combined to produce a passer rating with a maximum value of 158.3.
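The NFL version of the calculation can be sketched as follows. Each of the four components is scaled, clamped to the range 0 to 2.375, and the sum is divided by 6 and multiplied by 100. The function and argument names are my own, and the stat line in the usage example is hypothetical.

```python
def _clamp(value, low=0.0, high=2.375):
    """Each component is capped so no single stat can dominate."""
    return max(low, min(high, value))


def nfl_passer_rating(completions, attempts, yards, touchdowns, interceptions):
    """Return the NFL passer rating for a stat line (0 to about 158.3)."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    a = _clamp((completions / attempts - 0.3) * 5)        # completion percentage
    b = _clamp((yards / attempts - 3) * 0.25)             # yards per attempt
    c = _clamp((touchdowns / attempts) * 20)              # touchdowns per attempt
    d = _clamp(2.375 - (interceptions / attempts) * 25)   # interceptions per attempt
    return (a + b + c + d) / 6 * 100


# Hypothetical stat line that maxes out every component.
print(round(nfl_passer_rating(20, 20, 300, 4, 0), 1))  # 158.3
```

The clamping is why the scale tops out at 158.3: four components at 2.375 each sum to 9.5, and 9.5 / 6 * 100 is about 158.3.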
One issue with this measure is that quarterback performance has improved over time in the NFL, resulting in higher average scores, yet the measure has stayed the same. However, the Wikipedia entry cited above also mentions a .793 correlation between the QB that posts the higher rating and the QB that wins. A slightly adjusted version of this metric, which removes plays that are out of the QB’s control (e.g., drops) but uses the same scale, is called the Independent Quarterback Rating (IQR). Another, somewhat different, measure is Adjusted Net Yards per Attempt (ANY/A); the major differences are that it incorporates sacks and doesn’t have a finite scale.
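For comparison, ANY/A is a single ratio rather than a clamped composite. The sketch below uses the commonly published form of the formula (a 20-yard bonus per touchdown, a 45-yard penalty per interception, and sacks counted in both the numerator and the denominator); the function name and the season line are hypothetical.

```python
def any_per_attempt(pass_yards, touchdowns, interceptions,
                    sack_yards, attempts, sacks):
    """Adjusted Net Yards per Attempt.

    (pass yards + 20*TD - 45*INT - sack yards lost) / (attempts + sacks)
    """
    dropbacks = attempts + sacks
    if dropbacks <= 0:
        raise ValueError("attempts + sacks must be positive")
    adjusted_yards = (pass_yards + 20 * touchdowns
                      - 45 * interceptions - sack_yards)
    return adjusted_yards / dropbacks


# Hypothetical season: 4,000 yards, 30 TD, 10 INT, 30 sacks for 200 yards.
print(round(any_per_attempt(4000, 30, 10, 200, 550, 30), 2))
```

Because nothing is clamped, the result is unbounded in both directions, which is the "no finite scale" property mentioned above.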
ESPN has a good explainer of QBR. It’s designed to examine all of a quarterback’s contributions, put plays in their proper context, and reflect that teammates also influence what happens on the field. It incorporates things like rushes, sacks, turnovers, and penalties. It also weights a five-yard first-down pass differently from a five-yard completion short of a first down (context). The same goes for a red zone interception versus a Hail Mary interception before halftime.
In general, the measure begins by asking: how successful was the play for the team, given its context? The change in expected points caused by the play can be estimated. Then, the quarterback’s role is separated from that of his teammates. This is done by dividing the play’s EPA among all teammates, taking into account things like YAC or QB pressure. The article explains in more depth. Overall, this should be thought of as an efficiency stat rather than a value stat. Even if one quarterback produces more total value (for instance by being part of more plays), he may have less value per play than his opponent. ESPN scales this total efficiency value to between 0 and 100 using logistic regression.
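The two steps above (splitting credit, then scaling) can be sketched roughly as below. ESPN does not publish its exact parameters, so everything here is an assumption: the credit shares are hypothetical inputs, and a plain logistic curve with an arbitrary `steepness` stands in for whatever fitted model ESPN actually uses.

```python
import math


def qb_share_of_epa(play_epa, qb_credit_share):
    """Assign the quarterback a fraction (0 to 1) of a play's EPA."""
    if not 0.0 <= qb_credit_share <= 1.0:
        raise ValueError("qb_credit_share must be between 0 and 1")
    return play_epa * qb_credit_share


def per_play_efficiency(plays):
    """Average QB-credited EPA per play.

    plays: list of (play_epa, qb_credit_share) tuples.
    """
    if not plays:
        raise ValueError("need at least one play")
    total = sum(qb_share_of_epa(play_epa, share) for play_epa, share in plays)
    return total / len(plays)


def scale_to_100(efficiency, steepness=8.0):
    """Squash efficiency onto a 0-100 scale; 0.0 maps to exactly 50.

    steepness is an arbitrary illustrative constant, not ESPN's value.
    """
    return 100.0 / (1.0 + math.exp(-steepness * efficiency))


# Hypothetical plays: (EPA of the play, share credited to the QB).
plays = [(1.4, 0.6), (-0.5, 0.3), (0.8, 0.2)]
print(round(scale_to_100(per_play_efficiency(plays)), 1))
```

Averaging per play before scaling is what makes this an efficiency stat: a quarterback with more plays does not automatically score higher.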
The Sharp Football Analysis article discusses this together with Total QBR because they share similar modeling strategies. Like QBR, it uses EPA and divides credit among the team for each play. It also attempts to use statistics such as YAC and air yards to apportion credit for a play. However, this method includes slightly more inputs because it was designed to evaluate all players, not just QBs.
Several useful figures and tables from Sharp Football Analysis are included below that give some indication of the validity of these measures for quarterback evaluation. Read the article for more specific information about each one. The first table summarizes which specific facets of the game are reflected in each metric.
There are several possible ways to evaluate the effectiveness of different metrics. One assumption is that an effective metric would correlate highly with other stats and with important outcomes such as points scored or victories. Another is that an accurate metric would be fairly consistent for each player from year to year. The next figure and table evaluate different metrics based on these assumptions.
Next, this figure shows the correlation between each measure.
Third, this table shows how strongly player scores using each metric are correlated between seasons. An accurate metric that provides insight into the value of a player should be fairly consistent between years, although of course ‘player value’ does vary to some extent between years. The table shows the correlation of each metric when calculated for just passing plays and for all offensive plays. Overall, they’re all moderately correlated and predictive of future values.
Finally, the article concludes that any of these measures could justifiably be used to evaluate performance. Still, it argues that ANY/A is a strong method because it’s easily interpretable and accounts for sacks. Among metrics with more modern analytical underpinnings, IQR and EPA are strong, although they are slightly harder to ‘see’ on the field. EPA is more all-encompassing, while IQR focuses more on throwing. For my own analysis, it may be best to always try to use one easily interpretable metric and one less interpretable one.
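A season-to-season stability check of this kind can be sketched with a plain Pearson correlation over players who appear in both seasons. The player labels and metric values below are invented purely for illustration, and the helper is written out by hand so the sketch has no dependencies.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical metric values for the same players in consecutive seasons.
season_1 = {"QB A": 7.1, "QB B": 5.4, "QB C": 6.3, "QB D": 4.9}
season_2 = {"QB A": 6.8, "QB B": 5.9, "QB C": 6.0, "QB D": 5.1}

# Only compare players who have a value in both seasons.
players = sorted(set(season_1) & set(season_2))
stability = pearson_r([season_1[p] for p in players],
                      [season_2[p] for p in players])
print(round(stability, 2))
```

A value near 1 would suggest the metric captures something persistent about the player; a value near 0 would suggest it mostly reflects noise or circumstance.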
Pro Football Focus produces player grades for each play. However, they aren’t public data so I’ll only mention them here for now.
This Elo rating is used by 538 and measures strength based on game-by-game results. It can apply to teams but, I think, not to players. It works similarly to the Elo system used in competitive gaming rankings. Teams gain and lose rating points based on the strength of their results and how unexpected those results are. At the end of each season, team ratings are regressed back toward the mean (methodology explained more here).
So far, most of these stats have primarily applied to QBs. I’ll have to spend more time looking at RB stats later, but some to think about may be receiving-related stats, pass blocking efficiency, and various measures of elusiveness. Some discussion of these terms is found here. The site has similar discussions of WR metrics.
There are many other stats that are important to remember but aren’t worth an entire section on their own. Many of these are available through Next Gen Stats and are defined in its glossary. Many of these debuted for the 2020 season, so that’s something to keep in mind.
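The core mechanics can be sketched with the textbook Elo update. This is the generic formula, not 538’s exact model, which layers on further adjustments that are left out here. The K-factor, the league mean, and the regression fraction are assumed defaults chosen for illustration.

```python
def expected_score(rating_a, rating_b):
    """Probability that team A beats team B under the standard Elo curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(rating_a, rating_b, score_a, k=20.0):
    """Return new (rating_a, rating_b) after one game.

    score_a is 1 for a win by A, 0.5 for a tie, 0 for a loss.
    k (assumed value) controls how fast ratings move.
    """
    change = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + change, rating_b - change


def regress_to_mean(rating, mean=1500.0, fraction=1 / 3):
    """Pull a rating part of the way back to the league mean between seasons."""
    return rating + fraction * (mean - rating)


# Evenly matched teams: the winner takes k/2 points from the loser.
print(update_ratings(1500, 1500, 1))  # (1510.0, 1490.0)

# An upset moves ratings more than an expected result does.
underdog, favorite = update_ratings(1400, 1600, 1)
print(round(underdog - 1400, 1))
```

The update is zero-sum, so the points one team gains are exactly the points its opponent loses; the size of the swing grows with how unexpected the result was.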
Passing
Rushing
Receiving Stats
This is a measure used to assess the strength of defenders covering the pass. It estimates the number of incompletions over expected a team creates, then assigns them to the nearest defender. See this 538 article. This helps grade coverage, which is otherwise quite hard for a couple of reasons. One, good coverage often means the ball will be thrown somewhere else. Two, the more important metrics like breakups and interceptions are fairly rare. However, the strength of this metric seems to vary by coverage style; in particular, it is more accurate in man coverage. Also, it is not stable or predictive from year to year.
This metric is used to evaluate the performance of offensive lines in the run game. It is created using a regression model that assigns the line a portion of a run’s value. Essentially, it assigns blame for losses, gives credit for mid-yardage runs, and gives the line no credit for long runs (see here). This source also has this data for each team! One problem is that it doesn’t account for the number of defenders the line faced (538).
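The bookkeeping behind this idea can be sketched as follows, assuming each target already comes with a modeled completion probability. That probability model is the hard part and is not shown; the defender labels, probabilities, and outcomes are hypothetical.

```python
from collections import defaultdict


def incompletions_over_expected(targets):
    """Sum incompletions over expected per nearest defender.

    targets: iterable of (nearest_defender, completion_probability,
    was_completed). Each target contributes
    actual incompletion (0 or 1) minus expected incompletion (1 - p).
    """
    totals = defaultdict(float)
    for defender, completion_prob, was_completed in targets:
        expected_incompletion = 1.0 - completion_prob
        actual_incompletion = 0.0 if was_completed else 1.0
        totals[defender] += actual_incompletion - expected_incompletion
    return dict(totals)


# Hypothetical targets with made-up completion probabilities.
targets = [
    ("CB 1", 0.70, False),  # likely completion broken up: large credit
    ("CB 1", 0.60, True),   # completion allowed: small debit
    ("S 2", 0.50, False),   # coin-flip throw falls incomplete
]
print(incompletions_over_expected(targets))
```

Note that a defender who is never targeted gets no entry at all, which reflects the limitation raised above: good coverage often sends the ball elsewhere.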
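The shape of that credit assignment can be sketched as a banded weighting of each run’s yardage. The band edges and weights below are assumptions chosen for illustration, not values taken from the cited source: losses are magnified, the first few yards count in full, mid-range yards count at half, and anything beyond that adds nothing.

```python
def line_credit(run_yards, loss_weight=1.2, short_cap=4,
                mid_cap=10, mid_weight=0.5):
    """Yards credited to the offensive line for a single run.

    All weights and caps are illustrative assumptions.
    """
    if run_yards < 0:
        # Losses are blamed on the line, with extra weight.
        return run_yards * loss_weight
    short_part = min(run_yards, short_cap)                      # full credit
    mid_part = max(0, min(run_yards, mid_cap) - short_cap)      # partial credit
    # Yards beyond mid_cap earn the line no additional credit.
    return short_part + mid_weight * mid_part


for yards in (-2, 3, 8, 15, 60):
    print(yards, line_credit(yards))
```

Under these assumed weights a 60-yard run credits the line no more than a 10-yard run does, on the theory that the extra yardage comes from the runner and downfield blocking rather than the line.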
Advanced Football Analytics also has a glossary that includes additional stats.
I came across a number of interesting insights while looking through these measures and how they’re used. Some short summaries of these are below.
One takeaway from the articles is that running backs aren’t considered to be very important for a team. See articles here and here. This generally seems to be because their production is very dependent on offensive line play, much more so than quarterback performance is, for instance.
This analysis, which was referenced as evidence by several other major analysts, uses statistics to present evidence that sacks are mainly the fault of quarterbacks. It cites examples of QBs changing teams and teams changing QBs to support this conclusion. It also tests how much common quarterback metrics like completion percentage vary from year to year for the same player and finds that sack rate is similarly consistent. This could be due to QB characteristics like pocket awareness and pre/post-snap reads.
Defensive performance is more variable from year-to-year than offensive performance. Another article by Josh at 538 explains this pretty well. The table below shows this. Key offensive stats are about twice as easy to predict. This seems to have pretty large ramifications for teams that succeed because of their defense. However, quarterback hits do correlate pretty well, making that a better indicator.
The most important factor for rushing success is the number of men in the box. Knowing just that and field position, you can explain 96% of the variation in yards per carry (538).
In particular, 538 has a couple of good articles about the 2018 Rams, which have already been cited above. I’ll also link them here for ease of access.