Introduction

For this lab, I wanted to model something that gets closer to how offensive consistency actually works in football. Instead of using a broad outcome like total yards or total points, I built a custom team-level metric called First Down Success Rate and then regressed it on several offensive indicators from nflreadr.

The idea behind this metric is simple. A strong offense is not just one that occasionally hits big plays. A strong offense is one that consistently stays on schedule, creates manageable situations, and gives itself a better chance to sustain drives. First Down Success Rate is meant to capture that idea at the play level and then summarize it across a full season.

What First Down Success Rate Means

First Down Success Rate is a play-based measure of whether an offense gained enough yardage on a given play to be considered meaningfully successful relative to the down and distance.

A play is counted as a success when its gain meets a threshold that depends on the down and the distance to go.

This framework reflects normal football logic. On first down, an offense does not need to gain every yard immediately for the play to be useful. A moderate gain can still keep the offense on schedule and preserve flexibility. On second down, the threshold becomes more demanding because there is less room for error. By third and fourth down, the offense has to fully convert.

So the process works in two steps. First, each run or pass play is labeled as either a 1 for success or a 0 for failure based on those down-and-distance rules. Then those success indicators are averaged across the season for each team. That produces a season-long First Down Success Rate.

In practical terms, if a team has a First Down Success Rate of 48%, that means 48% of its offensive run and pass plays met the success threshold for their specific situation.
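The two-step process described above can be sketched in Python. This is a minimal illustration, not the pipeline used for this lab. The threshold shares (40% of the distance on first down, 60% on second down, 100% on third and fourth down) are a common success-rate convention assumed here, and the field names `posteam`, `down`, `ydstogo`, and `yards_gained` are assumed to mirror nflreadr play-by-play columns.

```python
from collections import defaultdict

# Assumed thresholds: share of the yards to go that a play must gain, by down.
# These follow a common convention and are not confirmed by the lab itself.
THRESHOLDS = {1: 0.40, 2: 0.60, 3: 1.00, 4: 1.00}


def is_success(down, ydstogo, yards_gained):
    """Step one: return 1 if the gain meets the assumed threshold, else 0."""
    return int(yards_gained >= THRESHOLDS[down] * ydstogo)


def team_success_rate(plays):
    """Step two: average the 0/1 success flags for each team."""
    flags = defaultdict(list)
    for p in plays:
        flags[p["posteam"]].append(
            is_success(p["down"], p["ydstogo"], p["yards_gained"])
        )
    return {team: sum(f) / len(f) for team, f in flags.items()}


# Hypothetical plays for a made-up team, for illustration only.
plays = [
    {"posteam": "AAA", "down": 1, "ydstogo": 10, "yards_gained": 4},  # success
    {"posteam": "AAA", "down": 2, "ydstogo": 6, "yards_gained": 3},   # failure
    {"posteam": "AAA", "down": 3, "ydstogo": 3, "yards_gained": 3},   # success
    {"posteam": "AAA", "down": 4, "ydstogo": 1, "yards_gained": 0},   # failure
]
rates = team_success_rate(plays)
print(rates)
```

With two successes on four plays, the made-up team's rate is 0.5, which would be reported as 50%.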

Data Construction

I limited the analysis to regular season offensive run and pass plays and summarized the data at the team-season level. The independent variables were chosen to capture different dimensions of offensive performance. EPA per play measures overall efficiency, turnover rate captures mistakes that can disrupt or end drives, explosive play rate measures how often an offense generated a gain of at least 20 yards, and red zone success reflects how often an offense still produced successful plays once the field became compressed inside the 20-yard line.
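As a rough sketch of that construction, the Python below filters plays and computes the four predictors for one team. Several details are assumptions made for illustration: the field names (`season_type`, `play_type`, `epa`, `yards_gained`, `interception`, `fumble_lost`, `yardline_100`) are assumed to mirror nflreadr play-by-play columns, `success` is a hypothetical precomputed 0/1 flag, a turnover is assumed to mean an interception or a lost fumble, and the red zone is assumed to be `yardline_100 <= 20`.

```python
def keep_play(play):
    """Keep regular season run and pass plays only."""
    return play["season_type"] == "REG" and play["play_type"] in ("run", "pass")


def team_predictors(plays):
    """Summarize one team's plays into the four independent variables."""
    plays = [p for p in plays if keep_play(p)]
    n = len(plays)
    # Assumed red zone definition: 20 or fewer yards from the end zone.
    red_zone = [p for p in plays if p["yardline_100"] <= 20]
    return {
        "epa_per_play": sum(p["epa"] for p in plays) / n,
        # Assumed turnover definition: interception or lost fumble.
        "turnover_rate": sum(
            1 for p in plays if p["interception"] or p["fumble_lost"]
        ) / n,
        "explosive_play_rate": sum(1 for p in plays if p["yards_gained"] >= 20) / n,
        "red_zone_success": (
            sum(p["success"] for p in red_zone) / len(red_zone) if red_zone else None
        ),
    }


def make_play(**overrides):
    """Build a hypothetical play record with neutral defaults."""
    play = {
        "season_type": "REG", "play_type": "pass", "epa": 0.0, "yards_gained": 0,
        "interception": 0, "fumble_lost": 0, "yardline_100": 50, "success": 0,
    }
    play.update(overrides)
    return play


# Hypothetical plays for a single team, for illustration only.
sample = [
    make_play(epa=0.5, yards_gained=25, yardline_100=60, success=1),
    make_play(play_type="run", epa=-0.25, yards_gained=2, yardline_100=15),
    make_play(epa=0.75, yards_gained=8, yardline_100=13, success=1),
    make_play(epa=-2.0, interception=1, yardline_100=5),
    make_play(play_type="punt", epa=0.1, yardline_100=70),    # dropped: not run/pass
    make_play(season_type="POST", epa=1.0, yards_gained=30),  # dropped: postseason
]
predictors = team_predictors(sample)
print(predictors)
```

The punt and the postseason play are filtered out before any rate is computed, so every predictor shares the same denominator of run and pass plays.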

Team Rankings Table

The following table provides a full team-by-team ranking based on First Down Success Rate. It is meant to give a clean descriptive overview of where each offense stood before moving into the scatter plot and regression analysis. In addition to the custom success metric, the table also includes the main independent variables used later in the model so it is easier to see how efficiency, explosiveness, turnovers, and red zone execution vary across teams.

| Rank | Team | Success Rate | EPA/Play | Turnover Rate | Explosive Rate | Red Zone Success |
|---|---|---|---|---|---|---|
| 1 | Los Angeles Rams | 57.0% | 0.145 | 1.4% | 8.0% | 55.6% |
| 2 | Buffalo Bills | 53.7% | 0.141 | 1.7% | 7.3% | 53.7% |
| 3 | San Francisco 49ers | 53.1% | 0.089 | 2.1% | 5.2% | 50.0% |
| 4 | New England Patriots | 52.5% | 0.157 | 1.6% | 8.4% | 47.4% |
| 5 | Green Bay Packers | 52.1% | 0.113 | 1.3% | 6.5% | 49.7% |
| 6 | Dallas Cowboys | 51.0% | 0.095 | 1.9% | 6.3% | 42.9% |
| 7 | Baltimore Ravens | 49.8% | 0.034 | 2.3% | 7.1% | 42.3% |
| 8 | Indianapolis Colts | 49.7% | 0.075 | 2.1% | 5.4% | 47.6% |
| 9 | Washington Commanders | 49.4% | 0.007 | 2.2% | 5.2% | 49.0% |
| 10 | Kansas City Chiefs | 49.3% | 0.034 | 1.4% | 5.3% | 45.2% |
| 11 | Cincinnati Bengals | 49.2% | −0.008 | 2.3% | 5.3% | 51.4% |
| 12 | Chicago Bears | 48.6% | 0.080 | 1.0% | 6.5% | 48.9% |
| 13 | Seattle Seahawks | 48.6% | 0.030 | 2.6% | 7.5% | 43.4% |
| 14 | Detroit Lions | 48.4% | 0.081 | 1.4% | 7.6% | 43.2% |
| 15 | Jacksonville Jaguars | 48.3% | 0.033 | 1.7% | 6.2% | 50.7% |
| 16 | Atlanta Falcons | 48.2% | −0.022 | 1.6% | 5.4% | 46.4% |
| 17 | Denver Broncos | 48.0% | 0.048 | 1.4% | 5.4% | 46.3% |
| 18 | Pittsburgh Steelers | 47.8% | 0.025 | 1.4% | 5.3% | 48.8% |
| 19 | Los Angeles Chargers | 47.5% | −0.024 | 1.8% | 5.5% | 42.4% |
| 20 | Carolina Panthers | 47.3% | −0.037 | 2.2% | 5.2% | 43.8% |
| 21 | Philadelphia Eagles | 46.6% | 0.027 | 1.2% | 5.1% | 50.3% |
| 22 | Miami Dolphins | 46.5% | −0.016 | 2.4% | 6.3% | 48.6% |
| 23 | New York Giants | 46.1% | 0.010 | 1.5% | 5.8% | 31.5% |
| 24 | Arizona Cardinals | 46.1% | −0.017 | 1.8% | 5.3% | 44.2% |
| 25 | New Orleans Saints | 45.5% | −0.086 | 2.2% | 3.6% | 37.4% |
| 26 | Minnesota Vikings | 45.5% | −0.116 | 2.9% | 4.9% | 39.7% |
| 27 | Tampa Bay Buccaneers | 45.0% | −0.006 | 1.5% | 6.1% | 41.1% |
| 28 | Houston Texans | 44.6% | −0.010 | 1.0% | 5.2% | 35.6% |
| 29 | New York Jets | 44.1% | −0.132 | 2.1% | 3.9% | 40.5% |
| 30 | Las Vegas Raiders | 42.4% | −0.215 | 2.3% | 4.5% | 38.6% |
| 31 | Tennessee Titans | 40.8% | −0.160 | 1.8% | 5.1% | 45.5% |
| 32 | Cleveland Browns | 37.9% | −0.189 | 2.3% | 4.0% | 36.8% |

This table ranks all teams by First Down Success Rate, which is the share of offensive plays that met the success benchmark for their down and distance. Explosive Rate shows the share of plays that gained at least 20 yards, so it represents how often an offense generated chunk production rather than steady incremental gains. The color scale is meant to make the table easier to read at a glance, with stronger values standing out more clearly and weaker values fading into lighter shades.

The scatter plot of First Down Success Rate against EPA per play shows how the two measures align across the league. Teams in the upper-right quadrant were above average in both efficiency and down-to-down consistency, while teams in the lower-left quadrant lagged in both areas. The fitted line shows a positive league-wide relationship between the two, even though there is still meaningful variation from team to team.

Initial Regression Model

| Variable | EPA per Play | Turnover Rate | Explosive Play Rate | Red Zone Success |
|---|---|---|---|---|
| epa_per_play | 1.000 | -0.478 | 0.759 | 0.585 |
| turnover_rate | -0.478 | 1.000 | -0.254 | -0.194 |
| explosive_play_rate | 0.759 | -0.254 | 1.000 | 0.429 |
| red_zone_success | 0.585 | -0.194 | 0.429 | 1.000 |

The correlation matrix is used here as a diagnostic tool rather than a result of substantive interest on its own. In regression, the main concern is not simply whether predictors are related, because some relationship among offensive variables is expected. The real issue is whether that relationship becomes strong enough that the model struggles to separate the individual contribution of each predictor. That is the core problem of multicollinearity. When two independent variables move too closely together, the model can still fit well overall, but the coefficient estimates become less stable and harder to interpret cleanly.

From an analytical standpoint, the most important relationship in this matrix is the one between EPA per play and explosive play rate. That pairing deserves attention because both variables reflect offensive efficiency and the ability to generate valuable outcomes. At 0.759, it is the strongest pairwise correlation among the predictors, which suggests that the two variables are partly measuring overlapping versions of the same offensive quality. That does not automatically mean one of them must be removed, but it does mean the model should be evaluated carefully before treating both coefficients as fully distinct effects.

| Variable | VIF |
|---|---|
| EPA per Play | 3.716 |
| Turnover Rate | 1.368 |
| Explosive Play Rate | 2.455 |
| Red Zone Success | 1.548 |

The Variance Inflation Factor, or VIF, gives a more targeted diagnostic for multicollinearity by measuring how much the variance of a coefficient is inflated because that predictor overlaps with the others in the model. Conceptually, it asks whether a variable is bringing in distinct information or whether much of what it contributes could already be reconstructed from the other predictors. A VIF close to 1 suggests very little overlap-driven inflation. As that value rises, it becomes a sign that the predictor is less independent from the rest of the model and that its coefficient may be less reliable as a stand-alone effect.

That makes VIF especially useful as a follow-up to the correlation matrix. The correlation matrix shows pairwise relationships, but VIF reflects the combined overlap structure of the full set of predictors. In other words, it is possible for no single pairwise correlation to look extreme while the broader set of variables still creates estimation instability. Using both tools together gives a stronger basis for deciding whether a simplified model would be cleaner and more interpretable.
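The link between the two diagnostics can be made concrete. For standardized predictors, each VIF equals the matching diagonal entry of the inverse of the predictor correlation matrix, which is the same quantity as 1 / (1 - R²) from regressing that predictor on the others. The sketch below applies that identity to the rounded correlations reported above, using a small hand-written Gauss-Jordan inverse so it needs no external packages. It is an illustration of the relationship, not the code used to produce the VIF table, and because its inputs are rounded to three decimals the results should land close to the reported values rather than match them exactly.

```python
def invert(matrix):
    """Invert a square matrix with Gauss-Jordan elimination and partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [
        list(row) + [float(i == j) for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        # Swap in the row with the largest pivot to keep the division stable.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        # Clear this column in every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


names = ["epa_per_play", "turnover_rate", "explosive_play_rate", "red_zone_success"]

# Correlation matrix as reported in this lab (rounded to three decimals).
corr = [
    [1.000, -0.478, 0.759, 0.585],
    [-0.478, 1.000, -0.254, -0.194],
    [0.759, -0.254, 1.000, 0.429],
    [0.585, -0.194, 0.429, 1.000],
]

inverse = invert(corr)
# Each VIF is the diagonal entry of the inverse correlation matrix.
vif = {name: inverse[i][i] for i, name in enumerate(names)}

for name, value in vif.items():
    print(f"{name}: {value:.3f}")
```

This also shows why VIF can flag overlap that no single pairwise correlation reveals: every diagonal entry of the inverse depends on the full matrix, not on one pair.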

Final Regression Model

Initial Regression Results
Success rate regressed on EPA per play, turnover rate, explosive play rate, and red zone success

| Term | Estimate | Std. Error | t value | p value | 95% CI Low | 95% CI High |
|---|---|---|---|---|---|---|
| Intercept | 0.394 | 0.033 | 11.769 | <0.001 | 0.325 | 0.462 |
| EPA per Play | 0.383 | 0.053 | 7.164 | <0.001 | 0.273 | 0.493 |
| Turnover Rate | 1.923 | 0.644 | 2.986 | 0.006 | 0.601 | 3.244 |
| Explosive Play Rate | -0.230 | 0.351 | -0.655 | 0.518 | -0.950 | 0.490 |
| Red Zone Success | 0.134 | 0.059 | 2.291 | 0.030 | 0.014 | 0.254 |

Final Regression Results
Explosive play rate removed after diagnosing overlap and multicollinearity

| Term | Estimate | Std. Error | t value | p value | 95% CI Low | 95% CI High |
|---|---|---|---|---|---|---|
| Intercept | 0.381 | 0.027 | 14.032 | <0.001 | 0.326 | 0.437 |
| EPA per Play | 0.359 | 0.038 | 9.417 | <0.001 | 0.281 | 0.437 |
| Turnover Rate | 1.840 | 0.625 | 2.944 | 0.006 | 0.560 | 3.120 |
| Red Zone Success | 0.136 | 0.058 | 2.354 | 0.026 | 0.018 | 0.255 |

Model Fit Comparison
Comparing overall model quality before and after removing overlap

| Model | R-squared | Adjusted R-squared | Residual Std. Error | F Statistic | Model p-value | Observations |
|---|---|---|---|---|---|---|
| Initial Model | 0.873 | 0.854 | 0.014 | 46.502 | <0.001 | 32 |
| Final Model | 0.871 | 0.857 | 0.014 | 63.148 | <0.001 | 32 |

The initial regression table shows how each predictor relates to First Down Success Rate when all four variables are included together. The estimate gives the direction and size of the relationship, the t value reflects the strength of the estimate relative to its uncertainty, and the p value indicates whether the coefficient appears statistically distinguishable from zero within the model.
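To make those columns concrete, the sketch below recomputes the t value and the 95% confidence interval for each term of the initial model from its reported estimate and standard error. It assumes the usual interval of estimate ± critical value × standard error, with 27 residual degrees of freedom (32 teams minus 5 estimated coefficients) and a two-sided 95% critical value of about 2.052. Because the inputs are the rounded table values, the results can differ from the reported ones in the last digit or two.

```python
# Two-sided 95% critical value of the t distribution with 27 degrees of
# freedom (32 observations minus 5 coefficients), hardcoded as an assumption.
T_CRIT_27 = 2.052


def t_and_ci(estimate, std_error, t_crit=T_CRIT_27):
    """Return the t value and the lower and upper confidence bounds."""
    t_value = estimate / std_error
    margin = t_crit * std_error
    return t_value, estimate - margin, estimate + margin


# (estimate, standard error) pairs as reported for the initial model.
initial_model = {
    "Intercept": (0.394, 0.033),
    "EPA per Play": (0.383, 0.053),
    "Turnover Rate": (1.923, 0.644),
    "Explosive Play Rate": (-0.230, 0.351),
    "Red Zone Success": (0.134, 0.059),
}

results = {term: t_and_ci(est, se) for term, (est, se) in initial_model.items()}

for term, (t_value, low, high) in results.items():
    print(f"{term}: t = {t_value:.2f}, 95% CI = ({low:.3f}, {high:.3f})")
```

The interval for explosive play rate spans zero, which is the same information as its large p value: the data do not pin down the sign of that coefficient once the other predictors are in the model.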

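The final regression table is more analytically useful because it reflects the version of the model after removing a variable that risked overlapping too heavily with broader efficiency measures. That makes the remaining coefficients easier to interpret as distinct drivers of offensive success. In this final setup, EPA per play stands in for overall offensive efficiency, turnover rate captures possession-ending or drive-damaging mistakes, and red zone success reflects whether an offense maintains execution quality in the most compressed part of the field. One caution applies to the turnover coefficient: it is positive (1.840), so higher turnover rates go with higher success rates once EPA per play and red zone success are held constant. That is a conditional association, plausibly because EPA per play already absorbs much of the cost of turnovers, and it should not be read as turnovers helping an offense.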

The model fit comparison is important because simplification alone is not automatically an improvement. What matters is whether the cleaner model preserves most of the explanatory power while improving interpretability. Here the fit holds up: R-squared moves only from 0.873 to 0.871, adjusted R-squared rises slightly from 0.854 to 0.857, and the residual standard error stays at 0.014. That supports the decision to favor the more parsimonious specification. In a regression setting, that is often the preferable result because it gives a model that is not only strong statistically, but also easier to explain in football terms.
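The fit statistics in the comparison table are tied together by standard formulas, which the short sketch below applies to the reported R-squared values. With n observations and k predictors, adjusted R-squared is 1 - (1 - R²)(n - 1)/(n - k - 1) and the overall F statistic is (R²/k) / ((1 - R²)/(n - k - 1)). The function names are illustrative, and because the inputs are rounded to three decimals, the F values come out close to the reported ones rather than identical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def f_statistic(r2, n, k):
    """Overall F statistic for a model with an intercept and k predictors."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


N_TEAMS = 32

# (reported R-squared, number of predictors) for each model.
models = {"Initial Model": (0.873, 4), "Final Model": (0.871, 3)}

fit = {
    name: (adjusted_r2(r2, N_TEAMS, k), f_statistic(r2, N_TEAMS, k))
    for name, (r2, k) in models.items()
}

for name, (adj, f_value) in fit.items():
    print(f"{name}: adjusted R-squared = {adj:.3f}, F = {f_value:.1f}")
```

Dropping a predictor can raise adjusted R-squared even when R-squared falls, because the penalty for model size shrinks by more than the fit that is lost. That is the pattern in the table.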


Overall Summary

For this lab, I wanted to create a version of offensive performance that better reflects how football is actually played from snap to snap. That is why I used First Down Success Rate instead of something broader like total yards or total points. This metric focuses on whether an offense is consistently staying on schedule based on the demands of the situation. A 4-yard gain on first down is not spectacular, but it is still useful because it keeps the offense in a favorable rhythm and makes the next snap easier to manage. That kind of logic is built directly into this measure.

The team rankings show that offensive success is not just about isolated explosive moments. Teams that grade out well tend to combine broad efficiency, fewer mistakes, and stronger situational execution. The scatterplot supports that idea by showing a positive relationship between EPA per play and First Down Success Rate, while still leaving enough spread to show that the two measures are related without being interchangeable.

From the modeling side, the biggest analytical issue was whether the independent variables were sufficiently distinct from one another. The correlation matrix showed that EPA per play and explosive play rate were the most strongly correlated pair (0.759), and the VIF analysis provided a second check on whether that overlap was strong enough to threaten interpretability, with those two predictors carrying the highest values (3.716 and 2.455). In the initial model, explosive play rate was also the only predictor whose coefficient was not statistically distinguishable from zero (p = 0.518). That combination of diagnostics supported the decision to move from the initial model to a cleaner final specification.

The final regression model is the strongest version of the analysis because it balances explanatory power with interpretability. Rather than stacking multiple variables that may partially duplicate one another, it isolates three core dimensions of offense. EPA per play captures overall efficiency, turnover rate captures damaging mistakes, and red zone success captures execution under tighter spatial constraints. Together, those predictors provide a cleaner explanation of why some teams were more consistently successful on a down-to-down basis than others.

Overall, I think this approach fits the assignment well because it uses nflreadr data to build a meaningful team-season regression model while also taking diagnostic testing seriously. More importantly, I think it gets closer to real football logic. Offensive quality is not just about accumulating yards or points in the aggregate. It is about repeatedly creating favorable situations, avoiding drive-killing errors, and executing with enough consistency to sustain offense over the course of a season.