Overview

Quarterback stats can get messy fast. Passing yards and touchdowns tell part of the story, but they do not really explain how a quarterback plays. Two quarterbacks can finish with similar box score numbers and still get there in completely different ways.

That is the point of this project. I used NFL play-by-play data from 2021 through 2025 to group quarterbacks into statistical archetypes. Instead of looking only at traditional stats, this analysis uses efficiency, accuracy, aggressiveness, explosiveness, sacks, and turnovers to get a better picture of quarterback style.

The main model is built at the player-season level. That means a quarterback’s 2021 season and 2024 season are treated separately, which matters because quarterbacks change. Scheme, coaching, supporting cast, injuries, and development can all change what a player looks like from year to year.

To keep the sample meaningful, I only included quarterback seasons with at least 400 dropbacks. The goal is not just to rank quarterbacks from best to worst. The goal is to see what types of quarterback profiles show up when the stats are studied together.

Data and Feature Engineering

Feature Definitions

Quarterback Variables Used in the Analysis
Eight aggregate PbP-based quarterback metrics
Statistic: Definition
EPA per Dropback: Average expected points added on all quarterback dropbacks.
Success Rate: Share of dropbacks considered successful using down and distance thresholds (40% of yards to go on 1st down, 60% on 2nd down, and 100% on 3rd or 4th down).
CPOE: Completion percentage over expected on pass attempts.
Air Yards per Attempt: Average intended air yards on pass attempts.
Deep Pass Rate: Rate of pass attempts with at least 20 air yards.
Explosive Play Rate: Rate of quarterback dropbacks that gained at least 20 yards.
Sack Rate: Rate of dropbacks ending in a sack.
Turnover Rate: Rate of dropbacks ending in an interception or lost fumble.
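To make the definitions above concrete, here is a minimal sketch of how the success rule and the per-dropback rates could be computed from individual plays. The `Dropback` record and its field names are hypothetical (they are not the project's actual columns), and the four plays are made-up toy data.

```python
from dataclasses import dataclass


@dataclass
class Dropback:
    # Hypothetical play record; field names are illustrative only.
    down: int
    yards_to_go: int
    yards_gained: int
    sack: bool = False
    interception: bool = False
    fumble_lost: bool = False


# Share of yards to go needed for a successful play, by down (from the table above).
SUCCESS_SHARE = {1: 0.40, 2: 0.60, 3: 1.00, 4: 1.00}


def is_success(play: Dropback) -> bool:
    """Apply the down-and-distance success threshold to one dropback."""
    return play.yards_gained >= SUCCESS_SHARE[play.down] * play.yards_to_go


def rate(plays, predicate) -> float:
    """Share of dropbacks for which predicate is true (0.0 for an empty list)."""
    plays = list(plays)
    if not plays:
        return 0.0
    return sum(1 for p in plays if predicate(p)) / len(plays)


# Toy data, not real plays.
plays = [
    Dropback(down=1, yards_to_go=10, yards_gained=4),    # success: 4 of 10 on 1st down
    Dropback(down=2, yards_to_go=6, yards_gained=3),     # not a success: needs 3.6
    Dropback(down=3, yards_to_go=5, yards_gained=22),    # success and explosive
    Dropback(down=3, yards_to_go=8, yards_gained=-7, sack=True),
]

success_rate = rate(plays, is_success)
explosive_rate = rate(plays, lambda p: p.yards_gained >= 20)
sack_rate = rate(plays, lambda p: p.sack)
turnover_rate = rate(plays, lambda p: p.interception or p.fumble_lost)
```

All four rates share the same denominator (dropbacks). CPOE, Air Yards per Attempt, and Deep Pass Rate are defined on pass attempts instead, so they would need a separate denominator.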

Sample Overview

Sample Construction Summary
Measure: Value
Regular seasons included: 2021–2025
Quarterback dropbacks in raw sample: 97,017
Quarterback player-seasons before threshold: 568
Quarterback player-seasons used in PCA/clustering: 118
Minimum dropback threshold: 400
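The step from 568 player-seasons down to 118 is a count-and-filter operation. A minimal sketch, assuming each dropback can be reduced to a (player, season) pair; the input shape and the names in the toy data are hypothetical:

```python
from collections import Counter

MIN_DROPBACKS = 400  # threshold stated in the table above


def qualifying_player_seasons(dropbacks, min_dropbacks=MIN_DROPBACKS):
    """Count dropbacks per (player, season) and keep those at or above the threshold."""
    counts = Counter(dropbacks)
    return {key: n for key, n in counts.items() if n >= min_dropbacks}


# Toy input: one (player, season) tuple per dropback. Names are made up.
toy = [("QB A", 2021)] * 450 + [("QB A", 2022)] * 120 + [("QB B", 2021)] * 400
kept = qualifying_player_seasons(toy)
```

Because the key is the pair rather than the player alone, the same quarterback can qualify in one season and drop out in another, which is what the player-season design calls for.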

Main Analytical Dataset

Top Quarterback Seasons in the Modeling Sample
Sorted by EPA per dropback, minimum 400 dropbacks
Season Team Player Dropbacks EPA/DB Success Rate CPOE AY/A Deep Rate Explosive Rate Sack Rate Turnover Rate
2024 BAL L.Jackson 494 0.341 51.8% 4.55 8.75 14.4% 11.3% 4.7% 1.2%
2023 SF B.Purdy 469 0.297 55.0% 5.38 8.25 10.0% 15.4% 6.0% 3.0%
2025 NE D.Maye 539 0.296 54.5% 10.78 9.13 13.0% 12.4% 8.7% 2.2%
2024 DET J.Goff 568 0.276 55.0% 5.71 6.36 7.6% 10.7% 5.5% 2.5%
2024 BUF J.Allen 497 0.262 49.2% 0.84 8.34 15.1% 10.9% 2.8% 1.6%
2022 KC P.Mahomes 677 0.258 54.6% 3.57 7.28 8.6% 10.8% 3.8% 2.4%
2025 GB J.Love 461 0.240 50.3% 5.53 8.84 13.4% 10.6% 4.6% 1.7%
2025 LA M.Stafford 617 0.229 54.6% 1.59 9.14 14.6% 11.7% 3.7% 1.9%
2022 MIA T.Tagovailoa 421 0.214 50.0% 1.43 9.61 13.5% 11.9% 5.0% 2.4%
2024 MIA T.Tagovailoa 421 0.203 52.2% 3.78 5.72 6.7% 6.9% 5.0% 2.1%
2021 LA M.Stafford 634 0.185 52.6% -0.12 8.48 11.5% 10.3% 4.7% 3.0%
2021 KC P.Mahomes 687 0.185 53.2% 2.67 7.34 11.1% 8.4% 4.1% 2.5%
2021 TB T.Brady 740 0.185 52.8% 1.69 8.09 12.3% 10.1% 3.0% 2.2%
2023 MIA T.Tagovailoa 589 0.185 50.9% 4.50 7.68 10.9% 9.8% 4.9% 2.9%
2023 DAL D.Prescott 631 0.176 51.5% 3.93 7.76 10.6% 9.8% 6.2% 1.9%
2024 TB B.Mayfield 612 0.175 54.0% 3.60 6.98 9.2% 9.5% 6.5% 3.1%
2025 DET J.Goff 614 0.174 48.9% 1.78 6.45 6.4% 10.7% 6.2% 1.8%
2021 DAL D.Prescott 631 0.172 52.1% 2.31 7.77 11.4% 8.7% 4.8% 2.1%
2021 SF J.Garoppolo 469 0.166 50.7% 2.62 7.55 7.9% 11.5% 6.2% 3.8%
2025 DAL D.Prescott 630 0.165 48.4% 2.16 8.06 10.5% 8.4% 4.9% 2.2%

Principal Component Analysis

Variance Explained

PCA Variance Explained
Component Variance Explained Cumulative Variance
PC1 40.8% 40.8%
PC2 24.5% 65.3%
PC3 13.4% 78.7%
PC4 10.3% 89.1%
PC5 5.7% 94.7%
PC6 2.8% 97.6%

Principal Component Loadings

Principal Component Loadings
Higher absolute values indicate stronger contribution to that component
Variable PC1 PC2 PC3
EPA per Dropback -0.530 0.120 -0.049
Success Rate -0.461 0.301 0.145
CPOE -0.365 0.195 0.429
Air Yards per Attempt -0.195 -0.618 0.102
Deep Pass Rate -0.155 -0.628 -0.094
Explosive Play Rate -0.410 -0.247 0.221
Sack Rate 0.324 -0.143 0.472
Turnover Rate 0.197 0.005 0.709
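One way to read the loadings is to project a few idealized profiles onto them. The sketch below assumes the PCA was run on standardized (z-score) variables, so a component score is the loading-weighted sum of a quarterback's z-scores. The dictionary keys and the three example profiles are hypothetical; only the loading values come from the table above.

```python
# Loadings for PC1 and PC2, copied from the table above. Key names are illustrative.
LOADINGS = {
    "epa_per_dropback":  (-0.530,  0.120),
    "success_rate":      (-0.461,  0.301),
    "cpoe":              (-0.365,  0.195),
    "air_yards_per_att": (-0.195, -0.618),
    "deep_pass_rate":    (-0.155, -0.628),
    "explosive_rate":    (-0.410, -0.247),
    "sack_rate":         ( 0.324, -0.143),
    "turnover_rate":     ( 0.197,  0.005),
}


def pc_scores(z_profile):
    """Project a z-score profile onto PC1 and PC2; missing keys count as 0 (sample average)."""
    pc1 = sum(w1 * z_profile.get(name, 0.0) for name, (w1, _) in LOADINGS.items())
    pc2 = sum(w2 * z_profile.get(name, 0.0) for name, (_, w2) in LOADINGS.items())
    return pc1, pc2


# Hypothetical profiles: one standard deviation above average on two metrics, average elsewhere.
efficient = pc_scores({"epa_per_dropback": 1.0, "success_rate": 1.0})
vertical = pc_scores({"air_yards_per_att": 1.0, "deep_pass_rate": 1.0})
mistake_prone = pc_scores({"sack_rate": 1.0, "turnover_rate": 1.0})
```

With these loadings, the efficient profile lands at a negative PC1 score, the mistake-prone profile at a positive one, and the vertical profile at a strongly negative PC2 score. The sign of a principal component is arbitrary, so only the contrast between the two ends carries meaning.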

PCA Interpretation

How I read the PCA

PC1 is mostly driven by EPA per Dropback, Success Rate, and Explosive Play Rate. In football terms, I read this as the "can you keep the offense on schedule?" axis. Because the efficiency loadings are negative and the Sack Rate and Turnover Rate loadings are positive, lower PC1 scores mark the better end of this component: those quarterbacks are generally more efficient, more consistent, and less likely to kill drives with sacks or turnovers.

PC2 is driven most by Deep Pass Rate and Air Yards per Attempt, with Success Rate a distant third. I read this more as a style axis. This is where the model starts separating quarterbacks who push the ball downfield and chase explosives (negative PC2 scores, given the negative loadings on both depth metrics) from quarterbacks who play a more controlled, underneath game.

The first two components explain 65.3% of the total variance. That does not capture everything about quarterback play, but it gives a useful two-dimensional view of the bigger picture. PCA is helpful here because a lot of these stats overlap with each other. Instead of staring at eight different columns one by one, PCA helps show the main patterns underneath them.

Choosing the Number of Clusters

Based on the elbow plot, I used k = 3 for the final k-means model. This is the point where adding more clusters starts to give less payoff. In other words, the model gets enough separation without overcomplicating the analysis.
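For readers who have not used the elbow method, the sketch below shows the idea on toy data: fit k-means for several values of k, record the within-cluster sum of squares (WCSS), and look for the k after which the drop flattens out. This is a bare-bones illustration, not the project's code. The farthest-first seeding is my own choice to keep the example deterministic, and the twelve 2D points are made up (the real model uses eight standardized features).

```python
def sqdist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add the point
    farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sqdist(p, c) for c in centers)))
    return centers


def kmeans_wcss(points, k, max_iter=100):
    """Run Lloyd's algorithm and return the within-cluster sum of squares."""
    centers = farthest_first(points, k)
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sqdist(p, centers[i]))
            groups[nearest].append(p)
        # Move each center to the mean of its group; an empty group keeps its old center.
        updated = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if updated == centers:
            break
        centers = updated
    return sum(min(sqdist(p, c) for c in centers) for p in points)


# Toy data: three well-separated blobs of four points each.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 0), (10, 1), (11, 0), (11, 1),
    (5, 9), (5, 10), (6, 9), (6, 10),
]
wcss = {k: kmeans_wcss(points, k) for k in range(1, 5)}
```

On this toy data the WCSS falls sharply up to k = 3 and barely moves at k = 4, which is the elbow shape described above. Real quarterback data will not separate this cleanly, so the bend is usually a judgment call.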

I also checked silhouette width as a second opinion, but I kept the final choice tied to the elbow method since that was the required approach for this project.

K Means Clustering

Cluster Membership Summary

Cluster Sizes
cluster Quarterback Seasons
1 31
2 49
3 38

Cluster Visualization

Cluster Profiles

Cluster Profile Table
Average statistical profile of each quarterback cluster
Profile QB Seasons Avg Dropbacks EPA/DB Success Rate CPOE AY/A Deep Rate Explosive Rate Sack Rate Turnover Rate
Cluster 1: Aggressive Vertical Playmakers 31 516 -0.099 42.1% -1.48 7.68 11.1% 7.4% 7.9% 3.0%
Cluster 2: Efficient Operators 49 544 0.120 48.4% 2.09 8.42 12.3% 9.9% 5.7% 2.5%
Cluster 3: Volatile Pressure Profiles 38 575 0.079 49.0% 2.03 7.00 8.5% 8.0% 5.9% 2.7%

Standardized Cluster Strengths and Weaknesses

Cluster Strengths and Weaknesses
Positive values are above the overall sample average; negative values are below
Profile EPA/DB Success CPOE AY/A Deep Explosive Sack Turnover
Cluster 1: Aggressive Vertical Playmakers -1.22 -1.14 -0.93 -0.10 0.13 -0.69 0.83 0.46
Cluster 2: Efficient Operators 0.58 0.34 0.34 0.78 0.71 0.71 -0.32 -0.24
Cluster 3: Volatile Pressure Profiles 0.24 0.50 0.32 -0.92 -1.02 -0.35 -0.26 -0.06
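The standardized table can be read as "average z-score within each cluster." Here is a minimal sketch for a single metric, assuming z-scores are computed against the full modeling sample using the sample standard deviation (that choice is an assumption, as are the toy values and labels):

```python
from statistics import mean, stdev


def zscores(values):
    """Standardize values against their own mean and sample standard deviation."""
    mu, sd = mean(values), stdev(values)
    return [(v - mu) / sd for v in values]


def cluster_mean_z(values, labels):
    """Average z-score of one metric within each cluster label."""
    z = zscores(values)
    return {
        label: mean([zi for zi, li in zip(z, labels) if li == label])
        for label in sorted(set(labels))
    }


# Toy EPA per dropback values for four quarterback seasons in two clusters.
epa = [-0.2, 0.0, 0.1, 0.3]
labels = [1, 1, 2, 2]
profile = cluster_mean_z(epa, labels)
```

A positive value means the cluster sits above the sample average on that metric and a negative value means below, matching the note under the table title. For Sack Rate and Turnover Rate, above average is the bad direction.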

Cluster Interpretation

Representative Quarterback Seasons by Cluster
These are the seasons closest to the center of each profile
cluster Profile Most Typical Examples
1 Cluster 1: Aggressive Vertical Playmakers D.Mills 2021, J.Dobbs 2023, C.Stroud 2024
2 Cluster 2: Efficient Operators J.Love 2023, P.Mahomes 2025, L.Jackson 2023
3 Cluster 3: Volatile Pressure Profiles J.Burrow 2022, K.Murray 2024, T.Tagovailoa 2021
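"Closest to the center" can be made precise as the smallest Euclidean distance to the cluster's mean profile in standardized feature space. The sketch below illustrates that idea under that assumption; the row format and toy names are hypothetical, and the project may have measured distance differently (for example, in PCA space).

```python
def sqdist(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def most_typical(rows, n=3):
    """rows: iterable of (name, cluster, feature_vector).
    Returns the n names nearest to each cluster's mean vector."""
    by_cluster = {}
    for name, cluster, vec in rows:
        by_cluster.setdefault(cluster, []).append((name, vec))

    result = {}
    for cluster, members in by_cluster.items():
        vectors = [vec for _, vec in members]
        center = [sum(col) / len(vectors) for col in zip(*vectors)]
        ranked = sorted(members, key=lambda m: sqdist(m[1], center))
        result[cluster] = [name for name, _ in ranked[:n]]
    return result


# Toy rows with two standardized features each; names are made up.
rows = [
    ("QB A 2021", 1, (0.0, 0.0)),
    ("QB B 2022", 1, (1.0, 0.0)),
    ("QB C 2023", 1, (4.0, 0.0)),
    ("QB D 2021", 2, (-2.0, 1.0)),
]
typical = most_typical(rows, n=2)
```

The seasons picked this way are the most average members of a cluster, not its best ones, which is why the representative examples can differ from the names at the top of the EPA leaderboard.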

Supplemental Multi-Year Total View

The main model looks at quarterback seasons one at a time, but I also wanted a longer view. This section rolls everything up across 2021 through 2025. It is not the main clustering model, but it helps show which quarterbacks sustained strong production over multiple seasons instead of just popping in one year.

Top Quarterbacks, Multi-Year Total View
2021-2025 window, minimum 1,000 dropbacks
Player Team Window Seasons Dropbacks EPA/DB Success Rate CPOE AY/A Deep Rate Explosive Rate Sack Rate Turnover Rate
B.Purdy SF 2022-2025 4 1,430 0.218 52.7% 3.87 8.04 10.2% 12.7% 5.7% 3.2%
J.Love GB 2021-2025 5 1,598 0.150 48.4% 1.93 8.65 13.6% 10.0% 4.3% 2.4%
P.Mahomes KC 2021-2025 5 3,140 0.147 51.2% 2.47 7.10 9.9% 8.6% 4.8% 2.5%
J.Allen BUF 2021-2025 5 2,838 0.145 49.9% 2.32 8.38 12.5% 9.1% 4.8% 2.8%
T.Tagovailoa MIA 2021-2025 5 2,255 0.139 50.1% 2.87 7.41 9.4% 8.9% 5.4% 2.8%
J.Goff DET 2021-2025 5 2,957 0.135 50.6% 1.52 6.62 7.8% 9.9% 5.3% 2.2%
D.Prescott DAL 2021-2025 5 2,613 0.130 50.2% 1.94 7.93 10.7% 8.8% 5.4% 2.5%
J.Garoppolo SF 2021-2024 4 1,023 0.121 50.9% 0.38 7.35 8.6% 9.8% 6.3% 3.5%
T.Brady TB 2021-2022 2 1,498 0.120 51.2% 1.10 7.49 11.1% 8.3% 2.9% 2.1%
J.Burrow CIN 2021-2025 5 2,589 0.116 50.1% 4.66 7.18 8.6% 7.9% 7.0% 2.5%
M.Stafford LA 2021-2025 5 2,680 0.115 51.2% -0.31 8.08 11.3% 9.9% 5.2% 2.5%
L.Jackson BAL 2021-2025 5 2,099 0.113 48.0% 2.24 8.76 12.6% 10.0% 7.6% 2.7%
J.Hurts PHI 2021-2025 5 2,418 0.065 45.5% 3.32 8.54 11.9% 9.4% 7.0% 2.6%
J.Herbert LAC 2021-2025 5 3,042 0.064 47.3% 0.75 7.54 10.5% 8.0% 6.3% 2.2%
B.Nix DEN 2024-2025 2 1,229 0.061 44.8% -0.35 7.31 11.9% 8.0% 3.7% 2.4%
D.Carr LV 2021-2024 4 2,069 0.051 46.5% 1.28 8.27 12.6% 9.1% 5.1% 2.6%
C.Stroud HOU 2023-2025 3 1,563 0.051 45.0% -0.44 8.53 10.9% 9.4% 7.2% 2.2%
K.Cousins MIN 2021-2025 5 2,362 0.047 46.4% 1.18 7.61 10.0% 8.6% 5.6% 2.7%
A.Rodgers GB 2021-2025 5 2,018 0.040 45.4% 0.25 7.05 11.1% 8.5% 5.7% 2.2%
K.Murray ARI 2021-2025 5 1,968 0.039 48.0% 1.17 7.12 11.3% 7.5% 6.1% 2.1%
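When seasons are rolled up, a per-dropback rate should be weighted by dropbacks instead of averaged season by season, so a short season does not count as much as a full one. A minimal sketch of that roll-up, with hypothetical field names and made-up numbers:

```python
MIN_TOTAL_DROPBACKS = 1000  # threshold stated under the table title above


def rollup(seasons):
    """Combine season rows into one multi-year row, weighting EPA per dropback by dropbacks."""
    total = sum(s["dropbacks"] for s in seasons)
    total_epa = sum(s["epa_per_db"] * s["dropbacks"] for s in seasons)
    return {
        "seasons": len(seasons),
        "dropbacks": total,
        "epa_per_db": total_epa / total,
        "qualifies": total >= MIN_TOTAL_DROPBACKS,
    }


# Toy seasons for one quarterback.
seasons = [
    {"season": 2021, "dropbacks": 500, "epa_per_db": 0.20},
    {"season": 2022, "dropbacks": 300, "epa_per_db": 0.04},
]
combined = rollup(seasons)
simple_mean = sum(s["epa_per_db"] for s in seasons) / len(seasons)
```

In this toy case the weighted figure is 0.14 while the unweighted season average is 0.12, because the stronger season had more dropbacks. Metrics defined on pass attempts (CPOE, Air Yards per Attempt, Deep Pass Rate) would be weighted by attempts in the same way.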

Discussion

The biggest takeaway from the clustering model is that quarterbacks are not separated by one magic stat. EPA matters, but so do success rate, CPOE, air yards, explosive plays, sacks, and turnovers. The value of this model is that it looks at all of those things together.

That is why I like the player-season setup. It lets the model treat each season as its own quarterback profile. A player can look one way in 2021 and a completely different way in 2024 depending on scheme, health, protection, receivers, or his own development. A single five-year average would hide a lot of that.

The multi-year table adds a different kind of context. Quarterbacks like B.Purdy, J.Love, P.Mahomes, J.Allen, and T.Tagovailoa rise to the top because they combined efficiency with enough volume over the full sample. That is useful, but it answers a slightly different question. The clustering model is more about style and profile. The multi-year table is more about sustained production.

Overall, this approach shows that quarterback archetypes exist on a spectrum. Some quarterbacks win with efficiency and control. Some win by pushing the ball downfield. Others are more volatile because of sacks, turnovers, or lower consistency. That is the part basic passing totals usually miss.

Practical Implications and Limitations

Why This Matters

This kind of clustering is useful because it gives more context than a normal leaderboard. It can help compare quarterbacks who may have similar raw stats but very different playing styles. That matters for scouting, team building, scheme fit, opponent prep, and even tracking how a quarterback changes from one season to the next.

Limitations

This model still has limits. Play-by-play data tells us what happened, but it does not perfectly separate the quarterback from the offense around him. Offensive line play, receivers, play calling, game script, and injuries all matter. The cluster names also require some football judgment. The data creates the groups, but I am still interpreting what those groups mean.

Conclusion

This analysis used five regular seasons of NFL play-by-play data and built a clustering model around 118 quarterback seasons with at least 400 dropbacks. The model grouped those seasons into 3 quarterback profiles based on efficiency, accuracy, aggressiveness, explosiveness, sack rate, and turnover rate.

The main point is that quarterback evaluation gets a lot more interesting when the stats are combined instead of viewed one at a time. Passing yards and touchdowns are useful, but they do not fully explain how a quarterback plays. This model does a better job showing whether a quarterback is efficient, aggressive, explosive, mistake-prone, or more of a controlled operator.

The biggest takeaway is simple: quarterback style matters. Some players create value by staying on schedule. Others create value by attacking downfield. Others are harder to trust because the negative plays show up too often. Clustering does not answer every quarterback question, but it gives a cleaner way to see those differences.