Overview
Quarterback stats can get messy fast. Passing yards and touchdowns
tell part of the story, but they do not really explain how a
quarterback plays. Two quarterbacks can finish with similar box score
numbers and still get there in completely different ways.
That is the point of this project. I used NFL play-by-play data from
2021 through 2025 to group quarterbacks into statistical archetypes.
Instead of only looking at traditional stats, this analysis uses
efficiency, accuracy, aggressiveness, explosiveness, sacks, and
turnovers to get a better picture of quarterback style.
The main model is built at the player-season level.
That means a quarterback’s 2021 season and 2024 season are treated
separately, which matters because quarterbacks change. Scheme, coaching,
supporting cast, injuries, and development can all change what a player
looks like from year to year.
To keep the sample meaningful, I only included quarterback seasons
with at least 400 dropbacks. The goal is not just to
rank quarterbacks from best to worst. The goal is to see what types of
quarterback profiles show up when the stats are studied together.
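The player-season unit and the 400-dropback cutoff can be sketched in a few lines. This is a minimal illustration on made-up data, not the code behind the report; the function name `qualifying_player_seasons` and the toy players are assumptions introduced here.

```python
from collections import Counter

MIN_DROPBACKS = 400  # threshold stated in the text


def qualifying_player_seasons(dropbacks, min_dropbacks=MIN_DROPBACKS):
    """Count dropbacks per (player, season) and keep those at or above the cutoff.

    `dropbacks` is an iterable of (player, season) pairs, one pair per dropback.
    """
    counts = Counter(dropbacks)
    return {key: n for key, n in counts.items() if n >= min_dropbacks}


# Toy data: hypothetical players, not real play-by-play rows.
toy = [("QB A", 2021)] * 450 + [("QB A", 2024)] * 120 + [("QB B", 2024)] * 400
kept = qualifying_player_seasons(toy)
print(kept)  # QB A's 2021 and QB B's 2024 qualify; QB A's 2024 does not
```

Because the key is the (player, season) pair, the same quarterback can qualify in one year and miss the cutoff in another, which is exactly the behavior the player-season setup is meant to allow.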
Data and Feature Engineering
Feature Definitions
Quarterback Variables Used in the Analysis
Eight aggregate PbP-based quarterback metrics

| Statistic | Definition |
| --- | --- |
| EPA per Dropback | Average expected points added on all quarterback dropbacks. |
| Success Rate | Share of dropbacks considered successful using down and distance thresholds: 40% of yards to go on 1st down, 60% on 2nd down, and 100% on 3rd or 4th down. |
| CPOE | Completion percentage over expected on pass attempts. |
| Air Yards per Attempt | Average intended air yards on pass attempts. |
| Deep Pass Rate | Rate of pass attempts with at least 20 air yards. |
| Explosive Play Rate | Rate of quarterback dropbacks that gained at least 20 yards. |
| Sack Rate | Rate of dropbacks ending in a sack. |
| Turnover Rate | Rate of dropbacks ending in an interception or lost fumble. |

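
Several of these definitions are simple enough to write down directly. The sketch below applies the Success Rate thresholds and a few of the rate definitions to a toy season. The dictionary keys (`down`, `yards_to_go`, `yards_gained`, `epa`, `air_yards`, `sack`) and the helper names are hypothetical, chosen for illustration rather than taken from the actual dataset, and a play is treated as a pass attempt whenever `air_yards` is present.

```python
# Percent of yards-to-go a play must gain to count as successful, by down
# (thresholds taken from the Success Rate definition above).
SUCCESS_PCT = {1: 40, 2: 60, 3: 100, 4: 100}


def is_success(down, yards_to_go, yards_gained):
    # Integer arithmetic avoids floating-point edge cases right at the threshold.
    return 100 * yards_gained >= SUCCESS_PCT[down] * yards_to_go


def play(down, yards_to_go, yards_gained, epa, air_yards=None, sack=False):
    """Build one dropback record (hypothetical schema)."""
    return dict(down=down, yards_to_go=yards_to_go, yards_gained=yards_gained,
                epa=epa, air_yards=air_yards, sack=sack)


def season_metrics(plays):
    """Aggregate one player-season from a list of dropback records."""
    n = len(plays)
    attempts = [p for p in plays if p["air_yards"] is not None]
    return {
        "epa_per_dropback": sum(p["epa"] for p in plays) / n,
        "success_rate": sum(
            is_success(p["down"], p["yards_to_go"], p["yards_gained"]) for p in plays
        ) / n,
        "deep_pass_rate": sum(p["air_yards"] >= 20 for p in attempts) / len(attempts),
        "explosive_rate": sum(p["yards_gained"] >= 20 for p in plays) / n,
        "sack_rate": sum(p["sack"] for p in plays) / n,
    }


# Four made-up dropbacks: two successes, one deep attempt, one explosive gain, one sack.
toy_season = [
    play(1, 10, 5, 0.3, air_yards=6),
    play(2, 5, 2, -0.2, air_yards=22),
    play(3, 7, -8, -1.5, sack=True),
    play(3, 3, 25, 2.0, air_yards=15),
]
metrics = season_metrics(toy_season)
print(metrics)
```

Note the different denominators: Deep Pass Rate is divided by pass attempts, while Success Rate, Explosive Play Rate, and Sack Rate are divided by all dropbacks, matching the table.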
Sample Overview
Sample Construction Summary

| Measure | Value |
| --- | --- |
| Regular seasons included | 2021–2025 |
| Quarterback dropbacks in raw sample | 97,017 |
| Quarterback player-seasons before threshold | 568 |
| Quarterback player-seasons used in PCA/clustering | 118 |
| Minimum dropback threshold | 400 |

Main Analytical Dataset
Top Quarterback Seasons in the Modeling Sample
Sorted by EPA per dropback, minimum 400 dropbacks

| Season | Team | Player | Dropbacks | EPA/DB | Success Rate | CPOE | AY/A | Deep Rate | Explosive Rate | Sack Rate | Turnover Rate |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2024 | BAL | L.Jackson | 494 | 0.341 | 51.8% | 4.55 | 8.75 | 14.4% | 11.3% | 4.7% | 1.2% |
| 2023 | SF | B.Purdy | 469 | 0.297 | 55.0% | 5.38 | 8.25 | 10.0% | 15.4% | 6.0% | 3.0% |
| 2025 | NE | D.Maye | 539 | 0.296 | 54.5% | 10.78 | 9.13 | 13.0% | 12.4% | 8.7% | 2.2% |
| 2024 | DET | J.Goff | 568 | 0.276 | 55.0% | 5.71 | 6.36 | 7.6% | 10.7% | 5.5% | 2.5% |
| 2024 | BUF | J.Allen | 497 | 0.262 | 49.2% | 0.84 | 8.34 | 15.1% | 10.9% | 2.8% | 1.6% |
| 2022 | KC | P.Mahomes | 677 | 0.258 | 54.6% | 3.57 | 7.28 | 8.6% | 10.8% | 3.8% | 2.4% |
| 2025 | GB | J.Love | 461 | 0.240 | 50.3% | 5.53 | 8.84 | 13.4% | 10.6% | 4.6% | 1.7% |
| 2025 | LA | M.Stafford | 617 | 0.229 | 54.6% | 1.59 | 9.14 | 14.6% | 11.7% | 3.7% | 1.9% |
| 2022 | MIA | T.Tagovailoa | 421 | 0.214 | 50.0% | 1.43 | 9.61 | 13.5% | 11.9% | 5.0% | 2.4% |
| 2024 | MIA | T.Tagovailoa | 421 | 0.203 | 52.2% | 3.78 | 5.72 | 6.7% | 6.9% | 5.0% | 2.1% |
| 2021 | LA | M.Stafford | 634 | 0.185 | 52.6% | -0.12 | 8.48 | 11.5% | 10.3% | 4.7% | 3.0% |
| 2021 | KC | P.Mahomes | 687 | 0.185 | 53.2% | 2.67 | 7.34 | 11.1% | 8.4% | 4.1% | 2.5% |
| 2021 | TB | T.Brady | 740 | 0.185 | 52.8% | 1.69 | 8.09 | 12.3% | 10.1% | 3.0% | 2.2% |
| 2023 | MIA | T.Tagovailoa | 589 | 0.185 | 50.9% | 4.50 | 7.68 | 10.9% | 9.8% | 4.9% | 2.9% |
| 2023 | DAL | D.Prescott | 631 | 0.176 | 51.5% | 3.93 | 7.76 | 10.6% | 9.8% | 6.2% | 1.9% |
| 2024 | TB | B.Mayfield | 612 | 0.175 | 54.0% | 3.60 | 6.98 | 9.2% | 9.5% | 6.5% | 3.1% |
| 2025 | DET | J.Goff | 614 | 0.174 | 48.9% | 1.78 | 6.45 | 6.4% | 10.7% | 6.2% | 1.8% |
| 2021 | DAL | D.Prescott | 631 | 0.172 | 52.1% | 2.31 | 7.77 | 11.4% | 8.7% | 4.8% | 2.1% |
| 2021 | SF | J.Garoppolo | 469 | 0.166 | 50.7% | 2.62 | 7.55 | 7.9% | 11.5% | 6.2% | 3.8% |
| 2025 | DAL | D.Prescott | 630 | 0.165 | 48.4% | 2.16 | 8.06 | 10.5% | 8.4% | 4.9% | 2.2% |

Principal Component Analysis
Variance Explained

PCA Variance Explained

| Component | Variance Explained | Cumulative Variance |
| --- | --- | --- |
| PC1 | 40.8% | 40.8% |
| PC2 | 24.5% | 65.3% |
| PC3 | 13.4% | 78.7% |
| PC4 | 10.3% | 89.1% |
| PC5 | 5.7% | 94.7% |
| PC6 | 2.8% | 97.6% |

Principal Component Loadings
Principal Component Loadings
Higher absolute values indicate stronger contribution to that component

| Variable | PC1 | PC2 | PC3 |
| --- | --- | --- | --- |
| EPA per Dropback | -0.530 | 0.120 | -0.049 |
| Success Rate | -0.461 | 0.301 | 0.145 |
| CPOE | -0.365 | 0.195 | 0.429 |
| Air Yards per Attempt | -0.195 | -0.618 | 0.102 |
| Deep Pass Rate | -0.155 | -0.628 | -0.094 |
| Explosive Play Rate | -0.410 | -0.247 | 0.221 |
| Sack Rate | 0.324 | -0.143 | 0.472 |
| Turnover Rate | 0.197 | 0.005 | 0.709 |

PCA Interpretation
How I read the PCA
PC1 is mostly driven by EPA per Dropback, Success Rate, and
Explosive Play Rate. In football terms, I read this as the
"can you keep the offense on schedule?" axis.
The efficiency loadings are negative while Sack Rate and Turnover Rate
load positively, so lower PC1 scores mark the better end of this
component: those quarterbacks are generally more efficient, more
consistent, and less likely to kill drives with sacks or turnovers.
PC2 is driven most by Deep Pass Rate and Air Yards per Attempt, with
Success Rate a distant third. I read this more as a style
axis. This is where the model starts separating quarterbacks
who push the ball downfield and chase explosives (lower PC2 scores,
since both depth variables load negatively) from quarterbacks who
play a more controlled, underneath game.
The first two components explain 65.3% of the total
variance. That does not capture everything about quarterback play, but
it gives a useful two-dimensional view of the bigger picture. PCA is
helpful here because a lot of these stats overlap with each other.
Instead of staring at eight different columns one by one, PCA helps show
the main patterns underneath them.
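To make the direction of each axis concrete, a component score is just the weighted sum of a season's standardized stats, with the loadings as weights. The sketch below uses the PC1 and PC2 loadings from the table above; the two standardized profiles are hypothetical, invented only to show how the signs play out. The overall sign of a principal component is arbitrary, so only the relative direction matters.

```python
# Order: EPA/DB, Success Rate, CPOE, AY/A, Deep Rate, Explosive Rate, Sack Rate, Turnover Rate
PC1 = [-0.530, -0.461, -0.365, -0.195, -0.155, -0.410, 0.324, 0.197]
PC2 = [0.120, 0.301, 0.195, -0.618, -0.628, -0.247, -0.143, 0.005]


def component_score(z_scores, loadings):
    """Project a vector of standardized stats onto one principal component."""
    return sum(z * w for z, w in zip(z_scores, loadings))


# Hypothetical season A: one SD better than average in efficiency, accuracy,
# and explosives, one SD fewer sacks and turnovers, average passing depth.
efficient = [1, 1, 1, 0, 0, 1, -1, -1]

# Hypothetical season B: average everywhere except one SD deeper passing.
deep_thrower = [0, 0, 0, 1, 1, 0, 0, 0]

pc1_a = component_score(efficient, PC1)     # about -2.29: strongly negative on PC1
pc2_a = component_score(efficient, PC2)     # about 0.51
pc1_b = component_score(deep_thrower, PC1)  # about -0.35
pc2_b = component_score(deep_thrower, PC2)  # about -1.25: strongly negative on PC2
print(round(pc1_a, 3), round(pc2_a, 3), round(pc1_b, 3), round(pc2_b, 3))
```

With these loadings, the efficient profile lands far toward the negative end of PC1 and the deep thrower toward the negative end of PC2, which is the orientation to keep in mind when reading a PC1 versus PC2 scatter.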
Choosing the Number of Clusters


Based on the elbow plot, I used k = 3 for the final
k-means model. This is the point where adding more clusters starts to
give less payoff. In other words, the model gets enough separation
without overcomplicating the analysis.
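The elbow check can be sketched as follows. This is a minimal pure-Python version of Lloyd's algorithm with plain random initialization, run on made-up 2-D points; it is not the code behind the report's elbow plot, and the function name `kmeans_wcss`, the seeds, and the toy blobs are all illustrative assumptions.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_wcss(points, k, n_init=20, max_iter=100, seed=0):
    """Return the lowest within-cluster sum of squares found over n_init random starts."""
    rng = random.Random(seed)
    best = float("inf")
    for _ in range(n_init):
        centers = rng.sample(points, k)  # start from k distinct data points
        for _ in range(max_iter):
            # Assignment step: each point joins its nearest center.
            groups = [[] for _ in range(k)]
            for p in points:
                nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
                groups[nearest].append(p)
            # Update step: move each center to the mean of its group
            # (an empty group keeps its previous center).
            new_centers = [
                tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[j]
                for j, g in enumerate(groups)
            ]
            if new_centers == centers:
                break
            centers = new_centers
        wcss = sum(min(sq_dist(p, c) for c in centers) for p in points)
        best = min(best, wcss)
    return best


# Toy data: three well-separated blobs of 30 points each (not quarterback data).
data_rng = random.Random(42)
blob_centers = [(0, 0), (5, 5), (0, 5)]
points = [
    (cx + data_rng.gauss(0, 0.5), cy + data_rng.gauss(0, 0.5))
    for cx, cy in blob_centers
    for _ in range(30)
]

wcss = {k: kmeans_wcss(points, k) for k in range(1, 7)}
for k, value in wcss.items():
    print(k, round(value, 1))
```

On well-separated toy data like this, the drop in WCSS from k = 2 to k = 3 should be much larger than the drop from k = 3 to k = 4, which is the bend the elbow method looks for. Real quarterback data is far less cleanly separated, so the bend is a judgment call rather than a sharp corner.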
I also checked silhouette width as a second opinion, but I kept the
final choice tied to the elbow method since that was the required
approach for this project.
K-Means Clustering
Cluster Membership Summary
Cluster Sizes

| Cluster | Quarterback Seasons |
| --- | --- |
| 1 | 31 |
| 2 | 49 |
| 3 | 38 |

Cluster Visualization

Cluster Profiles
Cluster Profile Table
Average statistical profile of each quarterback cluster

| Profile | QB Seasons | Avg Dropbacks | EPA/DB | Success Rate | CPOE | AY/A | Deep Rate | Explosive Rate | Sack Rate | Turnover Rate |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Cluster 1: Aggressive Vertical Playmakers | 31 | 516 | -0.099 | 42.1% | -1.48 | 7.68 | 11.1% | 7.4% | 7.9% | 3.0% |
| Cluster 2: Efficient Operators | 49 | 544 | 0.120 | 48.4% | 2.09 | 8.42 | 12.3% | 9.9% | 5.7% | 2.5% |
| Cluster 3: Volatile Pressure Profiles | 38 | 575 | 0.079 | 49.0% | 2.03 | 7.00 | 8.5% | 8.0% | 5.9% | 2.7% |

Standardized Cluster Strengths and Weaknesses
Cluster Strengths and Weaknesses
Positive values are above the overall sample average; negative values are below

| Profile | EPA/DB | Success | CPOE | AY/A | Deep | Explosive | Sack | Turnover |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Cluster 1: Aggressive Vertical Playmakers | -1.22 | -1.14 | -0.93 | -0.10 | 0.13 | -0.69 | 0.83 | 0.46 |
| Cluster 2: Efficient Operators | 0.58 | 0.34 | 0.34 | 0.78 | 0.71 | 0.71 | -0.32 | -0.24 |
| Cluster 3: Volatile Pressure Profiles | 0.24 | 0.50 | 0.32 | -0.92 | -1.02 | -0.35 | -0.26 | -0.06 |

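
One common way to build a table like this is to z-score each stat across the whole modeling sample and then average within each cluster. The sketch below assumes that approach and uses the sample standard deviation; the report does not state the exact formula, and the values and labels are made up for illustration.

```python
from statistics import mean, stdev


def standardized_cluster_means(values, labels):
    """Average z-score of one stat within each cluster.

    Assumption: z-scores use the overall sample mean and sample SD (n - 1).
    """
    mu, sd = mean(values), stdev(values)
    result = {}
    for cluster in sorted(set(labels)):
        members = [v for v, lab in zip(values, labels) if lab == cluster]
        result[cluster] = (mean(members) - mu) / sd
    return result


# Toy EPA-per-dropback values for six hypothetical seasons in three clusters.
epa = [-0.10, -0.06, 0.12, 0.16, 0.05, 0.07]
cluster = [1, 1, 2, 2, 3, 3]

z_means = standardized_cluster_means(epa, cluster)
print({c: round(z, 2) for c, z in z_means.items()})
```

A useful sanity check under this method is that the size-weighted cluster values sum to roughly zero for every stat. The EPA/DB column above passes that check within rounding: 31 × (-1.22) + 49 × 0.58 + 38 × 0.24 ≈ -0.3 across 118 seasons.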
Cluster Interpretation
Representative Quarterback Seasons by Cluster
These are the seasons closest to the center of each profile

| Cluster | Profile | Most Typical Examples |
| --- | --- | --- |
| 1 | Cluster 1: Aggressive Vertical Playmakers | D.Mills 2021, J.Dobbs 2023, C.Stroud 2024 |
| 2 | Cluster 2: Efficient Operators | J.Love 2023, P.Mahomes 2025, L.Jackson 2023 |
| 3 | Cluster 3: Volatile Pressure Profiles | J.Burrow 2022, K.Murray 2024, T.Tagovailoa 2021 |

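
"Closest to the center" can be read as smallest Euclidean distance from a season's standardized stat vector to its cluster centroid. The sketch below assumes that reading; the report does not spell out the distance measure, and the season labels and two-feature vectors are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return tuple(sum(col) / len(vectors) for col in zip(*vectors))


def most_typical(rows, n=3):
    """Return the n labels whose vectors sit closest to the cluster centroid.

    `rows` maps a season label to its vector of standardized stats.
    """
    center = centroid(list(rows.values()))
    return sorted(rows, key=lambda label: math.dist(rows[label], center))[:n]


# One toy cluster with two standardized features per season.
toy_cluster = {
    "QB A 2021": (0.0, 0.0),
    "QB B 2022": (1.0, 1.0),
    "QB C 2023": (2.0, 2.0),
    "QB D 2024": (1.0, 0.8),
}
print(most_typical(toy_cluster, n=2))
```

The seasons picked this way are the most average members of a group, not the best ones, which is why a cluster's typical examples can look quite different from the names at the top of an EPA leaderboard.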
Supplemental Multi-Year Total View
The main model looks at quarterback seasons one at a time, but I also
wanted a longer view. This section rolls everything up across 2021
through 2025. It is not the main clustering model, but it helps show
which quarterbacks sustained strong production over multiple seasons
instead of just popping in one year.
Top Quarterbacks, Multi-Year Total View
2021-2025 window, minimum 1,000 dropbacks

| Player | Team | Window | Seasons | Dropbacks | EPA/DB | Success Rate | CPOE | AY/A | Deep Rate | Explosive Rate | Sack Rate | Turnover Rate |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| B.Purdy | SF | 2022-2025 | 4 | 1,430 | 0.218 | 52.7% | 3.87 | 8.04 | 10.2% | 12.7% | 5.7% | 3.2% |
| J.Love | GB | 2021-2025 | 5 | 1,598 | 0.150 | 48.4% | 1.93 | 8.65 | 13.6% | 10.0% | 4.3% | 2.4% |
| P.Mahomes | KC | 2021-2025 | 5 | 3,140 | 0.147 | 51.2% | 2.47 | 7.10 | 9.9% | 8.6% | 4.8% | 2.5% |
| J.Allen | BUF | 2021-2025 | 5 | 2,838 | 0.145 | 49.9% | 2.32 | 8.38 | 12.5% | 9.1% | 4.8% | 2.8% |
| T.Tagovailoa | MIA | 2021-2025 | 5 | 2,255 | 0.139 | 50.1% | 2.87 | 7.41 | 9.4% | 8.9% | 5.4% | 2.8% |
| J.Goff | DET | 2021-2025 | 5 | 2,957 | 0.135 | 50.6% | 1.52 | 6.62 | 7.8% | 9.9% | 5.3% | 2.2% |
| D.Prescott | DAL | 2021-2025 | 5 | 2,613 | 0.130 | 50.2% | 1.94 | 7.93 | 10.7% | 8.8% | 5.4% | 2.5% |
| J.Garoppolo | SF | 2021-2024 | 4 | 1,023 | 0.121 | 50.9% | 0.38 | 7.35 | 8.6% | 9.8% | 6.3% | 3.5% |
| T.Brady | TB | 2021-2022 | 2 | 1,498 | 0.120 | 51.2% | 1.10 | 7.49 | 11.1% | 8.3% | 2.9% | 2.1% |
| J.Burrow | CIN | 2021-2025 | 5 | 2,589 | 0.116 | 50.1% | 4.66 | 7.18 | 8.6% | 7.9% | 7.0% | 2.5% |
| M.Stafford | LA | 2021-2025 | 5 | 2,680 | 0.115 | 51.2% | -0.31 | 8.08 | 11.3% | 9.9% | 5.2% | 2.5% |
| L.Jackson | BAL | 2021-2025 | 5 | 2,099 | 0.113 | 48.0% | 2.24 | 8.76 | 12.6% | 10.0% | 7.6% | 2.7% |
| J.Hurts | PHI | 2021-2025 | 5 | 2,418 | 0.065 | 45.5% | 3.32 | 8.54 | 11.9% | 9.4% | 7.0% | 2.6% |
| J.Herbert | LAC | 2021-2025 | 5 | 3,042 | 0.064 | 47.3% | 0.75 | 7.54 | 10.5% | 8.0% | 6.3% | 2.2% |
| B.Nix | DEN | 2024-2025 | 2 | 1,229 | 0.061 | 44.8% | -0.35 | 7.31 | 11.9% | 8.0% | 3.7% | 2.4% |
| D.Carr | LV | 2021-2024 | 4 | 2,069 | 0.051 | 46.5% | 1.28 | 8.27 | 12.6% | 9.1% | 5.1% | 2.6% |
| C.Stroud | HOU | 2023-2025 | 3 | 1,563 | 0.051 | 45.0% | -0.44 | 8.53 | 10.9% | 9.4% | 7.2% | 2.2% |
| K.Cousins | MIN | 2021-2025 | 5 | 2,362 | 0.047 | 46.4% | 1.18 | 7.61 | 10.0% | 8.6% | 5.6% | 2.7% |
| A.Rodgers | GB | 2021-2025 | 5 | 2,018 | 0.040 | 45.4% | 0.25 | 7.05 | 11.1% | 8.5% | 5.7% | 2.2% |
| K.Murray | ARI | 2021-2025 | 5 | 1,968 | 0.039 | 48.0% | 1.17 | 7.12 | 11.3% | 7.5% | 6.1% | 2.1% |

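
When rolling seasons up into a multi-year view, the rates can be pooled (total events over total dropbacks) or simply averaged across seasons, and the two give different answers when season volumes differ. The sketch below shows the pooled version plus the 1,000-dropback filter. It assumes pooling, which the report does not confirm, and the field names and numbers are hypothetical.

```python
MIN_TOTAL_DROPBACKS = 1000  # threshold stated in the table subtitle


def pooled_rate(seasons, event):
    """Total events divided by total dropbacks across all of a player's seasons."""
    total_events = sum(s[event] for s in seasons)
    total_dropbacks = sum(s["dropbacks"] for s in seasons)
    return total_events / total_dropbacks


# Two made-up seasons for one hypothetical quarterback.
toy_seasons = [
    {"dropbacks": 600, "sacks": 30},  # 5.0% sack rate
    {"dropbacks": 400, "sacks": 40},  # 10.0% sack rate
]

pooled = pooled_rate(toy_seasons, "sacks")  # 70 sacks / 1,000 dropbacks
simple_mean = sum(s["sacks"] / s["dropbacks"] for s in toy_seasons) / len(toy_seasons)
qualifies = sum(s["dropbacks"] for s in toy_seasons) >= MIN_TOTAL_DROPBACKS

print(round(pooled, 3), round(simple_mean, 3), qualifies)
```

Pooling weights each season by its volume, so a short injury-shortened year moves the multi-year number less than a full season does. A simple average of season rates would treat both years equally.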
Discussion
The biggest takeaway from the clustering model is that quarterbacks
are not separated by one magic stat. EPA matters, but so do success
rate, CPOE, air yards, explosive plays, sacks, and turnovers. The value
of this model is that it looks at all of those things together.
That is why I like the player-season setup. It lets the model treat
each season as its own quarterback profile. A player can look one way in
2021 and a completely different way in 2024 depending on scheme, health,
protection, receivers, or his own development. A single five-year
average would hide a lot of that.
The multi-year table adds a different kind of context. Quarterbacks
like B.Purdy, J.Love, P.Mahomes, J.Allen, and T.Tagovailoa
rise to the top because they combined efficiency with enough volume over
the full sample. That is useful, but it answers a slightly different
question. The clustering model is more about style and profile. The
multi-year table is more about sustained production.
Overall, this approach shows that quarterback archetypes exist on a
spectrum. Some quarterbacks win with efficiency and control. Some win by
pushing the ball downfield. Others are more volatile because of sacks,
turnovers, or lower consistency. That is the part basic passing totals
usually miss.
Practical Implications and Limitations
Why This Matters
This kind of clustering is useful because it gives more context than
a normal leaderboard. It can help compare quarterbacks who may have
similar raw stats but very different playing styles. That matters for
scouting, team building, scheme fit, opponent prep, and even tracking
how a quarterback changes from one season to the next.
Limitations
This model still has limits. Play-by-play data tells us what
happened, but it does not perfectly separate the quarterback from the
offense around him. Offensive line play, receivers, play calling, game
script, and injuries all matter. The cluster names also require some
football judgment. The data creates the groups, but I am still
interpreting what those groups mean.
Conclusion
This analysis used five regular seasons of NFL play-by-play data and
built a clustering model around 118 quarterback seasons
with at least 400 dropbacks. The model grouped those
seasons into three quarterback profiles based on
efficiency, accuracy, aggressiveness, explosiveness, sack rate, and
turnover rate.
The main point is that quarterback evaluation gets a lot more
interesting when the stats are combined instead of viewed one at a time.
Passing yards and touchdowns are useful, but they do not fully explain
how a quarterback plays. This model does a better job showing whether a
quarterback is efficient, aggressive, explosive, mistake-prone, or more
of a controlled operator.
The biggest takeaway is simple: quarterback style matters. Some
players create value by staying on schedule. Others create value by
attacking downfield. Others are harder to trust because the negative
plays show up too often. Clustering does not answer every quarterback
question, but it gives a cleaner way to see those differences.