Overview

Quarterback stats can get messy fast. Passing yards and touchdowns tell part of the story, but they do not really explain how a quarterback plays. Two quarterbacks can finish with similar box score numbers and still get there in completely different ways.

That is the point of this project. I used NFL play-by-play data from the 2021 through 2025 regular seasons to group quarterbacks into statistical archetypes. Instead of only looking at traditional stats, this analysis uses efficiency, accuracy, aggressiveness, explosiveness, sacks, and turnovers to get a better picture of quarterback style.

The main model is built at the player-season level. That means a quarterback’s 2021 season and 2024 season are treated separately, which matters because quarterbacks change. Scheme, coaching, supporting cast, injuries, and development can all change what a player looks like from year to year.

To keep the sample meaningful, I only included quarterback seasons with at least 400 dropbacks. The goal is not just to rank quarterbacks from best to worst. The goal is to see what types of quarterback profiles show up when the stats are studied together.

Data and Feature Engineering

Feature Definitions

Quarterback Variables Used in the Analysis
Eight aggregate PbP-based quarterback metrics
Statistic	Definition
EPA per Dropback	Average expected points added on all quarterback dropbacks.
Success Rate	Share of dropbacks considered successful using down and distance thresholds: 40% of yards to go on 1st down, 60% on 2nd down, and 100% on 3rd or 4th down.
CPOE	Completion percentage over expected on pass attempts, in percentage points.
Air Yards per Attempt	Average intended air yards on pass attempts.
Deep Pass Rate	Rate of pass attempts with at least 20 air yards.
Explosive Play Rate	Rate of quarterback dropbacks that gained at least 20 yards.
Sack Rate	Rate of dropbacks ending in a sack.
Turnover Rate	Rate of dropbacks ending in an interception or lost fumble.
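
The per-dropback definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the field names (`down`, `ydstogo`, `yards_gained`, `sack`, `interception`, `fumble_lost`) and the toy plays are assumptions made for the example. The success thresholds are the ones stated in the table, written as integer percentages to avoid floating-point edge cases.

```python
# Sketch only: field names and toy plays are assumed, not taken from the project.
# Share of yards to go (in percent) needed for a "successful" play, by down.
SUCCESS_PCT = {1: 40, 2: 60, 3: 100, 4: 100}

def is_success(down, ydstogo, yards_gained):
    """True when the play gains the required share of the yards to go."""
    return 100 * yards_gained >= SUCCESS_PCT[down] * ydstogo

def dropback_rates(plays):
    """Success, explosive, sack, and turnover rates over a list of dropbacks."""
    n = len(plays)
    return {
        "success_rate": sum(
            is_success(p["down"], p["ydstogo"], p["yards_gained"]) for p in plays
        ) / n,
        "explosive_rate": sum(p["yards_gained"] >= 20 for p in plays) / n,
        "sack_rate": sum(bool(p["sack"]) for p in plays) / n,
        "turnover_rate": sum(
            bool(p["interception"] or p["fumble_lost"]) for p in plays
        ) / n,
    }

# Four hypothetical dropbacks.
plays = [
    {"down": 1, "ydstogo": 10, "yards_gained": 4, "sack": 0, "interception": 0, "fumble_lost": 0},
    {"down": 2, "ydstogo": 6, "yards_gained": 3, "sack": 0, "interception": 0, "fumble_lost": 0},
    {"down": 3, "ydstogo": 3, "yards_gained": 25, "sack": 0, "interception": 0, "fumble_lost": 0},
    {"down": 3, "ydstogo": 8, "yards_gained": -7, "sack": 1, "interception": 0, "fumble_lost": 1},
]
rates = dropback_rates(plays)
print(rates)
```

In the toy input, the first-down play succeeds with exactly 40% of the yards to go, while the second-down play falls short of 60%.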

Sample Overview

Sample Construction Summary
Measure	Value
Regular seasons included	2021–2025
Quarterback dropbacks in raw sample	97,017
Quarterback player-seasons before threshold	568
Quarterback player-seasons used in PCA/clustering	118
Minimum dropback threshold	400
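
The threshold step in the summary above amounts to counting dropbacks per player-season and keeping the ones that reach 400. A minimal sketch, assuming each dropback is a record with hypothetical `season` and `player` fields and using made-up toy counts:

```python
from collections import Counter

MIN_DROPBACKS = 400  # threshold stated in the sample summary

def qualifying_player_seasons(dropbacks, min_dropbacks=MIN_DROPBACKS):
    """Count dropbacks per (season, player) and keep those at or above the threshold."""
    counts = Counter((d["season"], d["player"]) for d in dropbacks)
    return {key: n for key, n in counts.items() if n >= min_dropbacks}

# Hypothetical toy input: the same quarterback in two seasons, plus a low-volume backup.
dropbacks = (
    [{"season": 2021, "player": "QB A"}] * 450
    + [{"season": 2024, "player": "QB A"}] * 410
    + [{"season": 2024, "player": "QB B"}] * 120
)
kept = qualifying_player_seasons(dropbacks)
print(kept)
```

Because the key is the (season, player) pair, the two "QB A" seasons stay separate rows, which matches the player-season setup described in the overview.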

Main Analytical Dataset

Top Quarterback Seasons in the Modeling Sample
Sorted by EPA per dropback, minimum 400 dropbacks
Season	Team	Player	Dropbacks	EPA/DB	Success Rate	CPOE	AY/A	Deep Rate	Explosive Rate	Sack Rate	Turnover Rate
2024	BAL	L.Jackson	494	0.341	51.8%	4.55	8.75	14.4%	11.3%	4.7%	1.2%
2023	SF	B.Purdy	469	0.297	55.0%	5.38	8.25	10.0%	15.4%	6.0%	3.0%
2025	NE	D.Maye	539	0.296	54.5%	10.78	9.13	13.0%	12.4%	8.7%	2.2%
2024	DET	J.Goff	568	0.276	55.0%	5.71	6.36	7.6%	10.7%	5.5%	2.5%
2024	BUF	J.Allen	497	0.262	49.2%	0.84	8.34	15.1%	10.9%	2.8%	1.6%
2022	KC	P.Mahomes	677	0.258	54.6%	3.57	7.28	8.6%	10.8%	3.8%	2.4%
2025	GB	J.Love	461	0.240	50.3%	5.53	8.84	13.4%	10.6%	4.6%	1.7%
2025	LA	M.Stafford	617	0.229	54.6%	1.59	9.14	14.6%	11.7%	3.7%	1.9%
2022	MIA	T.Tagovailoa	421	0.214	50.0%	1.43	9.61	13.5%	11.9%	5.0%	2.4%
2024	MIA	T.Tagovailoa	421	0.203	52.2%	3.78	5.72	6.7%	6.9%	5.0%	2.1%
2021	LA	M.Stafford	634	0.185	52.6%	-0.12	8.48	11.5%	10.3%	4.7%	3.0%
2021	KC	P.Mahomes	687	0.185	53.2%	2.67	7.34	11.1%	8.4%	4.1%	2.5%
2021	TB	T.Brady	740	0.185	52.8%	1.69	8.09	12.3%	10.1%	3.0%	2.2%
2023	MIA	T.Tagovailoa	589	0.185	50.9%	4.50	7.68	10.9%	9.8%	4.9%	2.9%
2023	DAL	D.Prescott	631	0.176	51.5%	3.93	7.76	10.6%	9.8%	6.2%	1.9%
2024	TB	B.Mayfield	612	0.175	54.0%	3.60	6.98	9.2%	9.5%	6.5%	3.1%
2025	DET	J.Goff	614	0.174	48.9%	1.78	6.45	6.4%	10.7%	6.2%	1.8%
2021	DAL	D.Prescott	631	0.172	52.1%	2.31	7.77	11.4%	8.7%	4.8%	2.1%
2021	SF	J.Garoppolo	469	0.166	50.7%	2.62	7.55	7.9%	11.5%	6.2%	3.8%
2025	DAL	D.Prescott	630	0.165	48.4%	2.16	8.06	10.5%	8.4%	4.9%	2.2%

Principal Component Analysis

Variance Explained

PCA Variance Explained
Component	Variance Explained	Cumulative Variance
PC1	40.8%	40.8%
PC2	24.5%	65.3%
PC3	13.4%	78.7%
PC4	10.3%	89.1%
PC5	5.7%	94.7%
PC6	2.8%	97.6%
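
The cumulative column is just a running sum of the per-component shares. The short check below re-adds the rounded values from the table. Small gaps of about 0.1 against the reported column are expected, since the report presumably summed unrounded values (an assumption). With eight input variables there are eight components in total, so whatever is left after PC6 belongs to PC7 and PC8.

```python
from itertools import accumulate

# Rounded variance shares (percent) copied from the table above.
explained = {"PC1": 40.8, "PC2": 24.5, "PC3": 13.4, "PC4": 10.3, "PC5": 5.7, "PC6": 2.8}
reported_cumulative = [40.8, 65.3, 78.7, 89.1, 94.7, 97.6]

# Running sum of the rounded shares.
cumulative = [round(c, 1) for c in accumulate(explained.values())]

# Largest disagreement with the reported column (rounding only).
max_gap = max(abs(c - r) for c, r in zip(cumulative, reported_cumulative))

# Share left over for the last two components.
remaining = round(100 - reported_cumulative[-1], 1)

print(cumulative, max_gap, remaining)
```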

Principal Component Loadings

Principal Component Loadings
Higher absolute values indicate stronger contribution to that component
Variable	PC1	PC2	PC3
EPA per Dropback	-0.530	0.120	-0.049
Success Rate	-0.461	0.301	0.145
CPOE	-0.365	0.195	0.429
Air Yards per Attempt	-0.195	-0.618	0.102
Deep Pass Rate	-0.155	-0.628	-0.094
Explosive Play Rate	-0.410	-0.247	0.221
Sack Rate	0.324	-0.143	0.472
Turnover Rate	0.197	0.005	0.709

PCA Interpretation

How I read the PCA

PC1 is driven mostly by EPA per Dropback, Success Rate, and Explosive Play Rate. In football terms, I read this as the "can you keep the offense on schedule?" axis. Those three variables load negatively while Sack Rate and Turnover Rate load positively, so lower PC1 scores are the better end of this component: those quarterbacks are generally more efficient, more consistent, and less likely to kill drives with sacks or turnovers.

PC2 is driven mostly by Deep Pass Rate and Air Yards per Attempt, with Success Rate a distant third. I read this more as a style axis. Deep Pass Rate and Air Yards per Attempt load negatively, so lower PC2 scores mark quarterbacks who push the ball downfield and chase explosives, while higher scores mark quarterbacks who play a more controlled, underneath game.

The first two components explain 65.3% of the total variance. That does not capture everything about quarterback play, but it gives a useful two-dimensional view of the bigger picture. PCA is helpful here because a lot of these stats overlap with each other. Instead of staring at eight different columns one by one, PCA helps show the main patterns underneath them.
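
The "driven by" readings can be checked directly against the loadings table by ranking variables on absolute loading. The sketch below copies the published (rounded) loadings. The helper names `top_drivers` and `squared_length` are made up for this example. The second helper is a sanity check: each loading vector should have a squared length of about 1, up to rounding.

```python
# Loadings copied from the table above as (PC1, PC2, PC3).
loadings = {
    "EPA per Dropback": (-0.530, 0.120, -0.049),
    "Success Rate": (-0.461, 0.301, 0.145),
    "CPOE": (-0.365, 0.195, 0.429),
    "Air Yards per Attempt": (-0.195, -0.618, 0.102),
    "Deep Pass Rate": (-0.155, -0.628, -0.094),
    "Explosive Play Rate": (-0.410, -0.247, 0.221),
    "Sack Rate": (0.324, -0.143, 0.472),
    "Turnover Rate": (0.197, 0.005, 0.709),
}

def top_drivers(pc_index, n=3):
    """Variables with the largest absolute loading on one component."""
    ranked = sorted(loadings, key=lambda v: abs(loadings[v][pc_index]), reverse=True)
    return ranked[:n]

def squared_length(pc_index):
    """Sum of squared loadings; close to 1 for a unit-length component."""
    return sum(vals[pc_index] ** 2 for vals in loadings.values())

for i, name in enumerate(["PC1", "PC2", "PC3"]):
    print(name, top_drivers(i), round(squared_length(i), 3))
```

The same ranking on PC3 puts Turnover Rate, Sack Rate, and CPOE on top.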

Choosing the Number of Clusters

Based on the elbow plot, I used k = 3 for the final k-means model. This is the point where adding more clusters starts to give less payoff. In other words, the model gets enough separation without overcomplicating the analysis.
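
To make the elbow idea concrete, here is a small pure-Python sketch on synthetic data (three tight, well-separated blobs), not the quarterback data. It runs a basic Lloyd's k-means for k = 1 to 4 and records the within-cluster sum of squares (WCSS). The deterministic farthest-point initialization and the "largest bend" rule used to pick the elbow are assumptions chosen to keep the example reproducible. The project itself read the elbow off a plot.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_point_init(points, k):
    """Deterministic init: start at the first point, then keep adding the
    point farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    return centers

def kmeans_wcss(points, k, iters=50):
    """Run Lloyd's algorithm and return the within-cluster sum of squares."""
    centers = farthest_point_init(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: dist2(p, centers[j]))
            groups[nearest].append(p)
        # Move each center to the mean of its group (keep it if the group is empty).
        new_centers = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(dist2(p, c) for c in centers) for p in points)

# Synthetic data: three blobs of five points each.
offsets = [(-1, -1), (-1, 1), (1, -1), (1, 1), (0, 0)]
blob_centers = [(0, 0), (10, 0), (0, 10)]
points = [(cx + dx, cy + dy) for cx, cy in blob_centers for dx, dy in offsets]

wcss = {k: kmeans_wcss(points, k) for k in range(1, 5)}

# One simple elbow heuristic: the k where the drop in WCSS shrinks the most.
bend = {k: (wcss[k - 1] - wcss[k]) - (wcss[k] - wcss[k + 1]) for k in (2, 3)}
elbow = max(bend, key=bend.get)
print(wcss, elbow)
```

On this toy data WCSS falls steeply up to k = 3 and barely moves at k = 4, which is the "less payoff" pattern described above.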

I also checked silhouette width as a second opinion, but I kept the final choice tied to the elbow method since that was the required approach for this project.

K-Means Clustering

Cluster Membership Summary

Cluster Sizes
Cluster	Quarterback Seasons
1	31
2	49
3	38

Cluster Visualization

Cluster Profiles

Cluster Profile Table
Average statistical profile of each quarterback cluster
Profile	QB Seasons	Avg Dropbacks	EPA/DB	Success Rate	CPOE	AY/A	Deep Rate	Explosive Rate	Sack Rate	Turnover Rate
Cluster 1: Aggressive Vertical Playmakers	31	516	-0.099	42.1%	-1.48	7.68	11.1%	7.4%	7.9%	3.0%
Cluster 2: Efficient Operators	49	544	0.120	48.4%	2.09	8.42	12.3%	9.9%	5.7%	2.5%
Cluster 3: Volatile Pressure Profiles	38	575	0.079	49.0%	2.03	7.00	8.5%	8.0%	5.9%	2.7%

Standardized Cluster Strengths and Weaknesses

Cluster Strengths and Weaknesses
Positive values are above the overall sample average; negative values are below
Profile	EPA/DB	Success	CPOE	AY/A	Deep	Explosive	Sack	Turnover
Cluster 1: Aggressive Vertical Playmakers	-1.22	-1.14	-0.93	-0.10	0.13	-0.69	0.83	0.46
Cluster 2: Efficient Operators	0.58	0.34	0.34	0.78	0.71	0.71	-0.32	-0.24
Cluster 3: Volatile Pressure Profiles	0.24	0.50	0.32	-0.92	-1.02	-0.35	-0.26	-0.06
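
One way to tie this table back to the PCA is to project each cluster's standardized averages onto the PC1 and PC2 loadings. The sketch below does that with the rounded numbers printed in this report. It assumes the PCA was run on the same standardized variables, so a cluster's average score on a component is the dot product of its standardized means with that component's loadings. The results are approximate because both tables are rounded.

```python
# Variable order: EPA/DB, Success, CPOE, AY/A, Deep, Explosive, Sack, Turnover.
pc1 = [-0.530, -0.461, -0.365, -0.195, -0.155, -0.410, 0.324, 0.197]
pc2 = [0.120, 0.301, 0.195, -0.618, -0.628, -0.247, -0.143, 0.005]

# Standardized cluster averages copied from the table above.
centroids = {
    "Cluster 1": [-1.22, -1.14, -0.93, -0.10, 0.13, -0.69, 0.83, 0.46],
    "Cluster 2": [0.58, 0.34, 0.34, 0.78, 0.71, 0.71, -0.32, -0.24],
    "Cluster 3": [0.24, 0.50, 0.32, -0.92, -1.02, -0.35, -0.26, -0.06],
}

def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

# Approximate (PC1, PC2) position of each cluster center.
scores = {name: (dot(z, pc1), dot(z, pc2)) for name, z in centroids.items()}

for name, (s1, s2) in scores.items():
    print(f"{name}: PC1 = {s1:+.2f}, PC2 = {s2:+.2f}")
```

With the signs reported in the loadings table, higher PC1 means weaker efficiency and higher PC2 means a shorter passing profile. Cluster 1 lands highest on PC1 and Cluster 3 lands highest on PC2.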

Cluster Interpretation

Representative Quarterback Seasons by Cluster
These are the seasons closest to the center of each profile
Cluster	Profile	Most Typical Examples
1	Cluster 1: Aggressive Vertical Playmakers	D.Mills 2021, J.Dobbs 2023, C.Stroud 2024
2	Cluster 2: Efficient Operators	J.Love 2023, P.Mahomes 2025, L.Jackson 2023
3	Cluster 3: Volatile Pressure Profiles	J.Burrow 2022, K.Murray 2024, T.Tagovailoa 2021
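
"Closest to the center" can be read as the smallest Euclidean distance between a season's standardized feature vector and the mean vector of its cluster. The sketch below shows that ranking on hypothetical two-feature rows. The season names and values are made up, and the report does not state which distance it used, so Euclidean distance is an assumption.

```python
import math

def closest_to_center(rows, n=3):
    """Return the n row names nearest to the mean vector of all rows."""
    dims = len(next(iter(rows.values())))
    center = [sum(v[i] for v in rows.values()) / len(rows) for i in range(dims)]
    ranked = sorted(rows, key=lambda name: math.dist(rows[name], center))
    return ranked[:n]

# Hypothetical standardized values for the seasons in one cluster.
cluster_rows = {
    "Season A": (0.5, 0.5),
    "Season B": (0.8, 0.2),
    "Season C": (2.0, -1.0),
    "Season D": (0.7, 0.4),
    "Season E": (-0.4, 1.4),
}
print(closest_to_center(cluster_rows))
```

Seasons far from the center, like "Season C" and "Season E" here, still belong to the cluster but are less typical of it.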

Supplemental Multi-Year Total View

The main model looks at quarterback seasons one at a time, but I also wanted a longer view. This section rolls everything up across 2021 through 2025. It is not the main clustering model, but it helps show which quarterbacks sustained strong production over multiple seasons instead of just popping in one year.

Top Quarterbacks, Multi-Year Total View
2021-2025 window, minimum 1,000 dropbacks
Player	Team	Window	Seasons	Dropbacks	EPA/DB	Success Rate	CPOE	AY/A	Deep Rate	Explosive Rate	Sack Rate	Turnover Rate
B.Purdy	SF	2022-2025	4	1,430	0.218	52.7%	3.87	8.04	10.2%	12.7%	5.7%	3.2%
J.Love	GB	2021-2025	5	1,598	0.150	48.4%	1.93	8.65	13.6%	10.0%	4.3%	2.4%
P.Mahomes	KC	2021-2025	5	3,140	0.147	51.2%	2.47	7.10	9.9%	8.6%	4.8%	2.5%
J.Allen	BUF	2021-2025	5	2,838	0.145	49.9%	2.32	8.38	12.5%	9.1%	4.8%	2.8%
T.Tagovailoa	MIA	2021-2025	5	2,255	0.139	50.1%	2.87	7.41	9.4%	8.9%	5.4%	2.8%
J.Goff	DET	2021-2025	5	2,957	0.135	50.6%	1.52	6.62	7.8%	9.9%	5.3%	2.2%
D.Prescott	DAL	2021-2025	5	2,613	0.130	50.2%	1.94	7.93	10.7%	8.8%	5.4%	2.5%
J.Garoppolo	SF	2021-2024	4	1,023	0.121	50.9%	0.38	7.35	8.6%	9.8%	6.3%	3.5%
T.Brady	TB	2021-2022	2	1,498	0.120	51.2%	1.10	7.49	11.1%	8.3%	2.9%	2.1%
J.Burrow	CIN	2021-2025	5	2,589	0.116	50.1%	4.66	7.18	8.6%	7.9%	7.0%	2.5%
M.Stafford	LA	2021-2025	5	2,680	0.115	51.2%	-0.31	8.08	11.3%	9.9%	5.2%	2.5%
L.Jackson	BAL	2021-2025	5	2,099	0.113	48.0%	2.24	8.76	12.6%	10.0%	7.6%	2.7%
J.Hurts	PHI	2021-2025	5	2,418	0.065	45.5%	3.32	8.54	11.9%	9.4%	7.0%	2.6%
J.Herbert	LAC	2021-2025	5	3,042	0.064	47.3%	0.75	7.54	10.5%	8.0%	6.3%	2.2%
B.Nix	DEN	2024-2025	2	1,229	0.061	44.8%	-0.35	7.31	11.9%	8.0%	3.7%	2.4%
D.Carr	LV	2021-2024	4	2,069	0.051	46.5%	1.28	8.27	12.6%	9.1%	5.1%	2.6%
C.Stroud	HOU	2023-2025	3	1,563	0.051	45.0%	-0.44	8.53	10.9%	9.4%	7.2%	2.2%
K.Cousins	MIN	2021-2025	5	2,362	0.047	46.4%	1.18	7.61	10.0%	8.6%	5.6%	2.7%
A.Rodgers	GB	2021-2025	5	2,018	0.040	45.4%	0.25	7.05	11.1%	8.5%	5.7%	2.2%
K.Murray	ARI	2021-2025	5	1,968	0.039	48.0%	1.17	7.12	11.3%	7.5%	6.1%	2.1%
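
When seasons are rolled up, a rate like EPA per dropback is best pooled by weighting each season by its dropbacks, which is the same as dividing total EPA by total dropbacks. The report does not spell out how it pooled, so treat this as an assumption. The sketch uses the three T.Tagovailoa seasons listed in the top-seasons table (2022 to 2024). His other seasons are not listed there, so the result will not match the 0.139 in the table above.

```python
# (season, dropbacks, EPA per dropback) from the top-seasons table.
seasons = [(2022, 421, 0.214), (2023, 589, 0.185), (2024, 421, 0.203)]

total_dropbacks = sum(db for _, db, _ in seasons)

# Dropback-weighted rate: rebuild each season's total EPA, then divide by all dropbacks.
pooled_epa = sum(db * epa for _, db, epa in seasons) / total_dropbacks

# Unweighted mean of the season rates, for comparison.
simple_mean = sum(epa for _, _, epa in seasons) / len(seasons)

print(total_dropbacks, round(pooled_epa, 4), round(simple_mean, 4))
```

The pooled figure sits slightly below the simple mean because the highest-volume season (2023) had the lowest rate of the three.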

Discussion

The biggest takeaway from the clustering model is that quarterbacks are not separated by one magic stat. EPA matters, but so do success rate, CPOE, air yards, explosive plays, sacks, and turnovers. The value of this model is that it looks at all of those things together.

That is why I like the player-season setup. It lets the model treat each season as its own quarterback profile. A player can look one way in 2021 and a completely different way in 2024 depending on scheme, health, protection, receivers, or his own development. A single five-year average would hide a lot of that.

The multi-year table adds a different kind of context. Quarterbacks like B.Purdy, J.Love, P.Mahomes, J.Allen, and T.Tagovailoa rise to the top because they combined efficiency with enough volume over the full sample. That is useful, but it answers a slightly different question. The clustering model is more about style and profile. The multi-year table is more about sustained production.

Overall, this approach shows that quarterback archetypes exist on a spectrum. Some quarterbacks win with efficiency and control. Some win by pushing the ball downfield. Others are more volatile because of sacks, turnovers, or lower consistency. That is the part basic passing totals usually miss.

Practical Implications and Limitations

Why This Matters

This kind of clustering is useful because it gives more context than a normal leaderboard. It can help compare quarterbacks who may have similar raw stats but very different playing styles. That matters for scouting, team building, scheme fit, opponent prep, and even tracking how a quarterback changes from one season to the next.

Limitations

This model still has limits. Play-by-play data tells us what happened, but it does not perfectly separate the quarterback from the offense around him. Offensive line play, receivers, play calling, game script, and injuries all matter. The cluster names also require some football judgment. The data creates the groups, but I am still interpreting what those groups mean.

Conclusion

This analysis used five regular seasons of NFL play-by-play data and built a clustering model around 118 quarterback seasons with at least 400 dropbacks. The model grouped those seasons into three quarterback profiles based on efficiency, accuracy, aggressiveness, explosiveness, sack rate, and turnover rate.

The main point is that quarterback evaluation gets a lot more interesting when the stats are combined instead of viewed one at a time. Passing yards and touchdowns are useful, but they do not fully explain how a quarterback plays. This model does a better job showing whether a quarterback is efficient, aggressive, explosive, mistake-prone, or more of a controlled operator.

The biggest takeaway is simple: quarterback style matters. Some players create value by staying on schedule. Others create value by attacking downfield. Others are harder to trust because the negative plays show up too often. Clustering does not answer every quarterback question, but it gives a cleaner way to see those differences.

Quarterback Archetypes from NFL Play-by-Play, 2021-2025

DJ Barry
