The National Football League (NFL) has evolved into a passing-focused league where quarterback performance plays a major role in offensive success. This project analyzes NFL quarterback passing statistics from 2001 to 2023 using data originally collected from Pro Football Reference and accessed through Kaggle. Variables explored in this project include passing yards, touchdowns, interceptions, passing attempts, completion percentage, quarterback age, and team affiliation.
The dataset contains both quantitative variables, such as passing yards and touchdowns, and categorical variables, such as player names and NFL teams. The purpose of this project is to explore relationships between quarterback performance statistics and passing production in the modern NFL using data visualization and statistical modeling techniques in RStudio.
I chose this topic because I enjoy football and sports analytics, and I was interested in seeing how quarterback efficiency and offensive production relate to overall passing success in today’s NFL.
Background Research
Over the last two decades, the NFL has become increasingly focused on passing offenses. Rule changes designed to protect quarterbacks and receivers have contributed to rising passing yard totals and touchdown production across the league. According to Pro Football Reference and NFL statistical reports, modern quarterbacks are attempting more passes and producing higher offensive numbers than previous generations. This shift has made quarterback efficiency and passing performance critical components of team success in the modern NFL.
library(tidyverse)
# A tibble: 6 × 8
Player Tm Age Yds TD Int `Cmp%` Att
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Peyton Manning DEN 37 5477 55 10 68.3 659
2 Drew Brees NOR 32 5476 46 14 71.2 657
3 Tom Brady TAM 44 5316 43 12 67.5 719
4 Patrick Mahomes KAN 27 5250 41 12 67.1 648
5 Tom Brady NWE 34 5235 39 12 65.6 611
6 Drew Brees NOR 37 5208 37 15 70 673
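# Bin quarterback age into four groups used to color the plot
qb_clean <- qb_clean %>%
  mutate(
    Age_Group = case_when(
      Age < 25 ~ "Under 25",
      Age >= 25 & Age < 30 ~ "25-29",
      Age >= 30 & Age < 35 ~ "30-34",
      TRUE ~ "35+"
    ),
    # Order the levels by age so the legend reads youngest to oldest
    Age_Group = factor(Age_Group, levels = c("Under 25", "25-29", "30-34", "35+"))
  )

# Scatterplot of passing touchdowns vs. passing yards, colored by age group
ggplot(qb_clean, aes(x = TD, y = Yds, color = Age_Group)) +
  geom_point(size = 3, alpha = 0.7) +
  labs(
    title = "Relationship Between Passing Touchdowns and Passing Yards",
    subtitle = "NFL Quarterback Statistics from 2001–2023",
    x = "Passing Touchdowns",
    y = "Passing Yards",
    color = "Age Group",
    caption = "Source: Pro Football Reference via Kaggle"
  ) +
  theme_dark()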
This visualization shows a strong positive relationship between passing touchdowns and passing yards among NFL quarterbacks from 2001–2023. Quarterbacks with higher touchdown totals generally also recorded higher passing yard totals. The graph also suggests that quarterbacks in the 30–34 and 35+ age groups appear frequently among the highest-yardage seasons. This may indicate that experience contributes to elite passing production. However, the pattern is based on visual inspection only. It may also partly reflect the fact that quarterbacks who remain starters into their 30s are already among the league's best.
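The age-group observation above comes from looking at the plot. It could also be checked numerically. The approach would be to bin ages with the same cut points, keep the highest-yardage seasons, and count them per age group. The sketch below is a minimal version of that check. The helper `count_age_groups_top_n()` and its `top_n` argument are hypothetical names. The demo uses only the six seasons from the table printed earlier, so the counts illustrate the method rather than answer the question.

```r
library(dplyr)

# The six highest-yardage seasons from the table printed above
# (a small sample for illustration, not the full qb_clean data)
top_seasons <- tibble::tibble(
  Player = c("Peyton Manning", "Drew Brees", "Tom Brady",
             "Patrick Mahomes", "Tom Brady", "Drew Brees"),
  Age    = c(37, 32, 44, 27, 34, 37),
  Yds    = c(5477, 5476, 5316, 5250, 5235, 5208)
)

# Hypothetical helper: bin ages with the same cut points as the plot,
# keep the top_n highest-yardage seasons, and count them by age group
count_age_groups_top_n <- function(data, top_n = 10) {
  data %>%
    mutate(Age_Group = case_when(
      Age < 25 ~ "Under 25",
      Age >= 25 & Age < 30 ~ "25-29",
      Age >= 30 & Age < 35 ~ "30-34",
      TRUE ~ "35+"
    )) %>%
    slice_max(Yds, n = top_n, with_ties = FALSE) %>%
    count(Age_Group, name = "Seasons")
}

age_counts <- count_age_groups_top_n(top_seasons, top_n = 6)
print(age_counts)
```

In the six-row demo, five of the six seasons fall in the 30-34 or 35+ groups. On the full data, the call would be `count_age_groups_top_n(qb_clean, top_n = 25)`. Comparing those counts with each group's share of all seasons would show whether older quarterbacks are actually overrepresented at the top, rather than simply common overall.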
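# Multiple linear regression: passing yards on touchdowns, interceptions,
# pass attempts, and completion percentage
model <- lm(Yds ~ TD + Int + Att + `Cmp%`, data = qb_clean)
summary(model)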
Call:
lm(formula = Yds ~ TD + Int + Att + `Cmp%`, data = qb_clean)
Residuals:
Min 1Q Median 3Q Max
-784.70 -146.20 6.82 132.67 683.30
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1523.6975 126.9105 -12.006 <2e-16 ***
TD 32.0084 1.5076 21.231 <2e-16 ***
Int 2.1712 2.1236 1.022 0.307
Att 5.4985 0.1123 48.956 <2e-16 ***
`Cmp%` 25.7668 2.0957 12.295 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 214.8 on 784 degrees of freedom
Multiple R-squared: 0.9549, Adjusted R-squared: 0.9547
F-statistic: 4150 on 4 and 784 DF, p-value: < 2.2e-16
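As a consistency check on the output above, the adjusted R² and the F-statistic can be recomputed using the standard formulas. The inputs are the printed multiple R², the number of predictors, and the residual degrees of freedom. This is a small sketch using those formulas. The printed R² is rounded to four decimals, so the recomputed values match only approximately.

```r
# Values copied from the summary(model) output above
r_squared <- 0.9549        # Multiple R-squared (rounded to 4 decimals)
p <- 4                     # number of predictors: TD, Int, Att, `Cmp%`
resid_df <- 784            # residual degrees of freedom
n <- resid_df + p + 1      # observations = residual df + predictors + intercept

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

# Overall F-statistic: explained vs. unexplained variation,
# each scaled by its degrees of freedom
f_stat <- (r_squared / p) / ((1 - r_squared) / (n - p - 1))

round(c(n = n, adj_r_squared = adj_r_squared, f_stat = f_stat), 4)
```

These inputs imply the model was fit on 789 rows. The recomputed adjusted R² rounds to 0.9547, matching the output. The F-statistic comes out at about 4149.9, close to the printed 4150; the small gap is due to the rounded R².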
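A multiple linear regression model was used to examine how passing touchdowns, interceptions, passing attempts, and completion percentage predict passing yards among NFL quarterbacks from 2001–2023. The model produced an adjusted R² of 0.9547, meaning that approximately 95% of the variation in passing yards is explained by these four predictors. Passing touchdowns, passing attempts, and completion percentage were all statistically significant predictors (p < 0.001). Holding the other predictors constant, the estimated associations were:

- each additional pass attempt with about 5.5 more passing yards;
- each additional touchdown with about 32 more yards;
- each additional percentage point of completion percentage with about 26 more yards.

Interceptions were not a statistically significant predictor (p = 0.307) once the other variables were included. Overall, the model suggests that passing volume and quarterback efficiency are strongly associated with passing production in the modern NFL. The high R² partly reflects the fact that passing yards accumulate directly from pass attempts and completions.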
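The fitted equation can be applied by hand to make the coefficients concrete. The sketch below plugs one row from the table printed earlier into the coefficient estimates copied from the output. `predict_yards()` is a hypothetical helper. The coefficients are rounded to four decimals, so the result can differ slightly from what `predict(model)` would return.

```r
# Coefficient estimates copied from the summary(model) output above
coefs <- c(
  intercept = -1523.6975,
  TD        = 32.0084,
  Int       = 2.1712,
  Att       = 5.4985,
  Cmp_pct   = 25.7668
)

# Hypothetical helper: predicted yards = intercept + sum(coefficient * value)
predict_yards <- function(td, int, att, cmp_pct) {
  unname(coefs["intercept"] + coefs["TD"] * td + coefs["Int"] * int +
           coefs["Att"] * att + coefs["Cmp_pct"] * cmp_pct)
}

# Peyton Manning's DEN season from the table above:
# 55 TD, 10 Int, 659 attempts, 68.3 completion percentage, 5477 actual yards
manning_pred  <- predict_yards(td = 55, int = 10, att = 659, cmp_pct = 68.3)
manning_resid <- 5477 - manning_pred   # residual = actual minus predicted

# Ten extra attempts with everything else held fixed
extra_att_gain <- predict_yards(td = 55, int = 10, att = 669, cmp_pct = 68.3) - manning_pred

round(c(predicted = manning_pred, residual = manning_resid, gain = extra_att_gain), 3)
```

For this season, the model predicts about 5,642 yards versus the actual 5,477. That is a residual of roughly -165 yards, well within the residual range reported in the summary. Adding ten attempts while holding everything else fixed raises the prediction by 54.985 yards, exactly ten times the `Att` coefficient.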
This project explored quarterback passing performance in the NFL from 2001–2023 using data visualization and statistical modeling in RStudio. The scatterplot showed a strong positive relationship between passing touchdowns and passing yards. The multiple linear regression model showed that passing attempts, touchdowns, and completion percentage are strong predictors of passing yards, while interceptions are not. Overall, the project highlights how quarterback performance continues to shape offensive success in the modern NFL.