NFL QB Passing Data Analysis

Author

Jean Tcheby

Image Source: NFL.com

Introduction

The National Football League (NFL) has evolved into a passing-focused league where quarterback performance plays a major role in offensive success. This project analyzes NFL quarterback passing statistics from 2001 to 2023 using data originally collected from Pro Football Reference and accessed through Kaggle. Variables explored in this project include passing yards, touchdowns, interceptions, passing attempts, completion percentage, quarterback age, and team affiliation.

The dataset contains both quantitative variables, such as passing yards and touchdowns, and categorical variables, such as player names and NFL teams. The purpose of this project is to explore relationships between quarterback performance statistics and passing production in the modern NFL using data visualization and statistical modeling techniques in RStudio.

I chose this topic because I enjoy football and sports analytics, and I was interested in seeing how quarterback efficiency and offensive production relate to overall passing success in today’s NFL.

Background Research

Over the last two decades, the NFL has become increasingly focused on passing offenses. Rule changes designed to protect quarterbacks and receivers have contributed to rising passing yard totals and touchdown production across the league. According to Pro Football Reference and NFL statistical reports, modern quarterbacks are attempting more passes and producing higher offensive numbers than previous generations. This shift has made quarterback efficiency and passing performance critical components of team success in the modern NFL.

library(tidyverse)
Warning: package 'tidyverse' was built under R version 4.5.3
Warning: package 'readr' was built under R version 4.5.3
Warning: package 'dplyr' was built under R version 4.5.3
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.0     ✔ readr     2.2.0
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.1
✔ lubridate 1.9.5     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
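library(readr)  # already attached by tidyverse; loaded explicitly for clarity

# Read the cleaned passing data (this absolute path is specific to the author's machine)
qb_data <- read_csv("C:/Users/14408/Documents/passing_cleaned.csv")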
New names:
• `` -> `...1`
Rows: 2350 Columns: 27
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (2): Player, Tm
dbl (25): ...1, Age, G, GS, Cmp, Att, Cmp%, Yds, TD, TD%, Int, Int%, 1D, Lng...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(qb_data)
# A tibble: 6 × 27
   ...1 Player      Tm      Age     G    GS   Cmp   Att `Cmp%`   Yds    TD `TD%`
  <dbl> <chr>       <chr> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>
1     0 Kurt Warner STL      30    16    16   375   546   68.7  4830    36   6.6
2     1 Peyton Man… IND      25    16    16   343   547   62.7  4131    26   4.8
3     2 Brett Favre GNB      32    16    16   314   510   61.6  3921    32   6.3
4     3 Aaron Broo… NOR      25    16    16   312   558   55.9  3832    26   4.7
5     4 Rich Gannon OAK      36    16    16   361   549   65.8  3828    27   4.9
6     5 Trent Green KAN      31    16    16   296   523   56.6  3783    17   3.3
# ℹ 15 more variables: Int <dbl>, `Int%` <dbl>, `1D` <dbl>, Lng <dbl>,
#   `Y/A` <dbl>, `AY/A` <dbl>, `Y/C` <dbl>, `Y/G` <dbl>, Rate <dbl>, Sk <dbl>,
#   `Yds-s` <dbl>, `Sk%` <dbl>, `NY/A` <dbl>, `ANY/A` <dbl>, Year <dbl>
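# Keep seasons with more than 200 pass attempts and only the columns used below
qb_clean <- qb_data %>%
  filter(Att > 200) %>%
  select(Player, Tm, Age, Yds, TD, Int, `Cmp%`, Att) %>%
  arrange(desc(Yds))  # highest-yardage seasons first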
head(qb_clean)
# A tibble: 6 × 8
  Player          Tm      Age   Yds    TD   Int `Cmp%`   Att
  <chr>           <chr> <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl>
1 Peyton Manning  DEN      37  5477    55    10   68.3   659
2 Drew Brees      NOR      32  5476    46    14   71.2   657
3 Tom Brady       TAM      44  5316    43    12   67.5   719
4 Patrick Mahomes KAN      27  5250    41    12   67.1   648
5 Tom Brady       NWE      34  5235    39    12   65.6   611
6 Drew Brees      NOR      37  5208    37    15   70     673
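# Bin each season by the quarterback's age in that season
qb_clean <- qb_clean %>%
  mutate(
    Age_Group = case_when(
      Age < 25 ~ "Under 25",
      Age >= 25 & Age < 30 ~ "25-29",
      Age >= 30 & Age < 35 ~ "30-34",
      TRUE ~ "35+"
    ),
    # Order the levels by age so the plot legend is not sorted alphabetically
    Age_Group = factor(Age_Group, levels = c("Under 25", "25-29", "30-34", "35+"))
  )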

ggplot(qb_clean, aes(x = TD, y = Yds, color = Age_Group)) +
  geom_point(size = 3, alpha = 0.7) +
  labs(
    title = "Relationship Between Passing Touchdowns and Passing Yards",
    subtitle = "NFL Quarterback Statistics from 2001–2023",
    x = "Passing Touchdowns",
    y = "Passing Yards",
    color = "Age Group",
    caption = "Source: Pro Football Reference via Kaggle"
  ) +
  theme_dark()

This scatterplot shows a strong positive relationship between passing touchdowns and passing yards for quarterback seasons with more than 200 pass attempts from 2001–2023. Quarterbacks with higher touchdown totals generally also recorded higher passing yard totals. Seasons from quarterbacks in the 30–34 and 35+ age groups also appear well represented among the highest-yardage seasons, which suggests that experience may contribute to elite quarterback production. The plot alone does not show how many seasons fall into each age group, so this pattern should be read as suggestive rather than conclusive.
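The two observations above can be checked numerically rather than by eye. The sketch below is one possible way to do that in base R: it computes the Pearson correlation between touchdowns and yards, and the share of each age group among the top seasons by yards. The helper names `td_yds_cor` and `top_age_share` are hypothetical and are not part of the original analysis. The six rows are copied from the `head(qb_clean)` output shown earlier and serve only to make the example runnable; because they are all top seasons, the values they produce say nothing about the full dataset.

```r
# Six rows copied from the head(qb_clean) output above: a toy sample only.
sample_qbs <- data.frame(
  Player = c("Peyton Manning", "Drew Brees", "Tom Brady",
             "Patrick Mahomes", "Tom Brady", "Drew Brees"),
  Age = c(37, 32, 44, 27, 34, 37),
  Yds = c(5477, 5476, 5316, 5250, 5235, 5208),
  TD  = c(55, 46, 43, 41, 39, 37)
)

# Hypothetical helper: Pearson correlation between touchdowns and yards.
td_yds_cor <- function(df) {
  cor(df$TD, df$Yds)
}

# Hypothetical helper: share of each age group among the top `n` seasons by yards.
# The bins mirror the case_when() rules used for Age_Group.
top_age_share <- function(df, n = 50) {
  groups <- cut(df$Age,
                breaks = c(-Inf, 25, 30, 35, Inf),
                right  = FALSE,
                labels = c("Under 25", "25-29", "30-34", "35+"))
  top <- head(groups[order(-df$Yds)], n)
  prop.table(table(top))
}

td_yds_cor(sample_qbs)
top_age_share(sample_qbs, n = 6)
```

With the full data loaded, the same helpers could be called as `td_yds_cor(qb_clean)` and `top_age_share(qb_clean, n = 50)`. Comparing those shares with each age group's share of all qualifying seasons would show whether older quarterbacks are actually over-represented at the top.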

model <- lm(Yds ~ TD + Int + Att + `Cmp%`, data = qb_clean)

summary(model)

Call:
lm(formula = Yds ~ TD + Int + Att + `Cmp%`, data = qb_clean)

Residuals:
    Min      1Q  Median      3Q     Max 
-784.70 -146.20    6.82  132.67  683.30 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -1523.6975   126.9105 -12.006   <2e-16 ***
TD             32.0084     1.5076  21.231   <2e-16 ***
Int             2.1712     2.1236   1.022    0.307    
Att             5.4985     0.1123  48.956   <2e-16 ***
`Cmp%`         25.7668     2.0957  12.295   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 214.8 on 784 degrees of freedom
Multiple R-squared:  0.9549,    Adjusted R-squared:  0.9547 
F-statistic:  4150 on 4 and 784 DF,  p-value: < 2.2e-16

A multiple linear regression model was used to examine how passing touchdowns, interceptions, passing attempts, and completion percentage predict passing yards among qualifying NFL quarterback seasons (more than 200 pass attempts) from 2001–2023. The model produced an adjusted R² value of 0.9547, indicating that approximately 95% of the variation in passing yards is explained by the four predictors. Passing touchdowns, passing attempts, and completion percentage were all statistically significant predictors of passing yards (p < 0.001), while interceptions were not (p = 0.307). Holding the other variables constant, each additional pass attempt is associated with about 5.5 more passing yards and each additional touchdown with about 32 more. Overall, the model suggests that passing volume and quarterback efficiency are strongly associated with passing production in the modern NFL.
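To make the fitted equation concrete, the following sketch applies the reported coefficients to a single season by hand. The coefficients are copied from the `summary(model)` output, and the stat line is Peyton Manning's top-ranked season from the `head(qb_clean)` output. The helper name `predict_yds` is hypothetical and used only for illustration.

```r
# Coefficients copied from the summary(model) output above.
coefs <- c(intercept = -1523.6975, TD = 32.0084, Int = 2.1712,
           Att = 5.4985, Cmp_pct = 25.7668)

# Hypothetical helper: fitted passing yards for one season's stat line.
predict_yds <- function(td, int, att, cmp_pct) {
  unname(coefs["intercept"] +
           coefs["TD"] * td +
           coefs["Int"] * int +
           coefs["Att"] * att +
           coefs["Cmp_pct"] * cmp_pct)
}

# Peyton Manning's season from head(qb_clean):
# 55 TD, 10 Int, 659 attempts, 68.3% completions, 5477 actual yards.
fitted_yds <- predict_yds(td = 55, int = 10, att = 659, cmp_pct = 68.3)
residual   <- 5477 - fitted_yds

round(fitted_yds, 1)  # about 5641.9
round(residual, 1)    # about -164.9
```

The model overestimates this season by roughly 165 yards, which sits comfortably inside the residual range reported above (−784.70 to 683.30). With the fitted `model` object in memory, `predict()` with a `newdata` data frame would normally be used instead of hand-written arithmetic.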

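# Show the four standard lm diagnostic plots in a 2 x 2 grid
old_par <- par(mfrow = c(2, 2))
plot(model)

# Restore the previous layout, then draw residuals vs. fitted values on its own
par(old_par)
plot(model, which = 1)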

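# Total touchdowns per quarterback across qualifying seasons (Att > 200, 2001-2023)
top_qbs <- qb_clean %>%
  group_by(Player) %>%
  summarize(Total_TDs = sum(TD, na.rm = TRUE)) %>%
  arrange(desc(Total_TDs)) %>%
  slice_head(n = 10)  # keep the ten highest totals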
ggplot(top_qbs, aes(x = reorder(Player, Total_TDs),
                    y = Total_TDs,
                    fill = Total_TDs)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Top 10 NFL Quarterbacks by Passing Touchdowns",
    subtitle = "NFL Passing Statistics from 2001–2023",
    x = "Quarterback",
    y = "Total Passing Touchdowns",
    caption = "Source: Pro Football Reference via Kaggle"
  ) +
  theme_minimal() +
  theme(legend.position = "none")

Tableau Visualization

Top 10 NFL Quarterbacks by Passing Touchdowns | Tableau Public

Conclusion

This project explored quarterback passing performance in the NFL from 2001–2023 using statistical modeling and data visualization techniques in RStudio. The scatterplot showed a strong positive relationship between passing touchdowns and passing yards, and the bar chart identified the quarterbacks with the most passing touchdowns across qualifying seasons. The multiple linear regression model showed that passing attempts, touchdowns, and completion percentage are strong predictors of passing yards, while interceptions were not a statistically significant predictor. Overall, the project highlights how quarterback performance continues to shape offensive success in the modern NFL.