Assignment 7

Introduction

In the NFL, quarterback performance is one of the most important factors in a team’s success. Analysts often debate whether high‑yardage quarterbacks are also more efficient, or whether volume and efficiency diverge. Instead of relying on pre‑built datasets, I scraped raw passing statistics directly from CBS Sports to explore the following question:

Do high‑yardage quarterbacks also have higher passer ratings?

Data Collection

The data used in this analysis comes from the CBS Sports NFL passing stats page:

https://www.cbssports.com/nfl/stats/player/passing/nfl/regular/all/?page=

CBS spreads its passing statistics across multiple URLs that follow a consistent pattern using the ?page= parameter. This structure makes the site well suited to web scraping: each page presents the same table with the same variables and only the data differ, so a loop can collect every page.

The scraping took place in a separate .R script, where I wrote a scraping function and looped it over three pages of the site (page values 0, 1, and 2), extracting the passing-stats table from each page.
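The scraping script itself is not included in this report. As a rough sketch of the approach described above, the loop could look like the following. The helper name `scrape_passing_page` is hypothetical, the use of the rvest package is an assumption, and so is the idea that the stats table is the first `<table>` element on each page.

```r
# Base URL from the Data Collection section; the page index is appended to it.
base_url <- "https://www.cbssports.com/nfl/stats/player/passing/nfl/regular/all/?page="

# Build one URL per page index (0, 1, 2).
page_urls <- paste0(base_url, 0:2)

# Hypothetical helper: read one page and return its first HTML table.
# Assumes rvest is installed and that the stats table is the first <table>.
scrape_passing_page <- function(url) {
  page <- rvest::read_html(url)
  rvest::html_table(rvest::html_element(page, "table"))
}

# Not run here because it needs network access:
# raw_pages <- lapply(page_urls, scrape_passing_page)
# raw_stats <- do.call(rbind, raw_pages)
```

Keeping the URL construction separate from the download step makes it easy to check the page list before sending any requests.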

After scraping the data, I discovered that CBS stores player, position, and team in a single combined column. I separated them into three columns and converted the numeric columns from text to numbers. Afterwards I saved the data set as a .csv file.
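The exact layout of the combined CBS column is not shown in this report, so the following is only a sketch of the splitting idea. It assumes a simplified "name position team" layout in which position and team are the last two whitespace-separated tokens. The example strings are made-up placeholders, not real CBS cells.

```r
# Made-up placeholder strings; the real CBS cell layout may differ.
combined <- c("Player One QB AAA", "Player Two WR BBB")

# Assumed pattern: "<name> <position> <team>", where position and team
# are the last two whitespace-separated tokens.
pattern <- "^(.*)\\s+(\\S+)\\s+(\\S+)$"

# Pull each capture group into its own column.
players <- data.frame(
  Player   = sub(pattern, "\\1", combined, perl = TRUE),
  Position = sub(pattern, "\\2", combined, perl = TRUE),
  Team     = sub(pattern, "\\3", combined, perl = TRUE),
  stringsAsFactors = FALSE
)
```

Because the name group is greedy, multi-word player names stay together and only the final two tokens are split off.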

library(tidyverse)  # assumed setup: provides read_csv(), %>%, and the dplyr verbs used below
passing_stats <- read_csv("cbs_passing_stats.csv")
Rows: 100 Columns: 15
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (6): Player, Position, Team, pct_completion_percentage, lng_longest_comp...
dbl (9): gp_games_played, att_pass_attempts, cmp_pass_completions, yds_passi...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Data Wrangling

The data set includes quarterbacks, punters, wide receivers, running backs, and other NFL players, some of whom attempted only one or two passes during the 2025 NFL regular season. To avoid misleading results, I filtered the data to include only players who meet all of the following:

  • Listed position is quarterback (QB)

  • At least 70 pass attempts

  • At least 6 games played

  • A valid passer rating

  • Valid passing yards

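passing_qb <- passing_stats %>%
  mutate(
    # Missing ratings appear as "—" in the data; convert them to NA, then to numeric
    rate_passer_rating = na_if(rate_passer_rating, "—"),
    rate_passer_rating = as.numeric(rate_passer_rating)
  ) %>%
  filter(
    Position == "QB",              # quarterbacks only
    att_pass_attempts >= 70,       # at least 70 pass attempts
    gp_games_played >= 6,          # at least 6 games played
    !is.na(rate_passer_rating),    # valid passer rating
    !is.na(yds_passing_yards)      # valid passing yards
  )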

These filters ensure the analysis reflects true quarterback performance rather than trick plays or extremely small sample sizes.

Analysis

Summary Statistics

passing_qb %>%
  summarise(
    n_qbs = n(),
    avg_yards = mean(yds_passing_yards, na.rm = TRUE),
    avg_td = mean(td_touchdown_passes, na.rm = TRUE),
    avg_int = mean(int_interceptions_thrown, na.rm = TRUE),
    avg_rate = mean(rate_passer_rating, na.rm = TRUE)
  )
# A tibble: 1 × 5
  n_qbs avg_yards avg_td avg_int avg_rate
  <int>     <dbl>  <dbl>   <dbl>    <dbl>
1    42     2673.   18.0    7.76     90.2

The summary statistics above provide an overview of quarterback play:

  • n_qbs - the number of quarterbacks who met the filtering requirements

    • 42 quarterbacks met the criteria
  • avg_yards - the average total passing yards, representing overall production

    • The average quarterback threw for about 2,673 yards in 2025, showing moderate production across the qualified players
  • avg_td - the average number of touchdown passes

    • The average quarterback threw about 18 touchdown passes over the season
  • avg_int - the average number of interceptions thrown

    • The average quarterback threw roughly 8 interceptions over the season
  • avg_rate - the average passer rating, a composite efficiency metric

    • The average passer rating is about 90.2, which aligns with what is typically considered league-average
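For readers unfamiliar with passer rating, the sketch below implements the standard NFL passer rating formula, which combines completion percentage, yards per attempt, touchdown rate, and interception rate. I assume the CBS rating column follows this formula, but the report does not verify that. The function name `passer_rating` and its argument names are my own.

```r
# Each of the four components is limited to the range [0, 2.375].
clamp <- function(x) pmin(pmax(x, 0), 2.375)

# Standard NFL passer rating from raw season (or game) totals.
passer_rating <- function(cmp, att, yds, td, int) {
  comp_part <- clamp((cmp / att - 0.3) * 5)     # completion percentage
  ypa_part  <- clamp((yds / att - 3) * 0.25)    # yards per attempt
  td_part   <- clamp((td / att) * 20)           # touchdown rate
  int_part  <- clamp(2.375 - (int / att) * 25)  # interception rate
  (comp_part + ypa_part + td_part + int_part) / 6 * 100
}

# A stat line that maxes out every component gives the ceiling of the scale.
passer_rating(cmp = 10, att = 10, yds = 125, td = 2, int = 0)  # 158.3333
```

Because every component is capped, the rating runs from 0 to about 158.3, which puts the 90.2 average above roughly in the middle of the scale.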

These statistics provide a baseline for evaluating quarterback performance. By filtering out low-usage players and cleaning non-numeric values in the passer-rating column, the summary reflects season-level quarterback output rather than results skewed by limited-action players. This baseline frames the visualizations and hypothesis test that follow, which focus on whether quarterbacks with more passing yards also have higher passer ratings.

Distribution of Passing Yards

`geom_smooth()` using formula = 'y ~ x'

The plot shows that high-yardage quarterbacks typically play more games, suggesting that the high-yardage group represents full-time starters who stayed healthy and kept their starting position throughout the regular season. This context helps ensure that later comparisons of passer rating are not distorted by unequal playing time.
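The plotting code is not shown in this report. As a minimal sketch, a plot of this kind could be drawn with ggplot2 as follows. The toy data frame and its values are made up for illustration, and `method = "lm"` is an assumption, although it is consistent with the message above, which ggplot2 prints when a smoothing method is supplied without an explicit formula.

```r
library(ggplot2)

# Made-up values for illustration only; not the CBS data.
toy_qb <- data.frame(
  gp_games_played   = c(6, 8, 10, 12, 14, 16, 17, 17),
  yds_passing_yards = c(900, 1500, 2100, 2600, 3300, 3900, 4200, 4600)
)

# Scatter plot of games played against passing yards with a linear trend line.
p <- ggplot(toy_qb, aes(x = gp_games_played, y = yds_passing_yards)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "Games played", y = "Passing yards")

# print(p) draws the plot in an interactive session or a rendered report.
```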

Touchdowns vs Interceptions

`geom_smooth()` using formula = 'y ~ x'

Quarterbacks who throw more touchdowns also tend to throw more interceptions, which reflects higher usage and more aggressive passing. This matters to our main focus because it highlights that high-volume quarterbacks often take on greater responsibility in the offense, setting the stage for why high-yardage quarterbacks may have higher passer ratings.

Passing Yards vs Passer Rating

`geom_smooth()` using formula = 'y ~ x'

This plot addresses how production relates to efficiency. The upward trend tells us that quarterbacks who throw for more yards tend to have higher passer ratings, suggesting that high-yardage quarterbacks are generally more efficient rather than merely high-volume passers.
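An upward trend like this can also be summarized numerically. The sketch below shows one way to do that with a Pearson correlation and a simple linear regression in base R. The toy values are made up for illustration and are not the CBS data, so no results from the real data set are implied.

```r
# Made-up values for illustration only; not the CBS data.
toy_qb <- data.frame(
  yds_passing_yards  = c(1200, 1800, 2500, 3100, 3600, 4000, 4400),
  rate_passer_rating = c(78.5, 84.0, 88.2, 86.9, 95.1, 97.3, 101.8)
)

# Pearson correlation: values near +1 indicate a strong upward linear trend.
r <- cor(toy_qb$yds_passing_yards, toy_qb$rate_passer_rating)

# Simple linear regression; the slope is the expected change in rating per yard.
fit <- lm(rate_passer_rating ~ yds_passing_yards, data = toy_qb)

# Rescale the slope to rating points per 1,000 passing yards for readability.
slope_per_1000 <- unname(coef(fit)["yds_passing_yards"]) * 1000
```

Running the same two calls on `passing_qb` would put a number on the trend that the plot shows visually.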

Top 10 Quarterbacks by Passing Yards

Top 10 Quarterback Passer Ratings

The Top 10 Quarterbacks by Passing Yards and Top 10 Quarterbacks by Passer Rating charts show that many of the league’s top yardage producers also appear among the top passer‑rating quarterbacks. This overlap supports the main question by showing that the quarterbacks who generate the most passing yards are often the same ones who perform most efficiently. While a few high‑yardage players fall outside the top efficiency group, the overall pattern reinforces the idea that strong production and strong passer ratings tend to go together.
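The overlap between the two top-10 lists can be counted directly rather than read off the charts. The following is a sketch of that check in base R. The helper `top_n_players` is hypothetical, and the toy data frame is made up for illustration rather than taken from the CBS data.

```r
# Made-up values for illustration only; not the CBS data.
toy_qb <- data.frame(
  Player             = paste("QB", 1:20),
  yds_passing_yards  = seq(1000, 4800, by = 200),
  rate_passer_rating = 75 + (1:20) + rep(c(6, -6), 10),
  stringsAsFactors   = FALSE
)

# Hypothetical helper: names of the n players with the largest values in `column`.
top_n_players <- function(data, column, n = 10) {
  ranked <- data[order(data[[column]], decreasing = TRUE), ]
  head(ranked$Player, n)
}

top_yards  <- top_n_players(toy_qb, "yds_passing_yards")
top_rating <- top_n_players(toy_qb, "rate_passer_rating")

# Players who appear on both top-10 lists.
overlap <- intersect(top_yards, top_rating)
length(overlap)
```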

Hypothesis Test

To statistically evaluate whether high‑yardage quarterbacks also have higher passer ratings, I created a binary variable identifying quarterbacks in the top 25% of passing yards. I then performed a two‑sample t‑test comparing the passer ratings of high‑yardage quarterbacks to the rest of the league.
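The code for this step is not shown in the report. As a sketch, the binary variable and test could be set up as below. The variable name `high_yards` and the formula match the test output that follows, but the use of `quantile()` for the 75th-percentile cutoff is my assumption, and the toy data are made up rather than taken from the CBS data.

```r
# Made-up values for illustration only; not the CBS data.
toy_qb <- data.frame(
  yds_passing_yards = seq(800, 4900, by = 100)  # 42 evenly spaced totals
)
toy_qb$rate_passer_rating <- 80 + toy_qb$yds_passing_yards / 500 +
  rep(c(-5, 0, 5), 14)

# TRUE for quarterbacks at or above the 75th percentile of passing yards.
cutoff <- quantile(toy_qb$yds_passing_yards, 0.75)
toy_qb$high_yards <- toy_qb$yds_passing_yards >= cutoff

# t.test() runs Welch's unequal-variance test by default (var.equal = FALSE).
result <- t.test(rate_passer_rating ~ high_yards, data = toy_qb)
result$method
```

With 42 quarterbacks this cutoff places 11 players in the TRUE group and 31 in the FALSE group. Welch's version of the test is a reasonable default here because the two groups differ in size and need not share a variance.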


    Welch Two Sample t-test

data:  rate_passer_rating by high_yards
t = -3.4161, df = 22.801, p-value = 0.002385
alternative hypothesis: true difference in means between group FALSE and group TRUE is not equal to 0
95 percent confidence interval:
 -17.817898  -4.373305
sample estimates:
mean in group FALSE  mean in group TRUE 
           87.32258            98.41818 

The t‑test compares passer ratings between quarterbacks below the top 25% in passing yards (FALSE) and those within the top 25% (TRUE). The results show a statistically significant difference:

  • High‑yardage QBs average a passer rating of 98.4

  • Other QBs average a passer rating of 87.3

  • p‑value = 0.002, which is well below the 0.05 threshold

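This means high‑yardage quarterbacks have significantly higher passer ratings: the gap is about 11 rating points, with a 95% confidence interval of roughly 4.4 to 17.8 points. This is strong statistical evidence that production and efficiency rise together in this sample.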

Conclusion

The results of this analysis show a clear connection between quarterback production and efficiency. Across the visualizations and statistical testing, high‑yardage quarterbacks consistently demonstrated higher passer ratings than the rest of the league. The overlap between the top yardage leaders and top efficiency leaders, along with the significant t‑test results, supports the main finding: quarterbacks who throw for more yards tend to have a higher passer rating.