IS 607 – Project 2: NFL QB Leaders (Wide → Tidy)

Author

Cai Lin

Introduction

This mini-analysis uses a snapshot of the top five NFL quarterbacks by passing yards, dated 2025-09-28. I start from a single-stat leaderboard, deliberately reshape it into a wide layout (one column per player) to mimic the kind of table often found in the wild, then tidy it back to long form with tidyr. I then identify the passing-yards leader and give a brief interpretation. The main goal is to demonstrate correct wide→tidy transformations and produce readable, publication-quality output.

Code
# Setup
library(readr)
library(dplyr)
library(tidyr)
library(gt)

qb <- read_csv("nfl_qb_passing_leaders_2025_top5.csv", show_col_types = FALSE)

# Keep a tidy version for analysis
qbTidy <- qb %>%
  transmute(
    player,
    team,
    stat,
    yards = value,
    asOfDate
  )

qbTidy |>
  arrange(desc(yards)) |>
  gt() |>
  tab_header(title = "NFL QB Passing Leaders — Tidy Snapshot") |>
  fmt_number(columns = yards, decimals = 0) |>
  cols_label(
    player = "Player",
    team = "Team",
    stat = "Stat",
    yards = "Yards",
    asOfDate = "As of"
  )
NFL QB Passing Leaders — Tidy Snapshot
Player          Team  Stat           Yards  As of
Sam Darnold     SEA   Passing Yards    905  2025-09-28
Justin Herbert  LAC   Passing Yards    860  2025-09-28
Geno Smith      LV    Passing Yards    831  2025-09-28
Daniel Jones    IND   Passing Yards    816  2025-09-28
Dak Prescott    DAL   Passing Yards    800  2025-09-28
Code
# Console preview of the same table, sorted by yards
qbTidy |>
  arrange(desc(yards)) |>
  head(10)
# A tibble: 5 × 5
  player         team  stat          yards asOfDate  
  <chr>          <chr> <chr>         <dbl> <date>    
1 Sam Darnold    SEA   Passing Yards   905 2025-09-28
2 Justin Herbert LAC   Passing Yards   860 2025-09-28
3 Geno Smith     LV    Passing Yards   831 2025-09-28
4 Daniel Jones   IND   Passing Yards   816 2025-09-28
5 Dak Prescott   DAL   Passing Yards   800 2025-09-28
Code
qbWide <- qbTidy |>
  select(stat, player, yards) |>
  pivot_wider(names_from = player, values_from = yards)

qbLong <- qbWide |>
  pivot_longer(
    cols = -stat,
    names_to = "player",
    values_to = "yards"
  ) |>
  left_join(distinct(qbTidy, player, team, asOfDate), by = "player") |>
  relocate(team, .after = player)

qbLong |>
  arrange(desc(yards)) |>
  gt() |>
  tab_header(title = "Tidy Table Recovered from Wide") |>
  fmt_number(columns = yards, decimals = 0) |>
  cols_label(
    player = "Player",
    team = "Team",
    stat = "Stat",
    yards = "Yards",
    asOfDate = "As of"
  )
Tidy Table Recovered from Wide
Stat           Player          Team  Yards  As of
Passing Yards  Sam Darnold     SEA     905  2025-09-28
Passing Yards  Justin Herbert  LAC     860  2025-09-28
Passing Yards  Geno Smith      LV      831  2025-09-28
Passing Yards  Daniel Jones    IND     816  2025-09-28
Passing Yards  Dak Prescott    DAL     800  2025-09-28
Code
# Example analyses
leader <- qbLong |>
  arrange(desc(yards)) |>
  slice_head(n = 1)

summaryStats <- qbLong |>
  summarize(
    avgYards = mean(yards, na.rm = TRUE),
    maxYards = max(yards, na.rm = TRUE),
    minYards = min(yards, na.rm = TRUE),
    spread = maxYards - minYards
  )

leader |>
  select(player, team, yards) |>
  gt() |>
  tab_header(title = "Leader") |>
  fmt_number(columns = yards, decimals = 0)
Leader
player       team  yards
Sam Darnold  SEA     905
Code
summaryStats |>
  gt() |>
  tab_header(title = "Summary of Yards (Top 5)") |>
  fmt_number(everything(), decimals = 0)
Summary of Yards (Top 5)
avgYards  maxYards  minYards  spread
     842       905       800     105
Code
# Save the wide and tidy tables as CSV files in the project folder
write_csv(qbWide, "qb_passing_wide.csv")
write_csv(qbLong, "qb_passing_tidy.csv")

Data import and cleanup

I read in the CSV of the top five NFL quarterbacks by passing yards. I renamed value to yards and kept only the needed columns (player, team, stat, yards, asOfDate) to make the dataset clearer and easier to analyze.
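The import and rename step can be tried without the source file. The sketch below is only an illustration: it assumes the CSV has the columns player, team, stat, value, and asOfDate (the names used in this report), feeds the five rows shown above to read_csv() as literal text, and uses a hypothetical object name, qbDemo. Passing literal text through I() requires readr 2.0 or later.

```r
library(readr)
library(dplyr)

# Literal CSV text standing in for the source file (assumed column layout).
csvText <- "player,team,stat,value,asOfDate
Sam Darnold,SEA,Passing Yards,905,2025-09-28
Justin Herbert,LAC,Passing Yards,860,2025-09-28
Geno Smith,LV,Passing Yards,831,2025-09-28
Daniel Jones,IND,Passing Yards,816,2025-09-28
Dak Prescott,DAL,Passing Yards,800,2025-09-28
"

# I() marks the string as data rather than a file path.
qbDemo <- read_csv(I(csvText), show_col_types = FALSE) |>
  rename(yards = value) |>                      # clearer name for the measure
  select(player, team, stat, yards, asOfDate)   # keep only the needed columns

# Sanity checks before any reshaping: five rows, no missing yards,
# and one row per player.
stopifnot(
  nrow(qbDemo) == 5,
  !anyNA(qbDemo$yards),
  anyDuplicated(qbDemo$player) == 0
)
```

Checking for duplicate players at this point matters because the later pivot_wider() step uses player names as column names.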

Wide format

I reshaped the tidy table into wide form with pivot_wider(), creating one row per stat and one column per player. This demonstrates how data can be stored in a less analysis-friendly format.
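To see why team and asOfDate are dropped before widening, here is a minimal sketch on an inline copy of the five rows (the object names are hypothetical). By default pivot_wider() treats every column that is not named in names_from or values_from as an identifier, so leaving team in produces one row per team rather than one row per stat.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Inline copy of the leaderboard rows shown in this report.
qbDemo <- tribble(
  ~player,          ~team, ~stat,           ~yards,
  "Sam Darnold",    "SEA", "Passing Yards", 905,
  "Justin Herbert", "LAC", "Passing Yards", 860,
  "Geno Smith",     "LV",  "Passing Yards", 831,
  "Daniel Jones",   "IND", "Passing Yards", 816,
  "Dak Prescott",   "DAL", "Passing Yards", 800
)

# Intended wide layout: one row per stat, one column per player.
wideDemo <- qbDemo |>
  select(stat, player, yards) |>
  pivot_wider(names_from = player, values_from = yards)

# Same call with team left in: team becomes an id column, so each team
# gets its own row and most player cells are NA.
wideWithTeam <- qbDemo |>
  pivot_wider(names_from = player, values_from = yards)

dim(wideDemo)      # 1 row, 6 columns (stat + 5 players)
dim(wideWithTeam)  # 5 rows, 7 columns (team + stat + 5 players)
```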

Back to tidy

I then used pivot_longer() to return the data to tidy form, with each row showing one player’s passing yards. I rejoined the team and date fields so the dataset was complete.
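One way to confirm that the round trip loses nothing is to compare the recovered table with the original. The sketch below is an illustration, not part of the original analysis: it rebuilds the five rows inline, leaves out asOfDate for brevity, and uses anti_join() in both directions so that row and column order do not affect the comparison. All object names are hypothetical.

```r
library(dplyr)
library(tidyr)
library(tibble)

original <- tribble(
  ~player,          ~team, ~stat,           ~yards,
  "Sam Darnold",    "SEA", "Passing Yards", 905,
  "Justin Herbert", "LAC", "Passing Yards", 860,
  "Geno Smith",     "LV",  "Passing Yards", 831,
  "Daniel Jones",   "IND", "Passing Yards", 816,
  "Dak Prescott",   "DAL", "Passing Yards", 800
)

# Tidy -> wide: one column per player.
wide <- original |>
  select(stat, player, yards) |>
  pivot_wider(names_from = player, values_from = yards)

# Wide -> tidy, then restore the team column dropped before widening.
recovered <- wide |>
  pivot_longer(cols = -stat, names_to = "player", values_to = "yards") |>
  left_join(distinct(original, player, team), by = "player")

# Rows present in one table but not the other; both should be empty.
keys <- c("player", "team", "stat", "yards")
missingFromRecovered <- anti_join(original, recovered, by = keys)
extraInRecovered     <- anti_join(recovered, original, by = keys)

stopifnot(
  nrow(recovered) == nrow(original),
  nrow(missingFromRecovered) == 0,
  nrow(extraInRecovered) == 0
)
```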

Analysis

From the tidy data I identified the leader in yards and calculated summary statistics (average, max, min, spread).
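As an optional extension of the same idea, the tidy layout also makes per-player comparisons simple. The sketch below is not part of the original analysis: it uses an inline copy of the yardage values, and the column names gapToLeader and shareOfLeader are made up for illustration. It also shows that the displayed average of 842 is the rounded value of 842.4.

```r
library(dplyr)
library(tibble)

qbDemo <- tribble(
  ~player,          ~yards,
  "Sam Darnold",    905,
  "Justin Herbert", 860,
  "Geno Smith",     831,
  "Daniel Jones",   816,
  "Dak Prescott",   800
)

# Distance from the leader, in yards and as a share of the leader's total.
gaps <- qbDemo |>
  mutate(
    gapToLeader   = max(yards) - yards,
    shareOfLeader = yards / max(yards)
  ) |>
  arrange(desc(yards))

# Unrounded summary values behind the table above.
avgYards <- mean(qbDemo$yards)    # 842.4, shown as 842 after rounding
spread   <- diff(range(qbDemo$yards))  # 905 - 800 = 105
```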

Conclusion

Sam Darnold (SEA) leads this snapshot with 905 passing yards as of 2025-09-28. The other four quarterbacks are close behind: the top-five average is about 842 yards, and only 105 yards separate first place from fifth.