Xavier University Final Project

Author

Ryan Zaccour

🏈 Beyond the Box Score: An Analytical Look at NFL Running Back Value and the Fan Experience

As a recent graduate in Business Analytics and Marketing, I am passionate about applying data-driven methods to understand and predict complex real-world phenomena. Few areas are as complex, or as passionately debated, as the National Football League (NFL). This analysis combines two data science techniques, statistical performance modeling and natural language processing, to address two questions facing the league:

  1. Running Back Sustainability - Does the “workload penalty” truly limit the efficiency of elite running backs?
  2. The Fan Factor - How does fan base sentiment influence the non-financial decision-making of an elite free agent?

Part 1: Quantifying the True Value of an NFL Running Back

Research Question: Does an increase in rushing volume (attempts) among the NFL’s leading rushers lead to a decline in per-carry efficiency (Yards Per Attempt), and how do factors like player availability, single-play outliers, and goal-line usage influence this relationship?

A long-running debate in football is whether a team should rely on a “workhorse” running back (RB) or employ a committee approach. Analysts often warn of a “workload penalty,” in which efficiency declines as carries increase because of fatigue and defensive adjustments. To test this hypothesis, I scraped data from CBS Sports for the top 250 rushers through Week 14 of the 2025 NFL season and examined three dimensions: per-carry efficiency, outlier play potential, and goal-line scoring effectiveness.

The data include each player’s name, games played, rushing attempts, rushing yards, rushing yards per game, average yards per rush, rushing touchdowns, and longest rush. From these fields, I created a series of visualizations that compare players across categories such as workload to see how their efficiency and production differ.

To collect the data, I scraped the paginated rushing statistics table on CBS Sports, which lists 50 players per page. Here is the link to the table: https://www.cbssports.com/nfl/stats/player/rushing/nfl/regular/all/?page=

I used a for loop to access and scrape five separate pages of the CBS Sports statistics table, retrieving 250 rows (players) in total. To scrape ethically and avoid overloading the server, I added a three-second sleep delay between pages. Once all the raw data was collected, I combined it into a single data frame and began the cleaning process. I used the janitor package to turn the messy column headers into clear, usable names, then converted all the statistical fields from character to numeric so they could be used in calculations and visuals. Here is a portion of the final cleaned data:
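The paragraph above describes a loop over five pages, a pause between requests, header cleaning, and numeric conversion. The sketch below illustrates that flow in Python rather than the R used for the project, so it is a stand-in and not the original code. Appending the page number to the URL, numbering pages from 1, the `fetch_rows` callable, and the `clean_name` rules are all assumptions made for illustration.

```python
import re
import time
from typing import Callable, Dict, List

# Base URL quoted in the text; appending the page number is an assumption.
BASE_URL = "https://www.cbssports.com/nfl/stats/player/rushing/nfl/regular/all/?page="


def page_urls(n_pages: int = 5) -> List[str]:
    """Build one URL per results page (pages assumed to be numbered from 1)."""
    return [f"{BASE_URL}{page}" for page in range(1, n_pages + 1)]


def scrape_pages(fetch_rows: Callable[[str], List[Dict[str, str]]],
                 n_pages: int = 5,
                 delay_seconds: float = 3.0) -> List[Dict[str, str]]:
    """Collect rows from every page, pausing between requests.

    fetch_rows is a hypothetical callable that downloads one page and returns
    its table rows as dicts; the HTML parsing is left out of this sketch.
    """
    rows = []
    for page_number, url in enumerate(page_urls(n_pages), start=1):
        for row in fetch_rows(url):
            rows.append({**row, "source_page": str(page_number)})
        if page_number < n_pages:
            time.sleep(delay_seconds)  # pause so the server is not hammered
    return rows


def clean_name(header: str) -> str:
    """Turn a messy header such as ' Rushing Yards ' into 'rushing_yards'."""
    return re.sub(r"[^0-9a-z]+", "_", header.strip().lower()).strip("_")


def clean_rows(rows, text_columns=("player_name",)):
    """Rename columns and convert every non-text field from string to float."""
    cleaned = []
    for row in rows:
        new_row = {}
        for header, value in row.items():
            name = clean_name(header)
            if name in text_columns:
                # Collapse embedded newlines and runs of spaces in names.
                new_row[name] = " ".join(value.split())
            else:
                new_row[name] = float(value.replace(",", ""))
        cleaned.append(new_row)
    return cleaned
```

Passing the downloader in as a function keeps the loop testable without touching the network: `clean_rows(scrape_pages(my_fetcher))` would yield one cleaned dict per player, where `my_fetcher` is whatever page parser is available.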

# A tibble: 6 × 9
  Player_Name            Games_Played Rushing_Attempts Rushing_Yards Rushing_YPG
  <chr>                         <dbl>            <dbl>         <dbl>       <dbl>
1 "J. Taylor\n         …           13              247          1356       104. 
2 "J. Cook\n           …           13              249          1308       101. 
3 "D. Achane\n         …           13              193          1126        86.6
4 "B. Robinson\n       …           13              215          1081        83.2
5 "J. Gibbs\n          …           13              187          1062        81.7
6 "D. Henry\n          …           13              222          1025        78.8
# ℹ 4 more variables: AVG_Yards_Per_Rush <dbl>, Rushing_Tds <dbl>,
#   Longest_Rush <dbl>, source_page <dbl>

1. Workload vs. Efficiency: Defying the Penalty

The first step was to determine the relationship between rushing attempts (Workload) and Average Yards Per Rush (Efficiency).

The scatter plot reveals that the theoretical workload penalty is largely absent among the NFL’s top performers: the trend line is nearly flat and sits just below the 5 yards per attempt (YPA) mark.

This suggests that for the top 250 rushers, increasing volume is not associated with a significant decline in per-carry efficiency. While low-volume runners show extreme variance and produce the outliers, high-volume runners cluster tightly around a reliable, consistent 4-5 YPA.
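For readers who want to see how a trend line like this is obtained, here is a minimal least-squares sketch in Python (not the R code behind the chart). It uses only the six players shown in the table excerpt earlier, so it illustrates the mechanics and cannot reproduce the 250-player result. The function name `fit_line` is an assumption.

```python
# (rushing attempts, rushing yards) for the six players in the table excerpt
SAMPLE = {
    "J. Taylor": (247, 1356),
    "J. Cook": (249, 1308),
    "D. Achane": (193, 1126),
    "B. Robinson": (215, 1081),
    "J. Gibbs": (187, 1062),
    "D. Henry": (222, 1025),
}


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


attempts = [a for a, _ in SAMPLE.values()]
ypa = [yards / a for a, yards in SAMPLE.values()]  # yards per attempt

slope, intercept = fit_line(attempts, ypa)
print(f"change in YPA per extra carry: {slope:.4f}")
print(f"intercept: {intercept:.2f}")
```

A slope close to zero is what a "nearly flat" trend line means in practice: each additional carry changes the expected YPA by only a tiny amount.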

Light Workload vs. Medium Workload vs. Heavy Workload

To confirm this, I segmented players into workload tiers. The box plot reinforced the finding: the Heavy Workload (150+ Attempts) category had the highest average YPA and the tightest distribution.
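The tiering step can be sketched as follows in Python. Only the 150-attempt cutoff for the Heavy tier comes from the text; the `medium_min=75` boundary between Light and Medium is a placeholder assumption, as are the function names.

```python
from statistics import mean


def workload_tier(attempts, medium_min=75, heavy_min=150):
    """Label a player by carries. heavy_min=150 is from the text;
    medium_min=75 is a hypothetical cutoff chosen for illustration."""
    if attempts >= heavy_min:
        return "Heavy"
    if attempts >= medium_min:
        return "Medium"
    return "Light"


def mean_ypa_by_tier(players):
    """players: iterable of (attempts, yards). Returns {tier: mean YPA}."""
    tiers = {}
    for attempts, yards in players:
        tiers.setdefault(workload_tier(attempts), []).append(yards / attempts)
    return {tier: mean(values) for tier, values in tiers.items()}


# Two rows from the table excerpt plus two made-up low-volume rows.
example = [(247, 1356), (222, 1025), (100, 410), (12, 95)]
print(mean_ypa_by_tier(example))
```

A box plot per tier shows the spread as well as the mean, which is why it can reveal the tighter distribution of the Heavy group that a single average would hide.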

This is a critical finding: high-volume backs are not just efficient; they are predictably and reliably efficient. The instability and extreme outliers are confined to the low-volume group, which suggests that for teams seeking a reliable foundation for their running game, trusting a high-volume runner provides a consistent, high floor of production.

2. The Influence of Outlier Plays

Does a massive, game-breaking run artificially inflate a player’s seasonal efficiency?

By analyzing the longest rush of the season, we can separate reliable consistency from explosive, game-breaking ability. While there is no strong correlation between a player’s longest rush and their overall YPA, the visual confirms that a single outlier can skew the season statistics of low-volume backs.

Players with low total attempts (like Emari Demercado) saw their YPA potentially inflated by a single 71-yard run. Elite players such as Jonathan Taylor demonstrate the rare combination of high day-to-day efficiency and superior outlier potential (a league-leading 83-yard rush). This analysis allows decision-makers to distinguish between a “flash-in-the-pan” player and one who offers both stability and explosion.
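One way to see how much a single long run moves a season average is to recompute YPA with that carry removed. The sketch below does this in Python. The high-volume figures (247 attempts, 1,356 yards, 83-yard long) come from the text and table. The low-volume line of 20 attempts and 140 yards is hypothetical, since that player's season totals are not shown in this write-up.

```python
def ypa(yards, attempts):
    """Season yards per attempt."""
    return yards / attempts


def ypa_without_longest(yards, attempts, longest):
    """YPA after dropping the single longest carry of the season."""
    if attempts < 2:
        raise ValueError("need at least two carries to drop one")
    return (yards - longest) / (attempts - 1)


# High-volume back: figures from the table and text above.
print(round(ypa(1356, 247), 2))                      # 5.49
print(round(ypa_without_longest(1356, 247, 83), 2))  # 5.17

# Hypothetical low-volume back whose long run is 71 yards.
print(round(ypa(140, 20), 2))                        # 7.0
print(round(ypa_without_longest(140, 20, 71), 2))    # 3.63
```

The high-volume average barely moves when the 83-yard run is removed, while the hypothetical low-volume average is cut nearly in half, which is the skew described above.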

3. Durability and Goal-Line Effectiveness

The most valuable RBs are those who sustain high performance over a full season and convert near the goal line.

Filtering for players with full availability (13 games played) and a medium-to-heavy workload (minimum 150 attempts) reveals the “sweet spot” of value. I used 13 games because, although the data covers 14 weeks, every team has had a bye week, so no player has appeared in more than 13 games. The 150-carry minimum means a player is averaging 11+ rushes per game, which signals a starter. Players at the top, like De’Von Achane and Jahmyr Gibbs, are the ideal targets: they combine elite efficiency with a heavy workload, defying the theoretical workload penalty through exceptional durability and talent.
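The filter described above can be sketched in Python as follows, using the six players from the table excerpt. The field names and the `sweet_spot` helper are assumptions for illustration and not the project's R code.

```python
PLAYERS = [
    {"player": "J. Taylor", "games": 13, "attempts": 247, "yards": 1356},
    {"player": "J. Cook", "games": 13, "attempts": 249, "yards": 1308},
    {"player": "D. Achane", "games": 13, "attempts": 193, "yards": 1126},
    {"player": "B. Robinson", "games": 13, "attempts": 215, "yards": 1081},
    {"player": "J. Gibbs", "games": 13, "attempts": 187, "yards": 1062},
    {"player": "D. Henry", "games": 13, "attempts": 222, "yards": 1025},
]


def sweet_spot(players, games_required=13, min_attempts=150):
    """Keep fully available, high-usage backs and rank them by YPA."""
    qualified = [
        dict(p, ypa=p["yards"] / p["attempts"])
        for p in players
        if p["games"] == games_required and p["attempts"] >= min_attempts
    ]
    return sorted(qualified, key=lambda p: p["ypa"], reverse=True)


# 150 carries over 13 games is the "11+ rushes per game" threshold.
print(round(150 / 13, 1))  # 11.5

for p in sweet_spot(PLAYERS):
    print(f'{p["player"]:<12} {p["attempts"]:>4} att  {p["ypa"]:.2f} YPA')
```

Among these six rows, D. Achane and J. Gibbs rank first and second by YPA, which matches the players named above.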

Goal-line usage is a function of trust and talent. By isolating players with at least 150 rushing attempts, we identify who is trusted near the end zone. Jonathan Taylor (16 TDs) leads the league by a wide margin, solidifying his status as the elite performer who combines high volume with top-tier efficiency and dominant scoring. Next in line are players like Jahmyr Gibbs, Josh Jacobs, and Derrick Henry. All of these players have 10+ TDs, showing they are highly effective in the red zone.

Part 1 Conclusion

The research concludes that the “workload penalty” is a myth among elite NFL rushers. High volume is not a detriment but rather a correlate of reliability and high consistency. The most valuable running backs are those who combine high volume with high YPA, demonstrating exceptional talent and durability, proving they are “worth it” long-term investments.

Part 2: The Fan Factor: Analyzing Emotional Sentiment in Free Agency

Research question: If an elite running back is considering both the San Francisco 49ers and the Pittsburgh Steelers in free agency, which team’s fan base expresses a sentiment profile that suggests a more stable and positive environment?

For an elite free agent, the decision to sign is not purely financial; the team’s culture and environment play a major role. I used Natural Language Processing on fan comments that I scraped from the 49ers’ and Steelers’ subreddits to quantify the emotional pulse of each fan base.

The data lives in a single, non-tidy CSV file named nfl_url_master.csv that contains Reddit comments about the 49ers and Steelers, collected from their respective subreddits. I parsed URLs on 10 different pages for each team and then combined all of the data into this one CSV. The key columns used were the fan’s comment and the team (Niners or Steelers). Below are a few of the 1,981 rows of fan comments I scraped:

1. Fan Base Emotional Profile

Using the NRC Emotion Lexicon, I calculated the relative frequency of core emotions for both fan bases, filtering out the generic “positive” and “negative” labels to focus on specific emotional states (Trust, Joy, Fear, Anger, etc.).
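The calculation can be sketched in Python as below. The real NRC lexicon is a large word-to-emotion table that has to be obtained separately, so `TOY_LEXICON` here is a tiny invented stand-in with made-up entries, and the example comments are invented as well. Only the logic matches the description: tag words, drop “positive” and “negative”, then divide each emotion count by the total.

```python
import re
from collections import Counter

# Invented stand-in for the NRC lexicon: word -> emotion labels.
# These entries are illustrative only and are NOT taken from NRC.
TOY_LEXICON = {
    "win": ["joy", "positive"],
    "hope": ["anticipation", "joy", "positive"],
    "loyal": ["trust", "positive"],
    "awful": ["anger", "fear", "negative"],
}


def emotion_profile(comments, lexicon, exclude=("positive", "negative")):
    """Relative frequency of each emotion across a list of comments."""
    counts = Counter()
    for comment in comments:
        for word in re.findall(r"[a-z']+", comment.lower()):
            for emotion in lexicon.get(word, []):
                if emotion not in exclude:
                    counts[emotion] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {emotion: n / total for emotion, n in counts.items()}


# Made-up comments, one list per team.
comments_by_team = {
    "Niners": ["Hope we win again", "Loyal since day one"],
    "Steelers": ["That was awful", "Still loyal though"],
}
for team, comments in comments_by_team.items():
    print(team, emotion_profile(comments, TOY_LEXICON))
```

Using relative rather than raw frequencies keeps the comparison fair when one subreddit contributes more comments than the other.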

The data strongly suggests the San Francisco 49ers offer a more emotionally positive and stable environment. Their fans express significantly higher levels of Joy, Anticipation, and Surprise, indicating widespread optimism and positive momentum.

The Steelers, despite having the highest score in fundamental Trust (signifying loyalty), also express dramatically higher levels of Anger and Fear. This indicates that their loyalty is currently mixed with significant frustration and turmoil.

A free agent would find a less volatile, more positive environment with the 49ers.

2. Sentiment Trend Over Time

To assess the stability of these emotions, I tracked the daily average sentiment score for both teams over the last two months.
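A daily average like this is a group-by over team and date. The sketch below assumes each comment already carries a numeric sentiment score. How the score itself is computed is not specified here, so the record layout, function name, and example values are all invented for illustration.

```python
from collections import defaultdict
from statistics import mean


def daily_average(records):
    """records: iterable of (team, 'YYYY-MM-DD', score).

    Returns {(team, date): mean score}, ordered by team and then date.
    """
    buckets = defaultdict(list)
    for team, day, score in records:
        buckets[(team, day)].append(score)
    return {key: mean(scores) for key, scores in sorted(buckets.items())}


# Made-up scores purely to show the shape of the input and output.
records = [
    ("Niners", "2025-11-15", 1),
    ("Niners", "2025-11-15", 3),
    ("Niners", "2025-11-16", 2),
    ("Steelers", "2025-11-15", -2),
    ("Steelers", "2025-11-16", -1),
]
for (team, day), avg in daily_average(records).items():
    print(team, day, avg)
```

ISO-formatted date strings sort chronologically, so the resulting series can be plotted directly as one line per team.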

The trend lines show a critical divergence since mid-November:

  • 49ers (Red): Display a clear, sustained upward trend, showing steadily rising fan morale and optimism about the team’s direction.

  • Steelers (Blue): Show a distinct decline into negative territory, followed by a sudden increase in December.

This chronological analysis suggests that the Steelers’ environment is currently more volatile and high-pressure, while the 49ers’ steady positive trend signals a more consistently supportive environment for a new player.

3. Unique Brand Attributes

Using the “term frequency method”, I identified the most distinctive words in each fan base’s discussion.
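One common way to surface words that are distinctive to one group is TF-IDF (term frequency weighted by inverse document frequency), treating each team's combined comments as one document. Whether this is the exact weighting used here is an assumption; the sketch below only illustrates the idea on invented text, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def tf_idf(docs):
    """docs: {name: text}. Returns {name: {word: tf-idf score}}."""
    counts = {
        name: Counter(re.findall(r"[a-z']+", text.lower()))
        for name, text in docs.items()
    }
    n_docs = len(docs)
    # Number of documents each word appears in.
    doc_freq = Counter(word for c in counts.values() for word in c)
    scores = {}
    for name, c in counts.items():
        total = sum(c.values())
        scores[name] = {
            word: (n / total) * math.log(n_docs / doc_freq[word])
            for word, n in c.items()
        }
    return scores


def top_terms(scores, name, k=3):
    """The k highest-scoring words for one document."""
    return sorted(scores[name], key=scores[name].get, reverse=True)[:k]


# Invented text, one combined "document" per fan base.
docs = {
    "Niners": "faithful fans love the team faithful",
    "Steelers": "the team is mediocre mediocre garbage",
}
scores = tf_idf(docs)
print(top_terms(scores, "Niners", 1))
print(top_terms(scores, "Steelers", 1))
```

Words that both fan bases use, such as "team" in this toy input, receive a score of zero, so only vocabulary unique to one side rises to the top.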

This reveals that the 49ers’ distinctive terms are overwhelmingly positive and deeply committed, with “faithful” (from their motto “Faithful to the Bay”) being the most distinctive. This signals an enduring, loyal, and supportive emotional foundation.

The Steelers’ distinctive terms are strongly negative, including “mediocrity”, “mediocre”, “lack”, and “garbage”. This indicates the fan base is preoccupied with perceived shortcomings and painful discussion of the quality of recent play.

Implications for Team Management

  1. Running Back Value: The high-volume workload penalty is a non-factor for elite RBs. Their most valuable assets are sustained efficiency, high-impact outlier potential, durability, and the ability to convert goal-line opportunities.

  2. Free Agent Environment: Based purely on the current emotional pulse of the fan bases, an elite free agent would find a more stable, less volatile, and overwhelmingly positive environment with the San Francisco 49ers than with the Pittsburgh Steelers during the 2025 season. While the Steelers possess a higher degree of fundamental trust, the high current levels of fan anger and fear signal a more challenging workplace for a new player.

My analysis, combining sports performance statistics with sentiment analysis, demonstrates the power of data science to provide actionable intelligence in the United States’ most competitive professional league.

Thank you for reading and for your interest in the data-driven future of NFL analysis!