In the NFL, fans and analysts often say that “point differential tells the truth,” meaning that how much a team outscores its opponents is a better indicator of team strength than its win–loss record alone. This idea is widely discussed in football analytics because point differential captures not just whether a team wins, but how convincingly it wins.
Overall Question: How strongly is point differential related to win percentage for NFL teams from 2019–2023, and does this relationship remain consistent across seasons?
This question is interesting because it tests a core analytics belief using real multi‑season data. If point differential is strongly tied to winning, it supports the idea that teams with unusually high or low records relative to their scoring margins may regress in future seasons. If the relationship varies across seasons, that may indicate changes in league parity, scoring environment, or competitive balance.
To answer this question, I collected team‑level NFL regular‑season standings data from FootballDB, a publicly accessible website that provides HTML tables of wins, losses, points scored, and points allowed for every team.
I wrote a dedicated scraping function in R that:
Visits each season’s standings page on FootballDB
Extracts the HTML table containing team results
Cleans and formats the data
Adds derived variables such as win percentage and point differential
The function loops through all seasons from 2019 to 2023 and returns the results as one combined dataset.
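The steps above can be sketched roughly as follows. This is not the original scraper: the URL pattern, the raw column names (`Team`, `W`, `L`, `T`, `PF`, `PA`), and all function names are assumptions made for illustration, and the derived-variable step is left out to keep the sketch short. Fetching and parsing rely on the rvest package.

```r
# Sketch only: the URL pattern and raw column names are assumptions,
# not details taken from the original scraping function.
standings_url <- function(season) {
  paste0("https://www.footballdb.com/standings/index.html?lg=NFL&yr=", season)
}

# Visit one season's page and extract the first HTML table as a data frame.
# rvest is called with :: so nothing is fetched until this function runs.
fetch_standings <- function(season) {
  page <- rvest::read_html(standings_url(season))
  as.data.frame(rvest::html_table(rvest::html_element(page, "table")))
}

# Keep the needed columns, coerce counts to numeric, and drop rows that
# are not teams (for example, header rows repeated inside the table).
clean_standings <- function(raw, season) {
  to_num <- function(x) suppressWarnings(as.numeric(as.character(x)))
  out <- data.frame(
    team           = trimws(as.character(raw[["Team"]])),
    wins           = to_num(raw[["W"]]),
    losses         = to_num(raw[["L"]]),
    ties           = to_num(raw[["T"]]),
    points_for     = to_num(raw[["PF"]]),
    points_against = to_num(raw[["PA"]]),
    season         = season,
    stringsAsFactors = FALSE
  )
  out[!is.na(out$wins), , drop = FALSE]
}

# Loop over seasons and stack the cleaned tables. The `fetch` argument
# can be swapped for a stub, which makes the loop testable offline.
scrape_seasons <- function(seasons = 2019:2023, fetch = fetch_standings) {
  per_season <- lapply(seasons, function(s) clean_standings(fetch(s), s))
  do.call(rbind, per_season)
}
```

Calling `scrape_seasons()` with its defaults would request five pages from the site; the result could then be written once with `write.csv(..., row.names = FALSE)` so later analysis reads the static file instead of scraping again.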
The scraped dataset was saved as a static CSV file and uploaded to GitHub, where it can be imported into R using a direct raw link.
Data Wrangling
The wrangling step defines the variables used in the plots:
games = total games played
win_pct = wins divided by games
point_diff = points scored minus points allowed
season is converted to a factor for plotting
This prepares the dataset for all visualizations.
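A minimal sketch of these definitions with dplyr is shown here. The column names (`wins`, `losses`, `ties`, `points_for`, `points_against`, `season`) match the imported CSV, but the three rows are made up for illustration, and computing `games` as wins plus losses plus ties is an assumption about how "total games played" was derived.

```r
library(dplyr)

# Made-up rows standing in for the imported CSV; these are not real standings.
toy_raw <- data.frame(
  team           = c("Team A", "Team B", "Team C"),
  season         = c(2019, 2019, 2020),
  wins           = c(12, 7, 4),
  losses         = c(4, 8, 12),
  ties           = c(0, 1, 0),
  points_for     = c(450, 360, 290),
  points_against = c(320, 365, 410)
)

toy_clean <- toy_raw %>%
  mutate(
    games      = wins + losses + ties,         # assumed definition of games played
    win_pct    = wins / games,                 # wins divided by games
    point_diff = points_for - points_against,  # points scored minus points allowed
    season     = factor(season)                # discrete season for plotting
  )
```

Applying the same `mutate()` call to the imported data would produce a table shaped like `nfl_clean`. Note that official NFL standings count a tie as half a win, so `wins / games` can differ slightly from the published percentage for teams with ties.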
library(tidyverse)
Rows: 200 Columns: 15
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (6): team, X8, X9, X10, X11, X12
dbl (9): wins, losses, ties, pct, points_for, points_against, season, games,...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
This plot shows how strongly point differential predicts win percentage. Each point is one team in one season, and the dark red line is the fitted linear trend with its confidence band. A steep upward slope means that teams with higher point differentials win more often, which directly tests the idea that “point differential tells the truth.”
ggplot(nfl_clean, aes(point_diff, win_pct)) +
  geom_point(alpha = 0.7, color = "steelblue") +
  geom_smooth(method = "lm", se = TRUE, color = "darkred") +
  labs(
    title = "Point Differential vs Win Percentage (NFL 2019–2023)",
    x = "Point Differential",
    y = "Win Percentage"
  )
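The trend line drawn by `geom_smooth(method = "lm")` is an ordinary least-squares fit, so the same fit can be inspected numerically with `lm()`. The sketch below uses made-up team-seasons rather than the scraped data, so its numbers say nothing about the NFL; it only shows how the slope and R² could be read off.

```r
# Made-up team-seasons for illustration only; not the scraped data.
toy <- data.frame(
  point_diff = c(-150, -80, -20, 10, 60, 140),
  win_pct    = c(0.19, 0.31, 0.44, 0.50, 0.62, 0.81)
)

# Same model that geom_smooth(method = "lm") fits behind the plot.
fit <- lm(win_pct ~ point_diff, data = toy)

# Change in win percentage per additional point of differential.
slope <- unname(coef(fit)["point_diff"])

# Share of the variance in win percentage explained by point differential.
r_squared <- summary(fit)$r.squared
```

With one predictor, R² equals the squared Pearson correlation, so a steep, tight trend in the scatterplot corresponds to an R² close to 1.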
This version breaks the relationship down by season. Each color represents a different NFL season, and each season gets its own trend line, which shows whether the relationship is stable or varies from year to year.
ggplot(nfl_clean, aes(point_diff, win_pct, color = season)) +
  geom_point(alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(
    title = "Point Differential vs Win % by Season",
    x = "Point Differential",
    y = "Win Percentage",
    color = "Season"
  )
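Whether the relationship is stable across seasons can also be checked numerically by computing the Pearson correlation within each season. The sketch below uses made-up values for two seasons, so the resulting numbers are illustrative only; with the real data, the same calls would take the `season`, `point_diff`, and `win_pct` columns of `nfl_clean`.

```r
# Made-up values for two seasons; illustrative only.
toy <- data.frame(
  season     = factor(rep(c(2019, 2020), each = 4)),
  point_diff = c(-100, -30, 40, 90, -120, -10, 20, 110),
  win_pct    = c(0.25, 0.44, 0.56, 0.75, 0.19, 0.50, 0.53, 0.76)
)

# Correlation across all team-seasons pooled together.
overall_r <- cor(toy$point_diff, toy$win_pct)

# One correlation per season: split the rows by season, then apply cor().
by_season_r <- sapply(
  split(toy, toy$season),
  function(d) cor(d$point_diff, d$win_pct)
)
```

Per-season values that sit close to each other and to the pooled value would support the claim of a stable relationship; a season with a clearly lower value would be worth a closer look.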
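This boxplot shows how point differential varies across seasons. The line inside each box marks the median team, and the dashed line at 0 separates positive from negative differentials, so a median above the dashed line means at least half of the teams outscored their opponents that season. This helps identify whether certain seasons were more competitive or more lopsided.
Only in 2021 did most teams finish with a positive point differential. Because every point scored by one team is allowed by another, point differentials sum to zero across the league, so a majority of positive teams implies that a smaller group of teams absorbed large negative margins. That is consistent with a clear separation between the good and bad teams in the league that season.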
ggplot(nfl_clean, aes(season, point_diff)) +
  geom_boxplot(fill = "orange", alpha = 0.7) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(
    title = "Distribution of Point Differential by Season",
    x = "Season",
    y = "Point Differential"
  )
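The reading of the boxplot can be backed up with a per-season count of teams above zero. The sketch below uses made-up differentials for two four-team seasons (not real NFL values) and also checks that each season's differentials sum to zero, which is what makes a majority of positive teams informative.

```r
# Made-up differentials for two small seasons; illustrative only.
toy <- data.frame(
  season     = factor(rep(c(2019, 2020), each = 4)),
  point_diff = c(-100, -30, 40, 90, -150, 30, 50, 70)
)

# Count and share of teams with a positive point differential, per season.
n_positive     <- tapply(toy$point_diff > 0, toy$season, sum)
share_positive <- tapply(toy$point_diff > 0, toy$season, mean)

# Median differential per season (the line inside each box).
median_diff <- tapply(toy$point_diff, toy$season, median)

# Sanity check: differentials should cancel out within a season.
league_total <- tapply(toy$point_diff, toy$season, sum)
```

In this toy example the second season has three of four teams above zero because one team carries a single large negative margin, which is the same pattern the boxplot suggests for a lopsided season.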
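The bar chart below shows the ten highest and ten lowest single-season point differentials in the dataset. It visually reinforces how point differential aligns with team success.
top_bottom <- nfl_clean %>%
  arrange(desc(point_diff)) %>%
  slice_head(n = 10) %>%
  mutate(group = "Top 10") %>%
  bind_rows(
    nfl_clean %>%
      arrange(point_diff) %>%
      slice_head(n = 10) %>%
      mutate(group = "Bottom 10")
  ) %>%
  # Label bars by team and season so a team that appears in more than
  # one season gets separate bars instead of one stacked bar
  mutate(label = paste(team, season))

ggplot(top_bottom, aes(reorder(label, point_diff), point_diff, fill = group)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Top & Bottom 10 Teams by Point Differential",
    x = "Team (Season)",
    y = "Point Differential",
    fill = "Group"
  )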
Conclusion
Across all five seasons from 2019–2023, the analysis shows a strong and consistent positive relationship between point differential and win percentage. Teams that outscore their opponents by larger margins almost always finish with higher winning percentages, and this pattern holds true in every season examined.
The scatterplots reveal a clear upward trend, consistent with a high correlation between point differential and win percentage and with the view that point differential is one of the most reliable indicators of team strength. The top‑10 and bottom‑10 visualization further reinforces this: teams with the largest positive point differentials consistently appear among the league’s best records, while teams with large negative differentials almost always finish near the bottom.
Overall, the results strongly support the analytics claim that point differential “tells the truth” about team performance. Not only does it correlate closely with winning, but the relationship remains stable across multiple seasons, suggesting that point differential is a robust and meaningful metric for evaluating NFL teams.