BAIS 462 Final Project

Author

Charlie Gainor

Code Setup

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(jsonlite)

Attaching package: 'jsonlite'

The following object is masked from 'package:purrr':

    flatten
library(magrittr)

Attaching package: 'magrittr'

The following object is masked from 'package:purrr':

    set_names

The following object is masked from 'package:tidyr':

    extract
library(httr)
library(dbplyr)

Attaching package: 'dbplyr'

The following objects are masked from 'package:dplyr':

    ident, sql
library(ggplot2)

Introduction

On September 3, 2022, the country bore witness to one of the greatest college football games in the modern history of the sport. A game between Appalachian State and North Carolina was already high-scoring heading into the final quarter, but the Tar Heels led the game 41-21 and everyone thought the game was over. What followed was the equivalent of jumping out of a plane without a parachute while on a lethal amount of cocaine: the Mountaineers came all the way back, scoring the most points in the 4th quarter in their entire program’s history, and would have sent the game to overtime had Dashaun Davis not dropped a two-point conversion pass in the final play of the game.

Final Score: UNC 63, App St. 61. 124 combined points, over 1,200 yards of offense combined and 62 points combined in the 4th quarter alone. It was beautiful.

On the other side of the country, on the very same day at the very same time, the college football world got to witness the worst game of football ever. It is so atrocious that it’s even more beautiful than its counterpart.


There is an art to bad football. For every right way there is to play the sport, there are hundreds more ways to make a mockery of it. Sometimes it’s one team playing like they pulled three straight all-nighters, and other times it’s a coordinated travesty with everybody playing along. The definition of a great game is almost always up for debate, but everyone can look at a horrible game and make the same grimace as their neighbor - a lone piece of unity within a sport filled with division.

In this project, I will examine the traits of two different kinds of horrible football games, look at why they happen, and identify the worst offenders of the last 22 years.

Import Dataset

cfbdata <- data.frame()
cfbdata <- read_csv("https://myxavier-my.sharepoint.com/:x:/g/personal/gainorp_xavier_edu/IQBecWZLk8mLRYEaJjj3T9aVAVBzPaWZoNMF2LasxGn7dZM?download=1")
Rows: 19843 Columns: 58
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr   (6): game_type, away, home, conf_away, conf_home, tv
dbl  (49): season, week, rank_away, rank_home, score_away, score_home, q1_aw...
lgl   (1): neutral
date  (1): date
time  (1): time_et

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
cfbdata <- 
  cfbdata %>%
  filter(season >= 2004) # Keep the 2004 season onward

cfbdata <- 
  cfbdata %>%
  mutate(comb_points = score_away + score_home,
         point_diff_home = score_home - score_away,
         point_diff_away = score_away - score_home,
         total_yards = total_yards_away + total_yards_home,
         point_diff = abs(score_home - score_away),
         yard_diff = abs(total_yards_away - total_yards_home),
         total_turnover = int_home + int_away + fum_home + fum_away,
         # TRUE only when both teams are ranked (ranked vs. ranked)
         rvr = !is.na(rank_away) & !is.na(rank_home))

cfbdata <- 
  cfbdata %>%
  mutate(blowout = point_diff >= 40) # Margin of 40 points or more

Part 1 - The Blowout

With over 130 teams playing college football on drastically different budgets and levels of athletic funding, blowouts are a part of the game. Some schools host “buy” games at the beginning of the season, paying schools from mid-major conferences or the lower FCS division to come get kicked around (usually) by the home team.

There are large deficits everywhere in college football, so for the purpose of this project, we’ll define a blowout as a win/loss by 40 points or more.

cfbdata %>%
  ggplot(aes(x = week, y = point_diff, colour = point_diff >= 40)) + 
  geom_point() +
  labs(colour = "Blowout")
Warning: Removed 809 rows containing missing values or values outside the scale range
(`geom_point()`).
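
As a rough illustration of the 40-point rule, the sketch below flags blowouts in a tiny made-up table and reports how common they are. The `toy_games` rows and the team labels are invented for the example and are not taken from `cfbdata`; only the threshold comes from the definition above.

```r
library(dplyr)

# Made-up scores for illustration only; not rows from cfbdata.
toy_games <- tibble(
  home       = c("A", "B", "C", "D"),
  score_home = c(63, 24, 45, 10),
  score_away = c(0, 21, 5, 52)
)

toy_flagged <- toy_games %>%
  mutate(point_diff = abs(score_home - score_away),  # margin regardless of who won
         blowout    = point_diff >= 40)              # 40-point threshold from the text

# Count the blowouts and the share of games they represent
blowout_summary <- toy_flagged %>%
  summarise(n_games       = n(),
            n_blowouts    = sum(blowout, na.rm = TRUE),
            blowout_share = mean(blowout, na.rm = TRUE))

blowout_summary
```

Run against the real data, the same `summarise()` call would give the overall blowout rate; `na.rm = TRUE` is there because some games are missing a score.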

Why does a blowout happen?

Many things can lead to a blowout, but fundamentally, the more a team dominates in total offensive yards, the more the game tilts toward a blowout, as seen below.

cfbdata %>%
  ggplot(aes(x = yard_diff, y = point_diff)) + 
  geom_point() + 
  labs(title = "Yard Differential vs. Point Differential", x = "Yard Diff", y = "Point Diff")
Warning: Removed 40 rows containing missing values or values outside the scale range
(`geom_point()`).

cfbdata %>%
  ggplot(aes(x = yard_diff, y = point_diff, color = blowout)) + 
  geom_point() +
  labs(title = "Yard Differential vs. Point Differential", x = "Yard Diff", y = "Point Diff", color = "Blowout")
Warning: Removed 40 rows containing missing values or values outside the scale range
(`geom_point()`).
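
The scatterplots show the relationship visually. One possible way to put a number on it is a correlation and a simple linear fit, sketched here on made-up values. The `toy_diffs` table is invented for the example, so the resulting figures say nothing about the real dataset.

```r
library(dplyr)

# Made-up differentials for illustration only; not rows from cfbdata.
toy_diffs <- tibble(
  yard_diff  = c(20, 110, 250, 400, NA),
  point_diff = c(3, 10, 24, 45, 7)
)

# Pearson correlation, skipping games with a missing stat
# (ggplot drops the same kind of rows with a warning).
r <- toy_diffs %>%
  summarise(r = cor(yard_diff, point_diff, use = "complete.obs")) %>%
  pull(r)

# Simple linear fit: the slope is the extra margin per extra yard of differential.
# lm() drops rows with missing values by default.
fit <- lm(point_diff ~ yard_diff, data = toy_diffs)

r
coef(fit)
```

Swapping `toy_diffs` for `cfbdata` would give the actual strength of the trend, which the plots suggest is positive.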

Turnovers are also a major decider of both who wins the game and how much they win by. Below is the total number of turnovers in each game plotted against the winner’s margin of victory, for all games in the sample.

cfbdata %>%
  drop_na(total_turnover) %>%
  ggplot(aes(x = as.factor(total_turnover), y = point_diff)) + 
  geom_point() + 
  labs(title = "Turnovers vs. Point Differential", x = "Total Turnovers", y = "Point Diff")

Interestingly, the pattern doesn’t change much when the sample is filtered to blowout wins - if anything, it shows that even when a team is down by a significant amount, a game with a lot of turnovers is much rarer.

#|label: Total Turnovers in Blowouts

cfbdata %>%
  filter(blowout == TRUE) %>%
  drop_na(total_turnover) %>%
  ggplot(aes(x = as.factor(total_turnover), y = point_diff)) + 
  geom_point() + 
  labs(title = "Turnovers vs. Point Differential in Blowouts", x = "Total Turnovers", y = "Point Diff")

When does a blowout suck?

So if there are games where you expect a blowout, what are the games where you don’t expect or want a lopsided result? If the average college football fan with no attachment to a relevant team flips onto ESPN expecting to watch a good, competitive game, what would leave them completely disappointed?

How about games where both teams are ranked in the top 25?

cfbdata %>%
  filter(blowout == TRUE,
         rvr == TRUE) %>%
  ggplot(aes(x = yard_diff, y = point_diff)) + 
  geom_point() + 
  labs(title = "Yard Differential vs. Point Differential in Ranked vs. Ranked Blowouts", x = "Yard Diff", y = "Point Diff")

Since 2004, a blowout between two ranked teams has happened 33 times, or about three times every two seasons. Now there are some reasonable explanations for a couple of these results, like the AP overrating a team in the preseason poll and that team getting blown out by a much better ranked team (see #25 Maryland’s 63-0 loss to eventual National Champions Florida State in 2013 at the very top of the graph), but some of these are downright dismal. There is no bigger humiliation than being voted into the top echelon of college football hoping to prove yourself worthy and then being kicked around so hard that pundits around the country believe you were never worthy to begin with.

The worst part is that as a casual fan with expectations of good football, it’s not even fun. But for most of these, you can switch to a different channel and watch a much better game.

But what if you can’t?

cfbdata %>%
  filter(blowout == TRUE,
         rvr == TRUE) %>%
  ggplot(aes(x = yard_diff, y = point_diff, color = game_type)) + 
  geom_point() + 
  labs(title = "Yard Differential vs. Point Differential", x = "Yard Diff", y = "Point Diff", color = "Game Type")

In the past 22 years with over 19,000 games played, three games have involved a ranked team getting absolutely blown out by another ranked team by 40 or more points in the postseason, whether it be a bowl game or the damn National Championship. Not only did three teams get blown out, they got blown out and the world had no choice but to watch.

So who has the worst blowout?

TCU. They lost the 2023 National Championship Game (the title game for the 2022 season) 65-7 to Georgia. It’s not close.
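
The steps behind this ranking can be sketched as a filter, a count by game type, and a `slice_max()` call. The rows below are placeholders: the team names, scores and the `game_type` values ("regular", "postseason") are assumptions made for the example, not values read from `cfbdata`.

```r
library(dplyr)

# Made-up rows for illustration only; names, scores and game_type labels are placeholders.
toy_games <- tibble(
  season     = c(2010, 2010, 2015, 2020),
  game_type  = c("regular", "postseason", "regular", "postseason"),
  home       = c("Team A", "Team C", "Team E", "Team G"),
  away       = c("Team B", "Team D", "Team F", "Team H"),
  rank_home  = c(3, 1, NA, 5),
  rank_away  = c(20, 8, 12, 2),
  score_home = c(55, 49, 60, 14),
  score_away = c(10, 6, 7, 62)
)

# Keep only blowouts where both teams were ranked
ranked_blowouts <- toy_games %>%
  mutate(point_diff = abs(score_home - score_away),
         rvr        = !is.na(rank_home) & !is.na(rank_away)) %>%
  filter(rvr, point_diff >= 40)

# How many ranked-vs-ranked blowouts fall in each game type
by_type <- ranked_blowouts %>% count(game_type)

# The single most lopsided one
worst <- ranked_blowouts %>% slice_max(point_diff, n = 1)

by_type
worst
```

Note that `slice_max()` keeps ties by default, so it can return more than one row if two games share the largest margin.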

Part 2 - The Slog

What is a “slog?”

A “slog,” as we’ll call it, is a game without much scoring or much offense. It’s essentially a game with way too much empty space. For this project, we’ll quantify a “slog” as a game where the two teams combine for fewer than 20 points and no more than 500 total yards of offense.

cfbdata <- 
  cfbdata %>%
  mutate(`slog` = ifelse((score_home + score_away) >= 20, FALSE, ifelse(total_yards > 500, FALSE, TRUE)))
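
To see where the cutoffs in the flag above actually fall, here is a small check on made-up boundary cases. The `toy_games` rows are invented for the example, and the single-condition form is only a sketch using the same thresholds: it can treat games with a missing score slightly differently from the nested `ifelse()`.

```r
library(dplyr)

# Made-up boundary cases for illustration only; not rows from cfbdata.
toy_games <- tibble(
  score_home  = c(6, 10, 9, 13),
  score_away  = c(3, 10, 6, 3),
  total_yards = c(290, 300, 534, 500)
)

toy_slogs <- toy_games %>%
  mutate(comb_points = score_home + score_away,
         # Same thresholds as the flag above, written as one condition:
         # under 20 combined points and at most 500 combined yards
         slog = comb_points < 20 & total_yards <= 500)

toy_slogs
```

A game with exactly 20 combined points is not a slog, while a game with exactly 500 combined yards still is.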

cfbdata %>%
  filter(slog == TRUE) %>%
  ggplot(aes(x = total_yards, y = score_home+score_away)) + 
  geom_point() + 
  labs(title = "Total Score vs. Total Yards of 'Slog' Games", x = "Total Yards", y = "Total Score")

Defensive Masterclass vs. Slog

It’s important to make the distinction that just because a game is low-scoring doesn’t mean it sucks - plenty of games in recent college football history, like the Game of the Century in 2011 between Alabama and LSU, are remembered fondly because of the defensive highlights. Just because a game ends 9-6 doesn’t mean nothing happened. A slog is when there are no defensive highlights to be found. It’s when no one wants to win.

Below is the number of slog games at each turnover count. Remember, these are games with little scoring and little offense - turnovers are the highlights of the game.

#|label: Defense vs. Bad

cfbdata %>%
  filter(slog == TRUE) %>%
  ggplot(aes(x = as.factor(total_turnover))) + 
  geom_bar() + 
  labs(title = "Number of 'Slog' Games with Different Turnover Counts", x = "Turnovers", y = "Game Count")

The lack of turnovers makes it hard to weed out the bad from the good, but that doesn’t mean a dominant defensive line can’t force a massive yard deficit. Usually.

In slog games, both teams are guilty of ruining the sport. When both teams combine for so little, there’s usually not that much to set them apart - not even in the yard category.

#|label: Yard Difference in Slogs

cfbdata %>%
  filter(slog == TRUE) %>%
  ggplot(aes(x = yard_diff)) + 
  geom_histogram(bins = 10) + 
  labs(title = "Distribution of Yard Difference Between Teams in 'Slog' Games", x = "Yard Diff", y = "Count")

Conclusion

So that Iowa game that happened during one of the craziest games in college football history? Iowa hosted FCS-division South Dakota State and neither team managed to break 200 yards of offense. There was absolutely nothing going on for the entire game - the punter was receiving MVP chants in the final few minutes.

The final score? Iowa 7, South Dakota State 3.

Iowa didn’t score a touchdown. They scored a field goal and two safeties. Their defense scored more points than their offense without either one getting into the end zone.

That’s the beauty of bad football. Some trends hold across all of it - if your opponent outgains you by more than 400 yards, you’re probably going to lose by a lot - but everything else is grouped together. The intermediaries and details that set each game apart fall within the story that sports tell. Despite schools with completely different revenues, student body sizes and regions, anyone in the United States can come together with another group of people and take part in a beautiful part of America’s Game: being really, really bad at it.

The failed attempt to get the scorecard API working

Originally, this project was going to incorporate the scorecard API to include school revenue and student body size - however, I did not realize until far too late that the school names in the two datasets were so different that it was basically impossible to look up the information cleanly.

Below is the code draft showing what I was planning to do with the API, to some extent.

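# Draft helper: look up one school by name in the College Scorecard API.
scorecard_call <- function(name) {
  sc_endpoint <- "https://api.data.gov/ed/collegescorecard/v1/schools.json"
  # Read the key from an environment variable instead of hard-coding it
  # (the variable name SCORECARD_API_KEY is an assumption).
  api_key <- Sys.getenv("SCORECARD_API_KEY")

  # Passing the parameters through `query` lets httr build and encode the URL
  sc_data <- sc_endpoint %>%
    GET(query = list(api_key = api_key, `school.name` = name)) %>%
    content(as = "text",
            encoding = "UTF-8") %>%
    fromJSON() %>%
    use_series(results)

  return(sc_data)
}

# Next step (never finished): loop through names and collect student data and total revenue. The stats would then be used to contrast the very bad football worthy of middle school games.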
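
One way the name mismatch might have been handled is a hand-built crosswalk table joined between the two sources, with an explicit list of whatever still fails to match. This is only a sketch of that idea, not something the project did: every table, column name and value below is hypothetical.

```r
library(dplyr)

# Hypothetical team names as they might appear in the football data
cfb_names <- tibble(team = c("App State", "Miami (OH)", "UNC", "Iowa"))

# Hypothetical Scorecard-style rows; student_size values are placeholders, not real enrollment
scorecard_names <- tibble(
  school_name  = c("Appalachian State University",
                   "Miami University-Oxford",
                   "University of Iowa"),
  student_size = c(1, 2, 3)
)

# Hand-maintained crosswalk from the football name to the Scorecard name
crosswalk <- tribble(
  ~team,        ~school_name,
  "App State",  "Appalachian State University",
  "Miami (OH)", "Miami University-Oxford",
  "Iowa",       "University of Iowa"
)

# Attach the Scorecard name first, then the Scorecard fields
matched <- cfb_names %>%
  left_join(crosswalk, by = "team") %>%
  left_join(scorecard_names, by = "school_name")

# Teams that still have no match and need a crosswalk entry
unmatched <- matched %>%
  filter(is.na(school_name)) %>%
  pull(team)

unmatched
```

Keeping the unmatched list visible makes it clear how much manual mapping is left before the joined data can be trusted.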