Superbowls

Author

My Nguyen

Introduction

According to statista.com, in the 2023 Super Bowl, over 113 million viewers tuned in to watch the game (an Rihanna), and about 16.7 million people watch the games during the regular season. Coming from Vietnam, I had only known the game of football - or in the US, soccer. But as I live in the US and I’m continuously exposed to football, I started to pay attention to it. The very first Super Bowl I saw was Superbowl 48 when the Seattle Seahawks beat the Denver Broncos. That marked the beginning of my football journey, but I did not get too into it until I finally learned and understood how it’s played. Now, I am hooked! Which is why I chose this dataset. This dataset is a collection of Superbowl history from 1967 to 2020 with variables such as:

  • Superbowl Dates: the exact date when the Super Bowl was help
  • Winners:
  • Losers
  • Winner Points
  • Loser Points
  • The MVP: (Most Valuable Player) of each Superbowl
  • Where the Superbowl was held (stadium, city, and state)

Considering that I have only followed football for 7 years, and the first Superbowl was held in 1967, I don’t know much about the history of football yet. So I want to explore:

  • The winners vs. losers throughout the years and what were the difference in winning and losing points

  • The total of wins/losses versus the total of Super Bowls attended each team has

  • Which team has won the most Super Bowls

This dataset is sourced from https://www.pro-football-reference.com/super-bowl which provides a complete csv file option for download and usage.

Setting up the Data

# Load the necessary libraries
library(tidyverse)
library(dplyr)
library(ggplot2)
library(plotly)
library(gganimate) 
library(RColorBrewer)

# Set working directory
setwd("C:/Users/myngu/OneDrive/Montgomery College/Spring 2023/DATA 110/Data Sets")
superbowl <- read_csv("superbowl.csv")

# Let's take a quick look at the data
head(superbowl)
# A tibble: 6 × 10
  Date       SB        Winner    Winne…¹ Loser Loser…² MVP   Stadium City  State
  <chr>      <chr>     <chr>       <dbl> <chr>   <dbl> <chr> <chr>   <chr> <chr>
1 Feb 2 2020 LIV (54)  Kansas C…      31 San …      20 Patr… Hard R… Miam… Flor…
2 Feb 3 2019 LIII (53) New Engl…      13 Los …       3 Juli… Merced… Atla… Geor…
3 Feb 4 2018 LII (52)  Philadel…      41 New …      33 Nick… U.S. B… Minn… Minn…
4 Feb 5 2017 LI (51)   New Engl…      34 Atla…      28 Tom … NRG St… Hous… Texas
5 Feb 7 2016 50        Denver B…      24 Caro…      10 Von … Levi's… Sant… Cali…
6 Feb 1 2015 XLIX (49) New Engl…      28 Seat…      24 Tom … Univer… Glen… Ariz…
# … with abbreviated variable names ¹​`Winner Pts`, ²​`Loser Pts`
summary(superbowl)
     Date                SB               Winner            Winner Pts   
 Length:54          Length:54          Length:54          Min.   :13.00  
 Class :character   Class :character   Class :character   1st Qu.:23.25  
 Mode  :character   Mode  :character   Mode  :character   Median :30.50  
                                                          Mean   :30.11  
                                                          3rd Qu.:35.00  
                                                          Max.   :55.00  
    Loser             Loser Pts        MVP              Stadium         
 Length:54          Min.   : 3.0   Length:54          Length:54         
 Class :character   1st Qu.:10.0   Class :character   Class :character  
 Mode  :character   Median :17.0   Mode  :character   Mode  :character  
                    Mean   :16.2                                        
                    3rd Qu.:21.0                                        
                    Max.   :33.0                                        
     City              State          
 Length:54          Length:54         
 Class :character   Class :character  
 Mode  :character   Mode  :character  
                                      
                                      
                                      
# Adjusting the category names to lowercase and getting rid of the spaces names

names(superbowl) <- tolower(names(superbowl)) 
names(superbowl) <- gsub(" ","",names(superbowl))

# Convert the date format from chr to date
superbowl <- superbowl %>% 
  mutate(date = as.Date(date, format = "%B %d %Y")) 

# Adding a new variable to work with
superbowl['ptdiff'] = superbowl['winnerpts'] - superbowl['loserpts']

#Check the datatset
head(superbowl)
# A tibble: 6 × 11
  date       sb    winner winne…¹ loser loser…² mvp   stadium city  state ptdiff
  <date>     <chr> <chr>    <dbl> <chr>   <dbl> <chr> <chr>   <chr> <chr>  <dbl>
1 2020-02-02 LIV … Kansa…      31 San …      20 Patr… Hard R… Miam… Flor…     11
2 2019-02-03 LIII… New E…      13 Los …       3 Juli… Merced… Atla… Geor…     10
3 2018-02-04 LII … Phila…      41 New …      33 Nick… U.S. B… Minn… Minn…      8
4 2017-02-05 LI (… New E…      34 Atla…      28 Tom … NRG St… Hous… Texas      6
5 2016-02-07 50    Denve…      24 Caro…      10 Von … Levi's… Sant… Cali…     14
6 2015-02-01 XLIX… New E…      28 Seat…      24 Tom … Univer… Glen… Ariz…      4
# … with abbreviated variable names ¹​winnerpts, ²​loserpts

Create a graph to show the point differential for each Super Bowl game from 1967 to 2020

I want to look at the difference in winning versus losing points over the years. Since two of the best teams compete against each other in the Superbowl, my initial thought is that the games should be relatively tight.

graph1 <- superbowl %>%
  ggplot(aes(date, y=ptdiff, fill = winner)) +
  geom_point(aes(size=ptdiff, label2=loser, label3=winnerpts, label4=loserpts, label5=stadium)) + 
  theme_light() + 
  theme(axis.text.x = element_text(vjust = 0.5, hjust=1)) +
  scale_fill_manual(values=c("#99CCFF", "#9999CC", "#66CC99", "#66CC00", "#FF3399", "#193300", "#0000CC", "#FFCCE5", "#CCFFE5", "#e0e0e0", "#00FF00", "#FF9933", "#CC0000", "#660033", "#FF66FF", "#00994c", "#C77727", "#A2BEFF", "#808080", "#004c99", "#FAF683", "#AF1C30")) +
  xlab("Date") + 
  ylab("Points Difference") + 
  ggtitle("Super Bowl Point Diffenrential from 1967 to 2020")
graph1  <- ggplotly(graph1, tooltip = c("date", "winner", "loser", "winnerpts", "loserpts", "stadium"))
graph1

Contrary to my assumption, there are more games than not that have a point differential of 10 or more - which in football term, is about a touchdown (6, 7 or 8 points) plus a field goal (3 points). There is one anomaly that stood out among the rest, the Super Bowl that had the biggest point differential (a whopping 45 points) is the 1990 game between the San Francisco 49ers (W) and the Denver Broncos (L). This wasn’t the only Superbowl that the Broncos lost by 30+ points. In 2014 they lost to the Seattle Seahawks by 35 points. It must have been hard to be a Broncos’ fan during those years.

Another interesting view I drew from looking at this graph is from 2000 to 2020, the New England Patriots occupied the map as the Winning team 6 times! And they won tight games, all by about 10 points or less. Another team that appeared numerous times throughout the years, although not as close together) is the Pittsburgh Steelers. Let’s put a graph together to see how many times each team has been to the Super Bowl and which team has the most wins (or losses) over the years.

Create a new dataset with the list of teams and their number of wins & losses

# Create a new dataframe for Superbowl Winners and their number of wins
sbwinners <- superbowl %>%
  count(winner)
sbwinners
# A tibble: 22 × 2
   winner                  n
   <chr>               <int>
 1 Baltimore Colts         1
 2 Baltimore Ravens        2
 3 Chicago Bears           1
 4 Dallas Cowboys          5
 5 Denver Broncos          3
 6 Green Bay Packers       4
 7 Indianapolis Colts      1
 8 Kansas City Chiefs      2
 9 Los Angeles Raiders     1
10 Miami Dolphins          2
# … with 12 more rows
colnames(sbwinners)[1] ="teamname"
colnames(sbwinners)[2] = "wins"

# Create a new dataframe for Superbowl losers and their number of losses
sblosers <- superbowl %>%
  count(loser)
sblosers
# A tibble: 26 × 2
   loser                  n
   <chr>              <int>
 1 Arizona Cardinals      1
 2 Atlanta Falcons        2
 3 Baltimore Colts        1
 4 Buffalo Bills          4
 5 Carolina Panthers      2
 6 Chicago Bears          1
 7 Cincinnati Bengals     2
 8 Dallas Cowboys         3
 9 Denver Broncos         5
10 Green Bay Packers      1
# … with 16 more rows
colnames(sblosers)[1] ="teamname"
colnames(sblosers)[2] = "losses"

# Merge the two new dataframes
sbnew <- jointdataset <- merge(sblosers, sbwinners, by = 'teamname', all=TRUE)

# Converting the number of wins and losses to numeric values
sbnew$losses <- as.numeric(sbnew$losses)
sbnew$wins <- as.numeric(sbnew$wins)
sbnew[is.na(sbnew)] <- 0 #convert N/A to zero

# Adding the number of wins and losses to create a new "total" column
sbnew$total = rowSums(cbind(sbnew$losses,sbnew$wins))
  
# Check the result
head(sbnew)
           teamname losses wins total
1 Arizona Cardinals      1    0     1
2   Atlanta Falcons      2    0     2
3   Baltimore Colts      1    1     2
4  Baltimore Ravens      0    2     2
5     Buffalo Bills      4    0     4
6 Carolina Panthers      2    0     2

Creating an overlay bar graph to show the number of Superbowls each team attended and how many of those games resulted in wins (or losses)

library(patchwork)
Warning: package 'patchwork' was built under R version 4.2.3
graph4 <- sbnew %>% 
  ggplot() +
  geom_col(aes(x = teamname, y = total), 
           alpha = 0.2, fill = "darkgrey", color = "grey", width = 0.8) +
  geom_col(aes(x = teamname,  
               y = wins), 
           fill = "lightblue", color = "darkblue", width = 0.5) +
  geom_col(aes(x = teamname, y = losses), 
           alpha = 0.6, fill = "red", color = "darkred", width = 0.2) +
  labs(title = "Superbowl 1967 - 2020", x = "Teams", y = "Number of Super Bowls") +
  theme(plot.title = element_text(hjust = .5), 
        axis.text.x = element_text(size = 8,angle = 90), legend.position = "top")
graph4 <- ggplotly(graph4)
graph4

The blue indicates the number of wins and the red indicates the number of losses, grey indicates the total number of SBs each team attended

As of 2020, the New England Patriots and the Pittsburg Steelers share the same “most SB wins” record of 6 total. But the Patriots have appeared in 11 Super Bowls total - which is an amazing accomplishment, especially when considering a lot of those appearances are back to back!

The Patriots’ reign over the Super Bowl over the 20 years span had Forbes writing an article about in 2018 titled “Why The Patriots Always Win”. In the article, Francis Bridges discussed four main pillars that might have lead the team to wins after wins:

  • Leadership: the Patriots continuous success started when Rober Kraft (owner) hired Bill Bellichick (head Coach) and Tom Brady (quaterback). They are one of the greatest quarterback and head coach duo of all time.
  • Preparation: it has been mentioned by several opponents of the Patriots that the team doesn’t show signs of slowing down even in the last quarter of the game. Additionally, Patriots players confirmed in interviews that their conditioning and training are quite intense, more so than other teams; that their practices are the same fashion as playoffs games.
  • Resilience: As mentioned above, the Patriots seem to win close games. They have a reputation of coming back from deficits. They do not give up.
  • High stakes = their jam: when everything is on the line, they thrive.

Some might argue that, with 6 wins and 5 losses, the Patriots winning percentage is about 54% - which isn’t that great. But in football, it’s hard to compare winning percentages as some teams have only appeared in the Super Bowl once (such as the LA Raiders, NO Saints, NY Jets) and won - which makes their winning percentage 100%. As a football fan, I’d much rather watch my team compete in as many Super Bowl as possible for that 50% chance that they will bring home a Lombardi Trophy.

There are a couple things…

  1. I could not get a legend to show in my bar graph.
  2. I wish I figured out a way to stack the number of wins and losses on top of each other in the bar graph so they add up to present the total bar.

Bonus

I just couldn’t help it!

graph <- ggplot(superbowl, aes(winner, loser, color = sb)) + 
  geom_point() + 
  theme_bw(base_size = 10) + 
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  ggtitle("Superbowl Winners and Losers from 1967 to 2020") + 

# gganimate specific bits: 
  labs(title = 'SB Date: {frame_time}', x = 'Winner', y = 'Loser') + 
  transition_time(date) + 
  ease_aes('linear', interval = 5)+
  shadow_mark(alpha = 0.3, size = 0.5)
graph

# save as gif
anim_save("superbow1967to2020.gif")

I wanted to do an animated graph because I was inspired by Jake. But I did not think that this graph serves a purpose in my project, the dots moves too quickly and it is difficult to identify, so I added it at the end as a bonus.