The Article

This assignment is based on the article According To Super Bowl Ads, Americans Love America, Animals And Sex by FiveThirtyEight.com. This article categorizes Super Bowl commercials based on combinations of defining characteristics, such as whether there is a celebrity appearance, how patriotic the commercial is, etc. The authors of the article watched over 200 Super Bowl commercials and marked “yes” or “no” for each of the characteristics they were looking for. They then compared commercials that fit the same criteria and the ways in which advertisers chose to combine these criteria, sometimes with surprising results and drastically different approaches.

Loading the Data

commercials <- read.csv(url("https://raw.githubusercontent.com/fivethirtyeight/superbowl-ads/main/superbowl-ads.csv"))

commercials$funny <- as.logical(commercials$funny)
commercials$show_product_quickly <- as.logical(commercials$show_product_quickly)
commercials$patriotic <- as.logical(commercials$patriotic)
commercials$celebrity <- as.logical(commercials$celebrity)
commercials$danger <- as.logical(commercials$danger)
commercials$animals <- as.logical(commercials$animals)
commercials$use_sex <- as.logical(commercials$use_sex)

commercials <- commercials %>%
    mutate(brand = replace(brand, brand == "Hynudai", "Hyundai"))

Documentation for the data set can be found here.

In the data set, the presence or absence of a defining characteristic in a commercial is denoted by character values “True” or “False.” I chose to convert these to boolean (logical) values.

I also noticed that the brand “Hyundai” was spelled wrong in the initial data set, so I used mutate() and replace() to change all instances of “Hynudai” to the correct spelling of “Hyundai.”

Analysis

commercials %>%
    summarize(start_year = min(year), end_year = max(year))
##   start_year end_year
## 1       2000     2020

The years of the commercials span from 2000-2020.

commercials %>%
    count(brand)
##        brand  n
## 1  Bud Light 60
## 2  Budweiser 43
## 3  Coca-Cola 21
## 4    Doritos 25
## 5    E-Trade 13
## 6    Hyundai 22
## 7        Kia 13
## 8        NFL 11
## 9      Pepsi 25
## 10    Toyota 11

There are 10 brands which were studied in this article: Bud Light, Budweiser, Coca-Cola, Doritos, E-Trade, Hyundai, Kia, NFL, Pepsi, and Toyota.

How many commercials fall within each category?

commercials %>%
    summary()
##       year         brand           superbowl_ads_dot_com_url youtube_url       
##  Min.   :2000   Length:244         Length:244                Length:244        
##  1st Qu.:2005   Class :character   Class :character          Class :character  
##  Median :2010   Mode  :character   Mode  :character          Mode  :character  
##  Mean   :2010                                                                  
##  3rd Qu.:2015                                                                  
##  Max.   :2020                                                                  
##    funny         show_product_quickly patriotic       celebrity      
##  Mode :logical   Mode :logical        Mode :logical   Mode :logical  
##  FALSE:76        FALSE:78             FALSE:203       FALSE:176      
##  TRUE :168       TRUE :166            TRUE :41        TRUE :68       
##                                                                      
##                                                                      
##                                                                      
##    danger         animals         use_sex       
##  Mode :logical   Mode :logical   Mode :logical  
##  FALSE:169       FALSE:155       FALSE:181      
##  TRUE :75        TRUE :89        TRUE :63       
##                                                 
##                                                 
## 

In the summary, you can see the number of TRUE and FALSE observations for each column.

Column True False
funny 168 76
show_product_quickly 166 78
patriotic 41 203
celebrity 68 176
danger 75 169
animals 89 155
use_sex 63 181

Analyzing Combinations

Here I transformed the initial data set by adding columns for the combinations chosen by the article.

Documentation for using SQL within R can be found here.

Which commercials combine being funny with being patriotic?

funny_and_patriotic <- sqldf("SELECT superbowl_ads_dot_com_url 
      FROM commercials 
      WHERE funny == TRUE and patriotic == TRUE") %>%
    mutate(funny_and_patriotic = TRUE)

funny_and_patriotic$superbowl_ads_dot_com_url
##  [1] "https://superbowl-ads.com/2019-hyundai-shopper-assurance-the-elevator-with-jason-bateman/"          
##  [2] "https://superbowl-ads.com/2007-etrade-one-finger/"                                                  
##  [3] "https://superbowl-ads.com/2009-budweiser-clydesdale-plays-fetch/"                                   
##  [4] "https://superbowl-ads.com/doritos-2015-super-bowl-xlix-ad-when-pigs-fly/"                           
##  [5] "https://superbowl-ads.com/bud-light-2016-super-bowl-50-ad-the-bud-light-party/"                     
##  [6] "https://superbowl-ads.com/video-bud-light-2013-super-bowl-xlvii-commercial-lucky-chair/"            
##  [7] "https://superbowl-ads.com/2008-coca-cola-political-parties/"                                        
##  [8] "https://superbowl-ads.com/2008-bud-light-language-of-love/"                                         
##  [9] "https://superbowl-ads.com/etrade-commercial-this-is-getting-old/"                                   
## [10] "https://superbowl-ads.com/toyota-2018-big-game-ad-one-team/"                                        
## [11] "https://superbowl-ads.com/hd-exclusive-budweiser-body-bridge-2010-super-bowl-44-xliv-commercial-ad/"
## [12] "https://superbowl-ads.com/2001-pepsi-bob-dole/"
commercials <- commercials %>%
    left_join(funny_and_patriotic, by = "superbowl_ads_dot_com_url") %>%
    mutate(funny_and_patriotic = replace(funny_and_patriotic, is.na(funny_and_patriotic),
        FALSE))

Which commercials combine danger with not being funny?

danger_and_not_funny <- sqldf("SELECT superbowl_ads_dot_com_url 
      FROM commercials 
      WHERE danger == TRUE and funny == FALSE") %>%
    mutate(danger_and_not_funny = TRUE)

danger_and_not_funny$superbowl_ads_dot_com_url
## [1] "https://superbowl-ads.com/2020-budweiser-typical-american/"                            
## [2] "https://superbowl-ads.com/2007-coca-cola-video-game/"                                  
## [3] "https://superbowl-ads.com/2017-budweiser-super-bowl-51-li-tv-commercial-born-hard-way/"
## [4] "https://superbowl-ads.com/2014-kia-the-truth/"                                         
## [5] "https://superbowl-ads.com/2007-budweiser-dalmatian/"                                   
## [6] "https://superbowl-ads.com/2009-pepsi-anthem/"                                          
## [7] "https://superbowl-ads.com/2001-etrade-dot-com-ghost-town/"                             
## [8] "https://superbowl-ads.com/2012-coca-cola-catch-ne-bear/"                               
## [9] "https://superbowl-ads.com/2015-budweiser-lost-dog/"
commercials <- commercials %>%
    left_join(danger_and_not_funny, by = "superbowl_ads_dot_com_url") %>%
    mutate(danger_and_not_funny = replace(danger_and_not_funny, is.na(danger_and_not_funny),
        FALSE))

Which commercials combine celebrities and being patriotic?

celebrity_and_patriotic <- sqldf("SELECT superbowl_ads_dot_com_url 
      FROM commercials 
      WHERE celebrity == TRUE and patriotic == TRUE") %>%
    mutate(celebrity_and_patriotic = TRUE)

celebrity_and_patriotic$superbowl_ads_dot_com_url
##  [1] "https://superbowl-ads.com/2020-budweiser-typical-american/"                               
##  [2] "https://superbowl-ads.com/2020-nfl-next-100/"                                             
##  [3] "https://superbowl-ads.com/2019-hyundai-shopper-assurance-the-elevator-with-jason-bateman/"
##  [4] "https://superbowl-ads.com/2018_pepsi_this_is_the_pepsi/"                                  
##  [5] "https://superbowl-ads.com/2017-nfl-super-bowl-51-li-tv-commercial-inside-lines/"          
##  [6] "https://superbowl-ads.com/nfl-2016-super-bowl-50-ad-super-bowl-babies-choir/"             
##  [7] "https://superbowl-ads.com/bud-light-2016-super-bowl-50-ad-the-bud-light-party/"           
##  [8] "https://superbowl-ads.com/video-bud-light-2013-super-bowl-xlvii-commercial-lucky-chair/"  
##  [9] "https://superbowl-ads.com/2009-pepsi-anthem/"                                             
## [10] "https://superbowl-ads.com/2008-coca-cola-political-parties/"                              
## [11] "https://superbowl-ads.com/2008-bud-light-language-of-love/"                               
## [12] "https://superbowl-ads.com/2001-pepsi-bob-dole/"
commercials <- commercials %>%
    left_join(celebrity_and_patriotic, by = "superbowl_ads_dot_com_url") %>%
    mutate(celebrity_and_patriotic = replace(celebrity_and_patriotic, is.na(celebrity_and_patriotic),
        FALSE))

Which commercials combine being funny, sex, and animals?

funny_sex_and_animals <- sqldf("SELECT superbowl_ads_dot_com_url 
      FROM commercials 
      WHERE funny == TRUE and use_sex == TRUE and animals == TRUE") %>%
    mutate(funny_sex_and_animals = TRUE)

funny_sex_and_animals$superbowl_ads_dot_com_url
##  [1] "https://superbowl-ads.com/2003-bud-light-hermit-crab/"                                      
##  [2] "https://superbowl-ads.com/2007-budweiser-king-crab/"                                        
##  [3] "https://superbowl-ads.com/2005-bud-light-parrot/"                                           
##  [4] "https://superbowl-ads.com/2004-bud-light-frank/"                                            
##  [5] "https://superbowl-ads.com/hyundai-2016-super-bowl-50-ad-ryanville/"                         
##  [6] "https://superbowl-ads.com/2012-bud-light-rescue-dog/"                                       
##  [7] "https://superbowl-ads.com/2015-kia-sorento-the-perfect-getaway-with-pierce-brosnan/"        
##  [8] "https://superbowl-ads.com/2004-bud-light-sleigh-ride/"                                      
##  [9] "https://superbowl-ads.com/video-bud-light-2013-super-bowl-xlvii-commercial-lucky-chair/"    
## [10] "https://superbowl-ads.com/2012-toyota-camry-its-reinvented/"                                
## [11] "https://superbowl-ads.com/2002-bud-light-falcon/"                                           
## [12] "https://superbowl-ads.com/2009-doritos-power-of-the-crunch/"                                
## [13] "https://superbowl-ads.com/hd-exclusive-kias-big-game-commercial-2010-super-bowl-44-xliv-ad/"
## [14] "https://superbowl-ads.com/2000-bud-light-she-has-a-cat/"                                    
## [15] "https://superbowl-ads.com/video-2013-kia-super-bowl-xlvii-commercial-space-babies/"         
## [16] "https://superbowl-ads.com/2002-etrade-monkey-musical/"                                      
## [17] "https://superbowl-ads.com/2006-budweiser-streaker/"                                         
## [18] "https://superbowl-ads.com/2005-bud-light-cedric-stranded/"                                  
## [19] "https://superbowl-ads.com/2004-budweiser-lipstick/"                                         
## [20] "https://superbowl-ads.com/2001-pepsi-bob-dole/"
commercials <- commercials %>%
    left_join(funny_sex_and_animals, by = "superbowl_ads_dot_com_url") %>%
    mutate(funny_sex_and_animals = replace(funny_sex_and_animals, is.na(funny_sex_and_animals),
        FALSE))
summary(commercials)
##       year         brand           superbowl_ads_dot_com_url youtube_url       
##  Min.   :2000   Length:244         Length:244                Length:244        
##  1st Qu.:2005   Class :character   Class :character          Class :character  
##  Median :2010   Mode  :character   Mode  :character          Mode  :character  
##  Mean   :2010                                                                  
##  3rd Qu.:2015                                                                  
##  Max.   :2020                                                                  
##    funny         show_product_quickly patriotic       celebrity      
##  Mode :logical   Mode :logical        Mode :logical   Mode :logical  
##  FALSE:76        FALSE:78             FALSE:203       FALSE:176      
##  TRUE :168       TRUE :166            TRUE :41        TRUE :68       
##                                                                      
##                                                                      
##                                                                      
##    danger         animals         use_sex        funny_and_patriotic
##  Mode :logical   Mode :logical   Mode :logical   Mode :logical      
##  FALSE:169       FALSE:155       FALSE:181       FALSE:232          
##  TRUE :75        TRUE :89        TRUE :63        TRUE :12           
##                                                                     
##                                                                     
##                                                                     
##  danger_and_not_funny celebrity_and_patriotic funny_sex_and_animals
##  Mode :logical        Mode :logical           Mode :logical        
##  FALSE:235            FALSE:232               FALSE:224            
##  TRUE :9              TRUE :12                TRUE :20             
##                                                                    
##                                                                    
## 

Adding to the table from above:

Column True False
funny_and_patriotic 12 232
danger_and_not_funny 9 235
celebrity_and_patriotic 12 232
funny_sex_and_animals 20 224

Further Exploration

Have commercials started to use sex more over the years?

commercials %>%
    group_by(year) %>%
    count(use_sex) %>%
    mutate(use_sex = tolower(as.character(use_sex))) %>%
    pivot_wider(names_from = "use_sex", values_from = "n") %>%
    mutate(true = replace(true, is.na(true), 0), total = true + false, prop = true/total) %>%
    ggplot(aes(x = year, y = prop)) + geom_smooth() + geom_point() + scale_y_continuous(labels = percent) +
    labs(title = "Sex in Super Bowl Ads Over the Years", x = "Year", y = "Proportion of Ads that Used Sex")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

It actually seems that ads have used less sex over the years.

How many commercials did each brand make from 2000-2020?

To generate a table recording the number of ads created by each brand, we would use the following:

commercials %>%
    count(brand)
##        brand  n
## 1  Bud Light 60
## 2  Budweiser 43
## 3  Coca-Cola 21
## 4    Doritos 25
## 5    E-Trade 13
## 6    Hyundai 22
## 7        Kia 13
## 8        NFL 11
## 9      Pepsi 25
## 10    Toyota 11

If we wanted to visualize these results, we could use a bar graph to compare the number of commercials that each brand made from 2000-2020.

commercials %>%
    ggplot(aes(x = brand)) + geom_bar(aes(color = brand, fill = brand)) + labs(title = "Number of Ads by Brand",
    x = "Brand", y = "Number of Ads") + theme(legend.position = "none")

From here we can see that Bud Light and Budweiser, both beer companies, made the most ads, with Bud Light making the most. We can also see that Hyundai made the most ads out of all three car companies, and Pepsi made a few more ads than their competitor, Coca-Cola.

What is the proportion of ads that show the product quickly?

sum(as.integer(commercials$show_product_quickly))/nrow(commercials)
## [1] 0.6803279

68% of advertisements shows their products within the first 10 seconds.

Which brands tend to show their product quickly?

commercials %>%
    group_by(brand) %>%
    count(show_product_quickly) %>%
    mutate(show_product_quickly = tolower(as.character(show_product_quickly))) %>%
    pivot_wider(names_from = show_product_quickly, values_from = n) %>%
    mutate(total = false + true, prop = true/total)
## # A tibble: 10 × 5
## # Groups:   brand [10]
##    brand     false  true total  prop
##    <chr>     <int> <int> <int> <dbl>
##  1 Bud Light    11    49    60 0.817
##  2 Budweiser    16    27    43 0.628
##  3 Coca-Cola     6    15    21 0.714
##  4 Doritos       3    22    25 0.88 
##  5 E-Trade       6     7    13 0.538
##  6 Hyundai      10    12    22 0.545
##  7 Kia           9     4    13 0.308
##  8 NFL           5     6    11 0.545
##  9 Pepsi        10    15    25 0.6  
## 10 Toyota        2     9    11 0.818

Since some companies created more commercials than other, I used proportion to see which companies are quick to show their product. Doritos (88%), Toyota (81.8%), and Bud Light (81.7%) are the companies that are quickest to reveal themselves, with over 80% of their ads showing the product within the first 10 seconds.

Which brands combined humor, sex, and animals?

commercials %>%
    group_by(brand) %>%
    count(funny_sex_and_animals) %>%
    mutate(funny_sex_and_animals = tolower(as.character(funny_sex_and_animals))) %>%
    pivot_wider(names_from = funny_sex_and_animals, values_from = n)
## # A tibble: 10 × 3
## # Groups:   brand [10]
##    brand     false  true
##    <chr>     <int> <int>
##  1 Bud Light    51     9
##  2 Budweiser    40     3
##  3 Coca-Cola    21    NA
##  4 Doritos      24     1
##  5 E-Trade      12     1
##  6 Hyundai      21     1
##  7 Kia          10     3
##  8 NFL          11    NA
##  9 Pepsi        24     1
## 10 Toyota       10     1

Every company, except for Coca-Cola and the NFL, has in some way combined humor, sex, and animals in their advertisements.

Conclusions and Recommendations

Using a combination of R and SQL (from sqldf package), I was able to wrangle the data into a data frame that included the combinations mentioned in the article. Each smaller subset of the data frame, created using SQL code, would direct someone to the proper URL to be able to watch commercials based on their preferred combinations. Any of the code above in the “Analyzing Combinations” section can be modified to include new or alternative columns to create newly categorized ads based on new combinations.

Possible Extension

I would be interested to know which commercials and which combinations of characteristics are most popular. To do this, I would go through each YouTube link and add columns into the data set for total_views and number_of_likes. This way I could compare to see which advertisements are seem most popular and I could see what combination of factors were used. Alternatively, I could see which combinations are least popular based on most dislikes.

This method would be somewhat limited, as not all people who enjoy a YouTube video actually take the time to “like” it. Also, most, if not all, of these advertisements are probably posted on many different YouTube channels while this data set only provides one. Therefore, any of the links provided in the data set could be less visited/liked than the same advertisement posted on another channel. However, I think this would anyways be a good start to measuring the popularity of these commercials, and it would certainly give insight to someone who was trying to figure out which of these criteria or combinations of factors would be good to put in their next ad.