This assignment is based on the article According To Super Bowl Ads, Americans Love America, Animals And Sex by FiveThirtyEight.com. This article categorizes Super Bowl commercials based on combinations of defining characteristics, such as whether there is a celebrity appearance, how patriotic the commercial is, etc. The authors of the article watched over 200 Super Bowl commercials and marked “yes” or “no” for each of the characteristics they were looking for. They then compared commercials that fit the same criteria and the ways in which advertisers chose to combine these criteria, sometimes with surprising results and drastically different approaches.
commercials <- read.csv(url("https://raw.githubusercontent.com/fivethirtyeight/superbowl-ads/main/superbowl-ads.csv"))
commercials$funny <- as.logical(commercials$funny)
commercials$show_product_quickly <- as.logical(commercials$show_product_quickly)
commercials$patriotic <- as.logical(commercials$patriotic)
commercials$celebrity <- as.logical(commercials$celebrity)
commercials$danger <- as.logical(commercials$danger)
commercials$animals <- as.logical(commercials$animals)
commercials$use_sex <- as.logical(commercials$use_sex)
commercials <- commercials %>%
mutate(brand = replace(brand, brand == "Hynudai", "Hyundai"))
Documentation for the data set can be found here.
In the data set, the presence or absence of a defining characteristic in a commercial is denoted by character values “True” or “False.” I chose to convert these to boolean (logical) values.
I also noticed that the brand “Hyundai” was spelled wrong in the
initial data set, so I used mutate()
and
replace()
to change all instances of “Hynudai” to the
correct spelling of “Hyundai.”
commercials %>%
summarize(start_year = min(year), end_year = max(year))
## start_year end_year
## 1 2000 2020
The years of the commercials span from 2000-2020.
commercials %>%
count(brand)
## brand n
## 1 Bud Light 60
## 2 Budweiser 43
## 3 Coca-Cola 21
## 4 Doritos 25
## 5 E-Trade 13
## 6 Hyundai 22
## 7 Kia 13
## 8 NFL 11
## 9 Pepsi 25
## 10 Toyota 11
There are 10 brands which were studied in this article: Bud Light, Budweiser, Coca-Cola, Doritos, E-Trade, Hyundai, Kia, NFL, Pepsi, and Toyota.
commercials %>%
summary()
## year brand superbowl_ads_dot_com_url youtube_url
## Min. :2000 Length:244 Length:244 Length:244
## 1st Qu.:2005 Class :character Class :character Class :character
## Median :2010 Mode :character Mode :character Mode :character
## Mean :2010
## 3rd Qu.:2015
## Max. :2020
## funny show_product_quickly patriotic celebrity
## Mode :logical Mode :logical Mode :logical Mode :logical
## FALSE:76 FALSE:78 FALSE:203 FALSE:176
## TRUE :168 TRUE :166 TRUE :41 TRUE :68
##
##
##
## danger animals use_sex
## Mode :logical Mode :logical Mode :logical
## FALSE:169 FALSE:155 FALSE:181
## TRUE :75 TRUE :89 TRUE :63
##
##
##
In the summary, you can see the number of TRUE and FALSE observations for each column.
Column | True | False |
---|---|---|
funny | 168 | 76 |
show_product_quickly | 166 | 78 |
patriotic | 41 | 203 |
celebrity | 68 | 176 |
danger | 75 | 169 |
animals | 89 | 155 |
use_sex | 63 | 181 |
Here I transformed the initial data set by adding columns for the combinations chosen by the article.
Documentation for using SQL within R can be found here.
funny_and_patriotic <- sqldf("SELECT superbowl_ads_dot_com_url
FROM commercials
WHERE funny == TRUE and patriotic == TRUE") %>%
mutate(funny_and_patriotic = TRUE)
funny_and_patriotic$superbowl_ads_dot_com_url
## [1] "https://superbowl-ads.com/2019-hyundai-shopper-assurance-the-elevator-with-jason-bateman/"
## [2] "https://superbowl-ads.com/2007-etrade-one-finger/"
## [3] "https://superbowl-ads.com/2009-budweiser-clydesdale-plays-fetch/"
## [4] "https://superbowl-ads.com/doritos-2015-super-bowl-xlix-ad-when-pigs-fly/"
## [5] "https://superbowl-ads.com/bud-light-2016-super-bowl-50-ad-the-bud-light-party/"
## [6] "https://superbowl-ads.com/video-bud-light-2013-super-bowl-xlvii-commercial-lucky-chair/"
## [7] "https://superbowl-ads.com/2008-coca-cola-political-parties/"
## [8] "https://superbowl-ads.com/2008-bud-light-language-of-love/"
## [9] "https://superbowl-ads.com/etrade-commercial-this-is-getting-old/"
## [10] "https://superbowl-ads.com/toyota-2018-big-game-ad-one-team/"
## [11] "https://superbowl-ads.com/hd-exclusive-budweiser-body-bridge-2010-super-bowl-44-xliv-commercial-ad/"
## [12] "https://superbowl-ads.com/2001-pepsi-bob-dole/"
commercials <- commercials %>%
left_join(funny_and_patriotic, by = "superbowl_ads_dot_com_url") %>%
mutate(funny_and_patriotic = replace(funny_and_patriotic, is.na(funny_and_patriotic),
FALSE))
danger_and_not_funny <- sqldf("SELECT superbowl_ads_dot_com_url
FROM commercials
WHERE danger == TRUE and funny == FALSE") %>%
mutate(danger_and_not_funny = TRUE)
danger_and_not_funny$superbowl_ads_dot_com_url
## [1] "https://superbowl-ads.com/2020-budweiser-typical-american/"
## [2] "https://superbowl-ads.com/2007-coca-cola-video-game/"
## [3] "https://superbowl-ads.com/2017-budweiser-super-bowl-51-li-tv-commercial-born-hard-way/"
## [4] "https://superbowl-ads.com/2014-kia-the-truth/"
## [5] "https://superbowl-ads.com/2007-budweiser-dalmatian/"
## [6] "https://superbowl-ads.com/2009-pepsi-anthem/"
## [7] "https://superbowl-ads.com/2001-etrade-dot-com-ghost-town/"
## [8] "https://superbowl-ads.com/2012-coca-cola-catch-ne-bear/"
## [9] "https://superbowl-ads.com/2015-budweiser-lost-dog/"
commercials <- commercials %>%
left_join(danger_and_not_funny, by = "superbowl_ads_dot_com_url") %>%
mutate(danger_and_not_funny = replace(danger_and_not_funny, is.na(danger_and_not_funny),
FALSE))
celebrity_and_patriotic <- sqldf("SELECT superbowl_ads_dot_com_url
FROM commercials
WHERE celebrity == TRUE and patriotic == TRUE") %>%
mutate(celebrity_and_patriotic = TRUE)
celebrity_and_patriotic$superbowl_ads_dot_com_url
## [1] "https://superbowl-ads.com/2020-budweiser-typical-american/"
## [2] "https://superbowl-ads.com/2020-nfl-next-100/"
## [3] "https://superbowl-ads.com/2019-hyundai-shopper-assurance-the-elevator-with-jason-bateman/"
## [4] "https://superbowl-ads.com/2018_pepsi_this_is_the_pepsi/"
## [5] "https://superbowl-ads.com/2017-nfl-super-bowl-51-li-tv-commercial-inside-lines/"
## [6] "https://superbowl-ads.com/nfl-2016-super-bowl-50-ad-super-bowl-babies-choir/"
## [7] "https://superbowl-ads.com/bud-light-2016-super-bowl-50-ad-the-bud-light-party/"
## [8] "https://superbowl-ads.com/video-bud-light-2013-super-bowl-xlvii-commercial-lucky-chair/"
## [9] "https://superbowl-ads.com/2009-pepsi-anthem/"
## [10] "https://superbowl-ads.com/2008-coca-cola-political-parties/"
## [11] "https://superbowl-ads.com/2008-bud-light-language-of-love/"
## [12] "https://superbowl-ads.com/2001-pepsi-bob-dole/"
commercials <- commercials %>%
left_join(celebrity_and_patriotic, by = "superbowl_ads_dot_com_url") %>%
mutate(celebrity_and_patriotic = replace(celebrity_and_patriotic, is.na(celebrity_and_patriotic),
FALSE))
funny_sex_and_animals <- sqldf("SELECT superbowl_ads_dot_com_url
FROM commercials
WHERE funny == TRUE and use_sex == TRUE and animals == TRUE") %>%
mutate(funny_sex_and_animals = TRUE)
funny_sex_and_animals$superbowl_ads_dot_com_url
## [1] "https://superbowl-ads.com/2003-bud-light-hermit-crab/"
## [2] "https://superbowl-ads.com/2007-budweiser-king-crab/"
## [3] "https://superbowl-ads.com/2005-bud-light-parrot/"
## [4] "https://superbowl-ads.com/2004-bud-light-frank/"
## [5] "https://superbowl-ads.com/hyundai-2016-super-bowl-50-ad-ryanville/"
## [6] "https://superbowl-ads.com/2012-bud-light-rescue-dog/"
## [7] "https://superbowl-ads.com/2015-kia-sorento-the-perfect-getaway-with-pierce-brosnan/"
## [8] "https://superbowl-ads.com/2004-bud-light-sleigh-ride/"
## [9] "https://superbowl-ads.com/video-bud-light-2013-super-bowl-xlvii-commercial-lucky-chair/"
## [10] "https://superbowl-ads.com/2012-toyota-camry-its-reinvented/"
## [11] "https://superbowl-ads.com/2002-bud-light-falcon/"
## [12] "https://superbowl-ads.com/2009-doritos-power-of-the-crunch/"
## [13] "https://superbowl-ads.com/hd-exclusive-kias-big-game-commercial-2010-super-bowl-44-xliv-ad/"
## [14] "https://superbowl-ads.com/2000-bud-light-she-has-a-cat/"
## [15] "https://superbowl-ads.com/video-2013-kia-super-bowl-xlvii-commercial-space-babies/"
## [16] "https://superbowl-ads.com/2002-etrade-monkey-musical/"
## [17] "https://superbowl-ads.com/2006-budweiser-streaker/"
## [18] "https://superbowl-ads.com/2005-bud-light-cedric-stranded/"
## [19] "https://superbowl-ads.com/2004-budweiser-lipstick/"
## [20] "https://superbowl-ads.com/2001-pepsi-bob-dole/"
commercials <- commercials %>%
left_join(funny_sex_and_animals, by = "superbowl_ads_dot_com_url") %>%
mutate(funny_sex_and_animals = replace(funny_sex_and_animals, is.na(funny_sex_and_animals),
FALSE))
summary(commercials)
## year brand superbowl_ads_dot_com_url youtube_url
## Min. :2000 Length:244 Length:244 Length:244
## 1st Qu.:2005 Class :character Class :character Class :character
## Median :2010 Mode :character Mode :character Mode :character
## Mean :2010
## 3rd Qu.:2015
## Max. :2020
## funny show_product_quickly patriotic celebrity
## Mode :logical Mode :logical Mode :logical Mode :logical
## FALSE:76 FALSE:78 FALSE:203 FALSE:176
## TRUE :168 TRUE :166 TRUE :41 TRUE :68
##
##
##
## danger animals use_sex funny_and_patriotic
## Mode :logical Mode :logical Mode :logical Mode :logical
## FALSE:169 FALSE:155 FALSE:181 FALSE:232
## TRUE :75 TRUE :89 TRUE :63 TRUE :12
##
##
##
## danger_and_not_funny celebrity_and_patriotic funny_sex_and_animals
## Mode :logical Mode :logical Mode :logical
## FALSE:235 FALSE:232 FALSE:224
## TRUE :9 TRUE :12 TRUE :20
##
##
##
Adding to the table from above:
Column | True | False |
---|---|---|
funny_and_patriotic | 12 | 232 |
danger_and_not_funny | 9 | 235 |
celebrity_and_patriotic | 12 | 232 |
funny_sex_and_animals | 20 | 224 |
commercials %>%
group_by(year) %>%
count(use_sex) %>%
mutate(use_sex = tolower(as.character(use_sex))) %>%
pivot_wider(names_from = "use_sex", values_from = "n") %>%
mutate(true = replace(true, is.na(true), 0), total = true + false, prop = true/total) %>%
ggplot(aes(x = year, y = prop)) + geom_smooth() + geom_point() + scale_y_continuous(labels = percent) +
labs(title = "Sex in Super Bowl Ads Over the Years", x = "Year", y = "Proportion of Ads that Used Sex")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
It actually seems that ads have used less sex over the years.
To generate a table recording the number of ads created by each brand, we would use the following:
commercials %>%
count(brand)
## brand n
## 1 Bud Light 60
## 2 Budweiser 43
## 3 Coca-Cola 21
## 4 Doritos 25
## 5 E-Trade 13
## 6 Hyundai 22
## 7 Kia 13
## 8 NFL 11
## 9 Pepsi 25
## 10 Toyota 11
If we wanted to visualize these results, we could use a bar graph to compare the number of commercials that each brand made from 2000-2020.
commercials %>%
ggplot(aes(x = brand)) + geom_bar(aes(color = brand, fill = brand)) + labs(title = "Number of Ads by Brand",
x = "Brand", y = "Number of Ads") + theme(legend.position = "none")
From here we can see that Bud Light and Budweiser, both beer companies, made the most ads, with Bud Light making the most. We can also see that Hyundai made the most ads out of all three car companies, and Pepsi made a few more ads than their competitor, Coca-Cola.
sum(as.integer(commercials$show_product_quickly))/nrow(commercials)
## [1] 0.6803279
68% of advertisements shows their products within the first 10 seconds.
commercials %>%
group_by(brand) %>%
count(show_product_quickly) %>%
mutate(show_product_quickly = tolower(as.character(show_product_quickly))) %>%
pivot_wider(names_from = show_product_quickly, values_from = n) %>%
mutate(total = false + true, prop = true/total)
## # A tibble: 10 × 5
## # Groups: brand [10]
## brand false true total prop
## <chr> <int> <int> <int> <dbl>
## 1 Bud Light 11 49 60 0.817
## 2 Budweiser 16 27 43 0.628
## 3 Coca-Cola 6 15 21 0.714
## 4 Doritos 3 22 25 0.88
## 5 E-Trade 6 7 13 0.538
## 6 Hyundai 10 12 22 0.545
## 7 Kia 9 4 13 0.308
## 8 NFL 5 6 11 0.545
## 9 Pepsi 10 15 25 0.6
## 10 Toyota 2 9 11 0.818
Since some companies created more commercials than other, I used proportion to see which companies are quick to show their product. Doritos (88%), Toyota (81.8%), and Bud Light (81.7%) are the companies that are quickest to reveal themselves, with over 80% of their ads showing the product within the first 10 seconds.
commercials %>%
group_by(brand) %>%
count(funny_sex_and_animals) %>%
mutate(funny_sex_and_animals = tolower(as.character(funny_sex_and_animals))) %>%
pivot_wider(names_from = funny_sex_and_animals, values_from = n)
## # A tibble: 10 × 3
## # Groups: brand [10]
## brand false true
## <chr> <int> <int>
## 1 Bud Light 51 9
## 2 Budweiser 40 3
## 3 Coca-Cola 21 NA
## 4 Doritos 24 1
## 5 E-Trade 12 1
## 6 Hyundai 21 1
## 7 Kia 10 3
## 8 NFL 11 NA
## 9 Pepsi 24 1
## 10 Toyota 10 1
Every company, except for Coca-Cola and the NFL, has in some way combined humor, sex, and animals in their advertisements.
Using a combination of R and SQL (from sqldf package), I was able to wrangle the data into a data frame that included the combinations mentioned in the article. Each smaller subset of the data frame, created using SQL code, would direct someone to the proper URL to be able to watch commercials based on their preferred combinations. Any of the code above in the “Analyzing Combinations” section can be modified to include new or alternative columns to create newly categorized ads based on new combinations.
I would be interested to know which commercials and which
combinations of characteristics are most popular. To do this, I would go
through each YouTube link and add columns into the data set for
total_views
and number_of_likes
. This way I
could compare to see which advertisements are seem most popular and I
could see what combination of factors were used. Alternatively, I could
see which combinations are least popular based on most dislikes.
This method would be somewhat limited, as not all people who enjoy a YouTube video actually take the time to “like” it. Also, most, if not all, of these advertisements are probably posted on many different YouTube channels while this data set only provides one. Therefore, any of the links provided in the data set could be less visited/liked than the same advertisement posted on another channel. However, I think this would anyways be a good start to measuring the popularity of these commercials, and it would certainly give insight to someone who was trying to figure out which of these criteria or combinations of factors would be good to put in their next ad.