##Overview
Analysis based on the data set provided by FiveThirtyEight regarding Super Bowl ads. Data set includes information about the brands, categories, and characteristics of ads aired. Objective is to clean and transform the data set by selecting columns with meaningful names and highlighting potential areas for analysis.
Source Data: https://github.com/fivethirtyeight/superbowl-ads/blob/main/superbowl-ads.csv
##Loading
Loading data directly from GitHub into R. Selecting columns of importance for further analysis. Lastly, renaming them for readability and relevance.
url <- "https://raw.githubusercontent.com/fivethirtyeight/superbowl-ads/main/superbowl-ads.csv"
superbowl_ads <- read_csv(url)
## Rows: 244 Columns: 11
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): brand, superbowl_ads_dot_com_url, youtube_url
## dbl (1): year
## lgl (7): funny, show_product_quickly, patriotic, celebrity, danger, animals,...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(superbowl_ads)
## # A tibble: 6 × 11
## year brand superbowl_ads_dot_com…¹ youtube_url funny show_product_quickly
## <dbl> <chr> <chr> <chr> <lgl> <lgl>
## 1 2018 Toyota https://superbowl-ads.… https://ww… FALSE FALSE
## 2 2020 Bud Light https://superbowl-ads.… https://ww… TRUE TRUE
## 3 2006 Bud Light https://superbowl-ads.… https://ww… TRUE FALSE
## 4 2018 Hynudai https://superbowl-ads.… https://ww… FALSE TRUE
## 5 2003 Bud Light https://superbowl-ads.… https://ww… TRUE TRUE
## 6 2020 Toyota https://superbowl-ads.… https://ww… TRUE TRUE
## # ℹ abbreviated name: ¹​superbowl_ads_dot_com_url
## # ℹ 5 more variables: patriotic <lgl>, celebrity <lgl>, danger <lgl>,
## # animals <lgl>, use_sex <lgl>
##Cleaning
Selecting columns relevant for analysis (brand, category, year, humor, patriotism, and likeability of the ads).
ads_cleaned <- superbowl_ads %>%
select(brand, year, funny, patriotic, celebrity, animals, use_sex) %>%
rename(
Brand = brand,
Year = year,
Funny = funny,
Patriotic = patriotic,
Celebrity = celebrity,
Animals = animals,
Sex_Appeal = use_sex
)
head(ads_cleaned)
## # A tibble: 6 × 7
## Brand Year Funny Patriotic Celebrity Animals Sex_Appeal
## <chr> <dbl> <lgl> <lgl> <lgl> <lgl> <lgl>
## 1 Toyota 2018 FALSE FALSE FALSE FALSE FALSE
## 2 Bud Light 2020 TRUE FALSE TRUE FALSE FALSE
## 3 Bud Light 2006 TRUE FALSE FALSE TRUE FALSE
## 4 Hynudai 2018 FALSE FALSE FALSE FALSE FALSE
## 5 Bud Light 2003 TRUE FALSE FALSE TRUE TRUE
## 6 Toyota 2020 TRUE FALSE TRUE TRUE FALSE
str(ads_cleaned)
## tibble [244 × 7] (S3: tbl_df/tbl/data.frame)
## $ Brand : chr [1:244] "Toyota" "Bud Light" "Bud Light" "Hynudai" ...
## $ Year : num [1:244] 2018 2020 2006 2018 2003 ...
## $ Funny : logi [1:244] FALSE TRUE TRUE FALSE TRUE TRUE ...
## $ Patriotic : logi [1:244] FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ Celebrity : logi [1:244] FALSE TRUE FALSE FALSE FALSE TRUE ...
## $ Animals : logi [1:244] FALSE FALSE TRUE FALSE TRUE TRUE ...
## $ Sex_Appeal: logi [1:244] FALSE FALSE FALSE FALSE TRUE FALSE ...
##Analysis
Plotting data to identify any possible trends or correlation between points.
summary(ads_cleaned)
## Brand Year Funny Patriotic
## Length:244 Min. :2000 Mode :logical Mode :logical
## Class :character 1st Qu.:2005 FALSE:76 FALSE:203
## Mode :character Median :2010 TRUE :168 TRUE :41
## Mean :2010
## 3rd Qu.:2015
## Max. :2020
## Celebrity Animals Sex_Appeal
## Mode :logical Mode :logical Mode :logical
## FALSE:176 FALSE:155 FALSE:181
## TRUE :68 TRUE :89 TRUE :63
##
##
##
ggplot(ads_cleaned, aes(x = Brand)) +
geom_bar() +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
labs(title = "Distribution of Super Bowl Ads by Brand",
x = "Brand",
y = "Count")
ads_cleaned %>%
group_by(Year) %>%
summarise(
Funny_Ratio = mean(Funny),
Patriotic_Ratio = mean(Patriotic),
Celebrity_Ratio = mean(Celebrity)
) %>%
pivot_longer(cols = ends_with("_Ratio"),
names_to = "Characteristic",
values_to = "Ratio") %>%
ggplot(aes(x = Year, y = Ratio, color = Characteristic)) +
geom_line() +
theme_minimal() +
labs(title = "Trends in Ad Characteristics Over Time",
x = "Year",
y = "Proportion of Ads")
##Columns
##Conclusion
write_csv(ads_cleaned, "superbowl_ads_cleaned.csv")
print("Cleaned dataset saved as 'superbowl_ads_cleaned.csv'")
## [1] "Cleaned dataset saved as 'superbowl_ads_cleaned.csv'"