##Overview

Analysis based on the data set provided by FiveThirtyEight regarding Super Bowl ads. Data set includes information about the brands, categories, and characteristics of ads aired. Objective is to clean and transform the data set by selecting columns with meaningful names and highlighting potential areas for analysis.

Source Data: https://github.com/fivethirtyeight/superbowl-ads/blob/main/superbowl-ads.csv

##Loading

Loading data directly from GitHub into R. Selecting columns of importance for further analysis. Lastly, renaming them for readability and relevance.

url <- "https://raw.githubusercontent.com/fivethirtyeight/superbowl-ads/main/superbowl-ads.csv"
superbowl_ads <- read_csv(url)
## Rows: 244 Columns: 11
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): brand, superbowl_ads_dot_com_url, youtube_url
## dbl (1): year
## lgl (7): funny, show_product_quickly, patriotic, celebrity, danger, animals,...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(superbowl_ads)
## # A tibble: 6 × 11
##    year brand     superbowl_ads_dot_com…¹ youtube_url funny show_product_quickly
##   <dbl> <chr>     <chr>                   <chr>       <lgl> <lgl>               
## 1  2018 Toyota    https://superbowl-ads.… https://ww… FALSE FALSE               
## 2  2020 Bud Light https://superbowl-ads.… https://ww… TRUE  TRUE                
## 3  2006 Bud Light https://superbowl-ads.… https://ww… TRUE  FALSE               
## 4  2018 Hynudai   https://superbowl-ads.… https://ww… FALSE TRUE                
## 5  2003 Bud Light https://superbowl-ads.… https://ww… TRUE  TRUE                
## 6  2020 Toyota    https://superbowl-ads.… https://ww… TRUE  TRUE                
## # ℹ abbreviated name: ¹​superbowl_ads_dot_com_url
## # ℹ 5 more variables: patriotic <lgl>, celebrity <lgl>, danger <lgl>,
## #   animals <lgl>, use_sex <lgl>

##Cleaning

Selecting columns relevant for analysis (brand, category, year, humor, patriotism, and likeability of the ads).

ads_cleaned <- superbowl_ads %>%
  select(brand, year, funny, patriotic, celebrity, animals, use_sex) %>%
  rename(
    Brand = brand,
    Year = year,
    Funny = funny,
    Patriotic = patriotic,
    Celebrity = celebrity,
    Animals = animals,
    Sex_Appeal = use_sex
  )

head(ads_cleaned)
## # A tibble: 6 × 7
##   Brand      Year Funny Patriotic Celebrity Animals Sex_Appeal
##   <chr>     <dbl> <lgl> <lgl>     <lgl>     <lgl>   <lgl>     
## 1 Toyota     2018 FALSE FALSE     FALSE     FALSE   FALSE     
## 2 Bud Light  2020 TRUE  FALSE     TRUE      FALSE   FALSE     
## 3 Bud Light  2006 TRUE  FALSE     FALSE     TRUE    FALSE     
## 4 Hynudai    2018 FALSE FALSE     FALSE     FALSE   FALSE     
## 5 Bud Light  2003 TRUE  FALSE     FALSE     TRUE    TRUE      
## 6 Toyota     2020 TRUE  FALSE     TRUE      TRUE    FALSE
str(ads_cleaned)
## tibble [244 × 7] (S3: tbl_df/tbl/data.frame)
##  $ Brand     : chr [1:244] "Toyota" "Bud Light" "Bud Light" "Hynudai" ...
##  $ Year      : num [1:244] 2018 2020 2006 2018 2003 ...
##  $ Funny     : logi [1:244] FALSE TRUE TRUE FALSE TRUE TRUE ...
##  $ Patriotic : logi [1:244] FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ Celebrity : logi [1:244] FALSE TRUE FALSE FALSE FALSE TRUE ...
##  $ Animals   : logi [1:244] FALSE FALSE TRUE FALSE TRUE TRUE ...
##  $ Sex_Appeal: logi [1:244] FALSE FALSE FALSE FALSE TRUE FALSE ...

##Analysis

Plotting data to identify any possible trends or correlation between points.

summary(ads_cleaned)
##     Brand                Year        Funny         Patriotic      
##  Length:244         Min.   :2000   Mode :logical   Mode :logical  
##  Class :character   1st Qu.:2005   FALSE:76        FALSE:203      
##  Mode  :character   Median :2010   TRUE :168       TRUE :41       
##                     Mean   :2010                                  
##                     3rd Qu.:2015                                  
##                     Max.   :2020                                  
##  Celebrity        Animals        Sex_Appeal     
##  Mode :logical   Mode :logical   Mode :logical  
##  FALSE:176       FALSE:155       FALSE:181      
##  TRUE :68        TRUE :89        TRUE :63       
##                                                 
##                                                 
## 
ggplot(ads_cleaned, aes(x = Brand)) +
  geom_bar() +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Distribution of Super Bowl Ads by Brand",
       x = "Brand",
       y = "Count")

ads_cleaned %>%
  group_by(Year) %>%
  summarise(
    Funny_Ratio = mean(Funny),
    Patriotic_Ratio = mean(Patriotic),
    Celebrity_Ratio = mean(Celebrity)
  ) %>%
  pivot_longer(cols = ends_with("_Ratio"), 
               names_to = "Characteristic", 
               values_to = "Ratio") %>%
  ggplot(aes(x = Year, y = Ratio, color = Characteristic)) +
  geom_line() +
  theme_minimal() +
  labs(title = "Trends in Ad Characteristics Over Time",
       x = "Year",
       y = "Proportion of Ads")

##Columns

##Conclusion

write_csv(ads_cleaned, "superbowl_ads_cleaned.csv")
print("Cleaned dataset saved as 'superbowl_ads_cleaned.csv'")
## [1] "Cleaned dataset saved as 'superbowl_ads_cleaned.csv'"