Objectives

The purpose of this analysis is to examine online engagement (measured in YouTube likes) with the Super Bowl advertisements of four major beverage brands: Coca-Cola, Pepsi, Budweiser, and Bud Light. By conducting an ANOVA test, we aim to determine whether there are significant differences in the number of YouTube likes the brands received. The broader aim of this study is to identify leading brands in the beverage industry and to understand what strategies they might be employing to differentiate themselves from the competition.

This analysis uses a data set obtained from the Maven Analytics website. The data set contains variables such as ‘Brand’, ‘Year’, ‘Youtube Views’, and ‘Youtube Likes’; we focus on the YouTube likes received by Super Bowl ads from the brands mentioned above.

Data Dictionary

Year: The year the commercial aired.
Brand: The name of the company that aired the commercial.
Youtube Views: The number of views the commercial has received on YouTube.
Youtube Likes: The number of likes the commercial has received on YouTube.

Step 1: Install and load required libraries

library(dplyr)
library(readxl)
library(car)

df <- read_excel("C:\\Users\\belen\\OneDrive\\Homework\\Stats\\superbowl_commercials.xlsx")

Data Summary

summary(df)

##       Year         Brand           Superbowl Ads Link Youtube Link      
##  Min.   :2000   Length:249         Length:249         Length:249        
##  1st Qu.:2006   Class :character   Class :character   Class :character  
##  Median :2010   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :2010                                                           
##  3rd Qu.:2015                                                           
##  Max.   :2021                                                           
##                                                                         
##    Funny         Shows Product Quickly Patriotic       Celebrity      
##  Mode :logical   Mode :logical         Mode :logical   Mode :logical  
##  FALSE:77        FALSE:81              FALSE:207       FALSE:178      
##  TRUE :172       TRUE :168             TRUE :42        TRUE :71       
##                                                                       
##                                                                       
##                                                                       
##                                                                       
##    Danger         Animals         Uses Sex           Length      
##  Mode :logical   Mode :logical   Mode :logical   Min.   : 10.00  
##  FALSE:171       FALSE:159       FALSE:186       1st Qu.: 30.00  
##  TRUE :78        TRUE :90        TRUE :63        Median : 30.00  
##                                                  Mean   : 44.46  
##                                                  3rd Qu.: 60.00  
##                                                  Max.   :180.00  
##                                                                  
##  Estimated Cost   Youtube Views       Youtube Likes        TV Viewers    
##  Min.   : 0.980   Min.   :        5   Min.   :     0.0   Min.   : 84.34  
##  1st Qu.: 2.400   1st Qu.:     7256   1st Qu.:    20.5   1st Qu.: 90.75  
##  Median : 3.150   Median :    47309   Median :   146.0   Median : 98.73  
##  Mean   : 5.157   Mean   :  1569672   Mean   :  5086.6   Mean   :100.48  
##  3rd Qu.: 5.900   3rd Qu.:   181362   3rd Qu.:   704.5   3rd Qu.:111.01  
##  Max.   :31.730   Max.   :181423810   Max.   :295000.0   Max.   :232.00  
##                   NA's   :12          NA's   :18

Step 2: Clean data

We will keep only the four beverage brands (Coca-Cola, Pepsi, Budweiser, Bud Light) and drop ads whose YouTube likes are missing or equal to zero.

filtered_data <- df %>%
  filter(Brand %in% c("Coca-Cola", "Pepsi", "Budweiser", "Bud Light")) %>%
  filter(!is.na(`Youtube Likes`) & `Youtube Likes` != 0)

Now we will identify the outliers, which in this case are understood as observations that fall below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR.

a) Calculate the IQR as Q3 - Q1:

Q1 <- quantile(filtered_data$`Youtube Likes`, 0.25, na.rm = TRUE)
Q3 <- quantile(filtered_data$`Youtube Likes`, 0.75, na.rm = TRUE)
IQR <- Q3 - Q1

b) Calculate lower and upper bounds:

lower_bound <- Q1 - 1.5 * IQR
upper_bound <- Q3 + 1.5 * IQR

c) Filter out the values below the lower bound and above the upper bound, keeping only the observations within the bounds:

new_data <- filtered_data %>%
  filter(`Youtube Likes` >= lower_bound & `Youtube Likes` <= upper_bound)
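The three steps above can be sketched on a small toy vector to make the rule concrete. This is only an illustration in base R: the values in `likes` are made up (not taken from the Super Bowl data set), and `iqr_bounds` is a hypothetical helper name introduced here.

```r
# Toy data: hypothetical like counts, NOT from the Super Bowl data set
likes <- c(12, 35, 70, 104, 166, 263, 310, 9500)

# Hypothetical helper: lower and upper fences of the 1.5 * IQR rule
iqr_bounds <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  spread <- q[2] - q[1]                       # interquartile range, Q3 - Q1
  c(lower = q[1] - k * spread, upper = q[2] + k * spread)
}

bounds  <- iqr_bounds(likes)
kept    <- likes[likes >= bounds["lower"] & likes <= bounds["upper"]]
removed <- setdiff(likes, kept)               # observations flagged as outliers

print(bounds)
print(removed)
```

With these toy values, only the single very large observation falls outside the fences. Because like counts cannot be negative and are strongly right-skewed, the lower fence will often be negative, so in practice the rule tends to trim only the high end.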

-Median Super Bowl ads YouTube likes

We will proceed to calculate the median of YouTube likes for each brand. Note that this step uses filtered_data, i.e. the data before the outliers were removed:

medians_per_brand <- filtered_data %>%
  group_by(Brand) %>%
  summarise(median_likes = median(`Youtube Likes`, na.rm = TRUE))
print(medians_per_brand)

## # A tibble: 4 × 2
##   Brand     median_likes
##   <chr>            <dbl>
## 1 Bud Light          70 
## 2 Budweiser         166.
## 3 Coca-Cola         263 
## 4 Pepsi             104.

Interpretation: The calculated median values of YouTube likes for each brand are as follows:

    •   Bud Light: 70 likes
    •   Budweiser: 166 likes
    •   Coca-Cola: 263 likes
    •   Pepsi: 104 likes

Coca-Cola has the highest median number of YouTube likes among the brands analyzed, indicating that their content may resonate more with the platform’s audience or that they have a larger reach. Bud Light has the lowest median, which could suggest less engagement with their content or a smaller audience on YouTube. Budweiser and Pepsi have median values that fall between those of Bud Light and Coca-Cola, with Budweiser having a notably higher median than Pepsi.

Step 3: Develop hypothesis

-H0: u1 = u2 = u3 = u4

-Ha: not all population means are equal

-Parameters:

u1 = population mean of YouTube likes of Bud Light ads
u2 = population mean of YouTube likes of Budweiser ads
u3 = population mean of YouTube likes of Coca-Cola ads
u4 = population mean of YouTube likes of Pepsi ads
Response variable: YouTube likes
Factor: Brand
Treatments: Pepsi, Bud Light, Budweiser, Coca-Cola

Step 4: Check the following ANOVA assumptions:

Homogeneity of Variances: All treatment groups have the same variance
Normality: the residuals (the data within each treatment group) are normally distributed

1) Levene’s Test for Homogeneity of Variance

leveneTest(`Youtube Likes` ~ Brand, data = new_data)

Interpretation: There is not enough evidence to conclude that the variances across groups are significantly different. Therefore, we can assume the variances of the groups are equal.
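To illustrate what Levene's test measures, the sketch below reproduces its logic in base R on simulated data: it runs a one-way ANOVA on the absolute deviations of each observation from its group median (the centering that `car::leveneTest` uses by default). The data frame `toy` and its columns are hypothetical and unrelated to the Super Bowl data.

```r
set.seed(1)

# Simulated data (assumption): three hypothetical groups, the third with a larger spread
toy <- data.frame(
  brand = factor(rep(c("A", "B", "C"), each = 20)),
  likes = c(rnorm(20, 100, 10), rnorm(20, 100, 10), rnorm(20, 100, 40))
)

# Absolute deviation of each observation from its own group median
group_median <- ave(toy$likes, toy$brand, FUN = median)
toy$abs_dev  <- abs(toy$likes - group_median)

# Levene's test (median-centered): one-way ANOVA on the absolute deviations
levene_tab <- anova(lm(abs_dev ~ brand, data = toy))
levene_p   <- levene_tab[["Pr(>F)"]][1]

print(levene_tab)
```

A small p-value here would indicate that the typical deviation from the group median differs between groups, i.e. unequal variances; a large one, as in the analysis above, gives no evidence against homogeneity.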

2) Normality plot of residuals

res.aov <- aov(`Youtube Likes` ~ Brand, data = new_data)
summary(res.aov)

##              Df  Sum Sq Mean Sq F value Pr(>F)
## Brand         3   54580   18193   0.313  0.816
## Residuals   110 6395859   58144
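As a quick arithmetic check, the F value and p-value in the table above can be recomputed from the printed sums of squares and degrees of freedom. The variable names below are introduced only for this sketch; because the inputs are the rounded figures from the table, the results agree only up to rounding.

```r
# Figures copied from the ANOVA table above
ss_brand <- 54580;   df_brand <- 3
ss_resid <- 6395859; df_resid <- 110

# Mean square = sum of squares / degrees of freedom
ms_brand <- ss_brand / df_brand
ms_resid <- ss_resid / df_resid

# F statistic = between-group mean square / residual mean square
f_value <- ms_brand / ms_resid

# Upper-tail probability of the F distribution gives the p-value
p_value <- pf(f_value, df_brand, df_resid, lower.tail = FALSE)

print(round(c(F = f_value, p = p_value), 3))
```

An F value well below 1 means the variation between brand means is smaller than the variation within brands, which is why the p-value is large.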

plot(res.aov, 2)

-Interpretation: As shown in the graph above, the residual points do not fall approximately along the 45-degree reference line, so we cannot assume normality. Because not all assumptions are met, we need to conduct a non-parametric ANOVA test.
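A Q-Q plot is a visual judgment, so a formal test can be a useful complement. The sketch below applies the Shapiro-Wilk test to the residuals of a one-way ANOVA fitted on simulated, right-skewed data. This is not part of the original analysis: the data frame `toy` is hypothetical, and the exponential distribution is chosen only to mimic the skew typical of like counts.

```r
set.seed(42)

# Simulated data (assumption): four hypothetical groups with right-skewed values
toy <- data.frame(
  brand = factor(rep(c("A", "B", "C", "D"), each = 30)),
  likes = rexp(120, rate = 1 / 200)
)

# Fit the one-way ANOVA and test its residuals for normality
fit <- aov(likes ~ brand, data = toy)
sw  <- shapiro.test(residuals(fit))

print(sw)
```

A small p-value from `shapiro.test` indicates that the residuals depart from normality, which would support the same decision reached from the plot: switch to a rank-based test.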

-Non-parametric alternative to the one-way ANOVA test (Kruskal-Wallis rank sum test):

kruskal.test(`Youtube Likes` ~ Brand, data = new_data)

## 
##  Kruskal-Wallis rank sum test
## 
## data:  Youtube Likes by Brand
## Kruskal-Wallis chi-squared = 1.3573, df = 3, p-value = 0.7156

-Interpretation: Given p-value (0.7156) > alpha (0.05), there is not sufficient evidence to reject H0. Therefore, we cannot conclude that the four beverage brands differ in the average YouTube likes of their Super Bowl ads. Since we do not reject H0, there is no need to perform a Fisher’s least significant difference test.
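The p-value printed in the Kruskal-Wallis output above can be recomputed from the reported statistic and degrees of freedom, since the statistic is referred to a chi-squared distribution. The second half of the sketch computes the H statistic by hand on toy data without ties and compares it with `kruskal.test`. The toy vectors `likes` and `brand` are hypothetical values, not the study data.

```r
# p-value implied by the reported statistic (chi-squared = 1.3573, df = 3);
# should be close to the p-value printed in the output above
p_reported <- pchisq(1.3573, df = 3, lower.tail = FALSE)
print(p_reported)

# Toy data (assumption): 3 hypothetical groups of 4 ads each, no tied values
likes <- c(12, 35, 70, 104, 166, 263, 310, 45, 88, 19, 140, 205)
brand <- factor(rep(c("A", "B", "C"), each = 4))

# H = 12 / (N (N + 1)) * sum(R_j^2 / n_j) - 3 (N + 1), valid when there are no ties
r        <- rank(likes)
n        <- length(likes)
rank_sum <- tapply(r, brand, sum)      # R_j: sum of ranks in each group
group_n  <- tapply(r, brand, length)   # n_j: group sizes
h_manual <- 12 / (n * (n + 1)) * sum(rank_sum^2 / group_n) - 3 * (n + 1)

h_builtin <- unname(kruskal.test(likes ~ brand)$statistic)
print(c(manual = h_manual, builtin = h_builtin))
```

Because the test works on ranks rather than raw values, a few ads with extremely high like counts have limited influence on the result.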

Conclusion and Recommendations

Super Bowl ads for Coca-Cola, Pepsi, Bud Light, and Budweiser (the beverage category) show no significant differences in average likes on YouTube. This suggests that, in terms of YouTube likes, the four companies' current advertising strategies are comparably effective, with no brand clearly outperforming the others.

To enhance advertising success, further analysis is advisable to identify which variables significantly influence YouTube likes. This targeted investigation would enable the optimization of advertising strategies, so that the most effective techniques are implemented in future campaigns. Additionally, more detailed research would help the beverage companies distinguish themselves from competitors and tailor their approach to better meet market demands and viewer preferences.

Super Bowl Ads - An ANOVA Analysis

Authors: Thu Lam, Belen Cerrutti, Chinmayi Bolisetty

2024-04-26

Introduction