The Super Bowl is one of the most-viewed annual events in America, with around 99.18 million viewers in 2021 and 115 million viewers in 2023. With an audience that large, companies know that money spent on a Super Bowl advertisement buys a very large number of views. It is estimated that companies spend approximately 545 million dollars just for 30 seconds of ad time during the Super Bowl.
Furthermore, the advertisements companies produce add enjoyment to the Super Bowl. The ads entertain viewers: 3 in 10 viewers stated that “commercials were their favorite part”, and more than 5 in 10 viewers stated that funny ads were their “favorite type of Super Bowl commercial”. These advertisements also bring in profit, with total revenue for commercials being 600 million.
The purpose of this analysis is to examine online engagement (measured in YouTube likes) with the Super Bowl advertisements of four major beverage brands: Coca-Cola, Pepsi, Budweiser and Bud Light. By conducting an ANOVA test, we aim to determine whether there are significant differences in the number of YouTube likes the brands received. The aim of this study is to identify leading brands in the beverage industry and to understand what strategies they might be employing to differentiate themselves from the competition.
This analysis uses a data set obtained from the Maven Analytics website, which can be found at this link. This data set contains variables such as ‘Brand’, ‘Year’, ‘Youtube Views’, and ‘Youtube Likes’, focusing particularly on the Youtube likes received by Super Bowl ads from the brands mentioned above.
Year: The year the commercial aired.
Brand: The name of the company that aired the commercial.
Youtube Views: The number of views the commercial has received on YouTube.
Youtube Likes: The number of likes the commercial has received on YouTube.
library(dplyr)
library(readxl)
library(car)
df <- read_excel("C:\\Users\\belen\\OneDrive\\Homework\\Stats\\superbowl_commercials.xlsx")
summary(df)
## Year Brand Superbowl Ads Link Youtube Link
## Min. :2000 Length:249 Length:249 Length:249
## 1st Qu.:2006 Class :character Class :character Class :character
## Median :2010 Mode :character Mode :character Mode :character
## Mean :2010
## 3rd Qu.:2015
## Max. :2021
##
## Funny Shows Product Quickly Patriotic Celebrity
## Mode :logical Mode :logical Mode :logical Mode :logical
## FALSE:77 FALSE:81 FALSE:207 FALSE:178
## TRUE :172 TRUE :168 TRUE :42 TRUE :71
##
##
##
##
## Danger Animals Uses Sex Length
## Mode :logical Mode :logical Mode :logical Min. : 10.00
## FALSE:171 FALSE:159 FALSE:186 1st Qu.: 30.00
## TRUE :78 TRUE :90 TRUE :63 Median : 30.00
## Mean : 44.46
## 3rd Qu.: 60.00
## Max. :180.00
##
## Estimated Cost Youtube Views Youtube Likes TV Viewers
## Min. : 0.980 Min. : 5 Min. : 0.0 Min. : 84.34
## 1st Qu.: 2.400 1st Qu.: 7256 1st Qu.: 20.5 1st Qu.: 90.75
## Median : 3.150 Median : 47309 Median : 146.0 Median : 98.73
## Mean : 5.157 Mean : 1569672 Mean : 5086.6 Mean :100.48
## 3rd Qu.: 5.900 3rd Qu.: 181362 3rd Qu.: 704.5 3rd Qu.:111.01
## Max. :31.730 Max. :181423810 Max. :295000.0 Max. :232.00
## NA's :12 NA's :18
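We will only keep beverage companies (Coca-Cola, Pepsi, Budweiser, Bud Light) and drop ads whose YouTube likes are missing or zero.
# Keep the four beverage brands, then remove rows with missing or zero likes
filtered_data <- df %>%
  filter(Brand %in% c("Coca-Cola", "Pepsi", "Budweiser", "Bud Light")) %>%
  filter(!is.na(`Youtube Likes`) & `Youtube Likes` != 0)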
Now we will identify the outliers, which in this case are understood as observations that fall below Q1 - (1.5 * IQR) or above Q3 + (1.5 * IQR).
a) Calculate IQR as Q3 - Q1:
Q1 <- quantile(filtered_data$`Youtube Likes`, 0.25, na.rm = TRUE)
Q3 <- quantile(filtered_data$`Youtube Likes`, 0.75, na.rm = TRUE)
IQR <- Q3 - Q1
b) Calculate lower and upper bounds:
lower_bound <- Q1 - 1.5 * IQR
upper_bound <- Q3 + 1.5 * IQR
c) Remove the outliers by keeping only the values between the lower and upper bounds:
new_data <- filtered_data %>%
  filter(`Youtube Likes` >= lower_bound & `Youtube Likes` <= upper_bound)
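As a small illustration of the 1.5 * IQR rule used above, the following sketch applies the same steps to a made-up vector of like counts. The values and the names `likes`, `kept` and `dropped` are hypothetical and are not taken from the data set.

```r
# Toy like counts (invented for illustration, not from the Super Bowl data)
likes <- c(12, 35, 70, 104, 166, 263, 410, 5000)

# Quartiles and interquartile range
q1 <- quantile(likes, 0.25, names = FALSE)
q3 <- quantile(likes, 0.75, names = FALSE)
iqr <- q3 - q1

# Fences: anything outside [lower_bound, upper_bound] is treated as an outlier
lower_bound <- q1 - 1.5 * iqr
upper_bound <- q3 + 1.5 * iqr

# Split the vector into kept values and flagged outliers
kept <- likes[likes >= lower_bound & likes <= upper_bound]
dropped <- setdiff(likes, kept)

print(dropped)  # only the extreme value 5000 is flagged
```

In this toy example the lower fence is negative, so it can never flag a like count; with right-skewed count data of this kind, the rule may in practice only trim the upper tail.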
-Median Super Bowl ads YouTube likes
medians_per_brand <- filtered_data %>%
  group_by(Brand) %>%
  summarise(median_likes = median(`Youtube Likes`, na.rm = TRUE))
print(medians_per_brand)
## # A tibble: 4 × 2
## Brand median_likes
## <chr> <dbl>
## 1 Bud Light 70
## 2 Budweiser 166.
## 3 Coca-Cola 263
## 4 Pepsi 104.
Interpretation: The calculated median values of YouTube likes for each brand (computed on filtered_data, that is, before the outliers were removed, and rounded to whole likes) are as follows:
• Bud Light: 70 likes
• Budweiser: 166 likes
• Coca-Cola: 263 likes
• Pepsi: 104 likes
Coca-Cola has the highest median number of YouTube likes among the brands analyzed, indicating that their content may resonate more with the platform’s audience or that they have a larger reach. Bud Light has the lowest median, which could suggest less engagement with their content or a smaller audience on YouTube. Budweiser and Pepsi have median values that fall between those of Bud Light and Coca-Cola, with Budweiser having a notably higher median than Pepsi.
-H0: μ1 = μ2 = μ3 = μ4
-Ha: not all population means are equal
-Parameters:
μ1 = population mean of YouTube likes of Bud Light ads
μ2 = population mean of YouTube likes of Budweiser ads
μ3 = population mean of YouTube likes of Coca-Cola ads
μ4 = population mean of YouTube likes of Pepsi ads
Response variable: YouTube likes
Factor: Brand
Treatments: Pepsi, Bud Light, Budweiser, Coca-Cola
Assumption 1, homogeneity of variances: all treatment groups have the same population variance.
Assumption 2, normality: the residuals (the observations within each treatment group) are normally distributed.
1) Levene’s Test for Homogeneity of Variance
leveneTest(`Youtube Likes` ~ Brand, data = new_data)
Interpretation: There is not enough evidence to conclude that the variances of the groups differ significantly. Therefore, we can assume that the variances of the data are equal across brands.
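To make the idea behind Levene’s test more concrete, here is a minimal base-R sketch on synthetic data. It assumes the median-centered variant (the default in `car::leveneTest`), which amounts to a one-way ANOVA on the absolute deviations of each observation from its group median. The data frame `toy` and its columns are hypothetical.

```r
# Synthetic example data (not the Super Bowl data set)
set.seed(1)
toy <- data.frame(
  likes = c(rnorm(10, 100, 20), rnorm(10, 120, 20), rnorm(10, 110, 20)),
  brand = factor(rep(c("A", "B", "C"), each = 10))
)

# Absolute deviation of each observation from its own group median
toy$abs_dev <- abs(toy$likes - ave(toy$likes, toy$brand, FUN = median))

# One-way ANOVA on the absolute deviations; a large p-value means
# there is no evidence that the group spreads differ
levene_fit <- anova(lm(abs_dev ~ brand, data = toy))
levene_p <- levene_fit[["Pr(>F)"]][1]

print(levene_p)
```

A p-value above the chosen alpha (0.05 in this analysis) would be read the same way as above: no evidence against equal variances.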
2) Normality plot of residuals
res.aov <- aov(`Youtube Likes` ~ Brand, data = new_data)
summary(res.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## Brand 3 54580 18193 0.313 0.816
## Residuals 110 6395859 58144
plot(res.aov, 2)
-Interpretation: As shown in the graph above, the residual points do not fall approximately along the reference line of the Q-Q plot, so we cannot assume normality. Because not all assumptions are met, we need to use a non-parametric alternative to the one-way ANOVA test.
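A visual reading of a Q-Q plot can be supplemented with a formal test. The sketch below runs a Shapiro-Wilk test on the residuals of a one-way ANOVA fitted to synthetic, deliberately skewed data. This is an optional extra check and not part of the original analysis; the names `toy`, `toy_fit` and `sw` are hypothetical.

```r
# Synthetic right-skewed data (not the Super Bowl data set)
set.seed(42)
toy <- data.frame(
  likes = c(rexp(15, 1 / 100), rexp(15, 1 / 150), rexp(15, 1 / 120)),
  brand = factor(rep(c("A", "B", "C"), each = 15))
)

# Fit the one-way ANOVA and extract its residuals
toy_fit <- aov(likes ~ brand, data = toy)
toy_res <- residuals(toy_fit)

# Shapiro-Wilk test: H0 is that the residuals come from a normal distribution,
# so a small p-value is evidence against normality
sw <- shapiro.test(toy_res)
print(sw$p.value)
```

The same call could be applied to `residuals(res.aov)` from the model above, assuming that object is available in the session.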
-Non-parametric alternative to the one-way ANOVA test (Kruskal-Wallis rank sum test):
kruskal.test(`Youtube Likes` ~ Brand, data = new_data)
##
## Kruskal-Wallis rank sum test
##
## data: Youtube Likes by Brand
## Kruskal-Wallis chi-squared = 1.3573, df = 3, p-value = 0.7156
-Interpretation: Given p-value (0.7156) > alpha (0.05), there is not sufficient evidence to reject H0. Therefore, we cannot conclude that the four beverage brands differ in the YouTube likes their Super Bowl ads receive. Since we do not reject H0, there is no need to perform a post hoc comparison such as Fisher’s least significant difference test.
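The p-value in the Kruskal-Wallis output can be reproduced from the reported test statistic, since the statistic is compared against a chi-squared distribution with (number of groups - 1) degrees of freedom. The sketch below uses only the numbers printed in the output above; the variable names are illustrative.

```r
# Values copied from the Kruskal-Wallis output
kw_stat <- 1.3573   # Kruskal-Wallis chi-squared
kw_df <- 4 - 1      # four brands, so 3 degrees of freedom

# Upper-tail probability of the chi-squared distribution
p_value <- pchisq(kw_stat, df = kw_df, lower.tail = FALSE)
print(round(p_value, 4))  # agrees with the reported 0.7156

# Decision at the 5% significance level
alpha <- 0.05
reject_h0 <- p_value < alpha
print(reject_h0)  # FALSE: H0 is not rejected
```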
Super Bowl ads for Coca-Cola, Pepsi, Bud Light and Budweiser (the beverage category) show no significant differences in YouTube likes. Since no brand stands out from the others, the analysis suggests that the four companies' current advertising strategies are similarly effective at generating YouTube engagement.
To enhance advertising success, further analysis is advisable to identify which variables significantly influence YouTube likes. This targeted investigation would enable the optimization of advertising strategies, ensuring the most effective techniques are implemented in future campaigns. Additionally, more detailed research would help the beverage companies distinguish themselves from competitors and tailor their approach to better meet market demands and viewer preferences.