This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button, a document is generated that includes both the content and the output of any R code chunks embedded in it. R code is embedded as code chunks, like the ones in the sections below.
In RStudio, press Ctrl+Enter (Windows/Linux) or Cmd+Enter (Mac) to run the current line or selection inside a chunk, and Ctrl+Shift+Enter or Cmd+Shift+Enter to run the whole chunk.
In this section, we install and load the necessary packages.
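The package chunk itself is not shown here. A minimal sketch of this step, assuming the only add-on package the lab needs is dplyr (it supplies the `%>%`, `group_by()` and `summarise()` calls used below); add any other packages your copy of the lab requires:

```r
# Assumption: dplyr is the only add-on package this lab needs.
pkgs <- c("dplyr")

# Install only the packages that are not already available.
missing_pkgs <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(missing_pkgs) > 0) install.packages(missing_pkgs)

# Attach dplyr so %>%, group_by() and summarise() can be called directly.
library(dplyr)
```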
In this section, we import the necessary data for this lab.
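The import chunk is also not shown. One way to do it, assuming superbowl.csv sits in the working directory and contains the columns the code below relies on (new_brand, superbowl, adspend, volume, pos, neg). `load_superbowl()` is a hypothetical helper name, and base `read.csv()` is an assumed choice; it is consistent with the `<chr>` and `<int>` column types in the printouts below.

```r
# Hypothetical helper: read the lab data and check the columns used later on.
load_superbowl <- function(path = "superbowl.csv") {
  df <- read.csv(path, stringsAsFactors = FALSE)
  needed <- c("new_brand", "superbowl", "adspend", "volume", "pos", "neg")
  absent <- setdiff(needed, names(df))
  if (length(absent) > 0) {
    stop("superbowl data is missing column(s): ", paste(absent, collapse = ", "))
  }
  df
}

# Usage (assumes superbowl.csv is in the working directory):
# superbowl <- load_superbowl()
```

Failing early on a missing column gives a clearer error than a later `summarise()` call that cannot find, say, `adspend`.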
You are hired as a Business Analyst to help the manager find out whether spending millions of dollars on Super Bowl ads creates social media buzz.
As a second step, carry out a descriptive analysis of the superbowl.csv data. Each section contains questions that you need to answer.
Task 1, compute the mean, variance, and standard deviation of adspend for each new_brand before the Super Bowl and after the Super Bowl. Hint: you may use dplyr functions group_by() and summarise().
Note that the superbowl column is 0 for the weeks before the Super Bowl and 1 for the weeks of or after the Super Bowl.
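For reference, R's `var()` and `sd()` compute the sample variance and sample standard deviation, dividing by n - 1 rather than n. In the notation below, x_1 to x_n are the adspend values for one brand in one period:

$$
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2, \qquad
s = \sqrt{s^2}
$$

So each ads_sd value is the square root of the matching ads_var value; for Beetle before the Super Bowl, the square root of 2.18 is about 1.48.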
```r
task1_newbrand <- superbowl %>%
  group_by(new_brand, superbowl) %>%
  summarise(
    ads_mean = mean(adspend),
    ads_var  = var(adspend),
    ads_sd   = sd(adspend)
  )
task1_newbrand
```

```
## `summarise()` has grouped output by 'new_brand'. You can override using the
## `.groups` argument.
## # A tibble: 6 × 5
## # Groups:   new_brand [3]
##   new_brand superbowl ads_mean      ads_var   ads_sd
##   <chr>         <int>    <dbl>        <dbl>    <dbl>
## 1 Beetle            0    2.17        2.18      1.48
## 2 Beetle            1  891.    4839676.     2200.
## 3 CR-Z              0    1.25        0.703     0.839
## 4 CR-Z              1    0.709       0.0529    0.230
## 5 Camaro            0   71.1       156.       12.5
## 6 Camaro            1  518.    1335443.     1156.
```
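The `summarise()` message printed above is informational: the result stays grouped by `new_brand`. A small sketch of how `.groups = "drop"` returns an ungrouped tibble and silences the message, run on hypothetical toy data rather than superbowl.csv:

```r
library(dplyr)

# Hypothetical toy data in the same shape as superbowl.csv (values made up).
toy <- data.frame(
  new_brand = rep(c("A", "B"), each = 4),
  superbowl = rep(c(0L, 0L, 1L, 1L), times = 2),
  adspend   = c(1, 3, 10, 14, 2, 4, 6, 10)
)

toy_summary <- toy %>%
  group_by(new_brand, superbowl) %>%
  summarise(
    ads_mean = mean(adspend),
    ads_var  = var(adspend),
    ads_sd   = sd(adspend),
    .groups  = "drop"   # return an ungrouped tibble; no grouping message
  )

toy_summary
```

Dropping the grouping is mainly a convenience: later verbs on the result then operate on all rows instead of silently working per brand.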
Task 2, compute the mean and standard deviation of volume (the total number of mentions from social media), pos (the number of positive mentions) and neg (the number of negative mentions) for each new_brand before the Super Bowl and after the Super Bowl.
```r
task2_newbrand <- superbowl %>%
  group_by(new_brand, superbowl) %>%
  summarise(
    vol_mean = mean(volume),
    vol_sd   = sd(volume),
    pos_mean = mean(pos),
    pos_sd   = sd(pos),
    neg_mean = mean(neg),
    neg_sd   = sd(neg)
  )
task2_newbrand
```

```
## `summarise()` has grouped output by 'new_brand'. You can override using the
## `.groups` argument.
## # A tibble: 6 × 8
## # Groups:   new_brand [3]
##   new_brand superbowl vol_mean vol_sd pos_mean pos_sd neg_mean neg_sd
##   <chr>         <int>    <dbl>  <dbl>    <dbl>  <dbl>    <dbl>  <dbl>
## 1 Beetle            0    3418.   622.     659.   91.8     86.5   17.2
## 2 Beetle            1    4319.  1463.     965.  455.     156.   100.
## 3 CR-Z              0    1172.   266.     268    53.7     60.5   33.8
## 4 CR-Z              1    1486.   712.     339.  136.      49.8   21.6
## 5 Camaro            0   87596.  7817.   21432  2610.    4458.   334.
## 6 Camaro            1   93613. 22222.   23509. 6351.    4679.  1333.
```
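Task 2 repeats the same pair of statistics for three columns. As an alternative sketch (not the form the lab requires), dplyr's `across()` can produce all six summaries in one call. The toy data are hypothetical, and the columns come out named `volume_mean`, `volume_sd`, and so on, rather than `vol_mean`, `vol_sd`:

```r
library(dplyr)

# Hypothetical toy data with the same mention columns as superbowl.csv.
toy <- data.frame(
  new_brand = rep(c("A", "B"), each = 4),
  superbowl = rep(c(0L, 0L, 1L, 1L), times = 2),
  volume    = c(100, 120, 150, 170, 40, 60, 50, 90),
  pos       = c(20, 30, 35, 45, 10, 14, 12, 20),
  neg       = c(5, 7, 6, 10, 2, 4, 3, 5)
)

toy_task2 <- toy %>%
  group_by(new_brand, superbowl) %>%
  summarise(
    # Apply mean() and sd() to each listed column; names follow "{.col}_{.fn}".
    across(c(volume, pos, neg), list(mean = mean, sd = sd)),
    .groups = "drop"
  )

toy_task2
```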
Then, answer the following questions. Please write your answers after "Responses:".
Questions:
What brand has the highest mean ad spend after the Super Bowl? Has the mean ad spending increased after the Super Bowl for that brand? By how much?
What brand has the highest mean volume of mentions after the Super Bowl? What brand has the lowest mean volume of positive mentions after the Super Bowl?
Responses:
1. The brand with the highest mean ad spend after the Super Bowl is Beetle, with a mean ad spend of about 891. Before the Super Bowl, its mean ad spend was 2.17, so it increased by roughly 889, which is about 410 times its pre-Super Bowl level.
2. The brand with the highest mean volume of mentions after the Super Bowl is Camaro, with an average of 93,613 mentions. The brand with the lowest mean volume of positive mentions after the Super Bowl is CR-Z, with an average of 339 positive mentions.
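The ad-spend comparison in the first response is simple arithmetic on the `ads_mean` column printed above. A base-R sketch that repeats it for all three brands, using the printed (rounded) means as inputs, so the results are approximate:

```r
# Mean adspend per brand, copied from the rounded task1_newbrand printout.
ads <- data.frame(
  new_brand = c("Beetle", "CR-Z", "Camaro"),
  before    = c(2.17, 1.25, 71.1),
  after     = c(891, 0.709, 518)
)

ads$change <- ads$after - ads$before   # absolute change in mean ad spend
ads$ratio  <- ads$after / ads$before   # after-to-before multiple

ads
```

On these figures, Beetle's ratio is about 410.6 and CR-Z is the only brand whose mean ad spend fell (ratio below 1).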
Task 1, compute the correlation between adspend and volume in the superbowl dataset, before the Super Bowl and after the Super Bowl. Similarly, compute the correlation between adspend and pos, and between adspend and neg in the superbowl dataset, before the Super Bowl and after the Super Bowl. Hint: you may use dplyr functions group_by() and summarise().
```r
s2_task1_newbrand <- superbowl %>%
  group_by(superbowl) %>%
  summarise(
    cor_ads_vol = cor(adspend, volume),
    cor_ads_pos = cor(adspend, pos),
    cor_ads_neg = cor(adspend, neg)
  )
s2_task1_newbrand
```

```
## # A tibble: 2 × 4
##   superbowl cor_ads_vol cor_ads_pos cor_ads_neg
##       <int>       <dbl>       <dbl>       <dbl>
## 1         0      0.975       0.975       0.979
## 2         1      0.0513      0.0536      0.0495
```
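These correlations pool all three brands, so they can reflect differences in scale between brands as much as any within-brand link between spending and mentions. A hypothetical toy example in base R shows the effect: within each brand the correlation is zero, yet the pooled correlation is close to 1 because one brand is larger on both variables.

```r
# Hypothetical toy data: within each brand, adspend and volume are uncorrelated,
# but brand B sits far higher than brand A on both variables.
toy <- data.frame(
  brand   = rep(c("A", "B"), each = 4),
  adspend = c(1, 2, 3, 4, 101, 102, 103, 104),
  volume  = c(12, 10, 13, 11, 512, 510, 513, 511)
)

# Correlation computed separately inside each brand.
within_cor <- sapply(split(toy, toy$brand), function(d) cor(d$adspend, d$volume))

# Correlation computed on all rows at once, i.e. pooled across brands.
pooled_cor <- cor(toy$adspend, toy$volume)

within_cor   # both approximately 0
pooled_cor   # approximately 0.9997
```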
Task 2, for each new_brand, compute the correlation between adspend and volume, between adspend and pos, and between adspend and neg, before the Super Bowl and after the Super Bowl.
```r
# Group by brand AND period so that each brand gets its own correlations
s2_task2_newbrand <- superbowl %>%
  group_by(new_brand, superbowl) %>%
  summarise(
    cor_ads_vol = cor(adspend, volume),
    cor_ads_pos = cor(adspend, pos),
    cor_ads_neg = cor(adspend, neg),
    .groups = "drop"
  )
s2_task2_newbrand
```
```r
s2_task2_newbrand_insights <- superbowl %>%
  group_by(new_brand, superbowl) %>%
  summarise(
    mean_pos    = mean(pos),
    mean_neg    = mean(neg),
    med_pos     = median(pos),
    med_neg     = median(neg),
    var_pos     = var(pos),
    var_neg     = var(neg),
    sd_pos      = sd(pos),
    sd_neg      = sd(neg),
    cor_ads_pos = cor(adspend, pos),
    cor_ads_neg = cor(adspend, neg),
    cov_ads_pos = cov(adspend, pos),
    cov_ads_neg = cov(adspend, neg)
  )
s2_task2_newbrand_insights
```

```
## `summarise()` has grouped output by 'new_brand'. You can override using the
## `.groups` argument.
## # A tibble: 6 × 14
## # Groups:   new_brand [3]
##   new_brand superbowl mean_pos mean_neg med_pos med_neg   var_pos var_neg sd_pos
##   <chr>         <int>    <dbl>    <dbl>   <dbl>   <dbl>     <dbl>   <dbl>  <dbl>
## 1 Beetle            0     659.     86.5    676.    92.5     8420.  2.96e2   91.8
## 2 Beetle            1     965.    156.     881    145     206928.  1.01e4  455.
## 3 CR-Z              0     268      60.5    246     48       2888   1.14e3   53.7
## 4 CR-Z              1     339.     49.8    298     50      18631.  4.67e2  136.
## 5 Camaro            0   21432   4458.   21598   4523    6810120   1.11e5 2610.
## 6 Camaro            1   23509.   4679.   26190   5010   40341243.  1.78e6 6351.
## # ℹ 5 more variables: sd_neg <dbl>, cor_ads_pos <dbl>, cor_ads_neg <dbl>,
## #   cov_ads_pos <dbl>, cov_ads_neg <dbl>
```
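The printout above hides five columns, including the per-brand `cor_ads_pos` and `cor_ads_neg` that the second question asks about. Printing a tibble with `width = Inf` shows every column. The sketch below uses a hypothetical wide tibble so that it runs on its own; with the lab data, the equivalent call would be `print(s2_task2_newbrand_insights, width = Inf)`.

```r
library(tibble)

# Hypothetical one-row tibble with 14 long column names, too wide for a typical console.
wide <- as_tibble(as.data.frame(
  matrix(1:14, nrow = 1, dimnames = list(NULL, sprintf("long_column_name_%02d", 1:14)))
))

print(wide)               # at a typical console width, extra columns go to the footer
print(wide, width = Inf)  # all 14 columns are printed
```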
Then, answer the following questions. Please write your answers after "Responses:".
Questions:
What is the correlation between adspend and pos before AND after the Super Bowl? Does that imply a positive, negative or no relationship? Is the relationship strong? What about the relationship between adspend and neg before AND after the Super Bowl?
Which brand’s ad spending did actually create a social media buzz (either positive or negative mentions)? Use both univariate and bivariate statistics for each brand to support your answer.
Responses:
1. The correlation between adspend and positive mentions (pos) before the Super Bowl is 0.975, indicating a strong positive relationship: higher ad spending was strongly associated with more positive mentions. After the Super Bowl, the correlation drops sharply to 0.0536, a very weak positive relationship, suggesting that ad spending in that period did little to explain positive mentions. Similarly, the correlation between adspend and negative mentions (neg) is 0.979 before the Super Bowl, also a strong positive relationship, but it falls to 0.0495 afterward, a very weak positive relationship. Because these correlations pool all three brands, the pre-Super Bowl values partly reflect differences in scale between brands (Camaro had far higher mean ad spend and far more mentions than Beetle or CR-Z) rather than a within-brand effect.
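2. In absolute terms, Camaro's ad spending created the largest social media buzz after the Super Bowl. Univariate statistics show that it had the highest mean volume of mentions (93,613, up from 87,596), while its mean positive mentions rose from 21,432 to 23,509 (about 10%) and its mean negative mentions from 4,458 to 4,679 (about 5%). In relative terms, however, Beetle's mentions grew faster (volume up about 26%, positive mentions about 46%, negative mentions about 80%) as its mean ad spend jumped from 2.17 to about 891. On the bivariate side, the pooled correlations between adspend and mentions are strong before the Super Bowl but close to zero after it, so on their own they do not show that the Super Bowl ad spending drove the extra mentions; a brand-level conclusion needs the per-brand correlations (cor_ads_pos and cor_ads_neg in s2_task2_newbrand_insights).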
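The second response compares mentions before and after the Super Bowl. A base-R sketch of the percentage changes, using the rounded brand means from the task2_newbrand printout as inputs (so the results are approximate):

```r
# Mean mentions per brand, copied from the rounded task2_newbrand printout.
mentions <- data.frame(
  new_brand  = c("Beetle", "CR-Z", "Camaro"),
  vol_before = c(3418, 1172, 87596), vol_after = c(4319, 1486, 93613),
  pos_before = c(659, 268, 21432),   pos_after = c(965, 339, 23509),
  neg_before = c(86.5, 60.5, 4458),  neg_after = c(156, 49.8, 4679)
)

# Percentage change from before to after the Super Bowl.
pct_change <- function(before, after) 100 * (after - before) / before

mentions$vol_pct <- pct_change(mentions$vol_before, mentions$vol_after)
mentions$pos_pct <- pct_change(mentions$pos_before, mentions$pos_after)
mentions$neg_pct <- pct_change(mentions$neg_before, mentions$neg_after)

mentions[, c("new_brand", "vol_pct", "pos_pct", "neg_pct")]
```

On these figures, Camaro leads in absolute mentions, Beetle shows the largest relative increases, and CR-Z's negative mentions actually fell.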