R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown, see http://rmarkdown.rstudio.com.

When you click the Knit button, a document is generated that includes both the written content and the output of any R code chunks embedded in it.

In RStudio, press Ctrl+Enter (Windows) or Cmd+Enter (Mac) to run the current line or selection, and Ctrl+Shift+Enter (Windows) or Cmd+Shift+Enter (Mac) to run the whole current chunk.

Load Packages

In this section, we install and load the necessary packages.
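The package-loading chunk itself is not shown here. Below is a minimal sketch. It assumes dplyr is the only add-on package the lab needs, since dplyr provides `%>%`, `group_by()`, and `summarise()`, which the tasks below use. The CRAN mirror URL is an assumption; any mirror works.

```r
# Install dplyr only if it is not already available, then attach it.
# dplyr supplies %>%, group_by(), and summarise() used in the tasks below.
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr", repos = "https://cloud.r-project.org")
}
library(dplyr)
```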

Import Data

In this section, we import the necessary data for this lab.
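The import chunk is likewise not shown. The sketch below makes three assumptions:

- The file is named superbowl.csv and sits in the working directory.
- It is a comma-separated file with a header row.
- Reading it with base R's `read.csv()` is acceptable.

`load_superbowl()` and `required_cols` are hypothetical names. The column list is taken from the variables used in the tasks below.

```r
# Columns the tasks below rely on.
required_cols <- c("new_brand", "superbowl", "adspend", "volume", "pos", "neg")

# Hypothetical helper: read the caselet data and stop early if an expected
# column is absent. Assumes a comma-separated file with a header row.
load_superbowl <- function(path = "superbowl.csv") {
  df <- read.csv(path, stringsAsFactors = FALSE)
  absent <- setdiff(required_cols, names(df))
  if (length(absent) > 0) {
    stop("missing columns: ", paste(absent, collapse = ", "))
  }
  df
}

# Only attempt the read when the file is actually present.
if (file.exists("superbowl.csv")) {
  superbowl <- load_superbowl("superbowl.csv")
}
```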

Homework Assignment 2: Super Bowl Caselet (Part 2)

You are hired as a Business Analyst to help the manager find out whether spending millions of dollars on Super Bowl ads creates social media buzz.

As a second step, carry out a descriptive analysis of the superbowl.csv data. Each section includes questions that you need to answer.

Section 1: Univariate Statistics

Task 1: Compute the mean, variance, and standard deviation of adspend for each new_brand before and after the Super Bowl. Hint: you may use the dplyr functions group_by() and summarise().

Note that the superbowl column is 0 for the weeks before the Super Bowl and 1 for the week of the Super Bowl and the weeks after it.

# Group by brand and period (superbowl: 0 = before, 1 = week of or after), then summarise adspend
task1_newbrand <- superbowl %>%
  group_by(new_brand, superbowl) %>%
  summarise(
    ads_mean = mean(adspend),
    ads_var = var(adspend),
    ads_sd = sd(adspend)
  )
## `summarise()` has grouped output by 'new_brand'. You can override using the
## `.groups` argument.
task1_newbrand
## # A tibble: 6 × 5
## # Groups:   new_brand [3]
##   new_brand superbowl ads_mean      ads_var   ads_sd
##   <chr>         <int>    <dbl>        <dbl>    <dbl>
## 1 Beetle            0    2.17        2.18      1.48 
## 2 Beetle            1  891.    4839676.     2200.   
## 3 CR-Z              0    1.25        0.703     0.839
## 4 CR-Z              1    0.709       0.0529    0.230
## 5 Camaro            0   71.1       156.       12.5  
## 6 Camaro            1  518.    1335443.     1156.

Task 2: Compute the mean and standard deviation of volume (the total number of social media mentions), pos (the number of positive mentions), and neg (the number of negative mentions) for each new_brand before and after the Super Bowl.

# Mean and standard deviation of total, positive, and negative mentions by brand and period
task2_newbrand <- superbowl %>% 
  group_by(new_brand, superbowl) %>%
  summarise(
    vol_mean = mean(volume),
    vol_sd = sd(volume),
    pos_mean = mean(pos),
    pos_sd = sd(pos),
    neg_mean = mean(neg),
    neg_sd = sd(neg)
  )
## `summarise()` has grouped output by 'new_brand'. You can override using the
## `.groups` argument.
task2_newbrand
## # A tibble: 6 × 8
## # Groups:   new_brand [3]
##   new_brand superbowl vol_mean vol_sd pos_mean pos_sd neg_mean neg_sd
##   <chr>         <int>    <dbl>  <dbl>    <dbl>  <dbl>    <dbl>  <dbl>
## 1 Beetle            0    3418.   622.     659.   91.8     86.5   17.2
## 2 Beetle            1    4319.  1463.     965.  455.     156.   100. 
## 3 CR-Z              0    1172.   266.     268    53.7     60.5   33.8
## 4 CR-Z              1    1486.   712.     339.  136.      49.8   21.6
## 5 Camaro            0   87596.  7817.   21432  2610.    4458.   334. 
## 6 Camaro            1   93613. 22222.   23509. 6351.    4679.  1333.
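The brands differ in size by more than an order of magnitude, so expressing the before/after change in percent can make them easier to compare. Here is a minimal base R sketch. It uses the rounded means printed above, so the results are approximate. `means` is a hypothetical name.

```r
# Group means copied from the printed Task 2 table (rounded by tibble printing).
means <- data.frame(
  brand  = rep(c("Beetle", "CR-Z", "Camaro"), each = 3),
  metric = rep(c("volume", "pos", "neg"), times = 3),
  before = c(3418, 659, 86.5, 1172, 268, 60.5, 87596, 21432, 4458),
  after  = c(4319, 965, 156,  1486, 339, 49.8, 93613, 23509, 4679)
)

# Percent change from the pre- to the post-Super Bowl mean, to one decimal.
means$pct_change <- round(100 * (means$after / means$before - 1), 1)
means
```

On these rounded figures, Beetle's positive and negative mentions rise by roughly 46% and 80%. CR-Z's negative mentions fall by about 18%. Camaro's three changes are all under 10%.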

Then, answer the following questions. Please write your answers under Responses.

Questions:

  1. What brand has the highest mean ad spend after the Super Bowl? Has the mean ad spending increased after the Super Bowl for that brand? By how much?

  2. What brand has the highest mean volume of mentions after the Super Bowl? What brand has the lowest mean volume of positive mentions after the Super Bowl?

Responses:

1. The brand with the highest mean ad spend after the Super Bowl is Beetle, at about 891. Before the Super Bowl, its mean ad spend was about 2.17. It therefore increased by roughly 889, to about 410 times its pre-Super Bowl level.

2. The brand with the highest mean volume of mentions after the Super Bowl is Camaro, with an average of about 93,613 mentions. The brand with the lowest mean volume of positive mentions after the Super Bowl is CR-Z, with an average of about 339 positive mentions.
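The arithmetic behind the first response can be checked with a short base R sketch. It again uses the rounded mean ad spend values printed in the Task 1 table. `spend` is a hypothetical name.

```r
# Mean adspend per brand, copied from the printed Task 1 table (values are rounded).
spend <- data.frame(
  brand  = c("Beetle", "CR-Z", "Camaro"),
  before = c(2.17, 1.25, 71.1),
  after  = c(891, 0.709, 518)
)
spend$abs_change <- spend$after - spend$before  # difference in means
spend$ratio      <- spend$after / spend$before  # how many times larger
spend
```

On these figures, Beetle's mean rises by about 889, to roughly 410 times its earlier level. Camaro's mean rises by about 447, to about 7.3 times its earlier level. CR-Z's mean falls slightly.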

Section 2: Bivariate Statistics

Task 1: Compute the correlation between adspend and volume in the superbowl dataset (pooling all brands) before and after the Super Bowl. Similarly, compute the correlation between adspend and pos, and between adspend and neg, before and after the Super Bowl. Hint: you may use the dplyr functions group_by() and summarise().

# Pool all brands and correlate adspend with each mention count, separately for each period
s2_task1_newbrand <- superbowl %>%
  group_by(superbowl) %>%
  summarise(
    cor_ads_vol = cor(adspend, volume),
    cor_ads_pos = cor(adspend, pos),
    cor_ads_neg = cor(adspend, neg)
  )
s2_task1_newbrand
## # A tibble: 2 × 4
##   superbowl cor_ads_vol cor_ads_pos cor_ads_neg
##       <int>       <dbl>       <dbl>       <dbl>
## 1         0      0.975       0.975       0.979 
## 2         1      0.0513      0.0536      0.0495
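The code above groups only by superbowl, so each correlation pools all three brands. A pooled correlation can be dominated by differences between brands rather than by variation within a brand. The sketch below illustrates this with made-up numbers for two hypothetical brands; these are not data from superbowl.csv.

```r
# Made-up numbers (NOT from superbowl.csv): two hypothetical brands whose
# within-brand relationship is negative, but one brand is much larger on both axes.
toy <- data.frame(
  brand   = rep(c("small", "large"), each = 3),
  adspend = c(1, 2, 3, 101, 102, 103),
  volume  = c(30, 20, 10, 1030, 1020, 1010)
)

# Pooled correlation across both brands at once (like Task 1).
pooled <- cor(toy$adspend, toy$volume)

# Correlation computed separately within each brand (like Task 2).
within <- sapply(split(toy, toy$brand), function(d) cor(d$adspend, d$volume))

pooled  # close to +1
within  # about -1 for each brand
```

The per-brand grouping in Task 2 is what separates these two effects.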

Task 2: For each new_brand, compute the correlation between adspend and volume, between adspend and pos, and between adspend and neg, before and after the Super Bowl.

# Group by brand as well as period so that each brand gets its own correlations
s2_task2_newbrand <- superbowl %>%
  group_by(new_brand, superbowl) %>%
  summarise(
    cor_ads_vol = cor(adspend, volume),
    cor_ads_pos = cor(adspend, pos),
    cor_ads_neg = cor(adspend, neg),
    .groups = "drop"
  )
s2_task2_newbrand
# Per-brand, per-period summary of positive/negative mentions and their association with adspend
s2_task2_newbrand_insights <- superbowl %>%
  group_by(new_brand, superbowl) %>%
  summarise(
    mean_pos = mean(pos),
    mean_neg = mean(neg),
    med_pos = median(pos),
    med_neg = median(neg),
    var_pos = var(pos),
    var_neg = var(neg),
    sd_pos = sd(pos),
    sd_neg = sd(neg),
    cor_ads_pos = cor(adspend, pos),
    cor_ads_neg = cor(adspend, neg),
    cov_ads_pos = cov(adspend, pos),
    cov_ads_neg = cov(adspend, neg)
  )
## `summarise()` has grouped output by 'new_brand'. You can override using the
## `.groups` argument.
s2_task2_newbrand_insights
## # A tibble: 6 × 14
## # Groups:   new_brand [3]
##   new_brand superbowl mean_pos mean_neg med_pos med_neg   var_pos var_neg sd_pos
##   <chr>         <int>    <dbl>    <dbl>   <dbl>   <dbl>     <dbl>   <dbl>  <dbl>
## 1 Beetle            0     659.     86.5    676.    92.5     8420.  2.96e2   91.8
## 2 Beetle            1     965.    156.     881    145     206928.  1.01e4  455. 
## 3 CR-Z              0     268      60.5    246     48       2888   1.14e3   53.7
## 4 CR-Z              1     339.     49.8    298     50      18631.  4.67e2  136. 
## 5 Camaro            0   21432    4458.   21598   4523    6810120   1.11e5 2610. 
## 6 Camaro            1   23509.   4679.   26190   5010   40341243.  1.78e6 6351. 
## # ℹ 5 more variables: sd_neg <dbl>, cor_ads_pos <dbl>, cor_ads_neg <dbl>,
## #   cov_ads_pos <dbl>, cov_ads_neg <dbl>
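The printed tibble hides the cor_ads_* and cov_ads_* columns behind the "5 more variables" footer, yet these are the per-brand bivariate statistics that Question 2 asks about. The sketch below defines a hypothetical helper, `show_assoc_cols()`. It keeps only the grouping keys and those columns, then prints every column using the tibble `print(width = Inf)` option.

```r
library(dplyr)

# Hypothetical helper: keep the grouping keys plus every correlation and
# covariance column, then print all columns instead of a truncated view.
show_assoc_cols <- function(tbl) {
  out <- tbl %>%
    ungroup() %>%
    as_tibble() %>%
    select(new_brand, superbowl, starts_with("cor_"), starts_with("cov_"))
  print(out, width = Inf)
  invisible(out)
}

# Usage, once s2_task2_newbrand_insights exists:
# show_assoc_cols(s2_task2_newbrand_insights)
```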

Then, answer the following questions. Please write your answers under Responses.

Questions:

  1. What is the correlation between adspend and pos before AND after the Super Bowl? Does that imply a positive, negative or no relationship? Is the relationship strong? What about the relationship between adspend and neg before AND after the Super Bowl?

  2. Which brand’s ad spending did actually create a social media buzz (either positive or negative mentions)? Use both univariate and bivariate statistics for each brand to support your answer.

Responses:

1. Before the Super Bowl, the correlation between adspend and positive mentions (pos) is 0.975. This is a strong positive relationship: observations with higher ad spending also had more positive mentions. After the Super Bowl, the correlation drops sharply to 0.0536, a very weak positive relationship, which suggests that post-event ad spending was not strongly associated with positive mentions. Negative mentions (neg) follow the same pattern: the correlation is 0.979 before the Super Bowl, also a strong positive relationship, but falls to 0.0495 afterward, a very weak positive relationship. Because these correlations pool all three brands, the strong pre-event values are likely driven largely by Camaro having both much higher ad spend and far more mentions than Beetle and CR-Z, rather than by week-to-week variation within a single brand.

2. Camaro’s ad spending created the strongest social media buzz after the Super Bowl. The univariate statistics show that it had the highest mean volume of mentions (93,613). Both its positive mentions (from 21,432 to 23,509, about +10%) and its negative mentions (from 4,458 to 4,679, about +5%) increased. The pooled bivariate results show strong positive correlations between ad spend and mentions before the event, a pattern to which Camaro contributes most, since it has by far the highest ad spend and mention volume. Taken together, these results suggest that Camaro’s ad campaign generated substantial engagement and continued audience discussion after the Super Bowl.