R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button, a document is generated that includes both the written content and the output of any R code chunks embedded in the document.

To run code inside a chunk, press Ctrl+Enter on a PC or Cmd+Enter on a Mac.

Load Packages

In this section, we install and load the necessary packages.
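
The package-loading chunk is not shown in this rendering. A minimal sketch follows, assuming the lab needs only dplyr and ggplot2. That list is inferred from the functions used in the tasks (`group_by()`, `summarise()`, `mutate()`, `filter()`, `ggplot()`), so the original lab may load others.

```r
# Assumed package list, inferred from the functions used later in this lab
pkgs <- c("dplyr", "ggplot2")

# Install only the packages that are not already available
missing_pkgs <- pkgs[!pkgs %in% rownames(installed.packages())]
if (length(missing_pkgs) > 0) install.packages(missing_pkgs)

# Attach the packages so their functions (and the %>% pipe) can be used
library(dplyr)
library(ggplot2)
```

Checking for missing packages first avoids reinstalling them every time the document is knit.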

Import Data

In this section, we import the necessary data for this lab.
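
The import chunk is also not shown here. As a rough sketch, assuming the data ship as a CSV file, the import might look like the code below. The path `path/to/superbowl.csv` is a hypothetical placeholder, not the lab's actual file name; the object name `superbowl` matches the one used in the tasks.

```r
# Hypothetical placeholder path: replace with the actual location of the lab data
data_path <- "path/to/superbowl.csv"

# Read the file only if it exists, so this sketch runs without error
if (file.exists(data_path)) {
  superbowl <- read.csv(data_path, stringsAsFactors = FALSE)
  # Inspect column names and types before starting the tasks
  str(superbowl)
}
```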

Task 1.1: Compute the mean, variance, and standard deviation of adspend for each new_brand before and after the Super Bowl, using the relevant R functions. Hint: you may use the dplyr functions group_by() and summarise().

superbowl %>% group_by(new_brand,superbowl) %>% summarise(mean=mean(adspend), sd=sd(adspend), var=var(adspend))

### mean, variance, and standard deviation of adspend for each new_brand before Super Bowl and after Super Bowl
superbowl %>% 
  ## group by new_brand and superbowl
  group_by(new_brand, superbowl) %>% 
  ## summarize by groups
  summarise(
    # calculate means
    mean_adspend = mean(adspend), 
    # calculate variances
    var_adspend = var(adspend), 
    # calculate standard deviations
    sd_adspend = sd(adspend)
    )
## `summarise()` has grouped output by 'new_brand'. You can override using the
## `.groups` argument.
## # A tibble: 6 × 5
## # Groups:   new_brand [3]
##   new_brand superbowl mean_adspend  var_adspend sd_adspend
##   <chr>         <int>        <dbl>        <dbl>      <dbl>
## 1 Beetle            0        2.17        2.18        1.48 
## 2 Beetle            1      891.    4839676.       2200.   
## 3 CR-Z              0        1.25        0.703       0.839
## 4 CR-Z              1        0.709       0.0529      0.230
## 5 Camaro            0       71.1       156.         12.5  
## 6 Camaro            1      518.    1335443.       1156.
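
The message in the output above says the result is still grouped by new_brand. A minimal sketch of the `.groups` argument it mentions is shown below on made-up toy data (not the lab data), assuming an ungrouped result is what you want.

```r
library(dplyr)

# Made-up toy data, used only to illustrate the mechanics
toy <- data.frame(
  new_brand = c("A", "A", "A", "A", "B", "B", "B", "B"),
  superbowl = c(0, 0, 1, 1, 0, 0, 1, 1),
  adspend   = c(1, 3, 10, 20, 2, 4, 6, 10)
)

toy_summary <- toy %>%
  group_by(new_brand, superbowl) %>%
  summarise(
    mean_adspend = mean(adspend),
    var_adspend  = var(adspend),
    sd_adspend   = sd(adspend),
    .groups = "drop"  # return an ungrouped tibble and suppress the message
  )

toy_summary
```

The summary statistics are the same with or without `.groups = "drop"`. Only the grouping attached to the result changes, which matters if further dplyr verbs are applied to it.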

Task 1.2: Create a new column in the superbowl dataset called pos_prop that holds the proportion of positive mentions for each new_brand and each week (each row). Similarly, create a new column called neg_prop that holds the proportion of negative mentions for each new_brand and each week.

Hint: pos_prop is just the ratio of pos to volume, meaning pos_prop = pos/volume. Similarly, neg_prop = neg/volume.
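
To make the arithmetic in the hint concrete, here is a small sketch on made-up numbers (not the lab data). The column names `pos`, `neg`, and `volume` follow the hint; the values are purely illustrative.

```r
library(dplyr)

# Made-up toy rows: total mentions plus positive and negative counts
toy <- data.frame(
  volume = c(100, 200),
  pos    = c(20, 50),
  neg    = c(5, 10)
)

# Row-wise ratios: each proportion is computed within its own row
toy <- toy %>%
  mutate(pos_prop = pos / volume, neg_prop = neg / volume)

toy
```

In the first row, 20 positive mentions out of 100 give pos_prop = 0.2. Both proportions should fall between 0 and 1 whenever pos and neg are counted within volume.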

superbowl <- superbowl %>% mutate(pos_prop = pos/volume, neg_prop = neg/volume)

### create new columns pos_prop and neg_prop, note that to add the columns to the dataset we have to reassign the columns to the dataset
superbowl <- superbowl %>% 
  ## mutate new columns pos_prop and neg_prop
  mutate(pos_prop = pos/volume, neg_prop = neg/volume)

Task 1.3: Compute the mean and standard deviation of volume, pos_prop and neg_prop for each new_brand before the Super Bowl and after the Super Bowl.

superbowl %>% group_by(new_brand,superbowl) %>% summarise(mean_volume=mean(volume), sd_volume=sd(volume), mean_pos=mean(pos_prop), sd_pos=sd(pos_prop), mean_neg=mean(neg_prop), sd_neg=sd(neg_prop))

### mean and standard deviation of volume, pos_prop and neg_prop for each new_brand before Super Bowl and after Super Bowl
superbowl %>%  
  ## group by new_brand and superbowl
  group_by(new_brand, superbowl) %>% 
  ## summarize by groups
  summarise(
    # calculate means and standard deviation for volume 
    mean_volume = mean(volume), sd_volume = sd(volume),
    # calculate means and standard deviation for pos_prop
    mean_pos = mean(pos_prop), sd_pos = sd(pos_prop),
    # calculate means and standard deviation for neg_prop
    mean_neg = mean(neg_prop), sd_neg = sd(neg_prop)
    ) 
## `summarise()` has grouped output by 'new_brand'. You can override using the
## `.groups` argument.
## # A tibble: 6 × 8
## # Groups:   new_brand [3]
##   new_brand superbowl mean_volume sd_volume mean_pos  sd_pos mean_neg   sd_neg
##   <chr>         <int>       <dbl>     <dbl>    <dbl>   <dbl>    <dbl>    <dbl>
## 1 Beetle            0       3418.      622.    0.194 0.0116    0.0254 0.00338 
## 2 Beetle            1       4319.     1463.    0.216 0.0337    0.0339 0.00900 
## 3 CR-Z              0       1172.      266.    0.230 0.0133    0.0497 0.0167  
## 4 CR-Z              1       1486.      712.    0.236 0.0697    0.0339 0.0139  
## 5 Camaro            0      87596.     7817.    0.244 0.00915   0.0510 0.000940
## 6 Camaro            1      93613.    22222.    0.247 0.0222    0.0487 0.00638
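
### boxplot of weekly volume for each new_brand
ggplot(superbowl, aes(x = new_brand, y = volume)) +
  geom_boxplot()

### bar chart of volume by new_brand; with stat = "identity" the weekly values are stacked, so each bar shows the total volume for that brand
ggplot(superbowl, aes(x = new_brand, y = volume)) +
  geom_bar(stat = "identity", fill = "blue", color = "blue")

### keep only the Camaro rows
Camaro <- superbowl %>% filter(new_brand == "Camaro")

### scatterplot of pos_prop against neg_prop for Camaro
ggplot(Camaro, mapping = aes(x = pos_prop, y = neg_prop)) +
  geom_point()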