This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown, see http://rmarkdown.rstudio.com.
When you click the Knit button, a document is generated that includes both the content and the output of any embedded R code chunks. You can embed R code chunks like the ones below.
In RStudio, press Ctrl+Enter (Windows) or Cmd+Enter (Mac) to run the current line or selection in a code chunk, and Ctrl+Shift+Enter (Cmd+Shift+Enter on Mac) to run the whole chunk.
In this section, we install and load the necessary packages.
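A minimal sketch of this step, assuming the section needs only dplyr (for `%>%`, `group_by()`, `summarise()`, `mutate()`, and `filter()`) and ggplot2 (for the plots). The lab's actual package list may be longer.

```r
# Assumed package list: dplyr and ggplot2 cover the functions used in this section.
# Install once if needed (uncomment the next line):
# install.packages(c("dplyr", "ggplot2"))

# Load the packages for the current session
library(dplyr)
library(ggplot2)
```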
In this section, we import the necessary data for this lab.
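The tasks below assume a data frame named `superbowl` with the columns `new_brand`, `superbowl`, `adspend`, `volume`, `pos`, and `neg`. The sketch below shows a placeholder import line (the file name `superbowl.csv` is hypothetical; the lab's actual file and format are not given here) and a hypothetical helper, `check_superbowl()`, that stops early if any of those columns is missing.

```r
# Placeholder import: "superbowl.csv" is a hypothetical file name, not the lab's actual file.
# superbowl <- read.csv("superbowl.csv", stringsAsFactors = FALSE)

# Columns that Tasks 1.1-1.3 and the plots rely on
required_cols <- c("new_brand", "superbowl", "adspend", "volume", "pos", "neg")

# Hypothetical helper: stop with a clear message if any required column is missing
check_superbowl <- function(df) {
  missing_cols <- setdiff(required_cols, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

# Usage right after import:
# check_superbowl(superbowl)
```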
Task 1.1: Compute the mean, variance, and standard deviation of adspend for each new_brand before and after the Super Bowl, using the relevant R functions. Hint: you may use the dplyr functions group_by() and summarise().
superbowl %>% group_by(new_brand,superbowl) %>% summarise(mean=mean(adspend), sd=sd(adspend), var=var(adspend))
### mean, variance, and standard deviation of adspend for each new_brand before Super Bowl and after Super Bowl
superbowl %>%
  ## group by new_brand and superbowl
  group_by(new_brand, superbowl) %>%
  ## summarize by groups
  summarise(
    # calculate means
    mean_adspend = mean(adspend),
    # calculate variances
    var_adspend = var(adspend),
    # calculate standard deviations
    sd_adspend = sd(adspend)
  )
## `summarise()` has grouped output by 'new_brand'. You can override using the
## `.groups` argument.
## # A tibble: 6 × 5
## # Groups:   new_brand [3]
##   new_brand superbowl mean_adspend  var_adspend sd_adspend
##   <chr>         <int>        <dbl>        <dbl>      <dbl>
## 1 Beetle            0        2.17        2.18        1.48
## 2 Beetle            1      891.    4839676.       2200.
## 3 CR-Z              0        1.25        0.703       0.839
## 4 CR-Z              1        0.709       0.0529      0.230
## 5 Camaro            0       71.1       156.         12.5
## 6 Camaro            1      518.    1335443.       1156.
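The message above appears because `summarise()` drops only the last grouping variable (`superbowl`) and leaves the result grouped by `new_brand`. Setting the `.groups` argument explicitly (available in dplyr 1.0.0 and later) silences the message and makes the grouping of the result deliberate. A sketch on a tiny made-up data frame, not the lab's data:

```r
library(dplyr)

# Made-up toy data for illustration only; the lab uses the imported superbowl data
toy <- tibble(
  new_brand = c("A", "A", "A", "A", "B", "B", "B", "B"),
  superbowl = c(0L, 0L, 1L, 1L, 0L, 0L, 1L, 1L),
  adspend   = c(1, 3, 10, 30, 2, 4, 5, 7)
)

# Same summary as Task 1.1, but return an ungrouped tibble
adspend_summary <- toy %>%
  group_by(new_brand, superbowl) %>%
  summarise(
    mean_adspend = mean(adspend),
    var_adspend  = var(adspend),
    sd_adspend   = sd(adspend),
    .groups = "drop"  # drop all grouping, so no message is printed
  )

adspend_summary
```

Note that `sd_adspend` is simply the square root of `var_adspend`, so the two columns carry the same information on different scales.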
Task 1.2: Create a new column in the superbowl dataset called pos_prop that holds the proportion of positive mentions for each new_brand and week (i.e., each row). Similarly, create a new column called neg_prop that holds the proportion of negative mentions for each new_brand and week.
Hint: pos_prop is just the ratio of pos to volume, meaning pos_prop = pos/volume. Similarly, neg_prop = neg/volume.
superbowl <- superbowl %>% mutate(pos_prop = pos/volume, neg_prop = neg/volume)
### create new columns pos_prop and neg_prop; to keep the new columns, the result must be assigned back to superbowl
superbowl <- superbowl %>%
  ## add the new columns pos_prop and neg_prop with mutate()
  mutate(pos_prop = pos/volume, neg_prop = neg/volume)
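If a week had `volume` equal to 0, `pos / volume` would evaluate to `NaN` (0/0) rather than a proportion. Whether the lab data contain such rows is not stated; if they might, one option is to return `NA` explicitly with dplyr's `if_else()`. A sketch on made-up numbers:

```r
library(dplyr)

# Made-up weekly counts; the second row has zero mentions
toy <- tibble(
  volume = c(200, 0, 50),
  pos    = c(50, 0, 10),
  neg    = c(10, 0, 5)
)

toy <- toy %>%
  mutate(
    # proportion of positive mentions; NA when there are no mentions at all
    pos_prop = if_else(volume > 0, pos / volume, NA_real_),
    # proportion of negative mentions; NA when there are no mentions at all
    neg_prop = if_else(volume > 0, neg / volume, NA_real_)
  )

toy
```

With `NA` values present, later summaries such as `mean(pos_prop)` would need `na.rm = TRUE`.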
Task 1.3: Compute the mean and standard deviation of volume, pos_prop and neg_prop for each new_brand before the Super Bowl and after the Super Bowl.
superbowl %>% group_by(new_brand,superbowl) %>% summarise(mean_volume=mean(volume), sd_volume=sd(volume), mean_pos=mean(pos_prop), sd_pos=sd(pos_prop), mean_neg=mean(neg_prop), sd_neg=sd(neg_prop))
### mean and standard deviation of volume, pos_prop and neg_prop for each new_brand before Super Bowl and after Super Bowl
superbowl %>%
  ## group by new_brand and superbowl
  group_by(new_brand, superbowl) %>%
  ## summarize by groups
  summarise(
    # calculate means and standard deviation for volume
    mean_volume = mean(volume), sd_volume = sd(volume),
    # calculate means and standard deviation for pos_prop
    mean_pos = mean(pos_prop), sd_pos = sd(pos_prop),
    # calculate means and standard deviation for neg_prop
    mean_neg = mean(neg_prop), sd_neg = sd(neg_prop)
  )
## `summarise()` has grouped output by 'new_brand'. You can override using the
## `.groups` argument.
## # A tibble: 6 × 8
## # Groups:   new_brand [3]
##   new_brand superbowl mean_volume sd_volume mean_pos  sd_pos mean_neg   sd_neg
##   <chr>         <int>       <dbl>     <dbl>    <dbl>   <dbl>    <dbl>    <dbl>
## 1 Beetle            0       3418.      622.    0.194 0.0116    0.0254 0.00338
## 2 Beetle            1       4319.     1463.    0.216 0.0337    0.0339 0.00900
## 3 CR-Z              0       1172.      266.    0.230 0.0133    0.0497 0.0167
## 4 CR-Z              1       1486.      712.    0.236 0.0697    0.0339 0.0139
## 5 Camaro            0      87596.     7817.    0.244 0.00915   0.0510 0.000940
## 6 Camaro            1      93613.    22222.    0.247 0.0222    0.0487 0.00638
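Task 1.3 writes out one mean/sd pair per column. dplyr 1.0.0 and later also provide `across()`, which applies a named list of functions to several columns at once. By default the output columns are named `{column}_{function}` (e.g. `volume_mean`), so the names differ from the solution above. A sketch on made-up data:

```r
library(dplyr)

# Made-up toy data with the columns used in Task 1.3
toy <- tibble(
  new_brand = rep(c("A", "B"), each = 4),
  superbowl = rep(c(0L, 0L, 1L, 1L), times = 2),
  volume    = c(100, 300, 200, 400, 10, 30, 20, 40),
  pos_prop  = c(0.2, 0.4, 0.1, 0.3, 0.5, 0.5, 0.6, 0.8),
  neg_prop  = c(0.1, 0.1, 0.2, 0.4, 0.0, 0.2, 0.1, 0.1)
)

# Mean and standard deviation of three columns per brand and period
toy_summary <- toy %>%
  group_by(new_brand, superbowl) %>%
  summarise(
    across(c(volume, pos_prop, neg_prop), list(mean = mean, sd = sd)),
    .groups = "drop"
  )

names(toy_summary)
```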
ggplot(superbowl, aes(x = new_brand, y = volume)) +
  geom_boxplot()
ggplot(superbowl, aes(x = new_brand, y = volume)) +
  geom_bar(stat = "identity", fill = "blue", color = "blue")
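With `stat = "identity"`, `geom_bar()` draws one segment per row and stacks them (its default position is `"stack"`), so each bar's height is the total `volume` summed over all weeks for that brand, not a typical weekly value. If the mean weekly volume is what should be compared, one option is to summarise first and plot with `geom_col()`, which is ggplot2's shorthand for `geom_bar(stat = "identity")`. A sketch on made-up data:

```r
library(dplyr)
library(ggplot2)

# Made-up weekly volumes for two brands
toy <- tibble(
  new_brand = rep(c("A", "B"), each = 3),
  volume    = c(100, 200, 300, 10, 20, 30)
)

# One row per brand: the mean weekly volume
mean_volume <- toy %>%
  group_by(new_brand) %>%
  summarise(mean_volume = mean(volume), .groups = "drop")

# geom_col() uses the y values directly as bar heights
p <- ggplot(mean_volume, aes(x = new_brand, y = mean_volume)) +
  geom_col(fill = "blue", color = "blue") +
  labs(x = "Brand", y = "Mean weekly volume")

p
```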
Camaro <- superbowl %>% filter(new_brand == "Camaro")
ggplot(Camaro, mapping = aes(x = pos_prop, y = neg_prop)) +
  geom_point()
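Since the tasks compare the weeks before and after the Super Bowl, one optional variation on the Camaro scatter plot maps the `superbowl` indicator to color. Wrapping it in `factor()` gives a discrete legend rather than a continuous color gradient. This is a sketch on made-up proportions (not the lab's data), assuming `superbowl` is coded 0 before and 1 after, as the summaries above suggest.

```r
library(ggplot2)

# Made-up weekly proportions for one brand; superbowl assumed 0 = before, 1 = after
toy_camaro <- data.frame(
  pos_prop  = c(0.24, 0.25, 0.23, 0.26, 0.22),
  neg_prop  = c(0.050, 0.051, 0.049, 0.045, 0.055),
  superbowl = c(0L, 0L, 1L, 1L, 1L)
)

# Color points by period so before/after weeks can be told apart
p <- ggplot(toy_camaro, aes(x = pos_prop, y = neg_prop, color = factor(superbowl))) +
  geom_point() +
  labs(x = "Proportion of positive mentions",
       y = "Proportion of negative mentions",
       color = "superbowl")

p
```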