R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown, see http://rmarkdown.rstudio.com.

When you click the Knit button, R Markdown generates a document that includes both the text content and the output of any R code chunks embedded in it. R code is embedded in the document as code chunks.

In RStudio, press Ctrl+Enter (Windows) or Cmd+Enter (Mac) to run the current line or selection in a code chunk; add Shift to run the whole chunk.

Load Packages

In this section, we install and load the necessary packages.
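The package chunk is not shown in this output. A minimal sketch, assuming dplyr is the only add-on package the lab needs (the `%>%`, `group_by()` and `summarise()` calls below come from it) and that CRAN is reachable for the one-time install:

```r
# Install dplyr only if it is not already available
# (assumption: it is the only add-on package this lab needs)
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr", repos = "https://cloud.r-project.org")
}

# Attach dplyr so %>%, group_by() and summarise() can be called directly
library(dplyr)
```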

Import Data

In this section, we import the necessary data for this lab.
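The import chunk is also not shown. The sketch below wraps `read.csv()` in a hypothetical helper, `load_superbowl()`, that fails early if the columns used later in this lab are missing. The default file name and its location in the working directory are assumptions.

```r
# Hypothetical helper: read the Super Bowl CSV and check that the columns
# used in this lab are present. The default path is an assumption.
load_superbowl <- function(path = "superbowl_commercials.csv") {
  # With check.names = TRUE (the default), read.csv() converts a header such
  # as "Estimated Cost" into the syntactic name Estimated.Cost
  data <- read.csv(path)

  needed <- c("Brand", "Estimated.Cost", "Youtube.Likes")
  missing_cols <- setdiff(needed, names(data))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  data
}
```

Usage: `superbowl <- load_superbowl()`, or pass the actual file path if it differs.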

Homework Assignment 2: Super Bowl Ads Dataset

You are hired as a Business Analyst to help the manager find out whether spending millions of dollars on Super Bowl ads creates social media buzz.

As a second step, carry out a descriptive analysis of the superbowl_commercials.csv data. Each section includes questions that you need to answer.

Section 1: Univariate Statistics

Task 1: Compute the mean, variance, and standard deviation of Cost for each Brand before and after the Super Bowl, using the relevant R functions. Hint: you may use the dplyr functions group_by() and summarise().

# Mean, standard deviation and variance of Estimated.Cost for each brand
superbowl %>%
  group_by(Brand) %>%
  summarise(mean = mean(Estimated.Cost),
            sd = sd(Estimated.Cost),
            variance = var(Estimated.Cost))
## # A tibble: 10 × 4
##    Brand      mean    sd variance
##    <chr>     <dbl> <dbl>    <dbl>
##  1 Bud Light  3.63  2.76     7.60
##  2 Budweiser  4.58  2.81     7.91
##  3 Coca-Cola  6.17  2.40     5.75
##  4 Doritos    4.29  2.92     8.53
##  5 E-Trade    2.81  1.17     1.37
##  6 Hyundai    5.58  3.88    15.0 
##  7 Kia        9.00  3.68    13.5 
##  8 NFL       11.7  10.6    111.  
##  9 Pepsi      4.51  2.28     5.19
## 10 Toyota     8.35  4.03    16.3
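`var()` and `sd()` compute the sample statistics, dividing by n - 1, and the standard deviation is the square root of the variance. This is why each variance in the table is roughly the square of its sd (for E-Trade, 1.17^2 ≈ 1.37). A small check on a toy vector (not the Super Bowl data):

```r
# Toy vector for illustration only
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

n <- length(x)
manual_mean <- sum(x) / n                          # 40 / 8 = 5
manual_var  <- sum((x - manual_mean)^2) / (n - 1)  # 32 / 7, the sample variance
manual_sd   <- sqrt(manual_var)                    # sd is the square root of the variance

# Compare the hand-computed values with R's built-in functions
all.equal(c(manual_mean, manual_var, manual_sd), c(mean(x), var(x), sd(x)))  # TRUE
```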

Task 2: Create histograms showing the distributions of Super Bowl advertising cost and YouTube likes. Do you observe any extreme values?

hist(superbowl$Estimated.Cost)

hist(superbowl$Youtube.Likes)
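A histogram shows extreme values visually. To list them, one common convention is Tukey's 1.5 × IQR rule, which is swapped in here and not part of the original assignment. A sketch with a hypothetical helper, `flag_outliers()`, that marks values outside [Q1 - 1.5·IQR, Q3 + 1.5·IQR] and never flags missing values:

```r
# Hypothetical helper: TRUE for values outside Tukey's fences
# [Q1 - k * IQR, Q3 + k * IQR]; k = 1.5 is the conventional choice.
flag_outliers <- function(x, k = 1.5) {
  q <- unname(quantile(x, c(0.25, 0.75), na.rm = TRUE))
  spread <- q[2] - q[1]              # interquartile range
  lower  <- q[1] - k * spread
  upper  <- q[2] + k * spread
  !is.na(x) & (x < lower | x > upper)  # NA values are never flagged
}

flag_outliers(c(1, 2, 3, 4, 5, 100))  # only the last value (100) is TRUE
```

Applied to the lab data, `superbowl$Brand[flag_outliers(superbowl$Youtube.Likes)]` would list the brands whose ads have extreme YouTube likes.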

Questions:

  1. Which brand has the highest mean ad cost for the Super Bowl, and which brand has the lowest variability in Cost?
  2. Which brand has the extreme value in YouTube Likes?

The NFL has the highest mean cost (11.7), and E-Trade has the lowest variability in cost (variance 1.37, standard deviation 1.17). Doritos has the extreme value in YouTube likes.
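These answers can also be picked out programmatically instead of by eye. A minimal base-R sketch using the rounded means and variances printed in the Task 1 summary; in the lab, the same `which.max()` and `which.min()` calls would apply to the columns of the tibble returned by `summarise()`:

```r
# Rounded values copied from the Task 1 summary table
brand_stats <- data.frame(
  Brand    = c("Bud Light", "Budweiser", "Coca-Cola", "Doritos", "E-Trade",
               "Hyundai", "Kia", "NFL", "Pepsi", "Toyota"),
  mean     = c(3.63, 4.58, 6.17, 4.29, 2.81, 5.58, 9.00, 11.7, 4.51, 8.35),
  variance = c(7.60, 7.91, 5.75, 8.53, 1.37, 15.0, 13.5, 111, 5.19, 16.3),
  stringsAsFactors = FALSE
)

brand_stats$Brand[which.max(brand_stats$mean)]      # "NFL"
brand_stats$Brand[which.min(brand_stats$variance)]  # "E-Trade"
```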

Section 2: Bivariate Statistics

Task 1: Compute the covariance and correlation between Cost and Youtube.Likes in the superbowl dataset.

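# Youtube.Likes contains missing values, so the default calls return NA;
# use = "complete.obs" restricts both statistics to rows where both values are present
cov(superbowl$Estimated.Cost, superbowl$Youtube.Likes, use = "complete.obs")
cor(superbowl$Estimated.Cost, superbowl$Youtube.Likes, use = "complete.obs")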
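`cov()` and `cor()` return NA whenever either input contains a missing value, unless the `use` argument says how to handle it. The toy example below (not the Super Bowl data) shows this and also checks that Pearson's r is the covariance divided by the product of the two standard deviations:

```r
# Toy vectors for illustration only; y has one missing value
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, NA, 10)

cor(x, y)                        # NA: the missing value propagates
cor(x, y, use = "complete.obs")  # uses only rows where both x and y are present

# Pearson's r = cov(x, y) / (sd(x) * sd(y)), computed on the complete rows
keep <- !is.na(x) & !is.na(y)
r_manual <- cov(x[keep], y[keep]) / (sd(x[keep]) * sd(y[keep]))
all.equal(r_manual, cor(x, y, use = "complete.obs"))  # TRUE
```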

Task 2: Create a scatter plot showing the relationship between Super Bowl advertising cost and YouTube likes.

plot(superbowl$Estimated.Cost, superbowl$Youtube.Likes,
     xlab = "Estimated.Cost", ylab = "Youtube.Likes")

Then, answer the following questions. Please write your answers after Response:

Questions:

  1. What is the correlation between Cost and Youtube.Likes? Does that imply a positive, negative or no relationship? Is the relationship strong?

Response: Ads costing around 5 and 10 million appear to attract many YouTube likes, but the scatter plot shows no overall linear pattern, so there is no clear linear relationship between cost and YouTube likes.
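Because the YouTube likes contain an extreme value, a rank-based Spearman correlation can complement Pearson's r when judging strength. This is a swapped-in technique, not part of the original assignment: it depends only on the ordering of the values, so one very large value has limited influence. A toy illustration:

```r
# Toy data: y rises steadily with x, but the last value is extreme
x <- c(1, 2, 3, 4, 5)
y <- c(1, 2, 3, 4, 100)

pearson  <- cor(x, y)                       # pulled well below 1 by the extreme value
spearman <- cor(x, y, method = "spearman")  # 1: the ranks of x and y agree exactly

c(pearson = pearson, spearman = spearman)
```

For the lab data, `cor(superbowl$Estimated.Cost, superbowl$Youtube.Likes, use = "complete.obs", method = "spearman")` would give the rank-based value with missing likes dropped.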