This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
A test of this kind is usually the first step of product or service design for marketing analytics professionals at companies such as Wayfair, Walmart, Netflix, and Booking.com (see the references).

- H0 (null hypothesis): the statement taken as the basis for argument, which has not yet been proven; it predicts no difference (all groups are equal).
- H1 (alternative hypothesis): the statement set up to establish an effect, such as a new effect compared with the existing one (e.g., a new drug is better than the existing standard product).
Which type of in-store advertisement is more effective? To answer this question, the marketing team decided to place two types of ads in a pilot store for testing using two themes of juices: one theme is natural production of the juice, and the other theme is family health caring. The goal of this experiment is to see if they can place the better one into all of the stores after the pilot period.
In this study, we analyze the effectiveness of ads on sales using Welch’s independent samples t-test. Here, independent means that the data points (i.e., customers in this case) in one group are not matched with points in the other group. Alternatively, we might perform a paired samples t-test, for instance to test whether sales in each store differ between a before condition (consumers exposed to the natural theme) and an after condition (consumers exposed to the family theme).
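To make the distinction concrete, here is a minimal sketch in R. The numbers are simulated purely for illustration (they are not the grape juice data), and the vector names `nature` and `family` are hypothetical.

```r
# Simulated weekly sales for illustration only (not the grapeJuice.csv data)
set.seed(42)
nature <- rnorm(15, mean = 190, sd = 35)   # units shown the natural production ad
family <- rnorm(15, mean = 245, sd = 50)   # different units shown the family ad

# Independent samples: the two vectors come from unrelated units,
# so t.test() runs Welch's test by default (var.equal = FALSE)
welch <- t.test(nature, family)

# Paired samples: only appropriate if element i of both vectors
# were measured on the same unit (a before/after design)
paired <- t.test(nature, family, paired = TRUE)

welch$method
paired$method
```

The two calls take the same inputs but answer different questions, so the choice should follow from how the data were collected rather than from which result looks better.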
- Sales: Total unit sales of the grape juice in one week in a store
- price: Average unit price of the grape juice in the week
- ad_type: The in-store advertisement type used to promote the grape juice
  - ad_type = 0: the theme of the ad is natural production of the juice
  - ad_type = 1: the theme of the ad is family health caring
- price_apple: Average unit price of the apple juice in the same store in the week
- price_cookies: Average unit price of the cookies in the same store in the week
Please write a null hypothesis and an alternative hypothesis using the template hypotheses available in the research design module.
H1: The family health caring theme of the grape juice has promoted greater average sales than the natural production theme.

H0: The family health caring theme of the grape juice does NOT have greater average sales than the natural production theme.
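In symbols, the directional hypotheses above can be sketched as follows. The notation is introduced here for illustration: `\mu_{\text{family}}` and `\mu_{\text{nature}}` stand for the population mean weekly sales under the family health caring theme and the natural production theme.

```latex
H_0:\ \mu_{\text{family}} \le \mu_{\text{nature}}
\qquad
H_1:\ \mu_{\text{family}} > \mu_{\text{nature}}
```

Note that the t-test output in this document reports a two-sided alternative (different population means in each group), which is a slightly weaker statement than the directional H1.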
Please make your conclusions based on the results in descriptive analysis 3. What is your conclusion?
Looking at the results from descriptive analysis 3, the family health caring theme has higher mean weekly sales (246.67 units) than the natural production theme (186.67 units), which suggests it has been more effective in selling grape juice.
We performed a normality test in Step - normality check 1. What is your conclusion?
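In the normality check, the Shapiro-Wilk p-values for both groups (0.4155 for the natural production theme and 0.08695 for the family health caring theme) are above the conventional 0.05 level. We therefore fail to reject the hypothesis that the sales data for each ad type come from a normally distributed population.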
In this step, you will be performing a t-test using Excel. Once you get the result, please attach your output Spreadsheet in the discussion forum. Reference: Excel - Independent samples Welch t test (via data analysis) https://www.youtube.com/watch?v=sHqCrK_FMyY
In this step, you will perform the t-test again using R and RStudio. The goal is to help you document your analysis for future reference.
library(lsr)
data <- read.csv('grapeJuice.csv')
# Convert the grouping variable to a factor so that lsr does not warn about it
data$ad_type <- as.factor(data$ad_type)
# Perform Welch's independent samples t-test
independentSamplesTTest(formula = Sales ~ ad_type, data = data)
##
## Welch's independent samples t-test
##
## Outcome variable: Sales
## Grouping variable: ad_type
##
## Descriptive statistics:
## 0 1
## mean 186.667 246.667
## std dev. 35.864 50.504
##
## Hypotheses:
## null: population means equal for both groups
## alternative: different population means in each group
##
## Test results:
## t-statistic: -3.752
## degrees of freedom: 25.257
## p-value: <.001
##
## Other information:
## two-sided 95% confidence interval: [-92.922, -27.078]
## estimated effect size (Cohen's d): 1.37
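The effect size reported above can be reproduced from the descriptive statistics in the same output. The sketch below assumes that, for Welch's test, the mean difference is standardized by the square root of the average of the two group variances; that assumption is consistent with the reported value but is not stated in the output itself.

```r
# Descriptive statistics copied from the independentSamplesTTest() output
mean_nature <- 186.667; sd_nature <- 35.864   # ad_type = 0
mean_family <- 246.667; sd_family <- 50.504   # ad_type = 1

# Assumed formula: |difference in means| / sqrt(average of the two variances)
d <- abs(mean_nature - mean_family) / sqrt((sd_nature^2 + sd_family^2) / 2)
round(d, 2)   # 1.37, matching the reported Cohen's d
```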
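We performed a Welch’s t-test in step 3. What is your conclusion? Hint: read the first three reference articles. Make sure to cite.
Stores with ad_type = 1 (the family health caring theme) have greater average sales than stores with ad_type = 0 (the natural production theme): 246.667 versus 186.667 units. The difference is statistically significant (Welch's t = -3.752, df = 25.257, p < .001), and the 95% confidence interval for the difference in means, [-92.922, -27.078], excludes zero.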
https://splitmetrics.com/blog/mobile-a-b-testing-statistical-significance/
data <- read.csv('grapeJuice.csv') #read data
str(data)
## 'data.frame': 30 obs. of 6 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Sales : int 222 201 247 169 317 227 214 187 188 275 ...
## $ price : num 9.83 9.72 10.15 10.04 8.38 ...
## $ ad_type : int 0 1 1 0 1 0 1 0 1 0 ...
## $ price_apple : num 7.36 7.43 7.66 7.57 7.33 7.51 7.57 7.66 7.39 8.29 ...
## $ price_cookies: num 8.8 9.62 8.9 10.26 9.54 ...
head(data) #view the first 6 lines
## X Sales price ad_type price_apple price_cookies
## 1 1 222 9.83 0 7.36 8.80
## 2 2 201 9.72 1 7.43 9.62
## 3 3 247 10.15 1 7.66 8.90
## 4 4 169 10.04 0 7.57 10.26
## 5 5 317 8.38 1 7.33 9.54
## 6 6 227 9.74 0 7.51 9.49
tail(data) #view the last 6 lines
## X Sales price ad_type price_apple price_cookies
## 25 25 335 8.34 1 8.23 9.13
## 26 26 145 10.27 0 7.41 10.58
## 27 27 201 10.26 1 7.67 9.22
## 28 28 131 10.49 0 7.59 10.43
## 29 29 210 10.36 0 7.93 9.44
## 30 30 279 8.56 1 7.65 10.44
#perform some basic descriptive analysis
summary(data)
## X Sales price ad_type price_apple
## Min. : 1.00 Min. :131.0 Min. : 8.200 Min. :0.0 Min. :7.300
## 1st Qu.: 8.25 1st Qu.:182.5 1st Qu.: 9.585 1st Qu.:0.0 1st Qu.:7.438
## Median :15.50 Median :204.5 Median : 9.855 Median :0.5 Median :7.580
## Mean :15.50 Mean :216.7 Mean : 9.738 Mean :0.5 Mean :7.659
## 3rd Qu.:22.75 3rd Qu.:244.2 3rd Qu.:10.268 3rd Qu.:1.0 3rd Qu.:7.805
## Max. :30.00 Max. :335.0 Max. :10.490 Max. :1.0 Max. :8.290
## price_cookies
## Min. : 8.790
## 1st Qu.: 9.190
## Median : 9.515
## Mean : 9.622
## 3rd Qu.:10.140
## Max. :10.580
#set the 1 by 2 layout plot window
par(mfrow=c(1,2))
#Check if there are outliers using a boxplot
#Let's perform boxplots in two different ways
boxplot(data$Sales,main="Boxplot for sales data", ylab="Sales")
boxplot(data$Sales,main="Boxplot for sales data", horizontal = TRUE, xlab="Sales")
#Let's perform a histogram analysis
hist(data$Sales, main='histogram plot for sales data', xlab='sales_grape', prob=TRUE)
lines(density(data$Sales),lty='dashed',lwd=2.5, col='blue')
It seems that there is no outlier and the distribution of the data is roughly normal.
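The no-outlier reading can be cross-checked with the usual 1.5 × IQR rule, using the quartiles printed by `summary(data)`. This is a rough sketch: `boxplot()` computes its whiskers from hinges, which can differ slightly from these quartiles, and the printed values are rounded.

```r
# Quartiles and range of Sales copied from the summary(data) output
q1 <- 182.5; q3 <- 244.2
min_sales <- 131; max_sales <- 335

iqr <- q3 - q1                    # interquartile range
lower_fence <- q1 - 1.5 * iqr     # values below this would be flagged
upper_fence <- q3 + 1.5 * iqr     # values above this would be flagged

c(lower = lower_fence, upper = upper_fence)

# TRUE only if the smallest or largest observation lies outside the fences
any_outlier <- min_sales < lower_fence || max_sales > upper_fence
any_outlier   # FALSE: all sales fall inside the fences
```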
hist(data$price, main='histogram plot for pricing data', xlab='price_grape', prob=TRUE)
lines(density(data$price),lty='dashed',lwd=2.5, col='blue')
# Divide the dataset into two subsets by ad_type
sales_ad_nature <- subset(data, ad_type == 0)
sales_ad_family <- subset(data, ad_type == 1)
#calculate the mean of sales with different ad_type
mean(sales_ad_nature$Sales)
## [1] 186.6667
mean(sales_ad_family$Sales)
## [1] 246.6667
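The two subset-and-mean steps can also be written as a single grouped summary. The sketch below uses only the 12 rows printed by `head(data)` and `tail(data)` so that it is self-contained; the data frame name `excerpt` is hypothetical, and its group means therefore differ from the full-data means above.

```r
# The 12 rows shown by head(data) and tail(data); not the full 30-row dataset
excerpt <- data.frame(
  Sales   = c(222, 201, 247, 169, 317, 227, 335, 145, 201, 131, 210, 279),
  ad_type = c(0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1)
)

# Mean of Sales within each ad_type, computed in one call
group_means <- tapply(excerpt$Sales, excerpt$ad_type, mean)
group_means

# Equivalent formula interface that returns a data frame
aggregate(Sales ~ ad_type, data = excerpt, FUN = mean)
```

On the full dataset, `tapply(data$Sales, data$ad_type, mean)` should return the same two means as the `subset()` approach.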
A t-test assumes that the observations are independent and approximately normally distributed within each group.
#set the 1 by 2 layout plot window
par(mfrow = c(1,2))
# Explore the distribution of the data using histogram
hist(sales_ad_nature$Sales, main="", xlab="sales with nature theme ad", prob=TRUE)
lines(density(sales_ad_nature$Sales), lty="dashed", lwd=2.5, col="red")
hist(sales_ad_family$Sales, main="", xlab="sales with family theme ad", prob=TRUE)
lines(density(sales_ad_family$Sales), lty="dashed", lwd=2.5, col="red")
#set the 1 by 2 layout plot window
par(mfrow = c(1,2))
# boxplot to check if there are outliers in each group
boxplot(sales_ad_family$Sales,horizontal = TRUE, xlab="sales with family theme ad")
boxplot(sales_ad_nature$Sales,horizontal = TRUE, xlab="sales with nature theme ad")
data$ad_type <- as.factor(data$ad_type)
head(data)
## X Sales price ad_type price_apple price_cookies
## 1 1 222 9.83 0 7.36 8.80
## 2 2 201 9.72 1 7.43 9.62
## 3 3 247 10.15 1 7.66 8.90
## 4 4 169 10.04 0 7.57 10.26
## 5 5 317 8.38 1 7.33 9.54
## 6 6 227 9.74 0 7.51 9.49
# Load the ggplot2 library
library(ggplot2)
# Boxplot of Sales by ad type, with individual observations overlaid as jittered points
ggplot(data, aes(x = ad_type, y = Sales, fill = ad_type)) +
  geom_boxplot(outlier.shape = NA, alpha = .5) +
  geom_jitter(width = .1, size = 1) +
  theme_classic() +
  scale_fill_manual(values = c("lightseagreen", "darkseagreen"))
In this step, we perform a Shapiro-Wilk test on each group to check whether the data could come from a normally distributed population.
shapiro.test(sales_ad_nature$Sales)
##
## Shapiro-Wilk normality test
##
## data: sales_ad_nature$Sales
## W = 0.94255, p-value = 0.4155
shapiro.test(sales_ad_family$Sales)
##
## Shapiro-Wilk normality test
##
## data: sales_ad_family$Sales
## W = 0.89743, p-value = 0.08695
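The decision rule applied to these outputs can be written down explicitly. This is a minimal sketch using simulated data; the vector `x` and the 0.05 threshold are assumptions for illustration, not part of the original analysis.

```r
# Simulated sample for illustration only (not the grape juice sales)
set.seed(1)
x <- rnorm(15, mean = 200, sd = 40)

res <- shapiro.test(x)
alpha <- 0.05   # assumed significance level

# A p-value above alpha means we fail to reject normality;
# it does not prove that the data are normal
if (res$p.value > alpha) {
  decision <- "no evidence against normality"
} else {
  decision <- "normality rejected"
}
decision
```

Applied to the two p-values reported above (0.4155 and 0.08695), both exceed 0.05, so both groups fall into the first branch.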
A t-test on a variable with two categories (e.g., Control and Treated) helps us understand whether the population means of the two groups differ. In `t.test()`, `mu = 0` is the null hypothesis that the difference between the Control and Treated means is 0, that is, the groups have the same mean; `alternative = "two.sided"` requests a two-sided test; and `conf.level = 0.95` sets the confidence level of the reported interval. These are the function's defaults, so the call below does not need to state them.
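The arguments described above can be spelled out in the call. In `t.test()` the full argument names are `alternative` and `conf.level`, and the values shown are the defaults, so the explicit call should match the shorter one. The sketch uses the 12 rows printed by `head(data)` and `tail(data)` earlier in this document, stored in a hypothetical data frame `excerpt`, so its numbers differ from the full analysis.

```r
# The 12 rows shown by head(data) and tail(data); not the full 30-row dataset
excerpt <- data.frame(
  Sales   = c(222, 201, 247, 169, 317, 227, 335, 145, 201, 131, 210, 279),
  ad_type = factor(c(0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1))
)

explicit <- t.test(Sales ~ ad_type, data = excerpt,
                   mu = 0,                      # null: difference in means is 0
                   alternative = "two.sided",   # two-sided alternative
                   conf.level = 0.95,           # confidence level of the interval
                   var.equal = FALSE)           # Welch's test (unequal variances)

# Same test relying on the defaults
short <- t.test(Sales ~ ad_type, data = excerpt)

all.equal(explicit$p.value, short$p.value)
```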
t.test(Sales ~ ad_type, data)
##
## Welch Two Sample t-test
##
## data: Sales by ad_type
## t = -3.7515, df = 25.257, p-value = 0.0009233
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -92.92234 -27.07766
## sample estimates:
## mean in group 0 mean in group 1
## 186.6667 246.6667
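The reported statistic, degrees of freedom, p-value, and interval can be reproduced from summary statistics alone. The sketch below assumes 15 observations per group (30 observations with a mean `ad_type` of 0.5) and uses the rounded standard deviations printed by `independentSamplesTTest()`, so it agrees with the output only up to rounding.

```r
# Summary statistics taken from the outputs in this document
m0 <- 186.667; s0 <- 35.864; n0 <- 15   # ad_type = 0 (natural production)
m1 <- 246.667; s1 <- 50.504; n1 <- 15   # ad_type = 1 (family health caring)

v0 <- s0^2 / n0                # squared standard error of each group mean
v1 <- s1^2 / n1
se <- sqrt(v0 + v1)            # standard error of the difference in means

t_stat <- (m0 - m1) / se

# Welch-Satterthwaite approximation to the degrees of freedom
dof <- (v0 + v1)^2 / (v0^2 / (n0 - 1) + v1^2 / (n1 - 1))

p_two_sided <- 2 * pt(-abs(t_stat), dof)
ci <- (m0 - m1) + c(-1, 1) * qt(0.975, dof) * se

round(c(t = t_stat, df = dof, p = p_two_sided), 4)
round(ci, 2)
```

Because the degrees of freedom come from an approximation, they are not a whole number, which is why the output shows df = 25.257 rather than 28.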
library(pander)
panderOptions('round',4)
panderOptions('digits',7)
panderOptions('keep.trailing.zeros',TRUE)
panderOptions("table.split.table", Inf)
pander(t.test(sales_ad_nature$Sales,sales_ad_family$Sales))
| Test statistic | df | P value | Alternative hypothesis | mean of x | mean of y |
|---|---|---|---|---|---|
| -3.7515 | 25.2571 | 9e-04 * * * | two.sided | 186.6667 | 246.6667 |
For R-related questions, use https://stackoverflow.com/questions/tagged/r. For statistics-related questions, use https://stats.stackexchange.com.

Data Science Specialization at Johns Hopkins University, https://www.coursera.org/specializations/jhu-data-science

## References

- Stuart Frisby, Booking.com - Conversions@Google 2017. https://www.youtube.com/watch?v=_sx5LV23hIE
- Design Testing at Netflix. https://www.youtube.com/watch?v=-Gy8TnoXZf8
- Mobile A/B Testing Results Analysis: Statistical Significance, Confidence Level and Intervals. https://splitmetrics.com/blog/mobile-a-b-testing-statistical-significance/
- Gemini: Wayfair's advanced marketing test design and measurement platform. https://tech.wayfair.com/data-science/2019/07/gemini-wayfairs-advanced-marketing-test-design-and-measurement-platform/
- Two Independent Samples Unequal Variance (Welch's Test). https://sites.nicholas.duke.edu/statsreview/means/welch/
- ANOVA, t-tests and regression: different ways of showing the same thing. http://deevybee.blogspot.com/2017/11/anova-t-tests-and-regression-different.html
- The Independent Samples t-test (Welch Test). https://stats.libretexts.org/Bookshelves/Applied_Statistics/Book%3A_Learning_Statistics_with_R_-A_tutorial_for_Psychology_Students_and_other_Beginners(Navarro)/13%3A_Comparing_Two_Means/13.04%3A_The_Independent_Samples_t-test_(Welch_Test)
- Shapiro-Wilk test. https://en.wikipedia.org/wiki/Shapiro%E2%80%93Wilk_test
- My R Journey: Thomas Mock. https://rfortherestofus.com/2019/09/my-r-journey-thomas-mock/
- R Programming Tutorial - Learn the Basics of Statistical Computing. https://www.youtube.com/watch?v=_V8eKsto3Ug
- Pander library. https://www.r-project.org/nosvn/pandoc/pander.html
- Generating tables using pander and knitr. https://r-norberg.blogspot.com/2013/06/generating-tables-using-pander-knitr.html