#setwd("C:/Users/zxu3/Documents/R/abtesting")
#Please install the following package if the package "readr" is not installed.
#install.packages("readr")
library(readr)
data <- read_csv("ab_testing1.csv")
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   Ads = col_double(),
##   Purchase = col_double()
## )
ls(data) # list the variables in the dataset
## [1] "Ads"      "Purchase"
head(data) #list the first 6 rows of the dataset
## # A tibble: 6 x 2
##     Ads Purchase
##   <dbl>    <dbl>
## 1     1      152
## 2     0       21
## 3     2       77
## 4     0       65
## 5     1      183
## 6     1       87
# creating the factor variable
data$Ads <- factor(data$Ads)
is.factor(data$Ads)
## [1] TRUE
# showing the first 15 rows of the variable "Ads"
data$Ads[1:15]
##  [1] 1 0 2 0 1 1 2 2 2 0 2 2 0 2 2
## Levels: 0 1 2
#now we do the regression analysis and examine the results
summary(lm(Purchase~Ads, data = data))
## 
## Call:
## lm(formula = Purchase ~ Ads, data = data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -59.75 -22.75  -3.75  30.25  64.29 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    49.00      10.21   4.800 5.69e-05 ***
## Ads1           69.71      15.91   4.383 0.000171 ***
## Ads2           24.75      13.82   1.791 0.084982 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 32.28 on 26 degrees of freedom
## Multiple R-squared:  0.4262, Adjusted R-squared:  0.3821 
## F-statistic: 9.656 on 2 and 26 DF,  p-value: 0.0007308
#now we repeat the analysis with the same file loaded as "display" (as read here, Ads has 3 levels: 0, 1, 2)
display <- read_csv("ab_testing1.csv")
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   Ads = col_double(),
##   Purchase = col_double()
## )
ls(display) # list the variables in the dataset
## [1] "Ads"      "Purchase"
head(display)
## # A tibble: 6 x 2
##     Ads Purchase
##   <dbl>    <dbl>
## 1     1      152
## 2     0       21
## 3     2       77
## 4     0       65
## 5     1      183
## 6     1       87
# creating the factor variable
display$Ads <- factor(display$Ads)
is.factor(display$Ads)
## [1] TRUE
# showing the first 15 rows of the variable "Ads"
display$Ads[1:15]
##  [1] 1 0 2 0 1 1 2 2 2 0 2 2 0 2 2
## Levels: 0 1 2
#now we do the regression analysis on "display" (as read here, Ads has 3 levels: 0, 1, 2)
summary(lm(Purchase~Ads, data = display))
## 
## Call:
## lm(formula = Purchase ~ Ads, data = display)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -59.75 -22.75  -3.75  30.25  64.29 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    49.00      10.21   4.800 5.69e-05 ***
## Ads1           69.71      15.91   4.383 0.000171 ***
## Ads2           24.75      13.82   1.791 0.084982 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 32.28 on 26 degrees of freedom
## Multiple R-squared:  0.4262, Adjusted R-squared:  0.3821 
## F-statistic: 9.656 on 2 and 26 DF,  p-value: 0.0007308
#Alternatively, you can also use the factor function within the lm function, saving the step of creating the factor variable first.
summary(lm(Purchase~ factor(Ads), data = display))
## 
## Call:
## lm(formula = Purchase ~ factor(Ads), data = display)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -59.75 -22.75  -3.75  30.25  64.29 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     49.00      10.21   4.800 5.69e-05 ***
## factor(Ads)1    69.71      15.91   4.383 0.000171 ***
## factor(Ads)2    24.75      13.82   1.791 0.084982 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 32.28 on 26 degrees of freedom
## Multiple R-squared:  0.4262, Adjusted R-squared:  0.3821 
## F-statistic: 9.656 on 2 and 26 DF,  p-value: 0.0007308
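
To help read the coefficient tables above: with a dummy-coded factor, the fitted mean for each ad group is the intercept plus that group's coefficient. The short sketch below reuses the rounded estimates printed in the output (49.00, 69.71, 24.75). The variable names are illustrative only and are not part of the original script.

```r
# Rounded estimates copied from the summary() output above
intercept <- 49.00   # fitted mean Purchase for the baseline group (Ads = 0)
ads1      <- 69.71   # estimated difference between Ads = 1 and the baseline
ads2      <- 24.75   # estimated difference between Ads = 2 and the baseline

# Fitted group means implied by the dummy-coded model
group_means <- c(Ads0 = intercept,
                 Ads1 = intercept + ads1,
                 Ads2 = intercept + ads2)
print(group_means)
```

Because the inputs are rounded, the resulting means are approximate. Computing the means directly from the data, for example with `tapply(display$Purchase, display$Ads, mean)`, should agree up to rounding.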

Q1 Please interpret the results for the regression model in Example 2.1. Assume that you work for this company. What are the marketing implications?

Answer: The p-values for Example 2.1 are 3.31e-12 for Ads1, 0.00038 for Ads2 and 0.77 for Ads3. For Ads1, 3.31e-12 is far below .05, so we reject the null hypothesis in favor of the alternative hypothesis. We also reject the null hypothesis for Ads2, because its p-value is below the .05 threshold as well. For Ads3, however, the p-value is above .05, so we fail to reject the null hypothesis. In other words, there is no statistically significant evidence that Ads3 changes product purchases relative to the control group, while for Ads1 and Ads2 we do observe a significant relationship between ad exposure and purchases. The estimated coefficient for Ads1 is 75.71, which tells us that Ads1 is the most effective ad. The estimates for β0 and β1 are 55.38 and 75.71, which gives predicted average sales of 55.38 for the control group and 55.38 + 75.71*1 = 131.09 for the treatment group, that is, the consumers who were exposed to the Ads1 campaign. The marketing implication is clear: I would advise my company to run the Ads1 advertisement, because predicted average purchases are about $131 with the ad versus about $55 without it. Ads3 shows no significant effect, so it is not worth running.
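
The reading of β0 and β1 used in the answer above (intercept = control-group mean, intercept + coefficient = treated-group mean) can be checked on a small made-up data set. The sketch below is only an illustration: the `toy` data frame and its values are hypothetical and are not the data from Example 2.1.

```r
# Hypothetical toy data (NOT the ab_testing data): 0 = control, 1 and 2 = two ads
toy <- data.frame(
  Ads      = factor(c(0, 0, 0, 1, 1, 1, 2, 2, 2)),
  Purchase = c(40, 50, 60, 110, 120, 130, 70, 75, 80)
)

# Dummy-coded regression, same form as lm(Purchase ~ Ads) in the examples
fit <- lm(Purchase ~ Ads, data = toy)
b   <- coef(fit)

# Group means computed directly from the data
means <- tapply(toy$Purchase, toy$Ads, mean)

# Intercept = control mean; intercept + coefficient = mean of that ad group
implied <- c(b[["(Intercept)"]],
             b[["(Intercept)"]] + b[["Ads1"]],
             b[["(Intercept)"]] + b[["Ads2"]])

# The two rows should match
print(rbind(from_lm = implied, from_data = as.numeric(means)))
```

This is why a coefficient can be read directly as the average lift of an ad over the control group, and why its p-value tests whether that lift differs from zero.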

Q2 Did we get the same results in Example 2.1 and Example 2.2? Which solution do you like better? Why?

Answer:

Q4: Write a summary on at least one article listed in the Reference section.

Answer: I’ll be commenting on the video https://www.youtube.com/watch?v=VQpQ0YHSfqM&t=189s, in which Stuart Frisby from Booking.com gives a general overview of what A/B testing is and its dos and don’ts. I was surprised to hear him say that 9 out of 10 times we are absolutely wrong in our assumptions about something, so get used to it if you want to be in the testing field! He explained that there are shallow and deep metrics: shallow metrics are things like “how many clicks” we can get, and deep metrics are things like “how many bookings/conversions”. Stuart pointed out that the reasons for running A/B tests should be to clear up any assumptions (that you’re probably wrong about), to ultimately help you be more profitable, to make decisions based on facts and to help you create products based on customer sentiments. He also pointed out some common testing mistakes: making too many changes at the same time, testing only limited parts of something instead of the entire thing, and assuming that a competitor’s test results will always work for you. Stuart also mentioned that the main reason to run tests is to improve the customer experience and to focus on customer “pain points”.