
---
title: "Mid-term Exam - Part 2 – Q2"
author: "Davian Rosales"
date: "11/07/2023"
output: html_document
editor_options:
  chunk_output_type: console
---

Null Hypothesis: H0: β1 = 0

Alternative Hypothesis: H1: β1 ≠ 0

First, we create a new variable for each non-reference level of the categorical variable; it has a value of one for each observation at that level and zero for all others. In our example using the variable Ads, the first new variable (Ads1) is 1 for each observation in which the consumers are exposed to the 1st ads campaign and 0 for all other observations. Likewise, Ads2 is 1 when the consumers are exposed to the 2nd ads campaign, and 0 otherwise. Ads has three levels in this dataset (0, 1, and 2), so two new variables are enough. The level of the categorical variable that is coded as zero on all of the new variables is the reference level, that is, the level to which all of the other levels are compared. In our example, the reference level is Ads = 0, the group that received no advertising exposure. Our objective is to see which ads campaign leads to more product sales.
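The dummy coding described above can be sketched by hand and compared with what R builds from a factor. This is a minimal illustration, assuming only the six rows that `head(data)` prints in Example 1; the names `toy` and `X` are made up for the sketch and are not part of the original analysis.

```r
# Six rows copied from the head(data) output in Example 1 (not the full dataset)
toy <- data.frame(
  Ads      = c(1, 0, 2, 0, 1, 1),
  Purchase = c(152, 21, 77, 65, 183, 87)
)

# Manual dummy coding: one 0/1 indicator per non-reference level
toy$Ads1 <- as.integer(toy$Ads == 1)  # 1 if exposed to the 1st campaign
toy$Ads2 <- as.integer(toy$Ads == 2)  # 1 if exposed to the 2nd campaign

# Design matrix that R builds automatically when Ads is treated as a factor
X <- model.matrix(~ factor(Ads), data = toy)

print(toy)
print(X)
```

Rows with Ads = 0 have zeros in both indicator columns, which is what makes level 0 the reference level.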

Example 1 - A simple A/B test

# setwd("C:/Users/zxu3/Documents/R/abtesting")
# Install the "readr" package first if it is not already installed:
# install.packages("readr")
library(readr)
data <- read_csv("ab_testing1.csv")

## Rows: 29 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (2): Ads, Purchase
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

ls(data) # list the variables in the dataset

## [1] "Ads"      "Purchase"

head(data) #list the first 6 rows of the dataset

## # A tibble: 6 × 2
##     Ads Purchase
##   <dbl>    <dbl>
## 1     1      152
## 2     0       21
## 3     2       77
## 4     0       65
## 5     1      183
## 6     1       87

# creating the factor variable
data$Ads <- factor(data$Ads)
is.factor(data$Ads)

## [1] TRUE

# showing the first 15 rows of the variable "Ads"
data$Ads[1:15]

##  [1] 1 0 2 0 1 1 2 2 2 0 2 2 0 2 2
## Levels: 0 1 2

# Now fit the regression and examine the results
summary(lm(Purchase~Ads, data = data))

## 
## Call:
## lm(formula = Purchase ~ Ads, data = data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -59.75 -22.75  -3.75  30.25  64.29 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    49.00      10.21   4.800 5.69e-05 ***
## Ads1           69.71      15.91   4.383 0.000171 ***
## Ads2           24.75      13.82   1.791 0.084982 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 32.28 on 26 degrees of freedom
## Multiple R-squared:  0.4262, Adjusted R-squared:  0.3821 
## F-statistic: 9.656 on 2 and 26 DF,  p-value: 0.0007308
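As a rough check on how to read these coefficients: with a single factor predictor, the intercept is the mean of the reference group, and each remaining coefficient is the difference between that level's mean and the reference mean. The sketch below assumes only the six rows printed by `head(data)`, so its numbers differ from the full 29-row output above; `toy`, `fit`, and `group_means` are illustrative names.

```r
# Six rows copied from the head(data) output (not the full dataset)
toy <- data.frame(
  Ads      = factor(c(1, 0, 2, 0, 1, 1)),
  Purchase = c(152, 21, 77, 65, 183, 87)
)

fit <- lm(Purchase ~ Ads, data = toy)

# Mean Purchase within each level of Ads
group_means <- tapply(toy$Purchase, toy$Ads, mean)

print(coef(fit))
print(group_means)

# Differences from the reference group reproduce the Ads1 and Ads2 coefficients
print(group_means[-1] - group_means[1])
```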

Example 2.2 - An A/B test with only one line of syntax (no dummy coding required)

# Alternatively, call factor() inside lm() to skip creating the factor variable first.
summary(lm(Purchase~ factor(Ads), data))

## 
## Call:
## lm(formula = Purchase ~ factor(Ads), data = data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -59.75 -22.75  -3.75  30.25  64.29 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     49.00      10.21   4.800 5.69e-05 ***
## factor(Ads)1    69.71      15.91   4.383 0.000171 ***
## factor(Ads)2    24.75      13.82   1.791 0.084982 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 32.28 on 26 degrees of freedom
## Multiple R-squared:  0.4262, Adjusted R-squared:  0.3821 
## F-statistic: 9.656 on 2 and 26 DF,  p-value: 0.0007308

Interpretations - Is our campaign effective? Let’s show the significance of the independent variable.

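Since the p-value for Ads1 is .000171, which is less than .05, we reject the null hypothesis in favor of the alternative hypothesis. The p-value for Ads2 is .085, which is greater than .05, so the effect of the 2nd campaign is not statistically significant at the 5% level.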

The coefficient for Ads1 in the regression output is 69.71, which indicates that the 1st advertising campaign is more effective: consumers exposed to it purchased 69.71 more on average than the group who did not receive any advertising exposure.

Now the estimates for β0 and β1 are 49.00 and 69.71, respectively, leading to a prediction of average sales of 49.00 for the control group (group A) and a prediction of average sales of 49.00 + 69.71*1 = 118.71 for the treatment group, the consumers who were exposed to the 1st advertising campaign.
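The predicted group means can be checked directly from the coefficient table printed in the regression output. The sketch below simply copies the three estimates from that table; the variable names `b0`, `b1`, `b2`, and `predicted` are illustrative.

```r
# Coefficient estimates copied from the regression output above
b0 <- 49.00  # (Intercept): predicted mean Purchase for Ads = 0
b1 <- 69.71  # Ads1: difference between campaign 1 and the reference group
b2 <- 24.75  # Ads2: difference between campaign 2 and the reference group

# Predicted mean Purchase for each group
predicted <- c(Ads0 = b0, Ads1 = b0 + b1, Ads2 = b0 + b2)
print(predicted)
```

In practice, `predict()` on the fitted model with a `newdata` frame holding one row per level should give the same values without retyping the coefficients.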