---
title: "Mid-term Exam - Part 2 – Q2"
author: "Davian Rosales"
date: "11/07/2023"
output: html_document
editor_options:
  chunk_output_type: console
---
Null Hypothesis: H0: β1 = 0
Alternative Hypothesis: H1: β1 ≠ 0
Here β1 is the coefficient on the dummy variable Ads1 in the regression of Purchase on Ads.
First, we create a new (dummy) variable for each level of the categorical variable: it has a value of one for each observation at that level and zero for all others. In our example using the variable Ads, the first new variable (Ads1) has a value of one for each observation in which the consumers are exposed to the 1st ads campaign and zero for all other observations. Likewise, Ads2 is 1 when the consumers are exposed to the 2nd ads campaign, and 0 otherwise. A further level would be coded the same way (Ads3 would be 1 when the consumers are exposed to a 3rd ads campaign, and 0 otherwise), although the data used here contain only the levels 0, 1, and 2. The level of the categorical variable that is coded as zero in all of the new variables is the reference level, the level to which all of the other levels are compared. In our example, the reference level is Ads0 (no advertising exposure). Our objective is to see which ads campaign leads to more product sales.
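The dummy coding described above can be sketched by hand. This is only an illustration: the data frame `toy` is a hypothetical name, and it holds just the six rows that `head(data)` prints in this document, not the full 29-row dataset.

```r
# Toy data: the six rows shown by head(data); 'toy' is an illustrative name.
toy <- data.frame(Ads      = c(1, 0, 2, 0, 1, 1),
                  Purchase = c(152, 21, 77, 65, 183, 87))

# Manual dummy coding. Ads = 0 is the reference level, so it gets no column.
toy$Ads1 <- as.integer(toy$Ads == 1)
toy$Ads2 <- as.integer(toy$Ads == 2)
print(toy)

# model.matrix() builds the same 0/1 columns from a factor; this is what
# lm() does internally when it is given a factor predictor.
mm <- model.matrix(~ factor(Ads), data = toy)
print(mm)
```

Rows with Ads = 0 have zeros in both dummy columns, which is what makes that group the baseline for comparison.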
#setwd("C:/Users/zxu3/Documents/R/abtesting")
#Please install the following package if the package "readr" is not installed.
#install.packages("readr")
library(readr)
data <- read_csv("ab_testing1.csv")
## Rows: 29 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (2): Ads, Purchase
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
ls(data) # list the variables in the dataset
## [1] "Ads" "Purchase"
head(data) #list the first 6 rows of the dataset
## # A tibble: 6 × 2
## Ads Purchase
## <dbl> <dbl>
## 1 1 152
## 2 0 21
## 3 2 77
## 4 0 65
## 5 1 183
## 6 1 87
# creating the factor variable
data$Ads <- factor(data$Ads)
is.factor(data$Ads)
## [1] TRUE
# showing the first 15 rows of the variable "Ads"
data$Ads[1:15]
## [1] 1 0 2 0 1 1 2 2 2 0 2 2 0 2 2
## Levels: 0 1 2
#now we do the regression analysis and examine the results
summary(lm(Purchase~Ads, data = data))
##
## Call:
## lm(formula = Purchase ~ Ads, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -59.75 -22.75 -3.75 30.25 64.29
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 49.00 10.21 4.800 5.69e-05 ***
## Ads1 69.71 15.91 4.383 0.000171 ***
## Ads2 24.75 13.82 1.791 0.084982 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 32.28 on 26 degrees of freedom
## Multiple R-squared: 0.4262, Adjusted R-squared: 0.3821
## F-statistic: 9.656 on 2 and 26 DF, p-value: 0.0007308
#Alternatively, you can also use the factor function within the lm function, saving the step of creating the factor variable first.
summary(lm(Purchase~ factor(Ads), data))
##
## Call:
## lm(formula = Purchase ~ factor(Ads), data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -59.75 -22.75 -3.75 30.25 64.29
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 49.00 10.21 4.800 5.69e-05 ***
## factor(Ads)1 69.71 15.91 4.383 0.000171 ***
## factor(Ads)2 24.75 13.82 1.791 0.084982 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 32.28 on 26 degrees of freedom
## Multiple R-squared: 0.4262, Adjusted R-squared: 0.3821
## F-statistic: 9.656 on 2 and 26 DF, p-value: 0.0007308
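Since the p-value for Ads1 is 0.000171, which is less than .05, we reject the null hypothesis in favor of the alternative hypothesis. The p-value for Ads2 is 0.085, which is greater than .05, so the difference between the 2nd campaign and the reference group is not statistically significant at the 5% level.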
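As a check on the test decision, the t statistics and two-sided p-values can be recomputed from the estimates and standard errors printed in the coefficient table. The sketch below assumes only those rounded printed values, so the results match the output up to rounding; the variable names are illustrative.

```r
# Values copied from the Ads1 and Ads2 rows of the printed coefficient table.
est <- c(Ads1 = 69.71, Ads2 = 24.75)
se  <- c(Ads1 = 15.91, Ads2 = 13.82)
df_resid <- 26  # residual degrees of freedom reported by summary()

# t statistic = estimate / standard error; two-sided p-value from the
# t distribution with the residual degrees of freedom.
t_stat  <- est / se
p_value <- 2 * pt(-abs(t_stat), df = df_resid)

# Decision at the 5% significance level.
reject_h0 <- p_value < 0.05

print(round(t_stat, 3))
print(signif(p_value, 3))
print(reject_h0)
```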
The coefficient for Ads1 in the regression output is 69.71, which indicates that the 1st advertising campaign is more effective: average purchases are estimated to be 69.71 higher than in the reference group, which did not receive any advertising exposure.
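With a single factor predictor, the intercept equals the mean of the reference group and each dummy coefficient equals the difference between that group's mean and the reference mean. A small sketch of this, assuming a hypothetical data frame `toy` built from only the six rows printed by `head(data)` (so the numbers differ from the full-data output):

```r
# Toy data: the six rows shown by head(data); 'toy' is an illustrative name.
toy <- data.frame(Ads      = factor(c(1, 0, 2, 0, 1, 1)),
                  Purchase = c(152, 21, 77, 65, 183, 87))

fit <- lm(Purchase ~ Ads, data = toy)

# Mean purchase within each level of Ads.
group_means <- tapply(toy$Purchase, toy$Ads, mean)

# The intercept matches the mean of level 0; the Ads1 and Ads2 coefficients
# match the differences of the other group means from that baseline.
print(coef(fit))
print(group_means - group_means["0"])
```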
The estimates for β0 and β1 are 49.00 and 69.71, respectively, leading to a prediction of average sales of 49.00 for the control group (Ads0) and a prediction of average sales of 49.00 + 69.71*1 = 118.71 for the group of consumers who were exposed to the 1st advertising campaign. By the same reasoning, the predicted average sales for the group exposed to the 2nd advertising campaign is 49.00 + 24.75*1 = 73.75.
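The predicted group averages follow directly from the coefficients in the printed regression output (intercept 49.00, Ads1 69.71, Ads2 24.75). A minimal sketch of that arithmetic, with illustrative variable names:

```r
# Coefficients copied from the printed regression output.
b0     <- 49.00  # intercept: predicted average for the reference group (Ads0)
b_ads1 <- 69.71  # shift for the 1st ads campaign
b_ads2 <- 24.75  # shift for the 2nd ads campaign

# Each prediction is the intercept plus the coefficient of the dummy
# variable that equals 1 for that group.
predicted <- c(Ads0 = b0,
               Ads1 = b0 + b_ads1,
               Ads2 = b0 + b_ads2)
print(predicted)
```

On these estimates the 1st campaign has the highest predicted average, followed by the 2nd campaign and then the group with no advertising exposure.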