A/B testing is an important part of data analysis for marketing. We start by loading the example dataset:
raw_AB_test_data <- read.csv('https://raw.githubusercontent.com/pthiagu2/DataMining/master/WA_Fn-UseC_-Marketing-Campaign-Eff-UseC_-FastF.csv', header = TRUE, na.strings = c("." , "Na", "", "..", "..."))
dim(raw_AB_test_data)
## [1] 548 7
knitr::kable(head(raw_AB_test_data))
MarketID | MarketSize | LocationID | AgeOfStore | Promotion | week | SalesInThousands |
---|---|---|---|---|---|---|
1 | Medium | 1 | 4 | 3 | 1 | 33.73 |
1 | Medium | 1 | 4 | 3 | 2 | 35.67 |
1 | Medium | 1 | 4 | 3 | 3 | 29.03 |
1 | Medium | 1 | 4 | 3 | 4 | 39.25 |
1 | Medium | 2 | 5 | 2 | 1 | 27.81 |
1 | Medium | 2 | 5 | 2 | 2 | 34.67 |
print("Here are the main summary statistics:")
## [1] "Here are the main summary statistics:"
raw_AB_test_data$MarketSize <- as.factor(raw_AB_test_data$MarketSize)
knitr::kable(summary(raw_AB_test_data))
MarketID | MarketSize | LocationID | AgeOfStore | Promotion | week | SalesInThousands |
---|---|---|---|---|---|---|
Min.: 1.000 | Large: 168 | Min.: 1.0 | Min.: 1.000 | Min.: 1.000 | Min.: 1.00 | Min.: 17.34 |
1st Qu.: 3.000 | Medium: 320 | 1st Qu.: 216.0 | 1st Qu.: 4.000 | 1st Qu.: 1.000 | 1st Qu.: 1.75 | 1st Qu.: 42.55 |
Median: 6.000 | Small: 60 | Median: 504.0 | Median: 7.000 | Median: 2.000 | Median: 2.50 | Median: 50.20 |
Mean: 5.715 | | Mean: 479.7 | Mean: 8.504 | Mean: 2.029 | Mean: 2.50 | Mean: 53.47 |
3rd Qu.: 8.000 | | 3rd Qu.: 708.0 | 3rd Qu.: 12.000 | 3rd Qu.: 3.000 | 3rd Qu.: 3.25 | 3rd Qu.: 60.48 |
Max.: 10.000 | | Max.: 920.0 | Max.: 28.000 | Max.: 3.000 | Max.: 4.00 | Max.: 99.65 |
Promotion | mean_age_store | St.Dev | Minimum | Maximum | Median | perc_75 | perc_25 |
---|---|---|---|---|---|---|---|
1 | 8.28 | 6.64 | 1 | 27 | 6 | 12 | 3 |
2 | 7.98 | 6.60 | 1 | 28 | 7 | 10 | 3 |
3 | 9.23 | 6.65 | 1 | 24 | 8 | 12 | 5 |
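The code behind the per-promotion store-age table is not shown. As a minimal sketch of how such a summary could be produced in base R, the block below uses a small made-up data frame (`toy`) in place of `raw_AB_test_data`. The helper `summarise_age` is a hypothetical name; only the column names are taken from the table above.

```r
# Toy stand-in for raw_AB_test_data (hypothetical values, not the real dataset)
toy <- data.frame(
  Promotion  = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  AgeOfStore = c(1, 5, 9, 2, 4, 12, 3, 8, 10)
)

# Hypothetical helper: summary statistics for one group of store ages
summarise_age <- function(ages) {
  c(
    mean_age_store = mean(ages),
    St.Dev         = sd(ages),
    Minimum        = min(ages),
    Maximum        = max(ages),
    Median         = median(ages),
    perc_75        = unname(quantile(ages, 0.75)),
    perc_25        = unname(quantile(ages, 0.25))
  )
}

# Split ages by promotion, summarise each group, and stack the results row-wise
age_matrix <- do.call(rbind, lapply(split(toy$AgeOfStore, toy$Promotion), summarise_age))
age_by_promo <- data.frame(
  Promotion = as.integer(rownames(age_matrix)),
  age_matrix,
  row.names = NULL
)
print(round(age_by_promo, 2))
```

Applied to the real data, the same pattern would take `raw_AB_test_data$AgeOfStore` and `raw_AB_test_data$Promotion` as inputs. Similar store ages across the three promotions are what one would hope to see if promotions were assigned without regard to store age.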
week | Promotion | Counter |
---|---|---|
1 | 1 | 43 |
1 | 2 | 47 |
1 | 3 | 47 |
2 | 1 | 43 |
2 | 2 | 47 |
2 | 3 | 47 |
3 | 1 | 43 |
3 | 2 | 47 |
3 | 3 | 47 |
4 | 1 | 43 |
4 | 2 | 47 |
4 | 3 | 47 |
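The counts are identical in every week (43, 47 and 47 observations for promotions 1, 2 and 3), which suggests that each store is observed once per week under a single promotion. The original code for this table is not shown. One possible way to build such a count table, sketched here on made-up data with base R's `table()`, is:

```r
# Toy stand-in for raw_AB_test_data (hypothetical values, not the real dataset)
toy <- data.frame(
  week      = c(1, 1, 1, 2, 2, 2, 2),
  Promotion = c(1, 1, 2, 1, 2, 2, 2)
)

# Cross-tabulate week by promotion, then reshape to long format
# with one row per (week, Promotion) pair and its count in "Counter"
counts <- as.data.frame(
  table(week = toy$week, Promotion = toy$Promotion),
  responseName = "Counter"
)
counts <- counts[order(counts$week, counts$Promotion), ]
print(counts)
```

Note that `as.data.frame()` on a table returns `week` and `Promotion` as factors, which is fine for display but may need converting back to numbers for further arithmetic.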
##
## Regression Results
## ====================================================
## Dependent variable:
## ---------------------------
## SalesInThousands
## ----------------------------------------------------
## factor(Promotion)2 -9.716***
## (0.557)
##
## factor(Promotion)3 -4.931***
## (0.567)
##
## AgeOfStore 0.013
## (0.035)
##
## factor(MarketSize)Medium -19.413***
## (0.924)
##
## factor(MarketSize)Small -0.021
## (1.040)
##
## factor(MarketID)2 6.427***
## (1.420)
##
## factor(MarketID)3 30.262***
## (0.811)
##
## factor(MarketID)4
##
##
## factor(MarketID)5 15.667***
## (0.989)
##
## factor(MarketID)6 1.627*
## (0.980)
##
## factor(MarketID)7 9.398***
## (0.987)
##
## factor(MarketID)8 12.607***
## (1.047)
##
## factor(MarketID)9 17.380***
## (1.096)
##
## factor(MarketID)10
##
##
## factor(week)2 -0.404
## (0.625)
##
## factor(week)3 -0.316
## (0.625)
##
## factor(week)4 -0.578
## (0.625)
##
## Constant 59.608***
## (0.805)
##
## ----------------------------------------------------
## Observations 548
## R2 0.907
## Adjusted R2 0.905
## Residual Std. Error 5.170 (df = 532)
## F Statistic 347.487*** (df = 15; 532)
## ====================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
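The call that produced this regression is not shown. Judging by the coefficient names, it is probably an `lm()` fit of `SalesInThousands` on the promotion, store age, market size, market and week, and the table layout resembles what the stargazer package prints with `type = "text"`. Both points are inferences, not something the source states. The sketch below fits the same formula to simulated data (all values hypothetical). It also illustrates a likely reason for the two blank `factor(MarketID)` rows: when market size is constant within a market, some market dummies are perfectly collinear with the size dummies, and `lm()` reports them as `NA`.

```r
set.seed(1)

# Simulated stand-in for raw_AB_test_data: 12 stores observed over 4 weeks
toy <- expand.grid(week = 1:4, LocationID = 1:12)
toy$MarketID   <- (toy$LocationID - 1) %/% 3 + 1   # 4 hypothetical markets, 3 stores each
toy$MarketSize <- c("Large", "Medium", "Medium", "Small")[toy$MarketID]  # size fixed per market
toy$Promotion  <- (toy$LocationID - 1) %% 3 + 1    # one promotion per store
toy$AgeOfStore <- rep(c(4, 5, 12, 1, 8, 3, 10, 15, 2, 7, 6, 9), each = 4)

# Simulated sales: promotions 2 and 3 sell less than promotion 1, plus noise
toy$SalesInThousands <- 50 -
  5 * (toy$Promotion == 2) -
  2 * (toy$Promotion == 3) +
  rnorm(nrow(toy))

# Same right-hand side as suggested by the coefficient names in the table
fit <- lm(
  SalesInThousands ~ factor(Promotion) + AgeOfStore + factor(MarketSize) +
    factor(MarketID) + factor(week),
  data = toy
)

# Aliased (perfectly collinear) terms show up as NA coefficients
print(coef(fit))
```

In the table above, Promotion 1 is the baseline, so the coefficients of about -9.7 and -4.9 read as the estimated sales difference (in thousands) of promotions 2 and 3 relative to Promotion 1, holding the other regressors fixed.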
##
## One Sample t-test
##
## data: raw_AB_test_data[which(raw_AB_test_data$Promotion == 1), ]$SalesInThousands
## t = 8.5323, df = 171, p-value = 7.435e-15
## alternative hypothesis: true mean is not equal to 47.32941
## 95 percent confidence interval:
## 55.60748 60.59054
## sample estimates:
## mean of x
## 58.09901
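The output above is what `t.test()` prints for a one-sample test: the sales under Promotion 1 are tested against a fixed hypothesised mean of 47.32941, which matches the mean sales under Promotion 2 used in the manual calculation. The call itself is not shown, so the following is only a sketch of what it presumably looks like, run on made-up numbers:

```r
# Toy stand-in for raw_AB_test_data (hypothetical values, not the real dataset)
toy <- data.frame(
  Promotion        = rep(1:2, each = 5),
  SalesInThousands = c(58, 61, 55, 60, 57, 47, 49, 45, 48, 46)
)

sales_promo_1 <- toy$SalesInThousands[toy$Promotion == 1]
sales_promo_2 <- toy$SalesInThousands[toy$Promotion == 2]

# One-sample t-test: is the mean of promotion 1 different from
# the (fixed) mean observed for promotion 2?
result <- t.test(sales_promo_1, mu = mean(sales_promo_2))
print(result)
```

The degrees of freedom are the sample size minus one, which is why the real output shows `df = 171` for the 172 Promotion 1 observations.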
\[t = \frac{\bar{X}-\mu}{S/\sqrt{n}}\] where:
\(\bar{X}\) is the mean of our sample (sales under Promotion 1);
\(\mu\) is the hypothesised population mean (in our case, the mean sales of the second promotion, against which we compare);
\(S\) is the sample standard deviation;
\(n\) is the number of observations in the sample, so \(\sqrt{n}\) is its square root.
This is the standard one-sample t statistic, as given, for example, on Wikipedia.
# Sales for the promotion under test (1) and the promotion it is compared against (2)
sales_promo_1 <- raw_AB_test_data$SalesInThousands[raw_AB_test_data$Promotion == 1]
sales_promo_2 <- raw_AB_test_data$SalesInThousands[raw_AB_test_data$Promotion == 2]
our_n_obs <- length(sales_promo_1)
our_mean_of_sample <- mean(sales_promo_1)
our_mean_of_population <- mean(sales_promo_2)  # treated as the fixed hypothesised mean
our_sd_of_sample <- sd(sales_promo_1)
our_t_test <- (our_mean_of_sample - our_mean_of_population) / (our_sd_of_sample / sqrt(our_n_obs))
# Two-sided p-value from the t distribution with n - 1 degrees of freedom
our_p_value <- 2 * pt(-abs(our_t_test), df = our_n_obs - 1, lower.tail = TRUE)
print(paste0("Our t-test value is: ", round(our_t_test, 3), "; the p-value of this test is: ", signif(our_p_value, 4)))
## [1] "Our t-test value is: 8.532; the p-value of this test is: 7.435e-15"
##
## One Sample t-test
##
## data: raw_AB_test_data[which(raw_AB_test_data$Promotion == 1), ]$SalesInThousands
## t = 2.1665, df = 171, p-value = 0.03166
## alternative hypothesis: true mean is not equal to 55.36447
## 95 percent confidence interval:
## 55.60748 60.59054
## sample estimates:
## mean of x
## 58.09901
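Both tests above are one-sample tests: they treat the comparison mean (47.32941 in the first test, 55.36447 here, presumably the Promotion 3 mean) as a known constant and ignore that it was itself estimated from a sample. A two-sample test accounts for the variability in both groups. As a hedged illustration only, and not the method used in this analysis, the sketch below runs Welch's two-sample t-test on made-up numbers; `sales_a` and `sales_b` are hypothetical vectors.

```r
# Hypothetical sales figures for two promotions (not the real dataset)
sales_a <- c(58, 61, 55, 60, 57, 59)
sales_b <- c(47, 49, 45, 48, 46, 50)

# t.test() with two samples defaults to var.equal = FALSE,
# i.e. Welch's test, which does not assume equal variances
welch <- t.test(sales_a, sales_b)
print(welch)
```

On the real data the two vectors would be the `SalesInThousands` values for the two promotions being compared. Because each store contributes four weekly observations, the observations within a group are not fully independent, so any such p-value should be read with some caution.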