1) Opening the File

A/B testing is an important part of marketing data analysis. We begin by loading the example dataset:

# E:\ACER Mar 2021\Documents\R\HOMEPROJECTS\DIGITAL MARKETING\AB Testing\AB_Testing_Markdown_Document.Rmd
raw_AB_test_data <- read.csv('https://raw.githubusercontent.com/pthiagu2/DataMining/master/WA_Fn-UseC_-Marketing-Campaign-Eff-UseC_-FastF.csv', header = TRUE, na.strings = c("." , "Na", "", "..", "..."))

dim(raw_AB_test_data)
## [1] 548   7
knitr::kable(head(raw_AB_test_data))
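| MarketID | MarketSize | LocationID | AgeOfStore | Promotion | week | SalesInThousands |
|---------:|:-----------|-----------:|-----------:|----------:|-----:|-----------------:|
|        1 | Medium     |          1 |          4 |         3 |    1 |            33.73 |
|        1 | Medium     |          1 |          4 |         3 |    2 |            35.67 |
|        1 | Medium     |          1 |          4 |         3 |    3 |            29.03 |
|        1 | Medium     |          1 |          4 |         3 |    4 |            39.25 |
|        1 | Medium     |          2 |          5 |         2 |    1 |            27.81 |
|        1 | Medium     |          2 |          5 |         2 |    2 |            34.67 |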
print("Here are the main summary statistics:")
## [1] "Here are the main summary statistics:"
raw_AB_test_data$MarketSize <- as.factor(raw_AB_test_data$MarketSize)
knitr::kable(summary(raw_AB_test_data))
| MarketID        | MarketSize | LocationID    | AgeOfStore      | Promotion      | week          | SalesInThousands |
|:----------------|:-----------|:--------------|:----------------|:---------------|:--------------|:-----------------|
| Min. : 1.000    | Large :168 | Min. : 1.0    | Min. : 1.000    | Min. :1.000    | Min. :1.00    | Min. :17.34      |
| 1st Qu.: 3.000  | Medium:320 | 1st Qu.:216.0 | 1st Qu.: 4.000  | 1st Qu.:1.000  | 1st Qu.:1.75  | 1st Qu.:42.55    |
| Median : 6.000  | Small : 60 | Median :504.0 | Median : 7.000  | Median :2.000  | Median :2.50  | Median :50.20    |
| Mean : 5.715    | NA         | Mean :479.7   | Mean : 8.504    | Mean :2.029    | Mean :2.50    | Mean :53.47      |
| 3rd Qu.: 8.000  | NA         | 3rd Qu.:708.0 | 3rd Qu.:12.000  | 3rd Qu.:3.000  | 3rd Qu.:3.25  | 3rd Qu.:60.48    |
| Max. :10.000    | NA         | Max. :920.0   | Max. :28.000    | Max. :3.000    | Max. :4.00    | Max. :99.65      |
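The `na.strings` argument in the import call decides which raw tokens are read as missing values. As a minimal sketch of that behaviour, the example below reads a small inline CSV and counts missing values per column. The `toy_csv` text and the object names are hypothetical stand-ins, not part of the original analysis.

```r
# Hypothetical inline CSV standing in for the real file;
# "." and the empty field are meant to be read as missing values.
toy_csv <- "MarketID,MarketSize,Promotion,SalesInThousands
1,Medium,3,33.73
1,,2,.
2,Small,1,27.81"

toy_data <- read.csv(text = toy_csv, header = TRUE,
                     na.strings = c(".", "Na", "", "..", "..."))

# Count missing values per column after the import.
missing_per_column <- colSums(is.na(toy_data))
print(missing_per_column)

# SalesInThousands stays numeric because "." was converted to NA
# instead of forcing the whole column to character.
str(toy_data)
```

The same `colSums(is.na(...))` check can be applied to `raw_AB_test_data`; the summary above shows no NA counts for the numeric columns, which suggests the real file has no missing values there.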

2) Descriptive charts of the data

Pie chart of total sales by promotion

Conclusions:

1) Promotion 3 accounts for the largest share of total sales
2) Each promotion group accounts for roughly one third of total sales during the promotion weeks
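The plotting code is not shown in this report, so the following is only a sketch of how the shares behind such a pie chart could be computed. The helper `sales_share_by_promotion()` and the `toy_sales` data frame are hypothetical; only the column names come from the dataset.

```r
# Hypothetical toy data with the same column names as the dataset.
toy_sales <- data.frame(
  Promotion        = c(1, 1, 2, 2, 3, 3),
  SalesInThousands = c(60, 50, 45, 45, 55, 45)
)

sales_share_by_promotion <- function(data) {
  # Total sales per promotion group.
  totals <- tapply(data$SalesInThousands, data$Promotion, sum)
  # Convert the totals to shares of overall sales.
  totals / sum(totals)
}

shares <- sales_share_by_promotion(toy_sales)
print(round(shares, 3))
```

Passing the result to base R's `pie()` (or `barplot()`) would draw the chart. Because the groups have different numbers of observations, the largest share of total sales does not have to belong to the promotion with the highest mean sales.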

Bar chart: three bars for each factor category

Bar chart: three bars for each factor category, sum of sales

Waterfall chart: one bar for each factor category, sum of sales

Conclusions:

1) The 3 promotion groups have similar compositions across the 3 market sizes

2) Medium-sized markets account for the largest share in each of the 3 promotion groups
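One way to check the composition claim numerically is a cross-tabulation of summed sales by promotion and market size, converted to row-wise shares. This is a sketch on hypothetical toy data (`toy_markets`), not the code behind the charts.

```r
# Hypothetical toy data; column names follow the dataset.
toy_markets <- data.frame(
  Promotion        = c(1, 1, 1, 2, 2, 3, 3),
  MarketSize       = c("Large", "Medium", "Medium", "Medium", "Small",
                       "Large", "Medium"),
  SalesInThousands = c(30, 50, 20, 60, 40, 45, 55)
)

# Rows: promotion group; columns: market size; cells: sum of sales.
sales_table <- xtabs(SalesInThousands ~ Promotion + MarketSize,
                     data = toy_markets)

# Row-wise proportions, so each promotion group sums to 1.
composition <- prop.table(sales_table, margin = 1)
print(round(composition, 2))
```

If the rows of `composition` look alike, the promotion groups are comparable in market mix, which is what makes differences in sales easier to attribute to the promotion itself.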

Bar plot + Line plot with age of store:

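| Promotion | mean_age_store | St.Dev | Minimum | Maximum | Median | perc_75 | perc_25 |
|----------:|---------------:|-------:|--------:|--------:|-------:|--------:|--------:|
|         1 |           8.28 |   6.64 |       1 |      27 |      6 |      12 |       3 |
|         2 |           7.98 |   6.60 |       1 |      28 |      7 |      10 |       3 |
|         3 |           9.23 |   6.65 |       1 |      24 |      8 |      12 |       5 |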

Conclusions:

1) The age distribution of the stores is similar across all 3 promotions

2) Stores are 8-9 years old on average

3) Most of the stores are younger than 10-12 years (the 75th percentile in each group)
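A table of store-age statistics per promotion, like the one above, can be built with base R alone. The sketch below assumes the same column names as the table; the helper `age_summary()` and the `toy_stores` data are hypothetical, and the quantiles use R's default method, which may or may not match the original computation.

```r
# Hypothetical toy data; column names follow the dataset.
toy_stores <- data.frame(
  Promotion  = c(1, 1, 1, 1, 2, 2, 2, 2),
  AgeOfStore = c(1, 4, 8, 27, 1, 5, 9, 28)
)

age_summary <- function(ages) {
  c(mean_age_store = mean(ages),
    St.Dev         = sd(ages),
    Minimum        = min(ages),
    Maximum        = max(ages),
    Median         = median(ages),
    perc_75        = unname(quantile(ages, 0.75)),
    perc_25        = unname(quantile(ages, 0.25)))
}

# Split ages by promotion, summarise each group, one row per promotion.
by_promotion <- t(sapply(split(toy_stores$AgeOfStore, toy_stores$Promotion),
                         age_summary))
print(round(by_promotion, 2))
```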

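| week | Promotion | Counter |
|-----:|----------:|--------:|
|    1 |         1 |      43 |
|    1 |         2 |      47 |
|    1 |         3 |      47 |
|    2 |         1 |      43 |
|    2 |         2 |      47 |
|    2 |         3 |      47 |
|    3 |         1 |      43 |
|    3 |         2 |      47 |
|    3 |         3 |      47 |
|    4 |         1 |      43 |
|    4 |         2 |      47 |
|    4 |         3 |      47 |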
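The counts above are identical in every week (43, 47 and 47), which is what one would expect if each store is observed once per week under a single promotion. A sketch of how such a counter table and a simple balance check could be produced follows; `toy_weeks` and the `balanced` flag are hypothetical.

```r
# Hypothetical toy data; column names follow the dataset.
toy_weeks <- data.frame(
  week      = c(1, 1, 1, 2, 2, 2, 2),
  Promotion = c(1, 2, 2, 1, 1, 2, 3)
)

# Cross-tabulate, then flatten to one row per (week, Promotion) pair.
counter <- as.data.frame(
  table(week = toy_weeks$week, Promotion = toy_weeks$Promotion),
  responseName = "Counter"
)
print(counter)

# The design is balanced over time if every promotion has the
# same number of observations in each week.
balanced <- all(tapply(counter$Counter, counter$Promotion,
                       function(n) length(unique(n)) == 1))
print(balanced)
```

On the toy data the check returns `FALSE` because the counts differ between weeks; on counts like those in the table above it would return `TRUE`.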

3) Statistical Testing

## 
## Regression Results
## ====================================================
##                              Dependent variable:    
##                          ---------------------------
##                               SalesInThousands      
## ----------------------------------------------------
## factor(Promotion)2                -9.716***         
##                                    (0.557)          
##                                                     
## factor(Promotion)3                -4.931***         
##                                    (0.567)          
##                                                     
## AgeOfStore                          0.013           
##                                    (0.035)          
##                                                     
## factor(MarketSize)Medium         -19.413***         
##                                    (0.924)          
##                                                     
## factor(MarketSize)Small            -0.021           
##                                    (1.040)          
##                                                     
## factor(MarketID)2                 6.427***          
##                                    (1.420)          
##                                                     
## factor(MarketID)3                 30.262***         
##                                    (0.811)          
##                                                     
## factor(MarketID)4                                   
##                                                     
##                                                     
## factor(MarketID)5                 15.667***         
##                                    (0.989)          
##                                                     
## factor(MarketID)6                  1.627*           
##                                    (0.980)          
##                                                     
## factor(MarketID)7                 9.398***          
##                                    (0.987)          
##                                                     
## factor(MarketID)8                 12.607***         
##                                    (1.047)          
##                                                     
## factor(MarketID)9                 17.380***         
##                                    (1.096)          
##                                                     
## factor(MarketID)10                                  
##                                                     
##                                                     
## factor(week)2                      -0.404           
##                                    (0.625)          
##                                                     
## factor(week)3                      -0.316           
##                                    (0.625)          
##                                                     
## factor(week)4                      -0.578           
##                                    (0.625)          
##                                                     
## Constant                          59.608***         
##                                    (0.805)          
##                                                     
## ----------------------------------------------------
## Observations                         548            
## R2                                  0.907           
## Adjusted R2                         0.905           
## Residual Std. Error           5.170 (df = 532)      
## F Statistic               347.487*** (df = 15; 532) 
## ====================================================
## Note:                    *p<0.1; **p<0.05; ***p<0.01
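The model code is not shown in the report. Judging only by the term names in the table, a formula of the following shape would be consistent with it; this is an assumption, and the toy data below are hypothetical. The sketch also illustrates a likely reason for the two blank rows (MarketID 4 and 10): when each market has a single market size, some MarketID dummies are perfectly collinear with the MarketSize dummies, and `lm()` reports their coefficients as `NA`.

```r
set.seed(1)  # reproducible toy data

# Hypothetical toy data; column names follow the dataset.
toy <- expand.grid(Promotion = 1:3, week = 1:4, MarketID = 1:4)
# Each market has exactly one size, so MarketID determines MarketSize.
toy$MarketSize <- c("Medium", "Small", "Large", "Medium")[toy$MarketID]
toy$AgeOfStore <- sample(1:25, nrow(toy), replace = TRUE)
toy$SalesInThousands <- 50 - 5 * (toy$Promotion == 2) +
  rnorm(nrow(toy), sd = 2)

fit <- lm(SalesInThousands ~ factor(Promotion) + AgeOfStore +
            factor(MarketSize) + factor(MarketID) + factor(week),
          data = toy)

# Coefficients that cannot be estimated separately are reported as NA.
print(coef(fit))
```

In the table above, 15 slopes are estimated, which agrees with the numerator degrees of freedom of the F statistic (df = 15; 532). The Promotion 2 and Promotion 3 coefficients are differences relative to Promotion 1, the baseline level.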

Promotion 1 vs. 2

## 
##  One Sample t-test
## 
## data:  raw_AB_test_data[which(raw_AB_test_data$Promotion == 1), ]$SalesInThousands
## t = 8.5323, df = 171, p-value = 7.435e-15
## alternative hypothesis: true mean is not equal to 47.32941
## 95 percent confidence interval:
##  55.60748 60.59054
## sample estimates:
## mean of x 
##  58.09901
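The call behind this output is not shown, but the `data:` line and the stated alternative hypothesis suggest a one-sample `t.test()` on Promotion 1 sales with `mu` set to the mean sales of Promotion 2. A minimal sketch under that assumption, on hypothetical simulated data:

```r
set.seed(42)  # reproducible toy data

# Hypothetical toy data; column names follow the dataset.
toy <- data.frame(
  Promotion        = rep(1:2, each = 30),
  SalesInThousands = c(rnorm(30, mean = 58, sd = 16),
                       rnorm(30, mean = 47, sd = 15))
)

sales_1 <- toy$SalesInThousands[toy$Promotion == 1]
sales_2 <- toy$SalesInThousands[toy$Promotion == 2]

# One-sample test: does the mean of group 1 differ from the observed
# mean of group 2? Note that mean(sales_2) is treated as a fixed,
# known constant here, not as an estimate with its own uncertainty.
result <- t.test(sales_1, mu = mean(sales_2))
print(result)
```

With the real data, `sales_1` and `sales_2` would be the `SalesInThousands` values for Promotion 1 and Promotion 2 in `raw_AB_test_data`.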

Hand calculation method

The Formula:

\[t = \frac{\bar{X}-\mu}{S/\sqrt{n}}\] Where:

\(\bar{X}\) is the mean of our sample;

\(\mu\) is the hypothesized population mean (in our case, the mean sales of the promotion we compare against);

\(S\) is the sample standard deviation;

\(n\) is the number of observations in the sample, so \(\sqrt{n}\) is its square root.

This is the standard one-sample t statistic; it can be found on Wikipedia, among other sources.

# Sales for the group under test (Promotion 1) and the comparison group (Promotion 2)
sales_promo_1 <- raw_AB_test_data$SalesInThousands[which(raw_AB_test_data$Promotion == 1)]
sales_promo_2 <- raw_AB_test_data$SalesInThousands[which(raw_AB_test_data$Promotion == 2)]

our_n_obs <- length(sales_promo_1)
our_mean_of_sample <- mean(sales_promo_1)
# The mean of Promotion 2 plays the role of the hypothesized population mean
our_mean_of_population <- mean(sales_promo_2)
our_sd_of_sample <- sd(sales_promo_1)
# t statistic: difference in means divided by the standard error of the sample mean
our_t_test <- (our_mean_of_sample - our_mean_of_population) / (our_sd_of_sample / sqrt(our_n_obs))

# Obtain the p-values:
our_p_value = 2*pt(-abs(our_t_test),df=our_n_obs-1, lower.tail=TRUE)
print(paste("Our t-test value is: ", round(our_t_test, 3), "; the p-value of this test is: ", round(our_p_value, 3), "(precisely:", our_p_value, ")", sep=""))
## [1] "Our t-test value is: 8.532; the p-value of this test is: 0(precisely:7.43522826520984e-15)"
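The hand calculation can be wrapped in a small function and checked against `t.test()`. The sketch below uses the first six sales values from the preview table in section 1 and an arbitrary, hypothetical comparison value `mu = 30`; the function name `one_sample_t()` is also hypothetical.

```r
one_sample_t <- function(x, mu) {
  n <- length(x)
  # t statistic: (sample mean - hypothesized mean) / standard error
  t_stat <- (mean(x) - mu) / (sd(x) / sqrt(n))
  # Two-sided p-value from the t distribution with n - 1 degrees of freedom
  p_value <- 2 * pt(-abs(t_stat), df = n - 1)
  c(t = t_stat, p = p_value)
}

# First six SalesInThousands values from the data preview.
x <- c(33.73, 35.67, 29.03, 39.25, 27.81, 34.67)

by_hand  <- one_sample_t(x, mu = 30)
built_in <- t.test(x, mu = 30)

print(by_hand)
# Both comparisons should print TRUE if the formula is implemented correctly.
print(all.equal(unname(by_hand["t"]), unname(built_in$statistic)))
print(all.equal(unname(by_hand["p"]), built_in$p.value))
```

Printing very small p-values with `round(p, 3)` shows 0; `signif()` or `format()` keeps the order of magnitude visible.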

Promotion 1 vs. 3

## 
##  One Sample t-test
## 
## data:  raw_AB_test_data[which(raw_AB_test_data$Promotion == 1), ]$SalesInThousands
## t = 2.1665, df = 171, p-value = 0.03166
## alternative hypothesis: true mean is not equal to 55.36447
## 95 percent confidence interval:
##  55.60748 60.59054
## sample estimates:
## mean of x 
##  58.09901
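The one-sample test treats the mean of the comparison group as a known constant. A common alternative, not used in this report, is Welch's two-sample t-test, which treats both means as estimates. The sketch below contrasts the two on hypothetical simulated data; since the p-value for Promotion 1 vs. 3 (0.03166) is fairly close to 0.05, the conclusion there may be sensitive to this choice.

```r
set.seed(7)  # reproducible toy data

sales_a <- rnorm(40, mean = 58, sd = 16)  # hypothetical group A
sales_b <- rnorm(45, mean = 55, sd = 17)  # hypothetical group B

# One-sample test: mean(sales_b) is treated as known.
one_sample <- t.test(sales_a, mu = mean(sales_b))

# Two-sample test: t.test() defaults to var.equal = FALSE,
# which is Welch's test; both group means are estimated.
two_sample <- t.test(sales_a, sales_b)

print(c(one_sample = one_sample$p.value,
        two_sample = two_sample$p.value))
```

The two-sample statistic is always smaller in absolute value, because its standard error also includes the sampling variability of the second group.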

Task: Draw your own conclusions from the material