1) Opening the File

A/B testing is an important element of data analysis for marketing. We start by opening the example dataset:

# E:\ACER Mar 2021\Documents\R\HOMEPROJECTS\DIGITAL MARKETING\AB Testing\AB_Testing_Markdown_Document.Rmd
raw_AB_test_data <- read.csv('https://raw.githubusercontent.com/pthiagu2/DataMining/master/WA_Fn-UseC_-Marketing-Campaign-Eff-UseC_-FastF.csv', header = TRUE, na.strings = c("." , "Na", "", "..", "..."))

dim(raw_AB_test_data)

## [1] 548   7

knitr::kable(head(raw_AB_test_data))

MarketID	MarketSize	LocationID	AgeOfStore	Promotion	week	SalesInThousands
1	Medium	1	4	3	1	33.73
1	Medium	1	4	3	2	35.67
1	Medium	1	4	3	3	29.03
1	Medium	1	4	3	4	39.25
1	Medium	2	5	2	1	27.81
1	Medium	2	5	2	2	34.67

print("Here are the main summary statistics:")

## [1] "Here are the main summary statistics:"

raw_AB_test_data$MarketSize = as.factor(raw_AB_test_data$MarketSize)
knitr::kable(summary(raw_AB_test_data))

MarketID	MarketSize	LocationID	AgeOfStore	Promotion	week	SalesInThousands
Min. : 1.000	Large :168	Min. : 1.0	Min. : 1.000	Min. :1.000	Min. :1.00	Min. :17.34
1st Qu.: 3.000	Medium:320	1st Qu.:216.0	1st Qu.: 4.000	1st Qu.:1.000	1st Qu.:1.75	1st Qu.:42.55
Median : 6.000	Small : 60	Median :504.0	Median : 7.000	Median :2.000	Median :2.50	Median :50.20
Mean : 5.715	NA	Mean :479.7	Mean : 8.504	Mean :2.029	Mean :2.50	Mean :53.47
3rd Qu.: 8.000	NA	3rd Qu.:708.0	3rd Qu.:12.000	3rd Qu.:3.000	3rd Qu.:3.25	3rd Qu.:60.48
Max. :10.000	NA	Max. :920.0	Max. :28.000	Max. :3.000	Max. :4.00	Max. :99.65

2) Descriptive Charts of the Data

Plotting the pie chart for PROMOTIONS

Conclusions

1) Promotion 3 accounts for the largest share of total sales

2) Each promotion group accounts for roughly one third of total sales during the promotion weeks
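
As a rough cross-check of these two conclusions, the shares can be recomputed from figures reported elsewhere in this document: the group sizes (43 or 47 stores per week over 4 weeks) and the mean sales per promotion shown in the t-test outputs of section 3. This is only a sketch: the variable names are illustrative, and it assumes the pie chart shows the sum of SalesInThousands per promotion.

```r
# Group sizes: stores per week (43, 47, 47) times 4 weeks.
n_obs <- c(P1 = 43, P2 = 47, P3 = 47) * 4

# Mean SalesInThousands per promotion, as reported in the t-test outputs.
mean_sales <- c(P1 = 58.09901, P2 = 47.32941, P3 = 55.36447)

# Sum of sales per promotion and each promotion's share of the total.
total_sales <- n_obs * mean_sales
sales_share <- total_sales / sum(total_sales)
print(round(100 * sales_share, 1))

# pie(total_sales) would draw the corresponding base-R pie chart.
```

Under these assumptions Promotion 3 comes out at about 35.5% of total sales, Promotion 1 at about 34.1% and Promotion 2 at about 30.4%, which is consistent with both conclusions.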

3 BARS for EACH FACTOR CATEGORY

3 BARS for EACH FACTOR CATEGORY SUM OF SALES

Waterfall = 1 bar for each factor category, SUM OF SALES

Conclusions:

1) The 3 promotion groups have similar compositions across the 3 market sizes

2) The medium-sized market accounts for the largest share in each of the 3 promotion groups
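
The overall composition can be checked from the MarketSize counts in the summary table of section 1. The sketch below uses only those three counts, so it shows the overall shares rather than the breakdown per promotion. The variable names are illustrative.

```r
# Row counts per market size, taken from the summary table in section 1.
market_counts <- c(Large = 168, Medium = 320, Small = 60)

# Share of each market size among all 548 observations.
market_share <- prop.table(market_counts)
print(round(100 * market_share, 1))
```

With the full dataset loaded, `prop.table(table(raw_AB_test_data$Promotion, raw_AB_test_data$MarketSize), margin = 1)` would give the same shares separately for each promotion group.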

Bar plot + Line plot with age of store:

Promotion	mean_age_store	St.Dev	Minimum	Maximum	Median	perc_75	perc_25
1	8.28	6.64	1	27	6	12	3
2	7.98	6.60	1	28	7	10	3
3	9.23	6.65	1	24	8	12	5

Conclusions:

1) The store age distribution is similar for all 3 promotions

2) Stores are 8-9 years old, on average

3) Most of the stores are younger than 10-12 years (the 75th percentile of store age in each group)
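
As a consistency check, the group means in the table above can be combined into an overall mean and compared with the AgeOfStore mean of 8.504 in the summary table of section 1. This is a minimal sketch: the group sizes are taken from the week-by-promotion counts (43 or 47 stores per week over 4 weeks), and the variable names are illustrative.

```r
# Observations per promotion: stores per week times 4 weeks.
n_obs <- c(P1 = 172, P2 = 188, P3 = 188)

# Mean store age per promotion, from the table above (rounded to 2 decimals).
mean_age <- c(P1 = 8.28, P2 = 7.98, P3 = 9.23)

# The size-weighted mean of the group means approximates the overall mean.
overall_mean_age <- weighted.mean(mean_age, w = n_obs)
print(round(overall_mean_age, 2))
```

The result should agree with the reported overall mean up to the rounding of the group means.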

week	Promotion	Counter
1	1	43
1	2	47
1	3	47
2	1	43
2	2	47
2	3	47
3	1	43
3	2	47
3	3	47
4	1	43
4	2	47
4	3	47
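
The counts above suggest a balanced design: every promotion has the same number of stores in each of the 4 weeks. The following sketch rebuilds the table from the printed values and checks this, along with the total number of rows. The object names are illustrative.

```r
# Rebuild the week-by-promotion counts printed above.
# expand.grid varies its first argument fastest, matching the row order.
counts <- expand.grid(Promotion = 1:3, week = 1:4)
counts$Counter <- rep(c(43, 47, 47), times = 4)

# Cross-tabulate: one row per week, one column per promotion.
by_week <- xtabs(Counter ~ week + Promotion, data = counts)
print(by_week)

# Balanced if each promotion has a single distinct count across weeks.
balanced <- all(apply(by_week, 2, function(col) length(unique(col)) == 1))
total_rows <- sum(counts$Counter)
print(balanced)
print(total_rows)
```

The total matches the 548 rows reported by `dim(raw_AB_test_data)` in section 1.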

3) Statistical Testing

## 
## Regression Results
## ====================================================
##                              Dependent variable:    
##                          ---------------------------
##                               SalesInThousands      
## ----------------------------------------------------
## factor(Promotion)2                -9.716***         
##                                    (0.557)          
##                                                     
## factor(Promotion)3                -4.931***         
##                                    (0.567)          
##                                                     
## AgeOfStore                          0.013           
##                                    (0.035)          
##                                                     
## factor(MarketSize)Medium         -19.413***         
##                                    (0.924)          
##                                                     
## factor(MarketSize)Small            -0.021           
##                                    (1.040)          
##                                                     
## factor(MarketID)2                 6.427***          
##                                    (1.420)          
##                                                     
## factor(MarketID)3                 30.262***         
##                                    (0.811)          
##                                                     
## factor(MarketID)4                                   
##                                                     
##                                                     
## factor(MarketID)5                 15.667***         
##                                    (0.989)          
##                                                     
## factor(MarketID)6                  1.627*           
##                                    (0.980)          
##                                                     
## factor(MarketID)7                 9.398***          
##                                    (0.987)          
##                                                     
## factor(MarketID)8                 12.607***         
##                                    (1.047)          
##                                                     
## factor(MarketID)9                 17.380***         
##                                    (1.096)          
##                                                     
## factor(MarketID)10                                  
##                                                     
##                                                     
## factor(week)2                      -0.404           
##                                    (0.625)          
##                                                     
## factor(week)3                      -0.316           
##                                    (0.625)          
##                                                     
## factor(week)4                      -0.578           
##                                    (0.625)          
##                                                     
## Constant                          59.608***         
##                                    (0.805)          
##                                                     
## ----------------------------------------------------
## Observations                         548            
## R2                                  0.907           
## Adjusted R2                         0.905           
## Residual Std. Error           5.170 (df = 532)      
## F Statistic               347.487*** (df = 15; 532) 
## ====================================================
## Note:                    *p<0.1; **p<0.05; ***p<0.01
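
The code behind this table is not shown. Judging by the coefficient names, it appears to be a linear model of SalesInThousands on factor-coded Promotion, MarketSize, MarketID and week, plus AgeOfStore. The sketch below illustrates how such factor coefficients are read. It uses simulated data with assumed effects, not the real dataset, and a reduced set of regressors.

```r
set.seed(42)  # simulated data, not the real dataset

n_per_group <- 200
sim <- data.frame(
  Promotion = rep(1:3, each = n_per_group),
  week = rep(1:4, times = 3 * n_per_group / 4)
)

# Assumed true effects relative to Promotion 1: -10 and -5.
true_effect <- c(0, -10, -5)
sim$SalesInThousands <- 58 + true_effect[sim$Promotion] +
  rnorm(nrow(sim), sd = 2)

# factor() makes Promotion 1 and week 1 the baseline levels, so each
# coefficient is a difference from that baseline.
fit <- lm(SalesInThousands ~ factor(Promotion) + factor(week), data = sim)
print(round(coef(summary(fit)), 3))
```

The reported table reads the same way. Holding the other regressors fixed, Promotion 2 is associated with sales about 9.7 thousand lower than the baseline Promotion 1, and Promotion 3 with sales about 4.9 thousand lower.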

Promotion 1 vs. 2

## 
##  One Sample t-test
## 
## data:  raw_AB_test_data[which(raw_AB_test_data$Promotion == 1), ]$SalesInThousands
## t = 8.5323, df = 171, p-value = 7.435e-15
## alternative hypothesis: true mean is not equal to 47.32941
## 95 percent confidence interval:
##  55.60748 60.59054
## sample estimates:
## mean of x 
##  58.09901

Hand calculation method

The Formula:

\[t = \frac{\bar{X}-\mu}{S/\sqrt{n}}\]

Where:

\(\bar{X}\) is the mean of our sample;

\(\mu\) is the hypothesized population mean (in our case, the mean of the second promotion, which we compare against);

\(S\) is the sample standard deviation;

\(\sqrt{n}\) is the square root of the number of observations in the sample.

This is the standard one-sample t statistic; see, for example, Wikipedia.

# Sales for the sample (Promotion 1) and for the comparison group (Promotion 2):
promo1_sales <- raw_AB_test_data$SalesInThousands[raw_AB_test_data$Promotion == 1]
promo2_sales <- raw_AB_test_data$SalesInThousands[raw_AB_test_data$Promotion == 2]

our_n_obs <- length(promo1_sales)
our_mean_of_sample <- mean(promo1_sales)
our_mean_of_population <- mean(promo2_sales)
our_sd_of_sample <- sd(promo1_sales)
our_t_test <- (our_mean_of_sample - our_mean_of_population) / (our_sd_of_sample / sqrt(our_n_obs))

# Obtain the two-sided p-value from the t distribution with n - 1 degrees of freedom:
our_p_value <- 2 * pt(-abs(our_t_test), df = our_n_obs - 1, lower.tail = TRUE)
print(paste("Our t-test value is: ", round(our_t_test, 3), "; the p-value of this test is: ", round(our_p_value, 3), " (precisely: ", our_p_value, ")", sep = ""))

## [1] "Our t-test value is: 8.532; the p-value of this test is: 0 (precisely: 7.43522826520984e-15)"
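
The hand calculation can be compared directly with R's built-in `t.test()`. The sketch below does this on simulated sales vectors, so it runs without downloading the dataset. The means and standard deviations passed to `rnorm()` are assumptions chosen only to resemble the real groups.

```r
set.seed(1)  # simulated sales, not the real dataset

promo1_sales <- rnorm(172, mean = 58, sd = 16)
promo2_sales <- rnorm(188, mean = 47, sd = 15)

# One-sample test of Promotion 1 against the Promotion 2 mean.
mu0 <- mean(promo2_sales)
builtin <- t.test(promo1_sales, mu = mu0)

# The same statistic and two-sided p-value from the formula above.
n <- length(promo1_sales)
t_by_hand <- (mean(promo1_sales) - mu0) / (sd(promo1_sales) / sqrt(n))
p_by_hand <- 2 * pt(-abs(t_by_hand), df = n - 1)

print(c(builtin = unname(builtin$statistic), by_hand = t_by_hand))
print(all.equal(builtin$p.value, p_by_hand))
```

Note that this setup treats the Promotion 2 mean as a fixed value. Calling `t.test(promo1_sales, promo2_sales)` instead would run a two-sample (Welch) test, which also accounts for the variability of the second group.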

Promotion 1 vs. 3

## 
##  One Sample t-test
## 
## data:  raw_AB_test_data[which(raw_AB_test_data$Promotion == 1), ]$SalesInThousands
## t = 2.1665, df = 171, p-value = 0.03166
## alternative hypothesis: true mean is not equal to 55.36447
## 95 percent confidence interval:
##  55.60748 60.59054
## sample estimates:
## mean of x 
##  58.09901
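
The same hand formula can be applied to Promotion 1 vs. 3 using only the numbers printed in the output above. In this sketch the standard error \(S/\sqrt{n}\) is recovered from the 95 percent confidence interval, whose half-width equals the critical t value times the standard error. The variable names are illustrative.

```r
x_bar <- 58.09901               # mean of Promotion 1 sales
mu_promo3 <- 55.36447           # mean of Promotion 3 sales, used as mu
ci_95 <- c(55.60748, 60.59054)  # 95 percent confidence interval for x_bar
deg_free <- 171                 # n - 1 with n = 172

# Half-width of the interval = qt(0.975, df) * standard error.
std_error <- diff(ci_95) / (2 * qt(0.975, deg_free))

# t statistic and two-sided p-value from the hand formula.
t_stat <- (x_bar - mu_promo3) / std_error
p_value <- 2 * pt(-abs(t_stat), df = deg_free)
print(round(c(t = t_stat, p = p_value), 4))
```

The result should land close to the reported t = 2.1665 and p-value = 0.03166, up to the rounding of the printed inputs. The difference between Promotions 1 and 3 is therefore significant at the 5% level but not at the 1% level, which is much weaker evidence than for Promotion 1 vs. 2.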

Marketing: A/B Testing

Александр Шеметев

17 June 2021

Task: Draw your own conclusion on the material