A/B Testing Analysis for Bakersfield Retailer

Introduction

This analysis examines the effectiveness of different advertising campaigns for a local retailer in Bakersfield. The experiment divided 30 targeted consumers into three groups (the data file analyzed here contains 29 observations):

Control group (Ads = 0): No ad exposure
Treatment group 1 (Ads = 1): Version 1 of the ads
Treatment group 2 (Ads = 2): Version 2 of the ads

The goal is to determine which advertising campaign leads to more product sales.

Hypotheses

Null Hypothesis (H0): There is no significant difference in purchase amounts between the different advertising campaigns (control, version 1, and version 2).

Alternative Hypothesis (H1): At least one of the advertising campaigns leads to significantly different purchase amounts compared to the others.
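
Stated in symbols, the two hypotheses can be written as follows. The notation is an assumption introduced here for illustration: \mu_0, \mu_1, and \mu_2 denote the mean purchase amounts of the control, Version 1, and Version 2 groups.

```latex
H_0:\ \mu_0 = \mu_1 = \mu_2
\qquad\text{versus}\qquad
H_1:\ \mu_i \neq \mu_j \ \text{for at least one pair } i \neq j
```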

Data Loading and Preparation

# Read the data
data <- read.csv("ab_testing1.csv")

# Display the first few rows
head(data)

##   Ads Purchase
## 1   1      152
## 2   0       21
## 3   2       77
## 4   0       65
## 5   1      183
## 6   1       87

# Check data structure
str(data)

## 'data.frame':    29 obs. of  2 variables:
##  $ Ads     : int  1 0 2 0 1 1 2 2 2 0 ...
##  $ Purchase: int  152 21 77 65 183 87 121 104 116 82 ...

# Convert Ads to a factor since it's a categorical variable
data$Ads <- as.factor(data$Ads)

Exploratory Data Analysis

# Summary statistics by group - without using pipe operator
group_summary <- aggregate(Purchase ~ Ads, data = data, 
                          FUN = function(x) c(Count = length(x),
                                              Mean = mean(x),
                                              SD = sd(x),
                                              Min = min(x),
                                              Max = max(x)))
# Reshape the results
group_summary <- data.frame(
  Ads = group_summary$Ads,
  Count = group_summary$Purchase[,1],
  Mean_Purchase = group_summary$Purchase[,2],
  SD_Purchase = group_summary$Purchase[,3],
  Min_Purchase = group_summary$Purchase[,4],
  Max_Purchase = group_summary$Purchase[,5]
)

# Display summary statistics
print(group_summary)

##   Ads Count Mean_Purchase SD_Purchase Min_Purchase Max_Purchase
## 1   0    10       49.0000    27.23560           21           85
## 2   1     7      118.7143    40.34730           61          183
## 3   2    12       73.7500    31.11014           14          121

# Load ggplot2, which provides ggplot() and the geoms used below
library(ggplot2)

# Create a boxplot to visualize the distribution
ggplot(data, aes(x = Ads, y = Purchase, fill = Ads)) +
  geom_boxplot() +
  labs(title = "Purchase Amounts by Ad Version",
       x = "Ad Version",
       y = "Purchase Amount",
       fill = "Ad Version") +
  theme_minimal()

# Bar chart of group means; error bars show mean +/- 1 standard deviation
ggplot(group_summary, aes(x = Ads, y = Mean_Purchase, fill = Ads)) +
  geom_bar(stat = "identity") +
  geom_errorbar(aes(ymin = Mean_Purchase - SD_Purchase, 
                    ymax = Mean_Purchase + SD_Purchase), 
                width = 0.2) +
  labs(title = "Mean Purchase Amounts by Ad Version",
       x = "Ad Version",
       y = "Mean Purchase Amount",
       fill = "Ad Version") +
  theme_minimal()

Statistical Analysis

ANOVA Test

# Perform ANOVA to test for differences between groups
anova_result <- aov(Purchase ~ Ads, data = data)
summary(anova_result)

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## Ads          2  20122   10061   9.656 0.000731 ***
## Residuals   26  27090    1042                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The ANOVA results show a statistically significant difference between the groups (F(2, 26) = 9.656, p = 0.0007, well below the 0.05 threshold). We reject the null hypothesis and conclude that at least one group's mean purchase amount differs from the others. The ANOVA alone does not say which groups differ; that is the purpose of the post-hoc test.
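
As a cross-check, the F statistic can be rebuilt from the group counts, means, and standard deviations in the summary table, without the raw data. The sketch below is illustrative only; the variable names are not part of the original analysis, and small differences from the aov() output come from rounding in the printed summary.

```r
# Group summaries copied from the summary table in this report
n <- c(10, 7, 12)                      # group sizes (Ads = 0, 1, 2)
m <- c(49.0000, 118.7143, 73.7500)     # group means
s <- c(27.23560, 40.34730, 31.11014)   # group standard deviations

# Between-group and within-group sums of squares
grand_mean <- sum(n * m) / sum(n)
ss_between <- sum(n * (m - grand_mean)^2)
ss_within  <- sum((n - 1) * s^2)

# Degrees of freedom: k - 1 between groups, N - k within groups
df_between <- length(n) - 1
df_within  <- sum(n) - length(n)

# F statistic, its p-value, and eta-squared (share of variance explained)
f_stat  <- (ss_between / df_between) / (ss_within / df_within)
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)
eta_sq  <- ss_between / (ss_between + ss_within)

print(round(c(F = f_stat, p = p_value, eta_squared = eta_sq), 4))
```

Eta-squared here is the same quantity that a one-factor linear model reports as Multiple R-squared.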

Post-hoc Analysis

# If ANOVA is significant, perform post-hoc test
tukey_result <- TukeyHSD(anova_result)
print(tukey_result)

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Purchase ~ Ads, data = data)
## 
## $Ads
##          diff        lwr        upr     p adj
## 1-0  69.71429  30.186860 109.241711 0.0004878
## 2-0  24.75000  -9.593441  59.093441 0.1924155
## 2-1 -44.96429 -83.111273  -6.817298 0.0185265

# Visualize Tukey's test results
plot(tukey_result)
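
To show where the adjusted p-values come from, the sketch below applies the Tukey-Kramer formula (the unequal-group-size form of Tukey's test) to the group means and the residual mean square from the ANOVA table. The helper tukey_kramer() is a hypothetical name used only for this illustration; ptukey() and qtukey() are base R functions for the studentized range distribution.

```r
# Inputs copied from the summary table and the ANOVA table in this report
n <- c("0" = 10, "1" = 7, "2" = 12)                    # group sizes
m <- c("0" = 49.0000, "1" = 118.7143, "2" = 73.7500)   # group means
mse      <- 27090 / 26                                 # residual mean square
df_resid <- 26                                         # residual degrees of freedom

# Hypothetical helper: Tukey-Kramer comparison of group i against group j
tukey_kramer <- function(i, j) {
  diff <- m[[i]] - m[[j]]
  se   <- sqrt(mse / 2 * (1 / n[[i]] + 1 / n[[j]]))
  q    <- abs(diff) / se                               # studentized range statistic
  half <- qtukey(0.95, nmeans = length(m), df = df_resid) * se
  c(diff  = diff,
    lwr   = diff - half,
    upr   = diff + half,
    p_adj = ptukey(q, nmeans = length(m), df = df_resid, lower.tail = FALSE))
}

result <- rbind("1-0" = tukey_kramer("1", "0"),
                "2-0" = tukey_kramer("2", "0"),
                "2-1" = tukey_kramer("2", "1"))
print(round(result, 4))
```

The rows should agree with the TukeyHSD() output up to rounding of the inputs. An interval that contains zero (as for 2-0) corresponds to an adjusted p-value above 0.05.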

Regression Analysis

# Linear regression model
model <- lm(Purchase ~ Ads, data = data)
summary(model)

## 
## Call:
## lm(formula = Purchase ~ Ads, data = data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -59.75 -22.75  -3.75  30.25  64.29 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    49.00      10.21   4.800 5.69e-05 ***
## Ads1           69.71      15.91   4.383 0.000171 ***
## Ads2           24.75      13.82   1.791 0.084982 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 32.28 on 26 degrees of freedom
## Multiple R-squared:  0.4262, Adjusted R-squared:  0.3821 
## F-statistic: 9.656 on 2 and 26 DF,  p-value: 0.0007308
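
Because Ads is a factor with the control group as the reference level, the intercept is the control mean and each coefficient is a difference from that mean. The sketch below rebuilds the coefficient table from the group summaries and the residual standard error. It is illustrative only, and its names are not part of the original analysis. It also shows that these p-values are unadjusted, which is why the Version 2 p-value here (about 0.085) is smaller than the Tukey-adjusted one (about 0.192).

```r
# Inputs copied from the summary table and the regression output in this report
n        <- c(10, 7, 12)                    # group sizes (Ads = 0, 1, 2)
m        <- c(49.0000, 118.7143, 73.7500)   # group means
rse      <- 32.28                           # residual standard error
df_resid <- 26                              # residual degrees of freedom

# Treatment coding: intercept = control mean, slopes = differences from control
estimate  <- c(m[1], m[2] - m[1], m[3] - m[1])
std_error <- rse * c(sqrt(1 / n[1]),
                     sqrt(1 / n[1] + 1 / n[2]),
                     sqrt(1 / n[1] + 1 / n[3]))

# Two-sided, unadjusted t-tests for each coefficient
t_value <- estimate / std_error
p_value <- 2 * pt(abs(t_value), df = df_resid, lower.tail = FALSE)

coefs <- data.frame(Estimate = estimate, Std_Error = std_error,
                    t_value = t_value, p_value = p_value,
                    row.names = c("(Intercept)", "Ads1", "Ads2"))
print(signif(coefs, 4))
```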

# Diagnostic plots
par(mfrow = c(2, 2))
plot(model)
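
The diagnostic plots are read by eye. As a numeric complement for the equal-variance assumption, the sketch below computes Bartlett's test statistic by hand from the group standard deviations in the summary table. This check is not part of the original analysis, and Bartlett's test is sensitive to non-normality, so treat it as a rough indication only. With the raw data loaded, bartlett.test(Purchase ~ Ads, data = data) should give the same result up to rounding.

```r
# Group summaries copied from the summary table in this report
n <- c(10, 7, 12)                      # group sizes
s <- c(27.23560, 40.34730, 31.11014)   # group standard deviations

k <- length(n)   # number of groups
N <- sum(n)      # total sample size
v <- s^2         # group variances

# Pooled variance (equals the residual mean square from the ANOVA)
pooled_var <- sum((n - 1) * v) / (N - k)

# Bartlett's K-squared: log of pooled variance versus weighted log variances,
# divided by the usual small-sample correction factor
numerator  <- (N - k) * log(pooled_var) - sum((n - 1) * log(v))
correction <- 1 + (sum(1 / (n - 1)) - 1 / (N - k)) / (3 * (k - 1))
k2         <- numerator / correction

# Under equal variances, K-squared is approximately chi-squared with k - 1 df
p_value <- pchisq(k2, df = k - 1, lower.tail = FALSE)
print(round(c(K_squared = k2, df = k - 1, p = p_value), 4))
```

A large p-value here means the summary statistics give no evidence against the equal-variance assumption behind the ANOVA and Tukey results.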

Results Interpretation

Based on the analysis:

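The ANOVA test confirms that there are significant differences in purchase amounts between the three groups.
Tukey’s post-hoc test reveals:
- Version 1 ads resulted in significantly higher purchase amounts than the control group (difference of 69.71, adjusted p = 0.0005)
- Version 2 ads resulted in higher purchase amounts than the control group (difference of 24.75), but this difference is not statistically significant (adjusted p = 0.192; the 95% interval includes zero)
- The difference between Version 1 and Version 2 is significant (adjusted p = 0.019), with Version 1 performing better by 44.96
The regression analysis is consistent with these findings:
- The model is statistically significant (F = 9.656, p = 0.0007) and explains about 43% of the variance in purchase amounts (R-squared = 0.4262)
- Both ad versions have positive coefficients relative to the control group, with the Version 1 coefficient being larger (69.71 vs. 24.75). Only the Version 1 coefficient is significant at the 0.05 level (Version 2: p = 0.085).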

Conclusion

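The analysis shows that advertising can have a significant positive impact on purchase amounts, depending on the version. Version 1 of the ad campaign is the most effective, significantly outperforming both the control group (no ads) and Version 2. Version 2 produced higher average purchases than the control group, but with this sample size the difference is not statistically significant.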

Managerial Recommendations

Implement Version 1 Campaign: I recommend implementing Version 1 of the advertising campaign as it generated the highest purchase amounts.
Budget Allocation: Allocate the majority of the advertising budget to Version 1, but consider maintaining some investment in Version 2.
Further Analysis: Conduct additional testing to understand why Version 1 performed better, and use what made it more effective to improve future campaigns.
ROI Calculation: Calculate the ROI for both ad versions by comparing the incremental sales revenue against the advertising costs, to confirm that each campaign is profitable.
Segmentation Analysis: Analyze whether certain customer segments respond differently to the various ad versions.
Continuous Testing: Implement a system for continuous A/B testing of advertising campaigns to keep refining and improving the effectiveness of the retailer's marketing.
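
The ROI calculation recommended above can be sketched as follows. The incremental purchase amounts per customer come from the regression coefficients in this report. The per-customer advertising costs are hypothetical placeholders, since no cost data is part of this analysis. The sketch also treats purchase amounts as revenue and ignores product margins, which is a simplifying assumption.

```r
# ROI = (incremental revenue - advertising cost) / advertising cost
roi <- function(incremental_revenue, ad_cost) {
  stopifnot(all(ad_cost > 0))
  (incremental_revenue - ad_cost) / ad_cost
}

# Incremental purchase amount per customer versus control
# (regression coefficients Ads1 and Ads2 from this report)
lift <- c(version1 = 69.71, version2 = 24.75)

# HYPOTHETICAL per-customer ad costs; replace with the retailer's actual figures
cost_per_customer <- c(version1 = 20, version2 = 10)

roi_by_version <- roi(lift, cost_per_customer)
print(round(roi_by_version, 2))
```

Because the Version 2 lift is not statistically significant, any ROI computed for it carries considerable uncertainty. Repeating the calculation with the lower and upper confidence bounds of each lift would show the plausible range.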