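This analysis examines the effectiveness of different advertising campaigns for a local retailer in Bakersfield. The experiment divided 30 targeted consumers into three groups (the dataset analyzed below contains 29 observations):
Control (Ads = 0): no advertising
Version 1 (Ads = 1): the first ad campaign
Version 2 (Ads = 2): the second ad campaign
The goal is to determine which advertising campaign leads to the highest purchase amounts.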
Null Hypothesis (H0): There is no significant difference in purchase amounts between the different advertising campaigns (control, version 1, and version 2).
Alternative Hypothesis (H1): At least one of the advertising campaigns leads to significantly different purchase amounts compared to the others.
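In symbols, writing $\mu_0$, $\mu_1$ and $\mu_2$ for the population mean purchase amount under the control, Version 1 and Version 2 conditions (notation introduced here for illustration), the hypotheses tested by the one-way ANOVA below are:

$$
H_0:\ \mu_0 = \mu_1 = \mu_2
\qquad\text{vs.}\qquad
H_1:\ \mu_i \neq \mu_j \ \text{for at least one pair } i \neq j
$$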
# Load ggplot2, which the plots below require
library(ggplot2)
# Read the data
data <- read.csv("ab_testing1.csv")
# Display the first few rows
head(data)
## Ads Purchase
## 1 1 152
## 2 0 21
## 3 2 77
## 4 0 65
## 5 1 183
## 6 1 87
# Check data structure
str(data)
## 'data.frame': 29 obs. of 2 variables:
## $ Ads : int 1 0 2 0 1 1 2 2 2 0 ...
## $ Purchase: int 152 21 77 65 183 87 121 104 116 82 ...
# Convert Ads to a factor since it's a categorical variable
data$Ads <- as.factor(data$Ads)
# Summary statistics by group - without using pipe operator
group_summary <- aggregate(Purchase ~ Ads, data = data,
FUN = function(x) c(Count = length(x),
Mean = mean(x),
SD = sd(x),
Min = min(x),
Max = max(x)))
# Reshape the results
group_summary <- data.frame(
Ads = group_summary$Ads,
Count = group_summary$Purchase[,1],
Mean_Purchase = group_summary$Purchase[,2],
SD_Purchase = group_summary$Purchase[,3],
Min_Purchase = group_summary$Purchase[,4],
Max_Purchase = group_summary$Purchase[,5]
)
# Display summary statistics
print(group_summary)
## Ads Count Mean_Purchase SD_Purchase Min_Purchase Max_Purchase
## 1 0 10 49.0000 27.23560 21 85
## 2 1 7 118.7143 40.34730 61 183
## 3 2 12 73.7500 31.11014 14 121
# Create a boxplot to visualize the distribution
ggplot(data, aes(x = Ads, y = Purchase, fill = Ads)) +
geom_boxplot() +
labs(title = "Purchase Amounts by Ad Version",
x = "Ad Version",
y = "Purchase Amount",
fill = "Ad Version") +
theme_minimal()
# Bar chart of group means; error bars show +/- 1 standard deviation
ggplot(group_summary, aes(x = Ads, y = Mean_Purchase, fill = Ads)) +
geom_bar(stat = "identity") +
geom_errorbar(aes(ymin = Mean_Purchase - SD_Purchase,
ymax = Mean_Purchase + SD_Purchase),
width = 0.2) +
labs(title = "Mean Purchase Amounts by Ad Version",
x = "Ad Version",
y = "Mean Purchase Amount",
fill = "Ad Version") +
theme_minimal()
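The error bars in the bar chart above span one standard deviation, which describes the spread of individual purchases rather than how precisely each group mean is estimated. A 95% confidence interval for each mean conveys the latter. The sketch below copies the printed `group_summary` values so it runs on its own; the columns `SE`, `CI_Lower` and `CI_Upper` are names introduced here for illustration, and with the real data the same arithmetic can be applied to `group_summary` directly.

```r
# Sketch: 95% confidence intervals for each group mean, as an alternative to
# +/- 1 SD error bars. Values are copied from the printed group_summary.
ci_summary <- data.frame(
  Ads           = factor(c(0, 1, 2)),
  Count         = c(10, 7, 12),
  Mean_Purchase = c(49.0000, 118.7143, 73.7500),
  SD_Purchase   = c(27.23560, 40.34730, 31.11014)
)

# Standard error of each group mean
ci_summary$SE <- ci_summary$SD_Purchase / sqrt(ci_summary$Count)

# Two-sided 95% t quantile, one per group (df = n - 1)
t_crit <- qt(0.975, df = ci_summary$Count - 1)

ci_summary$CI_Lower <- ci_summary$Mean_Purchase - t_crit * ci_summary$SE
ci_summary$CI_Upper <- ci_summary$Mean_Purchase + t_crit * ci_summary$SE
print(ci_summary)

# With ggplot2 loaded, these bounds can replace the SD-based error bars:
# geom_errorbar(aes(ymin = CI_Lower, ymax = CI_Upper), width = 0.2)
```

For the control group this gives roughly 29.5 to 68.5, noticeably narrower than the 49 +/- 27 range drawn by the SD bars.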
# Perform ANOVA to test for differences between groups
anova_result <- aov(Purchase ~ Ads, data = data)
summary(anova_result)
## Df Sum Sq Mean Sq F value Pr(>F)
## Ads 2 20122 10061 9.656 0.000731 ***
## Residuals 26 27090 1042
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
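The ANOVA results show a statistically significant difference between the groups (F(2, 26) = 9.66, p = 0.00073, well below the 0.05 significance level). We can therefore reject the null hypothesis and conclude that at least one advertising campaign leads to a significantly different mean purchase amount. The ANOVA alone does not identify which groups differ, which is why a post-hoc test follows.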
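To see where the F value comes from, the ANOVA table can be rebuilt from the group counts, means and standard deviations printed earlier: the between-group sum of squares compares each group mean with the overall mean, the within-group sum of squares pools the group variances, and F is the ratio of their mean squares. This is a sketch using the rounded printed values, so the results agree with `aov()` only up to rounding.

```r
# Group summaries copied from the printed output (rounded values)
n <- c(10, 7, 12)                      # group sizes: control, Version 1, Version 2
m <- c(49.0000, 118.7143, 73.7500)     # group means
s <- c(27.23560, 40.34730, 31.11014)   # group standard deviations

grand_mean <- sum(n * m) / sum(n)           # mean of all 29 purchases
ss_between <- sum(n * (m - grand_mean)^2)   # between-group sum of squares
ss_within  <- sum((n - 1) * s^2)            # pooled within-group sum of squares
df_between <- length(n) - 1                 # k - 1 = 2
df_within  <- sum(n) - length(n)            # N - k = 26

f_stat  <- (ss_between / df_between) / (ss_within / df_within)
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

round(c(SS_between = ss_between, SS_within = ss_within,
        F = f_stat, p = p_value), 5)
```

The reconstructed sums of squares (about 20122 and 27090) and F of about 9.66 match the `aov()` table above.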
# ANOVA is significant, so run Tukey's HSD post-hoc test for pairwise comparisons
tukey_result <- TukeyHSD(anova_result)
print(tukey_result)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Purchase ~ Ads, data = data)
##
## $Ads
## diff lwr upr p adj
## 1-0 69.71429 30.186860 109.241711 0.0004878
## 2-0 24.75000 -9.593441 59.093441 0.1924155
## 2-1 -44.96429 -83.111273 -6.817298 0.0185265
# Visualize Tukey's test results
plot(tukey_result)
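Because the groups have unequal sizes (10, 7 and 12), each Tukey comparison uses the Tukey-Kramer standard error, which combines the residual mean square from the ANOVA with both group sizes. The sketch below recomputes the `TukeyHSD` table from the printed group means and the residual mean square (27090 / 26); the group labels `ctrl`, `v1`, `v2` and the helper `tukey_pair()` are hypothetical names introduced here.

```r
# Printed group summaries and the ANOVA residual mean square (rounded)
n   <- c(ctrl = 10, v1 = 7, v2 = 12)
m   <- c(ctrl = 49.0000, v1 = 118.7143, v2 = 73.7500)
mse <- 27090 / 26      # Residuals Sum Sq / Df from the ANOVA table
df_resid <- 26
k   <- length(n)       # number of groups compared

# Hypothetical helper: one Tukey-Kramer comparison of group i minus group j
tukey_pair <- function(i, j) {
  d  <- m[[i]] - m[[j]]
  se <- sqrt(mse / 2 * (1 / n[[i]] + 1 / n[[j]]))   # Tukey-Kramer standard error
  q  <- abs(d) / se                                  # studentized range statistic
  p  <- ptukey(q, nmeans = k, df = df_resid, lower.tail = FALSE)
  half_width <- qtukey(0.95, nmeans = k, df = df_resid) * se
  c(diff = d, lwr = d - half_width, upr = d + half_width, p_adj = p)
}

tukey_manual <- rbind(`v1-ctrl` = tukey_pair("v1", "ctrl"),
                      `v2-ctrl` = tukey_pair("v2", "ctrl"),
                      `v2-v1`   = tukey_pair("v2", "v1"))
print(round(tukey_manual, 4))
```

Up to rounding, the rows reproduce the `1-0`, `2-0` and `2-1` lines of the `TukeyHSD` output, including the wide interval for Version 2 against control that crosses zero.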
# Linear regression model
model <- lm(Purchase ~ Ads, data = data)
summary(model)
##
## Call:
## lm(formula = Purchase ~ Ads, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -59.75 -22.75 -3.75 30.25 64.29
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 49.00 10.21 4.800 5.69e-05 ***
## Ads1 69.71 15.91 4.383 0.000171 ***
## Ads2 24.75 13.82 1.791 0.084982 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 32.28 on 26 degrees of freedom
## Multiple R-squared: 0.4262, Adjusted R-squared: 0.3821
## F-statistic: 9.656 on 2 and 26 DF, p-value: 0.0007308
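With R's default treatment coding for factors, the control group (Ads = 0) is the reference level, so each coefficient is a difference in group means. Writing $\bar y_g$ for the mean purchase in group $g$ (notation introduced here), the fitted model and its coefficients can be checked against the printed group summaries:

$$
\widehat{\text{Purchase}} = \hat\beta_0 + \hat\beta_1\,\mathbf{1}[\text{Ads}=1] + \hat\beta_2\,\mathbf{1}[\text{Ads}=2]
$$

$$
\hat\beta_0 = \bar y_0 = 49.00,\qquad
\hat\beta_1 = \bar y_1 - \bar y_0 = 118.71 - 49.00 = 69.71,\qquad
\hat\beta_2 = \bar y_2 - \bar y_0 = 73.75 - 49.00 = 24.75
$$

$$
R^2 = \frac{SS_{\text{between}}}{SS_{\text{between}} + SS_{\text{within}}} = \frac{20122}{20122 + 27090} \approx 0.426
$$

This is why the Ads1 and Ads2 estimates equal the Tukey `diff` values and the F-statistic (9.656 on 2 and 26 DF) equals the ANOVA F. The p-values differ because the regression t-tests are not adjusted for multiple comparisons while Tukey's are (for Version 2 against control, 0.085 versus 0.19).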
# Diagnostic plots: residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage
par(mfrow = c(2, 2))
plot(model)
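The diagnostic plots can be complemented with formal checks. Group sizes are unequal and the group standard deviations range from about 27 to 40, so Welch's one-way ANOVA (`oneway.test` with `var.equal = FALSE`), which drops the equal-variance assumption, is a useful robustness check alongside Shapiro-Wilk and Bartlett tests. This is a sketch: the helper `check_anova_assumptions()` is a hypothetical name, and it is demonstrated on synthetic data generated here, not on the study data.

```r
# Hypothetical helper: formal checks to accompany the diagnostic plots.
# `dat` needs a numeric `Purchase` column and a factor `Ads` column.
check_anova_assumptions <- function(dat) {
  fit <- aov(Purchase ~ Ads, data = dat)
  list(
    # Normality of the ANOVA residuals (Shapiro-Wilk)
    shapiro_p  = shapiro.test(residuals(fit))$p.value,
    # Equal variances across groups (Bartlett; sensitive to non-normality)
    bartlett_p = bartlett.test(Purchase ~ Ads, data = dat)$p.value,
    # Welch's one-way ANOVA, which does not assume equal variances
    welch_p    = oneway.test(Purchase ~ Ads, data = dat, var.equal = FALSE)$p.value
  )
}

# Synthetic demo data (NOT the study data); with the real file use:
# check_anova_assumptions(data)
set.seed(1)
demo <- data.frame(
  Ads      = factor(rep(c(0, 1, 2), times = c(10, 7, 12))),
  Purchase = round(c(rnorm(10, 50, 27), rnorm(7, 120, 40), rnorm(12, 75, 31)))
)
checks <- check_anova_assumptions(demo)
str(checks)
```

If the Welch p-value on the real data also falls below 0.05, the main conclusion does not depend on the equal-variance assumption.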
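Based on the analysis:
The ANOVA test confirms that there are significant differences in purchase amounts between the three groups (F(2, 26) = 9.66, p = 0.00073).
Tukey's post-hoc test reveals:
Version 1 vs. control: Version 1 purchases are higher by 69.71 on average (95% CI 30.19 to 109.24, adjusted p = 0.0005), a significant difference.
Version 2 vs. control: Version 2 purchases are higher by 24.75 on average, but the difference is not significant (95% CI -9.59 to 59.09, adjusted p = 0.19).
Version 2 vs. Version 1: Version 2 purchases are lower by 44.96 on average (95% CI -83.11 to -6.82, adjusted p = 0.019), a significant difference.
The regression analysis confirms these findings:
The Version 1 coefficient (69.71, p = 0.00017) is significant, while the Version 2 coefficient (24.75, p = 0.085) is not significant at the 0.05 level.
The ad version explains about 43% of the variation in purchase amounts (R-squared = 0.426, adjusted R-squared = 0.382).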
The analysis indicates that Version 1 of the ad campaign is the most effective: it produced significantly higher purchase amounts than both the control group (no ads) and Version 2. Version 2's average purchase amount was higher than the control group's, but that difference was not statistically significant in this sample.
Implement Version 1 Campaign: I recommend implementing Version 1 of the advertising campaign, as it generated the highest mean purchase amount (118.71, versus 73.75 for Version 2 and 49.00 for the control group).
Budget Allocation: Allocate the majority of the advertising budget to Version 1, but consider maintaining a smaller investment in Version 2, keeping in mind that its improvement over the control group was not statistically significant in this test.
Further Analysis: Consider additional testing to understand why Version 1 performed better; identifying what made it more effective can guide future campaigns.
ROI Calculation: Calculate the ROI for both ad versions by comparing the incremental sales revenue against the advertising costs to confirm that each campaign is profitable.
Segmentation Analysis: Analyze whether certain customer segments respond differently to the various ad versions.
Continuous Testing: Implement a system for continuous A/B testing of advertising campaigns so the retailer can keep refining the effectiveness of its marketing.
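The ROI recommendation amounts to the formula ROI = (incremental revenue - ad cost) / ad cost, where incremental revenue is the lift over the control group. The sketch below assumes that the `Purchase` amounts are revenue in currency units and uses placeholder values for audience size and campaign costs; `customers` and `ad_cost` are hypothetical inputs, not figures from this study. Only the group means come from the output above, and Version 2's lift over control was not statistically significant, so its ROI estimate is especially uncertain.

```r
# Hypothetical sketch: ROI per ad version.
roi <- function(incremental_revenue, ad_cost) {
  (incremental_revenue - ad_cost) / ad_cost
}

control_mean <- 49.0000                          # mean purchase, no ads (from output)
version_mean <- c(v1 = 118.7143, v2 = 73.7500)   # mean purchase per version (from output)
customers    <- 1000                             # ASSUMPTION: placeholder audience size
ad_cost      <- c(v1 = 40000, v2 = 15000)        # ASSUMPTION: placeholder campaign costs

# Lift over control per customer, scaled to the assumed audience
incremental <- (version_mean - control_mean) * customers
round(roi(incremental, ad_cost), 3)
```

With these placeholder inputs the ROI is about 0.743 for Version 1 and 0.65 for Version 2; real cost and reach figures would be needed before drawing any conclusion.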