Hypothesis 1
H1: There is a positive relationship between ad spend and revenue. Higher spend leads to increased revenue.
Hypothesis 2
H2: Display campaigns positively influence revenue. Campaigns with display ads generate more revenue than those without.
# Load necessary libraries
# Install packages only if they are missing, so the script does not reinstall them on every run
if (!requireNamespace("ggplot2", quietly = TRUE)) install.packages("ggplot2")
if (!requireNamespace("dplyr", quietly = TRUE)) install.packages("dplyr")
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# Read the data
data <- read.csv("Display_data.csv")
# Quick overview of the data
summary(data)
##      spend           clicks       impressions       display
##  Min.   : 1.12   Min.   : 48.0   Min.   : 1862   Min.   :0.0000
##  1st Qu.:28.73   1st Qu.:172.0   1st Qu.: 6048   1st Qu.:0.0000
##  Median :39.68   Median :241.0   Median : 9934   Median :0.0000
##  Mean   :44.22   Mean   :257.1   Mean   :11858   Mean   :0.3103
##  3rd Qu.:55.57   3rd Qu.:303.0   3rd Qu.:14789   3rd Qu.:1.0000
##  Max.   :91.28   Max.   :593.0   Max.   :29324   Max.   :1.0000
##   transactions      revenue            ctr           con_rate
##  Min.   :1.000   Min.   : 16.16   Min.   :1.890   Min.   :0.810
##  1st Qu.:2.000   1st Qu.:117.32   1st Qu.:1.970   1st Qu.:0.990
##  Median :3.000   Median :235.16   Median :2.020   Median :1.130
##  Mean   :2.966   Mean   :223.50   Mean   :2.306   Mean   :1.227
##  3rd Qu.:4.000   3rd Qu.:298.92   3rd Qu.:2.790   3rd Qu.:1.470
##  Max.   :6.000   Max.   :522.00   Max.   :3.290   Max.   :2.080
str(data)
## 'data.frame': 29 obs. of 8 variables:
## $ spend : num 22.6 37.3 55.6 45.4 50.2 ...
## $ clicks : int 165 228 291 247 290 172 68 112 306 300 ...
## $ impressions : int 8672 11875 14631 11709 14768 8698 2924 5919 14789 14818 ...
## $ display : int 0 0 0 0 0 0 0 0 0 0 ...
## $ transactions: int 2 2 3 2 3 2 1 1 3 3 ...
## $ revenue : num 58.9 44.9 141.6 209.8 197.7 ...
## $ ctr : num 1.9 1.92 1.99 2.11 1.96 1.98 2.33 1.89 2.07 2.02 ...
## $ con_rate : num 1.21 0.88 1.03 0.81 1.03 1.16 1.47 0.89 0.98 1 ...
simple_model <- lm(revenue ~ spend, data = data)
summary(simple_model)
##
## Call:
## lm(formula = revenue ~ spend, data = data)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -145.210  -54.647    1.117   67.780  149.476
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  10.9397    37.9668   0.288    0.775
## spend         4.8066     0.7775   6.182 1.31e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 86.71 on 27 degrees of freedom
## Multiple R-squared: 0.586, Adjusted R-squared: 0.5707
## F-statistic: 38.22 on 1 and 27 DF, p-value: 1.311e-06
# Data visualization: scatter plot of spend vs revenue with the fitted regression line
ggplot(data, aes(x = spend, y = revenue)) +
geom_point(color = "blue") +
geom_smooth(method = "lm", color = "red", se = FALSE) +
labs(title = "Simple Linear Regression: Spend vs Revenue",
x = "Spend",
y = "Revenue")
## `geom_smooth()` using formula = 'y ~ x'
The simple regression estimates that each additional dollar of ad spend is associated with about $4.81 more revenue (slope = 4.8066, p = 1.31e-06). The relationship between spend and revenue is therefore positive and statistically significant, which supports H1. The R-squared of 0.586 means that ad spend alone explains approximately 58.6% of the variability in revenue. The residual standard error of 86.71 is large relative to the mean revenue of 223.50, so a substantial share of the variation remains unexplained and other factors likely influence revenue. Higher ad spend is associated with higher revenue, but the model rests on only 29 observations and does not by itself establish causation; further analysis with additional variables may give a more complete picture of what drives revenue.
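As a worked illustration of the fitted line, the sketch below plugs the reported coefficients into the regression equation. The spend levels (25, 50, 75) are hypothetical values chosen for illustration within the observed range of 1.12 to 91.28; they are not taken from the data. It also checks the standard identity that, with a single predictor, the F-statistic equals the squared t-value of the slope.

```r
# Coefficients copied from the summary(simple_model) output
intercept <- 10.9397
slope     <- 4.8066

# Hypothetical spend levels, chosen for illustration only
spend_levels <- c(25, 50, 75)

# Predicted revenue = intercept + slope * spend
predicted <- intercept + slope * spend_levels
print(data.frame(spend = spend_levels, predicted_revenue = predicted))

# With one predictor, F = t^2 for the slope (reported t = 6.182, F = 38.22)
t_spend  <- 6.182
f_from_t <- t_spend^2
print(round(f_from_t, 2))
```

Predictions outside the observed spend range would be extrapolations and should be treated with more caution.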
Based on the significant relationship between spend and revenue, managers should consider increasing the ad budget to further drive revenue. However, it’s important to monitor for diminishing returns and ensure the additional spend continues to generate sufficient revenue. Additionally, exploring other factors like seasonality, competitor activity, or display campaigns could provide further insights for budget optimization.
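One possible way to check for diminishing returns is to add a squared spend term and compare the two nested models with an F-test. The sketch below is an assumption-laden illustration: it runs on simulated data (the `sim` data frame and its parameters are hypothetical, loosely matched to the scale of the reported fit), not on `Display_data.csv`.

```r
# Simulated data for illustration only; NOT the original Display_data.csv
set.seed(42)
n   <- 29
sim <- data.frame(spend = runif(n, min = 1, max = 92))
sim$revenue <- 11 + 4.8 * sim$spend + rnorm(n, sd = 87)

# Linear model and a model with an added quadratic term
linear_fit    <- lm(revenue ~ spend, data = sim)
quadratic_fit <- lm(revenue ~ spend + I(spend^2), data = sim)

# Nested-model F-test: a small p-value would suggest curvature;
# a negative spend^2 coefficient would be consistent with diminishing returns
comparison <- anova(linear_fit, quadratic_fit)
print(comparison)
print(coef(quadratic_fit))
```

On the real data, the same comparison could be run by replacing `sim` with `data`.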
multiple_model <- lm(revenue ~ spend + display, data = data)
summary(multiple_model)
##
## Call:
## lm(formula = revenue ~ spend + display, data = data)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -176.730  -35.020    8.661   56.440  129.231
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -50.8612    40.3336  -1.261  0.21850
## spend         5.5473     0.7415   7.482 6.07e-08 ***
## display      93.5856    33.1910   2.820  0.00908 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 77.33 on 26 degrees of freedom
## Multiple R-squared: 0.6829, Adjusted R-squared: 0.6586
## F-statistic: 28 on 2 and 26 DF, p-value: 3.271e-07
data$predicted_revenue <- predict(multiple_model)
ggplot(data, aes(x = spend, y = revenue, color = as.factor(display))) +
geom_point() +
geom_line(aes(y = predicted_revenue), linewidth = 1) +
labs(title = "Multiple Linear Regression: Spend, Display vs Revenue",
x = "Spend",
y = "Revenue",
color = "Display Campaign") +
scale_color_manual(values = c("0" = "pink", "1" = "red"),
labels = c("No Display Campaign", "Display Campaign"))
Interpretation: The multiple regression model indicates that both spend and display are statistically significant predictors of revenue. Holding display constant, the coefficient for spend (5.55) suggests that each additional dollar spent is associated with approximately $5.55 more revenue (p = 6.07e-08). Holding spend constant, the display coefficient of 93.59 means that campaigns with display ads generate about $93.59 more revenue than those without (p = 0.00908), which supports H2. The R-squared of 0.6829 shows that 68.3% of the variation in revenue is explained by spend and display together, and the adjusted R-squared rises from 0.5707 in the simple model to 0.6586, so adding display improves the fit.
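To make the two coefficients concrete, the sketch below evaluates the fitted equation at a single spend level with and without a display campaign. The spend value of 50 is a hypothetical choice for illustration, and `predict_revenue` is a helper defined here, not part of the original analysis.

```r
# Coefficients copied from the summary(multiple_model) output
b_intercept <- -50.8612
b_spend     <- 5.5473
b_display   <- 93.5856

# Hypothetical helper: predicted revenue for a given spend and display flag (0 or 1)
predict_revenue <- function(spend, display) {
  b_intercept + b_spend * spend + b_display * display
}

# Same spend (hypothetical value of 50), with and without a display campaign
no_display   <- predict_revenue(50, 0)
with_display <- predict_revenue(50, 1)

print(c(no_display = no_display, with_display = with_display))

# The gap between the two predictions equals the display coefficient
print(with_display - no_display)
```

Because the model has no interaction term, the display effect is the same 93.59 at every spend level, which is why the two fitted lines in the plot are parallel.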
Based on the regression results, managers should consider increasing the ad spend, as it has a significant positive impact on revenue. Display campaigns should also be prioritized, given their strong influence on revenue generation. It is crucial to monitor the balance between spend and display to ensure resources are allocated efficiently for maximum return. Regular performance tracking is necessary to identify any diminishing returns from increased spend or display campaign saturation.
Null Hypothesis (H₀): Mean product purchases are equal across the three ad groups (Ads = 0, 1, 2).
$H_0: \mu_0 = \mu_1 = \mu_2$
Alternative Hypothesis (H₁): At least one ad group has a mean number of purchases that differs from the others.
$H_1: \mu_i \neq \mu_j \text{ for some } i \neq j$
# Load necessary libraries
# Install readr only if it is missing
if (!requireNamespace("readr", quietly = TRUE)) install.packages("readr")
library(ggplot2)
library(dplyr)
library(readr)
# Load your data (adjust the path as needed)
data2 <- read_csv("ab_testing.csv")
## Rows: 80 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (2): Ads, Purchase
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Filter to include only Ads 0, 1, and 2 (excluding any unexpected group like 3)
data2 <- data2 %>% filter(Ads %in% c(0, 1, 2))
# Convert Ads to factor
data2$Ads <- factor(data2$Ads)
# Run ANOVA (linear regression with categorical predictor)
model <- aov(Purchase ~ Ads, data = data2)
summary(model)
##             Df Sum Sq Mean Sq F value   Pr(>F)
## Ads          2  60212   30106   31.31 8.34e-10 ***
## Residuals   55  52880     961
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ggplot(data2, aes(x = Ads, y = Purchase)) +
geom_jitter(width = 0.1, alpha = 0.6, color = "darkgray") + # actual points
stat_summary(fun = mean, geom = "point", size = 4, color = "blue") + # group means
stat_summary(fun.data = mean_cl_normal, geom = "errorbar", width = 0.2, color = "blue") + # 95% confidence intervals (mean_cl_normal requires the Hmisc package)
stat_summary(fun = mean, geom = "line", aes(group = 1), color = "blue", linetype = "dashed") + # dashed line connecting the group means
labs(title = "Regression Model: Ad Campaign Effect on Purchases",
x = "Ad Group (0 = Control, 1 = Ad1, 2 = Ad2)",
y = "Purchase Count") +
theme_minimal()
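The ANOVA rejects the null hypothesis of equal mean purchases across the three groups (F(2, 55) = 31.31, p = 8.34e-10), so at least one ad group differs from the others. Based on the group means in the plot, Ad Version 1 led to the highest number of purchases and appears to be the most effective campaign, so the retailer should prioritize Ad 1 for future promotions to maximize sales. Ad Version 2 showed a moderate impact but was less effective than Ad 1, so it may need revision or further testing. The control group had the lowest performance, which supports the value of advertising overall. Because the ANOVA alone does not show which pairs of groups differ, these pairwise conclusions should be confirmed with a post-hoc comparison.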
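A significant ANOVA only says that some group means differ; a post-hoc test such as Tukey's HSD is one common way to see which pairs differ. The sketch below is illustrative only: it uses simulated data (the `sim_ab` data frame, its group sizes, means, and spread are all hypothetical) rather than `ab_testing.csv`, so its numbers say nothing about the real campaigns.

```r
# Simulated data for illustration only; NOT the original ab_testing.csv
set.seed(1)
sim_ab <- data.frame(
  Ads = factor(rep(c(0, 1, 2), times = c(20, 19, 19))),
  Purchase = c(rnorm(20, mean = 50,  sd = 30),   # hypothetical control group
               rnorm(19, mean = 120, sd = 30),   # hypothetical Ad 1 group
               rnorm(19, mean = 80,  sd = 30))   # hypothetical Ad 2 group
)

# One-way ANOVA, as in the analysis above
fit <- aov(Purchase ~ Ads, data = sim_ab)

# Tukey's HSD: pairwise differences in means with adjusted p-values
posthoc <- TukeyHSD(fit)
print(posthoc)
```

On the real data, `TukeyHSD(model)` could be applied directly to the fitted `aov` object; each row reports one pairwise difference (for example `1-0`) with its confidence interval and adjusted p-value.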