H1: There is a positive relationship between ad
spend and revenue. That is, higher ad spend leads to higher
revenue.
H2: Campaigns that include display ads result in higher
revenue than those without display ads.
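As a sketch of how the two hypotheses map onto the regression models fitted below, they can be written as statements about coefficients. The symbols $\beta_0$, $\beta_1$, $\beta_2$ and $\varepsilon_i$ are notation introduced here for illustration and do not appear in the original report.

```latex
\[
\text{revenue}_i = \beta_0 + \beta_1\,\text{spend}_i + \beta_2\,\text{display}_i + \varepsilon_i
\]
\[
\text{H1: } \beta_1 > 0
\qquad\qquad
\text{H2: } \beta_2 > 0
\]
```

Note that the p-values printed by `summary()` on an `lm` fit are two-sided tests of $\beta_j = 0$, so the sign of each estimate has to be checked separately against the direction stated in the hypothesis.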
This analysis is reported in an RMarkdown document, which includes code, model output, visualizations, and written interpretations.
# Load necessary libraries
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# Load dataset
data <- read.csv("Display.csv")
# View data summary
summary(data)
##      spend           clicks       impressions       display
##  Min.   : 1.12   Min.   : 48.0   Min.   : 1862   Min.   :0.0000
##  1st Qu.:28.73   1st Qu.:172.0   1st Qu.: 6048   1st Qu.:0.0000
##  Median :39.68   Median :241.0   Median : 9934   Median :0.0000
##  Mean   :44.22   Mean   :257.1   Mean   :11858   Mean   :0.3103
##  3rd Qu.:55.57   3rd Qu.:303.0   3rd Qu.:14789   3rd Qu.:1.0000
##  Max.   :91.28   Max.   :593.0   Max.   :29324   Max.   :1.0000
##   transactions      revenue            ctr           con_rate
##  Min.   :1.000   Min.   : 16.16   Min.   :1.890   Min.   :0.810
##  1st Qu.:2.000   1st Qu.:117.32   1st Qu.:1.970   1st Qu.:0.990
##  Median :3.000   Median :235.16   Median :2.020   Median :1.130
##  Mean   :2.966   Mean   :223.50   Mean   :2.306   Mean   :1.227
##  3rd Qu.:4.000   3rd Qu.:298.92   3rd Qu.:2.790   3rd Qu.:1.470
##  Max.   :6.000   Max.   :522.00   Max.   :3.290   Max.   :2.080
str(data)
## 'data.frame': 29 obs. of 8 variables:
## $ spend : num 22.6 37.3 55.6 45.4 50.2 ...
## $ clicks : int 165 228 291 247 290 172 68 112 306 300 ...
## $ impressions : int 8672 11875 14631 11709 14768 8698 2924 5919 14789 14818 ...
## $ display : int 0 0 0 0 0 0 0 0 0 0 ...
## $ transactions: int 2 2 3 2 3 2 1 1 3 3 ...
## $ revenue : num 58.9 44.9 141.6 209.8 197.7 ...
## $ ctr : num 1.9 1.92 1.99 2.11 1.96 1.98 2.33 1.89 2.07 2.02 ...
## $ con_rate : num 1.21 0.88 1.03 0.81 1.03 1.16 1.47 0.89 0.98 1 ...
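The `ctr` and `con_rate` columns look like derived percentages. The sketch below checks that reading against the first five rows printed by `str(data)`. The formulas (clicks per impression and transactions per click, each times 100) are an assumption inferred from the printed values, not something the report states, and the data frame `first_rows` is a hypothetical name holding a hand-copied excerpt.

```r
# First five rows, copied by hand from the str(data) output above
first_rows <- data.frame(
  clicks       = c(165, 228, 291, 247, 290),
  impressions  = c(8672, 11875, 14631, 11709, 14768),
  transactions = c(2, 2, 3, 2, 3),
  ctr          = c(1.90, 1.92, 1.99, 2.11, 1.96),
  con_rate     = c(1.21, 0.88, 1.03, 0.81, 1.03)
)

# Assumed definitions: both rates expressed as percentages
ctr_check      <- round(100 * first_rows$clicks / first_rows$impressions, 2)
con_rate_check <- round(100 * first_rows$transactions / first_rows$clicks, 2)

# TRUE when the recomputed values match the reported columns
ctr_matches      <- all(abs(ctr_check - first_rows$ctr) < 1e-9)
con_rate_matches <- all(abs(con_rate_check - first_rows$con_rate) < 1e-9)
print(c(ctr = ctr_matches, con_rate = con_rate_matches))
```

If the assumption holds for the full dataset, these two columns carry no information beyond `clicks`, `impressions`, and `transactions`, which is worth keeping in mind before adding them to a model alongside those variables.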
# Simple linear regression model
simple_model <- lm(revenue ~ spend, data = data)
summary(simple_model)
##
## Call:
## lm(formula = revenue ~ spend, data = data)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -145.210  -54.647    1.117   67.780  149.476
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  10.9397    37.9668   0.288    0.775
## spend         4.8066     0.7775   6.182 1.31e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 86.71 on 27 degrees of freedom
## Multiple R-squared: 0.586, Adjusted R-squared: 0.5707
## F-statistic: 38.22 on 1 and 27 DF, p-value: 1.311e-06
# Visualize the regression
ggplot(data, aes(x = spend, y = revenue)) +
  geom_point(color = "blue") +
  geom_smooth(method = "lm", color = "red", se = FALSE) +
  labs(title = "Simple Linear Regression: Spend vs Revenue",
       x = "Spend",
       y = "Revenue")
## `geom_smooth()` using formula = 'y ~ x'
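Interpretation The simple regression model shows a statistically
significant positive relationship between ad spend and revenue, which supports H1.
R-squared ≈ 0.586 → Spend explains about 58.6% of the variation in revenue
p-value ≈ 1.3e-06 (< 0.001) → Strong evidence that the slope is not zero
Coefficient for spend ≈ 4.81 → Each additional $1 of spend is associated with about $4.81 more revenue
The model is fitted to 29 observations, so it shows an association; on its own it does not establish that spend causes revenue.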
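To put a range around the spend coefficient, a 95% confidence interval can be computed as estimate ± critical t value × standard error. The sketch below uses the numbers printed by `summary(simple_model)`. The variable names are illustrative, and with the data loaded `confint(simple_model)` should return the same interval directly.

```r
# Values copied from the summary(simple_model) output above
slope    <- 4.8066   # estimated coefficient for spend
se_slope <- 0.7775   # its standard error
df_resid <- 27       # residual degrees of freedom

# t statistic reported in the coefficient table (estimate / standard error)
t_stat <- slope / se_slope

# Two-sided 95% confidence interval for the slope
t_crit <- qt(0.975, df = df_resid)
ci <- c(lower = slope - t_crit * se_slope,
        upper = slope + t_crit * se_slope)

print(round(t_stat, 3))
print(round(ci, 2))
```

An interval that stays above zero is consistent with the positive relationship stated in H1.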
Managerial Recommendation Managers should consider increasing ad spend to improve revenue, while monitoring for diminishing returns and other influencing factors.
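One way to monitor for diminishing returns is to compare the straight-line model with one that adds a squared spend term. The sketch below runs on simulated data, since the report does not perform this check. The data frame `sim`, its square-root revenue curve, and the noise level are all hypothetical choices made only to show the mechanics.

```r
set.seed(42)

# Simulated campaigns (hypothetical): revenue grows with spend but flattens out
n   <- 200
sim <- data.frame(spend = runif(n, min = 1, max = 90))
sim$revenue <- 60 * sqrt(sim$spend) + rnorm(n, sd = 40)

# Straight-line model versus a model with an added squared term
linear_fit    <- lm(revenue ~ spend, data = sim)
quadratic_fit <- lm(revenue ~ spend + I(spend^2), data = sim)

# Nested-model F test: does the squared term improve the fit?
comparison <- anova(linear_fit, quadratic_fit)
print(comparison)

# A negative coefficient on the squared term indicates a flattening curve
quad_coef <- coef(quadratic_fit)[["I(spend^2)"]]
print(quad_coef)
```

On the real data the same comparison would be `lm(revenue ~ spend + I(spend^2), data = data)` against `simple_model`, though with 29 observations the test would have limited power to detect curvature.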
# Multiple linear regression model
multiple_model <- lm(revenue ~ spend + display, data = data)
summary(multiple_model)
##
## Call:
## lm(formula = revenue ~ spend + display, data = data)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -176.730  -35.020    8.661   56.440  129.231
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -50.8612    40.3336  -1.261  0.21850
## spend         5.5473     0.7415   7.482 6.07e-08 ***
## display      93.5856    33.1910   2.820  0.00908 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 77.33 on 26 degrees of freedom
## Multiple R-squared: 0.6829, Adjusted R-squared: 0.6586
## F-statistic: 28 on 2 and 26 DF, p-value: 3.271e-07
# Visualize the model
data$predicted_revenue <- predict(multiple_model)
ggplot(data, aes(x = spend, y = revenue, color = as.factor(display))) +
  geom_point() +
  # `linewidth` replaces the `size` aesthetic for lines (deprecated in ggplot2 3.4.0)
  geom_line(aes(y = predicted_revenue), linewidth = 1) +
  labs(title = "Multiple Linear Regression: Spend, Display vs Revenue",
       x = "Spend",
       y = "Revenue",
       color = "Display Campaign") +
  scale_color_manual(values = c("0" = "pink", "1" = "red"),
                     labels = c("No Display", "Display"))
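Interpretation The multiple regression shows that both spend and display
significantly predict revenue, which supports H2.
Coefficient for spend ≈ 5.55 → Holding display constant, each additional $1 of spend is associated with ~$5.55 more revenue
Coefficient for display ≈ 93.6 (p ≈ 0.009) → At the same spend, display campaigns are associated with ~$93.60 more revenue
R-squared increases from ≈ 0.586 to ≈ 0.683 (adjusted: 0.571 to 0.659) → Better model fit, even after accounting for the extra predictor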
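The coefficients above can be turned into a quick prediction to see what the display effect means at a given spend level. The sketch below hard-codes the estimates from `summary(multiple_model)`. The helper `predict_revenue()` and the spend value of 50 are hypothetical, chosen because 50 lies inside the observed spend range (1.12 to 91.28).

```r
# Estimates copied from the summary(multiple_model) output above
coefs <- c(intercept = -50.8612, spend = 5.5473, display = 93.5856)

# Hypothetical helper: fitted revenue for a given spend and display flag (0 or 1)
predict_revenue <- function(spend, display) {
  coefs[["intercept"]] + coefs[["spend"]] * spend + coefs[["display"]] * display
}

no_display   <- predict_revenue(spend = 50, display = 0)
with_display <- predict_revenue(spend = 50, display = 1)

print(c(no_display = no_display, with_display = with_display))

# The gap equals the display coefficient at every spend level,
# because the model contains no spend-by-display interaction
print(with_display - no_display)
```

With the fitted model in memory, `predict(multiple_model, newdata = data.frame(spend = 50, display = c(0, 1)))` should give the same two values up to rounding of the printed coefficients.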
Managerial Recommendation Managers should invest in both ad spend and display campaigns for maximum revenue impact. Display campaigns provide a significant uplift and should be prioritized.
Question 2 – A/B Testing (ANOVA)
1. Hypotheses
Null Hypothesis (H₀): The average number of purchases is the same across Ad groups 0, 1, and 2.
Alternative Hypothesis (H₁): At least one group has a different average number of purchases.
Report Format The following ANOVA test and visualization are used to evaluate campaign performance and support recommendations.
ANOVA Analysis
# Load required package
library(readr)
# Load and filter data
data2 <- read_csv("ab_testing.csv")
## Rows: 80 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (2): Ads, Purchase
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
data2 <- data2 %>% filter(Ads %in% c(0, 1, 2))
data2$Ads <- factor(data2$Ads)
# Run ANOVA
model <- aov(Purchase ~ Ads, data = data2)
summary(model)
##             Df Sum Sq Mean Sq F value   Pr(>F)
## Ads          2  60212   30106   31.31 8.34e-10 ***
## Residuals   55  52880     961
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
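The F value in the table is the ratio of the between-group mean square to the within-group mean square. As a sketch, it can be recomputed from the printed sums of squares and degrees of freedom. The variable names are illustrative, and because the printed sums are rounded, the result should match the table only to the displayed precision.

```r
# Values copied from the summary(model) output above
ss_between <- 60212   # Sum Sq for Ads
df_between <- 2
ss_within  <- 52880   # Sum Sq for Residuals
df_within  <- 55

# Mean squares and the F statistic
ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within
f_value    <- ms_between / ms_within

# Upper-tail probability under the F distribution with (2, 55) degrees of freedom
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

print(round(f_value, 2))
print(p_value)
```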
# Visualize ad performance
# Note: mean_cl_normal needs the Hmisc package installed (install.packages("Hmisc"));
# without it the error bars are not drawn.
ggplot(data2, aes(x = Ads, y = Purchase)) +
  geom_jitter(width = 0.1, alpha = 0.6, color = "darkgray") +
  stat_summary(fun = mean, geom = "point", size = 4, color = "blue") +
  stat_summary(fun.data = mean_cl_normal, geom = "errorbar", width = 0.2, color = "blue") +
  stat_summary(fun = mean, geom = "line", aes(group = 1), color = "blue", linetype = "dashed") +
  labs(title = "ANOVA: Ad Group Effect on Purchases",
       x = "Ad Group (0 = Control, 1 = Ad1, 2 = Ad2)",
       y = "Purchases") +
  theme_minimal()
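Interpretation The ANOVA results indicate significant differences
between ad groups (F(2, 55) = 31.31, p ≈ 8.3e-10), so H₀ is rejected.
Ad 1 produced the highest number of purchases
Ad 2 was moderately effective
Ad 0 (control group) performed the worst
The ANOVA itself only shows that at least one group mean differs. The ranking above rests on the group means, and a post hoc test (for example Tukey's HSD) is needed to confirm which pairs of groups differ.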
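A significant ANOVA does not say which groups differ, so a ranking like the one above is usually backed by group means and a pairwise post hoc test. The sketch below shows the mechanics with Tukey's HSD on simulated data. The data frame `sim`, its group means, and its spread are hypothetical and are not taken from `ab_testing.csv`.

```r
set.seed(1)

# Simulated purchases for three ad groups (hypothetical values)
sim <- data.frame(
  Ads = factor(rep(c(0, 1, 2), each = 20)),
  Purchase = c(rnorm(20, mean = 100, sd = 30),
               rnorm(20, mean = 170, sd = 30),
               rnorm(20, mean = 130, sd = 30))
)

# One-way ANOVA, as in the report
fit <- aov(Purchase ~ Ads, data = sim)

# Group means give the ranking; Tukey's HSD tests each pair of groups
group_means <- aggregate(Purchase ~ Ads, data = sim, FUN = mean)
posthoc     <- TukeyHSD(fit, "Ads")

print(group_means)
print(posthoc)
```

On the real data the equivalent calls would be `aggregate(Purchase ~ Ads, data = data2, FUN = mean)` and `TukeyHSD(model, "Ads")`. Each row of the Tukey table reports a pairwise difference with an adjusted p-value and a confidence interval.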
Managerial Recommendation Ad 1 should be the default campaign moving forward based on purchase performance. Ad 2 may still be worth optimizing. Advertising clearly outperforms no promotion.