1. Hypotheses

Hypothesis 1

H1: There is a positive relationship between ad spend and revenue. Higher spend leads to increased revenue.

Hypothesis 2

H2: Display campaigns positively influence revenue. Campaigns with display ads generate more revenue than those without.

2. Data Summary and Exploration
# Load necessary libraries
# (run install.packages(c("ggplot2", "dplyr")) once beforehand if they are not installed)
library(ggplot2)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
# Read the data 
data <- read.csv("Display_data.csv")

# Quick overview of the data
summary(data)
##      spend           clicks       impressions       display      
##  Min.   : 1.12   Min.   : 48.0   Min.   : 1862   Min.   :0.0000  
##  1st Qu.:28.73   1st Qu.:172.0   1st Qu.: 6048   1st Qu.:0.0000  
##  Median :39.68   Median :241.0   Median : 9934   Median :0.0000  
##  Mean   :44.22   Mean   :257.1   Mean   :11858   Mean   :0.3103  
##  3rd Qu.:55.57   3rd Qu.:303.0   3rd Qu.:14789   3rd Qu.:1.0000  
##  Max.   :91.28   Max.   :593.0   Max.   :29324   Max.   :1.0000  
##   transactions      revenue            ctr           con_rate    
##  Min.   :1.000   Min.   : 16.16   Min.   :1.890   Min.   :0.810  
##  1st Qu.:2.000   1st Qu.:117.32   1st Qu.:1.970   1st Qu.:0.990  
##  Median :3.000   Median :235.16   Median :2.020   Median :1.130  
##  Mean   :2.966   Mean   :223.50   Mean   :2.306   Mean   :1.227  
##  3rd Qu.:4.000   3rd Qu.:298.92   3rd Qu.:2.790   3rd Qu.:1.470  
##  Max.   :6.000   Max.   :522.00   Max.   :3.290   Max.   :2.080
str(data)
## 'data.frame':    29 obs. of  8 variables:
##  $ spend       : num  22.6 37.3 55.6 45.4 50.2 ...
##  $ clicks      : int  165 228 291 247 290 172 68 112 306 300 ...
##  $ impressions : int  8672 11875 14631 11709 14768 8698 2924 5919 14789 14818 ...
##  $ display     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ transactions: int  2 2 3 2 3 2 1 1 3 3 ...
##  $ revenue     : num  58.9 44.9 141.6 209.8 197.7 ...
##  $ ctr         : num  1.9 1.92 1.99 2.11 1.96 1.98 2.33 1.89 2.07 2.02 ...
##  $ con_rate    : num  1.21 0.88 1.03 0.81 1.03 1.16 1.47 0.89 0.98 1 ...
simple_model <- lm(revenue ~ spend, data = data)
summary(simple_model)
## 
## Call:
## lm(formula = revenue ~ spend, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -145.210  -54.647    1.117   67.780  149.476 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  10.9397    37.9668   0.288    0.775    
## spend         4.8066     0.7775   6.182 1.31e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 86.71 on 27 degrees of freedom
## Multiple R-squared:  0.586,  Adjusted R-squared:  0.5707 
## F-statistic: 38.22 on 1 and 27 DF,  p-value: 1.311e-06
# Data visualization: scatter plot with the fitted regression line
ggplot(data, aes(x = spend, y = revenue)) +
  geom_point(color = "blue") +
  geom_smooth(method = "lm", color = "red", se = FALSE) +
  labs(title = "Simple Linear Regression: Spend vs Revenue",
       x = "Spend",
       y = "Revenue")
## `geom_smooth()` using formula = 'y ~ x'

Interpretation:

The simple regression estimates a slope of 4.81 for spend (p = 1.31e-06), so each additional dollar of ad spend is associated with about $4.81 more revenue. The intercept (10.94, p = 0.775) is not distinguishable from zero. The R-squared of 0.586 means that ad spend alone explains about 58.6% of the variability in revenue. The residual standard error of 86.71 is large relative to the mean revenue of 223.50, so a substantial share of the variation remains unexplained and other factors likely influence revenue. These results support H1: higher ad spend is associated with higher revenue. With only 29 observations and a single predictor, however, further analysis with additional variables would give a more complete picture of what drives revenue.
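
As a quick consistency check on the output above, the following sketch recomputes a fitted value and the F-statistic from the reported numbers. The values are copied by hand from summary(simple_model) and summary(data), and fitted_revenue is an illustrative helper that does not appear in the original script.

```r
# Values copied from the regression output above; nothing is refit here.
intercept <- 10.9397
slope     <- 4.8066
r_squared <- 0.586
df_resid  <- 27

# Illustrative helper (an assumption, not part of the original script):
# fitted revenue for a given spend level.
fitted_revenue <- function(spend) intercept + slope * spend

# An OLS line passes through the sample means, so the fit at the mean spend
# (44.22) should land close to the mean revenue (223.50).
fitted_revenue(44.22)

# With a single predictor, F = R^2 / ((1 - R^2) / df_resid), and F = t^2.
f_from_r2 <- r_squared / ((1 - r_squared) / df_resid)
f_from_r2   # close to the reported F-statistic of 38.22
6.182^2     # squared t value for spend, also close to 38.22
```

Small differences from the reported figures come from rounding in the printed output.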

Managerial Recommendation:

Based on the significant relationship between spend and revenue, managers should consider increasing the ad budget to further drive revenue. However, it’s important to monitor for diminishing returns and ensure the additional spend continues to generate sufficient revenue. Additionally, exploring other factors like seasonality, competitor activity, or display campaigns could provide further insights for budget optimization.
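
One possible way to check for diminishing returns is to add a squared spend term and compare the two nested models. The sketch below uses synthetic data generated on the spot (not Display_data.csv), so the names demo, linear_fit, and quadratic_fit and every number it produces are illustrative assumptions only.

```r
set.seed(1)

# Synthetic spend/revenue pairs with a concave response, for illustration only.
demo <- data.frame(spend = runif(60, min = 1, max = 90))
demo$revenue <- 40 * sqrt(demo$spend) + rnorm(60, sd = 20)

# Straight-line model versus a model that allows curvature.
linear_fit    <- lm(revenue ~ spend, data = demo)
quadratic_fit <- lm(revenue ~ spend + I(spend^2), data = demo)

# Nested-model F test: a small p-value favours the curved model.
comparison <- anova(linear_fit, quadratic_fit)
comparison

# A negative coefficient on I(spend^2) means each extra dollar adds less revenue.
coef(quadratic_fit)["I(spend^2)"]
```

On the real data, the same two lm() calls with data = data would show whether the 29 observations contain any evidence of curvature.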

3. Multiple Regression
multiple_model <- lm(revenue ~ spend + display, data = data)
summary(multiple_model)
## 
## Call:
## lm(formula = revenue ~ spend + display, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -176.730  -35.020    8.661   56.440  129.231 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -50.8612    40.3336  -1.261  0.21850    
## spend         5.5473     0.7415   7.482 6.07e-08 ***
## display      93.5856    33.1910   2.820  0.00908 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 77.33 on 26 degrees of freedom
## Multiple R-squared:  0.6829, Adjusted R-squared:  0.6586 
## F-statistic:    28 on 2 and 26 DF,  p-value: 3.271e-07
data$predicted_revenue <- predict(multiple_model)
ggplot(data, aes(x = spend, y = revenue, color = as.factor(display))) +
  geom_point() +
  geom_line(aes(y = predicted_revenue), linewidth = 1) +
  labs(title = "Multiple Linear Regression: Spend, Display vs Revenue",
       x = "Spend",
       y = "Revenue",
       color = "Display Campaign") +
  scale_color_manual(values = c("0" = "pink", "1" = "red"),
                     labels = c("No Display Campaign", "Display Campaign"))

Interpretation:

The multiple regression model indicates that both spend (p = 6.07e-08) and display (p = 0.00908) are statistically significant predictors of revenue at the 0.05 level. The coefficient for spend (5.55) suggests that, holding display constant, each additional dollar spent is associated with about $5.55 more revenue. The display coefficient of 93.59 means that, at the same spend, campaigns with display ads generate about $93.59 more revenue than those without, which supports H2. The R-squared of 0.6829 (adjusted 0.6586) shows that spend and display together explain about 68.3% of the variation in revenue, up from 58.6% for spend alone.
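
To make the coefficients concrete, the sketch below plugs a hypothetical spend of 50 into the fitted equation and computes a 95% confidence interval for the display effect. The coefficients and standard error are copied by hand from summary(multiple_model). The spend level of 50 and the helper predict_revenue are assumptions chosen for illustration.

```r
# Values copied from the multiple regression output above; nothing is refit here.
b0         <- -50.8612
b_spend    <- 5.5473
b_display  <- 93.5856
se_display <- 33.1910
df_resid   <- 26

# Illustrative helper (not part of the original script).
predict_revenue <- function(spend, display) {
  b0 + b_spend * spend + b_display * display
}

no_display   <- predict_revenue(50, 0)   # about 226.50
with_display <- predict_revenue(50, 1)   # about 320.09
with_display - no_display                # the display coefficient, at any spend

# 95% confidence interval for the display coefficient: estimate +/- t * SE.
t_crit     <- qt(0.975, df_resid)
ci_display <- b_display + c(-1, 1) * t_crit * se_display
ci_display                               # roughly 25 to 162
```

The interval excludes zero, which matches the reported p-value of 0.00908. It is also wide, so the size of the display effect is estimated with considerable uncertainty.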

Managerial Recommendation:

Based on the regression results, managers should consider increasing the ad spend, as it has a significant positive impact on revenue. Display campaigns should also be prioritized, given their strong influence on revenue generation. It is crucial to monitor the balance between spend and display to ensure resources are allocated efficiently for maximum return. Regular performance tracking is necessary to identify any diminishing returns from increased spend or display campaign saturation.

Question 2

Null Hypothesis (H₀): There is no significant difference in product purchases across the three ad groups (Ads = 0, 1, 2).

$H_0: \mu_0 = \mu_1 = \mu_2$

Alternative Hypothesis (H₁): At least one of the ad campaigns leads to a significantly different number of purchases.

$H_1: \mu_i \neq \mu_j \text{ for some } i \neq j$

# Load necessary libraries
# (run install.packages("readr") once beforehand if it is not installed)
library(ggplot2)
library(dplyr)
library(readr)
# Load your data (adjust the path as needed)
data2 <- read_csv("ab_testing.csv")
## Rows: 80 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (2): Ads, Purchase
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Filter to include only Ads 0, 1, and 2 (excluding any unexpected group like 3)
data2 <- data2 %>% filter(Ads %in% c(0, 1, 2))

# Convert Ads to factor
data2$Ads <- factor(data2$Ads)
# Run ANOVA (linear regression with categorical predictor)
model <- aov(Purchase ~ Ads, data = data2)
summary(model)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## Ads          2  60212   30106   31.31 8.34e-10 ***
## Residuals   55  52880     961                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ggplot(data2, aes(x = Ads, y = Purchase)) +
  geom_jitter(width = 0.1, alpha = 0.6, color = "darkgray") +  # actual points
  stat_summary(fun = mean, geom = "point", size = 4, color = "blue") +  # group means
  stat_summary(fun.data = mean_cl_normal, geom = "errorbar", width = 0.2, color = "blue") +  # 95% confidence intervals (mean_cl_normal needs the Hmisc package)
  stat_summary(fun = mean, geom = "line", aes(group = 1), color = "blue", linetype = "dashed") +  # dashed line connecting the group means
  labs(title = "Regression Model: Ad Campaign Effect on Purchases",
       x = "Ad Group (0 = Control, 1 = Ad1, 2 = Ad2)",
       y = "Purchase Count") +
  theme_minimal()
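
The ANOVA table above can be reproduced by hand from its sums of squares and degrees of freedom. The sketch below does that arithmetic with the values copied from summary(model). The variable names and the eta-squared effect size are additions for illustration and are not part of the original analysis.

```r
# Values copied from the ANOVA table above; nothing is refit here.
ss_ads   <- 60212
ss_resid <- 52880
df_ads   <- 2
df_resid <- 55

# Mean squares are sums of squares divided by their degrees of freedom.
ms_ads   <- ss_ads / df_ads       # 30106
ms_resid <- ss_resid / df_resid   # about 961

# The F statistic is the ratio of the two mean squares.
f_stat  <- ms_ads / ms_resid      # about 31.31
p_value <- pf(f_stat, df_ads, df_resid, lower.tail = FALSE)

# Eta squared: share of total variation in Purchase explained by the ad group.
eta_sq <- ss_ads / (ss_ads + ss_resid)   # about 0.53

# Observations used by the model = total degrees of freedom + 1.
n_obs <- df_ads + df_resid + 1           # 58
```

The degrees of freedom imply that 58 observations entered the model, while read_csv reported 80 rows, so 22 rows were excluded, either by the filter on Ads or because of missing values.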

Recommendations

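The ANOVA rejects the null hypothesis (F(2, 55) = 31.31, p = 8.34e-10), so mean purchases differ across the ad groups. Comparing the group means, Ad Version 1 led to the highest number of purchases and is the most effective campaign, so the retailer should prioritize Ad 1 for future promotions. Ad Version 2 showed a moderate impact but was less effective than Ad 1, so it may need revision or further testing. The control group had the lowest performance, which supports the value of advertising overall. Because the ANOVA only shows that at least one group differs, a post-hoc pairwise comparison (for example Tukey's HSD) is needed to confirm which of these differences are statistically significant.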
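
A ranking of ad groups like the one above is usually backed by pairwise comparisons. The sketch below shows how Tukey's HSD could be run after a one-way ANOVA. It uses synthetic stand-in data (not ab_testing.csv), so the group means, the names demo and demo_model, and all resulting numbers are assumptions for illustration only.

```r
set.seed(42)

# Synthetic stand-in data with three ad groups; NOT the values in ab_testing.csv.
demo <- data.frame(
  Ads = factor(rep(c(0, 1, 2), each = 20)),
  Purchase = c(rnorm(20, mean = 100, sd = 30),
               rnorm(20, mean = 180, sd = 30),
               rnorm(20, mean = 140, sd = 30))
)

# One-way ANOVA, then Tukey's honest significant difference for every pair of groups.
demo_model <- aov(Purchase ~ Ads, data = demo)
tukey <- TukeyHSD(demo_model, "Ads", conf.level = 0.95)

# One row per pair (1-0, 2-0, 2-1): mean difference, interval bounds, adjusted p-value.
tukey$Ads
```

On the real data the equivalent call would be TukeyHSD(model). A pair whose interval excludes zero differs significantly at the chosen confidence level.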