Question 1 – Regression Analysis

1. Hypotheses

H1: There is a positive relationship between ad spend and revenue. That is, campaigns with higher ad spend tend to generate higher revenue.
H2: Campaigns that include display ads result in higher revenue than those without display ads.

2. Report Format

This analysis is reported in an RMarkdown document, which includes code, model output, visualizations, and written interpretations.

3. Simple Regression – Predicting Revenue Based on Spend

# Load necessary libraries
library(ggplot2)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

# Load dataset
data <- read.csv("Display.csv")

# View data summary
summary(data)

##      spend           clicks       impressions       display      
##  Min.   : 1.12   Min.   : 48.0   Min.   : 1862   Min.   :0.0000  
##  1st Qu.:28.73   1st Qu.:172.0   1st Qu.: 6048   1st Qu.:0.0000  
##  Median :39.68   Median :241.0   Median : 9934   Median :0.0000  
##  Mean   :44.22   Mean   :257.1   Mean   :11858   Mean   :0.3103  
##  3rd Qu.:55.57   3rd Qu.:303.0   3rd Qu.:14789   3rd Qu.:1.0000  
##  Max.   :91.28   Max.   :593.0   Max.   :29324   Max.   :1.0000  
##   transactions      revenue            ctr           con_rate    
##  Min.   :1.000   Min.   : 16.16   Min.   :1.890   Min.   :0.810  
##  1st Qu.:2.000   1st Qu.:117.32   1st Qu.:1.970   1st Qu.:0.990  
##  Median :3.000   Median :235.16   Median :2.020   Median :1.130  
##  Mean   :2.966   Mean   :223.50   Mean   :2.306   Mean   :1.227  
##  3rd Qu.:4.000   3rd Qu.:298.92   3rd Qu.:2.790   3rd Qu.:1.470  
##  Max.   :6.000   Max.   :522.00   Max.   :3.290   Max.   :2.080

str(data)

## 'data.frame':    29 obs. of  8 variables:
##  $ spend       : num  22.6 37.3 55.6 45.4 50.2 ...
##  $ clicks      : int  165 228 291 247 290 172 68 112 306 300 ...
##  $ impressions : int  8672 11875 14631 11709 14768 8698 2924 5919 14789 14818 ...
##  $ display     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ transactions: int  2 2 3 2 3 2 1 1 3 3 ...
##  $ revenue     : num  58.9 44.9 141.6 209.8 197.7 ...
##  $ ctr         : num  1.9 1.92 1.99 2.11 1.96 1.98 2.33 1.89 2.07 2.02 ...
##  $ con_rate    : num  1.21 0.88 1.03 0.81 1.03 1.16 1.47 0.89 0.98 1 ...

# Simple linear regression model
simple_model <- lm(revenue ~ spend, data = data)
summary(simple_model)

## 
## Call:
## lm(formula = revenue ~ spend, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -145.210  -54.647    1.117   67.780  149.476 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  10.9397    37.9668   0.288    0.775    
## spend         4.8066     0.7775   6.182 1.31e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 86.71 on 27 degrees of freedom
## Multiple R-squared:  0.586,  Adjusted R-squared:  0.5707 
## F-statistic: 38.22 on 1 and 27 DF,  p-value: 1.311e-06

# Visualize the regression
ggplot(data, aes(x = spend, y = revenue)) +
  geom_point(color = "blue") +
  geom_smooth(method = "lm", color = "red", se = FALSE) +
  labs(title = "Simple Linear Regression: Spend vs Revenue",
       x = "Spend",
       y = "Revenue")

## `geom_smooth()` using formula = 'y ~ x'

Interpretation

The simple regression model shows a statistically significant positive relationship between ad spend and revenue.

R-squared ≈ 0.586 → Spend explains about 58.6% of the variation in revenue

p-value < 0.001 (1.31e-06) → Strong evidence that the relationship is not due to chance

Coefficient for spend ≈ 4.81 → Each additional $1 of spend is associated with about $4.81 more revenue
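
The significance of the slope can be checked by hand from the coefficient table. The following is a minimal sketch that uses only the estimate, standard error, and residual degrees of freedom reported in the output above; the variable names are illustrative and not part of the original analysis.

```r
# Values copied from the summary(simple_model) output above
slope_estimate <- 4.8066
slope_se       <- 0.7775
residual_df    <- 27

# t statistic = estimate / standard error
t_value <- slope_estimate / slope_se

# Two-sided p-value from the t distribution
p_value <- 2 * pt(abs(t_value), df = residual_df, lower.tail = FALSE)

# 95% confidence interval for the slope
t_crit   <- qt(0.975, df = residual_df)
slope_ci <- slope_estimate + c(-1, 1) * t_crit * slope_se

round(t_value, 3)
signif(p_value, 3)
round(slope_ci, 2)
```

The interval comes out at roughly 3.2 to 6.4, well above zero, which is consistent with the reported p-value. With the fitted model in memory, `confint(simple_model)` should give the same interval directly.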

Managerial Recommendation

Managers should consider increasing ad spend to improve revenue, while monitoring for diminishing returns and other factors that influence revenue.
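
To make the recommendation concrete, the fitted line can be evaluated at a few spend levels. This is a sketch only: the coefficients are copied from the model output above, and the spend levels are hypothetical values chosen inside the observed range of the data.

```r
# Coefficients copied from the summary(simple_model) output above
intercept <- 10.9397
slope     <- 4.8066

# Hypothetical spend levels inside the observed range (1.12 to 91.28)
spend_levels <- c(20, 40, 60, 80)

# Fitted line: revenue = intercept + slope * spend
predicted_revenue <- intercept + slope * spend_levels

data.frame(spend = spend_levels,
           predicted_revenue = round(predicted_revenue, 2))
```

Because the model is linear, every additional 20 units of spend adds the same amount of predicted revenue (about 96.13), so this model cannot reveal diminishing returns; detecting them would need a curved term or a transformed predictor. With the fitted model in memory, `predict(simple_model, newdata = data.frame(spend = spend_levels), interval = "prediction")` would also return prediction intervals.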

4. Multiple Regression – Using Spend and Display

# Multiple linear regression model
multiple_model <- lm(revenue ~ spend + display, data = data)
summary(multiple_model)

## 
## Call:
## lm(formula = revenue ~ spend + display, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -176.730  -35.020    8.661   56.440  129.231 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -50.8612    40.3336  -1.261  0.21850    
## spend         5.5473     0.7415   7.482 6.07e-08 ***
## display      93.5856    33.1910   2.820  0.00908 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 77.33 on 26 degrees of freedom
## Multiple R-squared:  0.6829, Adjusted R-squared:  0.6586 
## F-statistic:    28 on 2 and 26 DF,  p-value: 3.271e-07

# Visualize the model
data$predicted_revenue <- predict(multiple_model)

ggplot(data, aes(x = spend, y = revenue, color = as.factor(display))) +
  geom_point() +
  geom_line(aes(y = predicted_revenue), linewidth = 1) +
  labs(title = "Multiple Linear Regression: Spend, Display vs Revenue",
       x = "Spend",
       y = "Revenue",
       color = "Display Campaign") +
  scale_color_manual(values = c("0" = "pink", "1" = "red"),
                     labels = c("No Display", "Display"))

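Interpretation

The multiple regression shows that both spend and display are statistically significant predictors of revenue.

Coefficient for spend ≈ 5.55 → Holding display constant, each additional $1 of spend is associated with ~$5.55 more revenue

Coefficient for display ≈ 93.6 → At the same spend level, display campaigns are associated with ~$93.59 more revenue

R-squared increases from ≈ 0.586 to ≈ 0.683 (adjusted R-squared from ≈ 0.571 to ≈ 0.659) → Better model fit, even after accounting for the extra predictor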
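
R-squared never decreases when a predictor is added, so the comparison between the two models is fairer on adjusted R-squared. As a rough check, the adjusted values can be recomputed from the reported R-squared figures; the helper `adjusted_r2` below is a hypothetical name introduced for this sketch.

```r
# Adjusted R-squared from R-squared, sample size n, and number of predictors p
adjusted_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

n_obs <- 29  # number of observations, per the str(data) output

adj_simple   <- adjusted_r2(0.586,  n_obs, p = 1)  # revenue ~ spend
adj_multiple <- adjusted_r2(0.6829, n_obs, p = 2)  # revenue ~ spend + display

round(c(simple = adj_simple, multiple = adj_multiple), 3)
```

The results agree with the reported 0.5707 and 0.6586 up to rounding of the inputs, and the multiple model still comes out ahead after the penalty for the second predictor.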

Managerial Recommendation

Managers should invest in both ad spend and display campaigns for maximum revenue impact. Display campaigns are associated with a statistically significant uplift and should be prioritized.
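
The size of the display uplift can be illustrated by evaluating the fitted equation with and without a display campaign. This is a sketch under stated assumptions: the coefficients are copied from the model output above, the spend level of 50 is a hypothetical example, and `predict_revenue` is an illustrative helper rather than part of the original analysis.

```r
# Coefficients copied from the summary(multiple_model) output above
b_intercept <- -50.8612
b_spend     <- 5.5473
b_display   <- 93.5856

# Additive model: revenue = intercept + b_spend * spend + b_display * display
predict_revenue <- function(spend, display) {
  b_intercept + b_spend * spend + b_display * display
}

# Hypothetical spend level of 50, without and with a display campaign
without_display <- predict_revenue(spend = 50, display = 0)
with_display    <- predict_revenue(spend = 50, display = 1)

c(without_display = without_display,
  with_display    = with_display,
  uplift          = with_display - without_display)
```

The uplift equals the display coefficient at every spend level because the model has no interaction term. Checking whether the display effect changes with spend would require fitting `revenue ~ spend * display` instead.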

Question 2 – A/B Testing (ANOVA)

1. Hypotheses

Null Hypothesis (H₀): There is no difference in mean purchases across Ad groups 0, 1, and 2.
Alternative Hypothesis (H₁): At least one group has a different mean number of purchases.

2. Report Format

The following ANOVA test and visualization are used to evaluate campaign performance and support recommendations.

3. ANOVA Analysis

# Load required package
library(readr)

# Load and filter data
data2 <- read_csv("ab_testing.csv")

## Rows: 80 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (2): Ads, Purchase
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

data2 <- data2 %>% filter(Ads %in% c(0, 1, 2))
data2$Ads <- factor(data2$Ads)

# Run ANOVA
model <- aov(Purchase ~ Ads, data = data2)
summary(model)

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## Ads          2  60212   30106   31.31 8.34e-10 ***
## Residuals   55  52880     961                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Visualize ad performance
ggplot(data2, aes(x = Ads, y = Purchase)) +
  geom_jitter(width = 0.1, alpha = 0.6, color = "darkgray") +
  stat_summary(fun = mean, geom = "point", size = 4, color = "blue") +
  # mean_cl_normal() needs the Hmisc package installed; without it the error bars are not drawn
  stat_summary(fun.data = mean_cl_normal, geom = "errorbar", width = 0.2, color = "blue") +
  stat_summary(fun = mean, geom = "line", aes(group = 1), color = "blue", linetype = "dashed") +
  labs(title = "ANOVA: Ad Group Effect on Purchases",
       x = "Ad Group (0 = Control, 1 = Ad1, 2 = Ad2)",
       y = "Purchases") +
  theme_minimal()

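Interpretation

The ANOVA results indicate a statistically significant difference in mean purchases between ad groups (F(2, 55) = 31.31, p < 0.001). The ANOVA itself does not show which groups differ: the ranking below rests on the group means, and a post-hoc test such as Tukey's HSD would be needed to confirm which pairs differ.

Ad 1 produced the highest number of purchases

Ad 2 was moderately effective

Ad 0 (control group) performed the worst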
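
The F statistic behind this conclusion can be reproduced from the sums of squares in the ANOVA table. The sketch below uses only the values printed in that table and adds eta-squared as an effect size, which was not part of the original output.

```r
# Values copied from the summary(model) ANOVA table above
ss_between <- 60212; df_between <- 2
ss_within  <- 52880; df_within  <- 55

# Mean squares = sum of squares / degrees of freedom
ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within

# F statistic and its upper-tail p-value
f_value <- ms_between / ms_within
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

# Eta-squared: share of total variation in purchases explained by ad group
eta_squared <- ss_between / (ss_between + ss_within)

round(f_value, 2)
signif(p_value, 3)
round(eta_squared, 3)
```

An eta-squared of about 0.53 suggests that ad group accounts for roughly half of the variation in purchases in this sample.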

Managerial Recommendation

Ad 1 should be the default campaign moving forward based on purchase performance. Ad 2 may still be worth optimizing. Both ad groups recorded more purchases than the control group, which received no promotion.
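
Preferring one ad over another calls for pairwise comparisons, which a single ANOVA does not provide. One common option is Tukey's HSD. The sketch below runs on synthetic data generated purely for illustration (the group means, spread, and sample sizes are assumptions, not the values in ab_testing.csv) and shows the shape of the output.

```r
# Synthetic data for illustration only; NOT the values in ab_testing.csv
set.seed(42)
demo <- data.frame(
  Ads      = factor(rep(c(0, 1, 2), each = 20)),
  Purchase = c(rnorm(20, mean = 100, sd = 30),
               rnorm(20, mean = 180, sd = 30),
               rnorm(20, mean = 140, sd = 30))
)

# Same model form as the analysis above
demo_model <- aov(Purchase ~ Ads, data = demo)

# Tukey's HSD: all pairwise differences with family-wise 95% intervals
tukey_result <- TukeyHSD(demo_model, which = "Ads", conf.level = 0.95)
tukey_result$Ads
```

Each row is one pair of groups (1-0, 2-0, 2-1) with the estimated difference, its interval, and an adjusted p-value. On the real data, `TukeyHSD(model)` could be applied directly to the `aov` object fitted above.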

Midterm

2025-03-25
