Hypotheses

  1. Simple Regression Hypothesis: We hypothesize that spend positively impacts revenue—higher ad spend should correlate with higher revenue.
  2. Multiple Regression Hypothesis: We hypothesize that both spend and display campaign presence influence revenue, with spend having a stronger effect. Running a display campaign should contribute to higher revenue.

Load Data

library(readxl)
library(ggplot2)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(broom)

# Load dataset
data <- read_excel("Display_data.xlsx")

# Ensure column names are clean
names(data) <- trimws(names(data))

# Convert to data frame
data <- as.data.frame(data)

# View structure and summary
str(data)
## 'data.frame':    29 obs. of  8 variables:
##  $ spend       : num  22.6 37.3 55.6 45.4 50.2 ...
##  $ clicks      : num  165 228 291 247 290 172 68 112 306 300 ...
##  $ impressions : num  8672 11875 14631 11709 14768 ...
##  $ display     : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ transactions: num  2 2 3 2 3 2 1 1 3 3 ...
##  $ revenue     : num  58.9 44.9 141.6 209.8 197.7 ...
##  $ ctr         : num  1.9 1.92 1.99 2.11 1.96 1.98 2.33 1.89 2.07 2.02 ...
##  $ con_rate    : num  1.21 0.88 1.03 0.81 1.03 1.16 1.47 0.89 0.98 1 ...
summary(data)
##      spend           clicks       impressions       display      
##  Min.   : 1.12   Min.   : 48.0   Min.   : 1862   Min.   :0.0000  
##  1st Qu.:28.73   1st Qu.:172.0   1st Qu.: 6048   1st Qu.:0.0000  
##  Median :39.68   Median :241.0   Median : 9934   Median :0.0000  
##  Mean   :44.22   Mean   :257.1   Mean   :11858   Mean   :0.3103  
##  3rd Qu.:55.57   3rd Qu.:303.0   3rd Qu.:14789   3rd Qu.:1.0000  
##  Max.   :91.28   Max.   :593.0   Max.   :29324   Max.   :1.0000  
##   transactions      revenue            ctr           con_rate    
##  Min.   :1.000   Min.   : 16.16   Min.   :1.890   Min.   :0.810  
##  1st Qu.:2.000   1st Qu.:117.32   1st Qu.:1.970   1st Qu.:0.990  
##  Median :3.000   Median :235.16   Median :2.020   Median :1.130  
##  Mean   :2.966   Mean   :223.50   Mean   :2.306   Mean   :1.227  
##  3rd Qu.:4.000   3rd Qu.:298.92   3rd Qu.:2.790   3rd Qu.:1.470  
##  Max.   :6.000   Max.   :522.00   Max.   :3.290   Max.   :2.080

Exploratory Data Analysis

# Check column names
colnames(data)
## [1] "spend"        "clicks"       "impressions"  "display"      "transactions"
## [6] "revenue"      "ctr"          "con_rate"
# Scatterplot of Spend vs Revenue
ggplot(data, aes(x = as.numeric(spend), y = as.numeric(revenue))) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "blue") +
  labs(title = "Spend vs Revenue", x = "Spend ($)", y = "Revenue ($)")
## `geom_smooth()` using formula = 'y ~ x'

Simple Regression Model: Predicting Revenue Based on Spend

simple_model <- lm(as.numeric(revenue) ~ as.numeric(spend), data = data)
simple_summary <- summary(simple_model)
simple_summary
## 
## Call:
## lm(formula = as.numeric(revenue) ~ as.numeric(spend), data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -145.210  -54.647    1.117   67.780  149.476 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        10.9397    37.9668   0.288    0.775    
## as.numeric(spend)   4.8066     0.7775   6.182 1.31e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 86.71 on 27 degrees of freedom
## Multiple R-squared:  0.586,  Adjusted R-squared:  0.5707 
## F-statistic: 38.22 on 1 and 27 DF,  p-value: 1.311e-06

Interpretation & Managerial Recommendations

  • The coefficient for Spend should be positive, indicating that increasing spending leads to higher revenue.
  • If the p-value is significant (<0.05), this confirms a strong relationship between Spend and Revenue.
  • Managers should allocate more budget toward spending but assess the diminishing returns to ensure cost efficiency.

Multiple Regression Model: Predicting Revenue Based on Spend and Display Campaign

multi_model <- lm(as.numeric(revenue) ~ as.numeric(spend) + as.factor(display), data = data)
multi_summary <- summary(multi_model)
multi_summary
## 
## Call:
## lm(formula = as.numeric(revenue) ~ as.numeric(spend) + as.factor(display), 
##     data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -176.730  -35.020    8.661   56.440  129.231 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         -50.8612    40.3336  -1.261  0.21850    
## as.numeric(spend)     5.5473     0.7415   7.482 6.07e-08 ***
## as.factor(display)1  93.5856    33.1910   2.820  0.00908 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 77.33 on 26 degrees of freedom
## Multiple R-squared:  0.6829, Adjusted R-squared:  0.6586 
## F-statistic:    28 on 2 and 26 DF,  p-value: 3.271e-07

Interpretation & Managerial Recommendations

  • If Display has a significant p-value, it indicates that running a display campaign positively affects revenue.
  • If the adjusted R² improves compared to the simple model, adding Display helps explain revenue variability.
  • Managers should run display campaigns strategically, ensuring they enhance rather than merely increase ad spend.

Conclusion