Project - Marketing Campaign Response Analysis

Overview

This analysis explores factors influencing customer responses to a marketing campaign and builds a predictive model for campaign acceptance.

Research goal: Identify which customer characteristics predict campaign response and provide actionable recommendations for targeting strategy.

library(readr)      # read_csv()
library(lubridate)  # wday()
library(ggplot2)
library(magrittr)
library(dplyr)
# install.packages("coin")
library(coin)
library(rpart)      # rpart(); also attached by rpart.plot
# install.packages("rpart.plot")
library(rpart.plot)
# install.packages("pROC")
library(pROC)

data <- read_csv("~/Documents/r files/ml_project1_data.csv")

Research questions

Question 1

Does the day of the week on which a customer enrolled affect campaign response?

Hypothesis: Customers who enrolled on weekends are more likely to respond to the campaign.

data$week = wday(data$Dt_Customer, label = TRUE) # day of the week on which the customer enrolled
data$Response = as.factor(data$Response)

data2 = data %>% filter(Response == 1) %>%
  group_by(week) %>%
  summarise(count = n())

ch = chisq.test(data$Response, data$week)
ch

## 
##  Pearson's Chi-squared test
## 
## data:  data$Response and data$week
## X-squared = 10.987, df = 6, p-value = 0.08878

ggplot(data = data2) +
  geom_bar(aes(x = week, y = count), 
           stat = "identity", fill = "#104E8B") +
  geom_text(aes(x = week, y = count, 
                label = count), vjust = -0.5, size = 3.5) +
  labs(title = "Campaign responses by day of week",
    subtitle = "No statistically significant relationship detected",
    x = "Day of week",
    y = "Number of responses") +
  theme_classic()

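Results: The hypothesis is not confirmed. The chi-square test gives p = 0.09, above the 0.01 significance level used throughout this analysis (and above the conventional 0.05 level as well), so it does not detect a statistically significant association between the day of the week on which a customer enrolled and whether they responded to the campaign. Note that the test compares all seven days at once rather than weekends against weekdays specifically. Day of the week is therefore excluded from the predictive model.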
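To test the weekend hypothesis itself, one option is to collapse enrolment days into weekday versus weekend and compare response rates (responders divided by all customers enrolled in that group) instead of raw responder counts, which also depend on how many customers enrolled on each day. The sketch below uses base R and assumes `Dt_Customer` is a Date and `Response` is coded 0/1 (numeric or factor) as above; the function name `weekend_response_test` and the toy data are hypothetical and say nothing about the real dataset.

```r
# Sketch (assumption): Dt_Customer is a Date, Response is 0/1 (numeric or factor).
# weekend_response_test is a hypothetical helper, not part of any package.
weekend_response_test <- function(df) {
  wday_num <- as.POSIXlt(df$Dt_Customer)$wday  # 0 = Sunday ... 6 = Saturday, locale-independent
  enrolled <- factor(ifelse(wday_num %in% c(0, 6), "weekend", "weekday"),
                     levels = c("weekday", "weekend"))
  responded <- as.character(df$Response) == "1"

  rates <- tapply(responded, enrolled, mean)      # response rate within each group
  test  <- chisq.test(table(enrolled, responded)) # 2x2 test of independence
  list(rates = rates, p_value = test$p.value)
}

# Hypothetical toy data: two weeks of enrolments starting Saturday 2012-09-01
toy <- data.frame(
  Dt_Customer = as.Date("2012-09-01") + 0:13,
  Response    = c(1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1)
)
# suppressWarnings(): the toy sample is far too small for the chi-square approximation
res <- suppressWarnings(weekend_response_test(toy))
res$rates  # weekday 0.2, weekend 0.5
```

On the real data the call would be `weekend_response_test(data)`. Using `as.POSIXlt(...)$wday` rather than `weekdays()` keeps the weekend check independent of the system locale.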

Question 2

Does income level predict campaign response?

Hypothesis: Higher-income customers are more likely to respond to campaigns.

data3 = data %>% filter(!is.na(Income))
mean_income = mean(data3$Income)

data3 = mutate(data3, 
               income_group = factor(case_when(Income < mean_income ~ "below average", TRUE ~ "above average")))

ch2 = chisq.test(data3$Response, data3$income_group)
ch2

## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  data3$Response and data3$income_group
## X-squared = 24.483, df = 1, p-value = 7.498e-07

data3 = data3 %>%
  filter(Response == 1) %>%
  group_by(income_group) %>%
  summarise(count = n())

ggplot(data = data3) +
  geom_bar(aes(x = income_group, y = count), 
           stat = "identity", fill = "#104E8B") +
  geom_text(aes(x = income_group, y = count,
                label = count), vjust = -0.5, size = 3.5) +
  labs(title = "Campaign responses by income level",
    subtitle = paste0("Mean income threshold: $", 
                      round(mean_income, 0)),
    x = "Income group",
    y = "Number of responses") + theme_classic()

Results: The hypothesis is confirmed since the p-value is well below 0.01 (p = 7.5e-07). The chi-square test shows a statistically significant association between income group and response, with above-average-income customers more likely to respond.
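The chart above shows responder counts, but the two income groups are not the same size, so counts alone do not show how much more often each group responds. A minimal base-R sketch of the within-group response rate and the ratio between the two groups follows, assuming the same mean split and a 0/1 `Response`; `income_response_rates` and the toy values are hypothetical.

```r
# Sketch: response rate within each income group (mean split, as above).
# Assumes columns Income (numeric, may contain NA) and Response (0/1 or factor).
income_response_rates <- function(df) {
  df <- df[!is.na(df$Income), ]
  threshold <- mean(df$Income)
  group <- ifelse(df$Income < threshold, "below average", "above average")
  responded <- as.character(df$Response) == "1"

  rates <- tapply(responded, group, mean)
  list(threshold = threshold,
       rates = rates,
       # how many times more often the above-average group responds
       lift = rates[["above average"]] / rates[["below average"]])
}

# Hypothetical toy data (not from the campaign dataset)
toy <- data.frame(Income   = c(20000, 30000, 40000, 80000, 90000, NA),
                  Response = c(0, 0, 1, 1, 1, 0))
inc <- income_response_rates(toy)
inc$threshold  # 52000
inc$rates      # above average: 1, below average: about 0.33
```

`income_response_rates(data)` would return the rates for the real dataset; a lift above 1 means the above-average group responds more often.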

Question 3

Does the purchase channel (website or in-store) affect whether customers respond to the campaign?

Hypothesis: Customers who make more of their purchases on the website rather than in the store are more likely to respond to the campaign.

t.test(data$NumWebPurchases ~ data$Response)

## 
##  Welch Two Sample t-test
## 
## data:  data$NumWebPurchases by data$Response
## t = -7.5416, df = 481.43, p-value = 2.33e-13
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -1.4622265 -0.8577715
## sample estimates:
## mean in group 0 mean in group 1 
##        3.911857        5.071856

ggplot(data, aes(x = Response, y = NumWebPurchases)) +
  geom_boxplot(fill = "#BFEFFF") +
  labs(title = "Campaign responses by number of online purchases",
       x = "Response (0 = no, 1 = yes)",
       y = "Number of purchases made on website") +
  theme_classic()

t.test(data$NumStorePurchases ~ data$Response)

## 
##  Welch Two Sample t-test
## 
## data:  data$NumStorePurchases by data$Response
## t = -1.9458, df = 474.81, p-value = 0.05226
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -0.721904281  0.003529907
## sample estimates:
## mean in group 0 mean in group 1 
##        5.736621        6.095808

ggplot(data, aes(x = Response, y = NumStorePurchases)) +
  geom_boxplot(fill = "#BFEFFF") +
  labs(title = "Campaign responses by number of offline purchases",
       x = "Response (0 = no, 1 = yes)",
       y = "Number of purchases made in the store") +
  theme_classic()

Results: The hypothesis is partly confirmed. The t-test shows a statistically significant difference in mean web purchases between responders and non-responders (p < 0.01): responders made 5.07 web purchases on average versus 3.91 for non-responders. For in-store purchases the difference is not significant at the 0.01 level (p = 0.052; means 6.10 vs 5.74). This suggests that digital engagement, rather than overall purchase frequency, is the stronger behavioural signal of campaign responsiveness. Note that the two tests examine each channel separately; they do not directly test whether responders make a larger share of their purchases online.
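One way to test the "website rather than store" part of the hypothesis more directly is to compare the online share of each customer's web and store purchases between responders and non-responders. A minimal sketch, assuming the column names used above; `web_share_test` and the toy values are hypothetical, and customers with no web or store purchases are left out because their share is undefined.

```r
# Sketch: Welch t-test on the online share of web + store purchases.
# Assumes columns NumWebPurchases, NumStorePurchases and Response (0/1 or factor).
web_share_test <- function(df) {
  total <- df$NumWebPurchases + df$NumStorePurchases
  share <- ifelse(total > 0, df$NumWebPurchases / total, NA)  # undefined with no purchases
  d <- data.frame(share = share, Response = factor(df$Response))
  list(share = share,
       test  = t.test(share ~ Response, data = d))  # Welch by default; NA rows dropped
}

# Hypothetical toy data (not from the campaign dataset)
toy <- data.frame(NumWebPurchases   = c(2, 4, 6, 8, 1, 3),
                  NumStorePurchases = c(8, 6, 4, 2, 9, 7),
                  Response          = c(0, 0, 1, 1, 0, 1))
ws <- web_share_test(toy)
ws$share          # 0.2 0.4 0.6 0.8 0.1 0.3
ws$test$estimate  # mean share about 0.23 in group 0, about 0.57 in group 1
```

On the real data, `web_share_test(data)$test` would report whether responders buy a larger fraction of their goods online, independently of how much they buy in total.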

Question 4

Does the number of days since the last purchase affect campaign response?

Hypothesis: Customers who purchased more recently are more likely to respond to the campaign.

t.test(data$Recency ~ data$Response)

## 
##  Welch Two Sample t-test
## 
## data:  data$Recency by data$Response
## t = 9.786, df = 465.81, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  12.89220 19.37072
## sample estimates:
## mean in group 0 mean in group 1 
##        51.51469        35.38323

ggplot(data, aes(x = Response, y = Recency)) +
  geom_boxplot(fill = "#BFEFFF") +
  labs(title = "Campaign responses by recency of the purchase",
       x = "Response (0 = no, 1 = yes)",
       y = "Days since the last purchase") +
  theme_classic()

Results: The hypothesis is confirmed. The t-test shows a statistically significant difference in recency between responders and non-responders (p < 0.01). Responders had made their last purchase 35.4 days earlier on average, compared with 51.5 days for non-responders, indicating more recent engagement.
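For targeting, it can help to translate the recency difference into response rates for a few recency bands. A minimal sketch; the 30- and 60-day band edges and the helper name `recency_response_rates` are illustrative assumptions, not part of the original analysis, and the toy values are hypothetical.

```r
# Sketch: response rate by recency band. Band edges are an illustrative assumption.
recency_response_rates <- function(df,
                                   breaks = c(-Inf, 30, 60, Inf),
                                   labels = c("0-30 days", "31-60 days", "61+ days")) {
  band <- cut(df$Recency, breaks = breaks, labels = labels)  # right-closed intervals
  responded <- as.character(df$Response) == "1"
  tapply(responded, band, mean)
}

# Hypothetical toy data (not from the campaign dataset)
toy <- data.frame(Recency  = c(5, 10, 25, 40, 50, 70, 90, 95),
                  Response = c(1, 1, 0, 1, 0, 0, 0, 0))
rates <- recency_response_rates(toy)
rates  # 0-30 days: about 0.67, 31-60 days: 0.5, 61+ days: 0
```

Because `Recency` is a whole number of days, the interval (30, 60] produced by `cut()` corresponds to 31 to 60 days.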

Predictive Model

set.seed(888)
train_index = sample(1:nrow(data), size = 0.8 * nrow(data))

train = data[train_index, ]
test = data[-train_index, ]

tree = rpart(
  Response ~ Income + NumWebPurchases + Recency,
  data = train,
  method = "class",
  weights = ifelse(train$Response == 1, 5, 1)) # weight responders (class 1) 5x, since non-responders greatly outnumber them
rpart.plot(tree)
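The weight of 5 is set by hand. A minimal sketch of deriving it from the data instead, as the number of non-responders per responder in the training set; `balance_weights` is a hypothetical helper, not part of rpart.

```r
# Sketch: case weights that give class "1" a weight equal to the
# negatives-per-positive ratio, and class "0" a weight of 1.
balance_weights <- function(y) {
  y <- as.character(y)
  ratio <- sum(y == "0") / sum(y == "1")  # non-responders per responder
  ifelse(y == "1", ratio, 1)
}

y_toy <- factor(c(0, 0, 0, 0, 0, 1))  # hypothetical labels: 5 negatives, 1 positive
balance_weights(y_toy)                # 1 1 1 1 1 5
```

In the model above this would read `weights = balance_weights(train$Response)`. As an alternative to case weights, `rpart` also accepts a loss matrix for classification trees through `parms = list(loss = ...)`.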

pred_class = predict(
  tree,
  newdata = test,
  type = "class")

pred_prob = predict(
  tree,
  newdata = test,
  type = "prob")

mean(pred_class == test$Response)

## [1] 0.7388393

table(pred_class, test$Response)

##           
## pred_class   0   1
##          0 291  31
##          1  86  40

roc_obj = roc(test$Response, pred_prob[,2])
auc(roc_obj)

## Area under the curve: 0.6744

plot(roc_obj)

Model accuracy: 0.74. Area under the curve (AUC): 0.67. Given the class imbalance in the dataset, overall accuracy is less informative than AUC: predicting "no response" for every customer in the test set would already be correct for 377 of 448 customers (0.84), more than the tree achieves. The AUC of 0.67 indicates a modest ability to rank responders above non-responders, better than the random baseline of 0.5. While the model is not precise at the individual level, it provides useful segmentation and ranking for marketing targeting.
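Because of the class imbalance, per-class metrics say more than accuracy alone. The sketch below derives them from the test-set confusion matrix printed above (rows are predicted classes, columns are actual classes); `classification_summary` is a hypothetical helper name.

```r
# Test-set confusion matrix copied from the output above
cm <- matrix(c(291, 86, 31, 40), nrow = 2,
             dimnames = list(predicted = c("0", "1"), actual = c("0", "1")))

# Sketch: imbalance-aware metrics from a 2x2 predicted-by-actual table
classification_summary <- function(cm) {
  tp <- cm["1", "1"]; tn <- cm["0", "0"]
  fp <- cm["1", "0"]; fn <- cm["0", "1"]
  c(accuracy    = (tp + tn) / sum(cm),
    baseline    = (tn + fp) / sum(cm),  # accuracy of always predicting "no response"
    sensitivity = tp / (tp + fn),       # share of actual responders the tree flags
    specificity = tn / (tn + fp),       # share of non-responders correctly left out
    precision   = tp / (tp + fp))       # share of flagged customers who respond
}

round(classification_summary(cm), 3)
# accuracy 0.739, baseline 0.842, sensitivity 0.563, specificity 0.772, precision 0.317
```

With the fitted model, `classification_summary(table(pred_class, test$Response))` gives the same figures directly.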

The decision tree suggests that Income is the strongest predictor of campaign response, serving as the primary segmentation factor for customer targeting. Customers with higher incomes are more likely to respond, suggesting that purchasing power is a key driver of marketing engagement.

Among lower-income customers, Recency becomes the most important behavioural factor: customers who purchased more recently show noticeably higher predicted response rates, especially when combined with online purchasing activity.
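The root split shows which variable the tree uses first, but rpart also records an overall importance score per variable, which includes credit for surrogate splits. A minimal sketch that rescales these scores to percentages; `importance_percent` is a hypothetical helper, demonstrated on the `kyphosis` data bundled with rpart so that it runs standalone.

```r
library(rpart)  # recommended package, included in standard R installations

# Sketch: share of total variable importance per predictor in a fitted tree.
# rpart accumulates importance from primary and surrogate splits, so a
# variable can score highly even if it is not the root split.
importance_percent <- function(fit) {
  imp <- fit$variable.importance
  round(100 * imp / sum(imp), 1)
}

# Demonstration on the kyphosis data shipped with rpart (not the campaign data)
demo_fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")
importance_percent(demo_fit)
```

Applied to the model above, `importance_percent(tree)` would show how Income, NumWebPurchases and Recency compare.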

Key Findings and Recommendations

Income significantly predicts response -> Prioritise the above-average-income segment in campaign targeting
Recency influences response -> Concentrate campaign efforts on recently active customers
Web purchasing is associated with higher response -> Prioritise digital campaign channels
Low-income, inactive customers show low response probability -> Reduce marketing spend or use low-cost outreach for this segment
Enrolment day of week shows no significant association with response -> Exclude it from targeting criteria