Modeling Customer Engagement in Retail Promotions Through Logistic Regression

Luke Volm

2025-12-09

Agenda

Background

Description of Data Set

Definitions of Key Customer Behavior Variables

| Variable | Meaning | Type |
|---|---|---|
| FRE | Number of purchase visits | Numeric |
| MON | Total net sales ($ spent) | Numeric ($) |
| STYLES | Number of individual items purchased | Numeric |
| PROMOS | Marketing promotions on file | Numeric |
| DAYS | Customer tenure (days in the system) | Numeric |
| PERCRET | Percent of products returned | Numeric (%) |
| HI | Product uniformity | Numeric |
| GMP | Gross margin percentage | Numeric |
| CLASSES | Number of different product classes purchased | Numeric |
| STORES | Number of stores shopped at | Numeric |

Practical Questions

Key Data Insights

Exploratory Findings

Model Implications

Key Data Insights (Continued)

Exploratory Data Analysis

Data Preparation for Modeling

Summary of Variable Transformations Used for Logistic Regression

| Original | Transformed | Transformation | Rationale |
|---|---|---|---|
| FRE | FRE_log | log1p(FRE) | Right-skewed visit counts |
| MON | MON_log | log1p(MON) | Right-skewed spending |
| STYLES | STYLES_log | log1p(STYLES) | Right-skewed item counts |
| CLASSES | CLASSES_log | log1p(CLASSES) | Right-skewed product mix |
| STORES | STORES_log | log1p(STORES) | Right-skewed store count |
| HI | HI_log | log1p(HI) | Right-skewed product uniformity |
| PERCRET | PERCRET_logit | qlogis(PERCRET / 100) | Bounded 0–100% proportion |

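The transformations in the table can be sketched in plain Python, where `math.log1p` plays the role of R's `log1p()` and a hand-written logit stands in for `qlogis()`. The customer values and the clipping constant `eps` are assumptions made for illustration: the slides do not say how return rates of exactly 0% or 100% were handled, and the raw logit is infinite at those points.

```python
import math

def log1p_transform(x):
    """log(1 + x), the Python counterpart of R's log1p()."""
    return math.log1p(x)

def logit_percent(pct, eps=1e-6):
    """Logit of a percentage in [0, 100], analogous to qlogis(pct / 100).

    eps is a hypothetical clipping constant (not from the slides) that
    keeps the result finite when pct is exactly 0 or 100.
    """
    p = min(max(pct / 100.0, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

# Hypothetical customer, not taken from the data set.
customer = {"FRE": 4, "MON": 368.46, "PERCRET": 25.0}

transformed = {
    "FRE_log": log1p_transform(customer["FRE"]),
    "MON_log": log1p_transform(customer["MON"]),
    "PERCRET_logit": logit_percent(customer["PERCRET"]),
}
print(transformed)
```

Using log1p rather than a plain log keeps zero counts defined, since log1p(0) = 0.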
Standardizing the Response Variable

| Raw_Format | Cleaned_Level |
|---|---|
| ‘Yes’, 1 | yes |
| ‘No’, 0 | no |
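A minimal sketch of this recoding, assuming the raw response column mixes the four formats listed in the table. The function name `clean_response` is hypothetical, and raising an error on any other value is a design choice of this sketch rather than something the slides describe.

```python
def clean_response(raw):
    """Map a raw response value ('Yes', 'No', 1, 0) to 'yes' or 'no'."""
    key = str(raw).strip().lower()
    if key in {"yes", "1"}:
        return "yes"
    if key in {"no", "0"}:
        return "no"
    # Assumption: any other value is treated as a data error.
    raise ValueError(f"unexpected response value: {raw!r}")

raw_values = ["Yes", 1, "No", 0]
cleaned = [clean_response(v) for v in raw_values]
print(cleaned)  # ['yes', 'yes', 'no', 'no']
```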

Train/Test Split

Candidate 1: Full Model

\[ \log\left(\frac{P(\text{RESP}=1)}{1 - P(\text{RESP}=1)}\right) = \beta_0 + \beta_1 \,\text{MON}_{\log} + \beta_2 \,\text{FRE}_{\log} + \beta_3 \,\text{STYLES}_{\log} + \beta_4 \,\text{CLASSES}_{\log} + \beta_5 \,\text{STORES}_{\log} + \beta_6 \,\text{HI}_{\log} + \beta_7 \,\text{PERCRET}_{\text{logit}} + \beta_8 \,\text{PROMOS} + \beta_9 \,\text{DAYS} + \beta_{10} \,\text{GMP} \]
Full Model: Logistic Regression Coefficients

| Term | Estimate | Std. Error | z value | Pr(>\|z\|) | Signif. |
|---|---|---|---|---|---|
| (Intercept) | -3.2000 | 0.9679 | -3.306 | 0.000946 | *** |
| MON_log | -0.0065 | 0.2574 | -0.025 | 0.9798 | |
| FRE_log | 1.4238 | 0.2446 | 5.822 | 5.83e-09 | *** |
| STYLES_log | 0.5901 | 0.3271 | 1.804 | 0.0712 | . |
| CLASSES_log | -0.4154 | 0.3825 | -1.086 | 0.2775 | |
| STORES_log | 0.0390 | 0.2369 | 0.165 | 0.8691 | |
| HI_log | -0.2156 | 0.2211 | -0.975 | 0.3296 | |
| PERCRET_logit | -0.0188 | 0.0282 | -0.666 | 0.5052 | |
| PROMOS | -0.0375 | 0.0152 | -2.464 | 0.0137 | * |
| DAYS | -0.000874 | 0.000515 | -1.697 | 0.0898 | . |
| GMP | -0.6675 | 0.5612 | -1.189 | 0.2343 | |

Significance codes: *** p < 0.001, ** p < 0.01, * p < 0.05, . p < 0.1

VIF Results for Candidate 1

##       MON_log       FRE_log    STYLES_log   CLASSES_log    STORES_log 
##         12.04          5.52         15.81          9.95          1.52 
##        HI_log PERCRET_logit        PROMOS          DAYS           GMP 
##          3.56          2.07          2.27          1.86          1.70
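To read these numbers, it can help to convert each VIF into the share of a predictor's variance explained by the other predictors, using the standard linear-model identity VIF = 1 / (1 - R²). The sketch below applies that identity to the Candidate 1 values. The cutoff of 5 is a common rule of thumb and an assumption here, since the slides do not state a threshold, and for a logistic model the R² reading is only approximate.

```python
# VIF values copied from the Candidate 1 output above.
vif = {
    "MON_log": 12.04, "FRE_log": 5.52, "STYLES_log": 15.81,
    "CLASSES_log": 9.95, "STORES_log": 1.52, "HI_log": 3.56,
    "PERCRET_logit": 2.07, "PROMOS": 2.27, "DAYS": 1.86, "GMP": 1.70,
}

THRESHOLD = 5.0  # assumed rule-of-thumb cutoff, not taken from the slides

def implied_r2(v):
    """R^2 of a predictor regressed on the others, from VIF = 1 / (1 - R^2)."""
    return 1.0 - 1.0 / v

# Print predictors from most to least collinear.
for name, value in sorted(vif.items(), key=lambda kv: -kv[1]):
    flag = "high" if value > THRESHOLD else "ok"
    print(f"{name:14s} VIF={value:6.2f}  implied R^2={implied_r2(value):.3f}  {flag}")

flagged = [name for name, value in vif.items() if value > THRESHOLD]
```

Under that cutoff the flagged predictors are MON_log, FRE_log, STYLES_log, and CLASSES_log, which all measure purchase volume in some form.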

Reducing Redundancy

Candidate 2: Reduced Model

\[ \log\left(\frac{P(\text{RESP}=1)}{1 - P(\text{RESP}=1)}\right) = \beta_0 + \beta_1 \,\text{FRE}_{\log} + \beta_2 \,\text{STYLES}_{\log} + \beta_3 \,\text{PROMOS} \]
Reduced Model: Logistic Regression Coefficients

| Term | Estimate | Std. Error | z value | Pr(>\|z\|) | Signif. |
|---|---|---|---|---|---|
| (Intercept) | -4.5782 | 0.2498 | -18.326 | 5.13e-75 | *** |
| FRE_log | 1.3511 | 0.2146 | 6.296 | 3.05e-10 | *** |
| STYLES_log | 0.4557 | 0.1689 | 2.698 | 0.00698 | ** |
| PROMOS | -0.0547 | 0.0130 | -4.205 | 2.61e-05 | *** |
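As an illustration of how the fitted reduced model turns customer attributes into a response probability, the sketch below plugs the estimates above (rounded to six decimals) into the inverse logit. The two customers are hypothetical and are not drawn from the data set.

```python
import math

# Reduced-model estimates from the table above, rounded to six decimals.
COEF = {
    "(Intercept)": -4.578212,
    "FRE_log": 1.351098,
    "STYLES_log": 0.455713,
    "PROMOS": -0.054747,
}

def predict_prob(fre, styles, promos):
    """P(RESP = yes) for raw FRE, STYLES, and PROMOS values."""
    eta = (COEF["(Intercept)"]
           + COEF["FRE_log"] * math.log1p(fre)
           + COEF["STYLES_log"] * math.log1p(styles)
           + COEF["PROMOS"] * promos)
    return 1.0 / (1.0 + math.exp(-eta))  # inverse logit

# Hypothetical customers, chosen only to contrast low and high engagement.
occasional = predict_prob(fre=1, styles=2, promos=10)
frequent = predict_prob(fre=10, styles=20, promos=10)
print(f"occasional shopper: {occasional:.3f}")
print(f"frequent shopper:   {frequent:.3f}")
```

The raw counts are transformed with log1p inside the function, so callers pass values on the original scale.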

VIF Results for Reduced Model

##    FRE_log STYLES_log     PROMOS 
##       4.29       4.29       1.66

Cross Validation for Model Selection

Cross-Validation Accuracy: Model Comparison

| Model | Accuracy |
|---|---|
| Full Model | 0.8603 |
| Reduced Model | 0.8609 |
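For readers unfamiliar with how accuracies like these are produced, here is a generic k-fold cross-validation sketch. The slides do not state the number of folds or the averaging scheme, so `k=5`, the fold-mean accuracy, and the helper names are all assumptions. A majority-class classifier on toy labels stands in for the logistic model to keep the example dependency-free.

```python
import random

def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cv_accuracy(rows, labels, fit, predict, k=5, seed=0):
    """Mean held-out accuracy over k folds."""
    scores = []
    for held_out in kfold_indices(len(rows), k, seed):
        held = set(held_out)
        train = [i for i in range(len(rows)) if i not in held]
        model = fit([rows[i] for i in train], [labels[i] for i in train])
        hits = sum(predict(model, rows[i]) == labels[i] for i in held_out)
        scores.append(hits / len(held_out))
    return sum(scores) / len(scores)

# Stand-in "model": always predict the most common training label.
def fit_majority(rows, labels):
    return max(set(labels), key=labels.count)

def predict_majority(model, row):
    return model

# Toy data (hypothetical): 17 non-responders, 3 responders.
toy_rows = list(range(20))
toy_labels = ["no"] * 17 + ["yes"] * 3
toy_score = cv_accuracy(toy_rows, toy_labels, fit_majority, predict_majority)
print(f"toy cross-validated accuracy: {toy_score:.2f}")
```

Swapping in `fit` and `predict` functions for each candidate model would allow a comparison on identical folds, which matters when the two accuracies differ only in the fourth decimal place.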

Model Selection

Reduced Logistic Regression Model

\[ \log\left(\frac{P(\text{RESP}=1)}{1 - P(\text{RESP}=1)}\right) = \beta_0 + \beta_1 \,\text{FRE}_{\log} + \beta_2 \,\text{STYLES}_{\log} + \beta_3 \,\text{PROMOS} \]

Why?

Results

Reduced Model: Odds Ratios and 95% Confidence Intervals

| Predictor | Odds_Ratio | Lower_95 | Upper_95 |
|---|---|---|---|
| FRE_log | 3.86 | 2.55 | 5.92 |
| STYLES_log | 1.58 | 1.13 | 2.20 |
| PROMOS | 0.95 | 0.92 | 0.97 |
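The odds ratios are the exponentiated reduced-model coefficients. The sketch below reproduces them from the reported estimates and standard errors and adds Wald intervals, exp(estimate ± z × SE). The use of Wald intervals is an assumption of this sketch: the slides do not say how the intervals were computed, and the Wald bounds for FRE_log come out slightly narrower than the reported 2.55 to 5.92, which suggests a different method (possibly profile likelihood) was used.

```python
import math

# (estimate, standard error) from the reduced-model coefficient table.
REDUCED = {
    "FRE_log": (1.35109844283195, 0.214594391078006),
    "STYLES_log": (0.4557129724829, 0.168925771235963),
    "PROMOS": (-0.0547471355208317, 0.0130184330353159),
}

Z_95 = 1.959964  # 97.5% quantile of the standard normal distribution

def odds_ratio_ci(est, se, z=Z_95):
    """Return (odds ratio, Wald lower bound, Wald upper bound)."""
    return math.exp(est), math.exp(est - z * se), math.exp(est + z * se)

results = {name: odds_ratio_ci(est, se) for name, (est, se) in REDUCED.items()}
for name, (ratio, lower, upper) in results.items():
    print(f"{name:11s} OR={ratio:.2f}  Wald 95% CI=({lower:.2f}, {upper:.2f})")

# Cumulative effect of ten additional promotions on the odds of responding.
ten_promos = math.exp(10 * REDUCED["PROMOS"][0])
print(f"odds multiplier for +10 promotions: {ten_promos:.2f}")
```

Because odds ratios multiply, a per-promotion ratio of 0.95 compounds: ten additional promotions correspond to an odds multiplier of roughly 0.58.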

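FRE_log (purchase frequency)
- Odds ratio 3.86 (95% CI 2.55 to 5.92): a one-unit increase in log1p(FRE), roughly a 2.7-fold increase in (1 + FRE), is associated with almost four times the odds of responding
- Customers who shop more often are much more likely to engage with promotions

STYLES_log (product variety)
- Odds ratio 1.58 (95% CI 1.13 to 2.20): buying a wider variety of items is associated with higher response odds, holding the other predictors fixed
- Variety shoppers seem more engaged with the brand overall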

PROMOS (number of promotions on file)
- Odds ratio 0.95 (95% CI 0.92 to 0.97): each additional promotion on file is associated with about 5% lower odds of responding
- Consistent with promotion fatigue

Test Set Performance

Confusion Matrix on Test Set (Accuracy = 0.8397)

|  | no | yes |
|---|---|---|
| no | 626 | 83 |
| yes | 43 | 34 |
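The reported accuracy can be checked directly from the four cell counts, as sketched below. Class-specific metrics depend on which axis holds the predictions, and the slide does not label the axes. The sketch assumes rows are predicted classes and columns are actual classes; if the table is oriented the other way, sensitivity and specificity become 34/77 and 626/709 instead.

```python
# Cell counts from the confusion matrix: matrix[row][column].
matrix = {
    "no": {"no": 626, "yes": 83},
    "yes": {"no": 43, "yes": 34},
}

total = sum(sum(row.values()) for row in matrix.values())
correct = matrix["no"]["no"] + matrix["yes"]["yes"]
accuracy = correct / total  # does not depend on the table's orientation

# ASSUMPTION: rows = predicted class, columns = actual class.
tp = matrix["yes"]["yes"]  # predicted yes, actually yes
fn = matrix["no"]["yes"]   # predicted no, actually yes
fp = matrix["yes"]["no"]   # predicted yes, actually no
tn = matrix["no"]["no"]    # predicted no, actually no

sensitivity = tp / (tp + fn)  # share of actual responders identified
specificity = tn / (tn + fp)  # share of actual non-responders identified

print(f"n = {total}, accuracy = {accuracy:.4f}")
print(f"sensitivity = {sensitivity:.4f}, specificity = {specificity:.4f}")
```

Reporting sensitivity alongside accuracy is useful here because responders are the minority class, so a high overall accuracy can coexist with many missed responders.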

Conclusions

Limitations & Future Work