The goal of this project is to evaluate and understand the performance of a marketing food campaign.
Key performance indicators (KPI’s) are used to evaluate the performance of the campaigns as well as understanding segments of the customer base. Income, amount spent on certain items, education level, relationship status, and children presence are some of the key indicators that will be used to gain insight into the campaign.
Analysis —Who is the audience?
Understanding —What is the audience’s knowledge and attitude toward the campaign?
Demographics —What is the audience’s age, gender, education, location,and so on?
Interest —Why is the audience reading, sharing, and interacting with your brand content?
Environment —Where does the audience spend time accepting the campaign?
Needs —What are the audience needs associated with your campaign?
Customization —What specific needs and/or interests should the brand address in order to add value for the audience?
Initially the raw dataset contains 2205 Observations/Rows and 39 variables Features/Variables/Columns total.
Through SAS manipulation and cleaning, the dimensionality grew in adding 24 more variables.
Link to SAS manipulation and cleaning here
For this project there, 2205 Observations/Rows and 53 Features/Variables/Columns will be used.
There are no missing values as the data set was curated and cleaned prior.
Three variables were removed due to the lack of relevance to the project. ## Data Glossary
People
Products
Depending if there is enough variability or need, this may be broken down into the last two variables of this list.
Promotion
Some variables in this section will be aggregated to a variable that accounts for all responses/acceptances. Don’t care when the campaign was accepted, care who it worked for.
Product Attainment Allocation
Relationship status (0 or 1)
These may be consolidated to single and not single if analysis shows it is a more practical approach.
Education status (0 or 1)
May also be consolidated to college and no college is analysis allows.
The following table shows the average and standard deviation of those who responded to all campaigns and those who didn’t.
| Characteristic | Did not Respond, N = 1,6011 | Respond, N = 6041 | Test Statistic | p-value2 |
|---|---|---|---|---|
| Income | 47,813.08 (19,110.66) | 61,718.51 (21,411.17) | -14 | <0.001 |
| Recency | 50.82 (28.59) | 44.20 (29.31) | 4.8 | <0.001 |
| MntWines | 219.01 (259.18) | 537.18 (405.53) | -18 | <0.001 |
| MntFruits | 23.00 (37.25) | 35.43 (44.62) | -6.1 | <0.001 |
| MntMeatProducts | 124.26 (177.01) | 274.12 (271.82) | -13 | <0.001 |
| MntFishProducts | 32.42 (50.23) | 51.91 (63.36) | -6.8 | <0.001 |
| MntSweetProducts | 23.01 (37.61) | 38.04 (47.61) | -7.0 | <0.001 |
| MntGoldProds | 38.01 (47.76) | 60.10 (58.11) | -8.3 | <0.001 |
| NumDealsPurchases | 2.35 (1.87) | 2.22 (1.93) | 1.5 | 0.15 |
| NumWebPurchases | 3.73 (2.71) | 5.08 (2.58) | -11 | <0.001 |
| NumCatalogPurchases | 2.08 (2.54) | 4.13 (2.92) | -15 | <0.001 |
| NumStorePurchases | 5.53 (3.19) | 6.60 (3.25) | -6.9 | <0.001 |
| NumWebVisitsMonth | 5.44 (2.34) | 5.06 (2.58) | 3.2 | 0.001 |
| Age | 50.87 (11.49) | 51.70 (12.25) | -1.5 | 0.15 |
| Customer_Days | 2,502.17 (198.15) | 2,540.67 (211.45) | -3.9 | <0.001 |
| MntTotal | 421.70 (466.79) | 936.68 (664.35) | -17 | <0.001 |
| TOT_Purchase | 11.35 (6.96) | 15.81 (6.73) | -14 | <0.001 |
| ChildrenBV | 1,254 / 1,601 (78%) | 323 / 604 (53%) | 133 | <0.001 |
| SingleBV | 538 / 1,601 (34%) | 245 / 604 (41%) | 9.3 | 0.002 |
| CollegeBV | 1,401 / 1,601 (88%) | 552 / 604 (91%) | 6.5 | 0.011 |
| TOT_Prods | 459.71 (493.35) | 996.78 (683.79) | -18 | <0.001 |
| TOT_Grocery | 421.70 (466.79) | 936.68 (664.35) | -17 | <0.001 |
| 1 Mean (SD); n / N (%) | ||||
| 2 Welch Two Sample t-test; Pearson’s Chi-squared test | ||||
In order to check for significance of the variables between two samples, T-test and chi squared test are used to check for significance of features.
Notable observations:
Customers who accepted/responded to campaigns.
Rejected/No Response to campaigns
Similar Statistics
For the analysis and predictions, the amount spent on wines, the presence of children, and income will be focus points on but will keep other variables in mind throughout.
Observations of Income and Response Class Histogram
Where to go from here
Observations from Income vs Cumulative Sum of Responses
The following table shows how the cumulative sum is calculated as it related to campaign response and income.
| Income | AcceptBV | FreqAccept | CSAccept | |
|---|---|---|---|---|
| 1226 | 1730 | 0 | -1 | -1 |
| 21 | 2447 | 0 | -1 | -2 |
| 1500 | 3502 | 0 | -1 | -3 |
| 1821 | 4023 | 0 | -1 | -4 |
| 1950 | 4428 | 0 | -1 | -5 |
| 962 | 4861 | 0 | -1 | -6 |
| 2182 | 5305 | 0 | -1 | -7 |
| 10 | 5648 | 1 | 1 | -6 |
| 1308 | 6560 | 0 | -1 | -7 |
| 755 | 6835 | 0 | -1 | -8 |
| 1781 | 7144 | 0 | -1 | -9 |
| 11 | 7500 | 0 | -1 | -10 |
| 42 | 7500 | 0 | -1 | -11 |
| 44 | 7500 | 1 | 1 | -10 |
| 226 | 7500 | 0 | -1 | -11 |
| 423 | 7500 | 0 | -1 | -12 |
| 705 | 7500 | 1 | 1 | -11 |
| 843 | 7500 | 0 | -1 | -12 |
| 1133 | 7500 | 0 | -1 | -13 |
| 1227 | 7500 | 0 | -1 | -14 |
Fitting a polynomial to the data in order to find an inflection point that will give an idea as to where the change in income responses starts to leans towards customers that accepted the campaigns.
x = dfEDA$Income
y =dfEDA$CSAccept
plot(x,y,type="l")
lo <- loess(y~x)
xl <- seq(min(x),max(x), (max(x) - min(x))/444)
out = predict(lo,xl)
lines(xl, out, col='red', lwd=2)
infl <- c(FALSE, diff(diff(out)>0)!=0)
points(xl[infl ], out[infl ], col="blue")
xl[infl]
## [1] 83714.91
Observations of cumulative sum and income relationship
Where to go from here
The goal should be to get this shift in trend to happen earlier.
Need to make campaign more responsive to customers making an income lower than $84,000.
Why is having children noteworthy for this analysis?
Observations of Children and Campaign Responses
In order to gain more insight into these campaigns, campaigns 1 and 5 will be compared to campaigns 3 and 4.
The following table shows the statistical comparison of these two aggregations for the campaigns.
Children = Campaign 3 and 4| No Children = Campaign 1 and 5
| Characteristic | Children, N = 3271 | No Children, N = 3031 | Test Statistic | p-value2 |
|---|---|---|---|---|
| Income | 59,760.22 (20,990.13) | 80,717.95 (10,200.70) | -16 | <0.001 |
| MntWines | 565.01 (429.77) | 819.92 (338.93) | -8.3 | <0.001 |
| MntFruits | 27.82 (40.30) | 55.76 (52.16) | -7.5 | <0.001 |
| MntMeatProducts | 210.76 (238.60) | 452.31 (261.60) | -12 | <0.001 |
| MntFishProducts | 39.18 (55.74) | 83.52 (67.66) | -8.9 | <0.001 |
| MntSweetProducts | 29.29 (43.32) | 65.20 (54.03) | -9.2 | <0.001 |
| MntGoldProds | 57.62 (56.81) | 77.05 (63.58) | -4.0 | <0.001 |
| NumDealsPurchases | 2.30 (1.88) | 1.21 (1.10) | 8.9 | <0.001 |
| NumWebPurchases | 5.09 (2.76) | 5.60 (2.11) | -2.7 | 0.008 |
| NumCatalogPurchases | 3.94 (2.88) | 6.08 (2.43) | -10 | <0.001 |
| NumStorePurchases | 6.43 (3.26) | 8.17 (2.74) | -7.2 | <0.001 |
| NumWebVisitsMonth | 5.46 (2.42) | 3.20 (1.95) | 13 | <0.001 |
| Response | 139 / 327 (43%) | 170 / 303 (56%) | 12 | <0.001 |
| Age | 51.28 (12.03) | 50.91 (13.68) | 0.35 | 0.7 |
| Customer_Days | 2,515.51 (208.99) | 2,496.43 (199.47) | 1.2 | 0.2 |
| marital_Divorced | 38 / 327 (12%) | 25 / 303 (8.3%) | 2.0 | 0.2 |
| marital_Married | 125 / 327 (38%) | 128 / 303 (42%) | 1.1 | 0.3 |
| marital_Single | 71 / 327 (22%) | 63 / 303 (21%) | 0.08 | 0.8 |
| marital_Together | 79 / 327 (24%) | 75 / 303 (25%) | 0.03 | 0.9 |
| marital_Widow | 14 / 327 (4.3%) | 12 / 303 (4.0%) | 0.04 | 0.8 |
| education_2n_Cycle | 24 / 327 (7.3%) | 24 / 303 (7.9%) | 0.08 | 0.8 |
| education_Basic | 6 / 327 (1.8%) | 0 / 303 (0%) | 0.031 | |
| education_Graduation | 157 / 327 (48%) | 166 / 303 (55%) | 2.9 | 0.089 |
| education_Master | 55 / 327 (17%) | 45 / 303 (15%) | 0.46 | 0.5 |
| education_PhD | 85 / 327 (26%) | 68 / 303 (22%) | 1.1 | 0.3 |
| MntTotal | 872.06 (659.75) | 1,476.70 (462.06) | -13 | <0.001 |
| TOT_Purchase | 15.46 (6.90) | 19.84 (4.40) | -9.6 | <0.001 |
| ChildrenBV | 212 / 327 (65%) | 58 / 303 (19%) | 134 | <0.001 |
| SingleBV | 123 / 327 (38%) | 100 / 303 (33%) | 1.5 | 0.2 |
| CollegeBV | 297 / 327 (91%) | 279 / 303 (92%) | 0.32 | 0.6 |
| TOT_Prods | 929.69 (680.43) | 1,553.75 (468.22) | -13 | <0.001 |
| TOT_Grocery | 872.06 (659.75) | 1,476.70 (462.06) | -13 | <0.001 |
| 1 Mean (SD); n / N (%) | ||||
| 2 Welch Two Sample t-test; Pearson’s Chi-squared test; Fisher’s exact test | ||||
Observations of Number of Acceptances by Campaign
The latest campaign managed to generate a significant response from new customers and customers who have only responded once in the previous 5 campaigns.
No new responses to the 2nd campaign, barely any responses at all. .
| Characteristic | Campaign1, N = 1421 | Campaign2, N = 301 | Campaign3, N = 1631 | Campaign4, N = 1641 | Campaign5, N = 1611 | Response, N = 3331 |
|---|---|---|---|---|---|---|
| Income | 78,872.63 (11,337.02) | 71,054.83 (16,069.84) | 50,802.58 (22,012.66) | 68,663.23 (15,478.93) | 82,345.50 (8,800.56) | 60,209.68 (23,194.08) |
| Recency | 46.68 (28.47) | 48.67 (31.61) | 45.70 (28.51) | 50.81 (29.28) | 49.04 (29.34) | 35.26 (27.62) |
| MntWines | 758.03 (335.81) | 898.67 (467.49) | 378.66 (396.50) | 750.23 (379.36) | 874.50 (333.23) | 502.62 (427.82) |
| NumWebVisitsMonth | 3.51 (2.04) | 5.17 (2.29) | 5.85 (2.53) | 5.07 (2.25) | 2.93 (1.83) | 5.31 (2.56) |
| Age | 51.62 (13.52) | 51.87 (11.09) | 48.55 (12.04) | 53.98 (11.42) | 50.29 (13.84) | 50.50 (12.33) |
| Customer_Days | 2,481.69 (201.43) | 2,523.10 (205.68) | 2,507.02 (211.13) | 2,523.95 (207.15) | 2,509.43 (197.44) | 2,607.08 (196.47) |
| MntTotal | 1,406.70 (501.92) | 1,241.27 (547.81) | 653.60 (663.95) | 1,089.20 (580.74) | 1,538.45 (415.67) | 924.41 (698.91) |
| TOT_Purchase | 19.90 (4.52) | 18.23 (5.83) | 13.26 (8.06) | 17.63 (4.60) | 19.80 (4.30) | 15.35 (6.83) |
| ChildrenBV | 33 / 142 (23%) | 12 / 30 (40%) | 115 / 163 (71%) | 97 / 164 (59%) | 25 / 161 (16%) | 165 / 333 (50%) |
| SingleBV | 48 / 142 (34%) | 11 / 30 (37%) | 63 / 163 (39%) | 60 / 164 (37%) | 52 / 161 (32%) | 175 / 333 (53%) |
| Marital_Status | ||||||
| Divorced | 12 / 142 (8.5%) | 5 / 30 (17%) | 20 / 163 (12%) | 18 / 164 (11%) | 13 / 161 (8.1%) | 48 / 333 (14%) |
| Married | 62 / 142 (44%) | 7 / 30 (23%) | 63 / 163 (39%) | 62 / 164 (38%) | 66 / 161 (41%) | 98 / 333 (29%) |
| Single | 31 / 142 (22%) | 5 / 30 (17%) | 39 / 163 (24%) | 32 / 164 (20%) | 32 / 161 (20%) | 109 / 333 (33%) |
| Together | 32 / 142 (23%) | 12 / 30 (40%) | 37 / 163 (23%) | 42 / 164 (26%) | 43 / 161 (27%) | 60 / 333 (18%) |
| Widow | 5 / 142 (3.5%) | 1 / 30 (3.3%) | 4 / 163 (2.5%) | 10 / 164 (6.1%) | 7 / 161 (4.3%) | 18 / 333 (5.4%) |
| CollegeBV | 128 / 142 (90%) | 28 / 30 (93%) | 142 / 163 (87%) | 155 / 164 (95%) | 151 / 161 (94%) | 309 / 333 (93%) |
| Education_Level | ||||||
| Basic | 0 / 142 (0%) | 0 / 30 (0%) | 6 / 163 (3.7%) | 0 / 164 (0%) | 0 / 161 (0%) | 2 / 333 (0.6%) |
| Masters | 18 / 142 (13%) | 2 / 30 (6.7%) | 24 / 163 (15%) | 31 / 164 (19%) | 27 / 161 (17%) | 56 / 333 (17%) |
| Phd | 30 / 142 (21%) | 10 / 30 (33%) | 40 / 163 (25%) | 45 / 164 (27%) | 38 / 161 (24%) | 101 / 333 (30%) |
| Second Cycle | 14 / 142 (9.9%) | 2 / 30 (6.7%) | 15 / 163 (9.2%) | 9 / 164 (5.5%) | 10 / 161 (6.2%) | 22 / 333 (6.6%) |
| Undergrad | 80 / 142 (56%) | 16 / 30 (53%) | 78 / 163 (48%) | 79 / 164 (48%) | 86 / 161 (53%) | 152 / 333 (46%) |
| Accept | ||||||
| 1 Cmpgn | 39 / 142 (27%) | 0 / 30 (0%) | 74 / 163 (45%) | 72 / 164 (44%) | 37 / 161 (23%) | 146 / 333 (44%) |
| 2 Cmpgn | 29 / 142 (20%) | 8 / 30 (27%) | 64 / 163 (39%) | 34 / 164 (21%) | 43 / 161 (27%) | 100 / 333 (30%) |
| 3 Cmpgn | 32 / 142 (23%) | 5 / 30 (17%) | 10 / 163 (6.1%) | 27 / 164 (16%) | 37 / 161 (23%) | 42 / 333 (13%) |
| 4 Cmpgn | 32 / 142 (23%) | 7 / 30 (23%) | 12 / 163 (7.4%) | 24 / 164 (15%) | 34 / 161 (21%) | 35 / 333 (11%) |
| 5 Cmpgn | 10 / 142 (7.0%) | 10 / 30 (33%) | 3 / 163 (1.8%) | 7 / 164 (4.3%) | 10 / 161 (6.2%) | 10 / 333 (3.0%) |
| AcceptBV | 142 / 142 (100%) | 30 / 30 (100%) | 163 / 163 (100%) | 164 / 164 (100%) | 161 / 161 (100%) | 333 / 333 (100%) |
| TOT_Prods | 1,484.35 (509.56) | 1,307.67 (555.42) | 720.54 (696.27) | 1,137.56 (597.45) | 1,614.96 (420.62) | 985.66 (719.39) |
| TOT_Grocery | 1,406.70 (501.92) | 1,241.27 (547.81) | 653.60 (663.95) | 1,089.20 (580.74) | 1,538.45 (415.67) | 924.41 (698.91) |
| 1 Mean (SD); n / N (%) | ||||||
Observations
Campaign 3 and the latest campaign show similarities in the customers that responded. Low income, low wine consumption, and overall lower purchase totals.
The percentage of people who graduated college is consistently high at 87%-95%.
Campaigns 1 and 5 seem to have targeted similar high income customers.
Campaigns 3 and the response campaign targets a variety of mixed income customers.
Campaigns 2 and 4 had an upper income response as well but not as high income as campaigns 1 and 5. Campaign 2 is less than a third of all other campaigns
| Characteristic | Campaign3, N = 1151 | Campaign4, N = 971 | Response, N = 1651 |
|---|---|---|---|
| Income | 43,443.47 (17,374.09) | 62,493.96 (12,802.49) | 45,585.50 (18,618.66) |
| MntWines | 269.43 (346.82) | 653.23 (367.81) | 315.12 (392.55) |
| MntFruits | 13.98 (22.30) | 13.78 (21.03) | 13.16 (20.24) |
| MntMeatProducts | 87.17 (106.18) | 119.86 (121.18) | 104.76 (120.03) |
| MntFishProducts | 18.16 (34.33) | 23.75 (44.80) | 19.96 (35.64) |
| MntSweetProducts | 13.85 (26.08) | 14.90 (29.40) | 14.85 (25.23) |
| MntGoldProds | 58.27 (61.05) | 41.43 (39.95) | 47.29 (50.07) |
| NumDealsPurchases | 2.67 (1.61) | 3.45 (2.23) | 3.62 (2.31) |
| NumWebPurchases | 4.27 (3.18) | 5.85 (2.42) | 4.73 (2.87) |
| NumCatalogPurchases | 2.74 (2.69) | 3.11 (2.03) | 2.47 (2.46) |
| NumStorePurchases | 4.19 (2.85) | 7.34 (2.54) | 4.62 (2.46) |
| NumWebVisitsMonth | 6.70 (2.08) | 6.24 (1.45) | 7.10 (1.36) |
| Response | 47 / 115 (41%) | 24 / 97 (25%) | 165 / 165 (100%) |
| Age | 48.75 (10.71) | 54.92 (8.71) | 51.00 (10.38) |
| Customer_Days | 2,517.83 (211.15) | 2,534.20 (219.48) | 2,640.51 (191.21) |
| education_Graduation | 55 / 115 (48%) | 49 / 97 (51%) | 70 / 165 (42%) |
| MntTotal | 402.59 (475.69) | 825.52 (468.10) | 467.85 (514.78) |
| TOT_Purchase | 11.20 (7.74) | 16.30 (4.42) | 11.82 (6.55) |
| ChildrenBV | 115 / 115 (100%) | 97 / 97 (100%) | 165 / 165 (100%) |
| SingleBV | 42 / 115 (37%) | 28 / 97 (29%) | 74 / 165 (45%) |
| CollegeBV | 102 / 115 (89%) | 93 / 97 (96%) | 155 / 165 (94%) |
| TOT_Prods | 460.86 (509.54) | 866.95 (485.97) | 515.14 (537.63) |
| TOT_Grocery | 402.59 (475.69) | 825.52 (468.10) | 467.85 (514.78) |
| 1 Mean (SD); n / N (%) | |||
##
## Shapiro-Wilk normality test
##
## data: df2[df2$Campaign == "Campaign3" & df2$Accepted_C == 1 & df2$ChildrenBV == 1, ]$Income
## W = 0.97862, p-value = 0.06271
##
## Shapiro-Wilk normality test
##
## data: df2[df2$Campaign == "Campaign4" & df2$Accepted_C == 1 & df2$ChildrenBV == 1, ]$Income
## W = 0.97776, p-value = 0.09855
##
## Shapiro-Wilk normality test
##
## data: df2[df2$Campaign == "Response" & df2$Accepted_C == 1 & df2$ChildrenBV == 1, ]$Income
## W = 0.97887, p-value = 0.01265
## Anova Table (Type III tests)
##
## Response: Income
## Sum Sq Df F value Pr(>F)
## (Intercept) 2.1704e+11 1 758.652 < 2.2e-16 ***
## Campaign 2.3107e+10 2 40.383 < 2.2e-16 ***
## Residuals 1.0700e+11 374
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The more times a customer accepts the campaign, the higher the income of the customer.
set.seed(123)
training.samples <- dfmodel$Response %>%
createDataPartition(p = 0.8, list = FALSE)
train.data <- dfmodel[training.samples, ]
test.data <- dfmodel[-training.samples, ]
x <- model.matrix(Response~., train.data)[,-22]
# Convert the outcome (class) to a numerical variable
y <- ifelse(train.data$Response == 1, 1, 0)
# Find the best lambda using cross-validation
set.seed(123)
cv.lasso <- cv.glmnet(x, y, alpha = 1, family = "binomial")
plot(cv.lasso)
# Fit the final model on the training data
model <- glmnet(x, y, alpha = 1, family = "binomial",
lambda = cv.lasso$lambda.min)
# Display regression coefficients
coef(model)
## 43 x 1 sparse Matrix of class "dgCMatrix"
## s0
## (Intercept) -1.338483e+01
## (Intercept) .
## Income .
## Kidhome 2.541566e-01
## Teenhome -8.944843e-01
## Recency -3.063529e-02
## MntWines -7.711868e-04
## MntFruits 2.721773e-03
## MntMeatProducts 2.371779e-03
## MntFishProducts -2.282393e-03
## MntSweetProducts .
## MntGoldProds 1.838066e-03
## NumDealsPurchases 1.644908e-01
## NumWebPurchases 6.105248e-02
## NumCatalogPurchases 1.624755e-01
## NumStorePurchases -2.042529e-01
## NumWebVisitsMonth 7.581466e-02
## AcceptedCmp3 1.436130e+00
## AcceptedCmp4 9.776652e-01
## AcceptedCmp5 1.494024e+00
## AcceptedCmp1 1.049988e+00
## AcceptedCmp2 1.761545e+00
## Age 6.702086e-03
## Customer_Days 4.175057e-03
## marital_Divorced 5.122041e-02
## marital_Married .
## marital_Single .
## marital_Together .
## marital_Widow -1.668337e-01
## education_2n_Cycle .
## education_Basic -9.599181e-01
## education_Graduation .
## education_Master 4.117749e-01
## education_PhD 1.187721e+00
## MntTotal .
## TOT_Purchase .
## TOT_Children .
## ChildrenBV -2.648628e-01
## SingleBV 1.160085e+00
## CollegeBV 5.142897e-02
## Accept5BV 5.759373e-01
## TOT_Prods .
## TOT_Grocery .
# Make predictions on the test data
x.test <- model.matrix(Response ~., test.data)[,-22]
probabilities <- model %>% predict(newx = x.test)
predicted.classes <- ifelse(probabilities > 0.5, 1, 0)
# Model accuracy
observed.classes <- test.data$Response
mean(predicted.classes == observed.classes)
## [1] 0.8956916
# Fit the model
full.model <- glm(Response ~., data = train.data, family = binomial)
# Make predictions
probabilities <- full.model %>% predict(test.data[,-22], type = "response")
predicted.classes <- ifelse(probabilities > 0.5, 1, 0)
# Model accuracy
observed.classes <- test.data$Response
mean(predicted.classes == observed.classes)
## [1] 0.893424
Not only does the LASSO model use 12 fewer indicators than the full Logistic model, it also predicts a higher rate of the observations.