1 Intro

The goal of this project is to evaluate and understand the performance of a marketing food campaign.

Key performance indicators (KPI’s) are used to evaluate the performance of the campaigns as well as understanding segments of the customer base. Income, amount spent on certain items, education level, relationship status, and children presence are some of the key indicators that will be used to gain insight into the campaign.

1.1 Motivating Questions

Analysis —Who is the audience?

Understanding —What is the audience’s knowledge and attitude toward the campaign?

Demographics —What is the audience’s age, gender, education, location,and so on?

Interest —Why is the audience reading, sharing, and interacting with your brand content?

Environment —Where does the audience spend time accepting the campaign?

Needs —What are the audience needs associated with your campaign?

Customization —What specific needs and/or interests should the brand address in order to add value for the audience?

2 Exploratory Data Analysis

2.1 Data Glossary and SAS Utilization

Initially the raw dataset contains 2205 Observations/Rows and 39 variables Features/Variables/Columns total.

Link to raw dataset here

Through SAS manipulation and cleaning, the dimensionality grew in adding 24 more variables.

Link to SAS manipulation and cleaning here

For this project there, 2205 Observations/Rows and 53 Features/Variables/Columns will be used.

There are no missing values as the data set was curated and cleaned prior.

Three variables were removed due to the lack of relevance to the project. ## Data Glossary

People

  • Income: Customer’s yearly household income
  • Kidhome: Number of children in customer’s household
  • Teenhome: Number of teenagers in customer’s household
  • Recency: Number of days since customer’s last purchase
  • Age: Customers age
  • Customer_Days: Days since customer’s enrollment with the company
  • TOT_Children: Total number of children and teens
  • ChildrenBV: 1 if there are kids/teens, 0 otherwise
  • Children: “Has Children” or “No Children”

Products

Depending if there is enough variability or need, this may be broken down into the last two variables of this list.

  • MntWines: Amount spent on wine in last 2 years
  • MntFruits: Amount spent on fruits in last 2 years
  • MntMeatProducts: Amount spent on meat in last 2 years
  • MntFishProducts: Amount spent on fish in last 2 years
  • MntSweetProducts: Amount spent on sweets in last 2 years
  • MntGoldProds: Amount spent on gold in last 2 years
  • MntTotal: Amount Spent by all products

Promotion

Some variables in this section will be aggregated to a variable that accounts for all responses/acceptances. Don’t care when the campaign was accepted, care who it worked for.

  • AcceptedCmp1: 1 if customer accepted the offer in the 1st campaign, 0 otherwise
  • AcceptedCmp2: 1 if customer accepted the offer in the 2nd campaign, 0 otherwise
  • AcceptedCmp3: 1 if customer accepted the offer in the 3rd campaign, 0 otherwise
  • AcceptedCmp4: 1 if customer accepted the offer in the 4th campaign, 0 otherwise
  • AcceptedCmp5: 1 if customer accepted the offer in the 5th campaign, 0 otherwise
  • Response: 1 if customer accepted the offer in the last campaign, 0 otherwise
  • Complain: 1 if the customer complained, 0 otherwise
  • pattern: Responses to all campaigns in sequence
  • Accept: Number of campaigns accepted
  • AcceptBV: 1 if they accepted a campaign
  • AcceptCBV: “Respond” or “Did Not Respond”
  • Accept5BV: 1 if they accepted campaigns 1-5, 0 otherwise
  • Accept5CBV: “New Response” or “Did Not Respond”

Product Attainment Allocation

  • NumDealsPurchases: Number of purchases made with a discount
  • NumWebPurchases: Number of purchases made through the company’s web site
  • NumCatalogPurchases: Number of purchases made using a catalog
  • NumStorePurchases: Number of purchases made directly in stores
  • NumWebVisitsMonth: Number of visits to company’s web site in the last month.
  • TOT_Purchase: Total number of items purchased through mediums (NumWebPurchases+NumCatalogPurchases + NumStorePurchases)

Relationship status (0 or 1)

These may be consolidated to single and not single if analysis shows it is a more practical approach.

  • marital_Divorced: Customer is divorced.
  • marital_Married: Customer is married.
  • marital_Single: Customer is single.
  • marital_Together: Customer is in a relationship.
  • marital_widow: Customer is widowed.
  • SingleBV: 1 if Single or Divorced or Widow, 0 otherwise
  • Single: “Single” or “Not Single”
  • Marital_Status: All 5 marital options as a factor

Education status (0 or 1)

May also be consolidated to college and no college is analysis allows.

  • education_2n.Cycle: Did not pass high school.
  • education_Basic: High School/GED.
  • education_Graduation: Bachelors.
  • education_Master: Master degree.
  • education_PhD: Nerds.
  • CollegeBV: 1 if graduated from undergrad, masters or phd, 0 otherwise
  • College: “College” or “Non College Grad”
  • Education_Level: All five education options

2.2 Campaign Response

The following table shows the average and standard deviation of those who responded to all campaigns and those who didn’t.

Characteristic Did not Respond, N = 1,6011 Respond, N = 6041 Test Statistic p-value2
Income 47,813.08 (19,110.66) 61,718.51 (21,411.17) -14 <0.001
Recency 50.82 (28.59) 44.20 (29.31) 4.8 <0.001
MntWines 219.01 (259.18) 537.18 (405.53) -18 <0.001
MntFruits 23.00 (37.25) 35.43 (44.62) -6.1 <0.001
MntMeatProducts 124.26 (177.01) 274.12 (271.82) -13 <0.001
MntFishProducts 32.42 (50.23) 51.91 (63.36) -6.8 <0.001
MntSweetProducts 23.01 (37.61) 38.04 (47.61) -7.0 <0.001
MntGoldProds 38.01 (47.76) 60.10 (58.11) -8.3 <0.001
NumDealsPurchases 2.35 (1.87) 2.22 (1.93) 1.5 0.15
NumWebPurchases 3.73 (2.71) 5.08 (2.58) -11 <0.001
NumCatalogPurchases 2.08 (2.54) 4.13 (2.92) -15 <0.001
NumStorePurchases 5.53 (3.19) 6.60 (3.25) -6.9 <0.001
NumWebVisitsMonth 5.44 (2.34) 5.06 (2.58) 3.2 0.001
Age 50.87 (11.49) 51.70 (12.25) -1.5 0.15
Customer_Days 2,502.17 (198.15) 2,540.67 (211.45) -3.9 <0.001
MntTotal 421.70 (466.79) 936.68 (664.35) -17 <0.001
TOT_Purchase 11.35 (6.96) 15.81 (6.73) -14 <0.001
ChildrenBV 1,254 / 1,601 (78%) 323 / 604 (53%) 133 <0.001
SingleBV 538 / 1,601 (34%) 245 / 604 (41%) 9.3 0.002
CollegeBV 1,401 / 1,601 (88%) 552 / 604 (91%) 6.5 0.011
TOT_Prods 459.71 (493.35) 996.78 (683.79) -18 <0.001
TOT_Grocery 421.70 (466.79) 936.68 (664.35) -17 <0.001
1 Mean (SD); n / N (%)
2 Welch Two Sample t-test; Pearson’s Chi-squared test

In order to check for significance of the variables between two samples, T-test and chi squared test are used to check for significance of features.

Notable observations:

Customers who accepted/responded to campaigns.

  • Are wealthier, average 61k vs 47k.
  • Bought more products in each area but significantly more in wines.
  • Roughly half are childless.

Rejected/No Response to campaigns

  • Almost 80% have children.
  • Bought more items by means of discount

Similar Statistics

  • Education doesn’t seem to be as strong a factor as I thought it’d be.
  • Age is not significant.
  • Partner status is nearly similar.

For the analysis and predictions, the amount spent on wines, the presence of children, and income will be focus points on but will keep other variables in mind throughout.

2.2.1 Income

Observations of Income and Response Class Histogram

  • As income increases so does the ratio of people who responded with those who didn’t.

Where to go from here

  • Find out at which income will the ratio favor people that respond to campaign as income increases.

Observations from Income vs Cumulative Sum of Responses

  • Around the 75000 - 90000 income range is where the ratio favors those who respond to the campaign.

The following table shows how the cumulative sum is calculated as it related to campaign response and income.

Income AcceptBV FreqAccept CSAccept
1226 1730 0 -1 -1
21 2447 0 -1 -2
1500 3502 0 -1 -3
1821 4023 0 -1 -4
1950 4428 0 -1 -5
962 4861 0 -1 -6
2182 5305 0 -1 -7
10 5648 1 1 -6
1308 6560 0 -1 -7
755 6835 0 -1 -8
1781 7144 0 -1 -9
11 7500 0 -1 -10
42 7500 0 -1 -11
44 7500 1 1 -10
226 7500 0 -1 -11
423 7500 0 -1 -12
705 7500 1 1 -11
843 7500 0 -1 -12
1133 7500 0 -1 -13
1227 7500 0 -1 -14

Fitting a polynomial to the data in order to find an inflection point that will give an idea as to where the change in income responses starts to leans towards customers that accepted the campaigns.

x = dfEDA$Income
y =dfEDA$CSAccept
plot(x,y,type="l")
lo <- loess(y~x)
xl <- seq(min(x),max(x), (max(x) - min(x))/444)
out = predict(lo,xl)
lines(xl, out, col='red', lwd=2)

infl <- c(FALSE, diff(diff(out)>0)!=0)

points(xl[infl ], out[infl ], col="blue")

xl[infl]
## [1] 83714.91

Observations of cumulative sum and income relationship

  • After about 84,000 dollars the cumulative sum of income starts trend towards customers that accepted the campaign.

Where to go from here

  • The goal should be to get this shift in trend to happen earlier.

  • Need to make campaign more responsive to customers making an income lower than $84,000.

2.2.2 Children

Why is having children noteworthy for this analysis?

  • 53% of those that responded to any of the campaigns had children.
  • 78% of people that did not respond have children.

Observations of Children and Campaign Responses

  • Campaigns 1 and 5 drew in a higher ratio of responses from people without children.
  • Response campaign was well balanced.
  • Campaigns 3 and 4 drew in a higher ratio of response from people with children but to a smaller degree than Campaigns 1 and 5.

In order to gain more insight into these campaigns, campaigns 1 and 5 will be compared to campaigns 3 and 4.

The following table shows the statistical comparison of these two aggregations for the campaigns.

Children = Campaign 3 and 4| No Children = Campaign 1 and 5

Characteristic Children, N = 3271 No Children, N = 3031 Test Statistic p-value2
Income 59,760.22 (20,990.13) 80,717.95 (10,200.70) -16 <0.001
MntWines 565.01 (429.77) 819.92 (338.93) -8.3 <0.001
MntFruits 27.82 (40.30) 55.76 (52.16) -7.5 <0.001
MntMeatProducts 210.76 (238.60) 452.31 (261.60) -12 <0.001
MntFishProducts 39.18 (55.74) 83.52 (67.66) -8.9 <0.001
MntSweetProducts 29.29 (43.32) 65.20 (54.03) -9.2 <0.001
MntGoldProds 57.62 (56.81) 77.05 (63.58) -4.0 <0.001
NumDealsPurchases 2.30 (1.88) 1.21 (1.10) 8.9 <0.001
NumWebPurchases 5.09 (2.76) 5.60 (2.11) -2.7 0.008
NumCatalogPurchases 3.94 (2.88) 6.08 (2.43) -10 <0.001
NumStorePurchases 6.43 (3.26) 8.17 (2.74) -7.2 <0.001
NumWebVisitsMonth 5.46 (2.42) 3.20 (1.95) 13 <0.001
Response 139 / 327 (43%) 170 / 303 (56%) 12 <0.001
Age 51.28 (12.03) 50.91 (13.68) 0.35 0.7
Customer_Days 2,515.51 (208.99) 2,496.43 (199.47) 1.2 0.2
marital_Divorced 38 / 327 (12%) 25 / 303 (8.3%) 2.0 0.2
marital_Married 125 / 327 (38%) 128 / 303 (42%) 1.1 0.3
marital_Single 71 / 327 (22%) 63 / 303 (21%) 0.08 0.8
marital_Together 79 / 327 (24%) 75 / 303 (25%) 0.03 0.9
marital_Widow 14 / 327 (4.3%) 12 / 303 (4.0%) 0.04 0.8
education_2n_Cycle 24 / 327 (7.3%) 24 / 303 (7.9%) 0.08 0.8
education_Basic 6 / 327 (1.8%) 0 / 303 (0%) 0.031
education_Graduation 157 / 327 (48%) 166 / 303 (55%) 2.9 0.089
education_Master 55 / 327 (17%) 45 / 303 (15%) 0.46 0.5
education_PhD 85 / 327 (26%) 68 / 303 (22%) 1.1 0.3
MntTotal 872.06 (659.75) 1,476.70 (462.06) -13 <0.001
TOT_Purchase 15.46 (6.90) 19.84 (4.40) -9.6 <0.001
ChildrenBV 212 / 327 (65%) 58 / 303 (19%) 134 <0.001
SingleBV 123 / 327 (38%) 100 / 303 (33%) 1.5 0.2
CollegeBV 297 / 327 (91%) 279 / 303 (92%) 0.32 0.6
TOT_Prods 929.69 (680.43) 1,553.75 (468.22) -13 <0.001
TOT_Grocery 872.06 (659.75) 1,476.70 (462.06) -13 <0.001
1 Mean (SD); n / N (%)
2 Welch Two Sample t-test; Pearson’s Chi-squared test; Fisher’s exact test

Observations of Number of Acceptances by Campaign

  • The latest campaign managed to generate a significant response from new customers and customers who have only responded once in the previous 5 campaigns.

  • No new responses to the 2nd campaign, barely any responses at all. .

Characteristic Campaign1, N = 1421 Campaign2, N = 301 Campaign3, N = 1631 Campaign4, N = 1641 Campaign5, N = 1611 Response, N = 3331
Income 78,872.63 (11,337.02) 71,054.83 (16,069.84) 50,802.58 (22,012.66) 68,663.23 (15,478.93) 82,345.50 (8,800.56) 60,209.68 (23,194.08)
Recency 46.68 (28.47) 48.67 (31.61) 45.70 (28.51) 50.81 (29.28) 49.04 (29.34) 35.26 (27.62)
MntWines 758.03 (335.81) 898.67 (467.49) 378.66 (396.50) 750.23 (379.36) 874.50 (333.23) 502.62 (427.82)
NumWebVisitsMonth 3.51 (2.04) 5.17 (2.29) 5.85 (2.53) 5.07 (2.25) 2.93 (1.83) 5.31 (2.56)
Age 51.62 (13.52) 51.87 (11.09) 48.55 (12.04) 53.98 (11.42) 50.29 (13.84) 50.50 (12.33)
Customer_Days 2,481.69 (201.43) 2,523.10 (205.68) 2,507.02 (211.13) 2,523.95 (207.15) 2,509.43 (197.44) 2,607.08 (196.47)
MntTotal 1,406.70 (501.92) 1,241.27 (547.81) 653.60 (663.95) 1,089.20 (580.74) 1,538.45 (415.67) 924.41 (698.91)
TOT_Purchase 19.90 (4.52) 18.23 (5.83) 13.26 (8.06) 17.63 (4.60) 19.80 (4.30) 15.35 (6.83)
ChildrenBV 33 / 142 (23%) 12 / 30 (40%) 115 / 163 (71%) 97 / 164 (59%) 25 / 161 (16%) 165 / 333 (50%)
SingleBV 48 / 142 (34%) 11 / 30 (37%) 63 / 163 (39%) 60 / 164 (37%) 52 / 161 (32%) 175 / 333 (53%)
Marital_Status
    Divorced 12 / 142 (8.5%) 5 / 30 (17%) 20 / 163 (12%) 18 / 164 (11%) 13 / 161 (8.1%) 48 / 333 (14%)
    Married 62 / 142 (44%) 7 / 30 (23%) 63 / 163 (39%) 62 / 164 (38%) 66 / 161 (41%) 98 / 333 (29%)
    Single 31 / 142 (22%) 5 / 30 (17%) 39 / 163 (24%) 32 / 164 (20%) 32 / 161 (20%) 109 / 333 (33%)
    Together 32 / 142 (23%) 12 / 30 (40%) 37 / 163 (23%) 42 / 164 (26%) 43 / 161 (27%) 60 / 333 (18%)
    Widow 5 / 142 (3.5%) 1 / 30 (3.3%) 4 / 163 (2.5%) 10 / 164 (6.1%) 7 / 161 (4.3%) 18 / 333 (5.4%)
CollegeBV 128 / 142 (90%) 28 / 30 (93%) 142 / 163 (87%) 155 / 164 (95%) 151 / 161 (94%) 309 / 333 (93%)
Education_Level
    Basic 0 / 142 (0%) 0 / 30 (0%) 6 / 163 (3.7%) 0 / 164 (0%) 0 / 161 (0%) 2 / 333 (0.6%)
    Masters 18 / 142 (13%) 2 / 30 (6.7%) 24 / 163 (15%) 31 / 164 (19%) 27 / 161 (17%) 56 / 333 (17%)
    Phd 30 / 142 (21%) 10 / 30 (33%) 40 / 163 (25%) 45 / 164 (27%) 38 / 161 (24%) 101 / 333 (30%)
    Second Cycle 14 / 142 (9.9%) 2 / 30 (6.7%) 15 / 163 (9.2%) 9 / 164 (5.5%) 10 / 161 (6.2%) 22 / 333 (6.6%)
    Undergrad 80 / 142 (56%) 16 / 30 (53%) 78 / 163 (48%) 79 / 164 (48%) 86 / 161 (53%) 152 / 333 (46%)
Accept
    1 Cmpgn 39 / 142 (27%) 0 / 30 (0%) 74 / 163 (45%) 72 / 164 (44%) 37 / 161 (23%) 146 / 333 (44%)
    2 Cmpgn 29 / 142 (20%) 8 / 30 (27%) 64 / 163 (39%) 34 / 164 (21%) 43 / 161 (27%) 100 / 333 (30%)
    3 Cmpgn 32 / 142 (23%) 5 / 30 (17%) 10 / 163 (6.1%) 27 / 164 (16%) 37 / 161 (23%) 42 / 333 (13%)
    4 Cmpgn 32 / 142 (23%) 7 / 30 (23%) 12 / 163 (7.4%) 24 / 164 (15%) 34 / 161 (21%) 35 / 333 (11%)
    5 Cmpgn 10 / 142 (7.0%) 10 / 30 (33%) 3 / 163 (1.8%) 7 / 164 (4.3%) 10 / 161 (6.2%) 10 / 333 (3.0%)
AcceptBV 142 / 142 (100%) 30 / 30 (100%) 163 / 163 (100%) 164 / 164 (100%) 161 / 161 (100%) 333 / 333 (100%)
TOT_Prods 1,484.35 (509.56) 1,307.67 (555.42) 720.54 (696.27) 1,137.56 (597.45) 1,614.96 (420.62) 985.66 (719.39)
TOT_Grocery 1,406.70 (501.92) 1,241.27 (547.81) 653.60 (663.95) 1,089.20 (580.74) 1,538.45 (415.67) 924.41 (698.91)
1 Mean (SD); n / N (%)

Observations

  • Campaign 3 and the latest campaign show similarities in the customers that responded. Low income, low wine consumption, and overall lower purchase totals.

  • The percentage of people who graduated college is consistently high at 87%-95%.

  • Campaigns 1 and 5 seem to have targeted similar high income customers.

  • Campaigns 3 and the response campaign targets a variety of mixed income customers.

  • Campaigns 2 and 4 had an upper income response as well but not as high income as campaigns 1 and 5. Campaign 2 is less than a third of all other campaigns

2.2.2.1 Analysis of customers with children

Characteristic Campaign3, N = 1151 Campaign4, N = 971 Response, N = 1651
Income 43,443.47 (17,374.09) 62,493.96 (12,802.49) 45,585.50 (18,618.66)
MntWines 269.43 (346.82) 653.23 (367.81) 315.12 (392.55)
MntFruits 13.98 (22.30) 13.78 (21.03) 13.16 (20.24)
MntMeatProducts 87.17 (106.18) 119.86 (121.18) 104.76 (120.03)
MntFishProducts 18.16 (34.33) 23.75 (44.80) 19.96 (35.64)
MntSweetProducts 13.85 (26.08) 14.90 (29.40) 14.85 (25.23)
MntGoldProds 58.27 (61.05) 41.43 (39.95) 47.29 (50.07)
NumDealsPurchases 2.67 (1.61) 3.45 (2.23) 3.62 (2.31)
NumWebPurchases 4.27 (3.18) 5.85 (2.42) 4.73 (2.87)
NumCatalogPurchases 2.74 (2.69) 3.11 (2.03) 2.47 (2.46)
NumStorePurchases 4.19 (2.85) 7.34 (2.54) 4.62 (2.46)
NumWebVisitsMonth 6.70 (2.08) 6.24 (1.45) 7.10 (1.36)
Response 47 / 115 (41%) 24 / 97 (25%) 165 / 165 (100%)
Age 48.75 (10.71) 54.92 (8.71) 51.00 (10.38)
Customer_Days 2,517.83 (211.15) 2,534.20 (219.48) 2,640.51 (191.21)
education_Graduation 55 / 115 (48%) 49 / 97 (51%) 70 / 165 (42%)
MntTotal 402.59 (475.69) 825.52 (468.10) 467.85 (514.78)
TOT_Purchase 11.20 (7.74) 16.30 (4.42) 11.82 (6.55)
ChildrenBV 115 / 115 (100%) 97 / 97 (100%) 165 / 165 (100%)
SingleBV 42 / 115 (37%) 28 / 97 (29%) 74 / 165 (45%)
CollegeBV 102 / 115 (89%) 93 / 97 (96%) 155 / 165 (94%)
TOT_Prods 460.86 (509.54) 866.95 (485.97) 515.14 (537.63)
TOT_Grocery 402.59 (475.69) 825.52 (468.10) 467.85 (514.78)
1 Mean (SD); n / N (%)
## 
##  Shapiro-Wilk normality test
## 
## data:  df2[df2$Campaign == "Campaign3" & df2$Accepted_C == 1 & df2$ChildrenBV == 1, ]$Income
## W = 0.97862, p-value = 0.06271
## 
##  Shapiro-Wilk normality test
## 
## data:  df2[df2$Campaign == "Campaign4" & df2$Accepted_C == 1 & df2$ChildrenBV == 1, ]$Income
## W = 0.97776, p-value = 0.09855
## 
##  Shapiro-Wilk normality test
## 
## data:  df2[df2$Campaign == "Response" & df2$Accepted_C == 1 & df2$ChildrenBV == 1, ]$Income
## W = 0.97887, p-value = 0.01265

## Anova Table (Type III tests)
## 
## Response: Income
##                 Sum Sq  Df F value    Pr(>F)    
## (Intercept) 2.1704e+11   1 758.652 < 2.2e-16 ***
## Campaign    2.3107e+10   2  40.383 < 2.2e-16 ***
## Residuals   1.0700e+11 374                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The more times a customer accepts the campaign, the higher the income of the customer.

3 Predictive Model

3.1 LASSO

3.1.1 Training Model

set.seed(123)

training.samples <- dfmodel$Response %>% 
  createDataPartition(p = 0.8, list = FALSE)

train.data  <- dfmodel[training.samples, ]
test.data <- dfmodel[-training.samples, ]

3.1.2 LASSO model with reduced KPIs

x <- model.matrix(Response~., train.data)[,-22]
# Convert the outcome (class) to a numerical variable
y <- ifelse(train.data$Response == 1, 1, 0)

# Find the best lambda using cross-validation
set.seed(123) 
cv.lasso <- cv.glmnet(x, y, alpha = 1, family = "binomial")
plot(cv.lasso)

3.1.3 LASSO model predictions

# Fit the final model on the training data
model <- glmnet(x, y, alpha = 1, family = "binomial",
                lambda = cv.lasso$lambda.min)
# Display regression coefficients
coef(model)
## 43 x 1 sparse Matrix of class "dgCMatrix"
##                                 s0
## (Intercept)          -1.338483e+01
## (Intercept)           .           
## Income                .           
## Kidhome               2.541566e-01
## Teenhome             -8.944843e-01
## Recency              -3.063529e-02
## MntWines             -7.711868e-04
## MntFruits             2.721773e-03
## MntMeatProducts       2.371779e-03
## MntFishProducts      -2.282393e-03
## MntSweetProducts      .           
## MntGoldProds          1.838066e-03
## NumDealsPurchases     1.644908e-01
## NumWebPurchases       6.105248e-02
## NumCatalogPurchases   1.624755e-01
## NumStorePurchases    -2.042529e-01
## NumWebVisitsMonth     7.581466e-02
## AcceptedCmp3          1.436130e+00
## AcceptedCmp4          9.776652e-01
## AcceptedCmp5          1.494024e+00
## AcceptedCmp1          1.049988e+00
## AcceptedCmp2          1.761545e+00
## Age                   6.702086e-03
## Customer_Days         4.175057e-03
## marital_Divorced      5.122041e-02
## marital_Married       .           
## marital_Single        .           
## marital_Together      .           
## marital_Widow        -1.668337e-01
## education_2n_Cycle    .           
## education_Basic      -9.599181e-01
## education_Graduation  .           
## education_Master      4.117749e-01
## education_PhD         1.187721e+00
## MntTotal              .           
## TOT_Purchase          .           
## TOT_Children          .           
## ChildrenBV           -2.648628e-01
## SingleBV              1.160085e+00
## CollegeBV             5.142897e-02
## Accept5BV             5.759373e-01
## TOT_Prods             .           
## TOT_Grocery           .
# Make predictions on the test data
x.test <- model.matrix(Response ~., test.data)[,-22]
probabilities <- model %>% predict(newx = x.test)
predicted.classes <- ifelse(probabilities > 0.5, 1, 0)
# Model accuracy
observed.classes <- test.data$Response

mean(predicted.classes == observed.classes)
## [1] 0.8956916

3.1.4 Logistic model with all indicators

# Fit the model
full.model <- glm(Response ~., data = train.data, family = binomial)
# Make predictions
probabilities <- full.model %>% predict(test.data[,-22], type = "response")
predicted.classes <- ifelse(probabilities > 0.5, 1, 0)
# Model accuracy
observed.classes <- test.data$Response
mean(predicted.classes == observed.classes)
## [1] 0.893424

3.1.5 Results of LASSO model

Not only does the LASSO model use 12 fewer indicators than the full Logistic model, it also predicts a higher rate of the observations.