• Introduction
  • 1. Data Preparation
    • 1.1 Prerequisites
    • 1.2 Importing Data
    • 1.3 Data Inspection
    • 1.4 Missing Value
    • 1.5 Data Types
  • 2. Data Processing
    • 2.1 Data Ekploration and Visualization 1
    • 2.2 Data Ekploration and Visualization 2
    • 2.3 Data Ekploration and Visualization 3
    • 2.4 Data Ekploration and Visualization 4
  • 3. Conclusion & Business Recomendation
  • 4. Dataset

Introduction

The data is related with direct marketing campaigns (phone calls) of a Portuguese banking institution. The classification goal is to predict if the client will subscribe a term deposit (variable y). The data is related with direct marketing campaigns of a Portuguese banking institution. The marketing campaigns were based on phone calls. Often, more than one contact to the same client was required, in order to access if the product (bank term deposit) would be (‘yes’) or not (‘no’) subscribed.

source: https://archive.ics.uci.edu/dataset/222/bank+marketing

Business Question

  1. How many times on average can a campaign reach success?

  2. Recommended best campaign segments?

  3. How much savings does someone who is successful in carrying out a campaign have on average?

Variable Description

1. Data Preparation

1.1 Prerequisites

1.2 Importing Data

data <- read.csv("data_input/bank.csv", header = FALSE, sep = ";", quote = "\"")
##   age         job marital education default balance housing loan  contact day
## 2  30  unemployed married   primary      no    1787      no   no cellular  19
## 3  33    services married secondary      no    4789     yes  yes cellular  11
## 4  35  management  single  tertiary      no    1350     yes   no cellular  16
## 5  30  management married  tertiary      no    1476     yes  yes  unknown   3
## 6  59 blue-collar married secondary      no       0     yes   no  unknown   5
## 7  35  management  single  tertiary      no     747      no   no cellular  23
##   month duration campaign pdays previous poutcome  y
## 2   oct       79        1    -1        0  unknown no
## 3   may      220        1   339        4  failure no
## 4   apr      185        1   330        1  failure no
## 5   jun      199        4    -1        0  unknown no
## 6   may      226        1    -1        0  unknown no
## 7   feb      141        2   176        3  failure no

First things first, let’s read our data.

1.3 Data Inspection

summary(data1)
##      age                job              marital           education        
##  Length:4521        Length:4521        Length:4521        Length:4521       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##    default            balance            housing              loan          
##  Length:4521        Length:4521        Length:4521        Length:4521       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##    contact              day               month             duration        
##  Length:4521        Length:4521        Length:4521        Length:4521       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##    campaign            pdays             previous           poutcome        
##  Length:4521        Length:4521        Length:4521        Length:4521       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##       y            
##  Length:4521       
##  Class :character  
##  Mode  :character

Based on the table above, let’s select the variables we need.

data2 <- data1 %>% select(
  age, job, marital, education, default, balance, housing, loan, duration, campaign, previous, y)

To make it easier to read, we changed some column names.

names(data2)[names(data2) == "y"] <- "result"
names(data2)[names(data2) == "previous"] <- "campaignbef"
names(data2)[names(data2) == "campaign"] <- "campaignnow"

Let’s check for missing values in our data.

1.4 Missing Value

colSums(is.na(data2))
##         age         job     marital   education     default     balance 
##           0           0           0           0           0           0 
##     housing        loan    duration campaignnow campaignbef      result 
##           0           0           0           0           0           0

To make processing easier, let’s change our data type.

1.5 Data Types

data3 <- data2 %>%
  mutate(age = as.numeric(age),  
         job = as.factor(job),    
         marital = as.factor(marital),  
         education = as.factor(education),  
         default = as.factor(default),   
         balance = as.numeric(balance),  
         housing = as.factor(housing),    
         loan = as.factor(loan),    
         duration = as.numeric(duration),  
         campaignnow = as.numeric(campaignnow),  
         campaignbef = as.numeric(campaignbef),  
         result = as.factor(result)    
         )

Once we have finished preparing our data, let’s start exploring our data.

2. Data Processing

2.1 Data Ekploration and Visualization 1

For the first exploration, we will try to answer the business question regarding How many times on average can a campaign achieve success? For this reason, we do aggregation for several variables.

camp <- data3 %>% 
 group_by(result) %>%
   summarise(avg_campaign = mean(campaignbef + campaignnow)) %>% 
  ungroup() %>% 
   mutate(avg_campaign = round(avg_campaign, 1))
camp
## # A tibble: 2 × 2
##   result avg_campaign
##   <fct>         <dbl>
## 1 no              3.3
## 2 yes             3.4

Visualize the data.

plot1 <- ggplot(camp, aes(x = result, y = avg_campaign, fill = result)) +
  geom_bar(stat = "identity", width = 0.1) + scale_fill_brewer(palette="Set2") +
  labs(x = "Result", y = NULL, title = "Average Campaign by Result") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
plot1

Based on the data processing above, from this plot we can see that on average 3.4 new campaigns can produce success until someone wants to carry out the goals of this campaign.

2.2 Data Ekploration and Visualization 2

To answer the second question, we try to aggregate based on variables that can group data that describes certain segments.

data3 %>%
  group_by(age, job, marital, education, balance, housing, loan) %>%
  summarise(
    result_yes = sum(result == "yes"),
    result_no = sum(result == "no")
  )
## `summarise()` has grouped output by 'age', 'job', 'marital', 'education',
## 'balance', 'housing'. You can override using the `.groups` argument.
## # A tibble: 4,464 × 9
## # Groups:   age, job, marital, education, balance, housing [4,456]
##      age job      marital education balance housing loan  result_yes result_no
##    <dbl> <fct>    <fct>   <fct>       <dbl> <fct>   <fct>      <int>     <int>
##  1    19 student  single  primary       103 no      no             1         0
##  2    19 student  single  secondary     302 no      no             1         0
##  3    19 student  single  unknown         0 no      no             0         1
##  4    19 student  single  unknown      1169 no      no             0         1
##  5    20 student  single  secondary     291 no      no             0         1
##  6    20 student  single  secondary     502 no      no             1         0
##  7    20 student  single  secondary    1191 no      no             0         1
##  8    21 services single  secondary     361 no      no             0         1
##  9    21 services single  secondary    1903 yes     no             0         1
## 10    21 student  single  secondary       6 no      no             0         1
## # ℹ 4,454 more rows

Let’s regroup for the job variable.

seg <- data3 %>%
  filter(result == "yes") %>% 
group_by(job, result) %>%
  summarise(count = n()) %>%
  arrange(desc(count)) %>%
  ungroup()
## `summarise()` has grouped output by 'job'. You can override using the `.groups`
## argument.
seg
## # A tibble: 12 × 3
##    job           result count
##    <fct>         <fct>  <int>
##  1 management    yes      131
##  2 technician    yes       83
##  3 blue-collar   yes       69
##  4 admin.        yes       58
##  5 retired       yes       54
##  6 services      yes       38
##  7 self-employed yes       20
##  8 student       yes       19
##  9 entrepreneur  yes       15
## 10 housemaid     yes       14
## 11 unemployed    yes       13
## 12 unknown       yes        7

Visualize for variable job based data.

plot2 <- ggplot(data = seg, aes(x = reorder(job, count), y = count, fill = result)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  scale_fill_brewer(palette="Set2") +
  labs(title = "Distribution of Job and Education with Result Counts",
       x = NULL,
       y = "Count") +
  theme_minimal() + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "none") +
  coord_flip()
plot2

We see that the job segment provides a level of success in the campaign, especially management or managerial jobs.

2.3 Data Ekploration and Visualization 3

This time we aggregate based on savings.

bal <- data3 %>%
  group_by(result) %>%
  summarise(median_balance = median(balance, na.rm = TRUE)) %>%
  ungroup()
bal
## # A tibble: 2 × 2
##   result median_balance
##   <fct>           <dbl>
## 1 no               420.
## 2 yes              710

Here’s the visualization.

plot3 <- ggplot(bal, aes(x = result, y = median_balance, fill = result)) +
  geom_bar(stat = "identity") +scale_fill_brewer(palette="Set2") +
  labs(title = "Median Balance by Result",
       x = "Result",
       y = NULL) +
  theme_minimal()
plot3

Based on the median savings that have the potential for success in this campaign, it is worth 12,518,522 Rupiah or 710 Euros.

2.4 Data Ekploration and Visualization 4

Next, let’s try to aggregate it based on debt.

loany <- data3 %>%
  filter(result == "yes") %>% 
group_by(loan, housing, result) %>%
  summarise(count = n()) %>%
  arrange(desc(count)) %>%
  ungroup()
## `summarise()` has grouped output by 'loan', 'housing'. You can override using
## the `.groups` argument.
loany
## # A tibble: 4 × 4
##   loan  housing result count
##   <fct> <fct>   <fct>  <int>
## 1 no    no      yes      283
## 2 no    yes     yes      195
## 3 yes   yes     yes       25
## 4 yes   no      yes       18

It can be seen that the data still needs to be summarized again, let’s try manipulating it into several groups by aggregation.

loann <- data3 %>%
  filter(result == "no") %>% 
group_by(loan, housing, result) %>%
  summarise(count = n()) %>%
  arrange(desc(count)) %>%
  ungroup()
## `summarise()` has grouped output by 'loan', 'housing'. You can override using
## the `.groups` argument.
loann
## # A tibble: 4 × 4
##   loan  housing result count
##   <fct> <fct>   <fct>  <int>
## 1 no    yes     no      1958
## 2 no    no      no      1394
## 3 yes   yes     no       381
## 4 yes   no      no       267

After that, provide a new column with descriptive data to clarify the group.

loan1 <- loany %>%
  mutate(group_yes = paste0("housing - ", housing, " & loan - ", loan)) %>%
  select(group_yes, count)
loan1
## # A tibble: 4 × 2
##   group_yes                  count
##   <chr>                      <int>
## 1 housing - no & loan - no     283
## 2 housing - yes & loan - no    195
## 3 housing - yes & loan - yes    25
## 4 housing - no & loan - yes     18

Here’s the visualization.

# Compute the cumulative percentages (top of each rectangle)
loan1$fraction <- loan1$count / sum(loan1$count)

# Compute the cumulative percentages (top of each rectangle)
loan1$ymax <- cumsum(loan1$fraction)

# Compute the bottom of each rectangle
loan1$ymin <- c(0, head(loan1$ymax, n=-1))

# Compute label position
loan1$labelPosition <- (loan1$ymax + loan1$ymin) / 2

# Compute a good label
loan1$label <- paste0(loan1$count)

# Make the plot
plot4 <- ggplot(loan1, aes(ymax=ymax, ymin=ymin, xmax=4, xmin=3, fill=group_yes)) +
  geom_rect(color="black") +
  geom_text(x=2.5, aes(y=labelPosition, label=count), color="black", size=4) +
  scale_fill_brewer(palette="Set2") +
  scale_color_brewer(palette="Set1") +
  coord_polar(theta="y") +
  xlim(c(-1, 4)) +
  theme_void() +
  labs(title = "Doughnut Chart of Success Campaign Result Base on Loan")+
  guides(fill = guide_legend(title = "Group"))
plot4

Based on the data above, the best potential is someone who has no mortgage payments and no debt with a total of 283 data.

3. Conclusion & Business Recomendation

Based on the data above, we conclude in the form of recommendations as follows To increase success in this campaign, there are several steps that need to be taken. Firstly, increase the number of campaigns, based on the statistical data above to achieve success in a campaign, on average it requires 3.4 campaigns. It is unfortunate that the campaign failure ratio is also seen after 3.3 campaigns have been carried out, with a difference of 0.1 in the statistical value of success and failure, we need to improve other variables. Another variable needed is what segments can increase success. We tried to explore data with various variables, namely the ratio of job debt and savings. We concluded that the job variable provided quite significant value to the campaign success ratio. So increase the focus on job segmentation in the management/managerial section of the company and the savings value is above the median, namely 710. This will increase the success of the campaign.

4. Dataset