1 Introduction

1.1 Data set information

Bank marketing has become vital to a bank's survival and is inherently dynamic in practice. Repeat business relationships drive the growth of bank marketing, and the most successful banks are those with strong client relationships (Subhani et al., 2021).

The data relate to direct marketing campaigns of a Portuguese banking institution. The marketing campaigns were based on phone calls. Often, more than one contact with the same client was required in order to assess whether the product (bank term deposit) would be subscribed (‘yes’) or not (‘no’).

The classification goal is to predict whether the client will subscribe (yes/no) to a term deposit (variable y). In this report, we use the bank-full.csv file.

These days, marketing spending in the banking industry is substantial, which means that banks need to optimize their marketing strategies and improve their effectiveness. Understanding customers’ needs leads to more effective marketing plans, smarter product designs and greater customer satisfaction.

knitr::include_graphics("mortgage-in-turkey-4-900x450.webp")

Data Columns
age : age of customer
job : type of job
marital : marital status (single, married, divorced)
education : education level (primary, secondary, tertiary)
default : has credit in default? (yes or no)
balance : average yearly balance
housing : has housing loan? (yes or no)
loan : has personal loan? (yes or no)
contact : contact communication type (categorical: ‘cellular’,‘telephone’)
day : last contact day of the month
month : last contact month of year
duration : last contact duration, in seconds (will be converted to minutes)
campaign -> num_contact : number of contacts performed during this campaign (includes last contact)
pdays : number of days that passed by after the client was last contacted from a previous campaign (numeric; -1 means client was not previously contacted)
previous : number of contacts performed before this campaign
poutcome : outcome of the previous marketing campaign
y -> subscribed : has the client subscribed to a term deposit?

2 Data Preparation

2.1 Data Input and Checking Data

First, we read the .csv file and save it in a new data frame called marketing.

marketing <- read.csv("bank-full.csv")
library(DT)
datatable(marketing, rownames = FALSE, filter="top", options = list(pageLength = 5, scrollX=T) )
## Warning in instance$preRenderHook(instance): It seems your data is too big for
## client-side DataTables. You may consider server-side processing:
## https://rstudio.github.io/DT/server.html

Load the dplyr library to make the column names more intuitive with the rename() function and to check the data structure with the glimpse() function. Rename the y column to subscribed and the campaign column to num_contact.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
marketing <- marketing %>% rename(num_contact=campaign, subscribed=y)
head(marketing)
marketing %>% glimpse()
## Rows: 45,211
## Columns: 17
## $ age         <int> 58, 44, 33, 47, 33, 35, 28, 42, 58, 43, 41, 29, 53, 58, 57…
## $ job         <chr> "management", "technician", "entrepreneur", "blue-collar",…
## $ marital     <chr> "married", "single", "married", "married", "single", "marr…
## $ education   <chr> "tertiary", "secondary", "secondary", "unknown", "unknown"…
## $ default     <chr> "no", "no", "no", "no", "no", "no", "no", "yes", "no", "no…
## $ balance     <int> 2143, 29, 2, 1506, 1, 231, 447, 2, 121, 593, 270, 390, 6, …
## $ housing     <chr> "yes", "yes", "yes", "yes", "no", "yes", "yes", "yes", "ye…
## $ loan        <chr> "no", "no", "yes", "no", "no", "no", "yes", "no", "no", "n…
## $ contact     <chr> "unknown", "unknown", "unknown", "unknown", "unknown", "un…
## $ day         <int> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5…
## $ month       <chr> "may", "may", "may", "may", "may", "may", "may", "may", "m…
## $ duration    <int> 261, 151, 76, 92, 198, 139, 217, 380, 50, 55, 222, 137, 51…
## $ num_contact <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ pdays       <int> -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1…
## $ previous    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ poutcome    <chr> "unknown", "unknown", "unknown", "unknown", "unknown", "un…
## $ subscribed  <chr> "no", "no", "no", "no", "no", "no", "no", "no", "no", "no"…

The marketing data frame has 45,211 rows and 17 columns. Several columns need to be converted to a more appropriate data type.

2.2 Data Cleansing

1. Change the data type. The following columns are converted to factor: job, marital, education, contact, poutcome, month, day, default, housing, loan, and subscribed.

marketing <- marketing %>% 
  mutate(job = as.factor(job),
         marital = as.factor(marital),
         education = as.factor(education),
         contact = as.factor(contact),
         poutcome = as.factor(poutcome),
         month = as.factor(month),
         day = as.factor(day),
         default = as.factor(default),
         housing = as.factor(housing),
         loan = as.factor(loan),
         subscribed = as.factor(subscribed)) %>% 
  glimpse()
## Rows: 45,211
## Columns: 17
## $ age         <int> 58, 44, 33, 47, 33, 35, 28, 42, 58, 43, 41, 29, 53, 58, 57…
## $ job         <fct> management, technician, entrepreneur, blue-collar, unknown…
## $ marital     <fct> married, single, married, married, single, married, single…
## $ education   <fct> tertiary, secondary, secondary, unknown, unknown, tertiary…
## $ default     <fct> no, no, no, no, no, no, no, yes, no, no, no, no, no, no, n…
## $ balance     <int> 2143, 29, 2, 1506, 1, 231, 447, 2, 121, 593, 270, 390, 6, …
## $ housing     <fct> yes, yes, yes, yes, no, yes, yes, yes, yes, yes, yes, yes,…
## $ loan        <fct> no, no, yes, no, no, no, yes, no, no, no, no, no, no, no, …
## $ contact     <fct> unknown, unknown, unknown, unknown, unknown, unknown, unkn…
## $ day         <fct> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5…
## $ month       <fct> may, may, may, may, may, may, may, may, may, may, may, may…
## $ duration    <int> 261, 151, 76, 92, 198, 139, 217, 380, 50, 55, 222, 137, 51…
## $ num_contact <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ pdays       <int> -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1…
## $ previous    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ poutcome    <fct> unknown, unknown, unknown, unknown, unknown, unknown, unkn…
## $ subscribed  <fct> no, no, no, no, no, no, no, no, no, no, no, no, no, no, no…
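The column-by-column conversion above can also be written generically. The following is a minimal sketch in base R, assuming a small made-up data frame named toy (not the real data). It converts every character column to a factor in one step. Note that a column stored as an integer, such as day in the real data, would still need an explicit as.factor() call.

```r
# Toy data standing in for the marketing data frame (values are made up).
toy <- data.frame(
  age = c(58L, 44L, 33L),
  job = c("management", "technician", "entrepreneur"),
  subscribed = c("no", "no", "yes"),
  stringsAsFactors = FALSE
)

# Flag the character columns, then convert only those to factors.
is_chr <- vapply(toy, is.character, logical(1))
toy[is_chr] <- lapply(toy[is_chr], as.factor)

# Inspect the resulting column types.
str(toy)
```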

2. Check for missing values

anyNA(marketing)
## [1] FALSE
colSums(is.na(marketing))
##         age         job     marital   education     default     balance 
##           0           0           0           0           0           0 
##     housing        loan     contact         day       month    duration 
##           0           0           0           0           0           0 
## num_contact       pdays    previous    poutcome  subscribed 
##           0           0           0           0           0

Great! There are no missing values.
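The NA check does not catch placeholder values such as "unknown", which appear in several columns of this data set. The sketch below shows one possible way to count them per column. The data frame toy and its values are made up for illustration.

```r
# Toy data with "unknown" placeholders (values are made up).
toy <- data.frame(
  education = c("tertiary", "unknown", "secondary", "unknown"),
  contact   = c("unknown", "unknown", "cellular", "unknown"),
  poutcome  = c("unknown", "failure", "unknown", "success"),
  stringsAsFactors = FALSE
)

# Comparing the data frame with a string gives a logical matrix;
# colSums() then counts the TRUE values in each column.
unknown_counts <- colSums(toy == "unknown")
unknown_counts
```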

3 Data Explanation

1. Check summary of data frame

summary(marketing)
##       age                 job           marital          education    
##  Min.   :18.00   blue-collar:9732   divorced: 5207   primary  : 6851  
##  1st Qu.:33.00   management :9458   married :27214   secondary:23202  
##  Median :39.00   technician :7597   single  :12790   tertiary :13301  
##  Mean   :40.94   admin.     :5171                    unknown  : 1857  
##  3rd Qu.:48.00   services   :4154                                     
##  Max.   :95.00   retired    :2264                                     
##                  (Other)    :6835                                     
##  default        balance       housing      loan            contact     
##  no :44396   Min.   : -8019   no :20081   no :37967   cellular :29285  
##  yes:  815   1st Qu.:    72   yes:25130   yes: 7244   telephone: 2906  
##              Median :   448                           unknown  :13020  
##              Mean   :  1362                                            
##              3rd Qu.:  1428                                            
##              Max.   :102127                                            
##                                                                        
##       day            month          duration       num_contact    
##  20     : 2752   may    :13766   Min.   :   0.0   Min.   : 1.000  
##  18     : 2308   jul    : 6895   1st Qu.: 103.0   1st Qu.: 1.000  
##  21     : 2026   aug    : 6247   Median : 180.0   Median : 2.000  
##  17     : 1939   jun    : 5341   Mean   : 258.2   Mean   : 2.764  
##  6      : 1932   nov    : 3970   3rd Qu.: 319.0   3rd Qu.: 3.000  
##  5      : 1910   apr    : 2932   Max.   :4918.0   Max.   :63.000  
##  (Other):32344   (Other): 6060                                    
##      pdays          previous           poutcome     subscribed 
##  Min.   : -1.0   Min.   :  0.0000   failure: 4901   no :39922  
##  1st Qu.: -1.0   1st Qu.:  0.0000   other  : 1840   yes: 5289  
##  Median : -1.0   Median :  0.0000   success: 1511              
##  Mean   : 40.2   Mean   :  0.5803   unknown:36959              
##  3rd Qu.: -1.0   3rd Qu.:  0.0000                              
##  Max.   :871.0   Max.   :275.0000                              
## 

2. Check for possible outliers or uninformative data.
** The contact column contains many “unknown” values, which makes it not very informative, so it can be dropped. The new data is saved as marketing_new.

marketing_new <- marketing %>% select(-contact)
head(marketing_new)

** The duration column is easier to analyze in minutes, so we convert it from seconds to minutes.

marketing_new$duration <- round((marketing_new$duration/60), digits = 2)
marketing_new

** The day and month columns have many distinct values, so these two columns are used only as additional information.
** In the pdays column, a value of -1 indicates that the client was not called in the previous campaign, so a separate data frame can be created to look at the previous campaign.
** Because some columns contain outliers, we prefer the median over the mean when summarizing the data.
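To illustrate why the median is preferred when outliers are present, here is a small sketch with hypothetical balance values (four typical values and one extreme value, not taken from the real rows).

```r
# Hypothetical balances: four typical values and one extreme outlier.
balance <- c(72, 448, 900, 1428, 102127)

mean_balance <- mean(balance)      # pulled upward by the single outlier
median_balance <- median(balance)  # stays at the middle value

mean_balance
median_balance
```

In this toy vector the mean is far above every typical value, while the median still describes a typical client.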

Marketing Data Summary:
1. The youngest customer is 18 years old and the oldest is 95 years old; the median age is 39.
2. Blue-collar worker is the most common job among the bank’s customers.
3. Most of the bank’s customers are married.
4. Most of the bank’s customers have a secondary-level education.
5. Most of the bank’s customers have no credit in default.
6. The lowest average yearly balance is -8019, the highest is 102127 and the median is 448.
7. Most of the bank’s customers have a housing loan (55.58%) and do not have a personal loan (83.98%).
8. During this campaign, customers were contacted a median of 2 times.
9. Most of the bank’s customers have not subscribed to a term deposit.

3. Columns explanation
Some columns have values that need to be interpreted first because the source website does not explain the data in detail. In this report, the data set is divided based on the client’s participation in the previous campaign, because some values would look like outliers if the data were left combined, even though they still carry information. For example, the pdays column has a value of -1 if the client was not called in the previous campaign.

-Clients who were called in the previous campaign are saved in the client_previous data frame.

client_previous <- marketing_new[marketing_new$pdays!= -1 , ]
client_previous

In the previous campaign, 8,257 clients were contacted.
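The split on pdays can be sanity-checked by confirming that the two subsets together cover every row. This is a minimal sketch on a made-up data frame; the names toy, previous_toy and new_toy are illustrative only.

```r
# Toy data: -1 in pdays means "not contacted in the previous campaign".
toy <- data.frame(
  id = 1:6,
  pdays = c(-1, 151, -1, 92, -1, -1)
)

was_contacted <- toy$pdays != -1
previous_toy <- toy[was_contacted, ]   # contacted before
new_toy <- toy[!was_contacted, ]       # never contacted before

# The two subsets should partition the original rows.
nrow(previous_toy) + nrow(new_toy) == nrow(toy)
```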

summary(client_previous)
##       age                 job           marital         education    default   
##  Min.   :18.00   management :1826   divorced: 931   primary  :1020   no :8200  
##  1st Qu.:33.00   blue-collar:1617   married :4745   secondary:4254   yes:  57  
##  Median :38.00   technician :1342   single  :2581   tertiary :2660             
##  Mean   :40.95   admin.     :1089                   unknown  : 323             
##  3rd Qu.:48.00   services   : 706                                              
##  Max.   :93.00   retired    : 488                                              
##                  (Other)    :1189                                              
##     balance      housing     loan           day           month     
##  Min.   :-1884   no :3115   no :7134   18     : 505   may    :2514  
##  1st Qu.:  168   yes:5142   yes:1123   15     : 466   nov    :1150  
##  Median :  602                         17     : 455   apr    :1118  
##  Mean   : 1557                         20     : 444   feb    : 924  
##  3rd Qu.: 1743                         13     : 430   aug    : 531  
##  Max.   :81204                         12     : 400   jan    : 498  
##                                        (Other):5557   (Other):1522  
##     duration       num_contact         pdays          previous      
##  Min.   : 0.020   Min.   : 1.000   Min.   :  1.0   Min.   :  1.000  
##  1st Qu.: 1.880   1st Qu.: 1.000   1st Qu.:133.0   1st Qu.:  1.000  
##  Median : 3.220   Median : 2.000   Median :194.0   Median :  2.000  
##  Mean   : 4.335   Mean   : 2.056   Mean   :224.6   Mean   :  3.178  
##  3rd Qu.: 5.400   3rd Qu.: 2.000   3rd Qu.:327.0   3rd Qu.:  4.000  
##  Max.   :36.980   Max.   :16.000   Max.   :871.0   Max.   :275.000  
##                                                                     
##     poutcome    subscribed
##  failure:4901   no :6352  
##  other  :1840   yes:1905  
##  success:1511             
##  unknown:   5             
##                           
##                           
## 

Based on the summary of the client_previous data frame:
- The median duration of the last contact is 3.22 minutes, the minimum is 0.02 minutes and the maximum is 36.98 minutes.
- The median number of contacts performed before this campaign, which we take to mean during the previous campaign, is 2.
- The poutcome column (outcome of the previous campaign) contains not only success and failure but also other and unknown. We need to explore the difference between other and unknown in this data.
- In the previous campaign, only 1,511 of 8,257 clients had a success outcome, which is only 18.3%. We need to explore further to find out what happened and use it as a reference for the current campaign.
- Of the 8,257 clients called in the previous campaign, only 1,905 subscribed to the term deposit. We need to explore how many loyal clients there are (clients with success in poutcome and yes in the subscribed column).
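The percentages in this list follow directly from the counts in the summary output. As a quick worked check (the variable names are illustrative):

```r
# Counts taken from the summary of client_previous.
n_previous <- 8257    # clients contacted in the previous campaign
n_success <- 1511     # poutcome == "success"
n_subscribed <- 1905  # subscribed == "yes"

# Shares in percent, rounded to two decimals.
success_rate <- round(100 * n_success / n_previous, 2)
subscribe_rate <- round(100 * n_subscribed / n_previous, 2)

success_rate
subscribe_rate
```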

-Explore the difference between the “other” and “unknown” values in the poutcome column.

client_previous[client_previous$poutcome == "other" ,]
client_previous[client_previous$poutcome == "unknown" ,]

If we look at the subsetting results for client_previous$poutcome == "unknown" and client_previous$poutcome == "other", no specific pattern can be seen. One possible interpretation is that the client had not yet made a decision while the previous campaign was ongoing, so the values “other” and “unknown” were recorded in the poutcome column.
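Rather than reading the subsets row by row, a cross-tabulation of poutcome against subscribed may make any pattern easier to spot. The sketch below uses a made-up data frame, so the numbers say nothing about the real data.

```r
# Toy data (values are made up for illustration).
toy <- data.frame(
  poutcome = c("other", "other", "unknown", "failure", "success", "success"),
  subscribed = c("no", "yes", "no", "no", "yes", "yes")
)

# Counts per combination of previous outcome and current subscription.
counts <- table(toy$poutcome, toy$subscribed)

# Row-wise shares: within each poutcome value, the share of "no" and "yes".
row_share <- prop.table(counts, margin = 1)

counts
row_share
```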

-Clients who were not called in the previous campaign are saved in the client_new data frame.

client_new <- marketing_new[marketing_new$pdays== -1 , ]
client_new
summary(client_new)
##       age                 job           marital          education    
##  Min.   :18.00   blue-collar:8115   divorced: 4276   primary  : 5831  
##  1st Qu.:33.00   management :7632   married :22469   secondary:18948  
##  Median :39.00   technician :6255   single  :10209   tertiary :10641  
##  Mean   :40.93   admin.     :4082                    unknown  : 1534  
##  3rd Qu.:49.00   services   :3448                                     
##  Max.   :95.00   retired    :1776                                     
##                  (Other)    :5646                                     
##  default        balance       housing      loan            day       
##  no :36196   Min.   : -8019   no :16966   no :30833   20     : 2308  
##  yes:  758   1st Qu.:    55   yes:19988   yes: 6121   18     : 1803  
##              Median :   414                           21     : 1738  
##              Mean   :  1319                           28     : 1653  
##              3rd Qu.:  1358                           6      : 1577  
##              Max.   :102127                           5      : 1545  
##                                                       (Other):26330  
##      month          duration       num_contact         pdays       previous
##  may    :11252   Min.   : 0.000   Min.   : 1.000   Min.   :-1   Min.   :0  
##  jul    : 6641   1st Qu.: 1.680   1st Qu.: 1.000   1st Qu.:-1   1st Qu.:0  
##  aug    : 5716   Median : 2.950   Median : 2.000   Median :-1   Median :0  
##  jun    : 5019   Mean   : 4.295   Mean   : 2.922   Mean   :-1   Mean   :0  
##  nov    : 2820   3rd Qu.: 5.300   3rd Qu.: 3.000   3rd Qu.:-1   3rd Qu.:0  
##  apr    : 1814   Max.   :81.970   Max.   :63.000   Max.   :-1   Max.   :0  
##  (Other): 3692                                                             
##     poutcome     subscribed 
##  failure:    0   no :33570  
##  other  :    0   yes: 3384  
##  success:    0              
##  unknown:36954              
##                             
##                             
## 

Based on the summary of the client_new data frame:
- The median duration of the last contact is 2.95 minutes (177 seconds), the minimum is 0 minutes and the maximum is 81.97 minutes.
- The median number of telemarketing contacts per client in the current campaign is 2.
- Only 3,384 of 36,954 clients (9.16%) agreed to subscribe to the term deposit, which is lower than the 18.3% success rate of the previous campaign. We need to explore further what influences clients to finally agree to participate in a campaign.

4 Data Exploration

4.1 Previous Campaign

4.1.1 Loyal Clients

How many loyal clients are there (clients with success in poutcome and yes in the subscribed column)?

loyal_clients <- client_previous[client_previous$poutcome=="success" & client_previous$subscribed=="yes" ,]
loyal_clients

There are 978 loyal clients, which is 11.8% of the 8,257 clients in client_previous and only 2.16% of all clients in this data set. Let’s explore the demographics of the loyal clients.

summary(loyal_clients)
##       age                 job          marital        education   default  
##  Min.   :18.00   management :266   divorced: 93   primary  : 81   no :978  
##  1st Qu.:32.00   technician :138   married :547   secondary:433   yes:  0  
##  Median :40.00   retired    :125   single  :338   tertiary :409            
##  Mean   :43.56   admin.     :122                  unknown  : 55            
##  3rd Qu.:54.00   blue-collar: 85                                           
##  Max.   :93.00   student    : 62                                           
##                  (Other)    :180                                           
##     balance        housing    loan          day          month    
##  Min.   : -309.0   no :729   no :934   12     : 65   aug    :132  
##  1st Qu.:  272.2   yes:249   yes: 44   13     : 60   may    :120  
##  Median :  872.0                       4      : 58   sep    :102  
##  Mean   : 2005.9                       9      : 50   feb    : 99  
##  3rd Qu.: 2240.0                       11     : 48   oct    : 95  
##  Max.   :81204.0                       15     : 46   apr    : 80  
##                                        (Other):651   (Other):350  
##     duration       num_contact         pdays          previous     
##  Min.   : 0.420   Min.   : 1.000   Min.   :  1.0   Min.   : 1.000  
##  1st Qu.: 3.435   1st Qu.: 1.000   1st Qu.: 92.0   1st Qu.: 1.000  
##  Median : 4.800   Median : 1.000   Median :175.0   Median : 2.000  
##  Mean   : 6.011   Mean   : 1.716   Mean   :161.5   Mean   : 3.109  
##  3rd Qu.: 7.270   3rd Qu.: 2.000   3rd Qu.:185.0   3rd Qu.: 4.000  
##  Max.   :34.370   Max.   :11.000   Max.   :561.0   Max.   :21.000  
##                                                                    
##     poutcome   subscribed
##  failure:  0   no :  0   
##  other  :  0   yes:978   
##  success:978             
##  unknown:  0             
##                          
##                          
## 

The most interesting thing in the summary above is that no loyal client has credit in default. Below we analyze the columns one by one.

-Age

hist(loyal_clients$age)

The mean age of loyal clients is about 43.6 years and the median is 40 years. The histogram above shows that most loyal clients are between 25 and 60 years old.

-Job

prop.table(table(loyal_clients$job))
## 
##        admin.   blue-collar  entrepreneur     housemaid    management 
##   0.124744376   0.086912065   0.009202454   0.017382413   0.271983640 
##       retired self-employed      services       student    technician 
##   0.127811861   0.036809816   0.060327198   0.063394683   0.141104294 
##    unemployed       unknown 
##   0.051124744   0.009202454

Loyal clients have various jobs: 27.20% work in management, 14.11% are technicians, 12.78% are retired, 12.47% are admins, and every other job accounts for less than 10%.

-Marital Status

prop.table(table(loyal_clients$marital))
## 
##   divorced    married     single 
## 0.09509202 0.55930470 0.34560327

Most loyal clients are married (55.93%), followed by single clients (34.56%). One possible explanation is that married clients are more concerned with long-term investments as additional income.
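The shares above describe the composition of the loyal group, and married clients are also the largest group among all previously contacted clients. As a complementary check, the sketch below divides the loyal counts by the client_previous counts for each marital status, using the figures from the two summary outputs. The variable names are illustrative.

```r
# Counts by marital status, copied from the summaries in this report.
loyal <- c(divorced = 93, married = 547, single = 338)        # loyal_clients
previous <- c(divorced = 931, married = 4745, single = 2581)  # client_previous

# Share of each marital group that became loyal, in percent.
loyal_rate <- round(100 * loyal / previous, 2)
loyal_rate
```

Read this way, the rate for single clients is slightly higher than for married clients, so the large married share may partly reflect the size of that group.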

-Education

prop.table(table(loyal_clients$education))
## 
##    primary  secondary   tertiary    unknown 
## 0.08282209 0.44274029 0.41820041 0.05623722
plot(loyal_clients$education)

Most loyal clients have a secondary or tertiary education.

-Balance

plot(loyal_clients$balance,
     main = "Distribution of loyal client's balance",
     col = "maroon")
boxplot(loyal_clients$balance)

Most loyal clients have a balance below 10000.

-Correlation of age and balance

plot(x = loyal_clients$age, y = loyal_clients$balance)

Judging from the graph above, most loyal clients, at any age, have an average yearly balance below 5000.
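A scatter plot can be complemented with a correlation coefficient. The sketch below uses hypothetical age and balance values, not the real data, and computes both the Pearson and the rank-based Spearman coefficient, the latter being less sensitive to extreme balances.

```r
# Hypothetical values for illustration only.
age <- c(25, 35, 45, 55, 65)
balance <- c(100, 300, 200, 500, 400)

# Pearson measures linear association; Spearman works on ranks.
r_pearson <- cor(age, balance)
r_spearman <- cor(age, balance, method = "spearman")

r_pearson
r_spearman
```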

-Loan

prop.table(table(loyal_clients$housing))
## 
##        no       yes 
## 0.7453988 0.2546012
prop.table(table(loyal_clients$loan))
## 
##         no        yes 
## 0.95501022 0.04498978

Most loyal clients of the Portuguese bank do not have any loan. Only 25.46% have a housing loan and only 4.50% have a personal loan.

-Correlation of duration and number of contact on “success” poutcome

A  <-  loyal_clients[loyal_clients$poutcome=="success",]
A
plot(x = A$duration, y = A$previous)

Loyal clients tend to accept the campaign when they are called fewer than 5 times and the call lasts less than 10 minutes.

Summary of loyal client demographics:
-Most loyal clients are between 25 and 60 years old.
-Loyal clients have various jobs.
-Most loyal clients have a secondary or tertiary education.
-Most loyal clients, at any age, have an average yearly balance below 10000.
-Most loyal clients do not have any loan.
-Loyal clients were mostly contacted only once and never more than 11 times in this campaign.
-Loyal clients prefer short calls of less than 10 minutes.

4.2 Current Campaign

Next we explore what happened in the current campaign, especially for clients who rejected the term deposit. First, we create a new data frame called campaign_no containing all rows with no in the subscribed column.

campaign_no <- marketing_new[marketing_new$subscribed=="no" , ]
campaign_no

The data set contains 45,211 clients in total, and 39,922 of them (88.3%) rejected the term deposit. We now explore the demographics of the campaign_no data frame.

summary(campaign_no)
##       age                 job           marital          education    
##  Min.   :18.00   blue-collar:9024   divorced: 4585   primary  : 6260  
##  1st Qu.:33.00   management :8157   married :24459   secondary:20752  
##  Median :39.00   technician :6757   single  :10878   tertiary :11305  
##  Mean   :40.84   admin.     :4540                    unknown  : 1605  
##  3rd Qu.:48.00   services   :3785                                     
##  Max.   :95.00   retired    :1748                                     
##                  (Other)    :5911                                     
##  default        balance       housing      loan            day       
##  no :39159   Min.   : -8019   no :16727   no :33162   20     : 2560  
##  yes:  763   1st Qu.:    58   yes:23195   yes: 6760   18     : 2080  
##              Median :   417                           21     : 1825  
##              Mean   :  1304                           17     : 1763  
##              3rd Qu.:  1345                           6      : 1751  
##              Max.   :102127                           5      : 1695  
##                                                       (Other):28248  
##      month          duration       num_contact         pdays       
##  may    :12841   Min.   : 0.000   Min.   : 1.000   Min.   : -1.00  
##  jul    : 6268   1st Qu.: 1.580   1st Qu.: 1.000   1st Qu.: -1.00  
##  aug    : 5559   Median : 2.730   Median : 2.000   Median : -1.00  
##  jun    : 4795   Mean   : 3.686   Mean   : 2.846   Mean   : 36.42  
##  nov    : 3567   3rd Qu.: 4.650   3rd Qu.: 3.000   3rd Qu.: -1.00  
##  apr    : 2355   Max.   :81.970   Max.   :63.000   Max.   :871.00  
##  (Other): 4537                                                     
##     previous           poutcome     subscribed 
##  Min.   :  0.0000   failure: 4283   no :39922  
##  1st Qu.:  0.0000   other  : 1533   yes:    0  
##  Median :  0.0000   success:  533              
##  Mean   :  0.5021   unknown:33573              
##  3rd Qu.:  0.0000                              
##  Max.   :275.0000                              
## 

-Age

hist(campaign_no$age)

Most clients who rejected the term deposit are around 25-50 years old, that is, of productive age. The number of retired clients is very small.

-Job

prop.table(table(campaign_no$job))
## 
##        admin.   blue-collar  entrepreneur     housemaid    management 
##   0.113721757   0.226040780   0.034166625   0.028330244   0.204323431 
##       retired self-employed      services       student    technician 
##   0.043785381   0.034867993   0.094809879   0.016757677   0.169255047 
##    unemployed       unknown 
##   0.027578779   0.006362407

Most clients who rejected the term deposit are blue-collar workers, management workers, technicians or admins.

-Marital Status

prop.table(table(campaign_no$marital))
## 
##  divorced   married    single 
## 0.1148490 0.6126697 0.2724813

Most clients who rejected the term deposit (failure) are married.

-Education

prop.table(table(campaign_no$education))
## 
##   primary secondary  tertiary   unknown 
## 0.1568058 0.5198136 0.2831772 0.0402034

Most clients who rejected the term deposit (failure) have a secondary-level education.

-Balance

plot(campaign_no$balance,
     main = "Distribution of failure client's balance",
     col = "maroon")
plot(x = campaign_no$age, y = campaign_no$balance, main = "Correlation of age and balance", col="maroon")

The average yearly balance of most clients who rejected the term deposit (failure) is below 20000.

Correlation of age and balance: balances start to increase at around 25 years old and start to decrease at around 60 years old. Clients with a high yearly balance are mostly 30-60 years old.
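A pattern like this can be checked numerically by grouping clients into age bands and comparing the median balance per band. This is a minimal sketch on made-up values; the band limits are an assumption chosen for illustration.

```r
# Toy data (values are made up for illustration).
toy <- data.frame(
  age = c(22, 28, 34, 41, 47, 55, 63, 70),
  balance = c(150, 400, 900, 1200, 1500, 1100, 600, 300)
)

# Assign each client to an age band (band limits are illustrative).
toy$age_group <- cut(toy$age,
                     breaks = c(18, 30, 45, 60, 95),
                     include.lowest = TRUE)

# Median balance within each age band.
median_by_group <- tapply(toy$balance, toy$age_group, median)
median_by_group
```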

-Loan

prop.table(table(campaign_no$housing))
## 
##       no      yes 
## 0.418992 0.581008
prop.table(table(campaign_no$loan))
## 
##        no       yes 
## 0.8306698 0.1693302

Most clients who rejected the term deposit (failure) have a housing loan (58%) and do not have a personal loan (83%).

-Correlation of duration and number of contact

plot(x = campaign_no$duration, y = campaign_no$num_contact, main= "Correlation of duration and number of contact", col ="maroon")
plot(x = campaign_no$balance, y = campaign_no$num_contact, main= "Correlation of balance and number of contact", col ="maroon")

The shorter the call, the more contacts tend to be made, but is it polite to call the same client 5 or more times for the same campaign? There are 5,727 clients who rejected the term deposit and were called 5 or more times for this campaign. Clients are likely to become increasingly annoyed the more often they are called. The loyal clients data shows that they were contacted 1-2 times on average and that they prefer short calls. Furthermore, the telemarketing team called clients with a low average balance, which seems like wasted effort. The marketing team should therefore provide a trigger before calling their clients.

nrow(campaign_no[campaign_no$num_contact>=5,])
## [1] 5727
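The condition num_contact >= 5 counts clients called at least 5 times, which is not the same as more than 5 times. A small sketch on a hypothetical vector shows the difference between the two comparisons.

```r
# Hypothetical number of contacts per client.
num_contact <- c(1, 2, 5, 5, 6, 9)

at_least_5 <- sum(num_contact >= 5)  # includes clients called exactly 5 times
more_than_5 <- sum(num_contact > 5)  # excludes them

at_least_5
more_than_5
```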

5 Conclusion and Business Recommendation

In the current campaign, the percentage of clients deciding to subscribe to term deposits is still low. Furthermore, an evaluation needs to be carried out related to the segmentation of the target clients and other programs that support this campaign. Here are some recommendations from this telemarketing campaign data.

-The marketing team should dig deeper into the segments, preferences and demographics of loyal clients so that they can be in line with the strategy carried out by the telemarketing team.

-The marketing team should first provide a trigger to the client about the running campaign, for example in the form of advertising, broadcast messages or content on social media.

-In addition, the marketing team can also narrow the target market to clients whose loan period is almost over.

6 References