Introduction

The core business of a financial institution can be broadly classified as lending and borrowing. Lending generates revenue for the bank in the form of interest from customers, with some level of default risk involved. Borrowing, or rather attracting the public’s savings into the bank, is another source of revenue. This research therefore focuses on helping a corporate bank know its customers better. The target is to identify the customers who are most likely and available to open a term deposit account.

Data Preparation

Check for missing values

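library(dplyr)     # mutate(), glimpse(), %>%
library(ggplot2)   # ggplot() charts
library(corrplot)  # corrplot()
library(gmodels)   # CrossTable()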
apply(bank1,2,function(x)sum(is.na(x)))
##            age            job        marital      education        default 
##              0              0              0              0              0 
##        housing           loan        contact          month    day_of_week 
##              0              0              0              0              0 
##       duration       campaign          pdays       previous       poutcome 
##              0              0              0              0              0 
##   emp.var.rate cons.price.idx  cons.conf.idx      euribor3m    nr.employed 
##              0              0              0              0              0 
##              y 
##              0
# Result: there are no NA values in the data
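The same check can also be written with colSums. The sketch below is only an illustration on a small made-up data frame (toy_bank is hypothetical and stands in for bank1). It also counts "unknown" labels, since the glimpse output in this report shows such labels in some categorical columns and is.na does not treat them as missing.

```r
# Hypothetical toy data standing in for bank1; the values are made up.
toy_bank <- data.frame(
  age = c(56, 57, NA, 40),
  job = c("housemaid", "unknown", "services", NA),
  y   = c("no", "no", "yes", "no"),
  stringsAsFactors = FALSE
)

# Count NA values per column.
na_counts <- colSums(is.na(toy_bank))
print(na_counts)

# Count "unknown" labels per column; these are not NA but may still be gaps.
unknown_counts <- sapply(toy_bank, function(col) sum(col == "unknown", na.rm = TRUE))
print(unknown_counts)
```

Whether "unknown" should be treated as missing or kept as its own category is a modelling choice; the sketch only makes the counts visible.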

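# Separate the categorical and continuous variables
bank_data1 <- bank1[, -c(2:9)]        # drop job ... month (columns 2 to 9)
bank_data2 <- bank_data1[, -c(2, 7)]  # drop day_of_week and poutcome
# Recode the response to 0/1 so it can enter the correlation matrix
bank_data3 <- bank_data2 %>%
  mutate(y = ifelse(y == "no", 0, 1))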
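Dropping columns by position works, but it breaks silently if the column order changes. As a possible alternative, the split can be made by column type. This is a minimal sketch on a made-up data frame (toy_bank and its values are hypothetical, not the real bank1).

```r
# Hypothetical toy data with the same kind of columns as bank1.
toy_bank <- data.frame(
  age      = c(56, 57, 37, 40),
  job      = c("housemaid", "services", "services", "admin."),
  duration = c(261, 149, 226, 151),
  y        = c("no", "no", "yes", "no"),
  stringsAsFactors = FALSE
)

# Recode the response to 0/1 first so that it counts as numeric.
toy_bank$y <- ifelse(toy_bank$y == "no", 0, 1)

# Split the columns by type instead of by position.
is_num <- sapply(toy_bank, is.numeric)
continuous_vars  <- toy_bank[, is_num, drop = FALSE]
categorical_vars <- toy_bank[, !is_num, drop = FALSE]

print(names(continuous_vars))
print(names(categorical_vars))
```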

Data Visualization

glimpse(bank1)
## Observations: 41,188
## Variables: 21
## $ age            <dbl> 56, 57, 37, 40, 56, 45, 59, 41, 24, 25, 41, 25, 2…
## $ job            <chr> "housemaid", "services", "services", "admin.", "s…
## $ marital        <chr> "married", "married", "married", "married", "marr…
## $ education      <chr> "basic.4y", "high.school", "high.school", "basic.…
## $ default        <chr> "no", "unknown", "no", "no", "no", "unknown", "no…
## $ housing        <chr> "no", "no", "yes", "no", "no", "no", "no", "no", …
## $ loan           <chr> "no", "no", "no", "no", "yes", "no", "no", "no", …
## $ contact        <chr> "telephone", "telephone", "telephone", "telephone…
## $ month          <chr> "may", "may", "may", "may", "may", "may", "may", …
## $ day_of_week    <chr> "mon", "mon", "mon", "mon", "mon", "mon", "mon", …
## $ duration       <dbl> 261, 149, 226, 151, 307, 198, 139, 217, 380, 50, …
## $ campaign       <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ pdays          <dbl> 999, 999, 999, 999, 999, 999, 999, 999, 999, 999,…
## $ previous       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ poutcome       <chr> "nonexistent", "nonexistent", "nonexistent", "non…
## $ emp.var.rate   <dbl> 1.1, 1.1, 1.1, 1.1, 1.1, 1.1, 1.1, 1.1, 1.1, 1.1,…
## $ cons.price.idx <dbl> 93.994, 93.994, 93.994, 93.994, 93.994, 93.994, 9…
## $ cons.conf.idx  <dbl> -36.4, -36.4, -36.4, -36.4, -36.4, -36.4, -36.4, …
## $ euribor3m      <dbl> 4.857, 4.857, 4.857, 4.857, 4.857, 4.857, 4.857, …
## $ nr.employed    <dbl> 5191, 5191, 5191, 5191, 5191, 5191, 5191, 5191, 5…
## $ y              <chr> "no", "no", "no", "no", "no", "no", "no", "no", "…
ggplot(bank1, aes(x = y)) + geom_bar() + labs(x = "Term Deposit")

# Roughly 90% of customers did not subscribe to the term deposit; only a small portion said yes.
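The share read off the chart can be checked numerically with prop.table. The sketch below uses a made-up response vector (toy_y is hypothetical) purely to show the call; on the real data the same expression would be applied to bank1$y.

```r
# Hypothetical response vector: 9 "no" and 1 "yes".
toy_y <- c(rep("no", 9), rep("yes", 1))

# Share of each response class.
shares <- prop.table(table(toy_y))
print(round(shares, 2))
```

A strongly unbalanced response like this one is worth keeping in mind when reading accuracy figures later, because always predicting "no" already scores high.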

Data Analysis

Data Analysis by Categorical Variables

barplot(table(bank1$job),col="grey",ylab="No. of Clients",las=2,main="Job",cex.names = 1,cex.axis = 1)

# Admin, blue-collar and technician jobs account for the largest share, while housemaids and students each make up only about 2-3%, possibly because they do not have enough money.

barplot(table(bank1$education),col="grey",ylab="No. of Clients",las=2,main="Education",cex.names = 1,cex.axis = 1)

# Customers with a university or high-school education are the most likely to subscribe to the term deposit. Interestingly, a higher level of education seems to go along with a higher likelihood of making a term deposit, so university-educated customers are probably one of the largest customer groups. One possible reason is that they earn some money from working but still need to set money aside for tuition and living expenses.

barplot(table(bank1$marital),col="grey",ylab="No. of Clients",las=2,main="Marital Status",cex.names = 1,cex.axis = 1)

# Married people are the most likely to have a term deposit, followed by single people; divorced people are the least likely.

barplot(table(bank1$month),col="grey",ylab="No. of Clients",las=2,main="Month",cex.names = 1,cex.axis = 1)

# May has the highest number of customers subscribing to the term deposit; December has the lowest.

barplot(table(bank1$housing),col= "grey",ylab="No. of Clients",las=2,main="Housing",cex.names = 1,cex.axis = 1)

# The numbers of people with and without a housing loan are almost the same, so a housing loan does not appear to be an influential factor.

barplot(table(bank1$loan),col="grey",ylab="No. of Clients",las=2,main="Loan",cex.names = 1,cex.axis = 1)

# People without a bank loan are more likely to subscribe to the term deposit, as expected.

barplot(table(bank1$contact),col="grey",ylab="No. of Clients",las=2,main="Contact",cex.names = 1,cex.axis = 1)

# Cellular is the best contact method.
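The bar charts above show how many clients fall into each category. To compare how often each category actually says yes, one option is a table of row proportions. The sketch below uses a small made-up data frame (toy is hypothetical, and its numbers say nothing about the real data).

```r
# Hypothetical toy data: contact method and response for eight clients.
toy <- data.frame(
  contact = c("cellular", "cellular", "cellular", "cellular",
              "telephone", "telephone", "telephone", "telephone"),
  y       = c("yes", "yes", "no", "no", "yes", "no", "no", "no"),
  stringsAsFactors = FALSE
)

# Counts per category and response.
counts <- table(toy$contact, toy$y)

# Row proportions: within each contact method, the share of "no" and "yes".
rates <- prop.table(counts, margin = 1)
print(rates)
```

On the real data the same two lines could be applied to any categorical column of bank1 together with bank1$y, which separates "this group is large" from "this group subscribes often".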

Data Analysis by Continuous Variables

ggplot(aes(x = age, y = y, col = y), data = bank1) + geom_boxplot() +
  ggtitle("Older clients (48-50) are more likely to open an account")

ggplot(bank1, aes(x = previous, fill = y)) + geom_histogram(bins = 30) + ggtitle("Times contacted vs. response")

ggplot(bank1, aes(x = duration, fill=y))+ geom_histogram(bins = 40)

# Duration has the biggest influence on the response when the call falls within the right time range.

cor_matrix <- cor(bank_data3)
corrplot(cor_matrix, method="circle")

# Duration is one of the important factors affecting whether the response y is yes or no; nr.employed and previous influence the response in the same direction.
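To read the relationship with the response off the matrix rather than off the plot, the y column of the correlation matrix can be extracted and sorted. This is a minimal sketch on made-up numbers (toy is hypothetical); it assumes the response has already been recoded to 0/1.

```r
# Hypothetical toy data with a 0/1 response.
toy <- data.frame(
  duration = c(100, 150, 200, 400, 600, 800),
  previous = c(0, 0, 1, 0, 2, 1),
  y        = c(0, 0, 0, 1, 1, 1)
)

# Pearson correlation matrix of all numeric columns.
cor_matrix <- cor(toy)

# Correlation of each predictor with the response, strongest first.
cor_with_y <- cor_matrix[, "y"]
cor_with_y <- cor_with_y[names(cor_with_y) != "y"]
print(sort(cor_with_y, decreasing = TRUE))
```

The sign of each value gives the direction of the association, which is a quick way to check whether two predictors really move the response in the same direction.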

Data Comparison

ggplot(aes(x = month, y = duration, col = y), data = bank1) + geom_boxplot() +
  ggtitle("'No' responses are independent of month except for May and July")

ggplot(aes(x= marital,y = duration), data = bank1) + geom_point()

barplot(table(bank1$duration),col="red",ylab="No. of Clients",las=2,main="Duration",cex.names = 1,cex.axis = 1)

ggplot(aes(x= job,y = duration), data = bank1) + geom_point()

Findings

Based on our analysis, we find that marital status, education, job, and duration have the most influence on the likelihood of subscribing to the term deposit. However, we also want to know the dependence between these variables, which may influence the customer’s decision. The tests are as follows:

Hypothesis Testing

### Marital status and job (at the 0.05 significance level)
CrossTable(bank2$job, bank2$marital, chisq = T)
## Warning in chisq.test(t, correct = FALSE, ...): Chi-squared approximation
## may be incorrect
## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## | Chi-square contribution |
## |           N / Row Total |
## |           N / Col Total |
## |         N / Table Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  41188 
## 
##  
##               | bank2$marital 
##     bank2$job |  divorced |   married |    single |   unknown | Row Total | 
## --------------|-----------|-----------|-----------|-----------|-----------|
##        admin. |      1280 |      5253 |      3875 |        14 |     10422 | 
##               |    10.942 |   176.340 |   306.959 |     1.925 |           | 
##               |     0.123 |     0.504 |     0.372 |     0.001 |     0.253 | 
##               |     0.278 |     0.211 |     0.335 |     0.175 |           | 
##               |     0.031 |     0.128 |     0.094 |     0.000 |           | 
## --------------|-----------|-----------|-----------|-----------|-----------|
##   blue-collar |       728 |      6687 |      1825 |        14 |      9254 | 
##               |    91.674 |   210.675 |   230.535 |     0.879 |           | 
##               |     0.079 |     0.723 |     0.197 |     0.002 |     0.225 | 
##               |     0.158 |     0.268 |     0.158 |     0.175 |           | 
##               |     0.018 |     0.162 |     0.044 |     0.000 |           | 
## --------------|-----------|-----------|-----------|-----------|-----------|
##  entrepreneur |       179 |      1071 |       203 |         3 |      1456 | 
##               |     1.563 |    40.877 |   103.703 |     0.010 |           | 
##               |     0.123 |     0.736 |     0.139 |     0.002 |     0.035 | 
##               |     0.039 |     0.043 |     0.018 |     0.037 |           | 
##               |     0.004 |     0.026 |     0.005 |     0.000 |           | 
## --------------|-----------|-----------|-----------|-----------|-----------|
##     housemaid |       161 |       777 |       119 |         3 |      1060 | 
##               |    15.080 |    28.603 |   107.276 |     0.430 |           | 
##               |     0.152 |     0.733 |     0.112 |     0.003 |     0.026 | 
##               |     0.035 |     0.031 |     0.010 |     0.037 |           | 
##               |     0.004 |     0.019 |     0.003 |     0.000 |           | 
## --------------|-----------|-----------|-----------|-----------|-----------|
##    management |       331 |      2089 |       501 |         3 |      2924 | 
##               |     0.039 |    57.619 |   124.870 |     1.264 |           | 
##               |     0.113 |     0.714 |     0.171 |     0.001 |     0.071 | 
##               |     0.072 |     0.084 |     0.043 |     0.037 |           | 
##               |     0.008 |     0.051 |     0.012 |     0.000 |           | 
## --------------|-----------|-----------|-----------|-----------|-----------|
##       retired |       348 |      1274 |        93 |         5 |      1720 | 
##               |   125.394 |    52.157 |   314.981 |     0.824 |           | 
##               |     0.202 |     0.741 |     0.054 |     0.003 |     0.042 | 
##               |     0.075 |     0.051 |     0.008 |     0.062 |           | 
##               |     0.008 |     0.031 |     0.002 |     0.000 |           | 
## --------------|-----------|-----------|-----------|-----------|-----------|
## self-employed |       133 |       904 |       379 |         5 |      1421 | 
##               |     4.286 |     2.249 |     1.012 |     1.818 |           | 
##               |     0.094 |     0.636 |     0.267 |     0.004 |     0.035 | 
##               |     0.029 |     0.036 |     0.033 |     0.062 |           | 
##               |     0.003 |     0.022 |     0.009 |     0.000 |           | 
## --------------|-----------|-----------|-----------|-----------|-----------|
##      services |       532 |      2294 |      1137 |         6 |      3969 | 
##               |    17.256 |     4.868 |     0.445 |     0.379 |           | 
##               |     0.134 |     0.578 |     0.286 |     0.002 |     0.096 | 
##               |     0.115 |     0.092 |     0.098 |     0.075 |           | 
##               |     0.013 |     0.056 |     0.028 |     0.000 |           | 
## --------------|-----------|-----------|-----------|-----------|-----------|
##       student |         9 |        41 |       824 |         1 |       875 | 
##               |    80.804 |   450.746 |  1360.611 |     0.288 |           | 
##               |     0.010 |     0.047 |     0.942 |     0.001 |     0.021 | 
##               |     0.002 |     0.002 |     0.071 |     0.013 |           | 
##               |     0.000 |     0.001 |     0.020 |     0.000 |           | 
## --------------|-----------|-----------|-----------|-----------|-----------|
##    technician |       774 |      3670 |      2287 |        12 |      6743 | 
##               |     0.476 |    41.398 |    81.625 |     0.092 |           | 
##               |     0.115 |     0.544 |     0.339 |     0.002 |     0.164 | 
##               |     0.168 |     0.147 |     0.198 |     0.150 |           | 
##               |     0.019 |     0.089 |     0.056 |     0.000 |           | 
## --------------|-----------|-----------|-----------|-----------|-----------|
##    unemployed |       124 |       634 |       251 |         5 |      1014 | 
##               |     0.963 |     0.672 |     4.009 |     4.663 |           | 
##               |     0.122 |     0.625 |     0.248 |     0.005 |     0.025 | 
##               |     0.027 |     0.025 |     0.022 |     0.062 |           | 
##               |     0.003 |     0.015 |     0.006 |     0.000 |           | 
## --------------|-----------|-----------|-----------|-----------|-----------|
##       unknown |        13 |       234 |        74 |         9 |       330 | 
##               |    15.525 |     5.882 |     3.766 |   109.013 |           | 
##               |     0.039 |     0.709 |     0.224 |     0.027 |     0.008 | 
##               |     0.003 |     0.009 |     0.006 |     0.113 |           | 
##               |     0.000 |     0.006 |     0.002 |     0.000 |           | 
## --------------|-----------|-----------|-----------|-----------|-----------|
##  Column Total |      4612 |     24928 |     11568 |        80 |     41188 | 
##               |     0.112 |     0.605 |     0.281 |     0.002 |           | 
## --------------|-----------|-----------|-----------|-----------|-----------|
## 
##  
## Statistics for All Table Factors
## 
## 
## Pearson's Chi-squared test 
## ------------------------------------------------------------
## Chi^2 =  4197.469     d.f. =  33     p =  0 
## 
## 
## 
# From the test (p < 0.05), job and marital status are not independent. Married people form the largest group in nearly every job category, with shares of up to about 74%, so they appear to be the most likely term deposit customers regardless of job. Students are the exception, with only about 5% married, which is reasonable.
CrossTable(bank2$education, bank2$marital, chisq = T)
## Warning in chisq.test(t, correct = FALSE, ...): Chi-squared approximation
## may be incorrect
## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## | Chi-square contribution |
## |           N / Row Total |
## |           N / Col Total |
## |         N / Table Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  41188 
## 
##  
##                     | bank2$marital 
##     bank2$education |  divorced |   married |    single |   unknown | Row Total | 
## --------------------|-----------|-----------|-----------|-----------|-----------|
##            basic.4y |       489 |      3228 |       453 |         6 |      4176 | 
##                     |     0.979 |   194.196 |   441.829 |     0.549 |           | 
##                     |     0.117 |     0.773 |     0.108 |     0.001 |     0.101 | 
##                     |     0.106 |     0.129 |     0.039 |     0.075 |           | 
##                     |     0.012 |     0.078 |     0.011 |     0.000 |           | 
## --------------------|-----------|-----------|-----------|-----------|-----------|
##            basic.6y |       182 |      1767 |       337 |         6 |      2292 | 
##                     |    21.711 |   104.000 |   146.152 |     0.538 |           | 
##                     |     0.079 |     0.771 |     0.147 |     0.003 |     0.056 | 
##                     |     0.039 |     0.071 |     0.029 |     0.075 |           | 
##                     |     0.004 |     0.043 |     0.008 |     0.000 |           | 
## --------------------|-----------|-----------|-----------|-----------|-----------|
##            basic.9y |       565 |      4156 |      1316 |         8 |      6045 | 
##                     |    18.494 |    67.628 |    85.855 |     1.192 |           | 
##                     |     0.093 |     0.688 |     0.218 |     0.001 |     0.147 | 
##                     |     0.123 |     0.167 |     0.114 |     0.100 |           | 
##                     |     0.014 |     0.101 |     0.032 |     0.000 |           | 
## --------------------|-----------|-----------|-----------|-----------|-----------|
##         high.school |      1193 |      5158 |      3150 |        14 |      9515 | 
##                     |    15.273 |    62.663 |    85.367 |     1.087 |           | 
##                     |     0.125 |     0.542 |     0.331 |     0.001 |     0.231 | 
##                     |     0.259 |     0.207 |     0.272 |     0.175 |           | 
##                     |     0.029 |     0.125 |     0.076 |     0.000 |           | 
## --------------------|-----------|-----------|-----------|-----------|-----------|
##          illiterate |         2 |        15 |         1 |         0 |        18 | 
##                     |     0.000 |     1.548 |     3.253 |     0.035 |           | 
##                     |     0.111 |     0.833 |     0.056 |     0.000 |     0.000 | 
##                     |     0.000 |     0.001 |     0.000 |     0.000 |           | 
##                     |     0.000 |     0.000 |     0.000 |     0.000 |           | 
## --------------------|-----------|-----------|-----------|-----------|-----------|
## professional.course |       657 |      3156 |      1424 |         6 |      5243 | 
##                     |     8.327 |     0.093 |     1.600 |     1.719 |           | 
##                     |     0.125 |     0.602 |     0.272 |     0.001 |     0.127 | 
##                     |     0.142 |     0.127 |     0.123 |     0.075 |           | 
##                     |     0.016 |     0.077 |     0.035 |     0.000 |           | 
## --------------------|-----------|-----------|-----------|-----------|-----------|
##   university.degree |      1337 |      6394 |      4406 |        31 |     12168 | 
##                     |     0.477 |   127.863 |   285.929 |     2.296 |           | 
##                     |     0.110 |     0.525 |     0.362 |     0.003 |     0.295 | 
##                     |     0.290 |     0.256 |     0.381 |     0.388 |           | 
##                     |     0.032 |     0.155 |     0.107 |     0.001 |           | 
## --------------------|-----------|-----------|-----------|-----------|-----------|
##             unknown |       187 |      1054 |       481 |         9 |      1731 | 
##                     |     0.241 |     0.039 |     0.055 |     9.454 |           | 
##                     |     0.108 |     0.609 |     0.278 |     0.005 |     0.042 | 
##                     |     0.041 |     0.042 |     0.042 |     0.113 |           | 
##                     |     0.005 |     0.026 |     0.012 |     0.000 |           | 
## --------------------|-----------|-----------|-----------|-----------|-----------|
##        Column Total |      4612 |     24928 |     11568 |        80 |     41188 | 
##                     |     0.112 |     0.605 |     0.281 |     0.002 |           | 
## --------------------|-----------|-----------|-----------|-----------|-----------|
## 
##  
## Statistics for All Table Factors
## 
## 
## Pearson's Chi-squared test 
## ------------------------------------------------------------
## Chi^2 =  1690.44     d.f. =  21     p =  0 
## 
## 
## 
# Again, married people are the largest group at every education level, so they remain the most likely term deposit customers regardless of education. One point worth mentioning: the married share among clients with a professional course (about 60%) is lower than among clients with only basic education (about 69-77%).
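Both tests print the warning "Chi-squared approximation may be incorrect". In R this warning is raised when some expected cell counts are small (below 5). The sketch below illustrates this with base R's chisq.test on just two rows of the education table above (illiterate and university.degree). It is not the full test, only a way to look at the expected counts.

```r
# Two rows copied from the education x marital table above.
sub_counts <- matrix(
  c(   2,   15,    1,  0,
    1337, 6394, 4406, 31),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    education = c("illiterate", "university.degree"),
    marital   = c("divorced", "married", "single", "unknown")
  )
)

# Pearson chi-squared test; the warning is suppressed here because the
# point of the sketch is to inspect the expected counts behind it.
test <- suppressWarnings(chisq.test(sub_counts, correct = FALSE))
print(round(test$expected, 2))

# Number of cells whose expected count is below 5.
small_cells <- sum(test$expected < 5)
print(small_cells)
```

With sparse categories such as "illiterate" or "unknown", possible remedies include merging rare levels or calling chisq.test with simulate.p.value = TRUE; which one is appropriate depends on the question being asked.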

Model Evaluation

##           CP nsplit rel error    xerror       xstd
## 1 0.07424569      0 1.0000000 1.0000000 0.01382889
## 2 0.02295259      2 0.8515086 0.8577586 0.01292279
## 3 0.01594828      4 0.8056034 0.7987069 0.01251586
## 4 0.01000000      6 0.7737069 0.7818966 0.01239633
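The table above has the layout of the complexity-parameter table that rpart prints for a decision tree, although the call that produced it is not shown here, so that is an assumption. Under that assumption, a common way to read it is to pick the row with the lowest cross-validated error (xerror), or the smallest tree within one standard error of it. The sketch below re-enters the values from the table by hand; the column name rel_error is a renamed stand-in for "rel error".

```r
# Values copied from the table above.
cp_table <- data.frame(
  CP        = c(0.07424569, 0.02295259, 0.01594828, 0.01000000),
  nsplit    = c(0, 2, 4, 6),
  rel_error = c(1.0000000, 0.8515086, 0.8056034, 0.7737069),
  xerror    = c(1.0000000, 0.8577586, 0.7987069, 0.7818966),
  xstd      = c(0.01382889, 0.01292279, 0.01251586, 0.01239633)
)

# Rule 1: the row with the lowest cross-validated error.
best <- which.min(cp_table$xerror)

# Rule 2 (one standard error): the smallest tree whose xerror is within
# one xstd of the minimum.
threshold <- cp_table$xerror[best] + cp_table$xstd[best]
one_se <- min(which(cp_table$xerror <= threshold))

print(cp_table[best, ])
print(cp_table[one_se, ])
```

For these values both rules select the last row (CP = 0.01, 6 splits), where the cross-validated error is about 0.78 relative to a tree with no splits.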

Conclusion

Based on our analysis, we have two conclusions. First, the target customers are those who are married, university educated, or working in admin jobs; these groups have the highest likelihood of subscribing to the term deposit. Second, the model suggests that the best follow-up is a call lasting about 250 seconds, made 17 days after the last contact.