The core business of a financial institution can be broadly classified as lending and borrowing. Lending generates revenue for the bank in the form of interest from customers, with some level of default risk involved. Borrowing, or rather attracting the public's savings into the bank, is another source of revenue. This research therefore focuses on helping a corporate bank know its customers better. The target is to identify the customers who are most likely to open a term deposit account.
library(dplyr)
library(ggplot2)   # ggplot() and the geom_*() layers
library(corrplot)  # corrplot()
library(gmodels)   # CrossTable()
colSums(is.na(bank1))
## age job marital education default
## 0 0 0 0 0
## housing loan contact month day_of_week
## 0 0 0 0 0
## duration campaign pdays previous poutcome
## 0 0 0 0 0
## emp.var.rate cons.price.idx cons.conf.idx euribor3m nr.employed
## 0 0 0 0 0
## y
## 0
# Result: the data contain no NA values.
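The check above only counts true `NA` values. The `glimpse()` output further down shows that some categorical columns (for example `default`) hold the string "unknown" instead, and those entries are not counted. A minimal sketch of counting both, using a small made-up data frame (`toy` and its values are illustrative, not the bank data):

```r
# Toy data frame standing in for bank1 (values are made up for illustration)
toy <- data.frame(
  age     = c(56, 57, 37, 40),
  job     = c("housemaid", "unknown", "services", "admin."),
  default = c("no", "unknown", "no", "unknown"),
  stringsAsFactors = FALSE
)

# Count true NA values per column
na_counts <- colSums(is.na(toy))

# Count "unknown" placeholders per column
unknown_counts <- colSums(toy == "unknown")

print(na_counts)
print(unknown_counts)
```

On the real data the same two calls would take `bank1` in place of `toy`.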
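# Drop the categorical columns job through month (columns 2 to 9)
bank_data1 <- bank1[, -c(2:9)]
# Drop day_of_week and poutcome, the two remaining categorical predictors
bank_data2 <- bank_data1[, -c(2, 7)]
# Recode the response as 0/1 so that it can enter the correlation matrix
bank_data3 <- bank_data2 %>%
  mutate(y = ifelse(y == "no", 0, 1))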
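Dropping columns by position gives the wrong result silently if the column order ever changes. As a sketch of an alternative, the numeric columns can be selected by type and the recoded response appended afterwards. The data frame `toy` and the name `numeric_part` are made up for illustration:

```r
# Toy data frame with mixed column types (values are made up)
toy <- data.frame(
  age      = c(56, 57, 37),
  job      = c("housemaid", "services", "admin."),
  duration = c(261, 149, 226),
  y        = c("no", "no", "yes"),
  stringsAsFactors = FALSE
)

# Keep only the numeric columns, whatever their position
numeric_part <- toy[sapply(toy, is.numeric)]

# Recode the response as 0/1 and append it to the numeric columns
numeric_part$y <- ifelse(toy$y == "no", 0, 1)

str(numeric_part)
```

Selecting by type keeps the continuous variables and the recoded response together in one data frame that `cor()` can accept.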
# Separate the categorical and continuous variables
glimpse(bank1)
## Observations: 41,188
## Variables: 21
## $ age <dbl> 56, 57, 37, 40, 56, 45, 59, 41, 24, 25, 41, 25, 2…
## $ job <chr> "housemaid", "services", "services", "admin.", "s…
## $ marital <chr> "married", "married", "married", "married", "marr…
## $ education <chr> "basic.4y", "high.school", "high.school", "basic.…
## $ default <chr> "no", "unknown", "no", "no", "no", "unknown", "no…
## $ housing <chr> "no", "no", "yes", "no", "no", "no", "no", "no", …
## $ loan <chr> "no", "no", "no", "no", "yes", "no", "no", "no", …
## $ contact <chr> "telephone", "telephone", "telephone", "telephone…
## $ month <chr> "may", "may", "may", "may", "may", "may", "may", …
## $ day_of_week <chr> "mon", "mon", "mon", "mon", "mon", "mon", "mon", …
## $ duration <dbl> 261, 149, 226, 151, 307, 198, 139, 217, 380, 50, …
## $ campaign <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ pdays <dbl> 999, 999, 999, 999, 999, 999, 999, 999, 999, 999,…
## $ previous <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ poutcome <chr> "nonexistent", "nonexistent", "nonexistent", "non…
## $ emp.var.rate <dbl> 1.1, 1.1, 1.1, 1.1, 1.1, 1.1, 1.1, 1.1, 1.1, 1.1,…
## $ cons.price.idx <dbl> 93.994, 93.994, 93.994, 93.994, 93.994, 93.994, 9…
## $ cons.conf.idx <dbl> -36.4, -36.4, -36.4, -36.4, -36.4, -36.4, -36.4, …
## $ euribor3m <dbl> 4.857, 4.857, 4.857, 4.857, 4.857, 4.857, 4.857, …
## $ nr.employed <dbl> 5191, 5191, 5191, 5191, 5191, 5191, 5191, 5191, 5…
## $ y <chr> "no", "no", "no", "no", "no", "no", "no", "no", "…
ggplot(bank1, aes(x = y)) + geom_bar() + labs(x = "Term Deposit")
# Roughly 90% of customers did not subscribe to the term deposit; only a small share said yes.
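The share of each response can be computed exactly rather than estimated from the plot. A minimal sketch with a made-up response vector (`y_toy` is illustrative; on the real data the same calls would use `bank1$y`):

```r
# Made-up response vector: nine "no" answers and one "yes"
y_toy <- c(rep("no", 9), "yes")

# Absolute counts per response
counts <- table(y_toy)

# Shares per response (they sum to 1)
shares <- prop.table(counts)

print(counts)
print(round(shares, 2))
```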
barplot(table(bank1$job),col="grey",ylab="No. of Clients",las=2,main="Job",cex.names = 1,cex.axis = 1)
# Admin., blue-collar, and technician jobs account for the largest shares of clients; housemaids and students each make up only about 2-3%, possibly because they do not have enough money to save.
barplot(table(bank1$education),col="grey",ylab="No. of Clients",las=2,main="Education",cex.names = 1,cex.axis = 1)
# Clients with a university degree or a high school education are the largest groups, so a higher level of education may go together with a higher chance of making a term deposit. University-educated clients are probably one of the largest customer segments; one possible reason is that they earn money from work but still need to save for tuition and living expenses.
barplot(table(bank1$marital),col="grey",ylab="No. of Clients",las=2,main="Marital Status",cex.names = 1,cex.axis = 1)
# Married clients are the largest group, followed by single clients; divorced clients are the smallest of the three.
barplot(table(bank1$month),col="grey",ylab="No. of Clients",las=2,main="Month",cex.names = 1,cex.axis = 1)
# May has the highest number of contacted clients and December the lowest.
barplot(table(bank1$housing),col= "grey",ylab="No. of Clients",las=2,main="Housing",cex.names = 1,cex.axis = 1)
# Clients with and without a housing loan are almost equal in number, so a housing loan does not appear to be an influential factor.
barplot(table(bank1$loan),col="grey",ylab="No. of Clients",las=2,main="Loan",cex.names = 1,cex.axis = 1)
# Most clients have no personal loan, so clients without a bank loan make up most of the potential term deposit subscribers.
barplot(table(bank1$contact),col="grey",ylab="No. of Clients",las=2,main="Contact",cex.names = 1,cex.axis = 1)
# Cellular is the most common contact method.
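The bar plots above show how many clients fall into each category, not how often each category subscribes. To compare subscription rates directly, the response can be cross-tabulated against the category and converted to row shares. A minimal sketch on made-up data (`toy` and its values are illustrative only):

```r
# Made-up contact method and response for eight clients
toy <- data.frame(
  contact = c("cellular", "cellular", "cellular", "cellular",
              "telephone", "telephone", "telephone", "telephone"),
  y       = c("yes", "yes", "no", "no", "no", "no", "no", "yes"),
  stringsAsFactors = FALSE
)

# Counts of each response within each contact method
counts <- table(toy$contact, toy$y)

# Row shares: the "yes" column is the subscription rate per category
rates <- prop.table(counts, margin = 1)

print(round(rates, 2))
```

On the real data the same pattern would apply to, for example, `table(bank1$job, bank1$y)`.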
ggplot(bank1, aes(x = age, y = y, col = y)) + geom_boxplot() +
  ggtitle("Older clients (48-50) are more likely to open an account")
ggplot(bank1, aes(x = previous, fill = y)) + geom_histogram(bins = 30) + ggtitle("Number of previous contacts vs. response")
ggplot(bank1, aes(x = duration, fill = y)) + geom_histogram(bins = 40)
# Duration has the strongest influence on the response, provided the call is well timed.
cor_matrix <- cor(bank_data3)
corrplot(cor_matrix, method="circle")
# Duration is one of the most important factors associated with the response y; nr.employed and previous are also correlated with the response.
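Reading the strength of each relationship from a circle plot is approximate. As a sketch, the column of the correlation matrix that belongs to the response can be extracted and sorted by absolute value. The data frame `toy` is made up, and the response is assumed to be coded 0/1:

```r
# Made-up numeric predictors and a 0/1 response
toy <- data.frame(
  duration = c(100, 150, 200, 400, 600, 800),
  previous = c(0, 0, 1, 0, 2, 1),
  y        = c(0, 0, 0, 1, 1, 1)
)

cor_matrix_toy <- cor(toy)

# Correlation of every predictor with the response
cor_with_y <- cor_matrix_toy[, "y"]
cor_with_y <- cor_with_y[names(cor_with_y) != "y"]

# Strongest relationship first, regardless of sign
cor_with_y <- cor_with_y[order(abs(cor_with_y), decreasing = TRUE)]

print(round(cor_with_y, 2))
```

The sign of each value shows the direction of the relationship, which the size of a circle alone does not.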
ggplot(bank1, aes(x = month, y = duration, col = y)) + geom_boxplot() +
  ggtitle("'No' responses are independent of month, except for May and July")
ggplot(aes(x= marital,y = duration), data = bank1) + geom_point()
barplot(table(bank1$duration),col="red",ylab="No. of Clients",las=2,main="Duration",cex.names = 1,cex.axis = 1)
ggplot(aes(x= job,y = duration), data = bank1) + geom_point()
Based on our analysis, we find that marital status, education, job, and duration have the most influence on the likelihood of subscribing to the term deposit. However, we also want to know how these variables depend on one another, since that dependence will influence the customer's decision. The tests are as follows:
## Hypothesis Testing
### Marital status and job (at the 0.05 significance level)
CrossTable(bank2$job, bank2$marital, chisq = TRUE)
## Warning in chisq.test(t, correct = FALSE, ...): Chi-squared approximation
## may be incorrect
##
##
## Cell Contents
## |-------------------------|
## | N |
## | Chi-square contribution |
## | N / Row Total |
## | N / Col Total |
## | N / Table Total |
## |-------------------------|
##
##
## Total Observations in Table: 41188
##
##
## | bank2$marital
## bank2$job | divorced | married | single | unknown | Row Total |
## --------------|-----------|-----------|-----------|-----------|-----------|
## admin. | 1280 | 5253 | 3875 | 14 | 10422 |
## | 10.942 | 176.340 | 306.959 | 1.925 | |
## | 0.123 | 0.504 | 0.372 | 0.001 | 0.253 |
## | 0.278 | 0.211 | 0.335 | 0.175 | |
## | 0.031 | 0.128 | 0.094 | 0.000 | |
## --------------|-----------|-----------|-----------|-----------|-----------|
## blue-collar | 728 | 6687 | 1825 | 14 | 9254 |
## | 91.674 | 210.675 | 230.535 | 0.879 | |
## | 0.079 | 0.723 | 0.197 | 0.002 | 0.225 |
## | 0.158 | 0.268 | 0.158 | 0.175 | |
## | 0.018 | 0.162 | 0.044 | 0.000 | |
## --------------|-----------|-----------|-----------|-----------|-----------|
## entrepreneur | 179 | 1071 | 203 | 3 | 1456 |
## | 1.563 | 40.877 | 103.703 | 0.010 | |
## | 0.123 | 0.736 | 0.139 | 0.002 | 0.035 |
## | 0.039 | 0.043 | 0.018 | 0.037 | |
## | 0.004 | 0.026 | 0.005 | 0.000 | |
## --------------|-----------|-----------|-----------|-----------|-----------|
## housemaid | 161 | 777 | 119 | 3 | 1060 |
## | 15.080 | 28.603 | 107.276 | 0.430 | |
## | 0.152 | 0.733 | 0.112 | 0.003 | 0.026 |
## | 0.035 | 0.031 | 0.010 | 0.037 | |
## | 0.004 | 0.019 | 0.003 | 0.000 | |
## --------------|-----------|-----------|-----------|-----------|-----------|
## management | 331 | 2089 | 501 | 3 | 2924 |
## | 0.039 | 57.619 | 124.870 | 1.264 | |
## | 0.113 | 0.714 | 0.171 | 0.001 | 0.071 |
## | 0.072 | 0.084 | 0.043 | 0.037 | |
## | 0.008 | 0.051 | 0.012 | 0.000 | |
## --------------|-----------|-----------|-----------|-----------|-----------|
## retired | 348 | 1274 | 93 | 5 | 1720 |
## | 125.394 | 52.157 | 314.981 | 0.824 | |
## | 0.202 | 0.741 | 0.054 | 0.003 | 0.042 |
## | 0.075 | 0.051 | 0.008 | 0.062 | |
## | 0.008 | 0.031 | 0.002 | 0.000 | |
## --------------|-----------|-----------|-----------|-----------|-----------|
## self-employed | 133 | 904 | 379 | 5 | 1421 |
## | 4.286 | 2.249 | 1.012 | 1.818 | |
## | 0.094 | 0.636 | 0.267 | 0.004 | 0.035 |
## | 0.029 | 0.036 | 0.033 | 0.062 | |
## | 0.003 | 0.022 | 0.009 | 0.000 | |
## --------------|-----------|-----------|-----------|-----------|-----------|
## services | 532 | 2294 | 1137 | 6 | 3969 |
## | 17.256 | 4.868 | 0.445 | 0.379 | |
## | 0.134 | 0.578 | 0.286 | 0.002 | 0.096 |
## | 0.115 | 0.092 | 0.098 | 0.075 | |
## | 0.013 | 0.056 | 0.028 | 0.000 | |
## --------------|-----------|-----------|-----------|-----------|-----------|
## student | 9 | 41 | 824 | 1 | 875 |
## | 80.804 | 450.746 | 1360.611 | 0.288 | |
## | 0.010 | 0.047 | 0.942 | 0.001 | 0.021 |
## | 0.002 | 0.002 | 0.071 | 0.013 | |
## | 0.000 | 0.001 | 0.020 | 0.000 | |
## --------------|-----------|-----------|-----------|-----------|-----------|
## technician | 774 | 3670 | 2287 | 12 | 6743 |
## | 0.476 | 41.398 | 81.625 | 0.092 | |
## | 0.115 | 0.544 | 0.339 | 0.002 | 0.164 |
## | 0.168 | 0.147 | 0.198 | 0.150 | |
## | 0.019 | 0.089 | 0.056 | 0.000 | |
## --------------|-----------|-----------|-----------|-----------|-----------|
## unemployed | 124 | 634 | 251 | 5 | 1014 |
## | 0.963 | 0.672 | 4.009 | 4.663 | |
## | 0.122 | 0.625 | 0.248 | 0.005 | 0.025 |
## | 0.027 | 0.025 | 0.022 | 0.062 | |
## | 0.003 | 0.015 | 0.006 | 0.000 | |
## --------------|-----------|-----------|-----------|-----------|-----------|
## unknown | 13 | 234 | 74 | 9 | 330 |
## | 15.525 | 5.882 | 3.766 | 109.013 | |
## | 0.039 | 0.709 | 0.224 | 0.027 | 0.008 |
## | 0.003 | 0.009 | 0.006 | 0.113 | |
## | 0.000 | 0.006 | 0.002 | 0.000 | |
## --------------|-----------|-----------|-----------|-----------|-----------|
## Column Total | 4612 | 24928 | 11568 | 80 | 41188 |
## | 0.112 | 0.605 | 0.281 | 0.002 | |
## --------------|-----------|-----------|-----------|-----------|-----------|
##
##
## Statistics for All Table Factors
##
##
## Pearson's Chi-squared test
## ------------------------------------------------------------
## Chi^2 = 4197.469 d.f. = 33 p = 0
##
##
##
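# The test rejects independence at the 0.05 level (p is effectively 0), so job and marital status are related. Married clients are the largest group in almost every job category, often around 70% of the row, which makes them the main pool of potential term deposit subscribers whatever their job. Married students and married unemployed clients are among the smallest groups (about 0.1% and 1.5% of all clients), which is reasonable.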
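The figures in the job and marital status table can be checked by hand. The sketch below recomputes the chi-square contribution of one cell (admin. and divorced), the degrees of freedom, and the decision at the 0.05 level, using only numbers printed in the output above. The variable names are illustrative:

```r
# Counts taken from the job x marital table above
observed  <- 1280    # admin. and divorced
row_total <- 10422   # all admin. clients
col_total <- 4612    # all divorced clients
n         <- 41188   # total observations

# Expected count under independence
expected <- row_total * col_total / n

# Chi-square contribution of this single cell
contribution <- (observed - expected)^2 / expected

# Degrees of freedom: 12 job levels and 4 marital levels
dof <- (12 - 1) * (4 - 1)

# Critical value at the 0.05 level, compared with the reported statistic
critical <- qchisq(0.95, dof)
reject   <- 4197.469 > critical

print(round(contribution, 3))
print(dof)
print(reject)
```

The contribution agrees with the 10.942 printed for that cell and the degrees of freedom with the reported `d.f. = 33`.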
### Education and marital status (at the 0.05 significance level)
CrossTable(bank2$education, bank2$marital, chisq = TRUE)
## Warning in chisq.test(t, correct = FALSE, ...): Chi-squared approximation
## may be incorrect
##
##
## Cell Contents
## |-------------------------|
## | N |
## | Chi-square contribution |
## | N / Row Total |
## | N / Col Total |
## | N / Table Total |
## |-------------------------|
##
##
## Total Observations in Table: 41188
##
##
## | bank2$marital
## bank2$education | divorced | married | single | unknown | Row Total |
## --------------------|-----------|-----------|-----------|-----------|-----------|
## basic.4y | 489 | 3228 | 453 | 6 | 4176 |
## | 0.979 | 194.196 | 441.829 | 0.549 | |
## | 0.117 | 0.773 | 0.108 | 0.001 | 0.101 |
## | 0.106 | 0.129 | 0.039 | 0.075 | |
## | 0.012 | 0.078 | 0.011 | 0.000 | |
## --------------------|-----------|-----------|-----------|-----------|-----------|
## basic.6y | 182 | 1767 | 337 | 6 | 2292 |
## | 21.711 | 104.000 | 146.152 | 0.538 | |
## | 0.079 | 0.771 | 0.147 | 0.003 | 0.056 |
## | 0.039 | 0.071 | 0.029 | 0.075 | |
## | 0.004 | 0.043 | 0.008 | 0.000 | |
## --------------------|-----------|-----------|-----------|-----------|-----------|
## basic.9y | 565 | 4156 | 1316 | 8 | 6045 |
## | 18.494 | 67.628 | 85.855 | 1.192 | |
## | 0.093 | 0.688 | 0.218 | 0.001 | 0.147 |
## | 0.123 | 0.167 | 0.114 | 0.100 | |
## | 0.014 | 0.101 | 0.032 | 0.000 | |
## --------------------|-----------|-----------|-----------|-----------|-----------|
## high.school | 1193 | 5158 | 3150 | 14 | 9515 |
## | 15.273 | 62.663 | 85.367 | 1.087 | |
## | 0.125 | 0.542 | 0.331 | 0.001 | 0.231 |
## | 0.259 | 0.207 | 0.272 | 0.175 | |
## | 0.029 | 0.125 | 0.076 | 0.000 | |
## --------------------|-----------|-----------|-----------|-----------|-----------|
## illiterate | 2 | 15 | 1 | 0 | 18 |
## | 0.000 | 1.548 | 3.253 | 0.035 | |
## | 0.111 | 0.833 | 0.056 | 0.000 | 0.000 |
## | 0.000 | 0.001 | 0.000 | 0.000 | |
## | 0.000 | 0.000 | 0.000 | 0.000 | |
## --------------------|-----------|-----------|-----------|-----------|-----------|
## professional.course | 657 | 3156 | 1424 | 6 | 5243 |
## | 8.327 | 0.093 | 1.600 | 1.719 | |
## | 0.125 | 0.602 | 0.272 | 0.001 | 0.127 |
## | 0.142 | 0.127 | 0.123 | 0.075 | |
## | 0.016 | 0.077 | 0.035 | 0.000 | |
## --------------------|-----------|-----------|-----------|-----------|-----------|
## university.degree | 1337 | 6394 | 4406 | 31 | 12168 |
## | 0.477 | 127.863 | 285.929 | 2.296 | |
## | 0.110 | 0.525 | 0.362 | 0.003 | 0.295 |
## | 0.290 | 0.256 | 0.381 | 0.388 | |
## | 0.032 | 0.155 | 0.107 | 0.001 | |
## --------------------|-----------|-----------|-----------|-----------|-----------|
## unknown | 187 | 1054 | 481 | 9 | 1731 |
## | 0.241 | 0.039 | 0.055 | 9.454 | |
## | 0.108 | 0.609 | 0.278 | 0.005 | 0.042 |
## | 0.041 | 0.042 | 0.042 | 0.113 | |
## | 0.005 | 0.026 | 0.012 | 0.000 | |
## --------------------|-----------|-----------|-----------|-----------|-----------|
## Column Total | 4612 | 24928 | 11568 | 80 | 41188 |
## | 0.112 | 0.605 | 0.281 | 0.002 | |
## --------------------|-----------|-----------|-----------|-----------|-----------|
##
##
## Statistics for All Table Factors
##
##
## Pearson's Chi-squared test
## ------------------------------------------------------------
## Chi^2 = 1690.44 d.f. = 21 p = 0
##
##
##
# Again, married clients are the largest group at every education level, and the test rejects independence at the 0.05 level (p is effectively 0). One detail is worth mentioning: the married share among clients with a professional course (about 60%) is lower than among clients with only basic education (about 69% to 77%).
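Both tests print the warning "Chi-squared approximation may be incorrect", which R issues when some expected cell counts are small. As a sketch of how to check this, the education and marital status counts from the table above are re-entered by hand, the expected counts are inspected, and the p-value is recomputed by Monte Carlo simulation with `chisq.test(..., simulate.p.value = TRUE)`. The object names, the seed, and the number of replicates `B` are arbitrary choices:

```r
# Counts copied from the education x marital table above
edu_marital <- matrix(
  c( 489, 3228,  453,  6,
     182, 1767,  337,  6,
     565, 4156, 1316,  8,
    1193, 5158, 3150, 14,
       2,   15,    1,  0,
     657, 3156, 1424,  6,
    1337, 6394, 4406, 31,
     187, 1054,  481,  9),
  ncol = 4, byrow = TRUE,
  dimnames = list(
    education = c("basic.4y", "basic.6y", "basic.9y", "high.school",
                  "illiterate", "professional.course",
                  "university.degree", "unknown"),
    marital   = c("divorced", "married", "single", "unknown")
  )
)

# Asymptotic test, as run by CrossTable(); the warning is expected here
asymptotic <- suppressWarnings(chisq.test(edu_marital, correct = FALSE))

# Number of cells whose expected count is below 5
small_cells <- sum(asymptotic$expected < 5)

# Monte Carlo p-value, which does not rely on the large-sample approximation
set.seed(1)
simulated <- chisq.test(edu_marital, simulate.p.value = TRUE, B = 2000)

print(asymptotic$statistic)
print(small_cells)
print(simulated$p.value)
```

If the simulated p-value is also below 0.05, the conclusion does not hinge on the sparse "unknown" and "illiterate" cells.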
## CP nsplit rel error xerror xstd
## 1 0.07424569 0 1.0000000 1.0000000 0.01382889
## 2 0.02295259 2 0.8515086 0.8577586 0.01292279
## 3 0.01594828 4 0.8056034 0.7987069 0.01251586
## 4 0.01000000 6 0.7737069 0.7818966 0.01239633
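The table above has the layout of a complexity-parameter table for a decision tree (columns `CP`, `nsplit`, `rel error`, `xerror`, `xstd`), although the code that produced it is not shown. Assuming that reading is right, a common way to use such a table is to pick the row with the lowest cross-validated error `xerror`, or the smallest tree within one standard error of it. The sketch below re-enters the printed values by hand; `cp_table`, `best`, and `one_se` are illustrative names:

```r
# Values copied from the table above
cp_table <- data.frame(
  CP        = c(0.07424569, 0.02295259, 0.01594828, 0.01000000),
  nsplit    = c(0, 2, 4, 6),
  rel_error = c(1.0000000, 0.8515086, 0.8056034, 0.7737069),
  xerror    = c(1.0000000, 0.8577586, 0.7987069, 0.7818966),
  xstd      = c(0.01382889, 0.01292279, 0.01251586, 0.01239633)
)

# Row with the lowest cross-validated error
best <- cp_table[which.min(cp_table$xerror), ]

# One-standard-error rule: the first (smallest) tree whose xerror
# is within one xstd of the minimum
threshold <- best$xerror + best$xstd
one_se    <- cp_table[cp_table$xerror <= threshold, ][1, ]

print(best)
print(one_se)
```

With these values both rules select the tree with 6 splits, since the 4-split tree's `xerror` lies just above the one-standard-error threshold.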
Based on our analysis, we reach two conclusions. First, the target customers are married clients, clients with a university degree, and clients in admin. jobs; these groups have the highest likelihood of subscribing to the term deposit. Second, the model suggests that the best follow-up is a call with a duration of about 250 seconds, made about 17 days after the last contact.