Bank marketing has become a vital tool for survival in a market that changes constantly. Repeat business from existing relationships drives the growth of bank marketing, and the most successful banks are those with strong client relationships (Subhani et al., 2021).
The data relate to direct marketing campaigns of a Portuguese banking institution. The marketing campaigns were based on phone calls. Often, more than one contact with the same client was required in order to assess whether the product (a bank term deposit) would be subscribed (‘yes’) or not (‘no’).
The classification goal is to predict whether the client will subscribe (yes/no) to a term deposit (variable y). In this report, we use the bank-full.csv file.
These days, marketing spending in the banking industry is enormous, which makes it essential for banks to optimize their marketing strategies and improve their effectiveness. Understanding customers’ needs leads to more effective marketing plans, smarter product design and greater customer satisfaction.
Data Columns
age : age of customer
job : type of job
marital : marital status (single, married, divorced)
education : education level (primary, secondary, tertiary)
default : has credit in default? (yes or no)
balance : average yearly balance
housing : has housing loan? (yes or no)
loan : has personal loan? (yes or no)
contact : contact communication type (categorical: ‘cellular’, ‘telephone’)
day : last contact day of the month
month : last contact month of year
duration : last contact duration, in seconds (will be converted into minutes)
campaign -> num_contact : number of contacts performed during this campaign (includes last contact)
pdays : number of days that passed after the client was last contacted in a previous campaign (numeric; -1 means the client was not previously contacted)
previous : number of contacts performed before this campaign
poutcome : outcome of the previous marketing campaign
y -> subscribed : has the client subscribed to a term deposit?
First, we read the .csv file and keep only the relevant columns, which are then saved in a new data frame called marketing.
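The read-and-select step described above is not shown in the report, so the following is only a minimal sketch of one way to do it. It assumes the file is semicolon-separated with a header row; the temporary file, its toy rows and the `relevant_cols` vector are illustrative stand-ins, not the author's actual code.

```r
# Toy stand-in for bank-full.csv: a few illustrative rows written to a temp file.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "age;job;balance;duration;campaign;pdays;y",
  "58;management;2143;261;1;-1;no",
  "44;technician;29;151;1;-1;no",
  "33;entrepreneur;2;76;2;120;yes"
), csv_path)

# Assumption: the real file also uses ';' as its field separator.
bank <- read.csv(csv_path, sep = ";", stringsAsFactors = FALSE)

# Keep only the columns of interest (hypothetical selection for illustration).
relevant_cols <- c("age", "job", "balance", "duration", "campaign", "pdays", "y")
marketing <- bank[, relevant_cols]

str(marketing)
```

With the real file, `csv_path` would simply point at bank-full.csv and `relevant_cols` would list the 17 columns described above.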
library(DT)
datatable(marketing, rownames = FALSE, filter = "top", options = list(pageLength = 5, scrollX = TRUE))
## Warning in instance$preRenderHook(instance): It seems your data is too big for
## client-side DataTables. You may consider server-side processing:
## https://rstudio.github.io/DT/server.html
Load the dplyr library to make the column names more intuitive with the rename() function, manipulate character data, and check the data structure with the glimpse() function from the dplyr package. Rename the y column to subscribed and the campaign column to num_contact.
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
## Rows: 45,211
## Columns: 17
## $ age <int> 58, 44, 33, 47, 33, 35, 28, 42, 58, 43, 41, 29, 53, 58, 57…
## $ job <chr> "management", "technician", "entrepreneur", "blue-collar",…
## $ marital <chr> "married", "single", "married", "married", "single", "marr…
## $ education <chr> "tertiary", "secondary", "secondary", "unknown", "unknown"…
## $ default <chr> "no", "no", "no", "no", "no", "no", "no", "yes", "no", "no…
## $ balance <int> 2143, 29, 2, 1506, 1, 231, 447, 2, 121, 593, 270, 390, 6, …
## $ housing <chr> "yes", "yes", "yes", "yes", "no", "yes", "yes", "yes", "ye…
## $ loan <chr> "no", "no", "yes", "no", "no", "no", "yes", "no", "no", "n…
## $ contact <chr> "unknown", "unknown", "unknown", "unknown", "unknown", "un…
## $ day <int> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5…
## $ month <chr> "may", "may", "may", "may", "may", "may", "may", "may", "m…
## $ duration <int> 261, 151, 76, 92, 198, 139, 217, 380, 50, 55, 222, 137, 51…
## $ num_contact <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ pdays <int> -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1…
## $ previous <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ poutcome <chr> "unknown", "unknown", "unknown", "unknown", "unknown", "un…
## $ subscribed <chr> "no", "no", "no", "no", "no", "no", "no", "no", "no", "no"…
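The code that produced the renamed columns is not included in the report. A minimal sketch of the renaming step, assuming dplyr is installed and using a small made-up data frame in place of the real data, might look like this:

```r
library(dplyr)

# Toy data with the original column names (values are illustrative only).
marketing <- data.frame(
  age      = c(58L, 44L, 33L),
  campaign = c(1L, 1L, 2L),
  y        = c("no", "no", "yes"),
  stringsAsFactors = FALSE
)

# rename() takes new_name = old_name pairs and keeps the column order.
marketing <- marketing %>%
  rename(subscribed  = y,
         num_contact = campaign)

# glimpse() prints one line per column with its type and first values.
glimpse(marketing)
```

Renaming early keeps every later summary and plot label readable without changing the underlying values.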
The marketing data frame has 45,211 rows and 17 columns. We need to convert several columns to the appropriate data type.
1. Change the data type. Convert to factor: job, marital, education, contact, poutcome, month, day, default, housing, loan, and subscribed.
marketing <- marketing %>%
mutate(job = as.factor(job),
marital = as.factor(marital),
education = as.factor(education),
contact = as.factor(contact),
poutcome = as.factor(poutcome),
month = as.factor(month),
day = as.factor(day),
default = as.factor(default),
housing = as.factor(housing),
loan = as.factor(loan),
subscribed = as.factor(subscribed)) %>%
glimpse()
## Rows: 45,211
## Columns: 17
## $ age <int> 58, 44, 33, 47, 33, 35, 28, 42, 58, 43, 41, 29, 53, 58, 57…
## $ job <fct> management, technician, entrepreneur, blue-collar, unknown…
## $ marital <fct> married, single, married, married, single, married, single…
## $ education <fct> tertiary, secondary, secondary, unknown, unknown, tertiary…
## $ default <fct> no, no, no, no, no, no, no, yes, no, no, no, no, no, no, n…
## $ balance <int> 2143, 29, 2, 1506, 1, 231, 447, 2, 121, 593, 270, 390, 6, …
## $ housing <fct> yes, yes, yes, yes, no, yes, yes, yes, yes, yes, yes, yes,…
## $ loan <fct> no, no, yes, no, no, no, yes, no, no, no, no, no, no, no, …
## $ contact <fct> unknown, unknown, unknown, unknown, unknown, unknown, unkn…
## $ day <fct> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5…
## $ month <fct> may, may, may, may, may, may, may, may, may, may, may, may…
## $ duration <int> 261, 151, 76, 92, 198, 139, 217, 380, 50, 55, 222, 137, 51…
## $ num_contact <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ pdays <int> -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1…
## $ previous <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ poutcome <fct> unknown, unknown, unknown, unknown, unknown, unknown, unkn…
## $ subscribed <fct> no, no, no, no, no, no, no, no, no, no, no, no, no, no, no…
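As an alternative to listing every column by hand, the same conversion can be sketched more compactly with `across()`. This is only a sketch under the assumption that every character column should become a factor; the toy data frame is illustrative, and `day` is handled separately because it is stored as an integer.

```r
library(dplyr)

# Toy data: character columns plus an integer day column (illustrative values).
marketing <- data.frame(
  age     = c(58L, 44L, 33L),
  job     = c("management", "technician", "entrepreneur"),
  marital = c("married", "single", "married"),
  day     = c(5L, 5L, 6L),
  stringsAsFactors = FALSE
)

marketing <- marketing %>%
  mutate(across(where(is.character), as.factor),  # every character column
         day = as.factor(day))                    # integer treated as a category

glimpse(marketing)
```

The explicit version in the report is easier to audit column by column; the `across()` version is shorter but converts any character column that happens to be present.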
2. Check for missing values
## [1] FALSE
## age job marital education default balance
## 0 0 0 0 0 0
## housing loan contact day month duration
## 0 0 0 0 0 0
## num_contact pdays previous poutcome subscribed
## 0 0 0 0 0
Great! There are no missing values.
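The output above is consistent with an overall NA check followed by a per-column NA count. A minimal sketch of such a check on a toy data frame (the data and variable names are illustrative) is shown below. It also counts the literal string "unknown", since categories recorded that way are not NA and would not be caught by the NA checks.

```r
# Toy data: no NA values, but one category recorded as "unknown".
marketing <- data.frame(
  age     = c(58L, 44L, 33L),
  job     = c("management", "technician", "unknown"),
  balance = c(2143L, 29L, 2L)
)

# TRUE if any cell in the data frame is NA.
any_missing <- anyNA(marketing)

# Number of NA values in each column.
missing_per_column <- colSums(is.na(marketing))

# Number of "unknown" entries in each column (these are not NA).
unknown_per_column <- sapply(marketing, function(col) sum(col == "unknown"))

print(any_missing)
print(missing_per_column)
print(unknown_per_column)
```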
1. Check the summary of the data frame
## age job marital education
## Min. :18.00 blue-collar:9732 divorced: 5207 primary : 6851
## 1st Qu.:33.00 management :9458 married :27214 secondary:23202
## Median :39.00 technician :7597 single :12790 tertiary :13301
## Mean :40.94 admin. :5171 unknown : 1857
## 3rd Qu.:48.00 services :4154
## Max. :95.00 retired :2264
## (Other) :6835
## default balance housing loan contact
## no :44396 Min. : -8019 no :20081 no :37967 cellular :29285
## yes: 815 1st Qu.: 72 yes:25130 yes: 7244 telephone: 2906
## Median : 448 unknown :13020
## Mean : 1362
## 3rd Qu.: 1428
## Max. :102127
##
## day month duration num_contact
## 20 : 2752 may :13766 Min. : 0.0 Min. : 1.000
## 18 : 2308 jul : 6895 1st Qu.: 103.0 1st Qu.: 1.000
## 21 : 2026 aug : 6247 Median : 180.0 Median : 2.000
## 17 : 1939 jun : 5341 Mean : 258.2 Mean : 2.764
## 6 : 1932 nov : 3970 3rd Qu.: 319.0 3rd Qu.: 3.000
## 5 : 1910 apr : 2932 Max. :4918.0 Max. :63.000
## (Other):32344 (Other): 6060
## pdays previous poutcome subscribed
## Min. : -1.0 Min. : 0.0000 failure: 4901 no :39922
## 1st Qu.: -1.0 1st Qu.: 0.0000 other : 1840 yes: 5289
## Median : -1.0 Median : 0.0000 success: 1511
## Mean : 40.2 Mean : 0.5803 unknown:36959
## 3rd Qu.: -1.0 3rd Qu.: 0.0000
## Max. :871.0 Max. :275.0000
##
2. Check for possible outliers or uninformative data.
-The contact column contains many “unknown” values, which makes it not very informative, so it can be dropped and the result saved as marketing_new.
-The duration column is easier to analyze in minutes, so it is converted from seconds to minutes.
-The day and month columns have many distinct values, so these two columns are used only as additional information.
-In the pdays column, a value of -1 indicates that the client was not called in the previous campaign, so a separate data frame can be created to study the previous campaign.
-Because some columns contain outliers, we prefer the median to the mean when summarizing the data.
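The two cleaning steps in the list above (dropping contact and converting duration to minutes) can be sketched as follows. The toy values are illustrative, and rounding to two decimals is an assumption suggested by the durations shown in the later summaries rather than something the report states.

```r
# Toy data: duration is in seconds, contact is mostly "unknown".
marketing <- data.frame(
  age      = c(58L, 44L, 33L),
  contact  = c("unknown", "unknown", "cellular"),
  duration = c(261L, 151L, 76L)
)

# Drop the uninformative contact column.
marketing_new <- marketing[, names(marketing) != "contact"]

# Convert seconds to minutes (rounding to 2 decimals is an assumption).
marketing_new$duration <- round(marketing_new$duration / 60, 2)

# Median and mean side by side; the median is less sensitive to outliers.
summary(marketing_new$duration)
```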
Marketing Data Summary:
1. The youngest customer is 18 years old and the oldest is 95 years old; the median age is 39.
2. The most common job among the bank’s customers is blue-collar work.
3. Most of the bank’s customers are married.
4. Most of the bank’s customers have a secondary-level education.
5. Most of the bank’s customers have no credit in default.
6. The lowest average yearly balance is -8019, the highest is 102127, and the median is 448.
7. Most of the bank’s customers have a housing loan (55.58%) and don’t have a personal loan (83.98%).
8. During this campaign, customers were contacted a median of 2 times.
9. Most of the bank’s customers have not subscribed to a term deposit.
2. Columns explanation
Some columns have values that need to be interpreted first, because the source website does not explain the data in detail. The data set will be divided according to whether the client took part in the previous campaign, because some values would look like outliers if the two groups were mixed, even though they still carry information. For example, the pdays column has a value of -1 if the client was not called in the previous campaign.
-The data for clients who were called in the previous campaign will be saved in the “client_previous” data frame.
In the previous campaign, 8,257 clients were contacted.
## age job marital education default
## Min. :18.00 management :1826 divorced: 931 primary :1020 no :8200
## 1st Qu.:33.00 blue-collar:1617 married :4745 secondary:4254 yes: 57
## Median :38.00 technician :1342 single :2581 tertiary :2660
## Mean :40.95 admin. :1089 unknown : 323
## 3rd Qu.:48.00 services : 706
## Max. :93.00 retired : 488
## (Other) :1189
## balance housing loan day month
## Min. :-1884 no :3115 no :7134 18 : 505 may :2514
## 1st Qu.: 168 yes:5142 yes:1123 15 : 466 nov :1150
## Median : 602 17 : 455 apr :1118
## Mean : 1557 20 : 444 feb : 924
## 3rd Qu.: 1743 13 : 430 aug : 531
## Max. :81204 12 : 400 jan : 498
## (Other):5557 (Other):1522
## duration num_contact pdays previous
## Min. : 0.020 Min. : 1.000 Min. : 1.0 Min. : 1.000
## 1st Qu.: 1.880 1st Qu.: 1.000 1st Qu.:133.0 1st Qu.: 1.000
## Median : 3.220 Median : 2.000 Median :194.0 Median : 2.000
## Mean : 4.335 Mean : 2.056 Mean :224.6 Mean : 3.178
## 3rd Qu.: 5.400 3rd Qu.: 2.000 3rd Qu.:327.0 3rd Qu.: 4.000
## Max. :36.980 Max. :16.000 Max. :871.0 Max. :275.000
##
## poutcome subscribed
## failure:4901 no :6352
## other :1840 yes:1905
## success:1511
## unknown: 5
##
##
##
Based on the summary of the client_previous data frame:
- The median last contact duration is 3.22 minutes, the minimum is 0.02 minutes and the maximum is 36.98 minutes.
- The median number of contacts performed before this campaign, which we take to have happened during the previous campaign, is 2.
- The poutcome categories (outcome of the previous campaign) include not only success and failure but also other and unknown, so we need to explore how other and unknown differ in this data.
- In the previous campaign, only 1,511 of the 8,257 clients gave a success outcome, which is only 18.3%. We need to explore further to find out what happened and use it as a reference for the current campaign.
- Of the 8,257 clients called in the previous campaign, only 1,905 subscribed to the term deposit. We then need to explore how many loyal clients there are (clients with a success outcome and yes in the subscribed column).
-Explore the difference between the “other” and “unknown” values in the poutcome column.
Subsetting the data on client_previous$poutcome == "unknown" and client_previous$poutcome == "other" shows no specific pattern. A plausible conclusion is that the client had not yet made a decision while the previous campaign was ongoing, so “other” or “unknown” was recorded in the poutcome column.
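The subsetting mentioned above is not shown in the report. One way to look for a pattern is to cross-tabulate the two ambiguous poutcome values against the subscription result; the sketch below does this on a made-up data frame, so the counts it prints say nothing about the real data.

```r
# Toy stand-in for client_previous (illustrative values only).
client_previous <- data.frame(
  poutcome   = factor(c("other", "unknown", "other", "failure", "success")),
  subscribed = factor(c("no", "yes", "yes", "no", "yes"))
)

# Keep only the rows whose previous outcome is "other" or "unknown".
undecided <- client_previous[client_previous$poutcome %in% c("other", "unknown"), ]

# Cross-tabulate the remaining outcome values against the subscription result.
outcome_table <- table(droplevels(undecided$poutcome), undecided$subscribed)
print(outcome_table)
```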
-The data for clients who were not called in the previous campaign will be saved in the client_new data frame.
## age job marital education
## Min. :18.00 blue-collar:8115 divorced: 4276 primary : 5831
## 1st Qu.:33.00 management :7632 married :22469 secondary:18948
## Median :39.00 technician :6255 single :10209 tertiary :10641
## Mean :40.93 admin. :4082 unknown : 1534
## 3rd Qu.:49.00 services :3448
## Max. :95.00 retired :1776
## (Other) :5646
## default balance housing loan day
## no :36196 Min. : -8019 no :16966 no :30833 20 : 2308
## yes: 758 1st Qu.: 55 yes:19988 yes: 6121 18 : 1803
## Median : 414 21 : 1738
## Mean : 1319 28 : 1653
## 3rd Qu.: 1358 6 : 1577
## Max. :102127 5 : 1545
## (Other):26330
## month duration num_contact pdays previous
## may :11252 Min. : 0.000 Min. : 1.000 Min. :-1 Min. :0
## jul : 6641 1st Qu.: 1.680 1st Qu.: 1.000 1st Qu.:-1 1st Qu.:0
## aug : 5716 Median : 2.950 Median : 2.000 Median :-1 Median :0
## jun : 5019 Mean : 4.295 Mean : 2.922 Mean :-1 Mean :0
## nov : 2820 3rd Qu.: 5.300 3rd Qu.: 3.000 3rd Qu.:-1 3rd Qu.:0
## apr : 1814 Max. :81.970 Max. :63.000 Max. :-1 Max. :0
## (Other): 3692
## poutcome subscribed
## failure: 0 no :33570
## other : 0 yes: 3384
## success: 0
## unknown:36954
##
##
##
Based on the summary of the client_new data frame:
- The median last contact duration is 2.95 minutes (177 seconds), the minimum is 0 minutes and the maximum is 81.97 minutes.
- Clients were contacted by telemarketing a median of 2 times during the current campaign.
- Only 3,384 of 36,954 clients (9.16%) agreed to subscribe to the term deposit, a lower rate than the 18.3% success rate of the previous campaign. We need to explore further what influences clients to finally agree to participate in a campaign.
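The split into client_previous and client_new described above can be sketched as follows. The report does not show the filter it used; splitting on pdays (with -1 meaning no previous contact) is an assumption based on the column description, and the data are toy values.

```r
# Toy stand-in for marketing_new (illustrative values only).
marketing_new <- data.frame(
  age        = c(58L, 44L, 33L, 47L),
  pdays      = c(-1L, 120L, -1L, 194L),
  previous   = c(0L, 2L, 0L, 1L),
  poutcome   = c("unknown", "success", "unknown", "failure"),
  subscribed = c("no", "yes", "no", "no")
)

# Assumption: pdays == -1 marks clients not contacted in a previous campaign.
client_previous <- marketing_new[marketing_new$pdays != -1, ]
client_new      <- marketing_new[marketing_new$pdays == -1, ]

# The two subsets should partition the data with no rows lost.
partition_ok <- nrow(client_previous) + nrow(client_new) == nrow(marketing_new)

# Share of previously contacted clients whose earlier outcome was "success".
previous_success_rate <- mean(client_previous$poutcome == "success")
print(previous_success_rate)
```

Checking that the two row counts add up to the original total is a cheap guard against a filter that silently drops rows.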
How many loyal clients are there (clients with success in poutcome and yes in the subscribed column)?
loyal_clients <- client_previous[client_previous$poutcome == "success" & client_previous$subscribed == "yes", ]
loyal_clients
There are 978 loyal clients, which is 11.8% of the 8,257 clients in client_previous and only 2.16% of all clients in this data set. Next, let’s explore the demographics of the loyal clients.
## age job marital education default
## Min. :18.00 management :266 divorced: 93 primary : 81 no :978
## 1st Qu.:32.00 technician :138 married :547 secondary:433 yes: 0
## Median :40.00 retired :125 single :338 tertiary :409
## Mean :43.56 admin. :122 unknown : 55
## 3rd Qu.:54.00 blue-collar: 85
## Max. :93.00 student : 62
## (Other) :180
## balance housing loan day month
## Min. : -309.0 no :729 no :934 12 : 65 aug :132
## 1st Qu.: 272.2 yes:249 yes: 44 13 : 60 may :120
## Median : 872.0 4 : 58 sep :102
## Mean : 2005.9 9 : 50 feb : 99
## 3rd Qu.: 2240.0 11 : 48 oct : 95
## Max. :81204.0 15 : 46 apr : 80
## (Other):651 (Other):350
## duration num_contact pdays previous
## Min. : 0.420 Min. : 1.000 Min. : 1.0 Min. : 1.000
## 1st Qu.: 3.435 1st Qu.: 1.000 1st Qu.: 92.0 1st Qu.: 1.000
## Median : 4.800 Median : 1.000 Median :175.0 Median : 2.000
## Mean : 6.011 Mean : 1.716 Mean :161.5 Mean : 3.109
## 3rd Qu.: 7.270 3rd Qu.: 2.000 3rd Qu.:185.0 3rd Qu.: 4.000
## Max. :34.370 Max. :11.000 Max. :561.0 Max. :21.000
##
## poutcome subscribed
## failure: 0 no : 0
## other : 0 yes:978
## success:978
## unknown: 0
##
##
##
The most interesting thing in the above summary is that loyal clients have no credit defaults. Below we analyze the columns one by one.
-Age
The mean age of loyal clients is 43.6 years and the median is 40 years. The graph above shows that most loyal clients are between 25 and 60 years old.
-Job
##
## admin. blue-collar entrepreneur housemaid management
## 0.124744376 0.086912065 0.009202454 0.017382413 0.271983640
## retired self-employed services student technician
## 0.127811861 0.036809816 0.060327198 0.063394683 0.141104294
## unemployed unknown
## 0.051124744 0.009202454
Loyal clients hold a variety of jobs: 27.20% work in management, 14.11% are technicians, 12.78% are retired, 12.47% work in administration, and every other job accounts for less than 10%.
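The proportion tables in this section have the shape produced by `prop.table(table(...))`. A minimal sketch on a toy factor (the job values are illustrative) shows how the shares and their percentage form can be obtained:

```r
# Toy stand-in for loyal_clients$job (illustrative values only).
loyal_clients <- data.frame(
  job = factor(c("management", "management", "technician", "retired"))
)

# Counts per level, then divided by the total to give shares that sum to 1.
job_share <- prop.table(table(loyal_clients$job))

# Percentages sorted from most to least common, rounded for reporting.
job_percent <- round(100 * sort(job_share, decreasing = TRUE), 2)
print(job_percent)
```

Rounding only at the reporting step avoids truncation errors when a share is quoted as a percentage.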
-Marital Status
##
## divorced married single
## 0.09509202 0.55930470 0.34560327
Most loyal clients are married (55.93%), followed by single clients (34.56%). One possible explanation is that married clients are more concerned with long-term investments as additional income.
-Education
##
## primary secondary tertiary unknown
## 0.08282209 0.44274029 0.41820041 0.05623722
Most loyal clients have a secondary- or tertiary-level education.
-Balance
plot(loyal_clients$balance,
main = "Distribution of loyal client's balance",
col = "maroon")
boxplot(loyal_clients$balance)
Most loyal clients have a balance lower than 10000.
-Correlation of age and balance
Judging from the graph above, loyal clients of any age mostly have an average annual balance under 5000.
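The graph-based reading above could be backed by numbers. The sketch below, on made-up ages and balances, computes the Pearson correlation and the median balance per age band; the band boundaries are an arbitrary choice for illustration and are not taken from the report.

```r
# Toy stand-in for loyal_clients (illustrative values only).
loyal_clients <- data.frame(
  age     = c(25L, 34L, 41L, 52L, 63L, 70L),
  balance = c(300L, 1200L, 872L, 4100L, 2500L, 950L)
)

# Pearson correlation between age and balance.
age_balance_cor <- cor(loyal_clients$age, loyal_clients$balance)

# Hypothetical age bands: (17,30], (30,45], (45,60], (60,100].
age_band <- cut(loyal_clients$age,
                breaks = c(17, 30, 45, 60, 100),
                labels = c("18-30", "31-45", "46-60", "61+"))

# Median balance within each band, to check the claim band by band.
median_by_band <- tapply(loyal_clients$balance, age_band, median)
print(median_by_band)
```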
-Loan
##
## no yes
## 0.7453988 0.2546012
##
## no yes
## 0.95501022 0.04498978
Most of the Portuguese bank’s loyal clients do not have any loan: only 25.46% have a housing loan and only 4.50% have a personal loan.
-Correlation of duration and number of contact on “success” poutcome
Loyal clients tend to accept the campaign when they are called fewer than 5 times and the call lasts less than 10 minutes.
Summary of loyal client demographics:
-Most loyal clients are between 25 and 60 years old.
-Loyal clients hold a variety of jobs.
-Most loyal clients have a secondary- or tertiary-level education.
-Loyal clients, at any age, mostly have an average annual balance under 10000.
-Most loyal clients don’t have any loan.
-Loyal clients are mostly contacted only once and rarely more than 10 times.
-Loyal clients prefer short calls of less than 10 minutes.
Next we explore what happened in the current campaign, especially the clients who rejected the term deposit. First, create a new data frame, campaign_no, containing all rows with no in the subscribed column.
The data set contains 45,211 clients in total, and 39,922 of them (88.3%) rejected the term deposit. We then explore the demographics of the campaign_no data frame.
## age job marital education
## Min. :18.00 blue-collar:9024 divorced: 4585 primary : 6260
## 1st Qu.:33.00 management :8157 married :24459 secondary:20752
## Median :39.00 technician :6757 single :10878 tertiary :11305
## Mean :40.84 admin. :4540 unknown : 1605
## 3rd Qu.:48.00 services :3785
## Max. :95.00 retired :1748
## (Other) :5911
## default balance housing loan day
## no :39159 Min. : -8019 no :16727 no :33162 20 : 2560
## yes: 763 1st Qu.: 58 yes:23195 yes: 6760 18 : 2080
## Median : 417 21 : 1825
## Mean : 1304 17 : 1763
## 3rd Qu.: 1345 6 : 1751
## Max. :102127 5 : 1695
## (Other):28248
## month duration num_contact pdays
## may :12841 Min. : 0.000 Min. : 1.000 Min. : -1.00
## jul : 6268 1st Qu.: 1.580 1st Qu.: 1.000 1st Qu.: -1.00
## aug : 5559 Median : 2.730 Median : 2.000 Median : -1.00
## jun : 4795 Mean : 3.686 Mean : 2.846 Mean : 36.42
## nov : 3567 3rd Qu.: 4.650 3rd Qu.: 3.000 3rd Qu.: -1.00
## apr : 2355 Max. :81.970 Max. :63.000 Max. :871.00
## (Other): 4537
## previous poutcome subscribed
## Min. : 0.0000 failure: 4283 no :39922
## 1st Qu.: 0.0000 other : 1533 yes: 0
## Median : 0.0000 success: 533
## Mean : 0.5021 unknown:33573
## 3rd Qu.: 0.0000
## Max. :275.0000
##
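The campaign_no subset summarized above contains only rows with subscribed equal to "no". A minimal sketch of building such a subset and computing the rejection share is given below; the data frame is a toy example, while the last line repeats the same arithmetic with the counts reported in the text.

```r
# Toy stand-in for marketing_new (illustrative values only).
marketing_new <- data.frame(
  age        = c(58L, 44L, 33L, 47L, 35L),
  subscribed = factor(c("no", "no", "yes", "no", "no"))
)

# Keep only the clients who did not subscribe.
campaign_no <- marketing_new[marketing_new$subscribed == "no", ]

# Rejection share in percent for the toy data.
rejection_rate <- 100 * nrow(campaign_no) / nrow(marketing_new)

# Same arithmetic with the counts reported in the text (39,922 of 45,211).
reported_rate <- round(100 * 39922 / 45211, 1)
print(c(toy = rejection_rate, reported = reported_rate))
```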
-Age
Most clients who rejected the term deposit are around 25-50 years old, that is, of productive age. The number of retired clients is very small.
-Job
##
## admin. blue-collar entrepreneur housemaid management
## 0.113721757 0.226040780 0.034166625 0.028330244 0.204323431
## retired self-employed services student technician
## 0.043785381 0.034867993 0.094809879 0.016757677 0.169255047
## unemployed unknown
## 0.027578779 0.006362407
Most clients who rejected the term deposit are blue-collar workers, managers, technicians, or administrative staff.
-Marital Status
##
## divorced married single
## 0.1148490 0.6126697 0.2724813
Most clients who rejected the term deposit (failure) are married.
-Education
##
## primary secondary tertiary unknown
## 0.1568058 0.5198136 0.2831772 0.0402034
Most clients who rejected the term deposit (failure) have a secondary-level education.
-Balance
plot(campaign_no$balance,
main = "Distribution of failure client's balance",
col = "maroon")
plot(x = campaign_no$age, y = campaign_no$balance, main = "Correlation of age and balance", col="maroon")
The yearly average balance of most clients who rejected the term deposit (failure) is below 20000.
Correlation of age and balance: balances start to increase from around 25 years old and start to decrease at around 60 years old. Clients with a high yearly balance are mostly 30-60 years old.
-Loan
##
## no yes
## 0.418992 0.581008
##
## no yes
## 0.8306698 0.1693302
Most clients who rejected the term deposit (failure) have a housing loan (58%) and don’t have a personal loan (83%).
-Correlation of duration and number of contact
plot(x = campaign_no$duration, y = campaign_no$num_contact, main= "Correlation of duration and number of contact", col ="maroon")
plot(x = campaign_no$balance, y = campaign_no$num_contact, main = "Correlation of balance and number of contact", col = "maroon")
The shorter the call, the more contacts tend to be made, but is it polite to make more than 5 calls for the same campaign? There are 5,727 clients who were called more than 5 times for this campaign. Clients are likely to become increasingly annoyed the more often they are called. The loyal clients data shows that they were contacted 1-2 times on average and that they prefer short calls. Furthermore, the telemarketing team called clients with a low average balance, which seems like wasted effort. The marketing team should therefore give clients a trigger before calling them.
## [1] 5727
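The count printed above is presumably the number of rows in campaign_no with more than 5 contacts. A minimal sketch of that count on made-up data follows, together with a Spearman rank correlation as one hedged way to quantify the "shorter calls, more contacts" observation instead of reading it off the scatter plot.

```r
# Toy stand-in for campaign_no (illustrative values; duration in minutes).
campaign_no <- data.frame(
  num_contact = c(1L, 2L, 6L, 9L, 3L, 12L),
  duration    = c(4.35, 2.52, 0.80, 0.35, 3.10, 0.20)
)

# Number and share of clients called more than 5 times.
n_over_5     <- sum(campaign_no$num_contact > 5)
share_over_5 <- 100 * mean(campaign_no$num_contact > 5)

# Rank correlation: negative values mean shorter calls go with more contacts.
duration_contact_rho <- cor(campaign_no$duration, campaign_no$num_contact,
                            method = "spearman")
print(c(n_over_5 = n_over_5, share_over_5 = share_over_5))
print(duration_contact_rho)
```

A rank correlation is used here because both variables are heavily right-skewed in the summaries, which makes a Pearson coefficient sensitive to a few extreme calls.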
In the current campaign, the percentage of clients who decide to subscribe to a term deposit is still low. An evaluation is therefore needed of the target client segmentation and of the other programs that support this campaign. Here are some recommendations based on this telemarketing campaign data.
-The marketing team should dig deeper into the segments, preferences and demographics of loyal clients so that they are in line with the strategy carried out by the telemarketing team.
-The marketing team should first give clients a trigger for the campaign being run, for example in the form of advertising, broadcast messages or social media content.
-In addition, the marketing team can narrow the target market to clients whose loan period is almost over.