Bank Marketing Exploratory Data Analysis and Visualization

Berliana Harahap

2023-10-20


Data Background

The data relate to direct marketing campaigns (phone calls) of a Portuguese banking institution. The classification goal is to predict whether a client will subscribe to a term deposit (variable y).

Load Libraries and Packages

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(ggplot2)
library(leaflet)
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout

Data Input

bank <- read.csv("bank.csv", header = TRUE, sep = ";")
head(bank)
##   age         job marital education default balance housing loan  contact day
## 1  30  unemployed married   primary      no    1787      no   no cellular  19
## 2  33    services married secondary      no    4789     yes  yes cellular  11
## 3  35  management  single  tertiary      no    1350     yes   no cellular  16
## 4  30  management married  tertiary      no    1476     yes  yes  unknown   3
## 5  59 blue-collar married secondary      no       0     yes   no  unknown   5
## 6  35  management  single  tertiary      no     747      no   no cellular  23
##   month duration campaign pdays previous poutcome  y
## 1   oct       79        1    -1        0  unknown no
## 2   may      220        1   339        4  failure no
## 3   apr      185        1   330        1  failure no
## 4   jun      199        4    -1        0  unknown no
## 5   may      226        1    -1        0  unknown no
## 6   feb      141        2   176        3  failure no

Data Description

Bank client data:

- ‘age’ = age
- ‘job’ = type of job (categorical: “admin.”, “unknown”, “unemployed”, “management”, “housemaid”, “entrepreneur”, “student”, “blue-collar”, “self-employed”, “retired”, “technician”, “services”)
- ‘marital’ = marital status (categorical: “married”, “divorced”, “single”; note: “divorced” means divorced or widowed)
- ‘education’ = education (categorical: “unknown”, “secondary”, “primary”, “tertiary”)
- ‘default’ = has credit in default? (binary: “yes”, “no”)
- ‘balance’ = average yearly balance, in euros (numeric)
- ‘housing’ = has housing loan? (binary: “yes”, “no”)
- ‘loan’ = has personal loan? (binary: “yes”, “no”)

Related to the last contact of the current campaign:

- ‘contact’ = contact communication type (categorical: “unknown”, “telephone”, “cellular”)
- ‘day’ = last contact day of the month (numeric)
- ‘month’ = last contact month of year (categorical: “jan”, “feb”, “mar”, …, “nov”, “dec”)
- ‘duration’ = last contact duration, in seconds (numeric)

Other attributes:

- ‘campaign’ = number of contacts performed during this campaign and for this client (numeric, includes last contact)
- ‘pdays’ = number of days that passed after the client was last contacted in a previous campaign (numeric, -1 means the client was not previously contacted)
- ‘previous’ = number of contacts performed before this campaign and for this client (numeric)
- ‘poutcome’ = outcome of the previous marketing campaign (categorical: “unknown”, “other”, “failure”, “success”)

Output variable (desired target):

- ‘y’ = has the client subscribed to a term deposit? (binary: “yes”, “no”)

Data Inspection

head(bank)
##   age         job marital education default balance housing loan  contact day
## 1  30  unemployed married   primary      no    1787      no   no cellular  19
## 2  33    services married secondary      no    4789     yes  yes cellular  11
## 3  35  management  single  tertiary      no    1350     yes   no cellular  16
## 4  30  management married  tertiary      no    1476     yes  yes  unknown   3
## 5  59 blue-collar married secondary      no       0     yes   no  unknown   5
## 6  35  management  single  tertiary      no     747      no   no cellular  23
##   month duration campaign pdays previous poutcome  y
## 1   oct       79        1    -1        0  unknown no
## 2   may      220        1   339        4  failure no
## 3   apr      185        1   330        1  failure no
## 4   jun      199        4    -1        0  unknown no
## 5   may      226        1    -1        0  unknown no
## 6   feb      141        2   176        3  failure no

Data Cleansing & Coercions

glimpse(bank)
## Rows: 4,521
## Columns: 17
## $ age       <int> 30, 33, 35, 30, 59, 35, 36, 39, 41, 43, 39, 43, 36, 20, 31, …
## $ job       <chr> "unemployed", "services", "management", "management", "blue-…
## $ marital   <chr> "married", "married", "single", "married", "married", "singl…
## $ education <chr> "primary", "secondary", "tertiary", "tertiary", "secondary",…
## $ default   <chr> "no", "no", "no", "no", "no", "no", "no", "no", "no", "no", …
## $ balance   <int> 1787, 4789, 1350, 1476, 0, 747, 307, 147, 221, -88, 9374, 26…
## $ housing   <chr> "no", "yes", "yes", "yes", "yes", "no", "yes", "yes", "yes",…
## $ loan      <chr> "no", "yes", "no", "yes", "no", "no", "no", "no", "no", "yes…
## $ contact   <chr> "cellular", "cellular", "cellular", "unknown", "unknown", "c…
## $ day       <int> 19, 11, 16, 3, 5, 23, 14, 6, 14, 17, 20, 17, 13, 30, 29, 29,…
## $ month     <chr> "oct", "may", "apr", "jun", "may", "feb", "may", "may", "may…
## $ duration  <int> 79, 220, 185, 199, 226, 141, 341, 151, 57, 313, 273, 113, 32…
## $ campaign  <int> 1, 1, 1, 4, 1, 2, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 5, 1, 1, 1, …
## $ pdays     <int> -1, 339, 330, -1, -1, 176, 330, -1, -1, 147, -1, -1, -1, -1,…
## $ previous  <int> 0, 4, 1, 0, 0, 3, 2, 0, 0, 2, 0, 0, 0, 0, 1, 0, 0, 2, 0, 1, …
## $ poutcome  <chr> "unknown", "failure", "failure", "unknown", "unknown", "fail…
## $ y         <chr> "no", "no", "no", "no", "no", "no", "no", "no", "no", "no", …
anyNA(bank)
## [1] FALSE
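
anyNA() only detects R's NA values. The data description lists “unknown” as a category for several columns, so it may be worth counting those entries as well. The following is a minimal sketch on a small made-up data frame; `toy` and its values are hypothetical and only stand in for `bank`.

```r
# Toy stand-in for `bank` (values are made up for illustration)
toy <- data.frame(
  job      = c("unemployed", "unknown", "services"),
  contact  = c("cellular", "unknown", "unknown"),
  poutcome = c("unknown", "failure", "unknown"),
  stringsAsFactors = FALSE
)

# Comparing a data frame with a string gives a logical matrix;
# colSums() then counts the "unknown" entries in each column
unknown_counts <- colSums(toy == "unknown")
unknown_counts
```

Applied to the real data, the same expression with `bank` in place of `toy` would show how much of each column is effectively missing.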

Data Manipulation & Transformation

bank <- mutate(.data = bank,
               default = as.factor(default),
               housing = as.factor(housing),
               loan = as.factor(loan),
               y = as.factor(y))
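
The step above converts four columns by name. If the remaining character columns (such as job or marital) should also become factors, one possible approach is to convert every character column in one pass. This is a base R sketch on a hypothetical `toy` data frame, not part of the original analysis.

```r
# Toy stand-in for `bank` (values are made up for illustration)
toy <- data.frame(
  job     = c("services", "management"),
  marital = c("married", "single"),
  age     = c(33L, 35L),
  stringsAsFactors = FALSE
)

# Find the character columns, then convert only those to factors
is_chr <- vapply(toy, is.character, logical(1))
toy[is_chr] <- lapply(toy[is_chr], as.factor)

str(toy)
```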

Answering Business Questions: How can we predict whether a client will subscribe to a term deposit (variable y)?

Several important considerations can influence whether people will subscribe to a term deposit:

1. Demographic Factors

Demographic factors include age and gender. However, the data only contain age, so age was grouped into categories.

# Bin age into three non-overlapping groups
bank <- bank %>%
      mutate(age = case_when(
        between(age, 15, 39) ~ "Young Adults",
        between(age, 40, 59) ~ "Middle Adulthood",
        between(age, 60, 90) ~ "Elderly"
      ))
bank_demographic <- bank %>%
  select(age, y)
# Dodged bars of client counts per age group; each bar is labelled with its y value ("yes"/"no")
plot1 <- ggplot(bank_demographic, aes(x = age, fill = y)) +
  geom_bar(position = "dodge") +
  geom_text(stat = "count", aes(label = y), position = position_dodge(0.9), vjust = -0.5) +
  labs(title = "Term Deposit based on Age Distribution", x = "Age Distribution", y = "Number of People") +
  theme_minimal()+
  scale_fill_brewer(palette = "Set3") +
  theme(legend.position = "none")
# tooltip = "x" restricts the hover text to the age group; the default "all" would also show the counts
ggplotly(plot1, tooltip = "x")

Younger individuals are often just starting their careers, may have more immediate financial needs, and may not have significant savings. As a result, they might be less inclined to commit money to a long-term investment like a term deposit and may prefer more accessible and flexible financial options. Older individuals, in the middle-aged or senior categories, tend to have more financial stability: they may have paid off mortgages, have more disposable income, and be planning for retirement or other long-term financial goals. They may therefore be more inclined to put their savings into term deposits, which are secure, low-risk, and offer higher interest rates than regular savings accounts.

The general expectation, then, is that term deposits are more attractive to older clients than to younger ones. However, the data show the opposite in absolute numbers: more young adults hold a term deposit than clients in the middle adulthood or elderly groups.
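
Raw counts depend on how many clients fall into each age group, so the share of subscribers within each group is a useful complement to the bar chart. The following is a minimal sketch on made-up data; `toy` is hypothetical and only mimics the age and y columns.

```r
# Toy stand-in for the age/y columns (values are made up for illustration)
toy <- data.frame(
  age = c("Young Adults", "Young Adults", "Young Adults", "Young Adults",
          "Middle Adulthood", "Middle Adulthood", "Elderly", "Elderly"),
  y   = c("no", "no", "no", "yes", "no", "yes", "yes", "no")
)

# Cross-tabulate age group against the outcome
counts <- table(toy$age, toy$y)

# margin = 1 normalises each row, giving the share of "no"/"yes" within each age group
rates <- prop.table(counts, margin = 1)
round(rates[, "yes"], 2)
```

In this toy example the young adult group has the most clients but the lowest subscription rate, which is the kind of difference a count-only plot cannot show.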

2. Economic and Financial Factors

bank_individual <- bank %>%
  select(job, y)
bank_individual0 <- bank %>%
  select(education, y)
bank_individual1 <- bank %>%
  select(marital, y)
bank_individual2 <- bank %>%
  select (balance, y)
bank_individual3 <- bank %>%
  select (housing, loan, y)
plot2 <- ggplot(bank_individual, aes(x = job, fill = y)) +
  geom_bar(position = "dodge") +
  geom_text(stat = "count", aes(label = y), position = position_dodge(0.9), vjust = -0.5) +
  labs(title = "Term Deposit based on Job", x = "Job", y = "Number of People") +
  theme_minimal() +
  scale_fill_brewer(palette = "Set3")+
  theme(legend.position = "none")
ggplotly(plot2, tooltip = "x")

Employed individuals with higher incomes might have more stable finances and be more likely to subscribe to a term deposit, which is what the data show.

plot3 <- ggplot(bank_individual0, aes(x = education, fill = y)) +
  geom_bar(position = "dodge") +
  geom_text(stat = "count", aes(label = y), position = position_dodge(0.9), vjust = -0.5) +
  labs(title = "Term Deposit based on Education", x = "Education", y = "Number of People") +
  theme_minimal() +
  scale_fill_brewer(palette = "Set3") +
  theme(legend.position = "none")
ggplotly(plot3, tooltip = "x")

Individuals with a higher level of education and financial literacy may be more aware of the benefits of term deposits and thus more likely to subscribe. The data illustrate this: most clients with a term deposit have secondary (245) or tertiary (193) education.

plot4 <- ggplot(bank_individual1, aes(x = marital, fill = y)) +
  geom_bar(position = "dodge") +
  geom_text(stat = "count", aes(label = y), position = position_dodge(0.9), vjust = -0.5) +
  labs(title = "Term Deposit based on Marital Status", x = "Marital Status", y = "Number of People") +
  theme_minimal()+
  scale_fill_brewer(palette = "Set3") +
  theme(legend.position = "none")
ggplotly(plot4, tooltip = "x")

Family and marital status can affect financial planning, with married or partnered individuals possibly more likely to invest in long-term savings. The data are consistent with this: married individuals hold the most term deposits (277), followed by single (167) and divorced individuals (77). Divorced individuals may be less able to make term deposits because of the financial separation from their former partners.

# Balance of each client, split by term deposit outcome (no fill aesthetic is mapped, so no fill scale is needed)
plot5 <- ggplot(data = bank_individual2, aes(x = y, y = balance, color = balance)) +
  geom_line() +
  geom_point() +
  labs(title = "Term Deposit based on Balance", x = "Term Deposit", y = "Balance") +
  theme_minimal() +
  theme(legend.position = "none")
ggplotly(plot5, tooltip = "x")

Individuals with higher incomes may have more disposable income to invest in term deposits. However, the data do not necessarily support this: clients with larger balances appear less likely to have a term deposit.
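
A point plot of balance is dominated by a few very large values, so a robust summary per group can make the comparison easier to read. The following is a minimal sketch on made-up data; `toy` is hypothetical and only mimics the balance and y columns.

```r
# Toy stand-in for the balance/y columns (values are made up for illustration)
toy <- data.frame(
  balance = c(0, 150, 900, 4000, 300, 1200, 2500),
  y       = c("no", "no", "no", "no", "yes", "yes", "yes")
)

# Median balance per outcome; the median is less sensitive to extreme balances than the mean
balance_summary <- aggregate(balance ~ y, data = toy, FUN = median)
balance_summary
```

Note that in this toy data the single largest balance belongs to a non-subscriber, yet the median balance is higher among subscribers; extremes and typical values can point in different directions.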

plot6 <- ggplot(data = bank_individual3, aes(x = loan, fill = housing)) +
  geom_bar(position = "dodge") +
  geom_text(stat = "count", aes(label = housing), position = position_dodge(0.9), vjust = -0.5) +
  labs(title = "Term Deposit based on Housing and Personal Loan",
       x = "Personal Loan",
       y = "Number of People") +
  scale_fill_brewer(palette = "Set3") +  
  theme_minimal()+
  theme(legend.position = "none")
ggplotly(plot6, tooltip = "x")

People who have had success with savings or investment accounts in the past may be more willing to subscribe to a term deposit. Note that this plot counts clients by housing and personal loan status and is not split by y: among clients with a housing loan, 2153 have no personal loan and 406 also have a personal loan.

4. Customer Interaction and Marketing

bank_duration <- bank %>% 
  select (duration, y)
# Last contact duration of each client, split by term deposit outcome (no fill aesthetic is mapped, so no fill scale is needed)
plot8 <- ggplot(data = bank_duration, aes(x = y, y = duration, color = duration)) +
  geom_line() +
  geom_point() +
  labs(title = "Term Deposit based on Last Contact Duration", x = "Term Deposit", y = "Last Contact Duration") +
  theme_minimal() +
  theme(legend.position = "none")
ggplotly(plot8, tooltip = "x")

The decision to take a term deposit can be influenced by the bank’s previous contact with the customer, and successful interactions affect that decision. Based on the data, the longer the last contact lasted, the higher the tendency for people to take a term deposit. However, the difference in contact duration between those with and without a term deposit does not appear large in this plot.
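
Whether a difference in contact duration is statistically significant cannot be read off a plot. One possible check, not part of the original analysis, is a Wilcoxon rank-sum test, which does not assume normally distributed durations. The sketch below uses a hypothetical `toy` data frame: the six "no" durations are the first rows shown by head(bank), and the four "yes" durations are made up.

```r
# Toy stand-in for the duration/y columns ("yes" values are made up for illustration)
toy <- data.frame(
  duration = c(79, 220, 185, 199, 226, 141, 410, 530, 365, 612),
  y        = factor(c("no", "no", "no", "no", "no", "no",
                      "yes", "yes", "yes", "yes"))
)

# Two-sided Wilcoxon rank-sum test of duration between the two outcome groups
test_result <- wilcox.test(duration ~ y, data = toy)
test_result$p.value
```

With the real data, the same call with `bank` in place of `toy` would give a p-value for the full sample.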

bank_campaign <- bank %>%
  select (poutcome, y) 
plot9 <- ggplot(data = bank_campaign, aes(x = poutcome, fill = y)) +
  geom_bar(position = "dodge") +
  geom_text(stat = "count", aes(label = y), position = position_dodge(0.9), vjust = -0.5) +
  labs(title = "Term Deposit based on Campaign Outcome",
       x = "Campaign Outcome",
       y = "Number of People") +
  scale_fill_brewer(palette = "Set3") +  
  theme_minimal() + 
  theme(legend.position = "none")
ggplotly(plot9, tooltip = "x")

The effectiveness of marketing efforts and promotional campaigns can also influence subscription rates. Based on the data, clients whose previous campaign outcome is unknown make up the largest group of term deposit holders (337), followed by those with a successful (83), failed (63), or other (38) outcome. This suggests that the outcome of the previous campaign alone cannot be used as a benchmark for the number of people who take term deposits.
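
The four counts quoted above can be turned into shares of all term deposit holders. This short sketch uses only those quoted numbers. It gives each outcome's share of subscribers, not the subscription rate within each outcome group, which would also need the group totals.

```r
# Term deposit holders by previous campaign outcome, as quoted in the text above
yes_counts <- c(unknown = 337, success = 83, failure = 63, other = 38)

# Share of all term deposit holders that falls into each outcome group, in percent
share <- yes_counts / sum(yes_counts)
round(100 * share, 1)
```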

Data Source

Moro, S., Rita, P., and Cortez, P. (2012). Bank Marketing. UCI Machine Learning Repository. https://doi.org/10.24432/C5K306.