Import Data

The data is related with direct marketing campaigns of a Portuguese banking institution. The marketing campaigns were based on phone calls.

bank <- read.csv("bank-full.csv",sep = ";")

Data Inspection

head(bank, 5)

dim(bank)

## [1] 45211    17

From our inspection we can conclude :

bank data contain 45211 rows and 17 columns
each columns names: “age”, “job”, “marital”, “education”, “default”, “balance”, “housing”, “loan”
“contact”, “day”, “month”, “duration”, “campaign”, “pdays”, “previous”, “poutcome”, “y”

Data Structure & NA Checking

str(bank)

## 'data.frame':    45211 obs. of  17 variables:
##  $ age      : int  58 44 33 47 33 35 28 42 58 43 ...
##  $ job      : chr  "management" "technician" "entrepreneur" "blue-collar" ...
##  $ marital  : chr  "married" "single" "married" "married" ...
##  $ education: chr  "tertiary" "secondary" "secondary" "unknown" ...
##  $ default  : chr  "no" "no" "no" "no" ...
##  $ balance  : int  2143 29 2 1506 1 231 447 2 121 593 ...
##  $ housing  : chr  "yes" "yes" "yes" "yes" ...
##  $ loan     : chr  "no" "no" "yes" "no" ...
##  $ contact  : chr  "unknown" "unknown" "unknown" "unknown" ...
##  $ day      : int  5 5 5 5 5 5 5 5 5 5 ...
##  $ month    : chr  "may" "may" "may" "may" ...
##  $ duration : int  261 151 76 92 198 139 217 380 50 55 ...
##  $ campaign : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ pdays    : int  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
##  $ previous : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ poutcome : chr  "unknown" "unknown" "unknown" "unknown" ...
##  $ y        : chr  "no" "no" "no" "no" ...

Based on STR, I want to change some columns into factor

bank$job <- as.factor(bank$job)
bank$marital <- as.factor(bank$marital)
bank$education <- as.factor(bank$education)
bank$default <- as.factor(bank$default)
bank$housing <- as.factor(bank$housing)
bank$loan <- as.factor(bank$loan)
bank$contact <- as.factor(bank$contact)
bank$month <- as.factor(bank$month)
bank$poutcome <- as.factor(bank$poutcome)
bank$y <- as.factor(bank$y)

str(bank)

## 'data.frame':    45211 obs. of  17 variables:
##  $ age      : int  58 44 33 47 33 35 28 42 58 43 ...
##  $ job      : Factor w/ 12 levels "admin.","blue-collar",..: 5 10 3 2 12 5 5 3 6 10 ...
##  $ marital  : Factor w/ 3 levels "divorced","married",..: 2 3 2 2 3 2 3 1 2 3 ...
##  $ education: Factor w/ 4 levels "primary","secondary",..: 3 2 2 4 4 3 3 3 1 2 ...
##  $ default  : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 2 1 1 ...
##  $ balance  : int  2143 29 2 1506 1 231 447 2 121 593 ...
##  $ housing  : Factor w/ 2 levels "no","yes": 2 2 2 2 1 2 2 2 2 2 ...
##  $ loan     : Factor w/ 2 levels "no","yes": 1 1 2 1 1 1 2 1 1 1 ...
##  $ contact  : Factor w/ 3 levels "cellular","telephone",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ day      : int  5 5 5 5 5 5 5 5 5 5 ...
##  $ month    : Factor w/ 12 levels "apr","aug","dec",..: 9 9 9 9 9 9 9 9 9 9 ...
##  $ duration : int  261 151 76 92 198 139 217 380 50 55 ...
##  $ campaign : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ pdays    : int  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
##  $ previous : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ poutcome : Factor w/ 4 levels "failure","other",..: 4 4 4 4 4 4 4 4 4 4 ...
##  $ y        : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...

After that, I want to check if there is any NA value in the data.

colSums(is.na(bank))

##       age       job   marital education   default   balance   housing      loan 
##         0         0         0         0         0         0         0         0 
##   contact       day     month  duration  campaign     pdays  previous  poutcome 
##         0         0         0         0         0         0         0         0 
##         y 
##         0

anyNA(bank)

## [1] FALSE

Based on the result, the data don’t have any NA values, so we can continue to explore the data

Data Wrangling & Exploration

I want to know more about education degree of our clients

data.frame(sort(table("education" = bank$education),decreasing = T))

I want to know more about loan condition of our clients

data.frame(table("loan" = bank$loan, 
      "housing" = bank$housing,
      "default" = bank$default))

I want to know marital status of our clients

data.frame(sort(table("marital_status" = bank$marital),decreasing = T))

I want to know more about how to contact our clients

data.frame(sort(table("contact" = bank$contact),decreasing = T))

I want to know how many clients have not been contacted

data.frame(sort(table("contacted" = bank$previous != 0),decreasing = T))

Based on the data, we know that:

Education degree of our clients mainly secondary and tertiary
Half of our clients already have housing loan
Majority of our clients already married
Majority of our clients can be contacted by cellular
Majority of our clients have not been contacted for loan program

Business Case

Despite majority of our clients already have housing loan, many of them don’t

For this reason, I want to create a new program for loan.

But first, I have to explore the data about job and age distribution.

More Data Wrangling & Exploration

To see job description and the distribution

data.frame(sort(table("job" = bank$job),decreasing = T))

To see age distribution

data.frame(sort(table("age" = bank$age),decreasing = T))

Based on the data, I want to know how many clients we can prospect to join the loan program. The requirements:

age between 30 and 50
work as management or blue-collar or technician or admin. or services
don’t have loan, housing loan, and default credit.
can be contacted

Condition

Condition for subsetting

age_cond <- bank$age == 30:60
job_cond <- bank$job %in% c("management", "blue-collar", "technician", "admin.", "services")
loan_cond <- bank$loan == "no" & bank$housing == "no" & bank$default == "no"
cont_cond <- bank$contact != "unknown"

Subsetting

Subsetting to get new dataframe

prog_cond <- bank[age_cond & job_cond & loan_cond & cont_cond, ]
prog_cond

Based the data, we have 278 clients to be offered.

Exploratory Data Analysis - Bank Dataset

Patricio Kresnatama

December 15, 2020