Dataset

This example uses the Bank Marketing dataset from https://archive.ics.uci.edu/ml/datasets/bank+marketing

This dataset is publicly available for research. The details are described in [Moro et al., 2014].

[Moro et al., 2014] S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June 2014.

The target variable is called “y”: has the client subscribed to a term deposit? (binary: ‘yes’, ‘no’)

Read Data

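# explore provides describe(), explore() and explain_tree(), and re-exports %>%
library(explore)

data <- readr::read_delim("bank-additional.csv", 
                   delim = ";", escape_double = FALSE, trim_ws = TRUE)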

data %>% describe_tbl(target = y)
## 4 119 observations with 21 variables; 451 targets (10.9%)
## 0 variables containing missings (NA)
## 0 variables with no variance
data %>% describe()
## # A tibble: 21 x 8
##    variable    type     na na_pct unique   min  mean   max
##    <chr>       <chr> <int>  <dbl>  <int> <dbl> <dbl> <dbl>
##  1 age         dbl       0      0     67    18  40.1    88
##  2 job         chr       0      0     12    NA  NA      NA
##  3 marital     chr       0      0      4    NA  NA      NA
##  4 education   chr       0      0      8    NA  NA      NA
##  5 default     chr       0      0      3    NA  NA      NA
##  6 housing     chr       0      0      3    NA  NA      NA
##  7 loan        chr       0      0      3    NA  NA      NA
##  8 contact     chr       0      0      2    NA  NA      NA
##  9 month       chr       0      0     10    NA  NA      NA
## 10 day_of_week chr       0      0      5    NA  NA      NA
## # ... with 11 more rows

Data Exploration

data %>% 
  explore_all()

Data Understanding

data %>% explore(age)

data %>% describe(age)
## variable = age
## type     = double
## na       = 0 of 4 119 (0%)
## unique   = 67
## min|max  = 18 | 88
## q05|q95  = 26 | 58
## q25|q75  = 32 | 47
## median   = 38
## mean     = 40.11362

So, the customers are between 18 and 88 years old, with a median age of 38 years. There is a “peak” at age 30-35. No customer has an unknown age.
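The statistics printed by describe(age) can be approximated in base R. The following is a minimal sketch, not the explore implementation: summarise_numeric is a hypothetical helper introduced only for illustration, and base R's default quantile method may differ slightly from the one explore uses.

```r
# Hypothetical helper (not part of explore): summarise a numeric vector
# with the same kinds of statistics that describe() prints for one variable.
summarise_numeric <- function(x) {
  c(
    na     = sum(is.na(x)),                              # number of missing values
    unique = length(unique(x)),                          # number of distinct values
    min    = min(x, na.rm = TRUE),
    q05    = unname(quantile(x, 0.05, na.rm = TRUE)),
    q25    = unname(quantile(x, 0.25, na.rm = TRUE)),
    median = median(x, na.rm = TRUE),
    mean   = mean(x, na.rm = TRUE),
    q75    = unname(quantile(x, 0.75, na.rm = TRUE)),
    q95    = unname(quantile(x, 0.95, na.rm = TRUE)),
    max    = max(x, na.rm = TRUE)
  )
}

# Illustrative call on a toy vector; with the bank data loaded,
# the call would be summarise_numeric(data$age).
toy_summary <- summarise_numeric(c(1, 2, 3, 4, 5))
```

A named numeric vector keeps the result easy to index, for example toy_summary[["median"]].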

data %>% explore(marital)

60.9% of customers are married, 28% single.

data %>% explore(age, marital)

There is a relationship between age and marital status: single customers are the youngest group.

data %>% explore(job)

data %>% explore(age, job)

data %>% explore(education)

Data+Target Exploration

data %>% 
  explore_all(target = y, split = FALSE)

Predict Target

data %>% explore(y)

In the data, 10.9% of the cases have y = “yes”.
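The share of positive cases can be checked against the counts that describe_tbl() reported earlier (451 targets among 4 119 observations). A short sketch of the arithmetic, with the variable names chosen here purely for illustration:

```r
# Counts taken from the describe_tbl() output above
n_obs <- 4119   # total observations
n_yes <- 451    # observations with y = "yes"

# Share of positive cases in percent, rounded to one decimal place
target_pct <- round(100 * n_yes / n_obs, 1)
```

With the data loaded, the same figure could be computed directly as mean(data$y == "yes"), assuming y is stored as the strings "yes" and "no".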

data %>% explain_tree(target = y, minsplit = 300, maxdepth = 4)
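For readers who want to work with the fitted model rather than only the plot, a comparable tree can be fitted directly with the rpart package. This is a sketch under assumptions: fit_small_tree is a hypothetical helper, it assumes rpart is installed and that the target column is named y, and it is not claimed to reproduce exactly what explain_tree() does internally. The minsplit and maxdepth defaults simply mirror the call above.

```r
# Hypothetical helper (not part of explore): fit a small classification tree
# on a data frame whose target column is named `y`.
fit_small_tree <- function(df, minsplit = 300, maxdepth = 4) {
  df <- as.data.frame(df)
  # Convert character columns to factors so they are treated as categorical
  df[] <- lapply(df, function(col) if (is.character(col)) factor(col) else col)
  rpart::rpart(
    y ~ .,
    data    = df,
    method  = "class",
    control = rpart::rpart.control(minsplit = minsplit, maxdepth = maxdepth)
  )
}

# Illustrative call on made-up toy data; with the bank data loaded,
# the call would be fit_small_tree(data).
toy <- data.frame(y = rep(c("yes", "no"), 50), x = 1:100)
toy_fit <- fit_small_tree(toy, minsplit = 10)
```

Printing the returned object lists the splits, which can be compared with the plotted tree.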