This example uses the Bank Marketing dataset from https://archive.ics.uci.edu/ml/datasets/bank+marketing
This dataset is publicly available for research. The details are described in [Moro et al., 2014].
[Moro et al., 2014] S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June 2014
The target variable is called “y”: has the client subscribed to a term deposit? (binary: ‘yes’, ‘no’)
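library(dplyr)    # provides the %>% pipe used below
library(explore)  # provides describe(), explore(), explain_tree()
data <- readr::read_delim("bank-additional.csv",
                          delim = ";", escape_double = FALSE, trim_ws = TRUE)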
data %>% describe_tbl(target = y)
## 4 119 observations with 21 variables; 451 targets (10.9%)
## 0 variables containing missings (NA)
## 0 variables with no variance
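The 10.9% target rate reported above can be checked by hand from the two counts in the output. This is a minimal sketch in base R; the variable names are illustrative and not part of the explore package.

```r
# Counts copied from the describe_tbl() output above
n_obs <- 4119     # observations
n_target <- 451   # rows where y == "yes"

# Share of targets in percent, rounded to one decimal place
target_pct <- round(100 * n_target / n_obs, 1)
print(target_pct)
```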
data %>% describe()
## # A tibble: 21 x 8
## variable type na na_pct unique min mean max
## <chr> <chr> <int> <dbl> <int> <dbl> <dbl> <dbl>
## 1 age dbl 0 0 67 18 40.1 88
## 2 job chr 0 0 12 NA NA NA
## 3 marital chr 0 0 4 NA NA NA
## 4 education chr 0 0 8 NA NA NA
## 5 default chr 0 0 3 NA NA NA
## 6 housing chr 0 0 3 NA NA NA
## 7 loan chr 0 0 3 NA NA NA
## 8 contact chr 0 0 2 NA NA NA
## 9 month chr 0 0 10 NA NA NA
## 10 day_of_week chr 0 0 5 NA NA NA
## # ... with 11 more rows
data %>%
  explore_all()
data %>% explore(age)
data %>% describe(age)
## variable = age
## type = double
## na = 0 of 4 119 (0%)
## unique = 67
## min|max = 18 | 88
## q05|q95 = 26 | 58
## q25|q75 = 32 | 47
## median = 38
## mean = 40.11362
Customer ages range from 18 to 88, with a median of 38 years. There is a “peak” at ages 30-35. No customer has an unknown age.
data %>% explore(marital)
60.9% of customers are married, 28% single.
data %>% explore(age, marital)
There is a relationship between age and marital status: single customers tend to be the youngest.
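One way to put a number on this visual impression is to compare the median age per marital status. The sketch below uses a hypothetical helper, `median_by_group()`, written in base R. The toy rows are made up for illustration and are not taken from the bank marketing data.

```r
# Hypothetical helper: median of a numeric column within each group
median_by_group <- function(df, value, group) {
  out <- aggregate(df[[value]], by = list(df[[group]]), FUN = median)
  names(out) <- c(group, paste0("median_", value))
  out
}

# Toy rows for illustration only; NOT real customers
toy <- data.frame(
  age = c(25, 31, 28, 45, 52, 39, 60, 48),
  marital = c("single", "single", "single",
              "married", "married", "married",
              "divorced", "divorced"),
  stringsAsFactors = FALSE
)

result <- median_by_group(toy, "age", "marital")
print(result)
```

On the real data the same helper could be called as `median_by_group(data, "age", "marital")`, assuming `data` has been loaded as shown earlier.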
data %>% explore(job)
data %>% explore(age, job)
data %>% explore(education)
data %>%
  explore_all(target = y, split = FALSE)
data %>% explore(y)
10.9% of the cases in the data have y = “yes”.
data %>% explain_tree(target = y, minsplit = 300, maxdepth = 4)
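To get a feel for what `minsplit` and `maxdepth` control, it can help to fit a comparable tree with rpart directly. The sketch below assumes that `explain_tree()` behaves like a standard rpart classification tree; it is not the package's internal code. The helper `fit_tree()` is hypothetical, and the demo uses rpart's built-in `kyphosis` data (81 rows) with a smaller `minsplit`, not the bank marketing data.

```r
library(rpart)

# Hypothetical helper: classification tree with the same two limits
#   minsplit = minimum number of observations a node needs before a split is attempted
#   maxdepth = maximum depth of any node (the root counts as depth 0)
fit_tree <- function(df, target, minsplit = 300, maxdepth = 4) {
  rpart(
    reformulate(".", response = target),  # builds the formula: target ~ .
    data = df,
    method = "class",
    control = rpart.control(minsplit = minsplit, maxdepth = maxdepth)
  )
}

# Demo on rpart's built-in kyphosis data; minsplit lowered because it has only 81 rows
demo_tree <- fit_tree(kyphosis, "Kyphosis", minsplit = 20, maxdepth = 4)
print(demo_tree)
```

A large `minsplit` such as 300 keeps the tree from splitting small groups, which is a reasonable guard against overfitting when only about one case in ten is a target.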