Due to internal competition and the current financial crisis, European banks are under heavy pressure to increase their financial assets. One strategy adopted to address this is to offer attractive long-term deposit products with good interest rates, in particular through directed marketing campaigns. The same drivers are also pushing banks to reduce costs and time. There is therefore a need for greater efficiency: fewer contacts should be made, while keeping approximately the same number of successes (clients subscribing to the deposit).
### Bank client data:
1 - age (numeric)
2 - job: type of job (categorical: “admin.”,“unknown”,“unemployed”,“management”,“housemaid”,“entrepreneur”,“student”,“blue-collar”,“self-employed”,“retired”,“technician”,“services”)
3 - marital : marital status (categorical: “married”,“divorced”,“single”; note: “divorced” means divorced or widowed)
4 - education (categorical: “unknown”,“secondary”,“primary”,“tertiary”)
5 - default: has credit in default? (binary: “yes”,“no”)
6 - balance: average yearly balance, in euros (numeric)
7 - housing: has housing loan? (binary: “yes”,“no”)
8 - loan: has personal loan? (binary: “yes”,“no”)
### Related to the last contact of the current campaign:
9 - contact: contact communication type (categorical: “unknown”,“telephone”,“cellular”)
10 - day: last contact day of the month (numeric)
11 - month: last contact month of year (categorical: “jan”, “feb”, “mar”, …, “nov”, “dec”)
12 - duration: last contact duration, in seconds (numeric)
### Other attributes:
13 - campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)
14 - pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric, -1 means client was not previously contacted)
15 - previous: number of contacts performed before this campaign and for this client (numeric)
16 - poutcome: outcome of the previous marketing campaign (categorical: “unknown”,“other”,“failure”,“success”)
17 - y: has the client subscribed to a term deposit? (binary: “yes”,“no”)
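The coding of `pdays` deserves care: `-1` is a sentinel meaning "not previously contacted" rather than a real number of days, so averaging the raw column blends two different situations. The base-R sketch below shows one way to separate the flag from the value. The data frame `clients` and the columns `previously_contacted`, `days_since_contact` and `subscribed` are hypothetical names chosen for illustration, and the rows are toy values, not taken from the dataset.

```r
# Hypothetical toy rows illustrating how pdays and y are coded
clients <- data.frame(
  pdays    = c(-1, 98, -1, 804),
  previous = c(0, 2, 0, 1),
  y        = c("no", "yes", "no", "yes")
)

# pdays = -1 means "not previously contacted", not a day count:
# split it into a logical flag and a day value that is NA when not contacted
clients$previously_contacted <- clients$pdays != -1
clients$days_since_contact   <- ifelse(clients$previously_contacted,
                                       clients$pdays, NA)

# Recode the binary target as logical so it can be summed or averaged
clients$subscribed <- clients$y == "yes"

clients
```

With the flag separated, statistics such as the mean of `days_since_contact` (with `na.rm = TRUE`) describe only clients who were actually contacted before.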
```
##      age              job            marital        education      default
## Min.   :19.00   management :969   divorced: 528   primary  : 678   no :4445
## 1st Qu.:33.00   blue-collar:946   married :2797   secondary:2306   yes:  76
## Median :39.00   technician :768   single  :1196   tertiary :1350
## Mean   :41.17   admin.     :478                   unknown  : 187
## 3rd Qu.:49.00   services   :417
## Max.   :87.00   retired    :230
##                 (Other)    :713
##
##    balance      housing      loan        contact            day
## Min.   :-3313   no :1962   no :3830   cellular :2896   Min.   : 1.00
## 1st Qu.:   69   yes:2559   yes: 691   telephone: 301   1st Qu.: 9.00
## Median :  444                         unknown  :1324   Median :16.00
## Mean   : 1423                                          Mean   :15.92
## 3rd Qu.: 1480                                          3rd Qu.:21.00
## Max.   :71188                                          Max.   :31.00
##
##    month         duration        campaign          pdays
## may    :1398   Min.   :   4   Min.   : 1.000   Min.   : -1.00
## jul    : 706   1st Qu.: 104   1st Qu.: 1.000   1st Qu.: -1.00
## aug    : 633   Median : 185   Median : 2.000   Median : -1.00
## jun    : 531   Mean   : 264   Mean   : 2.794   Mean   : 39.77
## nov    : 389   3rd Qu.: 329   3rd Qu.: 3.000   3rd Qu.: -1.00
## apr    : 293   Max.   :3025   Max.   :50.000   Max.   :871.00
## (Other): 571
##
##    previous         poutcome        y
## Min.   : 0.0000   failure: 490   no :4000
## 1st Qu.: 0.0000   other  : 197   yes: 521
## Median : 0.0000   success: 129
## Mean   : 0.5426   unknown:3705
## 3rd Qu.: 0.0000
## Max.   :25.0000
```
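The counts of `y` in the summary make the efficiency problem from the introduction concrete. A short base-R calculation using only those two counts (the variable names are illustrative):

```r
# Class counts for y, copied from the summary output above
y_counts <- c(no = 4000, yes = 521)

n_clients <- sum(y_counts)                         # clients in the data set
rate      <- y_counts[["yes"]] / n_clients         # share who subscribed
clients_per_success <- n_clients / y_counts[["yes"]]

round(rate * 100, 1)            # percent subscribing: 11.5
round(clients_per_success, 1)   # clients reached per subscription: 8.7
```

So roughly 11.5% of the 4,521 clients subscribed, about one subscription per 8.7 clients. Because `campaign` counts every call made to a client during this campaign (mean 2.794), the number of calls per subscription is higher still.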
```r
library(dplyr)    # filter(), mutate(), group_by(), summarise()
library(ggplot2)  # plotting

# Bin age (in years) into four groups
pw <- function(x) {
  ifelse(x <= 30, "Teenager & Young",
  ifelse(x <= 45, "Middle-aged",
  ifelse(x <= 60, "Pre - Retired", "Elder")))
}

# Keep only clients who subscribed to a term deposit and tag each with an age group
tm_bank_yes <- tm_bank %>%
  filter(y == "yes") %>%
  mutate(age_category = as.factor(pw(age)))
summary(tm_bank_yes)
```

```
##      age              job           marital        education     default
## Min.   :19.00   management :131   divorced: 77   primary  : 64   no :512
## 1st Qu.:32.00   technician : 83   married :277   secondary:245   yes:  9
## Median :40.00   blue-collar: 69   single  :167   tertiary :193
## Mean   :42.49   admin.     : 58                  unknown  : 19
## 3rd Qu.:50.00   retired    : 54
## Max.   :87.00   services   : 38
##                 (Other)    : 88
##
##    balance      housing    loan        contact           day
## Min.   :-1206   no :301   no :478   cellular :416   Min.   : 1.00
## 1st Qu.:  171   yes:220   yes: 43   telephone: 44   1st Qu.: 9.00
## Median :  710                       unknown  : 61   Median :15.00
## Mean   : 1572                                       Mean   :15.66
## 3rd Qu.: 2160                                       3rd Qu.:22.00
## Max.   :26965                                       Max.   :31.00
##
##    month         duration         campaign          pdays
## may    : 93   Min.   :  30.0   Min.   : 1.000   Min.   : -1.00
## aug    : 79   1st Qu.: 260.0   1st Qu.: 1.000   1st Qu.: -1.00
## jul    : 61   Median : 442.0   Median : 2.000   Median : -1.00
## apr    : 56   Mean   : 552.7   Mean   : 2.267   Mean   : 68.64
## jun    : 55   3rd Qu.: 755.0   3rd Qu.: 3.000   3rd Qu.: 98.00
## nov    : 39   Max.   :2769.0   Max.   :24.000   Max.   :804.00
## (Other):138
##
##   previous       poutcome        y          age_category
## Min.   : 0.00   failure: 63   no :  0   Elder           : 48
## 1st Qu.: 0.00   other  : 38   yes:521   Middle-aged     :236
## Median : 0.00   success: 83             Pre - Retired   :147
## Mean   : 1.09   unknown:337             Teenager & Young: 90
## 3rd Qu.: 2.00
## Max.   :14.00
```
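The nested `ifelse()` in `pw()` can also be written with base R's `cut()`, which takes the bin edges and labels directly. A minimal sketch, assuming the same cut points as `pw()` (30, 45 and 60 years); `age_breaks`, `age_labels` and `age_category_cut()` are hypothetical names, not part of the original analysis:

```r
# Same bins as pw(): (-Inf, 30], (30, 45], (45, 60], (60, Inf)
age_breaks <- c(-Inf, 30, 45, 60, Inf)
age_labels <- c("Teenager & Young", "Middle-aged", "Pre - Retired", "Elder")

age_category_cut <- function(x) {
  # right = TRUE (the default) closes each interval on the right,
  # matching the <= comparisons used in pw()
  cut(x, breaks = age_breaks, labels = age_labels, right = TRUE)
}

ages <- c(19, 30, 31, 45, 46, 60, 61, 87)
age_category_cut(ages)
```

One practical difference: `cut()` returns a factor whose levels follow the order of `labels` (youngest to oldest), whereas `as.factor(pw(age))` sorts the levels alphabetically. With `cut()`, the `facet_wrap()` panels would therefore appear in age order.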
```r
# Count subscribers by job within each age group, then number the rows so
# that bars can be sorted by frequency inside each facet
tm_edit <- tm_bank_yes %>%
  group_by(job, age_category) %>%
  summarise(Freq = n()) %>%
  ungroup() %>%
  arrange(age_category, Freq) %>%
  mutate(order = row_number())

ggplot(data = tm_edit, aes(x = order, y = Freq)) +
  geom_col(aes(fill = job), show.legend = FALSE) +
  coord_flip() +
  facet_wrap(~age_category, scales = "free") +
  xlab("Job Category") +
  ylab("Frequency") +
  # show the job name at each row position
  scale_x_continuous(breaks = tm_edit$order, labels = tm_edit$job) +
  # apply the complete theme first so the theme() tweaks after it are kept
  theme_minimal() +
  theme(axis.ticks.x = element_blank(),
        axis.text.y = element_text())
```

```r
# Same chart, with education level instead of job
tm_edit1 <- tm_bank_yes %>%
  group_by(education, age_category) %>%
  summarise(Freq = n()) %>%
  ungroup() %>%
  arrange(age_category, Freq) %>%
  mutate(order = row_number())

ggplot(data = tm_edit1, aes(x = order, y = Freq)) +
  geom_col(aes(fill = education), show.legend = FALSE) +
  coord_flip() +
  facet_wrap(~age_category, scales = "free") +
  xlab("Education Level") +
  ylab("Frequency") +
  scale_x_continuous(breaks = tm_edit1$order, labels = tm_edit1$education) +
  theme_minimal() +
  theme(axis.ticks.x = element_blank(),
        axis.text.y = element_text())
```
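The plots above place the row number `order` on the axis instead of `job` (or `education`). With a single discrete axis, all panels of `facet_wrap()` share one level ordering, so a category that appears in several age groups cannot be ranked differently in each panel. Giving every (age group, category) row its own position and relabelling the breaks avoids this. The base-R sketch below reproduces just the ordering step on hypothetical counts (`counts` and its values are invented for illustration):

```r
# Hypothetical counts for two age groups (not taken from the dataset)
counts <- data.frame(
  age_category = c("Elder", "Elder", "Elder", "Middle-aged", "Middle-aged"),
  job          = c("retired", "management", "services", "management", "technician"),
  Freq         = c(20, 5, 2, 60, 40)
)

# Equivalent of arrange(age_category, Freq) %>% mutate(order = row_number()):
# sort by group, then by count, and number the rows 1..n over the whole table
counts <- counts[order(counts$age_category, counts$Freq), ]
counts$order <- seq_len(nrow(counts))

# Each row now has a unique position, so "management" can sit low in one
# panel and high in another without the two being merged
counts
```

Within each age group the rows run from the smallest to the largest count, which after `coord_flip()` puts the most frequent category at the top of each panel.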