In this analysis, we look at what happened during a direct marketing campaign that ran from May 2008 to November 2010. The data relate to direct marketing campaigns (phone calls) of a Portuguese banking institution.
Every column except the numeric ones is stored as a factor, because most of the variables are categorical.
age : client’s age (numeric)
job : type of job (categorical: ‘admin.’,‘blue-collar’,‘entrepreneur’,‘housemaid’,‘management’,‘retired’,‘self-employed’,‘services’,‘student’,‘technician’,‘unemployed’,‘unknown’)
marital : marital status (categorical: ‘divorced’,‘married’,‘single’; note: ‘divorced’ means divorced or widowed)
education : education level (categorical: ‘primary’,‘secondary’,‘tertiary’,‘unknown’)
default : has credit in default? (categorical: ‘no’,‘yes’)
balance : account balance (numeric; can be negative)
housing : has housing loan? (categorical: ‘no’,‘yes’)
loan: has personal loan? (categorical: ‘no’,‘yes’)
contact: contact communication type (categorical: ‘cellular’,‘telephone’,‘unknown’)
day : day of the month of the last contact
month : month of the last contact (categorical)
duration: last contact duration, in seconds (numeric). Important note: this attribute highly affects the output target (e.g., if duration=0 then y=‘no’). Yet, the duration is not known before a call is performed. Also, after the end of the call y is obviously known. Thus, this input should only be included for benchmark purposes and should be discarded if the intention is to have a realistic predictive model.
campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)
pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric; -1 means client was not previously contacted)
previous: number of contacts performed before this campaign and for this client (numeric)
poutcome: outcome of the previous marketing campaign (categorical: ‘failure’,‘other’,‘success’,‘unknown’)
y : has the client subscribed to a term deposit? (binary: ‘yes’,‘no’)
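The factor conversion described at the start can be sketched in base R as follows. This is a minimal sketch, not necessarily the call used in this analysis: the temporary file, the `;` separator, and the reduced set of five columns are assumptions made so that the example runs on its own. The three sample rows reuse the first values shown in the listing.

```r
# Write a tiny stand-in file so the example is self-contained.
# The file, the ";" separator and the column subset are assumptions.
demo_path <- tempfile(fileext = ".csv")
writeLines(c(
  "age;job;marital;balance;y",
  "58;management;married;2143;no",
  "44;technician;single;29;no",
  "33;entrepreneur;married;2;no"
), demo_path)

# stringsAsFactors = TRUE turns every character column into a factor,
# while numeric columns keep their numeric type.
bank_demo <- read.csv(demo_path, sep = ";", stringsAsFactors = TRUE)

str(bank_demo)
```

With this option, `job`, `marital` and `y` come back as factors, while `age` and `balance` stay integer.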
#> Rows: 45,211
#> Columns: 17
#> $ age <int> 58, 44, 33, 47, 33, 35, 28, 42, 58, 43, 41, 29, 53, 58, 57,…
#> $ job <fct> management, technician, entrepreneur, blue-collar, unknown,…
#> $ marital <fct> married, single, married, married, single, married, single,…
#> $ education <fct> tertiary, secondary, secondary, unknown, unknown, tertiary,…
#> $ default <fct> no, no, no, no, no, no, no, yes, no, no, no, no, no, no, no…
#> $ balance <int> 2143, 29, 2, 1506, 1, 231, 447, 2, 121, 593, 270, 390, 6, 7…
#> $ housing <fct> yes, yes, yes, yes, no, yes, yes, yes, yes, yes, yes, yes, …
#> $ loan <fct> no, no, yes, no, no, no, yes, no, no, no, no, no, no, no, n…
#> $ contact <fct> unknown, unknown, unknown, unknown, unknown, unknown, unkno…
#> $ day <int> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,…
#> $ month <fct> may, may, may, may, may, may, may, may, may, may, may, may,…
#> $ duration <int> 261, 151, 76, 92, 198, 139, 217, 380, 50, 55, 222, 137, 517…
#> $ campaign <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ pdays <int> -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,…
#> $ previous <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ poutcome <fct> unknown, unknown, unknown, unknown, unknown, unknown, unkno…
#> $ y <fct> no, no, no, no, no, no, no, no, no, no, no, no, no, no, no,…
#> NULL
Data to be removed :
Data whose format will change :
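The two steps are not listed in the text. Judging only from the outputs, `previous` appears in the first listing but not in the summary that follows, and `day` goes from an integer column to one summarised by counts. The sketch below assumes that this is what was done. The toy data frame is hypothetical.

```r
# Toy stand-in for the full data set (values are illustrative).
bank_demo <- data.frame(
  day      = c(5L, 5L, 20L),
  previous = c(0L, 0L, 1L),
  duration = c(261L, 151L, 76L)
)

# Assumed step 1: drop the `previous` column.
bank_demo$previous <- NULL

# Assumed step 2: treat the day of the month as a category.
bank_demo$day <- as.factor(bank_demo$day)

str(bank_demo)
```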
#> age job marital education
#> Min. :18.00 blue-collar:9732 divorced: 5207 primary : 6851
#> 1st Qu.:33.00 management :9458 married :27214 secondary:23202
#> Median :39.00 technician :7597 single :12790 tertiary :13301
#> Mean :40.94 admin. :5171 unknown : 1857
#> 3rd Qu.:48.00 services :4154
#> Max. :95.00 retired :2264
#> (Other) :6835
#> default balance housing loan contact
#> no :44396 Min. : -8019 no :20081 no :37967 cellular :29285
#> yes: 815 1st Qu.: 72 yes:25130 yes: 7244 telephone: 2906
#> Median : 448 unknown :13020
#> Mean : 1362
#> 3rd Qu.: 1428
#> Max. :102127
#>
#> day month duration campaign
#> 20 : 2752 may :13766 Min. : 0.0 Min. : 1.000
#> 18 : 2308 jul : 6895 1st Qu.: 103.0 1st Qu.: 1.000
#> 21 : 2026 aug : 6247 Median : 180.0 Median : 2.000
#> 17 : 1939 jun : 5341 Mean : 258.2 Mean : 2.764
#> 6 : 1932 nov : 3970 3rd Qu.: 319.0 3rd Qu.: 3.000
#> 5 : 1910 apr : 2932 Max. :4918.0 Max. :63.000
#> (Other):32344 (Other): 6060
#> pdays poutcome y
#> Min. : -1.0 failure: 4901 no :39922
#> 1st Qu.: -1.0 other : 1840 yes: 5289
#> Median : -1.0 success: 1511
#> Mean : 40.2 unknown:36959
#> 3rd Qu.: -1.0
#> Max. :871.0
#>
Summary :
The average client age is around 40 years
The most common job category is blue-collar
The longest contact in the direct marketing campaign lasted 4918 seconds, and the average contact duration is around 258 seconds
Cellular is the most frequent way of contacting clients, used 29285 times during the campaign period
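Each figure in this summary can also be computed directly instead of being read off the `summary()` printout. A minimal sketch on a hypothetical toy data frame (the column names match the data set, the values do not):

```r
# Hypothetical toy data; only the column names mirror the real data set.
bank_demo <- data.frame(
  age      = c(30, 40, 50),
  job      = factor(c("blue-collar", "blue-collar", "management")),
  duration = c(100, 200, 600),
  contact  = factor(c("cellular", "cellular", "telephone"))
)

mean(bank_demo$age)                                # average client age
names(which.max(table(bank_demo$job)))             # most common job category
max(bank_demo$duration)                            # longest contact, in seconds
mean(bank_demo$duration)                           # average contact duration
sort(table(bank_demo$contact), decreasing = TRUE)  # contact types by count
```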
Among clients aged 40, is the most common job still blue-collar, or can we find other insights from that?
#>
#> blue-collar management technician admin. services
#> 338 268 223 172 148
#> self-employed unemployed entrepreneur housemaid retired
#> 54 48 47 36 15
#> student unknown
#> 4 2
Yes, the top 3 job categories at age 40 are still the same. The ranking is as follows :
blue-collar
management
technician
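The code that produced the job table is not shown. One way to obtain such a table, assuming it counts jobs among clients aged exactly 40 (an assumption based on the wording above), is sketched here on hypothetical toy data:

```r
# Hypothetical toy data; the real analysis would use the full `bank` data.
bank_demo <- data.frame(
  age = c(40, 40, 40, 35, 40),
  job = factor(c("blue-collar", "management", "blue-collar",
                 "technician", "technician"))
)

# Keep clients aged 40, then count jobs from most to least frequent.
age40 <- bank_demo[bank_demo$age == 40, ]
job_counts <- sort(table(age40$job), decreasing = TRUE)
job_counts
```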
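library(dplyr)

# Keep only the calls that lasted at least the mean duration (258.2 seconds)
bank_edu <- bank %>% filter(duration >= 258.2)
# Sum the number of campaign contacts per education level
bank_edu <- xtabs(campaign ~ education, bank_edu)
bank_edu <- as.data.frame(bank_edu)
# Sort from the highest to the lowest total
bank_edu <- bank_edu %>%
  arrange(-Freq)
bank_edu
Now we know that the secondary education level has the highest total number of campaign contacts among calls that lasted longer than the average duration of 258.2 seconds.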
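Note that `xtabs(campaign ~ education, ...)` adds up the `campaign` values within each education level, whereas a formula with an empty left-hand side counts rows. The toy data below is hypothetical and only illustrates the difference between the two:

```r
# Hypothetical toy data: three calls, each with a number of contacts.
calls <- data.frame(
  education = factor(c("secondary", "secondary", "tertiary")),
  campaign  = c(1, 4, 2)
)

# Left-hand side present: sums `campaign` per education level.
contact_totals <- xtabs(campaign ~ education, data = calls)

# Left-hand side empty: counts rows (calls) per education level.
call_counts <- xtabs(~ education, data = calls)

as.data.frame(contact_totals)
as.data.frame(call_counts)
```

Which of the two is the better measure depends on whether the question is about the number of clients reached or the number of contacts made.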
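library(ggplot2)

# Call duration per job; boxplot outliers are hidden because
# geom_point() already draws every observation
ggplot(data = bank, mapping = aes(x = job, y = duration)) +
  geom_boxplot(outlier.shape = NA) +
  geom_point()

# Mean call duration per job, longest first
bank_job_vis <- bank %>%
  group_by(job) %>%
  summarise(duration = mean(duration)) %>%
  ungroup() %>%
  arrange(-duration)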
# Lollipop chart of the mean call duration per job
bank_job_vis %>%
  arrange(duration) %>%
  mutate(job = factor(job, levels = job)) %>%
  ggplot(mapping = aes(x = job, y = duration)) +
  geom_segment(aes(xend = job, yend = 0)) +
  geom_point(size = 4, color = "orange") +
  coord_flip() +
  theme_bw() +
  xlab("Job Type") +
  ylab("Duration (seconds)")
Unemployed clients stay on the call the longest on average, along with retired and self-employed clients. Clients in other jobs can probably pick up calls too, but only have a short time available.