library(dplyr)
library(ggplot2)
library(tidyr)
library(scales)
library(colorspace)
library(ggridges)
library(glue)
library(lubridate)
To enhance the quantity and quality of customers, the company can achieve this through campaigns and an effective campaign is one that understands its target audience well. The company need to identify the clients who require their services or products. Hopefully, the company can increase its revenue through effective campaigns.
In 2012, a Portuguese bank company launched a campaign to promote their services. The company tracked and summarized the characteristics of their clients and assessed the effectiveness of their campaign. They aimed to enhance the campaign’s efficiency to reduce capital expenditure in the following periods.
Currently, the bank is planning a new campaign and is seeking assistance from data analysts to provide recommendations on the types of clients to target in the campaign.
Identifying the characteristics of clients from the campaign history recap and providing recommendations for clients who are responsive to the campaign.
Benefit: To enhance the effectiveness of campaign success
The data we get from UCI Machine Learning Repository that provide by Moro,S., Rita,P., and Cortez,P.. (2012). with title Bank Marketing https://doi.org/10.24432/C5K306.
The data is related with direct marketing campaigns of a Portuguese banking institution. The marketing campaigns were based on phone calls. Often, more than one contact to the same client was required, in order to access if the product (bank term deposit) would be (‘yes’) or not (‘no’) subscribed.
Here is the data.
and, here is the description of the attribute data.
1 age (numeric)
2 job : type of job (categorical: “admin.”,“unknown”,“unemployed”,“management”,“housemaid”,“entrepreneur”,“student”, “blue-collar”,“self-employed”,“retired”,“technician”,“services”)
3 marital : marital status (categorical: “married”,“divorced”,“single”; note: “divorced” means divorced or widowed)
4 education (categorical: “unknown”,“secondary”,“primary”,“tertiary”)
5 default: has credit in default? (binary: “yes”,“no”)
6 balance: average yearly balance, in euros (numeric)
7 housing: has housing loan? (binary: “yes”,“no”)
8 loan: has personal loan? (binary: “yes”,“no”)
# related with the last contact of the current campaign:
9 contact: contact communication type (categorical: “unknown”,“telephone”,“cellular”)
10 day: last contact day of the month (numeric)
11 month: last contact month of year (categorical: “jan”, “feb”, “mar”, …, “nov”, “dec”)
12 duration: last contact duration, in seconds (numeric)
# other attributes:
13 campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)
14 pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric, -1 means client was not previously contacted)
15 previous: number of contacts performed before this campaign and for this client (numeric)
16 poutcome: outcome of the previous marketing campaign (categorical: “unknown”,“other”,“failure”,“success”)
Output variable (desired target):
17 y - has the client subscribed a term deposit? (binary: “yes”,“no”)
Let’s explore our data and assist the bank.
Firstly, we need to check the data types and identify data that has irrelevant type.
#> Rows: 45,211
#> Columns: 17
#> $ age <int> 58, 44, 33, 47, 33, 35, 28, 42, 58, 43, 41, 29, 53, 58, 57, ~
#> $ job <chr> "management", "technician", "entrepreneur", "blue-collar", "~
#> $ marital <chr> "married", "single", "married", "married", "single", "marrie~
#> $ education <chr> "tertiary", "secondary", "secondary", "unknown", "unknown", ~
#> $ default <chr> "no", "no", "no", "no", "no", "no", "no", "yes", "no", "no",~
#> $ balance <int> 2143, 29, 2, 1506, 1, 231, 447, 2, 121, 593, 270, 390, 6, 71~
#> $ housing <chr> "yes", "yes", "yes", "yes", "no", "yes", "yes", "yes", "yes"~
#> $ loan <chr> "no", "no", "yes", "no", "no", "no", "yes", "no", "no", "no"~
#> $ contact <chr> "unknown", "unknown", "unknown", "unknown", "unknown", "unkn~
#> $ day <int> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, ~
#> $ month <chr> "may", "may", "may", "may", "may", "may", "may", "may", "may~
#> $ duration <int> 261, 151, 76, 92, 198, 139, 217, 380, 50, 55, 222, 137, 517,~
#> $ campaign <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ~
#> $ pdays <int> -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, ~
#> $ previous <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
#> $ poutcome <chr> "unknown", "unknown", "unknown", "unknown", "unknown", "unkn~
#> $ y <chr> "no", "no", "no", "no", "no", "no", "no", "no", "no", "no", ~
As we can see from the data description, there are columns with irrelevant types. Here they are.
job
, marital
, education
, contact
, month
, poutcome
, default
, housing
, loan
, y
.So, let’s convert the data types.
konv_factor <- c("job", "marital", "education", "contact", "month", "poutcome", "default", "housing", "loan", "y")
marketingBank <- marketingBank %>%
mutate_at(konv_factor, as.factor)
glimpse(marketingBank)
#> Rows: 45,211
#> Columns: 17
#> $ age <int> 58, 44, 33, 47, 33, 35, 28, 42, 58, 43, 41, 29, 53, 58, 57, ~
#> $ job <fct> management, technician, entrepreneur, blue-collar, unknown, ~
#> $ marital <fct> married, single, married, married, single, married, single, ~
#> $ education <fct> tertiary, secondary, secondary, unknown, unknown, tertiary, ~
#> $ default <fct> no, no, no, no, no, no, no, yes, no, no, no, no, no, no, no,~
#> $ balance <int> 2143, 29, 2, 1506, 1, 231, 447, 2, 121, 593, 270, 390, 6, 71~
#> $ housing <fct> yes, yes, yes, yes, no, yes, yes, yes, yes, yes, yes, yes, y~
#> $ loan <fct> no, no, yes, no, no, no, yes, no, no, no, no, no, no, no, no~
#> $ contact <fct> unknown, unknown, unknown, unknown, unknown, unknown, unknow~
#> $ day <int> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, ~
#> $ month <fct> may, may, may, may, may, may, may, may, may, may, may, may, ~
#> $ duration <int> 261, 151, 76, 92, 198, 139, 217, 380, 50, 55, 222, 137, 517,~
#> $ campaign <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ~
#> $ pdays <int> -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, ~
#> $ previous <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
#> $ poutcome <fct> unknown, unknown, unknown, unknown, unknown, unknown, unknow~
#> $ y <fct> no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, ~
Next, we need to check for missing values in our data. Missing values can introduce bias into our subsequent analysis.
#> [1] FALSE
Upon inspection, it appears that there are no missing values in our data.
To gain a basic insight into our data, we can use the summary() method. This method will provide a summary of each of our columns.
#> [1] 45211 17
#> age job marital education
#> Min. :18.00 blue-collar:9732 divorced: 5207 primary : 6851
#> 1st Qu.:33.00 management :9458 married :27214 secondary:23202
#> Median :39.00 technician :7597 single :12790 tertiary :13301
#> Mean :40.94 admin. :5171 unknown : 1857
#> 3rd Qu.:48.00 services :4154
#> Max. :95.00 retired :2264
#> (Other) :6835
#> default balance housing loan contact
#> no :44396 Min. : -8019 no :20081 no :37967 cellular :29285
#> yes: 815 1st Qu.: 72 yes:25130 yes: 7244 telephone: 2906
#> Median : 448 unknown :13020
#> Mean : 1362
#> 3rd Qu.: 1428
#> Max. :102127
#>
#> day month duration campaign
#> Min. : 1.00 may :13766 Min. : 0.0 Min. : 1.000
#> 1st Qu.: 8.00 jul : 6895 1st Qu.: 103.0 1st Qu.: 1.000
#> Median :16.00 aug : 6247 Median : 180.0 Median : 2.000
#> Mean :15.81 jun : 5341 Mean : 258.2 Mean : 2.764
#> 3rd Qu.:21.00 nov : 3970 3rd Qu.: 319.0 3rd Qu.: 3.000
#> Max. :31.00 apr : 2932 Max. :4918.0 Max. :63.000
#> (Other): 6060
#> pdays previous poutcome y
#> Min. : -1.0 Min. : 0.0000 failure: 4901 no :39922
#> 1st Qu.: -1.0 1st Qu.: 0.0000 other : 1840 yes: 5289
#> Median : -1.0 Median : 0.0000 success: 1511
#> Mean : 40.2 Mean : 0.5803 unknown:36959
#> 3rd Qu.: -1.0 3rd Qu.: 0.0000
#> Max. :871.0 Max. :275.0000
#>
Based on the summary() results, there are some insights that we can use to identify our data. We can uncover two significant categories of insights: primary and secondary insights. The primary insights are those connected with the campaign’s objectives or targets, while the secondary insights are those that relate to the clients’ characteristics.
Primary Insights:
Secondary Insights:
Based on the previous data exploration, we understand that we need to clean and prepare our data according to the business objectives.
Wrangling based on campaign’s objective
In the poutcome column, which represents the outcome of the previous marketing campaign (categorized as ‘unknown’, ‘other’, ‘failure’, ‘success’), we will focus on the ‘failure’ and ‘success’ outcomes. This approach is aimed at ensuring accurate analysis and making accountable decisions.
marketingBank_pOutcome <- marketingBank %>%
filter(poutcome %in% c("failure", "success"))
head(marketingBank_pOutcome)
Insight: In general, the last contact duration is < 400 seconds or < 7 minutes.
Insight: In general, the number of contacts made during the campaign is 1-2 times for each client
Insight: In general, the number of days passed since the last contact of the previous campaign for each client is around 100-380 days.
Insight: In general, the number of contacts made before the current campaign for each client ranges from 1 to 15 times, with the majority falling between 1 and 5 times.
#>
#> no yes
#> no 1412 1059
#> yes 3404 537
#>
#> no yes
#> no 0.22021210 0.16515908
#> yes 0.53087960 0.08374922
Insight: Clients who have housing loans tend to not have subscribed yet.
Insight: Clients who have subscribed have a higher average balance.
#>
#> no yes
#> divorced 560 148
#> married 2862 893
#> single 1394 555
#>
#> no yes
#> divorced 0.08733624 0.02308172
#> married 0.44635059 0.13927012
#> single 0.21740487 0.08655646
Insight: Clients who are married tend to subscribe more easily.
#>
#> no yes
#> primary 638 138
#> secondary 2549 715
#> tertiary 1454 660
#> unknown 175 83
Insight: Subscribers are primarily clients with a secondary education background.
#>
#> no yes
#> no 4062 1494
#> yes 754 102
Insight: Subscribers are primarily clients who have no loan.
#>
#> no yes
#> no 4778 1593
#> yes 38 3
Insight: Subscribers are primarily clients who have no default history.
Insight: The age average of subscribers is 42 years.
#>
#> no yes
#> failure 4283 618
#> other 0 0
#> success 533 978
#> unknown 0 0
Insight: The success of the previous campaign greatly influences clients to become subscribers.
Insight: Clients who have no personal loans have an average balance that is almost twice that of those who have personal loans.
Insight: The average balance of clients who do not have housing loans is higher than those who have housing loans.
marketingBank_pOutcome %>%
mutate(date = glue("{day}-{month}-2012")) %>%
mutate(date = dmy(date)) %>%
group_by(date, y) %>%
summarise(campaign_freq = length(campaign)) %>%
ggplot(aes(x = date, y = campaign_freq)) +
geom_line(color = "lightblue")+
geom_point(color = "darkblue")+
facet_grid(rows = vars(y),
scales = "free_y") +
labs(title = "Total Campaigns Over Time in the 2012 Period",
subtitle = "Differentiated Based on Subscription Status",
x = NULL,
y = "Campaign Intensity") +
theme_light()
Insights:
Reviewing the ‘marketing_pOutcome’ data frame for the existence of clients who have not received the campaign in the previous period.
Insight: All clients have received the campaign in the previous period.
poutcome_n_y <- marketingBank_pOutcome %>%
group_by(poutcome, y) %>%
summarise(freq = length(y)) %>%
mutate(proportion = round(freq/sum(freq)*100 ,2))
poutcome_n_y
poutcome_n_y %>%
ggplot(aes(x = poutcome,
y = freq,
fill = y)) +
geom_col(position = "dodge")+
scale_fill_grey()+
scale_color_grey()+
labs(title = "The Relationship Between Campaign Success and Subscription Status",
subtitle = "The Previous Campaign Period",
y = "Frequence",
x = "Campaing",
fill = "Subscription Status") +
theme_minimal()+
theme(legend.background = element_rect())
Insight:
marketingBank_pOutcome %>%
ggplot(aes(x = previous, fill = poutcome))+
geom_histogram(alpha = 0.7)+
scale_fill_ordinal()+
labs(title = "Correlation of Contact Intensity with Campaign Success",
subtitle = "The Previous Campaign Period",
x = "Contact Intensity",
y = "Frequence",
fill = "Campaign Result") +
theme_minimal()+
theme(legend.background = element_rect())
Insight:
marketingBank_pOutcome %>%
ggplot(aes(x = balance, y = age)) +
geom_jitter(aes( color = poutcome))+
geom_smooth()+
scale_color_manual(values = c("#003049","#eae2b7"))+
labs(title = "The Relationship Between Average Annual Balance and Client Age",
subtitle = "Associated with the Success of the Previous Campaign",
x = "Average Annual Balance (Euro)",
y = "Age",
color = "Previous Campaign Result") +
theme_minimal() +
theme(legend.background = element_rect(),legend.position = "bottom")
Insights:
job_n_y <- marketingBank_pOutcome %>%
group_by(job, poutcome, y) %>%
summarise(freq = length(y)) %>%
mutate(proportion = round(freq/sum(freq)*100,2))
job_n_y
plot <- job_n_y %>%
ggplot(aes(x = freq, y = reorder(job, freq), fill = y))+
geom_col()+
facet_grid(rows = vars(y),
cols = vars(poutcome),
scales = "free_y") +
scale_fill_ordinal()+
geom_point(aes(x = freq),
color = "green")+
geom_text(aes(label = freq),
hjust = -0.4,
size = 3,
color = "darkgreen") +
labs(title = "Client Occupations Regarding the Success of the Previous Campaign Period",
subtitle = "Differentiated Based on Subscription Status",
x = "Frequence",
y = "Occupation",
fill = "Subscription Status") +
theme_light()+
theme(legend.justification = "center" ,
legend.background = element_rect(),
legend.position = "bottom")
plot
Insights:
agg_plot5 <- marketingBank_pOutcome %>%
select(duration, campaign, contact, poutcome,y) %>%
group_by(contact, poutcome, y) %>%
summarise(count_y = length(y))%>%
mutate(poutcome_y = glue("{poutcome}_{y}"))
agg_plot5
agg_plot5 %>%
ggplot(aes(x = contact,
y = count_y,
fill = poutcome_y)) +
geom_col(position = "dodge")+
scale_fill_manual(values = c("#003049","#f7cdb2", "#2d5745", "#f7d168"))+
facet_wrap(vars(contact),
scales = "free_x")+
scale_x_discrete(labels = NULL)+
labs(title = "The Relationship Between the Success of the Previous Campaign and the Subscription Status",
subtitle = "Differentiated Based on Contact Procedures",
x = NULL,
y = "Intensity",
fill = "Campaign_Subscription\n(result_status)") +
theme_light()+
theme(legend.background = element_rect())
Insight:
marketingBank_pOutcome %>%
ggplot(aes(y = age, x = duration))+
geom_jitter(aes(color = poutcome))+
geom_smooth()+
scale_color_ordinal()+
labs(title = "The Relationship Between Contact Duration and Client Age",
subtitle = "Correlated with the Previous Campaign",
x = "Contact Duration (Second)",
y = "Age",
color = "Campaign Result") +
theme_minimal()+
theme(legend.background = element_rect(),legend.position = "bottom")
Insights:
marketingBank_pOutcome %>%
filter(education != "unknown") %>%
ggplot(aes(x = marital, fill = housing))+
geom_bar()+
facet_grid(rows = vars(y),
cols = vars(education),
scale = "free")+
scale_fill_ordinal()+
#scale_x_discrete(labels = NULL)+
labs(title = "Customers' Educational Background and Marital Status",
subtitle = "Housing loans~row: client status (customer/not)",
x = NULL,
y = "Frequence",
fill = "Housing loans") +
theme_light()+
theme(legend.background = element_rect())
marketingBank_pOutcome %>%
filter(education != "unknown") %>%
ggplot(aes(x = marital, fill = loan))+
geom_bar()+
facet_grid(rows = vars(y),
cols = vars(education),
scale = "free")+
scale_fill_grey()+
#scale_x_discrete(labels = NULL)+
labs(title = "Customer's Educational Background and Marital Status",
subtitle = "Personal loans~row: client status (customer/not)",
x = NULL,
y = "Frequence",
fill = "Personal Loans") +
theme_light()+
theme(legend.background = element_rect())
marketingBank_pOutcome %>%
filter(education != "unknown") %>%
ggplot(aes(x = marital, fill = default))+
geom_bar()+
facet_grid(rows = vars(y),
cols = vars(education),
scale = "free")+
scale_fill_discrete_diverging()+
#scale_x_discrete(labels = NULL)+
labs(title = "Customer's Educational Background and Marital Status",
subtitle = "Default history~row: client status (customer/not)",
x = NULL,
y = "Frequence",
fill = "Default History") +
theme_light()+
theme(legend.background = element_rect())
Insights:
The success of the campaign is closely related to the status of the client to become a customer. More than 70% of the previous period’s campaigns failed. The campaign needs to target certain client groups to save costs while increasing success.
Here are some recommendations for prioritizing client characteristics to be contacted for the next campaign period for a more effective campaign.