library(dplyr)
library(ggplot2)
library(tidyr)
library(scales)
library(colorspace)
library(ggridges)
library(glue)
library(lubridate)

1 Project Background

To enhance the quantity and quality of customers, the company can achieve this through campaigns and an effective campaign is one that understands its target audience well. The company need to identify the clients who require their services or products. Hopefully, the company can increase its revenue through effective campaigns.

In 2012, a Portuguese bank company launched a campaign to promote their services. The company tracked and summarized the characteristics of their clients and assessed the effectiveness of their campaign. They aimed to enhance the campaign’s efficiency to reduce capital expenditure in the following periods.

Currently, the bank is planning a new campaign and is seeking assistance from data analysts to provide recommendations on the types of clients to target in the campaign.

2 Business Objective

Identifying the characteristics of clients from the campaign history recap and providing recommendations for clients who are responsive to the campaign.

Benefit: To enhance the effectiveness of campaign success

3 Dataset

The data we get from UCI Machine Learning Repository that provide by Moro,S., Rita,P., and Cortez,P.. (2012). with title Bank Marketing https://doi.org/10.24432/C5K306.

The data is related with direct marketing campaigns of a Portuguese banking institution. The marketing campaigns were based on phone calls. Often, more than one contact to the same client was required, in order to access if the product (bank term deposit) would be (‘yes’) or not (‘no’) subscribed.

Here is the data.

marketingBank <- read.csv2("bank-full.csv")

marketingBank

and, here is the description of the attribute data.

1 age (numeric)

2 job : type of job (categorical: “admin.”,“unknown”,“unemployed”,“management”,“housemaid”,“entrepreneur”,“student”, “blue-collar”,“self-employed”,“retired”,“technician”,“services”)

3 marital : marital status (categorical: “married”,“divorced”,“single”; note: “divorced” means divorced or widowed)

4 education (categorical: “unknown”,“secondary”,“primary”,“tertiary”)

5 default: has credit in default? (binary: “yes”,“no”)

6 balance: average yearly balance, in euros (numeric)

7 housing: has housing loan? (binary: “yes”,“no”)

8 loan: has personal loan? (binary: “yes”,“no”)

# related with the last contact of the current campaign:

9 contact: contact communication type (categorical: “unknown”,“telephone”,“cellular”)

10 day: last contact day of the month (numeric)

11 month: last contact month of year (categorical: “jan”, “feb”, “mar”, …, “nov”, “dec”)

12 duration: last contact duration, in seconds (numeric)

# other attributes:

13 campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)

14 pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric, -1 means client was not previously contacted)

15 previous: number of contacts performed before this campaign and for this client (numeric)

16 poutcome: outcome of the previous marketing campaign (categorical: “unknown”,“other”,“failure”,“success”)

Output variable (desired target):

17 y - has the client subscribed a term deposit? (binary: “yes”,“no”)

4 Exploratory Data Analysis

Let’s explore our data and assist the bank.

4.1 Data Type Inspection

Firstly, we need to check the data types and identify data that has irrelevant type.

glimpse(marketingBank)

#> Rows: 45,211
#> Columns: 17
#> $ age       <int> 58, 44, 33, 47, 33, 35, 28, 42, 58, 43, 41, 29, 53, 58, 57, ~
#> $ job       <chr> "management", "technician", "entrepreneur", "blue-collar", "~
#> $ marital   <chr> "married", "single", "married", "married", "single", "marrie~
#> $ education <chr> "tertiary", "secondary", "secondary", "unknown", "unknown", ~
#> $ default   <chr> "no", "no", "no", "no", "no", "no", "no", "yes", "no", "no",~
#> $ balance   <int> 2143, 29, 2, 1506, 1, 231, 447, 2, 121, 593, 270, 390, 6, 71~
#> $ housing   <chr> "yes", "yes", "yes", "yes", "no", "yes", "yes", "yes", "yes"~
#> $ loan      <chr> "no", "no", "yes", "no", "no", "no", "yes", "no", "no", "no"~
#> $ contact   <chr> "unknown", "unknown", "unknown", "unknown", "unknown", "unkn~
#> $ day       <int> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, ~
#> $ month     <chr> "may", "may", "may", "may", "may", "may", "may", "may", "may~
#> $ duration  <int> 261, 151, 76, 92, 198, 139, 217, 380, 50, 55, 222, 137, 517,~
#> $ campaign  <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ~
#> $ pdays     <int> -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, ~
#> $ previous  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
#> $ poutcome  <chr> "unknown", "unknown", "unknown", "unknown", "unknown", "unkn~
#> $ y         <chr> "no", "no", "no", "no", "no", "no", "no", "no", "no", "no", ~

As we can see from the data description, there are columns with irrelevant types. Here they are.

as.factor: job, marital, education, contact, month, poutcome, default, housing, loan, y.

So, let’s convert the data types.

konv_factor <- c("job", "marital", "education", "contact", "month", "poutcome", "default", "housing", "loan", "y")

marketingBank <- marketingBank %>% 
  mutate_at(konv_factor, as.factor) 

glimpse(marketingBank)

#> Rows: 45,211
#> Columns: 17
#> $ age       <int> 58, 44, 33, 47, 33, 35, 28, 42, 58, 43, 41, 29, 53, 58, 57, ~
#> $ job       <fct> management, technician, entrepreneur, blue-collar, unknown, ~
#> $ marital   <fct> married, single, married, married, single, married, single, ~
#> $ education <fct> tertiary, secondary, secondary, unknown, unknown, tertiary, ~
#> $ default   <fct> no, no, no, no, no, no, no, yes, no, no, no, no, no, no, no,~
#> $ balance   <int> 2143, 29, 2, 1506, 1, 231, 447, 2, 121, 593, 270, 390, 6, 71~
#> $ housing   <fct> yes, yes, yes, yes, no, yes, yes, yes, yes, yes, yes, yes, y~
#> $ loan      <fct> no, no, yes, no, no, no, yes, no, no, no, no, no, no, no, no~
#> $ contact   <fct> unknown, unknown, unknown, unknown, unknown, unknown, unknow~
#> $ day       <int> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, ~
#> $ month     <fct> may, may, may, may, may, may, may, may, may, may, may, may, ~
#> $ duration  <int> 261, 151, 76, 92, 198, 139, 217, 380, 50, 55, 222, 137, 517,~
#> $ campaign  <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ~
#> $ pdays     <int> -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, ~
#> $ previous  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
#> $ poutcome  <fct> unknown, unknown, unknown, unknown, unknown, unknown, unknow~
#> $ y         <fct> no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, ~

4.2 Checking missing value

Next, we need to check for missing values in our data. Missing values can introduce bias into our subsequent analysis.

anyNA(marketingBank)

#> [1] FALSE

Upon inspection, it appears that there are no missing values in our data.

4.3 Data Summary

To gain a basic insight into our data, we can use the summary() method. This method will provide a summary of each of our columns.

dim(marketingBank)

#> [1] 45211    17

summary(marketingBank)

#>       age                 job           marital          education    
#>  Min.   :18.00   blue-collar:9732   divorced: 5207   primary  : 6851  
#>  1st Qu.:33.00   management :9458   married :27214   secondary:23202  
#>  Median :39.00   technician :7597   single  :12790   tertiary :13301  
#>  Mean   :40.94   admin.     :5171                    unknown  : 1857  
#>  3rd Qu.:48.00   services   :4154                                     
#>  Max.   :95.00   retired    :2264                                     
#>                  (Other)    :6835                                     
#>  default        balance       housing      loan            contact     
#>  no :44396   Min.   : -8019   no :20081   no :37967   cellular :29285  
#>  yes:  815   1st Qu.:    72   yes:25130   yes: 7244   telephone: 2906  
#>              Median :   448                           unknown  :13020  
#>              Mean   :  1362                                            
#>              3rd Qu.:  1428                                            
#>              Max.   :102127                                            
#>                                                                        
#>       day            month          duration         campaign     
#>  Min.   : 1.00   may    :13766   Min.   :   0.0   Min.   : 1.000  
#>  1st Qu.: 8.00   jul    : 6895   1st Qu.: 103.0   1st Qu.: 1.000  
#>  Median :16.00   aug    : 6247   Median : 180.0   Median : 2.000  
#>  Mean   :15.81   jun    : 5341   Mean   : 258.2   Mean   : 2.764  
#>  3rd Qu.:21.00   nov    : 3970   3rd Qu.: 319.0   3rd Qu.: 3.000  
#>  Max.   :31.00   apr    : 2932   Max.   :4918.0   Max.   :63.000  
#>                  (Other): 6060                                    
#>      pdays          previous           poutcome       y        
#>  Min.   : -1.0   Min.   :  0.0000   failure: 4901   no :39922  
#>  1st Qu.: -1.0   1st Qu.:  0.0000   other  : 1840   yes: 5289  
#>  Median : -1.0   Median :  0.0000   success: 1511              
#>  Mean   : 40.2   Mean   :  0.5803   unknown:36959              
#>  3rd Qu.: -1.0   3rd Qu.:  0.0000                              
#>  Max.   :871.0   Max.   :275.0000                              
#>

Based on the summary() results, there are some insights that we can use to identify our data. We can uncover two significant categories of insights: primary and secondary insights. The primary insights are those connected with the campaign’s objectives or targets, while the secondary insights are those that relate to the clients’ characteristics.

Primary Insights:

Through the poutcome column, it is evident that 4,901 of the previous campaigns were categorized as failed, while only 1,511 were labeled as successful. The remainder is unknown.
Through the ‘y’ column, it is evident that 39,922 clients who received the campaign have not subscribed to a ‘term deposit’

Secondary Insights:

The average age of clients is 40 years, and the majority of them are married.
The top three client occupations are ‘blue-collar,’ ‘management,’ and ‘technician’.
Most clients do not default on credit and have personal loans.
The most common educational background among clients is ‘secondary’.
Most clients are contacted through ‘cellular’.

Based on the previous data exploration, we understand that we need to clean and prepare our data according to the business objectives.

Wrangling based on campaign’s objective

In the poutcome column, which represents the outcome of the previous marketing campaign (categorized as ‘unknown’, ‘other’, ‘failure’, ‘success’), we will focus on the ‘failure’ and ‘success’ outcomes. This approach is aimed at ensuring accurate analysis and making accountable decisions.

marketingBank_pOutcome <- marketingBank %>% 
  filter(poutcome %in% c("failure", "success"))

head(marketingBank_pOutcome)

4.4 EDA by Plot

4.4.1 Last contact duration

hist(marketingBank_pOutcome$duration, breaks = 100)

Insight: In general, the last contact duration is < 400 seconds or < 7 minutes.

4.4.2 Number of contacts made during the campaign

boxplot(marketingBank_pOutcome$campaign, horizontal = T)

Insight: In general, the number of contacts made during the campaign is 1-2 times for each client

4.4.3 Time passed since the last contact of the previous campaign for each client

hist(marketingBank_pOutcome$pdays, breaks = 50)

Insight: In general, the number of days passed since the last contact of the previous campaign for each client is around 100-380 days.

4.4.4 Number of contacts made before the current campaign

hist(marketingBank_pOutcome$previous, breaks = 100)

Insight: In general, the number of contacts made before the current campaign for each client ranges from 1 to 15 times, with the majority falling between 1 and 5 times.

4.5 EDA by Agregated Table

4.5.1 Housing vs Subscription Status

table(marketingBank_pOutcome$housing, marketingBank_pOutcome$y)

#>      
#>         no  yes
#>   no  1412 1059
#>   yes 3404  537

prop.table(table(marketingBank_pOutcome$housing, marketingBank_pOutcome$y))

#>      
#>               no        yes
#>   no  0.22021210 0.16515908
#>   yes 0.53087960 0.08374922

Insight: Clients who have housing loans tend to not have subscribed yet.

4.5.2 Balance Vs Subscription Status

aggregate(balance~y,
          marketingBank_pOutcome,
          FUN = "mean")

Insight: Clients who have subscribed have a higher average balance.

4.5.3 Marital Vs Subscription Status

table(marketingBank_pOutcome$marital, marketingBank_pOutcome$y)

#>           
#>              no  yes
#>   divorced  560  148
#>   married  2862  893
#>   single   1394  555

prop.table(table(marketingBank_pOutcome$marital, marketingBank_pOutcome$y))

#>           
#>                    no        yes
#>   divorced 0.08733624 0.02308172
#>   married  0.44635059 0.13927012
#>   single   0.21740487 0.08655646

Insight: Clients who are married tend to subscribe more easily.

4.5.4 Education Vs Subscription Status

table(marketingBank_pOutcome$education, marketingBank_pOutcome$y)

#>            
#>               no  yes
#>   primary    638  138
#>   secondary 2549  715
#>   tertiary  1454  660
#>   unknown    175   83

Insight: Subscribers are primarily clients with a secondary education background.

4.5.5 Loan Vs Subscription Status

table(marketingBank_pOutcome$loan, marketingBank_pOutcome$y)

#>      
#>         no  yes
#>   no  4062 1494
#>   yes  754  102

Insight: Subscribers are primarily clients who have no loan.

4.5.6 Default Vs Subscription Status

table(marketingBank_pOutcome$default, marketingBank_pOutcome$y)

#>      
#>         no  yes
#>   no  4778 1593
#>   yes   38    3

Insight: Subscribers are primarily clients who have no default history.

4.5.7 Age Vs Subscription Status

aggregate(age~y,
          marketingBank_pOutcome,
          FUN = "mean")

Insight: The age average of subscribers is 42 years.

4.5.8 Poutcome Vs Subscription Status

table(marketingBank_pOutcome$poutcome, marketingBank_pOutcome$y)

#>          
#>             no  yes
#>   failure 4283  618
#>   other      0    0
#>   success  533  978
#>   unknown    0    0

Insight: The success of the previous campaign greatly influences clients to become subscribers.

4.5.9 Balance Vs Loan

aggregate(balance~loan,
          marketingBank_pOutcome,
          "mean")

Insight: Clients who have no personal loans have an average balance that is almost twice that of those who have personal loans.

4.5.10 Balance Vs Housing

aggregate(balance~housing,
          marketingBank_pOutcome,
          "mean")

Insight: The average balance of clients who do not have housing loans is higher than those who have housing loans.

4.6 Summary insights from EDA by agregated table

The success of the campaign significantly increases the likelihood of customers subscribing.
Dominant client characteristics: -> Aged 42 years -> Married -> Education background: ‘Secondary’ -> No default history -> No housing or personal loans
Most clients in this campaign have not been contacted previously.
On average, the campaign has only been conducted 2-3 times.

5 Explanatory Data Analysis

5.1 How is the trend of the campaign intensity over time?

marketingBank_pOutcome %>% 
  mutate(date = glue("{day}-{month}-2012")) %>% 
  mutate(date = dmy(date)) %>% 
  group_by(date, y) %>% 
  summarise(campaign_freq = length(campaign)) %>% 
  ggplot(aes(x = date, y = campaign_freq)) +
  geom_line(color = "lightblue")+
  geom_point(color = "darkblue")+
  facet_grid(rows = vars(y),
             scales = "free_y") +
  labs(title = "Total Campaigns Over Time in the 2012 Period",
       subtitle = "Differentiated Based on Subscription Status",
       x = NULL,
       y = "Campaign Intensity") +
  theme_light()

Insights:

The campaign intensity for non-subscribers is significantly higher than for subscribers.
The highest average campaign intensity occurs in the months of May and June for both subscribers and non-subscribers.
The campaign intensity for non-subscribers is also notably high during the months of February and March, as well as November and December.

5.2 How is the relationship between the success of the previous campaign and the status of clients becoming subscribers?

Reviewing the ‘marketing_pOutcome’ data frame for the existence of clients who have not received the campaign in the previous period.

marketingBank_pOutcome %>% 
  filter(pdays == -1)

Insight: All clients have received the campaign in the previous period.

poutcome_n_y <- marketingBank_pOutcome %>% 
  group_by(poutcome, y) %>% 
  summarise(freq = length(y)) %>% 
  mutate(proportion = round(freq/sum(freq)*100 ,2))

poutcome_n_y

poutcome_n_y %>% 
  ggplot(aes(x = poutcome, 
             y = freq, 
             fill = y)) +
  geom_col(position = "dodge")+
  scale_fill_grey()+
  scale_color_grey()+
  labs(title = "The Relationship Between Campaign Success and Subscription Status",
       subtitle = "The Previous Campaign Period",
       y = "Frequence",
       x = "Campaing",
       fill = "Subscription Status") +
  theme_minimal()+
  theme(legend.background = element_rect())

Insight:

The success of the campaign significantly increases the likelihood of clients subscribing.
There were more failed campaigns than successful ones in the previous period.

5.3 How is the relationship between the success of the campaign and the number of contacts made in the previous period?

marketingBank_pOutcome %>% 
  ggplot(aes(x = previous, fill = poutcome))+
  geom_histogram(alpha = 0.7)+
  scale_fill_ordinal()+
  labs(title = "Correlation of Contact Intensity with Campaign Success",
       subtitle = "The Previous Campaign Period",
       x = "Contact Intensity",
       y = "Frequence",
       fill = "Campaign Result") +
  theme_minimal()+
  theme(legend.background = element_rect())

Insight:

The intensity of contacts made to clients, whether the campaign was successful or not, in the previous campaign period was the same.

5.4 The Relationship of Client Age with Balance and Previous Campaign Success

marketingBank_pOutcome %>% 
  ggplot(aes(x = balance, y = age)) +
  geom_jitter(aes( color = poutcome))+
  geom_smooth()+
  scale_color_manual(values = c("#003049","#eae2b7"))+
  labs(title = "The Relationship Between Average Annual Balance and Client Age",
       subtitle = "Associated with the Success of the Previous Campaign",
       x = "Average Annual Balance (Euro)",
       y = "Age",
       color = "Previous Campaign Result") +
  theme_minimal() +
  theme(legend.background = element_rect(),legend.position = "bottom")

Insights:

The average annual balance is below 15,000 Euros.
The older the client, the higher the campaign success rate.
There is a weak but positive relationship between the average annual balance and the client’s age.

5.5 How is the relationship between occupation and the success of the previous campaign period, as well as the subscription status of clients?

job_n_y <- marketingBank_pOutcome %>%
  group_by(job, poutcome, y) %>% 
  summarise(freq = length(y)) %>% 
  mutate(proportion = round(freq/sum(freq)*100,2)) 

job_n_y

plot <- job_n_y %>% 
  ggplot(aes(x = freq, y = reorder(job, freq), fill = y))+
  geom_col()+
  facet_grid(rows = vars(y),
             cols = vars(poutcome),
             scales = "free_y") +
  scale_fill_ordinal()+
  geom_point(aes(x = freq),
             color = "green")+
  geom_text(aes(label = freq),
             hjust  = -0.4,
             size = 3,
             color = "darkgreen") +
  labs(title = "Client Occupations Regarding the Success of the Previous Campaign Period",
       subtitle = "Differentiated Based on Subscription Status",
       x = "Frequence",
       y = "Occupation",
       fill = "Subscription Status") +
  theme_light()+
  theme(legend.justification = "center" ,
        legend.background = element_rect(),
        legend.position = "bottom")

plot

Insights:

The occupations of subscribers in the successful category in the previous campaign period are predominantly management, technician, admin, and services.
The occupations of subscribers in the failed category in the previous campaign period are predominantly management, technician, admin, and blue-collar.
The occupations of non-subscribers in the successful category in the previous campaign period are predominantly management, technician, admin, and blue-collar.

5.6 Does the contact method affect the campaign results and subscription status?

agg_plot5 <- marketingBank_pOutcome %>% 
  select(duration, campaign, contact, poutcome,y) %>% 
  group_by(contact, poutcome, y) %>% 
  summarise(count_y = length(y))%>% 
  mutate(poutcome_y = glue("{poutcome}_{y}"))

agg_plot5

agg_plot5 %>% 
  ggplot(aes(x = contact, 
             y = count_y, 
             fill = poutcome_y)) + 
  geom_col(position = "dodge")+
  scale_fill_manual(values = c("#003049","#f7cdb2", "#2d5745", "#f7d168"))+
  facet_wrap(vars(contact),
             scales = "free_x")+
  scale_x_discrete(labels = NULL)+
  labs(title = "The Relationship Between the Success of the Previous Campaign and the Subscription Status",
       subtitle = "Differentiated Based on Contact Procedures",
       x = NULL,
       y = "Intensity",
       fill = "Campaign_Subscription\n(result_status)") +
  theme_light()+
  theme(legend.background = element_rect())

Insight:

Generally, the campaigns are conducted via cellular.

5.7 Does the client’s age affect the duration of contact and relate to the campaign outcome?

marketingBank_pOutcome %>% 
  ggplot(aes(y = age, x = duration))+
  geom_jitter(aes(color = poutcome))+
  geom_smooth()+
  scale_color_ordinal()+
  labs(title = "The Relationship Between Contact Duration and Client Age",
       subtitle = "Correlated with the Previous Campaign",
       x = "Contact Duration (Second)",
       y = "Age",
       color = "Campaign Result") +
  theme_minimal()+
  theme(legend.background = element_rect(),legend.position = "bottom")

Insights:

There is a weak positive correlation between the contact duration and the client’s age during the campaign.
The campaign success also correlates positively with both the client’s age and the campaign contact duration.

5.8 How are the characteristics of marital status, educational background, and loans of the clients?

marketingBank_pOutcome %>% 
  filter(education != "unknown") %>% 
  ggplot(aes(x = marital, fill = housing))+
    geom_bar()+
    facet_grid(rows = vars(y),
               cols = vars(education),
               scale = "free")+
  scale_fill_ordinal()+
  #scale_x_discrete(labels = NULL)+
  labs(title = "Customers' Educational Background and Marital Status",
       subtitle = "Housing loans~row: client status (customer/not)",
       x = NULL,
       y = "Frequence",
       fill = "Housing loans") +
  theme_light()+
  theme(legend.background = element_rect())

marketingBank_pOutcome %>% 
  filter(education != "unknown") %>% 
  ggplot(aes(x = marital, fill = loan))+
    geom_bar()+
    facet_grid(rows = vars(y),
               cols = vars(education),
               scale = "free")+
  scale_fill_grey()+
  #scale_x_discrete(labels = NULL)+
  labs(title = "Customer's Educational Background and Marital Status",
       subtitle = "Personal loans~row: client status (customer/not)",
       x = NULL,
       y = "Frequence",
       fill = "Personal Loans") +
  theme_light()+
  theme(legend.background = element_rect())

marketingBank_pOutcome %>% 
  filter(education != "unknown") %>% 
  ggplot(aes(x = marital, fill = default))+
    geom_bar()+
    facet_grid(rows = vars(y),
               cols = vars(education),
               scale = "free")+
  scale_fill_discrete_diverging()+
  #scale_x_discrete(labels = NULL)+
  labs(title = "Customer's Educational Background and Marital Status",
       subtitle = "Default history~row: client status (customer/not)",
       x = NULL,
       y = "Frequence",
       fill = "Default History") +
  theme_light()+
  theme(legend.background = element_rect())

Insights:

Customers’ educational backgrounds are predominantly secondary > tertiary > primary.
Nearly all customers have no default history and no personal loans.
Most customers do not have housing loans, but among customers with a secondary educational background and married status, almost 50% have housing loans.
The likelihood of customers having housing loans is approximately 25-45% across all educational backgrounds and married statuses

6 Conclusion and Recommendation

6.1 Conclusion

The success of the campaign is closely related to the status of the client to become a customer. More than 70% of the previous period’s campaigns failed. The campaign needs to target certain client groups to save costs while increasing success.

6.2 Recommendation

Here are some recommendations for prioritizing client characteristics to be contacted for the next campaign period for a more effective campaign.

No default history
No personal and housing loans
Occupations: ‘management’, ‘technician’, ‘admin’, and ‘services’
Marital status: married or single
Aged 25 and above.

Bank Marketing Analysis

Ahmad Sirrullah

2023-06-11

1 Project Background

2 Business Objective

3 Dataset

4 Exploratory Data Analysis

4.1 Data Type Inspection

4.2 Checking missing value

4.3 Data Summary

4.4 EDA by Plot

4.4.1 Last contact duration

4.4.2 Number of contacts made during the campaign

4.4.3 Time passed since the last contact of the previous campaign for each client

4.4.4 Number of contacts made before the current campaign

4.5 EDA by Agregated Table

4.5.1 Housing vs Subscription Status

4.5.2 Balance Vs Subscription Status

4.5.3 Marital Vs Subscription Status

4.5.4 Education Vs Subscription Status

4.5.5 Loan Vs Subscription Status

4.5.6 Default Vs Subscription Status

4.5.7 Age Vs Subscription Status

4.5.8 Poutcome Vs Subscription Status

4.5.9 Balance Vs Loan

4.5.10 Balance Vs Housing

4.6 Summary insights from EDA by agregated table

5 Explanatory Data Analysis

5.1 How is the trend of the campaign intensity over time?

5.2 How is the relationship between the success of the previous campaign and the status of clients becoming subscribers?

5.3 How is the relationship between the success of the campaign and the number of contacts made in the previous period?

5.4 The Relationship of Client Age with Balance and Previous Campaign Success

5.5 How is the relationship between occupation and the success of the previous campaign period, as well as the subscription status of clients?

5.6 Does the contact method affect the campaign results and subscription status?

5.7 Does the client’s age affect the duration of contact and relate to the campaign outcome?

5.8 How are the characteristics of marital status, educational background, and loans of the clients?

6 Conclusion and Recommendation

6.1 Conclusion

6.2 Recommendation