1 Project Background

Campaigns are one way for a company to grow both the number and the quality of its customers, and a campaign is effective only when it understands its target audience well. The company needs to identify the clients who actually need its products or services, so that well-targeted campaigns can increase its revenue.

In 2012, a Portuguese bank launched a campaign to promote its services. The bank tracked and summarized the characteristics of its clients and assessed the effectiveness of the campaign, aiming to make later campaigns more efficient and to reduce capital expenditure in the following periods.

Currently, the bank is planning a new campaign and is seeking assistance from data analysts to provide recommendations on the types of clients to target in the campaign.

2 Business Objective

Identify the characteristics of clients from the campaign history and recommend which clients are most likely to respond to the campaign.

Benefit: a more effective campaign with a higher success rate.

3 Dataset

The data comes from the UCI Machine Learning Repository and was provided by Moro, S., Rita, P., and Cortez, P. (2012) under the title Bank Marketing, https://doi.org/10.24432/C5K306.

The data relates to the direct marketing campaigns of a Portuguese banking institution. The marketing campaigns were based on phone calls. Often, more than one contact with the same client was required in order to assess whether the product (a bank term deposit) would be subscribed (‘yes’) or not (‘no’).

Here is the data.

Here is the description of each attribute.

1 age (numeric)

2 job : type of job (categorical: “admin.”,“unknown”,“unemployed”,“management”,“housemaid”,“entrepreneur”,“student”, “blue-collar”,“self-employed”,“retired”,“technician”,“services”)

3 marital : marital status (categorical: “married”,“divorced”,“single”; note: “divorced” means divorced or widowed)

4 education (categorical: “unknown”,“secondary”,“primary”,“tertiary”)

5 default: has credit in default? (binary: “yes”,“no”)

6 balance: average yearly balance, in euros (numeric)

7 housing: has housing loan? (binary: “yes”,“no”)

8 loan: has personal loan? (binary: “yes”,“no”)

# related to the last contact of the current campaign:

9 contact: contact communication type (categorical: “unknown”,“telephone”,“cellular”)

10 day: last contact day of the month (numeric)

11 month: last contact month of year (categorical: “jan”, “feb”, “mar”, …, “nov”, “dec”)

12 duration: last contact duration, in seconds (numeric)

# other attributes:

13 campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)

14 pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric, -1 means client was not previously contacted)

15 previous: number of contacts performed before this campaign and for this client (numeric)

16 poutcome: outcome of the previous marketing campaign (categorical: “unknown”,“other”,“failure”,“success”)

Output variable (desired target):

17 y - has the client subscribed to a term deposit? (binary: “yes”,“no”)

4 Exploratory Data Analysis

Let’s explore our data and assist the bank.

4.1 Data Type Inspection

First, we check the data types and identify columns that are stored with an inappropriate type.

#> Rows: 45,211
#> Columns: 17
#> $ age       <int> 58, 44, 33, 47, 33, 35, 28, 42, 58, 43, 41, 29, 53, 58, 57, ~
#> $ job       <chr> "management", "technician", "entrepreneur", "blue-collar", "~
#> $ marital   <chr> "married", "single", "married", "married", "single", "marrie~
#> $ education <chr> "tertiary", "secondary", "secondary", "unknown", "unknown", ~
#> $ default   <chr> "no", "no", "no", "no", "no", "no", "no", "yes", "no", "no",~
#> $ balance   <int> 2143, 29, 2, 1506, 1, 231, 447, 2, 121, 593, 270, 390, 6, 71~
#> $ housing   <chr> "yes", "yes", "yes", "yes", "no", "yes", "yes", "yes", "yes"~
#> $ loan      <chr> "no", "no", "yes", "no", "no", "no", "yes", "no", "no", "no"~
#> $ contact   <chr> "unknown", "unknown", "unknown", "unknown", "unknown", "unkn~
#> $ day       <int> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, ~
#> $ month     <chr> "may", "may", "may", "may", "may", "may", "may", "may", "may~
#> $ duration  <int> 261, 151, 76, 92, 198, 139, 217, 380, 50, 55, 222, 137, 517,~
#> $ campaign  <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ~
#> $ pdays     <int> -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, ~
#> $ previous  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
#> $ poutcome  <chr> "unknown", "unknown", "unknown", "unknown", "unknown", "unkn~
#> $ y         <chr> "no", "no", "no", "no", "no", "no", "no", "no", "no", "no", ~

Comparing this output with the data description shows that several columns are stored as character even though they are categorical. These columns should be converted to factors:

  • as.factor: job, marital, education, contact, month, poutcome, default, housing, loan, y.

So, let’s convert the data types.

#> Rows: 45,211
#> Columns: 17
#> $ age       <int> 58, 44, 33, 47, 33, 35, 28, 42, 58, 43, 41, 29, 53, 58, 57, ~
#> $ job       <fct> management, technician, entrepreneur, blue-collar, unknown, ~
#> $ marital   <fct> married, single, married, married, single, married, single, ~
#> $ education <fct> tertiary, secondary, secondary, unknown, unknown, tertiary, ~
#> $ default   <fct> no, no, no, no, no, no, no, yes, no, no, no, no, no, no, no,~
#> $ balance   <int> 2143, 29, 2, 1506, 1, 231, 447, 2, 121, 593, 270, 390, 6, 71~
#> $ housing   <fct> yes, yes, yes, yes, no, yes, yes, yes, yes, yes, yes, yes, y~
#> $ loan      <fct> no, no, yes, no, no, no, yes, no, no, no, no, no, no, no, no~
#> $ contact   <fct> unknown, unknown, unknown, unknown, unknown, unknown, unknow~
#> $ day       <int> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, ~
#> $ month     <fct> may, may, may, may, may, may, may, may, may, may, may, may, ~
#> $ duration  <int> 261, 151, 76, 92, 198, 139, 217, 380, 50, 55, 222, 137, 517,~
#> $ campaign  <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ~
#> $ pdays     <int> -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, ~
#> $ previous  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
#> $ poutcome  <fct> unknown, unknown, unknown, unknown, unknown, unknown, unknow~
#> $ y         <fct> no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, ~
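The conversion can be sketched as follows. The report does not show its code, so the data frame name `marketing`, the three invented rows, and the shortened column list are all assumptions made for illustration; with the real data, `factor_cols` would hold the ten columns listed above.

```r
# Toy stand-in for the bank data. `marketing` is an assumed name and the
# rows are made up purely for illustration.
marketing <- data.frame(
  age     = c(58L, 44L, 33L),
  job     = c("management", "technician", "entrepreneur"),
  marital = c("married", "single", "married"),
  y       = c("no", "no", "yes"),
  stringsAsFactors = FALSE
)

# Columns that hold categories rather than free text.
factor_cols <- c("job", "marital", "y")

# Convert each listed column from character to factor in one step.
marketing[factor_cols] <- lapply(marketing[factor_cols], as.factor)

# Inspect the resulting column types.
str(marketing)
```

Converting through `lapply()` keeps the numeric columns untouched and avoids repeating `as.factor()` once per column.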

4.2 Checking Missing Values

Next, we need to check for missing values in our data. Missing values can introduce bias into our subsequent analysis.

#> [1] FALSE

Upon inspection, it appears that there are no missing values in our data.
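A single `FALSE` is what a whole-table check such as `anyNA()` returns, although the report does not show which call was used, so that is an assumption. A minimal sketch on an invented data frame (`toy` is a hypothetical name) that contains one missing value on purpose:

```r
# Toy data frame with one deliberately missing value (made-up values).
toy <- data.frame(age = c(30L, NA, 45L), y = c("no", "yes", "no"))

# TRUE if any cell in the data frame is NA.
has_missing <- anyNA(toy)

# Per-column count of missing values, useful for locating them.
missing_per_col <- colSums(is.na(toy))

print(has_missing)
print(missing_per_col)
```

Note that this kind of check only detects `NA`. Categories such as “unknown” in job, education, contact, and poutcome are ordinary values as far as R is concerned, so they are not counted as missing.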

4.3 Data Summary

To gain a basic insight into our data, we can use the summary() function, which summarizes each column.

#> [1] 45211    17
#>       age                 job           marital          education    
#>  Min.   :18.00   blue-collar:9732   divorced: 5207   primary  : 6851  
#>  1st Qu.:33.00   management :9458   married :27214   secondary:23202  
#>  Median :39.00   technician :7597   single  :12790   tertiary :13301  
#>  Mean   :40.94   admin.     :5171                    unknown  : 1857  
#>  3rd Qu.:48.00   services   :4154                                     
#>  Max.   :95.00   retired    :2264                                     
#>                  (Other)    :6835                                     
#>  default        balance       housing      loan            contact     
#>  no :44396   Min.   : -8019   no :20081   no :37967   cellular :29285  
#>  yes:  815   1st Qu.:    72   yes:25130   yes: 7244   telephone: 2906  
#>              Median :   448                           unknown  :13020  
#>              Mean   :  1362                                            
#>              3rd Qu.:  1428                                            
#>              Max.   :102127                                            
#>                                                                        
#>       day            month          duration         campaign     
#>  Min.   : 1.00   may    :13766   Min.   :   0.0   Min.   : 1.000  
#>  1st Qu.: 8.00   jul    : 6895   1st Qu.: 103.0   1st Qu.: 1.000  
#>  Median :16.00   aug    : 6247   Median : 180.0   Median : 2.000  
#>  Mean   :15.81   jun    : 5341   Mean   : 258.2   Mean   : 2.764  
#>  3rd Qu.:21.00   nov    : 3970   3rd Qu.: 319.0   3rd Qu.: 3.000  
#>  Max.   :31.00   apr    : 2932   Max.   :4918.0   Max.   :63.000  
#>                  (Other): 6060                                    
#>      pdays          previous           poutcome       y        
#>  Min.   : -1.0   Min.   :  0.0000   failure: 4901   no :39922  
#>  1st Qu.: -1.0   1st Qu.:  0.0000   other  : 1840   yes: 5289  
#>  Median : -1.0   Median :  0.0000   success: 1511              
#>  Mean   : 40.2   Mean   :  0.5803   unknown:36959              
#>  3rd Qu.: -1.0   3rd Qu.:  0.0000                              
#>  Max.   :871.0   Max.   :275.0000                              
#> 

The summary() results yield two kinds of insights. The primary insights are those connected with the campaign’s objectives or targets, while the secondary insights are those that relate to the clients’ characteristics.

Primary Insights:

  • The poutcome column shows that 4,901 previous campaign outcomes were failures, while only 1,511 were successes. The remainder is labeled ‘other’ (1,840) or ‘unknown’ (36,959).
  • The ‘y’ column shows that 39,922 of the 45,211 clients who received the campaign have not subscribed to a term deposit, while 5,289 have.

Secondary Insights:

  • The average age of clients is about 41 years, and the majority of them are married.
  • The top three client occupations are ‘blue-collar’, ‘management’, and ‘technician’.
  • Most clients have no credit in default (44,396) and no personal loan (37,967), while slightly more than half (25,130) have a housing loan.
  • The most common educational background among clients is ‘secondary’.
  • Most clients are contacted through ‘cellular’.

Based on the previous data exploration, we understand that we need to clean and prepare our data according to the business objectives.

Wrangling based on campaign’s objective

In the poutcome column, which represents the outcome of the previous marketing campaign (categorized as ‘unknown’, ‘other’, ‘failure’, ‘success’), we will focus on the ‘failure’ and ‘success’ outcomes. This approach is aimed at ensuring accurate analysis and making accountable decisions.
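A minimal sketch of this filter, assuming the full data frame is called `marketing` (a hypothetical name) and using five invented rows. The result name `marketing_pOutcome` follows the data frame referred to later in this report.

```r
# Toy stand-in for the bank data. `marketing` is an assumed name and the
# rows are invented.
marketing <- data.frame(
  poutcome = factor(c("unknown", "failure", "success", "other", "failure"),
                    levels = c("failure", "other", "success", "unknown")),
  y        = factor(c("no", "no", "yes", "no", "yes"))
)

# Keep only rows whose previous-campaign outcome is a clear failure or success.
keep <- marketing$poutcome %in% c("failure", "success")
marketing_pOutcome <- marketing[keep, ]

# Unused levels are kept here, so tables still list them with zero counts.
table(marketing_pOutcome$poutcome)
```

On the real data this filter would keep 4,901 + 1,511 = 6,412 of the 45,211 rows, according to the poutcome counts in the summary above. Calling `droplevels()` afterwards would remove the empty ‘other’ and ‘unknown’ levels if they are not wanted in later tables.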

4.4 EDA by Plot

4.4.1 Last contact duration

Insight: In general, the last contact lasts less than 400 seconds (just under 7 minutes).

4.4.2 Number of contacts made during the campaign

Insight: In general, each client is contacted 1-2 times during the campaign.

4.4.3 Time passed since the last contact of the previous campaign for each client

Insight: In general, the number of days passed since the last contact of the previous campaign for each client is around 100-380 days.

4.4.4 Number of contacts made before the current campaign

Insight: In general, the number of contacts made before the current campaign for each client ranges from 1 to 15 times, with the majority falling between 1 and 5 times.

4.5 EDA by Aggregated Table

4.5.1 Housing vs Subscription Status

#>      
#>         no  yes
#>   no  1412 1059
#>   yes 3404  537
#>      
#>               no        yes
#>   no  0.22021210 0.16515908
#>   yes 0.53087960 0.08374922

Insight: Clients with a housing loan are much less likely to have subscribed: 537 of 3,941 (about 14%), compared with 1,059 of 2,471 (about 43%) of clients without one.
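The second table above divides every cell by the grand total, which makes the two housing groups hard to compare directly. A small sketch, rebuilding the table from the printed counts, shows how `prop.table()` with `margin = 1` gives the share within each housing group instead. The object names are hypothetical.

```r
# Counts copied from the table above: rows are `housing`, columns are `y`.
# matrix() fills column by column, so each pair below is one `y` column.
housing_vs_y <- as.table(matrix(
  c(1412, 3404,   # y = "no":  housing no, housing yes
    1059,  537),  # y = "yes": housing no, housing yes
  nrow = 2,
  dimnames = list(housing = c("no", "yes"), y = c("no", "yes"))
))

# Share of all clients in each cell (same layout as the second table above).
overall_share <- prop.table(housing_vs_y)

# Share within each housing group, so that every row sums to 1.
rate_by_housing <- prop.table(housing_vs_y, margin = 1)

print(round(rate_by_housing, 3))
```

Row-wise proportions answer the question "of the clients in this group, what fraction subscribed?", which is usually the comparison of interest when choosing which clients to target.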

4.5.2 Balance Vs Subscription Status

Insight: Clients who have subscribed have a higher average balance.
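A grouped average of this kind can be computed with `tapply()`. The sketch below uses invented balances in a hypothetical data frame `toy`, so the numbers only illustrate the mechanics and are not results from the bank data.

```r
# Toy data. Values are invented and only illustrate the mechanics.
toy <- data.frame(
  balance = c(100, 300, 500, 1500, 2500),
  y       = factor(c("no", "no", "no", "yes", "yes"))
)

# Mean balance within each subscription status.
mean_balance <- tapply(toy$balance, toy$y, mean)

# Median balance, which is less sensitive to a few very large balances.
median_balance <- tapply(toy$balance, toy$y, median)

print(mean_balance)
print(median_balance)
```

Because balance is strongly right-skewed in the full data (median 448 against a maximum of 102,127 in the summary above), it may be worth reporting the median alongside the mean.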

4.5.3 Marital Vs Subscription Status

#>           
#>              no  yes
#>   divorced  560  148
#>   married  2862  893
#>   single   1394  555
#>           
#>                    no        yes
#>   divorced 0.08733624 0.02308172
#>   married  0.44635059 0.13927012
#>   single   0.21740487 0.08655646

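Insight: Married clients form the largest group of subscribers (893), although single clients subscribe at a higher rate (555 of 1,949, about 28%) than married clients (893 of 3,755, about 24%).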

4.5.4 Education Vs Subscription Status

#>            
#>               no  yes
#>   primary    638  138
#>   secondary 2549  715
#>   tertiary  1454  660
#>   unknown    175   83

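Insight: Clients with a secondary education form the largest group of subscribers (715), although the subscription rate is higher among clients with a tertiary education (660 of 2,114, about 31%) than among those with a secondary education (715 of 3,264, about 22%).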

4.5.5 Loan Vs Subscription Status

#>      
#>         no  yes
#>   no  4062 1494
#>   yes  754  102

Insight: Most subscribers (1,494 of 1,596) have no personal loan.

4.5.6 Default Vs Subscription Status

#>      
#>         no  yes
#>   no  4778 1593
#>   yes   38    3

Insight: Nearly all subscribers (1,593 of 1,596) have no credit in default.

4.5.7 Age Vs Subscription Status

Insight: The average age of subscribers is 42 years.

4.5.8 Poutcome Vs Subscription Status

#>          
#>             no  yes
#>   failure 4283  618
#>   other      0    0
#>   success  533  978
#>   unknown    0    0

Insight: Clients for whom the previous campaign succeeded are far more likely to subscribe: 978 of 1,511 (about 65%), compared with 618 of 4,901 (about 13%) of clients for whom it failed.

4.5.9 Balance Vs Loan

Insight: Clients who have no personal loans have an average balance that is almost twice that of those who have personal loans.

4.5.10 Balance Vs Housing

Insight: The average balance of clients who do not have housing loans is higher than those who have housing loans.

4.6 Summary insights from EDA by aggregated table

  • The success of the previous campaign significantly increases the likelihood of customers subscribing.
  • Dominant subscriber characteristics: aged about 42 years, married, ‘secondary’ education background, no default history, and no housing or personal loans.
  • Most clients in this campaign have not been contacted previously.
  • On average, each client has been contacted only 2-3 times during the campaign.

5 Explanatory Data Analysis

5.1 What is the trend of the campaign intensity over time?

Insights:

  • The campaign intensity for non-subscribers is significantly higher than for subscribers.
  • The highest average campaign intensity occurs in the months of May and June for both subscribers and non-subscribers.
  • The campaign intensity for non-subscribers is also notably high during the months of February and March, as well as November and December.
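One way to compute such a trend is to average the `campaign` column per month and subscription status. This assumes that "campaign intensity" means the mean number of contacts per client, which the report does not state explicitly. The data frame `toy` and its values are invented.

```r
# Toy data. Values are invented and only illustrate the mechanics.
toy <- data.frame(
  month    = factor(c("may", "may", "may", "jun", "jun", "jun"),
                    levels = c("may", "jun")),  # calendar order, not alphabetical
  y        = factor(c("no", "no", "yes", "no", "yes", "yes")),
  campaign = c(4, 2, 1, 3, 2, 2)
)

# Mean number of contacts per client for every month and subscription status.
intensity <- aggregate(campaign ~ month + y, data = toy, FUN = mean)

print(intensity)
```

Setting the `month` levels in calendar order matters for any plot over time, because `as.factor()` orders the levels alphabetically by default.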

5.2 What is the relationship between the success of the previous campaign and the status of clients becoming subscribers?

We first review the ‘marketing_pOutcome’ data frame for clients who did not receive the campaign in the previous period.

Insight: All clients have received the campaign in the previous period.

Insight:

  • The success of the previous campaign significantly increases the likelihood of clients subscribing.
  • There were more failed campaigns than successful ones in the previous period.

5.3 What is the relationship between the success of the campaign and the number of contacts made in the previous period?

Insight:

  • Clients were contacted with the same intensity in the previous campaign period, regardless of whether that campaign succeeded or failed.

5.5 What is the relationship between occupation and the success of the previous campaign period, as well as the subscription status of clients?

Insights:

  • The occupations of subscribers in the successful category in the previous campaign period are predominantly management, technician, admin, and services.
  • The occupations of subscribers in the failed category in the previous campaign period are predominantly management, technician, admin, and blue-collar.
  • The occupations of non-subscribers in the successful category in the previous campaign period are predominantly management, technician, admin, and blue-collar.

5.6 Does the contact method affect the campaign results and subscription status?

Insight:

  • Generally, the campaigns are conducted via cellular.

5.7 Does the client’s age affect the duration of contact and relate to the campaign outcome?

Insights:

  • There is a weak positive correlation between the contact duration and the client’s age during the campaign.
  • The campaign success also correlates positively with both the client’s age and the campaign contact duration.
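A correlation of this kind can be measured with `cor()`. The report does not say which coefficient was used, so the sketch below shows both Pearson and Spearman on invented values in a hypothetical data frame `toy`.

```r
# Toy data. Values are invented and only illustrate the mechanics.
toy <- data.frame(
  age      = c(25, 35, 45, 55, 65),
  duration = c(120, 90, 200, 150, 260)
)

# Pearson correlation coefficient between age and contact duration.
r <- cor(toy$age, toy$duration)

# Rank-based alternative, less sensitive to a few very long calls.
rho <- cor(toy$age, toy$duration, method = "spearman")

print(c(pearson = r, spearman = rho))
```

A coefficient close to 0 indicates a weak linear relationship, and a correlation on its own does not show that age causes longer calls or a successful campaign.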

5.8 What are the marital status, educational background, and loan characteristics of the clients?

Insights:

  • Customers’ educational backgrounds are predominantly secondary > tertiary > primary.
  • Nearly all customers have no default history and no personal loans.
  • Most customers do not have housing loans, but among customers with a secondary educational background and married status, almost 50% have housing loans.
  • The likelihood of customers having housing loans is approximately 25-45% across all educational backgrounds and marital statuses.

6 Conclusion and Recommendation

6.1 Conclusion

Whether a client becomes a subscriber is closely related to the outcome of the previous campaign. More than 70% of the previous period’s campaigns with a recorded failure or success outcome failed (4,901 of 6,412). The campaign needs to target specific client groups to save costs while increasing its success rate.

6.2 Recommendation

For a more effective campaign, we recommend prioritizing clients with the following characteristics in the next campaign period.

  • No default history
  • No personal or housing loans
  • Occupations: ‘management’, ‘technician’, ‘admin.’, and ‘services’
  • Marital status: married or single
  • Aged 25 and above.