Introduction

In this project, our group explored the Portuguese Bank Marketing Data Set, which contains information on the bank’s telemarketing campaign for term deposits. We performed exploratory data analysis, built a predictive model, developed several strategies for increasing the subscription rate based on the analysis results, and checked our suggestions with what-if modelling. This report summarises the essential results of our work.

Exploratory data analysis

Exploratory Data Analysis (EDA) is the first step of working with a dataset; it is aimed at understanding the data itself and the current situation. First, we checked how many people agreed to subscribe to a term deposit during the bank’s campaign and found that only 11.4% of the bank’s clients did so. To understand the reasons behind this rate, we explored every variable of the dataset to reveal the groups of clients with the highest and the lowest subscription rates. The graphs and remarks below present what we consider the most important findings at this stage of the analysis.

Client’s age

This graph shows the age distribution among bank customers. Blue dashed lines divide the customers into three groups: from 0 to 30, from 30 to 60 and from 60 to 100, that is, young, middle-aged and elderly people. The graph makes it clear that the bank’s main “audience” is middle-aged people from 30 to 60; however, as our further analysis showed, this age group has the lowest subscription rate of the three: only 10%. Young people have a slightly higher rate: 15% of them subscribed to the term deposit. In contrast, the smallest age group, elderly people, has the highest subscription rate, reaching almost 50%, which makes this age group very valuable for such campaigns.
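The grouping behind these rates can be sketched in a few lines of Python. This is only an illustration, not the code used for the report: the function names are hypothetical, the handful of rows is invented, and the treatment of the boundary ages 30 and 60 is an assumption, because the text does not say which group they belong to.

```python
from collections import defaultdict


def age_group(age):
    """Map an age in years to one of the report's three age groups.

    Assumption: the boundary ages 30 and 60 are placed in the older group;
    the report does not specify which side they fall on.
    """
    if age < 30:
        return "young"
    if age < 60:
        return "middle-aged"
    return "elderly"


def subscription_rate_by_group(clients):
    """Return {group: share of subscribers} for (age, subscribed) pairs."""
    totals = defaultdict(int)
    subscribed = defaultdict(int)
    for age, did_subscribe in clients:
        group = age_group(age)
        totals[group] += 1
        subscribed[group] += int(did_subscribe)
    return {group: subscribed[group] / totals[group] for group in totals}


# Invented rows for illustration only; they are NOT taken from the data set.
toy_clients = [
    (25, True), (28, False),
    (35, False), (45, False), (52, True), (58, False),
    (64, True), (71, False),
]
rates = subscription_rate_by_group(toy_clients)
```

Applied to the full data set, the same grouping would yield the per-group rates quoted above.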

The importance of the time of the year

While exploring this variable we noticed an interesting feature: the month with the largest number of calls has the worst subscription rate, while months with very few calls have the best subscription rates. Based on this observation we concluded that the best time of the year for the bank’s telemarketing campaign about term deposits is the whole of autumn plus December and March.

Response from the previous campaign

It may seem self-evident, but we checked whether people who agreed to subscribe to a term deposit during the last campaign are likely to subscribe again, and it turned out that they are. This parameter is also highly significant for the further analysis.

Days from the previous campaign

The results of this part of the analysis are quite interesting. It was a surprise that the highest subscription rate appeared so clearly among people who were contacted 75-100 days and 175-200 days after the previous campaign. These seem to be the best gaps for re-contacting a client.

Number of contacts with client

Nobody likes overly assertive offers, and the bank’s clients are no exception. None of the clients agreed to subscribe to a term deposit after being contacted 10 or more times. From this we can conclude that contacting a client many times is not the best idea, and that the maximum number of contacts that still brings at least some profit is 8.

Client’s balance

First, it is worth mentioning that “Dolphins”, “Middle fish” and “Plankton” in this context are not ocean inhabitants but the names of client groups divided by their current balance, following business tradition. Dolphins are those whose balance is higher than 2254 dollars, Middle fish are those whose balance is higher than 327 dollars but not higher than 2254 dollars, and Plankton are the rest (this clustering is justified by the results of our analysis). What is very interesting in this part of the analysis is that the subscription rate of a group increases as its balance increases.
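As a small illustration, the segmentation rule can be written as a function. The two thresholds come from the text above; the function and constant names are hypothetical, and the sketch is not taken from the original analysis code.

```python
# Thresholds quoted in the report (in the report's currency units).
DOLPHIN_MIN = 2254
MIDDLE_FISH_MIN = 327


def balance_segment(balance):
    """Return the client segment for a given current balance."""
    if balance > DOLPHIN_MIN:
        return "Dolphin"
    if balance > MIDDLE_FISH_MIN:
        return "Middle fish"
    return "Plankton"  # everyone else, including negative balances
```

Because both comparisons are strict, a balance of exactly 2254 is a Middle fish and a balance of exactly 327 is Plankton, which matches the “higher than” wording of the definitions.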

EDA Results

To summarise all the text above, we put forward several hypotheses:

  • Older people have the highest subscription rate of all the age groups, so in order to increase the subscription rate the bank should target the elderly.
  • It is better to avoid multiple contacts with one client, since none of the clients agreed to subscribe to a term deposit after being contacted many times (10 or more).
  • The best time to re-contact a familiar client during a new campaign is 75-100 days or 175-200 days after the previous campaign.
  • People who were pre-contacted 1-3 times before the campaign are more likely to subscribe to a term deposit than those who were not pre-contacted at all.
  • The best time for campaign activity is autumn plus December and March, while the worst month is May.

Now, to the next part of our analysis!

Making predictions about subscriptions

What Tasks Will Be Performed

During the analysis, we created a model that predicts whether a person will subscribe or not. It makes a correct prediction in almost 91% of cases, which is considered very accurate.

  • You should NOT use it for identifying people who will agree to subscribe. It performs poorly at this.
  • You should use it for identifying people who will refuse to subscribe.

Although the model is not suited to identifying subscribers, this drawback does not seem important for your company. If the company does not spend resources on attracting clients but they still subscribe to the service, you lose nothing.

In contrast, from the point of view of monetary gains and losses, it is much more costly to waste resources on working with clients who will not subscribe anyway. In fact, this model will help to save the company’s money, since it identifies non-subscribers in 97% of cases.

So, our goal is not only to improve your subscription rates, but also to save you from wasting money on non-profitable actions.
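A short sketch can show how a model can be right in about 91% of cases overall and still be weak at finding subscribers when only about one client in nine subscribes. The counts below are invented so that they roughly reproduce the 11.4%, 91% and 97% figures quoted in this report; they are not the model’s actual confusion matrix, and the subscriber recall they produce is a by-product of the invented numbers, not a reported result. The sketch also assumes that the 97% figure means the share of actual non-subscribers that the model flags correctly.

```python
def classification_summary(tp, fp, tn, fn):
    """Summarise a binary confusion matrix.

    tp/fn count actual subscribers predicted as subscribers/non-subscribers,
    tn/fp count actual non-subscribers predicted as non-subscribers/subscribers.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "non_subscriber_recall": tn / (tn + fp),
        "subscriber_recall": tp / (tp + fn),
    }


# Hypothetical counts for 1000 clients, 114 of whom subscribe (11.4%).
summary = classification_summary(tp=50, fp=26, tn=860, fn=64)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

With a target this imbalanced, overall accuracy is dominated by the non-subscribers, which is why the model is recommended only for screening out likely refusals.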

Describing subscription process

From the tree, we can track the subscription process in its initial stages. Here is a description of several combinations of factors that can lead to a positive outcome, that is, a person subscribing to the service.

  • At the first stage, the outcome of the campaign depends on the duration of the last contact with a person. If it lasted more than 8.4 seconds, then in 43% of cases the person would agree to subscribe.
    • If the duration was more than 14 seconds, the person would agree to subscribe in 59% of cases.
    • If not, but the person subscribed during the last campaign, they would subscribe in this one in 83% of cases.
  • If the last call’s duration was less than 8.4 seconds but more than 2.7 seconds, and the person subscribed during the previous campaign, they would subscribe again in 73% of cases.
  • If the duration of the last call was less than 8.4 seconds and the person did not subscribe during the previous campaign, but they were contacted in March, September, October or December with a call duration of more than 3.1 seconds, 58% of such people would subscribe.
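The branches listed above can be written as nested conditions. The following is a hedged sketch that encodes only the paths quoted in this section and returns `None` for any path the report does not describe; the function name, argument names and three-letter month labels are hypothetical, durations use the same units as the text, and a duration of exactly 8.4 is assigned to the lower branch as an assumption.

```python
HIGH_RATE_MONTHS = {"mar", "sep", "oct", "dec"}


def described_subscription_share(duration, previous_success, month):
    """Return the subscriber share the report quotes for this path, or None."""
    if duration > 8.4:
        # The report gives 43% for this whole branch before it splits further.
        if duration > 14:
            return 0.59
        if previous_success:
            return 0.83
        return None  # no more specific figure is given for this leaf
    if previous_success:
        return 0.73 if duration > 2.7 else None
    if month in HIGH_RATE_MONTHS and duration > 3.1:
        return 0.58
    return None


print(described_subscription_share(10, True, "may"))   # previous subscriber
print(described_subscription_share(5, False, "oct"))   # favourable month
```

Writing the rules this way makes it easy to see which combinations of call duration, previous outcome and month the tree singles out, and which combinations are left without a quoted figure.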

To sum up, if your company applies one of the presented scenarios in its work, it should be possible to increase subscription rates both in this and in future campaigns.

Looking at individual subscription cases

Now, let’s have a closer look at how combinations of factors can influence the outcome of the subscription process in individual cases. Here we have two cases: a refusal to subscribe (case 1) and a subscription (case 2). Green bars show that the given factor increases the chance that a person will refuse to subscribe (for case 1) or will agree to subscribe (for case 2). Red bars show that the given factor decreases the chance that a person will refuse to subscribe (for case 1) or will agree to subscribe (for case 2).

  • For case 1, a person with these factors will refuse to subscribe in 83 out of 100 cases.
    • The largest bar is red and refers to the duration of the last call, which was less than 5.32 seconds. This is the main factor that lowers the chance that the person will refuse to subscribe.
    • The two most important factors that increase the chance of a refusal are having a housing loan and an unknown result of the previous campaign.
    • The other factors that lower the chance of a refusal are the absence of a personal loan and being divorced.
    • The other factors that increase the chance of a refusal are having a credit in default and working in an admin job.
  • For case 2, a person with these factors will agree to subscribe in 51 out of 100 cases.
    • The largest bar is green and refers to the duration of the last call, which was less than 5.32 seconds. This is the main factor that increases the chance that the person will subscribe.
    • The two most important factors that decrease the chance of a subscription are having a housing loan and an unknown result of the previous campaign.
    • Another factor that increases the chance of a subscription is the absence of a personal loan.
    • The other factors that decrease the chance of a subscription are having a credit in default, being married and being younger than 48.
  • So, this is how these factors, occurring both separately and together, influence the outcome: whether a person will subscribe or not.

Identifying key factors for subscription

Next, let’s move from individual cases to general patterns. We will identify the most important factors that influence the subscription result in general.

We can evaluate how significant each variable is in determining whether a person will subscribe or not.

  • The most important factor is the duration of the last call.
  • Factors such as the last contact month, poutcome (the outcome of the previous marketing campaign) and housing (having a housing loan) also have a high influence on the outcome of the campaign.
  • It turns out that having a credit default has no influence on the outcome.
  • Moreover, we should think twice before including factors such as a person’s education level and the number of contacts performed during this campaign for this client, since they show low importance for determining whether a person will subscribe or not.
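The report does not state how these importances were computed. One common way to obtain such a ranking is permutation importance: shuffle one feature column and measure how much the model’s accuracy drops. The sketch below illustrates that technique under stated assumptions and is not necessarily the method used here; the stand-in model, the feature values and all names are invented for the example.

```python
import random


def accuracy(model, rows, labels):
    """Share of rows for which the model's prediction equals the label."""
    hits = sum(model(row) == label for row, label in zip(rows, labels))
    return hits / len(rows)


def permutation_importance(model, rows, labels, feature, seed=0):
    """Drop in accuracy after shuffling the values of one feature."""
    rng = random.Random(seed)
    shuffled = [row[feature] for row in rows]
    rng.shuffle(shuffled)
    permuted = [{**row, feature: value} for row, value in zip(rows, shuffled)]
    return accuracy(model, rows, labels) - accuracy(model, permuted, labels)


def toy_model(row):
    """Hypothetical stand-in model: predicts a subscription for long calls only."""
    return row["duration"] > 8.4


# Invented rows; the labels are generated by the toy model itself.
toy_rows = [
    {"duration": d, "default": flag}
    for d, flag in [(2.0, False), (3.5, True), (9.0, False),
                    (12.0, True), (1.0, False), (15.0, False)]
]
toy_labels = [toy_model(row) for row in toy_rows]

importance_default = permutation_importance(toy_model, toy_rows, toy_labels, "default")
importance_duration = permutation_importance(toy_model, toy_rows, toy_labels, "duration")
```

A feature the model never uses, such as `default` in this toy setup, gets an importance of exactly zero, because shuffling it cannot change any prediction.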

Defining how factors influence each other

Here, we will describe how the factors are interconnected.

We suggest reading the plot from the bottom upwards to better understand its logic.

  • The factor shown at the very bottom of the picture, response_binary, is the outcome we want to predict: whether a person will subscribe or not.
  • Arrows pointing towards response_binary come from the key factors, which have the highest influence on the outcome. Three of them were chosen from the most important features identified earlier: mon (last contact month), dur (duration of the last call) and poutcome (outcome of the previous marketing campaign). In addition, we ourselves selected bal (balance level) as a factor that strongly influences the outcome variable.
  • Arrows pointing towards a key factor indicate that a feature influences that key factor.
    • For balance: the variables that can influence a person’s balance level are connected either with credits or with the job.
    • For poutcome: the outcome of the previous campaign can be influenced by the last contact month. The logic is simple: since the month has an impact on this campaign, it had an impact on the last one, too.

Next, let’s move from left to right and upwards through the chains of factors to explore the arcs between them in more depth.

  • Bal -> response_binary stands for the influence of the balance level on whether a person will subscribe or not (hereafter, the outcome). The higher the balance, the higher the chance that a person will subscribe.
  • Job -> bal stands for the influence of the type of job on the balance level. There is an arc between job and balance because different types of jobs are paid differently.
  • Loan -> bal stands for the influence of having a personal loan on the balance level. If a person has a loan, their balance is reduced by the amount of the loan.
  • Housing -> bal stands for the influence of having a housing loan on the balance level. If a person has a housing loan, their balance is reduced by the amount of the loan.
  • Poutcome -> response_binary stands for the influence of the outcome of the previous marketing campaign on the results of this campaign. If a person agreed to subscribe last time, the chance that they will agree again is higher.
  • Month -> poutcome stands for the influence of the last contact month on the results of the previous campaign. We found that, during the previous campaign, subscription rates in March, September, October and December were much higher than in other months.
  • Month -> response_binary stands for the influence of the last contact month on the outcome. We found that, during this campaign, subscription rates in March, September, October and December are much higher than in other months.
  • Dur -> response_binary stands for the influence of the last contact duration on the outcome. The longer the last contact, the higher the chance that the company managed to convince the person to subscribe.
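The arcs listed above form a directed graph, which can be written down and checked with the Python standard library. The sketch below is only an illustration of the structure: the lower-case node names follow the arc list, the helper names are hypothetical, and no probabilities are attached because this section does not report them.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each pair is (cause, effect), exactly as in the arc list above.
ARCS = [
    ("job", "bal"),
    ("loan", "bal"),
    ("housing", "bal"),
    ("month", "poutcome"),
    ("bal", "response_binary"),
    ("poutcome", "response_binary"),
    ("month", "response_binary"),
    ("dur", "response_binary"),
]


def parents_of(node, arcs=ARCS):
    """Return the factors with an arrow pointing directly at `node`."""
    return sorted(src for src, dst in arcs if dst == node)


def topological_order(arcs=ARCS):
    """Order the nodes so that every cause precedes its effects.

    Raises graphlib.CycleError if the arcs contain a cycle.
    """
    graph = {}
    for src, dst in arcs:
        graph.setdefault(dst, set()).add(src)
        graph.setdefault(src, set())
    return list(TopologicalSorter(graph).static_order())


order = topological_order()
print(parents_of("response_binary"))
print(order[-1])
```

Since every other node leads, directly or indirectly, to `response_binary`, the outcome variable always comes last in such an ordering, which matches its position at the bottom of the plot.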

How well does it work

From the point of view of monetary gains and losses, working according to this model will help you to save as much money as possible. The reason is that it correctly identifies non-subscribers in 99% of cases, which is even higher than for the previous model. Apply it and you will know which customers may be unprofitable to work with, and can think about how to change their minds in order to improve subscription rates.

Subscription Improvement Policy and What-If

Conclusions based on the probability analysis

In order to improve the subscription level, we should identify the aspects that are worth paying attention to.

Solutions for the Next Marketing Campaign:

  1. Months of Marketing Activity: We saw that the months with the highest subscription rate were March, September, October and December. For the next marketing campaign, it would be wise for the bank to focus its activity on these months.
  2. Seasonality: Following from the previous point, potential clients are more inclined to subscribe to term deposits during fall and winter. The next marketing campaign should focus its activity on these seasons.
  3. Duration: The length of the conversation greatly affects the subscription. The longer the conversation, the more likely it is that the potential client will open a term deposit account. According to the data, the optimal conversation duration is 3-5 sec.
  4. Occupation: In this campaign, potential clients who were students or retired were the most likely to subscribe to a term deposit. Retired people, as a rule, do not spend their cash, so they are more likely to put it in term deposits in order to earn interest. Students tend to use deposits as a piggy bank for their own purposes: since they generally do not have regular earnings, they tend to take seasonal part-time jobs and invest the accumulated money at interest. The next marketing campaign should take these groups into account.
  5. Previous outcome: Target the group that subscribed during this campaign. As the data shows, people who have already subscribed in the past are more likely to do so again.
  6. Loans and Balances: Potential clients with a low or negative balance are the most likely to have a loan. Having a loan means that the potential client has financial commitments, such as paying back a house loan, and thus has no cash to subscribe to a term deposit account. Accordingly, the bank should focus on individuals with an average or high balance, since these groups are less inclined to have a loan and so have free cash that they can put at interest.

WHAT may we change (and how exactly) IF we want more response?

Let’s take another look at our model.

How should variables be changed to increase the subscription rate?

What if we take into account the previous campaign’s outcome and duration?

How will the response change if calls with clients who subscribed during the last campaign are longer?

  • If the bank talks with such clients for more than 10 sec in 9 cases out of 10, or for 3-5 seconds in every fourth case, then we can increase the subscription rate by about 10%.
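The arithmetic behind a what-if statement of this kind can be sketched as a weighted average: the overall subscription rate is the sum, over call-duration buckets, of the share of calls in the bucket times the subscription rate within it, and the scenario changes only the shares. The rates and shares below are invented purely to show the mechanics; they are not outputs of the model, and the function and bucket names are hypothetical.

```python
def overall_rate(mix, rate_given_bucket):
    """Overall subscription rate for a given mix of call-duration buckets.

    mix maps bucket -> share of calls (shares must sum to 1);
    rate_given_bucket maps bucket -> subscription rate within that bucket.
    """
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("bucket shares must sum to 1")
    return sum(share * rate_given_bucket[bucket] for bucket, share in mix.items())


# Invented numbers for illustration only.
rate_given_bucket = {"short": 0.10, "long": 0.50}
current_mix = {"short": 0.75, "long": 0.25}
what_if_mix = {"short": 0.25, "long": 0.75}

baseline = overall_rate(current_mix, rate_given_bucket)
scenario = overall_rate(what_if_mix, rate_given_bucket)
uplift = scenario - baseline
print(f"baseline={baseline:.2f} scenario={scenario:.2f} uplift={uplift:.2f}")
```

The scenarios in this section follow the same logic: the conditional rates are held fixed and only the proportion of calls falling into each condition is changed.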

How to achieve this?

  • It is necessary to interest the client in continuing cooperation. This can be achieved by sending newsletters to the client’s email or phone number. The client will then be aware of relevant information and more likely to agree to subscribe again.
  • The bank may also interest customers with special offers available only to them, for example, privileged service at the bank subject to a deposit of a certain amount.

What if we change the duration of our call?

Now let’s see how the response changes if the bank increases the call duration for those clients whom the bank could not reach (duration less than 1 sec).

If the bank talks with these clients in 9 out of 10 cases, or in 7 out of 10 cases for less than 1 sec, then the probability of a positive response will increase by about 18%.

How to achieve this?

  • It is necessary to give the client relevant information by email or SMS. This should increase the probability that the client is interested.
  • The bank can use a questionnaire during calls in order to understand how it should develop to increase the subscription level.

What if we call our customers in a different season of the year?

How will the response change if the bank calls more affluent clients (more than $1000) during summer?

If the bank talks with these clients and those who have $500-1000 in 9 out of 10 cases, then the probability of a positive response will increase by about 10%.

How to achieve this?

  • Again, the bank can offer affluent clients exclusive services that are provided along with opening a term deposit.
  • Seasonal offers: offer clients a higher rate for opening a deposit in a certain season.

What if we take into account the client’s balance?

Now let’s consider the balance again. How does the response change if the bank calls fewer poor clients (negative balance) and more affluent clients (more than $1000)?

If the bank talks with these clients in 9 out of 10 cases, or in 1 out of 10 cases with clients whose balance is $100-500, then the probability of a positive response will increase by about 5%.

How to achieve this?

  • Again, the bank can offer affluent clients exclusive services that are provided along with opening a term deposit.
  • The bank can also offer a special discount on a certain list of goods to those who open a term deposit.

Who is our perfect client?

From the previous analysis, we can try to find the conditions under which the probability of subscription increases.

To get a high probability of subscription, the bank should call clients who subscribed during the last campaign in the spring and speak with them for at least 10 seconds. The probability of subscription will then reach 82%.