Business Task

Analyzing the customer dataset provided by ABC Multinational Bank to highlight any trends in the data and provide insights.

The Available Data

We have been provided a dataset in .csv(Comma Separated Value) format, containing 10,000 customers from the bank and each customer has the following attributes;

Loading and Cleaning the Data

Preparing the Environment
library(readr)
library(janitor)
library(tidyr)
library(dplyr)
library(corrplot)
library(ggplot2)
library(forcats)
library(scales)
Importing the data

Now that the environment has been prepared and the necessary packages loaded, the data is imported into the customer dataset below;

customer <- read.csv("Bank Customer Churn Prediction.csv")
Exploring the data

We then proceed to explore the data to get a grasp of it;

str(customer)
## 'data.frame':    10000 obs. of  12 variables:
##  $ customer_id     : int  15634602 15647311 15619304 15701354 15737888 15574012 15592531 15656148 15792365 15592389 ...
##  $ credit_score    : int  619 608 502 699 850 645 822 376 501 684 ...
##  $ country         : chr  "France" "Spain" "France" "France" ...
##  $ gender          : chr  "Female" "Female" "Female" "Female" ...
##  $ age             : int  42 41 42 39 43 44 50 29 44 27 ...
##  $ tenure          : int  2 1 8 1 2 8 7 4 4 2 ...
##  $ balance         : num  0 83808 159661 0 125511 ...
##  $ products_number : int  1 1 3 2 1 2 2 4 2 1 ...
##  $ credit_card     : int  1 0 1 0 1 1 1 1 0 1 ...
##  $ active_member   : int  1 1 0 0 1 0 1 0 1 1 ...
##  $ estimated_salary: num  101349 112543 113932 93827 79084 ...
##  $ churn           : int  1 0 1 0 0 1 0 1 0 0 ...
head(customer)
##   customer_id credit_score country gender age tenure   balance products_number
## 1    15634602          619  France Female  42      2      0.00               1
## 2    15647311          608   Spain Female  41      1  83807.86               1
## 3    15619304          502  France Female  42      8 159660.80               3
## 4    15701354          699  France Female  39      1      0.00               2
## 5    15737888          850   Spain Female  43      2 125510.82               1
## 6    15574012          645   Spain   Male  44      8 113755.78               2
##   credit_card active_member estimated_salary churn
## 1           1             1        101348.88     1
## 2           0             1        112542.58     0
## 3           1             0        113931.57     1
## 4           0             0         93826.63     0
## 5           1             1         79084.10     0
## 6           1             0        149756.71     1

NOTE: The credit card, active member and churn attributes contain Boolean values depicting True(1) and False(0)

Cleaning the data

The first step we take is to query the data to check if there are duplicate customers; using the customer_id as that is unique to each customer.

#Check for duplicates
get_dupes(customer, customer_id)
## No duplicate combinations found of: customer_id

There are no duplicates in the data. Hence, we proceed to alter the Boolean values to be more accessible and readable;

#Changing the datatype from Integers to Characters
customer$churn <- as.character(customer$churn)
customer$active_member <- as.character(customer$active_member)
customer$credit_card <- as.character(customer$credit_card)

#Converting the Boolean values to the desired values
customer_updated <- customer %>% mutate(credit_card = credit_card) %>%
  mutate(credit_card = fct_recode(credit_card, "Credit Card" = "1", "No-Credit card" = "0"), active_member = fct_recode(active_member, "Active" = "1", "Inactive" = "0"), churn = fct_recode(churn, "Churned" = "1", "Retained" = "0"))
head(customer_updated)

Finally, we remove the Customer Id and the other character datatypes from the dataset to create a table that we can use to check for correlation afterwards;

#Churned customers dataset separated
customer_churned <- customer_updated %>%
  filter(churn == "Churned")

#Drop columns from the dataset
drop<- c("customer_id","country","gender","active_member","churn","credit_card")
customer_corr <-customer_updated[,!names(customer_updated) %in% drop]
customer_churned_corr <- customer_churned[,!names(customer_updated) %in% drop]
head(customer_corr)

NOTE: The Customer Id was dropped as it is a unique value and has no effect on the correlation of the other variables in the dataset.

Exploratory Data Analysis

CUSTOMER ANALYSIS

The bank has a large customer base from France; with more than 50% of the customers banking from France.

The bank’s customers are predominantly male which make up 55% of the customer base with the females making up the remaining 45%

The average credit scores of the customers is 650, with most of the customers having credit scores between 600 and 700

Majority of the bank’s customers are below the age of 50; and have the average age of 39.

Most of the customers have been with the bank for more than a year.

A huge percent of the bank’s customers are inactive; 48% of the total customer count are reported to be inactive members.

The bank managed to retain 80% of their customers with the remaining 20% churned out.

Only 29% of the ABC multinational bank’s customer base do not make use of credit cards.

There is no significant correlation to be observed with only a weak correlation existing between the balance and product numbers.

CHURNED CUSTOMER ANALYSIS

Germany recorded the highest amount of churned customers with 48% of the German customers churned.

Majority of the churned customers(56%) were female despite the total population of the bank being predominantly male.

The average credit scores for the churned customers(645) is slightly lower than the overall average credit score, but majority of the credit scores of churned customers still falls between the 600 and 700 range.

The average age of the churned customer(45) is higher than that of the retained customer(37) with majority of the churned customers between the ages of 35 and 60

Majority of the bank’s churned customers left after spending one year at the bank.

As expected most of the customers who left the bank were inactive members which could serve as an indicator as to to why they left.But 36% of the churned customers were still active customers.

Customers that purchased more products through the bank tended to be churned as 100% of the customers that purchased 4 products left and 79% of those that purchased 3 products also being churned out.

There are still no correlations within the churned customer dataset; as it was with the total customer dataset

Conclusion and Recommendation

Based on the data available from he ABC Multinational Bank dataset below are my recommendations grouped by category;

Country

Credit Scores

Gender

Activity Levels

Age

Tenure and Purchased Products

Churn Rate The bank recorded a churn rate of 20% which is below the average churn rate for financial institutions which is 25%. Hence, I recommend the bank keep implementing more methods and strategies to reduce the churn rate even further.

Thank You!!! for your time.