Analyzing the customer dataset provided by ABC Multinational Bank to highlight any trends in the data and provide insights.
We have been provided a dataset in .csv(Comma Separated Value) format, containing 10,000 customers from the bank and each customer has the following attributes;
library(readr)
library(janitor)
library(tidyr)
library(dplyr)
library(corrplot)
library(ggplot2)
library(forcats)
library(scales)
Now that the environment has been prepared and the necessary packages loaded, the data is imported into the customer dataset below;
customer <- read.csv("Bank Customer Churn Prediction.csv")
We then proceed to explore the data to get a grasp of it;
str(customer)
## 'data.frame': 10000 obs. of 12 variables:
## $ customer_id : int 15634602 15647311 15619304 15701354 15737888 15574012 15592531 15656148 15792365 15592389 ...
## $ credit_score : int 619 608 502 699 850 645 822 376 501 684 ...
## $ country : chr "France" "Spain" "France" "France" ...
## $ gender : chr "Female" "Female" "Female" "Female" ...
## $ age : int 42 41 42 39 43 44 50 29 44 27 ...
## $ tenure : int 2 1 8 1 2 8 7 4 4 2 ...
## $ balance : num 0 83808 159661 0 125511 ...
## $ products_number : int 1 1 3 2 1 2 2 4 2 1 ...
## $ credit_card : int 1 0 1 0 1 1 1 1 0 1 ...
## $ active_member : int 1 1 0 0 1 0 1 0 1 1 ...
## $ estimated_salary: num 101349 112543 113932 93827 79084 ...
## $ churn : int 1 0 1 0 0 1 0 1 0 0 ...
head(customer)
## customer_id credit_score country gender age tenure balance products_number
## 1 15634602 619 France Female 42 2 0.00 1
## 2 15647311 608 Spain Female 41 1 83807.86 1
## 3 15619304 502 France Female 42 8 159660.80 3
## 4 15701354 699 France Female 39 1 0.00 2
## 5 15737888 850 Spain Female 43 2 125510.82 1
## 6 15574012 645 Spain Male 44 8 113755.78 2
## credit_card active_member estimated_salary churn
## 1 1 1 101348.88 1
## 2 0 1 112542.58 0
## 3 1 0 113931.57 1
## 4 0 0 93826.63 0
## 5 1 1 79084.10 0
## 6 1 0 149756.71 1
NOTE: The credit card, active member and churn attributes contain Boolean values depicting True(1) and False(0)
The first step we take is to query the data to check if there are duplicate customers; using the customer_id as that is unique to each customer.
#Check for duplicates
get_dupes(customer, customer_id)
## No duplicate combinations found of: customer_id
There are no duplicates in the data. Hence, we proceed to alter the Boolean values to be more accessible and readable;
#Changing the datatype from Integers to Characters
customer$churn <- as.character(customer$churn)
customer$active_member <- as.character(customer$active_member)
customer$credit_card <- as.character(customer$credit_card)
#Converting the Boolean values to the desired values
customer_updated <- customer %>% mutate(credit_card = credit_card) %>%
mutate(credit_card = fct_recode(credit_card, "Credit Card" = "1", "No-Credit card" = "0"), active_member = fct_recode(active_member, "Active" = "1", "Inactive" = "0"), churn = fct_recode(churn, "Churned" = "1", "Retained" = "0"))
head(customer_updated)
Finally, we remove the Customer Id and the other character datatypes from the dataset to create a table that we can use to check for correlation afterwards;
#Churned customers dataset separated
customer_churned <- customer_updated %>%
filter(churn == "Churned")
#Drop columns from the dataset
drop<- c("customer_id","country","gender","active_member","churn","credit_card")
customer_corr <-customer_updated[,!names(customer_updated) %in% drop]
customer_churned_corr <- customer_churned[,!names(customer_updated) %in% drop]
head(customer_corr)
NOTE: The Customer Id was dropped as it is a unique value and has no effect on the correlation of the other variables in the dataset.
The bank has a large customer base from France; with more than 50% of the customers banking from France.
The bank’s customers are predominantly male which make up 55% of the customer base with the females making up the remaining 45%
The average credit scores of the customers is 650, with most of the customers having credit scores between 600 and 700
Majority of the bank’s customers are below the age of 50; and have the average age of 39.
Most of the customers have been with the bank for more than a year.
A huge percent of the bank’s customers are inactive; 48% of the total customer count are reported to be inactive members.
The bank managed to retain 80% of their customers with the remaining 20% churned out.
Only 29% of the ABC multinational bank’s customer base do not make use of credit cards.
There is no significant correlation to be observed with only a weak correlation existing between the balance and product numbers.
Germany recorded the highest amount of churned customers with 48% of the German customers churned.
Majority of the churned customers(56%) were female despite the total population of the bank being predominantly male.
The average credit scores for the churned customers(645) is slightly lower than the overall average credit score, but majority of the credit scores of churned customers still falls between the 600 and 700 range.
The average age of the churned customer(45) is higher than that of the retained customer(37) with majority of the churned customers between the ages of 35 and 60
Majority of the bank’s churned customers left after spending one year at the bank.
As expected most of the customers who left the bank were inactive members which could serve as an indicator as to to why they left.But 36% of the churned customers were still active customers.
Customers that purchased more products through the bank tended to be churned as 100% of the customers that purchased 4 products left and 79% of those that purchased 3 products also being churned out.
There are still no correlations within the churned customer dataset; as it was with the total customer dataset
Based on the data available from he ABC Multinational Bank dataset below are my recommendations grouped by category;
Country
The French branches make up a huge portion of the customer base.
The bank should set up a campaign to expand the customer base in Germany, Spain, and could later look to expand to other surrounding countries.
German branches recorded the highest churn rate with 48% hence further analysis would be required to determine why.
Credit Scores
The credit scores of most customers of the bank are generally considered to be above the good threshold.
Notwithstanding, good credit scores were not enough to deter customers from churning out.
Gender
The gender split between the male and female customers was 55% and 45% respectively.
Most of the churned customers turned out to be women so further analysis is required to find out the cause.
Activity Levels
Age
Younger customers made up most of the retained customer base so efforts should be made to acquire young customers as they were less likely to leave.
Analysis should also be carried out to determine why older customers were more likely to leave.
Tenure and Purchased Products
Majority of the churned customers left after one year with the bank; although it is important to note that customers still left after spending a considerable amount of time at the bank.
Analysis is required to determine why customers that bought more products with the bank ended up leaving the bank, as none of the customers that bought 4 products with the bank were retained.
Churn Rate The bank recorded a churn rate of 20% which is below the average churn rate for financial institutions which is 25%. Hence, I recommend the bank keep implementing more methods and strategies to reduce the churn rate even further.
Thank You!!! for your time.