In today’s dynamic banking sector, customer churn analysis has become a key tool for financial institutions. Churn, the loss of customers who leave the bank (often for a competitor), not only affects revenue but also serves as a gauge of customer satisfaction and loyalty. Understanding and managing churn is vital for banks that want to stay competitive, retain valuable clients, and adapt to a changing financial landscape. This report analyzes customer records from a fictitious bank, with the aim of identifying realistic trends and correlations that can help improve retention.
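```r
# Load and preprocess data
library(dplyr)  # provides %>%, group_by(), summarize(), mutate(), arrange()
churn_table <- read.csv("/Users/Owner/Downloads/Customer_Churn_Records.csv")
# Further data processing steps (not shown)
```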
Our first analysis shows that 2,038 of the 10,000 customers (20.38%) have churned (exited), while 7,962 (79.62%) are still customers.
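```r
# Count churned (Exited = 1) and retained (Exited = 0) customers
churn_count <- churn_table %>%
  group_by(Exited) %>%
  summarize(total_customers = n()) %>%
  # Divide by the observed total instead of a hardcoded 10000
  mutate(percentage = total_customers / sum(total_customers) * 100)
```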
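As a cross-check, the same shares can be computed in base R with `table()` and `prop.table()`, which divide by the observed total rather than a fixed number. This minimal sketch is an alternative to the dplyr pipeline, not the author's code; it rebuilds the `Exited` column from the counts reported above so that it runs without the CSV file.

```r
# Rebuild the Exited indicator from the reported counts
# (7962 retained, 2038 churned); no other data is assumed
exited <- rep(c(0, 1), times = c(7962, 2038))

# Counts and percentage shares per status, computed from the observed total
counts <- table(exited)
shares <- prop.table(counts) * 100

print(counts)
print(shares)
```

The shares come out as 79.62 and 20.38 percent, matching the figures in the text.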
The country analysis compares churn percentages across countries. Germany's churn rate (32.44%) is roughly double that of Spain (16.67%) and France (16.17%).
```r
# Churn percentage by country, highest first
country_breakdown <- churn_table %>%
  group_by(Geography) %>%
  summarize(total_exited = sum(Exited == 1),
            total_customers = n()) %>%
  mutate(churn_percentage = round(total_exited / total_customers * 100, 2)) %>%
  arrange(desc(churn_percentage))
```
| Geography | total_exited | total_customers | churn_percentage |
|---|---|---|---|
| Germany | 814 | 2509 | 32.44 |
| Spain | 413 | 2477 | 16.67 |
| France | 811 | 5014 | 16.17 |
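Whether a gap this size is statistically significant, and not only large, can be checked with a test of equal proportions. The sketch below uses base R's `prop.test()` on the counts from the table; this test is an addition for illustration and is not part of the original analysis.

```r
# Churned and total customers per country, taken from the table above
exited_by_country <- c(Germany = 814, Spain = 413, France = 811)
customers_by_country <- c(Germany = 2509, Spain = 2477, France = 5014)

# Chi-squared test of the null hypothesis that all three churn rates are equal
country_test <- prop.test(exited_by_country, customers_by_country)

print(round(country_test$estimate * 100, 2))  # churn percentage per country
print(country_test$p.value)
```

A very small p-value indicates that the churn rates are unlikely to be equal across the three countries. It says nothing about why, and it does not control for other variables such as complaints or age.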
The data set also records whether each customer has filed a complaint with the bank, which lets us check whether complaints are a good predictor of someone leaving.
```r
# Number of churned customers in each complaint group
complaint_analysis <- churn_table %>%
  group_by(Complain) %>%
  summarize(exited_count = sum(Exited == 1))
```
| Complain | exited_count |
|---|---|
| 0 | 4 |
| 1 | 2034 |
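In the table above, 0 = “Never Complained” and 1 = “Has Complained”. Of the 2,038 customers who exited, 2,034 (99.8%) had filed a complaint, which shows a very strong association between complaining and later leaving the bank. Note that this is the share of churners who complained, not the churn rate among customers who complained; the latter would also require the total number of customers in each complaint group.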
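The two directions of this relationship are easy to confuse. A short base R sketch first computes the share of churners who complained from the counts in the table, then uses a small hypothetical data frame (not the bank's data) to show how the churn rate within each complaint group would be computed with `tapply()`.

```r
# Share of churners who complained, from the counts in the table above
exited_complained <- 2034
exited_total <- 2034 + 4
share_of_churners_who_complained <- exited_complained / exited_total
print(round(share_of_churners_who_complained * 100, 2))

# The churn rate *among* complainers needs every customer, not only churners.
# Hypothetical toy data (NOT from the report) to show the calculation:
toy <- data.frame(
  Complain = c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0),
  Exited   = c(1, 1, 1, 0, 0, 0, 0, 0, 0, 1)
)

# Mean of a 0/1 Exited column within each group = churn rate of that group
churn_rate_by_complaint <- tapply(toy$Exited, toy$Complain, mean)
print(churn_rate_by_complaint)
```

On the real data, the same `tapply()` call over all 10,000 rows would give the churn rate for complainers and non-complainers directly.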
Next, we check whether churn rates differ across age groups.
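```r
# age_ranges is a grouping derived from Age in the data processing
# steps (not shown); it must exist before this summary can run
age_analysis <- churn_table %>%
  group_by(age_ranges) %>%
  summarize(exited_customers = sum(Exited == 1),
            total_customers = n()) %>%
  mutate(percentage = round(exited_customers / total_customers * 100, 2))
```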
Customers aged 46-64 have the highest churn rate of all age groups.
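The `age_ranges` column has to be derived from `Age` before the grouping above can run. A minimal sketch with `cut()` follows; only the 46-64 band comes from the report, so the other boundaries are assumptions, and the toy ages and churn flags are hypothetical.

```r
# Hypothetical age bands: only "46-64" appears in the report;
# the other breakpoints are assumptions for illustration
age_breaks <- c(17, 25, 45, 64, Inf)
age_labels <- c("18-25", "26-45", "46-64", "65+")

# Toy ages and churn flags (NOT the bank's data)
toy <- data.frame(
  Age    = c(22, 30, 41, 47, 52, 60, 66, 71),
  Exited = c(0,  0,  1,  1,  1,  0,  0,  1)
)

# cut() uses right-closed intervals by default: (17,25], (25,45], (45,64], (64,Inf]
toy$age_ranges <- cut(toy$Age, breaks = age_breaks, labels = age_labels)

# Churn percentage per age band
churn_by_age <- round(tapply(toy$Exited, toy$age_ranges, mean) * 100, 2)
print(churn_by_age)
```

Because the intervals are right-closed, an age of 45 falls in "26-45" and 46 falls in "46-64", which matches the band labels.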
We also checked whether churn rates are related to tenure, that is, how many years someone has been a customer.
```r
# Churn percentage (rounded to whole percent) for each tenure year
tenure_analysis <- churn_table %>%
  group_by(Tenure) %>%
  summarize(exited_count = sum(Exited == 1),
            total_customers = n()) %>%
  mutate(churn_percentage = round(exited_count / total_customers * 100))
```
A dual y-axis chart of churn by tenure shows little variation: the churn percentage stays between 17 and 24 percent across all tenure years, with no clear trend in either direction.
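A chart makes the pattern visible, and two summary numbers can make "little trend" concrete: the spread between the highest and lowest churn percentage, and the slope of a least-squares line through them. Since the per-year figures are not listed in the report, this sketch uses hypothetical values chosen to fall in the reported 17-24 percent band.

```r
# Hypothetical churn percentages for tenure 0-10 (NOT the report's figures)
tenure <- 0:10
churn_pct <- c(23, 22, 19, 21, 21, 21, 20, 17, 19, 22, 24)

# Spread between the highest and lowest churn percentage
spread <- diff(range(churn_pct))

# Slope of a least-squares line: change in churn percentage per year of tenure
trend <- coef(lm(churn_pct ~ tenure))[["tenure"]]

cat("spread:", spread, "percentage points\n")
cat("slope:", round(trend, 3), "percentage points per year\n")
```

A slope close to zero relative to the spread is consistent with churn not drifting up or down with tenure.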
Having identified who is most likely to churn and where, we can take the analysis a step further and fit a logistic regression using some of the predictors examined above. Logistic regression is a statistical model that estimates the probability of a binary event; in our case, the probability that a customer leaves the bank.
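To see what `glm(..., family = binomial(link = "logit"))` does, it helps to fit it to data where the true answer is known. In this sketch on simulated data (the variable `x` and the coefficients are invented for illustration), a linear predictor is passed through the inverse logit, `plogis()`, to produce a probability, and `glm()` should recover coefficients close to the ones used in the simulation.

```r
set.seed(42)

# Simulate a binary outcome from a known logistic model (hypothetical data)
n <- 2000
x <- rnorm(n)
true_intercept <- -1
true_slope <- 1
p <- plogis(true_intercept + true_slope * x)  # inverse logit: 1 / (1 + exp(-eta))
y <- rbinom(n, size = 1, prob = p)

# Fit the same kind of model used in the report
fit <- glm(y ~ x, family = binomial(link = "logit"))

print(coef(fit))                              # estimates near -1 and 1
print(head(predict(fit, type = "response")))  # fitted probabilities in (0, 1)
```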
```r
# Logistic regression of churn on complaints, age, tenure and country
model1 <- glm(Exited ~ Complain + Age + Tenure + Geography,
              data = churn_table, family = binomial(link = "logit"))
summary(model1)
```

```
##
## Call:
## glm(formula = Exited ~ Complain + Age + Tenure + Geography, family = binomial(link = "logit"),
##     data = churn_table)
##
## Coefficients:
##                   Estimate Std. Error z value Pr(>|z|)
## (Intercept)      -10.65511    1.39445  -7.641 2.15e-14 ***
## Complain          13.29994    0.72487  18.348  < 2e-16 ***
## Age                0.07739    0.02283   3.390   0.0007 ***
## Tenure            -0.05839    0.09281  -0.629   0.5292
## GeographyGermany  -0.54436    0.63415  -0.858   0.3907
## GeographySpain     0.23838    0.75082   0.317   0.7509
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
##     Null deviance: 10112.5  on 9999  degrees of freedom
## Residual deviance:   183.1  on 9994  degrees of freedom
## AIC: 195.1
##
## Number of Fisher Scoring iterations: 10
```
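The coefficients are on the log-odds scale, so they are easier to read once converted to probabilities and odds ratios. This sketch plugs the estimates printed above into the inverse logit for a hypothetical 45-year-old customer in France (the reference level of `Geography`, so no country term is added) with 5 years of tenure; the helper `predict_churn()` is introduced here for illustration only.

```r
# Coefficients copied from the model summary above
b <- c(intercept = -10.65511, complain = 13.29994, age = 0.07739, tenure = -0.05839)

# Hypothetical helper: predicted churn probability for a customer in France
predict_churn <- function(complain, age, tenure) {
  eta <- b[["intercept"]] + b[["complain"]] * complain +
    b[["age"]] * age + b[["tenure"]] * tenure
  plogis(eta)  # inverse logit turns log-odds into a probability
}

p_complained <- predict_churn(complain = 1, age = 45, tenure = 5)
p_no_complaint <- predict_churn(complain = 0, age = 45, tenure = 5)

# Odds ratio for one extra year of age
age_odds_ratio <- exp(b[["age"]])

print(round(c(p_complained = p_complained, p_no_complaint = p_no_complaint,
              age_odds_ratio = age_odds_ratio), 4))
```

Under these assumptions the model gives this customer a churn probability of about 0.997 with a complaint and about 0.0006 without one, and each extra year of age multiplies the odds of churning by about 1.08. The very large `Complain` estimate likely reflects that complaints almost perfectly separate churners from non-churners in this data, so its odds ratio should not be read literally.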
In conclusion, our analysis of customer churn sheds light on key factors that influence customer retention. Complaints and age were both statistically significant predictors of churn in the logistic regression (p < 0.001), highlighting areas for targeted intervention. Tenure and geography were not significant in the model (p = 0.53 for tenure; p = 0.39 for Germany and p = 0.75 for Spain, relative to France), even though Germany's raw churn rate was roughly double that of the other two countries; these factors may still be worth monitoring to understand customer dynamics. As the banking landscape continues to evolve, findings like these can help institutions manage churn proactively, improve customer satisfaction, and strengthen their competitive position. By adopting data-driven retention strategies, banks can build lasting relationships with their clients.