Introduction

Hi everybody! This is an Customer Segmentation Analysis document.

library(datasets)
library(readxl)  # for reading Excel files
# Load your loan dataset
loandata <- read_excel("C:\\Users\\Lenovo\\Desktop\\loandata.xlsx")

data(loandata)

## Warning in data(loandata): data set 'loandata' not found

summary(loandata)

##  REPORTING_DATE                   ACCOUNT_NUMBER   CUSTOMER_ID   
##  Min.   :2016-01-31 00:00:00.00   Min.   :    1   Min.   :    1  
##  1st Qu.:2016-10-31 00:00:00.00   1st Qu.: 9788   1st Qu.: 9287  
##  Median :2017-08-30 00:00:00.00   Median :19073   Median :18248  
##  Mean   :2017-09-26 22:54:54.42   Mean   :18641   Mean   :17935  
##  3rd Qu.:2018-07-31 00:00:00.00   3rd Qu.:27258   3rd Qu.:26383  
##  Max.   :2019-08-29 00:00:00.00   Max.   :39597   Max.   :38396  
##                                                                  
##  PROGRAM_NAME       LOAN_OPEN_DATE                  
##  Length:900860      Min.   :2009-09-16 00:00:00.00  
##  Class :character   1st Qu.:2014-10-30 00:00:00.00  
##  Mode  :character   Median :2015-09-07 00:00:00.00  
##                     Mean   :2015-09-24 01:30:05.58  
##                     3rd Qu.:2016-08-23 00:00:00.00  
##                     Max.   :2019-08-29 00:00:00.00  
##                                                     
##  EXPECTED_CLOSE_DATE              ORIGINAL_BOOKED_AMOUNT  OUTSTANDING       
##  Min.   :2015-08-03 00:00:00.00   Min.   :  15000        Min.   :   -111.4  
##  1st Qu.:2019-03-03 00:00:00.00   1st Qu.:  56000        1st Qu.:  25705.6  
##  Median :2020-06-03 00:00:00.00   Median :  75500        Median :  48284.8  
##  Mean   :2020-05-17 04:02:32.07   Mean   :  88714        Mean   :  60824.4  
##  3rd Qu.:2021-06-03 00:00:00.00   3rd Qu.: 105000        3rd Qu.:  81733.3  
##  Max.   :2026-09-03 00:00:00.00   Max.   :2000000        Max.   :2000000.0  
##                                                                             
##      BUCKET           SEX            CUSTOMER_OPEN_DATE              
##  Min.   :0.0000   Length:900860      Min.   :1981-05-21 00:00:00.00  
##  1st Qu.:0.0000   Class :character   1st Qu.:2014-06-04 00:00:00.00  
##  Median :0.0000   Mode  :character   Median :2015-06-29 00:00:00.00  
##  Mean   :0.2827                      Mean   :2015-02-23 00:03:59.28  
##  3rd Qu.:0.0000                      3rd Qu.:2016-07-03 00:00:00.00  
##  Max.   :7.0000                      Max.   :2019-08-28 00:00:00.00  
##                                                                      
##    BIRTH_DATE                       PROFESSION          CAR_TYPE        
##  Min.   :1935-10-28 00:00:00.000   Length:900860      Length:900860     
##  1st Qu.:1970-03-27 00:00:00.000   Class :character   Class :character  
##  Median :1979-08-22 00:00:00.000   Mode  :character   Mode  :character  
##  Mean   :1977-07-24 07:50:29.962                                        
##  3rd Qu.:1985-09-20 00:00:00.000                                        
##  Max.   :1998-02-14 00:00:00.000                                        
##  NA's   :4533

R Code

# Load necessary libraries
library(dplyr)  # for data manipulation

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(ggplot2)  # for data visualization

Explanation

This code chunk loads the required libraries ‘dplyr’ and ‘ggplot2’. These libraries are essential for data manipulation and visualization tasks throughout the analysis.

R Code

# Define risk categories
very_high_risk <- c(remaining_tenure = 180 , loan_to_outstanding_ratio = 1.2)
high_risk <- c(remaining_tenure = 365 , loan_to_outstanding_ratio = 1)
moderate_risk <- c(remaining_tenure = 545, loan_to_outstanding_ratio = 0.8)
low_risk <- c(remaining_tenure =730, loan_to_outstanding_ratio = 0.6)

Explanation

This code chunk defines the risk categories based on the provided thresholds. These categories will be used to segment customers based on their loan tenure and loan-to-outstanding ratio.

R code

# Function to assign risk categories
assign_risk_category <- function(remaining_tenure, loan_to_outstanding_ratio, latest_bucket) {
  if (latest_bucket > 3) {
    return("Very High Risk")
  } else if (remaining_tenure <= very_high_risk["remaining_tenure"] | loan_to_outstanding_ratio >= very_high_risk["loan_to_outstanding_ratio"]) {
    return("High Risk")
  } else if (remaining_tenure <= high_risk["remaining_tenure"] | loan_to_outstanding_ratio >= high_risk["loan_to_outstanding_ratio"]) {
    return("Moderate Risk")
  } else if (remaining_tenure <= moderate_risk["remaining_tenure"] | loan_to_outstanding_ratio >= moderate_risk["loan_to_outstanding_ratio"]) {
    return("Low Risk")
  } else {
    return("Very Low Risk")
  }
}

Explanation

This code chunk defines a function ‘assign_risk_category’ to assign risk categories to customers based on their loan tenure, loan-to-outstanding ratio, and latest bucket value. This function implements a set of conditions to determine the risk category for each customer.

R code

# Calculate loan-to-outstanding ratio
loandata$Loan_to_outstanding_ratio <- loandata$OUTSTANDING / loandata$ORIGINAL_BOOKED_AMOUNT
glimpse(loandata)

## Rows: 900,860
## Columns: 15
## $ REPORTING_DATE            <dttm> 2016-01-31, 2016-01-31, 2016-01-31, 2016-01…
## $ ACCOUNT_NUMBER            <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1…
## $ CUSTOMER_ID               <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1…
## $ PROGRAM_NAME              <chr> "Auto Loans 50% Down Payment - Employed", "P…
## $ LOAN_OPEN_DATE            <dttm> 2015-11-25, 2015-12-08, 2016-01-12, 2015-11…
## $ EXPECTED_CLOSE_DATE       <dttm> 2020-11-03, 2017-12-03, 2021-01-03, 2019-10…
## $ ORIGINAL_BOOKED_AMOUNT    <dbl> 91000, 35000, 52500, 103000, 94250, 54500, 1…
## $ OUTSTANDING               <dbl> 88223.40, 33714.82, 52500.00, 99054.45, 8945…
## $ BUCKET                    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ SEX                       <chr> "M", "M", "F", "M", "M", "M", "M", "M", "F",…
## $ CUSTOMER_OPEN_DATE        <dttm> 2015-10-27, 2015-11-29, 2015-12-28, 2015-10…
## $ BIRTH_DATE                <dttm> 1986-03-24, 1985-08-18, 1985-07-02, 1979-01…
## $ PROFESSION                <chr> "EMPLOYEE", "EMPLOYEE", "HOUSEWIFE", "Shop O…
## $ CAR_TYPE                  <chr> "KIA", "CARRY", "CHEVROLET", "MITSUBISHI", "…
## $ Loan_to_outstanding_ratio <dbl> 0.9694879, 0.9632806, 1.0000000, 0.9616937, …

Explanation

This code chunk calculates the loan-to-outstanding ratio for each loan in the dataset. It divides the outstanding amount by the original booked amount to obtain this ratio at every reporting date.

R code

# Calculate remaining loan tenure
loandata$Remaining_tenure <- as.numeric(difftime(loandata$EXPECTED_CLOSE_DATE, loandata$REPORTING_DATE, units = "days"))
glimpse(loandata)

## Rows: 900,860
## Columns: 16
## $ REPORTING_DATE            <dttm> 2016-01-31, 2016-01-31, 2016-01-31, 2016-01…
## $ ACCOUNT_NUMBER            <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1…
## $ CUSTOMER_ID               <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1…
## $ PROGRAM_NAME              <chr> "Auto Loans 50% Down Payment - Employed", "P…
## $ LOAN_OPEN_DATE            <dttm> 2015-11-25, 2015-12-08, 2016-01-12, 2015-11…
## $ EXPECTED_CLOSE_DATE       <dttm> 2020-11-03, 2017-12-03, 2021-01-03, 2019-10…
## $ ORIGINAL_BOOKED_AMOUNT    <dbl> 91000, 35000, 52500, 103000, 94250, 54500, 1…
## $ OUTSTANDING               <dbl> 88223.40, 33714.82, 52500.00, 99054.45, 8945…
## $ BUCKET                    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ SEX                       <chr> "M", "M", "F", "M", "M", "M", "M", "M", "F",…
## $ CUSTOMER_OPEN_DATE        <dttm> 2015-10-27, 2015-11-29, 2015-12-28, 2015-10…
## $ BIRTH_DATE                <dttm> 1986-03-24, 1985-08-18, 1985-07-02, 1979-01…
## $ PROFESSION                <chr> "EMPLOYEE", "EMPLOYEE", "HOUSEWIFE", "Shop O…
## $ CAR_TYPE                  <chr> "KIA", "CARRY", "CHEVROLET", "MITSUBISHI", "…
## $ Loan_to_outstanding_ratio <dbl> 0.9694879, 0.9632806, 1.0000000, 0.9616937, …
## $ Remaining_tenure          <dbl> 1738, 672, 1799, 1341, 1007, 1738, 1068, 106…

Explanation

This code chunk calculates the remaining loan tenure for each loan in the dataset based on the loan expected end date and reporting date. It uses the ‘difftime’ function to calculate the difference in days between the two dates.

R code

# Segment customers based on risk categories
customer_segments <- loandata %>%
  group_by(CUSTOMER_ID) %>%
  summarise(
    Average_remaining_tenure = mean(Remaining_tenure),
    Average_loan_to_outstanding_ratio = mean(Loan_to_outstanding_ratio),
    Latest_Bucket = last(BUCKET)  # Get the latest bucket value for each customer
  ) %>%
  mutate(Risk_Category = mapply(assign_risk_category, Average_remaining_tenure, Average_loan_to_outstanding_ratio, Latest_Bucket))
glimpse(customer_segments)

## Rows: 38,396
## Columns: 5
## $ CUSTOMER_ID                       <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 1…
## $ Average_remaining_tenure          <dbl> 1419.5909, 338.3913, 1252.3514, 1022…
## $ Average_loan_to_outstanding_ratio <dbl> 0.8131058, 0.5285329, 0.7605609, 0.7…
## $ Latest_Bucket                     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 7, 0, 0, 0, …
## $ Risk_Category                     <chr> "Low Risk", "Moderate Risk", "Very L…

Explanation

This code chunk segments customers based on their average remaining loan tenure, average loan-to-outstanding ratio, and latest bucket value. It calculates these metrics for each customer and assigns a risk category using the previously defined function ‘assign_risk_category’.

R code

# Visualize the segmentation with risk categories
ggplot(customer_segments, aes(x = Average_remaining_tenure, y = Average_loan_to_outstanding_ratio, color = Risk_Category)) +
  geom_point() +
  labs(title = "Customer Segmentation by Risk Level",
       x = "Average Remaining Loan Tenure",
       y = "Average Loan-to-Outstanding Ratio",
       color = "Risk Category") +
  theme_minimal()

Explanation

This code chunk visualizes the segmentation of customers by risk level using a scatter plot. The x-axis represents the average remaining loan tenure, the y-axis represents the average loan-to-outstanding ratio, and the points are colored based on the assigned risk category, to provide insights into customer segmentation.

New ideas to mitigate credit risk

Very High Risk Segment:

Aggressive Monitoring: Implement a real-time monitoring system to closely track customers in this segment. Any signs of default or late payments should trigger immediate actions.

Restructuring Options: Offer loan restructuring options to customers in this segment to help them manage their debts more effectively, such as extending the loan tenure or reducing interest rates temporarily.

Collateralization: Require collateral for new loans or increase collateral requirements for existing customers in this segment to mitigate potential losses in case of default.

High Risk Segment:

Financial Counseling: Provide financial counseling services to customers in this segment to help them improve their financial literacy and better manage their debts.

Early Intervention: Implement early intervention strategies to proactively address any emerging issues with customers in this segment before they escalate into defaults.

Flexible Repayment Plans: Offer flexible repayment plans for customers facing temporary financial difficulties, such as unemployment or medical emergencies.

Moderate Risk Segment:

Credit Education: Offer educational resources or workshops to help customers in this segment better understand credit management principles and improve their financial habits.

Diversification: Diversify the loan portfolio by expanding into different markets or product offerings to reduce concentration risk associated with this segment.

Low Risk Segment:

Reward Programs: Implement reward programs or incentives for customers in this segment who demonstrate responsible borrowing behavior, such as making timely payments or maintaining low levels of debt.

Cross-Selling: Cross-sell additional financial products or services to customers in this segment, such as investment opportunities or insurance products, to deepen the relationship.

Credit Limit Increases: Offer credit limit increases to customers in this segment based on their positive credit history and repayment behavior, while still maintaining cautious risk management practices.

**The effectiveness of these strategies will depend on various factors, including the specific characteristics of the customer base, market conditions, and regulatory requirements. It’s important to continually monitor and adjust these strategies based on evolving risk factors and business objectives.

RMarkdown Customer segmentation

Rahma Magdy

2024-02-21

Introduction

R Code

Explanation

R Code

Explanation

R code

Explanation

R code

Explanation

R code

Explanation

R code

Explanation

R code

Explanation

New ideas to mitigate credit risk

Very High Risk Segment:

High Risk Segment:

Moderate Risk Segment:

Low Risk Segment: