Hi everybody! This is an Customer Segmentation Analysis document.
library(datasets)
library(readxl) # for reading Excel files
# Load your loan dataset
loandata <- read_excel("C:\\Users\\Lenovo\\Desktop\\loandata.xlsx")
data(loandata)
## Warning in data(loandata): data set 'loandata' not found
summary(loandata)
## REPORTING_DATE ACCOUNT_NUMBER CUSTOMER_ID
## Min. :2016-01-31 00:00:00.00 Min. : 1 Min. : 1
## 1st Qu.:2016-10-31 00:00:00.00 1st Qu.: 9788 1st Qu.: 9287
## Median :2017-08-30 00:00:00.00 Median :19073 Median :18248
## Mean :2017-09-26 22:54:54.42 Mean :18641 Mean :17935
## 3rd Qu.:2018-07-31 00:00:00.00 3rd Qu.:27258 3rd Qu.:26383
## Max. :2019-08-29 00:00:00.00 Max. :39597 Max. :38396
##
## PROGRAM_NAME LOAN_OPEN_DATE
## Length:900860 Min. :2009-09-16 00:00:00.00
## Class :character 1st Qu.:2014-10-30 00:00:00.00
## Mode :character Median :2015-09-07 00:00:00.00
## Mean :2015-09-24 01:30:05.58
## 3rd Qu.:2016-08-23 00:00:00.00
## Max. :2019-08-29 00:00:00.00
##
## EXPECTED_CLOSE_DATE ORIGINAL_BOOKED_AMOUNT OUTSTANDING
## Min. :2015-08-03 00:00:00.00 Min. : 15000 Min. : -111.4
## 1st Qu.:2019-03-03 00:00:00.00 1st Qu.: 56000 1st Qu.: 25705.6
## Median :2020-06-03 00:00:00.00 Median : 75500 Median : 48284.8
## Mean :2020-05-17 04:02:32.07 Mean : 88714 Mean : 60824.4
## 3rd Qu.:2021-06-03 00:00:00.00 3rd Qu.: 105000 3rd Qu.: 81733.3
## Max. :2026-09-03 00:00:00.00 Max. :2000000 Max. :2000000.0
##
## BUCKET SEX CUSTOMER_OPEN_DATE
## Min. :0.0000 Length:900860 Min. :1981-05-21 00:00:00.00
## 1st Qu.:0.0000 Class :character 1st Qu.:2014-06-04 00:00:00.00
## Median :0.0000 Mode :character Median :2015-06-29 00:00:00.00
## Mean :0.2827 Mean :2015-02-23 00:03:59.28
## 3rd Qu.:0.0000 3rd Qu.:2016-07-03 00:00:00.00
## Max. :7.0000 Max. :2019-08-28 00:00:00.00
##
## BIRTH_DATE PROFESSION CAR_TYPE
## Min. :1935-10-28 00:00:00.000 Length:900860 Length:900860
## 1st Qu.:1970-03-27 00:00:00.000 Class :character Class :character
## Median :1979-08-22 00:00:00.000 Mode :character Mode :character
## Mean :1977-07-24 07:50:29.962
## 3rd Qu.:1985-09-20 00:00:00.000
## Max. :1998-02-14 00:00:00.000
## NA's :4533
# Load necessary libraries
library(dplyr) # for data manipulation
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2) # for data visualization
This code chunk loads the required libraries ‘dplyr’ and ‘ggplot2’. These libraries are essential for data manipulation and visualization tasks throughout the analysis.
# Define risk categories
very_high_risk <- c(remaining_tenure = 180 , loan_to_outstanding_ratio = 1.2)
high_risk <- c(remaining_tenure = 365 , loan_to_outstanding_ratio = 1)
moderate_risk <- c(remaining_tenure = 545, loan_to_outstanding_ratio = 0.8)
low_risk <- c(remaining_tenure =730, loan_to_outstanding_ratio = 0.6)
This code chunk defines the risk categories based on the provided thresholds. These categories will be used to segment customers based on their loan tenure and loan-to-outstanding ratio.
# Function to assign risk categories
assign_risk_category <- function(remaining_tenure, loan_to_outstanding_ratio, latest_bucket) {
if (latest_bucket > 3) {
return("Very High Risk")
} else if (remaining_tenure <= very_high_risk["remaining_tenure"] | loan_to_outstanding_ratio >= very_high_risk["loan_to_outstanding_ratio"]) {
return("High Risk")
} else if (remaining_tenure <= high_risk["remaining_tenure"] | loan_to_outstanding_ratio >= high_risk["loan_to_outstanding_ratio"]) {
return("Moderate Risk")
} else if (remaining_tenure <= moderate_risk["remaining_tenure"] | loan_to_outstanding_ratio >= moderate_risk["loan_to_outstanding_ratio"]) {
return("Low Risk")
} else {
return("Very Low Risk")
}
}
This code chunk defines a function ‘assign_risk_category’ to assign risk categories to customers based on their loan tenure, loan-to-outstanding ratio, and latest bucket value. This function implements a set of conditions to determine the risk category for each customer.
# Calculate loan-to-outstanding ratio
loandata$Loan_to_outstanding_ratio <- loandata$OUTSTANDING / loandata$ORIGINAL_BOOKED_AMOUNT
glimpse(loandata)
## Rows: 900,860
## Columns: 15
## $ REPORTING_DATE <dttm> 2016-01-31, 2016-01-31, 2016-01-31, 2016-01…
## $ ACCOUNT_NUMBER <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1…
## $ CUSTOMER_ID <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1…
## $ PROGRAM_NAME <chr> "Auto Loans 50% Down Payment - Employed", "P…
## $ LOAN_OPEN_DATE <dttm> 2015-11-25, 2015-12-08, 2016-01-12, 2015-11…
## $ EXPECTED_CLOSE_DATE <dttm> 2020-11-03, 2017-12-03, 2021-01-03, 2019-10…
## $ ORIGINAL_BOOKED_AMOUNT <dbl> 91000, 35000, 52500, 103000, 94250, 54500, 1…
## $ OUTSTANDING <dbl> 88223.40, 33714.82, 52500.00, 99054.45, 8945…
## $ BUCKET <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ SEX <chr> "M", "M", "F", "M", "M", "M", "M", "M", "F",…
## $ CUSTOMER_OPEN_DATE <dttm> 2015-10-27, 2015-11-29, 2015-12-28, 2015-10…
## $ BIRTH_DATE <dttm> 1986-03-24, 1985-08-18, 1985-07-02, 1979-01…
## $ PROFESSION <chr> "EMPLOYEE", "EMPLOYEE", "HOUSEWIFE", "Shop O…
## $ CAR_TYPE <chr> "KIA", "CARRY", "CHEVROLET", "MITSUBISHI", "…
## $ Loan_to_outstanding_ratio <dbl> 0.9694879, 0.9632806, 1.0000000, 0.9616937, …
This code chunk calculates the loan-to-outstanding ratio for each loan in the dataset. It divides the outstanding amount by the original booked amount to obtain this ratio at every reporting date.
# Calculate remaining loan tenure
loandata$Remaining_tenure <- as.numeric(difftime(loandata$EXPECTED_CLOSE_DATE, loandata$REPORTING_DATE, units = "days"))
glimpse(loandata)
## Rows: 900,860
## Columns: 16
## $ REPORTING_DATE <dttm> 2016-01-31, 2016-01-31, 2016-01-31, 2016-01…
## $ ACCOUNT_NUMBER <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1…
## $ CUSTOMER_ID <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1…
## $ PROGRAM_NAME <chr> "Auto Loans 50% Down Payment - Employed", "P…
## $ LOAN_OPEN_DATE <dttm> 2015-11-25, 2015-12-08, 2016-01-12, 2015-11…
## $ EXPECTED_CLOSE_DATE <dttm> 2020-11-03, 2017-12-03, 2021-01-03, 2019-10…
## $ ORIGINAL_BOOKED_AMOUNT <dbl> 91000, 35000, 52500, 103000, 94250, 54500, 1…
## $ OUTSTANDING <dbl> 88223.40, 33714.82, 52500.00, 99054.45, 8945…
## $ BUCKET <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ SEX <chr> "M", "M", "F", "M", "M", "M", "M", "M", "F",…
## $ CUSTOMER_OPEN_DATE <dttm> 2015-10-27, 2015-11-29, 2015-12-28, 2015-10…
## $ BIRTH_DATE <dttm> 1986-03-24, 1985-08-18, 1985-07-02, 1979-01…
## $ PROFESSION <chr> "EMPLOYEE", "EMPLOYEE", "HOUSEWIFE", "Shop O…
## $ CAR_TYPE <chr> "KIA", "CARRY", "CHEVROLET", "MITSUBISHI", "…
## $ Loan_to_outstanding_ratio <dbl> 0.9694879, 0.9632806, 1.0000000, 0.9616937, …
## $ Remaining_tenure <dbl> 1738, 672, 1799, 1341, 1007, 1738, 1068, 106…
This code chunk calculates the remaining loan tenure for each loan in the dataset based on the loan expected end date and reporting date. It uses the ‘difftime’ function to calculate the difference in days between the two dates.
# Segment customers based on risk categories
customer_segments <- loandata %>%
group_by(CUSTOMER_ID) %>%
summarise(
Average_remaining_tenure = mean(Remaining_tenure),
Average_loan_to_outstanding_ratio = mean(Loan_to_outstanding_ratio),
Latest_Bucket = last(BUCKET) # Get the latest bucket value for each customer
) %>%
mutate(Risk_Category = mapply(assign_risk_category, Average_remaining_tenure, Average_loan_to_outstanding_ratio, Latest_Bucket))
glimpse(customer_segments)
## Rows: 38,396
## Columns: 5
## $ CUSTOMER_ID <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 1…
## $ Average_remaining_tenure <dbl> 1419.5909, 338.3913, 1252.3514, 1022…
## $ Average_loan_to_outstanding_ratio <dbl> 0.8131058, 0.5285329, 0.7605609, 0.7…
## $ Latest_Bucket <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 7, 0, 0, 0, …
## $ Risk_Category <chr> "Low Risk", "Moderate Risk", "Very L…
This code chunk segments customers based on their average remaining loan tenure, average loan-to-outstanding ratio, and latest bucket value. It calculates these metrics for each customer and assigns a risk category using the previously defined function ‘assign_risk_category’.
# Visualize the segmentation with risk categories
ggplot(customer_segments, aes(x = Average_remaining_tenure, y = Average_loan_to_outstanding_ratio, color = Risk_Category)) +
geom_point() +
labs(title = "Customer Segmentation by Risk Level",
x = "Average Remaining Loan Tenure",
y = "Average Loan-to-Outstanding Ratio",
color = "Risk Category") +
theme_minimal()
This code chunk visualizes the segmentation of customers by risk level using a scatter plot. The x-axis represents the average remaining loan tenure, the y-axis represents the average loan-to-outstanding ratio, and the points are colored based on the assigned risk category, to provide insights into customer segmentation.
Aggressive Monitoring: Implement a real-time monitoring system to closely track customers in this segment. Any signs of default or late payments should trigger immediate actions.
Restructuring Options: Offer loan restructuring options to customers in this segment to help them manage their debts more effectively, such as extending the loan tenure or reducing interest rates temporarily.
Collateralization: Require collateral for new loans or increase collateral requirements for existing customers in this segment to mitigate potential losses in case of default.
Financial Counseling: Provide financial counseling services to customers in this segment to help them improve their financial literacy and better manage their debts.
Early Intervention: Implement early intervention strategies to proactively address any emerging issues with customers in this segment before they escalate into defaults.
Flexible Repayment Plans: Offer flexible repayment plans for customers facing temporary financial difficulties, such as unemployment or medical emergencies.
Credit Education: Offer educational resources or workshops to help customers in this segment better understand credit management principles and improve their financial habits.
Diversification: Diversify the loan portfolio by expanding into different markets or product offerings to reduce concentration risk associated with this segment.
Reward Programs: Implement reward programs or incentives for customers in this segment who demonstrate responsible borrowing behavior, such as making timely payments or maintaining low levels of debt.
Cross-Selling: Cross-sell additional financial products or services to customers in this segment, such as investment opportunities or insurance products, to deepen the relationship.
Credit Limit Increases: Offer credit limit increases to customers in this segment based on their positive credit history and repayment behavior, while still maintaining cautious risk management practices.
**The effectiveness of these strategies will depend on various factors, including the specific characteristics of the customer base, market conditions, and regulatory requirements. It’s important to continually monitor and adjust these strategies based on evolving risk factors and business objectives.