Alembert Company is a company engaged in business loan services for the MSME sector. Due to this pandemic, the company is trying to provide services in the form of loan waivers for customers which are referred to as follow-up recommendations. The provision of follow-up recommendations to customers is based on certain criteria, and it is necessary to determine what factors influence so that customers get certain treatments that are included in the follow-up program recommendations from the company.
Project Objective: Classification of customers to be included in the follow-up recommendations. In the target class, this follow-up recommendation itself consists of several classes such as restructuring and regular installments.
Model: Multinomial regression.
Model reference (Ref): A class on follow-up recommendations that have many customers.
Data: The data used consists of 1000 rows.
data = read.csv("https://storage.googleapis.com/dqlab-dataset/project.csv")
head(data)
## X NAMA_NASABAH NOMOR_KONTRAK DOMISILI KARAKTER
## 1 0 YOLI SEPINA NAINGGOLAN 0 MASIH TETAP KOOPERATIF
## 2 1 ERWIN NASUTION 1 MASIH TETAP TIDAK KOOPERATIF
## 3 2 HUSIN 2 MASIH TETAP TIDAK KOOPERATIF
## 4 3 HARITSYAH 3 PINDAH PERMANEN KOOPERATIF
## 5 4 HARIRI PANGGABEAN 4 MASIH TETAP TIDAK KOOPERATIF
## 6 5 JHON PREDDY HUTABARAT 5 MASIH TETAP KOOPERATIF
## PROFESI KONDISI_USAHA KONDISI_JAMINAN STATUS PRODUK PYD
## 1 IBU RUMAH TANGGA 2 Baik 2 3 30000000
## 2 NELAYAN 3 Rusak 8 3 10000000
## 3 LAINNYA 3 Baik 8 3 60000000
## 4 PNS 1 Rusak 7 3 90000000
## 5 WIRAUSAHA / PEDAGANG 1 Baik 8 3 150000000
## 6 WIRAUSAHA / PEDAGANG 1 Baik 7 3 40000000
## TENOR OSL KEWAJIBAN KOLEKTIBILITAS COUNT_SURVEY
## 1 24 28750000 4896841 DALAM PENGAWASAN KHUSUS 1
## 2 12 2040693 0 MACET 1
## 3 24 0 0 MACET 1
## 4 18 0 0 MACET 1
## 5 18 19844807 0 MACET 1
## 6 36 27298726 2208516 DALAM PENGAWASAN KHUSUS 1
## REKOMENDASI_TINDAK_LANJUT
## 1 Angsuran Biasa
## 2 Penarikan
## 3 Penarikan
## 4 Angsuran Biasa
## 5 Penarikan
## 6 Restrukturisasi
str(data)
## 'data.frame': 1000 obs. of 17 variables:
## $ X : int 0 1 2 3 4 5 6 7 8 9 ...
## $ NAMA_NASABAH : chr "YOLI SEPINA NAINGGOLAN" "ERWIN NASUTION" "HUSIN" "HARITSYAH" ...
## $ NOMOR_KONTRAK : int 0 1 2 3 4 5 6 7 8 9 ...
## $ DOMISILI : chr "MASIH TETAP" "MASIH TETAP" "MASIH TETAP" "PINDAH PERMANEN" ...
## $ KARAKTER : chr "KOOPERATIF" "TIDAK KOOPERATIF" "TIDAK KOOPERATIF" "KOOPERATIF" ...
## $ PROFESI : chr "IBU RUMAH TANGGA" "NELAYAN" "LAINNYA" "PNS" ...
## $ KONDISI_USAHA : int 2 3 3 1 1 1 3 2 2 3 ...
## $ KONDISI_JAMINAN : chr "Baik" "Rusak" "Baik" "Rusak" ...
## $ STATUS : int 2 8 8 7 8 7 7 7 7 3 ...
## $ PRODUK : int 3 3 3 3 3 3 3 3 3 3 ...
## $ PYD : int 30000000 10000000 60000000 90000000 150000000 40000000 60000000 7500000 45000000 50000000 ...
## $ TENOR : int 24 12 24 18 18 36 36 4 24 36 ...
## $ OSL : int 28750000 2040693 0 0 19844807 27298726 19999200 7500000 45000000 6944100 ...
## $ KEWAJIBAN : int 4896841 0 0 0 0 2208516 6946592 0 0 7730984 ...
## $ KOLEKTIBILITAS : chr "DALAM PENGAWASAN KHUSUS" "MACET" "MACET" "MACET" ...
## $ COUNT_SURVEY : int 1 1 1 1 1 1 2 2 1 1 ...
## $ REKOMENDASI_TINDAK_LANJUT: chr "Angsuran Biasa" "Penarikan" "Penarikan" "Angsuran Biasa" ...
summary(data$OSL)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 3999950 8687350 26562373 32082900 440932336
In the data, I don’t actually need the customer’s name to be given a recommendation. Or in other words, the customer marker to be given a recommendation is simply to look at the customer’s no_kontrak alone.
data_reduce = data[-c(1,2)]
colnames(data_reduce)
## [1] "NOMOR_KONTRAK" "DOMISILI"
## [3] "KARAKTER" "PROFESI"
## [5] "KONDISI_USAHA" "KONDISI_JAMINAN"
## [7] "STATUS" "PRODUK"
## [9] "PYD" "TENOR"
## [11] "OSL" "KEWAJIBAN"
## [13] "KOLEKTIBILITAS" "COUNT_SURVEY"
## [15] "REKOMENDASI_TINDAK_LANJUT"
data_reduce[, 8:11] = sapply(data_reduce[, 8:11], as.numeric)
Category data can be selected through columns “KONDISI_USAHA”, “KONDISI_JAMINAN”, “REKOMENDASI_TINDAK_LANJUT”
Use the chi-square test can be used to see the relationships between the following categorical variables:
data_kategorik = data_reduce[,c("KONDISI_USAHA", "KONDISI_JAMINAN","REKOMENDASI_TINDAK_LANJUT")]
data_reduce$REKOMENDASI_TINDAK_LANJUT = as.factor (data_reduce$REKOMENDASI_TINDAK_LANJUT)
chisq.test(data_kategorik$KONDISI_USAHA, data_kategorik$REKOMENDASI_TINDAK_LANJUT)
## Warning in chisq.test(data_kategorik$KONDISI_USAHA,
## data_kategorik$REKOMENDASI_TINDAK_LANJUT): Chi-squared approximation may be
## incorrect
##
## Pearson's Chi-squared test
##
## data: data_kategorik$KONDISI_USAHA and data_kategorik$REKOMENDASI_TINDAK_LANJUT
## X-squared = 129.82, df = 6, p-value < 2.2e-16
chisq.test(data_kategorik$KONDISI_JAMINAN, data_kategorik$REKOMENDASI_TINDAK_LANJUT)
## Warning in chisq.test(data_kategorik$KONDISI_JAMINAN,
## data_kategorik$REKOMENDASI_TINDAK_LANJUT): Chi-squared approximation may be
## incorrect
##
## Pearson's Chi-squared test
##
## data: data_kategorik$KONDISI_JAMINAN and data_kategorik$REKOMENDASI_TINDAK_LANJUT
## X-squared = 162.87, df = 9, p-value < 2.2e-16
If the p-value < 0.05, then reject H0 (there is no relationship between the variables). It can be seen from the result that the p-value < 0.05, meaning that there is a relationship between the two columns and REKOMENDASI_TINDAK_LANJUT
In addition to seeing relationships in categorical data, we can also see relationships between numerical variables. Yes. We’re going to use correlation.
library(corrplot)
## Warning: package 'corrplot' was built under R version 4.2.2
## corrplot 0.92 loaded
library(ggcorrplot)
## Warning: package 'ggcorrplot' was built under R version 4.2.2
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 4.2.2
M = data_reduce[,8:11]
#Library corrplot
#-- Pearson correlation
par(mfrow=c(2,2))
corrplot(cor(M), type="upper",order="hclust")
corrplot(cor(M), method="square", type="upper")
corrplot(cor(M), method="number", type="lower")
corrplot(cor(M), method="ellipse")
#--Kendall correlation
par(mfrow=c(2,2))
corrplot(cor(M, method="kendall"), type="upper", order="hclust")
corrplot(cor(M, method="kendall"), method="square", type="upper")
corrplot(cor(M, method="kendall"), method="number", type="lower")
corrplot(cor(M, method="kendall"), method="ellipse")
# library ggcorrplot
corr = round(cor(M),1) # Pearson correlation
ggcorrplot(round(cor(M),1),
hc.order = TRUE,
type ="lower",
lab = TRUE,
lab_size = 3,
method = "circle",
colors = c("tomato2","white", "springgreen3"),
title="Correlogram of Data Nasabah",
ggtheme=theme_bw)
## Feature selection/independent variable/input In doing modeling, of
course, we need to review what variables have an effect on our model,
especially in classification. On this occasion we use the Multinomial
Regression model.
Then how to determine what variables have an effect?
There are many alternatives, one of which is Information Gain. Through information gain, the importance value of the variable is taken which is more than 0.02 (you can explore what happens if we take a value that is less than 0.02).
colnames(data_reduce)
## [1] "NOMOR_KONTRAK" "DOMISILI"
## [3] "KARAKTER" "PROFESI"
## [5] "KONDISI_USAHA" "KONDISI_JAMINAN"
## [7] "STATUS" "PRODUK"
## [9] "PYD" "TENOR"
## [11] "OSL" "KEWAJIBAN"
## [13] "KOLEKTIBILITAS" "COUNT_SURVEY"
## [15] "REKOMENDASI_TINDAK_LANJUT"
data_select =
data_reduce[,c("KARAKTER","KONDISI_USAHA","KONDISI_JAMINAN","STATUS","KEWAJIBAN","OSL","KOLEKTIBILITAS","REKOMENDASI_TINDAK_LANJUT")]
data_non_na = na.omit(data_select)
head(data_non_na,9)
## KARAKTER KONDISI_USAHA KONDISI_JAMINAN STATUS KEWAJIBAN OSL
## 1 KOOPERATIF 2 Baik 2 4896841 28750000
## 2 TIDAK KOOPERATIF 3 Rusak 8 0 2040693
## 3 TIDAK KOOPERATIF 3 Baik 8 0 0
## 4 KOOPERATIF 1 Rusak 7 0 0
## 5 TIDAK KOOPERATIF 1 Baik 8 0 19844807
## 6 KOOPERATIF 1 Baik 7 2208516 27298726
## 7 KOOPERATIF 3 Baik 7 6946592 19999200
## 8 KOOPERATIF 2 Baik 7 0 7500000
## 9 KOOPERATIF 2 Baik 7 0 45000000
## KOLEKTIBILITAS REKOMENDASI_TINDAK_LANJUT
## 1 DALAM PENGAWASAN KHUSUS Angsuran Biasa
## 2 MACET Penarikan
## 3 MACET Penarikan
## 4 MACET Angsuran Biasa
## 5 MACET Penarikan
## 6 DALAM PENGAWASAN KHUSUS Restrukturisasi
## 7 DALAM PENGAWASAN KHUSUS Restrukturisasi
## 8 LANCAR Restrukturisasi
## 9 LANCAR Restrukturisasi
To provide good model performance, our data needs to be done certain treatments, for example scaling or grouping data or also called bucketing.
data_select_new = data_select
data_select_new$KEWAJIBAN = scale(data_select_new$KEWAJIBAN)[, 1]
data_select_new$OSL = scale(data_select_new$OSL)[,1]
data_select_new$KEWAJIBAN = cut(data_select_new$KEWAJIBAN, breaks = c(-0.354107,5,15,30))
data_select_new$KEWAJIBAN = as.factor(data_select_new$KEWAJIBAN)
data_select_new$OSL = cut(data_select_new$OSL, breaks = c(-0.60383,3,10,15))
data_select_new$OSL = as.factor(data_select_new$OSL)
data_select_new = na.omit(data_select_new)
head(data_select_new, 9)
## KARAKTER KONDISI_USAHA KONDISI_JAMINAN STATUS KEWAJIBAN OSL
## 1 KOOPERATIF 2 Baik 2 (-0.354,5] (-0.604,3]
## 2 TIDAK KOOPERATIF 3 Rusak 8 (-0.354,5] (-0.604,3]
## 3 TIDAK KOOPERATIF 3 Baik 8 (-0.354,5] (-0.604,3]
## 4 KOOPERATIF 1 Rusak 7 (-0.354,5] (-0.604,3]
## 5 TIDAK KOOPERATIF 1 Baik 8 (-0.354,5] (-0.604,3]
## 6 KOOPERATIF 1 Baik 7 (-0.354,5] (-0.604,3]
## 7 KOOPERATIF 3 Baik 7 (-0.354,5] (-0.604,3]
## 8 KOOPERATIF 2 Baik 7 (-0.354,5] (-0.604,3]
## 9 KOOPERATIF 2 Baik 7 (-0.354,5] (-0.604,3]
## KOLEKTIBILITAS REKOMENDASI_TINDAK_LANJUT
## 1 DALAM PENGAWASAN KHUSUS Angsuran Biasa
## 2 MACET Penarikan
## 3 MACET Penarikan
## 4 MACET Angsuran Biasa
## 5 MACET Penarikan
## 6 DALAM PENGAWASAN KHUSUS Restrukturisasi
## 7 DALAM PENGAWASAN KHUSUS Restrukturisasi
## 8 LANCAR Restrukturisasi
## 9 LANCAR Restrukturisasi
Before entering into modeling,we need to separate our data into training and testing (some divide it into training, testing, and validation). The purpose of this data separation is to see the ability of our model to make predictions as well as the purpose of our modeling.
library(caret)
## Warning: package 'caret' was built under R version 4.2.2
## Loading required package: lattice
library(lattice)
index = createDataPartition(data_select_new$REKOMENDASI_TINDAK_LANJUT, p = .95, list = FALSE)
train = data_select_new[index, ]
test = data_select_new[index, ]
dim(train)
## [1] 952 8
dim(test)
## [1] 952 8
Remember that we are using a Multinomial Regression Model, where we need to specify a reference of the target class. This target class reference is the class that has the highest number of members.
train2 = train
# Setting the reference
train2$REKOMENDASI_TINDAK_LANJUT = relevel(train2$REKOMENDASI_TINDAK_LANJUT, ref = "Angsuran Biasa")
# training model
require(nnet)
## Loading required package: nnet
## Warning: package 'nnet' was built under R version 4.2.2
# training the multinomial mode
multinom_model = multinom(REKOMENDASI_TINDAK_LANJUT ~ ., data=train2)
## # weights: 64 (45 variable)
## initial value 1319.752232
## iter 10 value 743.235375
## iter 20 value 616.432909
## iter 30 value 612.563476
## iter 40 value 612.412202
## iter 50 value 612.387772
## iter 60 value 612.387449
## iter 60 value 612.387446
## iter 60 value 612.387446
## final value 612.387446
## converged
# checking model
summary(multinom_model)
## Warning in sqrt(diag(vc)): NaNs produced
## Call:
## multinom(formula = REKOMENDASI_TINDAK_LANJUT ~ ., data = train2)
##
## Coefficients:
## (Intercept) KARAKTERTIDAK KOOPERATIF KONDISI_USAHA
## Diskon Pelunasan -3.249705 1.810700 -0.7507187
## Penarikan -7.470390 4.112703 0.7741951
## Restrukturisasi -4.113883 -2.234672 0.9887580
## KONDISI_JAMINANHilang KONDISI_JAMINANPindah Tangan
## Diskon Pelunasan -22.4687812 -20.6245431
## Penarikan 0.6878420 0.4362509
## Restrukturisasi 0.4500063 -13.8932023
## KONDISI_JAMINANRusak STATUS KEWAJIBAN(5,15]
## Diskon Pelunasan -23.0638486 0.09018175 0
## Penarikan -1.1333422 0.07871418 0
## Restrukturisasi 0.2776425 0.31833914 0
## KEWAJIBAN(15,30] OSL(3,10] OSL(10,15]
## Diskon Pelunasan -3.5842713 3.2792970 0
## Penarikan 0.7381747 -22.8015258 0
## Restrukturisasi 23.0708746 0.6303358 0
## KOLEKTIBILITASDIRAGUKAN KOLEKTIBILITASKURANG LANCAR
## Diskon Pelunasan 2.7648800 0.3241799
## Penarikan 4.3963220 0.7007799
## Restrukturisasi -0.2825678 -0.1598568
## KOLEKTIBILITASLANCAR KOLEKTIBILITASMACET
## Diskon Pelunasan 0.2575033 0.6756479
## Penarikan -35.4163473 1.6162054
## Restrukturisasi 0.8280166 -3.1832824
##
## Std. Errors:
## (Intercept) KARAKTERTIDAK KOOPERATIF KONDISI_USAHA
## Diskon Pelunasan 1.104367 0.8940158 0.3852032
## Penarikan 1.284050 0.6871785 0.3919627
## Restrukturisasi 0.354970 0.7807061 0.1037661
## KONDISI_JAMINANHilang KONDISI_JAMINANPindah Tangan
## Diskon Pelunasan 2.283885e-11 2.444888e-10
## Penarikan 1.175756e+00 6.830980e-01
## Restrukturisasi 1.174936e+00 5.753451e-07
## KONDISI_JAMINANRusak STATUS KEWAJIBAN(5,15]
## Diskon Pelunasan 3.609143e-11 0.1408186 5.630232e-15
## Penarikan 1.350562e+00 0.1064586 1.294408e-16
## Restrukturisasi 1.224095e+00 0.0372587 0.000000e+00
## KEWAJIBAN(15,30] OSL(3,10] OSL(10,15]
## Diskon Pelunasan 4.136453e-13 1.071363e+00 9.704891e-16
## Penarikan NaN 1.704960e-12 NaN
## Restrukturisasi 1.889551e-11 6.991349e-01 0.000000e+00
## KOLEKTIBILITASDIRAGUKAN KOLEKTIBILITASKURANG LANCAR
## Diskon Pelunasan 0.9208343 0.7435020
## Penarikan 0.9036990 0.6890898
## Restrukturisasi 0.5845599 0.1928828
## KOLEKTIBILITASLANCAR KOLEKTIBILITASMACET
## Diskon Pelunasan 9.404178e-01 1.2080996
## Penarikan 1.360775e-16 0.7222968
## Restrukturisasi 2.166407e-01 1.0675992
##
## Residual Deviance: 1224.775
## AIC: 1302.775
# converting the coefficients to odds by taking the exponential of the coefficients
exp(coef(multinom_model))
## (Intercept) KARAKTERTIDAK KOOPERATIF KONDISI_USAHA
## Diskon Pelunasan 0.0387856565 6.1147236 0.4720272
## Penarikan 0.0005697059 61.1116529 2.1688457
## Restrukturisasi 0.0163441848 0.1070273 2.6878940
## KONDISI_JAMINANHilang KONDISI_JAMINANPindah Tangan
## Diskon Pelunasan 1.745550e-10 1.103760e-09
## Penarikan 1.989418e+00 1.546897e+00
## Restrukturisasi 1.568322e+00 9.252496e-07
## KONDISI_JAMINANRusak STATUS KEWAJIBAN(5,15] KEWAJIBAN(15,30]
## Diskon Pelunasan 9.627152e-11 1.094373 1 2.775689e-02
## Penarikan 3.219554e-01 1.081895 1 2.092113e+00
## Restrukturisasi 1.320014e+00 1.374842 1 1.046053e+10
## OSL(3,10] OSL(10,15] KOLEKTIBILITASDIRAGUKAN
## Diskon Pelunasan 2.655710e+01 1 15.8771345
## Penarikan 1.251478e-10 1 81.1518401
## Restrukturisasi 1.878241e+00 1 0.7538455
## KOLEKTIBILITASKURANG LANCAR KOLEKTIBILITASLANCAR
## Diskon Pelunasan 1.3828960 1.293696e+00
## Penarikan 2.0153238 4.157917e-16
## Restrukturisasi 0.8522658 2.288775e+00
## KOLEKTIBILITASMACET
## Diskon Pelunasan 1.96530593
## Penarikan 5.03395212
## Restrukturisasi 0.04144938
head(round(fitted(multinom_model), 2))
## Angsuran Biasa Diskon Pelunasan Penarikan Restrukturisasi
## 1 0.81 0.01 0.00 0.18
## 2 0.48 0.00 0.51 0.01
## 3 0.22 0.02 0.75 0.00
## 4 0.97 0.00 0.00 0.02
## 5 0.46 0.21 0.33 0.00
## 6 0.69 0.02 0.00 0.28
For specific purposes, we need to know the opportunity of each row of data (a representative of the customer) to enter on a specific target class. The goal is to see how influential the model is for classification. In addition, it can also be from a business point of view, in the case of real determination of the threshold on the opportunity value is also associated with several factors, for example customer revenue. To see the opportunity value that a customer enters on a specific target class.
# predicting the values for train dataset
train2$ClassPredicted = predict(multinom_model, newdata = train2, "class")
train_prob = predict(multinom_model, newdata = train2, "probs")
df = train_prob
df$max = apply(df,1,max)
## Warning in df$max = apply(df, 1, max): Coercing LHS to a list
train2$score = df$max
test_prob = predict(multinom_model, newdata = test, "probs")
df2 = test_prob
df2$max = apply(df2,1,max)
## Warning in df2$max = apply(df2, 1, max): Coercing LHS to a list
# Building classification table
tab_train = table(train2$REKOMENDASI_TINDAK_LANJUT, train2$ClassPredicted)
round((sum(diag(tab_train))/sum(tab_train))*100,4)
## [1] 68.5924
test$ClassPredicted = predict(multinom_model, newdata = test, "class")
test$score = df2$max
tab_test = table(test$REKOMENDASI_TINDAK_LANJUT, test$ClassPredicted)
round((sum(diag(tab_test))/sum(tab_test))*100,4)
## [1] 68.5924