Project Analisa Klasifikasi Pinjaman untuk Sektor UMKM

Introduction

Alembert Company is a company engaged in business loan services for the MSME sector. Due to this pandemic, the company is trying to provide services in the form of loan waivers for customers which are referred to as follow-up recommendations. The provision of follow-up recommendations to customers is based on certain criteria, and it is necessary to determine what factors influence so that customers get certain treatments that are included in the follow-up program recommendations from the company.

Project Objective: Classification of customers to be included in the follow-up recommendations. In the target class, this follow-up recommendation itself consists of several classes such as restructuring and regular installments.

Model: Multinomial regression.

Model reference (Ref): A class on follow-up recommendations that have many customers.

Data: The data used consists of 1000 rows.

Read External Data

data = read.csv("https://storage.googleapis.com/dqlab-dataset/project.csv")

Inspection Data

head(data)

##   X           NAMA_NASABAH NOMOR_KONTRAK        DOMISILI         KARAKTER
## 1 0 YOLI SEPINA NAINGGOLAN             0     MASIH TETAP       KOOPERATIF
## 2 1         ERWIN NASUTION             1     MASIH TETAP TIDAK KOOPERATIF
## 3 2                  HUSIN             2     MASIH TETAP TIDAK KOOPERATIF
## 4 3              HARITSYAH             3 PINDAH PERMANEN       KOOPERATIF
## 5 4      HARIRI PANGGABEAN             4     MASIH TETAP TIDAK KOOPERATIF
## 6 5  JHON PREDDY HUTABARAT             5     MASIH TETAP       KOOPERATIF
##                PROFESI KONDISI_USAHA KONDISI_JAMINAN STATUS PRODUK       PYD
## 1     IBU RUMAH TANGGA             2            Baik      2      3  30000000
## 2              NELAYAN             3           Rusak      8      3  10000000
## 3              LAINNYA             3            Baik      8      3  60000000
## 4                  PNS             1           Rusak      7      3  90000000
## 5 WIRAUSAHA / PEDAGANG             1            Baik      8      3 150000000
## 6 WIRAUSAHA / PEDAGANG             1            Baik      7      3  40000000
##   TENOR      OSL KEWAJIBAN          KOLEKTIBILITAS COUNT_SURVEY
## 1    24 28750000   4896841 DALAM PENGAWASAN KHUSUS            1
## 2    12  2040693         0                   MACET            1
## 3    24        0         0                   MACET            1
## 4    18        0         0                   MACET            1
## 5    18 19844807         0                   MACET            1
## 6    36 27298726   2208516 DALAM PENGAWASAN KHUSUS            1
##   REKOMENDASI_TINDAK_LANJUT
## 1            Angsuran Biasa
## 2                 Penarikan
## 3                 Penarikan
## 4            Angsuran Biasa
## 5                 Penarikan
## 6           Restrukturisasi

str(data)

## 'data.frame':    1000 obs. of  17 variables:
##  $ X                        : int  0 1 2 3 4 5 6 7 8 9 ...
##  $ NAMA_NASABAH             : chr  "YOLI SEPINA NAINGGOLAN" "ERWIN NASUTION" "HUSIN" "HARITSYAH" ...
##  $ NOMOR_KONTRAK            : int  0 1 2 3 4 5 6 7 8 9 ...
##  $ DOMISILI                 : chr  "MASIH TETAP" "MASIH TETAP" "MASIH TETAP" "PINDAH PERMANEN" ...
##  $ KARAKTER                 : chr  "KOOPERATIF" "TIDAK KOOPERATIF" "TIDAK KOOPERATIF" "KOOPERATIF" ...
##  $ PROFESI                  : chr  "IBU RUMAH TANGGA" "NELAYAN" "LAINNYA" "PNS" ...
##  $ KONDISI_USAHA            : int  2 3 3 1 1 1 3 2 2 3 ...
##  $ KONDISI_JAMINAN          : chr  "Baik" "Rusak" "Baik" "Rusak" ...
##  $ STATUS                   : int  2 8 8 7 8 7 7 7 7 3 ...
##  $ PRODUK                   : int  3 3 3 3 3 3 3 3 3 3 ...
##  $ PYD                      : int  30000000 10000000 60000000 90000000 150000000 40000000 60000000 7500000 45000000 50000000 ...
##  $ TENOR                    : int  24 12 24 18 18 36 36 4 24 36 ...
##  $ OSL                      : int  28750000 2040693 0 0 19844807 27298726 19999200 7500000 45000000 6944100 ...
##  $ KEWAJIBAN                : int  4896841 0 0 0 0 2208516 6946592 0 0 7730984 ...
##  $ KOLEKTIBILITAS           : chr  "DALAM PENGAWASAN KHUSUS" "MACET" "MACET" "MACET" ...
##  $ COUNT_SURVEY             : int  1 1 1 1 1 1 2 2 1 1 ...
##  $ REKOMENDASI_TINDAK_LANJUT: chr  "Angsuran Biasa" "Penarikan" "Penarikan" "Angsuran Biasa" ...

Descriptive Statistics data

summary(data$OSL)

##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##         0   3999950   8687350  26562373  32082900 440932336

Deleting a column

In the data, I don’t actually need the customer’s name to be given a recommendation. Or in other words, the customer marker to be given a recommendation is simply to look at the customer’s no_kontrak alone.

data_reduce = data[-c(1,2)]
colnames(data_reduce)

##  [1] "NOMOR_KONTRAK"             "DOMISILI"                 
##  [3] "KARAKTER"                  "PROFESI"                  
##  [5] "KONDISI_USAHA"             "KONDISI_JAMINAN"          
##  [7] "STATUS"                    "PRODUK"                   
##  [9] "PYD"                       "TENOR"                    
## [11] "OSL"                       "KEWAJIBAN"                
## [13] "KOLEKTIBILITAS"            "COUNT_SURVEY"             
## [15] "REKOMENDASI_TINDAK_LANJUT"

data_reduce[, 8:11] = sapply(data_reduce[, 8:11], as.numeric)

Category data selection

Category data can be selected through columns “KONDISI_USAHA”, “KONDISI_JAMINAN”, “REKOMENDASI_TINDAK_LANJUT”

Use the chi-square test can be used to see the relationships between the following categorical variables:

data_kategorik = data_reduce[,c("KONDISI_USAHA", "KONDISI_JAMINAN","REKOMENDASI_TINDAK_LANJUT")]
data_reduce$REKOMENDASI_TINDAK_LANJUT = as.factor (data_reduce$REKOMENDASI_TINDAK_LANJUT)
chisq.test(data_kategorik$KONDISI_USAHA, data_kategorik$REKOMENDASI_TINDAK_LANJUT)

## Warning in chisq.test(data_kategorik$KONDISI_USAHA,
## data_kategorik$REKOMENDASI_TINDAK_LANJUT): Chi-squared approximation may be
## incorrect

## 
##  Pearson's Chi-squared test
## 
## data:  data_kategorik$KONDISI_USAHA and data_kategorik$REKOMENDASI_TINDAK_LANJUT
## X-squared = 129.82, df = 6, p-value < 2.2e-16

chisq.test(data_kategorik$KONDISI_JAMINAN, data_kategorik$REKOMENDASI_TINDAK_LANJUT)

## Warning in chisq.test(data_kategorik$KONDISI_JAMINAN,
## data_kategorik$REKOMENDASI_TINDAK_LANJUT): Chi-squared approximation may be
## incorrect

## 
##  Pearson's Chi-squared test
## 
## data:  data_kategorik$KONDISI_JAMINAN and data_kategorik$REKOMENDASI_TINDAK_LANJUT
## X-squared = 162.87, df = 9, p-value < 2.2e-16

If the p-value < 0.05, then reject H0 (there is no relationship between the variables). It can be seen from the result that the p-value < 0.05, meaning that there is a relationship between the two columns and REKOMENDASI_TINDAK_LANJUT

Correlation between data variables

In addition to seeing relationships in categorical data, we can also see relationships between numerical variables. Yes. We’re going to use correlation.

library(corrplot)

## Warning: package 'corrplot' was built under R version 4.2.2

## corrplot 0.92 loaded

library(ggcorrplot)

## Warning: package 'ggcorrplot' was built under R version 4.2.2

## Loading required package: ggplot2

## Warning: package 'ggplot2' was built under R version 4.2.2

M = data_reduce[,8:11]

#Library corrplot
#-- Pearson correlation
par(mfrow=c(2,2))
corrplot(cor(M), type="upper",order="hclust")
corrplot(cor(M), method="square", type="upper")
corrplot(cor(M), method="number", type="lower")
corrplot(cor(M), method="ellipse")

#--Kendall correlation
par(mfrow=c(2,2))
corrplot(cor(M, method="kendall"), type="upper", order="hclust")
corrplot(cor(M, method="kendall"), method="square", type="upper")
corrplot(cor(M, method="kendall"), method="number", type="lower")
corrplot(cor(M, method="kendall"), method="ellipse")

# library ggcorrplot
corr = round(cor(M),1) # Pearson correlation
ggcorrplot(round(cor(M),1),
          hc.order = TRUE,
          type ="lower",
          lab = TRUE,
          lab_size = 3,
          method = "circle",
          colors = c("tomato2","white", "springgreen3"),
          title="Correlogram of Data Nasabah",
          ggtheme=theme_bw)

## Feature selection/independent variable/input In doing modeling, of course, we need to review what variables have an effect on our model, especially in classification. On this occasion we use the Multinomial Regression model.

Then how to determine what variables have an effect?

There are many alternatives, one of which is Information Gain. Through information gain, the importance value of the variable is taken which is more than 0.02 (you can explore what happens if we take a value that is less than 0.02).

colnames(data_reduce)

##  [1] "NOMOR_KONTRAK"             "DOMISILI"                 
##  [3] "KARAKTER"                  "PROFESI"                  
##  [5] "KONDISI_USAHA"             "KONDISI_JAMINAN"          
##  [7] "STATUS"                    "PRODUK"                   
##  [9] "PYD"                       "TENOR"                    
## [11] "OSL"                       "KEWAJIBAN"                
## [13] "KOLEKTIBILITAS"            "COUNT_SURVEY"             
## [15] "REKOMENDASI_TINDAK_LANJUT"

data_select =
data_reduce[,c("KARAKTER","KONDISI_USAHA","KONDISI_JAMINAN","STATUS","KEWAJIBAN","OSL","KOLEKTIBILITAS","REKOMENDASI_TINDAK_LANJUT")]
data_non_na = na.omit(data_select)
head(data_non_na,9)

##           KARAKTER KONDISI_USAHA KONDISI_JAMINAN STATUS KEWAJIBAN      OSL
## 1       KOOPERATIF             2            Baik      2   4896841 28750000
## 2 TIDAK KOOPERATIF             3           Rusak      8         0  2040693
## 3 TIDAK KOOPERATIF             3            Baik      8         0        0
## 4       KOOPERATIF             1           Rusak      7         0        0
## 5 TIDAK KOOPERATIF             1            Baik      8         0 19844807
## 6       KOOPERATIF             1            Baik      7   2208516 27298726
## 7       KOOPERATIF             3            Baik      7   6946592 19999200
## 8       KOOPERATIF             2            Baik      7         0  7500000
## 9       KOOPERATIF             2            Baik      7         0 45000000
##            KOLEKTIBILITAS REKOMENDASI_TINDAK_LANJUT
## 1 DALAM PENGAWASAN KHUSUS            Angsuran Biasa
## 2                   MACET                 Penarikan
## 3                   MACET                 Penarikan
## 4                   MACET            Angsuran Biasa
## 5                   MACET                 Penarikan
## 6 DALAM PENGAWASAN KHUSUS           Restrukturisasi
## 7 DALAM PENGAWASAN KHUSUS           Restrukturisasi
## 8                  LANCAR           Restrukturisasi
## 9                  LANCAR           Restrukturisasi

Data Transformation

To provide good model performance, our data needs to be done certain treatments, for example scaling or grouping data or also called bucketing.

data_select_new = data_select
data_select_new$KEWAJIBAN = scale(data_select_new$KEWAJIBAN)[, 1]
data_select_new$OSL = scale(data_select_new$OSL)[,1]
data_select_new$KEWAJIBAN = cut(data_select_new$KEWAJIBAN, breaks = c(-0.354107,5,15,30))
data_select_new$KEWAJIBAN = as.factor(data_select_new$KEWAJIBAN) 
data_select_new$OSL = cut(data_select_new$OSL, breaks = c(-0.60383,3,10,15))
data_select_new$OSL = as.factor(data_select_new$OSL)
data_select_new = na.omit(data_select_new)
head(data_select_new, 9)

##           KARAKTER KONDISI_USAHA KONDISI_JAMINAN STATUS  KEWAJIBAN        OSL
## 1       KOOPERATIF             2            Baik      2 (-0.354,5] (-0.604,3]
## 2 TIDAK KOOPERATIF             3           Rusak      8 (-0.354,5] (-0.604,3]
## 3 TIDAK KOOPERATIF             3            Baik      8 (-0.354,5] (-0.604,3]
## 4       KOOPERATIF             1           Rusak      7 (-0.354,5] (-0.604,3]
## 5 TIDAK KOOPERATIF             1            Baik      8 (-0.354,5] (-0.604,3]
## 6       KOOPERATIF             1            Baik      7 (-0.354,5] (-0.604,3]
## 7       KOOPERATIF             3            Baik      7 (-0.354,5] (-0.604,3]
## 8       KOOPERATIF             2            Baik      7 (-0.354,5] (-0.604,3]
## 9       KOOPERATIF             2            Baik      7 (-0.354,5] (-0.604,3]
##            KOLEKTIBILITAS REKOMENDASI_TINDAK_LANJUT
## 1 DALAM PENGAWASAN KHUSUS            Angsuran Biasa
## 2                   MACET                 Penarikan
## 3                   MACET                 Penarikan
## 4                   MACET            Angsuran Biasa
## 5                   MACET                 Penarikan
## 6 DALAM PENGAWASAN KHUSUS           Restrukturisasi
## 7 DALAM PENGAWASAN KHUSUS           Restrukturisasi
## 8                  LANCAR           Restrukturisasi
## 9                  LANCAR           Restrukturisasi

Traning data

Before entering into modeling,we need to separate our data into training and testing (some divide it into training, testing, and validation). The purpose of this data separation is to see the ability of our model to make predictions as well as the purpose of our modeling.

library(caret)

## Warning: package 'caret' was built under R version 4.2.2

## Loading required package: lattice

library(lattice)
index = createDataPartition(data_select_new$REKOMENDASI_TINDAK_LANJUT, p = .95, list = FALSE)
train = data_select_new[index, ]
test = data_select_new[index, ]
 dim(train)

## [1] 952   8

dim(test)

## [1] 952   8

Modelling

Remember that we are using a Multinomial Regression Model, where we need to specify a reference of the target class. This target class reference is the class that has the highest number of members.

train2 = train 

# Setting the reference
train2$REKOMENDASI_TINDAK_LANJUT = relevel(train2$REKOMENDASI_TINDAK_LANJUT, ref = "Angsuran Biasa")
# training model
require(nnet)

## Loading required package: nnet

## Warning: package 'nnet' was built under R version 4.2.2

# training the multinomial mode
multinom_model = multinom(REKOMENDASI_TINDAK_LANJUT ~ ., data=train2)

## # weights:  64 (45 variable)
## initial  value 1319.752232 
## iter  10 value 743.235375
## iter  20 value 616.432909
## iter  30 value 612.563476
## iter  40 value 612.412202
## iter  50 value 612.387772
## iter  60 value 612.387449
## iter  60 value 612.387446
## iter  60 value 612.387446
## final  value 612.387446 
## converged

# checking model
summary(multinom_model)

## Warning in sqrt(diag(vc)): NaNs produced

## Call:
## multinom(formula = REKOMENDASI_TINDAK_LANJUT ~ ., data = train2)
## 
## Coefficients:
##                  (Intercept) KARAKTERTIDAK KOOPERATIF KONDISI_USAHA
## Diskon Pelunasan   -3.249705                 1.810700    -0.7507187
## Penarikan          -7.470390                 4.112703     0.7741951
## Restrukturisasi    -4.113883                -2.234672     0.9887580
##                  KONDISI_JAMINANHilang KONDISI_JAMINANPindah Tangan
## Diskon Pelunasan           -22.4687812                  -20.6245431
## Penarikan                    0.6878420                    0.4362509
## Restrukturisasi              0.4500063                  -13.8932023
##                  KONDISI_JAMINANRusak     STATUS KEWAJIBAN(5,15]
## Diskon Pelunasan          -23.0638486 0.09018175               0
## Penarikan                  -1.1333422 0.07871418               0
## Restrukturisasi             0.2776425 0.31833914               0
##                  KEWAJIBAN(15,30]   OSL(3,10] OSL(10,15]
## Diskon Pelunasan       -3.5842713   3.2792970          0
## Penarikan               0.7381747 -22.8015258          0
## Restrukturisasi        23.0708746   0.6303358          0
##                  KOLEKTIBILITASDIRAGUKAN KOLEKTIBILITASKURANG LANCAR
## Diskon Pelunasan               2.7648800                   0.3241799
## Penarikan                      4.3963220                   0.7007799
## Restrukturisasi               -0.2825678                  -0.1598568
##                  KOLEKTIBILITASLANCAR KOLEKTIBILITASMACET
## Diskon Pelunasan            0.2575033           0.6756479
## Penarikan                 -35.4163473           1.6162054
## Restrukturisasi             0.8280166          -3.1832824
## 
## Std. Errors:
##                  (Intercept) KARAKTERTIDAK KOOPERATIF KONDISI_USAHA
## Diskon Pelunasan    1.104367                0.8940158     0.3852032
## Penarikan           1.284050                0.6871785     0.3919627
## Restrukturisasi     0.354970                0.7807061     0.1037661
##                  KONDISI_JAMINANHilang KONDISI_JAMINANPindah Tangan
## Diskon Pelunasan          2.283885e-11                 2.444888e-10
## Penarikan                 1.175756e+00                 6.830980e-01
## Restrukturisasi           1.174936e+00                 5.753451e-07
##                  KONDISI_JAMINANRusak    STATUS KEWAJIBAN(5,15]
## Diskon Pelunasan         3.609143e-11 0.1408186    5.630232e-15
## Penarikan                1.350562e+00 0.1064586    1.294408e-16
## Restrukturisasi          1.224095e+00 0.0372587    0.000000e+00
##                  KEWAJIBAN(15,30]    OSL(3,10]   OSL(10,15]
## Diskon Pelunasan     4.136453e-13 1.071363e+00 9.704891e-16
## Penarikan                     NaN 1.704960e-12          NaN
## Restrukturisasi      1.889551e-11 6.991349e-01 0.000000e+00
##                  KOLEKTIBILITASDIRAGUKAN KOLEKTIBILITASKURANG LANCAR
## Diskon Pelunasan               0.9208343                   0.7435020
## Penarikan                      0.9036990                   0.6890898
## Restrukturisasi                0.5845599                   0.1928828
##                  KOLEKTIBILITASLANCAR KOLEKTIBILITASMACET
## Diskon Pelunasan         9.404178e-01           1.2080996
## Penarikan                1.360775e-16           0.7222968
## Restrukturisasi          2.166407e-01           1.0675992
## 
## Residual Deviance: 1224.775 
## AIC: 1302.775

# converting the coefficients to odds by taking the exponential of the coefficients
exp(coef(multinom_model))

##                   (Intercept) KARAKTERTIDAK KOOPERATIF KONDISI_USAHA
## Diskon Pelunasan 0.0387856565                6.1147236     0.4720272
## Penarikan        0.0005697059               61.1116529     2.1688457
## Restrukturisasi  0.0163441848                0.1070273     2.6878940
##                  KONDISI_JAMINANHilang KONDISI_JAMINANPindah Tangan
## Diskon Pelunasan          1.745550e-10                 1.103760e-09
## Penarikan                 1.989418e+00                 1.546897e+00
## Restrukturisasi           1.568322e+00                 9.252496e-07
##                  KONDISI_JAMINANRusak   STATUS KEWAJIBAN(5,15] KEWAJIBAN(15,30]
## Diskon Pelunasan         9.627152e-11 1.094373               1     2.775689e-02
## Penarikan                3.219554e-01 1.081895               1     2.092113e+00
## Restrukturisasi          1.320014e+00 1.374842               1     1.046053e+10
##                     OSL(3,10] OSL(10,15] KOLEKTIBILITASDIRAGUKAN
## Diskon Pelunasan 2.655710e+01          1              15.8771345
## Penarikan        1.251478e-10          1              81.1518401
## Restrukturisasi  1.878241e+00          1               0.7538455
##                  KOLEKTIBILITASKURANG LANCAR KOLEKTIBILITASLANCAR
## Diskon Pelunasan                   1.3828960         1.293696e+00
## Penarikan                          2.0153238         4.157917e-16
## Restrukturisasi                    0.8522658         2.288775e+00
##                  KOLEKTIBILITASMACET
## Diskon Pelunasan          1.96530593
## Penarikan                 5.03395212
## Restrukturisasi           0.04144938

head(round(fitted(multinom_model), 2))

##   Angsuran Biasa Diskon Pelunasan Penarikan Restrukturisasi
## 1           0.81             0.01      0.00            0.18
## 2           0.48             0.00      0.51            0.01
## 3           0.22             0.02      0.75            0.00
## 4           0.97             0.00      0.00            0.02
## 5           0.46             0.21      0.33            0.00
## 6           0.69             0.02      0.00            0.28

For specific purposes, we need to know the opportunity of each row of data (a representative of the customer) to enter on a specific target class. The goal is to see how influential the model is for classification. In addition, it can also be from a business point of view, in the case of real determination of the threshold on the opportunity value is also associated with several factors, for example customer revenue. To see the opportunity value that a customer enters on a specific target class.

# predicting the values for train dataset
train2$ClassPredicted = predict(multinom_model, newdata = train2, "class")
train_prob = predict(multinom_model, newdata = train2, "probs")
df = train_prob
df$max = apply(df,1,max)

## Warning in df$max = apply(df, 1, max): Coercing LHS to a list

train2$score = df$max
test_prob = predict(multinom_model, newdata = test, "probs")
df2 = test_prob
df2$max = apply(df2,1,max)

## Warning in df2$max = apply(df2, 1, max): Coercing LHS to a list

# Building classification table
tab_train = table(train2$REKOMENDASI_TINDAK_LANJUT, train2$ClassPredicted)
round((sum(diag(tab_train))/sum(tab_train))*100,4)

## [1] 68.5924

test$ClassPredicted = predict(multinom_model, newdata = test, "class")
test$score = df2$max
tab_test = table(test$REKOMENDASI_TINDAK_LANJUT, test$ClassPredicted)
round((sum(diag(tab_test))/sum(tab_test))*100,4)

## [1] 68.5924