LBB DSS Behaviour Credit Score

Problem Statement

As Data team, we have assumption in this bank that we are credit company that have market share. We want to decrease bad rate loss potential.

Objectives : To minimize losses, stricter cutoffs are needed in the scorecard.

Library

# install libraries
source("setup.R")
library(dplyr)      # data manipulation
library(UBL)        # upsampling
library(scorecard)  # scorecard

Data Input

dt <- read.csv("bankloans.csv")
head(dt)

##   age ed employ address income debtinc  creddebt  othdebt def
## 1  41  3     17      12    176     9.3 11.359392 5.008608   1
## 2  27  1     10       6     31    17.3  1.362202 4.000798   0
## 3  40  1     15      14     55     5.5  0.856075 2.168925   0
## 4  41  1     15      14    120     2.9  2.658720 0.821280   0
## 5  24  2      2       0     28    17.3  1.787436 3.056564   1
## 6  41  2      5       5     25    10.2  0.392700 2.157300   0

Dataset can be found here. Description of our dataset is below

age : Age of the Customers
ed : Education Level
employ : Work Experience
address : Address of the Customer
income : Yearly Income of the Customer
debtinc : Debt to Income Ratio
creddebt : Credit to Debt Ratio
othdebt : Other Debts
def : Target, 0 default, 1 not default

Exploratory Data Analysis

# check data types
describe(dt)

##    variable   class count missing_rate unique_count identical_rate       min
##      <char>  <char> <int>        <num>        <int>          <num>     <num>
## 1:      age integer   700            0           37         0.0629 20.000000
## 2:       ed integer   700            0            5         0.5314  1.000000
## 3:   employ integer   700            0           32         0.0886  0.000000
## 4:  address integer   700            0           31         0.0843  0.000000
## 5:   income integer   700            0          114         0.0343 14.000000
## 6:  debtinc numeric   700            0          231         0.0143  0.400000
## 7: creddebt numeric   700            0          695         0.0029  0.011696
## 8:  othdebt numeric   700            0          699         0.0029  0.045584
## 9:      def integer   700            0            2         0.7386  0.000000
##           p25        p50       p75       max    mean      sd     cv
##         <num>      <num>     <num>     <num>   <num>   <num>  <num>
## 1: 29.0000000 34.0000000 40.000000  56.00000 34.8600  7.9973 0.2294
## 2:  1.0000000  1.0000000  2.000000   5.00000  1.7229  0.9282 0.5388
## 3:  3.0000000  7.0000000 12.000000  31.00000  8.3886  6.6580 0.7937
## 4:  3.0000000  7.0000000 12.000000  34.00000  8.2786  6.8249 0.8244
## 5: 24.0000000 34.0000000 55.000000 446.00000 45.6014 36.8142 0.8073
## 6:  5.0000000  8.6000000 14.125000  41.30000 10.2606  6.8272 0.6654
## 7:  0.3690593  0.8548695  1.901955  20.56131  1.5536  2.1172 1.3628
## 8:  1.0441782  1.9875675  3.923065  27.03360  3.0582  3.2876 1.0750
## 9:  0.0000000  0.0000000  1.000000   1.00000  0.2614  0.4397 1.6820

# change data types
dt <- dt %>% mutate(ed = as.factor(ed))

# check data balance
prop.table(table(dt$def))

## 
##         0         1 
## 0.7385714 0.2614286

We can consider our target has balance data even though the class proportion is 70:30.

Data Preprocessing

# split into train and test
set.seed(572)
idx <- sample(x = nrow(dt), size = nrow(dt) * 0.8)

train <- dt[idx,]
test <- dt[-idx,]

prop.table(table(train$def))

## 
##         0         1 
## 0.7357143 0.2642857

Initial Characteristic Analysis

Weight of Evidence (WoE)

We want to classify (binning) splitting positive and negative class. This would make scorecard analysis easier. We can see how each class has potential risk.

binning <- woebin(dt = train,
                  y = 'def',
                  positive = 0)

## ✔ Binning on 560 rows and 9 columns in 00:00:04

binning

## $age
##    variable       bin count count_distr   neg   pos   posprob         woe
##      <char>    <char> <int>       <num> <int> <int>     <num>       <num>
## 1:      age [-Inf,26)    68   0.1214286    31    37 0.5441176 -0.84688037
## 2:      age   [26,30)   104   0.1857143    34    70 0.6730769 -0.30167636
## 3:      age   [30,46)   318   0.5678571    64   254 0.7987421  0.35464011
## 4:      age [46, Inf)    70   0.1250000    19    51 0.7285714 -0.03642442
##          bin_iv  total_iv breaks is_special_values
##           <num>     <num> <char>            <lgcl>
## 1: 0.1013323137 0.1848274     26             FALSE
## 2: 0.0180483363 0.1848274     30             FALSE
## 3: 0.0652794636 0.1848274     46             FALSE
## 4: 0.0001672599 0.1848274    Inf             FALSE
## 
## $ed
##    variable    bin count count_distr   neg   pos   posprob         woe
##      <char> <char> <int>       <num> <int> <int>     <num>       <num>
## 1:       ed      1   290  0.51785714    64   226 0.7793103  0.23784084
## 2:       ed      2   163  0.29107143    46   117 0.7177914 -0.09027854
## 3:       ed      3    73  0.13035714    26    47 0.6438356 -0.43176001
## 4:       ed  4%,%5    34  0.06071429    12    22 0.6470588 -0.41767527
##         bin_iv   total_iv breaks is_special_values
##          <num>      <num> <char>            <lgcl>
## 1: 0.027615999 0.06819626      1             FALSE
## 2: 0.002422194 0.06819626      2             FALSE
## 3: 0.026595556 0.06819626      3             FALSE
## 4: 0.011562514 0.06819626  4%,%5             FALSE
## 
## $employ
##    variable       bin count count_distr   neg   pos   posprob          woe
##      <char>    <char> <int>       <num> <int> <int>     <num>        <num>
## 1:   employ  [-Inf,4)   158  0.28214286    73    85 0.5379747 -0.871619260
## 2:   employ     [4,6)    69  0.12321429    20    49 0.7101449 -0.127723051
## 3:   employ    [6,13)   188  0.33571429    35   153 0.8138298  0.451278784
## 4:   employ   [13,15)    34  0.06071429     9    25 0.7352941 -0.002159828
## 5:   employ [15, Inf)   111  0.19821429    11   100 0.9009009  1.183463838
##          bin_iv  total_iv breaks is_special_values
##           <num>     <num> <char>            <lgcl>
## 1: 2.500959e-01 0.5123193      4             FALSE
## 2: 2.069509e-03 0.5123193      6             FALSE
## 3: 6.086520e-02 0.5123193     13             FALSE
## 4: 2.833676e-07 0.5123193     15             FALSE
## 5: 1.992884e-01 0.5123193    Inf             FALSE
## 
## $address
##    variable       bin count count_distr   neg   pos   posprob        woe
##      <char>    <char> <int>       <num> <int> <int>     <num>      <num>
## 1:  address  [-Inf,1)    38  0.06785714    17    21 0.5526316 -0.8125020
## 2:  address     [1,7)   225  0.40178571    75   150 0.6666667 -0.3306639
## 3:  address     [7,9)    63  0.11250000    14    49 0.7777778  0.2289519
## 4:  address    [9,11)    59  0.10535714     6    53 0.8983051  1.1547214
## 5:  address   [11,19)   119  0.21250000    29    90 0.7563025  0.1087028
## 6:  address [19, Inf)    56  0.10000000     7    49 0.8750000  0.9220991
##         bin_iv  total_iv breaks is_special_values
##          <num>     <num> <char>            <lgcl>
## 1: 0.051913994 0.2748964      1             FALSE
## 2: 0.047178823 0.2748964      7             FALSE
## 3: 0.005572104 0.2748964      9             FALSE
## 4: 0.101731225 0.2748964     11             FALSE
## 5: 0.002445884 0.2748964     19             FALSE
## 6: 0.066054329 0.2748964    Inf             FALSE
## 
## $income
##    variable       bin count count_distr   neg   pos   posprob         woe
##      <char>    <char> <int>       <num> <int> <int>     <num>       <num>
## 1:   income [-Inf,20)    61  0.10892857    25    36 0.5901639 -0.65916796
## 2:   income   [20,30)   167  0.29821429    52   115 0.6886228 -0.23012267
## 3:   income   [30,34)    43  0.07678571     7    36 0.8372093  0.61379771
## 4:   income   [34,60)   166  0.29642857    43   123 0.7409639  0.02717316
## 5:   income   [60,70)    37  0.06607143     4    33 0.8918919  1.08640212
## 6:   income   [70,90)    40  0.07142857     9    31 0.7750000  0.21295155
## 7:   income [90, Inf)    46  0.08214286     8    38 0.8260870  0.53433354
##          bin_iv  total_iv breaks is_special_values
##           <num>     <num> <char>            <lgcl>
## 1: 0.0537487390 0.1763177     20             FALSE
## 2: 0.0166206412 0.1763177     30             FALSE
## 3: 0.0246018370 0.1763177     34             FALSE
## 4: 0.0002174709 0.1763177     60             FALSE
## 5: 0.0576554263 0.1763177     70             FALSE
## 6: 0.0030732971 0.1763177     90             FALSE
## 7: 0.0204002966 0.1763177    Inf             FALSE
## 
## $debtinc
##    variable       bin count count_distr   neg   pos   posprob        woe
##      <char>    <char> <int>       <num> <int> <int>     <num>      <num>
## 1:  debtinc  [-Inf,3)    59  0.10535714     3    56 0.9491525  1.9029283
## 2:  debtinc    [3,11)   282  0.50357143    46   236 0.8368794  0.6113793
## 3:  debtinc   [11,16)   107  0.19107143    37    70 0.6542056 -0.3862337
## 4:  debtinc   [16,24)    83  0.14821429    40    43 0.5180723 -0.9514904
## 5:  debtinc [24, Inf)    29  0.05178571    22     7 0.2413793 -2.1689434
##       bin_iv  total_iv breaks is_special_values
##        <num>     <num> <char>            <lgcl>
## 1: 0.2200776 0.8546111      3             FALSE
## 2: 0.1601843 0.8546111     11             FALSE
## 3: 0.0309362 0.8546111     16             FALSE
## 4: 0.1578535 0.8546111     24             FALSE
## 5: 0.2855595 0.8546111    Inf             FALSE
## 
## $creddebt
##    variable        bin count count_distr   neg   pos   posprob          woe
##      <char>     <char> <int>       <num> <int> <int>     <num>        <num>
## 1: creddebt [-Inf,0.1)    28   0.0500000     1    27 0.9642857  2.272025790
## 2: creddebt  [0.1,1.2)   310   0.5535714    59   251 0.8096774  0.424104420
## 3: creddebt  [1.2,2.9)   145   0.2589286    55    90 0.6206897 -0.531334590
## 4: creddebt  [2.9,5.5)    49   0.0875000    13    36 0.7346939 -0.005241495
## 5: creddebt [5.5, Inf)    28   0.0500000    20     8 0.2857143 -1.940101807
##          bin_iv  total_iv breaks is_special_values
##           <num>     <num> <char>            <lgcl>
## 1: 1.335434e-01 0.5287426    0.1             FALSE
## 2: 8.930564e-02 0.5287426    1.2             FALSE
## 3: 8.138719e-02 0.5287426    2.9             FALSE
## 4: 2.406879e-06 0.5287426    5.5             FALSE
## 5: 2.245040e-01 0.5287426    Inf             FALSE
## 
## $othdebt
##    variable        bin count count_distr   neg   pos   posprob          woe
##      <char>     <char> <int>       <num> <int> <int>     <num>        <num>
## 1:  othdebt [-Inf,0.4)    28  0.05000000     5    23 0.8214286  0.502245228
## 2:  othdebt  [0.4,0.6)    33  0.05892857     1    32 0.9696970  2.441924827
## 3:  othdebt  [0.6,1.8)   194  0.34642857    51   143 0.7371134  0.007207922
## 4:  othdebt  [1.8,2.4)    60  0.10714286    11    49 0.8166667  0.470113950
## 5:  othdebt [2.4, Inf)   245  0.43750000    80   165 0.6734694 -0.299892236
##          bin_iv  total_iv breaks is_special_values
##           <num>     <num> <char>            <lgcl>
## 1: 1.107022e-02 0.2472249    0.4             FALSE
## 2: 1.731646e-01 0.2472249    0.6             FALSE
## 3: 1.796779e-05 0.2472249    1.8             FALSE
## 4: 2.097071e-02 0.2472249    2.4             FALSE
## 5: 4.200144e-02 0.2472249    Inf             FALSE

From income binning, the yearly income range -inf to 19 and 20 to 29 has negative WoE values means that range of income is not good if we approved, the rate of people that can pay is 59% to 68%. Change train data and test data into woebin form by apply woebin.

# apply woebin train
train_woe <- woebin_ply(dt = train,
                        bins = binning)

## ✔ Woe transformating on 560 rows and 8 columns in 00:00:00

head(train_woe)

##      def     age_woe      ed_woe employ_woe address_woe  income_woe debtinc_woe
##    <int>       <num>       <num>      <num>       <num>       <num>       <num>
## 1:     0 -0.03642442  0.23784084  1.1834638   0.2289519  0.02717316  -2.1689434
## 2:     1 -0.84688037  0.23784084 -0.8716193  -0.3306639 -0.23012267  -2.1689434
## 3:     0 -0.84688037  0.23784084 -0.1277231  -0.3306639 -0.23012267   0.6113793
## 4:     0  0.35464011  0.23784084  0.4512788   1.1547214 -0.23012267  -0.9514904
## 5:     0  0.35464011 -0.09027854  0.4512788   0.9220991  0.02717316   1.9029283
## 6:     0 -0.84688037 -0.09027854 -0.8716193  -0.3306639 -0.65916796   0.6113793
##    creddebt_woe  othdebt_woe
##           <num>        <num>
## 1: -0.005241495 -0.299892236
## 2: -0.531334590 -0.299892236
## 3:  0.424104420  0.470113950
## 4: -0.531334590 -0.299892236
## 5:  2.272025790  0.502245228
## 6:  2.272025790  0.007207922

# apply woebin test
test_woe <- woebin_ply(dt = test,
                        bins = binning)

## ✔ Woe transformating on 140 rows and 8 columns in 00:00:00

head(test_woe)

##      def     age_woe      ed_woe   employ_woe address_woe  income_woe
##    <int>       <num>       <num>        <num>       <num>       <num>
## 1:     0 -0.30167636  0.23784084  0.451278784  -0.3306639  0.61379771
## 2:     0 -0.30167636  0.23784084 -0.871619260  -0.3306639 -0.65916796
## 3:     0 -0.84688037  0.23784084 -0.127723051  -0.8125020 -0.23012267
## 4:     0 -0.03642442  0.23784084  1.183463838   0.1087028  0.53433354
## 5:     1  0.35464011 -0.09027854 -0.002159828  -0.3306639  0.02717316
## 6:     0 -0.30167636  0.23784084  0.451278784  -0.3306639 -0.23012267
##    debtinc_woe creddebt_woe  othdebt_woe
##          <num>        <num>        <num>
## 1:  -0.9514904 -0.531334590 -0.299892236
## 2:   1.9029283  0.424104420  0.502245228
## 3:   0.6113793  0.424104420  0.007207922
## 4:   0.6113793 -0.005241495 -0.299892236
## 5:  -0.9514904 -0.005241495 -0.299892236
## 6:   0.6113793  0.424104420  0.470113950

Information Value (IV)

feature importance, whether each variables give good information or not in classify positive and negative class. The result is shown descending

iv(dt = train_woe,
   y = 'def',
   positive = 0)

##        variable info_value
##          <char>      <num>
## 1:  debtinc_woe 0.85461111
## 2: creddebt_woe 0.52874264
## 3:   employ_woe 0.51231930
## 4:  address_woe 0.27489636
## 5:  othdebt_woe 0.24722490
## 6:      age_woe 0.18482737
## 7:   income_woe 0.17631771
## 8:       ed_woe 0.06819626

We don’t have the unpredictive and weak variable so we don’t need remove columns.

Modelling

Logistic Regression

mdl <- glm(formula = def~.,
           data = train_woe,
           family = 'binomial')
summary(mdl)

## 
## Call:
## glm(formula = def ~ ., family = "binomial", data = train_woe)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.1505  -0.6299  -0.3114   0.4071   3.1680  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   -1.0792     0.1250  -8.637  < 2e-16 ***
## age_woe       -0.2960     0.3030  -0.977 0.328511    
## ed_woe        -0.5329     0.4732  -1.126 0.260146    
## employ_woe    -1.0378     0.2096  -4.952 7.36e-07 ***
## address_woe   -1.0542     0.2678  -3.936 8.28e-05 ***
## income_woe    -0.7757     0.3858  -2.010 0.044395 *  
## debtinc_woe   -0.6088     0.1607  -3.789 0.000151 ***
## creddebt_woe  -1.1309     0.2184  -5.178 2.24e-07 ***
## othdebt_woe   -0.7077     0.3098  -2.284 0.022346 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 646.79  on 559  degrees of freedom
## Residual deviance: 445.61  on 551  degrees of freedom
## AIC: 463.61
## 
## Number of Fisher Scoring iterations: 6

Age and Education is not significant to our model.

Prediction

Evaluate model for the scorecard

test_woe$pred <- predict(object = mdl,
                        newdata = test_woe,
                        type = 'response')
test_woe$pred %>% head()

## [1] 0.42065863 0.20656462 0.34421558 0.04283838 0.49778716 0.09600386

Scorecard

Make a scorecard using universal odds0 1/19 and points0 600. Odds 1/19 means that we want to set in 19 people that positive (can pay), 1 person of them is negative (can’t pay).

score_card <- scorecard(bins = binning,
                       model = mdl,
                       odds0 = 1/19,
                       points0 = 600,
                       pdo = 20)  

score_card

## $basepoints
##      variable    bin    woe points
##        <char> <lgcl> <lgcl>  <num>
## 1: basepoints     NA     NA    546
## 
## $age
##    variable       bin count count_distr   neg   pos   posprob         woe
##      <char>    <char> <int>       <num> <int> <int>     <num>       <num>
## 1:      age [-Inf,26)    68   0.1214286    31    37 0.5441176 -0.84688037
## 2:      age   [26,30)   104   0.1857143    34    70 0.6730769 -0.30167636
## 3:      age   [30,46)   318   0.5678571    64   254 0.7987421  0.35464011
## 4:      age [46, Inf)    70   0.1250000    19    51 0.7285714 -0.03642442
##          bin_iv  total_iv breaks is_special_values points
##           <num>     <num> <char>            <lgcl>  <num>
## 1: 0.1013323137 0.1848274     26             FALSE     -7
## 2: 0.0180483363 0.1848274     30             FALSE     -3
## 3: 0.0652794636 0.1848274     46             FALSE      3
## 4: 0.0001672599 0.1848274    Inf             FALSE      0
## 
## $ed
##    variable    bin count count_distr   neg   pos   posprob         woe
##      <char> <char> <int>       <num> <int> <int>     <num>       <num>
## 1:       ed      1   290  0.51785714    64   226 0.7793103  0.23784084
## 2:       ed      2   163  0.29107143    46   117 0.7177914 -0.09027854
## 3:       ed      3    73  0.13035714    26    47 0.6438356 -0.43176001
## 4:       ed  4%,%5    34  0.06071429    12    22 0.6470588 -0.41767527
##         bin_iv   total_iv breaks is_special_values points
##          <num>      <num> <char>            <lgcl>  <num>
## 1: 0.027615999 0.06819626      1             FALSE      4
## 2: 0.002422194 0.06819626      2             FALSE     -1
## 3: 0.026595556 0.06819626      3             FALSE     -7
## 4: 0.011562514 0.06819626  4%,%5             FALSE     -6
## 
## $employ
##    variable       bin count count_distr   neg   pos   posprob          woe
##      <char>    <char> <int>       <num> <int> <int>     <num>        <num>
## 1:   employ  [-Inf,4)   158  0.28214286    73    85 0.5379747 -0.871619260
## 2:   employ     [4,6)    69  0.12321429    20    49 0.7101449 -0.127723051
## 3:   employ    [6,13)   188  0.33571429    35   153 0.8138298  0.451278784
## 4:   employ   [13,15)    34  0.06071429     9    25 0.7352941 -0.002159828
## 5:   employ [15, Inf)   111  0.19821429    11   100 0.9009009  1.183463838
##          bin_iv  total_iv breaks is_special_values points
##           <num>     <num> <char>            <lgcl>  <num>
## 1: 2.500959e-01 0.5123193      4             FALSE    -26
## 2: 2.069509e-03 0.5123193      6             FALSE     -4
## 3: 6.086520e-02 0.5123193     13             FALSE     14
## 4: 2.833676e-07 0.5123193     15             FALSE      0
## 5: 1.992884e-01 0.5123193    Inf             FALSE     35
## 
## $address
##    variable       bin count count_distr   neg   pos   posprob        woe
##      <char>    <char> <int>       <num> <int> <int>     <num>      <num>
## 1:  address  [-Inf,1)    38  0.06785714    17    21 0.5526316 -0.8125020
## 2:  address     [1,7)   225  0.40178571    75   150 0.6666667 -0.3306639
## 3:  address     [7,9)    63  0.11250000    14    49 0.7777778  0.2289519
## 4:  address    [9,11)    59  0.10535714     6    53 0.8983051  1.1547214
## 5:  address   [11,19)   119  0.21250000    29    90 0.7563025  0.1087028
## 6:  address [19, Inf)    56  0.10000000     7    49 0.8750000  0.9220991
##         bin_iv  total_iv breaks is_special_values points
##          <num>     <num> <char>            <lgcl>  <num>
## 1: 0.051913994 0.2748964      1             FALSE    -25
## 2: 0.047178823 0.2748964      7             FALSE    -10
## 3: 0.005572104 0.2748964      9             FALSE      7
## 4: 0.101731225 0.2748964     11             FALSE     35
## 5: 0.002445884 0.2748964     19             FALSE      3
## 6: 0.066054329 0.2748964    Inf             FALSE     28
## 
## $income
##    variable       bin count count_distr   neg   pos   posprob         woe
##      <char>    <char> <int>       <num> <int> <int>     <num>       <num>
## 1:   income [-Inf,20)    61  0.10892857    25    36 0.5901639 -0.65916796
## 2:   income   [20,30)   167  0.29821429    52   115 0.6886228 -0.23012267
## 3:   income   [30,34)    43  0.07678571     7    36 0.8372093  0.61379771
## 4:   income   [34,60)   166  0.29642857    43   123 0.7409639  0.02717316
## 5:   income   [60,70)    37  0.06607143     4    33 0.8918919  1.08640212
## 6:   income   [70,90)    40  0.07142857     9    31 0.7750000  0.21295155
## 7:   income [90, Inf)    46  0.08214286     8    38 0.8260870  0.53433354
##          bin_iv  total_iv breaks is_special_values points
##           <num>     <num> <char>            <lgcl>  <num>
## 1: 0.0537487390 0.1763177     20             FALSE    -15
## 2: 0.0166206412 0.1763177     30             FALSE     -5
## 3: 0.0246018370 0.1763177     34             FALSE     14
## 4: 0.0002174709 0.1763177     60             FALSE      1
## 5: 0.0576554263 0.1763177     70             FALSE     24
## 6: 0.0030732971 0.1763177     90             FALSE      5
## 7: 0.0204002966 0.1763177    Inf             FALSE     12
## 
## $debtinc
##    variable       bin count count_distr   neg   pos   posprob        woe
##      <char>    <char> <int>       <num> <int> <int>     <num>      <num>
## 1:  debtinc  [-Inf,3)    59  0.10535714     3    56 0.9491525  1.9029283
## 2:  debtinc    [3,11)   282  0.50357143    46   236 0.8368794  0.6113793
## 3:  debtinc   [11,16)   107  0.19107143    37    70 0.6542056 -0.3862337
## 4:  debtinc   [16,24)    83  0.14821429    40    43 0.5180723 -0.9514904
## 5:  debtinc [24, Inf)    29  0.05178571    22     7 0.2413793 -2.1689434
##       bin_iv  total_iv breaks is_special_values points
##        <num>     <num> <char>            <lgcl>  <num>
## 1: 0.2200776 0.8546111      3             FALSE     33
## 2: 0.1601843 0.8546111     11             FALSE     11
## 3: 0.0309362 0.8546111     16             FALSE     -7
## 4: 0.1578535 0.8546111     24             FALSE    -17
## 5: 0.2855595 0.8546111    Inf             FALSE    -38
## 
## $creddebt
##    variable        bin count count_distr   neg   pos   posprob          woe
##      <char>     <char> <int>       <num> <int> <int>     <num>        <num>
## 1: creddebt [-Inf,0.1)    28   0.0500000     1    27 0.9642857  2.272025790
## 2: creddebt  [0.1,1.2)   310   0.5535714    59   251 0.8096774  0.424104420
## 3: creddebt  [1.2,2.9)   145   0.2589286    55    90 0.6206897 -0.531334590
## 4: creddebt  [2.9,5.5)    49   0.0875000    13    36 0.7346939 -0.005241495
## 5: creddebt [5.5, Inf)    28   0.0500000    20     8 0.2857143 -1.940101807
##          bin_iv  total_iv breaks is_special_values points
##           <num>     <num> <char>            <lgcl>  <num>
## 1: 1.335434e-01 0.5287426    0.1             FALSE     74
## 2: 8.930564e-02 0.5287426    1.2             FALSE     14
## 3: 8.138719e-02 0.5287426    2.9             FALSE    -17
## 4: 2.406879e-06 0.5287426    5.5             FALSE      0
## 5: 2.245040e-01 0.5287426    Inf             FALSE    -63
## 
## $othdebt
##    variable        bin count count_distr   neg   pos   posprob          woe
##      <char>     <char> <int>       <num> <int> <int>     <num>        <num>
## 1:  othdebt [-Inf,0.4)    28  0.05000000     5    23 0.8214286  0.502245228
## 2:  othdebt  [0.4,0.6)    33  0.05892857     1    32 0.9696970  2.441924827
## 3:  othdebt  [0.6,1.8)   194  0.34642857    51   143 0.7371134  0.007207922
## 4:  othdebt  [1.8,2.4)    60  0.10714286    11    49 0.8166667  0.470113950
## 5:  othdebt [2.4, Inf)   245  0.43750000    80   165 0.6734694 -0.299892236
##          bin_iv  total_iv breaks is_special_values points
##           <num>     <num> <char>            <lgcl>  <num>
## 1: 1.107022e-02 0.2472249    0.4             FALSE     10
## 2: 1.731646e-01 0.2472249    0.6             FALSE     50
## 3: 1.796779e-05 0.2472249    1.8             FALSE      0
## 4: 2.097071e-02 0.2472249    2.4             FALSE     10
## 5: 4.200144e-02 0.2472249    Inf             FALSE     -6

# apply train to scorecard
score_train <- scorecard_ply(dt = train, 
                             card = score_card,
                             only_total_score = FALSE) 
score_train %>% head()

##    age_points ed_points employ_points address_points income_points
##         <num>     <num>         <num>          <num>         <num>
## 1:          0         4            35              7             1
## 2:         -7         4           -26            -10            -5
## 3:         -7         4            -4            -10            -5
## 4:          3         4            14             35            -5
## 5:          3        -1            14             28             1
## 6:         -7        -1           -26            -10           -15
##    debtinc_points creddebt_points othdebt_points score
##             <num>           <num>          <num> <num>
## 1:            -38               0             -6   549
## 2:            -38             -17             -6   441
## 3:             11              14             10   559
## 4:            -17             -17             -6   557
## 5:             33              74             10   708
## 6:             11              74              0   572

# apply test to scorecard
score_test <- scorecard_ply(dt = test, 
                             card = score_card,
                             only_total_score = FALSE) 
score_test %>% head()

##    age_points ed_points employ_points address_points income_points
##         <num>     <num>         <num>          <num>         <num>
## 1:         -3         4            14            -10            14
## 2:         -3         4           -26            -10           -15
## 3:         -7         4            -4            -25            -5
## 4:          0         4            35              3            12
## 5:          3        -1             0            -10             1
## 6:         -3         4            14            -10            -5
##    debtinc_points creddebt_points othdebt_points score
##             <num>           <num>          <num> <num>
## 1:            -17             -17             -6   525
## 2:             33              14             10   553
## 3:             11              14              0   534
## 4:             11               0             -6   605
## 5:            -17               0             -6   516
## 6:             11              14             10   581

We just got the score of each characteristics market from our dataset. To see how our scorecard result stable for some population, we can use Population Stability Index.

Performance Evaluation Scorecard

Population Stability Index

# score list
score_list <- list(train = score_train$score,
                   test = score_test$score)

# label list
label_list <- list(train = train_woe$def,
                   test = test_woe$def)

psi <- perf_psi(score = score_list, 
                label = label_list, 
                positive = 0) 
psi

## $pic
## $pic$pred

## 
## 
## $psi
##    variable    dataset   psi
##      <char>     <char> <num>
## 1:     pred train_test   Inf

PSI value is under 0.10 which means there is no significant changes and our scorecard stable in population score.

Cutoff

We can set the cutoff depends of our business question. In this case, our business wanted risk under 10%, so we set cutoff in 552 that has approval rate 61% from approval rate.

approval_rate(score = score_test$score, 
              label = test_woe$def,
              positive = 0)

## Key: <datset>
##            bin approval_rate neg_rate count_approved neg_approved count   neg
##         <fctr>         <num>    <num>          <int>        <int> <int> <int>
##  1: [-Inf,502)        0.9143   0.2188            128           28    12     7
##  2:  [502,520)        0.8000   0.1518            112           17    16    11
##  3:  [520,540)        0.7000   0.1224             98           12    14     5
##  4:  [540,553)        0.6143   0.0930             86            8    12     4
##  5:  [553,568)        0.5000   0.0571             70            4    16     4
##  6:  [568,580)        0.4000   0.0536             56            3    14     1
##  7:  [580,591)        0.3071   0.0465             43            2    13     1
##  8:  [591,605)        0.2071   0.0000             29            0    14     2
##  9:  [605,617)        0.1143   0.0000             16            0    13     0
## 10: [617, Inf)        0.0000   0.0000              0            0    16     0
##       pos
##     <int>
##  1:     5
##  2:     5
##  3:     9
##  4:     8
##  5:    12
##  6:    13
##  7:    12
##  8:    12
##  9:    13
## 10:    16

# predict new data
new_data <- data.frame(list(age = 22,
                            ed = 2,
                            employ = 1,
                            address = 1,
                            income = 20,
                            debtinc = 11.0,
                            creddebt = 0.775656,
                            othdebt = 1.318344))
new_data

##   age ed employ address income debtinc creddebt  othdebt
## 1  22  2      1       1     20      11 0.775656 1.318344

res <- predict_behaviour(data = new_data,
                         score_card = score_card,
                         cutoff = 552)
res

##    score recommendation
##    <num>         <char>
## 1:   504            BAD

The result shows that the new data would negatively impact approval. However, it could be a good recommendation if we set the cutoff lower than before, even though it would increase the bad rate. You can adjust depends on what the needs.

LBB DSS Credit Score

Rahma Fairuz Rania

2024-06-29