LBB DSS Behaviour Credit Score
Problem Statement
As the data team, we assume that this bank operates as a credit company with an established market share. We want to reduce the potential loss from bad loans (the bad rate).
Objective: to minimize losses, the scorecard needs a stricter cutoff.
Library
Data Input
## age ed employ address income debtinc creddebt othdebt def
## 1 41 3 17 12 176 9.3 11.359392 5.008608 1
## 2 27 1 10 6 31 17.3 1.362202 4.000798 0
## 3 40 1 15 14 55 5.5 0.856075 2.168925 0
## 4 41 1 15 14 120 2.9 2.658720 0.821280 0
## 5 24 2 2 0 28 17.3 1.787436 3.056564 1
## 6 41 2 5 5 25 10.2 0.392700 2.157300 0
The dataset can be found here. Its columns are described below:
age: Age of the customer
ed: Education level
employ: Work experience
address: Address of the customer
income: Yearly income of the customer
debtinc: Debt-to-income ratio
creddebt: Credit-to-debt ratio
othdebt: Other debts
def: Target; 0 = did not default (can pay), 1 = defaulted (cannot pay)
Exploratory Data Analysis
## variable class count missing_rate unique_count identical_rate min
## <char> <char> <int> <num> <int> <num> <num>
## 1: age integer 700 0 37 0.0629 20.000000
## 2: ed integer 700 0 5 0.5314 1.000000
## 3: employ integer 700 0 32 0.0886 0.000000
## 4: address integer 700 0 31 0.0843 0.000000
## 5: income integer 700 0 114 0.0343 14.000000
## 6: debtinc numeric 700 0 231 0.0143 0.400000
## 7: creddebt numeric 700 0 695 0.0029 0.011696
## 8: othdebt numeric 700 0 699 0.0029 0.045584
## 9: def integer 700 0 2 0.7386 0.000000
## p25 p50 p75 max mean sd cv
## <num> <num> <num> <num> <num> <num> <num>
## 1: 29.0000000 34.0000000 40.000000 56.00000 34.8600 7.9973 0.2294
## 2: 1.0000000 1.0000000 2.000000 5.00000 1.7229 0.9282 0.5388
## 3: 3.0000000 7.0000000 12.000000 31.00000 8.3886 6.6580 0.7937
## 4: 3.0000000 7.0000000 12.000000 34.00000 8.2786 6.8249 0.8244
## 5: 24.0000000 34.0000000 55.000000 446.00000 45.6014 36.8142 0.8073
## 6: 5.0000000 8.6000000 14.125000 41.30000 10.2606 6.8272 0.6654
## 7: 0.3690593 0.8548695 1.901955 20.56131 1.5536 2.1172 1.3628
## 8: 1.0441782 1.9875675 3.923065 27.03360 3.0582 3.2876 1.0750
## 9: 0.0000000 0.0000000 1.000000 1.00000 0.2614 0.4397 1.6820
##
## 0 1
## 0.7385714 0.2614286
The classes split roughly 74:26 (def = 0 to def = 1). This is moderately imbalanced, but we treat the target as balanced enough to proceed without resampling.
Data Preprocessing
# split into train and test
set.seed(572)
idx <- sample(x = nrow(dt), size = nrow(dt) * 0.8)
train <- dt[idx,]
test <- dt[-idx,]
##
## 0 1
## 0.7357143 0.2642857
Initial Characteristic Analysis
Weight of Evidence (WoE)
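We bin each predictor and compute the Weight of Evidence (WoE) of every bin, which measures how well that bin separates the positive class from the negative class. Binning makes the scorecard easier to build and interpret, because each bin shows its own level of risk.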
## ✔ Binning on 560 rows and 9 columns in 00:00:04
## $age
## variable bin count count_distr neg pos posprob woe
## <char> <char> <int> <num> <int> <int> <num> <num>
## 1: age [-Inf,26) 68 0.1214286 31 37 0.5441176 -0.84688037
## 2: age [26,30) 104 0.1857143 34 70 0.6730769 -0.30167636
## 3: age [30,46) 318 0.5678571 64 254 0.7987421 0.35464011
## 4: age [46, Inf) 70 0.1250000 19 51 0.7285714 -0.03642442
## bin_iv total_iv breaks is_special_values
## <num> <num> <char> <lgcl>
## 1: 0.1013323137 0.1848274 26 FALSE
## 2: 0.0180483363 0.1848274 30 FALSE
## 3: 0.0652794636 0.1848274 46 FALSE
## 4: 0.0001672599 0.1848274 Inf FALSE
##
## $ed
## variable bin count count_distr neg pos posprob woe
## <char> <char> <int> <num> <int> <int> <num> <num>
## 1: ed 1 290 0.51785714 64 226 0.7793103 0.23784084
## 2: ed 2 163 0.29107143 46 117 0.7177914 -0.09027854
## 3: ed 3 73 0.13035714 26 47 0.6438356 -0.43176001
## 4: ed 4%,%5 34 0.06071429 12 22 0.6470588 -0.41767527
## bin_iv total_iv breaks is_special_values
## <num> <num> <char> <lgcl>
## 1: 0.027615999 0.06819626 1 FALSE
## 2: 0.002422194 0.06819626 2 FALSE
## 3: 0.026595556 0.06819626 3 FALSE
## 4: 0.011562514 0.06819626 4%,%5 FALSE
##
## $employ
## variable bin count count_distr neg pos posprob woe
## <char> <char> <int> <num> <int> <int> <num> <num>
## 1: employ [-Inf,4) 158 0.28214286 73 85 0.5379747 -0.871619260
## 2: employ [4,6) 69 0.12321429 20 49 0.7101449 -0.127723051
## 3: employ [6,13) 188 0.33571429 35 153 0.8138298 0.451278784
## 4: employ [13,15) 34 0.06071429 9 25 0.7352941 -0.002159828
## 5: employ [15, Inf) 111 0.19821429 11 100 0.9009009 1.183463838
## bin_iv total_iv breaks is_special_values
## <num> <num> <char> <lgcl>
## 1: 2.500959e-01 0.5123193 4 FALSE
## 2: 2.069509e-03 0.5123193 6 FALSE
## 3: 6.086520e-02 0.5123193 13 FALSE
## 4: 2.833676e-07 0.5123193 15 FALSE
## 5: 1.992884e-01 0.5123193 Inf FALSE
##
## $address
## variable bin count count_distr neg pos posprob woe
## <char> <char> <int> <num> <int> <int> <num> <num>
## 1: address [-Inf,1) 38 0.06785714 17 21 0.5526316 -0.8125020
## 2: address [1,7) 225 0.40178571 75 150 0.6666667 -0.3306639
## 3: address [7,9) 63 0.11250000 14 49 0.7777778 0.2289519
## 4: address [9,11) 59 0.10535714 6 53 0.8983051 1.1547214
## 5: address [11,19) 119 0.21250000 29 90 0.7563025 0.1087028
## 6: address [19, Inf) 56 0.10000000 7 49 0.8750000 0.9220991
## bin_iv total_iv breaks is_special_values
## <num> <num> <char> <lgcl>
## 1: 0.051913994 0.2748964 1 FALSE
## 2: 0.047178823 0.2748964 7 FALSE
## 3: 0.005572104 0.2748964 9 FALSE
## 4: 0.101731225 0.2748964 11 FALSE
## 5: 0.002445884 0.2748964 19 FALSE
## 6: 0.066054329 0.2748964 Inf FALSE
##
## $income
## variable bin count count_distr neg pos posprob woe
## <char> <char> <int> <num> <int> <int> <num> <num>
## 1: income [-Inf,20) 61 0.10892857 25 36 0.5901639 -0.65916796
## 2: income [20,30) 167 0.29821429 52 115 0.6886228 -0.23012267
## 3: income [30,34) 43 0.07678571 7 36 0.8372093 0.61379771
## 4: income [34,60) 166 0.29642857 43 123 0.7409639 0.02717316
## 5: income [60,70) 37 0.06607143 4 33 0.8918919 1.08640212
## 6: income [70,90) 40 0.07142857 9 31 0.7750000 0.21295155
## 7: income [90, Inf) 46 0.08214286 8 38 0.8260870 0.53433354
## bin_iv total_iv breaks is_special_values
## <num> <num> <char> <lgcl>
## 1: 0.0537487390 0.1763177 20 FALSE
## 2: 0.0166206412 0.1763177 30 FALSE
## 3: 0.0246018370 0.1763177 34 FALSE
## 4: 0.0002174709 0.1763177 60 FALSE
## 5: 0.0576554263 0.1763177 70 FALSE
## 6: 0.0030732971 0.1763177 90 FALSE
## 7: 0.0204002966 0.1763177 Inf FALSE
##
## $debtinc
## variable bin count count_distr neg pos posprob woe
## <char> <char> <int> <num> <int> <int> <num> <num>
## 1: debtinc [-Inf,3) 59 0.10535714 3 56 0.9491525 1.9029283
## 2: debtinc [3,11) 282 0.50357143 46 236 0.8368794 0.6113793
## 3: debtinc [11,16) 107 0.19107143 37 70 0.6542056 -0.3862337
## 4: debtinc [16,24) 83 0.14821429 40 43 0.5180723 -0.9514904
## 5: debtinc [24, Inf) 29 0.05178571 22 7 0.2413793 -2.1689434
## bin_iv total_iv breaks is_special_values
## <num> <num> <char> <lgcl>
## 1: 0.2200776 0.8546111 3 FALSE
## 2: 0.1601843 0.8546111 11 FALSE
## 3: 0.0309362 0.8546111 16 FALSE
## 4: 0.1578535 0.8546111 24 FALSE
## 5: 0.2855595 0.8546111 Inf FALSE
##
## $creddebt
## variable bin count count_distr neg pos posprob woe
## <char> <char> <int> <num> <int> <int> <num> <num>
## 1: creddebt [-Inf,0.1) 28 0.0500000 1 27 0.9642857 2.272025790
## 2: creddebt [0.1,1.2) 310 0.5535714 59 251 0.8096774 0.424104420
## 3: creddebt [1.2,2.9) 145 0.2589286 55 90 0.6206897 -0.531334590
## 4: creddebt [2.9,5.5) 49 0.0875000 13 36 0.7346939 -0.005241495
## 5: creddebt [5.5, Inf) 28 0.0500000 20 8 0.2857143 -1.940101807
## bin_iv total_iv breaks is_special_values
## <num> <num> <char> <lgcl>
## 1: 1.335434e-01 0.5287426 0.1 FALSE
## 2: 8.930564e-02 0.5287426 1.2 FALSE
## 3: 8.138719e-02 0.5287426 2.9 FALSE
## 4: 2.406879e-06 0.5287426 5.5 FALSE
## 5: 2.245040e-01 0.5287426 Inf FALSE
##
## $othdebt
## variable bin count count_distr neg pos posprob woe
## <char> <char> <int> <num> <int> <int> <num> <num>
## 1: othdebt [-Inf,0.4) 28 0.05000000 5 23 0.8214286 0.502245228
## 2: othdebt [0.4,0.6) 33 0.05892857 1 32 0.9696970 2.441924827
## 3: othdebt [0.6,1.8) 194 0.34642857 51 143 0.7371134 0.007207922
## 4: othdebt [1.8,2.4) 60 0.10714286 11 49 0.8166667 0.470113950
## 5: othdebt [2.4, Inf) 245 0.43750000 80 165 0.6734694 -0.299892236
## bin_iv total_iv breaks is_special_values
## <num> <num> <char> <lgcl>
## 1: 1.107022e-02 0.2472249 0.4 FALSE
## 2: 1.731646e-01 0.2472249 0.6 FALSE
## 3: 1.796779e-05 0.2472249 1.8 FALSE
## 4: 2.097071e-02 0.2472249 2.4 FALSE
## 5: 4.200144e-02 0.2472249 Inf FALSE
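To make the `woe` and `bin_iv` columns concrete, here is a minimal sketch that recomputes them for the `$age` table above from its `neg` and `pos` counts. It assumes the usual definition, WoE = ln(share of positives in the bin / share of negatives in the bin), which reproduces the printed values; the data frame name `age_bins` is only illustrative.

```r
# Counts per age bin, copied from the $age table above
# (pos and neg as labelled in the binning output)
age_bins <- data.frame(
  bin = c("[-Inf,26)", "[26,30)", "[30,46)", "[46, Inf)"),
  neg = c(31, 34, 64, 19),
  pos = c(37, 70, 254, 51)
)

# Share of all positives / all negatives that fall into each bin
pos_distr <- age_bins$pos / sum(age_bins$pos)
neg_distr <- age_bins$neg / sum(age_bins$neg)

# WoE: log ratio of the two shares
age_bins$woe <- log(pos_distr / neg_distr)

# IV contribution of each bin: difference in shares weighted by WoE
age_bins$bin_iv <- (pos_distr - neg_distr) * age_bins$woe

print(age_bins)

# Total IV of the variable is the sum over its bins
total_iv <- sum(age_bins$bin_iv)
print(total_iv)
```

A negative WoE means the bin holds relatively more negatives than the data overall, which is why the youngest age bin is the riskiest one in this table.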
In the income binning, the ranges below 20 and from 20 to 29 have negative WoE values, so applicants in those ranges are riskier to approve: only about 59% and 69% of them, respectively, are able to pay. Next, we convert the train and test data to their WoE values by applying the bins with woebin_ply.
## ✔ Woe transformating on 560 rows and 8 columns in 00:00:00
## def age_woe ed_woe employ_woe address_woe income_woe debtinc_woe
## <int> <num> <num> <num> <num> <num> <num>
## 1: 0 -0.03642442 0.23784084 1.1834638 0.2289519 0.02717316 -2.1689434
## 2: 1 -0.84688037 0.23784084 -0.8716193 -0.3306639 -0.23012267 -2.1689434
## 3: 0 -0.84688037 0.23784084 -0.1277231 -0.3306639 -0.23012267 0.6113793
## 4: 0 0.35464011 0.23784084 0.4512788 1.1547214 -0.23012267 -0.9514904
## 5: 0 0.35464011 -0.09027854 0.4512788 0.9220991 0.02717316 1.9029283
## 6: 0 -0.84688037 -0.09027854 -0.8716193 -0.3306639 -0.65916796 0.6113793
## creddebt_woe othdebt_woe
## <num> <num>
## 1: -0.005241495 -0.299892236
## 2: -0.531334590 -0.299892236
## 3: 0.424104420 0.470113950
## 4: -0.531334590 -0.299892236
## 5: 2.272025790 0.502245228
## 6: 2.272025790 0.007207922
## ✔ Woe transformating on 140 rows and 8 columns in 00:00:00
## def age_woe ed_woe employ_woe address_woe income_woe
## <int> <num> <num> <num> <num> <num>
## 1: 0 -0.30167636 0.23784084 0.451278784 -0.3306639 0.61379771
## 2: 0 -0.30167636 0.23784084 -0.871619260 -0.3306639 -0.65916796
## 3: 0 -0.84688037 0.23784084 -0.127723051 -0.8125020 -0.23012267
## 4: 0 -0.03642442 0.23784084 1.183463838 0.1087028 0.53433354
## 5: 1 0.35464011 -0.09027854 -0.002159828 -0.3306639 0.02717316
## 6: 0 -0.30167636 0.23784084 0.451278784 -0.3306639 -0.23012267
## debtinc_woe creddebt_woe othdebt_woe
## <num> <num> <num>
## 1: -0.9514904 -0.531334590 -0.299892236
## 2: 1.9029283 0.424104420 0.502245228
## 3: 0.6113793 0.424104420 0.007207922
## 4: 0.6113793 -0.005241495 -0.299892236
## 5: -0.9514904 -0.005241495 -0.299892236
## 6: 0.6113793 0.424104420 0.470113950
Information Value (IV)
IV works as a feature-importance measure: it shows how much information each variable provides for separating the positive and negative classes. The results are sorted in descending order.
## variable info_value
## <char> <num>
## 1: debtinc_woe 0.85461111
## 2: creddebt_woe 0.52874264
## 3: employ_woe 0.51231930
## 4: address_woe 0.27489636
## 5: othdebt_woe 0.24722490
## 6: age_woe 0.18482737
## 7: income_woe 0.17631771
## 8: ed_woe 0.06819626
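No variable is unpredictive (none has an IV below 0.02). By the common rule of thumb, ed falls in the weak range (IV = 0.068, between 0.02 and 0.1), but it still carries some information, so we keep all columns.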
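As a quick check, the IV values from the table can be labelled with the widely quoted rule-of-thumb bands. The thresholds (0.02, 0.1, 0.3, 0.5) and the band labels are an assumption taken from general scorecard practice, not something defined in this analysis.

```r
# IV values copied from the table above
iv <- c(debtinc = 0.85461111, creddebt = 0.52874264, employ = 0.51231930,
        address = 0.27489636, othdebt = 0.24722490, age = 0.18482737,
        income = 0.17631771, ed = 0.06819626)

# Rule-of-thumb bands (assumed thresholds, left-closed intervals)
strength <- cut(iv,
                breaks = c(-Inf, 0.02, 0.1, 0.3, 0.5, Inf),
                labels = c("unpredictive", "weak", "medium",
                           "strong", "very strong (check)"),
                right = FALSE)

iv_table <- data.frame(variable = names(iv),
                       info_value = iv,
                       strength = strength,
                       row.names = NULL)
print(iv_table)
```

Under these bands, only ed lands in the weak range. Values above 0.5 are often worth a second look, because very high IVs can indicate a variable that is too good to be true.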
Modelling
Logistic Regression
##
## Call:
## glm(formula = def ~ ., family = "binomial", data = train_woe)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.1505 -0.6299 -0.3114 0.4071 3.1680
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.0792 0.1250 -8.637 < 2e-16 ***
## age_woe -0.2960 0.3030 -0.977 0.328511
## ed_woe -0.5329 0.4732 -1.126 0.260146
## employ_woe -1.0378 0.2096 -4.952 7.36e-07 ***
## address_woe -1.0542 0.2678 -3.936 8.28e-05 ***
## income_woe -0.7757 0.3858 -2.010 0.044395 *
## debtinc_woe -0.6088 0.1607 -3.789 0.000151 ***
## creddebt_woe -1.1309 0.2184 -5.178 2.24e-07 ***
## othdebt_woe -0.7077 0.3098 -2.284 0.022346 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 646.79 on 559 degrees of freedom
## Residual deviance: 445.61 on 551 degrees of freedom
## AIC: 463.61
##
## Number of Fisher Scoring iterations: 6
age_woe and ed_woe are not statistically significant in this model (p = 0.33 and p = 0.26).
Prediction
Evaluate model for the scorecard
test_woe$pred <- predict(object = mdl,
newdata = test_woe,
type = 'response')
test_woe$pred %>% head()
## [1] 0.42065863 0.20656462 0.34421558 0.04283838 0.49778716 0.09600386
Scorecard
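We build the scorecard with the commonly used target odds odds0 = 1/19 and base score points0 = 600, together with pdo = 20 (the number of points needed to double the odds). Odds of 1/19 mean that for every 19 positive applicants (can pay) we expect 1 negative applicant (cannot pay).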
score_card <- scorecard(bins = binning,
model = mdl,
odds0 = 1/19,
points0 = 600,
pdo = 20)
score_card
## $basepoints
## variable bin woe points
## <char> <lgcl> <lgcl> <num>
## 1: basepoints NA NA 546
##
## $age
## variable bin count count_distr neg pos posprob woe
## <char> <char> <int> <num> <int> <int> <num> <num>
## 1: age [-Inf,26) 68 0.1214286 31 37 0.5441176 -0.84688037
## 2: age [26,30) 104 0.1857143 34 70 0.6730769 -0.30167636
## 3: age [30,46) 318 0.5678571 64 254 0.7987421 0.35464011
## 4: age [46, Inf) 70 0.1250000 19 51 0.7285714 -0.03642442
## bin_iv total_iv breaks is_special_values points
## <num> <num> <char> <lgcl> <num>
## 1: 0.1013323137 0.1848274 26 FALSE -7
## 2: 0.0180483363 0.1848274 30 FALSE -3
## 3: 0.0652794636 0.1848274 46 FALSE 3
## 4: 0.0001672599 0.1848274 Inf FALSE 0
##
## $ed
## variable bin count count_distr neg pos posprob woe
## <char> <char> <int> <num> <int> <int> <num> <num>
## 1: ed 1 290 0.51785714 64 226 0.7793103 0.23784084
## 2: ed 2 163 0.29107143 46 117 0.7177914 -0.09027854
## 3: ed 3 73 0.13035714 26 47 0.6438356 -0.43176001
## 4: ed 4%,%5 34 0.06071429 12 22 0.6470588 -0.41767527
## bin_iv total_iv breaks is_special_values points
## <num> <num> <char> <lgcl> <num>
## 1: 0.027615999 0.06819626 1 FALSE 4
## 2: 0.002422194 0.06819626 2 FALSE -1
## 3: 0.026595556 0.06819626 3 FALSE -7
## 4: 0.011562514 0.06819626 4%,%5 FALSE -6
##
## $employ
## variable bin count count_distr neg pos posprob woe
## <char> <char> <int> <num> <int> <int> <num> <num>
## 1: employ [-Inf,4) 158 0.28214286 73 85 0.5379747 -0.871619260
## 2: employ [4,6) 69 0.12321429 20 49 0.7101449 -0.127723051
## 3: employ [6,13) 188 0.33571429 35 153 0.8138298 0.451278784
## 4: employ [13,15) 34 0.06071429 9 25 0.7352941 -0.002159828
## 5: employ [15, Inf) 111 0.19821429 11 100 0.9009009 1.183463838
## bin_iv total_iv breaks is_special_values points
## <num> <num> <char> <lgcl> <num>
## 1: 2.500959e-01 0.5123193 4 FALSE -26
## 2: 2.069509e-03 0.5123193 6 FALSE -4
## 3: 6.086520e-02 0.5123193 13 FALSE 14
## 4: 2.833676e-07 0.5123193 15 FALSE 0
## 5: 1.992884e-01 0.5123193 Inf FALSE 35
##
## $address
## variable bin count count_distr neg pos posprob woe
## <char> <char> <int> <num> <int> <int> <num> <num>
## 1: address [-Inf,1) 38 0.06785714 17 21 0.5526316 -0.8125020
## 2: address [1,7) 225 0.40178571 75 150 0.6666667 -0.3306639
## 3: address [7,9) 63 0.11250000 14 49 0.7777778 0.2289519
## 4: address [9,11) 59 0.10535714 6 53 0.8983051 1.1547214
## 5: address [11,19) 119 0.21250000 29 90 0.7563025 0.1087028
## 6: address [19, Inf) 56 0.10000000 7 49 0.8750000 0.9220991
## bin_iv total_iv breaks is_special_values points
## <num> <num> <char> <lgcl> <num>
## 1: 0.051913994 0.2748964 1 FALSE -25
## 2: 0.047178823 0.2748964 7 FALSE -10
## 3: 0.005572104 0.2748964 9 FALSE 7
## 4: 0.101731225 0.2748964 11 FALSE 35
## 5: 0.002445884 0.2748964 19 FALSE 3
## 6: 0.066054329 0.2748964 Inf FALSE 28
##
## $income
## variable bin count count_distr neg pos posprob woe
## <char> <char> <int> <num> <int> <int> <num> <num>
## 1: income [-Inf,20) 61 0.10892857 25 36 0.5901639 -0.65916796
## 2: income [20,30) 167 0.29821429 52 115 0.6886228 -0.23012267
## 3: income [30,34) 43 0.07678571 7 36 0.8372093 0.61379771
## 4: income [34,60) 166 0.29642857 43 123 0.7409639 0.02717316
## 5: income [60,70) 37 0.06607143 4 33 0.8918919 1.08640212
## 6: income [70,90) 40 0.07142857 9 31 0.7750000 0.21295155
## 7: income [90, Inf) 46 0.08214286 8 38 0.8260870 0.53433354
## bin_iv total_iv breaks is_special_values points
## <num> <num> <char> <lgcl> <num>
## 1: 0.0537487390 0.1763177 20 FALSE -15
## 2: 0.0166206412 0.1763177 30 FALSE -5
## 3: 0.0246018370 0.1763177 34 FALSE 14
## 4: 0.0002174709 0.1763177 60 FALSE 1
## 5: 0.0576554263 0.1763177 70 FALSE 24
## 6: 0.0030732971 0.1763177 90 FALSE 5
## 7: 0.0204002966 0.1763177 Inf FALSE 12
##
## $debtinc
## variable bin count count_distr neg pos posprob woe
## <char> <char> <int> <num> <int> <int> <num> <num>
## 1: debtinc [-Inf,3) 59 0.10535714 3 56 0.9491525 1.9029283
## 2: debtinc [3,11) 282 0.50357143 46 236 0.8368794 0.6113793
## 3: debtinc [11,16) 107 0.19107143 37 70 0.6542056 -0.3862337
## 4: debtinc [16,24) 83 0.14821429 40 43 0.5180723 -0.9514904
## 5: debtinc [24, Inf) 29 0.05178571 22 7 0.2413793 -2.1689434
## bin_iv total_iv breaks is_special_values points
## <num> <num> <char> <lgcl> <num>
## 1: 0.2200776 0.8546111 3 FALSE 33
## 2: 0.1601843 0.8546111 11 FALSE 11
## 3: 0.0309362 0.8546111 16 FALSE -7
## 4: 0.1578535 0.8546111 24 FALSE -17
## 5: 0.2855595 0.8546111 Inf FALSE -38
##
## $creddebt
## variable bin count count_distr neg pos posprob woe
## <char> <char> <int> <num> <int> <int> <num> <num>
## 1: creddebt [-Inf,0.1) 28 0.0500000 1 27 0.9642857 2.272025790
## 2: creddebt [0.1,1.2) 310 0.5535714 59 251 0.8096774 0.424104420
## 3: creddebt [1.2,2.9) 145 0.2589286 55 90 0.6206897 -0.531334590
## 4: creddebt [2.9,5.5) 49 0.0875000 13 36 0.7346939 -0.005241495
## 5: creddebt [5.5, Inf) 28 0.0500000 20 8 0.2857143 -1.940101807
## bin_iv total_iv breaks is_special_values points
## <num> <num> <char> <lgcl> <num>
## 1: 1.335434e-01 0.5287426 0.1 FALSE 74
## 2: 8.930564e-02 0.5287426 1.2 FALSE 14
## 3: 8.138719e-02 0.5287426 2.9 FALSE -17
## 4: 2.406879e-06 0.5287426 5.5 FALSE 0
## 5: 2.245040e-01 0.5287426 Inf FALSE -63
##
## $othdebt
## variable bin count count_distr neg pos posprob woe
## <char> <char> <int> <num> <int> <int> <num> <num>
## 1: othdebt [-Inf,0.4) 28 0.05000000 5 23 0.8214286 0.502245228
## 2: othdebt [0.4,0.6) 33 0.05892857 1 32 0.9696970 2.441924827
## 3: othdebt [0.6,1.8) 194 0.34642857 51 143 0.7371134 0.007207922
## 4: othdebt [1.8,2.4) 60 0.10714286 11 49 0.8166667 0.470113950
## 5: othdebt [2.4, Inf) 245 0.43750000 80 165 0.6734694 -0.299892236
## bin_iv total_iv breaks is_special_values points
## <num> <num> <char> <lgcl> <num>
## 1: 1.107022e-02 0.2472249 0.4 FALSE 10
## 2: 1.731646e-01 0.2472249 0.6 FALSE 50
## 3: 1.796779e-05 0.2472249 1.8 FALSE 0
## 4: 2.097071e-02 0.2472249 2.4 FALSE 10
## 5: 4.200144e-02 0.2472249 Inf FALSE -6
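The points in the scorecard above can be traced by hand. The sketch below uses the usual points-to-double-the-odds scaling (factor = pdo / ln 2, offset = points0 + factor × ln(odds0)) with the rounded coefficients from the glm summary. It reproduces the base points and two of the bin points shown above, but it is an illustration of the scaling, not a description of the package internals; all variable names are made up for the example.

```r
# Scaling parameters passed to scorecard() above
points0 <- 600
odds0   <- 1 / 19
pdo     <- 20

# Standard scorecard scaling
scale_factor <- pdo / log(2)
offset       <- points0 + scale_factor * log(odds0)

# Intercept and two coefficients, copied (rounded) from the glm summary
intercept     <- -1.0792
coef_employ   <- -1.0378
coef_creddebt <- -1.1309

# Base points come from the intercept
basepoints <- round(offset - scale_factor * intercept)

# Bin points: -factor * coefficient * WoE of the bin
points_employ_15plus <- round(-scale_factor * coef_employ * 1.183463838)   # employ [15, Inf)
points_creddebt_low  <- round(-scale_factor * coef_creddebt * 2.272025790) # creddebt [-Inf,0.1)

print(c(basepoints = basepoints,
        employ_15plus = points_employ_15plus,
        creddebt_low = points_creddebt_low))
```

Because the coefficients are negative and a high WoE marks a bin dominated by the positive class, bins with a high WoE earn positive points and push the total score up.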
# apply train to scorecard
score_train <- scorecard_ply(dt = train,
card = score_card,
only_total_score = FALSE)
score_train %>% head()
## age_points ed_points employ_points address_points income_points
## <num> <num> <num> <num> <num>
## 1: 0 4 35 7 1
## 2: -7 4 -26 -10 -5
## 3: -7 4 -4 -10 -5
## 4: 3 4 14 35 -5
## 5: 3 -1 14 28 1
## 6: -7 -1 -26 -10 -15
## debtinc_points creddebt_points othdebt_points score
## <num> <num> <num> <num>
## 1: -38 0 -6 549
## 2: -38 -17 -6 441
## 3: 11 14 10 559
## 4: -17 -17 -6 557
## 5: 33 74 10 708
## 6: 11 74 0 572
# apply test to scorecard
score_test <- scorecard_ply(dt = test,
card = score_card,
only_total_score = FALSE)
score_test %>% head()
## age_points ed_points employ_points address_points income_points
## <num> <num> <num> <num> <num>
## 1: -3 4 14 -10 14
## 2: -3 4 -26 -10 -15
## 3: -7 4 -4 -25 -5
## 4: 0 4 35 3 12
## 5: 3 -1 0 -10 1
## 6: -3 4 14 -10 -5
## debtinc_points creddebt_points othdebt_points score
## <num> <num> <num> <num>
## 1: -17 -17 -6 525
## 2: 33 14 10 553
## 3: 11 14 0 534
## 4: 11 0 -6 605
## 5: -17 0 -6 516
## 6: 11 14 10 581
We now have the points for each characteristic, and the total score, for every customer in our dataset. To check how stable the scorecard is across populations, we can use the Population Stability Index (PSI).
Performance Evaluation Scorecard
Population Stability Index
# score list
score_list <- list(train = score_train$score,
test = score_test$score)
# label list
label_list <- list(train = train_woe$def,
test = test_woe$def)
psi <- perf_psi(score = score_list,
label = label_list,
positive = 0)
psi
## $pic
## $pic$pred
##
##
## $psi
## variable dataset psi
## <char> <char> <num>
## 1: pred train_test Inf
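A PSI below 0.10 would mean there is no significant shift between the train and test score distributions, so the scorecard could be considered stable. The value reported above is Inf, however, which usually happens when at least one score bin is empty in one of the two samples (the PSI formula takes the log of a ratio of bin shares). This output therefore does not yet confirm stability; the PSI should be recomputed, for example with coarser score bins, before drawing that conclusion.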
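For reference, PSI compares the share of scores that fall into each bin in two samples: PSI = Σ (a_i − e_i) × ln(a_i / e_i), where e_i and a_i are the bin shares in the expected (train) and actual (test) sample. The sketch below implements that formula on hypothetical bin counts (they are not taken from this dataset) and shows how a single empty bin turns the result into Inf, which is one way a value like the one printed above can arise. The function name `psi_from_counts` is made up for the example.

```r
# PSI from per-bin counts of two samples that share the same score bins
psi_from_counts <- function(expected_counts, actual_counts) {
  e <- expected_counts / sum(expected_counts)  # bin shares, expected sample
  a <- actual_counts / sum(actual_counts)      # bin shares, actual sample
  sum((a - e) * log(a / e))
}

# Hypothetical counts, for illustration only
train_counts <- c(50, 120, 200, 130, 60)
test_counts  <- c(14, 28, 52, 31, 15)
psi_ok <- psi_from_counts(train_counts, test_counts)

# Same setup, but the first bin has no test observations:
# log(0 / e) is -Inf, and (0 - e) * -Inf is Inf, so the sum is Inf
test_counts_empty <- c(0, 30, 60, 35, 15)
psi_inf <- psi_from_counts(train_counts, test_counts_empty)

print(c(psi_ok = psi_ok, psi_inf = psi_inf))
```

With only 140 test rows, narrow score bins can easily end up empty, so merging bins before computing PSI is a common remedy.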
Cutoff
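The cutoff depends on the business question. In this case the business wants a bad rate under 10%, so we set the cutoff at 552: according to the row for bin [540,553) in the table below, approving only applicants who score above that bin gives an approval rate of about 61% with a bad rate of 9.3% among the approved.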
## Key: <datset>
## bin approval_rate neg_rate count_approved neg_approved count neg
## <fctr> <num> <num> <int> <int> <int> <int>
## 1: [-Inf,502) 0.9143 0.2188 128 28 12 7
## 2: [502,520) 0.8000 0.1518 112 17 16 11
## 3: [520,540) 0.7000 0.1224 98 12 14 5
## 4: [540,553) 0.6143 0.0930 86 8 12 4
## 5: [553,568) 0.5000 0.0571 70 4 16 4
## 6: [568,580) 0.4000 0.0536 56 3 14 1
## 7: [580,591) 0.3071 0.0465 43 2 13 1
## 8: [591,605) 0.2071 0.0000 29 0 14 2
## 9: [605,617) 0.1143 0.0000 16 0 13 0
## 10: [617, Inf) 0.0000 0.0000 0 0 16 0
## pos
## <int>
## 1: 5
## 2: 5
## 3: 9
## 4: 8
## 5: 12
## 6: 13
## 7: 12
## 8: 12
## 9: 13
## 10: 16
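The cumulative columns of the table can be rebuilt from the per-bin `count` and `neg` columns, which also makes the cutoff choice explicit. This is a sketch using the counts printed above; the 10% target comes from the text, while the column name `upper` and the selection logic are only illustrative.

```r
# Per-bin counts copied from the table above (test set, 140 rows)
bins <- data.frame(
  upper = c(502, 520, 540, 553, 568, 580, 591, 605, 617, Inf),
  count = c(12, 16, 14, 12, 16, 14, 13, 14, 13, 16),
  neg   = c(7, 11, 5, 4, 4, 1, 1, 2, 0, 0)
)

# Applicants (and bad applicants) scoring at or above each bin's upper bound
bins$count_approved <- sum(bins$count) - cumsum(bins$count)
bins$neg_approved   <- sum(bins$neg) - cumsum(bins$neg)

# Approval rate and bad rate among approved applicants
bins$approval_rate <- round(bins$count_approved / sum(bins$count), 4)
bins$neg_rate <- round(ifelse(bins$count_approved > 0,
                              bins$neg_approved / bins$count_approved, 0), 4)

# Lowest cutoff whose bad rate among approved applicants stays under 10%
target <- 0.10
chosen <- bins[which(bins$neg_rate < target)[1], ]
print(chosen)
```

The selected row has an upper bound of 553, so approving scores of 553 and above (that is, above 552) keeps the bad rate at about 9.3% while approving about 61% of applicants.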
# predict new data
new_data <- data.frame(list(age = 22,
ed = 2,
employ = 1,
address = 1,
income = 20,
debtinc = 11.0,
creddebt = 0.775656,
othdebt = 1.318344))
new_data
## age ed employ address income debtinc creddebt othdebt
## 1 22 2 1 1 20 11 0.775656 1.318344
## score recommendation
## <num> <char>
## 1: 504 BAD
The new applicant scores 504, which is below the cutoff, so the recommendation is BAD (reject). The applicant would only be approved if we lowered the cutoff to around 502, but the first row of the cutoff table shows that this raises the bad rate among approved applicants to about 22%. The cutoff can be adjusted to match the needs of the business.
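The score of 504 can be checked by hand by adding the base points to the points of the bin each value falls into, read off the scorecard above. The decision rule in the sketch (approve only scores above 552) and the "GOOD" label are assumptions; only the "BAD" label appears in the output.

```r
# Points for the bins the new applicant falls into, read off the scorecard
applicant_points <- c(
  basepoints = 546,
  age        = -7,   # age 22        -> [-Inf,26)
  ed         = -1,   # ed 2          -> 2
  employ     = -26,  # employ 1      -> [-Inf,4)
  address    = -10,  # address 1     -> [1,7)
  income     = -5,   # income 20     -> [20,30)
  debtinc    = -7,   # debtinc 11.0  -> [11,16)
  creddebt   = 14,   # creddebt 0.78 -> [0.1,1.2)
  othdebt    = 0     # othdebt 1.32  -> [0.6,1.8)
)

# Total score is the sum of base points and bin points
score <- sum(applicant_points)

# Assumed decision rule: approve only scores above the 552 cutoff
cutoff <- 552
recommendation <- ifelse(score > cutoff, "GOOD", "BAD")

print(data.frame(score = score, recommendation = recommendation))
```

The largest deductions come from employ (-26) and address (-10), so short work experience and a short time at the current address are what pull this applicant below the cutoff.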