信用评分卡开发 - 基于R语言

B.Li 2023.04.20

本次使用 scorecard 包内置的德国信贷数据,并选择逻辑回归模型进行信用评分卡构建。

目录

  1. 探索性分析

  2. 特征工程

  3. 建模&模型评估

  4. 评分卡构建

1.探索性分析

#加载所用包
library(scorecard)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0     ✔ purrr   1.0.1
## ✔ tibble  3.2.1     ✔ dplyr   1.1.0
## ✔ tidyr   1.3.0     ✔ stringr 1.5.0
## ✔ readr   2.1.3     ✔ forcats 1.0.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter()     masks stats::filter()
## ✖ dplyr::lag()        masks stats::lag()
## ✖ tidyr::replace_na() masks scorecard::replace_na()
library(corrplot) 
## corrplot 0.92 loaded
library(naniar)
library(glmnet)
## 载入需要的程辑包:Matrix
## 
## 载入程辑包:'Matrix'
## 
## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack
## 
## Loaded glmnet 4.1-7

首先,查看数据的基础统计信息:

data("germancredit")#导入数据
glimpse(germancredit)
## Rows: 1,000
## Columns: 21
## $ status.of.existing.checking.account                      <fct> ... < 0 DM, 0…
## $ duration.in.month                                        <dbl> 6, 48, 12, 42…
## $ credit.history                                           <fct> critical acco…
## $ purpose                                                  <chr> "radio/televi…
## $ credit.amount                                            <dbl> 1169, 5951, 2…
## $ savings.account.and.bonds                                <fct> unknown/ no s…
## $ present.employment.since                                 <fct> ... >= 7 year…
## $ installment.rate.in.percentage.of.disposable.income      <dbl> 4, 2, 2, 2, 3…
## $ personal.status.and.sex                                  <fct> male : divorc…
## $ other.debtors.or.guarantors                              <fct> none, none, n…
## $ present.residence.since                                  <dbl> 4, 2, 3, 4, 4…
## $ property                                                 <fct> "real estate"…
## $ age.in.years                                             <dbl> 67, 22, 49, 4…
## $ other.installment.plans                                  <fct> none, none, n…
## $ housing                                                  <fct> own, own, own…
## $ number.of.existing.credits.at.this.bank                  <dbl> 2, 1, 1, 1, 2…
## $ job                                                      <fct> skilled emplo…
## $ number.of.people.being.liable.to.provide.maintenance.for <dbl> 1, 1, 2, 2, 2…
## $ telephone                                                <fct> "yes, registe…
## $ foreign.worker                                           <fct> yes, yes, yes…
## $ creditability                                            <fct> good, bad, go…
summary(germancredit)
##                                      status.of.existing.checking.account
##  ... < 0 DM                                            :274             
##  0 <= ... < 200 DM                                     :269             
##  ... >= 200 DM / salary assignments for at least 1 year: 63             
##  no checking account                                   :394             
##                                                                         
##                                                                         
##  duration.in.month
##  Min.   : 4.0     
##  1st Qu.:12.0     
##  Median :18.0     
##  Mean   :20.9     
##  3rd Qu.:24.0     
##  Max.   :72.0     
##                                                      credit.history
##  no credits taken/ all credits paid back duly               : 40   
##  all credits at this bank paid back duly                    : 49   
##  existing credits paid back duly till now                   :530   
##  delay in paying off in the past                            : 88   
##  critical account/ other credits existing (not at this bank):293   
##                                                                    
##    purpose          credit.amount                 savings.account.and.bonds
##  Length:1000        Min.   :  250   ... < 100 DM               :603        
##  Class :character   1st Qu.: 1366   100 <= ... < 500 DM        :103        
##  Mode  :character   Median : 2320   500 <= ... < 1000 DM       : 63        
##                     Mean   : 3271   ... >= 1000 DM             : 48        
##                     3rd Qu.: 3972   unknown/ no savings account:183        
##                     Max.   :18424                                          
##        present.employment.since
##  unemployed        : 62        
##  ... < 1 year      :172        
##  1 <= ... < 4 years:339        
##  4 <= ... < 7 years:174        
##  ... >= 7 years    :253        
##                                
##  installment.rate.in.percentage.of.disposable.income
##  Min.   :1.000                                      
##  1st Qu.:2.000                                      
##  Median :3.000                                      
##  Mean   :2.973                                      
##  3rd Qu.:4.000                                      
##  Max.   :4.000                                      
##                         personal.status.and.sex other.debtors.or.guarantors
##  male : divorced/separated          : 50        none        :907           
##  female : divorced/separated/married:310        co-applicant: 41           
##  male : single                      :548        guarantor   : 52           
##  male : married/widowed             : 92                                   
##  female : single                    :  0                                   
##                                                                            
##  present.residence.since
##  Min.   :1.000          
##  1st Qu.:2.000          
##  Median :3.000          
##  Mean   :2.845          
##  3rd Qu.:4.000          
##  Max.   :4.000          
##                                                  property    age.in.years  
##  real estate                                         :282   Min.   :19.00  
##  building society savings agreement/ life insurance  :232   1st Qu.:27.00  
##  car or other, not in attribute Savings account/bonds:332   Median :33.00  
##  unknown / no property                               :154   Mean   :35.55  
##                                                             3rd Qu.:42.00  
##                                                             Max.   :75.00  
##  other.installment.plans     housing    number.of.existing.credits.at.this.bank
##  bank  :139              rent    :179   Min.   :1.000                          
##  stores: 47              own     :713   1st Qu.:1.000                          
##  none  :814              for free:108   Median :1.000                          
##                                         Mean   :1.407                          
##                                         3rd Qu.:2.000                          
##                                         Max.   :4.000                          
##                                                             job     
##  unemployed/ unskilled - non-resident                         : 22  
##  unskilled - resident                                         :200  
##  skilled employee / official                                  :630  
##  management/ self-employed/ highly qualified employee/ officer:148  
##                                                                     
##                                                                     
##  number.of.people.being.liable.to.provide.maintenance.for
##  Min.   :1.000                                           
##  1st Qu.:1.000                                           
##  Median :1.000                                           
##  Mean   :1.155                                           
##  3rd Qu.:1.000                                           
##  Max.   :2.000                                           
##                                     telephone   foreign.worker creditability
##  none                                    :596   yes:963        bad :300     
##  yes, registered under the customers name:404   no : 37        good:700     
##                                                                             
##                                                                             
##                                                                             
## 

发现:

1)当前样本分布(7:3)的均衡性尚可,认为不会导致模型仅学得先验认知。

2)部分特征可能取值一致性过高,没有预测能力。如有,需剔除此类特征。

3)未发现特征存在异常值,但特征取值范围差距明显,需要特征归一化。

其次,观察特征缺失情况:

gg_miss_var(germancredit)

4)未发现特征存在缺失值。

再其次,观察各特征的相关水平:

#对于连续型变量,进行相关性检验
X1<-germancredit%>%select(-creditability)%>%
  select_if(is.numeric)
colnames(X1)<-c('duration','credit amount','installment rate','present residence','age','number of credit','number of liabel')
corrplot(corr = cor(X1),method = "square")

#对于离散型变量,进行卡方检验
chisq.test(germancredit$status.of.existing.checking.account,germancredit$creditability)
## 
##  Pearson's Chi-squared test
## 
## data:  germancredit$status.of.existing.checking.account and germancredit$creditability
## X-squared = 123.72, df = 3, p-value < 2.2e-16

5)大部分特征之间无明显相关关系。若模型表现欠佳,可尝试对高相关性变量择一后重新建模。

综上所述,认为无需对当前数据进行缺失值、非均衡性等问题的处理,经过特征选择及归一化后便可进行建模。

2.特征工程

尝试通过IV值进行特征筛选。首先计算并观察各特征的IV值:

info_value = iv(germancredit, y = "creditability")
info_value
##                                                     variable   info_value
##  1:                      status.of.existing.checking.account 6.660115e-01
##  2:                                        duration.in.month 3.345035e-01
##  3:                                           credit.history 2.932335e-01
##  4:                                             age.in.years 2.596514e-01
##  5:                                savings.account.and.bonds 1.960096e-01
##  6:                                                  purpose 1.691951e-01
##  7:                                                 property 1.126383e-01
##  8:                                 present.employment.since 8.643363e-02
##  9:                                                  housing 8.329343e-02
## 10:                                  other.installment.plans 5.761454e-02
## 11:                                           foreign.worker 4.387741e-02
## 12:                                            credit.amount 3.895727e-02
## 13:                              other.debtors.or.guarantors 3.201932e-02
## 14:      installment.rate.in.percentage.of.disposable.income 2.632209e-02
## 15:                  number.of.existing.credits.at.this.bank 1.326652e-02
## 16:                                  personal.status.and.sex 8.839919e-03
## 17:                                                      job 8.762766e-03
## 18:                                                telephone 6.377605e-03
## 19:                                  present.residence.since 3.588773e-03
## 20: number.of.people.being.liable.to.provide.maintenance.for 4.339223e-05

发现达到最低可用门槛(IV>.02)的特征有13个,其中预测能力较强的特征(.1<IV<.5)有6个。关于IV值的高低,直观的理解为”特征在各切片下好坏样本的概率分布与先验概率的差异”:

ggplot(germancredit,aes(x = duration.in.month,fill = creditability))+geom_bar()+theme(axis.text.x = element_text(angle = 45, hjust = 1))

ggplot(germancredit,aes(x = present.residence.since ,fill = creditability))+geom_bar()

此处取duration.in.month(IV>.3)及present.residence.since(IV>.003)进行比较,可以看出前者部分切片下的样本分布明显有别于先验认知(3:7),而后者无明显区别。

基于IV值等信息进行特征筛选,并将过滤后的数据集划分成训练集/测试集:

df<-var_filter(germancredit,y="creditability",lims = list(identical_rate = .95,info_value = .02),positive = "bad|1")%>% 
  select(-status.of.existing.checking.account) #剔除IV过高的特征
## ℹ Filtering variables via missing_rate, identical_rate, info_value ...
## ✔ 1 variables are removed via identical_rate
## ✔ 6 variables are removed via info_value
## ✔ Variable filtering on 1000 rows and 20 columns in 00:00:00
## ✔ 7 variables are removed in total
df_list<-split_df(df,y="creditability",ratios=c(.7,.3),seed = 121)
df_train<-df_list$train
df_test<-df_list$test

出于特征归一化的需要,对特征值进行woe变换:

bins_tree<-woebin(df, y="creditability",method = 'tree',positive = "bad|1")#分箱,分箱方法选择决策树
## ℹ Creating woe binning ...
## ✔ Binning on 1000 rows and 13 columns in 00:00:03
df_woe<-lapply(df_list,function(x){woebin_ply(x,bins = bins_tree, to = "woe")})#基于分箱计算woe值
## ℹ Converting into woe values ...
## ✔ Woe transformating on 676 rows and 12 columns in 00:00:01
## ℹ Converting into woe values ...
## ✔ Woe transformating on 324 rows and 12 columns in 00:00:01
head(df_woe)
## $train
##      creditability duration.in.month_woe credit.history_woe purpose_woe
##   1:             0            -1.3121864        -0.73374058  -0.4100628
##   2:             1             1.1349799         0.08831862  -0.4100628
##   3:             0             0.5245245         0.08831862   0.2799201
##   4:             1             0.1086883         0.08515781   0.2799201
##   5:             0             0.1086883         0.08831862   0.2799201
##  ---                                                                   
## 672:             0             0.5245245         0.08831862   0.2799201
## 673:             0            -0.3466246         0.08831862   0.2799201
## 674:             0            -0.3466246         0.08831862   0.2799201
## 675:             0            -0.3466246         0.08831862  -0.4100628
## 676:             1             1.1349799         0.08831862  -0.4100628
##      credit.amount_woe savings.account.and.bonds_woe
##   1:        0.03366128                    -0.7621401
##   2:        0.39053946                     0.2713578
##   3:        0.39053946                     0.2713578
##   4:        0.39053946                     0.2713578
##   5:       -0.25830746                    -0.7621401
##  ---                                                
## 672:       -0.25830746                     0.2713578
## 673:       -0.25830746                    -0.7621401
## 674:       -0.72823850                     0.2713578
## 675:        0.03366128                     0.2713578
## 676:       -0.25830746                     0.2713578
##      present.employment.since_woe
##   1:                  -0.23556607
##   2:                   0.03210325
##   3:                  -0.39441527
##   4:                   0.03210325
##   5:                  -0.23556607
##  ---                             
## 672:                   0.43113746
## 673:                  -0.23556607
## 674:                  -0.39441527
## 675:                  -0.23556607
## 676:                   0.03210325
##      installment.rate.in.percentage.of.disposable.income_woe
##   1:                                              0.15730029
##   2:                                             -0.19047277
##   3:                                             -0.19047277
##   4:                                             -0.06453852
##   5:                                             -0.06453852
##  ---                                                        
## 672:                                              0.15730029
## 673:                                              0.15730029
## 674:                                             -0.06453852
## 675:                                              0.15730029
## 676:                                              0.15730029
##      other.debtors.or.guarantors_woe property_woe age.in.years_woe
##   1:                      0.02797385  -0.46103496       -0.2123715
##   2:                      0.02797385  -0.46103496        0.5288441
##   3:                     -0.58778666   0.02857337       -0.2123715
##   4:                      0.02797385   0.58608236       -0.2123715
##   5:                      0.02797385   0.02857337       -0.2123715
##  ---                                                              
## 672:                      0.02797385   0.02857337        0.1424546
## 673:                      0.02797385   0.03419136       -0.2123715
## 674:                      0.02797385  -0.46103496        0.1424546
## 675:                      0.02797385   0.03419136       -0.2123715
## 676:                      0.02797385   0.58608236        0.5288441
##      other.installment.plans_woe housing_woe
##   1:                  -0.1211786  -0.1941560
##   2:                  -0.1211786  -0.1941560
##   3:                  -0.1211786   0.4726044
##   4:                  -0.1211786   0.4726044
##   5:                  -0.1211786  -0.1941560
##  ---                                        
## 672:                  -0.1211786  -0.1941560
## 673:                  -0.1211786  -0.1941560
## 674:                  -0.1211786  -0.1941560
## 675:                  -0.1211786  -0.1941560
## 676:                  -0.1211786   0.4726044
## 
## $test
##      creditability duration.in.month_woe credit.history_woe purpose_woe
##   1:             0            -0.3466246        -0.73374058   0.2799201
##   2:             0             0.5245245         0.08831862   0.2799201
##   3:             0             0.1086883         1.23407084   0.2799201
##   4:             0             0.1086883         0.08831862  -0.4100628
##   5:             0            -1.3121864         1.23407084  -0.4100628
##  ---                                                                   
## 320:             0            -0.3466246        -0.73374058   0.2799201
## 321:             0            -0.3466246         1.23407084  -0.4100628
## 322:             0             0.1086883         0.08831862  -0.4100628
## 323:             0             0.1086883         0.08831862  -0.8056252
## 324:             0             1.1349799        -0.73374058  -0.8056252
##      credit.amount_woe savings.account.and.bonds_woe
##   1:       -0.25830746                     0.2713578
##   2:        0.39053946                    -0.7621401
##   3:        0.39053946                    -0.7621401
##   4:       -0.25830746                    -0.7621401
##   5:        0.03366128                     0.2713578
##  ---                                                
## 320:       -0.25830746                    -0.7621401
## 321:       -0.72823850                     0.1395519
## 322:       -0.25830746                    -0.7621401
## 323:       -0.25830746                     0.2713578
## 324:        0.39053946                     0.1395519
##      present.employment.since_woe
##   1:                  -0.39441527
##   2:                   0.03210325
##   3:                   0.43113746
##   4:                  -0.23556607
##   5:                  -0.23556607
##  ---                             
## 320:                   0.43113746
## 321:                  -0.23556607
## 322:                  -0.39441527
## 323:                   0.03210325
## 324:                   0.43113746
##      installment.rate.in.percentage.of.disposable.income_woe
##   1:                                             -0.19047277
##   2:                                             -0.19047277
##   3:                                             -0.19047277
##   4:                                             -0.06453852
##   5:                                              0.15730029
##  ---                                                        
## 320:                                             -0.19047277
## 321:                                              0.15730029
## 322:                                             -0.19047277
## 323:                                              0.15730029
## 324:                                             -0.06453852
##      other.debtors.or.guarantors_woe property_woe age.in.years_woe
##   1:                      0.02797385  -0.46103496       -0.2123715
##   2:                      0.02797385   0.58608236       -0.8724881
##   3:                      0.02797385   0.03419136        0.5288441
##   4:                      0.02797385   0.03419136        0.1424546
##   5:                      0.02797385   0.03419136       -0.2123715
##  ---                                                              
## 320:                      0.02797385   0.02857337       -0.2123715
## 321:                      0.02797385   0.03419136        0.1424546
## 322:                      0.02797385   0.03419136        0.5288441
## 323:                      0.02797385   0.02857337       -0.2123715
## 324:                      0.02797385   0.03419136       -0.1609304
##      other.installment.plans_woe housing_woe
##   1:                  -0.1211786  -0.1941560
##   2:                  -0.1211786   0.4726044
##   3:                   0.4775508  -0.1941560
##   4:                  -0.1211786  -0.1941560
##   5:                  -0.1211786  -0.1941560
##  ---                                        
## 320:                  -0.1211786  -0.1941560
## 321:                   0.4775508  -0.1941560
## 322:                  -0.1211786   0.4044452
## 323:                  -0.1211786  -0.1941560
## 324:                  -0.1211786  -0.1941560

此时特征已初步筛选完成(13个),取值范围同为0-1,且数据集已划分完毕。接下来可以进行建模及模型评估。

3.建模&模型评估

模型选择逻辑回归:

m1<-glm(creditability~.,family = binomial,data = df_woe$train)
summary(m1)
## 
## Call:
## glm(formula = creditability ~ ., family = binomial, data = df_woe$train)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.1101  -0.7685  -0.4724   0.8565   2.6345  
## 
## Coefficients:
##                                                         Estimate Std. Error
## (Intercept)                                              -0.9060     0.0982
## duration.in.month_woe                                     0.5994     0.1995
## credit.history_woe                                        0.8006     0.1781
## purpose_woe                                               1.0313     0.2538
## credit.amount_woe                                         1.0754     0.2468
## savings.account.and.bonds_woe                             1.0294     0.2368
## present.employment.since_woe                              0.7972     0.3324
## installment.rate.in.percentage.of.disposable.income_woe   1.7796     0.6314
## other.debtors.or.guarantors_woe                           0.9177     0.7727
## property_woe                                              0.6061     0.3322
## age.in.years_woe                                          0.6477     0.2753
## other.installment.plans_woe                               0.6994     0.3879
## housing_woe                                               0.4922     0.3562
##                                                         z value Pr(>|z|)    
## (Intercept)                                              -9.226  < 2e-16 ***
## duration.in.month_woe                                     3.005  0.00266 ** 
## credit.history_woe                                        4.495 6.96e-06 ***
## purpose_woe                                               4.064 4.82e-05 ***
## credit.amount_woe                                         4.358 1.31e-05 ***
## savings.account.and.bonds_woe                             4.347 1.38e-05 ***
## present.employment.since_woe                              2.398  0.01647 *  
## installment.rate.in.percentage.of.disposable.income_woe   2.819  0.00482 ** 
## other.debtors.or.guarantors_woe                           1.188  0.23495    
## property_woe                                              1.824  0.06812 .  
## age.in.years_woe                                          2.352  0.01866 *  
## other.installment.plans_woe                               1.803  0.07136 .  
## housing_woe                                               1.382  0.16697    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 819.35  on 675  degrees of freedom
## Residual deviance: 662.78  on 663  degrees of freedom
## AIC: 688.78
## 
## Number of Fisher Scoring iterations: 5

发现少量特征的预测作用不显著,尝试用双向搜索进一步筛选特征:

m2<-step(m1, direction="both", trace = FALSE)
summary(m2)
## 
## Call:
## glm(formula = creditability ~ duration.in.month_woe + credit.history_woe + 
##     purpose_woe + credit.amount_woe + savings.account.and.bonds_woe + 
##     present.employment.since_woe + installment.rate.in.percentage.of.disposable.income_woe + 
##     property_woe + age.in.years_woe + other.installment.plans_woe, 
##     family = binomial, data = df_woe$train)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.0554  -0.7653  -0.4872   0.8676   2.6298  
## 
## Coefficients:
##                                                         Estimate Std. Error
## (Intercept)                                             -0.90777    0.09797
## duration.in.month_woe                                    0.58563    0.19754
## credit.history_woe                                       0.79633    0.17649
## purpose_woe                                              1.04282    0.25281
## credit.amount_woe                                        1.08733    0.24582
## savings.account.and.bonds_woe                            0.99939    0.23514
## present.employment.since_woe                             0.82819    0.33029
## installment.rate.in.percentage.of.disposable.income_woe  1.76511    0.62848
## property_woe                                             0.82920    0.30459
## age.in.years_woe                                         0.69000    0.27325
## other.installment.plans_woe                              0.63804    0.38476
##                                                         z value Pr(>|z|)    
## (Intercept)                                              -9.266  < 2e-16 ***
## duration.in.month_woe                                     2.965  0.00303 ** 
## credit.history_woe                                        4.512 6.42e-06 ***
## purpose_woe                                               4.125 3.71e-05 ***
## credit.amount_woe                                         4.423 9.73e-06 ***
## savings.account.and.bonds_woe                             4.250 2.14e-05 ***
## present.employment.since_woe                              2.507  0.01216 *  
## installment.rate.in.percentage.of.disposable.income_woe   2.809  0.00498 ** 
## property_woe                                              2.722  0.00648 ** 
## age.in.years_woe                                          2.525  0.01156 *  
## other.installment.plans_woe                               1.658  0.09726 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 819.35  on 675  degrees of freedom
## Residual deviance: 666.10  on 665  degrees of freedom
## AIC: 688.1
## 
## Number of Fisher Scoring iterations: 5

模型表现:

result_list<-lapply(df_woe,function(x){predict.glm(m2,x,type = "response")})#预测结果
label_list<-lapply(df_woe,function(x){x$creditability})#真实标签
perf1<-perf_eva(result_list,label = label_list, show_plot = c("ks","roc","pr","f1"),positive = "bad|1")
## Warning: The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead as
## of ggplot2 3.3.4.
## ℹ The deprecated feature was likely used in the scorecard package.
##   Please report the issue at <]8;;https://github.com/ShichenXie/scorecard/issueshttps://github.com/ShichenXie/scorecard/issues]8;;>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

当前模型KS=0.47,ROC=0.80,认为性能良好。接下来将模型映射为评分卡:

card = scorecard(bins_tree, m2)#基于模型所学系数,将各特征按其分箱赋予分值
score_list = lapply(df_list, function(x) scorecard_ply(x, card))#基于评分规则,计算各对象的最终得分
head(score_list)
## $train
##      score
##   1:   672
##   2:   410
##   3:   417
##   4:   359
##   5:   534
##  ---      
## 672:   356
## 673:   526
## 674:   538
## 675:   480
## 676:   354
## 
## $test
##      score
##   1:   582
##   2:   465
##   3:   328
##   4:   568
##   5:   454
##  ---      
## 320:   577
## 321:   438
## 322:   575
## 323:   497
## 324:   461

计算目前的评分卡的PSI,从而判定其预测结果是否稳定:

perf_psi(score = score_list, label = label_list)
## $pic
## $pic$score

## 
## 
## $psi
##    variable    dataset        psi
## 1:    score train_test 0.06038522

PSI=.03,认为模型稳定。同时,上图显示了各分数段代表的坏账概率(Positive probability),分数越高则坏账率越低。

可根据业务需要指定界定好坏的分数阈值,或基于模型性能指标(KS/AUC/F1等)得出”相对最佳”的阈值:

#观察perf1,可知KS与F1的最大值出现于将预测概率排名前40%左右的样本判为正样本时,由此可以计算模型的判定阈值:
table_result<- cbind(result_list$test,label_list$test)%>%
    as.data.frame()%>%
    rename(p_estimate = V1,label = V2)%>%
    arrange(desc(p_estimate))%>%
    mutate(cnt = 1)%>%
    mutate(cnt_positive = cumsum(cnt))%>%
    mutate(rate_positive = cnt_positive/nrow(df_list$test))

p_estimate<-mean(table_result[which(table_result$rate_positive>=.38 & table_result$rate_positive<=.42),1])
p_estimate
## [1] 0.2830725

即,可将模型输出概率0.28作为判定好坏样本的阈值,与评分卡关联,可知其对应的确切分数。粗略比对,可知目标分数约为500。

至此,信用评分卡开发完成。