Information about the Classification Problem

LTFS (L&T Financial Services) is one of India’s most respected and leading non-banking financial companies (NBFCs).

Problem Statement

Vehicle Loan Default Prediction

Financial institutions incur significant losses from vehicle loan defaults. This has led to tighter vehicle loan underwriting and higher vehicle loan rejection rates, and these institutions have also raised the need for a better credit risk scoring model. This warrants a study to estimate the determinants of vehicle loan default. A financial institution has hired you to accurately predict the probability of a loanee/borrower defaulting on a vehicle loan in the first EMI (Equated Monthly Instalment) on the due date. The following information about the loan and the loanee is provided in the datasets:

  • Loanee Information (Demographic data like age, income, Identity proof etc.)

  • Loan Information (Disbursal details, amount, EMI, loan to value ratio etc.)

  • Bureau data & history (Bureau score, number of active accounts, the status of other loans, credit history etc.)

Doing so will ensure that clients capable of repayment are not rejected, and it will identify the important determinants, which can then be used to minimise default rates.

Data Description

The Data Set contains:

  • train.csv contains the training data with the loan details described in the previous section.

  • data_dictionary.csv contains a brief description of each variable provided in the training and test sets.

  • test.csv contains details of all customers and loans for which the participants are to submit the probability of default.

  • sample_submission.csv contains the submission format for the predictions against the test set. A single CSV file needs to be submitted as a solution.

Evaluation Metric

Submissions are evaluated on area under the ROC curve between the predicted probability and the observed target.
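As a rough illustration of the metric, the area under the ROC curve can be computed from ranks alone (the Mann-Whitney identity). The sketch below uses base R and toy data; the names auc_rank, score and target are assumptions made for this example and do not come from the competition code.

```r
# AUC via the rank-sum (Mann-Whitney) identity; base R only.
# `auc_rank`, `score` and `target` are illustrative names, not from the original code.
auc_rank <- function(score, target) {
  stopifnot(length(score) == length(target), all(target %in% c(0, 1)))
  n_pos <- sum(target == 1)
  n_neg <- sum(target == 0)
  r <- rank(score)   # average ranks, so tied scores count as 1/2
  (sum(r[target == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy data: 1 = default, 0 = no default
target <- c(0, 0, 1, 0, 1, 1)
score  <- c(0.10, 0.40, 0.35, 0.20, 0.80, 0.70)
auc <- auc_rank(score, target)
print(auc)
```

The value is the share of (defaulter, non-defaulter) pairs in which the defaulter receives the higher predicted probability, so only the ordering of the predictions matters, not their calibration.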

Machine learning methods applied to the classification problem (two classes)

Although various packages, including tidymodels, are actively used for preprocessing and for interpreting the modeling results, Microsoft ML Server 9.4.7 from Microsoft SQL Server 2019 (update of August 2019) was the main toolset for building the classification models.

Functions for Visualizations

R Markdown does not support combining Rmd files. Per the R Markdown website, it requires a single Rmd file and does not currently support embedding one Rmd file within another Rmd document.

Optimal Binning for Factor and Numeric (Scale) Variables

Import Data

## 
## Attaching package: 'data.table'
## The following objects are masked from 'package:dplyr':
## 
##     between, first, last
## The following object is masked from 'package:purrr':
## 
##     transpose
## Microsoft R Server (Machine Learning) 9.4.7
## New names:
## * `` -> `..3`
# Flag 90% of the rows of the original training dataset (DT1) for training
# Randomly split the data into two sets: training (inTrain) and testing (inTest, the remaining 10%)
seed <- 2019
set.seed(seed)

inTrain1 <- rep(FALSE, nrow(DT1))
inTrain1[sample(nrow(DT1), 9/10 * nrow(DT1))] <- TRUE

inTrain = c( inTrain1, rep( FALSE, times = dim(DT2)[1] ) )
inTest = c( !inTrain1, rep( FALSE, times = dim(DT2)[1] ) )
inProblem = c( rep( FALSE, times = dim(DT1)[1] ), rep( TRUE, times = dim(DT2)[1] ) )

# Adding Attributes by Variables into DT from Data Dictionary
attr(DT, "variable.labels") <- tibble(`Variable.Name` = colnames(DT)) %>%
  dplyr::left_join(DD, by = c('Variable.Name' = 'Variable Name')) %>%
    dplyr::select(Description) %>%
      pull

# Replace '.' (dot) in the variable names with '_' (underscore)
names(DT) %<>% gsub('[.]', '_', .)
# DT$PRI_CURRENT_BALANCE <- DT$PRI_CURRENT_BALANCE + 6678296
# DT$PRI_CURRENT_BALANCE <- log(DT$PRI_CURRENT_BALANCE)

# Rename all variables whose names are longer than 31 characters
oldnames = c('DELINQUENT_ACCTS_IN_LAST_SIX_MONTHS')

newnames = c('DELINQUENT_ACCTS_IN_LAST_SIX_M')

data.table::setnames(DT, old = oldnames, new = newnames)

# Parse the date of birth and convert it into 'Age' as of '2019-01-01'
Finish_Date <- lubridate::parse_date_time2('2019-01-01', orders = '%Y-%m-%d', tz = 'Asia/Dhaka')
DT[, Date_of_Birth := paste0(substr( DT$Date_of_Birth, 1, 6 ),
                             ifelse(substr( DT$Date_of_Birth, 7, 8 ) %>% as.integer > 40, '19', '20'),
                             substr( DT$Date_of_Birth, 7, 8 )) %>%
     as.Date(., '%d-%m-%Y') ]
DT[, Age := difftime(time1 = Finish_Date, time2 = Date_of_Birth, units = 'days') %>%
          as.integer(.) / 365.24 ]

# Parse the date of disbursal and convert it into 'YearsSinceDisbursment'
DT[, DisbursalDate := paste0(substr( DT$DisbursalDate, 1, 6 ),
                             ifelse(substr( DT$DisbursalDate, 7, 8 ) %>% as.integer > 40, '19', '20'),
                             substr( DT$DisbursalDate, 7, 8 )) %>%
    as.Date( ., '%d-%m-%Y') ]
DT[, YearsSinceDisbursment := difftime(time1 = Finish_Date, time2 = DisbursalDate, units = 'days') %>%
      as.integer(.) / 365.24 ]

# Parse the average loan tenure ('Xyrs Ymon') and convert it into 'AverageLoanTenure' (in years)
DT[, AverageLoanTenure := 
     stringr::str_split_fixed(DT$AVERAGE_ACCT_AGE, ' ', 2) %>%
       data.frame(., stringsAsFactors = FALSE) %>%
         setNames(c('Years', 'Months')) %>%
           dplyr::mutate(Years = readr::parse_number(Years), Months = readr::parse_number(Months) / 12 ) %>%
             dplyr::mutate(Len = Years +  Months) %>%
               dplyr::select(Len) %>% pull ]

# Parse the time since the first loan ('Xyrs Ymon') and convert it into 'TimeSinceFirstLoan' (in years)
DT[, TimeSinceFirstLoan := 
     stringr::str_split_fixed(DT$CREDIT_HISTORY_LENGTH, ' ', 2) %>%
       data.frame(., stringsAsFactors = FALSE) %>%
         setNames(c('Years', 'Months')) %>%
           dplyr::mutate(Years = readr::parse_number(Years), Months = readr::parse_number(Months) / 12 ) %>%
             dplyr::mutate(Len = Years +  Months) %>%
               dplyr::select(Len) %>% pull ]

DT[, ':=' (
  YearsOnLoan = difftime(time1 = DisbursalDate, time2 = Date_of_Birth, units = 'days') %>%
          as.integer(.) / 365.24,
  DisAsDiff = asset_cost - disbursed_amount,
  DisAsShare = asset_cost / disbursed_amount,
  # DiffLTV = (asset_cost - disbursed_amount - ltv),
  
  
  Qrt = lubridate::quarter(DisbursalDate) %>% as.factor(.), # quarter of the year
  Day = format(DisbursalDate, '%d') %>% as.integer,
  
  OutstandingNow = disbursed_amount + PRI_CURRENT_BALANCE,
  DisbursedTotal = PRI_DISBURSED_AMOUNT + disbursed_amount,
  ShareOverdue = DELINQUENT_ACCTS_IN_LAST_SIX_M - NEW_ACCTS_IN_LAST_SIX_MONTHS,
  # OutstandingNow2Dsbrsd = OutstandingNow / DisbursedTotal
  
  SEC_OverdueShareSec = SEC_OVERDUE_ACCTS / SEC_NO_OF_ACCTS,
  PRI_OverdueShare = PRI_OVERDUE_ACCTS / PRI_NO_OF_ACCTS
) ]

library('Hmisc')            # Harrell Miscellaneous
## Loading required package: survival
## 
## Attaching package: 'survival'
## The following object is masked from 'package:rpart':
## 
##     solder
## Loading required package: Formula
## 
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:dplyr':
## 
##     src, summarize
## The following objects are masked from 'package:base':
## 
##     format.pval, units
## Input object size:    96804992 bytes;     55 variables    345546 observations
## New object size: 96812608 bytes; 55 variables    345546 observations
## Rows Read: 345546, Total Rows Processed: 345546, Total Chunk Time: 0.746 seconds

I use the embed package, which contains extra steps for the recipes package for embedding predictors into one or more numeric columns. All of these preprocessing methods are supervised.

## [1] "branch_id"
## start par. =  0.1123205 fn =  221335.9 
## At return
## eval:  19 fn:      221335.56 par: 0.106959
## [1] "supplier_id"
## start par. =  0.2075279 fn =  220404.6 
## At return
## eval:  19 fn:      220349.07 par: 0.172799
## [1] "manufacturer_id"
## start par. =  0.04441674 fn =  223266.8 
## At return
## eval:  18 fn:      223266.33 par: 0.0533252
## [1] "Current_pincode_ID"
## start par. =  0.245132 fn =  221670.7 
## At return
## eval:  16 fn:      221350.34 par: 0.166798
## [1] "Employment_Type"
## start par. =  0.02972829 fn =  216219.5 
## At return
## eval:  28 fn:      216219.22 par: 0.0422908
## [1] "State_ID"
## start par. =  0.08355545 fn =  222283.1 
## At return
## eval:  18 fn:            222283. par: 0.0889640
## [1] "Employee_code_ID"
## start par. =  0.2213827 fn =  220258.2 
## At return
## eval:  19 fn:      220127.69 par: 0.175705
## [1] "PERFORM_CNS_SCORE_DESCRIPTION"
## start par. =  0.098095 fn =  221735 
## At return
## eval:  17 fn:      221733.42 par: 0.119980
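The general idea of a supervised encoding, replacing each level of a high-cardinality factor with a number learned from the target, can be sketched in base R. This is a simplified stand-in and not the embed implementation: the shrinkage here is a plain pseudo-count prior, and encode_levels, prior_weight and the simulated data are assumptions made for this example.

```r
# Simplified supervised ("likelihood") encoding of a factor; base R only.
# NOT the embed implementation: shrinkage is a simple pseudo-count prior.
# `encode_levels` and `prior_weight` are hypothetical names for this sketch.
encode_levels <- function(x, y, prior_weight = 10) {
  stopifnot(length(x) == length(y), all(y %in% c(0, 1)))
  overall <- mean(y)                   # global event rate
  n_lvl   <- tapply(y, x, length)      # rows per level
  bad_lvl <- tapply(y, x, sum)         # events per level
  # Shrink each level's rate toward the global rate; rare levels shrink most
  rate <- (bad_lvl + prior_weight * overall) / (n_lvl + prior_weight)
  log(rate / (1 - rate))               # encode each level as log-odds
}

# Simulated data: three branches, branch 'C' is rare
set.seed(2019)
branch <- factor(sample(c('A', 'B', 'C'), 500, replace = TRUE, prob = c(0.6, 0.35, 0.05)))
flag   <- rbinom(500, 1, ifelse(branch == 'A', 0.15, 0.30))
enc    <- encode_levels(branch, flag)
branch_num <- unname(enc[as.character(branch)])   # numeric column for modeling
print(round(enc, 3))
```

Because the encoding uses the target, it should be estimated on the training rows only and then applied to the hold-out and problem rows.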

Inspection of the data frame using automated data exploration and inspection packages: DataExplorer and summarytools.

## 
## Attaching package: 'summarytools'
## The following objects are masked from 'package:Hmisc':
## 
##     label, label<-
## The following object is masked from 'package:tibble':
## 
##     view

Data Frame Summary

DT

N: 345546
| No | Variable [Type] | Label | Stats / Values | Freqs (% of Valid) | Valid | Missing |
|---|---|---|---|---|---|---|
| 1 | UniqueID [integer] | | mean (sd): 593106.04 (101481.56); min < med < max: 417428 < 592918.5 < 769909; IQR (CV): 175115.5 (0.17) | 345546 distinct values | 345546 (100%) | 0 (0%) |
| 2 | disbursed_amount [integer] | | mean (sd): 54916.38 (13045.96); min < med < max: 11613 < 54303 < 990572; IQR (CV): 13302 (0.24) | 29271 distinct values | 345546 (100%) | 0 (0%) |
| 3 | asset_cost [integer] | | mean (sd): 76294.84 (18738.64); min < med < max: 37000 < 71541 < 1628992; IQR (CV): 13323 (0.25) | 53158 distinct values | 345546 (100%) | 0 (0%) |
| 4 | ltv [numeric] | | mean (sd): 74.93 (11.32); min < med < max: 10.03 < 77.14 < 95; IQR (CV): 14.41 (0.15) | 6819 distinct values | 345546 (100%) | 0 (0%) |
| 5 | branch_id [factor] | | 1. 1; 2. 2; 3. 3; 4. 5; 5. 7; 6. 8; 7. 9; 8. 10; 9. 11; 10. 13; [ 72 others ] | 8337 (2.4%); 20527 (5.9%); 14881 (4.3%); 10276 (3.0%); 4323 (1.3%); 4472 (1.3%); 3891 (1.1%); 5685 (1.6%); 5873 (1.7%); 4170 (1.2%); 263111 (76.1%) | 345546 (100%) | 0 (0%) |
| 6 | supplier_id [factor] | | 1. 10524; 2. 12311; 3. 12312; 4. 12374; 5. 12441; 6. 12456; 7. 12500; 8. 12534; 9. 12539; 10. 12797; [ 3079 others ] | 7 (0.0%); 6 (0.0%); 57 (0.0%); 146 (0.0%); 62 (0.0%); 101 (0.0%); 79 (0.0%); 73 (0.0%); 11 (0.0%); 87 (0.0%); 344917 (99.8%) | 345546 (100%) | 0 (0%) |
| 7 | manufacturer_id [factor] | | 1. 45; 2. 48; 3. 49; 4. 51; 5. 67; 6. 86; 7. 120; 8. 145; 9. 152; 10. 153; [ 2 others ] | 87053 (25.2%); 22964 (6.6%); 14812 (4.3%); 40927 (11.8%); 3364 (1.0%); 161203 (46.7%); 14049 (4.1%); 1138 (0.3%); 9 (0.0%); 25 (0.0%); 2 (0.0%) | 345546 (100%) | 0 (0%) |
| 8 | Current_pincode_ID [factor] | | 1. 1; 2. 2; 3. 3; 4. 4; 5. 5; 6. 6; 7. 7; 8. 8; 9. 9; 10. 10; [ 7086 others ] | 44 (0.0%); 118 (0.0%); 87 (0.0%); 153 (0.0%); 331 (0.1%); 162 (0.0%); 152 (0.0%); 74 (0.0%); 56 (0.0%); 8 (0.0%); 344361 (99.7%) | 345546 (100%) | 0 (0%) |
| 9 | Date_of_Birth [Date] | | min: 1949-09-15; med: 1986-01-01; max: 2000-11-29; range: 51y 2m 14d | 15888 distinct values | 345546 (100%) | 0 (0%) |
| 10 | Employment_Type [factor] | | 1. (empty string); 2. Salaried; 3. Self employed | 11104 (3.2%); 147013 (42.5%); 187429 (54.2%) | 345546 (100%) | 0 (0%) |
| 11 | DisbursalDate [Date] | | min: 2018-08-01; med: 2018-10-20; max: 2018-11-30; range: 3m 29d | 111 distinct values | 345546 (100%) | 0 (0%) |
| 12 | State_ID [factor] | | 1. 1; 2. 2; 3. 3; 4. 4; 5. 5; 6. 6; 7. 7; 8. 8; 9. 9; 10. 10; [ 12 others ] | 14351 (4.2%); 7258 (2.1%); 47868 (13.9%); 70438 (20.4%); 14304 (4.1%); 48903 (14.2%); 10628 (3.1%); 20047 (5.8%); 21459 (6.2%); 5564 (1.6%); 84726 (24.5%) | 345546 (100%) | 0 (0%) |
| 13 | Employee_code_ID [factor] | | 1. 1; 2. 3; 3. 4; 4. 5; 5. 7; 6. 9; 7. 10; 8. 11; 9. 12; 10. 15; [ 3388 others ] | 106 (0.0%); 192 (0.1%); 96 (0.0%); 133 (0.0%); 221 (0.1%); 77 (0.0%); 44 (0.0%); 111 (0.0%); 162 (0.0%); 146 (0.0%); 344258 (99.6%) | 345546 (100%) | 0 (0%) |
| 14 | MobileNo_Avl_Flag [factor] | | 1. 1 | 345546 (100.0%) | 345546 (100%) | 0 (0%) |
| 15 | Aadhar_flag [factor] | | 1. 0; 2. 1 | 51883 (15.0%); 293663 (85.0%) | 345546 (100%) | 0 (0%) |
| 16 | PAN_flag [factor] | | 1. 0; 2. 1 | 306392 (88.7%); 39154 (11.3%) | 345546 (100%) | 0 (0%) |
| 17 | VoterID_flag [factor] | | 1. 0; 2. 1 | 298155 (86.3%); 47391 (13.7%) | 345546 (100%) | 0 (0%) |
| 18 | Driving_flag [factor] | | 1. 0; 2. 1 | 338249 (97.9%); 7297 (2.1%) | 345546 (100%) | 0 (0%) |
| 19 | Passport_flag [factor] | | 1. 0; 2. 1 | 344835 (99.8%); 711 (0.2%) | 345546 (100%) | 0 (0%) |
| 20 | PERFORM_CNS_SCORE [integer] | | mean (sd): 289.03 (338.84); min < med < max: 0 < 0 < 890; IQR (CV): 679 (1.17) | 574 distinct values | 345546 (100%) | 0 (0%) |
| 21 | PERFORM_CNS_SCORE_DESCRIPTION [factor] | | 1. A-Very Low Risk; 2. B-Very Low Risk; 3. C-Very Low Risk; 4. D-Very Low Risk; 5. E-Low Risk; 6. F-Low Risk; 7. G-Low Risk; 8. H-Medium Risk; 9. I-Medium Risk; 10. J-High Risk; [ 10 others ] | 21683 (6.3%); 13696 (4.0%); 23870 (6.9%); 16472 (4.8%); 8393 (2.4%); 12176 (3.5%); 5795 (1.7%); 10142 (2.9%); 8260 (2.4%); 5526 (1.6%); 219533 (63.5%) | 345546 (100%) | 0 (0%) |
| 22 | PRI_NO_OF_ACCTS [integer] | | mean (sd): 2.37 (5.01); min < med < max: 0 < 0 < 453; IQR (CV): 3 (2.11) | 114 distinct values | 345546 (100%) | 0 (0%) |
| 23 | PRI_ACTIVE_ACCTS [integer] | | mean (sd): 1 (1.88); min < med < max: 0 < 0 < 144; IQR (CV): 1 (1.87) | 42 distinct values | 345546 (100%) | 0 (0%) |
| 24 | PRI_OVERDUE_ACCTS [integer] | | mean (sd): 0.16 (0.54); min < med < max: 0 < 0 < 25; IQR (CV): 0 (3.5) | 23 distinct values | 345546 (100%) | 0 (0%) |
| 25 | PRI_CURRENT_BALANCE [integer] | | mean (sd): 160270.2 (925345.66); min < med < max: -6678296 < 0 < 96524920; IQR (CV): 31364.5 (5.77) | 97465 distinct values | 345546 (100%) | 0 (0%) |
| 26 | PRI_SANCTIONED_AMOUNT [integer] | | mean (sd): 209650.86 (2043865.78); min < med < max: -481500 < 0 < 1e+09; IQR (CV): 59416.75 (9.75) | 60681 distinct values | 345546 (100%) | 0 (0%) |
| 27 | PRI_DISBURSED_AMOUNT [integer] | | mean (sd): 209560.79 (2047482.84); min < med < max: 0 < 0 < 1e+09; IQR (CV): 57645.75 (9.77) | 65673 distinct values | 345546 (100%) | 0 (0%) |
| 28 | SEC_NO_OF_ACCTS [integer] | | mean (sd): 0.05 (0.56); min < med < max: 0 < 0 < 57; IQR (CV): 0 (11.82) | 40 distinct values | 345546 (100%) | 0 (0%) |
| 29 | SEC_ACTIVE_ACCTS [integer] | | mean (sd): 0.02 (0.28); min < med < max: 0 < 0 < 36; IQR (CV): 0 (12.47) | 23 distinct values | 345546 (100%) | 0 (0%) |
| 30 | SEC_OVERDUE_ACCTS [integer] | | mean (sd): 0.01 (0.1); min < med < max: 0 < 0 < 8; IQR (CV): 0 (16.97) | 0: 343928 (99.5%); 1: 1358 (0.4%); 2: 166 (0.0%); 3: 54 (0.0%); 4: 22 (0.0%); 5: 8 (0.0%); 6: 6 (0.0%); 7: 2 (0.0%); 8: 2 (0.0%) | 345546 (100%) | 0 (0%) |
| 31 | SEC_CURRENT_BALANCE [integer] | | mean (sd): 4565.3 (161202.6); min < med < max: -574647 < 0 < 36032852; IQR (CV): 0 (35.31) | 3947 distinct values | 345546 (100%) | 0 (0%) |
| 32 | SEC_SANCTIONED_AMOUNT [integer] | | mean (sd): 6133.3 (189342.66); min < med < max: 0 < 0 < 57945000; IQR (CV): 0 (30.87) | 2631 distinct values | 345546 (100%) | 0 (0%) |
| 33 | SEC_DISBURSED_AMOUNT [integer] | | mean (sd): 6038.72 (188911.37); min < med < max: 0 < 0 < 57945000; IQR (CV): 0 (31.28) | 3031 distinct values | 345546 (100%) | 0 (0%) |
| 34 | PRIMARY_INSTAL_AMT [integer] | | mean (sd): 12497.73 (199754.5); min < med < max: 0 < 0 < 85262329; IQR (CV): 1946 (15.98) | 34330 distinct values | 345546 (100%) | 0 (0%) |
| 35 | SEC_INSTAL_AMT [integer] | | mean (sd): 272.74 (16261.26); min < med < max: 0 < 0 < 5390000; IQR (CV): 0 (59.62) | 2295 distinct values | 345546 (100%) | 0 (0%) |
| 36 | NEW_ACCTS_IN_LAST_SIX_MONTHS [integer] | | mean (sd): 0.36 (0.92); min < med < max: 0 < 0 < 35; IQR (CV): 0 (2.56) | 26 distinct values | 345546 (100%) | 0 (0%) |
| 37 | DELINQUENT_ACCTS_IN_LAST_SIX_M [integer] | | mean (sd): 0.1 (0.38); min < med < max: 0 < 0 < 20; IQR (CV): 0 (4.01) | 16 distinct values | 345546 (100%) | 0 (0%) |
| 38 | AVERAGE_ACCT_AGE [character] | | 1. 0yrs 0mon; 2. 0yrs 6mon; 3. 0yrs 7mon; 4. 0yrs 11mon; 5. 0yrs 10mon; 6. 1yrs 0mon; 7. 0yrs 9mon; 8. 0yrs 8mon; 9. 1yrs 1mon; 10. 0yrs 5mon; [ 190 others ] | 177481 (51.4%); 9325 (2.7%); 8167 (2.4%); 7665 (2.2%); 7587 (2.2%); 7447 (2.2%); 7353 (2.1%); 7224 (2.1%); 6680 (1.9%); 6458 (1.9%); 100159 (29.0%) | 345546 (100%) | 0 (0%) |
| 39 | CREDIT_HISTORY_LENGTH [character] | | 1. 0yrs 0mon; 2. 0yrs 6mon; 3. 2yrs 1mon; 4. 0yrs 7mon; 5. 2yrs 0mon; 6. 1yrs 0mon; 7. 1yrs 1mon; 8. 0yrs 11mon; 9. 0yrs 8mon; 10. 0yrs 9mon; [ 297 others ] | 177178 (51.3%); 7456 (2.2%); 6932 (2.0%); 6243 (1.8%); 5762 (1.7%); 5153 (1.5%); 4542 (1.3%); 3925 (1.1%); 3753 (1.1%); 3572 (1.0%); 121030 (35.0%) | 345546 (100%) | 0 (0%) |
| 40 | NO_OF_INQUIRIES [integer] | | mean (sd): 0.21 (0.72); min < med < max: 0 < 0 < 36; IQR (CV): 0 (3.37) | 26 distinct values | 345546 (100%) | 0 (0%) |
| 41 | GB_flag [integer] | | mean (sd): 0.22 (0.41); min < med < max: 0 < 0 < 1; IQR (CV): 0 (1.9) | 0: 182543 (78.3%); 1: 50611 (21.7%) | 233154 (67.47%) | 112392 (32.53%) |
| 42 | Age [labelled, numeric] | Age in Years | mean (sd): 34.81 (9.86); min < med < max: 18.09 < 33 < 69.29; IQR (CV): 15.28 (0.28) | 15888 distinct values | 345546 (100%) | 0 (0%) |
| 43 | YearsSinceDisbursment [labelled, numeric] | Years Since Disbursement | mean (sd): 0.22 (0.1); min < med < max: 0.08 < 0.2 < 0.42; IQR (CV): 0.16 (0.43) | 111 distinct values | 345546 (100%) | 0 (0%) |
| 44 | AverageLoanTenure [labelled, numeric] | Average loan tenure in Years (AVERAGE_ACCT_AGE) | mean (sd): 0.74 (1.26); min < med < max: 0 < 0 < 30.75; IQR (CV): 1.08 (1.69) | 200 distinct values | 345546 (100%) | 0 (0%) |
| 45 | TimeSinceFirstLoan [labelled, numeric] | Years since First Loan (CREDIT_HISTORY_LENGTH) | mean (sd): 1.33 (2.35); min < med < max: 0 < 0 < 39; IQR (CV): 1.92 (1.76) | 307 distinct values | 345546 (100%) | 0 (0%) |
| 46 | YearsOnLoan [labelled, numeric] | Years On Loan | mean (sd): 34.59 (9.86); min < med < max: 18 < 32.81 < 69.13; IQR (CV): 15.23 (0.28) | 16252 distinct values | 345546 (100%) | 0 (0%) |
| 47 | DisAsDiff [labelled, integer] | asset_cost minus disbursed_amount | mean (sd): 21378.46 (12308.93); min < med < max: 3917 < 17901 < 638420; IQR (CV): 13187 (0.58) | 48676 distinct values | 345546 (100%) | 0 (0%) |
| 48 | DisAsShare [labelled, numeric] | Ratio of asset_cost to disbursed_amount | mean (sd): 1.42 (0.31); min < med < max: 1.07 < 1.34 < 10.57; IQR (CV): 0.26 (0.22) | 299901 distinct values | 345546 (100%) | 0 (0%) |
| 49 | Qrt [labelled, factor] | Quarter of DisbursalDate | 1. 3; 2. 4 | 134790 (39.0%); 210756 (61.0%) | 345546 (100%) | 0 (0%) |
| 50 | Day [labelled, integer] | Day of the month of DisbursalDate | mean (sd): 19.3 (7.86); min < med < max: 1 < 20 < 31; IQR (CV): 13 (0.41) | 31 distinct values | 345546 (100%) | 0 (0%) |
| 51 | OutstandingNow [labelled, integer] | Sum of disbursed_amount and PRI_CURRENT_BALANCE | mean (sd): 215186.57 (925615.45); min < med < max: -6608979 < 60379.5 < 96583433; IQR (CV): 40649.75 (4.3) | 118719 distinct values | 345546 (100%) | 0 (0%) |
| 52 | DisbursedTotal [labelled, integer] | Sum of PRI_DISBURSED_AMOUNT and disbursed_amount | mean (sd): 264477.16 (2047611.56); min < med < max: 11613 < 62639.5 < 1000047773; IQR (CV): 63776.5 (7.74) | 121005 distinct values | 345546 (100%) | 0 (0%) |
| 53 | ShareOverdue [labelled, integer] | DELINQUENT_ACCTS_IN_LAST_SIX_M minus NEW_ACCTS_IN_LAST_SIX_MONTHS | mean (sd): -0.26 (0.93); min < med < max: -30 < 0 < 17; IQR (CV): 0 (-3.52) | 38 distinct values | 345546 (100%) | 0 (0%) |
| 54 | SEC_OverdueShareSec [labelled, numeric] | Ratio of SEC_OVERDUE_ACCTS to SEC_NO_OF_ACCTS | mean (sd): 0.16 (0.33); min < med < max: 0 < 0 < 1; IQR (CV): 0 (2.11) | 66 distinct values | 7108 (2.06%) | 338438 (97.94%) |
| 55 | PRI_OverdueShare [labelled, numeric] | Ratio of PRI_OVERDUE_ACCTS to PRI_NO_OF_ACCTS | mean (sd): 0.09 (0.23); min < med < max: 0 < 0 < 1; IQR (CV): 0 (2.49) | 324 distinct values | 170703 (49.4%) | 174843 (50.6%) |

Generated by summarytools 0.8.8 (R version 3.5.2)
2019-08-31

Create IV Table for Factor and Numeric (Scale) Variables

The function smbinning::smbinning() (Optimal Binning for Scoring Modeling) categorizes a numeric characteristic into bins for later use in scoring modeling. This process, also known as supervised discretization, uses recursive partitioning (conditional inference trees) to categorize the numeric characteristic.

WenSui Liu has developed two different algorithms for monotonic binning of numeric variables. While the first tends to generate bins with equal densities, the second defines finer bins based on isotonic regression.
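The isotonic-regression idea can be sketched with base R alone: fit a monotone step function of the target on the variable and take the points where the fit jumps as cut points. The helper iso_cuts and the simulated data are assumptions for illustration and are not WenSui Liu's code.

```r
# Monotonic cut points from isotonic regression; base R (stats) only.
# Simulated data; `iso_cuts` is a hypothetical helper, not WenSui Liu's code.
iso_cuts <- function(x, y) {
  ok <- !is.na(x) & !is.na(y)
  x <- x[ok]; y <- y[ok]
  sgn <- sign(cor(x, y, method = 'spearman'))   # direction of the trend
  fit <- isoreg(x, sgn * y)                     # non-decreasing step fit
  k <- unique(knots(as.stepfun(fit)))           # x values where the fit jumps
  k[k < max(x)]                                 # max(x) is not a usable cut point
}

set.seed(2019)
x <- runif(2000, 300, 900)                      # e.g. a bureau score
y <- rbinom(2000, 1, plogis(2 - x / 200))       # event rate falls as x grows
cuts <- iso_cuts(x, y)
bins <- cut(x, breaks = c(-Inf, cuts, Inf))
rate <- tapply(y, bins, mean)                   # event rate per bin
print(length(cuts))
print(round(rate, 3))
```

By construction the event rate per bin moves in one direction only, which is what makes the resulting bins monotonic; the number of bins is decided by the data rather than fixed in advance.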

A third approach generates bins that contain roughly equal numbers of bads. Once again, for the reporting layer, WenSui Liu leveraged the flexible smbinning::smbinning.custom() function with a small tweak.
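One plausible reading of the equal-sized-bads idea is to take the cut points from the quantiles of the variable among the bad cases only. The sketch below is a base R illustration under that assumption; bad_cuts, n_bins and the simulated data are hypothetical names and values, not part of the original analysis.

```r
# Cut points that put roughly the same number of "bads" into every bin.
# Base R only; `bad_cuts` and `n_bins` are hypothetical names for this sketch.
bad_cuts <- function(x, bad, n_bins = 5) {
  probs <- seq(0, 1, length.out = n_bins + 1)
  q <- quantile(x[bad == 1], probs = probs, na.rm = TRUE, names = FALSE)
  unique(q[-c(1, length(q))])     # interior quantiles of the bads only
}

set.seed(2019)
x   <- runif(5000, 300, 900)
bad <- rbinom(5000, 1, plogis(2 - x / 200))
cuts <- bad_cuts(x, bad, n_bins = 5)
bins <- cut(x, breaks = c(-Inf, cuts, Inf))
bads_per_bin <- tapply(bad, bins, sum)
print(bads_per_bin)
```

The resulting vector of cut points could then be passed to a custom binning function in the same way as the decile cut points.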

The levels of factor variables have to be combined into groups manually using the dedicated function smbinning::smbinning.factor.custom().
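All of these binning methods are compared through the Information Value (IV) named in this section's heading. A minimal base R sketch of the common definition follows, with WoE = ln(share of goods / share of bads) per bin and IV as the sum of (share of goods - share of bads) x WoE. The helper woe_iv and the toy counts are assumptions for illustration; smbinning reports further columns.

```r
# Weight of Evidence (WoE) and Information Value (IV) for a binned variable.
# Base R only; `woe_iv` is a hypothetical helper; `good` is 1 for good, 0 for bad.
woe_iv <- function(bin, good) {
  cnt_good <- tapply(good == 1, bin, sum)
  cnt_bad  <- tapply(good == 0, bin, sum)
  g_dis <- cnt_good / sum(cnt_good)   # share of all goods in each bin
  b_dis <- cnt_bad  / sum(cnt_bad)    # share of all bads in each bin
  woe   <- log(g_dis / b_dis)
  iv    <- (g_dis - b_dis) * woe
  list(table = data.frame(Bin = names(woe), CntGood = as.vector(cnt_good),
                          CntBad = as.vector(cnt_bad), WoE = as.vector(woe),
                          IV = as.vector(iv)),
       iv = sum(iv))
}

# Toy data: three bins of 100 rows with 60, 80 and 90 goods
bin  <- factor(rep(c('low', 'mid', 'high'), times = c(100, 100, 100)),
               levels = c('low', 'mid', 'high'))
good <- c(rep(c(1, 0), c(60, 40)), rep(c(1, 0), c(80, 20)), rep(c(1, 0), c(90, 10)))
res  <- woe_iv(bin, good)
print(res$table)
print(res$iv)
```

A bin without goods or without bads gives an infinite WoE, which is one reason sparse bins are usually merged before the IV is reported.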

## Loading required package: sqldf
## Loading required package: gsubfn
## Loading required package: proto
## Loading required package: RSQLite
## Loading required package: partykit
## Loading required package: grid
## Loading required package: libcoin
## Loading required package: mvtnorm
# install.packages("https://cran.r-project.org/src/contrib/woeBinning_0.1.6.tar.gz", repos = NULL, type = "source")
library('woeBinning')        # Supervised Weight of Evidence Binning of Numeric Variables and Factors
library('openxlsx')          # Read, Write and Edit Microsoft XLSX Files 

isobin <- function(data, y, x) { # Second variant: finer monotonic binning based on isotonic regression
# WenSui Liu is leading a team of quantitative analysts developing operational risk models for American Bank.
# https://statcompute.wordpress.com/2017/06/15/finer-monotonic-binning-based-on-isotonic-regression/
  d1 <- data[c(y, x)]
  d2 <- d1[!is.na(d1[x]), ]                                   # drop rows with missing x
  c <- cor(d2[, 2], d2[, 1], method = 'spearman', use = 'complete.obs')
  reg <- isoreg(d2[, 2], c / abs(c) * d2[, 1])                # fit in the direction of the correlation
  k <- knots(as.stepfun(reg))                                 # candidate cut points
  sm1 <- smbinning::smbinning.custom(d1, y, x, k)
  # Keep only the cut points of bins that contain both goods and bads
  c1 <- subset(sm1$ivtable, subset = CntGood * CntBad > 0, select = Cutpoint)
  c2 <- suppressWarnings(as.numeric(unlist(strsplit(c1$Cutpoint, ' '))))
  c3 <- c2[!is.na(c2)]
  return(smbinning::smbinning.custom(d1, y, x, c3[-length(c3)]))  # the last cut point is dropped
}

tree_bin <- function(data, y, x) {
# Tree-like binning with the `woeBinning` package by Thilo Eichenberger
  binning <- woeBinning::woe.tree.binning(df = data, target.var = y, pred.var = x,
                           min.perc.total = 0.05, min.perc.class = 0, stop.limit = 0.01,
                           abbrev.fact.levels = 200, event.class = 1)

  z <- c()   # Stays empty if the binning did not return a list (error case)
  if (inherits(binning, 'list')) {
    df <- binning[[2]]
    if (is.factor(data[, x])) {
      # Factor: one quoted, comma-separated string of levels per group
      for (variable in levels(df$Group.2)) {
        fac_vec <- as.character(df[df$Group.2 == variable, 'Group.1'])
        z <- c(z, paste0("'", fac_vec, "'", collapse = ', '))
      }
    } else {
      # Numeric: inner cut points only (drop the first and the last)
      z <- as.vector(df$cutpoints.final[c(-1, -nrow(df))])
    }
  }

  return( z )
}

# Exploratory Data Analysis (EDA)
if (Max_Vars >= ncol(DF0) )
  smbinning.eda(DF0)$eda

# (Disabled) Round doubles and convert them into integers for smbinning()
# Recode the target flag for the binning: GB_flag 1 becomes 0 and 0 becomes 1
DF0 %<>% # dplyr::mutate_if(is.double, round) %>%
  # dplyr::mutate_if(is.double, as.integer) %>% 
    dplyr::mutate( GB_flag = ifelse(GB_flag == 1, 0, 1) )   # Flag for Optimal Binning
# Convert integer features with fewer than 5 unique values into factors for smbinning()
DF0 %<>% dplyr::mutate_at( dplyr::select_if(., ~ is.integer(.) & unique(.) %>% length(.) < 5 ) %>%
  dplyr::select( -dplyr::one_of('GB_flag') ) %>% colnames,
    as.factor )

# Create MS Excel File for Output
wb <- openxlsx::createWorkbook()
openxlsx::addWorksheet(wb, sheetName = 'IV Table', 
                       gridLines = FALSE, tabColour = 'olivedrab')
openxlsx::addWorksheet(wb, sheetName = 'Scorecard', gridLines = FALSE, tabColour = 'brown')

NamesOfVariables <- DF0 %>%
  dplyr::select( -dplyr::one_of('GB_flag', dplyr::select_if(., ~ is.factor(.) & nlevels(.) == 1) %>%
                                  colnames, # Levels > 1
                         # At least 5 different values for Numeric variables
                dplyr::select_if(., ~ is.numeric(.) & unique(.) %>% length() < 5) %>% colnames)) %>%
    colnames

binning.df <- cbind(Variable = NamesOfVariables,
                    `IV-Finish` = rep(NA_real_, times = length(NamesOfVariables)),
                    data.frame(matrix(data = rep(NA_real_, length(NamesOfVariables) * 8),
                                      nrow = length(NamesOfVariables), ncol = 8))) %>%
                      setNames(c('Variable', 'IV', 'IV-RPart', 'N-RPart', 'IV-Decile', 'N-Decile',
                                 'IV-Iso', 'N-Iso', 'IV-Tree', 'N-Tree'))
binning.df$Method <- rep(x = '', times = length(NamesOfVariables))

TotalBinning.sql <- ''

for (i in 1:length(NamesOfVariables)) {
  val <- NamesOfVariables[i]
  writeLines(paste(i, '-', val))

  if (DF0[, val] %>% is.factor) {
    # Generate a binning table for all the categories of a given factor variable
    # (a factor variable with at least 2 different values; labels with commas are not allowed)
    
    result.smb <- switch(val,
      `education`    = smbinning.factor.custom(DF0, x = val, y = 'GB_flag',
                          c("'Высшее'",                                    # higher education
                            "'Начальное','Средне-специальное'",            # primary, secondary vocational
                            "'Среднее'")),                                 # secondary
      if (levels(DF0[, val]) %>% length() > Max_Levels) { # Multi levels `factor` variables
        chr_vec <- tree_bin(DF0, 'GB_flag', val)   # Combine levels of factor variable by Badrate into some bins
        smbinning.factor.custom(DF0, x = val, y = 'GB_flag', chr_vec)
      }
      else {
        smbinning.factor(DF0, x = val, y = 'GB_flag', maxcat = levels(DF0[, val]) %>% length() + 1)  
      }
    )

    if (class(result.smb) == 'list') {
      binning.df[i, 'IV']      <- result.smb$iv
      binning.df[i, 'IV-RPart'] <- result.smb$iv
      binning.df[i, 'N-RPart']  <- result.smb$ivtable %>% nrow - 1 # with Missing Values
      binning.df[i, 'Method']  <- ifelse(length(result.smb$groups) == 0, 'IV-By Levels', 'IV-By Groups')[1]
      
        # IV Table Supplement
      result.smb$ivtable <- result.smb$ivtable %>%
      dplyr::mutate(G_Dis = CntGood / table(DF0$GB_flag)[2],
           B_Dis = CntBad / table(DF0$GB_flag)[1],
           `G/B Index` = ifelse(G_Dis > B_Dis, G_Dis / B_Dis, B_Dis / G_Dis),
           `0=Good, 1=Bad` = ifelse(G_Dis > B_Dis, 0, 1),
           Bin = c(1:(nrow(result.smb$ivtable) - 1), NA),
           # Min = (c(NA, result.smb$cuts, NA, NA)),
           # Max = (c(result.smb$cuts, NA, NA, NA))
           Min = rep(NA, times = result.smb$ivtable %>% nrow),
           Max = rep(NA, times = result.smb$ivtable %>% nrow)
      )
    }

  } else 
    {                                                                  # Numeric Class
    # Optimal Binning for Scoring Modeling from package `smbinning`
    # This process, also known as supervised discretization, utilizes Recursive Partitioning
    # to categorize the numeric characteristic. The specific algorithm is Conditional Inference
    # Trees, which initially excludes missing values (NA) to compute the cutpoints and adds
    # them back later in the process for the calculation of the Information Value.
    result1.smb <- smbinning(DF0, 'GB_flag', val)
  
    if (class(result1.smb) == 'list') {
      binning.df[i, 'IV-RPart'] <- result1.smb$iv
      binning.df[i, 'N-RPart']  <- result1.smb$bands %>% length
    }
  
    # Custom Binning Based by cutpoints using percentiles (10% each)
    if (length(NamesOfVariables) <= Max_Vars || class(result1.smb) != 'list') {
      cbs1cuts <- as.vector(quantile(DF0[, val], probs=seq(0, 1, 0.1), na.rm=TRUE)) # Quantiles by 10%
      cbs1cuts <- cbs1cuts[2:(length(cbs1cuts) - 1)] # Remove first (min) and last (max) values
      result2.smb <- smbinning.custom(df=DF0, y = 'GB_flag', x = val, cuts = cbs1cuts)
      binning.df[i, 'IV-Decile'] <- result2.smb$iv
      binning.df[i, 'N-Decile']  <- result2.smb$bands %>% length
    } else {
      binning.df[i, 'IV-Decile'] <- 0
      binning.df[i, 'N-Decile']  <- ncol(DF0) + 1
    }
      
    if (length(NamesOfVariables) <= Max_Vars ) { # & !isMicrosoftRServer
      # Finer Monotonic Binning Based on Isotonic Regression - does not work with Microsoft R Server 9.3.0
      result3.smb <- isobin(DF0, 'GB_flag', val)
      binning.df[i, 'IV-Iso'] <- result3.smb$iv
      binning.df[i, 'N-Iso']  <- result3.smb$bands %>% length
    
      # Generates a supervised tree-like segmentation of numeric variables with respect to a binary target outcome
      # result4.smb <- tree_chimergebin(DF0, 'GB_flag', val)
      cbs1cuts <- tree_bin(DF0, 'GB_flag', val)   # Binning via Tree-Like Segmentation
      result4.smb <- smbinning.custom(df = DF0, x = val, y = 'GB_flag', cuts = cbs1cuts)
      if (class(result4.smb) == 'list') {  
        binning.df[i, 'IV-Tree'] <- result4.smb$iv
        binning.df[i, 'N-Tree']  <- result4.smb$bands %>% length
      } else {  # 'Not Meaningful (IV<0.1)' or 'Unique values < 5' case
        binning.df[i, 'IV-Tree'] <- 0
        binning.df[i, 'N-Tree']  <- ncol(DF0)
      }
    } else {
      binning.df[i, 'IV-Iso'] <- 0
      binning.df[i, 'N-Iso']  <- ncol(DF0)
      
      binning.df[i, 'IV-Tree'] <- 0
      binning.df[i, 'N-Tree']  <- ncol(DF0)
    }
    
    # Selection of the Optimal Binning Method
    if (if_else(is.na(binning.df[i, 'IV-RPart']) == TRUE, 0, binning.df[i, 'IV-RPart'] * 1.1) >
          binning.df[i, 'IV-Decile']) {
                        binning.df[i, 'Method'] <- 'IV-RPart'
    } else 
      {
      if ( ( (binning.df[i, 'IV-Iso'] > binning.df[i, 'IV-Decile']) & 
             (binning.df[i, 'N-Iso'] / 1.1 < binning.df[i, 'N-Decile']) ) |
           ( (binning.df[i, 'IV-Iso'] * 1.1 > binning.df[i, 'IV-Decile']) & 
             (binning.df[i, 'N-Iso'] * 2 < binning.df[i, 'N-Decile']) ) ) {
                        binning.df[i, 'Method'] <- 'IV-Iso'
      } else {
        if (binning.df[i, 'IV-Decile'] >= binning.df[i, 'IV-Iso']) { 
                        binning.df[i, 'Method'] <- 'IV-Decile'
              } else { 
                  binning.df[i, 'Method'] <- 'IV-Iso' } 
              }
      
      }  # End Else If

    type <- binning.df[i, 'Method']
    result.smb <- 
      switch(type,
             `IV-RPart` = result1.smb,
             `IV-Decile` = result2.smb,
             `IV-Iso` = result3.smb,
             `IV-Tree` = result4.smb
             )
    binning.df[i, 'IV'] <- result.smb$iv
  
    # IV Table Supplement
    result.smb$ivtable <- result.smb$ivtable %>%
      mutate(G_Dis = CntGood / table(DF0$GB_flag)[2],
             B_Dis = CntBad / table(DF0$GB_flag)[1],
             `G/B Index` = if_else(G_Dis > B_Dis, G_Dis / B_Dis, B_Dis / G_Dis),
             `0=Good, 1=Bad` = if_else(G_Dis > B_Dis, 0, 1),
             Bin = c(1:(nrow(result.smb$ivtable) - 1), NA),
             Min = c(NA, result.smb$cuts, NA, NA),
             Max = c(result.smb$cuts, NA, NA, NA)
      )

  } #  End else for numeric class 
  
  # Prepare MySQL-code for Binning and Fine Classing
  # Sys.setloc <- Sys.setlocale(locale = 'Russian') # set locale to `Russian`
  binning.sql <- capture.output(smbinning.sql(result.smb)) %>% 
    gsub("then '0", "then '", .) %>% 
      gsub('TableName', 'DF', .) %>%
        stringr::str_replace('NewCharName', paste0(val, '_fct')) 
  
  val_fct <- paste0('  \'', val, '_fct', '\'', '  from data.frame \'DF0\'')
  binning.sql <- c('select *,', paste0('      /*   Inserting the new factor variable', val_fct, '  */'),
                    binning.sql[-c(1:3)] , paste0('  \'', val, '_fct', '\'', ' from \'DF1\''))

  # Truncation of the Name of the Gradation in a Complex Set of levels
  theBest <- TRUE
  for (j in 1:(nrow(result.smb[['ivtable']]) - 2)) {
    if (str_length(binning.sql[j + 3]) > 999) {
      gradation_str <- str_split( string = binning.sql[j + 3], pattern = sprintf('%s: %s ', j, val) ) %>% 
        unlist
      binning.sql[j + 3] <- 
        paste0(gradation_str[1], ifelse( theBest, sprintf('%s: %s the Best\'', j, val), 
                                          sprintf('%s: %s the Worst\'', j, val) ) )
      theBest <- FALSE  # assignment (the original `<=` was a comparison and never updated the flag)
    } else {
      # print('empty')
    }
  }
  
  # Appending binning.sql into TotalBinning.sql
  if (i == 1) {
    binning.sql[length(binning.sql)] = paste0('  \'', val, '_fct\',')
    TotalBinning.sql <- binning.sql
  } else {
    if (i == length(NamesOfVariables)) {
      TotalBinning.sql <- c(TotalBinning.sql, '', binning.sql[-1] )
    } else {
      TotalBinning.sql <- c(TotalBinning.sql, '', binning.sql[-c(1, length(binning.sql))], 
                            paste0('  \'', val, '_fct\','))
    }
  } # End if (i == 1)
  # TotalBinning.sql <- c(TotalBinning.sql, '', binning.sql)
  
  # Preparing a Data.Frame with IV Table for Export into MS Excel
  addWorksheet(wb, val, gridLines = FALSE, tabColour = ifelse(result.smb$iv >= 0.05, 'chartreuse',
                                                              ifelse(result.smb$iv >= 0.03, 'khaki', 'white')))
  result.smb$ivtable[ is.na( result.smb$ivtable ) ] <- NA  # Dealing with NaN's in data frames
  N <- 3:(nrow(result.smb$ivtable) + 2)
  
  result.smb$ivtable %>% 
    dplyr::select(Cutpoint, Bin, Min, Max, CntRec, CntGood, CntBad,  G_Dis, B_Dis, Share = PctRec, 
                  BadRate, WoE, IV,`G/B Index`, `0=Good, 1=Bad`) %>% 
      writeDataTable(wb, sheet = val, x = ., tableStyle = 'TableStyleMedium2', startCol = 'A',
                     startRow = 2, tableName = val, firstColumn = TRUE, lastColumn = FALSE, bandedRows = TRUE)
  # Set Columns widths
  setColWidths(wb, sheet = val, cols = 1:4, widths = c(32, 7, 10, 10))
  
  # # Set Row heights
  # setRowHeights(wb, sheet = 1, rows = 1, heights = 45)
  
  # Set Styles & Conditional Formattings in Columns
  addStyle(wb, sheet = val, style = createStyle(wrapText = TRUE, halign = 'center', valign = 'center'),
                                              cols = 1:ncol(result.smb$ivtable), rows = 2)
  addStyle(wb, sheet = val, cols = 1, rows = 1, style = createStyle(fontSize = 16, textDecoration = 'bold'))
  addStyle(wb, sheet = val, cols = 1:ncol(result.smb$ivtable), rows = (nrow(result.smb$ivtable) + 2), 
           style = createStyle(textDecoration = 'bold'))
  addStyle(wb, sheet = val, cols = 5:7, rows = N, style = createStyle(numFmt = 'COMMA'), gridExpand = TRUE)
  addStyle(wb, sheet = val, cols = 8:10, rows = N, style = createStyle(numFmt = '0%'), gridExpand = TRUE)
  addStyle(wb, sheet = val, cols = 11, rows = N, style = createStyle(numFmt = paste0('0', options()$OutDec, '0%'), textDecoration = 'bold'))
  conditionalFormatting(wb, sheet = val, cols = 13, rows = 3:(nrow(result.smb$ivtable) + 1), type = 'databar',
                        border = FALSE, style = c('red', 'chartreuse'))
  addStyle(wb, sheet = val, cols = 4, rows = N, style = createStyle(border = 'right', borderColour = '#4F81BC'))
  addStyle(wb, sheet = val, cols = 7, rows = N, style = createStyle(numFmt = 'COMMA', border = 'right',
                                                                    borderColour = '#4F81BC'))
  addStyle(wb, sheet = val, cols = 10, rows = N, style = createStyle(numFmt = '0%', border = 'right',
                                                                     borderColour = '#4F81BC'))

  conditionalFormatting(wb, sheet = val, cols = 15, rows = 3:(nrow(result.smb$ivtable) + 1), rule ='$O3=0',
      style = createStyle(fontColour = 'red', halign = 'center', valign = 'center', textDecoration = 'bold'))
  conditionalFormatting(wb, sheet = val, cols = 15, rows = 3:(nrow(result.smb$ivtable) + 1), rule ='$O3>0',
      style = createStyle(fontColour = 'black', halign = 'center', valign = 'center', textDecoration = 'bold'))

  writeData(wb, sheet = val, val, startCol = 'A', startRow = 1)
  writeData(wb, sheet = val, data.frame(binning.sql), colNames = FALSE, rowNames = FALSE,
            startCol = 'A', startRow = nrow(result.smb$ivtable) + 4)

  writeFormula(wb, sheet = val, startCol = 'A', 
               startRow = nrow(result.smb$ivtable) + 5 + length(binning.sql), 
               x = makeHyperlinkString(sheet = 'IV Table', row = i + 2, col = 1,
                                       text = 'Link to IV Table'))
}     # End next i
## 1 - disbursed_amount
## 2 - asset_cost
## 3 - ltv
## 4 - branch_id
## 5 - supplier_id
## 6 - manufacturer_id
## 7 - Current_pincode_ID
## 8 - Employment_Type
## 9 - State_ID
## 10 - Employee_code_ID
## 11 - Aadhar_flag
## 12 - PAN_flag
## 13 - VoterID_flag
## 14 - Driving_flag
## 15 - Passport_flag
## 16 - PERFORM_CNS_SCORE
## 17 - PERFORM_CNS_SCORE_DESCRIPTION
## 18 - PRI_NO_OF_ACCTS
## 19 - PRI_ACTIVE_ACCTS
## 20 - PRI_OVERDUE_ACCTS
## 21 - PRI_SANCTIONED_AMOUNT
## 22 - PRI_DISBURSED_AMOUNT
## 23 - SEC_NO_OF_ACCTS
## 24 - SEC_ACTIVE_ACCTS
## 25 - SEC_OVERDUE_ACCTS
## 26 - SEC_CURRENT_BALANCE
## 27 - SEC_SANCTIONED_AMOUNT
## 28 - SEC_DISBURSED_AMOUNT
## 29 - PRIMARY_INSTAL_AMT
## 30 - SEC_INSTAL_AMT
## 31 - NEW_ACCTS_IN_LAST_SIX_MONTHS
## 32 - DELINQUENT_ACCTS_IN_LAST_SIX_M
## 33 - NO_OF_INQUIRIES
## 34 - Age
## 35 - YearsSinceDisbursment
## 36 - AverageLoanTenure
## 37 - TimeSinceFirstLoan
## 38 - YearsOnLoan
## 39 - DisAsDiff
## 40 - DisAsShare
## 41 - Qrt
## 42 - Day
## 43 - OutstandingNow
## 44 - DisbursedTotal
## 45 - ShareOverdue
## 46 - SEC_OverdueShareSec
## 47 - PRI_OverdueShare
# Flag Recovery after Optimal Binning
DF0 %<>% dplyr::mutate( GB_flag = ifelse(GB_flag == 1, 0, 1) )

# Write MySQL code for Coarse Classing Selected Variables
write_lines(x = TotalBinning.sql, path = NameMySQLCode, na = "NA", append = FALSE)

N <- 3:(nrow(binning.df) + 2)
# writeDataTable(wb, sheet = 'IV Table', x = binning.df, tableStyle = 'TableStyleMedium4', startCol = 'A',
#               startRow = 2, tableName = 'IVTable', firstColumn = FALSE, lastColumn = TRUE, bandedRows = TRUE)

# Set Columns widths
setColWidths(wb, sheet = 'IV Table', cols = 1:2, widths = c(15, 12))
setColWidths(wb, sheet = 'IV Table', cols = 11, widths = c(12))

conditionalFormatting(wb, sheet = 'IV Table', cols = 2, rows = N, type = 'databar',
                      border = FALSE, style = c('red', 'royalblue'))
addStyle(wb, sheet = 'IV Table', cols = 2, rows = N, 
         style = createStyle(border = 'right', borderColour = '#9CB95C'))
addStyle(wb, sheet = 'IV Table', cols = 10, rows = N, 
         style = createStyle(border = 'right', borderColour = '#9CB95C'))

writeData(wb, sheet = 'IV Table', 'IV Table', startCol = 'A', startRow = 1)
addStyle(wb, sheet = 'IV Table', cols = 1, rows = 1, 
         style = createStyle(fontSize = 16, textDecoration = 'bold'))

for (i in 1:nrow(binning.df)) {
  ## Internal - Text to display
  val = binning.df[i, 'Variable']
  writeFormula(wb, sheet = 'IV Table', startCol = 'A', startRow = i + 2, 
    x = makeHyperlinkString(sheet = val, row = 1, col = 1, text = val))
}

# # Open MS Excel
# openXL(wb)

 remove(NamesOfVariables, result1.smb, result2.smb, result3.smb, result4.smb, result.smb, chr_vec, # binning.df,
        j, TotalBinning.sql, theBest, binning.sql, val, val_fct, i, type, cbs1cuts, N)

Coarse Classing Factor and Numeric Variables

  1. Fine Classing

Create 10 or 20 bins (groups) for a continuous independent variable, then calculate the WOE and IV of the variable.

  2. Coarse Classing

Combine adjacent categories with similar WOE scores.

Rules related to WOE

  • Each category (bin) should have at least 5% of the observations.

  • Each category (bin) should be non-zero for both non-events and events.

  • The WOE should be distinct for each category. Similar groups should be aggregated.

  • The WOE should be monotonic, i.e. either increasing or decreasing across the groupings.

Missing values are binned separately.
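
The fine and coarse classing steps above can be sketched in base R. This is a minimal illustration on simulated data, not the binning used in this report: the variables `x` and `bad`, the helper `woe_table()`, and the fixed pairwise merge of adjacent deciles are all assumptions made for the example. WOE is computed as ln(share of goods / share of bads), which is one common convention.

```r
# Hypothetical toy data: a numeric predictor `x` and a default flag `bad` (1 = bad).
set.seed(42)
x   <- c(rnorm(500, mean = 60, sd = 10), rnorm(500, mean = 75, sd = 10))
bad <- c(rbinom(500, 1, 0.10), rbinom(500, 1, 0.30))

# WOE and IV per bin; assumes every bin contains both goods and bads.
woe_table <- function(bin, bad) {
  cnt_good <- as.vector(tapply(bad == 0, bin, sum))
  cnt_bad  <- as.vector(tapply(bad == 1, bin, sum))
  g_dis <- cnt_good / sum(cnt_good)   # share of all goods falling in the bin
  b_dis <- cnt_bad / sum(cnt_bad)     # share of all bads falling in the bin
  woe   <- log(g_dis / b_dis)
  data.frame(Bin = levels(bin), CntGood = cnt_good, CntBad = cnt_bad,
             Share = (cnt_good + cnt_bad) / length(bad),
             WoE = woe, IV = (g_dis - b_dis) * woe)
}

# Fine classing: 10 quantile bins (deciles).
breaks   <- unique(quantile(x, probs = seq(0, 1, by = 0.1)))
fine_bin <- cut(x, breaks = breaks, include.lowest = TRUE)
fine     <- woe_table(fine_bin, bad)

# Coarse classing (simplified): merge each pair of adjacent deciles.
coarse_bin <- factor(ceiling(as.integer(fine_bin) / 2))
coarse     <- woe_table(coarse_bin, bad)

iv_fine   <- sum(fine$IV)
iv_coarse <- sum(coarse$IV)
print(fine)
print(c(iv_fine = iv_fine, iv_coarse = iv_coarse))
```

Merging bins can only keep or lower the total IV, so the coarse table trades a little information for more stable bins. A real coarse classing would choose which neighbours to merge by comparing their WOE values.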

Split Train and Testing Sets

It is a good idea to use a validation hold-out set. This is a sample of the data that we hold back from our analysis and modeling. We use it right at the end of the project to confirm the accuracy of the final model. It is a smoke test that shows whether we messed up, and it gives us confidence in our estimates of accuracy on unseen data.
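
A hold-out split that preserves the Good/Bad ratio can be sketched in base R as below. The data frame `df`, the column name `GB_flag` as used here, and the 90% training fraction are assumptions for illustration; the report's own splitting code is not shown in this section.

```r
set.seed(123)

# Hypothetical data frame with a binary outcome.
df <- data.frame(
  id      = 1:10000,
  GB_flag = factor(sample(c("Bad", "Good"), 10000, replace = TRUE,
                          prob = c(0.22, 0.78)))
)

train_frac <- 0.9   # assumed fraction, for illustration only

# Stratified sampling: draw the same fraction of row indices within each class.
idx_by_class <- split(seq_len(nrow(df)), df$GB_flag)
in_train <- unlist(lapply(idx_by_class, function(idx) {
  sample(idx, size = round(train_frac * length(idx)))
}), use.names = FALSE)

train <- df[in_train, ]
test  <- df[-in_train, ]

# Class shares should be nearly identical in both sets.
print(prop.table(table(train$GB_flag)))
print(prop.table(table(test$GB_flag)))
```

Sampling within each class keeps the bad rate of the hold-out set close to that of the training set, which matters when the minority class is the event being predicted.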

## 
##    Bad   Good 
##  45557 164281

Exploratory data analysis (EDA)

Population Proportion – Sample Size

\[ \displaystyle \large X = \frac{Z_{\alpha/2}^2 \times p \times (1 - p)}{\beta^2} \hspace{.5 in} [1]\]

where \(Z_{\alpha/2}\) is the critical value of the standard Normal distribution for significance level \(\alpha\) (e.g. for a confidence level of 98%, \(\alpha\), the Type I error or false-positive rate, is 0.02, so \(\alpha/2\) = 0.01 and the critical value \(Z_{\alpha/2}\) is 2.326),

  • \(\beta\) is the margin of error (0.05 in this example), i.e. the half-width of the confidence interval for the proportion,

  • p is the sample proportion,

and X is the required sample size.
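
As a worked example of formula [1], the sketch below evaluates it in base R. The helper name `sample_size_proportion()` is hypothetical. The inputs are the 98% confidence level and 0.05 margin quoted above, with two choices of p: the conservative p = 0.5, which is an assumption, and p = 0.2171, the training bad rate reported further down.

```r
# Sample size for estimating a proportion, formula [1].
sample_size_proportion <- function(p, conf_level, margin) {
  alpha <- 1 - conf_level
  z <- qnorm(1 - alpha / 2)            # two-sided critical value Z_{alpha/2}
  ceiling(z^2 * p * (1 - p) / margin^2)
}

z_98 <- qnorm(1 - 0.02 / 2)            # critical value for 98% confidence
n_worst_case <- sample_size_proportion(p = 0.5,    conf_level = 0.98, margin = 0.05)
n_bad_rate   <- sample_size_proportion(p = 0.2171, conf_level = 0.98, margin = 0.05)

print(round(z_98, 3))                  # 2.326
print(c(n_worst_case = n_worst_case, n_bad_rate = n_bad_rate))  # 542 and 368
```

The product p(1 - p) peaks at p = 0.5, so that choice gives the largest sample size the formula can ask for at a given confidence level and margin.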

## Sampling into two unequal sample sizes with Fraction (60.73%) of Training & Test Sets.
## Full Dataset has Real Sample Size is equal 345546 observations.
##   Therefore the Training Dateset might be 240000 obs. and the Test Dateset - 105546 obs.
##         Samples
## Subjects inTrain inTest
##     Bad    45557   5054
##     Good  164281  18262
## 
##  Fisher's Exact Test for Count Data
## 
## data:  .
## p-value = 0.9067
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.9695789 1.0356963
## sample estimates:
## odds ratio 
##    1.00203
## 
##      difference of proportion power calculation for binomial distribution (arcsine transformation) 
## 
##               h = 0.02259551
##              n1 = 209838
##              n2 = 22818.26
##       sig.level = 0.05
##           power = 0.9
##     alternative = two.sided
## 
## NOTE: different sample sizes

## With Treatment Probability of Success (Bad Rate on Training dataset = 21.71%) and Bias = 0.94%, Significance level (1 - alpha = 95.0%) & Power (1 - beta = 90.0%) Minimal Sample Size of the Binomial distribution should be 22819 obs., but Test set is 23316 obs.
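
The checks printed above can be approximated in base R from the contingency table alone. The sketch copies the four counts from that table and reruns Fisher's exact test. It then computes Cohen's effect size h (the arcsine transformation named in the power calculation) under the assumption that the 0.94% bias is the difference in bad rate to be detected. That assumption gives a value close to, but not identical with, the reported h.

```r
# Counts copied from the Subjects x Samples table above.
split_counts <- matrix(c(45557, 164281, 5054, 18262), nrow = 2,
                       dimnames = list(Subjects = c("Bad", "Good"),
                                       Samples  = c("inTrain", "inTest")))

# Does the bad rate differ between the training and test sets?
ft <- fisher.test(split_counts)
bad_rate <- split_counts["Bad", ] / colSums(split_counts)
print(round(bad_rate, 4))
print(ft$estimate)   # odds ratio, close to 1 when the split is balanced
print(ft$p.value)

# Cohen's h: difference of arcsine-transformed proportions.
cohens_h <- function(p1, p2) 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))

# Assumption: the 0.94% bias is the shift in bad rate that should be detectable.
h <- cohens_h(0.2171 + 0.0094, 0.2171)
print(round(h, 4))
```

An odds ratio near 1 with a large p-value indicates that the stratified split left the bad rate essentially unchanged between the two sets.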

Univariate Data Visualizations: Information Values & Distributions

Let’s look at visualizations of individual attributes. It is often useful to look at your data using multiple different visualizations in order to spark ideas. Let’s look at histograms of each attribute to get a sense of the data distributions.

## There are no any Numeric feature in `X` data.frame.
## There are no any Factor or Character feature in `X` data.frame.

Important Points

  1. Information value tends to increase as the number of bins (groups) for an independent variable increases. Be careful when there are more than 20 bins, as some bins may contain very few events and non-events.

  2. Information value should not be used as a feature selection method when you are building a classification model other than binary logistic regression (for example, random forest or SVM), as it is designed for binary logistic regression models only.

Multivariate Data Visualizations - The Correlation Matrix

Let’s look at some visualizations of the interactions between outcome and predictors. The best place to start is a scatter plot matrix.

##   Var1                                                Var2 Spearman
## 1    Y     Employee_code_ID_fct.2..Employee_code_ID....153   0.0743
## 2    Y                  DisAsDiff_fct.5..DisAsDiff...19822   0.0679
## 3    Y Current_pincode_ID_fct.3..Current_pincode_ID....174   0.0673
## 4    Y               DisAsShare_fct.7..DisAsShare...1.7649   0.0655
## 5    Y     Employee_code_ID_fct.3..Employee_code_ID....170   0.0645
## 6    Y   PERFORM_CNS_SCORE_fct.6..PERFORM_CNS_SCORE....824   0.0644
## 7    Y               supplier_id_fct.3..supplier_id....165   0.0573
##   Var1                                                 Var2 Spearman
## 1    Y  Current_pincode_ID_fct.12..Current_pincode_ID...291  -0.1252
## 2    Y      Employee_code_ID_fct.15..Employee_code_ID...319  -0.1147
## 3    Y                supplier_id_fct.13..supplier_id...314  -0.1057
## 4    Y       OutstandingNow_fct.3..OutstandingNow....171384  -0.0907
## 5    Y Current_pincode_ID_fct.11..Current_pincode_ID....291  -0.0839
## 6    Y     Employee_code_ID_fct.14..Employee_code_ID....319  -0.0823
## 7    Y               supplier_id_fct.12..supplier_id....314  -0.0809

This helps reveal the skew in many distributions, which is so strong that much of the data looks like outliers (e.g. points beyond the whiskers of the plots).
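
Tables like the Spearman listings above, with one row per factor level, can be produced by expanding each binned factor into indicator columns and correlating every column with the outcome. The sketch below shows the idea in base R on simulated data. The variable `ltv_fct`, its three levels and the bad rates are hypothetical, and this is not necessarily how the report computed its table.

```r
set.seed(1)
n <- 2000

# Hypothetical binned predictor with three levels and a binary outcome Y.
ltv_fct <- factor(sample(c("low", "mid", "high"), n, replace = TRUE),
                  levels = c("low", "mid", "high"))
p_bad <- c(low = 0.10, mid = 0.20, high = 0.35)[as.character(ltv_fct)]
Y <- rbinom(n, 1, p_bad)

# One indicator column per level (no intercept, so no level is dropped).
dummies <- model.matrix(~ ltv_fct - 1)

# Spearman correlation of each indicator with the outcome.
rho <- apply(dummies, 2, function(col) cor(col, Y, method = "spearman"))
res <- data.frame(Var1 = "Y", Var2 = names(rho),
                  Spearman = round(unname(rho), 4))
print(res[order(-res$Spearman), ])
```

Levels with a positive coefficient are over-represented among Y = 1 and levels with a negative coefficient are under-represented. This is how the two listings (most positive and most negative) can be read.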

Select Features

Feature selection removes highly correlated attributes, which would otherwise reduce the quality of classification. Transformations (Box-Cox or Yeo-Johnson) cannot be applied here because the predictors are factors.
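
The correlation filter can be illustrated with a simplified base R stand-in. The function `drop_correlated()` below is hypothetical and is not caret's implementation. It repeatedly finds the most correlated pair above a cutoff and drops the member with the larger mean absolute correlation, which mirrors the "Compare row ... so flagging column ..." logic in the log that follows.

```r
# Greedy filter: returns the names of the variables to drop.
# Expects a square correlation matrix with row and column names.
drop_correlated <- function(cor_mat, cutoff = 0.8) {
  stopifnot(is.matrix(cor_mat), nrow(cor_mat) == ncol(cor_mat))
  abs_cor <- abs(cor_mat)
  diag(abs_cor) <- 0
  dropped <- character(0)
  while (max(abs_cor) > cutoff) {
    # Locate the most strongly correlated remaining pair.
    pair <- which(abs_cor == max(abs_cor), arr.ind = TRUE)[1, ]
    # Flag the member of the pair with the larger mean absolute correlation.
    mean_abs <- rowMeans(abs_cor[pair, , drop = FALSE])
    loser <- rownames(abs_cor)[pair[which.max(mean_abs)]]
    dropped <- c(dropped, loser)
    keep <- setdiff(rownames(abs_cor), loser)
    abs_cor <- abs_cor[keep, keep, drop = FALSE]
  }
  dropped
}

# Hypothetical data: `b` is almost a copy of `a`; `c` and `d` are independent.
set.seed(7)
a <- rnorm(200)
X <- data.frame(a = a, b = a + rnorm(200, sd = 0.1), c = rnorm(200), d = rnorm(200))

dropped <- drop_correlated(cor(X, method = "spearman"), cutoff = 0.8)
kept <- setdiff(names(X), dropped)
print(dropped)
print(kept)
```

The cutoff of 0.8 matches the "All correlations <= 0.8" line in the log. The choice of Spearman correlation in the example is an assumption.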

## [1] freqRatio     percentUnique zeroVar       nzv          
## <0 rows> (or 0-length row.names)
## [1] "Dropping 0 zero variance predictors from 47 (fraction=  0.000000)"
## integer(0)
## Compare row 18  and column  19 with corr  0.869 
##   Means:  0.239 vs 0.104 so flagging column 18 
## Compare row 37  and column  36 with corr  0.843 
##   Means:  0.211 vs 0.098 so flagging column 37 
## Compare row 44  and column  43 with corr  0.871 
##   Means:  0.206 vs 0.094 so flagging column 44 
## Compare row 36  and column  47 with corr  0.846 
##   Means:  0.186 vs 0.088 so flagging column 36 
## Compare row 47  and column  16 with corr  0.883 
##   Means:  0.162 vs 0.084 so flagging column 47 
## Compare row 21  and column  22 with corr  0.99 
##   Means:  0.132 vs 0.081 so flagging column 21 
## Compare row 31  and column  45 with corr  0.949 
##   Means:  0.134 vs 0.079 so flagging column 31 
## Compare row 23  and column  24 with corr  0.803 
##   Means:  0.143 vs 0.075 so flagging column 23 
## Compare row 24  and column  27 with corr  0.989 
##   Means:  0.129 vs 0.072 so flagging column 24 
## Compare row 27  and column  28 with corr  0.995 
##   Means:  0.106 vs 0.069 so flagging column 27 
## Compare row 28  and column  26 with corr  0.933 
##   Means:  0.081 vs 0.068 so flagging column 28 
## Compare row 40  and column  39 with corr  0.812 
##   Means:  0.102 vs 0.067 so flagging column 40 
## Compare row 34  and column  38 with corr  0.996 
##   Means:  0.082 vs 0.065 so flagging column 34 
## Compare row 11  and column  13 with corr  0.869 
##   Means:  0.093 vs 0.064 so flagging column 11 
## Compare row 35  and column  41 with corr  0.824 
##   Means:  0.064 vs 0.063 so flagging column 35 
## All correlations <= 0.8
## [1] "Dropping 15 predictors due to high correlation to others (multicollinearity) 47 (fraction=  0.319149)"
##  [1] "PRI_NO_OF_ACCTS_fct"              "TimeSinceFirstLoan_fct"          
##  [3] "DisbursedTotal_fct"               "AverageLoanTenure_fct"           
##  [5] "PRI_OverdueShare_fct"             "PRI_SANCTIONED_AMOUNT_fct"       
##  [7] "NEW_ACCTS_IN_LAST_SIX_MONTHS_fct" "SEC_NO_OF_ACCTS_fct"             
##  [9] "SEC_ACTIVE_ACCTS_fct"             "SEC_SANCTIONED_AMOUNT_fct"       
## [11] "SEC_DISBURSED_AMOUNT_fct"         "DisAsShare_fct"                  
## [13] "Age_fct"                          "Aadhar_flag_fct"                 
## [15] "YearsSinceDisbursment_fct"
## [1] "Remaining 32 Variables"
##  [1] "disbursed_amount_fct"              
##  [2] "asset_cost_fct"                    
##  [3] "ltv_fct"                           
##  [4] "branch_id_fct"                     
##  [5] "supplier_id_fct"                   
##  [6] "manufacturer_id_fct"               
##  [7] "Current_pincode_ID_fct"            
##  [8] "Employment_Type_fct"               
##  [9] "State_ID_fct"                      
## [10] "Employee_code_ID_fct"              
## [11] "PAN_flag_fct"                      
## [12] "VoterID_flag_fct"                  
## [13] "Driving_flag_fct"                  
## [14] "Passport_flag_fct"                 
## [15] "PERFORM_CNS_SCORE_fct"             
## [16] "PERFORM_CNS_SCORE_DESCRIPTION_fct" 
## [17] "PRI_ACTIVE_ACCTS_fct"              
## [18] "PRI_OVERDUE_ACCTS_fct"             
## [19] "PRI_DISBURSED_AMOUNT_fct"          
## [20] "SEC_OVERDUE_ACCTS_fct"             
## [21] "SEC_CURRENT_BALANCE_fct"           
## [22] "PRIMARY_INSTAL_AMT_fct"            
## [23] "SEC_INSTAL_AMT_fct"                
## [24] "DELINQUENT_ACCTS_IN_LAST_SIX_M_fct"
## [25] "NO_OF_INQUIRIES_fct"               
## [26] "YearsOnLoan_fct"                   
## [27] "DisAsDiff_fct"                     
## [28] "Qrt_fct"                           
## [29] "Day_fct"                           
## [30] "OutstandingNow_fct"                
## [31] "ShareOverdue_fct"                  
## [32] "SEC_OverdueShareSec_fct"

Entropy-Based Feature Selection

Several methods for calculating feature importance are built into FSelectorRcpp’s information_gain() function. I recommend the fast but effective FSelectorRcpp_information.gain method, which is written in C++ and comes from the FSelectorRcpp package.
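
To show what an entropy-based score measures, the sketch below computes information gain, H(Y) - H(Y | X), in base R for two simulated factors. It does not call FSelectorRcpp. The helpers `entropy()` and `info_gain()` and the variables `grade`, `noise` and `bad` are hypothetical names introduced for the example.

```r
# Shannon entropy of a discrete vector (natural log, so the unit is nats).
entropy <- function(x) {
  p <- prop.table(table(x))
  p <- p[p > 0]
  -sum(p * log(p))
}

# Information gain of predictor x about outcome y: H(Y) - H(Y | X).
info_gain <- function(x, y) {
  w <- prop.table(table(x))
  w <- w[w > 0]                                   # ignore unused levels
  h_cond <- sum(sapply(names(w), function(lv) w[[lv]] * entropy(y[x == lv])))
  entropy(y) - h_cond
}

set.seed(11)
n <- 5000
grade <- factor(sample(c("A", "B", "C"), n, replace = TRUE))  # informative
noise <- factor(sample(c("u", "v", "w"), n, replace = TRUE))  # uninformative
bad <- rbinom(n, 1, c(A = 0.05, B = 0.20, C = 0.45)[as.character(grade)])

gains <- c(grade = info_gain(grade, bad), noise = info_gain(noise, bad))
print(round(gains, 4))
```

A predictor whose levels separate good and bad loans lowers the conditional entropy and therefore scores higher, while an unrelated predictor scores close to zero.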

Simple Logit-Model

The Logit in logistic regression is a special case of a link function in a Generalized Linear Model (GLM): it is the canonical link function for the Bernoulli distribution.

The logistic model is usually represented as:

\[ \displaystyle \large \pi(Y)=\frac{\exp(\beta_0+\beta_1X)}{1+\exp(\beta_0+\beta_1X)} \hspace{.5 in} [2]\] or, equivalently, as a linear model for the log-odds:

\[ \displaystyle \large \ln\left(\frac{\pi(Y)}{1-\pi(Y)}\right)=\beta_0+\beta_1X \hspace{.5 in} [3]\]

Inverting the logit gives the probability of the event in vector notation (the logistic function):

\[ \displaystyle \large \Pr(Y=1 \mid X) = [1 + e^{-X'\beta}]^{-1} \hspace{.5 in} [4]\]
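
The relationship between equations [2] to [4] can be checked numerically with base R's glm(), used here as a stand-in for the fitting routine whose log appears below. The simulated data and the coefficients `beta0` and `beta1` are assumptions made for the example.

```r
set.seed(2024)
n <- 5000
x <- rnorm(n)
beta0 <- -1.3   # hypothetical true intercept
beta1 <- 0.8    # hypothetical true slope

# plogis() is the logistic function of Eq. [4]: 1 / (1 + exp(-eta)).
y <- rbinom(n, 1, plogis(beta0 + beta1 * x))

fit <- glm(y ~ x, family = binomial(link = "logit"))
print(coef(fit))

eta  <- predict(fit, type = "link")       # linear predictor, right side of Eq. [3]
prob <- predict(fit, type = "response")   # fitted probability, Eq. [2]

# Eq. [2] and Eq. [4] are the same function of the linear predictor,
# and the logit (qlogis) maps the probability back to it, Eq. [3].
print(all.equal(unname(prob), unname(1 / (1 + exp(-eta)))))
print(all.equal(unname(qlogis(prob)), unname(eta)))
```

With factor predictors such as the binned `_fct` variables, each coefficient is the change in log-odds relative to the reference level of that factor.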

## 
##  Logit-Model on Training Set
## **** Computing starting values:
## 
Rows Processed: 209838 
## 
Rows Processed: 209838 
## **** Scoring iteration #3:
## 
## Deviance: 201058.0735
## 
Rows Processed: 209838 
## **** Scoring iteration #4:
## 
## Deviance: 199961.8653
## 
Rows Processed: 209838 
## **** Scoring iteration #5:
## 
## Deviance: 199949.6311
## 
Rows Processed: 209838 
## **** Scoring iteration #6:
## 
## Deviance: 199949.6240
## 
Rows Processed: 209838 
## 
## 
## Logistic Regression Results for: Y~Employee_code_ID_fct+Current_pincode_ID_fct+supplier_id_fct+branch_id_fct+ltv_fct+PERFORM_CNS_SCORE_fct+disbursed_amount_fct+OutstandingNow_fct+PERFORM_CNS_SCORE_DESCRIPTION_fct+State_ID_fct+DisAsDiff_fct+PRI_DISBURSED_AMOUNT_fct+PRI_OVERDUE_ACCTS_fct+manufacturer_id_fct+VoterID_flag_fct+ShareOverdue_fct+PRI_ACTIVE_ACCTS_fct+Day_fct+NO_OF_INQUIRIES_fct+DELINQUENT_ACCTS_IN_LAST_SIX_M_fct+Qrt_fct+YearsOnLoan_fct+Employment_Type_fct+PRIMARY_INSTAL_AMT_fct+asset_cost_fct+SEC_OverdueShareSec_fct+SEC_CURRENT_BALANCE_fct+Passport_flag_fct+Driving_flag_fct+SEC_INSTAL_AMT_fct+PAN_flag_fct+SEC_OVERDUE_ACCTS_fct
## ************************************************************************************************************************
## Dependent Variable: Y
## Total independent variables: 163 (Including number dropped: 33)
## Number of valid observations: 209838
## -2*LogLikelihood: 199950 (Residual Deviance on 209708 degrees of freedom)
## Row        Coeffs.                                                                                 Value      Std. Error         t Value        Pr(>|t|)
## [   1,.]   (Intercept)                                                                           0.7391          0.2000          3.6958          0.0002
## [   2,.]   Employee_code_ID_fct=1: Employee_code_ID <= 134                                       0.7657          0.0427         17.9335          0.0000
## [   3,.]   Employee_code_ID_fct=10: Employee_code_ID <= 242                                     -0.0770          0.0303         -2.5387          0.0111
## [   4,.]   Employee_code_ID_fct=11: Employee_code_ID <= 254                                     -0.1015          0.0296         -3.4293          0.0006
## [   5,.]   Employee_code_ID_fct=12: Employee_code_ID <= 270                                     -0.1436          0.0297         -4.8288          0.0000
## [   6,.]   Employee_code_ID_fct=13: Employee_code_ID <= 289                                     -0.2340          0.0293         -7.9857          0.0000
## [   7,.]   Employee_code_ID_fct=14: Employee_code_ID <= 319                                     -0.2577          0.0286         -9.0069          0.0000
## [   8,.]   Employee_code_ID_fct=15: Employee_code_ID > 319                                      -0.4823          0.0320        -15.0504          0.0000
## [   9,.]   Employee_code_ID_fct=2: Employee_code_ID <= 153                                       0.5028          0.0339         14.8256          0.0000
## [  10,.]   Employee_code_ID_fct=3: Employee_code_ID <= 170                                       0.4005          0.0298         13.4511          0.0000
## [  11,.]   Employee_code_ID_fct=4: Employee_code_ID <= 179                                       0.2856          0.0337          8.4796          0.0000
## [  12,.]   Employee_code_ID_fct=5: Employee_code_ID <= 188                                       0.2518          0.0318          7.9085          0.0000
## [  13,.]   Employee_code_ID_fct=6: Employee_code_ID <= 199                                       0.1600          0.0297          5.3933          0.0000
## [  14,.]   Employee_code_ID_fct=7: Employee_code_ID <= 211                                       0.0937          0.0291          3.2206          0.0013
## [  15,.]   Employee_code_ID_fct=8: Employee_code_ID <= 221                                       0.0284          0.0298          0.9526          0.3408
## [  16,.]   Employee_code_ID_fct=9: Employee_code_ID <= 233                                      Dropped         Dropped         Dropped          0.0000
## [  17,.]   Current_pincode_ID_fct=1: Current_pincode_ID <= 143                                   0.8572          0.0388         22.0715          0.0000
## [  18,.]   Current_pincode_ID_fct=10: Current_pincode_ID <= 257                                 -0.1809          0.0245         -7.3807          0.0000
## [  19,.]   Current_pincode_ID_fct=11: Current_pincode_ID <= 291                                 -0.2728          0.0236        -11.5684          0.0000
## [  20,.]   Current_pincode_ID_fct=12: Current_pincode_ID > 291                                  -0.3917          0.0259        -15.1180          0.0000
## [  21,.]   Current_pincode_ID_fct=2: Current_pincode_ID <= 158                                   0.7512          0.0370         20.3062          0.0000
## [  22,.]   Current_pincode_ID_fct=3: Current_pincode_ID <= 174                                   0.5712          0.0278         20.5606          0.0000
## [  23,.]   Current_pincode_ID_fct=4: Current_pincode_ID <= 188                                   0.4876          0.0269         18.1216          0.0000
## [  24,.]   Current_pincode_ID_fct=5: Current_pincode_ID <= 201                                   0.3985          0.0244         16.3081          0.0000
## [  25,.]   Current_pincode_ID_fct=6: Current_pincode_ID <= 212                                   0.3153          0.0253         12.4573          0.0000
## [  26,.]   Current_pincode_ID_fct=7: Current_pincode_ID <= 219                                   0.1621          0.0295          5.4903          0.0000
## [  27,.]   Current_pincode_ID_fct=8: Current_pincode_ID <= 225                                   0.1075          0.0296          3.6285          0.0003
## [  28,.]   Current_pincode_ID_fct=9: Current_pincode_ID <= 238                                  Dropped         Dropped         Dropped          0.0000
## [  29,.]   supplier_id_fct=1: supplier_id <= 133                                                 0.3345          0.0425          7.8740          0.0000
## [  30,.]   supplier_id_fct=10: supplier_id <= 253                                               -0.0792          0.0270         -2.9305          0.0034
## [  31,.]   supplier_id_fct=11: supplier_id <= 275                                               -0.0973          0.0248         -3.9244          0.0001
## [  32,.]   supplier_id_fct=12: supplier_id <= 314                                               -0.1325          0.0259         -5.1236          0.0000
## [  33,.]   supplier_id_fct=13: supplier_id > 314                                                -0.2318          0.0306         -7.5686          0.0000
## [  34,.]   supplier_id_fct=2: supplier_id <= 149                                                 0.3415          0.0373          9.1609          0.0000
## [  35,.]   supplier_id_fct=3: supplier_id <= 165                                                 0.2308          0.0311          7.4297          0.0000
## [  36,.]   supplier_id_fct=4: supplier_id <= 178                                                 0.1855          0.0288          6.4352          0.0000
## [  37,.]   supplier_id_fct=5: supplier_id <= 196                                                 0.1510          0.0256          5.9035          0.0000
## [  38,.]   supplier_id_fct=6: supplier_id <= 206                                                 0.1296          0.0287          4.5106          0.0000
## [  39,.]   supplier_id_fct=7: supplier_id <= 214                                                 0.1039          0.0304          3.4171          0.0006
## [  40,.]   supplier_id_fct=8: supplier_id <= 225                                                 0.0651          0.0260          2.5088          0.0121
## [  41,.]   supplier_id_fct=9: supplier_id <= 240                                                Dropped         Dropped         Dropped          0.0000
## [  42,.]   branch_id_fct=1: branch_id <= 153                                                    -0.4503          0.0410        -10.9721          0.0000
## [  43,.]   branch_id_fct=10: branch_id <= 284                                                    0.0550          0.0304          1.8089          0.0705
## [  44,.]   branch_id_fct=11: branch_id > 284                                                     0.0527          0.0414          1.2718          0.2034
## [  45,.]   branch_id_fct=2: branch_id <= 174                                                    -0.4923          0.0384        -12.8176          0.0000
## [  46,.]   branch_id_fct=3: branch_id <= 184                                                    -0.3828          0.0317        -12.0901          0.0000
## [  47,.]   branch_id_fct=4: branch_id <= 198                                                    -0.3148          0.0281        -11.2160          0.0000
## [  48,.]   branch_id_fct=5: branch_id <= 214                                                    -0.2090          0.0319         -6.5462          0.0000
## [  49,.]   branch_id_fct=6: branch_id <= 222                                                    -0.1907          0.0332         -5.7400          0.0000
## [  50,.]   branch_id_fct=7: branch_id <= 233                                                    -0.0964          0.0316         -3.0471          0.0023
## [  51,.]   branch_id_fct=8: branch_id <= 261                                                    -0.2644          0.0302         -8.7606          0.0000
## [  52,.]   branch_id_fct=9: branch_id <= 276                                                    Dropped         Dropped         Dropped          0.0000
## [  53,.]   ltv_fct=1: ltv <= 55.63                                                               0.6741          0.0483         13.9558          0.0000
## [  54,.]   ltv_fct=10: ltv <= 84.57                                                             -0.1191          0.0296         -4.0282          0.0001
## [  55,.]   ltv_fct=11: ltv <= 85                                                                -0.2560          0.0299         -8.5656          0.0000
## [  56,.]   ltv_fct=12: ltv <= 87.8                                                              -0.1109          0.0328         -3.3790          0.0007
## [  57,.]   ltv_fct=13: ltv <= 89.3                                                              -0.2617          0.0335         -7.8184          0.0000
## [  58,.]   ltv_fct=14: ltv > 89.3                                                               -0.3100          0.0333         -9.3079          0.0000
## [  59,.]   ltv_fct=2: ltv <= 62.22                                                               0.5654          0.0424         13.3198          0.0000
## [  60,.]   ltv_fct=3: ltv <= 68.34                                                               0.3978          0.0373         10.6611          0.0000
## [  61,.]   ltv_fct=4: ltv <= 72.9301                                                             0.2696          0.0334          8.0815          0.0000
## [  62,.]   ltv_fct=5: ltv <= 74.31                                                               0.1395          0.0346          4.0289          0.0001
## [  63,.]   ltv_fct=6: ltv <= 75                                                                  0.0904          0.0336          2.6889          0.0072
## [  64,.]   ltv_fct=7: ltv <= 77.39                                                               0.1953          0.0306          6.3806          0.0000
## [  65,.]   ltv_fct=8: ltv <= 78.92                                                               0.0634          0.0287          2.2089          0.0272
## [  66,.]   ltv_fct=9: ltv <= 83.34                                                              Dropped         Dropped         Dropped          0.0000
## [  67,.]   PERFORM_CNS_SCORE_fct=1: PERFORM_CNS_SCORE <= 0                                      -0.1592          0.0722         -2.2036          0.0276
## [  68,.]   PERFORM_CNS_SCORE_fct=2: PERFORM_CNS_SCORE <= 18                                     -0.0610          0.0589         -1.0345          0.3009
## [  69,.]   PERFORM_CNS_SCORE_fct=3: PERFORM_CNS_SCORE <= 441                                    -0.2494          0.0714         -3.4913          0.0005
## [  70,.]   PERFORM_CNS_SCORE_fct=4: PERFORM_CNS_SCORE <= 643                                    -0.1389          0.0618         -2.2491          0.0245
## [  71,.]   PERFORM_CNS_SCORE_fct=5: PERFORM_CNS_SCORE <= 738                                    -0.0788          0.0409         -1.9274          0.0539
## [  72,.]   PERFORM_CNS_SCORE_fct=6: PERFORM_CNS_SCORE <= 824                                     0.1750          0.0449          3.8987          0.0001
## [  73,.]   PERFORM_CNS_SCORE_fct=7: PERFORM_CNS_SCORE > 824                                     Dropped         Dropped         Dropped          0.0000
## [  74,.]   disbursed_amount_fct=1: disbursed_amount <= 39134                                     0.0393          0.0477          0.8234          0.4103
## [  75,.]   disbursed_amount_fct=2: disbursed_amount <= 43615                                     0.0841          0.0422          1.9918          0.0464
## [  76,.]   disbursed_amount_fct=3: disbursed_amount <= 48555                                     0.0864          0.0309          2.7959          0.0052
## [  77,.]   disbursed_amount_fct=4: disbursed_amount <= 51908                                     0.0959          0.0248          3.8738          0.0001
## [  78,.]   disbursed_amount_fct=5: disbursed_amount <= 55400                                     0.0522          0.0191          2.7370          0.0062
## [  79,.]   disbursed_amount_fct=6: disbursed_amount > 55400                                     Dropped         Dropped         Dropped          0.0000
## [  80,.]   OutstandingNow_fct=1: OutstandingNow <= 44402                                        -0.0596          0.0582         -1.0234          0.3061
## [  81,.]   OutstandingNow_fct=2: OutstandingNow <= 50314                                        -0.1223          0.0545         -2.2443          0.0248
## [  82,.]   OutstandingNow_fct=3: OutstandingNow <= 171384                                       -0.1834          0.0485         -3.7823          0.0002
## [  83,.]   OutstandingNow_fct=4: OutstandingNow <= 324324                                       -0.0857          0.0429         -1.9988          0.0456
## [  84,.]   OutstandingNow_fct=5: OutstandingNow <= 746271                                       -0.1138          0.0390         -2.9159          0.0035
## [  85,.]   OutstandingNow_fct=6: OutstandingNow > 746271                                        Dropped         Dropped         Dropped          0.0000
## [  86,.]   PERFORM_CNS_SCORE_DESCRIPTION_fct=1: PERFORM_CNS_SCORE_DESCRIPTION <= 150             0.3537          0.0632          5.5963          0.0000
## [  87,.]   PERFORM_CNS_SCORE_DESCRIPTION_fct=2: PERFORM_CNS_SCORE_DESCRIPTION <= 172             0.3311          0.0616          5.3773          0.0000
## [  88,.]   PERFORM_CNS_SCORE_DESCRIPTION_fct=3: PERFORM_CNS_SCORE_DESCRIPTION <= 205             0.2451          0.0500          4.9071          0.0000
## [  89,.]   PERFORM_CNS_SCORE_DESCRIPTION_fct=4: PERFORM_CNS_SCORE_DESCRIPTION <= 231             0.2198          0.0622          3.5352          0.0004
## [  90,.]   PERFORM_CNS_SCORE_DESCRIPTION_fct=5: PERFORM_CNS_SCORE_DESCRIPTION <= 256             0.0049          0.0358          0.1370          0.8910
## [  91,.]   PERFORM_CNS_SCORE_DESCRIPTION_fct=6: PERFORM_CNS_SCORE_DESCRIPTION > 256             Dropped         Dropped         Dropped          0.0000
## [  92,.]   State_ID_fct=1: State_ID <= 183                                                      -0.0597          0.0494         -1.2078          0.2271
## [  93,.]   State_ID_fct=2: State_ID <= 188                                                      -0.0293          0.0464         -0.6329          0.5268
## [  94,.]   State_ID_fct=3: State_ID <= 206                                                      -0.0022          0.0432         -0.0499          0.9602
## [  95,.]   State_ID_fct=4: State_ID <= 214                                                       0.0796          0.0425          1.8719          0.0612
## [  96,.]   State_ID_fct=5: State_ID <= 220                                                       0.1009          0.0497          2.0304          0.0423
## [  97,.]   State_ID_fct=6: State_ID <= 229                                                       0.0639          0.0512          1.2496          0.2115
## [  98,.]   State_ID_fct=7: State_ID <= 272                                                       0.0847          0.0445          1.9050          0.0568
## [  99,.]   State_ID_fct=8: State_ID > 272                                                       Dropped         Dropped         Dropped          0.0000
## [ 100,.]   DisAsDiff_fct=1: DisAsDiff <= 13554                                                  -0.1302          0.0396         -3.2854          0.0010
## [ 101,.]   DisAsDiff_fct=2: DisAsDiff <= 15670                                                  -0.0822          0.0342         -2.4020          0.0163
## [ 102,.]   DisAsDiff_fct=3: DisAsDiff <= 16661                                                   0.0062          0.0359          0.1722          0.8633
## [ 103,.]   DisAsDiff_fct=4: DisAsDiff <= 19822                                                  -0.0175          0.0254         -0.6869          0.4921
## [ 104,.]   DisAsDiff_fct=5: DisAsDiff > 19822                                                   Dropped         Dropped         Dropped          0.0000
## [ 105,.]   PRI_DISBURSED_AMOUNT_fct=1: PRI_DISBURSED_AMOUNT <= 218581                           -0.2560          0.0387         -6.6095          0.0000
## [ 106,.]   PRI_DISBURSED_AMOUNT_fct=2: PRI_DISBURSED_AMOUNT > 218581                            Dropped         Dropped         Dropped          0.0000
## [ 107,.]   PRI_OVERDUE_ACCTS_fct=1: PRI_OVERDUE_ACCTS <= 0                                       0.1444          0.0287          5.0372          0.0000
## [ 108,.]   PRI_OVERDUE_ACCTS_fct=2: PRI_OVERDUE_ACCTS > 0                                       Dropped         Dropped         Dropped          0.0000
## [ 109,.]   manufacturer_id_fct=1: manufacturer_id <= 210                                         0.1232          0.0239          5.1589          0.0000
## [ 110,.]   manufacturer_id_fct=2: manufacturer_id <= 221                                        -0.0035          0.0278         -0.1266          0.8993
## [ 111,.]   manufacturer_id_fct=3: manufacturer_id <= 228                                         0.0977          0.0261          3.7512          0.0002
## [ 112,.]   manufacturer_id_fct=4: manufacturer_id > 228                                         Dropped         Dropped         Dropped          0.0000
## [ 113,.]   VoterID_flag_fct=1: VoterID_flag = 0                                                  0.0935          0.0195          4.8004          0.0000
## [ 114,.]   VoterID_flag_fct=2: VoterID_flag = 1                                                 Dropped         Dropped         Dropped          0.0000
## [ 115,.]   ShareOverdue_fct=1: ShareOverdue <= -2                                                0.0923          0.0318          2.8997          0.0037
## [ 116,.]   ShareOverdue_fct=2: ShareOverdue <= -1                                                0.1288          0.0233          5.5417          0.0000
## [ 117,.]   ShareOverdue_fct=3: ShareOverdue > -1                                                Dropped         Dropped         Dropped          0.0000
## [ 118,.]   PRI_ACTIVE_ACCTS_fct=1: PRI_ACTIVE_ACCTS <= 0                                        -0.2271          0.0443         -5.1314          0.0000
## [ 119,.]   PRI_ACTIVE_ACCTS_fct=2: PRI_ACTIVE_ACCTS <= 1                                        -0.2429          0.0334         -7.2669          0.0000
## [ 120,.]   PRI_ACTIVE_ACCTS_fct=3: PRI_ACTIVE_ACCTS <= 3                                        -0.2096          0.0288         -7.2732          0.0000
## [ 121,.]   PRI_ACTIVE_ACCTS_fct=4: PRI_ACTIVE_ACCTS > 3                                         Dropped         Dropped         Dropped          0.0000
## [ 122,.]   Day_fct=1: Day <= 28                                                                  0.2494          0.0213         11.7055          0.0000
## [ 123,.]   Day_fct=2: Day <= 30                                                                  0.1283          0.0260          4.9301          0.0000
## [ 124,.]   Day_fct=3: Day > 30                                                                  Dropped         Dropped         Dropped          0.0000
## [ 125,.]   NO_OF_INQUIRIES_fct=1: NO_OF_INQUIRIES <= 0                                           0.2732          0.0170         16.0551          0.0000
## [ 126,.]   NO_OF_INQUIRIES_fct=2: NO_OF_INQUIRIES > 0                                           Dropped         Dropped         Dropped          0.0000
## [ 127,.]   DELINQUENT_ACCTS_IN_LAST_SIX_M_fct=1: DELINQUENT_ACCTS_IN_LAST_SIX_M <= 0             0.2820          0.0244         11.5364          0.0000
## [ 128,.]   DELINQUENT_ACCTS_IN_LAST_SIX_M_fct=2: DELINQUENT_ACCTS_IN_LAST_SIX_M > 0             Dropped         Dropped         Dropped          0.0000
## [ 129,.]   Qrt_fct=1: Qrt = 3                                                                    0.2191          0.0118         18.5775          0.0000
## [ 130,.]   Qrt_fct=2: Qrt = 4                                                                   Dropped         Dropped         Dropped          0.0000
## [ 131,.]   YearsOnLoan_fct=1: YearsOnLoan <= 22.8918                                            -0.3841          0.0306        -12.5451          0.0000
## [ 132,.]   YearsOnLoan_fct=2: YearsOnLoan <= 28.8496                                            -0.2784          0.0266        -10.4683          0.0000
## [ 133,.]   YearsOnLoan_fct=3: YearsOnLoan <= 38.8321                                            -0.1799          0.0259         -6.9531          0.0000
## [ 134,.]   YearsOnLoan_fct=4: YearsOnLoan <= 51.8208                                            -0.0930          0.0265         -3.5125          0.0004
## [ 135,.]   YearsOnLoan_fct=5: YearsOnLoan > 51.8208                                             Dropped         Dropped         Dropped          0.0000
## [ 136,.]   Employment_Type_fct=1: Employment_Type = 203                                          0.1524          0.0123         12.4431          0.0000
## [ 137,.]   Employment_Type_fct=2: Employment_Type = 215                                          0.2166          0.0347          6.2474          0.0000
## [ 138,.]   Employment_Type_fct=3: Employment_Type = 227                                         Dropped         Dropped         Dropped          0.0000
## [ 139,.]   PRIMARY_INSTAL_AMT_fct=1: PRIMARY_INSTAL_AMT <= 1564                                 -0.0180          0.0325         -0.5519          0.5810
## [ 140,.]   PRIMARY_INSTAL_AMT_fct=2: PRIMARY_INSTAL_AMT <= 2832                                 -0.2560          0.0375         -6.8204          0.0000
## [ 141,.]   PRIMARY_INSTAL_AMT_fct=3: PRIMARY_INSTAL_AMT <= 5033                                 -0.1442          0.0387         -3.7226          0.0002
## [ 142,.]   PRIMARY_INSTAL_AMT_fct=4: PRIMARY_INSTAL_AMT <= 25326                                -0.1437          0.0325         -4.4164          0.0000
## [ 143,.]   PRIMARY_INSTAL_AMT_fct=5: PRIMARY_INSTAL_AMT > 25326                                 Dropped         Dropped         Dropped          0.0000
## [ 144,.]   asset_cost_fct=1: asset_cost <= 60098                                                 0.0726          0.0457          1.5873          0.1124
## [ 145,.]   asset_cost_fct=2: asset_cost <= 70561                                                 0.1494          0.0296          5.0467          0.0000
## [ 146,.]   asset_cost_fct=3: asset_cost <= 85738                                                 0.1776          0.0224          7.9199          0.0000
## [ 147,.]   asset_cost_fct=4: asset_cost > 85738                                                 Dropped         Dropped         Dropped          0.0000
## [ 148,.]   SEC_OverdueShareSec_fct=1: SEC_OverdueShareSec <= 0                                  -0.0599          0.0932         -0.6423          0.5206
## [ 149,.]   SEC_OverdueShareSec_fct=11: SEC_OverdueShareSec Is Null                              -0.0752          0.1024         -0.7339          0.4630
## [ 150,.]   SEC_OverdueShareSec_fct=8: SEC_OverdueShareSec <= 0.2                                 0.2864          0.2454          1.1672          0.2431
## [ 151,.]   SEC_OverdueShareSec_fct=9: SEC_OverdueShareSec <= 1                                  Dropped         Dropped         Dropped          0.0000
## [ 152,.]   SEC_CURRENT_BALANCE_fct=1: SEC_CURRENT_BALANCE <= 0                                  -0.0239          0.0779         -0.3071          0.7588
## [ 153,.]   SEC_CURRENT_BALANCE_fct=10: SEC_CURRENT_BALANCE > 0                                  Dropped         Dropped         Dropped          0.0000
## [ 154,.]   Passport_flag_fct=1: Passport_flag = 0                                               -0.1880          0.1371         -1.3710          0.1704
## [ 155,.]   Passport_flag_fct=2: Passport_flag = 1                                               Dropped         Dropped         Dropped          0.0000
## [ 156,.]   Driving_flag_fct=1: Driving_flag = 0                                                  0.0159          0.0384          0.4145          0.6785
## [ 157,.]   Driving_flag_fct=2: Driving_flag = 1                                                 Dropped         Dropped         Dropped          0.0000
## [ 158,.]   SEC_INSTAL_AMT_fct=1: SEC_INSTAL_AMT <= 0                                             0.0609          0.0778          0.7835          0.4334
## [ 159,.]   SEC_INSTAL_AMT_fct=10: SEC_INSTAL_AMT > 0                                            Dropped         Dropped         Dropped          0.0000
## [ 160,.]   PAN_flag_fct=1: PAN_flag = 0                                                         -0.0460          0.0230         -1.9976          0.0458
## [ 161,.]   PAN_flag_fct=2: PAN_flag = 1                                                         Dropped         Dropped         Dropped          0.0000
## [ 162,.]   SEC_OVERDUE_ACCTS_fct=1: SEC_OVERDUE_ACCTS <= 0                                      Dropped         Dropped         Dropped          0.0000
## [ 163,.]   SEC_OVERDUE_ACCTS_fct=10: SEC_OVERDUE_ACCTS > 0                                      Dropped         Dropped         Dropped          0.0000
## Condition number of final VC matrix: 1997.7639
## ************************************************************************************************************************
## 
##    user  system elapsed 
##    0.14    0.01    3.95
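
The coefficients above are on the log-odds scale, each measured against the bin reported as Dropped for the same factor. A minimal sketch of how a few of them could be converted to odds ratios with Wald intervals follows. It assumes that the third numeric column is estimate divided by standard error and that the last column is a two-sided normal p-value, which matches the printed figures (for example 0.0634 / 0.0287 is about 2.21). The data frame `coefs` and its column names are illustrative, not part of the original analysis, and the values are the rounded ones copied from the table.

```r
# Rounded estimates and standard errors copied from the coefficient table above.
# The data frame and its column names are hypothetical, for illustration only.
coefs <- data.frame(
  term      = c("NO_OF_INQUIRIES_fct=1", "Qrt_fct=1", "YearsOnLoan_fct=1"),
  estimate  = c(0.2732, 0.2191, -0.3841),
  std_error = c(0.0170, 0.0118, 0.0306),
  stringsAsFactors = FALSE
)

# Wald z statistic and two-sided p-value under a standard normal reference.
coefs$z       <- coefs$estimate / coefs$std_error
coefs$p_value <- 2 * pnorm(-abs(coefs$z))

# Odds ratio relative to the dropped (reference) bin, with a 95% Wald interval.
z_crit           <- qnorm(0.975)
coefs$odds_ratio <- exp(coefs$estimate)
coefs$or_lower   <- exp(coefs$estimate - z_crit * coefs$std_error)
coefs$or_upper   <- exp(coefs$estimate + z_crit * coefs$std_error)

print(coefs, digits = 4)
```

Read this way, the estimate of 0.2732 for `NO_OF_INQUIRIES <= 0` corresponds to odds of `Y = 1` roughly 1.31 times those of the reference bin `NO_OF_INQUIRIES > 0`, holding the other binned predictors fixed. Because the model was fitted with `pweights`, the intervals are best treated as approximate.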
## Call:
## RevoScaleR::rxLogit(formula = paste("Y ~ ", paste(names(X), collapse = "+")), 
##     data = cbind(Y = Y[inTrain], X[inTrain, ], Pweights = weight.cases), 
##     pweights = "Pweights", reportProgress = 1, verbose = 1)
## 
## Logistic Regression Results for: Y ~
##     Employee_code_ID_fct+Current_pincode_ID_fct+supplier_id_fct+branch_id_fct+ltv_fct+PERFORM_CNS_SCORE_fct+disbursed_amount_fct+OutstandingNow_fct+PERFORM_CNS_SCORE_DESCRIPTION_fct+State_ID_fct+DisAsDiff_fct+PRI_DISBURSED_AMOUNT_fct+PRI_OVERDUE_ACCTS_fct+manufacturer_id_fct+VoterID_flag_fct+ShareOverdue_fct+PRI_ACTIVE_ACCTS_fct+Day_fct+NO_OF_INQUIRIES_fct+DELINQUENT_ACCTS_IN_LAST_SIX_M_fct+Qrt_fct+YearsOnLoan_fct+Employment_Type_fct+PRIMARY_INSTAL_AMT_fct+asset_cost_fct+SEC_OverdueShareSec_fct+SEC_CURRENT_BALANCE_fct+Passport_flag_fct+Driving_flag_fct+SEC_INSTAL_AMT_fct+PAN_flag_fct+SEC_OVERDUE_ACCTS_fct
## Data: cbind(Y = Y[inTrain], X[inTrain, ], Pweights = weight.cases)
## Dependent variable(s): Y
## Total independent variables: 163 (Including number dropped: 33)
## Number of valid observations: 209838
## Number of missing observations: 0 
## -2*LogLikelihood: 199949.624 (Residual deviance on 209708 degrees of freedom)
##  
## Coefficients:
##                                                                            Estimate
## (Intercept)                                                                0.739128
## Employee_code_ID_fct=1: Employee_code_ID <= 134                            0.765665
## Employee_code_ID_fct=10: Employee_code_ID <= 242                          -0.077044
## Employee_code_ID_fct=11: Employee_code_ID <= 254                          -0.101471
## Employee_code_ID_fct=12: Employee_code_ID <= 270                          -0.143575
## Employee_code_ID_fct=13: Employee_code_ID <= 289                          -0.234039
## Employee_code_ID_fct=14: Employee_code_ID <= 319                          -0.257740
## Employee_code_ID_fct=15: Employee_code_ID > 319                           -0.482348
## Employee_code_ID_fct=2: Employee_code_ID <= 153                            0.502839
## Employee_code_ID_fct=3: Employee_code_ID <= 170                            0.400538
## Employee_code_ID_fct=4: Employee_code_ID <= 179                            0.285620
## Employee_code_ID_fct=5: Employee_code_ID <= 188                            0.251824
## Employee_code_ID_fct=6: Employee_code_ID <= 199                            0.160031
## Employee_code_ID_fct=7: Employee_code_ID <= 211                            0.093653
## Employee_code_ID_fct=8: Employee_code_ID <= 221                            0.028377
## Employee_code_ID_fct=9: Employee_code_ID <= 233                             Dropped
## Current_pincode_ID_fct=1: Current_pincode_ID <= 143                        0.857166
## Current_pincode_ID_fct=10: Current_pincode_ID <= 257                      -0.180906
## Current_pincode_ID_fct=11: Current_pincode_ID <= 291                      -0.272803
## Current_pincode_ID_fct=12: Current_pincode_ID > 291                       -0.391739
## Current_pincode_ID_fct=2: Current_pincode_ID <= 158                        0.751176
## Current_pincode_ID_fct=3: Current_pincode_ID <= 174                        0.571246
## Current_pincode_ID_fct=4: Current_pincode_ID <= 188                        0.487634
## Current_pincode_ID_fct=5: Current_pincode_ID <= 201                        0.398476
## Current_pincode_ID_fct=6: Current_pincode_ID <= 212                        0.315290
## Current_pincode_ID_fct=7: Current_pincode_ID <= 219                        0.162143
## Current_pincode_ID_fct=8: Current_pincode_ID <= 225                        0.107461
## Current_pincode_ID_fct=9: Current_pincode_ID <= 238                         Dropped
## supplier_id_fct=1: supplier_id <= 133                                      0.334523
## supplier_id_fct=10: supplier_id <= 253                                    -0.079193
## supplier_id_fct=11: supplier_id <= 275                                    -0.097272
## supplier_id_fct=12: supplier_id <= 314                                    -0.132506
## supplier_id_fct=13: supplier_id > 314                                     -0.231812
## supplier_id_fct=2: supplier_id <= 149                                      0.341500
## supplier_id_fct=3: supplier_id <= 165                                      0.230836
## supplier_id_fct=4: supplier_id <= 178                                      0.185501
## supplier_id_fct=5: supplier_id <= 196                                      0.150963
## supplier_id_fct=6: supplier_id <= 206                                      0.129583
## supplier_id_fct=7: supplier_id <= 214                                      0.103929
## supplier_id_fct=8: supplier_id <= 225                                      0.065137
## supplier_id_fct=9: supplier_id <= 240                                       Dropped
## branch_id_fct=1: branch_id <= 153                                         -0.450331
## branch_id_fct=10: branch_id <= 284                                         0.055049
## branch_id_fct=11: branch_id > 284                                          0.052659
## branch_id_fct=2: branch_id <= 174                                         -0.492323
## branch_id_fct=3: branch_id <= 184                                         -0.382755
## branch_id_fct=4: branch_id <= 198                                         -0.314757
## branch_id_fct=5: branch_id <= 214                                         -0.208958
## branch_id_fct=6: branch_id <= 222                                         -0.190663
## branch_id_fct=7: branch_id <= 233                                         -0.096355
## branch_id_fct=8: branch_id <= 261                                         -0.264450
## branch_id_fct=9: branch_id <= 276                                           Dropped
## ltv_fct=1: ltv <= 55.63                                                    0.674092
## ltv_fct=10: ltv <= 84.57                                                  -0.119055
## ltv_fct=11: ltv <= 85                                                     -0.256004
## ltv_fct=12: ltv <= 87.8                                                   -0.110868
## ltv_fct=13: ltv <= 89.3                                                   -0.261671
## ltv_fct=14: ltv > 89.3                                                    -0.310022
## ltv_fct=2: ltv <= 62.22                                                    0.565406
## ltv_fct=3: ltv <= 68.34                                                    0.397794
## ltv_fct=4: ltv <= 72.9301                                                  0.269590
## ltv_fct=5: ltv <= 74.31                                                    0.139500
## ltv_fct=6: ltv <= 75                                                       0.090372
## ltv_fct=7: ltv <= 77.39                                                    0.195296
## ltv_fct=8: ltv <= 78.92                                                    0.063365
## ltv_fct=9: ltv <= 83.34                                                     Dropped
## PERFORM_CNS_SCORE_fct=1: PERFORM_CNS_SCORE <= 0                           -0.159161
## PERFORM_CNS_SCORE_fct=2: PERFORM_CNS_SCORE <= 18                          -0.060971
## PERFORM_CNS_SCORE_fct=3: PERFORM_CNS_SCORE <= 441                         -0.249354
## PERFORM_CNS_SCORE_fct=4: PERFORM_CNS_SCORE <= 643                         -0.138944
## PERFORM_CNS_SCORE_fct=5: PERFORM_CNS_SCORE <= 738                         -0.078764
## PERFORM_CNS_SCORE_fct=6: PERFORM_CNS_SCORE <= 824                          0.175027
## PERFORM_CNS_SCORE_fct=7: PERFORM_CNS_SCORE > 824                            Dropped
## disbursed_amount_fct=1: disbursed_amount <= 39134                          0.039263
## disbursed_amount_fct=2: disbursed_amount <= 43615                          0.084088
## disbursed_amount_fct=3: disbursed_amount <= 48555                          0.086379
## disbursed_amount_fct=4: disbursed_amount <= 51908                          0.095899
## disbursed_amount_fct=5: disbursed_amount <= 55400                          0.052241
## disbursed_amount_fct=6: disbursed_amount > 55400                            Dropped
## OutstandingNow_fct=1: OutstandingNow <= 44402                             -0.059569
## OutstandingNow_fct=2: OutstandingNow <= 50314                             -0.122291
## OutstandingNow_fct=3: OutstandingNow <= 171384                            -0.183387
## OutstandingNow_fct=4: OutstandingNow <= 324324                            -0.085655
## OutstandingNow_fct=5: OutstandingNow <= 746271                            -0.113832
## OutstandingNow_fct=6: OutstandingNow > 746271                               Dropped
## PERFORM_CNS_SCORE_DESCRIPTION_fct=1: PERFORM_CNS_SCORE_DESCRIPTION <= 150  0.353738
## PERFORM_CNS_SCORE_DESCRIPTION_fct=2: PERFORM_CNS_SCORE_DESCRIPTION <= 172  0.331067
## PERFORM_CNS_SCORE_DESCRIPTION_fct=3: PERFORM_CNS_SCORE_DESCRIPTION <= 205  0.245147
## PERFORM_CNS_SCORE_DESCRIPTION_fct=4: PERFORM_CNS_SCORE_DESCRIPTION <= 231  0.219787
## PERFORM_CNS_SCORE_DESCRIPTION_fct=5: PERFORM_CNS_SCORE_DESCRIPTION <= 256  0.004902
## PERFORM_CNS_SCORE_DESCRIPTION_fct=6: PERFORM_CNS_SCORE_DESCRIPTION > 256    Dropped
## State_ID_fct=1: State_ID <= 183                                           -0.059697
## State_ID_fct=2: State_ID <= 188                                           -0.029348
## State_ID_fct=3: State_ID <= 206                                           -0.002158
## State_ID_fct=4: State_ID <= 214                                            0.079577
## State_ID_fct=5: State_ID <= 220                                            0.100859
## State_ID_fct=6: State_ID <= 229                                            0.063927
## State_ID_fct=7: State_ID <= 272                                            0.084702
## State_ID_fct=8: State_ID > 272                                              Dropped
## DisAsDiff_fct=1: DisAsDiff <= 13554                                       -0.130242
## DisAsDiff_fct=2: DisAsDiff <= 15670                                       -0.082229
## DisAsDiff_fct=3: DisAsDiff <= 16661                                        0.006175
## DisAsDiff_fct=4: DisAsDiff <= 19822                                       -0.017471
## DisAsDiff_fct=5: DisAsDiff > 19822                                          Dropped
## PRI_DISBURSED_AMOUNT_fct=1: PRI_DISBURSED_AMOUNT <= 218581                -0.255990
## PRI_DISBURSED_AMOUNT_fct=2: PRI_DISBURSED_AMOUNT > 218581                   Dropped
## PRI_OVERDUE_ACCTS_fct=1: PRI_OVERDUE_ACCTS <= 0                            0.144360
## PRI_OVERDUE_ACCTS_fct=2: PRI_OVERDUE_ACCTS > 0                              Dropped
## manufacturer_id_fct=1: manufacturer_id <= 210                              0.123239
## manufacturer_id_fct=2: manufacturer_id <= 221                             -0.003519
## manufacturer_id_fct=3: manufacturer_id <= 228                              0.097721
## manufacturer_id_fct=4: manufacturer_id > 228                                Dropped
## VoterID_flag_fct=1: VoterID_flag = 0                                       0.093489
## VoterID_flag_fct=2: VoterID_flag = 1                                        Dropped
## ShareOverdue_fct=1: ShareOverdue <= -2                                     0.092295
## ShareOverdue_fct=2: ShareOverdue <= -1                                     0.128849
## ShareOverdue_fct=3: ShareOverdue > -1                                       Dropped
## PRI_ACTIVE_ACCTS_fct=1: PRI_ACTIVE_ACCTS <= 0                             -0.227117
## PRI_ACTIVE_ACCTS_fct=2: PRI_ACTIVE_ACCTS <= 1                             -0.242919
## PRI_ACTIVE_ACCTS_fct=3: PRI_ACTIVE_ACCTS <= 3                             -0.209621
## PRI_ACTIVE_ACCTS_fct=4: PRI_ACTIVE_ACCTS > 3                                Dropped
## Day_fct=1: Day <= 28                                                       0.249400
## Day_fct=2: Day <= 30                                                       0.128288
## Day_fct=3: Day > 30                                                         Dropped
## NO_OF_INQUIRIES_fct=1: NO_OF_INQUIRIES <= 0                                0.273240
## NO_OF_INQUIRIES_fct=2: NO_OF_INQUIRIES > 0                                  Dropped
## DELINQUENT_ACCTS_IN_LAST_SIX_M_fct=1: DELINQUENT_ACCTS_IN_LAST_SIX_M <= 0  0.281950
## DELINQUENT_ACCTS_IN_LAST_SIX_M_fct=2: DELINQUENT_ACCTS_IN_LAST_SIX_M > 0    Dropped
## Qrt_fct=1: Qrt = 3                                                         0.219055
## Qrt_fct=2: Qrt = 4                                                          Dropped
## YearsOnLoan_fct=1: YearsOnLoan <= 22.8918                                 -0.384106
## YearsOnLoan_fct=2: YearsOnLoan <= 28.8496                                 -0.278357
## YearsOnLoan_fct=3: YearsOnLoan <= 38.8321                                 -0.179941
## YearsOnLoan_fct=4: YearsOnLoan <= 51.8208                                 -0.092981
## YearsOnLoan_fct=5: YearsOnLoan > 51.8208                                    Dropped
## Employment_Type_fct=1: Employment_Type = 203                               0.152449
## Employment_Type_fct=2: Employment_Type = 215                               0.216629
## Employment_Type_fct=3: Employment_Type = 227                                Dropped
## PRIMARY_INSTAL_AMT_fct=1: PRIMARY_INSTAL_AMT <= 1564                      -0.017958
## PRIMARY_INSTAL_AMT_fct=2: PRIMARY_INSTAL_AMT <= 2832                      -0.256020
## PRIMARY_INSTAL_AMT_fct=3: PRIMARY_INSTAL_AMT <= 5033                      -0.144240
## PRIMARY_INSTAL_AMT_fct=4: PRIMARY_INSTAL_AMT <= 25326                     -0.143675
## PRIMARY_INSTAL_AMT_fct=5: PRIMARY_INSTAL_AMT > 25326                        Dropped
## asset_cost_fct=1: asset_cost <= 60098                                      0.072598
## asset_cost_fct=2: asset_cost <= 70561                                      0.149356
## asset_cost_fct=3: asset_cost <= 85738                                      0.177618
## asset_cost_fct=4: asset_cost > 85738                                        Dropped
## SEC_OverdueShareSec_fct=1: SEC_OverdueShareSec <= 0                       -0.059896
## SEC_OverdueShareSec_fct=11: SEC_OverdueShareSec Is Null                   -0.075173
## SEC_OverdueShareSec_fct=8: SEC_OverdueShareSec <= 0.2                      0.286437
## SEC_OverdueShareSec_fct=9: SEC_OverdueShareSec <= 1                         Dropped
## SEC_CURRENT_BALANCE_fct=1: SEC_CURRENT_BALANCE <= 0                       -0.023913
## SEC_CURRENT_BALANCE_fct=10: SEC_CURRENT_BALANCE > 0                         Dropped
## Passport_flag_fct=1: Passport_flag = 0                                    -0.187963
## Passport_flag_fct=2: Passport_flag = 1                                      Dropped
## Driving_flag_fct=1: Driving_flag = 0                                       0.015928
## Driving_flag_fct=2: Driving_flag = 1                                        Dropped
## SEC_INSTAL_AMT_fct=1: SEC_INSTAL_AMT <= 0                                  0.060927
## SEC_INSTAL_AMT_fct=10: SEC_INSTAL_AMT > 0                                   Dropped
## PAN_flag_fct=1: PAN_flag = 0                                              -0.045962
## PAN_flag_fct=2: PAN_flag = 1                                                Dropped
## SEC_OVERDUE_ACCTS_fct=1: SEC_OVERDUE_ACCTS <= 0                             Dropped
## SEC_OVERDUE_ACCTS_fct=10: SEC_OVERDUE_ACCTS > 0                             Dropped
##                                                                           Std. Error
## (Intercept)                                                                 0.199992
## Employee_code_ID_fct=1: Employee_code_ID <= 134                             0.042695
## Employee_code_ID_fct=10: Employee_code_ID <= 242                            0.030348
## Employee_code_ID_fct=11: Employee_code_ID <= 254                            0.029589
## Employee_code_ID_fct=12: Employee_code_ID <= 270                            0.029733
## Employee_code_ID_fct=13: Employee_code_ID <= 289                            0.029307
## Employee_code_ID_fct=14: Employee_code_ID <= 319                            0.028616
## Employee_code_ID_fct=15: Employee_code_ID > 319                             0.032049
## Employee_code_ID_fct=2: Employee_code_ID <= 153                             0.033917
## Employee_code_ID_fct=3: Employee_code_ID <= 170                             0.029777
## Employee_code_ID_fct=4: Employee_code_ID <= 179                             0.033683
## Employee_code_ID_fct=5: Employee_code_ID <= 188                             0.031842
## Employee_code_ID_fct=6: Employee_code_ID <= 199                             0.029672
## Employee_code_ID_fct=7: Employee_code_ID <= 211                             0.029080
## Employee_code_ID_fct=8: Employee_code_ID <= 221                             0.029788
## Employee_code_ID_fct=9: Employee_code_ID <= 233                              Dropped
## Current_pincode_ID_fct=1: Current_pincode_ID <= 143                         0.038836
## Current_pincode_ID_fct=10: Current_pincode_ID <= 257                        0.024511
## Current_pincode_ID_fct=11: Current_pincode_ID <= 291                        0.023582
## Current_pincode_ID_fct=12: Current_pincode_ID > 291                         0.025912
## Current_pincode_ID_fct=2: Current_pincode_ID <= 158                         0.036992
## Current_pincode_ID_fct=3: Current_pincode_ID <= 174                         0.027784
## Current_pincode_ID_fct=4: Current_pincode_ID <= 188                         0.026909
## Current_pincode_ID_fct=5: Current_pincode_ID <= 201                         0.024434
## Current_pincode_ID_fct=6: Current_pincode_ID <= 212                         0.025310
## Current_pincode_ID_fct=7: Current_pincode_ID <= 219                         0.029533
## Current_pincode_ID_fct=8: Current_pincode_ID <= 225                         0.029616
## Current_pincode_ID_fct=9: Current_pincode_ID <= 238                          Dropped
## supplier_id_fct=1: supplier_id <= 133                                       0.042484
## supplier_id_fct=10: supplier_id <= 253                                      0.027024
## supplier_id_fct=11: supplier_id <= 275                                      0.024787
## supplier_id_fct=12: supplier_id <= 314                                      0.025862
## supplier_id_fct=13: supplier_id > 314                                       0.030628
## supplier_id_fct=2: supplier_id <= 149                                       0.037278
## supplier_id_fct=3: supplier_id <= 165                                       0.031069
## supplier_id_fct=4: supplier_id <= 178                                       0.028826
## supplier_id_fct=5: supplier_id <= 196                                       0.025572
## supplier_id_fct=6: supplier_id <= 206                                       0.028729
## supplier_id_fct=7: supplier_id <= 214                                       0.030414
## supplier_id_fct=8: supplier_id <= 225                                       0.025963
## supplier_id_fct=9: supplier_id <= 240                                        Dropped
## branch_id_fct=1: branch_id <= 153                                           0.041043
## branch_id_fct=10: branch_id <= 284                                          0.030432
## branch_id_fct=11: branch_id > 284                                           0.041404
## branch_id_fct=2: branch_id <= 174                                           0.038410
## branch_id_fct=3: branch_id <= 184                                           0.031658
## branch_id_fct=4: branch_id <= 198                                           0.028063
## branch_id_fct=5: branch_id <= 214                                           0.031921
## branch_id_fct=6: branch_id <= 222                                           0.033216
## branch_id_fct=7: branch_id <= 233                                           0.031622
## branch_id_fct=8: branch_id <= 261                                           0.030186
## branch_id_fct=9: branch_id <= 276                                            Dropped
## ltv_fct=1: ltv <= 55.63                                                     0.048302
## ltv_fct=10: ltv <= 84.57                                                    0.029555
## ltv_fct=11: ltv <= 85                                                       0.029887
## ltv_fct=12: ltv <= 87.8                                                     0.032811
## ltv_fct=13: ltv <= 89.3                                                     0.033469
## ltv_fct=14: ltv > 89.3                                                      0.033307
## ltv_fct=2: ltv <= 62.22                                                     0.042449
## ltv_fct=3: ltv <= 68.34                                                     0.037313
## ltv_fct=4: ltv <= 72.9301                                                   0.033359
## ltv_fct=5: ltv <= 74.31                                                     0.034625
## ltv_fct=6: ltv <= 75                                                        0.033609
## ltv_fct=7: ltv <= 77.39                                                     0.030608
## ltv_fct=8: ltv <= 78.92                                                     0.028686
## ltv_fct=9: ltv <= 83.34                                                      Dropped
## PERFORM_CNS_SCORE_fct=1: PERFORM_CNS_SCORE <= 0                             0.072227
## PERFORM_CNS_SCORE_fct=2: PERFORM_CNS_SCORE <= 18                            0.058937
## PERFORM_CNS_SCORE_fct=3: PERFORM_CNS_SCORE <= 441                           0.071422
## PERFORM_CNS_SCORE_fct=4: PERFORM_CNS_SCORE <= 643                           0.061776
## PERFORM_CNS_SCORE_fct=5: PERFORM_CNS_SCORE <= 738                           0.040865
## PERFORM_CNS_SCORE_fct=6: PERFORM_CNS_SCORE <= 824                           0.044894
## PERFORM_CNS_SCORE_fct=7: PERFORM_CNS_SCORE > 824                             Dropped
## disbursed_amount_fct=1: disbursed_amount <= 39134                           0.047683
## disbursed_amount_fct=2: disbursed_amount <= 43615                           0.042217
## disbursed_amount_fct=3: disbursed_amount <= 48555                           0.030895
## disbursed_amount_fct=4: disbursed_amount <= 51908                           0.024756
## disbursed_amount_fct=5: disbursed_amount <= 55400                           0.019087
## disbursed_amount_fct=6: disbursed_amount > 55400                             Dropped
## OutstandingNow_fct=1: OutstandingNow <= 44402                               0.058208
## OutstandingNow_fct=2: OutstandingNow <= 50314                               0.054488
## OutstandingNow_fct=3: OutstandingNow <= 171384                              0.048486
## OutstandingNow_fct=4: OutstandingNow <= 324324                              0.042853
## OutstandingNow_fct=5: OutstandingNow <= 746271                              0.039039
## OutstandingNow_fct=6: OutstandingNow > 746271                                Dropped
## PERFORM_CNS_SCORE_DESCRIPTION_fct=1: PERFORM_CNS_SCORE_DESCRIPTION <= 150   0.063209
## PERFORM_CNS_SCORE_DESCRIPTION_fct=2: PERFORM_CNS_SCORE_DESCRIPTION <= 172   0.061567
## PERFORM_CNS_SCORE_DESCRIPTION_fct=3: PERFORM_CNS_SCORE_DESCRIPTION <= 205   0.049958
## PERFORM_CNS_SCORE_DESCRIPTION_fct=4: PERFORM_CNS_SCORE_DESCRIPTION <= 231   0.062171
## PERFORM_CNS_SCORE_DESCRIPTION_fct=5: PERFORM_CNS_SCORE_DESCRIPTION <= 256   0.035776
## PERFORM_CNS_SCORE_DESCRIPTION_fct=6: PERFORM_CNS_SCORE_DESCRIPTION > 256     Dropped
## State_ID_fct=1: State_ID <= 183                                             0.049425
## State_ID_fct=2: State_ID <= 188                                             0.046370
## State_ID_fct=3: State_ID <= 206                                             0.043220
## State_ID_fct=4: State_ID <= 214                                             0.042510
## State_ID_fct=5: State_ID <= 220                                             0.049674
## State_ID_fct=6: State_ID <= 229                                             0.051160
## State_ID_fct=7: State_ID <= 272                                             0.044464
## State_ID_fct=8: State_ID > 272                                               Dropped
## DisAsDiff_fct=1: DisAsDiff <= 13554                                         0.039643
## DisAsDiff_fct=2: DisAsDiff <= 15670                                         0.034234
## DisAsDiff_fct=3: DisAsDiff <= 16661                                         0.035854
## DisAsDiff_fct=4: DisAsDiff <= 19822                                         0.025433
## DisAsDiff_fct=5: DisAsDiff > 19822                                           Dropped
## PRI_DISBURSED_AMOUNT_fct=1: PRI_DISBURSED_AMOUNT <= 218581                  0.038731
## PRI_DISBURSED_AMOUNT_fct=2: PRI_DISBURSED_AMOUNT > 218581                    Dropped
## PRI_OVERDUE_ACCTS_fct=1: PRI_OVERDUE_ACCTS <= 0                             0.028659
## PRI_OVERDUE_ACCTS_fct=2: PRI_OVERDUE_ACCTS > 0                               Dropped
## manufacturer_id_fct=1: manufacturer_id <= 210                               0.023889
## manufacturer_id_fct=2: manufacturer_id <= 221                               0.027798
## manufacturer_id_fct=3: manufacturer_id <= 228                               0.026050
## manufacturer_id_fct=4: manufacturer_id > 228                                 Dropped
## VoterID_flag_fct=1: VoterID_flag = 0                                        0.019475
## VoterID_flag_fct=2: VoterID_flag = 1                                         Dropped
## ShareOverdue_fct=1: ShareOverdue <= -2                                      0.031829
## ShareOverdue_fct=2: ShareOverdue <= -1                                      0.023251
## ShareOverdue_fct=3: ShareOverdue > -1                                        Dropped
## PRI_ACTIVE_ACCTS_fct=1: PRI_ACTIVE_ACCTS <= 0                               0.044260
## PRI_ACTIVE_ACCTS_fct=2: PRI_ACTIVE_ACCTS <= 1                               0.033428
## PRI_ACTIVE_ACCTS_fct=3: PRI_ACTIVE_ACCTS <= 3                               0.028821
## PRI_ACTIVE_ACCTS_fct=4: PRI_ACTIVE_ACCTS > 3                                 Dropped
## Day_fct=1: Day <= 28                                                        0.021306
## Day_fct=2: Day <= 30                                                        0.026022
## Day_fct=3: Day > 30                                                          Dropped
## NO_OF_INQUIRIES_fct=1: NO_OF_INQUIRIES <= 0                                 0.017019
## NO_OF_INQUIRIES_fct=2: NO_OF_INQUIRIES > 0                                   Dropped
## DELINQUENT_ACCTS_IN_LAST_SIX_M_fct=1: DELINQUENT_ACCTS_IN_LAST_SIX_M <= 0   0.024440
## DELINQUENT_ACCTS_IN_LAST_SIX_M_fct=2: DELINQUENT_ACCTS_IN_LAST_SIX_M > 0     Dropped
## Qrt_fct=1: Qrt = 3                                                          0.011791
## Qrt_fct=2: Qrt = 4                                                           Dropped
## YearsOnLoan_fct=1: YearsOnLoan <= 22.8918                                   0.030618
## YearsOnLoan_fct=2: YearsOnLoan <= 28.8496                                   0.026590
## YearsOnLoan_fct=3: YearsOnLoan <= 38.8321                                   0.025879
## YearsOnLoan_fct=4: YearsOnLoan <= 51.8208                                   0.026471
## YearsOnLoan_fct=5: YearsOnLoan > 51.8208                                     Dropped
## Employment_Type_fct=1: Employment_Type = 203                                0.012252
## Employment_Type_fct=2: Employment_Type = 215                                0.034675
## Employment_Type_fct=3: Employment_Type = 227                                 Dropped
## PRIMARY_INSTAL_AMT_fct=1: PRIMARY_INSTAL_AMT <= 1564                        0.032539
## PRIMARY_INSTAL_AMT_fct=2: PRIMARY_INSTAL_AMT <= 2832                        0.037537
## PRIMARY_INSTAL_AMT_fct=3: PRIMARY_INSTAL_AMT <= 5033                        0.038747
## PRIMARY_INSTAL_AMT_fct=4: PRIMARY_INSTAL_AMT <= 25326                       0.032532
## PRIMARY_INSTAL_AMT_fct=5: PRIMARY_INSTAL_AMT > 25326                         Dropped
## asset_cost_fct=1: asset_cost <= 60098                                       0.045737
## asset_cost_fct=2: asset_cost <= 70561                                       0.029595
## asset_cost_fct=3: asset_cost <= 85738                                       0.022427
## asset_cost_fct=4: asset_cost > 85738                                         Dropped
## SEC_OverdueShareSec_fct=1: SEC_OverdueShareSec <= 0                         0.093246
## SEC_OverdueShareSec_fct=11: SEC_OverdueShareSec Is Null                     0.102431
## SEC_OverdueShareSec_fct=8: SEC_OverdueShareSec <= 0.2                       0.245402
## SEC_OverdueShareSec_fct=9: SEC_OverdueShareSec <= 1                          Dropped
## SEC_CURRENT_BALANCE_fct=1: SEC_CURRENT_BALANCE <= 0                         0.077865
## SEC_CURRENT_BALANCE_fct=10: SEC_CURRENT_BALANCE > 0                          Dropped
## Passport_flag_fct=1: Passport_flag = 0                                      0.137099
## Passport_flag_fct=2: Passport_flag = 1                                       Dropped
## Driving_flag_fct=1: Driving_flag = 0                                        0.038427
## Driving_flag_fct=2: Driving_flag = 1                                         Dropped
## SEC_INSTAL_AMT_fct=1: SEC_INSTAL_AMT <= 0                                   0.077767
## SEC_INSTAL_AMT_fct=10: SEC_INSTAL_AMT > 0                                    Dropped
## PAN_flag_fct=1: PAN_flag = 0                                                0.023009
## PAN_flag_fct=2: PAN_flag = 1                                                 Dropped
## SEC_OVERDUE_ACCTS_fct=1: SEC_OVERDUE_ACCTS <= 0                              Dropped
## SEC_OVERDUE_ACCTS_fct=10: SEC_OVERDUE_ACCTS > 0                              Dropped
##                                                                           z value
## (Intercept)                                                                 3.696
## Employee_code_ID_fct=1: Employee_code_ID <= 134                            17.934
## Employee_code_ID_fct=10: Employee_code_ID <= 242                           -2.539
## Employee_code_ID_fct=11: Employee_code_ID <= 254                           -3.429
## Employee_code_ID_fct=12: Employee_code_ID <= 270                           -4.829
## Employee_code_ID_fct=13: Employee_code_ID <= 289                           -7.986
## Employee_code_ID_fct=14: Employee_code_ID <= 319                           -9.007
## Employee_code_ID_fct=15: Employee_code_ID > 319                           -15.050
## Employee_code_ID_fct=2: Employee_code_ID <= 153                            14.826
## Employee_code_ID_fct=3: Employee_code_ID <= 170                            13.451
## Employee_code_ID_fct=4: Employee_code_ID <= 179                             8.480
## Employee_code_ID_fct=5: Employee_code_ID <= 188                             7.909
## Employee_code_ID_fct=6: Employee_code_ID <= 199                             5.393
## Employee_code_ID_fct=7: Employee_code_ID <= 211                             3.221
## Employee_code_ID_fct=8: Employee_code_ID <= 221                             0.953
## Employee_code_ID_fct=9: Employee_code_ID <= 233                           Dropped
## Current_pincode_ID_fct=1: Current_pincode_ID <= 143                        22.072
## Current_pincode_ID_fct=10: Current_pincode_ID <= 257                       -7.381
## Current_pincode_ID_fct=11: Current_pincode_ID <= 291                      -11.568
## Current_pincode_ID_fct=12: Current_pincode_ID > 291                       -15.118
## Current_pincode_ID_fct=2: Current_pincode_ID <= 158                        20.306
## Current_pincode_ID_fct=3: Current_pincode_ID <= 174                        20.561
## Current_pincode_ID_fct=4: Current_pincode_ID <= 188                        18.122
## Current_pincode_ID_fct=5: Current_pincode_ID <= 201                        16.308
## Current_pincode_ID_fct=6: Current_pincode_ID <= 212                        12.457
## Current_pincode_ID_fct=7: Current_pincode_ID <= 219                         5.490
## Current_pincode_ID_fct=8: Current_pincode_ID <= 225                         3.629
## Current_pincode_ID_fct=9: Current_pincode_ID <= 238                       Dropped
## supplier_id_fct=1: supplier_id <= 133                                       7.874
## supplier_id_fct=10: supplier_id <= 253                                     -2.930
## supplier_id_fct=11: supplier_id <= 275                                     -3.924
## supplier_id_fct=12: supplier_id <= 314                                     -5.124
## supplier_id_fct=13: supplier_id > 314                                      -7.569
## supplier_id_fct=2: supplier_id <= 149                                       9.161
## supplier_id_fct=3: supplier_id <= 165                                       7.430
## supplier_id_fct=4: supplier_id <= 178                                       6.435
## supplier_id_fct=5: supplier_id <= 196                                       5.903
## supplier_id_fct=6: supplier_id <= 206                                       4.511
## supplier_id_fct=7: supplier_id <= 214                                       3.417
## supplier_id_fct=8: supplier_id <= 225                                       2.509
## supplier_id_fct=9: supplier_id <= 240                                     Dropped
## branch_id_fct=1: branch_id <= 153                                         -10.972
## branch_id_fct=10: branch_id <= 284                                          1.809
## branch_id_fct=11: branch_id > 284                                           1.272
## branch_id_fct=2: branch_id <= 174                                         -12.818
## branch_id_fct=3: branch_id <= 184                                         -12.090
## branch_id_fct=4: branch_id <= 198                                         -11.216
## branch_id_fct=5: branch_id <= 214                                          -6.546
## branch_id_fct=6: branch_id <= 222                                          -5.740
## branch_id_fct=7: branch_id <= 233                                          -3.047
## branch_id_fct=8: branch_id <= 261                                          -8.761
## branch_id_fct=9: branch_id <= 276                                         Dropped
## ltv_fct=1: ltv <= 55.63                                                    13.956
## ltv_fct=10: ltv <= 84.57                                                   -4.028
## ltv_fct=11: ltv <= 85                                                      -8.566
## ltv_fct=12: ltv <= 87.8                                                    -3.379
## ltv_fct=13: ltv <= 89.3                                                    -7.818
## ltv_fct=14: ltv > 89.3                                                     -9.308
## ltv_fct=2: ltv <= 62.22                                                    13.320
## ltv_fct=3: ltv <= 68.34                                                    10.661
## ltv_fct=4: ltv <= 72.9301                                                   8.081
## ltv_fct=5: ltv <= 74.31                                                     4.029
## ltv_fct=6: ltv <= 75                                                        2.689
## ltv_fct=7: ltv <= 77.39                                                     6.381
## ltv_fct=8: ltv <= 78.92                                                     2.209
## ltv_fct=9: ltv <= 83.34                                                   Dropped
## PERFORM_CNS_SCORE_fct=1: PERFORM_CNS_SCORE <= 0                            -2.204
## PERFORM_CNS_SCORE_fct=2: PERFORM_CNS_SCORE <= 18                           -1.035
## PERFORM_CNS_SCORE_fct=3: PERFORM_CNS_SCORE <= 441                          -3.491
## PERFORM_CNS_SCORE_fct=4: PERFORM_CNS_SCORE <= 643                          -2.249
## PERFORM_CNS_SCORE_fct=5: PERFORM_CNS_SCORE <= 738                          -1.927
## PERFORM_CNS_SCORE_fct=6: PERFORM_CNS_SCORE <= 824                           3.899
## PERFORM_CNS_SCORE_fct=7: PERFORM_CNS_SCORE > 824                          Dropped
## disbursed_amount_fct=1: disbursed_amount <= 39134                           0.823
## disbursed_amount_fct=2: disbursed_amount <= 43615                           1.992
## disbursed_amount_fct=3: disbursed_amount <= 48555                           2.796
## disbursed_amount_fct=4: disbursed_amount <= 51908                           3.874
## disbursed_amount_fct=5: disbursed_amount <= 55400                           2.737
## disbursed_amount_fct=6: disbursed_amount > 55400                          Dropped
## OutstandingNow_fct=1: OutstandingNow <= 44402                              -1.023
## OutstandingNow_fct=2: OutstandingNow <= 50314                              -2.244
## OutstandingNow_fct=3: OutstandingNow <= 171384                             -3.782
## OutstandingNow_fct=4: OutstandingNow <= 324324                             -1.999
## OutstandingNow_fct=5: OutstandingNow <= 746271                             -2.916
## OutstandingNow_fct=6: OutstandingNow > 746271                             Dropped
## PERFORM_CNS_SCORE_DESCRIPTION_fct=1: PERFORM_CNS_SCORE_DESCRIPTION <= 150   5.596
## PERFORM_CNS_SCORE_DESCRIPTION_fct=2: PERFORM_CNS_SCORE_DESCRIPTION <= 172   5.377
## PERFORM_CNS_SCORE_DESCRIPTION_fct=3: PERFORM_CNS_SCORE_DESCRIPTION <= 205   4.907
## PERFORM_CNS_SCORE_DESCRIPTION_fct=4: PERFORM_CNS_SCORE_DESCRIPTION <= 231   3.535
## PERFORM_CNS_SCORE_DESCRIPTION_fct=5: PERFORM_CNS_SCORE_DESCRIPTION <= 256   0.137
## PERFORM_CNS_SCORE_DESCRIPTION_fct=6: PERFORM_CNS_SCORE_DESCRIPTION > 256  Dropped
## State_ID_fct=1: State_ID <= 183                                            -1.208
## State_ID_fct=2: State_ID <= 188                                            -0.633
## State_ID_fct=3: State_ID <= 206                                            -0.050
## State_ID_fct=4: State_ID <= 214                                             1.872
## State_ID_fct=5: State_ID <= 220                                             2.030
## State_ID_fct=6: State_ID <= 229                                             1.250
## State_ID_fct=7: State_ID <= 272                                             1.905
## State_ID_fct=8: State_ID > 272                                            Dropped
## DisAsDiff_fct=1: DisAsDiff <= 13554                                        -3.285
## DisAsDiff_fct=2: DisAsDiff <= 15670                                        -2.402
## DisAsDiff_fct=3: DisAsDiff <= 16661                                         0.172
## DisAsDiff_fct=4: DisAsDiff <= 19822                                        -0.687
## DisAsDiff_fct=5: DisAsDiff > 19822                                        Dropped
## PRI_DISBURSED_AMOUNT_fct=1: PRI_DISBURSED_AMOUNT <= 218581                 -6.609
## PRI_DISBURSED_AMOUNT_fct=2: PRI_DISBURSED_AMOUNT > 218581                 Dropped
## PRI_OVERDUE_ACCTS_fct=1: PRI_OVERDUE_ACCTS <= 0                             5.037
## PRI_OVERDUE_ACCTS_fct=2: PRI_OVERDUE_ACCTS > 0                            Dropped
## manufacturer_id_fct=1: manufacturer_id <= 210                               5.159
## manufacturer_id_fct=2: manufacturer_id <= 221                              -0.127
## manufacturer_id_fct=3: manufacturer_id <= 228                               3.751
## manufacturer_id_fct=4: manufacturer_id > 228                              Dropped
## VoterID_flag_fct=1: VoterID_flag = 0                                        4.800
## VoterID_flag_fct=2: VoterID_flag = 1                                      Dropped
## ShareOverdue_fct=1: ShareOverdue <= -2                                      2.900
## ShareOverdue_fct=2: ShareOverdue <= -1                                      5.542
## ShareOverdue_fct=3: ShareOverdue > -1                                     Dropped
## PRI_ACTIVE_ACCTS_fct=1: PRI_ACTIVE_ACCTS <= 0                              -5.131
## PRI_ACTIVE_ACCTS_fct=2: PRI_ACTIVE_ACCTS <= 1                              -7.267
## PRI_ACTIVE_ACCTS_fct=3: PRI_ACTIVE_ACCTS <= 3                              -7.273
## PRI_ACTIVE_ACCTS_fct=4: PRI_ACTIVE_ACCTS > 3                              Dropped
## Day_fct=1: Day <= 28                                                       11.705
## Day_fct=2: Day <= 30                                                        4.930
## Day_fct=3: Day > 30                                                       Dropped
## NO_OF_INQUIRIES_fct=1: NO_OF_INQUIRIES <= 0                                16.055
## NO_OF_INQUIRIES_fct=2: NO_OF_INQUIRIES > 0                                Dropped
## DELINQUENT_ACCTS_IN_LAST_SIX_M_fct=1: DELINQUENT_ACCTS_IN_LAST_SIX_M <= 0  11.536
## DELINQUENT_ACCTS_IN_LAST_SIX_M_fct=2: DELINQUENT_ACCTS_IN_LAST_SIX_M > 0  Dropped
## Qrt_fct=1: Qrt = 3                                                         18.577
## Qrt_fct=2: Qrt = 4                                                        Dropped
## YearsOnLoan_fct=1: YearsOnLoan <= 22.8918                                 -12.545
## YearsOnLoan_fct=2: YearsOnLoan <= 28.8496                                 -10.468
## YearsOnLoan_fct=3: YearsOnLoan <= 38.8321                                  -6.953
## YearsOnLoan_fct=4: YearsOnLoan <= 51.8208                                  -3.513
## YearsOnLoan_fct=5: YearsOnLoan > 51.8208                                  Dropped
## Employment_Type_fct=1: Employment_Type = 203                               12.443
## Employment_Type_fct=2: Employment_Type = 215                                6.247
## Employment_Type_fct=3: Employment_Type = 227                              Dropped
## PRIMARY_INSTAL_AMT_fct=1: PRIMARY_INSTAL_AMT <= 1564                       -0.552
## PRIMARY_INSTAL_AMT_fct=2: PRIMARY_INSTAL_AMT <= 2832                       -6.820
## PRIMARY_INSTAL_AMT_fct=3: PRIMARY_INSTAL_AMT <= 5033                       -3.723
## PRIMARY_INSTAL_AMT_fct=4: PRIMARY_INSTAL_AMT <= 25326                      -4.416
## PRIMARY_INSTAL_AMT_fct=5: PRIMARY_INSTAL_AMT > 25326                      Dropped
## asset_cost_fct=1: asset_cost <= 60098                                       1.587
## asset_cost_fct=2: asset_cost <= 70561                                       5.047
## asset_cost_fct=3: asset_cost <= 85738                                       7.920
## asset_cost_fct=4: asset_cost > 85738                                      Dropped
## SEC_OverdueShareSec_fct=1: SEC_OverdueShareSec <= 0                        -0.642
## SEC_OverdueShareSec_fct=11: SEC_OverdueShareSec Is Null                    -0.734
## SEC_OverdueShareSec_fct=8: SEC_OverdueShareSec <= 0.2                       1.167
## SEC_OverdueShareSec_fct=9: SEC_OverdueShareSec <= 1                       Dropped
## SEC_CURRENT_BALANCE_fct=1: SEC_CURRENT_BALANCE <= 0                        -0.307
## SEC_CURRENT_BALANCE_fct=10: SEC_CURRENT_BALANCE > 0                       Dropped
## Passport_flag_fct=1: Passport_flag = 0                                     -1.371
## Passport_flag_fct=2: Passport_flag = 1                                    Dropped
## Driving_flag_fct=1: Driving_flag = 0                                        0.414
## Driving_flag_fct=2: Driving_flag = 1                                      Dropped
## SEC_INSTAL_AMT_fct=1: SEC_INSTAL_AMT <= 0                                   0.783
## SEC_INSTAL_AMT_fct=10: SEC_INSTAL_AMT > 0                                 Dropped
## PAN_flag_fct=1: PAN_flag = 0                                               -1.998
## PAN_flag_fct=2: PAN_flag = 1                                              Dropped
## SEC_OVERDUE_ACCTS_fct=1: SEC_OVERDUE_ACCTS <= 0                           Dropped
## SEC_OVERDUE_ACCTS_fct=10: SEC_OVERDUE_ACCTS > 0                           Dropped
##                                                                                       Pr(>|z|)
## (Intercept)                                                                           0.000219
## Employee_code_ID_fct=1: Employee_code_ID <= 134                           0.000000000000000222
## Employee_code_ID_fct=10: Employee_code_ID <= 242                                      0.011127
## Employee_code_ID_fct=11: Employee_code_ID <= 254                                      0.000605
## Employee_code_ID_fct=12: Employee_code_ID <= 270                          0.000001373400840832
## Employee_code_ID_fct=13: Employee_code_ID <= 289                          0.000000000000000222
## Employee_code_ID_fct=14: Employee_code_ID <= 319                          0.000000000000000222
## Employee_code_ID_fct=15: Employee_code_ID > 319                           0.000000000000000222
## Employee_code_ID_fct=2: Employee_code_ID <= 153                           0.000000000000000222
## Employee_code_ID_fct=3: Employee_code_ID <= 170                           0.000000000000000222
## Employee_code_ID_fct=4: Employee_code_ID <= 179                           0.000000000000000222
## Employee_code_ID_fct=5: Employee_code_ID <= 188                           0.000000000000000222
## Employee_code_ID_fct=6: Employee_code_ID <= 199                           0.000000069185031482
## Employee_code_ID_fct=7: Employee_code_ID <= 211                                       0.001279
## Employee_code_ID_fct=8: Employee_code_ID <= 221                                       0.340780
## Employee_code_ID_fct=9: Employee_code_ID <= 233                                        Dropped
## Current_pincode_ID_fct=1: Current_pincode_ID <= 143                       0.000000000000000222
## Current_pincode_ID_fct=10: Current_pincode_ID <= 257                      0.000000000000000222
## Current_pincode_ID_fct=11: Current_pincode_ID <= 291                      0.000000000000000222
## Current_pincode_ID_fct=12: Current_pincode_ID > 291                       0.000000000000000222
## Current_pincode_ID_fct=2: Current_pincode_ID <= 158                       0.000000000000000222
## Current_pincode_ID_fct=3: Current_pincode_ID <= 174                       0.000000000000000222
## Current_pincode_ID_fct=4: Current_pincode_ID <= 188                       0.000000000000000222
## Current_pincode_ID_fct=5: Current_pincode_ID <= 201                       0.000000000000000222
## Current_pincode_ID_fct=6: Current_pincode_ID <= 212                       0.000000000000000222
## Current_pincode_ID_fct=7: Current_pincode_ID <= 219                       0.000000040128663947
## Current_pincode_ID_fct=8: Current_pincode_ID <= 225                                   0.000285
## Current_pincode_ID_fct=9: Current_pincode_ID <= 238                                    Dropped
## supplier_id_fct=1: supplier_id <= 133                                     0.000000000000000222
## supplier_id_fct=10: supplier_id <= 253                                                0.003385
## supplier_id_fct=11: supplier_id <= 275                                    0.000086946671635557
## supplier_id_fct=12: supplier_id <= 314                                    0.000000299739931098
## supplier_id_fct=13: supplier_id > 314                                     0.000000000000000222
## supplier_id_fct=2: supplier_id <= 149                                     0.000000000000000222
## supplier_id_fct=3: supplier_id <= 165                                     0.000000000000000222
## supplier_id_fct=4: supplier_id <= 178                                     0.000000000123338895
## supplier_id_fct=5: supplier_id <= 196                                     0.000000003559513795
## supplier_id_fct=6: supplier_id <= 206                                     0.000006464945373930
## supplier_id_fct=7: supplier_id <= 214                                                 0.000633
## supplier_id_fct=8: supplier_id <= 225                                                 0.012114
## supplier_id_fct=9: supplier_id <= 240                                                  Dropped
## branch_id_fct=1: branch_id <= 153                                         0.000000000000000222
## branch_id_fct=10: branch_id <= 284                                                    0.070464
## branch_id_fct=11: branch_id > 284                                                     0.203439
## branch_id_fct=2: branch_id <= 174                                         0.000000000000000222
## branch_id_fct=3: branch_id <= 184                                         0.000000000000000222
## branch_id_fct=4: branch_id <= 198                                         0.000000000000000222
## branch_id_fct=5: branch_id <= 214                                         0.000000000059036109
## branch_id_fct=6: branch_id <= 222                                         0.000000009467683748
## branch_id_fct=7: branch_id <= 233                                                     0.002310
## branch_id_fct=8: branch_id <= 261                                         0.000000000000000222
## branch_id_fct=9: branch_id <= 276                                                      Dropped
## ltv_fct=1: ltv <= 55.63                                                   0.000000000000000222
## ltv_fct=10: ltv <= 84.57                                                  0.000056194942485766
## ltv_fct=11: ltv <= 85                                                     0.000000000000000222
## ltv_fct=12: ltv <= 87.8                                                               0.000728
## ltv_fct=13: ltv <= 89.3                                                   0.000000000000000222
## ltv_fct=14: ltv > 89.3                                                    0.000000000000000222
## ltv_fct=2: ltv <= 62.22                                                   0.000000000000000222
## ltv_fct=3: ltv <= 68.34                                                   0.000000000000000222
## ltv_fct=4: ltv <= 72.9301                                                 0.000000000000000222
## ltv_fct=5: ltv <= 74.31                                                   0.000056050266953989
## ltv_fct=6: ltv <= 75                                                                  0.007169
## ltv_fct=7: ltv <= 77.39                                                   0.000000000176378467
## ltv_fct=8: ltv <= 78.92                                                               0.027181
## ltv_fct=9: ltv <= 83.34                                                                Dropped
## PERFORM_CNS_SCORE_fct=1: PERFORM_CNS_SCORE <= 0                                       0.027551
## PERFORM_CNS_SCORE_fct=2: PERFORM_CNS_SCORE <= 18                                      0.300902
## PERFORM_CNS_SCORE_fct=3: PERFORM_CNS_SCORE <= 441                                     0.000481
## PERFORM_CNS_SCORE_fct=4: PERFORM_CNS_SCORE <= 643                                     0.024504
## PERFORM_CNS_SCORE_fct=5: PERFORM_CNS_SCORE <= 738                                     0.053931
## PERFORM_CNS_SCORE_fct=6: PERFORM_CNS_SCORE <= 824                         0.000096707880559599
## PERFORM_CNS_SCORE_fct=7: PERFORM_CNS_SCORE > 824                                       Dropped
## disbursed_amount_fct=1: disbursed_amount <= 39134                                     0.410269
## disbursed_amount_fct=2: disbursed_amount <= 43615                                     0.046391
## disbursed_amount_fct=3: disbursed_amount <= 48555                                     0.005176
## disbursed_amount_fct=4: disbursed_amount <= 51908                                     0.000107
## disbursed_amount_fct=5: disbursed_amount <= 55400                                     0.006200
## disbursed_amount_fct=6: disbursed_amount > 55400                                       Dropped
## OutstandingNow_fct=1: OutstandingNow <= 44402                                         0.306128
## OutstandingNow_fct=2: OutstandingNow <= 50314                                         0.024810
## OutstandingNow_fct=3: OutstandingNow <= 171384                                        0.000155
## OutstandingNow_fct=4: OutstandingNow <= 324324                                        0.045628
## OutstandingNow_fct=5: OutstandingNow <= 746271                                        0.003547
## OutstandingNow_fct=6: OutstandingNow > 746271                                          Dropped
## PERFORM_CNS_SCORE_DESCRIPTION_fct=1: PERFORM_CNS_SCORE_DESCRIPTION <= 150 0.000000021892954560
## PERFORM_CNS_SCORE_DESCRIPTION_fct=2: PERFORM_CNS_SCORE_DESCRIPTION <= 172 0.000000075610862016
## PERFORM_CNS_SCORE_DESCRIPTION_fct=3: PERFORM_CNS_SCORE_DESCRIPTION <= 205 0.000000924302139271
## PERFORM_CNS_SCORE_DESCRIPTION_fct=4: PERFORM_CNS_SCORE_DESCRIPTION <= 231             0.000407
## PERFORM_CNS_SCORE_DESCRIPTION_fct=5: PERFORM_CNS_SCORE_DESCRIPTION <= 256             0.891012
## PERFORM_CNS_SCORE_DESCRIPTION_fct=6: PERFORM_CNS_SCORE_DESCRIPTION > 256               Dropped
## State_ID_fct=1: State_ID <= 183                                                       0.227109
## State_ID_fct=2: State_ID <= 188                                                       0.526790
## State_ID_fct=3: State_ID <= 206                                                       0.960185
## State_ID_fct=4: State_ID <= 214                                                       0.061214
## State_ID_fct=5: State_ID <= 220                                                       0.042315
## State_ID_fct=6: State_ID <= 229                                                       0.211464
## State_ID_fct=7: State_ID <= 272                                                       0.056786
## State_ID_fct=8: State_ID > 272                                                         Dropped
## DisAsDiff_fct=1: DisAsDiff <= 13554                                                   0.001018
## DisAsDiff_fct=2: DisAsDiff <= 15670                                                   0.016306
## DisAsDiff_fct=3: DisAsDiff <= 16661                                                   0.863261
## DisAsDiff_fct=4: DisAsDiff <= 19822                                                   0.492117
## DisAsDiff_fct=5: DisAsDiff > 19822                                                     Dropped
## PRI_DISBURSED_AMOUNT_fct=1: PRI_DISBURSED_AMOUNT <= 218581                0.000000000038574033
## PRI_DISBURSED_AMOUNT_fct=2: PRI_DISBURSED_AMOUNT > 218581                              Dropped
## PRI_OVERDUE_ACCTS_fct=1: PRI_OVERDUE_ACCTS <= 0                           0.000000472350173641
## PRI_OVERDUE_ACCTS_fct=2: PRI_OVERDUE_ACCTS > 0                                         Dropped
## manufacturer_id_fct=1: manufacturer_id <= 210                             0.000000248386704094
## manufacturer_id_fct=2: manufacturer_id <= 221                                         0.899275
## manufacturer_id_fct=3: manufacturer_id <= 228                                         0.000176
## manufacturer_id_fct=4: manufacturer_id > 228                                           Dropped
## VoterID_flag_fct=1: VoterID_flag = 0                                      0.000001583545196970
## VoterID_flag_fct=2: VoterID_flag = 1                                                   Dropped
## ShareOverdue_fct=1: ShareOverdue <= -2                                                0.003735
## ShareOverdue_fct=2: ShareOverdue <= -1                                    0.000000029958245662
## ShareOverdue_fct=3: ShareOverdue > -1                                                  Dropped
## PRI_ACTIVE_ACCTS_fct=1: PRI_ACTIVE_ACCTS <= 0                             0.000000287604146276
## PRI_ACTIVE_ACCTS_fct=2: PRI_ACTIVE_ACCTS <= 1                             0.000000000000000222
## PRI_ACTIVE_ACCTS_fct=3: PRI_ACTIVE_ACCTS <= 3                             0.000000000000000222
## PRI_ACTIVE_ACCTS_fct=4: PRI_ACTIVE_ACCTS > 3                                           Dropped
## Day_fct=1: Day <= 28                                                      0.000000000000000222
## Day_fct=2: Day <= 30                                                      0.000000822043672688
## Day_fct=3: Day > 30                                                                    Dropped
## NO_OF_INQUIRIES_fct=1: NO_OF_INQUIRIES <= 0                               0.000000000000000222
## NO_OF_INQUIRIES_fct=2: NO_OF_INQUIRIES > 0                                             Dropped
## DELINQUENT_ACCTS_IN_LAST_SIX_M_fct=1: DELINQUENT_ACCTS_IN_LAST_SIX_M <= 0 0.000000000000000222
## DELINQUENT_ACCTS_IN_LAST_SIX_M_fct=2: DELINQUENT_ACCTS_IN_LAST_SIX_M > 0               Dropped
## Qrt_fct=1: Qrt = 3                                                        0.000000000000000222
## Qrt_fct=2: Qrt = 4                                                                     Dropped
## YearsOnLoan_fct=1: YearsOnLoan <= 22.8918                                 0.000000000000000222
## YearsOnLoan_fct=2: YearsOnLoan <= 28.8496                                 0.000000000000000222
## YearsOnLoan_fct=3: YearsOnLoan <= 38.8321                                 0.000000000003574030
## YearsOnLoan_fct=4: YearsOnLoan <= 51.8208                                             0.000444
## YearsOnLoan_fct=5: YearsOnLoan > 51.8208                                               Dropped
## Employment_Type_fct=1: Employment_Type = 203                              0.000000000000000222
## Employment_Type_fct=2: Employment_Type = 215                              0.000000000417368806
## Employment_Type_fct=3: Employment_Type = 227                                           Dropped
## PRIMARY_INSTAL_AMT_fct=1: PRIMARY_INSTAL_AMT <= 1564                                  0.581034
## PRIMARY_INSTAL_AMT_fct=2: PRIMARY_INSTAL_AMT <= 2832                      0.000000000009078738
## PRIMARY_INSTAL_AMT_fct=3: PRIMARY_INSTAL_AMT <= 5033                                  0.000197
## PRIMARY_INSTAL_AMT_fct=4: PRIMARY_INSTAL_AMT <= 25326                     0.000010036155548399
## PRIMARY_INSTAL_AMT_fct=5: PRIMARY_INSTAL_AMT > 25326                                   Dropped
## asset_cost_fct=1: asset_cost <= 60098                                                 0.112444
## asset_cost_fct=2: asset_cost <= 70561                                     0.000000449390024304
## asset_cost_fct=3: asset_cost <= 85738                                     0.000000000000000222
## asset_cost_fct=4: asset_cost > 85738                                                   Dropped
## SEC_OverdueShareSec_fct=1: SEC_OverdueShareSec <= 0                                   0.520648
## SEC_OverdueShareSec_fct=11: SEC_OverdueShareSec Is Null                               0.463016
## SEC_OverdueShareSec_fct=8: SEC_OverdueShareSec <= 0.2                                 0.243122
## SEC_OverdueShareSec_fct=9: SEC_OverdueShareSec <= 1                                    Dropped
## SEC_CURRENT_BALANCE_fct=1: SEC_CURRENT_BALANCE <= 0                                   0.758758
## SEC_CURRENT_BALANCE_fct=10: SEC_CURRENT_BALANCE > 0                                    Dropped
## Passport_flag_fct=1: Passport_flag = 0                                                0.170376
## Passport_flag_fct=2: Passport_flag = 1                                                 Dropped
## Driving_flag_fct=1: Driving_flag = 0                                                  0.678510
## Driving_flag_fct=2: Driving_flag = 1                                                   Dropped
## SEC_INSTAL_AMT_fct=1: SEC_INSTAL_AMT <= 0                                             0.433359
## SEC_INSTAL_AMT_fct=10: SEC_INSTAL_AMT > 0                                              Dropped
## PAN_flag_fct=1: PAN_flag = 0                                                          0.045763
## PAN_flag_fct=2: PAN_flag = 1                                                           Dropped
## SEC_OVERDUE_ACCTS_fct=1: SEC_OVERDUE_ACCTS <= 0                                        Dropped
## SEC_OVERDUE_ACCTS_fct=10: SEC_OVERDUE_ACCTS > 0                                        Dropped
##                                                                              
## (Intercept)                                                               ***
## Employee_code_ID_fct=1: Employee_code_ID <= 134                           ***
## Employee_code_ID_fct=10: Employee_code_ID <= 242                          *  
## Employee_code_ID_fct=11: Employee_code_ID <= 254                          ***
## Employee_code_ID_fct=12: Employee_code_ID <= 270                          ***
## Employee_code_ID_fct=13: Employee_code_ID <= 289                          ***
## Employee_code_ID_fct=14: Employee_code_ID <= 319                          ***
## Employee_code_ID_fct=15: Employee_code_ID > 319                           ***
## Employee_code_ID_fct=2: Employee_code_ID <= 153                           ***
## Employee_code_ID_fct=3: Employee_code_ID <= 170                           ***
## Employee_code_ID_fct=4: Employee_code_ID <= 179                           ***
## Employee_code_ID_fct=5: Employee_code_ID <= 188                           ***
## Employee_code_ID_fct=6: Employee_code_ID <= 199                           ***
## Employee_code_ID_fct=7: Employee_code_ID <= 211                           ** 
## Employee_code_ID_fct=8: Employee_code_ID <= 221                              
## Employee_code_ID_fct=9: Employee_code_ID <= 233                              
## Current_pincode_ID_fct=1: Current_pincode_ID <= 143                       ***
## Current_pincode_ID_fct=10: Current_pincode_ID <= 257                      ***
## Current_pincode_ID_fct=11: Current_pincode_ID <= 291                      ***
## Current_pincode_ID_fct=12: Current_pincode_ID > 291                       ***
## Current_pincode_ID_fct=2: Current_pincode_ID <= 158                       ***
## Current_pincode_ID_fct=3: Current_pincode_ID <= 174                       ***
## Current_pincode_ID_fct=4: Current_pincode_ID <= 188                       ***
## Current_pincode_ID_fct=5: Current_pincode_ID <= 201                       ***
## Current_pincode_ID_fct=6: Current_pincode_ID <= 212                       ***
## Current_pincode_ID_fct=7: Current_pincode_ID <= 219                       ***
## Current_pincode_ID_fct=8: Current_pincode_ID <= 225                       ***
## Current_pincode_ID_fct=9: Current_pincode_ID <= 238                          
## supplier_id_fct=1: supplier_id <= 133                                     ***
## supplier_id_fct=10: supplier_id <= 253                                    ** 
## supplier_id_fct=11: supplier_id <= 275                                    ***
## supplier_id_fct=12: supplier_id <= 314                                    ***
## supplier_id_fct=13: supplier_id > 314                                     ***
## supplier_id_fct=2: supplier_id <= 149                                     ***
## supplier_id_fct=3: supplier_id <= 165                                     ***
## supplier_id_fct=4: supplier_id <= 178                                     ***
## supplier_id_fct=5: supplier_id <= 196                                     ***
## supplier_id_fct=6: supplier_id <= 206                                     ***
## supplier_id_fct=7: supplier_id <= 214                                     ***
## supplier_id_fct=8: supplier_id <= 225                                     *  
## supplier_id_fct=9: supplier_id <= 240                                        
## branch_id_fct=1: branch_id <= 153                                         ***
## branch_id_fct=10: branch_id <= 284                                        .  
## branch_id_fct=11: branch_id > 284                                            
## branch_id_fct=2: branch_id <= 174                                         ***
## branch_id_fct=3: branch_id <= 184                                         ***
## branch_id_fct=4: branch_id <= 198                                         ***
## branch_id_fct=5: branch_id <= 214                                         ***
## branch_id_fct=6: branch_id <= 222                                         ***
## branch_id_fct=7: branch_id <= 233                                         ** 
## branch_id_fct=8: branch_id <= 261                                         ***
## branch_id_fct=9: branch_id <= 276                                            
## ltv_fct=1: ltv <= 55.63                                                   ***
## ltv_fct=10: ltv <= 84.57                                                  ***
## ltv_fct=11: ltv <= 85                                                     ***
## ltv_fct=12: ltv <= 87.8                                                   ***
## ltv_fct=13: ltv <= 89.3                                                   ***
## ltv_fct=14: ltv > 89.3                                                    ***
## ltv_fct=2: ltv <= 62.22                                                   ***
## ltv_fct=3: ltv <= 68.34                                                   ***
## ltv_fct=4: ltv <= 72.9301                                                 ***
## ltv_fct=5: ltv <= 74.31                                                   ***
## ltv_fct=6: ltv <= 75                                                      ** 
## ltv_fct=7: ltv <= 77.39                                                   ***
## ltv_fct=8: ltv <= 78.92                                                   *  
## ltv_fct=9: ltv <= 83.34                                                      
## PERFORM_CNS_SCORE_fct=1: PERFORM_CNS_SCORE <= 0                           *  
## PERFORM_CNS_SCORE_fct=2: PERFORM_CNS_SCORE <= 18                             
## PERFORM_CNS_SCORE_fct=3: PERFORM_CNS_SCORE <= 441                         ***
## PERFORM_CNS_SCORE_fct=4: PERFORM_CNS_SCORE <= 643                         *  
## PERFORM_CNS_SCORE_fct=5: PERFORM_CNS_SCORE <= 738                         .  
## PERFORM_CNS_SCORE_fct=6: PERFORM_CNS_SCORE <= 824                         ***
## PERFORM_CNS_SCORE_fct=7: PERFORM_CNS_SCORE > 824                             
## disbursed_amount_fct=1: disbursed_amount <= 39134                            
## disbursed_amount_fct=2: disbursed_amount <= 43615                         *  
## disbursed_amount_fct=3: disbursed_amount <= 48555                         ** 
## disbursed_amount_fct=4: disbursed_amount <= 51908                         ***
## disbursed_amount_fct=5: disbursed_amount <= 55400                         ** 
## disbursed_amount_fct=6: disbursed_amount > 55400                             
## OutstandingNow_fct=1: OutstandingNow <= 44402                                
## OutstandingNow_fct=2: OutstandingNow <= 50314                             *  
## OutstandingNow_fct=3: OutstandingNow <= 171384                            ***
## OutstandingNow_fct=4: OutstandingNow <= 324324                            *  
## OutstandingNow_fct=5: OutstandingNow <= 746271                            ** 
## OutstandingNow_fct=6: OutstandingNow > 746271                                
## PERFORM_CNS_SCORE_DESCRIPTION_fct=1: PERFORM_CNS_SCORE_DESCRIPTION <= 150 ***
## PERFORM_CNS_SCORE_DESCRIPTION_fct=2: PERFORM_CNS_SCORE_DESCRIPTION <= 172 ***
## PERFORM_CNS_SCORE_DESCRIPTION_fct=3: PERFORM_CNS_SCORE_DESCRIPTION <= 205 ***
## PERFORM_CNS_SCORE_DESCRIPTION_fct=4: PERFORM_CNS_SCORE_DESCRIPTION <= 231 ***
## PERFORM_CNS_SCORE_DESCRIPTION_fct=5: PERFORM_CNS_SCORE_DESCRIPTION <= 256    
## PERFORM_CNS_SCORE_DESCRIPTION_fct=6: PERFORM_CNS_SCORE_DESCRIPTION > 256     
## State_ID_fct=1: State_ID <= 183                                              
## State_ID_fct=2: State_ID <= 188                                              
## State_ID_fct=3: State_ID <= 206                                              
## State_ID_fct=4: State_ID <= 214                                           .  
## State_ID_fct=5: State_ID <= 220                                           *  
## State_ID_fct=6: State_ID <= 229                                              
## State_ID_fct=7: State_ID <= 272                                           .  
## State_ID_fct=8: State_ID > 272                                               
## DisAsDiff_fct=1: DisAsDiff <= 13554                                       ** 
## DisAsDiff_fct=2: DisAsDiff <= 15670                                       *  
## DisAsDiff_fct=3: DisAsDiff <= 16661                                          
## DisAsDiff_fct=4: DisAsDiff <= 19822                                          
## DisAsDiff_fct=5: DisAsDiff > 19822                                           
## PRI_DISBURSED_AMOUNT_fct=1: PRI_DISBURSED_AMOUNT <= 218581                ***
## PRI_DISBURSED_AMOUNT_fct=2: PRI_DISBURSED_AMOUNT > 218581                    
## PRI_OVERDUE_ACCTS_fct=1: PRI_OVERDUE_ACCTS <= 0                           ***
## PRI_OVERDUE_ACCTS_fct=2: PRI_OVERDUE_ACCTS > 0                               
## manufacturer_id_fct=1: manufacturer_id <= 210                             ***
## manufacturer_id_fct=2: manufacturer_id <= 221                                
## manufacturer_id_fct=3: manufacturer_id <= 228                             ***
## manufacturer_id_fct=4: manufacturer_id > 228                                 
## VoterID_flag_fct=1: VoterID_flag = 0                                      ***
## VoterID_flag_fct=2: VoterID_flag = 1                                         
## ShareOverdue_fct=1: ShareOverdue <= -2                                    ** 
## ShareOverdue_fct=2: ShareOverdue <= -1                                    ***
## ShareOverdue_fct=3: ShareOverdue > -1                                        
## PRI_ACTIVE_ACCTS_fct=1: PRI_ACTIVE_ACCTS <= 0                             ***
## PRI_ACTIVE_ACCTS_fct=2: PRI_ACTIVE_ACCTS <= 1                             ***
## PRI_ACTIVE_ACCTS_fct=3: PRI_ACTIVE_ACCTS <= 3                             ***
## PRI_ACTIVE_ACCTS_fct=4: PRI_ACTIVE_ACCTS > 3                                 
## Day_fct=1: Day <= 28                                                      ***
## Day_fct=2: Day <= 30                                                      ***
## Day_fct=3: Day > 30                                                          
## NO_OF_INQUIRIES_fct=1: NO_OF_INQUIRIES <= 0                               ***
## NO_OF_INQUIRIES_fct=2: NO_OF_INQUIRIES > 0                                   
## DELINQUENT_ACCTS_IN_LAST_SIX_M_fct=1: DELINQUENT_ACCTS_IN_LAST_SIX_M <= 0 ***
## DELINQUENT_ACCTS_IN_LAST_SIX_M_fct=2: DELINQUENT_ACCTS_IN_LAST_SIX_M > 0     
## Qrt_fct=1: Qrt = 3                                                        ***
## Qrt_fct=2: Qrt = 4                                                           
## YearsOnLoan_fct=1: YearsOnLoan <= 22.8918                                 ***
## YearsOnLoan_fct=2: YearsOnLoan <= 28.8496                                 ***
## YearsOnLoan_fct=3: YearsOnLoan <= 38.8321                                 ***
## YearsOnLoan_fct=4: YearsOnLoan <= 51.8208                                 ***
## YearsOnLoan_fct=5: YearsOnLoan > 51.8208                                     
## Employment_Type_fct=1: Employment_Type = 203                              ***
## Employment_Type_fct=2: Employment_Type = 215                              ***
## Employment_Type_fct=3: Employment_Type = 227                                 
## PRIMARY_INSTAL_AMT_fct=1: PRIMARY_INSTAL_AMT <= 1564                         
## PRIMARY_INSTAL_AMT_fct=2: PRIMARY_INSTAL_AMT <= 2832                      ***
## PRIMARY_INSTAL_AMT_fct=3: PRIMARY_INSTAL_AMT <= 5033                      ***
## PRIMARY_INSTAL_AMT_fct=4: PRIMARY_INSTAL_AMT <= 25326                     ***
## PRIMARY_INSTAL_AMT_fct=5: PRIMARY_INSTAL_AMT > 25326                         
## asset_cost_fct=1: asset_cost <= 60098                                        
## asset_cost_fct=2: asset_cost <= 70561                                     ***
## asset_cost_fct=3: asset_cost <= 85738                                     ***
## asset_cost_fct=4: asset_cost > 85738                                         
## SEC_OverdueShareSec_fct=1: SEC_OverdueShareSec <= 0                          
## SEC_OverdueShareSec_fct=11: SEC_OverdueShareSec Is Null                      
## SEC_OverdueShareSec_fct=8: SEC_OverdueShareSec <= 0.2                        
## SEC_OverdueShareSec_fct=9: SEC_OverdueShareSec <= 1                          
## SEC_CURRENT_BALANCE_fct=1: SEC_CURRENT_BALANCE <= 0                          
## SEC_CURRENT_BALANCE_fct=10: SEC_CURRENT_BALANCE > 0                          
## Passport_flag_fct=1: Passport_flag = 0                                       
## Passport_flag_fct=2: Passport_flag = 1                                       
## Driving_flag_fct=1: Driving_flag = 0                                         
## Driving_flag_fct=2: Driving_flag = 1                                         
## SEC_INSTAL_AMT_fct=1: SEC_INSTAL_AMT <= 0                                    
## SEC_INSTAL_AMT_fct=10: SEC_INSTAL_AMT > 0                                    
## PAN_flag_fct=1: PAN_flag = 0                                              *  
## PAN_flag_fct=2: PAN_flag = 1                                                 
## SEC_OVERDUE_ACCTS_fct=1: SEC_OVERDUE_ACCTS <= 0                              
## SEC_OVERDUE_ACCTS_fct=10: SEC_OVERDUE_ACCTS > 0                              
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Condition number of final variance-covariance matrix: 1997.764 
## Number of iterations: 6
## Rows Read: 23316, Total Rows Processed: 23316, Total Chunk Time: 0.019 seconds
## 
##  Estimate a Classification Result on Testing Set
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   Bad  Good
##       Bad    284   338
##       Good  4770 17924
##                                              
##                Accuracy : 0.7809             
##                  95% CI : (0.7756, 0.7862)   
##     No Information Rate : 0.7832             
##     P-Value [Acc > NIR] : 0.8069             
##                                              
##                   Kappa : 0.0552             
##  Mcnemar's Test P-Value : <0.0000000000000002
##                                              
##             Sensitivity : 0.05619            
##             Specificity : 0.98149            
##          Pos Pred Value : 0.45659            
##          Neg Pred Value : 0.78981            
##               Precision : 0.45659            
##                  Recall : 0.05619            
##                      F1 : 0.10007            
##              Prevalence : 0.21676            
##          Detection Rate : 0.01218            
##    Detection Prevalence : 0.02668            
##       Balanced Accuracy : 0.51884            
##                                              
##        'Positive' Class : Bad                
## 
## 
##  Estimate a Gini Coefficient on Testing Set = 31.79%
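
The headline statistics in the output above follow directly from the four cell counts of the confusion matrix, with 'Bad' as the positive class, as the output states. As a worked check using only the printed counts:

\[ \displaystyle \large Sensitivity = \frac{284}{284 + 4770} \approx 0.05619 \hspace{.3 in} Specificity = \frac{17924}{17924 + 338} \approx 0.98149 \\ \displaystyle \large Accuracy = \frac{284 + 17924}{23316} \approx 0.7809 \hspace{.3 in} No \ Information \ Rate = \frac{338 + 17924}{23316} \approx 0.7832 \]

The no-information rate is the share of the majority class ('Good'). Accuracy falls slightly below it, which is consistent with the large P-Value [Acc > NIR] and the Kappa close to zero: the model flags very few of the actual 'Bad' cases.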

Receiver Operating Characteristic Curve

The receiver operating characteristic (ROC) curve, which in credit scoring is often treated interchangeably with the Lorenz curve, is a graphical plot that illustrates the diagnostic ability of a binary classifier as its discrimination threshold is varied.
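
A Gini coefficient is commonly derived from the area under the ROC curve (AUC) as Gini = 2 × AUC − 1. The report does not show how its own Gini value was computed, so the following base R sketch is an assumption rather than the author's code; the scores and labels are toy values and `auc_from_ranks` is a hypothetical helper name.

```r
# Toy data for illustration only (not taken from the report).
score   <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2)  # higher = more likely 'Good'
is_good <- c(TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE)

# AUC via the rank-sum (Mann-Whitney) identity: the probability that a
# randomly chosen Good case scores higher than a randomly chosen Bad case.
auc_from_ranks <- function(score, is_positive) {
  ranks <- rank(score)               # ties receive average ranks
  n_pos <- sum(is_positive)
  n_neg <- sum(!is_positive)
  (sum(ranks[is_positive]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

auc  <- auc_from_ranks(score, is_good)
gini <- 2 * auc - 1
cat(sprintf("AUC = %.4f, Gini = %.2f%%\n", auc, 100 * gini))
# prints: AUC = 0.7333, Gini = 46.67%
```

If the Gini reported for the testing set follows the same relation, 31.79% would correspond to an AUC of (1 + 0.3179) / 2, or roughly 0.659.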

Kolmogorov–Smirnov Test

One of the statistics commonly used in credit scoring, as well as countless other disciplines, is the KS statistic. This was developed by two renowned Soviet mathematicians, A.N. Kolmogorov (1903-1987) and N.V. Smirnov (1900-1966).

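The K-S statistic of interest is the greatest vertical distance between two cumulative distribution functions; in credit scoring these are the cumulative score distributions of the Good and Bad populations. The treatment differs depending on whether one or two samples were used to generate the values.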
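
In the two-sample case relevant to credit scoring, the statistic can be taken as the largest gap between the empirical cumulative score distributions of the Bad and Good cases. The base R sketch below illustrates this on toy data; `ks_two_sample` is a hypothetical helper, and the result is cross-checked against `stats::ks.test()`.

```r
# Toy data for illustration only (not taken from the report).
score   <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2)
is_good <- c(TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE)

ks_two_sample <- function(score, is_good) {
  grid     <- sort(unique(score))                  # evaluate both CDFs at every observed score
  cdf_good <- stats::ecdf(score[is_good])(grid)
  cdf_bad  <- stats::ecdf(score[!is_good])(grid)
  gap      <- abs(cdf_bad - cdf_good)
  list(statistic = max(gap), at_score = grid[which.max(gap)])
}

ks <- ks_two_sample(score, is_good)
print(ks)

# Cross-check against the two-sample test shipped with base R.
ks_builtin <- unname(stats::ks.test(score[is_good], score[!is_good])$statistic)
stopifnot(isTRUE(all.equal(ks$statistic, ks_builtin)))
```

For these toy values the largest gap occurs at a score of 0.4, where two thirds of the Bad cases but only one fifth of the Good cases have been accumulated.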

Measures of separation of the Score distribution

The credit score is a numeric expression of creditworthiness. Commercial banks usually use it to support decisions on credit applications.

If a reliable odds estimate already exists, whether because the statistical technique provides it directly or because some algorithm was used, scaling can be done using the log-reference formula in Equation [5].

\[ \displaystyle \large c' = \frac{S × ln(D × G) \ - \ (S + I) × ln(D)}{ln(G)} \hspace{.5 in} [5] \\ \displaystyle \large i' = \frac{I} {ln(G)} \hspace{0.3 in} s' = c' + ln(D_{Orig}) × i'\]

where \(S\) is the reference score, \(D\) is the required Good/Bad odds at that score, \(I\) is the score increment, \(G\) the required odds increment, and \(D_{Orig}\) the odds provided by the model. An example for a reference odds of 16 to 1 at a score of 700, with odds doubling every 50 points, is provided below. The scaled score equating to 128 to 1 is then 850, calculated as:

\[ \displaystyle \large c' = \frac{700 × ln(16 × 2) \ - \ (700 + 50) × ln(16)}{ln(2)} = 500 \hspace{.5 in} [6] \\ \displaystyle \large i' = \frac{50} {ln(2)} = 72.13475 \hspace{0.1 in} s' = 500 + ln(128) × 72.13475 = 850\]
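
The same calculation can be sketched in base R. The function name `scale_score` and its argument names are hypothetical; the defaults reproduce the worked example (16 to 1 odds at a score of 700, odds doubling every 50 points).

```r
scale_score <- function(model_odds, ref_score = 700, ref_odds = 16,
                        increment = 50, odds_multiple = 2) {
  # Constant c' from Equation [5]
  const <- (ref_score * log(ref_odds * odds_multiple) -
              (ref_score + increment) * log(ref_odds)) / log(odds_multiple)
  # Increment i' from Equation [5]
  incr <- increment / log(odds_multiple)
  # Scaled score s' for the Good/Bad odds supplied by the model
  const + log(model_odds) * incr
}

scale_score(c(16, 32, 128))   # expected: 700 750 850

# A predicted probability of 'Good' can be converted to Good/Bad odds first.
p_good <- 0.8
scale_score(p_good / (1 - p_good))
```

Each doubling of the odds adds 50 points, so the score is linear in the log-odds produced by the model.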

A further validation method is to compare the scores of the ‘Good’ and ‘Bad’ classes using the divergence statistic. It coincides with the symmetrised Kullback-Leibler divergence (relative entropy) when both score distributions are normal with equal variances, and can be calculated using the formula:

\[ \displaystyle \large Divergence = \frac{(mean_G \ - \ mean_B)^2}{0.5 × (var_G \ + \ var_B)} \hspace{.5 in} [7] \]

where \(mean_G\), \(mean_B\), \(var_G\), and \(var_B\) are the means and variances of the scored Good and Bad populations respectively.

A large divergence value indicates that the Good and Bad score distributions are well separated.
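
Equation [7] can be evaluated directly from the scored populations. The sketch below uses toy data, and `divergence` is a hypothetical helper; it assumes the sample variance returned by `stats::var()` (denominator n − 1), which the report does not specify.

```r
# Toy data for illustration only (not taken from the report).
score   <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2)
is_good <- c(TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE)

divergence <- function(score, is_good) {
  good <- score[is_good]
  bad  <- score[!is_good]
  # Squared difference of the class means over the average class variance
  (mean(good) - mean(bad))^2 / (0.5 * (stats::var(good) + stats::var(bad)))
}

divergence(score, is_good)
```

The statistic is unit-free: rescaling all scores by a constant factor leaves it unchanged, so it can be compared across differently scaled scorecards.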

Training Models in the MicrosoftML package

John Mount has long wondered about the applicability of heterogeneous statistical methods for solving various classification problems. He poses two questions. First, which classification methods are most accurate in general, that is, which methods identify the correct class most of the time? Second, which classifiers behave most like each other in terms of the class probabilities that they assign to each of the target classes? Answers to these questions can be found on the website of his company, Win-Vector LLC.

The rxLogisticRegression() algorithm is used to predict the value of a categorical dependent variable from its relationship to one or more independent variables assumed to have a logistic distribution. The rxLogisticRegression learner can apply L1 and L2 regularization, which adjusts the weights so that the variables most useful for making predictions are retained. The model is fitted with the limited-memory BFGS (L-BFGS) optimizer, as the training log below indicates.
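
In generic form, a logistic regression with both penalties minimises the negative log-likelihood plus two penalty terms. The expression below is the textbook elastic-net objective, given as an assumption: \(\lambda_1\) and \(\lambda_2\) are taken to play the roles of the l1Weight and l2Weight arguments, and the exact scaling used inside MicrosoftML is not stated in this report and may differ.

\[ \displaystyle \large \min_{w} \ \ - \sum_{n} \ln p(y_n \mid x_n, w) \ + \ \lambda_1 \lVert w \rVert_1 \ + \ \lambda_2 \lVert w \rVert_2^2 \]

The \(\lambda_1\) term can drive individual weights to exactly zero, which is what performs variable selection, while the \(\lambda_2\) term shrinks all weights towards zero without removing them. In the training log of this report, the L1 penalty retained all 163 weights.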

## 
## 
## Generalized Linear Model with Regularized by the L1 and L2 penalties ...
set.seed(seed)

# tuneLength <- 10
# tuneGrid = data.frame(lasso = seq(from = .1, to = 1, length.out = tuneLength),
#                       ridge = seq(from = .1, to = 1, length.out = tuneLength))
# Optimization.df <- data.frame(matrix(nrow = tuneLength, ncol = tuneLength))
# system.time(
#   for (i in 1:tuneLength ) { # The L1 (Lasso) regularization
# 
#     for (j in 1:tuneLength ) { # The L2 (Ridge) regularization
#       rxLogisticRegressionFit <-
#         MicrosoftML::rxLogisticRegression(
#           formula = Y ~ .
#           , data = caret::upSample(X[inTrain, ], Y[inTrain],  yname = 'Y')  # Up-Sampling Imbalanced Data
#           , type = 'binary'
#           , l2Weight = tuneGrid[j, 'ridge']  # The L2 (Ridge) regularization weight
#           , l1Weight = tuneGrid[i, 'lasso']  # The L1 (Lasso) regularization weight
#           , normalize = 'no'                 # no normalization is performed
#           , reportProgress = 0
#           , verbose = 4 )
#       Optimization.df[i, j] <- summary(rxLogisticRegressionFit)$summary$AIC
# 
#     }
#   }
# )
#
# optRegularizations <- which(Optimization.df == min(Optimization.df), arr.ind = TRUE)
rxLogisticRegressionFit <- 
  MicrosoftML::rxLogisticRegression( formula = Y ~ .
    , data = caret::upSample(X[inTrain, ], Y[inTrain],  yname = 'Y')  # Up-Sampling Imbalanced Data
    , type = 'binary'
    # , l2Weight = 1 # tuneGrid[optRegularizations[2], 'ridge']       # The Ridge regularization weight
    # , l1Weight = 1 # tuneGrid[optRegularizations[1], 'lasso']       # The Lasso regularization weight
    , trainThreads = parallel::detectCores()                          # The number of threads to use in model
    , normalize = 'no'                                                # no normalization is performed
    , reportProgress = 0
    , verbose = 0 )
## Not adding a normalizer.
## Beginning processing data.
## Rows Read: 328562, Read Time: 0.001, Transform Time: 0
## Beginning processing data.
## Beginning processing data.
## Rows Read: 328562, Read Time: 0, Transform Time: 0
## Beginning processing data.
## LBFGS multi-threading will attempt to load dataset into memory. In case of out-of-memory issues, turn off multi-threading by setting trainThreads to 1.
## Beginning optimization
## num vars: 163
## improvement criterion: Mean Improvement
## L1 regularization selected 163 of 163 weights.
## Not training a calibrator because it is not needed.
## Elapsed time: 00:00:02.3866356
## Elapsed time: 00:00:00.1384473
## Beginning processing data.
## Rows Read: 23316, Read Time: 0, Transform Time: 0
## Beginning processing data.
## Elapsed time: 00:00:00.2378449
## Finished writing 23316 rows.
## Writing completed.
## 
##  Estimate a Classification Result on Testing Set
## Rows Read: 23316, Total Rows Processed: 23316, Total Chunk Time: 0.025 seconds
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   Bad  Good
##       Bad   3039  6817
##       Good  2015 11445
##                                              
##                Accuracy : 0.6212             
##                  95% CI : (0.6149, 0.6274)   
##     No Information Rate : 0.7832             
##     P-Value [Acc > NIR] : 1                  
##                                              
##                   Kappa : 0.1697             
##  Mcnemar's Test P-Value : <0.0000000000000002
##                                              
##             Sensitivity : 0.6013             
##             Specificity : 0.6267             
##          Pos Pred Value : 0.3083             
##          Neg Pred Value : 0.8503             
##               Precision : 0.3083             
##                  Recall : 0.6013             
##                      F1 : 0.4076             
##              Prevalence : 0.2168             
##          Detection Rate : 0.1303             
##    Detection Prevalence : 0.4227             
##       Balanced Accuracy : 0.6140             
##                                              
##        'Positive' Class : Bad                
## 
## Measure running time of `Generalized Linear Model` code =
## Time difference of 3.74208 secs
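
The two confusion matrices printed so far can be compared on metrics that are less sensitive to class imbalance than accuracy. The base R sketch below recomputes precision, recall, F1 and balanced accuracy from the printed counts; `summarise_cm` and the row labels `earlier` and `upsampled` are hypothetical names chosen for this illustration.

```r
# Counts copied from the two confusion matrices printed in this report;
# rows are predictions, columns are the reference classes.
labels <- list(Prediction = c("Bad", "Good"), Reference = c("Bad", "Good"))
cm_earlier   <- matrix(c(284, 4770, 338, 17924),   nrow = 2, dimnames = labels)
cm_upsampled <- matrix(c(3039, 2015, 6817, 11445), nrow = 2, dimnames = labels)

summarise_cm <- function(cm, positive = "Bad") {
  negative <- setdiff(colnames(cm), positive)
  tp <- cm[positive, positive]   # predicted Bad, actually Bad
  fn <- cm[negative, positive]   # predicted Good, actually Bad
  fp <- cm[positive, negative]   # predicted Bad, actually Good
  tn <- cm[negative, negative]   # predicted Good, actually Good
  recall      <- tp / (tp + fn)
  specificity <- tn / (tn + fp)
  precision   <- tp / (tp + fp)
  c(precision = precision,
    recall = recall,
    f1 = 2 * precision * recall / (precision + recall),
    balanced_accuracy = (recall + specificity) / 2)
}

res <- rbind(earlier = summarise_cm(cm_earlier),
             upsampled = summarise_cm(cm_upsampled))
round(res, 4)
```

The recomputed values agree with the printed statistics: the model trained on up-sampled data gives up overall accuracy (0.6212 against 0.7809) but raises recall on the 'Bad' class from about 0.06 to about 0.60, and F1 from about 0.10 to about 0.41.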

The rxFastTrees() algorithm is a high-performing, state-of-the-art, scalable boosted decision tree learner that implements FastRank, an efficient implementation of the MART gradient boosting algorithm. MART learns an ensemble of regression trees, each of which is a decision tree with scalar values in its leaves. For binary classification, the output is converted to a probability by using some form of calibration.

## 
## 
## Fast Trees is an Gradient Tree Boosting Algorith (GTB) ...
##    user  system elapsed 
##    0.14    0.00    2.67
## Beginning read for block: 1
## Rows Read: 23316, Read Time: 0.004, Transform Time: 0
## Beginning read for block: 2
## No rows remaining. Finished reading data set. 
## Elapsed time: 00:00:00.3513291
## Finished writing 23316 rows.
## Writing completed.
## 
##  Estimate a Classification Result on Testing Set
## Rows Read: 23316, Total Rows Processed: 23316, Total Chunk Time: 0.031 seconds
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   Bad  Good
##       Bad   3019  6681
##       Good  2035 11581
##                                              
##                Accuracy : 0.6262             
##                  95% CI : (0.6199, 0.6324)   
##     No Information Rate : 0.7832             
##     P-Value [Acc > NIR] : 1                  
##                                              
##                   Kappa : 0.1737             
##  Mcnemar's Test P-Value : <0.0000000000000002
##                                              
##             Sensitivity : 0.5973             
##             Specificity : 0.6342             
##          Pos Pred Value : 0.3112             
##          Neg Pred Value : 0.8505             
##               Precision : 0.3112             
##                  Recall : 0.5973             
##                      F1 : 0.4092             
##              Prevalence : 0.2168             
##          Detection Rate : 0.1295             
##    Detection Prevalence : 0.4160             
##       Balanced Accuracy : 0.6158             
##                                              
##        'Positive' Class : Bad                
## 
## Measure running time of `Gradient Tree Boosting Model` code =
## Time difference of 3.490278 secs

Decision trees are non-parametric models that perform a sequence of simple tests on inputs. The rxFastForest() algorithm is a random forest: a learning method for classification that constructs an ensemble of decision trees at training time and outputs the class that is the mode of the classes predicted by the individual trees. Random decision forests can correct for the overfitting to training data to which single decision trees are prone. The rxFastForest learner automatically builds a set of trees whose combined predictions are better than the predictions of any one of the trees.
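The "mode of the individual trees" rule can be illustrated with a small base R sketch of bagging. It is not how rxFastForest is implemented: the trees here are one-split stumps on a single feature, and the per-split feature subsampling of a real random forest is omitted. Function names, the seed and the toy data are hypothetical.

```r
set.seed(42)  # arbitrary seed so the bootstrap samples are reproducible

# One-split classification stump: choose the cut with the fewest misclassifications
fit_class_stump <- function(x, y) {
  values <- sort(unique(x))
  if (length(values) < 2) {
    # Degenerate bootstrap sample: always predict its majority class
    majority <- as.integer(mean(y) >= 0.5)
    return(list(cut = Inf, left = majority, right = majority))
  }
  cuts <- (head(values, -1) + tail(values, -1)) / 2
  errors <- sapply(cuts, function(cut) {
    left <- y[x <= cut]
    right <- y[x > cut]
    min(sum(left), length(left) - sum(left)) +
      min(sum(right), length(right) - sum(right))
  })
  best <- cuts[which.min(errors)]
  list(cut = best,
       left = as.integer(mean(y[x <= best]) >= 0.5),
       right = as.integer(mean(y[x > best]) >= 0.5))
}

# Bagging: every stump is trained on its own bootstrap sample of the rows
fit_forest <- function(x, y, n_trees = 25) {
  lapply(seq_len(n_trees), function(i) {
    idx <- sample(length(y), replace = TRUE)
    fit_class_stump(x[idx], y[idx])
  })
}

# Prediction: each stump votes, the majority class (the mode) wins
predict_forest <- function(forest, x) {
  votes <- vapply(forest,
                  function(s) as.numeric(ifelse(x <= s$cut, s$left, s$right)),
                  numeric(length(x)))
  votes <- matrix(votes, nrow = length(x))
  as.integer(rowMeans(votes) > 0.5)
}

# Toy data (hypothetical): class 1 for x > 10
x <- 1:20
y <- as.integer(x > 10)

forest <- fit_forest(x, y, n_trees = 25)
pred <- predict_forest(forest, x)
```

An odd number of trees avoids tied votes in the two-class case.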

## 
## 
## Fast Forest is an Fast Random Forest (RF) ...
## Beginning processing data.
## Rows Read: 328562
## Beginning processing data.
## Beginning processing data.
## Rows Read: 328562
## Beginning processing data.
##    user  system elapsed 
##    0.60    0.01    7.50
## Beginning read for block: 1
## Rows Read: 23316, Read Time: 0.003, Transform Time: 0
## Beginning read for block: 2
## No rows remaining. Finished reading data set. 
## Elapsed time: 00:00:00.8439608
## Finished writing 23316 rows.
## Writing completed.
## 
##  Estimate a Classification Result on Testing Set
## Rows Read: 23316, Total Rows Processed: 23316, Total Chunk Time: 0.037 seconds
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   Bad  Good
##       Bad   3035  7349
##       Good  2019 10913
##                                              
##                Accuracy : 0.5982             
##                  95% CI : (0.5919, 0.6045)   
##     No Information Rate : 0.7832             
##     P-Value [Acc > NIR] : 1                  
##                                              
##                   Kappa : 0.1434             
##  Mcnemar's Test P-Value : <0.0000000000000002
##                                              
##             Sensitivity : 0.6005             
##             Specificity : 0.5976             
##          Pos Pred Value : 0.2923             
##          Neg Pred Value : 0.8439             
##               Precision : 0.2923             
##                  Recall : 0.6005             
##                      F1 : 0.3932             
##              Prevalence : 0.2168             
##          Detection Rate : 0.1302             
##    Detection Prevalence : 0.4454             
##       Balanced Accuracy : 0.5990             
##                                              
##        'Positive' Class : Bad                
## 
## Measure running time of `Random Forest Model (Fast Forest)` code =
## Time difference of 8.736882 secs

The rxNeuralNet() algorithm supports a user-defined multilayer network topology with GPU acceleration. A neural network is a class of prediction models inspired by the human brain. It can be represented as a weighted directed graph in which each node is called a neuron. The neural network algorithm tries to learn the optimal weights on the edges from the training data. Any class of statistical models can be considered a neural network if it uses adaptive weights and can approximate non-linear functions of its inputs. Neural networks are especially suited to problems where a more traditional regression model cannot fit a solution. The network can be defined using the Net# language or the Azure Gallery.
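As a rough illustration of the weighted-graph view, the following base R sketch builds the topology reported in the training log of this section (162 inputs, one hidden layer of 100 sigmoid units, one sigmoid output) and runs a single forward pass. The reading of InitWtsDiameter as the width of a symmetric uniform range is an assumption, and the input vector is random, not taken from the loan data.

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

n_input <- 162
n_hidden <- 100
n_output <- 1

# One weight per incoming edge plus one bias per neuron in each layer
n_weights <- (n_input + 1) * n_hidden + (n_hidden + 1) * n_output

set.seed(1)        # arbitrary seed for reproducibility
diameter <- 0.005  # InitWtsDiameter from the log; assumed to span [-d/2, d/2]
init <- function(n) runif(n, min = -diameter / 2, max = diameter / 2)

W1 <- matrix(init(n_hidden * n_input), nrow = n_hidden)
b1 <- init(n_hidden)
W2 <- matrix(init(n_output * n_hidden), nrow = n_output)
b2 <- init(n_output)

# Forward pass: input -> hidden (sigmoid) -> output (sigmoid)
forward <- function(x) {
  h <- sigmoid(W1 %*% x + b1)
  as.numeric(sigmoid(W2 %*% h + b2))
}

x <- runif(n_input)  # hypothetical input vector
p <- forward(x)      # close to 0.5 before any training
```

Here `n_weights` evaluates to 16401, the figure printed in the log. With such small initial weights the untrained output stays near 0.5, whose log-loss, `-log(0.5)`, is about 0.6931; this is consistent with the pre-training mean error of 0.693613 reported in the log.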

For GPU acceleration, it is recommended to use a miniBatchSize greater than one. GPU acceleration also requires additional manual setup steps:

## 
## 
## Artificial Neural Networks (ANN) ...
## Not adding a normalizer.
## Beginning processing data.
## Rows Read: 328562, Read Time: 0, Transform Time: 0
## Beginning processing data.
## Beginning processing data.
## Rows Read: 328562, Read Time: 0, Transform Time: 0
## Beginning processing data.
## Failed to initialize CUDA runtime. Possible reasons:
## 1. The machine does not have CUDA-capable card. Supported devices have compute capability 2.0 and higher.
## 2. Outdated graphics drivers. Please install the latest drivers from http://www.nvidia.com/Drivers .
## 3. CUDA runtime DLLs are missing, please see the GPU acceleration help for the installation instructions.
## CUDA not supported, switched to SSE math.
## Using: SSE Math
## 
## ***** Net definition *****
##   input Data [162];
##   hidden H [100] sigmoid { // Depth 1
##     from Data all;
##   }
##   output Result [1] sigmoid { // Depth 0
##     from H all;
##   }
## ***** End net definition *****
## Input count: 162
## Output count: 1
## Output Function: Sigmoid
## Loss Function: LogLoss
## PreTrainer: NoPreTrainer
## ___________________________________________________________________
## Starting training...
## Learning rate: 0.001000
## Momentum: 0.000000
## InitWtsDiameter: 0.005000
## ___________________________________________________________________
## Initializing 1 Hidden Layers, 16401 Weights...
## Estimated Pre-training MeanError = 0.693613
## Iter:1/100, MeanErr=0.639431(-7.81%%), 3798.21M WeightUpdates/sec
## Iter:2/100, MeanErr=0.602368(-5.80%%), 3949.60M WeightUpdates/sec
## Iter:3/100, MeanErr=0.584635(-2.94%%), 3914.17M WeightUpdates/sec
## Iter:4/100, MeanErr=0.527511(-9.77%%), 3915.96M WeightUpdates/sec
## Iter:5/100, MeanErr=0.593839(12.57%%), 3827.49M WeightUpdates/sec
## Iter:6/100, MeanErr=0.554664(-6.60%%), 3856.07M WeightUpdates/sec
## Iter:7/100, MeanErr=0.562969(1.50%%), 3903.81M WeightUpdates/sec
## Iter:8/100, MeanErr=0.548285(-2.61%%), 3942.92M WeightUpdates/sec
## Iter:9/100, MeanErr=0.563428(2.76%%), 3897.05M WeightUpdates/sec
## Iter:10/100, MeanErr=0.586190(4.04%%), 3941.05M WeightUpdates/sec
## Iter:11/100, MeanErr=0.597030(1.85%%), 3853.81M WeightUpdates/sec
## Iter:12/100, MeanErr=0.608863(1.98%%), 3888.06M WeightUpdates/sec
## Iter:13/100, MeanErr=0.584287(-4.04%%), 3521.77M WeightUpdates/sec
## Iter:14/100, MeanErr=0.590426(1.05%%), 3879.91M WeightUpdates/sec
## Iter:15/100, MeanErr=0.520848(-11.78%%), 3869.92M WeightUpdates/sec
## Iter:16/100, MeanErr=0.582126(11.77%%), 3852.34M WeightUpdates/sec
## Iter:17/100, MeanErr=0.583295(0.20%%), 3921.90M WeightUpdates/sec
## Iter:18/100, MeanErr=0.603145(3.40%%), 3871.87M WeightUpdates/sec
## Iter:19/100, MeanErr=0.588263(-2.47%%), 3891.76M WeightUpdates/sec
## Iter:20/100, MeanErr=0.588528(0.05%%), 3984.09M WeightUpdates/sec
## Iter:21/100, MeanErr=0.561759(-4.55%%), 3952.93M WeightUpdates/sec
## Iter:22/100, MeanErr=0.597273(6.32%%), 3904.70M WeightUpdates/sec
## Iter:23/100, MeanErr=0.580513(-2.81%%), 3923.32M WeightUpdates/sec
## Iter:24/100, MeanErr=0.608079(4.75%%), 3931.57M WeightUpdates/sec
## Iter:25/100, MeanErr=0.575513(-5.36%%), 3940.99M WeightUpdates/sec
## Iter:26/100, MeanErr=0.601079(4.44%%), 3948.39M WeightUpdates/sec
## Iter:27/100, MeanErr=0.574797(-4.37%%), 3920.24M WeightUpdates/sec
## Iter:28/100, MeanErr=0.552832(-3.82%%), 3934.60M WeightUpdates/sec
## Iter:29/100, MeanErr=0.590020(6.73%%), 3982.74M WeightUpdates/sec
## Iter:30/100, MeanErr=0.574746(-2.59%%), 3846.70M WeightUpdates/sec
## Iter:31/100, MeanErr=0.582986(1.43%%), 3966.72M WeightUpdates/sec
## Iter:32/100, MeanErr=0.554354(-4.91%%), 3847.14M WeightUpdates/sec
## Iter:33/100, MeanErr=0.571911(3.17%%), 4055.53M WeightUpdates/sec
## Iter:34/100, MeanErr=0.607195(6.17%%), 3849.39M WeightUpdates/sec
## Iter:35/100, MeanErr=0.479481(-21.03%%), 3860.51M WeightUpdates/sec
## Iter:36/100, MeanErr=0.559952(16.78%%), 3806.84M WeightUpdates/sec
## Iter:37/100, MeanErr=0.572845(2.30%%), 3474.53M WeightUpdates/sec
## Iter:38/100, MeanErr=0.566092(-1.18%%), 3993.22M WeightUpdates/sec
## Iter:39/100, MeanErr=0.590559(4.32%%), 3884.40M WeightUpdates/sec
## Iter:40/100, MeanErr=0.562207(-4.80%%), 3865.32M WeightUpdates/sec
## Iter:41/100, MeanErr=0.570505(1.48%%), 3854.38M WeightUpdates/sec
## Iter:42/100, MeanErr=0.582996(2.19%%), 3185.89M WeightUpdates/sec
## Iter:43/100, MeanErr=0.589425(1.10%%), 3415.26M WeightUpdates/sec
## Iter:44/100, MeanErr=0.543417(-7.81%%), 2318.03M WeightUpdates/sec
## Iter:45/100, MeanErr=0.583254(7.33%%), 2706.60M WeightUpdates/sec
## Iter:46/100, MeanErr=0.570077(-2.26%%), 2547.02M WeightUpdates/sec
## Iter:47/100, MeanErr=0.593370(4.09%%), 3272.58M WeightUpdates/sec
## Iter:48/100, MeanErr=0.578531(-2.50%%), 3389.04M WeightUpdates/sec
## Iter:49/100, MeanErr=0.560148(-3.18%%), 3481.86M WeightUpdates/sec
## Iter:50/100, MeanErr=0.594706(6.17%%), 3710.83M WeightUpdates/sec
## Iter:51/100, MeanErr=0.576700(-3.03%%), 2830.77M WeightUpdates/sec
## Iter:52/100, MeanErr=0.591160(2.51%%), 3083.10M WeightUpdates/sec
## Iter:53/100, MeanErr=0.559937(-5.28%%), 3995.15M WeightUpdates/sec
## Iter:54/100, MeanErr=0.516699(-7.72%%), 4226.92M WeightUpdates/sec
## Iter:55/100, MeanErr=0.568916(10.11%%), 4151.69M WeightUpdates/sec
## Iter:56/100, MeanErr=0.551433(-3.07%%), 4325.79M WeightUpdates/sec
## Iter:57/100, MeanErr=0.601805(9.13%%), 4337.89M WeightUpdates/sec
## Iter:58/100, MeanErr=0.591378(-1.73%%), 4311.02M WeightUpdates/sec
## Iter:59/100, MeanErr=0.579823(-1.95%%), 4320.18M WeightUpdates/sec
## Iter:60/100, MeanErr=0.556642(-4.00%%), 4323.00M WeightUpdates/sec
## Iter:61/100, MeanErr=0.573425(3.01%%), 4319.77M WeightUpdates/sec
## Iter:62/100, MeanErr=0.571964(-0.25%%), 4263.00M WeightUpdates/sec
## Iter:63/100, MeanErr=0.568972(-0.52%%), 4258.27M WeightUpdates/sec
## Iter:64/100, MeanErr=0.569381(0.07%%), 4267.84M WeightUpdates/sec
## Iter:65/100, MeanErr=0.510309(-10.37%%), 4246.29M WeightUpdates/sec
## Iter:66/100, MeanErr=0.581686(13.99%%), 4215.08M WeightUpdates/sec
## Iter:67/100, MeanErr=0.570046(-2.00%%), 4180.59M WeightUpdates/sec
## Iter:68/100, MeanErr=0.588904(3.31%%), 3617.76M WeightUpdates/sec
## Iter:69/100, MeanErr=0.587524(-0.23%%), 4163.22M WeightUpdates/sec
## Iter:70/100, MeanErr=0.565621(-3.73%%), 4315.60M WeightUpdates/sec
## Iter:71/100, MeanErr=0.551080(-2.57%%), 4358.70M WeightUpdates/sec
## Iter:72/100, MeanErr=0.553144(0.37%%), 4288.95M WeightUpdates/sec
## Iter:73/100, MeanErr=0.546142(-1.27%%), 4341.31M WeightUpdates/sec
## Iter:74/100, MeanErr=0.599115(9.70%%), 4394.05M WeightUpdates/sec
## Iter:75/100, MeanErr=0.584863(-2.38%%), 4328.23M WeightUpdates/sec
## Iter:76/100, MeanErr=0.577997(-1.17%%), 4244.54M WeightUpdates/sec
## Iter:77/100, MeanErr=0.560597(-3.01%%), 4393.77M WeightUpdates/sec
## Iter:78/100, MeanErr=0.560554(-0.01%%), 4285.19M WeightUpdates/sec
## Iter:79/100, MeanErr=0.566539(1.07%%), 4323.93M WeightUpdates/sec
## Iter:80/100, MeanErr=0.583388(2.97%%), 4316.30M WeightUpdates/sec
## Iter:81/100, MeanErr=0.574882(-1.46%%), 4303.92M WeightUpdates/sec
## Iter:82/100, MeanErr=0.597803(3.99%%), 4363.38M WeightUpdates/sec
## Iter:83/100, MeanErr=0.573010(-4.15%%), 4319.29M WeightUpdates/sec
## Iter:84/100, MeanErr=0.576507(0.61%%), 4263.72M WeightUpdates/sec
## Iter:85/100, MeanErr=0.581926(0.94%%), 4375.43M WeightUpdates/sec
## Iter:86/100, MeanErr=0.550412(-5.42%%), 4324.91M WeightUpdates/sec
## Iter:87/100, MeanErr=0.524140(-4.77%%), 4318.94M WeightUpdates/sec
## Iter:88/100, MeanErr=0.584658(11.55%%), 4339.30M WeightUpdates/sec
## Iter:89/100, MeanErr=0.585000(0.06%%), 4297.34M WeightUpdates/sec
## Iter:90/100, MeanErr=0.562040(-3.92%%), 4313.00M WeightUpdates/sec
## Iter:91/100, MeanErr=0.553648(-1.49%%), 4360.05M WeightUpdates/sec
## Iter:92/100, MeanErr=0.598780(8.15%%), 4227.71M WeightUpdates/sec
## Iter:93/100, MeanErr=0.595341(-0.57%%), 4207.91M WeightUpdates/sec
## Iter:94/100, MeanErr=0.589134(-1.04%%), 4279.77M WeightUpdates/sec
## Iter:95/100, MeanErr=0.571897(-2.93%%), 4230.29M WeightUpdates/sec
## Iter:96/100, MeanErr=0.590322(3.22%%), 4332.06M WeightUpdates/sec
## Iter:97/100, MeanErr=0.568671(-3.67%%), 4195.30M WeightUpdates/sec
## Iter:98/100, MeanErr=0.528197(-7.12%%), 4155.70M WeightUpdates/sec
## Iter:99/100, MeanErr=0.573761(8.63%%), 4197.96M WeightUpdates/sec
## Iter:100/100, MeanErr=0.590485(2.91%%), 4186.22M WeightUpdates/sec
## Done!
## Estimated Post-training MeanError = 0.620647
## ___________________________________________________________________
## Not training a calibrator because it is not needed.
## Elapsed time: 00:02:18.7486743
##    user  system elapsed 
##    0.61    0.01  139.44
## Beginning read for block: 1
## Rows Read: 23316, Read Time: 0.004, Transform Time: 0
## Beginning read for block: 2
## No rows remaining. Finished reading data set. 
## Elapsed time: 00:00:00.2187188
## Finished writing 23316 rows.
## Writing completed.
## 
##  Estimate a Classification Result on Testing Set
## Rows Read: 23316, Total Rows Processed: 23316, Total Chunk Time: 0.042 seconds
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   Bad  Good
##       Bad   2882  6370
##       Good  2172 11892
##                                              
##                Accuracy : 0.6336             
##                  95% CI : (0.6274, 0.6398)   
##     No Information Rate : 0.7832             
##     P-Value [Acc > NIR] : 1                  
##                                              
##                   Kappa : 0.1703             
##  Mcnemar's Test P-Value : <0.0000000000000002
##                                              
##             Sensitivity : 0.5702             
##             Specificity : 0.6512             
##          Pos Pred Value : 0.3115             
##          Neg Pred Value : 0.8456             
##               Precision : 0.3115             
##                  Recall : 0.5702             
##                      F1 : 0.4029             
##              Prevalence : 0.2168             
##          Detection Rate : 0.1236             
##    Detection Prevalence : 0.3968             
##       Balanced Accuracy : 0.6107             
##                                              
##        'Positive' Class : Bad                
## 
## Measure running time of `Artificial Neural Networks (ANN)` code =
## Time difference of 2.335371 mins

This type of SVM is one-class because the training set contains only examples from the target class. It infers what properties are normal for the objects in the target class and from these properties predicts which examples are unlike the normal examples. This is useful for anomaly detection because the scarcity of training examples is the defining characteristic of anomalies: typically there are very few examples of network intrusion, fraud, or other types of anomalous behavior.

The rxNaiveBayes() function is a parallel external memory algorithm for Naive Bayes classifiers.

## 
## 
## Fast Forest is an Naive Bayes Classifiers (NB) ...
## Rows Processed: 328562
##    user  system elapsed 
##    0.62    0.03    0.83
## Rows Read: 23316, Total Rows Processed: 23316, Total Chunk Time: 0.694 seconds
## Rows Read: 23316, Total Rows Processed: 23316, Total Chunk Time: 0.323 seconds
## 
##  Estimate a Classification Result on Testing Set
## Rows Read: 23316, Total Rows Processed: 23316, Total Chunk Time: 0.050 seconds
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   Bad  Good
##       Bad   3014  6803
##       Good  2040 11459
##                                              
##                Accuracy : 0.6207             
##                  95% CI : (0.6145, 0.627)    
##     No Information Rate : 0.7832             
##     P-Value [Acc > NIR] : 1                  
##                                              
##                   Kappa : 0.1669             
##  Mcnemar's Test P-Value : <0.0000000000000002
##                                              
##             Sensitivity : 0.5964             
##             Specificity : 0.6275             
##          Pos Pred Value : 0.3070             
##          Neg Pred Value : 0.8489             
##               Precision : 0.3070             
##                  Recall : 0.5964             
##                      F1 : 0.4054             
##              Prevalence : 0.2168             
##          Detection Rate : 0.1293             
##    Detection Prevalence : 0.4210             
##       Balanced Accuracy : 0.6119             
##                                              
##        'Positive' Class : Bad                
## 
## Measure running time of `Naive Bayes Model (NB)` code =
## Time difference of 2.459047 secs

Next, we build an ensemble of fast tree models by using the function rxEnsemble().

## 
## 
## Ensemble of Some Models ...
## Not adding a normalizer.
## Making per-feature arrays
## Changing data from row-wise to column-wise
## Beginning processing data.
## Rows Read: 328562, Read Time: 0, Transform Time: 0
## Beginning processing data.
## Processed 327991 instances
## Binning and forming Feature objects
## Reserved memory for tree learner: 48568 bytes
## Starting to train ...
## Not training a calibrator because it is not needed.
## Elapsed time: 00:00:04.1492102
## Not adding a normalizer.
## Making per-feature arrays
## Changing data from row-wise to column-wise
## Beginning processing data.
## Rows Read: 328562, Read Time: 0, Transform Time: 0
## Beginning processing data.
## Processed 327962 instances
## Binning and forming Feature objects
## Reserved memory for tree learner: 48568 bytes
## Starting to train ...
## Not training a calibrator because it is not needed.
## Elapsed time: 00:00:06.5958746
## Not adding a normalizer.
## Making per-feature arrays
## Changing data from row-wise to column-wise
## Beginning processing data.
## Rows Read: 328562, Read Time: 0, Transform Time: 0
## Beginning processing data.
## Processed 328662 instances
## Binning and forming Feature objects
## Reserved memory for tree learner: 48568 bytes
## Starting to train ...
## Not training a calibrator because it is not needed.
## Elapsed time: 00:00:06.2534772
## Beginning processing data.
## Rows Read: 328562, Read Time: 0, Transform Time: 0
## Elapsed time: 00:00:06.4938534
## Beginning processing data.
##    user  system elapsed 
##    1.26    0.31   25.25
## Beginning processing data.
## Rows Read: 23316, Read Time: 0.001, Transform Time: 0
## Beginning processing data.
## Elapsed time: 00:00:01.6385659
## Finished writing 23316 rows.
## Writing completed.
## 
##  Estimate a Classification Result on Testing Set
## Rows Read: 23316, Total Rows Processed: 23316, Total Chunk Time: 0.057 seconds
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   Bad  Good
##       Bad   2995  6620
##       Good  2059 11642
##                                              
##                Accuracy : 0.6278             
##                  95% CI : (0.6215, 0.634)    
##     No Information Rate : 0.7832             
##     P-Value [Acc > NIR] : 1                  
##                                              
##                   Kappa : 0.1735             
##  Mcnemar's Test P-Value : <0.0000000000000002
##                                              
##             Sensitivity : 0.5926             
##             Specificity : 0.6375             
##          Pos Pred Value : 0.3115             
##          Neg Pred Value : 0.8497             
##               Precision : 0.3115             
##                  Recall : 0.5926             
##                      F1 : 0.4083             
##              Prevalence : 0.2168             
##          Detection Rate : 0.1285             
##    Detection Prevalence : 0.4124             
##       Balanced Accuracy : 0.6150             
##                                              
##        'Positive' Class : Bad                
## 
## Measure running time of `Ensemble Model` code =
## Time difference of 27.30104 secs

Evaluation of Classifiers

After constructing the set of classifiers, we test them on the hold-out dataset (inTest), which took no part in training the models. We then compare the accuracy of the classifiers on these test cases.
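The statistics printed with each confusion matrix above can be reproduced by hand. The sketch below does so in base R for the rxFastTrees test-set matrix, with 'Bad' as the positive class; the variable names are illustrative.

```r
# Confusion matrix for rxFastTrees on the test set (rows: prediction, columns: reference)
cm <- matrix(c(3019, 2035, 6681, 11581), nrow = 2,
             dimnames = list(Prediction = c("Bad", "Good"),
                             Reference = c("Bad", "Good")))

tp <- cm["Bad", "Bad"]    # predicted Bad, actually Bad
fp <- cm["Bad", "Good"]   # predicted Bad, actually Good
fn <- cm["Good", "Bad"]   # predicted Good, actually Bad
tn <- cm["Good", "Good"]  # predicted Good, actually Good
n <- sum(cm)

accuracy    <- (tp + tn) / n
sensitivity <- tp / (tp + fn)   # recall for the 'Bad' class
specificity <- tn / (tn + fp)
precision   <- tp / (tp + fp)   # positive predictive value
f1          <- 2 * precision * sensitivity / (precision + sensitivity)
balanced    <- (sensitivity + specificity) / 2
nir         <- max(colSums(cm)) / n  # no-information rate: share of the larger class

# Cohen's kappa: observed agreement corrected for agreement expected by chance
expected <- sum(rowSums(cm) * colSums(cm)) / n^2
kappa <- (accuracy - expected) / (1 - expected)

round(c(Accuracy = accuracy, Sensitivity = sensitivity, Specificity = specificity,
        Precision = precision, F1 = f1, BalancedAccuracy = balanced,
        NIR = nir, Kappa = kappa), 4)
```

The accuracy of every model in this section is below the no-information rate of 0.7832, so accuracy alone says little here; sensitivity, specificity, Kappa and Gini are more informative for this imbalanced problem.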

## Beginning processing data.
## Rows Read: 209838, Read Time: 0, Transform Time: 0
## Beginning processing data.
## Elapsed time: 00:00:01.1230519
## Finished writing 209838 rows.
## Writing completed.
## Beginning read for block: 1
## Rows Read: 209838, Read Time: 0.012, Transform Time: 0
## Beginning read for block: 2
## No rows remaining. Finished reading data set. 
## Elapsed time: 00:00:02.7085950
## Finished writing 209838 rows.
## Writing completed.
## Beginning read for block: 1
## Rows Read: 209838, Read Time: 0.014, Transform Time: 0
## Beginning read for block: 2
## No rows remaining. Finished reading data set. 
## Elapsed time: 00:00:07.3217277
## Finished writing 209838 rows.
## Writing completed.
## Beginning read for block: 1
## Rows Read: 209838, Read Time: 0.015, Transform Time: 0
## Beginning read for block: 2
## No rows remaining. Finished reading data set. 
## Elapsed time: 00:00:01.5667492
## Finished writing 209838 rows.
## Writing completed.
## Rows Read: 209838, Total Rows Processed: 209838, Total Chunk Time: 2.125 seconds
## Beginning read for block: 1
## Rows Read: 209838, Read Time: 0.098, Transform Time: 0
## Beginning read for block: 2
## No rows remaining. Finished reading data set. 
## Elapsed time: 00:00:13.3835712
## Finished writing 209838 rows.
## Writing completed.
## Rows Read: 209838, Total Rows Processed: 209838, Total Chunk Time: 1.197 seconds
## Rows Read: 209838, Total Rows Processed: 209838, Total Chunk Time: 0.260 seconds
## Rows Read: 23316, Total Rows Processed: 23316, Total Chunk Time: 0.348 seconds
## Rows Read: 23316, Total Rows Processed: 23316, Total Chunk Time: 0.030 seconds

Compute and plot the ROC curve from the actual and predicted values of each binary classifier.
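For readers unfamiliar with how an ROC curve is built from scores, here is a minimal base R sketch. It is not the function used in this report; it assumes labels coded 0/1, no tied scores, and a tiny hypothetical sample.

```r
# ROC points: lower the threshold one observation at a time, highest score first
roc_points <- function(labels, scores) {
  ord <- order(scores, decreasing = TRUE)
  labels <- labels[ord]
  data.frame(
    fpr = c(0, cumsum(labels == 0) / sum(labels == 0)),  # false positive rate
    tpr = c(0, cumsum(labels == 1) / sum(labels == 1))   # true positive rate
  )
}

# Area under the curve by the trapezoidal rule
auc_trapezoid <- function(roc) {
  sum(diff(roc$fpr) * (head(roc$tpr, -1) + tail(roc$tpr, -1)) / 2)
}

# Hypothetical toy sample
labels <- c(0, 0, 1, 1)
scores <- c(0.1, 0.4, 0.35, 0.8)

roc <- roc_points(labels, scores)
auc <- auc_trapezoid(roc)

if (interactive()) {
  plot(roc$fpr, roc$tpr, type = "s",
       xlab = "False positive rate", ylab = "True positive rate")
  abline(0, 1, lty = 2)  # diagonal of a random classifier
}
```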

## Rows Read: 209838, Total Rows Processed: 209838, Total Chunk Time: 0.278 seconds
## Rows Read: 23316, Total Rows Processed: 23316, Total Chunk Time: 0.028 seconds
## Rows Read: 23316, Total Rows Processed: 23316, Total Chunk Time: 0.029 seconds

Finally, we evaluate and compare the models built above on several metrics.

## 
## Attaching package: 'formattable'
## The following object is masked from 'package:plotly':
## 
##     style
Classification results on TRAINING and TESTING sets from Microsoft Machine Learning models

                        Training Set                        Testing Set
Methods                 Gini    Sens    Spec    Kappa       Gini    Sens    Spec    Kappa    Notes
rxEnsemble              0.4973  0.6656  0.6374  0.2241      0.3214  0.5926  0.6375  0.1735
rxFastForest            0.3615  0.6897  0.6506  0.2534      0.2751  0.6005  0.5976  0.1434
rxFastTrees             0.4641  0.6604  0.6079  0.1933      0.3229  0.5973  0.6342  0.1737
rxLogisticRegression    0.4145  0.6383  0.6659  0.2332      0.3175  0.6013  0.6267  0.1697
rxNaiveBayes            0.3701  0.7040  0.6634  0.2759      0.3063  0.5964  0.6275  0.1669
rxNeuralNet             0.4192  0.6361  0.6331  0.1999      0.3062  0.5702  0.6512  0.1703

The table shows that, when the predictions on the test dataset are compared with the actual classes, the top three classifiers by quality were:

• classifier trained by rxFastTrees - quality according to ‘Gini’ 0.3229

• classifier trained by rxEnsemble - quality according to ‘Gini’ 0.3214

• classifier trained by rxLogisticRegression - quality according to ‘Gini’ 0.3175.

We select the most suitable method on this test dataset: the classifier trained by rxFastTrees.
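The Gini coefficient used for this ranking is related to the area under the ROC curve by Gini = 2 * AUC - 1, which is the definition used by the hmeasure package called later in this report. The following base R sketch computes it with the rank (Mann-Whitney) formula for AUC; the function names and the toy sample are hypothetical.

```r
# AUC via the rank-sum formula; average ranks take care of tied scores
auc_rank <- function(labels, scores) {
  r <- rank(scores)
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

gini <- function(labels, scores) 2 * auc_rank(labels, scores) - 1

# Hypothetical toy sample: 3 of the 4 positive/negative pairs are ordered correctly
labels <- c(0, 0, 1, 1)
scores <- c(0.1, 0.4, 0.35, 0.8)
toy_gini <- gini(labels, scores)

# Converting the reported test Gini of rxFastTrees back to an AUC
auc_fast_trees <- (0.3229 + 1) / 2
```

Under this relation, the test Gini of 0.3229 corresponds to an AUC of about 0.661, and the gap between the top three models is under 0.003 in AUC terms.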

Descriptive mAchine Learning EXplanations of Models

Machine learning models are widely used and have many applications in classification and regression tasks. Owing to increasing computational power and the availability of new data sources and new methods, ML models are becoming more and more complex. Models created with techniques like boosting, bagging or neural networks are true black boxes: it is hard to trace the link between input variables and model outcomes. They are used because of their high performance, but lack of interpretability is one of their weakest sides.
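One model-agnostic way to open such a black box is permutation feature importance, the idea behind the importance functions of the iml and DALEX packages used in the code of this section. The base R sketch below is a simplified illustration on a hypothetical linear model with mean squared error as the loss; it is not the implementation of either package.

```r
set.seed(123)  # arbitrary seed for reproducibility

# Hypothetical toy data: x1 drives the response, x2 is pure noise
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- 3 * toy$x1 + rnorm(n, sd = 0.5)

fit <- lm(y ~ x1 + x2, data = toy)

mse <- function(observed, predicted) mean((observed - predicted)^2)

# Importance of a feature = average increase in loss after shuffling its column
permutation_importance <- function(model, data, target, loss, n_repeats = 5) {
  baseline <- loss(data[[target]], predict(model, newdata = data))
  features <- setdiff(names(data), target)
  sapply(features, function(feature) {
    increases <- replicate(n_repeats, {
      shuffled <- data
      shuffled[[feature]] <- sample(shuffled[[feature]])
      loss(shuffled[[target]], predict(model, newdata = shuffled)) - baseline
    })
    mean(increases)
  })
}

importance <- permutation_importance(fit, toy, "y", mse)
```

Shuffling `x1` breaks its link to the response and raises the loss sharply, while shuffling `x2` changes it very little. The same difference-in-loss logic applies when the loss is based on the Gini coefficient, as in this report.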

# See https://github.com/pbiecek/DALEX, https://github.com/MI2DataLab/modelDown
# See www.r-bloggers.com/dalex-and-h2o-machine-learning-model-interpretability-and-feature-explanation/
library('iml')              # Interpretable Machine Learning
library('DALEX')            # Descriptive mAchine Learning EXplanations
library('breakDown')        # Model Agnostic Explainers for Individual Predictions

Prob_fun <- function(object, newdata){
  # predict(object, newdata=newdata, type = 'prob')[, 'Good']
  
  rxPredict(modelObject = object, data = newdata, blocksPerRead = 200000,
        reportProgress = 0, verbose = 0) %>%
    dplyr::select(starts_with('Probability.')) %>%
      pull 
}

loss_gini <- function(observed, predicted) {
    hmeasure::HMeasure(true.class = observed, scores = predicted)[['metrics']] %>% .[1, 'Gini']
}

Predict.Fun <- function(model, newdata){
    rxPredict(modelObject = model, data = newdata, blocksPerRead = 200000,
        reportProgress = 0, verbose = 0) %>%
      dplyr::select(starts_with('Probability.')) %>%
          pull 
}

loss_Gini <- function(actual, predicted) {
    hmeasure::HMeasure(true.class = actual, scores = predicted)[['metrics']] %>% .[1, 'Gini']
}

Scorecard <- 'Scorecard'

# Create Model Explainer from 'DALEX' package
explainer_classif_1 <-
  DALEX::explain(TheBestModel, label = NameOfTheBestModel,
                 data = X[inTest, ], y = Y[inTest] %>% as.integer()-1, predict_function = Prob_fun)

explainer_classif_2 <-
  DALEX::explain(get(paste0(NameOfTheSecondModel, 'Fit')), label = NameOfTheSecondModel, 
                 data = X[inTest, ], y = Y[inTest] %>% as.integer()-1, predict_function = Prob_fun)

explainer_classif_3 <-
  DALEX::explain(get(paste0(NameOfTheThirdModel, 'Fit')),  label = NameOfTheThirdModel, 
                 data = X[inTest, ], y = Y[inTest] %>% as.integer()-1, predict_function = Prob_fun)

if (!exists('wb')) {
  # Create MS Excel File for Output
  openxlsx::addWorksheet(wb <- openxlsx::createWorkbook(), sheetName = 'IV Table', 
                         gridLines = FALSE, tabColour = 'olivedrab')
  openxlsx::addWorksheet(wb, sheetName = 'Scorecard', gridLines = FALSE, tabColour = 'brown')
}

if (ncol(X) <= Max_Vars) {

# [Interpretable Machine Learning: Feature Importance](https://christophm.github.io/interpretable-ml-book)
  system.time({
    ### Setup parallel processing - 4 times faster
    library('doParallel'); cl <- makeCluster(detectCores()); registerDoParallel(cl)

    model1 <- iml::Predictor$new(model = TheBestModel, data= X[inTest, ],
                y = Y[inTest] %>% as.integer() - 1, class = 'Good', predict.fun = Predict.Fun, type = 'prob' )
    # Feature Importance
    imp1 <- iml::FeatureImp$new(model1, loss = loss_Gini, compare = 'difference', parallel = TRUE)
    
    importance.df <- imp1$results %>%
      dplyr::mutate(Variable = gsub('_fct', '', feature) %>% as.factor,
                    Importance = -importance) %>%
        dplyr::select(Variable, Importance)
    
    p1 <- imp1$plot() + ggplot2::ggtitle(paste(NameOfTheBestModel, 'by Gini coefficient'))
    print(p1)    
    
    # model2 <- iml::Predictor$new(model = get(paste0(NameOfTheSecondModel, 'Fit')), data= X[inTest, ],
    #           y = Y[inTest] %>% as.integer()-1, class = 'Good', predict.fun = Predict.Fun, type = 'prob' )
    # # Feature Importance
    # imp2 <- iml::FeatureImp$new(model2, loss = loss_Gini, compare = 'difference', parallel = TRUE)
    # imp2$plot() + ggplot2::ggtitle(paste(NameOfTheSecondModel, 'by Gini coefficient'))
    # 
    # model3 <- iml::Predictor$new(model = get(paste0(NameOfTheThirdModel, 'Fit')), data= X[inTest, ],
    #           y = Y[inTest] %>% as.integer()-1, class = 'Good', predict.fun = Predict.Fun, type = 'prob' )
    # # Feature Importance
    # imp3 <- iml::FeatureImp$new(model3, loss = loss_Gini, compare = 'difference', parallel = TRUE)
    # imp3$plot() + ggplot2::ggtitle(paste(NameOfTheThirdModel, 'by Gini coefficient'))
    
    TheImportancePredictor <- imp1$results[nrow(imp1$results), 'feature']
    
    # Remember to stop the cluster at the end
    stopCluster(cl)
  })
    
    # # Choice of The Importance Predictor - www.machinelearningplus.com/machine-learning/feature-selection/
    # TheImportancePredictor <-
    #   imp1$results %>% arrange(importance) %>% dplyr::select(feature) %>% pull %>% .[1]
    # 
    # # Effect of features on the model predictions
    # ale = iml::FeatureEffect$new(model, feature = TheImportancePredictor, method = 'ale')
    # ale$plot()
} else {
  system.time({
    # # Model Performance from 'DALEX' package
    # plot(DALEX::model_performance(explainer_classif_1)
    #    , DALEX::model_performance(explainer_classif_2)
    #    , DALEX::model_performance(explainer_classif_3)
    #    , geom = 'boxplot')
    
    # Variable importance
    importance.df <- DALEX::variable_importance(explainer_classif_1, type = 'raw', loss_function = loss_gini) %>%
      dplyr::mutate(Importance = -(dropout_loss - .[variable == '_full_model_', 'dropout_loss'])) %>% 
        dplyr::filter(!str_detect(variable, 'full_model|baseline')) %>%
          dplyr::mutate(Variable = gsub('_fct', '', variable)) %>% 
            dplyr::select(Variable, Importance)
    
    p1 <- DALEX::variable_importance(explainer_classif_1, type = 'raw', loss_function = loss_gini) %>% 
      plot() + labs(title = 'Variable Importance', caption = 'by Gini Coefficient') + theme_grey()
    print(p1)
    
    # DALEX::variable_importance(explainer_classif_2, type = 'raw', loss_function = loss_gini) %>% 
    #   plot() + labs(subtitle = 'Variable Importance', caption = 'by Gini Coefficient')
    # DALEX::variable_importance(explainer_classif_3, type = 'raw', loss_function = loss_gini) %>% 
    #   plot() + labs(subtitle = 'Variable Importance', caption = 'by Gini Coefficient')
    
    TheImportancePredictor <-
      DALEX::variable_importance(explainer_classif_1, type = 'raw')[2, 'variable'] %>%
        as.character
  })
  } # End if ncol(X) <= Max_Vars

##    user  system elapsed 
##    0.81    0.39  116.79
# Output Chart: EXplanations of Model by Variable Importance
openxlsx::insertPlot(wb, sheet = 'IV Table', xy = c(1, nrow(binning.df)+5), width = 10 * (1 + sqrt(5)) / 2,
                     height = 10, units = 'cm')

if (TheBestModel$Description != 'LogisticRegression')
  openxlsx::insertPlot(wb, sheet = Scorecard, xy = c(1,2), width = 10*(1 + sqrt(5))/2, height = 10, units = 'cm')

# # compute Partial Dependence Plots for a given variable --> uses the pdp package
# plot(DALEX::variable_response(explainer_classif_1, variable = TheImportancePredictor, type = 'factor')) +
#     ggtitle('Marginal Response for a Single Variable by Gini Coefficient')
# plot(DALEX::variable_response(explainer_classif_2, variable = TheImportancePredictor, type = 'factor')) +
#     ggtitle('Marginal Response for a Single Variable by Gini Coefficient')
# plot(DALEX::variable_response(explainer_classif_3, variable = TheImportancePredictor, type = 'factor')) +
#     ggtitle('Marginal Response for a Single Variable by Gini Coefficient')            

if (ncol(X) <= Max_Vars) {
  # Explanations for a Single Prediction for Observations
  # True Good Observation
   new_obj <- data.frame(Probs = preds[[NameOfTheBestModel]], Obs = Y[inTest],
                Equal = ifelse(preds[[NameOfTheBestModel]] < 0.5, 0, 1) == Y[inTest] %>% as.integer() - 1) %>%
              rowid_to_column %>% dplyr::arrange(-Probs) %>%
                dplyr::filter(Probs <= 1, Obs == 'Good', Equal == TRUE) %>%
                  dplyr::select(rowid) %>% pull %>% .[1]
  
  # Recovery of Names of Selected Scale Variables
  pdp1 <- DALEX::prediction_breakdown(explainer_classif_1, observation = X[inTest, ][new_obj, ])
  pdp1$variable <- pdp1$variable_value
  pdp1[1, 'variable'] <- '(Intercept)'; pdp1[nrow(pdp1), 'variable'] <- 'final_prognosis'
  
  # pdp2 <- DALEX::prediction_breakdown(explainer_classif_2, observation = X[inTest, ][new_obj, ])
  # pdp2$variable <- pdp2$variable_value
  # pdp2[1, 'variable'] <- '(Intercept)'; pdp2[nrow(pdp2), 'variable'] <- 'final_prognosis'
  # 
  # pdp3 <- DALEX::prediction_breakdown(explainer_classif_3, observation = X[inTest, ][new_obj, ])
  # pdp3$variable <- pdp3$variable_value
  # pdp3[1, 'variable'] <- '(Intercept)'; pdp3[nrow(pdp3), 'variable'] <- 'final_prognosis'
  
  p2 <- plot(pdp1, vcolors = c('-1' = 'tomato3', '0' = '#f5f5f5', '1' = 'palegreen3', 'X' = '#00BFC4')) +
    # theme(strip.background = element_rect(fill = 'gray45')) + 
    # theme(strip.text = element_text(colour = 'white')) +
    theme_grey() + theme(legend.position = 'none', panel.border = element_blank()) +
    labs(title = paste0('Reference (', new_obj, ' observation) = True ', Y[inTest][ new_obj ]))
  # plot(pdp2) +
  #   labs(subtitle = paste0('Reference (', new_obj, ' observation) = True ', Y[inTest][ new_obj ]))
  # plot(pdp3) +
  #   labs(subtitle = paste0('Reference (', new_obj, ' observation) = True ', Y[inTest][ new_obj ]))
  print(p2)

  # Output Chart: Explanations of Model by Predictors of True Good Observation
  openxlsx::insertPlot(wb, sheet = 'IV Table', xy = c(7, nrow(binning.df) + 5), width = 10 * (1 + sqrt(5)) / 2,
                       height = 10, units = 'cm')
  
  if (TheBestModel$Description != 'LogisticRegression')
    openxlsx::insertPlot(wb, sheet = Scorecard, xy=c(1, 21), width = 10*(1 + sqrt(5))/2, height=10, units = 'cm')

  # True Bad Observation

  # For the True Bad observation, choose a level of the most important predictor
  # other than the level that holds the most Good observations
  AnotherLevel4TrueBad <- table(X[inTest, TheImportancePredictor], Y[inTest])[-which.max(table(X[inTest,
    TheImportancePredictor], Y[inTest])[, 'Good']), 'Bad'] %>%
      which.max %>% names
  
  if (is.null(AnotherLevel4TrueBad)) {
    AnotherLevel4TrueBad <- table(X[inTest, TheImportancePredictor], Y[inTest]) %>%
        as.data.frame(stringsAsFactors = FALSE) %>% 
          setNames(c('Levels', 'Classes', 'Freq')) %>%
            dplyr::filter(Classes == 'Bad', Freq > 0) %>%
              dplyr::select(Levels) %>% pull    
  }
  
  new_obj <- data.frame(Probs = preds[[NameOfTheBestModel]], Obs = Y[inTest],
    Equal = ifelse(preds[[NameOfTheBestModel]] < 0.5, 0, 1) == Y[inTest] %>% as.integer() - 1,
    TheImportancePredictor = X[inTest, TheImportancePredictor]) %>%
      rowid_to_column %>% dplyr::arrange(-Probs) %>%
        dplyr::filter(Probs < 0.4, Obs == 'Bad', Equal == TRUE,
                      TheImportancePredictor == AnotherLevel4TrueBad) %>% 
          dplyr::select(rowid) %>% pull %>% .[1]
  
  pdp1 <- DALEX::prediction_breakdown(explainer_classif_1, observation = X[inTest, ][new_obj, ])
  pdp1$variable <- pdp1$variable_value
  pdp1[1, 'variable'] <- '(Intercept)'; pdp1[nrow(pdp1), 'variable'] <- 'final_prognosis'
  
  # pdp2 <- DALEX::prediction_breakdown(explainer_classif_2, observation = X[inTest, ][new_obj, ])
  # pdp2$variable <- pdp2$variable_value
  # pdp2[1, 'variable'] <- '(Intercept)'; pdp2[nrow(pdp2), 'variable'] <- 'final_prognosis'
  # 
  # pdp3 <- DALEX::prediction_breakdown(explainer_classif_3, observation = X[inTest, ][new_obj, ])
  # pdp3$variable <- pdp3$variable_value
  # pdp3[1, 'variable'] <- '(Intercept)'; pdp3[nrow(pdp3), 'variable'] <- 'final_prognosis'
  
  p3 <- plot(pdp1, vcolors = c('-1' = 'tomato3', '0' = '#f5f5f5', '1' = 'palegreen3', 'X' = '#F8766D')) +
    # theme(strip.background = element_rect(fill = 'gray45')) + 
    # theme(strip.text = element_text(colour = 'white')) +
    theme_grey() + theme(legend.position = 'none', panel.border = element_blank()) +
    labs(title = paste0('Reference (', new_obj, ' observation) = True ', Y[inTest][ new_obj ]))
  # plot(pdp2) +
  #   labs(subtitle = paste0('Reference (', new_obj, ' observation) = True ', Y[inTest][ new_obj ]))
  # plot(pdp3) +
  #   labs(subtitle = paste0('Reference (', new_obj, ' observation) = True ', Y[inTest][ new_obj ]))
  print(p3)
  
  # Output Chart: Explanations of Model by Predictors of True Bad Observation
  openxlsx::insertPlot(wb, sheet = 'IV Table', xy = c(15, nrow(binning.df) + 5), width = 10 * (1+sqrt(5)) / 2,
                       height = 10, units = 'cm')
  
  if (TheBestModel$Description != 'LogisticRegression')
    openxlsx::insertPlot(wb, sheet = Scorecard, xy=c(1, 40), width = 10*(1 + sqrt(5))/2, height=10, units = 'cm')
} # End if ncol(X) <= Max_Vars

A model-agnostic tool for decomposing the predictions of black-box models. The Break Down table shows the contribution of every variable to the final prediction, and the Break Down plot presents these contributions in a concise graphical form. The package works for binary classifiers and general regression models.
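
For an additive linear model the break-down idea reduces to simple arithmetic: each variable contributes its coefficient times the distance of the observation from the variable's mean, and the contributions add up from a baseline to the final prediction. The sketch below illustrates this on the built-in `mtcars` data. The dataset, the `lm` model and all variable names are assumptions chosen for illustration; the model-agnostic procedure for black-box models is more involved.

```r
# Illustrative linear model on a built-in dataset (assumption)
fit   <- lm(mpg ~ wt + hp, data = mtcars)
vars  <- c("wt", "hp")
obs   <- unlist(mtcars[1, vars])      # the observation to explain
beta  <- coef(fit)[vars]
means <- colMeans(mtcars[, vars])

# Baseline: prediction at the average values of the predictors
baseline <- unname(coef(fit)["(Intercept)"] + sum(beta * means))

# Contribution of each variable relative to the baseline
contribution <- beta * (obs - means)

# The final prediction is the baseline plus all contributions
final_prognosis <- baseline + sum(contribution)

print(round(c(baseline = baseline, contribution, final_prognosis = final_prognosis), 3))
```

The rows printed here play the same role as the '(Intercept)', per-variable and 'final_prognosis' rows of the break-down plots in this report.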

The Best Classifier for Score Modeling

## 
## rxFastTrees - Estimate a Area Under the ROC Curve (AUC) on Testing Set = 66.15%

## 
## rxFastTrees - Estimate a Kolmogorov-Smirnov Statistic on Testing Set = 0.2372

## 
## rxFastTrees - Estimate a Kullback-Leibler’s Divergence Statistic on Testing Set = 0.0866

Finally, let’s build a Scorecard using the selected Logit-Model on the Test dataset.
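
The scaling used for the scorecard can be checked with a few lines of R before reading the full code. The sketch assumes the same two constants as the code that follows (50 points to double the odds and an offset of 500); the names `pdo`, `offset`, `scale_factor` and the logit coefficients are hypothetical and serve only as an illustration.

```r
pdo    <- 50     # points to double the odds (called I in the report's code)
offset <- 500    # score at even odds (called c_ in the report's code)
scale_factor <- pdo / log(2)

# Score from the probability of being Good
score_from_prob <- function(p_good) {
  round(offset + scale_factor * log(p_good / (1 - p_good)))
}

# Odds of 1:1, 2:1 and 4:1; each doubling adds 50 points
score_from_prob(c(0.5, 2 / 3, 0.8))   # 500 550 600

# Hypothetical logit coefficients for one applicant: the intercept and the
# coefficient of the attribute that applies within each of three predictors
beta0 <- 0.3
beta  <- c(age = 0.4, ltv = -0.2, bureau = 0.1)

# The intercept's share of the score is spread evenly over the predictors
common_score <- (offset + beta0 * scale_factor) / length(beta)
points       <- beta * scale_factor + common_score

# The attribute points add up to the score computed from the full logit
total_points <- sum(points)
direct_score <- offset + scale_factor * (beta0 + sum(beta))
```

Spreading the intercept over the predictors keeps every attribute's points on a comparable scale, so the scorecard can be applied by simple addition.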

I  = 50           # Score increment: points to double the odds
c_ = 500          # Score offset: the score at even odds (boundary between Good & Bad classes)

if (TheBestModel$Description == 'LogisticRegression') {
  # Create Scorecard for GLM
  
  Attributes <- vector()
  Levels <- vector()
  Predictors <- vector()
  Totals <- vector()
  ObBads <- vector()
  ObGoods <- vector()
  PrGoods <- vector()
  PrClass <- rxImport(inData= fitTestScores, varsToKeep = 'PredictedLabel.rxLogisticRegression') %>%
    pull
  Chi_squared_test <- vector()
  
  for (i in 1:length(rxLogisticRegressionFit$params$formulaVars %>% .[-1])) {
    
    Attributes0 <- 
      rxLogisticRegressionFit$coefficients %>% names() %>% 
        .[ grep(pattern = rxLogisticRegressionFit$params$formulaVars %>% .[-1] %>% .[i],
                x = rxLogisticRegressionFit$coefficients %>% names()) ] %>%
          sort %>% 
            dplyr::as_tibble() %>%
              tidyr::separate(value, into = c('empty', 'Attributes'), 
                              sep=paste0(rxLogisticRegressionFit$params$formulaVars %>% .[-1] %>% .[i], '.')) %>%
                pull(Attributes) 
    Attributes <- c(Attributes, Attributes0)
    
    Levels0 <- rxLogisticRegressionFit$coefficients %>% names() %>% 
      .[ grep(pattern = rxLogisticRegressionFit$params$formulaVars %>% .[-1] %>% .[i],
              x = rxLogisticRegressionFit$coefficients %>% names()) ] %>% sort
    Levels <- c(Levels, Levels0)

    Predictors <- c(Predictors, rep(x = rxLogisticRegressionFit$params$formulaVars %>% .[-1] %>% .[i],
                                times = length(Levels0)))
    
    Totals <- c(Totals, table(X[inTest, rxLogisticRegressionFit$params$formulaVars %>% .[-1] %>% .[i] ]) %>% as.vector)
    
    ObBads <- c(ObBads, table(X[inTest, rxLogisticRegressionFit$params$formulaVars %>% .[-1] %>% .[i] ], Y[inTest]) %>% 
                .[, 'Bad'] %>% as.vector)
    
    ObGoods <- c(ObGoods, table(X[inTest, rxLogisticRegressionFit$params$formulaVars %>% .[-1] %>% .[i] ], Y[inTest]) %>%
                 .[, 'Good'] %>% as.vector)
  
    PrGoods <- c(PrGoods, table(X[inTest, rxLogisticRegressionFit$params$formulaVars %>% .[-1] %>% .[i] ], PrClass) %>%
                 .[, 'Good'] %>% as.vector)
  
    # Chi squared test (contingency table) for each row in a table by each Predictors
    Chi_squared_test <-  c( Chi_squared_test, apply(table(X[inTest,
      rxLogisticRegressionFit$params$formulaVars %>% .[-1] %>% .[i] ], Y[inTest]), 1, 
      function(x) chisq.test(matrix(x, ncol = levels(Y) %>% length))$p.value) )
  }
  
  # Part of Intercept's Score / Number of Features
  CommonScore = (c_ + rxLogisticRegressionFit$coefficients[1] * I / log(2)) /
    length(rxLogisticRegressionFit$params$formulaVars %>% .[-1])
  
  # Create a Data.Frame by Predictors with Coefficients & Scores
  Scorecard.df <- 
    # dplyr::left_join(  # faster join as.integer, then as.factor and finally as.character
    #   x =  data.frame(Predictors, Attributes, Levels, Totals, ObBads, ObGoods, PrGoods, Chi_squared_test,
    #                   stringsAsFactors = FALSE), 
    #   y = data.frame(Names = rxLogisticRegressionFit$coefficients %>% attr('names'),
    #                  Coefficients = rxLogisticRegressionFit$coefficients,
    #                  stringsAsFactors = FALSE), 
    #   by = c('Levels' = 'Names')
    #   ) %>%
    dplyr::left_join(  # faster join as.integer, then as.factor and finally as.character
      x = data.frame(Predictors, Attributes, Levels,
                     stringsAsFactors = FALSE), 
      y = data.frame(Levels = paste0( attr(Chi_squared_test, 'names') %>%
                                        str_extract_all(., boundary('word')) %>% transpose() %>% .[2] %>% unlist,
                                      '_fct.', attr(Chi_squared_test, 'names') ),
                     Totals, ObBads, ObGoods, PrGoods, Chi_squared_test,
                     stringsAsFactors = FALSE), 
      by = c('Levels' = 'Levels') ) %>% 
      # Attaching tables with different lengths due to the small gradations of some predictors
      dplyr::left_join(  # faster join as.integer, then as.factor and finally as.character
        x = ., 
        y = data.frame(Names = rxLogisticRegressionFit$coefficients %>% attr('names'),
                       Coefficients = rxLogisticRegressionFit$coefficients,
                       stringsAsFactors = FALSE), 
        by = c('Levels' = 'Names') ) %>%
          tidyr::replace_na(list(Coefficients = 0)) %>% 
            dplyr::mutate(
                          Total = Totals, Bad = ObBads, Good = ObGoods,
                          `Share of Total` = Totals / length(Y[inTest]),
                          `Chi Squared` = Chi_squared_test,
                          `Pred Good` = PrGoods,
                          `Sensitivity by Levels` = PrGoods / Totals,
                          Scores = round(Coefficients * I / log(2) + CommonScore, 0)) %>% 
              dplyr::select(-Levels, -Chi_squared_test, -Totals, -ObBads, -ObGoods, -PrGoods)
    
  ListOfPredictors <- vector()
  for (i in 1:length(rxLogisticRegressionFit$params$formulaVars %>% .[-1])) {
    ListOfPredictors <- c( ListOfPredictors, length(rxLogisticRegressionFit$coefficients %>% names() %>% 
    .[ grep(pattern = rxLogisticRegressionFit$params$formulaVars %>% .[-1] %>% .[i],
            x = rxLogisticRegressionFit$coefficients %>% names()) ]) )
    attr(ListOfPredictors, 'names')[i] <- rxLogisticRegressionFit$params$formulaVars %>% .[-1] %>% .[i]
  }
  
  header.df <- data.frame(
    A = c('Scorecard by Logit-Model and Result of Classification on TESTING Set', 'Logit Model'),
    B = c(NA, NA), C = c(NA, NA),
    D = c(NA, 'Observed Distribution (Reference)'),
    E = c(NA, NA), F = c(NA, NA), G = c(NA, NA), H = c(NA, NA),
    I = c(NA, 'Prediction'),
    stringsAsFactors = FALSE)
  
  ListOfHeaders <- c(2, 5, 3)
  attr(ListOfHeaders, 'names') <- c(header.df[2, 'A'], header.df[2, 'D'], header.df[2, 'I'])
  
  # Show a Data.Frame by Predictors with Coefficients & Scores
  
  Scorecard.df %>% 
    mutate( # Predictors = cell_spec(Predictors, bold = TRUE),
      Attributes = Attributes,
      Coefficients = cell_spec(Coefficients, 'html', color=ifelse(Scores >= arrange(., desc(Scores))[nrow(.)*.25,
                                                                  'Scores']%>% as.numeric, 'darkgreen', 'black')),
      Total, Bad, Good, 
      `Share of Total` = formattable::percent(`Share of Total`, digits = 1),
      `Chi Squared` = cell_spec(formattable(`Chi Squared`, format = "f", digits = 4), 'html', 
                                color = ifelse(`Chi Squared` >= 0.05, 'orangered', 'darkgray')),
      `Pred Good`, 
      `Sensitivity by Levels` = proportion_bar('khaki')(round(`Sensitivity by Levels`, 2)),
      Scores = proportion_bar('chartreuse')(Scores) ) %>% dplyr::select(-Predictors) %>% 
  knitr::kable(format = 'html', digits = 4, longtable = TRUE, booktabs = TRUE, escape = F,
             # col.names = c('Levels of Predictors', 'Coefficients', 'Total', 'Bad', 'Good', 'Share of Total',
             #              'Chi²', 'Predicted Good', 'Sensitivity by Levels', 'Scores'),
             caption = header.df[1, 'A']) %>% 
           kableExtra::kable_styling(bootstrap_options = c('striped', 'hover', 'condensed', 'responsive'),
                                     full_width = FALSE) %>% 
             # kableExtra::column_spec(10, width = '5cm') %>%
               kableExtra::add_header_above(ListOfHeaders) # %>% 
                 #kableExtra::group_rows(index = ListOfPredictors)
  
  # Export a Data.Frame with Coefficients & Scores into MS Excel
  N <- 4:(nrow(Scorecard.df) + 3)
  writeDataTable(wb, sheet = Scorecard, x = Scorecard.df, tableStyle = 'TableStyleMedium2', startCol = 'A',
                 startRow = 3, tableName = 'Scorecard', firstColumn = TRUE, lastColumn = TRUE, bandedRows = TRUE)

  # Set Columns widths
  setColWidths(wb, sheet=Scorecard, cols=1:ncol(Scorecard.df), widths = c(14, 15, 11, 7, 7, 7, 8, 9, 7, 11, 10))
  mergeCells(wb, sheet = Scorecard, cols = 1:3, rows = 2)
  mergeCells(wb, sheet = Scorecard, cols = 4:8, rows = 2)
  mergeCells(wb, sheet = Scorecard, cols = 9:11, rows = 2)
  
  # # Set Row heights
  # setRowHeights(wb, sheet = Scorecard, rows = 1, heights = 45)
  
  # Set Styles & Conditional Formattings in Columns
  addStyle(wb, sheet = Scorecard, style = createStyle(wrapText = TRUE, halign = 'center', valign = 'center'),
                                              cols = 1:ncol(Scorecard.df), rows = 3)
  conditionalFormatting(wb, sheet = Scorecard, cols = 3, rows = N, type = "between",               # Coefficients
                        rule = c(quantile(Scorecard.df$Coefficients)['75%'], max(Scorecard.df$Coefficients)),
                        style = createStyle(fontColour = 'darkgreen'))   
  addStyle(wb, sheet=Scorecard, cols=3, rows=N, style = createStyle(border = 'right', borderColour = '#4F81BC'))
  addStyle(wb, sheet = Scorecard, createStyle(numFmt = 'comma'), cols = 4:6, rows = N, gridExpand = TRUE)
  addStyle(wb, sheet = Scorecard, cols = 7, rows = N, style = createStyle(numFmt = '0%'))
  addStyle(wb, sheet = Scorecard, cols = 8, rows = N, style = createStyle(border = 'right', 
           borderColour = '#4F81BC', fontColour = 'darkgrey', numFmt = paste0('0', options()$OutDec, '0000')))
  conditionalFormatting(wb, sheet = Scorecard, cols = 8, rows = N, type = "between", rule = c(0.05, 1))   # Chi²
  addStyle(wb, sheet = Scorecard, cols = 9, rows = N, style = createStyle(numFmt = 'COMMA'))
  addStyle(wb, sheet = Scorecard, cols = 10, rows = N, 
           style = createStyle(numFmt = paste0('0', options()$OutDec, '0000')))
  conditionalFormatting(wb, sheet = Scorecard, cols = ncol(Scorecard.df) - 1, rows = N,
                                  style = c('red', 'khaki'), type = 'databar')
  conditionalFormatting(wb, sheet = Scorecard, cols = ncol(Scorecard.df), rows = N,
                                  style = c('red', 'chartreuse'), type = 'databar')
  
  writeData(wb, sheet = Scorecard, header.df, colNames = FALSE, rowNames = FALSE, startCol = 'A', startRow = 1)
  addStyle(wb, sheet = Scorecard, cols=1, rows = 1, style = createStyle(fontSize = 16, textDecoration = 'bold'))
  addStyle(wb, sheet = Scorecard, cols = 1:ncol(Scorecard.df), rows = 2, style = createStyle(wrapText = TRUE,
        halign = 'center', valign = 'center', fontColour = 'white', fgFill = '#4F81BC', textDecoration = 'bold'))
  addStyle(wb, sheet = Scorecard, cols=3, rows=3, style = createStyle(border = 'right', borderColour = 'white'))
  addStyle(wb, sheet = Scorecard, cols=8, rows=3, style = createStyle(border = 'right', borderColour = 'white'))

  remove(Attributes, Attributes0, Levels, Levels0, Predictors, Totals, ObBads, ObGoods, PrClass, PrGoods,
         Chi_squared_test, ListOfPredictors, ListOfHeaders, header.df, N)
  
} else { # Not Logit-Models
  
  writeData(wb, sheet = Scorecard, data.frame(A = c(paste0('Scorecard by Model `', NameOfTheBestModel ,
                                                   '` and Result of Classification on TESTING Set'))),
            colNames = FALSE, rowNames = FALSE, startCol = 'A', startRow = 1)
  addStyle(wb, sheet = Scorecard, cols=1, rows = 1, style = createStyle(fontSize = 16, textDecoration = 'bold'))
  
} # End if == LogisticRegression

openxlsx::renameWorksheet(wb, Scorecard, 'MLS')
openxlsx::writeFormula(wb, sheet = 'IV Table', x = makeHyperlinkString(sheet = 'MLS', row = 1, col = 1,
                       text = 'Scorecard: MLS'), startCol = 'A', startRow = nrow(binning.df) + 4)
openxlsx::addStyle(wb,sheet = 'IV Table', cols = 1, rows = nrow(binning.df) + 4, 
                   style = createStyle(fontColour = 'brown', textDecoration = 'bold'))

# Supplement Variable Table with Importance Feature
binning.df %>%
  # dplyr::mutate_if(is.factor, as.character) %>%
    dplyr::left_join(importance.df, by = c('Variable' = 'Variable')) %>%
      openxlsx::writeDataTable(wb, sheet = 'IV Table', x = ., tableStyle = 'TableStyleMedium4', startCol = 'A',
                  startRow = 2, tableName = 'IVTable', firstColumn = FALSE, lastColumn = TRUE, bandedRows = TRUE)
## Warning: Column `Variable` joining factors with different levels, coercing
## to character vector
openxlsx::writeComment(wb, sheet = 'IV Table', xy = c(ncol(binning.df) + 1, 2),
              comment = openxlsx::createComment(comment = 'Importance Feature by Gini coefficient (MLS)',
                                                height = .6))
  
# Recovering First Column of Names with Hyperlinks
for (i in 1:nrow(binning.df)) {
  ## Internal - Text to display
  val = binning.df[i, 'Variable']
  writeFormula(wb, sheet = 'IV Table', startCol = 'A', startRow = i + 2, 
    x = makeHyperlinkString(sheet = val, row = 1, col = 1, text = val))
}

# Set Columns widths
openxlsx::setColWidths(wb, sheet = 'IV Table', cols = 1:2, widths = c(32, 12))
openxlsx::setColWidths(wb, sheet = 'IV Table', cols = ncol(binning.df):(ncol(binning.df)+1), widths = c(12, 13))

N <- 3:(nrow(binning.df) + 2)
openxlsx::conditionalFormatting(wb, sheet = 'IV Table', cols = 2, rows = N, type = 'databar',
                                border = FALSE, style = c('red', 'royalblue'))
openxlsx::conditionalFormatting(wb, sheet = 'IV Table', type = 'databar', cols = ncol(binning.df) + 1, 
                                rows =3:(nrow(binning.df)+2), border = FALSE, style = c('tomato3', 'palegreen3'))
openxlsx::addStyle(wb, sheet = 'IV Table', cols = 2, rows = N, 
                   style = openxlsx::createStyle(border = 'right', borderColour = '#9CB95C'))
openxlsx::addStyle(wb, sheet = 'IV Table', cols = 10, rows = N, 
                   style = openxlsx::createStyle(border = 'right', borderColour = '#9CB95C'))
openxlsx::addStyle(wb, sheet = 'IV Table', cols = ncol(binning.df) + 1, rows = 3:(nrow(binning.df)+2), 
                   style = openxlsx::createStyle(numFmt = paste0('0', options()$OutDec, '0000')))

openxlsx::writeFormula(wb, sheet = 'IV Table', x = paste0('=T("Table of Variables (', LoadingData,  ')")'),
                       startCol = 'A', startRow = 1)
openxlsx::addStyle(wb, sheet = 'IV Table', cols = 1, rows = 1, 
                   style = openxlsx::createStyle(fontSize = 16, textDecoration = 'bold'))

# Open MS Excel 
openxlsx::openXL(wb)

remove(p1, p2, p3) # , wb)

Let’s examine the characteristics of the scorecard built from the selected predictors.

## 
##   Overall Performance Metrics 
##   -------------------------------------------------- 
##                     KS : 0.2362 (Unpredictive)
##                    AUC : 0.6615 (Poor)
## 
##   Classification Matrix 
##   -------------------------------------------------- 
##            Cutoff (>=) : 500 (User Defined)
##    True Positives (TP) : 11626
##   False Positives (FP) : 2049
##   False Negatives (FN) : 6636
##    True Negatives (TN) : 3005
##    Total Positives (P) : 18262
##    Total Negatives (N) : 5054
## 
##   Business/Performance Metrics 
##   -------------------------------------------------- 
##       %Records>=Cutoff : 0.5865
##              Good Rate : 0.8502 (Vs 0.7832 Overall)
##               Bad Rate : 0.1498 (Vs 0.2168 Overall)
##         Accuracy (ACC) : 0.6275
##      Sensitivity (TPR) : 0.6366
##  False Neg. Rate (FNR) : 0.3634
##  False Pos. Rate (FPR) : 0.4054
##      Specificity (TNR) : 0.5946
##        Precision (PPV) : 0.8502
##   False Discovery Rate : 0.1498
##     False Omision Rate : 0.6883
##   Inv. Precision (NPV) : 0.3117
## 
##   Note: 0 rows deleted due to missing data.
## Warning in if (class(TheBestModel) == "rxNaiveBayes") {: the condition has
## length > 1 and only the first element will be used
## Beginning processing data.
## Rows Read: 345546, Read Time: 0, Transform Time: 0
## Beginning processing data.
## Elapsed time: 00:00:04.1849337
## Finished writing 345546 rows.
## Writing completed.
## Rows Read: 345546, Total Rows Processed: 345546, Total Chunk Time: 0.049 seconds
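
The business metrics reported above follow directly from the four cells of the classification matrix, with Good treated as the positive class at the cutoff of 500. The sketch below recomputes several of them from the printed counts; the variable names are chosen for illustration.

```r
# Counts taken from the classification matrix printed above
TP <- 11626   # Good, scored at or above the cutoff
FP <- 2049    # Bad,  scored at or above the cutoff
FN <- 6636    # Good, scored below the cutoff
TN <- 3005    # Bad,  scored below the cutoff
total <- TP + FP + FN + TN

metrics <- c(
  accuracy           = (TP + TN) / total,
  sensitivity        = TP / (TP + FN),
  specificity        = TN / (TN + FP),
  precision          = TP / (TP + FP),
  npv                = TN / (TN + FN),
  share_above_cutoff = (TP + FP) / total
)

round(metrics, 4)
# accuracy 0.6275, sensitivity 0.6366, specificity 0.5946,
# precision 0.8502, npv 0.3117, share_above_cutoff 0.5865
```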
if (class(DT) %>% length == 1) setDT(DT)

Output.tbl <- 
  cbind(DT[, c('UniqueID', gsub('_fct', '', names(X))), with = FALSE], X,  #
    data.frame(
       `Probability of Default` = 1 - Probs
      , Scores = round((c_ + (I / log(2))  * log(Probs / (1 - Probs))) + 0, 0)
      , Prediction = ifelse(Probs < 0.5, 'Bad', 'Good')
      , Reference = Y)
    )

# Create DataTable in MS Excel 
openxlsx::addWorksheet(wb2 <- openxlsx::createWorkbook(), sheetName = 'Data', gridLines = FALSE)
openxlsx::writeDataTable(wb2, sheet = 'Data', x = Output.tbl, tableStyle = 'TableStyleMedium6', 
                         tableName = 'Data', firstColumn = TRUE, lastColumn = TRUE, bandedRows = TRUE)
# # Writing Comments into cells
# for (i in 1:ncol(Output.tbl)) 
#   openxlsx::writeComment(wb2, sheet = 'Data', xy = c(i, 1),
#                          comment = openxlsx::createComment(comment = attr(Output.tbl, 'variable.labels')[i], 
#                                     visible = FALSE, width = 2, height = 10, style = createStyle(fontSize = 8)))
# Set Columns widths
openxlsx::setColWidths(wb2, sheet = 'Data', cols = 1:2, widths = c(12, 10))

openxlsx::freezePane(wb2, 'Data', firstCol = TRUE)  # shortcut to firstActiveCol = 2
# openxlsx::conditionalFormatting(wb2, sheet = 'Data', type = 'expression', cols = 1:ncol(Output.tbl), 
#                                 rows =2:(nrow(Output.tbl)+1), rule = '$A2=="АО "&CHAR(34)&"Евразийский Банк"&CHAR(34)', style = createStyle(fontColour = 'darkgreen', bgFill = 'darkseagreen1', textDecoration = 'bold'))
# openxlsx::conditionalFormatting(wb2, sheet = 'Data', type = 'expression', cols = 1:ncol(Output.tbl), 
#                                 rows =2:(nrow(Output.tbl)+1), rule = '$A2=="АО "&CHAR(34)&"АТФБанк"&CHAR(34)', style = createStyle(fontColour = '#505000', bgFill = 'lemonchiffon', textDecoration = 'bold'))
# openxlsx::conditionalFormatting(wb2, sheet = 'Data', type = 'expression', cols = 1:ncol(Output.tbl), 
#                                 rows =2:(nrow(Output.tbl)+1), rule = '$A2=="ДБ АО "&CHAR(34)&"Банк Хоум Кредит"&CHAR(34)', style = createStyle(fontColour = 'coral4', bgFill = 'coral', textDecoration = 'bold'))

openxlsx::openXL(wb2)
## Beginning processing data.
## Rows Read: 112392, Read Time: 0, Transform Time: 0
## Beginning processing data.
## Elapsed time: 00:00:01.4410516
## Finished writing 112392 rows.
## Writing completed.
## Rows Read: 112392, Total Rows Processed: 112392, Total Chunk Time: 0.033 seconds

Conclusion

The best results on the Public Leaderboard of the LTFS Data Science FinHack are above 0.6731 AUC or Gini 0.3642. However, the algorithms implemented in R or Python could not solve this problem properly.

Clearly, the classification problem remains unresolved. It was not possible to find or construct predictors with enough separation power to distinguish defaulting from non-defaulting vehicle loans. None of the many diverse classification models exceeded AUC 0.70 or Gini 0.40 on the test dataset.
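
AUC and Gini are two views of the same ranking quality, linked by Gini = 2 * AUC - 1. The sketch below computes AUC with the rank (Mann-Whitney) formula on a toy example and converts it to Gini. The toy labels and scores and the function names `auc_rank` and `gini_from_auc` are assumptions made for illustration.

```r
# AUC from ranks: the share of (Good, Bad) pairs in which Good scores higher
auc_rank <- function(labels, scores) {
  r  <- rank(scores)
  n1 <- sum(labels == 1)
  n0 <- sum(labels == 0)
  (sum(r[labels == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

gini_from_auc <- function(auc) 2 * auc - 1

# Toy example: 3 Good (1) and 2 Bad (0) observations, 5 of 6 pairs ordered correctly
labels <- c(1, 1, 0, 1, 0)
scores <- c(0.9, 0.8, 0.7, 0.6, 0.2)
toy_auc <- auc_rank(labels, scores)

# The threshold named in the text and the test-set AUC of the best model
gini_from_auc(c(0.70, 0.6615))   # 0.400 0.323
```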

Although the algorithms of Microsoft Machine Learning (Microsoft ML Server 9.4.7) run quite fast, they cannot yet solve this classification problem. Perhaps this is because the share of defaulting (bad) Indian borrowers on car loans is very large: it exceeds 20%.

Finish of Session

You can execute this R Markdown file as a job. To do so, create an R file containing the following code:

rmarkdown::render(input = '1._MLS_.rmd', output_format = c('html_document'))

Then run that R file from the subdirectory ~/Projects/.
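
A slightly more defensive version of such a job script is sketched below. The script name `render_job.R` and the wrapper `render_report` are hypothetical; only `rmarkdown::render` and the input file name come from the text above. The wrapper stops early with a clear message when the .rmd file is missing.

```r
# render_job.R (hypothetical file name)

# Render an R Markdown file to HTML, failing early if the file is missing
render_report <- function(input, output_dir = dirname(input)) {
  if (!file.exists(input)) {
    stop("R Markdown file not found: ", input)
  }
  rmarkdown::render(input = input,
                    output_format = "html_document",
                    output_dir = output_dir)
}

report <- "1._MLS_.rmd"
if (file.exists(report)) {
  render_report(report)
} else {
  message("Skipping render, file not found: ", report)
}
```

From a shell, a script like this could be started with `Rscript render_job.R` in the directory that holds the .rmd file.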

## - Session info ----------------------------------------------------------
##  setting  value                       
##  version  R version 3.5.2 (2018-12-20)
##  os       Windows 10 x64              
##  system   x86_64, mingw32             
##  ui       RTerm                       
##  language (EN)                        
##  collate  Russian_Russia.1251         
##  ctype    Russian_Russia.1251         
##  tz       Asia/Dhaka                  
##  date     2019-08-31                  
## 
## - Packages --------------------------------------------------------------
##  ! package          * version    date       lib
##    acepack            1.4.1      2016-10-29 [1]
##    agricolae          1.3-0      2019-01-07 [1]
##    ALEPlot            1.1        2018-05-24 [1]
##    AlgDesign          1.1-7.3    2014-10-15 [1]
##    assertthat         0.2.0      2017-04-11 [1]
##    backports          1.1.3      2018-12-14 [1]
##    base64enc          0.1-3      2015-07-28 [1]
##    bayesplot          1.6.0      2018-08-02 [1]
##    bindr              0.1.1      2018-03-13 [1]
##    bindrcpp         * 0.2.2      2018-03-29 [1]
##    bit                1.1-14     2018-05-29 [1]
##    bit64              0.9-7      2017-05-08 [1]
##    bitops             1.0-6      2013-08-17 [1]
##    blob               1.1.1      2018-03-25 [1]
##    boot               1.3-20     2017-08-06 [1]
##    breakDown        * 0.2.0      2019-08-15 [1]
##    broom              0.5.1      2018-12-05 [1]
##    callr              3.1.1      2018-12-21 [1]
##    caret            * 6.0-81     2018-11-20 [1]
##    caTools            1.17.1.1   2018-07-20 [1]
##    cellranger         1.1.0      2016-07-27 [1]
##    checkmate          1.9.1      2019-01-15 [1]
##    chron              2.3-53     2018-09-09 [1]
##    class              7.3-14     2015-08-30 [1]
##    cli                1.0.1      2018-09-25 [1]
##    cluster            2.0.7-1    2018-04-13 [1]
##    coda               0.19-2     2018-10-08 [1]
##    codetools          0.2-15     2016-10-05 [1]
##    colorspace         1.4-0      2019-01-13 [1]
##    colourpicker       1.0        2017-09-27 [1]
##    combinat           0.0-8      2012-10-29 [1]
##    CompatibilityAPI   1.1.0      2019-01-10 [1]
##    crayon             1.3.4      2017-09-16 [1]
##    crosstalk          1.0.0      2016-12-21 [1]
##    curl               3.3        2019-01-10 [1]
##    DALEX            * 0.2.6      2019-01-07 [1]
##    data.table       * 1.12.0     2019-01-13 [1]
##    DataExplorer     * 0.8.0      2019-08-24 [1]
##    DBI                1.0.0      2018-05-02 [1]
##    deldir             0.1-16     2019-01-04 [1]
##    desc               1.2.0      2018-05-01 [1]
##    DescTools          0.99.27    2019-01-19 [1]
##    devtools           2.0.1      2018-10-26 [1]
##    digest             0.6.18     2018-10-10 [1]
##    doParallel       * 1.0.14     2019-04-11 [1]
##    dplyr            * 0.7.8      2018-11-10 [1]
##    DT                 0.5        2018-11-05 [1]
##    dygraphs           1.1.1.6    2018-07-11 [1]
##    e1071              1.7-0.1    2019-01-21 [1]
##    embed              0.0.2      2018-11-19 [1]
##    evaluate           0.12       2018-10-09 [1]
##    expm               0.999-3    2018-09-22 [1]
##    factorMerger       0.4.0      2019-08-15 [1]
##    forcats          * 0.3.0      2018-02-19 [1]
##    foreach          * 1.5.1      2019-04-11 [1]
##    foreign            0.8-71     2018-07-20 [1]
##    formattable      * 0.2.0.1    2016-08-05 [1]
##    Formula          * 1.2-3      2018-05-03 [1]
##    fs                 1.2.6      2018-08-23 [1]
##    FSelectorRcpp    * 0.3.0      2018-11-12 [1]
##    gdata              2.18.0     2017-06-06 [1]
##    generics           0.0.2      2018-11-29 [1]
##    ggmosaic           0.2.0      2018-09-12 [1]
##    ggplot2          * 3.1.0      2018-10-25 [1]
##    ggpubr             0.2        2018-11-15 [1]
##    ggridges           0.5.1      2018-09-27 [1]
##    glmnet             2.0-16     2018-04-02 [1]
##    glue               1.3.0      2018-07-17 [1]
##    gmodels            2.18.1     2018-06-25 [1]
##    gower              0.1.2      2017-02-23 [1]
##    gplots           * 3.0.1.1    2019-01-27 [1]
##    gridExtra          2.3        2017-09-09 [1]
##    gsubfn           * 0.7        2018-03-16 [1]
##    gtable             0.2.0      2016-02-26 [1]
##    gtools             3.8.1      2018-06-26 [1]
##    haven              2.0.0      2018-11-22 [1]
##    highr              0.7        2018-06-09 [1]
##    hmeasure           1.0-1      2019-01-02 [1]
##    Hmisc            * 4.2-0      2019-01-26 [1]
##    hms                0.4.2      2018-03-10 [1]
##    htmlTable          1.13.1     2019-01-07 [1]
##    htmltools          0.3.6      2017-04-28 [1]
##    htmlwidgets        1.3        2018-09-30 [1]
##    httpuv             1.4.5.1    2018-12-18 [1]
##    httr               1.4.0      2018-12-11 [1]
##    igraph             1.2.2      2018-07-27 [1]
##    iml              * 0.8.1      2019-01-02 [1]
##    Information        0.0.9      2016-04-09 [1]
##    inline             0.3.15     2018-05-18 [1]
##    inum               1.0-0      2017-12-12 [1]
##    ipred              0.9-8      2018-11-05 [1]
##    iterators        * 1.0.11     2019-04-11 [1]
##    jsonlite           1.6        2018-12-07 [1]
##    kableExtra       * 1.0.1      2019-01-22 [1]
##    keras              2.2.4      2018-11-22 [1]
##    KernSmooth         2.23-15    2015-06-29 [1]
##    klaR               0.6-14     2018-03-19 [1]
##    knitr              1.21       2018-12-10 [1]
##    labeling           0.3        2014-08-23 [1]
##    later              0.7.5      2018-09-18 [1]
##    lattice          * 0.20-38    2018-11-04 [1]
##    latticeExtra       0.6-28     2016-02-09 [1]
##    lava               1.6.4      2018-11-25 [1]
##    lazyeval           0.2.1      2017-10-29 [1]
##    LearnBayes         2.15.1     2018-03-18 [1]
##    libcoin          * 1.0-2      2018-12-13 [1]
##    lme4               1.1-19     2018-11-10 [1]
##    loo                2.0.0      2018-04-11 [1]
##    lubridate          1.7.4      2018-04-11 [1]
##    magrittr         * 1.5        2014-11-22 [1]
##    manipulate         1.0.1      2014-12-24 [1]
##    markdown           0.9        2018-12-07 [1]
##    MASS               7.3-51.1   2018-11-01 [1]
##    Matrix             1.2-15     2018-11-01 [1]
##    matrixStats        0.54.0     2018-07-23 [1]
##    memoise            1.1.0      2017-04-21 [1]
##    Metrics            0.1.4      2018-07-09 [1]
##    MicrosoftML      * 9.4.7      2019-05-07 [1]
##    mime               0.6        2018-10-05 [1]
##    miniUI             0.1.1.1    2018-05-18 [1]
##    minqa              1.2.4      2014-10-09 [1]
##    ModelMetrics       1.2.2      2018-11-03 [1]
##    modelr             0.1.2      2018-05-11 [1]
##  D mrsdeploy        * 1.1.3      2019-05-15 [1]
##    munsell            0.5.0      2018-06-12 [1]
##    mvtnorm          * 1.0-8      2018-05-31 [1]
##    networkD3          0.4        2017-03-18 [1]
##    nlme               3.1-137    2018-04-07 [1]
##    nloptr             1.2.1      2018-10-03 [1]
##    nnet               7.3-12     2016-02-02 [1]
##    openxlsx         * 4.1.0      2018-05-26 [1]
##    pander             0.6.3      2018-11-06 [1]
##    partykit         * 1.2-3      2019-01-31 [1]
##    pdp                0.7.0      2018-08-27 [1]
##    pillar             1.3.1      2018-12-15 [1]
##    pkgbuild           1.0.2      2018-10-16 [1]
##    pkgconfig          2.0.2      2018-08-16 [1]
##    pkgload            1.0.2      2018-10-29 [1]
##    plotly           * 4.8.0      2018-07-20 [1]
##    plyr             * 1.8.4      2016-06-08 [1]
##    prettyunits        1.0.2      2015-07-13 [1]
##    processx           3.2.1      2018-12-05 [1]
##    prodlim            2018.04.18 2018-04-18 [1]
##    productplots       0.1.1      2016-07-02 [1]
##    promises           1.0.1      2018-04-13 [1]
##    proto            * 1.0.0      2016-10-29 [1]
##    proxy              0.4-22     2018-04-08 [1]
##    pryr               0.1.4      2018-02-18 [1]
##    ps                 1.3.0      2018-12-21 [1]
##    purrr            * 0.3.0      2019-01-27 [1]
##    pwr              * 1.2-2      2018-03-03 [1]
##    questionr          0.7.0      2018-11-26 [1]
##    R6                 2.3.0      2018-10-04 [1]
##    rapportools        1.0        2014-01-07 [1]
##    RColorBrewer       1.1-2      2014-12-07 [1]
##    Rcpp               1.0.0      2018-11-07 [1]
##    RCurl              1.95-4.11  2018-07-15 [1]
##    readr            * 1.3.1      2018-12-21 [1]
##    readxl             1.2.0      2018-12-19 [1]
##    recipes            0.1.4      2018-11-19 [1]
##    remotes            2.0.2      2018-10-30 [1]
##    reshape2         * 1.4.3      2017-12-11 [1]
##    reticulate         1.10       2018-08-05 [1]
##    RevoMods         * 11.0.1     2019-04-11 [1]
##    RevoScaleR       * 9.4.7      2019-05-21 [1]
##    RevoUtils        * 11.0.2     2019-04-11 [1]
##    RevoUtilsMath    * 11.0.0     2019-04-24 [1]
##    rlang              0.3.1      2019-01-08 [1]
##    rmarkdown          1.11       2018-12-08 [1]
##    ROCR             * 1.0-7      2015-03-26 [1]
##    rpart            * 4.1-13     2018-02-23 [1]
##    rprojroot          1.3-2      2018-01-03 [1]
##    rsconnect          0.8.13     2019-01-10 [1]
##    RSQLite          * 2.1.1      2018-05-06 [1]
##    rstan              2.18.2     2018-11-07 [1]
##    rstanarm           2.18.2     2018-11-10 [1]
##    rstantools         1.5.1      2018-08-22 [1]
##    rstudioapi         0.9.0      2019-01-09 [1]
##    rvest              0.3.2      2016-06-17 [1]
##    scales             1.0.0      2018-08-09 [1]
##    sessioninfo        1.1.1      2018-11-05 [1]
##    shiny              1.2.0      2018-11-02 [1]
##    shinyjs            1.0        2018-01-08 [1]
##    shinystan          2.5.0      2018-05-01 [1]
##    shinythemes        1.1.2      2018-11-06 [1]
##    smbinning        * 0.8        2019-01-07 [1]
##    sp                 1.3-1      2018-06-05 [1]
##    spData             0.3.0      2019-01-07 [1]
##    spdep              0.8-1      2018-11-21 [1]
##    sqldf            * 0.4-11     2017-06-28 [1]
##    StanHeaders        2.18.1     2019-01-28 [1]
##    stringi            1.2.4      2018-07-20 [1]
##    stringr          * 1.3.1      2018-05-10 [1]
##    summarytools     * 0.8.8      2018-10-07 [1]
##    survival         * 2.43-3     2018-11-26 [1]
##    tensorflow         1.10       2018-11-19 [1]
##    testthat           2.0.1      2018-10-13 [1]
##    tfruns             1.4        2018-08-25 [1]
##    threejs            0.3.1      2017-08-13 [1]
##    tibble           * 2.0.1      2019-01-12 [1]
##    tidyr            * 0.8.2      2018-10-28 [1]
##    tidyselect         0.2.5      2018-10-11 [1]
##    tidyverse        * 1.2.1      2017-11-14 [1]
##    timeDate           3043.102   2018-02-21 [1]
##    usethis            1.4.0      2018-08-14 [1]
##    viridisLite        0.3.0      2018-02-01 [1]
##    webshot            0.5.1      2018-09-28 [1]
##    whisker            0.3-2      2013-04-28 [1]
##    withr              2.1.2      2018-03-15 [1]
##    woeBinning       * 0.1.6      2018-07-28 [1]
##    xfun               0.4        2018-10-23 [1]
##    xml2               1.2.0      2018-01-24 [1]
##    xtable             1.8-3      2018-08-29 [1]
##    xts                0.11-2     2018-11-05 [1]
##    yaImpute           1.0-31     2019-01-09 [1]
##    yaml               2.2.0      2018-07-25 [1]
##    zeallot            0.1.0      2018-01-28 [1]
##    zip                1.0.0      2017-04-25 [1]
##    zoo                1.8-4      2018-09-19 [1]
##  source                                  
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  Github (pbiecek/breakDown@ba9a0d9)      
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  local                                   
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  Github (boxuancui/DataExplorer@8a71951) 
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  local                                   
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  Github (MI2DataLab/factorMerger@c49e37f)
##  CRAN (R 3.5.2)                          
##  local                                   
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  local                                   
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  local                                   
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  local                                   
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  local                                   
##  local                                   
##  local                                   
##  local                                   
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
##  CRAN (R 3.5.2)                          
## 
## [1] C:/R/MLS/R_SERVER/library
## 
##  D -- DLL MD5 mismatch, broken installation.