Data

CUSTOMERID is dropped because it only identifies the customer and adds nothing to the analysis.

CHECKINGSTATUS LOANDURATION CREDITHISTORY LOANPURPOSE LOANAMOUNT EXISTINGSAVINGS EMPLOYMENTDURATION INSTALLMENTPERCENT SEX OTHERSONLOAN CURRENTRESIDENCEDURATION OWNSPROPERTY AGE INSTALLMENTPLANS HOUSING EXISTINGCREDITSCOUNT JOB DEPENDENTS TELEPHONE FOREIGNWORKER RISK
0_to_200 31 credits_paid_to_date other 1889 100_to_500 less_1 3 female none 3 savings_insurance 32 none own 1 skilled 1 none yes No Risk
less_0 18 credits_paid_to_date car_new 462 less_100 1_to_4 2 female none 2 savings_insurance 37 stores own 2 skilled 1 none yes No Risk
less_0 15 prior_payments_delayed furniture 250 less_100 1_to_4 2 male none 3 real_estate 28 none own 2 skilled 1 yes no No Risk
0_to_200 28 credits_paid_to_date retraining 3693 less_100 greater_7 3 male none 2 savings_insurance 32 none own 1 skilled 1 none yes No Risk
no_checking 28 prior_payments_delayed education 6235 500_to_1000 greater_7 3 male none 3 unknown 57 none own 2 skilled 1 none yes Risk
no_checking 32 outstanding_credit vacation 9604 500_to_1000 greater_7 6 male co-applicant 5 unknown 57 none free 2 skilled 2 yes yes Risk

Data structure

## Classes 'data.table' and 'data.frame':   5000 obs. of  7 variables:
##  $ LOANDURATION            : int  31 18 15 28 28 32 9 16 11 35 ...
##  $ AGE                     : int  32 37 28 32 57 57 41 36 22 49 ...
##  $ LOANAMOUNT              : int  1889 462 250 3693 6235 9604 1032 3109 4553 7138 ...
##  $ CURRENTRESIDENCEDURATION: int  3 2 3 2 3 5 4 1 3 4 ...
##  $ EXISTINGCREDITSCOUNT    : int  1 2 2 1 2 2 1 2 1 2 ...
##  $ INSTALLMENTPERCENT      : int  3 2 2 3 3 6 3 3 3 5 ...
##  $ DEPENDENTS              : int  1 1 1 1 1 2 1 1 1 2 ...
##  - attr(*, ".internal.selfref")=<externalptr>

We convert LOANDURATION, AGE, LOANAMOUNT and INSTALLMENTPERCENT to numeric. The categorical variables are stored as factors.

## Classes 'data.table' and 'data.frame':   5000 obs. of  7 variables:
##  $ LOANDURATION            : num  31 18 15 28 28 32 9 16 11 35 ...
##  $ AGE                     : num  32 37 28 32 57 57 41 36 22 49 ...
##  $ LOANAMOUNT              : num  1889 462 250 3693 6235 ...
##  $ CURRENTRESIDENCEDURATION: int  3 2 3 2 3 5 4 1 3 4 ...
##  $ EXISTINGCREDITSCOUNT    : int  1 2 2 1 2 2 1 2 1 2 ...
##  $ INSTALLMENTPERCENT      : num  3 2 2 3 3 6 3 3 3 5 ...
##  $ DEPENDENTS              : int  1 1 1 1 1 2 1 1 1 2 ...
##  - attr(*, ".internal.selfref")=<externalptr>
## Classes 'data.table' and 'data.frame':   5000 obs. of  13 variables:
##  $ OTHERSONLOAN      : Factor w/ 3 levels "co-applicant",..: 3 3 3 3 3 1 3 3 3 1 ...
##  $ EMPLOYMENTDURATION: Factor w/ 5 levels "1_to_4","4_to_7",..: 4 1 1 3 3 3 2 2 4 3 ...
##  $ SEX               : Factor w/ 2 levels "female","male": 1 1 2 2 2 2 2 1 1 2 ...
##  $ OWNSPROPERTY      : Factor w/ 4 levels "car_other","real_estate",..: 3 3 2 3 4 4 3 1 3 4 ...
##  $ TELEPHONE         : Factor w/ 2 levels "none","yes": 1 1 2 1 1 2 1 1 1 2 ...
##  $ CHECKINGSTATUS    : Factor w/ 4 levels "0_to_200","greater_200",..: 1 3 3 1 4 4 4 3 1 4 ...
##  $ HOUSING           : Factor w/ 3 levels "free","own","rent": 2 2 2 2 2 1 2 2 2 1 ...
##  $ CREDITHISTORY     : Factor w/ 5 levels "all_credits_paid_back",..: 2 2 5 2 5 4 5 2 2 4 ...
##  $ INSTALLMENTPLANS  : Factor w/ 3 levels "bank","none",..: 2 3 2 2 2 2 2 2 2 2 ...
##  $ JOB               : Factor w/ 4 levels "management_self-employed",..: 2 2 2 2 2 2 1 2 1 2 ...
##  $ EXISTINGSAVINGS   : Factor w/ 5 levels "100_to_500","500_to_1000",..: 1 4 4 4 2 2 1 4 4 2 ...
##  $ LOANPURPOSE       : Factor w/ 11 levels "appliances","business",..: 7 3 6 10 5 11 3 11 3 1 ...
##  $ FOREIGNWORKER     : Factor w/ 2 levels "no","yes": 2 2 1 2 2 2 2 2 2 2 ...
##  - attr(*, ".internal.selfref")=<externalptr>

Exploration of numeric variables

##   LOANDURATION        AGE          LOANAMOUNT    CURRENTRESIDENCEDURATION
##  Min.   : 4.00   Min.   :19.00   Min.   :  250   Min.   :1.000           
##  1st Qu.:13.00   1st Qu.:28.00   1st Qu.: 1327   1st Qu.:2.000           
##  Median :21.00   Median :36.00   Median : 3238   Median :3.000           
##  Mean   :21.39   Mean   :35.93   Mean   : 3480   Mean   :2.854           
##  3rd Qu.:29.00   3rd Qu.:44.00   3rd Qu.: 5355   3rd Qu.:4.000           
##  Max.   :64.00   Max.   :74.00   Max.   :11676   Max.   :6.000           
##  EXISTINGCREDITSCOUNT INSTALLMENTPERCENT   DEPENDENTS   
##  Min.   :1.000        Min.   :1.000      Min.   :1.000  
##  1st Qu.:1.000        1st Qu.:2.000      1st Qu.:1.000  
##  Median :1.000        Median :3.000      Median :1.000  
##  Mean   :1.466        Mean   :2.982      Mean   :1.165  
##  3rd Qu.:2.000        3rd Qu.:4.000      3rd Qu.:1.000  
##  Max.   :4.000        Max.   :6.000      Max.   :2.000
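The quartiles printed by R's `summary()` use R's default quantile rule (type 7, linear interpolation between order statistics). The following Python sketch reproduces the same six numbers with the standard library. `statistics.quantiles` with `method="inclusive"` follows that interpolation rule. The sample is only the first ten LOANDURATION values from the structure output, not the full column, and the function name is hypothetical.

```python
import statistics


def r_style_summary(values):
    """Min, quartiles, mean and max as printed by R's summary() for a numeric vector."""
    # method="inclusive" interpolates linearly between order statistics (R's type 7)
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Min.": min(values),
        "1st Qu.": q1,
        "Median": median,
        "Mean": statistics.fmean(values),
        "3rd Qu.": q3,
        "Max.": max(values),
    }


# First ten LOANDURATION values from the str() output
sample = [31, 18, 15, 28, 28, 32, 9, 16, 11, 35]
for name, value in r_style_summary(sample).items():
    print(f"{name:8} {value:.2f}")
```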

Exploration of categorical variables

##        OTHERSONLOAN   EMPLOYMENTDURATION     SEX                  OWNSPROPERTY 
##  co-applicant: 717   1_to_4    :1470     female:1896   car_other        :1540  
##  guarantor   : 110   4_to_7    :1400     male  :3104   real_estate      :1087  
##  none        :4173   greater_7 : 930                   savings_insurance:1660  
##                      less_1    : 904                   unknown          : 713  
##                      unemployed: 296                                           
##                                                                                
##                                                                                
##  TELEPHONE       CHECKINGSTATUS HOUSING                    CREDITHISTORY 
##  none:2941   0_to_200   :1304   free: 739   all_credits_paid_back : 769  
##  yes :2059   greater_200: 305   own :3195   credits_paid_to_date  :1490  
##              less_0     :1398   rent:1066   no_credits            : 117  
##              no_checking:1993               outstanding_credit    : 938  
##                                             prior_payments_delayed:1686  
##                                                                          
##                                                                          
##  INSTALLMENTPLANS                       JOB           EXISTINGSAVINGS
##  bank  : 466      management_self-employed: 641   100_to_500  :1133  
##  none  :3517      skilled                 :3400   500_to_1000 :1078  
##  stores:1017      unemployed              : 286   greater_1000: 558  
##                   unskilled               : 673   less_100    :1856  
##                                                   unknown     : 375  
##                                                                      
##                                                                      
##      LOANPURPOSE  FOREIGNWORKER
##  car_new   :945   no : 123     
##  furniture :853   yes:4877     
##  car_used  :808                
##  radio_tv  :755                
##  appliances:561                
##  repairs   :283                
##  (Other)   :795

Other indicators

There are no missing values.
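A missing-value check of this kind, matching the `n_missing` and `complete_rate` columns of a skimr summary, can be sketched in Python. The sketch makes two assumptions. First, rows are dictionaries. Second, a missing cell is stored as `None`, an empty string or the text `"NA"`. Adapt the hypothetical `MISSING` set to however the source file encodes gaps.

```python
MISSING = {None, "", "NA"}  # assumption: how missing cells are encoded


def missing_report(rows, columns):
    """Return {column: (n_missing, complete_rate)}, like skimr's first two columns."""
    n = len(rows)
    report = {}
    for col in columns:
        # an absent key also counts as missing, because row.get() returns None
        n_missing = sum(1 for row in rows if row.get(col) in MISSING)
        report[col] = (n_missing, 1 - n_missing / n)
    return report


rows = [
    {"AGE": 32, "SEX": "female"},
    {"AGE": 37, "SEX": "female"},
    {"AGE": None, "SEX": "male"},  # illustrative gap; the real data has none
]
print(missing_report(rows, ["AGE", "SEX"]))
```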

Case under analysis

Data summary
Name                    data
Number of rows          5000
Number of columns       21
Key                     NULL
Column type frequency:
  factor                14
  numeric               7
Group variables         None

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts
CHECKINGSTATUS 0 1 TRUE 4 no_: 1993, les: 1398, 0_t: 1304, gre: 305
CREDITHISTORY 0 1 TRUE 5 pri: 1686, cre: 1490, out: 938, all: 769
LOANPURPOSE 0 1 FALSE 11 car: 945, fur: 853, car: 808, rad: 755
EXISTINGSAVINGS 0 1 TRUE 5 les: 1856, 100: 1133, 500: 1078, gre: 558
EMPLOYMENTDURATION 0 1 TRUE 5 1_t: 1470, 4_t: 1400, gre: 930, les: 904
SEX 0 1 FALSE 2 mal: 3104, fem: 1896
OTHERSONLOAN 0 1 FALSE 3 non: 4173, co-: 717, gua: 110
OWNSPROPERTY 0 1 FALSE 4 sav: 1660, car: 1540, rea: 1087, unk: 713
INSTALLMENTPLANS 0 1 FALSE 3 non: 3517, sto: 1017, ban: 466
HOUSING 0 1 FALSE 3 own: 3195, ren: 1066, fre: 739
JOB 0 1 TRUE 4 ski: 3400, uns: 673, man: 641, une: 286
TELEPHONE 0 1 FALSE 2 non: 2941, yes: 2059
FOREIGNWORKER 0 1 FALSE 2 yes: 4877, no: 123
RISK 0 1 FALSE 2 No : 3330, Ris: 1670

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
LOANDURATION 0 1 21.39 11.16 4 13.00 21.0 29 64 ▇▇▅▁▁
LOANAMOUNT 0 1 3480.14 2488.23 250 1326.75 3238.5 5355 11676 ▇▆▅▂▁
INSTALLMENTPERCENT 0 1 2.98 1.13 1 2.00 3.0 4 6 ▇▇▆▂▁
CURRENTRESIDENCEDURATION 0 1 2.85 1.12 1 2.00 3.0 4 6 ▇▇▅▂▁
AGE 0 1 35.93 10.65 19 28.00 36.0 44 74 ▇▇▆▁▁
EXISTINGCREDITSCOUNT 0 1 1.47 0.57 1 1.00 1.0 2 4 ▇▆▁▁▁
DEPENDENTS 0 1 1.16 0.37 1 1.00 1.0 1 2 ▇▁▁▁▂

Correlation matrix of the numeric variables

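Relationships among the categorical variables

Based on an analysis of the relationships among the categorical variables, JOB and FOREIGNWORKER were judged to be related to several other variables, so both are excluded from the dataset. Note that none of the three chi-squared tests reported below rejects independence at the 5% level (p = 0.252, 0.080 and 0.402). OWNSPROPERTY vs JOB is significant only at the 10% level.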

## Warning in chisq.test(data$LOANPURPOSE, data$FOREIGNWORKER): Chi-squared
## approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  data$LOANPURPOSE and data$FOREIGNWORKER
## X-squared = 12.515, df = 10, p-value = 0.2521
## 
##  Pearson's Chi-squared test
## 
## data:  data$OWNSPROPERTY and data$JOB
## X-squared = 15.423, df = 9, p-value = 0.07996
## 
##  Pearson's Chi-squared test
## 
## data:  data$JOB and data$FOREIGNWORKER
## X-squared = 2.9341, df = 3, p-value = 0.4019
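The warning on the first test comes from R's `chisq.test()`, which warns when any expected count is below 5. For example, the expected count for LOANPURPOSE = other and FOREIGNWORKER = no is 113 × 123 / 5000 ≈ 2.78. The Python sketch below (the report itself uses R) computes three things from a contingency table: the Pearson statistic, its degrees of freedom and that small-expected-count condition. It does not compute the p-value. That would come from the upper tail of a chi-squared distribution with `df` degrees of freedom, for instance via `scipy.stats.chi2.sf`. R also applies Yates' continuity correction to 2×2 tables by default, and this sketch does not. The three tables above are all larger than 2×2, so the correction does not apply to them. The function name is hypothetical.

```python
def pearson_chi2(table):
    """Pearson chi-squared test of independence for an r x c table of counts.

    Returns (statistic, degrees of freedom, small_expected), where small_expected
    is True when some expected count is below 5 (the case R warns about).
    Assumes no row or column total is zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    min_expected = float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # expected count under independence: row total * column total / n
            expected = row_totals[i] * col_totals[j] / n
            min_expected = min(min_expected, expected)
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, min_expected < 5


# Toy 2 x 3 table (rows: two groups, columns: three categories)
print(pearson_chi2([[20, 30, 50], [30, 30, 40]]))
```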

Target variable

## 
## No Risk    Risk 
##    3330    1670
## 
## No Risk    Risk 
##    66.6    33.4

Variable: Loan Duration

The plot shows the highest demand for loans lasting 4 to 6 months, followed by loans of about 30 months. Beyond that, the number of loans clearly declines as the duration increases.

##   LOANDURATION  
##  Min.   : 4.00  
##  1st Qu.:13.00  
##  Median :21.00  
##  Mean   :21.39  
##  3rd Qu.:29.00  
##  Max.   :64.00

Loan Duration vs Risk

Visually, applicants who request short loans represent a lower risk than those who request loans of 30 months or more.

Variable: Loan Amount

The distribution of loan amounts is right-skewed, with a mean of 3,480 against a median of 3,238. The most frequent range runs from 250 to about 1,130, with 1,144 loans. From there up to about 6,400, each interval holds between 459 and 585 loans. Above that, the counts fall steadily, and only 8 loans exceed about 10,800.

##    LOANAMOUNT   
##  Min.   :  250  
##  1st Qu.: 1327  
##  Median : 3238  
##  Mean   : 3480  
##  3rd Qu.: 5355  
##  Max.   :11676

Loan Amount vs Risk

The counts below group loan amount and loan duration into 13 equal-width intervals, with amount first and duration second.

##      [250,1.13e+03] (1.13e+03,2.01e+03] (2.01e+03,2.89e+03] (2.89e+03,3.77e+03] 
##                1144                 546                 585                 553 
## (3.77e+03,4.64e+03] (4.64e+03,5.52e+03]  (5.52e+03,6.4e+03]  (6.4e+03,7.28e+03] 
##                 503                 500                 459                 335 
## (7.28e+03,8.16e+03] (8.16e+03,9.04e+03] (9.04e+03,9.92e+03] (9.92e+03,1.08e+04] 
##                 193                  95                  56                  23 
## (1.08e+04,1.17e+04] 
##                   8
##    [4,8.62] (8.62,13.2] (13.2,17.8] (17.8,22.5] (22.5,27.1] (27.1,31.7] 
##         757         618         631         667         776         570 
## (31.7,36.3] (36.3,40.9] (40.9,45.5] (45.5,50.2] (50.2,54.8] (54.8,59.4] 
##         523         231         150          52          14           4 
##   (59.4,64] 
##           7
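The printed boundaries are consistent with 13 equal-width intervals over each variable's range. For LOANAMOUNT the width is (11676 - 250) / 13 ≈ 878.9, so the first upper bound is about 1129. For LOANDURATION the width is 60 / 13 ≈ 4.62, so the first upper bound is about 8.62. The exact R call is not shown. One assumed way to get these labels in R is `cut()` with 14 equally spaced break points and `include.lowest = TRUE`. The Python sketch below implements the same idea with right-closed intervals, and the minimum falls in the first bin. The function name is hypothetical.

```python
from bisect import bisect_left


def equal_width_bins(values, k):
    """Split the range of `values` into k equal-width, right-closed intervals.

    Returns (breaks, counts): k + 1 break points and the number of values per bin.
    The minimum value is counted in the first bin, i.e. that bin is [lo, b1].
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # k + 1 boundaries; the last one is set to hi exactly to avoid rounding drift
    breaks = [lo + i * width for i in range(k)] + [hi]
    counts = [0] * k
    for v in values:
        # bisect_left finds the first boundary >= v, so v lies in (breaks[i], breaks[i + 1]]
        i = bisect_left(breaks, v) - 1
        counts[max(i, 0)] += 1  # v == lo gives i == -1; put it in the first bin
    return breaks, counts


breaks, _ = equal_width_bins([250, 11676], 13)
print([round(b) for b in breaks[:3]])  # [250, 1129, 2008]
```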

Variable: Age

Applicants of about 20 years old request the most loans. However, most applicants are between 30 and 50 years old, and demand clearly declines beyond that range. The mean age of applicants is 35.93 years and the mean loan duration is 21.39 months.

##       AGE       
##  Min.   :19.00  
##  1st Qu.:28.00  
##  Median :36.00  
##  Mean   :35.93  
##  3rd Qu.:44.00  
##  Max.   :74.00

Age vs Risk

Visually, adults over 50 show a higher risk than adults aged 20 to 35.

## Warning: Removed 6 rows containing missing values (geom_bar).

##   [19,23.2] (23.2,27.5] (27.5,31.7] (31.7,35.9] (35.9,40.2] (40.2,44.4] 
##         756         445         610         635         830         587 
## (44.4,48.6] (48.6,52.8] (52.8,57.1] (57.1,61.3] (61.3,65.5] (65.5,69.8] 
##         477         340         215          74          19           4 
##   (69.8,74] 
##           8

Variable: Other parties on the loan (OTHERSONLOAN)

Regarding other parties on the loan, 83.46% of the loans have none. Another 14.34% have a co-applicant, and 2.2% have a guarantor.

## 
## co-applicant    guarantor         none 
##          717          110         4173
## 
## co-applicant    guarantor         none 
##       0.1434       0.0220       0.8346

Other Parties on the Loan vs Risk

Variable: Loan purpose

About 52% of the loans (52.12%) go to furniture, new cars or used cars, which account for 17.06%, 18.9% and 16.16% respectively.

## 
## appliances   business    car_new   car_used  education  furniture      other 
##        561        146        945        808        167        853        113 
##   radio_tv    repairs retraining   vacation 
##        755        283        164        205
## 
## appliances   business    car_new   car_used  education  furniture      other 
##     0.1122     0.0292     0.1890     0.1616     0.0334     0.1706     0.0226 
##   radio_tv    repairs retraining   vacation 
##     0.1510     0.0566     0.0328     0.0410
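Each proportion table divides every count by the total, as R's `prop.table()` does. The following Python sketch checks the figure quoted above using the LOANPURPOSE counts. The combined share of furniture, new cars and used cars comes out at 0.5212. The function name is hypothetical.

```python
def proportions(counts):
    """Share of each category in a {category: count} table."""
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


# LOANPURPOSE counts from the table above
purpose = {
    "appliances": 561, "business": 146, "car_new": 945, "car_used": 808,
    "education": 167, "furniture": 853, "other": 113, "radio_tv": 755,
    "repairs": 283, "retraining": 164, "vacation": 205,
}
shares = proportions(purpose)
top_three = shares["furniture"] + shares["car_new"] + shares["car_used"]
print(round(top_three, 4))  # 0.5212
```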

Loan Purpose vs Risk

Remarks

We have tried to contrast the variables with the business being evaluated. For example, one would expect applicants with higher incomes to be less risky. A further step could be to build a variable based on the income left after paying debts. Someone may earn 10,000 soles a month but owe 30,000 to the financial system, while someone else earns only 5,000 and owes 500.

To contrast the behavior of the variables with the business under evaluation, we could add variables that discriminate better.

Variable selection

Using the Boruta algorithm

With Boruta, we tried to keep the variables most relevant to the model. The data has no time cuts (for example, at six months) that would let us evaluate how well the variables discriminate over time. Because the variables are static, it is also harder to identify a discriminating factor. We could additionally have considered combined variables, such as sex with income, or other categories.

There are no missing values, so no imputation was needed.

In essence, the Boruta algorithm trains a random forest on the original features together with shadow features, which are randomly shuffled copies of the originals. During training, this random forest considers only a random subset of all features at each node.
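The following Python sketch illustrates the shadow-feature idea in simplified form. It is not the Boruta package. It builds the shuffled copies, then counts over several runs how often each real feature's importance beats the best shadow. The importance scores would come from the random forest fitted on the extended data, which is not implemented here, so the scores below are made up. The real algorithm reshuffles the shadows on every iteration and decides with a statistical test on these hit counts. The names `add_shadow_features`, `count_hits` and the `shadow_` prefix are hypothetical.

```python
import random


def add_shadow_features(columns, rng):
    """Return the original columns plus a shuffled copy ("shadow_<name>") of each one."""
    extended = dict(columns)
    for name, values in columns.items():
        shuffled = list(values)
        rng.shuffle(shuffled)  # breaks any link between the copy and the target
        extended["shadow_" + name] = shuffled
    return extended


def count_hits(importance_runs, features):
    """Count, per feature, the runs in which it beats the most important shadow."""
    hits = dict.fromkeys(features, 0)
    for importance in importance_runs:
        best_shadow = max(v for k, v in importance.items() if k.startswith("shadow_"))
        for f in features:
            if importance[f] > best_shadow:
                hits[f] += 1
    return hits


rng = random.Random(0)
extended = add_shadow_features({"AGE": [32, 37, 28, 32, 57]}, rng)
# Made-up importance scores for two runs, only to show the counting step
runs = [
    {"AGE": 5.0, "SEX": 1.0, "shadow_AGE": 2.0, "shadow_SEX": 1.5},
    {"AGE": 4.0, "SEX": 2.5, "shadow_AGE": 1.0, "shadow_SEX": 3.0},
]
print(count_hits(runs, ["AGE", "SEX"]))  # {'AGE': 2, 'SEX': 0}
```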

Using the original variables

Variable selection and importance plot

We applied the Boruta algorithm, an alternative variable-selection method, to determine which variables most influence the delinquency variable (RISK). The algorithm confirms all 19 variables as important and rejects none as uninfluential, as the following plot shows.

Plot

Below is the importance history plot. The attributes shown in green have much higher importance than the shadow attributes shown in blue.

Variable importance table

From this table we could choose the most important variables. These are the ones with a high mean importance and a narrow range between their minimum and maximum importance across iterations (that is, stable ones).

##                    attribute   meanImp medianImp    minImp    maxImp  decision
##  1:       EMPLOYMENTDURATION 39.853412 40.027328 37.653008 41.230613 Confirmed
##  2:                      AGE 33.129929 33.079796 31.740182 34.902206 Confirmed
##  3:           CHECKINGSTATUS 30.780408 30.608107 29.482565 31.945036 Confirmed
##  4:             OTHERSONLOAN 29.670025 29.439980 28.829647 31.360621 Confirmed
##  5:                      SEX 29.122594 29.548718 25.177103 33.189944 Confirmed
##  6:             LOANDURATION 28.117218 27.808564 26.373261 30.672204 Confirmed
##  7:               LOANAMOUNT 26.870930 27.031948 24.151204 29.062599 Confirmed
##  8:     EXISTINGCREDITSCOUNT 25.531695 25.411251 23.968302 27.507689 Confirmed
##  9:             OWNSPROPERTY 25.033033 25.118686 23.411900 26.801097 Confirmed
## 10: CURRENTRESIDENCEDURATION 24.020469 23.667321 22.529908 25.697417 Confirmed
## 11:                TELEPHONE 23.855662 24.253180 21.795987 24.575069 Confirmed
## 12:           LOANDURATION_n 21.941686 21.835271 21.076616 23.309363 Confirmed
## 13:                  HOUSING 17.586400 17.323794 16.509296 19.228175 Confirmed
## 14:            CREDITHISTORY 16.786771 16.870717 15.062138 17.578415 Confirmed
## 15:          EXISTINGSAVINGS 12.542730 12.394264 11.509938 13.610939 Confirmed
## 16:         INSTALLMENTPLANS 12.009783 11.939293 10.843983 14.043137 Confirmed
## 17:       INSTALLMENTPERCENT 10.975612 10.828390 10.191274 11.570790 Confirmed
## 18:               DEPENDENTS  9.275352  9.056440  8.118089 10.893466 Confirmed
## 19:              LOANPURPOSE  4.280712  4.032957  2.567785  5.804621 Confirmed
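The Python sketch below makes this selection rule concrete by filtering on mean importance and on the spread `maxImp - minImp`. The thresholds (mean of at least 20, spread of at most 5) are hypothetical choices for illustration, not values used in this report. The four rows are copied from the table above, and the function name is hypothetical.

```python
def stable_important(rows, min_mean, max_spread):
    """Keep attributes whose mean importance is high and whose min-max spread is small.

    rows: (attribute, meanImp, minImp, maxImp) tuples.
    Returns (attribute, meanImp, spread) sorted by decreasing mean importance.
    """
    kept = [
        (name, mean, hi - lo)
        for name, mean, lo, hi in rows
        if mean >= min_mean and hi - lo <= max_spread
    ]
    return sorted(kept, key=lambda t: -t[1])


rows = [
    ("EMPLOYMENTDURATION", 39.853412, 37.653008, 41.230613),
    ("AGE", 33.129929, 31.740182, 34.902206),
    ("SEX", 29.122594, 25.177103, 33.189944),  # spread of about 8.0
    ("LOANPURPOSE", 4.280712, 2.567785, 5.804621),
]
for name, mean, spread in stable_important(rows, min_mean=20, max_spread=5):
    print(f"{name:20} mean={mean:.2f} spread={spread:.2f}")
```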

Using binned Age and the original Loan Amount

##                    attribute   meanImp medianImp    minImp    maxImp  decision
##  1:       EMPLOYMENTDURATION 40.164292 40.324878 37.770977 43.272873 Confirmed
##  2:           CHECKINGSTATUS 31.261499 31.339636 29.229301 33.532249 Confirmed
##  3:             OTHERSONLOAN 30.590918 30.980418 28.043918 33.118282 Confirmed
##  4:                    AGE_n 29.814695 30.019524 28.145255 31.597384 Confirmed
##  5:                      SEX 29.809302 30.203775 27.090567 31.877534 Confirmed
##  6:             LOANDURATION 28.572417 28.330908 27.406162 29.937915 Confirmed
##  7:               LOANAMOUNT 27.086143 27.190457 24.660851 28.509144 Confirmed
##  8:     EXISTINGCREDITSCOUNT 26.541220 26.567531 25.441425 27.804449 Confirmed
##  9: CURRENTRESIDENCEDURATION 24.806361 24.665259 23.544241 25.764933 Confirmed
## 10:             OWNSPROPERTY 24.503700 24.621956 22.566988 26.742844 Confirmed
## 11:                TELEPHONE 24.324521 24.293246 22.367578 26.012559 Confirmed
## 12:           LOANDURATION_n 21.643050 21.868246 19.754058 22.619813 Confirmed
## 13:                  HOUSING 18.259865 18.336444 16.424203 20.570527 Confirmed
## 14:            CREDITHISTORY 16.137829 16.057376 15.195587 17.252472 Confirmed
## 15:          EXISTINGSAVINGS 12.627498 12.783498 11.399990 14.246745 Confirmed
## 16:         INSTALLMENTPLANS 11.701892 11.094096  8.643885 14.246422 Confirmed
## 17:       INSTALLMENTPERCENT 11.186265 11.167211 10.038348 12.152617 Confirmed
## 18:               DEPENDENTS  9.326550  9.232201  8.272204 10.803452 Confirmed
## 19:              LOANPURPOSE  4.087028  4.220269  2.206438  5.492419 Confirmed

Using the original Age and binned Loan Amount

##                    attribute   meanImp medianImp    minImp    maxImp  decision
##  1:       EMPLOYMENTDURATION 40.390693 40.426332 38.598742 43.413984 Confirmed
##  2:                      AGE 33.968305 33.976034 31.660996 35.990395 Confirmed
##  3:           CHECKINGSTATUS 31.550118 31.803730 29.315143 33.343750 Confirmed
##  4:             OTHERSONLOAN 31.156569 31.417736 28.939624 32.498316 Confirmed
##  5:                      SEX 29.505856 29.684140 28.178332 31.009044 Confirmed
##  6:             LOANDURATION 28.988690 29.065374 27.257272 30.939047 Confirmed
##  7:     EXISTINGCREDITSCOUNT 26.996884 27.259455 24.143524 29.050927 Confirmed
##  8:             OWNSPROPERTY 25.766894 26.018744 23.974284 27.304454 Confirmed
##  9: CURRENTRESIDENCEDURATION 25.319990 25.612994 22.473403 26.368296 Confirmed
## 10:                TELEPHONE 24.281577 24.344259 22.507951 26.162671 Confirmed
## 11:             LOANAMOUNT_n 23.189144 23.044438 21.892921 24.666904 Confirmed
## 12:           LOANDURATION_n 21.720813 21.865712 19.697876 23.052787 Confirmed
## 13:                  HOUSING 18.572068 18.484706 17.421285 20.292154 Confirmed
## 14:            CREDITHISTORY 16.700992 16.866744 15.454072 17.680639 Confirmed
## 15:          EXISTINGSAVINGS 12.964019 13.102271 10.934197 14.302789 Confirmed
## 16:         INSTALLMENTPLANS 12.839789 13.062416 10.990844 15.539287 Confirmed
## 17:       INSTALLMENTPERCENT 11.100957 10.765779  9.690924 12.542144 Confirmed
## 18:               DEPENDENTS  8.721735  8.643503  7.271188  9.974219 Confirmed
## 19:              LOANPURPOSE  4.546980  4.447390  2.901558  6.918347 Confirmed

Using binned Age and binned Loan Amount

##                    attribute   meanImp medianImp    minImp    maxImp  decision
##  1:       EMPLOYMENTDURATION 41.160508 41.211001 38.057316 43.426001 Confirmed
##  2:           CHECKINGSTATUS 31.953095 32.101926 29.438198 33.575608 Confirmed
##  3:             OTHERSONLOAN 31.403688 31.337934 29.772151 33.027241 Confirmed
##  4:                    AGE_n 30.634289 30.362209 28.260829 34.002227 Confirmed
##  5:                      SEX 30.039746 29.747411 28.081276 32.265615 Confirmed
##  6:             LOANDURATION 29.433396 29.534955 26.886010 31.239266 Confirmed
##  7:     EXISTINGCREDITSCOUNT 27.112740 27.049528 25.848812 28.573572 Confirmed
##  8:             OWNSPROPERTY 25.746180 25.644511 24.463069 27.819679 Confirmed
##  9:                TELEPHONE 25.212088 25.396886 23.780960 26.555830 Confirmed
## 10: CURRENTRESIDENCEDURATION 25.191252 25.113024 23.356655 27.445121 Confirmed
## 11:             LOANAMOUNT_n 24.277483 24.251003 21.747458 26.153495 Confirmed
## 12:           LOANDURATION_n 21.985099 21.982981 19.671726 23.912971 Confirmed
## 13:                  HOUSING 19.036828 18.966337 17.005093 20.854348 Confirmed
## 14:            CREDITHISTORY 17.106039 17.120655 15.473482 18.262769 Confirmed
## 15:          EXISTINGSAVINGS 13.460252 13.440092 11.122331 16.110528 Confirmed
## 16:         INSTALLMENTPLANS 12.542876 12.860055 10.236378 14.320066 Confirmed
## 17:       INSTALLMENTPERCENT 11.034733 10.960218 10.365609 12.615267 Confirmed
## 18:               DEPENDENTS  9.863946  9.896618  8.197944 11.277400 Confirmed
## 19:              LOANPURPOSE  4.180475  4.166471  1.961230  6.592706 Confirmed

Information Value

Variable IV
7 EMPLOYMENTDURATION 1.1669430
12 OWNSPROPERTY 1.1468583
13 AGE 1.1202743
5 LOANAMOUNT 1.1099155
21 AGE_n 1.0927639
19 LOANAMOUNT_n 1.0844599
2 LOANDURATION 1.0772301
20 LOANDURATION_n 1.0109078
1 CHECKINGSTATUS 0.9680050
11 CURRENTRESIDENCEDURATION 0.9313804
3 CREDITHISTORY 0.8574463
8 INSTALLMENTPERCENT 0.8182934
15 HOUSING 0.6973851
16 EXISTINGCREDITSCOUNT 0.6746747
6 EXISTINGSAVINGS 0.6482769
18 TELEPHONE 0.5763185
10 OTHERSONLOAN 0.4655481
4 LOANPURPOSE 0.4519853
17 DEPENDENTS 0.2227255
14 INSTALLMENTPLANS 0.1456835
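For a variable split into bins, the Information Value is IV = Σ (%good_i - %bad_i) · ln(%good_i / %bad_i). Here %good_i and %bad_i are the shares of all "No Risk" and all "Risk" cases that fall in bin i. The report does not show how the numeric variables were binned for this table. IV is a divergence between the two class distributions, so splitting bins further can only increase it or leave it unchanged. As a result, the values for numeric variables such as AGE and LOANAMOUNT depend heavily on the binning. The minimal Python sketch below assumes every bin contains at least one case of each class. The function name is hypothetical.

```python
import math


def information_value(goods, bads):
    """IV from per-bin counts of goods ("No Risk") and bads ("Risk").

    Every bin must contain at least one good and one bad; otherwise the
    logarithm is undefined and the bin should be merged with a neighbour.
    """
    total_good, total_bad = sum(goods), sum(bads)
    iv = 0.0
    for g, b in zip(goods, bads):
        if g == 0 or b == 0:
            raise ValueError("empty cell in a bin: merge it before computing IV")
        pct_good, pct_bad = g / total_good, b / total_bad
        woe = math.log(pct_good / pct_bad)  # weight of evidence of the bin
        iv += (pct_good - pct_bad) * woe
    return iv


# Toy example: two bins that separate the classes fairly well
print(round(information_value([80, 20], [20, 80]), 4))  # 1.6636
```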

Partitioning the data

We split the data so that 75% is used to develop the model and 25% to validate it.
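The split procedure itself is not shown. The near-identical class proportions across the partitions are consistent with a stratified split. The following Python sketch assumes such a split: each class is shuffled and 75% of its rows go to training. The function name and seed are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_share=0.75, seed=123):
    """Return (train_idx, test_idx) keeping each class's share roughly equal in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train = []
    for indices in by_class.values():
        rng.shuffle(indices)
        # take the same fraction of every class for training
        train.extend(indices[: round(train_share * len(indices))])
    train_set = set(train)
    test = [i for i in range(len(labels)) if i not in train_set]
    return sorted(train), test


labels = ["No Risk"] * 3330 + ["Risk"] * 1670  # class counts of RISK in this data
train, test = stratified_split(labels)
share = sum(labels[i] == "No Risk" for i in train) / len(train)
print(len(train), len(test), round(share, 4))
```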

Proportion of the Risk variable in the full data, the training set and the test set

## 
## No Risk    Risk 
##   0.666   0.334
## 
##   No Risk      Risk 
## 0.6653333 0.3346667
## 
## No Risk    Risk 
##   0.668   0.332
## 
## Call:
##  randomForest(formula = RISK ~ ., data = Train, importance = T) 
##                Type of random forest: classification
##                      Number of trees: 500
## No. of variables tried at each split: 4
## 
##         OOB estimate of  error rate: 21.33%
## Confusion matrix:
##         No Risk Risk class.error
## No Risk    2215  280   0.1122244
## Risk        520  735   0.4143426
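The error rates in this output follow directly from the confusion matrix. Each class error is the share of that class's out-of-bag cases that were misclassified, and the OOB error is the total number misclassified over the 3,750 training rows. The Risk class is misclassified far more often (41%) than No Risk (11%). The 4 variables tried per split also match randomForest's default for classification, floor(sqrt(p)), with p = 21 predictors. The following Python sketch checks the arithmetic. The function name is hypothetical.

```python
def error_rates(confusion):
    """Per-class error and overall error from {actual: {predicted: count}}."""
    class_error = {}
    wrong = total = 0
    for actual, row in confusion.items():
        n = sum(row.values())
        misclassified = n - row[actual]
        class_error[actual] = misclassified / n
        wrong += misclassified
        total += n
    return class_error, wrong / total


confusion = {
    "No Risk": {"No Risk": 2215, "Risk": 280},
    "Risk": {"No Risk": 520, "Risk": 735},
}
class_error, oob = error_rates(confusion)
print({k: round(v, 7) for k, v in class_error.items()}, f"{oob:.2%}")
# {'No Risk': 0.1122244, 'Risk': 0.4143426} 21.33%
```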

##                                          Variable   Imp_tot
## EMPLOYMENTDURATION             EMPLOYMENTDURATION 5.5589737
## AGE                                           AGE 5.3565739
## LOANDURATION                         LOANDURATION 5.0821198
## LOANAMOUNT                             LOANAMOUNT 4.8396395
## AGE_n                                       AGE_n 4.0278251
## CHECKINGSTATUS                     CHECKINGSTATUS 3.9350799
## LOANDURATION_n                     LOANDURATION_n 3.6975965
## OWNSPROPERTY                         OWNSPROPERTY 3.6087043
## SEX                                           SEX 3.4269828
## LOANAMOUNT_n                         LOANAMOUNT_n 3.3837034
## CURRENTRESIDENCEDURATION CURRENTRESIDENCEDURATION 3.0952248
## EXISTINGCREDITSCOUNT         EXISTINGCREDITSCOUNT 3.0120146
## OTHERSONLOAN                         OTHERSONLOAN 2.8131923
## LOANPURPOSE                           LOANPURPOSE 2.5497943
## TELEPHONE                               TELEPHONE 1.9006555
## HOUSING                                   HOUSING 1.4176819
## EXISTINGSAVINGS                   EXISTINGSAVINGS 1.0809348
## INSTALLMENTPERCENT             INSTALLMENTPERCENT 0.9230203
## CREDITHISTORY                       CREDITHISTORY 0.8753996
## INSTALLMENTPLANS                 INSTALLMENTPLANS 0.7585049
## DEPENDENTS                             DEPENDENTS 0.0000000
##                          MeanDecreaseAccuracy MeanDecreaseGini
## EMPLOYMENTDURATION                 2.00777372        0.6300751
## AGE                                0.64645508        1.7889939
## LOANDURATION                       0.88014646        1.2808485
## LOANAMOUNT                         0.02358105        1.8949336
## AGE_n                              0.34474490        0.7619553
## CHECKINGSTATUS                     0.78946069        0.2244944
## LOANDURATION_n                     0.12426504        0.6522066
## OWNSPROPERTY                       0.54034415        0.1472353
## SEX                                1.36155616       -0.8556982
## LOANAMOUNT_n                      -0.31278045        0.7753590
## CURRENTRESIDENCEDURATION           0.50308195       -0.3289820
## EXISTINGCREDITSCOUNT               0.64641051       -0.5555208
## OTHERSONLOAN                       0.70094532       -0.8088779
## LOANPURPOSE                       -1.37736318        1.0060327
## TELEPHONE                          0.11362325       -1.1340926
## HOUSING                           -0.53820094       -0.9652420
## EXISTINGSAVINGS                   -1.22810410       -0.6120860
## INSTALLMENTPERCENT                -1.30801051       -0.6900941
## CREDITHISTORY                     -1.22307305       -0.8226522
## INSTALLMENTPLANS                  -1.16518725       -0.9974327
## DEPENDENTS                        -1.52966883       -1.3914560
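The report does not state how Imp_tot was computed, but the numbers are consistent with the following reconstruction. The MeanDecreaseAccuracy and MeanDecreaseGini columns printed above appear to be standardized, since each averages to zero. Imp_tot then equals their sum, shifted so that the smallest value is zero. For EMPLOYMENTDURATION, 2.0078 + 0.6301 - (-1.5297 - 1.3915) ≈ 5.559. The following Python sketch implements that reconstruction. The function name is hypothetical.

```python
def combined_importance(mda, gini):
    """Sum two (already standardized) importance scores and shift so the minimum is 0."""
    summed = {name: mda[name] + gini[name] for name in mda}
    lowest = min(summed.values())
    return {name: value - lowest for name, value in summed.items()}


# Three rows copied from the importance table above
mda = {"EMPLOYMENTDURATION": 2.00777372, "AGE": 0.64645508, "DEPENDENTS": -1.52966883}
gini = {"EMPLOYMENTDURATION": 0.6300751, "AGE": 1.7889939, "DEPENDENTS": -1.3914560}
for name, value in combined_importance(mda, gini).items():
    print(f"{name:20} {value:.7f}")
```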
