CUSTOMERID is dropped because it only identifies the customer and contributes nothing to the analysis.
| CHECKINGSTATUS | LOANDURATION | CREDITHISTORY | LOANPURPOSE | LOANAMOUNT | EXISTINGSAVINGS | EMPLOYMENTDURATION | INSTALLMENTPERCENT | SEX | OTHERSONLOAN | CURRENTRESIDENCEDURATION | OWNSPROPERTY | AGE | INSTALLMENTPLANS | HOUSING | EXISTINGCREDITSCOUNT | JOB | DEPENDENTS | TELEPHONE | FOREIGNWORKER | RISK |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0_to_200 | 31 | credits_paid_to_date | other | 1889 | 100_to_500 | less_1 | 3 | female | none | 3 | savings_insurance | 32 | none | own | 1 | skilled | 1 | none | yes | No Risk |
| less_0 | 18 | credits_paid_to_date | car_new | 462 | less_100 | 1_to_4 | 2 | female | none | 2 | savings_insurance | 37 | stores | own | 2 | skilled | 1 | none | yes | No Risk |
| less_0 | 15 | prior_payments_delayed | furniture | 250 | less_100 | 1_to_4 | 2 | male | none | 3 | real_estate | 28 | none | own | 2 | skilled | 1 | yes | no | No Risk |
| 0_to_200 | 28 | credits_paid_to_date | retraining | 3693 | less_100 | greater_7 | 3 | male | none | 2 | savings_insurance | 32 | none | own | 1 | skilled | 1 | none | yes | No Risk |
| no_checking | 28 | prior_payments_delayed | education | 6235 | 500_to_1000 | greater_7 | 3 | male | none | 3 | unknown | 57 | none | own | 2 | skilled | 1 | none | yes | Risk |
| no_checking | 32 | outstanding_credit | vacation | 9604 | 500_to_1000 | greater_7 | 6 | male | co-applicant | 5 | unknown | 57 | none | free | 2 | skilled | 2 | yes | yes | Risk |
## Classes 'data.table' and 'data.frame': 5000 obs. of 7 variables:
## $ LOANDURATION : int 31 18 15 28 28 32 9 16 11 35 ...
## $ AGE : int 32 37 28 32 57 57 41 36 22 49 ...
## $ LOANAMOUNT : int 1889 462 250 3693 6235 9604 1032 3109 4553 7138 ...
## $ CURRENTRESIDENCEDURATION: int 3 2 3 2 3 5 4 1 3 4 ...
## $ EXISTINGCREDITSCOUNT : int 1 2 2 1 2 2 1 2 1 2 ...
## $ INSTALLMENTPERCENT : int 3 2 2 3 3 6 3 3 3 5 ...
## $ DEPENDENTS : int 1 1 1 1 1 2 1 1 1 2 ...
## - attr(*, ".internal.selfref")=<externalptr>
We cast LOANDURATION, AGE, LOANAMOUNT and INSTALLMENTPERCENT to numeric.
## Classes 'data.table' and 'data.frame': 5000 obs. of 7 variables:
## $ LOANDURATION : num 31 18 15 28 28 32 9 16 11 35 ...
## $ AGE : num 32 37 28 32 57 57 41 36 22 49 ...
## $ LOANAMOUNT : num 1889 462 250 3693 6235 ...
## $ CURRENTRESIDENCEDURATION: int 3 2 3 2 3 5 4 1 3 4 ...
## $ EXISTINGCREDITSCOUNT : int 1 2 2 1 2 2 1 2 1 2 ...
## $ INSTALLMENTPERCENT : num 3 2 2 3 3 6 3 3 3 5 ...
## $ DEPENDENTS : int 1 1 1 1 1 2 1 1 1 2 ...
## - attr(*, ".internal.selfref")=<externalptr>
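The conversion reflected in the output above could be sketched as follows. This is a minimal base R sketch, not the report's actual code: the toy data frame `credit` is a hypothetical stand-in holding the first three rows shown earlier, and `num_cols` is a name introduced here.

```r
# Toy stand-in for the credit data (first rows of the table shown earlier)
credit <- data.frame(
  LOANDURATION       = c(31L, 18L, 15L),
  AGE                = c(32L, 37L, 28L),
  LOANAMOUNT         = c(1889L, 462L, 250L),
  INSTALLMENTPERCENT = c(3L, 2L, 2L),
  DEPENDENTS         = c(1L, 1L, 1L)
)

# Columns to cast from integer to numeric (double)
num_cols <- c("LOANDURATION", "AGE", "LOANAMOUNT", "INSTALLMENTPERCENT")
credit[num_cols] <- lapply(credit[num_cols], as.numeric)

# DEPENDENTS stays integer; the four listed columns are now num
str(credit)
```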
## Classes 'data.table' and 'data.frame': 5000 obs. of 13 variables:
## $ OTHERSONLOAN : Factor w/ 3 levels "co-applicant",..: 3 3 3 3 3 1 3 3 3 1 ...
## $ EMPLOYMENTDURATION: Factor w/ 5 levels "1_to_4","4_to_7",..: 4 1 1 3 3 3 2 2 4 3 ...
## $ SEX : Factor w/ 2 levels "female","male": 1 1 2 2 2 2 2 1 1 2 ...
## $ OWNSPROPERTY : Factor w/ 4 levels "car_other","real_estate",..: 3 3 2 3 4 4 3 1 3 4 ...
## $ TELEPHONE : Factor w/ 2 levels "none","yes": 1 1 2 1 1 2 1 1 1 2 ...
## $ CHECKINGSTATUS : Factor w/ 4 levels "0_to_200","greater_200",..: 1 3 3 1 4 4 4 3 1 4 ...
## $ HOUSING : Factor w/ 3 levels "free","own","rent": 2 2 2 2 2 1 2 2 2 1 ...
## $ CREDITHISTORY : Factor w/ 5 levels "all_credits_paid_back",..: 2 2 5 2 5 4 5 2 2 4 ...
## $ INSTALLMENTPLANS : Factor w/ 3 levels "bank","none",..: 2 3 2 2 2 2 2 2 2 2 ...
## $ JOB : Factor w/ 4 levels "management_self-employed",..: 2 2 2 2 2 2 1 2 1 2 ...
## $ EXISTINGSAVINGS : Factor w/ 5 levels "100_to_500","500_to_1000",..: 1 4 4 4 2 2 1 4 4 2 ...
## $ LOANPURPOSE : Factor w/ 11 levels "appliances","business",..: 7 3 6 10 5 11 3 11 3 1 ...
## $ FOREIGNWORKER : Factor w/ 2 levels "no","yes": 2 2 1 2 2 2 2 2 2 2 ...
## - attr(*, ".internal.selfref")=<externalptr>
## LOANDURATION AGE LOANAMOUNT CURRENTRESIDENCEDURATION
## Min. : 4.00 Min. :19.00 Min. : 250 Min. :1.000
## 1st Qu.:13.00 1st Qu.:28.00 1st Qu.: 1327 1st Qu.:2.000
## Median :21.00 Median :36.00 Median : 3238 Median :3.000
## Mean :21.39 Mean :35.93 Mean : 3480 Mean :2.854
## 3rd Qu.:29.00 3rd Qu.:44.00 3rd Qu.: 5355 3rd Qu.:4.000
## Max. :64.00 Max. :74.00 Max. :11676 Max. :6.000
## EXISTINGCREDITSCOUNT INSTALLMENTPERCENT DEPENDENTS
## Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:1.000 1st Qu.:2.000 1st Qu.:1.000
## Median :1.000 Median :3.000 Median :1.000
## Mean :1.466 Mean :2.982 Mean :1.165
## 3rd Qu.:2.000 3rd Qu.:4.000 3rd Qu.:1.000
## Max. :4.000 Max. :6.000 Max. :2.000
## OTHERSONLOAN EMPLOYMENTDURATION SEX OWNSPROPERTY
## co-applicant: 717 1_to_4 :1470 female:1896 car_other :1540
## guarantor : 110 4_to_7 :1400 male :3104 real_estate :1087
## none :4173 greater_7 : 930 savings_insurance:1660
## less_1 : 904 unknown : 713
## unemployed: 296
##
##
## TELEPHONE CHECKINGSTATUS HOUSING CREDITHISTORY
## none:2941 0_to_200 :1304 free: 739 all_credits_paid_back : 769
## yes :2059 greater_200: 305 own :3195 credits_paid_to_date :1490
## less_0 :1398 rent:1066 no_credits : 117
## no_checking:1993 outstanding_credit : 938
## prior_payments_delayed:1686
##
##
## INSTALLMENTPLANS JOB EXISTINGSAVINGS
## bank : 466 management_self-employed: 641 100_to_500 :1133
## none :3517 skilled :3400 500_to_1000 :1078
## stores:1017 unemployed : 286 greater_1000: 558
## unskilled : 673 less_100 :1856
## unknown : 375
##
##
## LOANPURPOSE FOREIGNWORKER
## car_new :945 no : 123
## furniture :853 yes:4877
## car_used :808
## radio_tv :755
## appliances:561
## repairs :283
## (Other) :795
There are no missing values.
| Name | data |
|---|---|
| Number of rows | 5000 |
| Number of columns | 21 |
| Key | NULL |
| Column type frequency: | |
| factor | 14 |
| numeric | 7 |
| Group variables | None |
Variable type: factor
| skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
|---|---|---|---|---|---|
| CHECKINGSTATUS | 0 | 1 | TRUE | 4 | no_: 1993, les: 1398, 0_t: 1304, gre: 305 |
| CREDITHISTORY | 0 | 1 | TRUE | 5 | pri: 1686, cre: 1490, out: 938, all: 769 |
| LOANPURPOSE | 0 | 1 | FALSE | 11 | car: 945, fur: 853, car: 808, rad: 755 |
| EXISTINGSAVINGS | 0 | 1 | TRUE | 5 | les: 1856, 100: 1133, 500: 1078, gre: 558 |
| EMPLOYMENTDURATION | 0 | 1 | TRUE | 5 | 1_t: 1470, 4_t: 1400, gre: 930, les: 904 |
| SEX | 0 | 1 | FALSE | 2 | mal: 3104, fem: 1896 |
| OTHERSONLOAN | 0 | 1 | FALSE | 3 | non: 4173, co-: 717, gua: 110 |
| OWNSPROPERTY | 0 | 1 | FALSE | 4 | sav: 1660, car: 1540, rea: 1087, unk: 713 |
| INSTALLMENTPLANS | 0 | 1 | FALSE | 3 | non: 3517, sto: 1017, ban: 466 |
| HOUSING | 0 | 1 | FALSE | 3 | own: 3195, ren: 1066, fre: 739 |
| JOB | 0 | 1 | TRUE | 4 | ski: 3400, uns: 673, man: 641, une: 286 |
| TELEPHONE | 0 | 1 | FALSE | 2 | non: 2941, yes: 2059 |
| FOREIGNWORKER | 0 | 1 | FALSE | 2 | yes: 4877, no: 123 |
| RISK | 0 | 1 | FALSE | 2 | No : 3330, Ris: 1670 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| LOANDURATION | 0 | 1 | 21.39 | 11.16 | 4 | 13.00 | 21.0 | 29 | 64 | ▇▇▅▁▁ |
| LOANAMOUNT | 0 | 1 | 3480.14 | 2488.23 | 250 | 1326.75 | 3238.5 | 5355 | 11676 | ▇▆▅▂▁ |
| INSTALLMENTPERCENT | 0 | 1 | 2.98 | 1.13 | 1 | 2.00 | 3.0 | 4 | 6 | ▇▇▆▂▁ |
| CURRENTRESIDENCEDURATION | 0 | 1 | 2.85 | 1.12 | 1 | 2.00 | 3.0 | 4 | 6 | ▇▇▅▂▁ |
| AGE | 0 | 1 | 35.93 | 10.65 | 19 | 28.00 | 36.0 | 44 | 74 | ▇▇▆▁▁ |
| EXISTINGCREDITSCOUNT | 0 | 1 | 1.47 | 0.57 | 1 | 1.00 | 1.0 | 2 | 4 | ▇▆▁▁▁ |
| DEPENDENTS | 0 | 1 | 1.16 | 0.37 | 1 | 1.00 | 1.0 | 1 | 2 | ▇▁▁▁▂ |
An analysis of the relationships among the categorical variables indicated that JOB and FOREIGNWORKER are associated with several other variables, so both are excluded from the data set.
## Warning in chisq.test(data$LOANPURPOSE, data$FOREIGNWORKER): Chi-squared
## approximation may be incorrect
##
## Pearson's Chi-squared test
##
## data: data$LOANPURPOSE and data$FOREIGNWORKER
## X-squared = 12.515, df = 10, p-value = 0.2521
##
## Pearson's Chi-squared test
##
## data: data$OWNSPROPERTY and data$JOB
## X-squared = 15.423, df = 9, p-value = 0.07996
##
## Pearson's Chi-squared test
##
## data: data$JOB and data$FOREIGNWORKER
## X-squared = 2.9341, df = 3, p-value = 0.4019
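As a reading aid for the tests above, here is a minimal sketch of a Pearson chi-squared test of independence in base R. The contingency table `tab` is hypothetical and not taken from the credit data. R emits the "approximation may be incorrect" warning when some expected cell counts are small; `simulate.p.value = TRUE` is one standard way to obtain a Monte Carlo p-value in that case.

```r
set.seed(1)  # assumption: arbitrary seed, only affects the simulated p-value

# Hypothetical 2 x 3 contingency table (not from the credit data)
tab <- matrix(c(20, 30, 25, 25, 30, 20), nrow = 2,
              dimnames = list(group = c("a", "b"), level = c("x", "y", "z")))

res <- chisq.test(tab)
res$expected   # expected counts under independence
res$statistic  # Pearson X-squared
res$p.value    # a small p-value suggests the two variables are associated

# Monte Carlo p-value, an option when expected counts are small
res_sim <- chisq.test(tab, simulate.p.value = TRUE, B = 2000)
res_sim$p.value
```

A p-value above the chosen significance level means the test does not reject independence between the two variables.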
##
## No Risk Risk
## 3330 1670
##
## No Risk Risk
## 66.6 33.4
The plot shows the highest demand for loans with a duration of 4 to 6 months, followed by loans of roughly 30 months; beyond that point the number of loans falls off clearly as duration increases.
## LOANDURATION
## Min. : 4.00
## 1st Qu.:13.00
## Median :21.00
## Mean :21.39
## 3rd Qu.:29.00
## Max. :64.00
Visually, applicants who request short-duration loans represent a lower risk than those who request loans of 30 months or more.
## LOANAMOUNT
## Min. : 250
## 1st Qu.: 1327
## Median : 3238
## Mean : 3480
## 3rd Qu.: 5355
## Max. :11676
## [250,1.13e+03] (1.13e+03,2.01e+03] (2.01e+03,2.89e+03] (2.89e+03,3.77e+03]
## 1144 546 585 553
## (3.77e+03,4.64e+03] (4.64e+03,5.52e+03] (5.52e+03,6.4e+03] (6.4e+03,7.28e+03]
## 503 500 459 335
## (7.28e+03,8.16e+03] (8.16e+03,9.04e+03] (9.04e+03,9.92e+03] (9.92e+03,1.08e+04]
## 193 95 56 23
## (1.08e+04,1.17e+04]
## 8
## [4,8.62] (8.62,13.2] (13.2,17.8] (17.8,22.5] (22.5,27.1] (27.1,31.7]
## 757 618 631 667 776 570
## (31.7,36.3] (36.3,40.9] (40.9,45.5] (45.5,50.2] (50.2,54.8] (54.8,59.4]
## 523 231 150 52 14 4
## (59.4,64]
## 7
Regarding age, applicants of about 20 years are the ones who request credit most often; however, the large majority of applicants are between 30 and 50 years old, and beyond that range demand for credit declines clearly. The average applicant is 35.93 years old, and the average loan duration is 21.39 months.
## AGE
## Min. :19.00
## 1st Qu.:28.00
## Median :36.00
## Mean :35.93
## 3rd Qu.:44.00
## Max. :74.00
Visually, applicants older than 50 show a higher risk than applicants between 20 and 35.
## Warning: Removed 6 rows containing missing values (geom_bar).
## [19,23.2] (23.2,27.5] (27.5,31.7] (31.7,35.9] (35.9,40.2] (40.2,44.4]
## 756 445 610 635 830 587
## (44.4,48.6] (48.6,52.8] (52.8,57.1] (57.1,61.3] (61.3,65.5] (65.5,69.8]
## 477 340 215 74 19 4
## (69.8,74]
## 8
Regarding other parties on the loan (OTHERSONLOAN), 83.46% of applicants have none, while 14.34% have a co-applicant and 2.2% have a guarantor.
##
## co-applicant guarantor none
## 717 110 4173
##
## co-applicant guarantor none
## 0.1434 0.0220 0.8346
About half of the loans (52.12%) are intended for furniture, new cars, or used cars, which account for 17.06%, 18.90%, and 16.16% respectively.
##
## appliances business car_new car_used education furniture other
## 561 146 945 808 167 853 113
## radio_tv repairs retraining vacation
## 755 283 164 205
##
## appliances business car_new car_used education furniture other
## 0.1122 0.0292 0.1890 0.1616 0.0334 0.1706 0.0226
## radio_tv repairs retraining vacation
## 0.1510 0.0566 0.0328 0.0410
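The shares above can be reproduced from the printed counts. The sketch below is a minimal base R example that rebuilds the LOANPURPOSE frequencies by hand; the names `purpose_counts`, `shares`, and `top3` are introduced here for illustration.

```r
# Counts copied from the LOANPURPOSE frequency table above
purpose_counts <- c(appliances = 561, business = 146, car_new = 945,
                    car_used = 808, education = 167, furniture = 853,
                    other = 113, radio_tv = 755, repairs = 283,
                    retraining = 164, vacation = 205)

# Relative frequencies, as prop.table() would give on the real factor
shares <- purpose_counts / sum(purpose_counts)

# Combined share of furniture, new cars and used cars
top3 <- sum(shares[c("furniture", "car_new", "car_used")])
round(100 * top3, 2)
```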
We tried to contrast the variables in order to evaluate the business. For example, one would expect people in higher income ranges to be less risky. A further step could be to create a variable based on the income left over after paying debts: a person can earn 10,000 soles a month but owe 30,000 to the financial system, or earn only 5,000 and owe 500.
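One possible form of such a variable is a debt-to-income ratio. The sketch below is purely illustrative: the data set used here has no income or total-debt column, so `monthly_income`, `total_debt`, and `debt_to_income` are hypothetical names, and the figures are the two cases from the example.

```r
# Hypothetical applicants, using the two cases from the example (in soles)
applicants <- data.frame(
  monthly_income = c(10000, 5000),
  total_debt     = c(30000, 500)
)

# Debt owed per sol of monthly income; higher means more leveraged
applicants$debt_to_income <- applicants$total_debt / applicants$monthly_income
applicants
```

Under this ratio, the higher earner in the example is the more leveraged of the two, which is the distinction that raw income alone would miss.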
To contrast the behavior of the variables with the business under evaluation, we could add variables that discriminate better.
With Boruta we tried to keep the variables most relevant to the model. Because the data has no time cuts (for example, at six months), how well the model discriminates over time could not be evaluated. Since the variables are static, it is harder to determine a discrimination factor. We could also have considered combined variables, such as sex crossed with income or with other categories.
There are no missing values, so no imputation was needed.
Essentially, the Boruta algorithm trains a random forest on the combined set of original features and randomized copies of them (shadow features). During training, this random forest considers only a subset of all features at each node.
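The shadow-feature construction can be sketched in base R as below. This shows only how the randomized copies are built, not the Boruta package's implementation; the toy data frame `x` and the `shadow_` prefix are assumptions made for the example.

```r
set.seed(1)  # assumption: arbitrary seed so the permutation is reproducible

# Toy features (illustrative values only)
x <- data.frame(
  AGE          = c(32, 37, 28, 32, 57, 57),
  LOANDURATION = c(31, 18, 15, 28, 28, 32)
)

# Shadow copy: each column is permuted independently, which keeps its
# distribution but breaks any relation to the response
shadow <- as.data.frame(lapply(x, sample))
names(shadow) <- paste0("shadow_", names(x))

# Extended feature set on which the random forest would be trained
extended <- cbind(x, shadow)
print(extended)
```

Boruta then compares the importance of each real feature against the best importance reached by any shadow feature over repeated runs, confirming features that consistently beat it.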
Using the Boruta algorithm, an alternative variable-selection method, we determined which variables most influence the default variable (RISK). The algorithm confirms all 19 variables as important; none is rejected as uninfluential, as the plot below shows.
Below is the importance history plot, where the green attributes show a much higher importance than the shadow attributes, drawn in blue.
Here we could choose the most important variables according to their mean importance, preferring those with the smallest range between minimum and maximum importance across iterations (the most stable ones).
## attribute meanImp medianImp minImp maxImp decision
## 1: EMPLOYMENTDURATION 39.853412 40.027328 37.653008 41.230613 Confirmed
## 2: AGE 33.129929 33.079796 31.740182 34.902206 Confirmed
## 3: CHECKINGSTATUS 30.780408 30.608107 29.482565 31.945036 Confirmed
## 4: OTHERSONLOAN 29.670025 29.439980 28.829647 31.360621 Confirmed
## 5: SEX 29.122594 29.548718 25.177103 33.189944 Confirmed
## 6: LOANDURATION 28.117218 27.808564 26.373261 30.672204 Confirmed
## 7: LOANAMOUNT 26.870930 27.031948 24.151204 29.062599 Confirmed
## 8: EXISTINGCREDITSCOUNT 25.531695 25.411251 23.968302 27.507689 Confirmed
## 9: OWNSPROPERTY 25.033033 25.118686 23.411900 26.801097 Confirmed
## 10: CURRENTRESIDENCEDURATION 24.020469 23.667321 22.529908 25.697417 Confirmed
## 11: TELEPHONE 23.855662 24.253180 21.795987 24.575069 Confirmed
## 12: LOANDURATION_n 21.941686 21.835271 21.076616 23.309363 Confirmed
## 13: HOUSING 17.586400 17.323794 16.509296 19.228175 Confirmed
## 14: CREDITHISTORY 16.786771 16.870717 15.062138 17.578415 Confirmed
## 15: EXISTINGSAVINGS 12.542730 12.394264 11.509938 13.610939 Confirmed
## 16: INSTALLMENTPLANS 12.009783 11.939293 10.843983 14.043137 Confirmed
## 17: INSTALLMENTPERCENT 10.975612 10.828390 10.191274 11.570790 Confirmed
## 18: DEPENDENTS 9.275352 9.056440 8.118089 10.893466 Confirmed
## 19: LOANPURPOSE 4.280712 4.032957 2.567785 5.804621 Confirmed
## attribute meanImp medianImp minImp maxImp decision
## 1: EMPLOYMENTDURATION 40.164292 40.324878 37.770977 43.272873 Confirmed
## 2: CHECKINGSTATUS 31.261499 31.339636 29.229301 33.532249 Confirmed
## 3: OTHERSONLOAN 30.590918 30.980418 28.043918 33.118282 Confirmed
## 4: AGE_n 29.814695 30.019524 28.145255 31.597384 Confirmed
## 5: SEX 29.809302 30.203775 27.090567 31.877534 Confirmed
## 6: LOANDURATION 28.572417 28.330908 27.406162 29.937915 Confirmed
## 7: LOANAMOUNT 27.086143 27.190457 24.660851 28.509144 Confirmed
## 8: EXISTINGCREDITSCOUNT 26.541220 26.567531 25.441425 27.804449 Confirmed
## 9: CURRENTRESIDENCEDURATION 24.806361 24.665259 23.544241 25.764933 Confirmed
## 10: OWNSPROPERTY 24.503700 24.621956 22.566988 26.742844 Confirmed
## 11: TELEPHONE 24.324521 24.293246 22.367578 26.012559 Confirmed
## 12: LOANDURATION_n 21.643050 21.868246 19.754058 22.619813 Confirmed
## 13: HOUSING 18.259865 18.336444 16.424203 20.570527 Confirmed
## 14: CREDITHISTORY 16.137829 16.057376 15.195587 17.252472 Confirmed
## 15: EXISTINGSAVINGS 12.627498 12.783498 11.399990 14.246745 Confirmed
## 16: INSTALLMENTPLANS 11.701892 11.094096 8.643885 14.246422 Confirmed
## 17: INSTALLMENTPERCENT 11.186265 11.167211 10.038348 12.152617 Confirmed
## 18: DEPENDENTS 9.326550 9.232201 8.272204 10.803452 Confirmed
## 19: LOANPURPOSE 4.087028 4.220269 2.206438 5.492419 Confirmed
## attribute meanImp medianImp minImp maxImp decision
## 1: EMPLOYMENTDURATION 40.390693 40.426332 38.598742 43.413984 Confirmed
## 2: AGE 33.968305 33.976034 31.660996 35.990395 Confirmed
## 3: CHECKINGSTATUS 31.550118 31.803730 29.315143 33.343750 Confirmed
## 4: OTHERSONLOAN 31.156569 31.417736 28.939624 32.498316 Confirmed
## 5: SEX 29.505856 29.684140 28.178332 31.009044 Confirmed
## 6: LOANDURATION 28.988690 29.065374 27.257272 30.939047 Confirmed
## 7: EXISTINGCREDITSCOUNT 26.996884 27.259455 24.143524 29.050927 Confirmed
## 8: OWNSPROPERTY 25.766894 26.018744 23.974284 27.304454 Confirmed
## 9: CURRENTRESIDENCEDURATION 25.319990 25.612994 22.473403 26.368296 Confirmed
## 10: TELEPHONE 24.281577 24.344259 22.507951 26.162671 Confirmed
## 11: LOANAMOUNT_n 23.189144 23.044438 21.892921 24.666904 Confirmed
## 12: LOANDURATION_n 21.720813 21.865712 19.697876 23.052787 Confirmed
## 13: HOUSING 18.572068 18.484706 17.421285 20.292154 Confirmed
## 14: CREDITHISTORY 16.700992 16.866744 15.454072 17.680639 Confirmed
## 15: EXISTINGSAVINGS 12.964019 13.102271 10.934197 14.302789 Confirmed
## 16: INSTALLMENTPLANS 12.839789 13.062416 10.990844 15.539287 Confirmed
## 17: INSTALLMENTPERCENT 11.100957 10.765779 9.690924 12.542144 Confirmed
## 18: DEPENDENTS 8.721735 8.643503 7.271188 9.974219 Confirmed
## 19: LOANPURPOSE 4.546980 4.447390 2.901558 6.918347 Confirmed
## attribute meanImp medianImp minImp maxImp decision
## 1: EMPLOYMENTDURATION 41.160508 41.211001 38.057316 43.426001 Confirmed
## 2: CHECKINGSTATUS 31.953095 32.101926 29.438198 33.575608 Confirmed
## 3: OTHERSONLOAN 31.403688 31.337934 29.772151 33.027241 Confirmed
## 4: AGE_n 30.634289 30.362209 28.260829 34.002227 Confirmed
## 5: SEX 30.039746 29.747411 28.081276 32.265615 Confirmed
## 6: LOANDURATION 29.433396 29.534955 26.886010 31.239266 Confirmed
## 7: EXISTINGCREDITSCOUNT 27.112740 27.049528 25.848812 28.573572 Confirmed
## 8: OWNSPROPERTY 25.746180 25.644511 24.463069 27.819679 Confirmed
## 9: TELEPHONE 25.212088 25.396886 23.780960 26.555830 Confirmed
## 10: CURRENTRESIDENCEDURATION 25.191252 25.113024 23.356655 27.445121 Confirmed
## 11: LOANAMOUNT_n 24.277483 24.251003 21.747458 26.153495 Confirmed
## 12: LOANDURATION_n 21.985099 21.982981 19.671726 23.912971 Confirmed
## 13: HOUSING 19.036828 18.966337 17.005093 20.854348 Confirmed
## 14: CREDITHISTORY 17.106039 17.120655 15.473482 18.262769 Confirmed
## 15: EXISTINGSAVINGS 13.460252 13.440092 11.122331 16.110528 Confirmed
## 16: INSTALLMENTPLANS 12.542876 12.860055 10.236378 14.320066 Confirmed
## 17: INSTALLMENTPERCENT 11.034733 10.960218 10.365609 12.615267 Confirmed
## 18: DEPENDENTS 9.863946 9.896618 8.197944 11.277400 Confirmed
## 19: LOANPURPOSE 4.180475 4.166471 1.961230 6.592706 Confirmed
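The selection criterion described before these tables (high mean importance, small range between minimum and maximum) can be sketched as follows. The values are the first five rows of the first table above; the column `rangeImp` and the ordering rule are assumptions introduced for illustration, not something the original analysis is known to compute.

```r
# First five rows of the first importance table above
imp <- data.frame(
  attribute = c("EMPLOYMENTDURATION", "AGE", "CHECKINGSTATUS",
                "OTHERSONLOAN", "SEX"),
  meanImp = c(39.853412, 33.129929, 30.780408, 29.670025, 29.122594),
  minImp  = c(37.653008, 31.740182, 29.482565, 28.829647, 25.177103),
  maxImp  = c(41.230613, 34.902206, 31.945036, 31.360621, 33.189944)
)

# Stability measure: spread of the importance across iterations
imp$rangeImp <- imp$maxImp - imp$minImp

# Highest mean importance first; ties broken by the smaller range
imp[order(-imp$meanImp, imp$rangeImp), ]
```

In this subset SEX has a mean importance close to OTHERSONLOAN but a much wider range, so it would count as the less stable of the two.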
| | Variable | IV |
|---|---|---|
| 7 | EMPLOYMENTDURATION | 1.1669430 |
| 12 | OWNSPROPERTY | 1.1468583 |
| 13 | AGE | 1.1202743 |
| 5 | LOANAMOUNT | 1.1099155 |
| 21 | AGE_n | 1.0927639 |
| 19 | LOANAMOUNT_n | 1.0844599 |
| 2 | LOANDURATION | 1.0772301 |
| 20 | LOANDURATION_n | 1.0109078 |
| 1 | CHECKINGSTATUS | 0.9680050 |
| 11 | CURRENTRESIDENCEDURATION | 0.9313804 |
| 3 | CREDITHISTORY | 0.8574463 |
| 8 | INSTALLMENTPERCENT | 0.8182934 |
| 15 | HOUSING | 0.6973851 |
| 16 | EXISTINGCREDITSCOUNT | 0.6746747 |
| 6 | EXISTINGSAVINGS | 0.6482769 |
| 18 | TELEPHONE | 0.5763185 |
| 10 | OTHERSONLOAN | 0.4655481 |
| 4 | LOANPURPOSE | 0.4519853 |
| 17 | DEPENDENTS | 0.2227255 |
| 14 | INSTALLMENTPLANS | 0.1456835 |
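For reference, the information value (IV) of a categorical predictor is commonly defined as the sum over categories of (share of goods − share of bads) × ln(share of goods / share of bads). The sketch below implements that textbook definition in base R; it is not necessarily how the table above was produced. The function `information_value` and the toy vectors are hypothetical, and the sketch assumes a two-class response with no empty cells.

```r
# Textbook IV for a categorical predictor x and a two-class response y.
# Assumes every category contains at least one good and one bad case.
information_value <- function(x, y, bad = "Risk") {
  tab <- table(x, y)
  good <- setdiff(colnames(tab), bad)
  pct_good <- tab[, good] / sum(tab[, good])  # distribution of goods
  pct_bad  <- tab[, bad]  / sum(tab[, bad])   # distribution of bads
  woe <- log(pct_good / pct_bad)              # weight of evidence
  sum((pct_good - pct_bad) * woe)
}

# Toy data: "own" is mostly No Risk, "rent" is evenly split
housing <- rep(c("own", "rent"), times = c(60, 40))
risk <- c(rep("No Risk", 50), rep("Risk", 10),
          rep("No Risk", 20), rep("Risk", 20))
iv_housing <- information_value(housing, risk)

# A predictor with the same distribution in both classes has IV = 0
flat <- rep(c("a", "b"), 50)
risk2 <- rep(c("No Risk", "No Risk", "Risk", "Risk"), 25)
iv_flat <- information_value(flat, risk2)

c(iv_housing = iv_housing, iv_flat = iv_flat)
```

The sign convention of the weight of evidence (goods over bads or the reverse) does not change the IV, because both factors of each term flip sign together.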
We partition the data so that 75% is used to develop the model and 25% to validate it.
Proportion of the RISK variable in the full data set, the training partition, and the validation partition:
##
## No Risk Risk
## 0.666 0.334
##
## No Risk Risk
## 0.6653333 0.3346667
##
## No Risk Risk
## 0.668 0.332
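A split that keeps the class proportions close in both partitions can be sketched with a stratified sample in base R. The report does not state which function produced its partition, so this is only one way to obtain a similar result; the seed and the vector `risk` (rebuilt from the class counts shown earlier) are assumptions.

```r
set.seed(123)  # assumption: arbitrary seed for reproducibility

# Response rebuilt from the class counts reported earlier
risk <- factor(rep(c("No Risk", "Risk"), times = c(3330, 1670)))

# Sample 75% of the row indices within each class
train_idx <- unlist(
  lapply(split(seq_along(risk), risk), function(idx) {
    sample(idx, size = round(0.75 * length(idx)))
  }),
  use.names = FALSE
)

train <- risk[train_idx]
test  <- risk[-train_idx]

# Class proportions stay close to the full data in both partitions
prop.table(table(train))
prop.table(table(test))
```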
##
## Call:
## randomForest(formula = RISK ~ ., data = Train, importance = T)
## Type of random forest: classification
## Number of trees: 500
## No. of variables tried at each split: 4
##
## OOB estimate of error rate: 21.33%
## Confusion matrix:
## No Risk Risk class.error
## No Risk 2215 280 0.1122244
## Risk 520 735 0.4143426
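The error rates in this output follow directly from the confusion matrix, whose rows are the observed classes and whose columns are the out-of-bag predictions. A minimal check in base R, with the matrix typed in by hand from the output above:

```r
# Confusion matrix copied from the randomForest output above
conf <- matrix(c(2215, 280,
                  520, 735),
               nrow = 2, byrow = TRUE,
               dimnames = list(actual = c("No Risk", "Risk"),
                               predicted = c("No Risk", "Risk")))

# Per-class error: share of each observed class that was misclassified
class_error <- 1 - diag(conf) / rowSums(conf)

# Overall OOB error: share of all training rows that were misclassified
oob_error <- 1 - sum(diag(conf)) / sum(conf)

round(class_error, 7)
round(100 * oob_error, 2)
```

The Risk class is misclassified far more often (about 41%) than the No Risk class (about 11%), so the overall 21.33% error hides a weaker performance on the minority class.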
## Variable Imp_tot
## EMPLOYMENTDURATION EMPLOYMENTDURATION 5.5589737
## AGE AGE 5.3565739
## LOANDURATION LOANDURATION 5.0821198
## LOANAMOUNT LOANAMOUNT 4.8396395
## AGE_n AGE_n 4.0278251
## CHECKINGSTATUS CHECKINGSTATUS 3.9350799
## LOANDURATION_n LOANDURATION_n 3.6975965
## OWNSPROPERTY OWNSPROPERTY 3.6087043
## SEX SEX 3.4269828
## LOANAMOUNT_n LOANAMOUNT_n 3.3837034
## CURRENTRESIDENCEDURATION CURRENTRESIDENCEDURATION 3.0952248
## EXISTINGCREDITSCOUNT EXISTINGCREDITSCOUNT 3.0120146
## OTHERSONLOAN OTHERSONLOAN 2.8131923
## LOANPURPOSE LOANPURPOSE 2.5497943
## TELEPHONE TELEPHONE 1.9006555
## HOUSING HOUSING 1.4176819
## EXISTINGSAVINGS EXISTINGSAVINGS 1.0809348
## INSTALLMENTPERCENT INSTALLMENTPERCENT 0.9230203
## CREDITHISTORY CREDITHISTORY 0.8753996
## INSTALLMENTPLANS INSTALLMENTPLANS 0.7585049
## DEPENDENTS DEPENDENTS 0.0000000
## MeanDecreaseAccuracy MeanDecreaseGini
## EMPLOYMENTDURATION 2.00777372 0.6300751
## AGE 0.64645508 1.7889939
## LOANDURATION 0.88014646 1.2808485
## LOANAMOUNT 0.02358105 1.8949336
## AGE_n 0.34474490 0.7619553
## CHECKINGSTATUS 0.78946069 0.2244944
## LOANDURATION_n 0.12426504 0.6522066
## OWNSPROPERTY 0.54034415 0.1472353
## SEX 1.36155616 -0.8556982
## LOANAMOUNT_n -0.31278045 0.7753590
## CURRENTRESIDENCEDURATION 0.50308195 -0.3289820
## EXISTINGCREDITSCOUNT 0.64641051 -0.5555208
## OTHERSONLOAN 0.70094532 -0.8088779
## LOANPURPOSE -1.37736318 1.0060327
## TELEPHONE 0.11362325 -1.1340926
## HOUSING -0.53820094 -0.9652420
## EXISTINGSAVINGS -1.22810410 -0.6120860
## INSTALLMENTPERCENT -1.30801051 -0.6900941
## CREDITHISTORY -1.22307305 -0.8226522
## INSTALLMENTPLANS -1.16518725 -0.9974327
## DEPENDENTS -1.52966883 -1.3914560