Los conceptos pueden verse en el libro Introducción al riesgo financiero que puede obtenerse en:

https://1drv.ms/b/s!Aj-hHTVbsx01h4JmNiA9O57JQuANWg?e=l4IFbm

0. Introducción.

Los modelos de crédito deben permitir estimar, de forma explícita, la probabilidad de incumplimiento, los patrones de migración hacia distintos estados de deterioro y la magnitud de las pérdidas cuando el crédito no se paga.

Así, utilizando las variables de la solicitud del crédito se trata de predecir, para cada perfil o segmento identificado, la probabilidad de que un periodo después, generalmente un año, del otorgamiento del crédito, el cliente alcance la mora definida como incumplimiento.

Los modelos son calculados por tipo de producto dentro del portafolio de crédito de la entidad financiera, de acuerdo con la información histórica disponible y a su calidad.

Los modelos más utilizados para el cálculo de la probabilidad de incumplimiento son:

1.Árboles de decisión.

Modelo logit.
Modelo probit.
Análisis discriminante.
Redes neuronales.

1. Datos.

Se utiliza un conjunto de datos provenientes de una entidad cooperativa de crédito que tiene registro de 10 variables para 653 clientes. Los datos se pueden obtener en:

https://1drv.ms/x/s!Aj-hHTVbsx01hs9NANoVo0bdoAy93w?e=UFhF1h

2. Árbol de decisión.

arbol1 <-rpart(Cumple~Genero+Ingreso+Anterior+Edad+Actual+
Cargo+Civil+Ocupacion,data=H)
arbol2<-prune(arbol1,cp=0.00001)
arbol2

## n= 653 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 653 264 Si (0.40428790 0.59571210)  
##    2) Edad< 37.5 373 146 No (0.60857909 0.39142091)  
##      4) Genero< 0.5 204  46 No (0.77450980 0.22549020)  
##        8) Actual< 6.5 171  26 No (0.84795322 0.15204678) *
##        9) Actual>=6.5 33  13 Si (0.39393939 0.60606061) *
##      5) Genero>=0.5 169  69 Si (0.40828402 0.59171598)  
##       10) Actual< 3.5 109  48 No (0.55963303 0.44036697)  
##         20) Edad< 30 54  13 No (0.75925926 0.24074074) *
##         21) Edad>=30 55  20 Si (0.36363636 0.63636364)  
##           42) Cargo< 2.5 33  15 No (0.54545455 0.45454545)  
##             84) Ocupacion>=0.5 20   5 No (0.75000000 0.25000000) *
##             85) Ocupacion< 0.5 13   3 Si (0.23076923 0.76923077) *
##           43) Cargo>=2.5 22   2 Si (0.09090909 0.90909091) *
##       11) Actual>=3.5 60   8 Si (0.13333333 0.86666667) *
##    3) Edad>=37.5 280  37 Si (0.13214286 0.86785714) *

rpart.plot(arbol2)

pre10<-predict(arbol2,type="prob")
pre11<-predict(arbol2,type="class")
pre12<-predict(arbol2,type="matrix")
tabla1<-table(pre11,Cumple)
sum(diag(tabla1))/sum(tabla1)

## [1] 0.8361409

2.1 Curva ROC.

roc(Cumple,pre10[,2],plot = TRUE, legacy.axes = TRUE,
    percent = TRUE, xlab = "% Falsos positivos",
    ylab = "% verdaderos postivios", col = "red", lwd = 2,
    print.auc = TRUE)

## Setting levels: control = No, case = Si

## Setting direction: controls < cases

## 
## Call:
## roc.default(response = Cumple, predictor = pre10[, 2], percent = TRUE,     plot = TRUE, legacy.axes = TRUE, xlab = "% Falsos positivos",     ylab = "% verdaderos postivios", col = "red", lwd = 2, print.auc = TRUE)
## 
## Data: pre10[, 2] in 264 controls (Cumple No) < 389 cases (Cumple Si).
## Area under the curve: 84.94%

2.2. Matriz de confusión.

matriz1<-confusionMatrix(tabla1);matriz1

## Confusion Matrix and Statistics
## 
##      Cumple
## pre11  No  Si
##    No 201  44
##    Si  63 345
##                                           
##                Accuracy : 0.8361          
##                  95% CI : (0.8055, 0.8637)
##     No Information Rate : 0.5957          
##     P-Value [Acc > NIR] : < 2e-16         
##                                           
##                   Kappa : 0.6558          
##                                           
##  Mcnemar's Test P-Value : 0.08184         
##                                           
##             Sensitivity : 0.7614          
##             Specificity : 0.8869          
##          Pos Pred Value : 0.8204          
##          Neg Pred Value : 0.8456          
##              Prevalence : 0.4043          
##          Detection Rate : 0.3078          
##    Detection Prevalence : 0.3752          
##       Balanced Accuracy : 0.8241          
##                                           
##        'Positive' Class : No              
##

3. Modelo logit.

m1=glm(Cumple~Genero+Ingreso+Anterior+Edad+Actual+Cargo+Civil+Ocupacion,family=binomial(link="logit"),data=D)
summary(m1)

## 
## Call:
## glm(formula = Cumple ~ Genero + Ingreso + Anterior + Edad + Actual + 
##     Cargo + Civil + Ocupacion, family = binomial(link = "logit"), 
##     data = D)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.7362  -0.6539   0.1052   0.6421   2.5280  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -7.62945    0.74314 -10.266  < 2e-16 ***
## Genero       1.54359    0.22525   6.853 7.24e-12 ***
## Ingreso      0.52030    0.24612   2.114  0.03451 *  
## Anterior     0.59623    0.23633   2.523  0.01164 *  
## Edad         0.13900    0.01556   8.934  < 2e-16 ***
## Actual       0.28740    0.04069   7.064 1.62e-12 ***
## Cargo        0.18344    0.10330   1.776  0.07575 .  
## Civil        0.23725    0.13451   1.764  0.07776 .  
## Ocupacion   -0.64970    0.22095  -2.940  0.00328 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 881.17  on 652  degrees of freedom
## Residual deviance: 519.26  on 644  degrees of freedom
## AIC: 537.26
## 
## Number of Fisher Scoring iterations: 6

pre3<-predict(m1,type="response")
tabla2<-table(true=Cumple,pred=round(fitted(m1)))
sum(diag(tabla2))/sum(tabla2)

## [1] 0.8300153

3.1. Curva ROC.

roc(Cumple,pre3,plot = TRUE, legacy.axes = TRUE,
    percent = TRUE, xlab = "% Falsos positivos",
    ylab = "% verdaderos postivios", col = "red", lwd = 2,
    print.auc = TRUE)

## Setting levels: control = 0, case = 1

## Setting direction: controls < cases

## 
## Call:
## roc.default(response = Cumple, predictor = pre3, percent = TRUE,     plot = TRUE, legacy.axes = TRUE, xlab = "% Falsos positivos",     ylab = "% verdaderos postivios", col = "red", lwd = 2, print.auc = TRUE)
## 
## Data: pre3 in 264 controls (Cumple 0) < 389 cases (Cumple 1).
## Area under the curve: 89.55%

3.2. Matriz de cofusión.

matriz2<-confusionMatrix(tabla2);matriz2

## Confusion Matrix and Statistics
## 
##     pred
## true   0   1
##    0 210  54
##    1  57 332
##                                         
##                Accuracy : 0.83          
##                  95% CI : (0.799, 0.858)
##     No Information Rate : 0.5911        
##     P-Value [Acc > NIR] : <2e-16        
##                                         
##                   Kappa : 0.6477        
##                                         
##  Mcnemar's Test P-Value : 0.8494        
##                                         
##             Sensitivity : 0.7865        
##             Specificity : 0.8601        
##          Pos Pred Value : 0.7955        
##          Neg Pred Value : 0.8535        
##              Prevalence : 0.4089        
##          Detection Rate : 0.3216        
##    Detection Prevalence : 0.4043        
##       Balanced Accuracy : 0.8233        
##                                         
##        'Positive' Class : 0             
##

4. Modelo probit.

m2=glm(Cumple~Genero+Ingreso+Anterior+Edad+Actual+Cargo+
Civil+Ocupacion,
family=binomial(link="probit"))
summary(m2)

## 
## Call:
## glm(formula = Cumple ~ Genero + Ingreso + Anterior + Edad + Actual + 
##     Cargo + Civil + Ocupacion, family = binomial(link = "probit"))
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -2.83072  -0.67329   0.05407   0.67486   2.55421  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -4.371160   0.401669 -10.883  < 2e-16 ***
## Genero       0.881288   0.127801   6.896 5.36e-12 ***
## Ingreso      0.325163   0.141525   2.298  0.02159 *  
## Anterior     0.355341   0.135431   2.624  0.00870 ** 
## Edad         0.078990   0.008437   9.362  < 2e-16 ***
## Actual       0.164675   0.022261   7.398 1.39e-13 ***
## Cargo        0.090673   0.059217   1.531  0.12572    
## Civil        0.136009   0.077397   1.757  0.07887 .  
## Ocupacion   -0.362081   0.127008  -2.851  0.00436 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 881.17  on 652  degrees of freedom
## Residual deviance: 520.03  on 644  degrees of freedom
## AIC: 538.03
## 
## Number of Fisher Scoring iterations: 7

pre4<-predict(m2,type="response")
tabla3<-table(true=Cumple,pred=round(fitted(m2)))
sum(diag(tabla3))/sum(tabla3)

## [1] 0.8269525

4.1. Curva ROC.

roc(Cumple,pre4,plot = TRUE, legacy.axes = TRUE,
    percent = TRUE, xlab = "% Falsos positivos",
    ylab = "% verdaderos postivios", col = "red", lwd = 2,
    print.auc = TRUE)

## Setting levels: control = 0, case = 1

## Setting direction: controls < cases

## 
## Call:
## roc.default(response = Cumple, predictor = pre4, percent = TRUE,     plot = TRUE, legacy.axes = TRUE, xlab = "% Falsos positivos",     ylab = "% verdaderos postivios", col = "red", lwd = 2, print.auc = TRUE)
## 
## Data: pre4 in 264 controls (Cumple 0) < 389 cases (Cumple 1).
## Area under the curve: 89.54%

4.2. Matriz de confusión.

matriz3<-confusionMatrix(tabla3);matriz3

## Confusion Matrix and Statistics
## 
##     pred
## true   0   1
##    0 210  54
##    1  59 330
##                                           
##                Accuracy : 0.827           
##                  95% CI : (0.7957, 0.8552)
##     No Information Rate : 0.5881          
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.6418          
##                                           
##  Mcnemar's Test P-Value : 0.7067          
##                                           
##             Sensitivity : 0.7807          
##             Specificity : 0.8594          
##          Pos Pred Value : 0.7955          
##          Neg Pred Value : 0.8483          
##              Prevalence : 0.4119          
##          Detection Rate : 0.3216          
##    Detection Prevalence : 0.4043          
##       Balanced Accuracy : 0.8200          
##                                           
##        'Positive' Class : 0               
##

5. Análisis discriminante.

dis<-lda(Cumple~Genero+Ingreso+Anterior+Edad+Actual+Cargo+
Civil+Ocupacion,prior=c(1,1)/2)
dis

## Call:
## lda(Cumple ~ Genero + Ingreso + Anterior + Edad + Actual + Cargo + 
##     Civil + Ocupacion, prior = c(1, 1)/2)
## 
## Prior probabilities of groups:
##   0   1 
## 0.5 0.5 
## 
## Group means:
##      Genero   Ingreso  Anterior     Edad   Actual    Cargo    Civil Ocupacion
## 0 0.3068182 0.6666667 0.5909091 30.71970 3.075758 1.390152 1.924242 0.6174242
## 1 0.6760925 0.7429306 0.7300771 41.04884 7.398458 1.429306 2.069409 0.4910026
## 
## Coefficients of linear discriminants:
##                   LD1
## Genero     0.93447129
## Ingreso    0.31925438
## Anterior   0.45131921
## Edad       0.07228589
## Actual     0.09202760
## Cargo      0.09896157
## Civil      0.15684656
## Ocupacion -0.38648743

dis$counts

##   0   1 
## 264 389

grupo <- predict(dis,method="plug-in")$class 
tabla5<-table(grupo,Cumple) 
sum(diag(tabla5))/sum(tabla5)

## [1] 0.8116386

plot(dis)

pre5<-predict(dis)
proba<-pre5$posterior

5.1. Curva ROC.

roc(Cumple,proba[,2],plot = TRUE, legacy.axes = TRUE,
    percent = TRUE, xlab = "% Falsos positivos",
    ylab = "% verdaderos postivios", col = "red", lwd = 2,
    print.auc = TRUE)

## Setting levels: control = 0, case = 1

## Setting direction: controls < cases

## 
## Call:
## roc.default(response = Cumple, predictor = proba[, 2], percent = TRUE,     plot = TRUE, legacy.axes = TRUE, xlab = "% Falsos positivos",     ylab = "% verdaderos postivios", col = "red", lwd = 2, print.auc = TRUE)
## 
## Data: proba[, 2] in 264 controls (Cumple 0) < 389 cases (Cumple 1).
## Area under the curve: 88.96%

5.2. Matriz de confusión.

matriz3<-confusionMatrix(tabla5);matriz3

## Confusion Matrix and Statistics
## 
##      Cumple
## grupo   0   1
##     0 227  86
##     1  37 303
##                                           
##                Accuracy : 0.8116          
##                  95% CI : (0.7795, 0.8409)
##     No Information Rate : 0.5957          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.6203          
##                                           
##  Mcnemar's Test P-Value : 1.505e-05       
##                                           
##             Sensitivity : 0.8598          
##             Specificity : 0.7789          
##          Pos Pred Value : 0.7252          
##          Neg Pred Value : 0.8912          
##              Prevalence : 0.4043          
##          Detection Rate : 0.3476          
##    Detection Prevalence : 0.4793          
##       Balanced Accuracy : 0.8194          
##                                           
##        'Positive' Class : 0               
##

5.3. Clasificación de un nuevo cliente.

nuevo<-data.frame(Genero=1,Ingreso=1,Anterior=0,Edad=41,Actual=12,Cargo=0,Civil=2,Ocupacion=1)
predict(object=dis,newdata=nuevo)

## $class
## [1] 1
## Levels: 0 1
## 
## $posterior
##           0         1
## 1 0.1717489 0.8282511
## 
## $x
##         LD1
## 1 0.9522373

6. Red neuronal.

modelo <- nnet(Cumple~Genero+Ingreso+Anterior+Edad+Actual+Cargo+Civil+Ocupacion,size=4,D,act.fct="tanh",threshold = 0.01,linear.output = T,wts.only=TRUE )

## # weights:  41
## initial  value 200.641054 
## iter  10 value 133.121482
## iter  20 value 104.324678
## iter  30 value 84.159591
## iter  40 value 83.595971
## iter  50 value 83.155942
## iter  60 value 83.139526
## final  value 83.137064 
## converged

modelo

## a 8-4-1 network with 41 weights
## inputs: Genero Ingreso Anterior Edad Actual Cargo Civil Ocupacion 
## output(s): Cumple 
## options were -

zz<-predict(modelo)
xx<-round(predict(modelo))
tabla6<-table(Cumple,xx)
sum(diag(tabla6))/sum(tabla6)

## [1] 0.8315467

plotnet(modelo,wts.only=T)

garson(modelo)

pre6<-predict(modelo)

6.1. Curva ROC.

roc(Cumple,pre6,plot = TRUE, legacy.axes = TRUE,
    percent = TRUE, xlab = "% Falsos positivos",
    ylab = "% verdaderos postivios", col = "red", lwd = 2,
    print.auc = TRUE)

## Setting levels: control = 0, case = 1

## Warning in roc.default(Cumple, pre6, plot = TRUE, legacy.axes = TRUE, percent
## = TRUE, : Deprecated use a matrix as predictor. Unexpected results may be
## produced, please pass a numeric vector.

## Setting direction: controls < cases

## 
## Call:
## roc.default(response = Cumple, predictor = pre6, percent = TRUE,     plot = TRUE, legacy.axes = TRUE, xlab = "% Falsos positivos",     ylab = "% verdaderos postivios", col = "red", lwd = 2, print.auc = TRUE)
## 
## Data: pre6 in 264 controls (Cumple 0) < 389 cases (Cumple 1).
## Area under the curve: 89.58%

6.2. Matriz de confusión.

matriz4<-confusionMatrix(tabla6);matriz3

## Confusion Matrix and Statistics
## 
##      Cumple
## grupo   0   1
##     0 227  86
##     1  37 303
##                                           
##                Accuracy : 0.8116          
##                  95% CI : (0.7795, 0.8409)
##     No Information Rate : 0.5957          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.6203          
##                                           
##  Mcnemar's Test P-Value : 1.505e-05       
##                                           
##             Sensitivity : 0.8598          
##             Specificity : 0.7789          
##          Pos Pred Value : 0.7252          
##          Neg Pred Value : 0.8912          
##              Prevalence : 0.4043          
##          Detection Rate : 0.3476          
##    Detection Prevalence : 0.4793          
##       Balanced Accuracy : 0.8194          
##                                           
##        'Positive' Class : 0               
##

7. Probabilidad de incumplimiento.

arbol<-pre10[,1]
logit<-1-pre3
probit<-1-pre4
discri<-proba[,1]
red<-1-pre6
A<-data.frame(arbol,logit,probit,discri,red)
head(A,50)

##         arbol        logit       probit      discri        red
## 1  0.13214286 3.207148e-02 2.500339e-02 0.171748878 0.04893905
## 2  0.13214286 3.562042e-04 3.203966e-06 0.010786903 0.02795862
## 3  0.13214286 5.489965e-05 1.534556e-08 0.003008673 0.02746111
## 4  0.13214286 6.883278e-04 1.708842e-05 0.018581413 0.02835595
## 5  0.13214286 8.875691e-03 4.102322e-03 0.070790481 0.03334597
## 6  0.13214286 1.247284e-03 7.317820e-05 0.014628182 0.02892409
## 7  0.13214286 9.119398e-03 3.961189e-03 0.049735869 0.03438803
## 8  0.13214286 1.088655e-02 4.626811e-03 0.051967683 0.03698002
## 9  0.13333333 1.187124e-01 1.232424e-01 0.225803977 0.10638419
## 10 0.13333333 8.013737e-02 7.937462e-02 0.195160173 0.07928517
## 11 0.13214286 6.149441e-02 6.031079e-02 0.114354816 0.06549807
## 12 0.13333333 1.103208e-02 5.347374e-03 0.068452709 0.03539036
## 13 0.13333333 2.158241e-02 1.491971e-02 0.098726754 0.04185393
## 14 0.13214286 5.107521e-02 5.149439e-02 0.093634973 0.05569207
## 15 0.13214286 1.061058e-01 1.104577e-01 0.252405934 0.09665834
## 16 0.39393939 1.350527e-01 1.490649e-01 0.433890776 0.10962460
## 17 0.13214286 3.490821e-02 2.831285e-02 0.153254024 0.05310103
## 18 0.13214286 1.985072e-01 2.169304e-01 0.277485542 0.17413744
## 19 0.13214286 8.269849e-02 9.116347e-02 0.164423359 0.07061547
## 20 0.13333333 5.082938e-02 4.705295e-02 0.211858661 0.05815038
## 21 0.13214286 2.169003e-02 1.664272e-02 0.040398223 0.04078348
## 22 0.13214286 3.713543e-02 3.157056e-02 0.095984721 0.05210870
## 23 0.39393939 1.652384e-01 1.828802e-01 0.498286786 0.13125156
## 24 0.13214286 3.490821e-02 2.831285e-02 0.153254024 0.05310103
## 25 0.13214286 1.985072e-01 2.169304e-01 0.277485542 0.17413744
## 26 0.13214286 8.269849e-02 9.116347e-02 0.164423359 0.07061547
## 27 0.13214286 6.937231e-01 6.810956e-01 0.804464199 0.73410452
## 28 0.39393939 6.549787e-01 6.424410e-01 0.837841645 0.68913044
## 29 0.39393939 7.507563e-01 7.418372e-01 0.894710103 0.76800132
## 30 0.13214286 2.203913e-01 2.453003e-01 0.320808442 0.16023702
## 31 0.23076923 3.998673e-01 4.242130e-01 0.443190600 0.32541671
## 32 0.75925926 7.776015e-01 7.669591e-01 0.798545793 0.78941946
## 33 0.84795322 7.513275e-01 7.518615e-01 0.819326797 0.74720975
## 34 0.13214286 5.418731e-01 5.432734e-01 0.684791453 0.55804625
## 35 0.39393939 5.568889e-01 5.552430e-01 0.779515293 0.57178618
## 36 0.39393939 7.037885e-01 6.960281e-01 0.867684733 0.71777710
## 37 0.13214286 1.286347e-01 1.465295e-01 0.199629416 0.09680260
## 38 0.23076923 3.998673e-01 4.242130e-01 0.443190600 0.32541671
## 39 0.75925926 6.461243e-01 6.431114e-01 0.676706510 0.64006028
## 40 0.84795322 7.929748e-01 7.928552e-01 0.854573841 0.79233444
## 41 0.84795322 7.730131e-01 7.692604e-01 0.875958156 0.77982137
## 42 0.09090909 4.969957e-01 5.239813e-01 0.628915609 0.38645809
## 43 0.13333333 3.476447e-02 2.933145e-02 0.107956546 0.04943661
## 44 0.13333333 1.650028e-02 9.963547e-03 0.081046073 0.03915831
## 45 0.13214286 2.354020e-01 2.570459e-01 0.350082793 0.18166520
## 46 0.13214286 9.046551e-02 9.619167e-02 0.164606661 0.08157371
## 47 0.84795322 7.823720e-01 7.669872e-01 0.841168590 0.81995040
## 48 0.39393939 1.333593e-01 1.397279e-01 0.421592334 0.12159819
## 49 0.13214286 2.328623e-02 1.872489e-02 0.137755735 0.04140673
## 50 0.13214286 3.034513e-01 3.278870e-01 0.404819308 0.26133563

|—|O.M.F.

Probabilidad de incumplimiento

Orlando Moscote Flórez

2022-04-01