Los conceptos pueden verse en el libro Introducción al riesgo financiero que puede obtenerse en:
https://1drv.ms/b/s!Aj-hHTVbsx01h4JmNiA9O57JQuANWg?e=l4IFbm
Los modelos de crédito deben permitir estimar, de forma explícita, la probabilidad de incumplimiento, los patrones de migración hacia distintos estados de deterioro y la magnitud de las pérdidas cuando el crédito no se paga.
Así, utilizando las variables de la solicitud del crédito se trata de predecir, para cada perfil o segmento identificado, la probabilidad de que un periodo después, generalmente un año, del otorgamiento del crédito, el cliente alcance la mora definida como incumplimiento.
Los modelos son calculados por tipo de producto dentro del portafolio de crédito de la entidad financiera, de acuerdo con la información histórica disponible y a su calidad.
Los modelos más utilizados para el cálculo de la probabilidad de incumplimiento son:
1.Árboles de decisión.
Modelo logit.
Modelo probit.
Análisis discriminante.
Redes neuronales.
Se utiliza un conjunto de datos provenientes de una entidad cooperativa de crédito que tiene registro de 10 variables para 653 clientes. Los datos se pueden obtener en:
arbol1 <-rpart(Cumple~Genero+Ingreso+Anterior+Edad+Actual+
Cargo+Civil+Ocupacion,data=H)
arbol2<-prune(arbol1,cp=0.00001)
arbol2
## n= 653
##
## node), split, n, loss, yval, (yprob)
## * denotes terminal node
##
## 1) root 653 264 Si (0.40428790 0.59571210)
## 2) Edad< 37.5 373 146 No (0.60857909 0.39142091)
## 4) Genero< 0.5 204 46 No (0.77450980 0.22549020)
## 8) Actual< 6.5 171 26 No (0.84795322 0.15204678) *
## 9) Actual>=6.5 33 13 Si (0.39393939 0.60606061) *
## 5) Genero>=0.5 169 69 Si (0.40828402 0.59171598)
## 10) Actual< 3.5 109 48 No (0.55963303 0.44036697)
## 20) Edad< 30 54 13 No (0.75925926 0.24074074) *
## 21) Edad>=30 55 20 Si (0.36363636 0.63636364)
## 42) Cargo< 2.5 33 15 No (0.54545455 0.45454545)
## 84) Ocupacion>=0.5 20 5 No (0.75000000 0.25000000) *
## 85) Ocupacion< 0.5 13 3 Si (0.23076923 0.76923077) *
## 43) Cargo>=2.5 22 2 Si (0.09090909 0.90909091) *
## 11) Actual>=3.5 60 8 Si (0.13333333 0.86666667) *
## 3) Edad>=37.5 280 37 Si (0.13214286 0.86785714) *
rpart.plot(arbol2)
pre10<-predict(arbol2,type="prob")
pre11<-predict(arbol2,type="class")
pre12<-predict(arbol2,type="matrix")
tabla1<-table(pre11,Cumple)
sum(diag(tabla1))/sum(tabla1)
## [1] 0.8361409
roc(Cumple,pre10[,2],plot = TRUE, legacy.axes = TRUE,
percent = TRUE, xlab = "% Falsos positivos",
ylab = "% verdaderos postivios", col = "red", lwd = 2,
print.auc = TRUE)
## Setting levels: control = No, case = Si
## Setting direction: controls < cases
##
## Call:
## roc.default(response = Cumple, predictor = pre10[, 2], percent = TRUE, plot = TRUE, legacy.axes = TRUE, xlab = "% Falsos positivos", ylab = "% verdaderos postivios", col = "red", lwd = 2, print.auc = TRUE)
##
## Data: pre10[, 2] in 264 controls (Cumple No) < 389 cases (Cumple Si).
## Area under the curve: 84.94%
matriz1<-confusionMatrix(tabla1);matriz1
## Confusion Matrix and Statistics
##
## Cumple
## pre11 No Si
## No 201 44
## Si 63 345
##
## Accuracy : 0.8361
## 95% CI : (0.8055, 0.8637)
## No Information Rate : 0.5957
## P-Value [Acc > NIR] : < 2e-16
##
## Kappa : 0.6558
##
## Mcnemar's Test P-Value : 0.08184
##
## Sensitivity : 0.7614
## Specificity : 0.8869
## Pos Pred Value : 0.8204
## Neg Pred Value : 0.8456
## Prevalence : 0.4043
## Detection Rate : 0.3078
## Detection Prevalence : 0.3752
## Balanced Accuracy : 0.8241
##
## 'Positive' Class : No
##
m1=glm(Cumple~Genero+Ingreso+Anterior+Edad+Actual+Cargo+Civil+Ocupacion,family=binomial(link="logit"),data=D)
summary(m1)
##
## Call:
## glm(formula = Cumple ~ Genero + Ingreso + Anterior + Edad + Actual +
## Cargo + Civil + Ocupacion, family = binomial(link = "logit"),
## data = D)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.7362 -0.6539 0.1052 0.6421 2.5280
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -7.62945 0.74314 -10.266 < 2e-16 ***
## Genero 1.54359 0.22525 6.853 7.24e-12 ***
## Ingreso 0.52030 0.24612 2.114 0.03451 *
## Anterior 0.59623 0.23633 2.523 0.01164 *
## Edad 0.13900 0.01556 8.934 < 2e-16 ***
## Actual 0.28740 0.04069 7.064 1.62e-12 ***
## Cargo 0.18344 0.10330 1.776 0.07575 .
## Civil 0.23725 0.13451 1.764 0.07776 .
## Ocupacion -0.64970 0.22095 -2.940 0.00328 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 881.17 on 652 degrees of freedom
## Residual deviance: 519.26 on 644 degrees of freedom
## AIC: 537.26
##
## Number of Fisher Scoring iterations: 6
pre3<-predict(m1,type="response")
tabla2<-table(true=Cumple,pred=round(fitted(m1)))
sum(diag(tabla2))/sum(tabla2)
## [1] 0.8300153
roc(Cumple,pre3,plot = TRUE, legacy.axes = TRUE,
percent = TRUE, xlab = "% Falsos positivos",
ylab = "% verdaderos postivios", col = "red", lwd = 2,
print.auc = TRUE)
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
##
## Call:
## roc.default(response = Cumple, predictor = pre3, percent = TRUE, plot = TRUE, legacy.axes = TRUE, xlab = "% Falsos positivos", ylab = "% verdaderos postivios", col = "red", lwd = 2, print.auc = TRUE)
##
## Data: pre3 in 264 controls (Cumple 0) < 389 cases (Cumple 1).
## Area under the curve: 89.55%
matriz2<-confusionMatrix(tabla2);matriz2
## Confusion Matrix and Statistics
##
## pred
## true 0 1
## 0 210 54
## 1 57 332
##
## Accuracy : 0.83
## 95% CI : (0.799, 0.858)
## No Information Rate : 0.5911
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.6477
##
## Mcnemar's Test P-Value : 0.8494
##
## Sensitivity : 0.7865
## Specificity : 0.8601
## Pos Pred Value : 0.7955
## Neg Pred Value : 0.8535
## Prevalence : 0.4089
## Detection Rate : 0.3216
## Detection Prevalence : 0.4043
## Balanced Accuracy : 0.8233
##
## 'Positive' Class : 0
##
m2=glm(Cumple~Genero+Ingreso+Anterior+Edad+Actual+Cargo+
Civil+Ocupacion,
family=binomial(link="probit"))
summary(m2)
##
## Call:
## glm(formula = Cumple ~ Genero + Ingreso + Anterior + Edad + Actual +
## Cargo + Civil + Ocupacion, family = binomial(link = "probit"))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.83072 -0.67329 0.05407 0.67486 2.55421
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.371160 0.401669 -10.883 < 2e-16 ***
## Genero 0.881288 0.127801 6.896 5.36e-12 ***
## Ingreso 0.325163 0.141525 2.298 0.02159 *
## Anterior 0.355341 0.135431 2.624 0.00870 **
## Edad 0.078990 0.008437 9.362 < 2e-16 ***
## Actual 0.164675 0.022261 7.398 1.39e-13 ***
## Cargo 0.090673 0.059217 1.531 0.12572
## Civil 0.136009 0.077397 1.757 0.07887 .
## Ocupacion -0.362081 0.127008 -2.851 0.00436 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 881.17 on 652 degrees of freedom
## Residual deviance: 520.03 on 644 degrees of freedom
## AIC: 538.03
##
## Number of Fisher Scoring iterations: 7
pre4<-predict(m2,type="response")
tabla3<-table(true=Cumple,pred=round(fitted(m2)))
sum(diag(tabla3))/sum(tabla3)
## [1] 0.8269525
roc(Cumple,pre4,plot = TRUE, legacy.axes = TRUE,
percent = TRUE, xlab = "% Falsos positivos",
ylab = "% verdaderos postivios", col = "red", lwd = 2,
print.auc = TRUE)
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
##
## Call:
## roc.default(response = Cumple, predictor = pre4, percent = TRUE, plot = TRUE, legacy.axes = TRUE, xlab = "% Falsos positivos", ylab = "% verdaderos postivios", col = "red", lwd = 2, print.auc = TRUE)
##
## Data: pre4 in 264 controls (Cumple 0) < 389 cases (Cumple 1).
## Area under the curve: 89.54%
matriz3<-confusionMatrix(tabla3);matriz3
## Confusion Matrix and Statistics
##
## pred
## true 0 1
## 0 210 54
## 1 59 330
##
## Accuracy : 0.827
## 95% CI : (0.7957, 0.8552)
## No Information Rate : 0.5881
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.6418
##
## Mcnemar's Test P-Value : 0.7067
##
## Sensitivity : 0.7807
## Specificity : 0.8594
## Pos Pred Value : 0.7955
## Neg Pred Value : 0.8483
## Prevalence : 0.4119
## Detection Rate : 0.3216
## Detection Prevalence : 0.4043
## Balanced Accuracy : 0.8200
##
## 'Positive' Class : 0
##
dis<-lda(Cumple~Genero+Ingreso+Anterior+Edad+Actual+Cargo+
Civil+Ocupacion,prior=c(1,1)/2)
dis
## Call:
## lda(Cumple ~ Genero + Ingreso + Anterior + Edad + Actual + Cargo +
## Civil + Ocupacion, prior = c(1, 1)/2)
##
## Prior probabilities of groups:
## 0 1
## 0.5 0.5
##
## Group means:
## Genero Ingreso Anterior Edad Actual Cargo Civil Ocupacion
## 0 0.3068182 0.6666667 0.5909091 30.71970 3.075758 1.390152 1.924242 0.6174242
## 1 0.6760925 0.7429306 0.7300771 41.04884 7.398458 1.429306 2.069409 0.4910026
##
## Coefficients of linear discriminants:
## LD1
## Genero 0.93447129
## Ingreso 0.31925438
## Anterior 0.45131921
## Edad 0.07228589
## Actual 0.09202760
## Cargo 0.09896157
## Civil 0.15684656
## Ocupacion -0.38648743
dis$counts
## 0 1
## 264 389
grupo <- predict(dis,method="plug-in")$class
tabla5<-table(grupo,Cumple)
sum(diag(tabla5))/sum(tabla5)
## [1] 0.8116386
plot(dis)
pre5<-predict(dis)
proba<-pre5$posterior
roc(Cumple,proba[,2],plot = TRUE, legacy.axes = TRUE,
percent = TRUE, xlab = "% Falsos positivos",
ylab = "% verdaderos postivios", col = "red", lwd = 2,
print.auc = TRUE)
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
##
## Call:
## roc.default(response = Cumple, predictor = proba[, 2], percent = TRUE, plot = TRUE, legacy.axes = TRUE, xlab = "% Falsos positivos", ylab = "% verdaderos postivios", col = "red", lwd = 2, print.auc = TRUE)
##
## Data: proba[, 2] in 264 controls (Cumple 0) < 389 cases (Cumple 1).
## Area under the curve: 88.96%
matriz3<-confusionMatrix(tabla5);matriz3
## Confusion Matrix and Statistics
##
## Cumple
## grupo 0 1
## 0 227 86
## 1 37 303
##
## Accuracy : 0.8116
## 95% CI : (0.7795, 0.8409)
## No Information Rate : 0.5957
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.6203
##
## Mcnemar's Test P-Value : 1.505e-05
##
## Sensitivity : 0.8598
## Specificity : 0.7789
## Pos Pred Value : 0.7252
## Neg Pred Value : 0.8912
## Prevalence : 0.4043
## Detection Rate : 0.3476
## Detection Prevalence : 0.4793
## Balanced Accuracy : 0.8194
##
## 'Positive' Class : 0
##
nuevo<-data.frame(Genero=1,Ingreso=1,Anterior=0,Edad=41,Actual=12,Cargo=0,Civil=2,Ocupacion=1)
predict(object=dis,newdata=nuevo)
## $class
## [1] 1
## Levels: 0 1
##
## $posterior
## 0 1
## 1 0.1717489 0.8282511
##
## $x
## LD1
## 1 0.9522373
modelo <- nnet(Cumple~Genero+Ingreso+Anterior+Edad+Actual+Cargo+Civil+Ocupacion,size=4,D,act.fct="tanh",threshold = 0.01,linear.output = T,wts.only=TRUE )
## # weights: 41
## initial value 200.641054
## iter 10 value 133.121482
## iter 20 value 104.324678
## iter 30 value 84.159591
## iter 40 value 83.595971
## iter 50 value 83.155942
## iter 60 value 83.139526
## final value 83.137064
## converged
modelo
## a 8-4-1 network with 41 weights
## inputs: Genero Ingreso Anterior Edad Actual Cargo Civil Ocupacion
## output(s): Cumple
## options were -
zz<-predict(modelo)
xx<-round(predict(modelo))
tabla6<-table(Cumple,xx)
sum(diag(tabla6))/sum(tabla6)
## [1] 0.8315467
plotnet(modelo,wts.only=T)
garson(modelo)
pre6<-predict(modelo)
roc(Cumple,pre6,plot = TRUE, legacy.axes = TRUE,
percent = TRUE, xlab = "% Falsos positivos",
ylab = "% verdaderos postivios", col = "red", lwd = 2,
print.auc = TRUE)
## Setting levels: control = 0, case = 1
## Warning in roc.default(Cumple, pre6, plot = TRUE, legacy.axes = TRUE, percent
## = TRUE, : Deprecated use a matrix as predictor. Unexpected results may be
## produced, please pass a numeric vector.
## Setting direction: controls < cases
##
## Call:
## roc.default(response = Cumple, predictor = pre6, percent = TRUE, plot = TRUE, legacy.axes = TRUE, xlab = "% Falsos positivos", ylab = "% verdaderos postivios", col = "red", lwd = 2, print.auc = TRUE)
##
## Data: pre6 in 264 controls (Cumple 0) < 389 cases (Cumple 1).
## Area under the curve: 89.58%
matriz4<-confusionMatrix(tabla6);matriz3
## Confusion Matrix and Statistics
##
## Cumple
## grupo 0 1
## 0 227 86
## 1 37 303
##
## Accuracy : 0.8116
## 95% CI : (0.7795, 0.8409)
## No Information Rate : 0.5957
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.6203
##
## Mcnemar's Test P-Value : 1.505e-05
##
## Sensitivity : 0.8598
## Specificity : 0.7789
## Pos Pred Value : 0.7252
## Neg Pred Value : 0.8912
## Prevalence : 0.4043
## Detection Rate : 0.3476
## Detection Prevalence : 0.4793
## Balanced Accuracy : 0.8194
##
## 'Positive' Class : 0
##
arbol<-pre10[,1]
logit<-1-pre3
probit<-1-pre4
discri<-proba[,1]
red<-1-pre6
A<-data.frame(arbol,logit,probit,discri,red)
head(A,50)
## arbol logit probit discri red
## 1 0.13214286 3.207148e-02 2.500339e-02 0.171748878 0.04893905
## 2 0.13214286 3.562042e-04 3.203966e-06 0.010786903 0.02795862
## 3 0.13214286 5.489965e-05 1.534556e-08 0.003008673 0.02746111
## 4 0.13214286 6.883278e-04 1.708842e-05 0.018581413 0.02835595
## 5 0.13214286 8.875691e-03 4.102322e-03 0.070790481 0.03334597
## 6 0.13214286 1.247284e-03 7.317820e-05 0.014628182 0.02892409
## 7 0.13214286 9.119398e-03 3.961189e-03 0.049735869 0.03438803
## 8 0.13214286 1.088655e-02 4.626811e-03 0.051967683 0.03698002
## 9 0.13333333 1.187124e-01 1.232424e-01 0.225803977 0.10638419
## 10 0.13333333 8.013737e-02 7.937462e-02 0.195160173 0.07928517
## 11 0.13214286 6.149441e-02 6.031079e-02 0.114354816 0.06549807
## 12 0.13333333 1.103208e-02 5.347374e-03 0.068452709 0.03539036
## 13 0.13333333 2.158241e-02 1.491971e-02 0.098726754 0.04185393
## 14 0.13214286 5.107521e-02 5.149439e-02 0.093634973 0.05569207
## 15 0.13214286 1.061058e-01 1.104577e-01 0.252405934 0.09665834
## 16 0.39393939 1.350527e-01 1.490649e-01 0.433890776 0.10962460
## 17 0.13214286 3.490821e-02 2.831285e-02 0.153254024 0.05310103
## 18 0.13214286 1.985072e-01 2.169304e-01 0.277485542 0.17413744
## 19 0.13214286 8.269849e-02 9.116347e-02 0.164423359 0.07061547
## 20 0.13333333 5.082938e-02 4.705295e-02 0.211858661 0.05815038
## 21 0.13214286 2.169003e-02 1.664272e-02 0.040398223 0.04078348
## 22 0.13214286 3.713543e-02 3.157056e-02 0.095984721 0.05210870
## 23 0.39393939 1.652384e-01 1.828802e-01 0.498286786 0.13125156
## 24 0.13214286 3.490821e-02 2.831285e-02 0.153254024 0.05310103
## 25 0.13214286 1.985072e-01 2.169304e-01 0.277485542 0.17413744
## 26 0.13214286 8.269849e-02 9.116347e-02 0.164423359 0.07061547
## 27 0.13214286 6.937231e-01 6.810956e-01 0.804464199 0.73410452
## 28 0.39393939 6.549787e-01 6.424410e-01 0.837841645 0.68913044
## 29 0.39393939 7.507563e-01 7.418372e-01 0.894710103 0.76800132
## 30 0.13214286 2.203913e-01 2.453003e-01 0.320808442 0.16023702
## 31 0.23076923 3.998673e-01 4.242130e-01 0.443190600 0.32541671
## 32 0.75925926 7.776015e-01 7.669591e-01 0.798545793 0.78941946
## 33 0.84795322 7.513275e-01 7.518615e-01 0.819326797 0.74720975
## 34 0.13214286 5.418731e-01 5.432734e-01 0.684791453 0.55804625
## 35 0.39393939 5.568889e-01 5.552430e-01 0.779515293 0.57178618
## 36 0.39393939 7.037885e-01 6.960281e-01 0.867684733 0.71777710
## 37 0.13214286 1.286347e-01 1.465295e-01 0.199629416 0.09680260
## 38 0.23076923 3.998673e-01 4.242130e-01 0.443190600 0.32541671
## 39 0.75925926 6.461243e-01 6.431114e-01 0.676706510 0.64006028
## 40 0.84795322 7.929748e-01 7.928552e-01 0.854573841 0.79233444
## 41 0.84795322 7.730131e-01 7.692604e-01 0.875958156 0.77982137
## 42 0.09090909 4.969957e-01 5.239813e-01 0.628915609 0.38645809
## 43 0.13333333 3.476447e-02 2.933145e-02 0.107956546 0.04943661
## 44 0.13333333 1.650028e-02 9.963547e-03 0.081046073 0.03915831
## 45 0.13214286 2.354020e-01 2.570459e-01 0.350082793 0.18166520
## 46 0.13214286 9.046551e-02 9.619167e-02 0.164606661 0.08157371
## 47 0.84795322 7.823720e-01 7.669872e-01 0.841168590 0.81995040
## 48 0.39393939 1.333593e-01 1.397279e-01 0.421592334 0.12159819
## 49 0.13214286 2.328623e-02 1.872489e-02 0.137755735 0.04140673
## 50 0.13214286 3.034513e-01 3.278870e-01 0.404819308 0.26133563
|—|O.M.F.