We use the bank credit approval dataset at https://archive.ics.uci.edu/ml/datasets/Credit+Approval . The data can also be loaded from the content folder as crx.data. The dataset description is at https://archive.ics.uci.edu/ml/machine-learning-databases/credit-screening/crx.names and states the following:

  1. Title: Credit Approval

  2. Sources: 
      (confidential)
      Submitted by quinlan@cs.su.oz.au
  
  3.  Past Usage:
  
      See Quinlan,
      * "Simplifying decision trees", Int J Man-Machine Studies 27,
        Dec 1987, pp. 221-234.
      * "C4.5: Programs for Machine Learning", Morgan Kaufmann, Oct 1992
    
  4.  Relevant Information:
  
      This file concerns credit card applications.  All attribute names
      and values have been changed to meaningless symbols to protect
      confidentiality of the data.
    
      This dataset is interesting because there is a good mix of
      attributes -- continuous, nominal with small numbers of
      values, and nominal with larger numbers of values.  There
      are also a few missing values.
    
  5.  Number of Instances: 690
  
  6.  Number of Attributes: 15 + class attribute
  
  7.  Attribute Information:
  
      A1:   b, a.
      A2:   continuous.
      A3:   continuous.
      A4:   u, y, l, t.
      A5:   g, p, gg.
      A6:   c, d, cc, i, j, k, m, r, q, w, x, e, aa, ff.
      A7:   v, h, bb, j, n, z, dd, ff, o.
      A8:   continuous.
      A9:   t, f.
      A10:  t, f.
      A11:  continuous.
      A12:  t, f.
      A13:  g, p, s.
      A14:  continuous.
      A15:  continuous.
      A16: +,-         (class attribute)
  
  8.  Missing Attribute Values:
      37 cases (5%) have one or more missing values.  The missing
      values from particular attributes are:
  
      A1:  12
      A2:  12
      A4:   6
      A5:   6
      A6:   9
      A7:   9
      A14: 13
  
  9.  Class Distribution
    
      +: 307 (44.5%)
      -: 383 (55.5%)

Activities

  1. Load the data. Visually inspect, variable by variable, how credit approval is distributed as a function of each attribute. Note any relevant observations. Which variables separate the data best?

  2. Prepare the dataset appropriately and impute the missing values using the missForest library.

  3. Split the dataset, taking the first 590 instances as train and the last 100 as test.

  4. Train a logistic regression model with Ridge and Lasso regularization on train, selecting the one with the best AUC. Report the metrics on test.

  5. Report the log odds of the predictor variables with respect to the target variable.

  6. If we gain €100 for each true positive and lose €20 for each false positive, what monetary value will the model generate, given the confusion matrix of the model with the highest AUC (using the test metrics)?
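
The monetary valuation in activity 6 reduces to simple arithmetic on two cells of the confusion matrix. A minimal sketch, assuming made-up counts (the real values come from the test-set confusion matrix of the selected model); `model_value` is a hypothetical helper, not part of any package:

```r
# Hypothetical helper: value generated by a classifier, assuming
# +100 per true positive and -20 per false positive (activity 6).
model_value <- function(tp, fp, gain_tp = 100, loss_fp = 20) {
  tp * gain_tp - fp * loss_fp
}

# Made-up counts for illustration only; not results from this analysis.
example_value <- model_value(tp = 40, fp = 5)
print(example_value)
```

True negatives and false negatives do not enter the calculation because the statement assigns them no gain or loss.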

Packages used

library(dplyr)

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union
library(tidyverse)
Registered S3 methods overwritten by 'dbplyr':
  method         from
  print.tbl_lazy     
  print.tbl_sql      
── Attaching packages ────────────────────────────────────────────────── tidyverse 1.3.1 ──
✓ ggplot2 3.3.5     ✓ purrr   0.3.4
✓ tibble  3.1.6     ✓ stringr 1.4.0
✓ tidyr   1.2.0     ✓ forcats 0.5.1
✓ readr   2.1.1     
── Conflicts ───────────────────────────────────────────────────── tidyverse_conflicts() ──
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()
library(ggplot2)
library(fastDummies)
library(missForest)
Loading required package: randomForest
randomForest 4.6-14
Type rfNews() to see new features/changes/bug fixes.

Attaching package: ‘randomForest’

The following object is masked from ‘package:ggplot2’:

    margin

The following object is masked from ‘package:dplyr’:

    combine

Loading required package: foreach

Attaching package: ‘foreach’

The following objects are masked from ‘package:purrr’:

    accumulate, when

Loading required package: itertools
Loading required package: iterators
library(corrplot)
corrplot 0.92 loaded
library(glmnet)
Loading required package: Matrix

Attaching package: ‘Matrix’

The following objects are masked from ‘package:tidyr’:

    expand, pack, unpack

Loaded glmnet 4.1-3
library(caret)
Loading required package: lattice
Registered S3 method overwritten by 'data.table':
  method           from
  print.data.table     

Attaching package: ‘caret’

The following object is masked from ‘package:purrr’:

    lift
library(lattice)
library(e1071)
library(MASS) 

Attaching package: ‘MASS’

The following object is masked from ‘package:dplyr’:

    select
library(PerformanceAnalytics)
Loading required package: xts
Loading required package: zoo

Attaching package: ‘zoo’

The following objects are masked from ‘package:base’:

    as.Date, as.Date.numeric


Attaching package: ‘xts’

The following objects are masked from ‘package:dplyr’:

    first, last


Attaching package: ‘PerformanceAnalytics’

The following objects are masked from ‘package:e1071’:

    kurtosis, skewness

The following object is masked from ‘package:graphics’:

    legend

1. Load the data. Visually inspect, variable by variable, how credit approval is distributed as a function of each attribute. Note any relevant observations. Which variables separate the data best?

Loading the data file

url <- 'https://archive.ics.uci.edu/ml/machine-learning-databases/credit-screening/crx.data'
data <- read.csv(url, sep = ",", header = FALSE)

We inspect the available variables and name each one following the literature (Khaneja, Deepesh. (2017). Credit Approval Analysis using R.). We also recode the target variable Approved as binary.

colnames(data) = c("Male", "Age", "Debt", "Married", "BankCustomer", "EducationLevel", "Ethnicity", "YearsEmployed", "PriorDefault", "Employed", "CreditScore", "DriversLicense", "Citizen", "ZipCode", "Income", "Approved")

head(data)
str(data)
'data.frame':   690 obs. of  16 variables:
 $ Male          : chr  "b" "a" "a" "b" ...
 $ Age           : chr  "30.83" "58.67" "24.50" "27.83" ...
 $ Debt          : num  0 4.46 0.5 1.54 5.62 ...
 $ Married       : chr  "u" "u" "u" "u" ...
 $ BankCustomer  : chr  "g" "g" "g" "g" ...
 $ EducationLevel: chr  "w" "q" "q" "w" ...
 $ Ethnicity     : chr  "v" "h" "h" "v" ...
 $ YearsEmployed : num  1.25 3.04 1.5 3.75 1.71 ...
 $ PriorDefault  : chr  "t" "t" "t" "t" ...
 $ Employed      : chr  "t" "t" "f" "t" ...
 $ CreditScore   : int  1 6 0 5 0 0 0 0 0 0 ...
 $ DriversLicense: chr  "f" "f" "f" "t" ...
 $ Citizen       : chr  "g" "g" "g" "g" ...
 $ ZipCode       : chr  "00202" "00043" "00280" "00100" ...
 $ Income        : int  0 560 824 3 0 0 31285 1349 314 1442 ...
 $ Approved      : chr  "+" "+" "+" "+" ...
summary(data)
     Male               Age                 Debt          Married         
 Length:690         Length:690         Min.   : 0.000   Length:690        
 Class :character   Class :character   1st Qu.: 1.000   Class :character  
 Mode  :character   Mode  :character   Median : 2.750   Mode  :character  
                                       Mean   : 4.759                     
                                       3rd Qu.: 7.207                     
                                       Max.   :28.000                     
 BankCustomer       EducationLevel      Ethnicity         YearsEmployed   
 Length:690         Length:690         Length:690         Min.   : 0.000  
 Class :character   Class :character   Class :character   1st Qu.: 0.165  
 Mode  :character   Mode  :character   Mode  :character   Median : 1.000  
                                                          Mean   : 2.223  
                                                          3rd Qu.: 2.625  
                                                          Max.   :28.500  
 PriorDefault         Employed          CreditScore   DriversLicense    
 Length:690         Length:690         Min.   : 0.0   Length:690        
 Class :character   Class :character   1st Qu.: 0.0   Class :character  
 Mode  :character   Mode  :character   Median : 0.0   Mode  :character  
                                       Mean   : 2.4                     
                                       3rd Qu.: 3.0                     
                                       Max.   :67.0                     
   Citizen            ZipCode              Income           Approved        
 Length:690         Length:690         Min.   :     0.0   Length:690        
 Class :character   Class :character   1st Qu.:     0.0   Class :character  
 Mode  :character   Mode  :character   Median :     5.0   Mode  :character  
                                       Mean   :  1017.4                     
                                       3rd Qu.:   395.5                     
                                       Max.   :100000.0                     
data <- data %>% 
    mutate(Approved = recode(Approved, 
                      "+" = "1", 
                      "-" = "0")) 

Next we visually inspect each variable against the credit approval variable (“Approved”).

# Plot every predictor against the target: stacked bars for categorical
# columns (factor or character), box plots for numeric ones.
explain.target <- function(dataframe.object, target.feature) {
  target <- as.factor(target.feature)

  for (columna in names(dataframe.object)) {
    if (columna == "Approved") next  # skip the target itself

    values <- dataframe.object[[columna]]

    if (is.factor(values) || is.character(values)) {
      p <- ggplot(dataframe.object) +
        geom_bar(aes(values, fill = target)) +
        ylab("Frequency")
    } else {
      p <- ggplot(dataframe.object) +
        geom_boxplot(aes(values, fill = target)) +
        coord_flip()
    }

    p <- p +
      labs(title = paste(columna, "- Approved")) +
      xlab(columna) +
      scale_fill_discrete(name = "Credit Approved", breaks = c("0", "1"),
                          labels = c("NO", "YES"))

    # Inside a loop, ggplot objects must be printed explicitly
    print(p)
  }
}

explain.target(dataframe.object = data, target.feature = data$Approved)

The observations fall into two groups:

  • Continuous variables: they are distributed similarly in all cases; we will revisit this later, however, because for CreditScore the outliers hide any differences.

  • Discrete variables: there are missing values (“?”) that need to be handled. The variables “Married”, “BankCustomer” and “Citizen” have values for which credit is always granted, so they are good at separating the data. For PriorDefault, the value “t” has more credits granted than denied, while the value “f” shows the opposite.

2. Prepare the dataset appropriately and impute the missing values using the missForest library

Some variables, such as Male, Married, BankCustomer, EducationLevel and Ethnicity, contain values marked as “?”. These values are converted to missing values (NA) across the whole dataset.

data[data == "?"] <- NA
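
An alternative, sketched here on a tiny inline sample rather than the real file, is to declare “?” as the missing-value marker at read time through the `na.strings` argument of `read.csv`. The three rows below are made up for illustration:

```r
# Toy CSV text (made-up rows) where "?" marks a missing value.
toy_text <- "b,30.83,u\n?,58.67,u\na,?,y"

# na.strings tells read.csv to turn "?" into NA while parsing,
# so numeric columns are not read as character.
toy <- read.csv(text = toy_text, header = FALSE, na.strings = "?")

str(toy)
n_missing <- sum(is.na(toy))
print(n_missing)
```

With this approach a column such as Age would be parsed as numeric from the start instead of as character.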

Now we prepare the dataset and impute values using the missForest library.

sapply(data, function(x) sum(is.na(x))); sum(sapply(data, function(x) sum(is.na(x))))
          Male            Age           Debt        Married   BankCustomer EducationLevel 
            12             12              0              6              6              9 
     Ethnicity  YearsEmployed   PriorDefault       Employed    CreditScore DriversLicense 
             9              0              0              0              0              0 
       Citizen        ZipCode         Income       Approved 
             0             13              0              0 
[1] 67

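The chr variables are converted to factors so that missForest can be applied (type.convert also turns numeric-looking character columns such as Age and ZipCode into numbers).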

data <- type.convert(data, as.is=FALSE)

data.i <- missForest(as.data.frame(data))
  missForest iteration 1 in progress...done!
  missForest iteration 2 in progress...done!
  missForest iteration 3 in progress...done!
data <- data.i$ximp

We check that the missing values are gone.

sapply(data, function(x) sum(is.na(x)))
          Male            Age           Debt        Married   BankCustomer EducationLevel 
             0              0              0              0              0              0 
     Ethnicity  YearsEmployed   PriorDefault       Employed    CreditScore DriversLicense 
             0              0              0              0              0              0 
       Citizen        ZipCode         Income       Approved 
             0              0              0              0 
summary(data)
 Male         Age             Debt        Married BankCustomer EducationLevel
 a:213   Min.   :13.75   Min.   : 0.000   l:  2   g :525       c      :137   
 b:477   1st Qu.:22.67   1st Qu.: 1.000   u:525   gg:  2       q      : 78   
         Median :28.58   Median : 2.750   y:163   p :163       w      : 64   
         Mean   :31.59   Mean   : 4.759                        i      : 63   
         3rd Qu.:38.23   3rd Qu.: 7.207                        aa     : 54   
         Max.   :80.25   Max.   :28.000                        ff     : 54   
                                                               (Other):240   
   Ethnicity   YearsEmployed    PriorDefault Employed  CreditScore   DriversLicense
 v      :400   Min.   : 0.000   f:329        f:395    Min.   : 0.0   f:374         
 h      :138   1st Qu.: 0.165   t:361        t:295    1st Qu.: 0.0   t:316         
 bb     : 63   Median : 1.000                         Median : 0.0                 
 ff     : 58   Mean   : 2.223                         Mean   : 2.4                 
 j      :  9   3rd Qu.: 2.625                         3rd Qu.: 3.0                 
 z      :  9   Max.   :28.500                         Max.   :67.0                 
 (Other): 13                                                                       
 Citizen    ZipCode           Income            Approved     
 g:625   Min.   :   0.0   Min.   :     0.0   Min.   :0.0000  
 p:  8   1st Qu.:  80.0   1st Qu.:     0.0   1st Qu.:0.0000  
 s: 57   Median : 160.0   Median :     5.0   Median :0.0000  
         Mean   : 183.7   Mean   :  1017.4   Mean   :0.4449  
         3rd Qu.: 272.0   3rd Qu.:   395.5   3rd Qu.:1.0000  
         Max.   :2000.0   Max.   :100000.0   Max.   :1.0000  
                                                             

Exploratory Data Analysis

ZipCode has 183 distinct values, which are categorical rather than numeric, so we drop this variable before continuing with the analysis.

unique(data$ZipCode)
  [1]  202.00000   43.00000  280.00000  100.00000  120.00000  360.00000  164.00000
  [8]   80.00000  180.00000   52.00000  128.00000  260.00000    0.00000  320.00000
 [15]  396.00000   96.00000  200.00000  300.00000  145.00000  500.00000  168.00000
 [22]  434.00000  583.00000   30.00000  240.00000   70.00000  455.00000  311.00000
 [29]  216.00000  491.00000  400.00000  239.00000  160.00000  711.00000  250.00000
 [36]  520.00000  515.00000  420.00000  266.95667  980.00000  443.00000  140.00000
 [43]   94.00000  368.00000  288.00000  928.00000  188.00000  112.00000  171.00000
 [50]  268.00000  167.00000   75.00000  152.00000  176.00000  329.00000  212.00000
 [57]  410.00000  274.00000  375.00000  408.00000  350.00000  204.00000   40.00000
 [64]  181.00000  399.00000  440.00000   93.00000   60.00000  395.00000  393.00000
 [71]   21.00000   29.00000  102.00000  431.00000  370.00000   24.00000   20.00000
 [78]  129.00000  510.00000  195.00000  144.00000  380.00000  149.66111   49.00000
 [85]   50.00000  109.27971  381.00000  150.00000  117.00000   56.00000  211.00000
 [92]  230.00000  156.00000   22.00000  228.00000  519.00000  253.00000  487.00000
 [99]  220.00000   91.02667   88.00000   73.00000  121.00000  470.00000  136.00000
[106]  132.00000  292.00000  154.00000  272.00000  216.17571  340.00000   92.32067
[113]  108.00000  720.00000  450.00000  232.00000  170.00000 1160.00000  411.00000
[120]  189.66657  460.00000  348.00000  480.00000  640.00000  372.00000  276.00000
[127]  221.00000  352.00000  141.00000  178.00000  600.00000  550.00000  207.39714
[134] 2000.00000  225.00000  210.00000  110.00000  356.00000   45.00000   62.00000
[141]   92.00000  174.00000   17.00000   86.00000   82.99895  454.00000  201.13571
[148]  254.00000   28.00000  263.00000  333.00000  312.00000  290.00000  371.00000
[155]   99.00000  252.00000  760.00000  560.00000  130.00000  523.00000  680.00000
[162]  163.00000  208.00000  383.00000  330.00000  422.00000  840.00000  432.00000
[169]   32.00000  186.00000  303.00000  184.46190  349.00000  224.00000  369.00000
[176]  140.25905  231.42357   76.00000  231.00000  309.00000  416.00000  465.00000
[183]  256.00000
data = subset(data, select = -ZipCode)

The variables Male, PriorDefault, Employed and DriversLicense are converted to binary factors.

data <- data %>% 
  mutate(Male = recode(Male,
         "a"="1",
         "b"="0"))

data$PriorDefault <- as.factor(data$PriorDefault)
data <- data %>% 
  mutate(PriorDefault = recode(PriorDefault,
         "t"="No",
         "f"="Yes"))

data$Employed <- as.factor(data$Employed)
data <- data %>% 
  mutate(Employed = recode(Employed,
         "t"="Employed",
         "f"="Unemployed"))

data$DriversLicense <- as.factor(data$DriversLicense)
data <- data %>% 
  mutate(DriversLicense = recode(DriversLicense,
         "t"="1",
         "f"="0"))
data$Approved <- as.character(data$Approved)
str(data)
'data.frame':   690 obs. of  15 variables:
 $ Male          : Factor w/ 2 levels "1","0": 2 1 1 2 2 2 2 1 2 2 ...
 $ Age           : num  30.8 58.7 24.5 27.8 20.2 ...
 $ Debt          : num  0 4.46 0.5 1.54 5.62 ...
 $ Married       : Factor w/ 3 levels "l","u","y": 2 2 2 2 2 2 2 2 3 3 ...
 $ BankCustomer  : Factor w/ 3 levels "g","gg","p": 1 1 1 1 1 1 1 1 3 3 ...
 $ EducationLevel: Factor w/ 14 levels "aa","c","cc",..: 13 11 11 13 13 10 12 3 9 13 ...
 $ Ethnicity     : Factor w/ 9 levels "bb","dd","ff",..: 8 4 4 8 8 8 4 8 4 8 ...
 $ YearsEmployed : num  1.25 3.04 1.5 3.75 1.71 ...
 $ PriorDefault  : Factor w/ 2 levels "Yes","No": 2 2 2 2 2 2 2 2 2 2 ...
 $ Employed      : Factor w/ 2 levels "Unemployed","Employed": 2 2 1 2 1 1 1 1 1 1 ...
 $ CreditScore   : int  1 6 0 5 0 0 0 0 0 0 ...
 $ DriversLicense: Factor w/ 2 levels "0","1": 1 1 1 2 1 2 2 1 1 2 ...
 $ Citizen       : Factor w/ 3 levels "g","p","s": 1 1 1 1 3 1 1 1 1 1 ...
 $ Income        : int  0 560 824 3 0 0 31285 1349 314 1442 ...
 $ Approved      : chr  "1" "1" "1" "1" ...
summary(data)
 Male         Age             Debt        Married BankCustomer EducationLevel
 1:213   Min.   :13.75   Min.   : 0.000   l:  2   g :525       c      :137   
 0:477   1st Qu.:22.67   1st Qu.: 1.000   u:525   gg:  2       q      : 78   
         Median :28.58   Median : 2.750   y:163   p :163       w      : 64   
         Mean   :31.59   Mean   : 4.759                        i      : 63   
         3rd Qu.:38.23   3rd Qu.: 7.207                        aa     : 54   
         Max.   :80.25   Max.   :28.000                        ff     : 54   
                                                               (Other):240   
   Ethnicity   YearsEmployed    PriorDefault       Employed    CreditScore  
 v      :400   Min.   : 0.000   Yes:329      Unemployed:395   Min.   : 0.0  
 h      :138   1st Qu.: 0.165   No :361      Employed  :295   1st Qu.: 0.0  
 bb     : 63   Median : 1.000                                 Median : 0.0  
 ff     : 58   Mean   : 2.223                                 Mean   : 2.4  
 j      :  9   3rd Qu.: 2.625                                 3rd Qu.: 3.0  
 z      :  9   Max.   :28.500                                 Max.   :67.0  
 (Other): 13                                                                
 DriversLicense Citizen     Income           Approved        
 0:374          g:625   Min.   :     0.0   Length:690        
 1:316          p:  8   1st Qu.:     0.0   Class :character  
                s: 57   Median :     5.0   Mode  :character  
                        Mean   :  1017.4                     
                        3rd Qu.:   395.5                     
                        Max.   :100000.0                     
                                                             

We take a fresh look at the data after these changes.

Categorical Variables vs Target Variable (Approved)

Male vs Approved
ggplot(data = data, aes(x = Male, fill = Approved)) +
  geom_bar(position = "fill") +
  labs(y = "Rate", x = 'Male') + ggtitle('Male vs Approved')

Males appear to have a higher proportion of approvals than females, but the difference between the two rates does not look large; we examine later whether it affects obtaining credit.

Married vs Approved

ggplot(data = data, aes(x = Married, fill = Approved)) +
  geom_bar(position = "fill") +
  labs(y = "Rate", x = 'Married') + ggtitle('Married vs Approved')

Here there is a clear relationship between a person's marital status and the chance of obtaining bank credit. Notably, for marital status ‘l’ every application is approved; this may be because the sample is very small and everyone with that status happened to get the loan. We check as follows:

data %>% 
  group_by(Married) %>% 
  count()

Only two people are classified as ‘l’ in the Married variable, which explains the anomalous 100% approval rate in this case.
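
The count alone does not show how many applicants in each category were approved. A cross-tabulation makes that explicit; a minimal sketch on made-up data (the `toy` data frame is hypothetical and only mimics the column names):

```r
# Made-up data mimicking the Married and Approved columns.
toy <- data.frame(
  Married  = c("l", "l", "u", "u", "u", "y", "y"),
  Approved = c("1", "1", "1", "0", "1", "0", "0")
)

# Counts per category and outcome.
counts <- table(toy$Married, toy$Approved)
print(counts)

# Row-wise proportions: share of each outcome within a category.
rates <- prop.table(counts, margin = 1)
print(round(rates, 2))
```

Reading counts and rates side by side shows at a glance when a 100% rate rests on only a couple of observations.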

Bank Customer vs Approved
ggplot(data = data, aes(x = BankCustomer, fill = Approved)) +
  geom_bar(position = "fill") +
  labs(y = "Rate", x = 'Bank Customer') + ggtitle('Bank Customer vs Approved')

Here we see an association between bank customer status and the credit approval rate. Again, category ‘gg’ shows a 100% approval rate, so we examine the sample size:

data %>% 
  group_by(BankCustomer) %>% 
  count()

Again there are only two people in this category, both of whom obtained the loan, which explains the 100% approval rate.

Education level vs Approved
ggplot(data = data, aes(x = EducationLevel, fill = Approved)) +
  geom_bar(position = "fill") +
  labs(y = "Rate", x = 'Education Level') + ggtitle('Education Level vs Approved')

Education level also affects the target variable: levels “x” and “cc” have a higher approval rate than levels “ff” and “d”.

Ethnicity vs Approved
ggplot(data = data, aes(x = Ethnicity, fill = Approved)) +
  geom_bar(position = "fill") +
  labs(y = "Rate", x = 'Ethnicity') + ggtitle('Ethnicity vs Approved')

A person's ethnicity apparently affects the probability of obtaining a loan: individuals labeled “ff” are less likely to obtain one than those labeled “z”.

Prior Default vs Approved
ggplot(data = data, aes(x = PriorDefault, fill = Approved)) +
  geom_bar(position = "fill") +
  labs(y = "Rate", x = 'Prior Default') + ggtitle('Prior Default vs Approved')

Clients who have previously defaulted on their payments clearly have very little chance of obtaining new credit.

Employed vs Approved
ggplot(data = data, aes(x = Employed, fill = Approved)) +
  geom_bar(position = "fill") +
  labs(y = "Rate", x = 'Employed') + ggtitle('Employed vs Approved')

As one would expect, employed people are more likely to obtain a loan.

DriversLicense vs Approved
ggplot(data = data, aes(x = DriversLicense, fill = Approved)) +
  geom_bar(position = "fill") +
  labs(y = "Rate", x = 'Drivers License') + ggtitle('Drivers License vs Approved')

Here there does not appear to be a relationship between the two variables.

Citizen vs Approved
ggplot(data = data, aes(x = Citizen, fill = Approved)) +
  geom_bar(position = "fill") +
  labs(y = "Rate", x = 'Citizenship') + ggtitle('Citizenship vs Approved')

There appears to be some relationship between these two variables.

Independence test of the categorical variables against the target variable

To check whether the categorical variables are independent of the target variable, we run a chi-squared test at a 5% significance level (95% confidence). The following call prints each variable's name and the resulting p-value.

categoricVars <- data %>% dplyr::select(Male, Married, BankCustomer, EducationLevel,
                                       Ethnicity, PriorDefault, Employed, DriversLicense,
                                       Citizen) 

sapply(categoricVars, 
       function(x) round(chisq.test(table(x, data$Approved))$p.value,2))
Warning in chisq.test(table(x, data$Approved)) :
  Chi-squared approximation may be incorrect
Warning in chisq.test(table(x, data$Approved)) :
  Chi-squared approximation may be incorrect
Warning in chisq.test(table(x, data$Approved)) :
  Chi-squared approximation may be incorrect
Warning in chisq.test(table(x, data$Approved)) :
  Chi-squared approximation may be incorrect
Warning in chisq.test(table(x, data$Approved)) :
  Chi-squared approximation may be incorrect
          Male        Married   BankCustomer EducationLevel      Ethnicity   PriorDefault 
          0.54           0.00           0.00           0.00           0.00           0.00 
      Employed DriversLicense        Citizen 
          0.00           0.45           0.01 

For Married, BankCustomer, EducationLevel, Ethnicity, PriorDefault, Employed and Citizen the p-values are below 0.05, so we reject independence from the target variable. For Male (p = 0.54) and DriversLicense (p = 0.45) we cannot reject independence. We therefore drop these two variables from our model.
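
The warnings in the output above arise because some cells have small expected counts, which makes the chi-squared approximation unreliable. One option is a Monte Carlo p-value through the `simulate.p.value` argument of `chisq.test`. A sketch on a made-up table (the counts below are not taken from the dataset):

```r
set.seed(123)  # the simulated p-value is random; fix the seed

# Made-up sparse contingency table: one category has only 2 observations.
toy_table <- matrix(c(0, 2,
                      30, 45,
                      40, 20),
                    ncol = 2, byrow = TRUE,
                    dimnames = list(Married = c("l", "u", "y"),
                                    Approved = c("0", "1")))

# Monte Carlo p-value (B replicates) instead of the asymptotic one.
sim_test <- chisq.test(toy_table, simulate.p.value = TRUE, B = 2000)
print(sim_test$p.value)
```

For small tables, `fisher.test` is another commonly used alternative.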

Numeric Variables vs Target Variable

Age vs Approved
data$Approved <- as.factor(data$Approved)
cdplot(data$Approved ~ data$Age, main = "Age vs Approved", 
       xlab = "Age", ylab = "Conditional Density" ) 

The plot shows that older applicants (around 60) are more likely to have credit approved, although past roughly 75 the probability appears to drop sharply. For more detail we draw a box plot:

ggplot(data, aes(x= Approved, y= Age, fill= Approved)) +
geom_boxplot() +
labs(y = "Age", x = 'Approved') + ggtitle('Age vs Approved') +
scale_fill_brewer(palette = "Set2")

As in the previous plot, there appears to be some association between age and the approval rate: older applicants may find it easier to obtain credit.

Debt vs Approved
cdplot(data$Approved ~ data$Debt, main = "Debt vs Approved", 
       xlab = "Debt", ylab = "Conditional Density" ) 

The plot suggests that the more debt an applicant has, the more likely credit is to be approved, although the probability appears to dip around 26 on the Debt axis before rising again.

ggplot(data, aes(x= Approved, y= Debt, fill= Approved)) +
geom_boxplot() +
labs(y = "Debt", x = 'Approved') + 
  ggtitle('Debt vs Approved') +
scale_fill_brewer(palette = "Set2")

The box plot appears to show the same pattern.

Years Employed vs Approved
ggplot(data, aes(x= Approved, y= YearsEmployed, fill= Approved)) +
geom_boxplot() +
labs(y = "Years Employed", x = 'Approved') + 
  ggtitle('Years Employed vs Approved') +
scale_fill_brewer(palette = "Set2")

There appears to be a positive association between years employed and credit approval.

Credit Score vs Approved
ggplot(data, aes(x= Approved, y= CreditScore, fill= Approved)) +
geom_boxplot() +
labs(y = "Credit Score", x = 'Approved') + 
  ggtitle('Credit Score vs Approved') +
scale_fill_brewer(palette = "Set2")

Again there is a positive association between the two variables.

Income vs Approved

ggplot(data, aes(x= Approved, y= Income, fill= Approved)) +
geom_boxplot() +
labs(y = "Income", x = 'Approved') + 
  ggtitle('Income vs Approved') +
scale_fill_brewer(palette = "Set2") 

This plot contains many extreme outliers, so we zoom in to make it readable:

ggplot(data, aes(x= Approved, y= Income, fill= Approved)) +
geom_boxplot() +
labs(y = "Income", x = 'Approved') + 
  ggtitle('Income vs Approved') +
scale_fill_brewer(palette = "Set2") +
  coord_cartesian(ylim=c(0, 1500)) #zoom

The plot shows a positive association between Income and Approved.

Correlation matrix

Next we compute a correlation matrix to check for collinearity among the numeric variables.

numericVars <- data.frame(data$Age, data$Debt, data$YearsEmployed, data$CreditScore, data$Income)
#corrplot(cor(numericVars), method = "number", type="upper")
chart.Correlation(numericVars, histogram=TRUE, pch=19)

The largest value is 0.4, between YearsEmployed and Age; this is not large enough to cause collinearity problems, so both variables are kept in our model.
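
To read the largest pairwise correlation programmatically rather than from the chart, one can inspect the off-diagonal entries of `cor()`. A sketch on simulated data (the variables `x`, `y` and `z` are made up):

```r
set.seed(1)

# Simulated numeric variables; y is built to correlate with x.
x <- rnorm(200)
y <- 0.5 * x + rnorm(200)
z <- rnorm(200)
toy <- data.frame(x, y, z)

corr <- cor(toy)

# Ignore the diagonal (always 1) and find the strongest pair.
diag(corr) <- NA
max_abs <- max(abs(corr), na.rm = TRUE)
pair <- which(abs(corr) == max_abs, arr.ind = TRUE)[1, ]
print(rownames(corr)[pair])
print(round(max_abs, 2))
```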

Normalization of the numeric variables

First we check whether our numeric variables follow a normal distribution.

for (columna in 1:ncol(data)){
  if (class(data[,columna]) != "factor"){
    qqnorm(data[,columna], 
         main = paste("Normality Plot: ", colnames(data[columna])))
    qqline(data[,columna])
  } else {
    next
  }
}

None of the variables appears to be normally distributed, but we confirm this with the Shapiro-Wilk test.

sapply(numericVars, function(x) round(shapiro.test(x)$p.value,2))
          data.Age          data.Debt data.YearsEmployed   data.CreditScore 
                 0                  0                  0                  0 
       data.Income 
                 0 

The p-values from the Shapiro-Wilk test are close to 0 in every case, so we reject the null hypothesis of normality and conclude that none of the variables is normally distributed.
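
For intuition on how the test behaves, a sketch on simulated data: a clearly skewed sample yields a p-value near zero, as observed above. Both samples are made up and unrelated to the dataset:

```r
set.seed(42)

# Simulated samples: one from a normal, one from a skewed distribution.
normal_sample <- rnorm(300)
skewed_sample <- rexp(300)

# shapiro.test: H0 = the sample comes from a normal distribution.
p_normal <- shapiro.test(normal_sample)$p.value
p_skewed <- shapiro.test(skewed_sample)$p.value

print(c(normal = p_normal, skewed = p_skewed))
```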

Conclusions of the Exploratory Data Analysis:

  • We need to normalize all the numeric variables.

  • There is no collinearity among the numeric variables.

  • The categorical variables “Male” and “DriversLicense” do not appear to influence the target variable; the rest do, to varying degrees.

  • The categories ‘l’ and ‘gg’ of “Married” and “BankCustomer” respectively have only two observations each, and credit was granted in every case. We therefore treat both variables as binary and remove these two categories from our model.

Data modification

Normalization of the numeric variables:

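# scale() returns a one-column matrix; as.numeric() keeps plain numeric columns
data$Age <- as.numeric(scale(data$Age))
data$Debt <- as.numeric(scale(data$Debt))
data$YearsEmployed <- as.numeric(scale(data$YearsEmployed))
data$CreditScore <- as.numeric(scale(data$CreditScore))
data$Income <- as.numeric(scale(data$Income))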
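
Note that `scale()` standardizes a variable (mean 0, standard deviation 1) without changing the shape of its distribution, and that it returns a one-column matrix. A sketch on simulated data (the `income` vector is made up):

```r
set.seed(7)

# Simulated right-skewed variable (made up, stands in for e.g. Income).
income <- rexp(500, rate = 1 / 1000)

z <- scale(income)        # one-column matrix with centering attributes
z_vec <- as.numeric(z)    # plain numeric vector

print(class(z))
print(c(mean = mean(z_vec), sd = sd(z_vec)))

# Standardizing is a linear map, so the ordering of values is unchanged.
same_order <- identical(order(income), order(z_vec))
print(same_order)
```

The standardized variable is therefore on a comparable scale, which matters for penalized regression, but it remains as skewed as before.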

We remove the variables Male and DriversLicense:

data$Male <- NULL
data$DriversLicense <- NULL

Because the data contain categorical variables, they must be encoded as dummy variables for a classification model, so we build a new data frame of dummy variables. We also remove the category “l” of Married and “gg” of BankCustomer.

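library(fastDummies)  # provides dummy_cols()
# One 0/1 column per category; the original factor columns are dropped.
df <- dummy_cols(data, remove_selected_columns = TRUE)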
colnames(df)
 [1] "Age"                 "Debt"                "YearsEmployed"      
 [4] "CreditScore"         "Income"              "Married_l"          
 [7] "Married_u"           "Married_y"           "BankCustomer_g"     
[10] "BankCustomer_gg"     "BankCustomer_p"      "EducationLevel_aa"  
[13] "EducationLevel_c"    "EducationLevel_cc"   "EducationLevel_d"   
[16] "EducationLevel_e"    "EducationLevel_ff"   "EducationLevel_i"   
[19] "EducationLevel_j"    "EducationLevel_k"    "EducationLevel_m"   
[22] "EducationLevel_q"    "EducationLevel_r"    "EducationLevel_w"   
[25] "EducationLevel_x"    "Ethnicity_bb"        "Ethnicity_dd"       
[28] "Ethnicity_ff"        "Ethnicity_h"         "Ethnicity_j"        
[31] "Ethnicity_n"         "Ethnicity_o"         "Ethnicity_v"        
[34] "Ethnicity_z"         "PriorDefault_Yes"    "PriorDefault_No"    
[37] "Employed_Unemployed" "Employed_Employed"   "Citizen_g"          
[40] "Citizen_p"           "Citizen_s"           "Approved_0"         
[43] "Approved_1"         
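# Drop the dummies created for the target; the factor is re-attached below.
df$Approved_0 <- NULL
df$Approved_1 <- NULL

# Drop the two sparse categories identified in the exploratory analysis.
df$Married_l <- NULL
df$BankCustomer_gg <- NULL

# Re-attach the target variable as a factor.
df$Approved <- data$Approved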

summary(df)
      Age               Debt         YearsEmployed      CreditScore          Income       
 Min.   :-1.5031   Min.   :-0.9559   Min.   :-0.6644   Min.   :-0.4935   Min.   :-0.1953  
 1st Qu.:-0.7515   1st Qu.:-0.7550   1st Qu.:-0.6151   1st Qu.:-0.4935   1st Qu.:-0.1953  
 Median :-0.2535   Median :-0.4035   Median :-0.3656   Median :-0.4935   Median :-0.1943  
 Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000  
 3rd Qu.: 0.5595   3rd Qu.: 0.4919   3rd Qu.: 0.1200   3rd Qu.: 0.1234   3rd Qu.:-0.1194  
 Max.   : 4.1000   Max.   : 4.6686   Max.   : 7.8519   Max.   :13.2841   Max.   :18.9982  
   Married_u        Married_y      BankCustomer_g   BankCustomer_p   EducationLevel_aa
 Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.00000  
 1st Qu.:1.0000   1st Qu.:0.0000   1st Qu.:1.0000   1st Qu.:0.0000   1st Qu.:0.00000  
 Median :1.0000   Median :0.0000   Median :1.0000   Median :0.0000   Median :0.00000  
 Mean   :0.7609   Mean   :0.2362   Mean   :0.7609   Mean   :0.2362   Mean   :0.07826  
 3rd Qu.:1.0000   3rd Qu.:0.0000   3rd Qu.:1.0000   3rd Qu.:0.0000   3rd Qu.:0.00000  
 Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.00000  
 EducationLevel_c EducationLevel_cc EducationLevel_d  EducationLevel_e  EducationLevel_ff
 Min.   :0.0000   Min.   :0.00000   Min.   :0.00000   Min.   :0.00000   Min.   :0.00000  
 1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.:0.00000  
 Median :0.0000   Median :0.00000   Median :0.00000   Median :0.00000   Median :0.00000  
 Mean   :0.1986   Mean   :0.05942   Mean   :0.04348   Mean   :0.03913   Mean   :0.07826  
 3rd Qu.:0.0000   3rd Qu.:0.00000   3rd Qu.:0.00000   3rd Qu.:0.00000   3rd Qu.:0.00000  
 Max.   :1.0000   Max.   :1.00000   Max.   :1.00000   Max.   :1.00000   Max.   :1.00000  
 EducationLevel_i EducationLevel_j  EducationLevel_k  EducationLevel_m  EducationLevel_q
 Min.   :0.0000   Min.   :0.00000   Min.   :0.00000   Min.   :0.00000   Min.   :0.000   
 1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.:0.000   
 Median :0.0000   Median :0.00000   Median :0.00000   Median :0.00000   Median :0.000   
 Mean   :0.0913   Mean   :0.01594   Mean   :0.07391   Mean   :0.05652   Mean   :0.113   
 3rd Qu.:0.0000   3rd Qu.:0.00000   3rd Qu.:0.00000   3rd Qu.:0.00000   3rd Qu.:0.000   
 Max.   :1.0000   Max.   :1.00000   Max.   :1.00000   Max.   :1.00000   Max.   :1.000   
 EducationLevel_r   EducationLevel_w  EducationLevel_x   Ethnicity_bb     Ethnicity_dd    
 Min.   :0.000000   Min.   :0.00000   Min.   :0.00000   Min.   :0.0000   Min.   :0.00000  
 1st Qu.:0.000000   1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.:0.0000   1st Qu.:0.00000  
 Median :0.000000   Median :0.00000   Median :0.00000   Median :0.0000   Median :0.00000  
 Mean   :0.004348   Mean   :0.09275   Mean   :0.05507   Mean   :0.0913   Mean   :0.01014  
 3rd Qu.:0.000000   3rd Qu.:0.00000   3rd Qu.:0.00000   3rd Qu.:0.0000   3rd Qu.:0.00000  
 Max.   :1.000000   Max.   :1.00000   Max.   :1.00000   Max.   :1.0000   Max.   :1.00000  
  Ethnicity_ff      Ethnicity_h   Ethnicity_j       Ethnicity_n        Ethnicity_o      
 Min.   :0.00000   Min.   :0.0   Min.   :0.00000   Min.   :0.000000   Min.   :0.000000  
 1st Qu.:0.00000   1st Qu.:0.0   1st Qu.:0.00000   1st Qu.:0.000000   1st Qu.:0.000000  
 Median :0.00000   Median :0.0   Median :0.00000   Median :0.000000   Median :0.000000  
 Mean   :0.08406   Mean   :0.2   Mean   :0.01304   Mean   :0.005797   Mean   :0.002899  
 3rd Qu.:0.00000   3rd Qu.:0.0   3rd Qu.:0.00000   3rd Qu.:0.000000   3rd Qu.:0.000000  
 Max.   :1.00000   Max.   :1.0   Max.   :1.00000   Max.   :1.000000   Max.   :1.000000  
  Ethnicity_v      Ethnicity_z      PriorDefault_Yes PriorDefault_No  Employed_Unemployed
 Min.   :0.0000   Min.   :0.00000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000     
 1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000     
 Median :1.0000   Median :0.00000   Median :0.0000   Median :1.0000   Median :1.0000     
 Mean   :0.5797   Mean   :0.01304   Mean   :0.4768   Mean   :0.5232   Mean   :0.5725     
 3rd Qu.:1.0000   3rd Qu.:0.00000   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000     
 Max.   :1.0000   Max.   :1.00000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000     
 Employed_Employed   Citizen_g        Citizen_p         Citizen_s       Approved
 Min.   :0.0000    Min.   :0.0000   Min.   :0.00000   Min.   :0.00000   0:383   
 1st Qu.:0.0000    1st Qu.:1.0000   1st Qu.:0.00000   1st Qu.:0.00000   1:307   
 Median :0.0000    Median :1.0000   Median :0.00000   Median :0.00000           
 Mean   :0.4275    Mean   :0.9058   Mean   :0.01159   Mean   :0.08261           
 3rd Qu.:1.0000    3rd Qu.:1.0000   3rd Qu.:0.00000   3rd Qu.:0.00000           
 Max.   :1.0000    Max.   :1.0000   Max.   :1.00000   Max.   :1.00000           
dim(df)
[1] 690  40
head(df)
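Note that this encoding keeps one column per category. For a binary variable the two dummies are complementary: in the summary above the means of PriorDefault_Yes and PriorDefault_No (0.4768 and 0.5232) add up to 1, so either column determines the other. The sketch below illustrates this on a hypothetical toy data frame with base R's `model.matrix()`, and contrasts it with treatment coding, which leaves out one reference level per factor. It is an alternative shown for illustration, not the encoding used in this analysis.

```r
# Hypothetical toy data for illustration only.
toy <- data.frame(
  PriorDefault = factor(c("Yes", "No", "No", "Yes", "No")),
  Citizen      = factor(c("g", "s", "g", "p", "g"))
)

# Full one-hot coding of one factor: one column per level, no intercept.
one_hot <- model.matrix(~ PriorDefault - 1, data = toy)
print(one_hot)
# Each row sums to 1, so the two columns carry the same information.
print(rowSums(one_hot))

# Treatment coding: the first level of each factor is the reference
# and gets no column of its own.
treatment <- model.matrix(~ PriorDefault + Citizen, data = toy)
print(colnames(treatment))
```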

Variable selection model

We select variables with stepAIC. First we define the minimum and maximum models: the minimum is the intercept-only model (Approved ~ 1) and the maximum regresses the target variable on all the other variables:

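library(MASS)  # provides stepAIC()
# Maximum model: logistic regression of Approved on every other column.
fit1 <- glm(Approved ~ ., data = df, family = binomial)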
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
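# Minimum model: intercept only.
fit0 <- glm(Approved ~ 1, data = df, family = binomial)

# Stepwise search in both directions, bounded by fit0 and fit1.
step <- stepAIC(fit0, direction = "both", scope = list(upper = fit1, lower = fit0))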
Start:  AIC=950.16
Approved ~ 1
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
                      Df Deviance    AIC
+ PriorDefault_No      1   540.95 544.95
+ PriorDefault_Yes     1   540.95 544.95
+ CreditScore          1   762.74 766.74
+ Employed_Employed    1   798.66 802.66
+ Employed_Unemployed  1   798.66 802.66
+ Income               1   862.80 866.80
+ YearsEmployed        1   863.32 867.32
+ Debt                 1   918.36 922.36
+ EducationLevel_x     1   920.99 924.99
+ Married_y            1   922.64 926.64
+ BankCustomer_p       1   922.64 926.64
+ Ethnicity_ff         1   924.15 928.15
+ Ethnicity_h          1   924.16 928.16
+ EducationLevel_ff    1   924.72 928.72
+ Married_u            1   924.92 928.92
+ BankCustomer_g       1   924.92 928.92
+ Age                  1   929.85 933.85
+ EducationLevel_q     1   932.62 936.62
+ EducationLevel_cc    1   935.90 939.90
+ EducationLevel_i     1   937.37 941.37
+ Citizen_s            1   939.43 943.43
+ EducationLevel_k     1   941.39 945.39
+ EducationLevel_d     1   942.09 946.09
+ Citizen_g            1   942.51 946.51
+ EducationLevel_aa    1   946.06 950.06
<none>                     948.16 950.16
+ Ethnicity_v          1   946.22 950.22
+ Ethnicity_z          1   946.34 950.34
+ EducationLevel_w     1   946.74 950.74
+ Citizen_p            1   947.10 951.10
+ Ethnicity_dd         1   947.40 951.40
+ EducationLevel_e     1   947.54 951.54
+ EducationLevel_r     1   947.55 951.55
+ EducationLevel_j     1   947.85 951.85
+ EducationLevel_m     1   947.95 951.95
+ Ethnicity_bb         1   948.08 952.08
+ Ethnicity_n          1   948.11 952.11
+ EducationLevel_c     1   948.11 952.11
+ Ethnicity_o          1   948.13 952.13
+ Ethnicity_j          1   948.16 952.16
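To read the table above: for a 0/1 response, the AIC of a logistic regression equals the residual deviance plus twice the number of estimated coefficients. The intercept-only model has deviance 948.16 and one coefficient, giving 948.16 + 2 = 950.16; adding PriorDefault_No gives 540.95 + 2 × 2 = 544.95. The sketch below checks this relation and runs a small stepwise search on simulated data. The variables `signal` and `noise` are hypothetical, and the scope is written with formulas rather than fitted models.

```r
library(MASS)  # provides stepAIC()

# Simulated data for illustration only: y depends on `signal`, not `noise`.
set.seed(123)
n <- 300
toy <- data.frame(signal = rnorm(n), noise = rnorm(n))
toy$y <- rbinom(n, size = 1, prob = plogis(1.5 * toy$signal))

fit0 <- glm(y ~ 1, data = toy, family = binomial)               # minimum
fit1 <- glm(y ~ signal + noise, data = toy, family = binomial)  # maximum

# AIC = residual deviance + 2 * (number of coefficients) for 0/1 data.
aic_by_hand <- deviance(fit0) + 2 * length(coef(fit0))
print(all.equal(aic_by_hand, AIC(fit0)))

# Stepwise search between the two models; trace = FALSE hides the tables.
sel <- stepAIC(fit0, direction = "both",
               scope = list(lower = ~ 1, upper = ~ signal + noise),
               trace = FALSE)
print(formula(sel))
print(c(null = AIC(fit0), selected = AIC(sel)))
```

At each step the search adds or removes the single term that lowers the AIC the most, and stops when no change improves on the `<none>` row.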

Step:  AIC=544.95
Approved ~ PriorDefault_No
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
                      Df Deviance    AIC
+ CreditScore          1   503.21 509.21
+ Income               1   505.09 511.09
+ Employed_Unemployed  1   507.67 513.67
+ Employed_Employed    1   507.67 513.67
+ Citizen_p            1   523.48 529.48
+ Married_y            1   528.70 534.70
+ BankCustomer_p       1   528.70 534.70
+ EducationLevel_x     1   531.38 537.38
+ Married_u            1   531.98 537.98
+ BankCustomer_g       1   531.98 537.98
+ YearsEmployed        1   532.97 538.97
+ EducationLevel_aa    1   534.80 540.80
+ EducationLevel_cc    1   534.81 540.81
+ EducationLevel_ff    1   536.36 542.36
+ Ethnicity_ff         1   536.69 542.69
+ Ethnicity_h          1   537.32 543.32
+ EducationLevel_k     1   537.95 543.95
+ Citizen_s            1   537.97 543.97
+ Ethnicity_o          1   538.22 544.22
+ Ethnicity_n          1   538.72 544.72
<none>                     540.95 544.95
+ EducationLevel_d     1   539.15 545.15
+ EducationLevel_q     1   539.20 545.20
+ Ethnicity_j          1   539.35 545.35
+ EducationLevel_i     1   539.61 545.61
+ Debt                 1   539.69 545.69
+ EducationLevel_w     1   539.88 545.88
+ Ethnicity_bb         1   540.07 546.07
+ Ethnicity_v          1   540.48 546.48
+ EducationLevel_r     1   540.59 546.59
+ Age                  1   540.68 546.68
+ EducationLevel_m     1   540.69 546.69
+ EducationLevel_e     1   540.82 546.82
+ EducationLevel_j     1   540.83 546.83
+ Ethnicity_z          1   540.85 546.85
+ EducationLevel_c     1   540.89 546.89
+ Citizen_g            1   540.92 546.92
+ Ethnicity_dd         1   540.94 546.94
- PriorDefault_No      1   948.16 950.16

Step:  AIC=509.21
Approved ~ PriorDefault_No + CreditScore
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
                      Df Deviance    AIC
+ Income               1   477.87 485.87
+ Citizen_p            1   484.26 492.26
+ EducationLevel_x     1   492.76 500.76
+ Married_y            1   494.87 502.87
+ BankCustomer_p       1   494.87 502.87
+ Ethnicity_ff         1   497.51 505.51
+ Married_u            1   497.62 505.62
+ BankCustomer_g       1   497.62 505.62
+ EducationLevel_ff    1   497.72 505.72
+ EducationLevel_cc    1   497.84 505.84
+ Employed_Employed    1   498.33 506.33
+ Employed_Unemployed  1   498.33 506.33
+ YearsEmployed        1   499.68 507.68
+ Ethnicity_h          1   499.76 507.76
+ EducationLevel_aa    1   499.85 507.85
+ Ethnicity_o          1   500.21 508.21
+ EducationLevel_k     1   500.79 508.79
<none>                     503.21 509.21
+ Ethnicity_j          1   501.27 509.27
+ Ethnicity_n          1   501.28 509.28
+ EducationLevel_d     1   501.70 509.70
+ EducationLevel_w     1   501.72 509.72
+ EducationLevel_i     1   501.96 509.96
+ Citizen_g            1   502.06 510.06
+ Ethnicity_bb         1   502.15 510.15
+ EducationLevel_q     1   502.17 510.17
+ Citizen_s            1   502.69 510.69
+ Ethnicity_z          1   502.82 510.82
+ EducationLevel_r     1   502.86 510.86
+ Ethnicity_v          1   503.05 511.05
+ EducationLevel_m     1   503.06 511.06
+ EducationLevel_e     1   503.09 511.09
+ EducationLevel_j     1   503.13 511.13
+ Age                  1   503.18 511.18
+ Ethnicity_dd         1   503.19 511.19
+ Debt                 1   503.19 511.19
+ EducationLevel_c     1   503.21 511.21
- CreditScore          1   540.95 544.95
- PriorDefault_No      1   762.74 766.74
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

Step:  AIC=485.87
Approved ~ PriorDefault_No + CreditScore + Income
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
                      Df Deviance    AIC
+ Citizen_p            1   462.37 472.37
+ EducationLevel_x     1   468.30 478.30
+ EducationLevel_ff    1   470.00 480.00
+ Married_y            1   470.29 480.29
+ BankCustomer_p       1   470.29 480.29
+ Married_u            1   471.72 481.72
+ BankCustomer_g       1   471.72 481.72
+ Ethnicity_ff         1   472.52 482.52
+ EducationLevel_cc    1   472.64 482.64
+ Employed_Unemployed  1   473.35 483.35
+ Employed_Employed    1   473.35 483.35
+ YearsEmployed        1   473.93 483.93
+ Ethnicity_h          1   474.45 484.45
+ EducationLevel_aa    1   475.33 485.33
+ Ethnicity_j          1   475.57 485.57
+ Ethnicity_n          1   475.74 485.74
<none>                     477.87 485.87
+ EducationLevel_k     1   476.08 486.08
+ Citizen_g            1   476.19 486.19
+ EducationLevel_w     1   476.44 486.44
+ EducationLevel_q     1   476.57 486.57
+ Ethnicity_bb         1   476.76 486.76
+ EducationLevel_i     1   476.79 486.79
+ EducationLevel_d     1   477.00 487.00
+ Ethnicity_z          1   477.05 487.05
+ EducationLevel_m     1   477.71 487.71
+ EducationLevel_j     1   477.72 487.72
+ Debt                 1   477.73 487.73
+ Ethnicity_o          1   477.76 487.76
+ Age                  1   477.83 487.83
+ Ethnicity_dd         1   477.83 487.83
+ EducationLevel_e     1   477.84 487.84
+ Citizen_s            1   477.84 487.84
+ Ethnicity_v          1   477.85 487.85
+ EducationLevel_r     1   477.86 487.86
+ EducationLevel_c     1   477.87 487.87
- Income               1   503.21 509.21
- CreditScore          1   505.09 511.09
- PriorDefault_No      1   723.20 729.20
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

Step:  AIC=472.37
Approved ~ PriorDefault_No + CreditScore + Income + Citizen_p
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
                      Df Deviance    AIC
+ EducationLevel_x     1   452.34 464.34
+ EducationLevel_ff    1   454.25 466.25
+ Married_y            1   456.07 468.07
+ BankCustomer_p       1   456.07 468.07
+ EducationLevel_cc    1   456.58 468.58
+ Employed_Unemployed  1   456.73 468.73
+ Employed_Employed    1   456.73 468.73
+ Ethnicity_ff         1   456.89 468.89
+ Married_u            1   457.40 469.40
+ BankCustomer_g       1   457.40 469.40
+ YearsEmployed        1   457.91 469.91
+ Ethnicity_h          1   458.34 470.34
+ EducationLevel_i     1   459.19 471.19
+ Ethnicity_bb         1   459.23 471.23
+ Ethnicity_n          1   459.86 471.86
+ EducationLevel_aa    1   460.15 472.15
<none>                     462.37 472.37
+ EducationLevel_w     1   460.55 472.55
+ EducationLevel_q     1   460.83 472.83
+ EducationLevel_k     1   460.94 472.94
+ Ethnicity_j          1   461.40 473.40
+ Ethnicity_z          1   461.55 473.55
+ EducationLevel_d     1   461.69 473.69
+ Age                  1   462.16 474.16
+ EducationLevel_m     1   462.28 474.28
+ Ethnicity_o          1   462.28 474.28
+ Ethnicity_dd         1   462.29 474.29
+ Ethnicity_v          1   462.30 474.30
+ EducationLevel_e     1   462.30 474.30
+ EducationLevel_r     1   462.35 474.35
+ Debt                 1   462.36 474.36
+ EducationLevel_j     1   462.36 474.36
+ EducationLevel_c     1   462.36 474.36
+ Citizen_g            1   462.37 474.37
+ Citizen_s            1   462.37 474.37
- Citizen_p            1   477.87 485.87
- Income               1   484.26 492.26
- CreditScore          1   490.12 498.12
- PriorDefault_No      1   719.88 727.88
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

Step:  AIC=464.34
Approved ~ PriorDefault_No + CreditScore + Income + Citizen_p + 
    EducationLevel_x
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
                      Df Deviance    AIC
+ Married_y            1   444.97 458.97
+ BankCustomer_p       1   444.97 458.97
+ EducationLevel_ff    1   445.13 459.13
+ EducationLevel_cc    1   445.65 459.65
+ Married_u            1   446.44 460.44
+ BankCustomer_g       1   446.44 460.44
+ Ethnicity_ff         1   447.63 461.63
+ Employed_Unemployed  1   447.75 461.75
+ Employed_Employed    1   447.75 461.75
+ YearsEmployed        1   448.32 462.32
+ Ethnicity_n          1   449.70 463.70
+ EducationLevel_w     1   449.77 463.77
+ EducationLevel_i     1   449.82 463.82
+ Ethnicity_bb         1   449.82 463.82
+ EducationLevel_q     1   449.92 463.92
+ Ethnicity_h          1   450.18 464.18
<none>                     452.34 464.34
+ EducationLevel_aa    1   450.82 464.82
+ Ethnicity_j          1   451.25 465.25
+ EducationLevel_k     1   451.37 465.37
+ Ethnicity_z          1   451.69 465.69
+ EducationLevel_d     1   451.88 465.88
+ Ethnicity_v          1   452.08 466.08
+ Age                  1   452.14 466.14
+ EducationLevel_c     1   452.15 466.15
+ EducationLevel_e     1   452.18 466.18
+ Ethnicity_dd         1   452.22 466.22
+ Ethnicity_o          1   452.26 466.26
+ Debt                 1   452.26 466.26
+ Citizen_g            1   452.30 466.30
+ Citizen_s            1   452.30 466.30
+ EducationLevel_r     1   452.31 466.31
+ EducationLevel_m     1   452.33 466.33
+ EducationLevel_j     1   452.34 466.34
- EducationLevel_x     1   462.37 472.37
- Citizen_p            1   468.30 478.30
- Income               1   473.18 483.18
- CreditScore          1   480.55 490.55
- PriorDefault_No      1   697.98 707.98
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

Step:  AIC=458.97
Approved ~ PriorDefault_No + CreditScore + Income + Citizen_p + 
    EducationLevel_x + Married_y
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
                      Df Deviance    AIC
+ EducationLevel_cc    1   437.84 453.84
+ EducationLevel_ff    1   438.29 454.29
+ Married_u            1   438.70 454.70
+ BankCustomer_g       1   438.70 454.70
+ Ethnicity_ff         1   440.73 456.73
+ Employed_Unemployed  1   441.06 457.06
+ Employed_Employed    1   441.06 457.06
+ YearsEmployed        1   441.28 457.28
+ Ethnicity_bb         1   441.96 457.96
+ EducationLevel_w     1   441.97 457.97
+ EducationLevel_i     1   442.07 458.07
+ Ethnicity_n          1   442.33 458.33
+ Ethnicity_h          1   442.66 458.66
<none>                     444.97 458.97
+ EducationLevel_q     1   443.61 459.61
+ Ethnicity_j          1   443.89 459.89
+ EducationLevel_aa    1   443.92 459.92
+ Ethnicity_z          1   444.02 460.02
+ EducationLevel_k     1   444.07 460.07
+ Age                  1   444.41 460.41
+ EducationLevel_d     1   444.63 460.63
+ Ethnicity_v          1   444.64 460.64
+ EducationLevel_c     1   444.66 460.66
+ Debt                 1   444.83 460.83
+ EducationLevel_e     1   444.86 460.86
+ Ethnicity_dd         1   444.87 460.87
+ Ethnicity_o          1   444.87 460.87
+ EducationLevel_m     1   444.90 460.90
+ EducationLevel_r     1   444.93 460.93
+ EducationLevel_j     1   444.97 460.97
+ Citizen_g            1   444.97 460.97
+ Citizen_s            1   444.97 460.97
- Married_y            1   452.34 464.34
- EducationLevel_x     1   456.07 468.07
- Citizen_p            1   459.56 471.56
- Income               1   465.83 477.83
- CreditScore          1   471.44 483.44
- PriorDefault_No      1   686.84 698.84
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

Step:  AIC=453.84
Approved ~ PriorDefault_No + CreditScore + Income + Citizen_p + 
    EducationLevel_x + Married_y + EducationLevel_cc
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
                      Df Deviance    AIC
+ EducationLevel_ff    1   431.93 449.93
+ Employed_Employed    1   433.94 451.94
+ Employed_Unemployed  1   433.94 451.94
+ EducationLevel_w     1   433.96 451.96
+ Married_u            1   434.02 452.02
+ BankCustomer_g       1   434.02 452.02
+ Ethnicity_ff         1   434.06 452.06
+ YearsEmployed        1   434.90 452.90
+ Ethnicity_n          1   435.02 453.02
+ Ethnicity_bb         1   435.34 453.34
+ EducationLevel_i     1   435.49 453.49
+ EducationLevel_q     1   435.79 453.79
<none>                     437.84 453.84
+ Ethnicity_h          1   436.15 454.15
+ Ethnicity_j          1   436.62 454.62
+ Ethnicity_z          1   437.03 455.03
+ EducationLevel_c     1   437.07 455.07
+ EducationLevel_aa    1   437.22 455.22
+ EducationLevel_k     1   437.27 455.27
+ Age                  1   437.46 455.46
+ Ethnicity_v          1   437.49 455.49
+ EducationLevel_e     1   437.64 455.64
+ EducationLevel_d     1   437.65 455.65
+ Ethnicity_dd         1   437.70 455.70
+ Ethnicity_o          1   437.75 455.75
+ Debt                 1   437.77 455.77
+ EducationLevel_r     1   437.79 455.79
+ EducationLevel_m     1   437.83 455.83
+ Citizen_g            1   437.83 455.83
+ Citizen_s            1   437.83 455.83
+ EducationLevel_j     1   437.84 455.84
- EducationLevel_cc    1   444.97 458.97
- Married_y            1   445.65 459.65
- EducationLevel_x     1   449.94 463.94
- Citizen_p            1   453.02 467.02
- Income               1   458.56 472.56
- CreditScore          1   462.58 476.58
- PriorDefault_No      1   677.25 691.25
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

Step:  AIC=449.93
Approved ~ PriorDefault_No + CreditScore + Income + Citizen_p + 
    EducationLevel_x + Married_y + EducationLevel_cc + EducationLevel_ff
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
                      Df Deviance    AIC
+ Employed_Unemployed  1   427.78 447.78
+ Employed_Employed    1   427.78 447.78
+ Married_u            1   428.14 448.14
+ BankCustomer_g       1   428.14 448.14
+ Ethnicity_bb         1   428.45 448.45
+ EducationLevel_i     1   428.60 448.60
+ EducationLevel_w     1   428.89 448.89
+ Ethnicity_n          1   429.33 449.33
+ YearsEmployed        1   429.73 449.73
<none>                     431.93 449.93
+ Ethnicity_ff         1   430.28 450.28
+ EducationLevel_q     1   430.47 450.47
+ Ethnicity_h          1   430.80 450.80
+ Ethnicity_j          1   430.93 450.93
+ EducationLevel_aa    1   430.94 450.94
+ Ethnicity_z          1   431.00 451.00
+ EducationLevel_k     1   431.03 451.03
+ EducationLevel_d     1   431.60 451.60
+ EducationLevel_c     1   431.61 451.61
+ EducationLevel_e     1   431.81 451.81
+ Ethnicity_o          1   431.83 451.83
+ Ethnicity_dd         1   431.83 451.83
+ EducationLevel_m     1   431.85 451.85
+ EducationLevel_r     1   431.88 451.88
+ Ethnicity_v          1   431.91 451.91
+ EducationLevel_j     1   431.93 451.93
+ Citizen_g            1   431.93 451.93
+ Citizen_s            1   431.93 451.93
+ Debt                 1   431.93 451.93
+ Age                  1   431.93 451.93
- EducationLevel_ff    1   437.84 453.84
- EducationLevel_cc    1   438.29 454.29
- Married_y            1   439.22 455.22
- EducationLevel_x     1   443.05 459.05
- Citizen_p            1   447.36 463.36
- Income               1   453.80 469.80
- CreditScore          1   456.92 472.92
- PriorDefault_No      1   655.07 671.07
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

Step:  AIC=447.78
Approved ~ PriorDefault_No + CreditScore + Income + Citizen_p + 
    EducationLevel_x + Married_y + EducationLevel_cc + EducationLevel_ff + 
    Employed_Unemployed
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
                      Df Deviance    AIC
+ Married_u            1   423.81 445.81
+ BankCustomer_g       1   423.81 445.81
+ Ethnicity_bb         1   424.96 446.96
+ EducationLevel_i     1   424.99 446.99
+ YearsEmployed        1   425.22 447.22
+ EducationLevel_w     1   425.37 447.37
+ Ethnicity_n          1   425.40 447.40
<none>                     427.78 447.78
+ Ethnicity_ff         1   425.92 447.92
+ Ethnicity_h          1   426.39 448.39
+ Ethnicity_j          1   426.65 448.65
+ Ethnicity_z          1   426.66 448.66
+ EducationLevel_q     1   426.81 448.81
+ EducationLevel_k     1   426.91 448.91
+ EducationLevel_aa    1   427.04 449.04
+ EducationLevel_c     1   427.44 449.44
+ Ethnicity_v          1   427.64 449.64
+ Citizen_s            1   427.64 449.64
+ Citizen_g            1   427.64 449.64
+ EducationLevel_d     1   427.67 449.67
+ EducationLevel_m     1   427.67 449.67
+ Ethnicity_o          1   427.69 449.69
+ Ethnicity_dd         1   427.70 449.70
+ EducationLevel_e     1   427.70 449.70
+ EducationLevel_r     1   427.71 449.71
+ Age                  1   427.74 449.74
+ EducationLevel_j     1   427.77 449.77
+ Debt                 1   427.78 449.78
- Employed_Unemployed  1   431.93 449.93
- CreditScore          1   433.61 451.61
- EducationLevel_ff    1   433.94 451.94
- EducationLevel_cc    1   434.11 452.11
- Married_y            1   434.28 452.28
- EducationLevel_x     1   437.59 455.59
- Citizen_p            1   444.13 462.13
- Income               1   448.23 466.23
- PriorDefault_No      1   643.83 661.83
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

Step:  AIC=445.81
Approved ~ PriorDefault_No + CreditScore + Income + Citizen_p + 
    EducationLevel_x + Married_y + EducationLevel_cc + EducationLevel_ff + 
    Employed_Unemployed + Married_u
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
                      Df Deviance    AIC
+ Ethnicity_bb         1   421.01 445.01
+ EducationLevel_i     1   421.04 445.04
+ Ethnicity_n          1   421.37 445.37
+ EducationLevel_w     1   421.41 445.41
<none>                     423.81 445.81
+ YearsEmployed        1   422.09 446.09
+ Ethnicity_h          1   422.26 446.26
+ Ethnicity_j          1   422.66 446.66
+ Ethnicity_z          1   422.67 446.67
+ EducationLevel_q     1   422.85 446.85
+ EducationLevel_k     1   422.94 446.94
+ EducationLevel_aa    1   423.05 447.05
+ EducationLevel_c     1   423.47 447.47
+ EducationLevel_d     1   423.70 447.70
+ EducationLevel_m     1   423.70 447.70
+ Ethnicity_dd         1   423.72 447.72
+ Age                  1   423.72 447.72
+ Ethnicity_o          1   423.72 447.72
+ EducationLevel_e     1   423.73 447.73
+ EducationLevel_r     1   423.74 447.74
+ Ethnicity_v          1   423.76 447.76
- Married_u            1   427.78 447.78
- EducationLevel_cc    1   427.79 447.79
+ Debt                 1   423.80 447.80
+ Ethnicity_ff         1   423.80 447.80
+ EducationLevel_j     1   423.80 447.80
+ Citizen_g            1   423.81 447.81
+ Citizen_s            1   423.81 447.81
- Employed_Unemployed  1   428.14 448.14
- Married_y            1   429.08 449.08
- CreditScore          1   429.55 449.55
- EducationLevel_ff    1   429.93 449.93
- EducationLevel_x     1   433.53 453.53
- Citizen_p            1   440.46 460.46
- Income               1   442.35 462.35
- PriorDefault_No      1   642.38 662.38
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

Step:  AIC=445.01
Approved ~ PriorDefault_No + CreditScore + Income + Citizen_p + 
    EducationLevel_x + Married_y + EducationLevel_cc + EducationLevel_ff + 
    Employed_Unemployed + Married_u + Ethnicity_bb
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
                      Df Deviance    AIC
+ Ethnicity_n          1   418.67 444.67
<none>                     421.01 445.01
+ EducationLevel_w     1   419.04 445.04
+ YearsEmployed        1   419.29 445.29
+ Ethnicity_z          1   419.69 445.69
+ Ethnicity_v          1   419.74 445.74
- Ethnicity_bb         1   423.81 445.81
+ EducationLevel_k     1   419.91 445.91
+ EducationLevel_aa    1   419.92 445.92
+ Ethnicity_h          1   420.11 446.11
+ EducationLevel_i     1   420.12 446.12
+ Ethnicity_j          1   420.15 446.15
+ EducationLevel_q     1   420.48 446.48
- EducationLevel_cc    1   424.53 446.53
+ EducationLevel_c     1   420.55 446.55
- Employed_Unemployed  1   424.68 446.68
+ EducationLevel_m     1   420.84 446.84
+ Age                  1   420.85 446.85
+ EducationLevel_d     1   420.90 446.90
+ Ethnicity_o          1   420.92 446.92
+ EducationLevel_e     1   420.95 446.95
- Married_u            1   424.96 446.96
+ Ethnicity_dd         1   420.96 446.96
+ EducationLevel_r     1   420.96 446.96
+ Ethnicity_ff         1   421.00 447.00
+ Citizen_s            1   421.00 447.00
+ Citizen_g            1   421.00 447.00
+ Debt                 1   421.01 447.01
+ EducationLevel_j     1   421.01 447.01
- Married_y            1   426.30 448.30
- CreditScore          1   427.19 449.19
- EducationLevel_ff    1   427.99 449.99
- EducationLevel_x     1   430.17 452.17
- Citizen_p            1   439.60 461.60
- Income               1   439.83 461.83
- PriorDefault_No      1   641.78 663.78
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

Step:  AIC=444.67
Approved ~ PriorDefault_No + CreditScore + Income + Citizen_p + 
    EducationLevel_x + Married_y + EducationLevel_cc + EducationLevel_ff + 
    Employed_Unemployed + Married_u + Ethnicity_bb + Ethnicity_n
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
                      Df Deviance    AIC
+ EducationLevel_w     1   416.52 444.52
<none>                     418.67 444.67
+ YearsEmployed        1   416.93 444.93
- Ethnicity_n          1   421.01 445.01
+ Ethnicity_z          1   417.37 445.37
- Ethnicity_bb         1   421.37 445.37
+ EducationLevel_aa    1   417.65 445.65
+ EducationLevel_k     1   417.65 445.65
+ Ethnicity_h          1   417.68 445.68
+ Ethnicity_j          1   417.76 445.76
+ Ethnicity_v          1   417.83 445.83
+ EducationLevel_i     1   417.83 445.83
- Employed_Unemployed  1   422.15 446.15
+ EducationLevel_q     1   418.21 446.21
- EducationLevel_cc    1   422.29 446.29
+ EducationLevel_c     1   418.30 446.30
+ EducationLevel_r     1   418.44 446.44
+ Age                  1   418.45 446.45
+ EducationLevel_m     1   418.53 446.53
+ EducationLevel_d     1   418.58 446.58
+ Ethnicity_o          1   418.59 446.59
+ EducationLevel_e     1   418.59 446.59
+ Ethnicity_dd         1   418.60 446.60
+ Citizen_g            1   418.65 446.65
+ Citizen_s            1   418.65 446.65
+ Ethnicity_ff         1   418.66 446.66
- Married_u            1   422.67 446.67
+ EducationLevel_j     1   418.67 446.67
+ Debt                 1   418.67 446.67
- Married_y            1   424.02 448.02
- CreditScore          1   424.85 448.85
- EducationLevel_ff    1   425.37 449.37
- EducationLevel_x     1   427.98 451.98
- Citizen_p            1   437.60 461.60
- Income               1   437.60 461.60
- PriorDefault_No      1   641.75 665.75
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

Step:  AIC=444.52
Approved ~ PriorDefault_No + CreditScore + Income + Citizen_p + 
    EducationLevel_x + Married_y + EducationLevel_cc + EducationLevel_ff + 
    Employed_Unemployed + Married_u + Ethnicity_bb + Ethnicity_n + 
    EducationLevel_w
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
                      Df Deviance    AIC
<none>                     416.52 444.52
- EducationLevel_w     1   418.67 444.67
+ YearsEmployed        1   414.76 444.76
- Ethnicity_bb         1   418.78 444.78
- Ethnicity_n          1   419.04 445.04
+ Ethnicity_h          1   415.20 445.20
+ Ethnicity_v          1   415.25 445.25
+ Ethnicity_z          1   415.38 445.38
+ Ethnicity_j          1   415.47 445.47
- Employed_Unemployed  1   419.52 445.52
+ EducationLevel_q     1   415.60 445.60
+ EducationLevel_c     1   415.65 445.65
+ EducationLevel_k     1   415.84 445.84
+ EducationLevel_i     1   415.89 445.89
+ EducationLevel_aa    1   415.89 445.89
+ Age                  1   416.24 446.24
+ EducationLevel_r     1   416.29 446.29
+ EducationLevel_e     1   416.35 446.35
+ Ethnicity_dd         1   416.41 446.41
+ Ethnicity_o          1   416.44 446.44
+ EducationLevel_m     1   416.48 446.48
+ EducationLevel_d     1   416.48 446.48
+ Citizen_g            1   416.49 446.49
+ Citizen_s            1   416.49 446.49
+ Debt                 1   416.51 446.51
- Married_u            1   420.51 446.51
+ Ethnicity_ff         1   416.51 446.51
+ EducationLevel_j     1   416.52 446.52
- EducationLevel_cc    1   420.67 446.67
- Married_y            1   421.91 447.91
- EducationLevel_ff    1   422.30 448.30
- CreditScore          1   423.29 449.29
- EducationLevel_x     1   426.67 452.67
- Income               1   435.02 461.02
- Citizen_p            1   435.73 461.73
- PriorDefault_No      1   639.97 665.97

With a final AIC of 444.52, stepwise selection keeps the following variables, which can be read from the formula element of the result:

step$formula
Approved ~ PriorDefault_No + CreditScore + Income + Citizen_p + 
    EducationLevel_x + Married_y + EducationLevel_cc + EducationLevel_ff + 
    Employed_Unemployed + Married_u + Ethnicity_bb + Ethnicity_n + 
    EducationLevel_w
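
As a rough illustration of what the stepwise search does, the following sketch runs `step()` on simulated data (the variables `x1`, `x2`, `x3` and the object names are assumptions for the example, not the notebook's data). It also checks that, for a 0/1 response, the AIC printed by `step()` is the residual deviance plus twice the number of fitted parameters, which is how the Deviance and AIC columns in the tables above relate.

```r
# Toy data: x1 carries signal, x2 and x3 are noise (all values simulated).
set.seed(1)
n <- 300
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- rbinom(n, 1, plogis(2 * d$x1))

# Start from the intercept-only model and let step() add or drop terms by AIC.
null_fit <- glm(y ~ 1, data = d, family = binomial)
sel <- step(null_fit,
            scope = list(lower = ~ 1, upper = ~ x1 + x2 + x3),
            direction = "both", trace = 0)

formula(sel)                        # formula of the selected model
k <- length(coef(sel))              # parameters, including the intercept
c(AIC(sel), deviance(sel) + 2 * k)  # the two values coincide for 0/1 responses

# Same check with the figures printed in the last table above:
# 13 predictors + intercept = 14 parameters.
final_aic <- 416.52 + 2 * 14
final_aic                           # 444.52
```

Setting `trace = 0` suppresses the per-step tables, which is convenient once the search path itself is no longer of interest.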

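We select the variables from the previous step. Note that the hand-typed list below includes Ethnicity_h where the selected formula has Ethnicity_bb; the models that follow are fitted on the columns listed here: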

df <- df[c("Approved","PriorDefault_No","CreditScore","Income","Citizen_p","EducationLevel_x","Married_y","EducationLevel_cc","EducationLevel_ff","Employed_Unemployed","Married_u","EducationLevel_w","Ethnicity_n","Ethnicity_h")] 
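
One way to avoid retyping the column names is to derive them from the selected formula with `all.vars()`. The sketch below copies the formula from the stepwise output and compares it with the hand-typed list; `sel_formula`, `keep` and `manual` are names introduced only for this example, and the final subsetting line is left as a comment because it needs the notebook's `df`.

```r
# Formula copied from the stepwise output above.
sel_formula <- Approved ~ PriorDefault_No + CreditScore + Income + Citizen_p +
    EducationLevel_x + Married_y + EducationLevel_cc + EducationLevel_ff +
    Employed_Unemployed + Married_u + Ethnicity_bb + Ethnicity_n +
    EducationLevel_w

# Response first, then the 13 selected predictors.
keep <- all.vars(sel_formula)

# Hand-typed list used in the notebook.
manual <- c("Approved", "PriorDefault_No", "CreditScore", "Income", "Citizen_p",
            "EducationLevel_x", "Married_y", "EducationLevel_cc",
            "EducationLevel_ff", "Employed_Unemployed", "Married_u",
            "EducationLevel_w", "Ethnicity_n", "Ethnicity_h")

setdiff(manual, keep)  # typed by hand but not in the formula: "Ethnicity_h"
setdiff(keep, manual)  # in the formula but not typed: "Ethnicity_bb"

# With the notebook's data frame, the selection would then be:
# df <- df[keep]
```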

3. Split the dataset, taking the first 590 instances as train and the last 100 as test.

X <- data.matrix(subset(df, select= - Approved))
Y <- as.double(as.matrix(df$Approved))

# TRAIN: first 590 rows (R indexing starts at 1)
X_Train <- X[1:590,]
Y_Train <- Y[1:590]

# TEST
X_Test <- X[591:nrow(X), ]
Y_Test <- Y[591:length(Y)]
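
The same positional split can be written with explicit index vectors, which makes the sizes easy to verify. This is a minimal sketch on a placeholder matrix with 690 rows; `X_demo`, `Y_demo` and the index names are assumptions for the example.

```r
# Placeholder data with the same number of rows as the dataset (690).
X_demo <- matrix(seq_len(690 * 2), nrow = 690)
Y_demo <- rep(c(0, 1), length.out = 690)

n_train   <- 590
train_idx <- seq_len(n_train)                  # rows 1..590
test_idx  <- seq(n_train + 1, nrow(X_demo))    # rows 591..690

X_tr <- X_demo[train_idx, , drop = FALSE]
X_te <- X_demo[test_idx, , drop = FALSE]
Y_tr <- Y_demo[train_idx]
Y_te <- Y_demo[test_idx]

c(train = nrow(X_tr), test = nrow(X_te))

# R vectors are 1-indexed and an index of 0 is silently dropped,
# so 0:590 selects the same rows as 1:590.
identical(X_demo[0:590, ], X_demo[1:590, ])    # TRUE
```

Because the split is positional rather than random, the class balance of the test rows depends on how the file is ordered.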

4. Train a logistic regression model with Ridge and Lasso regularization on train, selecting the one with the best AUC. Report the metrics on test.

This is a binary classification problem (approve the credit or not), so we build a logistic regression model.

The model should predict approval as accurately as possible while keeping false positives low, since each false positive means the bank grants a credit it should not have and loses money. For that reason we use the area under the ROC curve (AUC) as the selection metric.

The ROC curve plots the false positive rate (x-axis) against the true positive rate (y-axis) for candidate thresholds between 0.0 and 1.0. The area under it summarizes how well the model ranks approved applications above rejected ones across all thresholds, so a high AUC indicates that some threshold gives good predictions with few false positives.
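
To make the definition concrete, the following base-R sketch computes ROC points and the AUC for a handful of made-up scores and labels (the toy values and the helper names `roc_points` and `auc_rank` are assumptions for illustration). The AUC is obtained from the rank-sum identity, which equals the fraction of positive/negative pairs in which the positive case has the higher score.

```r
# One (FPR, TPR) point per distinct threshold, from strictest to loosest.
roc_points <- function(score, label) {
  thr <- sort(unique(score), decreasing = TRUE)
  tpr <- sapply(thr, function(t) mean(score[label == 1] >= t))
  fpr <- sapply(thr, function(t) mean(score[label == 0] >= t))
  data.frame(threshold = thr, fpr = fpr, tpr = tpr)
}

# AUC through the rank-sum (Mann-Whitney) identity; average ranks handle ties.
auc_rank <- function(score, label) {
  r  <- rank(score)
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Toy scores and labels (made up for the example).
score <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2)
label <- c(1,   1,   0,   1,   0,    1,   0,   0)

roc <- roc_points(score, label)
roc
auc_rank(score, label)   # 13 of 16 positive/negative pairs are ordered correctly: 0.8125
```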

To improve results we also apply regularization. Both Lasso and Ridge are fitted through the elastic-net model in glmnet, with alpha=1 for Lasso and alpha=0 for Ridge.
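
For reference, the objective that glmnet minimizes for a binomial model can be written as below, following the package documentation; here $N$ is the number of training rows, $\lambda$ is the penalty strength chosen by cross-validation and $\alpha$ is the mixing parameter passed as `alpha`.

```latex
\min_{\beta_0,\,\beta}\;
  -\frac{1}{N}\sum_{i=1}^{N}
    \Bigl[\, y_i\,(\beta_0 + x_i^{\top}\beta)
          - \log\bigl(1 + e^{\beta_0 + x_i^{\top}\beta}\bigr) \Bigr]
  \;+\; \lambda \Bigl[ \tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^2
                      + \alpha\,\lVert\beta\rVert_1 \Bigr]
```

With $\alpha = 0$ only the squared $\ell_2$ term remains (Ridge), and with $\alpha = 1$ only the $\ell_1$ term remains (Lasso).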

RIDGE MODEL

cv.ridge <- cv.glmnet(X_Train, Y_Train, family='binomial', alpha=0, parallel=TRUE, standardize=TRUE, type.measure='auc')
Warning: executing %dopar% sequentially: no parallel backend registered
plot(cv.ridge)


coef(cv.ridge, s=cv.ridge$lambda.min)
14 x 1 sparse Matrix of class "dgCMatrix"
                            s1
(Intercept)         -1.0025844
PriorDefault_No      1.8342115
CreditScore          0.3198329
Income               0.2120270
Citizen_p            0.9935918
EducationLevel_x     0.8695703
Married_y           -0.2839611
EducationLevel_cc    0.7100560
EducationLevel_ff   -0.6761165
Employed_Unemployed -0.7201874
Married_u            0.1999485
EducationLevel_w     0.2572430
Ethnicity_n          1.2953468
Ethnicity_h          0.4203607

LASSO MODEL

cv.lasso <- cv.glmnet(X_Train, Y_Train, family='binomial', alpha=1, parallel=TRUE, standardize=TRUE, type.measure='auc')

plot(cv.lasso)

coef(cv.lasso, s=cv.lasso$lambda.min)
14 x 1 sparse Matrix of class "dgCMatrix"
                              s1
(Intercept)         -0.006191541
PriorDefault_No      3.527374925
CreditScore          0.576209872
Income               2.214498560
Citizen_p            3.067028183
EducationLevel_x     2.000294335
Married_y           -2.620905366
EducationLevel_cc    1.542175224
EducationLevel_ff   -1.044945386
Employed_Unemployed -0.727078271
Married_u           -1.884715933
EducationLevel_w     0.603223164
Ethnicity_n          3.621548274
Ethnicity_h          0.584561098

Lasso vs. Ridge model comparison

Ridge AUC (cross-validated)

max(cv.ridge$cvm)
[1] 0.9269096

Lasso AUC (cross-validated)

max(cv.lasso$cvm)
[1] 0.9256555
max(cv.ridge$cvm) - max(cv.lasso$cvm)
[1] 0.001254151

The two cross-validated AUC values are nearly identical (a difference of about 0.001), with Ridge marginally ahead.

TEST

The logistic regression model with Ridge regularization is now evaluated on the test set to see how useful it is:

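# Note: predict.glmnet returns the linear predictor (log-odds), so the 0.5 cutoff is applied on the log-odds scale
y_pred <- as.numeric(predict.glmnet(cv.ridge$glmnet.fit, newx=X_Test, s=cv.ridge$lambda.min)>.5)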

y_pred
  [1] 1 1 0 0 0 0 1 1 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [44] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [87] 0 0 0 0 0 0 0 0 0 0 0 0 0 0
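Since the prediction call above compares the linear predictor with 0.5, it may help to see what that cutoff means on the probability scale. The sketch below uses only base R; the commented-out line shows one way to request probabilities directly and is an alternative, not the call that produced the output above.

```r
# A cutoff on the log-odds (link) scale maps to a probability through the logistic function
logit_cutoff <- 0.5
prob_cutoff  <- plogis(logit_cutoff)   # 1 / (1 + exp(-0.5))
print(prob_cutoff)

# Conversely, a probability cutoff of 0.5 corresponds to a log-odds cutoff of 0
logit_for_half <- qlogis(0.5)
print(logit_for_half)

# Alternative (requires the fitted cv.ridge object and glmnet loaded):
# p_hat  <- predict(cv.ridge, newx = X_Test, s = "lambda.min", type = "response")
# y_pred <- as.numeric(p_hat > 0.5)
```

In other words, the rule used above approves an application only when the estimated approval probability exceeds roughly 0.62, which is stricter than a 0.5 probability cutoff.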

Next, a confusion matrix is built to compare the actual outcome with the predicted outcome:

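# Note: caret expects confusionMatrix(data = predictions, reference = truth); here Y_Test is passed first,
# so the rows labelled "Prediction" in the output hold the actual classes and the columns the predicted ones
conf_matrix <- confusionMatrix(as.factor(Y_Test), as.factor(y_pred), mode="everything", positive = "0")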
conf_matrix
Confusion Matrix and Statistics

          Reference
Prediction  0  1
         0 85  1
         1  8  6
                                        
               Accuracy : 0.91          
                 95% CI : (0.836, 0.958)
    No Information Rate : 0.93          
    P-Value [Acc > NIR] : 0.8380        
                                        
                  Kappa : 0.5273        
                                        
 Mcnemar's Test P-Value : 0.0455        
                                        
            Sensitivity : 0.9140        
            Specificity : 0.8571        
         Pos Pred Value : 0.9884        
         Neg Pred Value : 0.4286        
              Precision : 0.9884        
                 Recall : 0.9140        
                     F1 : 0.9497        
             Prevalence : 0.9300        
         Detection Rate : 0.8500        
   Detection Prevalence : 0.8600        
      Balanced Accuracy : 0.8856        
                                        
       'Positive' Class : 0             
                                        

The model reaches an Accuracy of 91%, a Recall of 91.40%, an F1 of 94.97% and a Precision of 98.84%, all computed with class 0 (credit denied) as the positive class.
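These figures can be checked by hand from the four cell counts of the confusion matrix printed above. The sketch below follows the labels of that output as printed, with class 0 as the positive class; the variable names are illustrative.

```r
# Cell counts read from the printed confusion matrix, with class 0 as 'positive'
tp <- 85  # Prediction 0, Reference 0
fp <- 1   # Prediction 0, Reference 1
fn <- 8   # Prediction 1, Reference 0
tn <- 6   # Prediction 1, Reference 1

accuracy  <- (tp + tn) / (tp + fp + fn + tn)
precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)
f1        <- 2 * precision * recall / (precision + recall)

metrics <- round(c(Accuracy = accuracy, Precision = precision, Recall = recall, F1 = f1), 4)
print(metrics)
```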

cTab    <- table(Y_Test, y_pred)    # Confusion Matrix
addmargins(cTab)
      y_pred
Y_Test   0   1 Sum
   0    85   1  86
   1     8   6  14
   Sum  93   7 100

In this confusion matrix there was only one false positive out of 100 predictions; 6 applications were correctly approved and 85 were correctly denied. There were also 8 false negatives.

5. Report the log odds of the predictor variables with respect to the target variable.

Variables with the most influence on our model:

coef(cv.ridge, s=cv.ridge$lambda.min)
14 x 1 sparse Matrix of class "dgCMatrix"
                            s1
(Intercept)         -1.0025844
PriorDefault_No      1.8342115
CreditScore          0.3198329
Income               0.2120270
Citizen_p            0.9935918
EducationLevel_x     0.8695703
Married_y           -0.2839611
EducationLevel_cc    0.7100560
EducationLevel_ff   -0.6761165
Employed_Unemployed -0.7201874
Married_u            0.1999485
EducationLevel_w     0.2572430
Ethnicity_n          1.2953468
Ethnicity_h          0.4203607

The following variables have the largest positive coefficients: PriorDefault_No, Ethnicity_n and Citizen_p. In contrast, having “EducationLevel_ff” and being unemployed (“Employed_Unemployed”) have the largest negative impact on credit approval.

Odds ratio table (exponentiated log odds):

exp(coef(cv.ridge, s=cv.ridge$lambda.min))
14 x 1 Matrix of class "dgeMatrix"
                           s1
(Intercept)         0.3669299
PriorDefault_No     6.2601958
CreditScore         1.3768976
Income              1.2361812
Citizen_p           2.7009182
EducationLevel_x    2.3858854
Married_y           0.7527960
EducationLevel_cc   2.0341053
EducationLevel_ff   0.5085883
Employed_Unemployed 0.4866611
Married_u           1.2213398
EducationLevel_w    1.2933594
Ethnicity_n         3.6522624
Ethnicity_h         1.5225107

Conclusions:

The most influential factor is PriorDefault_No, which multiplies the odds of obtaining a loan by about 6.26 (an increase of roughly 526%), followed by Ethnicity_n, which multiplies them by about 3.65 (roughly 265%). The variables with the strongest negative effect are Employed_Unemployed and EducationLevel_ff, which reduce the odds by about 51.3% and 49.1% respectively.
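The step from a coefficient to a percentage change can be sketched in base R. The four coefficients are copied from the Ridge output above; `shift_probability` is a hypothetical helper, and the baseline probability of 0.5 is an arbitrary illustration, since an odds ratio changes the probability by a different amount depending on the starting point.

```r
# Ridge coefficients (log odds) copied from the coef() output above
coefs <- c(PriorDefault_No     =  1.8342115,
           Ethnicity_n         =  1.2953468,
           EducationLevel_ff   = -0.6761165,
           Employed_Unemployed = -0.7201874)

odds_ratio      <- exp(coefs)               # multiplicative change in the odds
pct_change_odds <- (odds_ratio - 1) * 100   # same change expressed as a percentage
print(round(pct_change_odds, 1))

# Hypothetical helper: apply an odds ratio to a baseline probability p
shift_probability <- function(p, odds_ratio) {
  odds <- p / (1 - p) * odds_ratio
  odds / (1 + odds)
}

# Illustration with an arbitrary baseline probability of 0.5
p_new <- shift_probability(0.5, odds_ratio[["PriorDefault_No"]])
print(p_new)
```

The percentages describe changes in the odds, not in the probability itself; starting from a probability of 0.5, the PriorDefault_No odds ratio moves it to about 0.86.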

6. If we gain 100 € for each true positive and lose 20 € for each false positive, what monetary value will the model generate, given the confusion matrix of the model with the highest AUC (using the test-set metrics)?

sensibilidad <- round(conf_matrix$byClass["Sensitivity"], 3)
especificidad <- round(conf_matrix$byClass["Specificity"], 3)
rent_esp <- sensibilidad*100 - especificidad*20
rent_esp
Sensitivity 
      74.26 

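The expected return computed this way is 74.26 € per case. Note that this figure weights the sensitivity and specificity rates reported for class 0, rather than counting true and false positives.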
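As a cross-check, the monetary value can also be computed from the raw counts in the test confusion table shown earlier (`cTab`). This sketch assumes that class 1 (credit approved) is the positive class, so a true positive is a correctly approved application and a false positive is an approval that should have been denied. The table is rebuilt by hand here so the block runs on its own.

```r
# Test confusion table rebuilt from the counts printed earlier (rows: actual, columns: predicted)
cTab <- matrix(c(85, 8, 1, 6), nrow = 2,
               dimnames = list(Y_Test = c("0", "1"), y_pred = c("0", "1")))

gain_tp <- 100  # euros gained per true positive
loss_fp <- 20   # euros lost per false positive

tp <- cTab["1", "1"]  # approved and should have been approved
fp <- cTab["0", "1"]  # approved but should have been denied

total_value    <- tp * gain_tp - fp * loss_fp
value_per_case <- total_value / sum(cTab)
print(total_value)
print(value_per_case)
```

Under these assumptions the model generates 580 € over the 100 test cases, or 5.80 € per case.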

