We will use the bank credit approval dataset from https://archive.ics.uci.edu/ml/datasets/Credit+Approval . The data can also be loaded from the content folder as crx.data. The dataset description is at https://archive.ics.uci.edu/ml/machine-learning-databases/credit-screening/crx.names and reads as follows:
1. Title: Credit Approval
2. Sources:
(confidential)
Submitted by quinlan@cs.su.oz.au
3. Past Usage:
See Quinlan,
* "Simplifying decision trees", Int J Man-Machine Studies 27,
Dec 1987, pp. 221-234.
* "C4.5: Programs for Machine Learning", Morgan Kaufmann, Oct 1992
4. Relevant Information:
This file concerns credit card applications. All attribute names
and values have been changed to meaningless symbols to protect
confidentiality of the data.
This dataset is interesting because there is a good mix of
attributes -- continuous, nominal with small numbers of
values, and nominal with larger numbers of values. There
are also a few missing values.
5. Number of Instances: 690
6. Number of Attributes: 15 + class attribute
7. Attribute Information:
A1: b, a.
A2: continuous.
A3: continuous.
A4: u, y, l, t.
A5: g, p, gg.
A6: c, d, cc, i, j, k, m, r, q, w, x, e, aa, ff.
A7: v, h, bb, j, n, z, dd, ff, o.
A8: continuous.
A9: t, f.
A10: t, f.
A11: continuous.
A12: t, f.
A13: g, p, s.
A14: continuous.
A15: continuous.
A16: +,- (class attribute)
8. Missing Attribute Values:
37 cases (5%) have one or more missing values. The missing
values from particular attributes are:
A1: 12
A2: 12
A4: 6
A5: 6
A6: 9
A7: 9
A14: 13
9. Class Distribution
+: 307 (44.5%)
-: 383 (55.5%)
Load the data. Visually inspect, variable by variable, how credit approval is distributed as a function of each attribute. Note any relevant observations. Which variables separate the data best?
Prepare the dataset appropriately and impute the missing values using the missForest library.
Split the dataset, taking the first 590 instances as train and the last 100 as test.
Train a logistic regression model with Ridge and Lasso regularization on train, selecting the one with the best AUC. Report the metrics on test.
Report the log odds of the predictor variables on the target variable.
If we earn 100 € for each true positive and lose 20 € for each false positive, what monetary value will the model generate, given the confusion matrix of the model with the highest AUC (using the test metrics)?
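The monetary question in the last task reduces to simple arithmetic on two cells of the confusion matrix. The following is a minimal sketch in base R; the helper `model_value` and the tiny label vectors are hypothetical and purely illustrative, not results from this analysis.

```r
# Hypothetical helper: net value given true-positive and false-positive counts.
model_value <- function(tp, fp, gain_tp = 100, loss_fp = 20) {
  tp * gain_tp - fp * loss_fp
}

# Illustrative labels only (NOT output of the model in this document).
actual    <- factor(c(1, 1, 1, 0, 0, 0, 1, 0), levels = c(0, 1))
predicted <- factor(c(1, 1, 0, 1, 0, 0, 1, 0), levels = c(0, 1))
cm <- table(Predicted = predicted, Actual = actual)

tp <- cm["1", "1"]  # predicted approved and actually approved
fp <- cm["1", "0"]  # predicted approved but actually rejected
value <- model_value(tp, fp)
value
```

With real predictions, `actual` and `predicted` would be the test labels and the thresholded predictions of the selected model.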
library(dplyr)
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
library(tidyverse)
Registered S3 methods overwritten by 'dbplyr':
method from
print.tbl_lazy
print.tbl_sql
── Attaching packages ────────────────────────────────────────────────── tidyverse 1.3.1 ──
✓ ggplot2 3.3.5 ✓ purrr 0.3.4
✓ tibble 3.1.6 ✓ stringr 1.4.0
✓ tidyr 1.2.0 ✓ forcats 0.5.1
✓ readr 2.1.1
── Conflicts ───────────────────────────────────────────────────── tidyverse_conflicts() ──
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()
library(ggplot2)
library(fastDummies)
library(missForest)
Loading required package: randomForest
randomForest 4.6-14
Type rfNews() to see new features/changes/bug fixes.
Attaching package: ‘randomForest’
The following object is masked from ‘package:ggplot2’:
margin
The following object is masked from ‘package:dplyr’:
combine
Loading required package: foreach
Attaching package: ‘foreach’
The following objects are masked from ‘package:purrr’:
accumulate, when
Loading required package: itertools
Loading required package: iterators
library(corrplot)
corrplot 0.92 loaded
library(glmnet)
Loading required package: Matrix
Attaching package: ‘Matrix’
The following objects are masked from ‘package:tidyr’:
expand, pack, unpack
Loaded glmnet 4.1-3
library(caret)
Loading required package: lattice
Registered S3 method overwritten by 'data.table':
method from
print.data.table
Attaching package: ‘caret’
The following object is masked from ‘package:purrr’:
lift
library(lattice)
library(e1071)
library(MASS)
Attaching package: ‘MASS’
The following object is masked from ‘package:dplyr’:
select
library(PerformanceAnalytics)
Loading required package: xts
Loading required package: zoo
Attaching package: ‘zoo’
The following objects are masked from ‘package:base’:
as.Date, as.Date.numeric
Attaching package: ‘xts’
The following objects are masked from ‘package:dplyr’:
first, last
Attaching package: ‘PerformanceAnalytics’
The following objects are masked from ‘package:e1071’:
kurtosis, skewness
The following object is masked from ‘package:graphics’:
legend
url <- 'https://archive.ics.uci.edu/ml/machine-learning-databases/credit-screening/crx.data'
data <- read.csv(url, sep = ",", header = F)
We inspect the variables and name each one following the literature (Khaneja, Deepesh. (2017). Credit Approval Analysis using R.). We also convert the target variable Approved to binary.
colnames(data) = c("Male", "Age", "Debt", "Married", "BankCustomer", "EducationLevel", "Ethnicity", "YearsEmployed", "PriorDefault", "Employed", "CreditScore", "DriversLicense", "Citizen", "ZipCode", "Income", "Approved")
head(data)
str(data)
'data.frame': 690 obs. of 16 variables:
$ Male : chr "b" "a" "a" "b" ...
$ Age : chr "30.83" "58.67" "24.50" "27.83" ...
$ Debt : num 0 4.46 0.5 1.54 5.62 ...
$ Married : chr "u" "u" "u" "u" ...
$ BankCustomer : chr "g" "g" "g" "g" ...
$ EducationLevel: chr "w" "q" "q" "w" ...
$ Ethnicity : chr "v" "h" "h" "v" ...
$ YearsEmployed : num 1.25 3.04 1.5 3.75 1.71 ...
$ PriorDefault : chr "t" "t" "t" "t" ...
$ Employed : chr "t" "t" "f" "t" ...
$ CreditScore : int 1 6 0 5 0 0 0 0 0 0 ...
$ DriversLicense: chr "f" "f" "f" "t" ...
$ Citizen : chr "g" "g" "g" "g" ...
$ ZipCode : chr "00202" "00043" "00280" "00100" ...
$ Income : int 0 560 824 3 0 0 31285 1349 314 1442 ...
$ Approved : chr "+" "+" "+" "+" ...
summary(data)
Male Age Debt Married
Length:690 Length:690 Min. : 0.000 Length:690
Class :character Class :character 1st Qu.: 1.000 Class :character
Mode :character Mode :character Median : 2.750 Mode :character
Mean : 4.759
3rd Qu.: 7.207
Max. :28.000
BankCustomer EducationLevel Ethnicity YearsEmployed
Length:690 Length:690 Length:690 Min. : 0.000
Class :character Class :character Class :character 1st Qu.: 0.165
Mode :character Mode :character Mode :character Median : 1.000
Mean : 2.223
3rd Qu.: 2.625
Max. :28.500
PriorDefault Employed CreditScore DriversLicense
Length:690 Length:690 Min. : 0.0 Length:690
Class :character Class :character 1st Qu.: 0.0 Class :character
Mode :character Mode :character Median : 0.0 Mode :character
Mean : 2.4
3rd Qu.: 3.0
Max. :67.0
Citizen ZipCode Income Approved
Length:690 Length:690 Min. : 0.0 Length:690
Class :character Class :character 1st Qu.: 0.0 Class :character
Mode :character Mode :character Median : 5.0 Mode :character
Mean : 1017.4
3rd Qu.: 395.5
Max. :100000.0
data <- data %>%
mutate(Approved = recode(Approved,
"+" = "1",
"-" = "0"))
Next we visually inspect each variable against the credit approval variable ("Approved").
# Plot every predictor against the target: stacked bars for categorical
# (factor or character) columns, box plots for numeric columns.
explain.target <- function(dataframe.object, target.feature){
  target <- as.factor(target.feature)
  for (columna in setdiff(names(dataframe.object), "Approved")){
    valores <- dataframe.object[[columna]]
    if (is.factor(valores) || is.character(valores)){
      p <- ggplot(dataframe.object) +
        geom_bar(aes(x = valores, fill = target)) +
        labs(title = paste(columna, "- Approved"), x = columna, y = "Frequency") +
        scale_fill_discrete(name = "Credit Approved", breaks = c("0", "1"),
                            labels = c("NO", "YES"))
    } else {
      p <- ggplot(dataframe.object) +
        geom_boxplot(aes(x = valores, fill = target)) +
        coord_flip() +
        labs(title = paste(columna, "- Approved"), x = columna) +
        scale_fill_discrete(name = "Approved", breaks = c("0", "1"),
                            labels = c("NO", "YES"))
    }
    # Print inside the loop so each plot is rendered before the next iteration.
    print(p)
  }
}
explain.target(dataframe.object = data, target.feature = data$Approved)
The observations fall into two groups:
Continuous variables: they are distributed similarly in all cases. We will revisit this later, however, because for CreditScore the outliers hide any differences.
Discrete variables: there are missing values ("?"), which will be converted to NA and imputed. "Married", "BankCustomer" and "Citizen" each have values for which credit is always granted, so they look useful for separating the data. For PriorDefault, the value "t" has more approved credits, while the value "f" shows the opposite.
missForest
Some variables, such as Male, Married, BankCustomer, EducationLevel and Ethnicity, contain values coded as "?". These values are converted to nulls (NA) across the whole dataset.
data[data == "?"] <- NA
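As an alternative to replacing "?" after loading, the marker can be declared at read time so that numeric columns such as Age are parsed as numbers from the start. This is a minimal sketch on three made-up rows in the same comma-separated layout; it assumes the marker appears exactly as `?` with no surrounding whitespace, and the four column names are only for illustration.

```r
# Three illustrative rows, not real records from crx.data.
raw <- "b,30.83,0,+\na,?,4.46,-\n?,24.50,0.5,+"

toy <- read.csv(text = raw, header = FALSE, na.strings = "?",
                col.names = c("Male", "Age", "Debt", "Approved"))

str(toy)               # Age is numeric because "?" was read as NA
colSums(is.na(toy))    # missing values per column
```

The same `na.strings = "?"` argument could be passed to the `read.csv(url, ...)` call used earlier.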
Now we prepare the dataset and impute the missing values using the missForest library.
sapply(data, function(x) sum(is.na(x))); sum(sapply(data, function(x) sum(is.na(x))))
Male Age Debt Married BankCustomer EducationLevel
12 12 0 6 6 9
Ethnicity YearsEmployed PriorDefault Employed CreditScore DriversLicense
9 0 0 0 0 0
Citizen ZipCode Income Approved
0 13 0 0
[1] 67
The chr variables are converted to factors so that missForest can be applied.
data <- type.convert(data, as.is=FALSE)
data.i <- missForest(as.data.frame(data))
missForest iteration 1 in progress...done!
missForest iteration 2 in progress...done!
missForest iteration 3 in progress...done!
data <- data.i$ximp
We check that the null values are gone.
sapply(data, function(x) sum(is.na(x)))
Male Age Debt Married BankCustomer EducationLevel
0 0 0 0 0 0
Ethnicity YearsEmployed PriorDefault Employed CreditScore DriversLicense
0 0 0 0 0 0
Citizen ZipCode Income Approved
0 0 0 0
summary(data)
Male Age Debt Married BankCustomer EducationLevel
a:213 Min. :13.75 Min. : 0.000 l: 2 g :525 c :137
b:477 1st Qu.:22.67 1st Qu.: 1.000 u:525 gg: 2 q : 78
Median :28.58 Median : 2.750 y:163 p :163 w : 64
Mean :31.59 Mean : 4.759 i : 63
3rd Qu.:38.23 3rd Qu.: 7.207 aa : 54
Max. :80.25 Max. :28.000 ff : 54
(Other):240
Ethnicity YearsEmployed PriorDefault Employed CreditScore DriversLicense
v :400 Min. : 0.000 f:329 f:395 Min. : 0.0 f:374
h :138 1st Qu.: 0.165 t:361 t:295 1st Qu.: 0.0 t:316
bb : 63 Median : 1.000 Median : 0.0
ff : 58 Mean : 2.223 Mean : 2.4
j : 9 3rd Qu.: 2.625 3rd Qu.: 3.0
z : 9 Max. :28.500 Max. :67.0
(Other): 13
Citizen ZipCode Income Approved
g:625 Min. : 0.0 Min. : 0.0 Min. :0.0000
p: 8 1st Qu.: 80.0 1st Qu.: 0.0 1st Qu.:0.0000
s: 57 Median : 160.0 Median : 5.0 Median :0.0000
Mean : 183.7 Mean : 1017.4 Mean :0.4449
3rd Qu.: 272.0 3rd Qu.: 395.5 3rd Qu.:1.0000
Max. :2000.0 Max. :100000.0 Max. :1.0000
ZipCode has 183 distinct values, which are categorical codes rather than numeric quantities, so we drop this variable before continuing with the analysis.
unique(data$ZipCode)
[1] 202.00000 43.00000 280.00000 100.00000 120.00000 360.00000 164.00000
[8] 80.00000 180.00000 52.00000 128.00000 260.00000 0.00000 320.00000
[15] 396.00000 96.00000 200.00000 300.00000 145.00000 500.00000 168.00000
[22] 434.00000 583.00000 30.00000 240.00000 70.00000 455.00000 311.00000
[29] 216.00000 491.00000 400.00000 239.00000 160.00000 711.00000 250.00000
[36] 520.00000 515.00000 420.00000 266.95667 980.00000 443.00000 140.00000
[43] 94.00000 368.00000 288.00000 928.00000 188.00000 112.00000 171.00000
[50] 268.00000 167.00000 75.00000 152.00000 176.00000 329.00000 212.00000
[57] 410.00000 274.00000 375.00000 408.00000 350.00000 204.00000 40.00000
[64] 181.00000 399.00000 440.00000 93.00000 60.00000 395.00000 393.00000
[71] 21.00000 29.00000 102.00000 431.00000 370.00000 24.00000 20.00000
[78] 129.00000 510.00000 195.00000 144.00000 380.00000 149.66111 49.00000
[85] 50.00000 109.27971 381.00000 150.00000 117.00000 56.00000 211.00000
[92] 230.00000 156.00000 22.00000 228.00000 519.00000 253.00000 487.00000
[99] 220.00000 91.02667 88.00000 73.00000 121.00000 470.00000 136.00000
[106] 132.00000 292.00000 154.00000 272.00000 216.17571 340.00000 92.32067
[113] 108.00000 720.00000 450.00000 232.00000 170.00000 1160.00000 411.00000
[120] 189.66657 460.00000 348.00000 480.00000 640.00000 372.00000 276.00000
[127] 221.00000 352.00000 141.00000 178.00000 600.00000 550.00000 207.39714
[134] 2000.00000 225.00000 210.00000 110.00000 356.00000 45.00000 62.00000
[141] 92.00000 174.00000 17.00000 86.00000 82.99895 454.00000 201.13571
[148] 254.00000 28.00000 263.00000 333.00000 312.00000 290.00000 371.00000
[155] 99.00000 252.00000 760.00000 560.00000 130.00000 523.00000 680.00000
[162] 163.00000 208.00000 383.00000 330.00000 422.00000 840.00000 432.00000
[169] 32.00000 186.00000 303.00000 184.46190 349.00000 224.00000 369.00000
[176] 140.25905 231.42357 76.00000 231.00000 309.00000 416.00000 465.00000
[183] 256.00000
data = subset(data, select = -ZipCode)
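The fractional values in the list above (for example 266.95667) appear because the zip codes were converted to numbers and then imputed as if they were a continuous quantity. The sketch below, on a handful of codes in the same format, shows how type conversion drops the leading zeros and how a factor keeps the codes as labels; the vector is illustrative only.

```r
zip_raw <- c("00202", "00043", "00280", NA, "00100")

# Numeric conversion: leading zeros are lost and the codes become magnitudes.
zip_num <- type.convert(zip_raw, as.is = TRUE)
zip_num

# Categorical treatment: each code is just a label.
zip_fac <- factor(zip_raw)
nlevels(zip_fac)
```

With about 183 levels on 690 rows, a factor would still be too sparse to be useful, which supports dropping the variable.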
The variables Male, PriorDefault, Employed and DriversLicense are converted to binary factors.
data <- data %>%
mutate(Male = recode(Male,
"a"="1",
"b"="0",))
data$PriorDefault <- as.factor(data$PriorDefault)
data <- data %>%
mutate(PriorDefault = recode(PriorDefault,
"t"="No",
"f"="Yes"))
data$Employed <- as.factor(data$Employed)
data <- data %>%
mutate(Employed = recode(Employed,
"t"="Employed",
"f"="Unemployed"))
data$DriversLicense <- as.factor(data$DriversLicense)
data <- data %>%
mutate(DriversLicense = recode(DriversLicense,
"t"="1",
"f"="0"))
data$Approved <- as.character(data$Approved)
str(data)
'data.frame': 690 obs. of 15 variables:
$ Male : Factor w/ 2 levels "1","0": 2 1 1 2 2 2 2 1 2 2 ...
$ Age : num 30.8 58.7 24.5 27.8 20.2 ...
$ Debt : num 0 4.46 0.5 1.54 5.62 ...
$ Married : Factor w/ 3 levels "l","u","y": 2 2 2 2 2 2 2 2 3 3 ...
$ BankCustomer : Factor w/ 3 levels "g","gg","p": 1 1 1 1 1 1 1 1 3 3 ...
$ EducationLevel: Factor w/ 14 levels "aa","c","cc",..: 13 11 11 13 13 10 12 3 9 13 ...
$ Ethnicity : Factor w/ 9 levels "bb","dd","ff",..: 8 4 4 8 8 8 4 8 4 8 ...
$ YearsEmployed : num 1.25 3.04 1.5 3.75 1.71 ...
$ PriorDefault : Factor w/ 2 levels "Yes","No": 2 2 2 2 2 2 2 2 2 2 ...
$ Employed : Factor w/ 2 levels "Unemployed","Employed": 2 2 1 2 1 1 1 1 1 1 ...
$ CreditScore : int 1 6 0 5 0 0 0 0 0 0 ...
$ DriversLicense: Factor w/ 2 levels "0","1": 1 1 1 2 1 2 2 1 1 2 ...
$ Citizen : Factor w/ 3 levels "g","p","s": 1 1 1 1 3 1 1 1 1 1 ...
$ Income : int 0 560 824 3 0 0 31285 1349 314 1442 ...
$ Approved : chr "1" "1" "1" "1" ...
summary(data)
Male Age Debt Married BankCustomer EducationLevel
1:213 Min. :13.75 Min. : 0.000 l: 2 g :525 c :137
0:477 1st Qu.:22.67 1st Qu.: 1.000 u:525 gg: 2 q : 78
Median :28.58 Median : 2.750 y:163 p :163 w : 64
Mean :31.59 Mean : 4.759 i : 63
3rd Qu.:38.23 3rd Qu.: 7.207 aa : 54
Max. :80.25 Max. :28.000 ff : 54
(Other):240
Ethnicity YearsEmployed PriorDefault Employed CreditScore
v :400 Min. : 0.000 Yes:329 Unemployed:395 Min. : 0.0
h :138 1st Qu.: 0.165 No :361 Employed :295 1st Qu.: 0.0
bb : 63 Median : 1.000 Median : 0.0
ff : 58 Mean : 2.223 Mean : 2.4
j : 9 3rd Qu.: 2.625 3rd Qu.: 3.0
z : 9 Max. :28.500 Max. :67.0
(Other): 13
DriversLicense Citizen Income Approved
0:374 g:625 Min. : 0.0 Length:690
1:316 p: 8 1st Qu.: 0.0 Class :character
s: 57 Median : 5.0 Mode :character
Mean : 1017.4
3rd Qu.: 395.5
Max. :100000.0
ggplot(data = data, aes(x = Male, fill = Approved)) +
geom_bar(position = "fill") +
labs(y = "Rate", x = 'Male') + ggtitle('Male vs Approved')
Males appear to have a higher proportion of approvals than females, but the difference between the two rates does not look significant. We will examine later whether this affects credit approval.
ggplot(data = data, aes(x = Married, fill = Approved)) +
geom_bar(position = "fill") +
labs(y = "Rate", x = 'Married') + ggtitle('Married vs Approved')
Here there is a clear relationship between a person's marital status and the chance of obtaining bank credit. Note that for marital status 'l' every application is approved; this may be because the sample is too small and everyone with that marital status happened to get the loan. We check this as follows:
data %>%
group_by(Married) %>%
count()
Only two people are classified as 'l' in the Married variable, which explains the anomaly of a 100% approval rate in this case.
ggplot(data = data, aes(x = BankCustomer, fill = Approved)) +
geom_bar(position = "fill") +
labs(y = "Rate", x = 'Bank Customer') + ggtitle('Bank Customer vs Approved')
Here we see an association between bank customer status and the credit approval rate. Again, category 'gg' shows a 100% approval rate, so we check the sample size:
data %>%
group_by(BankCustomer) %>%
count()
Again only two people fall in this category, and both obtained the loan, which explains the 100% approval rate.
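Reading the rate and the group size side by side makes this kind of small-sample artifact easy to spot. The following is a minimal base R sketch on made-up data; the data frame `toy` and the table `summary_tbl` are hypothetical names and the counts are not taken from the dataset.

```r
# Illustrative data: 'gg' has only two rows, both approved.
toy <- data.frame(
  BankCustomer = c("g", "g", "g", "g", "p", "p", "p", "gg", "gg"),
  Approved     = c("1", "0", "1", "0", "0", "0", "1", "1", "1")
)

counts <- table(toy$BankCustomer, toy$Approved)
rates  <- prop.table(counts, margin = 1)   # row-wise proportions

summary_tbl <- data.frame(
  category      = rownames(counts),
  n             = as.vector(rowSums(counts)),
  approval_rate = as.vector(rates[, "1"])
)
summary_tbl
```

A category with a rate of exactly 0 or 1 and a very small `n` is a candidate for merging or removal before modelling.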
ggplot(data = data, aes(x = EducationLevel, fill = Approved)) +
geom_bar(position = "fill") +
labs(y = "Rate", x = 'Education Level') + ggtitle('Education Level vs Approved')
Education level also appears to affect the target variable: levels "x" and "cc" have a higher approval rate than levels "ff" and "d".
ggplot(data = data, aes(x = Ethnicity, fill = Approved)) +
geom_bar(position = "fill") +
labs(y = "Rate", x = 'Ethnicity') + ggtitle('Ethnicity vs Approved')
A person's ethnicity apparently affects the probability of obtaining a loan: individuals labeled "ff" are less likely to obtain one than those labeled "z".
ggplot(data = data, aes(x = PriorDefault, fill = Approved)) +
geom_bar(position = "fill") +
labs(y = "Rate", x = 'Prior Default') + ggtitle('Prior Default vs Approved')
Customers who have previously defaulted on their payments clearly have very little chance of obtaining new credit.
ggplot(data = data, aes(x = Employed, fill = Approved)) +
geom_bar(position = "fill") +
labs(y = "Rate", x = 'Employed') + ggtitle('Employed vs Approved')
As one would expect, people with a job are more likely to obtain a loan.
ggplot(data = data, aes(x = DriversLicense, fill = Approved)) +
geom_bar(position = "fill") +
labs(y = "Rate", x = 'Drivers License') + ggtitle('Drivers License vs Approved')
Here there does not appear to be a relationship between the two variables.
ggplot(data = data, aes(x = Citizen, fill = Approved)) +
geom_bar(position = "fill") +
labs(y = "Rate", x = 'Citizenship') + ggtitle('Citizenship vs Approved')
There appears to be some relationship between these two variables.
To check whether each categorical variable is independent of the target variable, we run a chi-squared test at a 5% significance level (95% confidence). The following call prints each variable's name and the resulting p-value.
categoricVars <- data %>% dplyr::select(Male, Married, BankCustomer, EducationLevel,
Ethnicity, PriorDefault, Employed, DriversLicense,
Citizen)
sapply(categoricVars,
function(x) round(chisq.test(table(x, data$Approved))$p.value,2))
Warning in chisq.test(table(x, data$Approved)) :
Chi-squared approximation may be incorrect
Warning in chisq.test(table(x, data$Approved)) :
Chi-squared approximation may be incorrect
Warning in chisq.test(table(x, data$Approved)) :
Chi-squared approximation may be incorrect
Warning in chisq.test(table(x, data$Approved)) :
Chi-squared approximation may be incorrect
Warning in chisq.test(table(x, data$Approved)) :
Chi-squared approximation may be incorrect
Male Married BankCustomer EducationLevel Ethnicity PriorDefault
0.54 0.00 0.00 0.00 0.00 0.00
Employed DriversLicense Citizen
0.00 0.45 0.01
Married, BankCustomer, EducationLevel, Ethnicity, PriorDefault, Employed and Citizen are dependent on the target variable (p < 0.05), whereas for Male and DriversLicense independence cannot be rejected. We will therefore remove these last two variables from our model.
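The warning "Chi-squared approximation may be incorrect" is raised when some expected cell counts are small, as happens with categories that have only a couple of observations. The sketch below, on made-up counts, shows two standard alternatives available in base R: a Monte Carlo p-value and Fisher's exact test. The data frame is illustrative and is not the dataset analysed here.

```r
set.seed(1)  # the simulated p-value is random, so fix the seed

# Illustrative data with one very sparse category ('l', two rows).
toy <- data.frame(
  Married  = c(rep("u", 40), rep("y", 20), rep("l", 2)),
  Approved = c(rep(c("1", "0"), times = c(24, 16)),
               rep(c("1", "0"), times = c(6, 14)),
               "1", "1")
)
tab <- table(toy$Married, toy$Approved)

# Expected counts under independence; values below 5 trigger the warning.
expected <- suppressWarnings(chisq.test(tab))$expected
any(expected < 5)

# Alternatives that do not rely on the large-sample approximation.
p_mc     <- chisq.test(tab, simulate.p.value = TRUE, B = 2000)$p.value
p_fisher <- fisher.test(tab)$p.value
c(monte_carlo = p_mc, fisher = p_fisher)
```

If these p-values agree with the asymptotic ones, the conclusions above stand despite the warnings.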
data$Approved <- as.factor(data$Approved)
cdplot(data$Approved ~ data$Age, main = "Age vs Approved",
xlab = "Age", ylab = "Conditional Density" )
The plot shows that older applicants (around 60) are more likely to have their credit approved, although past the threshold of about 75 years the probability seems to drop sharply. For more detail we draw a box plot:
ggplot(data, aes(x= Approved, y= Age, fill= Approved)) +
geom_boxplot() +
labs(y = "Age", x = 'Approved') + ggtitle('Age vs Approved') +
scale_fill_brewer(palette = "Set2")
As seen in the previous plot, there appears to be some association between age and approval rate: older applicants may find it easier to obtain credit.
cdplot(data$Approved ~ data$Debt, main = "Debt vs Approved",
xlab = "Debt", ylab = "Conditional Density" )
The plot describes a relationship between debt and credit approval in which more debt goes with a higher chance of approval, although the curve seems to dip around 26 on the Debt axis before rising again.
ggplot(data, aes(x= Approved, y= Debt, fill= Approved)) +
geom_boxplot() +
labs(y = "Debt", x = 'Approved') +
ggtitle('Debt vs Approved') +
scale_fill_brewer(palette = "Set2")
The box plot seems to indicate the same pattern described above.
ggplot(data, aes(x= Approved, y= YearsEmployed, fill= Approved)) +
geom_boxplot() +
labs(y = "Years Employed", x = 'Approved') +
ggtitle('Years Employed vs Approved') +
scale_fill_brewer(palette = "Set2")
There appears to be a positive association between years employed and credit approval.
ggplot(data, aes(x= Approved, y= CreditScore, fill= Approved)) +
geom_boxplot() +
labs(y = "Credit Score", x = 'Approved') +
ggtitle('Credit Score vs Approved') +
scale_fill_brewer(palette = "Set2")
Again there is a positive association between the two variables.
ggplot(data, aes(x= Approved, y= Income, fill= Approved)) +
geom_boxplot() +
labs(y = "Income", x = 'Approved') +
ggtitle('Income vs Approved') +
scale_fill_brewer(palette = "Set2")
This plot contains many extreme outliers, so we zoom in to make it readable:
ggplot(data, aes(x= Approved, y= Income, fill= Approved)) +
geom_boxplot() +
labs(y = "Income", x = 'Approved') +
ggtitle('Income vs Approved') +
scale_fill_brewer(palette = "Set2") +
coord_cartesian(ylim=c(0, 1500)) #zoom
The plot shows a positive association between Income and Approved.
Next we compute a correlation matrix to check for collinearity among the numeric variables.
numericVars <- data.frame(data$Age, data$Debt, data$YearsEmployed, data$CreditScore, data$Income)
#corrplot(cor(numericVars), method = "number", type="upper")
chart.Correlation(numericVars, histogram=TRUE, pch=19)
The largest value is 0.4, between YearsEmployed and Age. This is not large enough to cause collinearity, so both variables will be included in our model.
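The same check can be done numerically by extracting the largest absolute off-diagonal correlation. This is a minimal sketch on simulated columns; the variable names mirror the ones used here but the values are random, so the numbers it prints are not the ones reported above.

```r
set.seed(42)
n     <- 200
age   <- rnorm(n, mean = 32, sd = 12)
years <- 0.1 * age + rnorm(n, sd = 3)   # mildly related to age by construction
debt  <- rexp(n, rate = 0.2)
num   <- data.frame(Age = age, YearsEmployed = years, Debt = debt)

cmat <- cor(num, method = "pearson")

# Largest absolute correlation between two different variables.
max_abs <- max(abs(cmat[upper.tri(cmat)]))
max_abs

# Which pair it belongs to (row and column indices in cmat).
which(abs(cmat) == max_abs & upper.tri(cmat), arr.ind = TRUE)
```

Since the variables are far from normal, `method = "spearman"` is a reasonable cross-check.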
First we check whether our numeric variables follow a normal distribution.
for (columna in 1:ncol(data)){
if (class(data[,columna]) != "factor"){
qqnorm(data[,columna],
main = paste("Normality Plot: ", colnames(data[columna])))
qqline(data[,columna])
} else {
next
}
}
None of the variables appears to be normally distributed, but we will check this with the Shapiro-Wilk test.
sapply(numericVars, function(x) round(shapiro.test(x)$p.value,2))
data.Age data.Debt data.YearsEmployed data.CreditScore
0 0 0 0
data.Income
0
The p-values from the Shapiro-Wilk test are close to 0, so we reject the null hypothesis of normality in every case and conclude that none of the variables follows a normal distribution.
We need to standardize all the numeric variables.
There is no collinearity among the numeric variables.
The categorical variables "Male" and "DriversLicense" do not appear to influence the target variable; the rest do, to varying degrees.
Categories 'l' and 'gg' of "Married" and "BankCustomer" respectively have only two observations each, and credit was granted in every case. Both variables are therefore treated as effectively binary, and these two categories should be removed from our model.
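Standardizing puts the numeric variables on a common scale, which matters for penalized models such as Ridge and Lasso, but it does not make a skewed variable normal. The sketch below illustrates this on a simulated right-skewed variable: after `scale()` the mean is 0 and the standard deviation is 1, yet the Shapiro-Wilk statistic is unchanged. The variable `income` is simulated, not the Income column.

```r
set.seed(7)
income   <- rexp(300, rate = 1 / 1000)     # simulated, right-skewed
income_z <- as.numeric(scale(income))      # (x - mean(x)) / sd(x)

c(mean = mean(income_z), sd = sd(income_z))

# Shapiro-Wilk W is invariant to shifting and rescaling the data.
w_raw <- shapiro.test(income)$statistic
w_z   <- shapiro.test(income_z)$statistic
c(raw = unname(w_raw), standardized = unname(w_z))
```

Wrapping `scale()` in `as.numeric()` also avoids storing a one-column matrix inside the data frame.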
data$Age <- scale(data$Age)
data$Debt <- scale(data$Debt)
data$YearsEmployed <- scale(data$YearsEmployed)
data$CreditScore <- scale(data$CreditScore)
data$Income <- scale(data$Income)
data$Male <- NULL
data$DriversLicense <- NULL
Since our data contains categorical variables, they must be encoded as dummies for a classification model, so we define a new data frame with dummy variables. We also remove category "l" of Married and "gg" of BankCustomer.
df <- dummy_cols(data, remove_selected_columns = T)
colnames(df)
[1] "Age" "Debt" "YearsEmployed"
[4] "CreditScore" "Income" "Married_l"
[7] "Married_u" "Married_y" "BankCustomer_g"
[10] "BankCustomer_gg" "BankCustomer_p" "EducationLevel_aa"
[13] "EducationLevel_c" "EducationLevel_cc" "EducationLevel_d"
[16] "EducationLevel_e" "EducationLevel_ff" "EducationLevel_i"
[19] "EducationLevel_j" "EducationLevel_k" "EducationLevel_m"
[22] "EducationLevel_q" "EducationLevel_r" "EducationLevel_w"
[25] "EducationLevel_x" "Ethnicity_bb" "Ethnicity_dd"
[28] "Ethnicity_ff" "Ethnicity_h" "Ethnicity_j"
[31] "Ethnicity_n" "Ethnicity_o" "Ethnicity_v"
[34] "Ethnicity_z" "PriorDefault_Yes" "PriorDefault_No"
[37] "Employed_Unemployed" "Employed_Employed" "Citizen_g"
[40] "Citizen_p" "Citizen_s" "Approved_0"
[43] "Approved_1"
df$Approved_0 <- NULL
df$Approved_1 <- NULL
df$Married_l <- NULL
df$BankCustomer_gg <- NULL
df$Approved <- data$Approved
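The encoding above keeps one indicator per category, so pairs such as PriorDefault_Yes and PriorDefault_No carry the same information. As a hedged alternative, base R's `model.matrix()` drops one reference level per factor automatically. The sketch below uses a small made-up data frame; it is not a replacement for the `dummy_cols()` call used in this analysis.

```r
# Illustrative rows only.
toy <- data.frame(
  PriorDefault = factor(c("Yes", "No", "No", "Yes", "No", "Yes")),
  Citizen      = factor(c("g", "s", "g", "p", "g", "s")),
  Income       = c(0, 560, 824, 3, 0, 31285)
)

# Treatment coding: the first level of each factor becomes the reference.
X <- model.matrix(~ ., data = toy)[, -1]   # drop the intercept column
colnames(X)

# Keeping both indicators of a binary factor is redundant with the intercept:
yes <- as.numeric(toy$PriorDefault == "Yes")
no  <- as.numeric(toy$PriorDefault == "No")
rank_full <- qr(cbind(1, yes, no))$rank    # 2, not 3, because yes + no = 1
rank_full
```

Unpenalized `glm()` handles such redundancy by returning NA for the aliased coefficient, whereas penalized fits split the effect between the two columns.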
summary(df)
Age Debt YearsEmployed CreditScore Income
Min. :-1.5031 Min. :-0.9559 Min. :-0.6644 Min. :-0.4935 Min. :-0.1953
1st Qu.:-0.7515 1st Qu.:-0.7550 1st Qu.:-0.6151 1st Qu.:-0.4935 1st Qu.:-0.1953
Median :-0.2535 Median :-0.4035 Median :-0.3656 Median :-0.4935 Median :-0.1943
Mean : 0.0000 Mean : 0.0000 Mean : 0.0000 Mean : 0.0000 Mean : 0.0000
3rd Qu.: 0.5595 3rd Qu.: 0.4919 3rd Qu.: 0.1200 3rd Qu.: 0.1234 3rd Qu.:-0.1194
Max. : 4.1000 Max. : 4.6686 Max. : 7.8519 Max. :13.2841 Max. :18.9982
Married_u Married_y BankCustomer_g BankCustomer_p EducationLevel_aa
Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.00000
1st Qu.:1.0000 1st Qu.:0.0000 1st Qu.:1.0000 1st Qu.:0.0000 1st Qu.:0.00000
Median :1.0000 Median :0.0000 Median :1.0000 Median :0.0000 Median :0.00000
Mean :0.7609 Mean :0.2362 Mean :0.7609 Mean :0.2362 Mean :0.07826
3rd Qu.:1.0000 3rd Qu.:0.0000 3rd Qu.:1.0000 3rd Qu.:0.0000 3rd Qu.:0.00000
Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.00000
EducationLevel_c EducationLevel_cc EducationLevel_d EducationLevel_e EducationLevel_ff
Min. :0.0000 Min. :0.00000 Min. :0.00000 Min. :0.00000 Min. :0.00000
1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000
Median :0.0000 Median :0.00000 Median :0.00000 Median :0.00000 Median :0.00000
Mean :0.1986 Mean :0.05942 Mean :0.04348 Mean :0.03913 Mean :0.07826
3rd Qu.:0.0000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000
Max. :1.0000 Max. :1.00000 Max. :1.00000 Max. :1.00000 Max. :1.00000
EducationLevel_i EducationLevel_j EducationLevel_k EducationLevel_m EducationLevel_q
Min. :0.0000 Min. :0.00000 Min. :0.00000 Min. :0.00000 Min. :0.000
1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.000
Median :0.0000 Median :0.00000 Median :0.00000 Median :0.00000 Median :0.000
Mean :0.0913 Mean :0.01594 Mean :0.07391 Mean :0.05652 Mean :0.113
3rd Qu.:0.0000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.000
Max. :1.0000 Max. :1.00000 Max. :1.00000 Max. :1.00000 Max. :1.000
EducationLevel_r EducationLevel_w EducationLevel_x Ethnicity_bb Ethnicity_dd
Min. :0.000000 Min. :0.00000 Min. :0.00000 Min. :0.0000 Min. :0.00000
1st Qu.:0.000000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.00000
Median :0.000000 Median :0.00000 Median :0.00000 Median :0.0000 Median :0.00000
Mean :0.004348 Mean :0.09275 Mean :0.05507 Mean :0.0913 Mean :0.01014
3rd Qu.:0.000000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.0000 3rd Qu.:0.00000
Max. :1.000000 Max. :1.00000 Max. :1.00000 Max. :1.0000 Max. :1.00000
Ethnicity_ff Ethnicity_h Ethnicity_j Ethnicity_n Ethnicity_o
Min. :0.00000 Min. :0.0 Min. :0.00000 Min. :0.000000 Min. :0.000000
1st Qu.:0.00000 1st Qu.:0.0 1st Qu.:0.00000 1st Qu.:0.000000 1st Qu.:0.000000
Median :0.00000 Median :0.0 Median :0.00000 Median :0.000000 Median :0.000000
Mean :0.08406 Mean :0.2 Mean :0.01304 Mean :0.005797 Mean :0.002899
3rd Qu.:0.00000 3rd Qu.:0.0 3rd Qu.:0.00000 3rd Qu.:0.000000 3rd Qu.:0.000000
Max. :1.00000 Max. :1.0 Max. :1.00000 Max. :1.000000 Max. :1.000000
Ethnicity_v Ethnicity_z PriorDefault_Yes PriorDefault_No Employed_Unemployed
Min. :0.0000 Min. :0.00000 Min. :0.0000 Min. :0.0000 Min. :0.0000
1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
Median :1.0000 Median :0.00000 Median :0.0000 Median :1.0000 Median :1.0000
Mean :0.5797 Mean :0.01304 Mean :0.4768 Mean :0.5232 Mean :0.5725
3rd Qu.:1.0000 3rd Qu.:0.00000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
Max. :1.0000 Max. :1.00000 Max. :1.0000 Max. :1.0000 Max. :1.0000
Employed_Employed Citizen_g Citizen_p Citizen_s Approved
Min. :0.0000 Min. :0.0000 Min. :0.00000 Min. :0.00000 0:383
1st Qu.:0.0000 1st Qu.:1.0000 1st Qu.:0.00000 1st Qu.:0.00000 1:307
Median :0.0000 Median :1.0000 Median :0.00000 Median :0.00000
Mean :0.4275 Mean :0.9058 Mean :0.01159 Mean :0.08261
3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:0.00000 3rd Qu.:0.00000
Max. :1.0000 Max. :1.0000 Max. :1.00000 Max. :1.00000
dim(df)
[1] 690 40
head(df)
We will perform variable selection based on stepAIC. First we define the minimum and maximum models: the minimum is the target variable (Approved) regressed on an intercept only, and the maximum is the target regressed on all the variables:
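The idea of a lower and an upper scope can be sketched on simulated data with base R's `step()`, used here in place of `MASS::stepAIC` only to keep the example free of extra packages. For a 0/1 response the AIC equals the residual deviance plus twice the number of coefficients, which is why an intercept-only model with deviance 948.16 has AIC 950.16. All variables below are simulated.

```r
set.seed(123)
n  <- 300
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)   # x3 is pure noise
y  <- rbinom(n, size = 1, prob = plogis(-0.3 + 1.2 * x1 + 0.6 * x2))
toy <- data.frame(y, x1, x2, x3)

fit0 <- glm(y ~ 1, data = toy, family = binomial)  # lower bound: intercept only
fit1 <- glm(y ~ ., data = toy, family = binomial)  # upper bound: all predictors

# Bidirectional search between the two bounds, without printing each step.
sel <- step(fit0, scope = list(lower = formula(fit0), upper = formula(fit1)),
            direction = "both", trace = 0)
formula(sel)

# AIC = residual deviance + 2 * (number of estimated coefficients)
aic_check <- deviance(sel) + 2 * length(coef(sel))
all.equal(AIC(sel), aic_check)
```

Setting `trace = 0` suppresses the step-by-step tables; the default prints the same kind of table shown in the output that follows.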
fit1 <- glm(Approved~., data=df, family=binomial)
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
fit0 <- glm(Approved~1, data=df, family=binomial)
step <-stepAIC(fit0,direction="both",scope=list(upper=fit1,lower=fit0))
Start: AIC=950.16
Approved ~ 1
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Df Deviance AIC
+ PriorDefault_No 1 540.95 544.95
+ PriorDefault_Yes 1 540.95 544.95
+ CreditScore 1 762.74 766.74
+ Employed_Employed 1 798.66 802.66
+ Employed_Unemployed 1 798.66 802.66
+ Income 1 862.80 866.80
+ YearsEmployed 1 863.32 867.32
+ Debt 1 918.36 922.36
+ EducationLevel_x 1 920.99 924.99
+ Married_y 1 922.64 926.64
+ BankCustomer_p 1 922.64 926.64
+ Ethnicity_ff 1 924.15 928.15
+ Ethnicity_h 1 924.16 928.16
+ EducationLevel_ff 1 924.72 928.72
+ Married_u 1 924.92 928.92
+ BankCustomer_g 1 924.92 928.92
+ Age 1 929.85 933.85
+ EducationLevel_q 1 932.62 936.62
+ EducationLevel_cc 1 935.90 939.90
+ EducationLevel_i 1 937.37 941.37
+ Citizen_s 1 939.43 943.43
+ EducationLevel_k 1 941.39 945.39
+ EducationLevel_d 1 942.09 946.09
+ Citizen_g 1 942.51 946.51
+ EducationLevel_aa 1 946.06 950.06
<none> 948.16 950.16
+ Ethnicity_v 1 946.22 950.22
+ Ethnicity_z 1 946.34 950.34
+ EducationLevel_w 1 946.74 950.74
+ Citizen_p 1 947.10 951.10
+ Ethnicity_dd 1 947.40 951.40
+ EducationLevel_e 1 947.54 951.54
+ EducationLevel_r 1 947.55 951.55
+ EducationLevel_j 1 947.85 951.85
+ EducationLevel_m 1 947.95 951.95
+ Ethnicity_bb 1 948.08 952.08
+ Ethnicity_n 1 948.11 952.11
+ EducationLevel_c 1 948.11 952.11
+ Ethnicity_o 1 948.13 952.13
+ Ethnicity_j 1 948.16 952.16
Step: AIC=544.95
Approved ~ PriorDefault_No
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Df Deviance AIC
+ CreditScore 1 503.21 509.21
+ Income 1 505.09 511.09
+ Employed_Unemployed 1 507.67 513.67
+ Employed_Employed 1 507.67 513.67
+ Citizen_p 1 523.48 529.48
+ Married_y 1 528.70 534.70
+ BankCustomer_p 1 528.70 534.70
+ EducationLevel_x 1 531.38 537.38
+ Married_u 1 531.98 537.98
+ BankCustomer_g 1 531.98 537.98
+ YearsEmployed 1 532.97 538.97
+ EducationLevel_aa 1 534.80 540.80
+ EducationLevel_cc 1 534.81 540.81
+ EducationLevel_ff 1 536.36 542.36
+ Ethnicity_ff 1 536.69 542.69
+ Ethnicity_h 1 537.32 543.32
+ EducationLevel_k 1 537.95 543.95
+ Citizen_s 1 537.97 543.97
+ Ethnicity_o 1 538.22 544.22
+ Ethnicity_n 1 538.72 544.72
<none> 540.95 544.95
+ EducationLevel_d 1 539.15 545.15
+ EducationLevel_q 1 539.20 545.20
+ Ethnicity_j 1 539.35 545.35
+ EducationLevel_i 1 539.61 545.61
+ Debt 1 539.69 545.69
+ EducationLevel_w 1 539.88 545.88
+ Ethnicity_bb 1 540.07 546.07
+ Ethnicity_v 1 540.48 546.48
+ EducationLevel_r 1 540.59 546.59
+ Age 1 540.68 546.68
+ EducationLevel_m 1 540.69 546.69
+ EducationLevel_e 1 540.82 546.82
+ EducationLevel_j 1 540.83 546.83
+ Ethnicity_z 1 540.85 546.85
+ EducationLevel_c 1 540.89 546.89
+ Citizen_g 1 540.92 546.92
+ Ethnicity_dd 1 540.94 546.94
- PriorDefault_No 1 948.16 950.16
Step: AIC=509.21
Approved ~ PriorDefault_No + CreditScore
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Df Deviance AIC
+ Income 1 477.87 485.87
+ Citizen_p 1 484.26 492.26
+ EducationLevel_x 1 492.76 500.76
+ Married_y 1 494.87 502.87
+ BankCustomer_p 1 494.87 502.87
+ Ethnicity_ff 1 497.51 505.51
+ Married_u 1 497.62 505.62
+ BankCustomer_g 1 497.62 505.62
+ EducationLevel_ff 1 497.72 505.72
+ EducationLevel_cc 1 497.84 505.84
+ Employed_Employed 1 498.33 506.33
+ Employed_Unemployed 1 498.33 506.33
+ YearsEmployed 1 499.68 507.68
+ Ethnicity_h 1 499.76 507.76
+ EducationLevel_aa 1 499.85 507.85
+ Ethnicity_o 1 500.21 508.21
+ EducationLevel_k 1 500.79 508.79
<none> 503.21 509.21
+ Ethnicity_j 1 501.27 509.27
+ Ethnicity_n 1 501.28 509.28
+ EducationLevel_d 1 501.70 509.70
+ EducationLevel_w 1 501.72 509.72
+ EducationLevel_i 1 501.96 509.96
+ Citizen_g 1 502.06 510.06
+ Ethnicity_bb 1 502.15 510.15
+ EducationLevel_q 1 502.17 510.17
+ Citizen_s 1 502.69 510.69
+ Ethnicity_z 1 502.82 510.82
+ EducationLevel_r 1 502.86 510.86
+ Ethnicity_v 1 503.05 511.05
+ EducationLevel_m 1 503.06 511.06
+ EducationLevel_e 1 503.09 511.09
+ EducationLevel_j 1 503.13 511.13
+ Age 1 503.18 511.18
+ Ethnicity_dd 1 503.19 511.19
+ Debt 1 503.19 511.19
+ EducationLevel_c 1 503.21 511.21
- CreditScore 1 540.95 544.95
- PriorDefault_No 1 762.74 766.74
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Step: AIC=485.87
Approved ~ PriorDefault_No + CreditScore + Income
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Df Deviance AIC
+ Citizen_p 1 462.37 472.37
+ EducationLevel_x 1 468.30 478.30
+ EducationLevel_ff 1 470.00 480.00
+ Married_y 1 470.29 480.29
+ BankCustomer_p 1 470.29 480.29
+ Married_u 1 471.72 481.72
+ BankCustomer_g 1 471.72 481.72
+ Ethnicity_ff 1 472.52 482.52
+ EducationLevel_cc 1 472.64 482.64
+ Employed_Unemployed 1 473.35 483.35
+ Employed_Employed 1 473.35 483.35
+ YearsEmployed 1 473.93 483.93
+ Ethnicity_h 1 474.45 484.45
+ EducationLevel_aa 1 475.33 485.33
+ Ethnicity_j 1 475.57 485.57
+ Ethnicity_n 1 475.74 485.74
<none> 477.87 485.87
+ EducationLevel_k 1 476.08 486.08
+ Citizen_g 1 476.19 486.19
+ EducationLevel_w 1 476.44 486.44
+ EducationLevel_q 1 476.57 486.57
+ Ethnicity_bb 1 476.76 486.76
+ EducationLevel_i 1 476.79 486.79
+ EducationLevel_d 1 477.00 487.00
+ Ethnicity_z 1 477.05 487.05
+ EducationLevel_m 1 477.71 487.71
+ EducationLevel_j 1 477.72 487.72
+ Debt 1 477.73 487.73
+ Ethnicity_o 1 477.76 487.76
+ Age 1 477.83 487.83
+ Ethnicity_dd 1 477.83 487.83
+ EducationLevel_e 1 477.84 487.84
+ Citizen_s 1 477.84 487.84
+ Ethnicity_v 1 477.85 487.85
+ EducationLevel_r 1 477.86 487.86
+ EducationLevel_c 1 477.87 487.87
- Income 1 503.21 509.21
- CreditScore 1 505.09 511.09
- PriorDefault_No 1 723.20 729.20
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Step: AIC=472.37
Approved ~ PriorDefault_No + CreditScore + Income + Citizen_p
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Df Deviance AIC
+ EducationLevel_x 1 452.34 464.34
+ EducationLevel_ff 1 454.25 466.25
+ Married_y 1 456.07 468.07
+ BankCustomer_p 1 456.07 468.07
+ EducationLevel_cc 1 456.58 468.58
+ Employed_Unemployed 1 456.73 468.73
+ Employed_Employed 1 456.73 468.73
+ Ethnicity_ff 1 456.89 468.89
+ Married_u 1 457.40 469.40
+ BankCustomer_g 1 457.40 469.40
+ YearsEmployed 1 457.91 469.91
+ Ethnicity_h 1 458.34 470.34
+ EducationLevel_i 1 459.19 471.19
+ Ethnicity_bb 1 459.23 471.23
+ Ethnicity_n 1 459.86 471.86
+ EducationLevel_aa 1 460.15 472.15
<none> 462.37 472.37
+ EducationLevel_w 1 460.55 472.55
+ EducationLevel_q 1 460.83 472.83
+ EducationLevel_k 1 460.94 472.94
+ Ethnicity_j 1 461.40 473.40
+ Ethnicity_z 1 461.55 473.55
+ EducationLevel_d 1 461.69 473.69
+ Age 1 462.16 474.16
+ EducationLevel_m 1 462.28 474.28
+ Ethnicity_o 1 462.28 474.28
+ Ethnicity_dd 1 462.29 474.29
+ Ethnicity_v 1 462.30 474.30
+ EducationLevel_e 1 462.30 474.30
+ EducationLevel_r 1 462.35 474.35
+ Debt 1 462.36 474.36
+ EducationLevel_j 1 462.36 474.36
+ EducationLevel_c 1 462.36 474.36
+ Citizen_g 1 462.37 474.37
+ Citizen_s 1 462.37 474.37
- Citizen_p 1 477.87 485.87
- Income 1 484.26 492.26
- CreditScore 1 490.12 498.12
- PriorDefault_No 1 719.88 727.88
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Step: AIC=464.34
Approved ~ PriorDefault_No + CreditScore + Income + Citizen_p +
EducationLevel_x
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Df Deviance AIC
+ Married_y 1 444.97 458.97
+ BankCustomer_p 1 444.97 458.97
+ EducationLevel_ff 1 445.13 459.13
+ EducationLevel_cc 1 445.65 459.65
+ Married_u 1 446.44 460.44
+ BankCustomer_g 1 446.44 460.44
+ Ethnicity_ff 1 447.63 461.63
+ Employed_Unemployed 1 447.75 461.75
+ Employed_Employed 1 447.75 461.75
+ YearsEmployed 1 448.32 462.32
+ Ethnicity_n 1 449.70 463.70
+ EducationLevel_w 1 449.77 463.77
+ EducationLevel_i 1 449.82 463.82
+ Ethnicity_bb 1 449.82 463.82
+ EducationLevel_q 1 449.92 463.92
+ Ethnicity_h 1 450.18 464.18
<none> 452.34 464.34
+ EducationLevel_aa 1 450.82 464.82
+ Ethnicity_j 1 451.25 465.25
+ EducationLevel_k 1 451.37 465.37
+ Ethnicity_z 1 451.69 465.69
+ EducationLevel_d 1 451.88 465.88
+ Ethnicity_v 1 452.08 466.08
+ Age 1 452.14 466.14
+ EducationLevel_c 1 452.15 466.15
+ EducationLevel_e 1 452.18 466.18
+ Ethnicity_dd 1 452.22 466.22
+ Ethnicity_o 1 452.26 466.26
+ Debt 1 452.26 466.26
+ Citizen_g 1 452.30 466.30
+ Citizen_s 1 452.30 466.30
+ EducationLevel_r 1 452.31 466.31
+ EducationLevel_m 1 452.33 466.33
+ EducationLevel_j 1 452.34 466.34
- EducationLevel_x 1 462.37 472.37
- Citizen_p 1 468.30 478.30
- Income 1 473.18 483.18
- CreditScore 1 480.55 490.55
- PriorDefault_No 1 697.98 707.98
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Step: AIC=458.97
Approved ~ PriorDefault_No + CreditScore + Income + Citizen_p +
EducationLevel_x + Married_y
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Df Deviance AIC
+ EducationLevel_cc 1 437.84 453.84
+ EducationLevel_ff 1 438.29 454.29
+ Married_u 1 438.70 454.70
+ BankCustomer_g 1 438.70 454.70
+ Ethnicity_ff 1 440.73 456.73
+ Employed_Unemployed 1 441.06 457.06
+ Employed_Employed 1 441.06 457.06
+ YearsEmployed 1 441.28 457.28
+ Ethnicity_bb 1 441.96 457.96
+ EducationLevel_w 1 441.97 457.97
+ EducationLevel_i 1 442.07 458.07
+ Ethnicity_n 1 442.33 458.33
+ Ethnicity_h 1 442.66 458.66
<none> 444.97 458.97
+ EducationLevel_q 1 443.61 459.61
+ Ethnicity_j 1 443.89 459.89
+ EducationLevel_aa 1 443.92 459.92
+ Ethnicity_z 1 444.02 460.02
+ EducationLevel_k 1 444.07 460.07
+ Age 1 444.41 460.41
+ EducationLevel_d 1 444.63 460.63
+ Ethnicity_v 1 444.64 460.64
+ EducationLevel_c 1 444.66 460.66
+ Debt 1 444.83 460.83
+ EducationLevel_e 1 444.86 460.86
+ Ethnicity_dd 1 444.87 460.87
+ Ethnicity_o 1 444.87 460.87
+ EducationLevel_m 1 444.90 460.90
+ EducationLevel_r 1 444.93 460.93
+ EducationLevel_j 1 444.97 460.97
+ Citizen_g 1 444.97 460.97
+ Citizen_s 1 444.97 460.97
- Married_y 1 452.34 464.34
- EducationLevel_x 1 456.07 468.07
- Citizen_p 1 459.56 471.56
- Income 1 465.83 477.83
- CreditScore 1 471.44 483.44
- PriorDefault_No 1 686.84 698.84
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Step: AIC=453.84
Approved ~ PriorDefault_No + CreditScore + Income + Citizen_p +
EducationLevel_x + Married_y + EducationLevel_cc
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Df Deviance AIC
+ EducationLevel_ff 1 431.93 449.93
+ Employed_Employed 1 433.94 451.94
+ Employed_Unemployed 1 433.94 451.94
+ EducationLevel_w 1 433.96 451.96
+ Married_u 1 434.02 452.02
+ BankCustomer_g 1 434.02 452.02
+ Ethnicity_ff 1 434.06 452.06
+ YearsEmployed 1 434.90 452.90
+ Ethnicity_n 1 435.02 453.02
+ Ethnicity_bb 1 435.34 453.34
+ EducationLevel_i 1 435.49 453.49
+ EducationLevel_q 1 435.79 453.79
<none> 437.84 453.84
+ Ethnicity_h 1 436.15 454.15
+ Ethnicity_j 1 436.62 454.62
+ Ethnicity_z 1 437.03 455.03
+ EducationLevel_c 1 437.07 455.07
+ EducationLevel_aa 1 437.22 455.22
+ EducationLevel_k 1 437.27 455.27
+ Age 1 437.46 455.46
+ Ethnicity_v 1 437.49 455.49
+ EducationLevel_e 1 437.64 455.64
+ EducationLevel_d 1 437.65 455.65
+ Ethnicity_dd 1 437.70 455.70
+ Ethnicity_o 1 437.75 455.75
+ Debt 1 437.77 455.77
+ EducationLevel_r 1 437.79 455.79
+ EducationLevel_m 1 437.83 455.83
+ Citizen_g 1 437.83 455.83
+ Citizen_s 1 437.83 455.83
+ EducationLevel_j 1 437.84 455.84
- EducationLevel_cc 1 444.97 458.97
- Married_y 1 445.65 459.65
- EducationLevel_x 1 449.94 463.94
- Citizen_p 1 453.02 467.02
- Income 1 458.56 472.56
- CreditScore 1 462.58 476.58
- PriorDefault_No 1 677.25 691.25
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Step: AIC=449.93
Approved ~ PriorDefault_No + CreditScore + Income + Citizen_p +
EducationLevel_x + Married_y + EducationLevel_cc + EducationLevel_ff
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Df Deviance AIC
+ Employed_Unemployed 1 427.78 447.78
+ Employed_Employed 1 427.78 447.78
+ Married_u 1 428.14 448.14
+ BankCustomer_g 1 428.14 448.14
+ Ethnicity_bb 1 428.45 448.45
+ EducationLevel_i 1 428.60 448.60
+ EducationLevel_w 1 428.89 448.89
+ Ethnicity_n 1 429.33 449.33
+ YearsEmployed 1 429.73 449.73
<none> 431.93 449.93
+ Ethnicity_ff 1 430.28 450.28
+ EducationLevel_q 1 430.47 450.47
+ Ethnicity_h 1 430.80 450.80
+ Ethnicity_j 1 430.93 450.93
+ EducationLevel_aa 1 430.94 450.94
+ Ethnicity_z 1 431.00 451.00
+ EducationLevel_k 1 431.03 451.03
+ EducationLevel_d 1 431.60 451.60
+ EducationLevel_c 1 431.61 451.61
+ EducationLevel_e 1 431.81 451.81
+ Ethnicity_o 1 431.83 451.83
+ Ethnicity_dd 1 431.83 451.83
+ EducationLevel_m 1 431.85 451.85
+ EducationLevel_r 1 431.88 451.88
+ Ethnicity_v 1 431.91 451.91
+ EducationLevel_j 1 431.93 451.93
+ Citizen_g 1 431.93 451.93
+ Citizen_s 1 431.93 451.93
+ Debt 1 431.93 451.93
+ Age 1 431.93 451.93
- EducationLevel_ff 1 437.84 453.84
- EducationLevel_cc 1 438.29 454.29
- Married_y 1 439.22 455.22
- EducationLevel_x 1 443.05 459.05
- Citizen_p 1 447.36 463.36
- Income 1 453.80 469.80
- CreditScore 1 456.92 472.92
- PriorDefault_No 1 655.07 671.07
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Step: AIC=447.78
Approved ~ PriorDefault_No + CreditScore + Income + Citizen_p +
EducationLevel_x + Married_y + EducationLevel_cc + EducationLevel_ff +
Employed_Unemployed
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Df Deviance AIC
+ Married_u 1 423.81 445.81
+ BankCustomer_g 1 423.81 445.81
+ Ethnicity_bb 1 424.96 446.96
+ EducationLevel_i 1 424.99 446.99
+ YearsEmployed 1 425.22 447.22
+ EducationLevel_w 1 425.37 447.37
+ Ethnicity_n 1 425.40 447.40
<none> 427.78 447.78
+ Ethnicity_ff 1 425.92 447.92
+ Ethnicity_h 1 426.39 448.39
+ Ethnicity_j 1 426.65 448.65
+ Ethnicity_z 1 426.66 448.66
+ EducationLevel_q 1 426.81 448.81
+ EducationLevel_k 1 426.91 448.91
+ EducationLevel_aa 1 427.04 449.04
+ EducationLevel_c 1 427.44 449.44
+ Ethnicity_v 1 427.64 449.64
+ Citizen_s 1 427.64 449.64
+ Citizen_g 1 427.64 449.64
+ EducationLevel_d 1 427.67 449.67
+ EducationLevel_m 1 427.67 449.67
+ Ethnicity_o 1 427.69 449.69
+ Ethnicity_dd 1 427.70 449.70
+ EducationLevel_e 1 427.70 449.70
+ EducationLevel_r 1 427.71 449.71
+ Age 1 427.74 449.74
+ EducationLevel_j 1 427.77 449.77
+ Debt 1 427.78 449.78
- Employed_Unemployed 1 431.93 449.93
- CreditScore 1 433.61 451.61
- EducationLevel_ff 1 433.94 451.94
- EducationLevel_cc 1 434.11 452.11
- Married_y 1 434.28 452.28
- EducationLevel_x 1 437.59 455.59
- Citizen_p 1 444.13 462.13
- Income 1 448.23 466.23
- PriorDefault_No 1 643.83 661.83
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Step: AIC=445.81
Approved ~ PriorDefault_No + CreditScore + Income + Citizen_p +
EducationLevel_x + Married_y + EducationLevel_cc + EducationLevel_ff +
Employed_Unemployed + Married_u
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Df Deviance AIC
+ Ethnicity_bb 1 421.01 445.01
+ EducationLevel_i 1 421.04 445.04
+ Ethnicity_n 1 421.37 445.37
+ EducationLevel_w 1 421.41 445.41
<none> 423.81 445.81
+ YearsEmployed 1 422.09 446.09
+ Ethnicity_h 1 422.26 446.26
+ Ethnicity_j 1 422.66 446.66
+ Ethnicity_z 1 422.67 446.67
+ EducationLevel_q 1 422.85 446.85
+ EducationLevel_k 1 422.94 446.94
+ EducationLevel_aa 1 423.05 447.05
+ EducationLevel_c 1 423.47 447.47
+ EducationLevel_d 1 423.70 447.70
+ EducationLevel_m 1 423.70 447.70
+ Ethnicity_dd 1 423.72 447.72
+ Age 1 423.72 447.72
+ Ethnicity_o 1 423.72 447.72
+ EducationLevel_e 1 423.73 447.73
+ EducationLevel_r 1 423.74 447.74
+ Ethnicity_v 1 423.76 447.76
- Married_u 1 427.78 447.78
- EducationLevel_cc 1 427.79 447.79
+ Debt 1 423.80 447.80
+ Ethnicity_ff 1 423.80 447.80
+ EducationLevel_j 1 423.80 447.80
+ Citizen_g 1 423.81 447.81
+ Citizen_s 1 423.81 447.81
- Employed_Unemployed 1 428.14 448.14
- Married_y 1 429.08 449.08
- CreditScore 1 429.55 449.55
- EducationLevel_ff 1 429.93 449.93
- EducationLevel_x 1 433.53 453.53
- Citizen_p 1 440.46 460.46
- Income 1 442.35 462.35
- PriorDefault_No 1 642.38 662.38
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Step: AIC=445.01
Approved ~ PriorDefault_No + CreditScore + Income + Citizen_p +
EducationLevel_x + Married_y + EducationLevel_cc + EducationLevel_ff +
Employed_Unemployed + Married_u + Ethnicity_bb
Df Deviance AIC
+ Ethnicity_n 1 418.67 444.67
<none> 421.01 445.01
+ EducationLevel_w 1 419.04 445.04
+ YearsEmployed 1 419.29 445.29
+ Ethnicity_z 1 419.69 445.69
+ Ethnicity_v 1 419.74 445.74
- Ethnicity_bb 1 423.81 445.81
+ EducationLevel_k 1 419.91 445.91
+ EducationLevel_aa 1 419.92 445.92
+ Ethnicity_h 1 420.11 446.11
+ EducationLevel_i 1 420.12 446.12
+ Ethnicity_j 1 420.15 446.15
+ EducationLevel_q 1 420.48 446.48
- EducationLevel_cc 1 424.53 446.53
+ EducationLevel_c 1 420.55 446.55
- Employed_Unemployed 1 424.68 446.68
+ EducationLevel_m 1 420.84 446.84
+ Age 1 420.85 446.85
+ EducationLevel_d 1 420.90 446.90
+ Ethnicity_o 1 420.92 446.92
+ EducationLevel_e 1 420.95 446.95
- Married_u 1 424.96 446.96
+ Ethnicity_dd 1 420.96 446.96
+ EducationLevel_r 1 420.96 446.96
+ Ethnicity_ff 1 421.00 447.00
+ Citizen_s 1 421.00 447.00
+ Citizen_g 1 421.00 447.00
+ Debt 1 421.01 447.01
+ EducationLevel_j 1 421.01 447.01
- Married_y 1 426.30 448.30
- CreditScore 1 427.19 449.19
- EducationLevel_ff 1 427.99 449.99
- EducationLevel_x 1 430.17 452.17
- Citizen_p 1 439.60 461.60
- Income 1 439.83 461.83
- PriorDefault_No 1 641.78 663.78
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Step: AIC=444.67
Approved ~ PriorDefault_No + CreditScore + Income + Citizen_p +
EducationLevel_x + Married_y + EducationLevel_cc + EducationLevel_ff +
Employed_Unemployed + Married_u + Ethnicity_bb + Ethnicity_n
Df Deviance AIC
+ EducationLevel_w 1 416.52 444.52
<none> 418.67 444.67
+ YearsEmployed 1 416.93 444.93
- Ethnicity_n 1 421.01 445.01
+ Ethnicity_z 1 417.37 445.37
- Ethnicity_bb 1 421.37 445.37
+ EducationLevel_aa 1 417.65 445.65
+ EducationLevel_k 1 417.65 445.65
+ Ethnicity_h 1 417.68 445.68
+ Ethnicity_j 1 417.76 445.76
+ Ethnicity_v 1 417.83 445.83
+ EducationLevel_i 1 417.83 445.83
- Employed_Unemployed 1 422.15 446.15
+ EducationLevel_q 1 418.21 446.21
- EducationLevel_cc 1 422.29 446.29
+ EducationLevel_c 1 418.30 446.30
+ EducationLevel_r 1 418.44 446.44
+ Age 1 418.45 446.45
+ EducationLevel_m 1 418.53 446.53
+ EducationLevel_d 1 418.58 446.58
+ Ethnicity_o 1 418.59 446.59
+ EducationLevel_e 1 418.59 446.59
+ Ethnicity_dd 1 418.60 446.60
+ Citizen_g 1 418.65 446.65
+ Citizen_s 1 418.65 446.65
+ Ethnicity_ff 1 418.66 446.66
- Married_u 1 422.67 446.67
+ EducationLevel_j 1 418.67 446.67
+ Debt 1 418.67 446.67
- Married_y 1 424.02 448.02
- CreditScore 1 424.85 448.85
- EducationLevel_ff 1 425.37 449.37
- EducationLevel_x 1 427.98 451.98
- Citizen_p 1 437.60 461.60
- Income 1 437.60 461.60
- PriorDefault_No 1 641.75 665.75
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Step: AIC=444.52
Approved ~ PriorDefault_No + CreditScore + Income + Citizen_p +
EducationLevel_x + Married_y + EducationLevel_cc + EducationLevel_ff +
Employed_Unemployed + Married_u + Ethnicity_bb + Ethnicity_n +
EducationLevel_w
Df Deviance AIC
<none> 416.52 444.52
- EducationLevel_w 1 418.67 444.67
+ YearsEmployed 1 414.76 444.76
- Ethnicity_bb 1 418.78 444.78
- Ethnicity_n 1 419.04 445.04
+ Ethnicity_h 1 415.20 445.20
+ Ethnicity_v 1 415.25 445.25
+ Ethnicity_z 1 415.38 445.38
+ Ethnicity_j 1 415.47 445.47
- Employed_Unemployed 1 419.52 445.52
+ EducationLevel_q 1 415.60 445.60
+ EducationLevel_c 1 415.65 445.65
+ EducationLevel_k 1 415.84 445.84
+ EducationLevel_i 1 415.89 445.89
+ EducationLevel_aa 1 415.89 445.89
+ Age 1 416.24 446.24
+ EducationLevel_r 1 416.29 446.29
+ EducationLevel_e 1 416.35 446.35
+ Ethnicity_dd 1 416.41 446.41
+ Ethnicity_o 1 416.44 446.44
+ EducationLevel_m 1 416.48 446.48
+ EducationLevel_d 1 416.48 446.48
+ Citizen_g 1 416.49 446.49
+ Citizen_s 1 416.49 446.49
+ Debt 1 416.51 446.51
- Married_u 1 420.51 446.51
+ Ethnicity_ff 1 416.51 446.51
+ EducationLevel_j 1 416.52 446.52
- EducationLevel_cc 1 420.67 446.67
- Married_y 1 421.91 447.91
- EducationLevel_ff 1 422.30 448.30
- CreditScore 1 423.29 449.29
- EducationLevel_x 1 426.67 452.67
- Income 1 435.02 461.02
- Citizen_p 1 435.73 461.73
- PriorDefault_No 1 639.97 665.97
With a final AIC of 444.52, we keep the following variables, which we retrieve with the formula component of the result:
step$formula
Approved ~ PriorDefault_No + CreditScore + Income + Citizen_p +
EducationLevel_x + Married_y + EducationLevel_cc + EducationLevel_ff +
Employed_Unemployed + Married_u + Ethnicity_bb + Ethnicity_n +
EducationLevel_w
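As a quick check on the stepwise table, the AIC of a logistic regression on 0/1 responses equals the residual deviance plus twice the number of estimated coefficients. The sketch below redoes that arithmetic with values read from the last step; the names `deviance_final` and `n_coef` are illustrative and not part of the original analysis.

```r
# Values read from the final step of the stepwise output.
deviance_final <- 416.52   # residual deviance of the selected model
n_coef <- 13 + 1           # 13 predictors plus the intercept

# For 0/1 responses the deviance equals -2 * log-likelihood,
# so AIC = deviance + 2 * (number of coefficients).
aic_final <- deviance_final + 2 * n_coef
print(aic_final)

# Dropping EducationLevel_w leaves 13 coefficients and a deviance of 418.67.
aic_without_w <- 418.67 + 2 * (n_coef - 1)
print(aic_without_w)
```

Both results match the `<none>` and `- EducationLevel_w` rows of the last table, which is why the search stops there: no single addition or removal lowers the AIC further.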
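We select the variables indicated in the previous step. Note that the selection below lists Ethnicity_h where the formula above lists Ethnicity_bb; the outputs that follow correspond to the code as written: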
df <- df[c("Approved","PriorDefault_No","CreditScore","Income","Citizen_p","EducationLevel_x","Married_y","EducationLevel_cc","EducationLevel_ff","Employed_Unemployed","Married_u","EducationLevel_w","Ethnicity_n","Ethnicity_h")]
X <- data.matrix(subset(df, select= - Approved))
Y <- as.double(as.matrix(df$Approved))
# TRAIN: first 590 instances (R indexing starts at 1)
X_Train <- X[1:590, ]
Y_Train <- Y[1:590]
# TEST: last 100 instances
X_Test <- X[591:nrow(X), ]
Y_Test <- Y[591:length(Y)]
We have a binary classification problem (approve the credit or not), so we will build a logistic regression model.
We need a model that predicts as well as possible whether to approve a credit, while also keeping the number of false positives low, since false positives would make our bank lose money by granting credit it should not. For that reason, we will use the area under the ROC curve (AUC) as our evaluation metric.
The ROC curve plots the false positive rate (x-axis) against the true positive rate (y-axis) for candidate thresholds between 0.0 and 1.0. The area under this curve summarizes how well the model ranks approvals above denials across all thresholds, which makes it a reasonable metric when we want good predictions and few false positives at the same time.
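To make the construction concrete, here is a minimal base R sketch that builds the ROC points by sweeping a threshold and integrates them with the trapezoidal rule. The `labels` and `scores` vectors are made-up values for illustration only, not data from this analysis.

```r
# Hypothetical labels (1 = approved) and model scores, for illustration only.
labels <- c(1, 1, 0, 1, 0, 0)
scores <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.2)

# Sweep thresholds from high to low; each threshold gives one (FPR, TPR) point.
thresholds <- c(Inf, sort(unique(scores), decreasing = TRUE))
tpr <- sapply(thresholds, function(t) sum(scores >= t & labels == 1) / sum(labels == 1))
fpr <- sapply(thresholds, function(t) sum(scores >= t & labels == 0) / sum(labels == 0))

# Area under the curve by the trapezoidal rule.
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
print(data.frame(threshold = thresholds, fpr = fpr, tpr = tpr))
print(auc)
```

The resulting area equals the fraction of (positive, negative) pairs in which the positive example receives the higher score, here 8 out of 9.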
To obtain better results we will also use regularization. To compare Ridge and Lasso we use the Elastic-Net model from glmnet, with alpha=0 for Ridge and alpha=1 for Lasso.
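For reference, the penalized objective that glmnet minimizes for a binomial model can be written as follows, as far as its documentation describes it. The symbols N, x_i, y_i, beta_0 and beta are notation introduced here and do not appear in the original text.

```latex
\min_{\beta_0,\,\beta}\;
-\frac{1}{N}\sum_{i=1}^{N}
\Bigl[\, y_i\,(\beta_0 + x_i^{\top}\beta)
      - \log\bigl(1 + e^{\beta_0 + x_i^{\top}\beta}\bigr) \Bigr]
\;+\; \lambda \Bigl[\, \tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^2
      + \alpha\,\lVert\beta\rVert_1 \Bigr]
```

With alpha = 0 only the squared (Ridge) term remains, and with alpha = 1 only the absolute-value (Lasso) term remains; lambda is the penalty strength that cv.glmnet selects by cross-validation.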
cv.ridge <- cv.glmnet(X_Train, Y_Train, family='binomial', alpha=0, parallel=TRUE, standardize=TRUE, type.measure='auc')
Warning: executing %dopar% sequentially: no parallel backend registered
plot(cv.ridge)
coef(cv.ridge, s=cv.ridge$lambda.min)
14 x 1 sparse Matrix of class "dgCMatrix"
s1
(Intercept) -1.0025844
PriorDefault_No 1.8342115
CreditScore 0.3198329
Income 0.2120270
Citizen_p 0.9935918
EducationLevel_x 0.8695703
Married_y -0.2839611
EducationLevel_cc 0.7100560
EducationLevel_ff -0.6761165
Employed_Unemployed -0.7201874
Married_u 0.1999485
EducationLevel_w 0.2572430
Ethnicity_n 1.2953468
Ethnicity_h 0.4203607
cv.lasso <- cv.glmnet(X_Train, Y_Train, family='binomial', alpha=1, parallel=TRUE, standardize=TRUE, type.measure='auc')
plot(cv.lasso)
coef(cv.lasso, s=cv.lasso$lambda.min)
14 x 1 sparse Matrix of class "dgCMatrix"
s1
(Intercept) -0.006191541
PriorDefault_No 3.527374925
CreditScore 0.576209872
Income 2.214498560
Citizen_p 3.067028183
EducationLevel_x 2.000294335
Married_y -2.620905366
EducationLevel_cc 1.542175224
EducationLevel_ff -1.044945386
Employed_Unemployed -0.727078271
Married_u -1.884715933
EducationLevel_w 0.603223164
Ethnicity_n 3.621548274
Ethnicity_h 0.584561098
Cross-validated AUC for Ridge:
max(cv.ridge$cvm)
[1] 0.9269096
Cross-validated AUC for Lasso:
max(cv.lasso$cvm)
[1] 0.9256555
max(cv.ridge$cvm) - max(cv.lasso$cvm)
[1] 0.001254151
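Both values are nearly identical, but Ridge gives a slightly better fit (a difference of about 0.001 in cross-validated AUC).
We now test the Ridge-regularized logistic regression model on the test set. Note that predict returns values on the link (log-odds) scale by default, so the 0.5 cutoff below is applied to log-odds rather than to probabilities; passing type="response" would apply it to probabilities: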
y_pred <- as.numeric(predict.glmnet(cv.ridge$glmnet.fit, newx=X_Test, s=cv.ridge$lambda.min)>.5)
y_pred
[1] 1 1 0 0 0 0 1 1 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[44] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[87] 0 0 0 0 0 0 0 0 0 0 0 0 0 0
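The cutoff used in the prediction call deserves a second look: predict on a glmnet fit returns the linear predictor (log-odds) unless type = "response" is requested. The base R sketch below shows which probability a 0.5 cutoff on the log-odds scale corresponds to; the `log_odds` values are hypothetical and only illustrate how the two cutoffs can disagree.

```r
# A cutoff of 0.5 on the link (log-odds) scale corresponds to this probability.
prob_at_link_cutoff <- plogis(0.5)
print(prob_at_link_cutoff)

# Hypothetical log-odds values, for illustration only.
log_odds <- c(-2.0, 0.2, 0.6, 3.0)
pred_link_cutoff <- as.numeric(log_odds > 0.5)          # cutoff applied to log-odds
pred_prob_cutoff <- as.numeric(plogis(log_odds) > 0.5)  # cutoff applied to probabilities
print(rbind(log_odds, pred_link_cutoff, pred_prob_cutoff))
```

A log-odds cutoff of 0.5 is therefore stricter than a probability cutoff of 0.5 (about 0.62 instead), which is consistent with the small number of predicted approvals in the test set.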
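Now we build a confusion matrix to compare the actual and the predicted outcomes. caret's confusionMatrix expects the predictions as its first argument and the reference as its second; here they are passed the other way around, so in the output below the rows labelled Prediction hold the actual values and the columns labelled Reference hold the predicted ones: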
conf_matrix <- confusionMatrix(as.factor(Y_Test), as.factor(y_pred), mode="everything", positive = "0")
conf_matrix
Confusion Matrix and Statistics
Reference
Prediction 0 1
0 85 1
1 8 6
Accuracy : 0.91
95% CI : (0.836, 0.958)
No Information Rate : 0.93
P-Value [Acc > NIR] : 0.8380
Kappa : 0.5273
Mcnemar's Test P-Value : 0.0455
Sensitivity : 0.9140
Specificity : 0.8571
Pos Pred Value : 0.9884
Neg Pred Value : 0.4286
Precision : 0.9884
Recall : 0.9140
F1 : 0.9497
Prevalence : 0.9300
Detection Rate : 0.8500
Detection Prevalence : 0.8600
Balanced Accuracy : 0.8856
'Positive' Class : 0
With class 0 as the positive class, the model has an Accuracy of 91%, a Recall of 91.40%, an F1 of 94.97% and a Precision of 98.84%.
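These figures can be recomputed from the four cells of the confusion matrix. The sketch below does so in base R, taking class 0 as the positive class as in the caret output; the variable names are illustrative.

```r
# Cells of the confusion matrix, with class 0 as the positive class
# and the layout used in the caret output (rows vs. columns as printed).
tp <- 85  # row 0, column 0
fp <- 1   # row 0, column 1
fn <- 8   # row 1, column 0
tn <- 6   # row 1, column 1

accuracy  <- (tp + tn) / (tp + fp + fn + tn)
precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)
f1        <- 2 * precision * recall / (precision + recall)

print(round(c(accuracy = accuracy, precision = precision,
              recall = recall, f1 = f1), 4))
```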
cTab <- table(Y_Test, y_pred) # Confusion Matrix
addmargins(cTab)
y_pred
Y_Test 0 1 Sum
0 85 1 86
1 8 6 14
Sum 93 7 100
In the confusion matrix we had only one false positive out of 100 predictions; 6 applications were correctly approved and 85 were correctly denied. We also had 8 false negatives.
Variables with the most influence in our model:
coef(cv.ridge, s=cv.ridge$lambda.min)
14 x 1 sparse Matrix of class "dgCMatrix"
s1
(Intercept) -1.0025844
PriorDefault_No 1.8342115
CreditScore 0.3198329
Income 0.2120270
Citizen_p 0.9935918
EducationLevel_x 0.8695703
Married_y -0.2839611
EducationLevel_cc 0.7100560
EducationLevel_ff -0.6761165
Employed_Unemployed -0.7201874
Married_u 0.1999485
EducationLevel_w 0.2572430
Ethnicity_n 1.2953468
Ethnicity_h 0.4203607
The following variables have the largest positive coefficients: PriorDefault_No, Ethnicity_n and Citizen_p. In contrast, having “EducationLevel_ff” and being unemployed (“Employed_Unemployed”) have the largest negative impact on credit approval.
exp(coef(cv.ridge, s=cv.ridge$lambda.min))
14 x 1 Matrix of class "dgeMatrix"
s1
(Intercept) 0.3669299
PriorDefault_No 6.2601958
CreditScore 1.3768976
Income 1.2361812
Citizen_p 2.7009182
EducationLevel_x 2.3858854
Married_y 0.7527960
EducationLevel_cc 2.0341053
EducationLevel_ff 0.5085883
Employed_Unemployed 0.4866611
Married_u 1.2213398
EducationLevel_w 1.2933594
Ethnicity_n 3.6522624
Ethnicity_h 1.5225107
The most influential factor is PriorDefault_No, which multiplies the odds of obtaining a loan by about 6.26 (an increase of roughly 526%), followed by Ethnicity_n, which multiplies them by about 3.65 (roughly 265%). The variables with the strongest negative influence are EducationLevel_ff and Employed_Unemployed, which reduce the odds by about 49% and 51% respectively.
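The step from a coefficient to a percentage change in the odds can be sketched as follows. The coefficients are copied from the Ridge output above; the names `odds_ratio` and `pct_change_in_odds` are illustrative.

```r
# Ridge coefficients copied from the output above (log-odds scale).
coefs <- c(PriorDefault_No     =  1.8342115,
           Ethnicity_n         =  1.2953468,
           EducationLevel_ff   = -0.6761165,
           Employed_Unemployed = -0.7201874)

# exp(coefficient) is the multiplicative change in the odds of approval
# per one-unit increase in the variable.
odds_ratio <- exp(coefs)

# Percentage change in the odds; negative values are reductions.
pct_change_in_odds <- (odds_ratio - 1) * 100
print(round(cbind(odds_ratio, pct_change_in_odds), 1))
```

These percentages describe odds, not probabilities: the change in approval probability that goes with a given odds ratio depends on the baseline probability.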
#6. If we gain €100 for each true positive and lose €20 for each false positive, what monetary value will the model generate, given the confusion matrix of the model with the highest AUC (using the test metrics)?
sensibilidad <- round(conf_matrix$byClass["Sensitivity"], 3)
especificidad <- round(conf_matrix$byClass["Specificity"], 3)
rent_esp <- sensibilidad*100 - especificidad*20
rent_esp
Sensitivity
74.26
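The expression above weights rates (sensitivity and specificity, both computed with class 0 as the positive class) rather than counts. An alternative reading of the question, sketched here under the assumption that a positive is an approved credit (class 1), multiplies the gain and the loss by the number of true and false positives in the test table. The variable names are illustrative.

```r
# Counts from the test-set table above, treating approval (1) as the positive class.
true_positives  <- 6   # actually approved and predicted approved
false_positives <- 1   # actually denied but predicted approved

gain_per_tp <- 100     # euros gained per true positive
loss_per_fp <- 20      # euros lost per false positive

value <- true_positives * gain_per_tp - false_positives * loss_per_fp
print(value)
```

Under that assumption the model would generate €580 on the 100 test instances.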