Using the Universal Bank data, determine the factors which influence whether a customer takes out a loan.
CustomerID PersonalLoan Age Experience
Min. : 1 Min. :0.000 Min. :23.00 Min. :-3.0
1st Qu.:1251 1st Qu.:0.000 1st Qu.:35.00 1st Qu.:10.0
Median :2500 Median :0.000 Median :45.00 Median :20.0
Mean :2500 Mean :0.096 Mean :45.34 Mean :20.1
3rd Qu.:3750 3rd Qu.:0.000 3rd Qu.:55.00 3rd Qu.:30.0
Max. :5000 Max. :1.000 Max. :67.00 Max. :43.0
Income ZIP.Code Family CCAvg
Min. : 8.00 Min. : 9307 Min. :1.000 Min. : 0.000
1st Qu.: 39.00 1st Qu.:91911 1st Qu.:1.000 1st Qu.: 0.700
Median : 64.00 Median :93437 Median :2.000 Median : 1.500
Mean : 73.77 Mean :93153 Mean :2.396 Mean : 1.938
3rd Qu.: 98.00 3rd Qu.:94608 3rd Qu.:3.000 3rd Qu.: 2.500
Max. :224.00 Max. :96651 Max. :4.000 Max. :10.000
Education Mortgage SecuritiesAccount CDAccount
Min. :1.000 Min. : 0.0 Min. :0.0000 Min. :0.0000
1st Qu.:1.000 1st Qu.: 0.0 1st Qu.:0.0000 1st Qu.:0.0000
Median :2.000 Median : 0.0 Median :0.0000 Median :0.0000
Mean :1.881 Mean : 56.5 Mean :0.1044 Mean :0.0604
3rd Qu.:3.000 3rd Qu.:101.0 3rd Qu.:0.0000 3rd Qu.:0.0000
Max. :3.000 Max. :635.0 Max. :1.0000 Max. :1.0000
Online CreditCard
Min. :0.0000 Min. :0.000
1st Qu.:0.0000 1st Qu.:0.000
Median :1.0000 Median :0.000
Mean :0.5968 Mean :0.294
3rd Qu.:1.0000 3rd Qu.:1.000
Max. :1.0000 Max. :1.000
'data.frame': 5000 obs. of 14 variables:
$ CustomerID : int 1 2 3 4 5 6 7 8 9 10 ...
$ PersonalLoan : int 0 0 0 0 0 0 0 0 0 1 ...
$ Age : int 25 45 39 35 35 37 53 50 35 34 ...
$ Experience : int 1 19 15 9 8 13 27 24 10 9 ...
$ Income : int 49 34 11 100 45 29 72 22 81 180 ...
$ ZIP.Code : int 91107 90089 94720 94112 91330 92121 91711 93943 90089 93023 ...
$ Family : int 4 3 1 1 4 4 2 1 3 1 ...
$ CCAvg : num 1.6 1.5 1 2.7 1 0.4 1.5 0.3 0.6 8.9 ...
$ Education : int 1 1 1 2 2 2 2 3 2 3 ...
$ Mortgage : int 0 0 0 0 0 155 0 0 104 0 ...
$ SecuritiesAccount: int 1 1 0 0 0 0 0 0 0 0 ...
$ CDAccount : int 0 0 0 0 0 0 0 0 0 0 ...
$ Online : int 0 0 0 0 0 1 1 0 1 0 ...
$ CreditCard : int 0 0 0 0 1 0 0 1 0 0 ...
Call:
glm(formula = PersonalLoan ~ ., family = binomial("logit"), data = train[,
-5])
Deviance Residuals:
Min 1Q Median 3Q Max
-2.2181 -0.1924 -0.0753 -0.0285 3.9633
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.012e+01 2.129e+00 -4.755 1.99e-06 ***
Age -1.372e-01 8.091e-02 -1.696 0.08990 .
Experience 1.441e-01 8.035e-02 1.793 0.07293 .
Income 5.355e-02 3.111e-03 17.212 < 2e-16 ***
Family 6.300e-01 8.896e-02 7.082 1.43e-12 ***
CCAvg 1.519e-01 4.864e-02 3.122 0.00179 **
Education 1.876e+00 1.421e-01 13.198 < 2e-16 ***
Mortgage 7.879e-04 6.957e-04 1.132 0.25743
SecuritiesAccount -8.248e-01 3.342e-01 -2.468 0.01358 *
CDAccount 3.779e+00 3.903e-01 9.682 < 2e-16 ***
Online -8.369e-01 1.935e-01 -4.326 1.52e-05 ***
CreditCard -1.124e+00 2.526e-01 -4.450 8.58e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 2208.49 on 3542 degrees of freedom
Residual deviance: 873.29 on 3531 degrees of freedom
AIC: 897.29
Number of Fisher Scoring iterations: 8
Call:
glm(formula = PersonalLoan ~ ., family = binomial("logit"), data = train[,
-c(2, 5)])
Deviance Residuals:
Min 1Q Median 3Q Max
-2.2302 -0.1955 -0.0741 -0.0284 3.9341
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.357e+01 7.195e-01 -18.863 < 2e-16 ***
Experience 8.572e-03 7.808e-03 1.098 0.27229
Income 5.377e-02 3.105e-03 17.317 < 2e-16 ***
Family 6.280e-01 8.897e-02 7.058 1.69e-12 ***
CCAvg 1.535e-01 4.856e-02 3.161 0.00157 **
Education 1.838e+00 1.404e-01 13.094 < 2e-16 ***
Mortgage 7.454e-04 6.946e-04 1.073 0.28324
SecuritiesAccount -8.227e-01 3.324e-01 -2.475 0.01333 *
CDAccount 3.774e+00 3.892e-01 9.698 < 2e-16 ***
Online -8.263e-01 1.929e-01 -4.284 1.83e-05 ***
CreditCard -1.107e+00 2.518e-01 -4.396 1.10e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 2208.49 on 3542 degrees of freedom
Residual deviance: 876.34 on 3532 degrees of freedom
AIC: 898.34
Number of Fisher Scoring iterations: 8
Call:
glm(formula = PersonalLoan ~ ., family = binomial("logit"), data = train[,
-c(2, 3, 5, 9)])
Deviance Residuals:
Min 1Q Median 3Q Max
-2.2974 -0.1969 -0.0759 -0.0288 3.9340
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -13.301116 0.679846 -19.565 < 2e-16 ***
Income 0.053953 0.003086 17.481 < 2e-16 ***
Family 0.626284 0.088970 7.039 1.93e-12 ***
CCAvg 0.142326 0.048053 2.962 0.00306 **
Education 1.820624 0.139001 13.098 < 2e-16 ***
SecuritiesAccount -0.822871 0.331392 -2.483 0.01303 *
CDAccount 3.784803 0.389120 9.727 < 2e-16 ***
Online -0.817354 0.192325 -4.250 2.14e-05 ***
CreditCard -1.109583 0.251602 -4.410 1.03e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 2208.49 on 3542 degrees of freedom
Residual deviance: 878.72 on 3534 degrees of freedom
AIC: 896.72
Number of Fisher Scoring iterations: 8
Analysis of Deviance Table
Model: binomial, link: logit
Response: PersonalLoan
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 3542 2208.49
Income 1 803.89 3541 1404.60 < 2.2e-16 ***
Family 1 136.39 3540 1268.21 < 2.2e-16 ***
CCAvg 1 6.16 3539 1262.05 0.0130814 *
Education 1 263.31 3538 998.73 < 2.2e-16 ***
SecuritiesAccount 1 4.15 3537 994.58 0.0415327 *
CDAccount 1 80.67 3536 913.91 < 2.2e-16 ***
Online 1 13.07 3535 900.83 0.0002997 ***
CreditCard 1 22.12 3534 878.72 2.568e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Model 1 Accuracy : 0.9444063
Model 2 Accuracy : 0.9471517
Model 3 Accuracy : 0.9471517
Call:
glm(formula = PersonalLoan ~ ., family = binomial("probit"),
data = train[, -5])
Deviance Residuals:
Min 1Q Median 3Q Max
-2.1902 -0.1957 -0.0486 -0.0077 4.5099
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.1497213 1.0431910 -4.937 7.95e-07 ***
Age -0.0665641 0.0400309 -1.663 0.096349 .
Experience 0.0685902 0.0397951 1.724 0.084783 .
Income 0.0271152 0.0015036 18.034 < 2e-16 ***
Family 0.3128836 0.0448137 6.982 2.91e-12 ***
CCAvg 0.0901009 0.0254491 3.540 0.000399 ***
Education 0.9096374 0.0694957 13.089 < 2e-16 ***
Mortgage 0.0003759 0.0003665 1.026 0.305032
SecuritiesAccount -0.4588586 0.1723938 -2.662 0.007775 **
CDAccount 2.0326038 0.1988450 10.222 < 2e-16 ***
Online -0.4311810 0.0988740 -4.361 1.30e-05 ***
CreditCard -0.6217245 0.1289869 -4.820 1.44e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 2208.49 on 3542 degrees of freedom
Residual deviance: 888.16 on 3531 degrees of freedom
AIC: 912.16
Number of Fisher Scoring iterations: 8
Call:
glm(formula = PersonalLoan ~ ., family = binomial("probit"),
data = train[, -c(2, 5)])
Deviance Residuals:
Min 1Q Median 3Q Max
-2.2011 -0.1975 -0.0466 -0.0076 4.4574
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -6.8154342 0.3328028 -20.479 < 2e-16 ***
Experience 0.0027604 0.0040327 0.685 0.493658
Income 0.0272423 0.0015029 18.126 < 2e-16 ***
Family 0.3113072 0.0448858 6.936 4.05e-12 ***
CCAvg 0.0900172 0.0254321 3.540 0.000401 ***
Education 0.8902560 0.0683558 13.024 < 2e-16 ***
Mortgage 0.0003577 0.0003665 0.976 0.329010
SecuritiesAccount -0.4562660 0.1718496 -2.655 0.007930 **
CDAccount 2.0308815 0.1984530 10.234 < 2e-16 ***
Online -0.4282554 0.0986958 -4.339 1.43e-05 ***
CreditCard -0.6135917 0.1285642 -4.773 1.82e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 2208.49 on 3542 degrees of freedom
Residual deviance: 890.87 on 3532 degrees of freedom
AIC: 912.87
Number of Fisher Scoring iterations: 8
Call:
glm(formula = PersonalLoan ~ ., family = binomial("probit"),
data = train[, -c(2, 3, 5, 9)])
Deviance Residuals:
Min 1Q Median 3Q Max
-2.2561 -0.1980 -0.0484 -0.0078 4.4804
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -6.731218 0.314365 -21.412 < 2e-16 ***
Income 0.027386 0.001496 18.310 < 2e-16 ***
Family 0.310808 0.044850 6.930 4.21e-12 ***
CCAvg 0.086265 0.025234 3.419 0.000629 ***
Education 0.885387 0.067939 13.032 < 2e-16 ***
SecuritiesAccount -0.456854 0.171773 -2.660 0.007823 **
CDAccount 2.036395 0.198297 10.269 < 2e-16 ***
Online -0.426754 0.098521 -4.332 1.48e-05 ***
CreditCard -0.614486 0.128442 -4.784 1.72e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 2208.49 on 3542 degrees of freedom
Residual deviance: 892.29 on 3534 degrees of freedom
AIC: 910.29
Number of Fisher Scoring iterations: 8
Analysis of Deviance Table
Model: binomial, link: probit
Response: PersonalLoan
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 3542 2208.49
Income 1 836.88 3541 1371.61 < 2.2e-16 ***
Family 1 110.58 3540 1261.03 < 2.2e-16 ***
CCAvg 1 8.96 3539 1252.07 0.0027525 **
Education 1 230.53 3538 1021.54 < 2.2e-16 ***
SecuritiesAccount 1 5.12 3537 1016.42 0.0236402 *
CDAccount 1 86.24 3536 930.18 < 2.2e-16 ***
Online 1 12.81 3535 917.37 0.0003444 ***
CreditCard 1 25.07 3534 892.29 5.515e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Model 1 Accuracy : 0.9361702
Model 2 Accuracy : 0.9341112
Model 3 Accuracy : 0.9368566
Call:
glm(formula = PersonalLoan ~ (Income + Family + CCAvg + Education +
Mortgage + SecuritiesAccount + CDAccount + Online + CreditCard)^2,
family = binomial("probit"), data = train)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.3445 -0.0416 -0.0001 0.0000 4.0472
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.907e+00 1.323e+00 2.196 0.02807 *
Income -7.429e-02 1.113e-02 -6.672 2.52e-11 ***
Family -1.254e+00 3.683e-01 -3.406 0.00066 ***
CCAvg 7.311e-01 2.695e-01 2.713 0.00667 **
Education -4.988e+00 7.170e-01 -6.956 3.50e-12 ***
Mortgage -5.121e-04 4.142e-03 -0.124 0.90160
SecuritiesAccount -1.098e+00 2.329e+00 -0.472 0.63711
CDAccount -3.074e+00 2.466e+01 -0.125 0.90078
Online 8.387e-01 8.788e-01 0.954 0.33990
CreditCard 2.397e-01 1.230e+00 0.195 0.84544
Income:Family 2.168e-02 3.182e-03 6.812 9.63e-12 ***
Income:CCAvg -7.806e-03 1.428e-03 -5.466 4.60e-08 ***
Income:Education 5.661e-02 5.953e-03 9.510 < 2e-16 ***
Income:Mortgage -4.096e-06 2.015e-05 -0.203 0.83891
Income:SecuritiesAccount 1.922e-02 1.423e-02 1.350 0.17704
Income:CDAccount 9.702e-03 1.329e-02 0.730 0.46528
Income:Online 8.413e-05 5.403e-03 0.016 0.98758
Income:CreditCard -4.249e-03 8.317e-03 -0.511 0.60942
Family:CCAvg 8.091e-02 5.162e-02 1.568 0.11698
Family:Education -4.181e-01 9.711e-02 -4.306 1.66e-05 ***
Family:Mortgage 8.849e-04 6.708e-04 1.319 0.18707
Family:SecuritiesAccount -9.046e-01 4.388e-01 -2.062 0.03923 *
Family:CDAccount 5.192e-01 3.925e-01 1.323 0.18588
Family:Online -2.420e-01 1.659e-01 -1.459 0.14465
Family:CreditCard -1.518e-01 2.120e-01 -0.716 0.47412
CCAvg:Education 2.726e-01 6.742e-02 4.044 5.26e-05 ***
CCAvg:Mortgage -6.726e-05 3.519e-04 -0.191 0.84841
CCAvg:SecuritiesAccount -6.553e-03 1.967e-01 -0.033 0.97343
CCAvg:CDAccount -2.200e-01 2.141e-01 -1.027 0.30423
CCAvg:Online -1.659e-01 1.034e-01 -1.605 0.10843
CCAvg:CreditCard 6.944e-02 1.345e-01 0.516 0.60573
Education:Mortgage -4.539e-04 9.182e-04 -0.494 0.62106
Education:SecuritiesAccount 1.130e-01 4.775e-01 0.237 0.81296
Education:CDAccount -6.369e-01 5.396e-01 -1.180 0.23781
Education:Online 3.504e-02 2.245e-01 0.156 0.87599
Education:CreditCard 1.341e-01 2.971e-01 0.451 0.65183
Mortgage:SecuritiesAccount 1.271e-03 2.933e-03 0.433 0.66472
Mortgage:CDAccount 6.270e-04 3.610e-03 0.174 0.86209
Mortgage:Online -2.166e-04 1.420e-03 -0.153 0.87871
Mortgage:CreditCard -1.788e-03 2.362e-03 -0.757 0.44911
SecuritiesAccount:CDAccount 5.886e+00 2.452e+01 0.240 0.81028
SecuritiesAccount:Online -4.477e+00 2.442e+01 -0.183 0.85452
SecuritiesAccount:CreditCard -1.745e+00 2.288e+00 -0.763 0.44568
CDAccount:Online 4.128e+00 2.443e+01 0.169 0.86583
CDAccount:CreditCard 1.631e+00 2.489e+00 0.655 0.51216
Online:CreditCard -2.139e+00 1.018e+00 -2.102 0.03555 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 2208.5 on 3542 degrees of freedom
Residual deviance: 354.9 on 3497 degrees of freedom
AIC: 446.9
Number of Fisher Scoring iterations: 16
Call:
glm(formula = PersonalLoan ~ Income + Family + CCAvg + Education +
Mortgage + SecuritiesAccount + Online + CreditCard + (Income *
Family) + (Income * CCAvg) + (Income + Education) + (Family +
Education) + (Family * SecuritiesAccount) + (CCAvg * Education) +
(Online * CreditCard), family = binomial("probit"), data = train)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.2478 -0.1585 -0.0172 -0.0003 3.8019
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.1131527 0.6624813 -7.718 1.18e-14 ***
Income 0.0165651 0.0041332 4.008 6.13e-05 ***
Family -1.1006093 0.1797715 -6.122 9.23e-10 ***
CCAvg 0.5371912 0.1468435 3.658 0.000254 ***
Education 0.2644742 0.1356117 1.950 0.051149 .
Mortgage -0.0002530 0.0003996 -0.633 0.526722
SecuritiesAccount 0.9596056 0.3375711 2.843 0.004474 **
Online -0.0804624 0.1218678 -0.660 0.509098
CreditCard -0.0652945 0.1657301 -0.394 0.693595
Income:Family 0.0135094 0.0015871 8.512 < 2e-16 ***
Income:CCAvg -0.0055317 0.0008152 -6.785 1.16e-11 ***
Family:SecuritiesAccount -0.3264840 0.1537070 -2.124 0.033665 *
CCAvg:Education 0.2228096 0.0363826 6.124 9.12e-10 ***
Online:CreditCard 0.0423168 0.2274705 0.186 0.852420
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 2208.49 on 3542 degrees of freedom
Residual deviance: 740.84 on 3529 degrees of freedom
AIC: 768.84
Number of Fisher Scoring iterations: 9
Call:
glm(formula = PersonalLoan ~ Income + Family + CCAvg + Education +
(Income * Family) + (Income * CCAvg) + (CCAvg * Education),
family = binomial("probit"), data = train)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.1961 -0.1596 -0.0183 -0.0003 3.7536
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.0038181 0.6426583 -7.786 6.91e-15 ***
Income 0.0162185 0.0040693 3.986 6.73e-05 ***
Family -1.1420517 0.1765918 -6.467 9.98e-11 ***
CCAvg 0.5510001 0.1450108 3.800 0.000145 ***
Education 0.2526840 0.1336823 1.890 0.058734 .
Income:Family 0.0134833 0.0015653 8.614 < 2e-16 ***
Income:CCAvg -0.0055975 0.0008087 -6.921 4.48e-12 ***
CCAvg:Education 0.2226771 0.0358823 6.206 5.44e-10 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 2208.49 on 3542 degrees of freedom
Residual deviance: 750.57 on 3535 degrees of freedom
AIC: 766.57
Number of Fisher Scoring iterations: 9
Model 1 Accuracy : 0.9677419
Model 2 Accuracy : 0.9464653
Model 3 Accuracy : 0.9361702
Call:
glm(formula = PersonalLoan ~ (Income + Family + CCAvg + Education +
Mortgage + SecuritiesAccount + CDAccount + Online + CreditCard)^2,
family = binomial("logit"), data = train)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.5005 -0.0689 -0.0049 -0.0001 3.7992
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 6.326e+00 2.845e+00 2.224 0.026172 *
Income -1.536e-01 2.378e-02 -6.457 1.07e-10 ***
Family -2.520e+00 7.817e-01 -3.224 0.001265 **
CCAvg 1.469e+00 5.449e-01 2.696 0.007019 **
Education -9.796e+00 1.528e+00 -6.409 1.46e-10 ***
Mortgage -1.501e-03 9.131e-03 -0.164 0.869445
SecuritiesAccount -3.101e+00 4.819e+00 -0.644 0.519876
CDAccount -4.888e+00 2.333e+01 -0.209 0.834073
Online 1.594e+00 1.889e+00 0.844 0.398660
CreditCard 8.392e-01 2.523e+00 0.333 0.739424
Income:Family 4.497e-02 6.892e-03 6.526 6.76e-11 ***
Income:CCAvg -1.508e-02 2.896e-03 -5.208 1.91e-07 ***
Income:Education 1.131e-01 1.307e-02 8.652 < 2e-16 ***
Income:Mortgage -1.313e-05 4.485e-05 -0.293 0.769651
Income:SecuritiesAccount 4.129e-02 2.847e-02 1.450 0.147023
Income:CDAccount 8.643e-03 2.596e-02 0.333 0.739201
Income:Online 3.639e-03 1.139e-02 0.320 0.749241
Income:CreditCard -5.140e-03 1.676e-02 -0.307 0.759129
Family:CCAvg 1.332e-01 1.019e-01 1.306 0.191480
Family:Education -8.974e-01 1.927e-01 -4.656 3.22e-06 ***
Family:Mortgage 1.782e-03 1.356e-03 1.315 0.188579
Family:SecuritiesAccount -1.611e+00 8.681e-01 -1.856 0.063449 .
Family:CDAccount 1.049e+00 7.650e-01 1.371 0.170223
Family:Online -5.588e-01 3.335e-01 -1.676 0.093803 .
Family:CreditCard -4.131e-01 4.227e-01 -0.977 0.328469
CCAvg:Education 4.966e-01 1.308e-01 3.796 0.000147 ***
CCAvg:Mortgage -1.873e-05 6.961e-04 -0.027 0.978535
CCAvg:SecuritiesAccount 1.867e-02 3.785e-01 0.049 0.960655
CCAvg:CDAccount -4.579e-01 4.074e-01 -1.124 0.260954
CCAvg:Online -2.965e-01 2.060e-01 -1.440 0.149976
CCAvg:CreditCard 1.068e-01 2.615e-01 0.408 0.683151
Education:Mortgage -4.146e-04 1.787e-03 -0.232 0.816584
Education:SecuritiesAccount 3.224e-01 9.382e-01 0.344 0.731134
Education:CDAccount -1.106e+00 1.011e+00 -1.094 0.274054
Education:Online -1.099e-01 4.337e-01 -0.253 0.799993
Education:CreditCard 7.417e-02 5.814e-01 0.128 0.898484
Mortgage:SecuritiesAccount 2.941e-03 5.572e-03 0.528 0.597546
Mortgage:CDAccount 1.463e-03 7.098e-03 0.206 0.836696
Mortgage:Online -3.771e-04 2.837e-03 -0.133 0.894246
Mortgage:CreditCard -3.576e-03 4.940e-03 -0.724 0.469155
SecuritiesAccount:CDAccount 1.108e+01 2.272e+01 0.488 0.625705
SecuritiesAccount:Online -8.727e+00 2.209e+01 -0.395 0.692796
SecuritiesAccount:CreditCard -3.188e+00 5.395e+00 -0.591 0.554570
CDAccount:Online 8.073e+00 2.215e+01 0.365 0.715476
CDAccount:CreditCard 2.618e+00 5.697e+00 0.459 0.645912
Online:CreditCard -3.940e+00 2.017e+00 -1.953 0.050842 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 2208.5 on 3542 degrees of freedom
Residual deviance: 351.9 on 3497 degrees of freedom
AIC: 443.9
Number of Fisher Scoring iterations: 13
Call:
glm(formula = PersonalLoan ~ Income + Family + CCAvg + Education +
Mortgage + SecuritiesAccount + Online + CreditCard + (Income *
Family) + (Income * CCAvg) + (Income + Education) + (Family +
Education) + (Family * SecuritiesAccount) + (CCAvg * Education) +
(Online * CreditCard), family = binomial("logit"), data = train)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.3243 -0.1662 -0.0428 -0.0080 3.5310
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.064e+01 1.338e+00 -7.955 1.79e-15 ***
Income 3.625e-02 8.086e-03 4.482 7.38e-06 ***
Family -1.958e+00 3.589e-01 -5.455 4.90e-08 ***
CCAvg 1.118e+00 2.924e-01 3.822 0.000133 ***
Education 7.780e-01 2.638e-01 2.949 0.003185 **
Mortgage -5.189e-04 7.544e-04 -0.688 0.491569
SecuritiesAccount 1.825e+00 6.483e-01 2.815 0.004877 **
Online -1.685e-01 2.330e-01 -0.723 0.469614
CreditCard -1.414e-01 3.111e-01 -0.454 0.649540
Income:Family 2.417e-02 3.121e-03 7.745 9.58e-15 ***
Income:CCAvg -1.042e-02 1.578e-03 -6.604 3.99e-11 ***
Family:SecuritiesAccount -6.058e-01 2.932e-01 -2.066 0.038811 *
CCAvg:Education 3.539e-01 6.996e-02 5.058 4.23e-07 ***
Online:CreditCard 5.752e-02 4.386e-01 0.131 0.895644
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 2208.49 on 3542 degrees of freedom
Residual deviance: 734.83 on 3529 degrees of freedom
AIC: 762.83
Number of Fisher Scoring iterations: 8
Call:
glm(formula = PersonalLoan ~ Income + Family + CCAvg + Education +
(Income * Family) + (Income * CCAvg) + (CCAvg * Education),
family = binomial("logit"), data = train)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.2586 -0.1672 -0.0458 -0.0086 3.5074
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -10.445839 1.299923 -8.036 9.30e-16 ***
Income 0.035434 0.007976 4.443 8.88e-06 ***
Family -2.018917 0.352261 -5.731 9.97e-09 ***
CCAvg 1.127809 0.288996 3.903 9.52e-05 ***
Education 0.762329 0.259703 2.935 0.00333 **
Income:Family 0.024012 0.003072 7.816 5.44e-15 ***
Income:CCAvg -0.010422 0.001562 -6.672 2.52e-11 ***
CCAvg:Education 0.352559 0.068912 5.116 3.12e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 2208.49 on 3542 degrees of freedom
Residual deviance: 744.32 on 3535 degrees of freedom
AIC: 760.32
Number of Fisher Scoring iterations: 8
Model 1 Accuracy : 0.9752917
Model 2 Accuracy : 0.949897
Model 3 Accuracy : 0.9492107
Call:
glm(formula = PersonalLoan ~ Income + Family + CCAvg + Education +
(Income * Family) + (Income * CCAvg) + (CCAvg * Education),
family = binomial("logit"), data = loandata)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.3559 -0.1737 -0.0485 -0.0092 3.4737
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -9.856679 1.062838 -9.274 < 2e-16 ***
Income 0.031210 0.006652 4.692 2.70e-06 ***
Family -2.213478 0.295643 -7.487 7.05e-14 ***
CCAvg 1.122034 0.238552 4.704 2.56e-06 ***
Education 0.675988 0.213177 3.171 0.00152 **
Income:Family 0.026453 0.002615 10.116 < 2e-16 ***
Income:CCAvg -0.010654 0.001311 -8.125 4.48e-16 ***
CCAvg:Education 0.360433 0.056897 6.335 2.38e-10 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 3162.0 on 4999 degrees of freedom
Residual deviance: 1083.7 on 4992 degrees of freedom
AIC: 1099.7
Number of Fisher Scoring iterations: 8
hidden: 3 thresh: 0.1 rep: 1/1 steps: 5411 error: 24.62304 time: 8.42 secs
1
error 24.623039043850
reached.threshold 0.075844597757
steps 5411.000000000000
Intercept.to.1layhid1 0.918161267379
Income.to.1layhid1 3.680179680571
Family.to.1layhid1 2.921263944181
CCAvg.to.1layhid1 2.922362929115
Education.to.1layhid1 3.282033655071
Intercept.to.1layhid2 5.415241402185
Income.to.1layhid2 -0.046654895118
Family.to.1layhid2 0.042598358687
CCAvg.to.1layhid2 -0.287273002051
Education.to.1layhid2 0.218828126997
Intercept.to.1layhid3 15.007094442606
Income.to.1layhid3 -0.009632312168
Family.to.1layhid3 -3.133100757383
CCAvg.to.1layhid3 0.253714105793
Education.to.1layhid3 -10.446531750019
Intercept.to.PersonalLoan 4.002556055111
1layhid.1.to.PersonalLoan 3.239241576689
1layhid.2.to.PersonalLoan -13.402393937834
1layhid.3.to.PersonalLoan -347.486401505084
Accuracy : 0.9835277968
---
title: "Loan Analysis"
output:
  flexdashboard::flex_dashboard:
    orientation: columns
    vertical_layout: fill
    self_contained: true
    social: ["twitter", "facebook", "menu"]
    source_code: embed
---
```{r load-packages}
library(neuralnet)
library(rbokeh)
library(psych)
```
```{r load-dataset}
loandata <- read.csv("C:\\Users\\teeja\\Desktop\\CleanGithubProjects\\scm651\\hw4_gb3\\scm651_homework_4_universal_bank.csv", na.strings =c(""))
#head(loandata)
```
# Data Description
## {.tabset}
### Column Descriptions
#### Background
Using the Universal Bank data, determine the factors which influence whether a customer
takes out a loan.
#### Universal Bank Data Fields
1. CustomerID : unique identifier
1. PersonalLoan : did the customer accept the personal loan offered? (1=Yes, 0=No)
1. Age : customer’s age
1. Experience : number of years of professional experience
1. Income : annual income of the customer ($000)
1. ZIP.Code : home address ZIP code
1. Family : family size of the customer
1. CCAvg : average spending on credit cards per month ($000)
1. Education : education level (1=undergraduate, 2=graduate, 3=advanced/professional)
1. Mortgage : value of house mortgage ($000)
1. SecuritiesAccount : does the customer have a securities account with the bank? (1=Yes, 0=No)
1. CDAccount : does the customer have a certificate of deposit with the bank? (1=Yes, 0=No)
1. Online : does the customer use Internet banking facilities? (1=Yes, 0=No)
1. CreditCard : does the customer use a credit card issued by Universal Bank? (1=Yes, 0=No)
### Summary
```{r summary-data}
summary(loandata)
```
### Data Structure
```{r}
str(loandata)
```
```{r}
### Data Transformation
#loandata$CustomerID <- as.factor(loandata$CustomerID)
originaldata <- loandata
# # transform loan data
# loandata$PersonalLoan <- as.factor(loandata$PersonalLoan)
# loandata$Income <- as.numeric(loandata$Income)
# loandata$ZIP.Code <- as.factor(loandata$ZIP.Code)
# loandata$Education <- as.ordered(loandata$Education)
# loandata$Mortgage <- as.numeric(loandata$Mortgage)
# loandata$SecuritiesAccount <- as.factor(loandata$SecuritiesAccount)
# loandata$CDAccount <- as.factor(loandata$CDAccount)
# loandata$Online <- as.factor(loandata$Online)
# loandata$CreditCard <- as.factor(loandata$CreditCard)
#
# str(loandata)
```
### Scatter Plot
```{r}
p = loandata %>% figure(title = "Scatter Plot", width = 1000) %>% ly_points(x=Experience, y=Income, size = CCAvg, color = PersonalLoan,
hover = list(ZIP.Code, CDAccount,Online, PersonalLoan) )
p
```
### Scatter Matrix
```{r}
pairs.panels(loandata[,-c(1,6)])
```
### Age and Experience
```{r}
p =loandata[,c(3,4)] %>% figure(title = "Age and Experience") %>%
ly_points(Age,Experience, hover = list(Age, Experience))
p
```
# Logit
## {.tabset}
### Model 1
```{r split-data}
ind <- sample(2, nrow(loandata), replace = TRUE, prob = c(0.7, 0.3))
train <- loandata[ind==1,]
test <- loandata[ind==2,]
# remove customer ID
train <- train[,-1]
test <- test[,-1]
# str(train)
#str(test)
#dim(train)
#dim(test)
```
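The split above draws `ind` at random, so the training and test sets, and every accuracy figure that follows, change on each render. A minimal sketch of a repeatable split, assuming an arbitrary seed (42 is a placeholder, not a value from the original analysis) and a small simulated data frame `demo` standing in for `loandata`:

```r
# Hypothetical stand-in for loandata; only the row count matters here
demo <- data.frame(id = 1:1000, y = rep(c(0, 1), times = c(900, 100)))

# split_once() is an illustrative helper, not part of the original code
split_once <- function(df, seed) {
  set.seed(seed)  # fix the RNG state so sample() gives the same draw each time
  ind <- sample(2, nrow(df), replace = TRUE, prob = c(0.7, 0.3))
  list(train = df[ind == 1, ], test = df[ind == 2, ])
}

a <- split_once(demo, seed = 42)  # 42 is an arbitrary placeholder seed
b <- split_once(demo, seed = 42)
same_split <- identical(a$train$id, b$train$id)  # same seed, same rows
```

Calling `set.seed()` once before `sample()` in the `split-data` chunk would have the same effect on the real data.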
```{r}
#remove zip code
model1 <- glm(PersonalLoan ~ .,family = binomial("logit"), data= train[,-5])
summary(model1)
```
### Model 2
```{r}
# remove Age and zip code
model2 <- glm(PersonalLoan ~ .,family = binomial("logit"), data= train[, -c(2,5)])
summary(model2)
```
### Model 3
```{r}
# remove Age, Experience, zip code, and Mortgage
model3 <- glm(PersonalLoan ~ .,family = binomial("logit"), data= train[,-c(2,3,5,9)])
summary(model3)
```
### ANOVA
```{r}
anova(model3, test="Chisq")
```
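`anova()` with `test = "Chisq"` adds the terms one at a time, in formula order, and compares each drop in residual deviance against a chi-squared distribution. As a small worked check, the sketch below recomputes the p-value for the CCAvg row from the deviances printed in one rendered run of the logit table; those numbers will differ for another random split.

```r
# Residual deviance before and after adding CCAvg (logit Model 3 table)
dev_before <- 1268.21
dev_after  <- 1262.05
dev_drop   <- dev_before - dev_after  # deviance explained by CCAvg on 1 df

# Upper-tail chi-squared probability for a drop of that size
p_value <- pchisq(dev_drop, df = 1, lower.tail = FALSE)
```

The result is close to the 0.0130814 reported in the table; the small gap comes from the rounded deviances.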
### Model Accuracy
```{r}
# type = "response" returns predicted probabilities; the default returns log-odds
prediction <- predict.glm(model3, newdata = test[,-c(1,2,3,5,9)], type = "response")
prediction <- ifelse(prediction > 0.5, 1, 0)
error <- mean(prediction != test$PersonalLoan)
prediction2 <- predict.glm(model2, newdata = test[,-c(1,2,5)], type = "response")
prediction2 <- ifelse(prediction2 > 0.5, 1, 0)
error2 <- mean(prediction2 != test$PersonalLoan)
prediction1 <- predict.glm(model1, newdata = test[,-c(1,5)], type = "response")
prediction1 <- ifelse(prediction1 > 0.5, 1, 0)
error1 <- mean(prediction1 != test$PersonalLoan)
cat("Model 1 Accuracy : ", 1-error1, '\n\n')
cat("Model 2 Accuracy : ", 1-error2, '\n\n')
cat("Model 3 Accuracy : ", 1-error, '\n')
```
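Only about 9.6% of customers accepted a loan, so a model that always predicts 0 would already score roughly 90% accuracy. A confusion matrix is therefore a useful companion to the accuracy figures. The sketch below uses simulated data (the data frame `sim` and its coefficients are invented for illustration and are not the Universal Bank data). It also shows that a 0.5 cutoff on predicted probabilities is the same rule as a 0 cutoff on the link scale.

```r
set.seed(1)
n <- 2000
sim <- data.frame(Income = runif(n, 8, 224))
# Invented coefficients, used only to generate a binary outcome
sim$PersonalLoan <- rbinom(n, 1, plogis(-6 + 0.04 * sim$Income))

fit <- glm(PersonalLoan ~ Income, family = binomial("logit"), data = sim)

# type = "response" gives probabilities; the default ("link") gives log-odds
prob <- predict(fit, newdata = sim, type = "response")
link <- predict(fit, newdata = sim)

pred <- ifelse(prob > 0.5, 1, 0)
# Probability > 0.5 and log-odds > 0 select the same observations
same_rule <- all(pred == ifelse(link > 0, 1, 0))

# Rows are predictions, columns are observed outcomes
conf_mat <- table(Predicted = factor(pred, levels = c(0, 1)),
                  Actual = factor(sim$PersonalLoan, levels = c(0, 1)))
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
# Share of actual loan takers that the model identifies
sensitivity <- conf_mat["1", "1"] / sum(conf_mat[, "1"])
```

Reporting `sensitivity` alongside `accuracy` shows how many actual loan takers a model finds, which accuracy alone can hide when the positive class is rare.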
# Probit
## {.tabset}
### Model 1
```{r}
#remove zip code
model1 <- glm(PersonalLoan ~ .,family = binomial("probit"), data= train[,-5])
summary(model1)
```
### Model 2
```{r}
# remove Age and zip code
model2 <- glm(PersonalLoan ~ .,family = binomial("probit"), data= train[, -c(2,5)])
summary(model2)
```
### Model 3
```{r}
# remove Age, Experience, zip code, and Mortgage
model3 <- glm(PersonalLoan ~ .,family = binomial("probit"), data= train[,-c(2,3,5,9)])
summary(model3)
```
### ANOVA
```{r}
anova(model3, test="Chisq")
```
### Model Accuracy
```{r}
# type = "response" returns predicted probabilities; the default returns the linear predictor
prediction <- predict.glm(model3, newdata = test[,-c(1,2,3,5,9)], type = "response")
prediction <- ifelse(prediction > 0.5, 1, 0)
error <- mean(prediction != test$PersonalLoan)
prediction2 <- predict.glm(model2, newdata = test[,-c(1,2,5)], type = "response")
prediction2 <- ifelse(prediction2 > 0.5, 1, 0)
error2 <- mean(prediction2 != test$PersonalLoan)
prediction1 <- predict.glm(model1, newdata = test[,-c(1,5)], type = "response")
prediction1 <- ifelse(prediction1 > 0.5, 1, 0)
error1 <- mean(prediction1 != test$PersonalLoan)
cat("Model 1 Accuracy : ", 1-error1, '\n\n')
cat("Model 2 Accuracy : ", 1-error2, '\n\n')
cat("Model 3 Accuracy : ", 1-error, '\n')
```
# Moderating Probit
## {.tabset}
### Probit Model 1
```{r}
l1_model <- glm(PersonalLoan ~ (Income + Family + CCAvg + Education + Mortgage + SecuritiesAccount + CDAccount + Online + CreditCard)^2 ,
data = train, family = binomial("probit"))
summary(l1_model)
```
### Probit Model 2
```{r}
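# Note: (Income + Education) and (Family + Education) contribute main effects only;
# no Income:Education or Family:Education interaction is fitted by this formula
l2_model <- glm(PersonalLoan ~ Income + Family + CCAvg + Education + Mortgage + SecuritiesAccount + Online + CreditCard + (Income * Family) +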
(Income * CCAvg) + (Income + Education) + (Family + Education) + (Family * SecuritiesAccount) + (CCAvg * Education) +
(Online * CreditCard),
data = train, family = binomial("probit"))
summary(l2_model)
```
### Probit Model 3
```{r}
l3_model <- glm(PersonalLoan ~ Income + Family + CCAvg + Education + (Income * Family) +
(Income * CCAvg) + (CCAvg * Education) ,
data = train, family = binomial("probit"))
summary(l3_model)
```
### Moderating Accuracy
```{r}
# model 1
# type = "response" returns predicted probabilities; the default returns the linear predictor
prediction <- predict.glm(l1_model, newdata = test[,-c(1,5)], type = "response")
prediction <- ifelse(prediction > 0.5, 1, 0)
error <- mean(prediction != test$PersonalLoan)
# model 2
prediction2 <- predict.glm(l2_model, newdata = test[,c("Income", "Family", "CCAvg", "Education", "Mortgage", "SecuritiesAccount","Online", "CreditCard" )], type = "response")
prediction2 <- ifelse(prediction2 > 0.5, 1, 0)
error2 <- mean(prediction2 != test$PersonalLoan)
# model 3
prediction3 <- predict.glm(l3_model, newdata = test[,c("Income","Family", "CCAvg", "Education" )], type = "response")
prediction3 <- ifelse(prediction3 > 0.5, 1, 0)
error3 <- mean(prediction3 != test$PersonalLoan)
cat("Model 1 Accuracy : ", 1-error, '\n\n')
cat("Model 2 Accuracy : ", 1-error2, '\n\n')
cat("Model 3 Accuracy : ", 1-error3, '\n')
```
# Moderating Logit
## {.tabset}
### Logit Model 1
```{r}
l1_model <- glm(PersonalLoan ~ (Income + Family + CCAvg + Education + Mortgage + SecuritiesAccount + CDAccount + Online + CreditCard)^2 ,
data = train, family = binomial("logit"))
summary(l1_model)
```
### Logit Model 2
```{r}
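# Note: (Income + Education) and (Family + Education) contribute main effects only;
# no Income:Education or Family:Education interaction is fitted by this formula
l2_model <- glm(PersonalLoan ~ Income + Family + CCAvg + Education + Mortgage + SecuritiesAccount + Online + CreditCard + (Income * Family) +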
(Income * CCAvg) + (Income + Education) + (Family + Education) + (Family * SecuritiesAccount) + (CCAvg * Education) +
(Online * CreditCard),
data = train, family = binomial("logit"))
summary(l2_model)
```
### Logit Model 3
```{r}
l3_model <- glm(PersonalLoan ~ Income + Family + CCAvg + Education + (Income * Family) +
(Income * CCAvg) + (CCAvg * Education) ,
data = train, family = binomial("logit"))
summary(l3_model)
```
### Moderating Accuracy
```{r}
# model 1
# type = "response" returns predicted probabilities; the default returns log-odds
prediction <- predict.glm(l1_model, newdata = test[,-c(1,5)], type = "response")
prediction <- ifelse(prediction > 0.5, 1, 0)
error <- mean(prediction != test$PersonalLoan)
# model 2
prediction2 <- predict.glm(l2_model, newdata = test[,c("Income", "Family", "CCAvg", "Education", "Mortgage", "SecuritiesAccount","Online", "CreditCard" )], type = "response")
prediction2 <- ifelse(prediction2 > 0.5, 1, 0)
error2 <- mean(prediction2 != test$PersonalLoan)
# model 3
prediction3 <- predict.glm(l3_model, newdata = test[,c("Income","Family", "CCAvg", "Education" )], type = "response")
prediction3 <- ifelse(prediction3 > 0.5, 1, 0)
error3 <- mean(prediction3 != test$PersonalLoan)
cat("Model 1 Accuracy : ", 1-error, '\n\n')
cat("Model 2 Accuracy : ", 1-error2, '\n\n')
cat("Model 3 Accuracy : ", 1-error3, '\n')
```
# Final Model
##
### Model
```{r}
model <- glm(PersonalLoan ~ Income + Family + CCAvg + Education + (Income * Family) +
(Income * CCAvg) + (CCAvg * Education) ,
data = loandata, family = binomial("logit"))
summary(model)
```
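Because the final model contains interaction terms, the effect of Income on the log-odds is not a single coefficient: it depends on Family and CCAvg. The sketch below works this through with the estimates printed by `summary(model)` for the full data set. The example customer (Family = 4, CCAvg = 1.5) and the helper `income_slope()` are arbitrary illustrations.

```r
# Estimates copied from the final logit model summary
b_income        <- 0.031210
b_income_family <- 0.026453
b_income_ccavg  <- -0.010654

# Change in log-odds per extra $1,000 of income for a given customer profile
income_slope <- function(family, ccavg) {
  b_income + b_income_family * family + b_income_ccavg * ccavg
}

slope <- income_slope(family = 4, ccavg = 1.5)  # hypothetical customer
odds_ratio <- exp(slope)  # multiplicative change in the odds per $1,000
```

For this profile the slope is about 0.121, so each additional $1,000 of income multiplies the odds of accepting a loan by roughly 1.13. A smaller family or higher card spending lowers that multiplier.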
# Neural Network
## {.tabset}
### Model
```{r}
model1_nn <- neuralnet(PersonalLoan ~ Income + Family + CCAvg + Education , hidden = 3,
lifesign = "minimal",linear.output = FALSE,threshold = 0.1 , data = train )
```
### Neural Networks
```{r}
model1_nn$result.matrix
plot(model1_nn)
```
### Accuracy
```{r}
test_result <- compute(model1_nn,test[,c("Income","Family", "CCAvg", "Education" )] )
prediction <- test_result$net.result
prediction3 <- ifelse(prediction > 0.5, 1, 0)
error <- mean(prediction3 != test$PersonalLoan)
cat("Accuracy : ", 1-error)
```
### Result Matrix
```{r}
model1_nn$result.matrix
```
# Analysis in Excel
#### [Analysis in Excel](https://sumailsyr-my.sharepoint.com/:x:/g/personal/toabdula_syr_edu/Ed793jjWAg1NrkXHHfbzA90BewwooVhx_3kTZzHe3BU7lA?e=MqZmqz)