Data Description

Columns Description

Background

Using the Universal Bank data, determine which factors influence whether a customer takes out a personal loan.

Universal Bank Data Fields

  1. ID : unique identifier
  2. Personal Loan : did the customer accept the personal loan offered (1=Yes, 0=No)
  3. Age : customer’s age
  4. Experience : number of years of professional experience
  5. Income : annual income of the customer ($000)
  6. Zip code: home address zip code
  7. Family : family size of customer
  8. CCAvg : average spending on credit cards per month ($000)
  9. Education: education level (1) undergraduate, (2) graduate, (3) advanced/professional
  10. Mortgage : value of house mortgage ($000)
  11. Securities : does the customer have a securities account with the bank? (1=Yes, 0=No)
  12. CDAccount : does the customer have a certificate of deposit with the bank? (1=Yes, 0=No)
  13. Online : does the customer use Internet banking facilities (1=Yes, 0=No)
  14. CreditCard : does the customer use a credit card issued by Universal Bank? (1=Yes, 0=No)

Summary

   CustomerID    PersonalLoan        Age          Experience  
 Min.   :   1   Min.   :0.000   Min.   :23.00   Min.   :-3.0  
 1st Qu.:1251   1st Qu.:0.000   1st Qu.:35.00   1st Qu.:10.0  
 Median :2500   Median :0.000   Median :45.00   Median :20.0  
 Mean   :2500   Mean   :0.096   Mean   :45.34   Mean   :20.1  
 3rd Qu.:3750   3rd Qu.:0.000   3rd Qu.:55.00   3rd Qu.:30.0  
 Max.   :5000   Max.   :1.000   Max.   :67.00   Max.   :43.0  
     Income          ZIP.Code         Family          CCAvg       
 Min.   :  8.00   Min.   : 9307   Min.   :1.000   Min.   : 0.000  
 1st Qu.: 39.00   1st Qu.:91911   1st Qu.:1.000   1st Qu.: 0.700  
 Median : 64.00   Median :93437   Median :2.000   Median : 1.500  
 Mean   : 73.77   Mean   :93153   Mean   :2.396   Mean   : 1.938  
 3rd Qu.: 98.00   3rd Qu.:94608   3rd Qu.:3.000   3rd Qu.: 2.500  
 Max.   :224.00   Max.   :96651   Max.   :4.000   Max.   :10.000  
   Education        Mortgage     SecuritiesAccount   CDAccount     
 Min.   :1.000   Min.   :  0.0   Min.   :0.0000    Min.   :0.0000  
 1st Qu.:1.000   1st Qu.:  0.0   1st Qu.:0.0000    1st Qu.:0.0000  
 Median :2.000   Median :  0.0   Median :0.0000    Median :0.0000  
 Mean   :1.881   Mean   : 56.5   Mean   :0.1044    Mean   :0.0604  
 3rd Qu.:3.000   3rd Qu.:101.0   3rd Qu.:0.0000    3rd Qu.:0.0000  
 Max.   :3.000   Max.   :635.0   Max.   :1.0000    Max.   :1.0000  
     Online         CreditCard   
 Min.   :0.0000   Min.   :0.000  
 1st Qu.:0.0000   1st Qu.:0.000  
 Median :1.0000   Median :0.000  
 Mean   :0.5968   Mean   :0.294  
 3rd Qu.:1.0000   3rd Qu.:1.000  
 Max.   :1.0000   Max.   :1.000  

Data Structure

'data.frame':   5000 obs. of  14 variables:
 $ CustomerID       : int  1 2 3 4 5 6 7 8 9 10 ...
 $ PersonalLoan     : int  0 0 0 0 0 0 0 0 0 1 ...
 $ Age              : int  25 45 39 35 35 37 53 50 35 34 ...
 $ Experience       : int  1 19 15 9 8 13 27 24 10 9 ...
 $ Income           : int  49 34 11 100 45 29 72 22 81 180 ...
 $ ZIP.Code         : int  91107 90089 94720 94112 91330 92121 91711 93943 90089 93023 ...
 $ Family           : int  4 3 1 1 4 4 2 1 3 1 ...
 $ CCAvg            : num  1.6 1.5 1 2.7 1 0.4 1.5 0.3 0.6 8.9 ...
 $ Education        : int  1 1 1 2 2 2 2 3 2 3 ...
 $ Mortgage         : int  0 0 0 0 0 155 0 0 104 0 ...
 $ SecuritiesAccount: int  1 1 0 0 0 0 0 0 0 0 ...
 $ CDAccount        : int  0 0 0 0 0 0 0 0 0 0 ...
 $ Online           : int  0 0 0 0 0 1 1 0 1 0 ...
 $ CreditCard       : int  0 0 0 0 1 0 0 1 0 0 ...

Scatter Plot

Scatter Matrix

Age and Experience

Logit

Model 1


Call:
glm(formula = PersonalLoan ~ ., family = binomial("logit"), data = train[, 
    -5])

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.2181  -0.1924  -0.0753  -0.0285   3.9633  

Coefficients:
                    Estimate Std. Error z value Pr(>|z|)    
(Intercept)       -1.012e+01  2.129e+00  -4.755 1.99e-06 ***
Age               -1.372e-01  8.091e-02  -1.696  0.08990 .  
Experience         1.441e-01  8.035e-02   1.793  0.07293 .  
Income             5.355e-02  3.111e-03  17.212  < 2e-16 ***
Family             6.300e-01  8.896e-02   7.082 1.43e-12 ***
CCAvg              1.519e-01  4.864e-02   3.122  0.00179 ** 
Education          1.876e+00  1.421e-01  13.198  < 2e-16 ***
Mortgage           7.879e-04  6.957e-04   1.132  0.25743    
SecuritiesAccount -8.248e-01  3.342e-01  -2.468  0.01358 *  
CDAccount          3.779e+00  3.903e-01   9.682  < 2e-16 ***
Online            -8.369e-01  1.935e-01  -4.326 1.52e-05 ***
CreditCard        -1.124e+00  2.526e-01  -4.450 8.58e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2208.49  on 3542  degrees of freedom
Residual deviance:  873.29  on 3531  degrees of freedom
AIC: 897.29

Number of Fisher Scoring iterations: 8

Model 2


Call:
glm(formula = PersonalLoan ~ ., family = binomial("logit"), data = train[, 
    -c(2, 5)])

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.2302  -0.1955  -0.0741  -0.0284   3.9341  

Coefficients:
                    Estimate Std. Error z value Pr(>|z|)    
(Intercept)       -1.357e+01  7.195e-01 -18.863  < 2e-16 ***
Experience         8.572e-03  7.808e-03   1.098  0.27229    
Income             5.377e-02  3.105e-03  17.317  < 2e-16 ***
Family             6.280e-01  8.897e-02   7.058 1.69e-12 ***
CCAvg              1.535e-01  4.856e-02   3.161  0.00157 ** 
Education          1.838e+00  1.404e-01  13.094  < 2e-16 ***
Mortgage           7.454e-04  6.946e-04   1.073  0.28324    
SecuritiesAccount -8.227e-01  3.324e-01  -2.475  0.01333 *  
CDAccount          3.774e+00  3.892e-01   9.698  < 2e-16 ***
Online            -8.263e-01  1.929e-01  -4.284 1.83e-05 ***
CreditCard        -1.107e+00  2.518e-01  -4.396 1.10e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2208.49  on 3542  degrees of freedom
Residual deviance:  876.34  on 3532  degrees of freedom
AIC: 898.34

Number of Fisher Scoring iterations: 8

Model 3


Call:
glm(formula = PersonalLoan ~ ., family = binomial("logit"), data = train[, 
    -c(2, 3, 5, 9)])

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.2974  -0.1969  -0.0759  -0.0288   3.9340  

Coefficients:
                    Estimate Std. Error z value Pr(>|z|)    
(Intercept)       -13.301116   0.679846 -19.565  < 2e-16 ***
Income              0.053953   0.003086  17.481  < 2e-16 ***
Family              0.626284   0.088970   7.039 1.93e-12 ***
CCAvg               0.142326   0.048053   2.962  0.00306 ** 
Education           1.820624   0.139001  13.098  < 2e-16 ***
SecuritiesAccount  -0.822871   0.331392  -2.483  0.01303 *  
CDAccount           3.784803   0.389120   9.727  < 2e-16 ***
Online             -0.817354   0.192325  -4.250 2.14e-05 ***
CreditCard         -1.109583   0.251602  -4.410 1.03e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2208.49  on 3542  degrees of freedom
Residual deviance:  878.72  on 3534  degrees of freedom
AIC: 896.72

Number of Fisher Scoring iterations: 8

ANOVA

Analysis of Deviance Table

Model: binomial, link: logit

Response: PersonalLoan

Terms added sequentially (first to last)


                  Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                               3542    2208.49              
Income             1   803.89      3541    1404.60 < 2.2e-16 ***
Family             1   136.39      3540    1268.21 < 2.2e-16 ***
CCAvg              1     6.16      3539    1262.05 0.0130814 *  
Education          1   263.31      3538     998.73 < 2.2e-16 ***
SecuritiesAccount  1     4.15      3537     994.58 0.0415327 *  
CDAccount          1    80.67      3536     913.91 < 2.2e-16 ***
Online             1    13.07      3535     900.83 0.0002997 ***
CreditCard         1    22.12      3534     878.72 2.568e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Model Accuracy

Model 1 Accuracy :  0.9444063 
Model 2 Accuracy :  0.9471517 
Model 3 Accuracy :  0.9471517 

Probit

Model 1


Call:
glm(formula = PersonalLoan ~ ., family = binomial("probit"), 
    data = train[, -5])

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.1902  -0.1957  -0.0486  -0.0077   4.5099  

Coefficients:
                    Estimate Std. Error z value Pr(>|z|)    
(Intercept)       -5.1497213  1.0431910  -4.937 7.95e-07 ***
Age               -0.0665641  0.0400309  -1.663 0.096349 .  
Experience         0.0685902  0.0397951   1.724 0.084783 .  
Income             0.0271152  0.0015036  18.034  < 2e-16 ***
Family             0.3128836  0.0448137   6.982 2.91e-12 ***
CCAvg              0.0901009  0.0254491   3.540 0.000399 ***
Education          0.9096374  0.0694957  13.089  < 2e-16 ***
Mortgage           0.0003759  0.0003665   1.026 0.305032    
SecuritiesAccount -0.4588586  0.1723938  -2.662 0.007775 ** 
CDAccount          2.0326038  0.1988450  10.222  < 2e-16 ***
Online            -0.4311810  0.0988740  -4.361 1.30e-05 ***
CreditCard        -0.6217245  0.1289869  -4.820 1.44e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2208.49  on 3542  degrees of freedom
Residual deviance:  888.16  on 3531  degrees of freedom
AIC: 912.16

Number of Fisher Scoring iterations: 8

Model 2


Call:
glm(formula = PersonalLoan ~ ., family = binomial("probit"), 
    data = train[, -c(2, 5)])

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.2011  -0.1975  -0.0466  -0.0076   4.4574  

Coefficients:
                    Estimate Std. Error z value Pr(>|z|)    
(Intercept)       -6.8154342  0.3328028 -20.479  < 2e-16 ***
Experience         0.0027604  0.0040327   0.685 0.493658    
Income             0.0272423  0.0015029  18.126  < 2e-16 ***
Family             0.3113072  0.0448858   6.936 4.05e-12 ***
CCAvg              0.0900172  0.0254321   3.540 0.000401 ***
Education          0.8902560  0.0683558  13.024  < 2e-16 ***
Mortgage           0.0003577  0.0003665   0.976 0.329010    
SecuritiesAccount -0.4562660  0.1718496  -2.655 0.007930 ** 
CDAccount          2.0308815  0.1984530  10.234  < 2e-16 ***
Online            -0.4282554  0.0986958  -4.339 1.43e-05 ***
CreditCard        -0.6135917  0.1285642  -4.773 1.82e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2208.49  on 3542  degrees of freedom
Residual deviance:  890.87  on 3532  degrees of freedom
AIC: 912.87

Number of Fisher Scoring iterations: 8

Model 3


Call:
glm(formula = PersonalLoan ~ ., family = binomial("probit"), 
    data = train[, -c(2, 3, 5, 9)])

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.2561  -0.1980  -0.0484  -0.0078   4.4804  

Coefficients:
                   Estimate Std. Error z value Pr(>|z|)    
(Intercept)       -6.731218   0.314365 -21.412  < 2e-16 ***
Income             0.027386   0.001496  18.310  < 2e-16 ***
Family             0.310808   0.044850   6.930 4.21e-12 ***
CCAvg              0.086265   0.025234   3.419 0.000629 ***
Education          0.885387   0.067939  13.032  < 2e-16 ***
SecuritiesAccount -0.456854   0.171773  -2.660 0.007823 ** 
CDAccount          2.036395   0.198297  10.269  < 2e-16 ***
Online            -0.426754   0.098521  -4.332 1.48e-05 ***
CreditCard        -0.614486   0.128442  -4.784 1.72e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2208.49  on 3542  degrees of freedom
Residual deviance:  892.29  on 3534  degrees of freedom
AIC: 910.29

Number of Fisher Scoring iterations: 8

ANOVA

Analysis of Deviance Table

Model: binomial, link: probit

Response: PersonalLoan

Terms added sequentially (first to last)


                  Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                               3542    2208.49              
Income             1   836.88      3541    1371.61 < 2.2e-16 ***
Family             1   110.58      3540    1261.03 < 2.2e-16 ***
CCAvg              1     8.96      3539    1252.07 0.0027525 ** 
Education          1   230.53      3538    1021.54 < 2.2e-16 ***
SecuritiesAccount  1     5.12      3537    1016.42 0.0236402 *  
CDAccount          1    86.24      3536     930.18 < 2.2e-16 ***
Online             1    12.81      3535     917.37 0.0003444 ***
CreditCard         1    25.07      3534     892.29 5.515e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Model Accuracy

Model 1 Accuracy :  0.9361702 
Model 2 Accuracy :  0.9341112 
Model 3 Accuracy :  0.9368566 

Moderating Probit

Probit Model 1


Call:
glm(formula = PersonalLoan ~ (Income + Family + CCAvg + Education + 
    Mortgage + SecuritiesAccount + CDAccount + Online + CreditCard)^2, 
    family = binomial("probit"), data = train)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.3445  -0.0416  -0.0001   0.0000   4.0472  

Coefficients:
                               Estimate Std. Error z value Pr(>|z|)    
(Intercept)                   2.907e+00  1.323e+00   2.196  0.02807 *  
Income                       -7.429e-02  1.113e-02  -6.672 2.52e-11 ***
Family                       -1.254e+00  3.683e-01  -3.406  0.00066 ***
CCAvg                         7.311e-01  2.695e-01   2.713  0.00667 ** 
Education                    -4.988e+00  7.170e-01  -6.956 3.50e-12 ***
Mortgage                     -5.121e-04  4.142e-03  -0.124  0.90160    
SecuritiesAccount            -1.098e+00  2.329e+00  -0.472  0.63711    
CDAccount                    -3.074e+00  2.466e+01  -0.125  0.90078    
Online                        8.387e-01  8.788e-01   0.954  0.33990    
CreditCard                    2.397e-01  1.230e+00   0.195  0.84544    
Income:Family                 2.168e-02  3.182e-03   6.812 9.63e-12 ***
Income:CCAvg                 -7.806e-03  1.428e-03  -5.466 4.60e-08 ***
Income:Education              5.661e-02  5.953e-03   9.510  < 2e-16 ***
Income:Mortgage              -4.096e-06  2.015e-05  -0.203  0.83891    
Income:SecuritiesAccount      1.922e-02  1.423e-02   1.350  0.17704    
Income:CDAccount              9.702e-03  1.329e-02   0.730  0.46528    
Income:Online                 8.413e-05  5.403e-03   0.016  0.98758    
Income:CreditCard            -4.249e-03  8.317e-03  -0.511  0.60942    
Family:CCAvg                  8.091e-02  5.162e-02   1.568  0.11698    
Family:Education             -4.181e-01  9.711e-02  -4.306 1.66e-05 ***
Family:Mortgage               8.849e-04  6.708e-04   1.319  0.18707    
Family:SecuritiesAccount     -9.046e-01  4.388e-01  -2.062  0.03923 *  
Family:CDAccount              5.192e-01  3.925e-01   1.323  0.18588    
Family:Online                -2.420e-01  1.659e-01  -1.459  0.14465    
Family:CreditCard            -1.518e-01  2.120e-01  -0.716  0.47412    
CCAvg:Education               2.726e-01  6.742e-02   4.044 5.26e-05 ***
CCAvg:Mortgage               -6.726e-05  3.519e-04  -0.191  0.84841    
CCAvg:SecuritiesAccount      -6.553e-03  1.967e-01  -0.033  0.97343    
CCAvg:CDAccount              -2.200e-01  2.141e-01  -1.027  0.30423    
CCAvg:Online                 -1.659e-01  1.034e-01  -1.605  0.10843    
CCAvg:CreditCard              6.944e-02  1.345e-01   0.516  0.60573    
Education:Mortgage           -4.539e-04  9.182e-04  -0.494  0.62106    
Education:SecuritiesAccount   1.130e-01  4.775e-01   0.237  0.81296    
Education:CDAccount          -6.369e-01  5.396e-01  -1.180  0.23781    
Education:Online              3.504e-02  2.245e-01   0.156  0.87599    
Education:CreditCard          1.341e-01  2.971e-01   0.451  0.65183    
Mortgage:SecuritiesAccount    1.271e-03  2.933e-03   0.433  0.66472    
Mortgage:CDAccount            6.270e-04  3.610e-03   0.174  0.86209    
Mortgage:Online              -2.166e-04  1.420e-03  -0.153  0.87871    
Mortgage:CreditCard          -1.788e-03  2.362e-03  -0.757  0.44911    
SecuritiesAccount:CDAccount   5.886e+00  2.452e+01   0.240  0.81028    
SecuritiesAccount:Online     -4.477e+00  2.442e+01  -0.183  0.85452    
SecuritiesAccount:CreditCard -1.745e+00  2.288e+00  -0.763  0.44568    
CDAccount:Online              4.128e+00  2.443e+01   0.169  0.86583    
CDAccount:CreditCard          1.631e+00  2.489e+00   0.655  0.51216    
Online:CreditCard            -2.139e+00  1.018e+00  -2.102  0.03555 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2208.5  on 3542  degrees of freedom
Residual deviance:  354.9  on 3497  degrees of freedom
AIC: 446.9

Number of Fisher Scoring iterations: 16

Probit Model 2


Call:
glm(formula = PersonalLoan ~ Income + Family + CCAvg + Education + 
    Mortgage + SecuritiesAccount + Online + CreditCard + (Income * 
    Family) + (Income * CCAvg) + (Income + Education) + (Family + 
    Education) + (Family * SecuritiesAccount) + (CCAvg * Education) + 
    (Online * CreditCard), family = binomial("probit"), data = train)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.2478  -0.1585  -0.0172  -0.0003   3.8019  

Coefficients:
                           Estimate Std. Error z value Pr(>|z|)    
(Intercept)              -5.1131527  0.6624813  -7.718 1.18e-14 ***
Income                    0.0165651  0.0041332   4.008 6.13e-05 ***
Family                   -1.1006093  0.1797715  -6.122 9.23e-10 ***
CCAvg                     0.5371912  0.1468435   3.658 0.000254 ***
Education                 0.2644742  0.1356117   1.950 0.051149 .  
Mortgage                 -0.0002530  0.0003996  -0.633 0.526722    
SecuritiesAccount         0.9596056  0.3375711   2.843 0.004474 ** 
Online                   -0.0804624  0.1218678  -0.660 0.509098    
CreditCard               -0.0652945  0.1657301  -0.394 0.693595    
Income:Family             0.0135094  0.0015871   8.512  < 2e-16 ***
Income:CCAvg             -0.0055317  0.0008152  -6.785 1.16e-11 ***
Family:SecuritiesAccount -0.3264840  0.1537070  -2.124 0.033665 *  
CCAvg:Education           0.2228096  0.0363826   6.124 9.12e-10 ***
Online:CreditCard         0.0423168  0.2274705   0.186 0.852420    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2208.49  on 3542  degrees of freedom
Residual deviance:  740.84  on 3529  degrees of freedom
AIC: 768.84

Number of Fisher Scoring iterations: 9

Probit Model 3


Call:
glm(formula = PersonalLoan ~ Income + Family + CCAvg + Education + 
    (Income * Family) + (Income * CCAvg) + (CCAvg * Education), 
    family = binomial("probit"), data = train)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.1961  -0.1596  -0.0183  -0.0003   3.7536  

Coefficients:
                  Estimate Std. Error z value Pr(>|z|)    
(Intercept)     -5.0038181  0.6426583  -7.786 6.91e-15 ***
Income           0.0162185  0.0040693   3.986 6.73e-05 ***
Family          -1.1420517  0.1765918  -6.467 9.98e-11 ***
CCAvg            0.5510001  0.1450108   3.800 0.000145 ***
Education        0.2526840  0.1336823   1.890 0.058734 .  
Income:Family    0.0134833  0.0015653   8.614  < 2e-16 ***
Income:CCAvg    -0.0055975  0.0008087  -6.921 4.48e-12 ***
CCAvg:Education  0.2226771  0.0358823   6.206 5.44e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2208.49  on 3542  degrees of freedom
Residual deviance:  750.57  on 3535  degrees of freedom
AIC: 766.57

Number of Fisher Scoring iterations: 9

Moderating Accuracy

Model 1 Accuracy :  0.9677419 
Model 2 Accuracy :  0.9464653 
Model 3 Accuracy :  0.9361702 

Moderating Logit

Logit Model 1


Call:
glm(formula = PersonalLoan ~ (Income + Family + CCAvg + Education + 
    Mortgage + SecuritiesAccount + CDAccount + Online + CreditCard)^2, 
    family = binomial("logit"), data = train)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.5005  -0.0689  -0.0049  -0.0001   3.7992  

Coefficients:
                               Estimate Std. Error z value Pr(>|z|)    
(Intercept)                   6.326e+00  2.845e+00   2.224 0.026172 *  
Income                       -1.536e-01  2.378e-02  -6.457 1.07e-10 ***
Family                       -2.520e+00  7.817e-01  -3.224 0.001265 ** 
CCAvg                         1.469e+00  5.449e-01   2.696 0.007019 ** 
Education                    -9.796e+00  1.528e+00  -6.409 1.46e-10 ***
Mortgage                     -1.501e-03  9.131e-03  -0.164 0.869445    
SecuritiesAccount            -3.101e+00  4.819e+00  -0.644 0.519876    
CDAccount                    -4.888e+00  2.333e+01  -0.209 0.834073    
Online                        1.594e+00  1.889e+00   0.844 0.398660    
CreditCard                    8.392e-01  2.523e+00   0.333 0.739424    
Income:Family                 4.497e-02  6.892e-03   6.526 6.76e-11 ***
Income:CCAvg                 -1.508e-02  2.896e-03  -5.208 1.91e-07 ***
Income:Education              1.131e-01  1.307e-02   8.652  < 2e-16 ***
Income:Mortgage              -1.313e-05  4.485e-05  -0.293 0.769651    
Income:SecuritiesAccount      4.129e-02  2.847e-02   1.450 0.147023    
Income:CDAccount              8.643e-03  2.596e-02   0.333 0.739201    
Income:Online                 3.639e-03  1.139e-02   0.320 0.749241    
Income:CreditCard            -5.140e-03  1.676e-02  -0.307 0.759129    
Family:CCAvg                  1.332e-01  1.019e-01   1.306 0.191480    
Family:Education             -8.974e-01  1.927e-01  -4.656 3.22e-06 ***
Family:Mortgage               1.782e-03  1.356e-03   1.315 0.188579    
Family:SecuritiesAccount     -1.611e+00  8.681e-01  -1.856 0.063449 .  
Family:CDAccount              1.049e+00  7.650e-01   1.371 0.170223    
Family:Online                -5.588e-01  3.335e-01  -1.676 0.093803 .  
Family:CreditCard            -4.131e-01  4.227e-01  -0.977 0.328469    
CCAvg:Education               4.966e-01  1.308e-01   3.796 0.000147 ***
CCAvg:Mortgage               -1.873e-05  6.961e-04  -0.027 0.978535    
CCAvg:SecuritiesAccount       1.867e-02  3.785e-01   0.049 0.960655    
CCAvg:CDAccount              -4.579e-01  4.074e-01  -1.124 0.260954    
CCAvg:Online                 -2.965e-01  2.060e-01  -1.440 0.149976    
CCAvg:CreditCard              1.068e-01  2.615e-01   0.408 0.683151    
Education:Mortgage           -4.146e-04  1.787e-03  -0.232 0.816584    
Education:SecuritiesAccount   3.224e-01  9.382e-01   0.344 0.731134    
Education:CDAccount          -1.106e+00  1.011e+00  -1.094 0.274054    
Education:Online             -1.099e-01  4.337e-01  -0.253 0.799993    
Education:CreditCard          7.417e-02  5.814e-01   0.128 0.898484    
Mortgage:SecuritiesAccount    2.941e-03  5.572e-03   0.528 0.597546    
Mortgage:CDAccount            1.463e-03  7.098e-03   0.206 0.836696    
Mortgage:Online              -3.771e-04  2.837e-03  -0.133 0.894246    
Mortgage:CreditCard          -3.576e-03  4.940e-03  -0.724 0.469155    
SecuritiesAccount:CDAccount   1.108e+01  2.272e+01   0.488 0.625705    
SecuritiesAccount:Online     -8.727e+00  2.209e+01  -0.395 0.692796    
SecuritiesAccount:CreditCard -3.188e+00  5.395e+00  -0.591 0.554570    
CDAccount:Online              8.073e+00  2.215e+01   0.365 0.715476    
CDAccount:CreditCard          2.618e+00  5.697e+00   0.459 0.645912    
Online:CreditCard            -3.940e+00  2.017e+00  -1.953 0.050842 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2208.5  on 3542  degrees of freedom
Residual deviance:  351.9  on 3497  degrees of freedom
AIC: 443.9

Number of Fisher Scoring iterations: 13

Logit Model 2


Call:
glm(formula = PersonalLoan ~ Income + Family + CCAvg + Education + 
    Mortgage + SecuritiesAccount + Online + CreditCard + (Income * 
    Family) + (Income * CCAvg) + (Income + Education) + (Family + 
    Education) + (Family * SecuritiesAccount) + (CCAvg * Education) + 
    (Online * CreditCard), family = binomial("logit"), data = train)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.3243  -0.1662  -0.0428  -0.0080   3.5310  

Coefficients:
                           Estimate Std. Error z value Pr(>|z|)    
(Intercept)              -1.064e+01  1.338e+00  -7.955 1.79e-15 ***
Income                    3.625e-02  8.086e-03   4.482 7.38e-06 ***
Family                   -1.958e+00  3.589e-01  -5.455 4.90e-08 ***
CCAvg                     1.118e+00  2.924e-01   3.822 0.000133 ***
Education                 7.780e-01  2.638e-01   2.949 0.003185 ** 
Mortgage                 -5.189e-04  7.544e-04  -0.688 0.491569    
SecuritiesAccount         1.825e+00  6.483e-01   2.815 0.004877 ** 
Online                   -1.685e-01  2.330e-01  -0.723 0.469614    
CreditCard               -1.414e-01  3.111e-01  -0.454 0.649540    
Income:Family             2.417e-02  3.121e-03   7.745 9.58e-15 ***
Income:CCAvg             -1.042e-02  1.578e-03  -6.604 3.99e-11 ***
Family:SecuritiesAccount -6.058e-01  2.932e-01  -2.066 0.038811 *  
CCAvg:Education           3.539e-01  6.996e-02   5.058 4.23e-07 ***
Online:CreditCard         5.752e-02  4.386e-01   0.131 0.895644    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2208.49  on 3542  degrees of freedom
Residual deviance:  734.83  on 3529  degrees of freedom
AIC: 762.83

Number of Fisher Scoring iterations: 8

Logit Model 3


Call:
glm(formula = PersonalLoan ~ Income + Family + CCAvg + Education + 
    (Income * Family) + (Income * CCAvg) + (CCAvg * Education), 
    family = binomial("logit"), data = train)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.2586  -0.1672  -0.0458  -0.0086   3.5074  

Coefficients:
                  Estimate Std. Error z value Pr(>|z|)    
(Intercept)     -10.445839   1.299923  -8.036 9.30e-16 ***
Income            0.035434   0.007976   4.443 8.88e-06 ***
Family           -2.018917   0.352261  -5.731 9.97e-09 ***
CCAvg             1.127809   0.288996   3.903 9.52e-05 ***
Education         0.762329   0.259703   2.935  0.00333 ** 
Income:Family     0.024012   0.003072   7.816 5.44e-15 ***
Income:CCAvg     -0.010422   0.001562  -6.672 2.52e-11 ***
CCAvg:Education   0.352559   0.068912   5.116 3.12e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2208.49  on 3542  degrees of freedom
Residual deviance:  744.32  on 3535  degrees of freedom
AIC: 760.32

Number of Fisher Scoring iterations: 8

Moderating Accuracy

Model 1 Accuracy :  0.9752917 
Model 2 Accuracy :  0.949897 
Model 3 Accuracy :  0.9492107 

Final Model

Model


Call:
glm(formula = PersonalLoan ~ Income + Family + CCAvg + Education + 
    (Income * Family) + (Income * CCAvg) + (CCAvg * Education), 
    family = binomial("logit"), data = loandata)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.3559  -0.1737  -0.0485  -0.0092   3.4737  

Coefficients:
                 Estimate Std. Error z value Pr(>|z|)    
(Intercept)     -9.856679   1.062838  -9.274  < 2e-16 ***
Income           0.031210   0.006652   4.692 2.70e-06 ***
Family          -2.213478   0.295643  -7.487 7.05e-14 ***
CCAvg            1.122034   0.238552   4.704 2.56e-06 ***
Education        0.675988   0.213177   3.171  0.00152 ** 
Income:Family    0.026453   0.002615  10.116  < 2e-16 ***
Income:CCAvg    -0.010654   0.001311  -8.125 4.48e-16 ***
CCAvg:Education  0.360433   0.056897   6.335 2.38e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 3162.0  on 4999  degrees of freedom
Residual deviance: 1083.7  on 4992  degrees of freedom
AIC: 1099.7

Number of Fisher Scoring iterations: 8

Neural Network

Model

hidden: 3    thresh: 0.1    rep: 1/1    steps:    5411  error: 24.62304 time: 8.42 secs

Neural Networks

                                          1
error                       24.623039043850
reached.threshold            0.075844597757
steps                     5411.000000000000
Intercept.to.1layhid1        0.918161267379
Income.to.1layhid1           3.680179680571
Family.to.1layhid1           2.921263944181
CCAvg.to.1layhid1            2.922362929115
Education.to.1layhid1        3.282033655071
Intercept.to.1layhid2        5.415241402185
Income.to.1layhid2          -0.046654895118
Family.to.1layhid2           0.042598358687
CCAvg.to.1layhid2           -0.287273002051
Education.to.1layhid2        0.218828126997
Intercept.to.1layhid3       15.007094442606
Income.to.1layhid3          -0.009632312168
Family.to.1layhid3          -3.133100757383
CCAvg.to.1layhid3            0.253714105793
Education.to.1layhid3      -10.446531750019
Intercept.to.PersonalLoan    4.002556055111
1layhid.1.to.PersonalLoan    3.239241576689
1layhid.2.to.PersonalLoan  -13.402393937834
1layhid.3.to.PersonalLoan -347.486401505084

Accuracy

Accuracy :  0.9835277968

Analysis in Excel

---
title: "Loan Analysis"
output: 
    flexdashboard::flex_dashboard:
        orientation: columns
        vertical_layout: fill
        self_contained: true
        social : ["twitter","facebook","menu"]
        source_code : embed
---

```{r load-packages}
library(neuralnet)
library(rbokeh)
library(psych)

```


```{r load-dataset}

loandata <- read.csv("C:\\Users\\teeja\\Desktop\\CleanGithubProjects\\scm651\\hw4_gb3\\scm651_homework_4_universal_bank.csv", na.strings =c(""))
#head(loandata)

```



# Data Description
## {.tabset}
### Columns Description

#### Background
Using the Universal Bank data, determine which factors influence whether a customer
takes out a personal loan.

#### Universal Bank Data Fields
1. ID : unique identifier
1. Personal Loan :  did the customer accept the personal loan offered (1=Yes, 0=No)
1. Age : customer’s age
1. Experience :  number of years of professional experience
1. Income : annual income of the customer ($000)
1. Zip code:  home address zip code
1. Family : family size of customer
1. CCAvg : average spending on credit cards per month ($000)
1. Education:  education level (1) undergraduate, (2) graduate, (3) advanced/professional
1. Mortgage :  value of house mortgage ($000)
1. Securities :  does the customer have a securities account with the bank? (1=Yes, 0=No)
1. CDAccount : does the customer have a certificate of deposit with the bank? (1=Yes, 0=No)
1. Online : does the customer use Internet banking facilities (1=Yes, 0=No)
1. CreditCard : does the customer use a credit card issued by Universal Bank? (1=Yes, 0=No)

### Summary
```{r summary-data}
summary(loandata)
```

### Data Structure
```{r}
str(loandata)
```




```{r}
### Data Transformation
#loandata$CustomerID <- as.factor(loandata$CustomerID)
originaldata <- loandata   # keep an untouched copy of the raw data

# # transform loan data
# loandata$PersonalLoan <- as.factor(loandata$PersonalLoan)
# loandata$Income <- as.numeric(loandata$Income)
# loandata$ZIP.Code <- as.factor(loandata$ZIP.Code)
# loandata$Education <- as.ordered(loandata$Education)
# loandata$Mortgage <- as.numeric(loandata$Mortgage)
# loandata$SecuritiesAccount <- as.factor(loandata$SecuritiesAccount)
# loandata$CDAccount <- as.factor(loandata$CDAccount)
# loandata$Online <- as.factor(loandata$Online)
# loandata$CreditCard <- as.factor(loandata$CreditCard)
# 
# str(loandata)

```

### Scatter Plot

```{r}
p = loandata %>%
  figure(title = "Scatter Plot", width = 1000) %>%
  ly_points(x = Experience, y = Income, size = CCAvg, color = PersonalLoan,
            hover = list(ZIP.Code, CDAccount, Online, PersonalLoan))
p

```


### Scatter Matrix
```{r}
pairs.panels(loandata[,-c(1,6)])

```



### Age and Experience

```{r}
p =loandata[,c(3,4)] %>% figure(title = "Age and Experience") %>%
      ly_points(Age,Experience, hover = list(Age, Experience)) 

p

```





# Logit

## {.tabset}

### Model 1


```{r split-data}

ind <- sample(2, nrow(loandata), replace = TRUE, prob = c(0.7, 0.3))
train <- loandata[ind==1,]
test <- loandata[ind==2,]

# remove customer ID

train <- train[,-1]
test <- test[,-1]

# str(train)
#str(test)
#dim(train)
#dim(test)

```
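
Because `sample()` is called without a seed, the partition above changes every time the dashboard is knit, and so do the accuracy figures further down. A minimal sketch of a reproducible 70/30 split, assuming an arbitrary seed (`123`) and a simulated data frame `demo` standing in for `loandata`; neither is part of the original analysis:

```r
# Hypothetical stand-in for loandata; the values are simulated, not the bank data.
set.seed(123)                      # arbitrary seed, chosen only for illustration
demo <- data.frame(id = 1:1000, y = rbinom(1000, 1, 0.1))

# Same 70/30 assignment scheme as the split-data chunk.
ind_demo   <- sample(2, nrow(demo), replace = TRUE, prob = c(0.7, 0.3))
train_demo <- demo[ind_demo == 1, ]
test_demo  <- demo[ind_demo == 2, ]

# Every row lands in exactly one partition.
nrow(train_demo) + nrow(test_demo) == nrow(demo)
```

Re-running the block with the same seed gives the same `ind_demo`, so any model comparison built on it can be repeated exactly.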




```{r}

#remove zip code

model1 <- glm(PersonalLoan ~ .,family = binomial("logit"), data= train[,-5])

summary(model1)



```

### Model 2
```{r}

# remove Age and zip code

model2 <- glm(PersonalLoan ~ .,family = binomial("logit"), data= train[, -c(2,5)])

summary(model2)

```


### Model 3

```{r}
# remove zip code, Age, Experience, and Mortgage

model3 <- glm(PersonalLoan ~ .,family = binomial("logit"), data= train[,-c(2,3,5,9)])

summary(model3)

```


### ANOVA

```{r}
anova(model3, test="Chisq")
```


### Model Accuracy

```{r}

# type = "response" returns probabilities; the default (type = "link") returns
# log-odds, which should not be compared against a 0.5 cutoff.
prediction <- predict(model3, newdata = test[,-c(1,2,3,5,9)], type = "response")

prediction <- ifelse(prediction > 0.5, 1, 0)

error <- mean(prediction != test$PersonalLoan)


prediction2 <- predict(model2, newdata = test[,-c(1,2,5)], type = "response")

prediction2 <- ifelse(prediction2 > 0.5, 1, 0)

error2 <- mean(prediction2 != test$PersonalLoan)


prediction1 <- predict(model1, newdata = test[,-c(1,5)], type = "response")

prediction1 <- ifelse(prediction1 > 0.5, 1, 0)

error1 <- mean(prediction1 != test$PersonalLoan)


cat("Model 1 Accuracy : ", 1-error1, '\n\n')
cat("Model 2 Accuracy : ", 1-error2, '\n\n')
cat("Model 3 Accuracy : ", 1-error, '\n')
 
```
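
Only about 9.6% of customers accepted the loan (the mean of `PersonalLoan` in the Summary tab), so a model that always predicts 0 would already be roughly 90% accurate. A confusion matrix and a majority-class baseline make the accuracy figures easier to judge. The sketch below uses simulated data and made-up names (`sim`, `fit_sim`), not the Universal Bank file, and scores the same rows it was fitted on purely to stay short:

```r
# Simulated stand-in data: one predictor and a binary outcome.
set.seed(1)                                   # arbitrary seed
n_sim  <- 2000
income <- runif(n_sim, 8, 224)                # range borrowed from the Income summary
loan   <- rbinom(n_sim, 1, plogis(-6 + 0.04 * income))
sim    <- data.frame(loan, income)

fit_sim <- glm(loan ~ income, family = binomial("logit"), data = sim)

# Predicted probabilities, then hard 0/1 labels at a 0.5 cutoff.
prob_sim <- predict(fit_sim, newdata = sim, type = "response")
pred_sim <- ifelse(prob_sim > 0.5, 1, 0)

# Fixing the factor levels keeps the table 2 x 2 even if one class is never predicted.
cm <- table(predicted = factor(pred_sim, levels = c(0, 1)),
            actual    = factor(sim$loan, levels = c(0, 1)))
cm

accuracy <- sum(diag(cm)) / sum(cm)
baseline <- max(prop.table(table(sim$loan)))  # accuracy of always guessing the majority class
cat("Accuracy :", accuracy, " Baseline :", baseline, "\n")
```

The off-diagonal cells of `cm` separate missed acceptances from false alarms, which a single accuracy number hides.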




# Probit

## {.tabset}

### Model 1




```{r}

#remove zip code

model1 <- glm(PersonalLoan ~ .,family = binomial("probit"), data= train[,-5])

summary(model1)



```

### Model 2
```{r}

# remove Age and zip code

model2 <- glm(PersonalLoan ~ .,family = binomial("probit"), data= train[, -c(2,5)])

summary(model2)

```


### Model 3

```{r}
# remove zip code, Age, Experience, and Mortgage

model3 <- glm(PersonalLoan ~ .,family = binomial("probit"), data= train[,-c(2,3,5,9)])

summary(model3)

```


### ANOVA

```{r}
anova(model3, test="Chisq")
```


### Model Accuracy

```{r}

# type = "response" returns probabilities; the default (type = "link") returns
# the probit linear predictor, which should not be compared against a 0.5 cutoff.
prediction <- predict(model3, newdata = test[,-c(1,2,3,5,9)], type = "response")

prediction <- ifelse(prediction > 0.5, 1, 0)

error <- mean(prediction != test$PersonalLoan)


prediction2 <- predict(model2, newdata = test[,-c(1,2,5)], type = "response")

prediction2 <- ifelse(prediction2 > 0.5, 1, 0)

error2 <- mean(prediction2 != test$PersonalLoan)


prediction1 <- predict(model1, newdata = test[,-c(1,5)], type = "response")

prediction1 <- ifelse(prediction1 > 0.5, 1, 0)

error1 <- mean(prediction1 != test$PersonalLoan)


cat("Model 1 Accuracy : ", 1-error1, '\n\n')
cat("Model 2 Accuracy : ", 1-error2, '\n\n')
cat("Model 3 Accuracy : ", 1-error, '\n')
 
```
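
`predict()` on a `glm` returns the linear predictor by default (`type = "link"`); the fitted probability is that value passed through the inverse link, `pnorm()` for probit and `plogis()` for logit. A small check on simulated data (the names `d`, `fit_probit`, and `fit_logit` are made up for illustration):

```r
# Simulated data with a single predictor; not the Universal Bank file.
set.seed(42)                                     # arbitrary seed
d   <- data.frame(x = rnorm(500))
d$y <- rbinom(500, 1, pnorm(-1 + 0.8 * d$x))

fit_probit <- glm(y ~ x, family = binomial("probit"), data = d)
fit_logit  <- glm(y ~ x, family = binomial("logit"),  data = d)

# Linear predictors (default type = "link").
eta_probit <- unname(predict(fit_probit, newdata = d))
eta_logit  <- unname(predict(fit_logit,  newdata = d))

# Probabilities as returned by predict().
p_probit <- unname(predict(fit_probit, newdata = d, type = "response"))
p_logit  <- unname(predict(fit_logit,  newdata = d, type = "response"))

# Applying the inverse link by hand reproduces type = "response".
all.equal(pnorm(eta_probit), p_probit)
all.equal(plogis(eta_logit), p_logit)

# A 0.5 cutoff on the probability corresponds to a cutoff of 0 on the linear predictor.
all((p_probit > 0.5) == (eta_probit > 0))
```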



# Moderating Probit

## {.tabset}

### Probit Model 1

```{r}

l1_model <- glm(PersonalLoan ~ (Income + Family + CCAvg + Education + Mortgage + SecuritiesAccount + CDAccount + Online + CreditCard)^2 ,
             data = train, family = binomial("probit"))

summary(l1_model)

```
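
In an R model formula, `(a + b + c)^2` expands to every main effect plus every pairwise interaction, and `a * b` is shorthand for `a + b + a:b`. Wrapping a plain sum in parentheses, as in `(a + b)`, adds no interaction. `terms()` shows the expansion without fitting anything; the nine predictors below are the ones in the formula above, while `y`, `a`, and `b` are placeholder names:

```r
# Expand the full pairwise-interaction formula without touching any data.
f_all <- PersonalLoan ~ (Income + Family + CCAvg + Education + Mortgage +
                         SecuritiesAccount + CDAccount + Online + CreditCard)^2
labels_all <- attr(terms(f_all), "term.labels")

# 9 main effects + choose(9, 2) = 36 pairwise interactions
length(labels_all)
head(labels_all, 12)

# `*` adds the interaction; a parenthesised `+` does not.
labels_star <- attr(terms(y ~ a * b), "term.labels")            # "a" "b" "a:b"
labels_plus <- attr(terms(y ~ a + b + (a + b)), "term.labels")  # "a" "b"
labels_star
labels_plus
```

With 45 terms and a rare outcome, many interaction coefficients in the full model are likely to be poorly estimated, which is one reason to prune toward the smaller models that follow.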


### Probit Model 2

```{r}
 
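# NOTE: (Income + Education) and (Family + Education) add no interaction terms;
# they only repeat main effects. Use `*` or `:` there if interactions were intended.
l2_model <- glm(PersonalLoan ~ Income + Family + CCAvg + Education + Mortgage + SecuritiesAccount  + Online + CreditCard + (Income * Family) +
               (Income * CCAvg) + (Income + Education) + (Family + Education) + (Family * SecuritiesAccount) + (CCAvg * Education) +
               (Online * CreditCard),
             data = train, family = binomial("probit"))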
summary(l2_model)
```


### Probit Model 3

```{r}

 
l3_model <- glm(PersonalLoan ~ Income + Family + CCAvg + Education  + (Income * Family) +
               (Income * CCAvg)  + (CCAvg * Education) ,
             data = train, family = binomial("probit"))

summary(l3_model)

```



### Moderating Accuracy

```{r}
#model 1

# type = "response" returns probabilities rather than the linear predictor
prediction <- predict(l1_model, newdata = test[,-c(1,5)], type = "response")

prediction <- ifelse(prediction > 0.5, 1, 0)

error <- mean(prediction != test$PersonalLoan)


# model 2

prediction2 <- predict(l2_model, newdata = test[,c("Income", "Family", "CCAvg", "Education", "Mortgage", "SecuritiesAccount","Online", "CreditCard" )], type = "response")

prediction2 <- ifelse(prediction2 > 0.5, 1, 0)

error2 <- mean(prediction2 != test$PersonalLoan)


# model 3

prediction3 <- predict(l3_model, newdata = test[,c("Income","Family", "CCAvg", "Education" )], type = "response")

prediction3 <- ifelse(prediction3 > 0.5, 1, 0)

# compare prediction3 (not prediction1) against the observed outcome
error3 <- mean(prediction3 != test$PersonalLoan)


cat("Model 1 Accuracy : ", 1-error, '\n\n')
cat("Model 2 Accuracy : ", 1-error2, '\n\n')
cat("Model 3 Accuracy : ", 1-error3, '\n')

```








# Moderating Logit

## {.tabset}

### Logit Model 1

```{r}

l1_model <- glm(PersonalLoan ~ (Income + Family + CCAvg + Education + Mortgage + SecuritiesAccount + CDAccount + Online + CreditCard)^2 ,
             data = train, family = binomial("logit"))

summary(l1_model)

```


### Logit Model 2

```{r}
 
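# NOTE: (Income + Education) and (Family + Education) add no interaction terms;
# they only repeat main effects. Use `*` or `:` there if interactions were intended.
l2_model <- glm(PersonalLoan ~ Income + Family + CCAvg + Education + Mortgage + SecuritiesAccount  + Online + CreditCard + (Income * Family) +
               (Income * CCAvg) + (Income + Education) + (Family + Education) + (Family * SecuritiesAccount) + (CCAvg * Education) +
               (Online * CreditCard),
             data = train, family = binomial("logit"))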
summary(l2_model)
```


### Logit Model 3

```{r}

 
l3_model <- glm(PersonalLoan ~ Income + Family + CCAvg + Education  + (Income * Family) +
               (Income * CCAvg)  + (CCAvg * Education) ,
             data = train, family = binomial("logit"))

summary(l3_model)

```



### Moderating Accuracy

```{r}
#model 1

# type = "response" returns probabilities rather than log-odds
prediction <- predict(l1_model, newdata = test[,-c(1,5)], type = "response")

prediction <- ifelse(prediction > 0.5, 1, 0)

error <- mean(prediction != test$PersonalLoan)


# model 2

prediction2 <- predict(l2_model, newdata = test[,c("Income", "Family", "CCAvg", "Education", "Mortgage", "SecuritiesAccount","Online", "CreditCard" )], type = "response")

prediction2 <- ifelse(prediction2 > 0.5, 1, 0)

error2 <- mean(prediction2 != test$PersonalLoan)


# model 3

prediction3 <- predict(l3_model, newdata = test[,c("Income","Family", "CCAvg", "Education" )], type = "response")

prediction3 <- ifelse(prediction3 > 0.5, 1, 0)

error3 <- mean(prediction3 != test$PersonalLoan)


cat("Model 1 Accuracy : ", 1-error, '\n\n')
cat("Model 2 Accuracy : ", 1-error2, '\n\n')
cat("Model 3 Accuracy : ", 1-error3, '\n')

```


# Final Model
##
### Model

```{r}

model  <- glm(PersonalLoan ~ Income + Family + CCAvg + Education  + (Income * Family) +
               (Income * CCAvg)  + (CCAvg * Education) ,
             data = loandata, family = binomial("logit"))

summary(model)



```
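
Coefficients of a logit model are on the log-odds scale, so exponentiating them gives odds ratios, which are often easier to read. With interaction terms in the model, a main-effect odds ratio applies only where the interacting variable equals zero. A sketch on simulated data (the data frame `sim2`, the model `fit2`, and the generating coefficients are invented for illustration):

```r
# Simulated data with two predictors and an interaction; not the bank data.
set.seed(7)                                          # arbitrary seed
n2   <- 1000
sim2 <- data.frame(income = runif(n2, 8, 224),
                   family = sample(1:4, n2, replace = TRUE))
eta2 <- -7 + 0.03 * sim2$income + 0.2 * sim2$family +
        0.005 * sim2$income * sim2$family
sim2$loan <- rbinom(n2, 1, plogis(eta2))

# income * family expands to income + family + income:family
fit2 <- glm(loan ~ income * family, family = binomial("logit"), data = sim2)

# Odds ratios: multiplicative change in the odds per one-unit increase.
odds_ratios <- exp(coef(fit2))
round(odds_ratios, 4)
```

An odds ratio above 1 means the odds of accepting the loan rise with the predictor, and below 1 means they fall.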



# Neural Network

## {.tabset}

### Model



```{r}

model1_nn <- neuralnet(PersonalLoan ~ Income + Family + CCAvg + Education  , hidden = 3,
                        lifesign = "minimal",linear.output = FALSE,threshold = 0.1 , data = train )



```



### Neural Networks


```{r}

model1_nn$result.matrix

# rep = "best" draws on the current graphics device so the plot appears when knit
plot(model1_nn, rep = "best")


```



### Accuracy

```{r}

test_result <- compute(model1_nn,test[,c("Income","Family", "CCAvg", "Education" )] )

prediction <- test_result$net.result


prediction3 <- ifelse(prediction > 0.5, 1, 0)

error <- mean(prediction3 != test$PersonalLoan)

cat("Accuracy : ", 1-error)


```



### Result Matrix

```{r}
model1_nn$result.matrix

```


# Analysis in Excel

#### [Analysis in Excel](https://sumailsyr-my.sharepoint.com/:x:/g/personal/toabdula_syr_edu/Ed793jjWAg1NrkXHHfbzA90BewwooVhx_3kTZzHe3BU7lA?e=MqZmqz)