Step 1: Collecting Data

The data has been collected and ready to be analysed.

Step 2: Exploring and preparing data

credit <- read.csv("http://www.sci.csueastbay.edu/~esuess/classes/Statistics_6620/Presentations/ml7/credit.csv")

Fix the default variable to be 0 or 1

credit$default = as.numeric(credit$default)
credit$default = credit$default - 1

Examine the credit data:

str(credit)

'data.frame':   1000 obs. of  17 variables:
 $ checking_balance    : Factor w/ 4 levels "< 0 DM","> 200 DM",..: 1 3 4 1 1 4 4 3 4 3 ...
 $ months_loan_duration: int  6 48 12 42 24 36 24 36 12 30 ...
 $ credit_history      : Factor w/ 5 levels "critical","good",..: 1 2 1 2 4 2 2 2 2 1 ...
 $ purpose             : Factor w/ 6 levels "business","car",..: 5 5 4 5 2 4 5 2 5 2 ...
 $ amount              : int  1169 5951 2096 7882 4870 9055 2835 6948 3059 5234 ...
 $ savings_balance     : Factor w/ 5 levels "< 100 DM","> 1000 DM",..: 5 1 1 1 1 5 4 1 2 1 ...
 $ employment_duration : Factor w/ 5 levels "< 1 year","> 7 years",..: 2 3 4 4 3 3 2 3 4 5 ...
 $ percent_of_income   : int  4 2 2 2 3 2 3 2 2 4 ...
 $ years_at_residence  : int  4 2 3 4 4 4 4 2 4 2 ...
 $ age                 : int  67 22 49 45 53 35 53 35 61 28 ...
 $ other_credit        : Factor w/ 3 levels "bank","none",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ housing             : Factor w/ 3 levels "other","own",..: 2 2 2 1 1 1 2 3 2 2 ...
 $ existing_loans_count: int  2 1 1 1 2 1 1 1 1 2 ...
 $ job                 : Factor w/ 4 levels "management","skilled",..: 2 2 4 2 2 4 2 1 4 1 ...
 $ dependents          : int  1 1 2 2 2 2 1 1 1 1 ...
 $ phone               : Factor w/ 2 levels "no","yes": 2 1 1 1 1 2 1 2 1 1 ...
 $ default             : num  0 1 0 0 1 0 0 0 0 1 ...

Step 3: Fit model

Set up training and test data sets:

indx<- sample(1:nrow(credit),as.integer(0.9*nrow(credit)))
indx

  [1]  385  910  109  594  646  699  305  652  705  600   55  641
 [13]  592  195  933   77  510  124  442   43  751  694  524   76
 [25]   90  380  986  530  968  784  659  752  253  940  107  661
 [37]  816  431  515  650   68  534  513  871  734  876  512  730
 [49]  971  860  411  193  410  335  560  547  186  227  867  957
 [61]  883  744  672  404  721  779  642  444  356 1000  944  774
 [73]  780  445   59  361  289  975  988   70  263  806  341  555
 [85]  458  347  683  147  432  611  912  381  485   57  296   21
 [97]  969  682  403  287  655  830  616  537  353  897   34  880
[109]  217   44  644  695  151  508  143  213  518  677  987  874
[121]  303  549  377  639  440  748  840  237  905  301  663  104
[133]  566  496  299  207   65  809  325  966  166  586  160  308
[145]  418   72  875  707  792  190  208  737  192  469   45  344
[157]  346  999  916  281  657  545  314   64   85  522  211    3
[169]  525  155  700   52  467  950  448  955  842  768  254  763
[181]  802  348  322  449  557  653  540  921   14  878  776  319
[193]   73  246  478  393   42  660  994  801  470  926  189    5
[205]  698  554  167  934  823  996  728  884  961  345  125  647
[217]  980  175  643  230  573   58  948  746   67  120  669  982
[229]  755  328  483  363  722  129  919  712  422  420  486    1
[241]  433   35  599  758   15  794  671  983  261  119  351   99
[253]  800  198  998  202  173  267  681  608  631  111  131  739
[265]  531  276  179  165  103  417  727  575  226  414  667  598
[277]  953  229  188  236  133  473  931  302  606  286  542  789
[289]   28  981  906    9  389  461  602  435  704  140  475  836
[301]  452   30  815  849  894  407  893  340  144  742   47  238
[313]  622  753  437  675  597  819  387  741  757  419  854  191
[325]  847  587   71  726  689  245  127  310  562  952  993   56
[337]  293   94  710  371  550  749  609  235  645  664  796  153
[349]  536  285  811  180  330  544  264  870  383  519   22  252
[361]  553  832  973  991   53  300  963   16  580  516  844  374
[373]   12  520  889  484  640   24  824   23  765  977  398  866
[385]  172  282  297  349  911   11  391  503  329  601  838  858
[397]  209  541  839  272  735    6  457  703  591  223  618  493
[409]  219  563  460  181  532  570  257  942  687  121  274   91
[421]  777  174  388  943  856  279   25   81  373  239  488  283
[433]   41  248  899  738   83  110  100  362  863  579  567  450
[445]  569  215  584  306  161  828  375  891  848  141  400  970
[457]  447  902  716  949  693  593  212  582  456  436  265  762
[469]  231  401  142   50  614  621  315  869  929  851  367   38
[481]  535  923  242  256  588  879  900  224  731  288   46   51
[493]  833  295  837  113  105  805  706  941  402  617  733  480
[505]  658  199  477  529  984  499  358  613  928  724  778  382
[517]  552  610  817  169  607  771  197  825  498  808  114   39
[529]  292  907   27  903   13  793   48  112    2   61  565  492
[541]  909  262  379  708  408  320  807  951  578  890  935   69
[553]  123  163  323  791  135  852  521  627  959  152  343  992
[565]  413   80  918  939  128  130  194  769  368  271  201  670
[577]  972  218  331  311  439  159  922  118  797  551  526  865
[589]  760  468   78  491  634  423  572  370  527  985  352  459
[601]  625  612  504  506   62  138  309  548  685  628  596  861
[613]  501  350  915   84  925  680  877  873  184  775  514  465
[625]  788  960  278  978  136  443  476   54  723  259  304  908
[637]  990  590  205  574  139  154  754  366  291  494  187  203
[649]  421   29  826  273  589  892  148  533   92  914   74  258
[661]   87  665  268   18  701  764  633  326  603  799   19  835
[673]  583  156  228  284   86  803  359  717  785  810  967   97
[685]  397  581  662  196  275  507  720  814  116  171  756  247
[697]  711  157  206  463  558  913  946  691  164  974  924  718
[709]  936  692  945  132  729  386  241  101  269  489  813  855
[721]  615  766  954  438  339  750  624  364  812  221  384  885
[733]  927  845  736  896   75  713  145  696  989   49  745  623
[745]  559  886   10  162  686  654  637  183  474  841  761  277
[757]  409  523  232  454  673  482  204  831  316  798  332   79
[769]   96  490    7  290  240  630  126  122  372  577  312  820
[781]  790  853  338  502  455   82   88  134  656  280  412  511
[793]  626  864  976   66  937  666  354  702  979   17  528   32
[805]  895  843  964  365  715  732  546  170  648  629  862  619
[817]  472  917  679  539  571  882  604  947  146  360  214  158
[829]  509  333  168  827   98  200  818  337  451  249  434  585
[841]  294  307  334  108  406  938  427   37  115  782  102  466
[853]  176  804  904  495   33  635  678  106  632  620  446  336
[865]  415  266   31  888  255  222  636  747  342  234   93  185
[877]  781  298  317  481  649   63  500   95  846  767  995  321
[889]  887  471  216  651  324  822  430   89  150  684  962  538

credit_train<-credit[indx,]
credit_test<- credit[-indx,]
# create labels
which(colnames(credit)=='default')

[1] 17

credit_train_labels<- credit[indx,17]
credit_test_labels<- credit[-indx,17]

Check if there are any missing values:

library(Amelia)

Loading required package: Rcpp
package ‘Rcpp’ was built under R version 3.3.2## 
## Amelia II: Multiple Imputation
## (Version 1.7.4, built: 2015-12-05)
## Copyright (C) 2005-2017 James Honaker, Gary King and Matthew Blackwell
## Refer to http://gking.harvard.edu/amelia/ for more information
##

missmap(credit, main = "Missing values vs observed")

# number of missing values in each column
sapply(credit,function(x) sum(is.na(x)))

    checking_balance months_loan_duration       credit_history 
                   0                    0                    0 
             purpose               amount      savings_balance 
                   0                    0                    0 
 employment_duration    percent_of_income   years_at_residence 
                   0                    0                    0 
                 age         other_credit              housing 
                   0                    0                    0 
existing_loans_count                  job           dependents 
                   0                    0                    0 
               phone              default 
                   0                    0

# number of unique values in each column
sapply(credit, function(x) length(unique(x)))

    checking_balance months_loan_duration       credit_history 
                   4                   33                    5 
             purpose               amount      savings_balance 
                   6                  921                    5 
 employment_duration    percent_of_income   years_at_residence 
                   5                    4                    4 
                 age         other_credit              housing 
                  53                    3                    3 
existing_loans_count                  job           dependents 
                   4                    4                    2 
               phone              default 
                   2                    2

No missing values which is good.

Fit the logistic regression model, with all predictor variables

model <- glm(default ~.,family=binomial(link='logit'),data=credit_train)
model


Call:  glm(formula = default ~ ., family = binomial(link = "logit"), 
    data = credit_train)

Coefficients:
                   (Intercept)        checking_balance> 200 DM  
                    -1.7449409                      -0.7550219  
    checking_balance1 - 200 DM         checking_balanceunknown  
                    -0.3737206                      -1.6528697  
          months_loan_duration              credit_historygood  
                     0.0256547                       0.8459475  
         credit_historyperfect              credit_historypoor  
                     1.4342813                       0.8678542  
       credit_historyvery good                      purposecar  
                     1.4521062                       0.2895620  
                   purposecar0                purposeeducation  
                    -1.0438381                       0.8019085  
   purposefurniture/appliances              purposerenovations  
                    -0.1613657                       0.4198889  
                        amount        savings_balance> 1000 DM  
                     0.0001056                      -0.9151967  
   savings_balance100 - 500 DM    savings_balance500 - 1000 DM  
                    -0.1456988                      -0.3383594  
        savings_balanceunknown    employment_duration> 7 years  
                    -1.0950653                      -0.5157216  
employment_duration1 - 4 years  employment_duration4 - 7 years  
                    -0.2132003                      -0.8815581  
 employment_durationunemployed               percent_of_income  
                    -0.0940415                       0.2903974  
            years_at_residence                             age  
                     0.0462153                      -0.0133834  
              other_creditnone               other_creditstore  
                    -0.5011712                      -0.4526129  
                    housingown                     housingrent  
                    -0.0906545                       0.1756198  
          existing_loans_count                      jobskilled  
                     0.2715070                       0.0673919  
                 jobunemployed                    jobunskilled  
                    -0.1206304                      -0.0322218  
                    dependents                        phoneyes  
                     0.0678525                      -0.2847680  

Degrees of Freedom: 899 Total (i.e. Null);  864 Residual
Null Deviance:      1108 
Residual Deviance: 874  AIC: 946

summary(model)


Call:
glm(formula = default ~ ., family = binomial(link = "logit"), 
    data = credit_train)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.8651  -0.7746  -0.4263   0.8285   2.7059  

Coefficients:
                                 Estimate Std. Error z value
(Intercept)                    -1.745e+00  9.242e-01  -1.888
checking_balance> 200 DM       -7.550e-01  3.660e-01  -2.063
checking_balance1 - 200 DM     -3.737e-01  2.136e-01  -1.750
checking_balanceunknown        -1.653e+00  2.328e-01  -7.100
months_loan_duration            2.565e-02  9.164e-03   2.799
credit_historygood              8.459e-01  2.605e-01   3.248
credit_historyperfect           1.434e+00  4.407e-01   3.255
credit_historypoor              8.679e-01  3.326e-01   2.609
credit_historyvery good         1.452e+00  4.206e-01   3.452
purposecar                      2.896e-01  3.189e-01   0.908
purposecar0                    -1.044e+00  8.136e-01  -1.283
purposeeducation                8.019e-01  4.481e-01   1.790
purposefurniture/appliances    -1.614e-01  3.144e-01  -0.513
purposerenovations              4.199e-01  5.820e-01   0.721
amount                          1.056e-04  4.391e-05   2.406
savings_balance> 1000 DM       -9.152e-01  5.155e-01  -1.775
savings_balance100 - 500 DM    -1.457e-01  2.756e-01  -0.529
savings_balance500 - 1000 DM   -3.384e-01  4.000e-01  -0.846
savings_balanceunknown         -1.095e+00  2.705e-01  -4.048
employment_duration> 7 years   -5.157e-01  2.948e-01  -1.749
employment_duration1 - 4 years -2.132e-01  2.349e-01  -0.908
employment_duration4 - 7 years -8.816e-01  2.956e-01  -2.982
employment_durationunemployed  -9.404e-02  4.166e-01  -0.226
percent_of_income               2.904e-01  8.767e-02   3.312
years_at_residence              4.622e-02  8.649e-02   0.534
age                            -1.338e-02  8.979e-03  -1.490
other_creditnone               -5.012e-01  2.349e-01  -2.134
other_creditstore              -4.526e-01  4.348e-01  -1.041
housingown                     -9.065e-02  2.949e-01  -0.307
housingrent                     1.756e-01  3.438e-01   0.511
existing_loans_count            2.715e-01  1.900e-01   1.429
jobskilled                      6.739e-02  2.898e-01   0.233
jobunemployed                  -1.206e-01  6.220e-01  -0.194
jobunskilled                   -3.222e-02  3.486e-01  -0.092
dependents                      6.785e-02  2.392e-01   0.284
phoneyes                       -2.848e-01  2.030e-01  -1.402
                               Pr(>|z|)    
(Intercept)                    0.059014 .  
checking_balance> 200 DM       0.039131 *  
checking_balance1 - 200 DM     0.080132 .  
checking_balanceunknown        1.24e-12 ***
months_loan_duration           0.005120 ** 
credit_historygood             0.001163 ** 
credit_historyperfect          0.001135 ** 
credit_historypoor             0.009079 ** 
credit_historyvery good        0.000556 ***
purposecar                     0.363944    
purposecar0                    0.199469    
purposeeducation               0.073494 .  
purposefurniture/appliances    0.607740    
purposerenovations             0.470651    
amount                         0.016134 *  
savings_balance> 1000 DM       0.075860 .  
savings_balance100 - 500 DM    0.597062    
savings_balance500 - 1000 DM   0.397623    
savings_balanceunknown         5.16e-05 ***
employment_duration> 7 years   0.080267 .  
employment_duration1 - 4 years 0.364042    
employment_duration4 - 7 years 0.002864 ** 
employment_durationunemployed  0.821428    
percent_of_income              0.000925 ***
years_at_residence             0.593108    
age                            0.136099    
other_creditnone               0.032855 *  
other_creditstore              0.297940    
housingown                     0.758559    
housingrent                    0.609492    
existing_loans_count           0.152965    
jobskilled                     0.816108    
jobunemployed                  0.846212    
jobunskilled                   0.926358    
dependents                     0.776648    
phoneyes                       0.160776    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1107.90  on 899  degrees of freedom
Residual deviance:  873.97  on 864  degrees of freedom
AIC: 945.97

Number of Fisher Scoring iterations: 5

anova(model, test="Chisq")

Analysis of Deviance Table

Model: binomial, link: logit

Response: default

Terms added sequentially (first to last)

                     Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                                   899    1107.90              
checking_balance      3  112.037       896     995.86 < 2.2e-16 ***
months_loan_duration  1   30.031       895     965.83 4.252e-08 ***
credit_history        4   27.393       891     938.44 1.656e-05 ***
purpose               5    9.563       886     928.87  0.088608 .  
amount                1    0.494       885     928.38  0.482128    
savings_balance       4   20.249       881     908.13  0.000446 ***
employment_duration   4   10.396       877     897.74  0.034265 *  
percent_of_income     1    9.329       876     888.41  0.002255 ** 
years_at_residence    1    0.472       875     887.93  0.492289    
age                   1    3.162       874     884.77  0.075375 .  
other_credit          2    4.953       872     879.82  0.084033 .  
housing               2    1.463       870     878.36  0.481278    
existing_loans_count  1    1.862       869     876.49  0.172343    
job                   3    0.468       866     876.03  0.925841    
dependents            1    0.073       865     875.95  0.786640    
phone                 1    1.983       864     873.97  0.159066    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

From the result, we can tell that the checking balance, months loan duration, credit history, saving balance, employment duration, percent of income and age are signifiacnt.Next, Let’t drop the insignificant terms and fit another regression model. Alpha=0.10

model <- glm(default ~ checking_balance + months_loan_duration + credit_history +  percent_of_income + age,family=binomial(link='logit'),data=credit_train)
model


Call:  glm(formula = default ~ checking_balance + months_loan_duration + 
    credit_history + percent_of_income + age, family = binomial(link = "logit"), 
    data = credit_train)

Coefficients:
               (Intercept)    checking_balance> 200 DM  
                  -1.34219                    -0.98414  
checking_balance1 - 200 DM     checking_balanceunknown  
                  -0.49124                    -1.83406  
      months_loan_duration          credit_historygood  
                   0.03102                     0.57593  
     credit_historyperfect          credit_historypoor  
                   1.63343                     0.74246  
   credit_historyvery good           percent_of_income  
                   1.39251                     0.16950  
                       age  
                  -0.01213  

Degrees of Freedom: 899 Total (i.e. Null);  889 Residual
Null Deviance:      1108 
Residual Deviance: 930.8    AIC: 952.8

summary(model)


Call:
glm(formula = default ~ checking_balance + months_loan_duration + 
    credit_history + percent_of_income + age, family = binomial(link = "logit"), 
    data = credit_train)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.7835  -0.8141  -0.4896   0.9488   2.3335  

Coefficients:
                            Estimate Std. Error z value Pr(>|z|)    
(Intercept)                -1.342194   0.427226  -3.142 0.001680 ** 
checking_balance> 200 DM   -0.984138   0.344520  -2.857 0.004283 ** 
checking_balance1 - 200 DM -0.491239   0.195250  -2.516 0.011871 *  
checking_balanceunknown    -1.834064   0.214956  -8.532  < 2e-16 ***
months_loan_duration        0.031022   0.006642   4.671 3.00e-06 ***
credit_historygood          0.575926   0.205279   2.806 0.005023 ** 
credit_historyperfect       1.633426   0.413018   3.955 7.66e-05 ***
credit_historypoor          0.742460   0.312521   2.376 0.017515 *  
credit_historyvery good     1.392513   0.367016   3.794 0.000148 ***
percent_of_income           0.169499   0.074052   2.289 0.022084 *  
age                        -0.012125   0.007337  -1.653 0.098423 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1107.90  on 899  degrees of freedom
Residual deviance:  930.78  on 889  degrees of freedom
AIC: 952.78

Number of Fisher Scoring iterations: 4

anova(model, test="Chisq")

Analysis of Deviance Table

Model: binomial, link: logit

Response: default

Terms added sequentially (first to last)

                     Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                                   899    1107.90              
checking_balance      3  112.037       896     995.86 < 2.2e-16 ***
months_loan_duration  1   30.031       895     965.83 4.252e-08 ***
credit_history        4   27.393       891     938.44 1.656e-05 ***
percent_of_income     1    4.869       890     933.57   0.02735 *  
age                   1    2.787       889     930.78   0.09503 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Step 4: Check model performance

Check Accuracy:

fitted.results <- predict(model,newdata=credit_test,type='response')
fitted.results <- ifelse(fitted.results > 0.5,1,0)
misClasificError <- mean(fitted.results != credit_test$default)
print(paste('Accuracy',1-misClasificError))

[1] "Accuracy 0.82"

The accuracy is 0.82, which is very good.

Step 5: Improve model performance

ROC Method: Because this data set is so small, it is possible that the test data set does not contain both 0 and 1 values. If this happens the code will not run.And since the test data set is so small the ROC is not useful here but the code is provided. ‘’’ library(ROCR) p <- predict(model, newdata=credit_test, type=“response”) pr <- prediction(p, credit_test$default) prf <- performance(pr, measure = “tpr”, x.measure = “fpr”) plot(prf)

auc <- performance(pr, measure = “auc”) auc <- auc@y.values[[1]] auc ‘’’

LS0tCnRpdGxlOiAiTG9naXN0aWMgUmVncmVzc2lvbiBBbmFseXNpcyBvbiBDcmVkaXQgRGF0YSIKb3V0cHV0OiBodG1sX25vdGVib29rCi0tLQojIyBTdGVwIDE6IENvbGxlY3RpbmcgRGF0YQpUaGUgZGF0YSBoYXMgYmVlbiBjb2xsZWN0ZWQgYW5kIHJlYWR5IHRvIGJlIGFuYWx5c2VkLgoKIyMgU3RlcCAyOiBFeHBsb3JpbmcgYW5kIHByZXBhcmluZyBkYXRhCmBgYHtyfQpjcmVkaXQgPC0gcmVhZC5jc3YoImh0dHA6Ly93d3cuc2NpLmNzdWVhc3RiYXkuZWR1L35lc3Vlc3MvY2xhc3Nlcy9TdGF0aXN0aWNzXzY2MjAvUHJlc2VudGF0aW9ucy9tbDcvY3JlZGl0LmNzdiIpCmBgYAoKRml4IHRoZSBkZWZhdWx0IHZhcmlhYmxlIHRvIGJlIDAgb3IgMQpgYGB7cn0KY3JlZGl0JGRlZmF1bHQgPSBhcy5udW1lcmljKGNyZWRpdCRkZWZhdWx0KQpjcmVkaXQkZGVmYXVsdCA9IGNyZWRpdCRkZWZhdWx0IC0gMQpgYGAKRXhhbWluZSB0aGUgY3JlZGl0IGRhdGE6CmBgYHtyfQpzdHIoY3JlZGl0KQpgYGAKCiMjIFN0ZXAgMzogRml0IG1vZGVsCiMgU2V0IHVwIHRyYWluaW5nIGFuZCB0ZXN0IGRhdGEgc2V0czoKYGBge3J9CmluZHg8LSBzYW1wbGUoMTpucm93KGNyZWRpdCksYXMuaW50ZWdlcigwLjkqbnJvdyhjcmVkaXQpKSkKaW5keAoKY3JlZGl0X3RyYWluPC1jcmVkaXRbaW5keCxdCmNyZWRpdF90ZXN0PC0gY3JlZGl0Wy1pbmR4LF0KCiMgY3JlYXRlIGxhYmVscwp3aGljaChjb2xuYW1lcyhjcmVkaXQpPT0nZGVmYXVsdCcpCmNyZWRpdF90cmFpbl9sYWJlbHM8LSBjcmVkaXRbaW5keCwxN10KY3JlZGl0X3Rlc3RfbGFiZWxzPC0gY3JlZGl0Wy1pbmR4LDE3XQpgYGAKCkNoZWNrIGlmIHRoZXJlIGFyZSBhbnkgbWlzc2luZyB2YWx1ZXM6CmBgYHtyfQoKbGlicmFyeShBbWVsaWEpCm1pc3NtYXAoY3JlZGl0LCBtYWluID0gIk1pc3NpbmcgdmFsdWVzIHZzIG9ic2VydmVkIikKCiMgbnVtYmVyIG9mIG1pc3NpbmcgdmFsdWVzIGluIGVhY2ggY29sdW1uCgpzYXBwbHkoY3JlZGl0LGZ1bmN0aW9uKHgpIHN1bShpcy5uYSh4KSkpCgojIG51bWJlciBvZiB1bmlxdWUgdmFsdWVzIGluIGVhY2ggY29sdW1uCgpzYXBwbHkoY3JlZGl0LCBmdW5jdGlvbih4KSBsZW5ndGgodW5pcXVlKHgpKSkKYGBgCk5vIG1pc3NpbmcgdmFsdWVzIHdoaWNoIGlzIGdvb2QuCgpGaXQgdGhlIGxvZ2lzdGljIHJlZ3Jlc3Npb24gbW9kZWwsIHdpdGggYWxsIHByZWRpY3RvciB2YXJpYWJsZXMKYGBge3J9Cm1vZGVsIDwtIGdsbShkZWZhdWx0IH4uLGZhbWlseT1iaW5vbWlhbChsaW5rPSdsb2dpdCcpLGRhdGE9Y3JlZGl0X3RyYWluKQptb2RlbAoKc3VtbWFyeShtb2RlbCkKCmFub3ZhKG1vZGVsLCB0ZXN0PSJDaGlzcSIpCmBgYApGcm9tIHRoZSByZXN1bHQsIHdlIGNhbiB0ZWxsIHRoYXQgdGhlIGNoZWNraW5nIGJhbGFuY2UsIG1vbnRocyBsb2FuIGR1cmF0aW9uLCBjcmVkaXQgaGlzdG9yeSwgc2F2aW5nIGJhbGFuY2UsIGVtcGxveW1lbnQgZHVyYXRpb24sIHBlcmNlbnQgb2YgaW5jb21lIGFuZCBhZ2UgYXJlIHNpZ25pZmlhY250Lk5leHQsIExldCd0IGRyb3AgdGhlIGluc2lnbmlmaWNhbnQgdGVybXMgYW5kIGZpdCBhbm90aGVyIHJlZ3Jlc3Npb24gbW9kZWwuCkFscGhhPTAuMTAKCmBgYHtyfQptb2RlbCA8LSBnbG0oZGVmYXVsdCB+IGNoZWNraW5nX2JhbGFuY2UgKyBtb250aHNfbG9hbl9kdXJhdGlvbiArIGNyZWRpdF9oaXN0b3J5ICsgIHBlcmNlbnRfb2ZfaW5jb21lICsgYWdlLGZhbWlseT1iaW5vbWlhbChsaW5rPSdsb2dpdCcpLGRhdGE9Y3JlZGl0X3RyYWluKQptb2RlbAoKc3VtbWFyeShtb2RlbCkKCmFub3ZhKG1vZGVsLCB0ZXN0PSJDaGlzcSIpCmBgYAoKIyMgU3RlcCA0OiBDaGVjayBtb2RlbCBwZXJmb3JtYW5jZQpDaGVjayBBY2N1cmFjeToKYGBge3J9CmZpdHRlZC5yZXN1bHRzIDwtIHByZWRpY3QobW9kZWwsbmV3ZGF0YT1jcmVkaXRfdGVzdCx0eXBlPSdyZXNwb25zZScpCmZpdHRlZC5yZXN1bHRzIDwtIGlmZWxzZShmaXR0ZWQucmVzdWx0cyA+IDAuNSwxLDApCgptaXNDbGFzaWZpY0Vycm9yIDwtIG1lYW4oZml0dGVkLnJlc3VsdHMgIT0gY3JlZGl0X3Rlc3QkZGVmYXVsdCkKcHJpbnQocGFzdGUoJ0FjY3VyYWN5JywxLW1pc0NsYXNpZmljRXJyb3IpKQpgYGAKVGhlIGFjY3VyYWN5IGlzIDAuODIsIHdoaWNoIGlzIHZlcnkgZ29vZC4gCgojIyBTdGVwIDU6IEltcHJvdmUgbW9kZWwgcGVyZm9ybWFuY2UKClJPQyBNZXRob2Q6CkJlY2F1c2UgdGhpcyBkYXRhIHNldCBpcyBzbyBzbWFsbCwgaXQgaXMgcG9zc2libGUgdGhhdCB0aGUgdGVzdCBkYXRhIHNldApkb2VzIG5vdCBjb250YWluIGJvdGggMCBhbmQgMSB2YWx1ZXMuICBJZiB0aGlzIGhhcHBlbnMgdGhlIGNvZGUgd2lsbCBub3QKcnVuLkFuZCBzaW5jZSB0aGUgdGVzdCBkYXRhIHNldCBpcyBzbyBzbWFsbCB0aGUgUk9DIGlzIG5vdCB1c2VmdWwgaGVyZQpidXQgdGhlIGNvZGUgaXMgcHJvdmlkZWQuCicnJwpsaWJyYXJ5KFJPQ1IpCnAgPC0gcHJlZGljdChtb2RlbCwgbmV3ZGF0YT1jcmVkaXRfdGVzdCwgdHlwZT0icmVzcG9uc2UiKQpwciA8LSBwcmVkaWN0aW9uKHAsIGNyZWRpdF90ZXN0JGRlZmF1bHQpCnByZiA8LSBwZXJmb3JtYW5jZShwciwgbWVhc3VyZSA9ICJ0cHIiLCB4Lm1lYXN1cmUgPSAiZnByIikKcGxvdChwcmYpCgphdWMgPC0gcGVyZm9ybWFuY2UocHIsIG1lYXN1cmUgPSAiYXVjIikKYXVjIDwtIGF1Y0B5LnZhbHVlc1tbMV1dCmF1YwonJycKCg==

Logistic Regression Analysis on Credit Data

Step 1: Collecting Data

Step 2: Exploring and preparing data

Step 3: Fit model

Set up training and test data sets:

Step 4: Check model performance

Step 5: Improve model performance