Bagaimana menilai kredibilitas seseorang yang mengajukan kartu kredit, di sini akan di ulas dengan singkat menggunakan model statistik. Data diambil dari http://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data). Data terdiri dari 1000 observasi, dengan 21 varible.

Persiapan

  library(dplyr)
## Warning: package 'dplyr' was built under R version 3.5.3
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

upload data

Data setelah di download, disimpan di lokal disk, untuk keperluan selanjutnya.

Dt <- read.csv('G:/new-program/clients_credibility/german_credit-1.csv')
dim(Dt)
## [1] 1000   21
str(Dt)
## 'data.frame':    1000 obs. of  21 variables:
##  $ Creditability                : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Account.Balance              : int  1 1 2 1 1 1 1 1 4 2 ...
##  $ Duration.of.Credit..month.   : int  18 9 12 12 12 10 8 6 18 24 ...
##  $ PaymentStatusofPreviousCredit: int  4 4 2 4 4 4 4 4 4 2 ...
##  $ Purpose                      : int  2 0 9 0 0 0 0 0 3 3 ...
##  $ Credit.Amount                : int  1049 2799 841 2122 2171 2241 3398 1361 1098 3758 ...
##  $ Value.Savings.Stocks         : int  1 1 2 1 1 1 1 1 1 3 ...
##  $ Length.of.current.employment : int  2 3 4 3 3 2 4 2 1 1 ...
##  $ Instalment.per.cent          : int  4 2 2 3 4 1 1 2 4 1 ...
##  $ Sex...Marital.Status         : int  2 3 2 3 3 3 3 3 2 2 ...
##  $ Guarantors                   : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Duration.in.Current.address  : int  4 2 4 2 4 3 4 4 4 4 ...
##  $ Most.valuable.available.asset: int  2 1 1 1 2 1 1 1 3 4 ...
##  $ Age..years.                  : int  21 36 23 39 38 48 39 40 65 23 ...
##  $ Concurrent.Credits           : int  3 3 3 3 1 3 3 3 3 3 ...
##  $ Type.of.apartment            : int  1 1 1 1 2 1 2 2 2 1 ...
##  $ No.of.Credits.at.this.Bank   : int  1 2 1 2 2 2 2 1 2 1 ...
##  $ Occupation                   : int  3 3 2 2 2 2 2 2 1 1 ...
##  $ No.of.dependents             : int  1 2 1 2 1 2 1 2 1 1 ...
##  $ Telephone                    : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Foreign.Worker               : int  1 1 1 2 2 2 2 2 1 1 ...

Spliting data

sp <- sample(nrow(Dt),nrow(Dt)*0.7)
train <- Dt[sp, ]
test <- Dt[-sp, ]

Create model

Menggunakan model logistic, model ini cocok dipakai untuk data yang sifatnya binomial, biasanya ditandai dengan variable dependent(variable terikat) , dengan nilai “1” atau “0”, “Yes” atau “No”, dst. Di data ini variable dependentnya adalah Creditability(nilainya “1” dan “0”), “1” artinya “good”, “0” artinya “bad”.

g <- glm(Creditability~
                 Account.Balance+ Duration.of.Credit..month.+ PaymentStatusofPreviousCredit+
                 Purpose+ Credit.Amount+ Value.Savings.Stocks+
                 Length.of.current.employment+ Instalment.per.cent+ Sex...Marital.Status+
                 Guarantors+ Duration.in.Current.address+ Most.valuable.available.asset+
                 Age..years.+ Concurrent.Credits+ Type.of.apartment+
                 No.of.Credits.at.this.Bank+ Occupation+ No.of.dependents+
                 Telephone+ Foreign.Worker, 
                 data = train, family=binomial)
summary(g)
## 
## Call:
## glm(formula = Creditability ~ Account.Balance + Duration.of.Credit..month. + 
##     PaymentStatusofPreviousCredit + Purpose + Credit.Amount + 
##     Value.Savings.Stocks + Length.of.current.employment + Instalment.per.cent + 
##     Sex...Marital.Status + Guarantors + Duration.in.Current.address + 
##     Most.valuable.available.asset + Age..years. + Concurrent.Credits + 
##     Type.of.apartment + No.of.Credits.at.this.Bank + Occupation + 
##     No.of.dependents + Telephone + Foreign.Worker, family = binomial, 
##     data = train)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.7223  -0.7815   0.4393   0.7286   2.0103  
## 
## Coefficients:
##                                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                   -3.926e+00  1.223e+00  -3.210 0.001326 ** 
## Account.Balance                6.044e-01  8.673e-02   6.969 3.19e-12 ***
## Duration.of.Credit..month.    -2.756e-02  1.027e-02  -2.683 0.007293 ** 
## PaymentStatusofPreviousCredit  3.753e-01  1.031e-01   3.641 0.000272 ***
## Purpose                        2.636e-02  3.640e-02   0.724 0.469044    
## Credit.Amount                 -8.619e-05  4.833e-05  -1.783 0.074563 .  
## Value.Savings.Stocks           2.133e-01  6.930e-02   3.078 0.002081 ** 
## Length.of.current.employment   1.736e-01  8.297e-02   2.092 0.036436 *  
## Instalment.per.cent           -2.544e-01  9.889e-02  -2.573 0.010088 *  
## Sex...Marital.Status           2.274e-01  1.363e-01   1.668 0.095328 .  
## Guarantors                     2.521e-01  2.072e-01   1.217 0.223754    
## Duration.in.Current.address   -1.309e-02  9.418e-02  -0.139 0.889464    
## Most.valuable.available.asset -1.008e-01  1.111e-01  -0.907 0.364333    
## Age..years.                    1.413e-02  9.963e-03   1.418 0.156091    
## Concurrent.Credits             2.889e-01  1.358e-01   2.128 0.033358 *  
## Type.of.apartment              1.986e-01  2.099e-01   0.947 0.343876    
## No.of.Credits.at.this.Bank    -2.106e-01  1.947e-01  -1.082 0.279304    
## Occupation                    -1.395e-01  1.647e-01  -0.847 0.396954    
## No.of.dependents              -3.251e-02  2.953e-01  -0.110 0.912352    
## Telephone                      2.752e-02  2.319e-01   0.119 0.905512    
## Foreign.Worker                 1.422e+00  7.303e-01   1.947 0.051482 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 855.68  on 695  degrees of freedom
## Residual deviance: 662.84  on 675  degrees of freedom
##   (4 observations deleted due to missingness)
## AIC: 704.84
## 
## Number of Fisher Scoring iterations: 5

Cara baca summary

Variable yang memenuhi syarat adalah variable yang Pr(>|z|) < 0.05, dapat dilihat pada varible ditandai dengan bintang(*), variable dengan bintang 3 tertinggi dan seterusnya, inilah variable yang paling berpengaruh terhadap credibilitas calon pelamar kartu kredit. Variable tsb : 1.Account.Balance, 2.Value.Savings.Stocks, 3.PaymentStatusofPreviousCredit, 4.Guarantors, 5.Concurrent.Credits, 6.Duration.of.Credit..month., 7.Credit.Amount, 8.Instalment.per.cent, 9 Most.valuable.available.asset.

Prediction

p <- predict(g,newdata = test,type = 'response')
p
##         1         2         5         7        12        14        15        16 
## 0.4480585 0.6988018 0.8054350 0.9549841 0.6227000 0.4119008 0.7332512 0.7316658 
##        17        18        21        28        30        34        35        39 
## 0.7920488 0.7858833 0.7096159 0.8967881 0.8065819 0.8451232 0.5679898 0.9425123 
##        40        43        49        59        61        66        70        73 
## 0.8286703 0.8850726 0.8553557 0.9393030 0.9603511 0.9611316 0.9036502 0.8585993 
##        74        80        81        85        87        88        90       100 
## 0.8938785 0.9430186 0.8615112 0.9827145 0.8809530 0.7134968 0.6820674 0.5804640 
##       101       105       110       113       114       118       121       128 
## 0.9448379 0.9433931 0.9754525 0.8032591 0.8395282 0.8672191 0.2505256 0.9486575 
##       132       134       135       138       143       144       151       154 
## 0.8829956 0.8189968 0.9176032 0.3368888 0.8120713 0.9176812 0.8004879 0.5254585 
##       160       162       164       165       168       180       188       190 
## 0.8108743 0.9185904 0.9036981 0.6342048 0.9498946 0.6621885 0.9002068 0.6568461 
##       193       206       212       213       214       218       233       235 
## 0.5633732 0.7539346 0.3157711 0.3222304 0.9914753 0.9862435 0.7133940 0.6962054 
##       237       242       244       247       248       255       258       260 
## 0.8745409 0.6180612 0.9096124 0.8724918 0.8927228 0.8918965 0.9187561 0.9762455 
##       261       263       266       268       272       273       274       275 
## 0.7645460 0.6690436 0.9187660 0.8702150 0.8358881 0.9645511 0.9318171 0.8107016 
##       276       279       285       290       292       293       296       297 
## 0.8597686 0.3907878 0.2859524 0.8352958 0.8617651 0.9867702 0.6064236 0.8494980 
##       302       304       305       307       309       311       315       317 
## 0.4836165 0.9815538 0.8238208 0.6535495 0.2381739 0.9758142 0.6342840 0.6994537 
##       319       320       321       323       329       331       332       335 
## 0.9040890 0.8655907 0.9753759 0.9639497 0.9797624 0.9458828 0.8985156 0.9626685 
##       338       339       343       345       356       367       371       375 
## 0.7705008 0.9512725 0.9493933 0.9347038 0.9347357 0.8463855 0.4246215 0.8843299 
##       376       388       391       403       404       406       407       409 
## 0.3407028 0.8664730 0.8925306 0.6674036 0.6959921 0.5026051 0.8225427 0.7462604 
##       410       411       412       413       418       427       430       431 
## 0.8804361 0.9200484 0.9292966 0.9703809 0.9041324 0.4788300 0.8889998 0.7694928 
##       433       435       439       441       442       443       447       451 
## 0.8167271 0.9817318 0.4254067 0.8415466 0.8323225 0.3803890 0.9779115 0.8553234 
##       460       464       468       470       471       474       481       483 
## 0.9362657 0.4294087 0.9321294 0.8030533 0.7209578 0.9330662 0.8668095 0.9274128 
##       487       491       493       502       505       506       508       510 
## 0.8014977 0.8219913 0.2603003 0.9163324 0.7590638 0.9686365 0.9465301 0.7473819 
##       511       513       520       522       523       529       532       533 
## 0.9776566 0.5738503 0.7188756 0.3102771 0.8944334 0.8186900 0.9495529 0.8129993 
##       534       535       536       538       540       542       547       548 
## 0.6499562 0.2468896 0.5147862 0.9646870 0.2992004 0.5558303 0.6267217 0.9413573 
##       550       551       554       562       565       568       571       572 
## 0.9488647 0.6518085 0.9504710 0.8981890 0.8959490 0.9423341 0.7261085 0.8874197 
##       574       577       578       586       589       590       592       593 
## 0.4460011 0.3715800 0.5046899 0.5196787 0.5254556 0.5199312 0.7225611 0.2467751 
##       599       602       604       616       618       623       632       635 
## 0.7424600 0.9381587 0.6149636 0.5244432 0.4418115 0.7537739 0.5205073 0.2918439 
##       643       649       650       658       666       670       675       676 
## 0.5822497 0.6722629 0.6433782 0.8942376 0.7355943 0.8214372 0.6907769 0.9047724 
##       681       682       683       685       692       694       695       699 
## 0.5240486 0.8167156 0.5945117 0.4817062 0.5765099 0.9118735 0.8723524 0.5064005 
##       700       703       705       707       709       711       712       713 
## 0.7278153 0.6088749 0.9245598 0.8850199 0.2567157 0.8784832 0.5251983 0.9394899 
##       716       717       721       727       730       735       744       752 
## 0.4877333 0.2446791 0.5515088 0.8395063 0.5301700 0.4631305 0.9714969 0.5051386 
##       753       754       757       761       762       768       769       772 
## 0.9047377 0.4672225 0.7081430 0.3621208 0.4853176 0.8529441 0.7009372 0.4701511 
##       777       782       783       784       785       786       788       790 
## 0.5499902 0.5315023 0.3361626 0.4644443 0.2802226 0.8304998 0.8746406 0.5343765 
##       793       797       800       802       803       804       805       806 
## 0.1747835 0.8365235 0.6115170 0.4891824 0.4948186 0.7338956 0.6900319 0.2407778 
##       808       811       812       824       825       830       834       835 
## 0.3439556 0.2372710 0.1884575 0.3879015 0.3612473 0.3646078 0.5257835 0.8803022 
##       836       837       839       840       842       843       849       852 
## 0.2812730 0.6755285 0.6307495 0.9123262 0.8103388 0.8538584 0.5696104 0.2632647 
##       853       854       857       864       869       871       874       875 
## 0.7088311 0.8864554 0.3260370 0.8873999 0.2326770 0.3348650 0.6780053 0.3613330 
##       880       890       894       896       899       909       915       926 
## 0.5657093 0.2674830 0.3002287 0.2681009 0.5166692 0.4549754 0.1061541 0.4947740 
##       927       935       938       942       945       951       952       953 
## 0.1855616 0.9600733 0.8038207 0.6057880 0.7226603 0.3111426 0.9154348 0.7658870 
##       954       957       959       960       962       965       969       973 
## 0.6477980 0.6283691 0.2236815 0.5547219 0.6682290 0.6959884 0.2510308 0.8006526 
##       980       990       996      1000 
## 0.5991674 0.4661332 0.4846720 0.5485085

Check prediksi dari hasil test

p1 <- predict(g,test[1,],type = 'response')
# test[1,], artinya data observasi test baris 1
p1
##         1 
## 0.4480585
p2 <- predict(g,test[2,],type = 'response')
# test[2,], artinya data observasi test baris 2
p2
##         2 
## 0.6988018