Bagaimana menilai kredibilitas seseorang yang mengajukan kartu kredit, di sini akan di ulas dengan singkat menggunakan model statistik. Data diambil dari http://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data). Data terdiri dari 1000 observasi, dengan 21 varible.
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.5.3
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
Data setelah di download, disimpan di lokal disk, untuk keperluan selanjutnya.
Dt <- read.csv('G:/new-program/clients_credibility/german_credit-1.csv')
dim(Dt)
## [1] 1000 21
str(Dt)
## 'data.frame': 1000 obs. of 21 variables:
## $ Creditability : int 1 1 1 1 1 1 1 1 1 1 ...
## $ Account.Balance : int 1 1 2 1 1 1 1 1 4 2 ...
## $ Duration.of.Credit..month. : int 18 9 12 12 12 10 8 6 18 24 ...
## $ PaymentStatusofPreviousCredit: int 4 4 2 4 4 4 4 4 4 2 ...
## $ Purpose : int 2 0 9 0 0 0 0 0 3 3 ...
## $ Credit.Amount : int 1049 2799 841 2122 2171 2241 3398 1361 1098 3758 ...
## $ Value.Savings.Stocks : int 1 1 2 1 1 1 1 1 1 3 ...
## $ Length.of.current.employment : int 2 3 4 3 3 2 4 2 1 1 ...
## $ Instalment.per.cent : int 4 2 2 3 4 1 1 2 4 1 ...
## $ Sex...Marital.Status : int 2 3 2 3 3 3 3 3 2 2 ...
## $ Guarantors : int 1 1 1 1 1 1 1 1 1 1 ...
## $ Duration.in.Current.address : int 4 2 4 2 4 3 4 4 4 4 ...
## $ Most.valuable.available.asset: int 2 1 1 1 2 1 1 1 3 4 ...
## $ Age..years. : int 21 36 23 39 38 48 39 40 65 23 ...
## $ Concurrent.Credits : int 3 3 3 3 1 3 3 3 3 3 ...
## $ Type.of.apartment : int 1 1 1 1 2 1 2 2 2 1 ...
## $ No.of.Credits.at.this.Bank : int 1 2 1 2 2 2 2 1 2 1 ...
## $ Occupation : int 3 3 2 2 2 2 2 2 1 1 ...
## $ No.of.dependents : int 1 2 1 2 1 2 1 2 1 1 ...
## $ Telephone : int 1 1 1 1 1 1 1 1 1 1 ...
## $ Foreign.Worker : int 1 1 1 2 2 2 2 2 1 1 ...
sp <- sample(nrow(Dt),nrow(Dt)*0.7)
train <- Dt[sp, ]
test <- Dt[-sp, ]
Menggunakan model logistic, model ini cocok dipakai untuk data yang sifatnya binomial, biasanya ditandai dengan variable dependent(variable terikat) , dengan nilai “1” atau “0”, “Yes” atau “No”, dst. Di data ini variable dependentnya adalah Creditability(nilainya “1” dan “0”), “1” artinya “good”, “0” artinya “bad”.
g <- glm(Creditability~
Account.Balance+ Duration.of.Credit..month.+ PaymentStatusofPreviousCredit+
Purpose+ Credit.Amount+ Value.Savings.Stocks+
Length.of.current.employment+ Instalment.per.cent+ Sex...Marital.Status+
Guarantors+ Duration.in.Current.address+ Most.valuable.available.asset+
Age..years.+ Concurrent.Credits+ Type.of.apartment+
No.of.Credits.at.this.Bank+ Occupation+ No.of.dependents+
Telephone+ Foreign.Worker,
data = train, family=binomial)
summary(g)
##
## Call:
## glm(formula = Creditability ~ Account.Balance + Duration.of.Credit..month. +
## PaymentStatusofPreviousCredit + Purpose + Credit.Amount +
## Value.Savings.Stocks + Length.of.current.employment + Instalment.per.cent +
## Sex...Marital.Status + Guarantors + Duration.in.Current.address +
## Most.valuable.available.asset + Age..years. + Concurrent.Credits +
## Type.of.apartment + No.of.Credits.at.this.Bank + Occupation +
## No.of.dependents + Telephone + Foreign.Worker, family = binomial,
## data = train)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.7223 -0.7815 0.4393 0.7286 2.0103
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.926e+00 1.223e+00 -3.210 0.001326 **
## Account.Balance 6.044e-01 8.673e-02 6.969 3.19e-12 ***
## Duration.of.Credit..month. -2.756e-02 1.027e-02 -2.683 0.007293 **
## PaymentStatusofPreviousCredit 3.753e-01 1.031e-01 3.641 0.000272 ***
## Purpose 2.636e-02 3.640e-02 0.724 0.469044
## Credit.Amount -8.619e-05 4.833e-05 -1.783 0.074563 .
## Value.Savings.Stocks 2.133e-01 6.930e-02 3.078 0.002081 **
## Length.of.current.employment 1.736e-01 8.297e-02 2.092 0.036436 *
## Instalment.per.cent -2.544e-01 9.889e-02 -2.573 0.010088 *
## Sex...Marital.Status 2.274e-01 1.363e-01 1.668 0.095328 .
## Guarantors 2.521e-01 2.072e-01 1.217 0.223754
## Duration.in.Current.address -1.309e-02 9.418e-02 -0.139 0.889464
## Most.valuable.available.asset -1.008e-01 1.111e-01 -0.907 0.364333
## Age..years. 1.413e-02 9.963e-03 1.418 0.156091
## Concurrent.Credits 2.889e-01 1.358e-01 2.128 0.033358 *
## Type.of.apartment 1.986e-01 2.099e-01 0.947 0.343876
## No.of.Credits.at.this.Bank -2.106e-01 1.947e-01 -1.082 0.279304
## Occupation -1.395e-01 1.647e-01 -0.847 0.396954
## No.of.dependents -3.251e-02 2.953e-01 -0.110 0.912352
## Telephone 2.752e-02 2.319e-01 0.119 0.905512
## Foreign.Worker 1.422e+00 7.303e-01 1.947 0.051482 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 855.68 on 695 degrees of freedom
## Residual deviance: 662.84 on 675 degrees of freedom
## (4 observations deleted due to missingness)
## AIC: 704.84
##
## Number of Fisher Scoring iterations: 5
Variable yang memenuhi syarat adalah variable yang Pr(>|z|) < 0.05, dapat dilihat pada varible ditandai dengan bintang(*), variable dengan bintang 3 tertinggi dan seterusnya, inilah variable yang paling berpengaruh terhadap credibilitas calon pelamar kartu kredit. Variable tsb : 1.Account.Balance, 2.Value.Savings.Stocks, 3.PaymentStatusofPreviousCredit, 4.Guarantors, 5.Concurrent.Credits, 6.Duration.of.Credit..month., 7.Credit.Amount, 8.Instalment.per.cent, 9 Most.valuable.available.asset.
p <- predict(g,newdata = test,type = 'response')
p
## 1 2 5 7 12 14 15 16
## 0.4480585 0.6988018 0.8054350 0.9549841 0.6227000 0.4119008 0.7332512 0.7316658
## 17 18 21 28 30 34 35 39
## 0.7920488 0.7858833 0.7096159 0.8967881 0.8065819 0.8451232 0.5679898 0.9425123
## 40 43 49 59 61 66 70 73
## 0.8286703 0.8850726 0.8553557 0.9393030 0.9603511 0.9611316 0.9036502 0.8585993
## 74 80 81 85 87 88 90 100
## 0.8938785 0.9430186 0.8615112 0.9827145 0.8809530 0.7134968 0.6820674 0.5804640
## 101 105 110 113 114 118 121 128
## 0.9448379 0.9433931 0.9754525 0.8032591 0.8395282 0.8672191 0.2505256 0.9486575
## 132 134 135 138 143 144 151 154
## 0.8829956 0.8189968 0.9176032 0.3368888 0.8120713 0.9176812 0.8004879 0.5254585
## 160 162 164 165 168 180 188 190
## 0.8108743 0.9185904 0.9036981 0.6342048 0.9498946 0.6621885 0.9002068 0.6568461
## 193 206 212 213 214 218 233 235
## 0.5633732 0.7539346 0.3157711 0.3222304 0.9914753 0.9862435 0.7133940 0.6962054
## 237 242 244 247 248 255 258 260
## 0.8745409 0.6180612 0.9096124 0.8724918 0.8927228 0.8918965 0.9187561 0.9762455
## 261 263 266 268 272 273 274 275
## 0.7645460 0.6690436 0.9187660 0.8702150 0.8358881 0.9645511 0.9318171 0.8107016
## 276 279 285 290 292 293 296 297
## 0.8597686 0.3907878 0.2859524 0.8352958 0.8617651 0.9867702 0.6064236 0.8494980
## 302 304 305 307 309 311 315 317
## 0.4836165 0.9815538 0.8238208 0.6535495 0.2381739 0.9758142 0.6342840 0.6994537
## 319 320 321 323 329 331 332 335
## 0.9040890 0.8655907 0.9753759 0.9639497 0.9797624 0.9458828 0.8985156 0.9626685
## 338 339 343 345 356 367 371 375
## 0.7705008 0.9512725 0.9493933 0.9347038 0.9347357 0.8463855 0.4246215 0.8843299
## 376 388 391 403 404 406 407 409
## 0.3407028 0.8664730 0.8925306 0.6674036 0.6959921 0.5026051 0.8225427 0.7462604
## 410 411 412 413 418 427 430 431
## 0.8804361 0.9200484 0.9292966 0.9703809 0.9041324 0.4788300 0.8889998 0.7694928
## 433 435 439 441 442 443 447 451
## 0.8167271 0.9817318 0.4254067 0.8415466 0.8323225 0.3803890 0.9779115 0.8553234
## 460 464 468 470 471 474 481 483
## 0.9362657 0.4294087 0.9321294 0.8030533 0.7209578 0.9330662 0.8668095 0.9274128
## 487 491 493 502 505 506 508 510
## 0.8014977 0.8219913 0.2603003 0.9163324 0.7590638 0.9686365 0.9465301 0.7473819
## 511 513 520 522 523 529 532 533
## 0.9776566 0.5738503 0.7188756 0.3102771 0.8944334 0.8186900 0.9495529 0.8129993
## 534 535 536 538 540 542 547 548
## 0.6499562 0.2468896 0.5147862 0.9646870 0.2992004 0.5558303 0.6267217 0.9413573
## 550 551 554 562 565 568 571 572
## 0.9488647 0.6518085 0.9504710 0.8981890 0.8959490 0.9423341 0.7261085 0.8874197
## 574 577 578 586 589 590 592 593
## 0.4460011 0.3715800 0.5046899 0.5196787 0.5254556 0.5199312 0.7225611 0.2467751
## 599 602 604 616 618 623 632 635
## 0.7424600 0.9381587 0.6149636 0.5244432 0.4418115 0.7537739 0.5205073 0.2918439
## 643 649 650 658 666 670 675 676
## 0.5822497 0.6722629 0.6433782 0.8942376 0.7355943 0.8214372 0.6907769 0.9047724
## 681 682 683 685 692 694 695 699
## 0.5240486 0.8167156 0.5945117 0.4817062 0.5765099 0.9118735 0.8723524 0.5064005
## 700 703 705 707 709 711 712 713
## 0.7278153 0.6088749 0.9245598 0.8850199 0.2567157 0.8784832 0.5251983 0.9394899
## 716 717 721 727 730 735 744 752
## 0.4877333 0.2446791 0.5515088 0.8395063 0.5301700 0.4631305 0.9714969 0.5051386
## 753 754 757 761 762 768 769 772
## 0.9047377 0.4672225 0.7081430 0.3621208 0.4853176 0.8529441 0.7009372 0.4701511
## 777 782 783 784 785 786 788 790
## 0.5499902 0.5315023 0.3361626 0.4644443 0.2802226 0.8304998 0.8746406 0.5343765
## 793 797 800 802 803 804 805 806
## 0.1747835 0.8365235 0.6115170 0.4891824 0.4948186 0.7338956 0.6900319 0.2407778
## 808 811 812 824 825 830 834 835
## 0.3439556 0.2372710 0.1884575 0.3879015 0.3612473 0.3646078 0.5257835 0.8803022
## 836 837 839 840 842 843 849 852
## 0.2812730 0.6755285 0.6307495 0.9123262 0.8103388 0.8538584 0.5696104 0.2632647
## 853 854 857 864 869 871 874 875
## 0.7088311 0.8864554 0.3260370 0.8873999 0.2326770 0.3348650 0.6780053 0.3613330
## 880 890 894 896 899 909 915 926
## 0.5657093 0.2674830 0.3002287 0.2681009 0.5166692 0.4549754 0.1061541 0.4947740
## 927 935 938 942 945 951 952 953
## 0.1855616 0.9600733 0.8038207 0.6057880 0.7226603 0.3111426 0.9154348 0.7658870
## 954 957 959 960 962 965 969 973
## 0.6477980 0.6283691 0.2236815 0.5547219 0.6682290 0.6959884 0.2510308 0.8006526
## 980 990 996 1000
## 0.5991674 0.4661332 0.4846720 0.5485085
p1 <- predict(g,test[1,],type = 'response')
# test[1,], artinya data observasi test baris 1
p1
## 1
## 0.4480585
p2 <- predict(g,test[2,],type = 'response')
# test[2,], artinya data observasi test baris 2
p2
## 2
## 0.6988018