STEP 3: TRAINING A MODEL ON THE DATA.
- Fitting a regression tree with the rpart package.
library(rpart)
package ‘rpart’ was built under R version 3.3.3
m.rpart <- rpart(default ~ ., data = credit_train)
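The printout that follows reports a deviance and a mean (yval) per node, which indicates that rpart treated default as a numeric outcome and grew a regression ("anova") tree rather than a classification tree. A minimal sketch of how rpart picks the method from the response type, using the built-in mtcars data as a stand-in because credit_train is not rebuilt here (the column `am` only plays the role of `default`; that mapping is an assumption for illustration):

```r
library(rpart)

# Stand-in data: mtcars ships with R; `am` is a numeric 0/1 column.
fit_numeric <- rpart(am ~ mpg + hp + wt, data = mtcars)

# Same outcome recoded as a factor: rpart switches to a classification tree.
cars <- transform(mtcars, am = factor(am, labels = c("automatic", "manual")))
fit_factor <- rpart(am ~ mpg + hp + wt, data = cars)

fit_numeric$method  # "anova": regression tree, each node reports a mean
fit_factor$method   # "class": classification tree, each node reports a class
```

If a classification tree is what is wanted for the credit data, converting default to a factor before fitting (or passing method = "class") would be the usual route.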
- Getting basic information about the tree.
m.rpart
n= 900
node), split, n, deviance, yval
* denotes terminal node
1) root 900 187.790000 0.29666670
  2) checking_balance=> 200 DM,unknown 414 46.956520 0.13043480
    4) other_credit=none 342 29.815790 0.09649123 *
    5) other_credit=bank,store 72 14.875000 0.29166670
      10) purpose=car0,furniture/appliances 35 3.542857 0.11428570 *
      11) purpose=business,car,education 37 9.189189 0.45945950 *
  3) checking_balance=< 0 DM,1 - 200 DM 486 119.648100 0.43827160
    6) months_loan_duration< 22.5 271 60.767530 0.33948340
      12) credit_history=critical,good,poor 248 52.318550 0.30241940
        24) amount>=1373 142 24.788730 0.22535210 *
        25) amount< 1373 106 25.556600 0.40566040
          50) months_loan_duration< 8.5 22 1.818182 0.09090909 *
          51) months_loan_duration>=8.5 84 20.988100 0.48809520
            102) existing_loans_count>=1.5 23 3.913043 0.21739130 *
            103) existing_loans_count< 1.5 61 14.754100 0.59016390 *
      13) credit_history=perfect,very good 23 4.434783 0.73913040 *
    7) months_loan_duration>=22.5 215 52.902330 0.56279070
      14) savings_balance=> 1000 DM,unknown 38 7.815789 0.28947370
        28) checking_balance=1 - 200 DM 23 1.826087 0.08695652 *
        29) checking_balance=< 0 DM 15 3.600000 0.60000000 *
      15) savings_balance=< 100 DM,100 - 500 DM,500 - 1000 DM 177 41.638420 0.62146890
        30) months_loan_duration< 47.5 144 35.305560 0.56944440 *
        31) months_loan_duration>=47.5 33 4.242424 0.84848480 *
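The columns of this printout can be checked by hand. For a 0/1 outcome, yval is the share of defaults in the node, and deviance is the sum of squared deviations from that mean, which reduces to n * p * (1 - p). The sketch below uses only numbers copied from the printout; the helper name `node_deviance` is made up for illustration.

```r
# Deviance of a node holding a 0/1 outcome:
# sum((y - mean(y))^2) simplifies to n * p * (1 - p)
node_deviance <- function(n, p) n * p * (1 - p)

# Root node: n = 900, yval = 0.29666670
root_dev <- node_deviance(900, 0.29666670)

# Children of the first split (nodes 2 and 3)
left_dev  <- node_deviance(414, 0.13043480)
right_dev <- node_deviance(486, 0.43827160)

# Share of the root deviance removed by splitting on checking_balance
rel_improve <- (root_dev - left_dev - right_dev) / root_dev

round(root_dev, 2)     # 187.79, matching the printed root deviance
round(rel_improve, 4)  # 0.1128, the largest CP value reported by summary()
```

Read this way, the first split on checking_balance removes about 11% of the total squared error, and every later split removes considerably less.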
- Getting more detailed information about the tree.
summary(m.rpart)
Call:
rpart(formula = default ~ ., data = credit_train)
n= 900
           CP nsplit rel error    xerror       xstd
1  0.11281394      0 1.0000000 1.0032213 0.02977643
2  0.03183500      1 0.8871861 0.8918257 0.02998993
3  0.02137599      2 0.8553511 0.8789389 0.03302312
4  0.01836156      3 0.8339751 0.8764019 0.03486897
5  0.01272540      4 0.8156135 0.8855730 0.03632223
6  0.01257665      5 0.8028881 0.8982187 0.03862359
7  0.01235930      7 0.7777348 0.9026306 0.03895995
8  0.01206524      8 0.7653755 0.9000972 0.03909697
9  0.01141144      9 0.7533103 0.9039595 0.03986084
10 0.01113179     10 0.7418988 0.9080625 0.04005093
11 0.01000000     11 0.7307670 0.9018478 0.03991037
Variable importance
    checking_balance months_loan_duration       credit_history               amount
                  32                   17                   13                    9
     savings_balance              purpose existing_loans_count         other_credit
                   9                    4                    4                    3
                 age  employment_duration              housing                  job
                   2                    2                    2                    1
   percent_of_income
                   1
Node number 1: 900 observations, complexity param=0.1128139
mean=0.2966667, MSE=0.2086556
left son=2 (414 obs) right son=3 (486 obs)
Primary splits:
checking_balance splits as RLRL, improve=0.11281390, (0 missing)
credit_history splits as LLRLR, improve=0.04102733, (0 missing)
months_loan_duration < 34.5 to the left, improve=0.03257729, (0 missing)
savings_balance splits as RLRLL, improve=0.03127468, (0 missing)
amount < 3913.5 to the left, improve=0.02747942, (0 missing)
Surrogate splits:
savings_balance splits as RLRLL, agree=0.606, adj=0.143, (0 split)
credit_history splits as LRRRR, agree=0.602, adj=0.135, (0 split)
existing_loans_count < 1.5 to the right, agree=0.560, adj=0.043, (0 split)
age < 30.5 to the right, agree=0.556, adj=0.034, (0 split)
months_loan_duration < 10.5 to the left, agree=0.554, adj=0.031, (0 split)
Node number 2: 414 observations, complexity param=0.01206524
mean=0.1304348, MSE=0.1134216
left son=4 (342 obs) right son=5 (72 obs)
Primary splits:
other_credit splits as RLR, improve=0.04825171, (0 missing)
purpose splits as RLLRLR, improve=0.02267511, (0 missing)
employment_duration splits as RLRLR, improve=0.02193416, (0 missing)
credit_history splits as LLRRR, improve=0.01971601, (0 missing)
amount < 4158 to the left, improve=0.01727887, (0 missing)
Surrogate splits:
credit_history splits as LLLLR, agree=0.838, adj=0.069, (0 split)
months_loan_duration < 45 to the left, agree=0.833, adj=0.042, (0 split)
purpose splits as LLRLLL, agree=0.829, adj=0.014, (0 split)
Node number 3: 486 observations, complexity param=0.031835
mean=0.4382716, MSE=0.2461896
left son=6 (271 obs) right son=7 (215 obs)
Primary splits:
months_loan_duration < 22.5 to the left, improve=0.04996563, (0 missing)
credit_history splits as LLRLR, improve=0.03435761, (0 missing)
savings_balance splits as RLRLL, improve=0.02810707, (0 missing)
amount < 10841.5 to the left, improve=0.02392460, (0 missing)
housing splits as RLR, improve=0.01653865, (0 missing)
Surrogate splits:
amount < 2805.5 to the left, agree=0.749, adj=0.433, (0 split)
credit_history splits as LLRRR, agree=0.615, adj=0.130, (0 split)
housing splits as RLL, agree=0.609, adj=0.116, (0 split)
purpose splits as RLRLLL, agree=0.605, adj=0.107, (0 split)
job splits as RLLL, agree=0.595, adj=0.084, (0 split)
Node number 4: 342 observations
mean=0.09649123, MSE=0.08718067
Node number 5: 72 observations, complexity param=0.01141144
mean=0.2916667, MSE=0.2065972
left son=10 (35 obs) right son=11 (37 obs)
Primary splits:
purpose splits as RRLRL-, improve=0.14406410, (0 missing)
employment_duration splits as LLRLR, improve=0.09178217, (0 missing)
age < 44.5 to the right, improve=0.06448474, (0 missing)
months_loan_duration < 16.5 to the left, improve=0.05196594, (0 missing)
credit_history splits as RLRLL, improve=0.03781513, (0 missing)
Surrogate splits:
employment_duration splits as LLRRR, agree=0.653, adj=0.286, (0 split)
months_loan_duration < 27.5 to the left, agree=0.639, adj=0.257, (0 split)
credit_history splits as RLRRR, agree=0.625, adj=0.229, (0 split)
amount < 3588.5 to the left, agree=0.625, adj=0.229, (0 split)
percent_of_income < 3.5 to the right, agree=0.597, adj=0.171, (0 split)
Node number 6: 271 observations, complexity param=0.02137599
mean=0.3394834, MSE=0.2242344
left son=12 (248 obs) right son=13 (23 obs)
Primary splits:
credit_history splits as LLRLR, improve=0.06605825, (0 missing)
purpose splits as LRLRLR, improve=0.02908942, (0 missing)
amount < 1373 to the right, improve=0.02642861, (0 missing)
months_loan_duration < 8.5 to the left, improve=0.02557036, (0 missing)
employment_duration splits as RRRLR, improve=0.02148802, (0 missing)
Node number 7: 215 observations, complexity param=0.01836156
mean=0.5627907, MSE=0.2460573
left son=14 (38 obs) right son=15 (177 obs)
Primary splits:
savings_balance splits as RLRRL, improve=0.06517895, (0 missing)
percent_of_income < 2.5 to the left, improve=0.03332819, (0 missing)
amount < 1381.5 to the right, improve=0.02814606, (0 missing)
years_at_residence < 1.5 to the left, improve=0.02133182, (0 missing)
months_loan_duration < 43.5 to the left, improve=0.01816712, (0 missing)
Node number 10: 35 observations
mean=0.1142857, MSE=0.1012245
Node number 11: 37 observations
mean=0.4594595, MSE=0.2483565
Node number 12: 248 observations, complexity param=0.01257665
mean=0.3024194, MSE=0.2109619
left son=24 (142 obs) right son=25 (106 obs)
Primary splits:
amount < 1373 to the right, improve=0.03771535, (0 missing)
credit_history splits as LR-R-, improve=0.03542751, (0 missing)
employment_duration splits as RRRLR, improve=0.03057811, (0 missing)
months_loan_duration < 8.5 to the left, improve=0.02571194, (0 missing)
purpose splits as LRLRLL, improve=0.02087838, (0 missing)
Surrogate splits:
percent_of_income < 3.5 to the left, agree=0.645, adj=0.170, (0 split)
job splits as LLLR, agree=0.637, adj=0.151, (0 split)
months_loan_duration < 9.5 to the right, agree=0.625, adj=0.123, (0 split)
age < 23.5 to the right, agree=0.609, adj=0.085, (0 split)
purpose splits as LLLRLL, agree=0.605, adj=0.075, (0 split)
Node number 13: 23 observations
mean=0.7391304, MSE=0.1928166
Node number 14: 38 observations, complexity param=0.0127254
mean=0.2894737, MSE=0.2056787
left son=28 (23 obs) right son=29 (15 obs)
Primary splits:
checking_balance splits as R-L-, improve=0.30575320, (0 missing)
amount < 1840 to the right, improve=0.19812560, (0 missing)
credit_history splits as LRLLR, improve=0.18803420, (0 missing)
purpose splits as LRRLL-, improve=0.09955595, (0 missing)
months_loan_duration < 45 to the right, improve=0.04800237, (0 missing)
Surrogate splits:
amount < 1548 to the right, agree=0.763, adj=0.400, (0 split)
employment_duration splits as LRLLL, agree=0.711, adj=0.267, (0 split)
age < 48.5 to the left, agree=0.684, adj=0.200, (0 split)
housing splits as LLR, agree=0.684, adj=0.200, (0 split)
years_at_residence < 2.5 to the left, agree=0.658, adj=0.133, (0 split)
Node number 15: 177 observations, complexity param=0.01113179
mean=0.6214689, MSE=0.2352453
left son=30 (144 obs) right son=31 (33 obs)
Primary splits:
months_loan_duration < 47.5 to the left, improve=0.05020456, (0 missing)
percent_of_income < 2.5 to the left, improve=0.03933868, (0 missing)
years_at_residence < 1.5 to the left, improve=0.02880871, (0 missing)
amount < 11788 to the left, improve=0.02694030, (0 missing)
employment_duration splits as RRRLL, improve=0.02014564, (0 missing)
Surrogate splits:
amount < 13319.5 to the left, agree=0.831, adj=0.091, (0 split)
Node number 24: 142 observations
mean=0.2253521, MSE=0.1745685
Node number 25: 106 observations, complexity param=0.01257665
mean=0.4056604, MSE=0.2411
left son=50 (22 obs) right son=51 (84 obs)
Primary splits:
months_loan_duration < 8.5 to the left, improve=0.10761710, (0 missing)
existing_loans_count < 1.5 to the right, improve=0.10237620, (0 missing)
amount < 632 to the left, improve=0.07050801, (0 missing)
purpose splits as LRLRLL, improve=0.06581905, (0 missing)
age < 37.5 to the right, improve=0.06090808, (0 missing)
Surrogate splits:
amount < 456 to the left, agree=0.821, adj=0.136, (0 split)
credit_history splits as RR-L-, agree=0.802, adj=0.045, (0 split)
purpose splits as RRLRRR, agree=0.802, adj=0.045, (0 split)
job splits as LRRR, agree=0.802, adj=0.045, (0 split)
Node number 28: 23 observations
mean=0.08695652, MSE=0.07939509
Node number 29: 15 observations
mean=0.6, MSE=0.24
Node number 30: 144 observations
mean=0.5694444, MSE=0.2451775
Node number 31: 33 observations
mean=0.8484848, MSE=0.1285583
Node number 50: 22 observations
mean=0.09090909, MSE=0.08264463
Node number 51: 84 observations, complexity param=0.0123593
mean=0.4880952, MSE=0.2498583
left son=102 (23 obs) right son=103 (61 obs)
Primary splits:
existing_loans_count < 1.5 to the right, improve=0.11058430, (0 missing)
years_at_residence < 3.5 to the right, improve=0.10240430, (0 missing)
purpose splits as LR-RLL, improve=0.08257983, (0 missing)
job splits as LRRL, improve=0.07189450, (0 missing)
amount < 653 to the left, improve=0.05554229, (0 missing)
Surrogate splits:
credit_history splits as LR-L-, agree=0.869, adj=0.522, (0 split)
employment_duration splits as RLRRR, agree=0.762, adj=0.130, (0 split)
purpose splits as RR-RRL, agree=0.750, adj=0.087, (0 split)
age < 66.5 to the right, agree=0.750, adj=0.087, (0 split)
housing splits as LRR, agree=0.750, adj=0.087, (0 split)
Node number 102: 23 observations
mean=0.2173913, MSE=0.1701323
Node number 103: 61 observations
mean=0.5901639, MSE=0.2418705
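The CP table at the top of this summary is what one would normally consult to decide how far to prune. As a sketch, the table can be retyped from the output and two common rules applied: take the row with the lowest cross-validated error (xerror), or take the smallest tree whose xerror lies within one standard error (xstd) of that minimum. The data frame `cp_table` is transcribed by hand here; on a live model the same columns are available in m.rpart$cptable.

```r
# CP table transcribed from the summary(m.rpart) output
cp_table <- data.frame(
  CP     = c(0.11281394, 0.03183500, 0.02137599, 0.01836156, 0.01272540,
             0.01257665, 0.01235930, 0.01206524, 0.01141144, 0.01113179,
             0.01000000),
  nsplit = c(0, 1, 2, 3, 4, 5, 7, 8, 9, 10, 11),
  xerror = c(1.0032213, 0.8918257, 0.8789389, 0.8764019, 0.8855730,
             0.8982187, 0.9026306, 0.9000972, 0.9039595, 0.9080625,
             0.9018478),
  xstd   = c(0.02977643, 0.02998993, 0.03302312, 0.03486897, 0.03632223,
             0.03862359, 0.03895995, 0.03909697, 0.03986084, 0.04005093,
             0.03991037)
)

# Rule 1: row with the lowest cross-validated error
best <- which.min(cp_table$xerror)

# Rule 2 (one-standard-error rule): the first, i.e. smallest, tree whose
# xerror does not exceed min(xerror) plus the xstd of that best row
threshold <- cp_table$xerror[best] + cp_table$xstd[best]
one_se <- which(cp_table$xerror <= threshold)[1]

cp_table[c(best, one_se), ]

# With the fitted model in the session, pruning would look roughly like:
# pruned <- rpart::prune(m.rpart, cp = m.rpart$cptable[one_se, "CP"])
```

On these numbers the minimum xerror sits at 3 splits and the one-standard-error rule settles on a single split, which suggests the 11-split tree is larger than cross-validation supports. Because xerror comes from random cross-validation folds, the exact rows chosen can shift between runs.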
- Using the rpart.plot package to create a visualization.
library(rpart.plot)
package ‘rpart.plot’ was built under R version 3.3.3
- Obtaining a basic decision tree diagram.
rpart.plot(m.rpart, digits = 3)

- Making adjustments to increase visibility.
rpart.plot(m.rpart, digits = 4, fallen.leaves = TRUE, type = 3, extra = 101)
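For reference, the arguments in this call each control one aspect of the drawing. The sketch below repeats the call on a stand-in regression tree fitted to the built-in mtcars data (m.rpart itself is not rebuilt here), with a comment per argument; the guard around the plotting call is only there so the snippet still runs where rpart.plot is not installed.

```r
library(rpart)

# Stand-in model on built-in data; an assumption for illustration only.
fit <- rpart(mpg ~ wt + hp + cyl, data = mtcars)

if (requireNamespace("rpart.plot", quietly = TRUE)) {
  rpart.plot::rpart.plot(
    fit,
    digits = 4,            # significant digits shown in the node labels
    fallen.leaves = TRUE,  # line up all leaf nodes at the bottom of the plot
    type = 3,              # separate split labels on the left and right branches
    extra = 101            # 1 = number of observations per node, +100 = percentage
  )
}
```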
