Outline
Supplemental Analysis + Description of the German credit dataset + Loading the dataset and installing the necessary packages + Utilization of Random Forest to determine Variable Importance + A Classification Tree for the German credit dataset
Random Forest (Requirement #1) + Building the Random Forest Model + The Confusion Matrix for the Random Forest Model + The Error Rate for the Random Forest Model + The Overall Benefit/Cost of the Random Forest Model + A Visualization of (1) Random Forest Model
Support Vector Machine (SVM) (Requirement #2) +
Description of the German credit dataset.
1. Title: German Credit data
2. Source Information
Professor Dr. Hans Hofmann
Institut für Statistik und Ökonometrie
Universität Hamburg
FB Wirtschaftswissenschaften
Von-Melle-Park 5
2000 Hamburg 13
3. Number of Instances: 1000
4. Two datasets are provided. The original dataset, in the form provided by Prof. Hofmann, contains categorical/symbolic attributes and is in the file “german.data”.
5. For algorithms that need numerical attributes, Strathclyde University produced the file “german.data-numeric”.
This file has been edited and several indicator variables added to make it suitable for algorithms which cannot cope with categorical variables. Several attributes that are ordered categorical (such as attribute 17) have been coded as integer. This was the form used by StatLog.
6. Number of Attributes
german: 20 (7 numerical, 13 categorical)
german.numer: 24 (24 numerical)
7. Attribute description for german
Attribute 2: (numerical) Duration in month
Attribute 5: (numerical) Credit amount
Attribute 8: (numerical) Installment rate in percentage of disposable income
Attribute 11: (numerical) Present residence since
Attribute 13: (numerical) Age in years
Attribute 16: (numerical) Number of existing credits at this bank
Attribute 18: (numerical) Number of people being liable to provide maintenance for
8. Cost Matrix
This dataset requires use of a cost matrix (see below)
costMatrix <- matrix(c(0,5,1,0),ncol=2)
colnames(costMatrix) <- c('Predict Good','Predict Bad')
rownames(costMatrix) <- c('Actual Good','Actual Bad')
Actual class | Predict Good | Predict Bad |
---|---|---|
Actual Good | 0 | 1 |
Actual Bad | 5 | 0 |
(1 = Good, 2 = Bad)
The rows represent the actual classification and the columns the predicted classification.
**It is worse to class a customer as good when they are bad (5), than it is to class a customer as bad when they are good (1).**
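Because `matrix()` fills its cells column by column, it is easy to misread which cell holds the cost of 5. The following is a minimal sketch in base R that rebuilds the cost matrix from the code above and looks the two penalties up by name; the variable names `costBadAsGood` and `costGoodAsBad` are introduced here for illustration only.

```r
# Rebuild the cost matrix exactly as in the text.
# matrix() fills column by column: column 1 = (0, 5), column 2 = (1, 0).
costMatrix <- matrix(c(0, 5, 1, 0), ncol = 2)
colnames(costMatrix) <- c('Predict Good', 'Predict Bad')
rownames(costMatrix) <- c('Actual Good', 'Actual Bad')

# Cost of accepting a customer who turns out to be bad
costBadAsGood <- costMatrix['Actual Bad', 'Predict Good']
# Cost of rejecting a customer who would have been good
costGoodAsBad <- costMatrix['Actual Good', 'Predict Bad']

print(costMatrix)
print(c(badAsGood = costBadAsGood, goodAsBad = costGoodAsBad))
```

Looking cells up by row and column name, rather than by position, keeps the orientation explicit wherever the matrix is reused.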
Load the German Credit dataset and the necessary packages
## Warning: package 'randomForest' was built under R version 3.1.1
## randomForest 4.6-10
## Type rfNews() to see new features/changes/bug fixes.
## Warning: package 'rattle' was built under R version 3.1.1
## Rattle: A free graphical interface for data mining with R.
## Version 3.1.0 Copyright (c) 2006-2014 Togaware Pty Ltd.
## Type 'rattle()' to shake, rattle, and roll your data.
## Warning: package 'DMwR' was built under R version 3.1.1
## Loading required package: lattice
## Loading required package: grid
## KernSmooth 2.23 loaded
## Copyright M. P. Wand 1997-2009
## Warning: package 'e1071' was built under R version 3.1.1
## Warning: package 'performanceEstimation' was built under R version 3.1.1
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 3.1.1
Utilization of Random Forest to determine Variable Importance
Variable | MeanDecreaseAccuracy | MeanDecreaseGini |
---|---|---|
chk_acct | 34.197 | 45.432 |
duration | 19.263 | 39.800 |
history | 14.765 | 25.718 |
purpose | 8.926 | 36.170 |
amount | 13.196 | 52.916 |
sav_acct | 9.393 | 21.216 |
employment | 5.846 | 24.054 |
install_rate | 5.000 | 16.830 |
pstatus | 3.392 | 15.128 |
other_debtor | 10.570 | 7.390 |
time_resid | 4.704 | 15.836 |
property | 6.819 | 19.271 |
age | 7.951 | 39.791 |
other_install | 7.370 | 11.201 |
housing | 5.533 | 9.647 |
other_credits | 5.061 | 9.178 |
job | 3.257 | 12.381 |
num_depend | 3.135 | 5.771 |
telephone | 4.249 | 6.062 |
foreign | 0.509 | 1.514 |
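The code that produced this table is not shown in the report. As a rough sketch, a table with these two columns can be obtained from the randomForest package by fitting with `importance = TRUE` and calling `importance()`. The example below uses the built-in `iris` data as a stand-in, since `germanCredit.Rdata` is not bundled with R; for the credit data the formula would presumably be `response ~ .` on `german`, which is an assumption. The names `fit`, `imp` and `impTable` are illustrative.

```r
# Sketch only: iris stands in for the German credit data.
# The guard skips the example if randomForest is not installed.
if (requireNamespace("randomForest", quietly = TRUE)) {
  set.seed(1234)
  # importance = TRUE is needed for the permutation-based measure
  fit <- randomForest::randomForest(Species ~ ., data = iris,
                                    ntree = 500, importance = TRUE)
  imp <- randomForest::importance(fit)
  # Keep the two summary columns shown in the table above
  impTable <- imp[, c("MeanDecreaseAccuracy", "MeanDecreaseGini")]
  # Sort so the most important predictors come first
  print(round(impTable[order(-impTable[, "MeanDecreaseAccuracy"]), ], 3))
} else {
  message("randomForest is not installed; skipping the example")
}
```

The two measures need not agree on the ranking: in the table above, chk_acct leads on MeanDecreaseAccuracy while amount leads on MeanDecreaseGini.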
A Classification Tree for the German credit dataset
Utilizing the ‘rpart’ and ‘rattle’ packages
## Loading required package: rpart.plot
## Loading required package: rpart
## Loading required package: RColorBrewer
## Warning: package 'RColorBrewer' was built under R version 3.1.1
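The tree-building code itself is not shown in the report. A minimal sketch of fitting a classification tree with rpart follows, using the `kyphosis` data that ships with rpart as a stand-in for the credit data. For the German credit data the call would presumably be `rpart(response ~ ., data = german, method = "class")`, which is an assumption; `treeFit` is an illustrative name.

```r
# Sketch only: kyphosis (shipped with rpart) stands in for the German
# credit data. The guard skips the example if rpart is not installed.
if (requireNamespace("rpart", quietly = TRUE)) {
  data(kyphosis, package = "rpart")
  # method = "class" requests a classification tree
  treeFit <- rpart::rpart(Kyphosis ~ Age + Number + Start,
                          data = kyphosis, method = "class")
  print(treeFit)
  # With rattle and rpart.plot installed, the tree can be drawn with:
  # rattle::fancyRpartPlot(treeFit)
} else {
  message("rpart is not installed; skipping the example")
}
```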
Building the Random Forest Model
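library(randomForest)
load('germanCredit.Rdata')   # provides the data frame 'german'
set.seed(1234)               # make the split and the forest reproducible
trPerc <- 0.7                # proportion of rows used for training
# Random 70/30 split into training (tr) and test (ts) sets
sp <- sample(1:nrow(german), as.integer(trPerc * nrow(german)))
tr <- german[sp, ]
ts <- german[-sp, ]
# Fit a forest of 3000 trees on the training set, then predict the test set
m <- randomForest(response ~ ., tr, ntree = 3000)
ps <- predict(m, ts)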
Predicting with the Random Forest Model
## 4 5 7 8 11 15 16 20 24 28 29 33 34 38 41
## good bad good good bad bad bad good good good good good good good good
## 42 44 46 47 48 51 52 53 56 57 58 60 62 64 66
## good good good good good good good good good good good bad good bad good
## 73 76 77 79 80 87 88 93 98 100 103 107 108 112 118
## good good bad good good good good good good good good good good good good
## 121 124 125 129 134 136 137 138 140 142 143 144 147 150 153
## good good good good good good good good good bad good good good good bad
## 159 161 164 168 173 177 179 187 188 191 203 205 207 208 212
## good good good good good good good good good good good good good good good
## 213 216 218 219 226 227 229 232 234 236 237 239 241 251 253
## good good bad bad bad good good good good good good good bad good bad
## 256 259 260 262 264 265 267 286 287 290 301 303 314 318 319
## good good good good good good good bad bad bad good good good good good
## 325 326 328 329 334 336 337 338 340 351 358 359 364 365 370
## good good good good good good good good bad good good good good good good
## 372 383 385 387 404 406 407 408 410 424 425 428 429 432 437
## good good good good good good good good good good good good good bad good
## 441 444 445 455 456 460 465 468 469 482 486 495 497 498 499
## good good bad good good good good good good bad good good bad good good
## 505 511 517 519 521 523 525 540 553 556 557 560 561 562 564
## bad good good good good bad good good good good bad good good good good
## 567 568 570 571 572 574 575 577 585 588 594 598 600 605 608
## good good bad bad good bad good good good good bad good good good good
## 610 619 625 628 630 632 633 639 644 648 650 651 654 658 659
## good good good good good bad good good good good bad bad good good bad
## 664 666 670 671 675 676 678 683 684 689 692 696 697 701 704
## good good good good good good bad good good good good good good good good
## 706 710 712 714 723 725 729 730 731 737 739 740 741 744 745
## good good bad good bad good bad good good bad good bad good good bad
## 747 750 752 756 759 760 763 767 775 782 790 793 795 797 802
## good good bad bad good good good good good good bad good good good good
## 816 822 826 829 831 839 841 846 849 851 855 859 863 864 866
## bad good bad good good good bad good good good good good good good good
## 870 878 879 880 882 884 885 889 891 893 894 901 908 910 915
## bad good good good good good good good good good good good good good bad
## 922 923 924 925 926 928 929 933 934 936 937 938 939 940 941
## good good good good bad bad good good good good good good bad good good
## 945 947 956 962 968 971 976 978 980 981 984 993 994 995 1000
## good good good good good good good good bad good good good bad good good
## Levels: bad good
The Confusion Matrix for the Random Forest Model
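# table() lists factor levels alphabetically (bad, good), so put the actual
# class in the rows and reorder both dimensions to match costMatrix
confuseRF <- table(ts$response, ps)[c('good','bad'), c('good','bad')]
colnames(confuseRF) <- c('Predict Good','Predict Bad')
rownames(confuseRF) <- c('Actual Good','Actual Bad')
Actual class | Predict Good | Predict Bad |
---|---|---|
Actual Good | 183 | 18 |
Actual Bad | 64 | 35 |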
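`table()` places its first argument in the rows and its second in the columns, and it orders factor levels alphabetically, so "bad" comes before "good". This matters when the result is lined up cell by cell with the cost matrix. The toy sketch below uses seven made-up labels (not taken from the credit data) to show the raw layout and a layout with the actual class in rows and "good" first.

```r
# Made-up labels for seven customers (illustration only)
actual    <- factor(c('good', 'good', 'good', 'bad', 'bad', 'good', 'good'),
                    levels = c('bad', 'good'))
predicted <- factor(c('good', 'bad', 'good', 'good', 'bad', 'good', 'bad'),
                    levels = c('bad', 'good'))

# First argument -> rows, second -> columns; levels appear as bad, good
raw <- table(Predicted = predicted, Actual = actual)
print(raw)

# Actual class in rows, predicted class in columns, 'good' first,
# which is the layout of the cost matrix
oriented <- table(Actual = actual,
                  Predicted = predicted)[c('good', 'bad'), c('good', 'bad')]
print(oriented)
```

Naming the dimensions (`Actual`, `Predicted`) in the call makes the printed table self-describing, which helps catch swapped labels early.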
The Error Rate of the Random Forest Model
The error rate is the percentage of test-set predictions that are incorrect.
err <- 100 * (1 - sum(diag(confuseRF))/sum(confuseRF))
err
## [1] 27.33
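As a check on the figure above, the error rate can be recomputed by hand from the four counts in the Random Forest confusion matrix: 35 and 183 lie on the diagonal (prediction matches the actual class) and 18 and 64 lie off it. The variable names in this short sketch are illustrative.

```r
# Counts from the Random Forest confusion matrix
correct   <- 35 + 183   # diagonal cells: prediction equals actual class
incorrect <- 18 + 64    # off-diagonal cells: prediction differs
total     <- correct + incorrect

errPct <- 100 * incorrect / total
accPct <- 100 * correct / total
print(round(c(error = errPct, accuracy = accPct), 2))
```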
The Overall Benefit/Cost of the Random Forest Model
Calculated by multiplying each cell of the Confusion Matrix by the corresponding cell of the Cost/Benefit Matrix and summing the products; a lower total is better.
utilityRF <- sum(confuseRF*costMatrix)
utilityRF
## [1] 338
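The total of 338 comes from only two cells, because correct predictions carry no cost. The sketch below reproduces it from the misclassification counts (64 bad customers predicted good, 18 good customers predicted bad) and also shows how much of the cost each error type contributes; the variable names are illustrative.

```r
# Misclassification counts from the Random Forest confusion matrix
badPredictedGood <- 64   # costly error, weight 5
goodPredictedBad <- 18   # cheaper error, weight 1
nTest            <- 300  # size of the test set

costFromBadAsGood <- 5 * badPredictedGood
costFromGoodAsBad <- 1 * goodPredictedBad
totalCost         <- costFromBadAsGood + costFromGoodAsBad

print(c(total = totalCost,
        perCustomer = round(totalCost / nTest, 3),
        shareBadAsGoodPct = round(100 * costFromBadAsGood / totalCost, 1)))
```

Almost all of the cost stems from bad customers classified as good, so a model with the same error rate but fewer errors of that kind would score noticeably better on this measure.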
A Visualization of One (1) Tree from the Random Forest Model
# Extract the first tree of the forest as a table with one row per node
tree <- getTree(m,1,labelVar=TRUE)
left daughter | right daughter | split var | split point | status | prediction |
---|---|---|---|---|---|
2 | 3 | property | 3.0 | 1 | NA |
4 | 5 | pstatus | 3.0 | 1 | NA |
6 | 7 | age | 29.5 | 1 | NA |
8 | 9 | chk_acct | 1.0 | 1 | NA |
10 | 11 | sav_acct | 12.0 | 1 | NA |
12 | 13 | employment | 15.0 | 1 | NA |
14 | 15 | chk_acct | 3.0 | 1 | NA |
16 | 17 | purpose | 42.0 | 1 | NA |
18 | 19 | sav_acct | 12.0 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | good |
20 | 21 | foreign | 1.0 | 1 | NA |
22 | 23 | sav_acct | 1.0 | 1 | NA |
24 | 25 | pstatus | 3.0 | 1 | NA |
26 | 27 | sav_acct | 9.0 | 1 | NA |
28 | 29 | sav_acct | 8.0 | 1 | NA |
30 | 31 | amount | 2609.0 | 1 | NA |
32 | 33 | age | 64.0 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | good |
34 | 35 | age | 25.5 | 1 | NA |
36 | 37 | time_resid | 2.5 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | good |
38 | 39 | job | 6.0 | 1 | NA |
40 | 41 | other_credits | 3.5 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | good |
42 | 43 | history | 1.0 | 1 | NA |
44 | 45 | telephone | 1.0 | 1 | NA |
46 | 47 | other_install | 1.0 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | bad |
48 | 49 | amount | 9492.5 | 1 | NA |
50 | 51 | employment | 9.0 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | good |
52 | 53 | amount | 1237.0 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | good |
54 | 55 | duration | 13.5 | 1 | NA |
56 | 57 | chk_acct | 2.0 | 1 | NA |
58 | 59 | amount | 6201.0 | 1 | NA |
60 | 61 | amount | 969.5 | 1 | NA |
62 | 63 | pstatus | 5.0 | 1 | NA |
64 | 65 | pstatus | 4.0 | 1 | NA |
66 | 67 | duration | 22.5 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | bad |
0 | 0 | NA | 0.0 | -1 | bad |
0 | 0 | NA | 0.0 | -1 | good |
68 | 69 | property | 4.0 | 1 | NA |
70 | 71 | history | 13.0 | 1 | NA |
72 | 73 | history | 8.0 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | good |
74 | 75 | purpose | 17.0 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | bad |
0 | 0 | NA | 0.0 | -1 | bad |
76 | 77 | property | 1.0 | 1 | NA |
78 | 79 | amount | 997.5 | 1 | NA |
80 | 81 | amount | 2272.0 | 1 | NA |
82 | 83 | job | 2.0 | 1 | NA |
84 | 85 | purpose | 73.0 | 1 | NA |
86 | 87 | purpose | 5.0 | 1 | NA |
88 | 89 | history | 1.0 | 1 | NA |
90 | 91 | sav_acct | 1.0 | 1 | NA |
92 | 93 | num_depend | 1.5 | 1 | NA |
94 | 95 | other_install | 2.0 | 1 | NA |
96 | 97 | chk_acct | 1.0 | 1 | NA |
98 | 99 | employment | 6.0 | 1 | NA |
100 | 101 | amount | 2170.0 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | good |
0 | 0 | NA | 0.0 | -1 | bad |
0 | 0 | NA | 0.0 | -1 | good |
102 | 103 | amount | 2799.5 | 1 | NA |
104 | 105 | age | 32.5 | 1 | NA |
106 | 107 | age | 50.5 | 1 | NA |
108 | 109 | amount | 3317.5 | 1 | NA |
110 | 111 | amount | 1686.5 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | good |
112 | 113 | install_rate | 3.5 | 1 | NA |
114 | 115 | sav_acct | 4.0 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | good |
116 | 117 | time_resid | 2.5 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | good |
118 | 119 | time_resid | 3.5 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | good |
0 | 0 | NA | 0.0 | -1 | bad |
120 | 121 | other_credits | 1.5 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | good |
122 | 123 | chk_acct | 2.0 | 1 | NA |
124 | 125 | time_resid | 3.0 | 1 | NA |
126 | 127 | chk_acct | 6.0 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | bad |
128 | 129 | age | 31.5 | 1 | NA |
130 | 131 | install_rate | 1.5 | 1 | NA |
132 | 133 | duration | 42.0 | 1 | NA |
134 | 135 | pstatus | 4.0 | 1 | NA |
136 | 137 | chk_acct | 1.0 | 1 | NA |
138 | 139 | duration | 42.0 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | bad |
0 | 0 | NA | 0.0 | -1 | bad |
140 | 141 | employment | 8.0 | 1 | NA |
142 | 143 | other_debtor | 2.0 | 1 | NA |
144 | 145 | other_debtor | 3.0 | 1 | NA |
146 | 147 | age | 25.0 | 1 | NA |
148 | 149 | purpose | 16.0 | 1 | NA |
150 | 151 | property | 4.0 | 1 | NA |
152 | 153 | duration | 39.0 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | good |
154 | 155 | age | 22.5 | 1 | NA |
156 | 157 | history | 2.0 | 1 | NA |
158 | 159 | amount | 613.5 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | bad |
160 | 161 | num_depend | 1.5 | 1 | NA |
162 | 163 | employment | 12.0 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | bad |
164 | 165 | other_credits | 1.5 | 1 | NA |
166 | 167 | duration | 45.0 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | good |
0 | 0 | NA | 0.0 | -1 | bad |
0 | 0 | NA | 0.0 | -1 | bad |
168 | 169 | employment | 2.0 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | good |
0 | 0 | NA | 0.0 | -1 | bad |
170 | 171 | age | 32.0 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | bad |
172 | 173 | purpose | 16.0 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | bad |
174 | 175 | purpose | 8.0 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | good |
0 | 0 | NA | 0.0 | -1 | bad |
176 | 177 | other_credits | 1.5 | 1 | NA |
178 | 179 | install_rate | 2.5 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | good |
0 | 0 | NA | 0.0 | -1 | bad |
0 | 0 | NA | 0.0 | -1 | good |
0 | 0 | NA | 0.0 | -1 | bad |
0 | 0 | NA | 0.0 | -1 | good |
180 | 181 | amount | 4102.5 | 1 | NA |
182 | 183 | history | 2.0 | 1 | NA |
184 | 185 | other_install | 3.0 | 1 | NA |
186 | 187 | age | 25.0 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | bad |
188 | 189 | amount | 1613.5 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | good |
0 | 0 | NA | 0.0 | -1 | bad |
190 | 191 | time_resid | 3.5 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | good |
0 | 0 | NA | 0.0 | -1 | bad |
192 | 193 | purpose | 9.0 | 1 | NA |
194 | 195 | property | 1.0 | 1 | NA |
196 | 197 | age | 33.5 | 1 | NA |
198 | 199 | install_rate | 3.0 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | bad |
0 | 0 | NA | 0.0 | -1 | good |
0 | 0 | NA | 0.0 | -1 | bad |
200 | 201 | foreign | 1.0 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | good |
202 | 203 | other_credits | 1.5 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | bad |
0 | 0 | NA | 0.0 | -1 | bad |
204 | 205 | age | 26.0 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | bad |
206 | 207 | duration | 19.0 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | good |
208 | 209 | other_debtor | 2.0 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | good |
0 | 0 | NA | 0.0 | -1 | bad |
0 | 0 | NA | 0.0 | -1 | good |
0 | 0 | NA | 0.0 | -1 | bad |
0 | 0 | NA | 0.0 | -1 | bad |
0 | 0 | NA | 0.0 | -1 | bad |
0 | 0 | NA | 0.0 | -1 | good |
0 | 0 | NA | 0.0 | -1 | bad |
210 | 211 | time_resid | 2.0 | 1 | NA |
212 | 213 | chk_acct | 4.0 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | bad |
0 | 0 | NA | 0.0 | -1 | good |
0 | 0 | NA | 0.0 | -1 | good |
0 | 0 | NA | 0.0 | -1 | bad |
0 | 0 | NA | 0.0 | -1 | bad |
0 | 0 | NA | 0.0 | -1 | good |
0 | 0 | NA | 0.0 | -1 | good |
0 | 0 | NA | 0.0 | -1 | bad |
0 | 0 | NA | 0.0 | -1 | bad |
0 | 0 | NA | 0.0 | -1 | good |
214 | 215 | purpose | 128.0 | 1 | NA |
216 | 217 | age | 31.0 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | good |
0 | 0 | NA | 0.0 | -1 | bad |
218 | 219 | telephone | 1.0 | 1 | NA |
220 | 221 | employment | 12.0 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | good |
0 | 0 | NA | 0.0 | -1 | good |
222 | 223 | install_rate | 3.0 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | good |
0 | 0 | NA | 0.0 | -1 | good |
0 | 0 | NA | 0.0 | -1 | bad |
0 | 0 | NA | 0.0 | -1 | good |
224 | 225 | age | 30.0 | 1 | NA |
226 | 227 | employment | 8.0 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | good |
0 | 0 | NA | 0.0 | -1 | bad |
0 | 0 | NA | 0.0 | -1 | good |
0 | 0 | NA | 0.0 | -1 | bad |
228 | 229 | time_resid | 3.0 | 1 | NA |
230 | 231 | duration | 7.5 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | bad |
232 | 233 | purpose | 8.0 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | good |
234 | 235 | purpose | 8.0 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | bad |
0 | 0 | NA | 0.0 | -1 | bad |
0 | 0 | NA | 0.0 | -1 | good |
0 | 0 | NA | 0.0 | -1 | good |
0 | 0 | NA | 0.0 | -1 | bad |
0 | 0 | NA | 0.0 | -1 | bad |
0 | 0 | NA | 0.0 | -1 | good |
236 | 237 | amount | 1324.0 | 1 | NA |
238 | 239 | install_rate | 2.5 | 1 | NA |
240 | 241 | sav_acct | 2.0 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | good |
0 | 0 | NA | 0.0 | -1 | good |
0 | 0 | NA | 0.0 | -1 | bad |
0 | 0 | NA | 0.0 | -1 | bad |
0 | 0 | NA | 0.0 | -1 | good |
242 | 243 | chk_acct | 2.0 | 1 | NA |
244 | 245 | job | 4.0 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | bad |
0 | 0 | NA | 0.0 | -1 | good |
0 | 0 | NA | 0.0 | -1 | bad |
0 | 0 | NA | 0.0 | -1 | good |
246 | 247 | job | 2.0 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | good |
0 | 0 | NA | 0.0 | -1 | good |
0 | 0 | NA | 0.0 | -1 | bad |
0 | 0 | NA | 0.0 | -1 | good |
0 | 0 | NA | 0.0 | -1 | bad |
248 | 249 | other_install | 1.0 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | good |
250 | 251 | other_credits | 1.5 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | good |
0 | 0 | NA | 0.0 | -1 | bad |
0 | 0 | NA | 0.0 | -1 | good |
252 | 253 | duration | 17.0 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | good |
0 | 0 | NA | 0.0 | -1 | bad |
0 | 0 | NA | 0.0 | -1 | good |
254 | 255 | age | 36.0 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | good |
256 | 257 | history | 8.0 | 1 | NA |
258 | 259 | age | 43.5 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | good |
0 | 0 | NA | 0.0 | -1 | bad |
0 | 0 | NA | 0.0 | -1 | bad |
0 | 0 | NA | 0.0 | -1 | good |
0 | 0 | NA | 0.0 | -1 | good |
0 | 0 | NA | 0.0 | -1 | bad |
0 | 0 | NA | 0.0 | -1 | good |
0 | 0 | NA | 0.0 | -1 | bad |
260 | 261 | duration | 23.0 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | bad |
0 | 0 | NA | 0.0 | -1 | bad |
0 | 0 | NA | 0.0 | -1 | good |
262 | 263 | purpose | 9.0 | 1 | NA |
0 | 0 | NA | 0.0 | -1 | good |
0 | 0 | NA | 0.0 | -1 | good |
0 | 0 | NA | 0.0 | -1 | bad |
0 | 0 | NA | 0.0 | -1 | bad |
0 | 0 | NA | 0.0 | -1 | good |
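In the `getTree()` output, the split point of a numeric predictor such as age is a threshold. For a categorical predictor, the randomForest documentation describes the split point as an integer whose binary expansion marks the categories sent to the left daughter; split points such as 42 or 128 for purpose suggest that purpose is stored as a factor here, though that is an inference. The helper `decodeSplit` below is hypothetical and only illustrates the decoding.

```r
# Hypothetical helper: list the category indices encoded in a split point.
# nCat is the number of levels of the categorical predictor.
decodeSplit <- function(splitPoint, nCat) {
  powers <- 2^(0:(nCat - 1))
  # Bit k is set when the integer quotient by 2^(k-1) is odd
  which((as.integer(splitPoint) %/% powers) %% 2 == 1)
}

# Example from the randomForest documentation: 13 = 1 + 4 + 8,
# so categories 1, 3 and 4 go to the left daughter
print(decodeSplit(13, 4))

# A split point of 42 = 2 + 8 + 32 sets the bits of categories 2, 4 and 6;
# the figure of 10 levels is an assumption about the predictor
print(decodeSplit(42, 10))
```

The indices refer to positions in `levels()` of the factor, so mapping them back to category labels requires the original data frame.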
Building the SVM
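library(e1071)
load('germanCredit.Rdata')   # provides the data frame 'german'
trPerc <- 0.7
# No seed is set here, so this 70/30 split differs from the Random Forest one
sp <- sample(1:nrow(german),as.integer(trPerc*nrow(german)))
tr <- german[sp,]
ts <- german[-sp,]
# Note: the SVM is fit on all of 'german', which includes the test rows in ts,
# so the test error reported below is optimistic; fitting on 'tr' avoids this
s <- svm(response ~ ., german)
ps <- predict(s,ts)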
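The call above passes the full `german` data frame to `svm()`, so the rows in `ts` are also used for fitting. A held-out estimate would normally fit on the training rows only. The sketch below shows that pattern with e1071, using the built-in `iris` data as a stand-in for the credit data; the names `train`, `test`, `fit` and `testErr` are illustrative.

```r
# Sketch only: iris stands in for the German credit data.
# The guard skips the example if e1071 is not installed.
if (requireNamespace("e1071", quietly = TRUE)) {
  set.seed(1234)
  idx   <- sample(seq_len(nrow(iris)), as.integer(0.7 * nrow(iris)))
  train <- iris[idx, ]
  test  <- iris[-idx, ]

  # Fit on the training rows only, so the test rows stay unseen
  fit  <- e1071::svm(Species ~ ., data = train)
  pred <- predict(fit, test)

  # Percentage of test rows whose predicted class is wrong
  testErr <- 100 * mean(pred != test$Species)
  print(testErr)
} else {
  message("e1071 is not installed; skipping the example")
}
```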
Predicting with the SVM
## 1 3 5 6 8 11 17 24 31 34 36 43 44 48 50
## good good bad good good good good good good good bad good good good good
## 58 63 66 67 68 70 72 73 80 82 83 86 90 93 95
## good good good good good good good good good good good good good good good
## 96 102 105 106 109 112 113 117 121 124 127 128 129 133 134
## bad good good good good good good bad good good good good good good good
## 135 138 145 146 150 151 153 156 157 158 161 165 172 174 175
## good good good bad good good good good good good good good good good good
## 176 178 179 181 184 189 195 198 204 209 210 223 225 231 235
## good good good good good good bad bad good good good good good good good
## 236 237 240 242 243 245 250 251 253 260 262 266 271 275 279
## bad bad good good bad good good good bad good good good good bad good
## 280 285 289 290 292 296 297 302 303 304 305 306 307 308 314
## good good good bad good bad good good good good good good good good good
## 315 316 318 321 324 325 337 340 343 344 347 354 358 361 363
## good bad good good good good good good good good good bad good good good
## 368 373 375 377 378 380 381 394 400 402 403 406 409 414 415
## good good bad good good good good good good good good good good good good
## 416 417 418 421 424 428 429 439 447 448 452 453 454 460 464
## good good good good good good good good bad good good good good good good
## 469 471 472 473 474 475 478 481 482 487 491 499 503 504 506
## good good bad good good good good good good good good good good good good
## 508 509 510 513 518 522 527 530 537 542 543 546 548 557 565
## good good good good good good good good good good good bad good good good
## 566 567 569 573 575 579 582 583 585 592 600 601 608 611 612
## good good good good good bad good good good good good good bad good good
## 613 616 618 619 624 627 632 644 647 650 651 659 660 661 663
## bad good good good good good bad good good bad bad good good good good
## 665 667 669 671 673 675 676 678 681 684 685 693 694 700 705
## good good good good good good good bad good good good good good good good
## 709 710 716 717 718 725 728 734 749 751 753 756 765 773 776
## good good good good good good bad good good good good bad good good bad
## 780 782 787 791 792 794 795 806 807 808 821 826 829 832 834
## good good good good good good good bad good good good good good bad good
## 837 840 844 846 852 853 864 869 871 878 879 883 884 886 887
## good good good good good good good good good good good good good bad good
## 892 894 899 900 901 903 906 909 910 913 929 932 933 935 936
## good good good good good good good good good good good good good good bad
## 948 949 954 956 959 968 969 972 974 981 984 995 998 999 1000
## good good bad good good good good good bad good good good good bad good
## Levels: bad good
The Confusion Matrix for the SVM
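# table() lists factor levels alphabetically (bad, good), so put the actual
# class in the rows and reorder both dimensions to match costMatrix
confuseSVM <- table(ts$response, ps)[c('good','bad'), c('good','bad')]
colnames(confuseSVM) <- c('Predict Good','Predict Bad')
rownames(confuseSVM) <- c('Actual Good','Actual Bad')
Actual class | Predict Good | Predict Bad |
---|---|---|
Actual Good | 206 | 3 |
Actual Bad | 57 | 34 |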
The Error Rate of the SVM
The error rate is the percentage of test-set predictions that are incorrect.
mc <- table(ps, ts$response)
erro <- 100 * (1-sum(diag(mc))/sum(mc))
erro
## [1] 20
The Overall Benefit/Cost of the SVM
Calculated by multiplying each cell of the Confusion Matrix by the corresponding cell of the Cost/Benefit Matrix and summing the products; a lower total is better.
utilitySVM <- sum(confuseSVM*costMatrix)
utilitySVM
## [1] 288
SVM using the performanceEstimation package
load('germanCredit.Rdata')
svm1 <- performanceEstimation(
  PredTask(response ~ ., german),
  workflowVariants("standardWF", learner="svm",
                   learner.pars=list(cost=c(1,10),
                                     gamma=c(0.1,0.01))),
  HldSettings(nReps=5,hldSz=0.3))
##
##
## ##### PERFORMANCE ESTIMATION USING HOLD OUT #####
##
## ** PREDICTIVE TASK :: german
##
## ++ MODEL/WORKFLOW :: standardWF variant -> svm.v1
##
## 5 x 70 %/ 30 % Holdout run with seed = 1234
## Repetition : 1 2 3 4 5
##
##
## ++ MODEL/WORKFLOW :: standardWF variant -> svm.v2
##
## 5 x 70 %/ 30 % Holdout run with seed = 1234
## Repetition : 1 2 3 4 5
##
##
## ++ MODEL/WORKFLOW :: standardWF variant -> svm.v3
##
## 5 x 70 %/ 30 % Holdout run with seed = 1234
## Repetition : 1 2 3 4 5
##
##
## ++ MODEL/WORKFLOW :: standardWF variant -> svm.v4
##
## 5 x 70 %/ 30 % Holdout run with seed = 1234
## Repetition : 1 2 3 4 5
plot(svm1)
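`workflowVariants()` generates one workflow for each combination of the values given in `learner.pars`, which is why four variants (svm.v1 to svm.v4) appear in the log above. The sketch below enumerates those combinations with base R's `expand.grid()`; the row order shown here is not claimed to match the package's numbering of the variants.

```r
# The two tuning vectors passed in learner.pars
pars <- list(cost = c(1, 10), gamma = c(0.1, 0.01))

# All combinations of the candidate values: 2 x 2 = 4 settings
grid <- expand.grid(pars)
print(grid)
```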
Random Forest Model using the performanceEstimation package
library(performanceEstimation)
load('germanCredit.Rdata')
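rf1 <- performanceEstimation(
  PredTask(response ~ ., german),
  # Note: cost and gamma are SVM tuning parameters, not randomForest
  # arguments, so the four variants do not vary any forest setting
  workflowVariants("standardWF", learner="randomForest",
                   learner.pars=list(cost=c(1,10),
                                     gamma=c(0.1,0.01))),
  HldSettings(nReps=5,hldSz=0.3))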
##
##
## ##### PERFORMANCE ESTIMATION USING HOLD OUT #####
##
## ** PREDICTIVE TASK :: german
##
## ++ MODEL/WORKFLOW :: standardWF variant -> randomForest.v1
##
## 5 x 70 %/ 30 % Holdout run with seed = 1234
## Repetition : 1 2 3 4 5
##
##
## ++ MODEL/WORKFLOW :: standardWF variant -> randomForest.v2
##
## 5 x 70 %/ 30 % Holdout run with seed = 1234
## Repetition : 1 2 3 4 5
##
##
## ++ MODEL/WORKFLOW :: standardWF variant -> randomForest.v3
##
## 5 x 70 %/ 30 % Holdout run with seed = 1234
## Repetition : 1 2 3 4 5
##
##
## ++ MODEL/WORKFLOW :: standardWF variant -> randomForest.v4
##
## 5 x 70 %/ 30 % Holdout run with seed = 1234
## Repetition : 1 2 3 4 5
plot(rf1)
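The setting `HldSettings(nReps=5,hldSz=0.3)` asks for five repetitions of a 70%/30% hold-out split, as the log shows. To make the procedure concrete, here is a hand-rolled sketch in base R that repeats the split five times and scores a majority-class baseline. It is not how performanceEstimation is implemented; `holdoutBaseline` is a hypothetical helper, and the labels are a stand-in with the 700 good / 300 bad class balance of the German credit data.

```r
# Hypothetical helper: repeated hold-out with a majority-class baseline.
# y is the vector of class labels; no predictors are used.
holdoutBaseline <- function(y, nReps = 5, hldSz = 0.3, seed = 1234) {
  set.seed(seed)
  n <- length(y)
  sapply(seq_len(nReps), function(i) {
    testIdx  <- sample(n, as.integer(hldSz * n))   # 30% held out
    trainY   <- y[-testIdx]
    majority <- names(which.max(table(trainY)))    # most frequent class
    mean(y[testIdx] != majority)                   # test error rate
  })
}

# Stand-in labels: 700 good and 300 bad
y <- factor(rep(c('good', 'bad'), times = c(700, 300)))
errs <- holdoutBaseline(y)
print(round(errs, 3))
print(mean(errs))
```

The baseline error should come out near 0.3, the share of bad customers, which gives a reference point for judging the error estimates of the tuned models.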
Comparing both models using the performanceEstimation package
load('germanCredit.Rdata')
result <- performanceEstimation(
  # Note: the same task is listed twice, so every workflow is run twice below
  c(PredTask(response ~ ., german), PredTask(response ~ ., german)),
  c(workflowVariants("standardWF", learner="svm",
                     learner.pars=list(cost=c(1,10),
                                       gamma=c(0.1,0.01))),
    # Note: se is not a randomForest argument, so the three variants
    # do not vary any forest setting
    workflowVariants("standardWF", learner="randomForest",
                     learner.pars=list(se=c(0,0.5,1)),
                     predictor.pars=list(type="class"))),
  HldSettings(nReps=5,hldSz=0.3))
##
##
## ##### PERFORMANCE ESTIMATION USING HOLD OUT #####
##
## ** PREDICTIVE TASK :: german
##
## ++ MODEL/WORKFLOW :: standardWF variant -> svm.v1
##
## 5 x 70 %/ 30 % Holdout run with seed = 1234
## Repetition : 1 2 3 4 5
##
##
## ++ MODEL/WORKFLOW :: standardWF variant -> svm.v2
##
## 5 x 70 %/ 30 % Holdout run with seed = 1234
## Repetition : 1 2 3 4 5
##
##
## ++ MODEL/WORKFLOW :: standardWF variant -> svm.v3
##
## 5 x 70 %/ 30 % Holdout run with seed = 1234
## Repetition : 1 2 3 4 5
##
##
## ++ MODEL/WORKFLOW :: standardWF variant -> svm.v4
##
## 5 x 70 %/ 30 % Holdout run with seed = 1234
## Repetition : 1 2 3 4 5
##
##
## ++ MODEL/WORKFLOW :: standardWF variant -> randomForest.v1
##
## 5 x 70 %/ 30 % Holdout run with seed = 1234
## Repetition : 1 2 3 4 5
##
##
## ++ MODEL/WORKFLOW :: standardWF variant -> randomForest.v2
##
## 5 x 70 %/ 30 % Holdout run with seed = 1234
## Repetition : 1 2 3 4 5
##
##
## ++ MODEL/WORKFLOW :: standardWF variant -> randomForest.v3
##
## 5 x 70 %/ 30 % Holdout run with seed = 1234
## Repetition : 1 2 3 4 5
##
##
## ** PREDICTIVE TASK :: german
##
## ++ MODEL/WORKFLOW :: standardWF variant -> svm.v1
##
## 5 x 70 %/ 30 % Holdout run with seed = 1234
## Repetition : 1 2 3 4 5
##
##
## ++ MODEL/WORKFLOW :: standardWF variant -> svm.v2
##
## 5 x 70 %/ 30 % Holdout run with seed = 1234
## Repetition : 1 2 3 4 5
##
##
## ++ MODEL/WORKFLOW :: standardWF variant -> svm.v3
##
## 5 x 70 %/ 30 % Holdout run with seed = 1234
## Repetition : 1 2 3 4 5
##
##
## ++ MODEL/WORKFLOW :: standardWF variant -> svm.v4
##
## 5 x 70 %/ 30 % Holdout run with seed = 1234
## Repetition : 1 2 3 4 5
##
##
## ++ MODEL/WORKFLOW :: standardWF variant -> randomForest.v1
##
## 5 x 70 %/ 30 % Holdout run with seed = 1234
## Repetition : 1 2 3 4 5
##
##
## ++ MODEL/WORKFLOW :: standardWF variant -> randomForest.v2
##
## 5 x 70 %/ 30 % Holdout run with seed = 1234
## Repetition : 1 2 3 4 5
##
##
## ++ MODEL/WORKFLOW :: standardWF variant -> randomForest.v3
##
## 5 x 70 %/ 30 % Holdout run with seed = 1234
## Repetition : 1 2 3 4 5
plot(result)
## $german
## Workflow Estimate
## err randomForest.v1 0.255
##
## $german
## Workflow Estimate
## err randomForest.v1 0.255
## Workflow Object:
## Workflow ID :: randomForest.v1
## Workflow Function :: standardWF
## Parameter values:
## learner.pars -> se=0
## predictor.pars -> type=class
## learner -> randomForest