Obtaining a Classification Model for a Bank Credit dataset

Outline

Supplemental Analysis
+ Description of the German credit dataset
+ Loading the dataset and installing the necessary packages
+ Utilization of Random Forest to determine Variable Importance
+ A Classification Tree for the German credit dataset

Random Forest (Requirement #1)
+ Building the Random Forest Model
+ The Confusion Matrix for the Random Forest Model
+ The Error Rate for the Random Forest Model
+ The Overall Benefit/Cost of the Random Forest Model
+ A Visualization of One (1) Tree from the Random Forest Model

Support Vector Machine (SVM) (Requirement #2)

Description of the German credit dataset.

1. Title: German Credit data

2. Source Information

Professor Dr. Hans Hofmann  
Institut für Statistik und Ökonometrie  
Universität Hamburg  
FB Wirtschaftswissenschaften  
Von-Melle-Park 5    
2000 Hamburg 13 

3. Number of Instances: 1000

4. Two datasets are provided. The original dataset, in the form provided by Prof. Hofmann, contains categorical/symbolic attributes and is in the file “german.data”.

5. For algorithms that need numerical attributes, Strathclyde University produced the file “german.data-numeric”.
This file has been edited and several indicator variables added to make it suitable for algorithms which cannot cope with categorical variables. Several attributes that are ordered categorical (such as attribute 17) have been coded as integer. This was the form used by StatLog.

6. Number of Attributes german: 20 (7 numerical, 13 categorical); Number of Attributes german.numer: 24 (24 numerical)

7. Attribute description for german

8. Cost Matrix

This dataset requires use of a cost matrix (see below)

costMatrix <- matrix(c(0,5,1,0),ncol=2)
colnames(costMatrix) <- c('Predict Good','Predict Bad')
rownames(costMatrix) <- c('Actual Good','Actual Bad')
            Predict Good Predict Bad
Actual Good            0           1
Actual Bad             5           0

(1 = Good, 2 = Bad)

The rows represent the actual classification and the columns the predicted classification.

**It is worse to classify a customer as good when they are bad (cost 5) than to classify a customer as bad when they are good (cost 1).**
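As a small worked example of how the cost matrix is applied, the sketch below multiplies it element-wise with a table of counts laid out the same way (actual class in rows, predicted class in columns) and sums the result. The counts are hypothetical and are not results from this report.

```r
# Cost matrix as defined above: rows = actual class, columns = predicted class
costMatrix <- matrix(c(0, 5, 1, 0), ncol = 2,
                     dimnames = list(c("Actual Good", "Actual Bad"),
                                     c("Predict Good", "Predict Bad")))

# Hypothetical counts in the same layout (illustration only)
counts <- matrix(c(60, 10, 20, 10), ncol = 2,
                 dimnames = dimnames(costMatrix))

# 10 bad customers accepted (cost 5 each) + 20 good customers rejected (cost 1 each)
totalCost <- sum(counts * costMatrix)
totalCost
## [1] 70
```

The element-wise product is only meaningful when both matrices use the same row and column order, so it helps to set the dimnames explicitly as done here.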

Load the German Credit dataset and the necessary packages

## Warning: package 'randomForest' was built under R version 3.1.1
## randomForest 4.6-10
## Type rfNews() to see new features/changes/bug fixes.
## Warning: package 'rattle' was built under R version 3.1.1
## Rattle: A free graphical interface for data mining with R.
## Version 3.1.0 Copyright (c) 2006-2014 Togaware Pty Ltd.
## Type 'rattle()' to shake, rattle, and roll your data.
## Warning: package 'DMwR' was built under R version 3.1.1
## Loading required package: lattice
## Loading required package: grid
## KernSmooth 2.23 loaded
## Copyright M. P. Wand 1997-2009
## Warning: package 'e1071' was built under R version 3.1.1
## Warning: package 'performanceEstimation' was built under R version 3.1.1
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 3.1.1

Utilization of Random Forest to determine Variable Importance

[Figure: variable importance plot from the Random Forest model]

              MeanDecreaseAccuracy MeanDecreaseGini
chk_acct                    34.197           45.432
duration                    19.263           39.800
history                     14.765           25.718
purpose                      8.926           36.170
amount                      13.196           52.916
sav_acct                     9.393           21.216
employment                   5.846           24.054
install_rate                 5.000           16.830
pstatus                      3.392           15.128
other_debtor                10.570            7.390
time_resid                   4.704           15.836
property                     6.819           19.271
age                          7.951           39.791
other_install                7.370           11.201
housing                      5.533            9.647
other_credits                5.061            9.178
job                          3.257           12.381
num_depend                   3.135            5.771
telephone                    4.249            6.062
foreign                      0.509            1.514
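The two importance measures do not rank the variables identically, which is easier to see once the table is sorted. The sketch below re-enters the values from the table above into a data frame (the object names `imp`, `top_acc` and `top_gini` are illustrative) and lists the five highest-ranked variables under each measure.

```r
# Values copied from the variable importance table above
imp <- data.frame(
  variable = c("chk_acct", "duration", "history", "purpose", "amount",
               "sav_acct", "employment", "install_rate", "pstatus",
               "other_debtor", "time_resid", "property", "age",
               "other_install", "housing", "other_credits", "job",
               "num_depend", "telephone", "foreign"),
  MeanDecreaseAccuracy = c(34.197, 19.263, 14.765, 8.926, 13.196,
                           9.393, 5.846, 5.000, 3.392,
                           10.570, 4.704, 6.819, 7.951,
                           7.370, 5.533, 5.061, 3.257,
                           3.135, 4.249, 0.509),
  MeanDecreaseGini = c(45.432, 39.800, 25.718, 36.170, 52.916,
                       21.216, 24.054, 16.830, 15.128,
                       7.390, 15.836, 19.271, 39.791,
                       11.201, 9.647, 9.178, 12.381,
                       5.771, 6.062, 1.514),
  stringsAsFactors = FALSE
)

# Five most important variables under each measure
top_acc  <- head(imp$variable[order(-imp$MeanDecreaseAccuracy)], 5)
top_gini <- head(imp$variable[order(-imp$MeanDecreaseGini)], 5)

top_acc
## [1] "chk_acct"     "duration"     "history"      "amount"       "other_debtor"
top_gini
## [1] "amount"   "chk_acct" "duration" "age"      "purpose"

# Variables that appear in both top-five lists
intersect(top_acc, top_gini)
## [1] "chk_acct" "duration" "amount"
```

Checking account status, duration and amount appear in both lists, while `age` and `purpose` rank highly only on the Gini measure.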

A Classification Tree for the German credit dataset Utilizing the ‘rpart’ and ‘rattle’ packages

## Loading required package: rpart.plot
## Loading required package: rpart
## Loading required package: RColorBrewer
## Warning: package 'RColorBrewer' was built under R version 3.1.1

[Figure: classification tree for the German credit dataset]

Random Forest (Requirement #1)

Building the Random Forest Model

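library(randomForest)

load('germanCredit.Rdata')   # loads the data frame 'german'
set.seed(1234)               # make the random split reproducible
trPerc <- 0.7                # proportion of rows used for training
sp <- sample(1:nrow(german),as.integer(trPerc*nrow(german)))
tr <- german[sp,]            # training set (70% of the rows)
ts <- german[-sp,]           # test set (the remaining 30%)
m <- randomForest(response ~ ., tr, ntree = 3000)
ps <- predict(m, ts)         # predicted class for each test row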
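To make the split step concrete, the sketch below applies the same indexing to a stand-in data frame with 1000 rows (the number of instances stated in the dataset description). The names `german_demo`, `tr_demo` and `ts_demo` are illustrative and are not objects from this report.

```r
set.seed(1234)

# Stand-in for 'german': only the row count matters for this illustration
german_demo <- data.frame(id = 1:1000)

trPerc <- 0.7
sp <- sample(1:nrow(german_demo), as.integer(trPerc * nrow(german_demo)))

tr_demo <- german_demo[sp, , drop = FALSE]    # rows drawn for training
ts_demo <- german_demo[-sp, , drop = FALSE]   # all remaining rows form the test set

nrow(tr_demo)
## [1] 700
nrow(ts_demo)
## [1] 300

# Negative indexing guarantees the two sets share no rows
length(intersect(tr_demo$id, ts_demo$id))
## [1] 0
```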

Predicting with the Random Forest Model

##    4    5    7    8   11   15   16   20   24   28   29   33   34   38   41 
## good  bad good good  bad  bad  bad good good good good good good good good 
##   42   44   46   47   48   51   52   53   56   57   58   60   62   64   66 
## good good good good good good good good good good good  bad good  bad good 
##   73   76   77   79   80   87   88   93   98  100  103  107  108  112  118 
## good good  bad good good good good good good good good good good good good 
##  121  124  125  129  134  136  137  138  140  142  143  144  147  150  153 
## good good good good good good good good good  bad good good good good  bad 
##  159  161  164  168  173  177  179  187  188  191  203  205  207  208  212 
## good good good good good good good good good good good good good good good 
##  213  216  218  219  226  227  229  232  234  236  237  239  241  251  253 
## good good  bad  bad  bad good good good good good good good  bad good  bad 
##  256  259  260  262  264  265  267  286  287  290  301  303  314  318  319 
## good good good good good good good  bad  bad  bad good good good good good 
##  325  326  328  329  334  336  337  338  340  351  358  359  364  365  370 
## good good good good good good good good  bad good good good good good good 
##  372  383  385  387  404  406  407  408  410  424  425  428  429  432  437 
## good good good good good good good good good good good good good  bad good 
##  441  444  445  455  456  460  465  468  469  482  486  495  497  498  499 
## good good  bad good good good good good good  bad good good  bad good good 
##  505  511  517  519  521  523  525  540  553  556  557  560  561  562  564 
##  bad good good good good  bad good good good good  bad good good good good 
##  567  568  570  571  572  574  575  577  585  588  594  598  600  605  608 
## good good  bad  bad good  bad good good good good  bad good good good good 
##  610  619  625  628  630  632  633  639  644  648  650  651  654  658  659 
## good good good good good  bad good good good good  bad  bad good good  bad 
##  664  666  670  671  675  676  678  683  684  689  692  696  697  701  704 
## good good good good good good  bad good good good good good good good good 
##  706  710  712  714  723  725  729  730  731  737  739  740  741  744  745 
## good good  bad good  bad good  bad good good  bad good  bad good good  bad 
##  747  750  752  756  759  760  763  767  775  782  790  793  795  797  802 
## good good  bad  bad good good good good good good  bad good good good good 
##  816  822  826  829  831  839  841  846  849  851  855  859  863  864  866 
##  bad good  bad good good good  bad good good good good good good good good 
##  870  878  879  880  882  884  885  889  891  893  894  901  908  910  915 
##  bad good good good good good good good good good good good good good  bad 
##  922  923  924  925  926  928  929  933  934  936  937  938  939  940  941 
## good good good good  bad  bad good good good good good good  bad good good 
##  945  947  956  962  968  971  976  978  980  981  984  993  994  995 1000 
## good good good good good good good good  bad good good good  bad good good 
## Levels: bad good

The Confusion Matrix for the Random Forest Model

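confuseRF <- table(ps,ts$response)
# table() puts the predictions (ps) in rows and the actual classes in columns;
# the factor levels are ordered bad, good
rownames(confuseRF) <- c('Predict Bad','Predict Good')
colnames(confuseRF) <- c('Actual Bad','Actual Good')
             Actual Bad Actual Good
Predict Bad          35          18
Predict Good         64         183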

The Error Rate of the Random Forest Model

The Error Rate measures the proportion of the predictions that are incorrect, expressed here as a percentage.

err <- 100 * (1 - sum(diag(confuseRF))/sum(confuseRF))
err
## [1] 27.33
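The error rate can be checked by hand from the four counts in the confusion matrix. The sketch below rebuilds the matrix with predictions in rows and actual classes in columns, which is the layout `table(ps, ts$response)` produces with the levels ordered bad, good. The name `confuse` and the extra "bad customers caught" figure are illustrative additions.

```r
# Counts from the Random Forest confusion matrix above
confuse <- matrix(c(35, 64, 18, 183), nrow = 2,
                  dimnames = list(predicted = c("bad", "good"),
                                  actual    = c("bad", "good")))

# Correct predictions sit on the diagonal: 35 + 183 = 218 of 300
err <- 100 * (1 - sum(diag(confuse)) / sum(confuse))
round(err, 2)
## [1] 27.33

# Share of the actually bad customers that the model flags as bad: 35 of 99
bad_caught <- 100 * confuse["bad", "bad"] / sum(confuse[, "bad"])
round(bad_caught, 2)
## [1] 35.35
```

The second figure shows why the error rate alone is not sufficient for this dataset: most of the 82 errors are bad customers predicted as good, which is the expensive kind of mistake under the cost matrix.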

The Overall Benefit/Cost of the Random Forest Model

Calculated by multiplying the Confusion Matrix element-wise by the Cost/Benefit Matrix and summing the result.

utilityRF <- sum(confuseRF*costMatrix)
utilityRF
## [1] 338
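The total of 338 can be reproduced by looking up each kind of error by name rather than by position, which avoids relying on the two matrices happening to line up. This is a minimal sketch using the counts reported above; the two "trivial rule" totals are added only as reference points and assume the same 300 test rows.

```r
# Random Forest confusion matrix: predictions in rows, actual classes in columns
confuse <- matrix(c(35, 64, 18, 183), nrow = 2,
                  dimnames = list(predicted = c("bad", "good"),
                                  actual    = c("bad", "good")))

cost_false_good <- 5   # predicted good, actually bad
cost_false_bad  <- 1   # predicted bad, actually good

total_cost <- cost_false_good * confuse["good", "bad"] +
              cost_false_bad  * confuse["bad", "good"]
total_cost
## [1] 338

# Reference points: cost of labelling every test customer good, or every one bad
n_actual_bad  <- sum(confuse[, "bad"])    # 99
n_actual_good <- sum(confuse[, "good"])   # 201
cost_all_good <- cost_false_good * n_actual_bad
cost_all_bad  <- cost_false_bad  * n_actual_good
c(all_good = cost_all_good, all_bad = cost_all_bad)
## all_good  all_bad 
##      495      201
```

On these counts, the 64 bad customers predicted as good account for 320 of the 338 cost units.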

A Visualization of One (1) Tree from the Random Forest Model

tree <- getTree(m,1,labelVar=TRUE)
left daughter right daughter split var split point status prediction
2 3 property 3.0 1 NA
4 5 pstatus 3.0 1 NA
6 7 age 29.5 1 NA
8 9 chk_acct 1.0 1 NA
10 11 sav_acct 12.0 1 NA
12 13 employment 15.0 1 NA
14 15 chk_acct 3.0 1 NA
16 17 purpose 42.0 1 NA
18 19 sav_acct 12.0 1 NA
0 0 NA 0.0 -1 good
20 21 foreign 1.0 1 NA
22 23 sav_acct 1.0 1 NA
24 25 pstatus 3.0 1 NA
26 27 sav_acct 9.0 1 NA
28 29 sav_acct 8.0 1 NA
30 31 amount 2609.0 1 NA
32 33 age 64.0 1 NA
0 0 NA 0.0 -1 good
34 35 age 25.5 1 NA
36 37 time_resid 2.5 1 NA
0 0 NA 0.0 -1 good
38 39 job 6.0 1 NA
40 41 other_credits 3.5 1 NA
0 0 NA 0.0 -1 good
42 43 history 1.0 1 NA
44 45 telephone 1.0 1 NA
46 47 other_install 1.0 1 NA
0 0 NA 0.0 -1 bad
48 49 amount 9492.5 1 NA
50 51 employment 9.0 1 NA
0 0 NA 0.0 -1 good
52 53 amount 1237.0 1 NA
0 0 NA 0.0 -1 good
54 55 duration 13.5 1 NA
56 57 chk_acct 2.0 1 NA
58 59 amount 6201.0 1 NA
60 61 amount 969.5 1 NA
62 63 pstatus 5.0 1 NA
64 65 pstatus 4.0 1 NA
66 67 duration 22.5 1 NA
0 0 NA 0.0 -1 bad
0 0 NA 0.0 -1 bad
0 0 NA 0.0 -1 good
68 69 property 4.0 1 NA
70 71 history 13.0 1 NA
72 73 history 8.0 1 NA
0 0 NA 0.0 -1 good
74 75 purpose 17.0 1 NA
0 0 NA 0.0 -1 bad
0 0 NA 0.0 -1 bad
76 77 property 1.0 1 NA
78 79 amount 997.5 1 NA
80 81 amount 2272.0 1 NA
82 83 job 2.0 1 NA
84 85 purpose 73.0 1 NA
86 87 purpose 5.0 1 NA
88 89 history 1.0 1 NA
90 91 sav_acct 1.0 1 NA
92 93 num_depend 1.5 1 NA
94 95 other_install 2.0 1 NA
96 97 chk_acct 1.0 1 NA
98 99 employment 6.0 1 NA
100 101 amount 2170.0 1 NA
0 0 NA 0.0 -1 good
0 0 NA 0.0 -1 bad
0 0 NA 0.0 -1 good
102 103 amount 2799.5 1 NA
104 105 age 32.5 1 NA
106 107 age 50.5 1 NA
108 109 amount 3317.5 1 NA
110 111 amount 1686.5 1 NA
0 0 NA 0.0 -1 good
112 113 install_rate 3.5 1 NA
114 115 sav_acct 4.0 1 NA
0 0 NA 0.0 -1 good
116 117 time_resid 2.5 1 NA
0 0 NA 0.0 -1 good
118 119 time_resid 3.5 1 NA
0 0 NA 0.0 -1 good
0 0 NA 0.0 -1 bad
120 121 other_credits 1.5 1 NA
0 0 NA 0.0 -1 good
122 123 chk_acct 2.0 1 NA
124 125 time_resid 3.0 1 NA
126 127 chk_acct 6.0 1 NA
0 0 NA 0.0 -1 bad
128 129 age 31.5 1 NA
130 131 install_rate 1.5 1 NA
132 133 duration 42.0 1 NA
134 135 pstatus 4.0 1 NA
136 137 chk_acct 1.0 1 NA
138 139 duration 42.0 1 NA
0 0 NA 0.0 -1 bad
0 0 NA 0.0 -1 bad
140 141 employment 8.0 1 NA
142 143 other_debtor 2.0 1 NA
144 145 other_debtor 3.0 1 NA
146 147 age 25.0 1 NA
148 149 purpose 16.0 1 NA
150 151 property 4.0 1 NA
152 153 duration 39.0 1 NA
0 0 NA 0.0 -1 good
154 155 age 22.5 1 NA
156 157 history 2.0 1 NA
158 159 amount 613.5 1 NA
0 0 NA 0.0 -1 bad
160 161 num_depend 1.5 1 NA
162 163 employment 12.0 1 NA
0 0 NA 0.0 -1 bad
164 165 other_credits 1.5 1 NA
166 167 duration 45.0 1 NA
0 0 NA 0.0 -1 good
0 0 NA 0.0 -1 bad
0 0 NA 0.0 -1 bad
168 169 employment 2.0 1 NA
0 0 NA 0.0 -1 good
0 0 NA 0.0 -1 bad
170 171 age 32.0 1 NA
0 0 NA 0.0 -1 bad
172 173 purpose 16.0 1 NA
0 0 NA 0.0 -1 bad
174 175 purpose 8.0 1 NA
0 0 NA 0.0 -1 good
0 0 NA 0.0 -1 bad
176 177 other_credits 1.5 1 NA
178 179 install_rate 2.5 1 NA
0 0 NA 0.0 -1 good
0 0 NA 0.0 -1 bad
0 0 NA 0.0 -1 good
0 0 NA 0.0 -1 bad
0 0 NA 0.0 -1 good
180 181 amount 4102.5 1 NA
182 183 history 2.0 1 NA
184 185 other_install 3.0 1 NA
186 187 age 25.0 1 NA
0 0 NA 0.0 -1 bad
188 189 amount 1613.5 1 NA
0 0 NA 0.0 -1 good
0 0 NA 0.0 -1 bad
190 191 time_resid 3.5 1 NA
0 0 NA 0.0 -1 good
0 0 NA 0.0 -1 bad
192 193 purpose 9.0 1 NA
194 195 property 1.0 1 NA
196 197 age 33.5 1 NA
198 199 install_rate 3.0 1 NA
0 0 NA 0.0 -1 bad
0 0 NA 0.0 -1 good
0 0 NA 0.0 -1 bad
200 201 foreign 1.0 1 NA
0 0 NA 0.0 -1 good
202 203 other_credits 1.5 1 NA
0 0 NA 0.0 -1 bad
0 0 NA 0.0 -1 bad
204 205 age 26.0 1 NA
0 0 NA 0.0 -1 bad
206 207 duration 19.0 1 NA
0 0 NA 0.0 -1 good
208 209 other_debtor 2.0 1 NA
0 0 NA 0.0 -1 good
0 0 NA 0.0 -1 bad
0 0 NA 0.0 -1 good
0 0 NA 0.0 -1 bad
0 0 NA 0.0 -1 bad
0 0 NA 0.0 -1 bad
0 0 NA 0.0 -1 good
0 0 NA 0.0 -1 bad
210 211 time_resid 2.0 1 NA
212 213 chk_acct 4.0 1 NA
0 0 NA 0.0 -1 bad
0 0 NA 0.0 -1 good
0 0 NA 0.0 -1 good
0 0 NA 0.0 -1 bad
0 0 NA 0.0 -1 bad
0 0 NA 0.0 -1 good
0 0 NA 0.0 -1 good
0 0 NA 0.0 -1 bad
0 0 NA 0.0 -1 bad
0 0 NA 0.0 -1 good
214 215 purpose 128.0 1 NA
216 217 age 31.0 1 NA
0 0 NA 0.0 -1 good
0 0 NA 0.0 -1 bad
218 219 telephone 1.0 1 NA
220 221 employment 12.0 1 NA
0 0 NA 0.0 -1 good
0 0 NA 0.0 -1 good
222 223 install_rate 3.0 1 NA
0 0 NA 0.0 -1 good
0 0 NA 0.0 -1 good
0 0 NA 0.0 -1 bad
0 0 NA 0.0 -1 good
224 225 age 30.0 1 NA
226 227 employment 8.0 1 NA
0 0 NA 0.0 -1 good
0 0 NA 0.0 -1 bad
0 0 NA 0.0 -1 good
0 0 NA 0.0 -1 bad
228 229 time_resid 3.0 1 NA
230 231 duration 7.5 1 NA
0 0 NA 0.0 -1 bad
232 233 purpose 8.0 1 NA
0 0 NA 0.0 -1 good
234 235 purpose 8.0 1 NA
0 0 NA 0.0 -1 bad
0 0 NA 0.0 -1 bad
0 0 NA 0.0 -1 good
0 0 NA 0.0 -1 good
0 0 NA 0.0 -1 bad
0 0 NA 0.0 -1 bad
0 0 NA 0.0 -1 good
236 237 amount 1324.0 1 NA
238 239 install_rate 2.5 1 NA
240 241 sav_acct 2.0 1 NA
0 0 NA 0.0 -1 good
0 0 NA 0.0 -1 good
0 0 NA 0.0 -1 bad
0 0 NA 0.0 -1 bad
0 0 NA 0.0 -1 good
242 243 chk_acct 2.0 1 NA
244 245 job 4.0 1 NA
0 0 NA 0.0 -1 bad
0 0 NA 0.0 -1 good
0 0 NA 0.0 -1 bad
0 0 NA 0.0 -1 good
246 247 job 2.0 1 NA
0 0 NA 0.0 -1 good
0 0 NA 0.0 -1 good
0 0 NA 0.0 -1 bad
0 0 NA 0.0 -1 good
0 0 NA 0.0 -1 bad
248 249 other_install 1.0 1 NA
0 0 NA 0.0 -1 good
250 251 other_credits 1.5 1 NA
0 0 NA 0.0 -1 good
0 0 NA 0.0 -1 bad
0 0 NA 0.0 -1 good
252 253 duration 17.0 1 NA
0 0 NA 0.0 -1 good
0 0 NA 0.0 -1 bad
0 0 NA 0.0 -1 good
254 255 age 36.0 1 NA
0 0 NA 0.0 -1 good
256 257 history 8.0 1 NA
258 259 age 43.5 1 NA
0 0 NA 0.0 -1 good
0 0 NA 0.0 -1 bad
0 0 NA 0.0 -1 bad
0 0 NA 0.0 -1 good
0 0 NA 0.0 -1 good
0 0 NA 0.0 -1 bad
0 0 NA 0.0 -1 good
0 0 NA 0.0 -1 bad
260 261 duration 23.0 1 NA
0 0 NA 0.0 -1 bad
0 0 NA 0.0 -1 bad
0 0 NA 0.0 -1 good
262 263 purpose 9.0 1 NA
0 0 NA 0.0 -1 good
0 0 NA 0.0 -1 good
0 0 NA 0.0 -1 bad
0 0 NA 0.0 -1 bad
0 0 NA 0.0 -1 good
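Each row of the `getTree()` output is one node: a status of 1 marks a split node, a status of -1 marks a terminal node carrying a prediction, and the two daughter columns give the row numbers to move to. For numeric predictors, observations with a value less than or equal to the split point go to the left daughter. The sketch below walks a small hypothetical tree in the same layout; the tree, the column names and the helper `walk_tree` are illustrative, and the sketch handles numeric splits only (categorical splits such as `property` or `purpose` encode the set of categories going left in the split point and need different handling).

```r
# Hypothetical five-node tree in the getTree() layout (not taken from the forest above)
tree <- data.frame(
  left       = c(2, 0, 4, 0, 0),
  right      = c(3, 0, 5, 0, 0),
  split_var  = c("age", NA, "amount", NA, NA),
  split_pt   = c(29.5, 0, 2609, 0, 0),
  status     = c(1, -1, 1, -1, -1),
  prediction = c(NA, "bad", NA, "good", "bad"),
  stringsAsFactors = FALSE
)

# Follow the splits from the root (row 1) until a terminal node is reached
walk_tree <- function(tree, obs) {
  node <- 1
  while (tree$status[node] != -1) {
    v <- tree$split_var[node]
    node <- if (obs[[v]] <= tree$split_pt[node]) tree$left[node] else tree$right[node]
  }
  tree$prediction[node]
}

p1 <- walk_tree(tree, list(age = 25, amount = 1000))  # age <= 29.5, so node 2
p2 <- walk_tree(tree, list(age = 40, amount = 1000))  # node 3, then amount <= 2609, so node 4
p3 <- walk_tree(tree, list(age = 40, amount = 5000))  # node 3, then node 5
c(p1, p2, p3)
## [1] "bad"  "good" "bad"
```

A single tree is only one of the 3000 votes in the forest, so its prediction for a customer can differ from the forest's prediction.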

Support Vector Machine (SVM) (Requirement #2)

Building the SVM

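library(e1071)

load('germanCredit.Rdata')
trPerc <- 0.7
# No seed is set here, so this split differs from the Random Forest split
sp <- sample(1:nrow(german),as.integer(trPerc*nrow(german)))
tr <- german[sp,]
ts <- german[-sp,]

# Note: the SVM is fitted on the full dataset (german), not on tr, so the
# test rows in ts were seen during training and the error below is optimistic;
# svm(response ~ ., tr) would give an honest hold-out estimate
s <- svm(response ~ ., german)
ps <- predict(s,ts)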

Predicting with the SVM

##    1    3    5    6    8   11   17   24   31   34   36   43   44   48   50 
## good good  bad good good good good good good good  bad good good good good 
##   58   63   66   67   68   70   72   73   80   82   83   86   90   93   95 
## good good good good good good good good good good good good good good good 
##   96  102  105  106  109  112  113  117  121  124  127  128  129  133  134 
##  bad good good good good good good  bad good good good good good good good 
##  135  138  145  146  150  151  153  156  157  158  161  165  172  174  175 
## good good good  bad good good good good good good good good good good good 
##  176  178  179  181  184  189  195  198  204  209  210  223  225  231  235 
## good good good good good good  bad  bad good good good good good good good 
##  236  237  240  242  243  245  250  251  253  260  262  266  271  275  279 
##  bad  bad good good  bad good good good  bad good good good good  bad good 
##  280  285  289  290  292  296  297  302  303  304  305  306  307  308  314 
## good good good  bad good  bad good good good good good good good good good 
##  315  316  318  321  324  325  337  340  343  344  347  354  358  361  363 
## good  bad good good good good good good good good good  bad good good good 
##  368  373  375  377  378  380  381  394  400  402  403  406  409  414  415 
## good good  bad good good good good good good good good good good good good 
##  416  417  418  421  424  428  429  439  447  448  452  453  454  460  464 
## good good good good good good good good  bad good good good good good good 
##  469  471  472  473  474  475  478  481  482  487  491  499  503  504  506 
## good good  bad good good good good good good good good good good good good 
##  508  509  510  513  518  522  527  530  537  542  543  546  548  557  565 
## good good good good good good good good good good good  bad good good good 
##  566  567  569  573  575  579  582  583  585  592  600  601  608  611  612 
## good good good good good  bad good good good good good good  bad good good 
##  613  616  618  619  624  627  632  644  647  650  651  659  660  661  663 
##  bad good good good good good  bad good good  bad  bad good good good good 
##  665  667  669  671  673  675  676  678  681  684  685  693  694  700  705 
## good good good good good good good  bad good good good good good good good 
##  709  710  716  717  718  725  728  734  749  751  753  756  765  773  776 
## good good good good good good  bad good good good good  bad good good  bad 
##  780  782  787  791  792  794  795  806  807  808  821  826  829  832  834 
## good good good good good good good  bad good good good good good  bad good 
##  837  840  844  846  852  853  864  869  871  878  879  883  884  886  887 
## good good good good good good good good good good good good good  bad good 
##  892  894  899  900  901  903  906  909  910  913  929  932  933  935  936 
## good good good good good good good good good good good good good good  bad 
##  948  949  954  956  959  968  969  972  974  981  984  995  998  999 1000 
## good good  bad good good good good good  bad good good good good  bad good 
## Levels: bad good

The Confusion Matrix for the SVM

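confuseSVM <- table(ps,ts$response)
# table() puts the predictions (ps) in rows and the actual classes in columns;
# the factor levels are ordered bad, good
rownames(confuseSVM) <- c('Predict Bad','Predict Good')
colnames(confuseSVM) <- c('Actual Bad','Actual Good')
             Actual Bad Actual Good
Predict Bad          34           3
Predict Good         57         206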

The Error Rate of the SVM

The Error Rate measures the proportion of the predictions that are incorrect, expressed here as a percentage.

mc <- table(ps, ts$response)
erro <- 100 * (1-sum(diag(mc))/sum(mc))
erro
## [1] 20

The Overall Benefit/Cost of the SVM

Calculated by multiplying the Confusion Matrix element-wise by the Cost/Benefit Matrix and summing the result.

utilitySVM <- sum(confuseSVM*costMatrix)
utilitySVM
## [1] 288
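The figures for the two models can be placed side by side by recomputing them from the error counts in the two confusion matrices. This is a minimal sketch; the data frame `results` and its column names are illustrative. The two models were evaluated on different random test sets, and the SVM was fitted with `svm(response ~ ., german)`, that is, on data that includes its own test rows, so the comparison should be read with caution.

```r
results <- data.frame(
  model      = c("Random Forest", "SVM"),
  false_good = c(64, 57),    # predicted good, actually bad (cost 5 each)
  false_bad  = c(18, 3),     # predicted bad, actually good (cost 1 each)
  n_test     = c(300, 300),
  stringsAsFactors = FALSE
)

# Error rate in percent and total cost under the cost matrix
results$error_pct <- round(100 * (results$false_good + results$false_bad) / results$n_test, 2)
results$cost      <- 5 * results$false_good + 1 * results$false_bad

results[, c("model", "error_pct", "cost")]
##           model error_pct cost
## 1 Random Forest     27.33  338
## 2           SVM     20.00  288
```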

Comparing the Models (Requirement #3)

SVM using the performanceEstimation package

load('germanCredit.Rdata')
svm1 <- performanceEstimation(
  PredTask(response ~ ., german),
  workflowVariants("standardWF", learner="svm",
                   learner.pars=list(cost=c(1,10),
                                   gamma=c(0.1,0.01))),
  HldSettings(nReps=5,hldSz=0.3))
## 
## 
## ##### PERFORMANCE ESTIMATION USING  HOLD OUT  #####
## 
## ** PREDICTIVE TASK :: german
## 
## ++ MODEL/WORKFLOW :: standardWF  variant ->  svm.v1 
## 
##  5 x 70 %/ 30 % Holdout run with seed =  1234 
## Repetition :  1  2  3  4  5
## 
## 
## ++ MODEL/WORKFLOW :: standardWF  variant ->  svm.v2 
## 
##  5 x 70 %/ 30 % Holdout run with seed =  1234 
## Repetition :  1  2  3  4  5
## 
## 
## ++ MODEL/WORKFLOW :: standardWF  variant ->  svm.v3 
## 
##  5 x 70 %/ 30 % Holdout run with seed =  1234 
## Repetition :  1  2  3  4  5
## 
## 
## ++ MODEL/WORKFLOW :: standardWF  variant ->  svm.v4 
## 
##  5 x 70 %/ 30 % Holdout run with seed =  1234 
## Repetition :  1  2  3  4  5
plot(svm1)

[Figure: output of plot(svm1)]
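The four SVM variants in the log (svm.v1 to svm.v4) come from crossing the two `cost` values with the two `gamma` values passed in `learner.pars`. The base R sketch below enumerates that grid with `expand.grid()`; it shows the set of combinations only and does not claim which combination performanceEstimation labels as v1, v2, v3 or v4.

```r
# All combinations of the two tuning parameters given to workflowVariants()
svm_grid <- expand.grid(cost = c(1, 10), gamma = c(0.1, 0.01))
svm_grid
##   cost gamma
## 1    1  0.10
## 2   10  0.10
## 3    1  0.01
## 4   10  0.01

nrow(svm_grid)
## [1] 4
```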

Random Forest Model using the performanceEstimation package

library(performanceEstimation)

load('germanCredit.Rdata')
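rf1 <- performanceEstimation(
  PredTask(response ~ ., german),
  # Note: cost and gamma are svm() parameters, not randomForest() parameters,
  # so the four variants are not distinct Random Forest configurations
  workflowVariants("standardWF", learner="randomForest",
                   learner.pars=list(cost=c(1,10),
                                     gamma=c(0.1,0.01))),
  HldSettings(nReps=5,hldSz=0.3))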
## 
## 
## ##### PERFORMANCE ESTIMATION USING  HOLD OUT  #####
## 
## ** PREDICTIVE TASK :: german
## 
## ++ MODEL/WORKFLOW :: standardWF  variant ->  randomForest.v1 
## 
##  5 x 70 %/ 30 % Holdout run with seed =  1234 
## Repetition :  1  2  3  4  5
## 
## 
## ++ MODEL/WORKFLOW :: standardWF  variant ->  randomForest.v2 
## 
##  5 x 70 %/ 30 % Holdout run with seed =  1234 
## Repetition :  1  2  3  4  5
## 
## 
## ++ MODEL/WORKFLOW :: standardWF  variant ->  randomForest.v3 
## 
##  5 x 70 %/ 30 % Holdout run with seed =  1234 
## Repetition :  1  2  3  4  5
## 
## 
## ++ MODEL/WORKFLOW :: standardWF  variant ->  randomForest.v4 
## 
##  5 x 70 %/ 30 % Holdout run with seed =  1234 
## Repetition :  1  2  3  4  5
plot(rf1)

[Figure: output of plot(rf1)]
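The log line "5 x 70 %/ 30 % Holdout" describes the estimation scheme: five times over, 30% of the rows are drawn at random as a test set, the model is fitted on the remaining 70%, and the test error is recorded. The base R sketch below imitates that loop under stated assumptions: the data are a stand-in vector of 700 good and 300 bad labels, the "learner" is a simple majority-class rule rather than an SVM or Random Forest, and the helper `holdout_error` is hypothetical and not part of the performanceEstimation package.

```r
set.seed(1234)

# Stand-in response: 1000 labels, 700 good and 300 bad
response <- factor(rep(c("good", "bad"), times = c(700, 300)))

# One hold-out repetition: split, "fit" on the training part, score on the test part
holdout_error <- function(response, hldSz = 0.3) {
  n        <- length(response)
  test_idx <- sample(n, as.integer(round(hldSz * n)))
  train    <- response[-test_idx]
  test     <- response[test_idx]
  majority <- names(which.max(table(train)))   # stand-in learner: always predict the majority class
  mean(test != majority)                       # proportion of test rows misclassified
}

# Five repetitions, as with nReps = 5
errs <- replicate(5, holdout_error(response))
length(errs)
## [1] 5

# The reported estimate is an average over the repetitions
mean_err <- mean(errs)
```

Averaging over several random splits makes the estimate less dependent on one lucky or unlucky split than the single 70/30 split used in the earlier sections.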

Comparing both models using the performanceEstimation package

load('germanCredit.Rdata')
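result <- performanceEstimation(
  # Note: the same task is listed twice, so every workflow is estimated
  # twice (the log and the results below repeat accordingly)
  c(PredTask(response ~ ., german), PredTask(response ~ ., german)),
  c(workflowVariants("standardWF", learner="svm",
                     learner.pars=list(cost=c(1,10),
                                       gamma=c(0.1,0.01))),
    # Note: se is not a randomForest() parameter, so the three Random
    # Forest variants are not distinct configurations
    workflowVariants("standardWF", learner="randomForest",
                     learner.pars=list(se=c(0,0.5,1)),
                     predictor.pars=list(type="class"))),
  HldSettings(nReps=5,hldSz=0.3))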
## 
## 
## ##### PERFORMANCE ESTIMATION USING  HOLD OUT  #####
## 
## ** PREDICTIVE TASK :: german
## 
## ++ MODEL/WORKFLOW :: standardWF  variant ->  svm.v1 
## 
##  5 x 70 %/ 30 % Holdout run with seed =  1234 
## Repetition :  1  2  3  4  5
## 
## 
## ++ MODEL/WORKFLOW :: standardWF  variant ->  svm.v2 
## 
##  5 x 70 %/ 30 % Holdout run with seed =  1234 
## Repetition :  1  2  3  4  5
## 
## 
## ++ MODEL/WORKFLOW :: standardWF  variant ->  svm.v3 
## 
##  5 x 70 %/ 30 % Holdout run with seed =  1234 
## Repetition :  1  2  3  4  5
## 
## 
## ++ MODEL/WORKFLOW :: standardWF  variant ->  svm.v4 
## 
##  5 x 70 %/ 30 % Holdout run with seed =  1234 
## Repetition :  1  2  3  4  5
## 
## 
## ++ MODEL/WORKFLOW :: standardWF  variant ->  randomForest.v1 
## 
##  5 x 70 %/ 30 % Holdout run with seed =  1234 
## Repetition :  1  2  3  4  5
## 
## 
## ++ MODEL/WORKFLOW :: standardWF  variant ->  randomForest.v2 
## 
##  5 x 70 %/ 30 % Holdout run with seed =  1234 
## Repetition :  1  2  3  4  5
## 
## 
## ++ MODEL/WORKFLOW :: standardWF  variant ->  randomForest.v3 
## 
##  5 x 70 %/ 30 % Holdout run with seed =  1234 
## Repetition :  1  2  3  4  5
## 
## 
## ** PREDICTIVE TASK :: german
## 
## ++ MODEL/WORKFLOW :: standardWF  variant ->  svm.v1 
## 
##  5 x 70 %/ 30 % Holdout run with seed =  1234 
## Repetition :  1  2  3  4  5
## 
## 
## ++ MODEL/WORKFLOW :: standardWF  variant ->  svm.v2 
## 
##  5 x 70 %/ 30 % Holdout run with seed =  1234 
## Repetition :  1  2  3  4  5
## 
## 
## ++ MODEL/WORKFLOW :: standardWF  variant ->  svm.v3 
## 
##  5 x 70 %/ 30 % Holdout run with seed =  1234 
## Repetition :  1  2  3  4  5
## 
## 
## ++ MODEL/WORKFLOW :: standardWF  variant ->  svm.v4 
## 
##  5 x 70 %/ 30 % Holdout run with seed =  1234 
## Repetition :  1  2  3  4  5
## 
## 
## ++ MODEL/WORKFLOW :: standardWF  variant ->  randomForest.v1 
## 
##  5 x 70 %/ 30 % Holdout run with seed =  1234 
## Repetition :  1  2  3  4  5
## 
## 
## ++ MODEL/WORKFLOW :: standardWF  variant ->  randomForest.v2 
## 
##  5 x 70 %/ 30 % Holdout run with seed =  1234 
## Repetition :  1  2  3  4  5
## 
## 
## ++ MODEL/WORKFLOW :: standardWF  variant ->  randomForest.v3 
## 
##  5 x 70 %/ 30 % Holdout run with seed =  1234 
## Repetition :  1  2  3  4  5
plot(result)

[Figure: output of plot(result)]

## $german
##            Workflow Estimate
## err randomForest.v1    0.255
## 
## $german
##            Workflow Estimate
## err randomForest.v1    0.255
## Workflow Object:
##  Workflow ID       ::  randomForest.v1 
##  Workflow Function ::  standardWF
##      Parameter values:
##       learner.pars  ->  se=0 
##       predictor.pars  ->  type=class 
##       learner  ->  randomForest