CSUEB STATS 6620 - Machine Learning with R - Spring 2017 - Prof E. Suess

Gui Larangeira - HW 6 - Random Forests - 05/22/2017

We tried applying RFs to the Credit data from the ISLR book, but for some reason the algorithm did not converge, so we are using the same German Credit data once more. We split it 70/30 into a training set (700 rows) and a test set (300 rows).
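The 70/30 split amounts to sampling 70% of the row indices for training and keeping the rest for testing. Here is a minimal base-R sketch, assuming 1000 rows as in the German Credit data. credit_demo and its amount column are hypothetical stand-ins so the snippet runs without downloading credit.csv:

```r
# Hypothetical stand-in for credit.csv: 1000 rows with a factor response
set.seed(123)
credit_demo <- data.frame(
  amount  = round(runif(1000, 250, 20000)),
  default = factor(sample(c("no", "yes"), 1000, replace = TRUE, prob = c(0.7, 0.3)))
)

# 70/30 split: sample 70% of the row indices without replacement
samp       <- sample(nrow(credit_demo), 0.7 * nrow(credit_demo))
train_demo <- credit_demo[samp, ]
test_demo  <- credit_demo[-samp, ]

c(train = nrow(train_demo), test = nrow(test_demo))   # train 700, test 300

# No row is drawn twice, and together the two sets cover every row
stopifnot(anyDuplicated(samp) == 0,
          nrow(train_demo) + nrow(test_demo) == nrow(credit_demo))
```

Sampling indices rather than rows keeps the split reproducible through set.seed() and lets the test set be defined simply as the complement, credit_demo[-samp, ].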

Step 3 - Training a Simple Random Forest

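credit <- read.csv("http://www.sci.csueastbay.edu/~esuess/classes/Statistics_6620/Presentations/ml10/credit.csv", stringsAsFactors = TRUE)  # factor response => classification (R >= 4.0 no longer converts strings by default)
set.seed(123)
samp <- sample(nrow(credit), 0.7 * nrow(credit))  # 70/30 split: 700 training rows, 300 test rows
train <- credit[samp, ]; test <- credit[-samp, ]
library(randomForest)
(model <- randomForest(default ~ ., data = train, ntree = 1000, mtry = 5))  # 1000 trees, 5 candidate predictors per split; parentheses print the fit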

Call:
 randomForest(formula = default ~ ., data = train, ntree = 1000,      mtry = 5) 
               Type of random forest: classification
                     Number of trees: 1000
No. of variables tried at each split: 5

        OOB estimate of  error rate: 24.43%
Confusion matrix:
     no yes class.error
no  450  41  0.08350305
yes 130  79  0.62200957
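The OOB numbers printed above can be reproduced by hand. randomForest scores each training row using only the trees whose bootstrap sample left that row out, tallies those out-of-bag votes into the confusion matrix, and reports the misclassified share as the OOB error. The base-R sketch below re-derives the printed rates from the matrix. It also includes a one-tree illustration of why roughly 37% of the rows are out of bag for any given tree. The seed and the single simulated bootstrap draw are our own choices, not the package's internals:

```r
# OOB confusion matrix printed above (rows = actual class, columns = OOB prediction)
oob <- matrix(c(450,  41,
                130,  79),
              nrow = 2, byrow = TRUE,
              dimnames = list(actual = c("no", "yes"), predicted = c("no", "yes")))

n_train     <- sum(oob)                          # 700 training rows
oob_error   <- 1 - sum(diag(oob)) / n_train      # share of rows misclassified by OOB votes
class_error <- 1 - diag(oob) / rowSums(oob)      # error within each actual class
round(100 * oob_error, 2)    # 24.43
round(class_error, 8)        # 0.08350305 0.62200957

# Why "out of bag": a bootstrap sample of n rows misses about (1 - 1/n)^n of them
set.seed(1)
boot     <- sample(n_train, n_train, replace = TRUE)   # rows one tree is grown on
oob_rows <- setdiff(seq_len(n_train), boot)            # rows that tree never sees
length(oob_rows) / n_train              # close to the expected share below
round((1 - 1 / n_train)^n_train, 4)     # 0.3676
```

The large class error for "yes" (about 62%) compared with "no" (about 8%) is visible directly in the matrix: most actual defaulters are voted "no".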

Step 4 - Evaluating Model Performance

After training the model, we evaluate its performance on the test data:

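pred <- predict(model, newdata = test)  # predicted class for each test row
sum(pred == test$default) / nrow(test)  # accuracy: share of correct predictions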
[1] 0.76
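The accuracy line counts matching predictions and divides by the number of test rows. The same number can be read off the confusion matrix, where correct predictions sit on the diagonal, or computed as the mean of a logical vector. A small sketch with made-up labels (actual and pred below are hypothetical, not the homework's predictions):

```r
# Hypothetical labels and predictions (not the HW results), just to compare formulas
actual <- factor(c("no", "no", "yes", "no", "yes"), levels = c("no", "yes"))
pred   <- factor(c("no", "yes", "yes", "no", "no"), levels = c("no", "yes"))

acc_sum  <- sum(pred == actual) / length(actual)   # the formula used above
acc_mean <- mean(pred == actual)                   # same value: mean of a logical vector
tab      <- table(pred, actual)                    # confusion matrix
acc_tab  <- sum(diag(tab)) / sum(tab)              # correct predictions lie on the diagonal
c(acc_sum, acc_mean, acc_tab)   # 0.6 0.6 0.6

round(0.76 * 300)   # 228 correct out of the 300 test rows
```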

An accuracy of 0.76 leaves room for improvement, so we move on to Step 5. Rather than tuning the RF itself (could be Radio Frequency, but here it means Random Forest!), we tune a boosted C5.0 decision tree.
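To judge how much room there really is, it helps to compare 0.76 with a trivial classifier that always predicts the majority class. The quick check below uses only the training class counts, taken as the row sums of the Step 3 OOB confusion matrix. The test set's class mix is not printed here, so treat this as an approximation:

```r
# Training class counts = row sums of the OOB confusion matrix in Step 3
class_counts <- c(no = 450 + 41, yes = 130 + 79)      # 491 "no", 209 "yes"
no_info_rate <- max(class_counts) / sum(class_counts) # accuracy of always predicting "no"
oob_accuracy <- 1 - (41 + 130) / sum(class_counts)    # 1 - OOB error rate

round(no_info_rate, 4)   # 0.7014
round(oob_accuracy, 4)   # 0.7557
```

Against a baseline of roughly 0.70, the forest's 0.76 test accuracy (and 0.756 OOB accuracy) is a modest gain, which is why we try to improve on it.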

Step 5 - Improving Model Performance

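We use repeated cross-validation (10 folds, 10 repeats) for tuning and a C5.0 decision tree for boosting, letting caret pick the number of boosting trials (10, 20, 30 or 40) by Kappa.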
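Repeated 10-fold cross-validation splits the 700 training rows into 10 folds and holds each fold out once. The whole shuffle is then repeated 10 times, giving 100 resamples, and each candidate value of trials is scored on all of them. Below is a minimal base-R sketch of that bookkeeping. make_repeated_folds is a hypothetical helper for illustration, not a caret function, and caret also balances the classes across folds, which this sketch skips:

```r
# Base-R sketch of repeated k-fold resampling (illustration only, not caret's code;
# caret additionally stratifies folds by class, which is skipped here)
make_repeated_folds <- function(n, k = 10, repeats = 10) {
  folds <- list()
  for (r in seq_len(repeats)) {
    fold_id <- sample(rep(seq_len(k), length.out = n))  # random fold label per row
    for (f in seq_len(k)) {
      # each resample trains on the rows outside fold f and holds fold f out
      folds[[sprintf("Fold%02d.Rep%02d", f, r)]] <- which(fold_id != f)
    }
  }
  folds
}

set.seed(300)
folds  <- make_repeated_folds(n = 700, k = 10, repeats = 10)
trials <- c(10, 20, 30, 40)              # boosting iterations in the tuning grid

length(folds)                            # 100 resamples
unique(lengths(folds))                   # 630 training rows each (70 held out)
length(folds) * length(trials)           # 400 resample-by-grid evaluations
```

Repeating the 10-fold split reduces the variance of the performance estimate at the cost of ten times the computation, which is where most of the run time in this step goes.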

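library(caret)
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 10)  # 10-fold CV, repeated 10 times
grid_c50 <- expand.grid(.model = "tree", .trials = c(10, 20, 30, 40), .winnow = FALSE)  # logical FALSE, not the string "FALSE"
set.seed(300)
m_c50 <- train(default ~ ., data = train, method = "C5.0", metric = "Kappa",
               trControl = ctrl, tuneGrid = grid_c50)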
‘!’ not meaningful for factors
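One plausible source of a "not meaningful for factors" warning in this step is a tuning grid whose winnow column is the string "FALSE" rather than the logical FALSE. expand.grid() stores strings as factors, and R refuses to apply ! to a factor. We have not traced where caret or C5.0 would negate winnow, so treat this as a likely explanation rather than a confirmed one. A base-R check of the mechanism:

```r
# expand.grid() converts character columns to factors (stringsAsFactors = TRUE by default)
grid_str <- expand.grid(.model = "tree", .trials = c(10, 20, 30, 40), .winnow = "FALSE")
class(grid_str$.winnow)                       # "factor"

# Applying ! to a factor is undefined: R warns that '!' is not meaningful for factors and returns NA
res <- suppressWarnings(!grid_str$.winnow[1])
is.na(res)                                    # TRUE

# Passing a logical keeps the column logical, so ! behaves as intended
grid_lgl <- expand.grid(.model = "tree", .trials = c(10, 20, 30, 40), .winnow = FALSE)
class(grid_lgl$.winnow)                       # "logical"
!grid_lgl$.winnow[1]                          # TRUE
```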

Again, we evaluate the performance of the tuned model on the test data:

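pred <- predict(m_c50, newdata = test)  # predictions from the model with the best Kappa
sum(pred == test$default) / nrow(test)  # test accuracy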
[1] 0.7233333

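After a pretty lengthy number-crunching session (100 cross-validation resamples, each scored at four boosting settings), our results have actually worsened: test accuracy dropped from 0.76 for the random forest to about 0.723 for the tuned C5.0 model (217 instead of 228 correct out of 300 test rows).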
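Note that caret picked the number of boosting trials by Kappa, not accuracy, so the tuned model was never optimized for the accuracy figure compared above. Cohen's kappa discounts the agreement a classifier would get by chance given the class totals, which matters when about 70% of the training rows are "no". The minimal sketch below applies it to the Step 3 OOB confusion matrix only because that is the one table printed in full here. cohen_kappa is our own helper, not a caret function:

```r
# Cohen's kappa: agreement beyond what the row/column totals would give by chance
cohen_kappa <- function(tab) {
  n   <- sum(tab)
  p_o <- sum(diag(tab)) / n                       # observed agreement (= accuracy)
  p_e <- sum(rowSums(tab) * colSums(tab)) / n^2   # agreement expected by chance
  (p_o - p_e) / (1 - p_e)
}

# Illustration on the Step 3 OOB confusion matrix (rows = actual, columns = predicted)
oob <- matrix(c(450, 41, 130, 79), nrow = 2, byrow = TRUE)
round(1 - sum(diag(oob)) / sum(oob), 4)   # OOB error 0.2443
round(cohen_kappa(oob), 4)                # 0.3355
```

A model can raise Kappa by catching more of the minority "yes" class while losing some overall accuracy, so a lower test accuracy does not by itself mean the tuning failed on its own criterion.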
