Test Algorithm

Tuning an algorithm is an important step in building a model. With a random forest, you cannot know the result in advance because the model is built from random samples of rows and variables. Tuning helps you control the training process and obtain better results. In this study, we focus on the two main tuning parameters of a random forest model: mtry (the number of variables randomly sampled as candidates at each split) and ntree (the number of trees to grow). Other parameters can also be tuned, but these two are likely to have the largest effect on model accuracy.
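
To make the role of mtry concrete, here is a minimal base R sketch of what happens at a single split: only a random subset of mtry predictors is considered. The variable names are illustrative, and the floor of the square root is used here only as a common starting value for classification, not as a recommendation.

```r
# Illustration only: how mtry limits the predictors considered at one split
p <- 60                        # number of predictors in the Sonar data
mtry <- floor(sqrt(p))         # common starting value for classification

set.seed(42)
# Draw mtry distinct predictor indices; a tree would search only these
# columns for the best split at this node, then redraw at the next node
candidates <- sample(seq_len(p), mtry)

print(mtry)
print(sort(candidates))
```

A smaller mtry makes the trees less correlated with each other but weakens each individual tree, which is why the value is worth tuning.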

In the result below, we use the repeatedcv method: 10-fold cross-validation, repeated only 3 times so that the process does not become too slow. I will hold back a validation set for back testing.

#https://machinelearningmastery.com/tune-machine-learning-algorithms-in-r/
library(randomForest)
library(mlbench)
library(caret)
library(e1071)
 
# Load Dataset
data(Sonar)
dataset <- Sonar
x <- dataset[,1:60]
y <- dataset[,61]

# 10-fold cross-validation, repeated 3 times
control <- trainControl(method='repeatedcv',
                        number=10,
                        repeats=3)
# Metric used to compare models is Accuracy
metric <- "Accuracy"
set.seed(123)
# mtry: number of variables randomly sampled as candidates at each split
mtry <- sqrt(ncol(x))
tunegrid <- expand.grid(.mtry=mtry)
rf_default <- train(Class~., 
                      data=dataset, 
                      method='rf', 
                      metric=metric, 
                      tuneGrid=tunegrid, 
                      trControl=control)
print(rf_default)
## Random Forest 
## 
## 208 samples
##  60 predictor
##   2 classes: 'M', 'R' 
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 3 times) 
## Summary of sample sizes: 187, 187, 187, 188, 188, 187, ... 
## Resampling results:
## 
##   Accuracy   Kappa    
##   0.8408442  0.6765085
## 
## Tuning parameter 'mtry' was held constant at a value of 7.745967
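
The resampling scheme used above can be sketched in base R to show where the 30 resamples and the training sizes of 187 and 188 come from. The helper make_repeated_folds is hypothetical and is not part of caret, which builds its own resampling indices internally; this is only a simplified illustration.

```r
# Hypothetical helper (not a caret function): assign each of n rows to
# one of k folds, and repeat the random assignment `repeats` times
make_repeated_folds <- function(n, k = 10, repeats = 3, seed = 123) {
  set.seed(seed)
  lapply(seq_len(repeats), function(r) {
    # Recycle labels 1..k over n rows, then shuffle them
    sample(rep(seq_len(k), length.out = n))
  })
}

folds <- make_repeated_folds(n = 208)      # Sonar has 208 rows

# Each repeat holds out every fold once: 3 repeats x 10 folds
n_resamples <- length(folds) * 10

# Training-set size for each held-out fold in the first repeat
train_sizes <- 208 - as.integer(table(folds[[1]]))

print(n_resamples)
print(sort(unique(train_sizes)))
```

Because 208 is not divisible by 10, some folds hold 21 rows and others 20, which is why the summary line mixes 187 and 188.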

Tune by tools

The randomForest package provides tuneRF(), which searches for the optimal mtry value for your data. We rely on the out-of-bag (OOB) error to choose mtry: the best value is the one with the lowest OOB error.

set.seed(1)
bestMtry <- tuneRF(x,y, stepFactor = 1.5, improve = 1e-5, ntree = 500)
## mtry = 7  OOB error = 14.9% 
## Searching left ...
## mtry = 5     OOB error = 12.5% 
## 0.1612903 1e-05 
## mtry = 4     OOB error = 14.9% 
## -0.1923077 1e-05 
## Searching right ...
## mtry = 10    OOB error = 15.38% 
## -0.2307692 1e-05

print(bestMtry)
##        mtry  OOBError
## 4.OOB     4 0.1490385
## 5.OOB     5 0.1250000
## 7.OOB     7 0.1490385
## 10.OOB   10 0.1538462
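
The numbers printed during the search can be reproduced by hand. The sketch below is not the tuneRF() source; it only recomputes the relative improvements shown in the log, assuming the misclassified counts (31, 26 and 32 out of 208) inferred from the printed error rates, and assuming each step is compared with the best error found so far, which is consistent with the printed ratios.

```r
# OOB error rates from the table above, as counts out of 208 samples
oob <- c("4" = 31, "5" = 26, "7" = 31, "10" = 32) / 208

# Relative improvement of a new error over a reference error
rel_improve <- function(err_ref, err_new) (err_ref - err_new) / err_ref

# Candidate values are consistent with dividing or multiplying by stepFactor = 1.5
left_1  <- ceiling(7 / 1.5)    # 5
left_2  <- ceiling(5 / 1.5)    # 4
right_1 <- floor(7 * 1.5)      # 10

step_7_to_5  <- unname(rel_improve(oob["7"], oob["5"]))   # positive: keep going left
step_5_to_4  <- unname(rel_improve(oob["5"], oob["4"]))   # negative: stop searching left
step_5_to_10 <- unname(rel_improve(oob["5"], oob["10"]))  # negative: stop searching right

# The chosen mtry is the one with the lowest OOB error
best_mtry <- as.integer(names(oob)[which.min(oob)])

print(c(step_7_to_5, step_5_to_4, step_5_to_10))
print(best_mtry)
```

A step is accepted only when the relative improvement exceeds the improve argument (1e-5 here), so the search stops in each direction as soon as the error gets worse.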

According to these results, mtry = 5 is the best value for our model, with the lowest OOB error of 12.5% (an OOB accuracy of 87.5%). This is quite different from the grid search method, where accuracy was about 82%.