Caret classification example

Using the iris dataset:

library(dplyr)
library(caret)

iris %>% head
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa

Create a binary response:

d <- iris %>% 
  mutate(response=factor(ifelse(Species=='versicolor', 'IsVersicolor', 'NotVersicolor'))) %>% 
  select(-Species) %>%         # Drop the original multi-class label
  arrange(sample(nrow(.)))     # Shuffle the rows (order by a random permutation)
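
As a quick sanity check, confirm the class balance first (iris has 50 flowers per species, so this should show 50 IsVersicolor vs. 100 NotVersicolor):

table(d$response)        # base R
d %>% count(response)    # dplyr equivalent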

Train a model:

# Train a random forest (with 10-fold CV) and show the results
# * 'tuneLength' tells caret how many candidate values of the tuning parameter
#   (mtry for random forests) to try
tc <- trainControl(method = 'cv', number = 10, savePredictions = TRUE, classProbs = TRUE)
m <- train(response ~ ., data = d, method = 'rf', tuneLength = 3, trControl = tc) # method could be 'gbm' instead
m
## Random Forest 
## 
## 150 samples
##   4 predictor
##   2 classes: 'IsVersicolor', 'NotVersicolor' 
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 135, 135, 135, 135, 135, 135, ... 
## Resampling results across tuning parameters:
## 
##   mtry  Accuracy   Kappa      Accuracy SD  Kappa SD 
##   2     0.9466667  0.8826521  0.06126244   0.1341698
##   3     0.9533333  0.8956391  0.05488484   0.1233041
##   4     0.9533333  0.8956391  0.05488484   0.1233041
## 
## Accuracy was used to select the optimal model using  the largest value.
## The final value used for the model was mtry = 3.
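
To dig a bit deeper into the fitted train object, the standard caret accessors below can be used (a short aside; output not shown):

m$bestTune   # The mtry value caret selected (3 here)
m$results    # Accuracy/Kappa for every mtry tried
varImp(m)    # Variable importance from the underlying random forest
plot(m)      # Accuracy as a function of mtry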

Show accuracy and other stats from the held-out cross-validation predictions (saved because savePredictions = TRUE). Note that m$pred pools the predictions from all three mtry values tried, so the table below covers 3 x 150 = 450 cases:

confusionMatrix(table(m$pred$obs, m$pred$pred))
## Confusion Matrix and Statistics
## 
##                
##                 IsVersicolor NotVersicolor
##   IsVersicolor           141             9
##   NotVersicolor           13           287
##                                           
##                Accuracy : 0.9511          
##                  95% CI : (0.9269, 0.9691)
##     No Information Rate : 0.6578          
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.8907          
##  Mcnemar's Test P-Value : 0.5224          
##                                           
##             Sensitivity : 0.9156          
##             Specificity : 0.9696          
##          Pos Pred Value : 0.9400          
##          Neg Pred Value : 0.9567          
##              Prevalence : 0.3422          
##          Detection Rate : 0.3133          
##    Detection Prevalence : 0.3333          
##       Balanced Accuracy : 0.9426          
##                                           
##        'Positive' Class : IsVersicolor    
##
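
To evaluate only the selected model (mtry = 3) rather than the pooled predictions, filter the saved predictions down to the best tuning parameters first; for example:

# Keep only the CV predictions made with the winning mtry
best_preds <- m$pred %>% filter(mtry == m$bestTune$mtry)

# Passing data/reference (and the positive class) explicitly avoids any
# ambiguity about which dimension of the table is the prediction
confusionMatrix(data = best_preds$pred,
                reference = best_preds$obs,
                positive = 'IsVersicolor')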